Modeling Coverage Error in Address Lists Due to Geocoding Error: The Impact on Survey Operations and Sampling

Jizhou Fu and Lee Fiorio

AAPOR 2012, Orlando

Outline

• ABS Background
• Analysis Goals
• Data and Methodology
• Results
• Discussion
• Limitations

Address-Based Sampling (ABS) Background

• Address-based frames first need geographical boundaries
• Types of address-based frames
  • US Postal Service Delivery Sequence File (DSF)
    – Purchased through market research vendors
    – Updated frequently
    – Adequate replacement for field listing in urban and suburban areas
  • Dependent or Enhanced Listing
    – Provide DSF to listers for enhancement in the field
    – Reduces cost and increases accuracy of traditional listing
• Because of costs, DSF should be used where possible
• Enhanced listing should be used where DSF is inadequate
• Evaluating DSF coverage: DSF-to-Census Ratio
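The DSF-to-Census ratio used to evaluate coverage can be sketched as a simple count comparison. This is a minimal illustration, not the authors' code: the function name and the example counts are hypothetical, and it assumes the ratio is the number of DSF address lines geocoded into an area divided by the census housing unit count for the same area.

```python
from typing import Optional


def dsf_to_census_ratio(dsf_count: int, census_hu_count: int) -> Optional[float]:
    """Return DSF address lines per census housing unit for one area.

    Returns None when the census count is zero, because the ratio is
    undefined there (assumption: such areas are handled separately).
    """
    if census_hu_count == 0:
        return None
    return dsf_count / census_hu_count


# Hypothetical counts: (DSF lines geocoded into the area, census housing units)
example_areas = {"area_A": (45, 100), "area_B": (110, 100)}

for name, (dsf_count, census_count) in example_areas.items():
    print(name, dsf_to_census_ratio(dsf_count, census_count))
```

A ratio well below 1 suggests the DSF is missing housing units in that area; a ratio well above 1 suggests extra lines, which may include addresses geocoded into the wrong area.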

DSF Geography

• Geographic information on the DSF:
  • Address, city, county, state, zip, zip4, carrier route, walk sequence
• Geographic information not on the DSF:
  • Census block, census block group, census tract, latitude or longitude
• Geocoding
  • Appends latitude and longitude as well as census geography
  • Requires commercial software
  • PO Boxes and Rural Route addresses not easily geocoded
  • Potential for error
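One way to picture the "potential for error" bullet: once an address has a latitude and longitude, assigning census geography amounts to asking which polygon contains the point. The sketch below is a generic ray-casting point-in-polygon test, not the commercial geocoding software the slide refers to. The square "segment" and both coordinates are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order. A horizontal ray is
    cast to the right of the point; an odd number of edge crossings
    means the point is inside.
    """
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Does this edge straddle the horizontal line through the point?
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


# Hypothetical unit-square segment
segment = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

true_location = (1.02, 0.5)   # hypothetical: address actually just outside
geocoded_point = (0.98, 0.5)  # hypothetical: geocoder places it just inside

print(point_in_polygon(*true_location, segment))
print(point_in_polygon(*geocoded_point, segment))
```

In this toy case a small positional shift moves the address across the boundary, so it would appear on the segment's list even though it does not belong there.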

Geocoding Error

[Slides 6 to 11: images illustrating geocoding error; not reproduced in this transcript]

Research Questions

• What are the correlates of geocoding error?
  • Logistic Model
    – Urbanicity
    – Housing unit density
    – Vacancy rates
    – Drop delivery
    – Housing unit type (single family home, apartment)
    – Home ownership
    – Adjacent to water blocks
• Does geocoding error exhibit spatial clustering?
  • Moran’s I
  • Logistic Model
    – Autocovariate

Data and Methodology

• NORC National Frame Listing effort
  • Fall 2011
  • Out of 1,516 segments (census tracts or block groups), 126 segments needed enhancement
  • Device based listing
    – Latitude and longitude collected
    – Segment level address list
    – Real-time QC in central office
• Selected 21 enhanced segments for analysis
  • Geocapture worked for at least 90% of addresses
  • Mix of urban and rural
  • Range of DSF-to-Census ratios: 0.31 to 0.81
• 8,560 DSF lines

Geocoding error: over-coverage vs under-coverage

• Addresses added in the field (under-coverage): 4,859
• Confirmed DSF addresses (coverage): 7,504
• Unconfirmed DSF addresses (over-coverage): 1,056
• DSF = confirmed + unconfirmed DSF addresses = 8,560 lines
• Final enhanced list = confirmed DSF addresses + addresses added in the field

• 12.3% of DSF lines unconfirmed in field
• Difficult to separate causes of under-coverage
• Focus on over-coverage
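The 12.3% figure follows directly from the counts on this slide. A short check, using only the slide's numbers (the variable names are illustrative):

```python
# Counts from the slide
confirmed_dsf = 7504    # DSF addresses confirmed in the field (coverage)
unconfirmed_dsf = 1056  # DSF addresses not confirmed in the field (over-coverage)
added_in_field = 4859   # addresses added by listers (under-coverage)

# The DSF side of the diagram: every DSF line is either confirmed or not
dsf_total = confirmed_dsf + unconfirmed_dsf

# The final enhanced list: confirmed DSF lines plus field additions
final_enhanced_list = confirmed_dsf + added_in_field

over_coverage_rate = unconfirmed_dsf / dsf_total

print(dsf_total)                     # 8560
print(f"{over_coverage_rate:.1%}")   # 12.3%
```

The DSF total of 8,560 matches the number of DSF lines reported for the 21 analysis segments.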

Data and Methodology (cont’d)

• Sample drawn of 4,000 DSF lines provided for enhancement
• Dependent variable: flag if correctly geocoded into the segment
• Independent variables:
  • Address-level (DSF)
    – Drop point flag
    – Vacant flag
    – Record type indicator (High rise, rural, single family home)
  • Block-level (census)
    – DSF-to-Census ratio, four categories (<0.9, 0.9 to 1.25, 1.25 to 2, >2)
    – TEA Code Flag (Type of Enumeration Area)
    – Principal city flag
    – Water adjacency flag
    – Housing unit density
    – Area
    – Percent Multi-unit
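The four DSF-to-Census ratio categories can be coded with a small binning function. This is a sketch under assumptions: the slide does not say which category the exact boundary values 0.9, 1.25 and 2 fall into, so the choices below are arbitrary, and the function name is hypothetical. The numbering 1 to 4 follows the category labels used in Table 2.

```python
def ratio_category(ratio: float) -> int:
    """Map a block-level DSF-to-Census ratio to one of four categories.

    1: below 0.9
    2: 0.9 to 1.25
    3: above 1.25, up to 2
    4: above 2
    Boundary handling (which side 0.9, 1.25 and 2 land on) is an assumption.
    """
    if ratio < 0.9:
        return 1
    if ratio <= 1.25:
        return 2
    if ratio <= 2.0:
        return 3
    return 4


# Hypothetical block-level ratios
for r in (0.31, 1.0, 1.5, 2.5):
    print(r, ratio_category(r))
```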

Table 1: Logistic Model Results

Parameter                          Estimate (sign)   Significance
Intercept                          -                 ***
DSF-to-Census <0.9                 +                 ***
DSF-to-Census 1.25 to 2.0          +                 **
DSF-to-Census >2.0                 +                 ***
TEA1                               -                 ***
In Principal City Flag             -                 ***
HU Density (mean centered)         -                 ***
Drop delivery                      +                 ***
Vacant Flag                        +                 *
Record Type High-rise              -
Record Type Rural                  +                 ***
Pct Multi-Unit (mean centered)     -                 *
Area (mean centered)               +                 ***

Significance: * p<0.05, ** p<0.01, *** p<0.001

Parameter groups labeled on the slide: Ratio Categories, Urbanicity, Postal Characteristics, Geographical Considerations

Table 2: A closer look at impact of DSF-to-Census Ratio

Category   Parameter                    Odds Ratio   Significance
1          DSF-to-Census <0.9           2.25         ***
3          DSF-to-Census 1.25 to 2.0    2.37         **
4          DSF-to-Census >2.0           4.29         ***

• Addresses in category 1 census blocks have about the same odds of being geocoded incorrectly as addresses in category 3 blocks (odds ratios 2.25 and 2.37, both relative to category 2, DSF-to-Census 0.9 to 1.25)
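To read the odds ratios in Table 2: an odds ratio is the exponential of the corresponding logistic coefficient, and it multiplies the odds (not the probability) of the outcome relative to the omitted reference category. The sketch below uses the three odds ratios from the table; the 10% baseline probability is purely hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

# Odds ratios as reported in Table 2
odds_ratios = {
    "DSF-to-Census <0.9": 2.25,
    "DSF-to-Census 1.25 to 2.0": 2.37,
    "DSF-to-Census >2.0": 4.29,
}


def log_odds_coefficient(odds_ratio: float) -> float:
    """Logistic regression coefficient implied by an odds ratio."""
    return math.log(odds_ratio)


def shifted_probability(baseline_prob: float, odds_ratio: float) -> float:
    """Probability after multiplying the baseline odds by an odds ratio."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


baseline = 0.10  # hypothetical error probability in the reference category

for label, ratio in odds_ratios.items():
    coef = log_odds_coefficient(ratio)
    prob = shifted_probability(baseline, ratio)
    print(f"{label}: coefficient {coef:.3f}, probability {prob:.3f}")
```

With that hypothetical baseline, an odds ratio of 2.25 moves the probability from 0.10 to 0.20, which shows why odds ratios should not be read as simple multipliers of probability.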

Spatial Autocorrelation

• Does geocoding error exhibit spatial clustering?
  • Do blocks with geocoding error neighbor blocks with geocoding error?

y = β1x1 + β2x2 + … + βpWy + ε

• Where Wy is the weighted average of neighboring values, or ‘spatial lag’

Spatial Autocorrelation

Example Segment (block layout):

  1 2
  3 4 5

Example Weight Matrix W:

     1 2 3 4 5
  1  0 1 1 0 0
  2  1 0 0 1 1
  3  1 0 0 1 0
  4  0 1 1 0 1
  5  0 1 0 1 0


Moran’s I – Measure of Spatial Autocorrelation

Example Segment (block layout):

  1 2
  3 4 5

Example Weight Matrix W:

     1 2 3 4 5
  1  0 1 1 0 0
  2  1 0 0 1 1
  3  1 0 0 1 0
  4  0 1 1 0 1
  5  0 1 0 1 0

Spatial Autocorrelation

Example Segment (block layout):

  1 2
  3 4 5

Example variable of interest y:

  Block  Error
  1      1
  2      1
  3      0
  4      0
  5      1

Spatial Autocorrelation

Weight Matrix W * Geocoding Error y = Weighted average of neighbors Wy

  Block   W (columns 1 2 3 4 5)   y    Wy
  1       0 1 1 0 0               1    1
  2       1 0 0 1 1               1    2
  3       1 0 0 1 0               0    1
  4       0 1 1 0 1               0    2
  5       0 1 0 1 0               1    1
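The product W * y for the five-block example can be reproduced in a few lines. The sketch uses plain Python lists; the helper names are illustrative. With the 0/1 matrix shown on the slide, each entry of Wy is the number of neighboring blocks with an error. Dividing each row of W by its row sum (row standardization, a common convention that the slides do not explicitly state) turns that count into an average over neighbors.

```python
# Binary contiguity matrix from the example segment (1 = blocks are neighbors)
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0],
]

# Geocoding error flag for blocks 1 to 5
y = [1, 1, 0, 0, 1]


def spatial_lag(weights, values):
    """Matrix-vector product: one weighted sum of neighbor values per block."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def row_standardize(weights):
    """Divide each row by its sum so the weights in a row add up to 1."""
    return [[w / sum(row) for w in row] for row in weights]


Wy = spatial_lag(W, y)                        # neighbor counts: [1, 2, 1, 2, 1]
Wy_avg = spatial_lag(row_standardize(W), y)   # neighbor averages

print(Wy)
print(Wy_avg)
```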

Moran’s I and Spatial Autocorrelation Model

• Degree of linear association between observed values y and a weighted average of neighboring values Wy
• Observed: 0.0281
• Very significant (p < 0.0001)
• Positive, indicating possible spatial clustering
• Add Wy to final logistic model

y = β1x1 + β2x2 + … + βpWy + ε
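For reference, the usual global form of Moran's I is I = (n / S0) * Σi Σj wij (yi − ȳ)(yj − ȳ) / Σi (yi − ȳ)², where S0 is the sum of all weights. The sketch below applies that textbook formula to the five-block example from the earlier slides. It is not the authors' computation, and it does not reproduce the significance test.

```python
# Example weight matrix and error flags from the five-block segment
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0],
]
y = [1, 1, 0, 0, 1]


def morans_i(weights, values):
    """Global Moran's I for a square weight matrix and a value per unit."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                  # deviations from the mean
    s0 = sum(sum(row) for row in weights)           # sum of all weights
    cross = sum(
        weights[i][j] * z[i] * z[j]
        for i in range(n)
        for j in range(n)
    )
    variance_term = sum(d * d for d in z)
    return (n / s0) * (cross / variance_term)


I = morans_i(W, y)
print(round(I, 4))  # about -0.028 for this toy example
```

On this toy segment the statistic comes out slightly negative; the positive value of 0.0281 reported on the slide comes from the real analysis data, not from this example.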

Table 3: Logistic Regression with Spatial Autocovariate

Parameter                          Estimate (sign)   Significance
Intercept                          -                 ***
DSF-to-Census <0.9                 +                 **
DSF-to-Census 1.25 to 2.0          +                 **
DSF-to-Census >2.0                 +                 ***
TEA1                               -                 ***
In Principal City Flag             -                 ***
HU Density (mean centered)         -                 ***
Drop delivery                      +                 ***
Vacant Flag                        +                 *
Record Type High-rise              -
Record Type Rural                  +                 ***
Pct Multi-Unit (mean centered)     -                 *
Area (mean centered)               +                 ***
Autocovariate (W.y)                +                 *

Parameter groups labeled on the slide: Ratio Categories, Urbanicity, Postal Characteristics, Geographical Considerations

Map 1: Example of Clustering

Map 2: Example of Clustering

Discussion

• Urbanicity, postal characteristics, and block-level DSF-to-Census ratio are highly correlated with geocoding error
• Addresses in low DSF-to-Census ratio blocks have similar odds of geocoding error as addresses in high DSF-to-Census ratio blocks
• Geocoding error exhibits spatial clustering
  • Problematic blocks within a segment can be used as a potential flag for larger geocoding error
• Help with address frame decisions

Limitations

• Analysis was limited to segments that already have less than acceptable DSF coverage
• Possible that census characteristics and DSF flags behave differently above threshold
• Sample of 21 segments used in analysis not random
  • Limits the ability to generalize findings
• Definition of geocoding error limited to over-coverage error

Thank You!

Lee Fiorio [email protected]

