D. Jan van der Laan and Bart F.M. Bakker
Indicator for the Representativeness of
Linked Sources
Representativeness
representativeness sensitivity
Representativeness indicator
Representativeness indicator for linkage:
𝓁(X) = S(𝜌_X),
where S(𝜌_X) is the standard deviation of the linkage probabilities 𝜌_X given the covariates X.
When all records in the population have an equal probability of being linked (i.e.
S(𝜌_X) = 0), the linked data set is representative of the population.
Based on: B. Schouten, F. Cobben and J. Bethlehem (2009). “Indicators for the
representativeness of survey response”, Survey Methodology, 101-113.
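As a rough illustration of the idea, the sketch below computes the spread of linkage probabilities over covariate cells. It assumes that the linkage probability in each cell is estimated by the observed link rate of that cell and that the variance uses divisor N; the function name `linkage_sd` and the counts are hypothetical and not taken from the slides.

```python
from math import sqrt

def linkage_sd(strata):
    """Return (overall link rate, S) for a list of (n_records, n_linked) cells.

    S is the record-weighted standard deviation of the per-cell link rates.
    Assumption: divisor N (not N - 1) and cell link rates as propensity estimates.
    """
    total = sum(n for n, _ in strata)
    mean_rho = sum(linked for _, linked in strata) / total
    var = sum(n * (linked / n - mean_rho) ** 2 for n, linked in strata) / total
    return mean_rho, sqrt(var)

# Hypothetical cells: every cell links 90% of its records, so S is zero.
equal_cells = [(1000, 900), (500, 450), (2000, 1800)]
# Hypothetical cells: one cell links far fewer records, so S is positive.
unequal_cells = [(1000, 950), (500, 300), (2000, 1900)]

mean_equal, s_equal = linkage_sd(equal_cells)
mean_unequal, s_unequal = linkage_sd(unequal_cells)
print(f"equal cells:   link rate {mean_equal:.3f}, S = {s_equal:.3f}")
print(f"unequal cells: link rate {mean_unequal:.3f}, S = {s_unequal:.3f}")
```

With equal link rates in all cells the spread vanishes, which matches the statement that the linked data set is then representative with respect to X.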
Partial representativeness indicator
Measures the
- contribution of a single variable
- contribution of a category of a single variable
Two variants:
- unconditional partial indicator
- conditional partial indicator
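The slides do not state the formulas for the partial indicators. The sketch below uses a definition that is common in the R-indicator literature and should be read as an assumption: the unconditional indicator is the between-category standard deviation of the linkage probabilities for one variable, and the conditional indicator is the variation that remains within cells formed by all other variables. Function names and the example records are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def _group_means(records, key):
    """Mean propensity and size per group; records are (covariates, rho) pairs."""
    sums = defaultdict(lambda: [0.0, 0])
    for covs, rho in records:
        cell = sums[key(covs)]
        cell[0] += rho
        cell[1] += 1
    return {k: (s / n, n) for k, (s, n) in sums.items()}

def unconditional_partial(records, var):
    """Between-category spread of rho for `var`, plus per-category contributions."""
    n_total = len(records)
    overall = sum(rho for _, rho in records) / n_total
    means = _group_means(records, lambda c: c[var])
    # Signed contribution per category: negative means underrepresented.
    per_category = {k: sqrt(n / n_total) * (m - overall)
                    for k, (m, n) in means.items()}
    variable_level = sqrt(sum(v * v for v in per_category.values()))
    return variable_level, per_category

def conditional_partial(records, var):
    """Spread of rho left within cells defined by all covariates except `var`."""
    def others(covs):
        return tuple(sorted((k, v) for k, v in covs.items() if k != var))
    means = _group_means(records, others)
    ss = sum((rho - means[others(c)][0]) ** 2 for c, rho in records)
    return sqrt(ss / len(records))

# Hypothetical propensities that depend on region only, not on sex.
records = [
    ({"sex": "m", "region": "north"}, 0.9),
    ({"sex": "f", "region": "north"}, 0.9),
    ({"sex": "m", "region": "south"}, 0.5),
    ({"sex": "f", "region": "south"}, 0.5),
]
for var in ("sex", "region"):
    level, cats = unconditional_partial(records, var)
    print(var, round(level, 3), round(conditional_partial(records, var), 3), cats)
```

In this toy example only region contributes, and the negative contribution of the "south" category flags it as underrepresented.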
6
Example I: employment register
- Target population: employed foreign residents, with the exception of
residents with a Belgian or German address.
- Add variables to the Employment Register (ER) by linking it to the Population Register
[Figure: persons with their addresses in the Population Register; jobs with their addresses in the Employment Register]
Example I: linkage results
Data sources and deterministic linkage
                             Population Register   Employment Register
Number of records            14,336,000            12,859,000
Deterministically linked                           12,302,000
Foreign address                                       361,000
To probabilistic linkage                              196,000
Probabilistically linked                                4,000
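A quick arithmetic check of the counts above can be sketched as follows. It assumes that the breakdown (deterministically linked, foreign address, to probabilistic linkage) refers to the Employment Register records, which is inferred from the fact that the three counts add up to its total; the variable names are illustrative.

```python
# Counts taken from the linkage results above.
er_total = 12_859_000
det_linked = 12_302_000
foreign_address = 361_000
to_probabilistic = 196_000
prob_linked = 4_000

# The three groups partition the Employment Register records.
assert det_linked + foreign_address + to_probabilistic == er_total

det_rate = det_linked / er_total                      # share linked deterministically
prob_rate = prob_linked / to_probabilistic            # success rate of the probabilistic step
final_rate = (det_linked + prob_linked) / er_total    # share linked after both steps

print(f"deterministic link rate: {det_rate:.4f}")
print(f"probabilistic link rate: {prob_rate:.4f}")
print(f"overall link rate:       {final_rate:.4f}")
```

The probabilistic step adds only about 2% of the records sent to it, so the overall link rate barely moves.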
Representativeness indicator:
Deterministic linkage: 𝓁 = 0.295
Probabilistic linkage: 𝓁 = 0.294
Example II: Twin Register
National Twin Register (NTR): panel of twins
Health Insurance Database (HID): health insurance claims of one
company (covers ca. 25% of the Dutch population)
[Figure: from the population to the linked data]
- Population
- NTR0: complete NTR
- NTR1: used in linkage (records lost because no permission was given for linkage)
- NTR2: linked to HID (records lost because they could not be linked to the HID)
The NTR is a panel in which twins participate voluntarily.
NTR used in linkage (NTR1) compared to population: 𝓁 = 0.61
Linkage result (NTR2) compared to population: 𝓁 = 0.67
[Figure: unconditional and conditional partial indicators for the NTR used in linkage (NTR1) compared to population]
Conclusion
Representativeness indicator for linkage
- Under certain conditions, gives an upper bound on the relative bias
- Results depend on set of covariates used
Applications
- Insight into which subpopulations are underrepresented
- Direct further efforts in linkage
- Comparison of linkage algorithms
- Monitoring