Linking Large Datasets: Why, How, and What Not To Do

Bradley G Hammill

Duke Clinical Research Institute

Presenter disclosure information

Bradley G Hammill

Linking Large Datasets: Why, How, and What Not To Do

FINANCIAL DISCLOSURE: None

UNLABELED/UNAPPROVED USES DISCLOSURE: None

Acknowledgements

Thanks to:

Lesley Curtis

Adrian Hernandez

Gregg Fonarow

Kevin Schulman

Work initially funded by a grant from GSK

Why link Medicare data to registry data?

Typical inpatient registry

Medications

Vitals

Lab results

Procedures

Clinical history

In-hospital events

etc.

Long-term follow-up?

Why link Medicare data to registry data?

Potential endpoints

Mortality

Readmission

Procedure

Adverse events (based on diagnoses)

Medicare data: Inpatient; Mortality (or censoring)

Why not link Medicare data to registry data?

Linking will not help us address the limitations of either data source

Medicare

No information on VA hospitals or managed care patients

Selective coverage under age 65

Registries

Voluntary participation

May over-represent certain regions or hospital types

Data quality varies

How to link Medicare data with registry data

Direct identifiers

Name, address, SSN, date of birth, etc.

Goal: Identify each registry patient in the Medicare data

Indirect identifiers

Service dates, date of birth (or age), sex

Goal: Identify each registry hospitalization in the Medicare data

Linking registry data to Medicare claims

Step 1. Subset registry data

Step 2. Subset Medicare data

Step 3. Link hospital identifiers

Step 4. Link hospitalization records

Described in:

Hammill BG, Hernandez AF, Peterson ED, Fonarow GC, Schulman KA, Curtis LH. Linking inpatient clinical registry data to Medicare claims data using indirect identifiers. Am Heart J 2009 June;157(6):995-1000.

You will have this conversation [Episode 1]

Me: You know, we can link these data to Medicare.

Adrian: How? We don’t know who the hospitals or the patients are.

Me: Turns out you don’t really need to know those things.

[Brief explanation of how to link]

Adrian: (flustered) This feels like a giant leap of faith.

[Figure series: Percent of unique records within sites, 2007 Medicare HF records, for successive combinations of variables: Admit; Admit, Discharge; Admit, Discharge, DOB; Admit, DOB; Admit, Discharge, 2/3 DOB; Admit, Discharge, Age; Admit, Discharge, Age, Sex; Admit, Age, Sex; Admit (±1 day), Discharge, Age, Sex; Admit, Discharge, Age (±1 year), Sex]

Distinguishing records (DOB available)

Within sites, what percent of 2007 Medicare HF records are unique given…

Variables                          Unique
Admit, Discharge, DOB, Sex         >99.9%
Admit, DOB, Sex                    >99.9%
Discharge, DOB, Sex                >99.9%
Admit, Discharge, 2/3 DOB, Sex     99.9%
Admit, Discharge, DOB              >99.9%

Distinguishing records (Age available)

Within sites, what percent of 2007 Medicare HF records are unique given…

Variables                                Unique
Admit, Discharge, Age, Sex               99.4%
Admit, Discharge (±1 day), Age, Sex      98.5%
Admit (±1 day), Discharge, Age, Sex      98.4%
Admit, Discharge, Age (±1 year), Sex     98.3%
Admit, Discharge, Age                    98.9%

Distinguishing records, in general

Population                 2007 HF records per site, median (Q1, Q3)
All records                456 (194, 1734)
Heart failure, any         89 (22, 391)
Heart failure, primary     64 (20, 168)
CABG procedure             71 (36, 124)
ICD / CRT procedure        19 (6, 50)

Fewer records per site = higher % unique records
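The within-site uniqueness percentages in the tables above come down to counting how often each combination of values occurs inside a site. A minimal Python sketch, assuming each record is a dict with hypothetical keys (`site`, `admit`, `discharge`, `dob`, `sex`) rather than actual Medicare variable names:

```python
from collections import Counter
from datetime import date


def percent_unique_within_sites(records, fields):
    """Percent of records whose values on `fields` occur exactly once within their site.

    Field names are hypothetical placeholders, not actual Medicare variable names.
    """
    if not records:
        return 0.0
    # Prefix each key with the site so duplicates are only counted within a site.
    keys = [(r["site"],) + tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return 100.0 * unique / len(keys)


# Toy data: two site-A stays share admit/discharge dates but differ in DOB and sex;
# the site-B stay duplicates the first record's values but sits at another site.
records = [
    {"site": "A", "admit": date(2007, 1, 3), "discharge": date(2007, 1, 8), "dob": date(1930, 5, 2), "sex": "F"},
    {"site": "A", "admit": date(2007, 1, 3), "discharge": date(2007, 1, 8), "dob": date(1928, 9, 14), "sex": "M"},
    {"site": "A", "admit": date(2007, 2, 1), "discharge": date(2007, 2, 4), "dob": date(1935, 1, 1), "sex": "F"},
    {"site": "B", "admit": date(2007, 1, 3), "discharge": date(2007, 1, 8), "dob": date(1930, 5, 2), "sex": "F"},
]

print(percent_unique_within_sites(records, ["admit", "discharge"]))         # 50.0
print(percent_unique_within_sites(records, ["admit", "discharge", "dob"]))  # 100.0
```

Because keys are compared only within a site, smaller sites leave fewer chances for two stays to collide. Tolerance windows such as ±1 day cannot be expressed as exact keys and would need a pairwise comparison within each site.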

Linking registry data to Medicare claims

Step 1. Subset registry data

Limit to records for patients 65 years or older

Example registry data to be used for linking

OPTIMIZE-HF population

Adults hospitalized for episodes of new or worsening heart failure

2003–2004

52,879 records from 255 sites overall

39,178 records for patients 65+ (74% of total)
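Step 1 is a simple filter. The sketch below assumes hypothetical field names (`age`, `dob`, `admit`) and that age is taken at admission, using a recorded age when the registry has one and deriving it from date of birth otherwise:

```python
from datetime import date


def age_on(dob, day):
    """Age in completed years on `day`."""
    return day.year - dob.year - ((day.month, day.day) < (dob.month, dob.day))


def subset_registry(records, min_age=65):
    """Keep registry records for patients aged `min_age` or older at admission.

    Uses a recorded `age` when present, otherwise derives it from `dob` and `admit`
    (hypothetical field names).
    """
    kept = []
    for r in records:
        age = r.get("age")
        if age is None and r.get("dob") is not None:
            age = age_on(r["dob"], r["admit"])
        if age is not None and age >= min_age:
            kept.append(r)
    return kept


registry = [
    {"id": 1, "admit": date(2003, 6, 14), "dob": date(1938, 6, 15)},  # 64 on admission
    {"id": 2, "admit": date(2003, 6, 15), "dob": date(1938, 6, 15)},  # 65 on admission
    {"id": 3, "admit": date(2004, 1, 2), "age": 80},
]
print([r["id"] for r in subset_registry(registry)])  # [2, 3]

# Slide figures: 39,178 of 52,879 OPTIMIZE-HF records are for patients 65+.
print(round(100 * 39178 / 52879))  # 74
```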

Linking registry data to Medicare claims

Step 2. Subset Medicare data

Limit to records for patients 65 years or older

Limit using entry criteria similar to the registry's, if possible

Example Medicare data to be used for linking

Medicare inpatient population

Hospitalizations with a diagnosis of HF in any position (ICD-9-CM Dx 428.x, 402.x1, 404.x1, 404.x3)

2003–2004

Age 65+

5.5 million records from more than 5,000 sites overall
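The Medicare-side entry criteria above translate into a code-pattern check plus date and age limits. A minimal sketch, assuming each claim is a dict with hypothetical keys `admit`, `age`, and `dx` (a list of diagnosis code strings), and that codes may be stored with or without the decimal point:

```python
import re
from datetime import date

# ICD-9-CM HF codes from the slide: 428.x, 402.x1, 404.x1, 404.x3 (decimal removed).
HF_DX = re.compile(r"^(?:428\d{0,2}|402\d1|404\d[13])$")


def is_hf_code(code):
    """True if an ICD-9-CM diagnosis code (with or without '.') is one of the HF codes."""
    return bool(HF_DX.match(code.replace(".", "").strip()))


def subset_medicare(claims, first_year=2003, last_year=2004, min_age=65):
    """Keep claims in the study window, age 65+, with HF in any diagnosis position."""
    return [
        c for c in claims
        if first_year <= c["admit"].year <= last_year
        and c["age"] >= min_age
        and any(is_hf_code(dx) for dx in c["dx"])
    ]


claims = [
    {"id": "m1", "admit": date(2003, 2, 1), "age": 77, "dx": ["41071", "4280"]},  # HF in secondary position
    {"id": "m2", "admit": date(2004, 9, 9), "age": 70, "dx": ["40291"]},          # 402.x1: with HF
    {"id": "m3", "admit": date(2004, 3, 3), "age": 81, "dx": ["40290"]},          # 402.x0: without HF
    {"id": "m4", "admit": date(2005, 1, 5), "age": 75, "dx": ["4280"]},           # outside 2003-2004
    {"id": "m5", "admit": date(2003, 8, 8), "age": 60, "dx": ["4280"]},           # under 65
]
print([c["id"] for c in subset_medicare(claims)])  # ['m1', 'm2']
```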

Linking registry data to Medicare claims

Step 3. Link hospital identifiers

Link records on exact values of all fields (service dates, date of birth, sex)

Use the resulting matches to determine which Medicare hospital corresponds to each registry site
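Step 3 works because exact matches on every field pile up on the true hospital while chance matches stay scattered. A minimal sketch, assuming records are dicts with hypothetical keys (`site`, `admit`, `discharge`, `dob`, `sex`): it counts exact matches for every registry site / Medicare site pair and keeps the Medicare sites above a hypothetical threshold `min_matches`. The talk does not state the exact decision rule used.

```python
from collections import Counter, defaultdict

FIELDS = ("admit", "discharge", "dob", "sex")  # hypothetical field names


def exact_match_counts(registry, medicare, fields=FIELDS):
    """For each registry site, count exact matches on `fields` against each Medicare site."""
    index = defaultdict(list)  # field values -> Medicare sites with a record having those values
    for m in medicare:
        index[tuple(m[f] for f in fields)].append(m["site"])
    counts = defaultdict(Counter)
    for r in registry:
        for msite in index.get(tuple(r[f] for f in fields), []):
            counts[r["site"]][msite] += 1
    return counts


def candidate_sites(counts, min_matches=10):
    """Hypothetical rule: keep every Medicare site with at least `min_matches` exact matches.

    More than one Medicare site may qualify for a registry site; sites with no
    qualifying match are left unlinked.
    """
    return {
        rsite: [msite for msite, n in c.most_common() if n >= min_matches]
        for rsite, c in counts.items()
        if any(n >= min_matches for n in c.values())
    }


# Counts taken from the DOB columns of the sample-site results for sites 1 and 3.
slide_counts = {
    "1": Counter({"A": 105, "E": 1, "F": 1}),
    "3": Counter({"C": 29, "D": 25, "H": 1, "I": 1}),
}
print(candidate_sites(slide_counts))  # {'1': ['A'], '3': ['C', 'D']}
```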

OPTIMIZE-HF sample site link results

                  Using DOB                      Using Age
OPTIMIZE site     Medicare site   Exact matches  Medicare site   Exact matches
1                 A               105            A               114
                  E               1              K               7
                  F               1              L               6
                                                 1217 others     5
2                 B               589            B               631
                  G               2              M               28
                  40 others       1              N               28
                                                 3420 others     26
3                 C               29             C               32
                  D               25             D               27
                  H               1              O               4
                  I               1              938 others      3
4                 --              --             P               4
                                                 Q               4
                                                 541 others      3

OPTIMIZE-HF site link results

Of 255 registry sites…

247 (97%) identified in Medicare

All non-VA sites with 25+ records identified

Linking registry data to Medicare claims

Step 4. Link hospitalization records

Determine rules to apply

Decide if one-to-one correspondence is needed

Go!

Get follow-up data from Medicare
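Step 4 can be run as an ordered series of deterministic passes. The sketch below is one possible implementation, not the published one. It assumes step 3 has already set each record's `site` to the linked Medicare site, tries the variable combinations in the order listed in the OPTIMIZE-HF hospitalization link results, and accepts a link only when exactly one registry record and one Medicare record share the key (the one-to-one option). Reading "2/3 DOB" as "any two of the three DOB components agree" is an interpretation, and all field names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Passes tried in order; dob_md / dob_yd / dob_ym together cover "2/3 DOB".
PASSES = [
    ("admit", "discharge", "dob", "sex"),
    ("admit", "dob", "sex"),
    ("discharge", "dob", "sex"),
    ("admit", "discharge", "dob_md", "sex"),
    ("admit", "discharge", "dob_yd", "sex"),
    ("admit", "discharge", "dob_ym", "sex"),
    ("admit", "discharge", "dob"),
]


def value(rec, name):
    """Field value, or a pair of DOB components for the derived names."""
    d = rec["dob"]
    if name == "dob_md":
        return (d.month, d.day)
    if name == "dob_yd":
        return (d.year, d.day)
    if name == "dob_ym":
        return (d.year, d.month)
    return rec[name]


def link_hospitalizations(registry, medicare, passes=PASSES):
    """Return {registry id: (medicare id, pass index)} using one-to-one deterministic passes."""
    links, used = {}, set()
    for p, fields in enumerate(passes):
        reg, med = defaultdict(list), defaultdict(list)
        for r in registry:
            if r["id"] not in links:
                reg[(r["site"],) + tuple(value(r, f) for f in fields)].append(r["id"])
        for m in medicare:
            if m["id"] not in used:
                med[(m["site"],) + tuple(value(m, f) for f in fields)].append(m["id"])
        for key, r_ids in reg.items():
            m_ids = med.get(key, [])
            # Accept only unambiguous pairs: exactly one record on each side.
            if len(r_ids) == 1 and len(m_ids) == 1:
                links[r_ids[0]] = (m_ids[0], p)
                used.add(m_ids[0])
    return links


def rec(id_, admit, discharge, dob, sex, site="A"):
    return {"id": id_, "site": site, "admit": admit, "discharge": discharge, "dob": dob, "sex": sex}


registry = [
    rec("r1", date(2003, 3, 1), date(2003, 3, 5), date(1930, 1, 15), "F"),
    rec("r2", date(2003, 4, 10), date(2003, 4, 15), date(1925, 7, 4), "M"),   # discharge differs
    rec("r3", date(2003, 5, 20), date(2003, 5, 25), date(1932, 11, 18), "F"), # DOB day differs
    rec("r4", date(2003, 6, 1), date(2003, 6, 3), date(1928, 2, 2), "M"),     # r4 and r5 are
    rec("r5", date(2003, 6, 1), date(2003, 6, 3), date(1928, 2, 2), "M"),     # indistinguishable
]
medicare = [
    rec("m1", date(2003, 3, 1), date(2003, 3, 5), date(1930, 1, 15), "F"),
    rec("m2", date(2003, 4, 10), date(2003, 4, 14), date(1925, 7, 4), "M"),
    rec("m3", date(2003, 5, 20), date(2003, 5, 25), date(1932, 11, 8), "F"),
    rec("m4", date(2003, 6, 1), date(2003, 6, 3), date(1928, 2, 2), "M"),
]
print(link_hospitalizations(registry, medicare))
# {'r1': ('m1', 0), 'r2': ('m2', 1), 'r3': ('m3', 5)}
```

Keeping the pass index makes it easy to tabulate how many records each combination identified. Without the one-to-one condition, r4 and r5 would both be allowed to link to m4.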

OPTIMIZE-HF hospitalization link results

Of 39,178 eligible registry hospitalizations…

31,753 (81%) identified in Medicare

25,964 unique patients

Combinations used                  Records identified
Admit, Discharge, DOB, Sex         24,750 (86%)
Admit, DOB, Sex                    1,171 (4%)
Discharge, DOB, Sex                590 (2%)
Admit, Discharge, 2/3 DOB, Sex     2,258 (7%)
Admit, Discharge, DOB              284 (1%)

You will have this conversation [Episode 2]

Me: This is done using deterministic matching.

Adrian: No, that’s clearly probabilistic matching.

Me: Actually, it’s not.

Adrian: Sure it is. We didn’t have names or SSNs.

Deterministic v. Probabilistic Linking

Deterministic

Rule-based

The rule determines the result

Probabilistic

Based on statistical theory

Characteristics are assigned weights, and potential links are scored

A data-based score threshold determines the result
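The contrast above can be made concrete. The sketch below pairs a rule-based check with a score in the style of the Fellegi–Sunter model, the standard framework for probabilistic linkage. The m and u probabilities and the threshold are hypothetical values chosen for illustration; in practice they are estimated from the data.

```python
import math


def deterministic_link(a, b, fields):
    """Rule-based: the pair links only if every listed field agrees exactly."""
    return all(a[f] == b[f] for f in fields)


def fs_weight(a, b, m, u):
    """Fellegi-Sunter style score: sum of log2 agreement/disagreement weights.

    m[f] = P(field agrees | true match); u[f] = P(field agrees | non-match).
    """
    total = 0.0
    for f in m:
        if a[f] == b[f]:
            total += math.log2(m[f] / u[f])
        else:
            total += math.log2((1 - m[f]) / (1 - u[f]))
    return total


# Hypothetical m/u probabilities (in practice estimated from the data).
m = {"admit": 0.97, "dob": 0.95, "sex": 0.99}
u = {"admit": 0.01, "dob": 0.001, "sex": 0.5}

a = {"admit": "2003-03-01", "dob": "1930-01-15", "sex": "F"}
b = {"admit": "2003-03-01", "dob": "1930-01-16", "sex": "F"}  # DOB off by one day

print(deterministic_link(a, b, ["admit", "dob", "sex"]))  # False
threshold = 3.0  # hypothetical cut-off chosen from the score distribution
print(fs_weight(a, b, m, u) > threshold)  # True (score is about 3.27)
```

What makes a method deterministic is that a fixed rule decides the link, not which identifiers are available. Rules built only on indirect identifiers are still deterministic.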

You will have this conversation [Episode 3]

Me: (excited) We were able to link 75% of the eligible records!

Adrian: Golly, that seems low.

Me: It’s about what I expected.

Adrian: But [another registry] said they linked 98%.

Why might registry records not link to Medicare?

[Figure: Sample site, all HF patients, linked vs. not linked to Medicare]

Why might registry records not link to Medicare?

In Medicare claims, but…

Inconsistent coding of procedures or diagnoses

Inconsistent service dates or patient info

Not in Medicare claims due to…

Medicare as secondary payer

Medicare managed care enrollment

Age

VA hospital (site-level)

[Figure: Histogram of OPTIMIZE-HF site link rates]
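A histogram like this one starts from a per-site link rate, which may help flag sites affected by site-level reasons such as VA status. A minimal sketch, assuming hypothetical inputs: a dict mapping each site to its eligible registry record IDs, and the set of IDs that were linked.

```python
from collections import Counter


def site_link_rates(eligible_by_site, linked_ids):
    """Percent of each site's eligible registry records that were linked."""
    return {
        site: 100.0 * sum(1 for rid in ids if rid in linked_ids) / len(ids)
        for site, ids in eligible_by_site.items()
        if ids
    }


def histogram(rates, width=10):
    """Count sites per link-rate bin of `width` points; a 100% rate goes in the top bin."""
    bins = Counter(min(int(r // width) * width, 100 - width) for r in rates.values())
    return dict(sorted(bins.items()))


eligible = {"s1": ["a", "b", "c", "d"], "s2": ["e", "f"], "s3": ["g", "h", "i", "j", "k"]}
linked = {"a", "b", "c", "e", "f", "g"}

rates = site_link_rates(eligible, linked)
print(rates)             # {'s1': 75.0, 's2': 100.0, 's3': 20.0}
print(histogram(rates))  # {20: 1, 70: 1, 90: 1}
```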

You will have this conversation [Episode 4]

Adrian: The registry didn’t capture [obesity, anemia, etc.]. Now we can use prior claims to get that information.

Me: We’re going to lose a bunch of patients if we try that.

Adrian: But it’s so worth it.

Me: Maybe not for that particular information, though.

Other uses of Medicare data

Utilizing claims prior to registry hospitalization

Requires prior enrollment in Medicare FFS

8% of OPTIMIZE-HF patients did not have 12 months of prior claims

Inpatient data alone can be limiting

Need to understand coding limitations

e.g., anemia is poorly coded
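The enrollment requirement can be checked month by month. This is a sketch under two assumptions: FFS enrollment is available as a hypothetical set of `(year, month)` pairs, and the lookback is defined as the 12 full calendar months before the admission month.

```python
from datetime import date


def prior_months(admit, n=12):
    """The n calendar months before the admission month, most recent first."""
    y, m = admit.year, admit.month
    months = []
    for _ in range(n):
        y, m = (y - 1, 12) if m == 1 else (y, m - 1)
        months.append((y, m))
    return months


def has_ffs_lookback(admit, ffs_months, n=12):
    """True if every one of the n prior months appears in `ffs_months`.

    `ffs_months` is a hypothetical set of (year, month) tuples built from enrollment data.
    """
    return all(ym in ffs_months for ym in prior_months(admit, n))


admit = date(2003, 3, 10)
ffs = {(2002, mm) for mm in range(3, 13)} | {(2003, 1), (2003, 2)}

print(prior_months(admit)[:3])                     # [(2003, 2), (2003, 1), (2002, 12)]
print(has_ffs_lookback(admit, ffs))                # True
print(has_ffs_lookback(admit, ffs - {(2002, 7)}))  # False
```

Patients who fail this check have no usable claims history, so any covariate built from prior claims is unavailable for them.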

You will have this conversation [Episode 5]

Adrian: I want to validate our registry with these links.

Me: You can’t easily do that with these data.

Adrian: Sure we can, because now we know which Medicare patients are in the registry.

Me: True, but that’s not the whole story.

Validation issues

If you start with the registry population…

You usually do not know exactly who you should find in Medicare claims data

Cannot validate VA sites

Cannot validate managed care patients

Cannot validate younger patients

Assumes all “linkable” records were linked

Validation issues

If you start with the Medicare population…

You usually do not know exactly who you should find in registry data

Physician groups may be the registry participants, not hospitals

Assumes all “linkable” records were linked

Registry may have allowed sampling at larger sites

Do you want to link data with Medicare?

Important caveats

Acquisition requires major investment in claims data and infrastructure

Use of Medicare claims data is governed by strict data use agreements (DUAs)

Delays in data release are common

[Currently available through 2008]

Why stop at inpatient Medicare data?

Medicare data

Inpatient

Outpatient / Physician

Pharmacy

Mortality (or censoring)

Why stop with Medicare claims data?

Other claims data sources exist

Private insurer databases

But more difficult, as a smaller % of site hospitalizations is covered

Payer                              Age 18-64   Age 65+
Medicare                           15%         89%
Medicaid                           20%         1%
Private                            48%         8%
Other (incl. self-pay, charity)    17%         2%

[Source: 2007 HCUP NIS, excluding maternal/neonate-related admissions]

Conclusion

You can link your registry to Medicare claims

Get long-term follow-up for registry patients 65+ enrolled in fee-for-service Medicare

However…

Manage expectations

Understand claims data limitations

Contact Information

Brad Hammill
