Date posted: 06-May-2015
Category: Health & Medicine
Uploaded by: kelemam
Risk-based De-identification
Khaled El Emam, CHEO RI & uOttawa
Electronic Health Information Laboratory, CHEO Research Institute, 401 Smyth Road, Ottawa K1H 8L1, Ontario; www.ehealthinformation.ca
Background
• Re-identification risk assessment, re-identification attacks, and de-identification for:
  – Birth registry / newborn screening program
  – Tumor bank
  – Hospital data (discharge abstracts and pharmacy databases) at the local, provincial/state, and national levels
  – EMR data
Issues
• De-identification works well in practice if you adopt a risk-based approach
• Re-identification attacks are hard
• It is possible to de-identify data sets and still retain sufficient utility
• De-identification can be made simple
Re-identification Risk Spectrum
Managing Re-identification Risk
[Figure: diagram relating Risk Exposure, Amount of De-identification, Mitigating Controls, Motives & Capacity, and Invasion-of-Privacy]
Determining the Probability of Re-identification Attempts
Determining Risk Threshold to Use
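The slides on managing risk exposure, the probability of re-identification attempts, and the risk threshold can be tied together with a simple model. The sketch below assumes (this is not stated on the slides) that overall risk is the product Pr(re-identification | attempt) × Pr(attempt): the first factor comes from the data, the second reflects mitigating controls and the recipient's motives and capacity. The function names and all numbers in the usage lines are hypothetical.

```python
# Minimal sketch (assumption, not the deck's stated method): overall risk is
# modelled as Pr(re-identification | attempt) * Pr(attempt).


def overall_risk(p_reid_given_attempt: float, p_attempt: float) -> float:
    """Combine the data-driven and context-driven probabilities."""
    for name, p in (("p_reid_given_attempt", p_reid_given_attempt),
                    ("p_attempt", p_attempt)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {p}")
    return p_reid_given_attempt * p_attempt


def meets_threshold(p_reid_given_attempt: float, p_attempt: float,
                    threshold: float) -> bool:
    """True when the combined risk does not exceed the chosen threshold."""
    return overall_risk(p_reid_given_attempt, p_attempt) <= threshold


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    data_risk = 1 / 5  # e.g. the smallest equivalence class has 5 records
    # Recipient with strong mitigating controls: an attempt is assumed unlikely.
    print(meets_threshold(data_risk, p_attempt=0.3, threshold=0.1))  # True
    # Public release: assume an attempt is certain.
    print(meets_threshold(data_risk, p_attempt=1.0, threshold=0.1))  # False
```

Under this model, stronger mitigating controls lower Pr(attempt), so the same data can pass a given threshold with less de-identification, in line with the idea that the amount of de-identification should be proportionate to the actual risk.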
Tradeoffs Made
• Adjust threshold
• Adjust amount of suppression that is acceptable
• Adjust precision of variables
• Sub-sample
• Adjust variable weights
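Two of the levers listed above, reducing the precision of a variable and suppressing records, can be illustrated with a minimal sketch. It assumes a common risk measure that the slides do not spell out: each record's risk is 1 divided by the number of records sharing its quasi-identifier values (its equivalence class). The field names, the 10-year age band, and the toy records are hypothetical.

```python
# Minimal sketch (assumption): a record's risk is 1 / (size of its
# equivalence class on the quasi-identifiers). Shows two tradeoff levers:
# reducing the precision of a variable and suppressing risky records.
from collections import Counter


def class_key(record, quasi_ids):
    """Tuple of quasi-identifier values that defines the equivalence class."""
    return tuple(record[q] for q in quasi_ids)


def max_risk(records, quasi_ids):
    """Worst-case risk: 1 / size of the smallest equivalence class."""
    sizes = Counter(class_key(r, quasi_ids) for r in records)
    return 1 / min(sizes.values()) if sizes else 0.0


def generalize_age(records, band=10):
    """Reduce precision: replace exact age with the start of its age band."""
    return [{**r, "age": (r["age"] // band) * band} for r in records]


def suppress(records, quasi_ids, threshold):
    """Drop records whose class risk exceeds threshold; return (kept, n_dropped)."""
    sizes = Counter(class_key(r, quasi_ids) for r in records)
    kept = [r for r in records if 1 / sizes[class_key(r, quasi_ids)] <= threshold]
    return kept, len(records) - len(kept)


if __name__ == "__main__":
    # Hypothetical toy records; field names are illustrative only.
    records = [
        {"age": 34, "sex": "F"}, {"age": 36, "sex": "F"}, {"age": 38, "sex": "F"},
        {"age": 41, "sex": "M"}, {"age": 45, "sex": "M"}, {"age": 47, "sex": "M"},
        {"age": 72, "sex": "F"},
    ]
    qids = ("age", "sex")
    print(max_risk(records, qids))        # 1.0: every record is unique
    banded = generalize_age(records)
    print(max_risk(banded, qids))         # 1.0: the 70-79 female is still unique
    kept, dropped = suppress(banded, qids, threshold=0.5)
    print(dropped, max_risk(kept, qids))  # 1 suppressed; remaining risk is 1/3
    kept, dropped = suppress(banded, qids, threshold=0.2)
    print(dropped)                        # 7: the stricter threshold suppresses all
```

With the stricter threshold every record is suppressed, so in practice another lever from the list would be pulled instead (coarser bands, a higher threshold backed by stronger access controls, or different variable choices).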
Advantages
• Research ethics review is significantly faster for “secondary use” protocols that are certified as low risk
• Provides an incentive for data recipients to improve their security and privacy practices
• Provides an incentive for funders to cover the costs of infrastructure for handling data
• Amount of de-identification is proportionate to the actual risk
Risk Assessment
De-identification
Risk Assessment for the Research Ethics Board (REB)
Example – Tumor Bank
• ‘Rogue researcher’ adversary
• Search queries considered high risk
• Combination of sub-sampling and generalization for each tumor site's data
• Moving towards researcher self-assessments to decide the appropriate level of de-identification
Example – Discharge Abstracts
• ‘Nosey neighbor’ adversary
• Creation of a public data file
• Diagnosis and intervention codes presented difficulties
• High level of suppression for a public file, but acceptable utility with stronger access controls (higher threshold)
Practical Considerations
• An audit program is required to ensure compliance with ‘mitigating controls’
• What if a breach happens?
  – A risk management approach ensures that the data is highly de-identified in situations where breaches are most likely
  – Can demonstrate due diligence
Lessons Learned
• Geospatial and longitudinal data always present challenges because they increase the risk of re-identification
• Thus far we have never had to decline a data request because of identifiability, nor have we been unable to provide data with sufficient utility for a study
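Why extra detail such as geography raises risk can be seen in a small sketch. It assumes the same illustrative measure as before (worst-case risk = 1 / size of the smallest equivalence class) and uses a hypothetical "region" field as a stand-in for geospatial data; adding any quasi-identifier can only split existing classes, so this worst-case risk never decreases. The records are invented for illustration.

```python
# Minimal sketch (assumption): adding a quasi-identifier can only split
# equivalence classes, so worst-case risk 1 / (smallest class) never goes down.
from collections import Counter


def max_risk(records, quasi_ids):
    """Worst-case risk: 1 / size of the smallest equivalence class."""
    sizes = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return 1 / min(sizes.values())


if __name__ == "__main__":
    # Hypothetical records; "region" stands in for a geospatial field.
    records = [
        {"age_band": 30, "sex": "F", "region": "A"},
        {"age_band": 30, "sex": "F", "region": "A"},
        {"age_band": 30, "sex": "F", "region": "B"},
        {"age_band": 40, "sex": "M", "region": "A"},
        {"age_band": 40, "sex": "M", "region": "B"},
        {"age_band": 40, "sex": "M", "region": "C"},
    ]
    print(max_risk(records, ("age_band", "sex")))            # 0.3333333333333333
    print(max_risk(records, ("age_band", "sex", "region")))  # 1.0
```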
www.ehealthinformation.ca
www.ehealthinformation.ca/knowledgebase