Privacy and Confidentiality Issues with Spatial Data: The Data Center Perspective
Deborah Balk, Robert Downs, W. Christopher Lenhardt, Francesca Pozzi
22 May 2003
© The Trustees of Columbia University in the City of New York
2
Presentation Overview
Issues
Trends and Examples
Data Center-based Responses
Benefits from Appropriate Data Center Responses
3
Issues
Why privacy and confidentiality?
Privacy and confidentiality and spatial data
Why use spatial data?
4
Restate the issue of privacy and confidentiality
Researchers and users of data have a legal and moral responsibility to protect the privacy and confidentiality of individuals participating in research.
5
Personal Identifying Information and Spatial Data
The typical problem is not the spatial data itself, but either the mapping of sensitive information in a way that could allow a subject to be identified, or the integration of different data sets in a way that could allow individual respondents to be identified.
6
Why integrate or use spatial data?
Re-evaluation of social or health data in a geospatial framework
  Evaluating spatial patterns is only a first step
  Analysis of these data with geographically specified and environmental parameters
Geographic parameters have often been implicit
  E.g., county of residence
New technologies, like global positioning systems, make geographic parameters explicit
  E.g., lat-long coordinates
7
Usages of linked micro-level data
Data applications
  At the individual level: exact locations known
    Confidentiality a clear concern, even with masked identifiers (names removed)
    Even when grouped (e.g., in sample clusters)
  At different scales: aggregating up
    Why isn’t this enough? Or, when is it enough?
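A minimal sketch of why "aggregating up" may not be enough on its own: after point locations are binned into coarser grid cells, some cells can still hold very few respondents. The grid size, the threshold of 3, and the coordinates below are illustrative assumptions, not values from the presentation.

```python
from collections import Counter

def aggregate_to_grid(points, cell_deg):
    """Count respondents per square grid cell of side cell_deg (degrees)."""
    counts = Counter()
    for lat, lon in points:
        # Floor division assigns each point to the cell that contains it.
        cell = (int(lat // cell_deg), int(lon // cell_deg))
        counts[cell] += 1
    return counts

def small_cells(counts, min_count):
    """Return the cells whose respondent count is below min_count."""
    return {cell: n for cell, n in counts.items() if n < min_count}

# Hypothetical respondent locations (lat, lon); not real survey data.
points = [(12.01, -1.52), (12.02, -1.51), (12.03, -1.53), (12.61, -1.10)]

counts = aggregate_to_grid(points, cell_deg=0.5)
risky = small_cells(counts, min_count=3)  # assumed disclosure threshold
print(risky)
```

Here three respondents share one cell, but the fourth sits alone in a neighboring cell, so the aggregated count for that cell still describes a single person.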
8
Trends and Examples
Accessibility of higher resolution data is increasing
Ubiquity of GIS technology
Demographic and Health Survey
9
Easily Accessible High Resolution Data
http://terraserver.microsoft.com/
Lamont-Doherty Earth Observatory, Palisades, NY
10
From Space Imaging (http://www.spaceimaging.com/)
Tornado Damage, Oklahoma City, May 8, 2003
One-meter IKONOS
11
Examples with Demographic and Health Survey (DHS) data
100 surveys in roughly 75 countries (1984-present)
45 with GPS data in 30 countries (late-90s to present)
  Mostly in Africa
  GPS points taken at population center of cluster (or enumeration area)
  Roughly 30 households per cluster
  Ranges from a single building in an urban area to a 250 km² area in sparsely populated areas
Survey content includes highly sensitive subjects:
  Births
  Deaths
  Contraceptive use
  HIV knowledge, preventative measures and blood samples
  Household assets
Data are publicly and freely available on request
12
Case for integrating geospatial data with health data: DHS Clusters & Aridity Zones
West Africa
13
Overlaying satellite imagery
Moderate resolution, roughly 30 meters², e.g., Landsat
  Gives a good indication of vegetation, land use change, some vector habitats
  Gives general indication of DHS clusters; difficult to determine precise location of cluster
High resolution, roughly 4 meters², e.g., Quickbird
  Indicates vegetation, roads, bridges and built environments
  Even exact buildings
  Could easily be mapped with street-location data
14
Landsat
Quickbird
19
Frequency of cluster size
Ranges from 2 to 36 persons per cluster
20
HIV/AIDS testing
Three recent DHS surveys (Mali, Dominican Republic, Zambia) tested a subsample of surveyed women age 15-49 and men age 15-59, making them among the first nationally representative surveys to include biomarker testing for HIV/AIDS
HIV tests were “anonymized” or “delinked” so that the results could not be linked back to the individual data file, preserving the confidentiality of respondents
  Coupons were provided to respondents so they could obtain testing themselves if they wished, along with counseling services
Results then relinked to original survey but with random IDs
Source: L. Montana: 2003
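One possible reading of the relinking step, sketched under stated assumptions: test results are joined to the survey rows, the original identifier is dropped, and each released record gets a fresh random ID. The field names (`respondent_id`, `hiv_result`) and the sample rows are hypothetical; the presentation does not describe the actual DHS procedure at this level of detail.

```python
import secrets

def relink_with_random_ids(survey_rows, test_rows, key="respondent_id"):
    """Attach test results to survey rows, replacing the original key with a random ID."""
    results = {row[key]: row["hiv_result"] for row in test_rows}
    released = []
    for row in survey_rows:
        if row[key] not in results:
            continue  # respondent was not in the tested subsample
        # Copy every field except the original identifier.
        out = {k: v for k, v in row.items() if k != key}
        out["hiv_result"] = results[row[key]]
        out["random_id"] = secrets.token_hex(8)  # unrelated to the original ID
        released.append(out)
    # Shuffle so that row order does not reveal the original sequence.
    secrets.SystemRandom().shuffle(released)
    return released

# Hypothetical records for illustration only.
survey = [
    {"respondent_id": "A1", "age": 24, "cluster": 7},
    {"respondent_id": "A2", "age": 31, "cluster": 7},
    {"respondent_id": "A3", "age": 45, "cluster": 9},
]
tests = [
    {"respondent_id": "A1", "hiv_result": "negative"},
    {"respondent_id": "A3", "hiv_result": "negative"},
]
released = relink_with_random_ids(survey, tests)
```

Note that fields such as age and cluster remain in the released rows, so random IDs alone do not remove the spatial re-identification risk discussed in this presentation.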
21
Adding spatial noise
2 km urban
  Increases the potential number of households from 260 to 2,340
  Adds 9 EA for every sampled EA
5 km rural
  Increases the potential number of households from 214 to 2,568
  Adds 12 EA for every sampled EA
EA = Enumeration Areas, Malawi
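A rough sketch of what adding spatial noise could look like, assuming each cluster point is moved by a random offset drawn uniformly from a disk of the stated radius (2 km urban, 5 km rural). The uniform-disk choice, the flat-earth conversion from kilometers to degrees, and the function names are assumptions for illustration; the slide does not specify the method. The second helper reproduces the slide's household totals, which equal the per-EA count multiplied by 9 and by 12.

```python
import math
import random

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude

def displace(lat, lon, max_km, rng=random):
    """Move a point by a random offset, uniform over a disk of radius max_km."""
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    # The square root makes the offset uniform by area rather than by distance.
    dist_km = max_km * math.sqrt(rng.random())
    dlat = dist_km * math.cos(bearing) / KM_PER_DEG_LAT
    dlon = dist_km * math.sin(bearing) / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon

def candidate_households(hh_per_ea, n_eas):
    """Households an observer must consider once n_eas EAs are plausible."""
    return hh_per_ea * n_eas

urban = candidate_households(260, 9)   # matches the slide's 2,340
rural = candidate_households(214, 12)  # matches the slide's 2,568

# Hypothetical rural cluster location, displaced by up to 5 km.
noisy_lat, noisy_lon = displace(-13.9, 33.7, max_km=5.0)
```

Passing a seeded `random.Random` instance as `rng` makes the displacement reproducible, which is useful for testing but would need to be kept secret in any real release.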
22
Methodological Questions
How much error is introduced by these buffers?
  Especially if these buffers are within the spatial error of some overlaying data sets.
Does spatial noise compound “tabular” noise?
Can we a priori predict all the possible permutations with newly available data?
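The first question can be explored numerically. The sketch below estimates the average positional error introduced by a buffer, assuming (as an illustration only) that offsets are uniform over a disk of the buffer radius; under that assumption the mean offset is two thirds of the radius. The comparison against a 1 km grid cell is likewise a hypothetical stand-in for the spatial error of an overlaid data set.

```python
import math
import random

def mean_displacement_km(max_km, n=100_000, seed=0):
    """Monte Carlo estimate of the mean offset for a uniform-disk buffer."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # Distance from the center for a point uniform over the disk.
        total += max_km * math.sqrt(rng.random())
    return total / n

urban_error = mean_displacement_km(2.0)  # theoretical value: 2 * 2/3 km
rural_error = mean_displacement_km(5.0)  # theoretical value: 5 * 2/3 km

# Hypothetical resolution of an overlaid raster, in km.
overlay_cell_km = 1.0
exceeds_overlay = urban_error > overlay_cell_km
```

If the mean offset is smaller than the overlay's own positional error, the noise adds little analytical cost; if it is larger, values extracted from the overlay may come from the wrong cell.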
23
Data Center Responses – 3Ps and a K
Policies
Procedures
People
Knowledge
24
Policies
To control data
Sensitize personnel and end-users
25
Procedures
Restricting access to data through a controlled environment
  Promote data “enclave model” whereby individual researchers may visit “safe” site for full access to confidential data
  Consider developing virtual data environments to extract and use micro-level data while protecting confidentiality, e.g. IPUMS at University of Minnesota
Documenting confidentiality issues in metadata
26
People
Staff
  Read and sign an agreement indicating a commitment to protect confidential data and to follow relevant procedures (similar to a computer use policy)
Researchers
  Responsible use statement
27
Knowledge Transfer
Support researchers and local IRBs by transmitting knowledge of potential confidentiality issues using spatial data
Communicating the methods used to protect confidentiality in a data set, e.g. adding spatial noise
28
Benefits
Protect respondents
Further science
Support researchers in interfacing with local IRBs
Create an “enclave” for the responsible use of confidential data products, e.g. US Census Data Centers
  Alternative model for conducting research, “getting out from behind your desk,” promotes scientific interactions and new ways of thinking