Date post: 20-Dec-2015
Privacy and Confidentiality Issues with Spatial Data: The Data Center Perspective Deborah Balk, Robert Downs, W. Christopher Lenhardt, Francesca Pozzi 22 May 2003 © The Trustees of Columbia University in the City of New York
Transcript
Page 1: Privacy and Confidentiality Issues with Spatial Data: The Data Center Perspective Deborah Balk, Robert Downs, W. Christopher Lenhardt, Francesca Pozzi.

Privacy and Confidentiality Issues with Spatial Data: The Data Center Perspective

Deborah Balk, Robert Downs, W. Christopher Lenhardt, Francesca Pozzi

22 May 2003

© The Trustees of Columbia University in the City of New York

Page 2

Presentation Overview

Issues

Trends and Examples

Data Center-based Responses

Benefits from Appropriate Data Center Responses

Page 3

Issues

Why privacy and confidentiality?

Privacy and confidentiality and spatial data

Why use spatial data?

Page 4

Restate the issue of privacy and confidentiality

Researchers and users of data have a legal and moral responsibility to protect the privacy and confidentiality of individuals participating in research.

Page 5

Personal Identifying Information and Spatial Data

The typical concern is not the spatial data itself, but either the mapping of sensitive information in a way that could allow a subject to be identified, or the integration of different data sets in a way that could allow individual respondents to be identified.

Page 6

Why integrate or use spatial data?

Re-evaluation of social or health data in a geospatial framework
    Evaluating spatial patterns is only a first step
    Analysis of these data with geographically specified and environmental parameters

Geographic parameters have often been implicit
    E.g., county of residence

New technologies, like global positioning systems, make geographic parameters explicit
    E.g., lat-long coordinates

Page 7

Uses of linked micro-level data

Data applications
    At the individual level: exact locations known
        Confidentiality a clear concern, even with masked identifiers (names removed)
        Even when grouped (e.g., in sample clusters)
    At different scales: aggregating up
        Why isn’t this enough? Or, when is it enough?
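As a rough illustration of "aggregating up", the sketch below bins individual point locations into grid cells and reports only a count per cell. The function name, the cell size, and the coordinates are hypothetical and are not taken from the presentation; the slides do not say which aggregation method a data center would use.

```python
import math
from collections import Counter


def aggregate_to_grid(points, cell_deg):
    """Count (lat, lon) points per square grid cell of side cell_deg degrees.

    Each cell is identified by the integer index of its south-west corner.
    """
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Made-up coordinates, for illustration only.
points = [(-13.91, 33.72), (-13.95, 33.78), (-14.20, 33.10)]
cell_counts = aggregate_to_grid(points, 0.5)
```

In this made-up example one cell ends up holding a single point, which may be one reason aggregation alone is not always enough: a cell with very few respondents can still point to an individual.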

Page 8

Trends and Examples

Accessibility of higher resolution data is increasing

Ubiquity of GIS technology

Demographic and Health Survey

Page 9

Easily Accessible High Resolution Data

http://terraserver.microsoft.com/

Lamont-Doherty Earth Observatory, Palisades, NY

Page 10

From Space Imaging (http://www.spaceimaging.com/)

Tornado Damage, Oklahoma City, May 8, 2003

One-meter IKONOS

Page 11

Examples with Demographic and Health Survey (DHS) data

100 surveys in roughly 75 countries (1984-present)

45 with GPS data in 30 countries (late 1990s to present)
    Mostly in Africa
    GPS points taken at population center of cluster (or enumeration area)
    Roughly 30 households per cluster
    Ranges from a single building in an urban area to a 250 km² area in sparsely populated areas

Survey content includes highly sensitive subjects:
    Births
    Deaths
    Contraceptive use
    HIV knowledge, preventative measures and blood samples
    Household assets

Data are publicly and freely available upon request

Page 12

Case for integrating geospatial data with health data: DHS Clusters & Aridity Zones

West Africa

Page 13

Overlaying satellite imagery

Moderate resolution (roughly 30 meters²), e.g., Landsat
    Gives a good indication of vegetation, land use change, some vector habitats
    Gives general indication of DHS clusters; difficult to determine precise location of cluster

High resolution (4 meters²), e.g., Quickbird
    Indicates vegetation, roads, bridges and built environments
    Even exact buildings
    Could easily be mapped with street-location data

Page 14

Landsat

Quickbird

Page 15

Page 16

Page 17

Page 18

Page 19

Frequency of cluster size

Ranges from 2 to 36 persons per cluster

Page 20

HIV/AIDS testing

Three recent DHS surveys (Mali, Dominican Republic, Zambia) have conducted testing among a subsample of surveyed women age 15-49 and men age 15-59, making them some of the first nationally representative survey data to include biomarker testing for HIV/AIDS

HIV tests were “anonymized” or “delinked” so that test results could not be linked back to the individual data file, preserving the confidentiality of respondents
    Coupons were provided to respondents so they could obtain testing themselves if they wished, along with counseling services

Results then relinked to original survey but with random IDs

Source: L. Montana, 2003
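One way to picture "relinked but with random IDs" is sketched below. This is an illustrative interpretation only, not the DHS procedure: the field names (`respondent_id`, `random_id`, `hiv_result`) and the function are hypothetical. The idea shown is that test results are attached to survey records, the original identifier is dropped, and no lookup table from old to new IDs is kept.

```python
import secrets


def relink_with_random_ids(survey_rows, test_rows, key="respondent_id"):
    """Attach test results to survey rows, then swap the key for a random ID.

    The original key is removed from every output row, and the mapping from
    original key to random ID is never stored.
    """
    tests_by_key = {row[key]: row for row in test_rows}
    linked = []
    for row in survey_rows:
        merged = {k: v for k, v in row.items() if k != key}
        test = tests_by_key.get(row[key], {})
        merged.update({k: v for k, v in test.items() if k != key})
        merged["random_id"] = secrets.token_hex(8)  # fresh, unguessable ID
        linked.append(merged)
    return linked


# Made-up records, for illustration only.
survey = [{"respondent_id": 1, "age": 30}, {"respondent_id": 2, "age": 41}]
tests = [{"respondent_id": 1, "hiv_result": "negative"}]
linked = relink_with_random_ids(survey, tests)
```

Replacing the identifier does not by itself remove other identifying fields such as cluster location, which is why the spatial measures on the following slides still matter.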

Page 21

Adding spatial noise

2 km urban
    Increases the potential number of households (hhs) from 260 to 2,340
    Adds 9 EA for every sampled EA

5 km rural
    Increases the potential number of hhs from 214 to 2,568
    Adds 12 EA for every sampled EA

EA = Enumeration Areas, Malawi
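The sketch below shows one simple way spatial noise of this kind could be added: each point is moved a random distance (up to 2 km or 5 km) in a random direction. The slides do not describe the actual displacement algorithm, so the uniform-over-a-disc choice, the flat-earth approximation, the function names, and the example coordinates are all assumptions made for illustration. The small helper at the end just reproduces the household arithmetic on the slide, whose figures match multiplying by 9 and 12.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0


def displace_point(lat, lon, max_km, rng=random):
    """Move (lat, lon) by a random distance of at most max_km, random bearing.

    Uses a flat-earth approximation, which is adequate for a few kilometers
    away from the poles.
    """
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    # sqrt() spreads points evenly over the disc instead of bunching at center
    distance_km = max_km * math.sqrt(rng.random())
    dlat = distance_km * math.cos(bearing) / EARTH_RADIUS_KM
    dlon = distance_km * math.sin(bearing) / (
        EARTH_RADIUS_KM * math.cos(math.radians(lat))
    )
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, used here only to check the displacement."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def potential_households(hhs_per_ea, ea_multiplier):
    """Households a displaced point could plausibly belong to."""
    return hhs_per_ea * ea_multiplier


# Hypothetical cluster location; 2 km is the urban buffer named on the slide.
noisy_lat, noisy_lon = displace_point(-13.96, 33.79, 2.0, random.Random(1))
```

The larger the pool of households a displaced point could belong to, the harder it becomes to tie a survey cluster to a specific building or village.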

Page 22

Methodological Questions

How much error is introduced by these buffers?
    Especially if these buffers are within the spatial error of some overlaying data sets.

Does spatial noise compound “tabular” noise?

Can we a priori predict all the possible permutations with newly available data?
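The first question can be explored with a small simulation. The sketch below estimates how often a randomly displaced point still falls in the same square grid cell of an overlaid data set. It assumes a flat plane measured in kilometers, a starting point placed uniformly within the cell, and displacement uniform over a disc; none of these choices comes from the presentation, and the function name is hypothetical.

```python
import math
import random


def fraction_staying_in_cell(cell_km, max_km, trials=20000, seed=0):
    """Monte Carlo estimate of the share of displaced points that stay in
    their original square cell of side cell_km."""
    rng = random.Random(seed)
    stayed = 0
    for _ in range(trials):
        # Start anywhere in the cell, then displace by up to max_km.
        x = rng.uniform(0.0, cell_km)
        y = rng.uniform(0.0, cell_km)
        bearing = rng.uniform(0.0, 2.0 * math.pi)
        dist = max_km * math.sqrt(rng.random())
        nx = x + dist * math.cos(bearing)
        ny = y + dist * math.sin(bearing)
        if 0.0 <= nx < cell_km and 0.0 <= ny < cell_km:
            stayed += 1
    return stayed / trials


fine_grid = fraction_staying_in_cell(0.03, 2.0)      # 30 m cells, 2 km buffer
coarse_grid = fraction_staying_in_cell(1000.0, 2.0)  # very coarse cells
```

Under these assumptions, a 2 km buffer almost always moves a point out of a 30 m cell but almost never out of a very coarse cell, so the error a buffer introduces depends heavily on the resolution of the overlaid layer.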

Page 23

Data Center Responses – 3Ps and a K

Policies

Procedures

People

Knowledge

Page 24

Policies

To control data

Sensitize personnel and end-users

Page 25

Procedures

Restricting access to data through a controlled environment
    Promote data “enclave model” whereby individual researchers may visit a “safe” site for full access to confidential data
    Consider developing virtual data environments to extract and use micro-level data while protecting confidentiality, e.g., IPUMS at University of Minnesota

Documenting confidentiality issues in metadata

Page 26

People

Staff
    Read and sign an agreement indicating a commitment to protect confidential data and to follow relevant procedures (similar to a computer use policy)

Researchers
    Responsible use statement

Page 27

Knowledge Transfer

Support researchers and local IRBs by transmitting knowledge of potential confidentiality issues using spatial data

Communicating the methods used to protect confidentiality in a data set, e.g., adding spatial noise

Page 28

Benefits

Protect respondents

Further science

Support researchers in their interface with local IRBs

Create an “enclave” for the responsible use of confidential data products, e.g., US Census Data Centers
    Alternative model for conducting research: “getting out from behind your desk” promotes scientific interactions and new ways of thinking

