Post on 11-Nov-2014
Data context: new developments for research in the social sciences
4th Luso-Brazilian Conference on Open Access,
University of Sao Paulo
9th October 2013
Peter Elias
Structure of the presentation
• Recent reports - what’s going on?
• What constitutes data in the social sciences?
• What problems do we face with the more traditional forms of data?
• New forms of data
• Challenges using new data types
• The report of the Administrative Data Taskforce
• What does this mean for journals?
Recent reports…
Royal Society 2012
OECD 2013
ESRC, MRC, Wellcome Trust 2012
RCUK 2012
Science as an Open Enterprise (Royal Society 2012)
The main thrust of this report was that transparency and
openness should characterise all scientific research. As a
major part of this, data sharing should be regarded as the
norm and researchers, their funders and research
institutions should adopt this stance in all their research
activities. An important recommendation relates to
situations where data hold personal information. In such
cases, appropriate safeguards should be put in place to
prevent disclosure of such details whilst facilitating data
sharing.
New Data for Understanding the Human Condition: international perspectives. (OECD 2013)
The focus of this report was on the need for global collaboration over data sharing. This will require improved incentives for researchers who agree to share data, and the adoption of agreed standards and protocols for data description. Additionally, the report calls for an international approach to the use of ‘Big Data’ for research, covering collaboration over the exploration of the research value of new forms of data, the development of tools for their analysis and improved access to administrative datasets on a cross-national basis.
Report of the Administrative Data Taskforce 2012 (ESRC, MRC, Wellcome 2012)
This cross-departmental Taskforce proposes a
major boost to the resources available for linkage
and sharing across administrative datasets with
the establishment of Administrative Data
Research Centres in the countries of the UK.
Additionally, all taskforce members are agreed
that new legislation is required in order to
overcome current legal obstacles to record-level
linkage between data held by different
administrative bodies.
Investing for Growth: Capital infrastructure for the 21st Century (RCUK 2012)
This report sets out priorities for capital investment for research. A major theme throughout is to improve UK capacity to harness ‘Big Data’, emphasising the key importance of longitudinal data, of linking socioeconomic data sources to other data, including administrative records, private sector, and biomedical data, as well as ensuring these resources are accessible for social scientific research to benefit the economy, health and other sectors.
What constitutes data in the social sciences?
• Research interests focus upon people and organisations, their interaction, their evolution, seeking to understand better the behavioural relationships between them
• Data types of interest relate to people and organisations, variously classified as
Aggregated/disaggregated
Spatially referenced/time-stamped
Longitudinal/cross-sectional
Quantitative/qualitative
Structured/unstructured
• Data structures include ‘rectangular’ datasets, hierarchical data, textual, numerical, audio, video
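To make the distinction between hierarchical and ‘rectangular’ data concrete, here is a minimal Python sketch. The households, persons, and field names are invented for illustration and do not come from any dataset mentioned in the presentation. It shows one common way to flatten nested household data into one row per person.

```python
# Hypothetical hierarchical data: persons nested within households.
# All identifiers and values are invented for illustration.
hierarchical = [
    {
        "household_id": "H1",
        "region": "North",
        "persons": [
            {"person_id": "P1", "age": 34},
            {"person_id": "P2", "age": 36},
        ],
    },
    {
        "household_id": "H2",
        "region": "South",
        "persons": [
            {"person_id": "P3", "age": 71},
        ],
    },
]


def flatten(households):
    """Return a 'rectangular' dataset: one row per person,
    with household-level fields repeated on every row."""
    rows = []
    for household in households:
        for person in household["persons"]:
            rows.append(
                {
                    "household_id": household["household_id"],
                    "region": household["region"],
                    **person,
                }
            )
    return rows


rectangular = flatten(hierarchical)
for row in rectangular:
    print(row)
```

The rectangular form is convenient for statistical software, at the cost of repeating household-level values on each person row.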
What problems do we face with the more traditional forms of data?
• Discovery (NESSTAR; CESSDA; Data Management Plans)
• Documentation (DDI; SDMX)
• Access (DWB; IHSN)
• Reuse (CESSDA)
• Preservation (CESSDA)
New forms of data
Broad categories of data, each listed with its detailed categories and examples:
Category A: Government transactions
• Individual tax records: Income tax; tax credits
• Corporate tax records: Corporation tax; sales tax; value added tax
• Property tax records: Tax on sales of property; tax on value of property
• Social security payments: State pensions; hardship payments; unemployment benefits; child benefits
• Import/export records: Border control records; import/export licensing records
Category B: Government and other registration records
• Housing and land use registers: Registers of ownership
• Educational registers: School inspections; pupil results
• Criminal justice registers: Police records; court records
• Social security registers: Registers of eligible persons
• Electoral registers: Voter registration records
• Employment registers: Employer census records; registers of persons joining/leaving employment
• Population registers: Births; marriages; civil unions; deaths; immigration/emigration records; census records
• Health system registers: Personal medical records; hospital records
• Vehicle/driver registers: Driver licence registers; vehicle licence registers
• Membership registers: Political parties; charities; clubs
Category C: Commercial transactions
• Store cards: Supermarket loyalty cards
• Customer accounts: Utilities; financial institutions; mobile phone usage
• Other customer records: Product purchases; service agreements
Category D: Internet usage
• Search terms: Google; Bing; Yahoo search activity
• Website interactions: Visit statistics; user generated content
• Downloads: Music; films; TV
• Social networks: Facebook; Twitter; LinkedIn
• Blogs; news sites: Reddit
Category E: Tracking data
• CCTV images: Security/safety camera recordings
• Traffic sensors: Vehicle tracking records; vehicle movement records
• Mobile phone locations: GPS data
Category F: Satellite and aerial imagery
• Visible light spectrum: Google Earth©
• Night-time visible radiation: Landsat
• Infrared; radar mapping
Challenges using new data types
• Provenance
• Replicability
• Durability
• Volume
• Ethics
• Confidentiality
• Legal issues
• Access may be strictly controlled
Focus from here on one particular data type:
Administrative data – reuse for research
What are administrative data?
Data which are the product of an administrative system. They are generated by organisations for operational purposes or as a legal requirement. They might identify people and/or organisations, and may contain detailed spatial information and be time-stamped. They are produced by public and private sector organisations. They are not designed for research.
What is the research value of such data?
• They already exist. No additional data collection costs associated with research use.
• They are typically large national datasets, permitting more detailed research to be undertaken than would otherwise be the case.
• They record a process, which can be documented and understood.
• Linkage between data relating to different time periods can create longitudinal resources.
• Linkage to other data sources (e.g. surveys) can enhance these resources.
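The two linkage points above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the records, the field names (`person_id`, `year`, `earnings`, `job_satisfaction`), and the existence of a clean shared identifier are all invented here. Real administrative sources often lack a common identifier, so linkage in practice may need probabilistic matching and must respect the access and legal constraints discussed elsewhere in this presentation.

```python
from collections import defaultdict

# Invented administrative extracts for two time periods (hypothetical fields).
records_2011 = [
    {"person_id": "A", "year": 2011, "earnings": 21000},
    {"person_id": "B", "year": 2011, "earnings": 30500},
]
records_2012 = [
    {"person_id": "A", "year": 2012, "earnings": 22500},
    {"person_id": "C", "year": 2012, "earnings": 18000},
]

# Invented survey responses keyed by the same hypothetical identifier.
survey = {"A": {"job_satisfaction": 4}, "C": {"job_satisfaction": 2}}


def link_longitudinal(*waves):
    """Group records from several periods by person_id, giving each
    person a time-ordered list of (year, earnings) observations."""
    histories = defaultdict(list)
    for wave in waves:
        for record in wave:
            histories[record["person_id"]].append(
                (record["year"], record["earnings"])
            )
    return {pid: sorted(obs) for pid, obs in histories.items()}


def enhance_with_survey(histories, survey_responses):
    """Attach survey answers where a match exists; None marks no match."""
    return {
        pid: {"history": obs, "survey": survey_responses.get(pid)}
        for pid, obs in histories.items()
    }


linked = link_longitudinal(records_2011, records_2012)
enhanced = enhance_with_survey(linked, survey)
for pid, data in sorted(enhanced.items()):
    print(pid, data)
```

Person A ends up with a two-year history plus a survey response, while B and C each have a single observation, which illustrates how incomplete coverage across sources limits the longitudinal resource.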
What are the problems associated with their research use?
• Not designed for research. This may pose difficulties for their
use in specific research areas.
• They are not subject to statistical standards or statistical
quality controls.
• They may be difficult to access, and linkage may be prohibited
or may not be feasible.
• As the systems that generate them change, so might the data.
• Their preservation for research is not regarded as a
fundamental objective – may lead to problems with metadata.
Some of the problems currently faced by researchers
• Inconsistent access conditions.
• Severe time delays in granting or refusing access.
• Lack of information about selection and/or linking of administrative datasets.
• Restricted access to datasets – especially for addressing the counterfactual.
• Data controllers making unilateral decisions about the appropriateness of data for research.
• Research permitted then publication denied.
Terms of reference for the Taskforce
• identification of potential risks and benefits from increased research use of administrative data;
• identification of likely resource implications arising from increased research use of administrative data;
• the development and introduction of common procedures to provide more efficient access to administrative datasets;
• clarification of the legal situation governing the use of routine data;
• clarification of when consent is required and what consent procedures should be used;
• identification of possible need for legislative change to improve access to administrative data for research.
What has the Taskforce recommended?
• Improved access and linkage procedures and arrangements for their governance.
• A clearer legal environment for linkage between data held by different departments.
• A common accreditation process for researchers applying for access to and linkage between administrative datasets.
Where are we now?
• £34 million released by government.
• Four Administrative Data Centres commissioned.
• A new UK Administrative Data Service set up.
• A national governing authority is being established.
• New legislation under preparation.
• Now commissioning centres for local government and private sector data.
What are the implications for libraries and journals?
• Libraries as home for secure remote access facilities.
• More attention to data documentation and discovery tools.
• Building up capacity within the research community to facilitate research using the improved access and data linkage arrangements.
• Subject knowledge of librarians to extend to administrative datasets.
• To be solved – open access and access to administrative data