Documenting and disseminating census and survey data sets Ilpo Survo, United Nations ESCAP, Bangkok,...

Post on 02-Jan-2016

Documenting and disseminating census and survey data sets

Ilpo Survo, United Nations ESCAP, Bangkok, survo.unescap@un.org

for

UNECE Training Workshop on Census Technology for SPECA member countries, Astana, 7-8 June 2007

Content

A. Systematic documenting of census data sets

B. Why disseminate microdata?

C. Microdata Management Toolkit

A. Systematic documenting of census data sets

A good census dataset...

• Is documented clearly

• Contains no surprises

• Allows users to

– Start working effectively quickly

– Find the data they are interested in

– Understand what the data are measuring and how the data have been created

– Assess the quality of the data

Evolving documentation technology

• Own documentation standards => International metadata standards

• National practices => International good practices

• Ad hoc tools => Structuring tools, databases

• Text-based codebooks => XML-based codebooks

Maintain metadata in a centralised database

• Manage definitions, methodology information, variable information, data collection information in one place

• Ensures consistency across data holdings

• Approach useful for planning, data collection, processing, analysis and dissemination
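
As a rough illustration of keeping definitions in one place, the sketch below stores a single concept definition in an in-memory SQLite database and links variables from two datasets to it. The table names, column names and sample rows are all hypothetical and are not taken from any particular metadata system.

```python
import sqlite3

# In-memory database; every table, column and row here is a made-up example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    name       TEXT UNIQUE NOT NULL,
    definition TEXT NOT NULL
);
CREATE TABLE dataset (
    dataset_id INTEGER PRIMARY KEY,
    title      TEXT NOT NULL
);
CREATE TABLE variable (
    dataset_id INTEGER NOT NULL REFERENCES dataset(dataset_id),
    var_name   TEXT NOT NULL,
    concept_id INTEGER NOT NULL REFERENCES concept(concept_id)
);
""")

# The definition is stored once...
conn.execute(
    "INSERT INTO concept VALUES (?, ?, ?)",
    (1, "household", "Placeholder definition text, for illustration only"),
)
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?)",
    [(1, "Census (hypothetical)"), (2, "Household survey (hypothetical)")],
)
# ...and variables in different datasets point to it instead of copying it.
conn.executemany(
    "INSERT INTO variable VALUES (?, ?, ?)",
    [(1, "HH_ID", 1), (2, "hhid", 1)],
)

rows = conn.execute("""
    SELECT d.title, v.var_name, c.definition
    FROM variable v
    JOIN dataset d ON d.dataset_id = v.dataset_id
    JOIN concept c ON c.concept_id = v.concept_id
    ORDER BY d.dataset_id
""").fetchall()

# Both variables resolve to the same single definition.
definitions = {definition for _, _, definition in rows}
for row in rows:
    print(row)
```

Because both variables reference the same concept row, correcting the definition once updates the documentation of every dataset that uses it.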

Good practices in data documentation

• Explanatory material

– Minimum material required to ensure the long-term viability and functionality of a dataset

• Contextual information

– Material about the context in which the data were collected, and how they were put to use

– Enables the secondary user to fully understand the background and processes behind the data collection exercise.

• Cataloguing material

– Bibliographic record of the dataset, for proper acknowledgement and citation

– Basic instrument used for resource discovery

• http://www.esds.ac.uk/news/goodPractice.pdf

B. Why disseminate microdata?

Untapped potential of microdata for national development

• Even the best planned tabulations cannot exhaustively bring out all valuable information from census data

• Diversity, disparities and related causalities are best analysed from microdata, e.g.

– Tracking the effects of policy interventions on target groups

– Determining dimensions of within-country disparities

• The quality of research would improve

=> Return on data collection would increase

=> National policies could be targeted better

=> More efficient use of public resources

Factors that might hinder microdata dissemination - Discussion

• Concerns about data confidentiality

• Ambiguous or missing national legislation

• Narrow mandate of statistical agency

• Concerns about data quality

• Low demand from data users

International initiatives

• Marrakech Action Plan on Statistics, http://www.surveynetwork.org/home/docs/Marrakech_Action_Plan_for_Statistics.pdf

• International Household Survey Network, http://www.surveynetwork.org/

• IHSN Microdata Management Toolkit

• ESCAP-World Bank-PARIS21 project on improving access to survey microdata in Asia and the Pacific

ESCAP project on improving access to survey microdata in Asia and the Pacific, 2007-2008

• Household surveys and population and housing censuses, not establishment surveys

• Assessment of status of microdata dissemination

• Regional inventory and data archive of household surveys

• Regional advocacy and training workshops

• On-site training and technical advice on documentation and anonymization

C. Microdata Management Toolkit

Microdata Management Toolkit – Summary

A set of software tools for the documentation, archiving, dissemination and preservation of microdata

1. Metadata Editor

– Document survey data in accordance with international standards

2. CD-ROM Builder

– Generates user-friendly outputs, such as CDs and websites, for dissemination and archiving

3. The Explorer

– For viewing metadata

– For re-exporting data to various formats

Download and use

• The Toolkit can be downloaded from http://www.surveynetwork.org/home/?lvl1=tools&lvl2=documentation&lvl3=toolkit

• All Toolkit components except the Metadata Editor are available free of charge

• Nesstar Editor: one free license for NSOs of World Bank IDA countries (e.g. Afghanistan, Georgia, Kyrgyz Republic, Moldova, Tajikistan)

Metadata Editor

• Documents survey data in accordance with international standards

– Data Documentation Initiative (DDI)

– Dublin Core Metadata Initiative (DCMI)

• Data & metadata in one single file

• Data can be imported from various formats, incl. statistical packages

• Produces survey documentation in PDF format

Extensible Markup Language (XML)

• Language to describe data using tags

• Tags are conceptually similar to fields in databases

• XML files are regular text files that can be edited with text editors

• XML files, like databases, can be

– Searched and queried

– Edited

• Tutorial: http://w3schools.com/xml

XML example

<titl>Multiple Indicator Cluster Survey 2005</titl>
<altTitl>MICS</altTitl>
<AuthEnty>National Statistics Office (NSO)</AuthEnty>
<fundAg abbr="UNICEF">United Nations Children's Fund</fundAg>
<collDate date="2005-01" event="start"/>
<collDate date="2005-03" event="end"/>
<nation>Popstan</nation>
<geogCover>National</geogCover>
<sampProc>5,000 households, stratified two stages</sampProc>
<respRate>98 percent</respRate>
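
To show how such tags can be read by a program, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes the fragment above is wrapped in a hypothetical <study> root element so that it parses as one document; a real DDI file nests these elements more deeply. The function name read_study and the dictionary keys are illustrative only.

```python
import xml.etree.ElementTree as ET

# The slide's fragment, wrapped in a hypothetical <study> root so that it is
# a single well-formed XML document.
DDI_FRAGMENT = """
<study>
  <titl>Multiple Indicator Cluster Survey 2005</titl>
  <altTitl>MICS</altTitl>
  <AuthEnty>National Statistics Office (NSO)</AuthEnty>
  <fundAg abbr="UNICEF">United Nations Children's Fund</fundAg>
  <collDate date="2005-01" event="start"/>
  <collDate date="2005-03" event="end"/>
  <nation>Popstan</nation>
  <geogCover>National</geogCover>
  <sampProc>5,000 households, stratified two stages</sampProc>
  <respRate>98 percent</respRate>
</study>
"""


def read_study(xml_text):
    """Return a flat dict with a few study-level metadata items."""
    root = ET.fromstring(xml_text)
    record = {
        "title": root.findtext("titl").strip(),
        "funder": root.find("fundAg").get("abbr"),
        "country": root.findtext("nation").strip(),
    }
    # collDate carries its information in attributes rather than in text.
    for coll in root.findall("collDate"):
        record["collection_" + coll.get("event")] = coll.get("date")
    return record


study = read_study(DDI_FRAGMENT)
print(study)
```

The element text (title, country) and the attributes (funder abbreviation, collection dates) are retrieved in much the same way as fields from a database record.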

XML advantages

• Creation of a comprehensive checklist of useful metadata elements

• Potential to assess the content of a file by determining whether particular tags are, or are not, within that file

• Creation of a dataset catalogue which can be queried for key metadata elements

• Potential to transform the file into more user-friendly formats, such as HTML, PDF

• XML files can be exchanged across networks or over the Internet using web services, for example via SOAP
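
Two of these advantages, checking a file against a checklist of tags and querying a catalogue for a metadata element, can be sketched in a few lines of Python. The checklist, the file names, the <study> root element and the second survey entry below are all hypothetical; a real checklist would follow the DDI elements an agency decides to require.

```python
import xml.etree.ElementTree as ET

# Hypothetical checklist of elements every documented survey should carry.
CHECKLIST = ["titl", "AuthEnty", "nation", "sampProc", "respRate"]

# Two hypothetical catalogue entries; the second one omits the sampling
# procedure and the response rate.
CATALOGUE = {
    "mics2005.xml": (
        "<study><titl>Multiple Indicator Cluster Survey 2005</titl>"
        "<AuthEnty>National Statistics Office (NSO)</AuthEnty>"
        "<nation>Popstan</nation>"
        "<sampProc>5,000 households, stratified two stages</sampProc>"
        "<respRate>98 percent</respRate></study>"
    ),
    "lfs2006.xml": (
        "<study><titl>Labour Force Survey 2006</titl>"
        "<AuthEnty>National Statistics Office (NSO)</AuthEnty>"
        "<nation>Popstan</nation></study>"
    ),
}


def missing_elements(xml_text, checklist=CHECKLIST):
    """List the checklist tags that do not occur anywhere in the file."""
    root = ET.fromstring(xml_text)
    return [tag for tag in checklist if root.find(".//" + tag) is None]


def find_by_element(catalogue, tag, value):
    """Return the names of catalogue files where <tag> has the given text."""
    hits = []
    for name, xml_text in catalogue.items():
        root = ET.fromstring(xml_text)
        if any((el.text or "").strip() == value for el in root.iter(tag)):
            hits.append(name)
    return sorted(hits)


gaps = {name: missing_elements(text) for name, text in CATALOGUE.items()}
popstan_surveys = find_by_element(CATALOGUE, "nation", "Popstan")
print(gaps)
print(popstan_surveys)
```

The same parsed tree could also be fed to a template or stylesheet to produce HTML or PDF output, which is not shown here.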

CD-ROM Builder

• Integrates with Metadata Editor

• Generates user-friendly outputs (CD-ROM, website) for dissemination and archiving (HTML format)

• Allows customization

– Branding: look and feel of CD or website

– Content: single or multiple surveys

CD-ROM Builder process

1. Create new CD-ROM Project

2. Add a survey to the project and select its type and branding

– Select an existing survey by opening the DDI-XML or Nesstar file

– The survey branding determines the overall look and feel of the CD

– The survey type determines the default metadata content

3. Click the Save button to generate the HTML interface

4. After a few minutes, your CD Project is ready for publishing!

CD-ROM Builder sample outputs

Demonstration of Metadata Editor

A live demonstration with the Popstan dataset, on-screen in English and Russian

Thank you!

Discussion, questions, answers