DLI Orientation: Concepts A Framework for Thinking about Statistical Information Chuck Humphrey Data Library University of Alberta April 2004

DLI Orientation: Concepts

A Framework for Thinking about Statistical Information

Chuck Humphrey

Data Library

University of Alberta

April 2004

Statistical Information

Two models for identifying and selecting appropriate statistical information:

1. A chart of statistical information
• Distinguishing statistics & data
• Distinguishing aggregate data & microdata

2. Continuum of access
• Matching dissemination channels with desired products

Statistics or Data

Statistics
• numeric facts/figures
• created from data, i.e., already processed
• presentation-ready

Data
• numeric files created and organized for analysis
• requires processing
• not ready for display
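The distinction can be made concrete with a small sketch. The records below are invented for illustration and do not come from any Statistics Canada file. The list of records plays the role of data, and the computed summary is a statistic that is ready for display.

```python
from statistics import mean

# Hypothetical data: one record per respondent (invented values).
records = [
    {"age": 34, "province": "AB"},
    {"age": 51, "province": "AB"},
    {"age": 27, "province": "ON"},
    {"age": 45, "province": "ON"},
]


def mean_age_by_province(rows):
    """Process the data into a statistic: mean age for each province."""
    ages = {}
    for row in rows:
        ages.setdefault(row["province"], []).append(row["age"])
    return {prov: mean(vals) for prov, vals in ages.items()}


summary = mean_age_by_province(records)
for prov, avg in sorted(summary.items()):
    # Each printed line is a presentation-ready statistic.
    print(f"{prov}: mean age {avg:.1f}")
```

The records require processing before they say anything; the printed lines do not.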


Chart of Statistical Information

Statistical Information
    Statistics
        In print
        Online
            E-publications
            E-tables
            Databases
    Data
        Aggregate
        Microdata

This chart is a typology of the categories or classes of statistical information. Remember, however, that the relationship between statistics and data is causal: statistics are created from data.

An overlap occurs in this chart between Statistics: Databases and Data: Aggregate, which will be discussed below.

In Print

Rely on yearbooks, statistical abstracts, catalogues, and indexes to locate statistics in print.

Examples of online indexes to print resources: Statistical Universe and Tablebase

Example of an online catalogue that includes print resources: Statistics Canada’s Online Catalogue

Online Statistics

Example of e-publications: Statistics Canada Downloadable Publications (DSP)

Example of e-tables: Canadian Statistics (STC Website)

Example of statistical databases: CANSIM II (STC Website, E-STAT, CHASS)

E-Publications

• Tend to be available in PDF format
• Can use the “Select Text” tool in the Adobe Reader and copy columns to another application

E-Tables

• Tend to be displayed in HTML
• May provide a pull-down list to view other categories in the table
• Some e-tables will provide an alternate format for the table that can be downloaded (e.g., the Census tables are available in comma-separated ASCII, IVT, and print-friendly formats)

Databases

• Often use HTML forms to define the statistics to be retrieved
• May offer a variety of output formats for the retrieved statistics (e.g., E-STAT provides IVT format for Beyond 20/20, graphs, charts, maps, and ASCII formats for spreadsheets and databases)


Aggregate Data

Aggregate data consist of statistics that are organized into a data structure and stored in a database or in a data file.

The data structure is based on tabulations organized by time, geography, or social content.

Aggregate Data

Data Structure
• Time
• Geography
• Social Content

Example: CANSIM II

Aggregate Data

Time series data have long fueled econometric models based on macro-economic indicators.

Comma-separated values (CSV) have become an important format for time series data, which are often manipulated, if not analyzed, in a spreadsheet such as Excel.
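As a rough illustration of working with a time series in CSV form outside a spreadsheet, the sketch below parses a small series and computes period-over-period percent change. The values and the column names `period` and `value` are invented for this example; an actual download would have its own layout.

```python
import csv
import io

# Invented monthly series in CSV form; the column names are assumptions.
csv_text = """period,value
2003-10,100.0
2003-11,102.0
2003-12,99.96
"""


def read_series(text):
    """Parse CSV text into a list of (period, value) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["period"], float(row["value"])) for row in reader]


def percent_change(series):
    """Percent change from the previous period, for each period after the first."""
    changes = []
    for (_, prev), (period, curr) in zip(series, series[1:]):
        changes.append((period, round((curr - prev) / prev * 100, 2)))
    return changes


series = read_series(csv_text)
changes = percent_change(series)
for period, pct in changes:
    print(f"{period}: {pct:+.2f}%")
```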

Aggregate Data

Data Structure
• Time
• Geography
• Social Content

Example: CENSUS


Aggregate Data

Increased availability of GIS software has created greater demand for Census statistics organized as aggregate data.

Beyond 20/20 has become a popular tool for reshaping census statistics from 1996 and 2001 for use with GIS software.

DBF is the most commonly used format to share census statistics with GIS software.

Aggregate Data

[Map: Montreal Census Tracts, from E-STAT]


Aggregate Data

“Small area statistics” are a special category of aggregate data. These data files consist of statistics for small geographic areas usually calculated from a population or manufacturing census or an administrative database with enough cases to create accurate summaries for small areas.

Aggregate Data

Data Structure
• Time
• Geography
• Social Content

Example: Cause of Death (HID)


Aggregate Data

Also known as “cross-classified” tables, these files tend to be made of statistics constructed from social-content variables. Examples of cross-classified tables in DLI are found in education and justice.
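A minimal sketch of how a cross-classified table is built from case-level records follows. The records and the variables `sex` and `education` are invented stand-ins for social-content variables; they are not taken from a DLI file.

```python
from collections import Counter

# Invented records; "sex" and "education" stand in for social-content variables.
cases = [
    {"sex": "F", "education": "University"},
    {"sex": "F", "education": "College"},
    {"sex": "M", "education": "University"},
    {"sex": "F", "education": "University"},
    {"sex": "M", "education": "College"},
]


def cross_classify(rows, row_var, col_var):
    """Count the cases falling in every combination of two categorical variables."""
    return Counter((r[row_var], r[col_var]) for r in rows)


table = cross_classify(cases, "sex", "education")

# Lay the counts out as a small table: one row per sex, one column per education level.
row_labels = sorted({key[0] for key in table})
col_labels = sorted({key[1] for key in table})
print("sex".ljust(6) + "".join(c.ljust(12) for c in col_labels))
for r in row_labels:
    print(r.ljust(6) + "".join(str(table[(r, c)]).ljust(12) for c in col_labels))
```

Each cell of the resulting table is a statistic (a count), and the table as a whole is the kind of structure that gets stored as aggregate data.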


Microdata

This is raw data organized in a file in which each line represents a specific unit of observation and the information on each line consists of the values of variables.

There are different types of microdata files, which will now be discussed.

Confidential Microdata

Master files: these files contain the full detail captured about each case of the unit of observation. This detail is specific enough that the identity of a case can often be disclosed easily. Therefore, these files are treated as confidential.

Confidential Microdata

Share files: these are confidential files in which the participants in the survey have signed a consent form permitting Statistics Canada to allow access to their information for approved research.

These files consist of a subset of the cases in the master file.

Confidential Microdata

In summary, confidential microdata get grouped into two types: master files and share files.

Public Use Microdata

These microdata are specially prepared to minimize the possibility of disclosing or identifying any of the cases in a file, i.e., participants in a survey.

The original data from the master file are edited to create a public use microdata file.

Public Use Microdata

Steps in Anonymizing Microdata
• Remove all personal identification information (names, addresses, etc.);
• Include only gross levels of geography;
• Collapse detailed information into a smaller number of general categories;
• Cap the upper range of values of variables with rare cases;
• Suppress the values of a variable; or
• Suppress entire cases.
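Several of these steps can be sketched in a few lines. This is only an illustration under invented assumptions: the field names, the age groupings, and the income cap are all hypothetical and do not reflect Statistics Canada's actual procedures or thresholds. Suppressing values or entire cases is not shown.

```python
import copy

# Invented master-file records; every field name here is an assumption.
master = [
    {"name": "A. Roy", "address": "12 Elm St", "city": "Red Deer",
     "province": "AB", "age": 37, "income": 48000},
    {"name": "B. Chan", "address": "9 Oak Ave", "city": "Sudbury",
     "province": "ON", "age": 62, "income": 910000},
]

IDENTIFIERS = ("name", "address")
INCOME_CAP = 150000  # illustrative top-code, not an official threshold


def age_group(age):
    """Collapse exact age into a small number of broad categories."""
    if age < 25:
        return "under 25"
    if age < 65:
        return "25-64"
    return "65+"


def anonymize(record):
    """Return an edited copy of one record; the master record is left untouched."""
    out = copy.deepcopy(record)
    for field in IDENTIFIERS:                        # remove personal identifiers
        out.pop(field, None)
    out.pop("city", None)                            # keep only gross geography (province)
    out["age_group"] = age_group(out.pop("age"))     # collapse detail into categories
    out["income"] = min(out["income"], INCOME_CAP)   # cap the upper range of a rare value
    return out


pumf = [anonymize(r) for r in master]
for row in pumf:
    print(row)
```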

Public Use Microdata

Statistics Canada PUMFs

Only available for select social surveys that undergo a review by the Data Release Committee, an internal Statistics Canada committee.

No ‘enterprise’ public use microdata.

Public Use Microdata

Statistics Canada PUMFs

Almost all PUMFs consist of cross-sectional samples, that is, samples where the data have been collected from respondents at one point in time.

Longitudinal samples, where data are collected from the same individuals two or more times, are difficult to anonymize while retaining any useful information.

Synthetic Microdata

These data files have been created by author divisions to assist with the analysis of confidential data files.

The files provide the full variable structure of the confidential microdata but do not contain any real cases.

They are intended to be used by researchers wanting to submit a file of commands in a statistical package’s language for remote job submission.

Synthetic Microdata

They are also being used by those with approved projects in Research Data Centres to help prepare their analysis strategies prior to working in an RDC.

Synthetic files are also commonly referred to as “dummy files,” although a more technical use of this term does exist for this specific type of synthetic file.

Synthetic Microdata

A variety of synthetic file types are being created and tested by author divisions.

One type has no real data but does contain a complete set of real variables. This type is what “dummy file” refers to in its more technical sense.

Another type has a mix of real data but no real cases. The purpose of this type is to provide -- in the aggregate -- results that should be close to an analysis of the real microdata file.
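The first type, a file with real variables but no real data, can be imitated with a short sketch. The variable names and codes below are hypothetical, and random selection of valid codes is an assumption made for illustration; it is not a description of how author divisions actually build their synthetic files.

```python
import random

# Hypothetical variable structure: variable name -> list of valid codes.
# A real file's structure would come from its codebook.
CODEBOOK = {
    "SEX": [1, 2],
    "AGEGRP": [1, 2, 3, 4, 5],
    "SMOKER": [1, 2, 9],
}


def make_dummy_file(codebook, n_cases, seed=0):
    """Generate fake records that carry real variable names and valid codes
    but correspond to no actual respondent."""
    rng = random.Random(seed)
    return [
        {var: rng.choice(codes) for var, codes in codebook.items()}
        for _ in range(n_cases)
    ]


dummy = make_dummy_file(CODEBOOK, n_cases=5)
for case in dummy:
    print(case)
```

A file like this lets a command file be run end to end to catch syntax errors, while any numbers it produces are meaningless.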

Synthetic Microdata

Users of these files must be advised that none of the analytic results from these files should ever be reported. Their only purpose is to help researchers construct their statistical analysis programs to guard against syntax errors that might exist in their setup.

The DLI FTP site clearly distinguishes synthetic files from real microdata files.

Summary: First Model

This first model provides a way of thinking about the types of statistical information that exist.

• Is the information Statistics or Data?
• If Statistics, is the information in print or online?
• If online, is it in an e-pub, e-table, or database?
• If Data, is the information aggregate data or microdata?
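These questions form a small decision tree, which can be written out as a function. The function name and its parameters are invented for this sketch; only the categories come from the chart.

```python
def classify(is_data, *, online=False, online_form=None, unit_level=False):
    """Walk the chart's questions and return the resulting category path.

    is_data:      True for Data, False for Statistics.
    online:       for Statistics, whether the source is online rather than in print.
    online_form:  for online Statistics, one of "e-publication", "e-table", "database".
    unit_level:   for Data, True if each record is a unit of observation (microdata).
    """
    if is_data:
        return ("Data", "Microdata" if unit_level else "Aggregate")
    if not online:
        return ("Statistics", "In print")
    allowed = {"e-publication", "e-table", "database"}
    if online_form not in allowed:
        raise ValueError(f"online_form must be one of {sorted(allowed)}")
    return ("Statistics", "Online", online_form)


print(classify(False, online=True, online_form="e-table"))
print(classify(True, unit_level=True))
```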

The Second Model

It is one thing to know about the variety of statistical information that exists, but access to this information is a separate issue.

The second model describes the various dissemination channels through which access is provided to statistical information by Statistics Canada.

Continuum of Access

Statistics Canada provides access to its statistical information through a variety of services and initiatives that function as dissemination channels.

Think of this variety as constituting a continuum along which levels of access are provided.

Continuum of Access

There are three characteristics that make up this continuum:
• Cost: which runs from free to expensive;
• Restrictions or conditions: which run from open or no restrictions to very restricted; and
• Type of Information: which runs from statistics to data.

Continuum of Access

ACCESS CHANNELS (from Open / Free / Statistics to Restricted / Expensive / Data)

• Statistics Canada Website
• Depository Service Program
• Data Liberation Initiative
• Custom Tabulations
• Remote Job Submission
• Research Data Centres

Statistics Canada Website

Free, Open, Statistics

The Daily is an important source of publicly-released official statistics. It has been available on the Website for several years and was the primary source for free statistics in the early years of the Statistics Canada website.

Statistics Canada Website

Free, Open, Statistics

With the introduction of Community Profiles from the 1996 Census in 2000 and more recent offerings from the Health Statistics Division, the amount of statistics available through this dissemination channel has increased greatly at the national, provincial, CMA, CSD, and Health Region levels.

Depository Service Program

Free, Open, Statistics

The Depository Service Program (DSP) has provided public access to government information for over 75 years. Through a network of public, special, and academic libraries, the Treasury Board has paid Federal Departments to release publications to the public through the DSP.

Depository Service Program

Free, Open, Statistics

Statistics Canada has a large series of publications that it makes available through the DSP. Many of these titles are available online in PDF format and are part of the Statistics Canada Downloadable Publication series.

While these statistical publications are free, the public is required to go to a DSP library to access them.

Data Liberation Initiative

Fee, Licenced, Conditional Access, Data and Statistics

DLI provides a wider range of statistical information than the Statistics Canada Website or the DSP, but access is no longer free and rules govern who is eligible to use these materials.

This is a move away from free-&-open to fees-&-conditional access.

Data Liberation Initiative

Fee, Licenced, Conditional Access, Data and Statistics

DLI provides member institutions in the post-secondary educational sector with access to all “standard data products,” which consist of the statistical databases, public use microdata files, and geography files listed for sale in the Statistics Canada Online Catalogue.

Data Liberation Initiative

Fee, Licenced, Conditional Access, Data and Statistics

Patrons of this service must hold a current affiliation with a member institution and are restricted to using these materials for teaching, scholarly research, or institutional planning.

Furthermore, secondary redistribution of DLI materials is not allowed.

Customized Tabulations

Pay-per-view Access

A long-term dissemination channel within Statistics Canada has been custom tabulation services.

This is a contract service with Statistics Canada to produce tables from surveys or the Census that have not been produced for public release.

Each customized product comes with its own licence.

Remote Job Submission

AKA Remote Data Access (RDA)

This is a relatively new service for a select number of surveys.

The terms of access vary among the author divisions offering this service. Some charge a fee (e.g., access to YITS and PISA is $75 a run), while other divisions do not charge. The Health Statistics Division requires a proposal to access the surveys for which it provides remote job submission.

Remote Job Submission

AKA Remote Data Access (RDA)

Synthetic files have been created to assist with the preparation of the statistical command files that are submitted for remote processing.

An analysis is prepared in the command language of a statistical package supported by the author division (SAS or SPSS, e.g.) and submitted via email to the division.

Remote Job Submission

AKA Remote Data Access (RDA)

All results are screened by the author division for disclosure issues prior to the output being sent to the researcher who submitted the job.

This dissemination channel provides a means of producing analysis from confidential data files with conditional approval and in some instances for a fee.

Research Data Centres

Restricted Access to Confidential Data

Research Data Centres house select confidential data files in a controlled Statistics Canada office environment.

Access is provided on a project-by-project basis.

A SSHRC-administered application process is used to evaluate the proposed use of the confidential data.

Research Data Centres

Restricted Access to Confidential Data

Furthermore, a security clearance with Statistics Canada must be passed.

With approval from both the SSHRC peer review and the security clearance, the members of a research project must undergo an orientation to the RDC, swear an oath to the Statistics Act, and sign a contract with Statistics Canada.

Research Data Centres

Restricted Access to Confidential Data

The advantage of RDC access over Remote Job Submission is that researchers get to work directly with the confidential data source.

Statistical Information available through Statistics Canada

Different Services (ordered along the ACCESS continuum, from Open / Free / Statistics to Restricted / Expensive / Data)

Service: Statistics Canada Website
Who is Eligible & Conditions: General Public: available on the Internet at www.statcan.ca
Products: The Daily; Canadian Statistics; Census; Statistical profiles of Canadian communities; Downloadable publications
Notes: Warning: some parts of the Website are fee-based

Service: Depository Service Program
Who is Eligible & Conditions: Designated DSP Libraries & their Users: available on site
Products: Paper publications; Electronic publications, which includes priced downloadable publications & select CD ROMS
Notes: Some DSP libraries provide off-site access to authenticated users

Service: Data Liberation Initiative
Who is Eligible & Conditions: Post-secondary Academic: restricted to teaching and research purposes
Products: Standard data products: aggregate databases, microdata files and geography files
Notes: Interface to CANSIM I and Trade Analyzer available through CHASS (University of Toronto) by subscription

Service: Cu$tomized Tabulations & Pay per View
Who is Eligible & Conditions: Individuals: contract between STC and individual
Products: Tables from confidential files that are specially produced by Statistics Canada for a fee and access to specialized databases
Notes: Specialized databases include CANSIM II and Trade Analyzer

Service: Remote Job Submission
Who is Eligible & Conditions: Approved Researchers: contract between STC and individual
Products: “Dummy” or synthetic files to build analysis setups that must then be submitted to Stats Can for processing
Notes: Services available for selected titles. Remote job submission is the most developed for NPHS.

Service: Research Data Centres
Who is Eligible & Conditions: Approved Researchers: SSHRC peer review & deemed STC employee
Products: Confidential data files from the longitudinal surveys begun in the 1990’s
Notes: Applications can now be submitted through the SSHRC Web site.

Using the Two Models

Combining these two models should assist you in identifying and selecting appropriate statistical information.

The types of statistical information should help you identify an appropriate product, while the continuum of access should help you locate the channel or channels through which the statistical information is disseminated.
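One way to picture the combination is a lookup that starts from a product type and returns the channels that carry it, most open first. The product labels and the mapping below are a simplified reading of the services described in this presentation, chosen for illustration; they are not an official or complete catalogue.

```python
# Channels listed from open/free to restricted/expensive. The product sets are
# a simplified, assumed summary of what each channel offers.
CHANNELS = [
    ("Statistics Canada Website", {"statistics"}),
    ("Depository Service Program", {"statistics"}),
    ("Data Liberation Initiative",
     {"statistics", "aggregate data", "public use microdata"}),
    ("Customized Tabulations", {"custom tables"}),
    # Remote Job Submission returns screened output rather than the file itself.
    ("Remote Job Submission", {"confidential microdata"}),
    ("Research Data Centres", {"confidential microdata"}),
]


def channels_for(product):
    """Return the channels that offer the product, most open channel first."""
    return [name for name, products in CHANNELS if product in products]


print(channels_for("public use microdata"))
print(channels_for("confidential microdata"))
```

The first model supplies the argument (what kind of product is wanted) and the second model supplies the ordering of the answers.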

Using the Two Models

Hopefully, you will find this framework useful in your data reference interviews, which is a separate topic in this orientation, and in navigating the DLI FTP site for various statistical information.

Warning

Remember that while Statistics Canada is an important source of statistical information in our country, it is not the only source.

Other important sources include other federal government and provincial departments, data libraries and archives, non- & inter-governmental agencies, and commercial vendors.

