    Wiley and International Statistical Institute (ISI) are collaborating with JSTOR to digitize, preserve and extend access to International Statistical Review / Revue Internationale de Statistique.

    http://www.jstor.org

    Making Statistical Data More Available

    Author(s): Bo Sundgren
    Source: International Statistical Review / Revue Internationale de Statistique, Vol. 64, No. 1 (Apr., 1996), pp. 23-38
    Published by: International Statistical Institute (ISI)
    Stable URL: http://www.jstor.org/stable/1403422
    Accessed: 01-02-2016 18:58 UTC

    Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp

    JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected].

    This content downloaded from 136.145.187.76 on Mon, 01 Feb 2016 18:58:20 UTC
    All use subject to JSTOR Terms and Conditions


    International Statistical Review (1996), 64, 1, 23-38, Printed in Mexico
    © International Statistical Institute

    Making Statistical Data More Available

    Bo Sundgren

    Statistics Sweden, S-115 81 Stockholm, Sweden

    Summary

    Will statistical offices be able to meet new challenges from the users to make statistical data more available by means of modern technology? Can they do this within existing budget restrictions, and with due consideration to the interests of data providers? These are questions addressed here. Problems and opportunities are illustrated by examples from Sweden.

    Key words: Statistics production; Official statistics; Data dissemination; Metadata; Standard interfaces; Standardized software; System development; Confidentiality; Statistical databases; Statistical information systems.

    1 New Challenges for Statistics Producers

    Statistics producers in national statistical offices are facing new expectations, demands, and requirements from several directions:

    * from statistics users, who want faster, easier, and less expensive access to statistical data, through media and routines that are better adapted to their own processing needs;
    * from data providers, who demand less burdensome reporting through media and routines that are better adapted to their own information systems;
    * from governments and tax-payers, who want "more value for less money";
    * from international organisations, requesting member countries to provide timely, comparable, good quality statistics, which comply with international standards.

    Technological progress is taking place as rapidly as ever. All the above-mentioned stake-holders in statistics production expect statistics producers to take full advantage of advances in technology.

    This paper will discuss how statistics producers can respond to some of the challenges. The paper focuses on how statistical offices can make statistical data more available to statistics users, while satisfying restrictions given by scarce resources and the willingness of data providers to co-operate.

    2 User-Orientation and User-Friendliness

    There is a need to review the concepts of user-orientation and user-friendliness. It has become a widely accepted dogma that information should be user-oriented and user-friendly. All information system designers pay lip service to this dogma. To be fair, most designers sincerely believe they are developing systems characterised by user-orientation and user-friendliness, although they have long since stopped thinking more deeply about the meaning of these concepts.

    In the early days of computer usage, that is in the 1960s, the direct user of a computer had to be a computer programmer. Since most computer applications in those days were mathematically oriented (as suggested by the word "computer" itself), it meant a step forward from the user's point of view when the user/mathematician could communicate with the computer by means of mathematical formulae (like in FORTRAN) rather than having to program in machine code or assembler languages. The programming language COBOL meant a similar step forward for users/programmers oriented towards administrative applications.

    In a statistical office there are numerous information systems applications of more or less the same kind: statistics production. As systematised by figure 1, a statistical production process includes a number of very typical functions like frame administration, sampling, data collection, data entry, coding, editing, estimation, tabulation, analysis, and presentation. In the late 1960s there were few other organisations, if any, which had a similar opportunity to exploit economies of scale in the development of computer applications. Thus, not surprisingly, statistical offices became pioneers in the development of generalised software. These software products often supported high-level, non-procedural command languages, which enabled non-programmers to develop applications within a certain application area by simply specifying

    (i) the input data to the application, e.g. a so-called flat file with a certain record layout; and
    (ii) the requested output from the application, e.g. a statistical table with certain contents and a certain layout.

    The variability of applications developed with tools of this type has to be relatively limited. This condition is satisfied by the functions corresponding to production steps of a typical statistical survey.

    The high-level, non-procedural command languages represented a certain degree of end-user orientation in a computing environment that was based upon mainframe computer centres operated as closed shops and in batch mode.

    In the early 1970s user-orientation and user-friendliness became more or less synonymous with person/computer interaction through menu-driven information systems. Certainly these systems helped to bridge the gap between the computer and its non-programmer end-users. Nevertheless it was still very much the computer that controlled the user rather than the other way around. The user could choose his route through the hierarchy implied by the menus of the menu-driven system, but he could not affect the hierarchy as such, and he had to go through the hierarchy level by level in a rather rigid way.

    The introduction of powerful, inexpensive micro-computers in the beginning of the 1980s added several new dimensions to the concepts of user-orientation and user-friendliness. First of all the new technology meant that the closed mainframe shops could be closed for good as far as many of the users were concerned. The users suddenly found themselves in control of computer resources in much the same way as they already were in control of other resources necessary for their daily work. The computer became demystified. Furthermore, the new technology finally enabled the user to take control of the computer rather than the other way around. This possibility materialised in the windowing techniques pioneered by Xerox, followed up by Apple, and successfully mass-marketed by Microsoft.

    Today practically every user of statistics is a user of computers as well. He has his own computer in the office, at home, and when travelling. He demands to choose whatever software he prefers to retrieve, process, and analyse statistical data. Through standardised network services (in his own office as well as world-wide) he is able to communicate and co-operate with other human beings and other computers, and he is able to do this very much on his own conditions.

    Naturally, in this situation there is not, and cannot be, a single concept of user-orientation and user-friendliness. Different users have different needs, different resources, and different preferences. There is indeed a wide variety of user profiles, as suggested by figure 2. It would be futile for a statistical office to try and satisfy all these different requirements with one and the same notion of user-orientation and user-friendliness. On the other hand, it would be equally futile to try and tailor specific products and services for each potential user of statistics.

    The challenge for a modern statistical office is to offer a multitude of products and services ranging from

    * simple free-of-charge products based on self-service; over
    * standard, off-the-shelf product/service packages charged according to price-lists; to
    * sophisticated, tailor-made services provided to individual customers on the basis of tenders.

    Figure 1. A functionally oriented model of a statistical information system. (Diagram not reproduced. Its three main blocks are input acquisition, aggregation, and output delivery. Legible function labels include survey preparation, frame preparation, sampling, data collection, data entry, coding, data editing, estimation, tables, graphs, traditional publications, online databases, and other electronic media.)

    3 Standard Interfaces: Decreased Complexity and Increased Flexibility

    It is a challenge for a modern statistical office to be responsive to expectations, demands, and requirements from an ever more dynamic environment. Society itself, which is to be reflected by statistical data, is changing at an ever faster rate. This leads to needs for more variability, more flexibility, on the input side as well as on the output side of statistical information systems managed by statistical offices.

    In order to manage requirements for greater variability in the exchange of data with the external world, and in order to do this with the same or even fewer financial resources, a statistical office must consider system level actions. It is not enough just to do more of the same thing or to "run faster". It is necessary to undertake more drastic redesign actions. Making more extensive and more systematic use of standard interfaces is one kind of action that may lead to desirable system changes. Such actions may lead to a combination of the following two consequences:

    * a drastic decrease in the complexity of data exchange between statistical information systems and their environments as well as between the internal components of the individual statistical information systems themselves;
    * a drastic increase in the (actual or potential) variability and flexibility in the (external and internal) behaviour of the statistical information systems.

    Both types of consequences are highly desirable. Figure 3 from Malmborg & Sundgren (1994) illustrates the differences in terms of complexity and variability between

    * a situation where two sets of systems interact directly in the absence of a standard interface (figure 3a); and
    * a situation where the same two sets of systems interact via a standard interface (figure 3b).

    In the situation illustrated by figure 3a, the interaction format will have to be negotiated for each combination of systems that need to interact. This will typically lead to many different, tailor-made interaction formats that require a lot of resources to develop and maintain. The situation is inconvenient from an operational point of view as well, since every individual actor will have to remember different interaction formats for different interaction partners. If a new system is added to either of the two sets of systems, a new interaction format will have to be negotiated for each other system with which the new system needs to interact.

    In the situation illustrated by figure 3b, every system will need to develop, maintain, and operate one single interaction process, the interaction with the standard interface. Through this process, every system will be able to communicate with all other systems, including systems that do not yet exist but will be introduced later. Thus, in comparison with the situation in figure 3a, this situation is both less complex (to develop, maintain, and operate) and more flexible vis-à-vis growth and other changes in the system environment.
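The difference between the two situations can be made concrete with a small count. The following Python sketch is only an illustration, not part of the original paper: it assumes the worst case in which every system in one set needs to interact with every system in the other set, and the system names and function names are hypothetical.

```python
from itertools import product


def pairwise_formats(providers, consumers):
    """Figure 3a: one negotiated format per (provider, consumer) pair.

    Assumption: every provider needs to interact with every consumer.
    """
    return list(product(providers, consumers))


def hub_adapters(providers, consumers):
    """Figure 3b: one interaction process per system, towards the standard interface."""
    return [(system, "standard interface") for system in list(providers) + list(consumers)]


# Hypothetical sets of systems.
providers = ["provider_1", "provider_2", "provider_3"]
consumers = ["user_1", "user_2", "user_3", "user_4"]

direct = pairwise_formats(providers, consumers)
via_hub = hub_adapters(providers, consumers)
print(len(direct), len(via_hub))  # 12 formats versus 7 interaction processes

# Adding one new system to one of the sets.
direct_after = pairwise_formats(providers, consumers + ["user_5"])
hub_after = hub_adapters(providers, consumers + ["user_5"])
print(len(direct_after) - len(direct))  # 3 new formats to negotiate
print(len(hub_after) - len(via_hub))    # 1 new interaction process
```

With m systems in one set and n in the other, direct interaction needs up to m × n formats, whereas a standard interface needs m + n interaction processes, and a newly added system costs exactly one more.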

    Figure 4 indicates a number of places where a statistical information system could and should contain well designed, preferably standardised interfaces. One may distinguish between

    * external, inter-system interfaces; and
    * internal, intra-system interfaces.

    External interfaces are interfaces between, on the one hand, the statistical information system under consideration and, on the other hand

    * statistics users: human end-users as well as other (statistical) information systems; these are output-oriented interfaces;
    * data providers: human respondents as well as other (administrative) information systems; these are input-oriented interfaces.

    User categories (columns): Ministry of finance; Researcher/scientist; Analyst, public sector; Analyst, private sector; Actor on the finance market; International organisation.

    Characteristics (rows):
    * Competence: subject matter; statistical; EDP
    * Knowledge about relevant data sources: broad; deep
    * Quality requirements: contents; accuracy; availability
    * Needs for search systems, documentation, and metainformation
    * Resources: hardware; software; expertise; money; trading objects

    Figure 2. A scheme for analysing the profiles of different categories of stati…

    An example of an output-oriented standard interface for statistical information systems is the GESMES format for representation of statistical macroinformation and accompanying meta-information. GESMES stands for "GEneric Statistical MESsage", and the standard is developed by the UN/EDIFACT Message Development Group 6.1. Similarly, on the input side, there are several UN/EDIFACT standard formats corresponding to typical documents of different branches of activity in society, e.g. trade. A generic standard for input messages to statistical information systems is the Raw Data Reporting Message; see UN/EDIFACT (1994).

    By providing a statistical information system with standardised external interfaces, the designer makes the system open and easy to integrate with other systems, e.g. the local systems of users and providers of statistical data. This is indeed a practical application of the theoretical principles illustrated in figure 3 above.

    By accepting data and metadata through standardised interfaces, a statistics producer makes it easier for respondents to provide statistical raw data as a natural side effect of their own administrative routines. Analogously, by making (aggregated or anonymised) data and metadata available through standardised interfaces, a statistics producer makes it easier for statistics users to integrate statistical data from the statistics producer with the user's own (statistical and administrative) data for analyses and decision-making.

    Statistical offices began to realise the importance of standardised internal interfaces, at least implicitly, when they started to exploit the benefits of generalised software on a large scale in the middle of the 1970s. As long as statistical information systems were completely tailor-made by professional programmers, who were using procedural programming languages, there was not a strong enough incentive to define and use standardised interfaces between software components. It was up to the individual programmer to define suitable data structures as well as formats and procedures for data interchange.

    When generalised software products gained in popularity, much on the initiative of non-programmers, one problem was the enormous variability in data structures and data interchange formats and procedures that were exhibited by existing applications and data files. At first it was considered whether the generalised software tools should be developed further in order to make them capable of handling this variability. It was soon realised that this would be a Sisyphean task. Instead some statistical offices decided to standardise data structures on the basis of the concept of a "flat file", that is, a file containing only one record type, adhering to a record layout with a fixed number of fields containing the (single) values of the attributes, or variables, of one particular instance of a certain object type, e.g. a person, a household, or an enterprise. Multiple record types, hierarchical records, and repeating groups were among the data structure phenomena that were banned in this standardisation process.
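As an illustration of the flat file concept, the following Python sketch reads a file with one record type and a fixed record layout. It is a minimal sketch under stated assumptions: the layout, the field names, and the sample records are all hypothetical and are not taken from any system described in the paper.

```python
from typing import NamedTuple


class Field(NamedTuple):
    name: str
    start: int  # zero-based offset of the first character of the field
    width: int  # number of characters in the field


# Hypothetical record layout for the object type "person":
# a fixed number of fields, each holding a single value.
PERSON_LAYOUT = [
    Field("person_id", 0, 6),
    Field("birth_year", 6, 4),
    Field("region", 10, 2),
    Field("income", 12, 8),
]


def parse_record(line, layout):
    """Split one fixed-width record into a dict of field name to value."""
    record_length = max(f.start + f.width for f in layout)
    if len(line) != record_length:
        # Every record must adhere to the single record layout.
        raise ValueError(f"expected {record_length} characters, got {len(line)}")
    return {f.name: line[f.start:f.start + f.width].strip() for f in layout}


def parse_flat_file(text, layout):
    """Parse all records; every line describes one instance of the same object type."""
    return [parse_record(line, layout) for line in text.splitlines() if line]


# Two hypothetical person records, 20 characters each.
sample = (
    "0000011961AB  250000\n"
    "0000021975CD  310500\n"
)

records = parse_flat_file(sample, PERSON_LAYOUT)
print(records[0])
```

Because every record has the same layout, any tool that knows the layout can read the file, which is what allowed different software products to exchange data without pairwise agreements.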

    This standardisation of data structures and data interchange can be seen as a first step towards database-oriented information systems. Technically speaking, there was no physical database visible in those systems, where data were stored and exchanged in sequential files stored on magnetic tapes. Nevertheless the flat file standard started to play the same role as the relational data model (with the SQL interface) has in today's database-oriented systems. Different processes, controlled by different generalised or tailor-made software products, exchanged data as flat files, within and between statistical information systems. The generalised software products were often developed within the statistical offices themselves, but the same principles could easily be applied to commercial software as well. In fact commercial software could very seldom handle more complex data structures than flat files anyhow.

    Figure 3a. One way of organising the interaction between two sets of systems.

    Figure 3b. Interaction between two sets of systems via a standardised interface.

    Figure 4. A database-oriented statistical information system with clearly defined internal and external interfaces. (Diagram not reproduced. Legible labels include observation providers, statistics users, administrative systems, primary statistics production, secondary statistics production, retrieval mechanisms and global metadata, base registers, observation registers, statistics collections, microdata and metadata, macrodata and metadata, the database of a statistical office, and other statistical information systems.)

    In a modern statistical information system the relational data model and the SQL standard for data interchange between application software and the database management system are obvious choices for internal interfaces. All commercial software products that want to survive on the market have to adhere to these standards. Another de facto standard (though limited to PC software) is Microsoft's Object Linking and Embedding (OLE) for transferring data and control between different software components.

    Figure 5 indicates how the different functions of a statistical information system (cf. figure 1) could be designed to interface with the database, including microdata, macrodata, and metadata.
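The idea of functions that communicate only through the database, with SQL as the internal interface, can be sketched as follows. This is a hedged illustration, not a description of any actual system at Statistics Sweden: it uses Python's standard `sqlite3` module with an in-memory database, and the table names, column names, and observations are assumptions made for the example.

```python
import sqlite3

# In-memory database standing in for the statistical database.
conn = sqlite3.connect(":memory:")

# Hypothetical tables: one for microdata (observations), one for macrodata (statistics).
conn.execute(
    "CREATE TABLE microdata (person_id INTEGER PRIMARY KEY, region TEXT, income INTEGER)"
)
conn.execute(
    "CREATE TABLE macrodata (region TEXT PRIMARY KEY, persons INTEGER, mean_income REAL)"
)


def input_acquisition(conn, observations):
    """Store incoming observations as microdata."""
    conn.executemany(
        "INSERT INTO microdata (person_id, region, income) VALUES (?, ?, ?)",
        observations,
    )


def aggregation(conn):
    """Derive macrodata from microdata; the function only talks to the database."""
    conn.execute("DELETE FROM macrodata")
    conn.execute(
        "INSERT INTO macrodata (region, persons, mean_income) "
        "SELECT region, COUNT(*), AVG(income) FROM microdata GROUP BY region"
    )


def output_delivery(conn):
    """Retrieve the statistics for presentation."""
    return conn.execute(
        "SELECT region, persons, mean_income FROM macrodata ORDER BY region"
    ).fetchall()


# Hypothetical observations.
input_acquisition(conn, [(1, "AB", 250000), (2, "AB", 350000), (3, "CD", 310500)])
aggregation(conn)
table = output_delivery(conn)
print(table)
```

None of the three functions calls another directly. Each could be replaced by a different software product, as long as the replacement reads and writes the same tables through SQL.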

    No standards are for ever. Maybe in five or ten years' time today's de facto standards will have been replaced by others, e.g. a widely accepted standard for object-oriented database management. This is not a great problem. It is relatively simple to move from one standard to another. It is much more difficult to live in a non-standardised situation, and to make the first-time move to a standard. Nor does it matter very much if standards are formally agreed upon by standardisation bodies. What is critical is that standards should neither prevent software manufacturers from taking part in competition, nor force software users to be faithful to any particular hardware or software vendor.

    [Figure 5 labels: observation providers; statistics users; administrative systems; input acquisition; aggregation; output delivery; management of data and metadata; base registers; observation registers; statistics collections; code registers; microdata and metadata; macrodata and metadata; statistical database. Flows: observations, register data, anonymous statistics, microdata, macrodata, metadata.]

    Figure 5. A functionally oriented model of a database-oriented statistical information system.

    4 Standard Components: Off-the-Shelf Software

    Statistical offices were among the first companies and organisations to make systematic use of standard components (e.g. generalised software) in the development of information system applications. Already during the sixties statistical offices started to use commercially available and/or in-house developed statistical packages for common statistical operations like data editing, tabulation, and statistical analysis. During the seventies some statistical offices could start reducing the number of application programmers, encouraging subject matter statisticians to develop (part of) their own applications by means of high-level, non-procedural, generalised software tools. This development was intensified during the eighties.

    With the advent of inexpensive PC technology and software, the boundary between user programming and professional programming has become blurred, in statistical offices as well as in the data processing community at large. Major companies are closing down their central application development departments, advising business departments to use ready-made software packages for auxiliary functions, and to put together business-critical applications from software components that can be bought off-the-shelf from commercial software vendors.

    Welke (1994) has predicted that we shall see a paradigm shift in how information systems are typically developed:

        There is a fundamental paradigm shift underway in how (information) systems and the software which supports them, is developed. The shift is away from a craft-based structure in which user requirements are specified and custom solutions developed, to a market-product based approach in which the users themselves select and arrange meaningful-to-them components as a solution to their requirements.

    The paradigm shift is likely to imply an even greater future for such things as

    * inexpensive, generalised software, available off-the-shelf
    * tool-boxes containing generalised standard components
    * rapid application development (RAD) methods and tools.

    In connection with RAD, it should be noted that tools for Computer-Assisted Systems Engineering (CASE) are likely to become more domain-specific than today. Jackson (1994) has articulated the importance of domain-specific knowledge for software development:

        The large aspiration to place the whole of software development ... as one more branch of engineering is misconceived. Our aspiration should be to develop specialised branches of software engineering ...

        ... there are no casual builders of cars or bridges. But in software development it is not easy to draw a clear line between the casual developer and the serious, professional developer. As a result, ... software development is still largely an amateur activity in a very important sense.

    5 Metadata

    There are many potential users of statistical data in a modern society. Many of them have the competence as well as the hardware and software resources needed to take full responsibility for their own usage of statistical data. They are eager, and sometimes impatient, to exploit the information potential of statistical offices, and to do this on their own conditions, as far as permitted by confidentiality restrictions. One major obstacle, which often prevents them from doing so, is the present inadequacy of available metadata, that is, the absence or inadequacy of systematic descriptions of statistical data and the processes behind them.

    A (potential) user of statistical data will need metadata for three major purposes:

    1. searching for potentially relevant and useful statistical data;
    2. evaluating the adequacy of available data and the cost/benefit of using them;
    3. retrieving, interpreting, and analysing statistical data.

    First, statistical metadata are needed as a basis for search operations. The (potential) user is looking for statistical data that could be relevant and useful for him in describing, analysing, or solving a certain problem. The traditional approach is for the user to turn to a statistical office. Staff members of statistical offices are often very helpful, but today this approach is not sufficient. There are far too many potential users for any statistical office to cope with face-to-face. In addition, many users need to combine statistical data (and other data) from several sources, and no particular staff member, or even organisational unit, of a statistical office will have the necessary overview. Moreover, manual help-functions are relatively expensive and slow, even if they are computer-assisted.

    Today a user will expect the metadata needed for search tasks to be organised and disseminated in such ways that he himself can search for relevant data on the basis of widely available, computerised metadata. The process may start from a relatively vaguely expressed information need. The computerised, metadata-supported process should help the user to better understand his own needs, and it should result in explicit references to available statistical data, which are likely to be relevant for the user's problem.

    Second, once the user has identified some statistical data of potential relevance for his problem, he will have to determine whether the data are really adequate for the intended purpose. This means that the user has to evaluate the quality of the data, and to consider whether it is really worth the effort and cost to retrieve, interpret, and analyse the data.

    Third, if and when the user has come to the conclusion that certain available data are of sufficient quality to justify the efforts and costs to use them, he will need metadata in order to actually retrieve, interpret, and analyse the data. Retrieval may be accomplished by downloading data and accompanying metadata to the user's own PC or by obtaining a disk or CD-ROM copy. Interpretation and analysis will require the same kind of metadata as were needed for making the preliminary judgement of the quality of the data. However, at this stage it may be necessary to obtain deeper and more precise information about how the data were collected and processed, before they resulted in the available statistics.

    The documentation templet in figure 6 identifies metadata items that are desirable or even necessary as a basis for responsible usage of statistical data emanating from a particular statistical survey. If appropriately compiled with the corresponding metadata for other surveys they may also serve as a basis for search operations. The survey documentation templet is part of the documentation system SCBDOK, developed by Statistics Sweden. (See also Sundgren 1991a, 1991b, 1992, 1993a, 1993b).

    It is an equally important task for a statistical office to produce metadata concerning its surveys as to produce the survey data themselves. In order to be able to accomplish this task in an efficient way, the statistical office must carefully design its metadata flows. Metadata should be captured when they naturally arise for the first time, e.g. as the result of a design decision. At later stages it should be possible to have them automatically transferred and transformed when survey data are transferred or transformed. Furthermore, it should be possible to have the metadata consistently updated, when the survey processes are changed, e.g. as the result of new design decisions.
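
    The idea that metadata should be transferred and transformed together with the data can be illustrated by a minimal sketch. The `Dataset` class and the `aggregate` function below are hypothetical names introduced only for this illustration; they are not part of SCBDOK or of any Statistics Sweden system. The sketch assumes that every processing step returns new data together with updated variable descriptions and an extended processing history.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Survey data bundled with the metadata describing them (hypothetical structure)."""
    records: list                                 # one dict per observed object
    variables: dict                               # variable name -> verbal description
    history: list = field(default_factory=list)   # processing steps, oldest first


def aggregate(ds, by, measure):
    """Sum `measure` within each value of `by`; the metadata are transformed alongside the data."""
    totals = {}
    for rec in ds.records:
        totals[rec[by]] = totals.get(rec[by], 0) + rec[measure]
    new_records = [{by: key, measure: total} for key, total in sorted(totals.items())]
    new_variables = {
        by: ds.variables[by],
        measure: "Sum of: " + ds.variables[measure],
    }
    # The input history is copied and extended, never modified in place.
    new_history = ds.history + [f"aggregated {measure} by {by}"]
    return Dataset(new_records, new_variables, new_history)


# Invented example data, for illustration only.
micro = Dataset(
    records=[
        {"region": "North", "turnover": 10},
        {"region": "North", "turnover": 5},
        {"region": "South", "turnover": 7},
    ],
    variables={"region": "Region of the enterprise", "turnover": "Annual turnover"},
    history=["collected by questionnaire"],
)
macro = aggregate(micro, by="region", measure="turnover")
print(macro.records)
print(macro.variables["turnover"])
print(macro.history)
```

    Because the description and the history are produced by the same function that produces the aggregated figures, a later change to the processing step changes the metadata in the same place.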

    The metadata describing a statistical survey and its data outputs are a combination of formalised metadata, e.g. code lists and record descriptions, and free-text metadata like verbal descriptions of variables and processes. Thus software systems for handling statistical metadata may require different types of software components to be combined, e.g. relational database management systems and software for managing and searching large amounts of text data. Hypertext software (like in advanced help functions and high-level Internet-tools) will also have a great potential for enabling the users to navigate and associate in available statistical data and metadata and to process them in efficient and intelligent ways.
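
    As a rough sketch of how formalised and free-text metadata might be combined in one search, the following example keeps both in a small relational table and filters on either. The table layout, the survey and variable names, and the `search` function are all assumptions made for this illustration. The example uses Python's standard `sqlite3` module and a plain `LIKE` match, which stands in for the dedicated text-search software the paragraph above has in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variable (
        survey      TEXT,
        name        TEXT,
        unit        TEXT,   -- formalised metadata
        description TEXT    -- free-text metadata
    );
""")
# Invented rows, for illustration only.
conn.executemany(
    "INSERT INTO variable VALUES (?, ?, ?, ?)",
    [
        ("Labour Force Survey", "hours_worked", "hours",
         "Hours actually worked during the reference week."),
        ("Labour Force Survey", "employed", "code",
         "Employment status according to the survey's code list."),
        ("Business Survey", "turnover", "SEK",
         "Annual turnover of the enterprise, excluding value added tax."),
    ],
)


def search(conn, keyword, unit=None):
    """Return (survey, variable) pairs whose description mentions `keyword`,
    optionally restricted to a formal unit of measurement."""
    sql = "SELECT survey, name FROM variable WHERE description LIKE ?"
    params = ["%" + keyword + "%"]
    if unit is not None:
        sql += " AND unit = ?"
        params.append(unit)
    sql += " ORDER BY survey, name"
    return conn.execute(sql, params).fetchall()


print(search(conn, "turnover"))
print(search(conn, "week", unit="hours"))
```

    The result of each call is a list of explicit references (survey, variable) that a user could follow up, which is the outcome the search process is meant to produce.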


    DOCUMENTATION TEMPLET FOR A STATISTICAL SURVEY

    0 Administrative information
      0.0 Documentation templet
      0.1 Survey name and identification, organisation and persons responsible
      0.2 Documentation modules and subsystems
      0.3 Archived data sets and published statistics
      0.4 References to other relevant documentation

    1 Survey contents
      1.1 Domain of interest and target domain, verbal description
      1.2 Target domain, formal description
        1.2.1 Target objects, description and object graph
        1.2.2 Target populations
        1.2.3 Target variables
      1.3 Survey outputs
        1.3.1 Structured overview of the tabulation plan
        1.3.2 Publications in printed form
        1.3.3 Electronic distribution
        1.3.4 Database storage

    2 Survey plan
      2.1 Frame procedure and observation objects
        2.1.1 Overview
        2.1.2 Frame and its links to objects
        2.1.3 Frame production
        2.1.4 Overcoverage and undercoverage
      2.2 Sampling procedure (if applicable)
      2.3 Data collection procedure
        2.3.1 Observation objects, description and object graph
        2.3.2 Data sources, including contact procedures
        2.3.3 Observation variables and measurement instruments
        2.3.4 Interruptions (including actions at overcoverage)
        2.3.5 Non-response actions
      2.4 Planned data preparation (coding, data entry, editing and correction)
      2.5 Planned observation register
        2.5.1 Overview
        2.5.2 Object types, including derived object types
        2.5.3 Object graph
        2.5.4 Object/variable-matrixes, including derived variables
        2.5.5 Data set descriptions
        2.5.6 Derivation procedures (in complicated cases)

    3 Completed data collection
      3.1 Frame production
      3.2 Sampling
      3.3 Data collection
        3.3.1 Communication with the data providers
        3.3.2 Measurements, experiences of instruments
        3.3.3 Interruptions/overcoverage, actions taken
        3.3.4 Non-response, causes and actions taken
        3.3.5 Editing and correction at data collection time
      3.4 Data preparation (coding, data entry, editing and correction)
      3.5 Production of final observation register
        3.5.1-3.5.2 Treatment of interruption/overcoverage objects and of non-response objects [partly illegible]
        3.5.3 Treatment of partial non-response
        3.5.4 Frequency counts of overcoverage, responses, non-responses etc
        3.5.5 Completed derivations of derived objects and variables

    4 Statistical processing and presentation
      4.1 Observation models
        4.1.1 Sampling
        4.1.2 Non-response
        4.1.3 Measurement/observation
        4.1.4 Frame coverage
        4.1.5 Total model
      4.2 Population models
      4.3 Computation formulae for estimations
        4.3.1 Point estimations
        4.3.2 Estimations of sampling errors (variance estimations)
        4.3.3 Estimation/judgment of other quality characteristics
      4.4 Analyses
      4.5 Presentation and dissemination procedures

    5 Data processing system
      5.0 System overview
        5.0.1 Verbal description
        5.0.2 System flow
      5.1* Subsystem description
        5.1.1 Overview
          5.1.1.1 Verbal description
          5.1.1.2 System flow
        5.1.2 Component descriptions
          5.1.2.1 Data sets
          5.1.2.2 Processes
          5.1.2.3 Other components

    6 Log-book

    Figure 6. Documentation templet for a statistical survey and its production system.


    6 Confidentiality

    Statistical data can only be made available to the users within the limitations of certain confidentiality restrictions. The most fundamental purpose of these restrictions is to preserve the data provider's confidence in the statistics producer's willingness and ability to ensure that data submitted to a statistics producer will be used for statistical purposes only. Among other things the statistics producer must be able to ensure that statistical outputs will not, thanks to the input submitted, directly or indirectly, enable a statistics user to associate sensitive information with the data provider or anyone whom the data provider would like to protect.

    Statistical confidentiality can only be ensured by a combination of technical and legislative actions. Advanced statistical and mathematical methods alone will never be sufficient, however sophisticated they may be. This has been clearly demonstrated by massive research efforts during the last 25 years. Basically, statistical confidentiality is about confidence. A data provider, who does not trust a particular statistics producer, will not change his mind just because the statistics producer promises to apply a perfectly safe statistical method, if there were such a method (which there is not).

    An adequate combination of technical and legislative rules for protecting the confidentiality of statistical data could be something along the following lines:

    * It should be forbidden by law to use data submitted to a statistics producer for other than statistical purposes.
    * Data submitted to a statistics producer for statistical purposes should be protected against sabotage, theft, and intrusion by physical and technical measures. Data that are associated with identified subjects (persons or organisations) must be handled only by authorised persons, sworn in by the statistical office.
    * Statistical data must be anonymised (microdata) or aggregated (macrodata) before they can be distributed to users outside the statistical office. Anonymised microdata and aggregated macrodata must be checked by the statistics producer, so that they do not contain obvious disclosures of sensitive data for individual, easily identifiable subjects (persons, enterprises and other organisations). A disclosure is obvious if it does not require any conscious effort.
    * It should be forbidden by law to make any conscious efforts to derive sensitive data about identified, individual subjects from statistical data.
    * It should always be less attractive for a potential intruder, who considers all costs and benefits, to obtain information about identified subjects from protected statistical data than to obtain the same information from some other source.
    * Statistical data that are not accompanied by adequate documentation (metadata) should be destroyed.
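
    The check for obvious disclosures in aggregated macrodata can be sketched as follows, under the assumption that a cell is treated as risky when it rests on fewer than a fixed number of contributors. The threshold of three, the function names, and the sample records are assumptions made for illustration and are not taken from the paper or from any statistical office's rules. In line with the argument above, such a technical check is only one ingredient and is not sufficient on its own.

```python
from collections import Counter


def risky_cells(microdata, by, min_contributors=3):
    """Return the values of `by` whose aggregate would rest on fewer than
    `min_contributors` subjects (the threshold is an illustrative assumption)."""
    counts = Counter(rec[by] for rec in microdata)
    return sorted(cell for cell, n in counts.items() if n < min_contributors)


def publishable_totals(microdata, by, measure, min_contributors=3):
    """Aggregate `measure` by `by`, replacing every risky cell with None."""
    risky = set(risky_cells(microdata, by, min_contributors))
    totals = {}
    for rec in microdata:
        totals[rec[by]] = totals.get(rec[by], 0) + rec[measure]
    return {cell: (None if cell in risky else total) for cell, total in totals.items()}


# Invented example data: one mining enterprise would be trivially identifiable.
enterprises = [
    {"industry": "mining", "turnover": 900},
    {"industry": "retail", "turnover": 40},
    {"industry": "retail", "turnover": 55},
    {"industry": "retail", "turnover": 30},
]
print(risky_cells(enterprises, by="industry"))
print(publishable_totals(enterprises, by="industry", measure="turnover"))
```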

    7 Experiences from Statistics Sweden

    This paper has pointed to a number of problems and opportunities that need to be tackled by a statistics producer, who wants to make statistical data more available to a user, while satisfying restrictions given by scarce resources and the willingness of data providers to co-operate. The topics covered were:

    * the fuzzy concepts of user-orientation and user-friendliness
    * standard interfaces as instruments for simplicity and flexibility
    * standard, off-the-shelf software components as instruments for speedy and inexpensive application development
    * good quality metadata enabling the user to retrieve and process data independently of the producer
    * technical and legislative measures for protecting the confidentiality of statistical data.


    Statistics Sweden is an example of a statistical agency, which has been working very actively in all these areas over the last three decades. In the late 1960's and early 1970's Statistics Sweden developed the TAB68 suite of high-level, non-procedural software products. These tools, which covered many important production steps, e.g. editing and tabulation, became extensively used at Statistics Sweden, first by non-programmers and then (after some initial hesitation) even by the programmers themselves. Many production systems are still heavily dependent on these software products.

    After gaining important experiences from using the Canadian time series database system, CANSIM, Statistics Sweden developed its own AXIS system for making cross-sectional data as well as time series data available on-line to internal and external users. The system was put into regular operation in 1976, and it is still running successfully, although many users now demand data to be made available in many other ways than through relatively expensive and rigid mainframe communication. During the next few years the system will be phased out, and a new, client/server based system will be phased in. The new system is entirely PC based; it makes extensive use of standard interfaces, e.g. SQL and GESMES, as well as a wide range of off-the-shelf software products, favoured by internal and external users.

    Figure 7 illustrates how the new statistical database system at Statistics Sweden is intended to co-operate with the survey-based production system within a client/server framework. The new database system will make available a lot of aggregated macrodata (time series as well as cross-sectional), some anonymised microdata, and the metadata needed for efficient searching and responsible interpretation and analysis by external users. Microdata and macrodata will be stored in SQL databases. At a later stage object-oriented database management systems (OODBMS) and so-called on-line analytical processing (OLAP) products may be considered as alternatives or complements to SQL databases for certain types of usages.
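
    To make the storage of macrodata in an SQL database more concrete, the sketch below shows how one table could serve both time series and cross-sectional retrieval. The table name, its columns, and the figures are invented for this illustration and do not describe the actual Statistics Sweden database; the example uses Python's standard `sqlite3` module as a stand-in for whatever SQL database management system is chosen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE macrodata (
        indicator TEXT,
        region    TEXT,
        year      INTEGER,
        value     REAL,
        PRIMARY KEY (indicator, region, year)
    )
""")
# Invented figures, for illustration only.
conn.executemany(
    "INSERT INTO macrodata VALUES (?, ?, ?, ?)",
    [
        ("population", "North", 1994, 510.0),
        ("population", "North", 1995, 515.0),
        ("population", "South", 1994, 1220.0),
        ("population", "South", 1995, 1240.0),
    ],
)

# Time series: one indicator and one region, all years.
time_series = conn.execute(
    "SELECT year, value FROM macrodata "
    "WHERE indicator = ? AND region = ? ORDER BY year",
    ("population", "North"),
).fetchall()

# Cross-section: one indicator and one year, all regions.
cross_section = conn.execute(
    "SELECT region, value FROM macrodata "
    "WHERE indicator = ? AND year = ? ORDER BY region",
    ("population", 1995),
).fetchall()

print(time_series)
print(cross_section)
```

    Both views are ordinary selections from the same table, so client software that speaks SQL needs no special interface for either.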

    The main sources of metadata will be survey documentations, following the SCBDOK documentation templet shown in figure 6 above, complemented by product overviews, quality declarations, and some other types of documentation, which are available for statistical products produced within the Swedish Statistical System. The bulk of metadata will be textual data with limited structuring. These data are most likely to be handled as a text database by free text searchers and document handling systems. A small but important part of the metadata are to be used for controlling the operation of various software products. These metadata need to be stored in an SQL database, so that they can be handled formally and automatically communicated and transformed between different software components inside and outside the database system.

    The total size of the new statistical database, including metadata, macrodata, and anonymised microdata may turn out to be in the order of 100 GB. Many different channels will be utilised for disseminating data from the new statistical database to the users, including self-service PCs in the premises of Statistics Sweden, available for external users, who want to down-load data and metadata from the statistical database to their own storage media, World Wide Web (WWW) databases, CD-ROM products, diskettes, etc.

    As for confidentiality problems concerning statistical data (anonymised microdata and aggregated data with few contributors) the situation in Sweden has improved dramatically for both users and producers as well as for data providers thanks to new legislation, which criminalises all attempts to derive identified data from statistical data. The particular paragraph about this in the Swedish Law on Official Statistics reads as follows:

        Official statistics must not be combined with other information for the purpose of finding out the identity of individual subjects.

    In summary, on-going developments within the Swedish Statistical System provide good illustrations of the general principles that have been discussed in this paper. The practical results, which have been achieved so far, indicate that statistical offices will be able to meet the challenges from the users to make statistical data more available by means of modern technology, with due consideration to the interests of data providers and the public at large.

    [Figure 7 labels: data providers and users of statistics; observation registers; statistics and metadata; users of statistics; international organizations.]

    Figure 7. Client-server architecture of a system of statistical information systems.


    Résumé

    Can statistical offices meet users' demands to make statistical data more accessible by means of modern technology? Can they do so within the constraints imposed by budgets and by the interests of respondents? These are the questions addressed here. The problems and possibilities are illustrated by examples from Sweden.

    [Received November 1995, accepted November 1995]

