Metadata Driven Integrated Statistical Data Management System
CSB of Latvia
By Karlis Zeila, Vice President, CSB of Latvia
MSIS 2004, Geneva, May 17 - 19
META DATA DRIVEN ... ?
Any action within the system is governed by metadata.
Metadata is the key element of the system.
All software modules of the entire system are connected with the Core Metadata module (Metadata base).
Any change within the system starts with a change to the metadata.
The full data processing cycle is possible only once the proper description in the Metadata base has been completed.
INTEGRATED ... ?
Most of the system software modules are connected with the Registers module.
The Registers module is an integral part of the system.
All surveys are supported by adequate classifications stored in the Metadata base.
In all surveys, respondent data fields are connected with the registers data.
All data is stored in the corporative data warehouse.
Statistical data processing is split into unified steps that are the same for different surveys.
Export / Import procedures make it possible to work with the system data files using different standard software packages.
Advantages and Restrictions
Advantages
1. The main business statistics data entry, processing and storage procedures are standardized as far as possible, which provides the basis for the transfer from the stove-pipe data processing approach to the process oriented data processing approach.
2. Centralized processing and storage of the statistical data, including metadata, using data warehouse technologies and OLAP tools.
3. All data processing procedures are run from the common metadata system and are described in the Metadata base. Therefore, no individual programming is required to execute the standardized procedures for each survey.
4. The system is linked with the Business Register, which provides direct retrieval and updating of respondent data.
5. Special import and export procedures have been created for data exchange with other systems.
6. A link with PC-Axis has been created for electronic data dissemination.
Restrictions
1. The system is oriented towards the data processing of business statistics surveys of different periodicity.
2. The Metadata base does not provide for the description of confidentiality rules; they are hard coded in the system.
3. Hardware and standard software requirements: PCs >= Pentium II, RAM >= 128 MB, equipped with Windows 95 to Windows 2000 and MS Office 2000.
4. The Metadata base does not provide for the description of the algorithm for automatic creation of respondent lists for sample surveys from the Business Register frame.
5. Diagnostic tools for the metadata descriptions are not powerful enough; therefore the experts preparing metadata descriptions should be highly experienced.
ISDMS architecture
Integrated statistical data management system
Components: Corporative data warehouse, CSB Web Site
Data bases: Macrodata base, Metadata base, Microdata base, Registers base, OLAP data base, User administration data base, Dissemination data base
Platform: Windows 2000 Advanced Server, MS Internet Information Server, SQL Server 2000, PC-Axis
ISDMS Business application Software Modules
[Diagram: each module box is labelled "related with DB:" followed by the data bases it uses; a firewall is shown in the diagram]
Modules:
Core metadata base module
Registers module
Data entry and validation module
Data aggregation module
Data analysis module
Data dissemination module
Data Web entry module
Data mass entry module
Missed data imputation module
User administration module (related with DB: METADATA, MICRODATA, MACRODATA, USER ADMINISTRATION)
Data bases referenced by the modules: METADATA, MICRODATA, MACRODATA, REGISTERS, OLAP, RAW DATABASE, USER ADMINISTRATION
Linked external component: DATA IMPUTATION SOFTWARE
INTEGRATED, METADATA DRIVEN STATISTICAL DATA MANAGEMENT SYSTEM (ISDMS)
[Diagram: data flow through ISDMS, driven by statistical metadata (structured and unstructured)]
ACTIVE structured statistical metadata for ISDMS:
1. VARIABLE = INDICATOR + ATTRIBUTE (CLASSIFICATIONS)
2. QUESTIONNAIRE, TABLE, Row, Column, Code
Statistical metadata for description of the Output data:
1. STATISTICAL DOMAINS
2. BREAKDOWNS, CLASSIFICATIONS
3. INDICATORS (basic and derived)
Metadata layers: Metadata for data processing, Metadata for data dissemination
Data gathering: Data in paper form, e-questionnaires, Statistical data from other sources
Data entry: Data entry from paper form, Web raw database
Processing: MICRO DATABASE, Data validation module, Data aggregation module, MACRO DATABASE, Data analysis module
Output: Data output module, OUTPUT DATA, PC-Axis, Web modules, Search, Publication in paper form, e-Publication
Administration: Data entry administration module, Data aggregation and analysis administration module, Data dissemination administration module
CSB clients: Respondents, Personnel, Data users, e-Clients
Structure of Surveys (questionnaires)
A new survey should be registered in the system. For each survey a questionnaire version shall be created, which is valid for at least one year. If the questionnaire content and/or layout do not change, the current version and its description in the Metadata base can be used for the next year.
Each survey contains one or more data entry tables or chapters (data matrices). A chapter can be a constant table, with a fixed number of rows and columns, or a table with a variable number of rows or columns.
For each chapter, the rows and columns have to be described in the Metadata base with their codes and names. This information is necessary for automatic generation of the data entry application, data validation, etc.
The last step in the description of the questionnaire content and layout is cell formation. Cells are the smallest data unit in survey data processing. A cell is created as a combination of a row and a column from the survey version side and a variable from the indicators and attributes side.
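The survey structure described above (survey version, chapters, rows and columns, cells linked to variables) can be sketched as a small data model. This is only an illustrative sketch, not the ISDMS data model: the class names, field names and sample values are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Variable:
    """Hypothetical variable: an indicator plus an optional attribute value."""
    indicator: str
    attribute: Optional[str] = None


@dataclass
class Chapter:
    """One data matrix (chapter) of a questionnaire version."""
    name: str
    rows: dict[str, str]      # row code -> row name
    columns: dict[str, str]   # column code -> column name
    cells: dict[tuple[str, str], Variable] = field(default_factory=dict)

    def link_cell(self, row: str, column: str, variable: Variable) -> None:
        """Form a cell by linking a described (row, column) pair to a variable."""
        if row not in self.rows or column not in self.columns:
            raise KeyError(f"undescribed row/column: {row!r}, {column!r}")
        self.cells[(row, column)] = variable


@dataclass
class SurveyVersion:
    """A questionnaire version of a registered survey."""
    survey: str
    year: int
    chapters: list[Chapter] = field(default_factory=list)


# Illustrative values only; names and year are assumptions.
chapter = Chapter(
    name="Turnover",
    rows={"2010": "Food products"},
    columns={"1": "Total turnover"},
)
chapter.link_cell("2010", "1", Variable("Total turnover", "Food products"))
version = SurveyVersion(survey="Trade statistics", year=2004, chapters=[chapter])
```

Rejecting a cell whose row or column has not been described mirrors the order of work in the text: rows and columns are described first, cells are formed last.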
Structure of trade statistics questionnaire (data matrix - fixed table)
Name of Questionnaire, index, code, corroboration date, Nr.
Respondent's (object) code, name and address
Period (year, quarter, month)
Name of chapter

Goods and commodity groups                                    | Row code | Total turnover (2, 3, 4) | Retail trade turnover | Public catering turnover | Wholesale trade
A                                                             | B        | 1                        | 2                     | 3                        | 4
Goods, in total (2010, 2020, 2030-2190)                       | 2000     | 15000                    | 9000                  | 5000                     | 1000
Food products (except alcoholic beverages and tobacco goods)  | 2010     | 12000                    | 5600                  | 6000                     | 400
Alcoholic beverages, in total                                 | 2020     | 3000                     | 2000                  | 400                      | 600
of which: spirits and liqueurs, whisky, long drinks           | 2021     | 500                      | 300                   | 100                      | 100
of which: wines                                               | 2022     | 1000                     | 500                   | 200                      | 300

CELL [2010,1] = VARIABLE 1 = INDICATOR 1 + ATTRIBUTE
Metadata repository: common table of statistical indicators, table of attributes (classifications) and table of created variables
[Diagram: table of variables with axes labelled Attributes and Indicators]
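The trade statistics example marks column 1 (Total turnover) as made up of columns 2, 3 and 4. A validation rule of that kind could be checked as in the minimal sketch below. The figures are the ones from the example table; the function name and the way the rule is passed in are assumptions, not the ISDMS validation module.

```python
# Figures from the trade statistics example: row code -> {column code: value}.
rows = {
    "2000": {"1": 15000, "2": 9000, "3": 5000, "4": 1000},
    "2010": {"1": 12000, "2": 5600, "3": 6000, "4": 400},
    "2020": {"1": 3000, "2": 2000, "3": 400, "4": 600},
    "2021": {"1": 500, "2": 300, "3": 100, "4": 100},
    "2022": {"1": 1000, "2": 500, "3": 200, "4": 300},
}


def column_total_ok(cells, total="1", parts=("2", "3", "4")):
    """Return True if the total column equals the sum of the part columns."""
    return cells[total] == sum(cells[p] for p in parts)


# Row codes that break the rule; empty for the example figures.
failed = [code for code, cells in rows.items() if not column_total_ok(cells)]
print(failed)  # -> []
```

Keeping the rule as data (which column is the total, which columns are its parts) rather than as survey-specific code is in the spirit of describing validation in the Metadata base.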
1. Data matrix - Fixed number of Rows (3) and variable number of Columns (n)
(Example) Main economic indicators by kind of economic activity

Row heading         | Row's code | Total | Name 1      | Name 2      | ... | Name n-1      | Name n
A                   | B          | 9999  | NACE 1 code | NACE 2 code | ... | NACE n-1 code | NACE n code
Number of employees | 1110       | ...   |             |             |     |               |
Net turnover        | 1120       | ...   |             |             |     |               |
Other income        | 1130       |       |             |             |     |               |
2. Data matrix - Fixed number of Columns (3) and variable number of Rows (n)
(Example) Production of industrial products

Name of product | Product code (PRODCOM or CN code) | Produced, in natural measurement units | Sold, in natural measurement units | Income in lats (LVL)
A               | B                                 | 1                                      | 2                                  | 3
Product 1       | 1234567                           |                                        |                                    |
Product 2       | 2345678                           |                                        |                                    |
...             | ...                               | ...                                    | ...                                | ...
Product n-1     | 4567890                           |                                        |                                    |
Product n       | 5678901                           |                                        |                                    |
Creating of variables
INDICATOR + ATTRIBUTES (CLASSIFICATORS) = VARIABLES
Example:
Number of employees + no attribute = Number of employees, total
Number of employees + Local kind of activity (NACE) = Number of employees in breakdown by kind of activity (~300 variables)
Number of employees + Regional code (ATVK or NUTS) = Number of employees in breakdown by regions (~26 variables)
Dimensions (vectors) of indicators
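The rule "indicator + attribute = variable" can be illustrated with a short sketch: one indicator combined with each code of a classification yields one variable per code, and an indicator with no attribute yields the single total variable. The function name and the region codes below are placeholders invented for the example, not real ATVK or NUTS codes.

```python
def create_variables(indicator, classification=None):
    """Combine one indicator with every code of a classification.

    With no classification, the indicator alone is the single 'total' variable.
    """
    if classification is None:
        return [(indicator, None)]
    return [(indicator, code) for code in classification]


# 26 placeholder region codes (hypothetical, for illustration only).
regions = [f"R{i:02d}" for i in range(1, 27)]

total = create_variables("Number of employees")
by_region = create_variables("Number of employees", regions)
print(len(total), len(by_region))  # -> 1 26
```

The same call with a list of roughly 300 activity codes would give the breakdown by kind of activity mentioned in the example.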
Dimensions of objects and indicators (example)
Number of employees in breakdown by kind of activity:
NACE 1 | NACE 2 | NACE 3 | NACE 4
55     | 35     | 5      | 5
Number of employees in breakdown by regions:
Region 1 | 60
Region 2 | 25
Region 3 | 15
Number of employees, total: 100
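In the example figures, each breakdown of the indicator adds up to the same total (100). A consistency check of that kind might look like the sketch below. It uses the figures from the example; the function name is an assumption, and the slides do not state that ISDMS runs this particular check.

```python
total = 100
by_activity = {"NACE 1": 55, "NACE 2": 35, "NACE 3": 5, "NACE 4": 5}
by_region = {"Region 1": 60, "Region 2": 25, "Region 3": 15}


def breakdown_matches_total(breakdown, expected_total):
    """Return True if the values of one breakdown sum to the indicator total."""
    return sum(breakdown.values()) == expected_total


checks = {
    "by kind of activity": breakdown_matches_total(by_activity, total),
    "by regions": breakdown_matches_total(by_region, total),
}
print(checks)  # -> {'by kind of activity': True, 'by regions': True}
```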
Main dimensions (vectors) of respondents (objects O(t)):
NACE
REGIONS (Territory)
OWNERSHIP AND ENTREPRENEURSHIP
EMPLOYEES GROUP
TURNOVER GROUP
Dimensions (vectors) of indicators
Integrated Metadata Driven Quasi Process Oriented Technology
[Diagram: process oriented approach shown in rectangles]
Inputs: SURVEY 1, SURVEY 2, ..., SURVEY N; Business register; Respondent list
Standardized data entry interface
Data validation procedure
MICRO database
Data aggregation procedure
MACRO database
Standardized output data dissemination interface
Data output and dissemination
META database, with Metadata entry
IMPORT - EXPORT for procedures outside ISDMS
EXPORT for procedures outside ISDMS
Metadata base link with Microdata and Macrodata bases
Steps described in the META DATA BASE (REPOSITORY):
1. General description of survey
2. Description of survey version
3. Description of chapters (data matrix)
4. Description of rows and columns
5. Selecting indicators
6. Selecting attributes
7. Creating of variables
8. Linking variables to cells
9. Generation of the form for data entry (automatically), feeding the MICRO DATABASE
10. Defining of data aggregation rules
11. Data aggregation function (automatically), feeding the MACRO DATABASE
IMPORT / EXPORT
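The last steps of this flow, aggregation rules defined as metadata and an aggregation function that runs automatically from them, can be sketched as follows. The record layout, variable names and rule format are assumptions chosen for illustration; the slides do not describe how ISDMS stores its rules.

```python
from collections import defaultdict

# Hypothetical micro data: (respondent id, variable, value).
micro = [
    ("R1", "employees/NACE 1", 30),
    ("R2", "employees/NACE 1", 25),
    ("R3", "employees/NACE 2", 35),
]

# Hypothetical aggregation rules held as metadata:
# macro variable -> micro variables whose values are summed.
rules = {
    "employees/total": ["employees/NACE 1", "employees/NACE 2"],
    "employees/NACE 1": ["employees/NACE 1"],
}


def aggregate(micro_records, aggregation_rules):
    """Sum micro values per variable, then build each macro value from its rule."""
    sums = defaultdict(int)
    for _respondent, variable, value in micro_records:
        sums[variable] += value
    return {
        macro_variable: sum(sums[v] for v in parts)
        for macro_variable, parts in aggregation_rules.items()
    }


macro = aggregate(micro, rules)
print(macro)  # -> {'employees/total': 90, 'employees/NACE 1': 55}
```

Because the function only reads the rules, adding a new macro variable means adding a metadata entry rather than writing survey-specific code.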
Data entry and validation
[Diagram: data entry and validation flows]
Metadata descriptions: Description of data entry forms, Description of validation rules
Creating list of Respondents (BUSINESS REGISTER)
Entry channels: Standard data entry and validation, Mass data entry, Web data entry and validation, Data import from files
Validation: Data validation, Web Data validation, Full data validation
Data bases: META DATA BASE, BUSINESS REGISTER, RAW DATA BASE, RAW Web DATA BASE (separated by a Firewall), MICRO DATA BASE
Data transfer to Microdata Base
ISDMS USERS in CSB of Latvia
CSB HEADQUARTERS in RIGA: ~200 ISDMS users
4 Data Collection and Processing CENTRES: ~20 users each
Connection: 2 Mbit/sec, on-line
LESSONS LEARNED
The design of a new information system should be based on the results of a deep analysis of the statistical processes and data flows.
Clear objectives have to be set, discussed and approved by all parties involved:
Statisticians, IT personnel, Administration
LESSONS LEARNED
In the design and implementation of a metadata driven integrated statistical information system, both parties, statisticians and IT specialists, should be involved from the very beginning.
Both parties have to have a clear understanding of all statistical processes that will be covered by the system, as well as of the meaning and role of metadata within the system from both the production and the user side.
LESSONS LEARNED
The initiative to move from the classical stove-pipe production approach to a process oriented one has to come from the statisticians' side, not from IT personnel or administration.
Motivation of the statisticians to move from the existing to the new data processing environment is essential.
Improving knowledge about metadata is one of the most important tasks throughout the design and implementation phases of the project.
LESSONS LEARNED
A clear division of tasks and responsibilities between statisticians and IT personnel is the key to successful implementation.
To achieve the best performance of the entire system, it is important to organize the execution of the statistical processes in the right sequence.
The design of new surveys, and of questionnaires in particular, as well as changes to existing ones, should be done in accordance with the system requirements.
LESSONS LEARNED
As a result of the feasibility study we clearly understood that some steps of statistical data processing for different surveys defy standardization; some surveys may require complementary functionality (non-standard procedures) that is needed only for processing the data of that particular survey.
To handle the non-standard procedures, interfaces for data export from and import to the system have been developed, so that standard statistical data processing software packages and other generalized software available on the market can be used.
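An export/import interface of this kind can be pictured as a round trip through a neutral file format. The sketch below uses CSV purely as an assumption; the slides do not name the file formats ISDMS uses, and the field names and function names are invented for the example.

```python
import csv
import io

# Hypothetical layout of an exported cell record.
FIELDS = ["respondent", "row", "column", "value"]


def export_cells(cells):
    """Write cell records to CSV text that an external package can read."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(cells)
    return buffer.getvalue()


def import_cells(text):
    """Read CSV text processed outside the system back into cell records."""
    reader = csv.DictReader(io.StringIO(text))
    return [{**record, "value": int(record["value"])} for record in reader]


cells = [{"respondent": "R1", "row": "2010", "column": "1", "value": 12000}]
round_trip = import_cells(export_cells(cells))
print(round_trip == cells)  # -> True
```

Between the export and the import, the file could be processed by any package that reads the chosen format, which is the point of keeping non-standard procedures outside the system.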
LESSONS LEARNED
It is necessary to establish and train a special group of statisticians, who will maintain the Metadata base and be responsible for the accuracy of the metadata.
For the administration and maintenance of the system it is necessary to have well trained IT staff who are familiar with MS SQL Server 2000 administration, MS Analysis Services, other MS tools, the PC-Axis family of products, and the system data model and applications.