SDMX Global Conference, 28-30 September 2015
SDMX into the future
VTL (Validation and Transformation Language)
A new technical standard for enhancing data validation and processing
Vincenzo Del Vecchio - Bank of Italy
Marco Pellegrino – Eurostat
SDMX TWG & VTL Task Force
Approach
SDMX originally focused on data collection and dissemination
Current trend: support more stages of the statistical production process
Generic Statistical Business Process Model
What is VTL
A standard language for defining validation and transformation rules
Validation (now)
Transformation (partially now, to be enriched at a later stage)
Main goals:
Define and preserve validation and transformation rules
Exchange and share rules
Apply rules in industrialized processes
Apply to several standards (e.g. SDMX, DDI, GSIM) thanks to a generic information model
The VTL Information Model
VTL is a “stand-alone” specification
– It can be used with SDMX, DDI, GSIM or potentially anything else
– It can be used on its own
VTL has its own information model
– All kinds of data are modelled as mathematical functions having independent variables (Identifiers) and dependent variables (Measures and Attributes)
– The GSIM IM is used as a basis
– It can be mapped against SDMX
– It can be mapped against DDI
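The idea of a Data Set as a mathematical function can be sketched in Python. This is only an illustration of the model, not part of the VTL specification: the class name `DataSet`, the component names and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    """Illustrative only: a Data Set seen as a function from
    Identifier values to Measure and Attribute values."""
    identifiers: tuple          # names of the independent variables
    measures: tuple             # names of the dependent variables (Measures)
    attributes: tuple = ()      # names of the dependent variables (Attributes)
    points: dict = field(default_factory=dict)  # identifier values -> values

    def add(self, key, **values):
        # A function maps each identifier combination to exactly one result,
        # so a repeated combination is rejected.
        if len(key) != len(self.identifiers):
            raise ValueError("key must supply one value per Identifier")
        if key in self.points:
            raise ValueError(f"duplicate identifier combination: {key}")
        self.points[key] = values

    def __call__(self, *key):
        # Evaluate the function at one identifier combination (one Data Point).
        return self.points[key]


# Hypothetical population Data Set by time, age and civil status.
population = DataSet(
    identifiers=("time", "age", "civil_status"),
    measures=("persons",),
    attributes=("obs_status",),
)
population.add(("2014", "30-34", "married"), persons=1200, obs_status="final")
population.add(("2014", "30-34", "single"), persons=800, obs_status="provisional")

print(population("2014", "30-34", "married")["persons"])
```

Keying the Data Points by the tuple of Identifier values makes the functional property explicit: one identifier combination, one set of Measure and Attribute values.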
Main VTL drivers (1)
Business orientation
– Designed for use by subject matter experts
Integrated Approach
– Any kind of data
– Independent of the phase of the process
– A single language for validation and calculation
Main VTL drivers (2)
IT implementation independence
– Independent of IT tools
– Allowing multiple tools
– Resilient to tool changes
Active Role for processing
– Formal (described by means of BNF)
– Able to drive the validation & calculation software
Extensible and customizable
VTL 1.0 Operators
VTL features (1)
Declarative language based on Expressions
D4 := Check( (D1 - D2) = D3)
D1, D2, D3: Operands
D4: Result
Check, -, =: Operators
Operates on Data Sets (SDMX Dataflows)
D1, D2, D3, D4 are typically Data Sets, e.g.:
D1 – population at time T by age and civil status
D2 – population at time T-1 by age and civil status
D3 – population flows between T-1 and T by age and civil status
D4 – consistency of population figures (true / false), by age and civil status
… and on parts of Data Sets
e.g. Time Series, Cross Sections, single Data Points
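To illustrate what an expression such as the Check on D1, D2 and D3 computes, here is a minimal Python sketch. It is not VTL and not a VTL engine: the function name `check_difference` and the sample figures are hypothetical, and comparing only the identifier combinations present in all three Data Sets is an assumption made for the sketch.

```python
# Hypothetical sample data: (age, civil status) -> number of persons.
D1 = {("30-34", "married"): 1200, ("30-34", "single"): 800}  # population at time T
D2 = {("30-34", "married"): 1150, ("30-34", "single"): 790}  # population at time T-1
D3 = {("30-34", "married"): 50, ("30-34", "single"): 20}     # flows between T-1 and T


def check_difference(d1, d2, d3):
    """Rough Python analogue of Check((D1 - D2) = D3).

    Returns a boolean per identifier combination. Only combinations
    present in all three inputs are compared (an assumption).
    """
    common = d1.keys() & d2.keys() & d3.keys()
    return {key: (d1[key] - d2[key]) == d3[key] for key in common}


D4 = check_difference(D1, D2, D3)
for key in sorted(D4):
    print(key, D4[key])
```

The result D4 has the same Identifiers as the operands (age and civil status) and a single boolean Measure, which is what makes it usable as an operand of further expressions.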
VTL features (2)
Supports operations on many types of statistical data, e.g.:
Dimensional and Unit data, Survey and register data,
Quantitative and qualitative data, …
... and can combine them, e.g.:
D1 – Securities Register (by security id)
D2 – Securities Holdings (by security holder, security id, date)
D3 := merge (D1, D2, on (D1#sec_id = D2#sec_id), return D2#sec_holder, D2#sec_id, D1#sec_type)
(produces D3 by adding to D2 the security type taken from D1)
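The effect of the merge can be approximated in plain Python as a join on the security id. This is a sketch under stated assumptions, not the VTL operator: the function name `merge_on_sec_id` and the sample records are hypothetical, and dropping holdings with no match in the register (inner-join behaviour) is an assumption.

```python
# Hypothetical sample data.
# D1: Securities Register, keyed by sec_id.
D1 = {
    "IT001": {"sec_type": "bond"},
    "IT002": {"sec_type": "share"},
}
# D2: Securities Holdings, identified by (sec_holder, sec_id, date).
D2 = [
    {"sec_holder": "H1", "sec_id": "IT001", "date": "2015-06-30"},
    {"sec_holder": "H2", "sec_id": "IT002", "date": "2015-06-30"},
    {"sec_holder": "H2", "sec_id": "IT999", "date": "2015-06-30"},  # not in the register
]


def merge_on_sec_id(register, holdings):
    """Add the security type from the register to each holding."""
    result = []
    for row in holdings:
        match = register.get(row["sec_id"])
        if match is None:
            continue  # assumption: unmatched holdings are dropped
        result.append({
            "sec_holder": row["sec_holder"],
            "sec_id": row["sec_id"],
            "sec_type": match["sec_type"],
        })
    return result


D3 = merge_on_sec_id(D1, D2)
for row in D3:
    print(row)
```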
VTL features (3)
Can concatenate expressions
D4 := Check( (D1-D2) = D3)
D5 := if D4 = False then D2 else D1
(the result of the former is an operand of the latter)
Considers validation as a kind of Transformation (calculation), in order to
• Use a common language
• Use validations and calculations together, e.g.:
Validation: D4 := Check( (D1-D2) = D3)
Calculation: D5 := if D4 = False then D2 else D1
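The validation and calculation pair above can be mimicked in Python to show how the result of one expression feeds the next. This is an illustrative sketch only: the sample values are hypothetical, and it assumes that all Data Sets share the same identifier combinations.

```python
# Hypothetical sample data keyed by age group.
D1 = {"30-34": 1200, "35-39": 900}   # population at time T
D2 = {"30-34": 1150, "35-39": 880}   # population at time T-1
D3 = {"30-34": 50, "35-39": 35}      # flows between T-1 and T

# Validation: one boolean per identifier combination.
D4 = {key: (D1[key] - D2[key]) == D3[key] for key in D1}

# Calculation: D4 is now an operand. Where the check failed, take D2,
# otherwise keep D1 (mirrors "if D4 = False then D2 else D1").
D5 = {key: (D2[key] if D4[key] is False else D1[key]) for key in D4}

print(D4)
print(D5)
```

Because D4 is an ordinary Data Set, no special mechanism is needed to pass it from the validation step to the calculation step.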
The Transformations graph
[Figure: graph of Data Sets linked by Transformations, grouped into the areas Collection activity n.1, Collection activity n.2, Collection activity n.3, Analysis & research models, Statistical products and Publications.]
Legend: Di = Data Set i, Tj = Transformation j
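Since each Transformation takes Data Sets as operands and produces a Data Set as result, a set of Transformations forms a dependency graph that can be ordered for execution. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The edges are not taken from the figure; they are an assumption based on the D1 to D5 example from the previous slides.

```python
from graphlib import TopologicalSorter

# Each result Data Set maps to the operand Data Sets it is calculated from.
# Assumed edges: D4 := Check((D1-D2) = D3); D5 := if D4 = False then D2 else D1
dependencies = {
    "D4": {"D1", "D2", "D3"},
    "D5": {"D4", "D1", "D2"},
}

# static_order() yields every node after all of its predecessors,
# i.e. a valid order in which the Data Sets can be produced.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

`TopologicalSorter` raises `CycleError` if the dependencies are circular, which is a convenient way to detect a set of rules that cannot be calculated.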
VTL features (4)
VTL 1.0 allows:
• Persistent and temporary results
• Operations on mono and multi measure data
• Dealing with missing data
• Dealing with Attributes and their propagation rules
VTL 1.1 will introduce:
• Other operators, mainly for validation purposes
• Reusable rules
• Bug fixing, fine tuning
VTL status
VTL 1.0: published in March 2015 (http://sdmx.org/wp-content/uploads/2015/03/VTL-1-package-2015.zip)
– VTL: part 1 and part 2
– Technical notation in EBNF (Extended Backus-Naur Form)
VTL 1.1 (language extensions): work in progress
SDMX implementation: work in progress
– Messages for exchanging VTL rules
– Registry for storing VTL rules
– Web services for retrieving VTL rules
Governance and Standards Alignment
• VTL is maintained by the SDMX TWG through the VTL Task Force
– Extensions will be considered for inclusion in future versions
• VTL has already produced some feedback to GSIM for the next version
– VTL can be mapped against SDMX
– VTL can be directly utilized by DDI in those places where computations are included
– VTL can be used as GSIM processing Rules
SDMX into the future
Contribute to VTL 1.1!
Comments on VTL 1.0 and suggestions for improvement can be sent to the SDMX Technical Working Group
Thanks for your attention!