GIS Data Quality
Producing better data quality through robust business processes
BrightStar TRAINING
Kim Ollivier
Schedule Day 2
Suggested breaks for the following times:
Start: 9:00
Session 1 (90 min)
Morning tea: 10:30 to 10:45
Session 2 (105 min)
Lunch: 12:30 to 1:30
Session 3 (90 min)
Afternoon tea: 3:00 to 3:15
Session 4 (105 min)
Finish: 5:00
Each session will have an exercise or interactive discussion
Topics
•Metadata
•Designing rules
•Data warehouse and ETL
•Feature maintenance
Metadata
• Data model
• Business rules, relations, state
• Subclasses (lookup tables)
• GIS Metadata NZGLS and ISO XML
• Readme.txt or readme.html
Metadata
• Which standard?
• ISO 19115, NZGMS
• Aust asdd.ga.gov.au
Examine Metadata
• Geospatial metadata
• Benefit to users or producer?
• How do we collect it?
• Standardisation or not?
• metadata\topo250k_metadata.html
• metadata\DCW_DQ_Project.htm
• metadata\meta.html
Morning Tea
Data Quality Rules
• Attribute domain constraints
• Relational integrity rules
• Rules for historical data
• Rules for state-dependent objects
• General dependency rules
• Spatial feature rules
A GIS Data Quality System
[Diagram: Assess (Data Quality Assessment, Data Profiling); Improve, Prevent, Recognise (Data Cleaning; Monitoring Data Integration Interfaces; Ensuring Quality of Data Conversion and Consolidation; Building Data Quality Metadata Warehouse); Monitor (Recurrent Data Quality Assessment)]
Assessing Quality
• Project steps
• Required roles
• Defining the objectives
• Designing rules
• Scorecard and Metadata
• Frequency of assessment
Building Rules
• Data profiling
• Interview users
• Examine data model
• Data Gazing
• Application v data matrix
Attribute Domain Constraints
• Lookup tables
• Numeric ranges
• Null values
• Blank values
• Format constraints
• Precision
• Complex domain restraints
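A minimal sketch of how some of these constraints could be scripted. The field names (`land_use`, `area_m2`, `parcel_id`), the lookup codes, the range limits and the ID format are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical lookup table and format rule, for illustration only.
LAND_USE_CODES = {"RES", "COM", "IND", "RUR"}
PARCEL_ID_FORMAT = re.compile(r"^[A-Z]{2}\d{4}$")


def check_domain(record):
    """Return a list of domain-rule violations for one attribute record."""
    errors = []

    # Lookup table rule: the code must exist in the lookup set.
    if record.get("land_use") not in LAND_USE_CODES:
        errors.append("land_use not in lookup table")

    # Null rule, then numeric range rule (limits are assumed).
    area = record.get("area_m2")
    if area is None:
        errors.append("area_m2 is null")
    elif not (0 < area <= 1_000_000):
        errors.append("area_m2 out of range")

    # Null, blank and format rules are reported separately.
    pid = record.get("parcel_id")
    if pid is None:
        errors.append("parcel_id is null")
    elif pid.strip() == "":
        errors.append("parcel_id is blank")
    elif not PARCEL_ID_FORMAT.match(pid):
        errors.append("parcel_id has bad format")

    return errors
```

Reporting null and blank as separate errors keeps the two cases distinguishable in a scorecard.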
Relational Integrity Rules
• Identity rule
• Reference rules
• Cardinal rules
• Inheritance rules
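The first three rule types can be sketched as simple checks over rows held as dictionaries. The function names and the row layout are assumptions for illustration; in practice these checks would usually be SQL queries against the database.

```python
from collections import Counter


def identity_violations(rows, key):
    """Identity rule: return key values that occur more than once."""
    counts = Counter(r[key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def reference_violations(children, fk, parents, pk):
    """Reference rule: return child rows whose foreign key has no parent."""
    parent_keys = {p[pk] for p in parents}
    return [c for c in children if c[fk] not in parent_keys]


def cardinality_violations(children, fk, parents, pk, min_n, max_n):
    """Cardinal rule: return parent keys with too few or too many children."""
    counts = Counter(c[fk] for c in children)  # missing keys count as 0
    return [p[pk] for p in parents if not (min_n <= counts[p[pk]] <= max_n)]
```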
Historical Data
• Time dependent attribute
• Value constraints
• Rates of change
• Volatility
• Continuity
• Granularity
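As a rough illustration of continuity and rate-of-change rules, the sketch below checks a yearly series of values. The yearly granularity and the 50% threshold are assumptions, not values from the course material.

```python
def history_violations(series, max_rate=0.5):
    """Check a time-dependent attribute held as (year, value) pairs.

    The series is assumed to be sorted by year with yearly granularity.
    """
    problems = []
    for (y0, v0), (y1, v1) in zip(series, series[1:]):
        if y1 - y0 != 1:
            # Continuity rule: no missing years.
            problems.append((y1, "gap in yearly series"))
        elif v0 and abs(v1 - v0) / abs(v0) > max_rate:
            # Rate-of-change rule: relative change above the threshold.
            problems.append((y1, "rate of change too high"))
    return problems
```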
State-dependent Objects
• State-transition models
• States, terminators
• Actions
[State-transition diagram: start, Active (A), On Leave (L), Terminated (T), Retired (R), Deceased (D)]
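A state-transition model can be checked with a table of allowed transitions. The state codes below come from the diagram, but the allowed transitions themselves are assumptions made for this sketch, since the arrows of the diagram are not reproduced here.

```python
# Allowed transitions: ASSUMED for illustration, not taken from the diagram.
ALLOWED = {
    "start": {"A"},
    "A": {"L", "T", "R", "D"},
    "L": {"A", "T", "R", "D"},
    "T": {"A", "D"},
    "R": {"D"},
    "D": set(),  # terminator: nothing may follow
}


def invalid_transitions(history):
    """Return (from_state, to_state) pairs that the model does not allow."""
    bad = []
    prev = "start"
    for state in history:
        if state not in ALLOWED.get(prev, set()):
            bad.append((prev, state))
        prev = state
    return bad
```

For example, a history of Active, Deceased, Active is reported as one invalid transition, `("D", "A")`.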
Event Histories
• An object may have many events
• Event Overlaps
• Event Frequencies
• Event Conditions
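An event-overlap rule can be sketched as follows, assuming each event of one object is held as a (start, end) pair and that an event ending on the day the next one starts does not count as an overlap.

```python
from datetime import date


def overlapping_events(events):
    """Return pairs of events for one object whose date ranges overlap.

    Each event is compared with the earlier event that ends latest.
    """
    overlaps = []
    latest = None  # event with the furthest end date seen so far
    for ev in sorted(events):
        if latest is not None and ev[0] < latest[1]:
            overlaps.append((latest, ev))
        if latest is None or ev[1] > latest[1]:
            latest = ev
    return overlaps
```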
Spatial Rules
• Projection, units
• Dimensions 2D, 3D, M, Z
• point, line, poly
• Precision
• Topology
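Some spatial rules can be tested without GIS software. The sketch below checks a polygon ring held as a list of coordinate tuples; the function name and the expected extent are hypothetical. An extent test is a cheap way to catch wrong projections or units, for example degrees stored where metres are expected.

```python
def ring_violations(ring, extent, dims=2):
    """Check one polygon ring.

    ring: list of coordinate tuples; extent: (xmin, ymin, xmax, ymax).
    """
    errors = []
    # Dimension rule: every vertex has the expected number of ordinates.
    if any(len(pt) != dims for pt in ring):
        errors.append("wrong coordinate dimension")
    # Simple topology rules: a closed ring needs at least 4 vertices.
    if len(ring) < 4:
        errors.append("fewer than 4 vertices")
    if ring and ring[0] != ring[-1]:
        errors.append("ring not closed")
    # Projection/units sanity rule: all vertices inside the expected extent.
    xmin, ymin, xmax, ymax = extent
    if any(not (xmin <= pt[0] <= xmax and ymin <= pt[1] <= ymax) for pt in ring):
        errors.append("vertex outside expected extent")
    return errors
```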
Valuation Roll
• Legacy structure, 50 years old
• Variable maintenance standard
• Valuer General audit (DQ spec)
Rules Exercise
• Split into pairs
• Examine sample DVR dataset
• Devise some rules for each category
• Verbal discussion with class
Lunch
Data Warehouse & ETL
• Why not direct access to online DB?
• Staging Area
• Scripting tools
• Trade-offs
• KPI for project
  • better quality than source
  • better quality than target
ETL Extract
• Extract
ETL Transform
• The importance of primary keys
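One reason primary keys matter in the transform step: with a stable key, a staged source extract can be compared with the warehouse copy to find inserts, updates and deletes. A minimal sketch, assuming rows are dictionaries and `key` names the primary key field (both hypothetical):

```python
def diff_by_key(source, target, key):
    """Compare two row sets by primary key.

    Returns (inserts, updates, deletes) as sorted lists of key values.
    """
    src = {r[key]: r for r in source}
    tgt = {r[key]: r for r in target}
    inserts = sorted(src.keys() - tgt.keys())  # in source only
    deletes = sorted(tgt.keys() - src.keys())  # in target only
    updates = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])
    return inserts, updates, deletes
```

Without a reliable key, none of the three lists can be computed, and the only safe load is a full replace.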
ETL Load
• Batch offline most common
• Daily status usually enough
Safe Software FME
• Examples
Afternoon Tea
Data Quality Team
[Diagram: IT, DQ Team, Users]
Maintenance of features
• Time series important
• Line/polygon features are not atomic
• Splitting loses inheritance
• Calculating depreciation
• Direct editing bypasses business rules
Maintenance of the Quality
• Gardening, not mountain climbing
• Discussion of course topics
References
• Data Quality: The Accuracy Dimension – Jack E. Olson
• The Data Warehouse ETL Toolkit – Ralph Kimball
Please fill in evaluation forms
Finish