Dataset Quality Ontology - An Engineering Experience

Post on 14-Jan-2017

Dataset Quality Ontology: An engineering experience

Jeremy Debattista, University of Bonn / Fraunhofer IAIS

Germany

… who am I

•  PhD Student at the University of Bonn

•  Originally from Malta (tiny island in the Mediterranean between Italy and Libya)

debattis@cs.uni-bonn.de

… who am I

•  B.Sc (Hons) in Computer Science – University of Malta
   –  Thesis: Collaborative Editing and Expert Finding

•  M.AppSc in Computer Science – DERI (now Insight), National University of Ireland, Galway
   –  Thesis: Ontology-based rules for User-Controlled Support in Ubiquitous Environments

… my PhD – the big picture

•  Work related to Data Quality (in LD)
   –  representing quality metadata (daQ)
   –  assessing data quality (Luzzu)
   –  identifying new metrics from standard vocabularies (like PROV-O) using different techniques for scalability.

… agenda

•  Definitions of Quality

•  Look at some quality aspects re: Ontology Engineering

•  Our experience in developing daQ
   –  contributions towards a W3C vocab

•  VoCoL as a tool for collaborative vocabulary development

•  VOWL as a tool for visual representation of ontologies

… quality?

Robert Pirsig, Joseph Juran, Philip Crosby

•  Robert Pirsig: "… the result of care" – Zen and the Art of Motorcycle Maintenance (1974)

•  Joseph Juran: "… fitness for use" – Quality Control Handbook (1974)

•  Philip Crosby: "… conformance to requirements" – Quality is Free: The Art of Making Quality Certain. Mentor book. (1979)

… what is quality for you?

… Quality as defined in a dictionary

D1. how good or bad something is

D2. a characteristic or feature that someone or something has

D3. a high level of value or excellence

… definitions from http://www.merriam-webster.com

… dereferenceability

"Any HTTP URI should be dereferenceable, meaning that HTTP clients can look up the URI using the HTTP protocol and retrieve a description of the resource that is identified by the URI." – Tom Heath, Chris Bizer: Linked Data Book (LDEvol)

… the process that retrieves a representation of the requested resource

[Diagram: a URI lookup answered with 303 See Other, followed by 200 OK]
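
The 303 See Other / 200 OK lookup can be sketched in Python as follows. This is only an illustration, not how any particular tool implements it: the `fetch` callable, the `max_hops` limit and the example.org URIs are assumptions made so that the sketch runs without network access.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: fetch(uri) -> (status code, Location header or None)
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECTS = (301, 302, 303, 307, 308)


def dereference(uri: str, fetch: Fetch, max_hops: int = 5) -> List[Tuple[str, int]]:
    """Follow redirects starting at `uri`; return the (uri, status) trail."""
    trail = []
    current = uri
    for _ in range(max_hops):
        status, location = fetch(current)
        trail.append((current, status))
        if status in REDIRECTS and location:
            current = location  # e.g. 303 See Other points at a document
            continue
        break
    return trail


def is_dereferenceable(trail: List[Tuple[str, int]]) -> bool:
    """True when the lookup ended with 200 OK."""
    return bool(trail) and trail[-1][1] == 200


def fake_fetch(uri: str) -> Tuple[int, Optional[str]]:
    """Stand-in for a real HTTP client (assumption, for offline use)."""
    responses = {
        "http://example.org/id/alice": (303, "http://example.org/doc/alice"),
        "http://example.org/doc/alice": (200, None),
    }
    return responses.get(uri, (404, None))


trail = dereference("http://example.org/id/alice", fake_fetch)
print(trail, is_dereferenceable(trail))
```

A real check would replace `fake_fetch` with an HTTP client that sends an Accept header for an RDF serialisation.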

… unknown external ontology used

•  Usage of ontologies that cannot be dereferenced.
   –  rdfs:domain
   –  rdfs:range
   –  rdfs:subClassOf
   –  …

•  Usage of deprecated classes (classes tagged with owl:DeprecatedClass)
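
A minimal sketch of the deprecated-class check, assuming triples are held as plain `(subject, predicate, object)` tuples of prefixed names. The function name, the `ext:` and `ex:` terms and the set of inspected predicates are illustrative assumptions.

```python
RDF_TYPE = "rdf:type"
OWL_DEPRECATED_CLASS = "owl:DeprecatedClass"

# Positions in which an external class is typically referenced (assumption)
CHECKED = {"rdfs:domain", "rdfs:range", "rdfs:subClassOf", RDF_TYPE}


def deprecated_usages(ontology, external):
    """Triples in `ontology` whose object is marked deprecated in `external`."""
    deprecated = {
        s for s, p, o in external
        if p == RDF_TYPE and o == OWL_DEPRECATED_CLASS
    }
    return [t for t in ontology if t[1] in CHECKED and t[2] in deprecated]


external = [
    ("ext:OldThing", RDF_TYPE, OWL_DEPRECATED_CLASS),
    ("ext:Thing", RDF_TYPE, "owl:Class"),
]
ontology = [
    ("ex:Widget", "rdfs:subClassOf", "ext:OldThing"),
    ("ex:hasPart", "rdfs:range", "ext:Thing"),
]
print(deprecated_usages(ontology, external))
```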

… licensing

"Specify an appropriate open data license. Data (in our case ontology) reuse is more likely to occur when there is a clear statement about the origin, ownership and terms related to the use of the published data" – BP for publishing Linked Data (https://www.w3.org/TR/ld-bp/)

•  A way to define clear boundaries

•  Inclusion of a machine- and human-readable license in the ontology's meta-information

… ontology declaration

•  Describing an ontology using owl:Ontology (and voaf:Vocabulary – for LOV inclusion)
   –  Metadata includes: creator, date modified, description, version info, preferred namespace URI, preferred prefix…
   –  other provenance information such as history of changes etc…
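
The metadata list can be turned into a simple completeness check. The sketch below is one possible reading: the slides name the kinds of metadata, not the properties, so the concrete property choices (`dcterms:*`, `vann:*`, `owl:versionInfo`, and `dcterms:license` for the license) are assumptions, as is the tuple representation of triples.

```python
# Assumed mapping of the listed metadata to concrete properties
REQUIRED = {
    "dcterms:creator",
    "dcterms:modified",
    "dcterms:description",
    "dcterms:license",
    "owl:versionInfo",
    "vann:preferredNamespaceUri",
    "vann:preferredNamespacePrefix",
}


def missing_metadata(triples, ontology_iri):
    """List what the ontology header at `ontology_iri` still lacks."""
    declared = any(
        s == ontology_iri and p == "rdf:type" and o == "owl:Ontology"
        for s, p, o in triples
    )
    present = {p for s, p, o in triples if s == ontology_iri}
    missing = sorted(REQUIRED - present)
    if not declared:
        missing.insert(0, "rdf:type owl:Ontology")
    return missing


header = [
    ("ex:onto", "rdf:type", "owl:Ontology"),
    ("ex:onto", "dcterms:creator", "ex:jane"),
    ("ex:onto", "dcterms:license", "http://creativecommons.org/licenses/by/4.0/"),
]
print(missing_metadata(header, "ex:onto"))
```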

… domain and range definition

•  Open domain-range is not recommended

•  Reduces interoperability and thus "understanding" of resources' properties
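
Properties left with an open domain or range can be found with a scan like the one below. It is a sketch under assumptions: triples are plain tuples of prefixed names, and only properties explicitly typed with one of the listed property classes are inspected.

```python
PROPERTY_TYPES = {"rdf:Property", "owl:ObjectProperty", "owl:DatatypeProperty"}


def open_properties(triples):
    """Map each declared property to the definitions it lacks."""
    props = {s for s, p, o in triples if p == "rdf:type" and o in PROPERTY_TYPES}
    with_domain = {s for s, p, o in triples if p == "rdfs:domain"}
    with_range = {s for s, p, o in triples if p == "rdfs:range"}
    report = {}
    for prop in sorted(props):
        gaps = []
        if prop not in with_domain:
            gaps.append("rdfs:domain")
        if prop not in with_range:
            gaps.append("rdfs:range")
        if gaps:
            report[prop] = gaps
    return report


vocab = [
    ("ex:author", "rdf:type", "owl:ObjectProperty"),
    ("ex:author", "rdfs:domain", "ex:Document"),
    ("ex:author", "rdfs:range", "ex:Person"),
    ("ex:title", "rdf:type", "owl:DatatypeProperty"),
    ("ex:title", "rdfs:domain", "ex:Document"),
    ("ex:related", "rdf:type", "rdf:Property"),
]
print(open_properties(vocab))
```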

… ontology hijacking

•  Redefinition of classes and properties in a vocabulary that is not their natural namespace.
   –  e.g. redefining foaf:Person in your own ontology to be a subclass of a newly defined Person concept.
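
The foaf:Person example suggests a simple heuristic: flag any defining statement whose subject lies outside the ontology's own namespace. The sketch below assumes subjects are full IRIs, predicates are prefixed names, and that the listed predicates are the ones that count as "defining"; the example.org namespace is illustrative.

```python
# Predicates treated as (re)defining their subject (assumption)
DEFINING = {
    "rdf:type",
    "rdfs:subClassOf",
    "rdfs:subPropertyOf",
    "rdfs:domain",
    "rdfs:range",
    "owl:equivalentClass",
}


def hijacked_terms(triples, own_namespace):
    """Terms from foreign namespaces that this ontology makes statements about."""
    return sorted({
        s for s, p, o in triples
        if p in DEFINING and not s.startswith(own_namespace)
    })


OWN = "http://example.org/onto#"
FOAF = "http://xmlns.com/foaf/0.1/"

sample = [
    (OWN + "Person", "rdf:type", "owl:Class"),
    (FOAF + "Person", "rdfs:subClassOf", OWN + "Person"),  # hijacking
    (OWN + "Student", "rdfs:subClassOf", FOAF + "Person"),  # plain reuse
]
print(hijacked_terms(sample, OWN))
```

Reusing foaf:Person as the object of a statement is fine; only statements with the foreign term as subject are reported.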

… consistency checking

•  Possible problems when using axioms such as:
   –  owl:InverseFunctionalProperty
   –  owl:disjointClass
   –  owl:disjointWith
   –  owl:inverseOf
   –  …
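
As one example of such a problem, the sketch below looks for resources typed with two classes that are declared owl:disjointWith each other. It only inspects asserted types and performs no reasoning (no subclass inference), so it is far weaker than a real OWL reasoner; the tuple representation and `ex:` names are assumptions.

```python
def disjointness_violations(triples):
    """Resources asserted to be instances of two disjoint classes."""
    disjoint = {
        frozenset((s, o)) for s, p, o in triples if p == "owl:disjointWith"
    }
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    violations = []
    for resource, classes in types.items():
        for pair in disjoint:
            if len(pair) == 2 and pair <= classes:
                violations.append((resource, tuple(sorted(pair))))
    return sorted(violations)


data = [
    ("ex:Cat", "owl:disjointWith", "ex:Dog"),
    ("ex:tom", "rdf:type", "ex:Cat"),
    ("ex:tom", "rdf:type", "ex:Dog"),
    ("ex:rex", "rdf:type", "ex:Dog"),
]
print(disjointness_violations(data))
```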

… other possible measures

•  Multilingualism

•  Human readable labels and comments

•  Interlinking with similar terms/concepts

•  Valid Syntax

•  Un-typed classes and properties

•  … others?

… the daQ meta-model

… daQ – History

•  Describing Quality Metadata in a standardised manner

•  Started around the end of 2013

•  Forms the basis of the upcoming W3C Data Quality Vocabulary (DQV) standard

… daQ – The First Version

… daQ – Subsequent Versions

•  Always pen, paper and a simple text editor
   –  Git as a version control system

•  4 versions before the current version

•  Use case iteration testing

… daQ – 2nd Version

•  Introduced: QualityGraph, and 3 levels of Abstraction (based on Zaveri et al. categorisation)

[Figure: daQ 2nd version. Classes and datatypes: rdfg:Graph, QualityGraph, Category, Dimension, Metric, rdfs:Resource, xsd:dateTime. Properties: hasDimension, hasMetric, requires, dateComputed, value, computedOn. Markers: A, B.]

… daQ – Abstraction

•  Hiding Complexity

•  Distinction between daQ concepts and tangible quality measure concepts

•  Abstract classes cannot be typed (rdf:type), but instead should be sub-classed (rdfs:subClassOf)

•  There is no way to check for abstract class violation unless there is an application that checks such syntax errors.
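
An application-level check of this kind could look like the sketch below. It assumes the abstract classes are daq:Category, daq:Dimension and daq:Metric, and that triples are plain tuples of prefixed names; the `ex:` terms are illustrative.

```python
# Assumed set of abstract daQ classes
ABSTRACT = {"daq:Category", "daq:Dimension", "daq:Metric"}


def abstract_violations(triples):
    """Resources typed directly with an abstract class instead of sub-classing it."""
    return sorted(
        (s, o) for s, p, o in triples if p == "rdf:type" and o in ABSTRACT
    )


quality_vocab = [
    ("ex:Accessibility", "rdfs:subClassOf", "daq:Category"),  # allowed
    ("ex:Availability", "rdfs:subClassOf", "daq:Dimension"),  # allowed
    ("ex:myMetric", "rdf:type", "daq:Metric"),                # violation
]
print(abstract_violations(quality_vocab))
```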

… daQ – Why Abstract Properties?

•  Best Practice to avoid doubt and ambiguity:
   –  A metric is attached to one dimension only.
   –  A dimension is attached to one category only.

•  Unified view also presented in Zaveri et al. Data Quality Survey
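
The "one parent only" practice can be verified mechanically. The sketch below assumes that concrete sub-properties have already been normalised to the abstract names daq:hasMetric and daq:hasDimension, which is a simplification; the function name and `ex:` terms are illustrative.

```python
from collections import defaultdict


def multiply_attached(triples, link):
    """Children reached via `link` from more than one parent."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == link:
            parents[o].add(s)
    return {
        child: sorted(ps) for child, ps in parents.items() if len(ps) > 1
    }


links = [
    ("ex:Accessibility", "daq:hasDimension", "ex:Availability"),
    ("ex:Availability", "daq:hasMetric", "ex:DereferenceabilityMetric"),
    ("ex:Licensing", "daq:hasMetric", "ex:DereferenceabilityMetric"),
]
print(multiply_attached(links, "daq:hasMetric"))
print(multiply_attached(links, "daq:hasDimension"))
```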

… daQ – 3rd Version

•  Introduced: The Data Cube Vocabulary

[Figure: daQ 3rd version. Classes and datatypes: rdfg:Graph, QualityGraph, qb:DataSet, Category, Dimension, Metric, rdfs:Resource, qb:Observation, xsd:dateTime. Properties: definesQBDataSet, hasDimension, hasMetric, requires, dateComputed, value, hasObservation, computedOn, metric, qb:dataSet. Markers: A, B.]

… daQ – 4th Version

•  Modified: QualityGraph; Introduced: expectedDataType; Added: date to Observation

[Figure: daQ 4th version. Classes and datatypes: rdfg:Graph, QualityGraph, qb:DataSet, Category, Dimension, Metric, rdfs:Resource, qb:Observation, xsd:anySimpleType. Properties: hasDimension, hasMetric, expectedDataType, requires, value, hasObservation, computedOn, metric, qb:dataSet, dc:date. Markers: A, B.]

… daQ – can a metric return a value other than a simple datatype?

"This property from DAQ is defined to have range xsd:anySimpleType. While it seems useful to define the expected data type for a metric, a simple type may too narrow: in many cases a metric will be determined on a data record or a subgraph." – [Bailer Warner 28/10/2015] W3C DWBP Public Comments List - https://lists.w3.org/Archives/Public/public-dwbp-comments/2015Oct/0019.html

… daQ – 5th Version

•  Introduced: isEstimate, computedBy

[Figure: daQ 5th version. Classes and datatypes: rdfg:Graph, QualityGraph, qb:DataSet, Category, Dimension, Metric, rdfs:Resource, qb:Observation, prov:Agent, xsd:anySimpleType, xsd:boolean. Properties: hasDimension, hasMetric, expectedDataType, requires, value, hasObservation, computedOn, metric, qb:dataSet, dc:date, isEstimate, computedBy. Markers: A, B.]

… daQ – Current Version

•  Removed: computedBy, qb:Observation; Added: daq:Observation

[Figure: daQ current version. Classes and datatypes: rdfg:Graph, QualityGraph, qb:DataSet, Category, Dimension, Metric, rdfs:Resource, daq:Observation, qb:Observation, prov:Entity, xsd:anySimpleType, xsd:boolean, xsd:dateTime. Properties: hasDimension, hasMetric, expectedDataType, requires, value, hasObservation, computedOn, metric, qb:dataSet, sdmx-dimension:timePeriod, isEstimate. Markers: A, B.]
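
To make the current model more concrete, the sketch below represents a single quality observation and flattens it into triples. The property names come from the diagram labels; how they attach to daq:Observation, the `daq:` prefix on each, the literal formatting and all `ex:` identifiers are assumptions of this sketch, not a statement of the published vocabulary.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    """One metric value computed on one resource (illustrative structure)."""
    iri: str
    computed_on: str
    metric: str
    value: float
    is_estimate: bool
    time_period: datetime

    def to_triples(self):
        # Literals are kept as plain strings for simplicity
        return [
            (self.iri, "rdf:type", "daq:Observation"),
            (self.iri, "daq:computedOn", self.computed_on),
            (self.iri, "daq:metric", self.metric),
            (self.iri, "daq:value", repr(self.value)),
            (self.iri, "daq:isEstimate", str(self.is_estimate).lower()),
            (self.iri, "sdmx-dimension:timePeriod", self.time_period.isoformat()),
        ]


obs = Observation(
    iri="ex:obs1",
    computed_on="ex:dataset",
    metric="ex:dereferenceabilityMetric",
    value=0.85,
    is_estimate=False,
    time_period=datetime(2016, 1, 1, tzinfo=timezone.utc),
)
for triple in obs.to_triples():
    print(triple)
```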

… involvement in W3C

•  W3C Working Group – Data on the Web Best Practices
   –  develop open data ecosystem
   –  provide guidance to publishers
   –  foster trust in data

•  3 Deliverables:
   –  Best Practices
   –  Data Quality Vocabulary (DQV)
   –  Data Usage Vocabulary

… involvement in W3C - DQV

•  A meta-model to cover many quality aspects of a dataset (linked data or not)

•  The core component describing quantitative measures is inspired by daQ

… involvement in W3C - DQV

•  Notable issues (204, 205) between daQ and DQV (https://www.w3.org/2013/dwbp/track/issues/xxx, where xxx is 204 or 205)
   –  Usage of abstract classes and properties
   –  Defining Category-Dimension-Metric as subclass of skos:Concept

… collaborative framework

•  VoCoL – an IDE for collaborative vocabulary development with VCS integration

•  A (exchangeable) component based system
   –  Human Readable Document Generation
   –  Intelligent Turtle Editor
   –  Evolution Tracker
   –  Ontology Visualisation
   –  SPARQL Endpoint Service
   –  Client-Side validation before commit to VCS

•  Online Demo: http://butterbur06.iai.uni-bonn.de/

… visualising ontologies - VOWL

•  VOWL – A visual notation for OWL
   –  Intuitive
   –  Self-explaining
   –  Comprehensible
   –  Well-specified
   –  Complete
   –  Device-independent

http://vowl.visualdataweb.org

… VOWL implementations

•  Protégé Plugin

•  WebVOWL

… daQ in VOWL

… References and Links

•  (daQ) - Representing dataset quality metadata using multi-dimensional views – J. Debattista, C. Lange, S. Auer

•  (DQV) - https://www.w3.org/TR/vocab-dqv/

•  (VoCoL) - https://github.com/vocol/vocol

•  (Zaveri et al.) - Quality Assessment for Linked Data: A Survey

•  (LDEvol) - http://linkeddatabook.com/editions/1.0/

•  (VOWL) - http://vowl.visualdataweb.org

Any Suggestions? Possible Collaborations? debattis@cs.uni-bonn.de