Date posted: 28-Jan-2018
Validating RDF Data Quality using Constraints
to Direct the Development of Constraint Languages
Thomas Hartmann
Benjamin Zapilko, Joachim Wackerow, Kai Eckert
IEEE International Conference on Semantic Computing (ICSC 2016)
XML Validation
<!ELEMENT library (book+, author*)>
<!ELEMENT book (isbn, title, author-ref+)>
<!ATTLIST book
id ID #REQUIRED
>
<!ELEMENT author-ref EMPTY>
<!ATTLIST author-ref
id IDREF #REQUIRED
>
<!ELEMENT author (name)>
<!ATTLIST author
id ID #REQUIRED
>
<!ELEMENT isbn (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT name (#PCDATA)>
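The ID/IDREF declarations in the DTD above say that every author-ref must point at an id declared elsewhere in the document. A minimal sketch of just that one check, using Python's standard library instead of a validating parser; the sample document and the function name `dangling_author_refs` are assumptions made for illustration, and the content models (book+, author*, and so on) are not checked here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the second author-ref points at a missing author.
SAMPLE = """
<library>
  <book id="b1">
    <isbn>0000000000</isbn>
    <title>Example Title</title>
    <author-ref id="a1"/>
    <author-ref id="a2"/>
  </book>
  <author id="a1"><name>Example Author</name></author>
</library>
"""

def dangling_author_refs(xml_text):
    """Return author-ref ids that match no declared ID in the document."""
    root = ET.fromstring(xml_text)
    # In the DTD, the id attributes of book and author are of type ID.
    declared = {el.get("id") for el in root.iter() if el.tag in ("book", "author")}
    # The id attribute of author-ref is of type IDREF and must match a declared ID.
    return [ref.get("id") for ref in root.iter("author-ref")
            if ref.get("id") not in declared]

print(dangling_author_refs(SAMPLE))
```

The RDF constraint languages on the following slides aim at the same kind of closed-world check, but over triples rather than element trees.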
RDF Validation Workshop
Working Groups on RDF Validation
W3C Data Shapes Working Group
DCMI RDF Application Profiles Task Group
http://purl.org/net/rdf-validation
81 Types of Constraints on RDF Data
Constraint Languages
SPARQL Query Language for RDF
SELECT ?concept
WHERE {
?concept a [ rdfs:subClassOf* skos:Concept ] .
FILTER NOT EXISTS {
?concept ?p ?o .
FILTER ( ?p IN (
skos:related,
skos:relatedMatch,
skos:broader, ... ) ) . } }
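The query above returns every concept that is the subject of none of the listed SKOS relations. A rough sketch of the same logic over an in-memory set of triples, under two stated simplifications: concepts are assumed to be typed directly as skos:Concept (the rdfs:subClassOf* path is not followed), and only the three relations named on the slide are used, since the rest of the list is elided there. The ex: resources and the function name are hypothetical.

```python
RDF_TYPE = "rdf:type"

# Only the three properties named on the slide; its list continues with "...".
RELATIONS = {"skos:related", "skos:relatedMatch", "skos:broader"}

# Hypothetical sample data as (subject, predicate, object) tuples.
triples = {
    ("ex:cat", RDF_TYPE, "skos:Concept"),
    ("ex:animal", RDF_TYPE, "skos:Concept"),
    ("ex:orphan", RDF_TYPE, "skos:Concept"),
    ("ex:cat", "skos:broader", "ex:animal"),
    ("ex:animal", "skos:related", "ex:cat"),
}

def unrelated_concepts(triples):
    """Concepts that are the subject of none of the listed relations."""
    concepts = {s for s, p, o in triples if p == RDF_TYPE and o == "skos:Concept"}
    related = {s for s, p, o in triples if p in RELATIONS}
    # Mirrors FILTER NOT EXISTS: keep concepts without any matching triple.
    return sorted(concepts - related)

print(unrelated_concepts(triples))
```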
SPARQL Inferencing Notation (SPIN)
# FILTER NOT EXISTS { ?book :author ?person }
[ a sp:Filter ;
sp:expression [
a sp:notExists ;
sp:elements (
[ sp:subject [ sp:varName "book" ] ;
sp:predicate :author ;
sp:object [ sp:varName "person" ]])]])
Web Ontology Language (OWL)
:Publication rdfs:subClassOf
[ a owl:Restriction ;
owl:onProperty :author ;
owl:allValuesFrom :Person ] .
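Read as a constraint rather than as an inference rule, the axiom above requires every :author value of a :Publication to be a :Person. Under OWL's usual open-world reading a reasoner would instead infer the missing type, so the sketch below assumes a closed-world interpretation with explicitly asserted types and no subclass reasoning. The function name and the ex: sample data are hypothetical.

```python
def all_values_from_violations(triples, cls, prop, value_cls):
    """Return (subject, value) pairs where the value lacks the required type."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    return sorted(
        (s, o)
        for s, p, o in triples
        if p == prop
        and cls in types.get(s, set())            # subject is in the restricted class
        and value_cls not in types.get(o, set())  # value is not asserted as value_cls
    )

# Hypothetical sample data.
triples = {
    ("ex:paper1", "rdf:type", ":Publication"),
    ("ex:paper1", ":author", "ex:alice"),
    ("ex:paper1", ":author", "ex:acme"),
    ("ex:alice", "rdf:type", ":Person"),
    ("ex:acme", "rdf:type", ":Organization"),
}

print(all_values_from_violations(triples, ":Publication", ":author", ":Person"))
```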
Shape Expressions (ShEx)
:Publication {
( :isbn xsd:string, :title xsd:string )
|
( :issn xsd:string, :title xsd:string )}
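The shape above accepts a publication that has either an ISBN and a title or an ISSN and a title. A simplified reading of that rule as a plain function, assuming each listed property must occur exactly once with a string value and that the two alternatives exclude each other; this illustrates the disjunction only and is not a ShEx implementation. The function name and the input layout are assumptions.

```python
def matches_publication(props):
    """props maps a predicate name to the list of its literal values."""
    def exactly_one_string(pred):
        values = props.get(pred, [])
        return len(values) == 1 and isinstance(values[0], str)

    def absent(pred):
        return not props.get(pred)

    # First alternative: ( :isbn xsd:string, :title xsd:string )
    book_like = exactly_one_string(":isbn") and absent(":issn")
    # Second alternative: ( :issn xsd:string, :title xsd:string )
    serial_like = exactly_one_string(":issn") and absent(":isbn")
    # :title is required in both alternatives.
    return exactly_one_string(":title") and (book_like or serial_like)

print(matches_publication({":isbn": ["0000000000"], ":title": ["Example"]}))
print(matches_publication({":title": ["Example"]}))
```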
Resource Shapes (ReSh)
:Computer-Science-Book
a oslc:ResourceShape ;
oslc:property [
oslc:propertyDefinition :subject ;
oslc:allowedValues [
oslc:allowedValue
"Computer Science" ,
"Informatics" ,
"Information Technology" ] ] .
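The resource shape above restricts :subject to three allowed literals. A minimal sketch of that allowed-values check; the function name and the sample input are hypothetical, while the three values are taken from the slide.

```python
# Allowed values exactly as listed in the resource shape.
ALLOWED_SUBJECTS = {"Computer Science", "Informatics", "Information Technology"}

def disallowed_subjects(subjects):
    """Return the subject values that are not in the allowed set."""
    return [s for s in subjects if s not in ALLOWED_SUBJECTS]

print(disallowed_subjects(["Informatics", "Biology"]))
```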
Description Set Profiles (DSP)
[ a dsp:DescriptionTemplate ;
dsp:resourceClass :Science-Fiction-Book ;
dsp:statementTemplate [
dsp:property :subject ;
dsp:nonLiteralConstraint [
dsp:valueClass skos:Concept ;
dsp:valueURI
:Science-Fiction, :Sci-Fi, :SF ;
dsp:vocabularyEncodingScheme
:Science-Fiction-Book-Subjects ; ] ] .
Shapes Constraint Language (SHACL)
:BookShape
a sh:Shape ;
sh:scopeClass :Book ;
sh:property [
sh:predicate :author ;
sh:valueShape :PersonShape ;
sh:minCount 1 ; ] .
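The shape above requires every :Book to have at least one :author. A sketch of only the minimum-cardinality part over plain triples; the sh:valueShape check against :PersonShape is left out because that shape is not defined on the slides. The function name and the ex: sample data are hypothetical.

```python
from collections import Counter

def min_count_violations(triples, scope_class, predicate, min_count):
    """Return instances of scope_class with fewer than min_count predicate values."""
    focus_nodes = sorted({s for s, p, o in triples
                          if p == "rdf:type" and o == scope_class})
    counts = Counter(s for s, p, o in triples if p == predicate)
    # Counter yields 0 for nodes that never occur with the predicate.
    return [node for node in focus_nodes if counts[node] < min_count]

# Hypothetical sample data: ex:book2 has no author.
triples = {
    ("ex:book1", "rdf:type", ":Book"),
    ("ex:book1", ":author", "ex:alice"),
    ("ex:book2", "rdf:type", ":Book"),
}

print(min_count_violations(triples, ":Book", ":author", 1))
```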
Constraint Types Classification
1. RDFS/OWL Based
2. Constraint Language Based
3. SPARQL Based
RDFS/OWL Based
:Publication rdfs:subClassOf
[ a owl:Restriction ;
owl:onProperty :author ;
owl:allValuesFrom :Person ] .
Constraint Language Based
:Publication {
( :isbn xsd:string, :title xsd:string )
|
( :issn xsd:string, :title xsd:string )}
SPARQL Based
SELECT ?concept
WHERE {
?concept a [ rdfs:subClassOf* skos:Concept ] .
FILTER NOT EXISTS {
?concept ?p ?o .
FILTER ( ?p IN (
skos:related,
skos:relatedMatch,
skos:broader, ... ) ) . } }
Constraints Classification
1. Informational
2. Warning
3. Error
Evaluation Setup
• 115 constraints from vocabularies and experts
• constraints classified and implemented
• on 3 vocabularies in the SBE sciences
– well-established vocabularies (QB, SKOS)
– vocabulary under development (DDI-RDF)
Validated Data Sets
Vocabulary   Data Sets   Triples
QB           9,990       3,775,983,610
SKOS         4,178       477,737,281
DDI-RDF      1,526       9,673,055
Total        15,694      4.26 billion
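The totals in the table can be reproduced from the per-vocabulary rows. A short arithmetic check using only the figures given above; the variable names are arbitrary.

```python
# (data sets, triples) per vocabulary, copied from the table.
rows = {
    "QB": (9_990, 3_775_983_610),
    "SKOS": (4_178, 477_737_281),
    "DDI-RDF": (1_526, 9_673_055),
}

total_data_sets = sum(data_sets for data_sets, _ in rows.values())
total_triples = sum(n_triples for _, n_triples in rows.values())

print(total_data_sets)                 # 15694
print(total_triples)                   # 4263393946
print(round(total_triples / 1e9, 2))   # 4.26 (billion)
```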
33 SPARQL Endpoints
Finding 1
           C [%]   CV [%]
SPARQL     63.2    78.2
CL         34.7    21.8
RDFS/OWL   35.6    21.8
C (constraints), CV (constraint violations)
Finding 2
           C [%]   CV [%]
SPARQL     63.2    78.2
CL         34.7    21.8
RDFS/OWL   35.6    21.8
C (constraints), CV (constraint violations)
Finding 3
           C [%]   CV [%]
Info       42.3    31.3
Warning    18.7    62.7
Error      39.0    6.1
C (constraints), CV (constraint violations)
Limitations
• 3 vocabularies
• 1 domain