Date post: 29-Dec-2015
PHDD – An RDF Vocabulary for the Physical Data Description Work in Progress Joachim Wackerow and Thomas Bosch (both GESIS – Leibniz Institute for the Social Sciences)
PHDD – An RDF Vocabulary for the Physical Data Description

Work in Progress

Joachim Wackerow and Thomas Bosch (both GESIS – Leibniz Institute for the Social Sciences)

What is it?

• Description of the physical properties of a data file.
• Focus on most common format types
  – Rectangular format
  – Character-separated values (CSV) or fixed record-length

Character-separated Values

Header row
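A minimal sketch of how a program might read a character-separated file once its physical properties are known. The property names (`delimiter`, `textQualifier`, `namesOnFirstRow`, `firstDataLine`) are taken from the Delimited class in the UML model on a later slide; the dictionary layout, the sample data, and the helper `read_delimited` are hypothetical and only illustrate the idea.

```python
import csv
import io

# Hypothetical description of a delimited file. The keys reuse property
# names from the PHDD Delimited class; the values are made up.
description = {
    "delimiter": ",",
    "textQualifier": '"',
    "namesOnFirstRow": True,
    "firstDataLine": 2,  # assumed to be a 1-based line number
}

# Made-up sample data with a header row.
sample = 'id,name,income\n1,"Smith, Anna",52000\n2,"Lee, Ben",48000\n'


def read_delimited(text, desc):
    """Return (column_names, data_rows) for a character-separated text."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=desc["delimiter"],
        quotechar=desc["textQualifier"],
    )
    records = list(reader)
    # Column names come from the first row only if the description says so.
    names = records[0] if desc["namesOnFirstRow"] else None
    # Skip everything before the first data line.
    rows = records[desc["firstDataLine"] - 1:]
    return names, rows


names, rows = read_delimited(sample, description)
print(names)
print(rows)
```

The text qualifier is what allows the delimiter to appear inside a value such as `Smith, Anna`; without it the record would split into four fields.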

Fixed Record-length Format
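For comparison, a sketch of reading a fixed record-length file. The keys `startPosition` and `width` reuse property names from FixedColumnDescription, and `record_length` corresponds to `recordLength` in FixedRecordLength. It is an assumption here that positions are 1-based and that the record length excludes the line terminator; the column layout, sample records, and the helper `parse_fixed` are hypothetical.

```python
# Hypothetical column layout; names and positions are made up.
columns = [
    {"name": "id", "startPosition": 1, "width": 3},
    {"name": "name", "startPosition": 4, "width": 10},
    {"name": "income", "startPosition": 14, "width": 6},
]

# Made-up sample data: two records of 19 characters each.
sample = (
    "001Smith      52000\n"
    "002Lee        48000\n"
)


def parse_fixed(text, cols, record_length=None):
    """Slice each record into named fields by position and width."""
    rows = []
    for line in text.splitlines():
        if record_length is not None and len(line) != record_length:
            raise ValueError(f"expected {record_length} characters, got {len(line)}")
        row = {}
        for col in cols:
            start = col["startPosition"] - 1  # convert 1-based to 0-based
            row[col["name"]] = line[start:start + col["width"]].strip()
        rows.append(row)
    return rows


rows = parse_fixed(sample, columns, record_length=19)
print(rows)
```

Unlike the delimited case, nothing in the file itself marks where a column ends, so the external description is required to read the data at all.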

Motivation

• Data.gov and similar initiatives provide data in CSV format or similar
• W3C Government Linked Data Working Group Charter
  – “The mission … is to provide standards and other information which help governments around the world publish their data as effective and usable Linked Data using Semantic Web technologies.”
• Machine-actionability: intended for program use

Example at data.gov

Existing Approaches

• CSV on the Web Working Group Charter (W3C)
• Linkage
  – Linked CSV (Jeni Tennison)
  – CSV linked data (Quoderat)
• Description
  – Common Format and MIME Type for Comma-Separated Values (CSV) Files, RFC 4180
  – csv: a vocabulary for describing CSV files (Rurik Thomas Greenall, Norwegian University at Trondheim)
• Representations
  – URI design for RDF conversion of CSV-based data (Tim Lebo, Gregory Todd Williams)
  – CSV2RDF Application (Ivan Ermilov, Sören Auer, Claus Stadler)

Research results:
  – Intended for different purposes
  – Description approaches not sufficient

PHDD – First Ideas

PHDD – UML Model

class phdd

TableDescription
- caseQuantity : xsd:nonNegativeInteger [0..1]
- fileName : xsd:string, xsd:uri
- recordsPerCase : xsd:positiveInteger = 1
- overallRecordCount : xsd:nonNegativeInteger [0..1]

TableStructure
- characterSet : xsd:string
- defaultDecimalSeparator : xsd:string [0..1]
- defaultDigitGroupSeparator : xsd:string [0..1]
- defaultLanguage : xsd:string [0..1]
- defaultLocale : xsd:string [0..1]
- defaultDecimalPositions : xsd:positiveInteger [0..1]
- newLine : xsd:string = CRLF

ColumnDescription
- recommendedDataType : xsd:string [0..1]
- storageFormat : xsd:string [0..1]
- recommendedDisplayDataFormat : xsd:string [0..1]
- decimalPositions : xsd:positiveInteger [0..1]
- recordNumber : xsd:positiveInteger [0..1] = 1

FixedRecordLength
- recordLength : xsd:positiveInteger [0..1]

Delimited
- delimiter : xsd:string
- textQualifier : xsd:string [0..1]
- consecutiveDelimitersAsOne : xsd:boolean [0..1] = false
- namesOnFirstRow : xsd:boolean [0..1] = true
- firstDataLine : xsd:positiveInteger [0..1] = 2

InputProgram
- programFileName : xsd:string
- softwareType : xsd:string
- programVersion : xsd:string

FixedColumnDescription
- startPosition : xsd:positiveInteger
- endPosition : xsd:positiveInteger [0..1]
- width : xsd:positiveInteger [0..1]

DelimitedColumnDescription
- columnPosition : xsd:positiveInteger

Classes without attributes: Distribution, Table, Column, skos:Concept

Associations (name, with the multiplicities shown at each end):
- isDescribedBy : 0..* to 1
- isStructuredBy : 0..* to 1
- isDescribedBy : 0..* to 1
- column : 0..* to 1..*
- storageFormat : 0..* to 0..1
- inputProgram : 0..1 to 0..*
- defaultLocale : 0..* to 0..1
- defaultLanguage : 0..* to 0..1
- characterSet : 0..* to 1
- recommendedDisplayDataFormat : 0..* to 0..1
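As a rough illustration of how the attribute lists and default values in the model could be carried into a program, the sketch below mirrors two of the classes as Python dataclasses. Attribute names, optionality, and defaults follow the model; the method `resolved_width`, and the assumption that `endPosition` is inclusive, are not part of the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Delimited:
    """Mirrors the Delimited class; defaults follow the UML model."""
    delimiter: str
    textQualifier: Optional[str] = None
    consecutiveDelimitersAsOne: bool = False
    namesOnFirstRow: bool = True
    firstDataLine: int = 2


@dataclass
class FixedColumnDescription:
    """Mirrors FixedColumnDescription; endPosition and width are optional."""
    startPosition: int
    endPosition: Optional[int] = None
    width: Optional[int] = None

    def resolved_width(self) -> Optional[int]:
        # Hypothetical helper: prefer an explicit width, otherwise derive it
        # from the positions, assuming endPosition is inclusive.
        if self.width is not None:
            return self.width
        if self.endPosition is not None:
            return self.endPosition - self.startPosition + 1
        return None


semicolon_file = Delimited(delimiter=";")
name_column = FixedColumnDescription(startPosition=4, endPosition=13)
print(semicolon_file.firstDataLine)
print(name_column.resolved_width())
```

The defaults `namesOnFirstRow = true` and `firstDataLine = 2` fit together: a file with a header row on line 1 starts its data on line 2.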

PHDD - Overview

The general approach is not really new; PHDD just provides a complete set of properties for the most common cases.

Structure
• Table – the rectangular data file [disco::DataFile, dcat::Distribution]
  – TableStructure – common properties plus specific ones for delimited and fixed columns
• Column – common properties plus specific ones for delimited and fixed columns [disco::Variable]
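To make the structure concrete, the following sketch writes a small description as RDF triples in N-Triples syntax using plain string formatting. The class and property names come from the model in these slides. The namespace `http://example.org/phdd#`, the resource IRIs, the file name, and the choice of which class each association starts from are assumptions made for illustration; the authoritative definitions are in the PHDD repository.

```python
# Placeholder namespaces and IRIs: assumptions, not the official PHDD ones.
PHDD = "http://example.org/phdd#"
BASE = "http://example.org/data/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


def iri(value):
    """Wrap an IRI in angle brackets."""
    return f"<{value}>"


def literal(value, datatype):
    """Typed literal; assumes the value needs no escaping."""
    return f'"{value}"^^<{XSD}{datatype}>'


def describe_table(file_name, delimiter):
    """Build triples for a Table, its TableDescription and its structure."""
    table = iri(BASE + "table")
    description = iri(BASE + "tableDescription")
    structure = iri(BASE + "tableStructure")
    return [
        (table, iri(RDF_TYPE), iri(PHDD + "Table")),
        (table, iri(PHDD + "isDescribedBy"), description),
        (description, iri(RDF_TYPE), iri(PHDD + "TableDescription")),
        (description, iri(PHDD + "fileName"), literal(file_name, "string")),
        (description, iri(PHDD + "isStructuredBy"), structure),
        (structure, iri(RDF_TYPE), iri(PHDD + "Delimited")),
        (structure, iri(PHDD + "delimiter"), literal(delimiter, "string")),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one triple per line."""
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = to_ntriples(describe_table("survey.csv", ","))
print("\n".join(lines))
```

A real application would more likely use an RDF library and the published vocabulary IRIs; the hand-built strings here only show the shape of the data.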

Table Structure

Column Description

Usage Scenarios

[Diagram. Labels: Data; PHDD, Discovery, DCAT; Program; Description; Transformation; Analysis; Data; Data; User; Provider; Search; DDI XML]

Relationship to DDI XML

• Mapping to DDI XML Specifications
  – DDI Codebook 2.*
    • approx. half of the properties of PHDD
  – DDI Lifecycle 3.*
    • almost all properties of PHDD

Relationship of DDI Specifications

[Diagram. Labels: DDI Codebook 2.*; DDI Lifecycle 3.*; DDI 4 Model; OWL/RDF Representation; XML Schema Representation; PHDD; Discovery; XKOS; XML Schema; OWL/RDF; Future]

Acknowledgements

• Contributions by
  – Larry Hoyle (Institute for Policy & Social Research, University of Kansas)
  – Richard Cyganiak (DERI - Digital Enterprise Research Institute)

Further Information

• Development repository of PHDD
  – https://github.com/linked-statistics/physical-data-description
• DDI Alliance RDF Vocabularies
  – http://www.ddialliance.org/Specification/RDF
