
Metadata Provenance

Date post: 30-Jun-2015
Upload: kai-eckert
Description:
Two motivating scenarios for metametadata
Transcript
Page 1: Metadata Provenance

DCMI Metadata Provenance

Metadata Provenance: Two motivating scenarios for metametadata

Kai Eckert, Mannheim University Library

Michael Panzer, OCLC

DCMI Metadata Provenance F2F Meeting and Workshop

October 20th, 2010, Pittsburgh, PA, USA


Metametadata

Provenance information kept outside of existing data models, i.e. "transparent" to them.

Potential use cases:

Whenever you have lots of legacy data in a model that does not support provenance.

Whenever new applications require information that cannot be expressed in the existing data model.


Need for Metametadata

Metadata are also data, so we need additional data about them. → Metametadata

Metadata about a whole metadata record, not about single statements: Who created this metadata record? When was this record created? … → Metadata Provenance


Statements about (single) statements

Often proposed, but there are only vague instructions on how to implement it.

Needed if metadata records are created by combining single statements from different sources.

Needed to store arbitrary additional information about single statements that cannot easily be represented in the metadata format.


Metametadata vs. model-based provenance

Simple statement: Peter knows Paul.

Provenance information: This statement is made by Mary.

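[Figure: the statement "Peter knows Paul" (Peter and Paul linked by "Knows"), with Mary linked to that statement by "says" on a separate metalevel]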


Data model extension

[Figure: data model extension, with the nodes Peter, Paul, Mary, "Relation" and "Knows Relation" connected by the edges "Has Relation", "Has Object", "Has Creator" and "Has Type"]

Simple statement: Peter knows Paul.

Provenance information: This statement is made by Mary.
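A minimal Python sketch of the data model extension, assuming the relation is modeled as an object of its own that carries its type, its object and its creator. The classes and functions (`Person`, `Relation`, `known_by`, `stated_by`) are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Relation:
    """Hypothetical relation object of the extended data model."""
    rel_type: str  # "Has Type", e.g. "knows"
    obj: str       # "Has Object": the related person
    creator: str   # "Has Creator": who made the statement


@dataclass
class Person:
    name: str
    relations: List[Relation] = field(default_factory=list)  # "Has Relation"


peter = Person("Peter", [Relation(rel_type="knows", obj="Paul", creator="Mary")])


def known_by(person: Person) -> List[str]:
    """Whom does this person know? The answer has to be read
    through the intermediate Relation object."""
    return [r.obj for r in person.relations if r.rel_type == "knows"]


def stated_by(person: Person) -> List[str]:
    """Provenance is part of the model: who made each statement?"""
    return [r.creator for r in person.relations]


print(known_by(peter))   # ['Paul']
print(stated_by(peter))  # ['Mary']
```

The price shows up in `known_by`: every application that reads the data has to know about the intermediate `Relation` object, whereas on the metalevel the plain statement "Peter knows Paul" stays untouched.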


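[Figure: side-by-side comparison of the data model extension (Peter, Paul, Mary, "Relation", "Knows Relation"; edges "hasRelation", "Has Object", "Has Creator", "Has Type") and the metalevel approach (Peter "Knows" Paul; Mary "says" on the metalevel)]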


Implementation in RDF

This should not be limited to RDF! But RDF is a good example and currently has a high impact. RDF provides no satisfying answer to how provenance information should be expressed. There are different possible implementations, e.g.:

Reification, Named Graphs, extended data models, ...


RDF Reification

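RDF supports statements about statements by means of reification, literally "objectification" (actually a "subjectification"...).

"The book is written by Goethe" is said by Kai.

How this is done in RDF (Turtle; the book is given the resource ex:theBook, because the subject of an RDF statement cannot be a literal):

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex: <http://example.org/> .   # placeholder namespace (assumption)
ex:someID rdf:type rdf:Statement .
ex:someID rdf:subject ex:theBook .
ex:someID rdf:predicate ex:isWrittenBy .
ex:someID rdf:object "Goethe" .
ex:someID ex:isSaidBy "Kai" .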

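To see what reification produces, here is a small Python sketch that builds the reification triples as plain tuples. It assumes prefixed-name strings as terms and a hypothetical ID scheme (`ex:stmt1`, `ex:stmt2`, ...); `reify` is an illustrative helper, not an API of any RDF library.

```python
from itertools import count

_statement_ids = count(1)  # hypothetical source of fresh statement IDs


def reify(subject, predicate, obj):
    """Describe one triple as an rdf:Statement resource.

    Terms are plain strings in prefixed-name form; literals keep their
    quotes. Returns the new statement ID and the four reification triples.
    """
    sid = f"ex:stmt{next(_statement_ids)}"
    return sid, [
        (sid, "rdf:type", "rdf:Statement"),
        (sid, "rdf:subject", subject),
        (sid, "rdf:predicate", predicate),
        (sid, "rdf:object", obj),
    ]


sid, triples = reify("ex:theBook", "ex:isWrittenBy", '"Goethe"')
# The provenance is attached to the statement resource, not to the book.
triples.append((sid, "ex:isSaidBy", '"Kai"'))

for s, p, o in triples:
    print(s, p, o, ".")
```

Note that the reification triples only describe the statement "the book is written by Goethe"; in RDF they do not assert it. If the plain triple should be part of the data as well, it has to be added separately.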


#  Subject   Predicate     Object
1  ex:p123   rdf:type      ex:person
2  ex:p123   ex:hasName    "Kai Eckert"
3  ex:p123   ex:worksFor   ex:unima

Example 1: A simple RDF example

Simplified presentation, based on Notation 3 (RDF/N3), with statements identified by their line number:

4  #1  dc:creator  "Kai Eckert"

Here the subject of the statement is a reference to another statement (#1). With this notation, we imply a reification.
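The line-number convention can be read as shorthand for full reification. The following Python sketch is an assumption about how that shorthand could be expanded, not a tool from the presentation: it turns Example 1 plus line 4 into plain triples, using a hypothetical blank node `_:stmt1` for the reified statement.

```python
def expand(lines):
    """Expand the simplified, line-numbered notation into plain triples.

    `lines` maps a line number to (subject, predicate, object). A subject
    of the form "#n" refers to statement n; that statement is reified
    (once) as the blank node _:stmtn, which then takes the subject slot.
    """
    triples, reified = [], set()
    for _, (s, p, o) in sorted(lines.items()):
        if not s.startswith("#"):
            triples.append((s, p, o))  # ordinary statement, kept as-is
            continue
        target = int(s[1:])
        node = f"_:stmt{target}"
        if target not in reified:
            ts, tp, to = lines[target]
            triples += [
                (node, "rdf:type", "rdf:Statement"),
                (node, "rdf:subject", ts),
                (node, "rdf:predicate", tp),
                (node, "rdf:object", to),
            ]
            reified.add(target)
        triples.append((node, p, o))  # the statement about the statement
    return triples


# Example 1 plus line 4 from the slide.
example = {
    1: ("ex:p123", "rdf:type", "ex:person"),
    2: ("ex:p123", "ex:hasName", '"Kai Eckert"'),
    3: ("ex:p123", "ex:worksFor", "ex:unima"),
    4: ("#1", "dc:creator", '"Kai Eckert"'),
}

for t in expand(example):
    print(*t, ".")
```

The original statement #1 is still asserted as an ordinary triple; the reification only adds a description of it, which is what keeps the data readable for applications that ignore reification.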


Scenario 1: Crosswalks

Crosswalks define rules for how metadata from one schema are represented in a different schema.

Problems: loss of information; erroneous crosswalks.

MARC field → Dublin Core element

260$c (Date of publication, distribution, etc.) → Date.Created

522 (Geographic Coverage Note) → Coverage.Spatial

300$a (Physical Description) → Format.Extent
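A sketch of a crosswalk that records, for every generated statement, which crosswalk and rule produced it and which source field it came from. The three field mappings are those from the table above; the rule numbers, the crosswalk ID, the `#n` referencing and the sample record are assumptions for illustration only.

```python
# Hypothetical crosswalk: rule number -> (MARC field, Dublin Core element).
# The field mappings come from the table above; the rule numbers and the
# crosswalk ID are made up for this sketch.
CROSSWALK_ID = 1
RULES = {
    1: ("260$c", "Date.Created"),
    2: ("522", "Coverage.Spatial"),
    3: ("300$a", "Format.Extent"),
}


def apply_crosswalk(doc, marc_record):
    """Map a record (MARC field -> value) to Dublin Core statements.

    Returns (statements, metametadata). Each generated statement gets
    the entries "#n ex:rule", "#n ex:crosswalk" and "#n ex:origin",
    where n is its 1-based position in `statements`.
    """
    statements, meta = [], []
    for rule_no, (marc_field, dc_element) in sorted(RULES.items()):
        if marc_field not in marc_record:
            continue  # rule does not apply to this record
        statements.append((doc, dc_element, marc_record[marc_field]))
        ref = f"#{len(statements)}"
        meta += [
            (ref, "ex:rule", rule_no),
            (ref, "ex:crosswalk", CROSSWALK_ID),
            (ref, "ex:origin", f"MARC:{marc_field}"),
        ]
    return statements, meta


record = {"260$c": "2010", "300$a": "23 p."}  # hypothetical input record
statements, meta = apply_crosswalk("ex:docbase/doc1", record)
for entry in statements + meta:
    print(*entry)
```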


Possibilities for Metametadata

Storage of additional information, which would be lost in the target format.

Identification of the crosswalk (with version) and of the specific rule for every generated statement.

Which statements are generated by a specific rule?

Which rule is responsible for a specific (erroneous) statement?

Which data in the originating format was used to generate a specific statement?


Example 1: Crosswalk Data

#   Subject           Predicate      Object
1   ex:docbase/doc1   dc:title       "Example title"
2   #1                ex:rule        16
3   #1                ex:crosswalk   3
4   #1                ex:origin      MARC:245
5   ex:docbase/doc2   dc:title       "About finding a title"
6   #5                ex:rule        16
7   #5                ex:crosswalk   3
8   #5                ex:origin      MARC:245
9   ex:docbase/doc3   dc:title       "Lorem ipsum dolor"
10  #9                ex:rule        18
11  #9                ex:crosswalk   3
12  #9                ex:origin      MARC:245
13  #9                ex:origin      MARC:246
14  ex:docbase/doc4   dc:title       "Consetetur Sadipscing"
15  #14               ex:rule        19
16  #14               ex:crosswalk   6
17  #14               ex:origin      xml:/record/description

Example 4: Resulting RDF statements with additional metametadata


Crosswalk Updates

Which statements are generated by a given rule and need to be regenerated after an update?

SELECT ?document ?field ?value
WHERE {
  ?t rdf:subject ?document .
  ?t rdf:predicate ?field .
  ?t rdf:object ?value .
  ?t ex:rule 16 .
  ?t ex:crosswalk 3 .
}

document          field                             value
ex:docbase/doc1   http://www.example.org/dc#title   "Example title"
ex:docbase/doc2   http://www.example.org/dc#title   "About finding a title"

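The same question can be sketched in plain Python over the data of Example 4, to show the logic of the SPARQL query without an RDF store. The data layout (statement number mapped to the triple plus a dictionary of metametadata) and the function name `generated_by` are assumptions of this sketch.

```python
# The statements of Example 4: statement number -> (triple, metametadata).
DATA = {
    1: (("ex:docbase/doc1", "dc:title", "Example title"),
        {"ex:rule": [16], "ex:crosswalk": [3], "ex:origin": ["MARC:245"]}),
    5: (("ex:docbase/doc2", "dc:title", "About finding a title"),
        {"ex:rule": [16], "ex:crosswalk": [3], "ex:origin": ["MARC:245"]}),
    9: (("ex:docbase/doc3", "dc:title", "Lorem ipsum dolor"),
        {"ex:rule": [18], "ex:crosswalk": [3],
         "ex:origin": ["MARC:245", "MARC:246"]}),
    14: (("ex:docbase/doc4", "dc:title", "Consetetur Sadipscing"),
         {"ex:rule": [19], "ex:crosswalk": [6],
          "ex:origin": ["xml:/record/description"]}),
}


def generated_by(rule, crosswalk):
    """Statements whose metametadata names the given rule and crosswalk,
    i.e. those that must be regenerated when that rule changes."""
    return [triple for triple, meta in DATA.values()
            if rule in meta["ex:rule"] and crosswalk in meta["ex:crosswalk"]]


for document, field, value in generated_by(16, 3):
    print(document, field, repr(value))
```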


Crosswalk Debugging

Which rule is responsible for a given statement and what was the original data?

SELECT ?crosswalk ?rule ?origin
WHERE {
  ?t rdf:subject <ex:docbase/doc1> .
  ?t rdf:predicate dc:title .
  ?t rdf:object "Example title" .
  ?t ex:rule ?rule .
  ?t ex:crosswalk ?crosswalk .
  ?t ex:origin ?origin .
}

crosswalk   rule   origin
3           16     "MARC:245"

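A Python sketch of the debugging lookup, using two statements from Example 4 in a hypothetical layout (statement number mapped to triple and metametadata). Like the SPARQL query, it returns one row per combination of bindings, so a statement with two origins yields two rows; `explain` is an illustrative name only.

```python
from itertools import product

# Two statements from Example 4: statement number -> (triple, metametadata).
DATA = {
    1: (("ex:docbase/doc1", "dc:title", "Example title"),
        {"ex:rule": [16], "ex:crosswalk": [3], "ex:origin": ["MARC:245"]}),
    9: (("ex:docbase/doc3", "dc:title", "Lorem ipsum dolor"),
        {"ex:rule": [18], "ex:crosswalk": [3],
         "ex:origin": ["MARC:245", "MARC:246"]}),
}


def explain(triple):
    """Return (crosswalk, rule, origin) rows for a given statement,
    one row per combination of values, as the SPARQL query would."""
    rows = []
    for stored, meta in DATA.values():
        if stored == triple:
            rows += product(meta["ex:crosswalk"], meta["ex:rule"],
                            meta["ex:origin"])
    return rows


print(explain(("ex:docbase/doc1", "dc:title", "Example title")))
# [(3, 16, 'MARC:245')]
print(explain(("ex:docbase/doc3", "dc:title", "Lorem ipsum dolor")))
# [(3, 18, 'MARC:245'), (3, 18, 'MARC:246')]
```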


Scenario 2: Different Sources for Metadata

Manual indexing is costly. Many documents are not indexed at all or are not searchable: journal articles, externally owned documents, working papers, webpages.

New sources for metadata?


New ways for document indexing

Automatic processes, tagging, and (automatic) mapping of metadata from external sources.

Problem: lack of quality. How do you integrate these data from different sources without compromising retrieval quality?


Possibilities for Metametadata

Storage of the source of single statements.

Storage of further source-specific information, e.g.: weighting for automatically generated subject headings; the number of users who tagged a document with a given tag; the original subject heading in case of an automatic translation or mapping.

Can we use this additional information to improve document retrieval?


Example 2: Subject indexing

#   Subject                 Predicate    Object
1   ex:docbase/doc1         dc:subject   ex:thes/sub20
2   #1                      ex:source    ex:sources/autoindex1
3   #1                      ex:rank      0.55
4   ex:docbase/doc1         dc:subject   ex:thes/sub30
5   #4                      ex:source    ex:sources/autoindex1
6   #4                      ex:rank      0.8
7   ex:docbase/doc1         dc:subject   ex:thes/sub30
8   #7                      ex:source    ex:sources/pfeffer
9   #7                      ex:rank      1.0
10  ex:docbase/doc1         dc:subject   ex:thes/sub40
11  #10                     ex:source    ex:sources/pfeffer
12  #10                     ex:rank      1.0
13  ex:sources/autoindex1   ex:type      ex:types/auto
14  ex:sources/pfeffer      ex:type      ex:types/manual

Example 7: Subject assignments by different sources


Backward compatibility

While there are four subject heading assignments, the statement "ex:docbase/doc1 dc:subject ex:thes/sub30" is still one statement, regardless of how many times you put it into your RDF store.

This is important for applications that access the RDF data but do not handle RDF reification.

Your metadata remains valid; in particular, there are no duplicates.
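A small Python sketch of why this holds, based on RDF's definition of a graph as a set of triples, with Example 7 written as hypothetical tuples of statement number, triple, source and rank.

```python
# Example 7 as (statement number, triple, source, rank) tuples.
ASSIGNMENTS = [
    (1, ("ex:docbase/doc1", "dc:subject", "ex:thes/sub20"), "ex:sources/autoindex1", 0.55),
    (4, ("ex:docbase/doc1", "dc:subject", "ex:thes/sub30"), "ex:sources/autoindex1", 0.8),
    (7, ("ex:docbase/doc1", "dc:subject", "ex:thes/sub30"), "ex:sources/pfeffer", 1.0),
    (10, ("ex:docbase/doc1", "dc:subject", "ex:thes/sub40"), "ex:sources/pfeffer", 1.0),
]

# A graph is a set of triples: asserting the same triple twice still
# yields one triple, so a plain consumer sees no duplicates.
graph = {triple for _, triple, _, _ in ASSIGNMENTS}

# The reified statements #4 and #7 keep their own source and rank.
sub30_sources = [(n, src, rank) for n, (_, _, obj), src, rank in ASSIGNMENTS
                 if obj == "ex:thes/sub30"]

print(len(ASSIGNMENTS), "assignments,", len(graph), "distinct triples")
# 4 assignments, 3 distinct triples
print(sub30_sources)
```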


Separating the sources

Which statements are made by a specific source (here: Pfeffer)?

SELECT ?document ?subject
WHERE {
  ?t rdf:subject ?document .
  ?t rdf:predicate dc:subject .
  ?t rdf:object ?subject .
  ?t ex:source <ex:sources/pfeffer> .
}

document          subject
ex:docbase/doc1   ex:thes/sub30
ex:docbase/doc1   ex:thes/sub40

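The source filter can be sketched in Python over Example 7, written as hypothetical (statement number, triple, source, rank) tuples; `statements_by` is an illustrative name, not part of any library.

```python
# Example 7 as (statement number, triple, source, rank) tuples.
ASSIGNMENTS = [
    (1, ("ex:docbase/doc1", "dc:subject", "ex:thes/sub20"), "ex:sources/autoindex1", 0.55),
    (4, ("ex:docbase/doc1", "dc:subject", "ex:thes/sub30"), "ex:sources/autoindex1", 0.8),
    (7, ("ex:docbase/doc1", "dc:subject", "ex:thes/sub30"), "ex:sources/pfeffer", 1.0),
    (10, ("ex:docbase/doc1", "dc:subject", "ex:thes/sub40"), "ex:sources/pfeffer", 1.0),
]


def statements_by(source):
    """(document, subject) pairs of all dc:subject statements whose
    metametadata names the given source."""
    return [(doc, subj) for _, (doc, pred, subj), src, _ in ASSIGNMENTS
            if src == source and pred == "dc:subject"]


print(statements_by("ex:sources/pfeffer"))
# [('ex:docbase/doc1', 'ex:thes/sub30'), ('ex:docbase/doc1', 'ex:thes/sub40')]
```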


Extended queries

Use all manually created subject headings, plus all subject headings with a rank > 0.7.

SELECT DISTINCT ?document ?subject
WHERE {
  ?t rdf:subject ?document .
  ?t rdf:predicate dc:subject .
  ?t rdf:object ?subject .
  ?t ex:source ?source .
  ?source ex:type ?type .
  ?t ex:rank ?rank .
  FILTER ( ?type = <ex:types/manual> || ?rank > 0.7 )
}

document          subject
ex:docbase/doc1   ex:thes/sub30
ex:docbase/doc1   ex:thes/sub40

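The combined condition, including the deduplication that `SELECT DISTINCT` performs, sketched in Python over Example 7. The tuple layout, the `SOURCE_TYPE` mapping and the `min_rank` parameter are assumptions of this sketch.

```python
# Example 7 as (statement number, triple, source, rank) tuples.
ASSIGNMENTS = [
    (1, ("ex:docbase/doc1", "dc:subject", "ex:thes/sub20"), "ex:sources/autoindex1", 0.55),
    (4, ("ex:docbase/doc1", "dc:subject", "ex:thes/sub30"), "ex:sources/autoindex1", 0.8),
    (7, ("ex:docbase/doc1", "dc:subject", "ex:thes/sub30"), "ex:sources/pfeffer", 1.0),
    (10, ("ex:docbase/doc1", "dc:subject", "ex:thes/sub40"), "ex:sources/pfeffer", 1.0),
]
# Statements 13 and 14: the type of each source.
SOURCE_TYPE = {
    "ex:sources/autoindex1": "ex:types/auto",
    "ex:sources/pfeffer": "ex:types/manual",
}


def selected_subjects(min_rank=0.7):
    """Subject headings that were assigned manually or with a rank
    above min_rank; each (document, subject) pair is returned once."""
    result = []
    for _, (doc, pred, subj), src, rank in ASSIGNMENTS:
        if pred != "dc:subject":
            continue
        if SOURCE_TYPE[src] == "ex:types/manual" or rank > min_rank:
            if (doc, subj) not in result:  # mirrors SELECT DISTINCT
                result.append((doc, subj))
    return result


print(selected_subjects())
# [('ex:docbase/doc1', 'ex:thes/sub30'), ('ex:docbase/doc1', 'ex:thes/sub40')]
```

Without the deduplication, ex:thes/sub30 would be returned twice: once for the automatic assignment with rank 0.8 and once for the manual one.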


Conclusion

Many applications in the library field can be realized with metametadata, with no changes to the underlying data models needed. But:

Reification is not well accepted in the community. Named graphs are not (yet) part of the RDF standard. ...

Existing approaches are usable, but users need more guidance on how to implement them.

Metametadata is not always the appropriate solution (meta-level complexity vs. data model complexity).

