20080917 Rev

Date post: 31-Aug-2014
Category:
Upload: charper
Linked Library Data and the Semantic Web: Leveraging Library Authority Control outside of MARC Applications. Presented 2008-09-17 at the National Library of Sweden. Corey A Harper
Transcript
Page 1: 20080917 Rev

Linked Library Data and the Semantic Web

Leveraging Library Authority Control outside of MARC Applications

Presented 2008-09-17 at the National Library of Sweden
Corey A Harper

Page 2: 20080917 Rev

2008-09-17 National Library of Sweden 2

Topical Overview
• Linked Open Data and SemWeb
• Library Authorities and Controlled Vocabularies
  – Toward Library LOD
• Work in progress in these areas
• Metadata Normalization, Harmonization and Recombination
• Possibilities…

Page 3: 20080917 Rev

“The vast bulk of data to be on the Semantic Web is already sitting in databases … all that is needed [is] to write an adapter to convert a particular format into RDF and all the content in that format is available.”

-Tim Berners-Lee in an interview with the Consortium Standards Bulletin

Page 4: 20080917 Rev

Linked Open Data
• Use URIs as names for things.
• Use HTTP URIs so that people can look up those names.
• When someone looks up a URI, provide useful information.
• Include links to other URIs, so that they can discover more things.
http://www.w3.org/DesignIssues/LinkedData.html
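
The four rules on this slide can be illustrated with a toy model. The sketch below is not from the talk: it replaces real HTTP lookups with an in-memory dictionary, and every URI under example.org, along with the names DESCRIPTIONS, dereference and follow_links, is a hypothetical chosen for illustration. It shows only the idea that looking up one name returns a description containing further names to look up.

```python
# Toy in-memory "web": each HTTP URI names a thing and maps to a description.
# All example.org URIs and labels are hypothetical, for illustration only.
DESCRIPTIONS = {
    "http://example.org/id/www": {
        "label": "World Wide Web",
        "links": ["http://example.org/id/hypertext", "http://example.org/id/internet"],
    },
    "http://example.org/id/hypertext": {
        "label": "Hypertext",
        "links": ["http://example.org/id/www", "http://example.org/id/markup"],
    },
    "http://example.org/id/internet": {"label": "Internet", "links": []},
    "http://example.org/id/markup": {"label": "Markup languages", "links": []},
}


def dereference(uri):
    """Stand-in for an HTTP GET: return the description published at a URI."""
    return DESCRIPTIONS.get(uri)


def follow_links(start, max_hops=2):
    """Breadth-first traversal: look up a URI, then look up the URIs it links to."""
    seen = [start]
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for uri in frontier:
            description = dereference(uri)
            if description is None:
                continue  # nothing published at this name
            for link in description["links"]:
                if link not in seen:
                    seen.append(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return seen


if __name__ == "__main__":
    for uri in follow_links("http://example.org/id/www"):
        print(uri, "->", dereference(uri)["label"])
```

A real client would issue HTTP requests and parse RDF instead of reading a dictionary; the traversal logic would stay much the same.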

Page 5: 20080917 Rev

Page 6: 20080917 Rev

Page 7: 20080917 Rev

Linked Library Data
• Resources get URIs early in lifecycle
• Properties get URIs
• Vocabularies get URIs
• Everything is dereferenceable: able to request meaning over HTTP

Page 8: 20080917 Rev

Library Authority Data
“Include links to other URIs. so that they can discover more things.”

Short of providing and linking to URIs, this *is* authority data.

This is what our authority files are for.

Page 9: 20080917 Rev

Authority Information
• Controlled Vocabulary
• SKOS for LCSH, Dewey, LCC, MeSH, others
• Need a structure for Name Authorities
  – FOAF is only part of the answer
• Standard URIs for concepts and agents
  – Possibly for FRBR Entities?

Page 10: 20080917 Rev

Page 11: 20080917 Rev

Library Controlled Vocabularies: Benefits
• Reputation - Trusted Tradition
• Mature - Time tested and carefully developed
• General & Comprehensive - Cover large knowledge spaces

Page 12: 20080917 Rev

Library Controlled Vocabularies: Drawbacks
• Overly Complicated - extraneous information
• Archaic Syntax - MARC Records
• Slow to evolve - authorities control the authority control

Page 13: 20080917 Rev

LCSH

Page 14: 20080917 Rev

LCSH in Dublin Core
• Encoding Scheme for DC Subject
• No easy way to draw on equivalent terms and cross-references
• Abstract Model, RDF and SKOS could enable applications to make use of the whole vocabulary

Page 15: 20080917 Rev

Vocabulary Encodings
• MARC - Great for Library Applications
• MARC-XML
• MADS
• SKOS - Designed for use with RDF
} Helping Get Library Apps online

Page 16: 20080917 Rev

LCSH in SKOS
<skos:Concept rdf:about="http://example.com/lcsh#95000541">
  <skos:prefLabel>World Wide Web</skos:prefLabel>
  <skos:altLabel>W3 (World Wide Web)</skos:altLabel>
  <skos:altLabel>Web (World Wide Web)</skos:altLabel>
  <skos:altLabel>World Wide Web (Information Retrieval System)</skos:altLabel>
  <skos:broader rdf:resource="http://example.com/lcsh#88002671"/>
  <skos:broader rdf:resource="http://example.com/lcsh#92002381"/>
  <skos:related rdf:resource="http://example.com/lcsh#92002816"/>
  <skos:narrower rdf:resource="http://example.com/lcsh#2002000569"/>
  <skos:narrower rdf:resource="http://example.com/lcsh#2003001415"/>
  <skos:narrower rdf:resource="http://example.com/lcsh#97003254"/>
</skos:Concept>
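
As a rough illustration of how an application outside the MARC world might consume a record like this one, the sketch below parses a shortened copy of the concept with Python's standard library and collects every label a search box could match. It is an assumption-laden sketch, not part of the talk: the fragment is wrapped in an rdf:RDF element with namespace declarations so that it parses on its own, link targets are written with rdf:resource (the standard RDF/XML form for pointing at another resource), and the function names read_concept and search_terms are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Shortened copy of the slide's concept, wrapped so it is a complete XML document.
SKOS_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.com/lcsh#95000541">
    <skos:prefLabel>World Wide Web</skos:prefLabel>
    <skos:altLabel>W3 (World Wide Web)</skos:altLabel>
    <skos:altLabel>Web (World Wide Web)</skos:altLabel>
    <skos:broader rdf:resource="http://example.com/lcsh#88002671"/>
    <skos:narrower rdf:resource="http://example.com/lcsh#97003254"/>
  </skos:Concept>
</rdf:RDF>"""


def read_concept(xml_text):
    """Extract the URI, labels and hierarchy links of the first skos:Concept."""
    root = ET.fromstring(xml_text)
    concept = root.find(f"{{{SKOS}}}Concept")

    def labels(tag):
        return [el.text for el in concept.findall(f"{{{SKOS}}}{tag}")]

    def targets(tag):
        return [el.get(f"{{{RDF}}}resource") for el in concept.findall(f"{{{SKOS}}}{tag}")]

    return {
        "uri": concept.get(f"{{{RDF}}}about"),
        "prefLabel": labels("prefLabel")[0],
        "altLabels": labels("altLabel"),
        "broader": targets("broader"),
        "narrower": targets("narrower"),
    }


def search_terms(concept):
    """Preferred label plus cross-references: every string that should find this heading."""
    return [concept["prefLabel"]] + concept["altLabels"]


if __name__ == "__main__":
    concept = read_concept(SKOS_XML)
    print(concept["uri"])
    print(search_terms(concept))
```

A production system would more likely use an RDF library than raw XML parsing, since the same graph can be serialized in many different ways.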

Page 17: 20080917 Rev

Diagram courtesy of Ed Summers. See upcoming DC2008 paper.

Page 18: 20080917 Rev

Page 19: 20080917 Rev

Expected Benefits
• Common RDF Semantics
• Many Possible Web Services
• Publish Vocabulary in Multiple Formats
  – Ease of re-use
• Entertainment

Page 20: 20080917 Rev

Name Authorities
• Many National Authority Files
• Separate records representing same author
  – Different Languages
  – Different Scripts

Page 21: 20080917 Rev

VIAF
• Virtual International Authority File
• First try - Merging
• Second try - Linking (then merging?)
• Why not just link….?

Page 22: 20080917 Rev

Same Entity/Variant Scripts
Japanese

japanisch

Page 23: 20080917 Rev

Linking Open Names
• Need an RDF Vocabulary for Names and Corporations
• FOAF is one piece of the puzzle
• DC Agents Application Profile
  – Quasi-Active DCMI Task Group

Page 24: 20080917 Rev

VIAF as LOD
• Use owl:sameAs to declare equality
• Every national authority file gets a SPARQL endpoint
• No need to merge authority files
• Applications can query, merging relevant sets locally
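
The last bullet, merging relevant sets locally, can be sketched in a few lines. Because owl:sameAs is symmetric and transitive, a set of sameAs links partitions URIs into clusters that each stand for one entity; a union-find structure is one simple way to compute those clusters. The code below is an illustration under stated assumptions, not VIAF's method: the authority-file URIs and name forms are invented, and the sameAs pairs are hard-coded where a real application would fetch them from each file's SPARQL endpoint.

```python
def cluster_same_as(same_as_pairs):
    """Group URIs into clusters connected by owl:sameAs links (union-find)."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in same_as_pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    clusters = {}
    for uri in parent:
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


def merge_labels(same_as_pairs, labels_by_uri):
    """Build one merged view per entity without altering any source file."""
    merged = []
    for cluster in cluster_same_as(same_as_pairs):
        names = set()
        for uri in cluster:
            names.update(labels_by_uri.get(uri, []))
        merged.append({"uris": sorted(cluster), "labels": sorted(names)})
    return merged


if __name__ == "__main__":
    # Hypothetical URIs and name forms from three national authority files.
    pairs = [
        ("http://example.org/file-a/n1", "http://example.org/file-b/p7"),
        ("http://example.org/file-b/p7", "http://example.org/file-c/x3"),
    ]
    labels = {
        "http://example.org/file-a/n1": ["Doe, Jane, 1900-1980"],
        "http://example.org/file-b/p7": ["Doe, Jane"],
        "http://example.org/file-c/x3": ["Doe, J. (Jane)"],
    }
    for entity in merge_labels(pairs, labels):
        print(entity)
```

Each national file keeps its own preferred form; only the querying application decides how to combine them.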

Page 25: 20080917 Rev

Renew, reuse, recycle
• Enable better sharing within Library community
• Share our data with other communities
• Reuse Authority Data in new and interesting ways…

Page 26: 20080917 Rev

Page 27: 20080917 Rev

[Diagram labels: Shared Data Store; Local Data Store; Identity System; LCSH Service; LC-NAF Service; The Rest of the Web; Discovery Systems]

Page 28: 20080917 Rev

[Diagram: example RDF graph]
Node labels: Summers, Ed; DC2008 Conference Proceedings; Authority Files (appears twice); Subject Headings; Semantic Web; LCSH, SKOS and Linked Data; Article; Tag; Blog Post
Edge labels: dc:subject; dc:creator; dc:title; dcterms:isPartOf; skos:broader; taggedBy; tagTarget; owl:sameAs; tagName

This is only an example!!
• The Graph may not be entirely correct
• Tagging ontologies are very new
• May involve blank nodes &/or reification

Page 29: 20080917 Rev

Controlled Vocabularies Recontextualized
• LOD notion of “Information” vs. “Non-information” resources.
  – Info - documents on the web
  – Non-info - anything else: people, places, things, books
• Non-info resources have representations / descriptions
• These are info resources
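
One common way Linked Data publishers keep the two kinds of resource apart is to answer a request for the thing's URI with an HTTP 303 redirect to a document describing it. The slide does not name this convention, so treat it as an assumption of the sketch below. The server is simulated with a plain function, and the example.org URIs and the names respond and lookup are hypothetical.

```python
# Hypothetical URI layout: /id/... names the thing, /doc/... names the document about it.
DOCUMENTS = {
    "http://example.org/doc/www": "Description of the concept 'World Wide Web'",
}
THINGS = {
    "http://example.org/id/www": "http://example.org/doc/www",
}


def respond(uri):
    """Simulated server: return (status, headers, body) for a requested URI."""
    if uri in DOCUMENTS:
        # Information resource: the document itself can be returned.
        return 200, {}, DOCUMENTS[uri]
    if uri in THINGS:
        # Non-information resource: point the client at its description instead.
        return 303, {"Location": THINGS[uri]}, ""
    return 404, {}, ""


def lookup(uri, max_redirects=3):
    """Simulated client: follow 303 redirects until a document or an error is reached."""
    for _ in range(max_redirects + 1):
        status, headers, body = respond(uri)
        if status == 303:
            uri = headers["Location"]
            continue
        return status, uri, body
    raise RuntimeError("too many redirects")


if __name__ == "__main__":
    print(lookup("http://example.org/id/www"))
```

In this picture an authority record plays the role of the /doc/ resource, while the person or concept it describes is the /id/ resource.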

Page 30: 20080917 Rev

Controlled Vocabularies Recontextualized

• Authority records are descriptions of non-information resources

• Bibliographic records are (usually) descriptions of non-information resources

• Other areas of Authority Control…

Page 31: 20080917 Rev

Image from the Getty Museum: http://www.getty.edu/research/conducting_research/standards/cdwa/entity.html

Page 32: 20080917 Rev

FRBR
• Library community’s first formalization of our data model
• Untested
• Incredibly complicated
• Not reflected well in descriptive standards or practice

Page 33: 20080917 Rev

FRBR
“Simply by clustering your records into work sets, you have not moved your records into the FRBR model. FRBR is a complete data model that is a new way of looking at our data, not just taking existing records and identifying work relationships”

- J. Rochkind - bibwild.wordpress.com

Page 34: 20080917 Rev

…and Library data is extremely complicated

Page 35: 20080917 Rev

MARC Record Graph
• Does not include authority data
• Coins new URIs for any non-literal value
• Contains a few minor modeling errors

<modsrdf:Publisher modsrdf:value="Crowell" rdf:about="http://simile.mit.edu/2006/01/publisher/Crowell">
  <modsrdf:location>
    <modsrdf:Place modsrdf:name="New York" rdf:about="http://simile.mit.edu/2006/01/place/marccountry/nyu"/>
  </modsrdf:location>
</modsrdf:Publisher>

Page 36: 20080917 Rev

A Distinction
• Metadata Harmonization:
  – the “ability to use several different metadata standards in a single software system.”
• Metadata Normalization:
  – mapping several different metadata standards to a single schema or structure for use in a single software system.
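
To make the second definition concrete, here is a minimal sketch of normalization: two source formats, each with its own mapping function, converge on one flat target schema. Everything about it is an assumption made for illustration. The input records are simplified dictionaries (MARC tags 245, 100, 700 and 650 mapped to lists of strings, Dublin Core elements likewise), and the target field names title, creators and subjects are invented; none of this reflects any particular product's rules.

```python
def normalize_marc(record):
    """Map a simplified MARC-like dict (tag -> list of values) to the flat schema."""
    return {
        "title": (record.get("245") or [""])[0],               # title statement
        "creators": record.get("100", []) + record.get("700", []),  # main and added entries
        "subjects": record.get("650", []),                       # topical subject headings
        "source_format": "marc",
    }


def normalize_dc(record):
    """Map a simplified Dublin Core dict (element -> list of values) to the flat schema."""
    return {
        "title": (record.get("title") or [""])[0],
        "creators": record.get("creator", []),
        "subjects": record.get("subject", []),
        "source_format": "dc",
    }


# One rule set per data source, all producing the same record shape.
NORMALIZERS = {"marc": normalize_marc, "dc": normalize_dc}


def normalize(record, source_format):
    return NORMALIZERS[source_format](record)


if __name__ == "__main__":
    marc = {"245": ["Weaving the Web"], "100": ["Berners-Lee, Tim"], "650": ["World Wide Web"]}
    dc = {"title": ["Weaving the Web"], "creator": ["Berners-Lee, Tim"], "subject": ["World Wide Web"]}
    print(normalize(marc, "marc"))
    print(normalize(dc, "dc"))
```

Harmonization, by contrast, would leave both records in their native formats and teach the software to work with each directly.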

Page 37: 20080917 Rev

Primo: A Case Study
• Normalization Rules
• Delivery templates
• Tight SFX and MetaLib Integration
• “Pipes” for different data sources
• Hourly Availability Checking
  – (Real Time in Version 2.0)

Page 38: 20080917 Rev

Harvesting
• Different Data Sources
• Different Normalization Rules
• All standardized on Primo Normalized XML (PNX) Record
  – Very Flat, sections corresponding to Primo Functionality

Page 39: 20080917 Rev

Issues and Challenges
• Managing Deduplication
  – Dedup Data only out of box for MARC
  – Writing for OAI-PMH sources (EAD)
• Consortial Environment(s)
• Appropriate Delivery Options
• “Interpreting” Metadata

Page 40: 20080917 Rev

EAD Records
• Archivists’ Toolkit
  – Previously in Access, Notepad, Excel
  – Authority Control (sort of)
• OAI-PMH Overlay
• Multiple layers of Crosswalking
• Deduping

Page 41: 20080917 Rev

EAD / Aleph Dedup
• Aleph Title:
  – James E. Jackson and Esther Cooper Jackson papers
• EAD Title:
  – Guide to the James E. Jackson and Esther Cooper Jackson papers 1917-2004 (Bulk 1937-1992) Tamiment 347
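
The two titles above describe the same collection but would never match as plain strings. One possible approach, sketched here with invented rules rather than the actual dedup logic used in the project, is to reduce each title to a comparison key by stripping the finding-aid prefix, date ranges and the collection number. The function name dedup_key and every regular expression are assumptions tailored to this single example.

```python
import re


def dedup_key(title):
    """Reduce a title to a normalized key for matching records across sources."""
    key = title.lower()
    key = re.sub(r"^guide to the\s+", "", key)         # finding-aid prefix
    key = re.sub(r"\(bulk[^)]*\)", " ", key)            # e.g. "(Bulk 1937-1992)"
    key = re.sub(r"\b\d{4}\s*-\s*\d{4}\b", " ", key)    # date ranges such as 1917-2004
    key = re.sub(r"\btamiment\s+\d+\b", " ", key)       # repository collection number
    key = re.sub(r"[^a-z0-9]+", " ", key)               # punctuation and extra whitespace
    return key.strip()


ALEPH_TITLE = "James E. Jackson and Esther Cooper Jackson papers"
EAD_TITLE = (
    "Guide to the James E. Jackson and Esther Cooper Jackson papers "
    "1917-2004 (Bulk 1937-1992) Tamiment 347"
)

if __name__ == "__main__":
    print(dedup_key(ALEPH_TITLE))
    print(dedup_key(EAD_TITLE))
    print(dedup_key(ALEPH_TITLE) == dedup_key(EAD_TITLE))
```

Rules like these are brittle and specific to one repository's conventions, which is part of why the matching is hard to get right.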

Page 42: 20080917 Rev

MARC + EAD
[Diagram labels: EAD Record; Aleph Record; Authority Records; MARC Record w/ Auth Data; OAI-DC Record w/ FT of EAD; EAD PNX; Aleph PNX; Dedup PNX]

Page 43: 20080917 Rev

Value of Dedup
• Indexing the Best of Both Worlds
• EAD Records:
  – Inventory
  – Long Biographical / Historical Notes
• MARC Data:
  – Cross References for Access Points

Page 44: 20080917 Rev

It shouldn’t be this hard!
• Dedup Process shouldn’t be necessary
  – Authority files should be useable within non-MARC applications
  – Merging is easier with more granularity, more homogeneity, in data sets

Page 45: 20080917 Rev

Endless possibilities
• This barely scratches the surface
• Authority Data is only a small part
• With more soundly modeled bibliographic and authority data…
  – Terminology Services
  – Context sensitive searching
  – Customized interfaces
  – Customized exhibits
  – Mashups
  – Web Services
  – User Profiling
  – Collaboration tools

Page 46: 20080917 Rev

Thanks! Questions?

[email protected] +1 212.998.2479

