
DE conferentie 2008 - Isaac en Wartena

Date post: 30-Nov-2014
Upload: digitaal-erfgoedconferentie
Description:
Semantic Web: Do it Yourself. DE Conferentie, Rotterdam, 11 December 2008. Antoine Isaac and Christian Wartena
Transcript
Page 1: DE conferentie 2008 - Isaac en Wartena

Semantic Web: Do it Yourself

DE Conferentie, Rotterdam, 11 December 2008

Antoine Isaac and Christian Wartena

Page 2: DE conferentie 2008 - Isaac en Wartena

Preamble: who are we?

Christian Wartena
• Telematica Institute, http://www.telin.nl/
• Working on: MyMedia (http://www.mymedia-project.org/), Cultuur in Context, CATCH research programme (CHOICE)
• [email protected]

Antoine Isaac
• Working on: CATCH research programme (STITCH), European Library-related projects (TELplus), SKOS @ W3C
• http://www.few.vu.nl/~aisaac/
• [email protected]

Page 3: DE conferentie 2008 - Isaac en Wartena

Preamble: who are you?

Domain: archive? library? museum?

Experience with the Semantic Web (SW): new to the subject? basic knowledge? advanced knowledge, or already implemented something?

Motivation: thinking about using it? or just learning about it and what you can do with it?

Page 4: DE conferentie 2008 - Isaac en Wartena

Topics of this workshop
• Semantic Web in a nutshell (20')
• Do it yourself (40')
• Short break: answer questions (15')
• Examples from practice & discussion (45')

Page 5: DE conferentie 2008 - Isaac en Wartena

Topics
• Semantic Web in a nutshell
  • A Web of data
  • Smart data
• Do it yourself
• Short break: answer questions
• Examples from practice & discussion

Page 6: DE conferentie 2008 - Isaac en Wartena

The Web for humans

[Figure callouts: a city; the city's location; hyperlinks anchored to words; meaning]

Page 7: DE conferentie 2008 - Isaac en Wartena

SW problem: the Web for computers?

Where is meaning?

Page 8: DE conferentie 2008 - Isaac en Wartena

The Semantic Web vision: a web of (smart) data

[Figure: a graph of the resources file1, par3, Amsterdam, The_Netherlands, City, Article and Document, connected by the relations defines, partOf, type, subClassOf and hasCapital]

Page 9: DE conferentie 2008 - Isaac en Wartena

A Web of resources

[Figure: the resources myVoc:Amsterdam, http://ex.org/files/file1 and theirVoc:Article]

• Web-enabled identifiers (URIs)
• Coming from different spaces

myVoc: = http://example.org/myVocabulary/
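Prefixed names such as myVoc:Amsterdam are shorthand for full URIs. A minimal sketch of the expansion, assuming a simple prefix table: myVoc: is taken from the slide and rdf: is the standard RDF namespace, while the theirVoc: URI is a hypothetical placeholder because the slide does not give it.

```python
# Prefix table: "myVoc" is from the slide, "rdf" is the standard RDF namespace,
# "theirVoc" is a HYPOTHETICAL placeholder (its real URI is not given).
PREFIXES = {
    "myVoc": "http://example.org/myVocabulary/",
    "theirVoc": "http://example.com/theirVocabulary/",  # hypothetical
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(name: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name such as 'myVoc:Amsterdam' into a full URI.

    Names that already are absolute http(s) URIs are returned unchanged.
    """
    if name.startswith(("http://", "https://")):
        return name
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local


for n in ["myVoc:Amsterdam", "theirVoc:Article", "http://ex.org/files/file1"]:
    print(n, "->", expand(n))
```

Because each resource ends up with a globally unique URI, descriptions made in different spaces can refer to the same thing.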

Page 10: DE conferentie 2008 - Isaac en Wartena

Data in an RDF “graph”

[Figure: http://ex.org/files/file1 --theirVoc:subject--> myVoc:Amsterdam; http://ex.org/files/file1 --rdf:type--> theirVoc:Article]

Resource Description Framework (RDF): structured data as triple statements

Links coming from different spaces
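The two statements of this graph can be written down as plain (subject, predicate, object) triples. The sketch below stores them as Python tuples and prints them in N-Triples, a line-based RDF syntax standardized by the W3C. It uses no RDF library, and the theirVoc: namespace URI is a hypothetical placeholder.

```python
# A minimal in-memory RDF graph: a set of (subject, predicate, object) triples.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MY = "http://example.org/myVocabulary/"
THEIR = "http://example.com/theirVocabulary/"  # HYPOTHETICAL namespace

FILE1 = "http://ex.org/files/file1"

graph = {
    (FILE1, THEIR + "subject", MY + "Amsterdam"),
    (FILE1, RDF + "type", THEIR + "Article"),
}


def to_ntriples(triples) -> str:
    """Serialize URI-only triples as N-Triples, one statement per line."""
    lines = [f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples)]
    return "\n".join(lines)


print(to_ntriples(graph))
```

Each line is one self-contained statement, which is what makes RDF data easy to merge: combining two graphs is just taking the union of their triples.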

Page 11: DE conferentie 2008 - Isaac en Wartena

More than traditional metadata (1)

Web-based resources allow distribution, sharing and linking of documents, description vocabularies and (meta)data, e.g. (file1, subject, Amsterdam), with different owners & locations:
• http://www.kb.nl/eDepot
• http://geo.org/voc/
• http://ex.org/files/file1

Page 12: DE conferentie 2008 - Isaac en Wartena

http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

Web of data example: Linking Open Data community project

Page 13: DE conferentie 2008 - Isaac en Wartena

CH (cultural heritage) case: Libris, http://libris.kb.se/
Swedish Union Catalogue as linked data

Page 14: DE conferentie 2008 - Isaac en Wartena

Martin Malmsten, Dublin Core 2008, http://dc2008.de/wp-content/uploads/2008/09/malmsten.pdf

Linked descriptions of resources in Libris

Page 15: DE conferentie 2008 - Isaac en Wartena

External links in Libris: Library of Congress Subject Headings

Ed Summers et al., Dublin Core 2008, http://dc2008.de/wp-content/uploads/2008/09/summers-isaac-redding-krech.pdf

Page 16: DE conferentie 2008 - Isaac en Wartena

Searching using multiple vocabularies

Page 17: DE conferentie 2008 - Isaac en Wartena

Agenda
• Semantic Web in a nutshell
  • A Web of data
  • Smart data: the "semantics"

Page 18: DE conferentie 2008 - Isaac en Wartena

Creating vocabularies of “building blocks” for RDF graphs

[Figure: http://ex.org/files/file1 --theirVoc:subject--> myVoc:Amsterdam; http://ex.org/files/file1 --rdf:type--> theirVoc:Article]

Page 19: DE conferentie 2008 - Isaac en Wartena

Ontologies
• Ontologies specify description vocabularies that can be shared (e.g. subject, Article)
• They give formal definitions to vocabulary elements (e.g. every Article is a Document)

Page 20: DE conferentie 2008 - Isaac en Wartena

Machine-readable definitions

[Figure: http://ex.org/files/file1 rdf:type theirVoc:Article; ontology axiom: theirVoc:Article rdfs:subClassOf theirVoc:Document; deduced: http://ex.org/files/file1 rdf:type theirVoc:Document]

Allows deduction of new facts & control of existing facts by reasoning engines
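The deduction in this figure corresponds to the RDFS entailment rule rdfs9: if x has type C and C is a subclass of D, then x has type D. A minimal forward-chaining sketch of just that rule (short prefixed names instead of full URIs; "ex:file1" stands for http://ex.org/files/file1); a real reasoning engine implements many more rules.

```python
# Forward-chaining sketch of RDFS rule rdfs9:
#   (x rdf:type C) and (C rdfs:subClassOf D)  =>  (x rdf:type D)
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def infer_types(triples: set) -> set:
    """Return the input triples plus all rdf:type facts entailed by subClassOf."""
    closure = set(triples)
    changed = True
    while changed:  # repeat until nothing new is added (follows class chains)
        changed = False
        subclass = [(c, d) for c, p, d in closure if p == SUBCLASS]
        types = [(x, c) for x, p, c in closure if p == TYPE]
        for x, c in types:
            for c2, d in subclass:
                if c == c2 and (x, TYPE, d) not in closure:
                    closure.add((x, TYPE, d))
                    changed = True
    return closure


data = {
    ("ex:file1", TYPE, "theirVoc:Article"),
    ("theirVoc:Article", SUBCLASS, "theirVoc:Document"),  # ontology axiom
}
new_facts = infer_types(data) - data
print(new_facts)
```

The axiom is stored as an ordinary triple next to the data, so the same generic rule works for any vocabulary that publishes such definitions.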

Page 21: DE conferentie 2008 - Isaac en Wartena

CH case: eCulture/Europeana

Page 22: DE conferentie 2008 - Isaac en Wartena

CH case study: eCulture/Europeana

Page 23: DE conferentie 2008 - Isaac en Wartena

Case study: Europeana

The query: ?x vra:subject ?y . ?y skos:broader rma:Egypt

The existing description: rma:gezicht_in_cairo rma:depicts rma:Cairo . rma:Cairo skos:broader rma:Egypt

Why is there a match? For the Europeana ontology, every rma:depicts statement implies a vra:subject statement.
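One way to express "every rma:depicts statement implies a vra:subject statement" is an rdfs:subPropertyOf axiom; that modelling choice is an assumption here, as the slide only says "implies". The sketch applies RDFS rule rdfs7 and then matches the query pattern by hand instead of using a SPARQL engine.

```python
# Sketch of the Europeana match: a subproperty axiom lets a stored
# rma:depicts statement answer a query phrased with vra:subject.
SUBPROP = "rdfs:subPropertyOf"
BROADER = "skos:broader"

data = {
    ("rma:gezicht_in_cairo", "rma:depicts", "rma:Cairo"),
    ("rma:Cairo", BROADER, "rma:Egypt"),
    # Assumed form of the ontology axiom: depicts implies subject.
    ("rma:depicts", SUBPROP, "vra:subject"),
}


def apply_subproperties(triples):
    """RDFS rule rdfs7: (s p o) and (p subPropertyOf q) entail (s q o)."""
    closure = set(triples)
    while True:
        sub = {(p, q) for p, r, q in closure if r == SUBPROP}
        new = {(s, q, o) for s, p, o in closure for p2, q in sub if p == p2}
        if new <= closure:
            return closure
        closure |= new


def query(triples):
    """Match the pattern: ?x vra:subject ?y . ?y skos:broader rma:Egypt"""
    return sorted(
        (x, y)
        for x, p, y in triples
        if p == "vra:subject" and (y, BROADER, "rma:Egypt") in triples
    )


print(query(data))                       # [] : no match on the raw data
print(query(apply_subproperties(data)))  # [('rma:gezicht_in_cairo', 'rma:Cairo')]
```

A second axiom of the same form, e.g. for den08:shows_DEN_Participant, would be picked up by the same code without any change to the program.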

Page 24: DE conferentie 2008 - Isaac en Wartena

More than traditional metadata (2)

Flexible reasoning: new descriptions that use different ontologies can easily be added to the same data, e.g. statements with den08:shows_DEN_Participant.

Requirement: semantically connect these ontologies, e.g. den08:shows_DEN_Participant "implies" vra:subject.

SW principle: meaning is accessible with the data, not encoded in external programs.

Page 25: DE conferentie 2008 - Isaac en Wartena

Semantic Web in a nutshell
• A web of (meta)data: descriptions of resources, easy to share and interconnect
• Smart data: machine-readable definitions for the data
• Relies on open standards: W3C's URI, XML, RDF, OWL, SPARQL, SKOS…

Get out there!

http://www.w3.org/2001/sw/

Page 26: DE conferentie 2008 - Isaac en Wartena

Topics
• Semantic Web in a nutshell
• Do it yourself
• Short break: answer questions
• Examples from practice & discussion

Page 27: DE conferentie 2008 - Isaac en Wartena

Doing it yourself?

What can or should a CH institution do to publish data on the Semantic Web and/or benefit from SW techniques?
• Porting existing data to the Semantic Web
• Annotation
• Creating interoperability at the semantic level

Page 28: DE conferentie 2008 - Isaac en Wartena

Porting CH data to the Semantic Web

Typical CH data: metadata on objects and documents, using specific description structures and possibly controlled vocabularies
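A minimal porting sketch, assuming a flat legacy record: free-text fields become literals, and keywords from a controlled vocabulary become URIs so they can be linked. The Dublin Core terms namespace is real; the field names, the field-to-term mapping and the object and thesaurus namespaces are hypothetical.

```python
DCTERMS = "http://purl.org/dc/terms/"        # Dublin Core terms namespace
BASE = "http://example.org/collection/"      # HYPOTHETICAL object namespace
THESAURUS = "http://example.org/thesaurus/"  # HYPOTHETICAL vocabulary namespace

# HYPOTHETICAL mapping from legacy field names to Dublin Core terms.
FIELD_MAP = {"title": "title", "creator": "creator", "date": "date", "keywords": "subject"}


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslashes, quotes, newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def port_record(record: dict) -> list:
    """Turn one flat legacy record into N-Triples lines."""
    subject = f"<{BASE}{record['id']}>"
    lines = []
    for field, term in FIELD_MAP.items():
        values = record.get(field)
        if values is None:
            continue  # field absent in this record
        if isinstance(values, str):
            values = [values]
        for value in values:
            if field == "keywords":
                # Controlled-vocabulary terms become URIs, so they can be linked.
                obj = f"<{THESAURUS}{value.replace(' ', '_')}>"
            else:
                obj = literal(value)  # free text stays a literal
            lines.append(f"{subject} <{DCTERMS}{term}> {obj} .")
    return lines


legacy = {"id": "obj-0042", "title": "Example title", "keywords": ["Cairo", "Egypt"]}
print("\n".join(port_record(legacy)))
```

The design choice that matters most is the keyword branch: turning vocabulary terms into URIs is what later allows links to other collections and vocabularies.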

Page 29: DE conferentie 2008 - Isaac en Wartena

Representing semantics of your data

Semantics is about relations!
• Relations between words and things in the real world
• Relations between concepts: concepts are defined by their relations to other concepts

E.g.: a theatre is a building; every building is located in a location; every building is built in a period of time; a building is designed by an architect.

Page 30: DE conferentie 2008 - Isaac en Wartena

Semantic relations

Objects are defined by their relations to other entities. Just a thesaurus is not enough: a broader/narrower relation alone is still very poor.

E.g. the Rotterdamse Schouwburg is not only defined by saying that it is a theatre, but also by:
• the Rotterdamse Schouwburg is located at Schouwburgplein 25
• the Rotterdamse Schouwburg was built in 1988
• the Rotterdamse Schouwburg was designed by Wim Gerhard Quist

Relations enable reasoning about entities.
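A small illustration of reasoning from relations, in the style of rdfs:domain and rdfs:range: if every designedBy relation links a building to an architect, then stating who designed the Rotterdamse Schouwburg also tells a machine what kind of things both ends are. The property and class names are hypothetical, chosen to mirror the example sentences above.

```python
# HYPOTHETICAL mini-ontology: for each property, the class of the thing on
# the left (domain) and on the right (range), as rdfs:domain / rdfs:range.
DOMAIN = {"locatedAt": "Building", "builtIn": "Building", "designedBy": "Building"}
RANGE = {"locatedAt": "Address", "builtIn": "TimePeriod", "designedBy": "Architect"}

facts = [
    ("Rotterdamse_Schouwburg", "locatedAt", "Schouwburgplein_25"),
    ("Rotterdamse_Schouwburg", "builtIn", "1988"),
    ("Rotterdamse_Schouwburg", "designedBy", "Wim_Gerhard_Quist"),
]


def infer_classes(facts):
    """Derive class memberships from the relations an entity takes part in."""
    classes = {}
    for subject, prop, obj in facts:
        if prop in DOMAIN:
            classes.setdefault(subject, set()).add(DOMAIN[prop])
        if prop in RANGE:
            classes.setdefault(obj, set()).add(RANGE[prop])
    return classes


for entity, kinds in infer_classes(facts).items():
    print(entity, sorted(kinds))
```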

Page 31: DE conferentie 2008 - Isaac en Wartena

Relations and interoperability

Two collections generally don't consist of the same type of objects (except for painting galleries…). Nevertheless, in many cases there are many relations between the collections.

E.g. a collection of architecture photographs and a collection about theatre productions. Relations between these collections can only be found if we specify the relation between:
• a picture and the building on the picture;
• a theatre production and the buildings it was performed in.
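Once both relations are made explicit, linking the two collections amounts to a join on the shared building. A minimal sketch; all identifiers and property names are hypothetical.

```python
# HYPOTHETICAL records from two collections; identifiers are illustrative.
photos = [  # architecture photographs: (photo, "depicts", building)
    ("photo_17", "depicts", "Rotterdamse_Schouwburg"),
    ("photo_18", "depicts", "Building_B"),
]
productions = [  # theatre productions: (production, "performedIn", building)
    ("production_5", "performedIn", "Rotterdamse_Schouwburg"),
    ("production_6", "performedIn", "Building_C"),
]


def link_collections(photos, productions):
    """Relate a photo and a production when the photo depicts the building
    in which the production was performed (a join on the building)."""
    by_building = {}
    for production, _, building in productions:
        by_building.setdefault(building, []).append(production)
    return [
        (photo, building, production)
        for photo, _, building in photos
        for production in by_building.get(building, [])
    ]


print(link_collections(photos, productions))
# [('photo_17', 'Rotterdamse_Schouwburg', 'production_5')]
```

Without the two relations, the collections share no field on which such a join could be made.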

Page 32: DE conferentie 2008 - Isaac en Wartena

How to find a suitable ontology?

The set of concepts and relations is called an ontology. Which ontology should we use?
• Find a suitable existing ontology. This makes your data directly interoperable with other institutions using that ontology, but for many domains no ontologies are available.
• Build an ontology yourself. This yields a perfectly suited ontology, but it is a lot of work.

Page 33: DE conferentie 2008 - Isaac en Wartena

How to design an ontology
• The ontology should define the relations between the concepts used, nothing more: no judgments about whether a concept is important, peripheral, etc., and no general world view.
• But: leave possibilities for extension.
• No home-made solutions for standard problems, e.g. the representation of NAL (name, address, location) data.

Page 34: DE conferentie 2008 - Isaac en Wartena

But what are our concepts?

Defining an ontology forces you to make clear what concepts you use. That is of great value in itself.

This is a real challenge if you are working with people from different disciplines or institutes.

You should answer questions like: what is the relation between
• a theatre building and the theatre genre?
• a performance and a theatre production?
• a location and an address (which might change when a street is renamed or renumbered)?
• etc.

Solutions should be consistent.

Page 35: DE conferentie 2008 - Isaac en Wartena

How to write it down

Use the standard Semantic Web languages, like RDF and OWL.

These languages have some restrictions; you have to learn how to deal with them.

You can't write larger ontologies without using specialized tooling.

Page 36: DE conferentie 2008 - Isaac en Wartena

OWL for domain experts?
• What is a class, what is an individual?
• When do we use data(type) properties, when object properties?
• And what are annotation properties?
• How can we use subproperties?

Language restrictions: only binary relations. You cannot say something like:

Jan is director of (Amphion, 1984-1992)

This is not equivalent to:

Jan is director of Amphion
Jan is director during 1984-1992
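A common workaround, described in the W3C Working Group Note "Defining N-ary Relations on the Semantic Web", is to introduce an extra resource that stands for the directorship itself, so the organisation and the period stay attached to each other. A minimal sketch with hypothetical property names; the second directorship is invented to show why two separate binary statements would not be enough.

```python
import itertools

# HYPOTHETICAL property names; one extra resource per directorship.
_ids = itertools.count(1)


def directorship(person: str, organisation: str, start: int, end: int) -> list:
    """Describe one directorship as a resource of its own."""
    node = f"_:directorship{next(_ids)}"  # blank-node style identifier
    return [
        (person, "holdsPosition", node),
        (node, "role", "director"),
        (node, "organisation", organisation),
        (node, "startYear", start),
        (node, "endYear", end),
    ]


def director_periods(triples: list, person: str, organisation: str) -> list:
    """Return the (start, end) periods in which `person` directed `organisation`."""
    nodes = {o for s, p, o in triples if s == person and p == "holdsPosition"}
    props: dict = {}
    for s, p, o in triples:
        if s in nodes:
            props.setdefault(s, {})[p] = o
    return sorted(
        (d["startYear"], d["endYear"])
        for d in props.values()
        if d.get("role") == "director" and d.get("organisation") == organisation
    )


triples = directorship("Jan", "Amphion", 1984, 1992) + directorship(
    "Jan", "Theatre_B", 1993, 1999  # HYPOTHETICAL second directorship
)
print(director_periods(triples, "Jan", "Amphion"))  # [(1984, 1992)]
```

With only "Jan is director of ..." and "Jan is director during ..." statements, the two periods could no longer be told apart per organisation.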

Page 37: DE conferentie 2008 - Isaac en Wartena

Tooling (1)

Most tools seem to require a PhD in Semantic Web science. Tools include:
• Protégé
• OntoStudio
• etc.

Page 38: DE conferentie 2008 - Isaac en Wartena

Tooling (2)

Cooperate with Semantic Web experts.

Use simple representations to talk about the ontology.

Graphical representations are impressive but don't scale to more than a handful of concepts.

Proven ways to exchange information about the ontology:
• face-to-face discussions
• e-mail
• wikis
• OwlDoc

Page 39: DE conferentie 2008 - Isaac en Wartena

Example

Page 40: DE conferentie 2008 - Isaac en Wartena

But…

Ontologies are fine-grained models for the data. Creating (and using) them is labor-intensive. Do we need them for all kinds of CH data?

Consider thesauri and other controlled vocabularies:
• (tens of) thousands of concepts, e.g. the AAT (Art & Architecture Thesaurus)
• loose semantics, e.g. Car wheel BT Car (a part-whole link recorded as a broader term)
• still useful for applications: search, annotation

Page 41: DE conferentie 2008 - Isaac en Wartena

Porting controlled vocabularies to the Semantic Web


Page 42: DE conferentie 2008 - Isaac en Wartena

SKOS

Observation: there are many models/formats for controlled vocabularies: thesauri, classification schemes, etc.

But they also share common features, used by typical applications: lexical information, semantic links.

SKOS (Simple Knowledge Organization System): a model to represent KOSs (knowledge organization systems) on the Semantic Web in a simple way.

Comparable to Dublin Core, but for conceptual vocabularies.

Page 43: DE conferentie 2008 - Isaac en Wartena

Concepts and labels

Page 44: DE conferentie 2008 - Isaac en Wartena

(Multilingual) labels

Page 45: DE conferentie 2008 - Isaac en Wartena

Semantic relations

Page 46: DE conferentie 2008 - Isaac en Wartena

Putting it together: a SKOS graph

[Figure: the following thesaurus entries rendered as a SKOS graph]
animals
cats
  UF domestic cats
  RT wildcats
  BT animals
domestic cats
  USE cats
wildcats
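The thesaurus entries above map onto SKOS in a straightforward way: each preferred term becomes a skos:Concept with a skos:prefLabel, UF and USE entries become skos:altLabel, BT becomes skos:broader and RT becomes skos:related. A minimal sketch producing prefixed-name triples; the ex: concept URIs are hypothetical.

```python
# Thesaurus entries from the slide
# (UF = used for, USE = use instead, BT = broader term, RT = related term).
entries = {
    "animals": {},
    "cats": {"UF": ["domestic cats"], "RT": ["wildcats"], "BT": ["animals"]},
    "domestic cats": {"USE": ["cats"]},
    "wildcats": {},
}


def concept_uri(term: str) -> str:
    """HYPOTHETICAL URI scheme: 'ex:' prefix plus the term with underscores."""
    return "ex:" + term.replace(" ", "_")


def to_skos(entries: dict, lang: str = "en") -> set:
    """Convert thesaurus entries into SKOS triples with prefixed names."""
    triples = set()
    for term, rels in entries.items():
        if "USE" in rels:
            # Non-preferred term: an alternative label of the preferred
            # concept, not a concept of its own.
            for preferred in rels["USE"]:
                triples.add((concept_uri(preferred), "skos:altLabel", f'"{term}"@{lang}'))
            continue
        c = concept_uri(term)
        triples.add((c, "rdf:type", "skos:Concept"))
        triples.add((c, "skos:prefLabel", f'"{term}"@{lang}'))
        for alt in rels.get("UF", []):  # UF -> alternative label
            triples.add((c, "skos:altLabel", f'"{alt}"@{lang}'))
        for bt in rels.get("BT", []):  # BT -> skos:broader
            triples.add((c, "skos:broader", concept_uri(bt)))
        for rt in rels.get("RT", []):  # RT -> skos:related (symmetric in SKOS)
            triples.add((c, "skos:related", concept_uri(rt)))
    return triples


for triple in sorted(to_skos(entries)):
    print(*triple)
```

Using a set means the UF entry and its USE counterpart yield a single altLabel triple.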

Page 47: DE conferentie 2008 - Isaac en Wartena

Networking controlled vocabularies

Page 48: DE conferentie 2008 - Isaac en Wartena

Doing it yourself?

What can or should a CH institution do to publish data on the Semantic Web and/or benefit from SW techniques?
• Porting existing data to the Semantic Web
• Annotation
• Creating interoperability at the semantic level

Page 49: DE conferentie 2008 - Isaac en Wartena

MultimediaN eCulture Annotation tool

Page 50: DE conferentie 2008 - Isaac en Wartena

Benefiting from the availability of different vocabularies

Page 51: DE conferentie 2008 - Isaac en Wartena

Direct access to the context of annotations

Page 52: DE conferentie 2008 - Isaac en Wartena

CHOICE annotation tool: benefiting from information extraction technology

Page 53: DE conferentie 2008 - Isaac en Wartena

Doing it yourself?

What can or should a CH institution do to publish data on the Semantic Web and/or benefit from SW techniques?
• Porting existing data to the Semantic Web
• Annotation
• Creating interoperability at the semantic level

Page 54: DE conferentie 2008 - Isaac en Wartena

Interoperability

Levels of Interoperability

Creating Interoperability

Page 55: DE conferentie 2008 - Isaac en Wartena

Levels of interoperability: information interoperability
• Syntactic interoperability: format heterogeneity; many commercial solutions
• Structural interoperability: structural/schematic heterogeneity; some commercial solutions, mainly based on manually provided mapping rules
• Semantic interoperability: the challenge!

Page 56: DE conferentie 2008 - Isaac en Wartena

Definition: Semantic interoperability is the ability of two or more computer systems to exchange information and have the meaning of that information automatically interpreted by the receiving system accurately enough to produce useful results.

Page 57: DE conferentie 2008 - Isaac en Wartena

Structural Interoperability

Page 58: DE conferentie 2008 - Isaac en Wartena

Structural Interoperability (NPO)

Page 59: DE conferentie 2008 - Isaac en Wartena

Sem. Interoperability ‘missie congo’

Page 60: DE conferentie 2008 - Isaac en Wartena

Sem. Interoperability ‘missie congo’

Page 61: DE conferentie 2008 - Isaac en Wartena

Sem. Interoperability ‘missie congo’

Page 62: DE conferentie 2008 - Isaac en Wartena

Creating interoperability
• Using one standard model
• By hand
• The wisdom of the crowds?
• Automatic alignment

Page 63: DE conferentie 2008 - Isaac en Wartena

Standardization
• Not realistic for a lot of data: legacy data; and who determines the standard?
• Talking about standards hinders annotating the collections
• Useful for common and clearly restricted domains: name, address, location (NAL / NAW)
• Useful for common annotation types: Dublin Core
• Useful at the meta level: SKOS

Page 64: DE conferentie 2008 - Isaac en Wartena

Integrate by hand
• Lots of work
• Best results

Page 65: DE conferentie 2008 - Isaac en Wartena

Example

Page 66: DE conferentie 2008 - Isaac en Wartena
Page 67: DE conferentie 2008 - Isaac en Wartena

Interoperability by using the wisdom of the crowds

Use Web 2.0 to realize the Semantic Web

“Defining new mappings interactively. As a user browses an ontology in a repository, he may come across a concept for which he knows there is a similar concept in another ontology. The user can create the mapping on-the-fly, linking the two concepts.” (Noy et al. 2008)

Use tags as a common annotation level?

Page 68: DE conferentie 2008 - Isaac en Wartena

Example from Stanford University

Page 69: DE conferentie 2008 - Isaac en Wartena
Page 70: DE conferentie 2008 - Isaac en Wartena

Automatic alignment techniques
• Lexical: labels of entities and textual definitions
• Structural: structure of the vocabularies
• Background knowledge: using a shared conceptual reference to find links
• Extensional: object information (e.g. book indexing)

[Figure: example concepts brain, Long, tumor, Long tumor]

Frank van Harmelen, AIME05, http://www.cs.vu.nl/~frankh/presentations/AIME05.ppt
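Of the four families, the lexical one is the simplest to sketch: normalize all labels (preferred and alternative) and propose a match whenever two concepts share a normalized label. The normalization steps and the example vocabularies are hypothetical; real aligners add stemming, string-similarity scores and the other three techniques.

```python
import unicodedata


def normalize(label: str) -> str:
    """Lower-case, strip accents, treat '_' and '-' as spaces, squeeze spaces."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = no_accents.lower().replace("_", " ").replace("-", " ")
    return " ".join(cleaned.split())


def lexical_matches(vocab_a: dict, vocab_b: dict) -> list:
    """Propose (concept_a, concept_b) pairs that share a normalized label."""
    index: dict = {}
    for concept, labels in vocab_b.items():
        for label in labels:
            index.setdefault(normalize(label), set()).add(concept)
    pairs = set()
    for concept, labels in vocab_a.items():
        for label in labels:
            for match in index.get(normalize(label), set()):
                pairs.add((concept, match))
    return sorted(pairs)


# HYPOTHETICAL vocabularies: concept id -> labels (preferred and alternative).
vocab_a = {"a1": ["Electronic_commerce"], "a2": ["Théâtre"], "a3": ["Poetry"]}
vocab_b = {"b1": ["electronic commerce", "ecommerce"], "b2": ["theatre"], "b3": ["novels"]}
print(lexical_matches(vocab_a, vocab_b))  # [('a1', 'b1'), ('a2', 'b2')]
```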

Page 71: DE conferentie 2008 - Isaac en Wartena

Semantic interoperability by using synonyms and related terms

For many tasks we don't need to know the exact relation between two entities. It is sufficient to know that terms represent the same concept ('synonyms') or related concepts.

Usage:
• intelligent query expansion
• automatically finding relations between collections

Relatedness of terms can in some cases be derived from actual metadata!
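Query expansion with related terms can be sketched in a few lines: each query term is extended with its related terms before matching. The related-term table here is hypothetical; its entry for "hilarious" reuses terms from the del.icio.us results later in this presentation, and in practice such a table could come from a thesaurus or from co-occurrence statistics.

```python
# HYPOTHETICAL related-term table.
RELATED = {"hilarious": ["funny", "humor", "humour"]}


def expand_query(terms, related=RELATED):
    """Add related terms to each query term (lower-cased, no duplicates)."""
    expanded = []
    for term in terms:
        key = term.lower()
        expanded.append(key)
        expanded.extend(related.get(key, []))
    return list(dict.fromkeys(expanded))  # de-duplicate, keep order


def search(index, terms):
    """Return ids of items whose tags share at least one term with the query."""
    wanted = set(terms)
    return sorted(item for item, tags in index.items() if wanted & tags)


index = {  # HYPOTHETICAL tagged items
    "video1": {"humour", "sketch"},
    "video2": {"cats"},
    "video3": {"politics"},
}
print(search(index, ["hilarious"]))                 # []
print(search(index, expand_query(["Hilarious"])))  # ['video1']
```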

Page 72: DE conferentie 2008 - Isaac en Wartena

Finding related terms by using co-occurrence data

Statistics from usage in annotated data: co-occurrence of metadata is the key.

High co-occurrence of VN (Verenigde Naties, the United Nations) and Veiligheidsraad (Security Council).

[Figure: keyword sets of annotated items]
• Missie, Veiligheidsraad, VN
• Veiligheidsraad, VN, New-York
• Missie, Blauwhelm, VN
• Congo, Veiligheidsraad, VN
• Gaza, Veiligheidsraad
• Missie, VN, Bush
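Counting co-occurrences is straightforward: for every annotated item, count each pair of its keywords once. A minimal sketch over the keyword sets of the slide, assuming each set appears once.

```python
from collections import Counter
from itertools import combinations

# Keyword sets of the annotated items on the slide (each set counted once).
items = [
    {"Missie", "Veiligheidsraad", "VN"},
    {"Veiligheidsraad", "VN", "New-York"},
    {"Missie", "Blauwhelm", "VN"},
    {"Congo", "Veiligheidsraad", "VN"},
    {"Gaza", "Veiligheidsraad"},
    {"Missie", "VN", "Bush"},
]


def cooccurrence(items) -> Counter:
    """Count, for every unordered keyword pair, the items annotated with both."""
    counts = Counter()
    for keywords in items:
        for pair in combinations(sorted(keywords), 2):
            counts[pair] += 1
    return counts


def pair_count(counts: Counter, a: str, b: str) -> int:
    """Look up a pair regardless of argument order (keys are sorted tuples)."""
    return counts[tuple(sorted((a, b)))]


counts = cooccurrence(items)
print(pair_count(counts, "VN", "Veiligheidsraad"))  # 3
```

Raw counts favour frequent keywords; this is one reason to compare whole co-occurrence patterns rather than single pair counts.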

Page 73: DE conferentie 2008 - Isaac en Wartena

New measure for keyword similarity

Keywords have similar usage if they co-occur with similar frequency with all other keywords. In other words: terms are similar if they have similar co-occurrence patterns.

Works very well for social tags.
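The slide does not spell out the measure, so the sketch below uses cosine similarity between co-occurrence vectors as a simple stand-in; it is not necessarily the authors' measure. The hypothetical tag sets show the key idea: "humor" and "humour" never co-occur, yet they come out as maximally similar because they co-occur with the same other tags.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations

# HYPOTHETICAL tag sets: "humor" and "humour" never appear together.
items = [
    {"humor", "funny", "video"},
    {"humour", "funny", "video"},
    {"humor", "comedy"},
    {"humour", "comedy"},
    {"cooking", "recipes"},
    {"cooking", "food", "video"},
]


def cooccurrence_vectors(items):
    """Map each tag to a Counter of the tags it co-occurs with."""
    vectors = defaultdict(Counter)
    for tags in items:
        for a, b in permutations(tags, 2):
            vectors[a][b] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


vecs = cooccurrence_vectors(items)
print(round(cosine(vecs["humor"], vecs["humour"]), 2))   # 1.0
print(round(cosine(vecs["humor"], vecs["cooking"]), 2))  # 0.33
```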

Page 74: DE conferentie 2008 - Isaac en Wartena

Two experiments

Mapping between Teleblik keywords and user tags:
• 100 videos
• 12,414 tags
• 4,348 different tags
• 269 different keywords

Mapping between del.icio.us tags and Wikipedia categories:
• 58,345 articles
• 500,618 tags and category annotations
• 42,425 different categories
• 49,603 different tags
• mapping computed for the 4,182 most frequent tags/categories

(Involving a transformation on a 92,082 × 92,082 matrix of floats!)
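A back-of-the-envelope check of why that matrix is worth an exclamation mark. The slide does not say how the matrix was stored; assuming dense storage of 4-byte or 8-byte floats:

```python
def dense_matrix_gb(n: int, bytes_per_value: int) -> float:
    """Memory in GB (10**9 bytes) for a dense n x n matrix."""
    return n * n * bytes_per_value / 1e9


N = 92_082  # matrix size from the slide
for label, size in [("32-bit floats", 4), ("64-bit floats", 8)]:
    print(f"{label}: {dense_matrix_gb(N, size):.1f} GB")
```

That is roughly 34 GB or 68 GB. If most tag/category pairs never co-occur, a sparse representation would need far less memory.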

Page 75: DE conferentie 2008 - Isaac en Wartena

del.icio.us and Wikipedia

Del.icio.us
• Social bookmarking site
• Bookmarks can in most cases be interpreted as labels or tags for the bookmarked URL
• Many Wikipedia articles are tagged by del.icio.us users

Wikipedia
• Articles are labeled with one or more categories by the article authors
• Categories are organized hierarchically
• Categories are organized consciously, as in a thesaurus

Page 76: DE conferentie 2008 - Isaac en Wartena
Page 77: DE conferentie 2008 - Isaac en Wartena

[Examples: a term followed by its related tags and categories]
• economist: American_economist, Economists, economics, people, philosophy, history
• ecommerce: Electronic_commerce, credit, business, web2.0
• American_poets: American_novelists, poetry, literature, Living_people, art
• beauty: actress, American_television_actors, cinema, interesting, people
• Classification_systems: taxonomy, classification, library, folksonomy, tagging, biology, psychology

Page 78: DE conferentie 2008 - Isaac en Wartena

Results of instance-based mapping: categories to tags

[Chart: bars for the relation types identical, synonym, broader, narrower, related, unrelated; y-axis from 0 to 0.45]

Page 79: DE conferentie 2008 - Isaac en Wartena

Results of instance-based mapping: tags to categories

[Chart: bars for the relation types identical, synonym, broader, narrower, related, unrelated; y-axis from 0 to 0.3]

Page 80: DE conferentie 2008 - Isaac en Wartena

• hilarious: funny, humor, humour, comedy, fun
• cooking: cookbook, recipes, cookbooks, cookery, food
• diary: diaries, journal, teenage, family, girls
• unemployment: drunk, employment, middle_class, jobs, class
• homer: odysseus, greek_poetry, trojan_war, troy, iliad
• shakespeare: Elizabethan_drama, william_shakespeare, british_drama, plays, tragedies

Page 81: DE conferentie 2008 - Isaac en Wartena

Case study: re-indexing at the KB?
• The KB has a depot collection
• Some books are indexed beforehand by the public libraries
• The two collections use different thesauri

[Figure: Biblion (public libraries), 650K books, indexed with LTR; Depot (KB), 1M books, indexed with Brinkman]

Page 82: DE conferentie 2008 - Isaac en Wartena

The re-indexing application: propose Brinkman indexing when the KB receives a Biblion book

[Figure: Biblion (LTR) and Depot (Brinkman); for a Biblion book's LTR indexing, the matching Brinkman indexing is unknown (? ? ?)]

Page 83: DE conferentie 2008 - Isaac en Wartena

Techniques used
• Lexical: comparison of concept labels
• Extensional: based on the overlap between books indexed with LTR concepts and books indexed with Brinkman concepts

[Figure: LTR and Brinkman concepts linked through a shared collection of books; example concepts 'Dutch Literature' and 'Dutch']
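The extensional technique can be sketched as follows, assuming some books are indexed with both thesauri: two concepts are proposed as a match when the sets of books they index overlap strongly. Jaccard overlap is used here as one simple choice of measure; the book sets, the "Cooking"/"Cookery" concepts and the threshold are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    return len(a & b) / len(a | b) if a | b else 0.0


# HYPOTHETICAL dual-indexed books: concept -> ids of the books it indexes.
ltr = {"Dutch Literature": {"b1", "b2", "b3", "b4"}, "Cooking": {"b5", "b6"}}
brinkman = {"Dutch": {"b1", "b2", "b3"}, "Cookery": {"b5", "b6", "b7"}}


def extensional_alignment(src: dict, tgt: dict, threshold: float = 0.5) -> list:
    """Propose (source, target, score) pairs whose book sets overlap enough."""
    candidates = []
    for cs, books_s in src.items():
        for ct, books_t in tgt.items():
            score = jaccard(books_s, books_t)
            if score >= threshold:
                candidates.append((cs, ct, round(score, 2)))
    return sorted(candidates, key=lambda c: -c[2])


print(extensional_alignment(ltr, brinkman))
# [('Dutch Literature', 'Dutch', 0.75), ('Cooking', 'Cookery', 0.67)]
```

Unlike the lexical technique, this does not require the labels to look alike; it only requires enough books indexed in both systems.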

Page 84: DE conferentie 2008 - Isaac en Wartena

Result

Page 85: DE conferentie 2008 - Isaac en Wartena

Feasibility?

Quality of annotations:
• level comparable to Christian's experiment
• first experiments: optimum around 60% precision, 50% recall

Automatic alignment has flaws, but it could already help!
• assisting users, not replacing them
• note: the application is difficult (variability)
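What "60% precision, 50% recall" means for a single book can be made concrete: of the index terms proposed, 60% are correct, and half of the correct terms are found. The term sets below are hypothetical, chosen only to reproduce the slide's figures; the F1 score is added as a common single-number summary and is not reported on the slide.

```python
def precision_recall_f1(proposed: set, correct: set) -> tuple:
    """Standard precision, recall and F1 for a set of proposed terms."""
    tp = len(proposed & correct)  # correctly proposed terms
    p = tp / len(proposed) if proposed else 0.0
    r = tp / len(correct) if correct else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# HYPOTHETICAL: 10 proposed terms, 6 of them among the 12 correct ones.
proposed = {f"t{i}" for i in range(10)}
correct = {f"t{i}" for i in range(6)} | {f"u{i}" for i in range(6)}
p, r, f1 = precision_recall_f1(proposed, correct)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.3f}")
```

In an assisting setting, a curator would accept six suggestions, discard four and still add six terms by hand.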

Page 86: DE conferentie 2008 - Isaac en Wartena

Doing it yourself?

What can or should a CH institution do to publish data on the Semantic Web and/or benefit from SW techniques?
• Porting legacy data to the Semantic Web
  • Representing semantics of your data
  • Controlled vocabularies: SKOS
• Annotation
  • Creating new semantic CH descriptions
• Creating interoperability at the semantic level
  • Connecting (to standard) description models
  • Using the wisdom of the crowds?
  • Automatically aligning existing description vocabularies

Page 87: DE conferentie 2008 - Isaac en Wartena

Take-home message: benefits

Performing over the web:
• Knowledge re-use & sharing: Libris
• Knowledge integration: CiC (Cultuur in Context), KB re-indexing
• Data enrichment, enhanced collection access: eCulture/Europeana semantic search

It can really help open up CH data!

Page 88: DE conferentie 2008 - Isaac en Wartena

Take-home message: costs

Steps to be taken:
• Porting legacy data
• Semantic alignment
• Annotation

Page 89: DE conferentie 2008 - Isaac en Wartena

Topics
• Semantic Web in a nutshell
• Do it yourself
• Short break: formulate questions
• Examples from practice

Page 90: DE conferentie 2008 - Isaac en Wartena

Examples from practice

Do you have experiences with, or plans to:
• semantically annotate? using standard vocabularies?
• model your domain; write or extend an ontology?
• connect to or integrate with other collections, inside or outside your institute?
• make data available on the internet?

Tell us (after the break)!

Some issues to tell us about: (next slide)

Page 91: DE conferentie 2008 - Isaac en Wartena

Why do you think the Semantic Web could be helpful to your organization?

Where?
• Front office: search/browse/recommendation/personalization; make data available to end users for reuse (mash-ups, virtual (user) collections, …)
• Back office: provide data to third parties; inference of new knowledge / consistency checking

What type of data?
• Porting existing data
• Creating new data: automatic data extraction / experts / end users

What new (semantic) links arise?
• New relations to which collections/vocabularies?
• Within a collection / with other collections / with other institutes
• How? Automatic alignment / experts / end users?

