Semantic Web: Do It Yourself
DE Conferentie, Rotterdam, 11 December 2008
Antoine Isaac and Christian Wartena
Preamble: who are we?
Christian Wartena
  Telematica Institute
  http://www.telin.nl/
  Working on:
    MyMedia (http://www.mymedia-project.org/)
    Cultuur in Context
    CATCH research programme (CHOICE)
Antoine Isaac
  Working on:
    CATCH research programme (STITCH)
    European Library-related projects (TELplus)
    SKOS @ W3C
  http://www.few.vu.nl/~aisaac/
Preamble: who are you?
Domain
  Archive?
  Library?
  Museum?
Experience with SW
  New to the stuff?
  Basic knowledge?
  Advanced knowledge / already implemented something?
Motivation
  Thinking about using it?
  Just to learn about it or what you can do with it?
Topics of this workshop
  Semantic Web in a nutshell (20’)
  Do it yourself (40’)
  Short break – Answer questions (15’)
  Examples from practice & Discussion (45’)
Topics
  Semantic Web in a nutshell
    A Web of data
    Smart data
  Do it yourself
  Short break – Answer questions
  Examples from practice & Discussion
The Web for humans
A city
The city’s location
Hyperlinks anchored to words
Meaning
SW problem: the Web for computers?
Where is meaning?
The Semantic Web vision: a web of (smart) data
[Figure: example graph with the nodes file1, par3, Article, Document, Amsterdam, City and The_Netherlands, connected by the relations defines, type, partOf, subClassOf and hasCapital]
A Web of resources
myVoc:Amsterdam
http://ex.org/files/file1
theirVoc:Article
• Web-enabled identifiers (URIs)
• Coming from different spaces
myVoc: = http://example.org/myVocabulary/
Data in an RDF “graph”
Resource Description Framework: structured data as triple statements
  (http://ex.org/files/file1, theirVoc:subject, myVoc:Amsterdam)
  (http://ex.org/files/file1, rdf:type, theirVoc:Article)
Links coming from different spaces
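As a rough illustration of the triple idea, the two statements from the slide can be held as plain Python tuples. This is only a sketch: the namespace given for theirVoc is a made-up placeholder (the slides define only myVoc), and the helper names expand and objects are invented for this example.

```python
# Prefix table. Only myVoc is defined on the slides; the theirVoc
# namespace is a hypothetical placeholder for this sketch.
PREFIXES = {
    "myVoc": "http://example.org/myVocabulary/",
    "theirVoc": "http://example.org/theirVocabulary/",  # assumption
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(name):
    """Turn a prefixed name such as myVoc:Amsterdam into a full URI."""
    if name.startswith("http://"):
        return name  # already a full URI
    prefix, local = name.split(":", 1)
    return PREFIXES[prefix] + local


FILE1 = "http://ex.org/files/file1"

# The graph is a set of (subject, predicate, object) statements.
graph = {
    (FILE1, "theirVoc:subject", "myVoc:Amsterdam"),
    (FILE1, "rdf:type", "theirVoc:Article"),
}

# The same graph with every term written as a full, web-enabled identifier.
expanded = {tuple(expand(term) for term in triple) for triple in graph}


def objects(triples, subject, predicate):
    """All objects stated for a given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


print(objects(graph, FILE1, "theirVoc:subject"))
```

The point of the prefix table is that the document, the vocabulary of properties and the vocabulary of subjects can each live in a different web space and still be combined in one statement.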
More than traditional metadata (1)
Web-based resources allow distribution/sharing/linking of
  documents
  description vocabularies
  (meta)data
(file1, subject, Amsterdam)
Different owners & locations:
  http://www.kb.nl/eDepot
  http://geo.org/voc/
  http://ex.org/files/file1
Web of data example: Linking Open Data community project
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
CH case: Libris
http://libris.kb.se/
Swedish Union Catalogue as linked data
Martin Malmsten, Dublin Core 2008
http://dc2008.de/wp-content/uploads/2008/09/malmsten.pdf
Linked descriptions of resources in Libris
External links in Libris: Library of Congress Subject Headings
Ed Summers et al., Dublin Core 2008
http://dc2008.de/wp-content/uploads/2008/09/summers-isaac-redding-krech.pdf
Searching using multiple vocabularies
Agenda
  Semantic Web in a nutshell
    A Web of data
    Smart data: the "semantics"
Creating vocabularies of “building blocks” for RDF graphs
[Figure: the RDF graph for http://ex.org/files/file1, with theirVoc:subject myVoc:Amsterdam and rdf:type theirVoc:Article]
Ontologies
Ontologies specify description vocabularies which can be shared
  subject, Article
Give formal definitions to vocabulary elements
  Every Article is a Document
Machine-readable definitions
  (theirVoc:Article, rdfs:subClassOf, theirVoc:Document): ontology axiom
  (http://ex.org/files/file1, rdf:type, theirVoc:Article)
  (http://ex.org/files/file1, rdf:type, theirVoc:Document): deduced
Allows deduction of new facts & control of existing facts by reasoning engines
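The deduction step can be imitated in a few lines of Python. This is a minimal sketch of subclass reasoning over tuples, not how an actual reasoning engine is built; the function name infer_types is invented for the example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(triples):
    """Return the triples plus every rdf:type fact implied by rdfs:subClassOf.

    The loop repeats until nothing new is added, so chains of subclasses
    (A subClassOf B, B subClassOf C) are followed as well.
    """
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for sub, sup in subclass_pairs:
                new_fact = (s, RDF_TYPE, sup)
                if o == sub and new_fact not in inferred:
                    inferred.add(new_fact)
                    changed = True
    return inferred


FILE1 = "http://ex.org/files/file1"
data = {
    (FILE1, RDF_TYPE, "theirVoc:Article"),                      # stated fact
    ("theirVoc:Article", SUBCLASS_OF, "theirVoc:Document"),     # ontology axiom
}

closure = infer_types(data)
print((FILE1, RDF_TYPE, "theirVoc:Document") in closure)
```

Nothing in the data says that file1 is a Document; that fact follows from the axiom alone, which is what "machine-readable definitions" buys.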
CH case: eCulture/Europeana
CH case study: eCulture/Europeana
Case study: Europeana
The query:
  ?x vra:subject ?y
  ?y skos:broader rma:Egypt
The existing description:
  rma:gezicht_in_cairo rma:depicts rma:Cairo
  rma:Cairo skos:broader rma:Egypt
Why is there a match?
For the Europeana ontology, every rma:depicts statement implies a vra:subject statement
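The match can be replayed with a small sketch. The "implies" rule is modelled here as a plain lookup table, which is an assumption of this example; in RDFS such a rule is typically written with rdfs:subPropertyOf, but the slides do not say how Europeana encodes it. The function names are invented.

```python
# The existing description, as (subject, predicate, object) tuples.
description = {
    ("rma:gezicht_in_cairo", "rma:depicts", "rma:Cairo"),
    ("rma:Cairo", "skos:broader", "rma:Egypt"),
}

# Ontology rule: every rma:depicts statement implies a vra:subject statement.
IMPLIES = {"rma:depicts": "vra:subject"}


def apply_implications(triples, implies):
    """Add the implied statement for every triple whose property has a rule.

    Rules are applied once; chained rules are not followed in this sketch.
    """
    derived = set(triples)
    for s, p, o in triples:
        if p in implies:
            derived.add((s, implies[p], o))
    return derived


def subjects_within(triples, place):
    """Answer the query: ?x vra:subject ?y, ?y skos:broader <place>."""
    narrower = {s for s, p, o in triples if p == "skos:broader" and o == place}
    return sorted(
        (s, o) for s, p, o in triples if p == "vra:subject" and o in narrower
    )


print(subjects_within(description, "rma:Egypt"))  # no vra:subject stated yet
print(subjects_within(apply_implications(description, IMPLIES), "rma:Egypt"))
```

Without the rule the query finds nothing, because the description uses rma:depicts while the query asks for vra:subject.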
Flexible reasoning: new descriptions using different ontologies can easily be added to the same base
  den08:shows_DEN_Participant
Requirement: semantically connect these ontologies
  den08:shows_DEN_Participant "implies" vra:subject
SW principle: meaning is accessible with the data, not encoded in external programs
More than traditional metadata (2)
Semantic Web in a nutshell
A web of (meta)data
  Descriptions of resources
  Easy to share and interconnect
Smart data
  Machine-readable definitions for the data
Relies on open standards
  W3C's URI, XML, RDF, OWL, SPARQL, SKOS…
Get out there!
http://www.w3.org/2001/sw/
Topics
  Semantic Web in a nutshell
  Do it yourself
  Short break – Answer questions
  Examples from practice & Discussion
Doing it yourself?
What can or should a cultural heritage (CH) institution do to publish data on the Semantic Web and/or benefit from SW techniques?
  Porting existing data to the Semantic Web
  Annotation
  Creating interoperability at the semantic level
Porting CH data to the Semantic Web
Typical CH data:
Metadata on objects and documents, using specific description structures and possibly controlled vocabularies
Representing semantics of your data
Semantics is about relations!
  Relations between words and things in the real world.
  Relations between concepts.
Concepts are defined by their relations to other concepts
E.g.:
  a theatre is a building
  every building is located in a location
  every building is built in a period of time
  a building is designed by an architect
Semantic Relations
Objects are defined by their relations to other entities
A thesaurus alone is not enough
  The ‘broader/narrower’ relation on its own is still very poor.
E.g. the Rotterdamse Schouwburg is not only defined by saying that it is a theatre, but also by:
  the Rotterdamse Schouwburg is located at Schouwburgplein 25
  the Rotterdamse Schouwburg was built in 1988
  the Rotterdamse Schouwburg was designed by Wim Gerhard Quist
Relations enable reasoning about entities.
Relations and Interoperability
Two collections generally don’t consist of the same type of objects (except for painting galleries…)
Nevertheless, in many cases, there are many relations between the collections.
E.g. a collection of architecture photographs and a collection about theatre productions.
Relations between these collections can only be found if we specify the relation between
  a picture and the building in the picture;
  a theatre production and the buildings it was performed in.
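Once both relations are written down, linking the two collections becomes a simple join on the shared building. The sketch below assumes hypothetical identifiers (photo:17, production:42) and hypothetical property names (depicts, performedIn); none of them come from an actual collection.

```python
# Collection 1: architecture photographs (identifiers are hypothetical).
photos = {
    ("photo:17", "depicts", "building:Rotterdamse_Schouwburg"),
    ("photo:18", "depicts", "building:Some_Other_Building"),
}

# Collection 2: theatre productions (identifiers are hypothetical).
productions = {
    ("production:42", "performedIn", "building:Rotterdamse_Schouwburg"),
}


def link_via_building(photo_triples, production_triples):
    """Pair every photo with the productions performed in the depicted building."""
    photos_by_building = {}
    for photo, prop, building in photo_triples:
        if prop == "depicts":
            photos_by_building.setdefault(building, []).append(photo)
    links = []
    for production, prop, building in production_triples:
        if prop == "performedIn":
            for photo in photos_by_building.get(building, []):
                links.append((photo, production))
    return sorted(links)


print(link_via_building(photos, productions))
```

The two collections contain different kinds of objects, yet they meet in the building, which is exactly the entity the explicit relations point to.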
How to find a suitable ontology?
The set of concepts and relations is called an ontology.
Which ontology should we use?
Find a suitable ontology.
  Makes your data directly interoperable with other institutions using that ontology.
  For many domains no ontologies are available.
Build an ontology yourself.
  Yields the perfectly suited ontology.
  A lot of work.
How to design an Ontology
Ontology should define relations between used concepts
  Nothing more.
  No judgments whether a concept is important, peripheral, etc.
  Not a general world view
    But: possibilities for extension
  No solutions for standard problems
    Representation of NAL data
But what are our concepts?
Defining an ontology forces you to make clear what concepts you use
  That is of great value in itself.
  This is a real challenge if you are working with people from different disciplines or institutes.
You should answer questions like: what is the relation between
  A theatre building and the theatre genre
  A performance and a theatre production
  A location and an address (that might change when a street is renamed or renumbered)
  Etc.
Solutions should be consistent
How to write it down
Use the standard semantic web languages, like RDF and OWL.
  These languages have some restrictions.
  You have to learn how to deal with them
You can’t write larger ontologies without using specialized tooling.
OWL for domain experts?
What is a class, what is an individual?
When do we use data(type) properties, when object properties?
And what are annotation properties?
How can we use subproperties?
Language restrictions: only binary relations
You cannot say something like:
  Jan is director of (Amphion, 1984-1992)
This is not equivalent to
  Jan is director of Amphion
  Jan is director during 1984-1992
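A common way around the binary restriction is to introduce an intermediate node for the directorship itself and hang the theatre and the period on that node. The sketch below shows why the two flat statements lose information and how the intermediate node restores it. The second directorship (OtherTheatre, 1993-1999) and all property names (holdsPost, atTheatre, during) are hypothetical, added only to make the ambiguity visible.

```python
# Flat binary statements: theatre and period are no longer tied together.
flat = {
    ("Jan", "directorOf", "Amphion"),
    ("Jan", "directorDuring", "1984-1992"),
    ("Jan", "directorOf", "OtherTheatre"),    # hypothetical second post
    ("Jan", "directorDuring", "1993-1999"),   # hypothetical second period
}

# Same information with one intermediate node per directorship.
with_posts = {
    ("Jan", "holdsPost", "post1"),
    ("post1", "atTheatre", "Amphion"),
    ("post1", "during", "1984-1992"),
    ("Jan", "holdsPost", "post2"),
    ("post2", "atTheatre", "OtherTheatre"),
    ("post2", "during", "1993-1999"),
}


def flat_periods(triples, person):
    """All periods stated for a person; the theatre cannot be recovered."""
    return sorted(o for s, p, o in triples if s == person and p == "directorDuring")


def periods_at(triples, person, theatre):
    """Periods during which the person was director of one given theatre."""
    posts = {o for s, p, o in triples if s == person and p == "holdsPost"}
    matching = {
        s for s, p, o in triples
        if s in posts and p == "atTheatre" and o == theatre
    }
    return sorted(o for s, p, o in triples if s in matching and p == "during")


print(flat_periods(flat, "Jan"))                 # both periods, no theatre attached
print(periods_at(with_posts, "Jan", "Amphion"))  # only the Amphion period
```

The price is an extra node and three statements per fact, which is one of the restrictions domain experts have to learn to deal with.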
Tooling (1)
Most tools seem to require a PhD in Semantic Web Science
Tools are
  Protégé
  OntoStudio
  Etc.
Tooling (2)
Cooperate with semantic web experts
  Use simple representations to talk about the ontology.
  Graphical representations are impressive but don’t scale to more than a handful of concepts.
Proven ways to exchange information about the ontology
  Face to face discussions
  E-mail
  Wikis
  OwlDoc
Example
But…
Ontologies are fine-grained models for the data
  Creating (and using) them is labor-intensive
  Do we need them for all kinds of CH data?
Consider thesauri and other controlled vocabularies:
  (tens of) thousands of concepts
    AAT
  Loose semantics
    Car wheel BT Car
  Still useful for applications!
    Search, annotation
Porting controlled vocabularies to the Semantic Web
SKOS
Observation: there are many models/formats for controlled vocabularies:
thesauri, classification schemes, etc…
But also common features, used by typical applications
Lexical information, semantic links
SKOS (Simple Knowledge Organization System): a model to represent KOSs on the Semantic Web in a simple way
Comparable to Dublin Core, for conceptual vocabularies
Concepts and labels
(Multilingual) labels
Semantic relations
Putting it together: a SKOS graph
animals
cats
  UF domestic cats
  RT wildcats
  BT animals
domestic cats
  USE cats
wildcats
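The thesaurus entry for "cats" can be ported to SKOS statements with a small mapping from thesaurus tags to SKOS properties. This is a minimal sketch: the ex: prefix and the helper names are made up for the example, labels are kept as plain strings without language tags, and the USE entry for "domestic cats" is assumed to be covered by the UF line on "cats".

```python
# Thesaurus tags that relate two concepts, mapped to SKOS properties.
TAG_TO_SKOS = {
    "BT": "skos:broader",
    "NT": "skos:narrower",
    "RT": "skos:related",
}


def concept_id(term):
    """Mint a concept identifier from a term (ex: is a hypothetical prefix)."""
    return "ex:" + term.replace(" ", "_")


def to_skos(term, entry):
    """Translate one thesaurus entry into SKOS triples.

    UF (used for) becomes an alternative label on the concept itself;
    BT/NT/RT become links to other concepts.
    """
    subject = concept_id(term)
    triples = [
        (subject, "rdf:type", "skos:Concept"),
        (subject, "skos:prefLabel", term),
    ]
    for tag, value in entry:
        if tag == "UF":
            triples.append((subject, "skos:altLabel", value))
        elif tag in TAG_TO_SKOS:
            triples.append((subject, TAG_TO_SKOS[tag], concept_id(value)))
    return triples


cats = to_skos("cats", [("UF", "domestic cats"), ("RT", "wildcats"), ("BT", "animals")])
for triple in cats:
    print(triple)
```

Note that "domestic cats" does not become a concept of its own: in SKOS a non-preferred term is just another label of the concept it points to.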
Networking controlled vocabularies
MultimediaN eCulture Annotation tool
Benefiting from the availability of different vocabularies
Direct access to the context of annotations
CHOICE annotation tool: benefiting from information extraction technology
Interoperability
Levels of Interoperability
Creating Interoperability
Levels of interoperability
Information Interoperability
  Syntactic Interoperability
    format heterogeneity
    many commercial solutions
  Structural Interoperability
    structural/schematic heterogeneity
    some commercial solutions
    mainly based on manually provided mapping rules
  Semantic Interoperability
    challenge!
Definition
Semantic Interoperability is the ability of two or more computer systems to exchange information and have the meaning of that information automatically interpreted by the receiving system accurately enough to produce useful results.
Structural Interoperability
Structural Interoperability (NPO)
Sem. Interoperability ‘missie congo’
Creating Interoperability
Using one standard model
By hand
The wisdom of the crowds?
Automatic alignment
Standardization
Not realistic for a lot of data
  Legacy data
  Who determines the standard?
Talking about standards hinders annotating the collections
Useful for
  common and clearly restricted domains:
    Name, address, location (NAL / NAW)
  common annotation types:
    Dublin Core
  metalevel:
    SKOS
Integrate by hand
Lots of work
Best results
Example
Interoperability by Using the Wisdom of the Crowds
Use web2.0 to realize the semantic web
“Defining new mappings interactively. As a user browses an ontology in a repository, he may come across a concept for which he knows there is a similar concept in another ontology. The user can create the mapping on-the-fly, linking the two concepts.” (Noy et al. 2008)
Use tags as a common annotation level?
Example from Stanford University
Automatic alignment techniques
Lexical
  Labels of entities and textual definitions
Structural
  Structure of the vocabularies
Background knowledge
  Using a shared conceptual reference to find links
Extensional
  Object information (e.g. book indexing)
[Figure: alignment example with the labels brain, Long tumor, tumor, Long]
Frank van Harmelen, AIME05
http://www.cs.vu.nl/~frankh/presentations/AIME05.ppt
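Of the four techniques, the lexical one is the easiest to picture. The sketch below matches concepts from two vocabularies whenever they share a label after a light normalisation. The vocabularies, the concept identifiers and the normalisation rules are all assumptions of this example; real lexical matchers usually add stemming, translation or string-distance measures.

```python
def normalise(label):
    """Lower-case a label and treat hyphens and underscores as spaces."""
    cleaned = label.lower().replace("-", " ").replace("_", " ")
    return " ".join(cleaned.split())


def lexical_matches(vocab_a, vocab_b):
    """Return pairs of concepts that share at least one normalised label.

    Each vocabulary maps a concept identifier to its list of labels.
    """
    index = {}
    for concept, labels in vocab_b.items():
        for label in labels:
            index.setdefault(normalise(label), set()).add(concept)
    matches = set()
    for concept, labels in vocab_a.items():
        for label in labels:
            for other in index.get(normalise(label), ()):
                matches.add((concept, other))
    return sorted(matches)


# Two tiny hypothetical vocabularies.
vocab_a = {"a:1": ["Dutch literature"], "a:2": ["Cooking"]}
vocab_b = {"b:7": ["dutch-literature", "Literature, Dutch"], "b:8": ["Gardening"]}

print(lexical_matches(vocab_a, vocab_b))
```

Labels that differ in word order ("Literature, Dutch") are not caught by this normalisation, which hints at why lexical matching alone is rarely enough.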
Semantic interoperability by using synonyms and related terms
For many tasks we don’t have to know what the exact relation between two entities is.
It is sufficient to know that terms represent the same concept (‘synonyms’) or related concepts.
Usage
  Intelligent query expansion
  Automatically finding relations between collections
Relatedness of terms can in some cases be derived from actual metadata!
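Query expansion with related terms needs very little machinery. In this sketch the table of related terms is filled in by hand and is purely illustrative; in practice it would come from a thesaurus or from co-occurrence statistics. The function name expand_query is invented.

```python
def expand_query(terms, related):
    """Add the known synonyms and related terms of every query term.

    The original order is kept and no term is listed twice.
    """
    expanded = []
    for term in terms:
        for candidate in [term, *related.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Hypothetical table of related terms.
related_terms = {"VN": ["veiligheidsraad"]}

print(expand_query(["VN", "Congo"], related_terms))
```

The expansion does not need to know whether the added term is a synonym, a narrower term or merely related, which is the point made on the slide.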
Finding related terms by using co-occurrence data
Statistics from usage in annotated data
Co-occurrence of metadata is the key
High co-occurrence of VN and veiligheidsraad
Example keyword sets:
  Missie, Veiligheidsraad, VN
  Veiligheidsraad, VN, New-York
  Missie, Blauwhelm, VN
  Congo, Veiligheidsraad, VN
  Gaza, Veiligheidsraad
  Missie, VN, Bush
(VN = United Nations, Veiligheidsraad = Security Council, Missie = mission, Blauwhelm = blue helmet)
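Counting co-occurrences over the six keyword sets of the example takes only the standard library. This is a sketch of the counting step alone; how the counts are turned into a relatedness score is not specified here.

```python
from collections import Counter
from itertools import combinations

# The six keyword sets from the example, one set per annotated item.
annotations = [
    {"Missie", "Veiligheidsraad", "VN"},
    {"Veiligheidsraad", "VN", "New-York"},
    {"Missie", "Blauwhelm", "VN"},
    {"Congo", "Veiligheidsraad", "VN"},
    {"Gaza", "Veiligheidsraad"},
    {"Missie", "VN", "Bush"},
]


def cooccurrence(keyword_sets):
    """Count, for every unordered pair of keywords, how many sets contain both."""
    counts = Counter()
    for keywords in keyword_sets:
        for a, b in combinations(sorted(keywords), 2):
            counts[frozenset((a, b))] += 1
    return counts


counts = cooccurrence(annotations)
print(counts[frozenset(("VN", "Veiligheidsraad"))])
```

VN and Veiligheidsraad appear together in three of the six sets, while Veiligheidsraad occurs in four sets overall, so the two keywords are strongly tied in this small sample.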
New measure for keyword similarity
Keywords have similar usage if they co-occur with similar frequency with all other keywords.
In other words: terms are similar if they have similar co-occurrence patterns
Works very well for social tags
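The idea of comparing co-occurrence patterns can be tried out with cosine similarity between co-occurrence vectors. Cosine similarity is a stand-in chosen for this sketch; the slides do not state the formula of the new measure, and the tag sets below are invented.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(tag_sets):
    """For every tag, count how often each other tag appears alongside it."""
    vectors = defaultdict(Counter)
    for tags in tag_sets:
        for a in tags:
            for b in tags:
                if a != b:
                    vectors[a][b] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical tag sets: "funny" and "humor" never occur together,
# but they keep the same company.
tag_sets = [
    {"funny", "comedy", "film"},
    {"humor", "comedy", "film"},
    {"recipe", "food"},
]

vectors = cooccurrence_vectors(tag_sets)
print(cosine(vectors["funny"], vectors["humor"]))
print(cosine(vectors["funny"], vectors["recipe"]))
```

Two tags can come out as similar without ever being used on the same item, which is what distinguishes a pattern-based measure from raw co-occurrence counts.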
2 Experiments
Mapping between Teleblik keywords and user tags
  100 videos
  12,414 tags
  4,348 different tags
  269 different keywords
Mapping between del.icio.us tags and Wikipedia categories
  58,345 articles
  500,618 tags and category annotations
  42,425 different categories
  49,603 different tags
  Mapping computed for the 4,182 most frequent tags/categories
(Involving a transformation on a 92082 x 92082 matrix of floats!)
del.icio.us and Wikipedia
del.icio.us
  Social bookmarking site
  Bookmarks in most cases can be interpreted as labels or tags for the bookmarked URL.
  Many Wikipedia articles are tagged by del.icio.us users
Wikipedia
  Articles are labeled with one or more categories by the article authors.
  Categories are organized hierarchically.
  Categories are organized consciously, like in a thesaurus
economist: American_economist, Economists, economics, people, philosophy, history
ecommerce: Electronic_commerce, credit, business, web2.0
American_poets: American_novelists, poetry, literature, Living_people, art
beauty: actress, American_television_actors, cinema, interesting, people
Classification_systems: taxonomy, classification, library, folksonomy, tagging, biology, psychology
Results of instance-based mapping: categories to tags
[Bar chart over the classes identical, synonym, broader, narrower, related, unrelated; vertical axis from 0 to 0.45]
Results of instance-based mapping: tags to categories
[Bar chart over the classes identical, synonym, broader, narrower, related, unrelated; vertical axis from 0 to 0.3]
hilarious: funny, humor, humour, comedy, fun
cooking: cookbook, recipes, cookbooks, cookery, food
diary: diaries, journal, teenage, family, girls
unemployment: drunk, employment, middle_class, jobs, class
homer: odysseus, greek_poetry, trojan_war, troy, iliad
shakespeare: Elizabethan_drama, william_shakespeare, british_drama, plays, tragedies
Case study: re-indexing at KB?
KB has a depot collection
Some books are indexed beforehand at public libraries
The two collections use different thesauri
[Figure: Biblion (public libraries) and Depot; 650K books and 1M books; LTR and Brinkman (KB)]
The re-indexing application
Propose Brinkman indexing when KB receives a Biblion book
[Figure: Biblion and Depot; LTR and Brinkman; the Brinkman indexing to propose is marked "? ? ?"]
Techniques used
Lexical: comparison of concept labels
Extensional: based on overlap of books indexed with LTR concepts and books indexed with Brinkman concepts
[Figure: LTR and Brinkman vocabularies connected through a collection of books; example labels "Dutch Literature" and "Dutch"]
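The extensional technique can be sketched as an overlap measure between the sets of books indexed with each concept. The Jaccard coefficient and the 0.5 threshold are choices made for this example, not necessarily what was used in the KB experiments, and the concept identifiers and book numbers are invented.

```python
def jaccard(books_a, books_b):
    """Share of books indexed with either concept that carry both concepts."""
    union = books_a | books_b
    if not union:
        return 0.0
    return len(books_a & books_b) / len(union)


def extensional_matches(index_a, index_b, threshold=0.5):
    """Propose concept pairs whose sets of indexed books overlap enough.

    Each index maps a concept identifier to the set of books indexed with it.
    """
    proposals = []
    for concept_a, books_a in index_a.items():
        for concept_b, books_b in index_b.items():
            score = jaccard(books_a, books_b)
            if score >= threshold:
                proposals.append((concept_a, concept_b, score))
    return sorted(proposals)


# Hypothetical doubly indexed books (identifiers b1..b9 are made up).
ltr_index = {"ltr:Dutch_literature": {"b1", "b2", "b3"}}
brinkman_index = {
    "brinkman:Dutch": {"b1", "b2", "b3", "b4"},
    "brinkman:Cooking": {"b9"},
}

print(extensional_matches(ltr_index, brinkman_index))
```

The method only works for books that carry both kinds of indexing, so the size of that doubly indexed overlap limits how many concept pairs can be proposed.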
Result
Feasibility?
Quality of annotations
  Level comparable to Christian's experiment
  First experiments: optimum around 60% precision, 50% recall
Automatic alignment has flaws, but it could help already!
  Assisting users, not replacing them
  Note: the application is difficult (variability)
Doing it yourself?
What can or should a cultural heritage (CH) institution do to publish data on the Semantic Web and/or benefit from SW techniques?
  Porting legacy data to the Semantic Web
    Representing semantics of your data
    Controlled vocabularies – SKOS
  Annotation
    Creating new semantic CH descriptions
  Creating interoperability at the semantic level
    Connecting (to standard) description models
    Using the wisdom of the crowds?
    Automatically aligning existing description vocabularies
Take-home message: benefits
Performing over the web:
  Knowledge re-use & sharing
    Libris
  Knowledge integration
    CiC, KB re-indexing
  Data enrichment
  Enhanced collection access
    eCulture/Europeana semantic search
It can really help open up CH data!
Take-home message: costs
Steps to be taken
  Porting legacy data
  Semantic alignment
  Annotation
Topics
  Semantic Web in a nutshell
  Do it yourself
  Short break – Formulate Questions
  Examples from practice
Examples from practice
Do you have experiences or plans to
  Semantically annotate?
    Using standard vocabularies?
  Model your domain; write or extend an ontology
  Connect to or integrate with other collections?
    Inside or outside your institute
  Make data available on the internet?
Tell us (after the break)!
Some issues to tell us about: (next slide)
Why do you think the semantic web could be helpful to your organization?
Where?
  Front-office
    Search/Browse/recommendation/personalization
    Make data available to end-users for reuse (mash-ups, virtual (user) collections, …)
  Back-office
    Provide data to third parties
    Inference of new knowledge/consistency checking
What type of data?
  Porting existing data
  Creating new data
    Automatic data extraction / Experts / End users
What new (semantic) links arise?
  New relations to which collection/vocabularies?
  Within a collection / With other collections / With other institutes
  How? Automatic alignment / Experts / End users?