Post on 31-Mar-2015
Lifecycle Support for Networked Ontologies
And related research in KMi
Mathieu d’Aquin and Marta Sabou
And also Enrico Motta, Martin Dzbor, Lucia Specia, Sofia Angeletou, Laurian Gridinoc and Claudio Baldassarre
Slide 2 (IST-2005-027595, NeOn-project.org)
The Semantic Web
A large scale, heterogeneous collection of formal, machine processable, ontology-based statements (semantic metadata) about web resources and other entities in the world, expressed in an XML-based syntax
[Figure: chart of the number of Semantic Web pages (#SW Pages), 2003 to 2004.]
Lee, J., Goodwin, R. (2004) The Semantic Webscape: a View of the Semantic Web. IBM Research Report.
[Figure labels: Ontology, Metadata, UoD]
<rdf:RDF><channel rdf:about=“http://watson.kmi.open.ac.uk/blog”><title>Elementaries - The Watson Blog</title><link>http://watson.kmi.open.ac.uk:8080/blog/</link><description>"Oh dear! Where the Semantic Web is going to go now?" -- imaginary user 23</description><language>en</language><copyright>Watson team</copyright><lastBuildDate>Thu, 01 Mar 2007 13:49:52 GMT</lastBuildDate><generator>Pebble (http://pebble.sourceforge.net)</generator><docs>http://backend.userland.com/rss</docs>…
<rdf:RDF> <foaf:Image rdf:about='http://static.flickr.com/132/400582453_e1e1f8602c.jpg'> <dc:title>Zen wisteria</dc:title> <dc:description></dc:description> <foaf:page rdf:resource='http://www.flickr.com/photos/xcv/400582453/'/> <foaf:topic rdf:resource='http://www.flickr.com/photos/tags/vittelgarden/'/> <foaf:topic rdf:resource='http://www.flickr.com/photos/tags/wisteria/'/> <dc:creator> <foaf:Person><foaf:name>Mathieu d'Aquin</foaf:name> …
<rdf:RDF> <owl:Ontology rdf:about=""> <owl:imports rdf:resource="http://usefulinc.com/ns/doap#"/> </owl:Ontology> <j.1:Organization rdf:ID="KMi"> <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >The Knoledge Media Institute of the Open University, Milton Keynes UK</rdfs:comment> </j.1:Organization> <j.1:Document rdf:ID="KMiWebSite"> …
[Ontologies shown: DOAP, FOAF, DC, RSS, TAP, WordNet, NCI, Galen, Music, …]
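To make "machine processable statements" concrete, here is a minimal sketch that reads a small RDF/XML document and lists the statements it contains. The document is a shortened variant of the Flickr/FOAF fragment above, with namespace declarations and closing tags added for illustration, and the helper names `uri` and `statements` are hypothetical. A real application would use an RDF library rather than a plain XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened, well-formed variant of the FOAF fragment (illustrative only).
DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <foaf:Image rdf:about="http://static.flickr.com/132/400582453_e1e1f8602c.jpg">
    <dc:title>Zen wisteria</dc:title>
    <foaf:topic rdf:resource="http://www.flickr.com/photos/tags/wisteria/"/>
  </foaf:Image>
</rdf:RDF>"""


def uri(tag):
    """Turn ElementTree's '{namespace}local' notation into a plain URI."""
    return tag[1:].replace("}", "") if tag.startswith("{") else tag


def statements(xml_text):
    """Yield (subject, predicate, object) for each typed node and its direct properties."""
    root = ET.fromstring(xml_text)
    for node in root:
        subject = node.get(f"{{{RDF}}}about")
        yield (subject, RDF + "type", uri(node.tag))
        for prop in node:
            # A property either points at a resource or carries a literal value.
            obj = prop.get(f"{{{RDF}}}resource") or (prop.text or "").strip()
            yield (subject, uri(prop.tag), obj)


triples = list(statements(DOC))
for s, p, o in triples:
    print(s, p, o)
```

Each printed line is one statement about the image: its type, its title and one of its topics.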
Slide 4
SW = A Conceptual Layer over the web
Slide 5
SW is Heterogeneous!
Slide 6
The NeOn Project
NeOn is not 100% dependent on the SW
– NeOn is really about developing large scale semantic applications.
However, the SW as a large-scale, heterogeneous semantic layer over the web provides a natural focus for characterizing the NeOn project.
In other words, the issues characterizing the NeOn project…
– heterogeneity,
– large-scale semantics,
– metadata and ontology dynamics,
– distributed development, etc.
…perfectly fit the emerging semantic web scenario
Slide 7
Economic vision underpinning NeOn
The vision of a knowledge-based economy supported by the availability of large scale semantic information
– Key is the ability to build open, ontology-based applications able to scale up to large quantities of data and to evolve, as heterogeneous data are dynamically generated on the (semantic) web
Ontologies become central
– Semantic web built around ontologies
– Ontologies key enablers for handling interoperability
Slide 8
Current technological limitations
No adequate infrastructure for the whole application development lifecycle of the envisaged applications
Specifically, current infrastructures are not effective
– Do not scale up
– Poor support for rapid development of large applications by reuse
• Reuse typically so expensive that people prefer to re-build from scratch
• Problem concerns the lack of both methodologies and tools/techniques
– Poor support for managing the evolution of an application
– Poor support for collaborative development
– Limitations of current user interfaces
• E.g., support for navigating several large ontologies at the same time
Software crisis all over again?
Slide 9
Ambition
Overall goals
– major integrative effort aiming at providing a radical 'leap forward' by developing the infrastructure needed to make large-scale semantic application development feasible and cost-effective
– lowering the entry barrier for organizations needing semantic solutions
– targeting robustness, scalability, multi-ontology scenarios, multi-user development, multi-lingual solutions, ...
Emphasis
– On concrete engineering solutions
– On concrete support for life-cycle activities
– On measurable improvements
Ambition on the technology level (4 yrs)
– NeOn as the standard reference infrastructure for large-scale semantic web application development
Slide 10
Key Planned Outputs
System-level contributions (methodology, architecture, toolkit)
– An open, service-centred reference architecture for managing the complete lifecycle of networked ontologies (NOs) and meta-data
– The NeOn toolkit for system development with NOs
– The NeOn methodology for system development with NOs
Contributions to foundational research
– Methods and tools for managing dynamic, evolving, possibly inconsistent and contextually grounded networked ontologies
– Methods and tools for supporting large-scale collaborative development
Also…
– Sector-level: Three innovative testbeds in two sectors
– Community-level: Creation of an active community of users and developers
Slide 11
Testbeds
Managing fishery knowledge to support automatic alert mechanisms
– United Nations Food and Agriculture Organization
E-Invoice management in the pharmaceutical sector
– AECE/PharmaInnova
Integration and management of information about pharmaceutical products
– Atos Origin
Slide 12
Partners
• KMi, the Open University
• University of Sheffield
• Universität Koblenz-Landau
• Software AG
• Universität Karlsruhe
• Ontoprise GmbH
• Institute 'Jozef Stefan'
• Institut National de Recherche en Informatique et en Automatique
• Asociación Española de Comercio Electrónico - PharmaInnova Cluster
• Universidad Politécnica de Madrid
• Atos Origin SAE
• Intelligent Software Components SA
• Consiglio Nazionale delle Ricerche
• Food and Agriculture Organization of the United Nations
Slide 13
NeOn at KMi:
Supporting and developing next generation Semantic Web applications
Slide 14
Example: Magpie
Slide 15
Example: PowerAqua
Slide 16
Next Generation Semantic Web Applications
Slide 17
Next Generation Semantic Web Applications
NG SW Application
Able to exploit the SW at large
– Dynamically retrieving the relevant semantic resources
– Combining several, heterogeneous Ontologies
– …
Need tools to efficiently access the knowledge available on the SW: a Gateway…
Slide 18
Swoogle…
Existing Semantic Web Gateway, but…
Slide 19
Limitations of Swoogle
No quality control mechanisms
– Many ontologies are duplicated
– No quality information provided
Limited Query/Search mechanisms
– Only keyword search; we need more powerful query methods (e.g., ability to pose formal queries)
Limited range of ontology ranking mechanisms
– Swoogle only uses a 'popularity-based' one
No support for relations between ontologies
– Duplication, incompatibility (contradiction), modularization, versioning, etc.
Slide 20
Slide 21
Watson: (truly) a Gateway to the SW
Slide 22
Watson Architecture
[Figure: Watson architecture, in three stages.
Collecting: crawling discovers and retrieves documents from the WWW and populates the repository of URLs.
Analyzing: parsing (Jena), validation/analysis and indexing populate the metadata and indexes.
Querying: keyword search, SPARQL query and ontology exploration query the repository, metadata and indexes.]
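The three stages named in the architecture (collecting, analyzing, querying) can be sketched as a toy in-memory pipeline. Everything here is an assumption made for illustration: the function names, the dictionary standing in for the WWW and the crude validity check are hypothetical, whereas the system described on the slide relies on a crawler, Jena and proper indexes.

```python
import re
from collections import defaultdict

# Stand-in for the WWW: URL -> document text (hypothetical data).
WEB = {
    "http://example.org/a.owl": "<rdf:RDF> Researcher AcademicStaff </rdf:RDF>",
    "http://example.org/b.rdf": "<rdf:RDF> Ham Meat SeaFood </rdf:RDF>",
    "http://example.org/c.html": "<html> not a semantic document </html>",
}


def crawl(web):
    """Collecting: discover URLs and retrieve their content."""
    return list(web.items())


def is_semantic(text):
    """Analyzing: crude validation that keeps only documents that look like RDF."""
    return "<rdf:RDF" in text


def build_index(documents):
    """Indexing: map each lower-cased term to the URLs that mention it."""
    index = defaultdict(set)
    for url, text in documents:
        body = re.sub(r"<[^>]+>", " ", text)  # drop the markup, keep the terms
        for term in re.findall(r"[A-Za-z]+", body):
            index[term.lower()].add(url)
    return index


def keyword_search(index, keyword):
    """Querying: return the documents indexed under a keyword."""
    return sorted(index.get(keyword.lower(), set()))


repository = [(url, text) for url, text in crawl(WEB) if is_semantic(text)]
index = build_index(repository)
print(keyword_search(index, "researcher"))
```

The HTML page is discarded at the validation step, so only the two RDF documents are searchable.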
Slide 23
The current content of Watson
The current demo version of Watson has collected more than 7500 (syntactically unique) semantic documents
– Could do more, but limited by our current test server…
– 2983 RDF or RDF(S), 1997 OWL, 1391 DAML, 302 RSS, 83 FOAF, 133 mixed (e.g., OWL+DAML (5) or OWL+FOAF+RSS (1))
– Lots of ontologies are in OWL Full (3x the number of OWL Lite)
– … but most of the ontologies use only a very restricted sub-part of the expressivity of OWL and DAML, e.g.,
• only 147 go beyond ALC
• role transitivity is used in only 11 ontologies
– 1304 (semantic) duplications detected (to be refined)
– About 300,000 entities extracted
– typeOf and subClassOf are the most popular relations
– Language information is rarely used but:
• English is clearly the most employed language
• Then come, in this order, de, fr, fi, pt, es, tr, nl
Slide 24
Example: selection of the complementary ontologies
Slide 25
Formal Queries and relation discovery…
Slide 26
Going Further: Knowledge Selection
[Figure: knowledge selection over the Web for a set of terms t1 … tn. Panels: "The ideal world (Web)" and "The real world (Web)". Steps shown: Ontology Selection, Ontology Modularization, Ontology Merging.]
Slide 27
Modularization: Example 2
http://www.co-ode.org/ontologies/test/breaksmetrics.owl
…
Slide 28
Modularization: Example 2
Resulting module
Cancer
Lung
AdenoCarcinoma
Slide 29
Implementation
Slide 30
Implementation
Integration with ontology selection
Slide 31
Ontology Matching
– Label similarity methods
• e.g., Full_Professor = FullProfessor
– Structure similarity methods
• Using taxonomic/property related information
[Figure: candidate mappings between two ontologies, annotated with confidence values between 0.5 and 1.]
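A label similarity method of the kind mentioned on the slide can be sketched as follows. This is only one possible reading, assuming that labels are normalized (case and separators dropped) and then compared with the standard library's `difflib.SequenceMatcher`; the function names and the 0.9 threshold are hypothetical choices, not taken from the talk.

```python
import re
from difflib import SequenceMatcher


def normalize(label):
    """Lower-case a label and drop separators, so Full_Professor equals FullProfessor."""
    return re.sub(r"[\s_\-]+", "", label).lower()


def label_similarity(a, b):
    """Return a score in [0, 1]; 1.0 means the normalized labels are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match(labels_a, labels_b, threshold=0.9):
    """Pair each label of ontology A with its best-scoring label of ontology B."""
    mappings = []
    for a in labels_a:
        best = max(labels_b, key=lambda b: label_similarity(a, b))
        score = label_similarity(a, best)
        if score >= threshold:  # keep only confident correspondences
            mappings.append((a, best, round(score, 2)))
    return mappings


print(match(["Full_Professor", "Student"], ["FullProfessor", "Lecturer"]))
```

Only the Full_Professor / FullProfessor pair passes the threshold. Purely lexical methods like this one cannot relate labels that share no characters, which is the gap that structure-based and background-knowledge methods try to fill.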
Slide 32
New paradigm: use of background knowledge
[Figure: concepts A and B are anchored to A' and B' in the background knowledge (external source); the relation R found between A' and B' is used as the relation R between A and B.]
Slide 33
Where does the background knowledge come from?
Aleksovski et al. EKAW'06
• A richly axiomatized domain ontology
• Assumes that a suitable domain ontology is available.
van Hage et al. ISWC'05
• Google and an online dictionary in the food domain
• Noise introduced by the use of IR techniques on a Web corpus
[Figure: A and B linked by a relation (rel); labels: "+ Online Dictionary", "IR Methods".]
Slide 34
Our Approach: Using the SW as background knowledge
• rely on online ontologies (Semantic Web) to derive mappings
• ontologies are dynamically discovered and combined
• Exploit the Semantic Web: next generation Semantic Web application
• Does not rely on any pre-selected knowledge sources.
[Figure: A and B linked by a relation (rel) derived from the Semantic Web.]
Slide 35
Examples
Both concepts are found in one ontology
[Figure: Researcher (ISWC) and AcademicStaff (SWRC) are both found in ka2.rdf on the Semantic Web, which gives Researcher ⊆ AcademicStaff.]
Concepts are related across several ontologies
[Figure: Ham (Agrovoc) and SeaFood (NALT) are related through Meat, using the online ontologies pizza-to-go and wine.owl; the relations shown are ⊆ and ⊥.]
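The two cases in these examples (a relation found directly, and a relation composed from several sources) can be sketched with a few hand-written axioms. The axiom sets below are assumptions chosen to mirror the slide (Researcher ⊆ AcademicStaff, Ham ⊆ Meat, Meat ⊥ SeaFood), and the function names are hypothetical. In the approach described, such axioms would be discovered dynamically in online ontologies rather than listed by hand.

```python
# Hypothetical axioms, standing in for what online ontologies would provide.
SUBCLASS = {("Researcher", "AcademicStaff"), ("Ham", "Meat")}
DISJOINT = {("Meat", "SeaFood")}


def superclasses(concept):
    """Return the concept plus everything reachable through subclass axioms."""
    seen, todo = {concept}, [concept]
    while todo:
        current = todo.pop()
        for sub, sup in SUBCLASS:
            if sub == current and sup not in seen:
                seen.add(sup)
                todo.append(sup)
    return seen


def derive_relation(a, b):
    """Derive a mapping between a and b from the background axioms, if there is one."""
    if a != b and b in superclasses(a):
        return "subclass-of"
    if a != b and a in superclasses(b):
        return "superclass-of"
    # Disjointness is inherited: if any ancestors are disjoint, so are a and b.
    for x in superclasses(a):
        for y in superclasses(b):
            if (x, y) in DISJOINT or (y, x) in DISJOINT:
                return "disjoint"
    return None


print(derive_relation("Researcher", "AcademicStaff"))
print(derive_relation("Ham", "SeaFood"))
```

The second call combines two axioms that could come from different ontologies: Ham is a kind of Meat, and Meat is disjoint with SeaFood, so Ham is disjoint with SeaFood.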
Slide 36
Large Scale Evaluation
Matching AGROVOC (16k terms) and NALT (41k terms)
(derived from 180 different ontologies)
Evaluation: 1600 mappings, two teams
Average precision: 70% (comparable/better than standard)
Slide 37
Back to the Web: Folksonomies
Tags are popular, easy to use annotations
But they are not structured…
No computable semantics…
Slide 38
Finding tagged images
[Figure: images with their tags: {Flower, Rose}, {Lilac}, {Lilac, Flower}, {Tulip, Flowers, CutFlower}, {Tulip}.]
Slide 39
Finding tagged images – FLOWER
[Figure: images with their tags: {Flower, Rose}, {Lilac}, {Lilac, Flower}, {Tulip, Flowers, CutFlower}, {Tulip}.]
Slide 40
What if …
…folksonomies were semantically richer
[Figure: Rose, Tulip and Lilac shown as kinds of Flower.]
Slide 41
Finding tagged images – FLOWER (II)
[Figure: images with their tags: {Flower, Rose}, {Lilac}, {Lilac, Flower}, {Tulip, Flowers, CutFlower}, {Tulip}, next to the hierarchy in which Rose, Tulip and Lilac are kinds of Flower.]
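The difference between the two FLOWER searches can be sketched in a few lines. The image identifiers and the `BROADER` table are hypothetical; the tag sets loosely follow the figure, and the subclass knowledge (Rose, Tulip and Lilac are kinds of Flower) is assumed to come from an ontology.

```python
# Hypothetical tagged images, loosely following the tags in the figure.
IMAGES = {
    "img1": {"flower", "rose"},
    "img2": {"lilac"},
    "img3": {"lilac", "flower"},
    "img4": {"tulip", "flowers", "cutflower"},
    "img5": {"tulip"},
}

# Assumed subclass knowledge: each tag maps to its broader tag.
BROADER = {"rose": "flower", "tulip": "flower", "lilac": "flower"}


def plain_search(tag):
    """Return only the images that carry exactly the requested tag."""
    return sorted(name for name, tags in IMAGES.items() if tag in tags)


def semantic_search(tag):
    """Also return images tagged with a narrower concept of the requested tag."""
    wanted = {tag} | {narrow for narrow, broad in BROADER.items() if broad == tag}
    return sorted(name for name, tags in IMAGES.items() if tags & wanted)


print(plain_search("flower"))
print(semantic_search("flower"))
```

The plain search misses the images tagged only with lilac or tulip, while the semantically enriched search returns all five.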
Slide 42
Learning Relations Between Tags
[Figure: pipeline from tags to ontologies.
Tags, then NLP/Clustering, give Clusters, e.g., {camera, digital slr, photograph} and {damage, flooding, hurricane, katrina, Louisiana}.
Clusters, then find and combine online ontologies (+ modularization, + matching), give Ontologies, e.g., relating Digital SLR, camera and photograph, with the relation takenWith.]
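The clustering step can be approximated with a very simple stand-in: group tags that co-occur on the same resource, and take the connected components of that co-occurrence graph as clusters. This is a deliberately simpler technique than the NLP/clustering step named on the slide, and the `POSTS` data and function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical tag sets, one per tagged resource.
POSTS = [
    {"camera", "digital slr"},
    {"camera", "photograph"},
    {"hurricane", "katrina", "flooding"},
    {"damage", "flooding", "louisiana"},
]


def cluster_tags(posts):
    """Group tags into connected components of the tag co-occurrence graph."""
    neighbours = defaultdict(set)
    for tags in posts:
        for tag in tags:
            neighbours[tag]  # make sure tags that appear alone get an entry
        for a, b in combinations(tags, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)

    clusters, seen = [], set()
    for tag in sorted(neighbours):
        if tag in seen:
            continue
        component, todo = set(), [tag]
        while todo:  # walk the graph outward from the starting tag
            current = todo.pop()
            if current in component:
                continue
            component.add(current)
            todo.extend(neighbours[current] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


for cluster in cluster_tags(POSTS):
    print(cluster)
```

Each resulting cluster would then be the input to the second step of the pipeline, where online ontologies are searched for relations between its tags.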
Slide 43
Examples
Slide 44
Examples
Slide 45
Examples
Slide 46
Read more…
NeOn
http://www.neon-project.org/web
Next Generation Semantic Web Applications
E. Motta and M. Sabou. Next Generation Semantic Web Applications. AWC 2006.
E. Motta and M. Sabou. Language Technologies and the Evolution of the Semantic Web. LREC 2006.
E. Motta. Knowledge Publishing and Access on the Semantic Web: A Socio-Technological Analysis. IEEE Intelligent Systems, Vol. 21, 3, (88-90).
Watson
M. d'Aquin, M. Sabou, M. Dzbor, C. Baldassarre, L. Gridinoc, S. Angeletou, and E. Motta. WATSON: A Gateway for the Semantic Web. Accepted for the poster session of ESWC 2007.
http://watson.kmi.open.ac.uk/
http://watson.kmi.open.ac.uk/blog
Ontology Modularization
M. d'Aquin, M. Sabou, and E. Motta. Modularization: a Key for the Dynamic Selection of Relevant Knowledge Components. ISWC 2006 workshop on Modular Ontologies (WoMO 2006).
Ontology Matching
M. Sabou, M. d'Aquin and E. Motta. Using the Semantic Web as Background Knowledge in Ontology Mapping. ISWC 2006 workshop on Ontology Mapping (OM 2006).
Linking folksonomies to ontologies
L. Specia and E. Motta. Integrating Folksonomies with the Semantic Web. Accepted for ESWC 2007.
Slide 47
Thank you!