7/16/09
Challenge the future Delft University of Technology
Natural Language and the Semantic Web: a crucial symbiosis
Philipp Cimiano
Web Information Systems, TU Delft, The Netherlands
2 Semantic Web Summer School (SWSS09), Cercedilla
Aims and Not-Aims
• Aims
  • Overview
  • Raise Questions
  • Entertain and Encourage
• Not-Aims
  • Present my own work (only a bit ;-)
  • Present solutions or answers
Structure
• The relation between ontologies and natural language
• Applications at the ontology-language interface
• Principled approaches to the language-ontology interface
• The LexInfo model
• Conclusion
Symbiosis
• The term symbiosis commonly describes close and often long-term interactions between different biological species.
Different types of symbiotic relations
• Mutualism is a biological interaction between two organisms, where each individual derives a fitness benefit, for example increased survivorship.
• Commensalism is a class of relationship between two organisms where one organism benefits but the other is unaffected.
• Parasitism is a type of symbiotic relationship between two different organisms where one organism, the parasite, benefits at the expense of the host, sometimes for a prolonged time.
What type of relation exists between ontologies (as building blocks of the Semantic Web) and natural language?
Fitness Benefit
• What fitness benefit do ontologies derive from natural language?
Symbol Grounding
• Symbol Grounding Problem (Harnad 1990): it is very difficult (if not impossible) to express the meaning of a symbol within the system itself. We need anchoring to some external system.
• In the case of ontologies, this external system is language.
• We define local names as part of URIs: http://www.example.org#car
• We specify labels of these URIs: rdfs:label(http://www.example.org#car, 'car')
• We add natural language definitions of the classes and properties we define (e.g. using rdfs:comment): “A car is a wheeled motor vehicle used for transporting passengers, which also carries its own engine or motor.”
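The three anchoring steps can be sketched with plain (subject, predicate, object) triples. This is a minimal illustration using only Python tuples; the helper names `local_name` and `describe` are hypothetical and do not belong to any RDF library.

```python
# Standard RDF Schema namespace; the example.org URI is the one used on the slide.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
CAR = "http://www.example.org#car"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (CAR, RDFS + "label", "car"),
    (CAR, RDFS + "comment",
     "A car is a wheeled motor vehicle used for transporting passengers, "
     "which also carries its own engine or motor."),
]


def local_name(uri: str) -> str:
    """Return the part of a URI after '#', i.e. its human-readable local name."""
    return uri.rsplit("#", 1)[-1]


def describe(uri: str) -> dict:
    """Collect the natural language anchors (local name, label, comment) of a URI."""
    anchors = {"local_name": local_name(uri)}
    for subject, predicate, obj in triples:
        if subject == uri and predicate.startswith(RDFS):
            anchors[local_name(predicate)] = obj
    return anchors


grounding = describe(CAR)
```

All three anchors (local name, label, comment) are natural language strings, which is exactly the point of the slide: the symbol is grounded for humans only through language.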
Further benefits
• Ontologies benefit from language:
  • Grounding of meaning for humans
  • Population of ontologies from textual data (massively available)
  • Language-based interaction with knowledge (e.g. querying by way of natural language)
  • Reading documents describing how humans perceive the world to support ontology engineering (consulting domain-specific literature is an important step in most knowledge engineering methodologies)
A commensal or parasitic relation?
• So is the relation commensal or even parasitic in the sense that ontologies need language (to ground the meaning of symbols) but language does not profit from ontologies?
NLP benefits from ontologies
• In formal semantics it is assumed that meaning can be captured by a logical formalism (typically FOL) which supports reasoning and drawing of inferences (humans clearly do so).
• The meaning of the sentence: “Vincent is married to Mia” is:
marriedTo(vincent,mia)
• But what do these symbols mean in the logical system in terms of what conclusions we can draw? (Is marriedTo symmetric? Timeless?)
• What are the legal symbols that we can use? (a question of ontology)
There are a number of ways in which meaning can be represented
e.g. “Vincent is married to Mia.”
marriedTo(vincent,mia)
∃x marriage(x)∧ partner(x,vincent)∧ partner(x,mia)
∃x marriage(x)∧ partner(x,vincent)∧ partner(x,mia)∧holdsDuring(x,interval) ∧ overlap(interval,now)
Word Sense Disambiguation (WSD)
• It is well known that words have different senses (at least 10 according to WordNet!)
• There is no limit to the senses that we can consider (very fine-grained)
Named Entity Recognition
• Named entity recognition recognizes entities of a certain type in textual data:
<painter>Rembrandt Harmenszoon van Rijn </painter> was born on <date> July 15, 1606 </date> in <city> Leiden </city>, <country> the Netherlands </country>. He was the ninth child born to <person> Harmen Gerritszoon van Rijn </person> and <person> Neeltgen Willemsdochter van Zuytbrouck </person>.
• Arbitrary number of possible types and granularity (tag people as person or according to their profession etc.)
Semantic Normalization
• The Liffey flows through Dublin. => flowsThrough(Liffey,Dublin)
• Dublin lies at the Liffey. => lies_at(Dublin,Liffey)
• Dublin is located at the Liffey. => located_at(Dublin,Liffey)
• The Liffey passes Dublin. => passes(Liffey,Dublin)
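Normalization means mapping all of these surface variants to one ontological representation. A minimal sketch, assuming a hand-written table (hypothetical, including the choice of `flowsThrough` as the canonical predicate and the decision to treat `passes` as equivalent) that records the target predicate and whether the arguments must be swapped:

```python
# Hypothetical normalization table:
# surface predicate -> (canonical predicate, swap the two arguments?)
CANONICAL = {
    "flowsThrough": ("flowsThrough", False),
    "passes": ("flowsThrough", False),
    "lies_at": ("flowsThrough", True),
    "located_at": ("flowsThrough", True),
}


def normalize(predicate, arg1, arg2):
    """Map a surface-level predication to its canonical ontological form."""
    canonical, swap = CANONICAL[predicate]
    return (canonical, arg2, arg1) if swap else (canonical, arg1, arg2)


variants = [
    ("flowsThrough", "Liffey", "Dublin"),
    ("lies_at", "Dublin", "Liffey"),
    ("located_at", "Dublin", "Liffey"),
    ("passes", "Liffey", "Dublin"),
]

# All four variants collapse into a single normalized fact.
normalized = {normalize(*variant) for variant in variants}
```

In a real system this table would come from the ontology and its lexicon rather than being written by hand.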
Ontologies are crucial for the analysis of natural language
• Ontologies define and axiomatize a vocabulary.
• Define the meaning of symbols so that we can reason with them (e.g. marriedTo is symmetric, bound to a certain time interval)
• Define the granularity for WSD and NER and other tasks.
• Normalization
• Help to constrain the task of interpreting language for a specific purpose, domain, application etc.
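As an illustration of how an axiom licenses inferences, the following sketch declares marriedTo as symmetric and derives the inverse fact. The `AXIOMS` table and the `infer` function are hypothetical stand-ins for an ontology and a reasoner, not an actual OWL implementation.

```python
# Hypothetical mini-ontology: property name -> set of declared characteristics.
AXIOMS = {"marriedTo": {"symmetric"}}


def infer(facts):
    """Return the facts plus everything entailed by the symmetry axioms."""
    closed = set(facts)
    for predicate, a, b in facts:
        if "symmetric" in AXIOMS.get(predicate, set()):
            closed.add((predicate, b, a))
    return closed


facts = {
    ("marriedTo", "vincent", "mia"),
    ("loves", "vincent", "mia"),  # no axiom declared, so nothing is inferred
}
entailed = infer(facts)
```

Without the axiom, the symbol marriedTo would be just as uninterpreted as loves is in this toy example.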
Applications at the interface between language and ontologies
• Information Extraction / Ontology Population
• Ontology-based Question Answering
• Ontology Engineering
• Ontology Verbalization
Scenario
• Assume we have an ontology about artists modeling:
  • Name
  • Birth and death dates
  • Birth and death places
  • Marriages, children
  • Paintings with their creation date
  • Influences by other artists
  • Etc.
There are many artists, so it is hard to add all relevant instances manually. Textual data is massively available, so what about extracting information from textual data to populate the ontology automatically? This process is typically referred to as ontology population (as opposed to ontology learning, which tries to learn the actual schema).
1. Ontology Population/Information Extraction
• “Claude Monet was born on 14 November 1840 on the fifth floor of 45 rue Laffitte, in the ninth arrondissement of Paris.”
-> birthplace(Claude_Monet,Paris)
-> birthdate(Claude_Monet,14.11.1840)
• “Monet lived from December 1871 to 1878 at Argenteuil, a village on the Seine near Paris.”
-> type(stay_Monet_Argenteuil,Stay)
-> artist(stay_Monet_Argenteuil,Claude_Monet)
-> place(stay_Monet_Argenteuil,Argenteuil)
-> during(stay_Monet_Argenteuil,interval_1871_1878)
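A very simple way to obtain the first two facts is a hand-written extraction pattern. The sketch below is only an illustration under that assumption: the pattern `BORN` and the function `extract_birth` are hypothetical, cover a single phrasing, and stand in for the learned extractors that real ontology population systems use.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hypothetical pattern for one variant: "<Name> was born on <date> ... of <Place>."
BORN = re.compile(
    r"(?P<name>[A-Z]\w+(?: [A-Z]\w+)*) was born on "
    r"(?P<day>\d{1,2}) (?P<month>" + "|".join(MONTHS) + r") (?P<year>\d{4})"
    r".* of (?P<place>[A-Z]\w+)\."
)


def extract_birth(sentence):
    """Return birthplace and birthdate facts, or an empty list if nothing matches."""
    match = BORN.search(sentence)
    if match is None:
        return []
    person = match.group("name").replace(" ", "_")
    date = "{:02d}.{:02d}.{}".format(
        int(match.group("day")),
        MONTHS.index(match.group("month")) + 1,
        match.group("year"),
    )
    return [("birthplace", person, match.group("place")),
            ("birthdate", person, date)]


facts = extract_birth(
    "Claude Monet was born on 14 November 1840 on the fifth floor of "
    "45 rue Laffitte, in the ninth arrondissement of Paris."
)
```

The second example sentence about Argenteuil is not covered by this pattern at all, which shows why capturing the many variants of expression is a central challenge.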
Challenges for Ontology Population
• Normalization (different variants map to the same ontological representation)
• Capture different variants (learn from examples using machine learning techniques, sparseness)
• Ontology-sensitive processing
  • Granularity of word senses that need to be distinguished
  • Use ontology for disambiguation
  • Ignore constituents which are not relevant to the ontology
2. Question Answering
• Ontologies model relevant world knowledge in a certain domain.
• As humans, we are interested in accessing this knowledge, preferably in an intuitive way (e.g. by means of natural language)
• Many systems have been designed in the past to meet this need:
  • AquaLog [Lopez and Motta 2004]
  • ORAKEL [Cimiano et al. 2008]
  • GINO [Bernstein and Kaufmann 2006]
  • And many more…
• E.g. Who is a professor at the Knowledge Media Institute?
  • Prof. Enrico Motta
  • Prof. John Domingue
  • Prof. Stefan Rueger
Ontology-based Question Answering (e.g. AquaLog [Lopez and Motta 2004])
Question: Who is a Professor at the Knowledge Media Institute?
Linguistic triples: (Who,is,professor) (professor,at,Knowledge Media Institute)
Mapping via the RSS (Relation Similarity Service)
Ontology triples: <typeOf ?x Professor-in-Academia> & <works-in-unit ?x KMi>
Ontology-based Question Answering
Who is PC member of the ISWC conference?
(Who,is,PC Member) (PC Member,of,ISWC Conference)
(?x,PCMemberOf,ISWC)
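The triple-based idea can be sketched as a two-step lookup: the question arrives as linguistic triples, and each one is rewritten into ontology vocabulary. The dictionaries below are hypothetical and replace the similarity-based matching a system like AquaLog performs; this is not AquaLog's actual algorithm.

```python
# Hypothetical lexicons linking surface terms to ontology vocabulary.
CLASS_LEXICON = {"professor": "Professor-in-Academia"}
INSTANCE_LEXICON = {"knowledge media institute": "KMi"}
RELATION_LEXICON = {("professor", "at"): "works-in-unit"}


def to_ontology_triples(linguistic_triples):
    """Rewrite (term, relation, term) triples into ontology-level triples."""
    ontology_triples = []
    for subject, relation, obj in linguistic_triples:
        if subject.lower() == "who" and relation == "is":
            # "Who is X" asks for instances of the class denoted by X.
            ontology_triples.append(("typeOf", "?x", CLASS_LEXICON[obj.lower()]))
        else:
            prop = RELATION_LEXICON[(subject.lower(), relation)]
            ontology_triples.append((prop, "?x", INSTANCE_LEXICON[obj.lower()]))
    return ontology_triples


question = [("Who", "is", "professor"),
            ("professor", "at", "Knowledge Media Institute")]
result = to_ontology_triples(question)
```

Note that nothing in this representation records tense ("was") or quantification ("all"), a limitation that matters for the more fine-grained questions discussed later.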
3. Language in Ontology Engineering
Ontology languages (OWL and RDFS) are hard to grasp, both semantically and syntactically.
• RDF/XML and OWL/XML syntaxes are hard for humans to read
• OWL Abstract Syntax is only for experts (logicians)
• Manchester syntax is more intuitive, but not for “casual users”:
  Pizza AND NOT (hasTopping SOME FishTopping) AND NOT (hasTopping SOME MeatTopping)
The idea has been to allow people to model ontological knowledge using natural language.
Most approaches along these lines rely on “controlled natural language”.
What is controlled natural language?
• Controlled natural languages (CNLs) are subsets of natural languages, obtained by restricting the grammar and vocabulary in order to reduce or eliminate ambiguity and complexity.
• Reducing ambiguity: “Every man loves a woman.”
• Reading 1: “Every man loves a woman who is specific to him.” ⇒ Use “Every man loves a woman.”
• Reading 2: “All men love the same woman.” ⇒ Use “There is a woman that every man loves.”
Controlled language is prescriptive in this sense and people have to learn what expressions to use to express a certain state of affairs.
4. Ontology Verbalization
• Helping in ontology engineering by:
  • Verbalizing the ontology
  • Allowing users to create axioms in natural language
• Different approaches:
  • ACE [Kaljurand et al. 2008]
  • Sydney Syntax [Cregan et al. 2008]
Verbalization in ACE and Sydney Syntax
• Design Choices:
  • Bijective mapping between controlled language and axiomatic representation (allows “roundtripping”, see [Davis et al. 2008])
  • Functional: one unique way of verbalizing something
  • Do not use constructs mirroring the OWL syntax, use “natural English”
    • “car is a subclass of vehicle” => “Every car is a vehicle.”
    • “Man and woman are disjoint classes” => “There is no man that is also a woman.”
• Use variables to express more complex axioms:
  • “If X is married to Y then Y is also married to X.”
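The design choices above (functional, natural English instead of OWL keywords) can be illustrated with a template-based verbalizer. This is a sketch under simplifying assumptions: the axiom tuples and templates are hypothetical and far simpler than the actual ACE or Sydney Syntax grammars.

```python
def article(noun):
    """Pick the indefinite article by a simple first-letter heuristic."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def verbalize(axiom):
    """Produce exactly one English sentence per axiom (functional mapping)."""
    kind = axiom[0]
    if kind == "SubClassOf":
        _, sub, sup = axiom
        return f"Every {sub} is {article(sup)} {sup}."
    if kind == "DisjointClasses":
        _, first, second = axiom
        return f"There is no {first} that is also {article(second)} {second}."
    if kind == "SymmetricProperty":
        _, phrase = axiom
        return f"If X is {phrase} Y then Y is also {phrase} X."
    raise ValueError(f"no template for axiom type {kind!r}")


sentences = [
    verbalize(("SubClassOf", "car", "vehicle")),
    verbalize(("DisjointClasses", "man", "woman")),
    verbalize(("SymmetricProperty", "married to")),
]
```

Because each axiom type has exactly one template, the mapping can in principle be inverted, which is what makes roundtripping possible.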
OWL Verbalization
owl:IntersectionOf(
  cat
  owl:ComplementOf(
    owl:SomeValuesFrom(like
      owl:IntersectionOf(
        dog
        owl:UnionOf(
          owl:SomeValuesFrom(attack mailman)
          owl:OneOf(Fido))))))
Verbalized as “something that is a cat and that does not like a dog that attacks a mailman or that is Fido”
Opaqueness
• Make the various SW formalisms opaque to the user (OWL, SWRL, SPARQL)
• Every employee that does not own a car owns a bike.
• Every man that owns a car likes that car.
• Who owns a car?
employee ∩ ¬(∃ own.car) ⊆ ∃ own.bike (OWL)
man(?x) ∧ own(?x,?y) ∧ car(?y) → like(?x,?y) (SWRL)
SELECT ?x WHERE { ?x owns ?y . ?y rdf:type car } (SPARQL)
Limits of ACE -> OWL
• Mathematical (e.g. transitive properties):
  • “If something A is taller than something B and B is taller than something C then A is taller than C.”
• Quite complex (disjoint union):
  • “No male is a female. No female is a male. Every person is a male or is a female. Everything that is a male or that is a female is a person.”
• Much easier than Protégé?
5. Text Generation from Ontologies [Bontcheva 2005]
<rdf:Description rdf:about="http://www.aifb.uni-karlsruhe.de/Personen/viewPersonOWL#instance?id_db=20">
  <rdf:type>
    <owl:Class rdf:about="&swrc;AssistantProfessor"/>
  </rdf:type>
  <swrc:name rdf:datatype="xsd:string">York Sure</swrc:name>
  <swrc:phone rdf:datatype="xsd:string">+49 (0) 721 608 6592</swrc:phone>
  <swrc:fax rdf:datatype="xsd:string">+49 (0) 721 608 6580</swrc:fax>
  <swrc:homepage rdf:datatype="xsd:string">http://www.aifb.uni-karlsruhe.de/WBS/ysu</swrc:homepage>
</rdf:Description>

York Sure has a telephone number +49 (0) 721 608 6592, a fax number +49 (0) 721 608 6580, and a web page http://www.aifb.uni-karlsruhe.de/WBS/ysu.
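The step from the RDF description to the sentence can be approximated with a simple template. The sketch below is an assumption-laden illustration, not the method of [Bontcheva 2005]: the property-to-phrase table `PHRASES` and the aggregation helper `join_list` are hypothetical.

```python
# Property values taken from the RDF description on the slide.
person = {
    "name": "York Sure",
    "phone": "+49 (0) 721 608 6592",
    "fax": "+49 (0) 721 608 6580",
    "homepage": "http://www.aifb.uni-karlsruhe.de/WBS/ysu",
}

# Hypothetical mapping from property names to noun phrases.
PHRASES = {
    "phone": "a telephone number",
    "fax": "a fax number",
    "homepage": "a web page",
}


def join_list(items):
    """Aggregate phrases into an English enumeration: 'a, b, and c'."""
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def generate(entity):
    """Aggregate all known properties of an entity into one sentence."""
    parts = [f"{PHRASES[prop]} {entity[prop]}" for prop in PHRASES if prop in entity]
    return f"{entity['name']} has {join_list(parts)}."


summary = generate(person)
```

Aggregating the three properties into one sentence, instead of emitting one sentence per triple, is what makes the output read like a summary.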
Different parts of the puzzle
• Ontology-based Question Answering
• Ontology Generation
• Ontology Verbalization
• Ontology Population
Reuse of NLP approaches
• To make the landscape more homogeneous, building common resources, infrastructures, grammars etc. would be crucial.
• This should be achieved by reusing state-of-the-art and mature technologies from the NLP community, in particular:
  • Dependency parsing
  • Compositional semantics
Many state-of-the-art parsers (dependency parsers)
• There are many state-of-the-art dependency parsers that we could use:
  • Stanford Parser (http://nlp.stanford.edu/software/lex-parser.shtml), Manning et al.
  • RASP (http://www.informatics.susx.ac.uk/research/groups/nlp/rasp/), Sussex, by T. Briscoe and J. Carroll
  • Malt (http://maltparser.org/) by Joakim Nivre, Sweden
• In addition, there are large-scale grammars available:
  • XLE LFG Grammar Engineering Environment by PARC (used in Powerset and recently in Bing)
  • LinGO English Resource Grammar (ERG)
• …
Who is professor at the Knowledge Media Institute?
Dependency analysis:
attr(is, who)
nsubj(is, professor)
prep_at(professor, Knowledge Media Institute)
det(Knowledge Media Institute, the)
Problems of “shallow” triple-based approaches
• Triple-based approach too simplistic to account for fine-grained meaning variations:
Who was a professor at the Knowledge Media Institute?
Who has been PC member of all ISWC conferences?
• The distinctions are often lost when mapping into triples, and it is very hard to reconstruct them (there is nothing similar to “was” and “all” in the ontology). So even if we carry the information along, similarity-based approaches cannot use it properly!
Problematic examples for triple-based approaches
Who was PC member of all ISWC conferences?
(Who,was_a,PC Member) (PC Member,of,ISWC Conference)
(?x,PCMemberOf,ISWC)
?x forall y (y rdf:type Conference; y hasAcronym “ISWC”) -> x pcMemberOf y
Small but important variations in meaning which escape triple-based approaches
Compositional semantics
• Principle of compositional semantics: The meaning of a question (sentence) is determined by the meaning of its parts and the way they are composed together.
Important: takes into account the contribution of every single word in terms of the overall meaning of the sentence, guided by the dependency analysis.
Vincent loves Mia. => loves(vincent,mia)
Dependency analysis: subj(loves, vincent), obj(loves, mia)
Semantics of “loves”: λx λy loves(x,y), applied to the arguments Vincent and Mia
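The composition step can be sketched with ordinary Python closures standing in for lambda terms. The toy lexicon and the `compose` function are hypothetical; a real implementation would work with typed lambda terms and a full dependency parse.

```python
# Hypothetical toy lexicon: each word is paired with its meaning.
# The verb is a curried function, mirroring λx λy loves(x,y).
LEXICON = {
    "loves": lambda x: lambda y: ("loves", x, y),
    "Vincent": "vincent",
    "Mia": "mia",
}


def compose(verb, dependencies):
    """Apply the verb meaning to its subject and then to its object,
    as dictated by the dependency analysis."""
    meaning = LEXICON[verb]
    meaning = meaning(LEXICON[dependencies["subj"]])
    return meaning(LEXICON[dependencies["obj"]])


result = compose("loves", {"subj": "Vincent", "obj": "Mia"})
```

The dependency labels decide which argument fills which slot, so swapping subj and obj yields a different meaning even though the same words are involved.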
Who was a PC Member of the ISWC conference?
Dependency analysis:
attr(was, who)
nsubj(was, PC member)
prep_of(PC member, conference)
det(conference, the)
nn(conference, ISWC)
Meaning:
?x person(x) ∧ ∃y (PCMemberOf(x,y) ∧ Conference(y) ∧ hasAcronym(y,"ISWC"))
Who was a PC Member of all ISWC conferences?
Dependency analysis:
attr(was, who)
nsubj(was, PC member)
prep_of(PC member, conferences)
det(conferences, all)
nn(conferences, ISWC)
Meaning:
?x person(x) ∧ ∀y ((Conference(y) ∧ hasAcronym(y,"ISWC")) → PCMemberOf(x,y))
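The difference between the existential and the universal reading becomes concrete when both formulas are evaluated against a small knowledge base. All data below (persons, conference instances, memberships) is invented purely for illustration.

```python
# Hypothetical toy knowledge base.
PERSONS = {"alice", "bob", "carol"}
CONFERENCE_ACRONYM = {"iswc2007": "ISWC", "iswc2008": "ISWC", "eswc2008": "ESWC"}
PC_MEMBER_OF = {
    ("alice", "iswc2007"), ("alice", "iswc2008"),
    ("bob", "iswc2008"),
    ("carol", "eswc2008"),
}


def conferences_with(acronym):
    """All conference instances carrying the given acronym."""
    return [c for c, a in CONFERENCE_ACRONYM.items() if a == acronym]


def pc_member_of_some(acronym):
    """?x person(x) ∧ ∃y (PCMemberOf(x,y) ∧ Conference(y) ∧ hasAcronym(y,acronym))"""
    return {x for x in PERSONS
            if any((x, y) in PC_MEMBER_OF for y in conferences_with(acronym))}


def pc_member_of_all(acronym):
    """?x person(x) ∧ ∀y ((Conference(y) ∧ hasAcronym(y,acronym)) → PCMemberOf(x,y))"""
    return {x for x in PERSONS
            if all((x, y) in PC_MEMBER_OF for y in conferences_with(acronym))}


some_iswc = pc_member_of_some("ISWC")
all_iswc = pc_member_of_all("ISWC")
```

A flat triple such as (?x,PCMemberOf,ISWC) cannot distinguish these two result sets. Note also that the universal reading is vacuously true for every person when no conference with the given acronym exists.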
Compositional Semantics Approach
• Elegant and principled approach to compute the meaning of sentences (w.r.t. the given ontology)
• Together with dependency parsing, it has the potential to provide a common basis for all those applications mapping language to ontology and the other way round:
  • Ontology Population
  • Ontology-based Question Answering
  • Generation
• Powerful approach, but not always trivial to implement (research challenge!)
Language-ontology interfaces need lexical semantics
• The lexical semantics of nouns, adjectives, verbs etc. has to be specified with respect to the domain ontology for all applications at the ontology-language interface.
• The meaning of a question, sentence etc. w.r.t. the ontology can then be calculated on the basis of the lexical semantics of the single words according to the principle of compositional semantics.
• Suboptimal solution: every instantiation of an application at the language-ontology interface (population, verbalization, generation) defines the mapping between language and the ontology from scratch.
• Optimal solution: we make the meaning of nouns, adjectives, verbs etc. explicit and publish them declaratively (as an ontology)
• This is the goal we have pursued when developing the LexInfo model.
Bridging the gap: ontology lexicons
• “Ontology lexicons” provide information about linguistic realization of concepts, properties, instances etc. (clearly separating but linking both levels) in a declarative fashion
• These ontology lexicons are not proprietary to any system and can be reused.
• While the ontology “talks” about concepts, properties, instances and other axioms, the ontology lexicon “talks” about “lexical elements”, together with information about part-of-speech, morphological (de-) composition, syntactic behaviour etc.
Separation between Linguistic and Ontological Level
• Separation also allows us to develop and maintain the lexicons independently of the ontology.
• This means that several different lexica for each ontology can perfectly well co-exist (why not?)
• In RDF and SKOS, this is not possible.
• Our solution (sketch):
  Ontology lexicon (meta-ontology): river (lexical entry)
    POS: noun
    lemma: “river”
    plural: “rivers”
    refersTo: river (concept)
  Ontology: river (concept)
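The sketch can be mirrored in a small data structure that keeps lexical entries apart from the ontology and links them only through a reference. The field names, the example URI and the German lexicon are hypothetical illustrations, not the actual LexInfo vocabulary.

```python
from dataclasses import dataclass

# Hypothetical URI of the concept "river" in the domain ontology.
RIVER = "http://www.example.org#river"


@dataclass(frozen=True)
class LexicalEntry:
    lemma: str       # canonical written form
    pos: str         # part of speech
    forms: tuple     # (feature, written form) pairs, e.g. ("plural", "rivers")
    refers_to: str   # URI of the ontology element this entry verbalizes


# Two independent lexica for the same ontology can co-exist.
ENGLISH_LEXICON = [LexicalEntry("river", "noun", (("plural", "rivers"),), RIVER)]
GERMAN_LEXICON = [LexicalEntry("Fluss", "noun", (("plural", "Flüsse"),), RIVER)]


def concepts_for(word, lexicon):
    """Return the ontology elements that a written form can refer to."""
    return {entry.refers_to for entry in lexicon
            if word == entry.lemma or word in dict(entry.forms).values()}


english_hit = concepts_for("rivers", ENGLISH_LEXICON)
german_hit = concepts_for("Flüsse", GERMAN_LEXICON)
```

The ontology itself never changes when a lexicon is added or edited; only the `refers_to` links connect the two levels.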
LexInfo: LexicalEntry
• The top level distinguishes different parts of speech as classes:
X flows through Y (IntransitivePP)
Variants of Expression with LexInfo
flowsThrough(Seine,Paris)
Paris is located at the Seine.
The Seine flows through Paris.
The Seine crosses Paris.
The Seine passes Paris.
Vision
[Figure with four elements, each labeled “Ontology lexicon”]
Conclusion
• There is indeed a symbiotic relation between language and ontologies in which both benefit from each other
• Many applications at the language-ontology interface do not build on a common approach (allowing reuse of grammars, components etc.)
• Many mature techniques from the computational linguistics/semantics communities are ready to be used.
• Important step: principled models for representing the lexical semantics of words in a way that they can be reused
Thanks for your attention!
Acknowledgements
• Multipla project - DFG grant 38457858
References
1. A. Bernstein, E. Kaufmann (2006), “GINO - A Guided Input Natural Language Ontology Editor”, Proceedings of the 5th International Semantic Web Conference (ISWC 2006).
2. K. Bontcheva (2005), “Generating Tailored Textual Summaries from Ontologies”, Proceedings of the European Semantic Web Conference (ESWC), pp. 531-545.
3. P. Cimiano, P. Haase, J. Heizmann, M. Mantel, R. Studer (2008), “Towards portable natural language interfaces to knowledge bases: The Case of the ORAKEL system”, Data & Knowledge Engineering (DKE), 65(2), pp. 325-354.
4. A. Cregan, R. Schwitter, T. Meyer (2007), “Sydney OWL Syntax - towards a Controlled Natural Language Syntax for OWL 1.1”, Proceedings of the Fourth OWLED Workshop on OWL Experiences and Directions.
5. B. Davis, A. Ali Iqbal, A. Funk, V. Tablan, K. Bontcheva, H. Cunningham, S. Handschuh (2008), “RoundTrip Ontology Authoring”, Proceedings of the International Semantic Web Conference, pp. 50-65.
6. S. Harnad (1990), “The Symbol Grounding Problem”, Physica D 42, pp. 335-346.
7. K. Kaljurand (2008), “ACE View – an ontology and rule editor based on Attempto Controlled English”, Proceedings of the Fifth OWLED Workshop on OWL: Experiences and Directions, collocated with ISWC 2008.
8. V. Lopez, E. Motta (2004), “Ontology Driven question answering in AquaLog”, Proceedings of the 9th International Conference on Applications of Natural Language to Information Systems (NLDB 2004), Manchester, UK.