Ontology-based Information Extraction and Question Answering – Coming Together
Günter Neumann, LT lab, DFKI (German Research Center for Artificial Intelligence), Saarbrücken
OBIES 2008, Sept. 2008
Wednesday, 17 March 2010
Transcript
Page 2: What do I mean?

✩ Ontology-based information extraction

– Ontology defines target knowledge structures

• i.e., types of entities, relations, templates

– IE for identifying and extracting instances

– Merging of partial instances by means of reasoning

Page 3: What do I mean?

✩ Question answering from text and Web

– Answering questions about who, what, whom, when, where or why

– Question analysis:

• “Human carries ontology”

• Identifies the partially instantiated relation expressed in a Wh-question

• Identification of the “expected answer type”

– Answer extraction

• The “information extraction” part of QA

• Also here: RTE for validating answer candidates (cf. CLEF 2007/2008)

Who is Prime Minister of Canada? → PM_of(person:X, country:Canada) → EAT = person

Stephen Harper was sworn in as Canada’s 22nd Prime Minister on February 6, 2006. (Source: http://pm.gc.ca/eng/pm.asp)
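The question-analysis step above can be sketched in a few lines of Python. This is a minimal illustration, assuming a single hand-written pattern for the slide's example; the names `Analysis`, `WH_TO_EAT`, `PM_PATTERN` and `analyse` are hypothetical and not part of any system described in the talk.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from Wh-word to expected answer type (EAT).
WH_TO_EAT = {"who": "person", "when": "time", "where": "location"}


@dataclass
class Analysis:
    relation: str  # relation name, e.g. "PM_of"
    args: dict     # known arguments plus the open variable "X"
    eat: str       # expected answer type


# One hand-written pattern for the slide's example question only.
PM_PATTERN = re.compile(
    r"^(who) is (?:the )?prime minister of (\w+)\?$", re.IGNORECASE
)


def analyse(question: str) -> Optional[Analysis]:
    """Return the partially instantiated relation and EAT, or None if no pattern fits."""
    m = PM_PATTERN.match(question.strip())
    if m is None:
        return None
    wh, country = m.group(1).lower(), m.group(2)
    return Analysis("PM_of", {"person": "X", "country": country}, WH_TO_EAT[wh])


result = analyse("Who is Prime Minister of Canada?")
print(result)
```

The open argument `X` is what answer extraction later has to fill from a sentence such as the one quoted above.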

Page 4: Two Possible Approaches of OBIES+QA

✩ Entailment-based QA

– Domain ontology as interface between NL and DB

– Bijective mapping between NL patterns and DB patterns

– Textual entailment for mastering the mapping/reasoning

– EU project QALL-ME

✩ Web-based ontology learning using QA

– Unsupervised methods for extracting answers for factoid, list and definition-based questions

– Basis for large-scale, web-based bottom-up knowledge extraction and ontology population

– BMBF project Hylap

Page 9: Architectures of QA Systems

[Diagram: three QA architectures, each starting from an NL question]

– DB-QA: NL Question → NL2DB Interface → SQL Query → DB System (attr:val records) → Answer: facts

– Text-QA: NL Question → NL2IR Interface → Keywords → IR System → Answer Extraction → Answer: text fragments

– Hybrid-QA: NL Question → NL Interface → DB-QA pipeline and Text-QA pipeline → Answer Integration → Answer: facts

Page 10: The QA bottleneck

✩ Hybrid QA:

– Increase of semantic structure (Semantic Web, Web 2.0) ⇒ Fusion of ontology-based DBMS and information extraction from text

– The dynamics and interactivity of the Web call for additional complexity in the NL interface.

“Who wrote the script of Saw III?”

SELECT DISTINCT ?writerName
WHERE {
  ?movie name "Saw III"^^string .
  ?movie hasWriter ?writer .
  ?writer name ?writerName .
}

“Who is the author of the script of the movie Saw III?”

= Complex linguistic & knowledge-based reasoning
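To make the query concrete, the following sketch evaluates its three triple patterns over a toy in-memory triple store. It is an illustration only: the store, the identifiers `movie1`/`writer1` and the names "Writer A"/"Writer B" are placeholders, and the matcher is a naive join, not a SPARQL engine.

```python
# Toy triple store; all facts are illustrative placeholders, not real data.
TRIPLES = [
    ("movie1", "name", "Saw III"),
    ("movie1", "hasWriter", "writer1"),
    ("writer1", "name", "Writer A"),
    ("movie2", "name", "Another Movie"),
    ("movie2", "hasWriter", "writer2"),
    ("writer2", "name", "Writer B"),
]


def match(pattern, bindings):
    """Yield extended bindings for one triple pattern; terms starting with '?' are variables."""
    for triple in TRIPLES:
        new = dict(bindings)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check consistency with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield new


def select_distinct(var, patterns):
    """Join all patterns left to right and return the distinct values of one variable."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b for s in solutions for b in match(pattern, s)]
    return sorted({s[var] for s in solutions})


writers = select_distinct("?writerName", [
    ("?movie", "name", "Saw III"),
    ("?movie", "hasWriter", "?writer"),
    ("?writer", "name", "?writerName"),
])
print(writers)
```

Both NL questions on the slide would have to be mapped onto these same three patterns, which is where the linguistic and knowledge-based reasoning comes in.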

Page 11: Possible approaches

✩ Full computation (inference)

– ⇒ AI-complete, especially if incomplete/wrong queries are allowed

✩ Controlled sublanguage

– A user may only express questions using a constrained grammar and with unambiguous meaning

– ⇒ cognitive burden is not acceptable

✩ Controlled mapping

– One-to-one mapping between NL patterns and DB-query patterns

– Flexible use of NL possible through methods of textual inference

Page 12: Textual Inference

✩ Motivation: textual variability of semantic expressions

✩ Idea: for two text expressions T & H:

– Does text T justify an inference of hypothesis H?

– Is H semantically entailed by T?

✩ PASCAL Recognizing Textual Entailment (RTE) Challenge

– since 2005, cf. Dagan et al.

– 2008: 4th RTE (at TAC), 26 groups (two subtasks)

✩ RTE is considered a core technology for a number of text-based applications:

– QA, IE, semantic search, text summarization, …

T: Prof. Clever, full professor at Bostford University, published a new paper.

H: Prof. Clever works at Bostford University.

Does T entail H?
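As a rough illustration of the T/H decision, the sketch below uses word overlap, a common weak baseline for textual entailment. It is explicitly not the RTE method used in the talk: the stop-word list, the 0.75 threshold and the function names are assumptions chosen for this example.

```python
import re

# Small, hand-picked stop-word list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "at", "of", "in", "is"}


def content_words(text):
    """Lower-cased alphabetic tokens without stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def overlap_entails(text, hypothesis, threshold=0.75):
    """Guess that T entails H if most content words of H also occur in T."""
    h = content_words(hypothesis)
    if not h:
        return False
    coverage = len(h & content_words(text)) / len(h)
    return coverage >= threshold


T = "Prof. Clever, full professor at Bostford University, published a new paper."
H = "Prof. Clever works at Bostford University."

print(overlap_entails(T, H))  # H is mostly covered by T
print(overlap_entails(H, T))  # the reverse direction is not
```

The baseline gets this pair right only because four of the five content words of H appear in T; it has no notion that "full professor at" implies "works at", which is the kind of inference a real RTE system has to capture.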

Page 13: Textual Inference for QA

✩ RTE successfully applied to answer validation

– Example

• Q: “In which country was Edouard Balladur born?”, A: “France”

• T: “Paris, Wednesday CONSERVATIVE Prime Minister Edouard Balladur, defeated in France's presidential election, resigned today clearing the way for President-elect Jacques Chirac to form his own new government…”

– Entailed(Q+A, T) ⇒ YES/NO ?

– CLEF 2008, AVE task ⇒ DFKI best results for English and German

✩ New: RTE for semantic search

– Does question X entail an (already answered) question Y ?

Page 20: Current Control Flow

[Diagram: control flow of the entailment-based QA system]

– NL Question → Linguistic Analysis → Textual Entailment → Bijective mapping between NL patterns and SPARQL patterns → DBMS: RDF expressions (attr:val records, Domain ontology) → Answers: values

Example:

– NL question: “Wo läuft Dreamgirls?” (“Where is Dreamgirls showing?”)

– Matched NL pattern: “Wo läuft [movie]?”

– SPARQL query: "SELECT ?cinema ... WHERE ?movie name Dreamgirls ..."

– Answer: Xanadu
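The control flow above can be sketched as a lookup from an abstracted NL pattern to a query template. This is a simplified illustration under stated assumptions: the names `MOVIES`, `PATTERN_TO_QUERY`, `abstract` and `to_query` are hypothetical, the query text keeps the slide's elisions ("...") rather than guessing the full query, and the textual-entailment step is replaced here by exact string matching.

```python
# Known movie titles would come from the domain ontology / DB; this set is a stand-in.
MOVIES = {"Dreamgirls"}

# One NL pattern per query pattern (one-to-one).
# The query keeps the elisions from the slide; it is not a complete SPARQL query.
PATTERN_TO_QUERY = {
    "Wo läuft [movie]?": "SELECT ?cinema ... WHERE ?movie name {movie} ...",
}


def abstract(question):
    """Replace a known movie title by the slot [movie]; return (pattern, slots)."""
    for title in MOVIES:
        if title in question:
            return question.replace(title, "[movie]"), {"movie": title}
    return question, {}


def to_query(question):
    """Map a question to an instantiated query, or None if no stored pattern matches."""
    pattern, slots = abstract(question)
    template = PATTERN_TO_QUERY.get(pattern)
    return None if template is None else template.format(**slots)


print(to_query("Wo läuft Dreamgirls?"))
```

In the approach described in the talk, a paraphrase of the question would first be mapped onto the stored pattern by textual entailment; the exact-match lookup here only covers questions that already have the stored form.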

Page 21: Advantages

✩ Inference remains on the linguistic level

✩ RTE methods are by definition robust ⇒ they support processing of underspecified/ill-specified requests

✩ Good interplay with ontology-based DB

✩ Opens up possibility to automatically learn mappings via ontology-based information extraction

Page 22: Ontology-based Information Extraction

✩ Extraction of relevant information from textual sources (Web pages)

✩ Integration of the extracted data into current DB

✩ Domain ontology as starting point:

– Relevance

– Normalization

– Mapping

[Diagram: Domain Ontology, IE System, and DB System (attr:val records)]

Page 23: Possible approaches

✩ Bootstrapping an ontology

– Basic components for handling IE-specific subtasks expressed as Wh-questions

– Unsupervised, language-independent approaches

– Populating/extending domain ontology

✩ Interactive dynamic information extraction

– Topic-based web crawling

– IE system mines for all possible relevant entities and relations

– See talk by Eichler et al., Friday, 13:30

Page 25: Unsupervised Web-based Question Answering for ontology bootstrapping

✩ Our goal:

– Development of ML-based strategies for complete end-to-end answer extraction for different types of questions and the open domain.

✩ Our perspective:

– Extract exact answers for different types of questions only from web snippets

– Use strong data-driven strategies

– Evaluate them with TREC/CLEF Q-A pairs

✩ Our current results:

– ML-based strategies for open domain factoid, definition and list questions

– Question type specific query expansion for controlling web search

– Unsupervised learning for answer extraction

– Promising performance (~0.5 MRR on TREC/CLEF data)

F (factoid): When was Madonna born?
D (definition): What is Ubuntu?
L (list): What movies did James Dean appear in?
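The MRR figure quoted above is the mean reciprocal rank: for each question, take 1/rank of the first correct answer in the system's ranked list (0 if none is correct) and average over all questions. A minimal sketch with made-up question IDs and answers (not data from the talk):

```python
def mean_reciprocal_rank(ranked_answers, gold):
    """ranked_answers: question id -> ranked candidate list; gold: question id -> set of correct answers."""
    total = 0.0
    for qid, candidates in ranked_answers.items():
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in gold[qid]:
                total += 1.0 / rank  # only the first correct answer counts
                break
    return total / len(ranked_answers)


# Hypothetical system output for three questions.
ranked = {
    "q1": ["a1", "b1", "c1"],  # correct answer at rank 1 -> 1.0
    "q2": ["b2", "a2", "c2"],  # correct answer at rank 2 -> 0.5
    "q3": ["b3", "c3", "d3"],  # no correct answer        -> 0.0
}
gold = {"q1": {"a1"}, "q2": {"a2"}, "q3": {"a3"}}

mrr = mean_reciprocal_rank(ranked, gold)
print(mrr)
```

An MRR of about 0.5 therefore corresponds, loosely speaking, to the first correct answer appearing around rank 2 on average.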

Page 26: Current ML-based Web-QA System (feedback loops)

[Diagram: overall system architecture]

– Input: NL-Question, NL-string(s)

– Output: Exact answ 1, Exact answ 2, …

– Subsystems: Factoid-WQA, GA-QA, Def-WQA, List-WQA

– Other labels: Snippets, Answer Prediction, Answer Context, Extraction via Trivial patterns, Genetic Algorithms, Lexico-syntactic patterns, QA-History, Surface E-patterns, Definition context, Definition Extraction, Clusters of Potential senses, List context, List Extraction

Page 27: Factoid-WQA - Technology

✩ Consult only snippets

– Submit NL question string (no query refinement, expansion, reformulation, …)

✩ Goal

– Identify smallest possible phrases from snippets that contain exact answers (AP phrases)

– Do not make use of any smoothing technology or pre-specified window sizes or length of phrases

✩ Answer extraction

– Use only very trivial patterns for extracting exact answers from AP phrases

– Only Wh-keywords, distinguish type of tokens, punctuation symbols for sentence splitting

Example snippet: http://amasci.com/tesla/tradio.txt TESLA INVENTED RADIO? ... He invented modern radio, but made such serious business mistakes that the recognition (to say ...

Example AP phrase: The prime minister Tony Blair said that

Wh-keyword mapping: Who → Person; When → Time
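A minimal sketch of what "very trivial patterns" could look like: the Wh-keyword selects an expected type, and candidate answers are picked from the AP phrase purely by token shape. The concrete rules here (runs of at least two capitalized tokens for Person, four-digit tokens for Time) and all function names are assumptions for illustration, not the rules of Factoid-WQA.

```python
import re

# Wh-keyword to expected type, as on the slide.
WH_TO_TYPE = {"who": "Person", "when": "Time"}


def capitalized_runs(tokens, min_len=2):
    """Return maximal runs of consecutive capitalized tokens with at least min_len tokens."""
    runs, current = [], []
    for tok in tokens + [""]:  # empty sentinel flushes the last run
        if tok[:1].isupper():
            current.append(tok)
        else:
            if len(current) >= min_len:
                runs.append(" ".join(current))
            current = []
    return runs


def extract(question, phrase):
    """Pick answer candidates from an AP phrase using token shape only."""
    expected = WH_TO_TYPE.get(question.split()[0].lower())
    tokens = re.findall(r"\w+", phrase)
    if expected == "Person":
        return capitalized_runs(tokens)
    if expected == "Time":
        return [t for t in tokens if re.fullmatch(r"\d{4}", t)]
    return []


print(extract("Who is the prime minister?", "The prime minister Tony Blair said that"))
```

Requiring at least two capitalized tokens is what keeps the sentence-initial "The" out of the candidate list in this example.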

Page 28: Experiments

[Pages 28 and 29: figures with experimental results, not preserved in this transcript]

Page 30: ML for Definition Questions – Def-WQA

✩ Questions such as:

– What is a prism?

– Who is Ben Hur?

– What is the BMZ?

✩ Answering consists in collecting as much descriptive information as possible (nuggets):

– The distinction of relevant information

– Multiple sources

– Redundancy

✩ Exploit only web snippets:

– Avoid processing and downloading a wealth of documents.

– Avoid specialized wrappers (for dictionaries and encyclopedias)

– Snippets are automatically “anchored” around question terms → Q-A proximity

– Considering N-best snippets → redundancy via implicit multi-document approach

– Extend the coverage by boosting the number of sources through simple surface patterns (also here: KB-poor approach).

Page 34: Determining descriptive phrases from snippets

✩ Surface patterns, e.g., “What is the DFKI?”

– “DFKI is a” OR “DFKI is an” OR “DFKI is the” OR “DFKI are a”…

– “DFKI, or ”.

– “(DFKI)”

– “DFKI becomes” OR “DFKI become” OR “DFKI became”

✩ Some fetched sentences:

– “DFKI is the German Research Center for Artificial Intelligence”.

– “The DFKI is a young and dynamic research consortium”

– “Our partner DFKI is an example of excellence in this field.”

– “the DFKI, or Deutsches Forschungszentrum für Künstliche ...”

– “German Research Center for Artificial

✩ LSA-based clustering into potential senses

– Determine semantically similar words/substrings

– Define different clusters/potential senses on the basis of non-membership in sentences

✩ Ex: What is Question Answering ?

– SEARCHING: Question Answering is a computer-based activity that involves searching large quantities of text and understanding both questions and textual passages to the degree necessary to. ...

– INFORMATION: Question-answering is the well-known application that goes one step further than document retrieval and provides the specific information asked for in a natural language question. ...

– …
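The surface patterns listed above can be sketched as query-string generation plus a regular-expression filter over fetched sentences. This is an illustration only: it covers just the pattern variants visible on the slide (the slide's list is elided with "…"), and the function names and the search-query syntax are assumptions.

```python
import re


def definition_queries(term):
    """Build the quoted search queries shown on the slide for a definition term."""
    return [
        f'"{term} is a" OR "{term} is an" OR "{term} is the" OR "{term} are a"',
        f'"{term}, or "',
        f'"({term})"',
        f'"{term} becomes" OR "{term} become" OR "{term} became"',
    ]


def descriptive_sentences(term, sentences):
    """Keep only sentences that contain one of the surface patterns."""
    t = re.escape(term)
    pattern = re.compile(
        rf"\b{t} (?:is|are) (?:a|an|the)\b"
        rf"|\b{t}, or "
        rf"|\({t}\)"
        rf"|\b{t} (?:becomes|become|became)\b"
    )
    return [s for s in sentences if pattern.search(s)]


sentences = [
    "DFKI is the German Research Center for Artificial Intelligence.",
    "The DFKI is a young and dynamic research consortium",
    "Our partner DFKI is an example of excellence in this field.",
    "the DFKI, or Deutsches Forschungszentrum für Künstliche ...",
    "Visit the DFKI website.",
]
for query in definition_queries("DFKI"):
    print(query)
print(descriptive_sentences("DFKI", sentences))
```

The last test sentence mentions the term but matches no pattern, so it is dropped; the remaining sentences are the input to the sense clustering step.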

Page 35: Example Output: What is epilepsy?

✩ Our system’s answer in terms of clustered senses:

------------ Cluster STRANGE ----------------
0<->In epilepsy, the normal pattern of neuronal activity becomes disturbed, causing strange...
------------ Cluster SEIZURES ----------------
0<->Epilepsy, which is found in the Alaskan malamute, is the occurrence of repeated seizures.
1<->Epilepsy is a disorder characterized by recurring seizures, which are caused by electrical disturbances in the nerve cells in a section of the brain.
2<->Temporal lobe epilepsy is a form of epilepsy, a chronic neurological condition characterized by recurrent seizures.
------------ Cluster ORGANIZATION ----------------
0<->The Epilepsy Foundation is a national, charitable organization, founded in 1968 as the Epilepsy Foundation of America.
------------ Cluster NERVOUS ----------------
0<->Epilepsy is an ongoing disorder of the nervous system that produces sudden, intense bursts of electrical activity in the brain.
...

Page 36: Def-WQA: Results

Corpus       # Questions   # Answered (Def-WQA/Baseline)   # Nuggets (Def-WQA/Baseline)
TREC 2003    50            50/38                           14.14/7.7
CLEF 2006    152           136/102                         13.13/5.43
CLEF 2005    185           173/160                         13.86/11.08
TREC 2001    133           133/81                          18.98/7.35
CLEF 2004    86            78/67                           13.91/5.47

Corpus                                              F-score (β=5)
TREC 2003                                           0.52
TREC 2003 best systems (on newspaper articles)      0.5 – 0.56

Notes:

• We prefer sentences instead of nuggets (readability)

• We need no predefined window size for nuggets (~125 characters)

• Def-WQA as a basis for more applications, e.g., list-based questions, web person identification, ontology learning

• Still missing: merging/splitting of partitions (possibly using KBs and authority)
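For reference, the F-score with β=5 reported above is the weighted harmonic mean F_β = (1 + β²)·P·R / (β²·P + R), which for β=5 weights recall far more heavily than precision. A minimal sketch follows; the precision and recall values in the example are invented for illustration and are not the values behind the slide's 0.52.

```python
def f_beta(precision, recall, beta=5.0):
    """Weighted harmonic mean of precision and recall; beta > 1 favors recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical values: low precision, moderate recall.
print(f_beta(0.2, 0.6))            # close to recall, because beta = 5
print(f_beta(0.2, 0.6, beta=1.0))  # ordinary F1 for comparison
```

With β=5 the score stays near the recall value even when precision is low, which is why returning whole sentences instead of short nuggets costs relatively little under this measure.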

Page 43: List-WQA – Overview

Example question: “What are 9 works written by Judith Wright?”

✩ Search query construction

– Qfocus → inbody; NPs → intitle

– Apply 4 patterns Qi

– Q1: (intitle:“Judith Wright”) AND (inbody:“works” OR inbody:“written”)

– Max 80 snippets, e.g.: Most of Wright's poetry was written in the mountains of southern Queensland. ... Several of her early works such as 'Bullocky' and 'Woman to Man' became standard ...

✩ Answer candidate extraction

– Apply 8 patterns πi (hyponym, possessive, copula, quoting, etc.)

– π4: entity is \w+ qfocus \w*, e.g., “Chubby Hubby is …. Ben and Jerry’s ice cream brand.”

✩ Answer candidate selection

– Use semantic kernel & Google N-grams

– Result: The Moving Image, Woman to Man, The Gateway, The Two Fires, Birds, The Other Half, City Sunrise, The Flame three and Shadow.
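The first two stages of the overview can be sketched as follows. This is a hedged illustration: `build_q1` reproduces only query Q1 from the slide (joining several noun phrases with AND is an assumption), and `apply_pi4` is one possible reading of the copula pattern π4, where the slide's `\w+` is taken to mean "one or more words". The test sentence is constructed for this example from the snippet text, not taken from system output.

```python
import re


def build_q1(noun_phrases, focus_words):
    """Q1: noun phrases go to the title field, question-focus words to the body field."""
    title = " AND ".join(f'intitle:"{np}"' for np in noun_phrases)
    body = " OR ".join(f'inbody:"{w}"' for w in focus_words)
    return f"({title}) AND ({body})"


def apply_pi4(sentence, focus):
    """Copula pattern 'entity is <words> qfocus': return the entity or None."""
    m = re.match(
        rf"(?P<entity>[A-Z][\w' ]*?) is (?:[\w']+ )+?{re.escape(focus)}\b",
        sentence,
    )
    return m.group("entity") if m else None


print(build_q1(["Judith Wright"], ["works", "written"]))
print(apply_pi4("Woman to Man is an early work.", "work"))
```

Candidates extracted this way would still be noisy; the selection stage (semantic kernel and Google N-grams on the slide) is not covered by this sketch.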

Page 44: List-WQA: Results

✩ Answer Selection:

– Two measures: accuracy and F1 score.

– Two values:

• All questions

• Only questions where at least one answer was found in the fetched snippets.

– Duplicate answers also have an impact on the performance. For instance:

• “Maybelline” (also found as “Maybellene” and “Maybeline”).

• John Updike’s novel “The Poorhouse Fair” was also found as “Poorhouse Fair”.
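One simple way to reduce the duplicate problem described above is to normalize answers and merge near-identical strings. The sketch below uses Python's standard `difflib.SequenceMatcher`; the normalization rules and the 0.85 similarity threshold are assumptions for illustration, not part of List-WQA.

```python
import re
from difflib import SequenceMatcher


def normalize(answer):
    """Lower-case, drop a leading article, and strip punctuation."""
    text = answer.lower().strip()
    text = re.sub(r"^(the|a|an)\s+", "", text)
    return re.sub(r"[^a-z0-9 ]", "", text)


def merge_duplicates(answers, threshold=0.85):
    """Keep the first spelling of each answer; drop later near-duplicates."""
    representatives = []
    for answer in answers:
        key = normalize(answer)
        is_duplicate = any(
            SequenceMatcher(None, key, normalize(rep)).ratio() >= threshold
            for rep in representatives
        )
        if not is_duplicate:
            representatives.append(answer)
    return representatives


found = ["Maybelline", "Maybellene", "Maybeline", "The Poorhouse Fair", "Poorhouse Fair"]
print(merge_duplicates(found))
```

A fixed string-similarity threshold can also merge answers that are genuinely different but similarly spelled, so in practice the threshold would need tuning on held-out questions.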

System \ TREC           2001        2002        2003          2004
ListWebQA (F1)          0.35/0.46   0.34/0.37   0.22/0.28     0.30/0.40
ListWebQA (Acc.)        0.5/0.65    0.58/0.63   0.43/0.55     0.47/0.58
Top one (Acc.)          0.76        0.65        -             -
Top two (Acc.)          0.45        0.15        -             -
Top three (Acc.)        0.34        0.11        -             -
Top one (F1)            -           -           0.396         0.622
Top two (F1)            -           -           0.319         0.486
Top three (F1)          -           -           0.134         0.258
Yang & Chua 04 (F1)     -           -           .464 ~.469    -

We conclude: encouraging results, competes well with the 2nd best; still creates too much noise.

Page 45: Web QA and Information Extraction

✩ WebQA:

– Combining generic lexico-syntactic patterns with unsupervised answer extraction from snippets only

– Language independent and multilingual

– Our approach is closely related to recent approaches to unsupervised IE, e.g., Etzioni et al., Weikum et al., Rosenfeld & Feldman

✩ Information extraction

– WebQA as a generic tool for web-based bottom-up knowledge extraction and ontology population

– Ontology-based clustering for unsupervised information extraction

• Use ontology for generating QA requests → ontology-driven active QA

• Use web QA for populating and extending ontology

– Interactive dynamic information extraction
