DIESIRAE: A semantic search engine based on NLP
L. Sbattella and R. Tedesco
[Figure: system architecture. User roles: Admin, Winery, Consultant, Sales agent, Restaurateur, Retail, Wine lover, Oenologist, Trend setter. Diagram labels: Context-aware Web Portal; Context-based Information Filters; Knowledge Management; OmniFind; tag-based query; Knowledge Indexing & Extraction; Text Extractor (Web, PDF, DOC, ...); Domain Ontology; Semantic Network(s); Mapping Model; Domain Model; NL query; Internal Enterprise Data; DB, DW; DW & DB query; PROM; Process Data Extractor; Process; process query; AD-DDIS; Data Feed Extractor; PerLa Extractor; Ontology Extractor; Relat. DB; Web Apps; XML; on-the-fly queries to external sources; Taxonomy; Enterprise Applications.]
Knowledge Indexing & Extraction: Goals
• Domain model → Ontology (W3C OWL standard)
  – Describes the concepts of the domain
• Domain vocabulary → Semantic Network
  – Describes the lemmas of the domain
• Mapping model → Stochastic model
  – Second-order HMM-inspired model
  – Transition probabilities approximated by means of MaxEnt models
  – Solves mapping ambiguities
• Queries:
  – Keyword-based (AND/OR; max probability/exhaustive)
  – Phrase-based (Disambiguated Word queries and Ontological queries)
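As a rough illustration of the mapping model described above, the following sketch scores word-to-concept mappings with a MaxEnt-style (softmax) transition estimate conditioned on the two previous concepts, and picks the most probable concept sequence. It is only a sketch under stated assumptions: the concept names, features, and weights are hypothetical toy values, and the brute-force enumeration stands in for whatever decoding procedure the actual system uses.

```python
import math
from itertools import product

# Hypothetical candidate concepts per lemma (toy data, not taken from DIESIRAE).
CANDIDATES = {
    "fruit": ["Fruit_Food", "Fruit_Aroma"],
    "taste": ["Taste_Descriptor", "Tasting_Event"],
}

# Hypothetical MaxEnt feature weights: (context feature, concept) -> weight.
# Features on prev2 are supported but carry no weight in this toy table.
WEIGHTS = {
    ("word=fruit", "Fruit_Food"): 0.6,
    ("word=fruit", "Fruit_Aroma"): 0.5,
    ("prev1=Fruit_Aroma", "Taste_Descriptor"): 2.0,
    ("prev1=Fruit_Food", "Tasting_Event"): 0.1,
}


def transition_prob(prev2, prev1, word, concept):
    """Softmax estimate of P(concept | prev2, prev1, word) over the word's candidates."""
    features = [f"prev2={prev2}", f"prev1={prev1}", f"word={word}"]

    def score(candidate):
        return sum(WEIGHTS.get((f, candidate), 0.0) for f in features)

    normalizer = sum(math.exp(score(c)) for c in CANDIDATES[word])
    return math.exp(score(concept)) / normalizer


def best_mapping(words):
    """Enumerate every concept sequence and return the most probable one."""
    best_seq, best_p = None, -1.0
    for seq in product(*(CANDIDATES[w] for w in words)):
        p = 1.0
        prev2, prev1 = "<s>", "<s>"  # sentence-start padding for the second-order context
        for word, concept in zip(words, seq):
            p *= transition_prob(prev2, prev1, word, concept)
            prev2, prev1 = prev1, concept
        if p > best_p:
            best_seq, best_p = seq, p
    return list(best_seq), best_p


sequence, probability = best_mapping(["fruit", "taste"])
print(sequence, round(probability, 3))
```

With these toy weights, the reading of "fruit" that is slightly preferred in isolation (Fruit_Food) loses to Fruit_Aroma once the following word is scored jointly, which is the kind of ambiguity the mapping model is meant to resolve.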
Knowledge indexing & extraction: Functionalities
[Figure: functional diagrams. Panel titles: "Training"; "Indexing, querying, and extending". Diagram labels: Context-aware Web Portal; doc. / URL upload; NL query; Text Extractor (Web, PDF, DOC, ...); Information Extraction Engine; Conceptual Index; Document Repository; Query Engine; Ontology Extender; Exporter (to the Internal Enterprise Data module); Domain Model; Domain Ontology; Semantic Network(s); Mapping Model; Expert; Linguistic Context Extractor; Training set; Training Procedure; Test set; Testing Procedure; Document Repository for training. Data-flow labels: document file + text; plain text; concepts; domain knowledge.]
Keyword-based queries
• Sequence of isolated words
  – No linguistic structure
• Exhaustive AND/OR keywords
  – No concept disambiguation
  – Searches for multiple tuples
  – Example: light wine → several meanings found…
    country → search for instances…
• Max probability AND/OR keywords
  – Searches for a single tuple
  – Exploits the a-priori concept probabilities
  – Example: [light wine] → max probability meaning
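The difference between the two keyword modes above can be sketched as follows. This is a minimal illustration, assuming a lexicon that maps each keyword to candidate concepts with a-priori probabilities and a conceptual index that maps concepts to document ids; the lexicon, index contents, and function names are all hypothetical.

```python
from itertools import product

# Hypothetical lexicon: keyword -> {concept: a-priori probability}.
LEXICON = {
    "light": {"Light_Body": 0.6, "Light_Color": 0.3, "Low_Alcohol": 0.1},
    "wine": {"Wine": 1.0},
}

# Hypothetical conceptual index: concept -> ids of documents mentioning it.
INDEX = {
    "Light_Body": {1, 2},
    "Light_Color": {3},
    "Low_Alcohol": {2, 4},
    "Wine": {1, 2, 3, 5},
}


def search_tuple(concepts, mode="AND"):
    """Combine the posting sets of one concept tuple with AND or OR."""
    sets = [INDEX.get(c, set()) for c in concepts]
    if not sets:
        return set()
    return set.intersection(*sets) if mode == "AND" else set.union(*sets)


def exhaustive_query(keywords, mode="AND"):
    """No disambiguation: run one search per possible concept tuple."""
    tuples = product(*(LEXICON[k] for k in keywords))
    return {t: search_tuple(t, mode) for t in tuples}


def max_probability_query(keywords, mode="AND"):
    """Keep only the a-priori most probable concept for each keyword."""
    best = tuple(max(LEXICON[k], key=LEXICON[k].get) for k in keywords)
    return best, search_tuple(best, mode)


print(exhaustive_query(["light", "wine"]))
print(max_probability_query(["light", "wine"]))
```

In this sketch the exhaustive mode returns one result set per meaning of "light wine", while the max-probability mode commits to a single tuple before searching. The "country → search for instances" behaviour is not modelled here.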
Phrase-based queries
• Phrase
  – Linguistic structure
  – Context-based disambiguation
• Disambiguated Word queries
  – Context used for concept disambiguation
    • Index the phrase (→ extract concepts)
    • Search for AND-ed concepts
  – Example: (fruit taste) → disambiguates fruit
• Ontological queries
  – Context used to select the request to the ontology
    • Index the sentence
    • Select the request; search the ontology for the mapped concepts
  – Example: “type of tannins in wine” → instance list
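The two steps of an ontological query (select the request, then search the ontology for the mapped concepts) might look roughly like this. It is a sketch under strong assumptions: a simple cue-phrase lookup stands in for the context-based request selection, the lemma-to-concept mapping is assumed to be already disambiguated, and the ontology fragment, request name, and instances are hypothetical.

```python
# Hypothetical ontology fragment: class -> known instances.
ONTOLOGY_INSTANCES = {
    "Tannin": ["Condensed_Tannin", "Hydrolysable_Tannin"],
    "Wine": ["Barolo", "Chianti"],
}

# Hypothetical lemma -> concept mapping, assumed to be already disambiguated.
LEMMA_TO_CONCEPT = {"tannin": "Tannin", "tannins": "Tannin", "wine": "Wine"}

# Hypothetical cue phrases standing in for context-based request selection.
REQUEST_CUES = {"type of": "LIST_INSTANCES", "kind of": "LIST_INSTANCES"}


def select_request(sentence):
    """Pick the ontology request suggested by the sentence, if any."""
    text = sentence.lower()
    for cue, request in REQUEST_CUES.items():
        if cue in text:
            return request
    return None


def map_concepts(sentence):
    """Map the known lemmas of the sentence to ontology concepts, in order."""
    tokens = sentence.lower().replace("?", " ").split()
    return [LEMMA_TO_CONCEPT[t] for t in tokens if t in LEMMA_TO_CONCEPT]


def ontological_query(sentence):
    """Select the request, then look up the first mapped concept in the ontology."""
    request = select_request(sentence)
    concepts = map_concepts(sentence)
    if request == "LIST_INSTANCES" and concepts:
        return ONTOLOGY_INSTANCES.get(concepts[0], [])
    return []


print(ontological_query("type of tannins in wine"))
```

Taking the first mapped concept as the target of the request is a simplification chosen for brevity; a sentence with no recognised cue or concept simply yields an empty list.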