7/29/2019 Ontology Learning From Swedish Text
Ontology learning from Swedish text
Jan Bothma
Supervisor: Eva Blomqvist
What is the problem?
Computer support implemented manually
Problem Goals State of the art Contribution Recap
Berners-Lee, Tim et al. (May 1, 2001). "The Semantic Web". Scientific American.
What is the problem?
Computer support implemented manually
Encode semantics
Encoding semantics is time consuming
  Domain Expert knowledge
  Encoding knowledge a challenge in itself
  Keeping current
One approach: Ontology learning
  Ontology learning from text
    NLP, Computational Linguistics
    Information Extraction, Data mining, etc.
  Ontology learning from Swedish text
Problem is to encode domain semantics from Swedish text
Motivation and goals
Investigate OL from Swedish text
  Identify relevant tools
  Target further research
Prototype system for OL from Swedish
  Much research in each subtask
  How does it fit together?
Specifically
  Preprocess natural language Swedish corpus
  Extract evidence for important
    Concepts
    Subclass relations
    Non-taxonomic relations
  Open domain
  Semi-supervised
Relevant state of the art
Preprocess
Concepts
Taxonomy
Relations
OL Systems
Wilson Wong, Wei Liu, and Mohammed Bennamoun. 2012. Ontology learning from text: A look back and into the future. ACM Comput. Surv. 44, 4, Article 20 (September 2012), 36 pages.
Relevant state of the art: Preprocessing
Part of Speech tagging
Compound splitting
Stemming/Lemmatisation
Syntactic analysis
  Chunking
  Syntactic Dependencies
Word sense disambiguation
Named entity recognition
Coreference resolution
Relevant state of the art: Preprocessing tools
Parsers
  Stanford
  TnT
  HunPOS
  MaltParser
  Saldo
Systems
  GATE
  NLTK
  Korp
  Corpus Workbench
Relevant state of the art: Concepts
Lexico-syntactic Patterns
TF-IDF
C-Value/NC-Value
PageRank, graph-based
Markov Logic Network-based syntactic parse
Relevant state of the art: Taxonomy
General taxonomies (e.g. WordNet)
Lexico-syntactic Patterns
Agglomerative Clustering
Distributional similarity
Formal Concept Analysis
Markov Logic Network-based syntactic parse
Relevant state of the art: Relations
Association rule mining
Lexico-syntactic patterns
Graph theory
Markov Logic Network-based syntactic parse
Compound splitting/paraphrasing
Relevant state of the art: OL Systems
Text2Onto
OntoUSP
OntoCMaps
OntoGain
Cross-language (Hjelm, Volk)
Key idea and contribution
Prototype ontology learning from Swedish text
Preprocessing
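CRP tillhör en grupp proteiner, s k pentraxiner, och utgör en del av det akuta inflammationssvaret vid t ex akuta infektionssjukdomar.
(English: "CRP belongs to a group of proteins, so-called pentraxins, and forms part of the acute inflammatory response in, for example, acute infectious diseases.")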
Just kidding!
Key idea and contribution: Preprocessing
Korp
  POS
  Lemmatisation
  Semantic dependencies
GATE
  Traverse annotations programmatically
  Pattern match over annotations (JAPE)
Key idea and contribution: Concepts
Noun Phrase Linguistic Filter: Adj* Noun+
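The Adj* Noun+ filter can be sketched over POS-tagged tokens. This is only an illustration, not the prototype's implementation, which the slides describe as pattern matching over GATE annotations. The tag names `JJ` and `NN`, the function name `noun_phrase_candidates`, and the hand-tagged example tokens are assumptions. The sketch returns maximal matches only.

```python
def noun_phrase_candidates(tagged):
    """Return maximal Adj* Noun+ sequences from (word, tag) pairs.

    Assumes "JJ" marks adjectives and "NN" marks nouns (hypothetical tagset).
    """
    candidates, adjectives, nouns = [], [], []

    def flush():
        # Emit a candidate only if at least one noun was collected.
        if nouns:
            candidates.append(tuple(adjectives + nouns))
        adjectives.clear()
        nouns.clear()

    for word, tag in tagged:
        if tag == "NN":
            nouns.append(word)
        elif tag == "JJ":
            if nouns:  # an adjective after nouns starts a new candidate
                flush()
            adjectives.append(word)
        else:
            flush()
    flush()
    return candidates


# Hand-tagged fragment of the example sentence (tags are assumed).
tagged = [
    ("det", "DT"), ("akuta", "JJ"), ("inflammationssvaret", "NN"),
    ("vid", "PP"), ("akuta", "JJ"), ("infektionssjukdomar", "NN"),
]
print(noun_phrase_candidates(tagged))
```

Adjectives that are never followed by a noun are dropped, since the filter requires at least one noun.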
C-value
  Reward log(|candidate|)
  Reward frequency of candidate
  Penalise occurrence as substring
  Reward independence from containing candidates
Term                       Confidence
kardiovaskulär sjukdom     1
läkartidningen nr          0.8797681
mm hg                      0.6492587
et al                      0.6396211
fysisk aktivitet           0.4494908
typ 2-diabetes             0.413715
kardiovaskulära händelse   0.32784185
hg blodtryck               0.32001647
potentiell bindning        0.31713346
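The C-value ingredients listed on this slide can be sketched as follows. This assumes the commonly cited formulation, C-value(a) = log2|a| · f(a) for a candidate that is not nested in a longer one, and log2|a| · (f(a) − mean frequency of the longer candidates containing a) otherwise. Whether the prototype uses exactly this variant is not stated in the slides. The function names and the frequencies in the example are invented for illustration, and the sketch returns raw scores without trying to reproduce the confidence values in the table.

```python
import math


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous word sequence in `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq):
    """freq maps a candidate (tuple of words) to its corpus frequency."""
    scores = {}
    for a, f_a in freq.items():
        # Longer candidates in which `a` occurs as a substring.
        containers = [b for b in freq if len(b) > len(a) and contains(b, a)]
        length_weight = math.log2(len(a))
        if containers:
            nested = sum(freq[b] for b in containers) / len(containers)
            scores[a] = length_weight * (f_a - nested)
        else:
            scores[a] = length_weight * f_a
    return scores


# Invented frequencies, for illustration only.
freq = {
    ("fysisk", "aktivitet"): 10,
    ("regelbunden", "fysisk", "aktivitet"): 4,
}
scores = c_value(freq)
print(scores)
```

Under this formulation a single-word candidate scores zero because log2(1) = 0; some variants add a constant inside the logarithm to avoid that.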
Key idea and contribution: Taxonomy
Hierarchical Agglomerative Clustering
  Similarity Measure
    Reward common phrase head
    Reward common words
    Penalise disjoint words
  Head = last noun in NP
Blodtryck
  DiastoliskBlodtryck
  SystoliskBlodtryck
  HgBlodtryck
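A minimal sketch of a similarity measure with these three ingredients, plugged into a naive average-link agglomerative loop. The weights (`head_bonus`, `disjoint_penalty`), the stopping `threshold`, and the function names are hypothetical and not taken from the prototype. The head is approximated as the last word of each term, and the loop produces flat clusters, not a full hierarchy.

```python
def similarity(a, b, head_bonus=1.0, disjoint_penalty=0.5):
    """a, b: tuples of words; the head is assumed to be the last word."""
    shared = set(a) & set(b)
    disjoint = set(a) ^ set(b)
    score = len(shared) - disjoint_penalty * len(disjoint)
    if a[-1] == b[-1]:  # reward a common phrase head
        score += head_bonus
    return score


def average_link(c1, c2):
    """Mean pairwise similarity between two clusters."""
    return sum(similarity(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def cluster(terms, threshold=0.0):
    """Merge the most similar pair of clusters until none exceeds threshold."""
    clusters = [[t] for t in terms]
    while len(clusters) > 1:
        score, i, j = max(
            (average_link(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if score <= threshold:
            break
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


terms = [
    ("diastolisk", "blodtryck"),
    ("systolisk", "blodtryck"),
    ("fysisk", "aktivitet"),
]
print(cluster(terms))
```

With these assumed weights, the two blood-pressure terms share a head and are merged, while "fysisk aktivitet" stays in its own cluster.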
Hierarchical Agglomerative Clustering
Lexico-syntactic pattern
  Super such as sub1, sub2, and/or sub3
  Super såsom sub1, sub2, och/eller sub3
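Breda och jourtunga specialiteter, såsom allmänkirurgi och familjemedicin,
(English: "Broad and on-call-heavy specialties, such as general surgery and family medicine,")
JourtungaSpecialitet
  Familjemedicin
  Allmänkirurgi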
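The "såsom" pattern can be approximated with a regular expression over raw text. This is a simplification: the slides describe pattern matching over annotations (JAPE in GATE), whereas this sketch works on plain strings, takes only the single word before "såsom" as the superclass, and does no lemmatisation. The function name `sasom_pairs` is hypothetical.

```python
import re


def sasom_pairs(sentence):
    """Return (subclass, superclass) pairs for 'Super såsom sub1, sub2 och sub3'."""
    match = re.search(r"(\w+),?\s+såsom\s+(.+)", sentence)
    if not match:
        return []
    superclass = match.group(1)
    # Stop at the end of the clause, then split the enumeration.
    tail = re.split(r"[.;]", match.group(2))[0]
    parts = re.split(r"[,/]|\boch\b|\beller\b", tail)
    return [(p.strip(), superclass) for p in parts if p.strip()]


sentence = "Breda och jourtunga specialiteter, såsom allmänkirurgi och familjemedicin,"
print(sasom_pairs(sentence))
```

Unlike the slide's result, which names the superclass JourtungaSpecialitet, this sketch yields the inflected head word "specialiteter"; recovering the full lemmatised noun phrase would need the POS and lemma annotations from preprocessing.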
Key idea and contribution: Relations
Syntactic dependencies
Ultraljud visade trombos av vena portae
(English: "Ultrasound showed thrombosis of the portal vein")
  root: visade
  subject: Ultraljud
  object: trombos
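One way to turn such a parse into relation evidence is to read off (subject, verb, object) triples around the root. The sketch below assumes a simple token record with a head index and a dependency label; the label names "root", "subject" and "object" follow the slide, while the `Token` type, the function name, and the attachments given to "av vena portae" are hypothetical and hand-written, not parser output.

```python
from typing import NamedTuple


class Token(NamedTuple):
    idx: int      # 1-based position in the sentence
    form: str     # surface form
    head: int     # index of the governing token, 0 for the root
    deprel: str   # dependency label


def extract_triples(tokens):
    """Return (subject, verb, object) triples for each root verb."""
    triples = []
    for verb in tokens:
        if verb.deprel != "root":
            continue
        dependents = [t for t in tokens if t.head == verb.idx]
        subjects = [t.form for t in dependents if t.deprel == "subject"]
        objects = [t.form for t in dependents if t.deprel == "object"]
        triples.extend((s, verb.form, o) for s in subjects for o in objects)
    return triples


# Hand-written parse; labels for tokens 4-6 are placeholders.
parse = [
    Token(1, "Ultraljud", 2, "subject"),
    Token(2, "visade", 0, "root"),
    Token(3, "trombos", 2, "object"),
    Token(4, "av", 3, "other"),
    Token(5, "vena", 4, "other"),
    Token(6, "portae", 5, "other"),
]
print(extract_triples(parse))
```

A fuller version would expand the subject and object to their whole phrases, for example "trombos av vena portae", by following the dependents of each argument.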
Evaluation
Broad medical domain
  1381 articles
  12 + 8 hours
MeSH C14 - Hjärt-kärlsjukdomar (Cardiovascular Diseases)
  312 articles
  3 + 1 hours
Candidates
  Thousands of terms, hundreds of relations
Useful evidence
  Hundreds of terms, dozens of relations
  Ranking helps
Future work
Ranking relations
Thorough evaluation
Provenance
User interaction, e.g.
  Corpus and rationale
  Ontology
Extensibility
Evidence combination
Methods
Cross-language
Ontology Consistency and Reasoning
Recap
Implementing computer support is manual
Encode semantics
Semi-supervised Ontology Learning