Motivation · Issues of transformation · Conclusions
On Distance between Deep Syntax and Semantic Representation
Václav Novák
Institute of Formal and Applied Linguistics, Charles University
Prague, Czech Republic
Frontiers in Linguistically Annotated Corpora, July 22, 2006, 16:00–16:30
Sydney, Australia
[email protected] · Syntax – Semantic Distance · 1/20
Presentation Outline
1 Motivation
  MultiNet – Knowledge Representation
  Prague Dependency Treebank
  Missing pieces
2 Issues of transformation
  Mapping
  Topic-Focus Articulation
  Additional Requirements
3 Conclusions
  Conclusions
  Related Work
  Future Work
MultiNet
What is MultiNet
Multilayered Semantic Network
University of Hagen, Germany
Hermann Helbig, Sven Hartrumpf
Parser: WOCADI for German (relies heavily on the HaGenLex lexicon)
MWR interface (Workbench of Knowledge Engineer)
Designed with question answering and cognitive modeling in mind
Semantic Network
Properties of Semantic Networks
Everything represented as graph nodes
Utterances incrementally build up the graph
Inference rules can further connect the nodes (or add new ones)
⇒ Representation of knowledge, usable for inferencing and QA
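The properties above can be sketched as a toy graph store with one inference rule. This is a minimal sketch, not the MultiNet toolset: the relation name SUB (concept subordination) follows MultiNet usage, but the transitivity rule shown is chosen only for illustration.

```python
# Toy semantic network: everything is a node, utterances add labeled
# edges, and inference rules add further edges over the same graph.
class SemanticNetwork:
    def __init__(self):
        self.edges = set()          # triples (source, relation, target)

    def add(self, source, relation, target):
        self.edges.add((source, relation, target))

    def infer_sub_transitivity(self):
        """Illustrative rule: a SUB b and b SUB c  =>  a SUB c."""
        changed = True
        while changed:
            changed = False
            for a, r1, b in list(self.edges):
                for b2, r2, c in list(self.edges):
                    if r1 == "SUB" and r2 == "SUB" and b == b2:
                        if (a, "SUB", c) not in self.edges:
                            self.add(a, "SUB", c)
                            changed = True

    def ask(self, source, relation, target):
        return (source, relation, target) in self.edges


net = SemanticNetwork()
net.add("car", "SUB", "vehicle")        # "A car is a vehicle."
net.add("vehicle", "SUB", "artifact")   # "A vehicle is an artifact."
net.infer_sub_transitivity()
print(net.ask("car", "SUB", "artifact"))   # inferred, not asserted
```

The `ask` call at the end illustrates the QA use: the answer is derived by an inference rule rather than stated in any single utterance.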
MultiNet Example: “The car was damaged because of the impact.”
MultiNet – technical info
Properties of MultiNet
93 relations + 18 functions
7 layers of attributes
hierarchy of 46 sorts
1 edge-end attribute distinguishing immanent (prototypical / categorical) vs. situational knowledge
encapsulation of concepts
default vs. categorical inference rules
Prague Dependency Treebank
Developed at the Institute of Formal and Applied Linguistics, Charles University, Prague
Three layers of annotation
3,168 documents ≈ 49,442 sentences ≈ 833,357 tokens annotated on all three layers.
Prague Dependency Treebank
Seď klidně, nehýbej se, nabádala mě přítelkyně. ("Sit still, don't move," my friend urged me.) Kritizovali hvězdný systém, věříce v autentičnost dosud neokoukaných tváří, které se však záhy také staly hvězdami (a není to jen osud Belmondův). (They criticized the star system, believing in the authenticity of faces not yet worn out, which, however, soon became stars themselves; and it is not only Belmondo's fate.) Pacient, vzpomenuv si na všechna příkoří způsobená mu společností, vztáhl na doktora ruku. (The patient, recalling all the wrongs society had done to him, raised his hand against the doctor.)
[Tectogrammatical trees for the three sentences above (t-lnd94103-085-p1s21B, t-ln94200-173-p2s6, t-ln94211-120-p5s4): each node carries a lemma (e.g. nabádat, kritizovat, vztáhnout), a functor (PRED, ACT, PAT, ...), and a semantic part of speech (v, n.denot, ...).]
Tectogrammatical Representation
Properties of Tectogrammatical Layer
One sentence ≈ one tree
Auxiliaries and function words removed
Missing obligatory valents inserted
Attributes of nodes
Functor
Semantic part of speech
15 grammatemes (negation, tense, politeness, ...)
Topic-focus distinction
Sentential modality
+ technical attributes (coordinations, parentheses, IDs)
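The attribute inventory above can be pictured as one record per tree node. This is a minimal sketch; the field names (lemma, functor, sempos, grammatemes, tfa) are illustrative, not the PDT schema's actual attribute names.

```python
# Toy record for a tectogrammatical node, following the attribute
# list above: functor, semantic part of speech, grammatemes, TFA.
from dataclasses import dataclass, field

@dataclass
class TectoNode:
    lemma: str
    functor: str                                     # e.g. PRED, ACT, PAT
    sempos: str                                      # e.g. "v", "n.denot"
    grammatemes: dict = field(default_factory=dict)  # negation, tense, ...
    tfa: str = "f"                                   # "t", "c", or "f"
    children: list = field(default_factory=list)

# The predicate of the first example sentence with its actor:
pred = TectoNode("nabádat", "PRED", "v",
                 grammatemes={"tense": "ant", "negation": "neg0"})
pred.children.append(TectoNode("přítelkyně", "ACT", "n.denot"))
```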
Tectogrammatical Representation
[Example tectogrammatical tree t-lnd94103-085-p1s21B for the first example sentence, with nodes such as nabádat/PRED/v, přítelkyně/ACT/n.denot, and sedět/PAT/v.]
Additional Required Information
Missing Pieces
1 Named entity recognition
  Numbers
  Places
  People
  ...
2 Metadata
  Author
  Date
  Place
  Document type
  Intended recipient of the text
  Bibliographical and other references
Presentation Outline Again
1 Motivation
  MultiNet – Knowledge Representation
  Prague Dependency Treebank
  Missing pieces
2 Issues of transformation
  Mapping
  Topic-Focus Articulation
  Additional Requirements
3 Conclusions
  Conclusions
  Related Work
  Future Work
Mapping of Representational Means
Main Issues of Transformation
1 Mapping of edges and corresponding functors in TR to MultiNet cognitive roles
2 Mapping of TR nodes to MultiNet concepts
3 Mapping of various natural language constructs to attribute-value assignments
4 Mapping of verbal tenses to the temporal axis
Main Issues of Transformation – closer look 1
1 Mapping of edges and corresponding functors in TR to MultiNet cognitive roles
  Actor and Patient are highly ambiguous
  Location functors are also used where no location is involved (ELMT, CTXT, SITU)
  However, other functors correspond quite straightforwardly to MultiNet roles (a table is presented in the paper)
2 Mapping of TR nodes to MultiNet concepts
3 Mapping of various natural language constructs to attribute-value assignments
4 Mapping of verbal tenses to the temporal axis
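Issue 1 can be sketched as a partial lookup table plus a disambiguation hook. The pairings shown (e.g. TR's ADDR to the MultiNet role ORNT) are illustrative assumptions; the authoritative functor-to-role table is the one in the paper.

```python
# Hypothetical TR-functor -> MultiNet-role table. DIRECT entries map
# one-to-one; AMBIGUOUS functors (such as Actor and Patient) need
# context to resolve, mirroring the issues listed above.
DIRECT = {
    "ADDR": "ORNT",   # assumed pairing: addressee
    "MANN": "MANNR",  # assumed pairing: manner
}
AMBIGUOUS = {"ACT", "PAT", "LOC"}

def map_functor(functor, resolve=None):
    """Return the MultiNet role for a TR functor, or None if unknown."""
    if functor in DIRECT:
        return DIRECT[functor]
    if functor in AMBIGUOUS:
        if resolve is None:
            raise ValueError(f"{functor} is ambiguous; context needed")
        return resolve(functor)   # caller supplies a disambiguator
    return None

print(map_functor("ADDR"))                          # ORNT
print(map_functor("ACT", resolve=lambda f: "AGT"))  # AGT
```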
Main Issues of Transformation – closer look 2
1 Mapping of edges and corresponding functors in TR to MultiNet cognitive roles
2 Mapping of TR nodes to MultiNet concepts
  Typically, a TR node corresponds to a MultiNet concept (i.e., also a node)
  Quite often, a TR node corresponds to a subnetwork in MultiNet
  Sometimes, a TR node corresponds to an edge in MultiNet (e.g., CORR, CTXT)
3 Mapping of various natural language constructs to attribute-value assignments
4 Mapping of verbal tenses to the temporal axis
Main Issues of Transformation – closer look 3
1 Mapping of edges and corresponding functors in TR to MultiNet cognitive roles
2 Mapping of TR nodes to MultiNet concepts
3 Mapping of various natural language constructs to attribute-value assignments
  The color of x is y.
  x has y color.
  x is y.
  y is the color of x.
  [All four map to the same MultiNet subnetwork: x –ATTR→ attribute node, attribute node –SUB→ color, attribute node –VAL→ y.]
4 Mapping of verbal tenses to the temporal axis
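The four paraphrases above collapse into a single subnetwork, which can be built as a set of triples. This is a toy structure, not the MultiNet toolset's API; the name of the anonymous attribute node is an assumption.

```python
# Build the ATTR/SUB/VAL subnetwork for "The color of x is y":
# x is linked to an attribute node by ATTR, the attribute node is
# classified as a color by SUB, and its value y is attached via VAL.
def color_subnetwork(x, y):
    attr = f"attr_{x}"                 # anonymous attribute node
    return {
        (x, "ATTR", attr),
        (attr, "SUB", "color"),
        (attr, "VAL", y),
    }

edges = color_subnetwork("car", "red")   # "The color of the car is red."
```

All four surface paraphrases would call this same constructor, which is the point of the mapping: one semantic representation for several syntactic constructs.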
Main Issues of Transformation – closer look 4
1 Mapping of edges and corresponding functors in TR to MultiNet cognitive roles
2 Mapping of TR nodes to MultiNet concepts
3 Mapping of various natural language constructs to attribute-value assignments
4 Mapping of verbal tenses to the temporal axis
  Verbal tenses are encoded in grammatemes
  In MultiNet, the TEMP, ANTE, DUR, STRT, and FIN relations can be used
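A minimal sketch of this step, assuming an event node and a speech-time node `now`. Reading the past-tense grammateme as ANTE (before) and the present as TEMP is my illustrative interpretation of the relations named above, not a rule from the talk.

```python
# Map a TR tense grammateme ("ant" = anterior, "post" = posterior,
# otherwise simultaneous) onto MultiNet-style temporal edges.
def tense_edges(event, tense, now="now"):
    if tense == "ant":                     # past: event before speech time
        return {(event, "ANTE", now)}
    if tense == "post":                    # future: speech time before event
        return {(now, "ANTE", event)}
    return {(event, "TEMP", now)}          # present: simultaneous

edges = tense_edges("damage_event", "ant")
```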
Topic-Focus Articulation
TFA in PDT
TFA is annotated on the Tectogrammatical layer
Every word has an attribute: c, t, or f (contrastive topic, topic, focus)
The nodes are ordered with respect to "communicative dynamism"
⇓
TFA in MultiNet
Content expressed by TFA is further analyzed into:
1 Encapsulation of concepts
2 Scope of quantifiers
3 Layer attributes (GENER, REFER, VARIA, ...)
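As one hypothetical instance of point 3: a heuristic that reads a node's TFA value as a default for the REFER layer attribute, on the assumption that topic items tend to be referentially determinate. The rule is an illustration of the kind of analysis involved, not a rule from the talk.

```python
# Hypothetical heuristic: derive a default REFER layer attribute
# (determinate vs. indeterminate reference) from the TFA value.
def refer_from_tfa(tfa):
    if tfa in ("t", "c"):   # topic / contrastive topic: usually given
        return "det"        # determinate reference
    return "indet"          # focus ("f"): often introduces new entities

print(refer_from_tfa("t"), refer_from_tfa("f"))   # det indet
```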
Additional Requirements
1 Spatio-Temporal Representation
For simple inferences about space and time
2 Calendar
For computations with dates
3 Ontology
For all kinds of inferences
Ontology is an inherent part of the MultiNet semantic network design
The upper conceptual ontology is represented by sorts
Conclusions
MultiNet is a suitable formalism for inferences and QA
It’s difficult to transform texts into MultiNet
Tectogrammatical representation is not designed forinferencing and QA
There are tools for text-to-TR conversion
TR is a good starting point for conversion to MultiNet(structural similarity, disambiguation in TR)
We have presented issues arising in such a process
Related Work
Helbig (1986): automatic transformation to MultiNet
Horák (2001): automatic transformation to Transparent Intensional Logic
Callmeier et al. (2004): automatic transformation to Robust Minimal Recursion Semantics (DeepThought project)
Bos (2005): automatic transformation to Discourse Representation Theory
Bolshakov and Gelbukh (2000): automatic transformation in the Meaning–Text Theory framework
Kruijff-Korbayová (1998): automatic transformation from TR to DRT
Future Work
1 Stage I – Preparation
Annotation tools
Annotation guidelines
2 Stage II – Annotation
Pilot study
Automated preprocessing
Evaluation of annotators
3 Stage III – Application
Supervised "parsing"
Assessment of TR necessity