Detection of Relations in Textual Documents

Manuela Kunze,

Dietmar Rösner

University of Magdeburg Knowledge Based Systems and Document Processing

Introduction

http://en.wikipedia.org/wiki/Unsupervised_learning

Introduction

• to extract information from text, you can use techniques like simple pattern matching etc.

• additional knowledge is required:
  • 'Thursday': a day of the week
  • meaning of (implicit) `open' vs. `close'
  • `Pay-what-you-wish'

• text understanding / techniques of NLP
  • `Exhibition of over 30 color photographs and stories of life in China's Yunnan Province …'

Introduction

ontologies contain information about:

• definition/description of concepts and

• description of instances

• kind of relation (name, type),
  – definition of domain and range values,
  – characteristic of the relation: cardinality, transitivity, ...

Natural Language Processing

• NLP techniques:
  – case frame analysis
  – exploiting syntactic structures
  – corpus-based IE for an initial ontology

• corpus:
  – autopsy protocols (400 protocols)
  – different document parts:
    • findings
    • histological findings
    • background
    • discussion
    • …
  – short linguistic structures
  – typical attribute-value structures

Overview

Case Frame

Analysis of Specific Syntactic Structures

Discussion/Conclusion

Case Frames

• resources:
  – results from syntactic parser

<NP TYPE="COMPLEX" RULE="NPC3" GEN="MAS" NUM="SG" CAS="NOM">
  <NP TYPE="FULL" RULE="NP1" CAS="NOM" NUM="SG" GEN="MAS">
    <N>Flachschnitt</N>
  </NP>
  <PP RULE="PP1" CAS="AKK">
    <PRP CAS="AKK">in</PRP>
    <NP TYPE="FULL" RULE="NP2" CAS="AKK" NUM="SG" GEN="NTR">
      <DETD>das</DETD>
      <N>Zungengewebe</N>
    </NP>
  </PP>
</NP>

  – results from semantic tagger
  – description of case frames

Case Frames

• (corpus-based) definition of roles for a concept
  – `Flachschnitt' (flat cut)
    • `location'
      – sem. category: `tissue'
      – PP, case of NP: accusative, preposition: `in'
  – `Herausschleudern' (ejection, i.e. being thrown out)
    • `patient'
      – sem. category: `body-hum'
      – NP; case of NP: genitive
    • `location'
      – sem. category: `vehicle'
      – PP, case of NP: dative, preposition: `aus'

Case Frames

…
<CONCEPT TYPE="medicalOperation">
  <WORD>Flachschnitt</WORD>
  <DESC>medizinischer Schnitt</DESC>
  <SLOTS>
    <RELATION TYPE="LOCATION">
      <ASSIGN_TO>TISSUE</ASSIGN_TO>
      <FORM>P(akk, fak, in)</FORM>
      <CONTENT>in das Zungengewebe</CONTENT>
    </RELATION>
  </SLOTS>
</CONCEPT>

<CONCEPT TYPE="traffic-event">
  <WORD>Herausschleudern</WORD>
  <DESC>event</DESC>
  <SLOTS>
    <RELATION TYPE="PATIENT">
      <ASSIGN_TO>BODY-HUM</ASSIGN_TO>
      <FORM>N(gen, fak)</FORM>
      <CONTENT>des Koerpers</CONTENT>
    </RELATION>
    <RELATION TYPE="LOCATION">
      <ASSIGN_TO>VEHICLE</ASSIGN_TO>
      <FORM>P(dat, fak, aus)</FORM>
      <CONTENT></CONTENT>
    </RELATION>
  </SLOTS>
</CONCEPT>
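
To make the slot-filling step concrete, here is a minimal sketch of how a case frame could be matched against the parser output for `Flachschnitt in das Zungengewebe'. It is not the authors' implementation: the dictionary encoding of the case frame, the `SEM_TAGS` lookup that stands in for the semantic tagger, and the function name `fill_slots` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Parser output for "Flachschnitt in das Zungengewebe", as shown on the slides.
PARSE = """
<NP TYPE="COMPLEX" RULE="NPC3" GEN="MAS" NUM="SG" CAS="NOM">
  <NP TYPE="FULL" RULE="NP1" CAS="NOM" NUM="SG" GEN="MAS"><N>Flachschnitt</N></NP>
  <PP RULE="PP1" CAS="AKK">
    <PRP CAS="AKK">in</PRP>
    <NP TYPE="FULL" RULE="NP2" CAS="AKK" NUM="SG" GEN="NTR">
      <DETD>das</DETD><N>Zungengewebe</N>
    </NP>
  </PP>
</NP>
"""

# Assumption: a dictionary standing in for the semantic tagger.
SEM_TAGS = {"Zungengewebe": "TISSUE"}

# Assumption: a simplified in-memory encoding of the case frame for
# `Flachschnitt' (role LOCATION: category TISSUE, PP, accusative, `in').
CASE_FRAMES = {
    "Flachschnitt": [
        {"relation": "LOCATION", "category": "TISSUE", "case": "AKK", "prep": "in"},
    ],
}


def fill_slots(parse_xml):
    """Return (relation, head noun, filler text) for every slot that matches."""
    root = ET.fromstring(parse_xml)
    head_node = root.find("./NP/N")
    if head_node is None:
        return []
    head = head_node.text
    filled = []
    for slot in CASE_FRAMES.get(head, []):
        for pp in root.findall("./PP"):
            prep = pp.find("PRP")
            noun = pp.find("./NP/N")
            if prep is None or noun is None:
                continue
            # Syntactic restriction (case, preposition) and semantic restriction.
            if (pp.get("CAS") == slot["case"]
                    and prep.text == slot["prep"]
                    and SEM_TAGS.get(noun.text) == slot["category"]):
                content = " ".join(t.strip() for t in pp.itertext() if t.strip())
                filled.append((slot["relation"], head, content))
    return filled


print(fill_slots(PARSE))
```

The sketch only handles prepositional-phrase roles; a genitive NP role such as `patient' of `Herausschleudern' would need an analogous branch.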

Case Frames

• coverage of phrases like `fracture of elbow joint'?

• abstraction
  – `fracture' (sem. category: `trauma')
    • role `patient': sem. category: `bone'
  – `bruise' (sem. category: `trauma')
    • role `patient': sem. category: `organ'
  – `hematoma' (sem. category: `trauma')
    • role `patient': sem. category: `tissue'

• concept x (sem. category: `trauma')
  – role `patient': sem. category: `body-part'

Case Frames

• results:
  – relations are defined by the case frame
    • name/type of relation
    • domain, range
  – corpus-based abstractions:
    • redefinition of semantic restriction
      – use the least general hypernym as semantic restriction

• not yet extracted:
  – information about the characteristic of a relation
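
The abstraction step (bone, organ, tissue generalised to body-part) can be sketched as a search for the least general common hypernym in a taxonomy. The toy hierarchy below is an assumption: only the four categories of the abstraction example come from the slides, while `physical-object' and the placement of `vehicle' are invented so that the example has a second level.

```python
# Assumption: a toy child -> parent taxonomy. Only bone/organ/tissue -> body-part
# is taken from the slides; the remaining entries are invented for illustration.
HYPERNYM = {
    "bone": "body-part",
    "organ": "body-part",
    "tissue": "body-part",
    "body-part": "physical-object",
    "vehicle": "physical-object",
}


def ancestors(category):
    """Return the category followed by its hypernyms, most specific first."""
    chain = [category]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def least_general_hypernym(categories):
    """Return the most specific category that subsumes all given categories."""
    categories = list(categories)
    if not categories:
        raise ValueError("at least one category is required")
    common = set(ancestors(categories[0]))
    for category in categories[1:]:
        common &= set(ancestors(category))
    # Walk upwards from the first category; the first shared node is the answer.
    for candidate in ancestors(categories[0]):
        if candidate in common:
            return candidate
    return None


print(least_general_hypernym(["bone", "organ", "tissue"]))  # body-part
```

Returning `None` when the categories share no ancestor leaves it to the caller to decide whether the roles should be merged at all.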

Overview

Case Frame

Analysis of Specific Syntactic Structures

Discussion/Conclusion

Analysis of Specific Syntactic Structures

• from general to specific information

• resources:
  – results from syntactic parser
  – results from semantic tagger
  – description of interpretation of syntactic structures

• Which word class can be interpreted as concept/instance?

• Which word class describes a relation?
  – adjective in an NP: describes the noun in the NP (relation `prop')
  – negations: negate concepts, verbs, or properties of a concept
  – particle: modification of adjectives

Analysis of Specific Syntactic Structures

CLMed N ADJ

• prop(N, ADJ)
  – N interpreted as concept
  – ADJ interpreted as concept

• results: prop_catadj(N, ADJ)
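
The refinement from the general relation prop(N, ADJ) to prop_catadj(N, ADJ) can be sketched in a few lines. The function name and the dictionary that stands in for the semantic tagger are assumptions; the fallback category `unspecified' for tokens unknown to the tagger follows the `wahrnehmbar' example later in the deck.

```python
def refine_prop(noun, adjective, adjective_categories):
    """Specialise prop(N, ADJ) to prop_<semantic category of ADJ>(N, ADJ)."""
    # Tokens the tagger does not know fall back to the category 'unspecified'.
    category = adjective_categories.get(adjective, "unspecified")
    return (f"prop_{category}", noun, adjective)


# Assumption: tagger output represented as a plain dictionary.
tags = {"blutleer": "blood-concentration"}
print(refine_prop("Lebergewebe", "blutleer", tags))
print(refine_prop("Unterblutungen", "wahrnehmbare", tags))
```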

Analysis of Specific Syntactic Structures

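`liver tissue bloodless'

Steps:

• nouns and adjectives are interpreted as concept/instance
  – liver tissue: instance liver_tissue* of concept liver tissue, a subconcept of tissue
  – bloodless: instance bloodless* of concept bloodless, a subconcept of blood concentration

• adjectives describe a relation
  – in general: 'prop'
  – here: prop_blood-concentration

[Figure: graph of these concepts, instances, and the relation; legend: concept, instance, relation]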

Analysis of Specific Syntactic Structures

`liver tissue bloodless'

<owl:Class rdf:ID="lebergewebe">
  <rdfs:subClassOf>
    <owl:Class rdf:ID="tissue"/>
  </rdfs:subClassOf>
</owl:Class>

<owl:Class rdf:ID="blood-concentration"/>

<owl:Class rdf:ID="blutleer">
  <rdfs:subClassOf rdf:resource="#blood-concentration"/>
</owl:Class>

<owl:ObjectProperty rdf:ID="prop_blood-concentration">
  <rdfs:domain rdf:resource="#tissue"/>
  <rdfs:range rdf:resource="#blood-concentration"/>
</owl:ObjectProperty>

<lebergewebe rdf:ID="Lebergewebe_6">
  <prop_blood-concentration>
    <blutleer rdf:ID="blutleer_7"/>
  </prop_blood-concentration>
</lebergewebe>
…

Analysis of Specific Syntactic Structures

"kaum wahrnehmbare Unterblutungen" (Engl. "hardly detectable hematomas")

results of syntactic parser:

<NP TYPE="FULL" RULE="NP4" CAS="_" NUM="PL" GEN="FEM">
  <ADJP RULE="ADJP1">
    <ADV>kaum</ADV>
    <ADJ>wahrnehmbare</ADJ>
  </ADJP>
  <N>Unterblutungen</N>
</NP>

results of semantic tagger:
  – `kaum': weak-graduation
  – `wahrnehmbar': unknown token
  – `Unterblutung': trauma

resources for interpretation:
• N: concept/instance
• ADJ:
  • concept/instance
  • rel: prop
• ADV:
  • concept/instance
  • rel: mod

adverb specifies adjective

adjective specifies noun
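
The interpretation rules above (adjective specifies noun via `prop', adverb specifies adjective via `mod') can be sketched as a walk over the parser output. This is an illustrative sketch only: the `SEM_TAGS` dictionary stands in for the semantic tagger and is keyed by surface form rather than lemma, which is a simplification, and the tuple layout is an assumption.

```python
import xml.etree.ElementTree as ET

# Parser output for "kaum wahrnehmbare Unterblutungen", as shown on the slide.
PARSE = """
<NP TYPE="FULL" RULE="NP4" CAS="_" NUM="PL" GEN="FEM">
  <ADJP RULE="ADJP1">
    <ADV>kaum</ADV>
    <ADJ>wahrnehmbare</ADJ>
  </ADJP>
  <N>Unterblutungen</N>
</NP>
"""

# Assumption: tagger results keyed by surface form (no lemmatisation).
# 'wahrnehmbare' is deliberately missing: the tagger reports it as unknown.
SEM_TAGS = {"kaum": "weak-graduation", "Unterblutungen": "trauma"}


def category(token):
    """Semantic category of a token; unknown tokens become 'unspecified'."""
    return SEM_TAGS.get(token, "unspecified")


def interpret_np(parse_xml):
    """Return (relation, domain, range, subject, object) tuples for one NP."""
    phrase = ET.fromstring(parse_xml)
    noun = phrase.findtext("N")
    if noun is None:
        return []
    relations = []
    for adjp in phrase.findall("ADJP"):
        adj = adjp.findtext("ADJ")
        if adj is None:
            continue
        # Adjective specifies noun: prop, refined by the adjective's category.
        relations.append(
            (f"prop_{category(adj)}", category(noun), category(adj), noun, adj))
        for adv in adjp.findall("ADV"):
            # Adverb specifies adjective: mod, refined by the adverb's category.
            relations.append(
                (f"mod_{category(adv.text)}", category(adj), category(adv.text),
                 adj, adv.text))
    return relations


for relation in interpret_np(PARSE):
    print(relation)
```

The domain and range values in each tuple are the semantic categories of the two tokens, which is the information an object property declaration would need.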

Analysis of Specific Syntactic Structures

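`hardly detectable hematomas'

Steps:

• nouns, adjectives and adverbs are interpreted as concept/instance
  – hematoma: instance hematoma* of concept hematoma, a subconcept of trauma
  – detectable: instance detectable* of a subconcept of unspecified
  – hardly: instance hardly* of concept hardly, a subconcept of weak-graduation

• adjectives and adverbs describe relations
  – prop_unspecified
  – mod_weak-graduation

[Figure: graph of these concepts, instances, and relations; legend: concept, instance, relation]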

Analysis of Specific Syntactic Structures

`hardly detectable hematomas'

<owl:Class rdf:ID="unterblutung">
  <rdfs:subClassOf rdf:resource="#trauma"/>
</owl:Class>

<owl:Class rdf:ID="trauma"/>

<owl:Class rdf:ID="wahrnehmbar">
  <rdfs:subClassOf rdf:resource="#unspecified"/>
</owl:Class>

<owl:Class rdf:ID="unspecified"/>

<owl:Class rdf:ID="kaum">
  <rdfs:subClassOf rdf:resource="#weak-graduation"/>
</owl:Class>

<owl:Class rdf:ID="weak-graduation"/>

Analysis of Specific Syntactic Structures

`hardly detectable hematomas'

<owl:ObjectProperty rdf:ID="mod_weak-graduation">
  <rdfs:domain rdf:resource="#unspecified"/>
  <rdfs:range rdf:resource="#weak-graduation"/>
</owl:ObjectProperty>

<owl:ObjectProperty rdf:ID="prop_unspecified">
  <rdfs:domain rdf:resource="#trauma"/>
  <rdfs:range rdf:resource="#unspecified"/>
</owl:ObjectProperty>

<unterblutung rdf:ID="Unterblutungen_5">
  <prop_unspecified rdf:resource="#wahrnehmbare_4"/>
</unterblutung>

<wahrnehmbar rdf:ID="wahrnehmbare_4">
  <mod_weak-graduation rdf:resource="#kaum_3"/>
</wahrnehmbar>

<kaum rdf:ID="kaum_3"></kaum>

Analysis of Specific Syntactic Structures

[Figure: ontology graph; legend: concept, instance, relation]

Protégé Plugin for Visualization: Ontoviz

Phrases like:
• NP NP NP
• NP N Adj Conj Adj
• NP N conj N Adj
• …

Analysis of Specific Syntactic Structures

• results
  – definition of concepts/instances
  – corpus-based definition/concretion of relations:
    • prop → prop_catADJ
    • information about domain, range

• not extracted:
  – information about the characteristic of a relation

Overview

Case Frame

Analysis of Specific Syntactic Structures

Discussion/Conclusion

Conclusion

• NLP techniques for extraction of information
  – analyse syntactic structures
  – information about semantic categories
  – result: corpus-based description of an initial ontology

• case frame analysis
  – relations are described in the case frame
  – disadvantage: creation of case frames
  – advantage: a definition of the relation

• analysis of specific syntactic structures
  – a general interpretation of tokens and the syntactic structures
  – redefined by results from the semantic tagger
  – disadvantage: in some cases, only the general relation definition is delivered
  – advantage: less effort to describe the resources

Conclusion

• no information about the characteristic of a relation (cardinality, …)

• solutions
  – analyse occurrences in the corpus
    • corpus-based assumption about cardinality
  – integration of additional knowledge
    • initial domain specific ontology
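
One way to read "corpus-based assumption about cardinality" is to count, for each relation, how many distinct fillers a single instance receives in the corpus. The sketch below does exactly that and nothing more. The triple format is an assumption, `prop_blood-concentration' with its instance identifiers comes from the slides, and the `prop_color' triples are invented toy data.

```python
from collections import defaultdict


def observed_max_cardinality(triples):
    """Map each relation to the largest number of distinct objects
    observed for any single subject."""
    objects = defaultdict(set)
    for relation, subject, obj in triples:
        objects[(relation, subject)].add(obj)
    result = defaultdict(int)
    for (relation, _subject), fillers in objects.items():
        result[relation] = max(result[relation], len(fillers))
    return dict(result)


# Toy data: the first triple follows the slides, the prop_color triples
# are invented purely to show a relation with more than one filler.
triples = [
    ("prop_blood-concentration", "Lebergewebe_6", "blutleer_7"),
    ("prop_color", "Lebergewebe_6", "braun_8"),
    ("prop_color", "Lebergewebe_6", "rot_9"),
]
print(observed_max_cardinality(triples))
```

An observed maximum of 1 is only weak evidence that a relation is functional: the corpus may simply lack a counterexample, so such a value is better treated as an assumption to be confirmed than as a constraint.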

Key Aspects for IE

• ‘conceptual’ preprocessing steps: names of concepts occur in different linguistic structures, e.g. compound vs. complex noun phrase (like ‘liver tissue’ and ‘tissue of liver’)
  – handle only one canonical linguistic structure as a representative for all paraphrases

• treatment of generalisation within local contexts
  – The token ‘liver’ may occur in the first sentence of a paragraph. In the following sentences of the paragraph, only the hypernym ‘organ’ is used.

• concept or instance: which term in a linguistic structure has to be interpreted as a concept, and which as an instance of a concept?

• definition of the scope for a concept:
  – a paragraph starts with a description of an organ (e.g. the organ ‘liver’ in: ‘The liver shows ... . Bloodrichness of the tissue.’), followed by a description of parts of the organ (e.g., ‘Gewebe’, i.e. tissue). In such cases, additional knowledge about the domain has to be employed (for example, about meronyms or holonyms)
  – tissue part-of liver vs. tissue part-of concept X
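
The first point, mapping paraphrases to one canonical structure, can be illustrated for the English example pair ‘liver tissue’ / ‘tissue of liver’. The sketch is a deliberately naive, pattern-based assumption: the function name and the regular expression are invented here, and the German corpus would need morphological analysis (compound splitting, genitive phrases) rather than a regex.

```python
import re

# Matches "<head> of <modifier>" with an optional article, single words only.
_OF_PATTERN = re.compile(r"^(?P<head>\w+) of (?:the )?(?P<modifier>\w+)$")


def canonical_form(phrase):
    """Rewrite 'X of Y' as the compound 'Y X'; leave other phrases unchanged."""
    normalized = " ".join(phrase.lower().split())
    match = _OF_PATTERN.match(normalized)
    if match:
        return f"{match['modifier']} {match['head']}"
    return normalized


print(canonical_form("tissue of liver"))  # liver tissue
print(canonical_form("liver tissue"))     # liver tissue
```

Multi-word modifiers such as `fracture of elbow joint' are left unchanged by this pattern, which shows why a parser-based treatment is preferable to string matching.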

