Date post: 24-Jun-2020
Automatic Extraction of Topic Maps based Argumentation Trails Text Mining Services Conference Leipzig, 2009/03/25 Marco Büchler, Lutz Maicher, Frederik Baumgardt, Benjamin Bock Natural Language Processing Group Department of Computer Science University of Leipzig
Page 1: Marco Büchler, Lutz Maicher, Frederik Baumgardt, Benjamin Bockasv.informatik.uni-leipzig.de/.../buechler_maicher_argumentationtrails… · Automatic Extraction of Topic Maps based

Automatic Extraction of Topic Maps based Argumentation Trails

Text Mining Services Conference, Leipzig, 2009/03/25

Marco Büchler, Lutz Maicher, Frederik Baumgardt, Benjamin Bock

Natural Language Processing Group, Department of Computer Science

University of Leipzig

Starting Point: Panionion

Agenda

- Computation of argumentation trails on fragmentary texts
- Added value of Topic Maps and their relation to argumentation trails
- Results
- Further work / conclusion

Technical details

Text source

Underlying graph

Co-occurrence as underlying graph
- de Saussure (1898/1916): Structuralism assumes that meaning is the result of structural relations between word forms
- The fundamental structural relations are syntagmatic and paradigmatic relations [Heyer & Bordag 2007]

Argumentation trails vs. Lexical Chaining
- fragmentary texts
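As an illustration of a co-occurrence graph, the following minimal Python sketch counts how often two word forms appear in the same sentence. The sentence-level window, the whitespace tokenisation, and the toy sentences are assumptions made for this example; the slides do not specify them, and no significance measure is applied here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(sentences):
    """Count how often two distinct word forms share a sentence.

    Returns a Counter mapping an alphabetically ordered word pair
    (an undirected edge) to its co-occurrence frequency.
    """
    edges = Counter()
    for sentence in sentences:
        # Deduplicate and sort so each unordered pair is counted once per sentence.
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):
            edges[(a, b)] += 1
    return edges


# Invented toy sentences, used only to exercise the function.
sentences = [
    "the ionians met at the panionion",
    "the panionion was a sanctuary",
    "the ionians built a sanctuary",
]
graph = cooccurrence_graph(sentences)
print(graph[("panionion", "the")])
```

Word forms become nodes and each counted pair becomes a weighted edge; a real pipeline would additionally keep only the statistically significant pairs.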

Small World

“Definition/Motivation”: What is the average path length in a graph?

- Average path length is typically not larger than 7.
- Simple proof of concept (using XING): each of my contacts has on average about 73 contacts (1st and 2nd level).
- log_73(6,800,000,000) ≈ 5.28
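The estimate on this slide can be reproduced with a few lines of Python. The sketch assumes, as the slide implicitly does, that every hop multiplies the number of reachable people by the average number of contacts; the function name is chosen for this example only.

```python
import math


def estimated_path_length(population, avg_contacts):
    """Hops needed to reach `population` people if each hop
    multiplies the reach by `avg_contacts` (idealised, no overlap)."""
    return math.log(population) / math.log(avg_contacts)


hops = estimated_path_length(6_800_000_000, 73)
print(f"{hops:.2f}")
```

Because overlapping contact circles are ignored, this is only a rough back-of-the-envelope estimate, not a measurement of a real network.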

Methodology

Topic Maps

Data model of Topic Maps (Topics)

[Figure: example topic "Nikolaikirche" with a name and a variant ("St. Nicholas Church", "St. Nikolai"; scope: English), an occurrence of type "foundation" (1165), and an occurrence of type "website" (www.nikolaikirche-leipzig.de/)]

Data model of Topic Maps (Associations)

[Figure: association of type "container-containee" between the topics "St. Nikolai" and "Leipzig"; each association role has a role player and a role type (container, containee)]

Data model of Topic Maps (Summary)

- one topic represents one subject in a data source
  - names represent the names of the subject
    - names might have variants
  - occurrences represent properties of the subject
  - associations represent relationships between subjects
    - flexibility through roles
    - n-ary associations
  - all types and scopes are (sets of) topics in a topic map
    - everything is a topic
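The data model summarised here can be sketched with a few Python dataclasses. This is a simplified illustration, not an implementation of the standard: types and scopes are plain strings rather than topics, each role type has a single player, and the assignment of "St. Nicholas Church" as the name and "St. Nikolai" as its variant is an assumption, since the slide diagram does not survive in the transcript.

```python
from dataclasses import dataclass, field


@dataclass
class Name:
    value: str
    scope: tuple = ()      # scoping topics (simplified to strings)
    variants: tuple = ()   # alternative forms of this name


@dataclass
class Occurrence:
    type: str              # in a real topic map the type is itself a topic
    value: str


@dataclass
class Topic:
    subject: str
    names: list = field(default_factory=list)
    occurrences: list = field(default_factory=list)


@dataclass
class Association:
    type: str
    roles: dict            # role type -> role player (a Topic)


# Example values taken from the slides; name/variant assignment is assumed.
church = Topic(
    "Nikolaikirche",
    names=[Name("St. Nicholas Church", scope=("English",), variants=("St. Nikolai",))],
    occurrences=[
        Occurrence("foundation", "1165"),
        Occurrence("website", "www.nikolaikirche-leipzig.de/"),
    ],
)
leipzig = Topic("Leipzig")
contained_in = Association(
    "container-containee", {"container": leipzig, "containee": church}
)
print(contained_in.roles["container"].subject)
```

Keeping roles in a mapping lets an association carry any number of role types, which is how n-ary associations are approximated in this sketch.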

What are Topic Maps (ISO 13250)?

- Topic Maps are highly networked data sources
  - one topic for each subject
  - relationships of subjects are associations between topics
- Topic Maps have a human-centric data model
  - vocabulary for documenting information fits human cognition
  - network resembles human cognition
- Topic Maps have an integration model
  - whenever two topics represent the same subject, they have to be merged
  - always one information access hub for each subject
  - high terminological flexibility and schema-free
  - use in knowledge federation and sensemaking
- Topic Maps is an international industry standard (ISO 13250)
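The integration rule (topics representing the same subject are merged) might be sketched as follows. The sketch assumes that "same subject" is detected through an identical `subject` key; the dictionary layout and the identifier `nikolaikirche-leipzig` are hypothetical and only serve this example.

```python
def merge_topics(topics):
    """Merge topic records that share the same subject key.

    Each input topic is a dict with 'subject', 'names' and 'occurrences'.
    The result holds exactly one record (one access hub) per subject.
    """
    merged = {}
    for topic in topics:
        hub = merged.setdefault(
            topic["subject"],
            {"subject": topic["subject"], "names": set(), "occurrences": set()},
        )
        # Union the statements made about the subject in each source.
        hub["names"] |= set(topic["names"])
        hub["occurrences"] |= set(topic["occurrences"])
    return list(merged.values())


# Two sources describing the same subject (hypothetical identifier).
source_a = {
    "subject": "nikolaikirche-leipzig",
    "names": ["Nikolaikirche"],
    "occurrences": [("foundation", "1165")],
}
source_b = {
    "subject": "nikolaikirche-leipzig",
    "names": ["St. Nicholas Church"],
    "occurrences": [("website", "www.nikolaikirche-leipzig.de/")],
}
hubs = merge_topics([source_a, source_b])
print(len(hubs), sorted(hubs[0]["names"]))
```

After merging, every name and occurrence from both sources is reachable from a single record, which is the "one information access hub" idea in miniature.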

Extraction of typed significant terms

- The corpus is categorized according to several classification schemas.
- The corpus is split into several sub-corpora.

[Figure: pipeline diagram with the labels Medusa; sub-corpora by age, gender, geography, ...; categorized co-occurrences/terms; Tomcat/Prefuse]

(Source: taken from the bachelor thesis slides of Marcus Puchalla.)
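Splitting a categorized corpus into sub-corpora can be illustrated with a short Python sketch. The document layout (a `text` field plus a `categories` mapping) and the category values are hypothetical; the slides only name the schemas age, gender and geography.

```python
from collections import defaultdict


def split_corpus(documents, schema):
    """Group document texts by their category value under one schema."""
    sub_corpora = defaultdict(list)
    for doc in documents:
        sub_corpora[doc["categories"][schema]].append(doc["text"])
    return dict(sub_corpora)


# Invented toy documents with hypothetical category values.
documents = [
    {"text": "first text", "categories": {"geography": "north", "gender": "f"}},
    {"text": "second text", "categories": {"geography": "south", "gender": "m"}},
    {"text": "third text", "categories": {"geography": "north", "gender": "m"}},
]
by_geography = split_corpus(documents, "geography")
print(sorted(by_geography))
```

Co-occurrences or significant terms computed separately on each sub-corpus then inherit the category of that sub-corpus, which is one way to obtain typed terms.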

Results

Several graph properties

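Graph properties

| Property | Complete graph | w_id>=100 && freq(word)>1 | w_id>=300 && freq(word)>1 | w_id>=500 && freq(word)>1 | Named Entities | Normalised Named Entities | Normalised Text and Named Entities |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Number of nodes | 538,572 | 388,929 | 363,359 | 353,618 | 1,149 | 4,487 | 2,178 |
| Number of co-occurrences | 57,762,474 | 34,818,138 | 25,615,956 | 21,004,538 | 15,436 | 126,188 | 152,856 |
| Number of significant co-occurrences | 30,382,422 | 21,739,476 | 17,687,582 | 15,462,940 | 14,876 | 69,858 | 84,124 |
| Percentage | 0.53 | 0.62 | 0.69 | 0.74 | 0.96 | 0.55 | 0.55 |
| Average degree | 56.41 | 55.90 | 48.68 | 43.73 | 12.95 | 15.57 | 38.62 |

Argumentation trail properties

| Property | Complete graph | w_id>=100 && freq(word)>1 | w_id>=300 && freq(word)>1 | w_id>=500 && freq(word)>1 | Named Entities | Normalised Named Entities | Normalised Text and Named Entities |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Number of trails | > 10^8 | > 10^8 | > 10^8 | > 10^8 | 361,094 | 7,958,240 | 3,087,581 |
| Average degree | 15.34 | 9.93 | 7.70 | 6.79 | 7.03 | 7.77 | 9.93 |
| Average degree of internal node (trail length 2) | 31.34 | 21.08 | 14.33 | 11.45 | 7.02 | 10.15 | 12.31 |
| Average degree of internal node (trail length 3) | 301.38 | 362.56 | 285.86 | 231.39 | 55.66 | 76.06 | 81.86 |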
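The "Percentage" and first "Average degree" rows appear to be derived from the three counts above them. The following sketch checks that reading for three of the graph variants; the formulas (significant co-occurrences divided by all co-occurrences, and significant co-occurrences divided by nodes) are inferred from the numbers and are not stated on the slide.

```python
# (nodes, co-occurrences, significant co-occurrences) as reported on the slide
columns = {
    "Complete graph": (538_572, 57_762_474, 30_382_422),
    "Named Entities": (1_149, 15_436, 14_876),
    "Normalised Text and Named Entities": (2_178, 152_856, 84_124),
}


def derived(nodes, cooccurrences, significant):
    """Return (share of significant co-occurrences, significant edges per node)."""
    return round(significant / cooccurrences, 2), round(significant / nodes, 2)


for name, counts in columns.items():
    print(name, derived(*counts))
```

Under this reading the average degree counts each significant co-occurrence once per node, rather than using the 2E/N convention for undirected graphs.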

Visualisation of two argumentation trails

Topic Maps and Argumentation Trails

Topic Maps ontology for the argumentation trails (onotoa.topicmapslab.de)

Marco Büchler

Further work / conclusion

- Reduction of graph complexity
  - e.g. by semantic pre-clustering or
  - author restrictions
- Weighting of argumentation trails
  - e.g. trails containing hubs should be weighted lower
- Improvements in visualisation
  - Clustering of similar trails into a bundle of semantically similar trails
- Improvements in typing nodes and especially edges
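One possible way to weight trails that contain hubs lower is sketched below. The scheme (the inverse of the largest node degree on the trail) is purely an assumption for illustration; the slides name the goal but do not define a weighting function, and the node names and degrees are invented.

```python
def trail_weight(trail, degree):
    """Weight a trail by the inverse of its highest node degree.

    A trail passing through a hub (a node with a very high degree)
    receives a lower weight than a trail through low-degree nodes.
    """
    return 1.0 / max(degree[node] for node in trail)


# Invented toy degrees: 'hub' is connected to far more nodes than the rest.
degree = {"a": 3, "b": 4, "c": 5, "hub": 500}

through_hub = trail_weight(["a", "hub", "b"], degree)
without_hub = trail_weight(["a", "c", "b"], degree)
print(through_hub < without_hub)
```

Other choices, such as penalising the sum or the logarithm of the degrees, would follow the same pattern and could be compared on the extracted trails.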

