Summarizing Email Conversations with Clue Words Giuseppe Carenini Raymond T. Ng Xiaodong Zhou...

Summarizing Email Conversations with Clue Words

Giuseppe Carenini, Raymond T. Ng, Xiaodong Zhou

Department of Computer Science, Univ. of British Columbia

Motivations of Email Summarization

Email overload
– 40 to 60 emails per day, or even more
Personal information repository
Email summarization can be helpful
– Two examples: meetings; accessing email from mobile devices.

Outline

Characteristics of email
Related work
Our summarization approach
Experimental results
Conclusions and future work

Characteristics of Emails

Conversation structure
– Context related: replies to previous messages (>60%).
Hidden email
– A hidden email is an email quoted by at least one email in a folder but not itself present in the same folder.
Writing style
– Short length, informal writing, multiple authors, etc.

[Figure: four example emails, m1 to m4, whose bodies mix new fragments (A to G) with quoted fragments marked by ">" and "> >".]
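
As a rough illustration of the hidden-email definition, the sketch below flags quoted lines whose text never appears as new (unquoted) text anywhere in the folder. It assumes quotations are marked by leading ">" characters and compares lines by exact match; the helper names `split_lines` and `hidden_lines` are hypothetical, and this is not the authors' detection method.

```python
def split_lines(body):
    """Separate an email body into new lines and quoted lines (quote markers removed)."""
    new, quoted = [], []
    for line in body.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith(">"):
            # Assumption: quotations are prefixed by one or more '>' characters.
            quoted.append(stripped.lstrip("> ").strip())
        else:
            new.append(stripped)
    return new, quoted


def hidden_lines(folder):
    """Return quoted lines that never occur as new text in any email of the folder."""
    present, quoted_all = set(), []
    for body in folder:
        new, quoted = split_lines(body)
        present.update(new)
        quoted_all.extend(quoted)
    hidden = []
    for text in quoted_all:
        # Keep each hidden line once, in order of first appearance.
        if text and text not in present and text not in hidden:
            hidden.append(text)
    return hidden


# Toy folder: the original question is quoted twice but never present itself.
folder = [
    "> Can we meet Friday?\nYes, Friday works.",
    "> > Can we meet Friday?\n> Yes, Friday works.\nGreat, see you then.",
]
print(hidden_lines(folder))
```

In the toy folder only the original question is reported, because the reply "Yes, Friday works." exists as new text in the first email.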

Requirements for Email Summarization

Conversation structure
– Context information is provided.
Information completeness
– Include hidden emails as well as existing messages.
Informative summarization
– Cover the core points of the email discussion.
– Replacement of the original emails.

Outline

Characteristics of email
Related work
Our summarization approach
Result
Conclusions and future work

Related Work

Multi-Document Summarization (MDS)
– Extractive: MEAD, MMR-MD.
– Abstractive/Generative: MultiGen, SEA.
Email summarization
– Single email summarization (Muresan et al.)
– Summarizing email threads by sentence selection (Rambow et al. and Wan et al.)

Related Work

Columns: MDS methods (MEAD & MMR-MD, MultiGen, SEA); Email summarization (Muresan et al., Rambow et al., Wan et al.); Our method

Hidden email
– Hidden email: x
Conv. structure
– Thread: x x x
– Quotation analysis: x
Informative summary
– Sentence selection: x x x x x
– Lang. gen.: x x

Outline

Characteristics of email
Related work
Our summarization approach
– Fragment quotation graph
– ClueWordSummarizer (CWS)
Result
Conclusions and future work

Framework

Input: a set of emails
Output: email summaries
Process:
– Discover and represent email conversations as fragment quotation graphs.
– ClueWordSummarizer generates email summaries.

Conversation Structure - Fragment Quotation Graph

Complications of email conversation:
– Header information: e.g., subject, in-reply-to, and references. Not accurate enough.
– Quotation: a good indication of email conversation (Yeh et al.). Selective quotations reflect the conversation in detail.
– Assumption: quotation implies conversation.

Build a fragment quotation graph to represent the email conversation.

Fragment Quotation Graph

Create nodes
– Compare quotations and new messages
– a, b, c, d, e, f, g, h, i, j.
Create edges
– Neighbouring quotations
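
The two steps above can be sketched as follows. This is a simplified reading, not the authors' algorithm: it assumes each email has already been split into an ordered list of `(quote_depth, fragment_id)` pairs with identical text mapped to the same id, and it links a fragment to an adjacent fragment only when that neighbour is quoted exactly one level deeper. The function name and input layout are hypothetical.

```python
from collections import defaultdict


def build_fragment_graph(emails):
    """Build nodes and reply edges from emails given as [(quote_depth, fragment_id), ...]."""
    nodes = set()
    replies_to = defaultdict(set)  # fragment -> quoted fragments it sits next to
    for fragments in emails:
        for i, (depth, frag) in enumerate(fragments):
            nodes.add(frag)
            # Look at the fragment just before and just after this one.
            for j in (i - 1, i + 1):
                if 0 <= j < len(fragments):
                    neighbour_depth, neighbour = fragments[j]
                    # Assumption: a fragment replies to a neighbour quoted one level deeper.
                    if neighbour_depth == depth + 1:
                        replies_to[frag].add(neighbour)
    return nodes, dict(replies_to)


# Toy conversation: b quotes a, and c quotes b.
emails = [
    [(0, "a")],
    [(1, "a"), (0, "b")],
    [(1, "b"), (0, "c")],
]
nodes, replies_to = build_fragment_graph(emails)
print(sorted(nodes), replies_to)
```

Storing edges from the replying fragment to the quoted one keeps the parent lookup needed later for clue words a single dictionary access.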

Outline

Characteristics of email Related work Our summarization approach

– Fragment quotation graph– ClueWordSummarizer (CWS)

Result Conclusions and future work

ClueWordSummarizer

Clue words in the fragment quotation graph
– A clue word in node (fragment) F is a word that also appears in a semantically similar form in a parent or a child node of F in the fragment quotation graph.

ClueWordSummarizer

Three types of clue words
– Root/stem: settle vs. settlement
– Synonym/antonym: war vs. peace
– Loose semantic meaning: Friday vs. deadline
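
Of the three types, only the root/stem case can be illustrated without a lexical resource. The sketch below uses a deliberately crude, hypothetical suffix stripper (`crude_stem`, with a hand-picked suffix list) rather than a real stemmer, and it does not attempt the synonym/antonym or loose-semantic cases.

```python
# Hypothetical suffix list, chosen only to cover the examples below.
SUFFIXES = ("ments", "ment", "ions", "ion", "ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common suffix and a trailing 'e'; a stand-in for a real stemmer."""
    w = word.lower()
    for suffix in SUFFIXES:
        # Skip words ending in 'ss' (e.g. 'discuss') and keep at least three letters.
        if w.endswith(suffix) and not w.endswith("ss") and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            break
    return w[:-1] if w.endswith("e") else w


def same_stem(word_a, word_b):
    """True when the two words reduce to the same crude stem."""
    return crude_stem(word_a) == crude_stem(word_b)


print(same_stem("settle", "settlement"))
print(same_stem("discussed", "discuss"))
print(same_stem("war", "peace"))
```

The last pair is not matched, which is expected: antonyms such as war and peace share no stem and would need a thesaurus-style lookup.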

ClueWordSummarizer

1. ClueScore(CW, F)
– A word CW is in a sentence S of a fragment F

$$\mathrm{ClueScore}(CW, F) = \sum_{parent(F)} freq(CW, parent(F)) + \sum_{child(F)} freq(CW, child(F))$$

– ClueScore(discussed, a) = 1
– ClueScore(settle, b) = 2
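
A minimal sketch of the word-level score: the frequency of the word summed over the parents and children of its fragment. It matches words by exact lowercase token, so the "semantically similar form" part of the clue-word definition is left out; the dictionaries `parents`, `children`, and `text`, and the toy fragments, are made up for illustration.

```python
def clue_score(word, fragment, parents, children, text):
    """Sum the frequency of `word` over the parent and child fragments of `fragment`."""
    neighbours = parents.get(fragment, []) + children.get(fragment, [])
    # Assumption: whitespace tokens compared by exact lowercase match.
    return sum(text[n].lower().split().count(word.lower()) for n in neighbours)


# Toy graph: a is quoted by b, and b is quoted by c.
text = {
    "a": "we discussed the settlement terms",
    "b": "settle today or settle in court",
    "c": "happy to settle",
}
parents = {"b": ["a"], "c": ["b"]}
children = {"a": ["b"], "b": ["c"]}

print(clue_score("settle", "c", parents, children, text))
print(clue_score("settle", "b", parents, children, text))
```

With exact matching, "settlement" in fragment a does not count towards "settle" in b; swapping the comparison for a stem-based one would change that.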

ClueWordSummarizer

2. ClueScore(S)

$$\mathrm{ClueScore}(S) = \sum_{CW \in S} \mathrm{ClueScore}(CW, F)$$

3. For each conversation, rank all of the sentences based on their ClueScores.

4. Select the top-k sentences as the summary.
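
Steps 2 to 4 can be sketched as below. The word-level scorer is passed in as a function; here it is a toy lookup seeded with the two values quoted earlier, ClueScore(discussed, a) = 1 and ClueScore(settle, b) = 2. The function names and sentences are hypothetical, and counting every token occurrence is an assumption, since the slides do not say whether repeated words count once.

```python
def sentence_clue_score(sentence, fragment, word_score):
    """Add up the word-level ClueScores of the tokens in a sentence."""
    # Assumption: every token occurrence contributes, including repeats.
    return sum(word_score(token, fragment) for token in sentence.lower().split())


def top_k_sentences(sentences, word_score, k):
    """Rank (sentence, fragment) pairs by ClueScore and keep the k best sentences."""
    ranked = sorted(
        sentences,
        key=lambda item: sentence_clue_score(item[0], item[1], word_score),
        reverse=True,
    )
    return [sentence for sentence, _ in ranked[:k]]


# Toy word-level scores; any word not listed scores 0.
toy_scores = {("settle", "b"): 2, ("discussed", "a"): 1}


def toy_word_score(word, fragment):
    return toy_scores.get((word, fragment), 0)


sentences = [
    ("we discussed the case", "a"),
    ("please settle it and settle soon", "b"),
    ("thanks", "c"),
]
print(top_k_sentences(sentences, toy_word_score, 2))
```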

Outline

Characteristics of email
Related work
Our summarization approach
Result
– User study
– Empirical experiments
Conclusions and future work

Result 1: User Study

Objective:
– Gold standard
– How humans summarize email conversations
Setup
– Dataset: 20 conversations from the Enron dataset
– Human reviewers: 25 graduate and undergraduate students at UBC
– Each sentence is evaluated by 5 different human reviewers.
– Reviewers select important sentences and mark the crucially important ones.
Gold standard
– 4 selections, at least 2 of which mark the sentence as essentially important.
– 88 “gold” sentences out of the 20 conversations (12%).
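
The gold-standard rule can be written as a small predicate. This sketch reads "4 selections" as "at least 4 of the 5 reviewers selected the sentence", which is an interpretation rather than something the slide states; the label names `"none"`, `"important"`, and `"essential"` are also made up for the example.

```python
def is_gold(labels, min_selected=4, min_essential=2):
    """Decide whether a sentence is 'gold' from its per-reviewer labels."""
    # A reviewer selected the sentence if they marked it important or essential.
    selected = sum(1 for label in labels if label in ("important", "essential"))
    essential = sum(1 for label in labels if label == "essential")
    return selected >= min_selected and essential >= min_essential


print(is_gold(["essential", "essential", "important", "important", "none"]))
print(is_gold(["important"] * 5))
```

The second sentence is selected by all five reviewers but is not gold under this reading, because no reviewer marked it as essential.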

Result 1: User Study

Information completeness
– 18% of the gold sentences come from hidden emails.
– Hidden emails carry crucial information as well.
Significance of clue words
– Clue words appear more frequently in the 88 gold sentences.
– Average ratio of ClueScore in gold sentences to ClueScore in non-gold sentences: 3.9

Result 2: Empirical Experiments

RIPPER
– A machine learning classifier
– In the summary or not.
– 14 features (Rambow et al.): linguistic and email specific.
– Sentence/conversation level training
– 10-fold cross validation
CWS & MEAD
– The same summary length (2%) as that of RIPPER.

Result 2: Empirical Experiments (CWS vs. MEAD)

sumLen = 15%
CWS has higher accuracy.
P-value:
– 0.077 (precision)
– 0.049 (recall)
– 0.053 (F-measure)
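
For reference, the three accuracy measures can be computed from a system summary and the gold sentences as sketched below, using the standard definitions of precision, recall, and F-measure (balanced F1 is an assumption, as the slides do not say which weighting was used). Sentence ids and the function name are hypothetical, and the numbers are not from the experiments.

```python
def precision_recall_f(selected, gold):
    """Compare selected sentence ids with gold sentence ids."""
    selected, gold = set(selected), set(gold)
    hits = len(selected & gold)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(gold) if gold else 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy ids: the summary picks four sentences, two of which are gold.
p, r, f = precision_recall_f([1, 2, 3, 4], [2, 4, 6])
print(p, r, f)
```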

Result 2: Empirical Experiments (CWS vs. MEAD)

CWS has higher accuracy when sumLen <= 30%.

MEAD is more accurate when sumLen = 40% and higher.

Clue words are significant in important sentences.

Result 2: Empirical Experiments (Fragment quotation graph)

Outline

Characteristics of email
Related work
Our conversation-based approach
Result
Conclusions and future work

Conclusions and Future Work

Conclusions
– The conversation structure is important and deserves more attention.
– Fragment quotation graph
– Clue words and ClueWordSummarizer
– Empirical evaluation: clue words frequently appear in important sentences; CWS is accurate.

Future Work

Refine the fragment quotation graph
User study on a different dataset
Try other ML classifiers
Integrate CWS and other methods
…

Thank you!

Questions?

