
Summarizing discussion threads

Summarizing discussion threads Suzan Verberne SAKE, 12-12-2016
Transcript
Page 1: Summarizing discussion threads

Summarizing discussion threads
• Suzan Verberne
• SAKE, 12-12-2016

Page 2: Summarizing discussion threads

About DISCOSUMO

• Automatic summarization of discussion forum threads

• Radboud University:
  - Antal van den Bosch
  - Suzan Verberne

• Tilburg University:
  - Emiel Krahmer
  - Sander Wubben

• Sanoma Media

Page 3: Summarizing discussion threads

Case: Viva forum

Page 4: Summarizing discussion threads

Problem

• Discussion forums on the web are an important source of information.
• But: forum threads can be extremely long
• Finding information in a forum thread can be a challenge, especially when accessing the forum from a mobile device

Can we serve mobile forum users better by showing them summaries of long threads?

Page 5: Summarizing discussion threads

Problem

How to summarize a forum thread?

• Question answering forums (e.g. StackOverflow):
  - the opening post is a (technical) question and the responses are answers to that question
  - the best answer may be selected by the forum community through voting

• Discussion forums (e.g. Viva, Autoweek, reddit):
  - opinions and experiences are shared
  - there is generally no such thing as the best answer
  - threads can consist of dozens or hundreds of posts

Page 6: Summarizing discussion threads

Case: Viva forum

Viva Forum (forum.viva.nl/)
• Dutch
• predominantly female user community
• 19 million page views per month (1.5 million unique visitors)
• readable for everyone; sample obtained from Sanoma
• most threads: experience and opinion sharing
• no hierarchy in the threads (‘flat structure’, but quotes possible)
• no liking/upvoting

• 21% of threads on Viva forum have >= 20 posts

Page 7: Summarizing discussion threads

Approach

Post/sentence selection:
• Show the user only the most important information
• Hide the less relevant information in between

Page 9: Summarizing discussion threads

How is it made?

1. Collect example data
2. Train classifiers to learn which posts and sentences in a thread are the most important
3. Apply the classifiers to unseen threads
4. Use a threshold on the classifier prediction to show more/fewer posts and sentences
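The thresholding step can be illustrated with a minimal Python sketch. It assumes the classifier outputs one importance score per post; the function name `select_posts` and the example scores are hypothetical and are not taken from the DISCOSUMO system.

```python
from typing import List


def select_posts(scores: List[float], threshold: float) -> List[int]:
    """Return the indices of posts whose predicted importance reaches the threshold."""
    return [i for i, score in enumerate(scores) if score >= threshold]


# Made-up classifier scores for a six-post thread (illustration only).
scores = [0.91, 0.15, 0.48, 0.72, 0.05, 0.33]

short_summary = select_posts(scores, threshold=0.7)  # high threshold: fewer posts
long_summary = select_posts(scores, threshold=0.3)   # low threshold: more posts
```

Lowering the threshold lengthens the summary without retraining, which is one way a reader-facing "show more/fewer" control could be backed.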

Page 10: Summarizing discussion threads

Collect example data

• If you ask five humans to create a summary of a discussion thread, they create five different summaries

• But: a post selected by four of them is more important than a post selected by one of them

• We showed each of 106 long Viva threads to 10 different raters and asked them to select the posts that they consider to be the most important for the thread (number of selected posts decided by rater)

• 57 subjects participated in the study: all female, average age 27
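The idea that a post chosen by more raters is more important can be sketched as a simple vote count. This is an illustrative assumption about how the selections might be aggregated; the function `count_votes` and the example selections are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def count_votes(selections: List[Set[int]]) -> Dict[int, int]:
    """Count, for each post index, how many raters selected that post."""
    votes: Counter = Counter()
    for selected in selections:
        votes.update(selected)
    return dict(votes)


# Made-up selections by five raters for one thread (post indices).
selections = [{0, 2, 5}, {0, 5}, {0, 1, 5, 7}, {0, 3}, {2, 5}]
votes = count_votes(selections)
# Posts 0 and 5 were each picked by four raters, post 7 by only one.
```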

Page 11: Summarizing discussion threads

Page 12: Summarizing discussion threads

Results: Usefulness of thread summarization

• Median usefulness score: 3 (on a 5-point scale)
• Standard deviation: 1.14 (averaged over threads)

• For 92% of the threads, at least one subject gave a usefulness score of 3 or higher

• For 62% of the threads, at least half of the subjects gave a usefulness score of 3 or higher

Page 13: Summarizing discussion threads

Results: Agreement between human raters

• Median number of posts selected per thread: 7, with a large standard deviation over raters (6.4)

• The agreement between the human summarizers was low (as expected)
  Mean Cohen’s Kappa: 0.117
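For readers unfamiliar with the measure, Cohen's Kappa corrects the observed agreement between two raters for the agreement expected by chance. The sketch below computes it for two raters who each mark every post in a thread as selected (1) or not selected (0); the example labels are made up, and how the study averaged Kappa over rater pairs is not shown here.

```python
from typing import Sequence


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's Kappa for two raters giving binary (0/1) labels to the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a = sum(a) / n  # fraction of posts selected by rater A
    p_b = sum(b) / n  # fraction of posts selected by rater B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
    if expected == 1.0:
        return 1.0  # both raters gave the same constant label to every item
    return (observed - expected) / (1 - expected)


# Made-up selections for a ten-post thread.
rater_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
rater_b = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0]
kappa = cohens_kappa(rater_a, rater_b)
```

A Kappa of 0 means agreement no better than chance and 1 means perfect agreement, so a mean of 0.117 indicates that raters overlapped only slightly more than chance would predict.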

Page 14: Summarizing discussion threads

What determines the importance of a post or sentence?

• Number of words (longer = more important)
• Position in the thread (early response = more important)
• Punctuation and emoticons (fewer = more important)
• Similarity to the complete thread (higher = more important)
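A rough sketch of how such features might be computed per post is given below. The exact feature definitions used in the project are not given in the slides, so everything here is an assumption: the tokenizer, the punctuation pattern (which does not detect emoticons as such), and the use of bag-of-words cosine similarity for "similarity to the complete thread".

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def post_features(posts: List[str]) -> List[Dict[str, float]]:
    """Compute four illustrative features for every post in a thread."""
    thread_counts = Counter(t for post in posts for t in tokenize(post))
    features = []
    for position, post in enumerate(posts):
        tokens = tokenize(post)
        features.append({
            "n_words": len(tokens),                          # post length
            "position": position,                            # 0 = opening post
            "n_punct": len(re.findall(r"[!?.,;:]", post)),   # punctuation marks
            "sim_to_thread": cosine(Counter(tokens), thread_counts),
        })
    return features


# Made-up three-post thread.
thread = ["I love my cat. My cat is great!", "ok", "Cats are great, I agree."]
features = post_features(thread)
```

Feature dictionaries like these could then be fed to any standard classifier; the slides do not say which learner was used.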

Page 15: Summarizing discussion threads

Evaluation setup

• 5-fold cross validation of threads

• Evaluation measures:
  - Cohen’s Kappa (agreement with humans)
  - Precision/Recall/F1 (using the human summaries as reference)

• Baselines:
  - Random: select 7 posts randomly
  - Position-based: select the first 7 posts
  - Length-based: select the 7 longest posts
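The three baselines and the precision/recall/F1 computation can be sketched as follows. This is a minimal illustration, not the evaluation code of the study: post length is assumed to be measured in whitespace-separated words, the random seed is arbitrary, and how the ten human summaries per thread were combined into a reference is not specified in the slides.

```python
import random
from typing import Iterable, List, Tuple


def position_baseline(posts: List[str], k: int = 7) -> List[int]:
    """Select the first k posts of the thread."""
    return list(range(min(k, len(posts))))


def length_baseline(posts: List[str], k: int = 7) -> List[int]:
    """Select the k longest posts (length assumed to be the word count)."""
    order = sorted(range(len(posts)), key=lambda i: len(posts[i].split()), reverse=True)
    return sorted(order[:k])


def random_baseline(posts: List[str], k: int = 7, seed: int = 0) -> List[int]:
    """Select k posts uniformly at random (seeded for reproducibility)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(len(posts)), min(k, len(posts))))


def precision_recall_f1(selected: Iterable[int], reference: Iterable[int]) -> Tuple[float, float, float]:
    """Compare a selected set of posts with a reference (human) selection."""
    selected, reference = set(selected), set(reference)
    true_positives = len(selected & reference)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up four-post thread and a made-up human reference selection.
posts = ["a b c", "a", "a b c d e", "a b"]
human_reference = [2, 3]
scores = precision_recall_f1(length_baseline(posts, k=2), human_reference)
```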

Page 16: Summarizing discussion threads

Results of the automatic summarization

(human-human Kappa: 0.117)

                     Kappa     F1
random baseline     -0.085    22.8%
position baseline    0.060    35.9%
length baseline      0.092    38.2%
our model            0.138    45.2%

Page 17: Summarizing discussion threads

Results of the automatic summarization

• Two different summaries can still both be good summaries
• Is it possible that readers are satisfied by a summary, even though the summary is different from the summary that they would create themselves?

Pairwise (side-by-side) blind comparison and judgment by human subjects

Page 18: Summarizing discussion threads
Page 19: Summarizing discussion threads

Results of the automatic summarization

• Pairwise (side-by-side) blind comparison and judgment by human subjects: a human summary vs. our model’s summary

  - human summary wins 48.3% of the comparisons
  - model summary wins 35.7% of the comparisons
  - tie: 16.1% of the comparisons

In 51.7% of the direct comparisons, the summary by our model is considered equal to or better than the human-made summary

Page 20: Summarizing discussion threads

Conclusions

• Subjects value the idea of thread summarization through post selection
• But inter-rater agreement for this task is low

• Despite the low agreement, we can automatically generate summaries that will in half of the cases be judged equal to or better than summaries created by another human

• Also, the agreement between the model and human subjects is not lower than the agreement among human subjects

• Two different summaries can both be good summaries

Page 21: Summarizing discussion threads

Thank you! Questions?

• http://discosumo.ruhosting.nl/
• http://sverberne.ruhosting.nl/

