Summarizing discussion threads

Suzan Verberne • SAKE, 12-12-2016

About DISCOSUMO

• Automatic summarization of discussion forum threads

• Radboud University:
  - Antal van den Bosch
  - Suzan Verberne

• Tilburg University:
  - Emiel Krahmer
  - Sander Wubben

• Sanoma Media

Case: Viva forum

Problem

• Discussion forums on the web are an important source of information.
• But: forum threads can be extremely long.
• Finding information in a forum thread can be a challenge, especially when accessing the forum from a mobile device.

Can we serve mobile forum users better by showing them summaries of long threads?

Problem

How to summarize a forum thread?

• Question answering forums (e.g. StackOverflow):
  - the opening post is a (technical) question and the responses are answers to that question
  - the best answer may be selected by the forum community through voting

• Discussion forums (e.g. Viva, Autoweek, reddit):
  - opinions and experiences are shared
  - there is generally no such thing as the best answer
  - threads can consist of dozens or hundreds of posts

Case: Viva forum

Viva Forum (forum.viva.nl/)
• Dutch
• predominantly female user community
• 19 million page views per month (1.5 million unique visitors)
• readable for everyone; sample obtained from Sanoma
• most threads: experience and opinion sharing
• no hierarchy in the threads (‘flat structure’, but quotes possible)
• no liking/upvoting
• 21% of threads on the Viva forum have >= 20 posts

http://forum.viva.nl/


Approach

Post/sentence selection:
• Show the user only the most important information
• Hide the less relevant information in between

Demo

http://localhost:5000/

How is it made?

1. Collect example data
2. Train classifiers to learn which posts and sentences in a thread are the most important
3. Apply the classifier to unseen threads
4. Use a threshold on the classifier prediction to show more/fewer posts and sentences
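
The last step of the list above can be sketched as follows. This is a minimal illustration and not the DISCOSUMO code: it assumes the classifier outputs one importance score between 0 and 1 per post, and the function name `select_posts` and the example scores are hypothetical.

```python
from typing import List, Sequence


def select_posts(scores: Sequence[float], threshold: float) -> List[int]:
    """Return the indices of the posts whose predicted importance is at
    least `threshold`, in their original thread order."""
    return [i for i, score in enumerate(scores) if score >= threshold]


# Hypothetical classifier predictions for a six-post thread.
predicted = [0.91, 0.15, 0.62, 0.40, 0.08, 0.77]

# A high threshold shows few posts; a low threshold shows more.
short_summary = select_posts(predicted, threshold=0.7)
long_summary = select_posts(predicted, threshold=0.3)
```

Keeping the selected posts in thread order means the summary still reads as a shortened version of the conversation.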

Collect example data

• If you ask five humans to create a summary of a discussion thread, they create five different summaries

• But: a post selected by four of them is more important than a post selected by one of them

• We showed 106 long Viva threads to 10 different raters and asked them to select the posts that they consider to be the most important for the thread (number of selected posts decided by rater)

• 57 subjects participated in the study: all female, average age 27
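
The idea that a post picked by more raters is more important can be turned into a vote count per post. The sketch below uses toy data with five raters; the function name `count_votes` and the selections are hypothetical, and how the votes were actually turned into training labels is not stated on the slides.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def count_votes(selections: Iterable[Set[int]]) -> Dict[int, int]:
    """Count how many raters selected each post.

    Each element of `selections` is the set of post indices that one
    rater chose for their summary of the same thread.
    """
    votes: Counter = Counter()
    for selected in selections:
        votes.update(selected)
    return dict(votes)


# Toy example: five raters, each selecting a different number of posts.
raters = [{0, 2, 5}, {0, 5}, {0, 3}, {0, 2, 5}, {1}]
votes = count_votes(raters)

# Posts ordered from most to least often selected (ties by post index).
ranking = sorted(votes, key=lambda post: (-votes[post], post))
```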

…

Results: Usefulness of thread summarization

• Median usefulness score: 3 (on a 5-point scale)
• Standard deviation: 1.14 (averaged over threads)

• For 92% of the threads, at least one subject gave a usefulness score of 3 or higher

• For 62% of the threads, at least half of the subjects gave a usefulness score of 3 or higher

Results: Agreement between human raters

• Median number of posts selected per thread: 7, with a large standard deviation over raters (6.4)

• The agreement between the human summarizers was low (as expected)
  - Mean Cohen’s Kappa: 0.117
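
For reference, Cohen's Kappa for two raters can be computed as below. This is a minimal sketch for binary labels (post selected or not); the example labels are invented, and how the mean over raters was aggregated in the study is not stated on the slides.

```python
from typing import Sequence


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's Kappa for two raters' binary labels over the same posts
    (1 = selected for the summary, 0 = not selected)."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must label the same non-empty list of posts")
    n = len(a)
    # Observed agreement: fraction of posts with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each rater's own selection rate.
    p_a = sum(a) / n
    p_b = sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        # Both raters gave the same constant label; Kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented example: two raters labelling a ten-post thread.
rater_1 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
rater_2 = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0]
kappa = cohens_kappa(rater_1, rater_2)
```

Here the raters agree on 8 of 10 posts, but because most posts are unselected by both, chance agreement is already 0.58, so Kappa comes out near 0.52 rather than 0.8.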

What determines the importance of a post or sentence?

• Number of words (longer = more important)
• Position in the thread (early response = more important)
• Punctuation and emoticons (fewer = more important)
• Similarity to the complete thread (higher = more important)
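
The four cues above could be computed per post roughly as follows. This is an assumption-laden sketch: the feature names, the tokenizer, the use of `string.punctuation` as a stand-in for punctuation and emoticons, and bag-of-words cosine similarity are all illustrative choices, not the features as implemented in the project.

```python
import math
import re
import string
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def post_features(posts: List[str], index: int) -> Dict[str, float]:
    """Illustrative features for the post at `index` in a thread."""
    post = posts[index]
    tokens = tokenize(post)
    thread_counts = Counter(t for p in posts for t in tokenize(p))
    return {
        "n_words": float(len(tokens)),
        # 0.0 for the opening post, 1.0 for the last post.
        "relative_position": index / max(len(posts) - 1, 1),
        # Emoticons such as ":)" are counted through their characters.
        "n_punctuation": float(sum(ch in string.punctuation for ch in post)),
        "thread_similarity": cosine(Counter(tokens), thread_counts),
    }


# Invented three-post thread.
thread = [
    "I had the same problem with my landlord last year.",
    "Ok!!! :)",
    "My landlord fixed it after I sent a letter.",
]
features_first = post_features(thread, 0)
features_second = post_features(thread, 1)
```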

Evaluation setup

• 5-fold cross validation of threads

• Evaluation measures:
  - Cohen’s Kappa (agreement with humans)
  - Precision/Recall/F1 (using the human summaries as reference)

• Baselines:
  - Random: select 7 posts randomly
  - Position-based: select the first 7 posts
  - Length-based: select the 7 longest posts
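
The three baselines and the precision/recall/F1 measures are simple enough to sketch directly. The code below is an illustration under assumptions: post length is measured in whitespace-separated words, the reference is a single set of post indices, and all function names are hypothetical. How the multiple human summaries per thread were combined into a reference is not stated on the slides.

```python
import random
from typing import List, Set, Tuple

K = 7  # number of posts selected by each baseline


def random_baseline(n_posts: int, k: int = K, seed: int = 0) -> Set[int]:
    """Select k posts uniformly at random (seeded for reproducibility)."""
    rng = random.Random(seed)
    return set(rng.sample(range(n_posts), min(k, n_posts)))


def position_baseline(n_posts: int, k: int = K) -> Set[int]:
    """Select the first k posts of the thread."""
    return set(range(min(k, n_posts)))


def length_baseline(posts: List[str], k: int = K) -> Set[int]:
    """Select the k longest posts, measured in words."""
    ranked = sorted(range(len(posts)),
                    key=lambda i: len(posts[i].split()), reverse=True)
    return set(ranked[:k])


def precision_recall_f1(selected: Set[int],
                        reference: Set[int]) -> Tuple[float, float, float]:
    """Compare selected post indices against a reference summary."""
    true_positives = len(selected & reference)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented ten-post thread whose posts have the word counts listed here.
lengths = [5, 1, 9, 2, 8, 3, 7, 4, 6, 10]
posts = [" ".join(["word"] * n) for n in lengths]
reference = {0, 2, 8}  # hypothetical human selection

first_seven = position_baseline(len(posts))
three_longest = length_baseline(posts, k=3)
p, r, f1 = precision_recall_f1(first_seven, reference)
```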

Results of the automatic summarization

(human-human Kappa: 0.117)

                     Kappa     F1
random baseline     -0.085   22.8%
position baseline    0.060   35.9%
length baseline      0.092   38.2%
our model            0.138   45.2%


• Two different summaries can still both be good summaries
• Is it possible that readers are satisfied by a summary, even though the summary is different from the summary that they would create themselves?



• Pairwise (side-by-side) blind comparison and judgment by human subjects: a human summary vs. our model’s summary
  - human summary wins 48.3% of the comparisons
  - model summary wins 35.7% of the comparisons
  - tie: 16.1% of the comparisons

In 51.7% of the direct comparisons, the summary by our model is considered equal to or better than the human-made summary
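
Percentages like these come from tallying one judgment per comparison. The sketch below uses ten invented judgments, not the study's data; the labels "human", "model" and "tie" and the function name `comparison_shares` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

LABELS = ("human", "model", "tie")


def comparison_shares(judgments: Iterable[str]) -> Dict[str, float]:
    """Percentage of pairwise comparisons won by each side or tied."""
    counts = Counter(judgments)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        raise ValueError("no judgments to tally")
    return {label: 100.0 * counts[label] / total for label in LABELS}


# Invented judgments: 5 wins for the human summary, 3 for the model, 2 ties.
judgments = ["human"] * 5 + ["model"] * 3 + ["tie"] * 2
shares = comparison_shares(judgments)

# Share of comparisons in which the model is judged equal or better.
equal_or_better = shares["model"] + shares["tie"]
```

Computing the combined share from the raw counts rather than from rounded percentages avoids small rounding differences, which may be why 35.7% plus 16.1% does not add up exactly to the 51.7% reported above.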

Conclusions

• Subjects value the idea of thread summarization through post selection
• But inter-rater agreement for this task is low

• Despite the low agreement, we can automatically generate summaries that will in half of the cases be judged equal to or better than summaries created by another human

• Also, the agreement between the model and human subjects is not lower than the agreement among human subjects

• Two different summaries can both be good summaries

Thank you! Questions?

• http://discosumo.ruhosting.nl/
• http://sverberne.ruhosting.nl/

