1
Sentence-extractive automatic speech summarization and evaluation techniques
Makoto Hirohata, Yosuke Shinnaka, Koji Iwano, Sadaoki Furui
Presented by Yi-Ting Chen
2
Reference
• Makoto Hirohata, Yosuke Shinnaka, Koji Iwano, and Sadaoki Furui, "Sentence-extractive automatic speech summarization and evaluation techniques," Speech Communication, in press, corrected proof, available online 5 June 2006
3
Outline
• Introduction
• Sentence extraction methods
• Objective evaluation metrics
• Experiments
• Conclusion
4
Introduction
• Why summarize?
– Recognition errors cause transcriptions obtained from spontaneous speech to include irrelevant or incorrect information
– Spontaneous speech is ill-formed and usually includes redundant information
– Direct transcriptions are therefore not useful
• The authors previously proposed a two-stage summarization method, but it was confirmed that sentence extraction is more important than sentence compaction
• This paper investigates and evaluates sentence-extractive speech summarization techniques at a 10% summarization ratio
5
Sentence extraction methods
6
Sentence extraction methods
• Extraction using sentence location
– One hundred and sixty-nine presentations were used in the analysis
– The result shows that human subjects tend to extract sentences from the introduction and conclusion segments at a 10% summarization ratio
– There is no such tendency at a 50% ratio
7
Sentence extraction methods
• Extraction using sentence location
– The introduction and conclusion segments are estimated by the Hearst method using sentence cohesiveness
– The cohesiveness C(r) for each candidate segmentation boundary r is measured by a cosine value:
– The segmentation boundary ending the introduction part is the first boundary from the beginning of the presentation where the cohesiveness falls below a preset threshold of 0.1
C(r) = (V_b(r) · V_a(r)) / (|V_b(r)| |V_a(r)|)

where V_b(r) and V_a(r) are the word-frequency vectors before and after boundary r
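The cohesiveness-based boundary detection described above can be sketched as follows; `end_of_introduction` and its arguments are illustrative names for this sketch, not identifiers from the paper, and the 0.1 threshold is the value quoted on the slide.

```python
import numpy as np

def cohesiveness(v_b, v_a):
    """Cosine between the word-frequency vectors before (v_b) and
    after (v_a) a candidate segmentation boundary r."""
    v_b = np.asarray(v_b, dtype=float)
    v_a = np.asarray(v_a, dtype=float)
    return float(v_b @ v_a / (np.linalg.norm(v_b) * np.linalg.norm(v_a)))

def end_of_introduction(sentence_vectors, threshold=0.1):
    """Return the first boundary index where cohesiveness drops below
    the preset threshold (hypothetical helper, assuming per-sentence
    word-count vectors as input)."""
    vecs = np.asarray(sentence_vectors, dtype=float)
    for r in range(1, len(vecs)):
        v_b = vecs[:r].sum(axis=0)  # word counts before boundary r
        v_a = vecs[r:].sum(axis=0)  # word counts after boundary r
        if cohesiveness(v_b, v_a) < threshold:
            return r
    return None
```

For example, four sentences whose vocabulary switches halfway through would yield a boundary at index 2.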
8
Sentence extraction methods
• Extraction using confidence and linguistic scores
– Confidence score C(i): a logarithmic value of the posterior probability c(w_n) for each transcribed word, averaged over the N_i words of sentence i
– Linguistic score L(i): a trigram log probability l(w_n), averaged over the words of sentence i
– The combined sentence score is

  Score(i) = S(i) + λ_C C(i) + λ_L L(i)

  C(i) = (1/N_i) Σ_{n=1..N_i} c(w_n)

  L(i) = (1/N_i) Σ_{n=1..N_i} l(w_n) = (1/N_i) Σ_{n=1..N_i} log P(w_n | w_{n-2} w_{n-1})
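The score combination above can be sketched as follows; the function and parameter names (`lam_c`, `lam_l`, etc.) are illustrative, and the inputs are assumed to come from the recognizer (per-word log posteriors and trigram probabilities).

```python
import math

def confidence_score(word_log_posteriors):
    # C(i): average of the per-word log posterior probabilities c(w_n)
    return sum(word_log_posteriors) / len(word_log_posteriors)

def linguistic_score(trigram_probs):
    # L(i): average trigram log probability log P(w_n | w_{n-2} w_{n-1})
    return sum(math.log(p) for p in trigram_probs) / len(trigram_probs)

def combined_score(sig, conf, ling, lam_c=1.0, lam_l=1.0):
    # Score(i) = S(i) + lam_C * C(i) + lam_L * L(i); the weights are tuned
    return sig + lam_c * conf + lam_l * ling
```

Setting a weight to zero drops the corresponding score from the combination, which is how the SIG/LSA/DIM variants with and without the confidence and linguistic terms differ.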
9
Sentence extraction methods
• Extraction using a significance score

  S(i) = (1/N_i) Σ_{n=1..N_i} I(w_n),  I(w) = f(w) log(F_A / F_w)

  where f(w) is the number of occurrences of word w in the presentation, F_w is its frequency in a large corpus, and F_A is the corpus size; the combined score Score(i) = S(i) + λ_C C(i) + λ_L L(i) is used as before

• Extraction using latent semantic analysis
– Each element of the word–sentence matrix is a weighted word frequency, a_ij = f_ij c_i
– Sentences are scored by the right singular vectors: S(i) = v_ik for k = 1, 2, …, and the sentence with the largest index value is extracted for each singular vector
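The significance score can be sketched as below, assuming the background-corpus statistics (`corpus_freq` for F_w, `corpus_size` for F_A) are supplied; these argument names are mine, not the paper's.

```python
import math
from collections import Counter

def significance_scores(sentences, corpus_freq, corpus_size):
    """S(i) = (1/N_i) * sum_n I(w_n) with I(w) = f(w) * log(F_A / F_w),
    where f(w) counts occurrences of w in the presentation itself."""
    f = Counter(w for sent in sentences for w in sent)  # f(w) in the talk
    return [
        sum(f[w] * math.log(corpus_size / corpus_freq[w]) for w in sent)
        / len(sent)
        for sent in sentences
    ]
```

Words that are frequent in the presentation but rare in the corpus dominate the score, so sentences carrying topic-specific vocabulary rank highest.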
10
Sentence extraction methods
• Extraction using dimension reduction based on SVD
– The weighted word-frequency vectors form the matrix A = [a_1 a_2 a_3 … a_M]
– SVD of A gives each sentence a weighted singular-value vector
  v̂_i = (σ_1 v_i1, σ_2 v_i2, …, σ_N v_iN)
– Dimension reduction keeps the first K components: (σ_1 v_i1, …, σ_K v_iK), the reduced-dimension vector
– The sentence score is the norm of the reduced vector:

  S(i) = sqrt( Σ_{k=1..K} (σ_k v_ik)² )

– The combined score Score(i) = S(i) + λ_C C(i) + λ_L L(i) is used as before
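The dimension-reduction score can be sketched with NumPy's SVD; `dim_scores` is an illustrative name, and A is assumed to be the weighted word-by-sentence frequency matrix with one column per sentence.

```python
import numpy as np

def dim_scores(A, K):
    """S(i) = sqrt(sum_{k=1..K} (sigma_k * v_ik)^2): the norm of each
    sentence's K-dimensional weighted singular-value vector."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    V = Vt.T                          # row i holds (v_i1, v_i2, ...)
    weighted = V[:, :K] * sigma[:K]   # sigma_k * v_ik
    return np.sqrt((weighted ** 2).sum(axis=1))
```

The top-scoring sentences are those whose content is spread strongly across the K dominant latent topics, rather than concentrated in a single singular vector as in the plain LSA method.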
11
Objective evaluation metrics
• Summarization accuracy– All the human summaries are merged into a single word network
– Word accuracy of the automatic summary is then measured as a summarization accuracy in comparison with the closest word string extracted from the word network (SumACCY)
– Problem: the variation between manual summaries is so large that the network accepts inappropriate summaries
– Word accuracy computed against the manual summaries individually was therefore proposed (SumACCY-E: SumACCY-E/max, SumACCY-E/avg)
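A simplified sketch of the per-summary variant: word accuracy against each manual summary via word-level edit distance, then the max or average. Note this is a simplification; SumACCY proper aligns against the merged word network, and the function names here are mine.

```python
def word_accuracy(hyp, ref):
    """Word accuracy = (N - sub - del - ins) / N against one manual
    summary, where N = len(ref), via word-level Levenshtein distance."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution/match
    return (n - d[n][m]) / n

def sumaccy_e(hyp, refs, mode="max"):
    # SumACCY-E/max: best accuracy over the individual manual summaries;
    # SumACCY-E/avg: their average.
    accs = [word_accuracy(hyp, ref) for ref in refs]
    return max(accs) if mode == "max" else sum(accs) / len(accs)
```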
12
Objective evaluation metrics
• Sentence F-measure
– Sentence recall/precision or F-measure is commonly used in evaluating sentence-extractive text summarization
– Since sentence boundaries are not explicitly indicated in input speech, extraction of a sentence in the recognition result is considered as extraction of one or more sentences in the manual summary when they overlap in 50% or more of their words
– F-measure/max, F-measure/ave.

  F-measure = 2 · recall · precision / (recall + precision)
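The sentence F-measure reduces to a set comparison once recognized sentences have been mapped to manual-summary sentences; this sketch assumes that 50%-overlap mapping has already been done, and the function name is illustrative.

```python
def sentence_f_measure(extracted, reference):
    """F-measure = 2 * recall * precision / (recall + precision) over
    sets of sentence ids from the automatic and manual summaries."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    if hits == 0:
        return 0.0
    precision = hits / len(extracted)
    recall = hits / len(reference)
    return 2 * recall * precision / (recall + precision)
```

The /max and /ave. variants score against each human summary separately and take the maximum or the average, mirroring SumACCY-E.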
13
Objective evaluation metrics
• N-gram recall
– ROUGE-N is an N-gram recall between an automatic summary and a set of manual summaries:

  ROUGE-N = [ Σ_{S∈S_H} Σ_{g_n∈S} C_m(g_n) ] / [ Σ_{S∈S_H} Σ_{g_n∈S} C(g_n) ]

– S_H is the set of manual summaries, S an individual manual summary, g_n an N-gram, C(g_n) the number of g_n's in the manual summary, and C_m(g_n) the number of co-occurrences of g_n in the manual summary and the automatic summary
– 1-grams, 2-grams, and 3-grams are used
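The ROUGE-N formula above can be sketched directly, with matches clipped per manual summary so an n-gram cannot be counted more often than it appears in the automatic summary; helper names are mine.

```python
from collections import Counter

def ngram_counts(words, n):
    # Multiset of n-grams g_n occurring in a word sequence
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def rouge_n(summary, references, n):
    """ROUGE-N: matched (clipped) n-gram counts C_m(g_n) over total
    reference n-gram counts C(g_n), summed across the manual summaries."""
    summ = ngram_counts(summary, n)
    matched = total = 0
    for ref in references:
        ref_counts = ngram_counts(ref, n)
        total += sum(ref_counts.values())
        matched += sum(min(c, summ[g]) for g, c in ref_counts.items())
    return matched / total
```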
14
Experiments
• Experimental conditions
– 30 presentations by 20 males and 10 females in the CSJ were automatically summarized at a 10% summarization ratio
– Mean word recognition accuracy was 69%
– Sentence boundaries in the recognition results were automatically determined using language models, achieving 72% recall and 75% precision
• Subjective evaluation
– 180 automatic summaries (30 presentations × 6 summarization methods) were evaluated by 12 human subjects
– Methods: SIG, LSA, DIM, SIG+IC, LSA+IC, and DIM+IC
– DIM: K set to 5
15
Experiments
• Correlation between subjective and objective evaluation results
16
17
Experiments
• Correlation between subjective and objective evaluation results
18
19
Experiments
• Objective evaluation results for various summarization methods
– The weight factors were optimized a posteriori so that the objective evaluation score was maximized
– The parameter K was also set a posteriori to 1
(Figure: objective evaluation scores plotted against word recognition accuracy)
20
Experiments
• Objective evaluation results for various summarization methods
(Figure: objective evaluation scores plotted against word recognition accuracy)
21
Conclusion
• A sentence extraction method using dimension reduction based on SVD and another method using sentence location information are proposed
• Among the objective evaluation metrics, summarization accuracy, sentence F-measure, and 2- and 3-gram recall were found to be effective at a 10% summarization ratio
• It was confirmed that the summarization method using sentence location improves summarization results