1
Sentence-extractive automatic speech summarization and evaluation techniques
Makoto Hirohata, Yosuke Shinnaka, Koji Iwano, Sadaoki Furui
Presented by Yi-Ting Chen
2
Reference
• Makoto Hirohata, Yosuke Shinnaka, Koji Iwano, and Sadaoki Furui, "Sentence-extractive automatic speech summarization and evaluation techniques," Speech Communication, in press, corrected proof, available online 5 June 2006
3
Outline
• Introduction
• Sentence extraction methods
• Objective evaluation metrics
• Experiments
• Conclusion
4
Introduction
• Why summarize?
– Recognition errors cause transcriptions obtained from spontaneous speech to include irrelevant or incorrect information
– Spontaneous speech is ill-formed and usually includes redundant information
– Direct transcriptions are therefore not useful
• The authors previously proposed a two-stage summarization method, but it was confirmed that sentence extraction is more important than sentence compaction
• This paper investigates and evaluates sentence-extractive speech summarization techniques at a 10% summarization ratio
5
Sentence extraction methods
6
Sentence extraction methods
• Extraction using sentence location
– One hundred and sixty-nine presentations were used in the analysis
– The result shows that human subjects tend to extract sentences from the introduction and conclusion segments at a 10% summarization ratio
– There is no such tendency at a 50% ratio
7
Sentence extraction methods
• Extraction using sentence location
– The introduction and conclusion segments are estimated by the Hearst method using sentence cohesiveness
– The cohesiveness C(r) for each candidate segmentation boundary r is measured by a cosine value:
– The segmentation boundary ending the introduction part is the first boundary from the beginning of the presentation where the cohesiveness falls below a preset threshold of 0.1
C(r) = (V_b(r) · V_a(r)) / (|V_b(r)| |V_a(r)|)

where V_b(r) and V_a(r) are the word-frequency vectors before and after boundary r
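The cohesiveness-based boundary detection described above can be sketched as follows; `end_of_introduction` and its arguments are illustrative names for this sketch, not identifiers from the paper, and the 0.1 threshold is the value quoted on the slide.

```python
import numpy as np

def cohesiveness(v_b, v_a):
    """Cosine between the word-frequency vectors before (v_b) and
    after (v_a) a candidate segmentation boundary r."""
    v_b = np.asarray(v_b, dtype=float)
    v_a = np.asarray(v_a, dtype=float)
    return float(v_b @ v_a / (np.linalg.norm(v_b) * np.linalg.norm(v_a)))

def end_of_introduction(sentence_vectors, threshold=0.1):
    """Return the first boundary index where cohesiveness drops below
    the preset threshold (hypothetical helper, assuming per-sentence
    word-count vectors as input)."""
    vecs = np.asarray(sentence_vectors, dtype=float)
    for r in range(1, len(vecs)):
        v_b = vecs[:r].sum(axis=0)  # word counts before boundary r
        v_a = vecs[r:].sum(axis=0)  # word counts after boundary r
        if cohesiveness(v_b, v_a) < threshold:
            return r
    return None
```

For example, four sentences whose vocabulary switches halfway through would yield a boundary at index 2.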
8
Sentence extraction methods
• Extraction using confidence and linguistic scores
– Confidence score C(i): a logarithmic value of the posterior probability c(w_n) for each transcribed word, averaged over the N_i words of sentence i
– Linguistic score L(i): a trigram log probability l(w_n), averaged over the words of sentence i
– The combined sentence score is

  Score(i) = S(i) + λ_C C(i) + λ_L L(i)

  C(i) = (1/N_i) Σ_{n=1..N_i} c(w_n)

  L(i) = (1/N_i) Σ_{n=1..N_i} l(w_n) = (1/N_i) Σ_{n=1..N_i} log P(w_n | w_{n-2} w_{n-1})
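The score combination above can be sketched as follows; the function and parameter names (`lam_c`, `lam_l`, etc.) are illustrative, and the inputs are assumed to come from the recognizer (per-word log posteriors and trigram probabilities).

```python
import math

def confidence_score(word_log_posteriors):
    # C(i): average of the per-word log posterior probabilities c(w_n)
    return sum(word_log_posteriors) / len(word_log_posteriors)

def linguistic_score(trigram_probs):
    # L(i): average trigram log probability log P(w_n | w_{n-2} w_{n-1})
    return sum(math.log(p) for p in trigram_probs) / len(trigram_probs)

def combined_score(sig, conf, ling, lam_c=1.0, lam_l=1.0):
    # Score(i) = S(i) + lam_C * C(i) + lam_L * L(i); the weights are tuned
    return sig + lam_c * conf + lam_l * ling
```

Setting a weight to zero drops the corresponding score from the combination, which is how the SIG/LSA/DIM variants with and without the confidence and linguistic terms differ.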
9
Sentence extraction methods
• Extraction using a significance score

  S(i) = (1/N_i) Σ_{n=1..N_i} I(w_n),  I(w) = f(w) log(F_A / F_w)

  where f(w) is the number of occurrences of word w in the presentation, F_w is its frequency in a large corpus, and F_A is the corpus size; the combined score Score(i) = S(i) + λ_C C(i) + λ_L L(i) is used as before

• Extraction using latent semantic analysis
– Each element of the word–sentence matrix is a weighted word frequency, a_ij = f_ij c_i
– Sentences are scored by the right singular vectors: S(i) = v_ik for k = 1, 2, …, and the sentence with the largest index value is extracted for each singular vector
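The significance score can be sketched as below, assuming the background-corpus statistics (`corpus_freq` for F_w, `corpus_size` for F_A) are supplied; these argument names are mine, not the paper's.

```python
import math
from collections import Counter

def significance_scores(sentences, corpus_freq, corpus_size):
    """S(i) = (1/N_i) * sum_n I(w_n) with I(w) = f(w) * log(F_A / F_w),
    where f(w) counts occurrences of w in the presentation itself."""
    f = Counter(w for sent in sentences for w in sent)  # f(w) in the talk
    return [
        sum(f[w] * math.log(corpus_size / corpus_freq[w]) for w in sent)
        / len(sent)
        for sent in sentences
    ]
```

Words that are frequent in the presentation but rare in the corpus dominate the score, so sentences carrying topic-specific vocabulary rank highest.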
10
Sentence extraction methods
• Extraction using dimension reduction based on SVD
– The weighted word-frequency vectors form the matrix A = [a_1 a_2 a_3 … a_M]
– SVD of A gives each sentence a weighted singular-value vector
  v̂_i = (σ_1 v_i1, σ_2 v_i2, …, σ_N v_iN)
– Dimension reduction keeps the first K components: (σ_1 v_i1, …, σ_K v_iK), the reduced-dimension vector
– The sentence score is the norm of the reduced vector:

  S(i) = sqrt( Σ_{k=1..K} (σ_k v_ik)² )

– The combined score Score(i) = S(i) + λ_C C(i) + λ_L L(i) is used as before
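The dimension-reduction score can be sketched with NumPy's SVD; `dim_scores` is an illustrative name, and A is assumed to be the weighted word-by-sentence frequency matrix with one column per sentence.

```python
import numpy as np

def dim_scores(A, K):
    """S(i) = sqrt(sum_{k=1..K} (sigma_k * v_ik)^2): the norm of each
    sentence's K-dimensional weighted singular-value vector."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    V = Vt.T                          # row i holds (v_i1, v_i2, ...)
    weighted = V[:, :K] * sigma[:K]   # sigma_k * v_ik
    return np.sqrt((weighted ** 2).sum(axis=1))
```

The top-scoring sentences are those whose content is spread strongly across the K dominant latent topics, rather than concentrated in a single singular vector as in the plain LSA method.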
11
Objective evaluation metrics
• Summarization accuracy– All the human summaries are merged into a single word network
– Word accuracy of the automatic summary is then measured as a summarization accuracy in comparison with the closest word string extracted from the word network (SumACCY)
– Problem: the variation between manual summaries is so large that the network accepts inappropriate summaries
– Word accuracy computed against the manual summaries individually was therefore proposed (SumACCY-E: SumACCY-E/max, SumACCY-E/avg)
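A simplified sketch of the per-summary variant: word accuracy against each manual summary via word-level edit distance, then the max or average. Note this is a simplification; SumACCY proper aligns against the merged word network, and the function names here are mine.

```python
def word_accuracy(hyp, ref):
    """Word accuracy = (N - sub - del - ins) / N against one manual
    summary, where N = len(ref), via word-level Levenshtein distance."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution/match
    return (n - d[n][m]) / n

def sumaccy_e(hyp, refs, mode="max"):
    # SumACCY-E/max: best accuracy over the individual manual summaries;
    # SumACCY-E/avg: their average.
    accs = [word_accuracy(hyp, ref) for ref in refs]
    return max(accs) if mode == "max" else sum(accs) / len(accs)
```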
12
Objective evaluation metrics
• Sentence F-measure
– Sentence recall/precision or F-measure is commonly used in evaluating sentence-extractive text summarization
– Since sentence boundaries are not explicitly indicated in input speech, extraction of a sentence in the recognition result is considered as extraction of one or more sentences in the manual summary when they overlap in 50% or more of their words
– F-measure/max, F-measure/ave.

  F-measure = 2 · recall · precision / (recall + precision)
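The sentence F-measure reduces to a set comparison once recognized sentences have been mapped to manual-summary sentences; this sketch assumes that 50%-overlap mapping has already been done, and the function name is illustrative.

```python
def sentence_f_measure(extracted, reference):
    """F-measure = 2 * recall * precision / (recall + precision) over
    sets of sentence ids from the automatic and manual summaries."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    if hits == 0:
        return 0.0
    precision = hits / len(extracted)
    recall = hits / len(reference)
    return 2 * recall * precision / (recall + precision)
```

The /max and /ave. variants score against each human summary separately and take the maximum or the average, mirroring SumACCY-E.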
13
Objective evaluation metrics
• N-gram recall
– ROUGE-N is an N-gram recall between an automatic summary and a set of manual summaries:

  ROUGE-N = [ Σ_{S∈S_H} Σ_{g_n∈S} C_m(g_n) ] / [ Σ_{S∈S_H} Σ_{g_n∈S} C(g_n) ]

– S_H is the set of manual summaries, S an individual manual summary, g_n an N-gram, C(g_n) the number of g_n's in the manual summary, and C_m(g_n) the number of co-occurrences of g_n in the manual summary and the automatic summary
– 1-grams, 2-grams, and 3-grams are used
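The ROUGE-N formula above can be sketched directly, with matches clipped per manual summary so an n-gram cannot be counted more often than it appears in the automatic summary; helper names are mine.

```python
from collections import Counter

def ngram_counts(words, n):
    # Multiset of n-grams g_n occurring in a word sequence
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def rouge_n(summary, references, n):
    """ROUGE-N: matched (clipped) n-gram counts C_m(g_n) over total
    reference n-gram counts C(g_n), summed across the manual summaries."""
    summ = ngram_counts(summary, n)
    matched = total = 0
    for ref in references:
        ref_counts = ngram_counts(ref, n)
        total += sum(ref_counts.values())
        matched += sum(min(c, summ[g]) for g, c in ref_counts.items())
    return matched / total
```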
14
Experiments
• Experimental conditions
– 30 presentations by 20 males and 10 females in the CSJ were automatically summarized at a 10% summarization ratio
– Mean word recognition accuracy was 69%
– Sentence boundaries in the recognition results were automatically determined using language models, achieving 72% recall and 75% precision
• Subjective evaluation
– 180 automatic summaries (30 presentations × 6 summarization methods) were evaluated by 12 human subjects
– Methods: SIG, LSA, DIM, SIG+IC, LSA+IC, and DIM+IC
– DIM: K set to 5
15
Experiments
• Correlation between subjective and objective evaluation results
16
17
Experiments
• Correlation between subjective and objective evaluation results
18
19
Experiments
• Objective evaluation results for various summarization methods
– The weight factors were optimized a posteriori so that the objective evaluation score was maximized
– The parameter K was also set a posteriori to 1
(Figure: objective evaluation scores plotted against word recognition accuracy)
20
Experiments
• Objective evaluation results for various summarization methods
(Figure: objective evaluation scores plotted against word recognition accuracy)
21
Conclusion
• A sentence extraction method using dimension reduction based on SVD and another method using sentence location information are proposed
• Among the objective evaluation metrics, summarization accuracy, sentence F-measure, and 2- and 3-gram recall were found to be effective at a 10% summarization ratio
• It was confirmed that the summarization method using sentence location improves summarization results