
Combining Prosodic and Text Features for Segmentation of Mandarin Broadcast News

Gina-Anne Levow, University of Chicago. SIGHAN, July 25, 2004.
Page 1: Combining Prosodic and Text Features for Segmentation of Mandarin Broadcast News

Combining Prosodic and Text Features for Segmentation of Mandarin Broadcast News
Gina-Anne Levow
University of Chicago
SIGHAN
July 25, 2004


Roadmap

• The Problem: Mandarin Story Segmentation
• The Tools: Prosodic and Text Cues
  – Mandarin Chinese
• Individual Results
• Integrating Cues
• Conclusion & Future Work


The Problem: Mandarin Speech Topic Segmentation

• Separate audio stream into component topics


Why Segment?

• Enables language understanding tasks
  – Information Retrieval
    • Only regions of interest
  – Summarization
    • Cover all main topics
  – Reference Resolution
    • Pronouns tend to refer within segments


The Challenge

• How do we define/measure topicality?
  – Are two regions on the same topic?
  – Fundamentally requires full understanding
    • How can we approach with partial understanding?
• How do we identify boundaries sharply?
  – Association of sentences may be ambiguous
    • Especially “filler”


The Tools: Prosodic and Text Cues

• Represent local changes at boundaries with audio
  – Silence!, speaker change, pitch, loudness, rate (GHN, AT&T00)
• Represent topicality with text
  – Component words in audio stream
    • Possibly noisy
    • Many possible models (Hearst 94, Beeferman 99, ...)
• Combining Prosody and Text
  – Human annotators more accurate, confident if use BOTH transcribed text and original audio!! (Swerts 97)
  – English broadcast news (Tur et al., 2001)


Data and Processing

• Broadcast News
  – Topic Detection and Tracking TDT3 corpus
  – Voice of America broadcast news
    • ASR transcription
    • Manually segmented: known boundaries
  – ~4,000 stories, ~750K words
• Acoustic analysis (Praat)
  – Automatic pitch, intensity tracking
    • Smoothed, speaker-normalized, per-word
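
The "speaker-normalized, per-word" step can be illustrated with a minimal sketch. The slides do not say which normalization was used; the z-score per speaker below is an assumption, and the field names `speaker` and `pitch` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def speaker_normalize(words):
    """Return per-word pitch z-scores, normalized within each speaker.

    `words` is a list of dicts with hypothetical keys:
      'speaker' - speaker label for the word
      'pitch'   - mean pitch of the word (e.g. in Hz)
    The z-score is an assumed normalization, not the one confirmed by the slides.
    """
    by_speaker = defaultdict(list)
    for w in words:
        by_speaker[w["speaker"]].append(w["pitch"])

    stats = {}
    for spk, vals in by_speaker.items():
        sd = pstdev(vals)
        # Guard against a zero spread (e.g. a speaker with a single word).
        stats[spk] = (mean(vals), sd if sd > 0 else 1.0)

    return [
        (w["pitch"] - stats[w["speaker"]][0]) / stats[w["speaker"]][1]
        for w in words
    ]
```

Normalizing within each speaker keeps a low-pitched speaker's ordinary words from looking like segment-final pitch drops.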


Acoustic-Prosodic Cues

• Languages differ in use of intonation
  – E.g. English: declarative fall, question rise
  – Chinese: pitch contour determines word meaning
• At segment boundaries???
  – Surprisingly similar, though not identical
  – Significantly lower pitch at end of segment
  – Significantly lower amplitude at end of segment
  – Significantly longer duration at end of segment


Acoustic-Prosodic Contrasts

[Figure: Mandarin normalized pitch and Mandarin normalized intensity, non-final vs. final words; axis from -0.25 to 0]


Learning Boundaries

• Decision tree classifier (Quinlan C4.5)
  – Classification problem
    • For each word, classify as final/non-final
• Features
  – Acoustic-Prosodic:
    • Duration, Pitch, Loudness, Silence
      – Word average, Between-word difference
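
A minimal sketch of how "word average" and "between-word difference" features might be laid out, one row per word, ready for a final/non-final classifier. The key names and the choice of "next word minus current word" for the difference are assumptions; the slides do not specify them.

```python
def boundary_features(words):
    """Build one feature dict per word.

    `words` is a list of dicts with hypothetical keys:
      'pitch', 'intensity' - per-word averages (already normalized)
      'duration'           - word duration in seconds
      'silence_after'      - silence following the word, in seconds
    The between-word difference is taken as next-word value minus
    current-word value (an assumed convention).
    """
    feats = []
    for i, w in enumerate(words):
        nxt = words[i + 1] if i + 1 < len(words) else None
        row = {
            "pitch": w["pitch"],
            "intensity": w["intensity"],
            "duration": w["duration"],
            "silence_after": w["silence_after"],
        }
        for key in ("pitch", "intensity"):
            # The last word has no successor; use 0.0 as a neutral value.
            row[key + "_diff"] = (nxt[key] - w[key]) if nxt else 0.0
        feats.append(row)
    return feats
```

A large positive `pitch_diff` after a low-pitched word would match the pattern on the earlier slide: pitch falls at the end of a segment and resets at the start of the next.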


Text Boundary Features

  – Text
    • Information retrieval style
      – Cosine similarity between weighted term vectors
        » tf*idf in 50-word windows
    • Cue phrases
      – N-gram features
        » Identified by BoosTexter (Schapire & Singer, 2000)
      – E.g. “Voice of America”, “Audience”, “Reporting”
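
The text-similarity feature can be sketched as the cosine between tf*idf vectors of the 50 words before and the 50 words after a candidate boundary. This is an illustrative sketch only: the idf formula `log(N / df)`, the function names, and the exact window placement are assumptions not given in the slides.

```python
import math
from collections import Counter


def compute_idf(docs):
    """Inverse document frequency, log(N / df), over a list of token lists."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log(n / c) for t, c in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse tf*idf vector; terms missing from `idf` get weight 0."""
    counts = Counter(tokens)
    return {t: c * idf.get(t, 0.0) for t, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def window_similarity(tokens, i, idf, window=50):
    """Similarity of the `window` words ending at index i (inclusive)
    to the `window` words that follow it."""
    left = tokens[max(0, i + 1 - window): i + 1]
    right = tokens[i + 1: i + 1 + window]
    return cosine(tfidf_vector(left, idf), tfidf_vector(right, idf))
```

A low similarity at word `i` suggests the vocabulary changes there, which is evidence of a topic shift but not of its exact position.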


Classification Results

• Balanced training and test sets
  – Results on held-out subsets
• Acoustic cues only
  – 95.6% accuracy
• Text cues (+ silence)
  – 95.6% accuracy
• Combined text and prosody
  – 96.4% accuracy
• Typically, false alarms twice as common as misses


Joint Decision Tree


Feature Assessment

• Role of silence
  – Useful in both text and acoustic classifiers
  – More necessary for text
    • Text captures topicality, not locality
    • Cannot identify boundaries sharply
• Prosodic cues:
  – Localize boundaries
  – Multiple supporting cues: intensity, pitch: contrastive use


Issue: False Alarms

• Evaluate representative sample
  – Boundary <<< Non-boundary
  – 95.6% accuracy
    • 2% miss, 4.4% false alarms
• Non-boundary frequent
• False alarms frequent
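
A minimal sketch of how accuracy, miss rate and false-alarm rate could be computed from per-word labels. It assumes the miss rate is taken over true boundaries and the false-alarm rate over true non-boundaries; the slides do not state the denominators, and the function name is hypothetical.

```python
def segmentation_error_rates(gold, pred):
    """Compute accuracy, miss rate and false-alarm rate.

    `gold` and `pred` are equal-length sequences of booleans,
    where True means "this word is segment-final".
    Assumed definitions:
      miss rate        = missed boundaries / true boundaries
      false-alarm rate = false alarms / true non-boundaries
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    n_boundary = sum(1 for g in gold if g)
    n_non_boundary = len(gold) - n_boundary
    misses = sum(1 for g, p in zip(gold, pred) if g and not p)
    false_alarms = sum(1 for g, p in zip(gold, pred) if not g and p)
    correct = sum(1 for g, p in zip(gold, pred) if bool(g) == bool(p))

    return {
        "accuracy": correct / len(gold) if len(gold) else 0.0,
        "miss_rate": misses / n_boundary if n_boundary else 0.0,
        "false_alarm_rate": (
            false_alarms / n_non_boundary if n_non_boundary else 0.0
        ),
    }
```

Because non-boundary words vastly outnumber boundary words in a representative sample, even a small false-alarm rate produces many more false alarms than misses in absolute terms.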


Voting Against False Alarms

• Error analysis:
  – Construct per-feature classifiers:
    • Prosody-only, text-only, silence-only
  – Compare classifiers: per-feature, joint
    • Joint + 0 or 1 per-feature classifiers: FALSE ALARM
• Approach: Voting
  – Require joint + 2 per-feature classifiers
• Result: 1/3 reduction in false alarms
  – ~97% accuracy: 2.8% miss, 3.15% false alarm
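
The voting rule can be sketched as follows: a word is accepted as a boundary only when the joint classifier says so and at least two per-feature classifiers agree. The function name, the dict layout, and the `min_support` parameter are hypothetical; the classifiers themselves are assumed to have been run already.

```python
def vote_boundaries(joint, per_feature, min_support=2):
    """Filter joint-classifier boundary hypotheses by per-feature votes.

    `joint` is a list of booleans (True = joint classifier predicts
    segment-final). `per_feature` maps a classifier name (e.g. 'prosody',
    'text', 'silence') to an aligned list of booleans.
    A boundary is kept only if the joint classifier predicts it AND at
    least `min_support` per-feature classifiers predict it as well.
    """
    decisions = []
    for i, joint_says_boundary in enumerate(joint):
        support = sum(1 for preds in per_feature.values() if preds[i])
        decisions.append(bool(joint_says_boundary) and support >= min_support)
    return decisions
```

For example, with `joint = [True, True, False, True]` and per-feature votes that back only the first word with two classifiers, only the first hypothesis survives; the others are treated as likely false alarms. Voting can only remove boundary hypotheses, so it trades some additional misses for fewer false alarms.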


Conclusion

• Mandarin broadcast news segmentation
  – Identify topicality and boundary locality
• Integrate text and acoustic cues
  – Text similarity: vector space model, n-gram cues
  – Prosodic cues: Silence, intensity, pitch, duration
    » Robust across range of languages
• Provide supporting and orthogonal information
• Majority agreement of per-feature classifiers:
  – 1/3 fewer false alarms


Current & Future Work

• Improving the model of topicality
  – Richer text similarity models; broader acoustic models
• Alternative classifiers
  – Preliminary experiments:
    • Boosting, Boosted Decision trees, MaxEnt
      – Comparable
  – Alternative integration strategies
• Hierarchical subtopic segmentation
  – Broadcast news
  – Dialogue: human-computer, human-human
• Integration with multi-modal features: e.g. gesture, gaze


Acoustic-Prosodic Contrasts

[Figure: normalized pitch and normalized intensity for Mandarin and English, non-final vs. final words; axis from -0.25 to 0]


Text Decision Tree


Prosodic Decision Tree


The Problem: Speech Topic Segmentation

• Separate audio stream into component topics

On "World News Tonight" this Thursday, another bad day on stock markets, all over the world global economic anxiety. || Another massacre in Kosovo, the U.S. and its allies prepare to do something about it. Very slowly. || And the millennium bug, Lubbock Texas prepares for catastrophe, India sees only profit. ||

