User Benefits of Non-Linear Time Compression
Liwei He and Anoop Gupta
Microsoft Research
Introduction
Time compression: key to browsing AV content
We focus on informational content
Audio time compression algorithms
Linear: speed up audio uniformly
Non-linear: exploit the fine-grained structure of human speech (e.g. pauses, phonemes)
How much more do users gain from more complex algorithms?
Methodology
Conduct user listening test
One Linear TC algorithm
Two Non-linear TC algorithms
Simple: Pause-removal followed by Linear TC
Sophisticated: Adaptive TC
Compare objective and subjective measurements
Time Compression Algorithms
Linear Time Compression
Classic algorithms
Overlap Add (OLA) and Synchronized OLA (SOLA)
We use SOLA
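As a rough illustration of linear time compression with SOLA, the sketch below takes frames from the input at a fixed hop, places each one in the output near its nominal position, slides it within a small search range to maximize the normalized cross-correlation with the audio already written, and cross-fades over the overlap. The function name, frame size, hop, and search range are assumptions chosen for illustration; the slides do not give the parameters used in the study.

```python
import math

def sola_compress(samples, rate, frame=400, hop_out=200, search=50):
    """Speed up `samples` by `rate` (> 1 shortens) with a basic SOLA loop.

    All parameter defaults are illustrative assumptions, in samples.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if 2 * search >= min(hop_out, frame - hop_out):
        raise ValueError("search range too large for this frame/hop")
    hop_in = max(1, int(round(hop_out * rate)))  # analysis hop in the input
    out = list(samples[:frame])                  # first frame is copied as-is
    m = 1
    while m * hop_in + frame <= len(samples):
        seg = samples[m * hop_in : m * hop_in + frame]
        nominal = m * hop_out                    # nominal output position
        best_k, best_score = 0, -math.inf
        for k in range(-search, search + 1):
            start = nominal + k
            ov = len(out) - start                # overlap length for this shift
            num = sum(out[start + i] * seg[i] for i in range(ov))
            e_out = sum(out[start + i] ** 2 for i in range(ov))
            e_seg = sum(seg[i] ** 2 for i in range(ov))
            den = math.sqrt(e_out * e_seg)
            score = num / den if den > 0 else 0.0
            if score > best_score:
                best_k, best_score = k, score
        start = nominal + best_k
        ov = len(out) - start
        for i in range(ov):                      # linear cross-fade over the overlap
            w = i / ov
            out[start + i] = (1 - w) * out[start + i] + w * seg[i]
        out.extend(seg[ov:])                     # append the non-overlapping rest
        m += 1
    return out
```

Because every frame is placed relative to its nominal position, the shifts do not accumulate and the output length stays close to the input length divided by `rate`.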
Non-Linear Time Compression
Algorithm 1: Pause removal plus TC
Energy and Zero Crossing Rate analysis
Leave pauses of 150 ms or less untouched
Shorten pauses longer than 150 ms to 150 ms
Apply SOLA algorithm
PR shortens speech by 10-25%
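The pause-removal step can be sketched as follows. A frame is treated as a pause when both its energy and its zero crossing rate are low (a low-energy frame with a high zero crossing rate may be an unvoiced consonant), and any run of pause frames is capped at 150 ms. The frame length and both thresholds are assumptions for illustration, not values from the study. The helper `residual_linear_rate` shows the arithmetic for the second stage, assuming the overall speed-up is meant to match a target: if pause removal shortens the audio by a fraction p, the linear stage only needs a rate of target × (1 − p).

```python
def shorten_pauses(samples, sample_rate, max_pause_ms=150, frame_ms=10,
                   energy_thresh=1e-4, zcr_thresh=0.25):
    """Cap every detected pause at max_pause_ms; thresholds are assumptions."""
    frame_len = sample_rate * frame_ms // 1000
    max_frames = max_pause_ms // frame_ms
    out = []
    run = 0  # number of consecutive pause frames seen so far
    for start in range(0, len(samples), frame_len):
        fr = samples[start:start + frame_len]
        energy = sum(v * v for v in fr) / len(fr)
        crossings = sum(1 for a, b in zip(fr, fr[1:]) if (a < 0) != (b < 0))
        zcr = crossings / max(len(fr) - 1, 1)
        if energy < energy_thresh and zcr < zcr_thresh:
            run += 1
            if run > max_frames:
                continue  # drop pause frames beyond the 150 ms cap
        else:
            run = 0
        out.extend(fr)
    return out

def residual_linear_rate(target_rate, pause_fraction):
    """Linear rate still needed after pause removal to hit target_rate overall."""
    return target_rate * (1.0 - pause_fraction)
```

For example, if pause removal shortens a clip by 20%, reaching 1.5x overall requires a linear rate of only about 1.2x.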
Non-Linear Time Compression (cont.)
Algorithm 2: Adaptive TC
Mimics people when talking fast
Pauses and silences are compressed the most
Stressed vowels are compressed the least
Consonants are compressed more than vowels
Consonants are compressed based on neighboring vowels
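To make the ordering above concrete, here is a toy allocation of local compression rates. It is not the adaptive algorithm used in the study: it assumes the audio has already been segmented and labeled, and it uses hypothetical per-class weights. Each segment gets a local rate of 1 + alpha × weight, and alpha is found by bisection so that the total output duration matches the requested overall rate.

```python
# Hypothetical compressibility weights: higher means compressed more.
WEIGHTS = {"pause": 3.0, "consonant": 1.5, "vowel": 1.0, "stressed_vowel": 0.5}

def local_rates(segments, target_rate, weights=WEIGHTS):
    """segments: list of (label, duration). Returns one local rate per segment."""
    if target_rate < 1:
        raise ValueError("this sketch only handles speed-up (target_rate >= 1)")
    goal = sum(d for _, d in segments) / target_rate  # desired output duration

    def out_duration(alpha):
        return sum(d / (1 + alpha * weights[label]) for label, d in segments)

    lo, hi = 0.0, 1.0
    while out_duration(hi) > goal:   # grow the bracket until it contains the goal
        hi *= 2
    for _ in range(60):              # bisection on alpha
        mid = (lo + hi) / 2
        if out_duration(mid) > goal:
            lo = mid
        else:
            hi = mid
    return [1 + hi * weights[label] for label, _ in segments]
```

With these weights, pauses always receive the highest local rate and stressed vowels the lowest, while the whole clip still meets the overall target.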
System Implications
Computational complexity
Adaptive TC 10x more costly than Linear TC
Complexity in client-server implementation
Buffer management required for non-linear TC
Audio-video synchronization quality
User Study Method
User Study Goals
Highest intelligible speed
Comprehension
Subjective preference
Sustainable speed
Experiment Method
24 subjects
4 tasks for each subject
3 time compression algorithms
Linear TC using SOLA (Linear)
Pause removal plus Linear TC (PR-Lin)
Adaptive TC (Adapt)
Each test takes approximately 30 minutes
Highest Intelligible Speed Task
3 clips from technical talks
Find the highest speed at which most of the words are understandable
Comprehension Task
3 clips at 1.5x and 3 clips at 2.5x
Clips from TOEFL listening test
Answer 4 multiple choice questions
Subjective Preference Task
3 pairs of clips at 1.5x
3 pairs of clips at 2.5x
Each pair contains the same clip compressed with 2 of the 3 TC algorithms
Indicate preference on 3-point scale
Sustainable Speed Task
3 clips, each 8 minutes long
Clips from a CD audio book
Find the maximum comfortable speed
Write a 4-5 sentence summary at the end
User Study Results
Highest Intelligible Speed Task
PR-Lin is significantly better than Adapt (p<.01)
[Figure: bar chart of compression rate (axis 0 to 3) for Linear, PR-Lin, and Adapt]
Comprehension Task
[Figure: bar chart of comprehension score (%, axis 0 to 90) for Linear, PR-Lin, and Adapt at 1.5x and 2.5x]
Adapt is better than PR-Lin (p=.083) at 2.5x
Preference Task at 1.5x
Slight preference for PR-Lin (p=.093)
1.5x                Prefer Former    Prefer None    Prefer Latter
Linear vs. PR-Lin   6                5              13
PR-Lin vs. Adapt    13               5              6
Adapt vs. Linear    8                8              8
Preference Task at 2.5x
PR-Lin and Adapt do significantly better than Linear
2.5x                Prefer Former    Prefer None    Prefer Latter
Linear vs. PR-Lin   2                8              14
PR-Lin vs. Adapt    4                9              11
Adapt vs. Linear    21               3              0
Sustainable Speed Task
[Figure: bar chart of compression rate (axis 0 to 2.5) for Linear, PR-Lin, and Adapt]
Conclusions
Previous Works
Mach1 (Covell et al., ICASSP 98)
Comprehension and preference tasks
Comparing Linear and Mach1 (Adapt) at 2.6-4.2x
Comprehension scores 17% better w/ Mach1
95% prefer Mach1 to Linear
No data below 2.0x
Other works (Harrigan, Omoigui, Li, Foulke)
1.2-1.7x is the sustainable listening speed
Conclusions
Trade-off in TC algorithms is task-related
Listening: Linear TC is sufficient
Fast Forwarding: Non-linear TC is more suitable
Adapt TC is close to the way people talk fast
The limit lies in human listening and comprehension