Inferring Speech Activity from Encrypted Skype Traffic
Yu‐Chun Chang, Kuan‐Ta Chen, Chen‐Chi Wu, and
Chin‐Laung Lei
Oct. 27, 2008
2008/10/27 1
Outline
• Introduction
• Data description
• Proposed scheme
• Performance evaluation
• Conclusion
Introduction
• VAD (Voice Activity Detection)
– An algorithm that detects the presence or absence of human speech in speech processing.
• Source‐level VAD
– Input: audio signal
– Use: silence suppression
• Network‐level VAD
– Input: network traffic
– Use: flow identification, QoS measurement
• The differences between source‐level and network‐level VAD
            source‐level                              network‐level
input       audio signal                              network traffic
location    speaker’s host                            network node
purpose     silence suppression, echo cancellation    traffic management, QoS measurement
Introduction (contd.)
• Challenges
– Payload encryption
– Skype does not support silence suppression
• Contribution
– We propose a network‐level VAD that can infer speech activity from encrypted and non‐silence‐suppressed VoIP traffic.
Data Description
• Experiment setup
[Figure: experiment setup. Labels: "(Chosen by Skype)", "Network traffic", "Audio signal"]
Data Description (contd.)
• Trace summary
Total # of traces    1839
# TCP                1427
# UDP                412
# Relay node         1677
Mean packet size     109.6 bytes
Mean time period     612.5 sec
Proposed Scheme
• The indicator of voice activity – packet size
• Smoothing
• Adaptive thresholding
The indicator of voice activity – Packet size
Smoothing
• EWMA (Exponentially Weighted Moving Average)
EWMA: $P_i = \lambda Y_i + (1 - \lambda) P_{i-1}$, with $\lambda = 0.2$
Y : Observed packet size
P : Smoothed packet size
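The EWMA recursion can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `ewma`, the sample packet sizes, and the choice to seed the recursion with the first observation are all assumptions, since the slide does not say how P is initialized.

```python
def ewma(sizes, lam=0.2):
    """Smooth packet sizes with P_i = lam * Y_i + (1 - lam) * P_{i-1}."""
    smoothed = []
    prev = None
    for y in sizes:
        # Assumption: the first smoothed value equals the first observation.
        prev = y if prev is None else lam * y + (1 - lam) * prev
        smoothed.append(prev)
    return smoothed


# Hypothetical observed packet sizes in bytes (not taken from the traces).
observed = [100, 100, 150, 150, 150]
smoothed = ewma(observed)
```

With λ = 0.2 the smoothed series reacts slowly to the jump from 100 to 150 bytes, which damps short spikes before thresholding.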
Adaptive thresholding
[Figure: adaptive thresholding. Axis label: Packet Size (bytes)]
Adaptive thresholding (contd.)
P : 140 bytes
T1 : 74 bytes
T2 : 80 bytes
(P + T1)/2 = 107 bytes
(P + T2)/2 = 110 bytes
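The arithmetic on this slide takes the midpoint between P and each of T1 and T2. The sketch below reproduces it and then applies the result as a threshold. The slide does not define P, T1, or T2, so treating P as a high reference level and T as a low one is an assumption, as is the rule that a smoothed packet size above the threshold means ON. The names `midpoint_threshold` and `classify` and the sample sizes are hypothetical.

```python
def midpoint_threshold(p, t):
    """Midpoint between a high level p and a low level t, as on the slide."""
    return (p + t) / 2


def classify(smoothed_sizes, threshold):
    """Assumption: smoothed size above the threshold -> ON (1), else OFF (0)."""
    return [1 if s > threshold else 0 for s in smoothed_sizes]


th1 = midpoint_threshold(140, 74)  # slide values: P and T1
th2 = midpoint_threshold(140, 80)  # slide values: P and T2

# Hypothetical smoothed packet sizes in bytes.
states = classify([70, 90, 120, 135, 108, 76], th1)
```

Here `th1` and `th2` match the 107 and 110 bytes shown on the slide.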
Adaptive thresholding (contd.)
[Figure: estimated ON periods. Axis label: Packet Size (bytes)]
Performance Evaluation
• Number of ON periods
$\dfrac{\text{Number of estimated ON periods}}{\text{Number of true ON periods}}$
Performance Evaluation (contd.)
• Average length of ON periods
$\dfrac{\text{Mean length of estimated ON periods}}{\text{Mean length of true ON periods}}$
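The two ratio metrics (number of ON periods and mean ON-period length, each taken as estimated over true) can be sketched as follows. This assumes speech activity is represented as a 0/1 sequence over fixed-length time slots, so lengths are counted in slots, and that each sequence contains at least one ON period. The function names and both example sequences are hypothetical.

```python
from itertools import groupby


def on_period_lengths(states):
    """Length of every run of consecutive 1s; each run is one ON period."""
    return [len(list(run)) for value, run in groupby(states) if value == 1]


def on_period_ratios(true_states, est_states):
    """Return (count ratio, mean-length ratio), estimated over true.

    Assumption: both sequences contain at least one ON period.
    """
    true_runs = on_period_lengths(true_states)
    est_runs = on_period_lengths(est_states)
    count_ratio = len(est_runs) / len(true_runs)
    mean_true = sum(true_runs) / len(true_runs)
    mean_est = sum(est_runs) / len(est_runs)
    return count_ratio, mean_est / mean_true


# Hypothetical sequences, one element per time slot.
true_states = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
est_states = [0, 1, 1, 0, 0, 0, 1, 1, 1, 1]
count_ratio, length_ratio = on_period_ratios(true_states, est_states)
```

A ratio near 1 means the estimate agrees with the ground truth on that metric; values above 1 indicate over-estimation.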
Performance Evaluation (contd.)
• State correctness
$\dfrac{M \text{ and } N}{M \text{ or } N}$
True speech activity (M) : 0 0 1 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0 1 1 1 1 1 0
Estimated speech activity (N): 0 1 1 1 1 0 0 0 1 0 0 1 1 0 0 0 0 1 1 1 0 1 1 1
M and N: 0 0 1 1 1 0 0 0 1 0 0 1 1 0 0 0 0 0 1 1 0 1 1 0
M or N: 0 1 1 1 1 0 0 0 1 1 0 1 1 0 0 0 0 1 1 1 1 1 1 1
ON period ‐> 1
OFF period ‐> 0
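The state-correctness example can be checked with a short Python sketch. It assumes the ratio counts the slots equal to 1 in "M and N" and divides by the count of slots equal to 1 in "M or N"; the slide does not spell out the counting. The function name `state_correctness` is hypothetical, while the two sequences are the ones shown on the slide.

```python
def state_correctness(true_states, est_states):
    """Return (ratio, M and N, M or N) for two equal-length 0/1 sequences.

    Assumption: the ratio is (# of 1s in M and N) / (# of 1s in M or N),
    and at least one slot is ON in either sequence.
    """
    both = [m & n for m, n in zip(true_states, est_states)]
    either = [m | n for m, n in zip(true_states, est_states)]
    return sum(both) / sum(either), both, either


M = [int(x) for x in "0 0 1 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0 1 1 1 1 1 0".split()]
N = [int(x) for x in "0 1 1 1 1 0 0 0 1 0 0 1 1 0 0 0 0 1 1 1 0 1 1 1".split()]
ratio, m_and_n, m_or_n = state_correctness(M, N)
```

For the slide's sequences, 10 slots are 1 in "M and N" and 15 are 1 in "M or N", so the ratio is about 0.67 under this reading.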
Performance Evaluation (contd.)
• State correctness
Conclusion
• We propose network‐level VAD, which infers speech activity from network traffic instead of the audio signal.
• We propose a VAD algorithm that can extract voice activity from encrypted and non‐silence‐suppressed VoIP network traffic.
• Thanks
Backup slides
VAD on audio signal
$\text{volume} = 10 \cdot \log\left(\sum_i S_i^2\right)$
J.‐S. R. Jang, “Audio signal processing and recognition,” http://www.cs.nthu.edu.tw/jang
Static threshold : 183 dB
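A source-level VAD based on this volume formula and a static threshold could look like the sketch below. Several points are assumptions rather than facts from the slide: the logarithm base (the slide writes only "log", so it is left as a parameter, defaulting to base 10), the frame-wise processing, and the rule that a frame above the threshold counts as speech. The threshold is passed in explicitly because the scale of the slide's value of 183 depends on the log base and sample scaling, neither of which is stated. The names `frame_volume` and `is_speech` are hypothetical.

```python
import math


def frame_volume(samples, log=math.log10):
    """volume = 10 * log(sum of squared samples) for one audio frame.

    Assumption: base-10 log by default; pass log=math.log for natural log.
    """
    energy = sum(s * s for s in samples)
    # An all-zero frame has no energy; report it as minus infinity.
    return 10 * log(energy) if energy > 0 else float("-inf")


def is_speech(samples, threshold, log=math.log10):
    """Assumption: a frame whose volume exceeds the threshold is speech."""
    return frame_volume(samples, log) > threshold


# Hypothetical frame of audio samples.
frame = [6, 8]
volume = frame_volume(frame)
```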
I am a student of National Taiwan University.
Performance Evaluation (contd.)
• Number of ON periods
Performance Evaluation (contd.)
• Average length of ON periods
Performance Evaluation (contd.)
• State correctness
$\dfrac{M \cap N}{M \cup N}$
True speech activity (M) : 0 0 1 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0 1 1 1 1 1 0
Estimated speech activity (N): 0 1 1 1 1 0 0 0 1 0 0 1 1 0 0 0 0 1 1 1 0 1 1 1