Week 7 Psychoacoustic Compression 1 ESE 250 – S’12 Kod & DeHon
ESE250:
Digital Audio Basics
Week 7
February 23, 2012
Psychoacoustic
Compression
Course Map
[Figure: course map; numbers correspond to course weeks. Today: audio signal processing, putting it all together.]
Today’s Agenda
• How do we compress from
  WAV bit rate (per channel): ~ 700 kbps @ 44.1 kHz
• down to
  MP3 target (per channel): ~ 60 kbps @ 44.1 kHz?
Where are we?
• Week 2: received signal is sampled & quantized
  q = PCM[ r ]
• Week 3: quantized signal is coded
  c = code[ q ]
• Week 4: sampled signal first transformed into frequency domain
  Q = DFT[ q ]
• Week 5: signal oversampled & low pass filtered
  Q = LPF[ DFT(q + n) ]
• Week 6: transformed signal analyzed using human psychoacoustic models
• Week 7: acoustically interesting signal is “perceptually coded”
  C = MP3[ Q ]
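[Figure: block diagram of the signal chain: r(t) → Oversample → DFT → LPF → Perceptual Coding → Store / Transmit → Decode → Produce p(t). Intermediate signals: q + n, Q + N, Q, C. Blocks are labeled with Weeks 3 through 6.]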
[Painter & Spanias. Proc.IEEE, 88(4):451–512, 2000]
Week 5 Review (Oversampling)
• Audio-relevant signal q(t)
  Time window: $T_0$ sec per “block”
  Nyquist sampling interval: $T_A$
  ideally $n_A = T_0 / T_A$ samples per “block”:
  $q = (q_{-n_A}, \dots, q_{-2}, q_{-1}, q_0, q_1, q_2, \dots, q_{n_A}) = \mathrm{PCM}[\,q(t)\,]$
• Ambient noise n(t)
  Nyquist sampling interval: $T_N \ll T_A$
• Received signal in a block: $n_S = T_0 / T_N \gg n_A = T_0 / T_A$ samples
  ultimately, record
  $r = (r_{-n_N}, \dots, r_{-2}, r_{-1}, r_0, r_1, r_2, \dots, r_{n_N}) = \mathrm{PCM}[\,r(t)\,] = \mathrm{PCM}[\,q(t) + n(t)\,] = q + n$
  $= (q_{-n_N} + n_{-n_N}, \dots, q_{-2} + n_{-2}, q_{-1} + n_{-1}, q_0 + n_0, q_1 + n_1, q_2 + n_2, \dots, q_{n_N} + n_{n_N})$
[Figure: q(t), n(t), and r(t) = q(t) + n(t) over one block from $-T_0/2$ to $T_0/2$, with sample spacings $T_A$ and $T_N$]
Week 5 Review (Anti-Aliasing)
• Given r, compute frequency domain representation
  $R = \mathrm{DFT}[\,r\,] = (R_{-n_N}, \dots, R_{-2}, R_{-1}, R_0, R_1, R_2, \dots, R_{n_N})$
  $= (Q_{-n_N} + N_{-n_N}, \dots, Q_{-1} + N_{-1}, Q_0 + N_0, Q_1 + N_1, \dots, Q_{n_N} + N_{n_N})$
• Introduce assumptions about frequency content:
  $|k| > n_A = \omega_A / \omega_0 \Rightarrow Q_k = 0$
  $|k| < n_A = \omega_A / \omega_0 \Rightarrow N_k = 0$
• to realize
  $R = (0, \dots, 0, Q_{-n_A}, \dots, Q_{-2}, Q_{-1}, Q_0, Q_1, Q_2, \dots, Q_{n_A}, 0, \dots, 0)$
  $\quad + (N_{-n_M}, \dots, N_{-n_A}, 0, \dots, 0, N_{n_A}, \dots, N_{n_M})$
• Low Pass Filter
  $Q = (Q_{-n_A}, \dots, Q_{-2}, Q_{-1}, Q_0, Q_1, Q_2, \dots, Q_{n_A})$
• Bit count $= n_A \cdot 32$
  $T_0 \approx 1/44$ sec $\Rightarrow n_A \approx 1000$
  $T_0 \approx 1/88$ sec $\Rightarrow n_A \approx 500$
[Figure: spectrum of r before and after the DFT and LPF stages, with axis marks at $\pm n_A$ and $\pm n_S$]
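The anti-aliasing step above can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the block length (64 samples), the cutoff index, and the two cosine test signals are all hypothetical choices made so that the "audio" sits below the cutoff and the "noise" above it, matching the slide's assumptions.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft(); returns complex samples."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def low_pass(X, n_a):
    """Zero every DFT bin whose signed index k satisfies |k| > n_a."""
    n = len(X)
    out = []
    for k, value in enumerate(X):
        signed_k = k if k <= n // 2 else k - n  # map bin k to -n/2 .. n/2
        out.append(value if abs(signed_k) <= n_a else 0)
    return out


# Hypothetical block: 64 samples, "audio" at bin 3, "noise" at bin 20.
N_SAMPLES, N_A = 64, 8
q = [math.cos(2 * math.pi * 3 * t / N_SAMPLES) for t in range(N_SAMPLES)]
n = [0.5 * math.cos(2 * math.pi * 20 * t / N_SAMPLES) for t in range(N_SAMPLES)]
r = [qt + nt for qt, nt in zip(q, n)]

Q = low_pass(dft(r), N_A)               # Q = LPF[ DFT(q + n) ]
q_hat = [v.real for v in idft(Q)]       # back to the time domain
max_error = max(abs(a - b) for a, b in zip(q, q_hat))
print(f"max |q - q_hat| = {max_error:.2e}")
```

Because the noise lies entirely above the cutoff in this toy case, the filtered block matches q up to floating-point error; real ambient noise would only be partly removed.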
Week 6 Review (Hearing Model)
• Power Spectrum Model of Hearing:
Critical Bands: Auditory system contains finite array of adaptively tunable, overlapping bandpass filters
Frequency Bins: humans process a signal’s component (against noisy background) in the one filter with closest center frequency
Masking: certain signal components in a given band are “favored” and others are filtered out
• Established through decades of psychoacoustic experiments
• Model underlying today’s algorithmic thinking
B.C.J. Moore. Int.Rev.Neurobiol., 70:49–86, 2005.
Today: Critical Band Assignments
• acquire & transform “frame”
• assign frequencies to bands
  use psychoacoustic model lookup to determine frequency bandwidth of each critical band
[Figure: spectrum |Q| (axis marks at $\pm n_A$, $\pm n_S$) passed through a lookup table (LUT) that groups the components $Q_1, Q_2, \dots, Q_k, \dots, Q_{n_A}$ into bands 1, 2, …, k, …, 22]
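A lookup-table band assignment might look like the sketch below. The band edges are an assumption: they are the widely published Zwicker (Bark scale) critical band edges, which give 24 bands rather than the 22 shown in the figure, so the table used in lecture evidently differs. The frame size and bin spacing are also hypothetical.

```python
from bisect import bisect_right

# Upper band edges in Hz (assumed: Zwicker's classic Bark-scale table,
# not necessarily the lookup table used in the lecture figure).
BAND_UPPER_EDGES_HZ = [
    100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
    2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500,
]


def band_of(freq_hz):
    """Return the 1-based critical band index for a frequency in Hz.

    Frequencies above the last edge are clamped into the top band.
    """
    index = bisect_right(BAND_UPPER_EDGES_HZ, freq_hz)
    return min(index, len(BAND_UPPER_EDGES_HZ) - 1) + 1


def assign_bins_to_bands(n_bins, bin_spacing_hz):
    """Group DFT bin indices 0..n_bins-1 by critical band."""
    bands = {}
    for k in range(n_bins):
        bands.setdefault(band_of(k * bin_spacing_hz), []).append(k)
    return bands


# Hypothetical frame: 512 bins spaced 43 Hz apart (about 22 kHz of spectrum).
bands = assign_bins_to_bands(512, 43.0)
for band, bins in sorted(bands.items())[:5]:
    print(f"band {band}: bins {bins[0]}..{bins[-1]} ({len(bins)} bins)")
```

Low bands collect only a few bins while high bands collect many, which is the point of the lookup: bandwidth grows with center frequency.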
Today: Code (Compress) Frequencies
• use psychoacoustic model to minimize bits per critical band
  $C_{\mathrm{band}\,k}(j) = \mathrm{Round}[\,Q_{\mathrm{band}\,k}(j), \mathrm{Level}\,]$
• by appeal to “masking” models
  $Q_{\mathrm{band}\,k} = C_{\mathrm{band}\,k} + D_{\mathrm{band}\,k}$
  where the distortion (“perceptual noise”) $D_{\mathrm{band}\,k}$ between the retained signal $C_{\mathrm{band}\,k}$ and the actual signal $Q_{\mathrm{band}\,k}$ should be “masked” by the retained signal
[Figure: bands passed through a lossy coding stage]
• E.g., look at the kth band
  $Q_{\mathrm{band}\,k} = (Q_{\mathrm{band}\,k}(1), \dots, Q_{\mathrm{band}\,k}(m))$, amplitudes represented as reals
• Determine masking paradigm
  tone-masking-noise
  noise-masking-tone
  noise-masking-noise
• E.g., for tone masker,
  pick tone frequency, band k(j), at maximal amplitude in the band
• Choose quantization level and compute compressed signal
  $C_{\mathrm{band}\,k}(j) = \mathrm{Round}[\,Q_{\mathrm{band}\,k}(j), \mathrm{Level}\,]$
  $C_{\mathrm{band}\,k} = (0_{\mathrm{band}\,k}(1), \dots, C_{\mathrm{band}\,k}(j), \dots, 0_{\mathrm{band}\,k}(m))$
• Assess noise magnitude, $|D_{\mathrm{band}\,k}|$
  $D_{\mathrm{band}\,k} = (Q_{\mathrm{band}\,k}(1), \dots, Q_{\mathrm{band}\,k}(j), \dots, Q_{\mathrm{band}\,k}(m)) - (0_{\mathrm{band}\,k}(1), \dots, C_{\mathrm{band}\,k}(j), \dots, 0_{\mathrm{band}\,k}(m))$
• Use psychoacoustic model to determine whether compressed signal will mask the distortion noise for that band
[Figure: Single Band SPL. Panels of SPL vs. frequency: real input; 1 bit signal and 1 bit |noise|; 2 bit signal and 2 bit |noise|; SMR marked for 1 bit and for 2 bits]
more bits yield a larger signal-to-mask ratio (SMR)
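The single-band tone-masker steps can be sketched as follows. The reading of Round[Q, Level] as "round to the nearest multiple of a step size" is an assumption, as are the sample amplitudes; the final masking decision would come from the psychoacoustic model, which is not modeled here.

```python
import math


def code_band_with_tone_masker(q_band, level):
    """Keep only the largest-magnitude component of a band.

    The retained component is rounded to the nearest multiple of `level`
    (assumed meaning of Round[Q, Level]). Returns (C, D, j) with Q = C + D.
    """
    j = max(range(len(q_band)), key=lambda i: abs(q_band[i]))
    c_band = [0.0] * len(q_band)
    c_band[j] = round(q_band[j] / level) * level
    d_band = [q - c for q, c in zip(q_band, c_band)]
    return c_band, d_band, j


def magnitude(v):
    """Euclidean norm, used here as the noise magnitude |D|."""
    return math.sqrt(sum(x * x for x in v))


q_band = [0.3, 7.4, 0.9, 0.2]  # hypothetical amplitudes in band k
results = {}
for level in (4.0, 1.0):       # coarser step first, then finer
    c, d, j = code_band_with_tone_masker(q_band, level)
    results[level] = (c, d, j)
    print(f"level {level}: masker index {j}, C(j) = {c[j]}, |D| = {magnitude(d):.3f}")
```

A finer level shrinks the error on the retained tone, but the discarded components still contribute to |D|; whether that residue is audible is what the masking model has to decide.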
Overview of Perceptual Coding
• Goals
  digitally represent a signal
  minimum number of bits
  “transparent” reproduction: the most sensitive human cannot distinguish between original and generated signal
• Perceptual Entropy
  Using psychoacoustic model to estimate information content of audio signals
  [J. D. Johnston. IEEE J. Sel. Ar. Comm., 6(2):314–323, 1988]
  suggested transparency achievable at 2 bits per sample, or 88 kbps @ 44.1 kHz
• Our present (WAV) bit count = 32 bits per sample
  $T_0 \approx 1/44$ sec $\Rightarrow n_A \approx 1000 \Rightarrow$ bit rate $\approx 32 \cdot 44$ kbps $\approx 1.4 \cdot 10^3$ kbps
  $T_0 \approx 1/88$ sec $\Rightarrow n_A \approx 500 \Rightarrow$ bit rate $\approx 16 \cdot 88$ kbps $\approx 1.4 \cdot 10^3$ kbps
• Our leverage?
[Painter & Spanias. Proc.IEEE, 88(4):451–512, 2000]
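The gap between the WAV bit count and the perceptual-entropy estimate is simple arithmetic, sketched here using only the figures quoted on the slide (32 and 2 bits per sample at 44.1 kHz). The function name is illustrative.

```python
SAMPLE_RATE_HZ = 44_100


def bit_rate_kbps(bits_per_sample, sample_rate_hz=SAMPLE_RATE_HZ):
    """Bit rate in kbps for a given sample width and sample rate."""
    return bits_per_sample * sample_rate_hz / 1000.0


wav_kbps = bit_rate_kbps(32)          # present WAV bit count from the slide
transparent_kbps = bit_rate_kbps(2)   # Johnston's perceptual entropy estimate
ratio = wav_kbps / transparent_kbps

print(f"WAV: {wav_kbps:.1f} kbps")
print(f"perceptual entropy estimate: {transparent_kbps:.1f} kbps")
print(f"available compression factor: {ratio:.1f}x")
```

The factor of 16 between the two rates is the "leverage" the rest of the lecture tries to exploit.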
What are the “Knobs” ?
• “Reservoir”
Are all frames equally full of audio information?
• Masking
How should we exploit the perceptual model?
• Local decoupling
Can we exploit (rough) independence of each
band?
• Global accounting
How should we re-impose the average frame-rate?
MP3 Encoder Design Strategy
• Target: ~ 1032 bits per frame
  ~ 2 bits per sample
  ~ 512 samples per frame
• Use masker(signal)-masking-maskee(noise) paradigm
  assume 1 masker “costs” ~ $2^K$ bits
  $\Rightarrow$ retained signal, C, should consist of ~ $2^{10-K}$ maskers on average
    o specified amplitudes
    o at specified frequencies
  allocated to some subset of the ~ $32 = 2^5$ critical bands
• Rough algorithm for computing retained signal
  (i) frame bit-reservoir: supplies bits per each critical band
  (ii) commit to masker(s) within each band
  (iii) attempt to mask within-band distortion with available bits
  (iv) each band: give bits back to or take more from reservoir
  (v) iterate
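The give-back / take-more accounting in steps (i), (iv), and (v) can be sketched as below. This is a toy model under stated assumptions: steps (ii) and (iii) are collapsed into a hypothetical per-band number, `bits_needed`, standing in for "bits required to mask the distortion in this band", and the even initial split is an illustrative choice.

```python
def allocate_bits(bits_needed, frame_budget, max_rounds=10):
    """Distribute frame_budget bits over critical bands.

    bits_needed[k] is a hypothetical estimate of the bits band k needs for
    its distortion to be masked. Returns (allocation, leftover reservoir).
    """
    n_bands = len(bits_needed)
    share = frame_budget // n_bands
    allocation = [share] * n_bands                 # (i) initial supply per band
    reservoir = frame_budget - share * n_bands
    for _ in range(max_rounds):                    # (v) iterate
        changed = False
        for k, need in enumerate(bits_needed):
            if allocation[k] > need:               # (iv) give bits back
                reservoir += allocation[k] - need
                allocation[k] = need
                changed = True
            elif allocation[k] < need and reservoir > 0:  # (iv) take more
                grant = min(need - allocation[k], reservoir)
                allocation[k] += grant
                reservoir -= grant
                changed = True
        if not changed:
            break
    return allocation, reservoir


# Hypothetical frame with four bands and a 100-bit budget.
allocation, reservoir = allocate_bits([10, 60, 5, 45], frame_budget=100)
print(allocation, reservoir)
```

Bands that need little return their surplus, and demanding bands draw on it in band order; when total demand exceeds the budget, some band is left short, which is where audible distortion would appear.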
Emerging Picture
[Figure: audio quality vs. bits retained, with curves labeled Knob #1, Knob #2, and Knobs #1 & #2]
The Ultimate Boss
• Subjective Quality Scales
• International Standards
(a) Absolute impairment
(b) Differential grades
[Painter & Spanias. Proc.IEEE, 88(4):451–512, 2000]
General Dimensions of Merit
• Bit Rate:
bits per sample
samples per second
• Complexity:
computational effort required
to encode and decode
• Delay:
time required
to encode and decode
MP3 Perceptual Coding Algorithm
• Commit to Observation Window
  “Long” frame (complex sound; frequency resolution)
  “Short” frame (transient sound; temporal resolution)
• Estimate Perceptual Entropy
  Analyze each critical band
  Characterize masker
  Estimate mask-to-noise threshold
  Update “bit reservoir”
• Spectral Quantization/Coding Loop
  Allocate bits per critical band
  Quantize band to bits-allowed levels
  Run Huffman & count actual bits
[Raissi. Technical report, MP3’ Tech, December 2002]
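The last step of the outline, "run Huffman & count actual bits", can be illustrated with a generic Huffman construction. This sketch builds a code from the symbol counts of one hypothetical block of quantized values; it is not the MP3 procedure itself, which, as usually described, selects among predefined code tables rather than building a fresh code per frame.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built from
    the symbol frequencies in `symbols`."""
    counts = Counter(symbols)
    if len(counts) == 1:                      # degenerate case: one symbol
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)       # two lightest subtrees
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def coded_bits(symbols):
    """Total bits needed to send `symbols` with their Huffman code."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)


quantized = [0, 0, 0, 0, 1, 0, -1, 0, 2, 0, 0, 1]  # hypothetical quantized band
huffman_total = coded_bits(quantized)
fixed_total = 2 * len(quantized)                   # 4 distinct symbols -> 2 bits each
print(f"Huffman: {huffman_total} bits, fixed-length: {fixed_total} bits")
```

Counting the bits after entropy coding, rather than before, is what lets the encoder check its allocation against the frame budget.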
Resolution: Time vs. Frequency
• Example: two sample plots in time-frequency plane
  Masking thresholds for (a) castanets (b) piccolo
  Recall masking threshold definition:
    o lower volume signals
    o in relation to specified critical band (simultaneous; just before; soon after)
    o are inaudible
• Affects choice of observation window (“frame”)
  (a) Castanets: prefer ~ 10 ms time resolution
      Implies blurrier frequency resolution
  (b) Piccolo: prefer ~ 2 critical-band frequency resolution
      Implies blurrier temporal resolution
[Painter & Spanias. Proc.IEEE, 88(4):451–512, 2000]
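The trade-off between the two window choices follows from the DFT itself: a frame of N samples at sample rate f_s lasts N / f_s seconds and resolves frequencies N-th of f_s apart. The sketch below works that out for two hypothetical frame lengths; the lengths are illustrative, not the ones MP3 uses.

```python
def window_resolution(n_samples, sample_rate_hz=44_100):
    """Return (frame duration in ms, DFT bin spacing in Hz)."""
    duration_ms = 1000.0 * n_samples / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / n_samples
    return duration_ms, bin_spacing_hz


# Hypothetical "long" and "short" frame lengths.
for label, n in (("long", 1024), ("short", 256)):
    duration_ms, spacing_hz = window_resolution(n)
    print(f"{label:5s} frame: {n:4d} samples, "
          f"{duration_ms:5.1f} ms, {spacing_hz:6.1f} Hz per bin")
```

Shortening the frame by a factor of four sharpens time resolution by four and blurs frequency resolution by the same factor, which is why transient sounds such as castanets and steady tones such as a piccolo favor different windows.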
Use of Perceptual Entropy
• Transform Coding
Trade vector quantization for scalar quantization
By transforming to frequency domain (decoupled)
• Critical Band Analysis
Compute spectral power in each band, Pf = | Sf |
Determine Masker for each band via SFM (spectral flatness measure)
Determine JND (“just noticeable distortion”) threshold for each band
• Bit Assignment
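The masker characterization via SFM can be sketched as follows. The spectral flatness measure is the ratio of the geometric to the arithmetic mean of the band's power values; the mapping to a tonality coefficient with a -60 dB reference is an assumption here, following the form commonly attributed to Johnston's coder. The power values are hypothetical and must be strictly positive.

```python
import math


def spectral_flatness_db(power):
    """10*log10(geometric mean / arithmetic mean) of positive power values.

    0 dB means a flat (noise-like) band; large negative values mean a
    peaky (tone-like) band.
    """
    n = len(power)
    geometric = math.exp(sum(math.log(p) for p in power) / n)
    arithmetic = sum(power) / n
    return 10.0 * math.log10(geometric / arithmetic)


def tonality(sfm_db, sfm_db_max=-60.0):
    """Map SFM in dB to [0, 1]: 1 = tone-like masker, 0 = noise-like masker.

    The -60 dB reference is an assumed constant, not taken from the slides.
    """
    return min(abs(sfm_db) / abs(sfm_db_max), 1.0)


flat_band = [1.0, 1.0, 1.0, 1.0]        # hypothetical noise-like band
peaky_band = [1e6, 1e-3, 1e-3, 1e-3]    # hypothetical band with one strong tone

for name, band in (("flat", flat_band), ("peaky", peaky_band)):
    sfm = spectral_flatness_db(band)
    print(f"{name:5s}: SFM = {sfm:6.1f} dB, tonality = {tonality(sfm):.2f}")
```

The tonality value then decides which masking paradigm (tone-masking-noise or noise-masking-tone) sets the band's JND threshold.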
Spectral Quantization/Coding Loop
• Inner Loop
  Global gain: overall bit rate control
  Shared across all spectral values
  Larger value implies
    o increased quantizer step size
    o increased quantization noise (“distortion”)
• Outer Loop
  Adjusts scale factors: reallocating bits to bands
  Affects only the spectral values within a critical band
  Ensures the masker for that band will mask the distortion
    o Larger mask signal relative to threshold implies fewer bits needed
    o Smaller mask relative to threshold implies more bits needed
• Loop Termination
  when bit rate constraint is satisfied with no audible distortion
  or after a set number of iterations (with excessive bits spent)
[Figure: SPL (dB) vs. Freq. (Hz), contrasting a band with a larger mask-to-noise ratio and a band with a smaller mask-to-noise ratio]
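The two nested loops can be sketched in simplified form. Everything numeric here is an assumption made for illustration: a uniform quantizer, a crude bit count in place of Huffman coding, a mean-squared-error noise measure, hypothetical per-band noise limits standing in for the masking thresholds, and arbitrary growth factors for the step size and scale factors.

```python
def quantize(values, step):
    """Uniform quantizer: nearest integer multiple of `step`."""
    return [round(v / step) for v in values]


def bits_used(quantized):
    """Crude stand-in for Huffman bit counting: magnitude bits plus a sign
    bit for nonzero values, nothing for zeros."""
    return sum(abs(q).bit_length() + (1 if q else 0) for q in quantized)


def band_noise(values, quantized, effective_step):
    """Mean squared error between a band and its reconstruction."""
    return sum((v - q * effective_step) ** 2
               for v, q in zip(values, quantized)) / len(values)


def inner_loop(bands, scale, bit_budget):
    """Raise the global step size until the frame fits the bit budget."""
    step = 1.0
    while True:
        coded = [quantize([v * s for v in band], step)
                 for band, s in zip(bands, scale)]
        if sum(bits_used(c) for c in coded) <= bit_budget:
            return step, coded
        step *= 2 ** 0.25


def outer_loop(bands, allowed_noise, bit_budget, max_iterations=20):
    """Amplify bands whose noise exceeds its limit, then re-run the inner loop."""
    scale = [1.0] * len(bands)
    for _ in range(max_iterations):
        step, coded = inner_loop(bands, scale, bit_budget)
        noisy = [k for k, (band, c, s) in enumerate(zip(bands, coded, scale))
                 if band_noise(band, c, step / s) > allowed_noise[k]]
        if not noisy:
            return step, scale, coded, True
        for k in noisy:
            scale[k] *= 2 ** 0.5      # finer effective step for this band only
    step, coded = inner_loop(bands, scale, bit_budget)
    return step, scale, coded, False  # gave up: distortion may be audible


# Hypothetical frame: a loud band with a lax limit, a quiet band with a strict one.
bands = [[8.0, -6.5, 7.2], [0.9, -1.1, 0.4]]
step, scale, coded, converged = outer_loop(bands, allowed_noise=[1.0, 0.05],
                                           bit_budget=24)
total_bits = sum(bits_used(c) for c in coded)
print(f"step={step:.3f} scale={scale} bits={total_bits} converged={converged}")
```

The inner loop only ever trades bits against noise globally; the outer loop is what moves precision toward the bands where the masker leaves less headroom.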
Typical Quantization Control
[Figure: quantized output amplitude vs. input masker amplitude by critical band index, annotated with inner loop bit rate control and outer loop band-specific distortion control. Two panels of output amplitude (quantized SPL) vs. input amplitude (real SPL): GlobalGain = 220, ScaleFactor = 2; and GlobalGain = 230, ScaleFactor = 2.]
[You & Chen. Multim.Tools & Appl., 40(3):341–359, 2008.]
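The effect of the two gain settings in the figure can be illustrated with a power-law quantizer. This is a sketch loosely modeled on the form commonly quoted for MP3 quantization; the offset 210, the rounding bias 0.0946, the 3/4 exponent, and the scale factor multiplier are all assumptions quoted from general descriptions, not values taken from these slides.

```python
def quantize_power_law(x, global_gain, scale_factor=0, scale_multiplier=0.5):
    """Quantize amplitude x with a global step and a per-band boost.

    step  = 2 ** ((global_gain - 210) / 4)        (assumed form)
    boost = 2 ** (scale_multiplier * scale_factor) (assumed form)
    """
    step = 2.0 ** ((global_gain - 210) / 4.0)
    boost = 2.0 ** (scale_multiplier * scale_factor)
    compressed = (abs(x) * boost / step) ** 0.75
    q = int(compressed - 0.0946 + 0.5)   # nearest integer of a positive number
    return q if x >= 0 else -q


# Input amplitudes 1..25 as on the figure's axes, at both gain settings.
inputs = list(range(1, 26))
staircases = {gain: [quantize_power_law(x, gain, scale_factor=2) for x in inputs]
              for gain in (220, 230)}
for gain, levels in staircases.items():
    print(f"GlobalGain = {gain}: {len(set(levels))} distinct output levels")
```

Raising the global gain coarsens the staircase for every band at once, while the scale factor shifts a single band back toward finer steps.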
To Probe Further
• Tutorials on Psychoacoustic Coding (in increasing order of abstraction and generality)
  D. Pan. A tutorial on MPEG/audio compression. IEEE Multimedia, 2(2):60–74, 1995.
  Nikil Jayant, James Johnston, and Robert Safranek. Signal compression based on models of human perception. Proceedings of the IEEE, 81(10):1385–1422, 1993.
  V. K. Goyal. Theoretical foundations of transform coding. IEEE Signal Processing Magazine, 18(5):9–21, 2001.
• Lightweight Overview of MP3
  Rassol Raissi. The theory behind mp3. Technical report, MP3’ Tech, December 2002.
• Scientific Basis of MP3 Coding Standard
  J. D. Johnston. Transform coding of audio signals using perceptual noise criteria. IEEE Journal on Selected Areas in Communications, 6(2):314–323, 1988.