
Week 7 Psychoacoustic Compression 1 ESE 250 – S’12 Kod & DeHon

ESE250: Digital Audio Basics

Week 7, February 23, 2012

Psychoacoustic Compression

Course Map

[Figure: course map diagram; numbers in the figure correspond to course weeks.]

Today: audio signal processing, putting it all together


Today’s Agenda

• How do we compress from
  WAV bit rate (per channel): ~ 700 kbps @ 44.1 kHz
• down to
  MP3 target (per channel): ~ 60 kbps @ 44.1 kHz?
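As a rough check on these two figures, the sketch below assumes 16-bit PCM samples (an assumption; the slide states only the resulting rates) and computes the per-channel WAV bit rate and the compression ratio that the MP3 target implies.

```python
# Assumption: 16-bit PCM samples per channel; the slide gives only the rates.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
MP3_TARGET_KBPS = 60.0  # per-channel target quoted on the slide


def bit_rate_kbps(bits_per_sample: float, sample_rate_hz: float) -> float:
    """Bit rate of one channel in kilobits per second."""
    return bits_per_sample * sample_rate_hz / 1000.0


wav_kbps = bit_rate_kbps(BITS_PER_SAMPLE, SAMPLE_RATE_HZ)
compression_ratio = wav_kbps / MP3_TARGET_KBPS
mp3_bits_per_sample = MP3_TARGET_KBPS * 1000.0 / SAMPLE_RATE_HZ

print(f"WAV: {wav_kbps:.1f} kbps per channel")
print(f"Required compression ratio: {compression_ratio:.1f} : 1")
print(f"MP3 target: {mp3_bits_per_sample:.2f} bits per sample")
```

Under that assumption the target leaves well under 2 bits per sample, which is why a perceptual model is needed rather than plain lossless coding.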

Where are we?

• Week 2: Received signal is sampled & quantized
  q = PCM[ r ]
• Week 3: Quantized signal is coded
  c = code[ q ]
• Week 4: Sampled signal first transformed into frequency domain
  Q = DFT[ q ]
• Week 5: Signal oversampled & low-pass filtered
  Q = LPF[ DFT(q+n) ]
• Week 6: Transformed signal analyzed using human psychoacoustic models
• Week 7: Acoustically interesting signal is “perceptually coded”
  C = MP3[ Q ]

[Figure: processing chain r(t) → Oversample → q + n → DFT → Q + N → LPF → Q → Perceptual Coding → C → Store / Transmit → Decode → Produce → p(t), with stages annotated Week 3 through Week 6.]

[Painter & Spanias. Proc. IEEE, 88(4):451–512, 2000]


Week 5 Review (Oversampling)

• audio-relevant signal q(t)
  Time window, T0 sec per “block”
  Nyquist sampling interval, TA
  ideally nA = T0 / TA samples per “block”:
  q = (q_{-nA}, … , q_{-2}, q_{-1}, q_0, q_1, q_2, … , q_{nA}) = PCM[ q(t) ]
• ambient noise n(t)
  Nyquist sampling interval, TN << TA
• receive signal in a block: nS = T0 / TN >> nA = T0 / TA
  ultimately, record
  r = (r_{-nN}, … , r_{-2}, r_{-1}, r_0, r_1, r_2, … , r_{nN})
    = PCM[ r(t) ]
    = PCM[ q(t) + n(t) ]
    = q + n
    = (q_{-nN} + n_{-nN}, … , q_{-1} + n_{-1}, q_0 + n_0, q_1 + n_1, … , q_{nN} + n_{nN})

[Figure: q(t), n(t), and r(t) = q(t) + n(t) over the window -T0/2 to T0/2, sampled at intervals TA and TN.]


Week 5 Review (Anti-Aliasing)

• Given r, compute frequency domain representation
  R = DFT[ r ]
    = (R_{-nN}, … , R_{-2}, R_{-1}, R_0, R_1, R_2, … , R_{nN})
    = (Q_{-nN} + N_{-nN}, … , Q_{-1} + N_{-1}, Q_0 + N_0, Q_1 + N_1, … , Q_{nN} + N_{nN})
• introduce assumptions about frequency content:
  |k| > nA = ω_A / ω_0 ⇒ Q_k = 0
  |k| < nA = ω_A / ω_0 ⇒ N_k = 0
• to realize
  R = (0, … , 0, Q_{-nA}, … , Q_{-1}, Q_0, Q_1, … , Q_{nA}, 0, … , 0)
    + (N_{-nN}, … , N_{-nA}, 0, … , 0, N_{nA}, … , N_{nN})
• Low Pass Filter
  Q = (Q_{-nA}, … , Q_{-1}, Q_0, Q_1, … , Q_{nA})
• Bit count = nA · 32
  T0 ≈ 1/44 sec ⇒ nA ≈ 1000
  T0 ≈ 1/88 sec ⇒ nA ≈ 500

[Figure: spectrum on the frequency-index axis, marked at -nS, -nA, nA, nS, before and after the DFT and LPF stages.]
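The anti-aliasing step can be illustrated with a small pure-Python sketch. It assumes an ideal "brick-wall" low-pass filter that zeroes every DFT bin above the audio cutoff index; the frame length, tone indices, and the names `dft`, `idft`, and `low_pass` are illustrative choices, not taken from the lecture.

```python
import cmath
import math


def dft(x):
    """Direct O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def low_pass(X, n_a):
    """Ideal filter: keep bins with |frequency index| <= n_a, zero the rest."""
    n = len(X)
    out = []
    for k, value in enumerate(X):
        signed_k = k if k <= n // 2 else k - n  # map bin to a signed index
        out.append(value if abs(signed_k) <= n_a else 0j)
    return out


n = 64
q = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]        # "audio" at index 3
noise = [0.5 * math.cos(2 * math.pi * 20 * t / n) for t in range(n)]  # noise at index 20
r = [a + b for a, b in zip(q, noise)]                            # r = q + n

R = dft(r)
Q = low_pass(R, n_a=8)
q_hat = [value.real for value in idft(Q)]
max_err = max(abs(a - b) for a, b in zip(q, q_hat))
print(f"max reconstruction error after LPF: {max_err:.2e}")
```

Because the toy signal and noise occupy disjoint frequency indices, as the slide's assumptions require, the filtered spectrum recovers q up to rounding error.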


Week 6 Review (Hearing Model)

• Power Spectrum Model of Hearing:

Critical Bands: Auditory system contains finite array of adaptively tunable, overlapping bandpass filters

Frequency Bins: humans process a signal’s component (against noisy background) in the one filter with closest center frequency

Masking: certain signal components in a given band are “favored” and others are filtered out

• Established through decades of psychoacoustic experiments

• Model underlying today’s algorithmic thinking

B.C.J. Moore. Int.Rev.Neurobiol., 70:49–86, 2005.


Today: Critical Band Assignments

• acquire & transform “frame”
• assign frequencies to bands
  use psychoacoustic model lookup to determine frequency bandwidth of each critical band

[Figure: spectrum magnitudes |Q| (Q1, Q2, … , Qk, … , QnA) mapped through a lookup table (LUT) into bands 1, 2, … , k, … , 22.]
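A lookup-table band assignment might look like the following sketch. The band edges are the commonly cited Bark-scale values after Zwicker, used here as an assumption; they give 24 bands, whereas the slide's figure shows 22, and the lecture's own table is not reproduced here. The function names and frame length are hypothetical.

```python
from bisect import bisect_right

# Assumption: commonly cited Bark-scale critical band edges in Hz (after Zwicker).
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
                 9500, 12000, 15500]
NUM_BANDS = len(BARK_EDGES_HZ) - 1


def band_index(freq_hz: float) -> int:
    """1-based band index; frequencies past the last edge fall in the top band."""
    idx = bisect_right(BARK_EDGES_HZ, freq_hz)
    return min(max(idx, 1), NUM_BANDS)


def assign_bins(n_bins: int, sample_rate_hz: float, frame_len: int):
    """Map DFT bin k (centre frequency k * fs / frame_len) to its band."""
    return [band_index(k * sample_rate_hz / frame_len) for k in range(n_bins)]


bands = assign_bins(n_bins=513, sample_rate_hz=44_100, frame_len=1024)
print("bins per band:", [bands.count(b) for b in range(1, NUM_BANDS + 1)])
```

Low bands receive only a few bins while high bands receive many, which reflects the widening of critical bands with frequency.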


Today: Code (Compress) Frequencies

• use psychoacoustic model
• to minimize bits per critical band
  C_{band k}(j) = Round[ Q_{band k}(j), Level ]
• by appeal to “masking” models
  Q_{band k} = C_{band k} + D_{band k}
  where distortion (“perceptual noise”) D_{band k} between retained signal C_{band k} and actual signal Q_{band k} should be “masked” by retained signal

[Figure: bands passed through a lossy coding stage.]


• E.g., look at kth band
  Q_{band k} = (Q_{band k}(1), … , Q_{band k}(m)), amplitudes represented as reals
• Determine masking paradigm
  tone-masking-noise
  noise-masking-tone
  noise-masking-noise
• E.g., for tone masker,
  pick tone frequency, band k(j), at maximal amplitude in the band
• Choose quantization level and compute compressed signal
  C_{band k}(j) = Round[ Q_{band k}(j), Level ]
  C_{band k} = (0_{band k}(1), … , C_{band k}(j), … , 0_{band k}(m))
• Assess noise magnitude, | D_{band k} |
  D_{band k} = (Q_{band k}(1), … , Q_{band k}(j), … , Q_{band k}(m))
             - (0_{band k}(1), … , C_{band k}(j), … , 0_{band k}(m))
• Use psychoacoustic model to determine whether compressed signal will mask the distortion noise for that band
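The tone-masker steps for a single band can be sketched as follows. The sketch assumes that Round[·, Level] means rounding to the nearest multiple of a step size `level`; that reading, the sample amplitudes, and the function names are assumptions, and the final masking decision (which needs the psychoacoustic model) is left out.

```python
import math


def round_to_level(value: float, level: float) -> float:
    """Round value to the nearest multiple of the quantization step `level`."""
    return level * round(value / level)


def code_band_tone_masker(q_band, level):
    """Keep only the largest-amplitude component; return (j, C, D) with Q = C + D."""
    j = max(range(len(q_band)), key=lambda i: abs(q_band[i]))
    c_band = [0.0] * len(q_band)
    c_band[j] = round_to_level(q_band[j], level)
    d_band = [q - c for q, c in zip(q_band, c_band)]
    return j, c_band, d_band


q_band = [0.3, 0.1, 5.2, 0.4]  # hypothetical amplitudes in one critical band
j, c_band, d_band = code_band_tone_masker(q_band, level=2.0)
noise_magnitude = math.sqrt(sum(d * d for d in d_band))

print("masker position j =", j)
print("retained signal C =", c_band)
print("|D| =", round(noise_magnitude, 3))
```

A coarser `level` spends fewer bits on the masker but enlarges |D|, so the model must confirm that the retained tone still masks the distortion.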

[Figure: single-band SPL vs. frequency. Panels: real input; 1-bit signal and 1-bit |noise|; 2-bit signal and 2-bit |noise|; SMR marked for 1 bit and for 2 bits.]

More bits yield a larger signal-to-mask ratio.
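The effect of adding bits can be seen in a toy experiment. The sketch below uses a plain uniform quantizer and reports signal-to-noise ratio as a stand-in for the ratio discussed above; it contains no masking model, and the quantizer design and test tone are assumptions made for illustration.

```python
import math


def quantize_uniform(x: float, bits: int) -> float:
    """Mid-rise uniform quantizer on [-1, 1) with 2**bits output levels."""
    step = 2.0 / (2 ** bits)
    idx = math.floor(x / step)
    idx = max(-(2 ** (bits - 1)), min(2 ** (bits - 1) - 1, idx))  # clamp to range
    return (idx + 0.5) * step


def snr_db(bits: int, n: int = 1000, cycles: int = 7) -> float:
    """Signal-to-quantization-noise ratio of a unit sine, in dB."""
    signal = [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]
    noise = [s - quantize_uniform(s, bits) for s in signal]
    p_signal = sum(s * s for s in signal) / n
    p_noise = sum(d * d for d in noise) / n
    return 10.0 * math.log10(p_signal / p_noise)


snr_by_bits = {b: snr_db(b) for b in (1, 2, 4, 8)}
for b, value in snr_by_bits.items():
    print(f"{b} bit(s): SNR = {value:.1f} dB")
```

Each extra bit halves the quantizer step, so the noise floor drops and the retained signal stands further above it.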


Overview of Perceptual Coding

• Goals
  digitally represent a signal
  minimum number of bits
  “transparent” reproduction
  o most sensitive human cannot distinguish between original and generated signal
• Perceptual Entropy
  Using psychoacoustic model to estimate information content of audio signals
  [J. D. Johnston. IEEE J. Sel. Areas Commun., 6(2):314–323, 1988]
  suggested transparency achievable at 2 bits per sample, or 88 kbps @ 44.1 kHz
• Our present (WAV) bit count = 32 bits per sample
  T0 ≈ 1/44 sec ⇒ nA ≈ 1000 ⇒ 32 kbits per frame · 44 frames/sec ≈ 1.4 · 10^3 kbps
  T0 ≈ 1/88 sec ⇒ nA ≈ 500 ⇒ 16 kbits per frame · 88 frames/sec ≈ 1.4 · 10^3 kbps
• Our leverage?
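The frame accounting above can be checked numerically. This sketch takes the slide's figures (32 bits per sample now, 2 bits per sample for transparency) at face value; rounding the frame length to a whole number of samples is an assumption.

```python
SAMPLE_RATE_HZ = 44_100
WAV_BITS_PER_SAMPLE = 32         # present bit count quoted on the slide
TRANSPARENT_BITS_PER_SAMPLE = 2  # perceptual-entropy estimate quoted on the slide


def kbps(bits_per_sample: float) -> float:
    """Bit rate in kilobits per second at the fixed sample rate."""
    return bits_per_sample * SAMPLE_RATE_HZ / 1000.0


wav_kbps = kbps(WAV_BITS_PER_SAMPLE)
transparent_kbps = kbps(TRANSPARENT_BITS_PER_SAMPLE)
leverage = WAV_BITS_PER_SAMPLE / TRANSPARENT_BITS_PER_SAMPLE

frame_rates_kbps = {}
for frames_per_sec in (44, 88):
    n_a = round(SAMPLE_RATE_HZ / frames_per_sec)      # samples per frame
    bits_per_frame = n_a * WAV_BITS_PER_SAMPLE
    frame_rates_kbps[frames_per_sec] = bits_per_frame * frames_per_sec / 1000.0

print(f"WAV: {wav_kbps:.1f} kbps, transparent target: {transparent_kbps:.1f} kbps")
print(f"leverage available: {leverage:.0f}x")
print("rate from per-frame accounting:", frame_rates_kbps)
```

The bit rate does not depend on the frame length, so shortening the frame alone saves nothing; the savings have to come from the bits spent per sample.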



What are the “Knobs” ?

• “Reservoir”

Are all frames equally full of audio information?

• Masking

How should we exploit the perceptual model?

• Local decoupling

Can we exploit (rough) independence of each

band?

• Global accounting

How should we re-impose the average frame-rate?


MP3 Encoder Design Strategy

• Target: ~ 1024 = 2^10 bits per frame
  ~ 2 bits per sample
  ~ 512 samples per frame
• Use masker(signal)-masking-maskee(noise) paradigm
  assume 1 masker “costs” ~ 2^K bits ⇒ retained signal, C, should consist of ~ 2^(10 – K) maskers on average
  o specified amplitudes
  o at specified frequencies
  allocated to some subset of the ~ 32 = 2^5 critical bands
• Rough algorithm for computing retained signal
  (i) frame bit-reservoir: supplies bits to each critical band
  (ii) commit to masker(s) within each band
  (iii) attempt to mask within-band distortion with available bits
  (iv) each band: give bits back to or take more from reservoir
  (v) iterate
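Steps (i) to (v) can be sketched as a simple give-back / take-more loop. This is a toy illustration, not the MP3 standard's procedure: the per-band requirement `needed` stands in for steps (ii) and (iii) and is assumed to come from a masking estimate that is not modeled here, and the function name and numbers are hypothetical.

```python
def allocate_bits(needed, frame_budget, max_iters=10):
    """Share frame_budget bits among bands via a common reservoir.

    needed[k] is the (assumed) bit count that lets band k mask its own
    distortion. Returns (allocation per band, bits left in the reservoir).
    """
    n = len(needed)
    alloc = [frame_budget // n] * n          # (i) reservoir supplies each band
    reservoir = frame_budget - sum(alloc)
    for _ in range(max_iters):               # (v) iterate
        changed = False
        for k in range(n):                   # (iv) surplus goes back
            surplus = alloc[k] - needed[k]
            if surplus > 0:
                alloc[k] -= surplus
                reservoir += surplus
                changed = True
        for k in range(n):                   # (iv) deficit draws from reservoir
            deficit = needed[k] - alloc[k]
            if deficit > 0 and reservoir > 0:
                grant = min(deficit, reservoir)
                alloc[k] += grant
                reservoir -= grant
                changed = True
        if not changed:
            break
    return alloc, reservoir


alloc_easy, left_easy = allocate_bits([10, 50, 20, 40], frame_budget=128)
alloc_hard, left_hard = allocate_bits([10, 100, 20, 40], frame_budget=128)
print(alloc_easy, left_easy)
print(alloc_hard, left_hard)
```

In the second call the demand exceeds the budget, so some bands stay under-served; a real encoder would then accept audible distortion or borrow from other frames.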


Emerging Picture

[Figure: audio quality vs. bits retained, with curves labeled Knob #1, Knob #2, and Knobs #1 & #2.]


The Ultimate Boss

• Subjective Quality Scales

• International Standards

(a) Absolute impairment

(b) Differential grades



General Dimensions of Merit

• Bit Rate:

bits per sample

samples per second

• Complexity:

computational effort required

to encode and decode

• Delay:

time required

to encode and decode


MP3 Perceptual Coding Algorithm

• Commit to Observation Window
  “Long” frame (complex sound; frequency resolution)
  “Short” frame (transient sound; temporal resolution)
• Estimate Perceptual Entropy
  Analyze each critical band
  Characterize masker
  Estimate mask-to-noise threshold
  Update “bit reservoir”
• Spectral Quantization/Coding Loop
  Allocate bits per critical band
  Quantize band to bits-allowed levels
  Run Huffman & count actual bits

[Raissi. Technical report, MP3’ Tech, December 2002]









Resolution: Time vs. Frequency

• Example: two sample plots in time-frequency plane
  Masking thresholds for (a) castanets (b) piccolo
  Recall masking threshold definition:
  o lower volume signals
  o in relation to specified critical band (simultaneous; just before; soon after)
  o are inaudible
• Affects choice of observation window (“frame”)
  (a) Castanets: prefer ~ 10 ms time resolution
      Implies blurrier frequency resolution
  (b) Piccolo: prefer ~ 2 critical-band frequency resolution
      Implies blurrier temporal resolution

[Painter & Spanias. Proc. IEEE, 88(4):451–512, 2000]
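The trade-off follows from the DFT itself: a frame of N samples lasts N / fs seconds and resolves frequencies spaced fs / N apart, so the two resolutions multiply to 1. The sketch below uses hypothetical frame lengths chosen only to illustrate this; they are not the MP3 block sizes.

```python
def frame_resolution(frame_len: int, sample_rate_hz: float):
    """Return (frame duration in seconds, DFT bin spacing in Hz)."""
    duration_s = frame_len / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / frame_len
    return duration_s, bin_spacing_hz


SAMPLE_RATE_HZ = 44_100
short_frame = frame_resolution(441, SAMPLE_RATE_HZ)    # ~10 ms, castanets-like
long_frame = frame_resolution(2048, SAMPLE_RATE_HZ)    # finer bins, piccolo-like

for name, (duration_s, spacing_hz) in (("short", short_frame), ("long", long_frame)):
    print(f"{name}: {duration_s * 1000:.1f} ms per frame, {spacing_hz:.1f} Hz per bin")
```

A 10 ms frame cannot separate components closer than about 100 Hz, which is wider than the lowest critical bands.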














Use of Perceptual Entropy

• Transform Coding

Trade vector quantization for scalar quantization

By transforming to frequency domain (decoupled)

• Critical Band Analysis

Compute spectral power in each band, P_f = | S_f |^2

Determine masker for each band via SFM (spectral flatness measure)

Determine JND (“just noticeable distortion”) threshold for each band

• Bit Assignment
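The spectral flatness measure is the ratio of the geometric mean to the arithmetic mean of the power spectrum, usually quoted in dB. The sketch below computes it for one band and maps it to a tonality value between 0 (noise-like) and 1 (tone-like); the -60 dB reference follows the commonly cited form in Johnston's paper and should be treated as an assumption here, as should the sample spectra.

```python
import math


def spectral_flatness_db(power):
    """10 * log10(geometric mean / arithmetic mean) of strictly positive powers."""
    n = len(power)
    log10_gm = sum(math.log10(p) for p in power) / n
    log10_am = math.log10(sum(power) / n)
    return 10.0 * (log10_gm - log10_am)


def tonality(sfm_db: float, sfm_db_max: float = -60.0) -> float:
    """Map SFM in dB to [0, 1]; the -60 dB reference is an assumed constant."""
    return max(0.0, min(sfm_db / sfm_db_max, 1.0))


flat_band = [1.0, 1.0, 1.0, 1.0]            # noise-like: equal power in every bin
peaky_band = [1e-9] * 7 + [1.0]             # tone-like: one dominant bin

sfm_flat = spectral_flatness_db(flat_band)
sfm_peaky = spectral_flatness_db(peaky_band)
print(f"flat band:  SFM = {sfm_flat:.1f} dB, tonality = {tonality(sfm_flat):.2f}")
print(f"peaky band: SFM = {sfm_peaky:.1f} dB, tonality = {tonality(sfm_peaky):.2f}")
```

The tonality value then selects between the tone-masking-noise and noise-masking-tone paradigms when setting the band's threshold.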






[You & Chen. Multim. Tools & Appl., 40(3):341–359, 2008.]


Spectral Quantization/Coding Loop

• Inner loop
  Global gain: overall bit rate control
  Shared across all spectral values
  Larger value
  o increased quantizer step size
  o increased quantization noise (“distortion”)
• Outer loop
  Adjusts scale factors: reallocating bits to bands
  Affects only the spectral values within a critical band
  Ensures the masker for that band will mask the distortion
  o Larger mask signal relative to threshold implies fewer bits needed
  o Smaller mask relative to threshold implies more bits needed
• Loop termination
  when bit rate constraint is satisfied with no audible distortion
  or after a set number of iterations (with excessive bits spent)
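The nested loops can be sketched in a simplified form. This is not the standard's procedure: it assumes a uniform quantizer with step 2^(gain/4), a per-band amplification of 2^(scalefac/2), and a crude bit counter in place of Huffman coding, and all names and sample values are hypothetical.

```python
def quantize(band, gain, scalefac):
    """Uniform quantizer; step size and amplification rules are assumptions."""
    step = 2.0 ** (gain / 4.0)
    amp = 2.0 ** (scalefac / 2.0)
    return [round(abs(x) * amp / step) * (1 if x >= 0 else -1) for x in band]


def dequantize(ix_band, gain, scalefac):
    step = 2.0 ** (gain / 4.0)
    amp = 2.0 ** (scalefac / 2.0)
    return [i * step / amp for i in ix_band]


def count_bits(ix_bands):
    """Stand-in for Huffman coding: magnitude bits plus a sign bit if nonzero."""
    return sum(abs(i).bit_length() + (1 if i else 0) for band in ix_bands for i in band)


def inner_loop(bands, scalefacs, budget):
    """Raise the global gain until the frame fits the bit budget."""
    gain = 0
    while True:
        ix = [quantize(b, gain, s) for b, s in zip(bands, scalefacs)]
        if count_bits(ix) <= budget:
            return gain, ix
        gain += 1


def band_noise(band, ix_band, gain, scalefac):
    rec = dequantize(ix_band, gain, scalefac)
    return sum((x - r) ** 2 for x, r in zip(band, rec)) / len(band)


def coding_loop(bands, allowed_noise, budget, max_iters=16):
    """Outer loop: amplify bands whose noise exceeds the allowed level."""
    scalefacs = [0] * len(bands)
    for _ in range(max_iters):
        gain, ix = inner_loop(bands, scalefacs, budget)
        noisy = [k for k in range(len(bands))
                 if band_noise(bands[k], ix[k], gain, scalefacs[k]) > allowed_noise[k]]
        if not noisy:
            return gain, scalefacs, ix, True     # budget met, no band over threshold
        for k in noisy:
            scalefacs[k] += 1
    gain, ix = inner_loop(bands, scalefacs, budget)
    return gain, scalefacs, ix, False            # gave up after max_iters


bands = [[8.0, -3.0, 1.0], [0.5, 0.25, -0.1]]   # hypothetical spectral values
budget = 20
gain, scalefacs, ix, converged = coding_loop(bands, [0.5, 0.01], budget)
print("gain:", gain, "scale factors:", scalefacs, "bits:", count_bits(ix), "ok:", converged)
```

Whichever way the outer loop ends, the final inner-loop pass guarantees that the returned values fit the budget; only the distortion target may be missed.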

[Figure: SPL (dB) vs. frequency (Hz), contrasting a band with a larger mask-to-noise ratio against a band with a smaller mask-to-noise ratio.]


Typical Quantization Control

[Figure: two quantizer plots of output amplitude (quantized SPL) vs. input amplitude (real SPL), one for GlobalGain = 220, ScaleFactor = 2 and one for GlobalGain = 230, ScaleFactor = 2, alongside input masker amplitude vs. critical band index. Annotations: inner loop, bit rate control; outer loop, band-specific distortion control.]

[You & Chen. Multim. Tools & Appl., 40(3):341–359, 2008.]











To Probe Further

• Tutorials on Psychoacoustic Coding (in increasing order of abstraction and generality)
  D. Pan. A tutorial on MPEG/audio compression. IEEE MultiMedia, 2(2):60–74, 1995.
  Nikil Jayant, James Johnston, and Robert Safranek. Signal compression based on models of human perception. Proceedings of the IEEE, 81(10):1385–1422, 1993.
  V. K. Goyal. Theoretical foundations of transform coding. IEEE Signal Processing Magazine, 18(5):9–21, 2001.
• Lightweight Overview of MP3
  Rassol Raissi. The theory behind mp3. Technical report, MP3’ Tech, December 2002.
• Scientific Basis of MP3 Coding Standard
  J. D. Johnston. Transform coding of audio signals using perceptual noise criteria. IEEE Journal on Selected Areas in Communications, 6(2):314–323, 1988.

