Embedded Audio Coder
Jin Li
Outline
- Introduction
- Embedded audio coder - Algorithm
  - MLT with window switching
  - Quantizer
  - Entropy coder
  - Bitstream assembly
  - Modular software design
- Experimental results & demos
- Conclusion
Introduction
Introduction - Audio Compression
[Figure: audio waveform -> bitstream]
EAC vs Other Compression
- Existing audio compression schemes: MP3, AAC, MPEG-4 audio, WMA, Real Audio, ...
- Why research a new audio codec?
Media vs File Compression
- File compression
  - Every bit is important and has to be compressed losslessly
- Media compression
  - The exact bit value is not important; distortion is tolerable
  - The amount of media is huge; a high compression ratio is required
  - Media needs adaptation
Key Features of EAC
- Not only good compression performance
- But also a flexible bitstream syntax
  - The compressed bitstream may be manipulated for
    - Different bitrate
    - Different number of audio channels
    - Different audio sampling rate
- Versatile
  - Lossless
  - Low delay
  - Streaming/storage application
EAC Encoder
Encoder
Master Bitstream
Companion File
Parser
- Except for the header, the application bitstream is a subset of the master bitstream (parsing is fast)
- The application bitstream may be changed according to the required bitrate, number of audio channels, and audio sampling rate
[Figure: Master Bitstream + Companion File -> Parser -> Application Bitstream]
EAC Decoder
[Figure: Bitstream -> Decoder -> Speaker (Direct Sound) or wav file]
Embedded Audio Coder - Algorithm Description
Framework - Encoder
[Figure: Audio -> L+R (or mono) -> Transform -> Entropy coder -> Bitstream assembly -> Bitstream;
 Audio -> L-R -> Transform -> Entropy coder -> Bitstream assembly]
Audio Transform
- Input: audio samples
- Output: transform coefficients
- Goal: convert audio from the time domain to the frequency domain
  - Compact energy
  - Better match with psychoacoustic characteristics
  - Enable audio sampling rate change
Lossy vs Lossless Mode
- Lossy mode: Audio -> MLT(SW) -> Quantization
- Lossless mode: Audio -> Reversible MLT(SW)
Lossy (Float) Pass
MLT - Modulated Lapped Transforms
[Figure: MLT response plots. Left panel "Spatial Response": amplitude (-1 to 1) vs. sample index (0 to 1000). Right panel "Frequency Domain": gain in dB (-100 to 40) vs. frequency (0 to 1, in units of pi).]
MLT with Window Switching
- Features
  - Basic window size: 2048
  - Short window size: 256
- Switching criterion: a frame (2048 samples) is switched to the short window if and only if
  - The frame energy is bigger than a certain threshold
  - The energy within the 8 subframes (256 samples each) differs by more than Ta
  - There are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter subframe by Tb
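The switching criterion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "differs by more than Ta" and "greater by Tb" are read as plain energy differences (they could equally be ratios), the third condition is read as "some adjacent pair of subframes", and the names `use_short_window`, `t_energy`, `t_a` and `t_b` are hypothetical, not taken from the EAC software.

```python
def subframe_energies(frame, n_sub=8):
    """Split a frame into n_sub equal subframes and return the energy of each."""
    size = len(frame) // n_sub
    return [sum(x * x for x in frame[i * size:(i + 1) * size])
            for i in range(n_sub)]


def use_short_window(frame, t_energy, t_a, t_b, n_sub=8):
    """Return True if all three switching conditions hold (assumed reading)."""
    e = subframe_energies(frame, n_sub)
    # 1. The frame must carry enough energy.
    if sum(e) <= t_energy:
        return False
    # 2. Subframe energies must differ by more than Ta (difference assumed).
    if max(e) - min(e) <= t_a:
        return False
    # 3. Some subframe must exceed its successor by more than Tb.
    return any(e[i] - e[i + 1] > t_b for i in range(n_sub - 1))


# A frame that is loud in its first subframe and silent afterwards.
transient = [1.0] * 256 + [0.0] * 1792
# A stationary frame with constant amplitude.
stationary = [0.5] * 2048
print(use_short_window(transient, 10.0, 50.0, 50.0))   # True
print(use_short_window(stationary, 10.0, 50.0, 50.0))  # False
```

The threshold values in the usage lines are arbitrary; a real encoder would tune them.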
Band Separation
[Figure labels: Audio (44.1 kHz sampling); MLT with window switching; Band separation]
Synthesis (Half Sampling)
[Figure labels: Audio (22.05 kHz sampling); MLT with window switching; Band separation]
Synthesis (Quarter Sampling)
[Figure labels: Audio (11.025 kHz sampling); MLT with window switching; Band separation]
Quantizer
- Input: coefficients
- Output: quantized coefficients
- Goal: convert coefficients from float to integer
  - Reduce signal levels
  - Fast implementation of entropy coding
Quantizer
- Scalar quantizer with a deadzone
[Figure: deadzone quantizer mapping each coefficient to a quantized magnitude and a sign]
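A deadzone scalar quantizer can be sketched as follows. The slide's own formula did not survive extraction, so this uses the common textbook form (magnitude = floor(|s| / step), sign kept separately); the function names, the `step` parameter and the midpoint reconstruction are assumptions made for illustration.

```python
import math


def quantize(coeffs, step):
    """Map each float coefficient to a (magnitude, sign) pair.

    Values with |c| < step fall into the deadzone and get magnitude 0.
    """
    pairs = []
    for c in coeffs:
        magnitude = int(math.floor(abs(c) / step))
        sign = -1 if c < 0 else 1
        pairs.append((magnitude, sign))
    return pairs


def dequantize(pairs, step):
    """Reconstruct at the middle of each quantization bin (0 in the deadzone)."""
    return [sign * (mag + 0.5) * step if mag > 0 else 0.0
            for mag, sign in pairs]


q = quantize([0.4, -0.9, 2.6, -7.3], 1.0)
print(q)                   # [(0, 1), (0, -1), (2, 1), (7, -1)]
print(dequantize(q, 1.0))  # [0.0, 0.0, 2.5, -7.5]
```

With this form the zero bin is twice as wide as the others, which is what makes it a deadzone.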
Lossless (Integer) Pass
Key to Achieve Lossless
- Break the MLT into small steps
- Make every step reversible
- Definition of a reversible transform
  - Integer input, integer output
  - The transform should have a determinant of 1 (it does not expand the data volume)
MLT Framework
[Figure: the lapped transform as a chain of blocks.
 Forward MLT: Window, then DCT-IV (Pre-rotation, Complex FFT, Post-rotation).
 Inverse MLT: the inverse of each block (Pre-rotation^-1, Complex FFT^-1, Post-rotation^-1, Inverse window).]
Window Operation
[Figure: each sample pair x(n), x(-n-1) passes through a complex rotation whose angle depends on n and N]
Pre-Rotation
[Figure: the windowed samples xw(0)..xw(7) pass through four complex rotations, by -π/32, -5π/32, -9π/32 and -13π/32, producing xp(0)..xp(7)]
FFT (4 Point Complex)
[Figure: xp(0)..xp(7) are paired into four complex values xc(0)..xc(3); a 4-point complex FFT built from butterflies and a twiddle factor yields yc(0)..yc(3), which are unpacked into yp(0)..yp(7)]
Post-Rotation
[Figure: yp(0)..yp(7) pass through four conjugate rotations, by -0, -π/8, -2π/8 and -3π/8, producing y(0)..y(7)]
Reversible MLT
- Make the following operations reversible
  - Butterfly operation
  - Complex rotation
  - Conjugate rotation
Reversible Unit Transform
[Equations: a 2x2 unit transform on the pair (a, b) is carried out as a short sequence of steps through an intermediate value t, with constant multipliers c]
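One well-known way to make a rotation reversible on integers is to factor it into three lifting (shear) steps and round inside each step. The sketch below uses that standard factorization; it is not necessarily the exact factorization shown on the slide, and the function names are hypothetical. Each shear has determinant 1, so the integer-to-integer map does not expand the data volume.

```python
import math


def _round(x):
    """Round to the nearest integer (halves go up)."""
    return int(math.floor(x + 0.5))


def rotate_int(a, b, theta):
    """Integer approximation of a rotation by theta via three lifting steps.

    theta must not be a multiple of pi (sin(theta) would be zero).
    """
    p = (math.cos(theta) - 1.0) / math.sin(theta)
    u = math.sin(theta)
    a = a + _round(p * b)
    b = b + _round(u * a)
    a = a + _round(p * b)
    return a, b


def unrotate_int(a, b, theta):
    """Exact inverse: undo the three lifting steps in reverse order."""
    p = (math.cos(theta) - 1.0) / math.sin(theta)
    u = math.sin(theta)
    a = a - _round(p * b)
    b = b - _round(u * a)
    a = a - _round(p * b)
    return a, b


x, y = rotate_int(1000, 0, math.pi / 2)
print((x, y))                             # (0, 1000)
print(unrotate_int(x, y, math.pi / 2))    # (1000, 0)
```

Because every step only adds a rounded function of the other, unchanged value, the inverse recomputes the same rounded term and subtracts it, so reconstruction is exact regardless of the rounding error.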
Entropy Coder
- Input: quantized coefficients
- Output: embedded coded bitstream with R-D performance curve
- Goal
  - Compression
  - Embedded bitstream for future manipulation
Frame Grouping
[Figure: frame grouping; axes: time slot (1 to 8) and frame]
Entropy Coder
[Figure: the entropy coder outputs a bitstream together with its R-D curve (distortion D vs. rate R)]
Entropy Coder
Embedded coding
Implicit psychoacoustic masking
Context modeling
Arithmetic coding
Implementation concerns
A block of coefficients
 45   0   0   0  -74  -13    0    0
 21   0   4   0   14    0   23   23
  0   0   0   0    3    0    4    0
  0   3   5   0    0    0    0    0
  0   1  -1   0   -4   33    0   -1
  0   0   1   0    0    0    0    0
 -4   5   0   0  -18    0    0   19
  4   0  23   0   -1    0    0    0
Bits of Coefficients
Coefficient  Value  b1 b2 b3 b4 b5 b6 b7  Sign
w0             45    0  1  0  1  1  0  1   +
w1            -74    1  0  0  1  0  1  0   -
w2             21    0  0  1  0  1  0  1   +
w3             14    0  0  0  1  1  1  0   +
w4             -4    0  0  0  0  1  0  0   -
w5            -18    0  0  1  0  0  1  0   -
w6              4    0  0  0  0  1  0  0   +
w7             -1    0  0  0  0  0  0  1   -
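The sign-magnitude bit rows of the table can be reproduced with a short Python sketch. The helper name `to_bitplanes` and the 7-bit width are assumptions chosen to match the b1..b7 columns.

```python
def to_bitplanes(value, n_bits=7):
    """Return (bits, sign): magnitude bits b1..b7, most significant first."""
    magnitude = abs(value)
    bits = [(magnitude >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]
    sign = '-' if value < 0 else '+'
    return bits, sign


for name, w in zip(['w0', 'w1', 'w2', 'w3', 'w4', 'w5', 'w6', 'w7'],
                   [45, -74, 21, 14, -4, -18, 4, -1]):
    bits, sign = to_bitplanes(w)
    print(name, ' '.join(str(b) for b in bits), sign)
```

Reading the output column by column gives the bit-planes that an embedded coder transmits from b1 down to b7.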
Conventional Coding
- Coefficients are coded one after another, each with all of its bits: first w0, second w1, third w2
- Values listed for w0 to w7: 46, -74, 22, 0, 0, 0, 0, 0
Embedded Coding
- Coefficients are coded bit-plane by bit-plane, most significant bit first; the sign follows the first 1 bit of a coefficient
  - First pass (b1):  0  1-  0  0  0  0  0  0
  - Second pass (b2): 1+  0  0  0  0  0  0  0
  - Third pass (b3):  0  0  1+  0  0  1-  0  0
- Decoder state
Coefficient  Value  Range
w0             40   32 to 47
w1            -72   -79 to -64
w2             24   16 to 31
w3              0   -31 to 31
w4              0   -31 to 31
w5            -24   -31 to 31
w6              0   -31 to 31
w7              0   -31 to 31
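The decoded values in the table can be reproduced by truncating every coefficient after a given number of bit-planes and reconstructing at the middle of the remaining uncertainty interval. The sketch below assumes that midpoint rule and a 7-bit magnitude; `decode_after_planes` is a hypothetical helper, not a function of the EAC software.

```python
def decode_after_planes(coeffs, n_planes, n_bits=7):
    """Simulate decoding when only the first n_planes bit-planes were received."""
    shift = n_bits - n_planes            # number of bits still unknown
    decoded = []
    for c in coeffs:
        known = (abs(c) >> shift) << shift   # magnitude bits received so far
        if known == 0:
            decoded.append(0)                # still insignificant
        else:
            half = (1 << shift) >> 1         # midpoint of the unknown bits
            value = known + half
            decoded.append(-value if c < 0 else value)
    return decoded


coeffs = [45, -74, 21, 14, -4, -18, 4, -1]
print(decode_after_planes(coeffs, 3))  # [40, -72, 24, 0, 0, -24, 0, 0]
print(decode_after_planes(coeffs, 7))  # [45, -74, 21, 14, -4, -18, 4, -1]
```

Cutting the bitstream at any bit-plane boundary therefore still yields a usable, coarser version of every coefficient, which is the point of embedded coding.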
Audio Masking
[Figure: level vs. frequency around a critical band and its neighboring band, marking the signal, the masking threshold, the maximum mask, the noise level, the signal-to-mask ratio and the noise-to-mask ratio]
Psychoacoustic Masking
- Traditional approach (explicit masking, used by all existing approaches)
  - Calculate the mask
  - Transmit the mask
  - Modify the transform coefficients (or the coding approach) according to the mask
  - Encode the transform coefficients
- Note: the mask modifies the coding content
Implicit Psychoacoustic Masking
- Key: the mask modifies the coding order; the content is the same
- Implicit masking
  - Calculate the static masking (Fletcher-Munson threshold)
  - Encode the MSB of the transform coefficients
  - Calculate the mask based on the MSB of the coefficients
  - Modify the coding order
  - Encode the next most important part of the coefficients
  - Repeat the process
Embedded Coding with Implicit Psychoacoustic Masking
- After the first pass, w1 is significant; w6 and w7 have not been coded yet
Coefficient  Value  Range
w0              0   -63 to 63
w1            -96   -127 to -64
w2              0   -63 to 63
w3              0   -63 to 63
w4              0   -63 to 63
w5              0   -63 to 63
w6              0   -127 to 127
w7              0   -127 to 127
[Legend: coefficient significant / insignificant; mask]
Embedded Coding with Implicit Psychoacoustic Masking
- After the second pass, w0 is significant as well
Coefficient  Value  Range
w0             48   32 to 63
w1            -96   -127 to -64
w2              0   -31 to 31
w3              0   -31 to 31
w4              0   -31 to 31
w5              0   -31 to 31
w6              0   -63 to 63
w7              0   -63 to 63
[Legend: coefficient significant / insignificant]
Context Modeling
- Context
  - Zero coding: significance statuses of neighbor coefficients
  - Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients
  - Sign: neighbor signs
After Implicit Psychoacoustic Masking & Context Modeling
 45   0   0   0  -74  -13    0    0
 21   0   4   0   14    0   23   23
  0   0   0   0    3    0    4    0
  0   3   5   0    0    0    0    0
  0   1  -1   0   -4   33    0   -1
  0   0   1   0    0    0    0    0
 -4   5   0   0  -18    0    0   19
  4   0  23   0   -1    0    0    0
- Bit (to be encoded): 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...
- Ctx (automatically generated): 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...
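Arithmetic Coding - Illustration (QM Coder used)
- What is arithmetic coding?
[Figure: the interval from 0 to 1 is split into parts P0 and 1-P0 for the first symbol; the chosen part is split again by P1 and 1-P1, then by P2 and 1-P2. For the symbols S0=0, S1=1, S2=0 the final interval is A.]
- Coding result: 0100
  (the shortest binary bitstream ensuring that the interval from B = 0100 0000000... to C = 0100 1111111... lies inside A, i.e. (B, C) ⊂ A)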
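The interval-narrowing idea can be sketched with exact fractions. This is an idealized arithmetic coder for illustration only; the QM coder named on the slide is a fast approximation of it. The probabilities, the convention that symbol 0 takes the lower part of the interval, and the function names are all assumptions.

```python
import math
from fractions import Fraction


def narrow_interval(symbols, p_zero):
    """Narrow [0, 1) once per binary symbol; p_zero[i] is P(symbol i == 0)."""
    low, high = Fraction(0), Fraction(1)
    for s, p in zip(symbols, p_zero):
        split = low + (high - low) * Fraction(p)
        if s == 0:
            high = split      # symbol 0 keeps the lower part (assumed layout)
        else:
            low = split       # symbol 1 keeps the upper part
    return low, high


def shortest_code(low, high):
    """Shortest bit string b whose interval [0.b000..., 0.b111...] lies in [low, high)."""
    n = 1
    while True:
        k = math.ceil(low * 2 ** n)            # smallest n-bit prefix >= low
        if Fraction(k + 1, 2 ** n) <= high:    # whole dyadic interval fits
            return format(k, '0{}b'.format(n))
        n += 1


half = Fraction(1, 2)
low, high = narrow_interval([0, 1, 0], [half, half, half])
print(low, high)                  # 1/4 3/8
print(shortest_code(low, high))   # 010
```

With equal probabilities each symbol costs exactly one bit; skewed probabilities shrink the interval less for likely symbols, so likely sequences get shorter codes.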
Entropy Coder (Summary)
[Figure: the entropy coder outputs a bitstream together with its R-D curve (distortion D vs. rate R)]
Speed Up Issues
- Context modeling
  - Use stored context
  - Update context when a coefficient becomes significant
- Implicit masking
  - Fast calculation of energy in a critical band
  - Lookup table to convert energy to mask
- R-D curve calculation
  - Lookup table calculation of distortion
- Context entropy coder
  - QM coder
  - Run-length Rice coder
Bitstream Assembly
- Input
  - Bitstream
  - R-D curve
- Output
  - Assembled bitstream
  - Companion file
[Figure: bitstream assembling]
EAC Bitstream Syntax
- Timeslot header
  - Whether a certain channel exists (1 bit)
  - Length of the channel bitstream (1-2 bytes)
[Figure: bitstream layout: EAC marker | Global Header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body)]
Companion File
[Figure: file layout: Global Header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve)]
Rate-Distortion Optimized Assembling (Single Timeslot)
[Figure: four R-D curves (D1 vs. R1 through D4 vs. R4), shown twice; the second set is marked with the rates r1, r2, r3 and r4]
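A common way to pick the rates r1..r4 from several R-D curves is to cut every curve at the same R-D slope and search for the slope that meets the byte budget. The sketch below shows that standard Lagrangian approach under stated assumptions: each curve is a list of (rate, distortion) points that includes a zero-rate point and is convex, and the names `pick_point`, `allocate` and `budget` are hypothetical rather than taken from EAC.

```python
def pick_point(curve, lam):
    """Point of one R-D curve minimizing distortion + lam * rate."""
    return min(curve, key=lambda rd: rd[1] + lam * rd[0])


def allocate(curves, budget, iters=60):
    """Bisect on the slope lam until the total rate fits the byte budget."""
    def total_rate(lam):
        return sum(pick_point(c, lam)[0] for c in curves)

    lo, hi = 0.0, 1.0
    while total_rate(hi) > budget:       # grow hi until the budget is met
        hi *= 2.0
        if hi > 1e18:
            raise ValueError("budget is below the minimum total rate")
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if total_rate(mid) > budget:
            lo = mid                     # too many bytes: steeper slope needed
        else:
            hi = mid                     # feasible: try a flatter slope
    return [pick_point(c, hi)[0] for c in curves]


curves = [
    [(0, 100), (10, 40), (20, 10), (30, 5)],   # channel 1: (bytes, distortion)
    [(0, 50), (10, 30), (20, 26), (30, 25)],   # channel 2
]
print(allocate(curves, 40))  # [30, 10]
```

In the example, channel 1 receives more bytes because each of its bytes removes more distortion than a byte spent on channel 2.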
Rate-Distortion Optimized Assembling (Multiple Timeslots)
[Figure: buffer-occupancy curve, buffer occupancy (bytes) vs. time (timeslots), bounded by two illegal regions]
Allocated Bytes Per Timeslot
- Allocated bytes for a certain timeslot:
  B_i = Buf_{i-1} - Buf_i + Rate_{trans} \cdot Time
- Where
  - B_i: allocated bytes for timeslot i
  - Buf_i: buffer occupancy level at timeslot i
  - Rate_{trans}: coding (network) rate per second
  - Time: time duration of the timeslot
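The allocation formula can be checked with a small numeric sketch. The buffer levels, the rate (taken here in bytes per second) and the timeslot duration are made-up illustration values, and `allocated_bytes` is a hypothetical helper name.

```python
def allocated_bytes(buf_levels, rate_trans, slot_time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time for i = 1..len(buf_levels)-1.

    buf_levels[0] is the initial buffer occupancy, in bytes.
    """
    return [buf_levels[i - 1] - buf_levels[i] + rate_trans * slot_time
            for i in range(1, len(buf_levels))]


# Occupancy drops from 4000 to 3000 bytes, then rises to 3500 bytes,
# with a 2000 byte/s channel and 0.5 s timeslots.
print(allocated_bytes([4000, 3000, 3500], 2000, 0.5))  # [2000.0, 500.0]
```

A timeslot that lets the buffer drain can spend more than the channel delivers in that slot, while a timeslot that refills the buffer must spend less.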
Optimization
- Given
  - Initial buffer occupancy level
  - Final buffer occupancy level (or intermediate level with a sliding window)
  - Buffer occupancy constraint
- Search for the number of bytes allocated to the current timeslot
Search (R-D slope)
[Figure: buffer occupancy (bytes) vs. time (timeslots), bounded by two illegal regions; candidate allocations are marked "Underflow (too many bytes)", "Overflow (too few bytes)" and "Waste bytes"]
Multiple Timeslots - Constant Bitrate
[Figure: buffer occupancy (bytes) vs. time (timeslots), bounded by two illegal regions]
Multiple Timeslots - Internet Streaming (Slow Start)
[Figure: buffer-occupancy curve, buffer occupancy (bytes) vs. time (timeslots), bounded by two illegal regions]
Multiple Timeslots - Internet Streaming (Normal)
[Figure: buffer occupancy (bytes) vs. time (timeslots), bounded by two illegal regions]
Modular Software Design
[Figure: Audio -> L+R (or mono) -> MLT(SW) -> Quantizer -> Entropy coder -> Bitstream assembly -> Bitstream;
 Audio -> L-R -> MLT(SW) -> Quantizer -> Entropy coder -> Bitstream assembly]
Modular Software Design
- Highly modularized pipeline design
  - Quantizer and entropy coder can be used for image/video compression as well
  - Probe and data input can be inserted into any part of the program
- Data flow driven (with necessary memory regulator <buffer>)
  - No long delay
  - No need for large memory
- Memory and computation efficient
  - Working memory preallocated
Experimental Results
EAC - Highly Efficient (NMR)
- Results based on the average of 16 MPEG-4 test clips
- The smaller the NMR, the better
Codec        8kbps  16kbps  32kbps  48kbps
MP4 TwinVQ   7.48   7.00    5.71    4.48
WMA          8.47   5.56    3.25    0.40
EAC          6.69   5.68    2.80    -2.2
EAC - Lossless
- Results based on the average of 16 MPEG-4 test clips
Codec           Compression Ratio
EAC             2.72
Monkey's Audio  2.72
WinZip          1.32
EAC (Versatile)
- Versatile
  - Real-time 2-way communication (low delay mode)
  - Storage device (Pocket PC, Xbox)
  - Internet streaming
EAC (Low Delay Mode)
- Reducing frame size
- Timeslot = 1 frame
- Fixed-length timeslot bitstream
- Delay = 2 frames (ignoring encoding/decoding delay)
- Network transmission time (over a modem line, delay = 3 frames)
EAC (Low Delay Mode)
[Figure: timeline over frames i-1, i, i+1. Encoder side: "Start Encoding Frame i", then MLT -> Quantizer -> Entropy -> Bitstream -> network. Decoder side: "Start Decoding Frame i", then Entropy -> Quantizer, ending at "Playable here".]
EAC - Flexible Bitstream Syntax
- Flexible bitstream syntax
  - Parser may reassemble the bitstream at 1000x real time
  - Change
    - Bit rate
    - Number of audio channels
    - Audio sampling rate
EAC - Software
- Software
  - Encoder: 8x real time (stereo, 44.1 kHz sampling)
  - Decoder: 20x real time
  - Parser: 1000x real time
EAC - Encoder
[Figure: Audio -> Encoder -> Stereo 128 kbps bitstream + Companion file]
EAC - Parser
[Figure: on the server, Stereo 128 kbps bitstream + Companion file -> Parser -> one of:
 Stereo 16 kbps; Mono 8 kbps; Stereo 16 kbps, slow start; Mono 8 kbps, 11 kHz sampling]
EAC - Decoder
Decoder
Stereo 16kbps
Mono 8kbps
Stereo 16kbps Slow start
Mono 8kbps 11kHz sampling
Comparison
Original MP4 TwinVQ WMA EAC
MP3
Conclusions
- An embedded audio coder is developed
  - Highly efficient
  - Versatile: low delay, constant bitrate, streaming
  - Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  - Good prototype available: real-time execution, small memory footprint
375
Introduction
475
Introduction ndash Audio Compression
Audio Waveform
Bitstream
575
EAC vs Other Compression
Existing audio compression schemes MP3 AAC MPEG4 audio WMA Real Audio hellip
Why research for a new audio codec
675
Media vs File Compression
File compression Every bit is important has to be compressed
losslessly
Media compression Exact bitvalue is not important distortion is
tolerable Amount of media is huge high compression ratio is
required Media needs adaptation
775
Key Features of EAC
Not only good compression performance
But also flexible bitstream syntax The compressed bitstream may be manipulated for
Different bitrate Different of audio channels Different audio sampling rate
Versatile Lossless Low delay Streamingstorage application
875
EAC Encoder
Encoder
Master Bitstream
Companion File
975
Parser
Except header application bitstream is a subset of the master bitstream (parsing is fast)
May be changed according to the required bitrate of audio channels and audio sampling rate
Parser
Master Bitstream
Companion File
Application Bitstream
1075
EAC Decoder
Encoder
Bitstream
Speaker (Direct Sound)
wav file
1175
Embedded Audio Coder- Algorithm Description
1275
Frame Work - Encoder
Transform Entropy coder
BitstreamAssembly
TransformEntropy
coderBitstreamAssembly
Audio
Bitstream
L+R(or mono)
L-R
1375
Audio Transform
Input audio sample
Output transform coefficient
Goal convert audio from space domain to frequency domain Compact energy Better match with psychoacoustic
characteristics Enable audio sampling rate change
1475
Lossy vs Lossless Mode
MLT(SW)Audio
Quantization
Lossy mode
Reversible MLT(SW)
Audio
Lossless mode
1575
Lossy (Float) Pass
1675
MLT - Modulated Lapped Transforms
0 100 200 300 400 500 600 700 800 900 1000-1
-08
-06
-04
-02
0
02
04
06
08
1
Spatial Response
0 01 02 03 04 05 06 07 08 09 1-100
-80
-60
-40
-20
0
20
40
Frequency (pi)G
ain
(dB
)
Frequency Domain
1775
MLT with Window Switching
Features Basic window size 2048 Short window size 256 Switching criterion A frame (2048 samples) is switched to short window if and
only if Energy is bigger than a certain threshold Energy within the 8 subframes (256 samples) differs more than Ta
There are at least two neighbor subframes where the energy of the former subframe is greater than the latter subframe by Tb
1875
Band Separation
Audio (441kHz sampling)
MLT with window switching
Band separation0
1975
Synthesis (Half Sampling)
Audio (2205kHz sampling)
MLT with window switching
Band separation
0
2075
Synthesis (Quarter Sampling)
Audio (11025kHz sampling)
MLT with window switching
Band separation
0
2175
Quantizer
Input coefficient
Output quantized coefficient
Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding
2275
Quantizer
Scalar quantizer with a deadzone
s
snmsnm
1
0n][m
][][
Quantized Magnitude Sign
0
2375
Lossless (Integer) Pass
2475
Key to Achieve Lossless
Break the MLT into small steps
Make every step reversible
Definition of reversible transform Integer input integer output The transform should have a determinant of 1
(donot expand data volume)
2575
MLT Framework
Pre-R
otate
Com
plex FF
T
Post R
otation
DCT IV
Window
Lapped Transform
Pre-R
otate-l
Com
plex FF
T-l
Post R
otation-l
Inv Window
-l
Forward MLT
Inverse MLT
2675
Window Operation
x(n)x(-n-1)
N
n
4
)21(
4
Complex Rotate
2775
Pre-Rotation
Complex Rotate ndash32 xw(0)
xw(1)
xw(2)
xw(3)
xw(4)
xw(5)
xw(6)
xw(7)
Complex Rotate ndash532
Complex Rotate ndash932
Complex Rotate ndash1332
xp(0)
xp(1)
xp(2)
xp(3)
xp(4)
xp(5)
xp(6)
xp(7)
2875
FFT (4 Point Complex)
xp(0)
xp(1)
xp(2)
xp(3)
xp(4)
xp(5)
xp(6)
xp(7)
xc(0)
xc(1)
xc(2)
xc(3)
-
- e-j2
-
-
yc(0)
yc(1)
yc(2)
yc(3)
yp(0)
yp(1)
xp(2)
xp(3)
xp(4)
xp(5)
xp(6)
xp(7)
2975
Post-Rotation
Conjugate Rotate ndash0y(0)
y(1)
y(2)
y(3)
y(4)
y(5)
y(6)
y(7)
Conjugate Rotate ndash8
Conjugate Rotate ndash28
Conjugate Rotate ndash38
yp(0)
yp(1)
yp(2)
yp(3)
yp(4)
yp(5)
yp(6)
yp(7)
3075
Reversible MLT
Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation
3175
Reversible Unit Transform
b
a
b
a
11
11
2
1
2
1
0
actb
tcba
bcat
21
21
1
20
c
cc
3275
Entropy Coder
Input quantized coefficients
Output embedded coded bitstream with R-D
performance curve
Goal Compression Embedded bitstream for future manipulation
3375
Frame Grouping
Time slot
1 2 3 4 5 6 7 8
Fram
e
3475
Entropy Coder
D
R
Bitstream
R-D curve
3575
Entropy Coder
Embedded coding
Implicit psychoacoustic masking
Context modeling
Arithmetic coding
Implementation concerns
3675
A block of coefficients
45 0 0 0-74 -13 0 0
21 0 4 014 0 23 23
0 0 0 03 0 4 0
0 3 5 00 0 0 0
0 1 -1 0-4 33 0 -1
0 0 1 00 0 0 0
-4 5 0 0-18 0 0 19
4 0 23 0-1 0 0 0
Next View graph
3775
Bits of Coefficients
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
Signb1 b2 b3 b4 b5 b6 b7
w0
w1
w2
w3
w4
w5
w6
w7
coef
fici
ent
45
-74
21
14-4
-18
4
-1
3875
Conventional Coding
First
Second
Third
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +
Signb1 b2 b3 b4 b5 b6 b7
w0
w1
w2
w3
w4
w5
w6
w7
46
-74
22
00
0
00
3975
Embedded Coding
01 -000000
1 +0000000
001 +001 -00
Signb1 b2 b3 b4 b5 b6 b7
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
First Second
Third
w0
w1
w2
w3
w4
w5
w6
w7
Value
40
Range
3247
-72 -79-64
163124
-31310
-31310
-3131-24
-31310
-31310
4075
Audio Masking
FrequencyCriticalBand
NeighboringBand
Noise Level
Signal
Masking Threshold
Maximum Mask
Signal-tomask ratio
Noise-tomask ratio
4175
Psychoacoustic Masking
Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding
approach) according to the masking Encode the transform coefficients
Note Mask modifies the coding content
4275
Implicit Psychoacoustic Masking
Key Mask modifies the coding order the content is the same
Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process
4375
Embedded Coding with Implicit Psychoacoustic Masking
01 -000000
Signb1 b2 b3 b4 b5 b6 b7
001 -000000
First
w0
w1
w2
w3
w4
w5
w6
w7
Value
0
Range
-6363
-96 -127-64
-63630
-63630
-63630
-63630
-1271270
-1271270
Coefficient SignificantInsignificant
Mask
4475
Embedded Coding with Implicit Psychoacoustic Masking
01 -000000
1 +0000000
Signb1 b2 b3 b4 b5 b6 b7
0 10 1 +1 0 -0 00 00 00 00 00 0
First Second
w0
w1
w2
w3
w4
w5
w6
w7
Value
48
Range
3263
-96 -127-64
-31310
-31310
-31310
-31310
-63630
-63630
Coefficient SignificantInsignificant
4575
Context Modeling
Context Zero coding
Significant statuses of neighbor coefficients Refinement
Whether it is the 1st refinement pass Significant statuses of neighbor coefficients
Sign Neighbor signs
46/75
After Implicit Psychoacoustic Masking & Context Modeling
 45   0   0   0  -74  -13   0   0
 21   0   4   0   14    0  23  23
  0   0   0   0    3    0   4   0
  0   3   5   0    0    0   0   0
  0   1  -1   0   -4   33   0  -1
  0   0   1   0    0    0   0   0
 -4   5   0   0  -18    0   0  19
  4   0  23   0   -1    0   0   0
Bit (to be encoded):           0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...
Ctx (automatically generated): 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...
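47/75
Arithmetic Coding - Illustration (QM Coder used)
What is arithmetic coding?
[Figure: the interval from 0 to 1 is subdivided step by step, with probabilities P0 and 1-P0, then P1 and 1-P1, then P2 and 1-P2, as the symbols S0=0, S1=1, S2=0 are coded. The final interval is labeled A.]
Coding result: 0100
(The shortest binary bitstream ensures that the interval from B = 0100 0000000... to C = 0100 1111111... lies inside A, i.e. (B, C) ⊂ A.)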
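The interval subdivision in the illustration can be sketched with exact fractions. This is an idealized arithmetic coder, not the QM coder named in the slide title. The slide does not give P0, P1 or P2, so the probabilities below are assumptions picked so that the three symbols 0, 1, 0 end in the code word 0100; the function names are hypothetical.

```python
from fractions import Fraction
from math import ceil
from typing import List, Tuple


def narrow(symbols: List[int], p_zero: List[Fraction]) -> Tuple[Fraction, Fraction]:
    """Shrink [0, 1) once per symbol; the lower part of each split belongs to symbol 0."""
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width *= p0                 # keep the lower part
        else:
            low += width * p0           # skip the lower part
            width *= 1 - p0             # keep the upper part
    return low, low + width             # final interval A = [low, high)


def shortest_code(low: Fraction, high: Fraction) -> str:
    """Shortest bit string b such that every continuation 0.b... stays inside [low, high)."""
    k = 1
    while True:
        v = ceil(low * 2**k)            # smallest k-bit prefix not below `low`
        if Fraction(v + 1, 2**k) <= high:
            return format(v, "b").zfill(k)
        k += 1


if __name__ == "__main__":
    # Assumed probabilities of a 0 for S0, S1, S2 (not given on the slide).
    probs = [Fraction(1, 2), Fraction(1, 2), Fraction(1, 4)]
    a_low, a_high = narrow([0, 1, 0], probs)
    print(a_low, a_high, shortest_code(a_low, a_high))
```

With these assumed probabilities the final interval is [1/4, 5/16), and 0100 is the shortest prefix whose whole range of continuations (B to C in the slide's notation) fits inside it.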
48/75
Entropy Coder (Summary)
[Figure: the entropy coder outputs a bitstream and an R-D curve (distortion D versus rate R)]
49/75
Speed Up Issues
Context modeling:
  Use stored context
  Update the context when a coefficient becomes significant
Implicit masking:
  Fast calculation of the energy in a critical band
  Lookup table to convert energy to mask
R-D curve calculation:
  Lookup-table calculation of distortion
Context entropy coder:
  QM coder
  Run-length Rice coder
50/75
Bitstream Assembly
Input: bitstream, R-D curve
Output: assembled bitstream, companion file
Bitstream assembling
51/75
EAC Bitstream Syntax
Timeslot header:
  Whether a certain channel exists (1 bit)
  Length of the channel bitstream (1-2 bytes)
Layout: EAC marker | Global header | Timeslot (head, body) | Timeslot (head, body) | Timeslot (head, body)
52/75
Companion File
Layout: Global header | Timeslot (head, R-D curve) | Timeslot (head, R-D curve) | Timeslot (head, R-D curve)
53/75
Rate-Distortion Optimized Assembling (Single Timeslot)
[Figure: four R-D curves (D1 versus R1, D2 versus R2, D3 versus R3, D4 versus R4), shown twice; the rates r1, r2, r3 and r4 are marked on the curves]
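One common way to choose the rates r1 to r4 for a byte budget is to truncate every embedded bitstream at roughly the same R-D slope. The sketch below illustrates that idea with a greedy loop; it assumes convex R-D curves given as (rate in bytes, distortion) points and is not taken from the EAC implementation. The function name and the sample curves are hypothetical.

```python
import heapq
from typing import List, Tuple

Curve = List[Tuple[int, float]]  # (rate in bytes, distortion), sorted by rate


def allocate(curves: List[Curve], budget: int) -> List[int]:
    """Greedy equal-slope allocation: always extend the stream whose next
    segment removes the most distortion per byte, until the budget is hit."""
    cut = [0] * len(curves)          # chosen truncation point per stream
    used = 0
    heap: List[Tuple[float, int]] = []

    def push(i: int) -> None:
        k = cut[i]
        if k + 1 < len(curves[i]):
            (r0, d0), (r1, d1) = curves[i][k], curves[i][k + 1]
            # negative slope so that heapq pops the steepest segment first
            heapq.heappush(heap, (-(d0 - d1) / (r1 - r0), i))

    for i in range(len(curves)):
        push(i)
    while heap:
        _, i = heapq.heappop(heap)
        extra = curves[i][cut[i] + 1][0] - curves[i][cut[i]][0]
        if used + extra > budget:
            break                    # stop at the first segment that does not fit
        used += extra
        cut[i] += 1
        push(i)
    return [curves[i][cut[i]][0] for i in range(len(curves))]


if __name__ == "__main__":
    sample = [
        [(0, 100.0), (10, 40.0), (20, 20.0), (30, 15.0)],
        [(0, 80.0), (10, 50.0), (20, 35.0), (30, 30.0)],
    ]
    print(allocate(sample, 40))
```

Because each curve is assumed convex, the segments of one stream are always taken in order, so the result is a valid truncation point of each embedded bitstream.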
54/75
Rate-Distortion Optimized Assembling (Multiple Timeslots)
[Figure: buffer-occupancy curve. Axes: buffer occupancy (bytes) versus time (timeslots). The curve runs between two illegal regions.]
55/75
Allocated Bytes Per Timeslot
Allocated bytes for a certain timeslot:
  $B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \cdot \mathrm{Time}$
Where:
  $B_i$: allocated bytes for timeslot $i$
  $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
  $\mathrm{Time}$: time duration of the timeslot
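As a worked example of the allocation formula, the sketch below turns a sequence of buffer occupancy levels into per-timeslot byte allocations. It assumes the rate is expressed in bytes per second and the timeslot duration in seconds; the slide does not state the units, and the numbers and function names are illustrative only.

```python
from typing import List


def allocated_bytes(buf_prev: float, buf_cur: float,
                    rate_bytes_per_s: float, slot_seconds: float) -> float:
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time."""
    return buf_prev - buf_cur + rate_bytes_per_s * slot_seconds


def allocations(buf_levels: List[float], rate_bytes_per_s: float,
                slot_seconds: float) -> List[float]:
    """Allocation for every timeslot i = 1..n, given buffer levels Buf_0..Buf_n."""
    return [
        allocated_bytes(buf_levels[i - 1], buf_levels[i], rate_bytes_per_s, slot_seconds)
        for i in range(1, len(buf_levels))
    ]


if __name__ == "__main__":
    # Assumed figures: 16 kbps channel = 2000 bytes/s, 0.5 s timeslots.
    levels = [4000, 3500, 3500, 4200]
    print(allocations(levels, 2000, 0.5))
```

Summed over all timeslots, the allocations equal the bytes delivered by the channel plus the net change in buffer level (here 3 x 1000 + 4000 - 4200 = 2800 bytes), so only the shape of the buffer-occupancy curve decides how the bytes are distributed over time.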
56/75
Optimization
Given:
  Initial buffer occupancy level
  Final buffer occupancy level (or intermediate level with a sliding window)
  Buffer occupancy constraint
Search for the number of bytes allocated to the current timeslot
57/75
Search (R-D slope)
[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions. Labels: Underflow (too many bytes), Overflow (too few bytes), Waste bytes]
58/75
Multiple Timeslots - Constant Bitrate
[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions]
59/75
Multiple Timeslots - Internet Streaming (Slow Start)
[Figure: buffer-occupancy curve. Axes: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions]
60/75
Multiple Timeslots - Internet Streaming (Normal)
[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions]
61/75
Modular Software Design
[Figure: encoder pipeline. Audio is split into L+R (or mono) and L-R; each path runs through MLT(SW), Quantizer, Entropy coder and Bitstream Assembly to produce the Bitstream]
62/75
Modular Software Design
Highly modularized pipeline design
  Quantizer and entropy coder can be used for image/video compression as well
  Probes and data input can be inserted into any part of the program
Data flow driven (with the necessary memory regulator, a buffer)
  No long delay
  No need for large memory
Memory and computation efficient
  Working memory preallocated
63/75
Experimental Results
64/75
EAC - Highly Efficient (NMR)
Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.
Codec         8kbps   16kbps   32kbps   48kbps
EAC           669     568      280      -22
WMA           847     556      325      040
MP4 TwinVQ    748     700      571      448
65/75
EAC - Lossless
Results based on the average of 16 MPEG4 test clips
Codec             Compression Ratio
WinZip            132
Monkey's Audio    272
EAC               272
66/75
EAC (Versatile)
Versatile:
  Real-time 2-way communication (low delay mode)
  Storage device (Pocket PC, Xbox)
  Internet streaming
67/75
EAC (Low Delay Mode)
Reducing frame size
Timeslot = 1 frame
Fixed-length timeslot bitstream
Delay = 2 frames
  (Ignoring encoding/decoding delay)
  Network transmission time (if modem line, delay = 3 frames)
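68/75
EAC (Low Delay Mode)
[Figure: timeline over frames i-1, i, i+1. Encoder: start encoding frame i, then MLT, Quantizer, Entropy, Bitstream. The bitstream crosses the network. Decoder: start decoding frame i, then Entropy, Quantizer. The point where the frame becomes playable is marked "Playable here".]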
69/75
EAC - Flexible Bitstream Syntax
Flexible bitstream syntax
  Parser may reassemble the bitstream at 1000x real time
  Change: bit rate, number of audio channels, audio sampling rate
70/75
EAC - Software
Software:
  Encoder: 8x real time (stereo, 44.1 kHz sampling)
  Decoder: 20x real time
  Parser: 1000x real time
71/75
EAC - Encoder
[Figure: Audio enters the Encoder, which outputs a stereo 128 kbps bitstream and a companion file]
72/75
EAC - Parser
[Figure: on the server, the Parser takes the stereo 128 kbps bitstream and the companion file and produces any of: Stereo 16 kbps; Mono 8 kbps; Stereo 16 kbps, slow start; Mono 8 kbps, 11 kHz sampling]
73/75
EAC - Decoder
[Figure: the Decoder plays any of the parsed bitstreams: Stereo 16 kbps; Mono 8 kbps; Stereo 16 kbps, slow start; Mono 8 kbps, 11 kHz sampling]
74/75
Comparison
Demo clips: Original, MP3, MP4 TwinVQ, WMA, EAC
75/75
Conclusions
An embedded audio coder has been developed
  Highly efficient
  Versatile: low delay, constant bitrate, streaming
  Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  Good prototype available: real-time execution, small memory footprint
675
Media vs File Compression
File compression Every bit is important has to be compressed
losslessly
Media compression Exact bitvalue is not important distortion is
tolerable Amount of media is huge high compression ratio is
required Media needs adaptation
775
Key Features of EAC
Not only good compression performance
But also flexible bitstream syntax The compressed bitstream may be manipulated for
Different bitrate Different of audio channels Different audio sampling rate
Versatile Lossless Low delay Streamingstorage application
875
EAC Encoder
Encoder
Master Bitstream
Companion File
975
Parser
Except header application bitstream is a subset of the master bitstream (parsing is fast)
May be changed according to the required bitrate of audio channels and audio sampling rate
Parser
Master Bitstream
Companion File
Application Bitstream
1075
EAC Decoder
Encoder
Bitstream
Speaker (Direct Sound)
wav file
1175
Embedded Audio Coder- Algorithm Description
1275
Frame Work - Encoder
Transform Entropy coder
BitstreamAssembly
TransformEntropy
coderBitstreamAssembly
Audio
Bitstream
L+R(or mono)
L-R
1375
Audio Transform
Input audio sample
Output transform coefficient
Goal convert audio from space domain to frequency domain Compact energy Better match with psychoacoustic
characteristics Enable audio sampling rate change
1475
Lossy vs Lossless Mode
MLT(SW)Audio
Quantization
Lossy mode
Reversible MLT(SW)
Audio
Lossless mode
1575
Lossy (Float) Pass
1675
MLT - Modulated Lapped Transforms
0 100 200 300 400 500 600 700 800 900 1000-1
-08
-06
-04
-02
0
02
04
06
08
1
Spatial Response
0 01 02 03 04 05 06 07 08 09 1-100
-80
-60
-40
-20
0
20
40
Frequency (pi)G
ain
(dB
)
Frequency Domain
1775
MLT with Window Switching
Features Basic window size 2048 Short window size 256 Switching criterion A frame (2048 samples) is switched to short window if and
only if Energy is bigger than a certain threshold Energy within the 8 subframes (256 samples) differs more than Ta
There are at least two neighbor subframes where the energy of the former subframe is greater than the latter subframe by Tb
1875
Band Separation
Audio (441kHz sampling)
MLT with window switching
Band separation0
1975
Synthesis (Half Sampling)
Audio (2205kHz sampling)
MLT with window switching
Band separation
0
2075
Synthesis (Quarter Sampling)
Audio (11025kHz sampling)
MLT with window switching
Band separation
0
2175
Quantizer
Input coefficient
Output quantized coefficient
Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding
2275
Quantizer
Scalar quantizer with a deadzone
Equation (garbled in extraction, not recoverable): it maps each coefficient to a quantized magnitude and a sign.
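Since the slide's formula did not survive extraction, the following is a sketch of one common form of a scalar quantizer with a deadzone, not necessarily the one used in EAC. The step size `step`, the reconstruction offset `bias`, and the function names are assumptions introduced for illustration.

```python
import math


def quantize(x, step):
    """Return (magnitude, sign): magnitude = floor(|x| / step), sign = +1 or -1."""
    sign = -1 if x < 0 else 1
    magnitude = int(math.floor(abs(x) / step))
    return magnitude, sign


def dequantize(magnitude, sign, step, bias=0.5):
    """Reconstruct inside the quantization bin; a zero magnitude stays zero."""
    if magnitude == 0:
        return 0.0
    return sign * (magnitude + bias) * step
```

Every input in (-step, step) maps to magnitude 0, so the zero bin is twice as wide as the others, which is what makes it a deadzone.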
2375
Lossless (Integer) Pass
2475
Key to Achieve Lossless
Break the MLT into small steps
Make every step reversible
Definition of reversible transform
- Integer input, integer output
- The transform should have a determinant of 1 (do not expand data volume)
2575
MLT Framework
Figure: Forward MLT = Window followed by a DCT-IV (together the lapped transform), where the DCT-IV is computed as Pre-Rotate, Complex FFT, Post-Rotation. Inverse MLT = the inverse stages (Pre-Rotate^-1, Complex FFT^-1, Post-Rotation^-1) and Inv Window.
2675
Window Operation
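Figure: the sample pair x(n), x(-n-1) passes through a Complex Rotate block; the rotation angle depends on n and N (the exact expression is not recoverable from the extraction).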
2775
Pre-Rotation
Figure (8-point example): four Complex Rotate blocks with angles -pi/32, -5pi/32, -9pi/32 and -13pi/32 map the windowed samples xw(0)..xw(7) to xp(0)..xp(7).
2875
FFT (4 Point Complex)
Figure: the values xp(0)..xp(7) form four complex numbers xc(0)..xc(3); the 4-point complex FFT butterflies (subtractions and one twiddle factor e^(-j pi/2)) produce yc(0)..yc(3), which give yp(0)..yp(7).
2975
Post-Rotation
Figure (8-point example): four Conjugate Rotate blocks with angles -0, -pi/8, -2pi/8 and -3pi/8 map yp(0)..yp(7) to the outputs y(0)..y(7).
3075
Reversible MLT
Make the following operations reversible
- Butterfly operation
- Complex rotation
- Conjugate rotation
3175
Reversible Unit Transform
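Equations (garbled in extraction, not fully recoverable): a 2x2 transform of the pair (a, b), with entries 1, 1 / 1, 1 and a scale factor, is realized as a sequence of steps that update a, b and a temporary value t using multipliers c.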
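One standard way to make a rotation reversible on integers is to factor it into three lifting (shear) steps and round inside each step. The sketch below shows that technique; it is an assumption that the slide's unit transform follows the same pattern, and the function names are hypothetical. The angle must not be a multiple of pi, because the factorization divides by sin(theta).

```python
import math


def lift_rotate(a, b, theta):
    """Integer-to-integer approximation of rotating (a, b) by theta."""
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a + round(c0 * b)   # shear 1: only a changes
    b = b + round(c1 * a)   # shear 2: only b changes
    a = a + round(c0 * b)   # shear 3: only a changes
    return a, b


def lift_rotate_inverse(a, b, theta):
    """Undo lift_rotate exactly by subtracting the same rounded terms in reverse order."""
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a - round(c0 * b)
    b = b - round(c1 * a)
    a = a - round(c0 * b)
    return a, b
```

Each step changes one value by a rounded function of the other, so it can be undone exactly no matter how the rounding fell. Each shear also has determinant 1, matching the "determinant of 1" requirement stated earlier.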
3275
Entropy Coder
Input: quantized coefficients
Output: embedded coded bitstream with R-D performance curve
Goal
- Compression
- Embedded bitstream for future manipulation
3375
Frame Grouping
Figure: frames 1 to 8 grouped into a time slot
3475
Entropy Coder
Figure: the entropy coder outputs a bitstream and an R-D curve (distortion D versus rate R)
3575
Entropy Coder
Embedded coding
Implicit psychoacoustic masking
Context modeling
Arithmetic coding
Implementation concerns
3675
A block of coefficients
 45   0   0   0  -74  -13   0   0
 21   0   4   0   14    0  23  23
  0   0   0   0    3    0   4   0
  0   3   5   0    0    0   0   0
  0   1  -1   0   -4   33   0  -1
  0   0   1   0    0    0   0   0
 -4   5   0   0  -18    0   0  19
  4   0  23   0   -1    0   0   0
Next View graph
3775
Bits of Coefficients
Coef  Value   b1 b2 b3 b4 b5 b6 b7   Sign
w0      45     0  1  0  1  1  0  1    +
w1     -74     1  0  0  1  0  1  0    -
w2      21     0  0  1  0  1  0  1    +
w3      14     0  0  0  1  1  1  0    +
w4      -4     0  0  0  0  1  0  0    -
w5     -18     0  0  1  0  0  1  0    -
w6       4     0  0  0  0  1  0  0    +
w7      -1     0  0  0  0  0  0  1    -
3875
Conventional Coding
Coefficients are sent one after another, each with all of its bits b1..b7 and its sign (First: w0, Second: w1, Third: w2).
Bits sent after three coefficients:
Coef   b1 b2 b3 b4 b5 b6 b7   Sign
w0      0  1  0  1  1  0  1    +
w1      1  0  0  1  0  1  0    -
w2      0  0  1  0  1  0  1    +
Decoded values: w0 = 46, w1 = -74, w2 = 22, w3..w7 = 0
3975
Embedded Coding
Bits are sent bit plane by bit plane (First: b1, Second: b2, Third: b3), in the order w0..w7; the sign follows the first 1 bit of a coefficient.
First (b1):    0   1-  0   0   0   0   0   0
Second (b2):   1+  0   0   0   0   0   0   0
Third (b3):    0   0   1+  0   0   1-  0   0
After the third pass:
Coef  Value  Range
w0      40   32..47
w1     -72   -79..-64
w2      24   16..31
w3       0   -31..31
w4       0   -31..31
w5     -24   -31..31
w6       0   -31..31
w7       0   -31..31
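The value and range a decoder holds after a given number of bit planes can be worked out with a few shifts. The sketch below assumes 7 magnitude bit planes (b1 is the most significant) and a reconstruction at the middle of the known range. The function name is hypothetical, and the sketch ignores masking and any partial passes shown on the slide.

```python
NUM_PLANES = 7  # b1..b7 as in the bit table


def decode_state(w, planes_decoded, num_planes=NUM_PLANES):
    """Return (value, low, high) known after the top `planes_decoded` bit planes of w."""
    shift = num_planes - planes_decoded
    prefix = abs(w) >> shift            # magnitude bits received so far
    if prefix == 0:
        # Still insignificant: the sign is unknown and |w| < 2**shift.
        bound = (1 << shift) - 1
        return 0, -bound, bound
    low = prefix << shift
    high = low + (1 << shift) - 1
    value = low + (1 << shift) // 2     # reconstruct near the middle of the range
    if w < 0:
        return -value, -high, -low
    return value, low, high
```

For w0 = 45, w1 = -74 and w2 = 21 after three planes this returns (40, 32, 47), (-72, -79, -64) and (24, 16, 31), matching the values and ranges listed for those coefficients.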
4075
Audio Masking
Figure: masking diagram over frequency, showing the critical band and the neighboring band, the signal, the noise level, the masking threshold, the maximum mask, the signal-to-mask ratio and the noise-to-mask ratio.
4175
Psychoacoustic Masking
Traditional approach (explicit masking, all existing approaches)
- Calculate the mask
- Transmit the mask
- Modify transform coefficients (or coding approach) according to the masking
- Encode the transform coefficients
Note: the mask modifies the coding content
4275
Implicit Psychoacoustic Masking
Key: the mask modifies the coding order; the content is the same
Implicit masking
- Calculate the static masking (Fletcher-Munson threshold)
- Encode the MSB of the transform coefficients
- Calculate the mask based on the MSB of the coefficients
- Modify coding order
- Encode the next most important part of the coefficients
- Repeat the process
4375
Embedded Coding with Implicit Psychoacoustic Masking
First (b1):    0   1-  0   0   0   0   0   0
After the first pass:
Coef  Value  Range
w0       0   -63..63
w1     -96   -127..-64
w2       0   -63..63
w3       0   -63..63
w4       0   -63..63
w5       0   -63..63
w6       0   -127..127
w7       0   -127..127
Legend: coefficient significant / insignificant; mask
4475
Embedded Coding with Implicit Psychoacoustic Masking
First (b1):    0   1-  0   0   0   0   0   0
Second (b2):   1+  0   0   0   0   0   0   0
After the second pass:
Coef  Value  Range
w0      48   32..63
w1     -96   -127..-64
w2       0   -31..31
w3       0   -31..31
w4       0   -31..31
w5       0   -31..31
w6       0   -63..63
w7       0   -63..63
Legend: coefficient significant / insignificant
4575
Context Modeling
Context
- Zero coding: significance statuses of neighbor coefficients
- Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients
- Sign: neighbor signs
4675
After Implicit Psychoacoustic Masking & Context Modeling
 45   0   0   0  -74  -13   0   0
 21   0   4   0   14    0  23  23
  0   0   0   0    3    0   4   0
  0   3   5   0    0    0   0   0
  0   1  -1   0   -4   33   0  -1
  0   0   1   0    0    0   0   0
 -4   5   0   0  -18    0   0  19
  4   0  23   0   -1    0   0   0
Bit (to be encoded):            0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...
Ctx (automatically generated):  0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...
4775
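Arithmetic Coding - Illustration (QM Coder used)
What is arithmetic coding?
Figure: the interval from 0 to 1 is subdivided in turn with probabilities P0 / 1-P0, P1 / 1-P1 and P2 / 1-P2 for the symbols S0=0, S1=1, S2=0, leaving a final interval A.
Coding result: 0100 (the shortest binary bitstream ensuring that the interval from B = 0100 0000000 to C = 0100 1111111 satisfies (B, C) within A)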
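The interval narrowing and the "shortest bitstream inside A" rule can be tried out with exact fractions. This is an idealized sketch, not the QM coder, which uses finite-precision approximations. The function names, the convention that symbol 0 takes the lower part of the interval, and the example probabilities are all assumptions.

```python
import math
from fractions import Fraction


def final_interval(symbols, p_zero):
    """Narrow [0, 1) for binary symbols; p_zero[i] is the probability that symbol i is 0."""
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width = width * p0              # keep the lower part
        else:
            low = low + width * p0          # keep the upper part
            width = width * (1 - p0)
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string whose every continuation stays inside [low, high)."""
    n = 0
    while True:
        n += 1
        step = Fraction(1, 2 ** n)
        k = math.ceil(low / step)           # first n-bit grid point at or above low
        if (k + 1) * step <= high:
            return format(k, "0{}b".format(n))
```

With all three probabilities set to 1/2, the symbols 0, 1, 0 leave the interval [1/4, 3/8) and the code is "010". Skewed probabilities change both the interval and the code length.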
4875
Entropy Coder (Summary)
Figure: the entropy coder outputs a bitstream and an R-D curve (distortion D versus rate R)
4975
Speed Up Issues
Context modeling
- Use stored context
- Update context when a coefficient becomes significant
Implicit masking
- Fast calculation of energy in a critical band
- Lookup table to convert energy to mask
R-D curve calculation
- Lookup-table calculation of distortion
Context entropy coder
- QM coder
- Run-length Rice coder
5075
Bitstream Assembly
Input: bitstream, R-D curve
Output: assembled bitstream, companion file
Bitstream assembling
5175
EAC Bitstream Syntax
Timeslot header
- Whether a certain channel exists (1 bit)
- Length of the channel bitstream (1-2 bytes)
Figure: EAC marker | Global Header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body)
5275
Companion File
Figure: Global Header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve)
5375
Rate-Distortion Optimized Assembling (Single Timeslot)
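Figure: four R-D curves (D1 versus R1, D2 versus R2, D3 versus R3, D4 versus R4), shown before and after assembling, with the selected rates r1, r2, r3 and r4 marked.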
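One common way to pick the rates r1..r4 is to spend bytes on whichever curve currently has the steepest R-D slope until the budget runs out. The sketch below illustrates that idea under stated assumptions: each curve is a list of (rate, distortion) points with increasing rate and convex, decreasing distortion. The function name and the data layout are hypothetical, not taken from the slides.

```python
import heapq


def allocate_by_slope(curves, budget):
    """Pick one truncation point per curve, steepest R-D slope first, within `budget`."""
    position = [0] * len(curves)
    used = sum(curve[0][0] for curve in curves)
    heap = []

    def push_next_segment(u):
        i = position[u]
        if i + 1 < len(curves[u]):
            (r0, d0), (r1, d1) = curves[u][i], curves[u][i + 1]
            slope = (d0 - d1) / (r1 - r0)   # distortion removed per extra byte
            heapq.heappush(heap, (-slope, u))

    for u in range(len(curves)):
        push_next_segment(u)

    while heap:
        _, u = heapq.heappop(heap)
        i = position[u]
        extra = curves[u][i + 1][0] - curves[u][i][0]
        if used + extra > budget:
            break                           # the next-steepest segment no longer fits
        used += extra
        position[u] = i + 1
        push_next_segment(u)

    return [curves[u][position[u]][0] for u in range(len(curves))]
```

Because the curves are convex, taking segments in order of decreasing slope is equivalent to cutting every curve at a common slope threshold.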
5475
Rate-Distortion Optimized Assembling (Multiple Timeslots)
Figure: buffer occupancy (bytes) versus time (timeslots); the buffer-occupancy curve runs between an upper and a lower illegal region.
5575
Allocated Bytes per Timeslot
Allocated bytes for a certain timeslot: $B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \cdot \mathrm{Time}$
Where
- $B_i$: allocated bytes for timeslot i
- $\mathrm{Buf}_i$: buffer occupancy level at timeslot i
- $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
- $\mathrm{Time}$: time duration of the timeslot
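A small worked example of the allocation formula, assuming the rate is expressed in bytes per second and the duration in seconds; the function name and the sample numbers are made up for illustration.

```python
def allocated_bytes(buf, rate_trans, time):
    """Apply B_i = Buf_{i-1} - Buf_i + Rate_trans * Time for i = 1 .. len(buf) - 1."""
    return [buf[i - 1] - buf[i] + rate_trans * time for i in range(1, len(buf))]
```

With occupancy levels of 4000, 3000 and 3500 bytes, a rate of 2000 bytes per second and 0.5-second timeslots, the first timeslot gets 4000 - 3000 + 1000 = 2000 bytes and the second gets 3000 - 3500 + 1000 = 500 bytes.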
5675
Optimization
Given
- Initial buffer occupancy level
- Final buffer occupancy level (or intermediate level with a sliding window)
- Buffer occupancy constraint
Search for the number of bytes allocated to the current timeslot
5775
Search (R-D slope)
Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions, with the labels "Underflow (too many bytes)", "Overflow (too few bytes)" and "Waste bytes".
5875
Multiple Timeslots - Constant Bitrate
Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.
5975
Multiple Timeslots - Internet Streaming (Slow Start)
Figure: buffer occupancy (bytes) versus time (timeslots); the buffer-occupancy curve runs between two illegal regions.
6075
Multiple Timeslots - Internet Streaming (Normal)
Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.
6175
Modular Software Design
Figure: Audio -> MLT(SW) -> Quantizer -> Entropy coder -> Bitstream Assembly -> Bitstream, with one MLT(SW) / Quantizer / Entropy coder path for L+R (or mono) and one for L-R
6275
Modular Software Design
Highly modularized pipeline design
- Quantizer and entropy coder can be used for image/video compression as well
- Probe and data input can be inserted into any part of the program
Data-flow driven (with necessary memory regulator <buffer>)
- No long delay
- No need for large memory
Memory and computation efficient
- Working memory preallocated
6375
Experimental Results
6475
EAC - Highly Efficient (NMR)
Results based on the average of 16 MPEG-4 test clips. The smaller the NMR, the better.
(Digits shown as extracted; the decimal points were lost.)
Codec        48kbps  32kbps  16kbps  8kbps
MP4 TwinVQ     448     571     700    748
WMA            040     325     556    847
EAC            -22     280     568    669
6575
EAC - Lossless
Results based on the average of 16 MPEG-4 test clips
Codec            Compression Ratio
EAC              2.72
Monkey's Audio   2.72
WinZip           1.32
6675
EAC (Versatile)
Versatile
- Real-time 2-way communication (low delay mode)
- Storage device (Pocket PC, Xbox)
- Internet streaming
6775
EAC (Low Delay Mode)
Reducing frame size
Timeslot = 1 frame
Fixed-length timeslot bitstream
Delay = 2 frames
- Ignoring encoding/decoding delay
- Network transmission time (with a modem line, delay = 3 frames)
6875
EAC (Low Delay Mode)
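Figure: timeline over frames i-1, i, i+1. The encoder starts encoding frame i (MLT -> Quantizer -> Entropy -> Bitstream); the bitstream crosses the network; the decoder starts decoding frame i (Entropy -> Quantizer), after which the frame is marked "Playable here".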
6975
EAC - Flexible Bitstream Syntax
Flexible bitstream syntax
- Parser may reassemble the bitstream at 1000x real time
- Change bit rate, number of audio channels, audio sampling rate
7075
EAC - Software
Software
- Encoder: 8x real time (stereo, 44.1 kHz sampling)
- Decoder: 20x real time
- Parser: 1000x real time
7175
EAC - Encoder
Figure: Audio -> Encoder -> Stereo 128 kbps bitstream + Companion file
7275
EAC - Parser
Figure: on the server, the Parser takes the Stereo 128 kbps bitstream and the Companion file and produces:
- Stereo 16 kbps
- Mono 8 kbps
- Stereo 16 kbps, slow start
- Mono 8 kbps, 11 kHz sampling
7375
EAC - Decoder
Decoder
Stereo 16kbps
Mono 8kbps
Stereo 16kbps Slow start
Mono 8kbps 11kHz sampling
7475
Comparison
Original MP4 TwinVQ WMA EAC
MP3
7575
Conclusions
An embedded audio coder is developed
- Highly efficient
- Versatile: low delay, constant bitrate, streaming
- Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
- Good prototype available: real-time execution, small memory footprint
9/75
Parser
Except for the header, the application bitstream is a subset of the master bitstream, so parsing is fast.
The application bitstream may be changed according to the required bitrate, number of audio channels, and audio sampling rate.
[Diagram: Master Bitstream + Companion File -> Parser -> Application Bitstream]
10/75
EAC Decoder
[Diagram: Bitstream -> processing block (labelled "Encoder" on the slide) -> Speaker (Direct Sound) or wav file]
11/75
Embedded Audio Coder - Algorithm Description
12/75
Framework - Encoder
[Diagram: Audio is split into L+R (or mono) and L-R; each path runs Transform -> Entropy coder -> Bitstream Assembly, producing the Bitstream]
13/75
Audio Transform
Input: audio samples
Output: transform coefficients
Goal: convert audio from the time domain to the frequency domain
  Compact energy
  Better match with psychoacoustic characteristics
  Enable audio sampling rate change
14/75
Lossy vs Lossless Mode
[Diagram, lossy mode: Audio -> MLT(SW) -> Quantization]
[Diagram, lossless mode: Audio -> Reversible MLT(SW)]
15/75
Lossy (Float) Pass
16/75
MLT - Modulated Lapped Transforms
[Figure, panel "Spatial Response": amplitude (-1 to 1) versus sample index (0 to 1000)]
[Figure, panel "Frequency Domain": gain in dB (-100 to 40) versus frequency in units of pi (0 to 1)]
17/75
MLT with Window Switching
Features
  Basic window size: 2048
  Short window size: 256
  Switching criterion: a frame (2048 samples) is switched to the short window if and only if
    Its energy is bigger than a certain threshold
    The energies of its 8 subframes (256 samples each) differ by more than Ta
    There are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter subframe by Tb
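A minimal Python sketch of the switching test above. The slide does not say how "differ by more than Ta" or "greater by Tb" are measured, so energy ratios are assumed here, and "at least two neighboring subframes" is read as at least one adjacent pair. The names `use_short_windows`, `energy_floor`, `ta` and `tb` are illustrative, not taken from EAC.

```python
def use_short_windows(frame, energy_floor, ta, tb, n_sub=8):
    """Return True if the frame should be coded with short windows.

    Assumptions (not stated on the slide): Ta and Tb are energy ratios,
    and one adjacent pair satisfying the Tb test is enough.
    """
    sub_len = len(frame) // n_sub
    energies = [sum(x * x for x in frame[k * sub_len:(k + 1) * sub_len])
                for k in range(n_sub)]
    # Criterion 1: the frame must carry enough energy overall.
    if sum(energies) <= energy_floor:
        return False
    # Criterion 2: subframe energies must differ by more than Ta.
    smallest = max(min(energies), 1e-12)  # guard against division by zero
    if max(energies) / smallest <= ta:
        return False
    # Criterion 3: some subframe exceeds its successor by more than Tb.
    return any(energies[k] > tb * energies[k + 1] for k in range(n_sub - 1))


steady = [1.0] * 2048
burst = [1.0] * 256 + [0.01] * 1792
print(use_short_windows(steady, 1.0, 10.0, 4.0))
print(use_short_windows(burst, 1.0, 10.0, 4.0))
```

A stationary frame fails the Ta test and keeps the 2048-sample window; the frame whose first subframe is much louder than the rest passes all three tests.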
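18/75
Band Separation
[Diagram: Audio (44.1 kHz sampling), MLT with window switching, band separation; a "0" label appears in the diagram]
19/75
Synthesis (Half Sampling)
[Diagram: Audio (22.05 kHz sampling), MLT with window switching, band separation; a "0" label appears in the diagram]
20/75
Synthesis (Quarter Sampling)
[Diagram: Audio (11.025 kHz sampling), MLT with window switching, band separation; a "0" label appears in the diagram]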
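21/75
Quantizer
Input: coefficients
Output: quantized coefficients
Goal: convert coefficients from float to integer
  Reduce signal levels
  Fast implementation of entropy coding
22/75
Quantizer
Scalar quantizer with a deadzone
[Equation: maps each coefficient to a quantized magnitude and a sign, written in terms of s, m and n; the formula did not survive extraction]
[Figure: quantizer characteristic around 0]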
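Because the slide's formula is not recoverable, the following is a generic deadzone scalar quantizer, not the EAC definition. The step size `step`, the sign convention (1 for negative) and the midpoint reconstruction are all assumptions made for illustration.

```python
def quantize(s, step):
    """Split coefficient s into (magnitude, sign); sign is 1 for negative values.

    Truncating toward zero makes the zero bin twice as wide as the others,
    which is the "deadzone".
    """
    magnitude = int(abs(s) / step)
    sign = 1 if s < 0 else 0
    return magnitude, sign


def dequantize(magnitude, sign, step):
    """Reconstruct at the middle of the quantization bin (an assumed choice)."""
    if magnitude == 0:
        return 0.0
    value = (magnitude + 0.5) * step
    return -value if sign else value


print(quantize(0.9, 1.0))
print(quantize(-2.7, 1.0))
print(dequantize(2, 1, 1.0))
```

Any coefficient with magnitude below `step` falls in the deadzone and is coded as zero.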
23/75
Lossless (Integer) Pass
24/75
Key to Achieve Lossless
Break the MLT into small steps
Make every step reversible
Definition of a reversible transform
  Integer input, integer output
  The transform should have a determinant of 1 (so that it does not expand the data volume)
25/75
MLT Framework
[Diagram, forward MLT (lapped transform): Window, then DCT IV built from Pre-Rotate, Complex FFT and Post Rotation]
[Diagram, inverse MLT: the inverse blocks Pre-Rotate^-1, Complex FFT^-1, Post Rotation^-1 and Inv Window]
26/75
Window Operation
[Diagram: each sample pair x(n), x(-n-1) passes through a Complex Rotate block; the rotation angle, a function of n and N, did not survive extraction]
27/75
Pre-Rotation
[Diagram: inputs xw(0)..xw(7) pass through four Complex Rotate blocks with angles -pi/32, -5pi/32, -9pi/32 and -13pi/32, giving outputs xp(0)..xp(7)]
28/75
FFT (4 Point Complex)
[Diagram: inputs xp(0)..xp(7) form the complex values xc(0)..xc(3); butterflies with sign changes and a twiddle factor e^{-j pi/2} produce yc(0)..yc(3), which are unpacked into the outputs yp(0)..yp(7)]
29/75
Post-Rotation
[Diagram: inputs yp(0)..yp(7) pass through four Conjugate Rotate blocks with angles -0, -pi/8, -2pi/8 and -3pi/8, giving outputs y(0)..y(7)]
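30/75
Reversible MLT
Make the following operations reversible
  Butterfly operation
  Complex rotation
  Conjugate rotation
31/75
Reversible Unit Transform
[Equations: the 2x2 transform of the pair (a, b), including the butterfly matrix with entries 1 and a factor involving 2, is computed in successive steps through an intermediate value t with multipliers c; the exact formulas did not survive extraction]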
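One standard way to make a rotation reversible on integers is to factor it into three lifting (shear) steps and round after each one. The surviving text does not confirm that EAC uses exactly this factorization, so the sketch below should be read as an illustration of the idea only; `lift_rotate`, `lift_rotate_inverse` and the multipliers `c0`, `c1` are names chosen here.

```python
import math


def lift_rotate(a, b, theta):
    """Integer-to-integer approximation of a rotation by theta (sin(theta) != 0).

    The rotation matrix [[cos, -sin], [sin, cos]] equals the product of three
    shears, each with determinant 1; rounding each shear keeps it invertible.
    """
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a + round(c0 * b)
    b = b + round(c1 * a)
    a = a + round(c0 * b)
    return a, b


def lift_rotate_inverse(a, b, theta):
    """Undo lift_rotate exactly by reversing the steps with opposite signs."""
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a - round(c0 * b)
    b = b - round(c1 * a)
    a = a - round(c0 * b)
    return a, b


theta = math.pi / 5
forward = lift_rotate(45, -74, theta)
print(forward, lift_rotate_inverse(*forward, theta))
```

Each step changes one value by a rounded function of the other, so subtracting the same rounded quantity restores the input bit for bit, whatever the rounding error of the forward pass.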
32/75
Entropy Coder
Input: quantized coefficients
Output: embedded coded bitstream with R-D performance curve
Goal
  Compression
  Embedded bitstream for future manipulation
33/75
Frame Grouping
[Diagram: frames 1 to 8 grouped into time slots]
34/75
Entropy Coder
[Diagram: the entropy coder outputs a Bitstream and an R-D curve (distortion D versus rate R)]
35/75
Entropy Coder
Embedded coding
Implicit psychoacoustic masking
Context modeling
Arithmetic coding
Implementation concerns
36/75
A block of coefficients
 45   0   0   0  -74 -13   0   0
 21   0   4   0   14   0  23  23
  0   0   0   0    3   0   4   0
  0   3   5   0    0   0   0   0
  0   1  -1   0   -4  33   0  -1
  0   0   1   0    0   0   0   0
 -4   5   0   0  -18   0   0  19
  4   0  23   0   -1   0   0   0
37/75
Bits of Coefficients
Coefficient  Value  b1 b2 b3 b4 b5 b6 b7  Sign
w0            45    0  1  0  1  1  0  1   +
w1           -74    1  0  0  1  0  1  0   -
w2            21    0  0  1  0  1  0  1   +
w3            14    0  0  0  1  1  1  0   +
w4            -4    0  0  0  0  1  0  0   -
w5           -18    0  0  1  0  0  1  0   -
w6             4    0  0  0  0  1  0  0   +
w7            -1    0  0  0  0  0  0  1   -
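The sign and bit-plane decomposition of the eight example coefficients can be reproduced with a few lines of Python. This is only a sketch of the sign-magnitude representation with b1 as the most significant of 7 bits; `to_bitplanes` is a name chosen here.

```python
def to_bitplanes(coeffs, n_bits=7):
    """Return (bits, sign) per coefficient, bits listed from b1 (MSB) to b7 (LSB)."""
    rows = []
    for w in coeffs:
        magnitude = abs(w)
        bits = [(magnitude >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]
        rows.append((bits, '-' if w < 0 else '+'))
    return rows


coeffs = [45, -74, 21, 14, -4, -18, 4, -1]
for index, (bits, sign) in enumerate(to_bitplanes(coeffs)):
    print(f"w{index}", *bits, sign)
```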
38/75
Conventional Coding
Coefficients are coded one after another (first, second, third), each with all of its bits.
Coefficient  b1 b2 b3 b4 b5 b6 b7  Sign  Coded   Decoded value
w0           0  1  0  1  1  0  1   +     First   46
w1           1  0  0  1  0  1  0   -     Second  -74
w2           0  0  1  0  1  0  1   +     Third   22
w3                                               0
w4                                               0
w5                                               0
w6                                               0
w7                                               0
39/75
Embedded Coding
Bits are coded plane by plane (first b1, second b2, third b3) across all coefficients; the sign follows the first 1 bit of a coefficient.
Coefficient  b1 b2 b3  Sign  Value  Range
w0           0  1  0   +      40    32..47
w1           1  0  0   -     -72    -79..-64
w2           0  0  1   +      24    16..31
w3           0  0  0           0    -31..31
w4           0  0  0           0    -31..31
w5           0  0  1   -     -24    -31..31
w6           0  0  0           0    -31..31
w7           0  0  0           0    -31..31
40/75
Audio Masking
[Figure: masking versus frequency, showing the critical band and the neighboring band, the signal, the noise level, the masking threshold, the maximum mask, the signal-to-mask ratio and the noise-to-mask ratio]
41/75
Psychoacoustic Masking
Traditional approach (explicit masking, all existing approaches)
  Calculate the mask
  Transmit the mask
  Modify the transform coefficients (or the coding approach) according to the masking
  Encode the transform coefficients
Note: the mask modifies the coding content
42/75
Implicit Psychoacoustic Masking
Key: the mask modifies the coding order; the content is the same
Implicit masking
  Calculate the static masking (Fletcher-Munson threshold)
  Encode the MSB of the transform coefficients
  Calculate the mask based on the MSB of the coefficients
  Modify the coding order
  Encode the next most important part of the coefficients
  Repeat the process
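The effect of the mask on coding order can be illustrated with a simplified Python sketch. It assumes the mask is expressed as a per-coefficient delay in bit planes (`mask_shift`, a hypothetical fixed input here); in the scheme described above the mask would instead be recomputed from the already coded bits after each pass, identically at encoder and decoder, which this sketch does not model.

```python
def known_planes(passes, mask_shift):
    """Bit planes coded so far for each coefficient after `passes` passes."""
    return [max(0, passes - shift) for shift in mask_shift]


def insignificant_bound(known, n_bits=7):
    """Largest magnitude a coefficient can have if all its known bits are 0."""
    return (1 << (n_bits - known)) - 1


# Hypothetical shifts: the last two coefficients are masked by one bit plane.
mask_shift = [0, 0, 0, 0, 0, 0, 1, 1]
for passes in (1, 2):
    bounds = [insignificant_bound(k) for k in known_planes(passes, mask_shift)]
    print(passes, bounds)
```

With these assumed shifts the bounds are 63 and 127 after the first pass and 31 and 63 after the second, in line with the ranges given for the insignificant coefficients in the two-pass example that follows. The coded content is unchanged; only the pass in which each bit is sent moves.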
43/75
Embedded Coding with Implicit Psychoacoustic Masking
After the first pass (w1 becomes significant with sign -; every other coded bit is 0):
Coefficient  Value  Range
w0             0    -63..63
w1           -96    -127..-64
w2             0    -63..63
w3             0    -63..63
w4             0    -63..63
w5             0    -63..63
w6             0    -127..127
w7             0    -127..127
[Legend: coefficient significant / insignificant; mask]
44/75
Embedded Coding with Implicit Psychoacoustic Masking
After the first and second passes:
Coefficient  b1 b2  Sign  Value  Range
w0           0  1   +      48    32..63
w1           1  0   -     -96    -127..-64
w2           0  0           0    -31..31
w3           0  0           0    -31..31
w4           0  0           0    -31..31
w5           0  0           0    -31..31
w6           0  0           0    -63..63
w7           0  0           0    -63..63
[Legend: coefficient significant / insignificant]
45/75
Context Modeling
Context
  Zero coding: significance statuses of neighboring coefficients
  Refinement: whether it is the first refinement pass; significance statuses of neighboring coefficients
  Sign: neighboring signs
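As a rough illustration of a zero-coding context, the Python sketch below packs the significance flags of nearby coefficients into a small integer that would select the probability model for the next bit. The choice of two neighbors on each side and the bit packing are assumptions; the slides do not give EAC's actual neighborhood or context numbering.

```python
def zero_coding_context(significant, i):
    """Pack the significance of the 2 neighbors on each side of i into 4 bits.

    Neighbors outside the block count as insignificant (an assumption).
    """
    context = 0
    for offset in (-2, -1, 1, 2):
        j = i + offset
        flag = 1 if 0 <= j < len(significant) and significant[j] else 0
        context = (context << 1) | flag
    return context


significant = [False, True, False, False, True]
print([zero_coding_context(significant, i) for i in range(len(significant))])
```

Because the context depends only on bits that have already been coded, the decoder can regenerate it without any side information.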
46/75
After Implicit Psychoacoustic Masking & Context Modeling
 45   0   0   0  -74 -13   0   0
 21   0   4   0   14   0  23  23
  0   0   0   0    3   0   4   0
  0   3   5   0    0   0   0   0
  0   1  -1   0   -4  33   0  -1
  0   0   1   0    0   0   0   0
 -4   5   0   0  -18   0   0  19
  4   0  23   0   -1   0   0   0
Bit (to be encoded):           0 1 1 0 0 0 0 0 0 1 0  0 0 0 0 0 0 0 0 ...
Ctx (automatically generated): 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...
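47/75
Arithmetic Coding - Illustration (QM Coder used)
What is arithmetic coding?
[Figure: the interval from 0 to 1 is subdivided successively with the probabilities (P0, 1-P0), (P1, 1-P1) and (P2, 1-P2) for the symbols S0=0, S1=1, S2=0, leaving the final interval A]
Coding result: 0100
(The shortest binary bitstream ensures that the interval from B = 0100 0000000 to C = 0100 1111111 lies within A.)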
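The interval-narrowing idea can be sketched with exact fractions in Python. This is the textbook form of binary arithmetic coding, not the multiplication-free QM coder named in the title; the placement of symbol 0 in the lower sub-interval and the example probabilities are assumptions, so the resulting code word is not expected to match the "0100" of the slide.

```python
from fractions import Fraction


def encode_interval(symbols, p_zero):
    """Narrow [0, 1) symbol by symbol; p_zero[k] is the probability that symbol k is 0."""
    low, width = Fraction(0), Fraction(1)
    for symbol, p in zip(symbols, p_zero):
        if symbol == 0:
            width *= p            # 0 keeps the lower part of the interval
        else:
            low += width * p      # 1 keeps the upper part
            width *= 1 - p
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string whose every extension stays inside [low, high)."""
    n = 1
    while True:
        step = Fraction(1, 2 ** n)
        k = -(-low // step)       # smallest multiple of step that is >= low
        if (k + 1) * step <= high:
            return format(k, 'b').zfill(n)
        n += 1


half = Fraction(1, 2)
low, high = encode_interval([0, 1, 0], [half, half, half])
print(low, high, shortest_code(low, high))
```

The more probable the coded symbols, the wider the final interval and the fewer bits are needed to name a sub-interval inside it.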
48/75
Entropy Coder (Summary)
[Diagram: the entropy coder outputs a Bitstream and an R-D curve (distortion D versus rate R)]
49/75
Speed Up Issues
Context modeling
  Use stored context
  Update context when a coefficient becomes significant
Implicit masking
  Fast calculation of energy in a critical band
  Lookup table: convert energy to mask
R-D curve calculation
  Lookup table: calculation of distortion
Context entropy coder
  QM coder
  Run-length Rice coder
50/75
Bitstream Assembly
Input
  Bitstream
  R-D curve
Output
  Assembled bitstream
  Companion file
Bitstream assembling
51/75
EAC Bitstream Syntax
Timeslot header
  Whether a certain channel exists (1 bit)
  Length of the channel bitstream (1-2 bytes)
[Diagram: EAC marker | Global Header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body)]
52/75
Companion File
[Diagram: Global Header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve)]
53/75
Rate-Distortion Optimized Assembling (Single Timeslot)
[Figure: four R-D curves (D1 versus R1 through D4 versus R4), shown twice, with the rates r1, r2, r3 and r4 marked]
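One common way to pick the rates r1..r4 for a single timeslot is to spend bytes greedily on whichever channel currently offers the steepest R-D slope. The Python sketch below assumes each curve is a list of (rate in bytes, distortion) points with increasing rate and convex, decreasing distortion; `assemble` and the sample curves are illustrative and not taken from EAC.

```python
def assemble(curves, budget):
    """Choose a truncation rate per channel so the total stays within budget.

    curves: one list of (rate_bytes, distortion) points per channel, with
    strictly increasing rate and convex, decreasing distortion (assumed).
    """
    index = [0] * len(curves)
    used = sum(curve[0][0] for curve in curves)
    while True:
        best, best_slope = None, 0.0
        for ch, curve in enumerate(curves):
            k = index[ch]
            if k + 1 == len(curve):
                continue  # this channel is already sent in full
            extra = curve[k + 1][0] - curve[k][0]
            gain = curve[k][1] - curve[k + 1][1]
            if used + extra <= budget and gain / extra > best_slope:
                best, best_slope = ch, gain / extra
        if best is None:
            return [curve[k][0] for curve, k in zip(curves, index)]
        used += curves[best][index[best] + 1][0] - curves[best][index[best]][0]
        index[best] += 1


curves = [[(0, 100.0), (10, 40.0), (20, 20.0)],
          [(0, 50.0), (10, 20.0), (20, 15.0)]]
print(assemble(curves, 30))
```

With convex curves the greedy choice ends with all channels truncated at roughly the same R-D slope, which is the usual optimality condition for this kind of allocation.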
54/75
Rate-Distortion Optimized Assembling (Multiple Timeslots)
[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), with two illegal regions]
55/75
Allocated Bytes Per Timeslot
Allocated bytes for a certain timeslot: $B_i = Buf_{i-1} - Buf_i + Rate_{trans} \cdot Time$
Where
  $B_i$: allocated bytes for timeslot i
  $Buf_i$: buffer occupancy level at timeslot i
  $Rate_{trans}$: coding (network) rate per second
  $Time$: time duration of the timeslot
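The allocation formula can be applied directly to a planned buffer-occupancy curve. In the Python sketch below, `allocations` is an illustrative helper and the numbers are made up; the rate is assumed to be in bytes per second and the timeslot duration in seconds.

```python
def allocations(buf_levels, rate_trans, time):
    """Bytes allocated to each timeslot i = 1..n from the occupancy levels Buf_0..Buf_n.

    Implements B_i = Buf_{i-1} - Buf_i + Rate_trans * Time.
    """
    return [buf_levels[i - 1] - buf_levels[i] + rate_trans * time
            for i in range(1, len(buf_levels))]


# Occupancy drops by 1000 bytes, then rises by 500; 2000 bytes/s over 0.5 s slots.
print(allocations([4000, 3000, 3500], 2000, 0.5))
```

A timeslot may use more than the bytes transmitted during it by draining the buffer, and it must use fewer when the buffer is being refilled.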
56/75
Optimization
Given
  Initial buffer occupancy level
  Final buffer occupancy level (or intermediate level with a sliding window)
  Buffer occupancy constraint
Search for the allocated number of bytes for the current timeslot
57/75
Search (R-D slope)
[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions labelled "Underflow (too many bytes)" and "Overflow (too few bytes)", and a "Waste bytes" label]
58/75
Multiple Timeslots - Constant Bitrate
[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions]
59/75
Multiple Timeslots - Internet Streaming (Slow Start)
[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), with two illegal regions]
60/75
Multiple Timeslots - Internet Streaming (Normal)
[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions]
61/75
Modular Software Design
[Diagram: Audio is split into L+R (or mono) and L-R; each path runs MLT(SW) -> Quantizer -> Entropy coder -> Bitstream Assembly, producing the Bitstream]
62/75
Modular Software Design
Highly modularized pipeline design
  Quantizer and entropy coder can be used for image/video compression as well
  Probe and data input can be inserted into any part of the program
Data flow driven (with necessary memory regulator <buffer>)
  No long delay
  No need for large memory
Memory and computation efficient
  Working memory preallocated
63/75
Experimental Results
64/75
EAC - Highly Efficient (NMR)
Results based on the average of 16 MPEG-4 test clips. The smaller the NMR, the better.
Codec        8 kbps  16 kbps  32 kbps  48 kbps
EAC          6.69    5.68     2.80     -22
WMA          8.47    5.56     3.25     0.40
MP4 TwinVQ   7.48    7.00     5.71     4.48
(The decimal point of the EAC entry at 48 kbps, extracted as "-22", is not recoverable.)
65/75
EAC - Lossless
Results based on the average of 16 MPEG-4 test clips
Codec            Compression ratio
WinZip           1.32
Monkey's Audio   2.72
EAC              2.72
66/75
EAC (Versatile)
Versatile
  Real-time 2-way communication (low delay mode)
  Storage device (Pocket PC, Xbox)
  Internet streaming
67/75
EAC (Low Delay Mode)
Reducing frame size
Timeslot = 1 frame
Fixed-length timeslot bitstream
Delay = 2 frames
  (Ignoring encoding/decoding delay)
  Network transmission time (over a modem line, delay = 3 frames)
68/75
EAC (Low Delay Mode)
[Diagram: frames i-1, i, i+1. Encoder side: start encoding frame i -> MLT -> Quantizer -> Entropy -> Bitstream -> network. Decoder side: start decoding frame i -> Entropy -> Quantizer, with the point marked "Playable here"]
69/75
EAC - Flexible Bitstream Syntax
Flexible bitstream syntax
  Parser may reassemble the bitstream at 1000x real time
  Change
    Bit rate
    Number of audio channels
    Audio sampling rate
70/75
EAC - Software
Software
  Encoder: 8x real time (stereo, 44.1 kHz sampling)
  Decoder: 20x real time
  Parser: 1000x real time
71/75
EAC - Encoder
[Diagram: Audio -> Encoder -> Stereo 128 kbps bitstream + Companion file]
72/75
EAC - Parser
[Diagram: Stereo 128 kbps bitstream + Companion file -> Parser (server) -> Stereo 16 kbps; Mono 8 kbps; Stereo 16 kbps, slow start; Mono 8 kbps, 11 kHz sampling]
73/75
EAC - Decoder
[Diagram: Decoder playing Stereo 16 kbps; Mono 8 kbps; Stereo 16 kbps, slow start; Mono 8 kbps, 11 kHz sampling]
74/75
Comparison
Original, MP4 TwinVQ, WMA, EAC, MP3
75/75
Conclusions
An embedded audio coder is developed
  Highly efficient
  Versatile
    Low delay, constant bitrate, streaming
  Flexible bitstream
    Parsing for bitrate, number of audio channels, audio sampling rate
  Good prototype available: real-time execution, small memory footprint
1175
Embedded Audio Coder- Algorithm Description
1275
Frame Work - Encoder
Transform Entropy coder
BitstreamAssembly
TransformEntropy
coderBitstreamAssembly
Audio
Bitstream
L+R(or mono)
L-R
1375
Audio Transform
Input audio sample
Output transform coefficient
Goal convert audio from space domain to frequency domain Compact energy Better match with psychoacoustic
characteristics Enable audio sampling rate change
1475
Lossy vs Lossless Mode
MLT(SW)Audio
Quantization
Lossy mode
Reversible MLT(SW)
Audio
Lossless mode
1575
Lossy (Float) Pass
1675
MLT - Modulated Lapped Transforms
0 100 200 300 400 500 600 700 800 900 1000-1
-08
-06
-04
-02
0
02
04
06
08
1
Spatial Response
0 01 02 03 04 05 06 07 08 09 1-100
-80
-60
-40
-20
0
20
40
Frequency (pi)G
ain
(dB
)
Frequency Domain
1775
MLT with Window Switching
Features Basic window size 2048 Short window size 256 Switching criterion A frame (2048 samples) is switched to short window if and
only if Energy is bigger than a certain threshold Energy within the 8 subframes (256 samples) differs more than Ta
There are at least two neighbor subframes where the energy of the former subframe is greater than the latter subframe by Tb
1875
Band Separation
Audio (441kHz sampling)
MLT with window switching
Band separation0
1975
Synthesis (Half Sampling)
Audio (2205kHz sampling)
MLT with window switching
Band separation
0
2075
Synthesis (Quarter Sampling)
Audio (11025kHz sampling)
MLT with window switching
Band separation
0
2175
Quantizer
Input coefficient
Output quantized coefficient
Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding
2275
Quantizer
Scalar quantizer with a deadzone
s
snmsnm
1
0n][m
][][
Quantized Magnitude Sign
0
2375
Lossless (Integer) Pass
2475
Key to Achieve Lossless
Break the MLT into small steps
Make every step reversible
Definition of reversible transform Integer input integer output The transform should have a determinant of 1
(donot expand data volume)
2575
MLT Framework
Pre-R
otate
Com
plex FF
T
Post R
otation
DCT IV
Window
Lapped Transform
Pre-R
otate-l
Com
plex FF
T-l
Post R
otation-l
Inv Window
-l
Forward MLT
Inverse MLT
2675
Window Operation
x(n)x(-n-1)
N
n
4
)21(
4
Complex Rotate
2775
Pre-Rotation
Complex Rotate ndash32 xw(0)
xw(1)
xw(2)
xw(3)
xw(4)
xw(5)
xw(6)
xw(7)
Complex Rotate ndash532
Complex Rotate ndash932
Complex Rotate ndash1332
xp(0)
xp(1)
xp(2)
xp(3)
xp(4)
xp(5)
xp(6)
xp(7)
2875
FFT (4 Point Complex)
xp(0)
xp(1)
xp(2)
xp(3)
xp(4)
xp(5)
xp(6)
xp(7)
xc(0)
xc(1)
xc(2)
xc(3)
-
- e-j2
-
-
yc(0)
yc(1)
yc(2)
yc(3)
yp(0)
yp(1)
xp(2)
xp(3)
xp(4)
xp(5)
xp(6)
xp(7)
2975
Post-Rotation
Conjugate Rotate ndash0y(0)
y(1)
y(2)
y(3)
y(4)
y(5)
y(6)
y(7)
Conjugate Rotate ndash8
Conjugate Rotate ndash28
Conjugate Rotate ndash38
yp(0)
yp(1)
yp(2)
yp(3)
yp(4)
yp(5)
yp(6)
yp(7)
3075
Reversible MLT
Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation
3175
Reversible Unit Transform
b
a
b
a
11
11
2
1
2
1
0
actb
tcba
bcat
21
21
1
20
c
cc
3275
Entropy Coder
Input quantized coefficients
Output embedded coded bitstream with R-D
performance curve
Goal Compression Embedded bitstream for future manipulation
3375
Frame Grouping
Time slot
1 2 3 4 5 6 7 8
Fram
e
3475
Entropy Coder
D
R
Bitstream
R-D curve
3575
Entropy Coder
Embedded coding
Implicit psychoacoustic masking
Context modeling
Arithmetic coding
Implementation concerns
3675
A block of coefficients
45 0 0 0-74 -13 0 0
21 0 4 014 0 23 23
0 0 0 03 0 4 0
0 3 5 00 0 0 0
0 1 -1 0-4 33 0 -1
0 0 1 00 0 0 0
-4 5 0 0-18 0 0 19
4 0 23 0-1 0 0 0
Next View graph
3775
Bits of Coefficients
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
Signb1 b2 b3 b4 b5 b6 b7
w0
w1
w2
w3
w4
w5
w6
w7
coef
fici
ent
45
-74
21
14-4
-18
4
-1
3875
Conventional Coding
First
Second
Third
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +
Signb1 b2 b3 b4 b5 b6 b7
w0
w1
w2
w3
w4
w5
w6
w7
46
-74
22
00
0
00
3975
Embedded Coding
01 -000000
1 +0000000
001 +001 -00
Signb1 b2 b3 b4 b5 b6 b7
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
First Second
Third
w0
w1
w2
w3
w4
w5
w6
w7
Value
40
Range
3247
-72 -79-64
163124
-31310
-31310
-3131-24
-31310
-31310
4075
Audio Masking
FrequencyCriticalBand
NeighboringBand
Noise Level
Signal
Masking Threshold
Maximum Mask
Signal-tomask ratio
Noise-tomask ratio
4175
Psychoacoustic Masking
Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding
approach) according to the masking Encode the transform coefficients
Note Mask modifies the coding content
4275
Implicit Psychoacoustic Masking
Key Mask modifies the coding order the content is the same
Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process
4375
Embedded Coding with Implicit Psychoacoustic Masking
01 -000000
Signb1 b2 b3 b4 b5 b6 b7
001 -000000
First
w0
w1
w2
w3
w4
w5
w6
w7
Value
0
Range
-6363
-96 -127-64
-63630
-63630
-63630
-63630
-1271270
-1271270
Coefficient SignificantInsignificant
Mask
4475
Embedded Coding with Implicit Psychoacoustic Masking
01 -000000
1 +0000000
Signb1 b2 b3 b4 b5 b6 b7
0 10 1 +1 0 -0 00 00 00 00 00 0
First Second
w0
w1
w2
w3
w4
w5
w6
w7
Value
48
Range
3263
-96 -127-64
-31310
-31310
-31310
-31310
-63630
-63630
Coefficient SignificantInsignificant
4575
Context Modeling
Context Zero coding
Significant statuses of neighbor coefficients Refinement
Whether it is the 1st refinement pass Significant statuses of neighbor coefficients
Sign Neighbor signs
4675
After Implicit Psychoacoustic Masking amp Context Modeling
45 0 0 0-74 -13 0 0
21 0 4 014 0 23 23
0 0 0 03 0 4 0
0 3 5 00 0 0 0
0 1 -1 0-4 33 0 -1
0 0 1 00 0 0 0
-4 5 0 0-18 0 0 19
4 0 23 0-1 0 0 0
Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip
Automatically generated
To be encoded
4775
Arithmetic Coding ndash Illustration (QM Coder used)
What is arithmetic coding
0
1
1-P0
P0
1-P1
P1 1-P2
P2
S0=0 S1=1 S2=0
0100
Coding result
(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)
AB
C
4875
Entropy Coder (Summary)
D
R
Bitstream
R-D curve
4975
Speed Up Issues
Context Modeling Use stored context Update context when a coefficient becomes significant
Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask
R-D curve calculation Lookup table calculation of distortion
Context entropy coder QM coder Run-length Rice coder
5075
Bitstream Assembly
Input Bitstream R-D curve
Output Assembled bitstream Companion file
Bitstream assembling
5175
EAC Bitstream Syntax
Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)
EA
C m
arke
rG
loba
l H
eade
r Timeslot
Head Body
Timeslot
Head Body
Timeslot
Head Body
5275
Companion FileG
loba
l H
eade
r Timeslot
Head R-D curve
Timeslot
Head R-D curve
Timeslot
Head R-D curve
5375
Rate-Distortion Optimized Assembling (Single Timeslot)
D1
R1
D2
R2
D3
R3
D4
R4
D1
R1
D2
R2
D3
R3
D4
R4
r1 r2
r3 r4
5475
Rate-Distortion Optimized Assembling (Multiple Timeslots)
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
Buffer-Occupancy Curve
5575
Allocated Bytes Per Timeslots
Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time
Where Bi allocated bytes for timeslot i
Bufi buffer occupancy level at timeslot i
Ratetrans coding (network) rate per second Time time duration of the timeslot
5675
Optimization
Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate
level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the
current timeslot
5775
Search (R-D slope)B
uffe
r O
ccup
ancy
(B
ytes
)
Time (timeslots)
Illegal Region
Illegal Region
Underflow (too many bytes)
Overflow (too few bytes)
Wastebytes
5875
Multiple Timeslots ndash Constant Bitrate
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
5975
Multiple Timeslots ndash Internet Streaming (Slow Start)
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
Buffer-Occupancy Curve
6075
Multiple Timeslots ndash Internet Streaming (Normal)
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
61/75
Modular Software Design
[Figure: two pipelines from audio to bitstream, one for L+R (or mono) and one for L-R; each runs MLT(SW) -> Quantizer -> Entropy coder -> Bitstream assembly]
62/75
Modular Software Design
Highly modularized pipeline design: the quantizer and entropy coder can be used for image/video compression as well; probes and data input can be inserted into any part of the program.
Data-flow driven (with a memory regulator <buffer> where necessary): no long delay; no need for large memory.
Memory and computation efficient: working memory is preallocated.
63/75
Experimental Results
64/75
EAC - Highly Efficient (NMR)
Results are based on the average of 16 MPEG-4 test clips. The smaller the NMR, the better.
Codec         8 kbps   16 kbps   32 kbps   48 kbps
MP4 TwinVQ    7.48     7.00      5.71      4.48
WMA           8.47     5.56      3.25      0.40
EAC           6.69     5.68      2.80      -2.2
65/75
EAC - Lossless
Results are based on the average of 16 MPEG-4 test clips.
Codec            Compression ratio
WinZip           1.32
Monkey's Audio   2.72
EAC              2.72
66/75
EAC (Versatile)
Versatile: real-time 2-way communication (low delay mode); storage devices (Pocket PC, Xbox); Internet streaming
67/75
EAC (Low Delay Mode)
Reducing frame size
Timeslot = 1 frame
Fixed-length timeslot bitstream
Delay = 2 frames (ignoring encoding/decoding delay), plus network transmission time (over a modem line, delay = 3 frames)
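68/75
EAC (Low Delay Mode)
[Figure: low-delay timeline over frames i-1, i, i+1. Encoder: encoding of frame i starts, then MLT -> Quantizer -> Entropy -> bitstream -> network. Decoder: decoding of frame i starts, then Entropy -> Quantizer; the point where the audio can be played is marked "Playable here".]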
69/75
EAC - Flexible Bitstream Syntax
Flexible bitstream syntax: the parser may reassemble the bitstream at 1000x real time.
It can change: the bit rate; the number of audio channels; the audio sampling rate.
70/75
EAC - Software
Encoder: 8x real time (stereo, 44.1 kHz sampling)
Decoder: 20x real time
Parser: 1000x real time
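71/75
EAC - Encoder
[Demo: Audio -> Encoder -> stereo 128 kbps bitstream plus companion file]
72/75
EAC - Parser
[Demo: on the server, the Parser takes the stereo 128 kbps bitstream and the companion file and produces: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling]
73/75
EAC - Decoder
[Demo: the Decoder plays: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling]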
74/75
Comparison
[Audio demo: Original, MP3, MP4 TwinVQ, WMA, EAC]
75/75
Conclusions
An embedded audio coder has been developed.
Highly efficient.
Versatile: low delay; constant bitrate; streaming.
Flexible bitstream: parsing for bitrate, number of audio channels, and audio sampling rate.
Good prototype available: real-time execution; small memory footprint.
12/75
Framework - Encoder
[Figure: two pipelines from audio to bitstream, one for L+R (or mono) and one for L-R; each runs Transform -> Entropy coder -> Bitstream assembly]
13/75
Audio Transform
Input: audio samples
Output: transform coefficients
Goal: convert audio from the time domain to the frequency domain, in order to compact energy, better match psychoacoustic characteristics, and enable audio sampling rate change.
14/75
Lossy vs Lossless Mode
Lossy mode: Audio -> MLT(SW) -> Quantization
Lossless mode: Audio -> Reversible MLT(SW)
15/75
Lossy (Float) Pass
16/75
MLT - Modulated Lapped Transforms
[Figure: two panels. "Spatial Response": amplitude (-1 to 1) versus sample index (0 to 1000). "Frequency Domain": gain in dB (-100 to 40) versus frequency in units of pi (0 to 1).]
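As a rough illustration of what the MLT computes, the sketch below uses the textbook MLT definition (sine window, cosine modulation, block size M, window length 2M) and evaluates it directly, without the fast FFT-based structure or the window switching used in the coder. The function names and the small block size are assumptions chosen for readability.

```python
import math


def mlt_basis(M):
    """Basis p[n][k] of an MLT with block size M and window length 2M."""
    scale = math.sqrt(2.0 / M)
    return [
        [
            math.sin((n + 0.5) * math.pi / (2 * M)) * scale
            * math.cos((n + (M + 1) / 2) * (k + 0.5) * math.pi / M)
            for k in range(M)
        ]
        for n in range(2 * M)
    ]


def mlt_forward(x, M):
    """One block of M coefficients per hop of M samples (50% overlap)."""
    p = mlt_basis(M)
    blocks = []
    for start in range(0, len(x) - 2 * M + 1, M):
        seg = x[start:start + 2 * M]
        blocks.append([sum(seg[n] * p[n][k] for n in range(2 * M))
                       for k in range(M)])
    return blocks


def mlt_inverse(blocks, M):
    """Overlap-add synthesis; interior samples are reconstructed exactly."""
    p = mlt_basis(M)
    y = [0.0] * (M * (len(blocks) + 1))
    for b, coeffs in enumerate(blocks):
        for n in range(2 * M):
            y[b * M + n] += sum(coeffs[k] * p[n][k] for k in range(M))
    return y


M = 8
x = [math.sin(0.3 * n) + 0.01 * n for n in range(5 * M)]
y = mlt_inverse(mlt_forward(x, M), M)
```

Only the samples covered by two overlapping blocks are recovered; the first and last M samples lack an overlap partner in this short example.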
17/75
MLT with Window Switching
Features: basic window size 2048; short window size 256.
Switching criterion: a frame (2048 samples) is switched to the short window if and only if:
the energy is bigger than a certain threshold;
the energies of the 8 subframes (256 samples each) differ by more than Ta;
there are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter subframe by Tb.
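The switching criterion can be sketched as a small decision function. The slide leaves several details open, so this example makes explicit assumptions: energy is the sum of squared samples, "differ by more than Ta" is read as a max/min energy ratio, and "greater by Tb" is read as a ratio between adjacent subframes. The thresholds and the function name are hypothetical.

```python
def use_short_window(frame, e_min, ta, tb, n_sub=8):
    """Return True if the frame should use the short window.

    Assumed reading of the three conditions:
      1. total frame energy exceeds e_min;
      2. max subframe energy exceeds ta times the min subframe energy;
      3. some subframe has more than tb times the energy of the next one.
    """
    sub_len = len(frame) // n_sub
    energies = [
        sum(s * s for s in frame[j * sub_len:(j + 1) * sub_len])
        for j in range(n_sub)
    ]
    if sum(energies) <= e_min:
        return False
    if max(energies) <= ta * min(energies):
        return False
    return any(energies[j] > tb * energies[j + 1] for j in range(n_sub - 1))


steady = [1.0] * 2048                         # equal energy in all subframes
transient = [1.0] * 256 + [0.01] * (2048 - 256)  # energy drops after subframe 0
```

With hypothetical thresholds e_min=1.0, ta=4.0, tb=4.0, the steady frame keeps the long window and the transient frame switches to the short one.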
18/75
Band Separation
[Figure labels: Audio (44.1 kHz sampling); MLT with window switching; Band separation; 0]
19/75
Synthesis (Half Sampling)
[Figure labels: Audio (22.05 kHz sampling); MLT with window switching; Band separation; 0]
20/75
Synthesis (Quarter Sampling)
[Figure labels: Audio (11.025 kHz sampling); MLT with window switching; Band separation; 0]
21/75
Quantizer
Input: coefficient
Output: quantized coefficient
Goal: convert coefficients from float to integer; reduce signal levels; enable a fast implementation of entropy coding
22/75
Quantizer
Scalar quantizer with a deadzone
[Equation and figure garbled in extraction: each coefficient is mapped to a quantized magnitude and a sign]
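Since the quantizer formula did not survive extraction, the following is only a generic sketch of a deadzone scalar quantizer in its usual textbook form, not necessarily the exact rule used in EAC. The step size, the reconstruction offset of 0.5, and the function names are assumptions.

```python
import math


def quantize(coef, step):
    """Map a float coefficient to (magnitude, sign).

    Magnitudes below one step fall into the zero bin, so the zero bin
    (the deadzone) is twice as wide as the other bins.
    """
    magnitude = int(math.floor(abs(coef) / step))
    sign = -1 if coef < 0 else 1
    return magnitude, sign


def dequantize(magnitude, sign, step, offset=0.5):
    """Reconstruct at an assumed point inside the bin (mid-bin by default)."""
    if magnitude == 0:
        return 0.0
    return sign * (magnitude + offset) * step


pairs = [quantize(c, 1.0) for c in (0.9, -0.9, 2.3, -2.3)]
```

Splitting the result into a magnitude and a sign matches the sign-magnitude bit tables used later for the entropy coder.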
23/75
Lossless (Integer) Pass
24/75
Key to Achieve Lossless
Break the MLT into small steps
Make every step reversible
Definition of a reversible transform: integer input, integer output; the transform should have a determinant of 1 (it does not expand the data volume)
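25/75
MLT Framework
[Figure: the MLT as a lapped transform built from a window stage followed by a DCT-IV, where the DCT-IV consists of Pre-rotate -> Complex FFT -> Post rotation. Forward MLT: Window, Pre-rotate, Complex FFT, Post rotation. Inverse MLT: the inverse blocks Pre-rotate^-1, Complex FFT^-1, Post rotation^-1 and the inverse window.]
26/75
Window Operation
[Figure: the window operation on the sample pair x(n) and x(-n-1), implemented as a complex rotation; the rotation-angle formula in terms of n and N is garbled in extraction]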
27/75
Pre-Rotation
[Figure: inputs xw(0) to xw(7) pass through four complex rotations, by -π/32, -5π/32, -9π/32 and -13π/32, giving outputs xp(0) to xp(7)]
28/75
FFT (4 Point Complex)
[Figure: xp(0) to xp(7) are treated as four complex values xc(0) to xc(3); a 4-point complex FFT built from butterflies (subtractions marked "-") and one twiddle factor e^{-jπ/2} produces yc(0) to yc(3), which are unpacked into eight real outputs]
29/75
Post-Rotation
[Figure: inputs yp(0) to yp(7) pass through four conjugate rotations, by -0, -π/8, -2π/8 and -3π/8, giving outputs y(0) to y(7)]
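30/75
Reversible MLT
Make the following operations reversible: butterfly operation; complex rotation; conjugate rotation
31/75
Reversible Unit Transform
[Equations garbled in extraction: a 2x2 unit transform acting on the pair (a, b), written as a sequence of steps with an intermediate value t and coefficients c; the matrix entries are not recoverable from the extracted text]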
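Because the slide's equations are not recoverable, the sketch below shows one widely used way to make a plane rotation reversible on integers: factor it into three lifting (shear) steps, each with determinant 1, and round inside each step. This is offered as an illustration of the idea and is not claimed to be the exact factorization on the slide; the function names are assumptions.

```python
import math


def _round(v):
    """Round to nearest integer (ties upward); any fixed rule would work."""
    return math.floor(v + 0.5)


def rotate_forward(a, b, theta):
    """Integer approximation of (a, b) rotated by theta, via three lifting steps."""
    p = -math.tan(theta / 2.0)   # equals (cos(theta) - 1) / sin(theta)
    u = math.sin(theta)
    a = a + _round(p * b)
    b = b + _round(u * a)
    a = a + _round(p * b)
    return a, b


def rotate_inverse(a, b, theta):
    """Undo rotate_forward exactly by reversing the steps with subtraction."""
    p = -math.tan(theta / 2.0)
    u = math.sin(theta)
    a = a - _round(p * b)
    b = b - _round(u * a)
    a = a - _round(p * b)
    return a, b


theta = math.pi / 32
forward = rotate_forward(45, -74, theta)
restored = rotate_inverse(forward[0], forward[1], theta)
```

Each step changes only one of the two values using a rounded function of the other, so subtracting the same rounded quantity restores the input exactly, whatever the rounding error.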
32/75
Entropy Coder
Input: quantized coefficients
Output: embedded coded bitstream with R-D performance curve
Goal: compression; an embedded bitstream for future manipulation
33/75
Frame Grouping
[Figure: frames 1 to 8 grouped into a time slot]
34/75
Entropy Coder
[Figure: the entropy coder outputs a bitstream and an R-D curve (axes R and D)]
35/75
Entropy Coder
Embedded coding
Implicit psychoacoustic masking
Context modeling
Arithmetic coding
Implementation concerns
36/75
A block of coefficients
 45   0   0   0  -74  -13   0   0
 21   0   4   0   14    0  23  23
  0   0   0   0    3    0   4   0
  0   3   5   0    0    0   0   0
  0   1  -1   0   -4   33   0  -1
  0   0   1   0    0    0   0   0
 -4   5   0   0  -18    0   0  19
  4   0  23   0   -1    0   0   0
Next View graph
37/75
Bits of Coefficients
Coefficient  Value  b1 b2 b3 b4 b5 b6 b7  Sign
w0             45    0  1  0  1  1  0  1    +
w1            -74    1  0  0  1  0  1  0    -
w2             21    0  0  1  0  1  0  1    +
w3             14    0  0  0  1  1  1  0    +
w4             -4    0  0  0  0  1  0  0    -
w5            -18    0  0  1  0  0  1  0    -
w6              4    0  0  0  0  1  0  0    +
w7             -1    0  0  0  0  0  0  1    -
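The table above is a plain sign-magnitude decomposition, which a short sketch can reproduce. The function name and the fixed width of 7 magnitude bits (b1 is the most significant) are taken from the table layout; nothing else about the coder is implied.

```python
def bitplanes(coefs, n_bits=7):
    """Return (bits, sign) per coefficient; bits[0] is b1, the MSB."""
    rows = []
    for c in coefs:
        magnitude = abs(c)
        bits = [(magnitude >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]
        rows.append((bits, '+' if c >= 0 else '-'))
    return rows


coefficients = [45, -74, 21, 14, -4, -18, 4, -1]   # w0 to w7 from the slide
rows = bitplanes(coefficients)
```

Reading the rows column by column gives the bit planes that the embedded coder sends one after another.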
38/75
Conventional Coding
[Figure: the bit table of w0 to w7 coded one coefficient at a time (first w0, second w1, third w2). Decoded values shown: w0 = 46, w1 = -74, w2 = 22, w3 to w7 = 0.]
39/75
Embedded Coding
[Figure: the bit table of w0 to w7 coded one bit plane at a time]
Bits sent per pass (w0 to w7; the sign follows the first 1 bit of a coefficient):
First (b1):   0  1-  0  0  0  0  0  0
Second (b2):  1+  0  0  0  0  0  0  0
Third (b3):   0  0  1+  0  0  1-  0  0
Decoder state after the third pass, as printed on the slide:
Coefficient  Value  Range
w0             40   32 to 47
w1            -72   -79 to -64
w2             24   16 to 31
w3              0   -31 to 31
w4              0   -31 to 31
w5            -24   -31 to 31
w6              0   -31 to 31
w7              0   -31 to 31
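The decoded values on this slide can be reproduced with a short sketch that keeps only the top bit planes of each magnitude and reconstructs at the middle of the remaining uncertainty interval. The mid-interval reconstruction rule and the function name are assumptions inferred from the printed values (40, -72, 24, -24); the sketch takes the true coefficient as input purely to simulate what the decoder would know.

```python
def decode_after(coef, planes, n_bits=7):
    """Value a decoder would reconstruct after `planes` bit planes.

    Only the top `planes` magnitude bits are assumed known. A coefficient
    whose known bits are all zero is still insignificant and decodes to 0.
    """
    shift = n_bits - planes
    prefix = abs(coef) >> shift
    if prefix == 0:
        return 0
    low = prefix << shift                       # smallest magnitude with this prefix
    mid = low + (1 << (shift - 1)) if shift > 0 else low
    return mid if coef >= 0 else -mid


coefficients = [45, -74, 21, 14, -4, -18, 4, -1]   # w0 to w7 from the slide
after_three = [decode_after(c, 3) for c in coefficients]
after_all = [decode_after(c, 7) for c in coefficients]
```

Truncating the embedded bitstream after any pass therefore still yields a usable, coarser approximation of every coefficient.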
40/75
Audio Masking
[Figure labels: Frequency; Critical Band; Neighboring Band; Signal; Noise Level; Masking Threshold; Maximum Mask; Signal-to-mask ratio; Noise-to-mask ratio]
41/75
Psychoacoustic Masking
Traditional approach (explicit masking; all existing approaches):
1. Calculate the mask.
2. Transmit the mask.
3. Modify the transform coefficients (or the coding approach) according to the masking.
4. Encode the transform coefficients.
Note: the mask modifies the coding content.
42/75
Implicit Psychoacoustic Masking
Key: the mask modifies the coding order; the content is the same.
Implicit masking:
1. Calculate the static masking (Fletcher-Munson threshold).
2. Encode the MSB of the transform coefficients.
3. Calculate the mask based on the MSB of the coefficients.
4. Modify the coding order.
5. Encode the next most important part of the coefficients.
6. Repeat the process.
43/75
Embedded Coding with Implicit Psychoacoustic Masking
[Figure: bit table of w0 to w7 after the first pass, with a mask marked; legend: coefficient significant / insignificant]
Coefficient  Value  Range
w0              0   -63 to 63
w1            -96   -127 to -64
w2 to w5        0   -63 to 63
w6, w7          0   -127 to 127
44/75
Embedded Coding with Implicit Psychoacoustic Masking
[Figure: bit table of w0 to w7 after the first and second passes; legend: coefficient significant / insignificant]
Coefficient  Value  Range
w0             48   32 to 63
w1            -96   -127 to -64
w2 to w5        0   -31 to 31
w6, w7          0   -63 to 63
45/75
Context Modeling
Context:
Zero coding: significance statuses of neighbor coefficients
Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients
Sign: neighbor signs
46/75
After Implicit Psychoacoustic Masking & Context Modeling
 45   0   0   0  -74  -13   0   0
 21   0   4   0   14    0  23  23
  0   0   0   0    3    0   4   0
  0   3   5   0    0    0   0   0
  0   1  -1   0   -4   33   0  -1
  0   0   1   0    0    0   0   0
 -4   5   0   0  -18    0   0  19
  4   0  23   0   -1    0   0   0
Bit: 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...
Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...
[Figure labels: "Automatically generated"; "To be encoded"]
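47/75
Arithmetic Coding - Illustration (QM Coder used)
What is arithmetic coding?
[Figure: the interval from 0 to 1 is subdivided successively using the probabilities P0 and 1-P0, P1 and 1-P1, P2 and 1-P2 as the symbols S0=0, S1=1, S2=0 are coded, leaving the interval A. Coding result: 0100.]
The shortest binary bitstream is chosen such that the interval (B, C), from B = 0100 0000000... to C = 0100 1111111..., lies within A.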
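The interval idea on this slide can be sketched with exact fractions. This is not a QM coder (which uses fixed-precision approximations and adaptive probability estimates); it only illustrates interval narrowing and the choice of the shortest bitstream whose interval lies inside A. The static probability P(0) = 3/5, the convention that symbol 0 takes the lower part of the interval, and the function names are assumptions.

```python
from fractions import Fraction
from math import ceil


def narrow(symbols, p0):
    """Narrow [0, 1) symbol by symbol; symbol 0 takes the lower p0 share."""
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        if s == 0:
            width *= p0
        else:
            low += width * p0
            width *= (1 - p0)
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string b such that [0.b000..., 0.b111...] lies in [low, high]."""
    n = 1
    while True:
        k = ceil(low * 2 ** n)                 # first multiple of 2^-n at or above low
        if Fraction(k + 1, 2 ** n) <= high:
            return format(k, "0{}b".format(n))
        n += 1


low, high = narrow([0, 1, 0], Fraction(3, 5))
code = shortest_code(low, high)
```

Under these assumed probabilities the three symbols leave the interval [9/25, 63/125], and the three-bit string 011 (covering 0.375 to 0.5) is the shortest one that fits inside it.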
48/75
Entropy Coder (Summary)
[Figure: the entropy coder outputs a bitstream and an R-D curve (axes R and D)]
49/75
Speed Up Issues
Context modeling: use stored context; update the context when a coefficient becomes significant
Implicit masking: fast calculation of energy in a critical band; lookup table to convert energy to mask
R-D curve calculation: lookup-table calculation of distortion
Context entropy coder: QM coder; run-length Rice coder
1475
Lossy vs Lossless Mode
MLT(SW)Audio
Quantization
Lossy mode
Reversible MLT(SW)
Audio
Lossless mode
1575
Lossy (Float) Pass
1675
MLT - Modulated Lapped Transforms
0 100 200 300 400 500 600 700 800 900 1000-1
-08
-06
-04
-02
0
02
04
06
08
1
Spatial Response
0 01 02 03 04 05 06 07 08 09 1-100
-80
-60
-40
-20
0
20
40
Frequency (pi)G
ain
(dB
)
Frequency Domain
1775
MLT with Window Switching
Features Basic window size 2048 Short window size 256 Switching criterion A frame (2048 samples) is switched to short window if and
only if Energy is bigger than a certain threshold Energy within the 8 subframes (256 samples) differs more than Ta
There are at least two neighbor subframes where the energy of the former subframe is greater than the latter subframe by Tb
1875
Band Separation
Audio (441kHz sampling)
MLT with window switching
Band separation0
1975
Synthesis (Half Sampling)
Audio (2205kHz sampling)
MLT with window switching
Band separation
0
2075
Synthesis (Quarter Sampling)
Audio (11025kHz sampling)
MLT with window switching
Band separation
0
2175
Quantizer
Input coefficient
Output quantized coefficient
Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding
2275
Quantizer
Scalar quantizer with a deadzone
s
snmsnm
1
0n][m
][][
Quantized Magnitude Sign
0
2375
Lossless (Integer) Pass
2475
Key to Achieve Lossless
Break the MLT into small steps
Make every step reversible
Definition of reversible transform Integer input integer output The transform should have a determinant of 1
(donot expand data volume)
2575
MLT Framework
Pre-R
otate
Com
plex FF
T
Post R
otation
DCT IV
Window
Lapped Transform
Pre-R
otate-l
Com
plex FF
T-l
Post R
otation-l
Inv Window
-l
Forward MLT
Inverse MLT
2675
Window Operation
x(n)x(-n-1)
N
n
4
)21(
4
Complex Rotate
2775
Pre-Rotation
Complex Rotate ndash32 xw(0)
xw(1)
xw(2)
xw(3)
xw(4)
xw(5)
xw(6)
xw(7)
Complex Rotate ndash532
Complex Rotate ndash932
Complex Rotate ndash1332
xp(0)
xp(1)
xp(2)
xp(3)
xp(4)
xp(5)
xp(6)
xp(7)
2875
FFT (4 Point Complex)
xp(0)
xp(1)
xp(2)
xp(3)
xp(4)
xp(5)
xp(6)
xp(7)
xc(0)
xc(1)
xc(2)
xc(3)
-
- e-j2
-
-
yc(0)
yc(1)
yc(2)
yc(3)
yp(0)
yp(1)
xp(2)
xp(3)
xp(4)
xp(5)
xp(6)
xp(7)
2975
Post-Rotation
Conjugate Rotate ndash0y(0)
y(1)
y(2)
y(3)
y(4)
y(5)
y(6)
y(7)
Conjugate Rotate ndash8
Conjugate Rotate ndash28
Conjugate Rotate ndash38
yp(0)
yp(1)
yp(2)
yp(3)
yp(4)
yp(5)
yp(6)
yp(7)
3075
Reversible MLT
Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation
3175
Reversible Unit Transform
b
a
b
a
11
11
2
1
2
1
0
actb
tcba
bcat
21
21
1
20
c
cc
3275
Entropy Coder
Input quantized coefficients
Output embedded coded bitstream with R-D
performance curve
Goal Compression Embedded bitstream for future manipulation
3375
Frame Grouping
Time slot
1 2 3 4 5 6 7 8
Fram
e
3475
Entropy Coder
D
R
Bitstream
R-D curve
3575
Entropy Coder
Embedded coding
Implicit psychoacoustic masking
Context modeling
Arithmetic coding
Implementation concerns
3675
A block of coefficients
45 0 0 0-74 -13 0 0
21 0 4 014 0 23 23
0 0 0 03 0 4 0
0 3 5 00 0 0 0
0 1 -1 0-4 33 0 -1
0 0 1 00 0 0 0
-4 5 0 0-18 0 0 19
4 0 23 0-1 0 0 0
Next View graph
3775
Bits of Coefficients
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
Signb1 b2 b3 b4 b5 b6 b7
w0
w1
w2
w3
w4
w5
w6
w7
coef
fici
ent
45
-74
21
14-4
-18
4
-1
3875
Conventional Coding
First
Second
Third
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +
Signb1 b2 b3 b4 b5 b6 b7
w0
w1
w2
w3
w4
w5
w6
w7
46
-74
22
00
0
00
3975
Embedded Coding
01 -000000
1 +0000000
001 +001 -00
Signb1 b2 b3 b4 b5 b6 b7
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
First Second
Third
w0
w1
w2
w3
w4
w5
w6
w7
Value
40
Range
3247
-72 -79-64
163124
-31310
-31310
-3131-24
-31310
-31310
4075
Audio Masking
FrequencyCriticalBand
NeighboringBand
Noise Level
Signal
Masking Threshold
Maximum Mask
Signal-tomask ratio
Noise-tomask ratio
4175
Psychoacoustic Masking
Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding
approach) according to the masking Encode the transform coefficients
Note Mask modifies the coding content
4275
Implicit Psychoacoustic Masking
Key Mask modifies the coding order the content is the same
Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process
4375
Embedded Coding with Implicit Psychoacoustic Masking
01 -000000
Signb1 b2 b3 b4 b5 b6 b7
001 -000000
First
w0
w1
w2
w3
w4
w5
w6
w7
Value
0
Range
-6363
-96 -127-64
-63630
-63630
-63630
-63630
-1271270
-1271270
Coefficient SignificantInsignificant
Mask
4475
Embedded Coding with Implicit Psychoacoustic Masking
01 -000000
1 +0000000
Signb1 b2 b3 b4 b5 b6 b7
0 10 1 +1 0 -0 00 00 00 00 00 0
First Second
w0
w1
w2
w3
w4
w5
w6
w7
Value
48
Range
3263
-96 -127-64
-31310
-31310
-31310
-31310
-63630
-63630
Coefficient SignificantInsignificant
4575
Context Modeling
Context Zero coding
Significant statuses of neighbor coefficients Refinement
Whether it is the 1st refinement pass Significant statuses of neighbor coefficients
Sign Neighbor signs
4675
After Implicit Psychoacoustic Masking amp Context Modeling
45 0 0 0-74 -13 0 0
21 0 4 014 0 23 23
0 0 0 03 0 4 0
0 3 5 00 0 0 0
0 1 -1 0-4 33 0 -1
0 0 1 00 0 0 0
-4 5 0 0-18 0 0 19
4 0 23 0-1 0 0 0
Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip
Automatically generated
To be encoded
4775
Arithmetic Coding ndash Illustration (QM Coder used)
What is arithmetic coding
0
1
1-P0
P0
1-P1
P1 1-P2
P2
S0=0 S1=1 S2=0
0100
Coding result
(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)
AB
C
4875
Entropy Coder (Summary)
D
R
Bitstream
R-D curve
4975
Speed Up Issues
Context Modeling Use stored context Update context when a coefficient becomes significant
Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask
R-D curve calculation Lookup table calculation of distortion
Context entropy coder QM coder Run-length Rice coder
5075
Bitstream Assembly
Input Bitstream R-D curve
Output Assembled bitstream Companion file
Bitstream assembling
5175
EAC Bitstream Syntax
Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)
EA
C m
arke
rG
loba
l H
eade
r Timeslot
Head Body
Timeslot
Head Body
Timeslot
Head Body
5275
Companion FileG
loba
l H
eade
r Timeslot
Head R-D curve
Timeslot
Head R-D curve
Timeslot
Head R-D curve
5375
Rate-Distortion Optimized Assembling (Single Timeslot)
D1
R1
D2
R2
D3
R3
D4
R4
D1
R1
D2
R2
D3
R3
D4
R4
r1 r2
r3 r4
5475
Rate-Distortion Optimized Assembling (Multiple Timeslots)
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
Buffer-Occupancy Curve
5575
Allocated Bytes Per Timeslots
Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time
Where Bi allocated bytes for timeslot i
Bufi buffer occupancy level at timeslot i
Ratetrans coding (network) rate per second Time time duration of the timeslot
5675
Optimization
Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate
level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the
current timeslot
5775
Search (R-D slope)
[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions. Annotations: underflow (too many bytes); overflow (too few bytes, wasted bytes).]
5875
Multiple Timeslots - Constant Bitrate
[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]
5975
Multiple Timeslots - Internet Streaming (Slow Start)
[Figure: buffer-occupancy curve; buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]
6075
Multiple Timeslots - Internet Streaming (Normal)
[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]
6175
Modular Software Design
[Figure: encoder pipeline. The audio is split into L+R (or mono) and L-R; each passes through MLT(SW), quantizer, entropy coder and bitstream assembly to produce the bitstream.]
6275
Modular Software Design
Highly modularized pipeline design: the quantizer and entropy coder can be used for image/video compression as well; probes and data inputs can be inserted into any part of the program
Data-flow driven (with the necessary memory regulator <buffer>): no long delay; no need for large memory
Memory and computation efficient: working memory is preallocated
6375
Experimental Results
6475
EAC - Highly Efficient (NMR)
Results based on the average of 16 MPEG-4 test clips. The smaller the NMR, the better.
Codec        8 kbps   16 kbps   32 kbps   48 kbps
MP4 TwinVQ   7.48     7.00      5.71      4.48
WMA          8.47     5.56      3.25      0.40
EAC          6.69     5.68      2.80      -2.2
6575
EAC - Lossless
Results based on the average of 16 MPEG-4 test clips
Codec            Compression ratio
EAC              2.72
Monkey's Audio   2.72
WinZip           1.32
6675
EAC (Versatile)
Versatile: real-time two-way communication (low-delay mode); storage devices (Pocket PC, Xbox); Internet streaming
6775
EAC (Low Delay Mode)
Reducing frame size
Timeslot = 1 frame
Fixed-length timeslot bitstream
Delay = 2 frames (ignoring encoding/decoding delay), plus the network transmission time (over a modem line, delay = 3 frames)
6875
EAC (Low Delay Mode)
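[Figure: low-delay timeline over frames i-1, i, i+1. The encoder starts encoding frame i (MLT, quantizer, entropy coder) and emits the bitstream; after the network, the decoder starts decoding frame i (entropy, quantizer); the frame is playable after that point.]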
6975
EAC - Flexible Bitstream Syntax
Flexible bitstream syntax: the parser may reassemble the bitstream at 1000x real time
Change: bit rate; number of audio channels; audio sampling rate
7075
EAC - Software
Encoder: 8x real time (stereo, 44.1 kHz sampling)
Decoder: 20x real time
Parser: 1000x real time
7175
EAC - Encoder
[Figure: Audio -> Encoder -> stereo 128 kbps bitstream and companion file]
7275
EAC - Parser
[Figure: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and produces: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps at 11 kHz sampling.]
7375
EAC - Decoder
[Figure: the decoder receives one of the parsed bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps at 11 kHz sampling.]
7475
Comparison
Original, MP4 TwinVQ, WMA, EAC, MP3
7575
Conclusions
An embedded audio coder has been developed:
Highly efficient
Versatile: low delay, constant bitrate, streaming
Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
Good prototype available: real-time execution, small memory footprint
1575
Lossy (Float) Pass
1675
MLT - Modulated Lapped Transforms
[Figure, two panels. "Spatial Response": amplitude from -1 to 1 over samples 0 to 1000. "Frequency Domain": gain (dB) from -100 to 40 versus frequency (pi) from 0 to 1.]
1775
MLT with Window Switching
Features:
Basic window size: 2048
Short window size: 256
Switching criterion: a frame (2048 samples) is switched to the short window if and only if:
Its energy is bigger than a certain threshold
The energies of its 8 subframes (256 samples each) differ by more than Ta
There are at least two neighboring subframes where the energy of the former subframe is greater than that of the latter by Tb
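The switching criterion can be read as three tests that must all pass. The sketch below is one possible reading, with several assumptions that the slide leaves open: energy is taken as the sum of squared samples, "differ by more than Ta" is taken as the spread between the largest and smallest subframe energy, and "greater by Tb" is taken as a plain difference. The function name and the threshold values in the example are hypothetical.

```python
def use_short_window(frame, energy_min, ta, tb, n_sub=8):
    """Decide whether a frame should use the short window.

    frame: list of samples (2048 in the slides), split into n_sub subframes.
    """
    sub_len = len(frame) // n_sub
    energies = [
        sum(x * x for x in frame[k * sub_len:(k + 1) * sub_len])
        for k in range(n_sub)
    ]
    loud_enough = sum(energies) > energy_min
    uneven = max(energies) - min(energies) > ta
    # A neighboring pair whose former subframe exceeds the latter by tb.
    has_step = any(
        energies[k] - energies[k + 1] > tb for k in range(n_sub - 1)
    )
    return loud_enough and uneven and has_step


burst = [1.0] * 256 + [0.0] * 1792    # energy concentrated in one subframe
steady = [0.1] * 2048                 # same energy in every subframe
```

A frame whose energy sits in a single subframe passes all three tests, while a stationary frame fails the spread test and keeps the long window.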
1875
Band Separation
[Figure: audio (44.1 kHz sampling); MLT with window switching; band separation]
1975
Synthesis (Half Sampling)
[Figure: audio (22.05 kHz sampling); MLT with window switching; band separation]
2075
Synthesis (Quarter Sampling)
[Figure: audio (11.025 kHz sampling); MLT with window switching; band separation]
2175
Quantizer
Input: coefficient
Output: quantized coefficient
Goal: convert coefficients from float to integer; reduce signal levels; fast implementation of entropy coding
2275
Quantizer
Scalar quantizer with a deadzone
[Equation and figure: a deadzone scalar quantizer that maps each coefficient to a quantized magnitude and a sign]
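A scalar quantizer with a deadzone is commonly written as a magnitude obtained by truncating toward zero plus a separate sign. The sketch below shows that common form; it is an assumption, not a reconstruction of the slide's own formula, and the step size, the function names and the midpoint reconstruction rule are chosen here for illustration.

```python
import math


def quantize(x, step):
    """Split a float coefficient into (magnitude, sign).

    Truncating |x| / step toward zero maps every |x| < step to 0,
    so the zero bin is twice as wide as the others: the deadzone.
    """
    magnitude = int(math.floor(abs(x) / step))
    sign = -1 if x < 0 else 1
    return magnitude, sign


def dequantize(magnitude, sign, step):
    """Reconstruct at the middle of the bin; the zero bin maps to 0."""
    if magnitude == 0:
        return 0.0
    return sign * (magnitude + 0.5) * step


q = quantize(-7.3, 2.0)
x_hat = dequantize(*q, 2.0)
```

Keeping the magnitude and the sign apart matches the way the entropy coder later treats magnitude bits and sign bits as different kinds of symbols.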
2375
Lossless (Integer) Pass
2475
Key to Achieve Lossless
Break the MLT into small steps
Make every step reversible
Definition of a reversible transform: integer input, integer output; the transform should have a determinant of 1 (it does not expand the data volume)
2575
MLT Framework
[Figure: Forward MLT: window (lapped transform) followed by the DCT-IV, which consists of pre-rotation, complex FFT and post-rotation. Inverse MLT: the inverse blocks (pre-rotation^-1, complex FFT^-1, post-rotation^-1, inverse window).]
2675
Window Operation
[Figure: the sample pair x(n), x(-n-1) passes through a complex rotation whose angle depends on n and N.]
2775
Pre-Rotation
[Figure: the windowed samples xw(0) to xw(7) pass through four complex rotations, labelled -π/32, -5π/32, -9π/32 and -13π/32, giving xp(0) to xp(7).]
2875
FFT (4 Point Complex)
[Figure: 4-point complex FFT. The values xp(0) to xp(7) are treated as four complex inputs xc(0) to xc(3); a butterfly network with one twiddle factor, e^{-jπ/2}, produces yc(0) to yc(3), which are written out as yp(0) to yp(7).]
2975
Post-Rotation
[Figure: yp(0) to yp(7) pass through four conjugate rotations, labelled -0, -π/8, -2π/8 and -3π/8, giving the outputs y(0) to y(7).]
3075
Reversible MLT
Make the following operations reversible: butterfly operation; complex rotation; conjugate rotation
3175
Reversible Unit Transform
[Equations: a 2x2 unit transform on the pair (a, b) is decomposed into a sequence of steps with coefficients c and an intermediate value t.]
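A well-known way to make a rotation reversible on integers is to factor it into three lifting (shear) steps and round inside each step. The sketch below shows that general technique; whether it matches the exact factorization on the slide is an assumption, and the function names are hypothetical. It requires an angle whose sine is not zero.

```python
import math


def lift_rotate(a, b, theta):
    """Integer approximation of rotating (a, b) by theta.

    Uses rotation = shear(c0) * shear(c1) * shear(c0) with
    c0 = (cos(theta) - 1) / sin(theta) and c1 = sin(theta).
    """
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a + round(c0 * b)
    b = b + round(c1 * a)
    a = a + round(c0 * b)
    return a, b


def lift_unrotate(a, b, theta):
    """Exact inverse: undo the three steps in reverse order."""
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a - round(c0 * b)
    b = b - round(c1 * a)
    a = a - round(c0 * b)
    return a, b


rotated = lift_rotate(1000, 0, math.pi / 2)
restored = lift_unrotate(*rotated, math.pi / 2)
```

Each step changes only one of the two values by a rounded function of the other, so subtracting the same rounded quantity restores it exactly. Each shear has determinant 1, which fits the requirement that the transform does not expand the data volume.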
3275
Entropy Coder
Input: quantized coefficients
Output: embedded coded bitstream with its R-D performance curve
Goal: compression; an embedded bitstream for future manipulation
3375
Frame Grouping
[Figure: frames 1 to 8 grouped into a time slot]
3475
Entropy Coder
[Figure: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R).]
3575
Entropy Coder
Embedded coding
Implicit psychoacoustic masking
Context modeling
Arithmetic coding
Implementation concerns
3675
A block of coefficients
45 0 0 0    -74 -13 0 0
21 0 4 0    14 0 23 23
0 0 0 0     3 0 4 0
0 3 5 0     0 0 0 0
0 1 -1 0    -4 33 0 -1
0 0 1 0     0 0 0 0
-4 5 0 0    -18 0 0 19
4 0 23 0    -1 0 0 0
Next View graph
3775
Bits of Coefficients
Coefficient  Value  b1 b2 b3 b4 b5 b6 b7  Sign
w0            45    0  1  0  1  1  0  1    +
w1           -74    1  0  0  1  0  1  0    -
w2            21    0  0  1  0  1  0  1    +
w3            14    0  0  0  1  1  1  0    +
w4            -4    0  0  0  0  1  0  0    -
w5           -18    0  0  1  0  0  1  0    -
w6             4    0  0  0  0  1  0  0    +
w7            -1    0  0  0  0  0  0  1    -
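The table of bits can be reproduced with a few lines of code. This is a small sketch that splits each coefficient into seven magnitude bits (b1 is the most significant) and a sign; the function name is hypothetical.

```python
def bitplanes(coeffs, n_bits=7):
    """Return one (bits, sign) pair per coefficient.

    bits[0] is b1, the most significant magnitude bit.
    """
    rows = []
    for w in coeffs:
        magnitude = abs(w)
        bits = [(magnitude >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]
        rows.append((bits, "-" if w < 0 else "+"))
    return rows


rows = bitplanes([45, -74, 21, 14, -4, -18, 4, -1])
# Column k of the table is one bit plane: the k-th bit of every coefficient.
first_plane = [bits[0] for bits, _ in rows]
```

Reading the result row by row gives the coefficient-by-coefficient order; reading it column by column gives the bit planes.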
3875
Conventional Coding
[Figure: conventional coding sends the coefficients one after another (first w0, second w1, third w2), each with all of its bits b1 to b7 and its sign. With only the first three coefficients sent, the decoded values shown are 46, -74, 22, 0, 0, 0, 0, 0.]
3975
Embedded Coding
Bits are sent plane by plane, most significant first (the sign follows a coefficient's first 1 bit):
First (b1): 0, 1 (-), 0, 0, 0, 0, 0, 0
Second (b2): 1 (+), 0, 0, 0, 0, 0, 0, 0
Third (b3): 0, 0, 1 (+), 0, 0, 1 (-), 0, 0
Decoded values after the third pass:
Coefficient  Range       Value
w0           32..47      40
w1           -79..-64    -72
w2           16..31      24
w3           -31..31     0
w4           -31..31     0
w5           -31..31     -24
w6           -31..31     0
w7           -31..31     0
4075
Audio Masking
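[Figure: audio masking along the frequency axis, showing a signal in a critical band, the neighboring band, the masking threshold, the maximum mask and the noise level, with the signal-to-mask ratio and the noise-to-mask ratio marked.]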
4175
Psychoacoustic Masking
Traditional approach (explicit masking, all existing approaches):
Calculate the mask
Transmit the mask
Modify the transform coefficients (or the coding approach) according to the masking
Encode the transform coefficients
Note: the mask modifies the coding content
4275
Implicit Psychoacoustic Masking
Key: the mask modifies the coding order; the content is the same
Implicit masking:
Calculate the static masking (Fletcher-Munson threshold)
Encode the MSB of the transform coefficients
Calculate the mask based on the MSB of the coefficients
Modify the coding order
Encode the next most important part of the coefficients
Repeat the process
4375
Embedded Coding with Implicit Psychoacoustic Masking
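[Figure: the bit-plane grid (b1 to b7 and sign for w0 to w7) with the first pass and the mask marked. Legend: coefficient significant / insignificant; mask.]
Decoded values after the first pass:
Coefficient  Range        Value
w0           -63..63      0
w1           -127..-64    -96
w2           -63..63      0
w3           -63..63      0
w4           -63..63      0
w5           -63..63      0
w6           -127..127    0
w7           -127..127    0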
4475
Embedded Coding with Implicit Psychoacoustic Masking
[Figure: the bit-plane grid (b1 to b7 and sign for w0 to w7) with the first and second passes marked. Legend: coefficient significant / insignificant.]
Decoded values after the second pass:
Coefficient  Range        Value
w0           32..63       48
w1           -127..-64    -96
w2           -31..31      0
w3           -31..31      0
w4           -31..31      0
w5           -31..31      0
w6           -63..63      0
w7           -63..63      0
4575
Context Modeling
Context:
Zero coding: significance statuses of the neighboring coefficients
Refinement: whether it is the 1st refinement pass; significance statuses of the neighboring coefficients
Sign: neighbor signs
4675
After Implicit Psychoacoustic Masking & Context Modeling
45 0 0 0    -74 -13 0 0
21 0 4 0    14 0 23 23
0 0 0 0     3 0 4 0
0 3 5 0     0 0 0 0
0 1 -1 0    -4 33 0 -1
0 0 1 0     0 0 0 0
-4 5 0 0    -18 0 0 19
4 0 23 0    -1 0 0 0
Bit (to be encoded): 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...
Ctx (automatically generated): 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...
1775
MLT with Window Switching
Features Basic window size 2048 Short window size 256 Switching criterion A frame (2048 samples) is switched to short window if and
only if Energy is bigger than a certain threshold Energy within the 8 subframes (256 samples) differs more than Ta
There are at least two neighbor subframes where the energy of the former subframe is greater than the latter subframe by Tb
1875
Band Separation
Audio (441kHz sampling)
MLT with window switching
Band separation0
1975
Synthesis (Half Sampling)
Audio (2205kHz sampling)
MLT with window switching
Band separation
0
2075
Synthesis (Quarter Sampling)
Audio (11025kHz sampling)
MLT with window switching
Band separation
0
2175
Quantizer
Input coefficient
Output quantized coefficient
Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding
2275
Quantizer
Scalar quantizer with a deadzone
s
snmsnm
1
0n][m
][][
Quantized Magnitude Sign
0
2375
Lossless (Integer) Pass
2475
Key to Achieve Lossless
Break the MLT into small steps
Make every step reversible
Definition of reversible transform Integer input integer output The transform should have a determinant of 1
(donot expand data volume)
2575
MLT Framework
Pre-R
otate
Com
plex FF
T
Post R
otation
DCT IV
Window
Lapped Transform
Pre-R
otate-l
Com
plex FF
T-l
Post R
otation-l
Inv Window
-l
Forward MLT
Inverse MLT
2675
Window Operation
x(n)x(-n-1)
N
n
4
)21(
4
Complex Rotate
2775
Pre-Rotation
Complex Rotate ndash32 xw(0)
xw(1)
xw(2)
xw(3)
xw(4)
xw(5)
xw(6)
xw(7)
Complex Rotate ndash532
Complex Rotate ndash932
Complex Rotate ndash1332
xp(0)
xp(1)
xp(2)
xp(3)
xp(4)
xp(5)
xp(6)
xp(7)
2875
FFT (4 Point Complex)
xp(0)
xp(1)
xp(2)
xp(3)
xp(4)
xp(5)
xp(6)
xp(7)
xc(0)
xc(1)
xc(2)
xc(3)
-
- e-j2
-
-
yc(0)
yc(1)
yc(2)
yc(3)
yp(0)
yp(1)
xp(2)
xp(3)
xp(4)
xp(5)
xp(6)
xp(7)
2975
Post-Rotation
Conjugate Rotate ndash0y(0)
y(1)
y(2)
y(3)
y(4)
y(5)
y(6)
y(7)
Conjugate Rotate ndash8
Conjugate Rotate ndash28
Conjugate Rotate ndash38
yp(0)
yp(1)
yp(2)
yp(3)
yp(4)
yp(5)
yp(6)
yp(7)
3075
Reversible MLT
Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation
3175
Reversible Unit Transform
b
a
b
a
11
11
2
1
2
1
0
actb
tcba
bcat
21
21
1
20
c
cc
3275
Entropy Coder
Input quantized coefficients
Output embedded coded bitstream with R-D
performance curve
Goal Compression Embedded bitstream for future manipulation
3375
Frame Grouping
Time slot
1 2 3 4 5 6 7 8
Fram
e
3475
Entropy Coder
D
R
Bitstream
R-D curve
3575
Entropy Coder
Embedded coding
Implicit psychoacoustic masking
Context modeling
Arithmetic coding
Implementation concerns
3675
A block of coefficients
45 0 0 0-74 -13 0 0
21 0 4 014 0 23 23
0 0 0 03 0 4 0
0 3 5 00 0 0 0
0 1 -1 0-4 33 0 -1
0 0 1 00 0 0 0
-4 5 0 0-18 0 0 19
4 0 23 0-1 0 0 0
Next View graph
3775
Bits of Coefficients
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
Signb1 b2 b3 b4 b5 b6 b7
w0
w1
w2
w3
w4
w5
w6
w7
coef
fici
ent
45
-74
21
14-4
-18
4
-1
3875
Conventional Coding
First
Second
Third
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +
Signb1 b2 b3 b4 b5 b6 b7
w0
w1
w2
w3
w4
w5
w6
w7
46
-74
22
00
0
00
3975
Embedded Coding
01 -000000
1 +0000000
001 +001 -00
Signb1 b2 b3 b4 b5 b6 b7
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
First Second
Third
w0
w1
w2
w3
w4
w5
w6
w7
Value
40
Range
3247
-72 -79-64
163124
-31310
-31310
-3131-24
-31310
-31310
4075
Audio Masking
FrequencyCriticalBand
NeighboringBand
Noise Level
Signal
Masking Threshold
Maximum Mask
Signal-tomask ratio
Noise-tomask ratio
4175
Psychoacoustic Masking
Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding
approach) according to the masking Encode the transform coefficients
Note Mask modifies the coding content
4275
Implicit Psychoacoustic Masking
Key Mask modifies the coding order the content is the same
Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process
4375
Embedded Coding with Implicit Psychoacoustic Masking
01 -000000
Signb1 b2 b3 b4 b5 b6 b7
001 -000000
First
w0
w1
w2
w3
w4
w5
w6
w7
Value
0
Range
-6363
-96 -127-64
-63630
-63630
-63630
-63630
-1271270
-1271270
Coefficient SignificantInsignificant
Mask
4475
Embedded Coding with Implicit Psychoacoustic Masking
01 -000000
1 +0000000
Signb1 b2 b3 b4 b5 b6 b7
0 10 1 +1 0 -0 00 00 00 00 00 0
First Second
w0
w1
w2
w3
w4
w5
w6
w7
Value
48
Range
3263
-96 -127-64
-31310
-31310
-31310
-31310
-63630
-63630
Coefficient SignificantInsignificant
4575
Context Modeling
Context Zero coding
Significant statuses of neighbor coefficients Refinement
Whether it is the 1st refinement pass Significant statuses of neighbor coefficients
Sign Neighbor signs
4675
After Implicit Psychoacoustic Masking amp Context Modeling
45 0 0 0-74 -13 0 0
21 0 4 014 0 23 23
0 0 0 03 0 4 0
0 3 5 00 0 0 0
0 1 -1 0-4 33 0 -1
0 0 1 00 0 0 0
-4 5 0 0-18 0 0 19
4 0 23 0-1 0 0 0
Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip
Automatically generated
To be encoded
4775
Arithmetic Coding ndash Illustration (QM Coder used)
What is arithmetic coding
0
1
1-P0
P0
1-P1
P1 1-P2
P2
S0=0 S1=1 S2=0
0100
Coding result
(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)
AB
C
4875
Entropy Coder (Summary)
D
R
Bitstream
R-D curve
4975
Speed Up Issues
Context Modeling Use stored context Update context when a coefficient becomes significant
Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask
R-D curve calculation Lookup table calculation of distortion
Context entropy coder QM coder Run-length Rice coder
5075
Bitstream Assembly
Input Bitstream R-D curve
Output Assembled bitstream Companion file
Bitstream assembling
5175
EAC Bitstream Syntax
Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)
EA
C m
arke
rG
loba
l H
eade
r Timeslot
Head Body
Timeslot
Head Body
Timeslot
Head Body
5275
Companion FileG
loba
l H
eade
r Timeslot
Head R-D curve
Timeslot
Head R-D curve
Timeslot
Head R-D curve
5375
Rate-Distortion Optimized Assembling (Single Timeslot)
D1
R1
D2
R2
D3
R3
D4
R4
D1
R1
D2
R2
D3
R3
D4
R4
r1 r2
r3 r4
5475
Rate-Distortion Optimized Assembling (Multiple Timeslots)
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
Buffer-Occupancy Curve
5575
Allocated Bytes Per Timeslots
Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time
Where Bi allocated bytes for timeslot i
Bufi buffer occupancy level at timeslot i
Ratetrans coding (network) rate per second Time time duration of the timeslot
5675
Optimization
Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate
level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the
current timeslot
5775
Search (R-D slope)B
uffe
r O
ccup
ancy
(B
ytes
)
Time (timeslots)
Illegal Region
Illegal Region
Underflow (too many bytes)
Overflow (too few bytes)
Wastebytes
5875
Multiple Timeslots ndash Constant Bitrate
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
5975
Multiple Timeslots ndash Internet Streaming (Slow Start)
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
Buffer-Occupancy Curve
6075
Multiple Timeslots ndash Internet Streaming (Normal)
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
6175
Modular Software Design
MLT(SW) Quantizer Entropy coder
BitstreamAssembly
MLT(SW) Quantizer Entropy coder
BitstreamAssembly
Audio
Bitstream
L+R(or mono)
L-R
6275
Modular Software Design
Highly modularized pipeline design: the quantizer and entropy coder can be used for image/video compression as well; probe and data input can be inserted into any part of the program
Data-flow driven (with necessary memory regulator <buffer>): no long delay; no need for large memory
Memory and computation efficient: working memory preallocated
6375
Experimental Results
6475
EAC - Highly Efficient (NMR)
Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.
Codec         8 kbps   16 kbps   32 kbps   48 kbps
MP4 TwinVQ    7.48     7.00      5.71      4.48
WMA           8.47     5.56      3.25      0.40
EAC           6.69     5.68      2.80      -0.22
6575
EAC - Lossless
Results based on the average of 16 MPEG4 test clips
Codec             Compression Ratio
EAC               2.72
Monkey's Audio    2.72
WinZip            1.32
6675
EAC (Versatile)
Versatile: real-time 2-way communication (low delay mode); storage device (Pocket PC, Xbox); Internet streaming
6775
EAC (Low Delay Mode)
Reducing frame size
Timeslot = 1 frame
Fixed length timeslot bitstream
Delay = 2 frames (ignoring encoding/decoding delay)
Network transmission time (if modem line, delay = 3 frames)
6875
EAC (Low Delay Mode)
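[Figure: low-delay timeline over frames i-1, i, i+1. Encoder: start encoding frame i (MLT, quantizer, entropy coder), producing the bitstream. The bitstream crosses the network. Decoder: start decoding frame i (entropy decoder, quantizer). The point where frame i becomes playable is marked.]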
6975
EAC - Flexible Bitstream Syntax
Flexible bitstream syntax: the parser may reassemble the bitstream at 1000x real time
Change: bit rate; number of audio channels; audio sampling rate
7075
EAC - Software
Software: encoder 8x real time (stereo, 44.1 kHz sampling); decoder 20x real time; parser 1000x real time
7175
EAC - Encoder
[Figure: audio enters the encoder, which outputs a stereo 128 kbps bitstream and a companion file.]
7275
EAC - Parser
[Figure: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and produces: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]
7375
EAC - Decoder
Decoder
Stereo 16kbps
Mono 8kbps
Stereo 16kbps Slow start
Mono 8kbps 11kHz sampling
7475
Comparison
Original MP4 TwinVQ WMA EAC
MP3
7575
Conclusions
An embedded audio coder is developed
Highly efficient
Versatile: low delay, constant bitrate, streaming
Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
Good prototype available: real-time execution, small memory footprint
1875
Band Separation
Audio (44.1 kHz sampling)
MLT with window switching
Band separation
1975
Synthesis (Half Sampling)
Audio (22.05 kHz sampling)
MLT with window switching
Band separation
2075
Synthesis (Quarter Sampling)
Audio (11.025 kHz sampling)
MLT with window switching
Band separation
2175
Quantizer
Input: coefficient
Output: quantized coefficient
Goal: convert coefficient from float to integer; reduce signal levels; fast implementation of entropy coding
2275
Quantizer
Scalar quantizer with a deadzone
[Equation: each coefficient s[m][n] is mapped to a quantized magnitude and a sign, with a dead zone around 0.]
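The exact quantizer formula on this slide did not survive extraction, so the sketch below shows a generic scalar quantizer with a dead zone rather than the deck's own definition. The step size, the function names, and the midpoint reconstruction are all assumptions.

```python
import math


def quantize(x, step):
    """Return (magnitude, sign) of x; |x| < step falls in the dead zone (magnitude 0)."""
    magnitude = int(math.floor(abs(x) / step))
    sign = -1 if x < 0 else 1
    return magnitude, sign


def dequantize(magnitude, sign, step):
    """Reconstruct at the middle of the quantization bin (an assumed choice)."""
    if magnitude == 0:
        return 0.0
    return sign * (magnitude + 0.5) * step


print(quantize(7.9, 2.0))         # (3, 1)
print(quantize(-1.9, 2.0))        # (0, -1)
print(dequantize(2, -1, 2.0))     # -5.0
```

Splitting the result into a magnitude and a sign matches the "Quantized Magnitude / Sign" labels on the slide and suits a coder that handles the sign separately from the magnitude bits.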
2375
Lossless (Integer) Pass
2475
Key to Achieve Lossless
Break the MLT into small steps
Make every step reversible
Definition of reversible transform: integer input, integer output. The transform should have a determinant of 1 (do not expand data volume).
2575
MLT Framework
[Figure: MLT framework. Forward MLT: window (lapped transform), then DCT-IV, built from pre-rotation, complex FFT, and post-rotation. Inverse MLT: the inverse blocks (pre-rotation^-1, complex FFT^-1, post-rotation^-1) and the inverse window.]
2675
Window Operation
[Figure: the window operation applies a complex rotation to each sample pair x(n), x(-n-1); the rotation angle depends on n and N.]
2775
Pre-Rotation
[Figure: pre-rotation for an 8-point example. The windowed inputs xw(0) to xw(7) pass through four complex rotations, by -π/32, -5π/32, -9π/32, and -13π/32, giving xp(0) to xp(7).]
2875
FFT (4 Point Complex)
[Figure: 4-point complex FFT. xp(0) to xp(7) form the complex inputs xc(0) to xc(3); butterfly operations, with one twiddle factor e^{-jπ/2}, produce yc(0) to yc(3), which are unpacked into yp(0) to yp(7).]
2975
Post-Rotation
[Figure: post-rotation. yp(0) to yp(7) pass through four conjugate rotations, by -0, -π/8, -2π/8, and -3π/8, giving y(0) to y(7).]
3075
Reversible MLT
Make the following operations reversible: butterfly operation, complex rotation, conjugate rotation
3175
Reversible Unit Transform
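[Equation: the unit transform maps the pair (a, b) to a new pair through a 2x2 matrix; it is carried out in steps that use an intermediate value t and coefficients c.]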
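The matrices on this slide are not recoverable, so the sketch below illustrates the general idea with a well-known technique: factoring a rotation into three lifting (shear) steps and rounding inside each step. Each shear has determinant 1 and maps integers to integers, so the whole transform is exactly invertible. The function names and the test angle are assumptions, and this is not claimed to be the factorization shown on the slide.

```python
import math


def _round(v):
    """Deterministic rounding to the nearest integer (ties go up)."""
    return math.floor(v + 0.5)


def forward_rotate(a, b, theta):
    """Integer-to-integer approximation of a rotation by theta (sin(theta) != 0)."""
    p = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    a = a + _round(p * b)      # lifting step 1
    b = b + _round(s * a)      # lifting step 2
    a = a + _round(p * b)      # lifting step 3
    return a, b


def inverse_rotate(a, b, theta):
    """Undo forward_rotate by running the lifting steps backwards."""
    p = (math.cos(theta) - 1.0) / math.sin(theta)
    s = math.sin(theta)
    a = a - _round(p * b)      # undo step 3
    b = b - _round(s * a)      # undo step 2
    a = a - _round(p * b)      # undo step 1
    return a, b


theta = math.pi / 8            # hypothetical rotation angle
rotated = forward_rotate(45, -74, theta)
print(inverse_rotate(rotated[0], rotated[1], theta) == (45, -74))   # True
```

Because every step adds a rounded function of the other variable, the inverse recomputes the same rounded value and subtracts it, so the rounding error never prevents perfect reconstruction.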
3275
Entropy Coder
Input: quantized coefficients
Output: embedded coded bitstream with R-D performance curve
Goal: compression; embedded bitstream for future manipulation
3375
Frame Grouping
[Figure: frames 1 to 8 grouped into time slots.]
3475
Entropy Coder
[Figure: the entropy coder outputs a bitstream and an R-D curve (distortion D versus rate R).]
3575
Entropy Coder
Embedded coding
Implicit psychoacoustic masking
Context modeling
Arithmetic coding
Implementation concerns
3675
A block of coefficients
  45    0    0    0  -74  -13    0    0
  21    0    4    0   14    0   23   23
   0    0    0    0    3    0    4    0
   0    3    5    0    0    0    0    0
   0    1   -1    0   -4   33    0   -1
   0    0    1    0    0    0    0    0
  -4    5    0    0  -18    0    0   19
   4    0   23    0   -1    0    0    0
Next View graph
3775
Bits of Coefficients
Coefficient  Value   b1 b2 b3 b4 b5 b6 b7   Sign
w0            45     0  1  0  1  1  0  1    +
w1           -74     1  0  0  1  0  1  0    -
w2            21     0  0  1  0  1  0  1    +
w3            14     0  0  0  1  1  1  0    +
w4            -4     0  0  0  0  1  0  0    -
w5           -18     0  0  1  0  0  1  0    -
w6             4     0  0  0  0  1  0  0    +
w7            -1     0  0  0  0  0  0  1    -
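The bit table for w0 to w7 is a sign-magnitude decomposition with b1 as the most significant bit. A minimal sketch that reproduces it, with `to_bits` as a hypothetical helper name:

```python
def to_bits(value, nbits=7):
    """Split an integer into magnitude bits b1..b7 (MSB first) and a sign."""
    magnitude = abs(value)
    bits = [(magnitude >> (nbits - 1 - k)) & 1 for k in range(nbits)]
    sign = "-" if value < 0 else "+"
    return bits, sign


coefficients = [45, -74, 21, 14, -4, -18, 4, -1]   # w0..w7 from the slide
for i, w in enumerate(coefficients):
    bits, sign = to_bits(w)
    print("w%d %4d  %s %s" % (i, w, " ".join(map(str, bits)), sign))
```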
3875
Conventional Coding
[Figure: conventional coding sends the coefficients one after another (first w0, second w1, third w2), each with all of its bits b1 to b7 and its sign. With only these three coefficients coded, the decoded values shown for w0 to w7 are 46, -74, 22, 0, 0, 0, 0, 0.]
3975
Embedded Coding
[Figure: embedded coding sends the bits plane by plane: first b1 of all coefficients, second b2, third b3, with the sign sent when a coefficient becomes nonzero.]
Coefficient  Value   Range
w0            40     32 to 47
w1           -72     -79 to -64
w2            24     16 to 31
w3             0     -31 to 31
w4             0     -31 to 31
w5           -24     -31 to 31
w6             0     -31 to 31
w7             0     -31 to 31
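As a sketch of how a decoder can turn a truncated embedded bitstream into a value and a range, the code below reconstructs a coefficient from its first few bit-planes. Midpoint reconstruction and the helper names are assumptions; for w0, w1, and w2 with three bit-planes received it gives the same value and range as the slide.

```python
def to_bits(value, nbits=7):
    """Magnitude bits b1..b7 (MSB first) and sign of an integer."""
    magnitude = abs(value)
    bits = [(magnitude >> (nbits - 1 - k)) & 1 for k in range(nbits)]
    return bits, ("-" if value < 0 else "+")


def decode_partial(bits, sign, planes_received, nbits=7):
    """Return (value, low, high) knowing only the first planes_received bits."""
    shift = nbits - planes_received          # number of bit-planes still unknown
    prefix = 0
    for b in bits[:planes_received]:
        prefix = (prefix << 1) | b
    lo_mag = prefix << shift                 # smallest magnitude consistent with prefix
    hi_mag = lo_mag + (1 << shift) - 1       # largest magnitude consistent with prefix
    if prefix == 0:
        return 0, -hi_mag, hi_mag            # still insignificant: sign unknown
    mid = lo_mag + (1 << shift) // 2         # assumed midpoint reconstruction
    if sign == "-":
        return -mid, -hi_mag, -lo_mag
    return mid, lo_mag, hi_mag


for w in (45, -74, 21):
    bits, sign = to_bits(w)
    print(w, decode_partial(bits, sign, 3))
# 45 (40, 32, 47)
# -74 (-72, -79, -64)
# 21 (24, 16, 31)
```

Each extra bit-plane halves the width of the range, which is what lets the bitstream be cut at any point and still decode to a coarser version of every coefficient.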
4075
Audio Masking
[Figure: audio masking. Axis: frequency, showing the critical band and a neighboring band. Labels: signal, masking threshold, maximum mask, noise level, signal-to-mask ratio, noise-to-mask ratio.]
4175
Psychoacoustic Masking
Traditional approach (explicit masking, all existing approaches):
1. Calculate the mask
2. Transmit the mask
3. Modify transform coefficients (or coding approach) according to the masking
4. Encode the transform coefficients
Note: the mask modifies the coding content
4275
Implicit Psychoacoustic Masking
Key: the mask modifies the coding order; the content is the same
Implicit masking:
1. Calculate the static masking (Fletcher-Munson threshold)
2. Encode the MSB of the transform coefficients
3. Calculate the mask based on the MSB of the coefficients
4. Modify coding order
5. Encode the next most important part of the coefficients
6. Repeat the process
4375
Embedded Coding with Implicit Psychoacoustic Masking
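[Figure: bit-plane table (sign, b1 to b7) after the first pass, with each coefficient marked significant or insignificant and the mask indicated.]
Coefficient  Value   Range
w0             0     -63 to 63
w1           -96     -127 to -64
w2             0     -63 to 63
w3             0     -63 to 63
w4             0     -63 to 63
w5             0     -63 to 63
w6             0     -127 to 127
w7             0     -127 to 127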
4475
Embedded Coding with Implicit Psychoacoustic Masking
[Figure: bit-plane table (sign, b1 to b7) after the first and second passes, with each coefficient marked significant or insignificant.]
Coefficient  Value   Range
w0            48     32 to 63
w1           -96     -127 to -64
w2             0     -31 to 31
w3             0     -31 to 31
w4             0     -31 to 31
w5             0     -31 to 31
w6             0     -63 to 63
w7             0     -63 to 63
4575
Context Modeling
Context
Zero coding: significance status of neighbor coefficients
Refinement: whether it is the 1st refinement pass; significance status of neighbor coefficients
Sign: neighbor signs
4675
After Implicit Psychoacoustic Masking & Context Modeling
  45    0    0    0  -74  -13    0    0
  21    0    4    0   14    0   23   23
   0    0    0    0    3    0    4    0
   0    3    5    0    0    0    0    0
   0    1   -1    0   -4   33    0   -1
   0    0    1    0    0    0    0    0
  -4    5    0    0  -18    0    0   19
   4    0   23    0   -1    0    0    0
Bit (to be encoded): 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...
Ctx (automatically generated): 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...
4775
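Arithmetic Coding - Illustration (QM Coder used)
What is arithmetic coding?
[Figure: the interval from 0 to 1 is subdivided in turn for the symbols S0=0, S1=1, S2=0, using the probability pairs (P0, 1-P0), (P1, 1-P1), (P2, 1-P2); the final interval is A.]
Coding result: 0100
(The shortest binary bitstream that ensures the interval from B = 0100 0000000 to C = 0100 1111111 lies within A, i.e. (B, C) ⊂ A)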
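The interval picture can be made concrete with exact fractions. This is a sketch of idealized binary arithmetic coding, not of the QM coder itself (which uses finite-precision approximations); the convention that symbol 0 takes the lower part of the interval and the probabilities in the example are assumptions.

```python
from fractions import Fraction
import math


def encode_interval(symbols, p_zero):
    """Narrow [0, 1) once per symbol; p_zero[i] is the probability that symbol i is 0."""
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        p0 = Fraction(p0)
        if s == 0:
            width *= p0              # keep the lower part (assumed convention)
        else:
            low += width * p0        # skip the part reserved for symbol 0
            width *= 1 - p0
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string b such that every continuation of 0.b stays in [low, high)."""
    n = 1
    while True:
        scale = 2 ** n
        k = math.ceil(low * scale)                 # smallest k with k / 2^n >= low
        if Fraction(k + 1, scale) <= high:         # whole dyadic cell fits inside
            return format(k, "0{}b".format(n))
        n += 1


half = Fraction(1, 2)
low, high = encode_interval([0, 1, 0], [half, half, half])
print(low, high)                  # 1/4 3/8
print(shortest_code(low, high))   # 010
```

With skewed probabilities the final interval is wider for likely sequences, so the shortest code that fits inside it is shorter; that is where the compression comes from.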
4875
Entropy Coder (Summary)
[Figure: the entropy coder outputs a bitstream and an R-D curve (distortion D versus rate R).]
4975
Speed Up Issues
Context modeling: use stored context; update context when a coefficient becomes significant
Implicit masking: fast calculation of energy in a critical band; lookup table to convert energy to mask
R-D curve calculation: lookup table calculation of distortion
Context entropy coder: QM coder, run-length Rice coder
2175
Quantizer
Input coefficient
Output quantized coefficient
Goal convert coefficient from float to integer Reduce signal levels Fast implementation of entropy coding
2275
Quantizer
Scalar quantizer with a deadzone
s
snmsnm
1
0n][m
][][
Quantized Magnitude Sign
0
2375
Lossless (Integer) Pass
2475
Key to Achieve Lossless
Break the MLT into small steps
Make every step reversible
Definition of a reversible transform
  Integer input, integer output
  The transform should have a determinant of 1 (does not expand data volume)
2575
MLT Framework
[Figure: MLT framework. Forward MLT: Window (lapped transform), then DCT-IV built from Pre-Rotate, Complex FFT, and Post-Rotation. Inverse MLT: the inverse blocks Pre-Rotate^-1, Complex FFT^-1, Post-Rotation^-1, and Inverse Window.]
2675
Window Operation
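[Figure: the window operation is drawn as a complex rotation applied to the sample pair x(n), x(-n-1); the rotation angle depends on n and N.]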
2775
Pre-Rotation
[Figure: pre-rotation, 8-sample example. Inputs xw(0)..xw(7) pass through four complex rotations by -π/32, -5π/32, -9π/32, and -13π/32, giving xp(0)..xp(7).]
2875
FFT (4 Point Complex)
[Figure: 4-point complex FFT. The values xp(0)..xp(7) form the complex inputs xc(0)..xc(3); a butterfly network with one e^{-jπ/2} twiddle factor produces yc(0)..yc(3), written out as yp(0)..yp(7).]
2975
Post-Rotation
[Figure: post-rotation, 8-sample example. Inputs yp(0)..yp(7) pass through four conjugate rotations by -0, -π/8, -2π/8, and -3π/8, giving y(0)..y(7).]
3075
Reversible MLT
Make the following operations reversible
  Butterfly operation
  Complex rotation
  Conjugate rotation
3175
Reversible Unit Transform
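[Equations: a 2x2 transform of the pair (a, b) is factored into successive steps that update a, b, and an intermediate value t using coefficients c0, c1, c2.]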
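One well-known way to make a rotation reversible on integers is to factor it into three lifting (shear) steps and round inside each step. The sketch below shows that standard technique; whether the slide uses exactly this factorization is an assumption, and the names `p` and `u` are illustrative rather than the slide's c0, c1, c2.

```python
import math


def _round(x):
    """Deterministic rounding to the nearest integer."""
    return int(math.floor(x + 0.5))


def lifting_coeffs(theta):
    """Coefficients of the three-step factorization (requires sin(theta) != 0)."""
    p = (math.cos(theta) - 1.0) / math.sin(theta)
    u = math.sin(theta)
    return p, u


def rotate_forward(a, b, theta):
    """Integer approximation of a rotation of (a, b) by theta."""
    p, u = lifting_coeffs(theta)
    a = a + _round(p * b)  # shear 1: only a changes
    b = b + _round(u * a)  # shear 2: only b changes
    a = a + _round(p * b)  # shear 3: only a changes
    return a, b


def rotate_inverse(a, b, theta):
    """Exact inverse: undo the three shears in reverse order."""
    p, u = lifting_coeffs(theta)
    a = a - _round(p * b)
    b = b - _round(u * a)
    a = a - _round(p * b)
    return a, b
```

Each shear has determinant 1 and changes one variable by a rounded function of the other, so subtracting the same rounded quantity recovers the input exactly, even though the forward output only approximates the true rotation.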
3275
Entropy Coder
Input: quantized coefficients
Output: embedded coded bitstream with R-D performance curve
Goal
  Compression
  Embedded bitstream for future manipulation
3375
Frame Grouping
[Figure: frames 1 to 8 along the time axis, grouped into time slots.]
3475
Entropy Coder
[Figure: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R).]
3575
Entropy Coder
Embedded coding
Implicit psychoacoustic masking
Context modeling
Arithmetic coding
Implementation concerns
3675
A block of coefficients
45  0  0  0 -74 -13  0  0
21  0  4  0  14   0 23 23
 0  0  0  0   3   0  4  0
 0  3  5  0   0   0  0  0
 0  1 -1  0  -4  33  0 -1
 0  0  1  0   0   0  0  0
-4  5  0  0 -18   0  0 19
 4  0 23  0  -1   0  0  0
3775
Bits of Coefficients
Coefficient  Value  b1 b2 b3 b4 b5 b6 b7  Sign
w0             45    0  1  0  1  1  0  1   +
w1            -74    1  0  0  1  0  1  0   -
w2             21    0  0  1  0  1  0  1   +
w3             14    0  0  0  1  1  1  0   +
w4             -4    0  0  0  0  1  0  0   -
w5            -18    0  0  1  0  0  1  0   -
w6              4    0  0  0  0  1  0  0   +
w7             -1    0  0  0  0  0  0  1   -
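The bit table on this slide can be regenerated with a short sketch. It assumes 7-bit magnitudes with b1 as the most significant bit, as in the table; the function name is illustrative.

```python
def bit_planes(value, num_bits=7):
    """Return ([b1, ..., b_num_bits], sign), with b1 the most significant bit."""
    magnitude = abs(value)
    bits = [(magnitude >> (num_bits - 1 - k)) & 1 for k in range(num_bits)]
    sign = "-" if value < 0 else "+"
    return bits, sign


coefficients = [45, -74, 21, 14, -4, -18, 4, -1]  # w0..w7 from the slide
for i, w in enumerate(coefficients):
    bits, sign = bit_planes(w)
    print("w%d" % i, w, bits, sign)
```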
3875
Conventional Coding
[Figure: conventional coding sends the coefficients one after another (first w0, second w1, third w2), each with all of its bits b1..b7 and its sign. Values shown for w0..w7 once the first three coefficients have been sent: 46, -74, 22, 0, 0, 0, 0, 0.]
3975
Embedded Coding
Bits are sent bit plane by bit plane (first b1, second b2, third b3); a sign is sent with the first nonzero bit of each coefficient:
  First  (b1): 0 1- 0 0 0 0 0 0
  Second (b2): 1+ 0 0 0 0 0 0 0
  Third  (b3): 0 0 1+ 0 0 1- 0 0
Decoded values and ranges shown after the third pass:
Coefficient  Value  Range
w0             40   32..47
w1            -72   -79..-64
w2             24   16..31
w3              0   -31..31
w4              0   -31..31
w5            -24   -31..31
w6              0   -31..31
w7              0   -31..31
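To make the idea concrete, the sketch below reconstructs coefficients from only their top bit planes, placing each significant coefficient at the midpoint of the interval that the received bits still allow. The midpoint rule and the function name are assumptions for illustration; for brevity the sign is read from the original value, whereas a real decoder would read it from the bitstream.

```python
def decode_truncated(coefficients, planes_received, num_bits=7):
    """Reconstruct each coefficient from its top `planes_received` bit planes."""
    remaining = num_bits - planes_received  # bit planes not yet received
    decoded = []
    for w in coefficients:
        prefix = abs(w) >> remaining  # the bits b1..bk that were received
        if prefix == 0:
            decoded.append(0)  # still insignificant: decode as zero
            continue
        low = prefix << remaining  # smallest magnitude matching the prefix
        mid = low + ((1 << remaining) >> 1)  # midpoint of the remaining interval
        decoded.append(-mid if w < 0 else mid)
    return decoded


coefficients = [45, -74, 21, 14, -4, -18, 4, -1]  # w0..w7 from the slide
after_three_planes = decode_truncated(coefficients, planes_received=3)
```

With three planes received this gives 40, -72, 24 and -24 for w0, w1, w2 and w5, matching the values on the slide, and the decoded values become exact once all seven planes arrive.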
4075
Audio Masking
[Figure: audio masking versus frequency. Labels: critical band, neighboring band, signal, masking threshold, maximum mask, noise level, signal-to-mask ratio, noise-to-mask ratio.]
4175
Psychoacoustic Masking
Traditional approach (explicit masking, all existing approaches)
  Calculate the mask
  Transmit the mask
  Modify transform coefficients (or coding approach) according to the masking
  Encode the transform coefficients
Note: the mask modifies the coding content
4275
Implicit Psychoacoustic Masking
Key: the mask modifies the coding order; the content is the same
Implicit masking
  Calculate the static masking (Fletcher-Munson threshold)
  Encode the MSB of the transform coefficients
  Calculate the mask based on the MSB of the coefficients
  Modify the coding order
  Encode the next most important part of the coefficients
  Repeat the process
4375
Embedded Coding with Implicit Psychoacoustic Masking
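[Figure: first coding pass with implicit masking. Bit columns: Sign, b1..b7. Legend: significant coefficient, insignificant coefficient, mask.]
Decoded values and ranges shown after the first pass:
Coefficient  Value  Range
w0              0   -63..63
w1            -96   -127..-64
w2              0   -63..63
w3              0   -63..63
w4              0   -63..63
w5              0   -63..63
w6              0   -127..127
w7              0   -127..127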
4475
Embedded Coding with Implicit Psychoacoustic Masking
[Figure: first and second coding passes with implicit masking. Bit columns: Sign, b1..b7. Legend: significant coefficient, insignificant coefficient.]
Decoded values and ranges shown after the second pass:
Coefficient  Value  Range
w0             48   32..63
w1            -96   -127..-64
w2              0   -31..31
w3              0   -31..31
w4              0   -31..31
w5              0   -31..31
w6              0   -63..63
w7              0   -63..63
4575
Context Modeling
Context
  Zero coding
    Significance statuses of neighbor coefficients
  Refinement
    Whether it is the 1st refinement pass
    Significance statuses of neighbor coefficients
  Sign
    Neighbor signs
4675
After Implicit Psychoacoustic Masking & Context Modeling
45  0  0  0 -74 -13  0  0
21  0  4  0  14   0 23 23
 0  0  0  0   3   0  4  0
 0  3  5  0   0   0  0  0
 0  1 -1  0  -4  33  0 -1
 0  0  1  0   0   0  0  0
-4  5  0  0 -18   0  0 19
 4  0 23  0  -1   0  0  0
Bit (to be encoded):           0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ......
Ctx (automatically generated): 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ......
4775
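Arithmetic Coding - Illustration (QM Coder used)
What is arithmetic coding?
[Figure: the interval from 0 to 1 is split for each symbol in turn, into parts P0 and 1-P0, then P1 and 1-P1, then P2 and 1-P2, for the symbols S0=0, S1=1, S2=0. The final interval is A.]
Coding result: 0100
(The shortest binary bitstream ensures that the interval from B = 0100 0000000... to C = 0100 1111111... lies inside A, i.e. (B, C) ⊂ A.)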
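The interval picture above can be reproduced with exact fractions. This is a sketch of plain binary arithmetic coding, not of the QM coder, which replaces the multiplications with approximations. The convention that symbol 0 takes the lower part of the interval and the example probabilities are assumptions.

```python
import math
from fractions import Fraction


def encode_interval(symbols, p_zero):
    """Narrow [0, 1) once per symbol; p_zero[i] is P(symbol i == 0)."""
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width = width * p0  # symbol 0 keeps the lower part (assumed)
        else:
            low = low + width * p0  # symbol 1 keeps the upper part
            width = width * (1 - p0)
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string b such that every number 0.b... lies in [low, high)."""
    n = 1
    while True:
        k = math.ceil(low * 2 ** n)  # first n-bit grid point at or above low
        if Fraction(k + 1, 2 ** n) <= high:  # whole cell [k, k+1) / 2^n fits
            return format(k, "0%db" % n)
        n += 1


# Symbols S0=0, S1=1, S2=0 with hypothetical probabilities P0 = P1 = P2 = 1/2.
low, high = encode_interval([0, 1, 0], [Fraction(1, 2)] * 3)
code = shortest_code(low, high)
```

Here the final interval is [1/4, 3/8), and the shortest code is "010": every binary fraction starting with 0.010 stays inside that interval, which is the (B, C) ⊂ A condition from the slide.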
4875
Entropy Coder (Summary)
[Figure: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R).]
4975
Speed Up Issues
Context modeling
  Use stored context
  Update context when a coefficient becomes significant
Implicit masking
  Fast calculation of energy in a critical band
  Lookup table: convert energy to mask
R-D curve calculation
  Lookup table: calculation of distortion
Context entropy coder
  QM coder
  Run-length Rice coder
5075
Bitstream Assembly
Input: bitstream, R-D curve
Output: assembled bitstream, companion file
Bitstream assembling
5175
EAC Bitstream Syntax
Timeslot header
  Whether a certain channel exists (1 bit)
  Length of the channel bitstream (1-2 bytes)
[Figure: bitstream layout: EAC marker | Global header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body) ...]
5275
Companion File
[Figure: companion file layout: Global header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) ...]
5375
Rate-Distortion Optimized Assembling (Single Timeslot)
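[Figure: R-D curves (distortion D1..D4 versus rate R1..R4) of four bitstreams, shown before and after assembling; the bitstreams are truncated at rates r1, r2, r3, r4.]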
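A common way to choose truncation points such as r1..r4 is to spend the byte budget on the steepest R-D segments first, so that all bitstreams end near the same R-D slope. The sketch below shows this standard greedy allocation; that EAC assembles bitstreams exactly this way is an assumption, and the curve format and function name are hypothetical.

```python
import heapq


def allocate(curves, budget):
    """Pick one truncation point per R-D curve under a total byte budget.

    curves[c] is a list of (rate_bytes, distortion) points with increasing
    rate and decreasing, convex distortion. Returns (index per curve, bytes).
    """
    index = [0] * len(curves)
    spent = sum(curve[0][0] for curve in curves)
    heap = []

    def push_next_segment(c):
        i = index[c]
        if i + 1 < len(curves[c]):
            (r0, d0), (r1, d1) = curves[c][i], curves[c][i + 1]
            slope = (d0 - d1) / (r1 - r0)  # distortion removed per extra byte
            heapq.heappush(heap, (-slope, c, r1 - r0))

    for c in range(len(curves)):
        push_next_segment(c)

    while heap:
        _, c, extra = heapq.heappop(heap)
        if spent + extra > budget:
            break  # the steepest remaining segment no longer fits
        spent += extra
        index[c] += 1
        push_next_segment(c)
    return index, spent


# Two hypothetical R-D curves sharing a 20-byte budget.
curves = [[(0, 100), (10, 40), (20, 30)], [(0, 50), (10, 45), (20, 44)]]
result = allocate(curves, budget=20)
```

In this example all 20 bytes go to the first curve, because each of its segments removes more distortion per byte than any segment of the second curve.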
5475
Rate-Distortion Optimized Assembling (Multiple Timeslots)
[Figure: buffer-occupancy curve. Axes: buffer occupancy (bytes) versus time (timeslots). Two illegal regions bound the curve.]
5575
Allocated Bytes Per Timeslot
Allocated bytes for a certain timeslot:
  $B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{trans} \cdot \mathrm{Time}$
where
  $B_i$: allocated bytes for timeslot $i$
  $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  $\mathrm{Rate}_{trans}$: coding (network) rate per second
  $\mathrm{Time}$: time duration of the timeslot
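The allocation formula can be evaluated directly from a buffer-occupancy curve. In the sketch below the rate is taken in bytes per second and the buffer levels in bytes; those units, the function name, and the example numbers are assumptions.

```python
def allocated_bytes(buffer_levels, rate_bytes_per_s, slot_seconds):
    """Apply B_i = Buf_{i-1} - Buf_i + Rate_trans * Time for i = 1..n.

    buffer_levels[0] is the initial buffer occupancy; buffer_levels[i] is the
    occupancy at timeslot i.
    """
    per_slot = rate_bytes_per_s * slot_seconds  # bytes delivered per timeslot
    return [
        buffer_levels[i - 1] - buffer_levels[i] + per_slot
        for i in range(1, len(buffer_levels))
    ]


# Hypothetical curve: the buffer rises to 100 bytes, then drops to 50 bytes.
budgets = allocated_bytes([0, 100, 50], rate_bytes_per_s=1000, slot_seconds=0.5)
```

A timeslot during which the buffer level rises receives fewer bytes than the channel delivers in that time, and one during which the level falls receives more.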
5675
Optimization
Given
  Initial buffer occupancy level
  Final buffer occupancy level (or intermediate level with a sliding window)
  Buffer occupancy constraint
Search for the allocated number of bytes for the current timeslot
5775
Search (R-D slope)
[Figure: buffer occupancy (bytes) versus time (timeslots) with two illegal regions. Labels: underflow (too many bytes), overflow (too few bytes), waste bytes.]
5875
Multiple Timeslots - Constant Bitrate
[Figure: buffer occupancy (bytes) versus time (timeslots) with two illegal regions, constant bitrate case.]
5975
Multiple Timeslots - Internet Streaming (Slow Start)
[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), with two illegal regions, slow-start case.]
6075
Multiple Timeslots - Internet Streaming (Normal)
[Figure: buffer occupancy (bytes) versus time (timeslots) with two illegal regions, normal streaming case.]
6175
Modular Software Design
[Figure: encoder pipeline. The audio is split into an L+R (or mono) channel and an L-R channel; each passes through MLT(SW), quantizer, and entropy coder, followed by bitstream assembly, which outputs the bitstream.]
6275
Modular Software Design
Highly modularized pipeline design
  Quantizer and entropy coder can be used for image/video compression as well
  Probe and data input can be inserted into any part of the program
Data flow driven (with necessary memory regulator <buffer>)
  No long delay
  No need for large memory
Memory and computation efficient
  Working memory preallocated
6375
Experimental Results
6475
EAC - Highly Efficient (NMR)
Results based on the average of 16 MPEG-4 test clips. The smaller the NMR, the better.
Codec        48 kbps  32 kbps  16 kbps  8 kbps
MP4 TwinVQ     4.48     5.71     7.00    7.48
WMA            0.40     3.25     5.56    8.47
EAC           -0.22     2.80     5.68    6.69
2475
Key to Achieve Lossless
Break the MLT into small steps
Make every step reversible
Definition of reversible transform Integer input integer output The transform should have a determinant of 1
(donot expand data volume)
2575
MLT Framework
Pre-R
otate
Com
plex FF
T
Post R
otation
DCT IV
Window
Lapped Transform
Pre-R
otate-l
Com
plex FF
T-l
Post R
otation-l
Inv Window
-l
Forward MLT
Inverse MLT
2675
Window Operation
x(n)x(-n-1)
N
n
4
)21(
4
Complex Rotate
2775
Pre-Rotation
Complex Rotate ndash32 xw(0)
xw(1)
xw(2)
xw(3)
xw(4)
xw(5)
xw(6)
xw(7)
Complex Rotate ndash532
Complex Rotate ndash932
Complex Rotate ndash1332
xp(0)
xp(1)
xp(2)
xp(3)
xp(4)
xp(5)
xp(6)
xp(7)
2875
FFT (4 Point Complex)
xp(0)
xp(1)
xp(2)
xp(3)
xp(4)
xp(5)
xp(6)
xp(7)
xc(0)
xc(1)
xc(2)
xc(3)
-
- e-j2
-
-
yc(0)
yc(1)
yc(2)
yc(3)
yp(0)
yp(1)
xp(2)
xp(3)
xp(4)
xp(5)
xp(6)
xp(7)
2975
Post-Rotation
Conjugate Rotate ndash0y(0)
y(1)
y(2)
y(3)
y(4)
y(5)
y(6)
y(7)
Conjugate Rotate ndash8
Conjugate Rotate ndash28
Conjugate Rotate ndash38
yp(0)
yp(1)
yp(2)
yp(3)
yp(4)
yp(5)
yp(6)
yp(7)
3075
Reversible MLT
Make the following operation reversible Butterfly operation Complex rotation Conjugate rotation
3175
Reversible Unit Transform
b
a
b
a
11
11
2
1
2
1
0
actb
tcba
bcat
21
21
1
20
c
cc
3275
Entropy Coder
Input quantized coefficients
Output embedded coded bitstream with R-D
performance curve
Goal Compression Embedded bitstream for future manipulation
3375
Frame Grouping
Time slot
1 2 3 4 5 6 7 8
Fram
e
3475
Entropy Coder
D
R
Bitstream
R-D curve
3575
Entropy Coder
Embedded coding
Implicit psychoacoustic masking
Context modeling
Arithmetic coding
Implementation concerns
3675
A block of coefficients
45 0 0 0-74 -13 0 0
21 0 4 014 0 23 23
0 0 0 03 0 4 0
0 3 5 00 0 0 0
0 1 -1 0-4 33 0 -1
0 0 1 00 0 0 0
-4 5 0 0-18 0 0 19
4 0 23 0-1 0 0 0
Next View graph
3775
Bits of Coefficients
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
Signb1 b2 b3 b4 b5 b6 b7
w0
w1
w2
w3
w4
w5
w6
w7
coef
fici
ent
45
-74
21
14-4
-18
4
-1
3875
Conventional Coding
First
Second
Third
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +
Signb1 b2 b3 b4 b5 b6 b7
w0
w1
w2
w3
w4
w5
w6
w7
46
-74
22
00
0
00
3975
Embedded Coding
01 -000000
1 +0000000
001 +001 -00
Signb1 b2 b3 b4 b5 b6 b7
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
First Second
Third
w0
w1
w2
w3
w4
w5
w6
w7
Value
40
Range
3247
-72 -79-64
163124
-31310
-31310
-3131-24
-31310
-31310
4075
Audio Masking
FrequencyCriticalBand
NeighboringBand
Noise Level
Signal
Masking Threshold
Maximum Mask
Signal-tomask ratio
Noise-tomask ratio
4175
Psychoacoustic Masking
Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding
approach) according to the masking Encode the transform coefficients
Note Mask modifies the coding content
4275
Implicit Psychoacoustic Masking
Key Mask modifies the coding order the content is the same
Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process
4375
Embedded Coding with Implicit Psychoacoustic Masking
01 -000000
Signb1 b2 b3 b4 b5 b6 b7
001 -000000
First
w0
w1
w2
w3
w4
w5
w6
w7
Value
0
Range
-6363
-96 -127-64
-63630
-63630
-63630
-63630
-1271270
-1271270
Coefficient SignificantInsignificant
Mask
4475
Embedded Coding with Implicit Psychoacoustic Masking
01 -000000
1 +0000000
Signb1 b2 b3 b4 b5 b6 b7
0 10 1 +1 0 -0 00 00 00 00 00 0
First Second
w0
w1
w2
w3
w4
w5
w6
w7
Value
48
Range
3263
-96 -127-64
-31310
-31310
-31310
-31310
-63630
-63630
Coefficient SignificantInsignificant
4575
Context Modeling
Context
  Zero coding
    Significance statuses of neighbor coefficients
  Refinement
    Whether it is the 1st refinement pass
    Significance statuses of neighbor coefficients
  Sign
    Neighbor signs
4675
After Implicit Psychoacoustic Masking & Context Modeling
45 0 0 0 -74 -13 0 0
21 0 4 0 14 0 23 23
0 0 0 0 3 0 4 0
0 3 5 0 0 0 0 0
0 1 -1 0 -4 33 0 -1
0 0 1 0 0 0 0 0
-4 5 0 0 -18 0 0 19
4 0 23 0 -1 0 0 0
Bit (to be encoded):           0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...
Ctx (automatically generated): 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...
4775
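Arithmetic Coding - Illustration (QM Coder used)
What is arithmetic coding?
[Figure: the interval from 0 to 1 is subdivided in turn by the probabilities (P0, 1-P0), (P1, 1-P1) and (P2, 1-P2) as the symbols S0=0, S1=1, S2=0 are coded; the final interval is A.]
Coding result: 0100
(The shortest binary bitstream that ensures the interval from B=0100 0000000... to C=0100 1111111... lies inside A, i.e. (B, C) ⊂ A)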
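The interval narrowing and the "shortest bitstream inside A" rule can be sketched with exact fractions in Python. This is a plain arithmetic coder for illustration, not the QM coder named on the slide; the probabilities are made up, the convention that symbol 0 takes the lower part of the interval is an assumption, and `final_interval` and `shortest_codeword` are hypothetical names.

```python
import math
from fractions import Fraction

def final_interval(symbols, p_zero):
    """Narrow [0, 1) once per symbol; p_zero[i] is P(symbol i == 0)."""
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width *= p0                 # keep the lower part
        else:
            low += width * p0           # skip the part assigned to 0
            width *= 1 - p0
    return low, low + width

def shortest_codeword(low, high):
    """Shortest bit string b with [0.b000..., 0.b111...] inside [low, high)."""
    n = 1
    while True:
        k = math.ceil(low * 2**n)       # leftmost n-bit interval not below low
        if Fraction(k + 1, 2**n) <= high:
            return format(k, "0%db" % n)
        n += 1

symbols = [0, 1, 0]                                  # S0, S1, S2
a = final_interval(symbols, [Fraction(3, 4)] * 3)    # assumed probabilities
print(a, shortest_codeword(*a))
```

Any continuation of the returned bits stays inside the final interval, so the decoder can recover the symbols without knowing where the codeword ends.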
4875
Entropy Coder (Summary)
[Figure: the entropy coder outputs the bitstream together with its R-D curve (distortion D versus rate R).]
4975
Speed Up Issues
Context modeling
  Use stored context
  Update context when a coefficient becomes significant
Implicit masking
  Fast calculation of energy in a critical band
  Lookup table to convert energy to mask
R-D curve calculation
  Lookup table calculation of distortion
Context entropy coder
  QM coder
  Run-length Rice coder
5075
Bitstream Assembly
Input: bitstream, R-D curve
Output: assembled bitstream, companion file
Bitstream assembling
5175
EAC Bitstream Syntax
Timeslot header
  Whether a certain channel exists (1 bit)
  Length of the channel bitstream (1-2 bytes)
[Figure: bitstream layout: EAC marker, global header, then a sequence of timeslots, each with a head and a body.]
5275
Companion File
[Figure: companion file layout: global header, then for each timeslot a head and an R-D curve.]
5375
Rate-Distortion Optimized Assembling (Single Timeslot)
[Figure: four R-D curves (D1 versus R1 through D4 versus R4), with the rates r1, r2, r3 and r4 marked on them.]
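One common way to pick rates such as r1 to r4 from a set of R-D curves is to spend bytes where distortion drops fastest per byte. The Python sketch below assumes convex curves given as (rate, distortion) truncation points and a byte budget; it is not taken from the EAC source, and `assemble` and the sample curves are hypothetical.

```python
def assemble(curves, budget):
    """Choose one truncation rate per curve without exceeding the budget.

    curves: list of R-D curves, each a list of (rate, distortion) points with
    increasing rate and decreasing distortion (assumed convex).
    """
    idx = [0] * len(curves)                    # current point on each curve
    spent = sum(c[0][0] for c in curves)
    while True:
        best, best_slope = None, 0.0
        for j, c in enumerate(curves):
            if idx[j] + 1 < len(c):
                (r0, d0), (r1, d1) = c[idx[j]], c[idx[j] + 1]
                slope = (d0 - d1) / (r1 - r0)  # distortion saved per byte
                if spent + (r1 - r0) <= budget and slope > best_slope:
                    best, best_slope = j, slope
        if best is None:                       # nothing affordable is left
            return [c[i][0] for c, i in zip(curves, idx)]
        r0 = curves[best][idx[best]][0]
        idx[best] += 1
        spent += curves[best][idx[best]][0] - r0

curves = [
    [(0, 100), (10, 40), (20, 20), (30, 15)],
    [(0, 50), (10, 20), (20, 15)],
]
print(assemble(curves, budget=30))
```

For convex curves this greedy choice ends with all selected segments steeper than all rejected ones, which is the equal-slope condition usually associated with R-D optimization.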
5475
Rate-Distortion Optimized Assembling (Multiple Timeslots)
[Figure: buffer-occupancy curve; buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]
5575
Allocated Bytes Per Timeslot
Allocated bytes for a certain timeslot:
  $B_i = Buf_{i-1} - Buf_i + Rate_{trans} \times Time$
Where
  $B_i$: allocated bytes for timeslot i
  $Buf_i$: buffer occupancy level at timeslot i
  $Rate_{trans}$: coding (network) rate per second
  $Time$: time duration of the timeslot
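As a worked instance of the allocation formula, the Python sketch below evaluates it along a buffer-occupancy curve. The numbers are invented for illustration, the rate is assumed to be expressed in bytes per second so that the result is in bytes, and the function names are hypothetical.

```python
def allocated_bytes(buf_prev, buf_cur, rate_trans, time):
    """B_i = Buf_(i-1) - Buf_i + Rate_trans * Time."""
    return buf_prev - buf_cur + rate_trans * time

def allocations(occupancy, rate_trans, time):
    """occupancy[i] is the buffer level at timeslot i; returns B_1..B_n."""
    return [
        allocated_bytes(occupancy[i - 1], occupancy[i], rate_trans, time)
        for i in range(1, len(occupancy))
    ]

occupancy = [4000, 3500, 3500, 4200]   # assumed buffer levels in bytes
rate_trans = 2000                       # assumed 2000 bytes per second
time = 0.5                              # assumed 0.5 s per timeslot
print(allocations(occupancy, rate_trans, time))
```

Summed over all timeslots, the intermediate buffer levels cancel, so the total spent depends only on the initial level, the final level and the transmission time.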
5675
Optimization
Given
  Initial buffer occupancy level
  Final buffer occupancy level (or intermediate level with a sliding window)
  Buffer occupancy constraint
Search for the number of bytes allocated to the current timeslot
5775
Search (R-D slope)
[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions; labels: underflow (too many bytes), overflow (too few bytes), waste bytes.]
5875
Multiple Timeslots - Constant Bitrate
[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]
5975
Multiple Timeslots - Internet Streaming (Slow Start)
[Figure: buffer-occupancy curve; buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]
6075
Multiple Timeslots - Internet Streaming (Normal)
[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]
6175
Modular Software Design
[Figure: the audio is split into L+R (or mono) and L-R; each path runs through MLT (SW), quantizer, entropy coder and bitstream assembly to produce the bitstream.]
6275
Modular Software Design
Highly modularized pipeline design
  Quantizer, entropy coder can be used for image/video compression as well
  Probe and data input can be inserted into any part of the program
Data flow driven (with necessary memory regulator <buffer>)
  No long delay
  No need for large memory
Memory and computation efficient
  Working memory preallocated
6375
Experimental Results
6475
EAC - Highly Efficient (NMR)
Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.
Codec         8 kbps  16 kbps  32 kbps  48 kbps
MP4 TwinVQ     7.48    7.00     5.71     4.48
WMA            8.47    5.56     3.25     0.40
EAC            6.69    5.68     2.80    -2.2
6575
EAC - Lossless
Results based on the average of 16 MPEG4 test clips
Codec            Compression Ratio
EAC              2.72
Monkey's Audio   2.72
WinZip           1.32
6675
EAC (Versatile)
Versatile
  Real-time 2-way communication (low delay mode)
  Storage device (Pocket PC, Xbox)
  Internet streaming
6775
EAC (Low Delay Mode)
Reducing frame size
Timeslot = 1 frame
Fixed length timeslot bitstream
Delay = 2 frames
  (Ignoring encoding/decoding delay)
  Network transmission time (if modem line, delay = 3 frames)
6875
EAC (Low Delay Mode)
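[Figure: low delay timeline over frames i-1, i, i+1. The encoder starts encoding frame i (MLT, quantizer, entropy coder) and emits the bitstream over the network; the decoder starts decoding frame i (entropy decoder, inverse quantizer); the point at which the frame becomes playable is marked.]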
6975
EAC - Flexible Bitstream Syntax
Flexible bitstream syntax
  Parser may reassemble the bitstream at 1000x real time
  Change
    bit rate
    number of audio channels
    audio sampling rate
7075
EAC - Software
Software
  Encoder: 8x real time (stereo, 44.1 kHz sampling)
  Decoder: 20x real time
  Parser: 1000x real time
7175
EAC - Encoder
[Figure: audio enters the encoder, which outputs a stereo 128 kbps bitstream and a companion file.]
7275
EAC - Parser
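[Figure: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and produces application bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps, slow start; mono 8 kbps, 11 kHz sampling.]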
7375
EAC - Decoder
[Figure: the decoder plays the parsed bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps, slow start; mono 8 kbps, 11 kHz sampling.]
7475
Comparison
Original, MP4 TwinVQ, WMA, EAC, MP3
7575
Conclusions
An embedded audio coder is developed
  Highly efficient
  Versatile
    Low delay, constant bitrate, streaming
  Flexible bitstream
    Parsing for bitrate, number of audio channels, audio sampling rate
  Good prototype available
    Real-time execution, small memory footprint
2575
MLT Framework
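[Figure: Forward MLT: window, then DCT-IV, where the DCT-IV is computed as pre-rotate, complex FFT and post-rotation; window plus DCT-IV form the lapped transform. Inverse MLT: inverse pre-rotate, inverse complex FFT, inverse post-rotation and inverse window.]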
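As a cross-check of the forward and inverse pair, the MLT can also be evaluated directly from its textbook definition, without the pre-rotation, FFT and post-rotation factorization of the figure. The Python sketch below assumes the standard sine-window MLT with block size M (the slide gives no formula, so the definition and all names are assumptions) and verifies that overlap-adding the inverse blocks restores the input.

```python
import math

def mlt_window(M):
    """Sine window of length 2M."""
    return [math.sin((n + 0.5) * math.pi / (2 * M)) for n in range(2 * M)]

def _basis(n, k, M):
    # Cosine modulation with the (M + 1) / 2 phase shift of the MLT.
    return math.sqrt(2.0 / M) * math.cos(
        (n + (M + 1) / 2.0) * (k + 0.5) * math.pi / M)

def forward_mlt(block, M):
    """2M input samples -> M transform coefficients."""
    h = mlt_window(M)
    return [sum(block[n] * h[n] * _basis(n, k, M) for n in range(2 * M))
            for k in range(M)]

def inverse_mlt(coeffs, M):
    """M coefficients -> 2M windowed samples, to be overlap-added."""
    h = mlt_window(M)
    return [h[n] * sum(coeffs[k] * _basis(n, k, M) for k in range(M))
            for n in range(2 * M)]

def roundtrip(signal, M):
    """Transform overlapping blocks (hop M) and overlap-add the inverses."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * M + 1, M):
        y = inverse_mlt(forward_mlt(signal[start:start + 2 * M], M), M)
        for n in range(2 * M):
            out[start + n] += y[n]
    return out

M = 8
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(32)]
restored = roundtrip(signal, M)
# Samples covered by two blocks (M .. len - M - 1) are reconstructed.
print(max(abs(restored[n] - signal[n]) for n in range(M, len(signal) - M)))
```

The direct form costs on the order of M squared operations per block, which is why an FFT-based factorization is preferred in practice.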
2675
Window Operation
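[Figure: the window operation implemented as a complex rotation of the sample pair x(n) and x(-n-1), with a rotation angle that depends on n and N.]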
2775
Pre-Rotation
[Figure: the windowed inputs xw(0) to xw(7) pass through four complex rotations, by -π/32, -5π/32, -9π/32 and -13π/32, giving xp(0) to xp(7).]
2875
FFT (4 Point Complex)
[Figure: xp(0) to xp(7) are taken as four complex values xc(0) to xc(3); a 4-point complex FFT (butterflies with a twiddle factor e^{-jπ/2}) gives yc(0) to yc(3), which are unpacked to yp(0) to yp(7).]
2975
Post-Rotation
[Figure: yp(0) to yp(7) pass through four conjugate rotations, by -0, -π/8, -2π/8 and -3π/8, giving the outputs y(0) to y(7).]
3075
Reversible MLT
Make the following operations reversible
  Butterfly operation
  Complex rotation
  Conjugate rotation
3175
Reversible Unit Transform
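[Equations: a 2x2 unit transform maps the pair (a, b) to (a', b'); it is computed in steps through an intermediate value t, each step adding a multiple (coefficient c) of one variable to another.]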
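A widely used way to make a rotation exactly reversible on integers is to factor it into three shear (lifting) steps and round inside each step. The Python sketch below shows that technique under stated assumptions; whether the slide uses exactly this factorization cannot be confirmed from the text, and `lifting_coeffs`, `rotate_int` and `unrotate_int` are hypothetical names.

```python
import math

def lifting_coeffs(theta):
    """Shear coefficients for the rotation [[cos, -sin], [sin, cos]].

    theta must not be a multiple of pi (sin(theta) would be zero).
    """
    s = math.sin(theta)
    return (math.cos(theta) - 1.0) / s, s

def rotate_int(a, b, theta):
    """Integer-to-integer approximation of rotating (a, b) by theta."""
    p, s = lifting_coeffs(theta)
    a += round(p * b)     # shear 1
    b += round(s * a)     # shear 2
    a += round(p * b)     # shear 3
    return a, b

def unrotate_int(a, b, theta):
    """Exact inverse: undo the three shears in reverse order."""
    p, s = lifting_coeffs(theta)
    a -= round(p * b)
    b -= round(s * a)
    a -= round(p * b)
    return a, b

theta = math.pi / 8
a2, b2 = rotate_int(100, 50, theta)
print((a2, b2), unrotate_int(a2, b2, theta))
```

Each step changes one variable by a rounded function of the other, which is left untouched, so subtracting the same rounded quantity restores the input exactly regardless of rounding error.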
3275
Entropy Coder
Input: quantized coefficients
Output: embedded coded bitstream with R-D performance curve
Goal
  Compression
  Embedded bitstream for future manipulation
3375
Frame Grouping
[Figure: frame grouping; labels: time slot, frames 1 to 8.]
3475
Entropy Coder
[Figure: the entropy coder outputs the bitstream together with its R-D curve (distortion D versus rate R).]
3575
Entropy Coder
Embedded coding
Implicit psychoacoustic masking
Context modeling
Arithmetic coding
Implementation concerns
3675
A block of coefficients
45 0 0 0 -74 -13 0 0
21 0 4 0 14 0 23 23
0 0 0 0 3 0 4 0
0 3 5 0 0 0 0 0
0 1 -1 0 -4 33 0 -1
0 0 1 0 0 0 0 0
-4 5 0 0 -18 0 0 19
4 0 23 0 -1 0 0 0
3775
Bits of Coefficients
      Coefficient  b1 b2 b3 b4 b5 b6 b7  Sign
w0        45        0  1  0  1  1  0  1   +
w1       -74        1  0  0  1  0  1  0   -
w2        21        0  0  1  0  1  0  1   +
w3        14        0  0  0  1  1  1  0   +
w4        -4        0  0  0  0  1  0  0   -
w5       -18        0  0  1  0  0  1  0   -
w6         4        0  0  0  0  1  0  0   +
w7        -1        0  0  0  0  0  0  1   -
3875
Conventional Coding
Coding order: first w0, second w1, third w2 (one coefficient at a time)
      b1 b2 b3 b4 b5 b6 b7  Sign
w0     0  1  0  1  1  0  1   +
w1     1  0  0  1  0  1  0   -
w2     0  0  1  0  1  0  1   +
w3     0  0  0  1  1  1  0   +
w4     0  0  0  0  1  0  0   -
w5     0  0  1  0  0  1  0   -
w6     0  0  0  0  1  0  0   +
w7     0  0  0  0  0  0  1   -
Coded so far: all bits and the signs of w0, w1 and w2
Values (w0 to w7): 46, -74, 22, 0, 0, 0, 0, 0
28/75
FFT (4 Point Complex)
[Figure: signal flow of a 4-point complex FFT. Inputs xp(0)..xp(7) feed the complex values xc(0)..xc(3); butterflies (one branch carries the twiddle factor e^{-jπ/2}) produce yc(0)..yc(3), which give the outputs yp(0)..yp(7).]
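The butterfly structure of a 4-point complex FFT can be sketched in Python. This is a generic radix-2 sketch, not code taken from EAC; the names `fft4` and `dft` are hypothetical, and `dft` is only a slow reference used to check the result.

```python
import cmath

def fft4(x):
    """4-point complex FFT built from two butterfly stages."""
    x0, x1, x2, x3 = x
    a = x0 + x2                # butterfly on the even-indexed pair
    b = x0 - x2
    c = x1 + x3                # butterfly on the odd-indexed pair
    d = (x1 - x3) * (-1j)      # only non-trivial twiddle: e^{-j*pi/2} = -j
    return [a + c, b + d, a - c, b - d]

def dft(x):
    """Direct O(N^2) DFT, used here only as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]
```

The only multiplication is by -j, which in a real implementation is a swap of real and imaginary parts with one sign change.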
29/75
Post-Rotation
[Figure: post-rotation stage relating yp(0)..yp(7) and y(0)..y(7) through four conjugate rotations, with angles -0, -π/8, -2π/8 and -3π/8.]
30/75
Reversible MLT
Make the following operations reversible:
  Butterfly operation
  Complex rotation
  Conjugate rotation
31/75
Reversible Unit Transform
[Equation: a 2x2 unit transform of the pair (a, b), realized through an intermediate value t and constants c.]
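One common way to make a rotation exactly reversible on integers is to factor it into three lifting (shear) steps and round each step. The sketch below uses that standard factorization as an assumption; it is not necessarily the exact formula on the slide, and the names `lift_rotate`, `lift_unrotate`, `c0` and `c1` are hypothetical.

```python
import math

def lift_rotate(a, b, theta):
    """Integer approximation of rotating (a, b) by theta (sin(theta) != 0)."""
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a + round(c0 * b)      # lifting step 1: only a changes
    b = b + round(c1 * a)      # lifting step 2: only b changes
    a = a + round(c0 * b)      # lifting step 3: only a changes
    return a, b

def lift_unrotate(a, b, theta):
    """Exact inverse: undo the three steps in reverse order."""
    c0 = (math.cos(theta) - 1.0) / math.sin(theta)
    c1 = math.sin(theta)
    a = a - round(c0 * b)
    b = b - round(c1 * a)
    a = a - round(c0 * b)
    return a, b
```

Each step adds a rounded function of the value it does not modify, so subtracting the same rounded quantity restores the input bit for bit, whatever the rounding error was.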
32/75
Entropy Coder
Input: quantized coefficients
Output: embedded coded bitstream with R-D performance curve
Goal: compression; an embedded bitstream for future manipulation
33/75
Frame Grouping
[Figure: frames 1 to 8 grouped into time slots.]
34/75
Entropy Coder
[Figure: the entropy coder outputs a bitstream and its R-D curve (distortion D versus rate R).]
35/75
Entropy Coder
Embedded coding
Implicit psychoacoustic masking
Context modeling
Arithmetic coding
Implementation concerns
36/75
A block of coefficients
 45   0   0   0  -74 -13   0   0
 21   0   4   0   14   0  23  23
  0   0   0   0    3   0   4   0
  0   3   5   0    0   0   0   0
  0   1  -1   0   -4  33   0  -1
  0   0   1   0    0   0   0   0
 -4   5   0   0  -18   0   0  19
  4   0  23   0   -1   0   0   0
37/75
Bits of Coefficients
Coefficient   Value   b1 b2 b3 b4 b5 b6 b7   Sign
w0             45     0  1  0  1  1  0  1    +
w1            -74     1  0  0  1  0  1  0    -
w2             21     0  0  1  0  1  0  1    +
w3             14     0  0  0  1  1  1  0    +
w4             -4     0  0  0  0  1  0  0    -
w5            -18     0  0  1  0  0  1  0    -
w6              4     0  0  0  0  1  0  0    +
w7             -1     0  0  0  0  0  0  1    -
38/75
Conventional Coding
Coefficients are coded one after another: first w0, second w1, third w2.
Coefficient   b1 b2 b3 b4 b5 b6 b7   Sign   Value
w0            0  1  0  1  1  0  1    +       46
w1            1  0  0  1  0  1  0    -      -74
w2            0  0  1  0  1  0  1    +       22
w3            (not yet coded)                 0
w4            (not yet coded)                 0
w5            (not yet coded)                 0
w6            (not yet coded)                 0
w7            (not yet coded)                 0
39/75
Embedded Coding
Bits are coded plane by plane: first b1, second b2, third b3. A sign is sent when a coefficient becomes significant.
Coefficient   b1 b2 b3   Sign   Value   Range
w0            0  1  0    +       40     32 to 47
w1            1  0  0    -      -72     -79 to -64
w2            0  0  1    +       24     16 to 31
w3            0  0  0             0     -31 to 31
w4            0  0  0             0     -31 to 31
w5            0  0  1    -      -24     -31 to 31
w6            0  0  0             0     -31 to 31
w7            0  0  0             0     -31 to 31
40/75
Audio Masking
[Figure: audio masking plotted against frequency. Labels: critical band, neighboring band, signal, masking threshold, maximum mask, noise level, signal-to-mask ratio, noise-to-mask ratio.]
41/75
Psychoacoustic Masking
Traditional approach (explicit masking, used by all existing approaches):
  Calculate the mask
  Transmit the mask
  Modify the transform coefficients (or the coding approach) according to the mask
  Encode the transform coefficients
Note: the mask modifies the coding content.
42/75
Implicit Psychoacoustic Masking
Key: the mask modifies the coding order; the content is the same.
Implicit masking:
  Calculate the static masking (Fletcher-Munson threshold)
  Encode the MSB of the transform coefficients
  Calculate the mask based on the MSB of the coefficients
  Modify the coding order
  Encode the next most important part of the coefficients
  Repeat the process
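The effect of the mask on the coding order can be illustrated with a toy sketch. It assumes a fixed per-coefficient delay (in passes) instead of a mask that is recomputed after every pass, so it shows only the reordering idea, not the actual EAC procedure; `coding_order` and `delays` are hypothetical names.

```python
def coding_order(num_planes, delays):
    """List (pass, coefficient index, bit-plane) triples; plane 1 is the MSB.

    delays[i] is how many passes coefficient i is held back by its mask.
    """
    order = []
    for p in range(1, num_planes + max(delays) + 1):
        for i, d in enumerate(delays):
            plane = p - d                # a masked coefficient lags behind
            if 1 <= plane <= num_planes:
                order.append((p, i, plane))
    return order
```

With `delays = [0, 0, 0, 0, 0, 0, 1, 1]`, the first pass codes plane b1 of w0..w5 only, and the second pass codes b2 of w0..w5 together with b1 of w6 and w7. Every bit is still coded eventually; only its position in the bitstream changes.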
43/75
Embedded Coding with Implicit Psychoacoustic Masking
After the first pass:
Coefficient   Value   Range
w0               0    -63 to 63
w1             -96    -127 to -64
w2               0    -63 to 63
w3               0    -63 to 63
w4               0    -63 to 63
w5               0    -63 to 63
w6               0    -127 to 127
w7               0    -127 to 127
[Legend: significant and insignificant coefficients; mask.]
44/75
Embedded Coding with Implicit Psychoacoustic Masking
After the second pass:
Coefficient   Value   Range
w0              48    32 to 63
w1             -96    -127 to -64
w2               0    -31 to 31
w3               0    -31 to 31
w4               0    -31 to 31
w5               0    -31 to 31
w6               0    -63 to 63
w7               0    -63 to 63
[Legend: significant and insignificant coefficients.]
45/75
Context Modeling
Context:
  Zero coding: significance statuses of neighbor coefficients
  Refinement: whether it is the first refinement pass; significance statuses of neighbor coefficients
  Sign: neighbor signs
46/75
After Implicit Psychoacoustic Masking & Context Modeling
 45   0   0   0  -74 -13   0   0
 21   0   4   0   14   0  23  23
  0   0   0   0    3   0   4   0
  0   3   5   0    0   0   0   0
  0   1  -1   0   -4  33   0  -1
  0   0   1   0    0   0   0   0
 -4   5   0   0  -18   0   0  19
  4   0  23   0   -1   0   0   0
Bit (to be encoded):           0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...
Ctx (automatically generated): 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...
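47/75
Arithmetic Coding - Illustration (QM Coder used)
What is arithmetic coding?
[Figure: the interval from 0 to 1 is subdivided in turn by P0 and 1-P0, P1 and 1-P1, P2 and 1-P2 for the symbols S0=0, S1=1, S2=0, giving the final interval A.]
Coding result: 0100
(The shortest binary bitstream ensuring that the interval from B = 0100 0000000... to C = 0100 1111111... lies inside A.)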
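The interval narrowing and the choice of the shortest codeword can be sketched with exact fractions. This is an idealized arithmetic coder, not the QM coder itself. The probabilities below, and the convention that symbol 0 takes the lower part of the interval, are assumptions chosen for illustration so that the toy reproduces the codeword 0100; `narrow` and `shortest_code` are hypothetical names.

```python
from fractions import Fraction
from math import ceil

def narrow(symbols, p_zero):
    """Shrink [0, 1) once per symbol; p_zero[i] is P(symbol i == 0)."""
    low, width = Fraction(0), Fraction(1)
    for s, p in zip(symbols, p_zero):
        if s == 0:
            width = width * p            # keep the lower part
        else:
            low = low + width * p        # keep the upper part
            width = width * (1 - p)
    return low, low + width

def shortest_code(low, high):
    """Shortest bit string b with [0.b000..., 0.b111...] inside [low, high)."""
    n = 1
    while True:
        k = ceil(low * 2**n)             # first n-bit grid point >= low
        if Fraction(k + 1, 2**n) <= high:
            return format(k, "0{}b".format(n))
        n += 1

# Hypothetical probabilities for S0=0, S1=1, S2=0.
A = narrow([0, 1, 0], [Fraction(3, 5), Fraction(2, 5), Fraction(1, 3)])
code = shortest_code(*A)
```

Here A is the interval from 6/25 to 9/25 and `code` is "0100": no shorter bit string has all of its extensions inside A, which is the condition stated on the slide.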
48/75
Entropy Coder (Summary)
[Figure: the entropy coder outputs a bitstream and its R-D curve (distortion D versus rate R).]
49/75
Speed Up Issues
Context modeling: use stored context; update the context when a coefficient becomes significant
Implicit masking: fast calculation of the energy in a critical band; lookup table to convert energy to mask
R-D curve calculation: lookup-table calculation of distortion
Context entropy coder: QM coder; run-length Rice coder
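As background for the Rice coder option, here is a minimal sketch of plain Rice coding of one non-negative integer, such as a run length. How EAC combines run lengths with Rice codes is not specified here, so this shows only the basic code; `rice_encode`, `rice_decode` and the parameter `k` are hypothetical names.

```python
def rice_encode(n, k):
    """Rice code of n >= 0 with parameter k >= 1, as a string of bits."""
    q = n >> k                       # quotient, sent in unary
    r = n & ((1 << k) - 1)           # remainder, sent in k plain bits
    return "1" * q + "0" + format(r, "0{}b".format(k))

def rice_decode(bits, k):
    """Decode one Rice codeword from the start of `bits`."""
    q = bits.index("0")              # number of leading ones
    r = int(bits[q + 1:q + 1 + k], 2)
    return (q << k) | r
```

For example, `rice_encode(9, 2)` is "11001": quotient 2 in unary ("110") followed by remainder 1 in two bits ("01"). The coder needs no probability tables or multiplications, which is why it is attractive as a fast alternative.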
50/75
Bitstream Assembly
Input: bitstream, R-D curve
Output: assembled bitstream, companion file
[Figure: bitstream assembling.]
51/75
EAC Bitstream Syntax
Timeslot header:
  Whether a certain channel exists (1 bit)
  Length of the channel bitstream (1-2 bytes)
[Figure: bitstream layout. EAC marker | global header | timeslot (head, body) | timeslot (head, body) | timeslot (head, body)]
52/75
Companion File
[Figure: companion file layout. Global header | timeslot (head, R-D curve) | timeslot (head, R-D curve) | timeslot (head, R-D curve)]
53/75
Rate-Distortion Optimized Assembling (Single Timeslot)
[Figure: four R-D curves (D1 versus R1 through D4 versus R4), shown before and after assembling, with the allocated rates r1, r2, r3, r4 marked.]
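Allocating a byte budget across several embedded bitstreams by R-D slope can be sketched with a greedy loop. This is a generic illustration, not the EAC assembler: it assumes each R-D curve is a convex list of (rate in bytes, distortion) truncation points starting at rate 0, and the name `assemble` is hypothetical.

```python
def assemble(curves, budget):
    """Pick a truncation rate per bitstream, steepest R-D slope first."""
    idx = [0] * len(curves)          # current truncation point per bitstream
    used = 0
    while True:
        best, best_slope = None, 0.0
        for i, curve in enumerate(curves):
            if idx[i] + 1 < len(curve):
                r0, d0 = curve[idx[i]]
                r1, d1 = curve[idx[i] + 1]
                slope = (d0 - d1) / (r1 - r0)   # distortion drop per byte
                if used + (r1 - r0) <= budget and slope > best_slope:
                    best, best_slope = i, slope
        if best is None:             # nothing fits or nothing left to add
            break
        used += curves[best][idx[best] + 1][0] - curves[best][idx[best]][0]
        idx[best] += 1
    return [curve[j][0] for curve, j in zip(curves, idx)]
```

With curves `[(0, 100), (10, 40), (20, 20)]` and `[(0, 50), (10, 20), (20, 15)]` and a 30-byte budget, the loop spends 20 bytes on the first bitstream and 10 on the second, because each step goes to whichever bitstream currently removes the most distortion per byte.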
54/75
Rate-Distortion Optimized Assembling (Multiple Timeslots)
[Figure: buffer occupancy (bytes) versus time (timeslots). The buffer-occupancy curve runs between two illegal regions.]
55/75
Allocated Bytes Per Timeslot
Allocated bytes for a certain timeslot:
$B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \cdot \mathrm{Time}$
where
  $B_i$: allocated bytes for timeslot $i$
  $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
  $\mathrm{Time}$: time duration of the timeslot
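The allocation formula translates directly into code. The sketch below assumes the rate is expressed in bytes per second and the timeslot duration in seconds; the function and parameter names are hypothetical.

```python
def allocated_bytes(buf_prev, buf_cur, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time.

    buf_prev, buf_cur: buffer occupancy (bytes) before and after timeslot i.
    rate_trans: transmission rate, assumed here to be in bytes per second.
    time: duration of the timeslot in seconds.
    """
    return buf_prev - buf_cur + rate_trans * time
```

For instance, at 16 kbps (2000 bytes per second) with a 0.5 second timeslot, letting the buffer drain from 4000 to 3000 bytes allows 2000 bytes for the timeslot: 1000 from the channel plus 1000 drawn from the buffer.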
56/75
Optimization
Given:
  Initial buffer occupancy level
  Final buffer occupancy level (or an intermediate level with a sliding window)
  Buffer occupancy constraint
Search for the allocated number of bytes for the current timeslot.
57/75
Search (R-D slope)
[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions. Labels: underflow (too many bytes), overflow (too few bytes), waste bytes.]
58/75
Multiple Timeslots - Constant Bitrate
[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]
59/75
Multiple Timeslots - Internet Streaming (Slow Start)
[Figure: buffer occupancy (bytes) versus time (timeslots). The buffer-occupancy curve runs between two illegal regions.]
60/75
Multiple Timeslots - Internet Streaming (Normal)
[Figure: buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]
61/75
Modular Software Design
[Figure: encoder pipeline. The audio is split into L+R (or mono) and L-R; each path passes through MLT(SW), quantizer, entropy coder and bitstream assembly to produce the bitstream.]
3175
Reversible Unit Transform
b
a
b
a
11
11
2
1
2
1
0
actb
tcba
bcat
21
21
1
20
c
cc
3275
Entropy Coder
Input quantized coefficients
Output embedded coded bitstream with R-D
performance curve
Goal Compression Embedded bitstream for future manipulation
3375
Frame Grouping
Time slot
1 2 3 4 5 6 7 8
Fram
e
3475
Entropy Coder
D
R
Bitstream
R-D curve
3575
Entropy Coder
Embedded coding
Implicit psychoacoustic masking
Context modeling
Arithmetic coding
Implementation concerns
3675
A block of coefficients
45 0 0 0-74 -13 0 0
21 0 4 014 0 23 23
0 0 0 03 0 4 0
0 3 5 00 0 0 0
0 1 -1 0-4 33 0 -1
0 0 1 00 0 0 0
-4 5 0 0-18 0 0 19
4 0 23 0-1 0 0 0
Next View graph
3775
Bits of Coefficients
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
Signb1 b2 b3 b4 b5 b6 b7
w0
w1
w2
w3
w4
w5
w6
w7
coef
fici
ent
45
-74
21
14-4
-18
4
-1
3875
Conventional Coding
First
Second
Third
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +
Signb1 b2 b3 b4 b5 b6 b7
w0
w1
w2
w3
w4
w5
w6
w7
46
-74
22
00
0
00
3975
Embedded Coding
01 -000000
1 +0000000
001 +001 -00
Signb1 b2 b3 b4 b5 b6 b7
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
First Second
Third
w0
w1
w2
w3
w4
w5
w6
w7
Value
40
Range
3247
-72 -79-64
163124
-31310
-31310
-3131-24
-31310
-31310
4075
Audio Masking
FrequencyCriticalBand
NeighboringBand
Noise Level
Signal
Masking Threshold
Maximum Mask
Signal-tomask ratio
Noise-tomask ratio
4175
Psychoacoustic Masking
Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding
approach) according to the masking Encode the transform coefficients
Note Mask modifies the coding content
4275
Implicit Psychoacoustic Masking
Key Mask modifies the coding order the content is the same
Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process
4375
Embedded Coding with Implicit Psychoacoustic Masking
01 -000000
Signb1 b2 b3 b4 b5 b6 b7
001 -000000
First
w0
w1
w2
w3
w4
w5
w6
w7
Value
0
Range
-6363
-96 -127-64
-63630
-63630
-63630
-63630
-1271270
-1271270
Coefficient SignificantInsignificant
Mask
4475
Embedded Coding with Implicit Psychoacoustic Masking
01 -000000
1 +0000000
Signb1 b2 b3 b4 b5 b6 b7
0 10 1 +1 0 -0 00 00 00 00 00 0
First Second
w0
w1
w2
w3
w4
w5
w6
w7
Value
48
Range
3263
-96 -127-64
-31310
-31310
-31310
-31310
-63630
-63630
Coefficient SignificantInsignificant
4575
Context Modeling
Context Zero coding
Significant statuses of neighbor coefficients Refinement
Whether it is the 1st refinement pass Significant statuses of neighbor coefficients
Sign Neighbor signs
4675
After Implicit Psychoacoustic Masking amp Context Modeling
45 0 0 0-74 -13 0 0
21 0 4 014 0 23 23
0 0 0 03 0 4 0
0 3 5 00 0 0 0
0 1 -1 0-4 33 0 -1
0 0 1 00 0 0 0
-4 5 0 0-18 0 0 19
4 0 23 0-1 0 0 0
Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip
Automatically generated
To be encoded
4775
Arithmetic Coding ndash Illustration (QM Coder used)
What is arithmetic coding
0
1
1-P0
P0
1-P1
P1 1-P2
P2
S0=0 S1=1 S2=0
0100
Coding result
(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)
AB
C
4875
Entropy Coder (Summary)
D
R
Bitstream
R-D curve
4975
Speed Up Issues
Context Modeling Use stored context Update context when a coefficient becomes significant
Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask
R-D curve calculation Lookup table calculation of distortion
Context entropy coder QM coder Run-length Rice coder
5075
Bitstream Assembly
Input Bitstream R-D curve
Output Assembled bitstream Companion file
Bitstream assembling
5175
EAC Bitstream Syntax
Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)
EA
C m
arke
rG
loba
l H
eade
r Timeslot
Head Body
Timeslot
Head Body
Timeslot
Head Body
5275
Companion FileG
loba
l H
eade
r Timeslot
Head R-D curve
Timeslot
Head R-D curve
Timeslot
Head R-D curve
5375
Rate-Distortion Optimized Assembling (Single Timeslot)
D1
R1
D2
R2
D3
R3
D4
R4
D1
R1
D2
R2
D3
R3
D4
R4
r1 r2
r3 r4
5475
Rate-Distortion Optimized Assembling (Multiple Timeslots)
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
Buffer-Occupancy Curve
5575
Allocated Bytes Per Timeslots
Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time
Where Bi allocated bytes for timeslot i
Bufi buffer occupancy level at timeslot i
Ratetrans coding (network) rate per second Time time duration of the timeslot
5675
Optimization
Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate
level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the
current timeslot
5775
Search (R-D slope)B
uffe
r O
ccup
ancy
(B
ytes
)
Time (timeslots)
Illegal Region
Illegal Region
Underflow (too many bytes)
Overflow (too few bytes)
Wastebytes
5875
Multiple Timeslots ndash Constant Bitrate
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
5975
Multiple Timeslots ndash Internet Streaming (Slow Start)
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
Buffer-Occupancy Curve
6075
Multiple Timeslots ndash Internet Streaming (Normal)
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
6175
Modular Software Design
MLT(SW) Quantizer Entropy coder
BitstreamAssembly
MLT(SW) Quantizer Entropy coder
BitstreamAssembly
Audio
Bitstream
L+R(or mono)
L-R
6275
Modular Software Design
Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo
compression as well Probe and data input can be inserted into any part of the
program
Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory
Memory and computation efficient Working memory preallocated
6375
Experimental Results
6475
EAC ndash Highly Efficient (NMR)
Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better
669568280-22EAC
847556325040WMA
748700571448MP4TwinVQ
8kbps16kbps32kbps48kbpsCodec
65/75
EAC – Lossless
Results based on the average of 16 MPEG4 test clips
Codec            Compression Ratio
EAC              2.72
Monkey's Audio   2.72
WinZip           1.32
66/75
EAC (Versatile)
Versatile
  Real-time 2-way communication (low delay mode)
  Storage device (Pocket PC, Xbox)
  Internet streaming
67/75
EAC (Low Delay Mode)
Reducing frame size
Timeslot = 1 frame
Fixed-length timeslot bitstream
Delay = 2 frames
  (Ignoring encoding/decoding delay)
  Network transmission time (if modem line, delay = 3 frames)
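68/75
EAC (Low Delay Mode)
[Diagram: with frames i-1, i and i+1 available, the encoder starts encoding frame i (MLT, Quantizer, Entropy); the bitstream crosses the network; the decoder starts decoding frame i (Entropy, Quantizer); a marker "Playable here" shows where the frame can be played.]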
69/75
EAC – Flexible Bitstream Syntax
Flexible bitstream syntax
  Parser may reassemble the bitstream at 1000x real time
  Change: bit rate, number of audio channels, audio sampling rate
70/75
EAC – Software
Software
  Encoder: 8x real time (stereo, 44.1 kHz sampling)
  Decoder: 20x real time
  Parser: 1000x real time
71/75
EAC - Encoder
[Diagram: Audio -> Encoder -> Stereo 128 kbps bitstream and Companion file]
72/75
EAC - Parser
[Diagram (server): the Parser takes the Stereo 128 kbps bitstream and the Companion file and produces:]
  Stereo 16 kbps
  Mono 8 kbps
  Stereo 16 kbps, slow start
  Mono 8 kbps, 11 kHz sampling
73/75
EAC - Decoder
[Diagram: the Decoder plays each parsed bitstream:]
  Stereo 16 kbps
  Mono 8 kbps
  Stereo 16 kbps, slow start
  Mono 8 kbps, 11 kHz sampling
74/75
Comparison
Original, MP4 TwinVQ, WMA, EAC, MP3
75/75
Conclusions
An embedded audio coder is developed
  Highly efficient
  Versatile: low delay, constant bitrate, streaming
  Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  Good prototype available: real-time execution, small memory footprint
32/75
Entropy Coder
Input: quantized coefficients
Output: embedded coded bitstream with R-D performance curve
Goal:
  Compression
  Embedded bitstream for future manipulation
33/75
Frame Grouping
[Figure: frames arranged against time slots 1 to 8. Labels: Time slot, Frame.]
34/75
Entropy Coder
[Figure: the entropy coder outputs a bitstream together with its R-D curve (axes: rate R, distortion D).]
35/75
Entropy Coder
Embedded coding
Implicit psychoacoustic masking
Context modeling
Arithmetic coding
Implementation concerns
36/75
A block of coefficients
 45   0   0   0  -74  -13   0   0
 21   0   4   0   14    0  23  23
  0   0   0   0    3    0   4   0
  0   3   5   0    0    0   0   0
  0   1  -1   0   -4   33   0  -1
  0   0   1   0    0    0   0   0
 -4   5   0   0  -18    0   0  19
  4   0  23   0   -1    0   0   0
37/75
Bits of Coefficients
Example: 45 -> 0 1 0 1 1 0 1
Coefficient  Value  b1 b2 b3 b4 b5 b6 b7  Sign
w0             45   0  1  0  1  1  0  1   +
w1            -74   1  0  0  1  0  1  0   -
w2             21   0  0  1  0  1  0  1   +
w3             14   0  0  0  1  1  1  0   +
w4             -4   0  0  0  0  1  0  0   -
w5            -18   0  0  1  0  0  1  0   -
w6              4   0  0  0  0  1  0  0   +
w7             -1   0  0  0  0  0  0  1   -
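The sign and bit columns of this slide can be regenerated from the eight coefficient values. The sketch below assumes a 7-bit sign-magnitude representation with b1 as the most significant bit; the function name is illustrative.

```python
def to_sign_magnitude(coeffs, planes=7):
    """Split each coefficient into a sign and its magnitude bits b1 (MSB) .. b7 (LSB)."""
    rows = []
    for w in coeffs:
        sign = '-' if w < 0 else '+'
        bits = [(abs(w) >> (planes - 1 - b)) & 1 for b in range(planes)]
        rows.append((sign, bits))
    return rows


coeffs = [45, -74, 21, 14, -4, -18, 4, -1]  # w0..w7 from the slide
for k, (sign, bits) in enumerate(to_sign_magnitude(coeffs)):
    print(f"w{k} {' '.join(map(str, bits))} {sign}")
```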
38/75
Conventional Coding
Coding order: coefficient by coefficient (First, Second, Third)
Coefficient  b1 b2 b3 b4 b5 b6 b7  Sign  Order   Value
w0           0  1  0  1  1  0  1   +     First     46
w1           1  0  0  1  0  1  0   -     Second   -74
w2           0  0  1  0  1  0  1   +     Third     22
w3           0  0  0  1  1  1  0   +                0
w4           0  0  0  0  1  0  0   -                0
w5           0  0  1  0  0  1  0   -                0
w6           0  0  0  0  1  0  0   +                0
w7           0  0  0  0  0  0  1   -                0
39/75
Embedded Coding
Coding order: bit-plane by bit-plane
  First  (b1): 0   1-  0   0   0   0   0   0
  Second (b2): 1+  0   0   0   0   0   0   0
  Third  (b3): 0   0   1+  0   0   1-  0   0
Coefficient  b1 b2 b3 b4 b5 b6 b7  Sign  Value  Range
w0           0  1  0  1  1  0  1   +       40   32..47
w1           1  0  0  1  0  1  0   -      -72   -79..-64
w2           0  0  1  0  1  0  1   +       24   16..31
w3           0  0  0  1  1  1  0   +        0   -31..31
w4           0  0  0  0  1  0  0   -        0   -31..31
w5           0  0  1  0  0  1  0   -      -24   -31..31
w6           0  0  0  0  1  0  0   +        0   -31..31
w7           0  0  0  0  0  0  1   -        0   -31..31
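The value and range a decoder can infer from a truncated embedded bitstream can be sketched as follows. This is an illustration under two assumptions: 7 magnitude bit-planes, and midpoint reconstruction of the unknown low bits. Under this convention a coefficient whose first three bits are all zero is bounded by plus or minus 15.

```python
def decode_state(w, known_planes, planes=7):
    """Value and range known after `known_planes` bit-planes of |w| were decoded."""
    remaining = planes - known_planes
    prefix = abs(w) >> remaining           # magnitude bits decoded so far
    span = (1 << remaining) - 1            # most the unknown bits can add
    if prefix == 0:                        # still insignificant, sign unknown
        return 0, -span, span
    low = prefix << remaining
    mid = low + ((1 << remaining) >> 1)    # midpoint reconstruction
    if w < 0:
        return -mid, -(low + span), -low
    return mid, low, low + span


# After three bit-planes, for the coefficients that became significant.
for name, w in [("w0", 45), ("w1", -74), ("w2", 21)]:
    print(name, decode_state(w, 3))
```

Each further bit-plane halves the range, which is why truncating the embedded bitstream at any point still yields a usable approximation of every coefficient.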
40/75
Audio Masking
[Figure: level versus frequency, showing a critical band and its neighboring band. Labels: signal, masking threshold, maximum mask, noise level, signal-to-mask ratio, noise-to-mask ratio.]
41/75
Psychoacoustic Masking
Traditional approach (explicit masking, all existing approaches)
  Calculate the mask
  Transmit the mask
  Modify the transform coefficients (or coding approach) according to the mask
  Encode the transform coefficients
Note: the mask modifies the coding content
42/75
Implicit Psychoacoustic Masking
Key: the mask modifies the coding order; the content is the same
Implicit masking
  Calculate the static masking (Fletcher-Munson threshold)
  Encode the MSB of the transform coefficients
  Calculate the mask based on the MSB of the coefficients
  Modify the coding order
  Encode the next most important part of the coefficients
  Repeat the process
43/75
Embedded Coding with Implicit Psychoacoustic Masking
First (b1): 0   1-  0   0   0   0   0   0
Coefficient  Value  Range
w0              0   -63..63
w1            -96   -127..-64
w2              0   -63..63
w3              0   -63..63
w4              0   -63..63
w5              0   -63..63
w6              0   -127..127
w7              0   -127..127
Legend: coefficient significant / insignificant; mask
44/75
Embedded Coding with Implicit Psychoacoustic Masking
First  (b1): 0   1-  0   0   0   0   0   0
Second (b2): 1+  0   0   0   0   0   0   0
Coefficient  b1 b2  Sign  Value  Range
w0           0  1   +       48   32..63
w1           1  0   -      -96   -127..-64
w2           0  0            0   -31..31
w3           0  0            0   -31..31
w4           0  0            0   -31..31
w5           0  0            0   -31..31
w6           0  0            0   -63..63
w7           0  0            0   -63..63
Legend: coefficient significant / insignificant
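The effect of the mask on the coding order, with the content unchanged, can be sketched as a per-coefficient delay. The sketch assumes, hypothetically, a fixed delay of one pass for w6 and w7, which is consistent with their wider ranges in the example; in the scheme described here the mask is recomputed from the already coded bits rather than fixed.

```python
def coding_schedule(delays, passes):
    """For each pass, list the (coefficient index, bit-plane number) pairs coded.

    delays[k] is the number of passes coefficient k is held back by its mask.
    Bit-plane numbers start at 1 for the most significant plane (b1).
    """
    schedule = []
    for p in range(1, passes + 1):
        schedule.append([(k, p - d) for k, d in enumerate(delays) if p - d >= 1])
    return schedule


delays = [0, 0, 0, 0, 0, 0, 1, 1]  # hypothetical: w6 and w7 are masked
for p, items in enumerate(coding_schedule(delays, 2), start=1):
    print("pass", p, items)
```

Every bit-plane of every coefficient is still coded eventually; the delay only changes when it appears in the bitstream.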
45/75
Context Modeling
Context
  Zero coding: significance statuses of neighbor coefficients
  Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients
  Sign: neighbor signs
46/75
After Implicit Psychoacoustic Masking & Context Modeling
 45   0   0   0  -74  -13   0   0
 21   0   4   0   14    0  23  23
  0   0   0   0    3    0   4   0
  0   3   5   0    0    0   0   0
  0   1  -1   0   -4   33   0  -1
  0   0   1   0    0    0   0   0
 -4   5   0   0  -18    0   0  19
  4   0  23   0   -1    0   0   0
Bit (to be encoded):           0 1 1 0 0 0 0 0 0 1  0 0 0 0 0 0 0 0 0 ……
Ctx (automatically generated): 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ……
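47/75
Arithmetic Coding – Illustration (QM Coder used)
What is arithmetic coding?
[Figure: the interval from 0 to 1 is subdivided in turn by the symbol probabilities (P0, 1-P0), (P1, 1-P1), (P2, 1-P2) for the symbols S0=0, S1=1, S2=0, leaving a final interval A.]
Coding result: 0100
(The shortest binary bitstream ensures that the interval from B = 0100 0000000 to C = 0100 1111111 satisfies (B, C) ⊂ A.)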
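The interval narrowing and the choice of the shortest binary string can be sketched with exact fractions. This is an idealized illustration, not the QM coder: it assumes symbol 0 always takes the lower part of the current interval, and the function names are hypothetical.

```python
from fractions import Fraction
from math import ceil


def final_interval(symbols, p_zero):
    """Narrow [0, 1) once per binary symbol; p_zero[i] is P(symbol i == 0)."""
    low, width = Fraction(0), Fraction(1)
    for s, p in zip(symbols, p_zero):
        if s == 0:
            width *= p              # keep the lower part, of relative size P(0)
        else:
            low += width * p        # skip the part reserved for symbol 0
            width *= 1 - p
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string whose every extension lies inside [low, high)."""
    n = 1
    while True:
        step = Fraction(1, 2 ** n)
        k = ceil(low / step)        # first n-bit grid point at or above low
        if (k + 1) * step <= high:  # the whole n-bit cell fits below high
            return format(k, f"0{n}b")
        n += 1


half = Fraction(1, 2)
low, high = final_interval([0, 1, 0], [half, half, half])
print(low, high, shortest_code(low, high))
```

With skewed probabilities the code gets shorter than the input, for example three likely zeros with P(0) = 3/4 fit into two bits.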
48/75
Entropy Coder (Summary)
[Figure: the entropy coder outputs a bitstream together with its R-D curve (axes: rate R, distortion D).]
49/75
Speed-Up Issues
Context modeling
  Use stored context
  Update context when a coefficient becomes significant
Implicit masking
  Fast calculation of energy in a critical band
  Lookup table to convert energy to mask
R-D curve calculation
  Lookup table for calculation of distortion
Context entropy coder
  QM coder
  Run-length Rice coder
50/75
Bitstream Assembly
Input: bitstream, R-D curve
Output: assembled bitstream, companion file
Bitstream assembling
51/75
EAC Bitstream Syntax
[Layout: EAC marker | Global Header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body)]
Timeslot header
  Whether a certain channel exists (1 bit)
  Length of the channel bitstream (1-2 bytes)
52/75
Companion File
[Layout: Global Header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve)]
53/75
Rate-Distortion Optimized Assembling (Single Timeslot)
[Figure: four R-D curves, D1 versus R1 through D4 versus R4, with the selected rates r1, r2, r3, r4 marked.]
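One common way to pick the rates r1 to r4 for a byte budget is to truncate all curves at the same R-D slope. The sketch below does this greedily and is offered as an assumption about how such an assembly could work, not as the EAC implementation; it expects each curve as a convex list of hypothetical (rate, distortion) points.

```python
import heapq


def allocate(curves, budget):
    """Pick one truncation rate per curve, steepest R-D slope first.

    curves[k] is a list of (rate_bytes, distortion) points with increasing
    rate, decreasing distortion and decreasing slope magnitude (convex).
    """
    cut = [0] * len(curves)                 # chosen point index per curve
    used = sum(c[0][0] for c in curves)     # bytes spent at the starting points
    heap = []

    def push(k):
        i = cut[k]
        if i + 1 < len(curves[k]):
            (r0, d0), (r1, d1) = curves[k][i], curves[k][i + 1]
            heapq.heappush(heap, (-(d0 - d1) / (r1 - r0), k))

    for k in range(len(curves)):
        push(k)
    while heap:
        _, k = heapq.heappop(heap)
        step = curves[k][cut[k] + 1][0] - curves[k][cut[k]][0]
        if used + step > budget:
            break                           # next-best segment no longer fits
        used += step
        cut[k] += 1
        push(k)
    return [curves[k][cut[k]][0] for k in range(len(curves))]


a = [(0, 100), (10, 40), (20, 20)]          # made-up curve with a steep start
b = [(0, 50), (10, 20), (20, 15)]           # made-up flatter curve
print(allocate([a, b], 20), allocate([a, b], 30))
```

Stopping at the first segment that does not fit keeps all curves truncated at roughly the same slope, so no byte could reduce distortion more if moved to another curve.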
54/75
Rate-Distortion Optimized Assembling (Multiple Timeslots)
[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]
3575
Entropy Coder
Embedded coding
Implicit psychoacoustic masking
Context modeling
Arithmetic coding
Implementation concerns
3675
A block of coefficients
45 0 0 0-74 -13 0 0
21 0 4 014 0 23 23
0 0 0 03 0 4 0
0 3 5 00 0 0 0
0 1 -1 0-4 33 0 -1
0 0 1 00 0 0 0
-4 5 0 0-18 0 0 19
4 0 23 0-1 0 0 0
Next View graph
3775
Bits of Coefficients
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
Signb1 b2 b3 b4 b5 b6 b7
w0
w1
w2
w3
w4
w5
w6
w7
coef
fici
ent
45
-74
21
14-4
-18
4
-1
3875
Conventional Coding
First
Second
Third
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +
Signb1 b2 b3 b4 b5 b6 b7
w0
w1
w2
w3
w4
w5
w6
w7
46
-74
22
00
0
00
3975
Embedded Coding
01 -000000
1 +0000000
001 +001 -00
Signb1 b2 b3 b4 b5 b6 b7
0 1 0 1 1 0 10 1 0 1 1 0 1 +1 0 0 1 0 1 0 -0 0 1 0 1 0 1 +0 0 0 1 1 1 0 +0 0 0 0 1 0 0 -0 0 1 0 0 1 0 -0 0 0 0 1 0 0 +0 0 0 0 0 0 1 -
First Second
Third
w0
w1
w2
w3
w4
w5
w6
w7
Value
40
Range
3247
-72 -79-64
163124
-31310
-31310
-3131-24
-31310
-31310
4075
Audio Masking
FrequencyCriticalBand
NeighboringBand
Noise Level
Signal
Masking Threshold
Maximum Mask
Signal-tomask ratio
Noise-tomask ratio
4175
Psychoacoustic Masking
Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding
approach) according to the masking Encode the transform coefficients
Note Mask modifies the coding content
4275
Implicit Psychoacoustic Masking
Key Mask modifies the coding order the content is the same
Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process
4375
Embedded Coding with Implicit Psychoacoustic Masking
01 -000000
Signb1 b2 b3 b4 b5 b6 b7
001 -000000
First
w0
w1
w2
w3
w4
w5
w6
w7
Value
0
Range
-6363
-96 -127-64
-63630
-63630
-63630
-63630
-1271270
-1271270
Coefficient SignificantInsignificant
Mask
4475
Embedded Coding with Implicit Psychoacoustic Masking
01 -000000
1 +0000000
Signb1 b2 b3 b4 b5 b6 b7
0 10 1 +1 0 -0 00 00 00 00 00 0
First Second
w0
w1
w2
w3
w4
w5
w6
w7
Value
48
Range
3263
-96 -127-64
-31310
-31310
-31310
-31310
-63630
-63630
Coefficient SignificantInsignificant
4575
Context Modeling
Context Zero coding
Significant statuses of neighbor coefficients Refinement
Whether it is the 1st refinement pass Significant statuses of neighbor coefficients
Sign Neighbor signs
4675
After Implicit Psychoacoustic Masking amp Context Modeling
45 0 0 0-74 -13 0 0
21 0 4 014 0 23 23
0 0 0 03 0 4 0
0 3 5 00 0 0 0
0 1 -1 0-4 33 0 -1
0 0 1 00 0 0 0
-4 5 0 0-18 0 0 19
4 0 23 0-1 0 0 0
Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip
Automatically generated
To be encoded
4775
Arithmetic Coding ndash Illustration (QM Coder used)
What is arithmetic coding
0
1
1-P0
P0
1-P1
P1 1-P2
P2
S0=0 S1=1 S2=0
0100
Coding result
(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)
AB
C
4875
Entropy Coder (Summary)
D
R
Bitstream
R-D curve
4975
Speed Up Issues
Context Modeling Use stored context Update context when a coefficient becomes significant
Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask
R-D curve calculation Lookup table calculation of distortion
Context entropy coder QM coder Run-length Rice coder
5075
Bitstream Assembly
Input Bitstream R-D curve
Output Assembled bitstream Companion file
Bitstream assembling
5175
EAC Bitstream Syntax
Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)
EA
C m
arke
rG
loba
l H
eade
r Timeslot
Head Body
Timeslot
Head Body
Timeslot
Head Body
5275
Companion FileG
loba
l H
eade
r Timeslot
Head R-D curve
Timeslot
Head R-D curve
Timeslot
Head R-D curve
5375
Rate-Distortion Optimized Assembling (Single Timeslot)
D1
R1
D2
R2
D3
R3
D4
R4
D1
R1
D2
R2
D3
R3
D4
R4
r1 r2
r3 r4
5475
Rate-Distortion Optimized Assembling (Multiple Timeslots)
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
Buffer-Occupancy Curve
5575
Allocated Bytes Per Timeslots
Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time
Where Bi allocated bytes for timeslot i
Bufi buffer occupancy level at timeslot i
Ratetrans coding (network) rate per second Time time duration of the timeslot
5675
Optimization
Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate
level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the
current timeslot
5775
Search (R-D slope)B
uffe
r O
ccup
ancy
(B
ytes
)
Time (timeslots)
Illegal Region
Illegal Region
Underflow (too many bytes)
Overflow (too few bytes)
Wastebytes
5875
Multiple Timeslots ndash Constant Bitrate
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
5975
Multiple Timeslots ndash Internet Streaming (Slow Start)
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
Buffer-Occupancy Curve
6075
Multiple Timeslots ndash Internet Streaming (Normal)
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
6175
Modular Software Design
MLT(SW) Quantizer Entropy coder
BitstreamAssembly
MLT(SW) Quantizer Entropy coder
BitstreamAssembly
Audio
Bitstream
L+R(or mono)
L-R
6275
Modular Software Design
Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo
compression as well Probe and data input can be inserted into any part of the
program
Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory
Memory and computation efficient Working memory preallocated
6375
Experimental Results
6475
EAC ndash Highly Efficient (NMR)
Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better
669568280-22EAC
847556325040WMA
748700571448MP4TwinVQ
8kbps16kbps32kbps48kbpsCodec
6575
EAC ndash Lossless
Results based on the average of 16 MPEG4 test clips
132WinZip
272Monkeyrsquos Audio
272EAC
Compression RatioCodec
6675
EAC (Versatile)
Versatile Real time 2-way communication (Low delay
mode) Storage device (Pocket PC Xbox) Internet streaming
6775
EAC (Low Delay Mode)
Reducing frame size
Timeslot = 1 frame
Fixed length timeslot bitstream
Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line
delay = 3 frames )
6875
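EAC (Low Delay Mode)
[Diagram: timeline over frames i-1, i, i+1. Encoder: start encoding frame i (MLT, Quantizer, Entropy), producing the Bitstream, which crosses the network. Decoder: start decoding frame i (Entropy, Quantizer); the output is marked "Playable here".]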
6975
EAC - Flexible Bitstream Syntax
Flexible bitstream syntax
  Parser may reassemble the bitstream at 1000x real time
  Change bit rate, number of audio channels, audio sampling rate
7075
EAC - Software
Software
  Encoder: 8x real time (stereo, 44.1kHz sampling)
  Decoder: 20x real time
  Parser: 1000x real time
7175
EAC - Encoder
[Diagram: Audio goes into the Encoder, which produces a Stereo 128kbps bitstream and a Companion file.]
7275
EAC - Parser
[Diagram: on the Server, the Parser takes the Stereo 128kbps bitstream and the Companion file and produces: Stereo 16kbps; Mono 8kbps; Stereo 16kbps, slow start; Mono 8kbps, 11kHz sampling.]
7375
EAC - Decoder
[Diagram: the Decoder takes each parsed bitstream: Stereo 16kbps; Mono 8kbps; Stereo 16kbps, slow start; Mono 8kbps, 11kHz sampling.]
7475
Comparison
Original MP4 TwinVQ WMA EAC
MP3
7575
Conclusions
An embedded audio coder is developed
  Highly efficient
  Versatile: low delay, constant bitrate, streaming
  Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  Good prototype available: real-time execution, small memory footprint
3675
A block of coefficients
 45   0   0   0    -74 -13   0   0
 21   0   4   0     14   0  23  23
  0   0   0   0      3   0   4   0
  0   3   5   0      0   0   0   0
  0   1  -1   0     -4  33   0  -1
  0   0   1   0      0   0   0   0
 -4   5   0   0    -18   0   0  19
  4   0  23   0     -1   0   0   0
3775
Bits of Coefficients
Coefficient  Value  b1 b2 b3 b4 b5 b6 b7  Sign
w0            45    0  1  0  1  1  0  1   +
w1           -74    1  0  0  1  0  1  0   -
w2            21    0  0  1  0  1  0  1   +
w3            14    0  0  0  1  1  1  0   +
w4            -4    0  0  0  0  1  0  0   -
w5           -18    0  0  1  0  0  1  0   -
w6             4    0  0  0  0  1  0  0   +
w7            -1    0  0  0  0  0  0  1   -
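The table above can be reproduced with a few lines of Python. This is a minimal sketch, not the coder's actual implementation: it assumes a sign-magnitude representation with 7 magnitude bits, b1 being the most significant, and the function name `to_bitplanes` is made up for illustration.

```python
def to_bitplanes(coeffs, nbits=7):
    """Split each integer coefficient into a sign and nbits magnitude bits.

    bits[0] corresponds to b1 (most significant), bits[-1] to b7.
    The 7-bit width is an assumption taken from the slide's table.
    """
    rows = []
    for c in coeffs:
        mag = abs(c)
        if mag >= (1 << nbits):
            raise ValueError(f"{c} does not fit in {nbits} magnitude bits")
        bits = [(mag >> (nbits - 1 - k)) & 1 for k in range(nbits)]
        sign = "-" if c < 0 else "+"
        rows.append((sign, bits))
    return rows


coeffs = [45, -74, 21, 14, -4, -18, 4, -1]  # w0..w7 from the slide
for i, (sign, bits) in enumerate(to_bitplanes(coeffs)):
    print(f"w{i} {coeffs[i]:4d}  {' '.join(map(str, bits))}  {sign}")
```

Reading the result column by column (all b1 bits, then all b2 bits, and so on) gives the bit planes that an embedded coder sends in order.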
3875
Conventional Coding
[Figure: the bit table (Sign, b1..b7 for w0..w7) coded coefficient by coefficient: first w0, second w1, third w2.]
Decoded values after three coefficients: w0 = 46, w1 = -74, w2 = 22, w3..w7 = 0
3975
Embedded Coding
[Figure: the bit table coded plane by plane (First = b1, Second = b2, Third = b3); the sign follows a coefficient's first 1 bit.]
Coefficient  b1    b2    b3    Value  Range
w0           0     1 +   0      40    32..47
w1           1 -   0     0     -72    -79..-64
w2           0     0     1 +    24    16..31
w3           0     0     0       0    -31..31
w4           0     0     0       0    -31..31
w5           0     0     1 -   -24    -31..31
w6           0     0     0       0    -31..31
w7           0     0     0       0    -31..31
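What a decoder knows after a truncated embedded bitstream can be sketched as follows. This is an illustration under stated assumptions, not the coder's actual reconstruction rule: 7 magnitude bits, midpoint reconstruction, and a hypothetical helper name `decode_after_planes`. It reproduces the Value and Range entries shown for the significant coefficients w0, w1 and w2. For a coefficient that is still insignificant it returns the bound implied by the undecoded planes, which after three planes is -15..15 rather than the -31..31 listed on the slide.

```python
def decode_after_planes(coeff, planes, nbits=7):
    """Return (value, low, high) known to the decoder after `planes` bit planes.

    Assumes sign-magnitude bit planes and midpoint reconstruction
    (both are assumptions made for this sketch).
    """
    shift = nbits - planes        # bit planes not yet received
    prefix = abs(coeff) >> shift  # magnitude bits received so far
    if prefix == 0:
        # Still insignificant: sign unknown, magnitude below 2**shift.
        bound = (1 << shift) - 1
        return 0, -bound, bound
    low = prefix << shift
    high = low + (1 << shift) - 1
    mid = low + (1 << shift) // 2  # midpoint of the uncertainty interval
    if coeff < 0:
        return -mid, -high, -low
    return mid, low, high


for i, c in enumerate([45, -74, 21, 14, -4, -18, 4, -1]):
    value, low, high = decode_after_planes(c, planes=3)
    print(f"w{i}: value {value:4d}, range {low}..{high}")
```

Each additional bit plane halves the uncertainty interval, which is why the bitstream can be cut at any point and still decode to a coarser version of every coefficient.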
4075
Audio Masking
[Figure: signal and noise level versus frequency across a critical band and a neighboring band, with the masking threshold, maximum mask, signal-to-mask ratio, and noise-to-mask ratio marked.]
4175
Psychoacoustic Masking
Traditional approach (explicit masking, all existing approaches)
  Calculate the mask
  Transmit the mask
  Modify transform coefficients (or coding approach) according to the masking
  Encode the transform coefficients
Note: the mask modifies the coding content
4275
Implicit Psychoacoustic Masking
Key: the mask modifies the coding order; the content is the same
Implicit masking
  Calculate the static masking (Fletcher-Munson threshold)
  Encode the MSB of the transform coefficients
  Calculate the mask based on the MSB of the coefficients
  Modify the coding order
  Encode the next most important part of the coefficients
  Repeat the process
4375
Embedded Coding with Implicit Psychoacoustic Masking
[Figure: bit table (Sign, b1..b7) after the first coding pass, with each coefficient marked significant or insignificant and the mask overlaid.]
Coefficient  Value  Range
w0             0    -63..63
w1           -96    -127..-64
w2             0    -63..63
w3             0    -63..63
w4             0    -63..63
w5             0    -63..63
w6             0    -127..127
w7             0    -127..127
4475
Embedded Coding with Implicit Psychoacoustic Masking
[Figure: bit table (Sign, b1..b7) after the first and second coding passes, with each coefficient marked significant or insignificant.]
Coefficient  Value  Range
w0            48    32..63
w1           -96    -127..-64
w2             0    -31..31
w3             0    -31..31
w4             0    -31..31
w5             0    -31..31
w6             0    -63..63
w7             0    -63..63
4575
Context Modeling
Context
  Zero coding: significance statuses of neighbor coefficients
  Refinement: whether it is the 1st refinement pass; significance statuses of neighbor coefficients
  Sign: neighbor signs
4675
After Implicit Psychoacoustic Masking & Context Modeling
 45   0   0   0    -74 -13   0   0
 21   0   4   0     14   0  23  23
  0   0   0   0      3   0   4   0
  0   3   5   0      0   0   0   0
  0   1  -1   0     -4  33   0  -1
  0   0   1   0      0   0   0   0
 -4   5   0   0    -18   0   0  19
  4   0  23   0     -1   0   0   0
Bit: 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...
Ctx: 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...
[Labels: Automatically generated; To be encoded]
4775
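Arithmetic Coding - Illustration (QM Coder used)
What is arithmetic coding?
[Figure: the interval from 0 to 1 is subdivided once per symbol, using P0 and 1-P0, P1 and 1-P1, P2 and 1-P2, for the symbols S0=0, S1=1, S2=0. The final interval is A.]
Coding result: 0100
(The shortest binary bitstream ensures that the interval from B = 0100 0000000 to C = 0100 1111111 satisfies (B, C) ⊂ A.)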
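The interval-narrowing idea can be sketched with exact fractions. This is idealized binary arithmetic coding, not the QM coder itself (which approximates the arithmetic and adapts its probabilities). The probabilities below are made-up values, the convention that symbol 0 takes the lower part of the interval is an assumption, and the function names are hypothetical.

```python
from fractions import Fraction


def encode(symbols, p_zero):
    """Narrow [low, high) once per binary symbol.

    p_zero[i] is the assumed probability that symbol i is 0.
    Assumed convention: 0 takes the lower part, 1 the upper part.
    """
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width *= p0
        else:
            low += width * p0
            width *= 1 - p0
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string b such that every continuation of b lies in [low, high)."""
    n = 1
    while True:
        step = Fraction(1, 2 ** n)
        k = -(-low // step)            # smallest multiple of step that is >= low
        if (k + 1) * step <= high:     # the whole dyadic cell fits inside the interval
            return format(int(k), f"0{n}b")
        n += 1


# Symbols S0=0, S1=1, S2=0 as on the slide; P(0) = 3/4 is a hypothetical value.
low, high = encode([0, 1, 0], [Fraction(3, 4)] * 3)
print(low, high, shortest_code(low, high))
```

With these hypothetical probabilities the final interval is [9/16, 45/64) and the code is 1001; the slide's result 0100 corresponds to its own, unstated, probabilities.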
4875
Entropy Coder (Summary)
[Figure: the entropy coder outputs the Bitstream and its R-D curve (distortion D versus rate R).]
4975
Speed Up Issues
Context modeling
  Use stored context
  Update context when a coefficient becomes significant
Implicit masking
  Fast calculation of energy in a critical band
  Lookup table: convert energy to mask
R-D curve calculation
  Lookup table: calculation of distortion
Context entropy coder
  QM coder
  Run-length Rice coder
5075
Bitstream Assembly
Input: bitstream, R-D curve
Output: assembled bitstream, companion file
Bitstream assembling
5175
EAC Bitstream Syntax
Layout: EAC marker | Global Header | Timeslot (Head, Body) | Timeslot (Head, Body) | Timeslot (Head, Body)
Timeslot header
  Whether a certain channel exists (1 bit)
  Length of the channel bitstream (1-2 bytes)
5275
Companion File
Layout: Global Header | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve) | Timeslot (Head, R-D curve)
5375
Rate-Distortion Optimized Assembling (Single Timeslot)
[Figure: four R-D curves (D1 versus R1 through D4 versus R4), with the rates r1, r2, r3, r4 marked.]
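The slide gives only the picture, so the following is a sketch of one standard way to pick the rates r1..r4: repeatedly extend whichever embedded bitstream currently has the steepest R-D slope until the byte budget of the timeslot is used up. It assumes each curve is convex and given as (rate, distortion) points starting at rate 0. The function name `assemble`, the curve data, and the budget are all hypothetical, and nothing here is claimed to be the coder's actual procedure.

```python
import heapq


def assemble(curves, budget):
    """Choose a truncation rate per bitstream under a total byte budget.

    curves[c] is a list of (rate, distortion) points with increasing rate and
    decreasing distortion, assumed convex. Returns the chosen rate per curve.
    """
    cut = [0] * len(curves)  # index of the chosen point on each curve
    heap = []

    def push(c):
        i = cut[c]
        if i + 1 < len(curves[c]):
            (r0, d0), (r1, d1) = curves[c][i], curves[c][i + 1]
            slope = (d0 - d1) / (r1 - r0)  # distortion removed per byte
            heapq.heappush(heap, (-slope, c))

    for c in range(len(curves)):
        push(c)
    used = sum(curve[0][0] for curve in curves)
    while heap:
        _, c = heapq.heappop(heap)
        extra = curves[c][cut[c] + 1][0] - curves[c][cut[c]][0]
        if used + extra > budget:
            break                # next-steepest segment no longer fits
        used += extra
        cut[c] += 1
        push(c)
    return [curves[c][cut[c]][0] for c in range(len(curves))]


# Two hypothetical R-D curves and a 30-byte budget.
curves = [
    [(0, 100), (10, 40), (20, 20), (30, 15)],
    [(0, 50), (10, 30), (20, 25)],
]
print(assemble(curves, 30))
```

Because the curves are convex, stopping at the first segment that does not fit leaves all bitstreams cut at roughly the same R-D slope.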
5475
Rate-Distortion Optimized Assembling (Multiple Timeslots)
[Figure: buffer-occupancy curve, buffer occupancy (bytes) versus time (timeslots), bounded by two illegal regions.]
5575
Allocated Bytes Per Timeslot
Allocated bytes for a certain timeslot: B_i = Buf_{i-1} - Buf_i + Rate_trans * Time
Where
  B_i: allocated bytes for timeslot i
  Buf_i: buffer occupancy level at timeslot i
  Rate_trans: coding (network) rate per second
  Time: time duration of the timeslot
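A short worked example of the formula, assuming Rate_trans is expressed in bytes per second and Time in seconds. The buffer levels and the rate are made-up numbers, and the function name `allocated_bytes` is hypothetical.

```python
def allocated_bytes(buf_levels, rate_trans, time):
    """Apply B_i = Buf_{i-1} - Buf_i + Rate_trans * Time for i = 1..n.

    buf_levels[0] is the initial buffer occupancy; buf_levels[i] is the
    occupancy at timeslot i (bytes). rate_trans is assumed to be in bytes
    per second and time in seconds.
    """
    return [
        buf_levels[i - 1] - buf_levels[i] + rate_trans * time
        for i in range(1, len(buf_levels))
    ]


# Hypothetical occupancy curve: starts and ends at 4000 bytes.
levels = [4000, 3000, 3500, 4000]
budget = allocated_bytes(levels, rate_trans=2000, time=0.5)
print(budget)       # bytes allocated to timeslots 1, 2, 3
print(sum(budget))  # equals 3 * 2000 * 0.5 because the curve returns to its start
```

When the occupancy curve ends at the level it started from, the allocations sum to exactly what the channel carries over that period; moving the curve up or down only shifts bytes between timeslots.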
5675
Optimization
Given
  Initial buffer occupancy level
  Final buffer occupancy level (or intermediate level with a sliding window)
  Buffer occupancy constraint
Search for the allocated number of bytes for the current timeslot
4075
Audio Masking
FrequencyCriticalBand
NeighboringBand
Noise Level
Signal
Masking Threshold
Maximum Mask
Signal-tomask ratio
Noise-tomask ratio
4175
Psychoacoustic Masking
Traditional approach (explicit masking all existing approaches) Calculate the mask Transmit the mask Modify transform coefficients (or coding
approach) according to the masking Encode the transform coefficients
Note Mask modifies the coding content
4275
Implicit Psychoacoustic Masking
Key Mask modifies the coding order the content is the same
Implicit masking Calculate the static masking (Fletcher_Munson threshold) Encode the MSB of the transform coefficients Calculate the mask based on the MSB of the coefficients Modify coding order Encode the next most important part of the coefficients Repeat the process
4375
Embedded Coding with Implicit Psychoacoustic Masking
01 -000000
Signb1 b2 b3 b4 b5 b6 b7
001 -000000
First
w0
w1
w2
w3
w4
w5
w6
w7
Value
0
Range
-6363
-96 -127-64
-63630
-63630
-63630
-63630
-1271270
-1271270
Coefficient SignificantInsignificant
Mask
4475
Embedded Coding with Implicit Psychoacoustic Masking
01 -000000
1 +0000000
Signb1 b2 b3 b4 b5 b6 b7
0 10 1 +1 0 -0 00 00 00 00 00 0
First Second
w0
w1
w2
w3
w4
w5
w6
w7
Value
48
Range
3263
-96 -127-64
-31310
-31310
-31310
-31310
-63630
-63630
Coefficient SignificantInsignificant
4575
Context Modeling
Context Zero coding
Significant statuses of neighbor coefficients Refinement
Whether it is the 1st refinement pass Significant statuses of neighbor coefficients
Sign Neighbor signs
4675
After Implicit Psychoacoustic Masking & Context Modeling
 45   0   0   0  -74  -13   0   0
 21   0   4   0   14    0  23  23
  0   0   0   0    3    0   4   0
  0   3   5   0    0    0   0   0
  0   1  -1   0   -4   33   0  -1
  0   0   1   0    0    0   0   0
 -4   5   0   0  -18    0   0  19
  4   0  23   0   -1    0   0   0
Bit (to be encoded):           0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...
Ctx (automatically generated): 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 ...
4775
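Arithmetic Coding - Illustration (QM Coder used)
What is arithmetic coding?
[Figure: the interval from 0 to 1 is subdivided successively with probabilities P0 and 1-P0, then P1 and 1-P1, then P2 and 1-P2, for the symbols S0=0, S1=1, S2=0. The final interval is A.]
Coding result: 0100
(The shortest binary bitstream that ensures that the interval from B = 0100 0000000... to C = 0100 1111111... lies within A.)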
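
The interval narrowing and the choice of the shortest bitstream can be sketched with exact fractions. This is an idealized illustration, not the QM coder itself (which uses an approximate, multiplication-free update). It assumes that symbol 0 takes the lower part of the current interval; the function names and the probabilities in the usage lines are hypothetical, since the slide does not give numeric values for P0, P1, P2.

```python
from fractions import Fraction
from math import ceil


def narrow_interval(symbols, p_zero):
    """Return the final interval [low, high) after coding binary symbols.

    p_zero[i] is the probability that symbol i is 0; symbol 0 is assumed
    to take the lower part of the current interval.
    """
    low, width = Fraction(0), Fraction(1)
    for s, p0 in zip(symbols, p_zero):
        if s == 0:
            width *= p0
        else:
            low += width * p0
            width *= 1 - p0
    return low, low + width


def shortest_code(low, high):
    """Shortest bit string b such that every number 0.b... lies in [low, high)."""
    n = 0
    while True:
        n += 1
        step = Fraction(1, 2 ** n)
        k = ceil(low / step)            # first multiple of step at or above low
        if (k + 1) * step <= high:      # the whole cell [k*step, (k+1)*step) fits
            return format(k, "0{}b".format(n))


half = Fraction(1, 2)
print(shortest_code(*narrow_interval([0, 1, 0], [half, half, half])))
skewed = Fraction(9, 10)
print(shortest_code(*narrow_interval([0, 0, 0], [skewed, skewed, skewed])))
```

With equal probabilities the three symbols cost three bits; with a strongly skewed model three likely symbols fit into a single bit, which is the gain arithmetic coding is meant to deliver.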
4875
Entropy Coder (Summary)
[Figure: the entropy coder outputs a bitstream together with its R-D curve (distortion D versus rate R).]
4975
Speed Up Issues
Context modeling:
  Use stored context
  Update the context when a coefficient becomes significant
Implicit masking:
  Fast calculation of the energy in a critical band
  Lookup table to convert energy to mask
R-D curve calculation:
  Lookup-table calculation of distortion
Context entropy coder:
  QM coder
  Run-length Rice coder
5075
Bitstream Assembly
Input: bitstream, R-D curve
Output: assembled bitstream, companion file
Bitstream assembling
5175
EAC Bitstream Syntax
Layout: EAC marker | Global header | Timeslot (head, body) | Timeslot (head, body) | Timeslot (head, body)
Timeslot header:
  Whether a certain channel exists (1 bit)
  Length of the channel bitstream (1-2 bytes)
5275
Companion File
Layout: Global header | Timeslot (head, R-D curve) | Timeslot (head, R-D curve) | Timeslot (head, R-D curve)
5375
Rate-Distortion Optimized Assembling (Single Timeslot)
[Figure: four R-D curves (D1 versus R1 through D4 versus R4) and the rates r1, r2, r3, r4 selected on them.]
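
One common way to pick the rates on several R-D curves under a byte budget is to take, at every step, the curve segment with the steepest distortion reduction per byte. The following is a minimal sketch of that greedy equal-slope idea. It assumes each curve is a convex list of (bytes, distortion) points; the function name and the sample curves are hypothetical and are not taken from the EAC software.

```python
import heapq


def allocate(curves, budget):
    """Greedy R-D slope allocation.

    curves: list of R-D curves, each a list of (bytes, distortion) points
            with increasing bytes and decreasing, convex distortion.
    budget: total number of bytes available.
    Returns the number of bytes selected on each curve.
    """
    index = [0] * len(curves)          # current operating point per curve
    heap = []                          # entries: (-slope, curve number)

    def push_next_step(c):
        i = index[c]
        if i + 1 < len(curves[c]):
            (r0, d0), (r1, d1) = curves[c][i], curves[c][i + 1]
            slope = (d0 - d1) / (r1 - r0)   # distortion removed per byte
            heapq.heappush(heap, (-slope, c))

    for c in range(len(curves)):
        push_next_step(c)
    spent = sum(curve[0][0] for curve in curves)

    while heap:
        _, c = heapq.heappop(heap)     # steepest remaining step
        step = curves[c][index[c] + 1][0] - curves[c][index[c]][0]
        if spent + step > budget:
            break                      # stop at the first step that does not fit
        spent += step
        index[c] += 1
        push_next_step(c)
    return [curves[c][index[c]][0] for c in range(len(curves))]


curves = [
    [(0, 100), (10, 40), (20, 20)],
    [(0, 50), (10, 20), (20, 15)],
]
print(allocate(curves, 30))
```

Stopping at the first step that does not fit keeps all selected points at a similar slope; a real assembler could instead truncate the embedded bitstream inside that step.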
5475
Rate-Distortion Optimized Assembling (Multiple Timeslots)
[Figure: buffer-occupancy curve; x-axis: time (timeslots); y-axis: buffer occupancy (bytes); two illegal regions.]
5575
Allocated Bytes per Timeslot
Allocated bytes for a certain timeslot:
  $B_i = \mathrm{Buf}_{i-1} - \mathrm{Buf}_i + \mathrm{Rate}_{\mathrm{trans}} \cdot \mathrm{Time}$
Where:
  $B_i$: allocated bytes for timeslot $i$
  $\mathrm{Buf}_i$: buffer occupancy level at timeslot $i$
  $\mathrm{Rate}_{\mathrm{trans}}$: coding (network) rate per second
  $\mathrm{Time}$: time duration of the timeslot
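
As a worked example of this relation, the sketch below computes the allocated bytes from two buffer levels and, in the other direction, the buffer levels that result from a sequence of allocations. It assumes the rate is expressed in bytes per second and the duration in seconds; the function names and the numbers are illustrative only.

```python
def allocated_bytes(buf_prev, buf_cur, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time."""
    return buf_prev - buf_cur + rate_trans * time


def buffer_levels(buf_start, allocations, rate_trans, time):
    """Buffer occupancy after each timeslot, from the same relation
    solved for Buf_i: Buf_i = Buf_{i-1} + Rate_trans * Time - B_i."""
    levels = [buf_start]
    for b in allocations:
        levels.append(levels[-1] + rate_trans * time - b)
    return levels


# 16 kbps is 2000 bytes per second; a 0.5 s timeslot brings in 1000 bytes.
print(allocated_bytes(3000, 2800, 2000, 0.5))        # 1200.0
print(buffer_levels(3000, [1200, 900], 2000, 0.5))   # [3000, 2800.0, 2900.0]
```

Letting the buffer level drop by 200 bytes frees 200 bytes on top of the 1000 bytes delivered during the timeslot, so 1200 bytes can be spent on that timeslot.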
5675
Optimization
Given:
  Initial buffer occupancy level
  Final buffer occupancy level (or intermediate level with a sliding window)
  Buffer occupancy constraint
Search for the number of bytes allocated to the current timeslot
5775
Search (R-D slope)
[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions. Labels: underflow (too many bytes), overflow (too few bytes), waste bytes.]
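
The underflow and overflow limits translate into a legal range for the bytes allocated to one timeslot. The sketch below derives that range from the relation B_i = Buf_{i-1} - Buf_i + Rate_trans * Time, assuming the buffer level must stay between buf_min and buf_max. The function names, the bounds, and the numbers are hypothetical; the actual search in EAC is driven by the R-D slope and is not reproduced here.

```python
def legal_allocation_range(buf_prev, rate_trans, time, buf_min, buf_max):
    """Return (fewest, most) bytes that keep the buffer inside its bounds.

    Spending more than `most` would push the buffer below buf_min (underflow);
    spending fewer than `fewest` would push it above buf_max (overflow).
    """
    incoming = rate_trans * time
    fewest = max(0, buf_prev + incoming - buf_max)
    most = buf_prev + incoming - buf_min
    return fewest, most


def clamp_allocation(wanted, buf_prev, rate_trans, time, buf_min, buf_max):
    """Clamp a desired allocation into the legal range."""
    fewest, most = legal_allocation_range(
        buf_prev, rate_trans, time, buf_min, buf_max)
    return min(max(wanted, fewest), most)


# 3000 bytes buffered, 1000 bytes arriving, buffer limited to 0..3500 bytes.
print(legal_allocation_range(3000, 2000, 0.5, 0, 3500))   # (500.0, 4000.0)
print(clamp_allocation(5000, 3000, 2000, 0.5, 0, 3500))   # 4000.0
print(clamp_allocation(100, 3000, 2000, 0.5, 0, 3500))    # 500.0
```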
5875
Multiple Timeslots - Constant Bitrate
[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]
5975
Multiple Timeslots - Internet Streaming (Slow Start)
[Figure: buffer-occupancy curve; buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]
6075
Multiple Timeslots - Internet Streaming (Normal)
[Figure: buffer occupancy (bytes) versus time (timeslots), with two illegal regions.]
6175
Modular Software Design
[Diagram: encoder pipeline. The audio is split into L+R (or mono) and L-R; each path runs through MLT(SW), quantizer, entropy coder, and bitstream assembly to produce the bitstream.]
6275
Modular Software Design
Highly modularized pipeline design:
  The quantizer and entropy coder can be used for image/video compression as well
  Probes and data input can be inserted into any part of the program
Data-flow driven (with necessary memory regulator <buffer>):
  No long delay
  No need for large memory
Memory and computation efficient:
  Working memory preallocated
6375
Experimental Results
6475
EAC - Highly Efficient (NMR)
Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.

Codec         8kbps   16kbps   32kbps   48kbps
EAC           669     568      280      -22
WMA           847     556      325      040
MP4 TwinVQ    748     700      571      448
6575
EAC - Lossless
Results based on the average of 16 MPEG4 test clips

Codec            Compression Ratio
WinZip           132
Monkey's Audio   272
EAC              272
6675
EAC (Versatile)
Versatile:
  Real-time 2-way communication (low-delay mode)
  Storage device (Pocket PC, Xbox)
  Internet streaming
6775
EAC (Low Delay Mode)
Reducing frame size
Timeslot = 1 frame
Fixed-length timeslot bitstream
Delay = 2 frames
  (Ignoring encoding/decoding delay)
  Network transmission time (over a modem line, delay = 3 frames)
6875
EAC (Low Delay Mode)
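[Diagram: low-delay timeline for frames i-1, i, i+1. Encoder: start encoding frame i, then MLT, quantizer, entropy coder, bitstream. The bitstream crosses the network. Decoder: start decoding frame i, then entropy decoder, quantizer; frame i is playable at the end of this chain.]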
6975
EAC - Flexible Bitstream Syntax
Flexible bitstream syntax:
  The parser may reassemble the bitstream at 1000x real time
  Change the bit rate, the number of audio channels, or the audio sampling rate
7075
EAC - Software
Software:
  Encoder: 8x real time (stereo, 44.1 kHz sampling)
  Decoder: 20x real time
  Parser: 1000x real time
7175
EAC - Encoder
[Diagram: audio enters the encoder, which outputs a stereo 128 kbps bitstream and a companion file.]
7275
EAC - Parser
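[Diagram: on the server, the parser takes the stereo 128 kbps bitstream and the companion file and produces application bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]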
7375
EAC - Decoder
[Diagram: the decoder accepts any of the parsed bitstreams: stereo 16 kbps; mono 8 kbps; stereo 16 kbps with slow start; mono 8 kbps with 11 kHz sampling.]
7475
Comparison
Original, MP3, MP4 TwinVQ, WMA, EAC
7575
Conclusions
An embedded audio coder has been developed:
  Highly efficient
  Versatile: low delay, constant bitrate, streaming
  Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  Good prototype available: real-time execution, small memory footprint
4575
Context Modeling
Context Zero coding
Significant statuses of neighbor coefficients Refinement
Whether it is the 1st refinement pass Significant statuses of neighbor coefficients
Sign Neighbor signs
4675
After Implicit Psychoacoustic Masking amp Context Modeling
45 0 0 0-74 -13 0 0
21 0 4 014 0 23 23
0 0 0 03 0 4 0
0 3 5 00 0 0 0
0 1 -1 0-4 33 0 -1
0 0 1 00 0 0 0
-4 5 0 0-18 0 0 19
4 0 23 0-1 0 0 0
Bit 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 helliphellipCtx 0 0 9 0 0 0 0 0 0 7 10 0 0 0 0 0 0 0 0 helliphellip
Automatically generated
To be encoded
4775
Arithmetic Coding ndash Illustration (QM Coder used)
What is arithmetic coding
0
1
1-P0
P0
1-P1
P1 1-P2
P2
S0=0 S1=1 S2=0
0100
Coding result
(Shortest binarybitstream ensures thatintervalB=0100 0000000 toC=0100 1111111 is(BC) A)
AB
C
4875
Entropy Coder (Summary)
D
R
Bitstream
R-D curve
4975
Speed Up Issues
Context Modeling Use stored context Update context when a coefficient becomes significant
Implicit Masking Fast calculation of energy in a critical band Lookup table convert energy to mask
R-D curve calculation Lookup table calculation of distortion
Context entropy coder QM coder Run-length Rice coder
5075
Bitstream Assembly
Input Bitstream R-D curve
Output Assembled bitstream Companion file
Bitstream assembling
5175
EAC Bitstream Syntax
Timeslot header Whether a certain channel exist (1 bit) Length of the channel bitstream (1-2 bytes)
EA
C m
arke
rG
loba
l H
eade
r Timeslot
Head Body
Timeslot
Head Body
Timeslot
Head Body
5275
Companion FileG
loba
l H
eade
r Timeslot
Head R-D curve
Timeslot
Head R-D curve
Timeslot
Head R-D curve
5375
Rate-Distortion Optimized Assembling (Single Timeslot)
D1
R1
D2
R2
D3
R3
D4
R4
D1
R1
D2
R2
D3
R3
D4
R4
r1 r2
r3 r4
5475
Rate-Distortion Optimized Assembling (Multiple Timeslots)
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
Buffer-Occupancy Curve
5575
Allocated Bytes Per Timeslots
Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time
Where Bi allocated bytes for timeslot i
Bufi buffer occupancy level at timeslot i
Ratetrans coding (network) rate per second Time time duration of the timeslot
5675
Optimization
Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate
level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the
current timeslot
5775
Search (R-D slope)B
uffe
r O
ccup
ancy
(B
ytes
)
Time (timeslots)
Illegal Region
Illegal Region
Underflow (too many bytes)
Overflow (too few bytes)
Wastebytes
5875
Multiple Timeslots ndash Constant Bitrate
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
5975
Multiple Timeslots ndash Internet Streaming (Slow Start)
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
Buffer-Occupancy Curve
6075
Multiple Timeslots ndash Internet Streaming (Normal)
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
6175
Modular Software Design
MLT(SW) Quantizer Entropy coder
BitstreamAssembly
MLT(SW) Quantizer Entropy coder
BitstreamAssembly
Audio
Bitstream
L+R(or mono)
L-R
62/75
Modular Software Design
Highly modularized pipeline design
  Quantizer and entropy coder can also be used for image/video compression
  Probe and data input can be inserted into any part of the program
Data-flow driven (with the necessary memory regulator <buffer>)
  No long delay
  No need for large memory
Memory and computation efficient
  Working memory preallocated
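The data-flow idea can be illustrated with a chain of generators, where each stage pulls one frame at a time from the stage before it and a probe can be spliced in between any two stages. This is only a sketch of the design style: the stage names, the uniform quantizer, and the use of Python generators are assumptions, not the actual EAC modules.

```python
from typing import Iterable, Iterator, List

Frame = List[float]


def frame_source(samples: List[float], frame_size: int) -> Iterator[Frame]:
    """Cut the input into frames; only one frame is materialized at a time."""
    for start in range(0, len(samples), frame_size):
        yield samples[start:start + frame_size]


def probe(frames: Iterable[Frame], log: List[Frame]) -> Iterator[Frame]:
    """Tap that records whatever flows past it without changing the data."""
    for frame in frames:
        log.append(list(frame))
        yield frame


def quantize(frames: Iterable[Frame], step: float) -> Iterator[List[int]]:
    """Placeholder uniform quantizer standing in for a real stage."""
    for frame in frames:
        yield [round(x / step) for x in frame]


def build_pipeline(samples: List[float], frame_size: int, step: float,
                   log: List[Frame]) -> Iterator[List[int]]:
    # Stages are chained lazily: nothing runs until a frame is requested downstream.
    return quantize(probe(frame_source(samples, frame_size), log), step)
```

Requesting one output frame drives exactly one frame through every stage, so the memory in flight stays at about one frame per stage regardless of the input length.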
63/75
Experimental Results
64/75
EAC - Highly Efficient (NMR)
Results based on the average of 16 MPEG4 test clips. The smaller the NMR, the better.
Codec         8 kbps   16 kbps   32 kbps   48 kbps
MP4 TwinVQ    7.48     7.00      5.71      4.48
WMA           8.47     5.56      3.25      0.40
EAC           6.69     5.68      2.80      -2.2
65/75
EAC - Lossless
Results based on the average of 16 MPEG4 test clips
Codec            Compression Ratio
EAC              2.72
Monkey's Audio   2.72
WinZip           1.32
66/75
EAC (Versatile)
Versatile
  Real-time 2-way communication (low delay mode)
  Storage device (Pocket PC, Xbox)
  Internet streaming
67/75
EAC (Low Delay Mode)
Reducing frame size
Timeslot = 1 frame
Fixed-length timeslot bitstream
Delay = 2 frames
  Ignoring encoding/decoding delay
  Network transmission time (if modem line, delay = 3 frames)
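To turn a delay counted in frames into milliseconds, multiply by the frame duration. The frame size used in the example below (441 samples, i.e. 10 ms at 44.1 kHz) is a hypothetical value chosen for round numbers; the slide does not state the low-delay frame size.

```python
def delay_ms(frames: int, frame_size: int, sample_rate_hz: float) -> float:
    """Delay in milliseconds for a delay expressed as a number of frames."""
    return 1000.0 * frames * frame_size / sample_rate_hz
```

Under that assumed frame size, a 2-frame delay corresponds to 20 ms and a 3-frame delay to 30 ms, which shows why reducing the frame size is the first step in low delay mode.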
68/75
EAC (Low Delay Mode)
[Figure: timeline over frames i-1, i, i+1. Encoder: start encoding frame i (MLT, Quantizer, Entropy), bitstream sent over the network. Decoder: start decoding frame i (Entropy, Quantizer). Output marked "Playable here"]
69/75
EAC - Flexible Bitstream Syntax
Flexible bitstream syntax
  Parser may reassemble the bitstream at 1000x real time
  Change: bit rate, number of audio channels, audio sampling rate
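Since the application bitstream is a subset of the master bitstream, parsing can be pictured as dropping channels and keeping only a prefix of each remaining channel's embedded bitstream. The sketch below shows that idea for one timeslot. The dictionary representation, the channel names as keys, and the per-channel byte budget are assumptions for illustration, and changing the sampling rate is not covered.

```python
from typing import Dict


def parse_timeslot(master: Dict[str, bytes],
                   byte_budget: Dict[str, int]) -> Dict[str, bytes]:
    """Build an application timeslot from a master timeslot.

    master:      channel name -> embedded channel bitstream.
    byte_budget: channel name -> bytes to keep; missing or 0 drops the channel.
    """
    out: Dict[str, bytes] = {}
    for name, payload in master.items():
        keep = byte_budget.get(name, 0)
        if keep > 0:
            out[name] = payload[:keep]   # a prefix of an embedded bitstream stays decodable
    return out
```

For instance, keeping only a prefix of the L+R channel yields a lower-rate mono bitstream, while keeping prefixes of both L+R and L-R yields a lower-rate stereo bitstream. No re-encoding is involved, which is consistent with the parser being much faster than the encoder or decoder.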
70/75
EAC - Software
Software
  Encoder: 8x real time (stereo, 44.1 kHz sampling)
  Decoder: 20x real time
  Parser: 1000x real time
71/75
EAC - Encoder
[Figure: Audio, Encoder, output bitstream (Stereo, 128 kbps) and Companion file]
72/75
EAC - Parser
[Figure: Server holding the Stereo 128 kbps bitstream and the Companion file; the Parser produces: Stereo 16 kbps; Mono 8 kbps; Stereo 16 kbps, slow start; Mono 8 kbps, 11 kHz sampling]
73/75
EAC - Decoder
[Figure: Decoder inputs: Stereo 16 kbps; Mono 8 kbps; Stereo 16 kbps, slow start; Mono 8 kbps, 11 kHz sampling]
74/75
Comparison
[Demo clips: Original, MP4 TwinVQ, WMA, EAC, MP3]
75/75
Conclusions
An embedded audio coder is developed
  Highly efficient
  Versatile: low delay, constant bitrate, streaming
  Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
  Good prototype available: real-time execution, small memory footprint
5275
Companion FileG
loba
l H
eade
r Timeslot
Head R-D curve
Timeslot
Head R-D curve
Timeslot
Head R-D curve
5375
Rate-Distortion Optimized Assembling (Single Timeslot)
D1
R1
D2
R2
D3
R3
D4
R4
D1
R1
D2
R2
D3
R3
D4
R4
r1 r2
r3 r4
5475
Rate-Distortion Optimized Assembling (Multiple Timeslots)
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
Buffer-Occupancy Curve
5575
Allocated Bytes Per Timeslots
Allocated bytes for a certain timeslot Bi = Bufi-1 ndash Bufi + Ratetrans Time
Where Bi allocated bytes for timeslot i
Bufi buffer occupancy level at timeslot i
Ratetrans coding (network) rate per second Time time duration of the timeslot
5675
Optimization
Given Initial buffer occupancy level Final buffer occupancy level ( or intermediate
level with a sliding window ) Buffer occupancy constraint Search for the allocated of bytes for the
current timeslot
5775
Search (R-D slope)B
uffe
r O
ccup
ancy
(B
ytes
)
Time (timeslots)
Illegal Region
Illegal Region
Underflow (too many bytes)
Overflow (too few bytes)
Wastebytes
5875
Multiple Timeslots ndash Constant Bitrate
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
5975
Multiple Timeslots ndash Internet Streaming (Slow Start)
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
Buffer-Occupancy Curve
6075
Multiple Timeslots ndash Internet Streaming (Normal)
Buf
fer
Occ
upan
cy (
Byt
es)
Time (timeslots)
Illegal Region
Illegal Region
6175
Modular Software Design
MLT(SW) Quantizer Entropy coder
BitstreamAssembly
MLT(SW) Quantizer Entropy coder
BitstreamAssembly
Audio
Bitstream
L+R(or mono)
L-R
6275
Modular Software Design
Highly modularized pipeline design Quantizer entropy coder can be used for imagevideo
compression as well Probe and data input can be inserted into any part of the
program
Data flow driven (with necessary memory regulator ltbuffergt) No long delay No need for large memory
Memory and computation efficient Working memory preallocated
6375
Experimental Results
6475
EAC ndash Highly Efficient (NMR)
Results based on the average of 16 MPEG4 test clipsThe smaller the NMR the better
669568280-22EAC
847556325040WMA
748700571448MP4TwinVQ
8kbps16kbps32kbps48kbpsCodec
6575
EAC ndash Lossless
Results based on the average of 16 MPEG4 test clips
132WinZip
272Monkeyrsquos Audio
272EAC
Compression RatioCodec
6675
EAC (Versatile)
Versatile Real time 2-way communication (Low delay
mode) Storage device (Pocket PC Xbox) Internet streaming
6775
EAC (Low Delay Mode)
Reducing frame size
Timeslot = 1 frame
Fixed length timeslot bitstream
Delay = 2 frame Ignore encodingdecoding delay) Network transmission time (if modem line
delay = 3 frames )
6875
EAC (Low Delay Mode)
Encoder
Frame = i-1 i i+1Start Encoding Frame i
MLT Quantizer Entropy
Bitstream
Start Decoding Frame iEntropy Quantizer
network
Playable here
6975
EAC ndash Flexible Bitstream Syntax
Flexible bitstream syntax Parser may reassemble the bitstream 1000x real
time Change
bit rate of audio channels audio sampling rate
7075
EAC ndash Software
Software Encoder 8x realtime (Stereo 441kHz
sampling) Decoder 20x realtime Parser 1000x realtime
71/75
EAC - Encoder
[Diagram: Audio -> Encoder -> stereo 128 kbps bitstream plus companion file.]
72/75
EAC - Parser
[Diagram: on the server, the Parser takes the stereo 128 kbps bitstream and the companion file and produces:]
Stereo 16 kbps
Mono 8 kbps
Stereo 16 kbps, slow start
Mono 8 kbps, 11 kHz sampling
73/75
EAC - Decoder
[Diagram: the Decoder plays each of the parsed bitstreams:]
Stereo 16 kbps
Mono 8 kbps
Stereo 16 kbps, slow start
Mono 8 kbps, 11 kHz sampling
74/75
Comparison
Clips compared: Original, MP3, MP4 TwinVQ, WMA, EAC
75/75
Conclusions
An embedded audio coder has been developed.
Highly efficient
Versatile: low delay, constant bitrate, streaming
Flexible bitstream: parsing for bitrate, number of audio channels, audio sampling rate
Good prototype available: real-time execution, small memory footprint
53/75
Rate-Distortion Optimized Assembling (Single Timeslot)
[Figure: four rate-distortion plots with axes D1/R1, D2/R2, D3/R3, D4/R4 (drawn twice), with the selected rates r1, r2, r3, r4 marked.]
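The slide gives only the picture, not the procedure. One common way to choose the rates r1..r4, assumed here rather than taken from the source, is to truncate every embedded unit at the same rate-distortion slope and bisect on that slope until the total fits the timeslot budget. The function names, the list-of-points representation of each R-D curve, and the sample numbers are hypothetical.

```python
def truncate_at_slope(points, slope):
    """Pick one unit's truncation point for a given slope threshold.

    points: [(rate_bytes, distortion), ...] on a convex, decreasing R-D curve,
    starting at rate 0. A segment is kept while it removes at least `slope`
    units of distortion per byte.
    """
    chosen = points[0]
    for prev, cur in zip(points, points[1:]):
        gain = (prev[1] - cur[1]) / (cur[0] - prev[0])
        if gain < slope:
            break
        chosen = cur
    return chosen


def assemble(units, budget, iters=50):
    """Bisect on the common slope so the summed rates fit within `budget` bytes."""
    steepest = max((p[1] - c[1]) / (c[0] - p[0])
                   for pts in units for p, c in zip(pts, pts[1:]))
    lo, hi = 0.0, steepest + 1.0   # at `hi` nothing is kept, so it always fits
    for _ in range(iters):
        mid = (lo + hi) / 2
        total = sum(truncate_at_slope(pts, mid)[0] for pts in units)
        if total > budget:
            lo = mid   # too many bytes: demand a steeper slope
        else:
            hi = mid   # fits: try accepting shallower segments
    return [truncate_at_slope(pts, hi)[0] for pts in units]


unit1 = [(0, 100), (10, 40), (20, 20), (30, 15)]
unit2 = [(0, 80), (10, 50), (20, 30), (30, 25)]
rates = assemble([unit1, unit2], budget=40)
print(rates)
```

With a common slope, no byte moved from one unit to another could lower the total distortion, which is the usual argument for equal-slope allocation on convex curves.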
54/75
Rate-Distortion Optimized Assembling (Multiple Timeslots)
[Figure: buffer-occupancy curve. Axes: Buffer Occupancy (Bytes) vs. Time (timeslots). Two areas are marked "Illegal Region".]
55/75
Allocated Bytes Per Timeslot
Allocated bytes for a certain timeslot:
$B_i = Buf_{i-1} - Buf_i + Rate_{trans} \times Time$
where
$B_i$: allocated bytes for timeslot $i$
$Buf_i$: buffer occupancy level at timeslot $i$
$Rate_{trans}$: coding (network) rate per second
$Time$: time duration of the timeslot
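The allocation rule can be exercised in a few lines. This is a minimal sketch, not the EAC implementation: the function names `allocated_bytes` and `buffer_trajectory` and the example numbers (a 2000 byte/s channel, 0.5 s timeslots) are illustrative assumptions; only the relation between B_i, Buf_{i-1}, Buf_i, Rate_trans and Time comes from the slide.

```python
def allocated_bytes(buf_prev, buf_next, rate_trans, time):
    """B_i = Buf_{i-1} - Buf_i + Rate_trans * Time."""
    return buf_prev - buf_next + rate_trans * time


def buffer_trajectory(buf_start, allocations, rate_trans, time):
    """Replay the same rule forward: Buf_i = Buf_{i-1} + Rate_trans * Time - B_i."""
    levels = [buf_start]
    for b in allocations:
        levels.append(levels[-1] + rate_trans * time - b)
    return levels


RATE = 2000   # hypothetical channel rate, bytes per second
SLOT = 0.5    # hypothetical timeslot duration, seconds

# Holding the buffer level constant spends exactly what arrives in the timeslot.
steady = allocated_bytes(4000, 4000, RATE, SLOT)
levels = buffer_trajectory(4000, [1000, 1500, 500], RATE, SLOT)
print(steady, levels)
```

Spending more than the inflow in a timeslot lowers the buffer level, and spending less raises it, which is what the buffer-occupancy curves in the surrounding slides plot.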
56/75
Optimization
Given:
Initial buffer occupancy level
Final buffer occupancy level (or an intermediate level with a sliding window)
Buffer occupancy constraint
Search for the number of bytes allocated to the current timeslot.
57/75
Search (R-D slope)
[Figure: Buffer Occupancy (Bytes) vs. Time (timeslots). Two areas are marked "Illegal Region". Annotations: "Underflow (too many bytes)", "Overflow (too few bytes)", "Waste bytes".]
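The underflow and overflow labels follow from the buffer update Buf_i = Buf_{i-1} + Rate_trans × Time - B_i: spending too many bytes pulls the level below the legal band, and spending too few pushes it above. A minimal sketch, assuming the two illegal regions can be modeled as fixed lower and upper bounds `buf_min` and `buf_max` (hypothetical parameters; the slides show the regions only graphically):

```python
def byte_range(buf_prev, rate_trans, time, buf_min, buf_max):
    """Smallest and largest B_i that keep Buf_i inside [buf_min, buf_max]."""
    inflow = rate_trans * time
    return buf_prev + inflow - buf_max, buf_prev + inflow - buf_min


def classify(b, buf_prev, rate_trans, time, buf_min, buf_max):
    """Label an allocation of `b` bytes for the current timeslot."""
    low, high = byte_range(buf_prev, rate_trans, time, buf_min, buf_max)
    if b > high:
        return "underflow"   # too many bytes
    if b < low:
        return "overflow"    # too few bytes; the surplus would be wasted
    return "ok"


# Hypothetical numbers: level 4900 bytes, 2000 bytes/s, 0.5 s slot, band [0, 5000].
args = (4900, 2000, 0.5, 0, 5000)
print(byte_range(*args), classify(500, *args), classify(1000, *args))
```

A search over the R-D slope can use `byte_range` to reject any slope whose byte count for the current timeslot falls outside the legal interval.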
58/75
Multiple Timeslots - Constant Bitrate
[Figure: Buffer Occupancy (Bytes) vs. Time (timeslots). Two areas are marked "Illegal Region".]