Part II (MPEG-4) Audio TSBK01 Image Coding and Data Compression Lecture 11, 2003 Jörgen Ahlberg.

MPEG-4 Audio - Outline

Psycho-acoustic models

Overview of MPEG-4 Audio

AAC - Advanced Audio Coding

Specialized coders

Synthetic (structured) audio

Psycho-acoustic models

A psycho-acoustic model describes how humans perceive sound.

In the compression context, the main use of the psycho-acoustic model is that it tells us which parts of the signal can be removed without being heard.

Hearing Threshold

[Figure: hearing threshold curve; axis ticks 2, 4, 6, 8, 10, 12]

Sound below the threshold will not be heard anyway; discard!
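The thresholding idea can be sketched in a few lines. The slide does not give a formula for the curve, so the sketch below assumes Terhardt's widely used approximation of the threshold in quiet (in dB SPL); the function names and the example components are hypothetical.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate absolute hearing threshold in dB SPL (Terhardt's formula)."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def audible_components(components):
    """Keep only (freq_hz, level_db) pairs that lie above the threshold."""
    return [(f, level) for f, level in components
            if level > threshold_in_quiet_db(f)]

# Hypothetical spectral components: (frequency in Hz, level in dB SPL).
components = [(100, 10.0), (1000, 10.0), (3300, 0.0), (16000, 30.0)]
kept = audible_components(components)
```

With these example values the 100 Hz and 16 kHz components fall below the assumed threshold and are dropped, while the 1 kHz and 3.3 kHz components are kept.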

Frequency Masking

[Figures: energy versus frequency]

Temporal Masking

[Figure: energy versus time around a strong sound (”masker”)]

Forward (post) masking: approx. 100 ms

Backward (pre) masking: < 10 ms
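As a rough illustration of the two time windows, the sketch below classifies a weak sound only by its time offset from the masker. This is a deliberate simplification: real masking also depends on the levels and frequencies of both sounds, which are ignored here. The function and constant names are hypothetical.

```python
FORWARD_MASKING_MS = 100.0   # post-masking window from the slide (approximate)
BACKWARD_MASKING_MS = 10.0   # pre-masking window from the slide (upper bound)

def temporal_masking(masker_time_ms, probe_time_ms):
    """Classify a weak sound relative to a strong masker by time offset only."""
    offset = probe_time_ms - masker_time_ms
    if 0.0 <= offset <= FORWARD_MASKING_MS:
        return "forward"
    if -BACKWARD_MASKING_MS < offset < 0.0:
        return "backward"
    return None  # outside both windows: not masked in time
```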

Psycho-acoustic Model: Demo

Music without distortion

Music with white noise

Music with perceptually distributed noise

Parts of MPEG-4 Audio

General natural audio
– AAC
  – BSAC
  – TwinVQ
– HILN (parametric)

Natural speech
– CELP
– HVXC (parametric)

Synthetic audio
– TTS
– SAOL
– SASL

Composition
– Mixing
– Re-sampling
– 3D-rendering

Parts of MPEG-4 Audio (cont.)

Error Protection
– CRC
– FEC
  – Block code
  – Convolutional code
– Interleaving

Error Resilience
– Error resilient bitstreams
– Error concealment
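To show why interleaving helps against burst errors, here is a minimal sketch of a generic block interleaver. It is not the interleaving scheme specified by MPEG-4; the block size and the example data are assumptions chosen for illustration.

```python
def interleave(symbols, rows, cols):
    """Write row by row into a rows x cols block, read column by column."""
    assert len(symbols) == rows * cols
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(symbols, rows, cols):
    """Invert interleave(): restore the original row-by-row order."""
    assert len(symbols) == rows * cols
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]

data = list(range(12))                 # 3 codewords of 4 symbols each
sent = interleave(data, rows=3, cols=4)

# A burst error destroys 3 consecutive transmitted symbols.
received = list(sent)
for i in range(3, 6):
    received[i] = None

restored = deinterleave(received, rows=3, cols=4)
# Count damaged symbols per codeword (row) after de-interleaving.
errors_per_codeword = [restored[r * 4:(r + 1) * 4].count(None) for r in range(3)]
```

After de-interleaving, the burst is spread out so that each codeword contains a single damaged symbol, which an FEC code has a much better chance of correcting than three errors in one codeword.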

Natural Audio Coders

[Figure: coders plotted against bitrate (2, 4, 8, 16, 32, 64 kbit/s); figure labels: Quality, Cellular, Telephone]

– Parametric speech (HVXC)
– High quality speech (CELP)
– General audio (AAC, TwinVQ)
– Parametric audio (HILN)

MPEG-2/4 AAC: Advanced Audio Coding

Time/frequency coder based on the modified DCT (MDCT).

Typically 16 – 64 kbit/s/channel.

”Expert listener quality” at 128 kbit/s.

Added to MPEG-2, but without MPEG-4 features.

About half the bitrate of mp3 for comparable quality, mainly due to an improved psycho-acoustic model.

Demo (audio samples: Haydn, Tracy Chapman):

Channels   kbit/s   kHz
Mono       16       16
Stereo     32       16
Stereo     64       32
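The time/frequency transform behind this kind of coder can be illustrated with a small MDCT round trip. The sketch below is a direct O(N²) implementation with a sine window and 50% overlap. The block size, window choice and function names are assumptions for illustration; block switching, quantization and the actual AAC filterbank details are left out.

```python
import math

def mdct(block):
    """MDCT of a windowed block of 2N samples -> N coefficients."""
    n_half = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                for n in range(2 * n_half))
            for k in range(n_half)]

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples."""
    n_half = len(coeffs)
    return [sum(coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
                for k in range(n_half)) * 2.0 / n_half
            for n in range(2 * n_half)]

def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 for length 2N."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct_roundtrip(signal, n_half):
    """Analyse with 50% overlapping blocks, then overlap-add the inverse."""
    window = sine_window(2 * n_half)
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        block = [signal[start + n] * window[n] for n in range(2 * n_half)]
        coeffs = mdct(block)           # N coefficients per hop of N samples
        rebuilt = imdct(coeffs)
        for n in range(2 * n_half):
            output[start + n] += rebuilt[n] * window[n]
    return output

signal = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(64)]
output = mdct_roundtrip(signal, n_half=8)
# Samples covered by two blocks are reconstructed; the first and last
# half-block have no overlap partner and are not.
max_error = max(abs(a - b) for a, b in zip(signal[8:56], output[8:56])
)
```

Each block of 2N samples yields only N coefficients, yet the aliasing introduced in one block is cancelled by its neighbours during overlap-add, so the transform is critically sampled and still invertible. In a real coder the coefficients would be quantized under control of the psycho-acoustic model between `mdct` and `imdct`.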

MPEG-4 Extensions to the AAC

TwinVQ (Transform-domain Weighted Interleave Vector Quantization)

– Improves performance for low bitrates (6 – 18 kbit/s).

PNS (Perceptual Noise Substitution)

– Allows coding ”noise-like” parts parametrically.

LTP (Long-term prediction)

– Allows ”tone-like” parts to be coded with higher accuracy at a lower bitrate.
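The idea behind PNS can be sketched as follows: for a band judged to be noise-like, the encoder sends only the band's energy, and the decoder fills the band with locally generated noise scaled to that energy. The sketch below is a simplified illustration, not the bitstream-level procedure; the band size, the noise generator and the function names are assumptions.

```python
import math
import random

def pns_encode(band):
    """Encoder side: replace a noise-like band by a single energy value."""
    return sum(x * x for x in band)

def pns_decode(energy, band_size, rng):
    """Decoder side: generate random values and scale them to the energy."""
    noise = [rng.uniform(-1.0, 1.0) for _ in range(band_size)]
    noise_energy = sum(x * x for x in noise)
    gain = math.sqrt(energy / noise_energy)
    return [gain * x for x in noise]

rng = random.Random(0)
# Hypothetical noise-like band of 16 spectral coefficients.
band = [rng.gauss(0.0, 1.0) for _ in range(16)]
energy = pns_encode(band)
decoded = pns_decode(energy, len(band), rng)
decoded_energy = sum(x * x for x in decoded)
```

The decoded coefficients differ from the originals, but the band carries the same energy, which is what matters perceptually for noise-like content. One number replaces sixteen coefficients.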

MPEG-4 Extensions to the AAC

BSAC (Bit-sliced Arithmetic Coder)

– Adds scalability to the bitstream.
– 16 – 64 kbit/s in steps of 1 kbit/s.
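Bit slicing is what makes this kind of fine-grained scalability possible: quantized values are sent plane by plane, most significant bits first, so a truncated stream still decodes to a coarser version. The sketch below shows only the slicing and reconstruction; the arithmetic coding stage is omitted, and the example values and function names are hypothetical.

```python
def bit_slices(values, num_bits):
    """Split non-negative integers into bit planes, most significant first."""
    return [[(v >> b) & 1 for v in values] for b in range(num_bits - 1, -1, -1)]

def reconstruct(slices, num_bits):
    """Rebuild values from the slices received so far (missing planes = 0)."""
    values = [0] * len(slices[0]) if slices else []
    for i, plane in enumerate(slices):
        shift = num_bits - 1 - i
        values = [v | (bit << shift) for v, bit in zip(values, plane)]
    return values

quantized = [13, 2, 7, 0, 9]          # hypothetical quantized magnitudes (4 bits)
slices = bit_slices(quantized, num_bits=4)
full = reconstruct(slices, num_bits=4)        # all four planes received
coarse = reconstruct(slices[:2], num_bits=4)  # stream truncated after two planes
```

With all planes the values are recovered exactly; with only the two most significant planes the decoder gets `[12, 0, 4, 0, 8]`, a coarser approximation at a lower bitrate.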

Other MPEG-4 Natural Audio Coders

Speech coders
– High bitrate speech coder (CELP)
– Low bitrate speech coder (HVXC)

HILN low bitrate parametric coder
– Harmonic and Individual Lines plus Noise
– 4 – 16 kbit/s
– Subband coder that codes each subband as a tone or as shaped noise.

MPEG-4 High Bitrate Speech Coder

High quality CELP coder.

8 or 16 kHz sampling (NB or WB mode).

4 – 24 kbit/s.

[Demo: PCM (uncompressed), 16 kbit/s, 24 kbit/s]

[Figure: Basic principle of CELP coder, showing codebook index k, LPC filter and perceptual weighting filter]
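The basic CELP principle can be sketched as an analysis-by-synthesis search: every codebook entry is passed through the LPC synthesis filter, and the encoder transmits the index k (and a gain) of the entry whose output is closest to the target. The sketch below is a minimal illustration that omits the perceptual weighting filter and any adaptive codebook; the codebook, the filter coefficients and the function names are assumptions.

```python
import random

def lpc_synthesis(excitation, lpc):
    """All-pole filter: y[n] = e[n] - sum_i lpc[i] * y[n - 1 - i]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for i, a in enumerate(lpc):
            if n - 1 - i >= 0:
                acc -= a * out[n - 1 - i]
        out.append(acc)
    return out

def celp_search(target, codebook, lpc):
    """Return (index k, gain) minimising the squared synthesis error."""
    best = (None, 0.0, float("inf"))
    for k, code in enumerate(codebook):
        synth = lpc_synthesis(code, lpc)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue
        # Optimal gain for this entry in the least-squares sense.
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if error < best[2]:
            best = (k, gain, error)
    return best[0], best[1]

rng = random.Random(1)
# Hypothetical random codebook: 32 excitation vectors of 40 samples.
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(32)]
lpc = [-0.9, 0.2]  # hypothetical, stable predictor coefficients
# Build a target that entry 5 can reproduce exactly with gain 2.
target = [2.0 * s for s in lpc_synthesis(codebook[5], lpc)]
k, gain = celp_search(target, codebook, lpc)
```

The search recovers index 5 and a gain of about 2 for this constructed target. In a real coder the error would be measured after perceptual weighting, so that quantization noise ends up where it is least audible.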

MPEG-4 Low Bitrate Speech Coder

HVXC – Harmonic Vector eXcitation Coder.

8 kHz sampling, 2 – 4 kbit/s.

Down to 1.2 kbit/s in variable rate mode.

Sinusoidal coding for voiced parts and CELP coding for unvoiced parts.

HVXC can be combined with HILN.
– Automatic switching between the coders.
– Produces one bitstream.

MPEG-4 Natural Audio Coders: Demo

Coders compared:
– Original audio
– Music coder (TwinVQ), 6 kbit/s
– Music coder (HILN), 6 kbit/s
– Speech coder (CELP), 6 kbit/s
– Speech coder (HVXC), 2 kbit/s

Test material:
– Speech
– Simple music
– Complex music

Speed Change

The decoder can play back at an arbitrary speed without changing the pitch.

[Demo: original; music ~20% faster]
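The slide does not say how the speed change is implemented. One way to see why a parametric decoder can do it is the sketch below: the decoder synthesizes sound from per-frame parameters, so rendering each frame with fewer samples shortens the playback while the frequencies stay the same. The frame parameters, sample rate and function names are assumptions.

```python
import math

def synthesize(frames, frame_len, sample_rate=8000):
    """Render (frequency_hz, amplitude) frames as a phase-continuous sinusoid."""
    samples, phase = [], 0.0
    for freq, amp in frames:
        step = 2.0 * math.pi * freq / sample_rate
        for _ in range(frame_len):
            samples.append(amp * math.sin(phase))
            phase += step
    return samples

def count_zero_crossings(samples):
    """Number of sign changes between neighbouring samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if (a < 0.0) != (b < 0.0))

frames = [(200.0, 1.0)] * 50                  # hypothetical decoded parameters
normal = synthesize(frames, frame_len=160)    # 20 ms per frame at 8 kHz
faster = synthesize(frames, frame_len=128)    # shorter frames: faster playback

# Zero crossings per second stay the same, so the pitch is unchanged.
rate_normal = count_zero_crossings(normal) / (len(normal) / 8000)
rate_faster = count_zero_crossings(faster) / (len(faster) / 8000)
```

The faster version is shorter, but both versions show roughly 400 zero crossings per second, as expected for a 200 Hz tone. Simply resampling a waveform, by contrast, would raise the pitch along with the speed.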

Synthetic Audio

TTS – Text-To-Speech
– MPEG-4 defines an interface, not the TTS itself

SAOL - Structured Audio Orchestra Language
– SAOL describes how to generate instruments

SASL - Structured Audio Score Language
– SASL describes which instruments to play when
– MIDI is a subset of SASL

Demo:
– Orchestra: Initially 80 kB instrument descriptions (SAOL)
– While playing: 1 kbit/s (SASL)
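The orchestra/score split can be illustrated with a small Python analogue. This is not SAOL or SASL syntax; it only mirrors the division of labour: the orchestra (sent once) defines how instruments produce sound, and the score (sent while playing) lists which instrument plays when. All names and values are hypothetical.

```python
import math

SAMPLE_RATE = 8000

def sine_instrument(freq_hz, duration_s):
    """A toy instrument: a sine tone with a linear fade-out."""
    n = int(duration_s * SAMPLE_RATE)
    return [(1.0 - i / n) * math.sin(2.0 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]

# "Orchestra": sent once, describes how each instrument makes sound.
orchestra = {"tone": sine_instrument}

# "Score": sent while playing, lists (start_s, instrument, freq_hz, duration_s).
score = [(0.0, "tone", 440.0, 0.5),
         (0.5, "tone", 660.0, 0.5),
         (0.75, "tone", 880.0, 0.25)]

def render(orchestra, score):
    """Mix all score events into one buffer of samples."""
    total = max(start + dur for start, _, _, dur in score)
    out = [0.0] * int(total * SAMPLE_RATE)
    for start, name, freq, dur in score:
        note = orchestra[name](freq, dur)
        offset = int(start * SAMPLE_RATE)
        for i, s in enumerate(note):
            out[offset + i] += s
    return out

audio = render(orchestra, score)
```

The score is only a handful of numbers per note, which is why the running bitrate in the demo can be as low as 1 kbit/s once the instrument descriptions have been delivered.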

BIFS – Binary Format for Scene Description

All the sound you hear is coded at 16 kbit/s.

Initial voice coded using TTS.

Current voice coded using parametric speech coder (HVXC).

Background ”music” coded using Structured Audio.

Post-production specified using BIFS, using the Structured Audio tools.

A Scene Graph

[Figure: scene graph. An AudioMix node (”Mix the sounds”) and an AudioFX node (”Add reverb”) combine two AudioSource nodes: hand claps (SA decoder) and speech (CELP coder).]

[Figure: larger scene graph built from AudioMix, AudioFX and AudioDelay nodes over three AudioSource nodes: piano, bass (SA), finger snaps.]
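How such a scene graph is evaluated can be sketched with a small tree of nodes. The node names follow the slide, but the classes below are a simplified illustration and not the BIFS node definitions. The topology (hand claps through the effect, speech mixed in directly), the echo used as a stand-in for reverb, and the sample values are all assumptions.

```python
class AudioSource:
    """Leaf node: wraps decoded samples from some coder."""
    def __init__(self, samples):
        self.samples = list(samples)

    def render(self):
        return list(self.samples)

class AudioFX:
    """Applies an effect function to the output of one child node."""
    def __init__(self, child, effect):
        self.child, self.effect = child, effect

    def render(self):
        return self.effect(self.child.render())

class AudioMix:
    """Sums the outputs of its children sample by sample."""
    def __init__(self, *children):
        self.children = children

    def render(self):
        rendered = [c.render() for c in self.children]
        length = max(len(r) for r in rendered)
        return [sum(r[i] for r in rendered if i < len(r)) for i in range(length)]

def echo(samples, delay=2, gain=0.5):
    """Crude stand-in for reverb: add one delayed, attenuated copy."""
    out = list(samples) + [0.0] * delay
    for i, s in enumerate(samples):
        out[i + delay] += gain * s
    return out

claps = AudioSource([1.0, 0.0, 0.0, 0.0])    # stand-in for SA decoder output
speech = AudioSource([0.2, 0.2, 0.2, 0.2])   # stand-in for CELP decoder output
scene = AudioMix(AudioFX(claps, echo), speech)
mixed = scene.render()
```

Rendering the root node pulls samples up through the tree, so each source can use the coder that suits it best while the mixing and effects are applied once, at the receiver.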

That was the last slide!
