
Part II (MPEG-4) Audio TSBK01 Image Coding and Data Compression Lecture 11, 2003 Jörgen Ahlberg.

Date post: 02-Apr-2015
Transcript
Page 1: Part II (MPEG-4) Audio

TSBK01 Image Coding and Data Compression

Lecture 11, 2003

Jörgen Ahlberg

Page 2: MPEG-4 Audio - Outline

Psycho-acoustic models

Overview of MPEG-4 Audio

AAC - Advanced Audio Coding

Specialized coders

Synthetic (structured) audio

Page 3: Psycho-acoustic models

A psycho-acoustic model describes how humans perceive sound.

In the compression context, the main role of the psycho-acoustic model is to tell us which parts of the sound we can remove.

Page 4: Hearing Threshold

[Figure: hearing threshold curve. Vertical axis: dB (0 to 40). Horizontal axis: kHz (2 to 12). Annotation on the region below the curve: "Will not be heard anyway; discard!"]
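The slide shows the threshold only as a curve. As a rough sketch of how a coder could use it, the code below evaluates Terhardt's widely used approximation of the threshold in quiet. That formula is an assumption brought in for illustration and is not given in the lecture. The function names are hypothetical and levels are taken to be in dB SPL.

```python
import math


def hearing_threshold_db(freq_hz):
    """Approximate threshold in quiet (dB SPL) using Terhardt's formula.

    Assumption: this formula is not from the slides; it is a common
    textbook approximation of the curve the slide draws.
    """
    f = freq_hz / 1000.0  # the formula works in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def is_audible(freq_hz, level_db):
    """True if a pure tone at this level lies above the threshold in quiet."""
    return level_db > hearing_threshold_db(freq_hz)


if __name__ == "__main__":
    for freq in (100, 1000, 3300, 12000):
        print(freq, "Hz:", round(hearing_threshold_db(freq), 1), "dB SPL")
```

A component for which `is_audible` returns false is a candidate for the "discard" region on the slide. A real coder would also account for masking by other components.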

Page 5: Frequency Masking

[Figure: energy versus frequency.]

Page 6: Frequency Masking

[Figure: energy versus frequency.]

Page 7: Temporal Masking

[Figure: energy versus time around a strong sound (”masker”).]

Forward (post) masking: approx. 100 ms

Backward (pre) masking: < 10 ms
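The two durations on this slide can be read as a time window around the masker. The sketch below only checks whether a weaker sound falls inside that window. Whether it is actually masked also depends on the levels involved, which the slide does not quantify. The function name and the hard window edges are illustrative assumptions.

```python
FORWARD_MASKING_MS = 100.0   # "approx. 100 ms" from the slide
BACKWARD_MASKING_MS = 10.0   # "< 10 ms" from the slide


def may_be_masked(offset_ms):
    """Return True if a weak sound falls in the masker's temporal window.

    offset_ms is the time of the weak sound relative to the masker:
    negative means it occurs before the masker (backward masking),
    positive means after it (forward masking).
    """
    return -BACKWARD_MASKING_MS < offset_ms <= FORWARD_MASKING_MS


if __name__ == "__main__":
    for offset in (-20, -5, 50, 150):
        print(offset, "ms:", may_be_masked(offset))
```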

Page 8: Psycho-acoustic Model: Demo

Music without distortion

Music with white noise

Music with perceptually distributed noise

Page 9: Parts of MPEG-4 Audio

General natural audio
– AAC
  – BSAC
  – TwinVQ
– HILN (parametric)

Natural speech
– CELP
– HVXC (parametric)

Synthetic audio
– TTS
– SAOL
– SASL

Composition
– Mixing
– Re-sampling
– 3D-rendering

Page 10: Parts of MPEG-4 Audio (cont.)

Error Protection
– CRC
– FEC
  – Block code
  – Convolutional code
– Interleaving

Error Resilience
– Error resilient bitstreams
– Error concealment
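Of the error protection tools listed above, interleaving is the easiest to illustrate. The sketch below is a generic block interleaver: symbols are written row by row and read column by column. It is not the MPEG-4 bitstream procedure, and the function names and the 3 x 4 layout are assumptions chosen for the example.

```python
def interleave(symbols, rows, cols):
    """Write symbols row by row into a rows x cols grid, read column by column."""
    if len(symbols) != rows * cols:
        raise ValueError("symbols must fill the grid exactly")
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(symbols, rows, cols):
    """Invert interleave(): restore the original row-by-row order."""
    if len(symbols) != rows * cols:
        raise ValueError("symbols must fill the grid exactly")
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]


if __name__ == "__main__":
    data = list(range(12))
    sent = interleave(data, rows=3, cols=4)
    print(sent)
    print(deinterleave(sent, rows=3, cols=4))
```

A burst that corrupts three consecutive transmitted symbols lands in three different rows after de-interleaving, so a per-row FEC code only has to correct one error per row.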

Page 11: Natural Audio Coders

[Figure: quality (Cellular, Telephone, AM, FM, CD) versus bitrate (2, 4, 8, 16, 32, 64 kbit/s), showing the operating regions of four coder types: parametric speech (HVXC), high quality speech (CELP), general audio (AAC, TwinVQ) and parametric audio (HILN).]
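The operating regions in this figure can be summarized as a lookup table. The sketch below uses the bitrate ranges quoted on the individual coder slides of this lecture. The table and function names are hypothetical, the AAC range is per channel, and HVXC's 1.2 kbit/s variable rate mode is left out.

```python
# Bitrate ranges in kbit/s, as quoted on the coder slides of this lecture.
CODER_RANGES_KBITS = {
    "HVXC": (2, 4),     # parametric speech
    "CELP": (4, 24),    # high quality speech
    "HILN": (4, 16),    # parametric audio
    "TwinVQ": (6, 18),  # general audio, low bitrates
    "AAC": (16, 64),    # general audio, per channel
}


def candidate_coders(bitrate_kbits):
    """Return the coders whose quoted range contains the given bitrate."""
    return sorted(
        name
        for name, (low, high) in CODER_RANGES_KBITS.items()
        if low <= bitrate_kbits <= high
    )


if __name__ == "__main__":
    for rate in (2, 6, 16, 32):
        print(rate, "kbit/s:", candidate_coders(rate))
```

Which candidate is the right choice still depends on the content (speech or music), as the figure's quality axis suggests.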

Page 12: MPEG-2/4 AAC: Advanced Audio Coding

MDCT-based (modified DCT) time/frequency coder.

Typically 16 – 64 kbit/s/channel.

”Expert listener quality” at 128 kbit/s.

Added to MPEG-2, but without MPEG-4 features.

About half the bitrate of mp3 for comparable quality, mainly due to an improved psycho-acoustic model.

Demo clips (Haydn, Tracy Chapman):

Channels   kbit/s   kHz
Mono       16       16
Stereo     32       16
Stereo     64       32
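AAC's time/frequency transform is an MDCT (modified DCT) applied to overlapping, windowed blocks. The sketch below is a direct, unoptimized MDCT/IMDCT pair with a sine window and overlap-add. It is meant only to show that the overlapping transform reconstructs the signal exactly when nothing is quantized. The tiny block size and all function names are assumptions for illustration, not AAC's actual configuration.

```python
import math


def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def mdct(block, window):
    """Map 2N windowed time samples to N MDCT coefficients."""
    N = len(block) // 2
    return [
        sum(window[n] * block[n]
            * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]


def imdct(coeffs, window):
    """Map N coefficients back to 2N windowed, time-aliased samples."""
    N = len(coeffs)
    return [
        window[n] * (2.0 / N)
        * sum(coeffs[k]
              * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
              for k in range(N))
        for n in range(2 * N)
    ]


def analysis_synthesis(signal, N):
    """Transform 50% overlapping blocks and overlap-add the inverses."""
    window = sine_window(N)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * N + 1, N):
        block = signal[start:start + 2 * N]
        rebuilt = imdct(mdct(block, window), window)
        for n, value in enumerate(rebuilt):
            out[start + n] += value
    return out


if __name__ == "__main__":
    signal = [math.sin(0.3 * n) for n in range(64)]
    rebuilt = analysis_synthesis(signal, N=8)
    # Samples covered by two blocks are reconstructed; the edges are not.
    print(max(abs(a - b) for a, b in zip(signal[8:56], rebuilt[8:56])))
```

In a real coder the coefficients would be quantized between `mdct` and `imdct`, with the psycho-acoustic model deciding how coarsely.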

Page 13: MPEG-4 Extensions to the AAC

TwinVQ (Transform-domain Weighted Interleave Vector Quantization)

– Improves performance for low bitrates (6-18 kbit/s).

PNS (Perceptual Noise Substitution)

– Allows coding ”noise-like” parts parametrically.

LTP (Long-term prediction)

– Allows ”tone-like” parts to be coded with higher accuracy at a lower bitrate.
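The idea behind PNS can be sketched in a few lines: for a band judged noise-like, the encoder sends only the band's energy, and the decoder substitutes random noise scaled to that energy. The code below is a minimal illustration under that assumption. It does not follow the MPEG-4 bitstream syntax, and the function names are hypothetical.

```python
import math
import random


def pns_encode(band):
    """Replace a noise-like band of coefficients by a single number: its energy."""
    return sum(c * c for c in band)


def pns_decode(energy, size, rng):
    """Generate `size` random coefficients whose total energy equals `energy`."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(size)]
    noise_energy = sum(n * n for n in noise)
    scale = math.sqrt(energy / noise_energy)
    return [n * scale for n in noise]


if __name__ == "__main__":
    band = [0.3, -0.1, 0.2, -0.4]
    energy = pns_encode(band)
    decoded = pns_decode(energy, len(band), random.Random(0))
    print(energy, sum(d * d for d in decoded))
```

The decoded coefficients differ from the originals sample by sample, but carry the same energy, which is what matters perceptually for noise-like content.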

Page 14: MPEG-4 Extensions to the AAC

BSAC (Bit-sliced Arithmetic Coder)

– Adds scalability to the bitstream.
– 16 – 64 kbit/s in steps of 1 kbit/s.

Demo: 60, 40 and 20 kbit/s
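The scalability comes from sending the quantized values bit plane by bit plane, most significant first, so that a decoder can stop after any plane and still get a coarser version. The sketch below shows only this bit-slicing step for non-negative integers. The arithmetic coding stage is omitted, and the function names are illustrative assumptions.

```python
def to_bit_planes(values, num_bits):
    """Slice non-negative integers into bit planes, most significant first."""
    return [[(v >> bit) & 1 for v in values]
            for bit in range(num_bits - 1, -1, -1)]


def from_bit_planes(planes, num_bits, count):
    """Rebuild values from the planes received so far (missing planes = 0)."""
    values = [0] * count
    for index, plane in enumerate(planes):
        shift = num_bits - 1 - index
        for position, bit in enumerate(plane):
            values[position] |= bit << shift
    return values


if __name__ == "__main__":
    quantized = [5, 12, 3, 9]
    planes = to_bit_planes(quantized, num_bits=4)
    print(from_bit_planes(planes, 4, 4))      # all planes: exact values
    print(from_bit_planes(planes[:2], 4, 4))  # top two planes: coarse values
```

Truncating the list of planes corresponds to lowering the bitrate of the stream without re-encoding.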

Page 15: Other MPEG-4 Natural Audio Coders

Speech coders
– High bitrate speech coder (CELP)
– Low bitrate speech coder (HVXC)

HILN low bitrate parametric coder
– Harmonic and Individual Lines plus Noise
– 4 - 16 kbit/s
– Subband coder that codes each subband as a tone or as shaped noise.

Page 16: MPEG-4 High Bitrate Speech Coder

High quality CELP coder.

8 or 16 kHz sampling (NB or WB mode).

4 – 24 kbit/s.

Demo clips: PCM (uncompressed), 16 kbit/s, 24 kbit/s

[Figure: Basic principle of CELP coder. Labels: codebook index k, gain g_k, x_k(n), LPC filter, perceptual weighting filter, s(n), e(n).]
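The CELP principle in the figure is an analysis-by-synthesis search: each codebook vector is passed through the LPC synthesis filter, the best gain is computed, and the index with the smallest error is transmitted. The sketch below follows that standard reading of the figure labels (s(n) as target, x_k(n) as codebook vector, g_k as gain, e(n) as error). That reading is an assumption, the perceptual weighting filter is omitted, and the function names and toy codebook are hypothetical.

```python
def lpc_synthesize(excitation, a):
    """All-pole LPC synthesis: y[n] = x[n] - sum_i a[i] * y[n - 1 - i]."""
    out = []
    for n, x in enumerate(excitation):
        feedback = sum(a[i] * out[n - 1 - i]
                       for i in range(len(a)) if n - 1 - i >= 0)
        out.append(x - feedback)
    return out


def celp_search(target, codebook, a):
    """Return (index k, gain g_k, squared error) of the best codebook entry."""
    best = None
    for k, code in enumerate(codebook):
        synth = lpc_synthesize(code, a)
        energy = sum(v * v for v in synth)
        if energy == 0.0:
            continue  # an all-zero vector cannot match anything
        # Least-squares optimal gain for this codebook vector.
        gain = sum(s * v for s, v in zip(target, synth)) / energy
        error = sum((s - gain * v) ** 2 for s, v in zip(target, synth))
        if best is None or error < best[2]:
            best = (k, gain, error)
    return best


if __name__ == "__main__":
    codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 1, -1]]
    a = [-0.5]  # one-tap filter: y[n] = x[n] + 0.5 * y[n - 1]
    target = [2.0 * v for v in lpc_synthesize(codebook[2], a)]
    print(celp_search(target, codebook, a))
```

Only the index k, the gain and the filter coefficients need to be sent, which is where the low bitrate comes from.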

Page 17: MPEG-4 Low Bitrate Speech Coder

HVXC – Harmonic Vector eXcitation Coder.

8 kHz sampling, 2 – 4 kbit/s.

Down to 1.2 kbit/s in variable rate mode.

Sinusoidal coding for voiced parts and CELP coding for unvoiced parts.

HVXC can be combined with HILN.
– Automatic switching between the coders
– Produces one bitstream.

Page 18: MPEG-4 Natural Audio Coders: Demo

[Demo grid: audio clips for three inputs (speech, simple music, complex music), each available as original audio and coded with:
– Music coder (TwinVQ), 6 kbit/s
– Music coder (HILN), 6 kbit/s
– Speech coder (CELP), 6 kbit/s
– Speech coder (HVXC), 2 kbit/s]

Page 19: Speed Change

Possibility to decode at arbitrary speed, without changing the pitch.

Original

Music ~20% faster

Page 20: Synthetic Audio

TTS – Text-To-Speech
– MPEG-4 defines an interface, not the TTS itself

SAOL - Structured Audio Orchestra Language
– SAOL describes how to generate instruments

SASL - Structured Audio Score Language
– SASL describes which instruments to play when
– MIDI is a subset of SASL

Demo:
– Orchestra: Initially 80 kB instrument descriptions (SAOL)
– While playing: 1 kbit/s (SASL)
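The demo figures invite a quick calculation: the 80 kB of instrument descriptions is a one-time cost, after which the score needs only 1 kbit/s. The sketch below computes the average bitrate over a given playing time and the point at which structured audio becomes cheaper than a constant-rate stream. It assumes 1 kB = 1000 bytes, and the 16 kbit/s reference stream is picked only as an example. The names are hypothetical.

```python
SAOL_INIT_BYTES = 80_000  # "80 kB" from the slide, assuming 1 kB = 1000 bytes
SASL_RATE_BPS = 1_000     # "1 kbit/s" from the slide


def total_bits(seconds):
    """Bits sent after `seconds` of playback: one-time setup plus the score."""
    return SAOL_INIT_BYTES * 8 + SASL_RATE_BPS * seconds


def average_bitrate_bps(seconds):
    """Average bitrate over the whole session, setup included."""
    return total_bits(seconds) / seconds


def break_even_seconds(reference_bps):
    """Playing time after which structured audio has sent fewer bits
    than a constant-rate stream at `reference_bps`."""
    if reference_bps <= SASL_RATE_BPS:
        raise ValueError("reference rate must exceed the score rate")
    return SAOL_INIT_BYTES * 8 / (reference_bps - SASL_RATE_BPS)


if __name__ == "__main__":
    print(break_even_seconds(16_000))  # seconds, versus a 16 kbit/s stream
    print(average_bitrate_bps(640))    # bit/s averaged over 640 s
```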

Page 21: BIFS – Binary Format for Scene Description

All the sound you hear is coded at 16 kbit/s.

Initial voice coded using TTS.

Current voice coded using parametric speech coder (HVXC).

Background ”music” coded using Structured Audio.

Post-production specified using BIFS, using the Structured Audio tools.

Page 22: A Scene Graph

[Figure: scene graph with the nodes AudioMix (mix the sounds), AudioFX (add reverb) and two AudioSource nodes: hand claps (SA decoder) and speech (CELP coder).]
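A scene graph like this can be modelled as a small tree in which each node renders its children and then applies its own operation. The sketch below borrows the node names from the figure, but the classes are hypothetical Python stand-ins and not the BIFS node definitions. The connectivity (reverb applied to the hand claps only) is an assumption, and a single echo stands in for the reverb.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AudioSource:
    """Leaf node: decoded samples from some decoder (SA, CELP, ...)."""
    samples: List[float]

    def render(self) -> List[float]:
        return list(self.samples)


@dataclass
class AudioFX:
    """Applies an effect function to the output of one child node."""
    child: object
    effect: Callable[[List[float]], List[float]]

    def render(self) -> List[float]:
        return self.effect(self.child.render())


@dataclass
class AudioMix:
    """Sums the outputs of its children, sample by sample."""
    children: list

    def render(self) -> List[float]:
        rendered = [child.render() for child in self.children]
        length = max(len(r) for r in rendered)
        return [sum(r[i] for r in rendered if i < len(r))
                for i in range(length)]


def echo(samples, delay=2, gain=0.5):
    """Crude stand-in for reverb: add one delayed, attenuated copy."""
    out = list(samples) + [0.0] * delay
    for i, s in enumerate(samples):
        out[i + delay] += gain * s
    return out


if __name__ == "__main__":
    claps = AudioSource([1.0, 0.0, 0.0])
    speech = AudioSource([0.5, 0.5, 0.5])
    scene = AudioMix([AudioFX(claps, echo), speech])
    print(scene.render())
```

The point of the design is that each source can use whichever decoder suits it, while mixing and effects are described once in the scene and executed at the receiver.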

Page 23:

[Figure: larger scene graph built from AudioMix, AudioFX and AudioDelay nodes over three AudioSource nodes: Piano, Bass (SA), Finger snaps.]

Page 24: That was the last slide!

