Roberta Eklund, Consultant
MPEG-4 AUDIO OVERVIEW
MPEG-4 Audio Overview
Natural Audio
– T/F
– CELP
– PARA
Structured Audio
– SAOL
– SASL
– SASBF
– MIDI DLS version 2
TTS
Cross Tool (Algorithm) Functionality
– Pitch/tempo change
– Bitrate scalability
– Computation complexity scalability
– Error robustness
– Audio related effects
– Acoustic virtualization
Different Tools for Different Bitrates/Applications
[Figure: coder operating ranges plotted against bit-rate (kbps, axis from 2 to 64) and typical audio bandwidth (4 kHz, 8 kHz, 20 kHz). Coders shown: Parametric coder, CELP coder, ITU-T coder, T/F coder, Scalable Coder. Application areas shown: Satellite; UMTS, Cellular; DAM, Internet; DCME; ISDN.]
MPEG-4 Audio Tools PROFILES
Object Profile - defines the syntax of the bitstream for one single Object that can represent a meaningful entity in the Audio or Visual scene. Elementary bitstream.
Composition Profile - defines which different Object Profiles can be combined in the Audio or Visual scene. Combinations of Elementary bitstreams.
OBJECT PROFILES
Profile | Hierarchy | Tools supported | Object Profile ID
reserved | | | 0
AAC Main | contains AAC LC | 13818-7 main profile, PNS | 1
AAC LC | | 13818-7 LC profile, PNS | 2
AAC SSR | | 13818-7 SSR profile, PNS | 3
T/F | | 13818-7 LC, PNS, LTP | 4
T/F Main scalable | contains T/F LC scalable | 13818-7 main, PNS, LTP, BSAC, tools for large step scalability (TLSS), core codecs: CELP, TwinVQ, HILN | 5
T/F LC scalable | | 13818-7 LC, PNS, LTP, BSAC, tools for large step scalability (TLSS), core codecs: CELP, TwinVQ, HILN | 6
TwinVQ core | | TwinVQ | 7
CELP | | CELP | 8
HVXC | | HVXC | 9
HILN | | HILN | 10
TTSI | | Text-To-Speech Interface | 11
Main Synthetic | superset of Wavetable Synthesis | all structured audio tools | 12
Wavetable Synthesis | | SASBF, MIDI | 13
reserved | | | 14
reserved | | | 15
Combination Profiles
Combination Profile | Hierarchy | Audio Object Profiles supported
Main | contains Scalable, Speech and Low Rate Synthetic | AAC Main, LC, SSR; T/F, T/F Main Scalable, T/F LC Scalable; TwinVQ core; CELP; HVXC; HILN; Main Synthetic; TTSI
Scalable | contains Speech | T/F LC Scalable; AAC-LC and/or T/F; CELP; HVXC; TwinVQ core; HILN; Wavetable Synthesis; TTSI
Speech | | CELP; HVXC; TTSI
Low Rate Synthesis | | Wavetable Synthesis; TTSI
MPEG-4 Encoder Structure
[Figure: encoder block diagram. Blocks: audio signal, pre-processing, separation, signal analysis and control, PARA core, CELP core, T/F core, multiplex, bit stream.]
MPEG-4 T/F Encoder Configuration
[Figure: T/F encoder block diagram, from input time signal to coded audio stream, with a legend distinguishing data and control paths. Psychoacoustic Model: Perceptual Model, Window Length Decision, Bark Scale to Scalefactor Band Mapping. Spectral Processing: AAC Gain Control Tool, Filterbank, TNS, Spectral Normalization, Prediction, Intensity/Coupling, M/S. Quantization and Coding: AAC Quantization and Coding, Twin VQ, BSAC Quantization and Coding. Output stage: Bitstream Formatter.]
MPEG-4 T/F Decoder Configuration
[Figure: T/F decoder block diagram, from coded audio stream to output time signal, with a legend distinguishing data and control paths. Input stage: Bitstream Formatter. Decoding and Inverse Quantization: AAC Quantization and Coding, Twin VQ, BSAC Quantization and Coding. Spectral Processing: M/S, Intensity/Coupling, Prediction, Spectral Normalization, TNS, Filterbank, AAC Gain Control Tool.]
Block Diagram of CELP Encoder
[Figure: blocks are audio signal, LPC analysis and quantization, excitation signal generator, LPC synthesis filter, spectral weighting filter, error minimization, coding, bitstream.]
Block Diagram of CELP Decoder
[Figure: blocks are bitstream, decoding, excitation signal generator, LPC synthesis filter, audio signal.]
Excitation signal generator:
– codebook
– regular pulse excitation (RPE)
– multi-pulse excitation (MPE)
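The LPC synthesis filter in the CELP decoder can be illustrated with a minimal sketch. It assumes the common all-pole convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and a plain list of excitation samples; the function name and the coefficient values are illustrative assumptions and are not taken from the MPEG-4 bitstream syntax.

```python
def lpc_synthesis(excitation, lpc_coeffs):
    """Run an excitation signal through the all-pole filter 1/A(z).

    Assumed convention: A(z) = 1 + sum_k a_k * z^-k, so
    s[n] = e[n] - sum_k a_k * s[n - k].
    """
    output = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc_coeffs, start=1):
            if n - k >= 0:
                acc -= a * output[n - k]
        output.append(acc)
    return output


# A unit impulse through a one-coefficient filter decays geometrically.
impulse = [1.0, 0.0, 0.0, 0.0]
print(lpc_synthesis(impulse, [-0.5]))  # [1.0, 0.5, 0.25, 0.125]
```

In a real decoder the excitation would come from the codebook, RPE or MPE generator and the coefficients would be updated every frame; both are held fixed here for clarity.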
Block Diagram of PARA Encoder
[Figure: blocks are audio signal, perception model, model based separation, parameter estimation (harmonic components, individual components, noise components), parameter coding (quantization and coding for each component type), multiplex, bitstream.]
Block Diagram of PARA Decoder
[Figure: blocks are bitstream, parameter decoding (harmonic components, individual components, noise components), synthesis, audio signal.]
PARA is Two Codecs in One
Two operating modes
– harmonic and noise components (HVXC): for speech coding at 2...4 kbps
– harmonic and individual sinusoidal components + noise (HILN): for coding of music signals with low complexity content (e.g. single instruments) at 4...16 kbps
Combination of both modes
– support by syntax, defined transition
– automatic mode selector
– cross fade from one signal to another one
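The cross fade between the two modes can be sketched as follows. This is only an illustration assuming a linear ramp over one block of equal-length sample lists; the slides do not say which fade shape or length the codec uses, and the function name is hypothetical.

```python
def crossfade(fade_out, fade_in):
    """Linearly cross fade from one sample block to another.

    Assumption: both blocks have the same length (at least 2 samples) and
    the weight ramps linearly from 0 to 1 across the block.
    """
    if len(fade_out) != len(fade_in):
        raise ValueError("blocks must have the same length")
    n = len(fade_out)
    if n < 2:
        raise ValueError("need at least two samples to fade")
    mixed = []
    for i in range(n):
        w = i / (n - 1)  # 0.0 at the first sample, 1.0 at the last
        mixed.append((1.0 - w) * fade_out[i] + w * fade_in[i])
    return mixed


# Fading from a constant 1.0 (e.g. one mode's output) to silence.
print(crossfade([1.0] * 5, [0.0] * 5))  # [1.0, 0.75, 0.5, 0.25, 0.0]
```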
Text-to-Speech
Phonemic (language-independent) syntax
Prosody, timing cues
Language, dialect, gender, age parameters
Automatic synchronization with FBA
Exact TTS synthesis non-normative; only interface is specified
Structured Audio
Structured Audio - Sound coding using structured descriptions
Structured Audio decoder - music and sound-effect synthesis
MMA, Microsoft, EMU now collaborating on MIDI DLS version 2 in MPEG-4
SAOL
Downloadable BNF synthesis grammar
Header contains descriptions of several synthesizers and effects processors, control algorithms, and routing instructions for audio flow of control
SAOL has 100 primitive processing instructions, signal generators, and operators which fill wavetables with data.
SASL and MIDI
SASL: new format for describing control parameters
– Basically a scheduler of audio events
– Designed to interface well with SAOL
– New control language similar to MIDI
MIDI (Musical Instrument Digital Interface)
– Simpler format for describing control
– Included as alternate control method
– Leverages existing authoring tools
– Gives "backwards compatibility" to SA
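The idea of a score acting as "a scheduler of audio events" can be sketched with a small priority queue. This is not SASL syntax or a Structured Audio decoder; the class, the event tuple layout and the instrument names are hypothetical and serve only to show time-ordered dispatch.

```python
import heapq
from itertools import count


class ScoreScheduler:
    """Toy event scheduler: events are dispatched in time order."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker keeps insertion order stable

    def add(self, time, instrument, *params):
        # Hypothetical event: start time, instrument name, free parameters.
        heapq.heappush(self._heap, (time, next(self._order), instrument, params))

    def run(self):
        dispatched = []
        while self._heap:
            time, _, instrument, params = heapq.heappop(self._heap)
            dispatched.append((time, instrument, params))
        return dispatched


score = ScoreScheduler()
score.add(2.0, "bass", 40)
score.add(0.0, "piano", 60, 0.8)
score.add(1.0, "snaps")
for event in score.run():
    print(event)
```

Events may be added in any order; the heap makes sure each one is handed to its instrument at the right point on the timeline, which is the role the slide assigns to the score.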
DLS Level 2
Aims at consistent synthetic audio playback across a wide range of platforms
Defines a simple wavetable synthesizer
Bitstream includes sound samples
Score expressed in MIDI
Growing support from both software and hardware developers
– DLS is part of DirectMusic in Microsoft's DirectX 6.0
DLS-2 synthesizer model
Simple yet powerful structure, much like many existing synthesizers on the market (e.g. in PC soundcards)
– Uses loopable samples as sound sources (wavetable)
– Variable routing of control sources
2 envelopes for amplitude control
2 low frequency oscillators
1-pole dynamic low-pass filter
– Standardized response to MIDI controllers
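The 1-pole low-pass filter listed for the synthesizer model can be illustrated with a generic textbook one-pole smoother. This is a sketch under stated assumptions, not the normative DLS-2 filter: the coefficient formula, the function name and the fixed cutoff are all illustrative choices.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """Generic one-pole low-pass: y[n] = (1 - a) * x[n] + a * y[n - 1].

    Assumption: a = exp(-2 * pi * cutoff / sample_rate), a common textbook
    mapping from cutoff frequency to the feedback coefficient.
    """
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    y = 0.0
    out = []
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out


# Step response: the output rises smoothly toward the input level.
step = one_pole_lowpass([1.0] * 100, cutoff_hz=1000.0, sample_rate_hz=44100.0)
print(round(step[0], 4), round(step[-1], 4))
```

A "dynamic" filter would change `cutoff_hz` over time, for example from an envelope or LFO; here it is held constant to keep the sketch short.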
mission.mss
Audio BIFS
[Figure: audio scene graph. Three AudioSource nodes (Piano (SA), Finger snaps (Parametric), Bass (SA)) feed audio channels through AudioMix, AudioFX and AudioDelay nodes, with HRTF processing; the graph connects to the rest of the BIFS scene.]
Synchronization with Visual!
Demo Audio BIFS
Conclusion
MPEG-4 Audio attempts to offer solutions to all spectra of sound.
Some of the tools are more stable, while others are still in Research and Development.
MPEG-2 AAC is the best multi-channel lossy audio compression standard to date.
Acknowledgements
I would like to thank the authors from the references for providing the material presented here today.
Definitions
T/F: Time/Frequency (MDCT transform)
AAC: Advanced Audio Coding
PARA: Parametric
CELP: Code Excited Linear Prediction
SA: Structured Audio
PNS: Perceptual Noise Substitution
HVXC: Harmonic Vector eXcitation Coding
HILN: Harmonic and Individual Line + Noise
SAOL: Structured Audio Orchestra Language
SASL: Structured Audio Score Language
MIDI: Musical Instrument Digital Interface
TTS: Text to Speech
More Definitions
CD: Committee Draft
IS 13818-7: Advanced Audio Coding
LC: Low Complexity
BSAC: Bit Sliced Arithmetic Coding
SSR: Scalable Sample Rate
VBR: Variable Bit Rate
TLSS: Tools for Large Step Scalability
SNHC: Synthetic/Natural Hybrid Coding
DLS: Downloadable Samples
Natural Audio Complexity (1-channel audio)
Codec | RAM (words) | ROM (words) | Min. word length (bits) | Sampling rate (kHz) | MOPS/MIPS
AAC | 4256 | 3545 | >=20 | 48 | 5
AAC-LC | 2232 | 3545 | >=20 | 48 | 3
T/F main | 4346 | 3618 | >=20 | 48 | 6
TwinVQ | 4240 | 43000 | >=20 | 48 | 3
HILN | 3000 | 4000 | 16 | 8 | 4 typ., 10 max
HVXC | 1500 | 7700 | 16 | 8 | 2
NB-CELP | 650 | 2300 | 16 | 8 | 2
WB-CELP | 830 | 1000 | 24 (16) | 16 | 4
AAC Decoder Complexity Evaluation
MPEG AAC Decoder | Complexity
2-channel Main Profile | 40% of 133 MHz Pentium
2-channel Low Complexity | 25% of 133 MHz Pentium
5-channel Main Profile | 90 sq. mm die, 0.5 micron CMOS
5-channel Low Complexity | 60 sq. mm die, 0.5 micron CMOS
AAC Test Results
Test at BBC and NHK according to ITU-R BS.1116
– triple-stimulus/hidden-reference/double-blind
– ITU-R 5-point impairment scale
– 95% confidence intervals
MPEG AAC provides "indistinguishable" quality at 320 kb/s per five channels
MPEG AAC at 320 kb/s outperforms MPEG BC Layer II at 640 kb/s per five channels
Recent stereo tests at NHK showed MPEG AAC provides "indistinguishable" quality at 128 kb/s per two channels
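To compare these figures, it can help to express each total bit rate as an average per channel. The short calculation below does only that arithmetic; it assumes the bit budget is shared evenly across channels, which a real encoder does not need to do, and the helper name is illustrative.

```python
def per_channel_kbps(total_kbps, channels):
    """Average bit rate per channel, assuming an even split."""
    return total_kbps / channels


aac_5ch = per_channel_kbps(320, 5)      # 64.0 kb/s per channel
layer2_5ch = per_channel_kbps(640, 5)   # 128.0 kb/s per channel
aac_stereo = per_channel_kbps(128, 2)   # 64.0 kb/s per channel

print(aac_5ch, layer2_5ch, aac_stereo)
print("Layer II uses", layer2_5ch / aac_5ch, "times the AAC rate")
```

On this averaged view, both the five-channel and the stereo AAC results correspond to 64 kb/s per channel, half the rate used by Layer II in the comparison.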
References
M. Bosi, E. Scheirer, B. Edler, Peter G. Schreiner, MPEG-4 Seminar, Fribourg, Switzerland, 1997
S. Quackenbush, "Coding of Natural Audio in MPEG-4", Proc. IEEE ICASSP, Seattle, 1998
B. Grill, B. Edler, I. Kaneko, Y. Lee, M. Nishiguchi, E. Scheirer, and M. Väänänen (Eds.), ISO 14496-3 (MPEG-4 Audio) Committee Draft, MPEG document N1903
E. Scheirer, "The MPEG-4 Structured Audio Standard", Proc. IEEE ICASSP, Seattle, 1998
J. Herre, "Updated Description for Perceptual Noise Substitution Tool", MPEG document M2692
E. Scheirer, R. Väänänen, J. Huopaniemi, "AudioBIFS: The MPEG-4 Standard for Effects Processing", AES, SF, 1998
Overview: http://www.cselt.it/mpeg/standards/mpeg-4/mpeg-4.htm