Introducing Audio Signal Processing & Audio Coding
Dr Michael Mason
Senior Manager, CE Technology
Dolby Australia Pty Ltd
© 2016-17 DOLBY LABORATORIES, INC.
Overview
Audio Signal Processing Applications @ Dolby
Audio Signal Processing Basics
• Sampling
• What is an audio signal?
• Signal Processing Domains
Case Study – Headphone Virtualisation
• Frequency Response
• FIR filtering
• Computational Complexity
Audio Signal Processing Applications @ Dolby
Cinema
• Delivering channel based audio - 5.1 – 7.1
– Distribute movies to multiple screens in a multiplex
– Cinemas use speaker arrays – rather than single speakers – so processing required to fill the arrays from single channel feeds
• Rendering immersive audio – Dolby Atmos
– Cinema soundtrack is expressed as individual objects and locations - in every cinema the movie is rendered for that specific cinema’s speaker locations
• Speaker equalization & protection
– Process the audio sent to each speaker to compensate for the electro-acoustic properties of the speaker (e.g., frequency response, distortion characteristics)
– Ensure that audio sent to the speakers doesn’t overdrive them, which would cause damage.
• High channel count amplifiers
Audio Signal Processing Applications @ Dolby
Broadcast / Home Theatre
• Compression of Audio for Streaming / DVD / Blu-ray Disc
– Perceptual audio coding (case study later)
– Matrix encoding (Pro Logic)
– Multi-channel audio coding
– Multiple languages
– Multiple playback formats (stereo / 5.1 / etc)
• Broadcast end-to-end
– Capture, mixing, coding, transmission, playback
• AV Receivers (AVRs), Set Top Boxes (STBs), Digital Media Adapters (DMAs)
• Games consoles
Audio Signal Processing Applications @ Dolby
Personal Audio
• Devices
– Mobile phones (feature phones & smart phones)
– Tablets
– Music players
– PCs
• Same issues as Home Theatre, but usually more limited acoustic hardware (i.e. cheap speakers)
• Headphone playback is a big use case (case study later)
Audio Signal Processing Applications @ Dolby
Voice Processing
• Many of the ‘same’ basic challenges – but because speech has some specifically different characteristics from general audio, different solutions exist
• Speech coders use different approaches than audio codecs
– What makes a good codec is measured differently
– The transmission bandwidths available for the data are much more limited
• Conferencing & Telephony
DOLBY DIMENSION
Products with Dolby processing
Audio Signal Processing Basics
Sampling
• Digital signals have samples which are discrete in time and magnitude
• Process of converting a continuous signal to the digital domain is Sampling
– Two key questions when sampling are: How often to sample & how precisely?
Figure: signal chain from Analogue to Digital Converter (ADC), through Digital Signal Processing, to Digital to Analogue Converter (DAC).
Audio Signal Processing Basics
Sampling Frequency – fs (how often?)
• Number of samples per second
• Nyquist criterion:
– The sampling frequency must be greater than twice the highest frequency in the signal
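To see why the sampling frequency has to exceed twice the highest frequency, the sketch below samples a tone that lies above half the sampling rate and compares it with a lower-frequency tone. The 48 kHz rate and the 30 kHz tone are illustrative choices, and `sample_cosine` is a hypothetical helper rather than anything from the slides.

```python
import math

def sample_cosine(freq_hz, fs_hz, num_samples):
    """Return num_samples of a unit-amplitude cosine at freq_hz, sampled at fs_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / fs_hz) for n in range(num_samples)]

FS = 48_000          # sampling rate in Hz (assumed for illustration)
TONE = 30_000        # above FS / 2, so it cannot be represented unambiguously
ALIAS = FS - TONE    # 18 kHz: the frequency the samples will appear to have

above_nyquist = sample_cosine(TONE, FS, 64)
alias = sample_cosine(ALIAS, FS, 64)

# The two sample sequences agree to within floating-point rounding.
max_difference = max(abs(a - b) for a, b in zip(above_nyquist, alias))
print(f"largest sample difference: {max_difference:.2e}")
```

Once sampled, the 30 kHz tone cannot be told apart from an 18 kHz tone, which is why content above half the sampling rate has to be removed before the converter.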
Figure: a sinusoid with period T (f0 = 1/T) sampled every T/2, i.e. at fs = 2/T.
Audio Signal Processing Basics
Resolution (how precisely?)
• Each sample is represented by a number, how many bits should we use?
• Converting a continuous value to a discrete value requires quantisation.
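• Quantisation Error: with a single bit, each code stands for just one value, and the gap between the true sample and that value is the error
– ‘1’ → 0.5
– ‘0’ → -0.5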
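The mapping above can be sketched as a uniform quantiser over the range -1.0 to +1.0. This is a minimal illustration, assuming each code is reconstructed at the mid-point of its step; the function names `quantise` and `dequantise` are hypothetical.

```python
import math

def quantise(x, bits):
    """Map x in [-1.0, +1.0] to an integer code using 2**bits uniform steps."""
    levels = 2 ** bits
    step = 2.0 / levels
    code = math.floor((x + 1.0) / step)
    return min(max(code, 0), levels - 1)   # clamp so +1.0 stays in range

def dequantise(code, bits):
    """Return the mid-point of the step that the code represents."""
    step = 2.0 / (2 ** bits)
    return -1.0 + (code + 0.5) * step

# With one bit there are only two codes, reconstructed as -0.5 and +0.5.
for x in (-0.7, 0.3):
    code = quantise(x, bits=1)
    value = dequantise(code, bits=1)
    print(f"x={x:+.1f} -> code {code} -> {value:+.1f} (error {x - value:+.1f})")
```

Passing a larger `bits` value to the same two functions shrinks the step, and with it the largest possible error.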
Figure: 1-bit quantiser mapping the analogue range -1.0 to +1.0 onto the digital codes 0 and 1.
Audio Signal Processing Basics
Resolution (how precisely?)
• By using more bits, we reduce the error
… skipping all the math …
• Each additional bit of resolution improves SNR (signal to noise ratio) by 6.02 dB
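The roughly 6 dB per bit figure can be checked numerically without the math. The sketch below quantises a full-scale sine at several bit depths and measures the resulting SNR; the 997 Hz test tone, the 48 kHz rate and the helper names are assumptions made for illustration.

```python
import math

def quantise_to_bits(x, bits):
    """Round x in [-1.0, +1.0] to the mid-point of one of 2**bits uniform steps."""
    levels = 2 ** bits
    step = 2.0 / levels
    code = min(max(math.floor((x + 1.0) / step), 0), levels - 1)
    return -1.0 + (code + 0.5) * step

def measured_snr_db(bits, num_samples=20_000, freq_hz=997.0, fs_hz=48_000.0):
    """SNR in dB of a full-scale sine after quantisation to the given bit depth."""
    signal_power = 0.0
    noise_power = 0.0
    for n in range(num_samples):
        s = math.sin(2 * math.pi * freq_hz * n / fs_hz)
        e = s - quantise_to_bits(s, bits)   # quantisation error for this sample
        signal_power += s * s
        noise_power += e * e
    return 10 * math.log10(signal_power / noise_power)

snr = {bits: measured_snr_db(bits) for bits in (8, 12, 16)}
for bits, value in snr.items():
    print(f"{bits:2d} bits: {value:6.1f} dB")

db_per_bit = (snr[16] - snr[8]) / 8
print(f"average improvement: {db_per_bit:.2f} dB per bit")
```

The measured slope should land close to the 6.02 dB per bit quoted above; the exact values depend on the test signal chosen.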
Figure: 3-bit quantiser, with codes such as 000 and 101 spanning the analogue range -1.0 to +1.0.
Audio Signal Processing Basics
Audio Signal
• Sampling Frequency
– Human perception – 20 Hz – 20,000 Hz
– Nyquist says fs > 40 kHz
• CD Audio: 44.1 kHz
• Blu-ray (and before that DAT): 48 kHz
• Bit depth
– Range of loudness relative to human hearing…
• Threshold of hearing – 0 dB
• Jet Engines – 110-140 dB
• Busy Road (standing at the curb) – 100 dB
• Sustained exposure will cause damage – 85 dB
– 16 bits per sample gives ~ 96 dB of dynamic range
– 24 bits per sample gives ~ 144 dB
When/Where might we use more?
(higher sampling rate or more bits?)
Audio Signal Processing Basics
Audio Signal data rates
• 48 kHz, 16 bits per sample = 768 kbps / ch
• ~4.15 GB (3.86 GiB) for a 2hr movie (5.1 channels) (NB: DVD capacity = 4.7GB)
• 4G has 5-12 Mbps bandwidth (down) compared to 5.1 channels of audio ~4.6 Mbps
For practical transmission, audio needs to be compressed
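The data-rate figures above follow from simple arithmetic, reproduced in the sketch below. It assumes that 5.1 is counted as six channels at the full rate (the LFE channel is not given a reduced rate), which matches the ~4.6 Mbps quoted on the slide.

```python
SAMPLE_RATE_HZ = 48_000
BITS_PER_SAMPLE = 16
CHANNELS = 6                 # 5.1 = five full-range channels plus one LFE channel
MOVIE_SECONDS = 2 * 60 * 60  # two-hour movie

bits_per_second_per_channel = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
total_bits_per_second = bits_per_second_per_channel * CHANNELS
movie_bytes = total_bits_per_second * MOVIE_SECONDS // 8

print(f"per channel : {bits_per_second_per_channel / 1e3:.0f} kbps")
print(f"5.1 total   : {total_bits_per_second / 1e6:.3f} Mbps")
print(f"2 hr movie  : {movie_bytes / 1e9:.2f} GB ({movie_bytes / 2**30:.2f} GiB)")
```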
Audio Signal Processing Basics
Processing domains
• Sampled audio i.e., Pulse Code Modulated (PCM) data is in the time domain
• Not everything we want to do with audio is formulated as a time domain operation
– e.g., Flattening the frequency response of a speaker
• The Fourier Transform expresses a signal in terms of its frequency components (sinusoids). Using it we can formulate processing in the frequency domain
• Whether processing is implemented in the time or the frequency domain can depend on where it is most efficient.
• Signal processing also has other useful transform domains which may offer advantages for specific types of processing
– e.g., image coding often uses the Discrete Cosine Transform – DCT
Headphone Virtualisation
How do you get surround sound out of a pair of headphones?
Headphone Virtualisation
Two things we need to achieve:
• Make it sound like the audio is coming from different directions
• Make it sound like the listener is in a room.
Both can be achieved by filtering the signal using the impulse response of the room (RIR) and the head-related transfer functions (HRTF).
Headphone Virtualisation
Room impulse response
• By measuring how a short impulsive sound is altered by a room, the room’s reflections and echoes can be characterised to create an impulse response.
https://www.youtube.com/watch?v=PkZjIHTJ4jc
• The impulse response can in turn be used to filter any signal, to make it sound like it was in the room.
• The process of filtering a signal using an impulse response is convolution:
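In its standard discrete form, convolution computes each output sample as y[n] = sum over k of h[k] * x[n - k], where h is the impulse response and x is the input. A minimal direct-form sketch follows; the three-tap "room" is a made-up toy, since a measured room response would have thousands of taps.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR filtering: y[n] = sum over k of h[k] * x[n - k]."""
    output = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            output[n + k] += h * x      # one multiply-accumulate (MAC) per tap
    return output

# Toy "room": the direct sound followed by a half-strength echo two samples later.
room = [1.0, 0.0, 0.5]
dry = [1.0, 2.0, 3.0]
wet = convolve(dry, room)
print(wet)
```

Every input sample costs one multiply-accumulate per tap of the impulse response, so the cost grows with both the sample rate and the length of the response.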
Headphone Virtualisation
Room impulse response
• How many points would be required to capture a room? (i.e. how long is the impulse response?)
• Limiting the impulse response to 30 ms gives us 1440 points (@ 48 kHz)
• Considering the computational cost (one multiply-accumulate per point, per output sample):
1440 * 48k –> ~69 million MACs per second (69 MFLOPS)
Headphone Virtualisation
Computational load
• On a DSP chip with a single cycle MAC -> 69 MCPS
• On an ARM, ‘MAC’s ~ 3.5 cycles each -> ~240 MCPS
• 5.1 channels -> 10 filters = 2,400 MCPS
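The load figures above can be reproduced as follows. All inputs are the values quoted on the slides (1440 taps, 48 kHz, 3.5 cycles per MAC on ARM, 10 filters); the variable names are only for illustration.

```python
TAPS = 1440                 # impulse response length quoted on the slide
SAMPLE_RATE_HZ = 48_000
ARM_CYCLES_PER_MAC = 3.5    # figure quoted on the slide
FILTERS = 10                # filter count quoted on the slide for 5.1 content

macs_per_second = TAPS * SAMPLE_RATE_HZ          # one MAC per tap per sample
dsp_mcps = macs_per_second / 1e6                 # single-cycle MAC on a DSP
arm_mcps = dsp_mcps * ARM_CYCLES_PER_MAC
total_arm_mcps = arm_mcps * FILTERS

print(f"DSP, one filter : {dsp_mcps:.1f} MCPS")
print(f"ARM, one filter : {arm_mcps:.1f} MCPS")
print(f"ARM, {FILTERS} filters : {total_arm_mcps:.1f} MCPS")
```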
Headphone Virtualisation
The solution?
• Convolution in Time domain <-> Multiplication in Frequency Domain
– Fourier Transform the impulse response & the signal
• Block based, e.g., blocks of 2048
• O[N.log2(N)] -> k*22528 ~ 78,848 (with k = 3.5)
– Operate in the Frequency domain,
• Complex multiplies -> 4 * 2048 -> 8,192
– Transform the result back to the time domain.
• Same as forward transform
– Blocks per second?
• ~23 blocks/sec (48,000 / 2048) … ~4 MFLOPS / filter
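The steps above can be sketched in pure Python. This is a minimal illustration, not a production implementation: it uses a textbook recursive radix-2 FFT, processes a single zero-padded block rather than a stream of overlapping blocks, and the function names are hypothetical. The last few lines redo the slide's operation count, taking k = 3.5 as implied by the figures above.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two. Inverse is unscaled."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result

def fast_convolve(signal, impulse_response):
    """Convolve by multiplying spectra, zero-padding to avoid circular wrap-around."""
    out_len = len(signal) + len(impulse_response) - 1
    size = 1
    while size < out_len:
        size *= 2
    spectrum_x = fft(list(signal) + [0.0] * (size - len(signal)))
    spectrum_h = fft(list(impulse_response) + [0.0] * (size - len(impulse_response)))
    product = [a * b for a, b in zip(spectrum_x, spectrum_h)]
    time_domain = fft(product, inverse=True)
    return [value.real / size for value in time_domain[:out_len]]

result = fast_convolve([1.0, 2.0, 3.0], [1.0, 0.0, 0.5])
print([round(v, 6) for v in result])

# Operation count from the slide: forward FFT + complex multiplies + inverse FFT.
BLOCK = 2048
fft_ops = 3.5 * BLOCK * 11                 # k * N * log2(N), with log2(2048) = 11
ops_per_block = 2 * fft_ops + 4 * BLOCK
flops_per_second = ops_per_block * 48_000 / BLOCK
print(f"{flops_per_second / 1e6:.2f} MFLOPS per filter")
```

The count follows the slide's accounting, which includes one forward transform per block; the transform of the impulse response can be computed once in advance.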
What about the HRTFs ?
Headphone Virtualisation
Head-related Transfer Function
• Measured on a dummy
• Applied as filters
• Same computational arguments lead us to the need to apply these in the frequency domain.
NB: we don’t need to go back to the time domain between the two sets of filters
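The note above works because two filters applied in sequence multiply in the frequency domain, so their responses can be combined into one. The sketch below shows this with a naive DFT on tiny made-up filters; the `room` and `hrtf` values are placeholders, not measured responses.

```python
import cmath

def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform, adequate for a tiny demonstration."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                  for m, v in enumerate(values))
        out.append(acc / n if inverse else acc)
    return out

SIZE = 8   # long enough to hold the full linear convolution of all three sequences

def padded(values):
    """Zero-pad a sequence to SIZE samples."""
    return list(values) + [0.0] * (SIZE - len(values))

signal = [1.0, -1.0, 2.0]
room = [1.0, 0.5]            # stand-in room response (made-up values)
hrtf = [0.8, 0.0, -0.2]      # stand-in HRTF (made-up values)

X, R, H = dft(padded(signal)), dft(padded(room)), dft(padded(hrtf))

# Combine the two filters once, then apply them in a single frequency-domain pass.
combined = [r * h for r, h in zip(R, H)]
output = [v.real for v in dft([x * c for x, c in zip(X, combined)], inverse=True)]
print([round(v, 6) for v in output])
```

Only one forward and one inverse transform are needed for the signal, however many filters are chained between them.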
Dolby Atmos for headphones debuted in Blizzard’s Overwatch
Dolby Atmos for Headphones is available on Windows through the Dolby Access app
Perceptual Audio Coding
How do you reduce the storage and transmission bandwidth requirements of Audio signals?
Bitrates:
• Uncompressed : 768 kbps / ch
• DVD (AC3) : 448 kbps (5.1 channels) (~10:1 compression ratio)
Perceptual Audio Coding
Audio Coding is Lossy
• Lossless compression: must perfectly reconstruct its source (e.g., zip files).
• Lossy compression: can ‘throw away’ data if it isn’t ‘needed’. The reconstruction need only be ‘good enough.’
– Deciding which bits to ‘throw away’ and what is ‘good enough’ is the hard part.
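The difference between the two kinds of compression can be shown in a few lines. The lossless half uses Python's standard `zlib` module; the lossy half simply drops low-order bits, which is a crude stand-in chosen for illustration and not how a perceptual coder decides what to discard. The sample data is made up.

```python
import zlib

samples = bytes([0, 3, 7, 12, 18, 25, 31, 36] * 32)   # made-up 8-bit "audio" data

# Lossless: the decompressed data is bit-for-bit identical to the source.
packed = zlib.compress(samples)
unpacked = zlib.decompress(packed)
print("lossless round trip exact:", unpacked == samples)

# Lossy (crude illustration): drop the 3 least-significant bits of every sample.
coarse = bytes(value >> 3 for value in samples)
restored = bytes(value << 3 for value in coarse)
worst_error = max(abs(a - b) for a, b in zip(samples, restored))
print("lossy round trip exact:", restored == samples)
print("largest lossy error:", worst_error)
```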
Perceptual Audio Coding
Figure: encoder block diagram with the stages Time/Frequency analysis, Psychoacoustic analysis, Quantisation, Bit allocation and Entropy coding.
Perceptual Audio Coding
Psychoacoustics
• Study of sound Perception
– Perception implies the human experience – which includes physiological and psychological factors.
http://auditoryneuroscience.com/vocalizations-speech/mcgurk-effect
• Is at the heart of the question of which parts of an audio signal are important, or unimportant.
Perceptual Audio Coding
Psychoacoustics
• Most perceptual quantities are non-linear and subjective
• Loudness
– Non-linearly related to sound pressure
– Scales include: sone, phon
• Pitch
– Non-linearly related to frequency
– Scales include: Bark, Mel, ERB
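To make the non-linear relation between frequency and pitch concrete, the sketch below converts Hz to two of the scales listed above. It uses commonly cited approximations (the 2595 * log10(1 + f/700) form for Mel and the Zwicker and Terhardt arctangent form for Bark); other variants of both formulas exist, and the slides do not say which one is intended.

```python
import math

def hz_to_mel(freq_hz):
    """Mel scale, common approximation: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def hz_to_bark(freq_hz):
    """Bark scale, Zwicker and Terhardt approximation."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

for f in (100, 1_000, 10_000):
    print(f"{f:>6} Hz -> {hz_to_mel(f):7.1f} mel, {hz_to_bark(f):5.2f} Bark")

# The same 1 kHz step covers far less perceptual distance at high frequencies.
low_step = hz_to_mel(2_000) - hz_to_mel(1_000)
high_step = hz_to_mel(10_000) - hz_to_mel(9_000)
print(f"1 kHz step near 1 kHz: {low_step:.0f} mel; near 10 kHz: {high_step:.0f} mel")
```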
Perceptual Audio Coding
Frequency Masking
Perceptual Audio Coding
Temporal Masking
Perceptual Audio Coding
Time/Frequency analysis
• Break the incoming signal into time blocks and transform into the frequency domain
• Coding is always block based
• The frequency representation is analysed in bands of equal perceptual bandwidth (Bark)
Psychoacoustic analysis
• Use the frequency representation of the current block to calculate the masking curve
• Use the frequency masking curves from previous frames to account for temporal masking
Perceptual Audio Coding
Masking Curve
• Areas of the spectrum where the masking curve is above the signal energy represent ‘things we can’t hear’
• If we can’t hear them, we shouldn’t spend bits encoding them
Perceptual Audio Coding
Bit allocation
• Using the masking curve, we can calculate the allowed signal to noise ratio in each of the frequency bands
• Knowing that allocating a bit to a quantiser improves SNR by 6 dB, iteratively allocate the bits available in the bit pool to bands until we either run out of bits or meet the SNR requirements in all bands
• (any left over bits can be used to code the next frame)
• The bit distribution must be sent to the decoder
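The iterative allocation described above might look like the greedy loop below. This is a sketch under stated assumptions rather than the scheme used by any particular codec: each bit goes to the band currently furthest from its SNR target, every bit is taken to be worth 6.02 dB, and the per-band requirements are made-up numbers.

```python
DB_PER_BIT = 6.02

def allocate_bits(required_snr_db, bit_pool):
    """Greedy allocation: give each bit to the band furthest from its SNR target."""
    bits = [0] * len(required_snr_db)
    shortfall = list(required_snr_db)          # dB still needed in each band
    while bit_pool > 0:
        band = max(range(len(shortfall)), key=lambda i: shortfall[i])
        if shortfall[band] <= 0:
            break                              # every band already meets its target
        bits[band] += 1
        shortfall[band] -= DB_PER_BIT
        bit_pool -= 1
    return bits, bit_pool

# Made-up requirements in dB; a negative value means the band is fully masked.
required = [20.0, 35.0, -5.0, 12.0]

bits, left_over = allocate_bits(required, bit_pool=16)
print("plenty of bits:", bits, "left over:", left_over)

tight_bits, tight_left = allocate_bits(required, bit_pool=5)
print("tight budget  :", tight_bits, "left over:", tight_left)
```

The fully masked band receives no bits in either case, and with a generous pool the unused bits remain available for the next frame.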
Quantiser
• Quantise the frequency domain representation to send to the decoder.
Perceptual Audio Coding
Decoding is ‘simple’
• Recreate the frequency representation of each frame
• Transform back to the time domain
• Additional processing can be used to enhance the reconstructed signal
Summary
Audio Signal Processing Applications
Audio Signal Processing Basics
• DSP requires sampling.
• Audio signals are those we can hear – which tells us the sampling rate and bit depth we need.
• We can process in different domains, e.g., time domain or frequency domain.
Case Study – Headphone Virtualisation
• Reproduce a scene through headphones by simulating the room and your head shape
• Key tool is FIR filtering with impulse responses.
• Due to computational complexity, we apply the filters using FFT
Case Study – Perceptual Audio Coding
• Audio Coding is Lossy Compression
• Psychoacoustics is the study of how humans perceive sound
• Masking phenomena can be used to tell us which bits to ‘throw away’ when encoding