Date post: 17-Jul-2019
Introducing Audio Signal Processing & Audio Coding

Dr Michael Mason

Senior Manager, CE Technology

Dolby Australia Pty Ltd

© 2016-17 DOLBY LABORATORIES, INC.

Overview

Audio Signal Processing Applications @ Dolby

Audio Signal Processing Basics

• Sampling

• What is an audio signal?

• Signal Processing Domains

Case Study – Headphone Virtualisation

• Frequency Response

• FIR filtering

• Computational Complexity

Audio Signal Processing Applications @ Dolby

Cinema

• Delivering channel-based audio (5.1 to 7.1)

– Distribute movies to multiple screens in a multiplex

– Cinemas use speaker arrays rather than single speakers, so processing is required to fill the arrays from single-channel feeds

• Rendering immersive audio – Dolby Atmos

– The cinema soundtrack is expressed as individual objects and locations; in every cinema the movie is rendered for that specific cinema’s speaker locations

• Speaker equalization & protection

– Process the audio sent to each speaker to compensate for the electro-acoustic properties of the speaker. (e.g., frequency response, distortion characteristics)

– Ensure that audio sent to the speakers doesn’t overdrive them, which would cause damage.

• High channel count amplifiers

Broadcast / Home Theatre

• Compression of Audio for Streaming / DVD / Blu-ray Disc

– Perceptual audio coding (case study later)

– Matrix encoding (Pro Logic)

– Multi-channel audio coding

– Multiple languages

– Multiple playback formats (stereo / 5.1 / etc)

• Broadcast end-to-end

– Capture, mixing, coding, transmission, playback

• AV Receivers (AVRs), Set Top Boxes (STBs), Digital Media Adapters (DMAs)

• Games consoles

Personal Audio

• Devices

– Mobile phones (feature phones & smart phones)

– Tablets

– Music players

– PCs

• Same issues as Home Theatre, but usually more limited acoustic hardware (i.e. cheap speakers)

• Headphone playback is a big use case (case study later)

Voice Processing

• Many of the same basic challenges apply, but because speech has characteristics that differ from general audio, different solutions exist

• Speech coders use different approaches than audio codecs

– What makes a good codec is measured differently

– The transmission bandwidth available for the data is much more limited

• Conferencing & Telephony

DOLBY DIMENSION

Products with Dolby processing

Audio Signal Processing Basics

Sampling

• Digital signals have samples which are discrete in time and magnitude

• Process of converting a continuous signal to the digital domain is Sampling

– Two key questions when sampling are: How often to sample & how precisely?

[Figure: signal chain: Analogue to Digital Converter (ADC) → Digital Signal Processing → Digital to Analogue Converter (DAC)]

Sampling Frequency – fs (how often?)

• Number of samples per second

• Nyquist criterion:

– The sampling frequency must be greater than twice the highest frequency in the signal
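As a rough illustration of why the sampling frequency has to exceed twice the highest frequency, the sketch below computes the frequency a sampled tone appears to have. The function name `apparent_frequency` and the example tones are assumptions made for illustration; they are not from the slides.

```python
def apparent_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency that a tone at f_hz appears to have after sampling at fs_hz."""
    # Sampling cannot tell f apart from f + k*fs, so fold f into [0, fs/2].
    folded = f_hz % fs_hz
    return min(folded, fs_hz - folded)


if __name__ == "__main__":
    fs = 48_000.0
    for tone in (1_000.0, 20_000.0, 30_000.0, 47_000.0):
        print(f"{tone:>8.0f} Hz tone sampled at {fs:.0f} Hz looks like "
              f"{apparent_frequency(tone, fs):.0f} Hz")
```

Tones below fs/2 come back unchanged, while tones above fs/2 fold down onto lower frequencies (aliasing), which is why the input is normally band-limited before the ADC.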

[Figure: a sinusoid of period T (frequency f0 = 1/T) sampled every T/2, i.e. at fs = 2/T]

Resolution (how precisely?)

• Each sample is represented by a number; how many bits should we use?

• Converting a continuous value to a discrete value requires quantisation.

• Quantisation error: with a single bit, every input between -1.0 and +1.0 is represented by one of two values

– ‘1’ → 0.5

– ‘0’ → -0.5

[Figure: 1-bit quantiser mapping the analogue range -1.0 to +1.0 onto the digital codes 0 and 1]

• By using more bits, we reduce the error

… skipping all the math …

• Each additional bit of resolution improves SNR (signal to noise ratio) by 6.02 dB
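The 6.02 dB-per-bit figure can be checked numerically. The sketch below is a minimal experiment, assuming a uniform quantiser over -1.0 to +1.0 and a full-scale sine as the test signal; the helper names `quantise` and `measured_snr_db` are illustrative, not from the slides.

```python
import math


def quantise(x: float, bits: int) -> float:
    """Uniform quantiser for x in [-1.0, +1.0]; returns the cell midpoint."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(levels - 1, max(0, math.floor((x + 1.0) / step)))
    return -1.0 + (index + 0.5) * step


def measured_snr_db(bits: int, n: int = 20_000) -> float:
    """SNR in dB of a full-scale sine after quantisation to `bits` bits."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        # A frequency unrelated to n spreads the samples over many phases.
        x = math.sin(2.0 * math.pi * 0.1234567 * i)
        err = x - quantise(x, bits)
        signal_power += x * x
        noise_power += err * err
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    for bits in (8, 12, 16):
        print(f"{bits:2d} bits: {measured_snr_db(bits):6.2f} dB")
```

With one bit this quantiser reproduces the two output values ±0.5; the difference in measured SNR between two bit depths, divided by the number of extra bits, should come out close to 6 dB.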

[Figure: 3-bit quantiser mapping the analogue range -1.0 to +1.0 onto digital codes such as 000 and 101]

Audio Signal

• Sampling Frequency

– Human perception – 20 Hz – 20,000 Hz

– Nyquist says Fs > 40 kHz

• CD Audio: 44.1 kHz

• Blu-ray (and before that DAT): 48 kHz

• Bit depth

– Range of loudness relative to human hearing…

• Threshold of hearing – 0 dB

• Jet Engines – 110-140 dB

• Busy Road (standing at the curb) – 100 dB

• Sustained exposure will cause damage – 85 dB

– 16 bits per sample gives ~ 96 dB of dynamic range

– 24 bits per sample = 144 dB

When/where might we use more (a higher sampling rate or more bits)?

Audio Signal data rates

• 48 kHz, 16 bits per sample = 768 kbps / ch

• ~4.15 GB (3.86 GiB) for a 2 hr movie (5.1 channels) (NB: DVD capacity = 4.7 GB)

• 4G offers 5-12 Mbps of downlink bandwidth; 5.1 channels of uncompressed audio need ~4.6 Mbps

For practical transmission, audio needs to be compressed
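The data-rate figures follow from simple arithmetic, sketched below. The only assumption beyond the slide's numbers is that all six channels of a 5.1 mix (including the LFE channel) are stored at the full rate.

```python
SAMPLE_RATE_HZ = 48_000
BITS_PER_SAMPLE = 16
CHANNELS = 6                     # 5.1 = five main channels plus one LFE channel
MOVIE_SECONDS = 2 * 60 * 60      # 2 hours

per_channel_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE      # bits per second, one channel
total_bps = per_channel_bps * CHANNELS                  # bits per second, all channels
movie_bytes = total_bps * MOVIE_SECONDS // 8            # uncompressed size in bytes

print(f"per channel : {per_channel_bps / 1e3:.0f} kbps")
print(f"5.1 total   : {total_bps / 1e6:.2f} Mbps")
print(f"2 hr movie  : {movie_bytes / 1e9:.2f} GB ({movie_bytes / 2**30:.2f} GiB)")
```

The two size figures differ only in the unit: 1 GB is 10^9 bytes, whereas 1 GiB is 2^30 bytes.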

Processing domains

• Sampled audio, i.e., Pulse Code Modulated (PCM) data, is in the time domain

• Not everything we want to do with audio is formulated as a time domain operation

– e.g., Flattening the frequency response of a speaker

• The Fourier Transform expresses a signal in terms of its frequency components (sinusoids). Using it we can formulate processing in the frequency domain

• Whether processing is implemented in the time or the frequency domain can depend on where it is most efficient.

• Signal processing also has other useful transform domains which may offer advantages for specific types of processing

– e.g., image coding often uses the Discrete Cosine Transform – DCT
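To make the time-domain versus frequency-domain idea concrete, the sketch below takes a block of PCM samples containing a single tone and locates that tone with a naive discrete Fourier transform. The sampling rate, block length and tone frequency are assumed values chosen so the tone falls exactly on one frequency bin.

```python
import cmath
import math


def dft(x: list[float]) -> list[complex]:
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


fs = 8_000          # assumed sampling rate in Hz
n = 64              # block length in samples
tone_hz = 1_000     # falls exactly on bin 8, since fs / n = 125 Hz per bin

# Time domain: a list of PCM samples.
samples = [math.sin(2 * math.pi * tone_hz * t / fs) for t in range(n)]

# Frequency domain: one magnitude per bin (only the first half is unique).
magnitudes = [abs(c) for c in dft(samples)[: n // 2]]
peak_bin = max(range(n // 2), key=magnitudes.__getitem__)
peak_hz = peak_bin * fs / n

print(f"strongest component: bin {peak_bin} = {peak_hz:.0f} Hz")
```

In the time domain the tone is spread over all 64 samples; in the frequency domain nearly all of its energy sits in a single bin, which is what makes operations such as equalisation easy to express there.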

Headphone Virtualisation

Case Study 1

How do you get surround sound out of a pair of headphones?

Two things we need to achieve:

• Make it sound like the audio is coming from different directions

• Make it sound like the listener is in a room.

Both can be achieved by filtering the signal using the room impulse response (RIR) and the head-related transfer functions (HRTFs).

Room impulse response

• By measuring how a short impulsive sound is altered by a room, the room’s reflections and echoes can be characterised to create an impulse response.

https://www.youtube.com/watch?v=PkZjIHTJ4jc

• The impulse response can in turn be used to filter any signal, to make it sound as if it were played in that room.

• The process of filtering a signal x with an impulse response h is convolution: y[n] = Σ_k h[k] · x[n − k]
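Convolution of a signal x with an impulse response h is the sum y[n] = Σ_k h[k] · x[n − k]. A minimal direct-form sketch is shown below; the three-tap "room" is a made-up toy response, not a measured one.

```python
def convolve(signal: list[float], impulse_response: list[float]) -> list[float]:
    """Direct-form convolution: y[n] = sum over k of h[k] * x[n - k]."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += h * x      # one multiply-accumulate (MAC) per pair
    return out


# Toy "room": the direct sound followed by two quieter, delayed reflections.
toy_rir = [1.0, 0.0, 0.5, 0.0, 0.25]
click = [1.0, 0.0, 0.0]

print(convolve(click, toy_rir))
```

Feeding in a single click returns the impulse response itself, which is exactly how the response is measured in the first place. Every output sample costs one multiply-accumulate per tap, which is what drives the computational cost of long responses.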

• How many points would be required to capture a room? (i.e. how long is the impulse response?)

• Limiting the impulse response to 30 ms gives us 1440 points (@ 48 kHz)

• Considering the computational cost:

1440 × 48k -> ~69 MFLOPS (one multiply-accumulate per point, per output sample)

Computational load

• On a DSP chip with a single cycle MAC -> 69 MCPS

• On an ARM, ‘MAC’s ~ 3.5 cycles each -> ~240 MCPS

• 5.1 channels -> 10 filters = 2,400 MCPS
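The load figures can be reproduced with a few lines of arithmetic. The constants below are the values quoted on the slides (1440 taps, 48 kHz, 3.5 cycles per MAC, 10 filters); the variable names are illustrative.

```python
FS_HZ = 48_000
TAPS = 1_440
ARM_CYCLES_PER_MAC = 3.5     # figure quoted on the slide
FILTERS = 10                 # figure quoted on the slide for 5.1 content

macs_per_second = TAPS * FS_HZ                  # one MAC per tap per output sample
dsp_mcps = macs_per_second / 1e6                # DSP with a single-cycle MAC
arm_mcps = dsp_mcps * ARM_CYCLES_PER_MAC        # general-purpose core
total_arm_mcps = arm_mcps * FILTERS             # all filters running at once

print(f"DSP, one filter : {dsp_mcps:.1f} MCPS")
print(f"ARM, one filter : {arm_mcps:.1f} MCPS")
print(f"ARM, {FILTERS} filters : {total_arm_mcps:.0f} MCPS")
```

The slide's 240 and 2,400 MCPS are these results rounded down slightly.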

The solution?

• Convolution in the time domain <-> multiplication in the frequency domain

– Fourier transform the impulse response & the signal

• Block based, e.g., blocks of N = 2048

• O(N·log2(N)) -> k × 22,528 ≈ 78,848 operations per transform (with k = 3.5)

– Operate in the frequency domain

• Complex multiplies -> 4 × 2048 = 8,192 operations

– Transform the result back to the time domain

• Same cost as the forward transform

– Blocks per second?

• 48,000 / 2048 ≈ 23 blocks/sec … ~4 MFLOPS / filter
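The sketch below shows the idea in miniature: transform, multiply bin by bin, transform back. It uses a small recursive radix-2 FFT written for this example rather than an optimised library, and it convolves a whole zero-padded signal in one go; a real-time implementation would process fixed-size blocks with overlap-add or overlap-save, which is not shown. The cost estimate at the end reuses the slide's numbers, with k = 3.5 inferred from 78,848 / 22,528 and the impulse response's transform assumed to be precomputed.

```python
import cmath
import math


def fft(x: list[complex], inverse: bool = False) -> list[complex]:
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(signal: list[float], impulse_response: list[float]) -> list[float]:
    """Linear convolution via multiplication in the frequency domain."""
    out_len = len(signal) + len(impulse_response) - 1
    size = 1
    while size < out_len:        # zero-pad so circular wrap-around cannot occur
        size *= 2
    padded_x = [complex(v) for v in signal] + [0j] * (size - len(signal))
    padded_h = [complex(v) for v in impulse_response] + [0j] * (size - len(impulse_response))
    product = [a * b for a, b in zip(fft(padded_x), fft(padded_h))]
    time = fft(product, inverse=True)
    return [v.real / size for v in time[:out_len]]   # apply the 1/N scaling here


# Rough per-filter cost using the slide's figures (k = 3.5, N = 2048, fs = 48 kHz).
N = 2048
transform_ops = 3.5 * N * math.log2(N)            # cost of one transform
ops_per_block = 2 * transform_ops + 4 * N         # forward + inverse + complex multiplies
mflops = ops_per_block * (48_000 / N) / 1e6

print(fft_convolve([1.0, 2.0, 3.0], [1.0, 0.5]))
print(f"~{mflops:.1f} MFLOPS per filter")
```

Compared with roughly 69 MFLOPS for direct convolution, the frequency-domain route is more than an order of magnitude cheaper for a response of this length.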

What about the HRTFs?

Head-related Transfer Function

• Measured on a dummy head

• Applied as filters

• Same computational arguments lead us to the need to apply these in the frequency domain.

NB: we don’t need to go back to the time domain between the two sets of filters
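The note about staying in the frequency domain can be illustrated with a toy example: once the signal has been transformed, the room filter and the head-related filter are both applied by multiplication, and a single inverse transform produces the output. The responses below are made-up three- and two-tap stand-ins, and the naive DFT is used only to keep the sketch short.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT; the inverse includes the 1/N scaling."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def pad(x, size):
    """Zero-pad x to `size` samples."""
    return list(x) + [0.0] * (size - len(x))


signal = [1.0, -0.5, 0.25, 0.0]
toy_rir = [1.0, 0.0, 0.3]        # hypothetical room impulse response
toy_hrir = [0.8, 0.2]            # hypothetical head-related impulse response

# Pad everything to the length of the final output so nothing wraps around.
size = len(signal) + len(toy_rir) + len(toy_hrir) - 2
X, R, H = (dft(pad(v, size)) for v in (signal, toy_rir, toy_hrir))

# Both filters collapse into one set of per-bin gains, which can be precomputed.
combined = [r * h for r, h in zip(R, H)]
output = [v.real for v in dft([x * c for x, c in zip(X, combined)], inverse=True)]

print([round(v, 3) for v in output])
```

The result matches what two successive time-domain convolutions would give, but only one forward and one inverse transform of the signal are needed.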

Dolby Atmos for headphones debuted in Blizzard’s Overwatch

Dolby Atmos for Headphones is available on Windows through the Dolby Access app

Perceptual Audio Coding

Case Study 2

How do you reduce the storage and transmission bandwidth requirements of audio signals?

Bitrates:

• Uncompressed: 768 kbps / ch

• DVD (AC-3): 448 kbps (5.1 channels) (~10:1 compression ratio)

Audio Coding is Lossy

• Lossless compression: must perfectly reconstruct the source (e.g., zip files).

• Lossy compression: can ‘throw away’ data if it isn’t ‘needed’. The reconstruction need only be ‘good enough.’

– Deciding which bits to ‘throw away’ and what is ‘good enough’ is the hard part.
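The contrast can be demonstrated on a block of PCM samples. In the sketch below, `zlib` stands in for a lossless coder, and discarding the low 8 bits of each sample stands in for a lossy one. That truncation is only a crude placeholder chosen for illustration: it ignores perception entirely and is not how a perceptual codec decides what to discard. The 440 Hz test tone is also an assumption.

```python
import math
import struct
import zlib

# One second of a 440 Hz tone as 16-bit PCM (assumed test signal).
pcm = [round(20_000 * math.sin(2 * math.pi * 440 * t / 48_000)) for t in range(48_000)]
raw = struct.pack(f"<{len(pcm)}h", *pcm)

# Lossless: the decoder gets back exactly the bytes that went in.
packed = zlib.compress(raw)
assert zlib.decompress(packed) == raw

# Lossy (crude): keep only the top 8 bits of every sample.
lossy = [(s >> 8) << 8 for s in pcm]
max_error = max(abs(a - b) for a, b in zip(pcm, lossy))

print(f"lossless: {len(raw)} -> {len(packed)} bytes, exact reconstruction")
print(f"lossy   : half the bits per sample, max sample error {max_error}")
```

The lossy version can never be restored to the original samples; whether the error matters depends on whether a listener can hear it, which is where psychoacoustics comes in.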

[Figure: encoder block diagram with the stages Time/Frequency analysis, Psychoacoustic analysis, Bit allocation, Quantisation and Entropy coding]

Psychoacoustics

• The study of sound perception

– Perception implies the human experience, which includes physiological and psychological factors.

http://auditoryneuroscience.com/vocalizations-speech/mcgurk-effect

• It is at the heart of the question of which parts of an audio signal are important or unimportant.

• Most perceptual quantities are non-linear and subjective

• Loudness

– Non-linearly related to sound pressure

– Scales include: sone, phon

• Pitch

– Non-linearly related to frequency

– Scales include: Bark, Mel, ERB

Frequency Masking

Temporal Masking

Time/Frequency analysis

• Break the incoming signal into time blocks and transform into the frequency domain

• Coding is always block based

• The frequency representation is analysed in bins of equal perceptual bandwidth (Bark)

Psychoacoustic analysis

• Use the frequency representation of the current block to calculate the masking curve

• Use the frequency masking curves from previous frames to account for temporal masking

Masking Curve

• Areas of the spectrum where the masking curve is above the signal energy represent ‘things we can’t hear’

• If we can’t hear them, we shouldn’t spend bits encoding them

Bit allocation

• Using the masking curve, we can calculate the allowed signal to noise ratio in each of the frequency bands

• Knowing that allocating a bit to a quantiser improves SNR by 6 dB, iteratively allocate the bits available in the bit pool to bands until we either run out of bits or meet the SNR requirements in all bands

• (Any leftover bits can be used to code the next frame.)

• The bit distribution must be sent to the decoder
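The iterative allocation can be sketched as a greedy loop. This is a minimal illustration under stated assumptions rather than the allocation routine of any particular codec: the per-band signal and mask levels are invented numbers, one bit is taken to buy 6.02 dB of SNR, and each bit goes to the band whose quantisation noise currently sits furthest above the masking curve.

```python
DB_PER_BIT = 6.02


def allocate_bits(signal_db, mask_db, bit_pool):
    """Greedy allocation: give each bit to the band with the most audible noise."""
    bits = [0] * len(signal_db)
    # Noise-to-mask ratio with zero bits: the error is as loud as the signal itself.
    nmr = [s - m for s, m in zip(signal_db, mask_db)]
    while bit_pool > 0:
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] <= 0:          # noise is at or below the mask in every band
            break
        bits[worst] += 1
        nmr[worst] -= DB_PER_BIT     # one more bit lowers the noise by about 6 dB
        bit_pool -= 1
    return bits, bit_pool


# Hypothetical per-band levels in dB; band 2 lies entirely under the mask.
signal_db = [60.0, 48.0, 20.0, 35.0]
mask_db = [40.0, 42.0, 30.0, 30.0]
bits, left_over = allocate_bits(signal_db, mask_db, bit_pool=16)

print("bits per band:", bits)
print("left over for the next frame:", left_over)
```

The masked band receives no bits at all, the loudest unmasked band receives the most, and the loop stops early once every band's noise is under the mask, leaving bits in the pool.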

Quantiser

• Quantise the frequency domain representation to send to the decoder.

Decoding is ‘simple’

• Recreate the frequency representation of each frame

• Transform back to the time domain

• Additional processing can be used to enhance the reconstructed signal

Summary

Audio Signal Processing Applications

Audio Signal Processing Basics

• DSP requires sampling.

• Audio signals are those we can hear, which tells us the sampling rate and bit depth we need.

• We can process in different domains, e.g., time domain or frequency domain.

Case Study – Headphone Virtualisation

• Reproduce a scene through headphones by simulating the room and your head shape.

• The key tool is FIR filtering with impulse responses.

• Due to computational complexity, we apply the filters using the FFT.

Case Study – Perceptual Audio Coding

• Audio coding is lossy compression.

• Psychoacoustics is the study of how humans perceive sound.

• Masking phenomena can be used to tell us which bits to ‘throw away’ when encoding.
