
2018 Fall CTP431: Music and Audio Computing · Introduction •Visualizing sound as image or...

Date post: 16-May-2020
2018 Fall CTP431: Music and Audio Computing Sound Representations Graduate School of Culture Technology, KAIST Juhan Nam
Transcript
Page 1: 2018 Fall CTP431: Music and Audio Computing · Introduction •Visualizing sound as image or animation is very important-For research purpose-Analyzing the properties of sound: loudness,

2018 Fall CTP431: Music and Audio Computing

Sound Representations

Graduate School of Culture Technology, KAIST
Juhan Nam


Outline

• Introduction

• Time-domain representation
  - Waveform

• Frequency-domain representation
  - Discrete Fourier Transform (DFT)

• Time-frequency domain representation
  - Short-Time Fourier Transform (STFT)
  - Spectrogram


Introduction

• Visualizing sound as an image or animation is very important
  - For research purposes
    - Analyzing the properties of sound: loudness, pitch and timbre
    - More complicated patterns in different contexts
  - For artistic purposes
    - Mapping the sound properties to visual elements
    - Visual elements are becoming more important in music

• In this topic, we will focus on visualizing sound “as it is”


Time-domain Representation

• The raw waveform: the amplitude of sound over time

• Phonautograph (Leon Scott, 1857)
  - The first invention for sound recording
  - Recent research on image-to-sound restoration: http://firstsounds.org/

Source: http://edcarter.net/home/phonautogram/


Time-domain Representation

• Zoom-in view
  - Loudness: yes
  - Pitch: yes, if the waveform is periodic (monophonic)
  - Timbre: to some extent, from the wave shape (e.g. round or square)

• Zoom-out view
  - Loudness: yes
  - Pitch: no
  - Timbre: to some extent, from the amplitude envelope


Amplitude Envelope

• Summarized visualization of the waveform
  - Computed by max-peak picking or root-mean-square (RMS)

• Parameterized with “ADSR” for musical tones
  - Attack time, Decay time, Sustain level and Release time

• Used to determine gain in dynamic range compression
  - e.g. compressor, expander
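The two envelope computations named above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function name `amplitude_envelope`, the frame and hop sizes, and the decaying 440 Hz test tone are all assumptions made for the example.

```python
import math

def amplitude_envelope(x, frame_size=256, hop_size=128, method="rms"):
    """Return one envelope value per frame, using RMS or max-peak picking."""
    envelope = []
    for start in range(0, len(x) - frame_size + 1, hop_size):
        frame = x[start:start + frame_size]
        if method == "rms":
            # Root-mean-square: square, average, then take the square root.
            value = math.sqrt(sum(s * s for s in frame) / frame_size)
        else:
            # Max-peak picking: largest absolute sample in the frame.
            value = max(abs(s) for s in frame)
        envelope.append(value)
    return envelope

# Hypothetical test tone: a 440 Hz sine whose amplitude decays exponentially.
fs = 8000
x = [math.exp(-3.0 * n / fs) * math.sin(2 * math.pi * 440 * n / fs)
     for n in range(fs)]

rms_env = amplitude_envelope(x, method="rms")
peak_env = amplitude_envelope(x, method="peak")
```

The peak envelope is never below the RMS envelope for the same frame, and both follow the decay of the tone.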


Example: Amplitude Envelope

[Figure: amplitude envelope examples. Panels: Piano C4 note, Flute A4 note]


Tone Generation and Perception Perspective

• Musical tones are generated as a combination of (sinusoidal) oscillation modes

[Figure: oscillation modes. Source: https://www.acs.psu.edu/drussell/Demos/string/Fixed.html]

• The cochlea has frequency-selective responses

[Figure: cochlea, labeled from high freq. to low freq. Source: http://acousticslab.org/psychoacoustics/PMFiles/Module03a.htm]


Frequency-Domain Representation

• Can we represent $x(n)$ with a finite set of sinusoids?
  - $x(n) = \frac{1}{N} \sum_{k=0}^{N-1} A(k)\, r_k(n)$
  - $r_k(n) = \cos\left(\frac{2\pi k n}{N} + \phi(k)\right)$: discrete-time sinusoid with length $N$
  - Find $A(k)$, $\phi(k)$


Euler’s identity

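• Euler’s identity

$$e^{j\theta} = \cos\theta + j\sin\theta$$

  - Can be proved by Taylor series
  - If $\theta = \pi$, $e^{j\pi} + 1 = 0$ (“the most beautiful equation in math”)

• Properties

$$\cos\theta = \frac{e^{j\theta} + e^{-j\theta}}{2}$$

$$\sin\theta = \frac{e^{j\theta} - e^{-j\theta}}{2j}$$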


Complex Sinusoids

• Cosine and sine can be represented in a single term

$$s_k(n) = e^{j\frac{2\pi k n}{N}} = \cos\frac{2\pi k n}{N} + j\sin\frac{2\pi k n}{N}$$

  - Frequencies: $\frac{2\pi k}{N}$ radians per sample, or $\frac{k}{N} F_s$ Hz ($F_s$: the sampling rate), for $k = 0, 1, 2, \ldots, N-1$
  - Example: $N = 8$

Figures are from https://ccrma.stanford.edu/~jos/dft/
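As a small numerical companion to the definition of the complex sinusoid, the Python sketch below generates $s_k(n)$ for $N = 8$ and checks that its real and imaginary parts are the cosine and the sine. The helper name `complex_sinusoid` is an assumption made for this example.

```python
import cmath
import math

def complex_sinusoid(k, N):
    """Return s_k(n) = exp(j*2*pi*k*n/N) for n = 0, ..., N-1."""
    return [cmath.exp(2j * math.pi * k * n / N) for n in range(N)]

N = 8
for k in range(N):
    s_k = complex_sinusoid(k, N)
    for n, value in enumerate(s_k):
        angle = 2 * math.pi * k * n / N
        # Euler's identity: the real part is the cosine, the imaginary part the sine.
        assert abs(value.real - math.cos(angle)) < 1e-12
        assert abs(value.imag - math.sin(angle)) < 1e-12

# One full cycle of s_1(n) over N = 8 samples, rounded for display.
print([complex(round(v.real, 3), round(v.imag, 3)) for v in complex_sinusoid(1, N)])
```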


Complex Sinusoids

[Figure: complex sinusoids for N = 8. Figures are from https://ccrma.stanford.edu/~jos/dft/]


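Frequency-Domain Representation Using Complex Sinusoids

• $x(n)$ is expressed in a simpler form:

$$
\begin{aligned}
x(n) &= \frac{1}{N} \sum_{k=0}^{N-1} A(k) \cos\left(\frac{2\pi k n}{N} + \phi(k)\right) \\
&= \frac{1}{N} \sum_{k=0}^{N-1} A(k) \left( e^{j\left(\frac{2\pi k n}{N} + \phi(k)\right)} + e^{-j\left(\frac{2\pi k n}{N} + \phi(k)\right)} \right) / 2 \\
&= \frac{1}{N} \sum_{k=0}^{N-1} \left( A(k) e^{j\phi(k)} e^{j\frac{2\pi k n}{N}} + A(k) e^{-j\phi(k)} e^{-j\frac{2\pi k n}{N}} \right) / 2 \\
&= \frac{1}{N} \sum_{k=0}^{N-1} \left( X(k) e^{j\frac{2\pi k n}{N}} + X^*(k) e^{-j\frac{2\pi k n}{N}} \right) / 2 \\
&= \mathrm{Real}\left\{ \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j\frac{2\pi k n}{N}} \right\} \\
&= \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j\frac{2\pi k n}{N}}
\end{aligned}
$$

where $X(k) = A(k) e^{j\phi(k)} = A(k)\left(\cos\phi(k) + j\sin\phi(k)\right)$. The final step drops $\mathrm{Real}\{\cdot\}$; the sum is already real when $X(N-k) = X^*(k)$.

  - Now, how can we find $X(k)$?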


Orthogonality of Sinusoids

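• Inner product between two complex sinusoids

$$s_p(n) \cdot s_q^*(n) = \sum_{n=0}^{N-1} e^{j\frac{2\pi p n}{N}} \cdot e^{-j\frac{2\pi q n}{N}} = \begin{cases} N & \text{if } p = q \\ 0 & \text{otherwise} \end{cases}$$

• Real sinusoids (for $0 < p, q < N$, excluding $p = q = N/2$)

$$\sum_{n=0}^{N-1} \cos(2\pi p n / N)\cos(2\pi q n / N) = \begin{cases} N/2 & \text{if } p = q \text{ or } p = N - q \\ 0 & \text{otherwise} \end{cases}$$

$$\sum_{n=0}^{N-1} \sin(2\pi p n / N)\sin(2\pi q n / N) = \begin{cases} N/2 & \text{if } p = q \\ -N/2 & \text{if } p = N - q \\ 0 & \text{otherwise} \end{cases}$$

$$\sum_{n=0}^{N-1} \cos(2\pi p n / N)\sin(2\pi q n / N) = 0$$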
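The orthogonality relations can be checked numerically. The following Python sketch evaluates the inner product for every pair $(p, q)$ with $N = 8$; the helper names `s`, `inner_product` and `cos_cos` are assumptions made for this example.

```python
import cmath
import math

def s(k, n, N):
    """Complex sinusoid s_k(n) = exp(j*2*pi*k*n/N)."""
    return cmath.exp(2j * math.pi * k * n / N)

def inner_product(p, q, N):
    """Sum over n of s_p(n) times the complex conjugate of s_q(n)."""
    return sum(s(p, n, N) * s(q, n, N).conjugate() for n in range(N))

def cos_cos(p, q, N):
    """Sum over n of cos(2*pi*p*n/N) * cos(2*pi*q*n/N)."""
    return sum(math.cos(2 * math.pi * p * n / N) * math.cos(2 * math.pi * q * n / N)
               for n in range(N))

N = 8
for p in range(N):
    for q in range(N):
        # N on the diagonal (p == q), zero everywhere else.
        expected = N if p == q else 0
        assert abs(inner_product(p, q, N) - expected) < 1e-9
```

For the real cosines, `cos_cos(1, 1, 8)` and `cos_cos(1, 7, 8)` both evaluate to N/2 up to rounding, because the cosines at k and N − k are identical sequences.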


Orthogonal Projection on Complex Sinusoids

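• Take the inner product of the signal with each complex sinusoid

$$
\begin{aligned}
x(n) \cdot s_k^*(n) &= \sum_{n=0}^{N-1} x(n)\, e^{-j\frac{2\pi k n}{N}} = \sum_{n=0}^{N-1} \left( \frac{1}{N} \sum_{m=0}^{N-1} X(m)\, e^{j\frac{2\pi m n}{N}} \right) e^{-j\frac{2\pi k n}{N}} \\
&= \frac{1}{N} \sum_{m=0}^{N-1} X(m) \left( \sum_{n=0}^{N-1} e^{j\frac{2\pi m n}{N}}\, e^{-j\frac{2\pi k n}{N}} \right) \\
&= \frac{1}{N} X(k)\, N = X(k) = A(k)\, e^{j\phi(k)}
\end{aligned}
$$

By orthogonality, the inner sum over $n$ equals $N$ when $m = k$ and 0 otherwise, so only the $m = k$ term survives.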
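The projection argument can be tried out numerically. In the Python sketch below, a signal is synthesized from chosen amplitudes and phases and then projected back onto each complex sinusoid. The coefficient values in `A` and `phi` are arbitrary choices for illustration, and the synthesized `x` is complex-valued because no symmetry is imposed on them.

```python
import cmath
import math

N = 8
# Hypothetical amplitude A(k) and phase phi(k) for each k.
A = [0.0, 2.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0]
phi = [0.0, math.pi / 4, 0.0, -math.pi / 3, 0.0, 0.0, 0.0, 0.0]
X_true = [a * cmath.exp(1j * p) for a, p in zip(A, phi)]

# Synthesis: x(n) = (1/N) * sum_k X(k) * exp(j*2*pi*k*n/N).
x = [sum(X_true[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
     for n in range(N)]

# Projection: X(k) = sum_n x(n) * exp(-j*2*pi*k*n/N).
X_rec = [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
         for k in range(N)]

# The projection recovers the amplitude and phase that were put in.
recovered_amplitude = abs(X_rec[1])
recovered_phase = cmath.phase(X_rec[1])
```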


To Wrap Up

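• Discrete Fourier Transform

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j\frac{2\pi k n}{N}} = X_R(k) + jX_I(k) = A(k)\, e^{j\phi(k)}$$

  - Magnitude spectrum: $|X(k)| = A(k) = \sqrt{X_R^2(k) + X_I^2(k)}$
  - Phase spectrum: $\angle X(k) = \phi(k) = \tan^{-1}\left(\frac{X_I(k)}{X_R(k)}\right)$

• Inverse Discrete Fourier Transform

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j\frac{2\pi k n}{N}}$$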
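The DFT pair translates directly into code. The Python sketch below is a plain $O(N^2)$ transcription of the two sums, meant for reading rather than speed. The test signal (a cosine at bin 4 with phase $\pi/3$) is an assumption made for this example, and `math.atan2` is used in place of a plain arctangent so that the phase lands in the correct quadrant.

```python
import cmath
import math

def dft(x):
    """X(k) = sum_n x(n) * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """x(n) = (1/N) * sum_k X(k) * exp(j*2*pi*k*n/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

N = 32
x = [math.cos(2 * math.pi * 4 * n / N + math.pi / 3) for n in range(N)]
X = dft(x)

magnitude = [abs(v) for v in X]                   # sqrt(X_R^2 + X_I^2)
phase = [math.atan2(v.imag, v.real) for v in X]   # quadrant-aware arctangent
x_back = idft(X)                                  # round trip back to the waveform
```

The energy of the cosine shows up at bins 4 and N − 4, each with magnitude N/2. The phase values at bins whose magnitude is numerically zero are not meaningful.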


Properties of DFT

• Periodicity
  - $X(k) = X(k + N) = X(k + 2N) = \ldots$
  - $X(k) = X(k - N) = X(k - 2N) = \ldots$

• Symmetry (for real-valued $x(n)$)
  - Magnitude response: $|X(k)| = |X(-k)| = |X(N - k)|$
  - Phase response: $\angle X(k) = -\angle X(-k) = -\angle X(N - k)$
  - We often display only half of the magnitude and phase responses
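Both properties can be verified on a real-valued test signal. In the Python sketch below, `dft_bin` is a hypothetical helper that evaluates the DFT sum at any integer index, including negative ones and ones beyond N − 1. The symmetry is checked in its combined form $X(N-k) = X^*(k)$, which implies both the magnitude and the phase statements.

```python
import cmath
import math

def dft_bin(x, k):
    """Evaluate the DFT sum at an arbitrary integer index k."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))

N = 32
# Real-valued test signal (frequencies and phase chosen arbitrarily).
x = [math.sin(2 * math.pi * 3.3 * n / N) + 0.5 * math.cos(2 * math.pi * 7 * n / N + 0.4)
     for n in range(N)]

for k in range(N):
    X_k = dft_bin(x, k)
    # Periodicity: shifting the index by N does not change the value.
    assert abs(X_k - dft_bin(x, k + N)) < 1e-9
    assert abs(X_k - dft_bin(x, k - N)) < 1e-9
    # Conjugate symmetry for real x: equal magnitude, negated phase.
    assert abs(X_k - dft_bin(x, N - k).conjugate()) < 1e-9
    assert abs(X_k - dft_bin(x, -k).conjugate()) < 1e-9
```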


Properties of DFT

[Figure: a waveform with its magnitude and phase spectra (N = 32), plotted once over bins 0 to 31 and once over bins centered on 0]

$|X(k)| = |X(N - k)|$, $|X(k)| = |X(-k)|$

$\angle X(k) = -\angle X(N - k)$, $\angle X(k) = -\angle X(-k)$


Frequency Scaling

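• $X(k)$, $k = 0, 1, \ldots, N$, corresponds to frequency values that are evenly distributed between 0 and $f_s$ in Hz (bin $k$ maps to $k f_s / N$)

[Figure: frequency axis with bin indices −N, −N/2, 0, N/2, N aligned with frequencies −f_s, −f_s/2, 0, f_s/2, f_s]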
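The mapping from bin index to frequency in Hz is simple arithmetic. The Python sketch below lists the bin frequencies for a small example; the function names and the values N = 8 and fs = 8000 Hz are assumptions made for illustration. The second function re-labels the upper half of the bins as negative frequencies, which is the centered view in the figure.

```python
def bin_frequencies(N, fs):
    """Frequency in Hz of bins k = 0, ..., N-1 (spacing fs / N)."""
    return [k * fs / N for k in range(N)]

def centered_bin_frequencies(N, fs):
    """Same bins, with k >= N/2 re-labeled as negative frequencies (k - N)."""
    return [(k if k < N / 2 else k - N) * fs / N for k in range(N)]

N, fs = 8, 8000
freqs = bin_frequencies(N, fs)
centered = centered_bin_frequencies(N, fs)
```

By periodicity, bin N (at $f_s$) holds the same value as bin 0, so only bins 0 to N − 1 are distinct.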


Examples of DFT

[Figure: waveform (amplitude vs. time in milliseconds) and spectrum (magnitude vs. frequency) for three sounds: sine, drum and flute]


Fast Fourier Transform (FFT)

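• Matrix multiplication view of DFT

$$
\begin{bmatrix} X(0) \\ X(1) \\ X(2) \\ X(3) \\ \vdots \\ X(N-2) \\ X(N-1) \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & W_N & W_N^{2} & \cdots & W_N^{N-1} \\
1 & W_N^{2} & W_N^{4} & \cdots & W_N^{2(N-1)} \\
1 & W_N^{3} & W_N^{6} & \cdots & W_N^{3(N-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & W_N^{N-2} & W_N^{2(N-2)} & \cdots & W_N^{(N-2)(N-1)} \\
1 & W_N^{N-1} & W_N^{2(N-1)} & \cdots & W_N^{(N-1)(N-1)}
\end{bmatrix}
\begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \\ \vdots \\ x(N-2) \\ x(N-1) \end{bmatrix}
$$

where $W_N = e^{-j\frac{2\pi}{N}}$.

• In fact, we don’t compute this directly. There is a more efficient way, called the “Fast Fourier Transform (FFT)”
  - Complexity reduction by FFT: $O(N^2) \rightarrow O(N \log_2 N)$
  - Divide and conquer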
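To make the divide-and-conquer idea concrete, the Python sketch below compares the matrix view with a recursive radix-2 FFT. The slide does not say which FFT variant is meant, so the decimation-in-time form used here is an assumption; it requires the input length to be a power of two. All function names are hypothetical.

```python
import cmath
import math

def dft_matrix(N):
    """N x N matrix with entries W_N^(k*n), where W_N = exp(-j*2*pi/N)."""
    W = cmath.exp(-2j * math.pi / N)
    return [[W ** (k * n) for n in range(N)] for k in range(N)]

def dft_by_matrix(x):
    """O(N^2): multiply the DFT matrix by the signal vector."""
    N = len(x)
    M = dft_matrix(N)
    return [sum(M[k][n] * x[n] for n in range(N)) for k in range(N)]

def fft(x):
    """O(N log N): recursive radix-2 FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # DFT of the even-indexed samples
    odd = fft(x[1::2])    # DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * math.pi * k / N) * odd[k]   # twiddle factor W_N^k
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out

# Arbitrary test signal of length 16.
x = [math.sin(0.3 * n) + 0.1 * n for n in range(16)]
X_matrix = dft_by_matrix(x)
X_fft = fft(x)
```

Both routes give the same spectrum up to rounding. The FFT gets there by splitting each length-N DFT into two length-N/2 DFTs and combining them with N/2 twiddle multiplications.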


Time-Frequency Domain Representation

• DFT assumes that the signal is stationary
  - It is not a good idea to apply DFT to a long and dynamically changing signal like music
  - Instead, we segment the signal and apply DFT separately to each segment

• Short-Time Fourier Transform

$$X(k, l) = \sum_{n=0}^{N-1} w(n)\, x(n + l \cdot h)\, e^{-j\frac{2\pi k n}{N}}$$

where $h$: hop size, $w(n)$: window, $N$: FFT size

• This produces 2-D time-frequency representations
  - Parameters: window size, window type, FFT size, hop size
  - “Spectrogram” from the magnitude
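The STFT formula can be transcribed almost literally. The Python sketch below is one possible reading of it under stated assumptions: the window length equals the FFT size N, the window is a Hann window, the hop is N/2, and only bins 0 to N/2 are kept since the input is real. The two-tone test signal and the names `hann` and `stft` are made up for the example.

```python
import cmath
import math

def hann(N):
    """Hann window w(n) = 0.5 - 0.5*cos(2*pi*n/N), n = 0, ..., N-1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def stft(x, N=64, hop=32):
    """X(k, l) = sum_n w(n) * x(n + l*hop) * exp(-j*2*pi*k*n/N)."""
    w = hann(N)
    frames = []
    for l in range((len(x) - N) // hop + 1):
        segment = [w[n] * x[n + l * hop] for n in range(N)]
        frames.append([
            sum(segment[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N // 2 + 1)
        ])
    return frames

# Test signal: 500 Hz for the first half, 1500 Hz for the second half.
fs = 8000
x = ([math.sin(2 * math.pi * 500 * n / fs) for n in range(512)]
     + [math.sin(2 * math.pi * 1500 * n / fs) for n in range(512)])

S = stft(x)
spectrogram = [[abs(v) for v in frame] for frame in S]   # magnitude of the STFT

# Strongest bin in the first and last frame, converted to Hz (bin spacing fs / 64).
first_peak_hz = max(range(33), key=lambda k: spectrogram[0][k]) * fs / 64
last_peak_hz = max(range(33), key=lambda k: spectrogram[-1][k]) * fs / 64
```

A single DFT of the whole signal would show both tones at once; the frame-by-frame peaks show when each tone is present.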


Windowing

• Types of window functions
  - Trade-off between the width of the main lobe and the level of the side lobes

[Figure: window spectrum annotated with main-lobe width and side-lobe level]
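The trade-off can be measured numerically. The Python sketch below compares a rectangular window with a Hann window; these two are chosen only for illustration, since the slide does not name specific windows. The helper `lobe_stats` and its `pad_factor` parameter are hypothetical: the window is zero-padded to sample its spectrum finely, the first null is found by walking down the main lobe, and the highest peak beyond that null is taken as the side-lobe level.

```python
import cmath
import math

def lobe_stats(w, pad_factor=32):
    """Return (main-lobe width in DFT bins, peak side-lobe level in dB) of window w."""
    N = len(w)
    M = N * pad_factor   # zero-padded DFT length for a finely sampled spectrum
    mag = [abs(sum(w[n] * cmath.exp(-2j * math.pi * k * n / M) for n in range(N)))
           for k in range(M // 2 + 1)]
    # Walk down the main lobe until the magnitude starts rising again (first null).
    null = 0
    while null + 1 < len(mag) and mag[null + 1] < mag[null]:
        null += 1
    width_bins = 2 * null / pad_factor   # null-to-null width, in units of fs / N
    side_lobe_db = 20 * math.log10(max(mag[null:]) / mag[0])
    return width_bins, side_lobe_db

N = 32
rect = [1.0] * N
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

rect_width, rect_sl = lobe_stats(rect)
hann_width, hann_sl = lobe_stats(hann)
print(f"rectangular: width {rect_width:.2f} bins, side lobe {rect_sl:.1f} dB")
print(f"Hann:        width {hann_width:.2f} bins, side lobe {hann_sl:.1f} dB")
```

The Hann window has the wider main lobe and the lower side lobes, which is the trade-off named above.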


Short-Time Fourier Transform (STFT)

[Figure: STFT framing with 50% overlap between successive windows]


Example: Spectrogram

[Figure: spectrogram examples. Panels: Piano C4 note, Flute A4 note]


Example: Spectrogram - 3D waterfall

[Figure: 3D waterfall spectrogram examples. Panels: Piano C4 note, Flute A4 note]


Example: Pop Music


Example: Deep Note


Time-Frequency Resolutions in STFT

• Trade-off between time and frequency resolution by window size
  - Long window: high freq. resolution, low time resolution
  - Short window: high time resolution, low freq. resolution
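The trade-off can be put into numbers. In the Python sketch below, the sampling rate of 44100 Hz and the three window sizes are assumptions chosen for illustration, and the window length is taken to equal the FFT size. The bin spacing is fs / N and the time span of one window is N / fs, so their product is always 1.

```python
def resolutions(window_size, fs):
    """Return (frequency bin spacing in Hz, window duration in seconds)."""
    return fs / window_size, window_size / fs

fs = 44100
for window_size in (256, 1024, 4096):
    df, dt = resolutions(window_size, fs)
    print(f"N = {window_size:5d}: bin spacing {df:8.2f} Hz, window length {dt * 1000:7.2f} ms")
```

Making the window four times longer makes the bins four times narrower and the time span of each frame four times longer.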

