2018 Fall
CTP431: Music and Audio Computing
Sound Representations
Graduate School of Culture Technology, KAIST
Juhan Nam
Outline
• Introduction
• Time-domain representation
- Waveform
• Frequency-domain representation
- Discrete Fourier Transform (DFT)
• Time-frequency domain representation
- Short-time Fourier Transform (STFT)
- Spectrogram
Introduction
• Visualizing sound as an image or animation is very important
- For research purposes
  - Analyzing the properties of sound: loudness, pitch and timbre
  - More complicated patterns in different contexts
- For artistic purposes
  - Mapping the sound properties to visual elements
  - Visual elements become more important in music
• In this topic, we will focus on visualizing sound “as it is”
Time-domain Representation
• The raw waveform: the amplitude of sound over time
• Phonautograph (Leon Scott, 1857)
- The first invention of sound recording
- Recent research on image-to-sound restoration: http://firstsounds.org/
Source: http://edcarter.net/home/phonautogram/
Time-domain Representation
• Zoom-in view
- Loudness: yes
- Pitch: yes if the waveform is periodic (monophonic)
- Timbre: to some extent from the wave shape (e.g. round or squared)
• Zoom-out view
- Loudness: yes
- Pitch: no
- Timbre: to some extent from the amplitude envelope
Amplitude Envelope
• Summarized visualization of the waveform
- Computed by max-peak picking or root-mean-square (RMS)
• Parameterized with “ADSR” for musical tones
- Attack time, Decay time, Sustain level and Release time
• Used to determine gain in dynamic range compression
- e.g. compressor, expander
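The two envelope computations named above (max-peak picking and RMS) can be sketched frame by frame as follows. This is a minimal illustration in plain Python: the function name `amplitude_envelope`, the frame and hop sizes, and the decaying 440 Hz test tone are all assumptions made for the example, not part of the lecture material.

```python
import math

def amplitude_envelope(x, frame_size, hop_size, method="rms"):
    """Return one envelope value per frame of the waveform x."""
    envelope = []
    for start in range(0, len(x) - frame_size + 1, hop_size):
        frame = x[start:start + frame_size]
        if method == "peak":
            # Max-peak picking: largest absolute sample in the frame.
            value = max(abs(s) for s in frame)
        else:
            # Root-mean-square of the samples in the frame.
            value = math.sqrt(sum(s * s for s in frame) / frame_size)
        envelope.append(value)
    return envelope

# Hypothetical test tone: a 440 Hz sine with an exponentially decaying gain.
fs = 8000
x = [math.exp(-3.0 * n / fs) * math.sin(2 * math.pi * 440 * n / fs)
     for n in range(fs)]

peak_env = amplitude_envelope(x, frame_size=400, hop_size=200, method="peak")
rms_env = amplitude_envelope(x, frame_size=400, hop_size=200, method="rms")
print(len(peak_env), peak_env[0], rms_env[0])
```

For a sinusoid the RMS envelope sits below the peak envelope by roughly a factor of 1/sqrt(2), so the two curves have the same shape but different scales.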
Example: Amplitude Envelope
[Figure: amplitude envelopes of a piano C4 note and a flute A4 note]
Tone Generation and Perception Perspective
• Musical tones are generated as a combination of (sinusoidal) oscillation modes
[Figure: oscillation modes of a string. Source: https://www.acs.psu.edu/drussell/Demos/string/Fixed.html]
• The cochlea has frequency-selective responses
[Figure: the cochlea, labeled from high freq. to low freq. Source: http://acousticslab.org/psychoacoustics/PMFiles/Module03a.htm]
Frequency-Domain Representation
• Can we represent $x(n)$ with a finite set of sinusoids?
- $x(n) = \frac{1}{N}\sum_{k=0}^{N-1} A(k)\, r_k(n)$
- $r_k(n) = \cos\left(\frac{2\pi kn}{N} + \phi(k)\right)$: discrete-time sinusoid with length $N$
- Find $A(k)$, $\phi(k)$
Euler’s identity
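• Euler’s identity

$$e^{j\theta} = \cos\theta + j\sin\theta$$

- Can be proved by Taylor’s series
- If $\theta = \pi$, $e^{j\pi} + 1 = 0$ (“the most beautiful equation in math”)
• Properties

$$\cos\theta = \frac{e^{j\theta} + e^{-j\theta}}{2}, \qquad \sin\theta = \frac{e^{j\theta} - e^{-j\theta}}{2j}$$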
Complex Sinusoids
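• Cosine and sine can be represented in a single term

$$s_k(n) = e^{j2\pi kn/N} = \cos\left(\frac{2\pi kn}{N}\right) + j\sin\left(\frac{2\pi kn}{N}\right)$$

- Frequencies: $\frac{2\pi k}{N}$ radians per sample or $\frac{k}{N}F_s$ Hz ($F_s$: the sampling rate, $k = 0, 1, 2, \ldots, N-1$)
- Example: $N = 8$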
Figures are from https://ccrma.stanford.edu/~jos/dft/
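As a small numerical companion to the complex sinusoids above, the sketch below generates the family for N = 8 and checks that the real and imaginary parts are the cosine and sine given by Euler's identity. The helper name `complex_sinusoid` and the sampling rate of 8000 Hz are assumptions chosen for illustration only.

```python
import cmath
import math

def complex_sinusoid(k, N):
    """Return s_k(n) = exp(j*2*pi*k*n/N) for n = 0, ..., N-1."""
    return [cmath.exp(2j * math.pi * k * n / N) for n in range(N)]

N = 8
fs = 8000.0  # assumed sampling rate in Hz, for illustration only

max_error = 0.0
for k in range(N):
    s_k = complex_sinusoid(k, N)
    for n, value in enumerate(s_k):
        theta = 2 * math.pi * k * n / N
        # Euler's identity: real part is cos(theta), imaginary part is sin(theta).
        max_error = max(max_error,
                        abs(value.real - math.cos(theta)),
                        abs(value.imag - math.sin(theta)))
    # Frequency of the k-th sinusoid in radians per sample and in Hz.
    print(f"k={k}: {2 * k / N:.2f}*pi rad/sample, {k / N * fs:.0f} Hz")

print("largest deviation from cos + j*sin:", max_error)
```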
Complex Sinusoids
[Figure: complex sinusoids for N = 8. Figures are from https://ccrma.stanford.edu/~jos/dft/]
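Frequency-Domain Representation Using Complex Sinusoids
• $x(n)$ is expressed in a simpler form:

$$
\begin{aligned}
x(n) &= \frac{1}{N}\sum_{k=0}^{N-1} A(k)\cos\left(\frac{2\pi kn}{N} + \phi(k)\right) \\
&= \frac{1}{N}\sum_{k=0}^{N-1} A(k)\,\frac{e^{j(2\pi kn/N + \phi(k))} + e^{-j(2\pi kn/N + \phi(k))}}{2} \\
&= \frac{1}{N}\sum_{k=0}^{N-1} \frac{A(k)e^{j\phi(k)}e^{j2\pi kn/N} + A(k)e^{-j\phi(k)}e^{-j2\pi kn/N}}{2} \\
&= \frac{1}{N}\sum_{k=0}^{N-1} \frac{X(k)e^{j2\pi kn/N} + X^{*}(k)e^{-j2\pi kn/N}}{2} \\
&= \mathrm{Real}\left\{\frac{1}{N}\sum_{k=0}^{N-1} X(k)e^{j2\pi kn/N}\right\} \\
&= \frac{1}{N}\sum_{k=0}^{N-1} X(k)e^{j2\pi kn/N}
\end{aligned}
$$

where $X(k) = A(k)e^{j\phi(k)} = A(k)\left(\cos\phi(k) + j\sin\phi(k)\right)$. The last step holds when $X(N-k) = X^{*}(k)$, which makes the sum real.
- Now, how can we find $X(k)$?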
Orthogonality of Sinusoids
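• Inner product between two complex sinusoids

$$s_p(n)\cdot s_q^{*}(n) = \sum_{n=0}^{N-1} e^{j2\pi pn/N}\cdot e^{-j2\pi qn/N} = \begin{cases} N & \text{if } p = q \\ 0 & \text{otherwise} \end{cases}$$

• Corresponding sums for real sinusoids (for $1 \le p, q \le N-1$ with $p, q \ne N/2$)

$$\sum_{n=0}^{N-1} \cos\left(\frac{2\pi pn}{N}\right)\cos\left(\frac{2\pi qn}{N}\right) = \begin{cases} N/2 & \text{if } p = q \text{ or } p = N - q \\ 0 & \text{otherwise} \end{cases}$$

$$\sum_{n=0}^{N-1} \sin\left(\frac{2\pi pn}{N}\right)\sin\left(\frac{2\pi qn}{N}\right) = \begin{cases} N/2 & \text{if } p = q \\ -N/2 & \text{if } p = N - q \\ 0 & \text{otherwise} \end{cases}$$

$$\sum_{n=0}^{N-1} \cos\left(\frac{2\pi pn}{N}\right)\sin\left(\frac{2\pi qn}{N}\right) = 0$$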
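The orthogonality relations can be checked numerically. The sketch below is a plain-Python illustration; the helper names `complex_inner` and `real_inner` and the choice N = 8 are assumptions for the example.

```python
import cmath
import math

def complex_inner(p, q, N):
    """Sum over n of s_p(n) * conj(s_q(n))."""
    return sum(cmath.exp(2j * math.pi * p * n / N) *
               cmath.exp(-2j * math.pi * q * n / N) for n in range(N))

def real_inner(f, g, p, q, N):
    """Sum over n of f(2*pi*p*n/N) * g(2*pi*q*n/N), with f and g cos or sin."""
    return sum(f(2 * math.pi * p * n / N) * g(2 * math.pi * q * n / N)
               for n in range(N))

N = 8
print(abs(complex_inner(3, 3, N)))                  # close to N
print(abs(complex_inner(3, 5, N)))                  # close to 0
print(real_inner(math.cos, math.cos, 1, 7, N))      # close to N/2 (p = N - q)
print(real_inner(math.sin, math.sin, 1, 7, N))      # close to -N/2 (p = N - q)
print(real_inner(math.cos, math.sin, 2, 2, N))      # close to 0
```

Floating-point rounding leaves residues on the order of 1e-15, so the results are compared against N, N/2 and 0 with a small tolerance rather than exactly.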
Orthogonal Projection on Complex Sinusoids
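• Take the inner product of the signal with each complex sinusoid

$$
\begin{aligned}
x(n)\cdot s_k^{*}(n) &= \sum_{n=0}^{N-1} x(n)e^{-j2\pi kn/N} = \sum_{n=0}^{N-1}\left(\frac{1}{N}\sum_{m=0}^{N-1} X(m)e^{j2\pi mn/N}\right)e^{-j2\pi kn/N} \\
&= \frac{1}{N}\sum_{m=0}^{N-1} X(m)\left(\sum_{n=0}^{N-1} e^{j2\pi mn/N}e^{-j2\pi kn/N}\right) \\
&= \frac{1}{N}X(k)N = X(k) = A(k)e^{j\phi(k)}
\end{aligned}
$$

- By orthogonality, the inner sum is $N$ when $m = k$ and 0 otherwise, so only the $m = k$ term survives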
To Wrap Up
• Discrete Fourier Transform

$$X(k) = \sum_{n=0}^{N-1} x(n)e^{-j2\pi kn/N} = X_R(k) + jX_I(k) = A(k)e^{j\phi(k)}$$

- Magnitude spectrum: $|X(k)| = A(k) = \sqrt{X_R^2(k) + X_I^2(k)}$
- Phase spectrum: $\angle X(k) = \phi(k) = \tan^{-1}\left(\frac{X_I(k)}{X_R(k)}\right)$
• Inverse Discrete Fourier Transform

$$x(n) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)e^{j2\pi kn/N}$$
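The DFT and inverse DFT definitions translate almost directly into code. The following is a minimal, deliberately slow sketch in plain Python; the function names `dft` and `idft` and the cosine test signal are assumptions for illustration. `cmath.phase` is used for the phase spectrum, which evaluates the two-argument arctangent so that the quadrant of the angle is resolved.

```python
import cmath
import math

def dft(x):
    """X(k) = sum_n x(n) * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """x(n) = (1/N) * sum_k X(k) * exp(j*2*pi*k*n/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

# Hypothetical test signal: 4 cycles of a cosine in N = 32 samples,
# amplitude 0.5 and initial phase pi/3.
N = 32
x = [0.5 * math.cos(2 * math.pi * 4 * n / N + math.pi / 3) for n in range(N)]

X = dft(x)
magnitude = [abs(v) for v in X]      # A(k) = sqrt(X_R^2 + X_I^2)
phase = [cmath.phase(v) for v in X]  # phi(k), via atan2(X_I, X_R)

x_rec = idft(X)
reconstruction_error = max(abs(a - b) for a, b in zip(x, x_rec))
print(magnitude[4], phase[4], reconstruction_error)
```

The phase values at bins where the magnitude is essentially zero are dominated by rounding noise and are not meaningful.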
Properties of DFT
• Periodicity
- $X(k) = X(k + N) = X(k + 2N) = \ldots$
- $X(k) = X(k - N) = X(k - 2N) = \ldots$
• Symmetry (for real-valued $x(n)$)
- Magnitude response: $|X(k)| = |X(-k)| = |X(N - k)|$
- Phase response: $\angle X(k) = -\angle X(-k) = -\angle X(N - k)$
- We often display only half the magnitude and phase responses
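Both properties can be verified by evaluating the DFT sum at bin indices outside 0 to N-1. This is a small sketch under stated assumptions: the helper `dft_bin` and the two-component real test signal are made up for the example, and the symmetry check relies on the signal being real-valued.

```python
import cmath
import math

def dft_bin(x, k):
    """Evaluate X(k) for any integer k, including k < 0 or k >= N."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))

N = 16
# Hypothetical real-valued test signal with two sinusoidal components.
x = [math.sin(2 * math.pi * 3 * n / N)
     + 0.5 * math.cos(2 * math.pi * 5 * n / N + 0.7)
     for n in range(N)]

periodicity_error = 0.0
symmetry_error = 0.0
for k in range(N):
    X_k = dft_bin(x, k)
    # Periodicity: X(k) = X(k + N) = X(k - N).
    periodicity_error = max(periodicity_error,
                            abs(X_k - dft_bin(x, k + N)),
                            abs(X_k - dft_bin(x, k - N)))
    # Conjugate symmetry for real x: X(k) = conj(X(-k)) = conj(X(N - k)),
    # which gives equal magnitudes and negated phases.
    symmetry_error = max(symmetry_error,
                         abs(X_k - dft_bin(x, -k).conjugate()),
                         abs(X_k - dft_bin(x, N - k).conjugate()))

print(periodicity_error, symmetry_error)
```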
Properties of DFT
[Figure: a waveform with its magnitude and phase spectra (N=32), plotted twice: over $k = 0$ to $N - 1$, illustrating $|X(k)| = |X(N - k)|$ and $\angle X(k) = -\angle X(N - k)$, and over $k = -N/2$ to $N/2$, illustrating $|X(k)| = |X(-k)|$ and $\angle X(k) = -\angle X(-k)$]
Frequency Scaling
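• $X(k)$ ($k = 0, 1, \ldots, N - 1$) corresponds to the frequencies $k f_s / N$, which are evenly spaced from 0 up to (but not including) $f_s$ in Hz
[Figure: frequency axis on which the bin indices $-N$, $-N/2$, 0, $N/2$, $N$ line up with $-f_s$, $-f_s/2$, 0, $f_s/2$, $f_s$]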
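The mapping from bin index to frequency in Hz is a one-line computation. In the sketch below, the function names, the sampling rate of 44100 Hz and the DFT size of 1024 are assumptions for illustration; the second helper relabels the upper half of the bins as negative frequencies, matching the centered axis.

```python
def bin_frequencies(N, fs):
    """Frequency in Hz of each DFT bin k = 0, ..., N-1 (bin k is k*fs/N)."""
    return [k * fs / N for k in range(N)]

def centered_bin_frequencies(N, fs):
    """Same bins, with bins at or above N/2 relabelled as negative frequencies."""
    return [(k if k < N / 2 else k - N) * fs / N for k in range(N)]

fs = 44100.0  # assumed sampling rate in Hz
N = 1024      # assumed DFT size

freqs = bin_frequencies(N, fs)
centered = centered_bin_frequencies(N, fs)

print("bin spacing fs/N:", freqs[1])
print("bin N/2 (Nyquist, fs/2):", freqs[N // 2])
print("last bin as a negative frequency:", centered[N - 1])
```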
Examples of DFT
[Figure: waveform (amplitude vs. time in milliseconds) and magnitude spectrum (magnitude vs. frequency) of three sounds: a sine, a drum, and a flute]
Fast Fourier Transform (FFT)
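• Matrix multiplication view of DFT

$$
\begin{bmatrix} X(0) \\ X(1) \\ X(2) \\ X(3) \\ \vdots \\ X(N-1) \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & W_N & W_N^{2} & \cdots & W_N^{N-1} \\
1 & W_N^{2} & W_N^{4} & \cdots & W_N^{2(N-1)} \\
1 & W_N^{3} & W_N^{6} & \cdots & W_N^{3(N-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & W_N^{N-1} & W_N^{2(N-1)} & \cdots & W_N^{(N-1)(N-1)}
\end{bmatrix}
\begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \\ \vdots \\ x(N-1) \end{bmatrix},
\qquad W_N = e^{-j2\pi/N}
$$

• In fact, we don’t compute this directly. There is a more efficient way, called the “Fast Fourier Transform (FFT)”
- Complexity reduction by FFT: $O(N^2) \rightarrow O(N\log_2 N)$
- Divide and conquer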
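To make the two views concrete, the sketch below computes the DFT once as a matrix-vector product and once with a recursive radix-2 decimation-in-time FFT. The slides do not say which FFT variant is meant, so the radix-2 scheme here is just one common divide-and-conquer choice; the function names and the test signal are assumptions for illustration.

```python
import cmath
import math

def dft_matrix(N):
    """N x N matrix with entries W_N^(k*n), where W_N = exp(-j*2*pi/N)."""
    return [[cmath.exp(-2j * math.pi * k * n / N) for n in range(N)]
            for k in range(N)]

def dft_by_matrix(x):
    """O(N^2): one row-times-vector product per output bin."""
    W = dft_matrix(len(x))
    return [sum(w * s for w, s in zip(row, x)) for row in W]

def fft(x):
    """O(N log2 N) recursive radix-2 FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of the even-indexed samples
    odd = fft(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * math.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle
        out[k + N // 2] = even[k] - twiddle
    return out

# Hypothetical test signal of length 16 (a power of two).
x = [math.sin(0.3 * n) + 0.1 * n for n in range(16)]
max_diff = max(abs(a - b) for a, b in zip(dft_by_matrix(x), fft(x)))
print("largest difference between the two results:", max_diff)
```

Each recursion level splits the length-N problem into two length-N/2 problems plus N/2 multiply-add pairs, which is where the N log2 N cost comes from.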
Time-Frequency Domain Representation
• DFT assumes that the signal is stationary
- It is not a good idea to apply DFT to a long and dynamically changing signal like music
- Instead, we segment the signal and apply DFT separately
• Short-Time Fourier Transform

$$X(k, l) = \sum_{n=0}^{N-1} w(n)\,x(n + l \cdot h)\,e^{-j2\pi kn/N}$$

- $h$: hop size, $w(n)$: window, $N$: FFT size
• This produces 2-D time-frequency representations
- Parameters: window size, window type, FFT size, hop size
- “Spectrogram” from the magnitude
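The STFT formula can be sketched directly as a loop over frames. The example below makes several assumptions that are not in the slides: the window is a Hann window, the FFT size equals the window length (no zero-padding), the hop size is half the window (50% overlap), and the test signal switches from 500 Hz to 1500 Hz halfway through. The inner DFT is computed naively for clarity rather than speed.

```python
import cmath
import math

def hann(N):
    """Periodic Hann window of length N (one common window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def stft(x, w, hop):
    """X(k, l) = sum_n w(n) * x(n + l*hop) * exp(-j*2*pi*k*n/N), N = len(w)."""
    N = len(w)
    frames = []
    l = 0
    while l * hop + N <= len(x):
        segment = [w[n] * x[n + l * hop] for n in range(N)]
        frames.append([sum(segment[n] * cmath.exp(-2j * math.pi * k * n / N)
                           for n in range(N)) for k in range(N)])
        l += 1
    return frames  # indexed as frames[l][k]

fs = 8000
N = 64        # window length and FFT size (assumed equal)
hop = N // 2  # 50% overlap

# Hypothetical test signal: 500 Hz for 512 samples, then 1500 Hz for 512 samples.
x = ([math.sin(2 * math.pi * 500 * n / fs) for n in range(512)]
     + [math.sin(2 * math.pi * 1500 * n / fs) for n in range(512)])

X = stft(x, hann(N), hop)

# Spectrogram: magnitudes of the non-negative-frequency bins of each frame.
spectrogram = [[abs(v) for v in frame[:N // 2 + 1]] for frame in X]
peak_bins = [max(range(len(frame)), key=lambda k: frame[k])
             for frame in spectrogram]
print(len(X), peak_bins[0], peak_bins[-1])
```

With a bin spacing of fs/N = 125 Hz, the strongest bin moves from the one at 500 Hz to the one at 1500 Hz as the frames cross the midpoint of the signal, which is the time-varying behavior a single DFT of the whole signal cannot show.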
Windowing
• Types of window functions
- Trade-off between the width of the main lobe and the level of the side lobes
[Figure: window spectrum annotated with main-lobe width and side-lobe level]
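The main-lobe and side-lobe trade-off can be measured numerically from a zero-padded DFT of each window. The sketch below compares a rectangular window with a Hann window; both window choices, the length of 32 samples, the zero-padded size of 1024 and the helper names are assumptions for illustration. The main-lobe width is reported in units of the bin spacing of the unpadded window.

```python
import cmath
import math

def rectangular(N):
    return [1.0] * N

def hann(N):
    """Periodic Hann window of length N."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def magnitude_db(w, n_fft):
    """Zero-padded DFT magnitude of w in dB (0 dB at DC), bins 0 .. n_fft/2."""
    mags = [abs(sum(w[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n in range(len(w)))) for k in range(n_fft // 2 + 1)]
    # Clamp very small values so that exact nulls do not break log10.
    return [20 * math.log10(max(m / mags[0], 1e-12)) for m in mags]

def main_lobe_and_side_lobe(w, n_fft=1024):
    """Return (main-lobe width in unpadded bins, peak side-lobe level in dB)."""
    db = magnitude_db(w, n_fft)
    # The main lobe ends at the first local minimum after DC.
    k = 1
    while k < len(db) - 1 and db[k + 1] < db[k]:
        k += 1
    first_null = k
    width_bins = 2 * first_null * len(w) / n_fft
    return width_bins, max(db[first_null:])

rect_width, rect_sll = main_lobe_and_side_lobe(rectangular(32))
hann_width, hann_sll = main_lobe_and_side_lobe(hann(32))
print(f"rectangular: main lobe {rect_width:.1f} bins, side lobe {rect_sll:.1f} dB")
print(f"Hann:        main lobe {hann_width:.1f} bins, side lobe {hann_sll:.1f} dB")
```

The Hann window pays for its much lower side lobes with a main lobe twice as wide as that of the rectangular window.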
Short-Time Fourier Transform (STFT)
[Figure: STFT framing with 50% overlap between successive windows]
Example: Spectrogram
[Figure: spectrograms of a piano C4 note and a flute A4 note]
Example: Spectrogram - 3D waterfall
[Figure: 3D waterfall spectrograms of a piano C4 note and a flute A4 note]
Example: Pop Music
Example: Deep Note
Time-Frequency Resolutions in STFT
• Trade-off between time and frequency resolution by window size
- Long window: high freq. resolution, low time resolution
- Short window: high time resolution, low freq. resolution
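A short worked calculation makes the trade-off concrete. The sketch below assumes the FFT size equals the window size and uses the bin spacing fs/N as a rough proxy for frequency resolution and the window duration N/fs as a proxy for time resolution; the sampling rate and the three window sizes are arbitrary example values.

```python
def stft_resolution(window_size, fs):
    """Return (bin spacing in Hz, window duration in seconds)."""
    return fs / window_size, window_size / fs

fs = 44100.0  # assumed sampling rate in Hz

for window_size in (256, 1024, 4096):
    df, dt = stft_resolution(window_size, fs)
    print(f"N={window_size:5d}: bin spacing {df:7.2f} Hz, "
          f"window length {dt * 1000:6.2f} ms")
```

The product of the two quantities is always 1, so making one finer necessarily makes the other coarser.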