Posted on 06-Jul-2020
E85.2607: Lecture 8 – Source-Filter Processing
E85.2607: Lecture 8 – Source-Filter Processing 2010-04-01 1 / 21
Source-filter analysis/synthesis
[Figure: analysis → transformation → synthesis, with the signal represented as a source signal plus a spectral envelope]
Separate:
Source/excitation: fine time/frequency structure (e.g. pitch)
Filter: broad spectral shape (resonances)
Similar to subtractive synthesis
Satisfying physical interpretation for real-world signals
Easier to make sense of than e.g. phase
Human speech production
Reasonable approximation to speech signals:
Source is the oscillation of the vocal cords
e.g. normal speech (varying pitches) vs. whispering
Filtered by the vocal tract (throat + tongue + lips)
e.g. “oooh” vs. “aaah”
Resonances = formants
Both are time-varying
Source filter model
[Figure: excitation source (time domain) → resonance filter (frequency domain). Plots: time signal of the prediction error e(n); magnitude spectra |X(f)| and |G·H(f)| in dB vs. f/kHz]
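To make the model concrete, here is a minimal NumPy sketch that is not taken from the lecture: a pulse train (voiced) or white noise (whispered) excitation is passed through a cascade of two-pole resonators, one per formant. The helper names, the formant values (rough textbook figures for an “aaah”-like vowel) and the bandwidth-to-radius rule r = exp(−πB/fs) are assumptions for illustration.

```python
import numpy as np

def resonator_coeffs(freq_hz, bw_hz, fs):
    """(a1, a2) for the two-pole resonator 1 / (1 - a1 z^-1 - a2 z^-2)."""
    r = np.exp(-np.pi * bw_hz / fs)        # pole radius from bandwidth (approximation)
    theta = 2.0 * np.pi * freq_hz / fs     # pole angle from centre frequency
    return np.array([2.0 * r * np.cos(theta), -r * r])

def all_pole(e, a):
    """Direct-form all-pole filter: y[n] = e[n] + sum_k a[k-1] * y[n-k]."""
    y = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                acc += a[k - 1] * y[n - k]
        y[n] = acc
    return y

def source_filter(f0_hz, formants, fs=8000, dur_s=0.5, voiced=True, seed=0):
    """Excitation (pulse train or white noise) through a cascade of formant resonators."""
    n = int(fs * dur_s)
    if voiced:                             # "normal speech": periodic pulses at f0
        e = np.zeros(n)
        e[::int(round(fs / f0_hz))] = 1.0
    else:                                  # "whispering": noise excitation, no pitch
        e = 0.1 * np.random.default_rng(seed).standard_normal(n)
    y = e
    for freq_hz, bw_hz in formants:        # vocal tract modeled as cascaded resonances
        y = all_pole(y, resonator_coeffs(freq_hz, bw_hz, fs))
    return y

# Illustrative (assumed) formant frequencies and bandwidths in Hz for an "aaah"-like vowel
AAH = [(730, 90), (1090, 110), (2440, 170)]
voiced_aah = source_filter(120, AAH)
whispered_aah = source_filter(120, AAH, voiced=False)
```

Both outputs share the same filter; only the excitation differs, which is exactly the source/filter split the slide describes.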
Formants in speech
[Figure: spectrogram of the phrase “watch thin as a dime” with phonetic labels]
How to separate the source and filter?
[Figure: x(n) → spectral envelope estimation (channel vocoder, LPC, or cepstrum) → source signal e1(n); spectral envelope transformation and source signal processing → y(n)]
Short-time analysis
For each frame, estimate the spectral envelope (filter response):
1. Channel vocoder (frequency domain)
2. Linear Predictive Coding (LPC) (time domain)
3. Cepstral analysis
The source signal is what's left over (the residual) after “whitening”
Channel vocoder
Wideband STFT filterbank
but using relatively few filters
Linearly spaced with equal bandwidth (STFT), or
Logarithmically spaced (constant-Q filter bank)
Take the RMS energy in each frequency band
[Figure: channel vocoder. x(n) is split by band-pass filters BP 1 … BP k; each band output is squared and low-pass filtered to give x_RMS1(n) … x_RMSk(n). Channel stacking: (a) octave-spaced, (b) equally spaced. Plot: short-time spectrum and spectral envelope, X(f)/dB vs. f/Hz (0 to 8000)]
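The band-RMS analysis above can be sketched in NumPy as follows. This is a hypothetical, simplified channel vocoder front end: band-pass filtering is idealised by zeroing FFT bins, and the low-pass after squaring is a 20 ms moving average; a practical implementation would use proper filters. `band_edges` produces the two channel stackings from the figure.

```python
import numpy as np

def band_edges(n_bands, f_max, spacing="linear"):
    """Band edges in Hz for equally spaced or octave-spaced channel stacking."""
    if spacing == "linear":                     # equal bandwidths
        return np.linspace(0.0, f_max, n_bands + 1)
    if spacing == "octave":                     # each band twice as wide as the one below
        edges = f_max / 2.0 ** np.arange(n_bands, -1, -1)
        edges[0] = 0.0                          # lowest band extends down to DC
        return edges
    raise ValueError("spacing must be 'linear' or 'octave'")

def bandpass_fft(x, fs, lo, hi):
    """Idealised band-pass: zero every FFT bin outside [lo, hi)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(f < lo) | (f >= hi)] = 0.0
    return np.fft.irfft(X, n=len(x))

def channel_rms(x, fs, edges, smooth_ms=20.0):
    """Per-band RMS envelopes: band-pass -> square -> low-pass (moving average) -> sqrt."""
    win = max(1, int(round(fs * smooth_ms / 1000.0)))
    lp = np.ones(win) / win
    env = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        xb = bandpass_fft(x, fs, lo, hi)
        env.append(np.sqrt(np.convolve(xb ** 2, lp, mode="same")))
    return np.array(env)                        # shape (n_bands, len(x))
```

For a 1 kHz sine at fs = 16 kHz with eight 1 kHz-wide bands, essentially all the energy lands in the second band, with an RMS value of about 0.707 away from the signal edges.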
Channel vocoder using FFT
Low-pass filter the magnitude of each STFT frame along frequency
i.e. smooth the columns of the spectrogram
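A sketch of this FFT-based variant, assuming a Hann-windowed STFT and a simple moving average as the low-pass along frequency (both choices, and the function names, are assumptions):

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=128):
    """Magnitude STFT with one column per frame: shape (n_fft // 2 + 1, n_frames)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0))

def smooth_columns(S, width=9):
    """Low-pass each column (frame) along frequency with a moving average of `width` bins."""
    kernel = np.ones(width) / width
    return np.apply_along_axis(lambda col: np.convolve(col, kernel, mode="same"), 0, S)

fs = 16000
x = np.zeros(fs)
x[::80] = 1.0                 # 200 Hz pulse train: harmonics every 6.4 bins at n_fft=512
S = stft_mag(x)               # fine harmonic structure in every column
E = smooth_columns(S)         # smoothed columns: a crude spectral envelope
```

The smoothing width has to span more than one harmonic spacing, otherwise the pitch harmonics leak into the envelope.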
Linear predictive coding
Predict the next input sample as a linear combination of previous samples
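[Figure: excitation source → synthesis filter (spectral envelope model) → sound; the predictor is a tapped delay line of z^-1 elements weighted by a1, a2, …, ap, whose output is subtracted from x(n) to give e(n)]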
Filter is described by a few filter coefficients for each frame
$x[n] \approx \hat{x}[n] = \sum_{k=1}^{p} a_k\, x[n-k]$
The excitation is what's left after filtering (the residual, aka prediction error):
$e[n] = x[n] - \hat{x}[n] = x[n] - \sum_{k=1}^{p} a_k\, x[n-k]$
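The prediction and residual equations translate directly into code. A minimal sketch with hypothetical helper names, assuming samples before the start of the signal are zero:

```python
import numpy as np

def lpc_predict(x, a):
    """x_hat[n] = sum_{k=1}^{p} a[k-1] * x[n-k], taking x[n] = 0 for n < 0."""
    x = np.asarray(x, dtype=float)
    x_hat = np.zeros_like(x)
    for k in range(1, len(a) + 1):
        if k >= len(x):
            break
        x_hat[k:] += a[k - 1] * x[:len(x) - k]
    return x_hat

def lpc_residual(x, a):
    """Prediction error e[n] = x[n] - x_hat[n]."""
    return np.asarray(x, dtype=float) - lpc_predict(x, a)
```

For example, if x is the impulse response of the all-pole filter with coefficients a, `lpc_residual(x, a)` returns the original impulse: the predictor explains everything except the excitation.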
LPC analysis/synthesis
[Figure: (a) LPC analysis: the output of P(z) is subtracted from x(n) to give e(n); (b) LPC synthesis: ẽ(n) plus P(z) applied to the output gives y(n)]
P(z) is just an FIR filter: $P(z) = \sum_{k=1}^{p} a_k z^{-k}$
The excitation is still a filtered version of the input:
$E(z) = X(z)\,\big(1 - P(z)\big)$
For synthesis, pass the (approximate) excitation through the inverse filter:
$Y(z) = E(z)\,H(z), \qquad H(z) = \dfrac{1}{1 - P(z)}$
All-pole “autoregressive” (AR) modeling
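The synthesis side as a sketch, with hypothetical names: a direct-form recursion for H(z) = 1/(1 − P(z)), plus the envelope |H(e^{jω})| evaluated on a frequency grid. When the exact residual is fed back in with the same coefficients, the output reproduces the input.

```python
import numpy as np

def lpc_synthesize(e, a):
    """All-pole synthesis H(z) = 1 / (1 - P(z)):  y[n] = e[n] + sum_k a[k-1] * y[n-k]."""
    a = np.asarray(a, dtype=float)
    y = np.zeros(len(e))
    for n in range(len(e)):
        past = y[max(0, n - len(a)):n][::-1]      # y[n-1], y[n-2], ... (at most p values)
        y[n] = e[n] + np.dot(a[:len(past)], past)
    return y

def lpc_envelope(a, n_freqs=512):
    """|H(e^{jw})| = 1 / |1 - sum_k a_k e^{-jwk}| on n_freqs points in [0, pi)."""
    w = np.arange(n_freqs) * np.pi / n_freqs
    k = np.arange(1, len(a) + 1)
    P = np.exp(-1j * np.outer(w, k)) @ np.asarray(a, dtype=float)
    return w, 1.0 / np.abs(1.0 - P)
```

The analysis filter 1 − P(z) and the synthesis filter H(z) are exact inverses, so errors only enter through whatever approximation is made to e[n].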
LPC - varying filter order
The LPC filter H(z) models the spectrum of x[n]
Minimizing the energy of the residual e[n] gives the optimal coefficients:
$\{a_k\} = \arg\min_{\{a_k\}} \sum_n \Big( x[n] - \sum_{k=1}^{p} a_k\, x[n-k] \Big)^2$
The approximation improves with increasing filter order p
[Figure: spectra of the original and of LPC filters of order p = 10, 20, 40, 60, 80, 120; |X(f)|/dB vs. f/kHz]
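Because the criterion above is an ordinary least-squares problem, it can also be solved directly, which makes the effect of the order p easy to check. A sketch (`lpc_lstsq` is a hypothetical name) assuming samples before the frame are zero; with that choice the regressors for order p are a subset of those for order p + 1, so the residual energy can only decrease as p grows.

```python
import numpy as np

def lpc_lstsq(x, p):
    """Minimise sum_n (x[n] - sum_k a_k x[n-k])^2 by least squares.
    Assumption: x[n] = 0 for n < 0. Returns (a, residual energy)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    X = np.zeros((N, p))
    for k in range(1, p + 1):
        X[k:, k - 1] = x[:N - k]                # column k-1 holds x[n-k]
    a, *_ = np.linalg.lstsq(X, x, rcond=None)
    residual_energy = float(np.sum((x - X @ a) ** 2))
    return a, residual_energy
```

In practice the autocorrelation formulation on the next slide is preferred, since it leads to a Toeplitz system and a guaranteed-stable synthesis filter.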
Estimating LPC parameters
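Set the derivative of $\sum_n e^2[n]$ with respect to each $a_k$ to zero and solve for the $a_k$:
$\dfrac{\partial}{\partial a_k} \sum_n e^2[n] = 0, \quad k = 1, \dots, p$
This yields $p$ linear equations involving autocorrelations of $x$:
$\sum_m x[m]\,x[m-k] = \sum_{i=1}^{p} a_i \sum_m x[m-i]\,x[m-k], \quad k = 1, \dots, p$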
Solve using Levinson-Durbin recursion
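A sketch of the Levinson-Durbin recursion for these normal equations, using the same sign convention as above (x̂[n] = Σ a_k x[n−k]) and a biased autocorrelation r[k]. It solves the p × p Toeplitz system in O(p²) operations rather than the O(p³) of a general solver; the function names are hypothetical.

```python
import numpy as np

def autocorr(x, p):
    """r[k] = sum_m x[m] x[m-k] for k = 0..p (frame treated as zero outside)."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[k:], x[:max(len(x) - k, 0)]) for k in range(p + 1)])

def levinson_durbin(r, p):
    """Solve sum_{i=1}^p a_i r[|i-k|] = r[k], k = 1..p.
    Returns predictor coefficients a and the final prediction-error energy."""
    a = np.zeros(p)
    err = r[0]
    for m in range(1, p + 1):
        # reflection coefficient for order m
        k_m = (r[m] - np.dot(a[:m - 1], r[m - 1:0:-1])) / err
        prev = a[:m - 1].copy()
        a[:m - 1] = prev - k_m * prev[::-1]     # update lower-order coefficients
        a[m - 1] = k_m
        err *= 1.0 - k_m * k_m                  # error shrinks at every order
    return a, err
```

The reflection coefficients k_m all have magnitude below 1 when r comes from real data, which is what keeps the resulting all-pole filter stable.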
LPC example
[Figure: LPC example. Panels: windowed original and LPC residual (time/samples); original spectrum, LPC spectrum and residual spectrum (dB vs. freq/Hz); filter poles in the z-plane]
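To read formant-like resonances off the filter poles shown in the z-plane panel, one can take the roots of 1 − Σ a_k z^{−k}. A sketch; the bandwidth formula B = −ln|z|·fs/π is the usual narrow-resonance approximation and is assumed here, and the function name is hypothetical.

```python
import numpy as np

def lpc_poles_to_formants(a, fs):
    """Roots of 1 - sum_k a_k z^-k -> (frequencies Hz, bandwidths Hz), poles with Im > 0."""
    poly = np.concatenate(([1.0], -np.asarray(a, dtype=float)))  # z^p - a1 z^(p-1) - ... - ap
    poles = np.roots(poly)
    poles = poles[np.imag(poles) > 0]                    # one pole of each conjugate pair
    freqs = np.angle(poles) * fs / (2 * np.pi)           # pole angle -> frequency
    bws = -np.log(np.abs(poles)) * fs / np.pi            # pole radius -> bandwidth
    order = np.argsort(freqs)
    return freqs[order], bws[order]
```

Real poles (on the positive or negative real axis) are dropped here; they shape the overall spectral tilt rather than individual formants.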
Short-time LPC analysis
E4896 Music Signal Processing (Dan Ellis) 2010-02-22 - /16
Short-Time LP Analysis: solve LPC for each ~20 ms frame
[Figure: short-time LP analysis. Spectrogram and frame-by-frame LPC spectra (freq/kHz vs. time/s), and LPC filter poles in the complex plane (real vs. imaginary part)]
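Frame-by-frame analysis can be sketched as below. Assumptions: Hamming-windowed 20 ms frames with a 10 ms hop, the autocorrelation method, and a direct `np.linalg.solve` on the Toeplitz matrix for brevity (Levinson-Durbin would normally be used). The function names are hypothetical.

```python
import numpy as np

def lpc_autocorr(frame, p):
    """Autocorrelation-method LPC for one Hamming-windowed frame (direct Toeplitz solve)."""
    w = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    r = np.array([np.dot(w[k:], w[:max(len(w) - k, 0)]) for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    R += 1e-9 * (r[0] + 1e-12) * np.eye(p)      # tiny ridge so silent frames stay solvable
    return np.linalg.solve(R, r[1:])

def short_time_lpc(x, fs, p=12, frame_ms=20.0, hop_ms=10.0):
    """One row of predictor coefficients per analysis frame."""
    n_frame = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    starts = range(0, len(x) - n_frame + 1, hop)
    return np.array([lpc_autocorr(x[s:s + n_frame], p) for s in starts])
```

Each row describes the spectral envelope for one frame, so a time-varying "filter" such as a moving vocal tract shows up as coefficients that change from row to row.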
Cepstral analysis
cepstrum = String.reverse(“spec”) + “trum”
Entire lexicon of funny anagrams
Insight: source and filter add in the log spectral domain
$X(z) = E(z)\,H(z)$
$\log X(z) = \log E(z) + \log H(z)$
Makes them easy to separate
[Figure: real-cepstrum block diagram. y(n) = x(n) * h(n), windowed by w(n) → FFT → log|Y(k)| → IFFT → real cepstrum c(n). Low-pass lifter w_LP(n) → c_h(n) → FFT → spectral envelope C_h(k) = log|H(k)|; high-pass lifter w_HP(n) → c_x(n) → FFT → source C_x(k) = log|X(k)|]
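The additivity is easy to check numerically. In this sketch the convolution is circular, so Y(k) = X(k)H(k) holds exactly for the DFT and the real cepstra add exactly; for windowed frames of a linear convolution the relation is only approximate. The signals are arbitrary stand-ins.

```python
import numpy as np

def real_cepstrum(x, n_fft):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x, n_fft)))).real

n_fft = 256
rng = np.random.default_rng(0)
source = rng.standard_normal(n_fft)            # stand-in excitation
h = np.zeros(n_fft)
h[:3] = [1.0, -0.5, 0.25]                      # stand-in short filter (no zeros on |z| = 1)
# Circular convolution, so Y(k) = X(k) H(k) holds exactly for the DFT
y = np.fft.ifft(np.fft.fft(source) * np.fft.fft(h)).real
c_y = real_cepstrum(y, n_fft)
c_x = real_cepstrum(source, n_fft)
c_h = real_cepstrum(h, n_fft)
# Convolution in time has become addition in the cepstral domain: c_y = c_x + c_h
```

The short filter's cepstrum is concentrated at low quefrencies, which is what makes it separable from the excitation by liftering.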
Liftering example
By low-pass “liftering” the cepstrum we obtain the spectral envelope of the signal
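Low-pass liftering as a sketch: keep the real-cepstrum coefficients with quefrency |n| below a cutoff and transform back. The cutoff of 30 samples and the natural-log scale are assumptions (multiply by 20/ln 10 for dB); the cutoff must sit below the pitch period for the harmonics to be removed.

```python
import numpy as np

def lifter_envelope(frame, n_fft=1024, cutoff=30):
    """Spectral envelope by low-pass liftering the real cepstrum.
    Keeps quefrencies |n| < cutoff; returns (envelope, log_mag), both natural-log magnitude."""
    log_mag = np.log(np.abs(np.fft.fft(frame, n_fft)) + 1e-12)
    c = np.fft.ifft(log_mag).real                 # real cepstrum
    lifter = np.zeros(n_fft)
    lifter[:cutoff] = 1.0                         # positive quefrencies 0 .. cutoff-1
    lifter[n_fft - cutoff + 1:] = 1.0             # mirrored negative quefrencies
    envelope = np.fft.fft(c * lifter).real        # back to the log-spectral domain
    return envelope, log_mag
```

The symmetric lifter keeps the cepstrum even, so the transformed envelope is real; with a cutoff of n_fft/2 + 1 nothing is removed and the original log spectrum comes back.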
Liftering example 2
Original waveform has excitation fine structure convolved with resonances
DFT shows harmonics modulated by resonances
Log DFT is the sum of a harmonic ‘comb’ and resonant bumps
IDFT separates out the resonant bumps (low quefrency) and the regular fine structure (‘pitch pulse’)
Selecting the low-n cepstrum separates the resonance information (deconvolution / ‘liftering’)
[Figure: liftering example. Panels: waveform and minimum-phase impulse response (samples); abs(DFT) and liftered (freq/Hz); log(abs(DFT)) and liftered (dB vs. freq/Hz); real cepstrum and lifter (quefrency), with the pitch pulse marked]
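The ‘pitch pulse’ in the cepstrum suggests a simple pitch estimator: pick the largest cepstral peak in a plausible quefrency range. A sketch, assuming a Hann window, zero-padding to 4096 points (which keeps aliased cepstral components of the pulse train away from the search range) and a 60 to 400 Hz search range; the function name is hypothetical.

```python
import numpy as np

def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0, n_fft=4096):
    """Estimate f0 from the largest real-cepstrum peak with period in [fs/fmax, fs/fmin].
    Returns (f0_hz, period_in_samples)."""
    spec = np.fft.fft(frame * np.hanning(len(frame)), n_fft)   # zero-padded DFT
    c = np.fft.ifft(np.log(np.abs(spec) + 1e-12)).real          # real cepstrum
    q_lo, q_hi = int(fs / fmax), int(fs / fmin)
    q = q_lo + int(np.argmax(c[q_lo:q_hi + 1]))                 # quefrency of the pitch pulse
    return fs / q, q
```

Peaks also appear at multiples of the period, so octave errors are common in practice; this sketch does no voicing decision or post-processing.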
Applications - Speech coding
4. LPC Synthesis: LP analysis on ~20 ms frames gives a prediction filter and a residual
Recombining them should yield perfect reconstruction; coding applications compress further
e.g. simple pitch tracker → “buzz-hiss” encoding
[Figure: LPC coder. Encoder: input s[n] → LPC analysis with A(z) → filter coefficients {a_i} and residual e[n], each represented and encoded. Decoder: excitation generator → ê[n] → all-pole filter H(z) = 1/(1 − Σ_i a_i z^-i) → output ŝ[n]. Also shown: |1/A(e^jω)| vs. f, and the residual e[n] over 1.3 to 1.75 s with 16 ms frame boundaries and pitch period values]
The low-bitrate speech codec used in cell phones is based on LPC
Quantize the LPC filter parameters, use a crude approximation to the residual
Many different ways to represent the filter parameters: prediction coefficients {a_k}, roots of 1 − P(z), line spectral frequencies
Switch between noise and pulse train for the excitation
Use codebook of excitations (CELP: Code Excited Linear Prediction)
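A toy “buzz-hiss” decoder along the lines of the noise/pulse-train switching above, as a sketch: each frame carries hypothetical fields (coefficients, gain, voiced flag, pitch period), the excitation is a unit pulse train or white noise, and the all-pole filter keeps its memory across frame boundaries. Quantization and the CELP codebook search are not modeled.

```python
import numpy as np

def excitation(voiced, period, n, phase, rng):
    """Buzz (unit pulse train) if voiced, hiss (unit-variance white noise) otherwise.
    `phase` is the position of the next pulse in this frame; the updated value is returned."""
    if not voiced:
        return rng.standard_normal(n), 0
    e = np.zeros(n)
    pos = phase
    while pos < n:
        e[pos] = 1.0
        pos += period
    return e, pos - n            # keeps pulse spacing regular across frame boundaries

def lpc_decode(frames, frame_len, seed=0):
    """frames: list of (a, gain, voiced, period) tuples (hypothetical format).
    y[n] = gain * e[n] + sum_k a_k y[n-k], with filter memory kept between frames."""
    rng = np.random.default_rng(seed)
    p = max(len(f[0]) for f in frames)
    hist = np.zeros(p)           # hist[k-1] holds y[n-k]
    phase = 0
    out = []
    for a, gain, voiced, period in frames:
        a = np.asarray(a, dtype=float)
        e, phase = excitation(voiced, period, frame_len, phase, rng)
        y = np.zeros(frame_len)
        for i in range(frame_len):
            y[i] = gain * e[i] + np.dot(a, hist[:len(a)])
            hist = np.roll(hist, 1)
            hist[0] = y[i]
        out.append(y)
    return np.concatenate(out)
```

Carrying the filter memory and pulse phase across frames avoids clicks at frame boundaries; a real codec would also interpolate the filter parameters between frames.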
Applications - Cross-synthesis/Vocoding
[Figure: spectrograms (freq/Hz, 0 to 4000, vs. time/s) of the original (mpgr1_sx419) and of a noise-excited LPC resynthesis with pole frequencies]
Reconstruct using excitation from one sound and filter from another
Whisperization: replace excitation with white noise
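Cross-synthesis can be sketched by whitening one signal with its own per-frame LPC filter and driving the other signal's all-pole filter with the result. Assumptions: non-overlapping frames, no gain normalization, and hypothetical function names. Passing white noise as `excite_src` gives the whisperization case.

```python
import numpy as np

def lpc_frame(frame, p):
    """Autocorrelation-method LPC of one Hamming-windowed frame (direct Toeplitz solve)."""
    w = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    r = np.array([np.dot(w[k:], w[:max(len(w) - k, 0)]) for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    R += 1e-9 * (r[0] + 1e-12) * np.eye(p)    # tiny ridge keeps silent frames solvable
    return np.linalg.solve(R, r[1:])

def cross_synthesize(excite_src, filter_src, p=16, frame_len=320):
    """Residual of excite_src (whitened by its own LPC) drives the envelope of filter_src."""
    n = min(len(excite_src), len(filter_src))
    xa = np.concatenate([np.zeros(p), np.asarray(excite_src[:n], dtype=float)])
    y = np.zeros(n + p)                       # first p entries: zero filter history
    for start in range(0, n, frame_len):
        stop = min(start + frame_len, n)
        a_src = lpc_frame(excite_src[start:stop], p)   # whitening filter of the excitation
        a_flt = lpc_frame(filter_src[start:stop], p)   # envelope of the filter signal
        for i in range(start + p, stop + p):
            e = xa[i] - np.dot(a_src, xa[i - p:i][::-1])   # analysis: 1 - P_src(z)
            y[i] = e + np.dot(a_flt, y[i - p:i][::-1])     # synthesis: 1 / (1 - P_flt(z))
    return y[p:]
```

With the same signal in both roles the analysis and synthesis filters cancel and the input comes back unchanged, which is a handy sanity check before trying two different sounds.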
Still more applications
LPC Warping: replacing delays $z^{-1}$ with allpass elements warps frequencies but not magnitudes
http://www.ee.columbia.edu/~dpwe/resources/matlab/polewarp/
[Figure: frequency-warping curves (warped vs. original normalized frequency) for α = 0.6 and α = −0.6, with the allpass mapping (z + α)/(αz + 1); spectrograms (frequency vs. time) of the original and of a warped LPC resynthesis with α = −0.2]
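One way to apply the mapping from the figure is to move each LPC pole z₀ to (z₀ + α)/(αz₀ + 1) and rebuild the filter. The sketch below assumes that reading of the figure; the polewarp code at the URL above may differ in its details, and `warp_poles` is a hypothetical name. For |α| < 1 this map takes the unit disc onto itself, so a stable filter stays stable. For a resonance near 1 kHz at fs = 8 kHz, α = 0.3 lowers its pole angle and α = −0.3 raises it.

```python
import numpy as np

def warp_poles(a, alpha):
    """Map each pole z0 of 1 / (1 - sum_k a_k z^-k) to (z0 + alpha) / (alpha*z0 + 1)
    and return the predictor coefficients of the warped all-pole filter."""
    poles = np.roots(np.concatenate(([1.0], -np.asarray(a, dtype=float))))
    warped = (poles + alpha) / (alpha * poles + 1.0)
    poly = np.real(np.poly(warped))       # conjugate pairs stay conjugate for real alpha
    return -poly[1:] / poly[0]
```

Only the pole positions move; the excitation is left untouched, which is why formants can be shifted while the pitch is preserved.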
Process formants independently of pitch:
Pitch-shifting while preserving formants
Shifting formants while preserving pitch
Voice transformation
Pitch analysis
Reading
DAFX 9.1 – 9.3 - Source-Filter Processing