E85.2607: Lecture 9 – Sinusoidal Modeling
E85.2607: Lecture 9 – Sinusoidal Modeling 2010-04-08 1 / 23
Sinusoidal Modeling/Spectral Modeling Synthesis (SMS)
[Figure: SMS processing chain. Blocks: Spectral Analysis, Feature Extraction, Transformation, Feature Addition, Spectral Synthesis. Signals: Original Sound, Original Spectrum, Original Feature, Transformed Feature, Transformed Spectrum, Transformed Sound. Panels: waveforms vs. Time, spectra vs. Frequency, features vs. Time.]
Similar to the phase vocoder, but assumes that the signal is a sum of sinusoids with smoothly varying parameters:
    x[n] \approx \tilde{x}[n] = \sum_k a_k[n] \cos(\omega_k[n])
Freq. domain representation analogous to Fourier series
Flexible representation for transformations...
E4896 Music Signal Processing (Dan Ellis) 2010-02-15 - /15
1. Sinusoidal Modeling
• Periodic sounds → ridges in spectrogram
    each ridge is a sinusoidal harmonic
    .. with smoothly-varying parameters
    .. an efficient & flexible description?
[Figure: Violin.arco.ff.A4]
SMS overview
[Figure: SMS analysis block diagram. Blocks: window generation (smoothing window), FFT, peak detection, pitch estimation, peak continuation, additive synthesis, amplitude correction, second FFT of the residual, residual modeling. Signals: sound, magnitude spectrum, phase spectrum, peak data, pitch frequency, sine frequencies, sine magnitudes, sine phases, sinusoidal component, residual component, residual spectral data.]
Analysis
Extract sinusoidal tracks from STFT
Residual contains the portion of the signal that is not well modeled by sinusoids
Synthesis
Each track controls an oscillator
Analysis overview
Sinusoidal Peak Picking
• Local maxima in DFT frames
• Quadratic fit for sub-bin resolution
[Figure: spectrogram (freq / Hz vs. time / s); one DFT frame (level / dB vs. freq / Hz); close-up of level / dB and phase / rad vs. freq / Hz between 400 and 800 Hz.]
1 Break signal into frames à la STFT
    Be aware of time-frequency tradeoff
2 Pick peaks in each frame
    Often expect to find peaks in a harmonic series due to common pitch
    Search for common factor
3 Organize spectral peaks into time-varying sinusoidal tracks
    Expect sinusoids to drift in amplitude and frequency
Review: Time-frequency tradeoff
    X[k, m] = \sum_{n=0}^{N-1} x[n + mL] \, w[n] \, e^{-j 2 \pi k n / N}
DFT length N
    Window determines freq. resolution
    Must be long enough to resolve harmonics
    ∼ 2× longest pitch period (∼ 50-100 ms)
    Too long → blurred a_k[n]
Hop size L
    typically N/2 or N/4
    small hop → simpler interpolation over time
Long window: good freq. resolution, bad time resolution
Short window: bad freq. resolution, good time resolution
[Figure: FFT of a 2,000 Hz sine wave with a Hamming window; sine wave spectrum.]
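The STFT definition above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `stft`, the sample rate, and the choices N = 1024 and L = N/4 are assumptions rather than values from the lecture. The test signal mirrors the 2,000 Hz sine with a Hamming window shown in the figure.

```python
import numpy as np

def stft(x, N=1024, L=256):
    """Return X[k, m]: one-sided DFT of Hamming-windowed frames (length N, hop L)."""
    w = np.hamming(N)
    n_frames = 1 + (len(x) - N) // L
    # Column m holds x[n + mL] * w[n] for n = 0..N-1
    frames = np.stack([x[m * L : m * L + N] * w for m in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)

fs = 16000                               # assumed sample rate in Hz
t = np.arange(fs) / fs                   # one second of signal
x = np.sin(2 * np.pi * 2000 * t)         # 2,000 Hz sine wave
X = stft(x, N=1024, L=256)

peak_bin = int(np.argmax(np.abs(X[:, 0])))
print("frames:", X.shape[1], "peak frequency:", peak_bin * fs / 1024, "Hz")
```

Doubling N halves the bin spacing fs/N (better frequency resolution), but each frame then averages over twice as much time.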
Peak picking
Find local maxima in each DFT frame
Accuracy of peak detection is limited to half a frequency sample (bin)
Can zero-pad to interpolate the spectrum, but relying on zero-padding alone becomes very expensive
Alternative: combine zero-padding with quadratic interpolation to get frequency resolution finer than one bin
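A minimal sketch of peak picking with quadratic interpolation: fit a parabola through the dB magnitudes of a local maximum and its two neighbours, and take the vertex as the refined peak. The function name `pick_peaks`, the Hann window, the relative 40 dB threshold, and the test tone are assumptions chosen for illustration; no zero-padding is used here.

```python
import numpy as np

def pick_peaks(mag_db, threshold_db):
    """Return (fractional_bin, interpolated_dB) for local maxima above threshold_db."""
    peaks = []
    for k in range(1, len(mag_db) - 1):
        a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
        if b > threshold_db and b > a and b >= c:
            # Vertex of the parabola through (k-1, a), (k, b), (k+1, c)
            p = 0.5 * (a - c) / (a - 2 * b + c)      # offset in bins, within [-0.5, 0.5]
            peaks.append((k + p, b - 0.25 * (a - c) * p))
    return peaks

fs, N = 8000, 1024                        # assumed sample rate and DFT length
n = np.arange(N)
x = np.cos(2 * np.pi * 440.0 * n / fs)    # test tone lying between bins 56 and 57
mag_db = 20 * np.log10(np.abs(np.fft.rfft(x * np.hanning(N))) + 1e-12)

peaks = pick_peaks(mag_db, threshold_db=mag_db.max() - 40)
k_frac, level = max(peaks, key=lambda pk: pk[1])   # strongest peak
f_est = k_frac * fs / N
print(f"strongest peak near {f_est:.2f} Hz (bin spacing {fs / N:.2f} Hz)")
```

Without interpolation the estimate could be off by up to half a bin (about 3.9 Hz with these settings); the parabolic fit typically reduces that to a small fraction of a bin.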
Peak picking 2
Peak Selection
• Don't want every peak, just "true" sinusoids
    threshold?
    local shape: fits \delta(\omega - \omega_0) * W(e^{j\omega})
• Look for stability
    of frequency & amplitude in successive time frames
    phase derivative in time/freq
[Figure: one DFT frame, level / dB vs. freq / Hz.]
Not all peaks correspond to a stable sinusoid
Only retain peaks larger than some threshold
Only retain peaks consistent with harmonic series of underlying pitch
    need pitch tracker, input must be monophonic
    pitch-synchronous analysis?
Look for stable parameters in adjacent frames
e.g. stable phase derivative in time and frequency
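The first two selection rules (level threshold and consistency with a harmonic series) can be sketched as follows. The function name `select_harmonic_peaks`, the tolerance of 3% of f0, and the example peak list are all hypothetical; the fundamental is assumed to come from a separate pitch tracker.

```python
import numpy as np

def select_harmonic_peaks(freqs_hz, levels_db, f0_hz, threshold_db=-50.0, tol=0.03):
    """Keep peaks above threshold_db that lie within tol * f0 of a multiple of f0."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    levels_db = np.asarray(levels_db, dtype=float)
    harmonic = np.round(freqs_hz / f0_hz)             # nearest harmonic number
    deviation = np.abs(freqs_hz - harmonic * f0_hz)   # distance to that harmonic in Hz
    keep = (levels_db > threshold_db) & (harmonic >= 1) & (deviation <= tol * f0_hz)
    return freqs_hz[keep], harmonic[keep].astype(int)

# Made-up peaks for a note with f0 = 440 Hz
freqs = [441.0, 700.0, 879.0, 1325.0, 1760.0]
levels = [-10.0, -20.0, -15.0, -30.0, -70.0]
kept_freqs, kept_harmonics = select_harmonic_peaks(freqs, levels, f0_hz=440.0)
print(kept_freqs, kept_harmonics)
```

Here the 700 Hz peak is rejected as inharmonic and the 1760 Hz peak is rejected for being below the threshold.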
Peak continuation
Connect peaks in adjacent frames to track sinusoid trajectory
Lots of different approaches
e.g. McAulay-Quatieri approach: greedily attach each peak in the current frame to the closest peak in the next frame
Ambiguous if large frequency changes
• Once peaks have been detected, and possibly a fundamental frequency identified, we want to organise peaks into time-varying trajectories.
• The sinusoidal model assumes each peak is part of a frequency trajectory.
• Peak continuation assigns peaks to a given track.
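A simplified sketch of greedy nearest-frequency matching between two adjacent frames, in the spirit of the approach described above. It is not the published McAulay-Quatieri procedure: candidate pairs are simply claimed in order of increasing frequency distance. The function name `match_peaks` and the `max_jump_hz` limit are assumptions.

```python
def match_peaks(prev_freqs, next_freqs, max_jump_hz=50.0):
    """Greedy matching: returns (i, j) index pairs, each peak used at most once."""
    candidates = sorted(
        (abs(fp - fn), i, j)
        for i, fp in enumerate(prev_freqs)
        for j, fn in enumerate(next_freqs)
        if abs(fp - fn) <= max_jump_hz
    )
    used_prev, used_next, pairs = set(), set(), []
    for _, i, j in candidates:            # closest pairs are claimed first
        if i not in used_prev and j not in used_next:
            pairs.append((i, j))
            used_prev.add(i)
            used_next.add(j)
    return sorted(pairs)

prev_frame = [440.0, 880.0, 1320.0]       # peak frequencies in frame m
next_frame = [445.0, 1310.0, 2000.0]      # peak frequencies in frame m + 1
pairs = match_peaks(prev_frame, next_frame)
print(pairs)
```

The 880 Hz peak finds no partner within 50 Hz and the 2000 Hz peak is unclaimed, which is exactly where the ambiguity and the birth/death rules come in.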
Track formation heuristics
Connect peaks in adjacent frames to form sinusoids
    can be ambiguous if large frequency changes
Unclaimed peak → create new track (birth)
No continuation of track → termination (death)
    hysteresis
[Figure: existing tracks and new peaks on a time-frequency plane, with track birth and death marked.]
Lots of other potential rules:
    min track length to avoid spurious peaks
    max allowable frequency deviation between adjacent frames
    max silent gap length
    max number of tracks per frame
Tricky to implement . . .
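A minimal sketch of track formation with birth, death, a maximum frequency deviation, and a minimum track length. The function name `form_tracks`, the parameter defaults, and the example frames are hypothetical. The silent-gap (hysteresis) and max-tracks-per-frame rules are deliberately left out to keep the example short.

```python
def form_tracks(frames, max_jump_hz=50.0, min_length=3):
    """frames: one list of peak frequencies (Hz) per frame.

    Returns tracks as lists of (frame_index, frequency) points.
    """
    active, finished = [], []
    for m, peaks in enumerate(frames):
        unclaimed = list(peaks)
        still_active = []
        for track in active:
            last_f = track[-1][1]
            # Closest unclaimed peak, if any, within the allowed deviation
            best = min(unclaimed, key=lambda f: abs(f - last_f), default=None)
            if best is not None and abs(best - last_f) <= max_jump_hz:
                track.append((m, best))
                unclaimed.remove(best)
                still_active.append(track)
            else:
                finished.append(track)                # no continuation: death
        # Every unclaimed peak starts a new track: birth
        active = still_active + [[(m, f)] for f in unclaimed]
    finished.extend(active)
    # Minimum track length rejects spurious peaks
    return [t for t in finished if len(t) >= min_length]

frames = [[440.0, 880.0], [442.0, 885.0, 3000.0], [445.0, 890.0], [447.0]]
tracks = form_tracks(frames)
for t in tracks:
    print(t)
```

The isolated 3000 Hz peak is born and dies within one frame, so the minimum-length rule discards it.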
Sinusoidal synthesis
3. Sinusoidal Synthesis
• Each sinusoid track {a_k[n], ω_k[n]} drives an oscillator
    can interpolate amplitude, frequency samples
• Faster method synthesizes DFT frames, then overlap-add
    trickier to achieve frequency modulation
[Figure: track frequency ω_k[n] (freq / Hz vs. time / s) and level a_k[n] (vs. time / s) driving an oscillator a_k[n]·cos(ω_k[n]·t).]
Each sinusoid track {a_k[n], ω_k[n]} drives an oscillator
Interpolate parameters to avoid clicks at frame boundaries
Faster method: synthesize DFT frames, then overlap-add
[Figure: bank of N oscillators, each with Amp and Freq inputs driven by Amp_i(t) and Freq_i(t), i = 1..N, with outputs summed.]
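The oscillator-bank method can be sketched as below. Frame-rate amplitudes and frequencies are linearly interpolated to the sample rate, and the frequency is accumulated into a continuous phase so that no clicks appear at frame boundaries. The function names, hop size, sample rate, and track values are assumptions, and measured phases are ignored in this simplified version.

```python
import numpy as np

def synthesize_track(amps, freqs_hz, hop, fs):
    """One oscillator driven by frame-rate amplitude and frequency samples."""
    n_frames = len(amps)
    frame_times = np.arange(n_frames) * hop          # frame positions in samples
    n = np.arange((n_frames - 1) * hop + 1)
    a = np.interp(n, frame_times, amps)              # sample-rate amplitude
    f = np.interp(n, frame_times, freqs_hz)          # sample-rate frequency
    phase = 2 * np.pi * np.cumsum(f) / fs            # running sum approximates the phase integral
    return a * np.cos(phase)

def synthesize(tracks, hop, fs):
    """Sum of oscillators; tracks is a list of (amps, freqs_hz) pairs of equal length."""
    return sum(synthesize_track(a, f, hop, fs) for a, f in tracks)

tracks = [
    ([0.0, 1.0, 1.0, 0.0], [440.0, 440.0, 450.0, 450.0]),   # fundamental, drifting up
    ([0.0, 0.5, 0.5, 0.0], [880.0, 880.0, 900.0, 900.0]),   # second harmonic
]
y = synthesize(tracks, hop=256, fs=8000)
print(len(y), float(np.max(np.abs(y))))
```

Fading the amplitude in from and out to zero at the first and last frame gives each track a smooth birth and death.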
Sinusoid + noise model
Sinusoids are not a good fit for all types of signals (e.g. noise)
Sometimes want to retain residual in addition to sinusoids
4. Noise Residual
• Some energy is not well fit with sinusoids
    e.g. noisy energy
• Can just keep it as residual
    or model it some other way
• Leads to "sinusoidal + noise" model
    x[n] = \sum_k a_k[n] \cos(\omega_k[n] \, n) + e[n]
[Figure: mag / dB vs. freq / Hz, with curves for original, sinusoids, residual, and LPC.]
Model residual as white noise passed through time-varying filter
Sinusoidal subtraction
original sound x(n)
synthesized sound with phase matching s(n)
residual sound e(n) = w(n) × (x(n) − s(n)), n = 0, 1, ..., N − 1
[Figure: three waveforms (amplitude vs. time in sec); a) original spectrum, b) residual spectrum and its approximation (magnitude in dB vs. frequency in kHz).]
Modeling the residual
[Figure: original sound x(n), synthesized sound with phase matching s(n), and residual sound e(n) = w(n) × (x(n) − s(n)); a) original spectrum, b) residual spectrum and its approximation.]
Many options for modeling residual filter parameters
Can use LPC to approximate shape of residual
or simply smooth the magnitude spectrum à la channel vocoder
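A rough sketch of the second option, smoothing the residual magnitude spectrum in the manner of a channel vocoder: average the magnitude in a few bands and interpolate between band centres. The function name `spectral_envelope`, the 16 equal-width bands, and the white-noise test frame are assumptions; the LPC alternative is not shown.

```python
import numpy as np

def spectral_envelope(residual_frame, n_bands=16):
    """Piecewise-linear approximation of the residual magnitude spectrum."""
    mag = np.abs(np.fft.rfft(residual_frame))
    edges = np.linspace(0, len(mag), n_bands + 1).astype(int)
    centres = 0.5 * (edges[:-1] + edges[1:] - 1)      # centre bin of each band
    band_mag = np.array([mag[lo:hi].mean() for lo, hi in zip(edges[:-1], edges[1:])])
    # Interpolate the band averages back onto every DFT bin
    return np.interp(np.arange(len(mag)), centres, band_mag)

rng = np.random.default_rng(0)
e = rng.standard_normal(1024) * np.hanning(1024)      # stand-in for a windowed residual frame
env = spectral_envelope(e)
print(env.shape, float(env.min()), float(env.max()))
```

Only the `n_bands` averages need to be stored per frame, which is far fewer numbers than the full magnitude spectrum.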
Residual synthesis
[Figure panels, in order:]
spectral magnitude approximation of residual
random spectral phase
synthesized sound
synthesized sound with window
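The four panels above correspond to a short procedure: take the approximated magnitude, attach random phase, invert, and window. A minimal sketch follows, assuming a Hann synthesis window and a made-up envelope; the function name `synthesize_residual_frame` is hypothetical.

```python
import numpy as np

def synthesize_residual_frame(envelope, rng):
    """envelope: magnitudes for DFT bins 0..N/2. Returns a windowed noise frame of length N."""
    N = 2 * (len(envelope) - 1)
    phase = rng.uniform(-np.pi, np.pi, size=len(envelope))   # random spectral phase
    phase[0] = 0.0                                           # keep the DC bin real
    phase[-1] = 0.0                                          # keep the Nyquist bin real
    spectrum = envelope * np.exp(1j * phase)                 # polar to rectangular
    frame = np.fft.irfft(spectrum, n=N)                      # synthesized sound
    return frame * np.hanning(N)                             # synthesized sound with window

rng = np.random.default_rng(1)
envelope = np.linspace(1.0, 0.1, 513)       # made-up envelope, decaying with frequency
frame = synthesize_residual_frame(envelope, rng)
print(frame.shape, float(np.max(np.abs(frame))))
```

Successive frames produced this way would be overlap-added to form the time-varying filtered noise.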
Synthesis: Putting it all together
[Figure: SMS synthesis block diagram. Sinusoidal branch: sine frequencies, sine magnitudes, sine phases → spectral sine generation → magnitude spectrum, phase spectrum → polar to rectangular conversion. Residual branch: residual spectral data → spectral residual generation → magnitude spectrum, phase spectrum → polar to rectangular conversion. Output: complex spectrum → IFFT → synthesis window (from window generation) → output sound.]
Sin + noise - example
Sinusoids + Noise Decomposition
• Removing sines reveals noise & transients
• Different representation approaches...
[Figure: three spectrograms (Frequency vs. Time): Guitar - original; Guitar - sinusoid reconstruction; Guitar - residual (original - sines).]
Examples
Sound examples (slide columns: Orig, Sin, Residual, SMS, Transformed)
flute:
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/fl-A6-fr.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/fl-A6-fr.det.sms.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/fl-A6-fr.stoc.sms.
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/fl-A6-fr.sms.au
clarinet:
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/clar.wav
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/clar-dr2.wav
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/clar-dre.wav
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/clar-dx2.wav
guitar:
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/guitar_orig.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/guitar_det.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/guitar_stoch.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/guitar_synth.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/guitar_transf3.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/guitar_transf10.au
drums:
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/drums_synt.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/drums_det.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/drums_synt.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/drum_transf2.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/drum_transf10.au
water:
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/water.au
    http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/water.sms.au
Applications
Loads of applications
Similar to other TF analysis-synthesis techniques
but parameters more convenient for some transformations
    can treat spectral shape independent from harmonics
    can treat different tracks independently
Filtering with arbitrary time resolution
[Figure: filtering example; axis labelled f in kHz.]
Timescale modification
Sinusoidal Modification
• Sinusoidal description very easy to modify
    e.g. changing time base of sample points
• Frequency stretch
    preserve formant envelope?
[Figure: two spectra (level / dB vs. freq / Hz) and sinusoid tracks (freq / Hz vs. time / s).]
Applications: Frequency transformations
Partial-dependent frequency scaling
    e.g. pseudo inharmonicities of higher partials in piano sound
Frequency stretching: \hat{\omega}_k[n] = p \cdot \omega_k[n]
    e.g. pitch-shift without preserving timbre
Spectral shape shift
Quantize pitch (autotune)
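Three of these frequency transformations are simple enough to sketch directly on the track frequencies. The function names are hypothetical. The stiff-string factor sqrt(1 + B k^2) with coefficient B is a commonly used piano inharmonicity model, assumed here as one possible partial-dependent scaling. Pitch quantization assumes equal temperament referenced to 440 Hz.

```python
import numpy as np

def stretch(freqs_hz, p):
    """Frequency stretching: every partial is multiplied by the same factor p."""
    return p * np.asarray(freqs_hz, dtype=float)

def partial_dependent_scale(freqs_hz, B=1e-4):
    """Scale partial k by sqrt(1 + B k^2); higher partials are pushed progressively sharp."""
    f = np.asarray(freqs_hz, dtype=float)
    k = np.arange(1, len(f) + 1)                     # partial numbers 1, 2, 3, ...
    return f * np.sqrt(1 + B * k**2)

def quantize_pitch(f0_hz, ref_hz=440.0):
    """Snap a fundamental to the nearest equal-tempered semitone."""
    semitones = np.round(12 * np.log2(f0_hz / ref_hz))
    return ref_hz * 2 ** (semitones / 12)

partials = 220.0 * np.arange(1, 6)                   # harmonic series on 220 Hz
octave_up = stretch(partials, 2.0)
piano_like = partial_dependent_scale(partials)
snapped = quantize_pitch(450.0)
print(octave_up, piano_like, snapped)
```

Because `stretch` moves every partial together with its amplitude, the spectral envelope shifts too, which is why the timbre is not preserved.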
Still more applications
Effects we’ve seen before:
    Vibrato: modulate ω_k[n] with LFO
    Tremolo: modulate a_k[n] with LFO
Hoarseness: boost residual relative to sinusoidal components
Morphing between sounds
Gender change: shift pitch and formants separately
Singing voice synthesis/conversion (Vocaloid)
Separate transients from steady-state harmonics
. . .
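As an illustration of the first two effects in the list above, vibrato and tremolo reduce to multiplying the track parameters by a low-frequency oscillator. The function names, the depth conventions (cents for vibrato, a 0 to 1 fraction for tremolo), and the frame rate are assumptions for this sketch.

```python
import numpy as np

def lfo(n_frames, rate_hz, frame_rate_hz):
    """Sinusoidal low-frequency oscillator sampled once per analysis frame."""
    return np.sin(2 * np.pi * rate_hz * np.arange(n_frames) / frame_rate_hz)

def vibrato(freqs_hz, depth_cents, rate_hz, frame_rate_hz):
    """freqs_hz has shape (n_tracks, n_frames); all tracks are modulated together."""
    mod = lfo(freqs_hz.shape[1], rate_hz, frame_rate_hz)
    return freqs_hz * 2 ** (depth_cents * mod / 1200)

def tremolo(amps, depth, rate_hz, frame_rate_hz):
    """amps has shape (n_tracks, n_frames); gain swings between (1 - depth) and 1."""
    mod = lfo(amps.shape[1], rate_hz, frame_rate_hz)
    return amps * (1 - depth * 0.5 * (1 + mod))

n_frames, frame_rate = 100, 100.0                        # assumed: 100 frames per second
freqs = np.outer([1, 2, 3], np.full(n_frames, 220.0))    # three harmonic tracks
amps = np.ones_like(freqs)

vib = vibrato(freqs, depth_cents=50, rate_hz=5.0, frame_rate_hz=frame_rate)
trem = tremolo(amps, depth=0.3, rate_hz=5.0, frame_rate_hz=frame_rate)
print(float(vib[0].min()), float(vib[0].max()), float(trem.min()), float(trem.max()))
```

Scaling frequencies by a common ratio keeps the partials harmonically related while the pitch wobbles.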
http://www.vocaloid.com/en/index.html
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/LEON_Demo_Check_It_Out.mp3
http://www.ee.columbia.edu/~ronw/adst/lectures/matlab/wavs/SMS/kimi_no_uwasa_128.mp3
http://www.ee.columbia.edu/~ronw/e6820/ylt.wav
http://www.ee.columbia.edu/~ronw/e6820/yltsin.wav
http://www.ee.columbia.edu/~ronw/e6820/ylt-transients-512.wav
Tools: SPEAR
From Michael Klingbeil: http://www.klingbeil.com/spear/
Bringing it all together
PVOC: X[k, n] = A[k, n] \, e^{j\omega[k, n]}
    Magnitude and phase of fixed oscillators
Source-filter: X[k, n] = E[k, n] \, H[k, n]
    Filter captures spectral shape (smoothed A[k, n])
    Source is whatever is left, typically a pulse train corresponding to a harmonic series
Sinusoids: X[k, n] = A_k[n] \cos(\omega_k[n]) + E[k, n]
    Organize input into oscillators with time-varying frequency (harmonic tracks)
    Sinusoid magnitudes A_k[n] approximate spectral shape
    Frequencies ω_k[n] encode source information
Reading
DAFX Chapter 10 - Spectral Processing