Department of Precision and Microsystems Engineering
Improvements on Time-Frequency Analysis
using Time-Warping and Timbre Techniques
Name: Maarten van der Seijs
Report no: EM 11.018
Coach: dr. ir. D. de Klerk
Professor: prof. dr. D.J. Rixen
Specialisation: Engineering Mechanics
Type of report: Master's Thesis
Date: Delft, June 6, 2011
Abstract
Spectral analysis of non-stationary signals is known to be a challenging task. Classical methods
like the discrete Fourier transform are often inadequate to capture and track periodic content with
rapidly changing frequencies. There are two main reasons for this. First, the Fourier transform
is intended for expressing frequency content in terms of constant-frequency contributions. Second,
the simultaneous accuracy of temporal and spectral localisation is limited by the time-frequency
uncertainty principle. This thesis lays out the findings of an exploratory study into
potential improvements on time-frequency analysis.
To address the first issue, the concept of time-warping has been explored. By stretching and
contracting pieces of the signal, frequency changes may be "flattened out", resulting in improved
detection of non-stationary frequencies and much sharper spectra than possible with traditional
Fourier analysis. Both linear and non-linear time-warping approaches were investigated, together
with the required non-uniform interpolation techniques.
Application of linear time-warping prior to a Fourier transformation leads to the definition of the fan
chirp transform. This transformation is in essence closely related to the popular short-time Fourier
transform, but provides time-frequency basis functions in a fan geometry rather than a rectangularly
tiled grid. The skewed basis functions match the harmonic structure of a non-stationary component
with linearly increasing frequency.
The second issue is addressed by considering periodic signals in their entirety rather than by their
individual partials (or harmonics, overtones). A novel concept is proposed: timbre analysis. The
timbre representation provides a means to classify a tonal signal, similar to the way the human ear
(which is in fact a remarkably sophisticated Fourier analyser) perceives and identifies sound. It is
shown that the instantaneous timbre, obtained by normalisation of the harmonic phases, tends to
remain stationary throughout a non-stationary signal.
The timbre representation is used to identify components in polyphonic problems, where the signal is
a mixture of multiple crossing tonal components. In addition, a pitch tracking technique is proposed
that tracks a periodic component based on its timbre. The component can then be isolated and
extracted using Vold-Kalman filtering.
Preface
This thesis is the result of a Master of Science thesis project that started in October 2010. It was
carried out in the Engineering Dynamics group, which is part of the Precision and Microsystems
Engineering department at Delft University of Technology.
First, I would like to thank dr. ir. Dennis de Klerk for his enthusiastic and dedicated supervision. As
a true expert in experimental dynamics, he confronted me with a diversity of challenges and never
failed to inspire me.
Second, I gratefully acknowledge prof. dr. Daniel Rixen for his support throughout my entire Master's
studies. His readiness to help and ever-constructive suggestions are exemplary. I frankly believe that
due to his involvement with students, many will eventually find the path to Engineering Dynamics.
Finally, I would like to thank my family, friends and housemates for their love, support and reflection
throughout my entire studies in Delft.
Maarten van der Seijs,
June 2011
Contents
Abstract
Preface
Contents
Nomenclature
Introduction
Research context
Research goal
Personal contributions
Thesis outline
I Basic Concepts
1 Time-Domain Concepts
1.1 Signals
1.1.1 Continuous-time signals
1.1.2 Discrete-time signals
1.1.3 Digital signals
1.2 Periodic signals
1.2.1 Periodicity
1.2.2 Frequency and phase
1.2.3 Periodic functions
1.2.4 Orthogonality of harmonic waves
1.3 Signal modelling
1.3.1 Sinusoids plus noise model
1.3.2 Tonal components plus noise model
1.4 Summary
2 Frequency-Domain Concepts
2.1 Time domain vs. frequency domain
2.1.1 Example 1: Basis vector transformation
2.2 Fourier series
2.2.1 Trigonometric series
2.2.2 Complex exponential series
2.2.3 Example 2: Trumpet harmonics
2.3 Fourier transform
2.3.1 Continuous-time Fourier transform
2.3.2 Discrete-time Fourier transform
2.4 Discrete Fourier transform
2.4.1 Spectral symmetry
2.4.2 Periodic extension & spectral leakage
2.4.3 Plancherel theorem & Parseval's theorem
2.4.4 Fast Fourier transform
2.4.5 Example 3: Trumpet DFT
2.5 Windowing
2.5.1 Window application
2.5.2 Window properties
2.5.3 Rectangular window
2.5.4 Hanning window
2.5.5 Gaussian window
2.5.6 Cosine and cosine-sigma window
2.5.7 Example 4: Windowing of a simple signal
2.6 Uncertainty principle
2.6.1 Temporal and spectral localisation
2.6.2 Time-frequency product
2.6.3 Uncertainty principle
2.6.4 Example 5: Time-frequency product of four windows
2.7 Summary
II Advanced concepts
3 Time Warping
3.1 Linear time warping
3.1.1 Warp function
3.1.2 Chirp wave
3.1.3 Chirp rate
3.1.4 Inverse warp function
3.1.5 Inverse time warping
3.1.6 Example 6: Linear warp
3.2 Non-linear time warping
3.3 Discrete implementation & interpolation
3.3.1 Interpolation approaches
3.3.2 Spline interpolation
3.3.3 Interpolation performance
3.3.4 Example 7: Linear warp DFT
3.4 Summary
4 Timbre
4.1 Definition of timbre
4.1.1 Normalised amplitude and phase
4.1.2 Complex normalisation
4.1.3 Timbre vector
4.2 Instantaneous timbre
4.2.1 Definition
4.2.2 Discrete implementation
4.2.3 Example 8: Trumpet timbre
4.2.4 Bandwidth considerations
4.2.5 Example 9: Helicopter timbre
4.3 Warped timbre
4.4 Summary
III Short-time Spectral Analysis
5 Short-Time Fourier Transform
5.1 Short-time blocks
5.1.1 Shift size & overlap
5.2 Short-time DFT
5.3 Time-frequency considerations
5.3.1 Spectral/temporal resolution
5.3.2 Overlap
5.3.3 Zero-padding
5.3.4 Windowing
5.4 Summary
6 Fan Chirp Transform
6.1 Formulation of the FChT
6.2 Short-time FChT
6.2.1 Block chirp rate
6.2.2 Example 10: STFChT of a chirp wave
6.2.3 Example 11: STFChT of an engine run-up
6.3 Summary
IV Pitch Tracking & Order Extraction
7 Pitch Tracking Techniques
7.1 Pitch detection
7.2 Pitch tracking
7.2.1 Pitch salience
7.2.2 Salience tracking
7.2.3 Pitch tracking
7.3 Summary
8 Vold-Kalman Order Filtering
8.1 Vold-Kalman Filter
8.1.1 Data equation
8.1.2 Structural equation
8.1.3 Least squares problem
8.1.4 Order extraction
8.2 VKF operation
8.2.1 Solving the linear system
8.2.2 Bandwidth and roll-off
8.2.3 Time-varying bandwidth
8.2.4 Multi-order tracking
8.2.5 Example 12: Helicopter signal separation
8.3 Summary
Appendices
A C-code
A.1 hdtft
B MATLAB functions
B.1 window
B.2 interpolant
B.3 timewarp
B.4 warpedtimbre
B.5 vkf
Bibliography
Index
Nomenclature
Conventions
The following conventions are used throughout the report:
– Lower-case symbols followed by round brackets, e.g. $y(t)$, denote continuous signals or functions,
$t \in \mathbb{R}$.
– Lower-case symbols followed by square brackets, e.g. $y[n]$, denote discrete signals or indexed
functions, $n \in \mathbb{N}$.
– Bold-face lower-case symbols can denote either discrete vectors, e.g. $\mathbf{y} = \big[y[0], y[1], \ldots, y[n]\big]$,
or continuous vectors, e.g. $\mathbf{c}(t) = \big[c_1(t), c_2(t), \ldots, c_K(t)\big]$.
– Capital symbols denote scalars, e.g. $N$ or $T_b$. Bold-face capitals represent arrays, e.g. $\mathbf{E}$.
– Scalars $n$ and $m$ are zero-based indexing integers of a discrete-time signal, e.g. $y[n]$. If not
specified, $n = 0, \ldots, N-1$ and $t[n] = n/f_s$.
– $i$ represents the imaginary unit, defined by $i^2 = -1$.
– The decibel (dB) is defined as $20 \log_{10}(a)$.
Acronyms
CTFT Continuous-time Fourier Transform
DFT Discrete Fourier Transform
DTFT Discrete-time Fourier Transform
FChT Fan Chirp Transform
FT Fourier Transform
IF Instantaneous Frequency
IT Instantaneous Timbre
PME Precision and Microsystems Engineering
STFT Short-time Fourier Transform
STFChT Short-time Fan Chirp Transform
VKF Vold-Kalman filter
Symbols
Lower-case symbols
a Trigonometric Fourier series coefficient (cosine)
b Trigonometric Fourier series coefficient (sine) or block index
c Complex Fourier series coefficient
cn Timbre vector, normalised to amplitude and phase
cθ Timbre vector, normalised to phase
cn(t) Instantaneous timbre vector, normalised to amplitude and phase
cθ(t) Instantaneous timbre vector, normalised to phase
d Sample deviation
e Euler’s number
ei(·) Complex exponential expression
e Time-domain basis vector
e Frequency-domain basis vector
f Frequency
f(t) Instantaneous frequency
fc Centre or mean frequency
fm Modulation frequency
fs Sample rate in Hertz
g Tonal component index
h Harmonic index
i Imaginary unit
k Component or harmonic partial index
l Signal block index
m Block signal sample index
n Signal sample index
nb Block centre sample index
p Power or function order
q Spline order
r Resampling ratio
s(f0) Pitch salience function
t Time
t[n] Time vector
tb Block centre time
Δt Sampling interval
w(t) Window function
w[n] Window vector
x(t) Sinusoidal wave function
y(t) Signal or function
y[n] Signal vector
yb[m] Block vector
Upper-case symbols
A Amplitude
B Number of signal blocks
E Array of time-domain basis vectors
E Array of frequency-domain basis vectors
G Number of tonal components
H Number of harmonics
I Unity matrix
K Number of components or harmonic partials
L Shift size
M Signal block size
N Signal size
Q Bit depth
T Signal length
Tb Signal block length
Tl Shift length
U Time-frequency product
Greek symbols
α Linear chirp rate
β(d) Spline basis function
δ Kronecker delta or Dirac pulse
ε Normalised error
η[n] Noise
θ Phase shift
μ Mean value
ρα(t) Warp normalisation function
σ Standard deviation or second central moment
ϕ(t) Phase function
φα(t) Warp function
ψα(t) Inverse warp function
ωs Sample rate in radians per second
Superscripts and embellishments
(·) Frequency domain representation
(·) Complex conjugate
(·)′ Time-derivative
(·) Time-warped
(·)^n Phase and amplitude normalised
(·)^θ Phase normalised
(·)^(w) Windowed
Introduction
Research context
Spectral analysis is the field of analysis that characterises signals in terms of frequency content.
Whether it concerns a measurement of an accelerating car engine or an acoustic recording of an entire
symphony orchestra, a spectral representation can provide valuable information about the periodic
components present in the signal. Periodic components often correlate with clear deterministic
systems and are therefore of particular interest to engineers.
A spectral representation of a signal is obtained by means of a Fourier transformation. A major
disadvantage of the Fourier transform is that it expresses signals in terms of constant
frequencies. This is not a problem for stationary signals, e.g. a car engine running at constant
speed or a decaying piano chord. Non-stationary signals, however, exhibit frequencies that change
over time and will appear “blurry” in a spectral representation.
Time-frequency analysis extends spectral analysis and represents a signal in both the time and frequency
domain. The most popular technique is the short-time Fourier transform, which analyses shorter blocks
of a signal. The general idea is that shorter blocks are “quasi-stationary”, or at least more stationary
than the entire signal. By choosing a proper block size and thereby time/frequency resolution, one
can often obtain a reasonably sharp spectrum. However, the simultaneous accuracy of temporal and
spectral resolution is always subject to the time-frequency uncertainty principle.
Research goal
At the start of my thesis project, I was confronted with some highly non-stationary signals from
dynamic measurements. It appeared to be very challenging, if not impossible, to identify and
characterise the tonal components to a reasonable degree of accuracy. As classical Fourier time-
frequency analysis appeared to be inadequate for this purpose, we decided to take time-frequency
analysis a step further. An explorative approach was chosen that pursues the following research
objective:
Improvements on time-frequency analysis and identification of non-stationary signals.
This thesis lays out the basics of present time-frequency analysis and the findings towards potential
improvements.
Personal contributions
In order to extend the current state of time-frequency analysis, the following developments are
proposed:
– The technique of linear and non-linear time-warping as an independent operation.
– The fan chirp transform applied to signals from experimental dynamics, as a way to improve
the spectral resolution.
– Timbre analysis as a means of characterising a tonal signal on the basis of the harmonic
amplitudes and phases relative to the fundamental.
– The formulation of the cosine-sigma window as an intermediate between the Hanning and
Gaussian window.
– Pitch tracking based on timbre-following.
Thesis outline
The research approach is exploratory and not focussed on solving a single problem. It was therefore
chosen to present the theory in a textbook-like format. The thesis consists of 8 chapters, grouped
into four parts:
– Part I discusses the fundamental concepts of the time domain (chapter 1) and frequency domain
(chapter 2). All analysis is performed on complete signals. The main topics are signal modelling,
Fourier transformations, windowing and the uncertainty principle.
– Part II introduces the “advanced” concepts. Time-warping is discussed in chapter 3, together
with the required interpolation techniques. Timbre analysis is introduced in chapter 4.
– Part III is dedicated to short-time spectral analysis techniques. Chapter 5 discusses the short-
time Fourier transform. Chapter 6 extends this concept to the fan chirp transform.
– Part IV discusses pitch tracking techniques in chapter 7 and Vold-Kalman order filtering in
chapter 8.
All chapters are illustrated by examples and concluded with a summary.
All calculations carried out in this thesis were performed by a collection of Matlab® functions,
specially written for efficient time-frequency analysis. Only a few functions are included in appendices
A and B. The complete toolbox, including the code for the examples, is provided on a CD-ROM.
Part I
Basic Concepts
Chapter 1
Time-Domain Concepts
This chapter introduces some basic concepts related to signals and signal processing in the time
domain. First, the discretisation of continuous signals into digital signals is discussed. In section 1.2,
the concepts of periodicity and harmonicity are considered. Section 1.3 concludes with a discussion
of signal modelling.
A thorough discussion of the concepts is found in textbooks on signal processing, for instance [14, 13].
This chapter offers a brief recap of time-domain signal theory that is relevant for this thesis.
1.1 Signals
In its most general formulation, a time-domain signal can be any real-valued quantity that varies in
time. Mathematically, a signal may be written as $y = f(t)$, where the quantity $y$ and the time domain
$t$ are not necessarily bounded. Due to the complexity of most signals, the function $f$ can rarely be
expressed in a simple closed form, but may in some cases be approximated.
1.1.1 Continuous-time signals
By nature, all signals we encounter in the real world are continuous. Nature treats everything with
infinite smoothness; it does not require any type of discretisation, nor does it limit accuracy. Therefore both
quantity and time have an uncountable domain of real values: $y, t \in \mathbb{R}$. These signals are called
continuous-time signals and are often referred to as analogue, in contrast to digital.
1.1.2 Discrete-time signals
Discrete-time signals on the other hand have a discretised time domain, meaning that the signal is
sampled at a finite number ($N$) of instants $t_n$: $t_0, t_1, \ldots, t_{N-1}$. The sampling is usually performed
at a constant interval (i.e. $t_{n+1} - t_n = \Delta t$), yielding the so-called sample rate $f_s = 1/\Delta t$ in Hz or
$\omega_s = 2\pi/\Delta t$ in rad/s. The discrete-time signal may be represented as a vector $y[n]$ with the signal
values denoted by $y[0], y[1], \ldots, y[N-1]$. Note that these values can still be continuous: $y[n] \in \mathbb{R}$.
After sampling, the signal values are only known at the specified instants $t_n$, which implies
that everything that happened between two adjacent samples is lost. What this really means
for signals in the context of frequency content is discussed in chapter 2. An illustration of
discrete-time sampling is shown in figure 1.1.
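As a minimal sketch of uniform sampling, the following Python fragment evaluates a continuous-time function at the instants $t_n = n\,\Delta t$. The function name `sample_signal` and the 5 Hz test sine are assumptions chosen for illustration; they are not part of the thesis toolbox.

```python
import math

def sample_signal(y, fs, N):
    """Sample the continuous-time function y(t) at N instants t_n = n / fs."""
    dt = 1.0 / fs                      # sampling interval (Delta t) in seconds
    t = [n * dt for n in range(N)]     # time vector t[n]
    return t, [y(tn) for tn in t]      # the sample values y[n] are still real-valued

# Hypothetical example: a 5 Hz sine sampled at fs = 100 Hz for N = 20 samples
t, y = sample_signal(lambda t: math.sin(2 * math.pi * 5.0 * t), fs=100.0, N=20)
omega_s = 2 * math.pi * 100.0          # the same sample rate expressed in rad/s
```

Only the values at $t_n$ are retained; the behaviour of the sine between two adjacent samples is not stored.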
Figure 1.1: Discrete-time signals use a finite number of values at a fixed sample rate. Panels: (a) continuous-time signal, (b) discrete-time signal; both show value versus time.
1.1.3 Digital signals
Digital signals have both a discretised time domain and a discretised value domain. Digital devices like computers use
finite sets of values to approximate and store the value of every sample in a binary format. For this
purpose, the values are rounded or quantised to a fixed set of equally spaced values $a_1, a_2, \ldots, a_M$
(see figure 1.2). The number $M$ of possible values depends on the chosen resolution. The resolution
is usually specified in terms of a bit depth or word length $Q$, related to the total number of values by
$M = 2^Q$.
The bit depth is directly related to the dynamic range. Dynamic range is defined as the ratio
between the largest and smallest possible value in a set. For instance, compact disc audio is
formatted in 16-bit resolution, which has $M = 2^{16} = 65536$ values, providing a dynamic range of
$20 \log_{10}(65536) = 96.33 \approx 96$ dB. As an approximation, it is often said that every bit increases the
dynamic range by 6 dB.
Typical classes of quantised values encountered in signal processing are 8, 16, 32 or 64-bit integers,
and single precision (32-bit) or double precision (64-bit) floating point numbers. The latter is used most often in
computational applications such as Matlab® and provides a virtually unlimited dynamic range and
negligible quantisation errors.
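The relation between bit depth and dynamic range can be checked with a short Python sketch. The helper names `dynamic_range_db` and `quantise`, and the choice of a symmetric full-scale range of ±1, are assumptions made for this illustration only.

```python
import math

def dynamic_range_db(Q):
    """Dynamic range in dB of a Q-bit quantiser with M = 2**Q values."""
    return 20 * math.log10(2 ** Q)

def quantise(value, Q, full_scale=1.0):
    """Round a value to the nearest of 2**Q equally spaced levels (assumed range: +/- full_scale)."""
    step = 2 * full_scale / 2 ** Q                 # spacing between adjacent levels
    level = round(value / step)                    # index of the nearest level
    lowest, highest = -2 ** (Q - 1), 2 ** (Q - 1) - 1
    level = max(lowest, min(highest, level))       # clip to the available set of levels
    return level * step

cd_range = dynamic_range_db(16)                    # compact disc audio, about 96.33 dB
per_bit = dynamic_range_db(1)                      # about 6.02 dB per additional bit
coarse = quantise(0.3, Q=3)                        # 8 levels with spacing 0.25
```

With only 3 bits the value 0.3 is rounded to 0.25, whereas at 16 bits the rounding error stays below half a quantisation step.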
Figure 1.2: The values of digital signals are quantised to a fixed set of values. The plot shows value versus time.
1.2 Periodic signals
Of particular interest are signals that exhibit a certain amount of periodicity. Periodicity implies
that events occur repeatedly in time, with constant period $T$ in seconds. Examples are the extension
of an oscillating spring, the water level of the ocean due to tidal change or the sound pressure created
by the vibration of a guitar string. Periodic signals often originate from clear deterministic systems
and are therefore of interest to engineers. Noise, in contrast, is typically generated by more
or less stochastic processes. Many signals encountered in practice exhibit both noise and periodic
components.
Figure 1.3: Periodic signals have repetitive content with period $T$. Noise is fully stochastic. Panels: (a) periodic signal, (b) noise.
1.2.1 Periodicity
Mathematically, a signal or function is said to be periodic if there is a period $T$ that satisfies:
$$y(t) = y(t + kT), \qquad k \in \mathbb{N} \tag{1.1a}$$
or for discrete-time signals:
$$y[n] = y[n + kT], \qquad k \in \mathbb{N},\; T f_s \in \mathbb{N} \tag{1.1b}$$
which already reveals a periodicity problem for $T f_s \notin \mathbb{N}$, see section 2.4.2. However, equations (1.1a)
and (1.1b) are very strict definitions of periodicity and only hold for a few simple functions. In real
life, a signal consists of several components that only partly satisfy the definition.
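The condition $T f_s \in \mathbb{N}$ in equation (1.1b) can be illustrated with a small Python sketch. Here the shift is expressed in samples, $P = T f_s$, which is an interpretation of (1.1b); the helper `is_periodic`, the tolerance and the two test frequencies are assumptions for this example.

```python
import math

def is_periodic(y, P, tol=1e-9):
    """True if y[n] equals y[n + P] (within tol) for every n where both samples exist."""
    return all(abs(y[n] - y[n + P]) <= tol for n in range(len(y) - P))

fs = 100.0
# 5 Hz sine: T * fs = 20 samples, a whole number
y_a = [math.sin(2 * math.pi * 5.0 * n / fs) for n in range(200)]
# 7 Hz sine: T * fs = 100 / 7 samples, not a whole number
y_b = [math.sin(2 * math.pi * 7.0 * n / fs) for n in range(200)]

repeats_a = is_periodic(y_a, 20)     # True: repeats after exactly one period
repeats_b = is_periodic(y_b, 14)     # False: 14 samples is slightly less than one period
repeats_b7 = is_periodic(y_b, 100)   # True: 100 samples span seven full periods
```

The 7 Hz sine only repeats on the sample grid after seven full periods, which hints at the periodic-extension issue referred to in section 2.4.2.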
1.2.2 Frequency and phase
Periodic components can be characterised by their frequency, $f = 1/T$. As a convention, frequencies expressed
in Hertz take the symbol $f$, while angular frequencies in radians per second are denoted by $\omega = 2\pi f$.
After one period $T$, the phase $\phi(t)$ of the component has advanced $2\pi$ radians. The phase refers to the
instantaneous position of a component $y$ at time $t$ and indicates the fraction of the period, in radians,
that has elapsed. For a sine wave, it is simply given by $y(t) = \sin(\phi(t))$.
For stationary signals with constant frequency $f$, the phase continues to increase linearly with $\phi(t) = 2\pi f t$
or $\phi(t) = \omega t$. The phase may be biased by a constant phase shift $\theta$ in radians that determines
the phase for $t = 0$:
$$\phi(t) = 2\pi f t + \theta \tag{1.2}$$
The concept of frequency is generalised for non-stationary signals by defining the time-dependent
instantaneous frequency (IF) f(t) as the derivative of the phase ϕ(t) with respect to time:
$$f(t) = \frac{1}{2\pi} \frac{d\phi(t)}{dt} \tag{1.3}$$
Likewise, the phase of a component with time-varying frequency is found by
$$\phi(t) = 2\pi \int_0^t f(t)\, dt + \theta \tag{1.4a}$$
or in the discrete case
$$\phi[n] = 2\pi \sum_{n=1}^{n} f[n]\, \Delta t + \theta \tag{1.4b}$$
although it is better to replace the latter summation by a proper numerical integration method. A
complete discussion of instantaneous frequency and phase is found in [3].
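One possible numerical integration method is the trapezoidal rule, sketched below in Python. The choice of the trapezoidal rule, the function name `phase_from_frequency` and the chirp parameters are assumptions for illustration; the thesis does not prescribe a particular method here.

```python
import math

def phase_from_frequency(f, dt, theta=0.0):
    """Integrate the instantaneous frequency f[n] (in Hz) with the trapezoidal rule.

    Returns the phase in radians, starting at the phase shift theta for n = 0.
    """
    phi = [theta]
    for n in range(1, len(f)):
        # area of one trapezoid between samples n-1 and n, scaled by 2*pi
        phi.append(phi[-1] + 2 * math.pi * 0.5 * (f[n - 1] + f[n]) * dt)
    return phi

# Hypothetical linear chirp: 10 Hz rising by 5 Hz/s, sampled at fs = 1000 Hz for 1 s
fs = 1000.0
f = [10.0 + 5.0 * n / fs for n in range(1001)]
phi = phase_from_frequency(f, 1.0 / fs)
# Analytical phase for comparison: 2*pi*(10*t + 2.5*t**2), i.e. 2*pi*12.5 at t = 1 s
```

For a linearly varying frequency the trapezoidal rule reproduces the analytical phase up to rounding errors, whereas the plain summation of equation (1.4b) accumulates a small bias.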
1.2.3 Periodic functions
The most basic but very important periodic functions are the sine and cosine functions (or sinusoids)
with constant frequency f :
$$x(t) = \sin(\phi(t)) = \sin(2\pi f t) \tag{1.5a}$$
$$x(t) = \cos(\phi(t)) = \cos(2\pi f t) \tag{1.5b}$$
These functions fully satisfy the definition of periodicity. The complex exponential is based on
Euler's formula and describes a complex-valued¹ wave $x(t)$ with constant frequency $f$:
$$x(t) = e^{i 2\pi f t} = \cos(2\pi f t) + i \sin(2\pi f t) \tag{1.6a}$$
Multiplying by the complex amplitude scalar $c = a + bi$ and keeping only the real part, the wave
can be given an amplitude and phase shift:
$$x(t) = \Re\left(c\, e^{i 2\pi f t}\right) = a \cos(2\pi f t) - b \sin(2\pi f t) = A \cos(2\pi f t + \theta) \tag{1.6b}$$
with $A = \|c\|$ the absolute amplitude and $\theta = \angle c$ the phase shift in radians. An alternative notation
uses the identities $\cos(\phi) = \frac{1}{2}\left(e^{i\phi} + e^{-i\phi}\right)$ and $\sin(\phi) = \frac{1}{2i}\left(e^{i\phi} - e^{-i\phi}\right)$:
$$x(t) = c_+\, e^{i 2\pi f t} + c_-\, e^{-i 2\pi f t} \tag{1.6c}$$
If $c_-$ is the complex conjugate of $c_+$, the resulting signal $x(t)$ is real. If so, it follows that $c_+ = \frac{1}{2} c$
and $c_- = \frac{1}{2} \overline{c}$ and consequently $A = 2\|c_+\| = 2\|c_-\|$ and $\theta = \angle c_+ = -\angle c_-$.
Equations (1.6a) to (1.6c) are at the heart of the frequency-domain representation, as discussed in
chapter 2.
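The equivalence of the three notations can be verified numerically with Python's `cmath` module. The function names and the example amplitude $c = 3 + 4i$ are assumptions chosen for illustration.

```python
import cmath
import math

def real_wave(c, f, t):
    """Real part of c * exp(i 2 pi f t), the left-hand form of equation (1.6b)."""
    return (c * cmath.exp(1j * 2 * math.pi * f * t)).real

def two_sided_wave(c, f, t):
    """Two-sided form of equation (1.6c) with c_plus = c / 2 and c_minus = conj(c) / 2."""
    c_plus, c_minus = 0.5 * c, 0.5 * c.conjugate()
    return (c_plus * cmath.exp(1j * 2 * math.pi * f * t)
            + c_minus * cmath.exp(-1j * 2 * math.pi * f * t))

c = 3.0 + 4.0j                        # hypothetical complex amplitude a + bi
A, theta = abs(c), cmath.phase(c)     # amplitude A = |c| and phase shift theta = angle(c)

t = 0.37
lhs = real_wave(c, 2.0, t)
cosine_form = A * math.cos(2 * math.pi * 2.0 * t + theta)
quadrature_form = (c.real * math.cos(2 * math.pi * 2.0 * t)
                   - c.imag * math.sin(2 * math.pi * 2.0 * t))
```

All three expressions agree up to rounding errors, and the two-sided sum has a vanishing imaginary part because $c_-$ is the conjugate of $c_+$.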
¹ Throughout this text, the quantity $i$ is reserved for the imaginary unit, defined by $i^2 = -1$.
1.2.4 Orthogonality of harmonic waves
An important property of sinusoids and complex waves is the orthogonality between waves that
complete a whole number of periods within the same interval. Consider the complex exponential wave $x_0(t)$ with base frequency $f_0$ and
non-zero complex amplitude $c_0$ given by
$$x_0(t) = c_0\, e^{i 2\pi f_0 t} \tag{1.7}$$
Also consider the $k$th harmonic wave with derived frequency $f_k = k f_0$ and complex amplitude $c_k$:
$$x_k(t) = c_k\, e^{i 2\pi k f_0 t}, \qquad k \in \mathbb{N} \tag{1.8}$$
Then the projection of the wave $x_k(t)$ onto $x_l(t)$ over a full period $T_0 = 1/f_0$ reads:
$$\int_0^{T_0} \overline{x_k(t)} \cdot x_l(t)\, dt = \int_0^{T_0} \overline{c_k}\, c_l\, e^{i 2\pi (l-k) f_0 t}\, dt = T_0\, (\overline{c_k}\, c_l)\, \delta_{kl}, \qquad k, l \in \mathbb{N} \tag{1.9}$$
using the Kronecker delta notation:
$$\delta_{kl} = \begin{cases} 1 & \text{for } k = l \\ 0 & \text{for } k \neq l \end{cases}$$
The harmonic waves $x_k$ and $x_l$ for $k \neq l$ are thus orthogonal over a full period of the base frequency,
regardless of their amplitude and phase shift.
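In discrete notation, the vector $\mathbf{s}_k$ is given by:
$$s_k[n] = c_k\, e^{i 2\pi k \frac{n}{N}}, \qquad n = 0, \ldots, N-1 \tag{1.10}$$
and $\mathbf{s}_k$ and $\mathbf{s}_l$ are orthogonal as well:
$$\frac{1}{N}\, \mathbf{s}_k^H \mathbf{s}_l = \frac{1}{N} \sum_{n=0}^{N-1} \overline{s_k[n]}\, s_l[n] = (\overline{c_k}\, c_l)\, \delta_{kl}, \qquad k, l \in \mathbb{N} \tag{1.11}$$
where $(\cdot)^H$ denotes the complex conjugate (Hermitian) vector transpose.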
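The discrete orthogonality relation of equation (1.11) can be checked with a few lines of Python. The helper names `harmonic_vector` and `projection`, the block size $N = 64$ and the amplitudes are assumptions for this sketch.

```python
import cmath

def harmonic_vector(k, c, N):
    """Discrete harmonic wave s_k[n] = c * exp(i 2 pi k n / N) for n = 0, ..., N-1."""
    return [c * cmath.exp(2j * cmath.pi * k * n / N) for n in range(N)]

def projection(s_k, s_l):
    """Normalised inner product (1/N) * s_k^H s_l, with the first vector conjugated."""
    N = len(s_k)
    return sum(a.conjugate() * b for a, b in zip(s_k, s_l)) / N

N = 64
s2 = harmonic_vector(2, 1.0 + 2.0j, N)    # second harmonic, hypothetical amplitude c_2
s3 = harmonic_vector(3, 0.5 - 1.0j, N)    # third harmonic, hypothetical amplitude c_3

cross = projection(s2, s3)                # close to 0: different harmonics are orthogonal
self_proj = projection(s2, s2)            # close to |c_2|^2 = 5
```

The cross term vanishes regardless of the chosen amplitudes and phase shifts, while the projection of a wave onto itself returns its squared amplitude.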
1.3 Signal modelling
The sinusoid by itself is the purest periodic signal and is (in the audible frequency range)
perceived by the human ear as a tone with a most “mellow” quality. In reality, such signals are rarely
seen, except for the test signal on some audio devices or the swing of a pendulum clock. Still, to a
good approximation, periodic signals can be reduced to a combination of many sinusoids:
$$y(t) = \sum_{k=1}^{K} A_k(t) \cos(\phi_k(t)), \qquad 0 \le t < T \tag{1.12}$$
This observation is actually the basis for the Fourier series as introduced in section 2.2.
In addition to periodic components, signals may also contain a certain amount of uncorrelated
noise. The following sections discuss means to model an arbitrary signal comprising both periodic
components and noise.
Figure 1.4: A signal can be modelled as a combination of periodic components and noise.
1.3.1 Sinusoids plus noise model
Let us consider an arbitrary signal $y(t)$ for $0 \le t < T$. Then the signal can be modelled as the
sum of a deterministic periodic part consisting of $K$ sinusoids characterised by $A_k(t)$ and
$\varphi_k(t)$, plus a stochastic part of uncorrelated noise $\eta(t)$:

$$y(t) = \underbrace{\sum_{k=1}^{K} A_k(t) \cos\big(\varphi_k(t)\big)}_{\text{deterministic}} + \underbrace{\eta(t)}_{\text{stochastic}} \qquad 0 \le t < T \tag{1.13}$$

This way of representing a signal is often referred to as the sinusoids plus noise model [17]. For
a stationary signal $y_s(t)$, frequencies and amplitudes remain constant and $y_s(t)$ can be written as:

$$y_s(t) = \sum_{k=1}^{K} A_k \cos\big(2\pi f_k t + \theta_k\big) + \eta(t) \qquad 0 \le t < T \tag{1.14}$$
An example is shown in figure 1.4.
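As an illustration of the stationary model (1.14), the sketch below samples a sum of sinusoids and adds Gaussian noise. The function name, the choice of Gaussian noise and all parameter values are assumptions made for this example only.

```python
import math
import random

def sinusoids_plus_noise(partials, fs, T, noise_std=0.0, seed=0):
    """Sample y_s(t) of (1.14) at t = n / fs for 0 <= t < T.

    partials: list of (A_k, f_k, theta_k) tuples.
    noise_std: standard deviation of the (assumed Gaussian) noise term.
    """
    rng = random.Random(seed)
    n_samples = int(round(T * fs))
    y = []
    for n in range(n_samples):
        t = n / fs
        deterministic = sum(A * math.cos(2 * math.pi * f * t + th)
                            for A, f, th in partials)
        stochastic = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        y.append(deterministic + stochastic)
    return y

partials = [(1.0, 50.0, 0.0), (0.5, 120.0, math.pi / 2)]   # hypothetical components
clean = sinusoids_plus_noise(partials, fs=1000.0, T=0.1)
noisy = sinusoids_plus_noise(partials, fs=1000.0, T=0.1, noise_std=0.1)
```

With `noise_std=0.0` only the deterministic part remains, which makes it easy to compare a noisy record with its noise-free counterpart.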
1.3.2 Tonal components plus noise model
If some frequencies $f_h^{(1)} \subseteq f_k$ can be related to a fundamental frequency $f_0^{(1)}$ by $f_h^{(1)} = h f_0^{(1)}$,
then the waves for $h = 1, \ldots, H$ are considered to be harmonic partials of a tonal component with
fundamental frequency $f_0^{(1)}$. The stationary tonal component $y_s^{(1)}(t)$ is then modelled as:

$$y_s^{(1)}(t) = \sum_{h=1}^{H} A_h^{(1)} \cos\big(2\pi h f_0^{(1)} t + \theta_h^{(1)}\big) \qquad 0 \le t < T \tag{1.15}$$

A non-stationary tonal component may be described by:

$$y^{(1)}(t) = \sum_{h=1}^{H} A_h^{(1)}(t) \cos\big(\varphi_h^{(1)}(t)\big) \qquad 0 \le t < T \tag{1.16}$$

Defining the fundamental phase shift $\theta_0^{(1)} \triangleq 0$, it follows that $\varphi_0^{(1)}(0) = 0$ and the phase functions of
the partials are given by:

$$\varphi_h^{(1)}(t) = h\, \varphi_0^{(1)}(t) + \theta_h^{(1)} \qquad 0 \le t < T \tag{1.17}$$
This concept will be used extensively in chapter 4.
A monophonic signal $y(t)$ comprises only one tonal component $y^{(1)}(t)$ in addition to noise. A
polyphonic signal contains multiple tonal components $y^{(g)}(t)$ with different fundamental phase
functions $\varphi_0^{(g)}(t)$. The model for a non-stationary signal comprising $g = 1, \ldots, G$ tonal components
plus noise finally reads:

$$y(t) = \sum_{g=1}^{G} y^{(g)}(t) + \eta(t) \qquad 0 \le t < T \tag{1.18}$$
This model is referred to as the tonal components plus noise model.
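To make the role of the fundamental phase function concrete, the following sketch evaluates one non-stationary tonal component according to (1.16) and (1.17). The linearly rising fundamental frequency, the constant amplitudes and all numerical values are hypothetical choices for illustration.

```python
import math

def tonal_component(amps, thetas, phi0, t):
    """Value of (1.16) at time t, with partial phases h * phi0(t) + theta_h from (1.17).

    amps[h-1] and thetas[h-1] belong to partial h; phi0 is the fundamental phase function.
    """
    return sum(A * math.cos(h * phi0(t) + th)
               for h, (A, th) in enumerate(zip(amps, thetas), start=1))

# Hypothetical fundamental: frequency rising linearly from f_start at alpha Hz/s,
# so phi0(t) = 2 pi (f_start t + alpha t^2 / 2), which satisfies phi0(0) = 0.
f_start, alpha = 100.0, 20.0

def phi0(t):
    return 2 * math.pi * (f_start * t + 0.5 * alpha * t * t)

amps = [1.0, 0.5, 0.25]      # constant amplitudes A_h (a simplification)
thetas = [0.0, 0.3, -0.7]    # phase shifts theta_h
fs = 8000.0
y1 = [tonal_component(amps, thetas, phi0, n / fs) for n in range(800)]
```

A polyphonic signal as in (1.18) would be obtained by summing several such components, each with its own `phi0`, and adding noise.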
1.4 Summary
Time-domain signals are formulated in either the continuous-time or the discrete-time domain. The
value of a continuous-time signal is known at every instant in time. The values of a discrete-time
signal are known only at a finite number of instants and can be obtained by sampling. A
digital signal additionally requires the values to be quantised to a finite set of levels.
Signals usually comprise both periodic components and noise. Periodic components are characterised
by their frequency, which is the inverse of the period. A stationary component has a constant frequency
and consequently a linearly increasing phase. For a non-stationary component, the instantaneous
frequency is found as the derivative of the phase with respect to time. Essential periodic functions
are the sine and cosine waves, or sinusoids, which may also be formulated using complex exponential
notation. Sinusoids that complete a whole number of periods over the same interval are mutually orthogonal.
To a good approximation, any time-domain signal may be considered as a deterministic part consisting
of sinusoids plus a stochastic part of uncorrelated noise. If some sinusoids can be related to a
fundamental frequency, the sinusoids are considered to be harmonic partials of a tonal component. A
monophonic signal comprises only one tonal component, while a polyphonic signal contains multiple
tonal components.
Chapter 2
Frequency-Domain Concepts
For many analysis purposes, a time-domain representation of a signal does not offer enough
information. More insight is gained from its frequency domain representation or frequency spectrum,
which can be obtained by means of a Fourier transformation.
This chapter explores the fundamentals of the frequency domain and Fourier analysis. First, the
difference between the time and frequency domain is discussed and illustrated by a basis vector
transformation. In sections 2.2 and 2.3, the Fourier series and the Fourier transform are introduced. The
discrete Fourier transform and its properties are discussed in section 2.4. Section 2.5 addresses the
theory and application of windowing. The chapter concludes with a study of the uncertainty principle
in section 2.6 that brings up the fundamental trade-off between time and frequency localisation.
2.1 Time domain versus frequency domain
The frequency domain is the domain in which a function or signal is expressed in terms of frequency
content rather than time content. Formally, a frequency domain representation shows the spectral
distribution of a signal, whereas a time-domain representation shows its temporal distribution.
Figure 2.1: The difference between the time and frequency domain representation of a signal: (a) time domain, (b) frequency domain, each drawn with time on the horizontal and frequency on the vertical axis. The time domain offers perfect time localisation but no frequency localisation, while the frequency domain offers excellent frequency localisation but lacks time localisation.
The concept is illustrated in figure 2.1 with time on the horizontal axis and frequency on the vertical
axis. Let us assume that all time and frequency content of a signal is contained within the square.
Then the time-domain only offers temporal localisation, while the frequency-domain merely provides
spectral localisation. Nevertheless, both domains can represent exactly the same signal as long as
some conditions are satisfied, as will be discussed in this chapter.
The frequency-domain representation or Fourier transform of a signal $y(t)$ is denoted by $\hat{y}(f)$, with
$f$ the frequency in hertz. The transformation $y(t) \Rightarrow \hat{y}(f)$ is called a Fourier transform (FT) and is
discussed in section 2.3. For discrete signals, the time-domain sequence $y[n]$ can be transformed to a
frequency-domain spectrum $\hat{y}[k]$ of equal length by means of a discrete Fourier transform (DFT), as
discussed in section 2.4. First, the difference between the two domains is illustrated by a basis vector
transformation.
2.1.1 Example 1: Basis vector transformation
Let us consider the simple time-domain sequence $y[n]$ shown in figure 2.2, with only $N = 8$ points.
In a time-domain representation, the $8 \times 1$ vector $\mathbf{y}$ holds the 8 values for the 8 instants
$n = 0, 1, \ldots, 7$:

$$\mathbf{y} = \begin{bmatrix} 1 & 2 & 0 & -1 & -1.5 & -0.5 & 1.5 & 1 \end{bmatrix}^T$$

All 8 values are independent of each other: no other combination of these 8 entries in
$\mathbf{y}$ represents the same signal. Mathematically speaking, the space $\mathbb{R}^8$ of the time domain is
spanned exactly by $K = 8$ orthogonal basis vectors $\mathbf{e}_k$ that form the basis vector matrix $\mathbf{E}$:

$$\mathbf{E} = \begin{bmatrix} \mathbf{e}_0 \\ \mathbf{e}_1 \\ \vdots \\ \mathbf{e}_7 \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = \mathbf{I}$$

Since $\mathbf{E}$ equals the identity matrix, the vectors $\mathbf{e}_k$ are also orthonormal and it simply follows that
$\mathbf{y} = \mathbf{E}\mathbf{y}$. It is observed that the vectors of $\mathbf{E}$ are perfectly independent in terms of time localisation,
but do not offer any information about the frequency content.
Figure 2.2: A simple time-domain sequence of 8 samples (horizontal axis: n, vertical axis: y).
In the frequency-domain representation, the same sequence $\mathbf{y}$ is expressed in another set of 8 basis
vectors $\hat{\mathbf{e}}_k$, $k = 0, \ldots, 7$. The vectors correspond to the 8 orthogonal complex exponential waves
with full periods (see equation (1.10)), counting from $k = 0$ to 7:

$$\hat{e}_k[n] = e^{i 2\pi k n / N} \qquad k, n = 0, 1, \ldots, 7 \qquad N = 8$$

Figure 2.3: The orthogonal basis vectors $\hat{\mathbf{e}}_0$ to $\hat{\mathbf{e}}_7$ of the 8-point frequency domain. The real part is coloured blue, the imaginary part is red. The arrows indicate the conjugate pairs: waves with the same apparent frequency but opposite imaginary part.
The vector values and corresponding waves are shown in figure 2.3 with the real part in blue and the
imaginary part in red.
The first vector $\hat{\mathbf{e}}_0$ corresponds to a constant level, also referred to as the DC component. Vectors 1 to
4 contain 1 to 4 full periods, respectively. Note that vectors 0 and 4 are real-valued.
For vectors 5 to 7, one would expect 5 to 7 periods, but their values show only
3 to 1 periods. This is in accordance with the Nyquist-Shannon sampling theorem, which states that a
sampled signal can only represent frequency content up to half the sampling frequency. As a result,
waves with frequencies above $\tfrac{1}{2} f_s$ appear as aliased waves with lower frequency and opposite
imaginary part, which in this example yields:

$$\hat{\mathbf{e}}_5 = \bar{\hat{\mathbf{e}}}_3 \qquad \hat{\mathbf{e}}_6 = \bar{\hat{\mathbf{e}}}_2 \qquad \hat{\mathbf{e}}_7 = \bar{\hat{\mathbf{e}}}_1$$
The conjugate pairs are indicated by arrows in figure 2.3. In fact, the aliased vectors correspond to
the negative frequencies of the spectrum, as will be discussed in section 2.4.
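The conjugate pairs can be confirmed numerically. The sketch below builds the 8-point basis vectors and compares vectors 5, 6 and 7 with the complex conjugates of vectors 3, 2 and 1; the helper names are assumptions for this illustration.

```python
import cmath

N = 8

def basis_vector(k, N):
    """Frequency-domain basis vector with entries exp(i 2 pi k n / N), n = 0..N-1."""
    return [cmath.exp(2j * cmath.pi * k * n / N) for n in range(N)]

def max_deviation(a, b):
    """Largest element-wise distance between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))

pairs = [(5, 3), (6, 2), (7, 1)]
deviations = {
    (high, low): max_deviation(basis_vector(high, N),
                               [v.conjugate() for v in basis_vector(low, N)])
    for high, low in pairs
}
```

All deviations are zero up to rounding, because exp(i 2π k n / N) and exp(i 2π (k − N) n / N) coincide at integer n.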
Getting back to the basis vectors, a square matrix $\hat{\mathbf{E}}$ can be constructed from the 8 vectors $\hat{\mathbf{e}}_k$, similar
to the time-domain procedure. Just like the basis vectors $\mathbf{e}_k$, the vectors $\hat{\mathbf{e}}_k$ span an orthogonal space
(see equation (1.11)). In contrast to $\mathbf{E}$, $\hat{\mathbf{E}}$ contains vectors that are independent in terms of frequency
but are evenly spread over time:

$$\hat{\mathbf{E}} = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & \omega & i & \omega^3 & -1 & \omega^5 & -i & \omega^7 \\
1 & i & -1 & -i & 1 & i & -1 & -i \\
1 & \omega^3 & -i & \omega & -1 & \omega^7 & i & \omega^5 \\
1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 \\
1 & \omega^5 & i & \omega^7 & -1 & \omega & -i & \omega^3 \\
1 & -i & -1 & i & 1 & -i & -1 & i \\
1 & \omega^7 & -i & \omega^5 & -1 & \omega^3 & i & \omega
\end{bmatrix}$$

with the shorthand $\omega = \sqrt{2}\,(\tfrac{1}{2} + \tfrac{1}{2} i)$, $\omega^3 = \sqrt{2}\,(-\tfrac{1}{2} + \tfrac{1}{2} i)$, $\omega^5 = \sqrt{2}\,(-\tfrac{1}{2} - \tfrac{1}{2} i)$ and $\omega^7 = \sqrt{2}\,(\tfrac{1}{2} - \tfrac{1}{2} i)$.

Figure 2.4: Frequency-domain representation $\hat{\mathbf{y}}$ of the sequence $\mathbf{y}$ of figure 2.2.
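While $\mathbf{E}^H \mathbf{E} = \mathbf{I}$, it appears that $\hat{\mathbf{E}}^H \hat{\mathbf{E}} = N \mathbf{I}$ with $N = 8$, meaning that the vectors are orthogonal
but not orthonormal. Nevertheless, $\hat{\mathbf{E}}$ can be regarded as an orthogonal transformation matrix
that transforms the representation in the frequency domain (basis $\hat{\mathbf{e}}_k$) to the
representation in the time domain (basis $\mathbf{e}_k$):

$$\mathbf{y} = \frac{1}{\sqrt{N}}\, \hat{\mathbf{E}}\, \hat{\mathbf{y}}_n \tag{2.1a}$$

$$\hat{\mathbf{y}}_n = \frac{1}{\sqrt{N}}\, \hat{\mathbf{E}}^H \mathbf{y} \tag{2.1b}$$

Vector $\hat{\mathbf{y}}_n$ denotes the normalised Fourier transform of $\mathbf{y}$. However, a more common transformation reads:

$$\mathbf{y} = \hat{\mathbf{E}}\, \hat{\mathbf{y}} \tag{2.1c}$$

$$\hat{\mathbf{y}} = \frac{1}{N}\, \hat{\mathbf{E}}^H \mathbf{y} \tag{2.1d}$$

As such, vector $\hat{\mathbf{y}}$ corresponds to the amplitudes of the complex basis waves as present in the signal
$y[n]$.
The values for $\hat{\mathbf{y}}$ are found to be:

$$\hat{\mathbf{y}} = \begin{bmatrix} 0.31 & 0.71 & -0.25 & -0.09 & -0.06 & -0.09 & -0.25 & 0.71 \end{bmatrix}^T + \begin{bmatrix} 0.00 & 0.14 & -0.19 & -0.23 & 0.00 & 0.23 & 0.19 & -0.14 \end{bmatrix}^T i$$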
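It can be observed that $\hat{y}[0]$ and $\hat{y}[4]$ are real and that $\hat{y}[1], \hat{y}[2], \hat{y}[3]$ form complex conjugate pairs with $\hat{y}[7], \hat{y}[6], \hat{y}[5]$.
The absolute values $|\hat{\mathbf{y}}|$ are shown in figure 2.4.
It is verified that $\hat{\mathbf{y}}^H \hat{\mathbf{y}} = \tfrac{1}{N}\, \mathbf{y}^H \mathbf{y}$ and $\hat{\mathbf{y}}_n^H \hat{\mathbf{y}}_n = \mathbf{y}^H \mathbf{y}$, which shows that energy is conserved
throughout the transformation. This property is known as Parseval’s identity (see section 2.4.3).
The obtained vector $\hat{\mathbf{y}}$ is exactly the discrete Fourier transform of $\mathbf{y}$, as will be discussed in the following
sections.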
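The numbers of this example can be reproduced with a few lines of plain Python. The sketch below builds the frequency-domain basis matrix, applies (2.1d) and (2.1c), and compares the energies; the variable names are illustrative assumptions.

```python
import cmath

y = [1, 2, 0, -1, -1.5, -0.5, 1.5, 1]
N = len(y)

# Rows of E_hat are the basis vectors e_hat_k[n] = exp(i 2 pi k n / N).
E_hat = [[cmath.exp(2j * cmath.pi * k * n / N) for n in range(N)] for k in range(N)]

# (2.1d): y_hat = (1/N) E_hat^H y. E_hat is symmetric, so its Hermitian
# transpose equals its element-wise complex conjugate.
y_hat = [sum(E_hat[k][n].conjugate() * y[n] for n in range(N)) / N for k in range(N)]

# (2.1c): y = E_hat y_hat recovers the time-domain samples.
y_back = [sum(E_hat[n][k] * y_hat[k] for k in range(N)) for n in range(N)]

# Energy comparison: y_hat^H y_hat versus (1/N) y^H y.
energy_freq = sum(abs(v) ** 2 for v in y_hat)
energy_time = sum(abs(v) ** 2 for v in y) / N
```

Rounded to two decimals, `y_hat[0]` and `y_hat[1]` agree with the first two listed values, 0.31 and 0.71 + 0.14i.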
2.2 Fourier series
A decomposition of a periodic signal into its harmonic sinusoidal partials is called a Fourier series.
The concept is named after Joseph Fourier (1768–1830), a French mathematician who discovered
that stationary periodic signals can be expressed as a superposition of sinusoids:

$$y(t) = \sum_{k=1}^{K} A_k \cos\big(2\pi f_k t + \theta_k\big) \qquad -\infty < t < \infty \tag{2.2}$$

In theory, a Fourier series can describe any periodic signal exactly as long as an infinite number of
partials is allowed.
2.2.1 Trigonometric series
Following the definition of equation (1.6b), the harmonic partials $k = 1, \ldots, K$ of a periodic signal
$y(t)$ with fundamental frequency $f_0$ can be found in terms of $a_k$ and $b_k$ by:

$$a_k = \frac{2}{T_0} \int_0^{T_0} y(t) \cos\big(2\pi k f_0 t\big)\, dt \tag{2.3a}$$

$$b_k = \frac{2}{T_0} \int_0^{T_0} y(t) \sin\big(2\pi k f_0 t\big)\, dt \tag{2.3b}$$

The DC offset $a_0$ is determined by:

$$a_0 = \frac{1}{T_0} \int_0^{T_0} y(t)\, dt \tag{2.3c}$$

The integration is performed over $0 \le t < T_0$, although any full period can be used. The
trigonometric series is easy to interpret, but mathematically less convenient than its complex equivalent.
2.2.2 Complex exponential series
The complex exponential Fourier series provides a mathematically more elegant alternative to the
trigonometric series. Considering a signal $y(t)$ with fundamental frequency $f_0$, the complex partials
$c_k$ are found by:

$$c_k = \frac{1}{T_0} \int_0^{T_0} y(t)\, e^{-i 2\pi k f_0 t}\, dt \qquad k = 0, \pm 1, \pm 2, \ldots, \pm K \tag{2.4}$$

The DC component is also obtained by (2.4): the exponential term is 1 for $k = 0$. Since complex
exponentials come in pairs as in (1.6c), both positive and negative frequencies should be addressed:
$k = 0, \pm 1, \pm 2, \ldots, \pm K$. For real-valued signals it is found that $c_{-k} = \bar{c}_k$, hence the negative
components do not need to be computed separately. The complex exponential partials are related to
the trigonometric partials by:

$$c_k = \begin{cases} a_0 & \text{for } k = 0 \\ \tfrac{1}{2}\,(a_k - i b_k) & \text{for } k = 1, 2, \ldots, K \\ \tfrac{1}{2}\,(a_{|k|} + i b_{|k|}) & \text{for } k = -1, -2, \ldots, -K \end{cases}$$

This follows from the Euler identities of section 1.2.3. For discrete signals, time is replaced by $t = n \Delta t$
and the Fourier series reads:

$$c_k = \frac{1}{N_0} \sum_{n=0}^{N_0 - 1} y[n]\, e^{-i 2\pi k f_0 n \Delta t} \qquad k = 0, \pm 1, \pm 2, \ldots, \pm K \tag{2.5}$$

$N_0$ corresponds to the required number of samples for a complete period: $N_0 = T_0 f_s$, rounded to an
integer value¹. Note that by the sampling theorem, waves with frequency $k f_0 > \tfrac{1}{2} f_s$ become aliased
waves, which should be avoided:

$$K < \frac{1}{2} \frac{f_s}{f_0}$$
The inverse Fourier series combines the complex partials to a stationary periodic function similar to
equation (2.2):

$$y(t) = \sum_{k=-K}^{K} c_k\, e^{i 2\pi k f_0 t} \qquad -\infty < t < \infty \tag{2.6a}$$

If the partials are complex conjugate pairs, the following summation is equivalent:

$$y(t) = c_0 + 2 \sum_{k=1}^{K} \Re\big(c_k\, e^{i 2\pi k f_0 t}\big) \qquad -\infty < t < \infty \tag{2.6b}$$
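The discrete series (2.5) and the inverse series (2.6b) can be tried out on a synthetic signal whose partials are known in advance. The sketch below is a minimal illustration; the test signal, its frequencies and the function name are assumptions, chosen such that one period contains an integer number of samples.

```python
import cmath
import math

def fourier_partials(y, f0, fs, K):
    """Complex partials c_k of (2.5) for k = 0..K, from one full period of y."""
    N0 = len(y)
    return [sum(y[n] * cmath.exp(-2j * math.pi * k * f0 * n / fs) for n in range(N0)) / N0
            for k in range(K + 1)]

# Hypothetical test signal: offset 0.2 plus the 1st and 3rd harmonic of f0 = 100 Hz.
f0, fs = 100.0, 3200.0
N0 = int(round(fs / f0))        # 32 samples per period, an exact integer here
y = [0.2
     + 1.0 * math.cos(2 * math.pi * f0 * n / fs + 0.5)
     + 0.4 * math.cos(2 * math.pi * 3 * f0 * n / fs - 1.0)
     for n in range(N0)]

K = 4                           # well below the limit fs / (2 f0) = 16
c = fourier_partials(y, f0, fs, K)
amplitudes = [2 * abs(ck) for ck in c[1:]]     # A_k = 2 |c_k|
phases = [cmath.phase(ck) for ck in c[1:]]     # theta_k = angle of c_k

# Inverse series (2.6b), evaluated at the sample instants.
y_rebuilt = [c[0].real + 2 * sum((c[k] * cmath.exp(2j * math.pi * k * f0 * n / fs)).real
                                 for k in range(1, K + 1))
             for n in range(N0)]
```

The recovered amplitudes and phase shifts match the values used to build the signal, and the rebuilt samples coincide with the original ones because the signal contains no partials above K.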
The Fourier series are illustrated by the following example.
2.2.3 Example 2: Trumpet harmonics
As an illustration of the Fourier series, a short fragment of a trumpet tone is analysed. The played note
is a B♭4, which has a fundamental frequency of $f_0 = 466$ Hz and period $T_0 = 2.145$ ms. The signal
is sampled at $f_s = 16000$ Hz; 5 full periods are shown in figure 2.5. Clearly, the signal consists of a
fundamental component plus harmonics.
The first 8 partials $c_k$ of the signal are determined using equation (2.5) and listed in table 2.1. The DC
component for $k = 0$ is neglected. Since the signal is real, the negative partials $c_{-k}$ are simply the
complex conjugates of $c_k$. The amplitudes read $A = 2\,|c_k|$; the phase shifts $\theta = \angle c_k$ are expressed
in degrees instead of radians. The tonal intervals of the partials are also listed.
The partials $c_k$ allow us to reconstruct the signal from its harmonic sinusoidal components using the
inverse Fourier series of equation (2.6a). Figure 2.6 shows the procedure for the first 5 partials, adding
a partial for every $T_0$. It can be observed that the similarity between the assembled and original signal
increases with $k$.

¹ If $T_0 f_s$ does not yield an integer number of samples, spectral leakage will occur (see section 2.4.2).
Figure 2.5: A short fragment of a B♭4 on a trumpet (horizontal axis: time [ms], vertical axis: amplitude).

Figure 2.6: The signal cumulatively built up from the first 5 partials: every period adds one partial (horizontal axis: time [ms], vertical axis: amplitude).

k   c_k               A = 2|c_k|   θ = ∠c_k [deg]   interval
1    0.160 − 0.014i   0.321          −5.0           fundamental
2   −0.006 − 0.172i   0.344         −92.0           octave
3   −0.119 + 0.049i   0.258         157.4           octave + fifth
4   −0.034 − 0.072i   0.158        −114.9           2 octaves
5    0.003 + 0.032i   0.064          83.4           2 octaves + third
6    0.005 + 0.043i   0.088         −97.2           2 octaves + fifth
7    0.008 + 0.011i   0.027         125.5           2 octaves + seventh
8    0.002 + 0.014i   0.029         −96.7           3 octaves

Table 2.1: Amplitudes and phase shifts of the first 8 harmonics of the trumpet signal
2.3 Fourier transform
The Fourier transform is a generalisation of the Fourier series. While the Fourier series is limited to
periodic signals and uses a discrete set of wave functions, the Fourier transform also applies to a large
class of non-periodic signals and represents spectral content in a continuous frequency-domain. Both
the continuous-time and discrete-time Fourier transform are discussed.
2.3.1 Continuous-time Fourier transform
The continuous-time Fourier transform (CTFT) or simply Fourier transform (FT) transforms a time-domain
function $y(t)$ to the corresponding frequency-domain function $\hat{y}(f)$. The Fourier transform
does not require the signal to be periodic with some period $T_0 = 1/f_0$ and therefore also applies to
non-periodic signals. The most general formulation reads:

$$\hat{y}(f) = \int_{-\infty}^{\infty} y(t)\, e^{-i 2\pi f t}\, dt \qquad -\infty < f < \infty \tag{2.7a}$$

The inverse Fourier transform performs the transformation in the opposite direction:

$$y(t) = \int_{-\infty}^{\infty} \hat{y}(f)\, e^{i 2\pi f t}\, df \qquad -\infty < t < \infty \tag{2.7b}$$

Strictly, the Fourier transform in the ordinary sense only exists if $y(t)$ is Lebesgue integrable, which
requires the signal to be absolutely integrable:

$$\int_{-\infty}^{\infty} |y(t)|\, dt < \infty$$

However, the generalised Fourier transform provides ways to describe non-converging signals by
means of standard Fourier pairs, such as the Dirac pulse for a complex wave:

$$e^{i 2\pi f_0 t} \leftrightarrow \delta(f - f_0) \tag{2.8}$$
The Fourier transform is an important mathematical concept. However, since both time and frequency
appear unbounded in equation (2.7a), it is not the most applicable variant for signal processing.
2.3.2 Discrete-time Fourier transform
The discrete-time Fourier transform (DTFT) yields the same continuous frequency information as the
Fourier transform, but operates on finite discrete-time signals $y[n]$ sampled at $f_s$. The time instants
are $t[n] = n/f_s$. The DTFT is given by:

$$\hat{y}(f) = \frac{1}{N} \sum_{n=0}^{N-1} y[n]\, e^{-i 2\pi f n / f_s} \qquad -\tfrac{1}{2} f_s < f < \tfrac{1}{2} f_s \tag{2.9}$$

The frequency domain is limited to $\pm\tfrac{1}{2} f_s$, since $y[n]$ can only contain non-aliased frequency content
up to the Nyquist frequency. Note that only the frequencies (or rather: waves with frequency)
$f = k f_s / N$, $k \in \mathbb{Z}$ are fully periodic over the sequence $y[n]$. Mathematically speaking, waves
with complete periods over length $N$ are orthogonal. These are $f_k = k f_s / N$ with $-N/2 < k < N/2$.
All frequencies in between project onto multiple $f_k$ and are therefore “dependent”.
Note that due to aliasing, the waves with negative frequencies are equivalent to waves with positive
frequencies above the Nyquist frequency: $k = N/2, \ldots, N-1$. This was already seen in example 1.
Consequently, the set of frequencies $k = 0, \ldots, N-1$ provides all orthogonal waves within
$-\tfrac{1}{2} f_s < f < \tfrac{1}{2} f_s$.
Due to this orthogonality, the $K = N$ frequencies $f_k$ span a sufficient space to describe $y[n]$ spectrally.
The discrete Fourier transform (DFT) uses these frequencies to build up a discrete set of orthogonal
basis functions and can therefore be regarded as a special case of the DTFT, as will be discussed in the
following section.
2.4 Discrete Fourier transform
The discrete Fourier transform (DFT) is an extremely powerful tool for the analysis of discrete signals. It
transforms a finite time-domain sequence $y[n]$ of $N$ samples, taken at sampling frequency $f_s$, into a finite
frequency-domain spectrum $\hat{y}[k]$ with $N$ frequency bins, and reads:

$$\hat{y}[k] = \frac{1}{N} \sum_{n=0}^{N-1} y[n]\, e^{-i 2\pi k n / N} \qquad k = 0, \ldots, N-1 \tag{2.10a}$$

The obtained values correspond to complex waves with frequency

$$f_k = \frac{k}{N} f_s \qquad k = 0, \ldots, N-1 \tag{2.10b}$$

The DFT can be regarded as the complex Fourier series (2.5) of a finite signal
with $f_0 = f_s / N$, evaluated only for non-negative indices $k$. The inverse discrete Fourier transform
(IDFT) performs the transformation in the opposite direction:

$$y[n] = \sum_{k=0}^{N-1} \hat{y}[k]\, e^{i 2\pi k n / N} \qquad n = 0, \ldots, N-1 \tag{2.10c}$$

Because of the factor $1/N$, the formulation of (2.10a) is not a unitary
transformation. The transformation can be made unitary by appropriate scaling, as was also seen for
the basis vector transformation (2.1a) in example 1. The normalised DFT and its inverse then read:

$$\hat{y}_n[k] = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} y[n]\, e^{-i 2\pi k n / N} \qquad k = 0, \ldots, N-1 \tag{2.11}$$

$$y[n] = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} \hat{y}_n[k]\, e^{i 2\pi k n / N} \qquad n = 0, \ldots, N-1 \tag{2.12}$$

showing that the DFT and IDFT are identical except for the sign of the exponent. The formulation of (2.10a) is used
more frequently since it relates directly to the amplitudes of the partials.
The following sections briefly discuss some properties of the DFT. A more thorough discussion can
be found in standard textbooks on Fourier analysis, for instance [14, 13].
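A direct, unoptimised evaluation of (2.10a) to (2.10c) can be sketched as follows. The function names are assumptions; the forward transform carries the 1/N factor used in this text, which differs from the convention of many numerical libraries.

```python
import cmath

def dft(y):
    """Forward DFT with the 1/N scaling of (2.10a)."""
    N = len(y)
    return [sum(y[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) / N
            for k in range(N)]

def idft(y_hat):
    """Inverse DFT of (2.10c), without scaling."""
    N = len(y_hat)
    return [sum(y_hat[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N))
            for n in range(N)]

def bin_frequencies(N, fs):
    """Bin frequencies f_k = k fs / N of (2.10b)."""
    return [k * fs / N for k in range(N)]

y = [1, 2, 0, -1, -1.5, -0.5, 1.5, 1]        # sequence of example 1
y_hat = dft(y)
y_back = idft(y_hat)
freqs = bin_frequencies(len(y), fs=8.0)      # fs = 8 Hz is an arbitrary choice
```

Applying `idft` to the output of `dft` returns the original samples, and `y_hat[0]` equals the mean of the sequence.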
2.4.1 Spectral symmetry
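The DFT exhibits symmetry about $k = N/2$, as can be observed from $\hat{y}[N-k]$ and the fact that
$e^{-i 2\pi n} = 1$ for $n \in \mathbb{N}$:

$$\begin{aligned}
\hat{y}[N-k] &= \frac{1}{N} \sum_{n=0}^{N-1} y[n]\, e^{-i 2\pi (N-k) n / N} \\
&= \frac{1}{N} \sum_{n=0}^{N-1} y[n]\, e^{i 2\pi k n / N}\, e^{-i 2\pi n} \\
&= \frac{1}{N} \sum_{n=0}^{N-1} y[n]\, e^{i 2\pi k n / N} = \bar{\hat{y}}[k] \quad \text{(for real-valued } y[n]\text{)} \qquad k = 1, \ldots, N-1
\end{aligned} \tag{2.13}$$

If $y[n]$ is real-valued, $\hat{y}[N-k]$ is thus the complex conjugate of $\hat{y}[k]$ and the pairs combine to the
real-valued waves characterised by:

$$A_k = \big|\hat{y}[k]\big| + \big|\hat{y}[N-k]\big| = 2\, \big|\hat{y}[k]\big| \tag{2.14a}$$

$$\theta_k = \angle\, \hat{y}[k] = -\angle\, \hat{y}[N-k] \tag{2.14b}$$

which is in accordance with Euler’s formula (1.6c). Therefore, only the bins $k = 0, \ldots, N/2$ need to be
computed for a complete spectral representation of a real-valued $y[n]$.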
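The conjugate symmetry and the amplitude and phase relations (2.14a) and (2.14b) can be checked on a real test signal. The sketch below assumes a single cosine that fits exactly three periods in the record; signal and names are illustrative.

```python
import cmath
import math

def dft(y):
    """Forward DFT with the 1/N scaling of (2.10a)."""
    N = len(y)
    return [sum(y[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) / N
            for k in range(N)]

N = 16
# Real-valued test signal: offset 0.3 plus a cosine with 3 full periods, A = 0.8, theta = 0.4.
y = [0.3 + 0.8 * math.cos(2 * math.pi * 3 * n / N + 0.4) for n in range(N)]
y_hat = dft(y)

# Conjugate symmetry about k = N/2 for real-valued input.
symmetry_error = max(abs(y_hat[N - k] - y_hat[k].conjugate()) for k in range(1, N))

# Amplitude and phase of the real wave in bin k = 3, following (2.14a) and (2.14b).
A3 = abs(y_hat[3]) + abs(y_hat[N - 3])
theta3 = cmath.phase(y_hat[3])
```

The recovered amplitude and phase equal the values used to construct the cosine, because the wave completes full periods and therefore causes no leakage.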
2.4.2 Periodic extension & spectral leakage
The DFT intrinsically assumes the sequence y[n] to be periodic with period T = N/fs. After the last
sample y[N − 1], y[N ] = y[0] is expected. If there is a large difference between the samples y[N − 1]
and y[0], this will be understood as a 0th order discontinuity, resulting in so-called spectral leakage.
Leakage occurs for every periodic component that has incomplete periods, as illustrated in figure 2.7,
or discontinuities of higher order [18]. Spectral leakage can be minimised by applying windowing, as
discussed in section 2.5.
Figure 2.7: The DFT assumes periodic extension, which may lead to discontinuities: (a) no discontinuity for f = 1 Hz, T = 1 s; (b) discontinuity for f = 1.5 Hz, T = 1 s (horizontal axes: time [s]).
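The two cases of figure 2.7 can be reproduced numerically. The sketch below assumes a record of T = 1 s with N = 32 samples and counts the DFT bins with non-negligible magnitude for a 1 Hz and a 1.5 Hz sine; the sampling parameters and threshold are arbitrary choices for this illustration.

```python
import cmath
import math

def dft_magnitudes(y):
    """Magnitudes of the DFT bins, with the 1/N scaling of (2.10a)."""
    N = len(y)
    return [abs(sum(y[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) / N)
            for k in range(N)]

fs, N = 32.0, 32                     # record length T = N / fs = 1 s

def sine(f):
    return [math.sin(2 * math.pi * f * n / fs) for n in range(N)]

mag_full = dft_magnitudes(sine(1.0))   # exactly one period in the record
mag_half = dft_magnitudes(sine(1.5))   # one and a half periods in the record

threshold = 1e-9
bins_full = [k for k, m in enumerate(mag_full) if m > threshold]
bins_half = [k for k, m in enumerate(mag_half) if m > threshold]
```

For the 1 Hz sine only the conjugate pair of bins 1 and N − 1 is occupied, whereas the 1.5 Hz sine spreads over every bin of the spectrum.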
2.4.3 Plancherel theorem & Parseval’s theorem
The Plancherel theorem applied to the DFT reads:

$$\frac{1}{N} \sum_{n=0}^{N-1} x[n]\, \bar{y}[n] = \sum_{k=0}^{N-1} \hat{x}[k]\, \bar{\hat{y}}[k] \tag{2.15a}$$

For $x[n] = y[n]$, it reduces to the well-known theorem of Parseval:

$$\frac{1}{N} \sum_{n=0}^{N-1} \big|y[n]\big|^2 = \sum_{k=0}^{N-1} \big|\hat{y}[k]\big|^2 \tag{2.15b}$$

The latter equation is used to determine the power of a signal; the RMS value is defined as the square
root of (2.15b).
2.4.4 Fast Fourier transform
The fast Fourier transform (FFT) is a highly efficient implementation of the DFT, proposed by James
W. Cooley and John W. Tukey in 1965 [6]. It computes the transformation of (2.10a) much more
efficiently by breaking up $y[n]$ into multiple DFTs of smaller size. This approach is also known
as the divide-and-conquer method and allows the sub-problems to be solved as parallel procedures.
Especially when $N$ is a power of 2 or a highly composite number, an enormous speed increase is
achieved. The necessary number of multiplications is of the order $\tfrac{N}{2} \log_2 N$ for the FFT, while direct
evaluation of the DFT would require $N^2$ multiplications. For example, for $N = 4096$ the increase of
speed is:

$$\frac{M_{\mathrm{DFT}}}{M_{\mathrm{FFT}}} = \frac{N^2}{\tfrac{N}{2} \log_2 N} \approx 683$$

For most signal processing applications, it is common to choose $N$ as a power of 2.
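The divide-and-conquer idea can be illustrated with a textbook recursive radix-2 decimation-in-time scheme. This is only a sketch for power-of-2 lengths, not the implementation used in this work or in any particular library; the function names and the test signal are assumptions.

```python
import cmath
import math

def dft(y):
    """Direct evaluation of (2.10a): of the order N^2 complex multiplications."""
    N = len(y)
    return [sum(y[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) / N
            for k in range(N)]

def fft_unscaled(y):
    """Recursive radix-2 FFT without scaling; len(y) must be a power of 2."""
    N = len(y)
    if N == 1:
        return [complex(y[0])]
    even = fft_unscaled(y[0::2])     # DFT of the even-indexed samples
    odd = fft_unscaled(y[1::2])      # DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle
        out[k + N // 2] = even[k] - twiddle
    return out

def fft(y):
    """FFT with the 1/N scaling of (2.10a)."""
    N = len(y)
    return [v / N for v in fft_unscaled(y)]

def speed_ratio(N):
    """Ratio of multiplication counts N^2 / ((N/2) log2 N)."""
    return N ** 2 / (math.log2(N) * N / 2)

y = [math.sin(0.37 * n) + 0.1 * n for n in range(64)]   # arbitrary test data
max_diff = max(abs(a - b) for a, b in zip(dft(y), fft(y)))
ratio_4096 = speed_ratio(4096)
```

Both routines return the same spectrum up to rounding, and `speed_ratio(4096)` evaluates the ratio of multiplication counts given above.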
2.4.5 Example 3: Trumpet DFT
Let us again consider the trumpet signal from example 2. The signal is sampled at $f_s = 16000$ Hz and
has a fundamental period $T_0 = 2.145$ ms. As a comparison, two DFTs are plotted in figure 2.8:
– Blue: a DFT for $N = 34$ points, corresponding to one period.
– Red: a DFT for $N = 172$ points, corresponding to five periods.
Both DFTs exhibit a small amount of leakage, as the frequencies $f_k$ do not exactly match the
harmonics of $f_0 = 466$ Hz. Both spectra are symmetric about $f = 8000$ Hz. The partials of example 2
appear as peaks with the same amplitude as in table 2.1. For the 1-period DFT, all points correspond
to the harmonic partials of the trumpet. For the 5-period DFT, only 1 out of 5 points corresponds to
a harmonic; the remaining points represent leakage or noise.
2.5 Windowing
An arbitrary finite-length signal usually exhibits a discontinuity between its last and its first point when extended periodically. Only
frequencies that coincide with the bin frequencies of equation (2.10b) are fully periodic in the interval.
Figure 2.8: DFT of the first period of the trumpet signal (horizontal axis: frequency [Hz], vertical axis: amplitude).
All other frequencies have incomplete periods, which leads to discontinuities. The discontinuities
cause the DFT to produce non-zero values at frequencies other than the principal frequencies
present in the signal. These spurious leakage components can sometimes be severe enough to mask
weaker components of the signal.
A more mathematical explanation was formulated by Harris [12, page 173]:
From the continuum of possible frequencies, only those which coincide with the basis will
project onto a single basis vector; all other frequencies will exhibit non-zero projections
on the entire basis set. This is often referred to as spectral leakage and is the result
of processing finite-duration records. [. . . ] An intuitive approach to leakage is the
understanding that signals with frequencies other than those of the basis set are not
periodic in the observation window. The periodic extension of a signal not commensurate
with its natural period exhibits discontinuities at the boundaries of the observation. The
discontinuities are responsible for spectral contributions (or leakage) over the entire basis
set.
Windowing is a popular technique to reduce the effect of leakage by reducing the discontinuities at
the boundaries of the signal. This is achieved by multiplying the signal with a smooth amplitude
envelope that reaches zero (or almost zero) at the boundaries.
There are many different window functions; the choice of the window depends on the purpose.
The Matlab® function in appendix B.1 implements 15 different window functions. The following
sections only discuss a few important window functions and ways to quantify their properties. For a
complete study on windowing, refer to [12].
2.5.1 Window application
Let $y[n]$ be an $N$-point signal and $w[n]$ a window function of equal length. Then the windowed
signal is obtained by element-wise multiplication:

$$y^{(w)}[n] = w[n]\, y[n] \qquad n = 0, \ldots, N-1 \tag{2.16}$$

The time instants $t[n]$ are:

$$t[n] = \frac{n - N/2}{f_s} = n/f_s - T/2 \qquad n = 0, \ldots, N-1 \tag{2.17}$$

Hence, the window length is $T = N/f_s$, the time domain is $-T/2 \le t < T/2$ and the window is
symmetric about $t[N/2] = 0$ s. The DFT of the window is denoted by $\hat{w}[k]$.
In general, windows have their maximum value at $n = N/2$ and reduce smoothly to zero towards the
boundaries. The spectrum obtained after windowing, $\hat{y}^{(w)}[k]$, is predictable, since it can be seen as the
convolution of the DFTs of $y[n]$ and $w[n]$.
Since one is especially interested in the response of the window to incomplete waves, $k$ should not be
limited to integers. Instead, the Fourier transform of $w[n]$ is considered as a function of a
continuous frequency $f \in \mathbb{R}$:

$$\hat{w}(f) = \frac{1}{N} \sum_{n=0}^{N-1} w[n]\, e^{-i 2\pi f n / N} \qquad -\tfrac{1}{2} f_s < f < \tfrac{1}{2} f_s \tag{2.18}$$

Equation (2.18) represents the discrete-time Fourier transform (DTFT) of $w[n]$ for the domain of
feasible frequencies (see section 2.3.2). Note that $f$ in the exponent is counted in frequency
bins of width $f_s/N$ relative to DC, so that the band $-\tfrac{1}{2} f_s < f < \tfrac{1}{2} f_s$ corresponds to $-N/2 < f < N/2$ bins, as will be illustrated below.
2.5.2 Window properties
The window properties are introduced using the rectangular window as an example. Figure 2.9 shows
(from left to right) the window w[n] on linear scale, w(f) on linear scale and w(f) on logarithmic
(dB) scale. The performance indicators of the window will be discussed separately.
Figure 2.9: Time and frequency domain representation of the rectangular window. Panels from left to right: time domain, linear (time [s]); frequency domain, linear (frequency [bins]); frequency domain, dB (frequency [bins]).
Coherent gain
Let us have a look at the frequency spectrum of figure 2.9. As already mentioned, the spectrum
of a windowed signal can be seen as the spectrum of the signal convolved with the spectrum of the
window. Let the signal be composed of broad-band noise plus a single sinusoidal component. Then
the coherent gain (CG) is defined as the DC component of the window ($f = 0$) and determines the
gain of the sinusoid at its true frequency in the spectrum:

$$\mathrm{CG} = \frac{1}{N} \sum_{n=0}^{N-1} w[n] \tag{2.19}$$

For the rectangular window, CG = 1. However, for most windows CG < 1, meaning that the
amplitude of the principal component is reduced. Often, window functions are normalised to a
coherent gain of 1 (0 dB) to make sure that the spectral amplitudes correspond to the true amplitudes,
apart from the contributions of the noise.
Equivalent noise power & bandwidth
Unfortunately, the amplitude determined by the coherent gain is biased by the neighbouring
frequency content, which is accumulated according to the response of the window for $f \neq 0$. The total noise
power is defined as the integral of the squared frequency response over the complete frequency
domain $-f_s/2 < f < f_s/2$:

$$\mathrm{NP} = \int_{-f_s/2}^{f_s/2} |\hat{w}(f)|^2\, df = \frac{1}{N} \sum_{n=0}^{N-1} |w[n]|^2 \tag{2.20}$$

where Parseval’s theorem is used for the latter expression. The equivalent noise bandwidth (ENBW) is
a measure for the width of a hypothetical rectangular “filter” with coherent power $\mathrm{CG}^2$ that would
accumulate the same amount of noise power. In other words, it represents the width of a rectangle of
height $\mathrm{CG}^2$ that has the same area as the area under $|\hat{w}(f)|^2$. It is indicated by a dashed box
in the centre plot in figure 2.9. Using the definitions of CG and NP, it is given in bins by:

$$\mathrm{ENBW} = \frac{\mathrm{NP}}{\mathrm{CG}^2} = N\, \frac{\sum_{n=0}^{N-1} |w[n]|^2}{\left( \sum_{n=0}^{N-1} w[n] \right)^2} \tag{2.21}$$

The rectangular window has an ENBW of 1. For most windows, however, ENBW > 1. Consequently,
the ENBW quantifies the reduction of the achievable spectral resolution compared to the rectangular
window.
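The coherent gain and equivalent noise bandwidth are straightforward to evaluate from the window samples. The sketch below assumes that the ENBW in bins is taken as the ratio NP / CG², with NP the mean of the squared window samples, and applies it to the rectangular window and to the Hann window of section 2.5.4; the function names and N = 64 are illustrative.

```python
import math

def coherent_gain(w):
    """CG: mean of the window samples, as in (2.19)."""
    return sum(w) / len(w)

def noise_power(w):
    """NP: mean of the squared window samples, the right-hand side of (2.20)."""
    return sum(v * v for v in w) / len(w)

def enbw_bins(w):
    """Equivalent noise bandwidth in bins, taken here as NP / CG^2 (an assumption)."""
    return noise_power(w) / coherent_gain(w) ** 2

N = 64
rect = [1.0] * N
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

results = {name: (coherent_gain(w), enbw_bins(w))
           for name, w in (("rectangular", rect), ("hann", hann))}
```

With these definitions the rectangular window gives CG = 1 and ENBW = 1, and the Hann window gives CG = 0.5 and ENBW = 1.5, in line with the values quoted in sections 2.5.3 and 2.5.4.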
Main lobe width & -6dB bandwidth
The main lobe width (MLW) is the width of the centre lobe between the first points where $\hat{w}(f) = 0$.
It is a measure for the sharpness or spectral resolution of the DFT: low values give sharp
spectra, while high values produce blurrier spectra in which closely
spaced frequencies are difficult to distinguish. The −6 dB bandwidth (BW) is a similar measure but corresponds to the bandwidth
between the points where $|\hat{w}(f)|$ has dropped to half its peak value (−6 dB). Note that both bandwidths are properties of the
window itself and are not related to leakage due to incomplete periods in the signal.
Side lobe level & roll-off rate
The side lobe level (SLL) and the side lobe roll-off (SLR) quantify the amount of leakage. The side lobe
level is the maximum level of the contributions of frequencies that are not part of the main lobe and
should therefore be minimised. The side lobe roll-off is a measure for the asymptotic rate of side lobe
level decrease with frequency, usually specified in dB per octave. It is a direct result of the order of
discontinuity on the boundaries:

order of discontinuity   side lobe decay   roll-off rate
0th order                1/f               −6 dB/oct
1st order                1/f^2             −12 dB/oct
2nd order                1/f^3             −18 dB/oct
pth order                1/f^(p+1)         −6(p+1) dB/oct

The above relation is a result of the differentiation property of the Fourier transform: every additional
order of continuity adds a factor 1/f to the roll-off rate [18].
2.5.3 Rectangular window
The rectangular or Dirichlet window (figure 2.9) is the simplest window; applying it is equivalent to
applying no windowing at all:

$$w[n] = 1 \qquad n = 0, \ldots, N-1 \tag{2.22}$$

The magnitude of the DTFT of the rectangular window, with $f$ in bins, is given to a good approximation for large $N$ by:

$$\hat{w}(f) = \frac{\sin(\pi f)}{\pi f} = \operatorname{sinc}(f) \tag{2.23}$$

The coherent gain and equivalent noise bandwidth are both 1. The main lobe width is 2 bins: except
for $f = 0$, all integer frequency bins yield zero amplitude. The −6 dB bandwidth is only 1.21 bins. The
side lobe level is −13.3 dB and the roll-off rate is of course −6 dB/oct, since the window exhibits a 0th
order discontinuity.
The rectangular window has the lowest possible ENBW, MLW and BW, meaning that it is able to
produce a very sharp DFT. However, due to the high side lobe level and slow roll-off rate, the window
suffers from severe leakage.
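The quoted bandwidth and side lobe level follow from the sinc-shaped response sin(πf)/(πf), with f in bins. The sketch below assumes this large-N approximation of the window response and locates the −6 dB point by bisection and the first side lobe by a simple grid search; all helper names are illustrative.

```python
import math

def sinc(f):
    """sin(pi f) / (pi f), with the limit value 1 at f = 0."""
    return 1.0 if f == 0 else math.sin(math.pi * f) / (math.pi * f)

def bisect(func, lo, hi, iterations=60):
    """Root of func on [lo, hi] by bisection; func(lo) and func(hi) must differ in sign."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (func(lo) > 0) == (func(mid) > 0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# -6 dB bandwidth: twice the frequency at which the response has dropped to 0.5.
bw_6db = 2 * bisect(lambda f: sinc(f) - 0.5, 0.0, 1.0)

# Side lobe level: highest magnitude within the first side lobe, 1 < f < 2 bins.
grid = [1 + i / 10000 for i in range(1, 10000)]
sll_db = 20 * math.log10(max(abs(sinc(f)) for f in grid))
```

Rounded, the results reproduce the −6 dB bandwidth of 1.21 bins and the side lobe level of −13.3 dB.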
2.5.4 Hanning window
The Hanning window (or Hann, named after the Austrian meteorologist Julius von Hann) is perhaps
the most frequently applied window since it offers excellent leakage suppression and has a very
predictable response. The window is given by:
$$ w[n] = \frac{1}{2} - \frac{1}{2}\cos\left(2\pi\frac{n}{N}\right) = \cos^2\left(\pi\left(\frac{n}{N} - \frac{1}{2}\right)\right), \qquad n = 0, \dots, N-1 \qquad (2.24) $$
The window is shown in figure 2.10. The red points represent the window values at the DFT points,
i.e. integer values of f. Clearly, the CG is 0.5 due to the first term in equation (2.24). The cosine term
appears as two points with amplitude 1/4 at f = ±1. The ENBW is 1.5, meaning that the spectrum
is 1.5 times less sharp than that of the rectangular window. The SLL is −31.5 dB and the SLR is −18 dB/oct
since both the window value and its first derivative are continuous at the boundaries. This comes at
the cost of a higher bandwidth: the MLW is 4 bins and the BW is 2 bins.
The Hanning window offers much better leakage suppression than the rectangular window. In
addition, it has perfect temporal coverage when adjacent windows are observed with T/2 spacing
in time, as will be discussed in chapter 5.
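The coherent gain and equivalent noise bandwidth quoted above can be checked numerically. The sketch below is a minimal illustration using only the Python standard library; it assumes the common definitions CG = mean of w[n] and ENBW = N·Σw² / (Σw)², which are not restated in this section, and the function names are hypothetical.

```python
import math

def hann(N):
    """Hanning window of equation (2.24), n = 0, ..., N-1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def coherent_gain(w):
    """Mean window value (assumed definition of the coherent gain)."""
    return sum(w) / len(w)

def enbw_bins(w):
    """Equivalent noise bandwidth in bins: N * sum(w^2) / (sum w)^2 (assumed definition)."""
    return len(w) * sum(v * v for v in w) / sum(w) ** 2

N = 100
w_rect = [1.0] * N
w_hann = hann(N)

print("rectangular: CG =", coherent_gain(w_rect), "ENBW =", enbw_bins(w_rect))
print("Hanning:     CG =", coherent_gain(w_hann), "ENBW =", enbw_bins(w_hann))
```

Under these assumed definitions, the rectangular window gives 1 for both measures and the Hanning window gives 0.5 and 1.5, in line with the values stated in the text.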
Figure 2.10: The Hanning or Hann window. Panels: time domain (linear, time in s), frequency domain (linear, frequency in bins), frequency domain (dB, frequency in bins).
Figure 2.11: The Gaussian window for σ = 0.2 (blue), σ = 0.1 (green) and σ = 0.05 (red). Panels: time domain (linear, time in s), frequency domain (linear, frequency in bins), frequency domain (dB, frequency in bins).
2.5.5 Gaussian window
The Gaussian window implements the Gaussian function with standard deviation σ:
$$ w_\sigma[n] = e^{-\frac{1}{2}\left(\frac{t[n]}{\sigma}\right)^2}, \qquad n = 0, \dots, N-1 \qquad (2.25) $$
Time t[n] is defined by (2.17). Due to the absence of the normalisation term 1/(σ√2π), the function
is not the normalised Gaussian or normal distribution with unity area. Instead, it has a peak value
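w[N/2] = 1, just like the other windows. The Gaussian function is a well-known closed-form
function whose Fourier transform has the same shape as the function itself:
$$ w_\sigma(f) = \left(\sigma\sqrt{2\pi}\right) e^{-\frac{1}{2}\left(\frac{2\pi f}{1/\sigma}\right)^2}, \qquad -\tfrac{1}{2}f_s < f < \tfrac{1}{2}f_s \qquad (2.26) $$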
The proof is not trivial [1, equation 7.4.6, page 302]. The Gaussian window can thus be “tuned” by the
time-domain standard deviation σt, while the frequency-domain σf follows according to:
$$ \sigma_f = \frac{1}{2\pi\sigma} \qquad (2.27) $$
Figure 2.11 shows the Gaussian window for σ = 0.2 (blue), σ = 0.1 (green) and σ = 0.05 (red). It can
be verified that both the time-domain and frequency-domain window have the shape of a Gaussian
function. Note that the axes of the logarithmic frequency domain plot extend to ±20 bins.
The Gaussian function only reaches zero at infinity. Therefore the transformation of (2.26) only holds
for a time window of infinite length. In equation (2.25), the function is truncated at ±T/2. The
standard deviation σt determines the amount of discontinuity on the boundaries. Looking at the
logarithmic frequency domain plot, it is observed that w(f) is perfectly quadratic near the centre,
as was expected from equation (2.26). From a certain level, the response starts to show side lobes.
It is observed that the side lobe level increases with an increasing discontinuity of the time-domain
window w[0], although there is no explicit relation.
Since w(f) is quadratic on a logarithmic scale near the centre, frequency estimates can be obtained by
quadratic interpolation of the logarithmic spectrum. Also, the Gaussian window minimises the
time-frequency product, as will be discussed in the following section.
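As an illustration of the quadratic interpolation mentioned above, the following sketch estimates the frequency of a single Gaussian-windowed cosine from three DFT bins around the spectral peak. It uses only the Python standard library; the centred time axis t[n] = (n − N/2)/fs, the test frequency of 25.3 Hz and the function names are assumptions made for this example.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitudes of the DFT bins k = 0, ..., N/2 (direct evaluation)."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
            for k in range(N // 2 + 1)]

def estimate_frequency(x, fs):
    """Fit a parabola through the log-magnitudes of the peak bin and its neighbours."""
    N = len(x)
    mags = dft_magnitudes(x)
    k = max(range(1, N // 2), key=lambda i: mags[i])       # peak bin, edges excluded
    a, b, c = (math.log(mags[k + d]) for d in (-1, 0, 1))
    offset = 0.5 * (a - c) / (a - 2 * b + c)                # vertex of the parabola
    return (k + offset) * fs / N

fs, N, sigma, f_true = 100.0, 100, 0.1, 25.3
t = [(n - N / 2) / fs for n in range(N)]                    # assumed centred time axis
x = [math.exp(-0.5 * (tn / sigma) ** 2) * math.cos(2 * math.pi * f_true * tn)
     for tn in t]

f_est = estimate_frequency(x, fs)
print("estimated frequency [Hz]:", f_est)
```

Because the logarithm of a Gaussian is exactly parabolic, the estimate lands well within a small fraction of a bin here, even though the component is not periodic in the window.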
2.5.6 Cosine and cosine-sigma window
Recall that the Hanning window can be written as a cosine function to the power of 2 (equation
(2.24)). The Hanning window is therefore part of the family of cos^α windows. The family for
non-negative α can be characterised by a MLW of exactly 2+α bins and a SLR of −6(1+α) dB/oct, since
every additional cosine factor adds one order of derivative continuity; this agrees with the
−6 dB/oct of the rectangular window (α = 0) and the −18 dB/oct of the Hanning window (α = 2).
Furthermore, the following dependencies were found for α ∈ [0, 10]:
SLL = −13.3 − 7.47α dB
BW = √(1.45 + 1.35α) bins
The window for α = 0 is obviously the rectangular window. It is observed that for increasing α, the
window tends to converge to a Gaussian window. This observation is justified by the mathematical
limit that shows the convergence of a power cosine function to an exponential function:
$$ \lim_{N \to \infty} \cos\left(\frac{t}{N}\right)^{N^2} = e^{-\frac{1}{2}t^2} \qquad (2.28) $$
In contrast to the Gaussian window, the boundaries of a cosine window are always zero, meaning
that the window has a much higher roll-off rate. The cosine-sigma window is therefore suggested,
combining the properties of the cosα window and the Gaussian window:
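$$ w_{\sigma,\alpha}[n] = \begin{cases} \cos\left(\dfrac{t[n]}{\sigma\sqrt{\alpha}}\right)^{\alpha} & -\tfrac{1}{2}\pi\sigma\sqrt{\alpha} < t[n] < \tfrac{1}{2}\pi\sigma\sqrt{\alpha} \\ 0 & \text{elsewhere} \end{cases} \qquad (2.29) $$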
The parameter σ determines the theoretical Gaussian standard deviation for the case α = ∞. The
parameter α controls the exponent of the cosine function and thereby the validity of equation (2.28).
Note that the window for α = 2 and σ = 1/(π√2) ≈ 0.225 is exactly the Hanning window.
Figure 2.12 shows a Gaussian window for σ = 0.1 (blue) and the cosine-sigma window with σ = 0.1
and α = 16 (green). On linear scale, the windows appear almost the same. On logarithmic scale, it
can be observed that the cosine-sigma window is slightly more concentrated. The measured standard
deviation is 0.097 which is close to the expected σ = 0.1. It is verified that the cosine-sigma window
converges to the Gaussian window for α >> 100.
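The cosine-sigma window of equation (2.29) is straightforward to evaluate. The sketch below is a minimal standard-library illustration; the centred time axis t[n] = n/N − 1/2 (i.e. T = 1 s) and the function names are assumptions for this example. It checks the Hanning special case and compares the window for α = 16 with the Gaussian window of the same σ.

```python
import math

def cosine_sigma_window(t, sigma, alpha):
    """Equation (2.29): cos(t / (sigma*sqrt(alpha)))**alpha inside the support, 0 elsewhere."""
    scale = sigma * math.sqrt(alpha)
    half_width = 0.5 * math.pi * scale
    # max(0, .) guards against tiny negative cosines caused by rounding at the edges
    return [max(0.0, math.cos(tn / scale)) ** alpha if abs(tn) < half_width else 0.0
            for tn in t]

def gaussian_window(t, sigma):
    """Equation (2.25)."""
    return [math.exp(-0.5 * (tn / sigma) ** 2) for tn in t]

N = 100
t = [n / N - 0.5 for n in range(N)]          # assumed centred time axis, T = 1 s

# Special case: alpha = 2, sigma = 1/(pi*sqrt(2)) should reproduce equation (2.24)
w_special = cosine_sigma_window(t, 1 / (math.pi * math.sqrt(2)), 2)
w_hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
hann_dev = max(abs(a - b) for a, b in zip(w_special, w_hann))

# Comparison with the Gaussian window for sigma = 0.1, alpha = 16
w_cs = cosine_sigma_window(t, 0.1, 16)
w_g = gaussian_window(t, 0.1)
gauss_dev = max(abs(a - b) for a, b in zip(w_cs, w_g))

print("max deviation from Hanning :", hann_dev)
print("max deviation from Gaussian:", gauss_dev)
```

The first deviation is at rounding level; the second is small but non-zero, consistent with the two windows appearing almost the same on a linear scale.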
Figure 2.12: The Gaussian window for σ = 0.1 (blue) and the cosine-sigma window for σ = 0.1 and α = 16 (green). The cosine-sigma window has a better roll-off rate than the Gaussian window. Panels: time domain (linear, time in s), frequency domain (linear, frequency in bins), frequency domain (dB, frequency in bins).
2.5.7 Example 4: Windowing of a simple signal
To illustrate the difference between window functions, a simple signal is considered:
$$ y[n] = \cos\big(2\pi f_1 t[n]\big) + \cos\big(2\pi f_2 t[n]\big) + 0.01\cos\big(2\pi f_3 t[n]\big) + 0.01\,\eta[n], \qquad n = 0, \dots, N-1 \qquad (2.30) $$
The time instances are t[n] = n/fs with fs = 100Hz. The length of the sequence is N = 100, which
corresponds to T = 1 s.
The first periodic component at f1 = 10Hz is fully periodic in the window. The second component
at f2 = 25.5Hz exhibits a discontinuity and will cause leakage. A third component at f3 = 40Hz is
periodic in the window but has a 100× smaller amplitude. Furthermore, the signal is corrupted by
white noise represented by a random vector 0.01η[n] ∈ [−0.01, 0.01].
The signal is multiplied with four different window functions and the DFT is computed. The
amplitudes of the single-sided spectra are shown in figure 2.13. The four window functions are:
1. Rectangular window. The 10 Hz component is represented by a single peak. The 25.5 Hz
component however causes a severe amount of leakage, completely masking the third component
at 40 Hz. Clearly, the rectangular window is a bad choice for signals with a so-called high dynamic
range.
2. Hanning window. The Hanning window is able to reveal all periodic components, although it
may be difficult to exactly determine the frequencies from the spectrum. The amplitudes of the
peaks are a factor 0.5 (6 dB) lower than the true amplitudes. The remaining 6 dB is spread
over the neighbouring frequency bins. Also, the window offers enough leakage suppression to
reveal the noise floor at approximately −80 dB.
3. Gaussian window with σ = 0.2. This window yields results similar to the Hanning window.
All periodic components appear as peaks with a quadratic shape, regardless of being periodic
in the window or not.
4. Gaussian window with σ = 0.1. The window offers a very poor frequency resolution, but
excellent leakage and noise suppression. If the aim is to estimate the spectral location of
the periodic components, this window may still be a good choice since the frequencies can be
estimated accurately using quadratic interpolation.
Figure 2.13: Four different window functions applied to a simple signal. Panels: (a) Rectangular window, (b) Hanning window, (c) Gaussian window σ = 0.2, (d) Gaussian window σ = 0.1; amplitude in dB versus frequency in Hz.
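The comparison between the rectangular and Hanning window in this example can be reproduced with a few lines of code. The sketch below uses only the Python standard library and evaluates the DFT directly. Two choices are assumptions of this illustration and not taken from the text: the noise term of equation (2.30) is omitted to keep the result deterministic, and the spectra are divided by the sum of the window so that a fully periodic component shows its true amplitude.

```python
import cmath
import math

def amplitude_spectrum(y, w):
    """Single-sided amplitude spectrum of the windowed signal, compensated for the
    coherent gain (bins 0 and N/2 are not treated separately in this sketch)."""
    N = len(y)
    x = [wn * yn for wn, yn in zip(w, y)]
    gain = sum(w)
    return [2 * abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))) / gain
            for k in range(N // 2 + 1)]

fs, N = 100.0, 100                      # bin spacing is fs/N = 1 Hz
t = [n / fs for n in range(N)]
f1, f2, f3 = 10.0, 25.5, 40.0
y = [math.cos(2 * math.pi * f1 * tn) + math.cos(2 * math.pi * f2 * tn)
     + 0.01 * math.cos(2 * math.pi * f3 * tn) for tn in t]   # noise term omitted

w_rect = [1.0] * N
w_hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

spec_rect = amplitude_spectrum(y, w_rect)
spec_hann = amplitude_spectrum(y, w_hann)

for k in (10, 33, 39, 40, 41):
    print(f"{k:2d} Hz  rect = {spec_rect[k]:.5f}  hann = {spec_hann[k]:.5f}")
```

With the rectangular window, the leakage of the 25.5 Hz component at bins far from any signal frequency (for instance 33 Hz) exceeds the 0.01 amplitude of the third component. With the Hanning window the leakage drops well below that level and the 40 Hz bin stands out from its neighbours.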
2.6 Time-frequency uncertainty principle
Throughout the chapter, it has become clear that temporal accuracy is inversely related with spectral
accuracy. Recall from example 1 that the time-domain basis vectors have excellent time localisation
but give no frequency information, while the Fourier basis vectors offer perfect frequency localisation
without time information. Frequency localisation means the ability to clearly identify periodic
components that are concentrated at particular frequencies [16].
In effect, by applying a non-rectangular window to a signal, one centres the observation at t = 0
and accepts that events “further away” from this centre are attenuated more than events close to the
centre. Thereby, windowing introduces a certain amount of temporal localisation.
To conclude the chapter, this fundamental trade-off is formalised as the time-frequency uncertainty
principle.
2.6.1 Temporal and spectral localisation
Let us once again consider a window w[n] of length N with time t[n] given by (2.17). Using the
definition for the total noise power (2.20), the temporal and spectral centres can be found by:
$$ \mu_t = \frac{1}{\mathrm{NP}} \sum_{n=0}^{N-1} t[n]\,\big|w[n]\big|^2 \qquad (2.31a) $$
$$ \mu_f = \frac{1}{\mathrm{NP}} \int_{-f_s/2}^{f_s/2} f\,|w(f)|^2\,df \qquad (2.31b) $$
Next, define the second central moments of the temporal and spectral distribution:
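$$ \sigma_t^2 = \frac{1}{\mathrm{NP}} \sum_{n=0}^{N-1} \big(t[n] - \mu_t\big)^2\,\big|w[n]\big|^2 \qquad (2.32a) $$
$$ \sigma_f^2 = \frac{1}{\mathrm{NP}} \int_{-f_s/2}^{f_s/2} (f - \mu_f)^2\,|w(f)|^2\,df \qquad (2.32b) $$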
Note that σt² and σf² are the variances, i.e. the squared standard deviations, of the distributions
formed by the squared magnitudes of respectively w[n] and w(f).
If the window is well-localised in time, then the signal is concentrated around time instance μt and σt² is
small. Similarly, if the window is well-localised in frequency, then the spectrum is centred around μf
and σf² is small.
It can be verified that σt² and σf² are invariant under time and frequency shifts, by the definition of the
temporal and spectral centres (2.31a) and (2.31b). The relation for time-scaling by factor a reads:
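$$ \sigma_t^2\big(a\,t[n]\big) = \frac{1}{|a|}\,\sigma_t^2\big(t[n]\big), \qquad a \in \mathbb{R} \qquad (2.33a) $$
$$ \sigma_f^2\big(a\,t[n]\big) = |a|\,\sigma_f^2\big(t[n]\big), \qquad a \in \mathbb{R} \qquad (2.33b) $$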
This means that a decrease of the window length in time increases the spectral width proportionally.
2.6.2 Time-frequency product
The dimensionless time-frequency product is given by:
U = σtσω = 2π σtσf (2.34)
Obviously, U is invariant under time and frequency shifts. By equation (2.33a) and (2.33b), the product
is also invariant under time scaling. The product can therefore be interpreted as a measure of how
well the window is localised in both time and frequency: a low value of U correlates with a good
localisation.
2.6.3 Uncertainty principle
The time-frequency uncertainty principle states that no window or waveform can have a
time-frequency product less than 1/2, frequency being expressed in radians per second:
$$ U = 2\pi\,\sigma_t\sigma_f \ge \frac{1}{2} \qquad (2.35) $$
The proof follows from the Cauchy-Schwarz inequality. It is related to Heisenberg's uncertainty
principle in quantum physics, which states that one cannot at the same time determine the position
and momentum of a particle to an arbitrary degree of accuracy. Similarly for time-frequency analysis,
the simultaneous accuracy of time and frequency localisation is limited by (2.35).
2.6.4 Example 5: Time-frequency product of four windows
The time-frequency products of the four windows of example 4 are determined. As stated above, the
lower limit of U is 0.5. The windows are shown in figures 2.9, 2.10 and 2.11.
1. Rectangular window. The temporal spread σt of the window is exactly (1/12)^(1/2) = 0.29 s.
The spectral spread σf of the frequency spectrum is 8.38 Hz. The time-frequency product
is 15.200, which is far above the lower limit.
2. Hanning window. σt = 0.18 s and σf = 0.57Hz. The time-frequency product is 0.513, which
is close to the minimum.
3. Gaussian window with σt = 0.2. The second central moments are σt = 0.14 s and σf =
0.83Hz. The time-frequency product is 0.742. Although the window is a Gaussian, the lower
limit of 0.5 is not reached since the Gaussian is truncated, which causes a large discontinuity.
4. Gaussian window with σt = 0.1. The second central moments are σt = 0.07 s and
σf = 1.13Hz. The time-frequency product is 0.500. The window achieves the lower limit
for uncertainty. Compared to the Gaussian window with σt = 0.2, the truncation causes a
negligible discontinuity.
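The time-frequency product can be approximated numerically from the definitions (2.31) to (2.34). The sketch below uses only the Python standard library. It assumes that the normalisation constant in (2.31) and (2.32) is the total of the squared magnitudes, replaces the frequency integral by a midpoint sum over −fs/2 < f < fs/2, and uses a centred time axis; these choices and the function name are assumptions of this illustration. Values for windows with discontinuous edges depend noticeably on such discretisation details and need not reproduce the numbers quoted above.

```python
import cmath
import math

def time_frequency_product(w, t, fs, n_freq=1000):
    """U = 2*pi*sigma_t*sigma_f, with sigma_f from a midpoint sum over the DTFT."""
    power = [abs(v) ** 2 for v in w]
    total = sum(power)
    mu_t = sum(tn * p for tn, p in zip(t, power)) / total
    var_t = sum((tn - mu_t) ** 2 * p for tn, p in zip(t, power)) / total

    freqs = [(-0.5 + (k + 0.5) / n_freq) * fs for k in range(n_freq)]
    spec = [abs(sum(wn * cmath.exp(-2j * math.pi * f * tn) for wn, tn in zip(w, t))) ** 2
            for f in freqs]
    s_total = sum(spec)
    mu_f = sum(f * s for f, s in zip(freqs, spec)) / s_total
    var_f = sum((f - mu_f) ** 2 * s for f, s in zip(freqs, spec)) / s_total
    return 2 * math.pi * math.sqrt(var_t * var_f)

fs, N = 100.0, 100
t = [(n - N / 2) / fs for n in range(N)]             # assumed centred time axis

w_gauss = [math.exp(-0.5 * (tn / 0.1) ** 2) for tn in t]
w_rect = [1.0] * N

U_gauss = time_frequency_product(w_gauss, t, fs)
U_rect = time_frequency_product(w_rect, t, fs)
print("Gaussian (sigma = 0.1):", U_gauss)
print("rectangular           :", U_rect)
```

For the Gaussian window with σ = 0.1 the result is very close to the lower limit of 0.5, while the rectangular window ends up far above it.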
Concluding, the Hanning window appears to be a good all-round window. For specific purposes, the
Gaussian window or cosine-sigma may be preferred, especially if one needs control over temporal
and spectral localisation.
2.7 Summary
A frequency-domain representation provides spectral information of a signal or function, in contrast
to the time-domain representation that only provides temporal information. The study of signals in
terms of frequency content is called Fourier analysis. The fundamental concept is the observation
that periodic signals can be decomposed into harmonic sinusoids, the so-called Fourier series.
The continuous-time and discrete-time Fourier transform generalise this concept to non-periodic
signals and provide a continuous frequency domain. The discrete Fourier transform only examines
the frequencies that have complete periods over the signal length. As these waves are orthogonal, the
DFT functions as a linear transformation.
The relationship between the four Fourier transformations and their domains is depicted schemat-
ically in figure 2.14. The continuous / discrete time domain is distinguished horizontally; the
continuous / discrete frequency domain vertically. Note that the Fourier series only fits in this scheme
if the window length T equals the fundamental period T0. In that case, the complex series ck are
directly related to y[k].
Figure 2.14: The four Fourier transformations and their domains. Schematic relating y(t), y[n], y(f) and y[k]: CTFT (R ⇒ R), DTFT (N ⇒ R), FS (R ⇒ Z), DFT (N ⇒ N).
Finite-length signals usually exhibit waves that are not fully periodic over the signal length. This
causes discontinuities on the boundaries, resulting in spectral leakage. Leakage can be reduced by
applying a window function. Generally, a window function suppresses the spectral leakage, at the
cost of a reduced spectral resolution. The window function should be chosen with care, depending on
the analysis purpose.
For every finite-length signal, one can determine the localisation accuracy in time and frequency.
By choosing the window function and length, both properties can be controlled. However, the
simultaneous temporal and spectral accuracy is limited by the time-frequency uncertainty principle.
The Gaussian window is the only window that potentially reaches the minimum time-frequency
product. The trade-off between time and frequency accuracy is a fundamental issue for time-
frequency analysis.
Part II
Advanced concepts
Chapter 3
Time Warping
Time warping is a technique that applies stretching and contraction of a signal in the time domain. It
involves a modification using a non-linear time function that realises frequency scaling. The concept
of a non-linear time may sound odd, but has some well-known equivalents in real life:
– A disk-jockey manually controlling the speed of a vinyl disc on a turn-table
– The observed frequency shift of the siren signal of a passing emergency vehicle (Doppler effect)
– The vibrato or chorus effect of a guitar, realised by a series of electronic delay circuits
In all cases, frequency scaling occurs due to a change in the way time is perceived. This perceived
time or warped time is denoted by t̃ and can be expressed as t̃ = φα(t) or t̃ = ψα(t), some non-linear
functions of t. The time-warped signal then reads ỹ(t) = y(t̃).
Note that the term frequency scaling is used rather than frequency modulation or shift. Frequency
modulation is achieved by multiplication with a complex wave. If this wave has frequency fm, the
frequency content of the modulated signal is shifted entirely with −fm or +fm, depending on the
sign in the exponent. This modulation is the basis of the Fourier transformations, as discussed in
chapter 2. In contrast, time warping scales all frequencies by a constant rate, keeping the ratios
between harmonics intact.
The concept of time warping is often applied for analysis of non-stationary chirp signals¹, i.e. signals
with increasing or decreasing frequency. It will be shown that time warping can be used effectively
as pre-processing operation for analysis of a more general class of non-stationary signals. It thereby
concentrates the energy of the partials on centre frequencies while preserving the harmonic structure.
3.1 Linear time warping
As an introduction to the time warping topic, the following sections discuss the theory of linear
time warping in the continuous time domain. The formulation of the linear time warping function is
adopted from [21] where it was mentioned as part of the Fan chirp transform (see chapter 6). This
section discusses linear time warping as an independent operation.
¹ The term chirp is adopted from the sound of chirping birds.
3.1.1 Warp function
Following the notation of [21] and [5], the linear time warping function² is defined as:
$$ \varphi_\alpha(t) = \left(1 + \tfrac{1}{2}\alpha t\right)t \qquad (3.1) $$
and the time derivative:
$$ \varphi'_\alpha(t) = 1 + \alpha t \qquad (3.2) $$
Note that the time warp function is a quadratic function of t for α ≠ 0, and simply the linear time
function for α = 0.
3.1.2 Chirp wave
Let us consider a linear chirp wave y(t) with mean frequency fc, subject to the warping function of
equation (3.1). The instantaneous frequency is defined as:
$$ f(t) = (1 + \alpha t)\,f_c = \varphi'_\alpha(t)\,f_c \qquad (3.3a) $$
and the wave with constant amplitude A reads:
$$ y(t) = A\cos\left(2\pi\int (1 + \alpha t)\,f_c\,dt\right) = A\cos\big(2\pi f_c\,\varphi_\alpha(t)\big) \qquad (3.3b) $$
fc is the instantaneous frequency at t = 0. Note that f(−1/α) = 0, meaning that the instantaneous
frequency has a focal point at t = −1/α, regardless of the value of fc. Beyond the focal point,
frequencies become negative, which in this case implies reversal of time. Since this is not the aim of
warping, the time domain is limited to −1/α < t < 1/α. In this interval, fc is the mean frequency.
3.1.3 Chirp rate
The chirp rate α is defined as the frequency increase relative to the mean frequency fc and reads:
$$ \alpha = \frac{f'(t)}{f_c} \qquad (3.4) $$
The derivative of (3.2) is only valid for a constant chirp rate α. For linear chirps, f′(t) is constant so
the chirp rate α is constant as well, which is easily verified from equation (3.3a):
$$ \alpha = \frac{\big(\varphi'_\alpha(t)\,f_c\big)'}{f_c} = \varphi''_\alpha(t) = \alpha $$
For higher order chirp waves, the time-derivative writes:
α = φ′′α(t) = α+O(t) (3.5)
indicating that the chirp rate is not constant. For linear time warping, the higher order terms must be
neglected. Non-linear time warping is discussed in section 3.2.
² Note that the warp function denoted by φα(t) is not related to the phase function ϕ(t), although the Greek letter (“phi”) may appear the same.
3.1.4 Inverse warp function
Linear chirp signals that satisfy f(t) = φ′α(t)fc can be transformed back to stationary signals by
applying inverse time-warping. In order to perform inverse time-warping, an inverse expression for
φα(t) must be found: t(φα) = φα⁻¹(t). This is not trivial since φα(t) is a quadratic expression whose
inverse may have two solutions. For the time domain −1/α < t < 1/α, it is verified that:
0 < φ′α(t) < 2 ∀α (3.6)
Since φ′α(t) has no sign change, φα⁻¹(t) is a strictly monotonically increasing function. Consequently, the
inverse has only one solution:
$$ \varphi_\alpha^{-1}(t) = \begin{cases} -\dfrac{1}{\alpha} + \dfrac{\sqrt{1 + 2\alpha t}}{\alpha} & \text{for } \alpha \neq 0 \\ t & \text{for } \alpha = 0 \end{cases} \qquad (3.7) $$
From here on, the inverse warp function is denoted by the symbol ψα(t) = φα⁻¹(t).
Note that √(1 + 2αt) becomes imaginary for t < −1/(2α). Also, the time derivative or warped-time
rate reads:
$$ \psi'_\alpha(t) = \frac{1}{\sqrt{1 + 2\alpha t}} \qquad (3.8) $$
The warped-time rate reaches infinity at t = −1/(2α), implying that the warped time would have to
run infinitely fast and then becomes imaginary. The applicable time domain is therefore reduced to:
$$ -\frac{1}{2\alpha} < t < \frac{1}{2\alpha} \qquad (3.9) $$
which is referred to as the time support property of the warp function [21]. Figure 3.1 shows the
warp function (green) and inverse warp function (red) for α = 0.5 and −1 < t < 1, the maximum
allowable warp rate. It can be observed that the derivative of the inverse warp function reaches
infinity at t = −1 = −1/(2α).
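The warp function (3.1), its inverse (3.7) and the warped-time rate (3.8) translate directly into code. The following standard-library sketch (function names are hypothetical) checks numerically that ψα undoes φα within the time support (3.9), here for α = 0.25 and −1 ≤ t ≤ 1.

```python
import math

def phi(t, alpha):
    """Linear warp function, equation (3.1)."""
    return (1 + 0.5 * alpha * t) * t

def psi(t, alpha):
    """Inverse warp function, equation (3.7); requires 1 + 2*alpha*t >= 0."""
    if alpha == 0:
        return t
    return (-1 + math.sqrt(1 + 2 * alpha * t)) / alpha

def psi_rate(t, alpha):
    """Warped-time rate, equation (3.8); unbounded as t approaches -1/(2*alpha)."""
    return 1 / math.sqrt(1 + 2 * alpha * t)

alpha = 0.25                                   # time support: -2 < t < 2
ts = [-1 + 0.1 * k for k in range(21)]
roundtrip_error = max(abs(phi(psi(t, alpha), alpha) - t) for t in ts)

print("max |phi(psi(t)) - t| :", roundtrip_error)
print("psi(-1), psi(1)       :", psi(-1.0, alpha), psi(1.0, alpha))
print("psi_rate(-1.9)        :", psi_rate(-1.9, alpha))
```

The round-trip error stays at rounding level, and the warped-time rate grows quickly when t approaches the lower limit of the time support.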
3.1.5 Inverse time warping
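A time-warped version ỹ(t) of the linear chirp signal y(t) can now be obtained using the inverse warp
function:
$$ \tilde{y}(t) = y\big(\psi_\alpha(t)\big) = A\cos\Big(2\pi f_c\,\varphi_\alpha\big(\psi_\alpha(t)\big)\Big) \qquad (3.10) $$
Development of the interior time-dependent term yields:
$$ \varphi_\alpha\big(\psi_\alpha(t)\big) = -\frac{1}{\alpha} + \frac{\sqrt{1 + 2\alpha t}}{\alpha} + \frac{1}{2}\alpha\left(\frac{1}{\alpha^2} + \frac{1 + 2\alpha t}{\alpha^2} - \frac{2\sqrt{1 + 2\alpha t}}{\alpha^2}\right) = t \qquad (3.11) $$
which shows that the linear chirp signal y(t) after time warping has become a stationary signal
with constant frequency fc:
$$ \tilde{y}(t) = A\cos(2\pi f_c\,t), \qquad -\frac{1}{2\alpha} < t < \frac{1}{2\alpha} \qquad (3.12) $$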
Figure 3.1: The warp and inverse warp functions. Axes: time versus warped time; curves: warped time φα(t) and inverse warped time ψα(t).
It can be concluded that any linear chirp signal can be transformed into a constant-frequency signal
by applying linear time-warping, as long as the time support −1/2α < t < 1/2α is satisfied. This is
illustrated by the following example.
3.1.6 Example 6: Linear warp
An example is shown in figure 3.2 for both the time and frequency domain. The chirp signal is
composed of a cosine wave plus a sine wave with half the amplitude and twice the frequency:
$$ f(t) = 4 + t $$
$$ \phi(t) = 2\pi\left(4t + \tfrac{1}{2}t^2\right) $$
$$ y(t) = \tfrac{1}{2}\cos\big(\phi(t)\big) + \tfrac{1}{4}\sin\big(2\phi(t)\big) $$
The mean frequency is fc = 4 Hz; the chirp rate is α = 0.25. The time interval of interest is
t = (−1, 1), which lies safely in the supported time domain. The warped time interval is found from
(3.7):
$$ \tilde{t} = \psi_\alpha(t) = \left(-4 + 2\sqrt{2},\; -4 + 2\sqrt{6}\right) = (-1.1716,\; 0.8990) $$
The original signal y(t) is shown in green. The warped signal ỹ(t) = y(t̃) is shown in red.
Clearly, ỹ now has a constant frequency and will appear as two distinct peaks in a Fourier
representation (provided that leakage is absent by proper choice of the window length, see section
2.5). This will be illustrated in example 7.
Figure 3.2: A warped signal in time and frequency domain. Panels: time domain (amplitude versus time in s) and frequency domain (frequency in Hz versus time in s).
3.2 Non-linear time warping
The previous discussion merely applied to linear time warping functions with constant chirp rate α.
It was seen that the inverse function of φα(t) only exists if φα(t) is strictly monotonically increasing,
i.e. φ′α(t) > 0. However, that does not limit the applicability to linear warp functions.
Consider the instantaneous frequency f(t) > 0 on the time domain −T < t < T . Let us define
fc = f(0). According to (3.3a), the time-derivative of the warp function writes:
$$ \varphi'(t) = \frac{f(t)}{f_c} \qquad (3.13) $$
such that the non-linear warp function can be found by integration:
$$ \varphi(t) = \int \varphi'(t)\,dt \qquad (3.14) $$
A “quick and dirty” inverse warp function can be found by time-integration of the inverse of the warp
rate (3.13):
$$ \psi(t) \approx \int \frac{1}{\varphi'(t)}\,dt = \int \frac{f(0)}{f(t)}\,dt \qquad (3.15) $$
The basic idea is that the inverse warp rate ψ′(t) for some point t should be roughly 1/φ′(t) to stretch
out frequency modulations in f(t). An approximation of the inverse warp function is then found by
time-integration, which can be performed either numerically or symbolically. Note that it may be
necessary to bias ψ(t) by some value, to make sure that ψ(0) = 0.
Equation (3.15) applied to the linear chirp function yields:
$$ \psi(t) \approx \int \frac{1}{1 + \alpha t}\,dt = \frac{\log(1 + \alpha t)}{\alpha} $$
For small chirp rates, (3.15) yields a good approximation of the exact inverse warp function (3.7).
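The numerical variant of equation (3.15) can be sketched as a cumulative trapezoidal integration of f(0)/f(t), followed by a shift that enforces ψ(0) = 0. The standard-library example below applies it to a linear chirp with a small chirp rate and compares the result with the logarithmic expression above and with the exact inverse (3.7). The function name, the grid and the value α = 0.05 are assumptions chosen for this illustration.

```python
import math

def approx_inverse_warp(freq, t):
    """Cumulative trapezoidal integral of f(0)/f(t), biased such that psi(0) = 0."""
    f0 = freq(0.0)
    rate = [f0 / freq(tn) for tn in t]
    psi = [0.0]
    for i in range(1, len(t)):
        psi.append(psi[-1] + 0.5 * (rate[i] + rate[i - 1]) * (t[i] - t[i - 1]))
    i0 = min(range(len(t)), key=lambda i: abs(t[i]))   # grid point closest to t = 0
    return [p - psi[i0] for p in psi]

alpha, fc = 0.05, 4.0
t = [-1 + k / 1000 for k in range(2001)]               # -1 <= t <= 1, contains t = 0

psi_num = approx_inverse_warp(lambda tt: (1 + alpha * tt) * fc, t)
psi_log = [math.log(1 + alpha * tn) / alpha for tn in t]
psi_exact = [(-1 + math.sqrt(1 + 2 * alpha * tn)) / alpha for tn in t]

err_integration = max(abs(a - b) for a, b in zip(psi_num, psi_log))
err_approximation = max(abs(a - b) for a, b in zip(psi_log, psi_exact))
print("numerical integration vs. log expression:", err_integration)
print("log expression vs. exact inverse (3.7)  :", err_approximation)
```

The first number reflects only the integration error; the second shows the size of the approximation itself, which grows with the chirp rate and with the distance from t = 0.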
3.3 Discrete implementation & interpolation
The example above describes an analytical signal in a rather ideal situation, since every time
point ψα(t) is available from the function. As discrete signals are sampled at a finite number of
instances t[n], n = 1, . . . , N, the instances ψα(t[n]) generally do not correspond to existing samples.
Hence, non-uniform interpolation is required to obtain ỹ for the instances t[n]. The quality of
the interpolation process is crucial: a badly performed interpolation increases the spectral leakage
dramatically. The issue of interpolation is addressed in the following sections.
3.3.1 Interpolation approaches
Two interpolation approaches are distinguished:
1. Upsampling — cheap interpolation
2. Expensive interpolation
Method 1 first samples the signal up by a factor r to increase the number of time points. Next, a
numerically cheap interpolation algorithm is employed to obtain the samples for t[n]. Cheap methods
include nearest neighbour (0th order) and linear (1st order) interpolation, or polynomial methods of
low degree e.g. cubic spline interpolation. Since resampling can be performed quickly, this method
does not necessarily take more time than the second approach. Reference [5] suggests upsampling by
a factor 2 and linear interpolation.
Method 2 skips the resampling step and performs the interpolation right away. To obtain similar
or better results than the first approach, a numerically more expensive interpolation method should
be employed. Theoretically, a sinc interpolation achieves the best result: according to the
Nyquist-Shannon sampling theorem, a band-limited signal can be reconstructed exactly from its samples by:
$$ y(t) = \sum_{n=1}^{N} y[n]\,\operatorname{sinc}\big(f_s(t - t[n])\big) \qquad (3.16) $$
The sinc function is defined as:
$$ \operatorname{sinc} x = \frac{\sin(\pi x)}{\pi x} $$
For every point t, the complete sequence is evaluated and weighted by the sinc function. This
implementation is computationally far too expensive and thus seldom used. Luckily, similar results
can be obtained using higher order spline interpolation [15].
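For reference, equation (3.16) can be written down almost literally. The sketch below uses only the Python standard library; the test signal (a 5 Hz cosine sampled at 100 Hz) and the function names are assumptions for this illustration. Note that the sum runs over a finite record, so the reconstruction between samples is only approximate, in particular near the ends of the record.

```python
import math

def sinc(x):
    """Normalised sinc function, sin(pi*x)/(pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def sinc_interpolate(y, t_samples, fs, t_new):
    """Equation (3.16): every output point is a sinc-weighted sum over all samples."""
    return [sum(yn * sinc(fs * (t - tn)) for yn, tn in zip(y, t_samples))
            for t in t_new]

fs, N, f0 = 100.0, 400, 5.0
t = [n / fs for n in range(N)]
y = [math.cos(2 * math.pi * f0 * tn) for tn in t]

# At the sample instants the samples themselves are returned
y_grid = sinc_interpolate(y, t, fs, t[:5])
err_grid = max(abs(a - b) for a, b in zip(y_grid, y[:5]))

# Between samples, near the centre of the record
t_mid = [2.005, 2.015, 2.025]
y_mid = sinc_interpolate(y, t, fs, t_mid)
err_mid = max(abs(a - math.cos(2 * math.pi * f0 * tm)) for a, tm in zip(y_mid, t_mid))

print("error at sample instants:", err_grid)
print("error between samples   :", err_mid)
```

The cost is N multiplications per output point, which illustrates why this direct form is rarely used for long signals.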
3.3.2 Spline interpolation
The B-spline or basis spline interpolation method generalises a large class of interpolation techniques.
The interpolation algorithm for order q writes:
$$ y(t) = \sum_{n=1}^{N} c[n]\,\beta_q\big(d[n]\big) \qquad (3.17a) $$
$$ d[n] = f_s\big(t - t[n]\big) \qquad (3.17b) $$
Figure 3.3: Spline basis functions of order 0 to 8, obtained by convolution of the 0th order function. Axes: deviation versus weighting.
d[n] is the sample deviation of t to every point t[n], for instance: a value of 0.5 corresponds to a time
instance exactly in the middle of two existing time points in t[n]. βq(d) is the basis function of order
q as a function of the deviation d. The basis functions for 0th to 8th order are shown in figure 3.3. They
are the result of repetitive convolution of the 0th order basis function:
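$$ \beta_q(d) = \int_{-\infty}^{\infty} \beta_{q-1}(\delta)\,\beta_0(d - \delta)\,d\delta \qquad (3.18a) $$
with
$$ \beta_0(d) = \begin{cases} 1 & \text{for } |d| \le 0.5 \\ 0 & \text{for } |d| > 0.5 \end{cases} \qquad (3.18b) $$
The coefficients c[n] are found by solving the linear system of N equations that results from
evaluating (3.17a) at the sample instances t[m]:
$$ y[m] = \sum_{n=1}^{N} c[n]\,\beta_q\big(f_s(t[m] - t[n])\big), \qquad m = 1, \dots, N \qquad (3.19) $$
Looking at the basis functions, it can be observed that the function βq is non-zero only over a
limited width of q + 1 samples, which is a direct result of the convolution. Therefore both (3.17a)
and (3.19) can be performed by sparse linear arithmetic [15].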
It may be observed that spline interpolation for order q = 0, 1, 2, 3 is equivalent to respectively
nearest neighbour, linear, quadratic and cubic interpolation.
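The basis functions can also be evaluated pointwise without carrying out the convolution integral. The sketch below does not implement (3.18a) directly; instead it uses the standard recurrence for centred B-splines, which yields the same functions. The 0th order function is taken on the half-open interval −0.5 ≤ d < 0.5, a small deviation from (3.18b) that avoids counting the edge twice. The function name is hypothetical.

```python
def bspline_basis(q, d):
    """Centred B-spline basis function of order q at deviation d (recurrence form)."""
    if q == 0:
        # half-open box so that neighbouring boxes do not overlap at |d| = 0.5
        return 1.0 if -0.5 <= d < 0.5 else 0.0
    half = (q + 1) / 2
    return ((d + half) * bspline_basis(q - 1, d + 0.5)
            + (half - d) * bspline_basis(q - 1, d - 0.5)) / q

# Linear (q = 1) basis: a triangle with peak 1 at d = 0
print([bspline_basis(1, d) for d in (-1.0, -0.5, 0.0, 0.25, 0.5, 1.0)])

# Cubic (q = 3) basis: value at the centre and at the neighbouring samples
print([bspline_basis(3, d) for d in (-1.0, 0.0, 1.0)])

# Shifted copies of the basis function sum to one (partition of unity)
unity = sum(bspline_basis(3, 0.3 - k) for k in range(-3, 4))
print(unity)
```

For q = 1 this reproduces the weights of linear interpolation. For q = 3 the centre value is 2/3 and the neighbouring samples receive 1/6 each, which is why the coefficients c[n] generally differ from the samples y[n] and must be solved from (3.19).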
3.3.3 Interpolation performance
To test the accuracy of the interpolation methods, 4 different test signals yref[n], −2 < t[n] < 2 are
considered at a sample rate of fs = 1024Hz:
1. Cosine wave with f0 = 10 Hz
2. Cosine wave with f0 = 100 Hz
3. Block wave with f0 = 10 Hz
4. White noise band-limited to fs/4 = 256 Hz
Figure 3.4: Interpolation errors. Blue, green and red correspond to respectively nearest neighbour, linear and spline interpolation. The x-axis represents resampling ratio r for the nearest neighbour and linear interpolation methods and interpolation order q for the spline interpolation. Panels: (a) Cosine wave with f0 = 10 Hz, (b) Cosine wave with f0 = 100 Hz, (c) Block wave with f0 = 10 Hz, (d) White noise band-limited to fs/4 = 256 Hz; relative error versus resampling ratio (r) or interpolation order (q).
All signals are first warped using equation (3.1) to yw1[n] and then warped back to yw2[n] according
to (3.7). The chirp rate is chosen α = 0.25 for all signals. For evaluation of yw2[n], the time domain
−1 ≤ t < 1 s is considered. The error is determined as the norm of the difference between the original
signal yref[n] and warped signal yw2[n]:
$$ \varepsilon_t = \frac{\big\| y_{w2}[n] - y_{\mathrm{ref}}[n] \big\|}{\big\| y_{\mathrm{ref}}[n] \big\|} $$
The latter is obtained from the DFT of the time signals.
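The warp and warp-back test can be sketched for the simplest case: linear interpolation without upsampling, applied to the 10 Hz cosine. The code below uses only the Python standard library and is an illustration under stated assumptions, not the implementation used for figure 3.4: the helper names are hypothetical, samples requested outside the record are held at the edge value, and the Euclidean norm is used for the relative error.

```python
import bisect
import math

def phi(t, alpha):
    """Warp function, equation (3.1)."""
    return (1 + 0.5 * alpha * t) * t

def psi(t, alpha):
    """Inverse warp function, equation (3.7)."""
    return (-1 + math.sqrt(1 + 2 * alpha * t)) / alpha

def linear_interp(t_new, t, y):
    """First-order interpolation of (t, y) at t_new; edge values are held outside the grid."""
    out = []
    for x in t_new:
        i = min(max(bisect.bisect_right(t, x) - 1, 0), len(t) - 2)
        frac = min(max((x - t[i]) / (t[i + 1] - t[i]), 0.0), 1.0)
        out.append(y[i] + frac * (y[i + 1] - y[i]))
    return out

fs, alpha, f0 = 1024.0, 0.25, 10.0
t = [-2 + n / fs for n in range(4096)]                      # -2 <= t < 2
y_ref = [math.cos(2 * math.pi * f0 * tn) for tn in t]

# Warp: y_w1(t) = y_ref(phi(t)); warp back: y_w2(t) = y_w1(psi(t)) = y_ref(t)
y_w1 = linear_interp([phi(tn, alpha) for tn in t], t, y_ref)
idx = [n for n, tn in enumerate(t) if -1 <= tn < 1]         # evaluation interval
y_w2 = linear_interp([psi(t[n], alpha) for n in idx], t, y_w1)

num = math.sqrt(sum((a - y_ref[n]) ** 2 for a, n in zip(y_w2, idx)))
den = math.sqrt(sum(y_ref[n] ** 2 for n in idx))
rel_error = num / den
print("relative round-trip error:", rel_error)
```

Restricting the evaluation to −1 ≤ t < 1 keeps all requested instances inside the record for α = 0.25, so the edge handling does not influence the result.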
The results are shown in figure 3.4. Blue, green and red correspond to respectively nearest neighbour,
linear and spline interpolation. For the nearest neighbour and linear interpolation methods, the x-axis
indicates the ratio of resampling r. For the spline interpolation, the x-axis corresponds to the order
of the spline algorithm q.
Some observations:
– The nearest neighbour interpolation performs worst, followed by the linear interpolation. The
spline interpolation with q > 3 outperforms both methods. However, the spline method takes
considerably more time to compute, since it first needs to solve the system of equation (3.19)
and then computes the desired points by (3.17a).
– The nearest neighbour interpolation errors decrease with approximately 1/r. The linear
interpolation errors decrease in some cases with approximately 1/r2 and in some cases remain
equal.
– The spline interpolation errors for q = 1 equal the linear interpolation errors. This confirms
that the methods are the same for q = 1 without resampling. The quadratic spline interpolation
for q = 2 appears to be a bad choice.
– It was observed that the interpolation methods that use resampling (nearest neighbour and
linear interpolation) suffer from a small time shift. This might explain the high error compared
to spline interpolation.
– All methods have difficulties with warping of the block wave. The block wave can be considered
a worst-case signal, as it comprises large discontinuities that are difficult to fit with any of the
interpolation methods. Also, since the frequency content of the block wave extends to fs/2,
aliasing occurs due to warping. Resampling or the use of higher order spline methods do not
yield any improvements.
– The band-limited noise signal can be warped to reasonable accuracy. If the noise was not band-
limited, the results would be almost as bad as for the block wave.
Generally, it can be concluded that spline interpolation is a good choice for a wide range of signals. A
basic implementation of multi-order spline interpolation is included in the Curve Fitting Toolbox™
of Matlab®. In addition, some efficient implementations are available under the BSD licence on the
File Exchange of the MathWorks® website.
A Matlab® function was written, implementing all interpolation algorithms that were
discussed in this section. See appendix B.2 for the syntax and implementation.
3.3.4 Example 7: Linear warp DFT
To conclude this chapter, the DFT of the warped signal of example 6 is considered. The signal is
sampled at fs = 100 Hz and is composed of a cosine wave with A = 0.5 and fc = 4 Hz, plus a sine
wave with A = 0.25 and fc = 8 Hz. The chirp rate is α = 0.25. The DFT of the original chirp signal
(green) and the warped signal (red) for −1 ≤ t < 1 are shown in figure 3.5; a Hanning window is
applied.
The DFT of the original signal is spread out over the frequency range, which makes it difficult to
determine the centre frequency. The DFT of the warped signal shows two distinct peaks (with side
lobes caused by the Hanning window) at 4 and 8 Hz with the right magnitude.
Figure 3.5: DFT of original and warped signal. [Plot of amplitude against frequency [Hz], 0 to 20 Hz; legend: original signal, warped signal.]
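The experiment of example 7 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Matlab implementation of appendix B.2: the linear warp function is assumed to have the form φα(t) = (1 + ½αt)t, the interpolation is plain linear interpolation without resampling, the Hanning window is taken in its periodic form, and the signal is assumed to be available on a slightly wider interval than the analysed block. All function names are hypothetical.

```python
import cmath
import math

ALPHA = 0.25  # chirp rate alpha [1/s]
FS = 100.0    # sample rate [Hz]

def phi(t):
    """Assumed linear warp function: phi(t) = (1 + alpha*t/2) * t."""
    return (1.0 + 0.5 * ALPHA * t) * t

def psi(tau):
    """Closed-form inverse of phi, valid for 1 + 2*alpha*tau > 0."""
    return (-1.0 + math.sqrt(1.0 + 2.0 * ALPHA * tau)) / ALPHA

def chirp(t):
    """Two chirping partials with centre frequencies 4 Hz and 8 Hz."""
    return (0.5 * math.cos(2 * math.pi * 4 * phi(t))
            + 0.25 * math.sin(2 * math.pi * 8 * phi(t)))

# Sample on -1.5 <= t < 1.5 so that psi(tau) never leaves the data.
T0 = -1.5
samples = [chirp(T0 + n / FS) for n in range(300)]

def interp_linear(t):
    """Linear interpolation of the sampled signal at time t."""
    x = (t - T0) * FS
    i = int(math.floor(x))
    frac = x - i
    return (1.0 - frac) * samples[i] + frac * samples[i + 1]

def hann_dft_amplitude(block, f):
    """Two-sided DFT amplitude at frequency f with a periodic Hann window."""
    M = len(block)
    acc = 0j
    for m, v in enumerate(block):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * m / M)
        acc += w * v * cmath.exp(-2j * math.pi * f * m / FS)
    return abs(acc) / M

tau = [-1.0 + m / FS for m in range(200)]       # uniform warped-time axis
original = samples[50:250]                      # block for -1 <= t < 1
warped = [interp_linear(psi(t)) for t in tau]   # signal evaluated at psi(tau)

for f in (4.0, 8.0):
    print(f, hann_dft_amplitude(original, f), hann_dft_amplitude(warped, f))
```

For both centre frequencies the warped block returns the larger amplitude, close to half the wave amplitude times the coherent gain of the window, whereas the energy of the original block is spread over neighbouring frequencies.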
3.4 Summary
Time warping can be applied to (pieces of) a non-stationary signal in the time domain. The aim is to
flatten out frequency variations by stretching and contraction of the signal. The relative rate of frequency
variation is called the chirp rate. Time warping can theoretically warp chirping signals into constant-
frequency signals, as long as certain conditions are satisfied.
Time warping requires an inverse of the warp function. In case of linear warping, the inverse function
exists in closed-form. In case of non-linear warping however, it can be challenging to find an inverse
expression. If it does not exist in closed form, the inverse may be approximated either by linearisation
of the warp function, or by time-integration of the inverse warp rate.
A warped signal can be obtained from a discrete-time signal by non-uniform interpolation. One
approach first resamples the signal to a higher sample rate and then applies a cheap interpolation
algorithm, such as nearest neighbour or linear interpolation. A second approach applies a higher
order spline algorithm on the original signal. The second approach generally yields better results
than the first, although it is computationally a bit more demanding.
Time warping proves to be a useful technique in detecting non-stationary tonal components, since
it concentrates the energy of the partials on the centre frequencies while preserving the harmonic
structure.
Chapter 4
Timbre
The term timbre is used often, but it is defined ambiguously. It is mostly used in a perceptual
context, saying something about the quality of tonal sound. It certainly has nothing to do with
pitch or loudness. It may be related to the presence of the harmonic partials or the amount of vibrato
(amplitude and frequency modulation), but more often it is expressed in qualitative terms like bright,
harsh, nasal, sonorous, etc. In any case, timbre helps the perceiver in determining the source of the sound.
For example, a singer and a saxophone may produce the same note at the same loudness, but due
to timbral differences, one is able to distinguish the two sources.
Many interesting psycho-acoustic studies focus on perceptual attributes of timbre, which are beyond
the scope of this thesis. Rather than qualitative descriptions, two quantitative properties can assist in
the identification of tonal components in a signal: harmonic relative amplitude and relative phase.
A novel method is proposed that uses both properties to investigate the stationarity of the harmonic
partials. For this purpose, the timbre is defined as the amplitude and phase shift of the partials relative
to the fundamental partial. As such, the timbre may characterise a tonal component independently of
time and can thus be employed to identify and track tonal components in a signal.
This method and the concept of timbre are introduced step by step in the following sections.
4.1 Definition of timbre
Let us consider a real-valued continuous-time signal y(t). The signal consists of a stationary tonal
component with fundamental frequency f0, built up from K stationary partials xk(t):
$$x_k(t) = c_k\, e^{i\varphi_k(t)}, \qquad k = \pm 1, \pm 2, \ldots, \pm K \tag{4.1}$$
The notation is analogous to the complex Fourier series, see section 2.2.2. For convenience and without
loss of generality, only the positive frequencies are considered, as ck and c−k are complex conjugate
pairs. The complex scalars ck determine both amplitude and phase shift of partial k:
$$A_k = \|c_k\| \tag{4.2a}$$
$$\theta_k = \angle c_k = \tan^{-1}\frac{\Im(c_k)}{\Re(c_k)} \tag{4.2b}$$
Since the signal is stationary, frequency and amplitude of the partials are constant. Hence, the phase
of partial xk(t) grows linearly:
$$\varphi_k(t) = 2\pi k f_0 t + \theta_k, \qquad k = 1, \ldots, K \tag{4.3}$$
with θk the phase shift of partial k at t = 0.
4.1.1 Normalised amplitude and phase
Let us now define the normalised amplitude and phase shift as quantities relative to the fundamental
partial k = 1:
$$A^n_k = \frac{A_k}{A_1} \tag{4.4a}$$
$$\theta^n_k = \theta_k - k\,\theta_1 \tag{4.4b}$$
The first equation is straight-forward. The latter is less obvious: it uses the fact that the phase of
partial xk(t) runs k times faster than partial x1(t), as seen in equation (1.17). Similarly, the normalised
phase function at time t is defined as:
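$$\begin{aligned}
\varphi^n_k(t) &= \varphi_k(t) - k\,\varphi_1(t) \\
&= 2\pi k f_0 t + \theta_k - k\,(2\pi f_0 t + \theta_1) \\
&= \theta_k - k\,\theta_1 \\
&= \theta^n_k
\end{aligned} \tag{4.5}$$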
which means that the normalised phase function has become the normalised phase shift: a time-
independent angle, relative to the phase shift of the fundamental. Consequently, θn1 = 0 and
ϕn1 (t) = 0 by definition.
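Figure 4.1 illustrates the phase normalisation for three partials. The fundamental frequency is 1 Hz.
The phase functions and normalised phase functions (in degrees, with t in seconds) are given by:
$$\begin{aligned}
\varphi_1(t) &= 110 + 360\,t, & \varphi^n_1(t) &= 110 + 360\,t - (110 + 360\,t) = 0^\circ \\
\varphi_2(t) &= 180 + 720\,t, & \varphi^n_2(t) &= 180 + 720\,t - 2\,(110 + 360\,t) = -40^\circ \\
\varphi_3(t) &= 30 + 1080\,t, & \varphi^n_3(t) &= 30 + 1080\,t - 3\,(110 + 360\,t) = -300^\circ
\end{aligned}$$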
It is observed that the normalised phase functions reduce to the constant normalised phase shifts:
ϕnk (t) = θnk .
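The three-partial example can be checked numerically. The sketch below assumes that the phase of partial k is written in degrees as θk + 360 k f0 t, with f0 = 1 Hz and the phase shifts 110°, 180° and 30° of the example; the function names are hypothetical.

```python
F0 = 1.0  # fundamental frequency [Hz]

def phase_deg(theta_deg, k, t):
    """Phase of partial k in degrees: theta_k + 360 * k * f0 * t."""
    return theta_deg + 360.0 * k * F0 * t

def normalised_phase_deg(theta_k, k, theta_1, t):
    """phi_k(t) - k * phi_1(t), as in equation (4.5)."""
    return phase_deg(theta_k, k, t) - k * phase_deg(theta_1, 1, t)

THETA = {1: 110.0, 2: 180.0, 3: 30.0}  # phase shifts of the partials [deg]

for t in (0.0, 0.25, 0.5):
    row = [normalised_phase_deg(THETA[k], k, THETA[1], t) for k in (1, 2, 3)]
    print(t, row)
```

Up to rounding, every row holds the same three angles (0°, −40° and −300°), independent of the time at which the phases are evaluated.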
4.1.2 Complex normalisation
The normalisation steps can be performed efficiently using the complex notation of (4.1). Multiplication
of ck with the complex scalar r^p is equivalent to scaling with ‖r^p‖ and rotation by p∠r. If r has
unit length, only the rotation is effective. This property can be used for normalisation of the partials:
$$c^n_k = \frac{c_k}{\|c_1\|}\left(\frac{c_1}{\|c_1\|}\right)^{-k} \tag{4.6}$$
Figure 4.1: Three phase functions (continuous lines) together with their normalised phase functions (dashed lines). [Plot of phase [deg] against time [s], 0 to 1 s; curves: φ1(t), φ2(t), φ3(t) and their normalised counterparts.]
The first term normalises the amplitude, the second term rotates with angle −k∠c1. Depending on
the purpose, one can choose to leave out the amplitude normalisation and only normalise the phases:
$$c^\theta_k = c_k\left(\frac{c_1}{\|c_1\|}\right)^{-k} \tag{4.7}$$
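Equations (4.6) and (4.7) translate almost literally into complex arithmetic. The following sketch uses Python's built-in complex type; the function names and the example amplitudes (2, 1 and 0.5) are assumptions made for illustration, while the phase shifts are those of figure 4.1.

```python
import cmath
import math

def normalise_partials(c, amplitude=True):
    """Normalise partials c = [c_1, ..., c_K] to the fundamental.

    amplitude=True follows equation (4.6); amplitude=False follows (4.7).
    """
    c1 = c[0]
    rotor = c1 / abs(c1)                    # unit-length scalar with the angle of c_1
    scale = abs(c1) if amplitude else 1.0
    return [ck / scale * rotor ** (-k) for k, ck in enumerate(c, start=1)]

def partial(amp, theta_deg):
    """Complex scalar with amplitude amp and phase shift theta_deg [deg]."""
    return cmath.rect(amp, math.radians(theta_deg))

c = [partial(2.0, 110.0), partial(1.0, 180.0), partial(0.5, 30.0)]
cn = normalise_partials(c)
for k, v in enumerate(cn, start=1):
    print(k, abs(v), math.degrees(cmath.phase(v)))
```

The printed angles are wrapped to the interval (−180°, 180°], so the normalised phase shift of −300° of the third partial appears as its equivalent of 60°.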
4.1.3 Timbre vector
The normalised partials from equation (4.6) can be combined to a K-dimensional complex timbre
vector:
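$$\mathbf{c}^n = \begin{pmatrix} c^n_1 \\ c^n_2 \\ \vdots \\ c^n_K \end{pmatrix}
\quad\text{or}\quad
\mathbf{c}^\theta = \begin{pmatrix} c^\theta_1 \\ c^\theta_2 \\ \vdots \\ c^\theta_K \end{pmatrix} \tag{4.8}$$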
In the ideal case of stationary partials, this vector remains constant and uniquely defines the timbre
of the tonal component.
4.2 Instantaneous timbre
The definition of timbre (4.5) applies to harmonically stationary signals. In other words, (4.5) only
holds when the frequencies of the partials are kept at an exact ratio throughout the time interval. It
will not come as a surprise that this concerns a rather hypothetical situation.
Consider for example the harmonics of a vibrating piano string. If the dynamics of the string
were fully governed by linear dynamics, one would find the harmonics at exact multiples of the
fundamental frequency. Due to their mutual orthogonality, there would be no interaction between
the harmonics, hence the timbre would remain stationary throughout a free vibration.
However, few dynamic systems behave perfectly linearly. The string, for example, may exhibit a
certain amount of inharmonicity: a small discrepancy between the actual harmonic frequencies and
their ideally expected values. This can for instance be caused by bending stiffness or inelasticity of
the material; effects that cannot be fully described by linear dynamics. As a result, the timbre is no
longer exactly stationary.
Thus, for real-life signals, it can be interesting to see to what extent the timbre remains stationary
over time. This can be studied on the basis of the instantaneous timbre (IT). For an arbitrary
non-stationary signal, the instantaneous timbre at time t can be interpreted as a “cross-section” of the
analytic signal at a certain instant in time. That is: the instantaneous amplitude and phase per
partial, normalised to the fundamental partial.
4.2.1 Definition
Consider a non-stationary signal y(t) consisting of a tonal component with instantaneous fundamen-
tal frequency f0(t). Then the instantaneous amplitude and phase of partial k are determined by:
$$c_k(t) = 2\,\frac{1}{T}\int_{-T/2}^{T/2} w(\tau)\, y(t+\tau)\, e^{-i 2\pi k f_0 \tau}\,\mathrm{d}\tau, \qquad k = 1, \ldots, K \tag{4.9}$$
Equation (4.9) is in fact a Fourier transform for a single frequency component fk = kf0, centred
around time instance t. The factor 2 compensates for the absence of the negative frequencies. T is the
length of the window. w(τ) is a window function that is symmetric about τ = 0 (see section 2.5). The
complex exponential is the modulator that shifts all frequency content of y(t) by −kf0.
Using equation (4.6) and (4.8), the instantaneous timbre reads:
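$$\mathbf{c}^n(t) = \begin{pmatrix} c^n_1(t) \\ c^n_2(t) \\ \vdots \\ c^n_K(t) \end{pmatrix}
\quad\text{or}\quad
\mathbf{c}^\theta(t) = \begin{pmatrix} c^\theta_1(t) \\ c^\theta_2(t) \\ \vdots \\ c^\theta_K(t) \end{pmatrix} \tag{4.10}$$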
4.2.2 Discrete implementation
The discrete-time implementation of (4.9) uses the DTFT to compute the partials. Consider the signal
y[n] of length N with corresponding time vector t[n] = n/fs, n = 0, . . . , N−1. In order to obtain the
instantaneous timbre for a certain instant in time tb, a smaller block yb[m] of size M < N is analysed.
Block yb[m] has length Tb = M/fs and is obtained from y[n] by:
$$\begin{cases} y_b[m] = y[m - M/2 + n_b], & m = 0, \ldots, M-1 \\ n_b = t_b f_s, & n_b \in n \end{cases} \tag{4.11}$$
with nb the sample index corresponding to the time instance tb of interest.
Figure 4.2: Timbre of a stationary trumpet note, normalised to the phase of the fundamental. [Two panels, magnitude and phase [deg] against time [s], 0 to 2 s; curves f1 to f6.]
The K partials for f0 at the particular point in time tb are then determined by:
$$c_k(t_b) = 2\,\frac{1}{M}\sum_{m=0}^{M-1} w[m]\, y_b[m]\, e^{-i 2\pi k f_0 \frac{m}{f_s}}, \qquad k = 1, \ldots, K \tag{4.12}$$
and the timbre vector cn(tb) or cθ(tb) is constructed using (4.10). Equation (4.12) can be evaluated
efficiently by a short C routine, which can be found in appendix A.1.
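As an illustration of (4.11), (4.12) and the normalisation of (4.6), the following Python sketch evaluates the instantaneous timbre of a synthetic stationary signal at two time instances. It is not the C routine of appendix A.1: the function names, the periodic Hanning window and the test signal (three partials on f0 = 10 Hz, sampled at 1000 Hz) are assumptions made here.

```python
import cmath
import math

def instantaneous_partials(y, fs, f0, K, nb, M):
    """Equation (4.12) for the block of M samples centred at index nb."""
    start = nb - M // 2
    block = y[start:start + M]                       # equation (4.11)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * m / M) for m in range(M)]
    partials = []
    for k in range(1, K + 1):
        acc = 0j
        for m in range(M):
            acc += window[m] * block[m] * cmath.exp(-2j * math.pi * k * f0 * m / fs)
        partials.append(2.0 * acc / M)
    return partials

def timbre(partials):
    """Normalise amplitude and phase to the fundamental, equation (4.6)."""
    c1 = partials[0]
    rotor = c1 / abs(c1)
    return [ck / abs(c1) * rotor ** (-k) for k, ck in enumerate(partials, start=1)]

fs, f0 = 1000.0, 10.0
spec = [(2.0, 110.0), (1.0, 180.0), (0.5, 30.0)]     # (A_k, theta_k in degrees)
y = [sum(a * math.cos(2 * math.pi * k * f0 * n / fs + math.radians(th))
         for k, (a, th) in enumerate(spec, start=1)) for n in range(2000)]

for nb in (600, 1100):
    cn = timbre(instantaneous_partials(y, fs, f0, 3, nb, 400))
    print(nb, [(round(abs(v), 3), round(math.degrees(cmath.phase(v)), 1)) for v in cn])
```

The raw partials are attenuated by the coherent gain of the window and their phases depend on the position of the block, but the normalised timbre is the same for both blocks, as expected for a stationary signal.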
4.2.3 Example 8: Trumpet timbre
The timbre of a stationary trumpet note is determined from a fs = 16000 Hz recording of a B♭4 with
fundamental frequency f0 = 466 Hz. Six partials are estimated using (4.12). A Gaussian window is
used with length M = 4000, equal to Tb = 0.25 s.
Figure 4.2 shows the timbre cθ(t) of the six partials. The normalised phase of the first partial is zero
throughout the interval, as expected from the definition. The other partials are quite stable throughout
the stationary part of the signal. From t = 1.8 s, the amplitudes drop and the phases change. This is
understandable, since the signal is no longer stationary.
4.2.4 Bandwidth considerations
Example 8 concerns a monophonic signal, i.e. a signal with only one tonal component. For polyphonic
signals or signals with disturbances, the bandwidth of the timbre estimation by equation (4.12) can be
an important issue, as will be discussed next.
Figure 4.3: Timbre of a disturbed signal for M = fs/4. The bandwidth is too high to neglect the disturbance. [Two panels, magnitude and phase [deg] against time [s], 0 to 4 s; curves f1 to f4.]
Consider the following signal sampled at fs = 1024 Hz with a tonal component at f0 = 10 Hz, built
up from 4 partials:
$$y[n] = \sum_{k=1}^{4} \frac{1}{k}\cos\!\left(2\pi k f_0 \frac{n}{f_s}\right), \qquad n = 0, \ldots, N-1$$
Hence, the frequencies present in the signal are 10, 20, 30 and 40 Hz. Let the signal be disturbed by an
inharmonic wave at fd = 22 Hz:
$$y_d[n] = y[n] + \cos\!\left(2\pi f_d \frac{n}{f_s}\right), \qquad n = 0, \ldots, N-1$$
First, the timbre of the tonal component at f0 is estimated using a window size of M = 256,
Tb = 0.25 s and a Hanning window. The results are shown in figure 4.3; note that the amplitudes
are attenuated by a factor ½ due to the coherent gain of the Hanning window (see section 2.5.4).
Clearly, the disturbance influences the second partial. It follows from DFT theory that the
frequency spacing between two orthogonal waves is fs/M = 1/Tb = 4 Hz. Hence, the spectral
resolution or bandwidth is 4 Hz. In addition, the Hanning window has a −6 dB bandwidth of 2 bins,
thereby increasing the effective bandwidth to 8 Hz.
In fact, the algorithm “feels” the two waves at 20 Hz and 22 Hz as one wave at 20 Hz with some
amplitude modulation. From a continuous-time approximation, partial c2(t) reads:
$$c_2(t) = \int_{T_b}\left(\tfrac{1}{2}\cos(2\pi\, 20\, t) + \cos(2\pi\, 22\, t)\right) e^{-i 2\pi\, 20\, t}\,\mathrm{d}t = a + b\cos(2\pi\, 2\, t)$$
The scalars a and b follow from the window characteristics. Indeed, the wave at 22 Hz is inharmonic
to the fundamental, which explains the mismatch in the normalised phase (equation 4.5). The
normalised phase seems to travel 720 degrees per second upwards, suggesting that the frequency
is +2 Hz off. The amplitude modulation confirms this observation.
Figure 4.4: Timbre of a disturbed signal for M = fs. The bandwidth is small enough to suppress the disturbance. [Two panels, magnitude and phase [deg] against time [s], 0 to 4 s; curves f1 to f4.]
A complete suppression of the disturbance requires an effective bandwidth of 2 Hz. This is achieved
by increasing the block size to M = 1024 samples, or Tb = 1 s. The newly obtained timbre is shown in
figure 4.4. The disturbance is completely suppressed and the partials appear with the correct amplitude
and phase.
Summarising: the disturbance of figure 4.3 was removed by increasing the window size and thereby
decreasing the effective bandwidth. In this perspective, the single-frequency Fourier transform of (4.9)
can be considered as a band-pass filter around f = kf0 with a certain effective bandwidth, determined
by the spectral resolution 1/Tb and the −6 dB bandwidth of the window. Note that an increase of
the spectral resolution always compromises the temporal localisation, by the uncertainty principle
(section 2.6).
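The band-pass interpretation can be made tangible by feeding a single unit-amplitude cosine into the windowed single-frequency transform and inspecting the returned magnitude. The sketch below does this for the two block sizes of this section; the function name and the periodic form of the Hanning window are assumptions.

```python
import cmath
import math

def bandpass_response(fs, M, f_centre, f_wave):
    """Magnitude returned by the Hann-windowed single-frequency transform
    centred at f_centre [Hz] for a unit-amplitude cosine at f_wave [Hz]."""
    acc = 0j
    for m in range(M):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * m / M)   # periodic Hann window
        y = math.cos(2 * math.pi * f_wave * m / fs)
        acc += w * y * cmath.exp(-2j * math.pi * f_centre * m / fs)
    return abs(2.0 * acc / M)

FS = 1024.0
for M in (256, 1024):
    print(M, FS / M,
          bandpass_response(FS, M, 20.0, 20.0),    # wave at the centre frequency
          bandpass_response(FS, M, 20.0, 22.0))    # disturbance at 22 Hz
```

For M = 256 the wave at 22 Hz lies only half a bin away from 20 Hz and passes with most of its amplitude, whereas for M = 1024 it lies two bins away, where the response of the Hanning window vanishes. In both cases the wave at 20 Hz itself is returned with half its amplitude, the coherent gain of the window.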
4.2.5 Example 9: Helicopter timbre
A 5 second microphone recording of a distant helicopter is analysed. The signal is sampled at
fs = 4096Hz. It is known that the main rotor has fundamental frequency f0 = 29.4Hz. The
timbre is first analysed using window size M = 512, Tb = 0.125 s. Again, the Hanning window is
applied, which causes the effective bandwidth to be 16Hz.
The result is shown in figure 4.5. The third partial, estimated at 3f0 = 88.2 Hz, exhibits amplitude
modulation, suggesting the presence of another wave. From the figure, the frequency of the amplitude
modulation is estimated at 8.5 Hz. Furthermore, the phase seems to be running downwards. This suggests the
presence of a disturbing wave at approximately 80 Hz. This is verified from a DFT; the 80 Hz wave is
in fact the fundamental frequency of the tail rotor.
Figure 4.5: Timbre of a helicopter for M = 512. The amplitude modulation on the third partial suggests the presence of another wave with similar frequency. [Two panels, magnitude and phase [deg] against time [s], 0 to 5 s; curves f1 to f4.]
Figure 4.6: Timbre of a helicopter for M = 1024. The disturbing source at 80 Hz is suppressed. [Two panels, magnitude and phase [deg] against time [s], 0 to 5 s; curves f1 to f4.]
To suppress the disturbing wave from the timbre, the effective bandwidth (including the 2 bins
window bandwidth) is decreased to 8Hz by increasing the window size to M = 1024, Tb = 0.25 s.
The new timbre plot is shown in figure 4.6. The amplitude modulation is suppressed and the phase
has become stable.
4.3 Warped timbre
Let us return to the discrete-time definition of instantaneous timbre (4.12). The equation accepts
blocks yb[m] of a time signal y[n] and returns the instantaneous timbre centred around time instances
tb. However, any block of the correct size M may be inserted into the equation, including blocks obtained
by time-warping. Let y{t} denote a continuous-time interpolant function that was obtained by
interpolation of y[n]. Then the warped signal block can be found by:
$$y_b[m] = y\{t[m] + t_b\} \tag{4.13}$$
t[m] represents the warped local time vector centred around 0, as defined in section 3.3. tb is the
centre time instance of interest for block b.
The computation of the warped timbre is implemented in a Matlab® function,
which can be found in appendix B.4. The function returns the instantaneous timbre for a given set of
time instances, frequencies and linear chirp rates.
4.4 Summary
The timbre of a tonal component is in this work defined as the amplitude and phase of the harmonic
partials, relative to the fundamental partial. It is observed that the normalised phase function of
a partial reduces to a constant phase shift, as long as the partials remain exactly harmonic. The
timbre, formed by the normalised DTFT coefficients of the partials, characterises a tonal component
independently from fundamental frequency and total amplitude.
The assumption of exact harmonicity of the partials is grounded in the theory of linear dynamics.
As most real-life dynamic systems are not perfectly linear (or are even highly non-linear), it can be
interesting to observe the development of the timbre over time. The instantaneous timbre makes
use of the DTFT, which brings up time-frequency considerations as discussed in chapter 2. For a
good estimate of the timbre, one must make sure that the effective bandwidth (the result of spectral
resolution and window bandwidth) is small enough to suppress contributions of inharmonic waves.
The timbre may also be determined from time-warped blocks. An implementation of the required
time-warp, interpolation, windowing and DTFT operations is found in appendix B.4.
Timbre provides a useful and intuitive approach to representing a periodic signal, that bears a much
closer resemblance to the human perception of sound.
Part III
Short-time Spectral Analysis
Chapter 5
Short-Time Fourier Transform
The Fourier analysis techniques discussed in the previous chapters were predominantly applied to
signals in their entirety. This method is useful if the signal is short or if the signal is reasonably
stationary throughout the time domain. In practice, most signals obtained from experiments are
non-stationary and can be minutes long. It would be impractical to analyse such signals as a whole.
Besides, one is often interested in how the signal changes over time.
Short-time analysis divides a longer signal into many shorter blocks, that usually have some overlap
in time. Every block is centred at a certain point in time and can be analysed using standard Fourier
techniques as discussed in chapter 2.
The short-time Fourier transform (STFT) (or formally: short-time discrete Fourier transform) is
perhaps the most popular and generally applicable method for short-time spectral analysis. This
chapter will discuss the basic aspects and time-frequency considerations.
5.1 Short-time blocks
Consider an arbitrary N-point signal y[n] sampled at fs. The time domain of the signal is 0 ≤ t < T
with total duration T = N/fs. It was already seen that an N-point DFT offers excellent frequency
information, but does not give any temporal information.
Therefore, signal y[n] is subdivided into B blocks yb[m]. Similar to n, m is the sample index of the
signal blocks: m = 0, . . . , M−1. The blocks are counted b = 1, . . . , B and have a smaller size
M < N, which corresponds to length Tb = M/fs.
$$\begin{cases} y_b[m] = y[m - M/2 + n_b], & m = 0, \ldots, M-1 \\ n_b = t_b f_s, & n_b \in n \end{cases} \tag{5.1}$$
The blocks are centred in time at the instances tb. The shift between two adjacent blocks, i.e. tb+1 − tb,
is called the shift length Tl. Similarly, the shift size is L = Tl fs. To make sure that no samples of y[n]
are skipped, L < M and consequently Tl < Tb.
The first block (b = 1) lies at t1 = ½Tb. The time instances for the other blocks b are given by:
$$t_b = \tfrac{1}{2}T_b + (b-1)\,T_l, \qquad b = 1, \ldots, B \tag{5.2a}$$
Figure 5.1: The short-time Fourier transform divides a signal sequence into many shorter blocks of equal length. [Schematic of the signal y[n] from t = 0 to t = T, with blocks yb[n] of length Tb, shift length Tl and block centre time tb.]
The block centre index is given by nb:
$$n_b = \tfrac{1}{2}M + (b-1)\,L, \qquad b = 1, \ldots, B \tag{5.2b}$$
The short-time blocks are shown schematically in figure 5.1. The symbols involved in equations (5.1),
(5.2a) and (5.2b) are listed in table 5.1.
symbol   domain            name
y[n]     ℝ                 signal vector
t[n]     [0, T)            time vector
fs       ℝ                 sample rate
n        0, . . . , N−1    signal sample index
N        ℕ                 signal size
T        ℝ                 signal length

yb[m]    ℝ                 block vector
m        0, . . . , M−1    block sample index
M        ℕ, < N            block size
Tb       ℝ, < T            block length

b        1, . . . , B      block number
B        ℕ                 total number of blocks
tb       [0, T)            block centre time
nb       0, . . . , N−1    block centre sample index

L        ℕ, < M            shift size
Tl       ℝ, < Tb           shift length
Table 5.1: Symbols used to define the short-time blocks.
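The block bookkeeping of (5.1) and (5.2b) can be sketched as follows. The expression used for the number of blocks, B = ⌊(N − M)/L⌋ + 1, is an assumption of this sketch (it keeps every block fully inside the signal); the function names are hypothetical.

```python
def block_centres(N, M, L):
    """Centre indices n_b = M/2 + (b - 1) L of all blocks that fit in N samples.

    The block count B = floor((N - M) / L) + 1 is an assumption of this
    sketch; it keeps every block fully inside the signal.
    """
    B = (N - M) // L + 1
    return [M // 2 + b * L for b in range(B)]   # b = 0 corresponds to block 1

def extract_block(y, nb, M):
    """Equation (5.1): y_b[m] = y[m - M/2 + n_b] for m = 0, ..., M-1."""
    start = nb - M // 2
    return y[start:start + M]

fs = 4096
N, M, L = 10 * fs, 4096, 2048          # 10 s signal, T_b = 1 s, T_l = 0.5 s
centres = block_centres(N, M, L)
print(len(centres), centres[0] / fs, centres[-1] / fs)
```

With these settings the first block is centred at ½Tb = 0.5 s, in agreement with (5.2a), and the last block ends exactly at the final sample of the signal.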
5.1.1 Shift size & overlap
Figure 5.1 shows a certain amount of overlap between two blocks. The overlap ratio is determined by the
block size and the shift size: M/L. Theoretically, the shift size can be anything from 1 sample to the
block size. Too small a value for L results in a large number of blocks and consequently many DFT
computations. Moreover, if L is much smaller than M, the DFTs of successive blocks will not be
much different. If however L is too large, some events in the signal may not be detected.
When a non-rectangular window is applied, an overlap ratio of at least 2× is necessary to make sure every
sample of y[n] is contained in the spectrum. In particular for the Hanning window, an overlap ratio of
exactly 2× ensures that every sample in y[n] is covered equally over the successive blocks.
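The equal coverage of the Hanning window at an overlap ratio of 2× can be verified by summing all shifted windows. The sketch below assumes the periodic form of the window, w[m] = ½ − ½ cos(2πm/M), for which the sum is exactly constant; with the symmetric form the sum is only approximately constant. Function names are hypothetical.

```python
import math

def hann_periodic(M):
    """Periodic Hann window of M samples."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * m / M) for m in range(M)]

def window_coverage(N, M, L):
    """Sum of all shifted windows at every sample of an N-point signal."""
    w = hann_periodic(M)
    cover = [0.0] * N
    for start in range(0, N - M + 1, L):
        for m in range(M):
            cover[start + m] += w[m]
    return cover

M = 64
cover_2x = window_coverage(8 * M, M, M // 2)   # overlap ratio M/L = 2
cover_1x = window_coverage(8 * M, M, M)        # no overlap, M/L = 1
interior = cover_2x[M // 2 : -M // 2]          # skip the half blocks at both ends
print(min(interior), max(interior))            # constant coverage
print(min(cover_1x), max(cover_1x))            # coverage varies per sample
```

Without overlap, samples near the block edges hardly contribute to any spectrum; with an overlap ratio of 2×, all samples away from the signal boundaries carry the same total weight.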
5.2 Short-time DFT
The distribution of y[n] over short-time blocks yields B blocks yb[m] of size M. The DFTs of the
blocks can be obtained by the fast Fourier transform (section 2.4.4), which performs the following
transformation:
$$y_b[k] = \frac{1}{M}\sum_{m=0}^{M-1} y^{(w)}_b[m]\, e^{-i 2\pi k \frac{m}{M}}, \qquad k = 0, \ldots, M-1, \quad b = 1, \ldots, B \tag{5.3}$$
The blocks y(w)b[m] are windowed by a window function w[m] according to (2.16).
After FFT computation, an array yb[k] is obtained that consists of B × M complex elements. B is the
number of DFT blocks for the time instances tb. M is the number of frequencies fk that follow from
(2.10b). The frequency resolution is determined by the block size and the sample rate:
$$\Delta f = f_s / M \tag{5.4}$$
A popular representation of yb[k] is the waterfall plot or spectrogram, which shows the amplitudes of
the single-sided spectrum as colours on a 2D time-frequency grid. Examples of the spectrogram will be
shown in the following sections.
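A compact sketch of (5.3) is given below. For clarity it evaluates the DFT of every block directly in O(M²) operations instead of calling an FFT; the function name, the periodic Hanning window and the small test signal are assumptions made for illustration.

```python
import cmath
import math

def stft(y, M, L):
    """Short-time DFT following equation (5.3), with a periodic Hann window.

    Returns a list of B blocks, each holding M complex DFT coefficients.
    A direct O(M^2) DFT is used for clarity; an FFT gives the same values.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * m / M) for m in range(M)]
    blocks = []
    for start in range(0, len(y) - M + 1, L):
        yw = [window[m] * y[start + m] for m in range(M)]
        blocks.append([
            sum(yw[m] * cmath.exp(-2j * math.pi * k * m / M) for m in range(M)) / M
            for k in range(M)
        ])
    return blocks

fs, M, L = 64.0, 32, 16
y = [math.cos(2 * math.pi * 8.0 * n / fs) for n in range(128)]
S = stft(y, M, L)
df = fs / M                                   # frequency resolution [Hz]
peak_bin = max(range(M // 2), key=lambda k: abs(S[0][k]))
print(len(S), df, peak_bin * df, abs(S[0][peak_bin]))
```

For the 8 Hz test wave of unit amplitude, the peak of the two-sided spectrum equals ½ (half the amplitude) times the coherent gain ½ of the window.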
5.3 Time-frequency considerations
Although (5.1) to (5.3) involve many different symbols, only three independent choices are left for the
spectral representation:
1. Block size M. The block length follows from Tb = M/fs.
2. Shift size L. The shift length follows from Tl = L/fs.
3. Window type.
The meaning of these properties for the spectrogram is discussed on the basis of an STFT of a fly-by
helicopter, sampled at fs = 4096 Hz.
5.3.1 Spectral/temporal resolution
Figure 5.2(a) shows the STFT for block size M = fs/1 = 4096 and shift size L = M/2 = 2048. The
overlap ratio is M/L = 2×. Figure 5.2(b) has a 4× smaller block size: M = fs/4 = 1024. The shift
size is reduced to L = M/2 = 512, keeping the same overlap ratio of 2×. Both STFTs use a Hanning
window.
Clearly, the first STFT yields more spectral detail, whereas the second STFT has better temporal
localisation.
Figure 5.2: Six different STFTs for the same signal. Figures (a) and (b) show the effect of a different spectral/temporal resolution. Figure (c) increases the overlap ratio. Figure (d) applies 3× zero-padding to (b). Figures (e) and (f) use different window functions for the settings of (c). [Six spectrogram panels, frequency [Hz] (0 to 200 Hz) against time [s], colour scale in dB from 0 to 80: (a) STFT for Δf = 1 Hz and Tl = 0.5 s; (b) STFT for Δf = 4 Hz and Tl = 0.125 s; (c) STFT for Δf = 1 Hz and Tl = 0.125 s; (d) STFT for (b) with 3× zero-padding; (e) STFT for (c) using a rectangular window; (f) STFT for (c) using a Gaussian window, σ = 0.1 s.]
5.3.2 Overlap
An overlap ratio of 2× ensures that every point of the original signal is contained in the STFT.
Following this reasoning, a higher ratio will undoubtedly introduce some redundancy in the STFT.
Nevertheless, a higher ratio can yield a better time-frequency localisation. Figure 5.2(c) shows the
STFT for M = fs/1 = 4096 and L = M/8 = 512; the overlap ratio is 8×. Compared to figure
5.2(a), the STFT is better localised in time.
5.3.3 Zero-padding
Zero-padding is an elegant trick to obtain a finer frequency grid while keeping
the same block size. By padding a block of M samples with (for instance) 3M zeros before it is
transformed by the FFT, the spacing of the frequency points becomes 4 times finer. Although no new information
is added to the block, the finer grid can make it easier to recognise closely spaced frequency
peaks. The effect of 3× zero-padding is shown in figure 5.2(d).
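The effect of zero-padding on the frequency grid can be illustrated with a single tone that falls between two DFT frequencies. The sketch below is a minimal example with assumed values (fs = 64 Hz, M = 32, a tone at 9.5 Hz, periodic Hanning window) and a hypothetical function name; the DFT is evaluated directly rather than by an FFT.

```python
import cmath
import math

def peak_frequency(block, fs, pad_factor=0):
    """Frequency [Hz] of the largest single-sided DFT magnitude.

    pad_factor = 3 appends 3*M zeros to a block of M samples.
    """
    M = len(block)
    windowed = [(0.5 - 0.5 * math.cos(2 * math.pi * m / M)) * v
                for m, v in enumerate(block)]
    n_fft = (1 + pad_factor) * M
    best_k, best_mag = 0, -1.0
    for k in range(n_fft // 2):
        # Zero samples contribute nothing, so the sum only runs over the block.
        mag = abs(sum(v * cmath.exp(-2j * math.pi * k * m / n_fft)
                      for m, v in enumerate(windowed)))
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * fs / n_fft

fs, M = 64.0, 32
tone = [math.cos(2 * math.pi * 9.5 * n / fs) for n in range(M)]
print(peak_frequency(tone, fs), peak_frequency(tone, fs, pad_factor=3))
```

Without padding, the grid spacing is fs/M = 2 Hz and the peak is reported at the nearest grid frequency; with 3× zero-padding the spacing becomes 0.5 Hz and the peak lands on the tone frequency. The width of the spectral peak itself is unchanged, since it is set by the block length and the window.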
5.3.4 Windowing
The quality of an STFT heavily depends on the chosen window function (section 2.5). Two STFTs are
shown as a comparison with figure 5.2(c). Figure 5.2(e) uses a rectangular window, resulting in sharp
frequency lines but a high level of leakage. Figure 5.2(f) uses a Gaussian window with σ = 0.1 s,
resulting in more “blurry” frequency lines.
Time and frequency localisation was formalised in section 2.6.1 by the temporal and spectral second
moments: σt and σf. These values objectively quantify the performance of the window:
figure    window         σt        σf
5.2(c)    Hanning        0.14 s    0.58 Hz
5.2(e)    Rectangular    0.29 s    8.38 Hz
5.2(f)    Gaussian       0.07 s    1.13 Hz
Looking at figures 5.2(c), 5.2(e) and 5.2(f), it is verified that the Gaussian window has the best temporal
localisation of the three windows. Also, the Hanning window yields the best spectral localisation.
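The temporal values in the table can be reproduced with a few lines of Python. The sketch assumes that σt is the square root of the second moment of the window energy |w|² about its centroid (the definition of section 2.6.1 is not restated here) and uses the block length Tb = 1 s of figure 5.2(c); the function name is hypothetical. The spectral moments σf are not reproduced.

```python
import math

def temporal_spread(window, fs):
    """Square root of the second moment of |w|^2 about its centroid [s]."""
    energy = [w * w for w in window]
    total = sum(energy)
    t = [m / fs for m in range(len(window))]
    centre = sum(tm * e for tm, e in zip(t, energy)) / total
    return math.sqrt(sum((tm - centre) ** 2 * e for tm, e in zip(t, energy)) / total)

fs, M = 4096, 4096                                   # T_b = 1 s
hanning = [0.5 - 0.5 * math.cos(2 * math.pi * m / M) for m in range(M)]
rectangular = [1.0] * M
sigma = 0.1                                          # Gaussian width [s]
gaussian = [math.exp(-0.5 * ((m - M / 2) / fs / sigma) ** 2) for m in range(M)]

for name, w in (("Hanning", hanning), ("Rectangular", rectangular),
                ("Gaussian", gaussian)):
    print(name, round(temporal_spread(w, fs), 2))
```

Under this assumption the three windows give 0.14 s, 0.29 s and 0.07 s, matching the σt column of the table.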
5.4 Summary
The short-time Fourier transform is a popular method for short-time spectral analysis. The STFT
divides a signal into shorter segments and applies the DFT to the individual blocks. The obtained
spectrum can be visualised by a waterfall diagram, with time on one axis and frequency on the other.
The temporal and spectral localisation can be controlled by a proper choice of the block size, shift size
and window type. As a last resort, zero-padding can be applied to increase the number of frequency
points. Yet, the STFT is subject to the time-frequency uncertainty principle, limiting the simultaneous
temporal and spectral localisation.
Chapter 6
Fan Chirp Transform
Many real-life signals are non-stationary by nature. Examples are countless and include recordings
or measurements of speech, music, passing vehicles, engines, etc. The short-time Fourier transform
(STFT) as discussed in chapter 5 can be applied to any signal. However, if the signal comprises rapidly
changing frequency content, the results can be poor: the DFT tries to “project” the changing
frequencies on a rectangular-tiled time-frequency grid, resulting in undesired frequency spreading.
For highly non-stationary signals, the question arises whether the rectangular grid provides the best basis
for analysis. The answer is “not really” and an ingenious alternative is provided by the Fan Chirp
Transform (FChT). The fan chirp transform was proposed in 2007 by Luis Weruaga and Marián
Képesi [21]. It effectively provides basis vectors in a fan geometry by pre-processing the signal with
the time-warping technique discussed in chapter 3. The Short-Time Fan Chirp Transform (STFChT)
implements this time-warping and operates per block, similar to the STFT.
Figure 6.1 illustrates the difference between the DFT and FChT basis grids schematically. Considering the
fact that the harmonic structure of a tonal component ensures constant ratios to the fundamental
frequency, the skewness of the grid must increase in proportion to frequency.
Figure 6.1: Schematic representation of a non-stationary tonal component against a rectangular-tiled and fan-tiled basis grid. [Two panels with axes frequency and time: (a) DFT basis grid; (b) FChT basis grid.]
6.1 Formulation of the Fan Chirp Transform
The Fan Chirp Transform¹ for the continuous-time domain as formulated in [21] reads:
$$y_\alpha(f) = \int_{-\infty}^{\infty} y(t)\,\sqrt{|\phi'_\alpha(t)|}\; e^{-i 2\pi f \phi_\alpha(t)}\,\mathrm{d}t, \qquad -\infty < f < \infty \tag{6.1}$$
The linear time warp function φα(t) is defined by equation (3.1). The term √|φ′α(t)| is a normalisation
function that preserves the unitarity of the transformation². Equation (6.1) can be interpreted as a
projection of y(t) onto a set of chirping basis functions e^(−i 2πf φα(t)).
By applying the change of variable τ = φα(t) and inversely t = φα⁻¹(τ) = ψα(τ), so that
dt = dτ / φ′α(ψα(τ)), equation (6.1) is placed on the warped-time axis and becomes:
$$\begin{aligned}
y_\alpha(f) &= \int_{-\infty}^{\infty} y(\psi_\alpha(\tau))\,\frac{1}{\sqrt{|\phi'_\alpha(\psi_\alpha(\tau))|}}\; e^{-i 2\pi f \tau}\,\mathrm{d}\tau \\
&= \int_{-\infty}^{\infty} \tilde{y}(\tau)\,\rho_\alpha(\tau)\, e^{-i 2\pi f \tau}\,\mathrm{d}\tau, \qquad -\infty < f < \infty
\end{aligned} \tag{6.2}$$
The latter equation uses two substitutions:
1. The warped signal ỹ(τ) = y(ψα(τ)), obtained by the procedure of linear time-warping, as discussed in
section 3.1.
2. A normalisation function ρα(τ), which can be shown to be [21, 9]:
$$\rho_\alpha(\tau) = \frac{1}{\sqrt[4]{|1 + 2\alpha\tau|}} \tag{6.3}$$
Equation (6.2) is easily recognised as a Fourier transform of the product of ỹ(τ) and ρα(τ). It can
therefore be computed efficiently by the fast Fourier transform.
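The form of the normalisation function (6.3) can be traced in a few steps. The derivation below assumes that the linear warp function of equation (3.1), which is not restated in this chapter, has the form φα(t) = (1 + ½αt)t; this form is consistent with (6.3) and with the time support limit 1 + 2ατ > 0.

```latex
\begin{align*}
\phi_\alpha(t) &= \left(1 + \tfrac{1}{2}\alpha t\right) t
  && \Rightarrow \quad \phi'_\alpha(t) = 1 + \alpha t \\
\tau = \phi_\alpha(t) &\;\Leftrightarrow\; \tfrac{1}{2}\alpha t^2 + t - \tau = 0
  && \Rightarrow \quad t = \psi_\alpha(\tau) = \frac{-1 + \sqrt{1 + 2\alpha\tau}}{\alpha} \\
\phi'_\alpha(\psi_\alpha(\tau)) &= 1 + \alpha\,\psi_\alpha(\tau) = \sqrt{1 + 2\alpha\tau}
  && \Rightarrow \quad \mathrm{d}t = \frac{\mathrm{d}\tau}{\sqrt{1 + 2\alpha\tau}} \\
\rho_\alpha(\tau) &= \frac{\sqrt{|\phi'_\alpha(\psi_\alpha(\tau))|}}{|\phi'_\alpha(\psi_\alpha(\tau))|}
  = \frac{1}{\sqrt{|\phi'_\alpha(\psi_\alpha(\tau))|}}
  && \Rightarrow \quad \rho_\alpha(\tau) = \frac{1}{\sqrt[4]{|1 + 2\alpha\tau|}}
\end{align*}
```

The numerator of ρα(τ) is the normalisation term of (6.1) evaluated on the warped-time axis; the denominator is the Jacobian of the change of variable.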
6.2 Short-time Fan Chirp Transform
The short-time fan chirp transform (STFChT) combines the STFT and FChT and computes the
following transformation:
$$y_b[k] = \frac{1}{M}\sum_{m=0}^{M-1} y_b[m]\,\rho_{\alpha_b}[m]\, w[m]\, e^{-i 2\pi k \frac{m}{M}}, \qquad k = 0, \ldots, M-1, \quad b = 1, \ldots, B \tag{6.4}$$
Vector yb[m] denotes the warped signal block b centred at time instance tb and obtained by linear time-
warping with chirp rate αb. Note that the warped block is obtained by non-uniform interpolation (see
section 3.3) and as a consequence, the time instances do not fully correspond with those of the STFT
blocks. This was illustrated by example 6.
The vector ραb[m] is the normalisation term of (6.3) for block b. w[m] represents a symmetric window
function. Note that windowing is applied on the warped-time axis, while it can also be applied on the
linear-time axis, i.e. prior to the time-warping. [5] suggests the first method, motivated by the fact that
the main function of the window is improving the spectral representation, rather than distributing the
signal evenly over time. For both cases however, the peak of the window will always correspond to
the centre time instance tb.
¹ The term “fan” points out that all frequencies focus in the same focal point, see section 3.1.2.
² It can be shown that due to this term, the transformation is still unitary and therefore Parseval’s theorem also applies to the FChT [21].
6.2.1 Block chirp rate
In contrast to the STFT, the frequencies of the STFChT are instationary, as was illustrated by the
skewed grid of figure 6.1(b):
fkb(t) = (1 + αb(t− tb)) − 1
2Tb ≤ (t− tb) <12Tb (6.5)
The chirp rates can be set individually for each block, within the time support limit of (3.9):
− 1
2αb< (t− tb) <
1
2αb
The block chirp rate is therefore limited by the block length Tb:
αb <1
Tb(6.6)
Considering a tonal component with instantaneous fundamental frequency f0(t), the ideal linear
chirp rate for block b is simply estimated by:
αb =f ′0(tb)
f0(tb)b = 1, . . . , B (6.7)
As the fundamental frequency is often given or approximated as a discrete-time vector f0[n], the chirp
rate can be found by finite differences, for instance:
αb =f0[nb + 1]− f0[nb − 1]
2Δt f0[nb]b = 1, . . . , B (6.8)
If the fundamental frequency is not (precisely) known, one can vary the chirp rate and choose the rate
that yields the sharpest spectrum / highest peaks. This approach is used in [5].
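The finite-difference estimate of (6.8) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the results in this thesis: the function name block_chirp_rates, the argument names and the quadratic frequency law f0(t) = 10 + 10t² are assumptions chosen for the example.

```python
def block_chirp_rates(f0, dt, centres):
    """Central-difference estimate of alpha_b = f0'(t_b) / f0(t_b), cf. (6.8).

    f0      -- fundamental frequency per sample [Hz]
    dt      -- sample interval [s]
    centres -- sample index n_b of each block centre (hypothetical argument)
    """
    rates = []
    for nb in centres:
        if nb < 1 or nb > len(f0) - 2:
            raise ValueError("a block centre needs a neighbour on both sides")
        slope = (f0[nb + 1] - f0[nb - 1]) / (2.0 * dt)  # approximates f0'(t_b)
        rates.append(slope / f0[nb])
    return rates


# Illustrative frequency law (an assumption): f0(t) = 10 + 10 t^2 at fs = 1024 Hz.
fs = 1024.0
dt = 1.0 / fs
f0 = [10.0 + 10.0 * (n * dt) ** 2 for n in range(int(5 * fs))]
centres = [int(t * fs) for t in (0.5, 1.0, 2.0, 4.0)]
alpha = block_chirp_rates(f0, dt, centres)
```

For a quadratic frequency law the central difference reproduces the derivative exactly, so the returned rates equal 20tb / (10 + 10tb²) up to rounding. For a measured tacho vector, some smoothing before differencing may be needed.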
6.2.2 Example 10: Short-time fan chirp transform of a chirp wave
Consider a chirp signal y[n] of length T = 5 s, sampled at fs = 1024 Hz. The instantaneous frequency is given by:
\[ f_0(t) = 10 + \tfrac{1}{2} \, 20 \, t^2, \qquad 0 \le t < T \]
The signal consists of two partials with frequencies f0(t) and 2f0(t). Using equation (6.7), the chirp rate for block b at time tb reads:
\[ \alpha_b = \frac{20 \, t_b}{10 + 10 \, t_b^2}, \qquad b = 1, \dots, B \]
The maximum of αb is 1 at tb = 1 s. Therefore, the block length Tb is chosen to be 1 s to satisfy the time-support limit of (6.6).
An STFT and STFChT are computed for y[n] and shown in figure 6.2. Both transformations use block
length Tb = 1 s, shift length Tl = 1/32 s and a Hanning window. The time-warping for the FChT is
performed by 8-point spline interpolation.
Figure 6.2: An instationary wave represented by (a) an STFT and (b) an STFChT. Both panels show frequency [Hz] (0 to 500) against time [s], with a colour scale in dB (−160 to 0).
As expected, the chirp waves appear as widely spread frequency bands in the STFT of figure 6.2(a). This is a poor representation, since each chirp is smeared over a considerable part of the spectrum.
The STFChT of figure 6.2(b) shows much more concentrated lines. However, the warping operation introduces some leakage, most of which is found around t = 1 s and t = 4.5 s. The leakage around t = 1 s can be explained by the fact that the chirp rate reaches the time-support limit as defined by (6.6). The leakage around t = 4.5 s is due to aliasing: the frequencies at t = 4.5 s are warped into the vicinity of the Nyquist limit ½ fs = 512 Hz. It was verified that both types of leakage are absent for a signal with fundamental frequency f0(t) = 10 + ½ · 10 t², which has a maximum chirp rate of 0.707 and frequencies extending to 270 Hz.
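The two limits mentioned above (the time-support limit on the chirp rate and the Nyquist limit on the highest partial) can be checked numerically for both frequency laws. The sketch below is only a sanity check under stated assumptions: the helper name chirp_summary and the grid of 5000 time steps are choices made for this example.

```python
def chirp_summary(a, T=5.0, partials=2, steps=5000):
    """For f0(t) = 10 + 0.5 * a * t^2 on 0 <= t <= T, return
    (maximum chirp rate f0'(t) / f0(t), highest partial frequency at t = T)."""
    max_alpha = 0.0
    for i in range(steps + 1):
        t = T * i / steps
        f0 = 10.0 + 0.5 * a * t * t
        max_alpha = max(max_alpha, a * t / f0)   # f0'(t) = a * t
    f_top = partials * (10.0 + 0.5 * a * T * T)
    return max_alpha, f_top


nyquist = 1024.0 / 2.0
alpha_20, top_20 = chirp_summary(20.0)   # the signal of example 10
alpha_10, top_10 = chirp_summary(10.0)   # the slower chirp used for verification
exceeds_nyquist = top_20 > nyquist
```

With a = 20 the chirp rate touches the limit 1/Tb = 1 and the second partial ends at 520 Hz, above the Nyquist frequency. With a = 10 the maximum rate is about 0.707 and the second partial ends at 270 Hz.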
6.2.3 Example 11: Short-time fan chirp transform of an engine run-up
The possibilities of the STFChT are illustrated on the basis of a typical signal from dynamic
experiments: a microphone measurement of a car engine during a run-up. The signal and
transformation are characterised by:
– Sample rate: fs = 8192Hz.
– Signal length: T = 30 s.
– Block length: Tb = 1 s.
– Shift length: Tl = 1/8 s.
– Window: Hanning.
The chirp rate was estimated from the tacho-vector (the engine speed in RPM) by finite differences
using equation (6.8). Since the engine speed increases slowly, the chirp rate is rather low: α stays
below 0.05. Still, the STFChT offers better spectral information than STFT, as can be observed from
figure 6.3.
Figure 6.3: An STFT and STFChT of an engine run-up. Panels: (a) STFT, titled "DFT Engine run-up", and (b) STFChT, titled "FCHT Engine run-up". Both show frequency [Hz] (0 to 1500) against time [s] (10 to 20), with a colour scale in dB (0 to 80).
Figure 6.4: Detail of the STFT and STFChT. Panels: (a) STFT and (b) STFChT, showing frequency [Hz] (600 to 750) against time [s] (20 to 24).
Figure 6.4 shows a detail of the STFT and STFChT. The linear warp operation successfully transforms
the instationary signal blocks into stationary blocks.
6.3 Summary
A well-known shortcoming of the DFT is its inability to detect and localise instationary signals. As
the basis vectors of the DFT are constant (rectangular), an instationary wave always projects on a
group of frequencies. The fan chirp transform (FChT) effectively provides basis functions in a fan-
geometry, matching the harmonic structure of an instationary tonal component. This is realised by
applying time-warping to the original signal, prior to being processed by the DFT.
The short-time fan chirp transform (STFChT) is in essence the STFT of time-warped signal blocks.
The improvement depends on the quality of the time-warp process. If the fundamental frequency of a
dominant tonal component is known, the required linear block chirp rates can be estimated by finite
differences. If however no fundamental frequency information is available, one can vary the chirp
rate per block and choose the chirp rate that yields the sharpest spectrum.
Part IV
Pitch Tracking & Order Extraction
Chapter 7
Pitch Tracking Techniques
Pitch tracking can be an important task in the analysis of dynamic measurements. Consider, for example, a measurement of the vibration of the exhaust pipe of a combustion engine during a run-up from 1000 RPM to 3000 RPM. Such measurements are performed to characterise the response of the mechanical parts to their rotational inputs. After measurement, the acquired data is stored as time-domain signals. Short-time spectral analysis can then be performed right away, for example with the techniques described in chapter III. However, one often wants to replace the time axis by a linear scale of RPM and the frequencies by equivalent engine orders. The former representation is called a Campbell diagram, the latter an order plot.
There are several ways to achieve this [2]. Regardless of the method, one will need exact information
of speed (RPM) as a function of time. In most cases this information is obtained during the
measurement by so-called tacho pulses: a pulse is released for every full or partial revolution of
the engine shaft. From these pulses, a vector can be constructed that accurately describes the
instantaneous speed, which is related to the fundamental frequency of the spectrum by some ratio
depending on the engine configuration (number of cylinders, 2 or 4-stroke).
Additionally, one may want to extract orders from the time-domain signal in order to analyse them separately. As long as a tacho vector is supplied with the measurements, the diagrams and orders are obtained rather easily. However, such data is not always available, for example in the following cases:
– Acoustic (microphone) measurements of a car while driving
– Asynchronous components in measurements
– Acoustic measurements on drive-by or fly-by vehicles
– Any other dynamic system for which no a-priori fundamental frequency data is available
For these situations, a pitch tracking algorithm can be employed to determine the instantaneous
frequency of instationary periodic components in a signal.
Note:
In most research, the term pitch is used instead of fundamental frequency. Although the two are strongly related, the fundamental frequency is an objective property, while pitch is subjective to a listener and should strictly be reserved for a perceptual context, as stated in, among others, [4, 11]. Nevertheless, both terms are used interchangeably in this chapter.
7.1 Pitch detection
Pitch detection differs somewhat from pitch tracking. Pitch detection (or fundamental frequency estimation) is the procedure by which the fundamental frequency of a stationary periodic component is sought within a signal. This is usually done without any knowledge of the expected location of the fundamental. Pitch detection algorithms (PDAs) exist for both the time and the frequency domain. Most applications concern pitch detection and automatic transcription of musical or speech signals. Often, PDAs are limited to monophonic signals, i.e. signals with only one tonal component.
Popular time-domain techniques, which are often limited to monophonic signals, include:
– Time event rate techniques: zero-crossing rate, peak rate, slope event rate
– Autocorrelation techniques
– Phase space techniques
Frequency domain methods operate on the Fourier transform of a signal. Many of them benefit from
the presence of harmonics. Typical methods are:
– Spectral autocorrelation
– Harmonic product spectrum
– Cepstrum
– Maximum likelihood estimators
– Multi-resolution techniques
Most methods return a vector with a certain “score” for a series of fundamental frequency candidates.
The pitch is then usually estimated as the frequency with the highest score. All methods mentioned
above are discussed in [11, 7, 8].
7.2 Pitch tracking
The aim of a pitch tracking algorithm (PTA) is to follow the fundamental frequency of an instationary
periodic component in a signal that is reasonably continuous over time. The latter means that there
are no large discontinuities in the fundamental frequency trajectory. To do this, the PTA should be
robust to noise and interfering orders and accurate in both temporal and spectral sense.
An STFT of a typical signal is shown in figure 7.1. It concerns a 50-second recording of a helicopter fly-by, made from the ground. It can be immediately observed that the first fundamental starts at approximately 30 Hz and ends around 26 Hz, in this case due to the Doppler effect of the passing helicopter. A second fundamental is located around 80 Hz, decreasing to 64 Hz. It is understood that the first fundamental corresponds to the main rotor and the second fundamental to the tail rotor.
The goal of pitch tracking would be to track the frequency lines, i.e. to follow the evolution of the fundamental frequency over time.
Figure 7.1: STFT of a recording of a fly-by helicopter. Frequency [Hz] (0 to 300) against time [s] (5 to 45), with a colour scale in dB (0 to 80).
7.2.1 Pitch salience
Let us recall the definition of the phase-normalised instantaneous timbre (4.10) as introduced in section 4.2:
\[ \mathbf{c}^{\theta}(t) = \begin{pmatrix} c^{\theta}_1(t) \\ c^{\theta}_2(t) \\ \vdots \\ c^{\theta}_K(t) \end{pmatrix} \]
The partials are obtained from a discrete-time signal by (4.12). The timbre is centred around a point in time t and applies to a periodic component with fundamental frequency f0. If the timbre were determined for a frequency where no tonal component is present, the vector would still exist but have much smaller values. The highest values are presumably found at the location of the dominant tonal component. This presumption leads to the following definition of pitch salience:
\[ s_b(f_0) = \sum_{k=1}^{K} \bigl| c_k(f_0, t_b) \bigr|, \qquad b = 1, \dots, B \tag{7.1} \]
sb(f0) is the sum of the partial amplitudes of a harmonic series with fundamental frequency f0 and K harmonics, centred around time instance tb. The time instances are defined similarly to section 5.1.
The pitch salience of the helicopter signal is determined for t = 10 s using K = 8 harmonics, block length Tb = 1 s and a Hanning window. The results are shown in figure 7.2. The highest peak is found at f0 = 29.5 Hz, which corresponds to the main rotor fundamental.
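The salience of (7.1) can be illustrated with a short Python sketch. Equation (4.12) is not reproduced in this chapter, so a plain windowed DTFT sum is assumed here for the partial amplitudes. The function names, the synthetic three-partial test block and the list of candidate frequencies are all assumptions made for the example; they do not represent the helicopter data.

```python
import cmath
import math


def hann(n_samples):
    """Periodic Hann (Hanning) window of n_samples points."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_samples)
            for n in range(n_samples)]


def partial_amplitude(block, window, fs, freq):
    """Amplitude estimate of a sinusoid at freq from a windowed DTFT sum."""
    acc = 0j
    for n, (y, w) in enumerate(zip(block, window)):
        acc += w * y * cmath.exp(-2j * math.pi * freq * n / fs)
    return 2.0 * abs(acc) / sum(window)


def salience(block, window, fs, f0, n_harmonics):
    """s(f0): summed amplitudes of the partials k * f0, k = 1..K, cf. (7.1)."""
    return sum(partial_amplitude(block, window, fs, k * f0)
               for k in range(1, n_harmonics + 1))


# Synthetic block (an assumption): three partials on a 29.5 Hz fundamental.
fs = 1024.0
n_samples = 1024                      # block length of 1 s
amps = (1.0, 0.5, 0.25)
block = [sum(a * math.cos(2.0 * math.pi * (k + 1) * 29.5 * n / fs)
             for k, a in enumerate(amps)) for n in range(n_samples)]
window = hann(n_samples)
candidates = [20.0, 25.0, 29.5, 35.0, 59.0]
scores = {f: salience(block, window, fs, f, 3) for f in candidates}
best = max(scores, key=scores.get)
```

For this block the score at 29.5 Hz is close to the sum of the three partial amplitudes, while the candidate at 59 Hz only collects the second partial. In practice the candidates would form a dense frequency grid or be refined by a line search.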
Figure 7.2: Pitch salience at t = 10 s. The highest peak is found around f0 = 29.5 Hz. Axes: fundamental frequency [Hz] (0 to 300) against salience (0 to 700).
7.2.2 Salience tracking
Suppose that the fundamental frequency of the main rotor is known at the first time instance t1, but that its evolution over time is unknown. The fundamental frequency trajectory can then be followed by:
\[ f_b = \arg\max_{f_0} \, s_b(f_0), \qquad b = 1, \dots, B \tag{7.2} \]
Equation (7.2) can be evaluated by a line search optimisation algorithm. Note that the search should be performed in the vicinity of the previously found fb−1. To ensure that the algorithm does not jump to another local maximum, a Gaussian weighting function can be included:
\[ f_b = \arg\max_{f_0} \left[ s_b(f_0) \, \exp\!\left( -\frac{1}{2} \left( \frac{f_0 - f_{b-1}}{\sigma_{f_0}} \right)^{2} \right) \right], \qquad b = 1, \dots, B \tag{7.3} \]
The Gaussian imposes a standard deviation of σf0 around the previously found frequency fb−1, thereby masking spurious local maxima. Figure 7.3 shows the salience for σf0 = 10 Hz.
Equation (7.3) allows for easy and accurate estimation of the fundamental frequency vector.
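The recursion of (7.3) can be sketched as follows. The sketch replaces the line search by a search over a fixed candidate grid, and it uses a toy salience function with a weak, slowly descending peak and a stronger stationary peak at 80 Hz. Both the grid search and the toy function are assumptions for illustration only.

```python
import math


def track_pitch(salience, candidates, f_start, n_blocks, sigma):
    """Follow the salience maximum block by block, weighting each block with a
    Gaussian centred on the previously found frequency, cf. (7.3)."""
    track = []
    f_prev = f_start
    for b in range(n_blocks):
        def weighted(f0):
            gauss = math.exp(-0.5 * ((f0 - f_prev) / sigma) ** 2)
            return salience(b, f0) * gauss
        f_prev = max(candidates, key=weighted)
        track.append(f_prev)
    return track


def toy_salience(b, f0):
    """Hypothetical salience: a weak peak moving from 30 Hz to 26 Hz and a
    stronger stationary peak at 80 Hz."""
    weak = math.exp(-0.5 * (f0 - (30.0 - 0.1 * b)) ** 2)
    strong = 1.5 * math.exp(-0.5 * (f0 - 80.0) ** 2)
    return weak + strong


candidates = [20.0 + 0.5 * i for i in range(161)]          # 20 ... 100 Hz
track = track_pitch(toy_salience, candidates, 30.0, 41, 10.0)
unweighted_last = max(candidates, key=lambda f: toy_salience(40, f))
```

Without the Gaussian weight the maximum of the last block lies at 80 Hz. With σf0 = 10 Hz the tracker stays on the weaker component and ends near 26 Hz.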
7.2.3 Pitch tracking
It was observed in chapter 4 that timbre tends to remain constant over time. This is a very useful
property for pitch tracking. For example, the timbre difference can be defined as:
\[ s_b(f_0) = \bigl\| \mathbf{c}(f_0, t_b) - \mathbf{c}(f_0, t_{b-1}) \bigr\|, \qquad b = 1, \dots, B \tag{7.4} \]
By minimization of sb(f0), a similar estimate of the fundamental frequency trajectory can be found.
Figure 7.3: Pitch salience including a Gaussian function. Axes: fundamental frequency [Hz] (0 to 100) against salience (0 to 700).
Experiments with different minimisation and maximisation functions were promising. Due to limited time, the results are not reported. The CD-ROM contains several examples of timbre-locked tracking.
7.3 Summary
The aim of pitch tracking is to follow a fundamental frequency trajectory as it develops over time. Good results can be obtained from maximisation of a pitch salience function: the cumulative harmonic amplitudes obtained by DTFT. To make sure that the algorithm does not jump to another local maximum, a Gaussian function can be included as a window on the frequency axis.
Even better estimates can be obtained by minimisation of the difference in instantaneous timbre. Due to limited time, this was not reported.
Chapter 8
Vold-Kalman Order Filtering
The last topic of this thesis is order extraction using the Vold-Kalman filter (VKF) algorithm. The
previous chapters discussed means to identify and visualise periodic components in signals. Order
extraction tries to isolate these components and extract them from a signal. The VKF is perfectly
suited for extraction of instationary components, as will be discussed in this chapter.
8.1 Vold-Kalman Filter
The Vold-Kalman filter, introduced by Håvard Vold and Jan Leuridan in 1993, is a filter for extraction of instationary periodic components from a signal using a known frequency vector. Being formulated as a least-squares problem, it can be solved as a linear system. Similar to the Kalman filter, the Vold-Kalman filter uses a structural equation and a data equation. In what follows, only the second generation VK filtering is discussed. References to VK filtering in general are [20, 10, 19].
8.1.1 Data equation
Let us consider the signal y[n], n = 0, . . . , N−1. The signal is assumed to be composed of K sinusoidal components plus remaining noise η[n], similar to the sinusoids plus noise model of equation (1.13):
\[ y[n] = \sum_{k=0}^{K-1} x_k[n] \, s_k[n] + \eta[n] \tag{8.1} \]
Every sinusoidal component consists of a complex amplitude xk[n] and a complex phasor sk[n]. Following definition (1.4a), the phasor reads in continuous and discrete time, respectively:
\[ s_k(t) = \exp\!\left( i \int_0^t 2\pi f_k(\tau) \, d\tau \right) \tag{8.2a} \]
\[ s_k[n] = \exp\!\left( i \sum_{m=1}^{n} 2\pi f_k[m] \, \Delta t \right) \tag{8.2b} \]
In equation (8.1), η[n] is the remaining noise signal, which is to be minimised by a proper choice of the amplitude vectors xk and phasor vectors sk. For extraction of the kth component, the data equation in matrix notation reads:
\[ \mathbf{y} - \mathbf{C}_k \mathbf{x}_k = \boldsymbol{\eta} \tag{8.3} \]
with the complex phasor sk stored on the diagonal of the matrix Ck.
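The discrete-time phasor of (8.2b) amounts to a running sum of the frequency vector. The sketch below assumes a constant 50 Hz track sampled at 1000 Hz purely as a test case; the function name phasor is likewise an assumption.

```python
import cmath
import math


def phasor(freq, dt):
    """Discrete-time phasor of (8.2b): s[n] = exp(i * sum_{m=1..n} 2*pi*f[m]*dt)."""
    phase = 0.0
    s = [complex(1.0, 0.0)]              # the sum is empty for n = 0
    for m in range(1, len(freq)):
        phase += 2.0 * math.pi * freq[m] * dt
        s.append(cmath.exp(1j * phase))
    return s


fs = 1000.0
freq = [50.0] * 21                       # constant 50 Hz track (an assumption)
s = phasor(freq, 1.0 / fs)
# The diagonal of C^H C holds conj(s[n]) * s[n] = |s[n]|^2.
unit_modulus = [abs(z) ** 2 for z in s]
```

Since every entry of the phasor has unit modulus, the product of the diagonal phasor matrix with its conjugate transpose is the identity, whatever the frequency track.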
8.1.2 Structural equation
The structural equation imposes smoothness on the amplitude vector xk using a finite difference sequence. For a first-order difference, the structural equation reads:
\[ \nabla x_k[n] = x_k[n] - x_k[n-1] = \varepsilon_k[n] \tag{8.4a} \]
Higher-order differences are found by applying the backward difference equation¹:
\[ \nabla^{(2)} x_k[n] = x_k[n] - 2 x_k[n-1] + x_k[n-2] = \varepsilon^{(2)}_k[n] \tag{8.4b} \]
\[ \nabla^{(3)} x_k[n] = x_k[n] - 3 x_k[n-1] + 3 x_k[n-2] - x_k[n-3] = \varepsilon^{(3)}_k[n] \tag{8.4c} \]
\[ \nabla^{(4)} x_k[n] = x_k[n] - 4 x_k[n-1] + 6 x_k[n-2] - 4 x_k[n-3] + x_k[n-4] = \varepsilon^{(4)}_k[n] \tag{8.4d} \]
or in general for order p:
\[ \nabla^{(p)} x_k[n] = \sum_{r=0}^{p} (-1)^r \binom{p}{r} x_k[n-r] = \varepsilon^{(p)}_k[n] \tag{8.4e} \]
εk[n] is the error made in the smoothness of the amplitude vector xk[n], which is to be minimised. The system of structural equations for all amplitudes xk[0], xk[1], . . . , xk[N − 1] takes the form of a matrix equation; consider for example the second-order differences:
\[ \begin{bmatrix} 1 & -2 & 1 & 0 & 0 & 0 \\ 0 & 1 & -2 & 1 & 0 & 0 \\ & & \ddots & \ddots & \ddots & \\ 0 & 0 & 0 & 1 & -2 & 1 \end{bmatrix} \begin{bmatrix} x_k[0] \\ x_k[1] \\ \vdots \\ x_k[N-1] \end{bmatrix} = \begin{bmatrix} \varepsilon_k[2] \\ \varepsilon_k[3] \\ \vdots \\ \varepsilon_k[N-1] \end{bmatrix} \tag{8.5} \]
Note that the errors εk[0] and εk[1] cannot be determined, as the first amplitude is xk[0], corresponding with y[0]. Ak is therefore an (N − p) × N matrix, p being the order of the difference. The symbolic notation reads:
\[ \mathbf{A}_k \mathbf{x}_k = \boldsymbol{\varepsilon}_k \tag{8.6} \]
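The structure of the difference matrix follows directly from the binomial coefficients of (8.4e). The sketch below builds a small dense matrix for inspection; the function names are assumptions, and a practical implementation would store only the non-zero bands.

```python
from math import comb


def difference_coefficients(p):
    """Coefficients (-1)^r * C(p, r) of x[n - r], r = 0..p, as in (8.4e)."""
    return [(-1) ** r * comb(p, r) for r in range(p + 1)]


def structural_matrix(n_samples, p):
    """Dense (N - p) x N matrix A of (8.5) and (8.6); row i yields epsilon[i + p]."""
    coeffs = difference_coefficients(p)
    rows = []
    for i in range(n_samples - p):
        row = [0] * n_samples
        for r, c in enumerate(coeffs):
            row[i + p - r] = c           # weight of x[n - r] with n = i + p
        rows.append(row)
    return rows


A = structural_matrix(6, 2)
ramp = [0, 1, 2, 3, 4, 5]                # a linear ramp has zero second difference
errors = [sum(a * x for a, x in zip(row, ramp)) for row in A]
```

A linear ramp gives zero second-order differences, which illustrates that the second-order structural equation does not penalise a linearly varying amplitude.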
8.1.3 Least squares problem
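The data equation and structural equation combine into a least squares problem. The data error (8.3) and structural error (8.6) have the scalar quadratic forms (dropping the subscript k for readability):
\[ \bigl( \mathbf{y}^T - \mathbf{x}^H \mathbf{C}^H \bigr) \bigl( \mathbf{y} - \mathbf{C} \mathbf{x} \bigr) = \boldsymbol{\eta}^H \boldsymbol{\eta} \tag{8.7a} \]
\[ \mathbf{x}^H \mathbf{A}^H \mathbf{A} \mathbf{x} = \boldsymbol{\varepsilon}^H \boldsymbol{\varepsilon} \tag{8.7b} \]
The common unknown variable in (8.7a) and (8.7b) is the complex amplitude vector x. Introducing a selectivity scalar r, a cost function J(x) is composed:
\[ J(\mathbf{x}) = r^2 \boldsymbol{\varepsilon}^H \boldsymbol{\varepsilon} + \boldsymbol{\eta}^H \boldsymbol{\eta} = r^2 \mathbf{x}^H \mathbf{A}^T \mathbf{A} \mathbf{x} + \bigl( \mathbf{y}^T - \mathbf{x}^H \mathbf{C}^H \bigr) \bigl( \mathbf{y} - \mathbf{C} \mathbf{x} \bigr) \tag{8.8} \]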
1 Similar schemes may also be obtained with central or forward difference equations.
The minimum of J(x) is found by setting the derivative dJ/dx^H to zero:
\[ \frac{dJ}{d\mathbf{x}^H} = 2 r^2 \mathbf{A}^T \mathbf{A} \mathbf{x} + 2 \bigl( \mathbf{C}^H \mathbf{C} \mathbf{x} - \mathbf{C}^H \mathbf{y} \bigr) = 0 \tag{8.9} \]
Rewriting the equation for x and observing that C^H C = I, one obtains the linear system:
\[ \bigl( r^2 \mathbf{A}^T \mathbf{A} + \mathbf{I} \bigr) \mathbf{x} = \mathbf{C}^H \mathbf{y} \tag{8.10} \]
\[ \mathbf{x} = \mathbf{B}^{-1} \mathbf{c} \tag{8.11} \]
substituting B = (r²A^T A + I) and c = C^H y in the latter equation.
By construction, det(r²A^T A) = 0 and it can be shown that r²A^T A is a positive semi-definite matrix with 2p + 1 non-zero diagonal bands. The addition of the identity I turns B into a positive definite matrix. Note that matrix B is independent of the tracked frequency in C and of the signal y. Moreover, since B is symmetric and sparse, it is easily factorised by a Cholesky decomposition, and the factorisation can be reused for tracking of orders with different phasors.
8.1.4 Order extraction
The above procedure operates on a single order k. The result is a complex amplitude xk[n] for a signal with phasor sk[n]. The desired signal yk[n] for order k is simply obtained with:
\[ y_k[n] = 2 \cdot \Re \bigl( x_k[n] \, s_k[n] \bigr), \qquad n = 0, \dots, N-1 \tag{8.12} \]
The factor 2 is necessary since only the positive frequency part was shifted and extracted (see figure 8.1). The obtained order yk[n] is the signal tracked at frequency fk[n] with the same phase as it appears in the signal. Depending on the selectivity factor r, some neighbouring frequency content is included as well.
If the aim is to filter out the component with frequency vector fk[n], the filtered signal y(f)[n] is obtained by:
\[ y^{(f)}[n] = y[n] - y_k[n], \qquad n = 0, \dots, N-1 \tag{8.13} \]
or for multiple components k = 1, . . . , K:
\[ y^{(f)}[n] = y[n] - \sum_{k=1}^{K} y_k[n], \qquad n = 0, \dots, N-1 \tag{8.14} \]
The latter procedure can be applied per order in a step-by-step way, provided that the orders do not have close or crossing frequency trajectories. If they do, multiple orders can be extracted simultaneously using a multiple-component algorithm; see section 8.2.4.
8.2 VKF operation
Let us recall equation (8.10) for tracking of a single order k:
\[ \bigl( r^2 \mathbf{A}^T \mathbf{A} + \mathbf{I} \bigr) \mathbf{x}_k = \mathbf{C}_k^H \mathbf{y} \]
At the right-hand side, signal y is pre-multiplied with the conjugate-transposed phasor matrix Ck^H. Theoretically, this operation shifts the tracked frequency fk in y towards zero, or rather: fk and −fk become 0 and −2fk, since the original signal was real. That means that the wave of interest is demodulated to 0 Hz, hence it has become an amplitude envelope at DC.
On the left-hand side, matrix B = r²A^T A + I represents a zero-phase low-pass filter on the amplitude vector x. An increase of r² yields a stronger smoothness constraint, corresponding with a decreasing relative cut-off frequency fc/fs. However, the numerical stability is compromised once matrix B becomes close to singular due to a high r-value. As an illustration: except for the first and last p rows, the entries in row i for p = 2 and r = 100 are:
\[ B_i = \bigl[ \; \cdots \;\; 0 \;\; 10000 \;\; -40000 \;\; 60001 \;\; -40000 \;\; 10000 \;\; 0 \;\; \cdots \; \bigr] \tag{8.15} \]
while r is typically in the range of 10^4 to 10^6 for a reasonably narrow passband (see section 8.2.2 for details on the selection of r).
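The entries of (8.15) can be reproduced from the difference stencil, because the bands of A^T A are the autocorrelation of that stencil. The following sketch uses an assumed helper name and only covers interior rows, away from the first and last p rows.

```python
from math import comb


def interior_row_of_B(p, r):
    """Non-zero entries of an interior row of B = r^2 A^T A + I."""
    d = [(-1) ** q * comb(p, q) for q in range(p + 1)]    # difference stencil
    # The bands of A^T A are the autocorrelation of the stencil.
    bands = [sum(d[i] * d[i + abs(k)] for i in range(p + 1 - abs(k)))
             for k in range(-p, p + 1)]
    row = [r * r * b for b in bands]
    row[p] += 1                                           # identity on the diagonal
    return row


row = interior_row_of_B(2, 100)
# Relative weight of the identity on the diagonal; it shrinks as r grows.
identity_share = 1.0 / row[2]
```

For r = 100 the identity contributes less than 0.002% of the diagonal entry. This illustrates why B approaches the singular matrix r²A^T A for large r.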
The procedure in the frequency domain is illustrated in figure 8.1. A single sinusoidal wave is considered. The red peaks represent the original signal y, the blue peaks are the modulated part C^H y. The yellow area is the low-pass filter on xk that is imposed by B.
Figure 8.1: The VKF shifts the frequency content to DC and applies low-pass filtering. Axes: frequency [Hz] (−fs/2 to fs/2, with the cut-off fc marked) against amplitude [dB] (levels 0, −3 and −∞ marked).
8.2.1 Solving the linear system
The linear system to solve is typically very large, sparse and only lightly coupled. As already mentioned, it is common practice to factorise matrix B by a Cholesky decomposition. This is possible since B is symmetric and positive definite. The decomposition is B = LU with U = L^T. Once the triangular matrices are calculated, an efficient forward reduction and backward substitution can be carried out:
\[ \mathbf{L} \mathbf{c}' = \mathbf{c} \tag{8.16a} \]
\[ \mathbf{U} \mathbf{x} = \mathbf{c}' \tag{8.16b} \]
with c = C^H y and c′ as an intermediate vector. The forward reduction is computed with the coefficients of matrix L. L is a lower triangular matrix with p non-zero diagonals below the main diagonal, meaning that most of the lower triangle is zero. For a filter with p = 2, the computation evolves as follows:
\[ \begin{aligned} c'_1 &= c_1 / L_{1,1} \\ c'_2 &= \bigl( c_2 - L_{2,1} \, c'_1 \bigr) / L_{2,2} \\ c'_3 &= \bigl( c_3 - L_{3,1} \, c'_1 - L_{3,2} \, c'_2 \bigr) / L_{3,3} \\ &\;\; \vdots \\ c'_N &= \bigl( c_N - L_{N,N-2} \, c'_{N-2} - L_{N,N-1} \, c'_{N-1} \bigr) / L_{N,N} \end{aligned} \]
In this process, vector c undergoes low-pass filtering and some delay is introduced. Steady-state values for the entries of L can be defined:
\[ l_0 = \lim_{j \to \infty} L_{j,j}, \qquad l_1 = \lim_{j \to \infty} L_{j+1,j}, \qquad l_2 = \lim_{j \to \infty} L_{j+2,j} \tag{8.17} \]
Using these coefficients, a transfer function HF(z) can be determined for the forward operation of (8.16a) by applying the transformation c′[j−k] = c′[j] z^(−k) to the z-domain:
\[ H_F(z) = \frac{c'(z)}{c(z)} = \frac{1}{l_0 + l_1 z^{-1} + l_2 z^{-2}} \tag{8.18} \]
The backward substitution is performed similarly but in the opposite direction using U:
\[ \begin{aligned} x_N &= c'_N / U_{N,N} \\ x_{N-1} &= \bigl( c'_{N-1} - U_{N-1,N} \, x_N \bigr) / U_{N-1,N-1} \\ x_{N-2} &= \bigl( c'_{N-2} - U_{N-2,N-1} \, x_{N-1} - U_{N-2,N} \, x_N \bigr) / U_{N-2,N-2} \\ &\;\; \vdots \\ x_1 &= \bigl( c'_1 - U_{1,2} \, x_2 - U_{1,3} \, x_3 \bigr) / U_{1,1} \end{aligned} \]
Since U = L^T, the steady-state values u0, u1, u2 equal l0, l1, l2 from (8.17). The transfer function for the backward substitution of (8.16b) is:
\[ H_B(z) = \frac{x(z)}{c'(z)} = \frac{1}{u_0 + u_1 z + u_2 z^{2}} \tag{8.19} \]
which is (8.18) in the opposite direction. In effect, c is processed twice and the delays cancel each other out, resulting in zero-phase filtering with transfer function:
\[ H(z) = \frac{x(z)}{c(z)} = H_F(z) \, H_B(z) \tag{8.20} \]
The two transfer functions have the same gain characteristics. The frequency response of the combined transfer function is found by substitution of the complex quantity z = e^(iΩ):
\[ \bigl| H(e^{i\Omega}) \bigr| = \bigl| H_B(e^{i\Omega}) \bigr|^2 = \left| \frac{1}{u_0 + u_1 e^{i\Omega} + u_2 e^{2i\Omega}} \right|^2 \tag{8.21} \]
Noticing that Ω = 2πf/fs, the frequency response characteristics can be obtained from the LU factorisation coefficients u0, u1, u2, or u0, . . . , up in general for order p.
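The factorisation and the two substitution sweeps of (8.16) can be put together in a compact single-order sketch. It is a minimal illustration under stated assumptions, not the implementation used for the results in this chapter: the function name vold_kalman, the dense storage of B and L (chosen for readability; only the bands are non-zero) and the synthetic two-tone test signal are all assumptions.

```python
import cmath
import math
from math import comb


def vold_kalman(y, freq, fs, p, r):
    """Single-order VKF sketch: solve (r^2 A^T A + I) x = C^H y, cf. (8.10)."""
    n = len(y)
    # Phasor of (8.2b) by a running phase sum.
    s, phase = [1 + 0j], 0.0
    for m in range(1, n):
        phase += 2.0 * math.pi * freq[m] / fs
        s.append(cmath.exp(1j * phase))
    # B = r^2 A^T A + I, assembled from the difference stencil of (8.4e).
    d = [(-1) ** q * comb(p, q) for q in range(p + 1)]
    B = [[0.0] * n for _ in range(n)]
    for i in range(n - p):
        for a in range(p + 1):
            for b in range(p + 1):
                B[i + a][i + b] += r * r * d[a] * d[b]
    for i in range(n):
        B[i][i] += 1.0
    # Banded Cholesky factor L (B = L L^T) with p sub-diagonals.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lo = max(0, i - p)
        for j in range(lo, i + 1):
            acc = B[i][j] - sum(L[i][k] * L[j][k] for k in range(lo, j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]
    # Right-hand side c = C^H y, forward reduction and backward substitution.
    c = [s[i].conjugate() * y[i] for i in range(n)]
    cp = [0j] * n
    for i in range(n):
        acc = c[i] - sum(L[i][k] * cp[k] for k in range(max(0, i - p), i))
        cp[i] = acc / L[i][i]
    x = [0j] * n
    for i in reversed(range(n)):
        acc = cp[i] - sum(L[k][i] * x[k] for k in range(i + 1, min(n, i + p + 1)))
        x[i] = acc / L[i][i]
    order = [2.0 * (x[i] * s[i]).real for i in range(n)]   # extraction, cf. (8.12)
    return x, order


# Synthetic test (an assumption): a 50 Hz target plus a 120 Hz disturbance.
fs, n = 1000.0, 400
t = [i / fs for i in range(n)]
target = [0.8 * math.cos(2 * math.pi * 50 * ti + 0.3) for ti in t]
y = [target[i] + 0.5 * math.cos(2 * math.pi * 120 * t[i]) for i in range(n)]
x, order = vold_kalman(y, [50.0] * n, fs, p=2, r=652.2)
mid_error = max(abs(order[i] - target[i]) for i in range(150, 250))
```

Away from the block edges, the complex amplitude settles at half the real amplitude of the target (0.4 at a phase of 0.3 rad), and the extracted order follows the 50 Hz component closely. The value r = 652.2 is taken from the bandwidth example of section 8.2.2.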
Figure 8.2: Frequency response of the VKF. Axes: frequency [Hz] (100 to 200) against magnitude [dB] (−60 to 0, with −3 marked). Legend: 1-pole, r = 2.0e+001; 2-pole, r = 6.5e+002; 3-pole, r = 2.1e+004; 4-pole, r = 6.6e+005.
8.2.2 Bandwidth and roll-off
The −3 dB bandwidth of the filter is controlled by the value of r and is usually expressed relative to the Nyquist frequency fs/2:
\[ \Delta f = \frac{f_c}{f_s / 2} \tag{8.22} \]
Equations exist that relate r to the coefficients u0, . . . , up in (8.21); see [19] for the derivation. With the following equations, a desired bandwidth Δf can be translated into the required r value:
\[ \begin{aligned} p = 1: \quad r^2 &= \frac{\sqrt{2} - 1}{2 - 2\cos(\pi \Delta f)} \\ p = 2: \quad r^2 &= \frac{\sqrt{2} - 1}{6 - 8\cos(\pi \Delta f) + 2\cos(2\pi \Delta f)} \\ p = 3: \quad r^2 &= \frac{\sqrt{2} - 1}{20 - 30\cos(\pi \Delta f) + 12\cos(2\pi \Delta f) - 2\cos(3\pi \Delta f)} \\ p = 4: \quad r^2 &= \frac{\sqrt{2} - 1}{70 - 112\cos(\pi \Delta f) + 56\cos(2\pi \Delta f) - 16\cos(3\pi \Delta f) + 2\cos(4\pi \Delta f)} \end{aligned} \]
Hence, a higher value of r results in a smaller bandwidth but also a slower varying amplitude x. The latter may be a drawback when the signal is subject to large amplitude fluctuations in time.
As an illustration, figure 8.2 shows the bandwidth and roll-off for filters with p = 1, . . . , 4. The signal is a swept sine from 100 to 200 Hz, sampled at fs = 1000 Hz, while the tracked frequency is 150 Hz. The absolute bandwidth for all cases is fc = 5 Hz, hence the relative bandwidth is Δf = 5/500 = 1%, resulting in the following values for r:
r(p=1) = 20.5,  r(p=2) = 652.2,  r(p=3) = 20759.5,  r(p=4) = 661817.5
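The relation between bandwidth and r can be checked with a few lines of Python. The sketch relies on two observations that are not stated above and should be read as assumptions of this example. First, the denominators of the four expressions equal (2 − 2cos(πΔf))^p, as follows from expanding |1 − e^(iΩ)|^(2p). Second, the interior rows of B act as a zero-phase filter with gain 1/(1 + r²(2 − 2cos Ω)^p). The function names are hypothetical.

```python
import math


def selectivity(p, rel_bandwidth):
    """r for a -3 dB bandwidth rel_bandwidth = fc / (fs / 2) and filter order p."""
    base = 2.0 - 2.0 * math.cos(math.pi * rel_bandwidth)
    return math.sqrt((math.sqrt(2.0) - 1.0) / base ** p)


def gain(p, r, rel_freq):
    """Zero-phase gain of the interior rows of B at rel_freq = f / (fs / 2)."""
    base = 2.0 - 2.0 * math.cos(math.pi * rel_freq)
    return 1.0 / (1.0 + r * r * base ** p)


rel_bw = 5.0 / 500.0                      # fc = 5 Hz at fs = 1000 Hz
r_values = {p: selectivity(p, rel_bw) for p in (1, 2, 3, 4)}
cutoff_gain = {p: gain(p, r_values[p], rel_bw) for p in (1, 2, 3, 4)}
```

At the cut-off the gain equals 1/√2 for every order, which is the −3 dB point. The factored denominator is less sensitive to cancellation than the expanded cosine sums, so the trailing digits of r may differ slightly from the values quoted above, most noticeably for p = 4.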
Figure 8.3: VKF impulse response. Axes: time [s] (0 to 1) against amplitude (0 to 1). Legend: 1-pole, fc = 1 Hz; 1-pole, fc = 5 Hz; 3-pole, fc = 1 Hz; 3-pole, fc = 5 Hz.
The filter roll-off is 20p dB per decade, or 40p dB on a quadratic (power) scale. This corresponds with
the number of poles, since it is a quadratic filter.
The response to an amplitude change is shown in figure 8.3. A wave with f = 100 Hz undergoes an amplitude step from 0.1 to 1.0 at t = 0.5 s (the black dashed line). The response is shown for the following filter settings:
– p = 1 and fc = 1 Hz
– p = 1 and fc = 5 Hz
– p = 3 and fc = 1 Hz
– p = 3 and fc = 5 Hz
It can be concluded that a high frequency selectivity (1 Hz) results in a slow response to amplitude changes. In addition, the third-order filter shows some overshoot. This is in accordance with filter theory.
8.2.3 Time-varying bandwidth
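Often it is required to change the bandwidth over time, for example in situations with very high slew rates or inaccurate frequency vectors. Recalling the cost function (8.8), the introduction of a diagonal matrix R, of size (N − p) × (N − p) to match the structural error ε, enables us to choose r-values per sample:
\[ J(\mathbf{x}) = \boldsymbol{\varepsilon}^H \mathbf{R}^T \mathbf{R} \boldsymbol{\varepsilon} + \boldsymbol{\eta}^H \boldsymbol{\eta} = \mathbf{x}^H \mathbf{A}^T \mathbf{R}^T \mathbf{R} \mathbf{A} \mathbf{x} + \bigl( \mathbf{y}^T - \mathbf{x}^H \mathbf{C}^H \bigr) \bigl( \mathbf{y} - \mathbf{C} \mathbf{x} \bigr) \tag{8.23} \]
After setting the derivative to zero, one obtains a filter matrix B = (A^T R^T R A + I) that is still positive definite, since R^T R is positive and diagonal.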
8.2.4 Multi-order tracking
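The algorithm described in section 8.1 is capable of tracking a single order from a signal. Successive orders can be tracked or extracted in a step-by-step way, as long as the orders are not close or crossing. Equation (8.8) can be modified for simultaneous extraction of K orders:
\[ J(\mathbf{x}) = \sum_{k=1}^{K} r_k^2 \, \boldsymbol{\varepsilon}_k^H \boldsymbol{\varepsilon}_k + \boldsymbol{\eta}^H \boldsymbol{\eta} = \sum_{k=1}^{K} r_k^2 \, \mathbf{x}_k^H \mathbf{A}_k^T \mathbf{A}_k \mathbf{x}_k + \Bigl( \mathbf{y}^T - \sum_{k=1}^{K} \mathbf{x}_k^H \mathbf{C}_k^H \Bigr) \Bigl( \mathbf{y} - \sum_{k=1}^{K} \mathbf{C}_k \mathbf{x}_k \Bigr) \tag{8.24} \]
The derivative with respect to the amplitude xi becomes:
\[ \frac{dJ}{d\mathbf{x}_i^H} = r_i^2 \, \mathbf{A}_i^T \mathbf{A}_i \mathbf{x}_i + \mathbf{C}_i^H \sum_{k=1}^{K} \mathbf{C}_k \mathbf{x}_k - \mathbf{C}_i^H \mathbf{y} = 0, \qquad i = 1, \dots, K \tag{8.25} \]
hence coupling between the amplitude vectors is introduced. Still, Ci^H Ck = I for i = k. The global solution reads:
\[ \begin{bmatrix} \mathbf{B}_1 & \mathbf{C}_1^H \mathbf{C}_2 & \cdots & \mathbf{C}_1^H \mathbf{C}_K \\ \mathbf{C}_2^H \mathbf{C}_1 & \mathbf{B}_2 & \cdots & \mathbf{C}_2^H \mathbf{C}_K \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{C}_K^H \mathbf{C}_1 & \mathbf{C}_K^H \mathbf{C}_2 & \cdots & \mathbf{B}_K \end{bmatrix} \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \vdots \\ \mathbf{x}_K \end{bmatrix} = \begin{bmatrix} \mathbf{C}_1^H \mathbf{y} \\ \mathbf{C}_2^H \mathbf{y} \\ \vdots \\ \mathbf{C}_K^H \mathbf{y} \end{bmatrix} \tag{8.26} \]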
The matrix on the left-hand side has grown to a dimension of NK × NK and is still complex
Hermitian. However, it is no longer banded diagonal, which implies that the solution can not be
obtained efficiently by forward reduction and backward substitution with a Cholesky factorisation.
The most effective way to solve (8.26) appears to be the preconditioned conjugate gradient method
(PCG), as discussed in [10].
The major benefit of the multi-component algorithm is the ability to track crossing orders while
preserving the correct amplitude. The off-diagonal terms take care of energy spreading for order
phasors that are not orthogonal. Note however that the cross terms CHi Ck should disappear for
harmonics, since harmonics are by definition orthogonal to each other.
8.2.5 Example 12: Helicopter signal separation
As an illustration of VK filtering, a recording of a fly-by helicopter is considered. The original signal is denoted by y[n], sampled at fs = 4096 Hz. The STFT of y[n] is shown in figure 8.4.
The original signal y is separated into three signals: the main rotor signal y(m), the tail rotor signal y(t) and the remaining signal y(r).
Main rotor
The fundamental frequency vector of the main rotor was determined by the time-domain tracking technique of section 7.2.3 and stored as vector f0(m). A variable bandwidth vector was defined that varies between 1 and 4 Hz, based on the rate of change of f0(m); relative to the Nyquist frequency of 2048 Hz, the bandwidth therefore lies between 0.05 and 0.2%. The filter order is p = 2; the first K = 120 orders are extracted.
Figure 8.4: STFT of the original fly-by helicopter signal. Axes: frequency [Hz] (0 to 1000) against time [s] (5 to 45).
The STFT of the extracted signal y(m) and the filtered signal y − y(m) are shown in figure 8.5(a)
and 8.5(b). The main rotor signal contains a large part of the energy (see section 2.4.3): 84.3% of the
original signal.
Tail rotor
Having subtracted the main rotor signal, the tail rotor harmonics with fundamental starting at 80 Hz are now much easier to identify in figure 8.5(b). A frequency vector f(t) is obtained following the same approach as before. This time, 30 harmonics are included. The filtering is performed on the newly obtained signal y − y(m). The resulting tracked signal is y(t), the remaining signal is y(r) = y − y(m) − y(t).
The STFTs are shown in figure 8.6(a) and 8.6(b). The tail rotor energy is 54.9% of the previously filtered signal and only 7.6% of the original signal.
Figure 8.5: Extraction of main rotor signal using VKF. Panels: (a) STFT of main rotor signal and (b) STFT of filtered signal, both showing frequency [Hz] (0 to 1000) against time [s] (5 to 45).
Figure 8.6: Extraction of tail rotor signal. Panels: (a) STFT of tail rotor signal and (b) STFT of remaining signal, both showing frequency [Hz] (0 to 1000) against time [s] (5 to 45).
Remaining signal
The remaining signal y(r) still contains audible content of the main and tail rotor, but much less than before and mostly in a higher frequency range. Nevertheless, some chirping birds can now be distinguished that were almost drowned out in the original signal.
8.3 Summary
Vold-Kalman filtering can be used to extract instationary periodic components from a signal. The
algorithm is formulated as a least-squares problem, that can be solved efficiently using a Cholesky
factorisation. The filtering process is controlled by a frequency vector and a (possibly) time-varying
bandwidth. The obtained order has zero phase delay and can thus be subtracted from the original
signal to obtain a filtered signal.
The VKF can be interpreted as a running band-pass filter on a single order. If crossing orders appear
in the signal, a multi-order implementation can be employed. However, this is numerically more
demanding and in case of harmonic orders unnecessary, as harmonic orders are orthogonal.
The VKF was applied to a recording of a fly-by helicopter and appeared to be able to separate the
independent contributions of the main and tail rotor from the signal.
Appendices
Appendix A
C-code
A.1 hdtft
The following code is used to efficiently calculate equation (4.12), i.e. the H partials for a tonal component at f0. It is wrapped by a Matlab® gateway function (not shown here) that accepts the syntax:
� � ����������������
� is the (windowed) signal block, �� the sample rate in Hertz, � is a 1 × F vector with one or more
fundamental frequencies and � is a H × 1 vector specifying the harmonics. The algorithm returns a
H × F array � containing the harmonic partials estimated for the given fundamental frequencies.
�������� ���
�������� �� �
������� ���� ����������������������
���� ���� ���!�� "�# $ ���!�� "�� $ ���!�� "%$ ���!�� " $
���!�� �& $ ���!�� "�$ '(�)� *$ '(�)� +$ '(�)� ,-
.
'(�)� �$ /$ �0
��# �120 �3,0 �44- .
��# /120 /3+0 /44- .
��# �120 �3*0 �44- .
" �#4�"+4/- 41 ��& �" ���� " " 4/- " " �4�- " �5�&- " " %4�-0
" ��4�"+4/- 41 &�� �" ���� " " 4/- " " �4�- " �5�&- " " %4�-0
6
" �#4�"+4/- 1 " �#4�"+4/- 5 *0
" ��4�"+4/- 1 " ��4�"+4/- 5 *0
6
6
6
Appendix B
Matlab® functions
B.1 window
This function returns time-domain windows as discussed in section 2.5. Many popular windows
are implemented as formulated by [12]. Also, the cosine-sigma window is available, as proposed in
section 2.5.6.
�������� ' 1 '����' �%7� $ *$ ��#�#8�� -
9:�*;<: =���#���& � ��� '����'�
9 : 1 :�*;<: >?�@ $ *- #���#�& � � ��� '����' '�� * 7����& $
9 &7������� !% >?�@ � > � &�77�#��� '����' �%7�& �#�A
9
9 B�����8���# *��� -
9 +�����8 +��� -
9 C�&��� 7-
9 C�&���(�8� 7$ &�8�-
9 +���8
9 D���/��+�##�& E�� �� �� ��F-
9 G���)�& B�����-
9 B��&)
9 =��&&��� &�8�-
9 D�#�����
9 >#���8���#
9 ����H�������&&��
9 D� ��
9 +�����8���&&�� �-
9 C �!%& �� ��7 �-
9
9 : 1 :�*;<: >?�@ $ *$ IB=� $ IB=� - ����7�& ���������� �#8����& ��
9 ����#�� � � '����' $ �& & �'� �!���� (�8� �& ��'�%& #������� �� �
9 � &�� '����' ���8� �
9
9 D% ������� $ � +�����8 '����' '�� * 1 ���� �& #���#��� �
9
9 ,�# �7���������� �� � � '����' ��������& $ #���# ��A
9 ,�J� +�##�&A <� � � �&� �� '����'& ��# �#���� ����%&�& '�� � �
9 ��&�#��� ,��#��# �#��&��# � �#�� � �� � � �@@@ �� �- $ �����
9
9 :#����� !%A ���#��� ��� ��# (��K& $ �2���
�� ��#8�� 3 �
�%7� 1 L+�����8 L0
���
�� ��#8�� 3 �
* 1 ����0
��&�
* 1 ���!�� *-0
���
� 1 2A * M�--0
� 1 �5*0 9 2 31 � 3 �
�� 1 � M 2��0
�%7� 1 ��������&�#��8 � �# �%7� -$.LB�����8���# L$ L*��� L$ L+�����8 L$ ���
L+���8 L$ LC�&��� L$ LC�&���(�8� L$ LD���/��+�##�& L$ LG���)�& L$ ���
LB����� L$ LB��&)L$ L=��&&��� L$ LD�#����� L$ L>#���8���# L$ ���
L����H�������&&�� L$ LD� ��L$ L+�����8���&&�� L$ LC �!%& �� L6-0
&'��� �%7�
��&� .LB�����8���# L$L*��� L6
' 1 ���& �$*-0
��&� L+�����8 L
' 1 2�� M 2�� " ��& �"7�"�--0
��&� LC�&���L
�� ��#8�� 11 �$ 7 1 ��#�#8�� .�60 ��&� 7 1 �0 ���
' 1 &�� 7�"�-�N70
��&� LC�&���(�8� L
�� ��#8�� O1 �$ & 1 ��#�#8�� .�60 ��&� & 1 2��0 ���
�� ��#8�� O1 �$ 7 1 ��#�#8�� .�60 ��&� 7 1 �20 ���
' 1 ��& �� �5 &"&P#� 7---�N70
' 1 ' �" �!& ��- 3 &"&P#� 7-"2��" 7�-0
��&� L+���8 L
' 1 2��� M 2��� " ��& �"7�"�--0
��&� LD���/��+�##�& L
�� ��#8�� 11 �$ � 1 ��#�#8�� .�60 ��&� � 1 E2��� 2�� 2�2� 2F0 ���
' 1 � �- M � �- " ��& �"7�"�-- 4 ���
� �- " ��& �" 7�"�- M 4 � �- " ��& �"7�"�-0
��&� .LG���)�& L$ LB����� L6
' 1 &��� �"��-0
��&� LB��&)L
' 1 � M �"�� -�N�0
��&� L=��&&��� L
�� ��#8�� 11 �$ & 1 ��#�#8�� .�60 ��&� & 1 2��0 ���
' 1 ��7 M2�� " ��5&-�N�-0
��&� .L>#���8���# L$LD�#����� L6
' 1 � M �" �!& ��-0
��&� L����H�������&&�� L
' 1 �!& �� - 312���- �" � M�" �!& �" �� --�N� �" �M�!& �"�� --- 4 ���
�!& �� - O2���- �" �" � M �!& �" �� --�N�-0
��&� LD� ��L
' 1 �M �!& �"�� -- �" ��& 7�"�!& �" ��-- 4 �57� " &�� 7�"�!& �"�� --0
��&� L+�����8���&&�� L
�� ��#8�� 11 �$ � 1 ��#�#8�� .�60 ��&� � 1 2��0 ���
' 1 2�� " � 4 ��& �" 7�"��-- �" ��7 M�"�" �!& �� --0
��&� LC �!%& �� L
�� ��#8�� 11 �$ ��7 � 1 ��#�#8�� .�60 ��&� ��7 � 1 �0 ���
!��� 1 ��& �5*"���& �2N ��7 �--0
: 1 M�-�N� �" ��& *"���& !��� "��& 7�"�--- �5 ��& *"���& !��� --0
' 1 #��� ���� :--0
' 1 ' �" *5&� '-0
�� �#'�&�
�##�# L*�� � ����� '����' �%7� �L-
���
9 *�#���)� '����' �� � ;C- 1 2 �DA
9 ' 1 ' �" *5&� '-0
���
B.2 interpolant
The following function returns an interpolation function for a given signal y[n] and time vector t[n].
The obtained function can be used for rapid non-uniform interpolation. The syntax is:
��� � ���������������������������
������ � ������������
The first command generates an 8th-order spline interpolation function from the signal vector y and
time vector t. The second command evaluates the function at the time instances passed as its argument.
�������� ��� 1 ����#7����� %$�$�� �� $�#�-
9�*>@B�<GI*> =���#���& � ����#7������� �������� �
9 ,Q* 1 �*>@B�<GI*> ?$>$�@>+<; $<B;- #���#�& � �������� ����� � ��
9 ����7�& � �����# �� ��� 7����&� ? �& � � &�8��� ���� $ > � �
9 ��##�&7�����8 ��� ���� � �@>+<; &7������& � � ����#7������� �� ���
9 <B; ��� !� � � #���� �� �7&�7���8 $ � � &7���� �#��# �# � � ��!�#
9 �� 7����& �� � � &��� &������ A
9
9 �@>+<; <B;
9 M ���#�&� �7&�7���8
9 M �����# �7&�7���8
9 M 7� �7 �7&�7���8
9 M ��!�� �7&�7���8
9 M &7���� &7���� �#��#
9 M &��� 7����& �� &��� &������
9
9 �� > �& � &����# $ �� �& ��/�� �& � � &�7���8 #��� ,( ��� � � ���
9 �����# �& 8���� !% > 1 2A,( A G@*=>+ ?-M�-",(-� �� > 1 M�$ #�8���#
9 �M !�&�� �������8 �& �7������� �
9
9 D% ������� $ � �M7���� &7���� ����#7������� �& �7������� �
9
9 (�� �� ��& �&� � � ��8�#�� & �� � � �I>GID C�#�� ,�����8 >���!�� �
9 (�� ��&� A ����#7� $ #�&�7�� �
9
9 :#����� !%A ���#��� ��� ��# (��K& $ �2���
�� ��#8�� 3 �$ �#� 1 �0 ���
�� ��#8�� 3 �$ �� �� 1 L&7����L0 ���
�� ��#8�� 3 �$ � 1 �0 ���
�� ��#8�� 3 �$ �##�# L(7����% �� ���&� %�L-$ ���
9 C ��/ ��7��&
�����������#�!���& %$.L���#�� L6$.L#��� L$L�����#L6-0
�����������#�!���& �$.L���#�� L6$.L#��� L6-0
�� �� 1 ��������&�#��8 �� �� $ .L���#�&� L$L�����#L$L7� �7L$ ���
L��!��L$L&7����L$L&��� L6-0
�����������#�!���& �#� $.L���#�� L6$.L�����8����� L$L����8�# L$L&����#L6-0
9 *�!�# �� &�7��&
� 1 ���� %-0
9 (�� �#��# 5 &�7�#&�7���8
�� �&�!�# �� �� $.L&7���� L$L&��� L6-
# 1 �0
P 1 �#�0
��&�
# 1 �#�0
���
9 I77�% &�7�#&�7���8 '�� #��� #
�� # R1 �
% 1 #�&�7�� %$#$�-0
���
9 (�7�#&�7��� - ��� �����#
�� �&&����# �-
9 � �& � � &�7�� #���
�& 1 �0
�� �& 3 2
9 B�8���# �������8 &�%��
� 1 �A �"#-0
� 1 ��5#0
��&�
9 >�� �������8 &�%��
� 1 2A �"# M �-0
� 1 ��5 �&"#-0
���
��&�
9 � �& � � ��� �����#
>� 1 � �-0
�& 1 �5 � �-M >�-0
� 1 2A �"# M �-0
� 1 ��5 �&"#- 4 >�0
���
9 C��&�#��� ����#7������� �!K���
&'��� �� ��
��&� L&��� L
�� P 11 �
P 1 ���0
���
��� 1 S ��- �&�� �� $�-�" ����#7&��� �$%$�� $P-0
��&� L&7����L
&'��� P
��&� 2
��� 1 ����#7����� %$�$L���#�&� L$�$&����� -0
#���#�
��&� �
��� 1 ����#7����� %$�$L�����#L$�$&����� -0
#���#�
���
77 1 &7�7� P$�$%-0
��� 1 S ��- �&�� �� $�-�" &7��� 77 $��-0
�� �#'�&�
77 1 ����#7� �$%$�� �� $L77L-0
��� 1 S ��- �&�� �� $�-�" 77��� 77 $��-0
���
�������� � 1 �&�� �$!-
� 1 � O ! �-- �" � 3 ! ��� ---0
���
�������� � 1 ����#7&��� �$%$�� $P-
� 1 ���8� ��-0
�� 1 � �-M� �-0
� 1 )�#�& �$�-0
��# � 1 �A�
�� 1 �� �- M ��L-�5��0
��� 1 ���� �!& ��- 3 P5�-0
� �- 1 % ���- " &��� �� ��� --0
���
���
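The idea behind this section, wrapping sampled data in an object that can afterwards be evaluated at arbitrary (non-uniform) time instances, can be sketched in C as follows. Note the swap in technique: the sketch uses plain linear interpolation on a uniform grid rather than the spline or sinc methods of the Matlab function. The type `interpolant_t`, the function `interpolant_eval` and the choice to return zero outside the sampled range (with inclusive bounds) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Uniformly sampled signal: y[i] is the sample at time t0 + i/fs. */
typedef struct {
    const double *y;   /* samples                   */
    size_t n;          /* number of samples (>= 2)  */
    double t0;         /* time of the first sample  */
    double fs;         /* sample rate in Hz         */
} interpolant_t;

/* Evaluate the signal at an arbitrary time t by linear interpolation.
 * Times outside the sampled range evaluate to zero. */
double interpolant_eval(const interpolant_t *p, double t)
{
    assert(p->n >= 2 && p->fs > 0.0);
    double pos = (t - p->t0) * p->fs;            /* fractional sample index */
    if (pos < 0.0 || pos > (double)(p->n - 1)) {
        return 0.0;
    }
    size_t i = (size_t)pos;                      /* left neighbour */
    if (i == p->n - 1) {
        return p->y[i];                          /* exactly on the last sample */
    }
    double frac = pos - (double)i;
    return (1.0 - frac) * p->y[i] + frac * p->y[i + 1];
}
```

Because the structure only stores a pointer to the samples, it is cheap to create once and then evaluate many times, which is the usage pattern required by time warping.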
B.3 timewarp
The function timewarp uses an interpolant function obtained by interpolant (see appendix B.2)
and returns time-warped blocks for B time instances t_b and linear chirp rates α_b. The block size is
specified by M; an array of size B × M is returned.
function X = timewarp(fun, fs, M, t, alpha)
%TIMEWARP Return time-warped blocks of a signal.
%   X = TIMEWARP(FUN,FS,M,T,ALPHA) returns time-warped blocks from the
%   interpolant function FUN. FS is the sample rate, M the block size of
%   the warped blocks, T a vector with center instances and ALPHA the
%   corresponding linear chirp rates. T and ALPHA can either be scalars
%   or vectors of length B. X is a B x M array with time-warped blocks.
%
%   See also: INTERPOLANT, WARPEDTIMBRE.
%
%   Written by: Maarten van der Seijs, 2011
�#%
� 1 � A-0
��7 � 1 ��7 � A-0
9 *�!�# �� !���/&
D 1 �� E ���� �-$���� ��7 � -F-0
� 1 ���& D $�- �" �0
��7 � 1 ���& D$�- �" ��7 �0
���� 9��/
�##�# L� ��� ��7 � & ���� ��� � � &�� ���8� �L-
���
9 C��7 ��7 � �� ��� &�77�#� ����
��7 ���� 1 �5 �"�5�&-0
�� �� �!& ��7 �-- O ��7 ���� 0
��7 � �!& ��7 �- O ��7 ���� - 1 ��7 ���� 0
�7#���� L:IB*�*=A ��7 � �& ���77�� �� 9��U�L$��7 ���� -
���
9 G���� ��� �����#
�! 1 2A � M�-- M �5�-5 �&0
9 :�#7 ��������
'�#7���� 1 S �- M��5� 4 &P#� � 4 ��"��" �!-�5�0
9 �#� M�������� ��� M'�#7�� !���/&
T 1 )�#�& D$�-0
��# !� 1 �AD
�� �!& ��7 � !�-- 3 2�22�0 9 ;� ��� ��� '�#7
��'�#7 1 �!0
��&�
��'�#7 1 '�#7���� ��7 � !� --0
���
9 ( ��� �� !���/ ���
��'�#7 1 ��'�#7 4 � !�-0
9 ����#7�������
� 1 ��� ��'�#7 -0
9 *�#���)����� ��# �����#��%
9 � 1 �!& � 4 ��7 � !� -�"�! --�N M�5�- �" �0
9 (��#� �� ��#��
T !� $A- 1 �0
���
���
B.4 warpedtimbre
The following function computes the warped timbre (see section 4.3) for time points t_b, centre
frequencies f_b and chirp rates α_b, using an interpolant function obtained from interpolant. The
harmonics are specified by h; the window size is M.
function c = warpedtimbre(fun,fs,t,f,alpha,h,M,windowtype)
%WARPEDTIMBRE Determines the time-warped timbre of a signal.
%   C = WARPEDTIMBRE(FUN,FS,T,F,ALPHA,H,M,WDWTYPE) returns the warped
%   timbre of a signal, normalised to the phase of the fundamental
%   partial. The following arguments must be specified:
%
%   FUN      Interpolant function obtained by INTERPOLANT
%   FS       Sample rate in Hertz
%   T        Center time or vector of center time points
%   F        Frequency or vector of frequencies
%   ALPHA    Chirp rate or vector of chirp rates
%   H        Number of harmonics
%   M        Window size
%   WDWTYPE  Window type
%
%   T, F and ALPHA can be either vectors or scalars of length B. The
%   returned timbre C is size H x B.
%
%   See also: INTERPOLANT, TIMEWARP.
%
%   Written by: Maarten van der Seijs, 2011
9 H������� ��7��
�����������#�!���& ��� $.L��������� ����� L6$.L&����#L6-
�����������#�!���& �& $.L���#�� L6$.L7�&����� L$L#��� L$L����8�# L$L&����#L6-
�����������#�!���& �$.L���#�� L6$.L�����8����� L$L#��� L$L�����# L6-
�����������#�!���& �$.L���#�� L6$.L�����8����� L$L#��� L$L�����# L6-
�����������#�!���& ��7 � $.L���#�� L6$.L�����8����� L$L#��� L$L�����#L6-
�����������#�!���& $.L���#�� L6$.L7�&����� L$L#��� L$L����8�# L$L�����#L6-
�����������#�!���& �$.L���#�� L6$.L7�&����� L$L#��� L$L����8�# L$L&����#L6-
�� ��#8�� 3 �
'����'�%7� 1 L=��&&��� L0
���
� 1 � A-0
� 1 � A-0
��7 � 1 ��7 � A-0
9 *�!�# �� ��!#� 7����&
D 1 �� E ���� �-$���� �-$���� ��7 � -F-0
� 1 ���& D$�- �" �0
� 1 ���& D$�- �" �0
��7 � 1 ���& D$�- �" ��7 �0
9 *�!�# �� �#����&
�� �&&����# -
+ 1 0
1 �A 0
��&�
+ 1 ���� -0
���
9 �#� M ��������
� 1 ��7��� )�#�& +$D-$2-0
9 ;����� '����'
' 1 ����'����' '����'�%7� $�-0
9 <!���� '�#7�� ��� M!���/&
T 1 ���'�#7 ��� $ �& $ �$ �$ ��7 �-0
9 B������
��# !� 1 �AD
9 ��#��# ��7�������
� A$!�- 1 ���� ���� '�"T !� $A-$�& $� !�-$ -0
��# � 1 �A+
�� �- 11 �
�� 1 � �$!�- �5 �!& � �$!�--0 9 C�7��� ��8�� �� ����������
���
9 *�#���)� 7 �&� �� ����������
� � $!�- 1 � � $!�- 5 ��N �--0
���
���
���
B.5 vkf
This function implements the Vold-Kalman filter as discussed in chapter 8. The syntax is:
� � ��������������� ���������
!��" � ��������������� ���������
The first command returns the extracted orders as a single signal. The second command returns the
complex amplitudes and phasors, which may be combined into the orders by x = 2*real(a.*c).
�������� ��#�#8��� 1 �/� %$�& $�$��#�#8�� -
9HV, ��� =���#����� H��� MV���� <#��# ,����#��8 �
9 � 1 HV, %$�& $�- ���#���& � � �#��# '�� �#�P����% �����# � �#� &�8���
9 % '�� &�7��#��� �& $ �&��8 � �M7��� �����# '�� � !���'��� �� �9 ��
9 � � &�7��#��� � > � ���7�� �& � '�����# �
9
9 � 1 �/� %$�& $�$V$7$D$L� �� L- ���#���& �#� &�8��� % � � �#����
9 �#��#& �� �����# V ��##�&7�����8 �� � � ���������� �#�P����% ��
9 �����# �$ �&��8 � 7M7��� �����# �$ �$ � �# �- ��� � &���������%
9 !���'��� �����# D �� +�#�)� D ��� !� � &��8�� ����� $ � E&��#� ���F
9 �����# �# � �����# '�� � � &�� ���8� �& %� �� � � ����������
9 �#8���� L� �� L �& 8���� $ � C ���&/% �����#�)����� '��� !� 7�#��#�� $
9 ' �� �8 � &��� ��7������� ��� ��# ��#8� V �����#&�
9
9 E�$�F 1 �/� %$�& $���- #���#�& � � ��7��� �7������ ��� 7 �&�# $ &��
9 � �� � � �#��# '�����#& ��� !� #����&�#����� !% � 1 �" #��� ��"�-�
9
9 E�$�$#F 1 �/� %$�& $���- ��7��& �� ���������� �����# # '�� � � �&��
9 &���������% �����& $ ��#���� �#� !���'��� �����# D�
9
9 :#����� !%A ���#��� ��� ��# (��K& $ �2���
���
&'��� ��#8��
��&� �
V 1 �0
7 1 �0
D 1 �& 5�220
�� �� 1 L��#���L0
��&� �
V 1 ��#�#8�� .�60
7 1 �0
D 1 �& 5�220
�� �� 1 L��#���L0
��&� �
V 1 ��#�#8�� .�60
7 1 ��#�#8�� .�60
D 1 �& 5�220
�� �� 1 L��#���L0
��&� �
V 1 ��#�#8�� .�60
7 1 ��#�#8�� .�60
D 1 ��#�#8�� .�60
�� �� 1 L��#��� L0
��&� �
V 1 ��#�#8�� .�60
7 1 ��#�#8�� .�60
D 1 ��#�#8�� .�60
�� �� 1 ��#�#8�� .�60
�� �#'�&�
�##�# 8���#���&8�� LI#8C /L-$ ���
L:#��8 ��!�# �� ��7�� �#8����& � Q&�A �/� %$�& $�$V$7$D$�� ��-�L-0
���
% 1 % A-0
* 1 ���8� %-0
�� �&&����# �-
� 1 ��" ���& *$�-0
��&��� ���� �- 11 *
� 1 � A-0
��&�
�##�# 8���#���&8�� LI#8C /L-$���
LH����# � & ���� ��� �$ � �# 9� ������& �L$*-
���
D 1 D A-0
V 1 V A-� L0
��&7 L MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMML-
��&7 EL ��� =���#����� H��� MV���� <#��# ,����# L ���&�# 7- LM7��� LF-
��&7 L MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMML-
�#�'��'
&'��� 7
��&� �
I 1 &7���8& ���& * M� $�-"E� M�F$2A�$*M�$*-0
# 1 &P#� &P#� �- M�-�5 � M �"��& 7�5�&�"D---0
#�� 1 &P#� �5 �" �7& --0
��&� �
I 1 &7���8& ���& * M� $�-"E� M� �F$2A�$ *M�$*-0
# 1 &P#� &P#� �- M�-�5 � M �"��& 7�5�&�"D- 4 �"��& �"7�5�& �"D---0
#�� 1 &P#� �5 �" �7& --0
��&� �
I 1 &7���8& ���& * M� $�-"E� M� � M�F$2A�$ *M�$*-0
# 1 &P#� &P#� �- M�-�5 �2 M �2" ��& 7�5�&�"D- ���
4 ��" ��& �" 7�5�&�"D- M �" ��& �" 7�5�&�"D---0
#�� 1 &P#� �5 �2" �7& --0
��&� �
I 1 &7���8& ���& * M� $�-"E� M� � M� �F$2A�$ *M�$*-0
# 1 &P#� &P#� �- M�-�5 �2 M ���" ��& 7�5�&�"D- ���
4 ��" ��& �"7�5�&�"D- M ��" ��& �"7�5�&�"D- 4 �" ��& �" 7�5�&�"D---0
#�� 1 &P#� �5 �2" �7& --0
�� �#'�&�
�##�# 8���#���&8�� LI#8C /L-$���
L*�!�# �� 7���& �&� !� �$ �$ � �# �� L-
���
9 C ��/ ��# !�� �����������8
�� �� #- O #��
�##�# 8���#���&8�� LD��C���������� L-$���
LD�� ����������� DM��#�� ��� �� ��� ��#8� #M�����& 9�� 8-�L$�� #--
��&��� �� #- O #�� "&P#� 2��-
'�#���8 8���#���&8�� LD��C���������� L-$���
LB�&���& �% !� ������#��� ��� �� ��#8� #M�����& 9�� 8-�L$�� #--
��&��� R�&#��� #-
�##�# 8���#���&8�� LD��C���������� L-$���
LD�� ����������� DM��#�� ��� �� ��7��� #M�����&�L-
���
�� &�)� #$�- 11 �0
D 1 IL"I-�" # �-�"# �-- 4 &7�%� *-0
��&��� &�)� # $�- 11 �0
B 1 &7���8& ���&7��� # �-$# �-$*-�L$2$*$*-0
D 1 B" IL"I-"B 4 &7�%� *--0
��&��� &�)� # $�- 11 *0
B 1 &7���8& #$2$*$*-0
D 1 B" IL"I-"B 4 &7�%� *--0
��&�
�##�# ELH����# # �&� !� �� ���8� �$ � �# L ���&�# *- L�LF-
���
����# I B
�� &�#�7� �� �� $L� �� L-0
D� 1 � �� D-0
����# D
��&7 L M DM��#�� C ���&/% �����#�)�� L-
���
�� ��#8��� 11 � 9 <��7�� � &��8�� '�����#
'��� 1 )�#�& �$*$L���!�� L-0
��# / 1 V
��&7 EL M �#��# L ���&�# /- L5L ���&�# ���8� V--F-
� 1 ��7 M��"7�"��&� /�"�-�5�& -0
�� &�#�7� �� �� $L� �� L-
� 1 D�U D� �LU ��"%--0
��&�
� 1 DU ��"%-0
���
'��� 1 '��� 4 �" #��� ��" ���K �---� L0
���
��#�#8��� .�6 1 '��� 0
��&��� ��#8��� O1 �- 9 <��7�� ��7��� �7������& ��� 7 �&�#&
� 1 )�#�& *$�� V-$L���!��L-0
� 1 )�#�& *$�� V-$L���!��L-0
��# / 1 V
��&7 EL M �#��# L ���&�# /- L5L ���&�# ���8� V--F-
� A$/- 1 ��7 M��"7�"��&� /�"�-�5 �&-0
�� &�#�7� �� �� $L� �� L-
� A$/- 1 D�U D��LU � A$/-�"%--0
��&�
� A$/- 1 DU � A$/-�"%-0
���
���
��#�#8��� .�6 1 ��L0
��#�#8��� .�6 1 ���K �-�L0
���
�� ��#8��� 11 �
��#�#8��� .�6 1 #�L0
���
��&7 EL " ,���& �� �� L ���&�# ���- L &�����&�LF-
���
Bibliography
[1] Milton Abramowitz and Irene A. Stegun. Handbook of Mathematical Functions With Formulas,
Graphs, and Mathematical Tables. Applied Mathematics Series 55. National Bureau of Standards,
tenth printing edition, 1972.
[2] Jason R. Blough. Improving the analysis of operation data on rotating automotive components. PhD
thesis, University of Cincinnati, Department of Mechanical, Industrial and Nuclear Engineering
of the College of Engineering, 1998.
[3] Boualem Boashash. Estimating and interpreting the instantaneous frequency of a signal.
Proceedings of the IEEE, 80(4):520–538, 1992.
[4] Judith C. Brown. Calculation of a constant Q spectral transform. The Journal of the Acoustical
Society of America, 89(1):425–434, 1991.
[5] Pablo Cancela, Ernesto López, and Martín Rocamora. Fan chirp transform for music
representation. In Proceedings of the 13th Conference on Digital Audio Effects (DAFX-10), pages
1–8, 2010.
[6] James W. Cooley and John W. Tukey. An algorithm for the machine calculation of complex
Fourier series. Mathematics of Computation, 19:297–301, 1965.
[7] Patricio De La Cuadra and Aaron Master. Efficient pitch detection techniques for interactive
music. In Proceedings of the 2001 International Computer Music Conference, La Habana, pages
403–406, 2001.
[8] Alain de Cheveigné and Hideki Kawahara. YIN, a fundamental frequency estimator for speech
and music. The Journal of the Acoustical Society of America, 111(4):1917–1930, 2002.
[9] Robert B. Dunn, Thomas F. Quatieri, and Nicolas Malyska. Sinewave parameter estimation using
the fast fan-chirp transform. In IEEE Workshop on Applications of Signal Processing to Audio and
Acoustics (WASPAA ’09), pages 349–352, 2009.
[10] Christian Feldbauer and Robert Höldrich. Realization of a Vold-Kalman tracking filter - a least
squares problem. In Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-00),
Verona, Italy, December 2000.
[11] David Gerhard. Pitch extraction and fundamental frequency: History and current techniques.
Technical report, Dept. of Computer Science, University of Regina, 2003.
[12] Fredric J. Harris. On the use of windows for harmonic analysis with the discrete Fourier
transform. Proceedings of the IEEE, 66(1):51–83, January 1978.
[13] Edward W. Kamen and Bonnie S. Heck. Fundamentals of Signals and Systems using the Web and
MATLAB. Prentice Hall, Upper Saddle River, New Jersey 07458, third edition, 2007.
[14] Alan V. Oppenheim and Ronald W. Schafer. Discrete-Time Signal Processing. Prentice Hall, Upper
Saddle River, New Jersey 07458, 1989.
[15] P.V. Sankar and L.A. Ferrari. Simple algorithms and architectures for B-spline interpolation. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 10(2):271–276, 1988.
[16] Phil Schniter. Time-frequency uncertainty principle. Connexions web site, accessed May 21
2011. ����&''���%��('�������'�)*+),'#%)�'.
[17] Xavier Serra. Musical sound modeling with sinusoids plus noise. In Musical Signal Processing,
pages 497–510. Swets & Zeitlinger Publishers, 1997.
[18] Julius Orion Smith. Spectral audio signal processing, October 2008 draft. Stanford On-line book,
accessed May 19 2011. http://ccrma.stanford.edu/~jos/sasp/.
[19] Jiri Tuma. Setting the passband width in the Vold-Kalman order tracking filter. In Proceedings of
the International Congress on Sound and Vibration (ICSV12), Lisbon, Portugal, 2005.
[20] Håvard Vold and Jan Leuridan. High resolution order tracking at extreme slew rates, using
Kalman tracking filters. Technical Report 931288, Society of Automotive Engineers, 1993.
[21] Luis Weruaga and Márian Képesi. The fan-chirp transform for non-stationary harmonic signals.
Signal Processing, 87(6):1504–1522, 2007.
Index
aliasing, 27, 33
bandwidth, 64
effective, 64
basis vector, 26
orthogonality, 26
transformation, 26
chirp
rate, 50
wave, 50
complex conjugate, 20, 34
complex exponential, 20
discontinuity, 34, 35
dynamic range, 18
Euler’s formula, 20
fan chirp transform, 77
block chirp rate, 79
short-time, 78
Fourier, 25, 29
analysis, 25
series, 29
transform, 32
continuous-time, 32
discrete, 33
discrete-time, 32
fast, 35
short-time, 71
frequency, 19
base, 21
constant, 20
fundamental, 22
instantaneous, 20
frequency domain, 25
harmonic, 21
partials, 22
harmonics, 30
inharmonicity, 62
interpolation, 54
approaches, 54
cubic spline, 54
linear, 54
nearest neighbour, 54
sinc, 54
spline, 54
basis functions, 55
coefficients, 55
Kronecker delta, 21
localisation, 26, 44, 75
noise, 19
orthogonality, 21, 32
Parseval’s theorem, 35
period, 19
periodicity, 19
phase, 19
partials, 22
shift, 20
phasor, 91
pitch, 85
detection, 86
salience, 87
tracking, 85, 86
Plancherel theorem, 35
quantisation, 18
resampling, 54
resolution, 18
spectral, 64, 73
temporal, 73
roll-off, 38
salience, 87
tracking, 88
sample rate, 17
sampling, 17
aliasing, 27
theorem, 27, 30, 54
short-time
blocks, 71
Fourier transform, 71
overlap, 75
resolution, 73
windowing, 75
zero-padding, 75
signal
analogue, 17
blocks, 71
continuous-time, 17
digital, 18
discrete-time, 17
modelling, 21
monophonic, 23
non-stationary, 20
periodic, 19
polyphonic, 23
stationary, 20
tonal, 22
sinusoid, 20
complex exponential, 20
orthogonality, 21
spectral
leakage, 34, 36
localisation, 43
symmetry, 34
width, 44
temporal
localisation, 43
width, 44
timbre, 59
complex normalisation, 60
definition, 59
instantaneous, 61
normalised, 60
stationary, 61
tracking, 88
warped, 67
time domain, 17
signal, 17
time warping, 49
inverse, 51
inverse function, 51
linear, 49
non-linear, 53
time-frequency product, 44
tonal component, 22
non-stationary, 22
stationary, 22
uncertainty, 43
Vold-Kalman, 91
data equation, 91
filter, 91
least squares problem, 92
structural equation, 92
window, 35, 75
bandwidth, 38
coherent gain, 37
cosine, 41
cosine-sigma, 41
equivalent noise bandwidth, 38
Gaussian, 40
Hanning, 39
main lobe width, 38
noise power, 38
properties, 37
rectangular, 39
side lobe level, 38
side lobe roll-off, 38
zero-padding, 75