UNIVERSITY OF MIAMI
TONALITY ESTIMATION USING
WAVELET PACKET ANALYSIS
By
Vaibhav Chhabra
A Research Project
Submitted to the Faculty of the University of Miami
in partial fulfillment of the requirements for
the degree of Master of Science
Coral Gables, Florida
May 2005
UNIVERSITY OF MIAMI
A research project submitted in partial fulfillment of
the requirements for the degree of
Master of Science
TONALITY ESTIMATION USING
WAVELET PACKET ANALYSIS
Vaibhav Chhabra
Approved:
________________ _________________
Ken Pohlmann Dr. Edward Asmus
Professor of Music Engineering Associate Dean of Graduate Studies
________________ _________________
Colby Leider Dr. Paul Mermelstein
Assistant Professor of Music Engineering Professor of Electrical Engineering
DEDICATION
They say that one’s experience is what defines an individual. After all, you are
what you are because of your experiences. On that note I would like to dedicate this work
to all those who have contributed to my experience in this journey. For what I have
learned has laid the foundation for what I will learn.
I would also like to thank my family, who have always been supportive of me: my
brother Ruchir, who is a natural send-master; Papa and Ma, thanks for keeping the faith.
All the Chachas, Chachis, and cousins, thank you all for the support.
Next on my thank-you list are my Tae Kwon Do buddies. Sensei Jeff, thanks for
all of your advice; someday I’ll be a teacher like you. Rico, training with you was an honor
(congratulations on your black belt). Nat (Ryu), sparring with you was almost like
dancing; I’m sure you’ll be an awesome martial artist. Sensei Chikaco, Sensei Gerard,
Sensei Kat, Lora, Lila, Becky, Erwin thank you all, I’ll miss you guys.
Last but definitely not least are my friends in Miami. Jon (silent warrior) for
teaching me all his DSP jutsus, Vishu (2nd-best table-tennis player at UM), Marc (water),
Lindsey (Defense-Master), Rian (Gentoo), Becky (Mini Marina), Jess, Jose, Doug (me
and my beer), Drew (Send-Master), Neri (p3), Tiina (Jo-master), Rob R., Rob B.
(adidas-w), Kai (thanks for taking care of my knee), Joe Abbati (Rathskellar) ...
René Descartes - “I think, therefore I exist”
Vaibhav Chhabra (Meno) - “I exist, therefore I oscillate”
SEND WITH RESPECT ^_^
ACKNOWLEDGMENT
I would like to especially thank my advisor and mentor, Ken Pohlmann, who has
generously given of his precious time and provided me with several great opportunities
during my time at UM. I would also like to thank my thesis committee for being patient
in spending time with me and guiding me at times when I needed them the most. I hope
we keep in touch and have opportunities to work together again.
Table of Contents
Chapter 1: Introduction......................................................................................................................................................................1
1.1 Masking.................................................................................................................................................................1
1.1.1 Threshold of Masking...............................................................................................................................2
1.2 Tonality..................................................................................................................................................................3
1.2.1 Common Tonality Classification Methods................................................................................................4
1.2.2 Typical Psychoacoustic Model....................................................................................................................4
1.2.3 Psychoacoustic Model 1...............................................................................................................................5
1.2.4 Psychoacoustic Model 2...............................................................................................................................8
Chapter 2: Signal Representation................................................................................................................................................12
2.1 Periodic Signals...............................................................................................................................................12
2.2 Fourier Series....................................................................................................................................................13
2.3 Fourier Transform...........................................................................................................................................14
2.3.1 Fourier Transform Derivation...................................................................................................................15
2.3.2 Dirac Delta Function...................................................................................................................................17
2.3.3 Fourier Coefficients.....................................................................................................................................18
2.3.4 Fourier Coefficients Derivation................................................................................................................19
2.4 Hilbert Transform...........................................................................................................................................22
2.4.1 Analytic Signal.............................................................................................................................................22
2.4.2 Hilbert Transform Theory....................................................................................................................23
2.4.3 Phase Rotation..........................................................................................................................................24
2.4.4 Complex Envelope......................................................................................................26
2.4.5 Advantages of the Complex Envelope....................................................................................27
2.5 Summary.............................................................................................................................................................30
Chapter 3: Time to Frequency Mapping.................................................................................................................................31
3.1 Quadrature Mirror Filter (QMF)...............................................................................................................31
3.2 Aliasing and Imaging.....................................................................................................................................32
3.3 Distortion Transfer Function.......................................................................................................................34
3.4 Polyphase Decomposition............................................................................................................................34
3.4.1 Perfect Reconstruction................................................................................................................................35
3.5 Paraunitary Property......................................................................................................................................36
3.5.1 Unitary Matrix..............................................................................................................................................36
3.6 Summary.............................................................................................................................................................37
3.6.1 Advantages of Paraunitary Filter Banks..................................................................................................37
Chapter 4: Short-Time Fourier Transform (STFT)............................................................................................................38
4.1 Analysis of the STFT equation..................................................................................................................39
4.2 STFT as a Bank of Filters............................................................................................................................40
4.3 Effects of Windowing....................................................................................................................................41
4.3.1 Choice of the Best Window.......................................................................................................................42
4.4 Summary.............................................................................................................................................................43
Chapter 5: The Wavelet Transform..........................................................................................................................................44
5.1 Weakness of the STFT..................................................................................................................................44
5.2 STFT to Wavelets...........................................................................................................................................45
5.2.1 Modifications on the STFT........................................................................................................................46
5.3 Inverse Wavelet Transform.........................................................................................................................48
5.4 Orthonormal Basis..........................................................................................................................................49
5.5 Wavelet Packet Analysis..............................................................................................................................50
5.5.1 Discrete Wavelet Transform......................................................................................................................51
5.6 Wavelet Packet Tree Representation.......................................................................................................52
5.6.1 Energy Representation................................................................................................................................52
5.6.2 Index Representation...................................................................................................................................53
5.6.3 Filterbank Representation..........................................................................................................................54
Chapter 6: Analysis and Results..................................................................................................................................................56
6.1 Detection Scheme...........................................................................................................................................57
6.1.1 Frequency Breakdown................................................................................................................................59
6.1.2 Detector Pseudocode Methodology.........................................................................................................60
6.1.3 Detection Process.........................................................................................................................................61
6.2 Node Reconstruction......................................................................................................................................63
6.3 Tonality Estimation........................................................................................................................................64
6.3.1 Auto-Correlation Function.........................................................................................................................65
6.3.2 Auto-Covariance..........................................................................................................................................67
6.3.3 Type-I Analysis............................................................................................................................................68
6.3.4 Type-II Analysis..........................................................................................................................................69
6.4 Tonality Index (Time-Domain)..................................................................................................................73
6.5 Tonality Index (Frequency-Domain).......................................................................................................75
6.5.1 Comparison with Model 2..........................................................................................................................76
Chapter 7: Conclusions and Recommendations.....................................................................................................................81
References............................................................................................................................................................................................83
Appendix...............................................................................................................................................................................................85
List of Figures:
Figure 1.1: General block diagram of a perceptual coder........................................................................................3
Figure 1.2: General block diagram of a psychoacoustic model............................................................................5
Figure 1.3: Tonal components identified in Model 1..................................................................................................6
Figure 1.4: Maskers Decimation in Model 1...................................................................................................................7
Figure 1.5: Block diagram of MPEG Psychoacoustic Model 1.............................................................................8
Figure 1.6: Example of a predicted masking threshold for a masker...............................................................10
Figure 1.7: General block diagram of Model 2............................................................................................................10
Figure 2.1: A continuous-time signal.................................................................................................................................12
Figure 2.2: Continuous-time sinusoidal signal.............................................................................................................13
Figure 2.3: Discrete-time unit impulse (sample).........................................................................................................17
Figure 2.4: The Dirac Delta Function................................................................................................................................18
Figure 2.5: Dirac Delta in Time-Domain.........................................................................................................................18
Figure 2.6: Dirac Delta in Frequency-Domain.............................................................................................................18
Figure 2.7: Frequency Response of Rectangular Pulse............................................................................................19
Figure 2.8: Periodic Square Wave.......................................................................................................................................20
Figure 2.9: Fourier Series Coefficients for a Periodic Square Wave...............................................................21
Figure 2.10: Cosine Wave Properties.................................................................................................................................24
Figure 2.11: Sine Wave Properties......................................................................................................................................24
Figure 2.12: Rotating Phasors to create a sine wave out of a cosine................................................................25
Figure 2.13: Hilbert Transform shifts the phase of positive frequencies by -90° and negative
frequencies by +90°......................................................................................................................................................................26
Figure 2.14: Spectral Properties of the Complex Exponential............................................................................26
Figure 2.15: Spectral Properties of s(t)............................................................................................................................28
Figure 2.16: The Modulated Signal and its Envelope...............................................................................................28
Figure 2.17: Frequency Domain Representation of Complex Envelope and Analytic Signal..........30
Figure 3.1: QMF filter-bank....................................................................................................................................................31
Figure 3.2: Aliasing......................................................................................................................................................................32
Figure 4.1: FFT Block Diagram............................................................................................................................................38
Figure 4.2: STFT Represented in terms of a Linear System.................................................................................40
Figure 4.3: Rearranged STFT Representation in terms of a Linear System................................................41
Figure 4.4: STFT viewed as a Filter-Bank......................................................................................................................41
Figure 4.5: Fourier Transform of 512 (left) and 2048 (right) Samples...........................................................42
Figure 5.1: (a) high-frequency signal, (b) low-frequency signal x(t) modulated by the windowed
function v(t)......................................................................................................................................................................................44
Figure 5.2: Fundamental difference between the STFT (a) and the wavelet transform (b)................47
Figure 5.3: Amplitude, scale and translation plot of a continuous wavelet transform (after Polikar, R., “The Story of Wavelets”, Rowan University)..............................................................................48
Figure 5.4: 3-level Wavelet decomposition tree..........................................................................................................51
Figure 5.5: (left) Frequency response obtained by scaling, (right) Filterbank representation of
discrete wavelet transform..............................................................................................................52
Figure 5.6: Depth Level-3 Energy Tree of 1kHz Signal.........................................................................................53
Figure 5.7: Depth Level-3 Index Tree of 1kHz Signal.............................................................................................54
Figure 5.8: Filter-bank Representation of Depth Level-3 Wavelet Packet Decomposition Tree....54
Figure 5.9: Filter-bank Representation of Depth Level-3 Wavelet Packet Decomposition Tree....55
Figure 5.10: Discrete wavelet packet tree (analysis stage)....................................................................................55
Figure 6.1: General block diagram of the proposed model...................................................................................57
Figure 6.2: level-1 Wavelet Packet Decomposition of a signal containing multiple tones (4kHz,
10kHz, 15kHz)................................................................................................................................................................................58
Figure 6.3: level-3 Wavelet Packet Decomposition of multiple tones (4kHz, 10kHz, 15kHz)........58
Figure 6.4: level-3 Wavelet Packet Index Tree and the Coefficients of the Terminal Nodes...........60
Figure 6.5: level-2 Wavelet Packet Energy Tree Detector Code Pointers....................................................60
Figure 6.6: level-2 Wavelet Index Tree used to trace the Nodes that are sent to the tonality
analyzer...............................................................................................................................................................................................61
Figure 6.7: level-2 Wavelet Packet Energy Tree Detector Stage-I: nodes (4), (5) and (6) are
analyzed first....................................................................................................................................................................................62
Figure 6.8: level-4 Wavelet Packet Energy Tree Detector Stage-II: nodes (4), (5) and (6) are
analyzed first; green lines represent the nodes that are going to be analyzed by the tonality
analyzer...............................................................................................................................................................................................62
Figure 6.9: level-4 Wavelet Packet Energy Tree Detector Stage-III: nodes (3) is analyzed; green
lines represent the nodes that are going to be analyzed by the tonality analyzer......................................63
Figure 6.10: Wavelet Energy Tree: the white arrows show the two nodes used to calculate the
tonality.................................................................................................................................................................................................64
Figure 6.11: Auto-correlation Function of a Pure Tone..........................................................................................65
Figure 6.12: Auto-correlation Function of White Noise.........................................................................................66
Figure 6.13: Auto-correlation Function of Bandlimited Noise (0-22kHz)....................................................66
Figure 6.14: Energy Tree, where the blue lines represent the nodes from which the tonality value
is calculated.......................................................................................................................................................................................68
Figure 6.15: A 4kHz tone with selected path (red arrows) and nodes used to calculate tonality
value (blue lines) [left figure]; Difference of the max values of auto-covariance [right figure]......68
Figure 6.16: A 4kHz tone with -0.9dB white-noise added, selected path (red arrows) and nodes
used to calculate tonality value (blue lines) [left figure]; Difference of the max values of auto-
covariance [right figure]............................................................................................................................................................69
Figure 6.17: A Snare Crash......................................................................................................................................................70
Figure 6.18a: Auto-Covariance of White-Noise..........................................................................................................70
Figure 6.18b: Auto-Covariance of Band-limited 0-22kHz Noise......................................................................71
Figure 6.18c: Auto-Covariance of Pure-Tone (1kHz)..............................................................................................71
Figure 6.19: A Snare Crash Analysis: (a) Wavelet Tree, (b) Auto-Covariance............................................72
Figure 6.20: Snare Crash (Last Frame) Auto-Covariance......................................................................................72
Figure 6.21: Tonality Index (Time-Domain) with Input Signal consisting of 1kHz tone then
Bandlimited Noise (0-22kHz) of power -20dB............................................................................................................73
Figure 6.22: Time-Domain plot of test signal (1kHz tone then Bandlimited Noise (0-22kHz) of
power -20dB)...................................................................................................................................................................................74
Figure 6.23: Tonality Index (Time-Domain) with Input Signal consisting of white noise (power -
20dB) followed by a 1kHz tone and then Bandlimited Noise (0-22kHz; power -0.9dB)....................74
Figure 6.24: Time-Domain plot of test signal of white noise (power -20dB) followed by a 1kHz
tone and then Bandlimited Noise (0-22kHz; power -0.9dB)................................................................................75
Figure 6.25: Frequency Map of Wavelet Tree: The red arrows represent the generated path, which
consists of an array of nodes; the values of the last node (blue lines) are taken for the mapping........76
Figure 6.26a: Tonality Index – Model 2 (1kHz)..........................................................................................................77
Figure 6.26b: Tonality Index – Proposed Model (1kHz)........................................................................................77
Figure 6.27a: Tonality Index – Model 2 (4kHz)..........................................................................................................78
Figure 6.27b: Tonality Index – Proposed Model (4kHz)........................................................................................79
Figure 6.28a: Tonality Index – Model 2 (6kHz) .........................................................................................................79
Figure 6.28b: Tonality Index – Proposed Model (6kHz)........................................................................................80
CHHABRA, VAIBHAV (M.S., Music Engineering Technology)
Tonality Estimation Using Wavelet Packet Analysis (May 2005)
Abstract of a Master’s Research Project at the University of Miami.
Research project supervised by Professor Ken Pohlmann.
No. of pages in text: 124

Abstract: Perceptual audio coding is a novel approach to compressing audio by taking
advantage of models of the human auditory system, also known as psychoacoustic models.
The quality and efficiency of the encoding process depend highly on how accurately these
models characterize the nature of the audio signal, in particular its tonality attributes.
This paper explores various analysis techniques using wavelet packet tree decomposition
to accurately estimate tonality by exploiting energy and statistical information. More
specifically, the tonality estimation is based on the correlation information of the nodes
and uses wavelets such as the Haar wavelet (Daubechies 1) to decompose the signal.
Chapter 1: Introduction
In recent years, several advancements have been made in the field of audio
coding. One must realize that no matter how many advancements we make, the ultimate
receiver of the analysis-coding-transmission-decoding-synthesis chain is the human
auditory system. In fact, all perceptual (lossy) audio coders rely on exploiting the
properties of this system. A model based on this system, known as the psychoacoustic
model, exploits the properties and tolerances of the human auditory system to remove
irrelevant components of the audio signal, i.e., those components that do not contribute
to the auditory impression of the acoustic stimulus. These irrelevant components may be
removed in the initial stages of the signal communication chain (analysis/coding),
freeing information capacity that can then be used to code the relevant audio components.
This operation is called irrelevancy reduction and is based on the concept of masking.
1.1 Masking
Masking refers to the total or partial inaudibility of one sound component due
to the presence of another [Ferreira, A.J.S, 1995], with particular relation to
amplitude, frequency, time [Zwicker, E, Fastl, H, 1990] and space [Blauert, J, 1993].
There are two types of masking: frequency masking (simultaneous masking) and
temporal masking. Frequency masking can be viewed as excitation in the cochlea’s
basilar membrane that prevents the detection of a weaker sound exciting the same area
of the membrane, whereas temporal masking takes place when the masker and the
maskee do not occur simultaneously. One might think of it as the delay in the auditory
path from the auditory neurons to the brain, or the time taken by the individual to give
meaning to the auditory information.
Among the aspects of masking usually considered, simultaneous masking is by
far the most important source of irrelevancy reduction. It is for this reason that most
perceptual audio coders employ a time-to-frequency mapping in the form of a subband
or transform filterbank.
1.1.1 Threshold of Masking
In the context of frequency domain audio coders, the masker is the input audio
signal, containing coherent (tone-like) or incoherent (noise-like) components, and the
maskee is the quantization noise. It should be noted that the ultimate goal of a perceptual
coder is to generate a good estimate for the profile of the quantization noise that does not
cause noticeable impairments when actually added to the original signal [Ferreira, A.J.S,
1995]. In other words, the noise profile, also called Threshold of Masking [Johnston,
J.D,1988], should be optimally shaped in frequency, time and space in such a way that
the quantization noise can be efficiently masked. Several studies [Moore, C.J.B,
1982][Hellman, R.P, Harvard University] reveal that the threshold of masking may vary
substantially as a function of the noise-like or tone-like nature of the audio signal. As a
consequence, this aspect has a significant influence on the quality and efficiency of the
audio encoding process [Ferreira, A.J.S, 1995], as shown in Figure 1.1.
Figure 1.1: General block diagram of a perceptual coder (from Introduction to Digital Audio Coding and Standards by Marina Bosi and Richard E. Goldberg)
1.2 Tonality
One of the key components of the psychoacoustic model is the calculation of
tonality, whose values are used to compute the signal-to-mask ratio that determines the
masking threshold of the input signal. Different values for masking have been
reported in [Hellman, R.P, Harvard University 02138] for tone masking noise versus
noise masking tone. In [Hellman, R.P, Harvard University 02138] and [Zwicker, E, Fastl,
H, 1990] it is clear that a narrow band of noise masks a tone much more effectively than
a tone masks noise. In fact, the masking effects of a tone and of noise of equal intensity
can differ by as much as 20 dB. It is interesting to note that bandlimited noise with constant SPL and varying
bandwidth flattens the masking function whereas increasing the SPL and keeping the
bandwidth constant narrows the masking function [Hellman, R.P, Harvard University
02138]. In particular, a signal can be quantized using more or fewer bits according to its
tonality properties, which emphasizes the importance of accurately estimating tonality,
leading to improved bit allocation.
[Figure 1.1 block labels: digital in → analysis filterbank → quantization and coding → serial bitstream multiplexing → bit stream; masking threshold calculated based on psychoacoustics]
1.2.1 Common Tonality Classification Methods
Tonality in most audio coders is generally evaluated by taking a short segment of audio samples (e.g., 512 or 1024 samples) and performing a spectral analysis, using for example an FFT (fast Fourier transform). The power and phase evolution of each spectral component are then examined, making it possible to infer the tonal behavior of the signal in different regions of the spectrum. An average tonality measure for the whole analyzed signal segment can be computed using the Spectral Flatness Measure (SFM). The SFM is defined as the ratio of the geometric mean of the power spectrum to the arithmetic mean of the power spectrum [Ferreira, A.J.S, 1995]. Once calculated, its value is converted to dB with the reference set to SFMdBmax = −60 dB as an estimate for tone-like signals. The SFMdB is finally converted to a tonality coefficient α whose values range over [0, 1]: lower values indicate a globally noise-like behavior and higher values a globally tone-like behavior. This particular method is used in psychoacoustic model 2 of the MPEG audio standard to classify tonality.
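The SFM method described above can be sketched as follows (a sketch: the frame length, the Hanning window and the clamping floor are illustrative choices, not prescribed by the standard, and the function name is hypothetical):

```python
import numpy as np

def tonality_coefficient(frame, sfm_db_max=-60.0):
    """Estimate a global tonality coefficient alpha in [0, 1] from the
    Spectral Flatness Measure (SFM) of one audio frame.
    alpha near 1: tone-like; alpha near 0: noise-like."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    spectrum = np.maximum(spectrum, 1e-12)         # avoid log(0)
    geo_mean = np.exp(np.mean(np.log(spectrum)))   # geometric mean
    ari_mean = np.mean(spectrum)                   # arithmetic mean
    sfm_db = 10.0 * np.log10(geo_mean / ari_mean)  # always <= 0 dB
    return min(sfm_db / sfm_db_max, 1.0)

# A pure sinusoid should look tone-like, white noise noise-like.
n = np.arange(1024)
tone = np.sin(2 * np.pi * 1000 * n / 44100)
noise = np.random.default_rng(0).standard_normal(1024)
```

For the sinusoid the SFM is far below −60 dB, so α clips to 1; for white noise the SFM is only a few dB below 0, so α stays near 0.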
1.2.2 Typical Psychoacoustic Model
A typical psychoacoustic model begins by modeling the cochlear filters, which represent the energy or phase information as processed by the ear. This is accomplished by applying a spreading function, a function that models the spread of the masking curves and thus the energy excitation along the basilar membrane. This information is then passed to the tonality estimator, which determines the relevant and irrelevant components of the signal and helps in the estimation of the masking threshold, which eventually gives rise to the absolute threshold, as shown in Figure 1.2.
Figure 1.2: General block diagram of a psychoacoustic model (Johnston, J.D.)
1.2.3 Psychoacoustic Model-1
The psychoacoustic model 1 in the MPEG standard applies an FFT to the input data, which is windowed using a Hanning window of length 512 samples for Layer I and 1024 for Layers II and III. An overlap of N/16 is used between adjacent frames, where N is the number of samples in a frame. After applying the FFT, the signal level for each spectral line k is calculated as

Lk = 96 dB + 10·log10( (4/N²)·|X[k]|²·(8/3) )    (1.1)

for k = 0, ..., N/2 − 1

where the 1/N² factor comes from Parseval's theorem and takes into account only the positive frequency components of the spectrum. The other factor of 2 deals with the scaling of the amplitude of the spectral components from ½ to 1, and the 8/3 factor compensates for the reduction in gain of the Hanning window [Bosi, M., Goldberg, R., 2003].
Once the spectral components are calculated, the sound pressure level Lsb[m] in each sub-band m is calculated, corresponding to the maximum-amplitude FFT spectral line. The FFT spectral line is chosen in such a way that it corresponds to the maximum scale factor (scf):

Lsb[m] = max( Lk, 20·log10( scfmax[m]·32768 ) − 10 dB )    (1.2)
Having calculated the sound pressure level, we next compute the mask threshold in order
to calculate the signal to mask ratio (SMR) which leads us to the tonality estimation
process, where the model identifies peaks that have 7 dB more energy than their neighboring spectral lines as its tonal components [MPEG Standard, ISO11172-3]:

Lk − Lk+j ≥ 7 dB    (1.3)

where j is an index that varies with the center frequency. This is based on the assumption that a local maximum within a critical band represents a tonal component, as shown in Figure 1.3.
Figure 1.3: Tonal components identified in Model 1 (Fabien A. P. Petitcolas, University of Cambridge, England)
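The peak test of eq. (1.3) can be sketched as follows (a simplified sketch: in the actual standard the neighbor offsets j grow with the frequency region, while fixed offsets are used here for illustration, and the function name is hypothetical):

```python
import numpy as np

def find_tonal_peaks(level_db, j_offsets=(2, 3)):
    """Flag spectral lines as tonal when they are local maxima and exceed
    their neighbours at +/-j by at least 7 dB (eq. 1.3)."""
    tonal = []
    for k in range(max(j_offsets), len(level_db) - max(j_offsets)):
        if level_db[k] <= level_db[k - 1] or level_db[k] < level_db[k + 1]:
            continue  # not a local maximum within its neighbourhood
        if all(level_db[k] - level_db[k + j] >= 7 and
               level_db[k] - level_db[k - j] >= 7 for j in j_offsets):
            tonal.append(k)
    return tonal

levels = np.zeros(32)
levels[10] = 20.0   # one strong spectral line over a flat floor
```

A single strong line over a flat floor is flagged as tonal; a bump of less than 7 dB is not.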
If Lk represents a tonal component, then the adjacent spectral components centered at k are added to define a tonal masker LT; the other components are summed to give us the noise maskers LN. Based on this information the spread of the masking curves is defined by applying the spreading function [MPEG Standard, ISO11172-3].

Having defined the tonal and non-tonal components, the number of maskers is reduced prior to computing the global masking threshold by eliminating maskers whose levels are below the threshold in quiet. Maskers extremely close to stronger maskers are also eliminated: if two or more components are separated in frequency by less than 0.5 Bark, only the component with the highest power is retained [Johnston, J.D, Brandenburg, K, 1990, MPEG Standard, IS011172-3, Bosi, M., Goldberg, R., 2003].
Figure 1.4: Masker decimation in Model 1 (Fabien A. P. Petitcolas, University of Cambridge, England)
Based on this information the individual masking thresholds are calculated and summed, along with the power of the threshold in quiet, to give us the global masking threshold, which leads to the calculation of the SMRs in each sub-band. This is done by taking the difference between the maximum sound pressure level and the minimum global masking level of that sub-band. A general block diagram of the whole process is shown in Figure 1.5.
Figure 1.5: Block diagram of MPEG Psychoacoustic Model 1 (MPEG Standard ISO11172-3)
1.2.4 Psychoacoustic Model-2
The psychoacoustic model 2 in the MPEG standard also applies an FFT to the Hann-windowed input block, of size 1024 for all layers; however, for Layers II and III the model computes two FFTs for each frame. This model uses the output of the FFT analysis to calculate the masking curves and their associated signal-to-mask ratios for the coder sub-bands [Bosi, M., Goldberg, R., 2003, MPEG Standard, ISO11172-3].

For the SPL calculation the model groups frequency lines into "threshold calculation partitions" whose widths are roughly 1/3 of a critical band. For a sampling rate of 44.1 kHz, one single masker SPL is derived by summing the energies in each partition [MPEG ISO11172-3, Annex D, Table 3-D.3b]. The total masking energy of the input audio frame is then calculated by convolving a spreading function with each of the maskers in the signal, which leads us to the tonality calculation.
The tonality index in the model revolves around the core concept of how
predictable a signal is from the prior two frames [Brandenburg, K, Johnston, J.D 1990].
For each frame ‘m’ and for each frequency line ‘k’, the signal amplitude, Am[k], and
phase, ϕm[k], are predicted by linear extrapolation from the prior values as follows: [Bosi,
M., Goldberg, R., 2003, MPEG Standard, ISO11172-3]
Am′[k] = Am−1[k] + (Am−1[k] − Am−2[k])
φm′[k] = φm−1[k] + (φm−1[k] − φm−2[k])    (1.4)

where Am′[k] and φm′[k] represent the predicted values. These values are then mapped into an "unpredictability measure" defined as:
Cm[k] = √[ (Am[k]·cos φm[k] − Am′[k]·cos φm′[k])² + (Am[k]·sin φm[k] − Am′[k]·sin φm′[k])² ] / ( Am[k] + |Am′[k]| )    (1.5)
where Cm[k] is equal to zero when the current value is exactly predicted, and close to one when the power of either the predicted or the actual signal is dramatically higher than that of the previous frames [Bosi, M., Goldberg, R., 2003, MPEG Standard, ISO11172-3]. This unpredictability measure is then weighted by the energy in each partition, giving us the partitioned unpredictability measure, which is then convolved with the spreading function. The result of this convolution is normalized with a normalizing coefficient (normb) derived from the spreading function
normb = 1 / ∑_{bb=0}^{bbmax} sprdngf(bvalbb, bvalb)    (1.6)
and then mapped onto a tonality index which is a function of the partition number whose
values vary from zero to one.
tbb = −0.299 − 0.43·loge(cbb)    (1.7)
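The prediction of eq. (1.4), the unpredictability measure of eq. (1.5) and the mapping of eq. (1.7) can be sketched together as follows (a sketch: the partition weighting, spreading and normalization steps of eq. (1.6) are omitted, the index is clipped to [0, 1], and the function names are hypothetical):

```python
import numpy as np

def unpredictability(A, phi):
    """Unpredictability measure c_m[k] (eq. 1.5). A and phi each hold
    three frames of magnitudes/phases (oldest first, current last);
    the current frame is predicted by linear extrapolation (eq. 1.4)."""
    A_pred = 2 * A[1] - A[0]
    phi_pred = 2 * phi[1] - phi[0]
    num = np.hypot(A[2] * np.cos(phi[2]) - A_pred * np.cos(phi_pred),
                   A[2] * np.sin(phi[2]) - A_pred * np.sin(phi_pred))
    return num / (A[2] + np.abs(A_pred))

def tonality_index(c):
    """Map unpredictability onto a tonality index in [0, 1] (eq. 1.7)."""
    return np.clip(-0.299 - 0.43 * np.log(np.maximum(c, 1e-12)), 0.0, 1.0)
```

A steadily rotating sinusoidal component is exactly predicted, so its unpredictability is zero and its tonality index saturates at one.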
The tonality index is then used to calculate the masking down-shift ∆(z) in dB, which depends on the tonal characteristics determined by the tonality index tbb. This down-shift value is different for tonal and non-tonal signals, as shown in Figure 1.6:

∆tone masking noise = 14.5 + z dB  ['z' is the Bark value from 0-24]    (1.8)
∆noise masking tone = C dB  [C varies from 3-6 dB]
Figure 1.6: Example of a predicted masking threshold for a masker (Bosi, M., Goldberg, R., Introduction to Digital Audio Coding and Standards)
Once the global masking threshold is calculated, it is used to compute the SMR values in each partition by comparing it to the threshold in quiet and taking the maximum of the two. A general block diagram of the whole process is shown in Figure 1.7.
Figure 1.7: General block diagram of Model 2 (Bosi, M., Goldberg, R., Introduction to Digital Audio Coding and Standards)
In general, these two psychoacoustic models are very similar in their masking threshold calculations but differ in their tonality classification scheme. The basic problem that arises in the classification of tonality in current perceptual models is the analysis tool used to analyze the frequency content of the incoming audio segment data, namely the Fourier transform. Its design necessitates a trade-off between time-domain and frequency-domain resolution: the greater the frequency resolution, the more spectral components are available, resulting in a masking function that can be estimated with better accuracy. On the other hand, higher spectral resolution yields lower time resolution. The solution to this problem is to replace the tool used to analyze the audio segment data with one that has better flexibility in adapting to the signal's coherent state. These requirements are met by a variant of the short-time Fourier transform known as the wavelet transform.
The purpose of this thesis is to explore tonality estimation using wavelet packet analysis (the wavelet transform in a tree structure) based on the coherence and energy distribution of the input audio segment, which could eventually be used in a psychoacoustic model to increase coding efficiency and quality. The first theory section is presented in Chapters 2 and 3, which develop relevant theory used in the later chapters. The second section, Chapters 4 and 5, discusses Fourier analysis and introduces wavelet theory, which leads us to Chapters 6 and 7, which focus on the tone detector, the tonality analyzer, and experiments that complement their performance.
Chapter 2: Signal Representation
To understand the basic concept of Fourier analysis we must understand how it is used to represent a signal. A periodic signal oscillates with a time period T and frequency f. We proceed to our first step in the analysis of complex exponentials and sinusoidal signals and see how the two are related to each other. Once this gap has been bridged we can draw conclusions on how the Fourier series leads to the Fourier transform, and also state certain interesting characteristics of the Fourier transform.
2.1 Periodic Signals
A signal can be classified as periodic or aperiodic. A periodic continuous-time signal x(t) has the property that there is a positive value of T for which [Oppenheim, A.V & Willsky, A.S, 1996]

x(t) = x(t + T)    (2.1)

in other words, if the signal were shifted by T it would repeat itself, with fundamental period T as shown in Figure 2.1.
Figure 2.1: A continuous-time signal (Oppenheim, A.V & Willsky, A.S, Signal and Systems 2nd edition)
It is this property that complex exponentials share, specifically x(t) = e^{jω₀t}. This can easily be shown when we equate the above equation with

x(t + T) = e^{jω₀(t+T)}    (2.2)

e^{jω₀(t+T)} = e^{jω₀t}·e^{jω₀T}

e^{jω₀T} = 1    (2.3)

Based on this result, we can conclude that a complex exponential is periodic for any value of T if ω₀ = 0; if ω₀ ≠ 0, it has a fundamental period T₀ (the smallest positive value) equal to 2π/ω₀. Similarly, the sinusoidal signal x(t) = A·cos(ω₀t + φ) is periodic with a fundamental period of T₀ = 2π/ω₀, as shown in Figure 2.2.
Figure 2.2: Continuous-time sinusoidal signal (Oppenheim, A.V & Willsky, A.S, Signal and Systems 2nd edition)
2.2 Fourier Series
Sinusoidal waves and complex exponentials are periodic signals with fundamental period T₀ = 2π/ω₀ and fundamental frequency ω₀ = 2π/T₀. We can therefore extend our expression of the complex exponential by associating the signal with a set of harmonically related exponentials:

φk(t) = e^{jkω₀t} = e^{jk(2π/T)t},  k = 0, ±1, ±2, ...    (2.4)
Each of these signals has a frequency that is a multiple of the fundamental frequency ω₀ and is hence periodic too. When k = +1 and k = −1 both signals have a fundamental frequency equal to ω₀, and they are collectively referred to as the first harmonic components. Similarly, when k = +2 and k = −2 they are referred to as the second harmonic components. More generally, the components for k = +N and k = −N are referred to as the Nth harmonic components. A periodic signal can be represented as a linear combination of harmonically related complex exponentials of the form:
x(t) = ∑_{n=−∞}^{+∞} Cn·e^{jnω₀t} = ∑_{n=−∞}^{+∞} Cn·e^{jn(2π/T)t}    (2.5)
This representation is known as the Fourier series representation. Note that the complex exponential can be written in terms of sines and cosines, e^{jω₀t} = cos(ω₀t) + j·sin(ω₀t), therefore making the Fourier series:

x(t) = ∑_{n=−∞}^{+∞} An·cos(nω₀t) + j·∑_{n=−∞}^{+∞} Bn·sin(nω₀t)    (2.6)

where An are the coefficients of the cosines and Bn the coefficients of the sines, with

Cn = An + jBn and C−n = An − jBn
2.3 Fourier Transform
Before we derive the Fourier transform from the Fourier series, let's understand what a transform is and why we need it. A transform is a mathematical operation that takes a function or sequence and maps it into another one. In our case the Fourier transform maps a time-domain function or sequence into the frequency domain. Transforms are useful: they may give us additional or hidden information about the original function; their equations are often easier to solve than the original equation; they may require less storage space and hence be used for data compression or reduction; and operations such as convolution are easier to apply to a transformed function than to the original function.

Fourier said: "An arbitrary function, continuous or with discontinuities, defined in a finite interval by an arbitrarily capricious graph can always be expressed as a sum of sinusoids." This is seen in the Fourier series, and it is by manipulating this series that we can derive the Fourier transform.
2.3.1 Fourier-Transform Derivation
This can be shown by multiplying both sides of eq. (2.5) with e^{−jnω₀t} to obtain [Oppenheim, A.V & Willsky, A.S, 1996]

x(t)·e^{−jnω₀t} = ∑_{k=−∞}^{+∞} Ck·e^{jkω₀t}·e^{−jnω₀t}    (2.7)

Integrating both sides from 0 to T = 2π/ω₀, we have

∫₀ᵀ x(t)·e^{−jnω₀t} dt = ∫₀ᵀ ∑_{k=−∞}^{+∞} Ck·e^{jkω₀t}·e^{−jnω₀t} dt

Here, T is the fundamental period of x(t), and consequently we are integrating over one period. Now interchanging the order of integration and summation yields:

∫₀ᵀ x(t)·e^{−jnω₀t} dt = ∑_{k=−∞}^{+∞} Ck [ ∫₀ᵀ e^{j(k−n)ω₀t} dt ]    (2.8)

The evaluation of the bracketed integral is straightforward. Rewriting this integral using Euler's formula, we get:

∫₀ᵀ e^{j(k−n)ω₀t} dt = ∫₀ᵀ cos((k−n)ω₀t) dt + j·∫₀ᵀ sin((k−n)ω₀t) dt    (2.9)

Since each integral may be viewed as measuring the total area under the function over an interval of length T, we see that for k ≠ n both integrals on the right-hand side of eq. (2.9) are zero. For k = n, the integrand on the left-hand side of eq. (2.9) equals 1, and thus the integral equals T. We therefore have:

∫₀ᵀ e^{j(k−n)ω₀t} dt = T for k = n; 0 for k ≠ n

and consequently the right-hand side of eq. (2.8) reduces to T·Cn, giving:

Cn = (1/T)·∫₀ᵀ x(t)·e^{−jnω₀t} dt    (2.10)
Note that this equation looks very similar to the Fourier transform:

F(ω) = ∫_{−∞}^{+∞} f(t)·e^{−jωt} dt    (2.11)
Here, we have written an equivalent expression for the Fourier series in terms of the fundamental frequency ω₀ and the fundamental period T. Equation (2.5) is referred to as the synthesis equation and eq. (2.10) as the analysis equation. The set of coefficients Cn are often called the Fourier series coefficients or the spectral coefficients of x(t). These complex coefficients measure the portion of the signal x(t) that is at each harmonic of the fundamental component. It is interesting to note that when n = 0, eq. (2.10) becomes:

C₀ = (1/T)·∫₀ᵀ x(t) dt    (2.12)
This is a simple average value of x(t) over one period.
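As a quick numerical check of eq. (2.10), the integral can be approximated by a Riemann sum; for a pure cosine the expected coefficients C₁ = C₋₁ = 1/2 are recovered and all others vanish (a sketch: the function name and the sampling density are illustrative):

```python
import numpy as np

def fourier_coefficient(x, n, T, num=4096):
    """Approximate C_n = (1/T) * integral over one period of
    x(t) * exp(-j n w0 t) dt with a Riemann sum on a uniform grid."""
    t = np.arange(num) * (T / num)
    w0 = 2 * np.pi / T
    return np.mean(x(t) * np.exp(-1j * n * w0 * t))

T = 2 * np.pi
x = lambda t: np.cos(t)   # cos(w0 t) with w0 = 1
```

Since cos(ω₀t) = (e^{jω₀t} + e^{−jω₀t})/2, only the first harmonic pair survives the averaging.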
2.3.2 Dirac Delta Function
One of the best ways to understand Fourier analysis is to analyze a square pulse train. A square pulse can be viewed as a magnified version of a Dirac delta function δ(t), which is defined in the continuous domain. The equivalent of the Dirac delta in the discrete domain is the unit impulse (unit sample), also known as the Kronecker delta, as shown in Figure 2.3:

δ[n] = 0 for n ≠ 0; 1 for n = 0    (2.13)
Figure 2.3: Discrete-time unit impulse (sample) (Oppenheim, A.V & Willsky, A.S, Signal and Systems 2nd edition)
The Dirac delta function δ(t) is zero for t ≠ 0, but is infinite at t = 0 in such a way that its integral is unity. This function is one that is infinitesimally narrow and infinitely tall, yet integrates to unity. Perhaps the simplest way to visualize this is as a rectangular pulse from a − ε/2 to a + ε/2 with a height of 1/ε, as shown in Figure 2.4. As we take the limit ε → 0, we see that the width tends to zero and the height tends to infinity while the total area remains constant at one, as shown in Figure 2.5. The impulse function is often written as δ(t) [Selik, M. & Baraniuk, R].

1 = ∫_{−∞}^{+∞} δ(t) dt    (2.14)
Figure 2.4: The Dirac Delta Function (Selik, M. & Baraniuk, R., The Impulse Function, Connexions)
Since it is quite difficult to draw something that is infinitely tall, we represent the Dirac
with an arrow centered at the point it is applied.
2.3.3 Fourier Coefficients
The relationship between the time and frequency representations is mutual. A sharp spike in the time domain, represented by a unit Dirac delta function, is represented as a superposition of all frequencies with equal amplitudes in the frequency domain, and vice versa [Calvert, J.B]. This is shown in Figures 2.5 and 2.6 below.
Figure 2.5: Dirac Delta in the Time Domain; Figure 2.6: Dirac Delta in the Frequency Domain (Calvert, J.B., Time and Frequency)
The use of complex exponential in the Fourier transform is very convenient, since
complex coefficients generated by it can be expressed using magnitude and phase. As
mentioned earlier, analyzing the square pulse in the frequency domain yields more
insight into this relationship. When we have a signal of certain duration, such as a
rectangular pulse, the frequency representation is no longer like that of the dirac delta.
Interestingly, the frequency response of a rectangular pulse is a sinc function whose
central lobe’s width is inversely proportional to the width of the rectangular pulse. This is
seen in Figure 2.7
sinc(x) = 1 for x = 0; sin(x)/x otherwise    (2.15)
Figure 2.7: Frequency Response of a Rectangular Pulse (Calvert, J.B., Time and Frequency)
2.3.4 Derivation of Fourier Coefficients
To confirm the above relationship and see the mathematical beauty behind it, let's consider a periodic square wave over one period, as shown in Figure 2.8 [Oppenheim, A.V & Willsky, A.S, 1996]:

x(t) = 1 for |t| < T₁; 0 for T₁ < |t| < T/2    (2.16)
Figure 2.8: Periodic Square Wave (Oppenheim, A.V & Willsky, A.S, Signal and Systems 2nd edition)
This signal is periodic with fundamental period T and fundamental frequency ω₀ = 2π/T. Due to its periodic nature, let's analyze the pulse centered at t = 0, where −T/2 ≤ t < T/2. It is this interval over which the integration is performed. Using these limits of integration with n = 0, eq. (2.10) becomes
C₀ = (1/T)·∫_{−T₁}^{T₁} dt = 2T₁/T    (2.17)
As mentioned earlier, C₀ is interpreted as a dc or constant component, which in this case equals the fraction of each period during which x(t) = 1. For k ≠ 0 (writing k for the harmonic index), eq. (2.10) becomes:

Ck = (1/T)·∫_{−T₁}^{T₁} e^{−jkω₀t} dt = −(1/(jkω₀T))·[ e^{−jkω₀t} ]_{−T₁}^{T₁}
This can be rewritten as

Ck = (2/(kω₀T))·[ (e^{jkω₀T₁} − e^{−jkω₀T₁}) / 2j ]    (2.18)
Noting that the term in the brackets is sin(kω₀T₁), we can therefore express the Fourier coefficients as:

Ck = 2·sin(kω₀T₁)/(kω₀T) = sin(kω₀T₁)/(kπ),  where ω₀T = 2π    (2.19)
In Figure 2.9 the coefficients are plotted for a fixed T₁ and several values of T. Although our time-domain signal is real, its frequency-domain representation (the Ck coefficients) may in general be complex. For this specific example the Fourier coefficients are real, so they can be depicted graphically with a single graph. As we change the period T of the square wave, we change the relative width of the rectangular pulse, which affects the width of the center lobe of the sinc function: the narrower the rectangular pulse, the wider the center lobe of the sinc function becomes, since the area under the curve must be conserved.
Figure 2.9: Fourier Series Coefficients for a Periodic Square Wave: (a) T0=4T1; (b) T0=8T1; (c) T0=16T1 (Oppenheim, A.V & Willsky, A.S, Signal and Systems 2nd edition)
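Eq. (2.19) can be verified numerically by comparing a Riemann-sum approximation of eq. (2.10) against the closed form (a sketch: the function names and grid density are illustrative):

```python
import numpy as np

def square_wave_coefficient(k, T, T1, num=65536):
    """Numerically approximate C_k via eq. (2.10) for the periodic square
    wave of eq. (2.16): x(t) = 1 for |t| < T1, 0 elsewhere on [-T/2, T/2)."""
    t = -T / 2 + np.arange(num) * (T / num)
    x = (np.abs(t) < T1).astype(float)
    w0 = 2 * np.pi / T
    return np.mean(x * np.exp(-1j * k * w0 * t))

def sinc_coefficient(k, T, T1):
    """Closed form of eq. (2.19): C_k = sin(k w0 T1) / (k pi)."""
    return np.sin(k * (2 * np.pi / T) * T1) / (k * np.pi)

T, T1 = 4.0, 0.5   # T = 8*T1, as in Figure 2.9(b)
```

The two agree to within the quadrature error of the Riemann sum.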
2.4 Hilbert Transform
The complex exponential is a vital component of the Fourier and the short time
Fourier transform. It acts as a kernel and extracts phase and magnitude information from
the analyzed signal, therefore it is important to get a good perspective of the exponential
term of the Fourier transform eq. (2.11) and how it is related to the phase. The ability of
the complex exponential to act as a modulator and frequency shifter helps to understand
the filter-bank structure of the short-time Fourier transform (STFT). In eq. (2.11) we
have:
F(ω) = ∫_{−∞}^{+∞} f(t)·e^{−jωt} dt, where e^{jωt} = cos(ωt) + j·sin(ωt)    (2.20)

This eq. (2.20) can be interpreted as the analytic signal of a cosine.
2.4.1 Analytic Signal
An analytic signal is a complex signal created by taking a real signal and adding in quadrature its Hilbert transform. It is also called the pre-envelope of the real signal [Langton, C, Signal Processing & Simulation Newsletter]. It can be defined as:

g₊(t) = g(t) + j·ĝ(t)    (2.21)

Substituting cos(ωt) for g(t) in eq. (2.21), we get:

g₊(t) = cos(ωt) + j·sin(ωt) = e^{jωt}
Before we go any further let’s understand the Hilbert transform and how it is related to
the analytic equation.
2.4.2 Hilbert Transform Theory
The Hilbert transform is related to the Fourier series, which is a representation of a signal as a summation of sines and cosines, eq. (2.6). By analyzing the building blocks of the Fourier series we can understand the Hilbert transform. In general, the Hilbert transform acts as a filter that changes the phase of the spectral components depending on the sign of their frequency. It only affects the phase of the signal and has no effect on the amplitude [Langton, C, Signal Processing & Simulation Newsletter]. Let's take a look at how we come to this conclusion.
Recall that the Fourier series can be written as:
x(t) = ∑_{n=−∞}^{+∞} An·cos(nω₀t) + j·∑_{n=−∞}^{+∞} Bn·sin(nω₀t)    (2.22)

where:

Cn = An + jBn and C−n = An − jBn    (2.23)

An and Bn are the spectral coefficients of the cosine and sine waves. The phase of the signal is calculated by

φ = tan⁻¹(Bn/An)    (2.24)
Cosine waves are 90° out of phase compared to sine waves and vice versa. So if a wave is strictly in terms of cosines, then the Bn component of eq. (2.6) is zero, and therefore the phase of the signal is zero. One way to look at the phase is as the angle between the real and imaginary axes, which implies that the spectral components of the signal lie on the real axis, as shown in Figure 2.10.
Figure 2.10: Cosine Wave Properties (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelop, Signal Processing & Simulation Newsletter)
Similarly, the sine terms have their An component of eq. (2.6) equal to zero, and therefore the phase of the signal is 90°. In other words, the phase of the sine terms is antisymmetric: +90° for positive frequencies and −90° for negative frequencies. This antisymmetry is clearly presented by the variable Q in Figure 2.11.
Figure 2.11: Sine Wave Properties (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelope, Signal Processing & Simulation Newsletter)

2.4.3 Phase Rotation
In the above section we described important characteristics of sine and cosine terms in the spectral domain. The term Q is directly related to the phase of the signal. So, if we were to turn the cosine into a sine, we would need to rotate the negative frequency (−Q) component of the cosine by +90° and the positive frequency component (+Q) by −90°. In
other words we need to multiply the –Q component by j and the +Q component by –j as
shown in Figure 2.12 [Langton, C, Signal Processing & Simulation Newsletter]
Figure 2.12: Rotating phasors to create a sine wave out of a cosine (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelope, Signal Processing & Simulation Newsletter)

Therefore, for any signal g(t), its Hilbert transform in the frequency domain is:
Ĝ(f) = −j·G(f) for f > 0; +j·G(f) for f < 0    (2.25)

(The hat over G(f) is a typical way of denoting the Hilbert transform of a signal.)
For example, applying the Hilbert transform on a cosine term gives us a sine term.
Applying it again gives us a negative cosine term and further application gives us a
negative sine term and then at last our original cosine.
cos ωt → sin ωt → −cos ωt → −sin ωt → cos ωt
For this reason the Hilbert transform is also called a "quadrature filter" [Langton, C, Signal Processing & Simulation Newsletter], as seen in Figure 2.13.
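The phase-rotation rule of eq. (2.25) can be implemented directly in the frequency domain, confirming the quadrature cycle above (a minimal sketch assuming an even-length, periodic input; the function name is illustrative):

```python
import numpy as np

def hilbert_transform(x):
    """Discrete Hilbert transform via the FFT: multiply positive
    frequencies by -j and negative frequencies by +j (eq. 2.25)."""
    N = len(x)
    X = np.fft.fft(x)
    h = np.zeros(N, dtype=complex)
    h[1:N // 2] = -1j       # positive frequencies
    h[N // 2 + 1:] = 1j     # negative frequencies
    # DC and Nyquist bins are zeroed, since sgn(0) = 0
    return np.real(np.fft.ifft(X * h))

N = 256
t = 2 * np.pi * np.arange(N) / N
s1 = hilbert_transform(np.cos(t))   # cos -> sin
s2 = hilbert_transform(s1)          # sin -> -cos
```

Repeated application walks through the cycle cos → sin → −cos → −sin → cos.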
Figure 2.13: The Hilbert transform shifts the phase of positive frequencies by −90° and negative frequencies by +90° (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelope, Signal Processing & Simulation Newsletter)

2.4.4 Complex Envelope
Based on our knowledge of the analytic signal and the Hilbert transform we can now analyze the complex exponential. The analytic signal of a cosine, knowing that its Hilbert transform is a sine, is given by:

g₊(t) = cos(ωt) + j·sin(ωt) = e^{jωt}    (2.26)

We know that the spectral components of a cosine term lie on the real axis and the spectral components of a sine term are antisymmetric in nature and lie on the imaginary axis (sec. 2.4.2). It is interesting to note that the analytic signal of a cosine (the complex exponential) has its spectral components entirely in the positive-frequency domain on the real axis, even though it consists of both cosine and sine terms, as shown in Figure 2.14.
Figure 2.14: Spectral Properties of the Complex Exponential (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelop, Signal Processing & Simulation Newsletter)
We can now define the complex envelope as:

g₊(t) = g̃(t)·e^{j2πf_c t}    (2.27)

where g̃(t) is the complex envelope of the signal g(t). Rewriting this equation (eq. 2.28) and taking its Fourier transform (eq. 2.29) reveals that the complex envelope is just a frequency-shifted version of the analytic signal [Langton, C, Signal Processing & Simulation Newsletter]:

g̃(t) = g₊(t)·e^{−j2πf_c t}    (2.28)

G̃(f) = 2G(f + f_c) for f > −f_c; G(0) for f = −f_c; 0 for f < −f_c    (2.29)
It is this feature that is used in linear system theory, where e^{−jωt} acts as a modulator. We will see its application in the coming sections on the short-time Fourier transform. One might ask: why do we need this representation? Is there an advantage? The following example shows the advantages of the complex envelope.
2.4.5 Advantages of the Complex Envelope

To illustrate the advantages of the complex envelope, let us consider an example where the signal s(t) is a base-band signal:

s(t) = 4cos(2t) − 6sin(3t)    (2.30)

(Note: for simplification purposes the 2π factor is omitted)

whose phase and magnitude properties are shown in Figure 2.15.
Figure 2.15: Spectral Properties of s(t) (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelope, Signal Processing & Simulation Newsletter)

Now let's multiply this signal by cos(100t) to modulate it and make it a band-pass signal:

g(t) = s(t)·cos(100t)
g(t) = 4cos(2t)cos(100t) − 6sin(3t)cos(100t)    (2.31)

It is important to note that the envelope of the modulated signal is the information signal. In Figure 2.16 the solid line represents this information signal.
Figure 2.16: The Modulated Signal and its Envelope (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelope, Signal Processing & Simulation Newsletter)

After simplifying the signal using trigonometric identities, we will take the Hilbert transform of g(t) and create its analytic signal. The trigonometric identities used to simplify eq. (2.31) are:

sin A cos B = [sin(A + B) + sin(A − B)] / 2
cos A cos B = [cos(A + B) + cos(A − B)] / 2
Using these identities we get:

g(t) = 2cos((2+100)t) + 2cos((2−100)t) − 3sin((3+100)t) − 3sin((3−100)t)    (2.32)

Applying the Hilbert transform to each term gives us:

ĝ(t) = 2sin((2+100)t) − 2sin((2−100)t) + 3cos((3+100)t) − 3cos((3−100)t)    (2.33)

Now we create the analytic signal by adding the original signal, eq. (2.32), to j times its Hilbert transform, eq. (2.33):

g₊(t) = g(t) + j·ĝ(t)    (2.34)

Rearranging eq. (2.34) using the Euler representation gives us:

g₊(t) = (4cos(2t) − 6sin(3t))·e^{j100t}    (2.35)
It is interesting to see that eq. (2.35) and eq. (2.30) are similar except for the modulator (e^{j100t}). On analyzing the complex envelope s(t) and the analytic signal of eq. (2.35) in the frequency domain, it becomes clear why this representation is advantageous when the analytic signal is viewed as a pass-band signal. Taking the Fourier transforms of eq. (2.35) and eq. (2.30), it is clear that, by the sampling theorem, the analytic signal needs a higher bandwidth than the complex envelope. So, with this method of coding it is easier to separate the information signal s(t) from the carrier. This concept is used in time-to-frequency transforms when the STFT is represented as a bank of filters or pass-bands.
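The demodulation described here can be sketched numerically: building the analytic signal of the modulated waveform and multiplying by e^{−j100t} (eq. 2.28) recovers the base-band signal s(t) (a sketch assuming an even-length, periodic input; the function name is illustrative):

```python
import numpy as np

def analytic_signal(x):
    """Analytic (pre-envelope) signal g+(t) = g(t) + j*Hilbert{g(t)},
    built in the frequency domain by doubling positive frequencies and
    zeroing negative ones."""
    N = len(x)
    X = np.fft.fft(x)
    X[1:N // 2] *= 2.0     # double positive frequencies
    X[N // 2 + 1:] = 0.0   # zero negative frequencies
    return np.fft.ifft(X)

N = 4096
t = 2 * np.pi * np.arange(N) / N
s = 4 * np.cos(2 * t) - 6 * np.sin(3 * t)              # base-band, eq. (2.30)
g = s * np.cos(100 * t)                                 # modulated, eq. (2.31)
envelope = analytic_signal(g) * np.exp(-1j * 100 * t)   # eq. (2.28)
```

Because all components of the modulated signal lie at positive frequencies near the carrier, the recovered complex envelope equals s(t) with zero imaginary part.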
Figure 2.17: Frequency-domain representation of the complex envelope s(t) and the analytic signal (Langton, C., Hilbert Transform, Analytic Signal and the Complex Envelope, Signal Processing & Simulation Newsletter)

2.5 Summary
In this chapter we have seen the basic building blocks that are required to represent a signal. We have seen that a complex function can be represented through simple building blocks, also known as basis functions. Using these building blocks we form a compressed representation of a complex function. One generalized view of a complex function is of the form:

ComplexFunction = ∑_i weight_i · SimpleFunction_i

In our case sinusoids (complex exponentials) are the building blocks of the Fourier transform, where for each frequency of the complex exponential the sinusoid at that frequency is compared to the signal. Based on this analysis the frequency correlation is determined: the spectral coefficients are high if the correlation is high, and vice versa. Along with this we have also seen how the complex exponential term acts as a modulator (frequency shifter) and how it gives rise to the complex envelope. This gives us insight into how the Fourier transform can be viewed as a linear system. Taking this knowledge we shall proceed to the next chapter, where we will discuss the short-time Fourier transform and the conditions that gave rise to such a concept.
Chapter 3: Time to Frequency Mapping
3.1 Quadrature Mirror Filter (QMF)
The QMF filter-bank gained its popularity with the introduction of decimators and
expanders in its structure. The system was introduced in the mid seventies [Croisier, et
al., 1976] and has been studied by researchers since. A simple QMF filterbank consists of
two banks, typically low-pass and high-pass, band-limited to a total width of π. The input
signal x(n) is filtered by H0(z) and H1(z) and then decimated by a factor of 2
(down-sampled by factor 2, where odd samples are removed and even ones are kept),
resulting in v0(n) and v1(n). These decimated signals are then sent through expanders
(up-sampled by the same factor as decimated, in our case 2) and passed to the
filters F0(z) and F1(z), whose purpose is to cancel all types of distortion.
Figure 3.1: QMF filter-bank
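The chain in Figure 3.1 can be sketched in code. The sketch below assumes Haar analysis filters, a choice made here only for simplicity; with the alias-cancelling synthesis filters derived later in this chapter, the output is a one-sample delayed copy of the input:

```python
import numpy as np

# Haar analysis filters (an illustrative choice): H0 low-pass, H1 high-pass
h0 = np.array([1.0, 1.0]) / np.sqrt(2)
h1 = np.array([1.0, -1.0]) / np.sqrt(2)
# Synthesis filters chosen to cancel aliasing: F0(z) = H1(-z), F1(z) = -H0(-z)
sign = np.array([1.0, -1.0])       # multiplies h[n] by (-1)^n, i.e. H(z) -> H(-z)
f0 = h1 * sign                     # [1, 1]/sqrt(2)
f1 = -h0 * sign                    # [-1, 1]/sqrt(2)

x = np.random.default_rng(1).standard_normal(64)

# Analysis bank: filter, then decimate by 2 (keep even-indexed samples)
v0 = np.convolve(x, h0)[::2]
v1 = np.convolve(x, h1)[::2]

def expand(v):
    """Expander: up-sample by 2 by inserting zeros between samples."""
    u = np.zeros(2 * len(v))
    u[::2] = v
    return u

# Synthesis bank: expand, filter, and sum the two channels
x_hat = np.convolve(expand(v0), f0) + np.convolve(expand(v1), f1)

# Perfect reconstruction up to one sample of delay: x_hat(n) = x(n - 1)
print(np.allclose(x_hat[1:1 + len(x)], x))
```

For the Haar pair the overall transfer function works out to T(z) = z^{-1}, which is why the comparison above is against a delayed copy of x.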
There are four types of distortions caused by the filterbank structures. They are
aliasing, amplitude distortion, phase distortion and quantization effects. It was found that
in the case of M channel filter banks, the conditions for alias cancellation and perfect
reconstruction are much more complicated. This was the reason pseudo QMF techniques
were introduced [Nussbaumer, 1981] as a means of approximating alias cancellation.
Vetterli and Vaidyanathan later showed that the use of polyphase components leads to
considerable computational simplification in filter-bank theory. A technique for the design of
M channel perfect reconstruction systems was developed [Vaidyanathan, 1987a,b], based
on polyphase matrices with the so-called paraunitary property. This same property also
finds application in the theory of orthonormal wavelet transforms [Vaidyanathan, P.P,
1993].
3.2 Aliasing and Imaging
In theory, an ideal filter is one with high stop-band attenuation, which minimizes
aliasing effects. Aliasing can be defined in a broad sense as frequency confusion caused
by decimation of a signal. The decimation (down-sampling) process causes overlap
between adjacent sub-bands, as shown in Figure 3.2.
Figure 3.2: Aliasing
When substantial energy for a bandwidth exceeds the ideal pass-band region,
aliasing has greater effect on the integrity of the signal. In principle it is possible to
choose filters that do not overlap, but this causes severe attenuation in the region of no
overlap. Boosting frequencies in that region will result in severe amplification of noise
(coding noise, channel noise, filter roundoff noise). A solution to this problem might be
increasing the filter order but this can be expensive computationally. The overlapping
response is therefore more practical. Even though this causes aliasing, the effect can be
cancelled by carefully designing the synthesis filters [Vaidyanathan, P.P, 1993].
Let’s examine Figure 3.1 in the z-domain to get a better understanding of the process.
The input signal x(n) can be expressed as:
Xk(z) = Hk(z)X(z) (3.1)
where k = 0, 1. The z-transform of the decimated signal vk(n) can be expressed as:

Vk(z) = (1/2)[Xk(z^{1/2}) + Xk(−z^{1/2})] (3.2)
for k = 0,1. Upsampling the decimated signal yields:
Yk(z) = Vk(z^2) = (1/2)[Xk(z) + Xk(−z)] (3.3)
The reconstructed signal is
X^(z) = F0(z)Y0(z) + F1(z)Y1(z) (3.4)
Substituting eq. (3.3) in eq. (3.4) we obtain:

X^(z) = (1/2)[F0(z)H0(z) + F1(z)H1(z)]X(z) + (1/2)[F0(z)H0(−z) + F1(z)H1(−z)]X(−z)
In matrix form:

2X^(z) = [X(z) X(−z)] · H(z) · f(z),  where  H(z) = [H0(z) H1(z); H0(−z) H1(−z)]  and  f(z) = [F0(z); F1(z)] (3.5)
Here the matrix H(z) is known as the alias component matrix. Note that
X(−z) = X(e^{j(ω−π)}) accounts for the aliasing due to decimation and the imaging due to the
expanders. It is therefore clear that we can cancel aliasing by choosing the filters such that the
quantity F0(z)H0(−z) + F1(z)H1(−z) is zero, which is satisfied by F0(z) = H1(−z) and
F1(z) = −H0(−z).
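This alias-cancellation choice can be verified with polynomial arithmetic; a small sketch (the length-6 H0 below is an arbitrary example of ours, not a filter from the text):

```python
import numpy as np

def flip_z(h):
    """Coefficients of H(-z): negate the taps at odd powers of z^-1."""
    return h * (-1.0) ** np.arange(len(h))

h0 = np.random.default_rng(7).standard_normal(6)   # arbitrary FIR H0(z)
h1 = flip_z(h0)                                    # QMF relation H1(z) = H0(-z)
f0 = flip_z(h1)                                    # F0(z) = H1(-z)
f1 = -flip_z(h0)                                   # F1(z) = -H0(-z)

# Alias gain (1/2)[F0(z)H0(-z) + F1(z)H1(-z)] should vanish identically;
# np.polymul multiplies the two coefficient sequences as polynomials
alias = 0.5 * (np.polymul(f0, flip_z(h0)) + np.polymul(f1, flip_z(h1)))
print(np.allclose(alias, 0.0))
```

The cancellation is exact for any H0(z): with this choice the two alias products are identical with opposite signs.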
34
3.3 Distortion Transfer Function
Apart from aliasing, amplitude distortion and phase distortion are related to the
distortion transfer function T(z). The distortion transfer function or ‘overall’ transfer
function is defined as:
T(z) = (1/2)[H0(z)F0(z) + H1(z)F1(z)] (3.6)
and is related to the input signal:
X^(z) = T(z)X(z) (3.7)
Expressing T(z) in terms of magnitude and phase:
T(e^{jω}) = |T(e^{jω})| e^{jφ(ω)} (3.8)
we can rewrite eq. (3.7) as:

X^(e^{jω}) = |T(e^{jω})| e^{jφ(ω)} X(e^{jω}) (3.9)
If |T(e^{jω})| is not constant for all ω then we have amplitude distortion. Similarly, if T(z) does not have linear
phase, X(z) suffers from phase distortion.
3.4 Polyphase Representation
Examining the matrix representation eq. (3.5), we can in principle cancel aliasing
by solving for the synthesis filters from f(z) = H^{−1}(z)t(z), where t(z) = H(z)f(z), but
this results in calculating f(z) explicitly as:

f(z) = Adj H(z) · t(z) / det H(z) (3.10)

This is possible only if the determinant is nonzero. Also, the zeros of the quantity
det H(z) are related to the analysis filters Hk(z) in a very complicated manner, thus
making it difficult to ensure that they are inside the unit circle, which is necessary for
35
stability of Fk(z) [Vaidyanathan, P.P, 1993]. It is for this reason that the polyphase
representation comes in handy.
3.4.1 Perfect Reconstruction
We can now express Hk in eq. (3.1) as an M channel filter-bank, where:

Hk(z) = Σl z^{−l} Ekl(z^M),  l = 0, ..., M−1 (3.11)
Similarly, the synthesis filters Fk can also be expressed as:

Fk(z) = Σl z^{−(M−1−l)} Rlk(z^M),  l = 0, ..., M−1 (3.12)
Examining eq. (3.11) and eq. (3.12) we can generate the matrices E(z) and R(z), which are
related to each other as:

R(z)E(z) = I (3.13)
Our aim is to obtain the reconstructed signal unchanged, and this is only possible if each
matrix nullifies the effect of the other; in other words, the product of the two polyphase
matrices equals an identity matrix. The condition still holds if we replace eq. (3.13) with:
R(z)E(z) = c z^{−m0} I (3.14)
Since the output of the filter-bank structure is just a delayed version of the input, we
can make this kind of modification to the equation. A more general matrix form of
eq. (3.14) is:
R(z)E(z) = c z^{−m0} [0  I_{M−r}; z^{−1} I_r  0]
The reconstructed signal is of the form x^(n) = c·x(n − n0), where n0 = M·m0 + r + M − 1 for
some integers m0 and r with 0 ≤ r ≤ M − 1.
36
3.5 Paraunitary Property
In the above section we expressed an M channel filter bank in terms of polyphase
matrices E(z) and R(z). It should be noted that if these filters are FIR and the filter-bank
has the perfect reconstruction property, then the polyphase matrix E(z) has to satisfy the
condition that the determinant of E(z) must be a delay [Vaidyanathan, P.P, 1993]:

det E(z) = α z^{−K},  α ≠ 0, K an integer (3.15)
A causal transfer matrix H(z) is said to be lossless or paraunitary if (a) each entry Hkm(z)
is stable and (b) H(e^{jω}) is unitary.
Before we examine the paraunitary property we must understand what a unitary matrix is.
3.5.1 Unitary Matrix
A complex matrix A is said to be unitary if A†A = I. For example:

A = (1/√2) [1  −i; i  −1]    A* = (1/√2) [1  i; −i  −1] (3.16)

A† = A*T = (1/√2) [1  −i; i  −1]    A^{−1} = (1/√2) [1  −i; i  −1] (3.17)

so that A† = A^{−1}.
This property complements the paraunitary property, which states that a matrix function
H(z) is paraunitary if it is unitary for all values of the parameter z:

H^T(z^{−1}) H(z) = I,  for all z ≠ 0 (3.18)

Eq. (3.18), together with causality and stability, defines a lossless system.
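Both conditions are easy to check numerically. The sketch below verifies the unitary example of eq. (3.16) and, as a simple paraunitary instance of our own choosing (not from the text), the polyphase matrix of the Haar filter pair, which is constant in z so the paraunitary condition reduces to ordinary unitarity:

```python
import numpy as np

# The example matrix A from eq. (3.16)
A = np.array([[1.0, -1.0j],
              [1.0j, -1.0]]) / np.sqrt(2)
print(np.allclose(A.conj().T @ A, np.eye(2)))   # unitary: A†A = I

# Polyphase matrix of the Haar pair H0(z) = (1 + z^-1)/sqrt(2),
# H1(z) = (1 - z^-1)/sqrt(2): E has no z dependence, so the paraunitary
# condition E^T(z^-1) E(z) = I reduces to E^T E = I
E = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2)
print(np.allclose(E.T @ E, np.eye(2)))          # paraunitary
```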
37
3.6 Summary
3.6.1 Advantages of Paraunitary Filter Banks
The advantage of applying the paraunitary property to E(z) is that no matrix
inversion is involved in the design. The synthesis filters are FIR, have the same length
as the analysis filters, and can be obtained by time-reversal and conjugation of the
analysis filter coefficients. If the paraunitary matrix E(z) is implemented as a cascade
structure then the perfect reconstruction property still holds in spite of multiplier
quantization. The cascade paraunitary structure also keeps the computational
complexity low. Last but not least, filter banks with paraunitary E(z) can be used
to generate an orthonormal basis for wavelet transforms. The orthonormal basis property
is discussed in Chapter 5, which leads to orthonormal basis tree structured filter banks,
also known as wavelet packets.
Chapter 4: Short-Time Fourier Transform
In section 2.3.3 we observed the relationship between time and frequency
representation. We observed that if an impulse in the time domain was viewed as a very
narrow rectangular pulse then its frequency representation would be a sinc function
whose central lobe is affected by the width of the impulse in the time domain. This in
turn reflects on the localization property of the Fourier transform which rejects the notion
of “frequency that varies with time.” According to Fourier analysis, a single frequency is
always associated with infinite time duration, as shown in Figure 4.1. To deal with this
time localization problem, the sampled signal can be windowed.
The basic mechanics of the discrete Fourier transform are to multiply the analyzed
signal with an impulse train of a certain sampling frequency, abiding by the sampling
theorem. Assuming the sampled signal is harmonic over N samples, we window the
digitized signal. Starting with the fundamental frequency we multiply the signal by a
complex exponential and perform a summation (calculating the area under the curve) of the
result. Recall that the Fourier transform equation is the summation of f(t)e^{−j2πft} over an
interval. This term can be interpreted as the block diagram in Figure 4.1, where the
cosine and sine multipliers are part of the complex exponential.
Figure 4.1: FFT Block Diagram
When the cosine and sine multipliers are multiplied, the area of the resultant
signal is considered. If the resulting area is zero then there is no correlation. Harmonic
frequencies of sine and cosines are multiplied and Fourier coefficients are thus derived.
Though this approach may seem trivial, the discrete Fourier transform has drawbacks
based on its formulation. For it to work flawlessly it must have a sample space ranging from
negative infinity to positive infinity, which is not practical. So, in order to tackle this
problem, a discrete set of samples is windowed and then applied to the Fourier
transform. The window is then shifted in uniform amounts and the above computation is
repeated. This is known as the short-time Fourier transform.
4.1 Analysis of the STFT Equation
The short-time Fourier transform consists of three main components: the signal to
be analyzed x(n), the window function v(n), and the Fourier transform kernel or basis
function e^{−jωn}. First the signal is multiplied with the window, which is typically of
finite duration. Then the kernel is applied to the product x(n)v(n) to
calculate the Fourier transform. The window is then shifted and the process is repeated
again:

XSTFT(e^{jω}, m) = Σn x(n) v(n − m) e^{−jωn} (4.1)
The function XSTFT(e^{jω}, m) has two variables, ω and m. The frequency variable ω is
continuous and ranges over −π ≤ ω < π. The shift variable m is typically an integer
multiple of some fixed integer. Essentially the window captures features of the signal
around m and helps to localize time domain data.
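Eq. (4.1) can be implemented directly; a sketch (the function name and the convention of restricting m to multiples of a fixed hop are our own):

```python
import numpy as np

def stft(x, v, hop):
    """X(e^{jw}, m) = sum_n x(n) v(n - m) e^{-jwn}, evaluated on the DFT
    frequency grid w_k = 2*pi*k/N for window positions m = 0, hop, 2*hop, ..."""
    N = len(v)
    k = np.arange(N)
    out = []
    for m in range(0, len(x) - N + 1, hop):
        seg = x[m:m + N] * v          # x(n) v(n - m) for n = m .. m+N-1
        # substituting n = m + r turns the sum into e^{-jwm} times the DFT
        # of the windowed frame (shift theorem)
        out.append(np.exp(-2j * np.pi * k * m / N) * np.fft.fft(seg))
    return np.array(out)

# a complex tone at DFT bin 8 concentrates in that bin for every frame
n = np.arange(256)
x = np.exp(2j * np.pi * 8 * n / 64)
S = stft(x, np.ones(64), hop=32)
print([int(np.argmax(np.abs(row))) for row in S])   # each frame peaks at bin 8
```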
4.2 STFT as a Bank of Filters
Since most signal processing is done using linear time-invariant systems, it is
beneficial to explore this representation of the STFT. Furthermore, this interpretation helps
us to generalize the STFT to obtain more flexibility [Vaidyanathan, P.P, 1993]. In section
2.4.3 we discussed the complex envelope and showed how the complex exponential acts
as a modulator that performs a frequency shift; more specifically, it shifts the Fourier
transform towards the left by fc. The STFT can be looked at as a bank of band-pass filters.
The basic block diagram of a single frequency channel is shown in Figure 4.2.
Figure 4.2: STFT Represented in terms of a Linear System
To gain further insight, let's modify eq. (4.1) by multiplying it with e^{jωm}:

e^{jωm} XSTFT(e^{jω}, m) = Σn x(n) v(n − m) e^{−jω(n−m)} (4.2)
This equation represents an LTI system as shown in Figure 4.3, where m
represents the center of the STFT window. Although k is not mentioned in the above
equation, it is related to m in such a way that m is an integer multiple of k. So if the
window were to shift, it would shift from v(n) to v(n−k), v(n−2k) and so on. In this example let
k = 1, so the output is computed at every sample, like a traditional Fourier transform. The impulse response
of the LTI system is a band-pass filter of the form v(−n)e^{jω0n}, whose frequency
response is V(e^{−j(ω−ω0)}). The output sequence t0(n) is therefore the output of a band-pass filter,
whose pass-band is centered around ω0. The modulator acts as a frequency shifter which
re-centers the frequency response around zero [Vaidyanathan, P.P, 1993].
Figure 4.3: Rearranged STFT Representation in terms of a Linear System
Examining this in the frequency domain we see that the STFT reduces to a
filterbank with M band-pass filters with responses Hk(e^{jω}) = V(e^{−j(ω−ωk)}), as shown in
Figure 4.4. The pass-band of Hk(e^{jω}) is centered around ωk, where k = 0, 1, 2, ..., M−1.
Figure 4.4: STFT viewed as a Filter-Bank
4.3 Effects of Windowing
Unlike the traditional Fourier transform, the STFT is uniquely defined by
the chosen window v(n). The choice of window governs the tradeoff between
time localization and frequency resolution. It is interesting to see that as the window
function gets wider the frequency information gets more localized, and vice versa. Figure
4.5 shows how a small window of 512 samples has wider lobes compared to the larger
window of 2048 samples. This confirms our previous statement that wider windows have
better frequency resolution, or better information localization in the frequency domain.
Figure 4.5: Fourier Transform of 512 (left) and 2048 (right) Samples
4.3.1 Choice of the Best Window
We know from earlier that a narrow window in the time domain leads to a broader
frequency transform and vice versa. To make this concept more precise, the rms (root-
mean-squared) duration of a signal was introduced [Gabor, 1946], [Papoulis, 1977a]. The
two non-negative quantities Dt and Df are defined as the duration of the energy in the time and
frequency domains:
Dt² = (1/E) ∫ t² |v(t)|² dt (4.3)

Df² = (1/2πE) ∫ Ω² |V(jΩ)|² dΩ (4.4)
where E is the window energy, E = ∫ |v(t)|² dt, and v(t) is the signal in the
time domain. Interestingly enough, the rms duration of a triangular waveform is smaller
than that of a rectangular one even though they share the same duration. This is
because the t² factor in the definition of Dt² weights values of v(t) far from the origin more
heavily. It turns out, based on the uncertainty principle, that the product Dt·Df cannot be arbitrarily
small: Dt·Df ≥ 0.5, with equality if and only if v(t) = Ae^{−αt²}, α > 0. Therefore, the optimal window
would take the form of a Gaussian waveform, with its ideal length being infinite
[Vaidyanathan, P.P, 1993].
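This bound is easy to confirm numerically. The sketch below computes Dt·Df for a Gaussian and a Hann window (the Hann comparison is our own choice, not from the text), using Parseval's relation to evaluate the frequency-domain integral as the energy of v'(t):

```python
import numpy as np

t = np.linspace(-8.0, 8.0, 100001)
dt = t[1] - t[0]

def duration_product(v):
    """D_t * D_f from eqs. (4.3)-(4.4); by Parseval's theorem the integral
    (1/2pi) int W^2 |V(jW)|^2 dW equals int |v'(t)|^2 dt, which avoids an
    explicit Fourier transform."""
    E = np.sum(v ** 2) * dt
    Dt = np.sqrt(np.sum(t ** 2 * v ** 2) * dt / E)
    Df = np.sqrt(np.sum(np.gradient(v, dt) ** 2) * dt / E)
    return Dt * Df

gauss = np.exp(-t ** 2 / 2.0)
hann = np.where(np.abs(t) <= 1.0, 0.5 * (1.0 + np.cos(np.pi * t)), 0.0)
print(duration_product(gauss))   # meets the lower bound of 0.5
print(duration_product(hann))    # strictly above 0.5
```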
4.4 Summary
In this chapter we have seen the transition of the traditional Fourier transform to
the short-time Fourier transform. We have come to realize that the windowing function is
what uniquely defines the STFT, which underlines its weakness. The STFT is the result
of the evolution of the Fourier transform in order to gain better flexibility in localizing
time and frequency. In the next chapter we shall explore a new way of analyzing a time
signal in order to gain better frequency resolution.
Chapter 5: The Wavelet Transform
The short-time Fourier transform is a convenient way to analyze the frequency
information of a signal. However, we know that audio signals and most of the signals that
exist in the real world are very dynamic in nature. Information can be hidden by means of
modulation. To get a better understanding of the weakness of the STFT, let’s consider
Figure 5.1 which shows two cases.
5.1 Weakness of the STFT
In the first case, x(t) is a high-frequency signal and v(t) is the window function. It
is apparent from Figure 5.1(a) that the window function captures many
cycles of the input signal x(t) compared to Figure 5.1(b). Thus, the accuracy of the estimated
Fourier transform is poor at low frequencies, and improves as the frequency increases
[Vaidyanathan, P.P, 1993]. To gain more information about the signal it would be
appropriate to have a window whose width adjusts with the frequency of the input signal.
An ideal filter bank structure would have narrow bandwidths (wider windows) at low
frequencies and wider bandwidths (shorter windows) at high frequencies. Keeping this
Figure 5.1: (a) high-frequency signal, (b) low-frequency signal x(t) modulated by the windowed function v(t)
concept in mind one can tackle this problem by replacing the window function v(t) with a
function of both frequency and time, so that the time domain window gets wider (narrow
bandwidth) as frequency decreases and vice versa. This way the window function
captures the same number of zero crossings of the input signal irrespective of the change
in frequency. Furthermore, as the window gets wider, it is also desirable to have wider
step sizes for moving the window. This also means that the decimation ratio
increases as you go lower in frequency.
5.2 STFT to Wavelets
When the STFT is viewed as a bank of filters it consists of band-pass filters of
equal bandwidth, obtained by modulation with the complex exponential. It is this that
restricts the time resolution of the STFT. To overcome this, one must abandon this
modulation scheme and replace it with a function of both frequency and time, thus
obtaining the filters hk(t), where a is greater than one and k is an integer:

hk(t) = a^{−k/2} h(a^{−k}t) (5.1)
Here k plays the role of frequency, thus frequency-scaling the response rather than
frequency-shifting it (the STFT case). The scale factor a^{−k/2} in eq. (5.1) is meant to ensure
that the energy ∫ |hk(t)|² dt is independent of k. An equivalent representation of eq. (5.1)
in the frequency domain can be written as:

Hk(jΩ) = a^{k/2} H(j a^k Ω) (5.2)
We know the ear can be viewed as a set of non-linear band-pass filters whose frequency
resolution decreases with increasing frequency. Based on this analogy let's take
H(jΩ) as a band-pass filter with cutoff frequencies α and β. Since discrete systems are
efficiently described in powers of two, let's assume a = 2 and β = 2α. We can define the
center frequency to be the geometric mean of the two cutoff edges, that is [Vaidyanathan,
P.P, 1993]:

Ωk = 2^{−k} √(αβ) = 2^{−k} √2 α (5.3)
5.2.1 Modification of the STFT
If we consider the continuous version of eq. (4.1) we get:

XSTFT(jΩk, τ) = e^{−jΩkτ} ∫ x(t) hk(t − τ) dt (5.4)

here e^{−jΩkτ} is the kernel, Ωk is the frequency (analog domain) and hk are the filters.
Keeping this equation structure we substitute eq. (5.1) in eq. (5.4) and get:

a^{−k/2} e^{−jΩkτ} ∫ x(t) h(a^{−k}(t − τ)) dt (5.5)
Now we know from our earlier discussion that the bandwidth of Hk(jΩ) gets
smaller as k (frequency) increases. With this varying bandwidth the windowing (time
domain) is also affected. As the window size varies one must account for the step sizes,
so we replace the continuous variable τ with a^k nT, where n is an integer. This means
that the step size for window movement is a^k T and it increases with k. In other words the
window step increases as the center frequency Ωk of the filter decreases.
Eq. (5.5) takes care of the first case of changing bandwidths. Removing the kernel and
taking account of the step size (a^k nT) as the frequency resolution (bandwidth) changes, we
can modify eq. (5.5) to get:
XDWT(k, n) = ∫ x(t) hk(a^k nT − t) dt (5.6)
47
Note that the above equation represents the convolution between x(t) and hk(t)
evaluated at a discrete set of points a^k nT. In other words, the output of the convolution is
sampled with spacing a^k T [Vaidyanathan, P.P, 1993]. To summarize the fundamental
differences between the STFT and the wavelet transform one can look at the time-frequency
plot. The STFT has uniform time and frequency spacing, whereas in the wavelet transform
the frequency spacing gets smaller at lower frequencies and the corresponding time
spacing gets larger [Vaidyanathan, P.P, 1993], as shown in Figure 5.2.
Figure 5.2: Fundamental difference between the STFT (a) and the wavelet transform (b) Vaidyanathan, P.P, “Multirate Systems and Filter Banks”
The beauty of wavelets is that the wavelet transform is not explicitly implemented by a
moving window because there is in reality no unique window, as seen in equation (5.6).
The system is in essence a filter bank, and is somewhat analogous to the family of
windows.
Another general form of the wavelet transform is:

XCWT(p, q) = (1/√p) ∫ x(t) f((t − q)/p) dt (5.7)
here p and q are real-valued continuous variables, where p = a^k, q = a^k nT and f(t) = h(−t).
This is known as the continuous wavelet transform. The variable p can be
considered a scaling factor, where the scale of a wavelet is inversely related to
its frequency; in other words, the larger the scale of the wavelet, the lower the frequency
of the wavelet and the narrower the bandwidth, and vice versa. The variable q can be looked
upon as the translation parameter, which is responsible for the shifting of the wavelet. Its
step size increases as the value of k increases. This can be seen in Figure 5.3.
Figure 5.3: Amplitude, scale and translation plot of a continuous wavelet transform Robi, P., “The Story of Wavelets”, Rowan University ©
5.3 Inversion of the Wavelet Transform
The original signal x(t) is reconstructed from the wavelet coefficients. The
reconstruction from XDWT depends on the filter h(t) and the parameters a and T, which
completely characterize the transformation. Changing a will change the spacing of the
band-pass filters and the frequency resolution of filter-banks as described earlier in
section 5.2. If the inverse transform exists it appears as:
x(t) = Σk Σn XDWT(k, n) ψkn(t)  (inverse DWT) (5.8)
5.4 Orthonormal Basis
A subset v1, ..., vk of a vector space V with an inner product is called
orthonormal if <vi, vj> = 0 when i ≠ j, that is, when the vectors are mutually
perpendicular. Moreover, they are all required to have length one: <vi, vi> = 1. An
orthonormal set must be linearly independent (no nontrivial linear combination of the
functions can equal zero), so it is a vector space basis for the space it spans. Such a
basis is called an orthonormal basis [Eric W. Weisstein et al].
Of particular interest is the case where ψkn(t) is a set of orthonormal functions,
where the integral of one basis function against the conjugate of another is equal to
unity only when the indices coincide:

∫ ψkn(t) ψ*lm(t) dt = δ(k − l) δ(n − m) (5.9)
Applying the orthogonality property to eq. (5.6):

XDWT(k, n) = ∫ x(t) ψ*kn(t) dt

we conclude:

ψkn(t) = a^{−k/2} h*(nT − a^{−k}t) (5.10)
      = hk*(a^k nT − t) (5.11)

But we have ψ(t) = f(t), so that in the orthonormal case f(t) = h*(−t), and thus
fk(t) = hk*(−t), which is very similar to the perfect reconstruction paraunitary QMF
banks.
5.5 Wavelet Packet Analysis
Wavelet packets are smooth versions of Walsh functions [Coifman, Ronald R. &
Wickerhauser, Mladen V.]. Walsh functions consist of trains of square pulses with −1 and
+1 states, such that transitions may only occur at fixed intervals of a unit time step;
the initial state is always +1.
Wavelet packet analysis is a generalization of the wavelet decomposition that offers a rich range of
possibilities for analyzing a signal. The wavelet packet analysis is a tree-structured filter
bank that splits the signal into two sub-bands; after decimation, each sub-band is again
split into two and decimated. The sub-bands are then recombined, two at a time, by use of
two-channel synthesis banks. Each node in the tree structure represents a subspace of the
original signal. Each subspace is the orthogonal direct sum (direct sum of two subspaces)
of its two children nodes. The leaves of every connected subtree give an orthonormal
basis. This procedure permits the segmentation of acoustic signals into those dyadic
windows best adapted to the local frequency content [Coifman, Ronald R. &
Wickerhauser, Mladen V.]. The low-pass sub-band of the decomposition is known as the
approximation and the high-pass sub-band as the detail. For
an n-level decomposition, there are n+1 possible ways to decompose the signal [Matlab
documentation, wavelet packet analysis]. A three-level wavelet decomposition tree is
shown in Figure 5.4, where the signal S can be reconstructed by adding the
approximation and its previous details.
S = A1+D1
= A2+D2+ D1
= A3+D3+ D2+D1
Figure 5.4: 3-level Wavelet decomposition tree
5.5.1 Discrete Wavelet Transform
The information obtained from the continuous wavelet transform (CWT) is often
redundant in nature. In practice, the CWT computed by computers is a discretized
version of the transform:

XDWT(k, n) = 2^{−k/2} ∫ x(t) h(nT − 2^{−k}t) dt,  k, n integers (5.12)
An elegant way to represent the non-redundant data in the time-frequency plane
would be to sample the plane on a dyadic (octave) grid. This representation is ideal since
it maps the frequency spectrum on a logarithmic scale similar to that of the ear. The
dyadic sampling of the time-frequency plane can be achieved by a series of up/down-
sampling operations. This approach gives us a multi-resolution representation, as seen in
Figure 5.5.
5.6 Wavelet Packet Tree Representation
The wavelet packet tree can be represented in many ways. Among them the most
important to us are the Energy Representation, Index Representation, and the Depth
Representation.
5.6.1 Energy Representation
The energy representation of the wavelet decomposition tree displays the energy
of each node in the tree. The frequency response of the wavelet, determined by its
cut-off frequency, plays an important role in how the energy is distributed along the
wavelet decomposition tree. The pseudocode methodology for
calculating the energy of the wavelet tree is as follows:
1) get all coefficients of the parent node
2) calculate the total energy of the parent node:
ETotal = Σn Cn²,  where Cn are the wavelet coefficients (5.13)
3) get the terminal nodes
4) calculate the energy of each terminal node:
ETerminalNode = Σn Cn²,  where Cn are the wavelet coefficients of the terminal node
Figure 5.5: (left) Frequency response obtained by scaling, (right) Filterbank representation of discrete wavelet transform
5) Represent the energy in terms of a percentage:
E%TerminalNode = 100 · ETerminalNode / ETotal
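The five steps above can be sketched in code. For self-containment this sketch uses an explicit Haar split rather than a wavelet toolbox, and node paths like 'ad' (approximation, then detail) stand in for the tree nodes:

```python
import numpy as np

def haar_split(x):
    """One analysis step of the Haar (db1) wavelet: orthonormal approximation
    and detail coefficients, each half the length of the input."""
    x = x[: len(x) // 2 * 2]                 # truncate to even length
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low-pass (approximation)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # high-pass (detail)
    return a, d

def energy_tree(x, depth):
    """Steps 1-5: percentage energy of every terminal node at `depth`."""
    x = np.asarray(x, dtype=float)
    e_total = np.sum(x ** 2)                 # step 2: parent (root) energy
    level = {'': x}
    for _ in range(depth):                   # grow the full packet tree
        level = {p + s: c for p, node in level.items()
                 for s, c in zip('ad', haar_split(node))}
    # steps 3-5: terminal-node energies as percentages of the total
    return {p: 100.0 * np.sum(c ** 2) / e_total for p, c in level.items()}

x = np.sin(2 * np.pi * 1000 * np.arange(512) / 44100)   # 1 kHz test tone
tree = energy_tree(x, depth=3)
print(round(sum(tree.values()), 6))   # orthonormality: percentages sum to 100
```

Because the Haar split is orthonormal, the terminal-node percentages always sum to 100, which is a useful sanity check on the tree.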
Below is the energy representation of a 1kHz signal, analyzed using a Haar wavelet at
depth level 3.
Figure 5.6: Depth Level-3 Energy Tree of 1kHz Signal
5.6.2 Index Representation
The index representation shows the index value of each node in the tree.
The values progress from left to right in ascending order. It is important to note that
the ordering of the node indices does not change even if one of the nodes is not decomposed
into its children. Figure 5.7 is an index representation of a 1kHz signal with node 5 not
decomposed.
Figure 5.7: Depth Level-3 Index Tree of 1kHz Signal
This type of representation is important in terms of coding the detection algorithm which
relies on keeping track of the terminal nodes (7,8,9,10,5,13,14).
5.6.3 Filterbank Representation
One can also look at the wavelet tree decomposition as a tree of filter banks where
each parent node is split into its low-pass and high-pass parts. These nodes in turn can also act
as parent nodes, and so on. Figure 5.8 shows the frequency mapping of the wavelet tree in Figure
5.9.
Figure 5.8: Filter-bank Representation of Depth Level-3 Wavelet Packet Decomposition Tree
Figure 5.9: Filter-bank Representation of Depth Level-3 Wavelet Packet Decomposition Tree
The discrete wavelet packet tree obtains its multi-resolution representation by
sampling the time-frequency plane on a dyadic (octave) grid. This is done by down-
sampling by a factor of two in the analysis stage and up-sampling by a factor of two in
the reconstruction stage. Figure 5.10 shows a discrete wavelet packet tree in its analysis stage.
Note that the number of samples decreases as the depth of the wavelet tree increases, and
the bandwidth of the filters halves at each decomposition.
Figure 5.10: Discrete wavelet packet tree (analysis stage)
Chapter 6: Analysis and Results
To estimate tonality it is important to understand the masking phenomenon. In
regard to masking we are most concerned with instantaneous masking, or frequency
masking. Data suggest that a pure tone and narrow-band noise of equal intensity and
equal loudness have different masking abilities. Narrow-band noise in particular is
complicated by the fact that a reduction in bandwidth is accompanied by a decrease in the
rate of intensity fluctuations [Stevens, 1956]. Bos and de Boer [1966] point out that the
slow rate of intensity fluctuations inherent in the structure of narrow-band noise increases
its ability to mask. Young and Wenner also indicated that the 20-dB difference between
the masking effect of a tone and a narrow critical-band noise disappears when the pure
tone is replaced by a tone that is frequency-modulated at a rate of 25 Hz. More
research is needed to see how frequency-modulated tones can be used as partial maskers
[Hellman, R.P., Harvard University, Cambridge, Massachusetts].
In previous work on tonality estimation [Johnston, J.D., 1988], the spectral flatness
measure was used to interpolate between the masking threshold formulas of [Hellman,
R.P.] and [Scharf, B., 1970]. The problem arises with the notion of global tonality
[Johnston, J.D., 1990]. Signals such as speech have "tonal" parts and "noisy" parts of
considerable energy at high frequencies. The resulting unpredictability measure will not
show the parts of the signal that are very tonal (due to the fixed block size of the
transform). Tonality by definition, in terms of perceptual coding, estimates the amount of
masking a signal can achieve based on its type (tonal or noisy). The tonality index, on the
other hand, is a global value that characterizes a signal's tonality based on its correlation
information. The wavelet packet analysis is suitable for this task because one can control
the bandwidth of the frequency bands, making it possible to detect transients (changes in
frequency, attack regions) more accurately.
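For reference, the spectral flatness approach of [Johnston, 1988] can be sketched as follows; the −60 dB mapping constant follows Johnston, while the frame length and the noise floor added to the power spectrum are illustrative choices of ours:

```python
import numpy as np

def tonality_index(frame):
    """Global tonality from the spectral flatness measure (SFM):
    SFM_dB = 10*log10(geometric mean / arithmetic mean) of the power
    spectrum; alpha = min(SFM_dB / -60, 1) maps flat (noise-like) spectra
    to values near 0 and peaky (tone-like) spectra to 1."""
    p = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12   # power spectrum, floored
    sfm_db = 10.0 * np.log10(np.exp(np.mean(np.log(p))) / np.mean(p))
    return min(sfm_db / -60.0, 1.0)

n = np.arange(1024)
tone = np.sin(2 * np.pi * 32 * n / 1024)                # pure tone: alpha near 1
noise = np.random.default_rng(3).standard_normal(1024)  # white noise: alpha near 0
print(tonality_index(tone), tonality_index(noise))
```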
Our proposed model uses this analysis tool to estimate tonality and uses Fourier
analysis to determine signal levels (SPL) and spread signal levels. A general block
diagram is shown in Figure 6.1.
Figure 6.1: General block diagram of the proposed model
6.1 Detection Scheme
The proposed detection scheme relies on the flow of energy, which is analyzed
using the wavelet tree decomposition. Each audio frame is considered to have an energy
value of 100 and is decomposed to the first level as shown in Figure 6.2. The energy
ratios of the child nodes (low-end and high-end) are then calculated and compared to the
parent node; the resulting ratios are compared to a threshold range (in our case
1.0 ≤ ratio < 2.4). Nodes with ratios in this range are further decomposed.
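In a sketch, this split test amounts to one orthonormal Haar analysis step followed by a parent-to-child energy comparison. The following Python snippet is an illustrative reimplementation, not the thesis's MATLAB code; the helper names `haar_split` and `split_decision`, and the reading of the ratio as parent energy over child energy, are assumptions made for illustration.

```python
import math

def haar_split(x):
    """One level of the orthonormal Haar transform: approximation (low-pass)
    and detail (high-pass) coefficients of an even-length signal."""
    a = [(x[i] + x[i + 1]) / math.sqrt(2) for i in range(0, len(x) - 1, 2)]
    d = [(x[i] - x[i + 1]) / math.sqrt(2) for i in range(0, len(x) - 1, 2)]
    return a, d

def energy(x):
    return sum(v * v for v in x)

def split_decision(x, lo=1.0, hi=2.4):
    """Parent-to-child energy ratio test: a child holding a large share of the
    parent's energy (ratio inside [lo, hi)) is decomposed further."""
    a, d = haar_split(x)
    e_parent = energy(x)  # energy is preserved across an orthonormal Haar split
    decision = {}
    for name, child in (("approx", a), ("detail", d)):
        e_child = energy(child)
        decision[name] = e_child > 0 and lo <= e_parent / e_child < hi
    return decision
```

For a slowly varying (low-frequency) frame the approximation branch captures essentially all of the energy, so only that branch qualifies for further splitting; for a Nyquist-rate alternation the roles reverse.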
Figure 6.2: level-1 Wavelet Packet Decomposition of a signal having multiple tones (4kHz, 10kHz, 15kHz)
A signal having multiple tones (4kHz, 10kHz, 15kHz) would have a decomposition tree,
as shown in Figure 6.3, where the highlighted branches indicate the nodes that the detector has detected.
Figure 6.3: level-3 Wavelet Packet Decomposition of multiple tones (4kHz, 10kHz, 15kHz)
It is important to know the frequency response of the wavelet being used for
decomposition, since that determines the cut-off frequency. The energy distribution is
determined by the wavelet coefficients the wavelet generates, and in order to map the
wavelet decomposition tree onto the frequency axis one must know where the wavelet
cuts off in frequency. The filterbank representation in the previous chapter is a good way
to picture the frequency distribution in the decomposition tree.
This thesis uses the simplest case, the Haar (Daubechies 1) wavelet, whose
cut-off frequency is half of Nyquist.
6.1.1 Frequency Breakdown
The resolution of the detector is based on how far it is allowed to decompose. The
level of decomposition is also known as the depth of the wavelet tree. At present the
threshold for the depth is set to 5, which means that the wavelet decomposition can
produce 62 nodes. The concept of the frequency breakdown is analogous to how the
filterbank (wavelet) splits the energy spectrum. The algorithm checks for the energy
ratios and splits the node with the highest energy, which means it will split nodes having
energies above 95% and will stop splitting nodes below 23%.
Energy residing between these percentage criteria is usually a strong indicator
of a tone. For example, a tone present in one of the sub-bands could be detected alongside
its adjacent sub-band as one decomposes a node (decreases the bandwidth); this is usually
seen as energy being shifted from an approximation (low-pass) to a detail (high-pass)
sub-band or vice versa. It is at this stage that the detector has successfully detected a
tone; this is also known as frequency breakdown, as shown in Figure 6.4.
6.1.2 Detector Pseudocode Methodology
The detector code methodology is set up in such a way that there are four pointers
(AA, AD, DD, DA). Two of the pointers are responsible for keeping track of the
offspring nodes of the approximations (low-end, AA, AD) and the other two for the
details (high-end, DD, DA). These pointers point to values that contain parent to child
node energy ratios and are updated whenever a node is split. A variable keeps track of the
energy of the child nodes; it ensures that the algorithm picks the highest energy node for
the decomposition tree path. Figure 6.5 is a 2nd level decomposition tree with pointers
(AA, AD, DD, DA) pointing to the corresponding wavelet tree branches.
Figure 6.4: level-3 Wavelet Packet Index Tree and the Coefficients of the Terminal Nodes
Figure 6.5: level-2 Wavelet Packet Energy Tree Detector Code Pointers
As nodes are being split, terminal nodes of the decomposition tree are stored. This
information is important since one can trace back the path the decomposition tree takes
and send the desired nodes to the tonality analyzer. The tracing of the nodes is done by
examining the energy-difference and the nodes that correspond to it. A simple formula
generates the parent node PN from its child node CN:
PN = (CN − 1) / 2 (6.1)
If the child node index is even, it is first decremented by one before applying
equation 6.1 (equivalently, PN = ⌊(CN − 1)/2⌋). This is shown in Figure 6.6.
Figure 6.6: level-2 Wavelet Index Tree used to trace the Nodes that are sent to the tonality analyzer
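Since the tree is indexed in level order (root = 0, children of node n at 2n + 1 and 2n + 2, consistent with the node numbers in Figures 6.5–6.7), the trace-back rule of equation 6.1 can be sketched in a few lines of Python. This is an illustration, not the thesis code; `parent_node` and `trace_path` are hypothetical helper names.

```python
def parent_node(cn):
    """Parent index of child node cn (equation 6.1). For an even child the
    index is first decremented by one, then (CN - 1)/2 is applied; this is
    the same as integer division by two after subtracting one."""
    if cn <= 0:
        raise ValueError("the root node has no parent")
    if cn % 2 == 0:          # even child: decrement first, then apply (CN - 1)/2
        cn -= 1
    return (cn - 1) // 2

def trace_path(cn):
    """Trace a terminal node back toward the root, stopping at node 1 or 2
    (the level-1 nodes, as used later in the frequency mapping)."""
    path = [cn]
    while cn > 2:
        cn = parent_node(cn)
        path.append(cn)
    return path
```

For example, node 13 traces back through node 6 to node 2, matching the child/parent relationships of the level-order index tree.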
6.1.3 Detection Process
For an input signal containing three tones (4kHz, 10kHz, 15kHz) the detector first
analyzes the signal’s mid and high range of frequencies. In other words nodes (4), (5) and
(6) are analyzed, as shown in Figure 6.7.
Figure 6.7: level-2 Wavelet Packet Energy Tree Detector Stage-I: nodes (4), (5) and (6) are analyzed first
It then calculates the energy ratios, checks them against the threshold ratio criterion
(1.0 ≤ ratio < 2.4) and splits the nodes that meet it. The decomposition tree stops
splitting once it reaches a depth of 5; this condition in turn defines the
frequency resolution of the wavelet decomposition tree. This is seen in Figure 6.8, where
the green lines are the nodes that are selected by the detector for synthesis or
reconstruction. It is these nodes that are passed on to the node reconstructor.
Figure 6.8: level-4 Wavelet Packet Energy Tree Detector Stage-II: nodes (4), (5) and (6) are analyzed first; green lines represent the nodes that are going to be analyzed by the tonality analyzer
Once the decompositions of the mid- and high-range frequencies (nodes 4, 5, 6) are
done, the algorithm further analyzes and decomposes the low frequencies (node 3) with a
new threshold criterion (ratio ≥ 1.0) to ensure better resolution in the low-frequency
range, as shown in Figure 6.9.
Figure 6.9: level-4 Wavelet Packet Energy Tree Detector Stage-III: node (3) is analyzed; green lines represent the nodes that are going to be analyzed by the tonality analyzer
6.2 Node Reconstruction
In Figure 6.9 the green lines represent the nodes that are selected for
reconstruction. This process involves applying the inverse discrete wavelet transform to
the selected wavelet coefficients (selected nodes) along the wavelet tree. The index values
of these nodes are then mapped along the frequency axis based on the bandwidth of each
node. Among the selected nodes, the correlation information of the first two nodes, or the
nodes with the highest energy, is used to estimate tonality, as shown in Figure 6.10; this
is due to the integrity of the information these nodes contain.
Figure 6.10 Wavelet Energy Tree: The white-arrows showing the two nodes used to calculate our tonality
The correlation information is obtained by computing the auto-correlation
function. One might ask why the auto-correlation function is used rather than a single
correlation value: the auto-correlation function varies the lag (step size), thus giving us
more information about the reconstructed signal. The core concept of estimating tonality
relies on how much the reconstructed nodes are correlated with their parent. This
information is then passed on to the tonality estimator discussed in section 6.3.
6.3 Tonality Estimation
Once the desired nodes are reconstructed (explained in Chapter 5) they are sent to
the tonality estimator. It is here that the tonality estimator decides whether the input
audio frame has tone or noise characteristics. These characteristics are accurately
determined by the auto-covariance of the auto-correlation function. To justify this
process, let us look at the characteristics of the auto-correlation function.
6.3.1 Auto-Correlation Function
A pure tone has periodic peaks in its auto-correlation function. These peaks
decrease in amplitude as the amount of lag (amount of overlap) is increased. Beyond a
lag of 10 the correlation information repeats itself, just like a periodic signal, as shown in
Figure 6.11.
Figure 6.11: Auto-correlation Function of a Pure Tone
Conversely, the auto-correlation function of a random process (noise-like) does
not contain periodic peaks, since there is no correlation information contained in the
signal, as shown in Figure 6.12 for white noise and Figure 6.13 for bandlimited noise.
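This contrast can be reproduced with a short sketch. The following Python snippet is illustrative (not the thesis code); the signal length and the period are chosen only for the demonstration. The sample ACF of a sinusoid peaks again at every multiple of its period, while the ACF of white noise collapses toward zero at all non-zero lags.

```python
import math
import random

def sample_acf(x, max_lag):
    """Biased sample autocorrelation, normalized so that r[0] == 1."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

# Pure tone with a period of 20 samples: strong positive peak at lag 20,
# strong negative peak at the half-period lag 10.
tone = [math.sin(2 * math.pi * t / 20) for t in range(400)]
r_tone = sample_acf(tone, 21)

# White noise: no structure, so every non-zero lag stays near zero.
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(400)]
r_noise = sample_acf(noise, 21)
```

With 400 samples the noise ACF at non-zero lags stays within a few multiples of 1/sqrt(400) of zero, while the tone ACF at lag 20 stays close to one.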
Figure 6.12: Auto-correlation Function of White Noise
Figure 6.13: Auto-correlation Function of Band limited Noise (0-22kHz)
Generally, the auto-correlation function tells us how our random signal or
process changes with respect to time. It also determines whether the process has
periodic components or random ones. So, if the signal were tone-like it
would have characteristics similar to Figure 6.11, and a noise-like signal would have
characteristics similar to Figures 6.12 and 6.13. A signal in between these characteristics
will exhibit a variation in the amplitude of its peaks. To get an estimate of how these
auto-correlation peaks change in time, we take the auto-covariance of our generated auto-
correlation function, because by definition the auto-covariance of a random process tells
us to what extent its values co-vary [Garcia, A.L., 1994].
It is important to note that the auto-covariance is similar to the auto-correlation
except that the effects of the means are removed. They are mathematically related by the
following relations:
R_XX[n, n+m] = E[x[n] x[n+m]] if m ≠ 0
R_XX[n, n+m] = E[x²[n]] if m = 0 (6.2)
C_XX[n, n+m] = R_XX[n, n+m] − E[x[n]] E[x[n+m]] (6.3)
where R_XX is the auto-correlation function and C_XX is the auto-covariance.
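The relation between the two can be checked numerically. This is an illustrative Python sketch (not the thesis code) that estimates the expectations in equations 6.2 and 6.3 by sample averages, assuming a wide-sense-stationary sequence so that both means equal the overall sample mean.

```python
def acorr(x, k):
    """Sample estimate of R_XX[n, n+k] = E[x[n] x[n+k]]."""
    n = len(x) - k
    return sum(x[t] * x[t + k] for t in range(n)) / n

def acov(x, k):
    """Sample estimate of C_XX[n, n+k] = R_XX[n, n+k] - E[x[n]] E[x[n+k]],
    with both means replaced by the overall sample mean (stationarity)."""
    m = sum(x) / len(x)
    return acorr(x, k) - m * m

# A sequence with a DC offset: the offset inflates R_XX but cancels out
# of C_XX, which only sees the +/-1 alternation about the mean of 3.
x = [3 + (-1) ** t for t in range(100)]   # alternates 4, 2, 4, 2, ...
```

Here R_XX at lag 0 is 10 (the mean square), while C_XX at lag 0 is 1 (the variance) and C_XX at lag 1 is −1, exposing the perfect negative correlation hidden by the offset.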
6.3.2 Auto-Covariance
The nodes that are the most important to us are the top two nodes of the selected
path. It is from these nodes that we estimate our tonality, as discussed in section 6.2. In
Figure 6.14 the blue lines represent the nodes that were used to calculate tonality for a
signal having pure-tone characteristics. These types of signals are analyzed using the
Type-I analysis technique.
Figure 6.14: Energy Tree, where the blue lines represent the nodes from which the tonality value is calculated
6.3.3 Type-I Analysis
There are different areas in the auto-covariance plot which are exploited for the
estimation of tonality. In the pure-tone case (Type-I analysis) the difference in the
maximum values of the selected nodes is our tonality estimate, as shown in Figure 6.15.
Figure 6.15: A 4kHz tone with selected path (red arrows) and nodes used to calculate tonality value (blue lines) [left figure]; Difference of the max values of auto-covariance [right figure]
It is interesting to see that as we add noise to our 4kHz signal, the peak
difference varies. This variation is directly proportional to the masking effect of the noise
over the tone: as the peak difference (tonality value) decreases, the effect of noise
masking the tone increases.
When noise of power -0.9 dB is added to the 4kHz tone, the auto-covariance
plot (Figure 6.16b) has a peak difference of 0.53922.
Figure 6.16: A 4kHz tone with -0.9dB white-noise added, selected path (red arrows) and nodes used to calculate tonality value (blue lines) [left figure]; Difference of the max values of auto-covariance [right figure]
6.3.4 Type-II Analysis
When the input signal has noise-like characteristics, for example the snare crash
shown in Figure 6.17, the size of the auto-covariance side-lobe plays a vital role in
estimating the tonality value.
Figure 6.17: A Snare Crash
It is the difference between the maximum and minimum of the first 10 points of the
auto-covariance that gives us an estimated tonality value. Figure 6.18 shows three cases:
white noise, band-limited noise and a 1kHz pure tone.
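The two estimates can be sketched compactly. This is illustrative Python (not the thesis code); the function names and the toy inputs in the test are my own. Type-I takes the difference of the maxima of the two selected nodes' auto-covariance sequences, while Type-II takes the spread over the first 10 points of a single sequence.

```python
def type_one(acov_node1, acov_node2):
    """Type-I (tone-like input): difference of the maximum auto-covariance
    values of the two highest-energy nodes on the selected path."""
    return max(acov_node1) - max(acov_node2)

def type_two(acov_node, points=10):
    """Type-II (noise-like input): spread (max minus min) of the first
    `points` values of the node's auto-covariance sequence."""
    head = acov_node[:points]
    return max(head) - min(head)
```

As the input drifts from noise-like toward tone-like, the estimator would switch from `type_two` to `type_one`, as described for the snare-crash example below.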
Figure 6.18a: Auto-Covariance of White-Noise
Figure 6.18b: Auto-Covariance of Band-limited 0-22kHz Noise
Figure 6.18c: Auto-Covariance of Pure-Tone (1kHz)
It is important to note that band-limiting the noise defines the side-lobes, but they
are not as well defined as those of a pure tone. Figure 6.19 shows Type-II analysis
(side-lobe auto-covariance) applied to the snare crash example.
Figure 6.19: A Snare crash analysis (first frame): (a) Wavelet Tree, (b) Auto-Covariance
The last frame of the snare crash converges to a pure-tone auto-covariance, as shown in Figure 6.20.
Figure 6.20: Snare Crash (Last Frame) Auto-Covariance
Interestingly, when observing the snare crash, the tonality estimator switches from
Type-II Analysis to Type-I Analysis, displaying its accuracy in detecting the attack of the
snare hit (noise characteristics) which later becomes tonal.
The tonality estimator switches between these two analysis techniques as the
behavior of the input signal changes, and it compiles the overall tonality index. The index
is later mapped to the frequency domain.
6.4 Tonality Index (Time-Domain)
Figure 6.21 shows the tonality index of a test signal (a train of a 1kHz tone followed by
noise of power -20dB) in the time domain. It confirms the on/off state of our tone
detector, giving a value of 1 for tone-like behavior and a value of 0 for noise-like
behavior. The x-axis is the number of frames (of size 1024 with 50% overlap) and the
y-axis shows the tonality index. The band-limited noise is generated by applying a
low-pass filter (Fstop=22050, Fpass=9600, fs=44100). The green line is the depth of the
wavelet tree and the blue line is the tonality index.
Figure 6.21: Tonality Index (Time-Domain) with Input Signal consisting of 1kHz tone then Bandlimited Noise (0-22kHz) of power -20dB
The time-domain plot of the test signal (a train of a 1kHz tone followed by noise of
power -20dB) is shown in Figure 6.22.
Figure 6.22: Time-Domain plot of test signal (1kHz tone then Bandlimited Noise (0-22kHz) of power -20dB)
The test signal shown in Figure 6.23 is a train of white noise (power -20dB), a
1kHz tone and band-limited noise (power -0.9dB). Figure 6.24 is the time-domain
representation of the signal in Figure 6.23.
Figure 6.23: Tonality Index (Time-Domain) with Input Signal consisting of white noise (power -20dB) followed by a 1kHz tone and then Bandlimited Noise (0-22kHz; power -0.9dB)
Figure 6.24: Time-Domain plot of test signal of white noise (power -20dB) followed by a 1kHz tone and then Bandlimited Noise (0-22kHz; power -0.9dB)
Observing Figure 6.23, it is clear that the detector is tolerant to different powers of noise.
The tonality index is also not affected by bandlimited noise.
6.5 Tonality Index (Frequency-Domain)
Once the wavelet tree is generated, the selected nodes are used to trace back to the
parent nodes (as described in Figure 6.6). The tracing back of the nodes stops once they
reach node 1 or node 2. These nodes are then mapped to the frequency domain along with
their calculated tonality. The mapping of the nodes is done by looking at the index value
of each node and assigning it a frequency value based on the bandwidth to which the
node corresponds. For simplification, the chosen cut-off frequency is exactly half of the
sub-band's bandwidth. Figure 6.25 is a representation of this with nodes up to 14.
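The index-to-frequency mapping can be sketched as follows. This is an illustrative Python helper (my own name, not the thesis code), and it assumes the tree's natural level-order indexing maps directly onto adjacent frequency bands, which is the simplification adopted here; a strict wavelet packet tree would additionally require a frequency reordering of the nodes at each depth.

```python
def node_band(node, fs=44100.0):
    """Frequency band [f_low, f_high) nominally covered by a wavelet packet
    node, with level-order indexing: root = 0, children of n at 2n+1, 2n+2."""
    depth = (node + 1).bit_length() - 1   # depth of the node in the tree
    position = node - (2 ** depth - 1)    # left-to-right position at that depth
    bandwidth = (fs / 2) / (2 ** depth)   # Nyquist range split into 2**depth bands
    return position * bandwidth, (position + 1) * bandwidth
```

At a 44.1 kHz sampling rate, node 3 (the level-2 approximation) covers 0 to about 5.5 kHz, while node 6 covers the top quarter of the spectrum, matching the staged analysis of nodes (3) through (6) described above.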
Figure 6.25: Frequency Map of Wavelet Tree: The red arrows represent the generated path, which consists of an array of nodes from which the last node values are taken (blue lines) to map.
6.5.1 Comparison with Model 2
The frequency axis is further mapped into threshold calculation partitions, whose
widths are roughly 1/3 of a critical band. The partition values can be found in
Appendix-I [MPEG standard, ISO11172-3]. Figure 6.26a shows the tonality index of
Model-2 and Figure 6.26b shows the tonality index of the proposed model.
Figure 6.26a: Tonality Index – Model 2 (1kHz)
Figure 6.26b: Tonality Index – Proposed Model (1kHz)
The error seen from partition zero to partition ten in Figure 6.26b is due to the nodes
sent for frequency mapping: rather than sending one node (1kHz), the detector keeps
sending nodes that have high energy. This can be solved by implementing a better
detection algorithm that retains the knowledge of the energy path and sends only the
specific terminal node.
According to Table 3D-b in the MPEG standard, a 1kHz signal lies in partition 26,
which is seen in Figure 6.26b.
Figure 6.27a is the tonality index of a 4 kHz signal which lies in partition 45
according to Table 3D-b in MPEG standard ISO11172-3 and Figure 6.27b is the tonality
index of the proposed model.
Figure 6.27a: Tonality Index – Model 2 (4kHz)
Figure 6.27b: Tonality Index – Proposed Model (4kHz)
Figure 6.28a is the tonality index of a 6 kHz signal, which lies in partition 50 according to
Table 3D-b in MPEG standard ISO11172-3 and Figure 6.28b is the tonality index of the
proposed model.
Figure 6.28a: Tonality Index – Model 2 (6kHz)
Figure 6.28b: Tonality Index – Proposed Model (6kHz)
Based on the results, the proposed model performs well compared to the tonality
measure in psychoacoustic Model 2. The error in the tonality index is due to the depth
constraint set during the wavelet decomposition: a depth constraint of 5 limits the
decomposition and frequency resolution of the discrete wavelet packet tree, and this
carries over to the frequency mapping.
One must be careful in scaling the energy of the nodes, as the energy decreases on
decimation. This scaling is taken into consideration by representing the energy of the
nodes as a percentage value.
Chapter 7 Conclusions and Recommendations
The correct estimation of tonality is vital for a perceptual audio coder, since it
allows the coder to optimally shape the noise for an arbitrary audio signal. In this context,
optimality means removing as much redundant data as possible from the digital
representation of the audio signal while maintaining perceptual transparency, or, in the
case of low bit rates, minimizing the perceptual disturbance caused by increased
quantization noise.
Optimal noise shaping is achieved by considering the masking model of the
human auditory system, which identifies the components of a signal that are relevant for
encoding. For this reason, the masking properties of tonal and non-tonal (noise) signals
are important for coding a signal with high efficiency.
The tonality classification algorithms of the MPEG standard are confined by the
block constraints of the Fourier transform, which leads us to the discrete wavelet
transform. The discrete wavelet transform used in a tree structure gives us the flexibility
to vary our time-frequency resolution, thus enabling us to map the frequency spectrum
similarly to the basilar membrane of the ear.
In our investigation of classifying tonality using wavelet packet analysis, several
interesting discoveries were made. First, it seems logical to approach tonality based on
the correlation information of the audio frame rather than a prediction scheme. The two
types of analysis (Type-I and Type-II) are good estimates of tonality whether the input is
noise-like or tone-like. It is to be noted that the number of samples decreases as a node is
decomposed; thus, calculating the tonality estimate using the first and second
decompositions was appropriate.
Also, it was found that lower-order wavelets with a poor cut-off performed better
in detecting tones than higher-order wavelets with a sharp cut-off: since the detection
scheme relies on energy, attenuating it would only lead to inaccurate readings of
tonality.
The flaws of the proposed tonality estimator are apparent when the detector
decides which nodes are to be sent for frequency mapping. The detector sends all the
nodes that have high energy to the node frequency mapping module, rather than sending
the terminal node after the detection process.
The concept of splitting nodes based on their energy ratio is an effective method
for detecting tones, but to work perfectly it requires an additional variable that retains the
knowledge of the energy path and sends only the specific terminal node for frequency
mapping. Comparing the energy differences of the terminal nodes should also be added
to the detector's detection scheme. This is necessary when two tones lie in the same
sub-band.
On the whole, it can be concluded that our tonality estimation using wavelet
packet analysis performs well compared to the one proposed in the MPEG standard
ISO11172-3. With an accurate mapping of the wavelet's frequency response and an
improved detection scheme, this analysis technique can prove its worth.
References:
1) Oppenheim, A.V. & Willsky, A.S., "Signals and Systems, 2nd Edition", Prentice Hall Signal Processing Series
2) Selik, M. & Baraniuk, R., "The Impulse Function", Connexions
3) Calvert, J.B., "Time and Frequency Domain"
4) Langton, C., "Hilbert Transform, Analytic Signal and the Complex Envelope", Signal Processing & Simulation Newsletter
5) Vaidyanathan, P.P., "Multirate Systems and Filter Banks", Pearson Education
6) Hitachi Denshi, Inc., Operation Manual, Model V-1050F Oscilloscope
7) Dynascan Corporation, Instruction Manual, Function Generator, B&K Precision 3010
8) Weisstein, E.W. et al., "Orthonormal Basis", from MathWorld
9) Coifman, R.R. & Wickerhauser, M.V., "Entropy-Based Algorithms for Best Basis Selection"
10) Matlab Documentation, "Wavelet Packet Analysis"
11) Hellman, R.P., "Asymmetry of Masking Between Noise and Tone", Harvard University, Cambridge, Massachusetts
12) Johnston, J.D. & Brandenburg, K., "Second Generation Perceptual Audio Coding: The Hybrid Coder"
13) Johnston, J.D., "Transform Coding of Audio Signals Using Perceptual Noise Criteria", IEEE Journal on Selected Areas in Communications, Vol. 6 (1988), pp. 314-323
14) Scharf, B., Chapter 5 of "Foundations of Modern Auditory Theory", New York, Academic Press, 1970
15) Garcia, A.L., "Probability and Random Processes for Electrical Engineering, 2nd Edition"
16) Ferreira, A.J.S., "Tonality Detection in Perceptual Coding of Audio", AT&T Bell Laboratories, New Jersey, USA
17) Zwicker, E. & Fastl, H., "Psychoacoustics: Facts and Models", Springer-Verlag, 1990
18) Blauert, J., "Spatial Hearing", The MIT Press, 1993
19) Moore, B.C.J., "An Introduction to the Psychology of Hearing", Academic Press, 1982
20) Jayant, N.S. & Noll, P., "Digital Coding of Waveforms", Prentice-Hall, 1984
21) Wickerhauser, M.V. & Coifman, R.R., "Entropy Based Algorithms for Best Basis Selection"
22) Dubnov, S., "Generalization of Spectral Flatness Measure for Non-Gaussian Linear Processes"
23) Johnston, J.D., "Estimation of Perceptual Entropy Using Noise Masking Criteria"
24) Learned, R.E., Karl, W.C. & Willsky, A.S., "Wavelet Packet Based Transient Signal Classification"
25) Chen, Y.L., Ching, C.H. & Lin, K.W., "Robust Block Switching Decision for Transform-based Audio Coder"
26) Erne, M., Moschytz, G. & Faller, C., "Best Wavelet-Packet Bases for Audio Coding Using Perceptual and Rate-Distortion Criteria"
27) Bosi, M. & Goldberg, R., "Introduction to Digital Audio Coding and Standards"
28) Johnston, J.D., United States Patent No. 5,267,938
29) MPEG Standard, ISO11172-3
30) Polikar, R., "The Story of Wavelets", Rowan University
Appendix-I
Table 3-D.3b. Calculation Partition Table. This table is valid at a sampling rate of 44.1 kHz.
Index wlow whigh bval minval TMN
1 1 1 0.00 0.0 24.5
2 2 2 0.43 0.0 24.5
3 3 3 0.86 0.0 24.5
4 4 4 1.29 20.0 24.5
5 5 5 1.72 20.0 24.5
6 6 6 2.15 20.0 24.5
7 7 7 2.58 20.0 24.5
8 8 8 3.01 20.0 24.5
9 9 9 3.45 20.0 24.5
10 10 10 3.88 20.0 24.5
11 11 11 4.28 20.0 24.5
12 12 12 4.67 20.0 24.5
13 13 13 5.06 20.0 24.5
14 14 14 5.42 20.0 24.5
15 15 15 5.77 20.0 24.5
16 16 16 6.11 17.0 24.5
17 17 19 6.73 17.0 24.5
18 20 22 7.61 15.0 24.5
19 23 25 8.44 10.0 24.5
20 26 28 9.21 7.0 24.5
21 29 31 9.88 7.0 24.5
22 32 34 10.51 4.4 25.0
23 35 37 11.11 4.5 25.6
24 38 40 11.65 4.5 26.2
25 41 44 12.24 4.5 26.7
26 45 48 12.85 4.5 27.4
27 49 52 13.41 4.5 27.9
28 53 56 13.94 4.5 28.4
29 57 60 14.42 4.5 28.9
30 61 64 14.86 4.5 29.4
31 65 69 15.32 4.5 29.8
32 70 74 15.79 4.5 30.3
33 75 80 16.26 4.5 30.8
34 81 86 16.73 4.5 31.2
35 87 93 17.19 4.5 31.7
36 94 100 17.62 4.5 32.1
37 101 108 18.05 4.5 32.5
38 109 116 18.45 4.5 32.9
39 117 124 18.83 4.5 33.3
40 125 134 19.21 4.5 33.7
41 135 144 19.60 4.5 34.1
42 145 155 20.00 4.5 34.5
43 156 166 20.38 4.5 34.9
44 167 177 20.74 4.5 35.2
45 178 192 21.12 4.5 35.6
46 193 207 21.48 4.5 36.0
47 208 222 21.84 4.5 36.3
48 223 243 22.20 4.5 36.7
49 244 264 22.56 4.5 37.1
50 265 286 22.91 4.5 37.4
51 287 314 23.26 4.5 37.8
52 315 342 23.60 4.5 38.1
53 343 371 23.95 4.5 38.4
54 372 401 24.30 4.5 38.8
55 402 431 24.65 4.5 39.1
56 432 469 25.00 4.5 39.5
57 470 513 25.33 3.5 39.8
Appendix-II
Matlab Files:
1) Main Function (Encoder)
% Vaibhav Chhabra % Thesis - Main Fucntion (Encoder) % Last Modified: 04/12/2005 % % This program sets up all the variables and frame work to encode and decode the audio stream % the function CalculateAll (Psychoacoustic Model) calls modules that perform % the SMR calculation function new_codec() clear all; clc; global FRAMES frame_count s iblen_index iblen r f Fs earlyblock prevblock SMR_interp scalebits = 4; bitrate = 128000; N = 2048; % framelength original_filename = sprintf('Sound 6.wav'); coded_filename = sprintf('encoded_file.enc'); decoded_filename = sprintf('decoded_file.wav'); [Y,Fs,NBITS] = wavread(original_filename); tone = Y; num_subbands = floor(fftbark(N/2,N/2,Fs))+1; bits_per_frame = floor(((bitrate/Fs)*(N/2)) - (scalebits*num_subbands)); % Enframe Audio tonality_index=1; un_index=1; FRAMES = enframe(tone,N,N/2); r = zeros(3,1024); f = zeros(3,1024); earlyblock = []; prevblock = []; s=zeros(1,512); iblen = 512; % Write File Header fid = fopen(coded_filename,'w'); fwrite(fid, Fs, 'ubit16'); % Sampling Frequency fwrite(fid, N, 'ubit12'); % Frame Length fwrite(fid, bitrate, 'ubit18'); % Bit Rate fwrite(fid, scalebits, 'ubit4'); % Number of Scale Bits per Sub-Band fwrite(fid, length(FRAMES(:,1)), 'ubit26'); % Number of frames % Computations for frame_count=1:length(FRAMES(:,1)) if (mod(frame_count,2) == 0) | (mod(frame_count,2)==1) outstring = sprintf('NOW ENCODING FRAME %i of %i', frame_count, length(FRAMES(:,1))); disp(outstring);
87
end fft_frame = fft(FRAMES(frame_count,:)); if fft_frame == zeros(1,N) Gain = zeros(1,floor(fftbark(N/2,N/2,Fs))+1); bit_alloc = zeros(1,floor(fftbark(N/2,N/2,Fs))+1); else for iblen_index = 0:1 s = FRAMES(frame_count,((iblen_index*iblen)+1:(iblen_index*iblen)+iblen)); CalculateAll() end % End Main Loop New_FFT2 = SMR_interp; if frame_count == 25 figure; semilogx([0:(Fs/2)/(N/2):Fs/2-1],New_FFT2); title('SMR');xlabel('Frequency');ylabel('dB') figure; stem(allocate(New_FFT2,bits_per_frame,N,Fs)); title('Bits perceptually allocated');xlabel('Critical Bands');ylabel('Bits Allocated') end bit_alloc = allocate(New_FFT2,bits_per_frame,N,Fs); [Gain,Data] = p_encode(mdct(FRAMES(frame_count,:)),Fs,N,bit_alloc,scalebits); end % end of computations % Write Audio Data to File qbits = sprintf('ubit%i', scalebits); fwrite(fid, Gain, qbits); fwrite(fid, bit_alloc, 'ubit4'); for i=1:25 indices = find((floor(fftbark([1:N/2],N/2,Fs))+1)==i); qbits = sprintf('ubit%i', bit_alloc(i)); % bits(floor(fftbark(i,framelength/2,48000))+1) if ((bit_alloc(i) ~= 0) & (bit_alloc(i) ~= 1)) fwrite(fid, Data(indices(1):indices(end)) ,qbits); end end end % end of frame loop fclose(fid); % RUN DECODER disp('Decoding...'); p_decode(coded_filename,decoded_filename); disp('Okay, all done!'); %%%%%%%%%%%%%%%%%%%%%%%%%%%%% % FFTBARK % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% function b=fftbark(bin,N,Fs) % b=fftbark(bin,N,Fs) % Converts fft bin number to bark scale
88
% N is the fft length % Fs is the sampling frequency f = bin*(Fs/2)/N; b = 13*atan(0.76*f/1000) + 3.5*atan((f/7500).^2); %%%%%%%%%%%%%%%%%%%%%%%%%%%%% % ENFRAME % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% function f=enframe(x,win,inc) %ENFRAME split signal up into (overlapping) frames: one per row. F=(X,WIN,INC) % % F = ENFRAME(X,LEN) splits the vector X up into % frames. Each frame is of length LEN and occupies % one row of the output matrix. The last few frames of X % will be ignored if its length is not divisible by LEN. % It is an error if X is shorter than LEN. % % F = ENFRAME(X,LEN,INC) has frames beginning at increments of INC % The centre of frame I is X((I-1)*INC+(LEN+1)/2) for I=1,2,... % The number of frames is fix((length(X)-LEN+INC)/INC) % % F = ENFRAME(X,WINDOW) or ENFRAME(X,WINDOW,INC) multiplies % each frame by WINDOW(:) % Copyright (C) Mike Brookes 1997 % % Last modified Tue May 12 13:42:01 1998 % % VOICEBOX home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % This program is free software; you can redistribute it and/or modify % it under the terms of the GNU General Public License as published by % the Free Software Foundation; either version 2 of the License, or % (at your option) any later version. % % This program is distributed in the hope that it will be useful, % but WITHOUT ANY WARRANTY; without even the implied warranty of % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the % GNU General Public License for more details. % % You can obtain a copy of the GNU General Public License from % ftp://prep.ai.mit.edu/pub/gnu/COPYING-2.0 or by writing to % Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% nx=length(x); nwin=length(win); if (nwin == 1) len = win; else len = nwin; end if (nargin < 3) inc = len; end
89
nf = fix((nx-len+inc)/inc); f=zeros(nf,len); indf= inc*(0:(nf-1)).'; inds = (1:len); f(:) = x(indf(:,ones(1,len))+inds(ones(nf,1),:)); if (nwin > 1) w = win(:)'; f = f .* w(ones(nf,1),:); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%% % SCHROEDER % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% function m=Schroeder(freq,spl,downshift) % Calculate the Schroeder masking spectrum for a given frequency and SPL N = 2048; f_kHz = [1:48000/N:48000/2]; f_kHz = f_kHz/1000; A = 3.64*(f_kHz).^(-0.8) - 6.5*exp(-0.6*(f_kHz - 3.3).^2) + (10^(-3))*(f_kHz).^4; f_Hz = f_kHz*1000; % Schroeder Spreading Function dz = bark(freq)-bark(f_Hz); mask = 15.81 + 7.5*(dz+0.474) - 17.5*sqrt(1 + (dz+0.474).^2); New_mask = (mask + spl - downshift); m = New_mask; %%%%%%%%%%%%%%%%%%%%%%%%%%%%% % BARK % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% function b=bark(f) % b=bark(f) % Converts frequency to bark scale % Frequency should be specified in Hertz b = 13*atan(0.76*f/1000) + 3.5*atan((f/7500).^2); %%%%%%%%%%%%%%%%%%%%%%%%%%%%% % ALLOCATE % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% function x=allocate(y,b,N,Fs) % x=allocate(y,b,N) % Allocates b bits to the 25 subbands % of y (a length N/2 MDCT, in dB SPL) bits(floor(bark( (Fs/2)*[1:N/2]/(N/2) )) +1) = 0; for i=1:N/2 bits(floor(bark( (Fs/2)*i/(N/2) )) +1) = max(bits(floor(bark( (Fs/2)*i/(N/2) )) +1) , ceil( y(i)/6 )); end
90
indices = find(bits(1:end) < 2); bits(indices(1:end)) = 0; % NEED TO CALCULATE SAMPLES PER SUBBAND n = 0:N/2-1; f_Hz = n*Fs/N; f_kHz = f_Hz / 1000; A_f = 3.64*f_kHz.^-.8 - 6.5*exp(-.6*(f_kHz-3.3).^2) + 1e-3*f_kHz.^4; % *** Threshold in Quiet z = 13*atan(0.76*f_kHz) + 3.5*atan((f_kHz/7.5).^2); % *** bark frequency scale crit_band = floor(z)+1; num_crit_bands = max(crit_band); num_crit_band_samples = zeros(num_crit_bands,1); for i=1:N/2 num_crit_band_samples(crit_band(i)) = num_crit_band_samples(crit_band(i)) + 1; end x=zeros(1,25); bitsleft=b; [blah,i]=max(bits); while bitsleft > num_crit_band_samples(i) [blah,i]=max(bits); x(i) = x(i) + 1; bits(i) = bits(i) - 1; bitsleft=bitsleft-num_crit_band_samples(i); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%% % P_ENCODE % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% function [Quantized_Gain,quantized_words]=p_encode(x2,Fs,framelength,bit_alloc,scalebits) for i=1:floor(fftbark(framelength/2,framelength/2,Fs))+1 indices = find((floor(fftbark([1:framelength/2],framelength/2,Fs))+1)==i); Gain(i) = 2^(ceil(log2((max(abs(x2(indices(1):indices(end))+1e-10)))))); if Gain(i) < 1 Gain(i) = 1; end x2(indices(1):indices(end)) = x2(indices(1):indices(end)) / (Gain(i)+1e-10); Quantized_Gain(i) = log2(Gain(i)); end for i=1:length(x2) quantized_words(i) = midtread_quantizer(x2(i), max(bit_alloc(floor(fftbark(i,framelength/2,Fs))+1),0)+1e-10); % 03/20/03 end %%%%%%%%%%%%%%%%%%%%%%%%%%%%% % MIDTREAD_QUANTIZER % %%%%%%%%%%%%%%%%%%%%%%%%%%%%% function [ret_value] = midtread_quantizer(x,R) Q = 2 / (2^R - 1);
q = quant(x,Q);
s = q<0;
ret_value = uint16(abs(q)./Q + s*2^(R-1));

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%    MIDTREAD_DEQUANTIZER    %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [ret_value] = midtread_dequantizer(x,R)
sign = (2 * (x < 2^(R-1))) - 1;
Q = 2 / (2^R - 1);
x_uint = uint32(x);
x = bitset(x_uint,R,0);
x = double(x);
ret_value = sign * Q .* x;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%          P_DECODE          %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function Fs=p_decode(coded_filename,decoded_filename)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%      READ FILE HEADER      %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
fid = fopen(coded_filename,'r');
Fs          = fread(fid,1,'ubit16');   % Sampling Frequency
framelength = fread(fid,1,'ubit12');   % Frame Length
bitrate     = fread(fid,1,'ubit18');   % Bit Rate
scalebits   = fread(fid,1,'ubit4');    % Number of Scale Bits per Sub-Band
num_frames  = fread(fid,1,'ubit26');   % Number of frames
for frame_count=1:num_frames
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %     READ FILE CONTENTS     %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    qbits = sprintf('ubit%i', scalebits);
    gain = fread(fid,25,qbits);
    bit_alloc = fread(fid,25,'ubit4');
    for i=1:floor(fftbark(framelength/2,framelength/2,Fs))+1
        indices = find((floor(fftbark([1:framelength/2],framelength/2,Fs))+1)==i);
        if ((bit_alloc(i) ~= 0) & (bit_alloc(i) ~= 1))
            qbits = sprintf('ubit%i', bit_alloc(i));
            InputValues(indices(1):indices(end)) = fread(fid, length(indices), qbits);
        else
            InputValues(indices(1):indices(end)) = 0;
        end
    end
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %      DEQUANTIZE VALUES     %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    for i=1:length(InputValues)
        if InputValues(i) ~= 0
            if max(bit_alloc(floor(fftbark(i,framelength/2,Fs))+1),0) ~= 0
                InputValues(i) = midtread_dequantizer(InputValues(i),...
                    max(bit_alloc(floor(fftbark(i,framelength/2,Fs))+1),0));
            end
        end
    end
    for i=1:25
        gain2(i) = 2^gain(i);
    end
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %         APPLY GAIN         %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    for i=1:floor(fftbark(framelength/2,framelength/2,Fs))+1
        indices = find((floor(fftbark([1:framelength/2],framelength/2,Fs))+1)==i);
        InputValues(indices(1):indices(end)) = InputValues(indices(1):indices(end)) * gain2(i);
    end
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %        INVERSE MDCT        %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    x2((frame_count-1)*framelength+1:frame_count*framelength) = imdct(InputValues(1:framelength/2));
end
status = fclose(fid);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%      RECOMBINE FRAMES      %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
x3 = zeros(1,(length(x2)-1)/2+1);
for i=0:0.5:floor(length(x2)/(2*framelength))-1
    x3(i*framelength+1 : (i+1)*framelength) = x3(i*framelength+1 : (i+1)*framelength) ...
        + x2((2*i)*framelength+1 : (2*i+1)*framelength);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%         WRITE FILE         %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
wavwrite(x3/2,Fs,decoded_filename);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%            MDCT            %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y = mdct(x)
x = x(:);
N = length(x);
n0 = (N/2+1)/2;
wa = sin(([0:N-1]'+0.5)/N*pi);
y = zeros(N/2,1);
x = x .* exp(-j*2*pi*[0:N-1]'/2/N) .* wa;
X = fft(x);
y = real(X(1:N/2) .* exp(-j*2*pi*n0*([0:N/2-1]'+0.5)/N));
y = y(:);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%            IMDCT           %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function y = imdct(X)
X = X(:);
N = 2*length(X);
ws = sin(([0:N-1]'+0.5)/N*pi);
n0 = (N/2+1)/2;
Y = zeros(N,1);
Y(1:N/2) = X;
Y(N/2+1:N) = -1*flipud(X);
Y = Y .* exp(j*2*pi*[0:N-1]'*n0/N);
y = ifft(Y);
y = 2*ws .* real(y .* exp(j*2*pi*([0:N-1]'+n0)/2/N));

function CalculateAll()
% Calculate All Variables
global iblen Fs earlyblock prevblock newblock r f
global Y s r_hat f_hat cw e cb en cbb spreadplot epart npart
global tbb SNRb bcb nbb nbw thrw SMR THRw SMR2 Fs iblen_index
newblock = s;
% Step 1 - Reconstruct 1024 Samples of the Input Signal
[earlyblock, prevblock, newblock, s] = ...
    Reconstruct(earlyblock, prevblock, newblock, iblen);
if length(s) == 1024
    % Step 2 - Calculate the magnitude and phase using the FFT
    [r,f] = Spectrum(s, r, f);
    % Step 3 - Calculate the energy and unpredictability in the threshold calculation partitions
    [e] = Energy_Unpredictability(r);
    % Step 4 - Convolve the partitioned energy with the spreading function
    load ('tables.mat');   % Tables and Spreading Functions
    abstable = smooth(abstable,9,'moving')';
    [en] = Spread(e);
    % Step 5 - Calculate the tonality index
    tbb = MAIN(s);
    % Step 6 - Calculate the Required SNR in Each Partition
    SNRb = CalcSNR(tbb);
    % Step 7 - Calculate the Power Ratio
    bcb = CalcPwrRatio(SNRb);
    % Step 8 - Calculate the Actual Energy Threshold, nbb
    nbb = CalcNbb(en, bcb);
    % Step 9 - Spread the Threshold Energy over FFT Lines, Yielding nbw
    nbw = CalcNb(nbb);
    % Step 10 - Include Absolute Thresholds, Yielding the Final Energy Threshold of Audibility, thrw
    thrw = CalcThresh(nbw);
    % Step 11 - Calculate the Signal-to-Mask Ratios, SMRn
    [SMR(iblen_index+1,:),epart,npart] = CalcSMR(r, thrw);
end
%--------------------------------------------------------------------------
% Proposed Model - Calculations begin
% The spreading function has been calculated and stored in 'tables.mat';
% for more information refer to p. 129 of the standard.
% load ('tables.mat');   % Tables and Spreading Functions for a 44.1 kHz sampling rate
%--------------------------------------------------------------------------
% Reconstruct 1024 samples of the input signal
function [earlyblock, prevblock, newblock, s] = ...
    Reconstruct(earlyblock, prevblock, newblock, iblen);
if iblen >= 512,
    block = [prevblock newblock];
else
    block = [earlyblock prevblock newblock];
end
earlyblock = prevblock;
prevblock = newblock;
if length(block) >= 1024
    s = block(end-1023:end);   % Newest 1024 samples
else
    s = zeros(1,512);
end
%--------------------------------------------------------------------------
% Calculate the complex spectrum of the input signal
function [r,f] = Spectrum(s, r, f)
global frame_count iblen_index Fs
sw = s .* (0.5 - 0.5*cos((2*pi*([1:1024]-0.5))/1024));   % Hann Window
r(1:2,:) = r(2:3,:);   % Shift previous two magnitude values
f(1:2,:) = f(2:3,:);   % Shift previous two phase values
r(3,:) = abs(fft(sw));
f(3,:) = angle(fft(sw));
mag = r(3,:);
if frame_count == 25 && iblen_index == 1
    figure
    freq = (Fs/2)*(1:513)/1024;
    plot(freq,mag(1:513));
    title('Magnitude Component of the FFT')
    xlabel('Frequency'); ylabel('Magnitude "r"')
    figure
    plot(f(3,:));
    title('Phase Component of the FFT')
    xlabel('Frequency'); ylabel('Phase "f"')
end
%--------------------------------------------------------------------------
% Calculate the Energy and Unpredictability using the (Threshold) Calculation
% Partition Table D.3b, p. 134 of the standard
function [e] = Energy_Unpredictability(r)
global frame_count iblen_index
load ('tables.mat');   % Tables and Spreading Functions
abstable = smooth(abstable,9,'moving')';   % Applying the smooth function to abstable
w_lo(1:57) = Table3D3b([1:57],1);   % Median Bark Values of the Partition
w_hi(1:57) = Table3D3b([1:57],2);
for w=1:57
    e(w) = sum((r(3,w_lo(w):w_hi(w))).^2);
end
if frame_count==25 && iblen_index==1
    for i=1:57, spreadplot(i,:) = sprdngf(i,:)*e(i); end
    figure; plot(spreadplot', 'r:'); hold on; plot(e, 'b'); hold off;
    title('Energy in each partition using Table D.3b p134 of standard');
    ylabel('Energy "e" (blue) and Spreading Functions (red)');
    xlabel('Partitions 1-57')
end
%--------------------------------------------------------------------------
% Convolve the partitioned energy and unpredictability with the spreading function
function [en] = Spread(e)
global frame_count iblen_index
load ('tables.mat');   % Tables and Spreading Functions
abstable = smooth(abstable,9,'moving')';
ecb = zeros(1,57);
ct = zeros(1,57);
for i=1:57
    for j=1:57
        ecb(i) = ecb(i) + (e(j) * sprdngf(j,i));
    end
end
% Normalizing coefficient used to normalize ecb
rnormb = 1 ./ (sum(sprdngf,1));
en = ecb.*rnormb;
if frame_count==25 && iblen_index==1
    for i=1:57, spreadplot(i,:) = sprdngf(i,:)*e(i); end
    figure; plot(spreadplot', 'r:'); hold on; plot(ecb); hold off
    title('Convolved Partitioned Energy with Spreading Function ecb')
    xlabel('Partitions 1-57'); ylabel('ecb (blue) and Spreading Functions (red)')
    for i=1:57, spreadplot(i,:) = sprdngf(i,:)*ecb(i); end
    figure; plot(spreadplot', 'r:'); hold on; plot(en); hold off
    title('Convolved Partitioned Energy with Spreading Function (normalized) enb')
    xlabel('Partitions 1-57'); ylabel('enb (blue) and Spreading Function (red)')
end
%--------------------------------------------------------------------------
% Calculate the SNR in each partition
function SNRb = CalcSNR(tbb)
global frame_count iblen_index
load ('tables.mat');   % Tables and Spreading Functions
abstable = smooth(abstable,9,'moving')';
NMTb = 5.5;   % Downshift for Noise Masking Tone (in dB)
SNRb = max(Table3D3b(:,4)', tbb .* Table3D3b(:,5)'+(1-tbb)*NMTb);
if frame_count==25 && iblen_index==1
    figure
    plot(SNRb)
    title('SNR in each partition')
    xlabel('Partitions 1-57'); ylabel('SNR')
end
%--------------------------------------------------------------------------
% Calculate the power ratio
function bcb = CalcPwrRatio(SNRb)
global frame_count iblen_index
bcb = 10.^(-SNRb/10);
if frame_count==25 && iblen_index==1
    figure
    plot(bcb)
    title('Power Ratio bcb'); xlabel('Partition 1-57'); ylabel('bcb')
end
%--------------------------------------------------------------------------
% Calculate the actual energy threshold, nbb
function nbb = CalcNbb(tbb, bcb)
global frame_count iblen_index
load ('tables.mat');
abstable = smooth(abstable,9,'moving')';
nbb = tbb .* bcb;
if frame_count==25 && iblen_index==1
    figure; plot(nbb)
    title('Actual Energy Threshold nbb'); xlabel('Partition 1-57'); ylabel('nbb (blue) and Spreading Function (red)')
end
%--------------------------------------------------------------------------
% Spread the threshold energy over FFT lines, yielding nb(w)
function nb = CalcNb(nbb)
global frame_count iblen_index Fs
load ('tables.mat');   % Tables and Spreading Functions
abstable = smooth(abstable,9,'moving')';
w_lo(1:57) = Table3D3b([1:57],1);
w_hi(1:57) = Table3D3b([1:57],2);
for b=1:57
    for w=1:513
        if ((w>=w_lo(b)) & (w<=w_hi(b)))
            nb(w) = nbb(b)/(w_hi(b)-w_lo(b)+1);
        end
    end
end
% TRANSFORM FFT VALUES TO SPL VALUES
fftmax = 471.4874;   % max(abs(fft(1 kHz tone)))... defined as 96 dB  %586.7143
nb = 96 + 20*log10(abs(nb)/fftmax);
if frame_count==25 && iblen_index==1
    figure
    % freq = (Fs/2)*(1:513)/1024;
    semilogx(nb)
    title('Spread the threshold energy over FFT lines, yielding nb(w)')
    xlabel('Frequency'); ylabel('dB')
end
%--------------------------------------------------------------------------
% Include the absolute threshold, yielding the final energy threshold of
% audibility, thrw
function thrw = CalcThresh(nb)
global frame_count iblen_index
load ('tables.mat');   % Tables and Spreading Functions
abstable = smooth(abstable,9,'moving')';
thrw = max(nb, abstable);
if frame_count==25 && iblen_index==1
    figure
    semilogx(thrw)
    title('Absolute Thresholds, yielding final energy threshold of audibility thrw')
    xlabel('Frequency'); ylabel('dB')
end
%--------------------------------------------------------------------------
% Calculate the Signal-to-Mask Ratio (SMR)
function [SMR_interp,epart,npart] = CalcSMR(r, thrw)
global SMR_interp frame_count iblen_index
w_low = 1;
for i=1:31
    epart(i) = sum((r(3,w_low:w_low+16)).^2);
    if i < 13
        npart(i) = sum(thrw(w_low:w_low+16));
    else
        npart(i) = min(thrw(w_low:w_low+16)) * 17;
    end
    w_low = w_low+16;
end
SMR = 10 * log10(epart./npart);
SMR_interp = interp(SMR,33);
SMR_interp = real(SMR_interp);
SMR_interp = [SMR_interp SMR_interp(1023)];
%--------------------------------------------------------------------------

2) Main Function (Tonality Index Calculation)
% Vaibhav Chhabra
% Thesis - Main Function (Tonality Index Calculation)
% Last Modified: 04/12/2005
%
% This program sends the frame of audio for wavelet analysis and scales the
% tonality index (frequency); it then converts the frequency axis to bin
% values and maps it to the partition table of the MPEG ISO 11172-3
% standard, Table 3D.b
function [tbb]=MAIN(s)
length_block = length(s);
Fs = 44100;
global frame_count frames length_block diff tin tIndex_f
if length(s) == 1024
    tIndex_f = zeros(1,22050);
    % Send the input block for wavelet analysis
    [rcfs,count,gNodes,wpt,tn] = WPDALGb(s);
    % Scale the tonality index (frequency axis)
    tIndex_f = tIndex_f./20;
    % Convert frequency to bin values
    for f=1:22050
        bin(f) = (f*length_block)/(Fs/2);
    end
    storeTINDEX = zeros(22050,2);
    storeTINDEX(:,1) = bin;
    storeTINDEX(:,2) = tIndex_f;
    load tables.mat
    % Map to the partition table in the MPEG standard ISO 11172-3, Table 3D.b
    tIndex_p = zeros(57,1);
    tIndex_p(1)=storeTINDEX(22,2);    tIndex_p(2)=storeTINDEX(44,2);    tIndex_p(3)=storeTINDEX(65,2);
    tIndex_p(4)=storeTINDEX(87,2);    tIndex_p(5)=storeTINDEX(108,2);   tIndex_p(6)=storeTINDEX(130,2);
    tIndex_p(7)=storeTINDEX(151,2);   tIndex_p(8)=storeTINDEX(174,2);   tIndex_p(9)=storeTINDEX(195,2);
    tIndex_p(10)=storeTINDEX(216,2);  tIndex_p(11)=storeTINDEX(237,2);  tIndex_p(12)=storeTINDEX(259,2);
    tIndex_p(13)=storeTINDEX(280,2);  tIndex_p(14)=storeTINDEX(302,2);  tIndex_p(15)=storeTINDEX(323,2);
    tIndex_p(16)=storeTINDEX(345,2);  tIndex_p(17)=storeTINDEX(367,2);  tIndex_p(18)=storeTINDEX(431,2);
    tIndex_p(19)=storeTINDEX(496,2);
    tIndex_p(20)=storeTINDEX(560,2);  tIndex_p(21)=storeTINDEX(625,2);  tIndex_p(22)=storeTINDEX(690,2);
    tIndex_p(23)=storeTINDEX(754,2);  tIndex_p(24)=storeTINDEX(819,2);  tIndex_p(25)=storeTINDEX(883,2);
    tIndex_p(26)=storeTINDEX(969,2);  tIndex_p(27)=storeTINDEX(1056,2); tIndex_p(28)=storeTINDEX(1142,2);
    tIndex_p(29)=storeTINDEX(1230,2); tIndex_p(30)=storeTINDEX(1314,2); tIndex_p(31)=storeTINDEX(1400,2);
    tIndex_p(32)=storeTINDEX(1508,2); tIndex_p(33)=storeTINDEX(1615,2); tIndex_p(34)=storeTINDEX(1745,2);
    tIndex_p(35)=storeTINDEX(1874,2); tIndex_p(36)=storeTINDEX(2025,2); tIndex_p(37)=storeTINDEX(2175,2);
    tIndex_p(38)=storeTINDEX(2348,2); tIndex_p(39)=storeTINDEX(2520,2); tIndex_p(40)=storeTINDEX(2692,2);
    tIndex_p(41)=storeTINDEX(2908,2); tIndex_p(42)=storeTINDEX(3123,2); tIndex_p(43)=storeTINDEX(3360,2);
    tIndex_p(44)=storeTINDEX(3596,2); tIndex_p(45)=storeTINDEX(3835,2); tIndex_p(46)=storeTINDEX(4156,2);
    tIndex_p(47)=storeTINDEX(4479,2); tIndex_p(48)=storeTINDEX(4802,2); tIndex_p(49)=storeTINDEX(5254,2);
    tIndex_p(50)=storeTINDEX(5707,2); tIndex_p(52)=storeTINDEX(6783,2); tIndex_p(53)=storeTINDEX(7386,2);
    tIndex_p(54)=storeTINDEX(8011,2); tIndex_p(55)=storeTINDEX(8657,2); tIndex_p(56)=storeTINDEX(9303,2);
    tIndex_p(57)=storeTINDEX(10121,2);
    % Condition for setting the index values to zero (noise detection)
    for z=1:length(tIndex_p)
        if (tIndex_p(z) <= 18)
            tIndex_p(z) = 0;
        end
    end
    % Apply the spreading function to the tonality index
    tbb = zeros(1,57);
    for i=1:57
        for j=1:57
            tbb(i) = tbb(i) + (tIndex_p(j) * sprdngf(j,i));
        end
    end
    tbb = tbb./max(tbb);
end   % end Calculations
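The bin conversion used by MAIN above, bin(f) = f * length_block / (Fs/2), can be sketched and checked quickly. The Python below is an illustrative re-implementation for verification only, not part of the thesis code; the constant 22 is simply the first partition look-up index used in the listing.

```python
def freq_to_bin(f_hz, n_fft=1024, fs=44100):
    """Map a frequency in Hz to its (fractional) bin index on a
    1024-point axis spanning 0..Fs/2, as MAIN's bin(f) loop does."""
    return f_hz * n_fft / (fs / 2)

# Nyquist maps to the last bin; ~474 Hz lands near index 22, the first
# partition look-up (storeTINDEX(22,:)) in the listing above.
assert freq_to_bin(22050) == 1024
assert round(freq_to_bin(474)) == 22
```

Note that this axis places 1024 bins across 0-22050 Hz (about 21.5 Hz per bin), which is the mapping the hard-coded storeTINDEX indices above rely on.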
% Scaling Tonality Index (Time Domain)
tIndex = zeros(size(diff));
if (diff(:,1)==zeros)
    tIndex(:,1) = diff(:,1);
else
    tIndex(:,1) = diff(:,1);
    tIndex(:,1) = diff(:,1)./max(diff(:,1));
end
tIndex(:,2) = diff(:,2)./1000;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Reconstruct block
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [containerblock, prevblock, newblock, s] = reconstruct(containerblock, prevblock, newblock, iblen)
% global iblen containerblock newblock prevblock
if iblen >= 512,
    block = [prevblock newblock];
else
    block = [containerblock prevblock newblock];
end
containerblock = prevblock;
prevblock = newblock;
if length(block) >= 1024
    s = block(end-1023:end);   % Newest 1024 samples
else
    s = zeros(1,512);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
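The reconstruct helper above shifts a three-block history and returns the newest 1024 samples once enough history has accumulated. A minimal Python analogue (lists standing in for MATLAB row vectors; illustrative only, not the thesis code):

```python
def reconstruct(earlyblock, prevblock, newblock, iblen):
    """Mirror of the reconstruct() listing: concatenate the retained
    blocks, shift the history forward, and return the newest 1024
    samples (or a zero frame if not enough samples exist yet)."""
    if iblen >= 512:
        block = prevblock + newblock
    else:
        block = earlyblock + prevblock + newblock
    earlyblock, prevblock = prevblock, newblock
    if len(block) >= 1024:
        s = block[-1024:]
    else:
        s = [0.0] * 512
    return earlyblock, prevblock, s

# Three 512-sample blocks: the output frame is the newest 1024 samples.
e, p, s = reconstruct([1.0]*512, [2.0]*512, [3.0]*512, 0)
assert s == [2.0]*512 + [3.0]*512
```

The history shift (earlyblock <- prevblock <- newblock) is what lets successive 512-sample input blocks produce 50%-overlapped 1024-sample analysis frames.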
3) Wavelet Packet Analysis
% Vaibhav Chhabra
% Thesis - Wavelet Packet Analysis Function
% Last Modified: 04/12/2005
%
% This program analyzes the frame of audio using the discrete wavelet
% packet tree. It detects the nodes with high energy and sends them to the
% node reconstruction module
function [rcfs,count,gNodes,wpt,tn]=WPDALGb(block)
global frame_count frames diff tin tIndex_f
% Initializing variables
PN_E = 100;
PN_AA_ratio = [];  PN_DD_ratio = [];  PN_AD_ratio = [];  PN_DA_ratio = [];
store_energy = [];
store_d = [];
store_child_node = zeros(1,2);
store_energy = PN_E;
% Do a level-1 decomposition using the Daubechies 1 wavelet
wpt = wpdec(block,1,'db1','shannon');
[d,tn] = get(wpt,'depth','tn');   % getting depth and terminal nodes
PN = tn(1);
E = wenergy(wpt);   % getting energy of terminal nodes
Eratio = zeros(length(E));
% Initializing pointers
SN_AA = 1;  SN_DD = 2;  SN_AD = [];  SN_DA = [];
var_block = var(block);
Eratio = ratio(store_energy,E);
store_energy = E;
E_diff = abs(E(end-1)-E(end));
% Setting pointers
PN_AA_ratio = Eratio(tn(1));
PN_DD_ratio = Eratio(tn(2));
store_Enodes_tn = [SN_AA SN_DD];
if (E(1) > E(2))
    theRightPath = [1];
else
    theRightPath = [2];
end
% Start detection scheme
while (((1.0<=PN_AA_ratio) & PN_AA_ratio<2.4) | ((1.0<=PN_DD_ratio) & PN_DD_ratio<2.4) | ...
       ((1.0<=PN_AD_ratio) & PN_AD_ratio<2.4) | ((1.0<=PN_DA_ratio) & PN_DA_ratio<2.4)) ...
      & (d<5) & (E_diff>0)
    check_flag = 0;
    if ((1.0<=PN_AA_ratio) & PN_AA_ratio<2.4) & (d<5) & (E_diff>0)
        wpt = wpsplt(wpt,SN_AA);   % split node
        [CN_A,CN_D] = Cnode(SN_AA,store_child_node);
        PN = Pnode(CN_A);
        store_d = d;
        [d,tn] = get(wpt,'depth','tn');
        E = wenergy(wpt);
        store_Enodes_tn = [CN_A CN_D];
        if (CN_A > tn(end)) & (store_d<d)
            Enodes = E(end-1:end);
            Pnode_energy = E(end-1)+E(end);
            store_Enodes = Enodes;
            store_Enodes_tn = [CN_A CN_D];
            E_diff = abs(E(end-1)-E(end));
        end
        Eratio = ratio(Pnode_energy,Enodes);
        PN_AA_ratio = Eratio(1);
        PN_AD_ratio = Eratio(2);
        [PN_AA_ratio, PN_AD_ratio, PN_DD_ratio, PN_DA_ratio, SN_AA, SN_AD, SN_DD, SN_DA, store_Enodes] = ...
            pointerUpdate(tn,Eratio,PN_AA_ratio, PN_AD_ratio, PN_DD_ratio, PN_DA_ratio, ...
                          CN_A, CN_D, SN_AA, SN_AD, SN_DD, SN_DA, PN, check_flag, d, store_Enodes);
        store_energy = E;
        if (store_Enodes(1) > store_Enodes(2))
            theRightPath = [theRightPath store_Enodes_tn(1)];
        else
            theRightPath = [theRightPath store_Enodes_tn(2)];
        end
    end
    if ((1.0<=PN_DD_ratio) & (PN_DD_ratio<2.4)) & (d<5) & (E_diff>0)
        wpt = wpsplt(wpt,SN_DD);
        [CN_A,CN_D] = Cnode(SN_DD,store_child_node);
        PN = Pnode(CN_A);
        store_d = d;
        [d,tn] = get(wpt,'depth','tn');
        E = wenergy(wpt);
        store_Enodes_tn = [CN_A CN_D];
        if (tn(end) > tn(1)) & (store_d<=d) & (d==2)
            Pnode_energy = E(end-1)+E(end);
            Enodes = E(end-1:end);
            store_Enodes = Enodes;
            store_Enodes_tn = [CN_A CN_D];
        end
        if (7<CN_A) & (store_d==d) & (CN_A<14)
            Pnode_energy = E(end-3)+E(end-2);
            Enodes = E(end-3:end-2);
            store_Enodes = Enodes;
            store_Enodes_tn = [CN_A CN_D];
        end
        Eratio = ratio(Pnode_energy,Enodes);
        PN_DA_ratio = Eratio(1);
        PN_DD_ratio = Eratio(2);
        [PN_AA_ratio, PN_AD_ratio, PN_DD_ratio, PN_DA_ratio, SN_AA, SN_AD, SN_DD, SN_DA, store_Enodes] = ...
            pointerUpdate(tn,Eratio,PN_AA_ratio, PN_AD_ratio, PN_DD_ratio, PN_DA_ratio, ...
                          CN_A, CN_D, SN_AA, SN_AD, SN_DD, SN_DA, PN, check_flag, d, store_Enodes);
        store_energy = E;
        if (store_Enodes(1) > store_Enodes(2))
            theRightPath = [theRightPath store_Enodes_tn(1)];
        else
            theRightPath = [theRightPath store_Enodes_tn(2)];
        end
    end
    if ((1.0<=PN_AD_ratio) & (PN_AD_ratio<2.4)) & (d<5) & (E_diff>0)
        wpt = wpsplt(wpt,SN_AD);
        [CN_A,CN_D] = Cnode(SN_AD,store_child_node);
        PN = Pnode(CN_A);
        store_d = d;
        [d,tn] = get(wpt,'depth','tn');
        E = wenergy(wpt);
        store_Enodes_tn = [CN_A CN_D];
        if CN_A < CN_D & (store_d < d)
            Pnode_energy = E(end-1)+E(end);
            Enodes = E(end-1:end);
            store_Enodes = Enodes;
            store_Enodes_tn = [CN_A CN_D];
        end
        Eratio = ratio(Pnode_energy,Enodes);
        PN_AA_ratio = Eratio(1);
        PN_AD_ratio = Eratio(2);
        [PN_AA_ratio, PN_AD_ratio, PN_DD_ratio, PN_DA_ratio, SN_AA, SN_AD, SN_DD, SN_DA, store_Enodes] = ...
            pointerUpdate(tn,Eratio,PN_AA_ratio, PN_AD_ratio, PN_DD_ratio, PN_DA_ratio, ...
                          CN_A, CN_D, SN_AA, SN_AD, SN_DD, SN_DA, PN, check_flag, d, store_Enodes);
        store_energy = E;
        if (store_Enodes(1) > store_Enodes(2))
            theRightPath = [theRightPath store_Enodes_tn(1)];
        else
            theRightPath = [theRightPath store_Enodes_tn(2)];
        end
    end
    if ((1.0<=PN_DA_ratio) & (PN_DA_ratio<2.4)) & (d<5) & (E_diff>0)
        wpt = wpsplt(wpt,SN_DA);
        [CN_A,CN_D] = Cnode(SN_DA,store_child_node);
        PN = Pnode(CN_A);
        store_d = d;
        [d,tn] = get(wpt,'depth','tn');
        E = wenergy(wpt);
        store_Enodes_tn = [CN_A CN_D];
        if (CN_A < CN_D) & (store_d<=d)
            Pnode_energy = E(end-1)+E(end);
            Enodes = E(end-1:end);
            store_Enodes = Enodes;
            store_Enodes_tn = [CN_A CN_D];
        end
        Eratio = ratio(Pnode_energy,Enodes);
        PN_DA_ratio = Eratio(1);
        PN_DD_ratio = Eratio(2);
        [PN_AA_ratio, PN_AD_ratio, PN_DD_ratio, PN_DA_ratio, SN_AA, SN_AD, SN_DD, SN_DA, store_Enodes] = ...
            pointerUpdate(tn,Eratio,PN_AA_ratio, PN_AD_ratio, PN_DD_ratio, PN_DA_ratio, ...
                          CN_A, CN_D, SN_AA, SN_AD, SN_DD, SN_DA, PN, check_flag, d, store_Enodes);
        store_energy = E;
        if (store_Enodes(1) > store_Enodes(2))
            theRightPath = [theRightPath store_Enodes_tn(1)];
        else
            theRightPath = [theRightPath store_Enodes_tn(2)];
        end
    end
    if (E(1) > E(2:end))==ones(1,length(E)-1) & (1.0<=PN_AA_ratio) & (E_diff>0)
        CN_A = tn(1);
        SN_AA = CN_A;
        Pnode_energy = E(1);
        wpt = wpsplt(wpt,SN_AA);
        [CN_A,CN_D] = Cnode(SN_AA,store_child_node);
        SN_AA = CN_A;
        SN_AD = CN_D;
        [d,tn] = get(wpt,'depth','tn');
        E = wenergy(wpt);
        Enodes = E(2:3);
        store_Enodes = Enodes;
        store_Enodes_tn = [CN_A CN_D];
        Eratio = ratio(Pnode_energy,Enodes);
        PN_AA_ratio = Eratio(1);
        PN_AD_ratio = Eratio(2);
        if (store_Enodes(1) > store_Enodes(2))
            theRightPath = [theRightPath store_Enodes_tn(1)];
        else
            theRightPath = [theRightPath store_Enodes_tn(2)];
        end
    end
end   % end while-I

% Start detection scheme for low frequencies
while ((PN_AA_ratio<1.0) & (PN_DD_ratio>2.4)) & (d<5) & (E_diff>0)
    if (PN_AA_ratio<1.0) & (PN_DD_ratio>2.4) & (d<5)
        check_flag = 0;
        wpt = wpsplt(wpt,SN_AA);
        [CN_A,CN_D] = Cnode(SN_AA,store_child_node);
        PN = Pnode(CN_A);
        store_d = d;
        [d,tn] = get(wpt,'depth','tn');
        E = wenergy(wpt);
        store_Enodes_tn = [CN_A CN_D];
        if (CN_A > tn(end)) & (store_d<d)
            Enodes = E(end-1:end);
            Pnode_energy = E(end-1)+E(end);
            store_Enodes = Enodes;
            store_Enodes_tn = [CN_A CN_D];
            E_diff = abs(E(end-1)-E(end));
        end
        Eratio = ratio(Pnode_energy,Enodes);
        PN_AA_ratio = Eratio(1);
        PN_AD_ratio = Eratio(2);
        [PN_AA_ratio, PN_AD_ratio, PN_DD_ratio, PN_DA_ratio, SN_AA, SN_AD, SN_DD, SN_DA, store_Enodes] = ...
            pointerUpdate(tn,Eratio,PN_AA_ratio, PN_AD_ratio, PN_DD_ratio, PN_DA_ratio, ...
                          CN_A, CN_D, SN_AA, SN_AD, SN_DD, SN_DA, PN, check_flag, d, store_Enodes);
        store_energy = E;
        if (store_Enodes(1) > store_Enodes(2))
            theRightPath = [theRightPath store_Enodes_tn(1)];
        else
            theRightPath = [theRightPath store_Enodes_tn(2)];
        end
    end
end   % end while-II

% Give the data (theRightPath) to generateNodes if its energy difference meets the criteria
if ((E_diff>0))
    [gNodes,count,rcfs] = generateNodes(wpt,theRightPath,store_Enodes,store_Enodes_tn);
else
    odd_Nodes_index = find(mod(tn,2)==1);
    gNodes = tn(odd_Nodes_index);
    count = length(gNodes);
    rcfs = [];
    store_Enodes = [];
    store_Enodes_tn = [];
    tin = zeros(1,22050);
    [gNodes,count,rcfs,diff] = generateNodes(wpt,theRightPath,store_Enodes,store_Enodes_tn);
end
% end main

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Pointer Update Function
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [PN_AA_ratio, PN_AD_ratio, PN_DD_ratio, PN_DA_ratio, SN_AA, SN_AD, SN_DD, SN_DA, store_Enodes] = ...
    pointerUpdate(tn,Eratio,PN_AA_ratio, PN_AD_ratio, PN_DD_ratio, PN_DA_ratio, ...
                  CN_A, CN_D, SN_AA, SN_AD, SN_DD, SN_DA, PN, check_flag, d, store_Enodes);
if (diff(tn)==ones(length(tn)-1,1)) & (tn(end) > tn(1)) & (check_flag==0)
    SN_DA = CN_A;  SN_DD = CN_D;  check_flag = 1;
end
if (1.0<=PN_AA_ratio) & (1.0<=PN_AD_ratio) & (tn(1) < tn(end)) & (check_flag==0) & ...
   (d > 3) & (PN_AA_ratio<4) & (PN_AD_ratio<4) & (CN_A < 10)
    CN_A = tn(1);  CN_D = tn(1)+1;
    SN_AA = CN_A;  SN_AD = CN_D;  check_flag = 1;
end
if (10 < CN_A) & (check_flag==0) & (CN_A < 14) & (d > 3)
    SN_DA = CN_A;  SN_DD = CN_D;  check_flag = 1;
end
if (PN==1) & (CN_A==3)
    SN_AA = CN_A;  SN_AD = CN_D;  check_flag = 1;
end
if (PN == 2)  & (CN_A==5),  SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 6)  & (CN_A==13), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 5)  & (CN_A==11), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 4)  & (CN_A==9),  SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 3)  & (CN_A==7),  SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 6)  & (CN_A==13), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 14) & (CN_A==29), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 13) & (CN_A==27), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 12) & (CN_A==25), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 11) & (CN_A==23), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 7)  & (CN_A==15), SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 8)  & (CN_A==17), SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 9)  & (CN_A==19), SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 10) & (CN_A==21), SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 15) & (CN_A==31), SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 16) & (CN_A==33), SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 17) & (CN_A==35), SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 18) & (CN_A==37), SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 19) & (CN_A==39), SN_AD=CN_D;
    SN_AA=CN_A; end
if (PN == 20) & (CN_A==41), SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 21) & (CN_A==43), SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 22) & (CN_A==45), SN_AD=CN_D; SN_AA=CN_A; end
if (PN == 23) & (CN_A==47), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 24) & (CN_A==49), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 25) & (CN_A==51), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 26) & (CN_A==53), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 27) & (CN_A==55), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 28) & (CN_A==57), SN_DD=CN_D; SN_DA=CN_A;
end
if (PN == 29) & (CN_A==59), SN_DD=CN_D; SN_DA=CN_A; end
if (PN == 30) & (CN_A==61), SN_DD=CN_D; SN_DA=CN_A; end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Calculate Energy Ratio
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function Eratio=ratio(store_energy,E)
for i=1:length(store_energy)
    for j=1:length(E)
        Eratio(i,j) = store_energy(i)/E(j);
    end
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Calculate Child Nodes from Parent Node
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [CN_A,CN_D]=Cnode(PN,store_child_node)
CN = (PN*2)+1;
container_child_node_A(1,1) = CN;
container_child_node_A(1,2) = CN+1;
CN_A = CN;
CN_D = CN+1;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Calculate Parent Nodes of Updated Child Nodes
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function PN=Pnode(CN_A)
if (mod(CN_A,2)==0)
    CN_A = CN_A-1;
end
PN = ((CN_A-1)/2);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Generate Nodes that will be sent to the Node Reconstructor
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [gNodes,count,rcfs,diff]=generateNodes(wpt,tn,store_Enodes,store_Enodes_tn)
global frame_count tin tIndex_f
clear gNodes diff
odd_Nodes = tn(find(mod(tn,2)==1))';
count_gn = 0;
temp = [];
rcfs_nodes = [];
gNodes = [];
rcfs = [];
check_flag = 0;
for path=1:length(tn)
    HE_N = tn(path);
    check = isempty(store_Enodes);
    if (check==0)
        if (length(tn)==1)
            pn_HE_N = Pnode(HE_N);
            temp = [pn_HE_N];
            if (rcfs_nodes==temp)
                rcfs_nodes = [rcfs_nodes];
            else
                rcfs_nodes = [temp,rcfs_nodes];
            end
            gNodes = rcfs_nodes(:);
            gNodes = gNodes';
            count_gn = length(gNodes);
            [rcfs,count,gNodes] = reconsCoef(wpt,gNodes,count_gn,tn);
            check_flag = 1;
        end
        if (HE_N==1) | (HE_N==2)
            temp = HE_N;
            if (rcfs_nodes==temp)
                rcfs_nodes = [rcfs_nodes];
            else
                rcfs_nodes = [temp,rcfs_nodes];
            end
            gNodes = rcfs_nodes(:);
            gNodes = gNodes';
            count_gn = length(gNodes);
            [rcfs,count,gNodes] = reconsCoef(wpt,gNodes,count_gn,tn);
        end
        % If the nodes are not 1 and 2 then generate them (this is where
        % the flaw in the tonality index lies)
        while ((HE_N~=1) & (HE_N~=2)) & (length(tn) > 1) & check_flag==0
            clear gNodes
            HE_N = Pnode(HE_N);
            temp = HE_N;
            rcfs_nodes = [temp, rcfs_nodes];
            if (HE_N==1) | (HE_N==2)
                rcfs_nodes = [rcfs_nodes, tn(path)];
            end
            gNodes = rcfs_nodes(:);
            gNodes = gNodes';
            count_gn = length(gNodes);
        end   % end while-III
        % Send the nodes for reconstruction
        [rcfs,count,gNodes] = reconsCoef(wpt,gNodes,count_gn,tn);
        % Generate the tonality index with frequency mapping
        tIndex_f = tIndex_f + tin;
        % Clear node list
        rcfs_nodes = [];
        check_flag = 0;
    else
        count = count_gn;
        container_diff_sl_rcfs10(frame_count,1) = 0;
        container_diff_sl_rcfs10(frame_count,2) = 0;
        diff(frame_count,:) = container_diff_sl_rcfs10(frame_count,:)*1000;
        disp('rcfs is empty')
    end
end   % end for
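The tree walk in generateNodes and while-III rests on the parent/child index arithmetic of Cnode and Pnode: a parent PN has children 2·PN+1 (approximation) and 2·PN+2 (detail), and a detail node is first mapped to its approximation sibling before computing the parent. A small Python sketch of that arithmetic (illustrative, independent of the Wavelet Toolbox):

```python
def cnode(pn):
    """Children of parent pn in the packet-tree numbering used by Cnode:
    approximation child 2*pn+1, detail child 2*pn+2 (root is node 0)."""
    return 2 * pn + 1, 2 * pn + 2

def pnode(cn):
    """Parent of child cn, as in Pnode: even (detail) nodes are first
    mapped to their odd sibling, then PN = (CN - 1) / 2."""
    if cn % 2 == 0:
        cn -= 1
    return (cn - 1) // 2

# Node 1 splits into 3 and 4; both children map back to parent 1.
assert cnode(1) == (3, 4)
assert pnode(3) == 1 and pnode(4) == 1
# Any depth-3 node reaches node 1 or 2 in two upward steps, which is
# the termination condition of while-III above.
assert pnode(pnode(13)) in (1, 2)
```

This also explains the odd/even split used elsewhere in the listings (odd terminal nodes are approximation children, even ones detail children).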
4) Node Reconstruction
% Vaibhav Chhabra
% Thesis - Node Reconstruction Function
% Last Modified: 04/12/2005
%
% This program gets the nodes from the wavelet analysis function and
% reconstructs them using the inverse discrete wavelet transform. It then
% sends the array of reconstructed nodes to the tonality estimator
function [rcfs,count,gNodes]=reconsCoef(wpt,gNodes,count,tn)
global length_block
rcfs = zeros(count,length_block);   % Initializing the array
for i=1:length(gNodes)
    rcfs_str = ['rcfs',int2str(gNodes(i)),' = wprcoef(wpt, [gNodes(i)]);'];
    eval(rcfs_str);
    rcfs_store_str = ['rcfs(',int2str(i) ',:) = rcfs',int2str(gNodes(i)),';'];
    eval(rcfs_store_str);
end
% Check if gNodes is empty
check = isempty(gNodes);
if (check==0)
    [diff,count,gNodes] = ACFALG(rcfs,count,gNodes,tn);
end
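reconsCoef builds per-node variable names ('rcfs10', 'rcfs20', ...) with eval; the same bookkeeping can be done with direct indexing. A minimal Python analogue is sketched below; `wprcoef_stub` is a hypothetical stand-in for the toolbox's wprcoef, used only so the example is self-contained:

```python
def wprcoef_stub(node, length=8):
    """Hypothetical stand-in for wprcoef(): returns a dummy
    reconstruction vector for the given node (illustration only)."""
    return [float(node)] * length

def recons_coef(g_nodes, length=8):
    """Collect one reconstructed row per node, indexed directly in a
    list instead of eval-built 'rcfsN' variable names."""
    return [wprcoef_stub(n, length) for n in g_nodes]

rcfs = recons_coef([1, 4, 10], length=4)
assert len(rcfs) == 3
assert rcfs[1] == [4.0, 4.0, 4.0, 4.0]
```

Direct indexing keeps the node-to-row correspondence explicit and avoids the string construction the listing performs for each node.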
5) Tonality Estimation
% Vaibhav Chhabra
% Thesis - Tonality Estimator Function
% Last Modified: 04/12/2005
%
% This program gets the reconstructed nodes and analyzes them based on the
% auto-covariance peaks
function [diff,count,gNodes] = ACFALG(rcfs,count,gNodes,tn)
global frame_count frames diff f
check_flag = 0;
check = isempty(rcfs);
if (check==1) & (check_flag==0)
    container_diff_sl_rcfs10(frame_count,1) = 0;
    container_diff_sl_rcfs10(frame_count,2) = 0;
    diff(frame_count,:) = container_diff_sl_rcfs10(frame_count,:)*1000;
    disp('rcfs is empty')
    check_flag = 1;
end
if (check==0) & (check_flag==0)
    % Calculate the Autocorrelation Function
    ACFrcfs = zeros(count,21);
    for acf_count=1:count
        ACFrcfs(acf_count,:) = autocorr(rcfs(acf_count,:));
    end
    % Calculate the Autocovariance
    varACF = zeros(count,41);
    for var_count=1:count
        varACF(var_count,:) = xcov(ACFrcfs(var_count,:));
    end
    % Uncomment this if you want to plot the auto-covariance of the ACF
    % for p=1:count
    %     if p==1, plot(varACF(p,:)); hold on; end
    %     if p==2, plot(varACF(p,:),'g');     end
    %     if p==3, plot(varACF(p,:),'r');     end
    %     if p==4, plot(varACF(p,:),'y');     end
    %     if p==5, plot(varACF(p,:),'c');     end
    %     if p==6, plot(varACF(p,:),'*k');    end
    %     if p==7, plot(varACF(p,:),'om');    end
    % end
    % hold off; title('AutoCovariance of ACF'); xlabel('0-21 lags of ACF'); ylabel('AC')
    % legend=sprintf('Legend: blue-rcfs10, green-rcfs20, red-rcfs30, yellow-rcfs40, cyan-rcfs50, black*-rcfs60, purpleo-rcfs70');
    % disp(legend);
    % Calculate the AC difference
    if (var_count==1) & check_flag==0
        max_var_rcfs10 = max(varACF(1,1:10));
        diff_sl_rcfs10 = abs(max_var_rcfs10);
        container_diff_sl_rcfs10(frame_count,1) = diff_sl_rcfs10;
        container_diff_sl_rcfs10(frame_count,2) = count;
        diff(frame_count,:) = container_diff_sl_rcfs10(frame_count,:)*1000;
        check_flag = 1;
        disp('Almost all the energy is in the Low-end');
        gNodes;
    else
        if (max(varACF(1,:)) > max(varACF(2,:))) & check_flag==0
            diff_rcfs1020 = abs((max(varACF(1,:))-max(varACF(2,:))));
            disp('type I Analysis - Peak Difference of AC-ACF');
            gNodes;
            container_diff_rcfs1020(frame_count,1) = diff_rcfs1020;
            container_diff_rcfs1020(frame_count,2) = count;
            diff(frame_count,:) = container_diff_rcfs1020(frame_count,:)*1000;
            check_flag = 1;
        end
        if (var_count>=4 & check_flag==0)
            min_var_rcfs10 = min(varACF(1,(1:10)));
            if (min_var_rcfs10<0)
                min_var_rcfs10 = 0;
            end
            diff_sl_rcfs10 = min_var_rcfs10;
            disp('var_count >=4 taking min of side-lobe')
            container_diff_sl_rcfs10(frame_count,1) = diff_sl_rcfs10;
            container_diff_sl_rcfs10(frame_count,2) = count;
            diff(frame_count,:) = container_diff_sl_rcfs10(frame_count,:)*1000;
            check_flag = 1;
        end
        % Side lobe variance difference
        if (max(varACF(2,:)) > max(varACF(1,:))) & check_flag==0
            max_var_rcfs10 = max(varACF(1,(1:10)));
            min_var_rcfs10 = min(varACF(1,(1:10)));
            diff_sl_rcfs10 = (max_var_rcfs10-min_var_rcfs10);
            disp('type II Analysis - Side Lobe Peaks of AC-ACF-rcfs10');
            gNodes;
            container_diff_sl_rcfs10(frame_count,1) = diff_sl_rcfs10;
            container_diff_sl_rcfs10(frame_count,2) = count;
            diff(frame_count,:) = container_diff_sl_rcfs10(frame_count,:)*1000;
            check_flag = 1;
        end
        if (var_count==4) & check_flag==0
            if (max(varACF(4,:)) > max(varACF(3,:)))
                max_var_rcfs40 = max(varACF(4,:));
                min_var_rcfs30 = min(varACF(3,:));
                diff_sl_rcfs4030 = (max_var_rcfs40-min_var_rcfs30);
                disp('type III Analysis - noise like characteristics varACF40 and varACF30 peaks compared');
                gNodes;
                container_diff_sl_rcfs4030(frame_count,1) = diff_sl_rcfs4030;
                container_diff_sl_rcfs4030(frame_count,2) = count;
                diff(frame_count,:) = container_diff_sl_rcfs4030(frame_count,:)*1000;
                check_flag = 1;
            end
        end
    end
end   % if for check==0
% Send gNodes for frequency mapping
fTable(gNodes,diff,f);
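ACFALG's tonality cue rests on the fact that a tonal node keeps large periodic autocorrelation peaks while a noise-like node's autocorrelation decays to near zero after lag 0. A small pure-Python sketch of that cue (illustrative; this is not the toolbox autocorr/xcov implementation used in the listing):

```python
import math
import random

def acf(x, nlags=20):
    """Biased, normalized autocorrelation for lags 0..nlags, analogous
    to the 21-lag autocorr() the listing applies to each node."""
    n = len(x)
    mean = sum(x) / n
    xd = [v - mean for v in x]
    c0 = sum(v * v for v in xd)
    return [sum(xd[i] * xd[i + lag] for i in range(n - lag)) / c0
            for lag in range(nlags + 1)]

tone = [math.sin(2 * math.pi * i / 8) for i in range(4096)]   # period-8 sinusoid
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
r_tone, r_noise = acf(tone), acf(noise)
assert r_tone[0] == 1.0 and r_tone[8] > 0.9    # peak recurs at the period
assert abs(r_noise[8]) < 0.3                   # noise ACF is near zero off lag 0
```

Comparing the peak structure of these curves (and, in the listing, of their autocovariances) is what separates the "type I/II/III" analysis branches above.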
6) Frequency Mapping
% Vaibhav Chhabra
% Thesis - Tonality Estimator Function
% Last Modified: 04/12/2005
%
% This function takes the last entry of the "generated nodes" array
% (gNodes) and maps it onto a frequency axis. The tonality difference
% for the current frame (frame_count) is written into the frequency
% bins covered by that node.
function tin=fTable(gNodes,diff,f)

global frame_count tin

tin = zeros(1,22050);

% Frequency-bin boundaries [startBin endBin] for each of the 62
% wavelet packet nodes, numbered breadth-first: nodes 1-2 split the
% 0-22.05 kHz axis in half (level 1), nodes 3-6 are level 2,
% 7-14 level 3, 15-30 level 4, and 31-62 level 5.
nodeBand = [ ...
     1    11024; 11024 22049;                                   % level 1
     1     5512;  5512 11024; 11024 16538; 16538 22049;         % level 2
     1     2756;  2756  5512;  5512  8268;  8268 11024; ...     % level 3
    11024 13780; 13780 16536; 16536 19292; 19292 22049;
     1     1378;  1378  2756;  2756  4134;  4134  5512; ...     % level 4
     5512  6890;  6890  8268;  8268  9646;  9646 11024; ...
    11024 12402; 12402 13780; 13780 15158; 15158 16536; ...
    16536 17914; 17914 19292; 19292 20670; 20670 22049;
     1      689;   689  1378;  1378  2067;  2067  2756; ...     % level 5
     2756  3445;  3445  4134;  4134  4823;  4823  5512; ...
     5512  6201;  6201  6890;  6890  7579;  7579  8268; ...
     8268  8957;  8957  9646;  9646 10335; 10335 11024; ...
    11024 11713; 11713 12402; 12402 13091; 13091 13780; ...
    13780 14469; 14469 15158; 15158 15847; 15847 16536; ...
    16536 17225; 17225 17914; 17914 18603; 18603 19292; ...
    19292 19981; 19981 20670; 20670 21359; 21359 22049 ];

% Fill the band covered by the last node with this frame's tonality
% difference; values that round to zero are stored as exactly zero.
node = gNodes(end);
if (node >= 1) & (node <= 62)
    val = diff(frame_count);
    if round(val)==0
        val = 0;
    end
    tin(nodeBand(node,1):nodeBand(node,2)) = val;
end
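The hard-coded band boundaries above follow a simple rule: the 22,049-bin axis is divided into 2, 4, 8, 16, or 32 equal-width bands depending on the node's level in the wavelet packet tree, with a base width of 689 bins at level 5. As a cross-check, the rule can be sketched in Python (`node_band` is a hypothetical helper, not part of the thesis code); note that the thesis table uses 16538 for the node 5/6 boundary where this uniform formula gives 16536.

```python
def node_band(node, nyq_bins=22049, base=689):
    """Return the (start, end) frequency-bin range for a wavelet
    packet node numbered 1..62 breadth-first across tree levels 1..5."""
    # Find the tree level: nodes 1-2 are level 1, 3-6 level 2, ...
    level, first = 1, 1
    while node >= first + 2**level:
        first += 2**level
        level += 1
    k = node - first + 1            # 1-based band index within the level
    width = base * 2**(5 - level)   # 689 bins at level 5, doubling upward
    start = max((k - 1) * width, 1) # first band starts at bin 1
    end = k * width
    if k == 2**level:
        end = nyq_bins              # last band runs to the Nyquist bin
    return (start, end)
```

For example, `node_band(1)` gives `(1, 11024)` and `node_band(62)` gives `(21359, 22049)`, matching the thesis table.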