Source: stats.lse.ac.uk/fryzlewicz/phd/thesis.pdf · 2003.12.27

Wavelet Techniques for Time Series and Poisson Data

By Piotr Z. Fryzlewicz

A dissertation submitted to the University of Bristol in accordance with the requirements of the degree of Doctor of Philosophy in the Faculty of Science.

September 2003
Department of Mathematics


Abstract

This thesis considers the application of wavelet methods to a selection of problems arising in non-stationary time series analysis and Poisson regression.

In the first part of the thesis, we attempt to provide an answer to the question of whether and how wavelets can be useful in forecasting non-stationary time series. To achieve this, we consider several theoretical and computational aspects of forecasting in the Locally Stationary Wavelet (LSW) model (introduced by Nason et al. (2000)), which uses discrete non-decimated wavelets as building blocks. We propose a wavelet-based adaptive algorithm for forecasting non-stationary time series. The performance of the algorithm is investigated by simulation.

Secondly, we apply the LSW framework to model financial log-returns. We show that the LSW model accounts well for the stylised facts of log-return data. Several examples clearly demonstrate the need for local modelling of financial data, and also indicate the usefulness of wavelets as basic building blocks.

Next, we propose a multiscale algorithm for denoising the wavelet periodogram in the LSW model, and investigate some of its theoretical properties. The idea of the algorithm is to pre-process the data in the wavelet domain, in order to transform a gamma-contaminated signal into an approximately Gaussian-contaminated one, and then use one of the many denoising techniques available for Gaussian data. The inverse transformation then yields an estimate of the original signal.

Finally, as another application of the same methodology, we propose an algorithm for denoising Poisson-contaminated signals. We analyse some of its theoretical properties, and use simulation to demonstrate its excellent performance.


Acknowledgements

First of all, I would like to thank the Statistics Group and the Department of Mathematics at the University of Bristol for providing a perfect environment in which to pursue research. In particular, I would like to thank my adviser, Professor Guy Nason, for all his help, enthusiasm, encouragement, and extremely valuable feedback on my work. I am also grateful to my second adviser, Professor Bernard Silverman, not only for all his help, but also for holding an engaging seminar series on wavelets which I had the opportunity to attend. My research would not have been possible without my sponsors, the University of Bristol and "Universities UK", whose generous financial support is gratefully acknowledged.

Most of the second year of my study was spent collaborating on a joint research project with Professor Rainer von Sachs and Mr Sébastien Van Bellegem from the Institute of Statistics, University of Louvain-la-Neuve, Belgium. The collaboration not only taught me how to work in a research team, but also reminded me of how differently statistics can be understood and pursued on the two sides of the Channel. I am also personally indebted to my coauthors for their hospitality during my research visits to Louvain-la-Neuve.

In the course of the last three years, I have had many interesting conversations with several other members of the mathematical and statistical community: among others, Anestis Antoniadis, Albert Cohen, Rainer Dahlhaus, Nils Hjort, Maarten Jansen, Thomas Mikosch, Byeong Uk Park, Theofanis Sapatinas, Catalin Starica and Kamila Żychałuk. Each of those discussions was special in its own way, but they were all highly enjoyable, stimulating, and friendly.


On a more personal note, I wish to thank all of my friends, first and foremost from Bristol, but also from Wrocław and elsewhere, for making the last three years such a wonderful time. I refrain from listing their names here for fear that this section might otherwise exceed the others in length!

Finally, I would like to thank my parents and sister for their constant love and support.

Dziękuję Wam wszystkim! Thank you all!



Declaration

I, the author, declare that the work in this dissertation was carried out in accordance with the Regulations of the University of Bristol. The work is original except where indicated by special reference in the text, and no part of the dissertation has been submitted for any other degree.

The views expressed in the dissertation are those of the author and in no way represent those of the University of Bristol.

The dissertation has not been presented to any other University for examination either in the United Kingdom or overseas.

Piotr Z. Fryzlewicz



Contents

Abstract 3
Acknowledgements 5
Declaration 7

1 Introduction 19

2 Literature review 22
2.1 Wavelets 22
2.1.1 Multiresolution analysis 24
2.1.2 Discrete Wavelet Transform 25
2.1.3 Non-decimated Wavelet Transform 28
2.1.4 Recent extensions of wavelets 29
2.1.5 Applications of wavelets 31
2.1.6 Summary 32
2.2 Time series analysis 33
2.2.1 Introduction 33
2.2.2 Evolutionary spectral theory 36
2.2.3 Wavelets and time series 37
2.2.4 The Locally Stationary Wavelet model 39
2.2.5 Forecasting 43
2.3 Nonparametric regression 46
2.3.1 Non-linear wavelet smoothing 47
2.3.2 Wavelet shrinkage in time series analysis 50
2.3.3 Wavelet and multiscale methods for Poisson data 51


3 Forecasting LSW processes 53
3.1 Forecasting by approximate MSPE minimisation 54
3.2 A closer look at the results of Section 3.1 60
3.2.1 Assumptions of Lemma 3.1.1 60
3.2.2 Assumptions of Proposition 3.1.1 62
3.3 Kolmogorov's formula for LSW2 processes 65
3.4 Estimation of the approximating matrix B_T 73
3.5 Prediction based on data 78
3.5.1 Nuisance parameters 78
3.5.2 Future observations in rescaled time 79
3.5.3 Data-driven choice of parameters 80
3.6 Application of the predictor to real data 82
3.7 Conclusion 85

4 Modelling log-returns using wavelets and rescaled time 87
4.1 Motivating example 88
4.2 Wavelet-based model 90
4.3 Explanation of the stylised facts 96
4.3.1 Heavy tails of the "marginal" distribution 97
4.3.2 Sample autocorrelations of X_{t,T} and X²_{t,T} 98
4.3.3 Clustering of volatility 100
4.4 Estimation 100
4.4.1 Generic algorithm 103
4.4.2 Smoothing the decimated periodogram 105
4.4.3 Estimating the spectrum with guaranteed nonnegativity 107
4.4.4 Numerical results 108
4.5 Exploratory data analysis 110
4.5.1 Analysis based on the scalogram 110
4.5.2 Full evolutionary Haar spectrum analysis 113


4.6 Forecasting 116
4.7 Conclusion 119

5 Denoising the wavelet periodogram using Haar-Fisz 122
5.1 Motivation: the Fisz transform 122
5.2 Properties of the wavelet periodogram in the Gaussian LSW model 124
5.3 The Haar-Fisz transform 127
5.3.1 Algorithm for the Haar-Fisz transform 127
5.3.2 Examples 129
5.4 A Functional CLT for the centred wavelet periodogram 130
5.5 Properties of the Haar-Fisz transform 137
5.5.1 Properties of the Haar-Fisz transform for M fixed 137
5.5.2 Properties of the Haar-Fisz transform for M = log₂(T) 143
5.5.3 Simulation 145
5.6 Denoising the wavelet periodogram 146
5.6.1 Simulation 149
5.7 Real data example: the Dow Jones index 153
5.8 Conclusion 155

6 A Haar-Fisz algorithm for Poisson intensity estimation 158
6.1 The Fisz transform for Poisson variables 159
6.2 The Haar-Fisz transform for Poisson counts 160
6.2.1 Example 163
6.2.2 A general formula for the Haar-Fisz transform 163
6.3 Properties of the Haar-Fisz transform for constant intensities 165
6.4 Properties of the Haar-Fisz transform for non-constant intensities 170
6.4.1 Decorrelation and Gaussianisation 170
6.4.2 Variance stabilisation 172
6.4.3 Summary of conclusions 178
6.5 Poisson intensity estimation 183


6.5.1 Methods for Poisson intensity estimation 184
6.5.2 Simulation results for various test functions 186
6.5.3 Performance of Haar-Fisz methods as a function of the number of cycle shifts 191
6.6 Application to earthquake data 192
6.7 Conclusion 195

7 Conclusions and future directions 197

Bibliography 201


List of Tables

4.1 Values of the criterion functions averaged over 25 simulations. "Default" is the method of Nason et al. (2000) with default parameters, "splines" is our method using spline smoothing and "wavelets" is our method using translation-invariant nonlinear wavelet smoothing. 109
4.2 Mean Squared Prediction Error and Median Squared Prediction Error (×10⁷ and rounded) in forecasting D_{1106,T}, …, D_{1205,T} one step ahead, for the three methods tested in Section 4.6. 119
6.1 Normalised MISE values (×10000) for various existing techniques and our F ./U and H:CV+BT methods using Haar wavelets and Daubechies' least asymmetric wavelets with 10 vanishing moments (LA10), on the test functions with peak intensities 8 and 128. The best results are indicated by a box. 187
6.2 MISE per bin (×100 and rounded) for clipped block intensity estimation using BMSMShrink and H:CV+BT as denoted in the text, for a variety of intensity scalings. 190



List of Figures

2.1 Bottom plot: spectrum of an exemplary LSW process plotted against the rescaled time. The y-axis shows negative scale −j. The spectrum is only non-zero at scales −1 and −3. Top plot: a sample path of length 512 simulated from this spectrum using Haar wavelets and Gaussian innovations. 44
3.1 The wind anomaly index (in cm/s). The two vertical lines indicate the segment shown in Figure 3.2. 83
3.2 Comparison between the one-step prediction in the LSW2 model (dashed lines) and AR (dotted lines). The middle line is the predicted value, the top (bottom) line is the upper (lower) end of the corresponding 95% prediction interval. 84
4.1 Left-hand column, from top to bottom: X_t with σ_t superimposed, acf of X_t, acf of X²_t, qqnorm plot of X_t. Right-hand column, from top to bottom: Z_t, acf of Z_t, acf of Z²_t, qqnorm plot of Z_t. See Section 4.1 for a discussion. 91
4.2 Left-hand plot: log-returns on daily closing values of the Nikkei (5/6 Jan 1970 – 11/14 May 2001). Right-hand plot: log-returns on daily closing values of the Dow Jones Industrial Average (3/4 Jan 1995 – 10/11 May 2001). 96


4.3 Left-hand plot: sample path from the Gaussian TMWN model with time-varying standard deviation superimposed. Right-hand plot: time-varying variance (solid), its estimate using splines (dot-dashed), its estimate using nonlinear wavelet thresholding (dotted), and its estimate using nonlinear wavelet thresholding with default parameters (dashed). 108
4.4 Solid lines: log-scalograms of X_{t,T} (top left), F_{t,T} (top right), N_{t,T} (bottom left) and D_{t,T} (bottom right), plotted against −j. Dotted lines: theoretical scalograms if the processes were (time-modulated) white noise (not necessarily Gaussian). Dashed lines: −j = 3, 5 (see text for discussion). 111
4.5 Left-hand plot: Ψ_{−2}(h) for Haar wavelets for h = 0, 1, …, 5. Right-hand plot: autocorrelation function of F_{t,T} at lags 0, 1, …, 5. 114
4.6 Left-hand plot: sample autocorrelation of F′_{1,T}, …, F′_{1200,T}. Right-hand plot: sample autocorrelation of F′_{1201,T}, …, F′_{2048,T}. 114
4.7 Estimated evolutionary Haar spectrum of the T = 2048 last observations of the FTSE 100 of Figure 4.1. Smoothing uses splines. The x-axis is the rescaled time z = t/T, and the y-axis is negative scale −j = 1, 2, …, 11. 115
4.8 Top left, top right and bottom left: the actual series (dotted line), one-step forecasts (solid line) and 95% prediction intervals (dashed lines) for AR(1)+GARCH(1,1), AR(16)+GARCH(1,1) and LSW3, respectively. Bottom right: actual series ×2000 and the evolution of the bandwidth g. 118


5.1 Top plot: example of a wavelet spectrum where only S_{−1}(z) and S_{−3}(z) are non-zero. Middle plot: sample path of length 1024 simulated from this spectrum using Haar wavelets and Gaussian innovations. Bottom plot: the Haar periodogram of the simulated realisation at scale j = −1. 126
5.2 The log transform (left plot) and the Haar-Fisz transform with M = 10 (right plot) of the wavelet periodogram from the bottom plot of Figure 5.1. 130
5.3 Left plot: the q-q plot of f_9 arising from the Haar periodogram of a pure white noise process at scale j = −1 (against the normal quantiles). Right plot: solid line: the variance of f_n^{log₂(T)−1} against the correlation of the Gaussian variables involved; dotted line: variance = 0.4 (see text for further description). 144
5.4 Proportion of p-values exceeding or equal to 5% (x-axis shows negative scale −j). Left column: results for TVAR; right column: results for TMWN. Top row: T = 256; bottom row: T = 1024. Solid line: M = log₂(T), dotted line: M = log₂(T) − 1, dashed line: M = log₂(T) − 2, long-dashed line: the log transform. Horizontal solid line: 0.95. 147
5.5 Solid lines: estimates of the local variances for T = 1024 in the TMWN model (top row) and the TVAR model (bottom row), using the method of Nason et al. (2000) (left column) and the Haar-Fisz algorithm (right column) as described in the text. Dotted lines: true local variances. 151
5.6 Solid line: difference between logged MISE for Nason et al. (2000) and for our Haar-Fisz algorithm (x-axis shows negative scale −j). A positive value means our algorithm does better. Left column: results for TVAR; right column: results for TMWN. Top row: T = 256; bottom row: T = 1024. Dotted line: zero. 152


5.7 Four estimates of the local variance of D_{t,T} on a log scale. Solid line: method 1. Dashed line: method 2. Long-dashed line: method 3. Dotted line: method 4. See text for further description. 155
5.8 Empirical quantiles of the residuals of D_{t,T} against the quantiles of the standard normal. Top left: method 1. Top right: method 2. Bottom left: method 3. Bottom right: method 4. See text for further description. 156
6.1 Top left: difference between Kolmogorov-Smirnov test statistics computed on Anscombe-transformed Poisson variables with intensity (λ₁, λ₂), and z(λ₁, λ₂). A positive difference means that Haar-Fisz is closer to Gaussian. Top right: |z̄(λ₁, λ₂) − μ_{1/2}(λ₁, λ₂)|. Bottom left (and right): perspective (and contour) plot of Var(z(λ₁, λ₂)). 161
6.2 Templates used in the experiment of Sections 6.4.1 and 6.4.2. 171
6.3 Q-Q and acf plots for v0; see Section 6.4.1 for detailed description. 173
6.4 Q-Q and acf plots for v25; see Section 6.4.1 for detailed description. 174
6.5 Q-Q and acf plots for v50; see Section 6.4.1 for detailed description. 175
6.6 Q-Q and acf plots for v75; see Section 6.4.1 for detailed description. 176
6.7 From top to bottom: intensity vector λ of the Donoho & Johnstone (1994) bumps function (solid; shifted and scaled so that the minimum intensity is 3 and the maximum is 18) and one sample path v (dotted); Q-Q plots of the vectors v − λ, Av − Aλ, and Fv − Fλ averaged over 100 v samples. 177
6.8 Averaged squared residuals for v0; see Section 6.4.2 for detailed description. 179
6.9 Averaged squared residuals for v25; see Section 6.4.2 for detailed description. 180
6.10 Averaged squared residuals for v50; see Section 6.4.2 for detailed description. 181


6.11 Averaged squared residuals for v75; see Section 6.4.2 for detailed description. 182
6.12 Selected estimates for the Donoho and Johnstone intensity functions (dashed, described in text). Each estimate gives an idea of "average" performance in that in each case its MISE is the closest to the median MISE obtained over 50 sample paths. The estimation method in each case was F ./U with Daubechies least-asymmetric wavelets with 10 vanishing moments, except for blocks, which used H:CV+BT with Haar wavelets. 188
6.13 Left: scaled and shifted blocks function, and its clipped version: clipped blocks. Right: the true intensity function (with scaling 1, dashed) and an estimate computed using our algorithm with the hybrid method H:CV+BT whose MISE was closest to the median MISE obtained over 50 sample paths. 190
6.14 MISE against the number of shifts for clipped blocks 1 (top two rows) and blocks 128 (bottom two rows). See Section 6.5.3 for detailed description. 193
6.15 MISE against the number of shifts for bumps 8, doppler 8 and heavisine 128. See Section 6.5.3 for detailed description. 194
6.16 The number of earthquakes of magnitude ≥ 3.0 which occurred in Northern California in 1024 consecutive weeks, the last week being 29 Nov – 5 Dec 2000. 195
6.17 Intensity estimates for the earthquake data for weeks 201 to 400. The dotted line is the BMSMShrink estimate and the solid line is the H:CV+BT estimate. 196


Chapter 1

Introduction

Wavelets can be casually described as oscillatory basis functions, cleverly constructed to possess several attractive features not enjoyed by "big waves" (sines and cosines): for example, multiscale structure, the ability to represent a variety of functions in a sparse manner, and simultaneous localisation in time and frequency. These and other properties have recently led many researchers to investigate the potential for using wavelets in various branches of statistics, such as time series analysis and nonparametric regression. In this thesis, we also employ wavelets to tackle a selection of problems arising in these two important areas of statistics.

In the introductory Chapter 2, we first review the basics of wavelet theory, and then provide a survey of wavelet applications in time series analysis and nonparametric regression. In particular, we describe the Locally Stationary Wavelet (LSW) time series model (Nason et al. (2000)), whose various aspects are studied in Chapters 3, 4 and 5. The LSW model uses wavelets as building blocks, which makes it a potentially useful tool for modelling multiscale phenomena whose characteristics evolve over time. It also uses the concept of rescaled time: the time-varying second-order quantities are modelled as functions defined on a compact interval, which enables meaningful asymptotics. Chapter 2 concludes with a brief section on estimating Poisson intensities by wavelet methods, which prepares the ground for the material of Chapter 6.

The work of Chapter 3 is motivated by the interesting question of whether and


how wavelets can help in forecasting non-stationary time series. We provide an answer by considering various aspects of linear prediction in the Gaussian LSW model. The rescaled-time principle enables us to obtain a variety of asymptotic results. In particular, we generalise the Yule-Walker equations (well known in the stationary case), and derive Kolmogorov's formula for the one-step prediction error. In the second half of Chapter 3, we analyse the properties of wavelet-based estimators of the prediction matrix, and provide a complete algorithm for forecasting non-stationary time series. Interesting and encouraging results are obtained by applying the algorithm to a meteorological time series.

In Chapter 4, we model financial log-return series as LSW processes. In our choice of model, we are motivated by several factors, including the comment made in Calvet & Fisher (2001) that various economic agents operate at different time scales, which may translate into a possible multiscale mechanism underlying financial log-returns. We slightly modify the definition of the LSW model to include time-modulated white noise, the simplest possible linear model for log-returns, as a special case. We then exploit the rescaled-time principle to provide a theoretical explanation of the "stylised facts" of financial time series in the LSW framework. We propose a generic algorithm for estimating the time-varying covariance structure of log-returns, and perform various analyses of log-return data in the LSW framework. These seem to confirm the appropriateness of the LSW model for the analysis of this type of data.

The work of Chapters 5 and 6 stems from a rather unexpected discovery that a computationally straightforward modification of the Discrete Haar Transform can be used to stabilise the variance of χ² and Poisson data. Being able to denoise the wavelet periodogram is essential for understanding the local second-order structure of the LSW series under consideration. In Chapter 5, we propose a transformation, called the Haar-Fisz transform, for stabilising the variance of the wavelet periodogram in the Gaussian LSW model and bringing its distribution closer to normality. Several theoretical results are established, and the above properties


of the Haar-Fisz transform are proved under a certain asymptotic regime. The Haar-Fisz transform is shown to perform excellently in practice. Then, a denoising methodology for the wavelet periodogram is proposed, which consists of taking the Haar-Fisz transform of the periodogram, denoising the transformed vector using any technique suitable for Gaussian data, and taking the inverse Haar-Fisz transform. Simulations and an example involving the Dow Jones series demonstrate the usefulness of the technique.

The Haar-Fisz transform constitutes a "bridge" between Chapters 5 and 6. In the latter, we propose a similar technique for stabilising the variance of sequences of Poisson counts and bringing their distribution close to Gaussianity. The Haar-Fisz transform for Poisson data is investigated theoretically for constant intensities, and empirically for non-constant ones. It turns out that in this context, the Haar-Fisz transform is a more effective Gaussianiser and variance stabiliser than the traditional square-root transform. A Haar-Fisz-based algorithm for Poisson intensity estimation is proposed: its performance is shown to be typically better than, but occasionally comparable to, that of the current state-of-the-art techniques.

Finally, Chapter 7 concludes with a summary of contributions and a few interesting ideas for future research.
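The Poisson Haar-Fisz recipe summarised above (divide Haar detail coefficients by the square root of the corresponding smooth coefficients, then invert) can be sketched in a few lines. The code below is our own minimal illustration, not the thesis implementation: the function name `haar_fisz` and the restriction to inputs of dyadic length are our assumptions.

```python
import numpy as np

def haar_fisz(v):
    """Haar-Fisz transform of a vector of Poisson counts.

    Assumes len(v) is a power of two. Computes the (non-normalised)
    Haar smooth/detail coefficients at every level, replaces each
    detail d by the Fisz coefficient f = d / sqrt(s) (0 where s = 0),
    and inverts the Haar transform using the f's.
    """
    s = np.asarray(v, dtype=float)
    J = int(np.log2(s.size))
    fisz_details = []
    for _ in range(J):
        sm = (s[0::2] + s[1::2]) / 2.0   # smooth (local mean)
        d = (s[0::2] - s[1::2]) / 2.0    # detail (local difference)
        f = np.where(sm > 0, d / np.sqrt(sm), 0.0)  # Fisz normalisation
        fisz_details.append(f)
        s = sm
    # inverse Haar reconstruction, but with Fisz-normalised details
    u = s
    for f in reversed(fisz_details):
        up = np.empty(2 * u.size)
        up[0::2] = u + f
        up[1::2] = u - f
        u = up
    return u
```

Note that the reconstruction pairs (u + f, u − f) average back to the parent value, so the sample mean of the input is preserved exactly; only the fluctuations around local means are rescaled, which is what stabilises the variance.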



Chapter 2

Literature review

This chapter provides an overview of wavelet theory and reviews recent literature in the two areas of statistics studied in this thesis: time series analysis and nonparametric regression. We place particular emphasis on evolutionary spectral theory for time series and multiscale methods for Poisson data.

2.1 Wavelets

Wavelets can be informally described as localised, oscillatory functions designed to have several attractive properties not enjoyed by "big waves" (sines and cosines). Since their "invention" in the early eighties (the term "wavelet" appeared for the first time in Morlet et al. (1982)), wavelets have received enormous attention both in the mathematical community and in the applied sciences. Several monographs have appeared, both on the mathematical theory of wavelets (Meyer (1992), Daubechies (1992), Chui (1992), Mallat (1998), Cohen (2003)) and on their applications (Ruskai (1992), Jaffard et al. (2001)).

Formally, after Daubechies (1992), we define a wavelet to be any function ψ ∈ L²(ℝ) which satisfies the admissibility condition

    ∫_{−∞}^{∞} |ψ̂(ω)|² / |ω| dω < ∞.    (2.1)


In this thesis, we only concentrate on those wavelet functions ψ whose dyadic dilations and translations

    ψ_{j,k}(x) = 2^{j/2} ψ(2^j x − k),  j, k ∈ ℤ,    (2.2)

form an orthonormal basis of L²(ℝ). The indices j and k are commonly called the scale (or dilation) and location (or translation) parameters, respectively. Condition (2.1) implies, in particular, that

    ∫_ℝ ψ(x) dx = 0.    (2.3)

While (2.1) can be viewed as a requirement that ψ̂ should be localised in frequency, (2.3) can be interpreted as expressing both localisation in time (as it implies ψ ∈ L¹(ℝ)) and oscillation.

Haar (1910) considered an orthonormal basis which would later become arguably the best-known wavelet system, with the wavelet function of the form

    ψ^H(x) = I_{0 ≤ x < 1/2} − I_{1/2 ≤ x ≤ 1}.    (2.4)

We say that the wavelet ψ has n vanishing moments if

    ∫_{−∞}^{∞} x^k ψ(x) dx = 0  for k ∈ {0, 1, …, n}.    (2.5)

It is easy to see that ψ^H has 0 vanishing moments. Daubechies (1992, Chapter 6) identifies the Extremal Phase family: a collection of orthonormal wavelet bases possessing different degrees of smoothness and numbers of vanishing moments. This family of bases is indexed by the number of vanishing moments, and the Haar basis is its zeroth member. A review of this and other families of wavelets (including Daubechies' Least Asymmetric family) can be found in Vidakovic (1999), Sections 3.4 and 3.5.

The vanishing moments property (2.5), together with the localisation properties (2.1) and (2.3), implies that wavelets are often capable of representing signals in a sparse manner. The coefficients d_{j,k} of the wavelet expansion of f ∈ L²(ℝ) can be expressed in the usual way as d_{j,k} = ⟨f, ψ_{j,k}⟩. Larger (smaller) values of j correspond to finer (coarser) scale coefficients.
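A quick numerical check of (2.3)–(2.5) for the Haar wavelet: it integrates to zero, but its first moment equals −1/4, so it has exactly 0 vanishing moments in the sense of (2.5). The snippet below is our own illustration (not thesis code), using a simple midpoint-rule quadrature on [0, 1], the support of ψ^H:

```python
import numpy as np

def haar(x):
    """Haar mother wavelet psi^H of (2.4): +1 on [0, 1/2), -1 on [1/2, 1]."""
    x = np.asarray(x, dtype=float)
    return (np.where((x >= 0) & (x < 0.5), 1.0, 0.0)
            - np.where((x >= 0.5) & (x <= 1), 1.0, 0.0))

# midpoint rule on [0, 1]
n = 1000
xs = (np.arange(n) + 0.5) / n
m0 = haar(xs).sum() / n          # approximates the integral of psi:   0
m1 = (xs * haar(xs)).sum() / n   # approximates the first moment:   -1/4
```

The first moment is ∫₀^{1/2} x dx − ∫_{1/2}^{1} x dx = 1/8 − 3/8 = −1/4 ≠ 0, confirming that no moments beyond the zeroth vanish.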


2.1.1 Multiresolution analysis

In statistics, we are often faced with discretely sampled signals, and therefore we need to be able to perform the wavelet decomposition of vectors, rather than of continuous functions as above. The multiresolution analysis framework, first introduced by Mallat (1989a,b), is commonly used to define discrete wavelet filters. The starting point is a scaling function $\phi$ and a multiresolution analysis of $L^2(\mathbb{R})$, i.e. a sequence $\{V_j\}_{j \in \mathbb{Z}}$ of closed subspaces of $L^2(\mathbb{R})$ such that

- $\{\phi(x - k)\}_{k \in \mathbb{Z}}$ is an orthonormal basis for $V_0$;
- $\ldots \subset V_{-1} \subset V_0 \subset V_1 \subset \ldots \subset L^2(\mathbb{R})$;
- $f \in V_j \iff f(2\,\cdot) \in V_{j+1}$;
- $\bigcap_j V_j = \{0\}$, $\overline{\bigcup_j V_j} = L^2(\mathbb{R})$.

The set $\{\sqrt{2}\,\phi(2x - k)\}_{k \in \mathbb{Z}}$ is an orthonormal basis for $V_1$, since the map $f \mapsto \sqrt{2} f(2\,\cdot)$ is an isometry from $V_0$ onto $V_1$. The function $\phi$ is in $V_1$, so it must have an expansion

    \phi(x) = \sqrt{2} \sum_k h_k \phi(2x - k), \quad \{h_k\}_k \in l^2, \; x \in \mathbb{R}.    (2.6)

Once we have the scaling function $\phi$, we use it to define the wavelet function (also called the mother wavelet) $\psi$. We define the latter in such a way that $\{\psi(x - k)\}_k$ is an orthonormal basis for the space $W_0$, the orthogonal complement of $V_0$ in $V_1$:

    V_1 = V_0 \oplus W_0.    (2.7)

Defining $W_j = \overline{\mathrm{span}}\{\psi_{j,k} : k \in \mathbb{Z}\}$, we obtain that $W_j$ is the orthogonal complement of $V_j$ in $V_{j+1}$. We can write

    V_{j+1} = V_j \oplus W_j = \ldots = V_0 \oplus \left( \bigoplus_{i=0}^{j} W_i \right),    (2.8)


or, taking the limit (recall that $\bigcup_j V_j$ is dense in $L^2(\mathbb{R})$),

    L^2(\mathbb{R}) = V_0 \oplus \left( \bigoplus_{i=0}^{\infty} W_i \right) = V_{j_0} \oplus \left( \bigoplus_{i=j_0}^{\infty} W_i \right), \quad \forall j_0.    (2.9)

There are precise procedures for finding $\psi$ once $\phi$ is known (see Daubechies (1992), Section 5.1). One possibility (Daubechies (1992), Theorem 5.1.1) is to set

    \psi(x) = \sqrt{2} \sum_k h_{1-k} (-1)^k \phi(2x - k).    (2.10)

It can be shown that the appropriate orthogonality conditions are then satisfied.

2.1.2 Discrete Wavelet Transform

The nested structure of the multiresolution analysis can be exploited to construct a fast decomposition-reconstruction algorithm for discrete data, analogous to the Fast Fourier Transform of Cooley & Tukey (1965). The algorithm, called the Discrete Wavelet Transform (Mallat (1989a,b)), produces a vector of wavelet coefficients of the input vector at dyadic scales and locations. The transformation is linear and orthonormal, but it is not performed by matrix multiplication, to save time and memory.

We first describe a single reconstruction step, used in computing the inverse Discrete Wavelet Transform (DWT). The following two sets are orthonormal bases for $V_1$: $\{\sqrt{2}\,\phi(2x - k)\}_{k \in \mathbb{Z}}$ and $\{\phi(x - k), \psi(x - l)\}_{k,l \in \mathbb{Z}}$. Using (2.6) and (2.10), we obtain for any $f \in V_1$

    f(x) = \sum_k c_{0,k} \phi(x - k) + \sum_k d_{0,k} \psi(x - k)
         = \sum_l \sum_k \left( h_l c_{0,k} + h_{1-l} (-1)^l d_{0,k} \right) \sqrt{2}\, \phi(2x - 2k - l)
         = \sum_{l'} \left( \sum_k h_{l'-2k} c_{0,k} + \sum_k h_{1-l'+2k} (-1)^{l'} d_{0,k} \right) \sqrt{2}\, \phi(2x - l').

Writing the expansion with respect to the other basis as $f(x) = \sum_{l'} c_{1,l'} \sqrt{2}\, \phi(2x - l')$ and equating the coefficients, we obtain

    c_{1,l'} = \sum_k h_{l'-2k} c_{0,k} + \sum_k h_{1-l'+2k} (-1)^{l'} d_{0,k},    (2.11)


which completes the reconstruction part: the coarser scale coefficients $\{c_{0,k}\}, \{d_{0,k}\}$ are used to obtain the finer scale coefficients $\{c_{1,k}\}$.

The decomposition step (used in the DWT) is equally straightforward: we have

    c_{0,k} = \int_{-\infty}^{\infty} f(x) \phi(x - k)\, dx
            = \int_{-\infty}^{\infty} f(x) \sum_l h_l \sqrt{2}\, \phi(2x - 2k - l)\, dx
            = \sum_l h_l c_{1,2k+l} = \sum_l c_{1,l} h_{l-2k}.    (2.12)

Similarly,

    d_{0,k} = \sum_l (-1)^{l-2k} h_{1-l+2k} c_{1,l}.    (2.13)

The same mechanism works for each scale: $\{c_{j,k}\}$ gives $\{c_{j-1,k}\}$ and $\{d_{j-1,k}\}$ for all $j$. Conversely, $\{c_{j,k}\}$ can be reconstructed using $\{c_{j-1,k}\}$ and $\{d_{j-1,k}\}$ for all $j$. To start this "pyramid" algorithm, we only need to compute the scaling coefficients $c_{j,k}$ at the finest scale of interest, say $j = J$. Indeed, when performing the wavelet decomposition of finite sequences, it is commonly assumed that the input vector $f = \{f_n\}_{n=0}^{2^J - 1}$ is a vector of scaling coefficients of a function $f$, i.e. $f_n = c_{J,n} = \langle f, \phi_{J,n} \rangle$, where $\phi_{j,k}(x) = 2^{j/2} \phi(2^j x - k)$. The DWT of $f$ is given by

    \mathrm{DWT}(f) = (c_{0,0}, d_{0,0}, d_{1,0}, d_{1,1}, d_{2,0}, \ldots, d_{2,3}, \ldots, d_{J-1,0}, \ldots, d_{J-1,2^{J-1}-1}).    (2.14)

Informally speaking, the wavelet coefficient $d_{j,k}$ contains information on the local oscillatory behaviour of $f$ at scale $j$ and location $2^{J-j} k$, whereas the coefficient $c_{0,0}$ contains information on the global "mean level" of $f$. A few remarks are in order.

Decimation. Define

    c^*_{0,k} = \sum_l c_{1,l} h_{l-k},
    d^*_{0,k} = \sum_l (-1)^{l-k} h_{1-l+k} c_{1,l},

so that $c^*_{0,k}$ is a convolution of $c_{1,k}$ with $h_k$, and $d^*_{0,k}$ is a convolution of $c_{1,k}$ with $(-1)^k h_{1-k}$. We have $c_{0,k} = c^*_{0,2k}$ and $d_{0,k} = d^*_{0,2k}$: coarser scale


coefficients are decimated convolutions of finer scale coefficients with fixed (scale-independent) filters. This is in contrast to the Non-decimated Wavelet Transform, where no decimation is performed, yielding a shift-invariant (but redundant) transform: see Section 2.1.3 for details.

High-pass and low-pass filters. We define $g_k = (-1)^k h_{1-k}$. Due to its effect in the frequency domain, $g_k$ ($h_k$) is often referred to as a high-pass (low-pass) filter in the wavelet literature (Daubechies (1992)). This motivates the commonly used names for the wavelet and scaling coefficients: they are often referred to as detail and smooth coefficients, respectively.

Example of the DWT. By simple algebra, $\phi^H(x) = I_{\{0 \le x \le 1\}}$ generates the Haar wavelet $\psi^H$, with a low-pass filter $h_k$ such that $h_0 = h_1 = 1/\sqrt{2}$, $h_k = 0$ otherwise, and a high-pass filter $g_k$ such that $g_0 = -g_1 = 1/\sqrt{2}$, $g_k = 0$ otherwise. We shall now decompose the four-element vector

    (c_{2,0}, c_{2,1}, c_{2,2}, c_{2,3}) = (1, 1, 2, 3)

using the DWT with Haar wavelets. By (2.12) and (2.13), we obtain

    c_{1,0} = 1/\sqrt{2} \times 1 + 1/\sqrt{2} \times 1 = \sqrt{2}
    c_{1,1} = 1/\sqrt{2} \times 2 + 1/\sqrt{2} \times 3 = 5/\sqrt{2}
    d_{1,0} = 1/\sqrt{2} \times 1 - 1/\sqrt{2} \times 1 = 0
    d_{1,1} = 1/\sqrt{2} \times 2 - 1/\sqrt{2} \times 3 = -1/\sqrt{2}.

Continuing at the next coarser scale, we obtain

    c_{0,0} = 1/\sqrt{2} \times \sqrt{2} + 1/\sqrt{2} \times 5/\sqrt{2} = 7/2
    d_{0,0} = 1/\sqrt{2} \times \sqrt{2} - 1/\sqrt{2} \times 5/\sqrt{2} = -3/2.

The original vector $(c_{2,0}, c_{2,1}, c_{2,2}, c_{2,3})$ can now be easily reconstructed from $(c_{0,0}, d_{0,0}, d_{1,0}, d_{1,1})$ (i.e. from the smooth coefficient at the coarsest scale and


the detail coefficients at all scales) using the inverse DWT. As the DWT is orthonormal, the inverse DWT uses exactly the same filters as the DWT. Note that the high-pass filter annihilates constants (recall that Haar wavelets have vanishing moments up to degree 0). Wavelets with higher numbers of vanishing moments are capable of annihilating polynomials of higher degrees.

Boundary issue. With wavelet filters longer than Haar, there often arises the problem of what action to perform when the support of the filter extends beyond the support of the input vector. Several solutions have been proposed, including symmetric reflection of the input vector at the boundaries, polynomial extrapolation, periodising the vector, padding it out with zeros, etc. See Nason & Silverman (1994) for an overview. Cohen et al. (1993) introduced wavelets on the interval, i.e. wavelet bases for functions defined on an interval as opposed to the whole real line. They also proposed a corresponding fast wavelet transform which uses filters adapted to the finite support situation. The lifting scheme (see Section 2.1.4) offers a natural way of dealing with the boundary problem.

Computational speed. O(n) operations are needed for the DWT which uses a compactly supported wavelet, where n is the size of the input sequence. This is an advantage over the Fast Fourier Transform, which requires O(n log(n)) operations.

2.1.3 Non-decimated Wavelet Transform

An undesirable property of the DWT is that it is not translation-invariant, and that at any given scale, it only provides information about the input vector at certain (dyadic) locations. Using the toy example above, the coefficient $c_{1,0}$ uses $c_{2,0}$ and $c_{2,1}$, while the coefficient $c_{1,1}$ uses $c_{2,2}$ and $c_{2,3}$, but there is no coefficient which would use, say, $c_{2,1}$ and $c_{2,2}$. Motivated by this, Pesquet et al. (1996) introduced the Non-decimated DWT (NDWT), which remedies this problem by computing wavelet


coefficients at all possible locations at all scales (see also Nason & Silverman (1995) and Coifman & Donoho (1995)). Continuing the example of the previous section, the NDWT of $(c_{2,0}, c_{2,1}, c_{2,2}, c_{2,3}) = (1, 1, 2, 3)$ which uses Haar wavelets is performed as follows. We begin with

    c_{1,0} = (1/\sqrt{2}, 1/\sqrt{2}) \cdot (c_{2,0}, c_{2,1})
    c_{1,1} = (1/\sqrt{2}, 1/\sqrt{2}) \cdot (c_{2,1}, c_{2,2})
    c_{1,2} = (1/\sqrt{2}, 1/\sqrt{2}) \cdot (c_{2,2}, c_{2,3})
    c_{1,3} = (1/\sqrt{2}, 1/\sqrt{2}) \cdot (c_{2,3}, c_{2,0}),

where "$\cdot$" denotes the dot product. The detail coefficients $d_{1,k}$ are obtained similarly, by replacing the low-pass filter with the high-pass one. Note that we implicitly assume "periodic" boundary conditions in the above (see the remark on the "boundary issue" in Section 2.1.2). Before we proceed to the next stage, we insert zeros between every two elements of the wavelet filters. Thus, we have

    c_{0,0} = (1/\sqrt{2}, 0, 1/\sqrt{2}, 0) \cdot (c_{1,0}, c_{1,1}, c_{1,2}, c_{1,3})
    c_{0,1} = (1/\sqrt{2}, 0, 1/\sqrt{2}, 0) \cdot (c_{1,1}, c_{1,2}, c_{1,3}, c_{1,0})
    c_{0,2} = (1/\sqrt{2}, 0, 1/\sqrt{2}, 0) \cdot (c_{1,2}, c_{1,3}, c_{1,0}, c_{1,1})
    c_{0,3} = (1/\sqrt{2}, 0, 1/\sqrt{2}, 0) \cdot (c_{1,3}, c_{1,0}, c_{1,1}, c_{1,2}),

and similarly for the detail coefficients. The insertion of zeros is necessary since decimation is not performed. Were we to compute the NDWT at yet another scale, we would use the filter $(1/\sqrt{2}, 0, 0, 0, 1/\sqrt{2}, 0, 0, 0)$ for the smooth coefficients and $(1/\sqrt{2}, 0, 0, 0, -1/\sqrt{2}, 0, 0, 0)$ for the detail coefficients. The computational speed of the NDWT is O(n log(n)), where n is the length of the input vector.

2.1.4 Recent extensions of wavelets

Since the late eighties, several extensions and modifications of wavelets have been proposed. We only give a brief overview below.


The construction of multidimensional wavelets is due to Mallat (1989a), who also proposed a multivariate version of the DWT. Cohen et al. (1992) introduced biorthogonal wavelets, where the decomposition and reconstruction steps use different, non-orthogonal bases which are, however, in a certain sense mutually orthogonal. Geronimo et al. (1994) formulated multiple wavelets, which use translations and dilations of more than one wavelet function. Lawton (1993) derived complex-valued wavelets (although their construction was already mentioned in Daubechies (1992); see also Lina & Mayrand (1995) for a detailed description and derivation of complex Daubechies wavelets). Coifman et al. (1989) introduced wavelet packets: redundant collections of linear combinations of wavelets, capable of representing signals more economically than wavelets themselves. Wavelet packet coefficients are rapidly computable by applying both low- and high-pass filters to both smooth and detail coefficients, and can be searched for the "best basis" representation (Coifman & Wickerhauser (1992)).

Donoho (2000) introduced orthonormal ridgelets, which form a basis of $L^2(\mathbb{R}^2)$ and provide an efficient representation of so-called ridge functions, i.e. functions of the form $r_\theta(x) = r(x_1 \cos(\theta) + x_2 \sin(\theta))$, where the ridge profile $r$ is not necessarily smooth. Curvelets (Candès & Donoho (2001)), whose theory relies on ridgelets, provide a near-optimal approximation of distributions in 2D which are integrals along curves and which can be viewed as 2D extensions of the Dirac delta. Donoho & Huo (2002) introduced beamlets, i.e. collections of line segments in 2D, occurring at dyadic locations and scales and at a range of orientations; a beamlet transform of a 2D function is a collection of integrals along beamlets.

The lifting scheme, proposed by Sweldens (1996), is a powerful way of generating multiscale transforms of (possibly) unequally spaced data. The transform consists of predict and update steps, is fast, and can be performed "in place". For particular choices of filters, the lifting scheme generalises the (bi)orthogonal DWT and the wavelet packet transform.

Some of the above extensions, as well as some others, are discussed in detail in


Vidakovic (1999), Chapter 5.

2.1.5 Applications of wavelets

Wavelets and their extensions have been applied in a multitude of areas, such as signal and image processing, data compression, computer graphics, astronomy, quantum mechanics and turbulence: for a discussion of these and other areas of application, we refer the reader to the monographs by Ruskai (1992) and Jaffard et al. (2001). An important field of application is numerical analysis, extensively covered in Cohen (2003). One can venture to say that wavelets are indeed one of those fortunate mathematical concepts that have almost become "household objects": for example, they were used in the JPEG2000 compression algorithm, and multiscale subdivision schemes, related to wavelets, were employed in some recent animated movies such as "A Bug's Life" (Mackenzie (2001)). See Cohen (2003), Chapter 2, for an overview of subdivision schemes and related topics.

Following Vidakovic (1999), who gives a comprehensive overview of wavelet applications in statistics, we list some of the most important areas of statistics where wavelets have been successfully applied:

- time series analysis (see Section 2.2 for more details);
- non-parametric regression (see Section 2.3 for more details);
- estimation of densities (Hall & Patil (1995), Donoho et al. (1996), Penev & Dechevsky (1997), Pinheiro & Vidakovic (1997), Antoniadis et al. (1999), Pensky (1999), Herrick et al. (2001)) and density functionals (Kerkyacharian & Picard (1996), Prakasa Rao (1999));
- deconvolution and inverse problems (Donoho (1995), Abramovich & Silverman (1998), Pensky & Vidakovic (1999), Walter & Shen (1999), Pensky (2002));


- statistical turbulence (see Farge et al. (1999) and Schneider & Farge (2001) for reviews).

Abramovich et al. (2000) is a useful review article on statistical applications of wavelets. We give a detailed review of wavelet applications in time series analysis (with a particular emphasis on evolutionary spectral theory) and non-parametric regression in Sections 2.2 and 2.3, respectively.

2.1.6 Summary

In this section, we briefly summarise the attractive features of wavelets shown or mentioned in this chapter.

Due to their vanishing moments property and localisation in time, wavelets are capable of representing certain functions, e.g. piecewise polynomials, in a sparse manner. As the wavelet coefficients computed at locations where the function is smooth will be zero, only a few significant coefficients suffice to approximate the function accurately. Also, their simultaneous localisation in time and frequency makes them potentially useful building blocks for phenomena whose spectral characteristics evolve over time.

The multiscale structure inherent to wavelets serves two useful purposes: it enables the construction of fast decomposition-reconstruction algorithms, and it makes wavelets a natural tool for analysing multiscale phenomena. The fact that it is possible to construct orthonormal wavelet bases is extremely important in statistics, where i.i.d. Gaussian noise in the time domain gets mapped to noise with the same characteristics in the wavelet domain.

Also, in contrast to Fourier analysis, where only one set of basis functions is available, there are several families of wavelets to choose from.
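The two statistical points above, sparsity of piecewise polynomial signals and preservation of i.i.d. Gaussian noise under an orthonormal transform, can be checked with a small numerical sketch. The Python code below is my own illustration using the Haar pyramid of Section 2.1.2 (it is not code from the thesis; the signal and sample sizes are arbitrary):

```python
import numpy as np

def haar_dwt(v):
    """Full Haar DWT pyramid of a vector of dyadic length, returning
    (c_{0,0}, d_{0,0}, d_{1,0}, d_{1,1}, ...) as in (2.14)."""
    c = np.asarray(v, dtype=float)
    details = []
    while len(c) > 1:
        details.append((c[0::2] - c[1::2]) / np.sqrt(2))   # detail coefficients
        c = (c[0::2] + c[1::2]) / np.sqrt(2)               # smooth coefficients
    return np.concatenate([c] + details[::-1])

# Sparsity: a piecewise constant signal of length 64 with a single jump has
# only two nonzero Haar coefficients (the mean level and one detail).
f = np.concatenate([np.full(32, 1.0), np.full(32, 5.0)])
coeffs = haar_dwt(f)
print(int(np.sum(np.abs(coeffs) > 1e-10)))   # -> 2

# Orthonormality: energy is preserved exactly, and i.i.d. N(0,1) noise keeps
# unit variance in the wavelet domain.
rng = np.random.default_rng(0)
z = rng.standard_normal(2 ** 14)
print(np.sum(coeffs ** 2), np.sum(f ** 2))   # equal
print(np.var(haar_dwt(z)), np.var(z))        # both close to 1
```

Note that the Haar filters never overrun the vector's boundary, so no boundary rule is needed in this particular sketch.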


2.2 Time series analysis

2.2.1 Introduction

A time series is a collection of random variables $\{X_t, t \in D \subseteq \mathbb{Z}\}$, with $t$ often interpreted as time. Usually, $D = \mathbb{N}$, $D = \mathbb{Z}$ or $D = \{1, 2, \ldots, T\}$. With a slight abuse of terminology, an observed realisation of $X_t$ is also often referred to as a time series. Time series arise in several areas of science and technology, and time series analysis (TSA) is one of the most widely studied branches of statistics, with the Journal of Time Series Analysis dedicated solely to this important field. Two recommended monographs are Priestley (1981) and Brockwell & Davis (1987).

We say that a time series $X_t$ is stationary when (some of) its statistical properties do not change through time, therefore enabling us, in most cases, to estimate its parameters consistently. We say that $X_t$ is strict-sense stationary if

    (X_{t_1}, \ldots, X_{t_n}) \overset{D}{=} (X_{t_1+d}, \ldots, X_{t_n+d})

for all $n$, $t_1, \ldots, t_n$ and $d$. Often, strict-sense stationarity is too difficult to verify and/or too restrictive; one of the weaker concepts is that of covariance stationarity. For a univariate, zero-mean time series $X_t$, we define its covariance as $\tilde{\gamma}_X(s, t) = \mathbb{E}(X_s X_t)$. We say that $X_t$ is covariance stationary if $\tilde{\gamma}_X(s, t) = \gamma_X(|t - s|)$.

It is well known (see e.g. Brockwell & Davis (1987), Theorem 4.8.2) that every univariate, zero-mean, covariance stationary discrete-time process has the following Cramér representation:

    X_t = \int_{-\pi}^{\pi} A(\omega) \exp(i \omega t)\, dZ(\omega), \quad t \in \mathbb{Z},    (2.15)

where $A(\omega)$ is the amplitude and $Z(\omega)$ is a stochastic process with orthonormal increments, i.e. $\mathbb{E}(dZ(\omega_1) \overline{dZ(\omega_2)}) = d\omega_1 \delta_{\omega_1 = \omega_2}$, where $\delta$ is the Kronecker delta. The parameter $\omega$ can be interpreted as frequency: $X_t$ is a weighted linear combination of Fourier exponentials oscillating at various frequencies. Correspondingly, under mild conditions (Brockwell & Davis (1987), Theorem 4.9.2), the covariance


function of $X_t$ can be expressed as

    \gamma_X(\tau) = \int_{-\pi}^{\pi} f_X(\omega) \exp(i \omega \tau)\, d\omega,    (2.16)

where $f_X(\omega) := |A(\omega)|^2$ is called the spectral density of $X_t$.

Below we list two of the most commonly used time series models. All the definitions given below hold for univariate, zero-mean processes.

ARMA models. ARMA (Autoregressive Moving Average) processes are arguably the most popular time series models used in the applied sciences. An ARMA(p, q) process $X_t$ is defined as

    X_t = \sum_{j=1}^{p} \phi_j X_{t-j} + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i},    (2.17)

where $\varepsilon_t$ is a sequence of independent or uncorrelated identically distributed random variables, often assumed Gaussian for tractability. Stationarity of an ARMA(p, q) process is guaranteed by the condition that the polynomial

    \phi(z) = 1 - \phi_1 z - \ldots - \phi_p z^p    (2.18)

has no roots in the closed unit disk; e.g. $X_t = 0.9 X_{t-1} + \varepsilon_t$ is stationary, but $X_t = X_{t-1} + \varepsilon_t$ is not. The spectral density of $X_t$ is given by

    f_X(\omega) = \frac{\sigma^2}{2\pi} \left| \frac{\theta(e^{-i\omega})}{\phi(e^{-i\omega})} \right|^2,    (2.19)

where $\theta(z) = 1 + \theta_1 z + \ldots + \theta_q z^q$ and $\sigma^2 = \mathrm{Var}(\varepsilon_t)$. ARMA(p, q) is an example of a so-called linear time series model, in which $X_t$ and its innovations $\varepsilon_t$ are related by a linear mapping.

GARCH models. Several authors have studied financial log-return series, i.e. time series of the form $X_t = \log(P_t / P_{t-1})$, where $P_t$ is a share price, a stock index, or a currency exchange rate. It has been empirically observed that most financial log-returns display the following features: the sample mean is close to zero; the marginal distribution is heavy-tailed; the sample autocorrelations of $X_t$ are mostly insignificant, but those of $|X_t|^p$ decay only very slowly;


finally, there are "bursts" of high volatility (standard deviation) among periods of low volatility. These "stylised facts" imply that financial log-returns cannot be modelled as stationary linear processes: to preserve stationarity, various non-linear models have been proposed. The Autoregressive Conditionally Heteroscedastic (ARCH) model was proposed by Engle (1982), and Generalised ARCH (GARCH), its most popular extension, was proposed independently by Bollerslev (1986) and Taylor (1986). The Stochastic Volatility (SV) model was suggested by Taylor (1986) as an alternative to ARCH-type modelling. These two families of models are by far the most widely used in practice, and there is a massive literature on both of them; Cox et al. (1996) and Maddala & Rao (1996) are two recommended monographs.

The zero-mean GARCH(p, q) model is specified as

    X_t = \sigma_t Z_t, \quad t \in \mathbb{Z},    (2.20)
    \sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i X_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2,

where $Z_t$ is symmetric i.i.d. with variance one and $\alpha_i, \beta_j \ge 0$. In other words, the current standard deviation is a linear deterministic function of the past squared returns and/or the past values of the variance. By contrast, in the SV framework, the current variance is modelled as a stochastic function of the past returns.

Strict-sense stationarity of a GARCH(p, q) process is guaranteed by the well-known conditions that $\alpha_0 > 0$ and

    \sum_{i=1}^{p} \alpha_i + \sum_{j=1}^{q} \beta_j < 1;    (2.21)

see Bougerol & Picard (1992). Davidson (2003) reviews some recent extensions of the ARCH model, analyses their moment and memory properties, and proposes a new model.
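The GARCH(1,1) case of (2.20)-(2.21) is easy to explore by simulation. The Python sketch below is my own illustration (the parameter values are arbitrary, chosen so that the stationarity condition (2.21) holds and the unconditional variance $\alpha_0/(1 - \alpha_1 - \beta_1)$ equals one); it reproduces two of the stylised facts mentioned above:

```python
import numpy as np

def simulate_garch11(n, alpha0, alpha1, beta1, burn=1000, seed=0):
    """Simulate a zero-mean GARCH(1,1) series as in (2.20), with Gaussian Z_t."""
    assert alpha0 > 0 and alpha1 >= 0 and beta1 >= 0
    assert alpha1 + beta1 < 1, "stationarity condition (2.21) violated"
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n + burn)
    x = np.empty(n + burn)
    sig2 = alpha0 / (1.0 - alpha1 - beta1)   # start at the stationary variance
    for t in range(n + burn):
        x[t] = np.sqrt(sig2) * z[t]
        sig2 = alpha0 + alpha1 * x[t] ** 2 + beta1 * sig2
    return x[burn:]

# alpha0 / (1 - alpha1 - beta1) = 1, so the unconditional variance is 1 here.
x = simulate_garch11(100_000, alpha0=0.1, alpha1=0.1, beta1=0.8)
print(np.var(x))                                   # close to 1
print(np.corrcoef(x[:-1], x[1:])[0, 1])            # near 0: returns uncorrelated
print(np.corrcoef(x[:-1] ** 2, x[1:] ** 2)[0, 1])  # positive: squares correlated
```

The contrast between the last two printed values, negligible autocorrelation of the returns but clearly positive autocorrelation of their squares, is exactly the nonlinearity that rules out stationary linear models for such data.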


2.2.2 Evolutionary spectral theory

Time series which cannot be modelled well as stationary processes arise in several fields, e.g. biomedical TSA (Nason et al. (2000)) or geophysics (Sakiyama (2002)). Also in finance, several authors agree that stationary nonlinear processes cannot account well for some empirical characteristics of log-return data: see e.g. Mikosch & Starica (2003), Kokoszka & Leipus (2000) (who look at change point detection in ARCH models) or Härdle et al. (2000) (who introduce a time-varying SV model and look at the adaptive estimation of its parameters).

In this section, we review some of those time series models which assume that the process under consideration can be "well approximated", in some sense, by a stationary model over a short stretch of time. An appropriate name for this concept would be "local stationarity"; however, this term has already been reserved for a subclass of processes possessing this characteristic, so we avoid using it at this stage. We restrict ourselves to linear models and refer the reader interested in non-stationary nonlinear models to the articles on financial time series listed above.

Piecewise stationarity, possibly the simplest departure from stationarity, was considered e.g. by Ombao et al. (2001a), who attempted to find "optimal" stretches of stationarity in the series in a data-driven way. Several other approaches assume a smoother evolution of the second-order structure. Here, two subgroups can be distinguished:

- time-domain approaches, which allow the coefficients of a parametric model, e.g. AR, to vary slowly with time: Mélard & Herteleer-De Schutter (1989), Dahlhaus et al. (1999), Grillenzoni (2000);
- frequency-domain approaches, which control the evolution of frequency-dependent quantities over time: Priestley (1965), Battaglia (1979), Dahlhaus (1997), Mallat et al. (1998), Swift (2000), Ombao et al. (2002).
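As a concrete illustration of the simplest of these departures from stationarity, a piecewise stationary series can be obtained by switching the coefficient of an AR(1) recursion at fixed change points. The Python sketch below is my own illustration (the regimes, segment lengths and parameter values are arbitrary); the local lag-1 autocorrelation changes abruptly across segments while the series is stationary within each stretch:

```python
import numpy as np

def piecewise_ar1(T, phis, sigma=1.0, seed=0):
    """Simulate a piecewise stationary series: an AR(1) recursion whose
    coefficient switches at equally spaced change points (one regime per
    entry of phis)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(T)
    seg = T // len(phis)
    for t in range(1, T):
        phi = phis[min(t // seg, len(phis) - 1)]
        x[t] = phi * x[t - 1] + sigma * rng.standard_normal()
    return x

# Three stationary stretches with markedly different local autocorrelation.
x = piecewise_ar1(3000, phis=[0.9, 0.0, -0.9])
lag1 = [float(np.corrcoef(x[1000 * i : 1000 * (i + 1) - 1],
                          x[1000 * i + 1 : 1000 * (i + 1)])[0, 1])
        for i in range(3)]
print([round(r, 2) for r in lag1])   # roughly [0.9, 0.0, -0.9]
```

A global (stationary) fit would average these three regimes away, which is precisely the motivation for the segmentation approach of Ombao et al. (2001a).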


Dahlhaus (1996a) introduces the important concept of rescaled time into the analysis of non-stationary time series. In his class of Locally Stationary Fourier (LSF) processes, $X$ is modelled as a triangular stochastic array $\{X_{t,T}\}_{t=1}^{T}$, $T = 1, 2, \ldots$, such that

    X_{t,T} = \int_{-\pi}^{\pi} \exp(i \omega t)\, A^0_{t,T}(\omega)\, dZ(\omega),    (2.22)

and there exists a function $A : [0, 1] \times (-\pi, \pi] \to \mathbb{C}$, continuous in the first argument, such that

    \sup_{t,\omega} \left| A^0_{t,T}(\omega) - A\!\left( \frac{t}{T}, \omega \right) \right| \le \frac{K}{T} \quad \forall T    (2.23)

(see the paper for a complete definition). The time-rescaling in (2.23) is reminiscent of nonparametric regression, where the function of interest is also defined on a finite interval and possesses some degree of regularity, thus enabling asymptotic considerations of e.g. the consistency of the estimation procedure. In Dahlhaus' approach, the longer the stretch of the series, the finer the grid $t/T$, and therefore the more information is gathered about $A(u, \omega)$ and about the evolutionary spectral density, defined as $f(u, \omega) := |A(u, \omega)|^2$. Kim (1998) provides various statistical analyses of financial and macroeconomic data in the LSF framework.

The approach of Nason et al. (2000), which also adopts the rescaled time concept, will be discussed in detail in Section 2.2.4.

2.2.3 Wavelets and time series

Wavelets, due to their attractive properties listed in Section 2.1.6, have been used extensively in TSA. Reviews of wavelet methods in time series forecasting and of wavelet smoothing in TSA appear in separate sections (Sections 2.2.5 and 2.3.2, respectively). The paper by Nason & von Sachs (1999) reviews the use of wavelets in TSA, and the comprehensive monograph of Percival & Walden (2000) covers, among other topics, wavelet analysis of long memory processes (see also Vidakovic (1999), Section 9.5).


Several authors use wavelets in hypothesis testing in TSA: for example, Neumann & von Sachs (2000) propose a test for time series stationarity, Lee & Hong (2001) construct a test for serial correlation, and Whitcher et al. (2002) propose a test for variance homogeneity in long memory processes. Chiann & Morettin (1999) define the wavelet periodogram for stationary processes as a sequence of squared wavelet coefficients of the process; they also analyse some of its properties. Even though this wavelet-based analysis provides useful insight into the data, it is the classical Fourier analysis that can be shown to be "optimal" for stationary processes; see e.g. Priestley (1965). Nason & Sapatinas (2002) use wavelet packets to model a transfer function between two nonstationary time series. Wang et al. (2001) and Audit et al. (2002), among others, use wavelets to estimate the scaling exponent in self-similar processes. Bilen & Huzurbazar (2002) propose a model-free method for detecting outliers in time series data using wavelets, Wong et al. (2001) use wavelets to detect jumps, and Li & Xie (1997) use them to detect hidden periodicities. Walden & Serroukh (2002) construct multi-resolution filters for the analysis of matrix-valued time series. Whitcher (2001) proposes a method, based on wavelet packets, for simulating Gaussian processes with unbounded spectra. Serroukh et al. (2000) investigate time-scale properties of time series in various models by estimating the variance of non-decimated wavelet coefficients (the so-called "wavelet variance") at different scales. Rao & Indukumar (1996) look at higher order moments of wavelet transforms of nonlinear signals.

In financial time series, Hong & Lee (2001) develop a test for ARCH effects using a wavelet estimator of the spectral density of the squared residuals at frequency zero. Struzik (2001) uses wavelets to examine the scaling properties of the S&P index, and Gençay et al. (2001) those of foreign exchange volatility. Ramsey (1999) and Ramsey (2002) review the use of wavelet analysis in finance and economics. Gençay et al. (2001) is an introductory monograph on wavelet methods in finance and economics.


2.2.4 The Locally Stationary Wavelet model

We now move on to describe the time series model whose various aspects are studied in Chapters 3, 4 and 5 of this thesis. The Locally Stationary Wavelet (LSW) model, due to Nason et al. (2000), is based on two main ingredients:

- following Dahlhaus (1996a), it adopts the rescaled time principle;
- it replaces the representation with respect to the Fourier basis by a representation with respect to non-decimated discrete wavelets.

Before defining the LSW model, we first define compactly supported discrete wavelet vectors. In what follows, $j = -1$ denotes the finest scale, $j = -2$ is the second finest scale, etc. Following Nason et al. (2000), we define the discrete wavelet vectors associated with the filters $\{h_k\}$, $\{g_k\}$ as $\psi_j = (\psi_{j,0}, \ldots, \psi_{j,L_j-1})$, where

    \psi_{-1,n} = g_n,
    \psi_{j,n} = \sum_k h_{n-2k}\, \psi_{j+1,k} \quad \text{for } j < -1,
    L_j = (2^{-j} - 1)(N_h - 1) + 1,
    N_h = \#\{k : h_k \ne 0\}.

For example,

    \psi^H_{j,n} = 2^{j/2} \left( I_{\{0 \le n \le 2^{-j-1} - 1\}} - I_{\{2^{-j-1} \le n \le 2^{-j} - 1\}} \right).    (2.24)

The nondecimated collection $\psi_{j,k}(t)$ of discrete wavelet vectors is formed by shifting the vectors $\psi_j$ to all integer locations $k$:

    \psi_{j,k}(t) := \psi_{j,t-k}.    (2.25)

We are now in a position to define the LSW model.

Definition 2.2.1 (Nason et al. (2000)) A triangular stochastic array $\{X_{t,T}\}_{t=0}^{T-1}$, for $T = 1, 2, \ldots$, is in the class of LSW processes if there exists a mean-square representation

    X_{t,T} = \sum_{j=-J(T)}^{-1} \sum_{k=-\infty}^{\infty} \omega_{j,k;T}\, \psi_{j,k}(t)\, \xi_{j,k},    (2.26)


where $\psi_{j,k}(t)$ are nondecimated discrete wavelet vectors, $\omega_{j,k;T}$ are real constants, $J(T) = -\min\{j : L_j \le T\}$, and $\{\xi_{j,k}\}_{j,k}$ are zero-mean orthonormal identically distributed random variables. Also, we assume that for each $j \le -1$, there exists a Lipschitz function $W_j(z) : [0, 1] \to \mathbb{R}$ such that

- $\sum_{j=-\infty}^{-1} |W_j|^2 < \infty$;
- the Lipschitz constants $L_j$ satisfy

    \sum_{j=-\infty}^{-1} 2^{-j} L_j < \infty;    (2.27)

- there exists a sequence of constants $C_j$ satisfying $\sum_{j=-\infty}^{-1} C_j < \infty$ such that, for each $T$,

    \sup_{k=0,\ldots,T-1} |\omega_{j,k;T} - W_j(k/T)| \le C_j / T \quad \text{for } j = -1, \ldots, -J(T).    (2.28)

The representation (2.26) can be viewed as a "wavelet counterpart" of the classical Cramér representation (2.15). As wavelets are parametrised by scale $j$ and location $k$, the integration over frequencies in (2.15) is replaced by the summation over $j$ and $k$ in (2.26). Thus, the representation becomes naturally location-dependent (or, in this case, time-dependent).

As in the classical theory, $\omega^2_{j,k;T}$ (the square of the amplitude, or the transfer function) constitutes a "wavelet spectrum" which measures the power of the series at scale $j$ and location $k$. Our aim will often be to make inference on this quantity; however, if $\omega^2_{j,k;T}$ depends on $j, k$ in an arbitrary fashion, there is no hope of estimating it accurately: note that we only observe a single row of the triangular array $X_{t,T}$, and there are of order $J(T) \times T = O(T \log(T))$ parameters $\omega^2_{j,k;T}$ to be estimated. Clearly, we cannot do a good job here unless we control the evolution of this sequence, and this is where the rescaled time property (2.28) comes into play. It ensures that, for each $j$, the sequence $\{\omega_{j,k;T}\}_{k=0}^{T-1}$ evolves slowly, by requiring that it should be "close" to a sequence formed by sampling a regular (here, Lipschitz continuous) function $W_j(z)$ on a finer and finer grid. This idea, adopted from Dahlhaus (1996a), embeds inference in the LSW framework into the nonparametric regression setting. Note that, unlike in the classical setting, $T \to \infty$


2.2. Time series analysis

does not mean that more and more future observations arrive; the rows of $X_{t;T}$ are completely different stochastic processes, only linked to each other by the fact that they possess the same asymptotic transfer function $W_j(z)$.

The asymptotic evolutionary wavelet spectrum $S_j(z)$ is defined in rescaled time as
\[ S_j(z) = W_j(z)^2 = \lim_{T\to\infty} \omega^2_{j,\lfloor zT\rfloor;T}. \tag{2.29} \]
In the classical theory, the spectral density and the covariance function are Fourier transforms of each other; see formula (2.16). It is possible to establish an analogous relationship here. Let $c_T(z,\tau)$ denote the finite-sample covariance function of $X_{t;T}$ at lag $\tau$ and rescaled-time location $z$:
\[ c_T(z,\tau) = \mathbb{E}\big(X_{\lfloor zT\rfloor;T}\, X_{\lfloor zT\rfloor+\tau;T}\big). \tag{2.30} \]
Further, let us recall the definition of the autocorrelation wavelets $\Psi_j$ from Nason et al. (2000):
\[ \Psi_j(\tau) = \sum_k \psi_{j,k}\,\psi_{j,k+\tau}. \tag{2.31} \]
The system $\{\Psi_j\}_j$ is linearly independent; see Nason et al. (2000), Theorem 2.13. Some other properties of $\{\Psi_j\}_j$ will be discussed in Chapter 3. Let $c(z,\tau)$ denote the asymptotic local covariance function of $X_{t;T}$ at lag $\tau$ and rescaled-time location $z$, defined as a transform of $\{S_j(z)\}_j$ with respect to the set of autocorrelation wavelets:
\[ c(z,\tau) = \sum_{j=-\infty}^{-1} S_j(z)\,\Psi_j(\tau). \tag{2.32} \]
We quote the following result.

Theorem 2.2.1 (Nason et al. (2000)) Under the assumptions of Definition 2.2.1, $\|c - c_T\|_{L_\infty} = O(T^{-1})$.

Therefore, the asymptotic local covariance $c$ is a good approximation to the finite-sample covariance $c_T$. Formula (2.32) provides a multiscale decomposition of the


covariance structure of $X_{t;T}$. As $\Psi_j(0) = 1$ for all $j$, the local variance decomposes as
\[ \sigma^2(z) := c(z,0) = \sum_j S_j(z). \tag{2.33} \]
Also, the representation (2.32) is invertible: denoting
\[ A_{i,j} = \sum_\tau \Psi_i(\tau)\,\Psi_j(\tau), \tag{2.34} \]
we obtain
\[ S_j(z) = \sum_\tau \left( \sum_i \Psi_i(\tau)\, A^{-1}_{i,j} \right) c(z,\tau) \tag{2.35} \]
(see Nason et al. (2000), Theorem 2.15, for the proof of the invertibility of $A$). Proposition 2.17 of Nason et al. (2000) states that all stationary processes with absolutely summable covariance are LSW processes; for them, the spectrum $S_j$ does not depend on the rescaled time $z$.

One way of performing inference on time-varying second-order quantities in the LSW framework is by using the wavelet periodogram, defined below.

Definition 2.2.2 (Nason et al. (2000)) Let $X_{t;T}$ be an LSW process constructed using the wavelet system $\psi$. The triangular stochastic array
\[ I^{(j)}_{t;T} = \Big| \sum_s X_{s;T}\,\psi_{j,t-s} \Big|^2 \tag{2.36} \]
is called the wavelet periodogram of $X_{t;T}$ at scale $j$.

In practice, the wavelet periodogram is not computed separately for each scale $j$; instead, we compute the full NDWT transform of the observed row of $X_{t;T}$ with periodic boundary conditions, and then square the wavelet coefficients to obtain $I^{(j)}_{t;T}$ for $t = 0,\dots,T-1$ and $j = -1,-2,\dots,-J(T)$.

We quote the following result.

Proposition 2.2.1 (Nason et al. (2000)) We have
\[ \mathbb{E}\, I^{(j)}_{t;T} = \sum_{i=-\infty}^{-1} S_i\!\left(\frac{t}{T}\right) A_{i,j} + O(2^{-j}/T). \tag{2.37} \]
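The computation of the wavelet periodogram just described (non-decimated transform with periodic boundary conditions, then squaring) can be sketched as follows. This is an illustrative implementation, not the one used in the thesis: it uses Haar filters, a brute-force circular convolution, and a particular time-reversal convention for the filter.

```python
import numpy as np

def haar_filter(s):
    # Discrete Haar wavelet filter at scale j = -s, normalised to unit norm.
    L = 2 ** s
    return np.r_[np.ones(L // 2), -np.ones(L // 2)] / np.sqrt(L)

def haar_wavelet_periodogram(x, n_scales):
    # Squared coefficients of the non-decimated Haar transform of x, taken
    # with periodic boundary conditions; row s-1 holds I^{(-s)}_{t,T}.
    T = len(x)
    I = np.empty((n_scales, T))
    for s in range(1, n_scales + 1):
        h = haar_filter(s)
        coeffs = np.array([np.dot(np.roll(x, -t)[: len(h)], h) for t in range(T)])
        I[s - 1] = coeffs ** 2
    return I

rng = np.random.default_rng(0)
x = rng.standard_normal(256)          # white noise as a simple test input
I = haar_wavelet_periodogram(x, 3)
```

For unit-variance white noise, each squared coefficient has expectation $\|h\|^2 = 1$, so the raw periodogram fluctuates around 1 at every scale, illustrating the inconsistency (but asymptotic unbiasedness) noted below (2.37).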


If, in addition, $X_{t;T}$ is Gaussian, then
\[ \operatorname{Var}\big(I^{(j)}_{t;T}\big) = 2\left( \sum_{i=-\infty}^{-1} S_i\!\left(\frac{t}{T}\right) A_{i,j} \right)^2 + O(2^{-j}/T). \tag{2.38} \]
Formulas (2.37) and (2.38) imply that the wavelet periodogram is an inconsistent but asymptotically unbiased estimator of a quantity which is a linear transform of the wavelet spectrum. By (2.37), an estimate of $S_j(z)$ can be obtained by setting $\hat S_j(z) = \sum_{i=-1}^{-J(T)} I^{(i)}_{\lfloor zT\rfloor;T}\, A^{-1}_{i,j}$. Some properties of this estimator are analysed in Nason et al. (2000).

Figure 2.1 shows an example of an LSW process whose spectrum is only non-zero at scales $-1$ and $-3$. $S_{-1}(z)$ and $S_{-3}(z)$ (bottom plot) are chosen in such a way that the local variance $c(z,0) = \sigma^2(z)$ is independent of $z$, but $c(z,\tau)$ varies with $z$ for $\tau = 1, 2, \dots, 7$. The top plot shows a sample path of length 512 simulated from this spectrum using Haar wavelets and Gaussian innovations. It is visibly non-stationary: the series oscillates more rapidly over the time intervals where the finer-scale spectrum $S_{-1}(z)$ dominates.

In the original paper by Nason et al. (2000), the authors apply the LSW model to a biomedical time series. In Chapter 4 of this thesis, we demonstrate the usefulness of LSW modelling by considering various analyses of financial log-return data in the LSW framework.

2.2.5 Forecasting

Forecasting the future behaviour of time series is, along with understanding the data-generating mechanism, one of the main aims of TSA, and two journals, the Journal of Forecasting and the International Journal of Forecasting, publish articles devoted exclusively to this important area. Having observed $X_1, \dots, X_t$, the quantity of interest to the analyst is often the predictor $\hat X_{t+h}$ of $X_{t+h}$ ($h > 0$) which minimises the Mean-Square Prediction Error (MSPE):
\[ \mathrm{MSPE}(\hat X_{t+h}, X_{t+h}) = \mathbb{E}\,(\hat X_{t+h} - X_{t+h})^2. \tag{2.39} \]
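For a zero-mean Gaussian series, the MSPE-optimal predictor is linear in the observations, with coefficients solving a system of covariance ("Yule-Walker"-type) equations. A minimal sketch for the stationary case, where the autocovariance is known in closed form, is given below; the AR(1) autocovariance used as the example is an illustrative choice, not part of the thesis.

```python
import numpy as np

def best_linear_predictor(acvf, t, h):
    # Solve Cov(X_n, X_{t+h}) = sum_i a_i Cov(X_n, X_i), n = 1..t, for the
    # coefficients a_i of the MSPE-optimal linear h-step predictor of a
    # stationary zero-mean process with autocovariance function acvf(lag).
    Sigma = np.array([[acvf(abs(n - i)) for i in range(t)] for n in range(t)])
    rhs = np.array([acvf(t + h - 1 - n) for n in range(t)])  # 0-based indexing
    return np.linalg.solve(Sigma, rhs)

# Example: AR(1) with coefficient phi has acvf(tau) = phi^|tau| / (1 - phi^2).
phi = 0.6
a = best_linear_predictor(lambda tau: phi ** abs(tau) / (1 - phi ** 2), t=5, h=1)
```

For an AR(1), the one-step predictor is $\phi X_t$, so all the weight falls on the most recent observation, which the sketch reproduces.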


[Figure 2.1 appears here: "Example of an LSW process". Top panel: a sample path plotted against time 0 to 500. Bottom panel: the spectrum plotted against Rescaled Time (0.0 to 1.0), with Scale on the y-axis.]
Figure 2.1: Bottom plot: spectrum of an exemplary LSW process plotted against the rescaled time. The y-axis shows the negative scale $-j$. The spectrum is only non-zero at scales $-1$ and $-3$. Top plot: a sample path of length 512 simulated from this spectrum using Haar wavelets and Gaussian innovations.
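A sample path of the kind shown in Figure 2.1 can be simulated directly from the LSW representation: for each scale, Gaussian innovations are scaled by $\sqrt{S_j(k/T)}$ and convolved with the non-decimated Haar filter. The sketch below is illustrative only; the two-scale spectrum is a hypothetical example (not the one in the figure), and circular convolution stands in for the boundary treatment.

```python
import numpy as np

def haar_filter(s):
    # Discrete Haar wavelet filter at scale j = -s, unit norm.
    L = 2 ** s
    return np.r_[np.ones(L // 2), -np.ones(L // 2)] / np.sqrt(L)

def simulate_lsw(S, T, rng):
    # X_t = sum_j sum_k sqrt(S_j(k/T)) * xi_{j,k} * psi_j(t - k), implemented
    # scale by scale via FFT-based circular convolution.
    x = np.zeros(T)
    for s, spec in S.items():
        h = haar_filter(s)
        amp = np.sqrt(spec) * rng.standard_normal(T)   # sqrt(S_j(k/T)) * xi_{j,k}
        x += np.real(np.fft.ifft(np.fft.fft(amp) * np.fft.fft(h, T)))
    return x

rng = np.random.default_rng(1)
T = 512
z = np.arange(T) / T
S = {1: np.where(z < 0.5, 0.0, 1.0),   # S_{-1}: active on the second half
     3: np.where(z < 0.5, 1.0, 0.0)}   # S_{-3}: active on the first half
x = simulate_lsw(S, T, rng)
```

Since $\Psi_j(0) = 1$, the local variance of the simulated path is $\sum_j S_j(z) = 1$ throughout, while the path oscillates faster where the fine-scale component $S_{-1}$ dominates, mirroring the behaviour described in the figure caption.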


(2.39) is minimised by
\[ \hat X_{t+h} = \mathbb{E}\,(X_{t+h} \mid X_1, \dots, X_t) \tag{2.40} \]
(see Brockwell & Davis (1987), Section 2.7). For Gaussian time series, $\hat X_{t+h}$ can be expressed as a linear combination of the past observations:
\[ \hat X_{t+h} = \sum_{i=1}^{t} a_i X_i, \tag{2.41} \]
where the $a_i$ solve the so-called Yule-Walker equations
\[ \operatorname{Cov}(X_n, X_{t+h}) = \sum_{i=1}^{t} a_i \operatorname{Cov}(X_n, X_i), \quad n = 1, \dots, t, \tag{2.42} \]
which take a particularly simple form when $X_t$ is stationary. GARCH models are used to forecast the future volatility $\sigma_{t+h}$ and not $X_{t+h}$ itself: note that in the GARCH model specified by (2.20), the best mean-square predictor of $X_{t+h}$, given by (2.40), is simply zero. See Bera & Higgins (1993) for a discussion of forecasting in ARCH-type models.

For non-stationary Gaussian models, various more sophisticated forecasting techniques have been developed. Kalman filtering (see e.g. Chatfield (1996), Chapter 10) updates the parameters of the model as new observations arrive and can be used to produce forecasts. As well as being computationally fast, it exhibits fast convergence when the underlying model is stationary, but is also able to trace the evolution of non-stationary models. Bayesian forecasting (West & Harrison (1997)) also exploits the principle of "parameter updating". Methods based on neural networks are often applied to the forecasting of non-linear time series, especially in the engineering literature; see e.g. Zhang et al. (2001b). Several other methods exist: the recent monograph by Chatfield (2000) provides a comprehensive overview.

Wavelets have often been used in time series forecasting in conjunction with neural network methods (Geva (1998), Milidiu et al. (1999), Hee et al. (2002), Soltani (2002)). Combined wavelet and neural network techniques were used to


forecast electricity demand data (Zhang & Dong (2001)), financial time series (Zhang et al. (2001a)) and web traffic (Aussem & Murtagh (2001)). Some authors have considered forecasting based on wavelet methods but not supplemented with neural networks. The forecasting method proposed by Wong et al. (2003) relies on the decomposition of the time series using wavelets into three summands: trend, harmonic and irregular components. Li & Hinich (2002) use wavelets (and other filter banks) to forecast seasonal patterns. Zheng et al. (2001) apply their SVH-ARMA (state-dependent vector hybrid ARMA) technique to the forecasting of vector time series constructed by taking the DWT of scalar time series. Masuda & Okabe (2001) base their forecasting technique on the multiscale decomposition of a time series. The method of Soltani et al. (2000) exploits the decorrelating property of wavelets to forecast long-memory processes. Zheng et al. (2000) combine wavelets and Kalman filtering by modelling wavelet coefficients as state variables for the Kalman filter. Ikeda & Tokinaga (1999) use wavelets to forecast fractal time series.

In Chapter 3 of this thesis, we consider several theoretical and practical aspects of forecasting the LSW processes reviewed in Section 2.2.4 above.

2.3 Nonparametric regression

In this section, we consider the problem of estimating a function $f : [0,1] \to \mathbb{R}$ from noisy observations $y_i$ on an equispaced grid:
\[ y_i = f(i/n) + \epsilon_i, \quad i = 1, \dots, n, \tag{2.43} \]
where the $\epsilon_i$'s (the "noise") are random variables with $\mathbb{E}(\epsilon_i) = 0$. Denoting the estimator by $\hat f : [0,1] \to \mathbb{R}$, we are often only interested in the values of $\hat f$ on $\{i/n\}_{i=1}^n$. The performance of $\hat f$ is often measured by the Mean-Square Error (MSE):
\[ \mathrm{MSE}(\hat f, f) = \frac{1}{n}\, \mathbb{E}\, \|\hat f - f\|^2_{l_2}. \tag{2.44} \]
Various subclasses of the problem can be identified, depending on the joint distribution of $\{\epsilon_i\}_{i=1}^n$ and on the smoothness of $f$. Linear methods produce an


estimate $\hat f(i/n)$ by taking a linear transform of the observations: $\hat f(i/n) = By$, where $B$ is a square matrix and $y = (y_1, \dots, y_n)'$. Linear methods can often be shown to be optimal in terms of MSE if the underlying function $f$ is smooth. For example, a linear method based on natural cubic splines is optimal for twice-differentiable functions in the sense that the estimator minimises the penalised sum of squares
\[ S(f) = \sum_{i=1}^n \big(y_i - f(i/n)\big)^2 + \lambda \int_0^1 \big(f''(x)\big)^2\,dx, \tag{2.45} \]
where the penalty term controls the "roughness" of $f$ (see Green & Silverman (1994)). For reviews of other nonparametric linear methods, including kernel smoothing, see the monographs of Simonoff (1996) and Wand & Jones (1994).

2.3.1 Non-linear wavelet smoothing

For less regular (e.g. discontinuous) functions, linear smoothing performs inadequately, and non-linear smoothing methods are needed. In a seminal paper, Donoho & Johnstone (1994) introduce the principle of a non-linear smoothing method called wavelet thresholding. First, the signal is transformed via the DWT to obtain $d_{j,k} = \theta_{j,k} + \epsilon_{j,k}$, where $d_{j,k}$ ($\theta_{j,k}$, $\epsilon_{j,k}$) is the DWT of $y_i$ ($f(i/n)$, $\epsilon_i$). Then, the $d_{j,k}$ are shrunk towards zero (with the threshold chosen in an appropriate manner), and finally the inverse DWT is taken to obtain an estimate of $f$. The rationale behind this principle is twofold:

- As the DWT is orthonormal, i.i.d. Gaussian noise in the time domain transforms into i.i.d. Gaussian noise in the wavelet domain;
- Due to the vanishing moments property, wavelet coefficients $\theta_{j,k}$ corresponding to the locations where the signal is smooth will be close to zero. On the other hand, those (hopefully few) corresponding to discontinuities or other irregularities will be significantly different from zero: the signal will be represented sparsely in the wavelet domain. Therefore, we can expect that


an appropriately chosen threshold will be able to accurately separate signal from noise.

Two thresholding rules have been particularly commonly used and well studied. For a given threshold $\lambda$, hard and soft thresholding shrink $d_{j,k}$ to
\[ d^h_{j,k} = d_{j,k}\,\mathbb{I}_{\{|d_{j,k}| > \lambda\}}, \qquad d^s_{j,k} = \operatorname{sgn}(d_{j,k})\big(|d_{j,k}| - \lambda\big)_+, \]
respectively. The threshold introduced in Donoho & Johnstone (1994) was the so-called universal threshold, $\lambda = \sigma\sqrt{2\log(n)}$. The authors show that the MSE of the soft thresholding estimator with the universal threshold is close (within a logarithmic factor) to the ideal risk one can achieve by "keeping" or "killing" the wavelet coefficients $d_{j,k}$ using knowledge of the underlying signal. At the same time, the universal threshold is an efficient noise suppressor, as described in Section 4.2 of their paper.

In another ground-breaking paper, Donoho & Johnstone (1995) consider a non-linear wavelet estimator with soft thresholding where the threshold selection procedure is based on Stein's shrinkage method for estimating the mean of multivariate normal variables. They consider the behaviour of the estimator over a range of so-called Besov spaces (see Triebel (1983)), which form an extremely rich collection of functions with various degrees of smoothness (for certain values of the space parameters, Besov spaces can be shown to contain other better-known function spaces, such as Hölder or Sobolev spaces, or the space of functions with bounded variation). The authors demonstrate that their estimator is simultaneously nearly minimax over a range of Besov balls, i.e. without knowing the regularity of the function, it nearly achieves the optimal rate of convergence which could be achieved if the regularity were known.

In most papers on the theory of non-linear wavelet estimation, it is assumed that the standard deviation $\sigma$ of the noise is known. In practice, it needs to be estimated from the data. For Gaussian data, the method recommended by several


authors (see e.g. Johnstone & Silverman (1997)) computes the scaled Median Absolute Deviation (MAD) of the sequence of wavelet coefficients at the finest resolution level, thereby ensuring robustness.

More recently, other thresholding rules have been proposed. Nason (1996) uses cross-validation as a means of selecting the threshold. Abramovich & Benjamini (1996) set up wavelet thresholding as a multiple hypothesis testing problem and propose an approach based on the so-called false discovery rate. Ogden & Parzen (1996) also adopt the hypothesis-testing point of view and use recursive likelihood ratio tests to determine the threshold. Johnstone & Silverman (1997) consider level-dependent universal thresholding for correlated Gaussian noise. Averkamp & Houdré (2003) extend the approach of Donoho & Johnstone (1994) to other noise distributions, such as the exponential, mixtures of normals or compactly supported distributions. Vanreas et al. (2002) consider stable wavelet transforms for denoising data observed on non-equispaced grids. Barber & Nason (2003) develop various thresholding procedures using complex-valued wavelets. Johnstone & Silverman (2003) propose an empirical Bayes approach to the threshold selection problem. Cai & Silverman (2001), among others, consider block thresholding: they propose a thresholding procedure whereby wavelet coefficients are considered in overlapping blocks, and the action performed on the coefficients in the middle of the block depends upon the data in the whole block.

Coifman & Donoho (1995) introduce translation-invariant denoising: the full NDWT transform of the data is taken, then the universal threshold is applied to all resulting wavelet coefficients, and then an inverse NDWT transform yields an estimate of the signal. As the NDWT is redundant, there are many possible ways of generating an inverse NDWT transform: the one proposed by the authors is equivalent to taking the average over all possible DWTs contained in the NDWT, corresponding to all possible circular shifts of the data set (hence the name "translation invariant").
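The basic pipeline of this subsection (DWT, thresholding with the universal threshold, inverse DWT, with $\sigma$ estimated by the scaled MAD of the finest-level coefficients) can be sketched as follows. The Haar transform and the piecewise-constant test signal are illustrative choices made here, not prescriptions from the thesis.

```python
import numpy as np

def haar_dwt(x):
    # Orthonormal Haar DWT; returns the coarsest scaling coefficient and
    # the detail vectors ordered coarse -> fine.  len(x) must be a power of 2.
    a, details = np.asarray(x, float), []
    while len(a) > 1:
        details.append((a[0::2] - a[1::2]) / np.sqrt(2.0))
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return a, details[::-1]

def haar_idwt(a, details):
    for d in details:                        # coarse -> fine
        up = np.empty(2 * len(a))
        up[0::2] = (a + d) / np.sqrt(2.0)
        up[1::2] = (a - d) / np.sqrt(2.0)
        a = up
    return a

def soft(d, lam):
    return np.sign(d) * np.maximum(np.abs(d) - lam, 0.0)

def hard(d, lam):
    return d * (np.abs(d) > lam)

def denoise(y, rule=soft):
    n = len(y)
    a, details = haar_dwt(y)
    # Robust noise estimate: scaled MAD of the finest-level coefficients.
    sigma = np.median(np.abs(details[-1])) / 0.6745
    lam = sigma * np.sqrt(2.0 * np.log(n))   # universal threshold
    return haar_idwt(a, [rule(d, lam) for d in details])

rng = np.random.default_rng(0)
f = np.repeat([0.0, 4.0, 1.0, 3.0], 256)     # piecewise-constant signal
y = f + rng.standard_normal(1024)
fhat = denoise(y)
```

Since the Haar wavelet represents piecewise-constant signals sparsely, most detail coefficients here carry pure noise and are removed by the threshold, so the estimate improves substantially on the raw observations.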


2.3.2 Wavelet shrinkage in time series analysis

Wavelet shrinkage has been used extensively in the time series context. Gao (1997) proposes an algorithm for wavelet smoothing of the log-periodogram, using the asymptotic normality of the wavelet coefficients at coarser scales, and adjusting the thresholds for non-normality at finer scales (in this case, the noise is asymptotically independent but not Gaussian). Neumann (1996) considers wavelet smoothing of a "tapered" periodogram for possibly non-Gaussian stationary time series, basing his choice of thresholds on asymptotic normality arguments. Neumann & von Sachs (1997) and von Sachs & Schneider (1996) propose thresholds for estimating the time-varying spectrum in Dahlhaus' locally stationary time series model. von Sachs & MacGibbon (2000) consider wavelet thresholding of signals contaminated with locally stationary noise. Nason et al. (2000) propose the following threshold for shrinking the wavelet coefficients $\tilde d_{i,k}$ of the wavelet periodogram $I^{(j)}_{t;T}$ of a Gaussian LSW process:
\[ \lambda_{i,k,j,T} = \sqrt{\operatorname{Var}\big(\tilde d_{i,k}\big)\log(T)}, \tag{2.46} \]
where a pre-estimate of each $\operatorname{Var}(\tilde d_{i,k})$ is required, which can potentially hamper the practical performance of the method. Also note that the threshold is independent of $j$. Cristan & Walden (2002) consider wavelet and wavelet packet smoothing of the (tapered and logged) periodogram, and conclude that the wavelet-based algorithm performs adequately and therefore the use of wavelet packets is not necessary (this article complements an earlier paper by Walden et al. (1998)). Truong & Patil (2001) derive the MSEs of wavelet-based estimators of density and autoregression functions in stationary time series which satisfy appropriate mixing conditions. Dahlhaus & Neumann (2001) use wavelet shrinkage to estimate a time-varying $p$-dimensional parameter of the spectral density function of a non-stationary process. Hoffmann (1999) proposes a wavelet thresholding estimator of the mean and conditional variance functions in a non-linear AR(1) model.

In Chapter 5 of this thesis, we propose a multiscale technique for denoising the


wavelet periodogram of a Gaussian LSW process.

2.3.3 Wavelet and multiscale methods for Poisson data

Some authors have also considered the problem of estimating the intensity of a Poisson process using a wavelet-based technique. The usual setting is as follows: the possibly inhomogeneous one-dimensional Poisson process is observed on the interval $[0, T)$ and discretised into a vector $v = (v_0, v_1, \dots, v_{N-1})$, where $v_n$ is the number of events falling into the interval $[nT/N, (n+1)T/N)$, and $N = 2^J$ is an integer power of two. Each $v_n$ can be thought of as coming from a Poisson distribution with an unknown parameter $\lambda_n$, which needs to be estimated. Note that in this case the "noise" $v_n - \mathbb{E}(v_n)$ is independent but not identically distributed. The approach proposed by Donoho (1993) consists in first preprocessing the data using Anscombe's (1948) square-root transformation, $A(v) = 2\sqrt{v + 3/8}$, so that the noise becomes approximately Gaussian. Then the analysis proceeds as if the noise were indeed Gaussian, yielding (after applying the inverse square-root transformation) an estimate of the intensity of the process.

Besbeas et al. (2004) report that the best-performing methods currently available in the literature are those based on translation-invariant multiscale Bayesian techniques, as described in Kolaczyk (1999a) and Timmermann & Nowak (1997, 1999). Kolaczyk (1999a) introduces a Bayesian multiscale algorithm to estimate the discretised intensity. However, rather than transforming the data using a wavelet transform, he considers recursive dyadic partitions, and places prior distributions at the nodes of the binary trees associated with these partitions. The Bayesian methods outperform the earlier techniques in Kolaczyk (1997, 1999b), Nowak & Baraniuk (1999) and also the recent technique of Antoniadis & Sapatinas (2001) (since the latter is equivalent to Nowak & Baraniuk (1999) for Poisson data). The article by Sardy et al. (2004) describes a computationally intensive $l_1$-penalised likelihood method which can be used for estimating Poisson intensities.

Other recent contributions to the field of wavelet-based intensity estimation


include Patil & Wood (2004), who concentrate on the theoretical MSE properties of wavelet intensity estimators, where the intensity is a random process rather than a deterministic function (or, after discretisation, a deterministic vector). Brillinger (1998) gives a brief overview of wavelet-based methodology in the analysis of point process data, and obtains an estimate of the autointensity function of the well-known California earthquake data.

In Chapter 6 of this thesis, we propose a multiscale method for estimating the discretised intensity function of an inhomogeneous one-dimensional Poisson process.
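Anscombe's square-root preprocessing described above is easily sketched: the transform maps Poisson counts with mean $\lambda$ to values with approximately unit variance, after which any Gaussian wavelet denoiser can be applied and the result mapped back. The naive algebraic inverse used below is one simple choice (bias-corrected inverses also exist), and the constant intensity $\lambda = 20$ is purely illustrative.

```python
import numpy as np

def anscombe(v):
    # Variance-stabilising transform: Poisson(lambda) counts map to values
    # that are approximately N(2*sqrt(lambda + 3/8), 1) for moderate lambda.
    return 2.0 * np.sqrt(v + 3.0 / 8.0)

def inverse_anscombe(a):
    # Naive algebraic inverse of the Anscombe transform.
    return (a / 2.0) ** 2 - 3.0 / 8.0

rng = np.random.default_rng(3)
v = rng.poisson(20.0, size=4096)
a = anscombe(v)
# After the transform the noise is roughly Gaussian with unit variance, so a
# Gaussian denoiser can be applied to `a` before applying inverse_anscombe.
```

In this example, the sample standard deviation of the transformed counts is close to 1, illustrating the variance stabilisation that makes the Gaussian machinery applicable.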



Chapter 3

Forecasting LSW processes

In this chapter, we consider several theoretical and practical aspects of forecasting Gaussian LSW processes. Some results of this chapter were used, in a modified form, in the article by P. Fryzlewicz, S. Van Bellegem and R. von Sachs (2003), "Forecasting non-stationary time series by wavelet process modelling" (Annals of the Institute of Statistical Mathematics, 55, 737-764). Throughout the thesis, this article will be referred to as Fryzlewicz et al. (2003). The results of the article which are not due to the author are only quoted in this chapter (without proofs), and their authorship is acknowledged.

The chapter is organised as follows. In Section 3.1, we investigate the minimisation of the approximate Mean-Square Prediction Error (MSPE) for a linear predictor in the LSW framework. The reason why approximate MSPE minimisation is preferred is that it involves the uniquely defined asymptotic wavelet spectrum $\{S_j(z)\}_j$, unlike the exact MSPE, which involves the unidentifiable finite-sample parameters $\omega_{j,k;T}$. In Section 3.2, we look in detail at the assumptions made in deriving the results of Section 3.1. We identify an assumption which we find overly restrictive and propose to circumvent the problem by introducing a modification to the LSW model (we call the new class of processes "LSW2"). In Section 3.3, we derive Kolmogorov's formula for the one-step prediction error in the LSW2 model. In Section 3.4, we investigate the behaviour of the (unsmoothed) local covariance estimator, used to estimate the entries of the approximate prediction matrix. We


find that the estimator is asymptotically unbiased but inconsistent, and thus needs to be smoothed. In Section 3.5, we propose an algorithm for choosing values of the nuisance parameters arising in the forecasting procedure (including the smoothing parameter for the covariance estimator). Finally, in Section 3.6, we demonstrate the performance of our forecasting algorithm on a time series of yearly values of the wind speed anomaly index in a specific region of the Pacific.

3.1 Forecasting by approximate MSPE minimisation

Assume that we have observed $X_{0;T}, X_{1;T}, \dots, X_{t-1;T}$ and want to predict $X_{t+h-1;T}$ for $h = 1, 2, \dots, T-t$. As we are only dealing with Gaussian LSW processes, it is legitimate to consider a linear $h$-step predictor
\[ \hat X_{t+h-1;T} = \sum_{s=0}^{t-1} b^{(h)}_{t-1-s;T}\, X_{s;T}, \tag{3.1} \]
where, ideally, we would like the coefficients $\{b_{s;T}\}_{s=0}^{t-1}$ to minimise the MSPE:
\begin{align}
\mathbb{E}\big(\hat X_{t+h-1;T} - X_{t+h-1;T}\big)^2
&= \mathbb{E}\left( \sum_{j=-1}^{-J(T)} \sum_{k\in\mathbb{Z}} \omega_{j,k;T} \left( \sum_{s=0}^{t-1} b^{(h)}_{t-1-s;T}\,\psi_{j,k-s} - \psi_{j,k-(t+h-1)} \right) \xi_{j,k} \right)^2 \notag \\
&= \sum_{j,k} \omega^2_{j,k;T} \left( \sum_{s=0}^{t-1} b^{(h)}_{t-1-s;T}\,\psi_{j,k-s} - \psi_{j,k-(t+h-1)} \right)^2 \notag \\
&= \sum_{m=0}^{T-1}\sum_{n=0}^{T-1} \tilde b^{(h)}_{m;T}\,\tilde b^{(h)}_{n;T} \sum_{j,k}\omega^2_{j,k;T}\,\psi_{j,k-m}\psi_{j,k-n}, \tag{3.2}
\end{align}
where
\[ \tilde b^{(h)}_{n;T} = \begin{cases} b^{(h)}_{t-1-n;T} & \text{for } n = 0,\dots,t-1, \\ -1 & \text{for } n = t-1+h, \\ 0 & \text{otherwise.} \end{cases} \]


Defining
\[ \tilde b^{(h)}_T = \big(\tilde b^{(h)}_{0;T}, \dots, \tilde b^{(h)}_{T-1;T}\big), \qquad (\Sigma_T)_{m,n} = \operatorname{Cov}(X_{m;T}, X_{n;T}) = \sum_{j,k} \omega^2_{j,k;T}\,\psi_{j,k-m}\psi_{j,k-n}, \]
we can write (3.2) as the quadratic form
\[ \mathbb{E}\big(\hat X_{t+h-1;T} - X_{t+h-1;T}\big)^2 = \tilde b^{(h)}_T\, \Sigma_T\, \big(\tilde b^{(h)}_T\big)'. \tag{3.3} \]
However, (3.3) involves the $\omega_{j,k;T}$'s which, as we said earlier, are non-identifiable (a given LSW process does not determine the sequence of $\omega_{j,k;T}$'s uniquely), and therefore cannot be estimated from the data. It is for this reason that in the LSW framework, it is both elegant and useful in practice to approximate quantities involving $\{\omega_{j,k;T}\}_{j,k}$ by ones involving $\{W_j(z)\}_j$, an approach adopted in the original paper by Nason et al. (2000). We shall now investigate the possibility of approximating (3.3) by $\tilde b^{(h)}_T B_T \big(\tilde b^{(h)}_T\big)'$, where
\[ (B_T)_{m,n} = \sum_{j=-\infty}^{-1} \sum_{k\in\mathbb{Z}} S_j\!\left(\frac{k}{T}\right) \psi_{j,k-m}\psi_{j,k-n}. \tag{3.4} \]
Note that both $\Sigma_T$ and $B_T$ are symmetric. We first show a result concerning the spectral norms of $B_T$ and its inverse. Denote
\[ \hat\psi_j(\omega) = \sum_n \psi_{j,n} \exp(i\omega n). \tag{3.5} \]

Lemma 3.1.1 Let $\|A\|$ denote the spectral norm of a square matrix $A$, and let $\overline S_j = \sup_z S_j(z)$ and $\underline S_j = \inf_z S_j(z)$. If
\[ \operatorname{ess\,sup}_\omega \sum_j \overline S_j\, \big|\hat\psi_j(\omega)\big|^2 < \infty, \tag{3.6} \]
then $\|B_T\|$ is bounded in $T$. Similarly, if
\[ \operatorname{ess\,inf}_\omega \sum_j \underline S_j\, \big|\hat\psi_j(\omega)\big|^2 > 0, \tag{3.7} \]
then $\|B_T^{-1}\|$ is bounded in $T$.
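To fix ideas, the matrix $B_T$ of (3.4) can be assembled numerically from a given spectrum. The sketch below uses Haar filters and a hypothetical two-scale spectrum (both illustrative choices, not part of the thesis), truncates the sum over $k$ to the lags where the filters are non-zero, and checks that the resulting matrix is well conditioned, in the spirit of Lemma 3.1.1.

```python
import numpy as np

def haar_filter(s):
    # Discrete Haar wavelet filter at scale j = -s, unit norm.
    L = 2 ** s
    return np.r_[np.ones(L // 2), -np.ones(L // 2)] / np.sqrt(L)

def build_B(S, T):
    # (B_T)_{m,n} = sum_j sum_k S_j(k/T) psi_{j,k-m} psi_{j,k-n}.
    # S maps scale s (wavelet scale j = -s) to a function z -> S_j(z);
    # S_j is evaluated slightly beyond z = 1 at the boundary, a simplification.
    B = np.zeros((T, T))
    for s, Sj in S.items():
        h = haar_filter(s)
        L = len(h)
        Psi = np.zeros((T, T + L))       # row m holds psi_{j,k-m} over k
        for m in range(T):
            Psi[m, m : m + L] = h
        weights = Sj(np.arange(T + L) / T)
        B += (Psi * weights) @ Psi.T
    return B

T = 64
S = {1: lambda z: 0.5 + 0.4 * np.sin(2 * np.pi * z),  # hypothetical S_{-1}
     2: lambda z: 0.3 * np.ones_like(z)}              # hypothetical S_{-2}
B = build_B(S, T)
cond = np.linalg.cond(B)   # finite and moderate: stable inversion
```

Because the chosen spectrum is bounded above and bounded away from zero, the matrix is symmetric positive definite with a moderate condition number, which is what makes the inversion of the prediction equations below numerically stable.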


Proof. As $B_T$ is nonnegative definite, we have
\[ \|B_T\| = \big\|B_T^{1/2}\big\|^2 = \sup_{\|x\|_2^2 = 1} x B_T x', \tag{3.8} \]
and
\[ \big\|B_T^{-1}\big\| = \big\|B_T^{-1/2}\big\|^2 = \sup_x \frac{x B_T^{-1} x'}{x x'} = \sup_x \frac{x x'}{x B_T x'} = \left( \inf_{\|x\|_2^2 = 1} x B_T x' \right)^{-1}. \tag{3.9} \]
It remains to investigate the behaviour of the quadratic form $x B_T x'$. Denote $\hat x(\omega) = \sum_n x_n \exp(i\omega n)$. Simple algebra gives
\begin{align}
x B_T x' &= \sum_{j,k} S_j(k/T)\left(\sum_n x_n\,\psi_{j,k-n}\right)^2 \notag \\
&\le \sum_j \overline S_j \sum_k \left(\sum_n x_n\,\psi_{j,k-n}\right)^2 \notag \\
&= \frac{1}{2\pi}\sum_j \overline S_j \int_{-\pi}^{\pi} |\hat x(\omega)|^2\, \big|\hat\psi_j(\omega)\big|^2\, d\omega \notag \\
&\le \operatorname{ess\,sup}_\omega \left\{ \sum_j \overline S_j\, \big|\hat\psi_j(\omega)\big|^2 \right\} \frac{1}{2\pi}\int_{-\pi}^{\pi} |\hat x(\omega)|^2\, d\omega \notag \\
&= \operatorname{ess\,sup}_\omega \left\{ \sum_j \overline S_j\, \big|\hat\psi_j(\omega)\big|^2 \right\} x x', \notag
\end{align}
which proves the first part of the Lemma. Similar steps (with obvious modifications) are used to prove the second part. $\Box$

Proposition 3.1.1 Let $\Sigma_T$ and $B_T$ arise from an LSW process satisfying
\[ T^{-1}\sum_{j=-J(T)}^{-1} C_j L_j = o_T(1), \tag{3.10} \]
\[ T \sum_{j=-\infty}^{-J(T)-1} \overline S_j = o_T(1), \tag{3.11} \]
and assumption (3.7). We have
\[ \tilde b^{(h)}_T\, \Sigma_T\, \big(\tilde b^{(h)}_T\big)' = \tilde b^{(h)}_T\, B_T\, \big(\tilde b^{(h)}_T\big)' \big(1 + o_T(1)\big). \tag{3.12} \]


Proof. We first consider an approximation by $\tilde b^{(h)}_T \tilde B_T \big(\tilde b^{(h)}_T\big)'$, where
\[ (\tilde B_T)_{m,n} = \sum_{j=-J(T)}^{-1}\sum_{k\in\mathbb{Z}} S_j\!\left(\frac{k}{T}\right)\psi_{j,k-m}\psi_{j,k-n}. \tag{3.13} \]
We have
\begin{align}
\tilde b^{(h)}_T\big(\tilde B_T - \Sigma_T\big)\big(\tilde b^{(h)}_T\big)'
&= \sum_{m,n}\sum_{j,k}\left(S_j\!\left(\frac{k}{T}\right) - \omega^2_{j,k;T}\right)\tilde b^{(h)}_{m;T}\tilde b^{(h)}_{n;T}\,\psi_{j,k-m}\psi_{j,k-n} \notag \\
&\le \sum_{m,n}\sum_{j,k}\left|S_j\!\left(\frac{k}{T}\right) - \omega^2_{j,k;T}\right|\, \big|\tilde b^{(h)}_{m;T}\tilde b^{(h)}_{n;T}\,\psi_{j,k-m}\psi_{j,k-n}\big|. \tag{3.14}
\end{align}
We know from Nason et al. (2000) that
\[ \left|S_j\!\left(\frac{k}{T}\right) - \omega^2_{j,k;T}\right| \le \frac{C_j}{T}. \tag{3.15} \]
Thus, continuing from (3.14), we obtain
\begin{align}
\tilde b^{(h)}_T\big(\tilde B_T - \Sigma_T\big)\big(\tilde b^{(h)}_T\big)'
&\le T^{-1}\sum_{m,n}\sum_{j,k} C_j\, \big|\tilde b^{(h)}_{m;T}\tilde b^{(h)}_{n;T}\,\psi_{j,k-m}\psi_{j,k-n}\big| \notag \\
&= T^{-1}\sum_j C_j\sum_k \left( \sum_n \big|\tilde b^{(h)}_{n;T}\,\mathbb{I}_{\{L_j > k-n \ge 0\}}\,\psi_{j,k-n}\big| \right)^2 \notag \\
&\le T^{-1}\sum_j C_j\sum_n \big(\tilde b^{(h)}_{n;T}\big)^2 \sum_k \mathbb{I}_{\{L_j > k-n \ge 0\}} \sum_m \psi^2_{j,k-m} \notag \\
&= T^{-1}\, \tilde b^{(h)}_T\big(\tilde b^{(h)}_T\big)' \sum_j C_j L_j, \notag
\end{align}
using the Cauchy inequality and the property that $\sum_k \psi^2_{j,k} = 1$. By assumption (3.10), we arrive at
\[ \tilde b^{(h)}_T\big(\tilde B_T - \Sigma_T\big)\big(\tilde b^{(h)}_T\big)' = \tilde b^{(h)}_T\big(\tilde b^{(h)}_T\big)'\, o_T(1). \tag{3.16} \]
Let us now turn to the approximation of $\tilde b^{(h)}_T \tilde B_T \big(\tilde b^{(h)}_T\big)'$ by $\tilde b^{(h)}_T B_T \big(\tilde b^{(h)}_T\big)'$. Denoting
\[ \hat b^{(h)}_T(\omega) = \sum_n \exp(i\omega n)\, \tilde b^{(h)}_{n;T}, \tag{3.17} \]


we have
\begin{align}
\tilde b^{(h)}_T\big(B_T - \tilde B_T\big)\big(\tilde b^{(h)}_T\big)'
&= \sum_{m,n}\sum_{j=-\infty}^{-J(T)-1}\sum_k S_j(k/T)\, \psi_{j,k-m}\psi_{j,k-n}\, \tilde b^{(h)}_{m;T}\tilde b^{(h)}_{n;T} \notag \\
&= \sum_{j=-\infty}^{-J(T)-1}\sum_k S_j(k/T)\left(\sum_n \tilde b^{(h)}_{n;T}\,\psi_{j,k-n}\right)^2 \notag \\
&\le \sum_{j=-\infty}^{-J(T)-1}\overline S_j \sum_k\left(\sum_n \tilde b^{(h)}_{n;T}\,\psi_{j,k-n}\right)^2 \notag \\
&= \frac{1}{2\pi}\sum_{j=-\infty}^{-J(T)-1}\overline S_j \int_{-\pi}^{\pi} \big|\hat b^{(h)}_T(\omega)\big|^2\, \big|\hat\psi_j(\omega)\big|^2\, d\omega \notag \\
&\le \sup_\omega \big|\hat b^{(h)}_T(\omega)\big|^2 \sum_{j=-\infty}^{-J(T)-1}\overline S_j\, \frac{1}{2\pi}\int_{-\pi}^{\pi} \big|\hat\psi_j(\omega)\big|^2\, d\omega \notag \\
&\le \big\|\tilde b^{(h)}_T\big\|^2_{l_1} \sum_{j=-\infty}^{-J(T)-1}\overline S_j \notag \\
&\le T\, \tilde b^{(h)}_T\big(\tilde b^{(h)}_T\big)' \sum_{j=-\infty}^{-J(T)-1}\overline S_j \notag \\
&= \tilde b^{(h)}_T\big(\tilde b^{(h)}_T\big)'\, o_T(1), \tag{3.18}
\end{align}
by assumption (3.11). Combining (3.16) and (3.18), we get
\[ \tilde b^{(h)}_T\big(B_T - \Sigma_T\big)\big(\tilde b^{(h)}_T\big)' = \tilde b^{(h)}_T\big(\tilde b^{(h)}_T\big)'\, o_T(1). \tag{3.19} \]
Noting that
\[ \tilde b^{(h)}_T\big(\tilde b^{(h)}_T\big)' \le \tilde b^{(h)}_T\, B_T\, \big(\tilde b^{(h)}_T\big)'\, \big\|B_T^{-1}\big\| \tag{3.20} \]
and using the second result of Lemma 3.1.1 completes the proof. $\Box$

Proposition 3.1.1 implies that there exists a sequence $d_T \downarrow 0$ such that
\[ \tilde b^{(h)}_T B_T\big(\tilde b^{(h)}_T\big)'(1 - d_T) \le \tilde b^{(h)}_T \Sigma_T\big(\tilde b^{(h)}_T\big)' \le \tilde b^{(h)}_T B_T\big(\tilde b^{(h)}_T\big)'(1 + d_T), \tag{3.21} \]
which in turn means that
\begin{align}
\inf \tilde b^{(h)}_T B_T\big(\tilde b^{(h)}_T\big)'(1 - d_T)
&\le \inf \tilde b^{(h)}_T \Sigma_T\big(\tilde b^{(h)}_T\big)' = \mathrm{MSPE}\big(\hat X_{t+h-1;T}, X_{t+h-1;T}\big) \notag \\
&\le \inf \tilde b^{(h)}_T B_T\big(\tilde b^{(h)}_T\big)'(1 + d_T), \notag
\end{align}


where the infimum has been taken w.r.t. $\tilde b^{(h)}_0, \dots, \tilde b^{(h)}_{t-1}$. Thus, finding the $h$-step prediction error is asymptotically equivalent to minimising $\tilde b^{(h)}_T B_T \big(\tilde b^{(h)}_T\big)'$. As in the classical (stationary) setting, the minimisation is performed by simple differentiation, yielding the system of prediction (or Yule-Walker) equations
\[ \sum_{n=0}^{t-1} \tilde b^{(h)}_n \sum_{j=-\infty}^{-1}\sum_{k\in\mathbb{Z}} S_j\!\left(\frac{k}{T}\right)\psi_{j,k-n}\psi_{j,k-m} = \sum_{j=-\infty}^{-1}\sum_{k\in\mathbb{Z}} S_j\!\left(\frac{k}{T}\right)\psi_{j,k-(t-1+h)}\psi_{j,k-m}, \tag{3.22} \]
for $m = 0, 1, \dots, t-1$. Let $B_{t;T}$ denote the matrix of this system. By a standard result in numerical analysis (Kress (1991), Theorem 5.3), the asymptotic stability of the inversion of the system (3.22) is governed by the so-called condition number, defined by $\operatorname{cond}(B_{t;T}) = \|B_{t;T}\|\,\|B_{t;T}^{-1}\|$: if $\operatorname{cond}(B_{t;T}) \le C < \infty$ as $t \to \infty$, then the inversion is asymptotically numerically stable, i.e. "small" perturbations of the entries of $B_{t;T}$ lead to "small" perturbations of the solution. Using identical reasoning as in Lemma 3.1.1, it can readily be shown that under assumptions (3.6) and (3.7) we have $\operatorname{cond}(B_{t;T}) \le C < \infty$ as $T \to \infty$, uniformly in $t$.

Note that no assumption concerning the asymptotic behaviour of $t$ has been made, and indeed, no such assumption is needed for the results of this section to hold.

It is interesting to observe that the entries $(B_{t;T})_{m,n}$ are not exactly asymptotic local covariances of $X_{t;T}$, as they cannot generally be represented in the form $c(z, m-n) = \sum_{j=-\infty}^{-1} S_j(z)\Psi_j(m-n)$ for any $z$. However, they can be approximated by e.g. $c((m+n)/2T,\, m-n)$ in the following sense:
\begin{align}
\left|(B_{t;T})_{m,n} - c\!\left(\frac{m+n}{2T},\, m-n\right)\right|
&= \left|\sum_{j=-\infty}^{-1}\sum_{k\in\mathbb{Z}}\left(S_j\!\left(\frac{k}{T}\right) - S_j\!\left(\frac{m+n}{2T}\right)\right)\psi_{j,k-m}\psi_{j,k-n}\right| \notag \\
&\le \sum_{j=-\infty}^{-1}\sum_{k\in\mathbb{Z}}\left|S_j\!\left(\frac{k}{T}\right) - S_j\!\left(\frac{m+n}{2T}\right)\right|\, \big|\psi_{j,k-m}\psi_{j,k-n}\big|. \tag{3.23}
\end{align}
We know from Nason et al. (2000) that $|S_j(z) - S_j(z + \delta/T)| \le L_j\,\delta/T$. Also, the $\psi_j$'s are compactly supported, so $0 \le k - m < L_j$ and $0 \le k - n < L_j$, which implies


Chapter 3. Forecasting LSW processes

$0 \le k - (m+n)/2 < \mathcal{L}_j$. Therefore, (3.23) can be bounded from above by
\[
T^{-1} \sum_{j=-\infty}^{-1} L_j \mathcal{L}_j \sum_{k\in\mathbb{Z}} |\psi_{j,k-m}\,\psi_{j,k-n}| \le T^{-1} \sum_{j=-\infty}^{-1} L_j \mathcal{L}_j \sum_k \psi_{j,k}^2 = T^{-1} \sum_{j=-\infty}^{-1} L_j \mathcal{L}_j = O(T^{-1}),
\]
using the Cauchy inequality in the first step, the property that $\sum_k \psi_{j,k}^2 = 1$ in the second step, and the definition of the LSW process in the final one (note that $\mathcal{L}_j = O(2^{-j})$). Thus, the entries of $B_{t,T}$ are uniformly close to the corresponding asymptotic local covariances.

If the second-order structure of the process was known, the system of prediction equations (3.22) could be solved e.g. using the innovations algorithm (see Brockwell & Davis (1987), Section 5.2) to yield the prediction coefficients $\{\tilde b^{(h)}_n\}_{n=0}^{t-1}$. However, in practice the second-order structure needs to be estimated from the data: see Section 3.4 for details of the estimation procedure.

3.2 A closer look at the results of Section 3.1

In this section, we investigate whether the assumptions of Lemma 3.1.1 and Proposition 3.1.1 can be regarded as "restrictive" and, if so, what can be done to relax them.

3.2.1 Assumptions of Lemma 3.1.1

First of all, note that assumptions (3.6) and (3.7) are "LSW counterparts" of the classical assumptions from stationary time series theory that the spectral density be bounded from above and bounded away from zero (respectively). Indeed, let $X_t$ be a stationary LSW process with wavelet spectrum $\{S_j\}_j$, covariance $c(\tau)$ and


spectral density $f(\omega)$. We have
\[
\begin{aligned}
\int_{-\pi}^{\pi} f(\omega) e^{i\omega n}\,d\omega &= \int_{-\pi}^{\pi} \frac{1}{2\pi} \sum_\tau c(\tau)\, e^{-i\omega\tau} e^{i\omega n}\,d\omega \\
&= \int_{-\pi}^{\pi} \frac{1}{2\pi} \sum_\tau \sum_j S_j\, \Psi_j(\tau)\, e^{-i\omega\tau} e^{i\omega n}\,d\omega \\
&= \int_{-\pi}^{\pi} \frac{1}{2\pi} \sum_\tau \sum_j S_j \sum_k \psi_{j,k}\,\psi_{j,k+\tau}\, e^{i\omega k} e^{-i\omega(\tau+k)} e^{i\omega n}\,d\omega \\
&= \int_{-\pi}^{\pi} \frac{1}{2\pi} \sum_j S_j \big|\hat\psi_j(\omega)\big|^2 e^{i\omega n}\,d\omega
\end{aligned}
\]
for all $n$, so that
\[
f(\omega) = \frac{1}{2\pi} \sum_j S_j \big|\hat\psi_j(\omega)\big|^2 \quad a.e. \tag{3.24}
\]
and assumptions (3.6) and (3.7) simplify as
\[
0 < \operatorname*{ess\,inf}_\omega f(\omega) \le \operatorname*{ess\,sup}_\omega f(\omega) < \infty.
\]
Note that in the LSW case, it is necessary to use "ess inf" instead of "inf", due to the following fact:
\[
\inf_\omega \sum_{j=-\infty}^{-1} S_j(z) \big|\hat\psi_j(\omega)\big|^2 = \sum_{j=-\infty}^{-1} S_j(z) \big|\hat\psi_j(0)\big|^2 = \sum_{j=-\infty}^{-1} S_j(z) \Big|\sum_k \psi_{j,k}\Big|^2 = 0, \tag{3.25}
\]
using the property that $\sum_k \psi_{j,k} = 0$.

There arises a natural question whether there exist LSW processes for which (3.7) is satisfied, even though, as we have shown in (3.25), the same condition with "ess inf" replaced by "inf" is not satisfied by any LSW process. The (reassuring) answer is yes: S. Van Bellegem shows in Fryzlewicz et al. (2003) that standard white noise is an LSW process with $S_j = 2^j$, so, by (3.24), we must have
\[
1 = \sum_j 2^j \big|\hat\psi_j(\omega)\big|^2 \quad a.e. \tag{3.26}
\]


for any system of compactly supported Daubechies' wavelets $\psi_j$, and, clearly, (3.7) is then satisfied. Let us now give an example of a class of LSW processes for which (3.7) does not hold. We first define sparse LSW processes.

Definition 3.2.1 An LSW process $X_{t,T}$ with spectrum $\{S_j(z)\}_j$ is said to be sparse if $S_j(z) \equiv 0$ for all $j$ except for a finite set.

Proposition 3.2.1 No sparse LSW process satisfies (3.7).

Proof. Let $D = \{j : S_j(z) \not\equiv 0\}$. Being a Fourier transform of a finite-length vector, $\hat\psi_j(\omega)$ is continuous for all $j$, so $\sum_{j\in D} S_j(z)\,|\hat\psi_j(\omega)|^2$ is continuous as a finite sum of continuous functions. Therefore,
\[
\operatorname*{ess\,inf}_\omega \sum_{j\in D} S_j(z)\,|\hat\psi_j(\omega)|^2 = \inf_\omega \sum_{j\in D} S_j(z)\,|\hat\psi_j(\omega)|^2 = 0, \tag{3.27}
\]
and (3.7) is violated. $\Box$

This is certainly bad news from the point of view of the philosophy of LSW modelling. Indeed, sparse LSW processes, which have an economical representation in the model and are therefore appealing, are "badly behaved" as far as forecasting is concerned: for them, the system of prediction equations (3.22) cannot be solved numerically in a stable manner. One of the avenues for future research might be to investigate how this situation can be remedied by modifying the definition of an LSW process.

3.2.2 Assumptions of Proposition 3.1.1

A purely technical assumption (3.10) controls the evolution of the sequence $\{C_j\}_j$. An assumption like this is inevitable in the context of approximating the finite-sample MSPE $\tilde b^{(h)}_T \Sigma_T \big(\tilde b^{(h)}_T\big)'$ by $\tilde b^{(h)}_T \tilde B_T \big(\tilde b^{(h)}_T\big)'$.

On the other hand, assumption (3.11), controlling the "tail behaviour" of the sequence $\{S_j\}_j$, is extremely restrictive. Indeed, even the white noise process does


not satisfy it: noting that $J(T) = O(\log_2(T))$, we obtain
\[
T \sum_{j=-\infty}^{-J(T)-1} 2^j = T\,O(T^{-1}) = O(1) \ne o_T(1). \tag{3.28}
\]
By inspecting the proof of Proposition 3.1.1, it is easy to see why there is a need to control the tail behaviour of $\{S_j\}_j$. The underlying reason is that in the definition of an LSW process, $X_{t,T}$ is only built of wavelets at the $J(T)$ finest scales (i.e. the summation over $j$ only goes from $-1$ to $-J(T)$), whereas the asymptotic quantities such as $c(z,\tau)$, or indeed $\tilde b^{(h)}_T B_T \big(\tilde b^{(h)}_T\big)'$, typically involve the wavelet spectrum at all scales, i.e. $\{S_j(z)\}_{j=-\infty}^{-1}$. Therefore, without controlling the "tail" of the sequence $\{S_j(z)\}_{j=-\infty}^{-1}$ in one way or another, we cannot hope to achieve the desired rates of convergence.

However, no assumption controlling the tail of $\{S_j(z)\}_{j=-\infty}^{-1}$ is made in the original paper by Nason et al. (2000). To illustrate the implications of this fact, note that the result of Proposition 2.11 from Nason et al. (2000) does not formally hold without such an assumption. Let us first recall the statement of the proposition.

Proposition 3.2.2 (Nason et al. (2000)) As $T \to \infty$, uniformly in $\tau \in \mathbb{Z}$ and $z \in (0,1)$, $|c_T(z,\tau) - c(z,\tau)| = O(T^{-1})$.

It is easy to find a counterexample to the above proposition. Consider a stationary process with $\omega_{j,k;T} = W_j(z) = -1/j$. This process is LSW in the sense of Nason et al. (2000), and
\[
|c_T(z,0) - c(z,0)| = \sum_{j=-\infty}^{-J(T)-1} \frac{1}{j^2}, \tag{3.29}
\]
which behaves like
\[
\int_{\log(T)}^{\infty} \frac{1}{x^2}\,dx = -\frac{1}{x}\bigg|_{\log(T)}^{\infty} = \log^{-1}(T) \ne O(T^{-1}), \tag{3.30}
\]
and this disproves the authors' claim.

An easy way to avoid such "tail considerations" altogether is to assume that the summation over $j$ in the definition of an LSW process ranges from $-1$ to $-\infty$,


even in the finite-sample situation. Indeed, this is also implicitly done in the proof of Proposition 2.11 in Nason et al. (2000) (thereby enabling the authors to achieve the claimed rate of $O(T^{-1})$).

For completeness, we give the amended definition below. Note the subscript in "LSW2".

Definition 3.2.2 A triangular stochastic array $\{X_{t,T}\}_{t=0}^{T-1}$, for $T = 1,2,\ldots$, is in the class of LSW2 processes if there exists a mean-square representation
\[
X_{t,T} = \sum_{j=-\infty}^{-1} \sum_{k=-\infty}^{\infty} \omega_{j,k;T}\,\psi_{j,k}(t)\,\xi_{j,k}, \tag{3.31}
\]
where $\psi_{j,k}(t)$ are nondecimated discrete wavelet vectors, $\omega_{j,k;T}$ are real constants, and $\{\xi_{j,k}\}_{j,k}$ are zero-mean orthonormal identically distributed random variables. Also, we assume that for each $j \le -1$, there exists a Lipschitz function $W_j(z) : [0,1] \to \mathbb{R}$ such that

- $\sum_{j=-\infty}^{-1} |W_j|^2 < \infty$;
- the Lipschitz constants $L_j$ satisfy
\[
\sum_{j=-\infty}^{-1} 2^{-j} L_j < \infty; \tag{3.32}
\]
- there exists a sequence of constants $C_j$ satisfying $\sum_{j=-\infty}^{-1} C_j < \infty$ such that, for each $T$,
\[
\sup_{k=0,\ldots,T-1} |\omega_{j,k;T} - W_j(k/T)| \le C_j/T \quad \forall\, j. \tag{3.33}
\]

In other words, all "building blocks" $\psi_j$ are included in the construction of $X_{t,T}$, even in the finite-sample case. What lends credibility to the above definition is the fact that a similar approach was adopted in Dahlhaus' theory of locally stationary processes, where the entire set of building blocks $\{\exp(i\omega t)\}_{\omega\in(-\pi,\pi]}$ was used to construct $X_{t,T}$, even in the finite-sample situation.

Note that if the word "LSW" is replaced by "LSW2" in Proposition 3.1.1, then assumption (3.11) becomes unnecessary. Lemma 3.1.1 holds for LSW2 processes in an unchanged form.
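To make the representation (3.31) concrete, the following Python sketch simulates a process of this form with discrete non-decimated Haar wavelets. It is an illustration only: the infinite sum over scales is truncated at a finite coarsest scale $-J$ for numerical purposes, and the amplitude function `W` (a smooth variance ramp) is a hypothetical choice, not one used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_vec(j):
    # discrete non-decimated Haar wavelet vector at scale j = -1, -2, ...
    half = 2 ** (-j - 1)
    return np.concatenate([np.full(half, 2.0 ** (j / 2)),
                           np.full(half, -(2.0 ** (j / 2)))])

def simulate_lsw2(T, W, J=6):
    # X_t = sum_j sum_k W_j(k/T) psi_j(t - k) xi_{j,k}, cf. (3.31);
    # scales below -J are dropped (their total energy is O(2^{-J}) here)
    X = np.zeros(T)
    for j in range(-1, -J - 1, -1):
        psi = haar_vec(j)
        L = len(psi)
        for k in range(-L + 1, T):      # shifts whose support overlaps [0, T)
            amp = W(j, min(max(k, 0), T - 1) / T)   # W_j(k/T), clipped at edges
            xi = rng.standard_normal()
            t0, t1 = max(0, k), min(T, k + L)
            X[t0:t1] += amp * xi * psi[t0 - k:t1 - k]
    return X

# hypothetical amplitude: the local variance ramps up smoothly along the series
W = lambda j, z: (2.0 ** j) ** 0.5 * (1.0 + z)
X = simulate_lsw2(1024, W)
print(round(np.var(X[:512]), 2), round(np.var(X[-512:]), 2))
```

With this choice of `W`, the sample variance of the second half of the series should be visibly larger than that of the first half, reflecting the time-varying wavelet spectrum.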


On a final note, we come back to the proof of Proposition 3.1.1. In the derivation of (3.18), it is tempting to write
\[
\begin{aligned}
\frac{1}{2\pi} \sum_{j=-\infty}^{-J(T)-1} S_j \int_{-\pi}^{\pi} \big|\hat b^{(h)}_T(\omega)\big|^2 \big|\hat\psi_j(\omega)\big|^2\,d\omega &= \frac{1}{2\pi} \int_{-\pi}^{\pi} \big|\hat b^{(h)}_T(\omega)\big|^2 \sum_{j=-\infty}^{-J(T)-1} S_j \big|\hat\psi_j(\omega)\big|^2\,d\omega \\
&\le \operatorname*{ess\,sup}_\omega \sum_{j=-\infty}^{-J(T)-1} S_j \big|\hat\psi_j(\omega)\big|^2\; \frac{1}{2\pi}\int_{-\pi}^{\pi} \big|\hat b^{(h)}_T(\lambda)\big|^2\,d\lambda \\
&= \operatorname*{ess\,sup}_\omega \sum_{j=-\infty}^{-J(T)-1} S_j \big|\hat\psi_j(\omega)\big|^2\; \tilde b^{(h)}_T \big(\tilde b^{(h)}_T\big)',
\end{aligned} \tag{3.34}
\]
in the hope that
\[
\operatorname*{ess\,sup}_\omega \sum_{j=-\infty}^{-J(T)-1} S_j \big|\hat\psi_j(\omega)\big|^2 = o_T(1),
\]
as would certainly be the case if $\sum_{j=-\infty}^{-1} S_j |\hat\psi_j(\omega)|^2$ was continuous (by Dini's theorem). However, we showed in Section 3.2.1 that this need not be the case. Indeed, we have
\[
\operatorname*{ess\,sup}_\omega \sum_{j=-\infty}^{-J(T)-1} S_j \big|\hat\psi_j(\omega)\big|^2 \ge \operatorname*{ess\,inf}_\omega \sum_{j=-\infty}^{-1} S_j \big|\hat\psi_j(\omega)\big|^2 - \inf_\omega \sum_{j=-J(T)}^{-1} S_j \big|\hat\psi_j(\omega)\big|^2 = \operatorname*{ess\,inf}_\omega \sum_{j=-\infty}^{-1} S_j \big|\hat\psi_j(\omega)\big|^2,
\]
and the last quantity can be strictly positive, e.g. for white noise. This shows that transformation step (3.34) would not be helpful in proving Proposition 3.1.1.

3.3 Kolmogorov's formula for LSW2 processes

In this section, we state and prove Kolmogorov's formula for the one-step MSPE in the LSW2 framework. The only difference with the LSW setting is that the sums over $j$ in the finite-sample quantities go from $-1$ to $-\infty$ and not to $-J(T)$.
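The white-noise identity (3.26), which underlies the example just given, is easy to check numerically for Haar wavelets: away from $\omega = 0$, the partial sums $\sum_{j=-1}^{-J} 2^j |\hat\psi_j(\omega)|^2$ approach 1 as $J$ grows, while at $\omega = 0$ every partial sum vanishes, by (3.25). A small sketch (the frequency $\pi/3$ and the truncation depths are arbitrary illustrative choices):

```python
import numpy as np

def haar_vec(j):
    # discrete non-decimated Haar wavelet vector at scale j = -1, -2, ...
    half = 2 ** (-j - 1)
    return np.concatenate([np.full(half, 2.0 ** (j / 2)),
                           np.full(half, -(2.0 ** (j / 2)))])

def psi_hat_sq(j, w):
    # |psi_hat_j(w)|^2, squared modulus of the DFT of the wavelet vector
    psi = haar_vec(j)
    return np.abs(psi @ np.exp(-1j * w * np.arange(len(psi)))) ** 2

w = np.pi / 3                       # any frequency away from 0
partial = [sum(2.0 ** j * psi_hat_sq(j, w) for j in range(-1, -J - 1, -1))
           for J in (2, 5, 10)]
print([round(p, 4) for p in partial])   # partial sums increase towards 1
```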


We first recall the statement of Kolmogorov's formula in the classical stationary setting (Brockwell & Davis (1987), Theorem 5.8.1).

Theorem 3.3.1 Let $\{X_t\}$ be a real-valued zero-mean stationary process with spectral density function $f$. The one-step MSPE of $\{X_t\}$ is
\[
\sigma^2 = \exp\left\{ \frac{1}{2\pi} \int_{-\pi}^{\pi} \log(2\pi f(\omega))\,d\omega \right\}. \tag{3.35}
\]

An analogous formula was derived by Dahlhaus in the locally stationary model (Dahlhaus (1996b), Theorem 3.2 (i)). We follow his method of proof here; however, some important modifications are needed due to the fact that the building blocks in the LSW model are wavelets and not Fourier exponentials. We first introduce some essential notation. The observation domain $\{0,\ldots,T-1\}$ is divided into overlapping blocks $I_m$ of length $N$ with shift $S$ (assume that both $T$ and $N$ are multiples of $S$). At the edges the length of the blocks is reduced, but the shift kept, so that each observation is contained in exactly $N/S$ blocks:
\[
I_m = \begin{cases} [0,\ldots,mS-1] & m = 1,\ldots,N/S \\ [mS-N,\ldots,mS-1] & m = N/S+1,\ldots,T/S \\ [mS-N,\ldots,T-1] & m = T/S+1,\ldots,(T+N)/S-1. \end{cases} \tag{3.36}
\]
If $T$ is not divisible by $S$ then we "clip" the final blocks in the natural way (note that in this case we still have each observation contained in exactly $N/S$ blocks). Let $M = (T+N)/S - 1$ be the total number of blocks and let $t_m$ be an arbitrary point in $I_m$. For each $m = 1,\ldots,M$, we define a $T \times T$ matrix
\[
\big(D^{(m)}_T\big)_{u,s} = \sum_{j=-\infty}^{-1} S_j\!\left(\frac{t_m}{T}\right) \Psi_j(u-s)\, \mathbb{I}\{u,s \in I_m\}, \tag{3.37}
\]
where the indices $u,s$ go from $0$ to $T-1$. Define further
\[
D_T = \frac{S}{N} \sum_{m=1}^{M} D^{(m)}_T \tag{3.38}
\]
and
\[
c^*(k) = \sup_z |c(z,k)|. \tag{3.39}
\]
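The covering property of the blocks (3.36) — each observation lying in exactly $N/S$ blocks — is straightforward to verify numerically. The sizes below ($T = 240$, $N = 24$, $S = 8$) are arbitrary illustrative choices satisfying the divisibility requirements:

```python
# verify that each observation is covered by exactly N/S blocks, cf. (3.36)
T, N, S = 240, 24, 8
M = (T + N) // S - 1
blocks = []
for m in range(1, M + 1):
    if m <= N // S:
        blocks.append(range(0, m * S))            # shortened blocks at the start
    elif m <= T // S:
        blocks.append(range(m * S - N, m * S))    # full-length interior blocks
    else:
        blocks.append(range(m * S - N, T))        # shortened blocks at the end
counts = [sum(u in b for b in blocks) for u in range(T)]
print(set(counts))  # -> {3}, i.e. every observation lies in N/S = 3 blocks
```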


Consider the following set of assumptions:
\[
N \to \infty \tag{3.40}
\]
\[
S/N \to 0 \tag{3.41}
\]
\[
N^2/T \to 0 \tag{3.42}
\]
\[
\sum_{k=1}^{\infty} c^*(k) < \infty \tag{3.43}
\]
\[
\sum_{j=-\infty}^{-1} \mathcal{L}_j\,(C_j + L_j \mathcal{L}_j) < \infty \tag{3.44}
\]
(3.40)–(3.42) are purely technical assumptions concerning the behaviour of $S$ and $N$ in relation to $T$. Assumption (3.43) is a non-stationary equivalent of the short-memory property. Finally, assumption (3.44) controls the speed of convergence of the sequences $\{C_j\}$ and $\{L_j\}$ and is similar to (3.10).

We begin with the following lemma.

Lemma 3.3.1 Let $x$ be a row vector of length $T$. Under assumptions (3.40)–(3.44), we have
\[
x\,(\Sigma_T - D_T)\,x' = xx'\,o_T(1). \tag{3.45}
\]

Proof. Define
\[
\big(\Sigma^{(m)}_T\big)_{u,s} = (\Sigma_T)_{u,s}\, \mathbb{I}\{u,s \in I_m\}. \tag{3.46}
\]
We have
\[
x\,(\Sigma_T - D_T)\,x' = x\left\{\frac{S}{N} \sum_{m=1}^{M} \big(\Sigma^{(m)}_T - D^{(m)}_T\big)\right\} x' + \sum_{k,l=0}^{T/S-1} \min\!\left(|k-l|\frac{S}{N},\, 1\right) \sum_{u,s=0}^{S-1} x_{kS+u}\,(\Sigma_T)_{kS+u,\,lS+s}\,x_{lS+s}. \tag{3.47}
\]
We will first show that the second term tends to zero. Replace $(\Sigma_T)_{u,s}$ by $c((u+$


$s)/2T,\, u-s)$. The second term is bounded by
\[
\begin{aligned}
&\sum_{d=1}^{T/S-1} \min\!\left(d\frac{S}{N}, 1\right) \sum_{\substack{u,s=0 \\ (d-1)S < |u-s| \le dS}}^{T-1} \left| x_u\, c\!\left(\frac{u+s}{2T}, u-s\right) x_s \right| + R \\
&\quad \le 2xx' \sum_{d=1}^{T/S-1} \min\!\left(d\frac{S}{N}, 1\right) \sum_{k=(d-1)S+1}^{dS} c^*(k) + R \\
&\quad \le 2xx' \left( \frac{S + \sqrt{N}}{N} \sum_{k=1}^{\infty} c^*(k) + \sum_{k > \sqrt{N}} c^*(k) \right) + R,
\end{aligned}
\]
and the first term in the above sum is of order $xx'o_T(1)$ by assumptions (3.40), (3.41) and (3.43). Let us now turn to the remainder $R$. We have
\[
R \le \sum_{u,s=0}^{T-1} \left| x_u x_s \sum_{j,k} \left( \omega_{j,k;T}^2 - S_j\!\left(\frac{u+s}{2T}\right) \right) \psi_{j,k-s}\,\psi_{j,k-u} \right|, \tag{3.48}
\]
and, using exactly the same technique as in Proposition 3.1.1, it can be shown that $R = xx'O(T^{-1})$ under assumption (3.44).

We now consider the main term in (3.47). Denote by $\underline{I}_m$ and $\overline{I}_m$, respectively, the initial and final indices in the segment $I_m$. We have
\[
\begin{aligned}
x\left\{\frac{S}{N} \sum_{m=1}^{M} \big(\Sigma^{(m)}_T - D^{(m)}_T\big)\right\} x' &= \frac{S}{N} \sum_{m=1}^{M} \sum_{j,k} \left( \omega_{j,k;T}^2 - S_j\!\left(\frac{t_m}{T}\right) \right) \left( \sum_u \psi_{j,k-u}\,x_u\, \mathbb{I}\{u \in I_m\} \right)^2 \\
&\le \frac{S}{N} \sum_{m=1}^{M} \sum_j \sum_{k=\underline{I}_m}^{\overline{I}_m + \mathcal{L}_j - 1} \frac{C_j + L_j(\mathcal{L}_j + N)}{T} \left( \sum_u x_u^2\, \mathbb{I}\{u \in I_m\} \right) \\
&\le \frac{S}{N} \sum_{m=1}^{M} \left( \sum_u x_u^2\, \mathbb{I}\{u \in I_m\} \right) \sum_j \frac{(C_j + L_j(\mathcal{L}_j + N))(\mathcal{L}_j + N)}{T} \\
&= xx' \sum_j \frac{(C_j + L_j(\mathcal{L}_j + N))(\mathcal{L}_j + N)}{T},
\end{aligned} \tag{3.49}
\]
where the last equality holds because by construction each $x_u$ is contained in exactly $N/S$ segments $I_m$. By assumptions (3.44) and (3.42), the above is of order $xx'o_T(1)$, which completes the proof. $\Box$

To derive Kolmogorov's formula, we also need another (similar) lemma.


Lemma 3.3.2 Suppose that assumption (3.44) holds and that there exists a $t^*$ such that $x_u = 0$ for all $u \notin \{t^*,\ldots,t^*+L\}$. Then for each $\tilde t \in \{t^*,\ldots,t^*+L\}$,
\[
x\,\Sigma_T\,x' = \sum_j S_j\!\left(\frac{\tilde t}{T}\right) \sum_k \left( \sum_{u=t^*}^{t^*+L} x_u\,\psi_{j,k-u} \right)^2 + xx'\,O\!\left(\frac{L^2}{T}\right). \tag{3.50}
\]
The proof is completely analogous to the part of the proof of Lemma 3.3.1 leading to the bound for the main term, i.e. formula (3.49).

Before moving on to the statement of Kolmogorov's formula for LSW2 processes, we present an interesting technique for bounding the spectral norms of $\Sigma_T$ and its inverse. Suppose that the assumptions of Lemma 3.3.1 hold and that $x$ is a row vector of length $T$. As $\Sigma_T$ is nonnegative definite, its spectral norm is bounded from above by
\[
\begin{aligned}
\sup_{\|x\|_2=1} x\,\Sigma_T\,x' &= \sup_{\|x\|_2=1} x\,\frac{S}{N}\sum_{m=1}^M D^{(m)}_T\,x' + o_T(1) \\
&= \sup_{\|x\|_2=1} \frac{S}{N} \sum_{m=1}^{M} \sum_{j<0} S_j\!\left(\frac{t_m}{T}\right) \sum_k \left( \sum_u x_u\,\psi_{j,k-u}\,\mathbb{I}\{u\in I_m\} \right)^2 + o_T(1) \\
&= \sup_{\|x\|_2=1} \frac{S}{2\pi N} \sum_{m=1}^{M} \int_{-\pi}^{\pi} \sum_{j<0} S_j\!\left(\frac{t_m}{T}\right) \big|\hat\psi_j(\omega)\big|^2 \left| \sum_u x_u\,\mathbb{I}\{u\in I_m\}\,e^{-i\omega u} \right|^2 d\omega + o_T(1) \\
&\le \operatorname*{ess\,sup}_{z,\omega} \sum_{j<0} S_j(z) \big|\hat\psi_j(\omega)\big|^2 \sup_{\|x\|_2=1} xx' + o_T(1) \\
&= \operatorname*{ess\,sup}_{z,\omega} \sum_{j<0} S_j(z) \big|\hat\psi_j(\omega)\big|^2 + o_T(1).
\end{aligned}
\]
In the same way, it can be shown that
\[
\inf_{\|x\|_2=1} x\,\Sigma_T\,x' \ge \operatorname*{ess\,inf}_{z,\omega} \sum_{j<0} S_j(z) \big|\hat\psi_j(\omega)\big|^2 + o_T(1). \tag{3.51}
\]
As we can always choose $S$ and $N$ with the properties required in (3.40)–(3.42), the only restrictive assumptions needed for the above derivation to be valid are (3.43) and (3.44). With these assumptions, $\|\Sigma_T\|$ is bounded in $T$ if
\[
\operatorname*{ess\,sup}_{z,\omega} \sum_{j<0} S_j(z) \big|\hat\psi_j(\omega)\big|^2 < \infty, \tag{3.52}
\]


and $\big\|\Sigma_T^{-1}\big\|$ is bounded in $T$ if
\[
\operatorname*{ess\,inf}_{z,\omega} \sum_{j<0} S_j(z) \big|\hat\psi_j(\omega)\big|^2 > 0. \tag{3.53}
\]
Two remarks are in order.

1. Observe that the short-memory property (3.43) implies (3.52). Indeed,
\[
\sum_j S_j(z) \big|\hat\psi_j(\omega)\big|^2 = \sum_j S_j(z) \sum_k \psi_{j,k}\, e^{i\omega k} \sum_n \psi_{j,n}\, e^{-i\omega n} = \sum_j S_j(z) \sum_k \psi_{j,k} \sum_\tau \psi_{j,k+\tau}\, e^{-i\omega\tau} = \sum_\tau e^{-i\omega\tau} \sum_j S_j(z)\,\Psi_j(\tau) \le \sum_\tau c^*(\tau).
\]
In the classical stationary setting, the analogous well-known fact says that the absolute summability of the covariance implies the boundedness of the spectral density from above.

2. Note that we could also bound the norms of $\Sigma_T$ and its inverse using Lemma 3.1.1 and Proposition 3.1.1: first by approximating $x\Sigma_T x'$ by $xB_T x'$, and then using the boundedness of the norms of $B_T$ and its inverse (in other words, we could use $B_T$ instead of $D_T$). For the approximation by $xB_T x'$ to be valid, we would need an LSW2 version of assumption (3.10):
\[
\sum_{j=-\infty}^{-1} C_j \mathcal{L}_j < \infty \tag{3.54}
\]
(recall that (3.11) is not required in the LSW2 setting). With assumption (3.54), $\|\Sigma_T\|$ is bounded in $T$ if (3.6) holds, and $\big\|\Sigma_T^{-1}\big\|$ is bounded in $T$ if (3.7) holds.

We are now in a position to state and prove Kolmogorov's formula for LSW2 processes.


Theorem 3.3.2 Let $X_{t,T}$ be an LSW2 process satisfying assumptions (3.43), (3.44) and (3.53). Let $\hat X_{t,T}$ be the best linear predictor of $X_{t,T}$ given $X_{0,T},\ldots,X_{t-1,T}$. Then
\[
\mathrm{E}\big(\hat X_{t,T} - X_{t,T}\big)^2 = \exp\left\{ \frac{1}{2\pi} \int_{-\pi}^{\pi} \log\left( \sum_{j<0} S_j\!\left(\frac{t}{T}\right) \big|\hat\psi_j(\omega)\big|^2 \right) d\omega \right\} + o_T(1). \tag{3.55}
\]

Proof. Let $x$ be a row vector of length $T$ such that $x_0,\ldots,x_{t-1}$ are arbitrary, $x_t = -1$, and $x_{t+1},\ldots,x_{T-1} = 0$. By Lemma 3.3.1, we have
\[
x\,\Sigma_T\,x' = x\,D_T\,x' + xx'\,o_T(1) = \frac{S}{N} \sum_{m=1}^{M} \sum_{j<0} S_j\!\left(\frac{t_m}{T}\right) \sum_k \left( \sum_u x_u\,\psi_{j,k-u}\,\mathbb{I}\{u\in I_m\} \right)^2 + xx'\,o_T(1).
\]
Let $M_t = \{m : t \in I_m\}$. For $m \in M_t$, we set $t_m = t$. The above expression is bounded from below by
\[
\frac{S}{N} \sum_{m\in M_t} \sum_{j<0} S_j\!\left(\frac{t_m}{T}\right) \sum_k \left( \sum_u x_u\,\psi_{j,k-u}\,\mathbb{I}\{u\in I_m\} \right)^2 + xx'\,o_T(1). \tag{3.56}
\]
Each of the sums over $j$ represents the one-step prediction error for a stationary time series with spectral density
\[
f(\omega) = \frac{1}{2\pi} \sum_j S_j\!\left(\frac{t}{T}\right) \big|\hat\psi_j(\omega)\big|^2. \tag{3.57}
\]
There are exactly $N/S$ such sums. By the classical Kolmogorov formula, each of them is bounded from below by
\[
\exp\left\{ \frac{1}{2\pi} \int_{-\pi}^{\pi} \log\left( \sum_{j<0} S_j\!\left(\frac{t}{T}\right) \big|\hat\psi_j(\omega)\big|^2 \right) d\omega \right\}.
\]
Therefore, the lower bound is
\[
x\,\Sigma_T\,x' \ge \exp\left\{ \frac{1}{2\pi} \int_{-\pi}^{\pi} \log\left( \sum_{j<0} S_j\!\left(\frac{t}{T}\right) \big|\hat\psi_j(\omega)\big|^2 \right) d\omega \right\} + xx'\,o_T(1). \tag{3.58}
\]
We now turn to the remainder $xx'o_T(1)$. We have
\[
\frac{x}{\|x\|_2} \left(\frac{x}{\|x\|_2}\right)' \le \frac{x}{\|x\|_2}\,\Sigma_T \left(\frac{x}{\|x\|_2}\right)' \big\|\Sigma_T^{-1}\big\| \le \|\Sigma_T\|\,\big\|\Sigma_T^{-1}\big\| \tag{3.59}
\]


and thus
\[
xx' \le x\,\Sigma_T\,x'\,\big\|\Sigma_T^{-1}\big\| \le xx'\,\|\Sigma_T\|\,\big\|\Sigma_T^{-1}\big\|. \tag{3.60}
\]
$\|\Sigma_T\|$ is bounded by assumption (3.43) (see (3.52) and Remark 1 above), and $\big\|\Sigma_T^{-1}\big\|$ is bounded by assumption (3.53). Therefore, $xx' = O(x\Sigma_T x')$ and, continuing from (3.58),
\[
x\,\Sigma_T\,x'\,(1 - o_T(1)) \ge \exp\left\{ \frac{1}{2\pi} \int_{-\pi}^{\pi} \log\left( \sum_{j<0} S_j\!\left(\frac{t}{T}\right) \big|\hat\psi_j(\omega)\big|^2 \right) d\omega \right\}, \tag{3.61}
\]
which finally yields
\[
\mathrm{E}\big(\hat X_{t,T} - X_{t,T}\big)^2 = \inf_x x\,\Sigma_T\,x' \ge \exp\left\{ \frac{1}{2\pi} \int_{-\pi}^{\pi} \log\left( \sum_{j<0} S_j\!\left(\frac{t}{T}\right) \big|\hat\psi_j(\omega)\big|^2 \right) d\omega \right\} + o_T(1).
\]
To obtain an upper bound, we set $\bar t = \min(t, L)$ with $L^2/T \to 0$. Let $y^*_0,\ldots,y^*_{\bar t - 1}$ be the coefficients of the best linear predictor for a stationary process ($\bar t$ observations) with spectral density (3.57). Set $x_j = y^*_{j+\bar t - t}$ for $j = t-\bar t,\ldots,t-1$, and $x_t = -1$, all the other components of $x$ being zero. By Lemma 3.3.2, we have
\[
\mathrm{E}\big(\hat X_{t,T} - X_{t,T}\big)^2 \le x\,\Sigma_T\,x' = \sum_j S_j\!\left(\frac{t}{T}\right) \sum_k \left( \sum_{u=t-\bar t}^{t} x_u\,\psi_{j,k-u} \right)^2 + xx'\,o_T(1).
\]
Since the sum over $j$ is the one-step prediction error for the stationary process with spectral density (3.57), we have
\[
x\,\Sigma_T\,x' = \exp\left\{ \frac{1}{2\pi} \int_{-\pi}^{\pi} \log\left( \sum_{j<0} S_j\!\left(\frac{t}{T}\right) \big|\hat\psi_j(\omega)\big|^2 \right) d\omega \right\} + xx'\,o_T(1). \tag{3.62}
\]
By identical reasoning as in the derivation of the lower bound, this reduces to
\[
x\,\Sigma_T\,x' = \exp\left\{ \frac{1}{2\pi} \int_{-\pi}^{\pi} \log\left( \sum_{j<0} S_j\!\left(\frac{t}{T}\right) \big|\hat\psi_j(\omega)\big|^2 \right) d\omega \right\} + o_T(1) \tag{3.63}
\]
and
\[
\mathrm{E}\big(\hat X_{t,T} - X_{t,T}\big)^2 \le \exp\left\{ \frac{1}{2\pi} \int_{-\pi}^{\pi} \log\left( \sum_{j<0} S_j\!\left(\frac{t}{T}\right) \big|\hat\psi_j(\omega)\big|^2 \right) d\omega \right\} + o_T(1), \tag{3.64}
\]
which completes the proof. $\Box$
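The classical formula (3.35), which the proof above leans on, is easy to sanity-check numerically. For an AR(1) process with unit innovation variance the one-step MSPE is exactly 1, and (3.35) recovers it; the coefficient 0.6 and the grid size below are arbitrary illustrative choices.

```python
import numpy as np

# AR(1): X_t = a X_{t-1} + e_t with Var(e_t) = 1; its one-step MSPE is exactly 1,
# and the classical Kolmogorov formula (3.35) should reproduce this value
a = 0.6
w = np.linspace(-np.pi, np.pi, 2 ** 16, endpoint=False)
f = 1.0 / (2 * np.pi * np.abs(1.0 - a * np.exp(-1j * w)) ** 2)  # spectral density
mspe = np.exp(np.mean(np.log(2 * np.pi * f)))  # (1/2pi) * integral as a grid mean
print(round(mspe, 6))  # -> 1.0
```

The grid mean is an exact trapezoidal rule here because the integrand is smooth and $2\pi$-periodic, so the numerical error is negligible.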


3.4 Estimation of the approximating matrix $B_T$

In order to perform forecasting in practice, we need to be able to estimate the entries of the approximating matrix $B_T$. As was mentioned in Section 2.2.4, Nason et al. (2000) used the wavelet periodogram, defined in (2.36), to perform inference on the wavelet spectrum $\{S_j(z)\}$ in the LSW model. We also base our estimators of $B_T$ in the LSW2 framework on the wavelet periodogram.

Note that the wavelet periodogram $I^{(j)}_{t,T}$ is a function of $X_{t+1-\mathcal{L}_j,T},\ldots,X_{t,T}$. If $t < \mathcal{L}_j - 1$, then one possibility is to assume periodicity in the data; however, for the theoretical results of this section to hold, we set $I^{(j)}_{t,T} = I^{(j)}_{\mathcal{L}_j-1,T}$ for $t = 0,\ldots,\mathcal{L}_j-2$. We only compute the periodogram down to scale $j = -J(t)$. Throughout this section, we assume that $t/T$ remains constant as $T \to \infty$.

As was shown in the final part of Section 3.1, the entries of $B_T$ tend to the corresponding local autocovariances at the uniform rate of $O(T^{-1})$. Therefore, asymptotically, estimating the entries of $B_T$ is equivalent to estimating the local autocovariance structure of the process. In constructing the estimator of $c(z,\tau)$, we first consider the case $\tau = 0$ (local variance). We define our estimator $\hat c(k/T, 0)$ as
\[
\hat c\!\left(\frac{k}{T}, 0\right) = \sum_{j=-J(t)}^{-1} 2^j I^{(j)}_{k,T}, \quad \text{for } k = 0,\ldots,t-1. \tag{3.65}
\]
The extension of (3.65) to $\tau \ne 0$ uses the infinite matrix $A$, defined by (2.34). The invertibility of $A$ for Haar wavelets was proved in Nason et al. (2000). Even though numerical results suggest that $A$ is also invertible for other Daubechies' wavelets, no proof of this conjecture has as yet been established. For $\tau \ne 0$, we define $\hat c(k/T, \tau)$ as follows:
\[
\hat c\!\left(\frac{k}{T}, \tau\right) = \sum_{j=-J(t)}^{-1} \left( \sum_{l=-\infty}^{-1} (A^{-1})_{j,l}\,\Psi_l(\tau) \right) I^{(j)}_{k,T}. \tag{3.66}
\]
Before we analyse some properties of $\hat c(k/T,\tau)$, we quote the following lemma from Fryzlewicz et al. (2003).


Lemma 3.4.1 The matrix $A$ defined in (2.34) has the following properties:
\[
\sum_{j=-\infty}^{-1} 2^j A_{i,j} = 1 \tag{3.67}
\]
\[
\sum_{j=-\infty}^{-1} (A^{-1})_{i,j} = 2^i \tag{3.68}
\]
\[
\sum_{j=-\infty}^{-1} \big|(A^{-1})_{i,j}\big| = O(2^{i/2}), \tag{3.69}
\]
where (3.68) and (3.69) only apply to Haar wavelets.

The proofs of (3.67) and (3.68) rely on the following result, due to S. Van Bellegem (Fryzlewicz et al. (2003), Lemma B.2):
\[
\sum_j 2^j \Psi_j(\tau) = \delta_\tau, \tag{3.70}
\]
where $\delta_\tau$ is the Kronecker delta. The proof of (3.68) is also due to S. Van Bellegem.

The following proposition concerns the asymptotic behaviour of the first two moments of $\hat c(k/T, 0)$.

Proposition 3.4.1 If (3.43) holds, then the estimator (3.65) satisfies
\[
\mathrm{E}\left\{\hat c\!\left(\frac{k}{T}, 0\right)\right\} = c\!\left(\frac{k}{T}, 0\right) + O(T^{-1}\log(T)). \tag{3.71}
\]
If, in addition, the increment process $\{\xi_{j,k}\}$ is Gaussian, then
\[
\mathrm{Var}\left\{\hat c\!\left(\frac{k}{T}, 0\right)\right\} = 2 \sum_{i,j=-J(t)}^{-1} 2^{i+j} \left( \sum_\tau c(k/T, \tau) \sum_n \psi_{i,n-\tau}\,\psi_{j,n} \right)^2 + O(T^{-1}). \tag{3.72}
\]

Proof. We will first show
\[
\mathrm{cov}\left( \sum_s X_{s,T}\,\psi_{i,k-s},\ \sum_s X_{s,T}\,\psi_{j,k-s} \right) = \sum_\tau c(k/T, \tau) \sum_n \psi_{i,n-\tau}\,\psi_{j,n} + O\big(2^{-(i+j)/2}\,T^{-1}\big). \tag{3.73}
\]
We have
\[
\mathrm{cov}\left( \sum_s X_{s,T}\,\psi_{i,k-s},\ \sum_s X_{s,T}\,\psi_{j,k-s} \right) = \sum_{l,u} \left\{ S_l\!\left(\frac{k}{T}\right) + O\!\left(\frac{C_l + L_l(u-k)}{T}\right) \right\} \sum_{s,t} \psi_{l,s-u}\,\psi_{j,k-s}\,\psi_{l,t-u}\,\psi_{i,k-t}.
\]


Using $\mathcal{L}_j = O(M 2^{-j})$ in the first step, and the Cauchy inequality in the second one, we bound the remainder as follows:
\[
\begin{aligned}
&\left| \sum_{l,u} O\!\left(\frac{C_l + L_l(u-k)}{T}\right) \sum_{s,t} \psi_{l,s-u}\,\psi_{j,k-s}\,\psi_{l,t-u}\,\psi_{i,k-t} \right| \\
&\quad \le \sum_l \frac{C_l + M L_l\big(2^{-l} + \min(2^{-i}, 2^{-j})\big)}{T} \sum_u \left| \sum_{s,t} \psi_{l,s-u}\,\psi_{j,k-s}\,\psi_{l,t-u}\,\psi_{i,k-t} \right| \\
&\quad \le \sum_l \frac{C_l + M L_l\big(2^{-l} + 2^{-i/2}\,2^{-j/2}\big)}{T} (A_{l,j})^{1/2} (A_{l,i})^{1/2} \\
&\quad = \frac{2^{-(i+j)/2}}{T} \left\{ \sum_l (C_l + M L_l 2^{-l})\,2^{(i+j)/2} (A_{l,j})^{1/2} (A_{l,i})^{1/2} + \sum_l M L_l (A_{l,j})^{1/2} (A_{l,i})^{1/2} \right\} \\
&\quad = \frac{2^{-(i+j)/2}}{T} \{\mathrm{I} + \mathrm{II}\}.
\end{aligned}
\]
By formula (3.67),
\[
\mathrm{I} \le \sum_l (C_l + M L_l 2^{-l}) \big( 2^i A_{l,i} + 2^j A_{l,j} \big) \le \sum_l (C_l + M L_l 2^{-l})\; 2 \sum_i 2^i A_{l,i} \le D_1.
\]
As $\sum_i L_i 2^{-i} < \infty$, we must have $L_i \le C 2^i$, so $\sum_i L_i A_{i,j} \le C$, again by (3.67). This and the Cauchy inequality give
\[
\mathrm{II} \le 2M \left( \sum_l L_l A_{l,i} \right)^{1/2} \left( \sum_l L_l A_{l,j} \right)^{1/2} \le D_2.
\]
The bound for the remainder is therefore $O(2^{-(i+j)/2}\,T^{-1})$. For the main term, straightforward computation gives
\[
\sum_{l,u} S_l\!\left(\frac{k}{T}\right) \sum_{s,t} \psi_{l,s-u}\,\psi_{j,k-s}\,\psi_{l,t-u}\,\psi_{i,k-t} = \sum_\tau c(k/T,\tau) \sum_n \psi_{i,n-\tau}\,\psi_{j,n},
\]
which yields formula (3.73). Using (3.70) and (3.73) with $i = j$, we obtain
\[
\begin{aligned}
\mathrm{E}\{\hat c(k/T, 0)\} &= \sum_{j=-J(t)}^{-1} 2^j \left( \sum_\tau c(k/T,\tau)\,\Psi_j(\tau) + O(2^{-j}/T) \right) \\
&= \sum_\tau c(k/T,\tau)\,\delta_\tau - \sum_{j=-\infty}^{-J(t)-1} 2^j \sum_\tau c(k/T,\tau)\,\Psi_j(\tau) + O(\log(T)/T) \\
&= c(k/T, 0) + O\!\left(T^{-1} \sum_\tau c^*(\tau)\right) + O(\log(T)/T),
\end{aligned}
\]


which proves the expectation by (3.43). For the variance, observe that, using Gaussianity, we have
\[
\mathrm{cov}\big( I^{(i)}_{k,T},\, I^{(j)}_{k,T} \big) = 2 \left( \sum_\tau c(k/T,\tau) \sum_n \psi_{i,n-\tau}\,\psi_{j,n} + O\big(2^{-(i+j)/2} T^{-1}\big) \right)^2 = 2 \left( \sum_\tau c(k/T,\tau) \sum_n \psi_{i,n-\tau}\,\psi_{j,n} \right)^2 + O\big(2^{-(i+j)/2} T^{-1}\big), \tag{3.74}
\]
provided that (3.43) holds. Using (3.74), we finally obtain
\[
\mathrm{Var}\{\hat c(k/T, 0)\} = 2 \sum_{i,j=-J(t)}^{-1} 2^{i+j} \left( \sum_\tau c(k/T,\tau) \sum_n \psi_{i,n-\tau}\,\psi_{j,n} \right)^2 + O(T^{-1}), \tag{3.75}
\]
which completes the proof. $\Box$

Using Lemma 3.4.1, it is possible to show a similar result for $\tau \ne 0$. We show the derivation for the expectation of $\hat c(k/T,\tau)$ below. Using (3.73), we can write
\[
\begin{aligned}
\mathrm{E}\{\hat c(k/T,\tau)\} &= \sum_{j=-J(t)}^{-1} \left( \sum_{l=-\infty}^{-1} (A^{-1})_{j,l}\,\Psi_l(\tau) \right) \left( \sum_n c(k/T, n)\,\Psi_j(n) + O(2^{-j}/T) \right) \\
&= \sum_{j=-\infty}^{-1} \left( \sum_{l=-\infty}^{-1} (A^{-1})_{j,l}\,\Psi_l(\tau) \right) \sum_n c(k/T, n)\,\Psi_j(n) \\
&\quad - \sum_{j=-\infty}^{-J(t)-1} \left( \sum_{l=-\infty}^{-1} (A^{-1})_{j,l}\,\Psi_l(\tau) \right) \sum_n c(k/T, n)\,\Psi_j(n) \\
&\quad + O\!\left( T^{-1} \sum_{j=-J(t)}^{-1} 2^{-j} \left| \sum_{l=-\infty}^{-1} (A^{-1})_{j,l}\,\Psi_l(\tau) \right| \right) \\
&= \mathrm{I} + \mathrm{II} + \mathrm{III}.
\end{aligned}
\]


We first concentrate on the main term:
\[
\begin{aligned}
\mathrm{I} &= \sum_{j=-\infty}^{-1} \sum_{l=-\infty}^{-1} (A^{-1})_{j,l}\,\Psi_l(\tau) \sum_n \sum_{i=-\infty}^{-1} S_i(k/T)\,\Psi_i(n)\,\Psi_j(n) \\
&= \sum_{i=-\infty}^{-1} \sum_{l=-\infty}^{-1} \left( \sum_{j=-\infty}^{-1} (A^{-1})_{j,l}\,A_{i,j} \right) S_i(k/T)\,\Psi_l(\tau) \\
&= \sum_{i=-\infty}^{-1} \sum_{l=-\infty}^{-1} \delta_{i-l}\,S_i(k/T)\,\Psi_l(\tau) \\
&= \sum_{i=-\infty}^{-1} S_i(k/T)\,\Psi_i(\tau) \\
&= c(k/T, \tau),
\end{aligned}
\]
as expected. We now focus on the remainders. Using (3.69) and (3.43),
\[
|\mathrm{II}| = \left| \sum_{j=-\infty}^{-J(t)-1} \left( \sum_{l=-\infty}^{-1} (A^{-1})_{j,l}\,\Psi_l(\tau) \right) \sum_n c(k/T, n)\,\Psi_j(n) \right| \le \sum_{j=-\infty}^{-J(t)-1} \sum_{l=-\infty}^{-1} \big|(A^{-1})_{j,l}\big| \sum_n c^*(n) = O\!\left( \sum_{j=-\infty}^{-\log(T)} 2^{j/2} \right) = O(T^{-1/2}).
\]
Similarly,
\[
|\mathrm{III}| = O\!\left( T^{-1} \sum_{j=-J(t)}^{-1} 2^{-j} \left| \sum_{l=-\infty}^{-1} (A^{-1})_{j,l}\,\Psi_l(\tau) \right| \right) = O\!\left( T^{-1} \sum_{j=-J(t)}^{-1} 2^{-j/2} \right) = O(T^{-1/2}).
\]
The above derivation leads to
\[
\mathrm{E}\{\hat c(k/T,\tau)\} = c(k/T,\tau) + O(T^{-1/2}) \quad \text{for } \tau \ne 0. \tag{3.76}
\]
We conclude this section with a remark on the formula for the asymptotic variance (3.72). To gain some insight into this "complicated" formula, we examine


how it simplifies in the case of a Gaussian white noise process $X_{t,T} = \sigma Z_t$, where $Z_t \sim N(0,1)$ i.i.d. Substituting $c(k/T,\tau) = \sigma^2 \delta_\tau$ and using $\sum_n \psi_{i,n}\psi_{j,n} = \delta_{i-j}$, we obtain
\[
\mathrm{Var}\left\{\hat c\!\left(\frac{k}{T}, 0\right)\right\} = 2 \sum_{i,j=-J(t)}^{-1} 2^{i+j} \big( \sigma^2 \delta_{i-j} \big)^2 + O(T^{-1}) = 2\sigma^4 \sum_{i=-J(t)}^{-1} 2^{2i} + O(T^{-1}) \to \frac{2}{3}\sigma^4.
\]
Probably the simplest estimator of the local variance in the LSW model can be obtained by simply squaring the relevant observation:
\[
\hat c_1\!\left(\frac{k}{T}, 0\right) = X^2_{k,T}. \tag{3.77}
\]
Note that in the case $X_{t,T} = \sigma Z_t$, we have
\[
\mathrm{Var}\left\{\hat c_1\!\left(\frac{k}{T}, 0\right)\right\} = 2\sigma^4 = 3\,\mathrm{Var}\left\{\hat c\!\left(\frac{k}{T}, 0\right)\right\}. \tag{3.78}
\]

3.5 Prediction based on data

3.5.1 Nuisance parameters

Formula (3.72) shows the inconsistency of $\hat c(\cdot, \tau)$ for $\tau = 0$, and a similar result can be shown for $\tau \ne 0$. To obtain consistency, we smooth $\hat c(\cdot, \tau)$ for each fixed $\tau$ w.r.t. the first argument. We use standard kernel smoothing with a bandwidth parameter $g$ (later we propose a procedure for choosing $g$). For speed of computation, we use the same $g$ for all $\tau$. We denote the smoothed version of $\hat c$ by $\tilde c$. In practice, the coefficients for the linear predictor are computed by inverting the system of linear equations (3.22), where the entries of $B_{t,T}$ have been replaced by their empirical versions
\[
\big(\hat B_{t,T}\big)_{m,n} = \tilde c\!\left(\frac{m+n}{2T}, m-n\right). \tag{3.79}
\]
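The local-variance estimator (3.65) and the kernel smoothing just described can be sketched as follows with Haar wavelets. The white-noise input, the truncation at five scales and the uniform (boxcar) kernel of width `g` are illustrative choices only, not the exact settings used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

def haar_vec(j):
    # discrete non-decimated Haar wavelet vector at scale j = -1, -2, ...
    half = 2 ** (-j - 1)
    return np.concatenate([np.full(half, 2.0 ** (j / 2)),
                           np.full(half, -(2.0 ** (j / 2)))])

T = 4096
X = rng.standard_normal(T)        # white noise: true local variance c(z, 0) = 1

# wavelet periodogram I^(j)_k = (sum_s X_s psi_{j,k-s})^2, boundary values copied
chat = np.zeros(T)
for j in range(-1, -6, -1):       # scales -1..-5 (finite truncation of (3.65))
    psi = haar_vec(j)
    d = np.convolve(X, psi)[:T]   # d[k] = sum_s X_s psi[k - s]
    d[:len(psi) - 1] = d[len(psi) - 1]
    chat += 2.0 ** j * d ** 2     # raw local-variance estimator (3.65)

g = 129                           # bandwidth of the uniform smoothing kernel
ctilde = np.convolve(chat, np.ones(g) / g, mode="valid")
print(round(chat.mean(), 2), bool(ctilde.var() < chat.var()))
```

The raw estimator fluctuates around the true variance without converging pointwise — its asymptotic variance is the constant in (3.72) — whereas the smoothed version is much less variable, which is exactly why the bandwidth $g$ enters as a nuisance parameter.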


The vector of the right-hand-side coefficients in (3.22), given by
\[
\left\{ \sum_{j=-\infty}^{-1} \sum_{k\in\mathbb{Z}} S_j\!\left(\frac{k}{T}\right) \psi_{j,k-(t+h-1)}\,\psi_{j,k-m} \right\}_{m=0,1,\ldots,t-1}, \tag{3.80}
\]
can be approximated by the vector of corresponding local autocovariances
\[
\left\{ c\!\left(\frac{t+h+m-1}{2T},\, t+h-1-m\right) \right\}_{m=0,1,\ldots,t-1}. \tag{3.81}
\]
Therefore, we estimate it by
\[
\left\{ \tilde c\!\left(\frac{t+h+m-1}{2T},\, t+h-1-m\right) \right\}_{m=0,1,\ldots,t-1}, \tag{3.82}
\]
where $\tilde c(z,\tau)$ for $z > (t-1)/T$ has been extrapolated from the values of $\hat c(0,\tau),\ldots,\hat c((t-1)/T,\tau)$ using the same kernel smoothing procedure with bandwidth $g$.

To achieve greater forecast accuracy in practice, we reduce the dimension of the system of prediction equations (3.22) by considering a "clipped" predictor
\[
\hat X^{(p)}_{t+h-1,T} = \sum_{s=t-p}^{t-1} b^{(h)}_{t-1-s;T}\,X_{s,T}, \tag{3.83}
\]
where the index $p$ needs to be chosen from the set $\{1,\ldots,t\}$. The matrix of the resulting system of empirical prediction equations is now of dimension $p \times p$, instead of $(t-1) \times (t-1)$, and its entries are estimated using the procedure described above. The construction (3.83) is reminiscent of the classical idea of AR($p$) approximation for stationary processes.

In order for the forecasting to be successful, the two nuisance parameters $g$ and $p$ need to be chosen in a data-driven manner. Section 3.5.3 describes a computational procedure for performing this selection.

3.5.2 Future observations in rescaled time

An important ingredient of the rescaled-time concept, introduced in Section 2.2.4, is that the data come in the form of a triangular array whose rows correspond to different stochastic processes, only linked through the asymptotic wavelet spectrum


sampled on a finer and finer grid. This mechanism is inherently different to what we observe in practice, where, typically, observations arrive one by one and neither the values of the "old" observations, nor their corresponding second-order structure, change when a new observation arrives.

One way to reconcile the practical setup with the theory is to assume that for an observed process $X_0,\ldots,X_{t-1}$, there exists a doubly-indexed LSW2 process $Y$ such that $X_k = Y_{k,T}$ for $k = 0,\ldots,t-1$. When a new observation $X_t$ arrives, the underlying LSW2 process changes, i.e. there exists another LSW2 process $Z$ such that $X_k = Z_{k,T+1}$ for $k = 0,\ldots,t$. An essential point underlying the adaptive algorithm of the next subsection is that the spectra of $Y$ and $Z$ are close to each other, due to the above construction and the regularity assumptions imposed by the definition of an LSW2 process (in particular, the Lipschitz continuity of $W_j(z)$).

For clarity of presentation, we assume from now on that $h = 1$. The objective of the algorithm is to choose appropriate values of the two nuisance parameters $g$ and $p$ (see the next subsection) in order to forecast $X_t$ from $X_0,\ldots,X_{t-1}$. Suppose that these parameters have been selected well, i.e. that the forecasting has been successful. The closeness of the two spectra implies that we can also expect to successfully forecast $X_{t+1}$ from $X_0,\ldots,X_t$ using the same, or possibly "neighbouring", values of the nuisance parameters.

Bearing in mind the above discussion, we introduce the algorithm with a slight abuse of notation: we drop the second subscript when referring to the observed time series.

3.5.3 Data-driven choice of parameters

The idea of the procedure is to start with some initial values of $p$ and $g$ and to gradually update them using a criterion which measures how well the series gets predicted using a given pair of parameters. This type of approach is in the spirit of adaptive forecasting (Ledolter (1981)).

Suppose that we observe the series up to $X_{t-1}$ and want to forecast $X_t$ using


an appropriately chosen pair $(p, g)$. The idea of the method is to move backwards by $s$ observations and choose an initial pair $\big(p^{(0)}_0, g^{(0)}_0\big)$ for predicting $X_{t-s}$ from the observed series up to $X_{t-s-1}$. We compute the forecast of $X_{t-s}$ using not only $\big(p^{(0)}_0, g^{(0)}_0\big)$, but also the 8 "neighbouring" parameter pairs
\[
\big(p^{(0)}_0 - 1,\, g^{(0)}_0 - \delta\big),\ \big(p^{(0)}_0,\, g^{(0)}_0 - \delta\big),\ \ldots,\ \big(p^{(0)}_0 + 1,\, g^{(0)}_0 + \delta\big),
\]
for a pre-selected constant $\delta$. As the true value of $X_{t-s}$ is known, we are able to use a preset criterion to compare the 9 prediction results, and we set $\big(p^{(0)}_1, g^{(0)}_1\big)$ to be the pair that corresponds to the best result. In the next step, we use the pair $\big(p^{(0)}_1, g^{(0)}_1\big)$, as well as its 8 neighbours, to predict $X_{t-s+1}$ from $X_0,\ldots,X_{t-s}$, and obtain $\big(p^{(0)}_2, g^{(0)}_2\big)$ as the pair which resulted in the best forecast. Continuing in the same fashion until we reach $X_{t-1}$, we finally obtain an updated pair $\big(p^{(0)}_s, g^{(0)}_s\big)$, which is used to perform the actual prediction of $X_t$.

Several different criteria can be used to compare the quality of the pairs of parameters at each step. Denote by $\hat X_{t-i}(p,g)$ the predictor of $X_{t-i}$ computed using the pair $(p,g)$, and by $I_{t-i}(p,g)$ the corresponding 95% prediction interval based on the assumption of Gaussianity:
\[
I_{t-i}(p,g) = \big[ -1.96\,\hat\sigma_{t-i}(p,g) + \hat X_{t-i}(p,g),\ 1.96\,\hat\sigma_{t-i}(p,g) + \hat X_{t-i}(p,g) \big], \tag{3.84}
\]
where $\hat\sigma^2_{t-i}(p,g)$ is the estimate of $\mathrm{MSPE}(\hat X_{t-i}(p,g), X_{t-i})$ computed using formula (3.12) with the remainder neglected. The criterion which we use in the simulations reported in the next section is to compute
\[
\frac{\big| X_{t-i} - \hat X_{t-i}(p,g) \big|}{\mathrm{length}\{I_{t-i}(p,g)\}}
\]
for each of the 9 pairs at each step of the procedure and select the updated pair as the one which minimises this ratio.

We also need to choose the initial parameters $\big(p^{(0)}_0, g^{(0)}_0\big)$ and the number $s$ of data points at the end of the series which are used in the procedure. We suggest that $s$ should be set to the length of the largest segment at the end of the series


which does not contain any apparent breakpoints observed after a visual inspection. If, after a single pass along the segment X_{t-s}, ..., X_{t-1}, the forecasts are still inaccurate, then one or more further passes may be necessary: one possibility is then to set (p_0^{(n)}, g_0^{(n)}) := (p_s^{(n-1)}, g_s^{(n-1)}) and proceed as before.

Note that the procedure is completely on-line: when the observation X_t becomes available, only a single update of the pair (p_s^{(0)}, g_s^{(0)}) is needed to obtain a "good" pair of parameters for predicting X_{t+1}.

There are, obviously, many possible variants of the algorithm. Possible modifications include, for example, using a different criterion, restricting the allowed parameter space for (p, g), penalising certain regions of the parameter space, or permitting more than one parameter update at each time point.

The following section presents an application of the algorithm to a real data set. A more theoretical study of this algorithm is left for future work.

3.6 Application of the predictor to real data

In this section, we study the wind speed anomaly index, i.e. its standardised deviation from the mean, in a specific region of the Pacific (12-2N, 160E-70W). Modelling this anomaly helps to understand the El Niño effect in that region (see Philander (1990) for a detailed overview). The time series composed of T = 910 monthly observations is available free of charge at http://tao.atmos.washington.edu/data sets/eqpacmeridwindts. Figure 3.1 shows the plot of the series.

Throughout this section, we use Haar wavelets to estimate the local (co)variance. Having provisionally made a safe assumption of the possible non-stationarity of the data, we first attempt to find a suitable pair of parameters (p, g) which will be used for forecasting the series.
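The parameter-update loop of Section 3.5 can be sketched as follows. This is an illustrative Python version, not the S-Plus implementation on the associated CD: the LSW one-step predictor itself is not reproduced, so `forecast` is a placeholder argument that, given the series up to time u-1 and a pair (p, g), should return the point forecast and the half-width 1.96·σ of the interval (3.84); `interval_score` is the ratio criterion described above.

```python
import numpy as np

# Illustrative sketch of the adaptive (p, g) update of Section 3.5.
# `forecast` is a placeholder for the LSW one-step predictor: given the
# series up to time u-1 and a pair (p, g), it should return the point
# forecast of X_u and the half-width 1.96 * sigma of interval (3.84).

def interval_score(x_true, x_hat, half_width):
    """Criterion |X - Xhat| / length(I) used to rank the candidate pairs."""
    return abs(x_true - x_hat) / (2.0 * half_width)

def adapt_parameters(x, s, p0, g0, delta, forecast):
    """One pass over the last s observations, updating (p, g) greedily."""
    p, g = p0, g0
    t = len(x)
    for u in range(t - s, t):
        # the current pair and its 8 neighbours on the (p, g) grid
        candidates = [(p + dp, g + dg)
                      for dp in (-1, 0, 1) for dg in (-delta, 0, delta)]
        scores = []
        for pc, gc in candidates:
            if pc < 1 or gc < 1:          # keep parameters in a valid range
                scores.append(np.inf)
                continue
            x_hat, hw = forecast(x[:u], pc, gc)
            scores.append(interval_score(x[u], x_hat, hw))
        p, g = candidates[int(np.argmin(scores))]   # best pair carried forward
    return p, g
```

A second pass simply re-runs `adapt_parameters` with the returned pair as the new initial values, mirroring the rule (p_0^{(n)}, g_0^{(n)}) := (p_s^{(n-1)}, g_s^{(n-1)}) above; the range guard is one instance of the "restricting the allowed parameter space" modification just mentioned.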
By inspecting the acf of the series, and by trying different values of the bandwidth, we have found that the pair (7, 70) works well for many segments of the data; indeed, the segment of 100 observations from

Figure 3.1: The wind anomaly index (in cm/s). The two vertical lines indicate the segment shown in Figure 3.2.

June 1928 to October 1936 gets predicted very accurately in one-step prediction: 96% of the actual observations are contained in the corresponding 95% prediction intervals (formula (3.84)).

However, the pair (7, 70) does not appear to be uniformly well suited for forecasting the whole series. For example, in the segment of 40 observations between November 1986 and February 1990, only 5% of the observations fall into the corresponding one-step prediction intervals computed using the above pair of parameters. This provides strong evidence that the series is non-stationary (indeed, if it was stationary, we could expect to obtain a similar percentage of accurately predicted values in both segments).

Motivated by the above observation, we now apply the algorithm described in the previous section to the segment of 40 observations mentioned above, setting the

Figure 3.2: Comparison between the one-step prediction in the LSW2 model (dashed lines) and AR (dotted lines). The middle line is the predicted value, the top (bottom) line is the upper (lower) end of the corresponding 95% prediction interval.

initial parameters to (7, 70). After the first pass along the segment, the parameters drift up to (14, 90), and 85% of the observations fall within the prediction intervals, which is indeed a dramatic improvement over the 5% obtained without applying the adaptive algorithm. In the second pass, we set the initial values to (14, 90), and obtain a 92.5% coverage by the one-step prediction intervals, with the parameters drifting up to (14, 104). In the last iteration, we finally obtain a 95% coverage, and the parameters get updated to (14, 114). We now have every reason to believe that this pair of parameters is well suited for one-step prediction within a short distance of February 1990. Without performing any further updates, we apply the one-step forecasting procedure to predict, one by one, the eight observations which follow February 1990, the prediction parameters being fixed at (14, 114). The results

are plotted in Figure 3.2, which also compares our results to those obtained by means of AR modelling. At each time point, the order of the AR process is chosen as the one that minimises the AIC criterion, and then the parameters are estimated by means of the standard S-Plus routine. We observe that, for both models, all of the true observed values fall within the corresponding one-step prediction intervals. However, the main gain from our procedure is that its prediction intervals are on average 17.45% narrower. This result is not peculiar to AR modelling, as the percentage is similar in comparison with other stationary models, like ARMA(2,10), believed to fit the series accurately. A similar phenomenon has also been observed at other points of the series.

3.7 Conclusion

In this chapter, we have investigated several theoretical and practical aspects of forecasting Gaussian LSW processes. As the model is Gaussian, we have considered the linear predictor whose coefficients minimise the Mean Square Prediction Error (MSPE). The exact MSPE, however, involves parameters which are unidentifiable in the LSW model. Therefore, we have considered the minimisation of an approximation to the MSPE which involves the (uniquely defined) wavelet spectrum. The derivation of our asymptotic results has been possible due to the rescaled time concept, which is one of the ingredients of the LSW framework.

To overcome a theoretical difficulty arising in the approximation, we have introduced a slight modification to the LSW framework and called the new class the "LSW2 model". All subsequent results have been derived for the new model. In particular, we have generalised Kolmogorov's formula for the one-step-ahead prediction error.

In practice, the entries of the prediction matrix in the Yule-Walker equations need to be estimated. We have analysed the behaviour of the first two moments of

the arising wavelet-based estimators and concluded that the estimators are asymptotically unbiased but inconsistent. Thus, the estimators need to be smoothed over time, and we therefore have to choose the smoothing parameter (e.g. the bandwidth of the smoothing kernel). Moreover, we need to reduce the dimension of the prediction equations to avoid excessive inaccuracy of the resulting prediction coefficients due to estimation errors. We have proposed an automatic computational procedure for selecting these two parameters. Our algorithm is in the spirit of adaptive forecasting, as it gradually updates the two parameters based on the success of prediction.

We have applied our new algorithm to a time series of monthly values of the wind speed anomaly index in a specific region of the Pacific. Our non-parametric forecasting algorithm shows interesting advantages over the classical parametric alternative (AR forecasting). Moreover, we believe that one of the biggest advantages of our new algorithm is that it can be successfully applied to a variety of data sets, ranging from financial log-returns (Chapter 4) to series traditionally modelled as ARMA processes, including in particular data sets which are not, or do not appear to be, second-order stationary. The S-Plus routines implementing the algorithm, as well as the data set, are included on the associated CD.
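The classical AR benchmark referred to above (order selected by AIC, coefficients then fitted by a standard routine) can be sketched compactly. This is an illustrative Python/NumPy version based on the Yule-Walker equations, not the S-Plus routine used in the thesis; the AIC form n·log(σ̂²) + 2p below is one common convention.

```python
import numpy as np

# Illustrative AR benchmark: fit AR(p) by the Yule-Walker equations for
# p = 1, ..., pmax and pick the order minimising AIC.  This mimics, in
# spirit, the standard autoregression routine used for the comparison.

def yule_walker(x, p):
    """Yule-Walker estimates of AR(p) coefficients and innovation variance."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # biased sample autocovariances r(0), ..., r(p)
    r = np.array([np.sum(x[:n - k] * x[k:]) / n for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, r[1:p + 1])
    sigma2 = r[0] - phi @ r[1:p + 1]      # innovation variance estimate
    return phi, sigma2

def ar_order_by_aic(x, pmax):
    """AR order p in 1..pmax minimising AIC = n*log(sigma2_hat) + 2*p."""
    n = len(x)
    aics = [n * np.log(yule_walker(x, p)[1]) + 2 * p
            for p in range(1, pmax + 1)]
    return int(np.argmin(aics)) + 1
```

The one-step AR forecast and its Gaussian prediction interval then follow from `phi` and `sigma2` in the usual way, giving the dotted bands of Figure 3.2.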


Chapter 4

Modelling financial log-return series using wavelets and rescaled time

In this chapter, we attempt to model financial log-return series as locally stationary time series in a setup which combines wavelets and the rescaled time concept, and is closely related to the LSW framework of Nason et al. (2000). The initial motivation for this research can be summarised as follows:

1. As was mentioned in Section 2.2, stationary linear time series models cannot capture the "stylised facts" of financial log-return series and, to preserve stationarity, non-linear models, such as (G)ARCH or Stochastic Volatility, have been proposed. However, some authors (see the references in Section 2.2.2) have recently argued that even when non-linear models are used, non-stationary modelling may still be preferred. This provokes another interesting general question: once we abandon the assumption of stationarity, is non-linearity still needed to model financial log-returns accurately, or is it sufficient to stick to linear models?

2. Some authors observe that various economic factors operate at different time scales, which may translate into a possible "multiscale" mechanism underlying financial log-return series (see for example Calvet & Fisher (2001)). On

the other hand, wavelets are a commonly used tool in the analysis of multiscale phenomena, so a wavelet-based approach may prove to be a suitable modelling technique here.

3. The rescaled time idea, whereby the time-varying (first and) second order quantities of a process are defined on a compact interval, as in non-parametric regression, enables meaningful asymptotic considerations (see Section 2.2.4). Therefore the hope is that, by modelling financial log-returns in a rescaled time framework, we will be able to keep track of the asymptotic behaviour of the statistics of interest (e.g. sample autocorrelations of the squared returns), which may be helpful in explaining the commonly observed "stylised facts" of financial time series.

The chapter is organised as follows: in Section 4.1, we motivate our methodology by arguing that daily returns on the FTSE 100 index can be adequately modelled as Gaussian time-modulated white noise (TMWN). In Section 4.2, we introduce the LSW3 model as a modification of the LSW framework of Nason et al. (2000), and show that Gaussian TMWN is a special case of an LSW3 process. In Section 4.3, we provide theoretical evidence that LSW3 processes can capture most of the stylised facts of financial time series. In Section 4.4, we introduce a new estimation approach for LSW3 processes (suitable for log-returns), and demonstrate its superiority to the general method of Nason et al. (2000). In Section 4.5, we provide an interesting example of exploratory data analysis using the LSW3 model. Finally, in Section 4.6, we apply the adaptive forecasting algorithm of Section 3.5 to log-returns, and provide a comparison with forecasts based on GARCH modelling.

4.1 Motivating example

In this section, we motivate our "linear non-stationary" approach by arguing that returns on the daily closing values of the FTSE 100 index can be adequately

modelled as Gaussian time-modulated white noise (TMWN), i.e. a process of the form X_t = σ_t Z_t, where σ_t is a deterministic sequence, and the Z_t are independent N(0,1). In Section 4.2, we show that Gaussian TMWN is a special case of a wavelet-based time series model, closely related to the LSW framework recalled in Section 2.2.4.

For the purpose of this section, let X_t denote 2158 consecutive observations of logged and differenced daily closing values of the FTSE 100 index, from 22/23 October 1992 to 10/11 May 2001. The source of the data here, and throughout the rest of the chapter, is http://bossa.pl/notowania/daneatech/metastock (page in Polish).

X_t is plotted in the top left subfigure of Figure 4.1. Superimposed on the plot is an estimate σ̂_t of the local standard deviation σ_t (the estimate was obtained by smoothing X_t² using a Gaussian kernel with the bandwidth chosen by trial and error, and then square-rooting the result; see Section 4.4 for automatic methods of estimation). Following down the left-hand column, the next plot shows the sample autocorrelation of X_t, and the plot below it the sample autocorrelation of X_t². The bottom left subfigure shows the Q-Q plot of X_t against the normal quantiles. From those plots, it is evident that X_t obeys the well-known "stylised facts": the sample autocorrelations of X_t are negligible, but the sample autocorrelations of X_t² are significant; volatility is clustered; the marginal distribution of X_t is heavy-tailed.

The right-hand column provides evidence that X_t can be modelled as Gaussian TMWN, which is a linear, but non-stationary, stochastic process. Indeed, the top plot shows Z_t = X_t/σ̂_t, and the plots in the 2nd and 3rd rows the sample acf of Z_t and Z_t², respectively. The bottom right subfigure shows the Q-Q plot of Z_t against the normal quantiles. From the inspection of the sample autocorrelation functions of Z_t and Z_t², it appears that, as a first approximation, Z_t can be modelled fairly accurately as an i.i.d. sample of N(0,1) variables. This in turn implies that X_t can be modelled as Gaussian TMWN: clearly, there exists a σ_t such that

X_t = σ_t Z_t with Z_t i.i.d. ~ N(0,1).

One of the consequences of the non-stationarity of X_t is the fact that the sample acf is simply not an appropriate tool for computing the acf of X_t or X_t². We would submit, and will argue this point later in the chapter, that the "long memory" effect in squared log-returns on indices is nothing else than a spurious effect of applying the sample acf to non-stationary data (see Mikosch & Starica (2003) for similar considerations in the GARCH framework).

Having demonstrated that the daily FTSE 100 can be modelled as Gaussian TMWN, we now proceed to define our wavelet-based model (which is a modification of the LSW setup) and show that Gaussian TMWN is its special case. In Section 4.5, we come back to the example of the FTSE 100 and model this series in our wavelet framework. We show that, in this way, more local features of the FTSE 100 data can be picked up.

4.2 Wavelet-based model

In the rescaled time framework, the TMWN process is defined as

X_{t,T} = σ(t/T) Z_{t,T},   (4.1)

for t = 0, ..., T-1 and T = 1, 2, ..., where σ ∈ C[0,1] is a "smooth" time-varying standard deviation function, and the Z_{t,T} are i.i.d. N(0,1). Motivated by the discussion above, we wish to embed this process in a larger, "multiscale" stochastic framework. Clearly, the LSW (or LSW2) model is a good candidate, possessing all the required characteristics: non-stationarity, linearity, rescaled time and a multiscale structure. However, we will now show that, in the present formulation, TMWN is not embedded in either the LSW or the LSW2 model:

TMWN ⊄ LSW(2).   (4.2)
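The devolatilisation of Section 4.1 can be reproduced on simulated data in a few lines. This is a hedged sketch, not the thesis code: the volatility function, bandwidth and sample size below are arbitrary illustrative choices, and a simple Nadaraya-Watson smoother stands in for the trial-and-error kernel smoothing of X_t² described above (automatic methods are the subject of Section 4.4).

```python
import numpy as np

# Sketch of the devolatilisation of Section 4.1 on simulated data: draw
# Gaussian time-modulated white noise X_t = sigma(t/T) Z_t, estimate
# sigma_t by Gaussian-kernel smoothing of X_t^2 followed by a square
# root, and standardise.  All tuning constants are illustrative.

rng = np.random.default_rng(0)
T = 2048
z = np.linspace(0.0, 1.0, T)
sigma = 0.01 + 0.02 * np.sin(np.pi * z) ** 2     # a smooth sigma(t/T)
x = sigma * rng.standard_normal(T)               # TMWN sample

def kernel_smooth(y, bandwidth):
    """Nadaraya-Watson smoother of y with a Gaussian kernel (index units)."""
    t = np.arange(len(y), dtype=float)
    w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / bandwidth) ** 2)
    return (w @ y) / w.sum(axis=1)

sigma_hat = np.sqrt(kernel_smooth(x ** 2, bandwidth=60.0))
z_t = x / sigma_hat       # approximately i.i.d. N(0, 1), cf. Figure 4.1
```

On such simulated data, σ̂_t tracks σ(t/T) fairly closely and the standardised series z_t behaves approximately like an i.i.d. N(0,1) sample, mirroring the right-hand column of Figure 4.1.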

Figure 4.1: Left-hand column, from top to bottom: X_t with σ̂_t superimposed, acf of X_t, acf of X_t², qqnorm plot of X_t. Right-hand column, from top to bottom: Z_t, acf of Z_t, acf of Z_t², qqnorm plot of Z_t. See Section 4.1 for a discussion.
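The claim (4.2) can be illustrated numerically before the formal derivation that follows: for TMWN the scale amplitudes are W_j(z) = 2^{j/2} σ(z) over scales j = -1, -2, ..., so the summand 2^{-j}·Lip(W_j) appearing in the LSW/LSW2 summability conditions (cf. Remark 2 below) grows without bound, whereas the quantity 2^{-j}·Lip(S_j) bounded by condition (4.5) is constant in j. A sketch, with σ(z) = 1 + z chosen for concreteness (Lipschitz constant 1):

```python
import numpy as np

# Numerical illustration of (4.2): for TMWN, W_j(z) = 2^{j/2} sigma(z)
# over scales j = -1, -2, ....  With sigma(z) = 1 + z (an arbitrary
# concrete example), Lip(W_j) = 2^{j/2} and Lip(S_j) = 2^j * Lip(sigma^2),
# where Lip(sigma^2) = 4 on [0, 1].

js = -np.arange(1, 21, dtype=float)           # scales j = -1, ..., -20

# LSW/LSW2-style summand 2^{-j} * Lip(W_j) = 2^{-j/2}: partial sums diverge
old_terms = 2.0 ** (-js / 2.0)
partial = np.cumsum(old_terms)

# LSW3 condition (4.5): 2^{-j} * Lip(S_j) = Lip(sigma^2), bounded in j
lip_sigma2 = 4.0
new_terms = 2.0 ** (-js) * (2.0 ** js * lip_sigma2)

print(partial[-1])        # grows without bound as scales are added
print(new_terms.max())    # constant: condition (4.5) holds with D = 4
```

This is exactly why Definition 4.2.1 imposes a condition on S_j = W_j² rather than a summability condition on the W_j themselves.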

Indeed, multiplying both sides of (3.70) by σ²(z), we obtain that the wavelet spectrum of a TMWN process (4.1) is

S_j(z) = 2^j σ²(z),   (4.3)

which leads to W_j(z) = 2^{j/2} σ(z). Assume that σ(z) is Lipschitz-continuous with Lipschitz constant L. The Lipschitz constants L_j for W_j are therefore equal to L_j = 2^{j/2} L; however, this clearly violates the summability conditions (2.27) and (3.32).

To remedy this unwelcome situation, we introduce an alternative model which is a modified version of the LSW and LSW2 setups. We call the new class "LSW3".

Definition 4.2.1 A triangular stochastic array {X_{t,T}}_{t=0}^{T-1}, for T = 1, 2, ..., is in the class of LSW3 processes if there exists a mean-square representation

X_{t,T} = Σ_{j=-∞}^{-1} Σ_{k=-∞}^{∞} ω_{j,k;T} ψ_{j,k}(t) ξ_{j,k},   (4.4)

where the ψ_{j,k}(t) are nondecimated discrete wavelet vectors, the ω_{j,k;T} are real constants, and {ξ_{j,k}}_{j,k} are zero-mean orthonormal identically distributed random variables. Also, we assume that for each j ≤ -1, there exists a continuous function W_j(z) : ℝ → ℝ such that S_j := W_j² is Lipschitz with constant L_j and

- W_j(z) = W_j(0) for z < 0 and W_j(z) = W_j(1) for z > 1;
- Σ_{j=-∞}^{-1} S_j < ∞;
- the Lipschitz constants L_j satisfy

  ∃ D < ∞  ∀ j ≤ -1:  L_j 2^{-j} ≤ D;   (4.5)

- there exists a sequence of constants C_j satisfying Σ_{j=-∞}^{-1} C_j < ∞ such that, for each T,

  sup_k |ω²_{j,k;T} - S_j(k/T)| ≤ C_j/T  for all j.   (4.6)

Three remarks are worth making at this point.

1. Nason et al. (2000) control the evolution of the second-order structure of LSW processes by making certain assumptions on the Lipschitz constants of {W_j(z)}_j, as well as on the distances sup_k |ω_{j,k;T} - W_j(k/T)| (we adopt the same convention in the definition of the LSW2 process). However, what is really required in the second-order theory is the Lipschitzness of S_j(z) = W_j(z)² and bounds on the distances sup_k |ω²_{j,k;T} - S_j(k/T)|. Here is how the latter set of assumptions relates to the former: we have

|ω²_{j,k;T} - S_j(k/T)| = |ω_{j,k;T} - W_j(k/T)| · |ω_{j,k;T} - W_j(k/T) + 2 W_j(k/T)|
  ≤ (C_j/T) (C_j/T + 2 W_j(z))
  ≤ (C_j/T) (C_j/T + 2 (W_j(z) ∨ 1))
  ≤ (C_j/T) (C_j/T + 2 (S_j(z) ∨ 1))
  ≤ (C_j/T) (C_j/T + 2 (sup_z Σ_j S_j(z) ∨ 1))
  ≤ K_1 C_j/T.

Also,

|S_j((k+l)/T) - S_j(k/T)| ≤ (L_j|l|/T) · |W_j((k+l)/T) - W_j(k/T) + 2 W_j(k/T)|
  ≤ (L_j|l|/T) (L_j|l|/T + 2 (sup_z Σ_j S_j(z) ∨ 1))
  ≤ K_2 L_j|l|/T.

Therefore, in deriving rates of convergence in the LSW (LSW2) model, we can "ignore" the constants K_1, K_2 and assume that |ω²_{j,k;T} - S_j(k/T)| ≤ C_j/T and |S_j((k+l)/T) - S_j(k/T)| ≤ L_j|l|/T, which is implicitly done both in Nason et al. (2000) and in Chapter 3. Note that the definition of an LSW3 process contains these two assumptions in an explicit form, but does not contain the "redundant" assumptions that |ω_{j,k;T} - W_j(k/T)| ≤ C_j/T or |W_j((k+l)/T) - W_j(k/T)| ≤ L_j|l|/T.

2. Note that the summability conditions (2.27) and (3.32) are stronger than (4.5). Indeed, Σ_j 2^{-j} L_j < ∞ implies (but is not equivalent to) 2^{-j} L_j → 0, which in turn implies (but is not equivalent to) (4.5).

3. Note that we also define W_j(z) and S_j(z) beyond the interval [0,1] (we assume that W_j and S_j are continuous everywhere and constant outside [0,1]). This will be needed later in the proof of Proposition 4.2.1.

It is easy to see that Gaussian TMWN with σ Lipschitz satisfies the assumptions of Definition 4.2.1 with ω_{j,k;T} = W_j(k/T) = σ(k/T) 2^{j/2}.

The definitions of the local (co)variance and other quantities in the LSW3 model are analogous to those in the LSW framework (see Section 2.2.4). Under the assumptions of Definition 4.2.1, the evolutionary wavelet spectrum S_j(z) and the local autocovariance c(z, τ) remain uniquely defined (the proof of this statement is identical to Nason et al. (2000), Theorem 1). We are also in a position to prove the following proposition:

Proposition 4.2.1 Under the assumptions of Definition 4.2.1, ||c_T - c||_{L∞} = O(T^{-1} log(T)).

Proof. We have

|c_T(l/T, τ) - c(l/T, τ)| = | Σ_{j,k} (ω²_{j,k;T} - S_j(l/T)) ψ_{j,k-l} ψ_{j,k-l-τ} |
  ≤ Σ_{j,k} (C_j/T + |S_j(k/T) - S_j(l/T)|) |ψ_{j,k-l} ψ_{j,k-l-τ}|.   (4.7)

Now, observe that, on the one hand,

|S_j(k/T) - S_j(l/T)| ≤ L_j |k-l|/T ≤ L_j 𝓛_j/T,   (4.8)

due to the compact support of ψ_j (here 𝓛_j denotes the length of the support of ψ_j). However, on the other hand,

|S_j(k/T) - S_j(l/T)| ≤ L_j,   (4.9)


using the fact that $S_j$ is Lipschitz (with constant $L_j$) and constant outside $[0,1]$. Therefore, continuing from (4.7), we have
$$\sum_{j,k} \left( \frac{C_j}{T} + \left| S_j\!\left(\frac{k}{T}\right) - S_j\!\left(\frac{l}{T}\right) \right| \right) |\psi_{j,k-l}\psi_{j,k-l-\tau}| \le O(T^{-1}) + \sum_j L_j \min\!\left( \frac{\mathcal{L}_j}{T}, 1 \right), \qquad (4.10)$$
by applying the Cauchy inequality to $\sum_k |\psi_{j,k-l}\psi_{j,k-l-\tau}|$. Let us now concentrate on the sum in (4.10). Recall that $\mathcal{L}_j \le M2^{-j}$ for some constant $M$. We have
\begin{align*}
\sum_j L_j \min\!\left( \frac{\mathcal{L}_j}{T}, 1 \right) &\le \sum_j L_j \min\!\left( \frac{M2^{-j}}{T}, 1 \right) = \frac{M}{T} \sum_{j=-\lfloor\log_2(T/M)\rfloor}^{-1} L_j 2^{-j} + \sum_{j<-\lfloor\log_2(T/M)\rfloor} L_j \\
&\le \frac{M}{T} \sum_{j=-\lfloor\log_2(T)\rfloor}^{-1} D + D \sum_{j<-\lfloor\log_2(T/M)\rfloor} 2^j \le \frac{MD\log_2(T)}{T} + O(DM/T),
\end{align*}
which completes the proof. □

Throughout the rest of the chapter, we will work with Definition 4.2.1, rather than Definition 2.2.1 or 3.2.2.

Innovations $\xi_{j,k}$. Throughout the chapter, we stick to $\xi_{j,k}$ i.i.d. $N(0,1)$. Gaussian innovations cannot account for skewed "marginal" distributions or extreme events, such as those present in the Nikkei index (left-hand plot in Figure 4.2) or the Dow Jones Industrial Average (DJIA) index (right-hand plot in Figure 4.2). We believe that these stylised facts can be captured by an appropriate choice of the distribution of $\xi_{j,k}$; e.g. a mixture of normals would have a better chance of picking up the occasional "spikes" in the series (see, again, Figure 4.2). Also, a combination of skewed innovations and "skewed" wavelets (i.e. such that $\sum_k \psi^3_{j,k} \ne 0$) would be able to pick up the often-observed skewness of the log-return data. However, the emphasis in


this chapter is on the non-stationarity of the log-return series, and not on the possible non-Gaussianity of the innovations. Therefore, we restrict ourselves to Gaussian innovations in the theoretical considerations, leaving an extension to other distributions as an interesting direction for future study.

Figure 4.2: Left-hand plot: log-returns on daily closing values of Nikkei (5/6 Jan 1970 – 11/14 May 2001). Right-hand plot: log-returns on daily closing values of the Dow Jones Industrial Average (3/4 Jan 1995 – 10/11 May 2001).

Trend. Throughout the chapter, we assume $E(X_{t;T}) = 0$ (as is obvious from Definition 4.2.1). A more thorough study would also incorporate a trend $\mu(t/T)$ in the model. This trend could then be estimated by wavelet methods, see e.g. von Sachs & MacGibbon (2000).

4.3 Explanation of the stylised facts

In this section, we demonstrate that Gaussian LSW3 processes can successfully account for the following stylised facts of financial log-returns:

• heavy tails of the "marginal" distribution,
• negligible sample autocorrelations,
• non-negligible sample autocorrelations of the squares,


• clustering of volatility.

4.3.1 Heavy tails of the "marginal" distribution

In this section, we consider the sample second moment and the sample fourth moment:
$$m_2^T(X) = \frac{1}{T}\sum_{t=0}^{T-1} X^2_{t;T}, \qquad m_4^T(X) = \frac{1}{T}\sum_{t=0}^{T-1} X^4_{t;T}.$$
For stationary Gaussian processes, we could expect that $m_4^T(X)/(m_2^T(X))^2 \approx 3$. However, the following demonstrates that this ratio is "spuriously" distorted if the variance $\sigma^2(z)$ of $X_{t;T}$ varies over time:
\begin{align*}
E(m_4^T(X)) &= \frac{3}{T}\sum_{t=0}^{T-1} \sigma^4\!\left(\frac{t}{T}\right) + O(\log(T)/T) \\
&= \frac{3}{T}\sum_{t=0}^{T-1} \left( \sigma^2\!\left(\frac{t}{T}\right) - \frac{1}{T}\sum_{s=0}^{T-1}\sigma^2\!\left(\frac{s}{T}\right) \right)^2 + 3\left( \frac{1}{T}\sum_{t=0}^{T-1}\sigma^2\!\left(\frac{t}{T}\right) \right)^2 + O(\log(T)/T) \\
&= \frac{3}{T}\sum_{t=0}^{T-1} \left( \sigma^2\!\left(\frac{t}{T}\right) - \frac{1}{T}\sum_{s=0}^{T-1}\sigma^2\!\left(\frac{s}{T}\right) \right)^2 + 3\left( E(m_2^T(X)) \right)^2 + O(\log(T)/T).
\end{align*}
For the purpose of this section, denote the first summand in the above formula by $A_2$. Obviously, $A_2 = 0$ for all $T$ if and only if $\sigma^2(z)$ is constant w.r.t. $z$. Therefore, for a non-constant $\sigma^2(z)$ and for large $T$, we have
$$\frac{m_4^T(X)}{(m_2^T(X))^2} \approx \frac{A_2}{(m_2^T(X))^2} + 3 > 3. \qquad (4.11)$$
The above relationship provides a heuristic explanation of the fact that the marginal distribution of processes with a non-constant variance appears heavy-tailed when the sample fourth moment and the sample second moment are (incorrectly) applied to them.
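The effect behind (4.11) is easy to reproduce by simulation. The sketch below is an illustration under an arbitrary two-regime choice of $\sigma$ (not an analysis from the thesis): it draws Gaussian time-modulated white noise and computes the moment ratio, which comes out well above 3 even though every observation is Gaussian.

```python
import numpy as np

# Time-modulated white noise: independent Gaussians whose standard deviation
# jumps from 0.5 to 2.0 halfway through the sample (illustrative schedule).
rng = np.random.default_rng(1)
T = 2**14
sigma = np.where(np.arange(T) < T // 2, 0.5, 2.0)
X = sigma * rng.standard_normal(T)

m2 = np.mean(X**2)
m4 = np.mean(X**4)
# Population limit of this ratio is 3 * E[sigma^4] / (E[sigma^2])^2 ≈ 5.34
# for this schedule, illustrating the A2 term in (4.11).
ratio = m4 / m2**2
```

A constant $\sigma$ would drive the ratio back to roughly 3, in line with the "if and only if" characterisation of $A_2 = 0$ above.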


4.3.2 Sample autocorrelations of $X_{t;T}$ and $X^2_{t;T}$

As in Mikosch & Starica (2003), we consider the sample autocovariance function
$$\gamma_T(X;h) = \frac{1}{T}\sum_{t=0}^{T-1-h} X_{t;T}X_{t+h;T} - \left( \frac{1}{T}\sum_{t=0}^{T-1} X_{t;T} \right)^2, \qquad (4.12)$$
and the sample autocorrelation function
$$\rho_T(X;h) = \frac{\gamma_T(X;h)}{\gamma_T(X;0)}. \qquad (4.13)$$
Also, define the scalogram:
$$\tilde S^T_j = \frac{1}{T}\sum_{t=0}^{T-1} S_j\!\left(\frac{t}{T}\right). \qquad (4.14)$$
The following proposition provides a representation of the expectation of the sample autocovariance in terms of the scalogram. The implications of the proposition are discussed after the proof.

Proposition 4.3.1 Let an LSW3 process $X_{t;T}$ satisfy
$$\sup_z \sum_{\tau=-\infty}^{\infty} |c(z;\tau)| < \infty. \qquad (4.15)$$
We have
$$E(\gamma_T(X;h)) = \sum_{j=-\infty}^{-1} \tilde S^T_j \Psi_j(h) + O\!\left( \frac{h+\log(T)}{T} \right). \qquad (4.16)$$

Proof. We have
\begin{align*}
E\left\{ \frac{1}{T}\sum_{t=0}^{T-1-h} X_{t;T}X_{t+h;T} - \left( \frac{1}{T}\sum_{t=0}^{T-1} X_{t;T} \right)^2 \right\}
&= \frac{1}{T}\sum_{t=0}^{T-1-h} c\!\left(\frac{t}{T};h\right) - \frac{1}{T^2}\sum_{t,t'=0}^{T-1} c\!\left(\frac{t}{T};t-t'\right) + O\!\left(\frac{\log(T)}{T}\right) \\
&= \frac{1}{T}\sum_{t=0}^{T-1} c\!\left(\frac{t}{T};h\right) - \frac{1}{T^2}\sum_{t,t'=0}^{T-1} c\!\left(\frac{t}{T};t-t'\right) + O\!\left(\frac{h+\log(T)}{T}\right).
\end{align*}
Let us now consider the second summand:
$$\left| \frac{1}{T^2}\sum_{t,t'=0}^{T-1} c\!\left(\frac{t}{T};t-t'\right) \right| \le \frac{1}{T^2}\sum_{t=0}^{T-1}\sum_{\tau=-\infty}^{\infty} \left| c\!\left(\frac{t}{T};\tau\right) \right| \le \frac{1}{T}\sup_z \sum_{\tau=-\infty}^{\infty} |c(z;\tau)| = O(T^{-1}),$$


by assumption (4.15). Noting that $\frac{1}{T}\sum_{t=0}^{T-1} c\!\left(\frac{t}{T};h\right) = \sum_{j=-\infty}^{-1}\tilde S^T_j\Psi_j(h)$ completes the proof. □

The representation (4.16) implies that the sample autocorrelations at positive lags will be negligible provided that $\sum_{j=-\infty}^{-1}\tilde S^T_j\Psi_j(h)$ is "close" to $C\delta_{0,h}$. By formula (3.70), this is guaranteed by $\tilde S^T_j$ being "close" to $C2^j$, which is indeed often the case, as the examples provided in Section 4.5 demonstrate. This would explain the frequently occurring negligible sample autocorrelations of log-returns.

An analogous proposition can be formulated for the sample autocovariance of $X^2_{t;T}$. The following result is true.

Proposition 4.3.2 Let a Gaussian LSW3 process $X_{t;T}$ satisfy (4.15). We have
$$E(\gamma_T(X^2;h)) = \frac{1}{T}\sum_{t=0}^{T-1}\left( \sigma^2\!\left(\frac{t}{T}\right) - \frac{1}{T}\sum_{s=0}^{T-1}\sigma^2\!\left(\frac{s}{T}\right) \right)^2 + \frac{2}{T}\sum_{t=0}^{T-1} c^2\!\left(\frac{t}{T};h\right) + O\!\left(\frac{h+\log(T)}{T}\right).$$

Proof. Denote $\lambda(z) = \sum_\tau c^2(z;\tau)$. By assumption (4.15), $\sup_z \lambda(z) < \infty$. Using Gaussianity, we obtain
\begin{align*}
E\left\{ \frac{1}{T}\sum_{t=0}^{T-1-h} X^2_{t;T}X^2_{t+h;T} - \left( \frac{1}{T}\sum_{t=0}^{T-1} X^2_{t;T} \right)^2 \right\}
&= \frac{1}{T}\sum_{t=0}^{T-1-h} \left\{ \sigma^2\!\left(\frac{t}{T}\right)\sigma^2\!\left(\frac{t+h}{T}\right) + 2\left( c\!\left(\frac{t}{T};h\right) \right)^2 \right\} \\
&\quad - \frac{1}{T^2}\sum_{t,t'=0}^{T-1} \left\{ \sigma^2\!\left(\frac{t}{T}\right)\sigma^2\!\left(\frac{t'}{T}\right) + 2\left( c\!\left(\frac{t}{T};t-t'\right) \right)^2 \right\} + O\!\left(\frac{\log(T)}{T}\right) \\
&= \frac{1}{T}\sum_{t=0}^{T-1} \left\{ \sigma^4\!\left(\frac{t}{T}\right) + 2\left( c\!\left(\frac{t}{T};h\right) \right)^2 \right\} + O\!\left(\frac{h}{T}\right) \\
&\quad - \left( \frac{1}{T}\sum_{t=0}^{T-1}\sigma^2\!\left(\frac{t}{T}\right) \right)^2 - \frac{2}{T^2}\sum_{t=0}^{T-1}\lambda\!\left(\frac{t}{T}\right) + O\!\left(\frac{\log(T)}{T}\right) \\
&= \frac{1}{T}\sum_{t=0}^{T-1}\left( \sigma^2\!\left(\frac{t}{T}\right) - \frac{1}{T}\sum_{s=0}^{T-1}\sigma^2\!\left(\frac{s}{T}\right) \right)^2 + \frac{2}{T}\sum_{t=0}^{T-1} c^2\!\left(\frac{t}{T};h\right) + O\!\left(\frac{h+\log(T)}{T}\right),
\end{align*}
and the proof is completed. □
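The spurious positivity induced by the first summand of Proposition 4.3.2 can likewise be illustrated by simulation. In the sketch below (the variance schedule is an illustrative choice, and the helper `sample_acf` is my own implementation of (4.12)-(4.13), not code from the thesis), the squares of an independent Gaussian sequence show positive sample autocorrelation at every lag considered.

```python
import numpy as np

# Time-modulated white noise: no true serial dependence anywhere.
rng = np.random.default_rng(2)
T = 2**14
sigma = np.where(np.arange(T) < T // 2, 0.5, 2.0)  # illustrative schedule
X = sigma * rng.standard_normal(T)

def sample_acf(y, max_lag):
    """Sample autocorrelation, as in (4.12)-(4.13), at lags 1..max_lag."""
    y = y - y.mean()
    denom = np.sum(y**2)
    return np.array([np.sum(y[:-h] * y[h:]) / denom
                     for h in range(1, max_lag + 1)])

# Since c(z; h) = 0 for h != 0 here, the theory above predicts a roughly
# lag-independent level A2 / (A2 + B2(0)) ≈ 0.18 for this schedule.
acf_sq = sample_acf(X**2, 20)
```

The slow, almost flat decay of `acf_sq` is exactly the "long memory in the squares" effect that the text attributes to a non-constant $\sigma^2(z)$ rather than to genuine dependence.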


For the purpose of this paragraph, denote the first summand in the formula of Proposition 4.3.2 by $A_2$, and the second one by $B_2(h)$. Two spurious effects can potentially be observed here. If the variance $\sigma^2(z)$ is non-constant, $A_2$ always gives a spurious positive contribution to the sample autocovariance. Note that $A_2$ is independent of $h$, which explains the fact that the sample autocovariance of the squares often decays very slowly (a feature which cannot be picked up by classical GARCH models, see again Mikosch & Starica (2003)). For extremely large $h$, the remainder $O(h/T)$ often makes the positive contribution of $A_2$ less pronounced.

The second spurious effect is due to $B_2(h)$, which distorts the information about the local autocovariance by averaging it over time. Things are not rectified in the case of the sample autocorrelation, either: as an example, consider again TMWN. For a non-constant $\sigma^2(z)$ and $h \ne 0$, we have
$$\rho_T(X^2;h) = \frac{\gamma_T(X^2;h)}{\gamma_T(X^2;0)} \approx \frac{A_2 + 0}{A_2 + B_2(0)} > \frac{0}{B_2(0)} = 0,$$
while, obviously, we would expect a good estimate to return a value close to zero. A similar mechanism works in the case of absolute values.

4.3.3 Clustering of volatility

The "clustering of volatility" or, in other words, a "slowly varying local variance" is indeed one of the features of LSW (LSW2, LSW3) modelling. Occasional "spikes" in the log-return series, see for example Figure 4.2, are clearly against this principle. Yet, we believe that this problem can be rectified by resorting to non-Gaussian innovations $\xi_{j,k}$, e.g. modelled by mixtures of normal variables. As mentioned in the penultimate paragraph of Section 4.2, a more thorough investigation of this possibility is left for future study.

4.4 Estimation

To estimate the spectrum, Nason et al. (2000) use the wavelet periodogram $I^{(j)}_{t;T}$, defined by formula (2.36). In our altered setup of Definition 4.2.1, we will also use


the statistic defined by (2.36). The following proposition holds.

Proposition 4.4.1 Let $X_{t;T}$ satisfy Definition 4.2.1. We have
$$E I^{(j)}_{t;T} = \sum_{i=-\infty}^{-1} S_i\!\left(\frac{t}{T}\right) A_{i,j} + O\!\left(\frac{2^{-j}\log(T)}{T}\right), \qquad (4.17)$$
where $A$ is defined by (2.34). In addition, if $X_{t;T}$ is Gaussian, then
$$\mathrm{Var}\left(I^{(j)}_{t;T}\right) = 2\left( \sum_i S_i\!\left(\frac{t}{T}\right) A_{i,j} \right)^2 + O\!\left(\frac{2^{-j}\log(T)}{T}\right). \qquad (4.18)$$

Proof. In the proof, we use the orthonormality of $\psi_{j,k}$, the fact that $\mathcal{L}_j \le M2^{-j}$ and formula (3.67). First note that
$$\sum_i \frac{C_i A_{i,j}}{T} \le \frac{2^{-j}}{T}\sum_i C_i \left( \sum_j 2^j A_{i,j} \right) = \frac{2^{-j}}{T}\sum_i C_i = O\!\left(\frac{2^{-j}}{T}\right). \qquad (4.19)$$
We have
\begin{align*}
\left| E I^{(j)}_{p;T} - \sum_{i=-\infty}^{-1} S_i\!\left(\frac{p}{T}\right) A_{i,j} \right|
&\le \sum_{i=-\infty}^{-1}\sum_{k\in\mathbb{Z}} \left| \omega^2_{i,k;T} - S_i\!\left(\frac{p}{T}\right) \right| \left( \sum_t \psi_{i,k-t}\psi_{j,p-t} \right)^2 \\
&\le \sum_{i=-\infty}^{-1}\sum_{k\in\mathbb{Z}} \left( \left| S_i\!\left(\frac{k}{T}\right) - S_i\!\left(\frac{p}{T}\right) \right| + \frac{C_i}{T} \right) \left( \sum_t \psi_{i,k-t}\psi_{j,p-t} \right)^2 \\
&\le \sum_i \left( L_i \min\!\left( \frac{M\max(2^{-i},2^{-j})}{T}, 1 \right) + \frac{C_i}{T} \right) A_{i,j} \\
&\le \frac{M2^{-j}}{T}\left( \sum_{i=j}^{-1} L_i 2^{-i}\, 2^i A_{i,j} + \sum_{i=-\lfloor\log_2(T/M)\rfloor}^{j-1} L_i 2^{-i}\, 2^j A_{i,j} \right) + \sum_{i<-\lfloor\log_2(T/M)\rfloor} L_i + O(2^{-j}/T) \\
&\le \frac{M2^{-j}}{T} \sum_{i=-\lfloor\log_2(T/M)\rfloor}^{-1} L_i 2^{-i} + O(T^{-1}) + O(2^{-j}/T) \\
&= O\!\left(\frac{2^{-j}\log(T)}{T}\right),
\end{align*}


which proves the expectation. For the variance, first observe that
$$\left| \sum_i S_i(z) A_{i,j} \right| = \left| \sum_i S_i(z) \sum_\tau \Psi_i(\tau)\Psi_j(\tau) \right| = \left| \sum_\tau c(z;\tau)\Psi_j(\tau) \right| \le \sup_z \sum_\tau |c(z;\tau)| < \infty \qquad (4.20)$$
by assumption (4.15). Using Gaussianity, we have
\begin{align*}
\mathrm{Var}\left(I^{(j)}_{p;T}\right) &= 2\left( E I^{(j)}_{p;T} \right)^2 = 2\left( \sum_{i=-\infty}^{-1} S_i\!\left(\frac{p}{T}\right) A_{i,j} + O\!\left(\frac{2^{-j}\log(T)}{T}\right) \right)^2 \\
&= 2\left( \sum_{i=-\infty}^{-1} S_i\!\left(\frac{p}{T}\right) A_{i,j} \right)^2 + O\!\left(\frac{2^{-j}\log(T)}{T}\right),
\end{align*}
where the last step uses (4.20). This completes the proof. □

The form of the remainder in (4.17) suggests that the estimator is more accurate at finer scales. However, as in Nason et al. (2000) and in Section 3.4, we normally compute the wavelet periodogram down to scale $-J(T)$, with $J$ defined in Definition 2.2.1.

Formula (4.17) suggests the following method of estimating the spectrum: for each $t = 0,\ldots,T-1$, we solve the system of equations
$$I^{(j)}_{t;T} = \sum_i \hat S_i(t/T) A_{i,j}, \qquad i,j = -1,\ldots,-J(T), \qquad (4.21)$$
to obtain an approximately unbiased estimator $\hat S_j(t/T)$ of the spectrum $S_j(t/T)$ (see Nason et al. (2000) for details of this procedure in the LSW model).

However, formula (4.18) shows that the wavelet periodogram is not a consistent estimator and needs to be smoothed to obtain consistency. We can either first solve (4.21) and then smooth $\hat S_j(t/T)$, or first smooth $I^{(j)}_{t;T}$ and then solve (4.21). Following Nason et al. (2000), we prefer the latter option, as it is often easier to


work out the distributional properties of $I^{(j)}_{t;T}$ than those of $\hat S_j(t/T)$, and therefore it is easier to justify the choice of smoothing parameters for $I^{(j)}_{t;T}$.

Smoothing the wavelet periodogram is by no means an easy task, due to an extremely low signal-to-noise ratio (for Gaussian series, neglecting the remainders, we have $E(I^{(j)}_{t;T})/\{\mathrm{Var}(I^{(j)}_{t;T})\}^{1/2} \approx 1/\sqrt{2}$), and also to a significant amount of autocorrelation present in $I^{(j)}_{t;T}$. Nason et al. (2000) propose an adaptive wavelet denoising method whose performance will be discussed in Section 4.4.4.

In Section 4.4.1, we propose an alternative general methodology for smoothing the wavelet periodogram. Section 4.4.2 looks at two specific methods of smoothing, and Section 4.4.3 deals with inverting (4.21) in an approximate manner to ensure the nonnegativity of the estimated spectrum.

4.4.1 Generic algorithm

The approach which we propose here is based on the following observation. Denote by $\{d^{(j)}_{t;T}\}_{t=0}^{T-1}$ the sequence of non-decimated wavelet coefficients of $X_{t;T}$ at scale $j$ (so that $I^{(j)}_{t;T} = (d^{(j)}_{t;T})^2$). Often, financial log-returns exhibit little serial correlation (e.g. see the example in Section 4.1), so, by orthogonality of the decimated wavelets, the sequence
$$d^{(-1)}_{0;T},\ d^{(-1)}_{2;T},\ d^{(-1)}_{4;T},\ \ldots,\ d^{(-1)}_{T-2;T}$$
as well as the sequence
$$d^{(-1)}_{1;T},\ d^{(-1)}_{3;T},\ d^{(-1)}_{5;T},\ \ldots,\ d^{(-1)}_{T-1;T}$$
are each sequences of approximately uncorrelated random variables. At scale $j$, the same phenomenon is observed for the sequences
$$d^{(j)}_{i;T},\ d^{(j)}_{i+2^{-j};T},\ \ldots,\ d^{(j)}_{i+T-2^{-j};T}, \qquad i = 0,1,\ldots,2^{-j}-1.$$
However, even if the original series $X_{t;T}$ exhibits some form of autocorrelation, the decimated sequences of wavelet coefficients will often be much less correlated.


This is the well-known "whitening" property of wavelets, see e.g. Vidakovic (1999), Section 9.5.3.

If $X_{t;T}$ is Gaussian, the lack of serial correlation in the decimated sequences also means lack of dependence, which in turn implies that the corresponding decimated subsequences of the wavelet periodogram
$$I^{(j)}_{i;T},\ I^{(j)}_{i+2^{-j};T},\ \ldots,\ I^{(j)}_{i+T-2^{-j};T}, \qquad i = 0,1,\ldots,2^{-j}-1, \qquad (4.22)$$
are simply sequences of independent (gamma-distributed) random variables.

The above argument can only be made formal if $X_{t;T}$ is Gaussian TMWN. This is obviously a simplifying assumption, as clearly not every log-return sequence can be adequately modelled as such. However, it turns out that in practice, the assumption of the lack of dependence in the decimated subsequences of the wavelet periodogram leads to estimators which perform well numerically (on simulated data) and are visually appealing (on both simulated and real data). In other words, the departure from the TMWN setting often turns out not to be significant enough to prevent us from treating the decimated subsequences of $I^{(j)}_{t;T}$ as independent.

Having made the assumption of independence, we now proceed as follows:

1. Fix $j$.

2. For $i = 0,1,\ldots,2^{-j}-1$, pick the decimated sequence
$$I^{(j)}_{i;T},\ I^{(j)}_{i+2^{-j};T},\ \ldots,\ I^{(j)}_{i+T-2^{-j};T}$$
and smooth it using a preselected method, with the smoothing parameter(s) chosen by cross-validation (CV). CV stands a chance of performing well here, due to the lack of dependence between the variables. For example, the technique of Ombao et al. (2001b) can be applied, as we are also dealing with a sample of independent gamma variates, like in periodogram smoothing. In Section 4.4.2, we explore two other methods in which the smoothing parameter is chosen by CV.


3. Interpolate the smoothed sequence at all the points $0,1,\ldots,T-1$ (e.g. using linear interpolation). Denote the interpolated smoothed sequence by $\{\tilde I^{(i,j)}_{t;T}\}_{t=0}^{T-1}$.

4. Finally, compute the estimate of the wavelet periodogram as the average of the estimates $\tilde I^{(i,j)}_{\cdot;T}$, for $i = 0,1,\ldots,2^{-j}-1$:
$$\hat I^{(j)}_{t;T} = 2^{j}\sum_{i=0}^{2^{-j}-1} \tilde I^{(i,j)}_{t;T}.$$

For coarser scales, where it is not possible to smooth the decimated sequences accurately as they are too short, we estimate $I^{(j)}_{t;T}$ by a constant: $\hat I^{(j)}_{t;T} = \frac{1}{T}\sum_{l=0}^{T-1} I^{(j)}_{l;T}$. The estimates $\hat I^{(j)}_{t;T}$ can now be substituted into the systems of linear equations
$$\hat I^{(j)}_{t;T} = \sum_i \hat S_i(t/T) A_{i,j}. \qquad (4.23)$$

CV for dependent data. CV "as it is" does not perform well when the errors are dependent, and some methods for correcting CV in this setting have been developed, see for example Altman (1990). However, they all work for stationary noise and require an estimate of the autocovariance. In our setting, finding such an estimate implies finding a pre-estimate of the signal itself. To avoid this nuisance, we prefer to work with independent decimated subsequences.

4.4.2 Smoothing the decimated periodogram

In step 2 of the algorithm of Section 4.4.1, we apply a smoothing procedure to the decimated subsequences of the wavelet periodogram. In this section, we consider the use of two smoothing methods: cubic B-splines (see Hastie & Tibshirani (1990) for details) and translation-invariant nonlinear wavelet smoothing (see Nason et al. (2000)).

The benefits of using cubic B-splines are the following.


• The method performs well (see Section 4.4.4).

• Most statistical packages provide a fast implementation of this method. For example, we use the S-Plus routine smooth.spline, which automatically selects the smoothing parameter by cross-validation.

• Numerical examples suggest that the method is fairly robust to the misspecification of the local variance of the noise. This feature is particularly attractive: in our setting, the variance of the noise depends on the signal (see formulas (4.17) and (4.18)), and, therefore, an accurate estimate of the variance would require an accurate estimate of the signal. In practice, it seems sufficient to supply a constant variance to smooth.spline; see the results in Section 4.4.4.

The advantages of using translation-invariant nonlinear wavelet smoothing are as follows.

• The method performs well (see Section 4.4.4).

• The only smoothing parameter to be chosen is the "primary resolution", above which universal thresholding is applied with the threshold (suitable for chi-squares) as in Nason et al. (2000). For speed of computation, we do not choose the threshold by cross-validation, even though in theory this could also be considered. There are only $\log_2(T)$ primary resolution levels to choose from, which makes the choice potentially easier than, say, the choice of bandwidth in kernel smoothing. We perform the selection by "leave-half-out" cross-validation as in Nason (1996). The accurate choice of the primary resolution is extremely important in this context, as the numerical example of Section 4.4.4 powerfully demonstrates.

• Unlike linear methods, this nonlinear technique is capable of detecting abrupt changes in the wavelet periodogram.
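Either smoother can be plugged into step 2 of the generic algorithm of Section 4.4.1. The sketch below shows the overall decimate / smooth / interpolate / average flow; a plain moving average stands in for the cross-validated spline or wavelet smoother, so this is a structural illustration (with names of my choosing), not the estimator used in the thesis.

```python
import numpy as np

def smooth_decimated_periodogram(I_j, neg_j, window=11):
    """Steps 1-4 of the generic algorithm for one scale.

    I_j: wavelet periodogram at scale j (length T); neg_j = -j >= 1;
    window: odd length of the stand-in moving-average smoother.
    """
    T = len(I_j)
    step = 2**neg_j
    grid = np.arange(T)
    estimates = []
    for i in range(step):                          # step 2: each offset i
        idx = np.arange(i, T, step)                # decimated time points
        sub = I_j[idx]                             # approx. independent variates
        kernel = np.ones(window) / window
        pad = np.pad(sub, window // 2, mode="edge")
        smoothed = np.convolve(pad, kernel, mode="valid")  # stand-in smoother
        estimates.append(np.interp(grid, idx, smoothed))   # step 3: interpolate
    return np.mean(estimates, axis=0)              # step 4: average over offsets

# Sanity check: a constant periodogram should come back unchanged.
est = smooth_decimated_periodogram(np.ones(64), neg_j=2)
```

Replacing the moving average with a cross-validated smoother applied per subsequence is exactly where the independence assumption of Section 4.4.1 pays off.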


4.4.3 Estimating the spectrum with guaranteed nonnegativity

The evolutionary wavelet spectrum $S_j(z)$ is a nonnegative quantity, so it would also be desirable if $\hat S_j(k/T)$ was guaranteed to be nonnegative. This can be achieved, for example, by replacing the system of equations (4.23) by a Linear Complementarity Problem (LCP; see e.g. Murty (1988)):
\begin{align*}
A\hat S(k/T) &\ge \hat I_{k;T} \\
\hat S(k/T) &\ge 0 \\
\big(A\hat S(k/T) - \hat I_{k;T}\big)'\,\hat S(k/T) &= 0.
\end{align*}
The above LCP can be solved using e.g. successive over-relaxation.

Let $\hat S^{LCP}_j(k/T)$ denote the estimate of $S_j(k/T)$ obtained using the LCP formulation, and $\hat S^{INV}_j(k/T)$ the one obtained using the simple inversion of formula (4.23). By (2.33), we estimate the local variance $\sigma^2(k/T)$ in each case by
$$\hat\sigma^2(k/T)^{(LCP)} = \sum_{j=-J(T)}^{-1}\hat S^{LCP}_j(k/T), \qquad \hat\sigma^2(k/T)^{(INV)} = \sum_{j=-J(T)}^{-1}\hat S^{INV}_j(k/T).$$
In practice, $\hat\sigma^2(k/T)^{(INV)}$ is often a much more accurate estimator of the local variance. In order to combine this feature with the guaranteed nonnegativity of the spectrum, we rescale the LCP-based estimator to yield the final estimators of $S_j(k/T)$ and $\sigma^2(k/T)$:
$$\hat S_j(k/T) = \hat\sigma^2(k/T)^{(INV)}\,\frac{\hat S^{LCP}_j(k/T)}{\hat\sigma^2(k/T)^{(LCP)}} \qquad (4.24)$$
$$\hat\sigma^2(k/T) = \sum_{j=-J(T)}^{-1}\hat S_j(k/T). \qquad (4.25)$$
As explained in Sections 4.4.1 and 4.4.2, $\hat S_j(k/T)$ depends on the method used for smoothing the wavelet periodogram. The next section briefly compares the performance of the estimators based on cubic B-splines and on nonlinear wavelet denoising.
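The successive over-relaxation approach mentioned above can be sketched as a projected Gauss-Seidel sweep. The following is a minimal illustration (function and variable names are mine; it assumes $A$ is symmetric with positive diagonal entries, under which projected SOR with relaxation factor in $(0,2)$ is known to converge for positive definite $A$), not the solver used in the thesis.

```python
import numpy as np

def lcp_projected_sor(A, I_vec, omega=1.0, n_iter=500):
    """Find S >= 0 with A S - I_vec >= 0 and (A S - I_vec)' S = 0."""
    S = np.zeros_like(I_vec, dtype=float)
    for _ in range(n_iter):
        for i in range(len(S)):
            r = A[i] @ S - I_vec[i]                  # residual of row i
            S[i] = max(0.0, S[i] - omega * r / A[i, i])  # project onto S_i >= 0
    return S

# Toy example: plain inversion of A S = I_vec would give S = (2, -1),
# i.e. a negative "spectrum" entry; the LCP clamps it to zero instead.
A = np.array([[2.0, 0.0], [0.0, 2.0]])
I_vec = np.array([4.0, -2.0])
S = lcp_projected_sor(A, I_vec)
```

The clamped coordinate is precisely the situation that motivates the rescaling step (4.24): the LCP solution is nonnegative but its total mass can differ from that of the unconstrained inverse.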


Figure 4.3: Left-hand plot: sample path from Gaussian TMWN model with time-varying standard deviation superimposed. Right-hand plot: time-varying variance (solid), its estimate using splines (dot-dashed), its estimate using nonlinear wavelet thresholding (dotted), and its estimate using nonlinear wavelet thresholding with default parameters (dashed).

4.4.4 Numerical results

The left-hand plot in Figure 4.3 shows a sample path from the Gaussian TMWN model with the superimposed contrived time-varying standard deviation. We estimate the time-varying local variance (the square of the time-varying standard deviation) by adding up estimators of the Haar wavelet spectrum over scales (see formula (4.25)). The right-hand plot shows

• the time-varying variance (solid line);

• an estimate obtained using spline smoothing with the smoothing parameter chosen by cross-validation (dot-dashed line);

• an estimate obtained using translation-invariant nonlinear wavelet smoothing with Daubechies' least asymmetric wavelet with 10 vanishing moments, where the primary resolution was chosen by cross-validation (dotted line);


• an estimate obtained using the same wavelet method (dashed line) but with default parameters, except that the smooth.dev parameter in the ewspec routine (Nason (1998)) was set to var, as recommended by G.P. Nason (personal communication).

While the two estimates with the smoothing parameter chosen by cross-validation almost coincide with each other and with the true time-varying variance (except for the spurious spike in the wavelet estimate), the default estimate by Nason et al. (2000) oversmooths. This is due to the fact that the primary resolution (PR) in the latter method is not chosen in a data-driven way but instead a fixed PR is used.

For the same Gaussian TMWN process, we assessed the performance of the three methods discussed above based on 25 simulated sample paths. We used two criterion functions, one for the Haar spectrum:
$$d_S(\hat S, S) = 10^{11}\,\frac{1}{T}\sum_{i=-J(T)}^{-1}\sum_{t=0}^{T-1}\left( \hat S_i\!\left(\frac{t}{T}\right) - S_i\!\left(\frac{t}{T}\right) \right)^2, \qquad (4.26)$$
and the other for the variance:
$$d_{\sigma^2}(\hat\sigma^2, \sigma^2) = 10^{11}\,\frac{1}{T}\sum_{t=0}^{T-1}\left( \hat\sigma^2\!\left(\frac{t}{T}\right) - \sigma^2\!\left(\frac{t}{T}\right) \right)^2. \qquad (4.27)$$

                          default   splines   wavelets
mean of $d_{\sigma^2}$       1197       189        195
mean of $d_S$                 464       243        276

Table 4.1: Values of the criterion functions averaged over 25 simulations. "Default" is the method of Nason et al. (2000) with default parameters, "splines" is our method using spline smoothing and "wavelets" is our method using translation-invariant nonlinear wavelet smoothing.

The values in Table 4.1 confirm our earlier observation that the two estimators in which the choice of the smoothing parameter is performed by cross-validation give very similar results.
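Written out explicitly, the two criterion functions (4.26) and (4.27) are straightforward to compute. The sketch below (function names mine) assumes the spectrum estimate is stored as an array with one row per scale; the factor $10^{11}$ only rescales the errors to the readable magnitudes of Table 4.1.

```python
import numpy as np

def d_sigma2(est, truth):
    """Criterion (4.27): scaled mean squared error of the variance estimate."""
    return 1e11 * np.mean((est - truth) ** 2)

def d_S(est, truth):
    """Criterion (4.26): est and truth have shape (n_scales, T)."""
    return 1e11 * np.sum((est - truth) ** 2) / est.shape[1]

# Example: a constant bias of 1e-6 at T = 4 time points in the variance.
err = d_sigma2(np.full(4, 2e-6), np.full(4, 1e-6))
```

With errors of the order $10^{-6}$, typical for daily log-return variances, the scaling turns squared errors of order $10^{-12}$ into numbers of order $10^{-1}$ and above.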


4.5 Exploratory data analysis

In this section, we look at two examples of data analysis using the LSW3 methodology (the examples are related to each other). The first one uses the Haar scalogram (see formula (4.14)), and the other uses the full evolutionary Haar wavelet spectrum.

4.5.1 Analysis based on the scalogram

In this subsection, we compute the Haar scalogram for four series:

• $X_{t;T}$: the last 1024 observations of the artificial simulated Gaussian TMWN of Figure 4.3,

• $F_{t;T}$: the last 1024 observations of the FTSE 100 series of Figure 4.1,

• $N_{t;T}$: the last 1024 observations of the Nikkei series of Figure 4.2,

• $D_{t;T}$: the last 1024 observations of the Dow Jones IA series of Figure 4.2.

Figure 4.4 shows logged scalograms for $X_{t;T}$, $F_{t;T}$, $N_{t;T}$ and $D_{t;T}$ (solid lines), plotted against $-j = 1,2,\ldots,10$. Dotted lines are theoretical log-scalograms of corresponding time-modulated white noise processes with the same time-varying variances. As $X_{t;T}$ actually is a time-modulated white noise process, and its log-scalogram deviates substantially from the corresponding dotted straight line for scales $-6,-7,\ldots,-10$, and slightly for scales $-4,-5$, we suspect that for a series of length 1024, the scalogram is a relatively reliable estimator for scales $-1,-2,\ldots,-5$ (hence the vertical line at $-j=5$), and a very reliable one for scales $-1,-2,-3$ (hence the vertical line at $-j=3$).

Looking at the 3 finest scales ($-j = 1,2,3$), it seems that Dow Jones and Nikkei are reasonably close to TMWN. However, FTSE 100, which was provisionally modelled as Gaussian TMWN in Section 4.1, shows a substantial deviation from this setting, especially at scale $j=-2$, where the mean spectrum is clearly greater than what it should be if FTSE 100 were to be close to TMWN. Indeed, to assess


Figure 4.4: Solid lines: log-scalograms of $X_{t;T}$ (top left), $F_{t;T}$ (top right), $N_{t;T}$ (bottom left) and $D_{t;T}$ (bottom right), plotted against $-j$. Dotted lines: theoretical scalograms if the processes were (time-modulated) white noise (not necessarily Gaussian). Dashed lines: $-j = 3, 5$ (see text for discussion).


the validity of this statement, we have simulated 1000 independent sample paths of the standard white noise, and computed the Haar scalogram for each of them. In each case, the empirical scalogram for $j=-1$ was larger than that for $j=-2$, unlike in the FTSE 100 case. The outcome of this experiment seems to confirm our initial judgement that the deviation of FTSE 100 from the TMWN setting is significant.

By formula (4.16), a large scalogram at scale $j=-2$ implies a significant contribution of the summand $\tilde S^T_{-2}\Psi_{-2}(h)$ to the sample autocovariance. For Haar wavelets, $\Psi_{-2}(\cdot)$ is supported on $h = -3,\ldots,3$, and is plotted in the left plot of Figure 4.5. It is positive for $h = \pm 1$ and negative for $h = \pm 2, \pm 3$. Therefore, if the contribution of the spectrum at scale $j=-2$ is significant enough, we can expect that the sample autocorrelation of $F_{t;T}$ will be significantly positive for $h=1$, and significantly negative for $h=2,3$. The right-hand plot in Figure 4.5 shows that this is indeed the case. The shape of the acf of $F_{t;T}$ is very similar to the structure of $\Psi_{-2}$.

Figure 4.1 shows that the same pattern is present in the sample autocorrelation of the whole FTSE 100 series, and not only in $F_{t;T}$ (= the last 1024 observations of FTSE 100). However, the pattern is much less visible in the sample autocorrelation of the standardised FTSE 100 (series $Z_t$ in Figure 4.1). This may suggest, for example, that this autocorrelation structure (positive dependence at lag 1, negative at lags 2 and 3) may be present in a stretch of high volatility, which has a significant contribution to the sample autocorrelation of FTSE 100 (or, alternatively, to the scalogram).
In $Z_t$, the "standardised" periods of high volatility contribute less to the sample autocorrelation than in the original FTSE 100 series, which would explain why the sample autocorrelation of $Z_t$ exhibits a different dependence structure: it only indicates slight positive dependence at lag 1, but no significant negative dependence at lags 2 or 3.

The above discussion clearly indicates the need for a local analysis of the FTSE 100 data. By looking at the full evolutionary Haar spectrum of FTSE 100, we are


able to find out where and how the local autocovariance structure changes over time.

4.5.2 Full evolutionary Haar spectrum analysis

Figure 4.7 shows the estimated evolutionary Haar spectrum of $F'_{t;T}$ = the 2048 last observations of the FTSE 100 index (plotted in Figure 4.1), smoothed using our generic algorithm with spline smoothing. It seems that scale $j=-2$ dominates from time $z_0 \approx 0.6$ onwards (this corresponds, roughly, to time $t = 1200,\ldots,2048$). In particular, there is a huge bump centred at $z_1 \approx 0.67$: it is clearly the most visible feature in the "spectrum landscape" of FTSE 100. Judging by the magnitude of the bump, it seems likely that even though scale $j=-2$ dominates over part of the time horizon only, "global" tools (such as the scalogram or the sample autocovariance computed for the whole sample) will also be affected, which will give the false impression that scale $j=-2$ dominates all the way through. Indeed, if we compute the acf of $F'_{1;T}, F'_{2;T},\ldots,F'_{1200;T}$, it turns out that the effect of the sample acf resembling the Haar autocorrelation function at scale $j=-2$ is not present now. The acf of the first 1200 observations of $F'_{t;T}$ is plotted in the left-hand plot of Figure 4.6. The right-hand plot of Figure 4.6 shows the acf of the remaining part of $F'_{t;T}$, where scale $j=-2$ seems to dominate. This is reflected in the shape of the sample acf at lags 1, 2, 3.

The LSW model with the Haar basis seems to be ideally suited for modelling the FTSE 100 series on the interval $z \in (0.6, 1)$, as it provides a sparse representation of the local covariance in that region: most of the "energy" of the series is concentrated at scales $j = -1$ and $-2$.

The above demonstrates how important it is to analyse the log-return data locally, rather than using global tools. There is no economic reason why log-return series should stay stationary over long periods, and the above wavelet-based analysis shows that, indeed, they do not.
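The Haar autocorrelation wavelet $\Psi_{-2}$, central to the discussion of Figure 4.5, can be computed directly from the discrete Haar wavelet vector. The sketch below (function name mine) reproduces the support and sign pattern quoted above: positive at lags $\pm 1$, negative at lags $\pm 2, \pm 3$, and zero beyond $|h| = 3$.

```python
import numpy as np

def haar_autocorrelation_wavelet(neg_j):
    """Psi_j(tau) = sum_k psi_{j,k} psi_{j,k-tau} for the discrete Haar
    wavelet at scale j = -neg_j; returns lags -(2^neg_j - 1)..(2^neg_j - 1)."""
    n = 2**neg_j                     # support length of psi_j
    psi = np.concatenate([np.full(n // 2, 1.0),
                          np.full(n // 2, -1.0)]) / np.sqrt(n)  # unit norm
    return np.correlate(psi, psi, mode="full")

Psi2 = haar_autocorrelation_wavelet(2)   # scale j = -2, lags -3..3
```

At scale $j=-2$ the vector comes out as $(-1/4, -1/2, 1/4, 1, 1/4, -1/2, -1/4)$, which is exactly the shape that the sample acf of $F'_{1201;T},\ldots,F'_{2048;T}$ mimics at small lags.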


Figure 4.5: Left-hand plot: $\Psi_{-2}(h)$ for Haar wavelets for $h = 0,1,\ldots,5$. Right-hand plot: autocorrelation function for $F_{t;T}$ at lags $0,1,\ldots,5$.


Figure 4.6: Left-hand plot: sample autocorrelation of $F'_{1,T}, \ldots, F'_{1200,T}$. Right-hand plot: sample autocorrelation of $F'_{1201,T}, \ldots, F'_{2048,T}$.



Figure 4.7: Estimated evolutionary Haar spectrum of the $T = 2048$ last observations of the FTSE 100 index of Figure 4.1. Smoothing uses splines. The x-axis is rescaled time $z = t/T$, and the y-axis is negative scale $-j = 1, 2, \ldots, 11$.


4.6 Forecasting

A comparison of forecasting methods for daily Sterling exchange rates is provided by Brooks (1997), who concludes that forecasts based on GARCH modelling are the most reliable. Leung et al. (2000) find that probabilistic neural networks (Wasserman (1993)) outperform other methods when applied to stock index returns. However, the input variables in their model include, apart from the past data, a variety of other macroeconomic factors. In this section, we only consider forecasts based on past values of the series, and compare our methodology to forecasting based on GARCH modelling (for an overview of the latter methodology, see e.g. Bera & Higgins (1993)).

The algorithm which we apply here is the adaptive forecasting procedure detailed in Section 3.5 of this thesis. As the theory underlying the algorithm was developed for LSW2, and not for LSW3 processes, it would be natural to ask at this point whether a similar theory can also be developed for the latter model. We anticipate that this is indeed the case: we conjecture that, using exactly the same techniques, it is possible to obtain results analogous to those of Chapter 3; however, it is likely that the speed of convergence of the relevant quantities will be different. We leave this interesting theoretical problem as a possibility for future research.

We demonstrate the usefulness of the wavelet approach by comparing our forecasting methodology to forecasting based on AR+GARCH modelling, on a fragment of the Dow Jones IA series (denoted by $D_{t,T}$ in Section 4.5 and plotted in Figure 4.2). However, this brief simulation study does not aim to show that our approach is superior to AR+GARCH.
Instead, we attempt to demonstrate a few interesting features of LSW3 forecasting.

Suppose that we have already observed 1105 values of the series, and want to perform one-step prediction of the series along the segment $D_{1106,T}, \ldots, D_{1205,T}$. In order to do so, we employ the algorithm of Section 3.5 with Haar wavelets. We make an initial guess at the values of $p$ and $g$: we set $(p, g) = (1, 30)$. Further, we


set the criterion function to

$$d_1(\{X_{0,T}, \ldots, X_{k,T}\}; p, g) = \left| X_{k,T} - \hat{X}_{k,T}(p, g) \right| \qquad (4.28)$$

and we allow one parameter update at each time point.

Also, we limit the parameter space for $p$ to the set $\{1, 2\}$, having empirically found that the forecasting algorithm performs best on the given stretch of the series when the upper limit for $p$ is set to 2. This roughly corresponds to "switching" between TMWN and time-varying AR(1) at each time point, depending on which model produces locally more accurate forecasts.

We compare our method to forecasts obtained by modelling $D_{t,T}$ as

- AR(1) + GARCH(1,1), since AR(1) roughly corresponds to the upper limit for $p$ being equal to 2;
- AR(16) + GARCH(1,1), since the AIC criterion indicates that the order of $D_{t,T}$ along the segment $t = 1105, \ldots, 1204$ is equal to 16.

The parameters (1,1) of the GARCH part were selected ad hoc; however, they have no influence on the point forecasts. The models were fitted using the garch routine from the S-Plus garch module.

The results of the experiment are presented in Figure 4.8. The top left plot shows the actual series $D_{1106,T}, \ldots, D_{1205,T}$ (dotted line), the corresponding one-step-ahead forecasts (thick solid line), and 95% prediction intervals (assuming Gaussianity; dashed lines), for the AR(1) + GARCH(1,1) model. The top right plot shows the same for the AR(16) + GARCH(1,1) model, and the bottom left plot shows the same for the LSW3 model. The bottom right plot in Figure 4.8 shows the actual series scaled by a factor of 2000 (dotted line), as well as the corresponding values of the bandwidth $g$ used to forecast the series. The bandwidth was allowed to change by $\pm 1$ or remain the same. The fact that it increases steadily from $t = 1160$ onwards may suggest that the time-varying second-order structure of $D_{t,T}$ evolves more slowly in that region.
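The interplay between the $d_1$ criterion and the one-update-per-time-point rule can be sketched in code. The snippet below is an illustrative stand-in, not the LSW3 predictor of Section 3.5: it uses a plain least-squares AR($p$) fit over a moving window of length $g$ as the forecaster, but it implements the criterion of (4.28), the restriction $p \in \{1, 2\}$, and the rule that $(p, g)$ may make at most one move per step:

```python
import numpy as np

def window_ar_forecast(history, p, g):
    """One-step forecast from an AR(p) least-squares fit on the last g points.
    A simple stand-in for the wavelet-based predictor of Section 3.5."""
    w = history[-g:]
    if len(w) <= p:
        return float(w[-1])
    # Design matrix of p lagged values (no intercept): column k holds lag k.
    X = np.column_stack([w[p - k : len(w) - k] for k in range(1, p + 1)])
    y = w[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef @ w[::-1][:p])

def adaptive_forecasts(x, start, end, p0=1, g0=30):
    """Forecast x[start..end-1] one step ahead; after each step, make at most
    one move of (p, g) that would have reduced the d1 = |error| criterion."""
    p, g = p0, g0
    errs = []
    for k in range(start, end):
        errs.append(abs(x[k] - window_ar_forecast(x[:k], p, g)))
        # Candidate single moves: toggle p between 1 and 2, or shift g by 1.
        candidates = [(3 - p, g), (p, max(g - 1, 5)), (p, g + 1)]
        crit = {c: abs(x[k] - window_ar_forecast(x[:k], *c)) for c in candidates}
        best = min(crit, key=crit.get)
        if crit[best] < errs[-1]:
            p, g = best
    return np.array(errs), (p, g)

# Toy run on a simulated AR(1) series (hypothetical data, not Dow Jones IA).
rng = np.random.default_rng(0)
x = np.empty(400)
x[0] = 0.0
for t in range(1, 400):
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()

errs, (p, g) = adaptive_forecasts(x, 200, 300)
print(errs.mean(), p, g)
```

The point of the sketch is the update logic, not the forecaster itself; in the thesis the forecaster is the LSW3 predictor, and the bandwidth trace of $g$ is what is plotted in the bottom right panel of Figure 4.8.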


Figure 4.8: Top left, top right and bottom left: the actual series (dotted line), one-step forecasts (solid line) and 95% prediction intervals (dashed lines) for AR(1) + GARCH(1,1), AR(16) + GARCH(1,1) and LSW3, respectively. Bottom right: actual series $\times 2000$ and the evolution of the bandwidth $g$.


              AR(1)+GARCH(1,1)   AR(16)+GARCH(1,1)   LSW3
Mean SPE             878                857           839
Median SPE           404                375           298

Table 4.2: Mean Squared Prediction Error and Median Squared Prediction Error ($\times 10^7$ and rounded) in forecasting $D_{1106,T}, \ldots, D_{1205,T}$ one step ahead, for the three methods tested in Section 4.6.

In the LSW3 forecasting, the stretches where $p = 1$ wins over $p = 2$ are indicated by one-step forecasts equal to zero (as in TMWN forecasting). Non-zero forecasts indicate that $p = 2$ is used to perform prediction. The LSW3 model does an impressive job of picking up the spike at $t = 1112$, and also of capturing the local structure around $t = 1135$. The Mean Squared Prediction Errors and the Median Squared Prediction Errors for the three methods are given in Table 4.2: the LSW3 method outperforms the other two.

For the LSW3 method, 92% of observations fall within the corresponding one-step 95% prediction intervals, whereas the analogous ratios for the AR(1) + GARCH(1,1) and AR(16) + GARCH(1,1) methods are 94% and 93%, respectively. Our slightly worse performance is due to the fact that the $d_1$ criterion only minimises the distance between the predicted value and the actual one, and does not take into account the prediction intervals. A modification of the comparison criterion would almost certainly lead to an improvement over the (already good) ratio of 92%.

However, it must be mentioned that the prediction intervals in the LSW3 model are narrower than the minimum of those in the AR(1) + GARCH(1,1) model and those in the AR(16) + GARCH(1,1) model in 71% of the cases.

We leave the important problem of forecasting volatility in our wavelet-based framework as one of the many possible avenues for future investigation.

4.7 Conclusion

In this chapter, we have provided theoretical and empirical evidence that stock index returns can be successfully modelled and forecast in a time series model which


combines wavelets and the concept of rescaled time. Starting from a motivating example of the FTSE 100 series being modelled as a Time-Modulated White Noise (TMWN), we have slightly altered the definition of an LSW process (Nason et al. (2000)) so that the altered setup, called LSW3, includes TMWN as a special case.

We have provided theoretical evidence that the (linear and non-stationary) LSW3 model can capture the most commonly observed stylised facts. In particular, we have argued that the heavy tails of the marginal distribution, negligible sample autocorrelations, and non-negligible sample autocorrelations of the squares are all effects which can possibly be caused by applying stationary, global tools (such as the sample autocorrelation) to the analysis of non-stationary data.

Furthermore, we have proposed a new general algorithm for estimating time-varying second-order quantities in the LSW3 model. We have shown that two particular implementations of our algorithm, specifically designed for financial log-returns, outperform the default algorithm proposed by Nason et al. (2000) for general non-stationary time series.

Also, we have provided two interesting examples of exploratory data analysis using the LSW3 toolbox. By using the (global) scalogram and the (local) evolutionary Haar spectrum, we have found that the daily FTSE 100 index displays a significant local departure from the TMWN setting. Also, by examining the Haar spectrum and the shape of the autocovariance function of FTSE 100 over a certain region, we have discovered that the Haar wavelet basis is ideally suited for the sparse modelling of FTSE 100 on that interval.
The example has powerfully demonstrated that financial log-return data need to be analysed using local tools, as all of their second-order characteristics, and not only the variance, can vary over time.

Finally, we have provided evidence that financial log-returns can be successfully forecast in the LSW3 framework using the adaptive forecasting algorithm of Section 3.5. We have compared the forecasts obtained by the adaptive algorithm to those obtained using GARCH modelling. Again, we have found that the adaptive


method has the potential to accurately forecast some important local features of non-stationary log-return data. In the example analysed (a fragment of the Dow Jones IA index), the LSW3-based technique has outperformed two GARCH-based methods.

The S-Plus routines written for and used in this chapter, the data sets analysed in it, as well as the contrived standard deviation function of Figure 4.3, are included on the associated CD.


Chapter 5

Denoising the wavelet periodogram using the Haar-Fisz variance stabilising transform

Our aim in this chapter is twofold. Firstly, we introduce a multiscale variance stabilising transform for the wavelet periodogram of a Gaussian LSW process. We call the procedure the "Haar-Fisz" transform, as it consists of three basic steps: taking the Haar transform of the periodogram sequence, dividing the arising detail coefficients by the corresponding smooth coefficients (an instance of the so-called Fisz transform), and finally taking the inverse Haar transform. The resulting vector is closer to Gaussianity than the result of the classical log transform; also, its variance is well stabilised. This is confirmed not only by empirical results but also by theory.

Secondly, we investigate the performance of a denoising method for the wavelet periodogram which consists in taking the Haar-Fisz transform, denoising the transformed vector using a method suitable for Gaussian noise, and then taking the inverse Haar-Fisz transform. Simulations demonstrate excellent performance.

5.1 Motivation: the Fisz transform

The initial motivation for this research was the following result, proved in the paper by Fisz (1955) (we stick to the original notation from the paper). Suppose that


$\xi(\lambda)$ is a nonnegative random variable from a family of distributions parametrised by a positive parameter $\lambda$, and $m(\lambda) = E(\xi(\lambda))$, $\sigma^2(\lambda) = \mathrm{Var}(\xi(\lambda))$. We say that $\xi(\lambda)$ is asymptotically normal $N(u(\lambda), v(\lambda))$ if there exist functions $u$, $v > 0$ such that for all $x \in \mathbb{R}$,

$$\lim_{\lambda \to \infty} P\left( \frac{\xi(\lambda) - u(\lambda)}{v(\lambda)} < x \right) = \Phi(x), \qquad (5.1)$$

where $\Phi(x)$ is the cdf of the standard normal. Let $\xi_i(\lambda_i)$, $i = 1, 2$, be two independent variables and let $m_i = E(\xi_i)$, $\sigma_i^2 = \mathrm{Var}(\xi_i)$. The following theorem holds.

Theorem 5.1.1 If

- $\xi(\lambda)/m(\lambda)$ converges in probability to one as $\lambda \to \infty$;
- $\xi(\lambda)$ is asymptotically normal $N(m(\lambda), \sigma(\lambda))$;
- $\lim_{(\lambda_1, \lambda_2) \to (\infty, \infty)} m_1/m_2 = 1$,

then the variable

$$\zeta(\lambda_1, \lambda_2) = \frac{\xi_2(\lambda_2) - \xi_1(\lambda_1)}{(\xi_2(\lambda_2) + \xi_1(\lambda_1))^p}, \qquad (5.2)$$

where $p$ is an arbitrary positive number, is asymptotically normal

$$N\left( \frac{m_2 - m_1}{(m_2 + m_1)^p},\; \frac{\sqrt{\sigma_1^2 + \sigma_2^2}}{(m_2 + m_1)^p} \right).$$

Note the specific form of the ratio in (5.2): it can be viewed as the ratio of a Haar detail coefficient and the $p$th power of the corresponding smooth coefficient. We shall exploit this property later.

As an example, consider $\xi(n) = a(X_1^2 + \ldots + X_n^2)$, where the $X_i$ are i.i.d. $N(0,1)$ and $a > 0$. We have $\xi(n) \sim a\chi^2_n = a\,\mathrm{Gamma}(\frac{1}{2}, \frac{n}{2})$ and $m(n) = an$, $\sigma^2(n) = 2a^2 n$. The variable $\xi(n)$ satisfies the first two assumptions of Theorem 5.1.1 by the Law of Large Numbers and the Central Limit Theorem, respectively. Assume that $\xi_1(n_1)$, $\xi_2(n_2)$ are independent and $n_1/n_2 \to 1$. Then, $\zeta(n_1, n_2)$ is asymptotically normal

$$N\left( \frac{n_2 - n_1}{(n_2 + n_1)^p}\, a^{1-p},\; \frac{\sqrt{2}}{(n_2 + n_1)^{p - 1/2}}\, a^{1-p} \right).$$
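The special role of $p = 1$ in this example can be checked directly: for $p = 1$ the scale factor $a$ cancels between numerator and denominator of (5.2), so the ratio is exactly invariant to $a$. A small simulation (illustrative Python, not part of the thesis's software) confirms this, and also checks the asymptotic standard deviation $\sqrt{2}/(n_1 + n_2)^{1/2} = 1/\sqrt{n}$ for $n_1 = n_2 = n$:

```python
import numpy as np

def fisz(x, y, p=1.0):
    """Fisz-type ratio (x - y) / (x + y)^p, with the convention 0/0 = 0."""
    s = x + y
    return np.where(s > 0, (x - y) / np.where(s > 0, s, 1.0) ** p, 0.0)

rng = np.random.default_rng(7)
n = 50                       # chi-square degrees of freedom
reps = 100_000
xi1 = rng.chisquare(n, reps)
xi2 = rng.chisquare(n, reps)

z = fisz(xi1, xi2)                        # p = 1, scale a = 1
z_scaled = fisz(13.7 * xi1, 13.7 * xi2)   # same draws, scale a = 13.7
print(np.allclose(z, z_scaled))           # exact invariance for p = 1

# Empirical vs asymptotic standard deviation (1/sqrt(n) for p = 1).
print(z.std(), 1 / np.sqrt(n))
```

For $p \neq 1$ the variance would scale like $a^{2(1-p)}$, which is why $p = 1$ is the variance-stabilising choice used throughout this chapter.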


Note that setting $p = 1$ makes the variance of $\zeta(n_1, n_2)$ independent of $a$. Indeed, this variance stabilisation is the key property of the Haar-Fisz transform of Section 5.3. We now formally define the Fisz transform, an essential component of the Haar-Fisz transform.

Definition 5.1.1 Let $X$, $Y$ be two nonnegative random variables. The Fisz transform of $X$ and $Y$ with exponent $p$ is defined as

$$\zeta_p(X, Y) = \frac{X - Y}{(X + Y)^p}, \qquad (5.3)$$

with the convention that $0/0 = 0$.

5.2 Properties of the wavelet periodogram in the Gaussian LSW model

As mentioned above, the Fisz transform can be viewed as the division of a Haar detail coefficient by the $p$th power of the corresponding smooth coefficient (up to a multiplicative constant). In actuality, we are interested in applying this operation to Haar coefficients of wavelet periodogram sequences in the Gaussian LSW model. Therefore, let us now recall some properties of this statistic.

For Gaussian LSW processes, the wavelet periodogram $I^{(j)}_{t,T}$ at a fixed scale $j$ is a sequence of scaled $\chi^2_1$ variables. Also, we know from Section 2.2.4 that $I^{(j)}_{t,T}$ is an asymptotically unbiased, but inconsistent, estimator of $\beta_j(t/T)$, where

$$\beta_j(z) := \sum_{i=-\infty}^{-1} S_i(z) A_{i,j}. \qquad (5.4)$$

Furthermore, the following proposition shows that the wavelet periodogram at each scale $j$ is typically a correlated sequence.

Proposition 5.2.1 Let $X_{t,T}$ be a Gaussian LSW process satisfying $S_j(z) \leq D 2^j$. We have

$$\mathrm{cov}\left( I^{(j)}_{t,T},\, I^{(j)}_{t+s,T} \right) = 2 \left( \sum_{\tau=-\infty}^{\infty} c\!\left( \frac{t}{T}, \tau \right) \Psi_j(\tau + s) \right)^2 + O(2^{-j}/T). \qquad (5.5)$$


The proof uses exactly the same technique as the proof of (2.38).

Our ultimate objective is to denoise the periodogram sequences at scales $j = -1, \ldots, -J(T)$, i.e. to provide estimates of the functions $\beta_{-1}(z), \ldots, \beta_{-J(T)}(z)$. Being able to estimate $\{\beta_j(z)\}_j$ is useful in two ways:

1. Estimates of $\{\beta_j(z)\}_j$ can be used to obtain estimates of $\{S_j(z)\}_j$ (by (5.4) and by the invertibility of $A$; see Nason et al. (2000) for details);
2. The estimate of $S_j(z)$ can in turn be used to obtain an estimate of the local autocovariance $c(z, \tau)$ (using the representation (2.32)).

In short, estimating $\{\beta_j(z)\}_{j=-1}^{-J(T)}$ allows us to make inference about the time-varying second-order structure of $X_{t,T}$.

The top plot in Figure 5.1 shows an example of the wavelet spectrum $S_j(z)$ where only $S_{-1}(z)$ and $S_{-3}(z)$ are non-zero. The middle plot shows a sample path of length 1024 simulated from it, using Haar wavelets and Gaussian innovations. The bottom plot shows the Haar periodogram of the simulated series at scale $-1$.

Denoising the wavelet periodogram is by no means an easy task, due to

- the fact that the variance of the noise depends on the level of the signal (see formulas (2.37) and (2.38)),
- an extremely low signal-to-noise ratio: again by (2.37) and (2.38) we obtain, neglecting the remainders, $E I^{(j)}_{t,T} / \{\mathrm{Var}(I^{(j)}_{t,T})\}^{1/2} = 2^{-1/2}$,
- the presence of correlation in the noise (see formula (5.5)).

Most existing denoising techniques have been designed to handle stationary Gaussian noise, and therefore it would be desirable to be able to transform the wavelet periodogram into a signal contaminated with such noise before the denoising is performed. A well-known technique for stabilising the variance of scaled $\chi^2_n$ variables is the log transform, see e.g. Priestley (1981); however, the resulting variable is still far from Gaussian if, as here, $n = 1$. Nason et al. (2000) propose a wavelet-based technique for denoising the wavelet periodogram without any pre-processing.
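The claim that the log transform leaves a scaled $\chi^2_1$ variable far from Gaussian is easy to check numerically. The sketch below (illustrative Python) estimates the skewness of $\log \chi^2_1$; note that scaling by $a$ only shifts $\log(a w)$ by the constant $\log a$, so the skewness is the same for any scaling:

```python
import numpy as np

rng = np.random.default_rng(3)
w = rng.standard_normal(200_000) ** 2     # chi-square with 1 degree of freedom
lw = np.log(w)

def skewness(x):
    """Sample skewness: third central moment over variance^(3/2)."""
    c = x - x.mean()
    return np.mean(c ** 3) / np.mean(c ** 2) ** 1.5

# log chi^2_1 is strongly left-skewed (theoretical skewness about -1.5),
# so the log transform Gaussianises poorly when n = 1.
print(skewness(lw))
```

This persistent skewness is precisely what motivates replacing the log transform by the Haar-Fisz transform of the next section.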


Figure 5.1: Top plot: example of a wavelet spectrum where only $S_{-1}(z)$ and $S_{-3}(z)$ are non-zero. Middle plot: sample path of length 1024 simulated from this spectrum using Haar wavelets and Gaussian innovations. Bottom plot: the Haar periodogram of the simulated realisation at scale $j = -1$.


In the next section, we introduce the Haar-Fisz transform: a multiscale Gaussianising and variance stabilising transformation for the wavelet periodogram which turns out to be a viable alternative to the log transform.

5.3 The Haar-Fisz transform

5.3.1 Algorithm for the Haar-Fisz transform

In this section, we provide details of the Haar-Fisz transform, which stabilises the variance of the wavelet periodogram and brings its distribution closer to normality.

The input to the algorithm is:

- A single row of the wavelet periodogram $I^{(j)}_{t,T}$ at a fixed scale $j$; here, we assume that $T$ is an integer power of two. To simplify the notation in this section, we drop the superscript $j$ and the subscript $T$ and consider the sequence $I_t := I^{(j)}_{t,T}$, or, in vector notation, $\mathbf{I} = (I^{(j)}_{0,T}, \ldots, I^{(j)}_{T-1,T})'$.
- A fixed integer $M \in \{1, 2, \ldots, \log_2(T)\}$; its meaning will become clear later.

The output from the algorithm is:

- The mean of $\mathbf{I}$, denoted by $\bar{I}$.
- A vector $\mathbf{U}^M$ of length $2^M$.

The vector $\mathbf{U}^M$ is constructed as follows:

1. Let $\mathbf{s}^M$ be the vector of local averages of $\mathbf{I}$:

$$s^M_n = \frac{2^M}{T} \sum_{t = nT2^{-M}}^{(n+1)T2^{-M} - 1} I_t \quad \text{for } n = 0, 1, \ldots, 2^M - 1. \qquad (5.6)$$

2. For each $m = M-1, M-2, \ldots, 0$, recursively form the vectors $\mathbf{s}^m$ and $\mathbf{f}^m$:

$$s^m_n = \frac{1}{2}\left( s^{m+1}_{2n} + s^{m+1}_{2n+1} \right), \qquad (5.7)$$
$$f^m_n = \frac{s^{m+1}_{2n} - s^{m+1}_{2n+1}}{2 s^m_n}, \qquad (5.8)$$

for $n = 0, 1, \ldots, 2^m - 1$, with the convention $0/0 = 0$.


3. For each $m = 0, 1, \ldots, M-1$, recursively modify the vectors $\mathbf{s}^{m+1}$:

$$s^{m+1}_{2n} = s^m_n + f^m_n, \qquad (5.9)$$
$$s^{m+1}_{2n+1} = s^m_n - f^m_n, \qquad (5.10)$$

for $n = 0, 1, \ldots, 2^m - 1$.

4. Set $\mathbf{U}^M := \mathbf{s}^M - \bar{I}$.

We denote $\mathcal{F}^M \mathbf{I} := \mathbf{U}^M$. The nonlinear operator $\mathcal{F}^M$ is called the Haar-Fisz transform of $\mathbf{I}$ at the resolution level $M$.

If $M = \log_2(T)$, then the length of $\mathcal{F}^M \mathbf{I}$ is $T$ and the algorithm is invertible, i.e. $\mathbf{I}$ can be reconstructed from $\mathcal{F}^M \mathbf{I}$ and $\bar{I}$ by reversing steps 4.-1. Therefore, the case $M = \log_2(T)$ is the one we are the most interested in in practice. However, the exact asymptotic Gaussianising properties of the Haar-Fisz transform only hold for $M$ fixed (i.e. independent of $T$), and this case is investigated theoretically in Section 5.5.1. Section 5.5.2 provides some heuristics as to the behaviour of $\mathcal{F}^M \mathbf{I}$ when $M = \log_2(T)$: we still conclude that the distribution of $\mathcal{F}^{\log_2(T)} \mathbf{I}$ is close to Gaussian with a constant variance. To simplify notation, we denote $\mathcal{F} := \mathcal{F}^{\log_2(T)}$.

Note that steps 2.-4. of the algorithm are similar to the forward and inverse Discrete Haar Transform, except for the division by $s^m_n$ in formula (5.8). Also, observe that (5.8) can be written as

$$f^m_n = \zeta_1\left( s^{m+1}_{2n},\, s^{m+1}_{2n+1} \right). \qquad (5.11)$$

That is, $f^m_n$ is the result of the Fisz transform with exponent 1 of the two neighbouring smooth coefficients $s^{m+1}_{2n}$ and $s^{m+1}_{2n+1}$.

Finally, note that the Haar-Fisz transform, being a computationally straightforward modification of the Haar transform, is also of computational order $O(T)$.
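Steps 1.-4., together with the inversion mentioned above for $M = \log_2(T)$, can be transcribed into code as follows. This is an illustrative Python sketch (the thesis's own software was written in S-Plus), with the inverse written out explicitly by reversing the recursions:

```python
import numpy as np

def haar_fisz(I, M):
    """Haar-Fisz transform F^M I of a nonnegative sequence I (steps 1-4)."""
    I = np.asarray(I, dtype=float)
    T = len(I)
    s = [None] * (M + 1)
    f = [None] * M
    # Step 1: local averages s^M over blocks of length T * 2^(-M).
    s[M] = I.reshape(2 ** M, T // 2 ** M).mean(axis=1)
    # Step 2: coarser smooths s^m and Fisz-normalised details f^m.
    for m in range(M - 1, -1, -1):
        s[m] = 0.5 * (s[m + 1][0::2] + s[m + 1][1::2])
        diff = s[m + 1][0::2] - s[m + 1][1::2]
        f[m] = np.divide(diff, 2.0 * s[m], out=np.zeros_like(diff),
                         where=s[m] != 0.0)          # convention 0/0 = 0
    # Step 3: rebuild the modified s^(m+1) from s^m and f^m.
    for m in range(M):
        new = np.empty(2 ** (m + 1))
        new[0::2] = s[m] + f[m]
        new[1::2] = s[m] - f[m]
        s[m + 1] = new
    # Step 4: subtract the global mean.
    return s[M] - I.mean()

def inv_haar_fisz(U, I_bar):
    """Invert the full transform (M = log2(T)) given U = F I and the mean of I."""
    M = int(round(np.log2(len(U))))
    s = np.asarray(U, dtype=float) + I_bar
    f = []
    for m in range(M):
        f.append(0.5 * (s[0::2] - s[1::2]))   # recover f^m, top-down
        s = 0.5 * (s[0::2] + s[1::2])
    f.reverse()
    # s is now s^0 = mean of I; rebuild the original smooths multiplicatively,
    # since s^{m+1}_{2n} = s^m_n (1 + f^m_n) and s^{m+1}_{2n+1} = s^m_n (1 - f^m_n).
    for m in range(M):
        new = np.empty(2 ** (m + 1))
        new[0::2] = s * (1.0 + f[m])
        new[1::2] = s * (1.0 - f[m])
        s = new
    return s

I = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
U = haar_fisz(I, M=3)
I_back = inv_haar_fisz(U, I.mean())
print(U.round(3), I_back)
```

For $M = \log_2(T)$ the block averages in step 1 are the data themselves, which is what makes the round trip exact.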


5.3.2 Examples

As an example, consider $T = 8$. For $M = 2$, we have:

$$U^2_0 = \frac{\sum_{t=0}^{3} I_t - \sum_{t=4}^{7} I_t}{\sum_{t=0}^{7} I_t} + \frac{I_0 + I_1 - I_2 - I_3}{\sum_{t=0}^{3} I_t}$$
$$U^2_1 = \frac{\sum_{t=0}^{3} I_t - \sum_{t=4}^{7} I_t}{\sum_{t=0}^{7} I_t} - \frac{I_0 + I_1 - I_2 - I_3}{\sum_{t=0}^{3} I_t}$$
$$U^2_2 = -\frac{\sum_{t=0}^{3} I_t - \sum_{t=4}^{7} I_t}{\sum_{t=0}^{7} I_t} + \frac{I_4 + I_5 - I_6 - I_7}{\sum_{t=4}^{7} I_t}$$
$$U^2_3 = -\frac{\sum_{t=0}^{3} I_t - \sum_{t=4}^{7} I_t}{\sum_{t=0}^{7} I_t} - \frac{I_4 + I_5 - I_6 - I_7}{\sum_{t=4}^{7} I_t}.$$

Similarly, for $M = 3$, we have:

$$U^3_0 = \frac{\sum_{t=0}^{3} I_t - \sum_{t=4}^{7} I_t}{\sum_{t=0}^{7} I_t} + \frac{I_0 + I_1 - I_2 - I_3}{\sum_{t=0}^{3} I_t} + \frac{I_0 - I_1}{I_0 + I_1}$$
$$U^3_1 = \frac{\sum_{t=0}^{3} I_t - \sum_{t=4}^{7} I_t}{\sum_{t=0}^{7} I_t} + \frac{I_0 + I_1 - I_2 - I_3}{\sum_{t=0}^{3} I_t} - \frac{I_0 - I_1}{I_0 + I_1}$$
$$U^3_2 = \frac{\sum_{t=0}^{3} I_t - \sum_{t=4}^{7} I_t}{\sum_{t=0}^{7} I_t} - \frac{I_0 + I_1 - I_2 - I_3}{\sum_{t=0}^{3} I_t} + \frac{I_2 - I_3}{I_2 + I_3}$$
$$U^3_3 = \frac{\sum_{t=0}^{3} I_t - \sum_{t=4}^{7} I_t}{\sum_{t=0}^{7} I_t} - \frac{I_0 + I_1 - I_2 - I_3}{\sum_{t=0}^{3} I_t} - \frac{I_2 - I_3}{I_2 + I_3}$$
$$U^3_4 = -\frac{\sum_{t=0}^{3} I_t - \sum_{t=4}^{7} I_t}{\sum_{t=0}^{7} I_t} + \frac{I_4 + I_5 - I_6 - I_7}{\sum_{t=4}^{7} I_t} + \frac{I_4 - I_5}{I_4 + I_5}$$
$$U^3_5 = -\frac{\sum_{t=0}^{3} I_t - \sum_{t=4}^{7} I_t}{\sum_{t=0}^{7} I_t} + \frac{I_4 + I_5 - I_6 - I_7}{\sum_{t=4}^{7} I_t} - \frac{I_4 - I_5}{I_4 + I_5}$$
$$U^3_6 = -\frac{\sum_{t=0}^{3} I_t - \sum_{t=4}^{7} I_t}{\sum_{t=0}^{7} I_t} - \frac{I_4 + I_5 - I_6 - I_7}{\sum_{t=4}^{7} I_t} + \frac{I_6 - I_7}{I_6 + I_7}$$
$$U^3_7 = -\frac{\sum_{t=0}^{3} I_t - \sum_{t=4}^{7} I_t}{\sum_{t=0}^{7} I_t} - \frac{I_4 + I_5 - I_6 - I_7}{\sum_{t=4}^{7} I_t} - \frac{I_6 - I_7}{I_6 + I_7}.$$

Figure 5.2 compares the log transform (left plot) and the Haar-Fisz transform (right plot) of the wavelet periodogram from the bottom plot of Figure 5.1. Here, $T = 1024$, and the full transform is performed, i.e. $M = 10$. The Haar-Fisz-transformed wavelet periodogram appears to be much closer to normality.


Figure 5.2: The log transform (left plot) and the Haar-Fisz transform with $M = 10$ (right plot) of the wavelet periodogram from the bottom plot of Figure 5.1.

5.4 A Functional CLT for the centred wavelet periodogram

In this section, we are concerned with a Functional Central Limit Theorem (FCLT) for the centred wavelet periodogram $Z^{(j)}_{t,T} := I^{(j)}_{t,T} - E I^{(j)}_{t,T}$ (see Davidson (1994) for more on the stochastic limit theory we use here). Our FCLT demonstrates that the normalised cumulative sum of the centred wavelet periodogram converges in distribution to a transformed Brownian motion. The theory in this section enables us to demonstrate the Gaussianising, variance stabilising and decorrelating properties of the Haar-Fisz transform established in Section 5.5. Before we state the theorem, we introduce some essential notation.

Definition 5.4.1 (transformed Brownian motion) Let $\eta$ be an increasing homeomorphism on $[0,1]$ with $\eta(0) = 0$ and $\eta(1) = 1$. A transformed Brownian motion $B_\eta$ is defined as

$$B_\eta(z) \stackrel{D}{=} B(\eta(z)), \quad z \in [0,1],$$

where $B$ is the standard Brownian motion.


Definition 5.4.2 (cross-scale autocorrelation wavelets) Let $\psi$ be a fixed wavelet system. The vectors $\Psi_{i,j}$, for $i, j \in \{-1, -2, \ldots\}$, defined by

$$\Psi_{i,j}(\tau) = \sum_{s=-\infty}^{\infty} \psi_{i,s+\tau}\, \psi_{j,s} \qquad (5.12)$$

are called the cross-scale autocorrelation wavelets.

Denote

$$\bar{S}_j = \max_z S_j(z), \qquad \rho_k = \max_{j=-1,\ldots,-k} \frac{C_j}{\bar{S}_j},$$

with the convention $0/0 = 0$. Denote further

$$A^\tau_{i,j} = \sum_n \Psi_{i,j}(n)\, \Psi_{i,j}(n + \tau), \qquad \beta^\tau_j(z) = \sum_i S_i(z)\, A^\tau_{i,j}. \qquad (5.13)$$

We now state the Functional Central Limit Theorem for the centred wavelet periodogram.

Theorem 5.4.1 Let $X_{t,T}$ be a Gaussian LSW process, and let $Z^{(j)}_{t,T}$ be its centred wavelet periodogram at scale $j$. Define

$$b^2_T = E\left( \sum_{t=0}^{T-1} Z^{(j)}_{t,T} \right)^2, \qquad R_T(z) = \frac{\sum_{t=0}^{\lfloor zT \rfloor - 1} Z^{(j)}_{t,T}}{b_T} \quad \text{for } z \in [0,1].$$

If

$$\exists\, \varepsilon > 0: \quad \left( \sum_{i<0} \sum_{l \geq m+1} \Psi^2_{i,j}(l)\, \bar{S}_i \right)^{1/2} = O(m^{-1/2 - \varepsilon}), \qquad (5.14)$$
$$\rho_{J(T)}/T \in l_1, \qquad (5.15)$$
$$\sup_z \sum_n |c(z, n)| < \infty, \qquad (5.16)$$
$$\exists\, D: \quad \bar{S}_j 2^{-j} \leq D \quad \forall\, j, \qquad (5.17)$$

then $R_T \stackrel{D}{\to} B_\eta$, where

$$\eta(z) = \frac{\int_0^z \sum_{\tau=-\infty}^{\infty} \left( \sum_i S_i(u) A^\tau_{i,j} \right)^2 du}{\int_0^1 \sum_{\tau=-\infty}^{\infty} \left( \sum_i S_i(u) A^\tau_{i,j} \right)^2 du}.$$
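Before turning to the proof, the convergence in Theorem 5.4.1 can be illustrated numerically in the simplest special case: when the spectrum is constant in $z$ (a stationary process), the integrand defining $\eta$ is constant, so $\eta(z) = z$ and the limit is standard Brownian motion. The sketch below (illustrative Python, not from the thesis) computes the normalised partial sums $R_T$ for the centred non-decimated Haar periodogram of Gaussian white noise at scale $j = -1$:

```python
import numpy as np

rng = np.random.default_rng(11)
T = 4096
X = rng.standard_normal(T + 1)

# Non-decimated Haar coefficients at scale -1 and their periodogram.
d = (X[:-1] - X[1:]) / np.sqrt(2.0)
I = d ** 2
Z = I - 1.0                  # E I = Var(d) = 1 for white noise

# For this example Var(sum Z) is approximately 3T:
# Var(Z_t) = 2 and cov(Z_t, Z_{t+1}) = 2 * (-1/2)^2 = 1/2 on each side.
b_T = np.sqrt(3.0 * T)
R = np.cumsum(Z) / b_T       # discretised R_T(z) at z = t/T
print(R[-1])
```

Plotting `R` against $t/T$ (for several seeds) produces paths that look like Brownian motion; $R_T(1)$ is approximately standard normal.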


The proof of Theorem 5.4.1 appears later in this section. As is clear from the proof, the left-hand expression in condition (5.14) is a measure of dependence in the sequence $Z^{(j)}_{t,T}$ at lag $m$. Condition (5.15) places an additional restriction on the finite-sample wavelet spectrum $\{\omega^2_{j,k;T}\}_{j,k}$ in relation to the asymptotic spectrum $\{S_j(z)\}_j$. Condition (5.16) is a short-memory assumption for $X_{t,T}$, and condition (5.17) requires that the wavelet spectrum should decay at a certain speed as $j \to -\infty$.

We now give an example of a periodogram sequence which satisfies the technical condition (5.14). Let $X_{t,T}$ be a Gaussian LSW process constructed with Haar wavelets and such that $S_i(z) = S_i = 2^i$. Asymptotically, $X_{t,T}$ is a white noise process (see formula (3.70)). Consider the Haar periodogram of $X_{t,T}$ at scale $j = -1$. Using the explicit form of the discrete Haar vectors and (5.12), simple algebra yields

$$\sum_{i<0} 2^i \sum_{l \geq m+1} \Psi^2_{i,-1}(l) = O(m^{-2}),$$

and (5.14) is satisfied.

In particular, Theorem 5.4.1 implies that $E(R_T(z)^2) \to \eta(z)$ as $T \to \infty$, and that the increments of $R_T(z)$ are asymptotically independent. Theorem 5.4.1 is fundamental for the theoretical results of the next section.

To be able to prove Theorem 5.4.1, we first need to recall the definition of $L_2$-Near Epoch Dependence ($L_2$-NED), and then prove two technical lemmas.

Definition 5.4.3 For a stochastic array $\{\{V_{t,T}\}_{t=-\infty}^{\infty}\}_{T=1}^{\infty}$, possibly vector-valued, on a probability space $(\Omega, \mathcal{G}, P)$, let $\mathcal{G}^{t+m}_{t-m,T} = \sigma(V_{t-m,T}, \ldots, V_{t+m,T})$. If an integrable array $\{\{X_{t,T}\}_{t=-\infty}^{\infty}\}_{T=1}^{\infty}$ satisfies

$$\left\| X_{t,T} - E(X_{t,T} \mid \mathcal{G}^{t+m}_{t-m,T}) \right\|_2 \leq h_{t,T}\, \nu_m,$$

where $\nu_m \to 0$ and $\{h_{t,T}\}$ is an array of positive constants, it is said to be $L_2$-NED on $\{V_{t,T}\}$ with constants $\{h_{t,T}\}$. Further, if $\exists\, \varepsilon > 0$ such that $\nu_m = O(m^{-\phi - \varepsilon})$, then $\{\{X_{t,T}\}_{t=-\infty}^{\infty}\}_{T=1}^{\infty}$ is said to be $L_2$-NED of size $-\phi$ on $\{V_{t,T}\}$.

Lemma 5.4.1 Define $\xi_{t,T} = (\xi_{-1,t}, \ldots, \xi_{-J(T),t})'$. If

$$\exists\, \varepsilon > 0: \quad \left( \sum_{i<0} \sum_{l \geq m+1} \Psi^2_{i,j}(l)\, \bar{S}_i \right)^{1/2} = O(m^{-1/2-\varepsilon}),$$


then $Z^{(j)}_{t,T}/b_T$ is $L_2$-NED of size $-1/2$ on $\{\xi_{t,T}\}$. If in addition $\rho_{J(T)}/T \in l_1$, then the NED constants can be set to $1/b_T$.

Proof. It suffices to examine the $L_2$-Near Epoch Dependence of $Z^{(j)}_{t,T}$. Define

$$\mathcal{G}^{t+m}_{t-m,T} = \sigma(\xi_{t-m,T}, \ldots, \xi_{t+m,T}).$$

We have

$$Z^{(j)}_{t,T} - E(Z^{(j)}_{t,T} \mid \mathcal{G}^{t+m}_{t-m,T}) = I^{(j)}_{t,T} - E(I^{(j)}_{t,T} \mid \mathcal{G}^{t+m}_{t-m,T})$$
$$= \left| \sum_{i=-1}^{-J(T)} \sum_k \omega_{i,k;T}\, \Psi_{i,j}(t-k)\, \xi_{i,k} \right|^2 - \left| \sum_{i=-1}^{-J(T)} \sum_{|k-t| \leq m} \omega_{i,k;T}\, \Psi_{i,j}(t-k)\, \xi_{i,k} \right|^2 - \sum_{i=-1}^{-J(T)} \sum_{|k-t| > m} \omega^2_{i,k;T}\, \Psi^2_{i,j}(t-k)$$
$$= Y_1^2 - Y_2^2 - K_1^2 = (Y_1 - Y_2)(Y_1 + Y_2) - K_1^2,$$

where the $Y_n^2$ are random and $K_1$ is deterministic. Note that $Y_1 - Y_2$ and $Y_1 + Y_2$ are Gaussian and that $E(Y_1 - Y_2)^2 = E((Y_1 - Y_2)(Y_1 + Y_2)) = K_1^2$. Simple algebra yields

$$E\left( (Y_1 - Y_2)(Y_1 + Y_2) - K_1^2 \right)^2 = 2 K_1^2\, E(Y_1^2 + Y_2^2) \leq 4 K_1^2\, E Y_1^2.$$

Noting that

$$K_1^2 \leq \left( 1 + \frac{\rho_{J(T)}}{T} \right) \sum_{i=-1}^{-\infty} \sum_{l \geq m+1} \Psi^2_{i,j}(l)\, \bar{S}_i, \qquad E Y_1^2 \leq \left( 1 + \frac{\rho_{J(T)}}{T} \right) \sum_{i=-1}^{-\infty} \sum_l \Psi^2_{i,j}(l)\, \bar{S}_i,$$

and recalling that $\rho_{J(T)}/T \in l_1$, the assertion of the Lemma follows. $\Box$

Lemma 5.4.2 If

$$\sup_{z \in [0,1]} \sum_\tau |c(z, \tau)| < \infty, \qquad (5.18)$$
$$\exists\, D: \quad \bar{S}_j 2^{-j} \leq D \quad \forall\, j, \qquad (5.19)$$


then

$$\frac{b_T^2}{T} \to 2 \int_0^1 \sum_{\tau=-\infty}^{\infty} \left( \sum_i S_i(z)\, A^\tau_{i,j} \right)^2 dz$$

as $T \to \infty$.

Proof. Using Gaussianity, we have

$$b_T^2 = 2 \sum_{t=0}^{T-1} \sum_{\tau=-t}^{T-1-t} \left( \sum_{i=-1}^{-J(T)} \sum_k \omega^2_{i,k;T}\, \Psi_{i,j}(t-k)\, \Psi_{i,j}(t+\tau-k) \right)^2$$
$$= 2 \sum_{t=0}^{T-1} \sum_{\tau=-t}^{T-1-t} \left( \sum_{i=-1}^{-J(T)} \sum_k \left\{ S_i\!\left( \frac{t}{T} \right) + O\!\left( \frac{C_i + L_i |t-k|}{T} \right) \right\} \Psi_{i,j}(t-k)\, \Psi_{i,j}(t+\tau-k) \right)^2$$
$$= 2 \sum_{t=0}^{T-1} \sum_{\tau=-t}^{T-1-t} \left( \sum_i S_i\!\left( \frac{t}{T} \right) A^\tau_{i,j} \right)^2 + \mathrm{Rest}_T,$$

where

$$\mathrm{Rest}_T = 2 \sum_{t=0}^{T-1} \sum_{\tau=-t}^{T-1-t} \left( \sum_{i=-1}^{-J(T)} \sum_k O\!\left( \frac{C_i + L_i |t-k|}{T} \right) \Psi_{i,j}(t-k)\, \Psi_{i,j}(t+\tau-k) - \sum_{i=-J(T)-1}^{-\infty} S_i\!\left( \frac{t}{T} \right) A^\tau_{i,j} \right)$$
$$\times \left( \sum_{i=-1}^{-J(T)} \sum_k \left\{ 2 S_i\!\left( \frac{t}{T} \right) + O\!\left( \frac{C_i + L_i |t-k|}{T} \right) \right\} \Psi_{i,j}(t-k)\, \Psi_{i,j}(t+\tau-k) + \sum_{i=-J(T)-1}^{-\infty} S_i\!\left( \frac{t}{T} \right) A^\tau_{i,j} \right). \qquad (5.20)$$

Let us first show two simple auxiliary results.


1. Summability of the constants $C_i$ and $L_i$. We use the properties of $A$ from Lemma 3.4.1:

$$\sum_i \left( C_i + L_i (2^{-i} + 2^{-j}) \right) A_{i,j} = \sum_i (C_i + L_i 2^{-i})\, 2^j 2^{-j} A_{i,j} + \sum_i L_i 2^{-j}\, 2^i 2^{-i} A_{i,j}$$
$$\leq 2^{-j} \sum_i (C_i + L_i 2^{-i}) \sum_k 2^k A_{i,k} + 2^{-j} \sum_i L_i 2^{-i} \sum_k 2^k A_{k,j} = O(2^{-j}). \qquad (5.21)$$

2. Summability of the covariance of wavelet coefficients:

$$\sum_\tau \left| \sum_i S_i(z) A^\tau_{i,j} \right| = \sum_\tau \left| \sum_i S_i(z) \sum_n \Psi_i(n)\, \Psi_j(n+\tau) \right| = \sum_\tau \left| \sum_n c(z,n)\, \Psi_j(n+\tau) \right|$$
$$\leq \sum_n |c(z,n)| \sum_\tau |\Psi_j(n+\tau)| \leq K_1 2^{-j} \sum_n |c(z,n)| = O(2^{-j}), \qquad (5.22)$$

by assumption (5.18).

By formula (5.21) and assumption (5.19), we have

$$\max_{t,\tau} \left| \sum_{i,k} O\!\left( \frac{C_i + L_i |t-k|}{T} \right) \Psi_{i,j}(t-k)\, \Psi_{i,j}(t+\tau-k) - \sum_{i=-J(T)-1}^{-\infty} S_i\!\left( \frac{t}{T} \right) A^\tau_{i,j} \right|$$
$$\leq O(T^{-1}) \max_{t,\tau} \sum_i \left( C_i + M L_i (2^{-i} + 2^{-j}) \right) \sum_k \left| \Psi_{i,j}(t-k)\, \Psi_{i,j}(t+\tau-k) \right| + \max_{t,\tau} \sum_k \sum_{i=-J(T)-1}^{-\infty} \bar{S}_i\, |\Psi_j(k)|$$
$$\leq O(T^{-1}) \sum_i \left( C_i + M L_i (2^{-i} + 2^{-j}) \right) A_{i,j} + O(2^{-j} T^{-1}) = O(2^{-j} T^{-1}). \qquad (5.23)$$


Using first (5.23), and then (5.22) and (5.21), we bound (5.20) as follows:

$$\mathrm{Rest}_T \leq O(2^{-j} T^{-1}) \sum_{t=0}^{T-1} \sum_{\tau=-t}^{T-1-t} \left| \sum_{i,k} \left\{ 3 S_i\!\left( \frac{t}{T} \right) + O\!\left( \frac{C_i + L_i |t-k|}{T} \right) \right\} \Psi_{i,j}(t-k)\, \Psi_{i,j}(t+\tau-k) \right|$$
$$\leq O(2^{-j} T^{-1}) \sum_{t=0}^{T-1} \sum_{\tau=-t}^{T-1-t} \left| \sum_i S_i\!\left( \frac{t}{T} \right) A^\tau_{i,j} \right| + O(2^{-j} T^{-2}) \sum_{t=0}^{T-1} \sum_{\tau=-t}^{T-1-t} \sum_i \left( C_i + L_i (2^{-i} + 2^{-j}) \right) A_{i,j}$$
$$= O(2^{-2j}) + O(2^{-2j}),$$

which yields the result. $\Box$

Proof of Theorem 5.4.1. We apply Theorem 29.14 from Davidson (1994), with

$$U_{t,T} = Z^{(j)}_{t,T}/b_T, \qquad (5.24)$$
$$c_{t,T} = 1/b_T, \qquad (5.25)$$
$$K_T(z) = \lfloor zT \rfloor, \qquad (5.26)$$

where the left-hand sides of (5.24)-(5.26) use the notation from Davidson (1994), and the right-hand sides of these formulas use the notation from this chapter. We now check conditions (a)-(f) from Davidson (1994).

(a) Clearly, $E Z^{(j)}_{t,T} = 0$.

(b) For Gaussian LSW processes, we have $\sup_{t,T} \| Z^{(j)}_{t,T} \|_r < \infty$ for $r > 2$.

(c) Satisfied by Lemma 5.4.1, as the $\{\xi_{t,T}\}$ are independent.

(d) Satisfied by Lemma 5.4.2, as

$$\limsup_{T \to \infty} \frac{\sum_{t=\lfloor zT \rfloor}^{\lfloor (z+\omega)T \rfloor - 1} 1}{b_T^2\, \omega} = \limsup_{T \to \infty} \frac{T}{b_T^2},$$

which is finite by Lemma 5.4.2.

(e) We clearly have $1/b_T = O(T^{1/2 - 1}) = O(T^{-1/2})$.


(f) Again by Lemma 5.4.2, we have
\[
ER_T^2(z) = \frac{b^2_{\lfloor zT \rfloor}}{b_T^2} \to \eta(z).
\]
This completes the proof. □

5.5 Properties of the Haar-Fisz transform

5.5.1 Properties of the Haar-Fisz transform for M fixed

In this section, we quantify the Gaussianising, variance stabilising and decorrelating properties of the Haar-Fisz transform for $M$ fixed. The following theorem holds:

Theorem 5.5.1 Let $X_{t,T}$ satisfy the assumptions of Theorem 5.4.1, and let $I^{(j)}_{t,T}$ be the wavelet periodogram of $X_{t,T}$ at scale $j$. Let the corresponding functions $\beta_j(z)$ and $\sum_\tau (\beta^\tau_j(z))^2$ (see formulas (5.4) and (5.13)) be continuous with bounded one-sided derivatives. Further, let $\beta_j(z)$ be bounded away from zero. For $M$ fixed, $U^M = F^M I^{(j)}_{t,T}$ admits the following decomposition:
\[
U^M = V^M + Y^M,
\]
where

1. $V^M$ has an almost-sure deterministic limit as $T \to \infty$;

2. $\sqrt{T}\, Y^M \stackrel{D}{\to} N(0, \Sigma)$ as $T \to \infty$, with
\[
(2^{M+1}-2)\inf_{\omega\in[0,1]}\frac{\sum_\tau(\beta^\tau_j(\omega))^2}{(\beta_j(\omega))^2} - O(M) \le \Sigma_{n,n} \le (2^{M+1}-2)\sup_{\omega\in[0,1]}\frac{\sum_\tau(\beta^\tau_j(\omega))^2}{(\beta_j(\omega))^2} + O(M) \qquad (5.27)
\]
and
\[
\Sigma_{n_1,n_2} = O(M) \quad \text{for } n_1 \ne n_2. \qquad (5.28)
\]

Property 2. above is called the Gaussianisation property of the Haar-Fisz transform. Formulas (5.27) and (5.28) define, respectively, the variance stabilisation and the decorrelation properties of the Haar-Fisz transform. Note the different


asymptotic regimes for $V^M$ and $Y^M$: the multiplication of $Y^M$ by $\sqrt{T}$ is needed because $\mathrm{Var}(Y^M_n) = O(2^M/T)$; remember that $M$ is fixed. However, for the invertible case (see the discussion in Section 5.3), we require $M = \log_2(T)$. Even though this case is extremely challenging to investigate theoretically, we cast some light on the behaviour of $F^{\log_2(T)}$ in Section 5.5.2. We now prove Theorem 5.5.1.

Proof. Denote $Z_t = I_t - EI_t$ and recall that $\beta^\tau_j(z) = \sum_i S_i(z) A^\tau_{i,j}$. Note that $\beta^0_j(z) = \beta_j(z)$. Consider a single Haar-Fisz summand $f^m_n$, for $m \in \{0,1,\ldots,M-1\}$ and $n \in \{0,1,\ldots,2^m-1\}$. In what follows, the $\gamma_{m,n}$ are appropriate integers and $\delta_{m,n} \in \{0,1\}$; within the displays below we write $\gamma$ for $\gamma_{m,n}$. We have
\begin{align*}
f^m_n &= (-1)^{\delta_{m,n}}\,\frac{\sum_{t=\gamma T2^{-(m+1)}}^{(\gamma+1)T2^{-(m+1)}-1} I_t - \sum_{t=(\gamma+1)T2^{-(m+1)}}^{(\gamma+2)T2^{-(m+1)}-1} I_t}{\sum_{t=\gamma T2^{-(m+1)}}^{(\gamma+2)T2^{-(m+1)}-1} I_t} \\
&= (-1)^{\delta_{m,n}}\,\frac{\sum_{t=\gamma T2^{-(m+1)}}^{(\gamma+1)T2^{-(m+1)}-1} Z_t - \sum_{t=(\gamma+1)T2^{-(m+1)}}^{(\gamma+2)T2^{-(m+1)}-1} Z_t}{\sum_{t=\gamma T2^{-(m+1)}}^{(\gamma+2)T2^{-(m+1)}-1} Z_t + \sum_{t=\gamma T2^{-(m+1)}}^{(\gamma+2)T2^{-(m+1)}-1} EI_t}
+ (-1)^{\delta_{m,n}}\,\frac{\sum_{t=\gamma T2^{-(m+1)}}^{(\gamma+1)T2^{-(m+1)}-1} EI_t - \sum_{t=(\gamma+1)T2^{-(m+1)}}^{(\gamma+2)T2^{-(m+1)}-1} EI_t}{\sum_{t=\gamma T2^{-(m+1)}}^{(\gamma+2)T2^{-(m+1)}-1} Z_t + \sum_{t=\gamma T2^{-(m+1)}}^{(\gamma+2)T2^{-(m+1)}-1} EI_t} \\
&= y^m_n + v^m_n.
\end{align*}
By Theorem 5.4.1 and Cramér's theorem (Davidson (1994), Theorem 22.14), we have
\begin{align*}
\sqrt{T}\,y^m_n &= \frac{\sum_{t=\gamma T2^{-(m+1)}}^{(\gamma+1)T2^{-(m+1)}-1} Z_t - \sum_{t=(\gamma+1)T2^{-(m+1)}}^{(\gamma+2)T2^{-(m+1)}-1} Z_t}{b_T}\cdot\frac{(-1)^{\delta_{m,n}}\, b_T\sqrt{T}}{\sum_{t=\gamma T2^{-(m+1)}}^{(\gamma+2)T2^{-(m+1)}-1} Z_t + \sum_{t=\gamma T2^{-(m+1)}}^{(\gamma+2)T2^{-(m+1)}-1} EI_t} \\
&\stackrel{D}{\to} \frac{\tilde B((\gamma+2)2^{-(m+1)}) - 2\tilde B((\gamma+1)2^{-(m+1)}) + \tilde B(\gamma 2^{-(m+1)})}{(-1)^{\delta_{m,n}}}\cdot\frac{2^{1/2}\big(\sum_{\tau=-\infty}^{\infty}\int_0^1(\beta^\tau_j(z))^2\,dz\big)^{1/2}}{\int_{\gamma 2^{-(m+1)}}^{(\gamma+2)2^{-(m+1)}}\beta_j(z)\,dz}
\end{align*}


as $T \to \infty$. Denote the distributional limit by $\tilde y^m_n$. Set $Y^M_n = \sum_{m=0}^{M-1} y^m_n$ and $\tilde Y^M_n = \sum_{m=0}^{M-1} \tilde y^m_n$. Denote further $c(1) = 2^{1/2}\big(\sum_{\tau=-\infty}^{\infty}\int_0^1(\beta^\tau_j(z))^2\,dz\big)^{1/2}$. We have
\[
\sqrt{T}\,Y^M_n \stackrel{D}{\to} \tilde Y^M_n = c(1)\sum_{m=0}^{M-1}(-1)^{\delta_{m,n}}\,\frac{\tilde B((\gamma_{m,n}+2)2^{-(m+1)}) - 2\tilde B((\gamma_{m,n}+1)2^{-(m+1)}) + \tilde B(\gamma_{m,n}2^{-(m+1)})}{\int_{\gamma_{m,n}2^{-(m+1)}}^{(\gamma_{m,n}+2)2^{-(m+1)}}\beta_j(z)\,dz},
\]
as $T \to \infty$. It is immediate that $E\tilde Y^M_n = 0$. We now look at the variance-covariance matrix of $\tilde Y^M$. We have
\begin{align*}
\mathrm{Var}(\tilde Y^M_n) = c^2(1)\Bigg(&\sum_{m=0}^{M-1}\frac{\eta((\gamma_{m,n}+2)2^{-(m+1)}) - \eta(\gamma_{m,n}2^{-(m+1)})}{\big(\int_{I_m}\beta_j(z)\,dz\big)^2} \\
&+ 2\sum_{m=0}^{M-1}\sum_{m'=m+1}^{M-1}(-1)^{\delta_{m,n}}(-1)^{\delta_{m',n}}\,\frac{N_{m,m'}}{\int_{I_m}\beta_j(z)\,dz\,\int_{I_{m'}}\beta_j(z)\,dz}\Bigg),
\end{align*}
where
\begin{align*}
N_{m,m'} = &-2\eta\big(\nu_{m,n}\wedge\gamma_{m',n}2^{-(m'+1)}\big) + 4\eta\big(\nu_{m,n}\wedge(\gamma_{m',n}+1)2^{-(m'+1)}\big) - 2\eta\big(\nu_{m,n}\wedge(\gamma_{m',n}+2)2^{-(m'+1)}\big) \\
&+ \eta\big(\gamma_{m',n}2^{-(m'+1)}\big) - 2\eta\big((\gamma_{m',n}+1)2^{-(m'+1)}\big) + \eta\big((\gamma_{m',n}+2)2^{-(m'+1)}\big),
\end{align*}
$\nu_{m,n} = (\gamma_{m,n}+1)2^{-(m+1)}$, and $I_m = [\gamma_{m,n}2^{-(m+1)},\,(\gamma_{m,n}+2)2^{-(m+1)}]$ (with $I_{m'}$ defined analogously).


Diagonal contribution. Let us first consider the diagonal contribution to $\mathrm{Var}(\tilde Y^M_n)$. Writing $I_m = [\gamma_{m,n}2^{-(m+1)},\,(\gamma_{m,n}+2)2^{-(m+1)}]$, we have
\[
\frac{c^2(1)}{2}\,\frac{\eta((\gamma_{m,n}+2)2^{-(m+1)}) - \eta(\gamma_{m,n}2^{-(m+1)})}{\big(\int_{I_m}\beta_j(z)\,dz\big)^2} = \frac{\sum_{\tau=-\infty}^{\infty}\int_{I_m}(\beta^\tau_j(z))^2\,dz}{\big(\int_{I_m}\beta_j(z)\,dz\big)^2}. \qquad (5.29)
\]
By the Cauchy inequality and the extended mean-value theorem, we have
\[
\Big(\int_{I_m}\beta_j(z)\,dz\Big)^2 \le 2^{-m}\int_{I_m}(\beta_j(z))^2\,dz
= 2^{-m}\int_{I_m}\sum_\tau(\beta^\tau_j(z))^2\,dz\;\frac{\int_{I_m}(\beta_j(z))^2\,dz}{\int_{I_m}\sum_\tau(\beta^\tau_j(z))^2\,dz}
= 2^{-m}\int_{I_m}\sum_\tau(\beta^\tau_j(z))^2\,dz\;\frac{(\beta_j(\omega))^2}{\sum_\tau(\beta^\tau_j(\omega))^2},
\]
where $\omega \in I_m$. This, combined with (5.29), gives
\[
2^{m+1}\inf_{\omega\in[0,1]}\frac{\sum_\tau(\beta^\tau_j(\omega))^2}{(\beta_j(\omega))^2} \le c^2(1)\,\frac{\eta((\gamma_{m,n}+2)2^{-(m+1)}) - \eta(\gamma_{m,n}2^{-(m+1)})}{\big(\int_{I_m}\beta_j(z)\,dz\big)^2}. \qquad (5.30)
\]
To obtain the upper bound, note that there exist $\omega_1, \omega_2 \in I_m$ such that
\begin{align*}
\frac{\sum_{\tau=-\infty}^{\infty}\int_{I_m}(\beta^\tau_j(z))^2\,dz}{\big(\int_{I_m}\beta_j(z)\,dz\big)^2}
&\le \frac{\int_{I_m}(\beta_j(z))^2\,dz}{\big(\int_{I_m}\beta_j(z)\,dz\big)^2}\,\sup_{\omega\in[0,1]}\frac{\sum_\tau(\beta^\tau_j(\omega))^2}{(\beta_j(\omega))^2}
= 2^{m}\,\frac{(\beta_j(\omega_1))^2}{(\beta_j(\omega_2))^2}\,\sup_{\omega\in[0,1]}\frac{\sum_\tau(\beta^\tau_j(\omega))^2}{(\beta_j(\omega))^2} \\
&\le 2^{m}\,\Big(1 + 2^{-m}\,\frac{\sup_{\omega\in[0,1]}|\beta_j'(\omega)|}{\inf_{\omega\in[0,1]}\beta_j(\omega)}\Big)^{2}\,\sup_{\omega\in[0,1]}\frac{\sum_\tau(\beta^\tau_j(\omega))^2}{(\beta_j(\omega))^2},
\end{align*}
where $\beta_j'$ is the one-sided derivative of $\beta_j$. The above, combined with (5.29), yields
\[
c^2(1)\,\frac{\eta((\gamma_{m,n}+2)2^{-(m+1)}) - \eta(\gamma_{m,n}2^{-(m+1)})}{\big(\int_{I_m}\beta_j(z)\,dz\big)^2} \le 2^{m+1}\sup_{\omega\in[0,1]}\frac{\sum_\tau(\beta^\tau_j(\omega))^2}{(\beta_j(\omega))^2} + O(1). \qquad (5.31)
\]


Off-diagonal contribution. Two cases are possible: either $\nu_{m,n} \ge (\gamma_{m',n}+2)2^{-(m'+1)}$ or $\nu_{m,n} \le \gamma_{m',n}2^{-(m'+1)}$. In either of the two cases, we have
\begin{align*}
\big|-2\eta(\nu_{m,n}\wedge\gamma_{m',n}2^{-(m'+1)}) &+ 4\eta(\nu_{m,n}\wedge(\gamma_{m',n}+1)2^{-(m'+1)}) - 2\eta(\nu_{m,n}\wedge(\gamma_{m',n}+2)2^{-(m'+1)}) \\
&+ \eta(\gamma_{m',n}2^{-(m'+1)}) - 2\eta((\gamma_{m',n}+1)2^{-(m'+1)}) + \eta((\gamma_{m',n}+2)2^{-(m'+1)})\big| \\
= \big|\eta(\gamma_{m',n}2^{-(m'+1)}) &- 2\eta((\gamma_{m',n}+1)2^{-(m'+1)}) + \eta((\gamma_{m',n}+2)2^{-(m'+1)})\big|
\le 2^{-2m'-1}\sup_{\omega\in[0,1]}|\eta''(\omega)|,
\end{align*}
where the last inequality follows by the mean-value theorem and $\eta''$ denotes the one-sided derivative of $\eta'$. Using the above, and, again, the mean-value theorem, we bound the off-diagonal contribution (writing $I_m = [\gamma_{m,n}2^{-(m+1)},\,(\gamma_{m,n}+2)2^{-(m+1)}]$, with $I_{m'}$ defined analogously) by
\begin{align*}
2c^2(1)\sum_{m=0}^{M-1}\frac{1}{\int_{I_m}\beta_j(z)\,dz}\sum_{m'=m+1}^{M-1}\frac{2^{-(m'+1)}\,2^{-m'}\sup_{\omega\in[0,1]}|\eta''(\omega)|}{\int_{I_{m'}}\beta_j(z)\,dz}
&\le 2c^2(1)\,\frac{\sup_{\omega\in[0,1]}|\eta''(\omega)|}{\inf_{\omega\in[0,1]}\beta_j(\omega)}\sum_{m=0}^{M-1}\frac{1}{\int_{I_m}\beta_j(z)\,dz}\sum_{m'=m+1}^{M-1}2^{-(m'+1)} \\
&\le 2c^2(1)\,\frac{\sup_{\omega\in[0,1]}|\eta''(\omega)|}{\inf_{\omega\in[0,1]}\beta_j(\omega)}\sum_{m=0}^{M-1}\frac{2^{-(m+1)}}{\int_{I_m}\beta_j(z)\,dz} \\
&\le c^2(1)\,\frac{\sup_{\omega\in[0,1]}|\eta''(\omega)|}{\inf_{\omega\in[0,1]}(\beta_j(\omega))^2}\,M = O(M). \qquad (5.32)
\end{align*}
Putting together (5.30), (5.31) and (5.32), we finally arrive at
\[
(2^{M+1}-2)\inf_{\omega\in[0,1]}\frac{\sum_\tau(\beta^\tau_j(\omega))^2}{(\beta_j(\omega))^2} - O(M) \le \mathrm{Var}(\tilde Y^M_n) \le (2^{M+1}-2)\sup_{\omega\in[0,1]}\frac{\sum_\tau(\beta^\tau_j(\omega))^2}{(\beta_j(\omega))^2} + O(M). \qquad (5.33)
\]
Let us now consider $\mathrm{Cov}(\tilde Y^M_{n_1}, \tilde Y^M_{n_2})$ for $n_1 \ne n_2$. Let $M' = \#\{m : \tilde y^m_{n_1} = \tilde y^m_{n_2}\}$. Let us look at the case $M' > 0$ (the case $M' = 0$ is straightforward). It is easy to show


that
\begin{align*}
\mathrm{Cov}(\tilde Y^M_{n_1}, \tilde Y^M_{n_2}) &= \mathrm{Cov}\Bigg(\sum_{m=0}^{M'-1}\tilde y^m_{n_1} + \tilde y^{M'}_{n_1} + \sum_{m=M'+1}^{M-1}\tilde y^m_{n_1},\;\sum_{m=0}^{M'-1}\tilde y^m_{n_1} - \tilde y^{M'}_{n_1} + \sum_{m=M'+1}^{M-1}\tilde y^m_{n_2}\Bigg) \\
&= \mathrm{Var}\Bigg(\sum_{m=0}^{M'-1}\tilde y^m_{n_1}\Bigg) - \mathrm{Var}\big(\tilde y^{M'}_{n_1}\big)
+ E\Bigg(\sum_{m=0}^{M'-1}\tilde y^m_{n_1}\Big(\sum_{m=M'+1}^{M-1}\tilde y^m_{n_1} + \sum_{m=M'+1}^{M-1}\tilde y^m_{n_2}\Big) + \tilde y^{M'}_{n_1}\Big(\sum_{m=M'+1}^{M-1}\tilde y^m_{n_1} - \sum_{m=M'+1}^{M-1}\tilde y^m_{n_2}\Big)\Bigg).
\end{align*}
The expectation can be shown to be $O(M)$ using the same methodology as for bounding the off-diagonal component of $\mathrm{Var}(\tilde Y^M_n)$. We will now show that $\mathrm{Var}(\sum_{m=0}^{M'-1}\tilde y^m_{n_1}) - \mathrm{Var}(\tilde y^{M'}_{n_1}) = O(M)$. We first quote two simple facts: let $g$ be a continuous function with a bounded one-sided derivative over $[0,1]$ and let $[c,d] \subseteq [a,b] \subseteq [0,1]$. We have
\begin{align*}
\Big|\int_a^b g(z)\,dz - \frac{b-a}{d-c}\int_c^d g(z)\,dz\Big| &\le (b-a)^2\,\sup_z|g'(z)|, \qquad (5.34) \\
\Big|\Big(\int_c^d g(z)\,dz\Big)^2 - \Big(\frac{d-c}{b-a}\Big)^2\Big(\int_a^b g(z)\,dz\Big)^2\Big| &\le (d-c)^2(b-a)\,\sup_z|(g^2(z))'|. \qquad (5.35)
\end{align*}
For simplicity, denote $n = n_1$, and write $I_m = [\gamma_{m,n}2^{-(m+1)},\,(\gamma_{m,n}+2)2^{-(m+1)}]$, with $I_{M'}$ defined analogously. Using again the same method as for bounding the off-diagonal component of the variance, we obtain
\begin{align*}
\mathrm{Var}\Bigg(\sum_{m=0}^{M'-1}\tilde y^m_n\Bigg) - \mathrm{Var}\big(\tilde y^{M'}_n\big)
&= O(M) + 2\sum_{m=0}^{M'-1}\frac{\sum_{\tau=-\infty}^{\infty}\int_{I_m}(\beta^\tau_j(z))^2\,dz}{\big(\int_{I_m}\beta_j(z)\,dz\big)^2} - 2\,\frac{\sum_{\tau=-\infty}^{\infty}\int_{I_{M'}}(\beta^\tau_j(z))^2\,dz}{\big(\int_{I_{M'}}\beta_j(z)\,dz\big)^2} \\
&= O(M) + 2\sum_{m=0}^{M'-1}\Bigg(\frac{\sum_{\tau=-\infty}^{\infty}\int_{I_m}(\beta^\tau_j(z))^2\,dz}{\big(\int_{I_m}\beta_j(z)\,dz\big)^2} - \frac{\sum_{\tau=-\infty}^{\infty}\int_{I_{M'}}(\beta^\tau_j(z))^2\,dz}{2^{M'-m}\big(\int_{I_{M'}}\beta_j(z)\,dz\big)^2}\Bigg) - 2^{-M'+1}\,\frac{\sum_{\tau=-\infty}^{\infty}\int_{I_{M'}}(\beta^\tau_j(z))^2\,dz}{\big(\int_{I_{M'}}\beta_j(z)\,dz\big)^2}.
\end{align*}


Consider a single component of the sum over $m$: it is a difference of two ratios which we denote here by $I - II$ to shorten the notation. We have $|I - II| \le |I - III| + |III - II|$, where
\[
III = \frac{2^{M'-m}\sum_{\tau=-\infty}^{\infty}\int_{I_{M'}}(\beta^\tau_j(z))^2\,dz}{\big(\int_{I_m}\beta_j(z)\,dz\big)^2},
\]
with $I_m = [\gamma_{m,n}2^{-(m+1)},\,(\gamma_{m,n}+2)2^{-(m+1)}]$ and $I_{M'}$ defined analogously. Using (5.34), we get
\[
|I - III| \le \frac{2^{-2m}\sup_{\omega\in[0,1]}\big|\big(\sum_\tau(\beta^\tau_j(\omega))^2\big)'\big|}{\big(\int_{I_m}\beta_j(z)\,dz\big)^2} \le \frac{\sup_{\omega\in[0,1]}\big|\big(\sum_\tau(\beta^\tau_j(\omega))^2\big)'\big|}{\big(\inf_{\omega\in[0,1]}\beta_j(\omega)\big)^2}.
\]
On the other hand, using (5.35) we have
\begin{align*}
|III - II| &= 2^{M'-m}\,\frac{\Big|\big(\int_{I_{M'}}\beta_j(z)\,dz\big)^2 - 2^{2(m-M')}\big(\int_{I_m}\beta_j(z)\,dz\big)^2\Big|}{\big(\int_{I_m}\beta_j(z)\,dz\big)^2}\cdot\frac{\sum_{\tau=-\infty}^{\infty}\int_{I_{M'}}(\beta^\tau_j(z))^2\,dz}{\big(\int_{I_{M'}}\beta_j(z)\,dz\big)^2} \\
&\le 2^{-M'}\,\frac{\sum_{\tau=-\infty}^{\infty}\int_{I_{M'}}(\beta^\tau_j(z))^2\,dz}{\big(\int_{I_{M'}}\beta_j(z)\,dz\big)^2}\cdot\frac{\sup_{\omega\in[0,1]}\big|\big((\beta_j(\omega))^2\big)'\big|}{\big(\inf_{\omega\in[0,1]}\beta_j(\omega)\big)^2},
\end{align*}
which is bounded by (5.33). This proves the assertion that $\mathrm{Var}(\sum_{m=0}^{M'-1}\tilde y^m_{n_1}) - \mathrm{Var}(\tilde y^{M'}_{n_1}) = O(M)$. Setting $V^M_n = \sum_{m=0}^{M-1} v^m_n$ completes the proof of the theorem. □

5.5.2 Properties of the Haar-Fisz transform for $M = \log_2(T)$

In the asymptotic framework set out in Section 5.5.1, we assume that $M$ is fixed, and therefore the length of the Haar-Fisz-transformed vector $F^M I$ is always constant and equal to $2^M$, even though $T \to \infty$. This ensures the asymptotic Gaussianity of $F^M I$, in the sense specified by Theorem 5.5.1. However, to obtain an


Figure 5.3: Left plot: the q-q plot of $f^9$ arising from the Haar periodogram of a pure white noise process at scale $j = -1$ (against the normal quantiles). Right plot: solid line, the variance of $f^{\log_2(T)-1}_n$ against the correlation of the Gaussian variables involved; dotted line, variance $= 0.4$ (see text for further description).

invertible operator, we need to set $M = \log_2(T)$. Simulations suggest that the asymptotic distribution of $F^{\log_2(T)}$ is not exactly Gaussian, which is not surprising given the fact that the distribution of
\[
f^{\log_2(T)-1}_n = \frac{I^{(j)}_{2n,T} - I^{(j)}_{2n+1,T}}{I^{(j)}_{2n,T} + I^{(j)}_{2n+1,T}}
\]
(see the second example in Section 5.3.2) is far from Gaussian. To illustrate this statement, let us consider the Haar periodogram sequence $I^{(-1)}_{t,1024}$ of a pure white noise process. The left plot in Figure 5.3 shows the q-q plot of the corresponding sequence $f^9$ against the normal quantiles: its distribution deviates strongly from the Gaussian in the tails. Other extensive simulations have shown that, for a wide range of processes, the distribution of $f^M$ gets closer to Gaussianity as $M$ decreases (as expected; see the proof of Theorem 5.5.1).

However, even though taking $M = \log_2(T)$ (instead of keeping it fixed) spoils the asymptotic Gaussianity of $F^M I$, it does not seem to upset the other important property of $F^M I$: the variance stabilisation. To illustrate this point, we consider


the variance of the summand $f^{\log_2(T)-1}_n$. Note that $f^{\log_2(T)-1}_n$ is always of the form $f^{\log_2(T)-1}_n = (\xi_1^2 - \xi_2^2)/(\xi_1^2 + \xi_2^2)$, where $(\xi_1, \xi_2)$ is bivariate normal with mean $(0,0)$. For simplicity, we assume that $\mathrm{Var}(\xi_1) = \mathrm{Var}(\xi_2)$, which is not a restrictive assumption: due to the local stationarity property, the two variances tend to the same limit as $T \to \infty$. Let $\rho = \mathrm{corr}(\xi_1, \xi_2)$. By straightforward computation (representing $\xi_2 = \rho\xi_1 + \sqrt{1-\rho^2}\,\xi_3$ with $\xi_3$ independent of $\xi_1$, and noting that $u = \xi_3/\xi_1$ is standard Cauchy), it can be shown that
\[
\mathrm{Var}\big(f^{\log_2(T)-1}_n\big) = \frac{1}{\pi}\int_{-\infty}^{\infty}\Bigg(\frac{(1-\rho^2)(1-u^2) - 2\rho\sqrt{1-\rho^2}\,u}{(1+\rho^2) + 2\rho\sqrt{1-\rho^2}\,u + (1-\rho^2)u^2}\Bigg)^2\frac{du}{1+u^2}.
\]
The right plot in Figure 5.3 shows the graph of $\mathrm{Var}(f^{\log_2(T)-1}_n)$ against $\rho$. It can be seen that $\mathrm{Var}(f^{\log_2(T)-1}_n)$ is "stable" for a wide range of correlation values: indeed, the variance is between 0.4 and 0.5 for $|\rho| \le 0.74$. This implies that while incorporating $f^{\log_2(T)-1}$ spoils the asymptotic Gaussianity property of the Haar-Fisz transform, it helps achieve its variance stabilisation property.

A similar variance stabilisation phenomenon occurs for $f^M$ for $M < \log_2(T) - 1$.

5.5.3 Simulation

As an illustration of the Gaussianisation and the variance stabilisation properties of the Haar-Fisz transform, consider the process $X_{t,T} = \sigma(t/T)Y_{t,T}$, where $Y_{t,T} = \alpha(t/T)Y_{t-1,T} + \varepsilon_t$ with $|\alpha(z)| < 1$ and $\varepsilon_t \sim N(0,1)$ i.i.d. It can easily be shown that the local autocovariance function for $X_{t,T}$ has the form
\[
c(z,\tau) = \sigma^2(z)\,\frac{\alpha(z)^{|\tau|}}{1-\alpha(z)^2}
\]
and, for $\beta_j(z)$ arising from the Haar periodogram, we have
\[
\beta_j(z) = \sigma^2(z)\Bigg[\frac{1}{(1-\alpha(z))^2} + \frac{2^{j+3}\alpha(z)^{2^{-j-1}+1} - 6\cdot 2^{j}\alpha(z) - 2^{j+1}\alpha(z)^{2^{-j}+1}}{(1-\alpha(z)^2)(1-\alpha(z))^2}\Bigg].
\]
We consider the following two cases:

TVAR. $\sigma^2(z) = 1$ and $\alpha(z) = 1.8z - 0.9$, so that $X_{t,T}$ is a time-varying AR(1) process;


TMWN. $\sigma^2(z)$ is a scaled Donoho & Johnstone (1995) bumps function with (min, max) values of (1/8, 8), and $\alpha(z) = 0$, so that $X_{t,T}$ is a time-modulated white noise process.

In both of these models, we simulate 100 sample paths for both $T = 256$ and $T = 1024$. For each of the simulated sample paths, we compute the wavelet periodogram at scales $j = -1, \ldots, -\log_2(T)$. For each of the periodogram sequences $I^{(j)}_{t,T}$ obtained in this way, we compute the residuals $F^M I^{(j)}_{t,T} - F^M \beta_j(t/T)$ for $M = \log_2(T)-2,\ \log_2(T)-1,\ \log_2(T)$. We assess the Gaussianity of each sequence of residuals by looking at the p-value of the Kolmogorov-Smirnov statistic, returned by the S-Plus function ks.gof. For comparison, we also consider the residuals from the log transform: $\log(I^{(j)}_{t,T}) - \log(\beta_j(t/T))$.

The results of the experiment are shown in Figure 5.4. We observe that for $M = \log_2(T)-2$, the proportion of p-values exceeding 5% is close to 95% for $j = -1, \ldots, -5$, so that residual sequences at these scales can be regarded as approximately Gaussian. However, even for $M = \log_2(T)$ the proportion of p-values exceeding 5% is incomparably larger than the same proportion computed for the log transform. Indeed, for $T = 1024$, no p-value exceeded the 5% threshold for the log transform.

The above experiment demonstrates that even for $M = \log_2(T)$ (the invertible case), the Haar-Fisz transform is a far better Gaussianiser than the log transform. In practice, we often observe a degree of correlation in $F^M I^{(j)}$, particularly at coarser scales, i.e. for large negative $j$. This has to be taken into account when denoising Haar-Fisz transformed sequences.

5.6 Denoising the wavelet periodogram

In this section, we first outline our general methodology for denoising the wavelet periodogram of a Gaussian LSW process $X_{t,T}$, based on a single stretch of observations. Then, we provide simulation results which demonstrate the effectiveness


Figure 5.4: Proportion of p-values exceeding or equal to 5% (x-axis shows negative scale $-j$). Left column: results for TVAR; right column: results for TMWN. Top row: $T = 256$; bottom row: $T = 1024$. Solid line: $M = \log_2(T)$; dotted line: $M = \log_2(T)-1$; dashed line: $M = \log_2(T)-2$; long-dashed line: the log transform. Horizontal solid line: 0.95.


of our technique.

The generic algorithm consists of the following steps.

1. For each $j = -1, \ldots, -J(T)$, compute the raw wavelet periodogram $I^{(j)}_{t,T}$. In practice, this is done by taking the non-decimated wavelet transform of $X_{t,T}$ down to the level $-J(T)$, and then squaring the result. For computational convenience, we use periodic boundary treatment; another option would be to use e.g. symmetric boundary treatment.

2. For each $j = -1, \ldots, -J(T)$, take the Haar-Fisz transform of $I^{(j)}_{t,T}$ at a fixed resolution level $M \le \log_2(T)$.

3. For each $j$, denoise the Haar-Fisz transformed periodogram sequence using any wavelet denoising technique suitable for correlated Gaussian noise with constant variance. The wavelet denoising procedure employed at this stage may be of a translation-invariant (TI) type: we refer to TI-denoising at this stage as "internal" cycle-spinning (CS).

4. For each $j$, take the inverse Haar-Fisz transform of the denoised data.

5. If $M < \log_2(T)$, then for each $j$ interpolate the estimates obtained in this way to the grid $\{t/T\}_{t=0}^{T-1}$ (so that they are of length $T$ and not $2^M < T$). In our empirical investigation, we used simple linear interpolation. For each $j$, take the result to be an estimate of $\beta_j(z)$.

6. For a fixed integer $S$, let $s = 1, \ldots, S-1$. For each $j$, shift $I^{(j)}_{t,T}$ cyclically by $s$, denoise the shifted version using steps 2.-5. of this algorithm, and shift back by $s$ to obtain an estimate of $\beta_j(z)$. The CS at this stage is referred to as "external" cycle-spinning.

7. For each $j$, the final estimate of $\beta_j(z)$ is obtained by averaging over the estimates obtained through the $S$ shifts.

A few remarks are in order.


Computational complexity. Steps 1.-5. of the algorithm are each of computational order $O(TJ(T))$, provided that the wavelet denoising method used in step 3. has complexity $O(T)$. Therefore, the whole algorithm 1.-7. is of computational order $O(STJ(T))$. In practice, the software is fast.

Use of wavelets. It is worth recalling here that, effectively, we use wavelets at four different stages of the denoising procedure:

1. First of all, a non-decimated wavelet system is used in the construction of the LSW process $X_{t,T}$.

2. The same system is used to compute the wavelet periodogram $I^{(j)}_{t,T}$ in step 1. of the denoising algorithm.

3. The (inverse) Haar-Fisz transform in step 2. (4.) relies on the Haar transform: thus, wavelets are used for the third time.

4. Finally, we use wavelets (possibly a different family, say $\tilde\psi$) to denoise the Haar-Fisz transformed periodogram in step 3.

Cycle-spinning. Let $S$ be the shift-by-one operator from Nason & Silverman (1995). The Haar-Fisz transform is not translation-equivariant since $SF^M \ne F^M S$. Therefore, it is potentially beneficial to apply the external CS of step 6. even if step 3. uses internal CS.

We now move on to describe our particular simulation setup.

5.6.1 Simulation

In this section, we describe the details of our simulation study which compares the performance of our Haar-Fisz denoising algorithm with the original technique of Nason et al. (2000).

The "test processes" used in this section are the same as those in Section 5.5.3: TVAR and TMWN. We consider the Haar periodogram of TVAR and TMWN, for sample paths of length 256 and 1024. In step 3. of the Haar-Fisz denoising


algorithm, we use non-TI level-dependent universal hard thresholding, appropriate for correlated Gaussian data as described in Johnstone & Silverman (1997). At this stage, we use Daubechies' Least Asymmetric wavelets with 4 vanishing moments, in both our algorithm and that of Nason et al. (2000).

Computational experiments suggest that for correlated noise, the choice of primary resolution (PR) is of utmost importance. We do not choose the PR automatically (actually, we are unaware of any existing technique for performing automatic PR selection when the noise is correlated), but instead, we subjectively choose the PR for which the method of Nason et al. (2000) gives the most visually appealing results for the wavelet periodogram at the finest scale, i.e. $j = -1$. We also use the same PR in our algorithm. The particular values of the PR are:

- 7 for TMWN 1024;
- 6 for TMWN 256;
- 4 for TVAR 1024;
- 3 for TVAR 256.

We use $S = 10$ external cycle-shifts. Using more shifts is likely to be beneficial in terms of MISE but is also more burdensome computationally. We only report results for $M = \log_2(T)$ (i.e. for the full invertible Haar-Fisz transform).

Figure 5.5 shows estimates of the local variance constructed from the estimates of the periodogram (formula (3.65)) obtained using the two methods described above, for particular sample paths of TMWN 1024 and TVAR 1024. For both sample paths, our method achieves lower ISE.

Figure 5.6 shows, for each $j$, the differences between the logarithm of the ISE in estimating $\beta_j(z)$ for the method of Nason et al. (2000), and for our Haar-Fisz algorithm. The results are averaged over 100 simulated sample paths. Our algorithm is superior in most of the cases, except for the 4 finest scales in TVAR


Figure 5.5: Solid lines: estimates of the local variances for $T = 1024$ in the TMWN model (top row), and the TVAR model (bottom row), using the method of Nason et al. (2000) (left column) and the Haar-Fisz algorithm (right column) as described in the text. Dotted lines: true local variances.


Figure 5.6: Solid line: difference between logged MISE for Nason et al. (2000) and for our Haar-Fisz algorithm (x-axis shows negative scale $-j$). Positive value means our algorithm does better. Left column: results for TVAR; right column: results for TMWN. Top row: $T = 256$; bottom row: $T = 1024$. Dotted line: zero.


256, and the 3 coarsest scales in TMWN 1024. A similar pattern has been obtained for other values of the PR.

We have also performed additional simulations for $M = \log_2(T)-1$ and $M = \log_2(T)-2$. It turned out that as long as the PR remained fixed, the choice of $M$ had very little influence upon the estimates.

On a final note, it must be mentioned that other denoising methods can also be used in step 3., and our algorithm can only benefit from this flexibility. Some of the techniques for correlated data are reviewed in Opsomer et al. (2001). We have also experimented with the eBayes method of Johnstone & Silverman (2003) and obtained encouraging results.

5.7 Real data example: the Dow Jones index

In this section, we perform a local variance analysis of the DJIA series $D_{t,T}$ of Section 4.5 ($T = 1024$). The source of the data is http://bossa.pl/notowania/daneatech/metastock (page in Polish). We used the following four methods to compute the local variance of $D_{t,T}$:

1. Our Haar-Fisz method of Section 5.6, based on the Haar periodogram, with the following parameters: $M = 10$, $S = 10$; step 3. applied non-TI level-dependent hard universal thresholding using Daubechies' Least Asymmetric wavelet with 4 vanishing moments; PR = 7.

2. Our Haar-Fisz method of Section 5.6, based on the Haar periodogram, with the following parameters: $M = 10$, $S = 10$; step 3. used the S-Plus spline smoothing routine smooth.spline with default parameters.

3. A modification of our Haar-Fisz method: instead of the sequences of the wavelet periodogram of $D_{t,T}$, the input to the Haar-Fisz algorithm was $D^2_{t,T}$. We took the smoothed version of $D^2_{t,T}$ to be the estimate of the local variance. The parameters of the Haar-Fisz algorithm were: $M = 10$, $S = 10$; step


3. used the S-Plus spline smoothing routine smooth.spline with default parameters.

4. The method of Nason et al. (2000) with the following parameters: TI level-dependent universal hard thresholding using Daubechies' Least Asymmetric wavelet with 4 vanishing moments, PR = 7. The smooth.dev parameter in the ewspec routine (Nason (1998)) was set to var.

The results for PR $\ne$ 7 were less convincing. Figure 5.7 shows all four estimates plotted on a log scale. The two estimates based on spline smoothing show the least variability; estimate 4. is the most variable, and estimate 1. the second most variable. Moreover, 1. estimates the variance at a slightly higher level than the other three methods.

One interesting question which can be asked is whether or not $D_{t,T}$ can be modelled as Gaussian. This can be examined, for example, by dividing $D_{t,T}$ by the square root of the estimates of the local variance, and looking at the distribution of the residuals. Figure 5.8 shows the qqnorm plot of the empirical quantiles of the residuals against the quantiles of the standard normal, for the four methods described above. The surprising observation is that all four plots consistently indicate that the upper tail is slightly platykurtic. However, there is no consistency in the assessment of the behaviour of the lower tail: here, 3 plots indicate platykurtosis, but the result of method 3. suggests slight leptokurtosis.

However, the p-values of the Kolmogorov-Smirnov test (returned by the S-Plus routine ks.gof) are large for each of the 4 sequences of residuals. In this sense, it can be concluded that the departure of $D_{t,T}$ from Gaussianity is insignificant. This is in stark contrast to stationary nonlinear modelling (e.g. (G)ARCH or Stochastic Volatility), where, typically, the marginal distribution of financial log-returns is modelled as heavily leptokurtic.
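The residual diagnostic used above (standardise the series by the square root of an estimated local variance, then test the standardised residuals for normality) can be sketched as follows. This is a minimal illustration on synthetic data, not the thesis's S-Plus analysis: the local variance is estimated by a crude moving average of squared observations rather than by the Haar-Fisz method, and the Kolmogorov-Smirnov statistic is computed directly against the standard normal CDF; all function names are ours.

```python
import math
import random

def local_variance_ma(x, window=101):
    """Estimate a slowly-varying local variance by a centred moving
    average of the squared observations (a stand-in for the smoothed
    periodogram-based estimates in the text)."""
    T, half = len(x), window // 2
    est = []
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)
        est.append(sum(v * v for v in x[lo:hi]) / (hi - lo))
    return est

def ks_statistic_normal(resid):
    """Kolmogorov-Smirnov distance between the empirical CDF of
    `resid` and the standard normal CDF Phi."""
    z = sorted(resid)
    n = len(z)
    phi = lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))
    d = 0.0
    for i, v in enumerate(z):
        f = phi(v)
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d

random.seed(1)
T = 2048
# Gaussian data with a smoothly time-varying standard deviation.
sigma = [0.5 + 2.0 * t / T for t in range(T)]
x = [s * random.gauss(0.0, 1.0) for s in sigma]

est = local_variance_ma(x)
resid = [v / math.sqrt(e) for v, e in zip(x, est)]
d = ks_statistic_normal(resid)
print(round(d, 4))  # small KS distance: the residuals look Gaussian
```

With Gaussian input the KS distance stays small; replacing random.gauss by a heavy-tailed generator makes it grow, which is the kind of comparison underlying the (G)ARCH remark above.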


Figure 5.7: Four estimates of the local variance of $D_{t,T}$ on a log scale. Solid line: method 1. Dashed line: method 2. Long-dashed line: method 3. Dotted line: method 4. See text for further description.

5.8 Conclusion

In this chapter, we have introduced a Haar-Fisz variance-stabilising transform for the wavelet periodogram (WP) of a Gaussian LSW process. The transform, performed in the wavelet domain by dividing the Haar detail coefficients of the WP by the corresponding smooth coefficients (an instance of the so-called Fisz transform), brings the distribution of the WP closer to normality, as well as stabilising its variance. This makes the WP more amenable to standard denoising techniques which require stationary Gaussian noise. The computational complexity of the Haar-Fisz transform is linear in the number of data points, which is required to

Figure 5.8: Empirical quantiles of the residuals of D_{t,T} against the quantiles of the standard normal. Top left: method 1. Top right: method 2. Bottom left: method 3. Bottom right: method 4. See text for further description.


be a power of two.

In order to analyse theoretical properties of the Haar-Fisz transform in a certain asymptotic setting, we have formulated and proved a functional central limit theorem (FCLT) for the centred WP. Next, we have applied our FCLT to demonstrate the Gaussianising, variance-stabilising and decorrelating properties of the Haar-Fisz transform in the case where the length of the output vector remains constant as the length of the input vector goes to infinity.

Exact asymptotic Gaussianity does not hold if the length of the output vector of the Haar-Fisz transform matches the length of the input vector (which is the more interesting case in practice). However, we have provided some numerical evidence that the limiting distribution is still not far from Gaussian, and that its variance is well stabilised. Extensive simulations have shown that even in this case, the Haar-Fisz transform is a far more effective Gaussianiser than the usual log transform.

Next, we considered a denoising algorithm for the WP, based on the Haar-Fisz transform. Theory has shown that the new algorithm is computationally fast, and simulation has shown that its MISE performance is better than that of the existing competitor.

Finally, several variants of the algorithm have been used to compute the local variance of the time series of daily log-returns on the Dow Jones index. All of them consistently demonstrated that the series can be modelled as Gaussian.

The S-Plus routines implementing the algorithm, as well as the data set, are included on the associated CD.
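The core operation summarised above, dividing each Haar detail coefficient by the corresponding smooth coefficient, can be sketched for a single finest-scale step. This is our own illustration, not the thesis's S-Plus code; the exponent p generalises the division: p = 1 corresponds to the wavelet-periodogram transform of this chapter, while Chapter 6 uses p = 1/2 for Poisson counts.

```python
def haar_fisz_step(x, p=1.0):
    """One finest-scale Haar-Fisz step: for each consecutive pair, return the
    Haar smooth coefficient and the Fisz-normalised detail coefficient
    (with the convention 0/0 = 0)."""
    smooth, detail = [], []
    for a, b in zip(x[0::2], x[1::2]):
        s = 0.5 * (a + b)
        d = 0.0 if s == 0 else (a - b) / (2.0 * s ** p)
        smooth.append(s)
        detail.append(d)
    return smooth, detail

# With p = 1 the detail (a - b)/2 is divided by the smooth (a + b)/2,
# i.e. the ratio becomes (a - b)/(a + b):
s, f = haar_fisz_step([4.0, 2.0, 0.0, 0.0], p=1.0)
print(s, f)  # smooths [3.0, 0.0]; details [1/3, 0.0]
```

Iterating this step across scales and then inverting the Haar transform gives the full algorithm, as laid out explicitly for the Poisson case in Chapter 6.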



Chapter 6
A Haar-Fisz algorithm for Poisson intensity estimation

In this chapter, we propose a Haar-Fisz-type algorithm for estimating the discretised intensity function of an inhomogeneous one-dimensional Poisson process, in the regression setting specified in Section 2.3.3. The Haar-Fisz principle was already introduced in Chapter 5: take the Haar transform of the data, divide the arising detail coefficients by the corresponding smooth coefficients raised to an appropriate power, and then take the inverse Haar transform. In this chapter, we apply this algorithm to sequences of Poisson counts, with the aim of stabilising their variance and bringing their distribution close to Gaussianity. This then enables us to apply known denoising techniques suitable for i.i.d. Gaussian noise to estimate the underlying Poisson intensity. Simulations demonstrate that our denoising method usually significantly outperforms the existing state-of-the-art techniques.

Some results of this chapter were used, in a modified form, in the article by P. Fryzlewicz and G. P. Nason (2003) "A Haar-Fisz algorithm for Poisson intensity estimation", to appear in the Journal of Computational and Graphical Statistics. Throughout the thesis, this article will be referred to as Fryzlewicz & Nason (2004).



6.1 The Fisz transform for Poisson variables

In this section, we come back to the theorem by Fisz quoted in Section 5.1 as Theorem 5.1.1, and attempt to apply it to independent Poisson variables. Let ν(λ) be a Poisson variable with mean λ. It is shown in the original paper by Fisz (1955) that ν(λ) satisfies the first two assumptions of Theorem 5.1.1. Using the original notation from Fisz (1955), assume that ν1(λ1), ν2(λ2) are independent and λ1/λ2 → 1 as (λ1, λ2) → (∞, ∞). Then, ζ(ν1, ν2) (see formula (5.2)) is asymptotically normal

$$N\left( \frac{\lambda_2 - \lambda_1}{(\lambda_2 + \lambda_1)^p},\; \frac{\sqrt{\lambda_2 + \lambda_1}}{(\lambda_2 + \lambda_1)^p} \right).$$

Note that setting p = 1/2 makes the variance of ζ(ν1, ν2) independent of λ. Therefore, to achieve the variance stabilisation property of the Haar-Fisz transform for Poisson data, we shall need to divide the Haar detail coefficients by the square root of the corresponding smooth coefficients. This is in contrast to the wavelet periodogram case, where power p = 1 had to be used.

Recall from Section 5.1 the definition of ζ_{1/2} (the Fisz transform with exponent 1/2):

$$\zeta_{1/2}(X_1, X_2) = \frac{X_1 - X_2}{(X_1 + X_2)^{1/2}}, \qquad (6.1)$$

with the convention 0/0 = 0. Let X_i ~ Pois(λ_i) for i = 1, 2, with X1, X2 independent. Theorem 5.1.1 only determines the behaviour of ζ_{1/2}(X1, X2) if (λ1, λ2) → (∞, ∞) and λ1/λ2 → 1. In practice, it would be useful to investigate the behaviour of ζ_{1/2}(X1, X2) if the means of X1, X2 are not necessarily "large" or "close". More specifically, we are interested in

- how well the Fisz transform can Gaussianise and stabilise variance,
- how well we can determine the mean, i.e. how close E ζ_{1/2}(X1, X2) is to ζ_{1/2}(λ1, λ2),


for a whole range of λ_i. These issues would be challenging to investigate theoretically. However, to cast some light we performed the following simulation experiment. We chose values of λ_i to range from 1 to 40 in steps of 1. For each pair (λ1, λ2) we drew 10^5 values of ζ_{1/2}(X1, X2) as defined by (6.1) and denoted the sample by z(λ1, λ2). For a comparison of Gaussianisation we also computed Anscombe's transform, as mentioned in Section 2.3.3, of the X_i which arose from the larger λ_i (this comparison was charitable to Anscombe: either X1 or X2 could be used, but Anscombe works better for larger intensities).

Figure 6.1 gives some idea of how well the Fisz transform Gaussianises, stabilises variance, and how close the sample mean of z(λ1, λ2) is to ζ_{1/2}(λ1, λ2). The top left figure shows that Fisz is always "more Gaussian" than Anscombe. The top right figure merely shows that the sample mean of z(λ1, λ2) is very close to ζ_{1/2}(λ1, λ2). The bottom row of Figure 6.1 shows that the variance of z(λ1, λ2) is stable and close to one for a wide range of (λ1, λ2).

To summarise, the above experiment shows that ζ_{1/2}(X1, X2), the Fisz transform of X1 and X2 with exponent 1/2, can be thought of as an approximately Gaussian variable with mean ζ_{1/2}(λ1, λ2) and variance bounded above by (and close to) one.

The above discussion concentrates on the properties of individual Fisz-transformed Poisson variables. However, as we observed earlier, the Fisz transform with exponent 1/2 can be viewed as the division of a Haar detail coefficient by the square root of the corresponding smooth coefficient. Motivated by this observation, we now introduce a full Haar-Fisz transform, where we perform this operation on all Haar detail coefficients of a given vector of Poisson counts.

6.2 The Haar-Fisz transform for Poisson counts

In this section, we provide details of the Haar-Fisz transform, which stabilises the variance of sequences of Poisson counts and brings their distribution closer to normality.
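Before specifying the full transform, the single-pair claims of Section 6.1 can be checked numerically. The sketch below is a scaled-down version of the experiment above (2 × 10^4 draws at a single intensity pair rather than 10^5 draws per pair); it is our own illustration rather than the thesis's S-Plus code, and it uses Knuth's product-of-uniforms Poisson sampler.

```python
import math, random

def zeta_half(x1, x2):
    """Fisz transform with exponent 1/2 (convention 0/0 = 0)."""
    s = x1 + x2
    return 0.0 if s == 0 else (x1 - x2) / math.sqrt(s)

def rpois(lam, rng):
    """Knuth's product-of-uniforms Poisson sampler."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def limit_sd(l1, l2, p):
    """Standard deviation of the limiting normal in Fisz's theorem."""
    return math.sqrt(l1 + l2) / (l1 + l2) ** p

# The limiting standard deviation is identically 1 exactly when p = 1/2.
for lam in (1.0, 10.0, 100.0):
    assert abs(limit_sd(lam, lam, 0.5) - 1.0) < 1e-12

# Monte Carlo check of Gaussianisation/stabilisation at lam1 = lam2 = 20.
rng = random.Random(0)
draws = [zeta_half(rpois(20.0, rng), rpois(20.0, rng)) for _ in range(20000)]
mean_hat = sum(draws) / len(draws)
var_hat = sum((z - mean_hat) ** 2 for z in draws) / len(draws)
print(round(mean_hat, 3), round(var_hat, 3))  # mean near 0, variance near 1
```

For equal intensities one can check directly that E ζ_{1/2}(X1, X2)^2 = P(X1 + X2 > 0) = 1 − exp(−2λ), so the sample variance here should indeed sit very close to one.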
The input to the algorithm is a vector v = (v_0, v_1, ..., v_{N−1}) for N = 2^J, where v_i ≥ 0 for all i. Typically, v will be a vector of Poisson counts.

Figure 6.1: Top left: difference between Kolmogorov-Smirnov test statistics computed on Anscombe-transformed Poisson variables with intensity max(λ1, λ2), and on z(λ1, λ2). A positive difference means that Haar-Fisz is closer to Gaussian. Top right: |mean of z(λ1, λ2) − ζ_{1/2}(λ1, λ2)|. Bottom left (and right): perspective (and contour) plot of Var(z(λ1, λ2)).


The output of the Haar-Fisz transform is a vector u = (u_0, u_1, ..., u_{N−1}), constructed as follows:

1. Let
$$s^J_n = v_n. \qquad (6.2)$$

2. For each j = J−1, J−2, ..., 0, recursively form the vectors s^j and f^j:
$$s^j_n = \frac{1}{2}\left(s^{j+1}_{2n} + s^{j+1}_{2n+1}\right) \qquad (6.3)$$
$$f^j_n = \frac{s^{j+1}_{2n} - s^{j+1}_{2n+1}}{2\sqrt{s^j_n}}, \qquad (6.4)$$
for n = 0, 1, ..., 2^j − 1 (with the convention 0/0 = 0).

3. For each j = 0, 1, ..., J−1, recursively modify the vectors s^{j+1}:
$$s^{j+1}_{2n} = s^j_n + f^j_n \qquad (6.5)$$
$$s^{j+1}_{2n+1} = s^j_n - f^j_n, \qquad (6.6)$$
for n = 0, 1, ..., 2^j − 1.

4. Set u := s^J.

For the purpose of this chapter, denote Fv := u. The nonlinear operator F is called the Haar-Fisz transform of v. A few important remarks are in order.

- The algorithm is invertible, i.e. v can be reconstructed from Fv by reversing steps 4.-1.
- Steps 2.-4. of the algorithm are similar to the forward and inverse Discrete Haar Transform, except for the division by (s^j_n)^{1/2} in formula (6.4).
- Formula (6.4) can be written as
$$f^j_n = 2^{-1/2}\,\zeta_{1/2}\left(s^{j+1}_{2n}, s^{j+1}_{2n+1}\right). \qquad (6.7)$$
In other words, f^j_n is the (scaled) result of the Fisz transform with exponent 1/2 applied to the two neighbouring smooth coefficients s^{j+1}_{2n} and s^{j+1}_{2n+1}.
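Steps 1.-4. map directly onto code. The following pure-Python sketch is our own illustration (the thesis's implementation is in S-Plus); it assumes the input length is a power of two, and also implements the inverse obtained by reversing steps 4.-1., so the round trip recovers v up to floating-point error.

```python
import math

def haar_fisz(v):
    """Haar-Fisz transform F of a nonnegative vector of length N = 2^J."""
    s = list(map(float, v))                       # step 1: s^J = v
    details = []                                  # stores f^{J-1}, ..., f^0
    while len(s) > 1:                             # step 2: analysis
        sm = [0.5 * (s[2 * k] + s[2 * k + 1]) for k in range(len(s) // 2)]
        details.append([0.0 if sm[k] == 0 else
                        (s[2 * k] - s[2 * k + 1]) / (2.0 * math.sqrt(sm[k]))
                        for k in range(len(sm))])
        s = sm
    for f in reversed(details):                   # step 3: Haar synthesis
        s = [x for k in range(len(f)) for x in (s[k] + f[k], s[k] - f[k])]
    return s                                      # step 4: u = s^J

def haar_fisz_inv(u):
    """Inverse transform: recover v from u = Fv by reversing steps 4.-1."""
    s = list(map(float, u))
    details = []
    while len(s) > 1:                             # Haar analysis of u yields the f^j
        sm = [0.5 * (s[2 * k] + s[2 * k + 1]) for k in range(len(s) // 2)]
        details.append([s[2 * k] - sm[k] for k in range(len(sm))])
        s = sm
    for f in reversed(details):                   # undo the Fisz normalisation
        s = [x for k in range(len(f))
             for x in (s[k] + f[k] * math.sqrt(s[k]),
                       s[k] - f[k] * math.sqrt(s[k]))]
    return s
```

A constant vector has all detail coefficients equal to zero, so it is a fixed point of F; the two loops each touch O(N) values, matching the linear complexity noted in the text.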


- Like the Haar DWT and the Haar-Fisz transform for the wavelet periodogram, the Haar-Fisz transform for Poisson counts is of computational order O(N).

6.2.1 Example

As an example, we demonstrate the Haar-Fisz transform applied to an input vector v of length 8, now with all positive entries. The Haar-Fisz transform u = Fv is given by

$$\begin{aligned}
u_0 &= \frac{\sum_{i=0}^{7} v_i}{8} + \frac{\sum_{i=0}^{3} v_i - \sum_{i=4}^{7} v_i}{2\sqrt{2}\sqrt{\sum_{i=0}^{7} v_i}} + \frac{v_0 + v_1 - (v_2 + v_3)}{2\sqrt{\sum_{i=0}^{3} v_i}} + \frac{v_0 - v_1}{\sqrt{2}\sqrt{v_0 + v_1}},\\
u_1 &= \frac{\sum_{i=0}^{7} v_i}{8} + \frac{\sum_{i=0}^{3} v_i - \sum_{i=4}^{7} v_i}{2\sqrt{2}\sqrt{\sum_{i=0}^{7} v_i}} + \frac{v_0 + v_1 - (v_2 + v_3)}{2\sqrt{\sum_{i=0}^{3} v_i}} - \frac{v_0 - v_1}{\sqrt{2}\sqrt{v_0 + v_1}},\\
u_2 &= \frac{\sum_{i=0}^{7} v_i}{8} + \frac{\sum_{i=0}^{3} v_i - \sum_{i=4}^{7} v_i}{2\sqrt{2}\sqrt{\sum_{i=0}^{7} v_i}} - \frac{v_0 + v_1 - (v_2 + v_3)}{2\sqrt{\sum_{i=0}^{3} v_i}} + \frac{v_2 - v_3}{\sqrt{2}\sqrt{v_2 + v_3}},\\
u_3 &= \frac{\sum_{i=0}^{7} v_i}{8} + \frac{\sum_{i=0}^{3} v_i - \sum_{i=4}^{7} v_i}{2\sqrt{2}\sqrt{\sum_{i=0}^{7} v_i}} - \frac{v_0 + v_1 - (v_2 + v_3)}{2\sqrt{\sum_{i=0}^{3} v_i}} - \frac{v_2 - v_3}{\sqrt{2}\sqrt{v_2 + v_3}},\\
u_4 &= \frac{\sum_{i=0}^{7} v_i}{8} - \frac{\sum_{i=0}^{3} v_i - \sum_{i=4}^{7} v_i}{2\sqrt{2}\sqrt{\sum_{i=0}^{7} v_i}} + \frac{v_4 + v_5 - (v_6 + v_7)}{2\sqrt{\sum_{i=4}^{7} v_i}} + \frac{v_4 - v_5}{\sqrt{2}\sqrt{v_4 + v_5}},\\
u_5 &= \frac{\sum_{i=0}^{7} v_i}{8} - \frac{\sum_{i=0}^{3} v_i - \sum_{i=4}^{7} v_i}{2\sqrt{2}\sqrt{\sum_{i=0}^{7} v_i}} + \frac{v_4 + v_5 - (v_6 + v_7)}{2\sqrt{\sum_{i=4}^{7} v_i}} - \frac{v_4 - v_5}{\sqrt{2}\sqrt{v_4 + v_5}},\\
u_6 &= \frac{\sum_{i=0}^{7} v_i}{8} - \frac{\sum_{i=0}^{3} v_i - \sum_{i=4}^{7} v_i}{2\sqrt{2}\sqrt{\sum_{i=0}^{7} v_i}} - \frac{v_4 + v_5 - (v_6 + v_7)}{2\sqrt{\sum_{i=4}^{7} v_i}} + \frac{v_6 - v_7}{\sqrt{2}\sqrt{v_6 + v_7}},\\
u_7 &= \frac{\sum_{i=0}^{7} v_i}{8} - \frac{\sum_{i=0}^{3} v_i - \sum_{i=4}^{7} v_i}{2\sqrt{2}\sqrt{\sum_{i=0}^{7} v_i}} - \frac{v_4 + v_5 - (v_6 + v_7)}{2\sqrt{\sum_{i=4}^{7} v_i}} - \frac{v_6 - v_7}{\sqrt{2}\sqrt{v_6 + v_7}}.
\end{aligned} \qquad (6.8)$$

Formulas (6.8) are a special case of the general formula for F given in the next section.

6.2.2 A general formula for the Haar-Fisz transform

We will now introduce an explicit general formula for the operator F, which is used in the proofs later in this chapter. Let v = (v_0, v_1, ..., v_{N−1}) be the vector


of Poisson counts, and let u = (u_0, u_1, ..., u_{N−1}) be the Haar-Fisz transform of v: u = Fv. Bearing in mind that N is an integer power of two, we denote J = log_2(N). We introduce the family of Haar wavelet vectors {ψ^{j,k}}, where j = 0, 1, ..., J−1 is the scale parameter, and k = l 2^{J−j}, l = 0, 1, ..., 2^j − 1, is the location parameter. The components of ψ^{j,k} will be denoted by ψ^{j,k}_n, for n = 0, 1, ..., N−1. We define

$$\psi^{j,k}_n = \begin{cases} 0 & \text{for } n < k \\ 1 & \text{for } k \le n < k + 2^{J-j-1} \\ -1 & \text{for } k + 2^{J-j-1} \le n < k + 2^{J-j} \\ 0 & \text{for } k + 2^{J-j} \le n. \end{cases} \qquad (6.9)$$

Similarly, we introduce the family of Haar scaling vectors {φ^{j,k}}, whose components will be denoted by φ^{j,k}_n (the range of j, k, and n remains unchanged). We define

$$\phi^{j,k}_n = \begin{cases} 0 & \text{for } n < k \\ 1 & \text{for } k \le n < k + 2^{J-j} \\ 0 & \text{for } k + 2^{J-j} \le n. \end{cases} \qquad (6.10)$$

Our definition of discrete Haar wavelets is similar to that of Nason et al. (2000), Section 2. The difference is that we "pad" the wavelet vectors with zeros on both sides so that they all have length N, and we do not normalise them.

Further, let ⟨·,·⟩ denote the inner product of two vectors, and let b^J(n) = (b^J_0(n), b^J_1(n), ..., b^J_{J−1}(n)) be the binary representation of the integer n, where n < 2^J.

The formula for the nth element of u = Fv is

$$u_n = \frac{\langle \phi^{0,0}, v \rangle}{N} + \sum_{j=0}^{J-1} (-1)^{b^J_j(n)}\, 2^{\frac{j-J}{2}}\, c_{j,J,n}(v), \qquad (6.11)$$

where

$$c_{j,J,n} = \begin{cases} \dfrac{\left\langle \psi^{j,\lfloor n/2^{J-j} \rfloor 2^{J-j}}, v \right\rangle}{\left\langle \phi^{j,\lfloor n/2^{J-j} \rfloor 2^{J-j}}, v \right\rangle^{\frac12}} & \text{if } \left\langle \phi^{j,\lfloor n/2^{J-j} \rfloor 2^{J-j}}, v \right\rangle > 0 \\[2ex] 0 & \text{otherwise.} \end{cases} \qquad (6.12)$$
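As a sanity check, formulas (6.11)-(6.12) can be evaluated directly from the padded wavelet and scaling vectors of (6.9)-(6.10). The sketch below is our own illustration (function names are ours); the test vector is constant on each dyadic half, so all but one detail term vanish and u_n can be verified by hand.

```python
import math

def psi(j, k, J):
    """Padded, unnormalised Haar wavelet vector psi^{j,k} of (6.9)."""
    half, full = 2 ** (J - j - 1), 2 ** (J - j)
    return [1 if k <= i < k + half else (-1 if k + half <= i < k + full else 0)
            for i in range(2 ** J)]

def phi(j, k, J):
    """Padded Haar scaling vector phi^{j,k} of (6.10)."""
    full = 2 ** (J - j)
    return [1 if k <= i < k + full else 0 for i in range(2 ** J)]

def bit(n, j, J):
    """Digit b_j^J(n) of the J-bit binary representation of n (MSB first)."""
    return (n >> (J - 1 - j)) & 1

def u_direct(v):
    """u_n computed straight from formulas (6.11)-(6.12)."""
    N = len(v)
    J = N.bit_length() - 1
    out = []
    for n in range(N):
        u = sum(v) / N                              # <phi^{0,0}, v> / N
        for j in range(J):
            k = (n // 2 ** (J - j)) * 2 ** (J - j)  # floor(n / 2^{J-j}) 2^{J-j}
            num = sum(p * x for p, x in zip(psi(j, k, J), v))
            den = sum(p * x for p, x in zip(phi(j, k, J), v))
            c = num / math.sqrt(den) if den > 0 else 0.0
            u += (-1) ** bit(n, j, J) * 2 ** ((j - J) / 2) * c
        out.append(u)
    return out

# Hand calculation gives u = v for this dyadic step vector.
print(u_direct([2, 2, 2, 2, 0, 0, 0, 0]))
```

The same function agrees with the recursive algorithm of Section 6.2, and with formulas (6.8) when N = 8.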


6.3 Properties of the Haar-Fisz transform for constant intensities

In this section, we state and prove two propositions concerning the asymptotic behaviour of Fv, where v is a Poisson vector of constant intensity. The case of non-constant intensities will be considered in Section 6.4. Proposition 6.3.1 says that the coefficients of Fv are asymptotically uncorrelated, and Proposition 6.3.2 says that they are also asymptotically normal with variance one.

Proposition 6.3.1 Let v = (v_0, v_1, ..., v_{N−1}) be a vector of i.i.d. Poisson variables with mean λ, and let N be an integer power of two. Let u = Fv. For m ≠ n, we have

$$\mathrm{cor}(u_m, u_n) \to 0 \quad \text{as } \lambda \to \infty \text{ and } \lambda/N \to 0. \qquad (6.13)$$

Proof. We will first calculate the correlation between the modified detail coefficients at two different scales. The detail coefficient at any given scale has the form

$$D_f = (X_0 - X_1)\, f(X_0 + X_1),$$

where X_0 and X_1 are some independent, identically distributed Poisson variables, and f(x) = x^{−1/2} with f(0) = 0. The detail coefficient at any coarser scale depends on X_0, X_1 through their sum only, i.e. we have

$$D_c = g(X_0 + X_1),$$

where g also depends on some other Poisson variables X_i, i ≠ 0, 1. Since X_0, X_1 are identically distributed, we obviously have

$$E(D_f) = 0 \quad \text{and} \quad E(D_f D_c) = 0, \qquad (6.14)$$

and so cov(D_f, D_c) = 0. We can show in a similar way that the smooth coefficient ⟨φ^{0,0}, v⟩/N is uncorrelated with any of the detail coefficients.

We are now in a position to calculate cov(u_m, u_n). From formula (6.11) it is clear that the variables will share the "smooth" term ⟨φ^{0,0}, v⟩/N, which we will


denote by μ to simplify the notation. Since the integer ⌊n/2^{J−j}⌋2^{J−j} (see formula (6.11)) depends only on the first j bits in the binary expansion of n, the variables u_m and u_n will also share the term

$$X := \sum_{j=0}^{J^*-1} (-1)^{b^J_j(n)}\, 2^{\frac{j-J}{2}}\, c_{j,J,n}(v), \qquad (6.15)$$

where J* = min{j : b^J_j(n) ≠ b^J_j(m)}. Using the definition in formula (6.12), it can be proved that

$$(-1)^{b^J_{J^*}(m)}\, 2^{\frac{J^*-J}{2}}\, c_{J^*,J,m}(v) = -(-1)^{b^J_{J^*}(n)}\, 2^{\frac{J^*-J}{2}}\, c_{J^*,J,n}(v). \qquad (6.16)$$

The term on the LHS of equation (6.16) will be denoted by Y. We also denote

$$Z_1 = \sum_{j=J^*+1}^{J-1} (-1)^{b^J_j(m)}\, 2^{\frac{j-J}{2}}\, c_{j,J,m}(v), \qquad Z_2 = \sum_{j=J^*+1}^{J-1} (-1)^{b^J_j(n)}\, 2^{\frac{j-J}{2}}\, c_{j,J,n}(v).$$

It takes a closer look at formula (6.11) to see that Z_1 and Z_2 are independent (they are functions of different components of v). Using the formulas in (6.14), we now write

$$\mathrm{cov}(u_m, u_n) = \mathrm{cov}(\mu + X - Y + Z_1,\; \mu + X + Y + Z_2) = \mathrm{Var}(\mu) + \mathrm{Var}(X) - \mathrm{Var}(Y).$$

For λ large enough, as X and Y become approximately normal (see Fisz (1955)), we have

$$\mathrm{Var}(X) \approx \sum_{j=0}^{J^*-1} (1+\varepsilon)\, 2^{j-J} = (1+\varepsilon)\left(2^{J^*-J} - 2^{-J}\right), \qquad \mathrm{Var}(Y) \approx (1-\varepsilon)\, 2^{J^*-J}.$$

Moreover, we have Var(μ) = λ/N → 0 by assumption. Since N = 2^J → ∞, we have

$$\mathrm{cov}(u_m, u_n) \approx \lambda/N + (1+\varepsilon)\left(2^{J^*-J} - 2^{-J}\right) - (1-\varepsilon)\, 2^{J^*-J} \to 0$$

as ε → 0 (note that 2^{J*−J} is constant), which completes the proof. □
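Proposition 6.3.1 can be illustrated by a small Monte Carlo experiment (our own sketch, not part of the thesis): with λ/N small, the sample correlation between two transformed coordinates should be close to zero. The transform below restates steps 1.-4. of Section 6.2, and the Poisson sampler and helper names are ours.

```python
import math, random

def rpois(lam, rng):
    """Knuth's product-of-uniforms Poisson sampler."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def haar_fisz(v):
    """Haar-Fisz transform F (steps 1.-4. of Section 6.2)."""
    s, details = list(map(float, v)), []
    while len(s) > 1:
        sm = [0.5 * (s[2 * k] + s[2 * k + 1]) for k in range(len(s) // 2)]
        details.append([0.0 if sm[k] == 0 else
                        (s[2 * k] - s[2 * k + 1]) / (2.0 * math.sqrt(sm[k]))
                        for k in range(len(sm))])
        s = sm
    for f in reversed(details):
        s = [x for k in range(len(f)) for x in (s[k] + f[k], s[k] - f[k])]
    return s

# Sample correlation between u_0 and u_{N/2} for lam = 4, N = 128
# (so lam/N is small), over 2000 independent replications.
rng = random.Random(0)
N, lam = 128, 4.0
pairs = [(u[0], u[N // 2]) for u in
         (haar_fisz([rpois(lam, rng) for _ in range(N)]) for _ in range(2000))]
mx = sum(a for a, _ in pairs) / len(pairs)
my = sum(b for _, b in pairs) / len(pairs)
cor = (sum((a - mx) * (b - my) for a, b in pairs)
       / math.sqrt(sum((a - mx) ** 2 for a, _ in pairs)
                   * sum((b - my) ** 2 for b, _ in pairs)))
print(round(cor, 3))  # small, consistent with Proposition 6.3.1
```

The coordinates u_0 and u_{N/2} differ already in the first bit of their binary expansions, so they share only the smooth term μ, whose variance λ/N is small here.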


Proposition 6.3.2 Let v = (v_0, v_1, ..., v_{N−1}) be a vector of i.i.d. Poisson variables with mean λ, and let N be an integer power of two. Let u = Fv. For all n = 0, 1, ..., N−1, we have

$$u_n - \lambda = \Delta + Y_n, \qquad (6.17)$$

where

$$\Delta \xrightarrow{D} 0 \ \text{as } \lambda/N \to 0, \qquad Y_n \xrightarrow{D} N(0,1) \ \text{as } (\lambda, N) \to (\infty, \infty). \qquad (6.18)$$

Proof. Without loss of generality, let us concentrate on u_0. Let J = log_2(N), and let us denote

$$W_j(\lambda) = \begin{cases} \dfrac{\sum_{i=0}^{2^j-1} v_i - \sum_{i=2^j}^{2^{j+1}-1} v_i}{\sqrt{\sum_{i=0}^{2^{j+1}-1} v_i}} & \text{if } \sum_{i=0}^{2^{j+1}-1} v_i > 0 \\[2ex] 0 & \text{otherwise} \end{cases} \qquad (6.19)$$

to emphasise the dependence of W_j on λ. The following equality holds (see the example in Section 6.2.1, and formulas (6.11) and (6.12)):

$$u_0 = N^{-1} \sum_{i=0}^{N-1} v_i + \sum_{j=0}^{J-1} 2^{-\frac{j+1}{2}}\, W_j(\lambda). \qquad (6.20)$$

Set

$$\Delta = N^{-1} \sum_{i=0}^{N-1} v_i - \lambda \qquad \text{and} \qquad Y_0 = \sum_{j=0}^{J-1} 2^{-\frac{j+1}{2}}\, W_j(\lambda).$$

We will first show that Y_0 →D N(0,1) as (λ, J) → (∞, ∞). Let us fix ε_1 > 0. By Theorem 5.1.1, if λ or j are large enough, then we have

$$\mathrm{Var}(W_j(\lambda)) = 1 + \varepsilon^{\lambda}_j \le 1 + \varepsilon_1, \qquad (6.21)$$

where |ε^λ_j| < ε_1. Also, for all λ, the variables W_j(λ) are uncorrelated (see the proof of Proposition 6.3.1).


Using the symmetry of W_j(λ), the Chebyshev inequality, the orthogonality of the W_j(λ), and formula (6.21), for large λ, J, and M > J we have

$$\begin{aligned}
P\left(\sum_{j=J}^{M-1} 2^{-\frac{j+1}{2}} W_j(\lambda) < -\varepsilon\right) &= P\left(\sum_{j=J}^{M-1} 2^{-\frac{j+1}{2}} W_j(\lambda) > \varepsilon\right) \le \varepsilon^{-2}\, \mathrm{Var}\left(\sum_{j=J}^{M-1} 2^{-\frac{j+1}{2}} W_j(\lambda)\right) \\
&= \varepsilon^{-2} \sum_{j=J}^{M-1} 2^{-j-1}\, \mathrm{Var}(W_j(\lambda)) \le \varepsilon^{-2} \sum_{j=J}^{\infty} 2^{-j-1}\, \mathrm{Var}(W_j(\lambda)) \le \varepsilon^{-2} (1+\varepsilon_1)\, 2^{-J}.
\end{aligned}$$

Clearly, we have that

$$\forall\, \varepsilon \;\; \exists\, J_0 \;\; \forall\, J \ge J_0 \qquad \varepsilon^{-2}(1+\varepsilon_1)\, 2^{-J} \le \varepsilon. \qquad (6.22)$$

Observe now that

$$\forall\, J \qquad \sum_{j=0}^{J-1} 2^{-\frac{j+1}{2}} W_j(\lambda) \xrightarrow{D} N\left(0,\, 1 - 2^{-J}\right) \quad \text{as } \lambda \to \infty. \qquad (6.23)$$

Here we have a finite linear combination of orthogonal variables, each of which converges in distribution to N(0,1) by Theorem 5.1.1. The finite linear combination will therefore converge to the finite linear combination of orthogonal (= independent) normal variables, whose variances sum up to 1 − 2^{−J}. Denote by S_{σ²}(t) the survival function of a normal variable with mean zero and variance σ². Note two properties of the family {S_{1−2^{−J}}(t)}_{J=1}^{∞}: ‖S_{1−2^{−J}}(·) − S_1(·)‖_∞ → 0 as J → ∞; {S_{1−2^{−J}}(t)}_{J=1}^{∞} is uniformly Lipschitz continuous with Lipschitz constant L = 1/√π.

Now fix ε > 0 and choose the corresponding J_0 in (6.22). For an arbitrary fixed t, examine the difference

$$D_1 = \left| P\left(\sum_{j=0}^{J-1} 2^{-\frac{j+1}{2}} W_j(\lambda) > t\right) - S_1(t) \right|. \qquad (6.24)$$


We have

$$\begin{aligned}
P\left(\sum_{j=0}^{J-1} 2^{-\frac{j+1}{2}} W_j(\lambda) > t\right) &= P\left(\sum_{j=0}^{J_0-1} 2^{-\frac{j+1}{2}} W_j(\lambda) + \sum_{j=J_0}^{J-1} 2^{-\frac{j+1}{2}} W_j(\lambda) > t\right) \\
&\le P\left(\left\{\sum_{j=0}^{J_0-1} 2^{-\frac{j+1}{2}} W_j(\lambda) > t - \varepsilon\right\} \cup \left\{\sum_{j=J_0}^{J-1} 2^{-\frac{j+1}{2}} W_j(\lambda) > \varepsilon\right\}\right) \\
&\le P\left(\sum_{j=0}^{J_0-1} 2^{-\frac{j+1}{2}} W_j(\lambda) > t - \varepsilon\right) + P\left(\sum_{j=J_0}^{J-1} 2^{-\frac{j+1}{2}} W_j(\lambda) > \varepsilon\right) \\
&\le \left(S_{1-2^{-J_0}}(t-\varepsilon) + \varepsilon\right) + \varepsilon \;\le\; S_{1-2^{-J_0}}(t) + \varepsilon/\sqrt{\pi} + 2\varepsilon \\
&\le S_1(t) + \varepsilon + \varepsilon/\sqrt{\pi} + 2\varepsilon \;\le\; S_1(t) + 4\varepsilon. \qquad (6.25)
\end{aligned}$$

On the other hand, we have

$$\begin{aligned}
P\left(\sum_{j=0}^{J-1} 2^{-\frac{j+1}{2}} W_j(\lambda) > t\right) &\ge P\left(\left\{\sum_{j=0}^{J_0-1} 2^{-\frac{j+1}{2}} W_j(\lambda) > t + \varepsilon\right\} \cap \left\{\sum_{j=J_0}^{J-1} 2^{-\frac{j+1}{2}} W_j(\lambda) > -\varepsilon\right\}\right) \\
&= P\left(\sum_{j=0}^{J_0-1} 2^{-\frac{j+1}{2}} W_j(\lambda) > t + \varepsilon\right) + P\left(\sum_{j=J_0}^{J-1} 2^{-\frac{j+1}{2}} W_j(\lambda) > -\varepsilon\right) \\
&\quad - P\left(\left\{\sum_{j=0}^{J_0-1} 2^{-\frac{j+1}{2}} W_j(\lambda) > t + \varepsilon\right\} \cup \left\{\sum_{j=J_0}^{J-1} 2^{-\frac{j+1}{2}} W_j(\lambda) > -\varepsilon\right\}\right) \\
&\ge \left(S_{1-2^{-J_0}}(t+\varepsilon) - \varepsilon\right) + (1 - \varepsilon) - 1 \;\ge\; S_{1-2^{-J_0}}(t) - \varepsilon/\sqrt{\pi} - 2\varepsilon \\
&\ge S_1(t) - \varepsilon - \varepsilon/\sqrt{\pi} - 2\varepsilon \;\ge\; S_1(t) - 4\varepsilon. \qquad (6.26)
\end{aligned}$$

Inequalities (6.25) and (6.26) together prove that the difference D_1 of formula (6.24) is arbitrarily small for λ and J large enough, which proves the convergence.

We will now show that Δ →D 0 as λ/N → 0. We denote by S_0(t) the survival function of the constant variable 0. Consider the difference

$$D_2 = \left| P\left(N^{-1} \sum_{i=0}^{N-1} v_i - \lambda > t\right) - S_0(t) \right|. \qquad (6.27)$$


For t > 0, we have

$$D_2 = P\left(N^{-1}\sum_{i=0}^{N-1}(v_i - \lambda) > t\right) \le N^{-2} t^{-2}\, E\left(\sum_{i=0}^{N-1}(v_i - \lambda)\right)^2 = N^{-2} t^{-2} \sum_{i=0}^{N-1} \mathrm{Var}(v_i) = N^{-1} t^{-2} \lambda \to 0 \ \text{as } \lambda/N \to 0. \qquad (6.28)$$

For t < 0, we have

$$D_2 = \left| P\left(N^{-1}\sum_{i=0}^{N-1} v_i - \lambda > -|t|\right) - 1 \right| = P\left(N^{-1}\sum_{i=0}^{N-1} v_i - \lambda \le -|t|\right) = P\left(-N^{-1}\sum_{i=0}^{N-1} v_i + \lambda \ge |t|\right) \le N^{-1} t^{-2} \lambda \to 0 \ \text{as } \lambda/N \to 0. \qquad (6.29)$$

Inequalities (6.28) and (6.29) show that Δ →D 0 as λ/N → 0. The proof of Proposition 6.3.2 is completed. □

6.4 Properties of the Haar-Fisz transform for non-constant intensities

In this section, we assess the degree of Gaussianisation and variance stabilisation provided by the Haar-Fisz transform for non-constant Poisson intensities. Also, we examine the amount of correlation between the Haar-Fisz transformed variables.

6.4.1 Decorrelation and Gaussianisation

We begin by empirically investigating the degree of correlation between the Haar-Fisz transformed variables, as well as their proximity to normality. The details of the computational experiment are as follows: we selected 4 "templates" (vectors of length 128), and shifted each of them by 1/10, 1, 2, 3, and 4 to create 4 × 5 = 20 test intensity vectors. The templates are plotted in Figure 6.2.

The template v0 is used to create constant intensities of 1/10, 1, 2, 3, and 4. Each of the templates v25, v50 and v75, after shifting upwards by c (where c ∈ {1/10, 1, 2, 3, 4}), becomes a non-constant intensity vector in the shape of a symmetric rectangular hat whose middle part is elevated to the level of 8 + c (so


that it corresponds to a "high" intensity), with the outer parts remaining at the minimum level of c (so that they correspond to a "low" intensity). In the template vn, the elevated middle part stretches over n% of the length of the template.

Figure 6.2: Templates used in the experiment of Sections 6.4.1 and 6.4.2.

Results for the templates v0, v25, v50 and v75 are plotted in Figures 6.3, 6.4, 6.5 and 6.6, respectively. In each of the figures, the consecutive rows correspond to shifts of the corresponding template by 1/10, 1, 2, 3 and 4, respectively.

The left subfigure in each row shows the quantiles of the distribution of Fx - Flambda against the standard normal, where lambda is the respective intensity vector and x is a sample path simulated from it. The quantiles have been averaged over 100 simulated sample paths, rather than basing the result on only one simulation. The right subfigure in each row shows the autocorrelation function of Fx - Flambda for


lags 1 to 21, again averaged over 100 simulated sample paths. The corresponding 95% confidence bands have been adjusted to take the averaging into account.

Decorrelation. The decorrelation appears to work fine except for the most non-Gaussian setups: v0+1/10, v25+1/10 and v50+1/10. Also, a tendency can be observed for the sample autocorrelation to be negative rather than positive, even when it is non-significant.

Gaussianisation. Let lambda_min denote the minimum of the given intensity vector. The experiment demonstrates that while for constant intensities the degree of Gaussianisation is only satisfactory from about lambda_min = 4 upwards, the "acceptable" level of lambda_min falls as low as 2, or even less, in the case of v75. This is because the stretch of high intensity "makes up" for the failure of the asymptotic mechanism over the period of low intensity.

The following example compares the Gaussianisation properties of the Haar-Fisz transform F, Anscombe's transform A, and the identity transform. Consider the intensity shown in the top plot of Figure 6.7 (a rescaled and shifted version of the Donoho & Johnstone (1994) bumps function). This intensity vector will be denoted by lambda, and v will denote a sample path generated from it.

Figure 6.7 compares the Q-Q plots of v - lambda, Av - A(lambda) and Fv - F(lambda), averaged over 100 samples of v. Clearly, the Q-Q plots show that the Haar-Fisz transformation does a better job of Gaussianisation. In particular, the Haar-Fisz transformed data are less "stepped" and look more like variates from a continuous distribution than from a discrete one. The Anscombe-transformed data appear more "stepped" for lower quantiles than for higher ones. Further, the tails for Haar-Fisz are more normal than for Anscombe, which in turn is more normal than the raw count data.

6.4.2 Variance stabilisation

In this section, we use the templates v0, v25, v50 and v75 to investigate the variance-stabilising properties of the Haar-Fisz transform and the Anscombe transform.
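The kind of Monte Carlo check behind these comparisons can be sketched for the Anscombe transform A(x) = 2*sqrt(x + 3/8): the averaged squared residuals should sit near one where the stabilisation works, and drift away from one at low intensities. The Knuth Poisson sampler and the particular sample sizes below are conveniences of this illustration, not the setup used in the thesis.

```python
import math
import random

def rpois(lam, rng):
    # Knuth's Poisson sampler: multiply uniforms until the running
    # product drops below exp(-lam); the count of factors is the draw.
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def anscombe(x):
    # A(x) = 2*sqrt(x + 3/8): approximately variance-stabilising for
    # Poisson counts with large enough intensity.
    return 2.0 * math.sqrt(x + 0.375)

def mean_squared_residual(lam, n_paths=2000, length=64, seed=1):
    # Average of (A(x) - A(lam))^2 over simulated Poisson variates;
    # close to 1 when the stabilisation works (cf. Section 6.4.2).
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n_paths):
        for _ in range(length):
            x = rpois(lam, rng)
            total += (anscombe(x) - anscombe(lam)) ** 2
            count += 1
    return total / count
```

For a constant intensity of 10 the averaged squared residual comes out close to one, whereas for an intensity of 1/2 it falls well below one, illustrating the breakdown of Anscombe's stabilisation at low counts that the experiments above document.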


Figure 6.3: Q-Q and acf plots for v0; see Section 6.4.1 for detailed description.


Figure 6.4: Q-Q and acf plots for v25; see Section 6.4.1 for detailed description.


Figure 6.5: Q-Q and acf plots for v50; see Section 6.4.1 for detailed description.


Figure 6.6: Q-Q and acf plots for v75; see Section 6.4.1 for detailed description.


Figure 6.7: From top to bottom: intensity vector lambda of the Donoho & Johnstone (1994) bumps function (solid; shifted and scaled so that the minimum intensity is 3 and the maximum is 18) and one sample path v (dotted); Q-Q plots of vectors v - lambda, Av - A(lambda), and Fv - F(lambda), averaged over 100 v samples.


Results for the templates v0, v25, v50 and v75 are plotted in Figures 6.8, 6.9, 6.10 and 6.11, respectively. As in the previous section, the consecutive rows in each of the figures correspond to shifts of the corresponding template by 1/10, 1, 2, 3 and 4, respectively.

The left subfigure in each row shows the squared residuals (A(lambda) - Ax)^2, where lambda is the respective intensity vector and x is a sample path simulated from it. The results have been averaged over 1000 simulated sample paths to give an idea of the variance of the Anscombe-transformed vector for each intensity function. The right subfigure shows the analogous quantity for the Haar-Fisz operator F.

The simulation shows that in most cases Haar-Fisz provides better variance stabilisation than Anscombe, i.e. the squared residuals are closer to one. For the Haar-Fisz transform, the degree of variance stabilisation seems to be satisfactory from lambda_min = 1 onwards, while for the Anscombe transform only from about lambda_min = 2. Moreover, for the Haar-Fisz transform, the level of the squared residuals almost exactly reflects the shape of the underlying intensity function (a phenomenon which is natural for the Anscombe transform, but not necessarily expected of Haar-Fisz).

6.4.3 Summary of conclusions

An analogous simulation study with the four templates was also carried out for N = 1024. Let "D-G-S" denote the decorrelation, Gaussianisation and variance stabilisation properties of the Haar-Fisz transform. The main conclusions from the two experiments can be summarised as follows.

1. The degree of D-G-S was strikingly similar for sample sizes of N = 128 and N = 1024: we suspect that the degree of D-G-S is not strongly dependent on N. We consider N = 128 to be a short vector in this situation.

2. The greater the minimum of the intensity vector, lambda_min, the higher the degree of D-G-S. For constant intensities, D-G-S is extremely effective from about lambda_min = 4.


Figure 6.8: Averaged squared residuals for v0; see Section 6.4.2 for detailed description.


Figure 6.9: Averaged squared residuals for v25; see Section 6.4.2 for detailed description.


Figure 6.10: Averaged squared residuals for v50; see Section 6.4.2 for detailed description.


Figure 6.11: Averaged squared residuals for v75; see Section 6.4.2 for detailed description.


3. For non-constant intensities, the degree of D-G-S depends not only on lambda_min := min lambda but also on the length of the stretch where the intensity is equal, or close, to lambda_min. The shorter the stretch, the lower the "acceptable" value of lambda_min for which D-G-S is still very effective. For example, if the intensity is at its constant minimum, 2, for 25% of the time and the remaining intensity is constant at 10, then the D-G-S is extremely effective.

4. The Haar-Fisz transform is usually a much better Gaussianiser and variance stabiliser than the Anscombe transform, especially for lower intensities.

6.5 Poisson intensity estimation

Motivated by the excellent Gaussianisation, variance stabilisation and decorrelation properties of the Haar-Fisz transform demonstrated in the previous sections, we propose the following core algorithm for estimating the (possibly non-constant) intensity lambda of a Poisson process:

[A1] Given the vector v of Poisson observations, preprocess it using the Haar-Fisz transformation to obtain Fv.

[A2] Denoise Fv using any suitable ordinary wavelet denoising technique appropriate for Gaussian noise (i.e. DWT, thresholding, inverse DWT). Denote the smoothed version of Fv by F^(lambda). We can optionally exploit the fact that the asymptotic variance of the noise is equal to one.

[A3] Perform the inverse Haar-Fisz transform to obtain F^{-1}(F^(lambda)) and take it to be the estimate of the intensity.

The following sections discuss several aspects of the above algorithm and compare its performance to a range of existing methods on a variety of test intensities.
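Steps [A1] and [A3] can be sketched as follows for vectors whose length is a power of two. The normalisation of the Haar smooth and detail coefficients, and the convention of leaving coefficients with a zero smooth at zero, are assumptions of this sketch rather than a transcription of the thesis code; step [A2], the Gaussian wavelet denoising, can be any off-the-shelf routine and is omitted.

```python
import math

def haar_fisz(v):
    # [A1] Forward Haar-Fisz transform F: take a Haar decomposition of v,
    # divide each detail coefficient by the square root of its smooth
    # coefficient (the Fisz step), and reconstruct.
    s, details = list(v), []
    while len(s) > 1:
        sm = [(s[2 * k] + s[2 * k + 1]) / 2.0 for k in range(len(s) // 2)]
        d = [(s[2 * k] - s[2 * k + 1]) / 2.0 for k in range(len(s) // 2)]
        details.append([dk / math.sqrt(mk) if mk > 0 else 0.0
                        for dk, mk in zip(d, sm)])
        s = sm
    u = s  # coarsest smooth coefficient
    for f in reversed(details):  # Haar reconstruction with Fisz details
        u = [x for c, fk in zip(u, f) for x in (c + fk, c - fk)]
    return u

def inverse_haar_fisz(u):
    # [A3] Inverse transform: recompute the Haar details of u, then rebuild
    # the smooths top-down, multiplying each detail back by sqrt(smooth).
    s, fisz = list(u), []
    while len(s) > 1:
        fisz.append([(s[2 * k] - s[2 * k + 1]) / 2.0
                     for k in range(len(s) // 2)])
        s = [(s[2 * k] + s[2 * k + 1]) / 2.0 for k in range(len(s) // 2)]
    v = s
    for f in reversed(fisz):
        d = [fk * math.sqrt(c) if c > 0 else 0.0 for c, fk in zip(v, f)]
        v = [x for c, dk in zip(v, d) for x in (c + dk, c - dk)]
    return v
```

On non-negative data the two maps invert each other exactly, so estimating the intensity amounts to denoising Fv with any Gaussian wavelet shrinkage method and mapping the result back through the inverse.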


6.5.1 Methods for Poisson intensity estimation

Existing methods. As mentioned in Section 2.3.3, the Bayesian methods for Poisson intensity estimation due to Kolaczyk (1999a) and Timmermann & Nowak (1997, 1999) are currently state of the art; see Besbeas et al. (2004). Our simulation study compared our technique with these Bayesian methods, as well as with the computationally intensive l1-penalised likelihood technique of Sardy et al. (2004) and with a choice of methods based on the Anscombe transformation. To compare our technique with Kolaczyk (1999a) we used Eric Kolaczyk's BMSMShrink MATLAB software. As we did not have access to Timmermann and Nowak's software, we exactly reproduced the simulation setup of Timmermann & Nowak (1999) and compared our results to their Tables I and II. (Incidentally, the methods in Kolaczyk (1999a) and Timmermann & Nowak (1999) are very similar: the underlying Bayesian model is exactly the same, although the hyperparameter estimation is slightly different (Kolaczyk (2001), personal communication).)

Our method. The following describes the common features of our Poisson intensity estimation.

1. All our techniques always involve the Haar-Fisz transform, [A1], of the data, and the inverse Haar-Fisz transform, [A3].

2. In step [A2] of our algorithm the wavelet denoising technique may be of a translation-invariant (TI) transform type; see Coifman & Donoho (1995). We refer to TI denoising at this stage as "internal" cycle spinning (CS).

3. In step [A2] we could use any one of a number of wavelet families (e.g. multiwavelets, see Downie & Silverman (1998); complex-valued wavelets, see Lina (1997); etc.) for the denoising. In our simulations below we use Haar wavelets and Daubechies least-asymmetric wavelets of order 10; see Daubechies (1992).

4. Let S be the shift-by-one operator from Nason & Silverman (1995). The Haar-Fisz transform is not translation-equivariant since FS != SF. This


non-commutativity implies that it is beneficial to apply CS to the whole algorithm [A1]-[A3] even if [A2] uses a TI technique. We call this "external" CS.

Due to the particular type of nonlinearity of the Haar-Fisz transform, there is no fast O(N log N) algorithm for the external CS. Therefore, we implement external CS by actually shifting the data before [A1], shifting back the estimate after [A3], and averaging over the estimates obtained through several different shifts.

For a data set of length N there are N possible shifts. However, through the empirical investigation detailed in Section 6.5.3, we have found that 50 shifts are enough for data of length up to 1024. We postulate that using more shifts for longer data sets is likely to be beneficial.

Note that there is no point in doing external CS with the Anscombe transformation A, provided one has carried out internal CS, since Anscombe's transformation commutes with the shift operator: AS = SA.

The following list labels and describes the wavelet denoising methods that we choose to use in [A2]. In each case F ./ denotes the use of the Haar-Fisz transform and its inverse.

F ./U: Universal hard thresholding from Donoho & Johnstone (1994) as implemented in WaveThresh (Nason (1998)) with default parameters (e.g. MAD variance estimation on all coefficients). 50 external cycle shifts.

F ./CV: The cross-validation method from Nason (1996) as implemented in WaveThresh using default parameters but hard thresholding. 50 external cycle shifts.

F ./BT: A variant of the greedy tree algorithm from Baraniuk (1999). 50 external cycle shifts.
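The shift-estimate-unshift-average recipe for external CS can be sketched as follows. The choice of equispaced circular shifts is an assumption of this illustration, and `estimator` stands for the whole [A1]-[A3] pipeline.

```python
def cycle_spin(v, estimator, n_shifts=50):
    # "External" cycle spinning: shift the data, run the full estimation
    # pipeline on the shifted vector, shift the estimate back, and
    # average over the shifts.
    n = len(v)
    shifts = sorted({(i * n) // n_shifts % n for i in range(n_shifts)})
    total = [0.0] * n
    for s in shifts:
        shifted = list(v[s:]) + list(v[:s])                # apply S^s
        est = estimator(shifted)
        unshifted = list(est[n - s:]) + list(est[:n - s])  # apply S^{-s}
        total = [t + e for t, e in zip(total, unshifted)]
    return [t / len(shifts) for t in total]
```

Because each shift is undone before averaging, a shift-equivariant estimator passes through unchanged; any gain comes precisely from the lack of translation equivariance of the Haar-Fisz transform.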


Hybrids. We also looked at the performance of certain hybrid methods. These estimate the intensity by averaging the results of two of the above Haar-Fisz methods. Our main hybrid, H:CV+BT, combines F ./CV and F ./BT. Note that hybrids can be easily formulated due to the large number of methods available for denoising Gaussian-contaminated signals.

During our investigations we made use of several other denoisers, including the eBayes procedure as described by Johnstone & Silverman (2003); universal hard thresholding with internal cycle spinning; and hybrids of these with F ./CV.

6.5.2 Simulation results for various test functions

The simulation setup in the first part of this section is the same as that described in Timmermann & Nowak (1999), who obtain two sets of intensity functions of length N = 1024 from the test functions of Donoho & Johnstone (1994). Each set is obtained by shifting and scaling to achieve (min, max) intensities of (1/8, 8) and (1/128, 128). The true intensity functions for the (1/8, 8) case are shown as dashed lines in Figure 6.12.

The results for our methods are based on 100 independent simulations. The following list labels and provides details of the other competing methods.

- BAY: the Bayesian method developed in Timmermann & Nowak (1999); results quoted from the article; 25 independent simulations.

- BM: the Bayesian BMSMShrink method developed in Kolaczyk (1999a); 100 independent simulations.

- L1 P: the l1-penalised likelihood method from Sardy et al. (2004); results quoted from the article (after appropriate rescaling); 25 independent simulations. The authors only provide results for blocks and bumps.

- A ./U: a method constructed in exactly the same way as the corresponding method F ./U, except that the (inverse) Anscombe transform was used instead of


the (inverse) Haar-Fisz transform; 100 independent simulations.

Table 6.1: Normalised MISE values (x10000) for various existing techniques and our F ./U and H:CV+BT methods using Haar wavelets and Daubechies' least-asymmetric wavelets with 10 vanishing moments (LA10), on the test functions with peak intensities 8 and 128. The best results are indicated by a box.

Peak intensity = 8
                              L1 P   A ./U      F ./U         H:CV+BT
Intensity      BAY     BM     Haar    Haar    LA10   Haar    LA10   Haar
Doppler        154    146       *     218     121    201      99    159
Blocks         178    129     287     217     338    191     302    135
HeaviSine       52     46       *      98      63     68      40     64
Bumps         1475   1871    1557    3121    1579   2826    1268   2266

Peak intensity = 128
Doppler         26     20       *      28      12     29      12     23
Blocks          27      8      27       8      38      8      37      7
HeaviSine        7      7       *       9       7      9       7      9
Bumps          143    174     122     191     133    185     133    163

The results reported in Table 6.1 are the MISE normalised by the squared l2 norm of the true intensity vector, multiplied by 10000 and then rounded, for clarity of presentation (this is exactly the same performance measure as in Timmermann & Nowak (1999), which is useful for comparability).

The results show that our F ./U method with the LA10 wavelet outperforms the existing state-of-the-art methods, especially for the lower intensity, except for the blocks function (as well as bumps for the higher intensity, where our technique is outperformed by L1 P by about 8%). The main reason why the performance of our method on blocks is less impressive is that a smooth wavelet is used in the Gaussian denoising step [A2]. As expected, the performance of F ./U with the Haar wavelet is much better in this context, but still not as good as that of BMSMShrink, which is the best competitor for blocks. However, the hybrid method H:CV+BT with the Haar wavelet achieves performance comparable to BMSMShrink. We should emphasise here that our F ./U method is far simpler to


Figure 6.12: Selected estimates for the Donoho and Johnstone intensity functions (dashed, described in text). Each estimate gives an idea of "average" performance in that in each case its MISE is the closest to the median MISE obtained over 50 sample paths. The estimation method in each case was F ./U with Daubechies least-asymmetric wavelets with 10 vanishing moments, except for blocks, which used H:CV+BT with Haar wavelets.


implement than the current state-of-the-art techniques.

In Figure 6.12, the small spike in the heavisine function is not picked up well at intensity 8, but is almost always clearly estimated at intensity 128 (not shown). However, it should be said that the spike is almost completely obscured by noise in all realisations at intensity 8, so it would be extremely difficult for any method to detect it. We are impressed with the quality of the estimates using the new Haar-Fisz method, particularly with bumps and doppler. Also, the reconstruction of blocks, using the hybrid method H:CV+BT, is very accurate. Overall, it must be remembered that the reconstructions are usually going to be less impressive than in the classical wavelet shrinkage problem, where the test functions are contaminated with Gaussian noise of variance one.

To further investigate the performance of the methods on piecewise constant intensities, we performed the following simulation study, where the true intensity was the clipped blocks function of length N = 1024 shown on the left-hand side of Figure 6.13. The clipped blocks intensity was obtained from the blocks function by setting all negative values to zero, scaling it so that the maximum intensity is 15.6 and then adding 3. We also examined the same intensity but scaled by factors of 1/6, 1/3 and 10/3. These scalings gave us a range of low- and high-intensity settings with large spreads of low intensity.
The minimum and maximum intensities were, for each of these scalings: 3-18.6, 0.5-3.1, 1.0-6.2 and 10-62.

The simulation results reported in Table 6.2 are the MISE per bin: that is, we computed the sum of the squared errors between our estimate and the true intensity, then divided by the number of bins (1024), and then took the mean over all 100 simulations.

Table 6.2 shows that at low to medium intensities BMSMShrink and H:CV+BT are competitive, but at the higher 10/3 intensity our hybrid is about 10% better. The right-hand plot in Figure 6.13 shows a particular sample reconstruction using the hybrid method H:CV+BT.
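The two error measures used in the simulation studies can be written down directly; the function names here are illustrative only.

```python
def mise_per_bin(estimates, truth):
    # Table 6.2 measure: sum of squared errors against the true intensity,
    # divided by the number of bins, averaged over the simulations.
    n = len(truth)
    return sum(sum((e - t) ** 2 for e, t in zip(est, truth)) / n
               for est in estimates) / len(estimates)

def normalised_mise(estimates, truth):
    # Table 6.1 measure: MISE normalised by the squared l2 norm of the
    # true intensity vector and multiplied by 10000.
    l2sq = sum(t * t for t in truth)
    return 10000.0 * sum(sum((e - t) ** 2 for e, t in zip(est, truth))
                         for est in estimates) / (len(estimates) * l2sq)
```

Each function takes a list of estimated intensity vectors (one per simulated sample path) together with the true intensity vector.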


Chapter 6. A Haar-Fisz algorithm for Poisson intensity estimation

[Figure 6.13 appears here: left panel "Donoho's blocks and clipped blocks", right panel "Clipped blocks - sample reconstruction"; both horizontal axes run from 0 to 1000.]
Figure 6.13: Left: Scaled and shifted blocks function, and its clipped version: clipped blocks. Right: The true intensity function (with scaling 1, dashed) and an estimate computed using our algorithm with the hybrid method H:CV+BT, whose MISE was closest to the median MISE obtained over 50 sample paths.

Table 6.2: MISE per bin (×100 and rounded) for clipped blocks intensity estimation using BMSMShrink and H:CV+BT as denoted in the text, for a variety of intensity scalings.

                       Scaling
Method         1/6   1/3     1   10/3
BMSMShrink       9    20    61    191
H:CV+BT          9    20    61    171


6.5. Poisson intensity estimation

6.5.3 Performance of Haar-Fisz methods as a function of the number of cycle shifts

In this section, we attempt to justify our choice of 50 external cycle shifts as the default number of CS recommended when estimating intensities of length ≥ 1024 (see Section 6.5.1). For the purpose of our simulation study, we selected the following intensities out of those considered in Section 6.5.2:

1. clipped blocks 1 – clipped blocks with scaling 1;
2. blocks 128 – blocks with maximum intensity 128;
3. bumps 8 – bumps with maximum intensity 8;
4. doppler 8 – doppler with maximum intensity 8;
5. heavisine 128 – heavisine with maximum intensity 128.

For each intensity, we examined the MISE depending on the number of cycle shifts (0–50), averaged over 10 simulated sample paths. This was done for both "short" (length 128) and "long" (length 1024) intensity vectors. Clipped blocks 1 and blocks 128 were estimated using F⋈CV and F⋈BT (components of the hybrid H:CV+BT) with Haar wavelets, whereas bumps 8, doppler 8 and heavisine 128 were estimated using F⋈U with DaubLeAsymm10 wavelets.

Successive rows in Figure 6.14 are results for clipped blocks 1 with F⋈CV, clipped blocks 1 with F⋈BT, blocks 128 with F⋈CV and blocks 128 with F⋈BT. The left-hand column shows results for length 128, and the right-hand column for length 1024.

Successive rows in Figure 6.15 show results for bumps 8, doppler 8 and heavisine 128. The meaning of the columns is as above.

From the plots, it is evident that the MISE always stabilises after a small number of cycle shifts. For clipped blocks 1, blocks 128 and bumps 8, the MISE stabilises after approximately 5 and 10 cycle shifts (for short and long signals, respectively).
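The external cycle shifts studied here follow the usual cycle-spinning pattern: shift the data cyclically, apply the estimator, undo the shift, and average the results over all shifts. A minimal Python sketch, in which `denoise` stands in for any of the F⋈· estimators above (the moving average below is purely a placeholder, not one of the thesis methods):

```python
import numpy as np

def cycle_spin(x, denoise, n_shifts=50):
    """Average a denoiser over external cycle shifts:
    shift, denoise, shift back, accumulate, and divide."""
    x = np.asarray(x, dtype=float)
    acc = np.zeros_like(x)
    for s in range(n_shifts):
        shifted = np.roll(x, s)       # cyclic shift by s positions
        est = denoise(shifted)        # apply the chosen denoiser
        acc += np.roll(est, -s)       # undo the shift before averaging
    return acc / n_shifts

# Placeholder denoiser: a crude cyclic moving average of window w.
def moving_average(x, w=5):
    kernel = np.ones(w) / w
    padded = np.concatenate([x[-(w // 2):], x, x[:w // 2]])
    return np.convolve(padded, kernel, mode="valid")
```

Because the Haar-Fisz operator does not commute with shifts, each shift yields a genuinely different estimate, which is why this averaging is meaningful and why the MISE curves above improve with the number of shifts before stabilising.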


For doppler 8 and heavisine 128, the respective numbers of cycle shifts are approximately 15 and 40.

Motivated by the above observation, we set the default value of the number of cycle shifts to 50, bearing in mind that this number may need to be slightly increased in the case of extremely long (≥ 2048) signals.

6.6 Application to earthquake data

In this section, we analyse Northern Californian earthquake data, available from http://quake.geo.berkeley.edu. We analyse the time series Nk, k = 1, ..., 1024, where Nk is the number of earthquakes of magnitude 3.0 or more which occurred in the kth week, the last week under consideration being 29 November – 5 December 2000. The time series, imported into S-Plus, is plotted in Figure 6.16.

Our aim is to extract the intensity which underlies the realisation of this process. For the purposes of this example we shall use the BMSMShrink methodology of Kolaczyk (1999a) and our hybrid H:CV+BT method with Haar wavelets. The rationale for using H:CV+BT is that:

- it appears that the true earthquake intensity is highly non-regular, and H:CV+BT with Haar wavelets worked the best on the blocks and clipped blocks simulation examples from Section 6.5.2;
- the earthquake data exhibit medium to high intensities, and H:CV+BT was better than the other hybrids that we tried in this situation.

Figure 6.17 shows the intensity estimates obtained using BMSMShrink and H:CV+BT, plotted on a log scale. (Due to the large peak at 274 weeks, the original scale of 0–250 is not suitable for analysing the subtle differences between the estimates.) Visually, the estimates are very similar; however, the H:CV+BT estimate is a little less variable. Although with this real data there is clearly


Figure 6.14: MISE against the number of shifts for clipped blocks 1 (top two rows) and blocks 128 (bottom two rows). See Section 6.5.3 for a detailed description.


Figure 6.15: MISE against the number of shifts for bumps 8, doppler 8 and heavisine 128. See Section 6.5.3 for a detailed description.


[Figure 6.16 appears here: time series plot titled "Earthquakes in Northern California, magnitude >= 3.0", with "weeks" (0–1000) on the horizontal axis and "number of earthquakes" (0–200) on the vertical axis.]
Figure 6.16: The number of earthquakes of magnitude ≥ 3.0 which occurred in Northern California in 1024 consecutive weeks, the last week being 29 Nov – 5 Dec 2000.

no right or wrong answer, it is reassuring that they do give such similar visual results, even though BMSMShrink and H:CV+BT are based on completely different philosophies.

6.7 Conclusion

In this chapter, we have described a new wavelet-based technique for bringing vectors of Poisson counts to normality with variance one. The technique, named the Haar-Fisz transformation, was applied to estimating the intensity of an inhomogeneous Poisson process, yielding a method whose performance was nearly always better than that of the current state-of-the-art.

For Poisson intensity estimation our methodology requires two components. The first, the Haar-Fisz transform, is very simple and easy to code. The second component can be any suitable Gaussian denoising procedure: we have used and compared a variety of wavelet methods, ranging from the fast universal thresholding to


[Figure 6.17 appears here: plot of the two intensity estimates, with "weeks" (200–400) on the horizontal axis and "log estimated intensity" on the vertical axis.]
Figure 6.17: Intensity estimates for the earthquake data for weeks 201 to 400. The dotted line is the BMSMShrink estimate and the solid line is the H:CV+BT estimate.

more complicated techniques such as cross-validation, "Baraniuk trees" and empirical Bayes. Since any Gaussian denoiser can be used, the Haar-Fisz algorithm can only improve as the field develops.

If computational speed is not an issue, and little is known about the smoothness of the true intensity, we recommend that several denoisers be used and a hybrid averaging all of their results, with optional full cycle-spinning, be considered. However, if speed is important, then there is an issue over which one denoiser should be chosen: not all denoisers are appropriate for all types of intensity, as our earlier simulations confirmed. Our recommendation is that if one suspects the intensity is piecewise constant, then one should use Haar wavelets and a hybrid method such as H:CV+BT; otherwise, we strongly recommend the use of F⋈U with a smooth wavelet.

We believe that one of the reasons why the performance of the Haar-Fisz algorithm is so good is the non-commutativity of the Haar-Fisz and shift operators, which enables meaningful cycle spinning. Also, the Fisz transform itself is a more effective normaliser than Anscombe.

The S-Plus routines implementing the algorithm, as well as the data set, are included on the associated CD.
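To illustrate how simple the first component is to code, here is a minimal sketch of the Haar-Fisz transform and its inverse, written in Python rather than the S-Plus of the thesis. It assumes the input length is a power of two and follows the pairwise Haar smooth/detail recursion with the Fisz step f = d/√s (taking 0/0 = 0); the pairing convention is an assumption of this sketch.

```python
import numpy as np

def haar_fisz(v):
    """Forward Haar-Fisz transform of a vector of non-negative counts
    (length a power of two): Haar-decompose, divide each detail by the
    square root of its smooth, and reconstruct with the modified details."""
    s = np.asarray(v, dtype=float).copy()
    details = []                                # finest scale first
    while s.size > 1:
        sm = (s[0::2] + s[1::2]) / 2.0          # Haar smooth
        d = (s[0::2] - s[1::2]) / 2.0           # Haar detail
        details.append(np.where(sm > 0, d / np.sqrt(sm), 0.0))  # Fisz step
        s = sm
    u = s                                       # overall mean
    for f in reversed(details):                 # rebuild, coarsest first
        nxt = np.empty(2 * u.size)
        nxt[0::2], nxt[1::2] = u + f, u - f
        u = nxt
    return u

def inv_haar_fisz(u):
    """Inverse transform: the Haar details of u are exactly the Fisz
    coefficients, so the original smooths can be rebuilt recursively
    starting from the overall mean, undoing the Fisz step at each level."""
    s = np.asarray(u, dtype=float).copy()
    details = []
    while s.size > 1:
        details.append((s[0::2] - s[1::2]) / 2.0)
        s = (s[0::2] + s[1::2]) / 2.0
    v = s
    for f in reversed(details):
        d = f * np.sqrt(np.maximum(v, 0.0))     # undo the Fisz step
        nxt = np.empty(2 * v.size)
        nxt[0::2], nxt[1::2] = v + d, v - d
        v = nxt
    return v
```

The transform is exactly invertible: any Gaussian denoiser can therefore be slotted in between `haar_fisz` and `inv_haar_fisz`, which is the structure exploited throughout this chapter.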


Chapter 7

Conclusions and future directions

In this thesis, we have considered some wavelet-based and wavelet-related approaches to selected problems arising in two important branches of statistics: time series analysis and Poisson regression. In this chapter, we briefly summarise the main contributions made in Chapters 3–6 and then move on to discuss possible directions for future research.

In Chapter 3, we considered several theoretical and computational aspects of forecasting Gaussian Locally Stationary Wavelet (LSW) processes by means of the linear predictor. As direct MSPE minimisation would have required knowledge of unidentifiable parameters, we proposed to compute the prediction coefficients by approximate MSPE minimisation. We identified conditions under which the approximation was valid, but found that one of them was overly restrictive. We circumvented that theoretical difficulty by introducing a slight modification to the definition of an LSW process (the new class was labelled "LSW2"). The minimisation of the MSPE led to a generalisation of the Yule-Walker equations; the stability of the system was analysed. We also found that "sparse" LSW2 processes were ill conditioned for forecasting. To conclude the theoretical part of the work, we derived a generalisation of Kolmogorov's formula for the one-step MSPE in the LSW2 framework. In practice, the entries of the prediction matrix have to be estimated, and we studied the behaviour of the first two moments of our (multiscale) estimators. We then proposed a complete forecasting algorithm where the choice


of the arising nuisance parameters was performed by adaptive forecasting. The algorithm was successfully applied to the forecasting of a meteorological time series. Our work in Chapter 3 provided an answer to the interesting question of whether and how wavelets could be useful in forecasting non-stationary time series.

In the less technical Chapter 4, we attempted to model financial log-returns in the Gaussian LSW framework. We first extended the LSW model to include time-modulated white noise (TMWN) as a special case (we labelled the new class "LSW3"). Then, we used theory to show that the LSW3 model was able to pick up the most commonly observed stylised facts of financial time series. We proposed a generic automatic algorithm for estimating the time-varying wavelet spectrum of log-returns with guaranteed nonnegativity, and used simulation to demonstrate the excellent performance of its two particular implementations. An exploratory data analysis of the FTSE 100 series showed that its second order structure was changing over time, and that Haar wavelets were ideally suited for the modelling of that series. Finally, we showed by simulation that financial log-returns could be successfully forecast in the LSW3 model.

In Chapter 5, we proposed a fast multiscale method, called the Haar-Fisz transform, for stabilising the variance of the wavelet periodogram (WP) in the Gaussian LSW model and bringing its distribution closer to normality. To be able to analyse the Haar-Fisz transform from a theoretical point of view, we stated and proved a Functional Central Limit Theorem (FCLT) for the WP. We then formulated and proved the Gaussianising, variance stabilising and decorrelating properties of the transform in a certain asymptotic regime. We showed by simulation that the Haar-Fisz transform is a far better Gaussianiser than the classical variance stabilising log transform.
We then proposed a denoising method for the WP, which consisted of three basic steps: take the Haar-Fisz transform, denoise the transformed WP using a method suitable for signals contaminated with stationary Gaussian noise, and then take the inverse Haar-Fisz transform. We assessed the performance of


the method using simulation, and showed that it outperformed an existing competitor in most of the cases. We concluded the chapter by using the Haar-Fisz methodology to perform a local variance analysis of the Dow Jones index: the analysis showed that the series could be modelled as Gaussian.

In Chapter 6, we introduced a Haar-Fisz transform for sequences of Poisson counts, again with the aim of stabilising their variance as well as bringing their distribution close to Gaussianity. We proved that if the underlying Poisson intensity was constant, then the Haar-Fisz transformed vector was asymptotically normal with variance one, and its elements were uncorrelated. We used simulation to show that the Gaussianising, variance stabilising and decorrelating properties of the Haar-Fisz transform also held for time-varying intensities. Also, we demonstrated that the Haar-Fisz transform was a more effective Gaussianiser and variance stabiliser than the usual Anscombe square-root transform. Then, we proposed a method for estimating Poisson intensities, based on the Haar-Fisz transform. Our technique outperformed state-of-the-art competitors in most of the cases; occasionally, its performance was slightly inferior but comparable. We applied our estimation technique to the well-known Northern California earthquake data: visually, our method gave similar results to the current state of the art.

We conclude this thesis by considering a few possible avenues for further research. The adaptive forecasting algorithm of Section 3.5 merits further investigation: in particular, it would be interesting to examine the dependence of the nuisance parameters on their initial values, to further robustify the algorithm, and to investigate its theoretical properties (e.g. its asymptotic behaviour as T → ∞).

Also, we suspect that by using the MSPE criterion in Chapter 3, we do not fully exploit the potential of the LSW model in forecasting non-stationary time series.
Indeed, observe that the prediction matrix in Section 3.1 is similar to the variance-covariance matrix of the process, which means that the wavelet spectrum (the main quantity of interest in LSW modelling) is only used indirectly there, through the local autocovariance. There arises the question of whether a tractable


(multiscale) prediction criterion can be formulated which would make more direct use of the wavelet spectrum.

As was shown in Section 3.2, sparse LSW processes are "ill conditioned" for forecasting. It would be interesting to investigate whether a multiscale time series model could be formulated whereby processes which are represented sparsely were, in an appropriate sense, "good" for forecasting. As for financial time series modelling (Chapter 4), the problem of volatility forecasting in the LSW model could make an interesting research project, as could the issue of using "skewed" wavelets and/or non-Gaussian innovations.

Furthermore, it remains to be investigated whether and how the Haar-Fisz transform for the wavelet periodogram (a) can be extended to processes with a discontinuous spectral structure; (b) can be used for testing for time series stationarity. Also, it would be desirable to theoretically quantify the variance stabilising property of the Haar-Fisz transform for M = log2(T) (see Section 5.5.2).

The problem of the choice of the primary resolution when the noise is correlated is still an open question, badly neglected in the wavelet literature. Indeed, we are unaware of any automatic method for performing this selection. It would clearly make an exciting and potentially very rewarding research project.

Also, it would be of interest to establish a theoretical proof of the Gaussianising, variance stabilising and decorrelating properties of the Haar-Fisz transform for non-stationary Poisson signals. Finally, an exciting possibility for future research would be to investigate how the Haar-Fisz transform can be used for Gaussianising other distributions, not only χ² or Poisson. Recall that in the χ² case, we use the Fisz transform with exponent 1, and in the Poisson case with exponent 1/2. In
the case of other distributions, we could attempt to estimate the suitable exponent, or indeed a suitable function of the Haar smooth coefficient, from the data.
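The distributional result of Fisz (1955) motivating these exponents can be recorded in the following form; the general exponent-θ notation below is our own shorthand for this sketch, with d and s denoting a Haar detail and its corresponding smooth coefficient.

```latex
% Fisz (1955): for independent Poisson variables X and Y with a common
% mean, (X - Y)/sqrt(X + Y) is asymptotically N(0,1) as the mean grows.
% With d = (X - Y)/2 and s = (X + Y)/2, this is the exponent-1/2 case of
\[
  f \;=\; \frac{d}{s^{\theta}}, \qquad
  \theta = \tfrac{1}{2}\ \text{(Poisson counts)}, \qquad
  \theta = 1\ \text{(the $\chi^2$-distributed wavelet periodogram)},
\]
% with the convention f = 0 whenever s = 0.
```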


Bibliography

Abramovich, F., & Benjamini, Y. 1996. Adaptive thresholding of wavelet coefficients. Comput. Statist. Data Anal., 22, 351–361.

Abramovich, F., & Silverman, B. W. 1998. Wavelet decomposition approaches to statistical inverse problems. Biometrika, 85, 115–129.

Abramovich, F., Bailey, T. C., & Sapatinas, T. 2000. Wavelet analysis and its statistical applications. J. Roy. Statist. Soc. Ser. D, 49, 1–29.

Altman, N. S. 1990. Kernel smoothing of data with correlated errors. J. Amer. Statist. Assoc., 85, 749–759.

Anscombe, F. J. 1948. The transformation of Poisson, binomial and negative-binomial data. Biometrika, 35, 246–254.

Antoniadis, A., & Sapatinas, T. 2001. Wavelet shrinkage for natural exponential families with quadratic variance functions. Biometrika, 88, 805–820.

Antoniadis, A., Grégoire, G., & Nason, G. P. 1999. Density and hazard rate estimation for right-censored data by using wavelet methods. J. R. Stat. Soc. Ser. B, 61, 63–84.

Audit, B., Bacry, E., Muzy, J. F., & Arneodo, A. 2002. Wavelet-based estimators of scaling behaviour. IEEE Transactions on Information Theory, 48, 2938–2954.

Aussem, A., & Murtagh, F. 2001. Web traffic demand forecasting using wavelet-based multiscale decomposition. International Journal of Intelligent Systems, 16, 215–236.


Averkamp, R., & Houdré, C. 2003. Wavelet thresholding for non-necessarily Gaussian noise: idealism. Ann. Stat., 31, 110–151.

Baraniuk, R. G. 1999. Optimal tree approximation with wavelets. Pages 206–214 of: Unser, M. A., Aldroubi, A., & Laine, A. F. (eds), Wavelet Applications in Signal and Image Processing VII. Proceedings of SPIE, vol. 3813. SPIE.

Barber, S., & Nason, G. P. 2003. Real nonparametric regression using complex wavelets. Technical Report 03:06, Department of Mathematics, University of Bristol, UK.

Battaglia, F. 1979. Some extensions in the evolutionary spectral analysis of a stochastic process. Boll. Un. Mat. Ital. B (5), 16, 1154–1166.

Bera, A. K., & Higgins, M. L. 1993. ARCH models: properties, estimation and testing. J. Economic Surveys, 7, 305–366.

Besbeas, P., De Feis, I., & Sapatinas, T. 2004. A comparative simulation study of wavelet shrinkage estimators for Poisson counts. International Statistical Review, 72.

Bilen, C., & Huzurbazar, S. 2002. Wavelet-based detection of outliers in time series. Journal of Computational and Graphical Statistics, 11, 311–327.

Bollerslev, T. 1986. Generalized autoregressive conditional heteroskedasticity. J. Econometrics, 31, 307–327.

Bougerol, P., & Picard, N. 1992. Stationarity of GARCH processes and of some nonnegative time series. J. Econometrics, 52, 115–127.

Brillinger, D. R. 1998. Some wavelet analyses of point process data. Conference Record of the Thirty-First Asilomar Conference on Signals, Systems and Computers, 1997, 2, 1087–1091.

Brockwell, P. J., & Davis, R. A. 1987. Time Series: Theory and Methods. Springer.


Brooks, C. 1997. Linear and non-linear (non-)forecastability of high-frequency exchange rates. J. Forecasting, 16, 125–145.

Cai, T., & Silverman, B. W. 2001. Incorporating information on neighbouring coefficients into wavelet estimation. Sankhyā Ser. B, 63, 127–148. Special issue on wavelets.

Calvet, L., & Fisher, A. 2001. Forecasting multifractal volatility. J. Econometrics, 105, 27–58.

Candès, E. J., & Donoho, D. L. 2001. Curvelets and curvilinear integrals. J. Approx. Theory, 113, 59–90.

Chatfield, C. 1996. The Analysis of Time Series: An Introduction. 5 edn. Chapman & Hall.

Chatfield, C. 2000. Time Series Forecasting. Chapman & Hall.

Chiann, C., & Morettin, P. 1999. A wavelet analysis for time series. J. Nonparametr. Statist., 10, 1–46.

Chui, C. 1992. An Introduction To Wavelets. Academic Press.

Cohen, A. 2003. Numerical Analysis of Wavelet Methods. Studies in Mathematics and Its Applications, vol. 32. Elsevier.

Cohen, A., Daubechies, I., & Feauveau, J. 1992. Bi-orthogonal bases of compactly supported wavelets. Comm. Pure Appl. Math., 45, 485–560.

Cohen, A., Daubechies, I., & Vial, P. 1993. Wavelets on the interval and fast wavelet transforms. Appl. Comput. Harmon. Anal., 1, 54–81.

Coifman, R. R., & Donoho, D. L. 1995. Translation-invariant de-noising. Technical Report, Statistics Department, Stanford University.


Coifman, R. R., & Wickerhauser, M. V. 1992. Entropy-based algorithms for best basis selection. IEEE Trans. Inform. Theory, 38, 713–718.

Coifman, R. R., Meyer, Y., Quake, S., & Wickerhauser, M. V. 1989. Signal processing and compression with wave packets. In: Meyer, Y. (ed), Proceedings of the International Conference on Wavelets, Marseilles. Paris: Masson.

Cooley, J. W., & Tukey, J. W. 1965. An algorithm for the machine calculation of complex Fourier series. Math. Comput., 19, 297–301.

Cox, D. R., Hinkley, D. V., & Barndorff-Nielsen, O. E. (eds). 1996. Time Series Models in Econometrics, Finance and Other Fields. Monographs on Statistics and Applied Probability, vol. 65. Chapman & Hall.

Cristan, A. C., & Walden, A. T. 2002. Multitaper power spectrum estimation and thresholding: Wavelet packets versus wavelets. IEEE Trans. Sig. Proc., 50, 2976–2986.

Dahlhaus, R. 1996a. Asymptotic statistical inference for nonstationary processes with evolutionary spectra. In: Robinson, P. M., & Rosenblatt, M. (eds), Athens Conference on Applied Probability and Time Series Analysis, vol. 2. New York: Springer-Verlag.

Dahlhaus, R. 1996b. On the Kullback-Leibler information divergence of locally stationary processes. Stochastic Process. Appl., 62, 139–168.

Dahlhaus, R. 1997. Fitting time series models to nonstationary processes. Ann. Stat., 25, 1–37.

Dahlhaus, R., & Neumann, M. H. 2001. Locally adaptive fitting of semiparametric models to nonstationary time series. Stoch. Proc. Appl., 91, 277–308.

Dahlhaus, R., Neumann, M. H., & von Sachs, R. 1999. Non-linear wavelet estimation of time-varying autoregressive processes. Bernoulli, 5, 873–906.


Daubechies, I. 1992. Ten Lectures on Wavelets. Philadelphia, Pa.: SIAM.

Davidson, J. 1994. Stochastic Limit Theory. Oxford University Press.

Davidson, J. 2003. Moment and memory properties of linear conditional heteroscedasticity models. Journal of Business and Economic Statistics.

Donoho, D., Johnstone, I., Kerkyacharian, G., & Picard, D. 1996. Density estimation by wavelet thresholding. Ann. Statist., 24, 508–539.

Donoho, D. L. 1993. Nonlinear wavelet methods for recovery of signals, densities, and spectra from indirect and noisy data. Pages 173–205 of: Proc. Sympos. Appl. Math. Amer. Math. Soc.

Donoho, D. L. 1995. Nonlinear solution of linear inverse problems by wavelet-vaguelette decomposition. Appl. Comput. Harmon. Anal., 2, 101–126.

Donoho, D. L. 2000. Orthonormal ridgelets and linear singularities. SIAM J. Math. Anal., 31, 1062–1099.

Donoho, D. L., & Huo, X. 2002. Beamlets and multiscale image analysis. Pages 149–196 of: Multiscale and multiresolution methods. Lect. Notes Comput. Sci. Eng., vol. 20. Berlin: Springer.

Donoho, D. L., & Johnstone, I. M. 1994. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81, 425–455.

Donoho, D. L., & Johnstone, I. M. 1995. Adapting to unknown smoothness via wavelet shrinkage. J. Amer. Statist. Assoc., 90, 1200–1224.

Downie, T., & Silverman, B. W. 1998. The discrete multiple wavelet transform and thresholding methods. IEEE Transactions on Signal Processing, 46, 2558–2561.

Engle, R. F. 1982. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50, 987–1007.


Farge, M., Kevlahan, N., Perrier, V., & Schneider, K. 1999. Turbulence analysis, modelling and computing using wavelets. Pages 117–200 of: Wavelets in physics. Cambridge: Cambridge Univ. Press.

Fisz, M. 1955. The limiting distribution of a function of two independent random variables and its statistical application. Colloquium Mathematicum, 3, 138–146.

Fryzlewicz, P., & Nason, G. P. 2004. A Haar-Fisz algorithm for Poisson intensity estimation. Journal of Computational and Graphical Statistics, to appear.

Fryzlewicz, P., Van Bellegem, S., & von Sachs, R. 2003. Forecasting non-stationary time series by wavelet process modelling. Ann. Inst. Stat. Math., 55, 737–764.

Gao, H-Y. 1997. Choice of thresholds for wavelet shrinkage estimate of the spectrum. J. Time Ser. Anal., 18, 231–251.

Gençay, R., Selçuk, F., & Whitcher, B. 2001. An Introduction to Wavelets and Other Filtering Methods in Finance and Economics. Academic Press.

Gençay, R., Selçuk, F., & Whitcher, B. 2001. Scaling properties of foreign exchange volatility. Physica A, 289, 249–266.

Geronimo, J. S., Hardin, D. P., & Massopust, P. R. 1994. Fractal functions and wavelet expansions based on several scaling functions. Journal of Approximation Theory, 78, 373–401.

Geva, A. B. 1998. ScaleNet – Multiscale neural-network architecture for time series prediction. IEEE Transactions on Neural Networks, 9, 1471–1482.

Green, P. J., & Silverman, B. W. 1994. Nonparametric Regression and Generalized Linear Models. Chapman & Hall.

Grillenzoni, C. 2000. Time-varying parameter prediction. Ann. Inst. Statist. Math., 52, 108–122.


Haar, A. 1910. Zur Theorie der orthogonalen Funktionensysteme. Math. Ann., 69, 331–371.

Hall, P., & Patil, P. 1995. Formulae for mean integrated squared error of nonlinear wavelet-based density estimators. Ann. Statist., 23, 905–928.

Härdle, W., Spokoiny, V. G., & Teyssière, G. 2000. Adaptive estimation for a time inhomogeneous stochastic volatility model. SFB Discussion Paper No. 6/00. Berlin: Humboldt University.

Hastie, T. J., & Tibshirani, R. J. 1990. Generalized Additive Models. Chapman & Hall.

Hee, Y. Y., Chong, F. L., & Zhong, B. L. 2002. A hierarchical evolutionary algorithm for constructing and training wavelet networks. Neural Computing & Applications, 10, 357–366.

Herrick, D., Nason, G. P., & Silverman, B. W. 2001. Some new methods for wavelet density estimation. Sankhyā Ser. A, 63, 394–411. Special issue on wavelets.

Hoffmann, M. 1999. On nonparametric estimation in nonlinear AR(1) models. Statistics & Probability Letters, 44, 29–45.

Hong, Y. M., & Lee, J. 2001. One-sided testing for ARCH effects using wavelets. Econometric Theory, 17, 1051–1081.

Ikeda, Y., & Tokinaga, S. 1999. Evaluation of stock option prices by using the prediction of fractal time-series. Journal of the Operations Research Society of Japan, 42, 18–31.

Jaffard, S., Meyer, Y., & Ryan, R. D. 2001. Wavelets: Tools for Science & Technology. Philadelphia, Pa.: SIAM.

Johnstone, I. M., & Silverman, B. W. 1997. Wavelet threshold estimators for data with correlated noise. J. Roy. Statist. Soc. Ser. B, 59, 319–351.


Johnstone, I. M., & Silverman, B. W. 2003. Empirical Bayes selection of wavelet thresholds. Technical Report 02:17, Department of Mathematics, University of Bristol, UK.

Kerkyacharian, G., & Picard, D. 1996. Estimating nonquadratic functionals of a density using Haar wavelets. Ann. Statist., 24, 485–507.

Kim, W. 1998. Econometric analysis of locally stationary time series models. Manuscript, Yale University.

Kokoszka, P., & Leipus, R. 2000. Change-point estimation in ARCH models. Bernoulli, 6, 513–539.

Kolaczyk, E. D. 1997. Non-parametric estimation of Gamma-ray burst intensities using Haar wavelets. The Astrophysical Journal, 483, 340–349.

Kolaczyk, E. D. 1999a. Bayesian multiscale models for Poisson processes. Journal of the American Statistical Association, 94, 920–933.

Kolaczyk, E. D. 1999b. Wavelet shrinkage estimation of certain Poisson intensity signals using corrected thresholds. Statistica Sinica, 9, 119–135.

Kress, R. 1991. Numerical Analysis. Springer.

Lawton, W. 1993. Applications of complex valued wavelet transforms to subband decomposition. IEEE Trans. Sig. Proc, 41, 3566–3568.

Ledolter, J. 1981. Recursive estimation and adaptive forecasting in ARIMA models with time varying coefficients. Pages 449–471 of: Applied time series analysis, II (Tulsa, Okla., 1980). New York: Academic Press.

Lee, J., & Hong, Y. M. 2001. Testing for serial correlation of unknown form using wavelet methods. Econometric Theory, 17, 386–423.


Leung, M. T., Daouk, H., & Chen, A.-S. 2000. Forecasting stock indices: a comparison of classification and level estimation models. Int. J. Forecasting, 16, 173–190.

Li, T. H., & Hinich, M. J. 2002. A filter bank approach for modeling and forecasting seasonal patterns. Technometrics, 44, 1–14.

Li, Y. A., & Xie, Z. J. 1997. The wavelet detection of hidden periodicities in time series. Statistics & Probability Letters, 35, 9–23.

Lina, J-M. 1997. Image processing with complex Daubechies wavelets. Journal of Mathematical Imaging and Vision, 7, 211–223.

Lina, J-M., & Mayrand, M. 1995. Complex Daubechies wavelets. Applied and Computational Harmonic Analysis, 2, 219–229.

Mackenzie, D. 2001. Wavelets: seeing the forest and the trees. Available from http://www.beyonddiscovery.org/content/view.article.asp?a=1952.

Maddala, G. S., & Rao, C. R. (eds). 1996. Statistical Methods in Finance. Handbook of Statistics, vol. 14. Elsevier.

Mallat, S. 1989a. Multiresolution approximations and wavelet orthonormal bases of L2(R). Trans. Amer. Math. Soc, 315, 69–87.

Mallat, S. 1989b. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattn Anal. Mach. Intell., 11, 674–693.

Mallat, S. 1998. A Wavelet Tour of Signal Processing. Academic Press.

Mallat, S., Papanicolaou, G., & Zhang, Z. 1998. Adaptive covariance estimation of locally stationary processes. Ann. Stat., 26, 1–47.

Masuda, N., & Okabe, Y. 2001. Time series analysis with wavelet coefficients. Japan Journal of Industrial and Applied Mathematics, 18, 131–160.


Mélard, G., & Herteleer-De Schutter, A. 1989. Contributions to evolutionary spectral theory. J. Time Ser. Anal., 10, 41–63.

Meyer, Y. 1992. Wavelets and Operators. Cambridge University Press.

Mikosch, T., & Starica, C. 2003. Change of structure in financial data, long range dependence and GARCH modelling. The Review of Economics and Statistics, to appear.

Milidiu, R. L., Machado, R. J., & Renteria, R. P. 1999. Time-series forecasting through wavelets transformation and a mixture of expert models. Neurocomputing, 28, 145–156.

Morlet, J., Arens, G., Fourgeau, E., & Giard, D. 1982. Wave propagation and sampling theory. Geophysics, 47, 203–236.

Murty, K. G. 1988. Linear Complementarity, Linear and Nonlinear Programming. Internet Edition, available from http://ioe.engin.umich.edu/people/fac/books/murty/linear complementarity webbook/.

Nason, G. P. 1996. Wavelet shrinkage using cross-validation. J. Roy. Statist. Soc. Ser. B, 58, 463–479.

Nason, G. P. 1998. WaveThresh3 Software. Available from http://www.stats.bris.ac.uk/~wavethresh/.

Nason, G. P., & Sapatinas, T. 2002. Wavelet packet transfer function modelling of nonstationary time series. Statistics and Computing, 12, 45–56.

Nason, G. P., & Silverman, B. W. 1994. The discrete wavelet transform in S. J. Comput. Graph. Statist., 3, 163–191.

Nason, G. P., & Silverman, B. W. 1995. The stationary wavelet transform and some statistical applications. Pages 281–300 of: Antoniadis, A., & Oppenheim, G. (eds), Lecture Notes in Statistics, vol. 103. Springer-Verlag.


Nason, G. P., & von Sachs, R. 1999. Wavelets in time series analysis. Philosophical Transactions of the Royal Society of London (Series A), 357, 2511–2526.

Nason, G. P., von Sachs, R., & Kroisandt, G. 2000. Wavelet processes and adaptive estimation of the evolutionary wavelet spectrum. Journal of the Royal Statistical Society, Series B, 62, 271–292.

Neumann, M., & von Sachs, R. 2000. A wavelet-based test for stationarity. J. Time Ser. Anal., 21, 597–613.

Neumann, M. H. 1996. Spectral density estimation via nonlinear wavelet methods for stationary non-Gaussian time series. Journal of Time Series Analysis, 17, 601–633.

Neumann, M. H., & von Sachs, R. 1997. Wavelet thresholding in anisotropic function classes and application to adaptive estimation of evolutionary spectra. Ann. Statist., 25, 38–77.

Nowak, R. D., & Baraniuk, R. G. 1999. Wavelet domain filtering for photon imaging systems. IEEE Transactions on Image Processing, 8, 666–678.

Ogden, R. T., & Parzen, E. 1996. Change-point approach to data analytic wavelet thresholding. Statist. Comput., 6, 93–99.

Ombao, H., Raz, J., von Sachs, R., & Malow, B. 2001a. Automatic statistical analysis of bivariate nonstationary time series. J. Amer. Stat. Assoc., 96, 543–560.

Ombao, H., Raz, J., von Sachs, R., & Guo, W. 2002. The SLEX model of a non-stationary random process. Ann. Inst. Statist. Math., 54, 171–200.

Ombao, H. C., Raz, J. A., Strawderman, R. L., & von Sachs, R. 2001b. A simple generalised crossvalidation method of span selection for periodogram smoothing. Biometrika, 88, 1186–1192.


Opsomer, J. D., Wang, Y., & Yang, Y. 2001. Nonparametric regression with correlated errors. Statistical Science, 16, 134–153.

Patil, P. N., & Wood, A. T. 2004. Counting process intensity estimation by orthogonal wavelet methods. Bernoulli, to appear.

Penev, S., & Dechevsky, L. 1997. On non-negative wavelet-based density estimators. J. Nonparametr. Statist., 7, 365–394.

Pensky, M. 1999. Estimation of a smooth density function using Meyer-type wavelets. Statist. Decisions, 17, 111–123.

Pensky, M. 2002. Density deconvolution based on wavelets with bounded supports. Statist. Probab. Lett., 56, 261–269.

Pensky, M., & Vidakovic, B. 1999. Adaptive wavelet estimator for nonparametric density deconvolution. Ann. Statist., 27, 2033–2053.

Percival, D. B., & Walden, A. T. 2000. Wavelet Methods for Time Series Analysis. Cambridge University Press.

Pesquet, J. C., Krim, H., & Carfantan, H. 1996. Time-invariant orthonormal wavelet representations. IEEE Trans. Sig. Proc., 44, 1964–1970.

Philander, S. 1990. El Niño, La Niña and the Southern Oscillation. San Diego: Academic Press.

Pinheiro, A., & Vidakovic, B. 1997. Estimating the square root of a density via compactly supported wavelets. Comput. Statist. Data Anal., 25, 399–415.

Prakasa Rao, B. L. S. 1999. Estimation of the integrated squared density derivatives by wavelets. Bull. Inform. Cybernet., 31, 47–65.

Priestley, M. 1965. Evolutionary spectra and non-stationary processes. Journal of the Royal Statistical Society, Series B, 27, 204–237.


Priestley, M. B. 1981. Spectral Analysis and Time Series. Academic Press.

Ramsey, J. 1999. The contribution of wavelets to the analysis of economic and financial data. Phil. Trans. Roy. Soc. London A, 357, 2593–2606.

Ramsey, J. 2002. Wavelets in economics and finance: past and future. Studies in Nonlinear Dynamics and Econometrics, 6.

Rao, T. S., & Indukumar, K. C. 1996. Spectral and wavelet methods for the analysis of nonlinear and nonstationary time series. J. Franklin Inst. Eng. Appl. Math., 333B, 425–452.

Ruskai, M. B. (ed). 1992. Wavelets and Their Applications. Jones and Bartlett Books in Mathematics. Jones and Bartlett.

Sakiyama, K. 2002. Some statistical applications for locally stationary processes. Sci. Math. Jpn., 56, 231–250.

Sardy, S., Antoniadis, A., & Tseng, P. 2004. Automatic smoothing with wavelets for a wide class of distributions. Journal of Computational and Graphical Statistics, to appear.

Schneider, K., & Farge, M. 2001. Computing and analyzing turbulent flows using wavelets. Pages 181–216 of: Wavelet Transforms and Time-Frequency Signal Analysis. Appl. Numer. Harmon. Anal. Boston, MA: Birkhäuser Boston.

Serroukh, A., Walden, A. T., & Percival, C. B. 2000. Statistical properties and uses of the wavelet variance estimator for the scale analysis of time series. Journal of the American Statistical Association, 95, 184–196.

Simonoff, J. S. 1996. Smoothing Methods in Statistics. Springer.

Soltani, S. 2002. On the use of wavelet decomposition for time series prediction. Neurocomputing, 48, 267–277.


Soltani, S., Boichu, D., Simard, P., & Canu, S. 2000. The long-term memory prediction by multiscale decomposition. Signal Processing, 80, 2195–2205.

Struzik, Z. R. 2001. Wavelet methods in (financial) time series processing. Physica A, 296, 307–319.

Sweldens, W. 1996. The lifting scheme: a custom-design construction of biorthogonal wavelets. Appl. Comput. Harmon. Anal., 3, 186–200.

Swift, R. 2000. The evolutionary spectra of a harmonizable process. J. Appl. Statist. Sci., 9, 265–275.

Taylor, S. J. 1986. Modelling Financial Time Series. Chichester: Wiley.

Timmermann, K. E., & Nowak, R. D. 1997. Multiscale Bayesian estimation of Poisson intensities. Pages 95–90 of: Proceedings of the Asilomar Conference on Signals, Systems and Computers. Pacific Grove, CA: IEEE Computer Press.

Timmermann, K. E., & Nowak, R. D. 1999. Multiscale modeling and estimation of Poisson processes with application to photon-limited imaging. IEEE Transactions on Information Theory, 45, 846–862.

Triebel, H. 1983. Theory of Function Spaces. Basel: Birkhäuser Verlag.

Truong, Y. K., & Patil, P. N. 2001. Asymptotics for wavelet based estimates of piecewise smooth regression for stationary time series. Annals of the Institute of Statistical Mathematics, 53, 159–178.

Vanreas, E., Jansen, M., & Bultheel, A. 2002. Stabilized wavelet transforms for non-equispaced data smoothing. Signal Processing, 82, 1979–1990.

Vidakovic, B. 1999. Statistical Modeling by Wavelets. New York: Wiley.

von Sachs, R., & MacGibbon, B. 2000. Non-parametric curve estimation by wavelet thresholding with locally stationary errors. Scand. J. Statist., 27, 475–499.


von Sachs, R., & Schneider, K. 1996. Wavelet smoothing of evolutionary spectra by nonlinear thresholding. Applied and Computational Harmonic Analysis, 3, 268–282.

Walden, A. T., & Serroukh, A. 2002. Wavelet analysis of matrix-valued time series. Proc. Roy. Soc. London Ser. A, 458, 157–179.

Walden, A. T., Percival, D. B., & McCoy, E. J. 1998. Spectrum estimation by wavelet thresholding of multitaper estimators. IEEE Trans. Sig. Proc., 48, 3153–3165.

Walter, G., & Shen, X. 1999. Deconvolution using Meyer wavelets. J. Integral Equations Appl., 11, 515–534.

Wand, M. P., & Jones, M. C. 1994. Kernel Smoothing. Chapman & Hall.

Wang, Y. Z., Cavanaugh, J. E., & Song, C. Y. 2001. Self-similarity index estimation via wavelets for locally self-similar processes. J. Statist. Plan. Infer., 99, 91–110.

Wasserman, P. 1993. Advanced Methods in Neural Computing. New York: Van Nostrand Reinhold.

West, M., & Harrison, J. 1997. Bayesian Forecasting and Dynamic Models. 2nd edn. Springer.

Whitcher, B. 2001. Simulating Gaussian processes with unbounded spectra. Journal of Computational and Graphical Statistics, 10, 112–134.

Whitcher, B., Byers, S. D., Guttorp, P., & Percival, D. B. 2002. Testing for homogeneity of variance in time series: Long memory, wavelets and the Nile River. Water Resources Research, 38.

Wong, H., Ip, W., & Li, Y. 2001. Detection of jumps by wavelets in a heteroscedastic autoregressive model. Stat. Probab. Lett., 52, 365–372.


Wong, H., Ip, W. C., Xie, Z. J., & Lui, X. L. 2003. Modelling and forecasting by wavelets, and the application to exchange rates. Journal of Applied Statistics, 30, 537–553.

Zhang, B. L., & Dong, Z. Y. 2001. An adaptive neural-wavelet model for short term load forecasting. Electric Power Systems Research, 59, 121–129.

Zhang, B. L., Coggins, R., Jabri, M. A., Dersch, D., & Flower, B. 2001a. Multiresolution forecasting for futures trading using wavelet decompositions. IEEE Transactions on Neural Networks, 12, 765–775.

Zhang, G. P., Patuwo, B. E., & Hu, M. Y. 2001b. A simulation study of artificial neural networks for nonlinear time series forecasting. Computer and Operations Research, 28, 381–396.

Zheng, T. X., Girgis, A. A., & Makram, E. B. 2000. A hybrid wavelet-Kalman filter method for load forecasting. Electric Power Systems Research, 54, 11–17.

Zheng, Y. J., Lin, Z. P., & Tay, D. B. H. 2001. State-dependent vector hybrid linear and nonlinear ARMA modelling: Applications. Circuits Systems and Signal Processing, 20, 575–597.


