
Rainer Hegger, Holger Kantz Thomas Schreiber - arXiv · 2008-02-05 · Rainer Hegger, Holger Kantz...


arXiv:chao-dyn/9810005v1 30 Sep 1998

Practical implementation of nonlinear time series methods: The TISEAN package

Rainer Hegger, Holger Kantz

Max Planck Institute for Physics of Complex Systems,

Nöthnitzer Str. 38, D–01187 Dresden

Thomas Schreiber

Physics Department, University of Wuppertal, D–42097 Wuppertal

We describe the implementation of methods of nonlinear time series analysis which are based on the paradigm of deterministic chaos. A variety of algorithms for data representation, prediction, noise reduction, dimension and Lyapunov estimation, and nonlinearity testing are discussed with particular emphasis on issues of implementation and choice of parameters. Computer programs that implement the resulting strategies are publicly available as the TISEAN software package. The use of each algorithm will be illustrated with a typical application. As to the theoretical background, we will essentially give pointers to the literature.

LEAD PARAGRAPH

Nonlinear time series analysis is becoming an increasingly reliable tool for the study of complicated dynamics from measurements. The concept of low-dimensional chaos has proven fruitful in the understanding of many complex phenomena, despite the fact that very few natural systems have actually been found to be low-dimensional deterministic in the sense of the theory. In order to evaluate the long-term usefulness of the nonlinear time series approach as inspired by chaos theory, it will be important that the corresponding methods become more widely accessible. This paper, while not a proper review of nonlinear time series analysis, tries to contribute to this process by describing the actual implementation of the algorithms and their proper usage. Most of the methods require the choice of certain parameters for each specific time series application, and we will try to give guidance in this respect. The scope and selection of topics in this article, as well as the implementation choices that have been made, correspond to the contents of the software package TISEAN, which is publicly available from http://www.mpipks-dresden.mpg.de/~tisean. In fact, this paper can be seen as an extended manual for the TISEAN programs. It fills the gap between the technical documentation and the existing literature, providing the necessary entry points for a more thorough study of the theoretical background.


I. INTRODUCTION

Deterministic chaos as a fundamental concept is by now well established and described in a rich literature. The mere fact that simple deterministic systems generically exhibit complicated temporal behavior in the presence of nonlinearity has influenced thinking and intuition in many fields. However, it has been questioned whether the relevance of chaos for the understanding of the time-evolving world goes beyond that of a purely philosophical paradigm. Accordingly, major research efforts are dedicated to two related questions. The first is whether chaos theory can be used to gain a better understanding and interpretation of observed complex dynamical behavior. The second is whether chaos theory can give an advantage in predicting or controlling such time evolution. Time evolution as a system property can be measured by recording time series. Thus, nonlinear time series methods will be the key to answering the above questions. This paper is intended to encourage the exploratory use of such methods by a section of the scientific community which is not limited to chaos theorists. A range of algorithms has been made available in the form of computer programs by the TISEAN project [1]. Since this is fairly new territory, unguided use of the algorithms bears a considerable risk of wrong interpretation and of unintelligible or spurious results. In the present paper, the essential ideas behind the algorithms are summarized and pointers to the existing literature are given. To avoid excessive redundancy with the textbook [2] and the recent review [3], the derivation of the methods will be kept to a minimum. On the other hand, the choices that have been made in the implementation of the programs are discussed more thoroughly, even if this may seem quite technical at times. We will also point to possible alternatives to the TISEAN implementation.

Let us at this point mention a number of general references on the subject of nonlinear dynamics. At an introductory level, the book by Kaplan and Glass [4] is aimed at an interdisciplinary audience and provides a good intuitive understanding of the fundamentals of dynamics. The theoretical framework is thoroughly described by Ott [5], and also in the older books by Bergé et al. [6] and by Schuster [7]. More advanced material is contained in the work by Katok and Hasselblatt [8]. A collection of research articles compiled by Ott et al. [9] covers some of the more applied aspects of chaos, like synchronization, control, and time series analysis.

Nonlinear time series analysis based on this theoretical paradigm is described in two recent monographs, one by Abarbanel [10] and one by Kantz and Schreiber [2]. While the former volume usually assumes chaoticity, the latter book puts some emphasis on practical applications to time series that are neither manifestly found to be, nor simply assumed to be, deterministic chaotic. This is the rationale we will also adopt in the present paper. A number of older articles can be seen as reviews, including Grassberger et al. [11], Abarbanel et al. [12], as well as Kugiumtzis et al. [13,14]. The application of nonlinear time series analysis to real-world measurements, where determinism is unlikely to be present in a stronger sense, is reviewed in Schreiber [3]. Apart from these works, a number of conference proceedings volumes are devoted to chaotic time series, including Refs. [15–19].

A. Philosophy of the TISEAN implementation

A number of different people have been credited with the saying that every complicated question has a simple answer which is wrong. Analysing a time series with a nonlinear approach is definitely a complicated problem. Simple answers have been repeatedly offered in the literature, quoting numerical values for attractor dimensions for any conceivable system. The present implementation reflects our scepticism about such simple answers, which are the inevitable result of using black-box algorithms. Thus, for example, none of the “dimension” programs will actually print a number which can be quoted as the estimated attractor dimension. Instead, the correlation sum is computed and basic tools are provided for its interpretation. It is up to the scientist who does the analysis to put these results into their proper context and to infer what information she or he may find useful and plausible. We should stress that this is not simply a question of error bars. Error bars say nothing about systematic errors, nor do they tell whether the underlying assumptions are justified.

The TISEAN project has emerged from work of the involved research groups over several years. Some of the programs are in fact based on code published in Ref. [2]. Nevertheless, we would still like to see it as a starting point rather than a conclusive step. First of all, nonlinear time series analysis is still a rapidly evolving field, in particular with respect to applications. This implies that the selection of topics in this article and the selection of algorithms implemented in TISEAN are highly biased towards what we know now and have found useful so far. But even well-established concepts like dimension estimation and noise reduction leave considerable room for alternatives to the present implementation. Sometimes this resulted in two or more competing and almost redundant programs entering the package. We have deliberately not eliminated these redundancies, since the user may benefit from having a choice. In any case it is healthy to know that for most of the algorithms the final word has not been spoken yet, nor will it ever be.

While the TISEAN package does contain a number of tools for linear time series analysis (spectrum, autocorrelations, histograms, etc.), these are only suitable for a quick inspection of the data. Spectral or even ARMA estimation are industries in themselves, and we refer the reader (and the user of TISEAN) to the existing literature and available statistics software for optimal, up-to-date implementations of these important methods.

Some users will miss a convenient graphical interface to the programs. We felt that at this point the extra implementation effort would not be justified by the expected additional functionality of the package. Work is in progress, however, to provide interfaces to higher-level mathematics (or statistics) software.

B. General computational issues

The natural basis to formulate nonlinear time series algorithms from chaos theory is a multi-dimensional phase space, rather than the time or the frequency domain. It will be essential for the global dynamics in this phase space to be nonlinear in order to fulfill the constraints of non-triviality and boundedness. Only in special cases will this nonlinear structure be easily representable by a global nonlinear function. Instead, all properties will be expressed in terms of local quantities, often by suitable global averages. All local information will be gained from neighborhood relations of various kinds between time series elements. Thus a recurrent computational issue will be that of defining local neighborhoods in phase space. Finding neighbors in multidimensional space is a common problem of computational geometry. Multidimensional tree structures are widely used and have attractive theoretical properties. Finding the neighbors of one vector in a set of N vectors takes O(log N) operations, so the total operation count for all N vectors is O(N log N). A fast alternative that is particularly efficient for relatively low-dimensional structures embedded in multidimensional spaces is given by box-assisted neighbor search methods, which can push the operation count down to O(N) under certain assumptions. Both approaches are reviewed in Ref. [20] with particular emphasis on time series applications. In the TISEAN project, fast neighbor search is done using a box-assisted approach, as described in Ref. [2].

No matter in what space dimension we are working, we will define candidates for nearest neighbors in two dimensions using a grid of evenly spaced boxes. With a grid of spacing ε, all neighbors of a vector x closer than ε must be located in the adjacent boxes. But not all points in the adjacent boxes are neighbors: they may be up to 2ε apart in each of the two grid coordinates and arbitrarily far apart in the remaining dimensions. Neighbor search is thus a two-stage process. First the box-assisted database has to be filled, and then a list of neighbors can be requested for each point. There are a few instances where it is advisable to abandon the fast neighbor search strategy. One example is the program noise, which does nonlinear noise filtering in a data stream. It is supposed to start filtering soon after the first points have been recorded, so the neighbor database cannot be constructed at the beginning. Another exception is when quite short (< 500 points, say), high-dimensional data are processed. Then the overhead for the neighbor search should be avoided and an optimized straight O(N²) method should be used instead, as is done in c2naive.
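The two-stage box-assisted search described above can be sketched in Python as follows. This is a minimal illustration and not the TISEAN code. It assumes the maximum norm for distances, and it uses the first and last coordinate of each vector to assign the two-dimensional grid box, which is an arbitrary choice. The helper names (`build_boxes`, `neighbors`, `box_key`) are hypothetical.

```python
from collections import defaultdict
from math import floor


def max_dist(a, b):
    """Distance between two vectors in the maximum (Chebyshev) norm."""
    return max(abs(x - y) for x, y in zip(a, b))


def box_key(v, eps):
    """Grid box of a vector, using its first and last coordinate (illustrative choice)."""
    return (floor(v[0] / eps), floor(v[-1] / eps))


def build_boxes(vectors, eps):
    """Stage 1: fill the box database, mapping box index -> list of vector indices."""
    boxes = defaultdict(list)
    for idx, v in enumerate(vectors):
        boxes[box_key(v, eps)].append(idx)
    return boxes


def neighbors(vectors, boxes, eps, i):
    """Stage 2: sorted indices j != i with max_dist(vectors[i], vectors[j]) < eps."""
    bx, by = box_key(vectors[i], eps)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            # Every point in the 3 x 3 block of boxes is only a candidate ...
            for j in boxes.get((bx + dx, by + dy), ()):
                # ... the full distance check decides whether it is a neighbor.
                if j != i and max_dist(vectors[i], vectors[j]) < eps:
                    found.append(j)
    return sorted(found)
```

A true neighbor closer than ε differs by less than ε in each grid coordinate, so its box index differs by at most one. Scanning the 3 × 3 block therefore cannot miss a true neighbor. The explicit distance test then discards candidates that are too far away in the other coordinates.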

For portability, all programs expect time series data in column format represented by ASCII numbers. The column to be processed can be specified on the command line. Although somewhat wasteful for storing data, ASCII is the lowest common denominator among the different ways most software can store data. All parameters can be set by adding options to the command, which, in many programs, just replace the default values. Obviously, relying on default settings is particularly dangerous in such a subtle field. Since almost all routines can read from standard input and write to standard output, programs can be part of pipelines. For example, they can be called as filters from inside graphics software or other software tools which are able to execute shell commands. Also, data conversion or compression can be done “on the fly” this way. The reader will have realized that we are speaking of UNIX or Linux platforms, which seem to be the most appropriate environment. It is, however, expected that most of the programs will be ported to other environments in the near future.
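The column-format convention can be illustrated with a small filter that extracts one column of whitespace-separated ASCII numbers. This is a sketch only. The function name and the 1-based column numbering are assumptions made for this example, and the code does not reproduce the option syntax of any TISEAN program.

```python
def read_column(lines, column=1):
    """Parse one column of whitespace-separated ASCII numbers.

    `lines` is any iterable of text lines (a list, an open file, sys.stdin);
    `column` is 1-based. Blank lines are skipped. A line with fewer fields
    than `column` raises IndexError, and a non-numeric field raises ValueError.
    """
    if column < 1:
        raise ValueError("column numbers start at 1")
    values = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # tolerate empty lines
        values.append(float(fields[column - 1]))
    return values
```

Because the function accepts any iterable of lines, passing `sys.stdin` (for example `read_column(sys.stdin, 2)`) lets the same code sit inside a shell pipeline.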

For those readers familiar with the programs published in Ref. [2], we should mention that these form the basis for a number of those TISEAN programs written in FORTRAN. The C programs, even if they do similar things, are fairly independent implementations. All C and C++ programs now use dynamic allocation of storage, for example.

II. PHASE SPACE REPRESENTATION

Deterministic dynamical systems describe the time evolution of a system in some phase space $\Gamma \subset \mathbb{R}^d$. They can be expressed, for example, by ordinary differential equations

$$\dot{x}(t) = F(x(t)) , \tag{1}$$

or in discrete time $t = n\Delta t$ by maps of the form

$$x_{n+1} = f(x_n) . \tag{2}$$

A time series can then be thought of as a sequence of observations $\{s_n = s(x_n)\}$ performed with some measurement function $s(\cdot)$. Since the (usually scalar) sequence $\{s_n\}$ in itself does not properly represent the (multidimensional) phase space of the dynamical system, one has to employ some technique to unfold the multidimensional structure using the available data.
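For a concrete instance of Eq. (2) together with a measurement function, the sketch below iterates the Hénon map and records one coordinate. It uses the form $x_{n+1} = 1 - a x_n^2 + b y_n$, $y_{n+1} = x_n$, with the commonly used parameters a = 1.4 and b = 0.3, and it records $s_n = x_n$. The function name, the initial condition and the transient length are illustrative assumptions.

```python
def henon_series(n, a=1.4, b=0.3, transient=1000):
    """Return n scalar measurements s_k = x_k of the Henon map.

    The map f acts on the state (x, y) as x' = 1 - a*x*x + b*y, y' = x.
    The first `transient` iterates are discarded so that the orbit has
    settled onto the attractor.
    """
    x, y = 0.0, 0.0                         # initial condition (illustrative)
    series = []
    for k in range(transient + n):
        x, y = 1.0 - a * x * x + b * y, x   # one application of x_{n+1} = f(x_n)
        if k >= transient:
            series.append(x)                # measurement function s(x, y) = x
    return series
```

In this form of the map $y_n = x_{n-1}$, so the pair $(s_{n-1}, s_n)$ already determines the state. This is one way to see why two dimensions suffice for the Hénon data in Fig. 2.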

A. Delay coordinates

The most important phase space reconstruction technique is the method of delays. Vectors in a new space, the embedding space, are formed from time delayed values of the scalar measurements:

[Fig. 1 plots: upper panel, x(t) versus x(t − 10 ms); lower panel, x(t) versus x(t − 40 ms)]

FIG. 1. Time delay representation of a human magneto-cardiogram. In the upper panel, a short delay time of 10 ms is used to resolve the fast waveform corresponding to the contraction of the ventricle. In the lower panel, the slower recovery phase of the ventricle (small loop) is better resolved due to the use of a slightly longer delay of 40 ms. Such a plot can conveniently be produced by a graphics tool such as gnuplot without generating extra data files.

$$\vec{s}_n = (s_{n-(m-1)\tau}, s_{n-(m-2)\tau}, \ldots, s_n) . \tag{3}$$

The number m of elements is called the embedding dimension, and the time τ is generally referred to as the delay or lag. Celebrated embedding theorems by Takens [21] and by Sauer et al. [22] state that if the sequence $\{s_n\}$ does indeed consist of scalar measurements of the state of a dynamical system, then, under certain genericity assumptions, the time delay embedding provides a one-to-one image of the original set $\{x\}$, provided m is large enough.

Time delay embeddings are used in almost all methods described in this paper. The implementation is straightforward and does not require further explanation. If N scalar measurements are available, the number of embedding vectors is only N − (m−1)τ. This has to be kept in mind for the correct normalization of averaged quantities. There is a large literature on the “optimal” choice of the embedding parameters m and τ. It turns out, however, that what constitutes the optimal choice largely depends on the application. We will therefore discuss the choice of embedding parameters occasionally together with other algorithms below.
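Although the implementation needs no further explanation, a short sketch makes the index bookkeeping of Eq. (3) and the count N − (m−1)τ explicit. The function name is hypothetical.

```python
def delay_embed(s, m, tau):
    """Delay vectors (s_{n-(m-1)tau}, ..., s_{n-tau}, s_n) as in Eq. (3).

    Returns N - (m-1)*tau tuples; the first vector ends at index (m-1)*tau.
    """
    if m < 1 or tau < 1:
        raise ValueError("m and tau must be positive")
    start = (m - 1) * tau
    if start >= len(s):
        return []
    return [tuple(s[n - (m - 1 - k) * tau] for k in range(m))
            for n in range(start, len(s))]
```

For instance, `delay_embed(list(range(10)), 3, 2)` yields 10 − 2·2 = 6 vectors, the first being (0, 2, 4).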

A stand-alone version of the delay procedure (delay, embed) is an important tool for the visual inspection of data, even though visualization is restricted to two dimensions, or at most two-dimensional projections of three-dimensional renderings. A good unfolding already in two dimensions may give some guidance about a good choice of the delay time for higher-dimensional embeddings. As an example, let us show two different two-dimensional delay coordinate representations of a human magneto-cardiogram (Fig. 1). Note that we neither assume nor claim that the magneto- (or electro-) cardiogram is deterministic or even chaotic. Although in the particular case of cardiac recordings the use of time delay embeddings can be motivated theoretically [23], here we only want to use the embedding technique as a visualization tool.

B. Embedding parameters

A reasonable choice of the delay gains importance through the fact that we always have to deal with a finite amount of noisy data. Both noise and finiteness will prevent us from having access to infinitesimal length scales, so the structure we want to exploit should persist up to the largest possible length scales. Depending on the type of structure we want to explore, we have to choose a suitable time delay. Most obviously, a delay of one sampling interval for highly sampled flow data will yield delay vectors which are all concentrated around the diagonal in the embedding space, so that all structure perpendicular to the diagonal is almost invisible. In [24] the terms redundancy and irrelevance were used to characterize the problem. Small delays yield strongly correlated vector elements. Large delays lead to vectors whose components are (almost) uncorrelated, so that the data are (seemingly) randomly distributed in the embedding space. Quite a number of papers have been published on the proper choice of the delay and embedding dimension. We have argued repeatedly [11,2,3] that an “optimal” embedding can, if at all, only be defined relative to a specific purpose for which the embedding is used. Nevertheless, some quantitative tools are available to guide the choice.

The usual autocorrelation function (autocor, corr) and the time delayed mutual information (mutual), as well as visual inspection of delay representations with various lags, provide important information about reasonable delay times. The false neighbors statistic (false_nearest) can give guidance about the proper embedding dimension. Again, “optimal” parameters cannot be established this way except in the context of a specific application.
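Below is a minimal sketch of the autocorrelation function. It assumes the common normalization by the variance and averages over the N − τ available pairs. Other normalizations exist, and this is not necessarily the one used by autocor or corr. The function name is hypothetical.

```python
def autocorrelation(s, max_lag):
    """Normalized autocorrelation c(tau) for tau = 0..max_lag.

    c(tau) = <(s_n - mean)(s_{n+tau} - mean)> / variance, where the average
    runs over the N - tau available pairs.
    """
    n = len(s)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(s)")
    mean = sum(s) / n
    d = [x - mean for x in s]
    var = sum(x * x for x in d) / n
    acf = []
    for tau in range(max_lag + 1):
        pairs = n - tau
        cov = sum(d[i] * d[i + tau] for i in range(pairs)) / pairs
        acf.append(cov / var)
    return acf
```

For a square wave with a period of 20 samples, this gives c(10) = −1 and c(20) = 1. The example shows that c(τ) mainly reflects linear, here periodic, correlations, which is why it is complemented by the mutual information.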

[Fig. 2 plot: fraction of false neighbors (0 to 1) versus embedding dimension m (1 to 5)]

FIG. 2. The fraction of false nearest neighbors as a function of the embedding dimension for noise-free Lorenz (crosses) and Hénon (filled circles) time series, as well as a Hénon time series (open circles) corrupted by 10% of noise.

1. Mutual information

The time delayed mutual information was suggested by Fraser and Swinney [25] as a tool to determine a reasonable delay. Unlike the autocorrelation function, the mutual information also takes nonlinear correlations into account. One has to compute

$$S = \sum_{ij} p_{ij}(\tau) \ln \frac{p_{ij}(\tau)}{p_i \, p_j} , \tag{4}$$

where, for some partition of the real numbers, $p_i$ is the probability to find a time series value in the i-th interval, and $p_{ij}(\tau)$ is the joint probability that an observation falls into the i-th interval and the observation a time τ later falls into the j-th. In theory this expression has no systematic dependence on the size of the partition elements, and it can be computed quite easily. There are good arguments that if the time delayed mutual information exhibits a marked minimum at a certain value of τ, then this is a good candidate for a reasonable time delay. However, these arguments have to be modified when the embedding dimension exceeds two. Moreover, as will become transparent in the following sections, not all applications work optimally with the same delay. Our routine mutual uses Eq. (4), where the number of boxes of identical size and the maximal delay time have to be supplied. The adaptive algorithm used in [25] is more data intensive. Since we are not really interested in absolute values of the mutual information here but rather in the first minimum, the minimal implementation given here seems to be sufficient. The related generalized mutual information of order two can be defined using the correlation sum concept (Sec. VII, [26,27]). Estimation of the correlation entropy is explained in Sec. VII A.
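The fixed-partition estimate can be sketched as follows, in the spirit of the minimal implementation described above. The range of the data is divided into `n_bins` boxes of identical size, and box occupation probabilities are counted for each delay. The sketch computes the non-negative form $\sum_{ij} p_{ij} \ln(p_{ij}/(p_i p_j))$. Taking the marginals $p_i$ and $p_j$ from the row and column sums of the joint counts is an assumption of this example, and the names are hypothetical.

```python
from math import log


def mutual_information(s, max_lag, n_bins=16):
    """Time-delayed mutual information for tau = 0..max_lag with equal-size boxes."""
    lo, hi = min(s), max(s)
    width = (hi - lo) / n_bins or 1.0        # guard against a constant series
    # Box index of every sample; the maximum value goes into the last box.
    box = [min(int((x - lo) / width), n_bins - 1) for x in s]
    result = []
    for tau in range(max_lag + 1):
        pairs = len(s) - tau
        joint = {}
        row = [0] * n_bins                   # counts of s_n in box i
        col = [0] * n_bins                   # counts of s_{n+tau} in box j
        for n in range(pairs):
            i, j = box[n], box[n + tau]
            joint[(i, j)] = joint.get((i, j), 0) + 1
            row[i] += 1
            col[j] += 1
        mi = 0.0
        for (i, j), count in joint.items():
            p_ij = count / pairs
            mi += p_ij * log(p_ij / ((row[i] / pairs) * (col[j] / pairs)))
        result.append(mi)
    return result
```

To select a delay, one would then look for the first pronounced minimum of the returned list as a function of τ.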

2. False nearest neighbors

A method to determine the minimal sufficient embedding dimension m was proposed by Kennel et al. [28]. It is called the false nearest neighbor method. The idea is quite intuitive. Suppose the minimal embedding dimension for a given time series $\{s_i\}$ is $m_0$. This means that in an $m_0$-dimensional delay space the reconstructed attractor is a one-to-one image of the attractor in the original phase space. In particular, the topological properties are preserved. Thus the neighbors of a given point are mapped onto neighbors in the delay space. Due to the assumed smoothness of the dynamics, neighborhoods of the points are mapped onto neighborhoods again. Of course the shape and the diameter of the neighborhoods are changed according to the Lyapunov exponents. But suppose now that you embed in an m-dimensional space with $m < m_0$. Due to this projection the topological structure is no longer preserved. Points are projected into neighborhoods of other points to which they would not belong in higher dimensions. These points are called false neighbors. If the dynamics is now applied, these false neighbors are not typically mapped into the image of the neighborhood, but somewhere else, so that the average “diameter” becomes quite large.

The idea of the algorithm false_nearest is the following. For each point $\vec{s}_i$ in the time series, look for its nearest neighbor $\vec{s}_j$ in an m-dimensional space. Calculate the distance $\|\vec{s}_i - \vec{s}_j\|$. Iterate both points and compute

$$R_i = \frac{|s_{i+1} - s_{j+1}|}{\|\vec{s}_i - \vec{s}_j\|} . \tag{5}$$

If $R_i$ exceeds a given heuristic threshold $R_t$, this point is marked as having a false nearest neighbor [28]. The criterion for the embedding dimension to be high enough is that the fraction of points for which $R_i > R_t$ is zero, or at least sufficiently small. Three examples are shown in Fig. 2: the Lorenz system (crosses), the Hénon system (filled circles), and a Hénon time series corrupted by 10% of Gaussian white noise (open circles). One clearly sees that, as expected, m = 2 is sufficient for the Hénon and m = 3 for the Lorenz system, whereas the signature is less clear in the noisy case.

The introduction of the false nearest neighbors concept and other ad hoc instruments was partly a reaction to the finding that many results obtained for the genuine invariants, like the correlation dimension, had been spurious due to caveats of the estimation procedure. In the latter case, serial correlations and small sample fluctuations can easily be mistaken for nonlinear determinism. It turns out, however, that the ad hoc quantities basically suffer from the same problems, which can be cured by the same precautions. The implementation false_nearest therefore allows one to specify a minimal temporal separation of valid neighbors.
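Below is a brute-force sketch of the false nearest neighbor statistic of Eq. (5). It assumes the maximum norm and an illustrative default threshold `rt` = 10. A parameter `theiler` excludes neighbors closer in time than the given number of steps, in the spirit of the minimal temporal separation mentioned above. The O(N²) neighbor search is only suitable for short series, and none of the names correspond to TISEAN options.

```python
def false_neighbor_fraction(s, m, tau=1, rt=10.0, theiler=0):
    """Fraction of points whose nearest neighbor is false according to Eq. (5).

    Delay vectors follow Eq. (3); distances use the maximum norm. Candidate
    neighbors j with |i - j| <= theiler (including i itself) are skipped.
    """
    start = (m - 1) * tau
    idx = range(start, len(s) - 1)           # s[i + 1] must exist
    vec = {i: [s[i - (m - 1 - k) * tau] for k in range(m)] for i in idx}
    n_false = n_valid = 0
    for i in idx:
        best_j, best_d = None, float("inf")
        for j in idx:
            if abs(i - j) <= theiler:        # temporal exclusion window
                continue
            d = max(abs(a - b) for a, b in zip(vec[i], vec[j]))
            if 0.0 < d < best_d:             # ignore exact duplicates
                best_j, best_d = j, d
        if best_j is None:
            continue
        n_valid += 1
        # Ratio R_i of Eq. (5): growth of the distance after one step.
        if abs(s[i + 1] - s[best_j + 1]) / best_d > rt:
            n_false += 1
    return n_false / n_valid if n_valid else 0.0
```

For noise-free Hénon data, the fraction at m = 1 is substantial, while at m = 2 it is expected to drop essentially to zero, in line with Fig. 2.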

[Fig. 3 plot: p1(t) versus p2(t)]

FIG. 3. Phase space representation of a human magneto-cardiogram using the two largest principal components. An initial embedding was chosen in m = 20 dimensions with a delay of τ = 7 ms. The two components cover 70% of the variance of the initial embedding vectors.

Other software for the analysis of false nearest neighbors is available in source form from Kennel [29] or, if you prefer to pay for a license, from Ref. [30].

C. Principal components

It has been shown in Ref. [22] that the embedding technique can be generalized to a wide class of smooth transformations applied to a time delay embedding. In particular, if we introduce time delay coordinates $\{\vec{s}_n\}$, then almost every linear transformation of sufficient rank again leads to an embedding. A specific choice of linear transformation is known as principal component analysis, singular value decomposition, empirical orthogonal functions, Karhunen-Loève decomposition, and probably a few other names. The technique is fairly widely used, for example to reduce multivariate data to a few major modes. There is a large literature, including textbooks like that by Jolliffe [31]. In the context of nonlinear signal processing, the technique has been advocated, among others, by Broomhead and King [32].

The idea is to introduce a new set of orthonormal basis vectors in embedding space such that projections onto a given number of these directions preserve the maximal fraction of the variance of the original vectors. In other words, the error in making the projection is minimized for a given number of directions. Solving this minimization problem [31] leads to an eigenvalue problem. The desired principal directions can be obtained as the eigenvectors of the symmetric autocovariance matrix that correspond to the largest eigenvalues. The alternative and formally equivalent approach via the trajectory matrix is used in Ref. [32]. The latter is numerically more stable but involves the singular value decomposition of an N × m matrix for N data points embedded in m dimensions, which can easily exceed computational resources for time series of even moderate length [33].

In almost all the algorithms described below, simple time delay embeddings can be substituted by principal components. In the TISEAN project (routines svd, pc), principal components are only provided as a stand-alone visualization tool and for linear filtering [34]; see Sec. II E below. In any case, one first has to choose an initial time delay embedding and then a number of principal components to be kept. For the purpose of visualization, the latter is immediately restricted to two or at most three. In order to take advantage of the noise averaging effect of the principal component scheme, it is advisable to choose a much shorter delay than one would for an ordinary time delay embedding, while at the same time increasing the embedding dimension. Experimentation is recommended. Figure 3 shows the contributions of the first two principal components to the magneto-cardiogram shown in Fig. 1.
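The eigenvalue route to principal components can be sketched without external libraries in three steps. First, build the covariance matrix of the embedding vectors. Second, diagonalize it with cyclic Jacobi rotations, a standard method for small symmetric matrices chosen here for self-containedness rather than taken from TISEAN. Third, project the centered vectors onto the leading eigenvectors. The returned fraction is the share of the total variance captured, the kind of figure quoted as 70% in the Fig. 3 caption. All names are hypothetical.

```python
from math import sqrt


def covariance_matrix(vectors):
    """Covariance matrix (normalized by N) and mean of a list of m-dimensional vectors."""
    n, m = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(m)]
    cov = [[0.0] * m for _ in range(m)]
    for v in vectors:
        d = [v[k] - mean[k] for k in range(m)]
        for a in range(m):
            for b in range(m):
                cov[a][b] += d[a] * d[b] / n
    return cov, mean


def jacobi_eigen(matrix, max_sweeps=50, tol=1e-20):
    """Eigenvalues and unit eigenvectors of a symmetric matrix (cyclic Jacobi).

    Returns (values, vectors) sorted by decreasing eigenvalue; vectors[k]
    belongs to values[k]. `tol` bounds the sum of squared off-diagonal entries.
    """
    m = len(matrix)
    a = [row[:] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(m) for j in range(m) if i != j)
        if off < tol:
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if a[p][q] == 0.0:
                    continue
                # Rotation chosen so that the (p, q) element vanishes.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + sqrt(theta * theta + 1.0))
                c = 1.0 / sqrt(t * t + 1.0)
                s = t * c
                for k in range(m):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(m):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(m):  # accumulate eigenvectors V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(m), key=lambda k: a[k][k], reverse=True)
    values = [a[k][k] for k in order]
    vectors = [[v[i][k] for i in range(m)] for k in order]
    return values, vectors


def principal_components(vectors, k):
    """Project centered vectors onto the k leading principal directions.

    Returns (components, eigenvalues, fraction of total variance captured).
    """
    cov, mean = covariance_matrix(vectors)
    values, directions = jacobi_eigen(cov)
    total = sum(values)
    frac = sum(values[:k]) / total if total > 0 else 0.0
    comps = [[sum((v[i] - mean[i]) * e[i] for i in range(len(mean))) for e in directions[:k]]
             for v in vectors]
    return comps, values, frac
```

Following the advice above, the input would typically be delay vectors built with a short delay and a fairly large embedding dimension, keeping k = 2 or 3 components for visualization.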

D. Poincaré sections

Highly sampled data representing the continuous time evolution of a differential equation are called flow data. They are characterized by the fact that errors in the direction tangent to the trajectory neither shrink nor grow exponentially (a so-called marginally stable direction). Flow data thus possess one Lyapunov exponent which is zero, since any perturbation in this direction can be compensated by a simple shift of time. Since in many data analysis tasks this direction is of little interest, one might wish to eliminate it. The theoretical concept for doing so is called the Poincaré section. After choosing an (m − 1)-dimensional hyperplane in the m-dimensional (embedding) space, one creates a compressed time series of only the intersections of the time-continuous trajectory with this hyperplane in a predefined orientation. These data are then vector-valued, discrete-time, map-like data. One can consider the projection of these (m − 1)-dimensional vectors onto the real numbers as another measurement function (e.g. by recording the value of $s_n$ when $s_n$ passes the Poincaré surface), so that one can create a new scalar time series if desired. The program poincare constructs a sequence of vectors from a scalar flow-like data set, if one specifies the hyperplane, the orientation, and the embedding parameters. The intersections of the discretely sampled trajectory with the Poincaré plane are computed by a third-order interpolation.
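Below is a simplified sketch of a Poincaré section. In a two-dimensional delay space $(s_{n-\tau}, s_n)$, it records the upward (or downward) crossings of the line $s_n$ = level and locates them by linear interpolation. This is a deliberate simplification: the text states that poincare uses third-order interpolation, and a general hyperplane would require a projection onto its normal vector. The function and parameter names are hypothetical.

```python
def poincare_section(s, level, tau, upward=True):
    """Crossings of the delay trajectory (s_{n-tau}, s_n) through the line s_n = level.

    Returns (crossing_time, value) pairs. crossing_time is the fractional
    sample index of the crossing; value is s_{n-tau} linearly interpolated
    to that instant, i.e. a new scalar "measurement" on the section.
    """
    out = []
    for n in range(tau, len(s) - 1):
        a, b = s[n], s[n + 1]
        crossed = (a < level <= b) if upward else (a > level >= b)
        if not crossed:
            continue
        frac = (level - a) / (b - a)        # position between samples n and n+1
        value = s[n - tau] + frac * (s[n + 1 - tau] - s[n - tau])
        out.append((n + frac, value))
    return out
```

The differences between successive crossing times are the return times between intersections, which can be studied as a time series of their own.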

The placement of the Poincaré surface is highly relevant for the usefulness of the result. An optimal surface maximizes the number of intersections, i.e. minimizes the time intervals between them, provided the attractor remains connected. One avoids the trial and error related to this if one defines the surface by the zero crossing of the temporal derivative of the signal, which is synonymous with collecting all maxima or all minima, respectively. This is done by extrema. However, this method suffers more from noise, since for small time derivatives (i.e. close to the extrema) additional extrema can be produced by perturbations. Another aspect of the choice of the surface of section is that one should try to maximize the variance of the data inside the section, since their absolute noise level is independent of the section. One last remark: time intervals between intersections are phase space observables as well [36], and the embedding theorems thus apply to them. For time series with pronounced spikes, one often likes to study the sequence of interspike time intervals, e.g. the RR intervals in cardiology. If these time intervals are constructed in a way that yields the time intervals of a Poincaré map, they are suited to reflect the deterministic structure (if any). For complications see [36].

[Figure 4: delay plots of successive maxima, s_n versus s_{n−1} (top), and of successive time intervals, t_n versus t_{n−1} (bottom)]

FIG. 4. Poincaré surface of section using extrema: a two-dimensional delay plot of the sequence of maxima (top) and of the time intervals between successive maxima (bottom). Without employing the option -t time, where time is the number of time steps after the last extremum during which no further extrema are searched for (here: 3), one finds some fake extrema due to noise showing up close to the diagonal of the delay representation. Data: time series of the output power of a CO2 laser [35].
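The effect of the dead time described in the caption of Fig. 4 can be illustrated with a short sketch. The function below is hypothetical (it is not the extrema program): it finds local maxima without any interpolation between samples and rejects candidates that follow an accepted maximum within `dead_time` steps, in the spirit of the -t option.

```python
def maxima_with_deadtime(s, dead_time=0):
    """Local maxima of s, skipping candidates within dead_time steps of the last one.

    Returns (times, values, intervals): sample indices of the accepted maxima,
    their values, and the time intervals between successive maxima.
    """
    times, values = [], []
    for n in range(1, len(s) - 1):
        if not (s[n - 1] < s[n] >= s[n + 1]):
            continue
        if times and n - times[-1] <= dead_time:
            continue  # too close to the previous maximum: most likely noise
        times.append(n)
        values.append(s[n])
    intervals = [b - a for a, b in zip(times, times[1:])]
    return times, values, intervals
```

The returned `values` and `intervals` are the two sequences whose delay plots are shown in Fig. 4.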

For a periodically driven non-autonomous system, the best surface of section is usually given by a fixed phase of the driving term, which is also called a stroboscopic view. Here again the selection of the phase should be guided by the variance of the signal inside the section.
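Selecting the stroboscopic phase by variance can be sketched as follows, assuming the data are sampled synchronously with an integer number `period` of samples per driving cycle (a hypothetical helper, not a TISEAN routine):

```python
from statistics import pvariance


def best_stroboscopic_phase(s, period):
    """Return (phase, series) for the stroboscopic sampling s[phase::period]
    with the largest variance; period = samples per driving cycle."""
    phase = max(range(period), key=lambda ph: pvariance(s[ph::period]))
    return phase, s[phase::period]
```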

E. SVD filters

There are at least two reasons to apply an SVD filter to time series data: either, if one is working with flow data, one can implicitly determine the optimal time delay, or, when deriving a stroboscopic map from synchronously sampled data of a periodically driven system, one might use the redundancy to optimize the signal-to-noise ratio.

In both applications the mathematics is the same: one constructs the covariance matrix of all data vectors (e.g. in an m-dimensional time delay embedding space),

$$ C_{ij} = \langle s_{n-m+i}\, s_{n-m+j} \rangle - \langle s_{n-m+i} \rangle \langle s_{n-m+j} \rangle , \qquad (6) $$

and computes its singular vectors. Then one projects onto the m-dimensional vectors corresponding to the q largest singular values. To work with flow data, q should be at least the correct embedding dimension, and m considerably larger (e.g. m = 2q or larger). The result is a vector-valued time series, and in [22] the relation of these components to temporal derivatives on the one hand and to Fourier components on the other hand was discussed. If, in the non-autonomous case, one wants to compress flow data to map data, q = 1. In this case, the redundancy of the flow is implicitly used for noise reduction of the map data. The routine svd can be used for both purposes.
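A minimal pure-Python sketch of this projection, assuming unit delay and the mean-centred covariance matrix of Eq. (6). Because C is symmetric and positive semi-definite, its singular vectors coincide with its eigenvectors, so simple power iteration with deflation is used here in place of a library SVD; this is a choice of the sketch, not necessarily what svd does.

```python
import math


def svd_filter(s, m, q, iters=200):
    """Project the m-dimensional unit-delay vectors of s onto the q leading
    eigenvectors of their covariance matrix C, Eq. (6).

    Returns (basis, components): the q orthonormal directions and, for every
    delay vector, its q components along them.
    """
    vecs = [s[n - m + 1:n + 1] for n in range(m - 1, len(s))]
    N = len(vecs)
    mean = [sum(v[i] for v in vecs) / N for i in range(m)]
    cent = [[v[i] - mean[i] for i in range(m)] for v in vecs]
    C = [[sum(c[i] * c[j] for c in cent) / N for j in range(m)] for i in range(m)]
    basis = []
    for _ in range(q):
        v = [1.0 + 0.1 * i for i in range(m)]  # arbitrary start vector
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(m)) for i in range(m)]
            for b in basis:  # deflation: stay orthogonal to directions already found
                d = sum(wi * bi for wi, bi in zip(w, b))
                w = [wi - d * bi for wi, bi in zip(w, b)]
            norm = math.sqrt(sum(wi * wi for wi in w))  # zero only if rank(C) < q
            v = [wi / norm for wi in w]
        basis.append(v)
    comps = [[sum(ci * bi for ci, bi in zip(c, b)) for b in basis] for c in cent]
    return basis, comps
```

For a pure sine wave the delay vectors lie in a plane, so q = 2 components retain all of the variance.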

III. VISUALIZATION, NON-STATIONARITY

A. Recurrence plots

Recurrence plots are a useful tool to identify structure in a data set qualitatively and in a time-resolved way. This can be intermittency (which one also detects by direct inspection), the temporary vicinity of a chaotic trajectory to an unstable periodic orbit, or non-stationarity. They were introduced in [37] and investigated in much detail in [38], where one finds many hints on how to interpret the results. Our routine recurr simply scans the time series and marks with a black dot each pair of time indices (i, j) whose corresponding pair of delay vectors has distance ≤ ε. Thus in the (i, j)-plane, black dots indicate closeness. In an ergodic situation, the dots should cover the plane uniformly on average, whereas non-stationarity expresses itself by an overall tendency of the dots to be close to the diagonal. Of course, a return to a dynamical situation the system was in before becomes evident by a black region far away from the diagonal. In Fig. 5, a recurrence plot is used to detect transient behavior at the beginning of a longer recording.
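The scan performed by recurr can be sketched as a double loop over delay vectors. The maximum norm and the function name are assumptions of this illustration:

```python
def recurrence_pairs(s, m, tau, eps):
    """Time-index pairs (i, j), i < j, whose delay vectors are within distance eps.

    Distances use the maximum norm (a choice of this sketch).
    """
    start = (m - 1) * tau
    vecs = [[s[n - k * tau] for k in range(m)] for n in range(start, len(s))]
    pairs = []
    for a in range(len(vecs)):
        for b in range(a + 1, len(vecs)):
            if max(abs(x - y) for x, y in zip(vecs[a], vecs[b])) <= eps:
                pairs.append((a + start, b + start))  # report time indices
    return pairs
```

For a strictly periodic signal the dots fall on lines parallel to the diagonal, at multiples of the period.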


[Figure 5: recurrence plot in the (i, j)-plane of time indices]

FIG. 5. Recurrence plot for Poincaré section data from a vibrating string experiment [39]. Above the diagonal an embedding in two dimensions was used, while below the diagonal scalar time series values were compared. In both cases the lighter shaded region at the beginning of the recording indicates that these data are dynamically distinct from the rest. In this particular case this was due to adjustments in the measurement apparatus.

For the purpose of stationarity testing, the recurrence plot is not particularly sensitive to the choice of embedding. The contrast of the resulting images can be selected by the distance ε and the percentage of dots that should actually be plotted. Various software for the color rendering and quantification of recurrence plots is offered in DOS executable form by Webber [40]. The interpretation of the often intriguing patterns beyond the detection and study of non-stationarity is still an open question. For suggestions on the study of non-stationary signals see [3] and references given there.

B. Space-time separation plot

While the recurrence plot shows absolute times, the space-time separation plot introduced by Provenzale et al. [41] integrates along parallels to the diagonal and thus only shows relative times. One usually draws lines of constant probability per time unit of a point to be an ε-neighbor of the current point, when its time distance is δt. This helps to identify temporal correlations inside the time series and is relevant for estimating a reasonable delay time and, more importantly, the Theiler window w in dimension and Lyapunov analysis (see Sec. VII). In other words, it shows how large the temporal distance between points should be so that we can assume that they form independent samples according to the invariant measure. The corresponding routine of the TISEAN package is stp, see Fig. 6.

[Figure 6: distance versus relative time δt]

FIG. 6. Space-time separation plot of the CO2 laser data. Shown are lines of constant probability density of a point to be an ε-neighbor of the current point if its temporal distance is δt. Probability densities are 1/10 to 1 with increments of 1/10 from bottom to top. Clear correlations are visible.
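A sketch of the computation behind such a plot: for each time separation δt, the distances of all pairs of delay vectors δt apart are sorted, and the distances below which given fractions of these pairs fall trace the lines of constant probability. The maximum norm and the output layout are assumptions; the actual output of stp may differ.

```python
def space_time_separation(s, m, tau, dts, fractions=(0.1, 0.5, 0.9)):
    """For each time separation dt, the distances below which the given
    fractions of all delay-vector pairs (i, i + dt) lie (maximum norm)."""
    start = (m - 1) * tau
    vecs = [[s[n - k * tau] for k in range(m)] for n in range(start, len(s))]
    result = {}
    for dt in dts:
        d = sorted(max(abs(x - y) for x, y in zip(vecs[i], vecs[i + dt]))
                   for i in range(len(vecs) - dt))
        # empirical quantile: the distance at rank fraction * number_of_pairs
        result[dt] = [d[min(len(d) - 1, int(f * len(d)))] for f in fractions]
    return result
```

For a signal of period T, all curves dip to zero at δt = T, which is the kind of temporal correlation the plot is meant to reveal.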

IV. NONLINEAR PREDICTION

Thinking about predictability in time series data is worthwhile even if one is not interested in forecasts at all. Predictability is one way in which correlations between data express themselves. These can be linear correlations, nonlinear correlations, or even deterministic constraints. Questions related to those relevant for predictions will reappear with noise reduction and in surrogate data tests, but also for the computation of Lyapunov exponents from data. Prediction is discussed in most of the general nonlinear time series references; in particular, a nice collection of articles can be found in [17].

A. Model validation

Before entering the methods, we have to discuss how to assess the results. The most obvious quantity for the quantification of predictability is the average forecast error, i.e. the root of the mean squared (rms) deviation of the individual prediction from the actual future value. If it is computed on those values which were also used to construct the model (or to perform the predictions), it is called the in-sample error. It is always advisable to save some data for an out-of-sample test. If the out-of-sample error is considerably larger than the in-sample error, the data are either non-stationary or one has overfitted the data, i.e. the fit extracted structure from random fluctuations. A model with fewer parameters will then serve better. In cases where the database is poor, one can apply complete cross-validation or take-one-out statistics, i.e. one constructs as many models as one performs forecasts, and in each case ignores the point one wants to predict. By construction, this method is realized in the local approaches, but not in the global ones.

[Figure 7: rms forecast error versus forecast time k, curves for m = 4, τ = 6; m = 4, τ = 1; m = 3, τ = 6; m = 2, τ = 6]

FIG. 7. Predictions k time steps ahead (no iterated predictions) using the program zeroth. Top curve: embedding dimension two is insufficient, since these flow data fill a (2+ε)-dimensional attractor. Second from top: although embedding dimension four should in theory be a good embedding, τ = 1 suppresses structure perpendicular to the diagonal, so that the predictions are as bad as in m = 2! Lower curves: m = 3 and 4 with a delay of about 4-8 time units serve well.
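The comparison of in-sample and out-of-sample errors can be sketched with the simplest global model, a linear fit s_{n+1} = a s_n + b (chosen only for brevity; any model could take its place). The model is fitted on the first part of the series, and its rms one-step error is evaluated on both parts:

```python
import math


def fit_ar1(x):
    """Least-squares fit of the global linear model x[n+1] = a*x[n] + b."""
    u, v = x[:-1], x[1:]
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    a = (sum((p - mu) * (q - mv) for p, q in zip(u, v))
         / sum((p - mu) ** 2 for p in u))
    return a, mv - a * mu


def rms_error(x, a, b):
    """Root mean squared one-step forecast error of the model on x."""
    errs = [(x[n + 1] - (a * x[n] + b)) ** 2 for n in range(len(x) - 1)]
    return math.sqrt(sum(errs) / len(errs))


def in_and_out_of_sample(s, train_fraction=0.5):
    """Fit on the first part of s; return (in-sample, out-of-sample) rms errors."""
    cut = int(len(s) * train_fraction)
    a, b = fit_ar1(s[:cut])
    return rms_error(s[:cut], a, b), rms_error(s[cut:], a, b)
```

For a stationary AR(1) series both errors come out similar, whereas a series whose dynamics changes halfway through yields a much larger out-of-sample error.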

The most significant, but least quantitative, way of model validation is to iterate the model and to compare this synthetic time series to the experimental data. If they are compatible (e.g. in a delay plot), then the model is likely to be reasonable. Quantitatively, it is not easy to define compatibility. One starts from an observed delay vector as initial condition, performs the first forecast, combines the forecast with all but the last component of the initial vector into a new delay vector, performs the next forecast, and so on. The resulting time series should then be compared to the measured data, most easily via the attractor in a delay representation.
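Iterating a one-step model from an observed delay vector can be written generically. In this sketch the model is any Python function mapping a delay vector (stored oldest value first, an assumed convention) to the next value; an exactly known map such as the Hénon map can stand in for a fitted model to check the bookkeeping.

```python
def free_run(model, initial, steps):
    """Iterate a one-step model on delay vectors.

    initial -- the last m observations (oldest first), used as initial condition
    model   -- function mapping such a delay vector to the next scalar value
    Returns the synthetic time series of `steps` forecasts.
    """
    window = list(initial)
    out = []
    for _ in range(steps):
        nxt = model(window)
        out.append(nxt)
        window = window[1:] + [nxt]  # drop the oldest component, append the forecast
    return out
```

The synthetic series would then be compared with the measured one, for example by overlaying their delay plots.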

B. Simple nonlinear prediction

Conventional linear prediction schemes average over all locations in phase space when they extract the correlations they exploit for predictability. Tong [42] promoted an extension that fits different linear models depending on whether the current state is below or above a given threshold (TAR, Threshold Autoregressive Model). If we expect more than a slight nonlinear component to be present, it is preferable to make the approximation as local in phase space as possible. There have been many similar suggestions in the literature on how to exploit local structure, see e.g. [43–46]. The simplest approach is to make the approximation local but keep only the zeroth order, that is, to approximate the dynamics locally by a constant. In the TISEAN package we include such a robust and simple method: in a delay embedding space, all neighbors of the delay vector s_n are sought if we want to predict the measurements at time n + k. The forecast is then simply

$$ \hat{s}_{n+k} = \frac{1}{|\mathcal{U}_n|} \sum_{\mathbf{s}_j \in \mathcal{U}_n} s_{j+k} , \qquad (7) $$

i.e. the average over the “futures” of the neighbors. The average forecast errors obtained with the routine zeroth (predict would give similar results) for the laser output data used in Fig. 4, as a function of the number k of steps ahead the predictions are made, are shown in Fig. 7. One can also iterate the predictions by using the time series as a database.

[Figure 8: delay plot s_n versus s_{n−1}; orbits marked as fixed point, period 2, and period 6]

FIG. 8. Orbits of period six, or a sub-period thereof, of the Hénon map, determined from noisy data. The Hénon attractor does not have a period three orbit.

Apart from the embedding parameters, all that has to be specified for zeroth-order predictions is the size of the neighborhoods. Since the diffusive motion below the noise level cannot be predicted anyway, it makes sense to select neighborhoods which are at least as large as the noise level, maybe two or three times larger. For fairly clean time series, this guideline may result in neighborhoods with very few points. Therefore zeroth also permits specifying the minimal number of neighbors on which to base the predictions.
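A minimal sketch of the zeroth-order predictor, Eq. (7), with a minimal number of neighbors. How zeroth actually enlarges the neighborhood is not stated above; multiplying ε by a constant factor until enough neighbors are found is an assumption of this sketch, as is the maximum norm:

```python
def zeroth_predict(s, m, tau, k, eps, min_neighbors=1, growth=1.2):
    """Locally constant forecast, Eq. (7), of the value k steps after the end of s.

    Neighbours of the last delay vector are searched in the maximum norm; if
    fewer than min_neighbors are found, eps is multiplied by `growth` and the
    search is repeated (an assumption of this sketch).
    """
    if eps <= 0 or min_neighbors < 1:
        raise ValueError("eps must be positive and min_neighbors at least 1")
    start = (m - 1) * tau

    def vec(n):
        return [s[n - i * tau] for i in range(m)]

    query = vec(len(s) - 1)
    candidates = range(start, len(s) - k)  # neighbours need a known future
    if len(candidates) < min_neighbors:
        raise ValueError("time series too short for the requested neighbours")
    while True:
        futures = [s[j + k] for j in candidates
                   if max(abs(a - b) for a, b in zip(vec(j), query)) < eps]
        if len(futures) >= min_neighbors:
            return sum(futures) / len(futures)
        eps *= growth
```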

A relevant modification of this method is to extend the neighborhood U to infinity, but to introduce a distance-dependent weight,

$$ \hat{s}_{n+k} = \frac{\sum_{j \neq n} s_{j+k}\, w(|\mathbf{s}_n - \mathbf{s}_j|)}{\sum_{j \neq n} w(|\mathbf{s}_n - \mathbf{s}_j|)} , \qquad (8) $$

where w is called the kernel. For w(z) = Θ(ε − z), where Θ is the Heaviside step function, we return to Eq. (7).
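The kernel version, Eq. (8), can be sketched with the Gaussian kernel w(z) = exp(−z²/2h²) that is used for upo in the next subsection. The Euclidean distance and the function name are assumptions of this illustration:

```python
import math


def kernel_predict(s, m, tau, k, n, h):
    """Kernel-weighted forecast of s[n+k], Eq. (8), with a Gaussian kernel of
    bandwidth h; the sum runs over all j != n whose future s[j+k] is known."""
    start = (m - 1) * tau

    def vec(i):
        return [s[i - l * tau] for l in range(m)]

    query = vec(n)
    num = den = 0.0
    for j in range(start, len(s) - k):
        if j == n:
            continue
        z2 = sum((a - b) ** 2 for a, b in zip(vec(j), query))
        w = math.exp(-z2 / (2 * h * h))  # underflows to 0.0 for distant points
        num += w * s[j + k]
        den += w
    return num / den  # fails if every weight underflowed: increase h
```

A small bandwidth h approaches the locally constant predictor, while a very large h weights all points equally and tends to the global mean of the futures.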


C. Finding unstable periodic orbits

As an application of simple nonlinear phase space prediction, let us discuss a method to locate unstable periodic orbits embedded in a chaotic attractor. This is not the place to review the existing methods for solving this problem; some references include [47–50]. The TISEAN package contains a routine that implements the requirement that for a period-p orbit {s_n, n = 1, ..., p} of a dynamical system like Eq. (2) acting on delay vectors

$$ \mathbf{s}_{n+1} = f(\mathbf{s}_n), \quad n = 1, \dots, p, \quad \mathbf{s}_{p+1} \equiv \mathbf{s}_1 . \qquad (9) $$

With unit delay, the p delay vectors contain p different scalar entries, and Eq. (9) defines a root of a system of p nonlinear equations in p dimensions. Multidimensional root finding is not a simple problem. The standard Newton method has to be augmented by special tricks in order to converge globally. Some such tricks, in particular means to select different solutions of Eq. (9), are implemented in [50]. Similarly to the problems encountered in nonlinear noise reduction, solving Eq. (9) exactly is particularly problematic since f(·) is unknown and must be estimated from the data. In Ref. [49], approximate solutions are found by performing just one iteration of the Newton method for each available time series point. We prefer to look for a least squares solution by minimizing

$$ \sum_{n=1}^{p} \left\| \mathbf{s}_{n+1} - f(\mathbf{s}_n) \right\|^2 , \qquad \mathbf{s}_{p+1} \equiv \mathbf{s}_1 \qquad (10) $$

instead. The routine upo uses a standard Levenberg-Marquardt algorithm to minimize (10). For this it is necessary that f(·) is smooth. Therefore we cannot use the simple nonlinear predictor based on locally constant approximations, and we have to use a smooth kernel version, Eq. (8), instead. With w(z) = exp(−z²/2h²), the kernel bandwidth h determines the degree of smoothness of f(·). Trying to start the minimization with all available time series segments will produce a number of false minima, depending on the value of h. These have to be distinguished from the true solutions by inspection. On the other hand, we can reach solutions of Eq. (9) which are not closely visited in the time series at all, an important advantage over close return methods [47].
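The cost function (10) itself is easy to write down. This sketch assumes unit delay and m = 2, so that a period-p orbit is a cyclic sequence of p scalars and f maps (x_{n−1}, x_n) to x_{n+1}; the minimization (Levenberg-Marquardt in upo) is not shown. In practice f would be the kernel estimate of Eq. (8); as a check, one can plug in the exact Hénon map, whose fixed point and period-2 orbit are known in closed form.

```python
def orbit_cost(x, f):
    """Eq. (10) for a candidate period-p orbit with unit delay and m = 2.

    x -- the p scalar values x_1..x_p of the orbit, read cyclically
    f -- predictor mapping (x_{n-1}, x_n) to an estimate of x_{n+1}
    With delay vectors (x_{n-1}, x_n), only the newest component of
    s_{n+1} - f(s_n) is nonzero, so each term is a scalar residual squared.
    """
    p = len(x)
    return sum((x[(n + 1) % p] - f(x[(n - 1) % p], x[n])) ** 2 for n in range(p))
```

A minimizer would vary the p entries of x to drive this cost towards zero; true orbits give (numerically) zero cost, arbitrary sequences do not.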

It should be noted that, depending on h, we may always find good minima of (10), even if no solution of Eq. (9), or not even a truly deterministic dynamics, exists. Thus finding unstable periodic orbits is in itself not a strong indicator of determinism. We may, however, use the cycle locations or stabilities as a discriminating statistic in a test for nonlinearity, see Sec. VIII. While the orbits themselves are found quite easily, it is surprisingly difficult to obtain reliable estimates of their stability in the presence of noise. In upo, a small perturbation is iterated along the orbit, and the unstable eigenvalue is determined by the rate of its separation from the periodic orbit.

The user of upo has to specify the embedding dimension, the period (the orbits found may also have a smaller period) and the kernel bandwidth. For efficiency, one may choose to skip trials with very similar points. Orbits are counted as distinct only when they differ by a specified amount. The routine finds the orbits, their expanding eigenvalue, and possible sub-periods. Figure 8 shows the determination of all period-six orbits from 1000 iterates of the Hénon map, contaminated by 10% Gaussian white noise.

D. Locally linear prediction

If there is good reason to assume that the relation s_{n+1} = f(s_n) is fulfilled by the experimental data to a good approximation (say, within 5%) for some unknown f, and that f is smooth, predictions can be improved by fitting local linear models. They can be considered as the local Taylor expansion of the unknown f, and are easily determined by minimizing

$$ \sigma^2 = \sum_{\mathbf{s}_j \in \mathcal{U}_n} \left( s_{j+1} - \mathbf{a}_n \cdot \mathbf{s}_j - b_n \right)^2 \qquad (11) $$

with respect to a_n and b_n, where U_n is the ε-neighborhood of s_n, excluding s_n itself, as before. Then the prediction is ŝ_{n+1} = a_n · s_n + b_n. The minimization problem can be solved through a set of coupled linear equations, a standard linear algebra problem. This scheme is implemented in onestep. For moderate noise levels and time series lengths this can give a reasonable improvement over zeroth and predict. Moreover, as discussed in Sec. VI, these linear maps are needed for the computation of the Lyapunov spectrum. Locally linear approximation was introduced in [45,46]. We should note that the straightforward least squares solution of Eq. (11) is not always optimal, and a number of strategies are available to regularize the problem if the matrix becomes nearly singular and to remove the bias due to the errors in the “independent” variables. These strategies have in common that any possible improvement is bought with considerable complication of the procedure, requiring subtle parameter adjustments. We refer the reader to Refs. [51,52] for advanced material.
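A pure-Python sketch of the fit in Eq. (11), assuming unit delay, the maximum norm for the ε-neighborhood, and normal equations solved by Gaussian elimination. None of the regularization discussed above is included, so at least m + 1 well-spread neighbors are needed:

```python
def solve(A, y):
    """Solve the small dense system A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [list(row) + [yi] for row, yi in zip(A, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def local_linear_predict(s, m, eps, n):
    """Forecast s[n+1] from the affine fit of Eq. (11) on the eps-neighbourhood of
    the delay vector ending at index n (unit delay, maximum norm, n excluded)."""
    def vec(i):
        return list(s[i - m + 1:i + 1])

    query = vec(n)
    rows, targets = [], []
    for j in range(m - 1, len(s) - 1):
        if j != n and max(abs(a - b) for a, b in zip(vec(j), query)) < eps:
            rows.append(vec(j) + [1.0])  # trailing 1 multiplies the offset b_n
            targets.append(s[j + 1])
    size = m + 1
    # normal equations (X^T X) c = X^T y for the coefficients c = (a_n, b_n)
    XtX = [[sum(r[i] * r[k] for r in rows) for k in range(size)] for i in range(size)]
    Xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(size)]
    coef = solve(XtX, Xty)
    return sum(c * v for c, v in zip(coef, query + [1.0]))
```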

In Fig. 9 we show iterated predictions of the Poincaré map data from the CO2 laser (Fig. 4) in a delay representation (using nstep in two dimensions). The resulting data not only have the correct marginal distribution and power spectrum, but also form a perfect skeleton of the original noisy attractor. There are of course artefacts due to noise and the roughness of this approach, but there are good reasons to assume that the line-like substructure reflects the fractality of the unperturbed system.

Casdagli [53] suggested using local linear models as a test for nonlinearity: he computed the average forecast error as a function of the neighborhood size on which the fit for a_n and b_n is performed. If the optimum occurs at large neighborhood sizes, the data are (in this embedding space) best described by a linear stochastic process, whereas an optimum at rather small sizes supports the idea of the existence of a nonlinear, almost deterministic equation of motion. This protocol is implemented in the routine ll-ar, see Fig. 10.

[Figure 9: delay plot s_n versus s_{n−1}]

FIG. 9. Time delay representation of 5000 iterations of the local linear predictor nstep in two dimensions, starting from the last delay vector of Fig. 4.
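The protocol can be sketched for the scalar case m = 1, which keeps the local fit in closed form (a simplification; ll-ar works in general embeddings). The last `n_test` points are predicted from fits on the earlier part of the series, and the rms error is normalized by the standard deviation of the data:

```python
import math


def local_linear_error(x, eps, n_test=200):
    """Normalised rms one-step error of local linear fits (m = 1) as a function
    of the neighbourhood size eps, evaluated on the last n_test points."""
    N = len(x)
    mean = sum(x) / N
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / N)
    train = N - n_test - 1  # fits use only points before the test segment
    errs = []
    for n in range(train, N - 1):
        pairs = [(x[j], x[j + 1]) for j in range(train) if abs(x[j] - x[n]) < eps]
        if len(pairs) < 3:
            continue  # too few neighbours for a stable fit
        mu = sum(p for p, _ in pairs) / len(pairs)
        mv = sum(q for _, q in pairs) / len(pairs)
        var = sum((p - mu) ** 2 for p, _ in pairs)
        slope = sum((p - mu) * (q - mv) for p, q in pairs) / var if var > 0 else 0.0
        errs.append((x[n + 1] - (mv + slope * (x[n] - mu))) ** 2)
    return math.sqrt(sum(errs) / len(errs)) / sd
```

For a noise-free logistic map the error at small ε is far below that of the global linear fit (very large ε), while for an AR(1) process shrinking ε brings no gain.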

E. Global function fits

The local linear fits are very flexible, but can go wrong in parts of the phase space where the points do not span the available space dimensions and where the inverse of the matrix involved in the solution of the minimization does not exist. Moreover, very often a large set of different linear maps is unsatisfying. Therefore many authors have suggested fitting global nonlinear functions to the data, i.e. solving

$$ \sigma^2 = \sum_n \left( s_{n+1} - f_p(\mathbf{s}_n) \right)^2 , \qquad (12) $$

where f_p is now a nonlinear function in closed form with parameters p, with respect to which the minimization is done. Polynomials, radial basis functions, neural nets, orthogonal polynomials, and many other approaches have been used for this purpose. The results depend on how well the chosen ansatz f_p is suited to model the unknown nonlinear function, and on how deterministic the data are in the first place. We included the routines rbf and polynom in the TISEAN package, where f_p is modeled by radial basis functions [54,55] and polynomials [56], respectively. The advantage of these two models is that the parameters p occur linearly in the function f and can thus be determined by simple linear algebra, and the solution is unique. Both features are lost for models where the parameters enter nonlinearly.

In order to make global nonlinear predictions, one has to supply the embedding dimension and time delay as usual. Further, for polynom the order of the polynomial has to be given. The program returns the coefficients of the model. In rbf one has to specify the number of basis functions to be distributed on the data. The width of the radial basis functions (Lorentzians in our program) is another parameter, but since the minimization is so fast, the program runs many trial values and returns the parameters for the best one. Figure 11 shows the result of a fit to the CO2 laser time series (Fig. 4) with radial basis functions.

[Figure 10: normalized error versus neighbourhood size ε, from 0.01 to 1]

FIG. 10. The Casdagli test for nonlinearity: the rms prediction error of local linear models as a function of the neighborhood size ε. Lower curve: the CO2 laser data. These data are obviously highly deterministic in m = 4 dimensions and with lag τ = 6. Central curve: the breath rate data shown in Fig. 12 with m = 4 and τ = 1. Determinism is weaker (presumably due to a much higher noise level), but still the nonlinear structure is dominant. Upper curve: numerically generated data of an AR(5) process, a linearly correlated random process (m = 5, τ = 1).
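Because the parameters enter linearly, a polynomial model is a linear least-squares problem. A sketch for a scalar embedding (m = 1, an assumption made for brevity; polynom handles general embeddings) using the normal equations:

```python
def solve(A, y):
    """Solve the small dense system A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [list(row) + [yi] for row, yi in zip(A, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_polynomial(x, order):
    """Least-squares coefficients c_0..c_order of x[n+1] = sum_k c_k * x[n]**k."""
    rows = [[x[n] ** k for k in range(order + 1)] for n in range(len(x) - 1)]
    targets = x[1:]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(order + 1)]
         for i in range(order + 1)]
    y = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(order + 1)]
    return solve(A, y)
```

Fitting a second-order polynomial to data from the logistic map x_{n+1} = 3.9 x_n (1 − x_n) recovers the coefficients (0, 3.9, −3.9) up to rounding, because the ansatz contains the true dynamics.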

If global models are desired in order to infer the structure and properties of the underlying system, they should be tested by iterating them. The prediction errors, although small in size, could be systematic and thus repel the iterated trajectory from the range where the original data are located. It can be useful to study the dependence of the size or the sign of the prediction errors on the position in the embedding space, since systematic errors can be reduced by a different model. Global models are attractive because they yield closed expressions for the full dynamics. One must not forget, however, that these models describe the observed process only in regions of the space which have been visited by the data. Outside this area, the shape of the model depends exclusively on the chosen ansatz. In particular, polynomials diverge outside the range of the data and hence can be unstable under iteration.


[Figure 11: delay plot s_n versus s_{n−1}]

FIG. 11. Attractor obtained by iterating the model that has been obtained by a fit with 40 radial basis functions in two dimensions to the time series shown in Fig. 4. Compare also Fig. 9.

V. NONLINEAR NOISE REDUCTION

Filtering of signals from nonlinear systems requires the use of special methods, since the usual spectral or other linear filters may interact unfavorably with the nonlinear structure. Irregular signals from nonlinear sources exhibit genuinely broad-band spectra, and there is no justification for identifying any continuous component in the spectrum as noise. Nonlinear noise reduction does not rely on frequency information in order to define the distinction between signal and noise. Instead, structure in the reconstructed phase space is exploited. General serial dependencies among the measurements {s_n} will cause the delay vectors {s_n} to fill the available m-dimensional embedding space in an inhomogeneous way. Linearly correlated Gaussian random variables will, for example, be distributed according to an anisotropic multivariate Gaussian distribution. Linear geometric filtering in phase space seeks to identify the principal directions of this distribution and project onto them, see Sec. II E. Nonlinear noise reduction takes into account that nonlinear signals will form curved structures in delay space. In particular, noisy deterministic signals form smeared-out lower-dimensional manifolds. Nonlinear phase space filtering seeks to identify such structures and project onto them in order to reduce noise.

There is a rich literature on nonlinear noise reduction methods. Two articles of review character are available, one by Kostelich and Schreiber [57], and one by Davies [58]. We refer the reader to these articles for further references and for the discussion of approaches not described in the present article. Here we want to concentrate on two approaches that represent the geometric structure in phase space by local approximation. The first and simplest does so to constant order; the more sophisticated one uses local linear subspaces plus curvature corrections.

A. Simple nonlinear noise reduction

The simplest nonlinear noise reduction algorithm we know of replaces the central coordinate of each embedding vector by the local average of this coordinate. This amounts to a locally constant approximation of the dynamics and is based on the assumption that the dynamics is continuous. The algorithm is described in [59]; a similar approach is proposed in [43]. In an unstable, for example chaotic, system, it is essential not to replace the first and last coordinates of the embedding vectors by local averages. Due to the instability, initial errors in these coordinates are magnified instead of being averaged out.

This noise reduction scheme is implemented quite easily. First an embedding has to be chosen. Except for extremely oversampled data, it is advantageous to choose a short time delay. The program lazy always uses unit delay. The embedding dimension m should be chosen somewhat higher than that required by the embedding theorems. Then for each embedding vector s_n, a neighborhood U_ε^(n) is formed in phase space containing all points s_{n'} such that ‖s_n − s_{n'}‖ < ε. The radius ε of the neighborhoods should be taken large enough to cover the noise extent, but still smaller than a typical curvature radius. These conditions cannot always be fulfilled simultaneously, in which case one has to repeat the process with several choices and carefully evaluate the results. If the noise level is substantially smaller than the typical radius of curvature, neighborhoods of radius about 2-3 times the noise level gave the best results with artificial data. For each embedding vector s_n = (s_{n−(m−1)}, ..., s_n) (the delay time has been set to unity), a corrected middle coordinate ŝ_{n−m/2} is computed by averaging over the neighborhood U_ε^(n):

$$ \hat{s}_{n-m/2} = \frac{1}{|\mathcal{U}^{(n)}_{\varepsilon}|} \sum_{\mathbf{s}_{n'} \in \mathcal{U}^{(n)}_{\varepsilon}} s_{n'-m/2} . \qquad (13) $$

After one complete sweep through the time series, all measurements s_n are replaced by the corrected values ŝ_n. Of course, for the first and last (m − 1)/2 points (if m is odd), no correction is available. The average correction can be taken as a new neighborhood radius for the next iteration. Note that the neighborhood of each point contains at least the point itself. If that is the only member, the average Eq. (13) is simply the uncorrected measurement and no change is made. Thus one can safely perform multiple iterations with decreasing values of ε until no further change is made.
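A compact sketch of one sweep and of the iteration just described, assuming odd m, unit delay as in lazy, and the maximum norm (the norm is an assumption). All averages are computed from the uncorrected series, and the corrections are applied only after the complete sweep:

```python
def lazy_sweep(s, m, eps):
    """One sweep of simple nonlinear noise reduction, Eq. (13).

    Returns (corrected series, mean absolute correction). The first and last
    m // 2 values stay unchanged because no centred vector covers them.
    """
    half = m // 2
    vecs = [s[i:i + m] for i in range(len(s) - m + 1)]
    out = list(s)
    for i, v in enumerate(vecs):
        # the neighbourhood always contains v itself (distance 0 < eps)
        mids = [w[half] for w in vecs if max(abs(a - b) for a, b in zip(v, w)) < eps]
        out[i + half] = sum(mids) / len(mids)  # middle coordinate of vecs[i]
    corr = sum(abs(a - b) for a, b in zip(out, s)) / len(s)
    return out, corr


def lazy(s, m, eps, iterations=3):
    """Repeated sweeps, taking the mean correction as the next radius."""
    for _ in range(iterations):
        s, corr = lazy_sweep(s, m, eps)
        if corr == 0.0:
            break  # every neighbourhood contained only the point itself
        eps = corr
    return s
```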

Let us illustrate the use of this scheme with an example, a recording of the air flow through the nose of a human as an indicator of breath activity. (The data are part of data set B of the Santa Fe time series contest held in 1991/92 [17]; see Rigney et al. [60] for a description.) The result of simple nonlinear noise reduction is shown in Fig. 12.


[Figure 12: two delay plots of x(t) versus x(t − 0.5 s), raw data (upper panel) and after noise reduction (lower panel)]

FIG. 12. Simple nonlinear noise reduction of human breath rate data. Three iterations have been carried out, starting with neighborhoods of size 0.4 units. Embeddings in 7 dimensions at unit delay have been used. Arguably, the resulting series (lower panel) is less noisy. However, in Sec. VIII we will show evidence that the noise is not just additive and independent of the signal.

B. Locally projective nonlinear noise reduction

A more sophisticated method makes use of the hypothesis that the measured data are composed of the output of a low-dimensional dynamical system and of random or high-dimensional noise. This means that in an arbitrarily high-dimensional embedding space the deterministic part of the data would lie on a low-dimensional manifold, while the effect of the noise is to spread the data off this manifold. If we suppose that the amplitude of the noise is sufficiently small, we can expect to find the data distributed closely around this manifold. The idea of the projective nonlinear noise reduction scheme is to identify the manifold and to project the data onto it. The strategies described here go back to Ref. [61]. A realistic case study is detailed in Ref. [62].

Suppose the dynamical system, Eq. (1) or Eq. (2), forms a q-dimensional manifold M containing the trajectory. According to the embedding theorems, there exists a one-to-one image of the attractor in the embedding space if the embedding dimension is sufficiently high. Thus, if the measured time series were not corrupted with noise, all the embedding vectors s_n would lie inside another manifold M in the embedding space. Due to the noise this condition is no longer fulfilled. The idea of the locally projective noise reduction scheme is that for each s_n there exists a correction Θ_n, with ‖Θ_n‖ small, such that s_n − Θ_n ∈ M and Θ_n is orthogonal to M. Of course, a projection onto the manifold can only be a reasonable concept if the vectors are embedded in spaces of higher dimension than the manifold M. Thus we have to over-embed in m-dimensional spaces with m > q.

The notion of orthogonality depends on the metric used. Intuitively one would think of using the Euclidean metric, but this is not necessarily the best choice. The reason is that we are working with delay vectors, which contain temporal information. Thus even if the middle parts of two delay vectors are close, the late parts could be far away from each other due to the influence of the positive Lyapunov exponents, while the first parts could diverge due to the negative ones. Hence it is usually desirable to correct only the center part of delay vectors and leave the outer parts mostly unchanged, since their divergence is not only a consequence of the noise, but also of the dynamics itself. It turns out that for most applications it is sufficient to fix just the first and the last component of the delay vectors and correct the rest. This can be expressed in terms of a metric tensor P, which we define to be [61]

$$ P_{ij} = \begin{cases} 1 & : \; i = j \text{ and } 1 < i, j < m \\ 0 & : \; \text{elsewhere} \end{cases} , \qquad (14) $$

where m is the dimension of the “over-embedded” delay vectors.

Thus we have to solve the minimization problem

$$ \sum_i \left( \Theta_i P^{-1} \Theta_i \right) \stackrel{!}{=} \min \qquad (15) $$

with the constraints

$$ \mathbf{a}^i_n \cdot (\mathbf{s}_n - \Theta_n) + b^i_n = 0 \quad \text{for } i = q+1, \dots, m \qquad (16) $$

and

$$ \mathbf{a}^i_n P \mathbf{a}^j_n = \delta_{ij} , \qquad (17) $$

where the a^i_n are the normal vectors of M at the point s_n − Θ_n.

These ideas are realized in the programs ghkss, project, and noise in TISEAN. While the first two work as a posteriori filters on complete data sets, the last one can be used in a data stream. This means that it is possible to do the corrections online, while the data are coming in (for more details see Sec. V C). All three algorithms mentioned above correct for curvature effects. This is done either by post-processing the corrections for the delay vectors (ghkss) or by preprocessing the centres of mass of the local neighborhoods (project).

The idea used in the ghkss program is the following. Suppose the manifold were strictly linear. Then, provided the noise is white, the corrections in the vicinity of a point on the manifold would point in all directions with the same probability. Thus, if we added all the corrections Θ, we would expect them to sum to zero (or ⟨Θ⟩ = 0). On the other hand, if the manifold is curved, we expect that there is a trend towards the centre of curvature (⟨Θ⟩ = Θ_av). Thus, to correct for this trend, each correction Θ is replaced by Θ − Θ_av.

A different strategy is used in the program project. The projections are done in a local coordinate system which is defined by the condition that the average of the vectors in the neighborhood is zero. In other words, the origin of the coordinate system is the centre of mass ⟨s_n⟩_U of the neighborhood U. This centre of mass has a bias towards the centre of the curvature [2]. Hence, a projection would not lie on the tangent to the manifold, but on a secant. Now we can compute the centre of mass of these centres of mass for all points in the neighborhood of s_n. Let us call it ⟨⟨s_n⟩⟩_U. Under fairly mild assumptions this point is twice as far from the manifold as ⟨s_n⟩_U. To correct for the bias, the origin of the local coordinate system is set to the point 2⟨s_n⟩_U − ⟨⟨s_n⟩⟩_U.
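The factor of two can be checked on a toy example in which a noise-free unit circle stands in for the curved manifold (an assumption of this illustration, not part of project). Neighborhoods are arcs of half-width α; the first centre of mass lies inside the circle, the second one about twice as far inside, and the combination 2⟨s_n⟩_U − ⟨⟨s_n⟩⟩_U lands much closer to the circle:

```python
import math


def arc(theta, alpha, k=201):
    """k evenly spaced points of the unit circle with angles theta - alpha .. theta + alpha."""
    return [(math.cos(theta + alpha * (2 * i / (k - 1) - 1)),
             math.sin(theta + alpha * (2 * i / (k - 1) - 1))) for i in range(k)]


def centre(points):
    """Centre of mass of a list of 2-D points."""
    return (sum(x for x, _ in points) / len(points),
            sum(y for _, y in points) / len(points))


def bias_corrected_origin(alpha):
    """Neighbourhood U of the point (1, 0): an arc of half-width alpha."""
    U = arc(0.0, alpha)
    first = centre(U)  # <s_n>_U, pulled towards the centre of curvature
    # <<s_n>>_U: centre of mass of the neighbourhood centres of all points in U
    second = centre([centre(arc(math.atan2(y, x), alpha)) for x, y in U])
    corrected = (2 * first[0] - second[0], 2 * first[1] - second[1])
    return first, second, corrected


def distance_from_circle(p):
    return abs(math.hypot(p[0], p[1]) - 1.0)
```

In this symmetric toy case the residual distance of the corrected origin is the square of that of the first centre of mass, so for α = 0.3 an offset of about 0.015 shrinks to roughly 2·10⁻⁴.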

The implementation and use of locally projective noise reduction as realized in project and ghkss is described in detail in Refs. [61,62]. Let us recall here the most important parameters that have to be set individually for each time series. The embedding parameters are usually chosen quite differently from other applications, since considerable over-embedding may lead to better noise averaging. Thus, the delay time is preferably set to unity and the embedding dimension is chosen to provide embedding windows of reasonable lengths. Only for highly oversampled data (like the magneto-cardiogram, Fig. 15, at about 1000 samples per cycle) are larger delays necessary, so that a substantial fraction of a cycle can be covered without the need to work in prohibitively high-dimensional spaces. Next, one has to decide how many dimensions q to leave for the manifold supposedly containing the attractor. The answer partly depends on the purpose of the experiment. Rather brisk projections can be optimal in the sense of lowest residual deviation from the true signal. Low rms error can, however, coexist with systematic distortions of the attractor structure. Thus for a subsequent dimension calculation, a more conservative choice would be in order. Remember, however, that points are only moved towards the local linear subspace, and too low a value of q does not do as much harm as may be thought.

FIG. 13. Two-dimensional delay representation ($x(t)$ versus $x(t - 11\,\mathrm{ms})$) of the NMR laser data (top) and the result of the ghkss algorithm (bottom) after three iterations.

The noise amplitude to be removed can be selected to some degree by the choice of the neighborhood size. In fact, nonlinear projective filtering can be seen, independently of the dynamical systems background, as filtering by amplitude rather than by frequency or shape. To allow for a clear separation of noise and signal directions locally, neighborhoods should be at least as large as the supposed noise level, preferably larger. This of course competes with curvature effects. For small initial noise levels, it is recommended to also specify a minimal number of neighbors in order to permit stable linearizations. Finally, we should remark that in successful cases most of the filtering is done within the first one to three iterations. Going further is potentially dangerous, since further corrections may lead mainly to distortion. One should watch the rms correction in each iteration and stop as soon as it does not decrease substantially anymore.
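The stopping rule can be made explicit. The sketch below takes the rms correction recorded after each iteration and returns how many iterations to keep, stopping at the first iteration whose correction did not shrink by a substantial factor. The cut-off ratio of 0.7 is our hypothetical choice for "substantially"; the text does not prescribe a number, and the example sequences are made up.

```python
def iterations_to_keep(rms_corrections, ratio=0.7):
    """Count leading iterations whose rms correction kept shrinking substantially.

    rms_corrections[i] is the rms correction applied in iteration i + 1.
    Scanning forward, the first iteration whose correction exceeds
    ratio * (previous correction) marks the point to stop; the iterations
    before it are kept. The first iteration is always kept.
    """
    for i in range(1, len(rms_corrections)):
        if rms_corrections[i] > ratio * rms_corrections[i - 1]:
            return i
    return len(rms_corrections)

# made-up sequences: rapid decay (genuine manifold) versus slow, steady shrinking
print(iterations_to_keep([1.0, 0.3, 0.1, 0.09]))
print(iterations_to_keep([1.0, 0.9, 0.81, 0.73]))
```

A slow, steady decrease that triggers the rule already after the first iteration is itself a warning sign that no low-dimensional manifold is being approached.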

As an example of nonlinear noise reduction we treat the data obtained from an NMR laser experiment [63]. Enlargements of two-dimensional delay representations of the data are shown in Fig. 13. The upper panel shows the raw experimental data, which contain about 1.1% of noise. The lower panel was produced by applying three iterations of the noise reduction scheme. The embedding dimension was $m = 7$, and the vectors were projected down to two dimensions. The size of the local neighborhoods was chosen such that at least 50 neighbors were found. One clearly sees that the fractal structure of the attractor is resolved fairly well.


FIG. 14. Two-dimensional delay representation ($x_n$ versus $x_{n-1}$, both axes from -2 to 2) of a pure Gaussian process (top) and the outcome of the ghkss algorithm (bottom) after 10 iterations. Projections from $m = 7$ down to two dimensions were performed.

The main assumption for this algorithm to work is that the data are well approximated by a low-dimensional manifold. If this is not the case, it is unpredictable what results the algorithm will produce. In the absence of a real manifold, the algorithm must pick up statistical fluctuations and spuriously interpret them as structure. Figure 14 shows a result of the ghkss program for pure Gaussian noise. The upper panel shows a delay representation of the original data, the lower one shows the outcome of applying the algorithm for 10 iterations. The structure created is purely artificial and has nothing to do with structures in the original data. This means that anyone who applies one of these algorithms has to study the results carefully. If the assumptions underlying the algorithms are not fulfilled, in principle anything can happen. One should note, however, that the performance of the program itself indicates such spurious behavior. For data which are indeed well approximated by a lower dimensional manifold, the average corrections applied should decrease rapidly with each successive iteration. This was the case with the NMR laser data, and in fact the correction was so small after three iterations that we stopped the procedure. For the white noise data, the correction only decreased at a rate that corresponds to a general shrinking of the point set, indicating a lack of convergence towards a genuine low-dimensional manifold. Below, we will give an example where an approximating manifold is present without pure determinism. In that case, projecting onto the manifold does reduce noise in a reasonable way. See Ref. [64] for material on the dangers of geometric filtering.

FIG. 15. Real time nonlinear projective filtering of a magneto-cardiogram time series ($x(t)$ in A/D units versus $t$ in s). The top panel shows the unfiltered data. Bottom: two iterations were done using projections from $m = 10$ down to $q = 2$ dimensions (delay 0.01 s). Neighborhoods were limited to a radius of 0.1 units (0.05 in the second iteration) and to a maximum of 200 points. Neighbors were only sought up to 5 s back in time. Thus the first 5 s of data are not filtered optimally and are not shown here. Since the output of each iteration lags behind its input by one delay window, the last 0.2 s cannot be processed given the data in the upper panel.

C. Nonlinear noise reduction in a data stream

In Ref. [65], a number of modifications of the above procedure have been discussed which enable the use of nonlinear projective filtering in a data stream. In this case, only points in the past are available for the formation of neighborhoods. Therefore the neighbor search strategy has to be modified. Since the algorithm is described in detail in Ref. [65], we only give an example of its use here. Figure 15 shows the result of nonlinear noise reduction on a magneto-cardiogram (see Figs. 1 and 3) with the program noise. The same program has also been used successfully for the extraction of the fetal ECG [66].
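In a data stream, the neighbor search for a delay vector may only look back in time. A minimal sketch of such a causal search, with a look-back window and a cap on the number of neighbors in the spirit of the Fig. 15 caption (5 s back, at most 200 points), is given below. The function name, the maximum norm and the choice to keep the closest neighbors when the cap is hit are our assumptions, not details of the program noise.

```python
def causal_neighbours(vectors, n, eps, max_back, max_count):
    """Indices k of past delay vectors with max-norm distance < eps from vectors[n].

    Only n - max_back <= k < n is searched, so no vector later than vectors[n]
    is ever used. If more than max_count vectors qualify, the closest ones are
    kept (ties broken by the earlier index).
    """
    ref = vectors[n]
    found = []
    for k in range(max(0, n - max_back), n):
        d = max(abs(a - b) for a, b in zip(vectors[k], ref))
        if d < eps:
            found.append((d, k))
    found.sort()
    return sorted(k for _, k in found[:max_count])

# toy periodic stream (period 10) and its two-dimensional delay vectors
stream = [float(i % 10) for i in range(100)]
vectors = [(stream[i - 1], stream[i]) for i in range(1, len(stream))]
print(causal_neighbours(vectors, 94, eps=0.5, max_back=50, max_count=10))
```

In a real-time setting, max_back would be the look-back time divided by the sampling interval.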


VI. LYAPUNOV EXPONENTS

Chaos arises from the exponential growth of infinitesimal perturbations, together with global folding mechanisms to guarantee boundedness of the solutions. This exponential instability is characterized by the spectrum of Lyapunov exponents [67]. If one assumes a local decomposition of the phase space into directions with different stretching or contraction rates, then the spectrum of exponents is the proper average of these local rates over the whole invariant set, and thus consists of as many exponents as there are space directions. The most prominent problem in time series analysis is that the physical phase space is unknown, and that instead the spectrum is computed in some embedding space. Thus the number of exponents depends on the reconstruction, and might be larger than in the physical phase space. Such additional exponents are called spurious, and there are several suggestions to either avoid them [68] or to identify them. Moreover, it is plausible that only as many exponents can be determined from a time series as enter the Kaplan-Yorke formula (see below). To give a simple example: consider motion of a high-dimensional system on a stable limit cycle. The data cannot contain any information about the stability of this orbit against perturbations, as long as they are exactly on the limit cycle. For transients, the situation can be different, but then the data are not distributed according to an invariant measure and the numerical values are thus difficult to interpret. Apart from these difficulties, there is one relevant positive feature: Lyapunov exponents are invariant under smooth transformations and are thus independent of the measurement function or the embedding procedure. They carry the dimension of an inverse time and have to be normalized to the sampling interval.

A. The maximal exponent

The maximal Lyapunov exponent can be determined without the explicit construction of a model for the time series. A reliable characterization requires that the independence of the embedding parameters and the exponential law for the growth of distances be checked explicitly [69,70]. Consider the representation of the time series data as a trajectory in the embedding space, and assume that you observe a very close return $s_{n'}$ to a previously visited point $s_n$. Then one can consider the distance $\Delta_0 = s_n - s_{n'}$ as a small perturbation, which should grow exponentially in time. Its future can be read from the time series: $\Delta_l = s_{n+l} - s_{n'+l}$. If one finds that $|\Delta_l| \approx |\Delta_0| e^{\lambda l}$, then $\lambda$ is (with probability one) the maximal Lyapunov exponent. In practice, there will be fluctuations because of many effects, which are discussed in detail in [69]. Based on this understanding, one can derive a robust, consistent, and unbiased estimator for the maximal Lyapunov exponent. One computes

$$ S(\epsilon, m, t) = \left\langle \ln \left( \frac{1}{|\mathcal{U}_n|} \sum_{s_{n'} \in \mathcal{U}_n} |s_{n+t} - s_{n'+t}| \right) \right\rangle_n , \qquad (18) $$

where $\mathcal{U}_n$ is the set of neighbors $s_{n'}$ within distance $\epsilon$ of the reference point $s_n$, and $\langle \cdot \rangle_n$ denotes the average over reference points. If $S(\epsilon, m, t)$ exhibits a linear increase with identical slope for all $m$ larger than some $m_0$ and for a reasonable range of $\epsilon$, then this slope can be taken as an estimate of the maximal exponent $\lambda_1$.

FIG. 16. Estimating the maximal Lyapunov exponent of the CO2 laser data; $S(\epsilon, m, t)$ is plotted versus $t$. The top panel shows results for the Poincaré map data ($t$ in section crossings), where the average time interval $T_{av}$ is 52.2 samples of the flow, and the straight line indicates $\lambda = 0.38$. For comparison, the iteration of the radial basis function model of Fig. 11 yields $\lambda = 0.35$. Bottom panel: Lyapunov exponents determined directly from the flow data ($t$ in flow samples). The straight line has slope $\lambda = 0.007$. In good approximation, $\lambda_{map} = \lambda_{flow} T_{av}$. Here, the time window $w$ to suppress correlated neighbors has been set to 1000, and the delay time was 6 units.

The formula is implemented in the routines lyap_k and lyapunov in a straightforward way. (The program lyap_r implements the very similar algorithm of Ref. [70], where only the closest neighbor is followed for each reference point. Also, the Euclidean norm is used.) Apart from parameters characterizing the embedding, the initial neighborhood size $\epsilon$ is of relevance: the smaller $\epsilon$, the larger the linear range of $S$, if there is one. Obviously, noise and the finite number of data points limit $\epsilon$ from below. The default values of lyap_k are rather reasonable for map-like data. It is not always necessary to extend the average in Eq. (18) over the whole available data; reasonable averages can be obtained already with a few hundred reference points $s_n$. If some of the reference points have very few neighbors, the corresponding inner sum in Eq. (18) is dominated by fluctuations. Therefore one may choose to exclude those reference points which have fewer than, say, ten neighbors. However, discretion has to be applied with this parameter, since it may introduce a bias against sparsely populated regions. This could in theory affect the estimated exponents due to multifractality. Like other quantities, Lyapunov estimates may be affected by serial correlations between reference points and neighbors. Therefore, a minimum time for $|n - n'|$ can and should be specified here as well. See also Sec. VII.
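A direct, unoptimized transcription of Eq. (18) into Python may help to see which parameters enter: the embedding ($m$, $\tau$), the initial neighborhood size $\epsilon$, the exclusion $|n - n'| > w$, the number of reference points, and the minimal number of neighbors. It is a sketch for illustration, not the lyap_k code; the maximum norm, the evenly spaced choice of reference points and all names are our assumptions. As test data we use the logistic map $x \mapsto 4x(1-x)$, whose maximal exponent is $\ln 2 \approx 0.69$ per iteration.

```python
import math

def stretching_curve(x, m, tau, eps, t_max, w, n_ref=300, min_neighbours=10):
    """S(eps, m, t) of Eq. (18) for t = 0, ..., t_max.

    Delay vectors have components x[n], x[n - tau], ..., x[n - (m - 1) tau];
    distances use the maximum norm. A neighbour n' of reference point n must
    satisfy |s_n - s_n'| < eps and |n - n'| > w. Reference points with fewer
    than min_neighbours neighbours are skipped.
    """
    def vec(n):
        return [x[n - k * tau] for k in range(m)]

    def dist(a, b):
        return max(abs(p - q) for p, q in zip(a, b))

    # s_{n+t} must exist for every t <= t_max
    candidates = range((m - 1) * tau, len(x) - t_max)
    step = max(1, len(candidates) // n_ref)
    totals = [0.0] * (t_max + 1)
    used = 0
    for n in candidates[::step]:
        s_n = vec(n)
        hood = [k for k in candidates if abs(k - n) > w and dist(vec(k), s_n) < eps]
        if len(hood) < min_neighbours:
            continue
        used += 1
        for t in range(t_max + 1):
            future = vec(n + t)
            mean_d = sum(dist(vec(k + t), future) for k in hood) / len(hood)
            totals[t] += math.log(mean_d)
    if used == 0:
        raise ValueError("no reference point has enough neighbours; increase eps")
    return [s / used for s in totals]

# logistic map x -> 4x(1 - x)
x = [0.3]
for _ in range(2199):
    x.append(4.0 * x[-1] * (1.0 - x[-1]))
x = x[200:]                                  # discard the transient
S = stretching_curve(x, m=1, tau=1, eps=0.01, t_max=6, w=5)
slope = (S[4] - S[1]) / 3                    # slope over the apparent linear range
print([round(s, 2) for s in S], round(slope, 2))
```

In practice one would plot $S$ against $t$ for several $m$ and $\epsilon$ and read off the slope only where the curves are linear and parallel; the two-point slope above is merely a quick check.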

Let us discuss a few typical outcomes. The data underlying the top panel of Fig. 16 are the values of the maxima of the CO2 laser data. Since this laser exhibits low dimensional chaos with a reasonable noise level, we observe a clear linear increase in this semi-logarithmic plot, reflecting the exponential divergence of nearby trajectories. The exponent is $\lambda \approx 0.38$ per iteration (map data!), or, when introducing the average time interval, 0.007 per µs. In the bottom panel we show the result for the same system, but now computed on the original flow-like data with a sampling rate of 1 MHz. As additional structure, an initial steep increase and regular oscillations are visible. The initial increase is due to non-normality and effects of alignment of distances towards the locally most unstable direction, and the oscillations are an effect of the locally different velocities and thus different densities. Both effects can be much more dramatic in less favorable cases, but as long as the regular oscillations possess a linearly increasing average, this can be taken as the estimate of the Lyapunov exponent. Normalizing by the sampling rate, we again find $\lambda \approx 0.007$ per µs, but it is obvious that the linearity is less pronounced than for the map-like data. Finally, we show in Fig. 17 an example of a negative result: we study the human breath rate data used before. No linear part exists, and one cannot draw any reasonable conclusion. It is worth considering the figure on a doubly logarithmic scale in order to detect power law behavior, which, with power 1/2, could be present for a diffusive growth of distances. In this particular example, there is no convincing power law either.

B. The Lyapunov spectrum

FIG. 17. The breath rate data (cf. Fig. 12) exhibit no linear increase of $S(\epsilon, m, t)$ versus $t$ (in s), reflecting the lack of exponential divergence of nearby trajectories.

The computation of the full Lyapunov spectrum requires considerably more effort than just the maximal exponent. An essential ingredient is some estimate of the local Jacobians, i.e. of the linearized dynamics, which rule the growth of infinitesimal perturbations. One either finds them from direct fits of local linear models of the type $s_{n+1} = a_n s_n + b_n$, such that the first row of the Jacobian is the vector $a_n$ and $(J)_{ij} = \delta_{i-1,j}$ for $i = 2, \dots, m$, where $m$ is the embedding dimension. The vector $a_n$ is given by the least squares minimization

$$ \sigma^2 = \sum_l \left( s_{l+1} - a_n s_l - b_n \right)^2 , $$

where $\{s_l\}$ is the set of neighbors of $s_n$ [45,71]. Or one constructs a global nonlinear model and computes its local Jacobians by taking derivatives. In both cases, one multiplies the Jacobians one by one, following the trajectory, to as many different vectors $u_k$ in tangent space as one wants to compute Lyapunov exponents. Every few steps, one applies a Gram-Schmidt orthonormalization procedure to the set of $u_k$ and accumulates the logarithms of their rescaling factors. Their averages, in the order of the Gram-Schmidt procedure, give the Lyapunov exponents in descending order. The routine lyap_spec uses this method, which goes back to [71] and [45], employing local linear fits. Apart from the problem of spurious exponents, this method contains some other pitfalls: it assumes that well defined Jacobians exist, and does not test for their relevance. In particular, when attractors are thin in the embedding space, some (or all) of the local Jacobians might be estimated very badly. Then the whole product can suffer from these bad estimates and the exponents are correspondingly wrong. Thus the global nonlinear approach can be superior, if the modeling has been successful; see Sec. IV.
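The tangent-space procedure described above can be sketched in a few lines of pure Python. To keep the example self-contained and checkable, we use the Hénon map written in delay coordinates, $x_{n+1} = 1 - a x_n^2 + b x_{n-1}$, whose Jacobian has exactly the structure described above (first row $a_n$, a shifted identity below). Two simplifications are ours: the Jacobians come from the known equations (the global model route) rather than from local linear fits, and we orthonormalize at every step. Since $|\det J| = b$ at every point, the exponents must sum to $\ln b$, which gives a built-in consistency check.

```python
import math

def henon_delay_jacobian(x_now, a=1.4, b=0.3):
    """Jacobian of x_{n+1} = 1 - a x_n^2 + b x_{n-1} in delay coordinates (x_n, x_{n-1}).

    The first row holds the derivatives (the vector a_n of the text); the second
    row shifts the delay vector, (J)_21 = 1.
    """
    return [[-2.0 * a * x_now, b],
            [1.0, 0.0]]

def lyapunov_spectrum(n_steps, transient=100, a=1.4, b=0.3):
    """Both Lyapunov exponents by tangent-space iteration and Gram-Schmidt."""
    x_prev, x_now = 0.0, 0.0
    u = [[1.0, 0.0], [0.0, 1.0]]                       # tangent vectors u_1, u_2
    log_sums = [0.0, 0.0]
    for step in range(transient + n_steps):
        J = henon_delay_jacobian(x_now, a, b)
        # push every tangent vector forward with the local Jacobian
        u = [[J[0][0] * v[0] + J[0][1] * v[1], J[1][0] * v[0] + J[1][1] * v[1]] for v in u]
        # Gram-Schmidt in a fixed order, remembering the rescaling factors
        norms = []
        for i in range(len(u)):
            for j in range(i):
                proj = sum(p * q for p, q in zip(u[i], u[j]))
                u[i] = [p - proj * q for p, q in zip(u[i], u[j])]
            norm = math.sqrt(sum(p * p for p in u[i]))
            u[i] = [p / norm for p in u[i]]
            norms.append(norm)
        if step >= transient:
            for i, norm in enumerate(norms):
                log_sums[i] += math.log(norm)
        # advance the trajectory itself
        x_prev, x_now = x_now, 1.0 - a * x_now * x_now + b * x_prev
    return [s / n_steps for s in log_sums]

l1, l2 = lyapunov_spectrum(20000)
print(f"lambda_1 = {l1:.3f}, lambda_2 = {l2:.3f}, sum = {l1 + l2:.4f}, ln b = {math.log(0.3):.4f}")
```

With fitted local Jacobians instead of exact ones, the sum rule no longer holds automatically, and its violation is one symptom of the badly estimated Jacobians mentioned above.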

In Table I we show the exponents of the stroboscopic NMR laser data in a three dimensional embedding as a function of the neighborhood size. Using global nonlinear models, we find the numbers given in the last two rows. More material is discussed in [2]. The spread of values in the table for this rather clean data set reflects the difficulty of estimating Lyapunov spectra from time series, which has to be done with great care. In particular, when the algorithm is blindly applied to data from a random process, it cannot internally check for the consistency of the assumption of an underlying dynamical system. A Lyapunov spectrum is then still computed, but it is completely meaningless.

| method | λ1 | λ2 | λ3 |
|---|---|---|---|
| local linear, k = 20 | 0.32 | -0.40 | -1.13 |
| local linear, k = 40 | 0.30 | -0.51 | -1.21 |
| local linear, k = 160 | 0.28 | -0.68 | -1.31 |
| radial basis functions | 0.27 | -0.64 | -1.31 |
| polynomial | 0.27 | -0.64 | -1.15 |

TABLE I. Lyapunov exponents of the NMR laser data, determined with a three-dimensional embedding. The algorithms described in Sec. VI A give $\lambda_1 = 0.3 \pm 0.02$ for the largest exponent.

The computation of the first part of the Lyapunov spectrum allows for some interesting cross-checks. It was conjectured [72], and is found to be correct in most physical situations, that the Lyapunov spectrum and the fractal dimension of an attractor are closely related. If the expanding and least contracting directions in space are continuously filled and only one partial dimension is fractal, then one can ask for the dimensionality of a (fractal) volume such that it is invariant, i.e. such that the sum of the corresponding Lyapunov exponents vanishes, where the last one is weighted with the non-integer part of the dimension:

$$ D_{KY} = k + \frac{\sum_{i=1}^{k} \lambda_i}{|\lambda_{k+1}|} , \qquad (19) $$

where $k$ is the maximum integer such that the sum of the $k$ largest exponents is still non-negative. $D_{KY}$ is conjectured to coincide with the information dimension.

The Pesin identity is valid under the same assumptions and allows one to compute the KS entropy:

$$ h_{KS} = \sum_{i=1}^{m} \Theta(\lambda_i)\, \lambda_i . \qquad (20) $$
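Eqs. (19) and (20) are easily evaluated for a given spectrum. The short Python functions below (names are ours) sort the exponents, find $k$, and interpolate; the example uses the radial basis function row of Table I.

```python
def kaplan_yorke(exponents):
    """Kaplan-Yorke dimension, Eq. (19)."""
    lam = sorted(exponents, reverse=True)
    partial = 0.0                     # sum of the k largest exponents
    for k, value in enumerate(lam):
        if partial + value < 0.0:
            # k is the largest integer with a non-negative partial sum
            return k + partial / abs(value)
        partial += value
    return float(len(lam))            # partial sums never turn negative

def pesin_entropy(exponents):
    """KS entropy via the Pesin identity, Eq. (20): sum of the positive exponents."""
    return sum(l for l in exponents if l > 0.0)

rbf = [0.27, -0.64, -1.31]            # radial basis function row of Table I
print(kaplan_yorke(rbf), pesin_entropy(rbf))
```

For that row one obtains $D_{KY} = 1 + 0.27/0.64 \approx 1.42$ and $h_{KS} = 0.27$ per iteration.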

VII. DIMENSIONS AND ENTROPIES

Solutions of dissipative dynamical systems cannot fill a volume of the phase space, since dissipation is synonymous with a contraction of volume elements under the action of the equations of motion. Instead, trajectories are confined to lower dimensional subsets which have measure zero in the phase space. These subsets can be extremely complicated, and frequently they possess a fractal structure, which means that they are in a nontrivial way self-similar. Generalized dimensions are one class of quantities to characterize this fractality. The Hausdorff dimension is, from the mathematical point of view, the most natural concept to characterize fractal sets [67], whereas the information dimension takes into account the relative visitation frequencies and is therefore more attractive for physical systems. Finally, for the characterization of measured data, other similar concepts, like the correlation dimension, are more useful. One general remark is highly relevant in order to understand the limitations of any numerical approach: dimensions characterize a set or an invariant measure whose support is the set, whereas any data set contains only a finite number of points representing the set or the measure. By definition, the dimension of a finite set of points is zero. When we determine the dimension of an attractor numerically, we extrapolate from finite length scales, where the statistics we apply is insensitive to the finiteness of the number of data, to the infinitesimal scales, where the concept of dimensions is defined. This extrapolation can fail for many reasons, which will be partly discussed below. Dimensions are invariant under smooth transformations and thus again computable in time delay embedding spaces.

Entropies are an information theoretical concept to characterize the amount of information needed to predict the next measurement with a certain precision. The most popular one is the Kolmogorov-Sinai entropy. We will discuss here only the correlation entropy, which can be computed in a much more robust way. Entropies appear in a section on dimensions because both can be determined with the same statistical tool.

A. Correlation dimension

Roughly speaking, the idea behind certain quantifiers of dimensions is that the weight $p(\epsilon)$ of a typical $\epsilon$-ball covering part of the invariant set scales with its diameter like $p(\epsilon) \approx \epsilon^D$, where the value of $D$ depends also on the precise way one defines the weight. Using the square of the probability $p_i$ to find a point of the set inside the ball, the dimension is called the correlation dimension $D_2$, which is computed most efficiently by the correlation sum [73]:

$$ C(m, \epsilon) = \frac{1}{N_{pairs}} \sum_{j=m}^{N} \sum_{k < j-w} \Theta(\epsilon - |s_j - s_k|) , \qquad (21) $$

where $s_i$ are $m$-dimensional delay vectors, $N_{pairs} = (N - m + 1)(N - m - w + 1)/2$ is the number of pairs of points covered by the sums, $\Theta$ is the Heaviside step function, and $w$ will be discussed below. On sufficiently small length scales and when the embedding dimension $m$ exceeds the box-dimension of the attractor [74],

$$ C(m, \epsilon) \propto \epsilon^{D_2} . \qquad (22) $$

Since one does not know the box-dimension a priori, one checks for convergence of the estimated values of $D_2$ in $m$.
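For reference, a naive $O(N^2)$ transcription of Eq. (21) is sketched below in Python. Assumptions of ours: the maximum norm, delay vectors built backwards in time, several $\epsilon$ values evaluated in one pass, and normalization by the number of pairs actually visited, which for $w \ll N$ agrees with $N_{pairs}$ to leading order. As a check, i.i.d. uniform numbers with $m = 1$ have $D_2 = 1$, so the local slope should be close to 1.

```python
import math
import random

def correlation_sum(x, m, tau, w, eps_list):
    """C(m, eps) of Eq. (21) for several eps, maximum norm, Theiler window w.

    Only pairs with j - k > w are counted; the result is normalized by the
    number of pairs actually visited.
    """
    first = (m - 1) * tau
    vecs = [[x[n - i * tau] for i in range(m)] for n in range(first, len(x))]
    counts = [0] * len(eps_list)
    pairs = 0
    for j in range(len(vecs)):
        for k in range(j - w):                  # k < j - w
            d = max(abs(a - b) for a, b in zip(vecs[j], vecs[k]))
            pairs += 1
            for i, eps in enumerate(eps_list):
                if d < eps:
                    counts[i] += 1
    return [c / pairs for c in counts]

random.seed(1)
x = [random.random() for _ in range(1000)]      # i.i.d. uniform numbers
c_small, c_large, c_all = correlation_sum(x, m=1, tau=1, w=0, eps_list=[0.01, 0.02, 2.0])
slope = math.log(c_large / c_small) / math.log(2.0)   # local slope d log C / d log eps
print(c_small, c_large, c_all, round(slope, 3))
```

Even with $w = 0$, the condition $k < j - w$ excludes the self-pairs $j = k$.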

The literature on the correct and spurious estimation of the correlation dimension is huge and this is certainly not the place to repeat all the arguments. The relevant caveats and misconceptions are reviewed, for example, in Refs. [75,11,76,2]. The most prominent precaution is to exclude temporally correlated points from the pair counting by the so called Theiler window $w$ [75]. In order to become a consistent estimator of the correlation integral (from which the dimension is derived), the correlation sum should cover a random sample of points drawn independently according to the invariant measure on the attractor. Successive elements of a time series are not usually independent. In particular, for highly sampled flow data subsequent delay vectors are highly correlated. Theiler suggested removing this spurious effect by simply ignoring all pairs of points in Eq. (21) whose time indices differ by less than $w$, where $w$ should be chosen generously. With $O(N^2)$ pairs available, the loss of $O(wN)$ pairs is not dramatic as long as $w \ll N$. At the very least, pairs with $j = k$ have to be excluded [77], since otherwise the strong bias to $D_2 = 0$, the mathematically correct value for a finite set of points, reduces the scaling range drastically. Choosing $w$ as the first zero of the autocorrelation function, or sometimes even as the decay time of the autocorrelation function, is not sufficient, since these reflect only overall linear correlations [75,76]. The space-time separation plot (Sec. III B) provides a good means of determining a sufficient value for $w$, as discussed for example in [41,2]. In some cases, notably processes with inverse power law spectra, inspection requires $w$ to be of the order of the length of the time series. This indicates that the data do not sample an invariant attractor sufficiently, and the estimation of invariants like $D_2$ or Lyapunov exponents should be abandoned.

Parameters in the routines d2, c2, and c2naive are, as usual, the embedding parameters (the embedding dimension $m$ and the time delay $\tau$) as well as the Theiler window $w$.

Fast implementations of the correlation sum have been proposed by several authors. At small length scales, the computation of pairs can be done in $O(N \log N)$ or even $O(N)$ time rather than $O(N^2)$ without losing any of the precious pairs; see Ref. [20]. However, for intermediate size data sets we also need the correlation sum at intermediate length scales, where neighbor searching becomes expensive. Many authors have tried to limit the use of computational resources by restricting one of the sums in Eq. (21) to a fraction of the available points. By this practice, however, one loses valuable statistics at the small length scales, where points are so scarce anyway that all pairs are needed for stable results. In [62], both approaches were combined for the first time by using fast neighbor search for $\epsilon < \epsilon_0$ and restricting the sum for $\epsilon \ge \epsilon_0$. The TISEAN implementations c2 and d2 go one step further and select the range for the sums individually for each length scale to be processed. This turns out to give a major improvement in speed. The user can specify a desired number of pairs which seems large enough for a stable estimation of $C(\epsilon)$; typically 1000 pairs will suffice. Then the sums are extended to a range which guarantees that number of pairs or, if this cannot be achieved, to the whole time series. At the largest length scales, this range may be rather small, and the user may choose to give a minimal number of reference points to ensure a representative average. Still, using the program c2, the whole computation at large scales may thus be concentrated on the first part of the time series, which seems fair for stationary, non-intermittent data (nonstationary or strongly intermittent data are usually unsuitable for correlation dimension estimation anyway). The program d2 is safer in this respect: rather than restricting the range of the sums, it uses only a randomly selected subset. The randomization, however, requires a more sophisticated program structure in order to avoid an overhead in computation time.
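Why small length scales are cheap can be seen from a simple stand-in for fast neighbor search: sort the delay vectors by one coordinate and, for each vector, scan forward only until that coordinate alone differs by at least $\epsilon$. Under the maximum norm no close pair is missed, and only pairs within a strip of width $\epsilon$ are examined. This sorted-sweep sketch is our illustration, not the algorithm of Ref. [20] or the c2/d2 code, and it is checked against brute force.

```python
import random

def count_close_pairs_naive(vecs, eps, w):
    """Brute force: pairs (j, k) with j - k > w and max-norm distance < eps."""
    return sum(1 for j in range(len(vecs)) for k in range(j - w)
               if max(abs(a - b) for a, b in zip(vecs[j], vecs[k])) < eps)

def count_close_pairs_sorted(vecs, eps, w):
    """Same count, examining only pairs whose first components differ by less than eps."""
    order = sorted(range(len(vecs)), key=lambda i: vecs[i][0])
    count = 0
    for p, j in enumerate(order):
        for q in range(p + 1, len(order)):
            k = order[q]
            if vecs[k][0] - vecs[j][0] >= eps:
                break            # later entries are even farther apart in coordinate 0
            if abs(j - k) > w and max(abs(a - b) for a, b in zip(vecs[j], vecs[k])) < eps:
                count += 1
    return count

random.seed(2)
x = [random.random() for _ in range(401)]
vecs = [(x[n], x[n - 1]) for n in range(1, len(x))]      # two-dimensional delay vectors
for eps in (0.05, 0.2):
    print(eps, count_close_pairs_naive(vecs, eps, w=3), count_close_pairs_sorted(vecs, eps, w=3))
```

The work of the sweep grows with the number of pairs inside the strip, so it is small exactly where all pairs are needed for good statistics, at small $\epsilon$.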

1. Takens-Theiler estimator

Convergence to a finite correlation dimension can be checked by plotting scale dependent "effective dimensions" versus length scale for various embeddings. The easiest way to proceed is to compute (numerically) the derivative of $\log C(m, \epsilon)$ with respect to $\log \epsilon$, for example by fitting straight lines to the log-log plot of $C(\epsilon)$. In Fig. 18 (a) we see the output of the routine c2 acting on data from the NMR laser, processed by c2d in order to obtain local slopes. By default, straight lines are fitted over one octave in $\epsilon$; larger ranges give smoother results. We can see that on the large scales, self-similarity is broken due to the finite extension of the attractor, and on small but yet statistically significant scales we see the embedding dimension instead of a saturated, $m$-independent value. This is the effect of noise, which is infinite dimensional, and thus fills a volume in every embedding space. Only on the intermediate scales do we see the desired plateau where the results are in good approximation independent of $m$ and $\epsilon$. The region where scaling is established, not just the range selected for straight line fitting, is called the scaling range.
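Local slopes as in Fig. 18 (a) can be obtained by least-squares fits of $\log C$ against $\log \epsilon$ in a sliding window. The sketch below uses a window of one octave centred on each tabulated $\epsilon$; the centring and the handling of the end points are our assumptions, not a description of c2d. A pure power law must return its exponent everywhere.

```python
import math

def local_slopes(eps, c, octaves=1.0):
    """(eps, d log C / d log eps) from least-squares lines over a window of `octaves` octaves.

    The window is centred on each tabulated eps (in log scale); windows holding
    fewer than two points are skipped. eps must be strictly increasing, c > 0.
    """
    log_e = [math.log(e) for e in eps]
    log_c = [math.log(v) for v in c]
    half = 0.5 * octaves * math.log(2.0)
    slopes = []
    for centre in log_e:
        idx = [i for i, le in enumerate(log_e) if abs(le - centre) <= half + 1e-12]
        if len(idx) < 2:
            continue
        mx = sum(log_e[i] for i in idx) / len(idx)
        my = sum(log_c[i] for i in idx) / len(idx)
        sxx = sum((log_e[i] - mx) ** 2 for i in idx)
        sxy = sum((log_e[i] - mx) * (log_c[i] - my) for i in idx)
        slopes.append((math.exp(centre), sxy / sxx))
    return slopes

eps = [2.0 ** (k / 4) for k in range(-40, 1)]     # quarter-octave grid
c = [e ** 2 for e in eps]                         # pure power law, D = 2
print(local_slopes(eps, c)[:3])
```

Wider windows average out more of the statistical fluctuations, at the price of blurring the boundaries of the scaling range.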

Since the statistical fluctuations in plots like Fig. 18 (a) show characteristic (anti-)correlations, it has been suggested [78,79] to apply a maximum likelihood estimator to obtain optimal values for $D_2$. The Takens-Theiler estimator reads

$$ D_{TT}(\epsilon) = \frac{C(\epsilon)}{\int_0^{\epsilon} \frac{C(\epsilon')}{\epsilon'}\, d\epsilon'} \qquad (23) $$

and can be obtained by processing the output of c2 by c2t. Since $C(\epsilon)$ is available only at discrete values $\{\epsilon_i,\ i = 0, \dots, I\}$, we interpolate it by a pure power law (or, equivalently, the log-log plot by straight lines: $\log C(\epsilon) = a_i \log \epsilon + b_i$) in between these. The resulting integrals can be solved trivially and summed:

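$$ \int_0^{\epsilon} \frac{C(\epsilon')}{\epsilon'}\, d\epsilon' = \sum_{i=1}^{I} e^{b_i} \int_{\epsilon_{i-1}}^{\epsilon_i} (\epsilon')^{a_i - 1}\, d\epsilon' = \sum_{i=1}^{I} \frac{e^{b_i}}{a_i} \left( \epsilon_i^{a_i} - \epsilon_{i-1}^{a_i} \right) . \qquad (24) $$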

Plotting $D_{TT}$ versus $\epsilon$ (Fig. 18 (b)) is an interesting alternative to the usual local slopes plot, Fig. 18 (a). It is tempting to use such an "estimator of dimension" as a black box to provide a number one might quote as a dimension. This would imply the unjustified assumption that all deviations from exact scaling behavior are due to the statistical fluctuations. Instead, one still has to verify the existence of a scaling regime. Only then is $D_{TT}(\epsilon)$, evaluated at the upper end of the scaling range, a reasonable dimension estimator.
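Eqs. (23) and (24) translate into a few lines of Python: fit a power law on each interval, accumulate the integral, and divide. Our sketch (names hypothetical) omits the part of the integral below the smallest tabulated $\epsilon_0$, just as the sum in Eq. (24) does, so $D_{TT}$ is biased upward just above $\epsilon_0$. For an exact power law $C = \epsilon^2$ the estimate approaches 2 further up.

```python
import math

def takens_theiler(eps, c):
    """(eps_j, D_TT(eps_j)) for j >= 1 from tabulated C, following Eqs. (23) and (24).

    On [eps_{i-1}, eps_i], C is interpolated as exp(b_i) * eps**a_i. The integral
    from 0 to eps_0 is left out, as in Eq. (24).
    """
    integral = 0.0
    out = []
    for i in range(1, len(eps)):
        le0, le1 = math.log(eps[i - 1]), math.log(eps[i])
        a = (math.log(c[i]) - math.log(c[i - 1])) / (le1 - le0)
        b = math.log(c[i]) - a * le1
        if a == 0.0:
            integral += math.exp(b) * (le1 - le0)          # C constant on this interval
        else:
            integral += math.exp(b) / a * (eps[i] ** a - eps[i - 1] ** a)
        out.append((eps[i], c[i] / integral))
    return out

eps = [10 ** (k / 10) for k in range(-30, 1)]     # 1e-3 ... 1
c = [e ** 2.0 for e in eps]                        # exact power law with D = 2
d_tt = takens_theiler(eps, c)
print(d_tt[0], d_tt[-1])
```

The large value at the first grid point illustrates why $D_{TT}(\epsilon)$ should be read off at the upper end of a verified scaling range.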

2. Gaussian kernel correlation integral

The correlation sum Eq. (21) can be regarded as an average density of points where the local density is obtained by a kernel estimator with a step kernel $\Theta(\epsilon - r)$. A natural modification for small point sets is to replace the sharp step kernel by a smooth kernel function of bandwidth $\epsilon$. A particularly attractive case that has been studied in the literature [80] is given by the Gaussian kernel, that is, $\Theta(\epsilon - r)$ is replaced by $e^{-r^2/(4\epsilon^2)}$. The resulting Gaussian kernel correlation sum $C_G(\epsilon)$ has the same scaling properties as the usual $C(\epsilon)$. It has been observed in [3] that $C_G(\epsilon)$ can be obtained from $C(\epsilon)$ via

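$$ C_G(\epsilon) = \frac{1}{2\epsilon^2} \int_0^{\infty} d\tilde{\epsilon}\; e^{-\tilde{\epsilon}^2/(4\epsilon^2)}\, \tilde{\epsilon}\, C(\tilde{\epsilon}) \qquad (25) $$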

without having to repeat the whole computation. If $C(\epsilon)$ is given at discrete values of $\epsilon$, the integrals in Eq. (25) can be carried out numerically by interpolating $C(\epsilon)$ with pure power laws. This is done in c2g, which uses a 15-point Gauss-Kronrod rule for the numerical integration.
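Eq. (25) can be evaluated in the same spirit. In the sketch below, $C(r)$ is interpolated by power laws between tabulated points, the first power law is continued below the smallest point, and $C$ is held at its last tabulated value beyond the largest one, where the remaining Gaussian integral is done in closed form. We substitute a plain trapezoidal rule for the Gauss-Kronrod rule used in c2g. Two checks: $C \equiv 1$ must give $C_G = 1$, and $C(r) = r^2$ gives $C_G(\epsilon) = (2\epsilon)^2\, \Gamma(2) = 4\epsilon^2$.

```python
import bisect
import math

def gaussian_kernel_sum(eps_tab, c_tab, eps, n_grid=20000):
    """C_G(eps) from a tabulated C(r) via Eq. (25), using the trapezoidal rule."""
    log_e = [math.log(e) for e in eps_tab]
    log_c = [math.log(v) for v in c_tab]

    def c_of(r):
        """Power-law interpolation of C; the outermost segments are extrapolated."""
        if r <= 0.0:
            return 0.0
        lr = math.log(r)
        i = min(max(bisect.bisect_left(log_e, lr), 1), len(log_e) - 1)
        slope = (log_c[i] - log_c[i - 1]) / (log_e[i] - log_e[i - 1])
        return math.exp(log_c[i - 1] + slope * (lr - log_e[i - 1]))

    r_max = eps_tab[-1]
    h = r_max / n_grid
    total = 0.0
    for k in range(n_grid + 1):
        r = k * h
        f = math.exp(-r * r / (4.0 * eps * eps)) * r * c_of(r)
        total += f if 0 < k < n_grid else 0.5 * f
    total *= h
    # tail beyond r_max with C held at c_tab[-1]:
    # integral of r exp(-r^2 / 4 eps^2) from r_max to infinity = 2 eps^2 exp(-r_max^2 / 4 eps^2)
    total += c_tab[-1] * 2.0 * eps * eps * math.exp(-r_max * r_max / (4.0 * eps * eps))
    return total / (2.0 * eps * eps)

eps_tab = [10 ** (k / 10) for k in range(-30, 1)]
cg_square = gaussian_kernel_sum(eps_tab, [e * e for e in eps_tab], 0.05)
cg_one = gaussian_kernel_sum(eps_tab, [1.0] * len(eps_tab), 0.05)
print(cg_square, cg_one)
```

The closed-form tail keeps the result correct even when $\epsilon$ is not small compared with the largest tabulated length scale.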

B. Information dimension

Another way of attaching weight to $\epsilon$-balls, which is more natural, is the probability $p_i$ itself. The resulting scaling exponent is called the information dimension $D_1$. Since the Kaplan-Yorke dimension of Sec. VI is an approximation of $D_1$, the computation of $D_1$ through scaling properties is a relevant cross-check for highly deterministic data. $D_1$ can be computed from a modified correlation sum, where, however, unpleasant systematic errors occur. The fixed mass approach [81] circumvents these problems, so that, including finite sample corrections [77], a rather robust estimator exists. Instead of counting the number of points in a ball, one asks here for the diameter $\epsilon$ which a ball must have to contain a certain number $k$ of points when a time series of length $N$ is given. Its scaling with $k$ and $N$ yields the dimension in the limit of small length scales by

$$ D_1(m) = \lim_{k/N \to 0} \frac{d \log(k/N)}{d \langle \log \epsilon(k/N) \rangle} . \qquad (26) $$

FIG. 18. Dimension estimation for the (noise filtered) NMR laser data. Embedding dimensions 2 to 7 are shown. From above: (a) slopes $d \log C(\epsilon)/d \log \epsilon$ determined by straight line fits to the log-log plot of the correlation sum, Eq. (21); (b) the Takens-Theiler estimator $D_{TT}(\epsilon)$ of the same slope; (c) slopes $d \log C_G(\epsilon)/d \log \epsilon$ obtained by straight line fits to the Gaussian kernel correlation sum, Eq. (25); (d) instead of the correlation dimension, an attempt to estimate the information dimension, $d \log(k/N)/d\langle \log \epsilon(k/N) \rangle$. The horizontal axes ($\epsilon$, or the mean length scale in (d)) span 1 to 1000 on a logarithmic scale; the vertical axes span 0 to 5.

The routine c1 computes the (geometric) mean length scale $\exp\langle \log \epsilon(k/N) \rangle$ for which $k$ neighbors are found in $N$ data points, as a function of $k/N$. Unlike for the correlation sum, finite sample corrections are necessary if $k$ is small [77]. Essentially, the log of $k$ has to be replaced by the digamma function $\Psi(k)$. The resulting expression is implemented in c1. Given $m$ and $\tau$, the routine varies $k$ and $N$ such that the largest reasonable range of $k/N$ is covered with moderate computational effort. This means that for $1/N \le k/N \le K/N$ (default: $K = 100$), all $N$ available points are searched for neighbors and $k$ is varied. For $K/N < k/N \le 1$, $k = K$ is kept fixed and $N$ is decreased. The result for the NMR laser data is shown in Fig. 18 (d), where a nice scaling with $D_1 \approx 1.35$ can be discerned. For comparability, the logarithmic derivative of $k/N$ is plotted versus $\exp\langle \log \epsilon(k, N) \rangle$ and not vice versa, although $k/N$ is the independent variable. One easily detects again the violations of scaling discussed before: cut-off on the large scales, noise on small scales, fluctuations on even smaller scales, and a scaling range in between. In this example, $D_1$ is close to $D_2$, and multifractality cannot be established positively.
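The fixed mass idea with the digamma correction can be sketched as follows. For simplicity (our assumptions) the code uses the maximum norm, evenly spaced reference points, and a two-point slope between two neighbor numbers at fixed $N$, instead of the full sweep over $k$ and $N$ that c1 performs; all names are hypothetical. For i.i.d. uniform numbers on an interval, $D_1 = 1$, which provides a check.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(k):
    """Digamma function for a positive integer k: Psi(k) = -gamma + sum_{j=1}^{k-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))

def mean_log_knn_distance(vecs, ks, n_ref=200, w=0):
    """<log eps_k> for each k in ks: mean log max-norm distance to the k-th neighbour.

    Candidates with |i - j| <= w (in particular the point itself) are excluded.
    """
    n = len(vecs)
    step = max(1, n // n_ref)
    totals = [0.0] * len(ks)
    used = 0
    for j in range(0, n, step):
        d = sorted(max(abs(a - b) for a, b in zip(vecs[j], vecs[i]))
                   for i in range(n) if abs(i - j) > w)
        for idx, k in enumerate(ks):
            totals[idx] += math.log(d[k - 1])
        used += 1
    return [t / used for t in totals]

def information_dimension(vecs, k_small, k_large, n_ref=200, w=0):
    """Two-point estimate of Eq. (26) with log k replaced by Psi(k); log N cancels."""
    x_small, x_large = mean_log_knn_distance(vecs, [k_small, k_large], n_ref, w)
    return (digamma_int(k_large) - digamma_int(k_small)) / (x_large - x_small)

random.seed(4)
vecs = [(random.random(),) for _ in range(2000)]   # uniform on an interval: D1 = 1
d1 = information_dimension(vecs, 4, 32)
print(d1)
```

For small $k$ the difference between $\Psi(k)$ and $\log k$ is noticeable (for $k = 4$ it is about 0.13), which is why the correction matters at the smallest length scales.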

1. Entropy estimates

The correlation dimension characterizes the $\epsilon$ dependence of the correlation sum inside the scaling range. It is natural to ask what we can learn from its $m$-dependence, once $m$ is larger than $D_0$. The number of $\epsilon$-neighbors of a delay vector is an estimate of the local probability density, and in fact it is a kind of joint probability: all $m$ components of the neighbor have to be similar to those of the actual vector simultaneously. Thus when increasing $m$, joint probabilities covering larger time spans get involved. The scaling of these joint probabilities is related to the correlation entropy $h_2$, such that

$$ C(m, \epsilon) \approx \epsilon^{D_2} e^{-m h_2} . \qquad (27) $$

As for the scaling in $\epsilon$, the dependence on $m$ is also valid only asymptotically for large $m$, which one will not reach due to the lack of data points. So one will study $h_2(m)$ versus $m$ and try to extrapolate to large $m$. The correlation entropy is a lower bound of the Kolmogorov-Sinai entropy, which in turn can be estimated by the sum of the positive Lyapunov exponents. The program d2 produces the estimates of $h_2$ directly as output; from the other correlation sum programs, $h_2$ has to be extracted by post-processing the output.
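Following Eq. (27), $h_2(m)$ follows from the ratio of correlation sums at consecutive embedding dimensions and fixed $\epsilon$ inside the scaling range. A minimal post-processing helper is sketched below; dividing by the delay $\tau$ to express $h_2$ per sampling step is our assumption for $\tau > 1$, and the input is synthetic data that obey Eq. (27) exactly.

```python
import math

def h2_sequence(c_by_m, tau=1):
    """h2(m) = ln[C(m, eps) / C(m + 1, eps)] / tau at one fixed eps, following Eq. (27).

    c_by_m[i] holds C(m_min + i, eps) for consecutive embedding dimensions.
    """
    return [math.log(c_by_m[i] / c_by_m[i + 1]) / tau for i in range(len(c_by_m) - 1)]

# synthetic correlation sums obeying Eq. (27) exactly, with D2 = 1.2 and h2 = 0.4
eps = 0.05
c_by_m = [eps ** 1.2 * math.exp(-m * 0.4) for m in range(2, 9)]
h = h2_sequence(c_by_m)
print(h)
```

For measured data the sequence will not be constant; one looks for a plateau as $m$ grows, at several $\epsilon$ within the scaling range.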

The entropies of first and second order can be derived from the output of c1 and c2, respectively. An alternative means of obtaining these and the other generalized entropies is a box counting approach. Let $p_i$ be the probability to find the system state in box $i$; then the order $q$ entropy $h_q$ is defined by the limit of small box size and large $m$ of

$$\sum_i p_i^q \approx e^{-m h_q}\,. \qquad (28)$$

To evaluate $\sum_i p_i^q$ over a fine mesh of boxes in $m \gg 1$ dimensions, economical use of memory is necessary: a simple histogram would take $(1/\epsilon)^m$ storage. Therefore the program boxcount implements the mesh of boxes as a tree with (1/ε)-fold branching points. The tree is worked through recursively so that at each instance at most one complete branch exists in storage. The current version does not implement finite sample corrections to Eq. (28).
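The storage problem can also be side-stepped with a hash map that stores only occupied boxes. This is a swapped-in technique, not the recursive tree used by boxcount; like the current boxcount it omits finite sample corrections. The sketch uses a delay of one sample and hypothetical function names.

```python
import math
from collections import Counter

def box_sum(x, eps, m, q, n_vec):
    """sum_i p_i^q over the occupied boxes of side eps in m delay coordinates.

    Only occupied boxes are stored (hash map keyed by the tuple of box
    indices), so memory grows with the number of occupied boxes rather
    than with (1/eps)^m. Delay of one sample is assumed."""
    lo = min(x)
    label = [int((v - lo) / eps) for v in x]       # 1D box index of each value
    counts = Counter(tuple(label[n:n + m]) for n in range(n_vec))
    return sum((c / n_vec) ** q for c in counts.values())

def entropy_estimate(x, eps, m, q=2):
    """log[S_q(m) / S_q(m + 1)] with S_q = sum_i p_i^q, following Eq. (28).

    The same number of delay vectors is used for m and m + 1. Note: the
    usual Renyi normalization would further divide this by (q - 1), which
    makes no difference for q = 2; q = 1 needs the Shannon form instead."""
    n_vec = len(x) - m                             # enough for m + 1 components
    return math.log(box_sum(x, eps, m, q, n_vec) / box_sum(x, eps, m + 1, q, n_vec))
```

A strictly periodic sequence visits a fixed set of boxes with fixed weights for every m, so its entropy estimate vanishes.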

VIII. TESTING FOR NONLINEARITY

Most of the methods and quantities discussed so far are best suited to cases where the data show strong and consistent nonlinear deterministic signatures. As soon as more than a small or at most moderate amount of additive noise is present, scaling behavior will be broken and predictability will be limited. Thus we have explored the extreme opposite to the classical linear stochastic processes, namely nonlinear and fully deterministic ones. The bulk of real world time series falls in neither of these limiting categories because they reflect nonlinear responses and effectively stochastic components at the same time. Little can be done for many of these cases with current methods. Often it will be advisable to take advantage of the well founded machinery of spectral methods and venture into nonlinear territory only if encouraged by positive evidence. This section is about methods to establish statistical evidence for nonlinearity beyond a simple rescaling in a time series.

A. The concept of surrogate data

The degree of nonlinearity can be measured in several ways. But how much nonlinear predictability, say, is necessary to exclude more trivial explanations? All quantifiers of nonlinearity show fluctuations, but their distributions, or error bars if you wish, are not available analytically. It is therefore necessary to use Monte Carlo techniques to assess the significance of results. One important method in this context is the method of surrogate data [82]. A null hypothesis is formulated, for example that the data have been created by a stationary Gaussian linear process, and then it is attempted to reject this hypothesis by comparing results for the data to appropriate realizations of the null hypothesis. Since the null assumption is not a simple one but leaves room for free parameters, the Monte Carlo sample has to take these into account. One approach is to construct constrained realizations of the null hypothesis. The idea is that the free parameters left by the null are reflected by specific properties of the data. For example, the unknown coefficients of an autoregressive process are reflected in the autocorrelation function. Constrained realizations are obtained by randomizing the data subject to the constraint that an appropriate set of parameters remains fixed. Random data with a given periodogram, for instance, can be made by assigning random phases to the Fourier amplitudes of the data and taking the inverse Fourier transform. Random data with the same distribution as a given data set can be generated by permuting the data randomly without replacement. Asking for a given spectrum and a given distribution at the same time already poses a much more difficult question.

[Figure 19: upper and lower panels, each a delay plot of x(t) versus x(t − 0.5 s).]

FIG. 19. Upper: The human breath rate data from Fig. 12. Lower: the noise component extracted by the noise reduction scheme has been randomized in order to destroy correlations with the signal. The result appears slightly but significantly less structured than the original.
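The two elementary constrained randomizations described in this subsection can be sketched in a few lines of NumPy. The function names are hypothetical; the DC coefficient and, for even length, the Nyquist coefficient are left unchanged so that the surrogate is real-valued and keeps the mean of the data.

```python
import numpy as np

def random_phase_surrogate(x, rng):
    """Keep the Fourier amplitudes (hence the periodogram) of x, randomize the phases."""
    x = np.asarray(x, dtype=float)
    spec = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=len(spec))
    new = np.abs(spec) * np.exp(1j * phases)
    new[0] = spec[0]                  # DC term unchanged: same mean as the data
    if len(x) % 2 == 0:
        new[-1] = spec[-1]            # Nyquist term of a real series must stay real
    return np.fft.irfft(new, n=len(x))

def shuffle_surrogate(x, rng):
    """Keep the distribution of values of x: random permutation without replacement."""
    return rng.permutation(np.asarray(x, dtype=float))
```

Each function preserves exactly one of the two properties; neither preserves both, which is the difficulty addressed in the next subsection.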

B. Iterative Fourier transform method

Very few real time series which are suspected to show nonlinearity follow a Gaussian single time distribution. Non-Gaussianity is the simplest kind of nonlinear signature, but it may have a trivial reason: the data may have been distorted in the measurement process. Thus a possible null hypothesis would be that there is a stationary Gaussian linear stochastic process that generates a sequence $\{x_n\}$, but the actual observations are $s_n = s(x_n)$, where $s(\cdot)$ is a monotonic function. Constrained realizations of this null hypothesis would require the generation of random sequences with the same power spectrum (fully specifying the linear process) and the same single time distribution (specifying the effect of the measurement function) as the observed data. The Amplitude Adjusted Fourier Transform (AAFT) method proposed in [82] attempts to invert the measurement function $s(\cdot)$ by rescaling the data to a Gaussian distribution. Then the Fourier phases are randomized and the rescaling is inverted. As discussed in [83], this procedure is biased towards a flatter spectrum since the inverse of $s(\cdot)$ is not available exactly. In the same reference, a scheme is introduced that removes this bias by iteratively adjusting the spectrum and the distribution of the surrogates. Alternately, the surrogates are rescaled to the exact values taken by the data, and then the Fourier transform is brought to the exact amplitudes obtained from the data. The discrepancy between both steps either converges to zero with the number of iterations or to a finite inaccuracy which decreases with the length of the time series. The program surrogates performs iterations until no further improvement can be made. The last two stages are returned, one having the exact Fourier amplitudes and one taking on the same values as the data. For data that are not too exotic, these two versions should be almost identical. The relative discrepancy is also printed.
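The alternating scheme can be sketched as follows. This is a minimal NumPy illustration under our own choices, not the code of the program surrogates: it starts from a random shuffle, stops once the rank ordering no longer changes (or after max_iter steps), and reports the discrepancy as the rms difference between the two final stages relative to the standard deviation of the data. It does not truncate the series to a length suited to the fast Fourier transform.

```python
import numpy as np

def iaaft(x, max_iter=1000, rng=None):
    """Iteratively adjusted surrogate (sketch of the scheme described in [83]).

    Returns (spectral_stage, value_stage, discrepancy): the final stage with
    the exact Fourier amplitudes of x, the final stage with exactly the values
    of x, and their rms difference relative to std(x) (our normalization)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    sorted_values = np.sort(x)
    amplitudes = np.abs(np.fft.rfft(x))
    s = rng.permutation(x)                        # start from a random shuffle
    previous_ranks = None
    for _ in range(max_iter):
        # Step 1: impose the exact Fourier amplitudes, keep the current phases.
        phases = np.angle(np.fft.rfft(s))
        spectral = np.fft.irfft(amplitudes * np.exp(1j * phases), n=len(x))
        # Step 2: impose the exact values of x by rank ordering.
        ranks = np.argsort(np.argsort(spectral))
        s = sorted_values[ranks]
        if previous_ranks is not None and np.array_equal(ranks, previous_ranks):
            break                                 # fixed point: no further change
        previous_ranks = ranks
    discrepancy = np.sqrt(np.mean((spectral - s) ** 2)) / np.std(x)
    return spectral, s, float(discrepancy)
```

Returning both stages mirrors the description above: one output has the exact spectrum, the other exactly the values of the data, and the discrepancy indicates how far apart they still are.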

In Fig. 19 we used this procedure to assess the hypothesis that the noise reduction on the breath data reported in Fig. 12 removed an additive noise component which was independent of the signal. If the hypothesis were true, we could equally well add back the noise sequence or a randomized version of it which lacks any correlations to the signal. In the upper panel of Fig. 19 we show the original data. In the lower panel we took the noise reduced version (cf. Fig. 12, bottom) and added a surrogate of the supposed noise sequence. The result is similar to the original but still significantly different, which makes the additivity assumption implausible.

Fourier based randomization schemes suffer from some caveats due to the inherent assumption that the data constitute one period of a periodic signal, which is not what we really expect. The possible artefacts are discussed, for example, in [84] and can, in summary, lead to spurious rejection of the null hypothesis. One precaution that should be taken when using surrogates is to make sure that the beginning and the end of the data approximately match in value and phase. Then the periodicity assumption is not too far wrong and is harmless. Usually, this amounts to the loss of a few points of the series. One should note, however, that the routine may itself truncate the data by a few points in order to be able to perform a fast Fourier transform, which requires the number of points to be factorizable by small prime factors.
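Both precautions can be combined in a small search. The hypothetical helper below looks, among contiguous sub-series that lose at most max_drop points and whose length has only the prime factors 2, 3, and 5, for the one whose ends match best. Reading "match in phase" as matching slopes, and weighting value and slope mismatch equally, are assumptions of this sketch.

```python
import numpy as np

def is_smooth(n, primes=(2, 3, 5)):
    """True if n >= 1 has no prime factors other than those in `primes`."""
    if n < 1:
        return False
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1

def end_mismatch(seg):
    """Squared jump in value plus squared jump in slope between the two ends,
    relative to the variance of the segment (an assumed, heuristic measure)."""
    jump = (seg[-1] - seg[0]) ** 2
    slope = ((seg[1] - seg[0]) - (seg[-1] - seg[-2])) ** 2
    return (jump + slope) / np.var(seg)

def best_segment(x, max_drop=50):
    """(start, length) of the contiguous sub-series that drops at most max_drop
    points, has a 2-3-5 smooth length, and whose ends match best."""
    x = np.asarray(x, dtype=float)
    n_total = len(x)
    best = None
    for length in range(n_total, max(n_total - max_drop, 3) - 1, -1):
        if not is_smooth(length):
            continue
        for start in range(n_total - length + 1):
            cost = end_mismatch(x[start:start + length])
            if best is None or cost < best[0]:
                best = (cost, start, length)
    if best is None:
        raise ValueError("no 2-3-5 smooth length within max_drop points")
    return best[1], best[2]
```

The search cost grows roughly with max_drop² times the series length, which is negligible for the few points one is typically willing to sacrifice.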

[Figure 20: three traces (upper, middle, lower) over samples 0 to 2000.]

FIG. 20. Upper trace: Data from a stationary Gaussian linear stochastic process ($x_n = 0.7x_{n-1} + \eta_n$) measured by $s(x_n) = x_n^3$. Samples 200-220 are an artefact. With the Fourier based scheme (middle trace) the artefact results in an increased number of spikes in the surrogates and reduced predictability. In the lower trace, the artefact has been preserved along with the distribution of values and lags 1, ..., 25 of the autocorrelation function.

C. General constrained randomization

In [85], a general method has been proposed to create random data which fulfill specified constraints. With this method, the artefacts and remaining imprecision of the Fourier based randomization schemes can be avoided by specifying the autocorrelation function rather than the Fourier transform. The former does not assume periodic continuation. Perhaps more importantly, the restriction to a rather narrow null hypothesis can be relaxed, since in principle arbitrary statistical observables can be imposed on the surrogates. A desired property of the data has to be formulated in terms of a cost function which assumes an absolute minimum when the property is fulfilled. States arbitrarily close to this minimal cost can be reached by the method of simulated annealing. The cost function is minimized among all possible permutations of the data. See [85] for a description of the approach.

The TISEAN package contains the building blocks for a library of surrogate data routines implementing user specified cost functions. Currently, only the autocorrelation function with and without periodic continuation has been implemented. Further, a template is given from which the user may derive her/his own routines. A module is provided that drives the simulated annealing process through an exponential cooling scheme. The user may replace this module by another scheme of her/his choice. A module that performs random pair permutations is given which allows a list of points to be excluded from the permutation scheme. More sophisticated permutation schemes can be substituted if desired. Most importantly, the cost function has to be given as another module. The autocorrelation modules use $\max_{\tau=1}^{\tau_{\max}} |C(\tau) - C(\tau)_{\rm data}|/\tau$, where $C(\tau)$ is the autocorrelation function with or without periodic continuation.

[Figure 21: six panels, (a) to (f).]

FIG. 21. Randomization of 500 points generated by the Henon map. (a) Original data; (b) same autocorrelations and distribution; (c)-(f) different stages of annealing with a cost function C involving three- and four-point correlations. (c) A random shuffle, C = 2400; (d) C = 150; (e) C = 15; (f) C = 0.002. See text.
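A sketch of this autocorrelation cost follows. How TISEAN normalizes C(τ) is not stated here; the sketch assumes the usual normalized autocorrelation of the standardized series. Since mean and variance are invariant under permutations, that normalization is the same for data and surrogates. The function names are hypothetical.

```python
import numpy as np

def autocorr(x, tau_max, periodic=False):
    """Normalized autocorrelation C(tau), tau = 1..tau_max (assumed definition).

    periodic=False averages over the N - tau available pairs;
    periodic=True wraps the series around (periodic continuation)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    n = len(z)
    if periodic:
        return np.array([np.mean(z * np.roll(z, -tau)) for tau in range(1, tau_max + 1)])
    return np.array([np.mean(z[:n - tau] * z[tau:]) for tau in range(1, tau_max + 1)])

def autocorr_cost(candidate, c_data, tau_max, periodic=False):
    """max over tau = 1..tau_max of |C(tau) - C(tau)_data| / tau."""
    c = autocorr(candidate, tau_max, periodic)
    taus = np.arange(1, tau_max + 1)
    return float(np.max(np.abs(c - c_data) / taus))
```

The division by τ weights short lags most strongly, so the annealing first has to get the leading autocorrelations right.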

In Fig. 20 we show an example fulfilling the null hypothesis of a rescaled stationary Gaussian linear stochastic process which has been contaminated by an artefact at samples 200-220. The Fourier based schemes are unable to implement the artefact part of the null hypothesis. They spread the structure given by the artefact evenly over the whole time span, resulting in more spikes and less predictability. In fact, the null hypothesis of a stationary rescaled Gaussian linear stochastic process can be rejected at the 95% level of significance using nonlinear prediction errors: the artefact would spuriously be mistaken for nonlinearity. With the program randomize_auto_exp_random, we can exclude the artefact from the randomization scheme and obtain a correct test.
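The text does not state how the 95% level was obtained. One common design is a one-sided rank test with 19 surrogates: if data and surrogates are exchangeable under the null hypothesis, the probability that the data give the smallest value of a statistic (such as a nonlinear prediction error) among the 20 values is 1/20. A hypothetical helper:

```python
import numpy as np

def rank_test(statistic, data, make_surrogate, n_surrogates=19, rng=None):
    """One-sided surrogate test at the 1/(n_surrogates + 1) level.

    Rejects the null hypothesis if the data yield a strictly smaller value of
    `statistic` than every surrogate. Under the null, with exchangeable data
    and surrogates and no ties, this happens with probability
    1/(n_surrogates + 1), i.e. 5% for 19 surrogates."""
    rng = np.random.default_rng(rng)
    t_data = statistic(data)
    t_surr = [statistic(make_surrogate(data, rng)) for _ in range(n_surrogates)]
    return t_data < min(t_surr), t_data, t_surr
```

The statistic and the surrogate generator are passed in as functions, so the same helper works with shuffles, phase randomized surrogates, or annealed surrogates that exclude an artefact.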


As an example of a more exotic cost function, let us show the randomization of 500 iterates of the Henon map, Fig. 21 (a). Panel (b) shows the output of surrogates having the same spectrum and distribution. Starting from a random permutation (c), the cost function

$$\begin{aligned} C ={}& \langle x_{n-1}x_n\rangle + \langle x_{n-2}x_n\rangle \\ &+ \langle x_{n-1}^2x_n\rangle + \langle x_{n-1}x_n^2\rangle + \langle x_{n-2}^2x_n\rangle + \langle x_{n-2}x_{n-1}x_n\rangle \\ &+ \langle x_{n-1}^2x_n^2\rangle + \langle x_{n-1}x_n^3\rangle + \langle x_{n-1}^3x_n\rangle \end{aligned} \qquad (29)$$

is minimized (randomize_generic_exp_random). It involves all the higher order autocorrelations which would be needed for a least squares fit with the ansatz $x_n = c - a x_{n-1}^2 + b x_{n-2}$, and in this sense fully specifies the quadratic structure of the data. The random shuffle, panel (c), yields C = 2400; panels (d)-(f) correspond to C = 150, 15, and 0.002, respectively.
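The nine averages entering Eq. (29) are straightforward to compute. Eq. (29) lists the terms but not how deviations from the values of the data are combined, so summing absolute deviations is an assumption of this sketch, as are the function names and the standard Henon parameters a = 1.4, b = 0.3 used to generate example data.

```python
import numpy as np

def henon(n, a=1.4, b=0.3, transient=100):
    """x component of the Henon map x' = 1 - a x^2 + y, y' = b x (assumed parameters)."""
    x, y = 0.0, 0.0
    out = []
    for i in range(n + transient):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= transient:
            out.append(x)
    return np.array(out)

def eq29_terms(x):
    """The nine averages of Eq. (29), in the order in which they appear there."""
    x = np.asarray(x, dtype=float)
    xn, xn1, xn2 = x[2:], x[1:-1], x[:-2]        # x_n, x_{n-1}, x_{n-2}
    products = [
        xn1 * xn, xn2 * xn,
        xn1**2 * xn, xn1 * xn**2, xn2**2 * xn, xn2 * xn1 * xn,
        xn1**2 * xn**2, xn1 * xn**3, xn1**3 * xn,
    ]
    return np.array([p.mean() for p in products])

def eq29_cost(candidate, target):
    """Assumption: each term enters as |<.>_candidate - <.>_data|, summed."""
    return float(np.sum(np.abs(eq29_terms(candidate) - target)))
```

For the Henon map with these parameters the data satisfy the quadratic ansatz exactly, with c = 1, a = 1.4, and b = 0.3.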

Since the annealing process can be very CPU time consuming, it is important to provide efficient code for the cost function. Specifying $\tau_{\max}$ lags for N data points requires $O(N\tau_{\max})$ multiplications for the calculation of the cost function. An update after a pair has been exchanged, however, can be obtained with $O(\tau_{\max})$ multiplications. Often, the full sum or supremum can be truncated since after the first terms it is clear that a large increase of the cost is unavoidable. The driving Metropolis algorithm provides the current maximal permissible cost for that purpose.
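The $O(\tau_{\max})$ update can be made concrete for the unnormalized lag sums $S(\tau) = \sum_n x_n x_{n+\tau}$ without periodic continuation. Exchanging $x_i$ and $x_j$ only changes products that contain position i or j, at most four per lag. The early truncation against the maximal permissible cost is not shown, and the function names are hypothetical.

```python
import numpy as np

def lag_sums(x, tau_max):
    """S(tau) = sum_n x[n] * x[n + tau], tau = 1..tau_max (no periodic continuation)."""
    return np.array([np.dot(x[:-tau], x[tau:]) for tau in range(1, tau_max + 1)])

def swap_delta(x, i, j, tau_max):
    """Change of S(1..tau_max) caused by exchanging x[i] and x[j], in O(tau_max)."""
    n = len(x)
    after = {i: x[j], j: x[i]}                   # values at i and j after the exchange

    def val(k, swapped):
        return after[k] if swapped and k in after else x[k]

    delta = np.zeros(tau_max)
    for tau in range(1, tau_max + 1):
        pairs = set()                            # a set avoids double counting (i, j)
        for k in (i, j):
            if k - tau >= 0:
                pairs.add((k - tau, k))
            if k + tau < n:
                pairs.add((k, k + tau))
        for a, b in pairs:
            delta[tau - 1] += val(a, True) * val(b, True) - val(a, False) * val(b, False)
    return delta
```

The set of index pairs matters when |i − j| equals a lag: the product $x_i x_j$ would otherwise be counted twice.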

The computation time required to reach the desired accuracy depends on the choice and implementation of the cost function but also critically on the annealing schedule. There is a vast literature on simulated annealing which cannot be reviewed here. Experimentation with cooling schemes should keep in mind the basic concept of simulated annealing. At each stage, the system (here the surrogate to be created) is kept at a certain “temperature”. As in thermodynamics, the temperature determines how likely fluctuations around the mean energy (here the value of the cost function C) are. At temperature T, a deviation of size ΔC occurs with the Boltzmann probability ∝ exp(−ΔC/T). In a Metropolis simulation, this is achieved by accepting all downhill changes (ΔC < 0), but also uphill changes with probability exp(−ΔC/T). Here the changes are permutations of two randomly selected data items. The present implementation offers an exponential cooling scheme, that is, the temperature is lowered by a fixed factor whenever one of two conditions is fulfilled: either a specified number of changes has been tried, or a specified number of changes has been accepted. Both these numbers and the cooling factor can be chosen by the user. If the state is cooled too fast it gets stuck, or “freezes”, in a false minimum. When this happens, the system must be “melted” again and cooling is taken up at a slower rate. This can be done automatically until a goal accuracy is reached. It is, however, difficult to predict how many steps it will take. The detailed behavior of the scheme is still subject to ongoing research, and in all but the simplest cases experimentation by the user will be necessary. To facilitate the supervision of the cooling, the current state is written to a file whenever a substantial improvement has been made. Further, the verbosity of the diagnostic output can be selected.
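A compact Metropolis loop with the exponential cooling rule just described might look as follows. It is a sketch, not the TISEAN driver: it recomputes the full cost after every exchange (an incremental update could replace that call), it never re-melts a frozen state, and the parameter names and defaults are illustrative. Points listed in exclude are never moved, as needed for the artefact example of Fig. 20.

```python
import math
import numpy as np

def anneal(x, cost, t0=1.0, cool=0.9, tries_per_t=2000, accepts_per_t=200,
           t_min=1e-6, goal=0.0, exclude=(), rng=None):
    """Metropolis annealing over pair exchanges with exponential cooling (sketch).

    `cost` maps a candidate series to a non-negative number. The temperature
    is multiplied by `cool` once either tries_per_t exchanges have been tried
    or accepts_per_t have been accepted at the current temperature."""
    rng = np.random.default_rng(rng)
    s = np.array(x, dtype=float)
    movable = np.setdiff1d(np.arange(len(s)), np.asarray(exclude, dtype=int))
    s[movable] = rng.permutation(s[movable])     # start from a random shuffle
    c = cost(s)
    temp = t0
    while temp > t_min and c > goal:
        tries = accepts = 0
        while tries < tries_per_t and accepts < accepts_per_t:
            tries += 1
            i, j = rng.choice(movable, size=2, replace=False)
            s[i], s[j] = s[j], s[i]              # trial exchange of two items
            c_new = cost(s)
            dc = c_new - c
            if dc < 0 or rng.random() < math.exp(-dc / temp):
                c = c_new                        # Metropolis acceptance
                accepts += 1
            else:
                s[i], s[j] = s[j], s[i]          # rejected: undo the exchange
        temp *= cool                             # exponential cooling step
    return s, c
```

A schedule that cools too fast shows up as a cost that stops decreasing well above the goal; restarting with a larger cool factor corresponds to the melting and slower cooling described above.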

D. Measuring weak nonlinearity

When testing for nonlinearity, we would like to use quantifiers which are optimized for the weak nonlinearity limit, which is not what most time series methods of chaos theory have been designed for. The simple nonlinear prediction scheme (Sec. IV B) has proven quite useful in this context. If it is used as a comparative statistic, it should be noted that sometimes seemingly inadequate embeddings or neighborhood sizes may lead to rather big errors which, however, have small fluctuations. The tradeoff between bias and variance may be different from the situation where predictions are desired per se. The same rationale applies to quantities derived from the correlation sum. Neither the small scale limit, nor genuine scaling, nor the Theiler correction is formally necessary in a comparative test. However, any temptation to interpret the results in terms like “complexity” or “dimensionality” should be resisted, even though “complexity” doesn't seem to have an agreed-upon meaning anyway. Apart from average prediction errors, we have found the stabilities of short periodic orbits (see Sec. IV C) useful for the detection of nonlinearity in surrogate data tests. As an alternative to the phase space based methods, more traditional measures of nonlinearity derived from higher order autocorrelation functions ([86], routine autocor3) may also be considered. If a time-reversal asymmetry is present, its statistical confirmation (routine timerev) is a very powerful detector of nonlinearity [87]. Some measures of weak nonlinearity are compared systematically in Ref. [88].
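For reference, common forms of the two classical statistics mentioned above are sketched below: a normalized third moment of the increments as a time-reversal asymmetry statistic, and a three-point autocorrelation of the standardized series. Whether timerev and autocor3 use exactly these normalizations is an assumption, and the function names are hypothetical.

```python
import numpy as np

def time_asymmetry(x, tau=1):
    """<(x[n+tau] - x[n])^3> / <(x[n+tau] - x[n])^2>^(3/2) (assumed normalization)."""
    x = np.asarray(x, dtype=float)
    d = x[tau:] - x[:-tau]
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)

def autocorr3(x, tau1, tau2):
    """Three-point autocorrelation <z[n] z[n-tau1] z[n-tau2]> of the standardized series."""
    z = np.asarray(x, dtype=float)
    z = (z - z.mean()) / z.std()
    t = max(tau1, tau2)
    n = len(z)
    return float(np.mean(z[t:] * z[t - tau1:n - tau1] * z[t - tau2:n - tau2]))
```

A time-symmetric signal such as a sine gives an asymmetry close to zero, while a sawtooth that rises slowly and falls quickly gives a clearly negative value. In a surrogate test these numbers would be compared with their distribution over surrogates rather than read in absolute terms.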

IX. CONCLUSION AND PERSPECTIVES

The TISEAN project makes available a number of algorithms of nonlinear time series analysis to people interested in applications of the dynamical systems approach. To make proper use of these algorithms, it is not essential to have written the programs from scratch, an effort we intend to spare the user by making TISEAN public. Indispensable, however, is a good knowledge of what the programs do, and why they do what they do. The latter requires a thorough background in the nonlinear time series approach which cannot be provided by this paper but rather by textbooks like Refs. [10,2], reviews [11,12,3], and the original literature [9]. Here, we have concentrated on the actual implementation as it is realized in TISEAN and on examples of the concrete use of the programs.


A. Important methods which are (still) missing

Let us finish the discussion by giving some perspectives on future work. So far, the TISEAN project has concentrated on the most common situation of a single time series. While for multiple measurements of similar nature most programs can be modified with moderate effort, a general framework for heterogeneous multivariate recordings (say, blood pressure and heart beat) has not been established so far in a nonlinear context. Nevertheless, we feel that concepts like generalized synchrony, coherence, or information flow are well worth pursuing and at some point should become available to a wider community, including applied research.

Initial experience with nonlinear time series methods indicates that some of the concepts may prove useful enough in the future to become part of the established time series tool box. For this to happen, availability of the algorithms and reliable information on their use will be essential. The publication of a substantial collection of research level programs through the TISEAN project may be seen as one step in that direction. However, the potential user will still need considerable experience in order to make the right decisions: about the suitability of a particular method for a specific time series, about the selection of parameters, and about the interpretation of the results. To some extent, these decisions could be guided by software that evaluates the data situation and the results automatically. Previous experience with black box dimension or Lyapunov estimators has not been encouraging, but for some specific problems, “optimal” answers can in principle be defined and computed automatically, once the optimality criterion is formulated. For example, the prediction programs could be encapsulated in a framework that automatically evaluates the performance for a range of embedding parameters etc. Of course, quantitative assessment of the results is not always easy to implement and depends on the purpose of the study. As another example, it seems realistic to define “optimal” Poincaré surfaces of section and to find the optimal solutions numerically.

As in most of the time series literature, the issue of stationarity has entered the discussion only as something the lack of which has to be detected in order to avoid spurious results. Taking this point seriously amounts to rejecting a substantial fraction of time series problems, including the most prominent examples, that is, most data from finance, meteorology, and biology. It is quite clear that the mere rejection of these challenging problems is not satisfactory, and we will have to develop tools to actually analyse, understand, and predict nonstationary data. Some suggestions have been made for the detection of fluctuating control parameters [89-92]. Most of these can be seen as continuous versions of the classification problem, another application which is not properly represented in TISEAN yet.

Publishing software, or reviews and textbooks for that matter, in a field evolving as rapidly as nonlinear time series analysis will always have the character of a snapshot of the state at a given time. Having the options either to wait until the field has saturated sufficiently or to risk that programs, or statements made, will soon become obsolete, we chose the second option. We hope that we can thus contribute to the further evolution of the field.

ACKNOWLEDGMENTS

We wish to thank Eckehard Olbrich, Marcus Richter, and Andreas Schmitz, who have made contributions to the TISEAN project, and the users who patiently coped with early versions of the software, in particular Ulrich Hermes. We thank Leci Flepp, Nick Tufillaro, Riccardo Meucci, and Marco Ciofini for letting us use their time series data. This work was supported by the SFB 237 of the Deutsche Forschungsgemeinschaft.

[1] The TISEAN software package is publicly available at http://www.mpipks-dresden.mpg.de/~tisean. The distribution includes an online documentation system.

[2] H. Kantz and T. Schreiber, “Nonlinear Time Series Analysis”, Cambridge University Press, Cambridge (1997).

[3] T. Schreiber, Interdisciplinary application of nonlinear time series methods, to appear in Phys. Reports (1998).

[4] D. Kaplan and L. Glass, “Understanding Nonlinear Dynamics”, Springer, New York (1995).

[5] E. Ott, “Chaos in Dynamical Systems”, Cambridge University Press, Cambridge (1993).

[6] P. Berge, Y. Pomeau, and C. Vidal, “Order Within Chaos: Towards a deterministic approach to turbulence”, Wiley, New York (1986).

[7] H.-G. Schuster, “Deterministic Chaos: An introduction”, Physik Verlag, Weinheim (1988).

[8] A. Katok and B. Hasselblatt, “Introduction to the Modern Theory of Dynamical Systems”, Cambridge University Press, Cambridge (1996).

[9] E. Ott, T. Sauer, and J. A. Yorke, “Coping with Chaos”, Wiley, New York (1994).

[10] H. D. I. Abarbanel, “Analysis of Observed Chaotic Data”, Springer, New York (1996).

[11] P. Grassberger, T. Schreiber, and C. Schaffrath, Nonlinear time sequence analysis, Int. J. Bifurcation and Chaos 1, 521 (1991).

[12] H. D. I. Abarbanel, R. Brown, J. J. Sidorowich, and L. Sh. Tsimring, The analysis of observed chaotic data in physical systems, Rev. Mod. Phys. 65, 1331 (1993).

[13] D. Kugiumtzis, B. Lillekjendlie, and N. Christophersen, Chaotic time series I, Modeling, Identification and Control 15, 205 (1994).

[14] D. Kugiumtzis, B. Lillekjendlie, and N. Christophersen, Chaotic time series II, Modeling, Identification and Control 15, 225 (1994).

[15] G. Mayer-Kress, ed., “Dimensions and Entropies in Chaotic Systems”, Springer, Berlin (1986).

[16] M. Casdagli and S. Eubank, eds., “Nonlinear Modeling and Forecasting”, Santa Fe Institute Studies in the Science of Complexity, Proc. Vol. XII, Addison-Wesley, Reading, MA (1992).

[17] A. S. Weigend and N. A. Gershenfeld, eds., “Time Series Prediction: Forecasting the future and understanding the past”, Santa Fe Institute Studies in the Science of Complexity, Proc. Vol. XV, Addison-Wesley, Reading, MA (1993).

[18] J. Belair, L. Glass, U. an der Heiden, and J. Milton, eds., “Dynamical Disease”, AIP Press (1995).

[19] H. Kantz, J. Kurths, and G. Mayer-Kress, eds., “Nonlinear analysis of physiological data”, Springer, Berlin (1998).

[20] T. Schreiber, Efficient neighbor searching in nonlinear time series analysis, Int. J. Bifurcation and Chaos 5, 349 (1995).

[21] F. Takens, “Detecting Strange Attractors in Turbulence”, Lecture Notes in Math. Vol. 898, Springer, New York (1981).

[22] T. Sauer, J. Yorke, and M. Casdagli, Embedology, J. Stat. Phys. 65, 579 (1991).

[23] M. Richter and T. Schreiber, Phase space embedding of electrocardiograms, to appear in Phys. Rev. E (1998).

[24] M. Casdagli, S. Eubank, J. D. Farmer, and J. Gibson, State space reconstruction in the presence of noise, Physica D 51, 52 (1991).

[25] A. M. Fraser and H. L. Swinney, Independent coordinates for strange attractors from mutual information, Phys. Rev. A 33, 1134 (1986).

[26] B. Pompe, Measuring statistical dependences in a time series, J. Stat. Phys. 73, 587 (1993).

[27] M. Palus, Testing for nonlinearity using redundancies: Quantitative and qualitative aspects, Physica D 80, 186 (1995).

[28] M. B. Kennel, R. Brown, and H. D. I. Abarbanel, Determining embedding dimension for phase-space reconstruction using a geometrical construction, Phys. Rev. A 45, 3403 (1992).

[29] http://hpux.cs.utah.edu/hppd/hpux/Physics/embedding-26.May.93

[30] http://www.zweb.com/apnonlin/

[31] I. T. Jolliffe, “Principal component analysis”, Springer, New York (1986).

[32] D. Broomhead and G. P. King, Extracting qualitative dynamics from experimental data, Physica D 20, 217 (1986).

[33] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, “Numerical Recipes”, 2nd edn., Cambridge University Press, Cambridge (1992).

[34] R. Vautard, P. Yiou, and M. Ghil, Singular-spectrum analysis: a toolkit for short, noisy chaotic signals, Physica D 58, 95 (1992).

[35] A. Varone, A. Politi, and M. Ciofini, CO₂ laser with feedback, Phys. Rev. A 52, 3176 (1995).

[36] R. Hegger and H. Kantz, Embedding of sequences of time intervals, Europhys. Lett. 38, 267 (1997).

[37] J.-P. Eckmann, S. Oliffson Kamphorst, and D. Ruelle, Recurrence plots of dynamical systems, Europhys. Lett. 4, 973 (1987).

[38] M. Casdagli, Recurrence plots revisited, Physica D 108, 206 (1997).

[39] N. B. Tufillaro, P. Wyckoff, R. Brown, T. Schreiber, and T. Molteno, Topological time series analysis of a string experiment and its synchronized model, Phys. Rev. E 51, 164 (1995).

[40] http://homepages.luc.edu/~cwebber/

[41] A. Provenzale, L. A. Smith, R. Vio, and G. Murante, Distinguishing between low-dimensional dynamics and randomness in measured time series, Physica D 58, 31 (1992).

[42] H. Tong, “Threshold Models in Non-Linear Time Series Analysis”, Lecture Notes in Statistics Vol. 21, Springer, New York (1983).

[43] A. Pikovsky, Discrete-time dynamic noise filtering, Sov. J. Commun. Technol. Electron. 31, 81 (1986).

[44] G. Sugihara and R. May, Nonlinear forecasting as a way of distinguishing chaos from measurement errors in time series, Nature 344, 734 (1990); Reprinted in [9].

[45] J.-P. Eckmann, S. Oliffson Kamphorst, D. Ruelle, and S. Ciliberto, Lyapunov exponents from a time series, Phys. Rev. A 34, 4971 (1986); Reprinted in [9].

[46] J. D. Farmer and J. Sidorowich, Predicting chaotic time series, Phys. Rev. Lett. 59, 845 (1987); Reprinted in [9].

[47] D. Auerbach, P. Cvitanovic, J.-P. Eckmann, G. Gunaratne, and I. Procaccia, Exploring chaotic motion through periodic orbits, Phys. Rev. Lett. 58, 2387 (1987).

[48] O. Biham and W. Wenzel, Characterization of unstable periodic orbits in chaotic attractors and repellers, Phys. Rev. Lett. 63, 819 (1989).

[49] P. So, E. Ott, S. J. Schiff, D. T. Kaplan, T. Sauer, and C. Grebogi, Detecting unstable periodic orbits in chaotic experimental data, Phys. Rev. Lett. 76, 4705 (1996).

[50] P. Schmelcher and F. K. Diakonos, A general approach to the finding of unstable periodic orbits in chaotic dynamical systems, Phys. Rev. E 57, 2739 (1998).

[51] D. Kugiumtzis, O. C. Lingjærde, and N. Christophersen, Regularized local linear prediction of chaotic time series, Physica D 112, 344 (1998).

[52] L. Jaeger and H. Kantz, Unbiased reconstruction of the dynamics underlying a noisy chaotic time series, CHAOS 6, 440 (1996).

[53] M. Casdagli, Chaos and deterministic versus stochastic nonlinear modeling, J. Roy. Stat. Soc. 54, 303 (1991).

[54] D. Broomhead and D. Lowe, Multivariable functional interpolation and adaptive networks, Complex Syst. 2, 321 (1988).

[55] L. A. Smith, Identification and prediction of low dimensional dynamics, Physica D 58, 50 (1992).

[56] M. Casdagli, Nonlinear prediction of chaotic time series, Physica D 35, 335 (1989); Reprinted in [9].

[57] E. J. Kostelich and T. Schreiber, Noise reduction in chaotic time series data: A survey of common methods, Phys. Rev. E 48, 1752 (1993).

[58] M. E. Davies, Noise reduction schemes for chaotic time series, Physica D 79, 174 (1994).


[59] T. Schreiber, Extremely Simple Nonlinear Noise Reduction Method, Phys. Rev. E 47, 2401 (1993).

[60] D. R. Rigney, A. L. Goldberger, W. Ocasio, Y. Ichimaru, G. B. Moody, and R. Mark, Multi-channel physiological data: Description and analysis (Data set B), in [17].

[61] P. Grassberger, R. Hegger, H. Kantz, C. Schaffrath, and T. Schreiber, On noise reduction methods for chaotic data, CHAOS 3, 127 (1993); Reprinted in [9].

[62] H. Kantz, T. Schreiber, I. Hoffmann, T. Buzug, G. Pfister, L. G. Flepp, J. Simonet, R. Badii, and E. Brun, Nonlinear noise reduction: a case study on experimental data, Phys. Rev. E 48, 1529 (1993).

[63] M. Finardi, L. Flepp, J. Parisi, R. Holzner, R. Badii, and E. Brun, Topological and metric analysis of heteroclinic crises in laser chaos, Phys. Rev. Lett. 68, 2989 (1992).

[64] A. I. Mees and K. Judd, Dangers of geometric filtering, Physica D 68, 427 (1993).

[65] T. Schreiber and M. Richter, Nonlinear projective filtering in a data stream, Wuppertal preprint (1998).

[66] M. Richter, T. Schreiber, and D. T. Kaplan, Fetal ECG extraction with nonlinear phase space projections, IEEE Trans. Bio-Med. Eng. 45, 133 (1998).

[67] J.-P. Eckmann and D. Ruelle, Ergodic theory of chaos and strange attractors, Rev. Mod. Phys. 57, 617 (1985).

[68] R. Stoop and J. Parisi, Calculation of Lyapunov exponents avoiding spurious elements, Physica D 50, 89 (1991).

[69] H. Kantz, A robust method to estimate the maximal Lyapunov exponent of a time series, Phys. Lett. A 185, 77 (1994).

[70] M. T. Rosenstein, J. J. Collins, and C. J. De Luca, A practical method for calculating largest Lyapunov exponents from small data sets, Physica D 65, 117 (1993).

[71] M. Sano and Y. Sawada, Measurement of the Lyapunov spectrum from a chaotic time series, Phys. Rev. Lett. 55, 1082 (1985).

[72] J. Kaplan and J. Yorke, Chaotic behavior of multidimensional difference equations, in: H. O. Peitgen and H. O. Walther, eds., “Functional Differential Equations and Approximation of Fixed Points”, Springer, New York (1987).

[73] P. Grassberger and I. Procaccia, Physica D 9, 189 (1983).

[74] T. Sauer and J. Yorke, How many delay coordinates do you need?, Int. J. Bifurcation and Chaos 3, 737 (1993).

[75] J. Theiler, J. Opt. Soc. Amer. A 7, 1055 (1990).

[76] H. Kantz and T. Schreiber, CHAOS 5, 143 (1995); Reprinted in [18].

[77] P. Grassberger, Finite sample corrections to entropy and dimension estimates, Phys. Lett. A 128, 369 (1988).

[78] F. Takens, in: B. L. J. Braaksma, H. W. Broer, and F. Takens, eds., “Dynamical Systems and Bifurcations”, Lecture Notes in Math. Vol. 1125, Springer, Heidelberg (1985).

[79] J. Theiler, Lacunarity in a best estimator of fractal dimension, Phys. Lett. A 135, 195 (1988).

[80] J. M. Ghez and S. Vaienti, Integrated wavelets on fractal sets I: The correlation dimension, Nonlinearity 5, 777 (1992).

[81] R. Badii and A. Politi, Statistical description of chaotic attractors, J. Stat. Phys. 40, 725 (1985).

[82] J. Theiler, S. Eubank, A. Longtin, B. Galdrikian, and J. D. Farmer, Testing for nonlinearity in time series: The method of surrogate data, Physica D 58, 77 (1992); Reprinted in [9].

[83] T. Schreiber and A. Schmitz, Improved surrogate data for nonlinearity tests, Phys. Rev. Lett. 77, 635 (1996).

[84] J. Theiler, P. S. Linsay, and D. M. Rubin, Detecting nonlinearity in data with long coherence times, in [17].

[85] T. Schreiber, Constrained randomization of time series data, Phys. Rev. Lett. 80, 2105 (1998).

[86] T. Subba Rao and M. M. Gabr, “An Introduction to Bispectral Analysis and Bilinear Time Series Models”, Lecture Notes in Statistics Vol. 24, Springer, New York (1984).

[87] C. Diks, J. C. van Houwelingen, F. Takens, and J. DeGoede, Reversibility as a criterion for discriminating time series, Phys. Lett. A 201, 221 (1995).

[88] T. Schreiber and A. Schmitz, Discrimination power of measures for nonlinearity in a time series, Phys. Rev. E 55, 5443 (1997).

[89] J. Kadtke, Classification of highly noisy signals using global dynamical models, Phys. Lett. A 203, 196 (1995).

[90] R. Manuca and R. Savit, Stationarity and nonstationarity in time series analysis, Physica D 99, 134 (1996).

[91] M. C. Casdagli, L. D. Iasemidis, R. S. Savit, R. L. Gilmore, S. Roper, and J. C. Sackellares, Non-linearity in invasive EEG recordings from patients with temporal lobe epilepsy, Electroencephalogr. Clin. Neurophysiol. 102, 98 (1997).

[92] T. Schreiber, Detecting and analysing nonstationarity in a time series using nonlinear cross predictions, Phys. Rev. Lett. 78, 843 (1997).
