Statistical Genetics and Gaussian Stochastic Processes

Part I: Statistical genetic papers employing stochastic process theory. Lange, Kirkpatrick and colleagues, Blangero, Pletcher and colleagues.

Part II: Mathematical papers on Gaussian stochastic processes, particularly on the special class known as the Ornstein-Uhlenbeck process. Doob, Chaturvedi, Serfozo, DasGupta.
American Journal of Medical Genetics 24:483-491 (1986)
Cohabitation, Convergence, and Environmental Covariances
Kenneth Lange
Department of Biomathematics, School of Medicine, University of California, Los Angeles
Temporal variation in traits has long been a central theme in epidemiology. However, human geneticists have largely avoided this topic. Recently, several authors have shown how temporal variation in relative-to-relative covariances can be accommodated within the framework of variance components analysis. The present paper attempts to clarify the mathematics implicit in their approach. A stochastic mechanism is discussed that causes covariances to converge or diverge exponentially fast as relatives cohabit or lead separate lives.
Key words: environmental covariances, temporal variation, stochastic processes
INTRODUCTION
In a recent series of papers, Hopper and Mathews [1982, 1983] and Hopper and Culross [1983] have introduced some valuable new notions into variance components analysis of pedigree data. One of their innovations has been an explicit parameterization of how the trait covariance of two people varies as a function of their cohabitation history. The solution of these authors is surprisingly simple, intuitively appealing, and potentially widely useful. The current paper is an attempt to provide a rigorous foundation for their approach and to clarify its range of validity.
As often happens when science becomes more and more splintered, it is possible to adapt an appropriate mathematical model from another discipline. In this case, physics and communication engineering offer just the right ideas. My purpose here is not to develop new mathematics or statistics but to reinterpret an existing model. This reinterpretation involves viewing the temporal changes contributed by the environment to a quantitative trait as evolving according to an Ornstein-Uhlenbeck diffusion process. When several people are simultaneously followed, the process is both multidimensional and nonhomogeneous in time. Although the mathematics for such processes is a little esoteric, explicit and intuitively reasonable results emerge. For instance, the environmental covariance between two relatives who separate converges exponentially fast to 0. On the other hand, if they reunite, then it converges exponentially fast to a limiting positive value that depends on their degree of propinquity. In much the same vein, covariances between present and past trait values show an exponential decay to 0, thus raising some interesting possibilities for the modeling and analysis of longitudinal data.

Received for publication June 28, 1985; revision received September 9, 1985.

Address reprint requests to Kenneth Lange, Department of Biomathematics, School of Medicine, University of California, Los Angeles, CA 90024.

© 1986 Alan R. Liss, Inc.
In practice, the Ornstein-Uhlenbeck model should provide more accurate estimates of heritability and more insight into the temporal plasticity of a trait. It is relatively parsimonious in the number of necessary parameters. These parameters can be viewed as covariance components in an overall model that also includes genetic components. The number of parameters will depend on the detail with which the cohabitation histories of pairs of individuals are tracked.
The principal drawback of the model is the extra burden of data acquisition that it imposes. Whether investigators want to carry this burden will depend on the importance of the trait and on their intuitive judgment about the relevance of the model. It should be emphasized that the model is phenomenological and not mechanistic. The physical motivation in the next section offers at most an analogy to how environmental effects could operate. If the model does not accurately capture trait covariances, then it should not be applied. This might be the case in the presence of cultural inheritance, gene-environment interaction, or assortative mating [Cavalli-Sforza and Feldman, 1981; Cloninger et al, 1979; Karlin, 1979; Rao et al, 1979]. However, it is clear to me that the model offers a valuable paradigm for many traits.
In addition to the works of Hopper and Mathews [1982, 1983] and Hopper and Culross [1983], authors like Eaves et al [1978] and Province and Rao [1985] have pursued largely descriptive models of how age affects the correlations between relatives. The soon-to-be-published paper of Eaves et al [1986] apparently presents some interesting parallels to the current paper.
MODEL FORMULATION
The fundamental units of observation for the model are pedigrees. The term pedigree should be interpreted loosely. For example, adopted children can constitute valid members of a pedigree even though they are not related to anyone else in the pedigree. The crucial distinction is that two individuals from different pedigrees always have independent realizations of the trait of interest. Let us now single out a pedigree of n people. Suppose Z_tj represents some measurable quantitative trait for the jth person of this pedigree at time t. The actual values for some or all of the people will be observed at only a few specific times t. When the trait is under combined genetic and environmental control, Z_tj might be decomposed as the sum
Z_tj = Y_tj + X_tj,
where Y_tj is the genetic contribution and X_tj the environmental contribution. Typically, Y_tj and X_tj are also assumed to be uncorrelated. The remainder of this paper is primarily concerned with the random vector X_t of environmental contributions.
One reasonable way of viewing the evolution of X_t is to look at how X_{t+h} differs from X_t for h small and positive. For a diffusion process, one postulates that

    (1/h) E(X_{t+h} - X_t | X_t)    (1)

    (1/h) Cov(X_{t+h} - X_t | X_t)    (2)

tend to limits as h → 0. In these expressions, E denotes expectation and Cov denotes covariance. The vector expectation in (1) and the matrix covariance in (2) are calculated conditional on the current trait values X_t. In the physics and engineering literature, it is commonly assumed that the limit of (1) is given by -Λ(t)X_t, where Λ(t) is an n × n matrix, and the limit of (2) is given by an n × n covariance matrix Ω(t). At this level of generality, it is possible to characterize completely the process X_t starting from an initial X_{t_0} at the earliest time t_0 [Jazwinski, 1970; Maybeck, 1979, 1982; Van Kampen, 1981]. I will review the formal mathematical results after some motivating remarks. The only case of real interest to us is when Λ(t) is diagonal with jth diagonal entry λ_j(t) > 0. Without this diagonal assumption, many of the formulas that follow would be very formidable to evaluate. A negative value of λ_j(t) is inconsistent with the following physical motivation.
In the traditional application, X_tj is viewed as the velocity of a particle and -λ_j(t)X_tj as a frictional or dampening force that tends to slow the particle to a 0 equilibrium velocity. The variance Ω_jj(t) describes the random effect of innumerable collisions with small neighboring particles. Each collision imparts a nonsystematic infinitesimal increment to X_tj. If Ω_jj(t) = 0 and λ_j(t) is a positive constant, X_tj simply decays exponentially fast to 0. On the other hand, if λ_j(t) = 0 and Ω_jj(t) is a positive constant, then X_tj acts like Brownian motion with no systematic tendency to return to 0 and a variance that grows to ∞. Positive constant values for both λ_j(t) and Ω_jj(t) produce in the long run a stochastic equilibrium with mean 0 and finite variance.
What is the relevance of this to modeling the environmental contribution X_t for a trait like serum lead level [Hopper and Mathews, 1983]? It might be plausible to assume for some traits that good environments get worse and bad environments get better and that the farther away X_tj is from its population mean, the stronger the restoring force is. Superimposed on this deterministic trend are many random increments with no systematic tendency. These increments are small enough not to produce abrupt changes in X_t but come often enough to have a large cumulative effect.
This leads to the question of how all n components of X_t evolve in concert. Here is where a deeper understanding of the covariance matrix Ω(t) becomes crucial. If Ω_jk(t) = 0, then the small increments X_{t+h,j} - X_tj and X_{t+h,k} - X_tk are totally uncorrelated. At the other extreme of Ω_jk(t) = Ω_jj(t)^{1/2} Ω_kk(t)^{1/2}, they are perfectly correlated. In general, Ω_jk(t) should capture the current cohabitation status of the pair of individuals j and k. I will expand on this point after summarizing some mathematical results.
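The qualitative behavior described above can be illustrated numerically. The following sketch (not part of the original paper; all parameter values are invented for illustration) integrates the covariance ODE dC/dt = -ΛC - CΛ + Ω implied by the diffusion postulates, for a pair of individuals who first cohabit, so that Ω has a large off-diagonal entry, and then separate, so that the off-diagonal entry drops to 0.

```python
import numpy as np

def propagate_cov(C0, lam, omega, t, dt=1e-3):
    """Integrate dC/dt = -diag(lam) C - C diag(lam) + Omega by forward Euler.

    This is the covariance ODE implied by the diffusion postulates, with a
    diagonal damping matrix Lambda and infinitesimal covariance matrix Omega.
    """
    L = np.diag(lam)
    C = C0.copy()
    for _ in range(int(t / dt)):
        C = C + dt * (-L @ C - C @ L + omega)
    return C

lam = np.array([1.0, 1.0])                            # hypothetical decay constants
omega_together = np.array([[1.0, 0.8], [0.8, 1.0]])   # cohabiting: correlated increments
omega_apart = np.array([[1.0, 0.0], [0.0, 1.0]])      # separated: uncorrelated increments

C = propagate_cov(np.zeros((2, 2)), lam, omega_together, t=10.0)
# The stationary covariance is Omega_jk / (lambda_j + lambda_k) = 0.8 / 2 = 0.4.
print(C[0, 1])   # converges to about 0.4

C = propagate_cov(C, lam, omega_apart, t=10.0)
print(C[0, 1])   # decays essentially to 0 after separation
```

The two runs show exactly the exponential convergence and decay of covariances described in the text, while the variances stay at their stationary value throughout.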
SUMMARY OF MATHEMATICAL RESULTS
It can be shown that X_t is a Gaussian process provided that some initial X_{t_0} is Gaussian [Jazwinski, 1970; Maybeck, 1979, 1982; Van Kampen, 1981]. Now a Gaussian or multivariate normal random vector is uniquely determined by its mean and covariance. To express these quantities, suppose the n × n matrix function Q(t,s) furnishes the solutions to the initial value problems:

    d/dt Q(t,s) = -Λ(t) Q(t,s),    Q(s,s) = I (identity matrix).

Then it can be shown that

    E(X_t) = Q(t,t_0) E(X_{t_0})    (3)

    Cov(X_t) = Q(t,t_0) Cov(X_{t_0}) Q(t,t_0)* + ∫_{t_0}^t Q(t,s) Ω(s) Q(t,s)* ds,    (4)

where the superscript * denotes matrix transpose. Furthermore,

    Cov(X_t, X_s) = Q(t,s) Cov(X_s)    (5)

for t ≥ s. When Λ(t) is diagonal, then Q(t,s) is also diagonal with jth diagonal entry

    exp[-∫_s^t λ_j(u) du].    (6)
If Λ(t) is not diagonal, then Q(t,s) can be very formidable to evaluate. We will assume from now on that

    E(X_tj) = 0

for all t and j. Even if this does not hold initially, it will hold asymptotically because of the exponential decay displayed in Eqs 3 and 6. Eq 4 written in components is

    Cov(X_tj, X_tk) = exp[-∫_{t_0}^t [λ_j(u) + λ_k(u)] du] Cov(X_{t_0 j}, X_{t_0 k}) + ∫_{t_0}^t exp[-∫_s^t [λ_j(u) + λ_k(u)] du] Ω_jk(s) ds    (7)
for Q diagonal. The covariance in Eq 7 simplifies considerably when the ratio of Ω_jk(s) to λ_j(s) + λ_k(s) is a step function. For instance, suppose t_0 < t_1 < ··· < t_m = t and

    Ω_jk(s) / [λ_j(s) + λ_k(s)] = c_i

whenever t_{i-1} < s < t_i, where c_i is some constant. Setting

    d_i = exp[-∫_{t_i}^t [λ_j(u) + λ_k(u)] du],

Eq 7 becomes
    Cov(X_tj, X_tk) = d_0 Cov(X_{t_0 j}, X_{t_0 k}) + Σ_{i=1}^m c_i (d_i - d_{i-1}).    (8)

Observe that Ω_jk(s)/[λ_j(s) + λ_k(s)] will be a step function whenever Ω_jk(s), λ_j(s), and λ_k(s) are all step functions. The case m = 1 corresponds to this ratio being constant and is of special interest. Then Eq 8 reduces to

    Cov(X_tj, X_tk) = c_1 + d_0 [Cov(X_{t_0 j}, X_{t_0 k}) - c_1].    (9)

Because of the evident exponential decay in Eq 9,

    lim_{t→∞} Cov(X_tj, X_tk) = c_1.    (10)

If

    Cov(X_{t_0 j}, X_{t_0 k}) = c_1

to begin with, then Cov(X_tj, X_tk) is time invariant. These remarks particularly apply when j = k. Finally, observe that Eq 5 yields

    Cov(X_tj, X_sk) = exp[-∫_s^t λ_j(u) du] Cov(X_sj, X_sk)

for t ≥ s, which shows the characteristic exponential decay between present and past trait values.
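The step-function simplification of Eq 8 can be checked against direct numerical integration of Eq 7. In the sketch below (ours, not the paper's; the epoch times, the constant value of λ_j + λ_k, and the constants c_i are all hypothetical), a pair cohabits for one epoch and then separates.

```python
import numpy as np

# Hypothetical history for a pair (j, k): the ratio c_i = Omega_jk / (lambda_j
# + lambda_k) is constant on each epoch; lambda_j + lambda_k = 2 throughout.
lam_sum = 2.0
epochs = [(0.0, 3.0, 0.4),   # together on (0, 3): c_1 = 0.4
          (3.0, 6.0, 0.0)]   # apart on (3, 6):    c_2 = 0.0
cov0 = 0.0                   # Cov(X_{t0 j}, X_{t0 k}) at t_0 = 0
t = epochs[-1][1]

# Closed form (Eq 8): Cov = d_0 cov0 + sum_i c_i (d_i - d_{i-1}),
# with d_i = exp[-(lambda_j + lambda_k)(t - t_i)].
t_pts = [epochs[0][0]] + [e[1] for e in epochs]
d = [np.exp(-lam_sum * (t - ti)) for ti in t_pts]
cov_closed = d[0] * cov0 + sum(c * (d[i + 1] - d[i])
                               for i, (_, _, c) in enumerate(epochs))

# Direct quadrature of Eq 7 for comparison.
s = np.linspace(0.0, t, 200001)
omega_jk = np.where(s < 3.0, 0.4 * lam_sum, 0.0)
integrand = np.exp(-lam_sum * (t - s)) * omega_jk
cov_quad = np.exp(-lam_sum * t) * cov0 + float(np.sum(integrand[:-1] * np.diff(s)))

print(cov_closed, cov_quad)   # the two agree closely
```

The agreement confirms that Eq 8 collapses the integral in Eq 7 to a short sum over epochs, which is what makes the step-function parameterization attractive for numerical work.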
SPECIFICATION OF THE COVARIANCE MATRIX
It simplifies matters to replace Ω(t) by its associated correlation matrix ρ(t) with entries

    ρ_jk(t) = Ω_jk(t) / [Ω_jj(t) Ω_kk(t)]^{1/2}.

ρ(t) is a covariance matrix in its own right. The problem now is to specify the current infinitesimal cohabitation status ρ_jk(t) of j and k. It is hard to imagine circumstances
TABLE I. Possible Equivalence Relations

j ~ k if and only if
1) j = k
2) j and k belong to the same household
3) j and k work or go to school together
4) j and k are genetically identical, eg, MZ twins
5) j and k have the same mother
6) j and k have the same father
7) j and k share both parents
under which ρ_jk(t) ≥ 0 fails. An attractive definition for ρ_jk(t) is the fraction of time that j and k spend together, averaged over some relatively short time interval around the current time t. In general, this will be difficult to measure exactly, and some less detailed way of summarizing the cohabitation status would be useful.
Here is where the partition structures introduced by Lange and Boehnke [1983] come into play. Let ~ define an equivalence relation on the pedigree. Recall that ~ has three properties: 1) j ~ j for all j, 2) j ~ k implies k ~ j, and 3) j ~ k and k ~ m imply j ~ m. An equivalence relation partitions the pedigree into distinct classes or blocks of equivalent people. Table I lists some examples of equivalence relations.
Some of these relations can be a little ambiguous. For instance, someone might have two jobs or a small child might go to neither school nor work. In the second case, it might be reasonable to classify the child and his caretaker in the same block. In the list of Table I, relations 1-3 appear to be the most pertinent to the amount of time two individuals spend together. Corresponding to the ith of these equivalence relations, there is an obvious correlation matrix Θ_i whose entry in row j and column k is 1 or 0 depending on whether j ~ k or not. Just imagine one and the same random variable assigned to each member of a block. Random variables assigned to distinct blocks are independent. It is also true that any convex combination
    Θ = Σ_i r_i Θ_i,    r_i ≥ 0,    Σ_i r_i = 1,

of these matrices is a correlation matrix. This suggests writing

    ρ_jk(t) = r_1 [1 if j = k, 0 otherwise] + r_2 [1 if j and k in same household at time t, 0 otherwise] + r_3 [1 if j and k in same school or workplace at time t, 0 otherwise],    (11)

using equivalence relations 1-3 and nonnegative constants r_1-r_3. Eq 11 makes ρ_jk(t) an easily summarized step function depending on the joint life history of j and k.
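The convex-combination construction behind Eq 11 is easy to sketch in code. In the example below (ours; the four-person pedigree, the block memberships, and the weights r_1-r_3 are all hypothetical), individuals 0 and 1 are parents and 2 and 3 are children who attend school together.

```python
import numpy as np

def partition_matrix(n, blocks):
    """0-1 correlation matrix of an equivalence relation given by its blocks."""
    Q = np.zeros((n, n))
    for block in blocks:
        for j in block:
            for k in block:
                Q[j, k] = 1.0
    return Q

n = 4
identity_rel = partition_matrix(n, [[0], [1], [2], [3]])   # relation 1: j = k
household = partition_matrix(n, [[0, 1, 2, 3]])            # relation 2: one household
school_work = partition_matrix(n, [[0], [1], [2, 3]])      # relation 3: sibs at school

r1, r2, r3 = 0.2, 0.5, 0.3        # nonnegative weights summing to 1
rho = r1 * identity_rel + r2 * household + r3 * school_work

print(np.diag(rho))   # all ones: every diagonal indicator is 1
print(rho[0, 2])      # parent and child share only the household: r2 = 0.5
print(rho[2, 3])      # sibs share household and school: r2 + r3 = 0.8
```

Because each Θ_i is positive semidefinite and the weights are a convex combination, the resulting ρ is automatically a valid correlation matrix.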
It is worth noting that not every correlation matrix with nonnegative entries can be represented as a convex combination of 0-1 partition matrices. The reader can check that

    ( 1              a              (1-a²)^{1/2} )
    ( a              1              0            )
    ( (1-a²)^{1/2}   0              1            ),    0 < a < 1,

furnishes a counterexample when n = 3. Observe that there are five 0-1 partition matrices for n = 3.
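A quick numerical check of this counterexample (the particular value a = 0.6 is an arbitrary choice in (0, 1)): the matrix is positive semidefinite, yet any convex combination matching the zero entry could put weight only on the partitions {1}{2}{3}, {12}{3}, and {13}{2}, forcing weights a and (1-a²)^{1/2} whose sum already exceeds 1.

```python
import numpy as np

a = 0.6
b = np.sqrt(1.0 - a * a)
M = np.array([[1.0, a, b],
              [a, 1.0, 0.0],
              [b, 0.0, 1.0]])

# M is a valid correlation matrix: symmetric, unit diagonal, and its smallest
# eigenvalue is 0 up to rounding (the matrix is singular but positive semidefinite).
print(np.linalg.eigvalsh(M).min())

# Matching M[1,2] = 0 rules out the partitions {23}{1} and {123}; matching the
# remaining entries would force weights a and b on {12}{3} and {13}{2}.
print(a + b > 1.0)   # no valid convex combination exists
```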
The cohabitation history summarized in ρ(t) puts no restrictions on the magnitude of the individual variances Ω_jj(t). There is flexibility here to allow individuals to react more or less strongly to the same random environmental stresses. Ω_jj(t) might conceivably depend on sex or age. If this is true, then heritability might vary from individual to individual.
EQUILIBRIUM AND INITIAL CONDITIONS
Appropriate stationarity conditions are

    Cov(X_tj, X_tk) = Ω_jk(t) / [λ_j(t) + λ_k(t)]

and, when Ω_jk(t), λ_j(t), and λ_k(t) are constant,

    Cov(X_tj, X_tk) = Ω_jk / (λ_j + λ_k)

for all t. In many models, Ω_jj(t) will be constant, but Ω_jk(t), j ≠ k, will not. Thus, variances can remain fixed while covariances change over time.

Long-run behavior gives some clues about appropriate initial distributions. When Λ(t) and the diagonal elements of Ω(t) are constant, the simplest assumptions are stationarity and initial independence for each person j at the time of his birth, say s_0. Hence

    Var(X_{s_0 j}) = Ω_jj / (2λ_j),    Cov(X_{s_0 j}, X_{s_0 k}) = 0 for k ≠ j.

Another possibility is to take X_{s_0 j} = X_{s_0 k} for k the mother of j. A stranger possibility yet is to assume that there is some uterine environment provided by the mother that evolves independently of her own external environment. This would make a child correlated at birth with his sibs but not with his mother. Whatever the precise assumptions about initial conditions, Eq 10 shows that they become less relevant in the presence of large decay constants λ_j(t).
DISCUSSION
When it comes to the question of possible parameters, one is confronted with an embarrassment of riches. For instance, λ_j(t) and Ω_jj(t) could be constants that depend on sex. As another possibility, λ_j(t) and Ω_jj(t) could be step functions of age. This will make Var(X_tj) vary with age unless Ω_jj(t)/λ_j(t) is held constant. If aging makes a person respond more slowly to change, then λ_j(t) should diminish with t. Other possible parameters are the coefficients r_1, r_2, and r_3 that enter into Eq 11. Here the constraint r_1 + r_2 + r_3 = 1 must be obeyed.
Hypotheses about various parameters can be tested by the likelihood ratio criterion. For example, one might wish to test whether the decay constant λ_j is independent of sex. Testing whether a common decay constant λ is 0 is not a reasonable procedure. As mentioned above, all variances can tend to ∞ in this case.
Longitudinal data afford an excellent means of estimating λ_j(t). There is a hazard in doing this, because X_t evolves continuously, and the only way to accommodate measurement error at close times is to force λ_j(t) to be large. Perhaps a separate error variance would mitigate this problem. Longitudinal data also provide an excellent means of distinguishing assortative mating from cohabitation. Married couples should converge and separated couples diverge.
The model lends itself nicely to maximum likelihood methods for estimating parameters [Hopper and Mathews, 1982, 1983; Lange and Boehnke, 1983; Elston and Stewart, 1971; Lange et al, 1976; Moll et al, 1979; Ott, 1979]. Within this context, genetic variance components and mean components can be added to the model. When prime candidates for direct environmental influences are suspected, these can be measured and included as mean components. The environmental and genetic variance components should then accurately capture the residual variation not explained by the mean components. Maximum likelihood techniques afford the opportunity of estimating all these parameters simultaneously. Given parameter estimates, it is then possible to look systematically for outlier pedigrees and outlier individuals and to test empirically the overall appropriateness of the model. Numerical implementation of the model is clearly feasible although harder than for models that are linear in all parameters. Note the relevance of Eq 8 to numerical implementation. Obviously, it will take more experimentation to determine the most useful ways of parameterizing the model. As usual, some balance must be struck among the competing demands of biological realism, model parsimony, and computational feasibility.
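The core of the likelihood machinery can be sketched briefly (this sketch is ours, not the paper's; the two-person covariance matrix, zero mean, and trait values below are invented, and a real analysis would assemble the covariance matrix from the genetic and environmental components discussed above).

```python
import numpy as np

def pedigree_loglik(x, mean, cov):
    """Multivariate normal log-likelihood of one pedigree's trait vector.

    `cov` is the pedigree's assembled covariance matrix; under the model it
    would combine genetic components with the environmental covariances of
    Eqs 7-9, but here it is simply passed in precomputed.
    """
    resid = x - mean
    sign, logdet = np.linalg.slogdet(cov)
    quad = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)

# Toy two-person pedigree with an assumed environmental correlation of 0.4.
cov = np.array([[1.0, 0.4], [0.4, 1.0]])
print(pedigree_loglik(np.array([0.5, -0.2]), np.zeros(2), cov))
```

Summing such terms over independent pedigrees gives the total log-likelihood, which can then be maximized over the decay, variance, and correlation parameters.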
ACKNOWLEDGMENTS
Michael Boehnke, Richard Dudley, and Patricia Moll read various drafts of the manuscript and made many helpful suggestions for improving it. This research was supported by University of California, Los Angeles; Massachusetts Institute of Technology; NIH Research Career Development Award K04 HD 00307; and NIH grant AM33329.
REFERENCES
Cavalli-Sforza LL, Feldman MW (1981): "Cultural Transmission and Evolution: A Quantitative Approach." Princeton, New Jersey: Princeton University Press.
Cloninger CR, Rice J, Reich T (1979): Multifactorial inheritance with cultural inheritance and assortative mating. II. A general model of combined polygenic and cultural inheritance. Am J Hum Genet 31:176-198.
Eaves LJ, Last KA, Young PA, Martin NG (1978): Model-fitting approaches to the analysis of human behavior. Heredity 41:249-320.
Eaves LJ, Long J, Heath AC (1986): A theory of developmental change in quantitative phenotypes applied to cognitive development. Behav Genet 16:143-162.
Elston RC, Stewart J (1971): A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.
Hopper JL, Culross P (1983): Covariation between family members as a function of cohabitation history. Behav Genet 13:459-471.
Hopper JL, Mathews JD (1982): Extensions to multivariate normal models for pedigree analysis. Ann Hum Genet 46:373-383.
Hopper JL, Mathews JD (1983): Extensions to multivariate normal models for pedigree analysis. II. Modeling the effect of shared environments in the analysis of variation in blood lead levels. Am J Epidemiol 117:344-355.
Jazwinski AH (1970): "Stochastic Processes and Filtering Theory." New York: Academic Press.
Karlin S (1979): Models of multifactorial inheritance: II. The covariance structure for a scalar phenotype under selective assortative mating and sex-dependent symmetric parental-transmission. Theor Pop Biol 15:356-393.
Lange K, Boehnke M (1983): Extensions of pedigree analysis. IV. Covariance components models for multivariate traits. Am J Med Genet 14:513-524.
Lange KL, Westlake J, Spence MA (1976): Extensions to pedigree analysis. III. Variance components by the scoring method. Ann Hum Genet 39:485-491.
Maybeck PS (1979, 1982): “Stochastic Models, Estimation and Control, Vols 1 and 2.” New York: Academic Press.
Moll PP, Powsner R, Sing CF (1979): Analysis of genetic and environmental sources of variation in serum cholesterol in Tecumseh, Michigan. V. Variance components estimated from pedigrees. Ann Hum Genet 42:343-354.
Ott J (1979): Maximum likelihood estimation by counting methods under polygenic and mixed models in human pedigrees. Am J Hum Genet 31:161-175.
Province MA, Rao DC (1985): Path analysis of family resemblance with temporal trends: Applications to height, weight, and Quetelet Index in Northeastern Brazil. Am J Hum Genet 37:178-192.
Rao DC, Morton NE, Cloninger CR (1979): Path analysis under generalized assortative mating. I. Theory. Genet Res 33:175-188.
Van Kampen NG (1981): "Stochastic Processes in Physics and Chemistry." Amsterdam: North-Holland.
Edited by James F. Reynolds
J. Math. Biol. (1989) 27:429-450
Journal of Mathematical Biology
© Springer-Verlag 1989
A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters
Mark Kirkpatrick¹ and Nancy Heckman²
¹ Department of Zoology, University of Texas, Austin, TX 78712, USA
² Department of Statistics, University of British Columbia, Vancouver, BC V6T 1W5, Canada
Abstract. Infinite-dimensional characters are those in which the phenotype of an individual is described by a function, rather than by a finite set of measurements. Examples include growth trajectories, morphological shapes, and norms of reaction. Methods are presented here that allow individual phenotypes, population means, and patterns of variance and covariance to be quantified for infinite-dimensional characters. A quantitative-genetic model is developed, and the recursion equation for the evolution of the population mean phenotype of an infinite-dimensional character is derived. The infinite-dimensional method offers three advantages over conventional finite-dimensional methods when applied to this kind of trait: (1) it describes the trait at all points rather than at a finite number of landmarks, (2) it eliminates errors in predicting the evolutionary response to selection made by conventional methods because they neglect the effects of selection on some parts of the trait, and (3) it estimates parameters of interest more efficiently.
Key words: Quantitative genetics - Infinite-dimensional characters - Growth - Morphological shapes - Reaction norms
1. Introduction
Many phenotypic attributes of organisms can be quantified by a single measurement. These include the characters most often studied by quantitative geneticists, such as body weight in animals and crop yield in plants. But other types of characters are intrinsically more complex. One example is the growth trajectory of an organism. A growth trajectory represents an individual as a function that relates the age of an individual to some measure of its size. Since the size of the individual for each different age can be thought of as a different character, and since there are an infinite number of ages, growth trajectories can be thought of as infinite-dimensional characters. Two other examples of infinite-dimensional characters are morphological shapes and norms of reaction. A morphological shape is a curve or a surface in space. The complete description of the shape of a clam shell, say, requires information not just on its length and width, but on the spatial locations of each of the infinite number of points on its surface. A reaction norm is the function that describes what phenotype will be produced by a given genotype in each of a number of environments. Examples include the locomotory performance of an ectotherm as a function of its temperature, and crop yield as a function of soil water potential. When the environmental variable can change in a continuous way, as temperature and water potential can, each genotype's reaction norm is a function consisting of an infinite number of points.
Many evolutionary questions implicitly involve understanding how infinite-dimensional characters evolve. Studies of evolutionary allometry (e.g., Thompson 1917; Huxley 1932; Gould 1977) are concerned with the size relations between parts of morphological shapes as they become larger or smaller. A complete theory of allometry would allow one to predict how shapes change as their overall size evolves. Ecologists and physiologists are interested in reaction norms because they describe the tradeoffs inherent in being ecologically specialized or generalized. A theory for the evolution of reaction norms would specify whether increasing adaptation to one range of environmental conditions will result in increased or decreased adaptation to other environments (e.g., Huey and Hertz 1984).
This paper will introduce methods for analyzing the evolution of infinite-dimensional characters. The approach taken is an extension of standard quantitative genetics (see, e.g., Falconer 1981; Bulmer 1985). The models are phenotypic, in that they are based on observable properties of the population but make no explicit reference to the underlying changes in allele frequencies. We first develop the mathematical notation and methods needed to describe and analyse infinite-dimensional characters. These results are then used to derive a model that predicts the evolutionary change in the mean of an infinite-dimensional character. The infinite-dimensional approach developed here is found to have several advantages over conventional quantitative-genetic methods, including a more complete description of the trait, greater accuracy in predicting the evolutionary response to selection, and increased efficiency in estimating genetic parameters. Applications of this model and methods for estimating its parameters will appear in later publications.
2. Mathematical background
The model we develop relies on concepts from functional analysis and stochastic processes. We review here the basic ideas relevant to our model for readers not familiar with those areas; others may wish to skip to Sect. 3. Introductions to these methods can be found in Reed and Simon (1980, Chaps. 1, 2, and 6) and Doob (1953).
Throughout, we will use growth trajectories as a concrete example to illustrate the ideas, but morphological shapes and reaction norms can be treated in the same framework with appropriate modifications.
2.1. Notation
The growth trajectory of an individual is a function defined by the individual's size through time. We will denote the size of the individual at age x by z(x). The mean growth trajectory, written z̄, is simply defined such that z̄(x) is the average size of individuals in the population at age x. The mean growth trajectory is directly related to the vector of character means used in standard finite-dimensional quantitative genetics: if we consider a finite number of ages for which z̄ is the vector of mean sizes, the mean size of individuals at age x_i is z̄(x_i).
The variation of the growth trajectories of individuals about the mean growth trajectory can be quantified by the covariance function, P. The value of the function P(x_1, x_2) specifies the covariance between the size of a randomly chosen individual at age x_1 and the size of the same individual at age x_2. The value of P(x_1, x_1) is the variance of body size among individuals at age x_1. Defined in this way, the function P is a phenotypic covariance function, as it describes variation and covariation of the growth trajectory phenotypes. A covariance function, which is a bivariate function (that is, a function of two continuous variables), is the analog of the covariance matrix that is widely used in multivariate quantitative genetics and statistics. If P_ij is the phenotypic covariance between size at ages x_i and x_j, then P_ij = P(x_i, x_j). A more rigorous definition of covariance functions is given in Sect. A1 of the Appendix.
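On a finite grid of ages, z̄ and P can be estimated directly from a sample of trajectories. The sketch below (ours, not the paper's; the ages, sample size, mean size, and true covariance function are all invented) simulates hypothetical growth data and forms the empirical mean curve and covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample: growth curves of 500 individuals measured at 5 ages.
ages = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
true_cov = np.exp(-np.abs(ages[:, None] - ages[None, :]) / 8.0)
L = np.linalg.cholesky(true_cov)
sizes = (L @ rng.standard_normal((5, 500))).T + 10.0   # mean size 10 at every age

mean_curve = sizes.mean(axis=0)        # estimate of z_bar at the sampled ages
P_hat = np.cov(sizes, rowvar=False)    # estimate of P(x_i, x_j) on the age grid
print(P_hat[0, 0])                     # close to the true variance at age 1
```

The empirical matrix P_hat plays exactly the role of the finite-dimensional covariance matrix mentioned in the text, with P_hat[i, j] estimating P(x_i, x_j).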
2.2. Algebraic operations
We begin by reviewing basic definitions regarding the multiplication, transposition, and inversion of functions. We will assume here that the arguments for these functions range over the (possibly infinite) interval from a to b, and that for any univariate function z used, the integral ∫_a^b z²(x) dx is finite.
2.2.1. Multiplication. The (inner) product of two univariate functions y and z is the scalar:

    y^T z = ∫_a^b y(ξ) z(ξ) dξ.    (1)

Two functions are said to be orthogonal if their inner product is zero. Multiplying a bivariate function A and a univariate function z produces the univariate function Az, with

    (Az)(x) = ∫_a^b A(x, ξ) z(ξ) dξ.    (2)
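In computations these operations reduce to quadrature. The following sketch (ours; the grid, the trapezoid rule, and the test functions are arbitrary choices) implements Eqs (1) and (2) on [a, b] = [0, 1].

```python
import numpy as np

# Trapezoid-rule weights on a uniform grid over [a, b] = [0, 1].
x = np.linspace(0.0, 1.0, 2001)
w = np.full_like(x, x[1] - x[0])
w[0] *= 0.5
w[-1] *= 0.5

def inner(y, z):
    """Inner product y^T z = integral of y(xi) z(xi) d(xi), Eq (1)."""
    return float(np.sum(w * y * z))

def apply_op(A, z):
    """(A z)(x) = integral of A(x, xi) z(xi) d(xi), Eq (2), on the grid."""
    return A @ (w * z)

y = np.sin(np.pi * x)
z = np.cos(np.pi * x)
print(inner(y, y))   # integral of sin^2(pi x) over [0, 1] is 1/2
print(inner(y, z))   # sin and cos of the same frequency are orthogonal

# Applying the Brownian-motion covariance min(x1, x2) to the constant function 1:
A = np.minimum.outer(x, x)
print(apply_op(A, np.ones_like(x))[-1])   # integral of min(1, xi) d(xi) = 1/2
```

Discretized this way, univariate functions become vectors, bivariate functions become matrices, and Eqs (1) and (2) become weighted dot products and matrix-vector products.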
2.2.2. Transposition. The transpose of a bivariate function A is written A^T and is defined such that

    A^T(x_1, x_2) = A(x_2, x_1).    (3)

We also allow for the transpose of a univariate function, although transposition leaves the values of a univariate function unchanged.
2.2.3. Inversion. The inverse, A^{-1}, of the operation described by Eq. (2) is a rule that associates with each univariate function Az its preimage, z. That is,

    A^{-1}Az = z,    (4)

for all univariate functions z. Not all covariance functions have inverses; those which cannot be inverted are said to be singular. Further discussion of inverses is given in Sect. A3 of the Appendix.
2.3. Gaussian distributions
A central assumption of quantitative genetics is that characters are Gaussian (i.e., normally) distributed in a population. This assumption can be extended to infinitedimensional characters in a natural way, and will be used in the following section in the development of a quantitative genetic model.
Let z_i be a set of functions, for example the growth trajectories of individuals in a population, where the subscript i denotes the ith individual in the population. These functions are said to be Gaussian distributed if, when we choose any finite set of ages x_1, x_2, ..., x_k and evaluate z_i at those points, the resulting values are distributed as a k-variate normal (Parzen 1962). If the growth trajectories in a population are Gaussian distributed, then the sizes of individuals at any given age will be univariate normally distributed. (The converse, however, is not true: a normal distribution of sizes at each age taken separately is not sufficient to guarantee that the functions maintain a Gaussian distribution.) In an empirical study of an infinite-dimensional trait, the investigator is free to transform the data so that they meet the requirements of normality (see Wright 1968). A Gaussian distribution of functions is completely determined by its mean function (e.g., the mean growth trajectory z̄) and its covariance function (e.g., the phenotypic covariance function P). Section A2 of the Appendix shows that given an arbitrary z̄ and P, a corresponding Gaussian distribution exists with this mean and covariance, provided P satisfies some very general conditions.
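This finite-dimensional characterization also gives a direct recipe for simulating Gaussian-distributed functions: evaluate the mean and covariance functions on a grid of ages and draw from the corresponding multivariate normal. A sketch (ours; the linear mean trajectory and squared-exponential covariance function are invented examples):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 101)

mean = 2.0 + 3.0 * x                                   # hypothetical mean trajectory
cov = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.02)   # hypothetical covariance function

# Evaluating a Gaussian-distributed function at k ages yields a k-variate
# normal; a Cholesky factor (with a small jitter for numerical rank
# deficiency) turns iid standard normals into correlated trajectories.
L = np.linalg.cholesky(cov + 1e-8 * np.eye(len(x)))
draws = mean + (L @ rng.standard_normal((len(x), 3))).T
print(draws.shape)   # (3, 101): three sampled growth trajectories on the grid
```

Each row of `draws` is one realization of the random function, jointly Gaussian at every finite set of ages, as the definition requires.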
2.4. Eigenfunctions
A covariance function can be decomposed into its component eigenfunctions and eigenvalues just as a covariance matrix can be written in terms of its eigenvectors and eigenvalues. The eigenfunctions of a covariance function A are defined (Parzen 1962) as the functions ψ_i that satisfy the relation

    Aψ_i = λ_i ψ_i,

that is,

    ∫_a^b A(x, ξ) ψ_i(ξ) dξ = λ_i ψ_i(x),    (5)

where ψ_i(ξ) is not simultaneously zero for all values of ξ. The scalar number λ_i is known as the eigenvalue associated with the eigenfunction ψ_i. The eigenfunctions, which are orthogonal to each other, play the same role in infinite-dimensional theory that principal components do in standard quantitative genetics and statistics. A useful fact from the spectral theorem for linear operators (Lyusternik and Sobolev 1968) is that a covariance function can be rewritten in terms of an eigenfunction expansion:

    A(x_1, x_2) = Σ_{i=1}^∞ λ_i ψ_i(x_1) ψ_i(x_2).    (6)

Necessary conditions on the covariance function are given in the Appendix (Sect. A3).
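Discretizing Eq (5) on a grid turns the integral operator into a matrix whose eigenvalues and eigenvectors approximate the λ_i and ψ_i. As a numerical check (our example, not the paper's), the Brownian-motion covariance function min(x_1, x_2) on [0, 1] has known eigenvalues 1/[(i - 1/2)²π²].

```python
import numpy as np

x = np.linspace(0.0, 1.0, 401)
h = x[1] - x[0]
A = np.minimum.outer(x, x)       # Brownian-motion covariance min(x1, x2)

# The discretized operator A*h has eigenvalues approximating the lambda_i of
# Eq (5); its eigenvectors approximate the eigenfunctions psi_i on the grid.
vals = np.linalg.eigvalsh(A * h)[::-1]   # largest first

exact = 1.0 / ((np.arange(1, 4) - 0.5) ** 2 * np.pi ** 2)
print(vals[:3])   # close to the exact leading eigenvalues
print(exact)
```

The leading discrete eigenvalues match the analytic ones to within the discretization error, illustrating how the principal-component analogy carries over to functions.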
2.5. Orthogonal function expansions
It is often very useful for both theoretical and empirical analyses of infinite-dimensional characters to represent functions in terms of their orthogonal function expansions. This method allows infinite-dimensional data to be approximated by a finite number of data, and provides algorithms for performing the algebraic operations described in Sect. 2.2. This is accomplished by: (1) transforming the functions (e.g., mean growth trajectories and covariance functions) into matrices, (2) manipulating these matrices using the conventional methods of linear algebra, and (3) reverse-transforming the resulting matrices into functions.
The method depends on orthogonal functions. A set of functions $\phi_i(x)$ is said to be orthogonal if $\phi_i$ is orthogonal to $\phi_j$ (that is, their inner product is zero) whenever $i \neq j$. It is convenient to use orthogonal functions that have been normalized so that the inner product of $\phi_i$ with itself is equal to one for all $i$.
An orthogonal basis is a set of orthogonal functions with the property that any univariate function $\mathfrak{z}$ (such as a mean function) can be written

$$\mathfrak{z}(x) = \sum_{i=1}^{\infty} [c_z]_i\,\phi_i(x), \qquad (7)$$
where the $[c_z]_i$ are coefficients (see, e.g., Abramowitz and Stegun 1965). The coefficients $[c_z]_i$ are uniquely determined once the function $\mathfrak{z}(x)$ and the set of orthogonal functions $\phi_i(x)$ have been specified. These coefficients are calculated from the relation

$$[c_z]_i = \phi_i^{\mathrm{T}} \mathfrak{z} = \int \mathfrak{z}(\xi)\,\phi_i(\xi)\,d\xi. \qquad (8)$$
434 M. Kirkpatrick and N. Heckman
The $[c_z]_i$'s constitute the entries of a vector $c_z$ that has an infinite number of elements. The vector $c_z$ is referred to as the coefficient vector associated with the function $\mathfrak{z}$. To repeat, the coefficient vector depends on the choice of orthogonal functions as well as on the function $\mathfrak{z}$.
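The coefficient-vector calculation of Eqs. (7) and (8) can be sketched numerically. The fragment below is our illustration, not part of the original paper: it uses normalized shifted Legendre polynomials on [0, 1] as the orthogonal basis, a hypothetical quadratic mean growth trajectory, and Gauss-Legendre quadrature for the integral in Eq. (8).

```python
import numpy as np
from numpy.polynomial.legendre import legval, leggauss

def phi(i, x):
    """Orthonormal shifted Legendre polynomial on [0, 1]."""
    c = np.zeros(i + 1); c[i] = 1.0
    return np.sqrt(2 * i + 1) * legval(2 * x - 1, c)

def coeff_vector(z, n_terms, n_quad=200):
    """Eq. (8): [c_z]_i = integral of z(s) phi_i(s) ds, by quadrature."""
    nodes, weights = leggauss(n_quad)
    x = 0.5 * (nodes + 1.0)          # map quadrature nodes from [-1, 1] to [0, 1]
    w = 0.5 * weights
    return np.array([np.sum(w * z(x) * phi(i, x)) for i in range(n_terms)])

# A hypothetical mean growth trajectory (illustrative only)
zbar = lambda x: 1.0 + 2.0 * x - x**2
c_z = coeff_vector(zbar, 5)

# Eq. (7): reconstruct the trajectory from its coefficient vector
xs = np.linspace(0, 1, 11)
recon = sum(c_z[i] * phi(i, xs) for i in range(5))
assert np.allclose(recon, zbar(xs), atol=1e-10)
```

Because the example trajectory is a low-degree polynomial, the five-term partial sum of Eq. (7) reproduces it essentially exactly.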
A covariance function, which is a function of two variables, can be expanded in a similar way:
$$\mathcal{A}(x_1, x_2) = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} [C_A]_{ij}\,\phi_i(x_1)\,\phi_j(x_2). \qquad (9)$$
The coefficients $[C_A]_{ij}$ are calculated using the relation

$$[C_A]_{ij} = \phi_i^{\mathrm{T}} \mathcal{A}\,\phi_j = \int\!\!\int \phi_i(\xi_1)\,\mathcal{A}(\xi_1, \xi_2)\,\phi_j(\xi_2)\,d\xi_1\,d\xi_2. \qquad (10)$$
The coefficients form the elements of a symmetric matrix $C_A$ which has an infinite number of elements. This is referred to as the coefficient matrix associated with the function $\mathcal{A}$. Notice that the eigenfunction expansion of Eq. (6) is a special case of the orthogonal function expansion (9) in which the off-diagonal coefficients (those for which $i \neq j$) of the coefficient matrix are zero.
These relations are useful because they allow us to perform the algebraic operations described in Sect. 2.2 by working with the vectors and matrices of coefficients. While the full expansion of a univariate or bivariate function generates a vector or matrix of infinite dimensions, the problem is made tractable by truncating the vectors and matrices to finite dimensions. A univariate function can, under quite general conditions, be approximated to arbitrary accuracy by a polynomial or a trigonometric series of finite degree (the Weierstrass and Fourier theorems; Apostol 1975). The functions therefore can be approximated by partial expansions in terms of orthogonal polynomials or trigonometric functions (the first few terms of the right-hand sides of Eqs. (7) and (9)), which then generate finite matrices of coefficients that can be handled by conventional matrix methods. The results from this procedure depend not only on the functions themselves, but also on the choice of the family of orthogonal functions used in the expansion. Bounds on the size of error introduced by the choice of orthogonal functions can be determined by analytic techniques.
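The truncation procedure for a bivariate function can be sketched in the same way. The code below is an illustrative sketch, assuming the same normalized shifted Legendre basis; the covariance function is hypothetical (it happens to have the form of Eq. (20c) below), and the double integral of Eq. (10) is evaluated by quadrature.

```python
import numpy as np
from numpy.polynomial.legendre import legval, leggauss

def phi(i, x):
    """Orthonormal shifted Legendre polynomial on [0, 1]."""
    c = np.zeros(i + 1); c[i] = 1.0
    return np.sqrt(2 * i + 1) * legval(2 * x - 1, c)

def coeff_matrix(A, n, n_quad=64):
    """Eq. (10): [C_A]_ij = double integral of phi_i(s) A(s, t) phi_j(t)."""
    nodes, wts = leggauss(n_quad)
    x, w = 0.5 * (nodes + 1.0), 0.5 * wts     # quadrature on [0, 1]
    P = np.array([phi(i, x) for i in range(n)])
    return (P * w) @ A(x[:, None], x[None, :]) @ (P * w).T

# A hypothetical smooth covariance function (same form as Eq. (20c))
A = lambda s, t: np.exp(-2.0 * (s - t) ** 2)
CA = coeff_matrix(A, 10)

# The truncated expansion (9) reconstructs A closely at arbitrary points
s, t = 0.3, 0.7
approx = sum(CA[i, j] * phi(i, s) * phi(j, t)
             for i in range(10) for j in range(10))
assert abs(approx - A(s, t)) < 1e-3
```

The 10 x 10 coefficient matrix here stands in for the infinite matrix $C_A$; because the kernel is smooth, the truncation error is already small at this order.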
Several uses of this method are summarized in the following, which can be proven directly using the orthogonality property. In the following, $C_A$ will be the coefficient matrix associated with the covariance function $\mathcal{A}$, $C_B$ with $\mathcal{B}$, and so forth. We will assume that a complete set of orthogonal functions $\phi_i$ has been specified.
2.5.1. Addition and multiplication. The coefficient matrix of the sum of two functions $\mathcal{A}$ and $\mathcal{B}$ is equal to the sum of the coefficient matrix of $\mathcal{A}$ and the coefficient matrix of $\mathcal{B}$. Therefore

$$(\mathcal{A} + \mathcal{B})(x_1, x_2) = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} [C_A + C_B]_{ij}\,\phi_i(x_1)\,\phi_j(x_2). \qquad (11)$$
Similarly, the coefficient vector for the sum of two univariate functions is equal to the sum of their coefficient vectors. The product of two functions can be determined likewise: the coefficient matrix (or vector) of the product of two functions is equal to the matrix product of their respective coefficient matrices (or vectors).
2.5.2. Eigenfunctions. The coefficient vectors for the eigenfunctions $\psi_j$ of a function $\mathcal{A}$ are equal to the corresponding eigenvectors of the coefficient matrix of $\mathcal{A}$. If we denote the $i$th element of the $j$th eigenvector of $C_A$ as $[c_\psi]_{ij}$, then

$$\psi_j(x) = \sum_{i=1}^{\infty} [c_\psi]_{ij}\,\phi_i(x). \qquad (12)$$
The corresponding eigenvalues $\lambda_i$ of the bivariate function $\mathcal{A}$ and of its coefficient matrix $C_A$ are equal.
This result gives a very useful algorithm for finding approximations to the eigenfunctions of a covariance function: (1) the function is approximated by a truncated expansion using Eq. (10) to produce the coefficient matrix $C_A$, (2) the eigenvectors and eigenvalues of the coefficient matrix are determined using standard methods of linear algebra, and (3) the eigenvectors are used in Eq. (12) to produce approximations to the eigenfunctions of the original covariance function.
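A minimal sketch of this three-step algorithm, using a toy covariance function whose eigenstructure is known exactly (it is built from the first two basis functions, with assumed eigenvalues 2 and 1), so the output can be checked against the truth:

```python
import numpy as np
from numpy.polynomial.legendre import legval, leggauss

def phi(i, x):
    """Orthonormal shifted Legendre polynomial on [0, 1]."""
    c = np.zeros(i + 1); c[i] = 1.0
    return np.sqrt(2 * i + 1) * legval(2 * x - 1, c)

# Toy covariance: A(s,t) = 2*phi_0(s)phi_0(t) + 1*phi_1(s)phi_1(t),
# so its eigenvalues are 2 and 1 and its eigenfunctions are phi_0, phi_1.
A = lambda s, t: 2 * phi(0, s) * phi(0, t) + 1 * phi(1, s) * phi(1, t)

# Step 1: coefficient matrix of the covariance function (Eq. (10))
nodes, wts = leggauss(64)
x, w = 0.5 * (nodes + 1), 0.5 * wts
n = 4
P = np.array([phi(i, x) for i in range(n)])
CA = (P * w) @ A(x[:, None], x[None, :]) @ (P * w).T

# Step 2: eigendecomposition of the coefficient matrix
evals, evecs = np.linalg.eigh(CA)
evals, evecs = evals[::-1], evecs[:, ::-1]        # sort descending

# Step 3: Eq. (12) turns each eigenvector back into an eigenfunction
psi_1 = lambda xx: sum(evecs[i, 0] * phi(i, xx) for i in range(n))

assert np.allclose(evals[:2], [2.0, 1.0], atol=1e-8)
assert abs(abs(psi_1(0.5)) - 1.0) < 1e-6          # recovers phi_0 up to sign
```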
This concludes the development of the notation and methods used to describe the evolution of infinite-dimensional characters. We now turn to a model for the evolutionary change in the mean phenotype in a population.
3. A quantitative genetic model
The standard quantitative genetic theory for the simultaneous evolution of multiple characters (Magee 1965; Lande 1979; Falconer 1981) can be extended in a straightforward way to infinite-dimensional characters. Here we develop a dynamic model for the evolution of the mean function. Applications of this model to data will be considered in a later paper.
3.1. Assumptions
A fundamental concept of quantitative genetics is that the phenotype of an individual can be defined as the sum of an additive genetic component and an uncorrelated nonadditive-genetic component. Applying this concept to infinite-dimensional characters, the size of an individual in the population at age $x$ can be written
$$\mathfrak{z}(x) = g(x) + e(x), \qquad (13)$$
where g is the additive genetic component and e the residual nonadditive component of the individual's phenotype. The components g and e are assumed
to be Gaussian (normally) distributed in the population, consistent with models of polygenic inheritance (Lande 1979; Bulmer 1985). At the outset of a generation (i.e. before selection acts), $g$ is distributed in the population with mean function $\bar{g}$ and covariance function $\mathcal{G}$, while $e$ is distributed with mean function $\bar{e}(x) = 0$ for all $x$ and covariance function $\mathcal{E}$. The phenotypes in the population therefore are distributed with a mean growth trajectory $\bar{z}$ and a phenotypic covariance function $\mathcal{P}$ such that $\mathcal{P}(x_1, x_2) = \mathcal{G}(x_1, x_2) + \mathcal{E}(x_1, x_2)$.
Equation (13) is equivalent to the statistical decomposition of an individual's phenotype that is standard in quantitative genetics. The value of g(x) is the individual's additive genetic effect or "breeding value" for size at age x; that is, its average contribution to the size of its offspring at age x if it were mated at random to a large number of other individuals in the population (Falconer 1981; Bulmer 1985). The value of e(x) is the residual component of the individual's size at age x, caused by the effects of the environment and genetic dominance. By considering an individual's size at each different age as a different character, Eq. (13) is seen to be equivalent to the standard statistical model for multiple quantitative characters (see Lande 1979, p. 405).
The description of an individual's phenotype can be applied to characters other than growth trajectories. Consider a simple morphological shape, such as the outline of an insect's wing. By establishing a landmark in the interior of the wing as the origin for a set of polar coordinates, the shape of the wing can be described by the radial distance from the origin to the wing margin as a function of angle. Angle then assumes the role of age as the argument in Eq. (13). In the case of reaction norms, an environmental variable such as temperature takes that role. While the form of selection acting on the reaction norms will depend on whether the trait is fixed during development or varies continuously throughout life, the patterns of inheritance can be described using the same framework. Thus the inheritance and evolution of growth trajectories, morphology, and reaction norms can be treated using these infinite-dimensional methods.
We will assume that the population size is sufficiently large that the effects of random genetic drift are negligible, that generations are discrete and nonoverlapping, and that within each generation reproduction occurs after selection has finished.
3.2. Evolutionary dynamics of the mean
Selection determines the set of individuals that survive and reproduce to constitute the next generation. Under the standard assumptions of quantitative genetics (specifically, that mutation and recombination do not alter the mean breeding value of a population, and that the additive genetic component is Gaussian distributed), the mean phenotype in the next generation is equal to the mean additive genetic value of the breeding individuals (see Bulmer 1985). This fact is used in the Appendix (Sect. A4) to show that the evolutionary change in the mean phenotype between generation $t$ and generation $t + 1$ is
$$\Delta\bar{z}_t = \bar{z}_{t+1} - \bar{z}_t = \mathcal{G}\,\mathcal{P}^{-1}\mathfrak{s}_t, \qquad (14)$$
where $\bar{z}_t$ is the mean growth trajectory among individuals at the start of generation $t$ (before selection acts). The function $\mathfrak{s}_t$ is the selection differential, which is defined (Falconer 1981) as the difference in the mean phenotype of individuals before and after selection within generation $t$. Denoting by $\bar{z}_t^*$ the mean growth trajectory of individuals that survive selection and reproduce, the selection differential is
$$\mathfrak{s}_t(x) = \bar{z}_t^*(x) - \bar{z}_t(x). \qquad (15)$$
Equation (14) can be thought of as the linear regression of the mean growth trajectory phenotype among the survivors onto the additive genetic value of those phenotypes. It is the infinite-dimensional analog of the standard matrix equation for the joint evolutionary change of a finite number of discrete characters (Magee 1965; Lande 1979; Falconer 1981):
$$\Delta\bar{z}_t = G\,P^{-1} s_t, \qquad (16)$$
where $\Delta\bar{z}_t$ is the vector of change between generations $t$ and $t + 1$ in the means of the characters, $G$ is the additive genetic covariance matrix, $P^{-1}$ is the inverse of the phenotypic covariance matrix, and $s_t$ is the vector of selection differentials for the characters in generation $t$.
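A small numerical illustration of Eq. (16), using hypothetical covariance matrices for three characters (the values are ours, chosen only for illustration), shows how selection on a single character produces correlated change in the others:

```python
import numpy as np

# Hypothetical 3-character example of Eq. (16): delta_z = G P^{-1} s
G = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.5],
              [0.2, 0.5, 1.0]])          # additive genetic covariance matrix
E = np.diag([0.5, 0.5, 0.5])            # residual covariance (assumed)
P = G + E                               # phenotypic covariance matrix
s = np.array([0.3, 0.0, 0.0])           # selection differential on character 1 only

delta_z = G @ np.linalg.solve(P, s)

# Selection on character 1 alone still changes characters 2 and 3
# through the genetic covariances.
assert abs(delta_z[1]) > 1e-6 and abs(delta_z[2]) > 1e-6
```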
The selection differential does not directly reflect the action of selection on the phenotypes because of phenotypically correlated responses. That is, even when selection is not acting directly on the size at age $x$, the corresponding selection differential $\mathfrak{s}_t(x)$ in general will not be zero because selection acting on the phenotypes at other ages will generate a selection differential at $x$ through the phenotypic correlations. It is therefore convenient to use the selection gradient (Lande and Arnold 1983) to visualize the way in which selection is acting. The selection gradient for infinite-dimensional characters is defined as
$$\beta = \mathcal{P}^{-1}\mathfrak{s};$$

that is,

$$\beta(x) = \sum_{i=1}^{\infty} \left(\lambda_i^{-1}\,\psi_i^{\mathrm{T}}\mathfrak{s}\right)\psi_i(x), \qquad (17)$$
where the $\lambda_i$'s and $\psi_i$'s are the eigenvalues and eigenfunctions of $\mathcal{P}$. We have now dropped the subscript $t$ for convenience. In order for this series to converge we require that $\sum_i (\psi_i^{\mathrm{T}}\mathfrak{s}/\lambda_i)^2$ be finite (meaning that the strength of selection is finite). The selection gradient $\beta$ is a measure of the force of directional selection. A positive value of $\beta(x)$ implies selection favors larger than average individuals at age $x$, whereas a negative value implies smaller individuals are favored. With this notation, the per-generation change in the mean growth trajectory can be written simply as
$$\Delta\bar{z} = \mathcal{G}\beta, \qquad (18)$$

which is clearly analogous to the matrix expression for the finite-dimensional case, $\Delta\bar{z} = G\beta$ (Lande and Arnold 1983). Making use of the eigenfunction
expansion (Eq. (6)), Eq. (18) can also be written:

$$\Delta\bar{z} = \sum_{i=1}^{\infty} \alpha_i\,(\varphi_i^{\mathrm{T}}\beta)\,\varphi_i, \qquad (19)$$

where the $\alpha_i$'s and $\varphi_i$'s are the eigenvalues and eigenfunctions of the genetic covariance function $\mathcal{G}$. If the selection gradient has components that correspond to eigenfunctions of $\mathcal{G}$ for which there is very little or no genetic variance (that is, the associated eigenvalue $\alpha_i$ is very small or zero), then the inner product $\varphi_i^{\mathrm{T}}\beta$ that appears in Eq. (19) will be small or zero. In this situation, there will be little or no evolutionary response towards those particular deformations of the mean growth trajectory.
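The point about eigenfunctions with little or no genetic variance has a simple finite-dimensional analogue, sketched below with a hypothetical singular $G$ matrix: a selection gradient lying along the zero-eigenvalue eigenvector produces no response at all.

```python
import numpy as np

# Finite-dimensional illustration of Eq. (19). The matrix is hypothetical:
# singular G, with eigenvalues 2 and 0.
G = np.array([[1.0, 1.0],
              [1.0, 1.0]])
evals, evecs = np.linalg.eigh(G)          # ascending: evals = [0, 2]

beta = evecs[:, 0]                        # gradient along the zero-eigenvalue direction
delta_z = G @ beta
assert np.allclose(delta_z, 0.0)          # no response despite nonzero selection

beta2 = evecs[:, 1]                       # gradient along the eigenvalue-2 direction
assert np.allclose(G @ beta2, 2.0 * beta2)  # full response, scaled by the eigenvalue
```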
These results assume that $\mathcal{P}$ is not singular. The Appendix (Sect. A5) generalizes these equations to cases in which $\mathcal{P}$ is singular (and provides a similar result for the finite-dimensional case in which $P$ is singular).
3.3. Comparison to conventional matrix methods
Geneticists who have studied infinite-dimensional characters have discretized them by taking measurements at landmark points along the continuum. For example, in their study of genetic variation in mouse growth, Riska et al. (1985) measured individuals at 9 different ages in their development. These data were then treated with standard methods designed for analysing a finite number of discrete characters. For example, in order to characterize the genetic variation in the growth trajectory, these workers calculated the eigenvectors and eigenvalues (that is, the principal components and their loadings) for the 9 × 9 genetic covariance matrix they obtained from the data. The results developed above suggest that the growth trajectories might alternatively be analysed by infinite-dimensional methods. The underlying covariance function can be estimated from the data matrix by fitting orthogonal functions to the entries in the matrix. The eigenfunctions and eigenvalues then can be calculated from the estimated covariance function using the methods described earlier.
The infinite-dimensional alternative offers three advantages over the conventional approach. The first is the obvious point that the infinite-dimensional method produces a description at every point along the continuum of the character, interpolating between the landmark points at which the data were actually obtained. This by itself is a minor advantage, since some form of curve-fitting can be used to interpolate between the points of a finite-dimensional analysis.
The second advantage appears when predicting how the mean of a population will evolve in response to selection. If selection is actually acting at all points on an infinite-dimensional character (for example, at all ages of a growth trajectory), the finite-dimensional formula for the evolution of the mean (Eq. (16)) will produce an inaccurate prediction even for the landmark points at which the measurements were taken. The problem is that there will be correlated response from selection at points along the trait that are not included in the
analysis. This introduces errors for the same reason that selection on characters left out of the analysis of multivariate selection on a finite set of characters does (Lande and Arnold 1983; Turelli 1985). Thus an approach like the one developed here seems necessary. The infinite-dimensional method predicts the evolutionary response to selection by interpolating the parameters of selection and inheritance between the landmark points, taking into account the spacing of those points (e.g., the ages at which growth measurements are taken). This should often result in a more accurate estimate of the response to selection than does the conventional, finite-dimensional approach.
The third advantage of the infinite-dimensional method involves statistical efficiency. Given data on more and more points along the continuum of the character, the finite- and infinite-dimensional analyses will asymptotically converge on the same results. An important question is which method converges more rapidly. While we have not yet completely answered this question, numerical results suggest that the infinite-dimensional method may prove to be substantially better than the finite-dimensional method in this regard. We offer the following numerical example.
Imagine that we want to describe the relative amounts of additive genetic variation associated with the first and second eigenfunctions of the growth trajectories in a population. (This is equivalent to comparing the loadings on the first and second principal components of the genetic covariance matrix.) The relative efficiency of the two methods can be compared with the following exercise. A "true" covariance function is assumed over a finite square interval. Next, a lattice of $n^2$ evenly spaced points is laid down over the square, and the covariance function is evaluated at these points. This produces an $n \times n$ data matrix which we treat as the data that would have been produced by an error-free experiment that measured the character at $n$ evenly spaced landmark points (for example, $n$ evenly spaced ages along a growth trajectory). These data then can be analysed by both the finite- and infinite-dimensional methods for comparison.
We have performed this exercise numerically using 4 particular covariance functions:
$$\mathcal{G}(x_1, x_2) = \cos(\pi\,|x_1 - x_2|) + 1, \qquad (20a)$$
$$\mathcal{G}(x_1, x_2) = \operatorname{sech}[3(x_1 - x_2)], \qquad (20b)$$
$$\mathcal{G}(x_1, x_2) = \exp[-2(x_1 - x_2)^2], \qquad (20c)$$
$$\mathcal{G}(x_1, x_2) = \exp[-2(x_1 - x_2)^4], \qquad (20d)$$
where $x_1$ and $x_2$ range from 0 to 1. These functions were evaluated with $n = 5$, 9, 17, and 33 points. In order to compare the two approaches, we examined the estimates they gave of the first two eigenvalues. For the finite-dimensional analysis, the eigenvalues of the resulting data matrices for each $n$ were calculated using a standard computer algorithm. For the infinite-dimensional analysis, two-dimensional Legendre polynomials were fit to the data matrices to produce an estimate of $\mathcal{G}$, the covariance function. We used polynomials of degree $n - 1$,
Fig. 1. Two largest eigenvalues of the covariance functions of Eqs. (20a-d) plotted against the number of data points, $n$, as estimated by the finite-dimensional method (open squares) and the infinite-dimensional method (closed squares)
which allowed the estimate of $\mathcal{G}$ to pass through each of the data points. Eigenvalues were then calculated from the coefficient matrix of the polynomials, as described above (Sect. 2.5.2).
The results are shown in Fig. 1. It is clear that the infinite-dimensional method converges much more rapidly to an asymptotic value, which we presume to be the actual eigenvalue. We analysed several additional cases in which the sample points (ages) $x_i$ are unevenly spaced, using the same covariance functions.
Those results show an even greater advantage to the infinite-dimensional method. This advantage may result from the fact that the infinite-dimensional method takes into account the spacing of the sample points when the orthogonal functions are fit to the covariance function, whereas the finite-dimensional method ignores not only the spacing but even the ordering of the points. Based on these numerical analyses, we speculate that the infinite-dimensional method may be generally superior to the finite-dimensional method for estimating eigenvalues.
We also compared the two methods with regard to how rapidly they converge on estimates of the eigenfunctions. In contrast to the results from the eigenvalues, the two methods appear to estimate the eigenfunctions with similar efficiency.
These numerical results do not prove the infinite-dimensional method is always or even generally more efficient, but they are suggestive. The success of the method depends upon using a set of orthogonal functions to fit the data that span the space of functions that is spanned by the underlying covariance function. If covariance functions that appear in biological data are reasonably smooth (as seems likely), then smooth orthogonal functions, such as polynomials, may work well. Clearly this question needs further study.
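The flavor of the comparison above can be reproduced in a few lines. The sketch below uses the covariance function (20c). As stand-ins for the paper's exact procedures (which are not fully specified here), the "finite-dimensional" estimate takes the largest eigenvalue of the raw lattice matrix scaled by the number of points, while the "infinite-dimensional" estimate discretizes the integral operator by Gauss-Legendre quadrature (a Nystrom approximation in place of the Legendre polynomial fit). Both discretizations are our assumptions, not the authors' code.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

# Covariance function (20c): G(x1, x2) = exp[-2 (x1 - x2)^2] on [0, 1]
G = lambda s, t: np.exp(-2.0 * (s - t) ** 2)

def top_eig_uniform(n):
    """Largest eigenvalue of the n x n lattice matrix, crudely scaled by 1/n."""
    x = np.linspace(0.0, 1.0, n)
    return np.linalg.eigvalsh(G(x[:, None], x[None, :]) / n)[-1]

def top_eig_gauss(n):
    """Nystrom estimate: symmetrized weighted kernel at Gauss-Legendre nodes."""
    nodes, wts = leggauss(n)
    x, w = 0.5 * (nodes + 1.0), 0.5 * wts
    K = np.sqrt(w)[:, None] * G(x[:, None], x[None, :]) * np.sqrt(w)[None, :]
    return np.linalg.eigvalsh(K)[-1]

ref = top_eig_gauss(60)                 # high-order reference value
err_uniform = abs(top_eig_uniform(9) - ref)
err_gauss = abs(top_eig_gauss(9) - ref)
assert err_gauss < err_uniform          # quadrature converges faster here
```

With 9 points the quadrature-based estimate is already essentially converged for this smooth kernel, while the naively scaled lattice estimate is not, echoing the pattern in Fig. 1.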
4. Discussion
Having developed this model for the evolution of infinite-dimensional characters, it is now appropriate to assess the limitations of this approach. There are two general categories of questions: those general to Gaussian quantitative-genetic models, and those specific to this particular model.
This model is a direct extension of standard quantitative-genetic models for the evolution of a finite number of characters, and as such it shares their strengths and their weaknesses. Two questions are frequently raised about this class of models. The first concerns the linearity of the response to selection. An important outcome of the assumption that additive genetic effects and nonadditive effects are both normally (Gaussian) distributed is that the magnitude of the evolutionary response is a linear function of the force of selection in the preceding generations (see Eqs. (14) and (18)). The assumption of normality for the distributions of $g$ and $e$ is a major assumption. On the phenotypic level, it is known that the distribution of many univariate characters is either approximately normal or can be rendered so by a suitable transformation of the data (Wright 1968). Although this is consistent with the normality assumptions, it does not show directly that the constituent additive and nonadditive effects are each normal. The models are, however, supported by experimental results from short-term selection experiments on single characters that typically match the model predictions reasonably well (Falconer 1981). A more serious potential problem is the extrapolation of the normality assumption to multiple characters. The assumption of multivariate normality is far more stringent than that of univariate normality, and has not been systematically tested with suitably large data sets.
The normality assumption has been justified on theoretical as well as empirical grounds. Several models based on detailed assumptions concerning the action of mutation, recombination, and selection lead to a normal distribution of additive genetic effects at the level of the character (as opposed to the effects at individual loci) (Fisher 1918; Kimura 1965; Bulmer 1985; Lande 1980; Falconer 1981; Barton and Turelli 1987). Other kinds of genetic models, however, do not, and instead produce a nonlinear response to selection (Robertson 1977; Bulmer 1980; Barton and Turelli 1987). Even when the distribution of allelic effects at individual loci departs from normality, however, approximate normality of the overall additive genetic effects (as we have assumed here) can be maintained if there are a large number of loosely linked loci contributing to the trait (Bulmer 1980; Turelli 1986).
To conclude our discussion of the normality assumption, we note that it has some support from both empirical and theoretical studies, but certainly deserves further evaluation. Even if the normality assumption is violated, however, our model may produce a reasonable approximation to the true evolutionary dynamics.
Confusion has arisen from the criticism that some quantitative-genetic models fail to correctly account for evolution of the genetic variances and covariances. Under certain sets of assumptions, the variance-covariance structure remains approximately constant (Bulmer 1985; Lande 1980). This is not true, however, for quantitative-genetic models based on other genetic assumptions (Barton and Turelli 1987). We choose to avoid this issue, and make no claims or assumptions regarding the dynamics of the genotypic and phenotypic covariance functions $\mathcal{G}$ and $\mathcal{P}$. So long as the assumptions of the model are met, the dynamic equations for the evolution of the mean phenotype (Eqs. (14) and (18)) hold. Given empirical information or a model for the evolution of the covariance functions, new values for $\mathcal{G}$ and $\mathcal{P}$ can be used in each generation. Even if they do not remain constant, under quite general conditions the variance structure is likely to evolve slowly relative to the mean, and so assuming constancy of $\mathcal{G}$ and $\mathcal{P}$ may give reasonably accurate predictions for several to many generations.

A question specific to the infinite-dimensional model that naturally arises is
whether it is possible to substitute conventional finite-dimensional methods for those developed in this paper. We find there are three arguments that favor using the infinite-dimensional methods whenever the trait of interest is inherently infinite-dimensional. First, these methods give a complete description of the character at all points along its continuum, rather than at fixed landmark points alone. Second, the infinite-dimensional method leads to a correct prediction of the response to selection where the finite-dimensional method does not, because the latter neglects the effects of selection on all points of the trait other than the landmark points. Third, numerical examples suggest that the infinite-dimensional method may be substantially more efficient in estimating parameters of interest from finite data sets. Since growth, shape, and other infinite-dimensional characters are of such widespread interest to biologists, the development of methods such as these for describing their variation and predicting their evolution is an important goal.
Acknowledgments. We are very grateful to M. Bulmer, J. Felsenstein, C. Pease, S. Sawyer, M. Slatkin, and B. Walsh for discussions. We thank N. Barton, D. Lofsvold, T. Nagylaki, T. Price, M. Turelli, S. Via, M. Wade, and two anonymous reviewers for comments on earlier drafts of the paper. This research was supported by N.S.F. Grant BSR8604743 to M.K. and N.S.E.R.C. Grant A7969 to N.H.
Appendix
This Appendix develops five points. Section A1 gives the definition of a covariance function. Section A2 discusses the existence and integration of gaussian processes, and Sect. A3 develops the conditions for the existence of the inverse of the operator associated with a covariance function. Section A4 applies these results to our genetic model. Finally, Sect. A5 extends the genetic model to cases in which the phenotypic covariance matrix or function is singular.
The following assumptions will be made throughout. The arguments of all functions (either univariate or bivariate) lie in some interval, which may be infinite (e.g., the positive real line). A univariate function $u$ is called an $L^2$ function if $\int u^2(x)\,dx$ is finite. The function $u$ need not be continuous. Two $L^2$ functions, $u_1$ and $u_2$, are considered to be the same if $\int (u_1(x) - u_2(x))^2\,dx = 0$, which can occur even if $u_1$ does not equal $u_2$ for, say, a finite collection of $x$'s. Thus the value of an $L^2$ function is not uniquely defined for each $x$. In the same way, we say that a sequence of $L^2$ functions, $u_n$, converges to an $L^2$ function, $u$, if $\int (u_n(x) - u(x))^2\,dx$ converges to zero. This does not imply that $u_n(x)$ converges to $u(x)$ for all $x$. In what follows, we will differentiate between $L^2$ equivalence/convergence and pointwise equivalence/convergence for each $x$ through the phrases "in $L^2$", or "for all $x$" and "pointwise".
A1. Definition of a continuous covariance function
A continuous covariance function $\mathcal{P}$ is any bivariate function with

$\mathcal{P}$ symmetric, i.e. $\mathcal{P}(x_1, x_2) = \mathcal{P}(x_2, x_1)$ for all $x_1, x_2$,

$\mathcal{P}$ positive semidefinite, i.e. $u^{\mathrm{T}}\mathcal{P}u \geq 0$ for all univariate $L^2$ functions $u$, (A1)

$\mathcal{P}$ continuous.
Except for the continuity condition, this is analogous to the definition of a covariance matrix (Rao 1973).
A2. Existence and integration of gaussian processes
Let $\bar{z}$ be an $L^2$ function and $\mathcal{P}$ be a continuous covariance function as defined above. Then there exists a Gaussian process $\mathfrak{z}$ with covariance function $\mathcal{P}$ and mean function $\bar{z}$. (See Doob 1953, Theorem 3.1 of Chap. 2.)
In Sect. A3 we will require that expressions of the form $u^{\mathrm{T}}\mathfrak{z} = \int u(\xi)\,\mathfrak{z}(\xi)\,d\xi$, with $u$ in $L^2$, can be mathematically defined. There are two ways of doing this. In the first method one fixes a "realization", $\mathfrak{z}^*$, of $\mathfrak{z}$ and requires that $\int u(\xi)\,\mathfrak{z}^*(\xi)\,d\xi$ be finite. This will hold for all $\mathfrak{z}^*$ (except for a collection of realizations which has a zero probability of occurring) provided

$$\int \mathcal{P}(\xi, \xi)\,d\xi \text{ is finite.} \qquad (A2)$$
(See Doob 1953, Chap. 2, Theorems 2.5 and 2.7.) In the second method, one views $u^{\mathrm{T}}\mathfrak{z} = \int u(\xi)\,\mathfrak{z}(\xi)\,d\xi$ as a random variable which "exists" provided its variance is finite. This will hold for a particular $u$ if

$$u^{\mathrm{T}}\mathcal{P}u = \int\!\!\int u(\xi_1)\,\mathcal{P}(\xi_1, \xi_2)\,u(\xi_2)\,d\xi_1\,d\xi_2 \text{ is finite,}$$

and will hold for all $u$ in $L^2$ provided

$$\int\!\!\int [\mathcal{P}(\xi_1, \xi_2)]^2\,d\xi_1\,d\xi_2 \text{ is finite.} \qquad (A3)$$
(See Davis 1977, Proposition 2.3.6.)
A3. The inverse of the operator associated with a covariance function
Let $\mathcal{P}$ be a continuous covariance function and $g$ a univariate function. The goal of this section is to specify conditions on $\mathcal{P}$ and $g$ that guarantee that the equation

$$\mathcal{P}u = g \qquad (A4)$$

has a solution $u$ in $L^2$, with (A4) holding pointwise. First, sufficient conditions on $\mathcal{P}$ and $g$ will be given that guarantee that (A4) holds in $L^2$, and then additional conditions will be given that guarantee that (A4) holds pointwise. The solution $u$ will be denoted $\mathcal{P}^{-1}g$.
Assume that $\mathcal{P}$ satisfies (A1), (A3), and

if $\mathcal{P}f$ = zero function, then $f$ = zero function. (A5)
Let $\psi_i$ be the eigenfunctions of $\mathcal{P}$, with corresponding eigenvalues $\lambda_i$, as defined in Sect. 2.4. Since by definition an eigenfunction is never the zero function, condition (A5) implies that none of the eigenvalues are zero. This fact alone would, in the finite-dimensional matrix case, guarantee a solution to (A4). More is needed, however, in our infinite-dimensional setting. Conditions (A1), (A3), and (A5) are sufficient to guarantee that the eigenfunctions of $\mathcal{P}$ form an orthogonal basis for the collection of all $L^2$ functions (by Sect. 3.2 and a modification of the argument in the example on p. 131 of Lyusternik and Sobolev 1968). Thus any $L^2$ function $g$ can be written in terms of the orthogonal eigenfunctions $\psi_i$:
$$g(x) = \sum_{i=1}^{\infty} [c_g]_i\,\psi_i(x),$$
with convergence of the series holding in $L^2$. (Note that this is a special case of Eq. (7), with the coefficients $[c_g]_i$ given by Eq. (8).) Given that (A3) holds,

$$\mathcal{P}(x_1, x_2) = \sum_{i=1}^{\infty} \lambda_i\,\psi_i(x_1)\,\psi_i(x_2) \text{ in } L^2,$$

where the $\lambda_i$'s are the eigenvalues of $\mathcal{P}$ as given in Eq. (6). The solution to (A4) is then given by

$$u(x) = \sum_{i=1}^{\infty} ([c_g]_i / \lambda_i)\,\psi_i(x) \text{ in } L^2. \qquad (A6)$$
Unfortunately, this sum may not always converge, and we require that

$$\sum_{i=1}^{\infty} ([c_g]_i / \lambda_i)^2 \text{ be finite.} \qquad (A7)$$
Under this assumption the solution to (A4) exists and is given by (A6).

For the second part of the argument, suppose that $\mathcal{P}$ is a continuous covariance function satisfying (A1) and (A3) and that $g$ and $u$ are univariate functions with $\mathcal{P}u = g$ in $L^2$. We seek conditions on $\mathcal{P}$ and $g$ so that we can say that the equality holds pointwise. If we assume that $g$ is continuous, then we need only find conditions on $\mathcal{P}$ so that the function $\mathcal{P}u$ can be defined pointwise, with this pointwise definition being continuous. Assume that
$$\int [\mathcal{P}(x, \xi)]^2\,d\xi \text{ is finite for all } x, \qquad (A8.a)$$

and, for all $x_0$, there exists a constant $C = C(x_0)$ with

$$|\mathcal{P}(x, y)| \leq |C(x_0)\,\mathcal{P}(x_0, y)| \qquad (A8.b)$$

for all $x$ in some neighborhood of $x_0$ and for all $y$. (Conditions (A8.a) and (A8.b) automatically hold if $\mathcal{P}$ is continuous and we restrict ourselves to $x$ and $y$ in a closed and bounded interval.) By the Hölder inequality and condition (A8.a), $\mathcal{P}(x, y)\,u(y)$ is an integrable function of $y$, and thus we can define $\mathcal{P}u$ pointwise in a natural way. Conditions (A8), the continuity of $\mathcal{P}$, and the dominated convergence theorem guarantee that this natural definition of $\mathcal{P}u$ is continuous.
A4. The genetic model
Suppose that the phenotype of an individual drawn at random from the entire population is given by the gaussian random process $\mathfrak{z}$, with

$$\mathfrak{z}(x) = g(x) + e(x),$$

where $g$ and $e$ are independent gaussian processes. As described in Sect. 3.1, $g$ is the additive genetic component and $e$ the residual nonadditive-genetic component of the individual's phenotype. The expectation of $\mathfrak{z}$ is $\bar{z}$ (the population mean function), and the expectation of $e$ is the zero function. The covariance functions of $\mathfrak{z}$, $g$, and $e$ are $\mathcal{P}$, $\mathcal{G}$, and $\mathcal{E}$, respectively, with $\mathcal{P} = \mathcal{G} + \mathcal{E}$. Assume
446 M. Kirkpatrick and N. Heckman
that these covariance functions satisfy (A1), and in addition that 𝒢 satisfies (A8) and 𝒫 satisfies (A2), (A3), (A5), and (A8).
Let z̄* be the average phenotype of an infinite subset of individuals from the population, and let ḡ* be the average additive genetic component of those individuals. We assume that the selected group of individuals breed randomly among themselves. A standard assumption of quantitative genetics is that the genetic processes of mutation and recombination do not alter the mean genetic component of a population from one generation to the next. The mean genetic component among the offspring therefore is equal to that of the breeding adults. Thus the mean phenotype of the offspring is simply the expected value of ḡ* in the selected group when z̄*(x) is known for all values of x. This expected value will be denoted E[ḡ*(x) | z̄*]. To prove Eq. (14) we must show that

E[\bar{g}^*(x) \mid \bar{z}^*] = \bar{z}(x) + (\bar{z}^* - \bar{z})^{\mathrm{T}}\, \mathcal{P}^{-1} \mathcal{G}_x,

where 𝒢ₓ(y) = 𝒢(x, y) and 𝒫⁻¹ denotes the inverse operator described in Sect. A3. By an argument in Parzen (1962), Sect. 3,

E[\bar{g}^*(x) \mid \bar{z}^*] = \bar{z}(x) + u_x^{\mathrm{T}}(\bar{z}^* - \bar{z}), \tag{A9}

where uₓ is any L² function with

\mathcal{P}u_x(y) = \mathcal{G}_x(y) \quad \text{for all } y.

Different values of x, however, will give rise to different functions uₓ. For notational convenience, we drop the subscript from uₓ from this point on. Since 𝒢 satisfies (A8), 𝒢ₓ can be viewed as a univariate L² function of y when x is fixed. Thus, by the results of Sect. A3, for a given x the required function u exists provided

\sum_{i=1}^{\infty} ([c_{\mathcal{G}_x}]_i/\lambda_i)^2 \quad \text{is finite,} \tag{A10}

where the λᵢ are the eigenvalues of 𝒫 corresponding to the eigenfunctions φᵢ and

[c_{\mathcal{G}_x}]_i = \int \mathcal{G}(x, \xi)\, \phi_i(\xi)\, d\xi.
Writing

u(y) = \sum_{i=1}^{\infty} ([c_{\mathcal{G}_x}]_i/\lambda_i)\, \phi_i(y) = \mathcal{P}^{-1}\mathcal{G}_x(y)

completes the proof.
Actually, if we view E[ḡ*(x) | z̄*] as a random variable, rather than as a particular realization of a random variable, assumption (A2) on 𝒫 can be dropped, and condition (A10) can be weakened slightly to the requirement that

\sum_{i=1}^{\infty} [c_{\mathcal{G}_x}]_i^2/\lambda_i \quad \text{be finite.} \tag{A11}

This follows since the expression (z̄* − z̄)ᵀ𝒫⁻¹𝒢ₓ, when viewed as a random
Evolution of infinite-dimensional characters 447
variable, can be mathematically defined provided its variance is finite. This variance is simply

\mathrm{Var}[u^{\mathrm{T}} z] = \mathrm{Var}\Big[ \int u(\xi)\, z(\xi)\, d\xi \Big]
= \int\!\!\int u(\xi_1)\, \mathrm{Cov}[z(\xi_1), z(\xi_2)]\, u(\xi_2)\, d\xi_1\, d\xi_2.

Using the definition of u and the fact that the φᵢ's are the eigenfunctions of 𝒫, this variance may be written

\mathrm{Var}[u^{\mathrm{T}} z] = \int\!\!\int u(\xi_1)\, \mathcal{P}(\xi_1, \xi_2) \sum_{i=1}^{\infty} ([c_{\mathcal{G}_x}]_i/\lambda_i)\, \phi_i(\xi_2)\, d\xi_1\, d\xi_2
= \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} ([c_{\mathcal{G}_x}]_i/\lambda_i)\, \lambda_i\, ([c_{\mathcal{G}_x}]_j/\lambda_j) \int \phi_i(\xi)\, \phi_j(\xi)\, d\xi
= \sum_{i=1}^{\infty} [c_{\mathcal{G}_x}]_i^2/\lambda_i,

which we require to be finite in (A11).
A5. Singular covariance functions and matrices

In the preceding we assumed that the covariance function 𝒫 (or the covariance matrix P, in the finite-dimensional case) can be inverted; that is, that 𝒫 is nonsingular. Since there is no mathematical or biological reason to justify this assumption, the question naturally arises as to what the evolutionary dynamics are when 𝒫 is singular. We generalize the genetic model to cases in which the phenotypic covariance structure is singular. We will first work with the conventional finite-dimensional case, then sketch the analogous proof for the infinite-dimensional case.
We will establish that when the phenotypic covariance matrix P is singular, the evolutionary change in the vector of trait means is given by the equation

\Delta\bar{\mathbf{z}} = \mathbf{G}\, \mathbf{P}^{-}\, \bar{\mathbf{s}}. \tag{A12}

This is analogous to Eq. (16), with the difference that the inverse P⁻¹ is replaced by P⁻, a "generalized inverse" of P. A generalized inverse of P is defined (Rao and Mitra 1971; Rao 1973, Sects. 1b.5 and 4a.3) as a matrix with the property

\mathbf{P}\, \mathbf{P}^{-}\, \mathbf{P} = \mathbf{P}. \tag{A13}

In the special case that P is nonsingular, the generalized inverse P⁻ is unique and equals the proper inverse P⁻¹. The generalized inverse is no longer unique
whenever P is singular. From the definition (A13) it can be verified that a generalized inverse of P is

\mathbf{P}^{-} = \sum_i (1/\lambda_i)\, \mathbf{p}_i \mathbf{p}_i^{\mathrm{T}}, \tag{A14}

where λᵢ is the ith eigenvalue and pᵢ the ith eigenvector of P, and the summation extends over only those values of i for which λᵢ is nonzero. This provides an algorithm for calculating a generalized inverse P⁻.
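The spectral construction (A14) is easy to carry out numerically. The sketch below uses illustrative matrices, not data from the paper: it builds a singular P = G + E from positive semidefinite G and E sharing a null vector, forms P⁻ from the nonzero eigenvalues, and checks the defining property (A13) together with the identity G P⁻ P = G that the argument following (A14) establishes.

```python
import numpy as np

# Hedged sketch of Eq. (A14): a generalized inverse from the spectral
# decomposition, summing only over nonzero eigenvalues.
# G and E below are illustrative stand-ins, not from the paper.

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 2))
B[3] = 0.0                         # force a shared null direction
G = B @ B.T                        # rank-2 genetic covariance (PSD)
E = np.diag([1.0, 2.0, 0.5, 0.0])  # singular environmental covariance
P = G + E                          # singular phenotypic covariance

lam, vecs = np.linalg.eigh(P)
Pminus = np.zeros_like(P)
for li, p in zip(lam, vecs.T):
    if li > 1e-10 * lam.max():     # sum only over nonzero eigenvalues
        Pminus += np.outer(p, p) / li

assert np.allclose(P @ Pminus @ P, P)   # defining property (A13)
assert np.allclose(G @ Pminus @ P, G)   # G P- P = G, since null(P) in null(G)
```

This particular P⁻ is the Moore-Penrose pseudoinverse, one member of the family of generalized inverses satisfying (A13).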
We now verify Eq. (A12). From Eq. (A9) we see that the evolutionary change in the vector of means can be written

\Delta\bar{\mathbf{z}} = \mathbf{U}(\bar{\mathbf{z}}^* - \bar{\mathbf{z}}), \tag{A15}

where U is any matrix that satisfies the relation

\mathbf{U}\mathbf{P} = \mathbf{G}. \tag{A16}

Thus (A12) is established if it can be shown that

\mathbf{U} = \mathbf{G}\mathbf{P}^{-} \tag{A17}

satisfies (A16). The definition of a generalized inverse given by (A13) implies that

\mathbf{P}^{-}\mathbf{P} = \mathbf{I} + \mathbf{R}, \tag{A18}

where I is the identity matrix and R is a matrix that is orthogonal to P (that is, PR = 0). Substituting (A17) into the left-hand side of (A16) and making use of (A18), we see that

\mathbf{G}\mathbf{P}^{-}\mathbf{P} = \mathbf{G}(\mathbf{I} + \mathbf{R}). \tag{A19}

The fact that R is orthogonal to P implies that it is also orthogonal to G. This can be seen by letting the vector Rᵢ equal the ith column of R. Recalling that P = G + E, we have

\mathbf{R}_i^{\mathrm{T}}\mathbf{P}\mathbf{R}_i = \mathbf{R}_i^{\mathrm{T}}\mathbf{G}\mathbf{R}_i + \mathbf{R}_i^{\mathrm{T}}\mathbf{E}\mathbf{R}_i. \tag{A20}

The left-hand side of (A20) vanishes because P and R are orthogonal. Since G and E are positive semidefinite, both terms on the right-hand side of (A20) must vanish for their sum to vanish. This shows that Rᵢ is orthogonal to G for any value of i, and consequently the product GR must vanish. Thus (A19) can be written

\mathbf{G}\mathbf{P}^{-}\mathbf{P} = \mathbf{G}, \tag{A21}

which shows that (A17) satisfies (A16), and so establishes (A12).
We now sketch the analogous proof for the infinite-dimensional case. The notation and assumptions of Sect. A4 hold, with the exception of assumption (A5). Suppose that 𝒫's eigenfunctions and associated eigenvalues have been reordered and relabelled so that the φᵢ and λᵢ refer to eigenfunctions and eigenvalues with λᵢ nonzero, and the ψⱼ's denote eigenfunctions associated with zero eigenvalues.
Paralleling (A14), for each x define

\mathcal{P}^{-}\mathcal{G}_x(t) = \sum_i (1/\lambda_i)\, [c_{\mathcal{G}_x}]_i\, \phi_i(t), \tag{A22}

with the equality holding in the variable t in L². Since the φᵢ's and ψⱼ's form a basis for L², one can expand 𝒢ₓ in a series:

\mathcal{G}_x(t) = \sum_i [c_{\mathcal{G}_x}]_i\, \phi_i(t) + \sum_j b_j\, \psi_j(t),

where bⱼ is the coefficient of ψⱼ. Since 𝒢 and ℰ are nonnegative definite and 𝒫ψⱼ = 0, by an argument analogous to that following (A20), 𝒢ψⱼ = 0, and hence every coefficient bⱼ vanishes. Thus

\mathcal{G}_x(t) = \sum_i [c_{\mathcal{G}_x}]_i\, \phi_i(t).

This holds pointwise in x and t, by continuity of 𝒢 and 𝒫. Therefore, Eq. (14) can be generalized to the case of a singular phenotypic covariance function 𝒫 to give

\Delta\bar{z} = \mathcal{G}\mathcal{P}^{-}\bar{s}. \tag{A23}

Typically one wishes to focus attention on the quantity 𝒫⁻s̄. This is given by

\mathcal{P}^{-}\bar{s}(y) = \beta(y) = \sum_i ([c_{\bar{s}}]_i/\lambda_i)\, \phi_i(y). \tag{A24}

For β to be defined for a particular s̄, we require (as in Sect. 3.2) that Σᵢ ([c_s̄]ᵢ/λᵢ)² be finite.
References

Abramowitz, M., Stegun, I. A.: Handbook of mathematical functions. New York: Dover 1965
Apostol, T. M.: Mathematical analysis, 2nd edn. Reading, Mass.: Addison-Wesley 1975
Barton, N. H., Turelli, M.: Adaptive landscapes, genetic distance and the evolution of quantitative characters. Genet. Res. 49, 157-173 (1987)
Bulmer, M. G.: The mathematical theory of quantitative genetics. Oxford: Oxford University Press 1985
Davis, M. H. A.: Linear estimation and stochastic control. London: Chapman and Hall 1977
Doob, J. L.: Stochastic processes. New York: Wiley 1953
Falconer, D. S.: Introduction to quantitative genetics, 2nd edn. New York: Longman 1981
Fisher, R. A.: The correlation between relatives on the supposition of Mendelian inheritance. Trans. Royal Soc. Edinburgh 52, 399-433 (1918)
Gould, S. J.: Ontogeny and phylogeny. Cambridge, Mass.: Belknap 1977
Huey, R. B., Hertz, P. E.: Is a jack-of-all-temperatures a master of none? Evolution 38, 441-444 (1984)
Huxley, J.: Problems of relative growth. London: MacVeagh 1932
Kimura, M.: A stochastic model concerning the maintenance of genetic variability in quantitative characters. Proc. Nat. Acad. Sci. 54, 731-736 (1965)
Lande, R.: Quantitative genetic analysis of multivariate evolution, applied to brain:body size allometry. Evolution 33, 402-416 (1979)
Lande, R.: The genetic covariance between characters maintained by pleiotropic mutations. Genetics 94, 203-215 (1980)
Lande, R., Arnold, S. J.: The measurement of selection on correlated characters. Evolution 37, 1210-1226 (1983)
Lyusternik, L. A., Sobolev, V. J.: Elements of functional analysis. New York: Unger 1968
Magee, W. T.: Estimating response to selection. J. Anim. Sci. 24, 242-247 (1965)
Parzen, E.: An approach to time series analysis. Ann. Math. Stat. 32, 951-989 (1962)
Rao, C. R., Mitra, S. K.: Generalized inverse of matrices and its applications. New York: Wiley 1971
Rao, C. R.: Linear statistical inference and its applications. New York: Wiley 1973
Reed, M., Simon, B.: Methods of modern mathematical physics: I. Functional analysis, 2nd edn. New York: Academic Press 1980
Riska, B., Atchley, W. R., Rutledge, J. J.: A genetic analysis of targeted growth in mice. Genetics 107, 79-101 (1984)
Robertson, A.: The non-linearity of offspring-parent regression. In: Pollak, E., Kempthorne, O., Bailey, T. B. (eds.) Proceedings Int. Conference on Quantitative Genetics, pp. 297-306. Ames: Iowa State University Press 1977
Thompson, D. W.: On growth and form. Cambridge: Cambridge University Press 1917
Turelli, M.: Effects of pleiotropy on predictions concerning mutation-selection balance for polygenic traits. Genetics 111, 165-195 (1985)
Turelli, M.: Gaussian versus non-gaussian genetic analyses of polygenic mutation-selection balance. In: Karlin, S., Nevo, E. (eds.) Evolutionary processes and theory, pp. 607-628. New York: Academic Press 1986
Wright, S.: Evolution and the genetics of populations, vol. 1. Genetic and biometrical foundations. Chicago: University of Chicago Press 1968

Received January 18, 1988/Revised April 3, 1989
Copyright © 1990 by the Genetics Society of America
Analysis of the Inheritance, Selection and Evolution of Growth Trajectories
Mark Kirkpatrick,* David Lofsvold*,¹ and Michael Bulmer†
*Department of Zoology, University of Texas, Austin, Texas 78712, and †Department of Statistics, Oxford University, Oxford OX1 3TG, England
Manuscript received March 15, 1989
Accepted for publication December 18, 1989
ABSTRACT
We present methods for estimating the parameters of inheritance and selection that appear in a quantitative genetic model for the evolution of growth trajectories and other "infinite-dimensional" traits that we recently introduced. Two methods for estimating the additive genetic covariance function are developed, a "full" model that fully fits the data and a "reduced" model that generates a smoothed estimate consistent with the sampling errors in the data. By decomposing the covariance function into its eigenvalues and eigenfunctions, it is possible to identify potential evolutionary changes in the population's mean growth trajectory for which there is (and those for which there is not) genetic variation. Algorithms for estimating these quantities, their confidence intervals, and for testing hypotheses about them are developed. These techniques are illustrated by an analysis of early growth in mice. Compatible methods for estimating the selection gradient function acting on growth trajectories in natural or domesticated populations are presented. We show how the estimates for the additive genetic covariance function and the selection gradient function can be used to predict the evolutionary change in a population's mean growth trajectory.
A predictive theory for the evolutionary response of growth trajectories to selection is an important goal of both evolutionary biologists and breeders. Evolutionary biologists are interested in growth trajectories because of their impact on morphology, size-mediated ecological interactions, and life-history characters (e.g., EBENMAN and PERSSON 1988). Animal and plant breeders are concerned with growth trajectories because of the potential to increase the economic value of domesticated species by altering growth patterns through artificial selection (e.g., FITZHUGH 1976). Since the sizes of individuals of the same age in a population typically vary in a quantitative (continuous) manner, it has long been recognized that quantitative genetics provides appropriate methods for the study of the inheritance and evolution of growth trajectories.
We have recently extended the classical quantitative model for the evolution of multiple characters to "infinite-dimensional" traits such as growth trajectories, in which the phenotype of an individual is represented by a continuous function (KIRKPATRICK 1988; KIRKPATRICK and HECKMAN 1989). In those earlier studies, we assumed the parameters of inheritance and selection were known quantities. Our goal in this paper is to develop methods for estimating those parameters and to show how they can be used to
¹ Present address: Department of Biology, Franklin and Marshall College, P.O. Box 3003, Lancaster, Pennsylvania 17604.
The publication costs of this article were partly defrayed by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Genetics 124: 979-993 (April, 1990)
analyze the evolution of a population's mean growth trajectory. While the example we discuss deals with body size, the methods apply to any ontogenetic process. More generally, the infinite-dimensional method can be extended to other kinds of traits in which an individual's phenotype is a continuous function, such as reaction norms and morphological shapes, and so may be of use in a variety of biological contexts. An analysis of several data sets using these methods, and a discussion of the evolutionary implications of the results, is planned for a later publication.
The infinite-dimensional model is motivated by the fact that growth trajectories do not immediately fit into the framework of conventional quantitative genetics, which treats the evolution of a finite number of traits. This is because growth trajectories are continuous functions of time, so that a trait in an individual requires an infinite rather than finite number of measurements to describe fully. The infinite-dimensional model offers several advantages over earlier attempts to adapt quantitative genetics to growth trajectories (KIRKPATRICK and HECKMAN 1989). First, it predicts the evolution of the full growth trajectory (rather than at a set of landmark ages) without making a priori assumptions about the family of curves that are evolutionarily possible. Second, it provides a method for analyzing patterns of genetic variation that reveals potential evolutionary changes in the growth trajectory for which there is, and for which there is not, substantial genetic variation. Third, the method appears to have reduced biases in the estimates of the genetic variation (and therefore of the
980 M. Kirkpatrick, D. Lofsvold and M. Bulmer
response to selection) when compared with the alternative approaches. Two additional advantages appear from the methods presented in this paper: the spacing of the ages at which the data are collected is correctly accounted for (even when the spacing is uneven), and it allows one to project the evolution of the growth trajectory even when the data on selection and inheritance are collected at two different sets of ages.
We will begin with a brief review of the infinite-dimensional model, then turn to the problem of estimating the parameters of inheritance. To make the ideas concrete, we will illustrate the genetic methods using a subset of the data of RISKA, ATCHLEY and RUTLEDGE (1984) on the genetics of growth in ICR randombred mice. In a detailed study, these workers measured 2693 individuals at weekly intervals between ages 2 weeks and 10 weeks in a half-sib breeding design. For the sake of simplicity, we will use only their data on male body weight at ages 2, 3 and 4 weeks in the following. Next the estimation of the parameters of selection is treated. Last, we show how the estimates of the genetic and selection parameters can be used to project the evolution of the population's mean growth trajectory.
Some of the statistical methods developed in this paper can involve a substantial amount of computation. Computer programs for these operations are available from the first author.
THE INFINITE-DIMENSIONAL MODEL
The mean size of unselected individuals in a cohort through time is referred to as the cohort's mean growth trajectory and is denoted by the function ȳ. Thus the value of ȳ(a) is simply the expected size of individuals at age a in the absence of selection. Selection within a given generation generally will cause the observed mean size of individuals to differ from the mean growth trajectory, and also will produce an evolutionary change in the mean growth trajectory between that generation and the next.
The evolutionary change in ȳ can be determined by extending the standard theory of quantitative genetics to infinite-dimensional characters (KIRKPATRICK and HECKMAN 1989). The growth trajectory of an individual can be thought of as the sum of two continuous functions. The first of these represents the additive genetic component of the growth trajectory inherited from the individual's parents. The second component is attributable to environmental effects, such as nutrition, and to genetic dominance. The additive and nonadditive components are defined to be independent of each other and are assumed to be multivariate normally distributed in the population. This assumption is standard in quantitative genetic models of multiple characters. The normality of genetic effects is consistent with a variety of forms of genetic variation at the individual loci involved, provided the number of loci is moderate to large and linkage is loose (BULMER 1985, Chap. 8; BARTON and TURELLI 1989). When genetic effects are not normal, it may be possible to transform the scale of measurement to one in which they are (for example, by taking logarithms) (WRIGHT 1968, Chap. 10; FALCONER 1981, Chap. 17). Last, we assume that the growth trajectory is autosomally inherited, that the effects of random genetic drift, mutation, epistasis, and recombination on the mean growth trajectory are negligible compared with selection, and that generations are non-overlapping.
When selection acts on the sizes of individuals, the evolutionary dynamics of the mean growth trajectory are described by the equation

\Delta\bar{y}(a) = \int_{a_{\min}}^{a_{\max}} \mathcal{G}(a, x)\, \beta(x)\, dx, \tag{1}

where Δȳ(a) is the evolutionary change in the mean size of individuals of age a following a single generation of selection, 𝒢 is the additive genetic covariance function, and β is the selection gradient function (KIRKPATRICK and HECKMAN 1989). Equation 1 can be modified to accommodate situations in which selection acts directly on growth rate rather than size per se; see LYNCH and ARNOLD (1988).
The additive genetic covariance function 𝒢 plays the same role in the evolution of growth trajectories that the additive genetic covariance matrix does in the standard theory of quantitative characters (see LANDE 1979). The value of 𝒢(a₁, a₂) is the additive genetic covariance for size between individuals measured at age a₁ and those same individuals measured at age a₂. The selection gradient function β is a measure of the forces of directional selection acting on body size (LANDE and ARNOLD 1983). The magnitude of β(a) reflects the strength of directional selection acting on body size at age a. A negative value of β(a) indicates selection favors smaller size, while a positive value indicates the converse.
Equation 1 predicts the evolutionary change across only a single generation. In general, it is possible that both the strength of selection and the genetic variation will change from generation to generation. This does not present a problem for Equation 1, however, since new values can be used in each generation. This information can come either from direct estimation of the parameters or from genetic and ecological models that predict how they will change through time. We discuss methods for direct estimation below; theoretical approaches are reviewed by BARTON and TURELLI (1989) and BULMER (1989).
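In practice, Equation 1 can be applied by evaluating 𝒢 and β on a grid of ages and approximating the integral by quadrature. The sketch below uses an illustrative covariance function and selection gradient (not estimates from any data set) purely to show the mechanics.

```python
import numpy as np

# Hedged sketch of Equation 1: predict the change in the mean growth
# trajectory by quadrature over ages. The covariance function and
# selection gradient are illustrative stand-ins, not estimates.

a = np.linspace(2.0, 4.0, 101)                  # ages (weeks)

def G_cov(a1, a2):
    # a smooth, positive-definite stand-in for G(a1, a2)
    return 400.0 * np.exp(-((a1 - a2) ** 2) / 2.0)

def beta(x):
    # directional selection: against size early, for size late
    return 0.01 * (x - 3.0)

# Delta ybar(a) = integral G(a, x) beta(x) dx, trapezoid rule
w = np.full_like(a, a[1] - a[0])
w[0] *= 0.5
w[-1] *= 0.5
A1, A2 = np.meshgrid(a, a, indexing="ij")
dz = (G_cov(A1, A2) * beta(A2)) @ w             # one value per age a

# the gradient chosen above shifts the trajectory down early, up late
assert dz[0] < 0 and dz[-1] > 0
```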
Predicting the evolutionary dynamics of the mean growth trajectory thus requires estimating the parameters of inheritance, described by 𝒢, and of selection, described by β. In the next three sections, we discuss
Growth Trajectories 981
[Figure 1: the estimated additive genetic covariance matrix Ĝ for the mouse data, with entries (rounded) 436, 522, 424; 522, 808, 665; 424, 665, 558.]
estimation of 𝒢, the analysis of 𝒢, and the estimation of β. Before proceeding, we pause here to describe the notation conventions used throughout the paper. Continuous functions, such as the mean growth trajectory ȳ and the additive genetic covariance function 𝒢, are denoted with a script font. Vectors and matrices are written in bold. We use a hat or a tilde to signify estimates of quantities; for example, the estimate of an additive genetic covariance matrix is written Ĝ.
ESTIMATING THE ADDITIVE GENETIC COVARIANCE FUNCTION 𝒢
To estimate the additive genetic covariance function, we begin with the additive genetic covariance matrix G familiar from standard quantitative genetics. The sizes of an individual at two ages aᵢ and aⱼ are considered to be two different characters, and the value of Gᵢⱼ is equal to the additive genetic covariance for the sizes of an individual at those two ages. Methods for estimating genetic variances and covariances of multiple characters have been extensively developed by animal and plant breeders (FALCONER 1981; BULMER 1985), and have more recently been applied to natural populations by evolutionary biologists (e.g., ARNOLD 1981; PRICE, GRANT and BOAG 1984; LOFSVOLD 1986). Given measurements of size at n ages, an n × n estimated additive genetic covariance matrix Ĝ can be calculated. We refer to the vector of n ages at which these measurements have been taken as the age vector, denoted a.
The entries in the matrix Ĝ provide direct estimates of the additive genetic covariance function 𝒢 at n² points, since Ĝᵢⱼ = 𝒢̂(aᵢ, aⱼ). The relationship between the covariance matrix Ĝ and the covariance function 𝒢 is illustrated in Figures 1 and 2. The values of 𝒢 between the measured ages can be estimated by interpolation using smooth curves. By using smooth curves, we make the implicit assumption that the genetic variances and covariances do not change in a discontinuous fashion. (Our method can be modified to accommodate discontinuities produced, for example, by metamorphosis, by dividing the growth trajectory into pre- and post-metamorphosis periods and determining the covariances within and between the two periods.)
A variety of techniques could be used to estimate a continuous covariance function 𝒢 from an observed covariance matrix Ĝ. We have chosen to use a family of methods that involve fitting orthogonal functions to the data. The motivation for using this approach for fitting smooth functions to the data, rather than some other (such as splines), is that the coefficients derived from fitting orthogonal functions are very useful for analyzing patterns of genetic variation in the growth trajectory, as we describe below.
A pair of functions φᵢ and φⱼ are said to be normalized and orthogonal over the interval [u, v] if

\int_u^v \phi_i(x)\, \phi_j(x)\, dx = 0 \quad (i \ne j) \qquad \text{and} \qquad \int_u^v [\phi_i(x)]^2\, dx = 1.
Many families of functions that meet these criteria are available. We will analyze the mouse data using the well-studied Legendre polynomials. The choice of which family of orthogonal functions to use does not affect the estimates for the covariance function at the ages at which the data were taken (the points in Ĝ). The choice does, however, affect the interpolation, and therefore can affect conclusions regarding ages other than those at which the data were collected. (All families of orthogonal polynomials, however, will produce the same estimate for 𝒢 if the maximum degree of the polynomials is held constant.) We favor polynomials over series of sines and cosines (Fourier functions), for example, because on biological grounds we expect a covariance function for growth to be relatively smooth rather than oscillatory. In any event,
the element of arbitrariness introduced by the choice of orthogonal functions decreases as the number of ages at which data were sampled increases.
The jth normalized Legendre polynomial, φⱼ, is given by the formula

\phi_j(x) = \sqrt{\frac{2j+1}{2}}\; \frac{1}{2^j} \sum_{k=0}^{[j/2]} (-1)^k \binom{j}{k} \binom{2j-2k}{j} x^{\,j-2k}, \tag{2}

where [·] indicates that fractional values are rounded down to the nearest integer (BEYER 1976, p. 439). These polynomials are defined over the interval [−1, 1], and so u = −1 and v = 1. From Equation 2, we find that the first three polynomials are:

\phi_0(x) = 1/\sqrt{2}, \tag{3a}

\phi_1(x) = \sqrt{3/2}\; x, \tag{3b}

and

\phi_2(x) = \sqrt{5/8}\;(3x^2 - 1). \tag{3c}
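A quick numerical check of Equations 2-3: the sketch below evaluates the normalized Legendre polynomials from the summation formula and confirms both the tabulated values used later in the mouse example and the orthonormality conditions.

```python
import numpy as np
from math import comb, sqrt, floor

# Hedged check of Equation 2: normalized Legendre polynomials.

def phi(j, x):
    """j-th normalized Legendre polynomial (Equation 2)."""
    s = sum((-1) ** k * comb(j, k) * comb(2 * j - 2 * k, j) * x ** (j - 2 * k)
            for k in range(floor(j / 2) + 1))
    return sqrt((2 * j + 1) / 2) * s / 2 ** j

# values quoted in the mouse example (Equations 3a-c at x = -1)
assert abs(phi(0, -1.0) - 0.7071) < 5e-4
assert abs(phi(1, -1.0) + 1.2247) < 5e-4
assert abs(phi(2, -1.0) - 1.5811) < 5e-4

# numerical orthonormality on [-1, 1]: integral phi_i phi_j dx = delta_ij
x = np.linspace(-1.0, 1.0, 20001)
w = np.full_like(x, x[1] - x[0])
w[0] *= 0.5
w[-1] *= 0.5                                   # trapezoid weights
for i in range(3):
    for j in range(3):
        val = np.sum(phi(i, x) * phi(j, x) * w)
        assert abs(val - (1.0 if i == j else 0.0)) < 1e-6
```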
The additive genetic covariance function 𝒢 can be approximated to any specified degree of precision using a complete set of orthogonal functions such as the Legendre polynomials (COURANT and HILBERT 1953, p. 65). In this form, the covariance between body size at ages a₁ and a₂ is

\mathcal{G}(a_1, a_2) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} [\mathbf{C}_{\mathcal{G}}]_{ij}\, \phi_i(a_1^*)\, \phi_j(a_2^*), \tag{4}

where

a_k^* = \frac{2(a_k - a_{\min})}{a_{\max} - a_{\min}} - 1, \tag{5}

and a_min and a_max are respectively the first (smallest) and last (largest) elements of the age vector. The adjusted age vector a*, calculated from the age vector a using (5), rescales the ages at which the data were taken to the range of the orthogonal functions. In the case of the mouse data, the age vector is a = [2, 3, 4]ᵀ. Thus a_min = 2 and a_max = 4, and so the adjusted age vector is a* = [−1, 0, 1]ᵀ.
The matrix C_𝒢 in Equation 4 is the coefficient matrix associated with the covariance function 𝒢. Its elements are constants that depend both on 𝒢 and on the family of orthogonal functions φ being used (Legendre polynomials, in this example). The full expansion of Equation 4 involves an infinitely large coefficient matrix, which can only be estimated with an infinite amount of data. Given measurements on the sizes of individuals at n ages, however, an n × n truncated version of C_𝒢 can be estimated. We previously found that using the truncated estimate Ĉ_𝒢 consisting of relatively few dimensions often produces a good approximation (KIRKPATRICK and HECKMAN 1989), and this is our present goal.
We have developed two methods for estimating the coefficient matrix C_𝒢. These correspond to two different ways to estimate the additive genetic covariance function 𝒢. The first method yields what we refer to as a "full" estimate of 𝒢. This approach estimates the coefficient matrix in such a way that the corresponding covariance function exactly reproduces the estimated additive genetic variances and covariances at the ages that were measured (that is, Ĝ). Our second method produces a "reduced" estimate of 𝒢. The motivation for this approach is the fact that any estimate of G includes sampling error. Fitting a function through every point in Ĝ causes the sampling error to be included in the full estimate of 𝒢. This noise makes the full estimate of 𝒢 somewhat less smooth than the actual covariance function is. The reduced method finds a smoother and simpler estimate of 𝒢 using information about the sampling error of Ĝ: the reduced estimate is the lowest-order polynomial that is statistically consistent with the data. A drawback of this method is that it excludes higher-order terms from the estimate of 𝒢, even when they actually exist, if the experiment is not sufficiently powerful to prove their presence. Because of this, we recommend investigators consider both the full and reduced estimates of 𝒢.
The full estimate of 𝒢: The full estimate of the additive genetic covariance function, denoted 𝒢̂, is found by calculating the coefficient matrix Ĉ_𝒢 whose corresponding covariance function exactly reproduces the estimated additive genetic covariance matrix Ĝ. We can write the observed covariance matrix in terms of the orthogonal functions using Equation 4:

\hat{\mathbf{G}} = \mathbf{\Phi}\, \hat{\mathbf{C}}_{\mathcal{G}}\, \mathbf{\Phi}^{\mathrm{T}}, \tag{6}

where the matrix Φ is defined such that [Φ]ᵢⱼ = φⱼ(aᵢ*). The matrix Ĉ_𝒢 is the estimate of the coefficient matrix appearing in Equation 4. It is truncated to dimensions n × n by the finite number (n) of ages represented in the data matrix Ĝ. Rearranging Equation 6, we find an expression that can be used to calculate the estimated coefficient matrix:

\hat{\mathbf{C}}_{\mathcal{G}} = \mathbf{\Phi}^{-1}\, \hat{\mathbf{G}}\, [\mathbf{\Phi}^{\mathrm{T}}]^{-1}. \tag{7}

The matrix Ĉ_𝒢 obtained from this calculation can be substituted into Equation 4 to give a continuous estimate of the covariance function 𝒢 for all ages between the earliest and latest at which the data were taken.
To illustrate, the study of RISKA et al. produced an estimate for the additive genetic covariance matrix of the log of male body weight at 2, 3 and 4 weeks:

\hat{\mathbf{G}} = \begin{bmatrix} 436.0 & 522.3 & 424.2 \\ 522.3 & 808.0 & 664.7 \\ 424.2 & 664.7 & 558.0 \end{bmatrix}.

The elements of Φ are calculated by evaluating the first three Legendre polynomials (Equations 3a-c) at the three points of the adjusted age vector a*:

\mathbf{\Phi} = \begin{bmatrix} \phi_0(-1) & \phi_1(-1) & \phi_2(-1) \\ \phi_0(0) & \phi_1(0) & \phi_2(0) \\ \phi_0(1) & \phi_1(1) & \phi_2(1) \end{bmatrix} = \begin{bmatrix} 0.7071 & -1.2247 & 1.5811 \\ 0.7071 & 0 & -0.7906 \\ 0.7071 & 1.2247 & 1.5811 \end{bmatrix}.

The full estimate of the additive genetic covariance function, 𝒢̂, is found by plugging these matrices into Equation 7 to obtain Ĉ_𝒢:

\hat{\mathbf{C}}_{\mathcal{G}} = \begin{bmatrix} 1348.1 & 66.5 & -112.0 \\ 66.5 & 24.3 & -14.0 \\ -112.0 & -14.0 & 14.5 \end{bmatrix}.

Finally, the full estimate of 𝒢 is obtained by substituting Ĉ_𝒢 into Equation 4. This gives

\hat{\mathcal{G}}(a_1, a_2) = 808 + 71.2(a_1^* + a_2^*) + 36.4\, a_1^* a_2^* - 214.5(a_1^{*2} + a_2^{*2}) - 40.7(a_1^{*2} a_2^* + a_1^* a_2^{*2}) + 81.6\, a_1^{*2} a_2^{*2},

which is valid for ages between a = 2 and a = 4. The result can be verified by checking that indeed Ĝᵢⱼ = 𝒢̂(aᵢ, aⱼ). The full estimate of the additive covariance function for the mouse data calculated in this way is shown in Figure 2.
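The worked example can be checked by carrying out Equation 7 directly. The sketch below recomputes Ĉ_𝒢 from the published Ĝ and Φ and confirms that Equation 6 reproduces Ĝ exactly; the asserted coefficient values are the ones quoted above, to the precision printed.

```python
import numpy as np

# Hedged reconstruction of the mouse example: Equation 7, then the
# exact-reproduction check of Equation 6.

def phi(j, x):
    """Normalized Legendre polynomials of Equations 3a-c."""
    return [np.sqrt(1 / 2) * np.ones_like(x),
            np.sqrt(3 / 2) * x,
            np.sqrt(5 / 8) * (3 * x ** 2 - 1)][j]

G = np.array([[436.0, 522.3, 424.2],
              [522.3, 808.0, 664.7],
              [424.2, 664.7, 558.0]])          # RISKA et al. estimate

a = np.array([2.0, 3.0, 4.0])
a_star = 2 * (a - a.min()) / (a.max() - a.min()) - 1     # Equation 5
Phi = np.column_stack([phi(j, a_star) for j in range(3)])  # [Phi]_ij

C = np.linalg.inv(Phi) @ G @ np.linalg.inv(Phi.T)        # Equation 7

# Equation 6: the full estimate reproduces G-hat exactly
assert np.allclose(Phi @ C @ Phi.T, G)
# the coefficient matrix quoted in the text (printed to one decimal)
quoted = np.array([[1348.1, 66.5, -112.0],
                   [66.5, 24.3, -14.0],
                   [-112.0, -14.0, 14.5]])
assert np.allclose(C, quoted, atol=0.5)
```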
The reduced estimate of 𝒢: Our second approach, that of finding a reduced estimate for 𝒢, seeks to fit a set of k orthogonal functions to Ĝ, where k < n. We denote a reduced estimate of 𝒢 as 𝒢̃ and the corresponding reduced estimate of the coefficient matrix as C̃_𝒢. The method, which is described in detail in APPENDIX A, consists of two steps. First, a candidate estimate of 𝒢 is constructed using weighted least squares to fit the simplest possible orthogonal function, that in which 𝒢̃ is constant for all ages. Second, this candidate estimate is tested for statistical consistency with Ĝ. To perform this test we have developed a procedure that produces an approximate χ² statistic for the goodness of fit of the reduced estimate to Ĝ. If this test shows that 𝒢̃ is consistent with (that is, does not differ significantly from) Ĝ, then it is accepted. If 𝒢̃ differs significantly from Ĝ, we then consider a more complex reduced estimate by fitting the first two orthogonal functions to the data. The fit is again tested using the χ² test. The procedure is iterated with successively more orthogonal functions until reduced estimates C̃_𝒢 and 𝒢̃ are obtained that are consistent with Ĝ. If no simpler combination of orthogonal functions will successfully fit the data, the full estimate consisting of n orthogonal functions will always fit the data perfectly.
Using this method on the mouse data (see APPENDIX A), we find that the least-squares estimate for 𝒢̃ that consists of the first Legendre polynomial, φ₀, alone is 𝒢̃(a₁, a₂) = 324. This estimate is rejected because the test statistic χ² = 57.3 with 5 degrees of freedom shows the estimate is inconsistent with the data (P << 0.01). The least-squares estimate of 𝒢 produced by the first two Legendre polynomials (a constant and a linear term) is

\tilde{\mathcal{G}}(a_1, a_2) = 312.2 - 11.9(a_1^* + a_2^*) + 24.5\, a_1^* a_2^*.

This estimate is also inconsistent with Ĝ (χ² = 38.7, 3 d.f., P << 0.01). Consequently, it is not possible to find a reduced estimate of 𝒢 for this data set: only the full estimate consisting of the first three Legendre polynomials, shown in Figure 2, is statistically consistent with Ĝ. In contrast, other data sets (particularly cases in which the number of individuals is smaller and the number of ages is larger than in this example) will often result in a reduced estimate that is consistent with the data.
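The mechanics of the reduced fit can be sketched as follows. Note the hedges: the paper's procedure (APPENDIX A) uses *weighted* least squares, with weights derived from the sampling errors of Ĝ, which are not reproduced in this chunk; the unweighted fit below therefore yields a constant of about 558 rather than the paper's 324. What the sketch does reproduce are the degrees of freedom quoted for the χ² tests (5 for the constant fit, 3 for the constant-plus-linear fit).

```python
import numpy as np

# Hedged sketch of the reduced-estimate mechanics: fit the first k
# Legendre polynomials to the entries of G-hat by (unweighted) least
# squares, and count the degrees of freedom of the goodness-of-fit test.

G = np.array([[436.0, 522.3, 424.2],
              [522.3, 808.0, 664.7],
              [424.2, 664.7, 558.0]])
a_star = np.array([-1.0, 0.0, 1.0])

def phi(j, x):
    return [np.sqrt(1 / 2) * np.ones_like(x),
            np.sqrt(3 / 2) * x,
            np.sqrt(5 / 8) * (3 * x ** 2 - 1)][j]

def reduced_fit(k):
    """Unweighted LS fit of a k-term expansion to the entries of G."""
    Phi_k = np.column_stack([phi(j, a_star) for j in range(k)])
    # design matrix: one row per (i, j) entry of G, unknowns C_kl
    X = np.einsum("ik,jl->ijkl", Phi_k, Phi_k).reshape(9, k * k)
    C, *_ = np.linalg.lstsq(X, G.ravel(), rcond=None)
    n_stats = 3 * 4 // 2          # distinct entries of symmetric G
    n_par = k * (k + 1) // 2      # distinct entries of symmetric C
    return C.reshape(k, k), n_stats - n_par

C1, df1 = reduced_fit(1)
C2, df2 = reduced_fit(2)
assert df1 == 5 and df2 == 3      # d.f. quoted for the chi-square tests
# an unweighted constant fit is just the mean of the entries of G
assert abs(C1[0, 0] * 0.5 - G.mean()) < 1e-8
```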
Analysis of the additive genetic covariance function: The major motivation for using orthogonal functions to estimate 𝒢 is that the coefficient matrix C_𝒢 can be used to analyze the patterns of inheritance (KIRKPATRICK and HECKMAN 1989). In particular, the coefficient matrix can be used to calculate the eigenfunctions and eigenvalues of 𝒢.
Eigenfunctions are analogous to the eigenvectors (principal components) familiar from the analysis of covariance matrices. Each eigenfunction is a continu ous function that represents a possible evolutionary deformation of the mean growth trajectory. Any mean growth trajectory can be thought of as the sum of a population's current mean growth trajectory plus a combination of the eigenfunctions of its additive genetic covariance function. Paired with each eigen function is a number known as its eigenvalue. The eigenvalue is proportional to the amount of genetic variation in the population corresponding to that ei genfunction. Eigenvalues (and the eigenfunctions as sociated with them) are conventionally numbered in order of decreasing size, beginning with the largest.
Eigenfunctions with large eigenvalues are defor mations for which the populations has substantial ge netic variation. The shape of the mean growth tra jectory will therefore evolve rapidly along these
984 M. Kirkpatrick, D. Lofsvold and M. Bulmer
deformations if they are favored by selection. Eigenfunctions with very small (or zero) eigenvalues, on the other hand, represent deformations for which there is little (or no) additive genetic variation. If selection favors a new mean growth trajectory that is obtained from the current trajectory by some combination of these deformations, there will be very slow (or no) evolutionary progress towards it. The eigenfunctions and eigenvalues therefore contain information that is of great value in understanding the evolutionary potential of growth trajectories. The ith eigenfunction and eigenvalue are denoted ψi and λi, respectively.
In principle, a covariance function has an infinite number of eigenfunctions and eigenvalues. (Many of the eigenvalues may, however, be zero.) In practice, we are able to estimate only a few of them because experiments give information about the covariance function at only a finite number of points (ages). The number of eigenfunctions and eigenvalues that can be estimated equals the dimensionality of the estimated coefficient matrix, which will be equal to the number of ages at which size was measured when dealing with a full estimate of 𝒢 but will be smaller when using a reduced estimate.
Estimates of the eigenfunctions ψi and eigenvalues λi are calculated from the coefficient matrix ĈG. The ith eigenfunction is constructed from the relation

ψ̂i(a) = Σ (j = 0 to n − 1) [cψi]j φj(a*),   (8)
where [cψi]j is the jth element of the ith eigenvector of ĈG (KIRKPATRICK and HECKMAN 1989). The ith eigenvalue of 𝒢 is identical to the ith eigenvalue of ĈG. Eigenfunctions are adjusted to a norm of unity by convention in order to allow meaningful comparisons between the eigenvalues. This is conveniently done by requiring that the norms of the eigenvectors cψi equal unity. (Virtually all software packages which compute eigenvalues and eigenvectors do this as a matter of course.) Thus to obtain estimates of the eigenfunctions and eigenvalues, we determine the eigenvectors and eigenvalues of our estimate of the coefficient matrix ĈG, then use these in Equation 8. The method can be applied using either the full coefficient matrix ĈG or a reduced coefficient matrix ĈG.
Sampling errors in the estimate of the genetic covariance matrix G produce biases in the estimates of the eigenvalues (HILL and THOMPSON 1978). Although the estimate of the arithmetic mean of the eigenvalues (i.e., (1/n) Σ λi) is unbiased, the larger eigenvalues are consistently overestimated while the smaller eigenvalues are consistently underestimated. This problem, which is general to all multivariate quantitative genetic studies, becomes particularly obvious in data sets that produce one or more eigenvalue
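As a concrete illustration of Equation 8, the eigendecomposition of a coefficient matrix and the reconstruction of an eigenfunction can be sketched in a few lines of numerical code. The coefficient matrix below is a made-up placeholder, not the paper's estimate, and the normalized Legendre basis on [−1, 1] is assumed:

```python
import numpy as np
from numpy.polynomial import legendre

# Hypothetical full coefficient matrix C_G -- placeholder values only.
C_G = np.array([[1350.0,  -60.0, -110.0],
                [ -60.0,   30.0,   10.0],
                [-110.0,   10.0,   20.0]])

# Eigenvalues/eigenvectors of the symmetric coefficient matrix.
# numpy returns eigenvalues in ascending order; reverse so lambda_1 is largest.
lam, vec = np.linalg.eigh(C_G)
order = np.argsort(lam)[::-1]
lam, vec = lam[order], vec[:, order]

def phi(j, x):
    """Normalized Legendre polynomial phi_j on [-1, 1] (unit norm)."""
    c = np.zeros(j + 1); c[j] = 1.0
    return np.sqrt((2 * j + 1) / 2.0) * legendre.legval(x, c)

def eigenfunction(i, x):
    """Equation 8: psi_i(a*) = sum_j [c_psi_i]_j phi_j(a*)."""
    return sum(vec[j, i] * phi(j, x) for j in range(len(lam)))

# First eigenvalue and the first eigenfunction across the rescaled age range.
a_star = np.linspace(-1, 1, 5)
print(lam[0], eigenfunction(0, a_star))
```

Because the eigenvectors returned by the decomposition already have unit norm, the resulting eigenfunctions have the unit norm the text describes.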
estimates that are negative. (Covariance matrices are by definition positive semidefinite, and so have no negative eigenvalues.) HAYES and HILL (1981) proposed transforming the estimate of G using a method they term "bending" in order to remedy this problem. Their method can be applied to Ĝ whenever negative eigenvalues are encountered, provided an estimate of the phenotypic covariance matrix P is available.
Often one would like to know the sampling distributions of the eigenvalues estimated for the additive genetic covariance function. We have developed two methods and describe them in detail in APPENDIX C. The first method constructs separate confidence limits for each eigenvalue by numerical simulation. The approach is to generate a simulated covariance matrix whose expectation is Ĝ but that includes random deviations in the elements that correspond to the sampling error. The eigenvalues of the coefficient matrix corresponding to each simulated G are calculated. This procedure is iterated many times, and the distribution for each eigenvalue is constructed empirically from the results. The second method uses a chi-squared statistic to test hypotheses about one or more of the eigenvalues. Typically, the hypothesis of interest is whether or not the observed eigenvalues are statistically distinguishable from zero.
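The simulation method can be sketched as a parametric bootstrap. The sketch below perturbs a hypothetical coefficient matrix with independent normal element-wise errors; the procedure described in the text simulates G itself using the full error covariance, so this is a simplification, and every number here is a placeholder:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical estimated coefficient matrix and element-wise standard
# errors -- placeholders, not the paper's values.
C_hat = np.array([[1350.0, -60.0], [-60.0, 30.0]])
se = np.array([[150.0, 40.0], [40.0, 15.0]])

n_rep = 2000
sims = np.empty((n_rep, 2))
for r in range(n_rep):
    # Perturb each element by its sampling error, keeping symmetry.
    noise = rng.normal(0.0, se)
    noise = (noise + noise.T) / 2.0
    sims[r] = np.sort(np.linalg.eigvalsh(C_hat + noise))[::-1]

# Empirical 95% confidence limits for each eigenvalue.
lo, hi = np.percentile(sims, [2.5, 97.5], axis=0)
print(lo, hi)
```

Note that some simulated replicates can produce negative smallest eigenvalues, which is exactly the sampling phenomenon the surrounding text describes.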
We will now illustrate the methods for analyzing genetic covariance functions with the full estimate of 𝒢 from the mouse data. All three eigenvalues of ĈG are positive, and so bending the data matrix is unnecessary. Using a standard computer package, we find that the first (largest) eigenvalue of ĈG is λ1 = 1361, and the eigenvector associated with it is

ĉψ1 = [0.995, −0.0504, −0.0831]ᵀ.

By substituting this into Equation 8, we obtain the full estimate for the first eigenfunction of 𝒢:

ψ̂1(a) = 0.7693 − 0.0617 a* − 0.1971 a*².

The second and third eigenfunctions are obtained in the same way. The three eigenfunctions are shown in Figure 3. The eigenvalues associated with the eigenfunctions are λ1 = 1361, λ2 = 24.5 and λ3 = −1.5 (Figure 4).
Any conceivable evolutionary change in a population's mean growth trajectory can be written in terms of a weighted sum of the eigenfunctions. The rate at which a population will evolve from its current mean trajectory to some new trajectory favored by selection is determined by the eigenvalues associated with the eigenfunctions responsible for that change. A large eigenvalue indicates that a change corresponding to that eigenfunction will happen rapidly, while a small (or zero) eigenvalue indicates that the change will be slow (or will not happen at all).
The first eigenfunction is a deformation involving
Growth Trajectories 985
FIGURE 3.—Estimates of the three eigenfunctions and their eigenvalues for the additive genetic covariance function 𝒢 (x-axis: age in weeks).
an overall increase or decrease of size at all ages (Figure 3). The large size of the first eigenvalue indicates that selection will produce rapid changes if this kind of alteration in the mean growth trajectory is favored. The second eigenfunction corresponds to genetic changes that increase (or decrease) size between ages 2 and 3 weeks, and decrease (or increase) size after 3 weeks of age. The third eigenfunction shows a more complex pattern. The second and third eigenvalues, however, reveal that the amount of genetic variation associated with these eigenfunctions is small in comparison with the variation associated with the first eigenfunction. These eigenvalues indicate that the evolutionary response to selection would be two or more orders of magnitude slower for changes involving the second and third eigenfunctions than for those involving the first eigenfunction.
The 95% confidence regions for each of the eigenvalues constructed by the numerical simulation method (described in APPENDIX C) are [1100, 1700] for λ1, [17, 33] for λ2, and [−2.7, 5.1] for λ3 (Figure 4). We are therefore quite confident that the large differences between the estimate of the first eigenvalue compared with the second and third are real. The conclusion that the estimate for λ3 is not statistically different from zero is confirmed by the chi-squared test (also described in APPENDIX C). The hypothesis that λ3 equals zero gives χ² = 0.65, which is not significant (P > 0.1). The hypothesis that both λ2 and λ3 are zero, however, is rejected (χ² = 40.4, P << 0.01).

A qualitatively similar picture of the pattern of genetic variation for mouse growth emerges from an analysis of the full data set for ages 2–10 weeks. This analysis and its evolutionary implications will be discussed in a later publication.
FIGURE 4.—The three eigenvalues of the additive genetic covariance function 𝒢 and their 95% confidence limits (determined by the numerical simulation method) on linear (above) and logarithmic (below) scales. The confidence limits for λ3 include zero.
ESTIMATING THE SELECTION GRADIENT FUNCTION β
Having developed the methods for estimating the additive genetic covariance function 𝒢, we now turn to methods for estimating the selection gradient function β. The techniques are extensions of the results of LANDE (1979) and LANDE and ARNOLD (1983). Applications and difficulties with these methods are discussed by ARNOLD and WADE (1984a, b) and MITCHELL-OLDS and SHAW (1987).
Our strategy here is the same as is used to estimate 𝒢: the values of β at a finite number of ages are estimated, and then a continuous function is estimated by interpolating between these points. The selection gradient acting on any trait is defined as the partial regression of relative fitness onto the phenotypic value of that character, holding the phenotypic values of other traits constant (LANDE and ARNOLD 1983). In the context of growth trajectories, the partial regression coefficients of relative fitness onto size at each of several ages form an estimated selection gradient vector b̂. The continuous selection gradient function can then be estimated by fitting orthogonal functions to b̂.
A selection gradient function can be written in terms of the same orthogonal functions that were used to describe the additive genetic covariance function:

β(a) = Σ (i) [cβ]i φi(a*)   (9)

(KIRKPATRICK and HECKMAN 1989). In (9), cβ is the coefficient vector associated with the selection gradient function β. The full estimate of cβ that passes
through every point in b̂ is found using the relation

ĉβ = Φ⁻¹ b̂.   (10)

The continuous selection gradient function is estimated by substituting ĉβ into (9). Alternatively, given information on the errors in the estimates of the elements of b̂, a reduced estimate of β can be calculated using weighted least squares as described in APPENDIX A.
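Equations 9 and 10 amount to interpolating b̂ in the normalized Legendre basis. A minimal numerical sketch, in which the rescaled ages and the b̂ values are hypothetical:

```python
import numpy as np
from numpy.polynomial import legendre

def phi_matrix(n, a_star):
    """Matrix with entries phi_j(a_i*), normalized Legendre basis on [-1, 1]."""
    cols = []
    for j in range(n):
        c = np.zeros(j + 1); c[j] = 1.0
        cols.append(np.sqrt((2 * j + 1) / 2.0) * legendre.legval(a_star, c))
    return np.column_stack(cols)

# Hypothetical selection gradient vector b-hat at three rescaled ages.
a_star = np.array([-1.0, 0.0, 1.0])
b = np.array([0.8, 0.3, -0.2])

Phi = phi_matrix(3, a_star)
c_beta = np.linalg.solve(Phi, b)       # Equation 10: c_beta = Phi^{-1} b

def beta(x):
    """Equation 9: beta(a*) = sum_j [c_beta]_j phi_j(a*)."""
    return phi_matrix(3, np.atleast_1d(x)) @ c_beta

print(beta(a_star))   # interpolates b exactly at the measured ages
```

Because the full estimate uses as many basis functions as ages, the fitted function passes through every element of b̂, exactly as the text states.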
Estimating the selection gradient function β thus requires an estimate of the selection gradient vector b̂. The basic methodologies for estimating b̂ are discussed by LANDE and ARNOLD (1983), ARNOLD and WADE (1984a, b), and MITCHELL-OLDS and SHAW (1987). The methods can be applied to growth trajectories in two ways. The first requires data on the sizes of individuals at each of several ages and a measure of their lifetime relative fitnesses. The selection gradient vector can then be estimated directly as the partial regressions of relative fitness onto size at those ages. This is the preferable approach, but it is limited to cases in which there are data on the lifetime fitnesses of individuals.
In the absence of such data, an indirect method that makes use of data on the effects of size on fecundity and mortality can be used if relative fitnesses are constant in time. Under that assumption, a result from LANDE (1979) can be extended to show that

β = (δ/δȳ) ln(W̄),   (11)

where W̄ is the population's mean fitness and (δ/δȳ)ln(W̄) represents the first variation of ln(W̄) with respect to ȳ (see COURANT and HILBERT 1953, p. 184; R. GOMULKIEWICZ, in preparation). Equation 11 is analogous to the equation for a finite number of quantitative characters, β = ∇ln(W̄) (LANDE 1979; LANDE and ARNOLD 1983).
We can make use of Equation 11 if we have some understanding of how size affects fitness. If, for example, fecundity and mortality rates are functions only of size and age, then the relationship between the selection gradient and these life history attributes is
β(a) = f1(a) m̄′(a) − f2(a) μ̄′(a),   (12)

where

f1(a) = l(a)/W̄

and

f2(a) = (1/W̄) ∫ₐ^∞ l(x) m̄(x) dx.

Here, l(a) is the probability a newborn survives to age a, m̄(a) and μ̄(a) are, respectively, the average birth and mortality rates among surviving individuals at age a, and primes denote derivatives taken with respect to ȳ*(a), the mean size of individuals alive at age a (KIRKPATRICK 1984, 1988). Fitness, on the other hand, may be determined in part by factors other than size and age, such as growth rate. In these cases, Equation 12 can be modified to account for the way in which these other factors determine fitness (LYNCH and ARNOLD 1988).
Use of the indirect method of estimating the selection gradient function β depends on evaluating the components of Equation 12 (or its analog, if fitness depends on more than size and age alone). The term m̄′(a) is the rate at which the average birth rate of individuals alive at age a changes per unit increase in the mean size of individuals. Given census data about a cohort of individuals at ages ai and ai+1, this term can be estimated using the regression of fecundity on body size, divided by the duration of the interval:

m̄′(āi) = Cov[m, ȳ*(āi)] / [σ²*(āi)(ai+1 − ai)],   (13)

where āi = (ai + ai+1)/2 is the midpoint of the interval between ai and ai+1. Equation 13 is a linear interpolation that attributes the effects of size on birth rate to the midpoint of the interval being measured. The term Cov[m, ȳ*(āi)] is the covariance between the number of births over the interval and body size among those individuals that survived through the entire interval. The average of an individual's size at the beginning and at the end of the interval should be used for this purpose. The term σ²*(āi) is the mean of the variance in size at the start of the interval and the variance in size at the end of the interval among those individuals that survived throughout the period. Only individuals that survive are used in the calculation because the fecundity of individuals that died in the interval is reduced by the reduced time they had in which to reproduce.
The term μ̄′(a) in Equation 12 represents the effect of a change in the mean body size on the average mortality rate at age a. This is estimated from the relation

μ̄′(āi) = −[ȳ*(ai) − ȳ(ai)] / [σ²(ai)(ai+1 − ai)].   (14)

In (14), ȳ*(ai) is the mean size at age ai of individuals that survive to reach age ai+1, ȳ(ai) is the mean size of all individuals alive at age ai, and σ²(ai) is their variance in size. Equations 13 and 14 follow from Equations 11, 12, and the results of ROBERTSON (1966) and PRICE (1970).
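The two interval estimators can be sketched numerically. Everything below — cohort size, survival rule, fecundity model — is simulated placeholder data, used only to show where each term of the two estimators comes from:

```python
import numpy as np

# Hypothetical cohort census at two ages a_i and a_{i+1}.
a_i, a_next = 2.0, 3.0
dt = a_next - a_i

rng = np.random.default_rng(7)
size_start = rng.normal(10.0, 2.0, 500)                 # all alive at a_i
survived = size_start + rng.normal(0, 1.0, 500) > 9.0   # size-biased survival
size_end = size_start[survived] + rng.normal(1.0, 0.5, survived.sum())
births = 0.3 * size_start[survived] + rng.normal(0, 1.0, survived.sum())

# Fecundity gradient: covariance of births with mean size among survivors,
# over the mean survivor variance, divided by the interval length.
mid_size = (size_start[survived] + size_end) / 2.0
var_star = (np.var(size_start[survived]) + np.var(size_end)) / 2.0
m_prime = np.cov(births, mid_size)[0, 1] / (var_star * dt)

# Mortality gradient: survival selection differential (survivors' mean size
# minus the whole cohort's), over the cohort variance and interval length.
mu_prime = -(size_start[survived].mean() - size_start.mean()) \
           / (np.var(size_start) * dt)

print(m_prime, mu_prime)
```

With the simulated size-biased survival and size-dependent fecundity used here, the fecundity gradient comes out positive and the mortality gradient negative, the directions one would expect when larger individuals both survive and reproduce better.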
The interpolations of Equations 13 and 14 become increasingly accurate as the amount of growth that occurs over the interval becomes small relative to the variation in size among individuals present at the start
of the interval. The remaining terms involved in Equation 12, which are the survivorships and mean birth rates at different ages, can be estimated directly from census data.
Given census data from n times during the ontogeny of a cohort, this method will estimate the selection gradient function at n − 1 ages. These n − 1 point estimates form a selection gradient vector b̂ which can then be used to estimate the continuous selection gradient function β via Equations 9 and 10.
PREDICTING THE EVOLUTIONARY DYNAMICS OF THE GROWTH TRAJECTORY
The estimates of the additive genetic covariance function 𝒢 and the selection gradient function β can be used directly in Equation 1 to predict Δȳ, the evolutionary change in the mean growth trajectory. Using Equation 1 directly is awkward, however, because the integration in (1) must be performed for each age a at which Δȳ(a) is to be evaluated. A method making use of CG, the coefficient matrix for the additive genetic covariance function, and cβ, the coefficient vector for the selection gradient function, circumvents this difficulty. Using a result from KIRKPATRICK and HECKMAN (1989), the evolutionary change in the mean growth trajectory is
Δȳ(a) = Σ (i) [cΔȳ]i φi(a*),   (15)

where the coefficients cΔȳ are given by the matrix equation

cΔȳ = CG cβ.   (16)

The summation in (15) extends over all i for which [cΔȳ]i is nonzero.
These formulas allow us to estimate the evolutionary change in the mean growth trajectory following one generation of selection. The full or reduced estimates of the coefficient matrix CG and coefficient vector cβ are determined using the methods described in the last two sections. These are used in Equation 16 to estimate cΔȳ. The result is then substituted into (15) to give an estimate of the evolutionary change.
Equation 16 can be applied regardless of whether or not the additive genetic covariance function and the selection gradient function were estimated at the same ages: transforming the measurements into loadings on orthogonal functions puts them on the same basis. In the event that the number of ages used to estimate the covariance function differs from the number used to estimate the selection gradient function, ĈG and ĉβ will be of different dimensions. Equation 16 can be applied in such cases by truncating the dimensions of the larger one to match those of the smaller.
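Equations 15 and 16 reduce the prediction to one matrix-vector product plus an evaluation in the basis. A sketch with placeholder values (a 2 × 2 CG truncated to match a two-element cβ, as described above):

```python
import numpy as np
from numpy.polynomial import legendre

# Hypothetical coefficient matrix C_G and selection coefficient vector
# c_beta -- placeholders, truncated to matching dimensions.
C_G = np.array([[1350.0, -60.0], [-60.0, 30.0]])
c_beta = np.array([0.002, -0.001])

c_dy = C_G @ c_beta          # Equation 16: c_dy = C_G c_beta

def phi(j, x):
    """Normalized Legendre polynomial phi_j on [-1, 1]."""
    c = np.zeros(j + 1); c[j] = 1.0
    return np.sqrt((2 * j + 1) / 2.0) * legendre.legval(x, c)

def delta_ybar(x):
    """Equation 15: predicted change in the mean growth trajectory."""
    return sum(c_dy[j] * phi(j, x) for j in range(len(c_dy)))

print(delta_ybar(np.linspace(-1, 1, 5)))
```

The same two functions can then be reused each generation: re-estimate cβ, multiply by CG, and re-evaluate the predicted trajectory change.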
A difficulty that arises when studying natural populations is that ongoing selection makes it impossible to directly observe the unselected distribution of individuals at any given age, since mortality at earlier ages can alter the distribution that will appear at later ages. The observed mean size of individuals surviving to age a, for example, will generally differ from ȳ(a) because of selection at earlier ages. The same problem appears when one tries to estimate the additive genetic covariance function from data on a population experiencing selection. The quantities can, however, be estimated if selection is weak by calculating what the cumulative effects of selection at earlier ages have been on the distribution of sizes among the survivors. The basic methodology has been outlined by LYNCH and ARNOLD (1988).
DISCUSSION
The infinite-dimensional method for analyzing the evolution of growth trajectories joins two alternative methods in current use. Previous workers either have treated the sizes of individuals at a set of landmark ages as a finite number of traits or have fit parametric families of curves to the growth trajectories. Our alternative offers several types of advantages over those methods, including the ability to treat the full, continuous growth trajectory without making restrictive assumptions about the form of growth curves that a population's genetic variation will allow (KIRKPATRICK and HECKMAN 1989).
Two additional benefits of the infinite-dimensional method appear from the techniques introduced in this paper. First, the method explicitly accounts for the spacing of the ages at which the data were taken. There are advantages to designing breeding experiments with unequally spaced sample intervals. Genetic variances and covariances change rapidly during certain periods of ontogeny, often corresponding to critical events such as weaning (see Figure 2; see also HERBERT, KIDWELL and CHASE 1979; CHEVERUD, RUTLEDGE and ATCHLEY 1983; ATCHLEY 1984). Periods in which the variance structure is changing rapidly should receive a greater sampling effort. (Ideally, the frequency at which data are collected should be proportional to how rapidly the variances are changing at that point in development.) The infinite-dimensional approach allows an investigator to concentrate effort on the critical periods, while also giving these measurements the appropriate weights when estimating the population's response to selection.
A second additional benefit of using this approach is that the ages at which the genetic parameters are estimated and the ages at which the strength of selection is evaluated need not be the same. It may often be the case that logistical reasons make it hard or impossible to take these data at the same ages. This will immediately eliminate the possibility of using conventional quantitative-genetic methods, since they require that the characters on which the genetic and selection parameters are measured are homologous.
The price paid for these advantages is that our method relies on an assumption of infinite-dimensional normality in the distribution of the additive genetic component of the growth trajectories. The normality assumption is basic to classical quantitative genetics, and has support from both empirical studies and several kinds of models for the effect of genes at the underlying loci (FALCONER 1981; BULMER 1985; BARTON and TURELLI 1989). The genetic effects for even a single trait can, however, depart from normality (e.g., ROBERTSON 1977). Thus an important question in quantitative genetics is the extent to which multiple quantitative traits (including growth trajectories) conform to multivariate normality. This is an empirical question, since at present it seems unlikely that it can be resolved by theory (TURELLI 1988). Models such as ours that are based on a normality assumption, however, may provide reasonable approximations for the evolution of the mean phenotypes even when this assumption is violated, if the departures are small.
We are grateful to R. GOMULKIEWICZ and F. H. C. MARRIOTT for important suggestions regarding the mathematical analysis. We thank B. RISKA for help in analyzing his mouse data. M. LYNCH, M. TURELLI, and two anonymous reviewers made numerous helpful comments on an earlier draft. This work was supported by National Science Foundation grants BSR-8604743 and BSR-8657521 to M. Kirkpatrick.
LITERATURE CITED
ANDERSON, T. W., 1958 An Introduction to Multivariate Statistical Analysis. Wiley, New York.
ARNOLD, S. J., 1981 Behavioral variation in natural populations. I. Phenotypic, genetic, and environmental correlations between chemoreceptive responses to prey in the garter snake, Thamnophis elegans. Evolution 35: 489–509.
ARNOLD, S. J., and M. J. WADE, 1984a On the measurement of natural and sexual selection: theory. Evolution 38: 709–720.
ARNOLD, S. J., and M. J. WADE, 1984b On the measurement of natural and sexual selection: applications. Evolution 38: 720–734.
ATCHLEY, W. R., 1984 Ontogeny, timing of development, and genetic variance-covariance structure. Am. Nat. 123: 519–540.
BARTON, N. H., and M. TURELLI, 1989 Evolutionary quantitative genetics: how little do we know? Annu. Rev. Genet. 23: 337–370.
BECKER, W., 1984 Manual of Quantitative Genetics, Ed. 4. Academic Enterprises, Pullman, Wash.
BEYER, W. H., 1976 Handbook of Standard Mathematical Tables, Ed. 25. CRC Press, Boca Raton, Fla.
BOHREN, B. B., H. E. MCKEAN and Y. YAMADA, 1961 Relative efficiencies of heritability estimates based on regression of offspring on parent. Biometrics 17: 481–491.
BULMER, M. G., 1985 The Mathematical Theory of Quantitative Genetics. Oxford University Press, Oxford.
BULMER, M., 1989 Maintenance of genetic variability by mutation-selection balance: a child's guide through the jungle. Genome 31: 761–767.
CHEVERUD, J. M., J. J. RUTLEDGE and W. R. ATCHLEY, 1983 Quantitative genetics of development: genetic correlations among age-specific trait values and the evolution of ontogeny. Evolution 37: 895–905.
COURANT, R., and D. HILBERT, 1953 Methods of Mathematical Physics, Vol. 1. Wiley, New York.
DRAPER, N. R., and H. SMITH, 1966 Applied Regression Analysis. Wiley, New York.
EBENMAN, B., and L. PERSSON (Editors), 1988 The Dynamics of Size-Structured Populations. Springer-Verlag, Berlin.
FALCONER, D. S., 1981 Introduction to Quantitative Genetics, Ed. 2. Longman, London.
FITZHUGH, H. A., 1976 Analysis of growth curves and strategies for altering their shapes. J. Anim. Sci. 33: 1036–1051.
HAYES, J. F., and W. G. HILL, 1981 Modification of estimates of parameters in the construction of genetic selection indices ('bending'). Biometrics 37: 483–493.
HERBERT, J. G., J. F. KIDWELL and H. B. CHASE, 1979 The inheritance of growth and form in the mouse. IV. Changes in the variance components of weight, tail length, and tail width during growth. Growth 43: 36–46.
HILL, W. G., and R. THOMPSON, 1978 Probabilities of non-positive definite between-group or genetic covariance matrices. Biometrics 34: 429–439.
KEMPTHORNE, O., and O. B. TANDON, 1953 The estimation of heritability by regression of offspring on parent. Biometrics 9: 90–100.
KIRKPATRICK, M., 1984 Demographic models based on size, not age, for organisms with indeterminate growth. Ecology 65: 1874–1884.
KIRKPATRICK, M., 1988 The evolution of size in size-structured populations, pp. 13–28 in The Dynamics of Size-Structured Populations, edited by B. EBENMAN and L. PERSSON. Springer-Verlag, Berlin.
KIRKPATRICK, M., and N. HECKMAN, 1989 A quantitative genetic model for growth, shape, and other infinite-dimensional characters. J. Math. Biol. 27: 429–450.
LANDE, R., 1979 Quantitative genetic analysis of multivariate evolution, applied to brain:body size allometry. Evolution 33: 402–416.
LANDE, R., and S. J. ARNOLD, 1983 The measurement of selection on correlated characters. Evolution 37: 1210–1226.
LOFSVOLD, D., 1986 Quantitative genetics of morphological differentiation in Peromyscus. I. Tests of the homogeneity of genetic covariance structure among species and subspecies. Evolution 40: 559–573.
LYNCH, M., and S. J. ARNOLD, 1988 The measurement of selection on size and growth, pp. 47–59 in The Dynamics of Size-Structured Populations, edited by B. EBENMAN and L. PERSSON. Springer-Verlag, Berlin.
MITCHELL-OLDS, T., and R. G. SHAW, 1987 Regression analysis of natural selection: statistical inference and biological interpretation. Evolution 41: 1149–1161.
PRICE, G. R., 1970 Selection and covariance. Nature 227: 520–521.
PRICE, T. D., P. R. GRANT and P. T. BOAG, 1984 Genetic changes in the morphological differentiation of Darwin's ground finches, pp. 49–66 in Population Biology and Evolution, edited by K. WOHRMANN and V. LOESCHCKE. Springer-Verlag, Berlin.
RISKA, B., W. R. ATCHLEY and J. J. RUTLEDGE, 1984 A genetic analysis of targeted growth in mice. Genetics 107: 79–101.
ROBERTSON, A., 1966 A mathematical model of the culling process in dairy cattle. Anim. Prod. 8: 93–108.
ROBERTSON, A., 1977 The non-linearity of offspring-parent regression, pp. 297–304 in Proceedings of the International Conference on Quantitative Genetics, edited by E. POLLAK, O. KEMPTHORNE and T. B. BAILEY. Iowa State University Press, Ames.
TURELLI, M., 1988 Phenotypic evolution, constant covariances, and the maintenance of additive variance. Evolution 42: 1342–1347.
WRIGHT, S., 1968 Evolution and the Genetics of Populations, Vol. 1: Genetic and Biometric Foundations. University of Chicago Press, Chicago.

Communicating editor: M. TURELLI
APPENDIX A
Here we present a method for fitting a reduced estimate of 𝒢 and testing its consistency with the data. We then illustrate the procedure using the data on the log of male mouse weight at ages 2, 3 and 4 weeks from RISKA, ATCHLEY and RUTLEDGE (1984).

Finding a reduced estimate: Recall that a reduced estimate is one consisting of k orthogonal functions, where k is smaller than n (the dimensionality of Ĝ, which equals the number of ages at which measurements of body size were obtained). Given a set S of k orthogonal functions, we use the method of weighted least squares to fit the k × k reduced coefficient matrix ĈS. This produces the most statistically efficient estimate of the coefficient matrix that can be obtained from a linear function of the elements of Ĝ (DRAPER and SMITH 1966, p. 80). To apply weighted least squares, we begin by forming the vector ĝ by stacking the successive columns of Ĝ:

ĝ = (Ĝ11, . . ., Ĝn1, Ĝ12, . . ., Ĝn2, . . ., Ĝnn)ᵀ.

This vector has dimension n². The vector ĉ = (ĉ11, . . ., ĉk1, ĉ12, . . ., ĉk2, . . ., ĉkk)ᵀ is formed in the same way from the (as yet undetermined) coefficient matrix ĈS, and has dimension k². In this notation, the relation between the undetermined coefficients and the observed genetic covariances is given by the regression equation
ĝ = XS ĉ + e,   (A1)

where e is a vector of errors and the matrix XS is determined by the set S of orthogonal functions. The matrix XS is calculated by first forming ΦS, the n × k matrix obtained by deleting the columns of Φ corresponding to those φi not in S, then taking the Kronecker product of ΦS with itself:

XS = ΦS ⊗ ΦS = [ (ΦS)11 ΦS  (ΦS)12 ΦS  . . . ]   (A2)
               [ (ΦS)21 ΦS  (ΦS)22 ΦS  . . . ]
               [     .           .          ]

This is a matrix of dimensions n² × k². Calculating ĉ also requires the covariance matrix V
of the errors in the estimates of ĝ: Vij,kl = Cov[Ĝij, Ĝkl]. V can be estimated given the particular design of the breeding experiment used to estimate G. We present the general method for calculating V̂, the estimate of V, and apply it to three widely used experimental designs (parent-offspring regression, half-sibs, and full sibs) in APPENDIX B.
In typical regression applications, a least-squares estimate of the coefficients in ĉ would follow directly from the linear form of Equation A1 and the specification of XS and V. The symmetry of Ĝ, however, produces redundancies in the vector ĝ that cause V to be singular, and so prevents us from calculating ĉ from Equation A1 immediately. We therefore make the following modifications:
1. Delete from V̂ those columns and rows corresponding to elements of ĝ whose entry Ĝij has i < j.
2. Delete from XS the rows corresponding to those elements of ĝ for which Ĝij has i < j.
3. For each element of ĉ for which Ĉij has i < j, add the corresponding column of XS to the column corresponding to Ĉji, then delete the former column.
4. Delete from ĝ the elements for which Ĝij has i < j.
5. Delete from ĉ the elements for which Ĉij has i < j.

Following these operations, V̂ has dimensions n(n + 1)/2 × n(n + 1)/2, XS becomes n(n + 1)/2 × k(k + 1)/2, ĝ becomes n(n + 1)/2 × 1, and ĉ becomes k(k + 1)/2 × 1.
We now can calculate ĉ using standard weighted least squares procedures [see, e.g., DRAPER and SMITH (1966, pp. 77–81) and BULMER (1985, pp. 60–61)]:

ĉ = (XSᵀ V̂⁻¹ XS)⁻¹ XSᵀ V̂⁻¹ ĝ.   (A3)
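Equation A3 is ordinary generalized (weighted) least squares, and the goodness-of-fit statistic of Equation A4 below is the weighted residual sum of squares. A self-contained numerical sketch with a random design matrix and a diagonal stand-in for V̂ (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical reduced problem: 6 non-redundant elements of g-hat,
# 3 free coefficients, and a diagonal error covariance as a stand-in.
X = rng.normal(size=(6, 3))
V = np.diag(rng.uniform(1.0, 2.0, 6))
c_true = np.array([300.0, -10.0, 20.0])
g = X @ c_true + rng.multivariate_normal(np.zeros(6), V)

# Equation A3: c-hat = (X' V^-1 X)^-1 X' V^-1 g.
Vi = np.linalg.inv(V)
c_hat = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ g)

# Equation A4: approximate chi-squared statistic with
# m - p = 6 - 3 degrees of freedom.
resid = g - X @ c_hat
chi2 = resid @ Vi @ resid
print(c_hat, chi2)
```

A significant chi-squared value would trigger the next step of the algorithm: refit with one more orthogonal function and test again.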
The reduced coefficient matrix ĈG is then constructed from ĉ. First, form a matrix by restoring the elements deleted in Step 5 above, and "unstacking" the columns. Then insert a row and a column of zeroes in the positions corresponding to those orthogonal functions not included in ΦS to obtain ĈG. (For example, if the first-order orthogonal function φ1 has been omitted, a row of zeroes would be inserted into ĈS between the 0th and 2nd rows, and a column of zeroes between the 0th and 2nd columns.) The reduced estimate 𝒢̂ of the additive genetic covariance function is then obtained by substituting ĈG into Equation 3.
Having produced the reduced estimate 𝒢̂ using the set of orthogonal functions S, we want to test the goodness of fit of 𝒢̂ to Ĝ. We have adopted a procedure that approximates the distribution of errors in the estimated Ĝij by a multivariate normal. Using this approximation, the consistency of 𝒢̂ and Ĝ can be determined using the standard test for the fit of a regression model [see DRAPER and SMITH (1966, pp. 77–81) and BULMER (1985, pp. 60–61)]. We test the chi-squared statistic

χ²(m − p) = (ĝ − XS ĉ)ᵀ V̂⁻¹ (ĝ − XS ĉ),   (A4)
where m = n(n + 1)/2 is the number of degrees of freedom in Ĝ and p = k(k + 1)/2 is the number of parameters being fit. A significant result indicates that the model is inconsistent with the data, in which case we attempt to fit a model consisting of a larger number of orthogonal functions.
Because we are approximating the errors in the Ĝij's as multivariate normal, the chi-squared test does not produce exact probability values. We expect it, however, to be a reasonable guide that discriminates between candidate covariance functions that fit the data reasonably well and those that do not. More accurate tests could be developed with numerical simulation.
In summary, the algorithm for finding the reduced estimate of the covariance function is as follows. Estimates of the additive covariance function are obtained by fitting orthogonal functions in a stepwise manner using weighted least squares (Equation A3). Each estimate is tested against G using the approximate statistical test given by Equation A4. The reduced estimate is the simplest set of orthogonal functions (e.g., the polynomial of lowest degree) which when fit produces an estimate of 𝒢 that is not statistically different from G.
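The stepwise procedure just summarized can be sketched numerically. The following is a minimal NumPy illustration of Equations A3 and A4; the function name and inputs (g, the data vector; X_S, the coefficient matrix for the chosen set S; V̂, the error covariance of g) are hypothetical, not the authors' own software.

```python
import numpy as np

def fit_reduced_estimate(g, X_S, V_hat):
    """Weighted least squares fit (Equation A3) and goodness-of-fit
    chi-squared statistic (Equation A4) for a reduced model."""
    V_inv = np.linalg.inv(V_hat)
    # Equation A3: c = (X' V^-1 X)^-1 X' V^-1 g
    c_hat = np.linalg.solve(X_S.T @ V_inv @ X_S, X_S.T @ V_inv @ g)
    resid = g - X_S @ c_hat
    chi2 = float(resid @ V_inv @ resid)   # Equation A4
    df = len(g) - X_S.shape[1]            # m - p degrees of freedom
    return c_hat, chi2, df
```

A significant chi-squared value at m − p degrees of freedom would then prompt refitting with a larger set of orthogonal functions, as described above.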
A worked example: To illustrate the method, consider a reduced estimate for the mouse data consisting of the first two Legendre polynomials (that is, the polynomials of degrees 0 and 1). The data matrix Ĝ is given in the text (following Equation 7). Following the steps outlined above, we have
g = [436.0, 522.3, 424.2, 522.3, 808.0, 664.7, 424.2, 664.7, 558.0]ᵀ.
The matrix Φ_S is found by deleting from the matrix Φ (given in the text, following Equation 7) the third column, corresponding to the missing 2nd-degree polynomial. This gives:
Φ_S = [ 0.7071  −1.2247
        0.7071   0.0
        0.7071   1.2247 ].
From Equation A2 and the steps listed above we find that
      [ 0.5  −0.866  −0.866   1.5
        0.5   0.0    −0.866   0.0
        0.5   0.866  −0.866  −1.5
        0.5  −0.866   0.0     0.0
X_S =   0.5   0.0     0.0     0.0
        0.5   0.866   0.0     0.0
        0.5  −0.866   0.866  −1.5
        0.5   0.0     0.866   0.0
        0.5   0.866   0.866   1.5 ].
Using the method described in APPENDIX B, we find that V̂, the estimated covariance matrix of errors in g, is
     [  2752   3187   2541   3187   3692   2944   2541   2944   2347
        3187   4527   3504   4527   6210   4830   3504   4830   3754
        2541   3504   3057   3504   4708   4058   3057   4058   3477
        3187   4527   3504   4527   6210   4830   3504   4830   3754
V̂ =    3692   6210   4708   6210  10453   7921   4708   7921   6005
        2944   4830   4058   4830   7921   6673   4058   6673   5562
        2541   3504   3057   3504   4708   4058   3057   4058   3477
        2944   4830   4058   4830   7921   6673   4058   6673   5562
        2347   3754   3477   3754   6005   5562   3477   5562   5155 ].
We now follow the steps prescribed above. Step 1, which deletes rows and columns from V̂, produces
V̂ = [ 2752   3187   2541   3692   2944   2347
      3187   4527   3504   6210   4830   3754
      2541   3504   3057   4708   4058   3477
      3692   6210   4708  10453   7921   6005
      2944   4830   4058   7921   6673   5562
      2347   3754   3477   6005   5562   5155 ].
Deleting rows from X_S (Step 2 above) yields
X_S = [ 0.5  −0.866  −0.866   1.5
       0.5   0.0    −0.866   0.0
       0.5   0.866  −0.866  −1.5
       0.5   0.0     0.0     0.0
       0.5   0.866   0.0     0.0
       0.5   0.866   0.866   1.5 ].
The vector of coefficients, c = [c00, c10, c01, c11]ᵀ, contains the element c01, for which i < j. In Step 3 we therefore add the third column of X_S to the second, then delete the third column. This leaves the matrix in its final form:
X_S = [ 0.5  −1.732   1.5
       0.5  −0.866   0.0
       0.5   0.0    −1.5
       0.5   0.0     0.0
       0.5   0.866   0.0
       0.5   1.732   1.5 ].
Removing the redundant elements in g (Step 4) gives
g = [436.0, 522.3, 424.2, 808.0, 664.7, 558.0]ᵀ,
and doing the same for ĉ (Step 5) produces ĉ_S = [ĉ00, ĉ01, ĉ11]ᵀ.
Growth Trajectories 991
The reduced coefficient vector ĉ_S is calculated using Equation A3. This gives

ĉ_S = [624.3, −13.8, 16.3]ᵀ,
and so
Ĉ_G = [ 624.3  −13.8   0.0
       −13.8   16.3   0.0
        0.0    0.0    0.0 ].
By using these coefficients in Equation 3, we arrive at the reduced estimate of 𝒢 that consists of the 0th- and 1st-degree Legendre polynomials:
𝒢̂(a1, a2) = 312.2 − 11.9(a1 + a2) + 24.5a1a2.
The reduced estimate 𝒢̂ can be tested against the observed genetic covariance matrix G using the chi-squared statistic of Equation A4. We find χ² = 38.68. Since G has m = 6 degrees of freedom and we have estimated p = 3 coefficients, we test the statistic with 3 degrees of freedom and find that the difference between the reduced estimate 𝒢̂ and the observed values G is highly significant. We therefore reject the reduced estimate consisting only of the 0th- and 1st-degree Legendre polynomials.
Following the same procedure for all other combinations of 0th-, 1st-, and 2nd-degree Legendre polynomials shows that only the full estimate consisting of all three is consistent with G. The error variance of the Ĝ_ij's in these data is therefore sufficiently small that no reduced model is acceptable, although this may often not be the case for smaller data sets. The full estimate 𝒢̂ is shown in Figure 2.
APPENDIX B
This appendix describes methods to calculate V, the covariance matrix of errors in Ĝ, the estimated additive genetic covariance matrix. We use the notation V_ij,kl to denote the covariance of Ĝ_ij and Ĝ_kl. Below we present formulae for estimating V for three widely used breeding designs: half sibs, full sibs, and parent-offspring regression.
In the following calculations, we will often need an expression for the covariance of two mean cross products. From classical statistical theory we have the result

Cov(M_ij, M_kl) = (M_ik M_jl + M_il M_jk)/f, (B1)

where M_ij is the mean cross product of variables i and j, and f is the number of degrees of freedom (ANDERSON 1958, p. 161; BULMER 1985, p. 94). Replacing each of the M's with its estimate M̂ and dividing by (f + 2) rather than f yields V̂_ij,kl, an unbiased estimator of the covariance.
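Equation B1, with the unbiased (f + 2) correction, can be written as a small helper. This is an illustrative sketch (the function name is invented); M_hat is the symmetric array of estimated mean cross products.

```python
import numpy as np

def cov_mcp_unbiased(M_hat, f, i, j, k, l):
    """Unbiased sampling covariance of two mean cross products
    (Equation B1, dividing by f + 2 rather than f):
    Cov(M_ij, M_kl) ~ (M_ik M_jl + M_il M_jk) / (f + 2)."""
    return (M_hat[i, k] * M_hat[j, l] + M_hat[i, l] * M_hat[j, k]) / (f + 2)
```

The entries of V̂ in Equations B3 and B6 below are sums of such terms, scaled by design constants like 16/n².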
Half-sib analysis: In the classic half-sib analysis, s sires are each mated to n dams, and one offspring is measured from each mating. An analysis of variance and covariance partitions the observed variation among the offspring into components among sires and a residual [see FALCONER (1981, p. 140) and BECKER (1984, pp. 45–54, 113–118)]. The additive genetic component of variance is estimated as
Ĝ_ij = 4(M̂_a,ij − M̂_e,ij)/n, (B2)
where M̂_a,ij is the mean cross-product among sires, M̂_e,ij is the residual mean cross-product, and n is the number of offspring per sire in a balanced design. (The mean cross-products M̂_a,ij and M̂_e,ij are defined so as to be independent.) The sampling covariance is therefore
V̂_ij,kl = (16/n²)[Cov(M̂_a,ij, M̂_a,kl) + Cov(M̂_e,ij, M̂_e,kl)], (B3)
where the covariances of the M̂'s are given by Equation B1.
We often want to estimate V from data summaries in the literature that do not include the estimated mean cross products. These quantities can, however, be back-calculated from the estimated additive genetic and phenotypic covariance matrices Ĝ and P̂ that frequently are reported. In a half-sib analysis, the necessary relations are
M̂_e,ij = P̂_ij − (1/4)Ĝ_ij (B4a)

and

M̂_a,ij = (n/4)Ĝ_ij + M̂_e,ij. (B4b)
Substituting (B4) into (B3) then gives an estimate of V̂_ij,kl.

Full-sib analysis: In this breeding design, each of s sires is mated to d dams, and n progeny are measured per dam [see FALCONER (1981, pp. 140–141) and BECKER (1984, pp. 55–65, 119–127)]. The resulting nested analysis of variance and covariance yields two estimates of the genetic covariance:
Ĝ_s,ij = 4(M̂_s,ij − M̂_d,ij)/nd, (B5a)

and

Ĝ_d,ij = 4(M̂_d,ij − M̂_e,ij)/n, (B5b)
where M̂_s,ij, M̂_d,ij, and M̂_e,ij are, respectively, the estimated among-sires, among-dams, and residual cross-products. The two estimates of the G's give rise to two estimates for the V's:
V̂_ij,kl = (16/n²d²)[Cov(M̂_s,ij, M̂_s,kl) + Cov(M̂_d,ij, M̂_d,kl)] (B6a)

and

V̂′_ij,kl = (16/n²)[Cov(M̂_d,ij, M̂_d,kl) + Cov(M̂_e,ij, M̂_e,kl)], (B6b)
where the covariances are again calculated using Equation B1. The two estimates of V obtained from (B6a) and (B6b) can be averaged to give a single composite estimate.
The M̂'s that appear in (B6a,b) can be obtained from reported values of Ĝ_s, Ĝ_d, and P̂ using

M̂_e,ij = P̂_ij − (1/4)(Ĝ_s,ij + Ĝ_d,ij), (B7a)

M̂_d,ij = (n/4)Ĝ_d,ij + M̂_e,ij, (B7b)

and

M̂_s,ij = (nd/4)Ĝ_s,ij + M̂_d,ij. (B7c)
Parent-offspring regression: When parent-offspring regression [see FALCONER (1981, pp. 136–140) and BECKER (1984, pp. 103–106, 133–134)] is used, the additive genetic covariance of trait i with trait j can be estimated using
Ĝ_ij = (M̂_ij + M̂_ji)/2, (B8)
where M̂_ij is the estimated cross-product for trait i in the offspring and trait j in the parents. That is,

M̂_ij = 2 Σ_k (z̄_ik − z̄_i)(z^P_jk − z̄^P_j)/f,

where z̄_ik is the mean of trait i in family k, z̄_i is the overall mean of z_i in the offspring, z^P_jk is the midparent value of trait j in family k, z̄^P_j is the overall mean of trait j in the parents, and f is the degrees of freedom. Our estimates of the sampling covariances of the genetic covariances are then readily obtained from Equation B1 as

V̂_ij,kl = (1/4)[Cov(M̂_ij, M̂_kl) + Cov(M̂_ij, M̂_lk) + Cov(M̂_ji, M̂_kl) + Cov(M̂_ji, M̂_lk)].
Variation in family size can be taken into account using a form of weighted regression (KEMPTHORNE and TANDON 1953). Doing so results in each mean cross-product, M̂_ij, being multiplied by a weight, w_i, which is the reciprocal of the variance of the offspring means about the regression line. The weight of trait z_i is a function of ρ_i, the intraclass correlation of trait z_i in the offspring (= h²/2 for midparent regression in the absence of dominance and environmental correlations between sibs); β_ij, the slope of the parent-offspring regression; P_ii, the phenotypic variance of z_i; and n, the number of offspring per family (KEMPTHORNE and TANDON 1953; BOHREN, MCKEAN and YAMADA 1961; BULMER 1985, p. 79). If family size varies, weighted regression should be used to estimate the genetic parameters. ρ_i and β_ij can either be guessed, or estimated from the data and used to iteratively calculate the regression coefficients (cf. BULMER 1985, pp. 83–84). Note, however, that the latter method yields biased estimates of the parameters (BOHREN, MCKEAN and YAMADA 1961).
APPENDIX C
This appendix describes in detail two methods for testing hypotheses about the estimated additive genetic covariance function 𝒢̂. The first tests whether one or more of the eigenvalues of 𝒢̂ are statistically indistinguishable from 0. The second is a numerical method for constructing the confidence limits of the eigenvalues of 𝒢̂. In this appendix we make use of the notation and results of APPENDIXES A and B.
To find confidence limits on the estimates of the eigenvalues of 𝒢̂, we begin by forming the n(n + 1)/2-dimensional vector g from the diagonal and subdiagonal elements of Ĝ (as described in APPENDIX A) and the n(n + 1)/2 × n(n + 1)/2 error matrix V̂ (as described in APPENDIX B). The elements of an additive genetic covariance matrix simulated with error are calculated as
g′ = g + V̂^(1/2) e, (C1)
where V̂^(1/2) is the matrix square root of V̂ and e is an n(n + 1)/2-dimensional vector of uncorrelated, normally distributed random variates with expectation 0 and variance 1. The simulated covariance matrix G′ is then reconstructed from the elements of g′. The corresponding coefficient matrix Ĉ_G′ is determined using Equation 5, and its eigenvalues calculated. The values are recorded, and the entire procedure reiterated. We have been using 1000 iterations in our analyses.
The α-percent confidence limits for each eigenvalue can then be determined directly by the range included by 1 − α of the values. Confidence regions for the values of the eigenfunctions at any specified points (ages) of interest can be determined at the same time.
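The simulation procedure above can be sketched as follows, under two layout assumptions not fixed by the text: g stores the lower-triangle elements of Ĝ in row-major order, and the full square matrix Φ of orthogonal-function values is used, so the coefficient matrix can be recovered as C_G = Φ⁻¹ G Φ⁻ᵀ (Equation 3 rearranged). All names are illustrative.

```python
import numpy as np

def eigenvalue_confidence_limits(g, V_hat, Phi, n_iter=1000, alpha=0.05, seed=0):
    """Monte Carlo confidence limits for the eigenvalues of the
    coefficient matrix, via repeated simulation of Equation C1."""
    rng = np.random.default_rng(seed)
    n = Phi.shape[0]
    # matrix square root of the (symmetric) error matrix V-hat
    w, U = np.linalg.eigh(V_hat)
    V_half = U @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ U.T
    Phi_inv = np.linalg.inv(Phi)
    idx = np.tril_indices(n)
    eigs = np.empty((n_iter, n))
    for t in range(n_iter):
        g_sim = g + V_half @ rng.standard_normal(len(g))   # Equation C1
        G_sim = np.zeros((n, n))
        G_sim[idx] = g_sim                                  # restore lower triangle
        G_sim = G_sim + G_sim.T - np.diag(np.diag(G_sim))   # symmetrize
        C_sim = Phi_inv @ G_sim @ Phi_inv.T                 # coefficient matrix
        eigs[t] = np.sort(np.linalg.eigvalsh(C_sim))
    lower = np.quantile(eigs, alpha / 2.0, axis=0)
    upper = np.quantile(eigs, 1.0 - alpha / 2.0, axis=0)
    return lower, upper
```

Recording eigenfunction values at ages of interest inside the same loop would give the pointwise confidence regions mentioned in the text.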
Our second method tests the hypothesis that one or more of the estimated eigenvalues of 𝒢̂ is statistically indistinguishable from 0. We can write the estimated coefficient matrix Ĉ_G in terms of its eigenvalues and eigenvectors:
Ĉ_G = U Λ Uᵀ, (C2)
where Λ is a diagonal matrix whose elements are the eigenvalues of Ĉ_G and U is a matrix whose columns
are the corresponding eigenvectors. We then generate a coefficient matrix C*_G by setting one or more of the eigenvalues in Λ in Equation C2 equal to 0. The genetic covariance matrix G* is constructed using
G* = Φ C*_G Φᵀ, (C3)
from which the vector g* is formed from the lower diagonal elements of G* in the same way that g was. The hypothesis of zero eigenvalues is then tested with the chi-squared statistic

χ² = (g − g*)ᵀ V̂⁻¹ (g − g*) (C4)
with t( t + 1)/2 degrees of freedom, where t is the number of eigenvalues that have been set to zero. If this reaches a significant value, then the hypothesis that those eigenvalues are zero is rejected. The same procedure can be used to test a hypothesis that one or more eigenvalues are equal to some specified values other than zero.
Human Biology, December 1993, v. 65, no. 6, pp. 941–966. Copyright © 1993 Wayne State University Press, Detroit, Michigan 48202
key words: genotype-environment interaction, complex segregation analysis, quantitative genetics, quantitative traits, statistical methods.
Statistical Genetic Approaches to Human Adaptability
J. Blangero1
Abstract The genetic determinants of physiological and developmental responses to environmental stress are poorly understood. This has been primarily due to the difficulty of direct measurement of response and the lack of appropriate statistical genetic methods. Here, I present a unified statistical genetic methodology for human adaptability studies that permits evaluation of the inheritance of quantitative trait response to environmental stressors. The foundation of this approach is the mathematical relationship between genotype environment interaction and the genetic variance of response to environmental challenge. I describe two basic methods that can be used for either discrete or continuous environments. Each method allows for major loci, residual polygenic variation, and genotype environment interaction at both the major genic and the polygenic levels. The first method is based on multivariate segregation analysis and is appropriate for situations in which data are available for each individual in each environment. The second method is appropriate for the more common case when response to the environment cannot be observed directly. This method is based on an extension of a mixed major locus/variance component model and can be used when singly measured related individuals are observed in different environments. Three example applications using data on lipoprotein variation in pedigreed baboons are provided to show the utility of these methods.
The study of human adaptability examines the biological responses of individuals to environmental change and the variability in such reaction norms both within and between populations. Past research in this field has focused primarily on the assessment of the environmental components of basic homeostatic mechanisms involved in normal physiological and developmental processes in stressful environments (Baker 1976; Frisancho 1979). Genetic inferences have usually been limited to indirect quasi experimental between population comparisons using classical migration designs (Harrison 1966). Because of the general lack of family studies and of an appropriate statistical methodology, relatively little is known about the underlying genetic basis of quantitative physiological responses to environmental change.
1. Department of Genetics, Southwest Foundation for Biomedical Research, PO Box 28147, San Antonio, TX 78228-0147.
524 / blangero
The genetic determinants of physiological response are likely complex, involving both major genes with large effects and minor genes (polygenes) with small effects, and their elucidation will require new analytical methods that explicitly incorporate genotype environment interaction. In this article I present a general statistical genetic approach for human adaptability studies that permits evaluation of the inheritance of quantitative traits involved in adaptation to environmental stresses.
The approach that I advocate for the dissection of the genetic architecture of response to environmental stress uses genotype environment (G × E) interaction as the focal concept. In most cases, significant G × E interaction is interpretable as evidence for a heritable basis of biological response to environmental change. This relationship can be exploited to make inferences about the genetic mediation of physiological or developmental responses to environmental stresses. The advantage of using G × E interaction as an analytical focus is that, given an appropriate pedigree based sampling design, it can be evaluated even when direct measurement of response is not possible. Therefore the statistical analysis of G × E interaction can provide a useful framework in which to make inferences about the genetic determinants of response.
I present a model of quantitative trait variation that includes the effects of a single unknown major gene (MG), polygenes (PG), known (and therefore measured) environmental factors (E), and random environmental factors (e). Although the model easily generalizes to multiple major loci (Blangero et al. 1990), I limit the present exposition to a single locus. Instead of (or in addition to) the unknown major gene, the model can include a known genetic polymorphism at a candidate locus (a measured gene, noted as mG). Using this basic model, I discuss how G × E interaction can be examined at several levels: (1) polygenotype environment interaction (PG × E), (2) major genotype environment interaction (MG × E), and (3) major genotype random environment interaction (MG × e). The direct analogues for the case when we have information on a known genetic polymorphism are termed measured genotype environment interaction (mG × E) and measured genotype random environment interaction (mG × e).
The problem is also divided on the basis of experimental design into two types: (1) complete data situations in which each individual can be measured in each environment and (2) missing data situations in which each individual can be measured in only a single environment. When each individual can be measured in each environment, the examination of G × E interaction and the genetic analysis of response to environmental change is straightforward, because response can be directly observed. For such a situation multivariate genetic analysis of the trait’s expression in the different environments can be used to examine the genetic architecture of response (Falconer 1952). Complete data situations most likely occur when the environment can be manipulated easily (e.g., cold stressor tests, exercise tests) or when individuals routinely encounter multiple environments (e.g., diurnal variation, seasonal variation).
When each individual can be measured only in a single environment, the analysis of G × E interaction and of the genetic determinants of response is
Statistical Genetics of Human Adaptability / 525
greatly complicated. Such missing data situations occur when environments are difficult to manipulate (e.g., smokers versus nonsmokers), are exclusive (e.g., males versus females), or are continuous (e.g., years spent at high altitude, dietary indexes). However, I show that, when related individuals are measured in different environments, genetic analysis of the response to environmental change is possible.
Models for G × E Interaction
Each Individual Measured in Each Environment. The simplest situation in which to evaluate G × E interaction is when each individual can be measured in each environment. In this section I develop a general framework for discussing G × E interaction by examining the bivariate case of quantitative trait expression in two discrete environments. Given knowledge about the major locus genotype, an individual’s phenotype measured in the first environment can be written as
p_m1 = μ_m1 + β′_1 x_1 + g_1 + e_1, (1)
where m represents the major locus genotype, μ_m1 is the mean of the mth genotype in the first environment, β_1 is a vector of regression coefficients corresponding to the vector of fixed effects x_1, g_1 is the polygenotypic effect, and e_1 is the random environmental deviation. For the current one-locus problem, m can be AA, AB, or BB, which are assumed to be in Hardy-Weinberg equilibrium with frequencies ψ = [q², 2q(1 − q), (1 − q)²]′, where q is the frequency of the A allele.
We can model the phenotypic value in the second environment as a linear function of the phenotypic expression in the first environment:
p_m2 = μ_m2 + β′_2 x_2 + g_2 + e_2
     = p_m1 + Δ
     = (μ_m1 + Δ_μ) + (β_1 + Δ_β)′(x_1 + Δ_x) + (g_1 + Δ_g) + (e_1 + Δ_e), (2)

where

Δ = Δ_μ + Δ′_β x_1 + (β_1 + Δ_β)′ Δ_x + Δ_g + Δ_e (3)

refers to the response (p_2 − p_1) to the environment, and the subscripted Δ's are the component-specific changes in parameters (or random variables) between environments. Now let the complete bivariate phenotype be represented as p = [p_1, p_2].
Assuming that there is no correlation between the polygenotypic vector and the vector of random environmental deviations (i.e., no PG × E correlation, Cov[g, e] = 0), the conditional phenotypic covariance matrix for p is given by
Var(p | m) = G + E_m, (4)
where G is a within genotype additive genetic covariance matrix and Em is a within genotype random environmental covariance matrix. The within genotype additive genetic covariance matrix is assumed to be constant across genotypes (i.e., there is no MG × PG epistasis). G has the form
G = [ σ²_G1              ρ_G σ_G1 σ_G2
      ρ_G σ_G1 σ_G2      σ²_G2         ], (5)
where σ²_Gi is the residual additive genetic variance of the trait in the ith environment and ρ_G is the additive genetic correlation between trait expressions in the two environments. The environmental covariance matrices E_m have analogous forms.
The total phenotypic covariance matrix of p can be decomposed into its three constituent parts (Blangero and Konigsberg 1991):
Var(p) = M + G + E (6a)

       = μWCμ′ + G + Σ_i ψ_i E_i, (6b)

where M is the genetic covariance matrix resulting from the major locus and E is the pooled within-genotype environmental covariance matrix. In Eq. (6b), μ is the 2 × 3 matrix of genotype-specific means, W = diag(ψ), and C = (I − 1ψ′). Equation (6b) shows that the covariance matrix resulting from the major locus is strictly a function of the genotypic means and the genotypic frequencies. In the univariate case this formula reduces to
σ²_M = Σ_i ψ_i (μ_i − Σ_j ψ_j μ_j)². (7)
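Equation 7 can be illustrated with a small helper (the function name is invented); ψ holds the genotype frequencies and μ the genotype-specific means.

```python
import numpy as np

def major_locus_variance(psi, mu):
    """Equation 7: variance contributed by the major locus,
    sigma_M^2 = sum_i psi_i (mu_i - sum_j psi_j mu_j)^2."""
    psi = np.asarray(psi, dtype=float)
    mu = np.asarray(mu, dtype=float)
    pop_mean = psi @ mu                  # population mean of the trait
    return float(psi @ (mu - pop_mean) ** 2)
```

For example, an additive biallelic locus with q = 0.5 (ψ = [0.25, 0.5, 0.25]) and genotypic means [−1, 0, 1] contributes a variance of 0.5.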
Given this variance decomposition of the bivariate phenotype, we can completely specify the variance components of the response. Because Δ is a linear transformation of p (Δ = kp, where k = [−1, 1]), the conditional variance of the response given the major locus genotype is

Var(Δ | m) = k′(G + E_m)k
           = k′Gk + k′E_m k
           = (σ²_G1 + σ²_G2 − 2ρ_G σ_G1 σ_G2) + (σ²_Em1 + σ²_Em2 − 2ρ_Em σ_Em1 σ_Em2). (8)
As with the original phenotypes, the total phenotypic variance of the response to the second environment can be partitioned into three components:
Var(Δ) = k′(M + G + E)k
       = σ²_MΔ + σ²_GΔ + σ²_EΔ. (9)
The heritability of the response can be broken into two components: that resulting from the major locus (h²_MΔ) and that resulting from the polygenes (h²_GΔ):
h²_MΔ = σ²_MΔ / (σ²_MΔ + σ²_GΔ + σ²_EΔ), (10a)

h²_GΔ = σ²_GΔ / (σ²_MΔ + σ²_GΔ + σ²_EΔ). (10b)
The total heritability of response is simply the sum of these two component-specific heritabilities: h²_Δ = h²_MΔ + h²_GΔ.
PG × E Interaction. PG × E interaction occurs when there is a significant polygenic component of variance in response. The additive polygenic variance in response is a function of the additive genetic variance expressed in the two environments and the additive genetic correlation between the trait’s expression in the two environments:
σ²_GΔ = σ²_G1 + σ²_G2 − 2ρ_G σ_G1 σ_G2
      = (σ_G1 − σ_G2)² + 2σ_G1 σ_G2 (1 − ρ_G). (11)
The absence of PG × E interaction implies that there is no polygenic variance for the response to the environment (i.e., σ²_GΔ = 0). Equation (11), which was first derived by Robertson (1959), shows that there is no PG × E interaction when σ²_G1 = σ²_G2 and ρ_G = 1. The first condition requires that the polygenic variance be constant across environments. Observed heteroscedasticity of genetic variances across environments can arise simply because of scaling. For example, if g_2 = cg_1, where c is a constant, σ²_G2 will be equal to c²σ²_G1 and (σ_G1 − σ_G2)² will equal (1 − c)²σ²_G1. For the second condition (ρ_G = 1) to hold, the same polygenes must influence the phenotype in both environments and have similar effects in each environment. The second condition is therefore the requirement of complete pleiotropy. In the absence of complete pleiotropy (ρ_G < 1), the polygenotypes may exhibit different ranks in different environments: one genotype may express the highest quantitative trait mean in one environment, but a different genotype may have the highest mean in a second environment.
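Equation 11 translates directly into code; the two terms isolate the scale (heteroscedasticity) effect and the imperfect-pleiotropy effect. The function name is invented for illustration.

```python
def polygenic_response_variance(sd1, sd2, rho_G):
    """Equation 11 (Robertson 1959): additive polygenic variance of
    the response, sigma_GD^2 = (sd1 - sd2)^2 + 2 sd1 sd2 (1 - rho_G).
    Zero only when sd1 == sd2 (constant variance) and rho_G == 1
    (complete pleiotropy)."""
    return (sd1 - sd2) ** 2 + 2.0 * sd1 * sd2 * (1.0 - rho_G)
```

With equal standard deviations and ρ_G = 1 the response variance vanishes; relaxing either condition makes it positive, which is the signature of PG × E interaction.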
MG × E Interaction. Significant MG × E interaction requires that there be a significant major locus component for the biological response to the environment. Similar to the case for PG × E, the major locus component of variance in response is given by
σ²_MΔ = (σ_M1 − σ_M2)² + 2σ_M1 σ_M2 (1 − ρ_M). (12)
The absence of MG × E interaction requires σ²_MΔ = 0. This will occur only when Δ_AA = Δ_AB = Δ_BB (i.e., when there is no major genotype-specific response).
MG × e Interaction. A more subtle form of G × E interaction involves interaction between a gene and the random environment—that part of the unmeasured
environment that is specific to each individual. This type of interaction, which I denote MG × e interaction, is closely related to the concepts of physiological stability and developmental stability (Mather 1953; Bradshaw 1965). If some genotypes are more environmentally labile, they will exhibit increased random environmental variance. Therefore MG × e interaction can be said to exist when the environmental variance is a function of major locus genotype. In terms of our current bivariate model, the null hypothesis of no MG × e interaction requires that EAA = EAB = EBB. It is important to note that this type of interaction can be examined without reference to any measured environment. Unlike PG × E and MG × E interaction, MG × e interaction is not directly related to the genetic variance of the response of a quantitative trait to an environmental change.
Each Individual Measured in a Single Environment. The missing data situation in which individuals can be measured in only a single environment is considerably more complex. In this section I extend the model to the examination of trait expression as a function of a continuous environmental index. Although I focus on this continuous case, the proposed model also can be formulated for discrete environments.
For the case of a measured continuous environment indexed by z, an individual’s phenotype can be modeled as the linear function
p_m = μ_m + α_m z + β′_z x + g + e, (13)
where α_m is a genotype-specific regression on the environmental index, which is assumed to be scaled so that the basal environment exhibits a value of 0. The response to the measured environment z relative to the basal environment can be defined as Δ_mz = α_m z. The regression coefficients β_z are subscripted to allow them to be a function of the measured environment if necessary.
PG × E Interaction. For the continuous environment case the presence of PG × E interaction implies that the polygenic variance is a function of z:
Var(g | z) = f_G(z, θ)
           = σ²_Gz, (14)
where f_G(z, θ) is a nonnegative function and θ is a vector of parameters. Such variance functions can take many possible forms (Carroll and Ruppert 1988). For example, we can assume that the additive genetic standard deviation is a linear parametric function of the measured environment:
σ_Gz = σ_G0 (1 + γ_G z), (15)
where σ_G0 is the expected additive genetic standard deviation when z = 0 and γ_G determines the rate of change in σ_Gz. These two parameters must be constrained so that σ_Gz ≥ 0. Another possible variance function model is to let the logarithm of the additive genetic standard deviation be a linear function of the environment, which leads to

σ_Gz = exp(κ_G + γ_G z), (16)

which guarantees that σ_Gz > 0.

Similarly, the genetic correlation between an individual's polygenotypic value expressed in environment z_i with that expressed in environment z_j can be written

ρ_G(g_zi, g_zj) = f_G(z_i, z_j), −1 ≤ f_G(z_i, z_j) ≤ 1. (17)
If there is no PG × E interaction, Var(g | z) = σ²_G and ρ_G(g_zi, g_zj) = 1. Again, the parametric function f_G(z_i, z_j) can take any number of forms. One simple yet plausible one is to let the genetic correlation be an exponential function of the difference between the two environmental indexes:
ρ_G(g_zi, g_zj) = exp(−λ_G |z_i − z_j|), (18)
where λ_G is a parameter that determines the rate of exponential decline in the genetic correlation as the difference between environmental indexes increases. If λ_G = 0, the genetic correlation between trait expressions in any two environments is 1. More elaborate functions can be specified to take into account the environments individuals may have previously encountered (Hopper and Mathews 1983) when such historical environmental data are available.
For the variance functions listed, a statistical test of the null hypothesis of no PG × E interaction or no polygenic variance in response can be based on the expectation that γ_G = 0 and λ_G = 0. Once the variance and correlation functions are known, the genetic variance for the response between any two environments can be obtained. For example, plugging the functions given in Eqs. (15) and (18) into Eq. (11) yields the following prediction for the polygenic variance in response to a change in environmental index relative to the basal environment:
σ²_GΔz = σ²_G0 [γ²_G z² + 2(1 + γ_G z)(1 − exp(−λ_G z))]. (19)
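Equations 15, 18, and 19 can be sketched together; agreement of Eq. (19) with the general Robertson form of Eq. (11) provides a built-in consistency check. Parameter names mirror the text (σ_G0, γ_G, λ_G); the function names are invented.

```python
import math

def genetic_sd(z, sd0, gamma_G):
    """Equation 15: linear model for the additive genetic SD."""
    return sd0 * (1.0 + gamma_G * z)

def genetic_corr(z_i, z_j, lambda_G):
    """Equation 18: exponential decay of the genetic correlation
    with the difference between environmental indexes."""
    return math.exp(-lambda_G * abs(z_i - z_j))

def response_variance(z, sd0, gamma_G, lambda_G):
    """Equation 19: polygenic variance of response to environment z
    relative to the basal environment (z = 0)."""
    return sd0 ** 2 * (gamma_G ** 2 * z ** 2
                       + 2.0 * (1.0 + gamma_G * z)
                       * (1.0 - math.exp(-lambda_G * z)))
```

Substituting σ_G1 = genetic_sd(0, ...), σ_G2 = genetic_sd(z, ...), and ρ_G = genetic_corr(0, z, ...) into Eq. (11) reproduces response_variance(z, ...) exactly, and both vanish when γ_G = λ_G = 0, the null hypothesis of no PG × E interaction.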
MG × E Interaction. For the continuous environment a quantitative phenotype will exhibit significant MG × E interaction when the genetic variance resulting from the major locus varies as a function of the environmental index:
σ²_Mz = f(z, α_AA, α_AB, α_BB), (20)

which requires that σ²_MΔz > 0 for at least some values of z. This will occur only
when there is heterogeneity in the genotype specific regressions on the environmental index, because
σ²_MΔz = z² Σ_i ψ_i (α_i − Σ_j ψ_j α_j)². (21)
MG × e Interaction. The residual random environmental variance can also be modeled as a function of the measured environmental index. The same variance functions used for the polygenic variance can be used for the random environmental variance. To allow for MG × e interaction, the parameters of the environmental variance function have to be major genotype specific:
Var(e | z, m) = f_E(z, θ, m)
             = σ²_Emz. (22)
The analogous environmental function to Eq. (15) is
σ_Emz = σ_Em0 (1 + γ_Em z). (23)
The null hypothesis of no MG × e interaction requires that σ_EAA0 = σ_EAB0 = σ_EBB0 and γ_EAA = γ_EAB = γ_EBB. If there is evidence of MG × e interaction, then genotypes significantly vary in their environmental stabilities.
Environmental Variance of Response. When individuals can be measured in only a single environment, the random environmental correlation between the expression of the trait in different measured environments is statistically unidentifiable. Therefore the random environmental variance of the response to environmental change is also undefined. However, if we assume that the correlation between random environmental deviations is 0, the expected pooled environmental variance of response to environment zi will be
σ²_EΔzi = (σ_E0 − σ_Ezi)² + 2σ_E0 σ_Ezi. (24)
The assumption of ρ_E = 0 is plausible for many physiological traits that are influenced by time-specific local conditions. Therefore Eq. (24) may be useful to help gauge the relative importance of major genic and polygenic determinants when individuals can be measured only in single environments.
Statistical Methods
A variety of statistical genetic methods can be used to assess G × E interaction, depending on the genetic determinants to be considered and the experimental design of the study.
Complete Data Situations: Multivariate Genetic Analysis. Statistical detection of G × E interaction is uncomplicated when information is available on each individual in each environment. Given sufficient pedigree data, the parameters of the model for the bivariate complete data situation described in Eqs. (1), (2), and
(4)–(6) can be estimated using standard likelihood methods for pedigrees. For a pedigree of size n, let P be the n × 2 matrix of phenotypes. Assuming multivariate normality within genotypes, the conditional density of P given a vector of major locus genotypes m and the matrix of covariates X is
f(P \mid \mathbf{m}, X) = (2\pi)^{-n}\,|\Omega|^{-1/2} \exp\!\left[-\tfrac{1}{2}\,\mathrm{vec}(P - F\mu - X\beta)'\,\Omega^{-1}\,\mathrm{vec}(P - F\mu - X\beta)\right], (25)
where F is an n × 3 indicator matrix of 0's and 1's, mapping each individual to the appropriate row of genotype-specific means in the matrix μ. The phenotypic covariance matrix for the whole pedigree is denoted Ω and is given by
\Omega = 2\Phi \otimes G + \Upsilon, (26)
where ⊗ is the Kronecker product operator, Φ is the matrix of kinship coefficients, and ϒ is a block-diagonal matrix with the n genotype-specific random environmental covariance matrices (E_m) along the diagonal, given by
\Upsilon = \bigoplus_{i=1}^{n} E_{m_i}, (27)
where ⊕ is the direct (block-diagonal) sum operator and m_i is the ith individual's genotype.
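Under the stated assumptions, the complete-data likelihood of Eqs. (25)–(27) can be sketched with a few lines of linear algebra. The following is an illustrative implementation, not the paper's (or Fisher's) actual code; the row-major vec ordering and all variable names are my choices:

```python
import numpy as np

def pedigree_loglik(P, F, mu, X, beta, G, E_list, Phi):
    """Log of the density in Eq. (25) for an n x 2 phenotype matrix P.

    Omega = 2*Phi (Kronecker) G + Upsilon (Eqs. 26-27), with Upsilon the
    direct sum of genotype-specific environmental covariance matrices.
    Row-major ravel pairs each individual's two measurements together,
    which matches the individual-major blocks of Upsilon.
    """
    n = P.shape[0]
    resid = np.ravel(P - F @ mu - X @ beta)      # vec of residual matrix
    Omega = np.kron(2.0 * Phi, G)                # genetic part, Eq. (26)
    for i, Em in enumerate(E_list):              # add Upsilon, Eq. (27)
        Omega[2 * i:2 * i + 2, 2 * i:2 * i + 2] += Em
    _, logdet = np.linalg.slogdet(Omega)
    quad = resid @ np.linalg.solve(Omega, resid)
    return -n * np.log(2.0 * np.pi) - 0.5 * logdet - 0.5 * quad

# Toy sanity check: with Phi = (1/2) I, G = I, and each E_m = I, Omega = 2I,
# so the result equals a sum of independent N(0, 2) log-densities.
P = np.array([[1.0, 0.0], [0.0, 0.0]])
F = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])   # both individuals AA
ll = pedigree_loglik(P, F, np.zeros((3, 2)), np.zeros((2, 1)),
                     np.zeros((1, 2)), np.eye(2), [np.eye(2)] * 2,
                     0.5 * np.eye(2))
```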
Quantitative Genetic Analysis. For the case of no major gene effects and no measured genotypic effects (i.e., μ_AA = μ_AB = μ_BB), Eq. (25) is the likelihood of the polygenic model commonly used in quantitative genetic analyses (Hopper and Mathews 1982; Lange and Boehnke 1983). For these polygenic models the maximum likelihood estimates of the parameters can be obtained by maximizing the ln likelihood function ln[f(P | X)] using numerical optimization methods. However, in the present context, inference based on such a simple model is limited to the examination of PG × E interaction.
Measured Genotype Analysis. If information is available on a polymorphic candidate locus that may be involved in the physiological, developmental, or metabolic pathway of the trait being studied, several additional types of G × E interaction can be detected. Without modification, Eq. (25) provides the likelihood of a multivariate measured genotype model (Boerwinkle et al. 1986; Blangero et al. 1992). Parameter estimation for measured genotype models can be performed by maximizing the function ln[f(P | m, X)]. Using this technique, PG × E, mG × E, and mG × e interaction can all be tested.
Complex Segregation Analysis. For models that include a major gene component, the likelihood is more complex and parameter estimation is more difficult. The detection of major genes affecting quantitative traits is accomplished using a
set of methods known as complex segregation analysis (Elston and Stewart 1971; Morton and MacLean 1974). A standard set of hypotheses is tested before the hypothesis of major gene involvement is accepted (Lalouel et al. 1983). In recent years these methods have been extended to allow for multivariate phenotypes (Lalouel 1983; Bonney et al. 1988; Blangero and Konigsberg 1991). This extension to multiple traits has great implications for the joint examination of MG × E and PG × E interaction because it allows the simultaneous analysis of a single trait measured in multiple environments (Blangero et al. 1990). Because response is a linear function of the original phenotypes, results from a multivariate segregation analysis can be formulated in terms of the variance components of response, as detailed in Eqs. (8)–(12).
The likelihood of a multivariate mixed major gene and polygene model can be written (Blangero and Konigsberg 1991)
L(\mu, \beta, G, E, q_A \mid P, X, \mathbf{M}) = \sum_{j=1}^{3^n} f(\mathbf{m}_{\cdot j})\, f(P \mid \mathbf{m}_{\cdot j}, X), (28)
where M is an n × 3^n matrix containing all possible genotypic combinations, m_{·j} is the jth column of M, and f(m_{·j}) is the probability of observing the jth genotypic vector, which is a function of the pedigree structure and the rules of Mendelian transmission (Elston 1981). In Eq. (28), the summation is over all possible genotypic vectors—a potentially enormous number. For a Mendelian model this number can be reduced by eliminating impossible genotypic combinations from consideration (e.g., father = AA, mother = BB, child = AA). Even so, it is this requirement to sum over all remaining genotypic vectors that gives segregation analysis its computational complexity. Fortunately, some efficient methods for recursive probability calculations on pedigrees are available (Elston and Stewart 1971; Cannings et al. 1978). These methods exploit the systematic pattern of the residual phenotypic covariance matrix that occurs when there is a standard additive polygenic residual component. In such cases the polygenotypes of offspring are independent, given the polygenotypes of their parents (Elston and Stewart 1971). This assumption greatly reduces the numerical burden of calculating the likelihood of a mixed major locus/polygenic model by permitting efficient analytical integration over multivariate polygenotypes (Hasstedt 1982; Blangero and Konigsberg 1991) or by allowing the rapid inversion of small patterned residual covariance matrices (Bonney 1984).
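The size of the sum in Eq. (28), and the pruning of Mendelian-impossible vectors, can be illustrated by brute force on a toy father-mother-child trio. The allele-drawing check below is a crude device of my own for illustration, not a real segregation analysis algorithm:

```python
from itertools import product

GENOTYPES = ("AA", "AB", "BB")

def transmission_ok(father, mother, child):
    """Crude Mendelian check: the child's two alleles must be drawable,
    one from each parent. Real segregation analysis instead uses
    transmission probabilities and recursive pedigree algorithms."""
    return (child[0] in father and child[1] in mother) or \
           (child[1] in father and child[0] in mother)

# Enumerate all 3**3 = 27 genotypic vectors for a trio (n = 3) and prune
# the impossible ones (e.g., father = AA, mother = BB, child = AA).
vectors = [v for v in product(GENOTYPES, repeat=3) if transmission_ok(*v)]
print(len(vectors))   # prints 15: only 15 of the 27 vectors survive
```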
Missing Data Situations. When each individual is measured in only one environment, assessment of MG × E and PG × E interaction is possible if related individuals are measured in different environments. This situation is made statistically tractable by means of missing data theory (Little and Rubin 1987) because the measurements that are lacking can be considered missing. In missing data situations examination of MG × E and PG × E interaction (and therefore the genetics of response to environmental changes) is still possible if the missing data can be
considered to be missing at random (MAR). If the probability of observing an individual in an environment is independent of genotype, then the missing data are MAR. In the present context MAR means that there is no correlation between genotype and environment (i.e., genotypes are distributed randomly across environments). The assumption of MAR is therefore unlikely to hold if there is strong natural selection acting on the trait such that allele frequencies are markedly different in the contrasting environments. For most traits of interest this is unlikely to be a problem because there are few examples of such selection. Note that the MAR assumption does not depend on the randomness of environments within pedigrees (sets of related individuals). Even if there is familial aggregation for the environmental measure, the MAR assumption will hold so long as genotypes (not phenotypes) and environmental measures are uncorrelated (i.e., genotypes and environmental measures are not jointly transmitted within families).
Given that the MAR assumption holds, the marginal distribution of the observed phenotypes pobs can be obtained by integrating out the missing data pmis:
f(\mathbf{p}_{\mathrm{obs}} \mid \theta) = \int_{-\infty}^{+\infty} f(\mathbf{p}_{\mathrm{obs}}, \mathbf{p}_{\mathrm{mis}} \mid \theta)\, d\mathbf{p}_{\mathrm{mis}}, (29)
where θ represents the parameters to be estimated. It can be shown that the resulting ln likelihood function is given by
L(\theta \mid \mathbf{p}_{\mathrm{obs}}) = \ln f(\mathbf{p}_{\mathrm{obs}} \mid \theta) + c, (30)
where c is a constant. Equation (30) shows that standard likelihood inference can proceed based solely on the observed data.
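For multivariate normal phenotypes the integral in Eq. (29) has a closed form: marginalizing out p_mis amounts to deleting the missing entries from the mean vector and the corresponding rows and columns of the covariance matrix. A sketch under that standard MVN property, with illustrative names rather than the paper's notation:

```python
import numpy as np

def observed_loglik(p, observed, mean, Sigma):
    """ln f(p_obs | theta) of Eqs. (29)-(30) for multivariate normal data.

    The integral over p_mis reduces to dropping the missing entries from
    the mean vector and covariance matrix (standard MVN marginalization)."""
    obs = np.asarray(observed, dtype=bool)
    r = np.asarray(p)[obs] - np.asarray(mean)[obs]
    S = np.asarray(Sigma)[np.ix_(obs, obs)]
    _, logdet = np.linalg.slogdet(S)
    k = int(obs.sum())
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet
                   + r @ np.linalg.solve(S, r))

# With a diagonal covariance the marginal is a product of the observed
# univariate densities; the middle (missing) entry is simply ignored:
ll = observed_loglik([1.0, -9.9, 2.0], [True, False, True],
                     np.zeros(3), np.diag([1.0, 1.0, 4.0]))
```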
Conditional Distribution of Phenotypes Given Major Locus Genotypes. Assuming that the missing data are MAR, the conditional density of phenotypes in a pedigree given the vector of major locus genotypes m and the matrix of covariates X, is
\mathbf{p} \mid \mathbf{m}, X \sim \mathrm{MVN}(F\mu + X\beta,\ \Omega), (31)
where MVN(·) denotes a multivariate normal density with mean vector (Fμ + Xβ) and phenotypic covariance matrix Ω. This density is identical in structure to the one shown in Eq. (25), except for dimensional differences. However, the phenotypic covariance matrix in Eq. (31) differs from that used in the complete data situation. When individuals can be measured in only a single environment,
\mathrm{Var}(\mathbf{p} \mid \mathbf{m}, X) = \Omega = 2\Phi \odot R \odot \Xi + \Upsilon, (32)
where ⊙ is the Hadamard (elementwise) product operator and Φ is the n × n kinship matrix. The elements of the matrices R (r_ij), Ξ (ξ_ij), and ϒ (υ_ij) are given by
r_{ij} = \begin{cases} 1, & i = j, \\ \rho_G(z_i, z_j), & i \neq j, \end{cases} (33a)

\xi_{ij} = \sigma_{G z_i}\,\sigma_{G z_j}, (33b)

\upsilon_{ij} = \begin{cases} \sigma^2_{E z_i}, & i = j, \\ 0, & i \neq j. \end{cases} (33c)
The matrix R can be considered a correction to the kinship matrix because the presence of PG × E interaction alters the expected genetic correlations among relatives.
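Equations (32)–(33) can be assembled directly with elementwise products. A sketch in which the argument names are mine:

```python
import numpy as np

def missing_data_cov(Phi, sigma_G, sigma_E, rho_G):
    """Conditional phenotypic covariance matrix of Eqs. (32)-(33):
    Omega = 2*Phi o R o Xi + Upsilon (o = Hadamard product), for n
    individuals each measured in a single environment z_i. rho_G(i, j)
    supplies the genetic correlation between the environments of
    individuals i and j."""
    n = len(sigma_G)
    R = np.array([[1.0 if i == j else rho_G(i, j) for j in range(n)]
                  for i in range(n)])                        # Eq. (33a)
    Xi = np.outer(sigma_G, sigma_G)                          # Eq. (33b)
    Upsilon = np.diag(np.asarray(sigma_E, dtype=float) ** 2) # Eq. (33c)
    return 2.0 * np.asarray(Phi) * R * Xi + Upsilon

# Two siblings (kinship 1/4) measured in different environments:
Omega = missing_data_cov(Phi=[[0.5, 0.25], [0.25, 0.5]],
                         sigma_G=[2.0, 3.0], sigma_E=[1.0, 1.0],
                         rho_G=lambda i, j: 0.8)
```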
Statistical Genetic Analysis in Missing Data Situations. Maximum likelihood estimation of the parameters of the G × E models can be based on the likelihood implied by Eq. (31). Quantitative genetic analysis of PG × E interaction and measured genotype analysis of mG × e, mG × E, and PG × E interactions can proceed using the same techniques that are used for the complete data situation with the only difference being that the random environmental correlation between a trait’s expression in different environments is undefined. All other relevant parameters can be estimated using standard likelihood methods.
Complex Segregation Analysis. For the mixed major locus/polygenic model the analysis of G × E interaction is significantly more complex than that observed for the complete data situation. Most of the complication is due to the residual PG × E interaction component. Incorporating PG × E interaction violates the assumption of conditional independence of offspring’s polygenotypes given parental polygenotypes. This leads to increased complexity of pedigree likelihood calculations by making analytical integration over polygenotypes cumbersome. To avoid this problem, I have adapted Hasstedt’s (1991) variance component/major locus likelihood approximation to allow for G × E interaction. This method requires that the n × n conditional phenotypic covariance matrix be formed and inverted at each iteration. The method has been shown to be computationally feasible and to generate unbiased parameter estimates (Blangero 1991). Because of the computational requirements (the covariance matrix would have to be inverted separately for each possible genotypic vector), it is unlikely that the evaluation of MG × e interaction is currently feasible.
Assessing MG × E interaction in the missing data situation poses no analytical difficulties because it requires only a simple genotype specific regression model, as shown in Eq. (13). In fact, several investigators have developed major gene models that allow for MG × E interaction (Eaves 1984; Moll et al. 1984; Konigsberg et al. 1991; Pérusse et al. 1991; Gordeuk et al. 1992). Unlike the current method, none of these earlier methods allowed for the simultaneous evaluation of MG × E and PG × E interaction.
Hypothesis Testing. Using likelihood based inference, we can compare competing hypotheses regarding the presence or absence of different G × E components. Formal statistical tests of the null hypothesis of no G × E interaction can be performed using likelihood ratio statistics (Kendall and Stuart 1961). Such a test compares the likelihood of a general model with the likelihood of a nested submodel. For example, to test the hypothesis that there is no interaction between a major locus and a continuous environmental index, we would compare a model in which the genotype specific regressions on the environment were held equal to one another (i.e., α = α_AA = α_AB = α_BB) with the more general model in which α_AA, α_AB, and α_BB are each estimated. The likelihood ratio statistic for such a test is
\Lambda = 2\left[\ln L_i(\alpha_{AA}, \alpha_{AB}, \alpha_{BB}, \theta_i) - \ln L_j(\alpha, \theta_j)\right], (34)
where the vector θ_i denotes all other estimated parameters of the ith model. Λ is distributed approximately as a chi-square variate with degrees of freedom equal to the difference in the number of estimated parameters between the two compared models. In the given example there are two degrees of freedom. If the null hypothesis of no MG × E interaction is rejected, there is evidence for a significant major locus component of the response to the environment. Similar tests can be specified for each of the other G × E interaction terms.
The chi-square approximation to the asymptotic distribution of Λ does not hold when the constrained parameter is located on the boundary of its acceptable parameter space. Such tests often occur when testing whether a particular variance component is greater than 0. In this case Λ is distributed as a mixture of a chi-square distribution and a density with all its point mass at 0 (Chernoff 1954; Hopper and Mathews 1982). For such a test with one degree of freedom, the p value obtained from the χ²₁ distribution should be halved.
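The boundary-adjusted test is easy to apply in practice: compute the χ²₁ tail probability and halve it. A sketch for the one-degree-of-freedom case (the function name is mine):

```python
import math

def lrt_pvalue(lam, boundary=False):
    """p-value for a 1-d.f. likelihood ratio statistic Lambda (Eq. 34).

    When the constrained parameter sits on a boundary (e.g., testing a
    variance component against 0), Lambda follows a 50:50 mixture of a
    point mass at 0 and chi-square(1), so the chi-square(1) tail
    probability is halved (Chernoff 1954; Hopper and Mathews 1982)."""
    # chi-square(1) survival function: P(Z**2 > lam) = erfc(sqrt(lam / 2))
    p = math.erfc(math.sqrt(lam / 2.0))
    return 0.5 * p if boundary else p

# The usual 5% critical value of chi-square(1) is about 3.84; on the
# boundary the same statistic is roughly twice as significant.
p_interior = lrt_pvalue(3.84)
p_boundary = lrt_pvalue(3.84, boundary=True)
```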
Applications
In this section I present three examples of applications of these methods. All examples are taken from ongoing work at the Southwest Foundation for Biomedical Research on the genetics of lipoprotein metabolism using pedigreed baboons.
Complete Data, Discrete Environment: mG × E, PG × E, and mG × e Interaction. The first example uses the data of Hixson et al. (1989), who examined the relationship between low density lipoprotein cholesterol (LDL C) levels and a DNA polymorphism in an important candidate gene, the LDL receptor (LDLR) gene, for LDL C metabolism in a group of pedigreed baboons. The 253 baboons (Papio hamadryas anubis, P. h. cynocephalus, and their hybrids) were members of 21 pedigrees. Serum concentrations of LDL C were measured on each baboon in each of two dietary environments: (1) a basal diet and (2) a high cholesterol, high saturated fat (HCSF) diet. Therefore this is an example of a complete data
situation. The baboons were also typed for an AvaII restriction fragment length polymorphism (RFLP) in intron 17 of the LDLR gene (Hixson et al. 1989). Two alleles (A, B) were found, with the observed frequency of the more common B allele estimated at 0.79. Hixson et al. (1989) detected a significant association between LDLR genotype and quantitative levels of LDL C on both diets. However, Hixson did not examine G × E interaction or the genetic determinants of LDL C response to the HCSF diet.
I reanalyzed Hixson’s data using the methods for mG × E, PG × E, and mG × e interaction. Complete data (i.e., LDLR genotype, LDLC basal level, and LDL C HCSF level) were available for 203 animals. The quantitative genetic analysis program Fisher (Lange et al. 1988) was used for this analysis. Special subroutines were written to estimate the parameters described in Eqs. (1)–(4). The effects of several significant covariates (sex, male age, nursery reared versus maternal reared, and subspecies) were simultaneously estimated in all analyses.
Table 1 shows the results of hypothesis testing using likelihood ratio statistics. As found by Hixson et al. (1989) in their univariate analyses, there is strong evidence ( p = 0.007) for an effect of the LDLR gene on quantitative LDL C variation. There is also significant evidence for mG × E interaction (Λ = 6.26, p = 0.044), which indicates that there is a significant effect of the LDLR polymorphism on the response to the HCSF diet. This can be seen in Figure 1, which shows the genotypic means in each dietary environment. The solid lines indicate the observed response, and the dashed lines show the expected response assuming that there is no mG × E interaction. In the absence of mG × E interaction, each genotype would be expected to increase by 57 mg/dl. The estimated genotype-specific responses were Δ_AA = 38.31, Δ_AB = 44.10, and Δ_BB = 62.92. In particular, the AA genotype appears to be less influenced by the dietary challenge.
Table 1 also shows that there is significant PG × E interaction ( p < 0.001). This suggests that additional genetic factors influence dietary response. The PG × E interaction effect was further broken into the two components described by Eq. (11). The hypothesis that the additive genetic standard deviations were equal in the two environments can be rejected ( p < 0.001). The maximum likelihood estimates of the genetic standard deviations were σ_G1 = 13.68 ± 2.05 for the basal
Table 1. Analysis of mG × E, PG × E, and mG × e Interaction in LDLC Concentrations in 203 Pedigreed Baboons
Model                      d.f.     Λ        p
No mG effects                4     14.17    0.007
No mG × E interaction        2      6.26    0.044
No PG × E interaction        2     22.06   <0.001
  σ_G1 = σ_G2                1     20.12   <0.001
  ρ_G = 1                    1      7.81    0.003
No mG × e interaction        6     17.75    0.007
The measured genotype (mG) refers to an AvaII LDLR RFLP, E refers to diet (basal versus HCSF), and e refers to random environment.
Figure 1. Diet specific LDLR genotypic means for LDL C levels showing mG × E interaction. Solid lines indicate estimated values; dashed lines indicate values expected in the absence of mG × E interaction.
diet and σ_G2 = 35.88 ± 4.53 for the HCSF diet. As mentioned previously, such a difference can be due to scaling phenomena. The hypothesis of complete pleiotropy (ρ_G = 1) is also rejected ( p = 0.003), which suggests the possibility that polygenotypes can change ranks in different environments. The observed genetic correlation is relatively low: 0.563 ± 0.158.
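The two PG × E components can be recovered numerically from the reported estimates. The split below is the standard decomposition of the genetic variance of a difference; equating it term for term with the paper's Eq. (11), which is not reproduced in this section, is my assumption:

```python
import math

# Reported baboon estimates for LDL-C: sigma_G1 = 13.68 (basal diet),
# sigma_G2 = 35.88 (HCSF diet), rho_G = 0.563.
s1, s2, rho = 13.68, 35.88, 0.563

heterogeneity = (s2 - s1) ** 2             # unequal genetic SDs
pleiotropy = 2.0 * s1 * s2 * (1.0 - rho)   # genetic correlation < 1
total = heterogeneity + pleiotropy

# Algebraic check: the two pieces sum to s1^2 + s2^2 - 2*rho*s1*s2,
# the usual variance of the difference G2 - G1.
assert math.isclose(total, s1 ** 2 + s2 ** 2 - 2.0 * rho * s1 * s2)
```

With these estimates, most of the genetic response variance comes from the two components in roughly equal parts, consistent with both null hypotheses (equal standard deviations and complete pleiotropy) being rejected in Table 1.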
There is also evidence for significant mG × e interaction ( p = 0.007). The random environmental covariance matrices varied significantly among measured genotypes. This can be seen in Figure 2, which shows genotype specific bivariate ellipses. The area within the ellipse indicates the magnitude of variability within measured genotypes. The orientation of the ellipse reflects the environmental correlation between LDLC serum levels on the two diets. For example, the ellipse for the BB genotype is much larger than those of the other two genotypes, reflecting the overall greater variability. Therefore the BB genotype appears to have decreased environmental stability compared with the other two genotypes. This genotype may exhibit reduced capacity for physiological buffering against other unknown environmental factors.
Figure 2. Bivariate ellipses showing LDLR genotypic specific environmental variation (mG × e inter action). Ellipses contain 68% of each genotypic distribution (approximately 1 SD on either side of the mean). Genotype specific centroids are shown as plus signs.
Missing Data, Discrete Environment: MG × E and PG × E Inter action. Whereas the first example used a known candidate polymorphism in a complete data situation, the second example involves an unknown major gene and a missing data situation. The quantitative trait is apolipoprotein AI (apo AI) serum concentration. apo AI is the main protein in high density lipoprotein (HDL) and is considered a protective factor against heart disease. We have shown that apo AI serum levels are influenced by two separate major genes in baboons (Blangero et al. 1990) and that both genes exhibit genotype diet interaction. For the current example I examine the role of genotype sex interaction in a single dietary environment using just one of the two major loci. In this case the environment is sex, which primarily marks differences in endogenous sex hormonal microenvironment. Sex is a good example of an obligate missing data situation that is important for many physiological traits. Elsewhere, the role of genotype sex interaction in determining apo AI levels using standard quantitative genetic methods has been considered (Towne et al. 1992).
The sample includes 617 baboons in 23 pedigrees and is similar to the sample that we previously analyzed (Blangero et al. 1990). The computer program PAP (Hasstedt 1989) was adapted to allow for segregation analysis incorporating both MG × E and PG × E interaction in a missing data situation. As with the LDL C analysis, the effects of several significant covariates (age and age² in females and nursery rearing versus maternal rearing) were simultaneously estimated in each analysis.
Table 2 shows the results of the apo AI segregation analyses. The model incorporating both MG × E and PG × E interaction is a significant improvement ( p = 0.003) over the classical mixed model, which ignores G × E interaction. The null hypothesis of no MG × E interaction (Δ_AA = Δ_AB = Δ_BB) can be unequivocally rejected ( p = 0.006). Therefore there is evidence for a major genic component of variance in response (or more appropriately, sexual dimorphism). This is clearly indicated in Figure 3, which shows the sex specific genotypic means. Females show an exaggerated between genotype variability relative to males. In females approximately 41% of the total phenotypic variance is accounted for by the major locus, whereas only 13% of the variance is attributable to this locus in males. Table 2 also shows that there is significant PG × E interaction for this trait ( p = 0.001). This interaction component is purely due to heterogeneity of residual genetic standard deviations between the two sexes because the genetic correlation between the expression of apo AI in the two sexes is estimated as 1.00.
Missing Data Continuous Environment: PG × E Interaction. The final application examines a case of G × E interaction involving a continuous environment. The trait is apolipoprotein B (apo B) serum concentrations. apo B is one of the primary proteins of LDL. The continuous environmental index is the average ambient temperature (°F) of the month when blood samples were drawn. A number of human studies have documented the existence of seasonal variation in lipoprotein levels (Buxtorf et al. 1988; Gordon et al. 1988), even after controlling
for seasonal differences in diet. Because ambient temperatures are known to influence some enzymatic activities, one potential cause of seasonal variation is temperature variation. The data consisted of apo B serum concentrations in 614 pedigreed baboons. Each individual was represented by a single measurement. The measurements were taken over several years across all months. Additional significant covariates (sex, sex-specific age, and age²) were included (but not shown) in all subsequent analyses.
The data were analyzed using the computer program PAP (Hasstedt 1989), modified by specialized penetrance subroutines that I have developed. Because there was no evidence of a major locus influencing apo B, the analysis was limited to the assessment of PG × E interaction. For the analysis the environmental index (temperature) was scaled so that the observed minimum average temperature (50°F) had a score of 0.
Table 3 shows the results of the analysis. apo B levels exhibit a significant negative relationship with average temperature [β(temp) = −0.298 ± 0.078]. The variance function used for the analysis is described by Eq. (15), and the genetic correlation function used is given by Eq. (18). As judged by the likelihood ratio statistic comparing the PG × E model with the classical polygenic model, there is evidence of significant PG × E interaction ( p = 0.01). This is due to a significant decrease in additive genetic variance as temperature increased (γ_G = −0.011 ± 0.005, p = 0.049), which is shown in Figure 4. There was no analogous effect of temperature on the random environmental variance (data not shown). There is also significant evidence for incomplete pleiotropy among environments (λ = 0.03
Table 2. Analysis of MG × E and PG × E Interaction in apo AI Concentrations in 617 Pedigreed Baboons: MaximumLikelihood Estimates and Likelihood Ratio Statistics
Parameter     MG × E + PG × E   No MG × E   No PG × E   Classical Mixed
q_A                 0.668         0.757       0.666        0.712
μ_AA              117.03        114.10      112.19       113.94
μ_AB              125.21        131.40      129.38       129.90
μ_BB              147.68        172.27      154.92       169.29
Δ_AA              –10.15         –5.49       –3.95        –5.69
Δ_AB               –4.06         –5.49       –9.22        –5.69
Δ_BB               17.21         –5.49        8.35        –5.69
σ_GM               14.63         12.24        7.74         7.61
σ_GF                5.60          5.12        7.74         7.61
σ_EM               18.17         16.29       19.21        19.88
σ_EF               19.94         21.55       20.30        19.88
ρ_G(M,F)            1.000         1.000       (1)          (1)
Λ                   –            10.22       11.12        16.44
d.f.                –             2           1            4
p                   –             0.006       0.001        0.003
MG refers to an inferred major gene and E refers to sex (male versus female).
Figure 3. Sex specific genotypic means for apo AI concentration showing MG × E interaction. Error bars indicate 1 SE.
Table 3. Analysis of PG × E Interaction in APOB Concentrations in 614 Pedigreed Baboons: MaximumLikelihood Estimates and Likelihood Ratio Statistics
Parameter     PG × E     λ = 0     γ_G = 0   Classical Polygenic
μ             51.16      50.28      50.48       49.52
β(temp)       –0.298     –0.245     –0.286      –0.232
σ_G50         15.60      14.49      12.13       10.84
σ_E           16.76      17.47      16.77       17.55
γ_G           –0.0114    –0.0125     (0)         (0)
λ              0.0299     (0)        0.0299      (0)
Λ              –          2.90       3.88        6.56
d.f.           –          1          1           2
p              –          0.044      0.049       0.010
E represents average monthly temperature at time of sample.
± 0.02, p = 0.044). This can be seen in Figure 5, which shows how the genetic correlation between the trait's expression in different temperature environments is a function of the absolute temperature difference between any two environments. The finding of significant PG × E interaction can be interpreted as evidence for a significant genetic component in the response of serum apo B levels to temperature change in baboons.
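Because Eqs. (15) and (18) are not reproduced in this section, the following sketch assumes their likely forms: a genetic standard deviation linear in the scaled temperature (by analogy with Eq. 23) and a genetic correlation that decays exponentially with the temperature difference. The parameter values are the Table 3 estimates:

```python
import math

# Table 3 estimates for apo B; temperature is scaled so 50 deg F -> z = 0.
# The functional forms below are my assumptions about Eqs. (15) and (18).
sigma_G50, gamma_G, lam = 15.60, -0.0114, 0.0299

def genetic_variance(temp_F):
    """Assumed Eq. (15): SD linear in z, so variance declines with
    temperature when gamma_G < 0 (cf. Figure 4)."""
    z = temp_F - 50.0
    return (sigma_G50 * (1.0 + gamma_G * z)) ** 2

def genetic_correlation(temp_i, temp_j):
    """Assumed Eq. (18): exponential decay in the absolute temperature
    difference between environments (cf. Figure 5)."""
    return math.exp(-lam * abs(temp_i - temp_j))
```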
Discussion
The methods presented here can be used to decompose the underlying genetic determinants of physiological and developmental response to environmental stresses in human populations. Using some basic mathematical relationships, I have shown that the genetics of response can be studied through the examination
Figure 4. Predicted additive genetic variance in apo B concentration as a function of average monthly temperature. Significant decline in variance is indicative of PG × E interaction.
of G × E interaction even when response itself cannot be measured in any single individual. However, utilization of this approach requires a shift in sampling designs from individuals to pedigrees. More specifically, the successful implementation of the proposed methodology depends on the exploitation of situations in which relatives can be measured in different environments (i.e., there are environmental contrasts among relatives).
Other investigators have called for and/or employed similar strategies for a variety of physiological traits and environmental stresses (Mueller et al. 1980; Ward and Prior 1980; Chakraborty et al. 1983; Ward 1985). However, most previous applications of family studies have focused on classical quantitative genetic models of inheritance [but see Ward (1985) for an application allowing for a major gene]. Such simple models are unlikely to hold for many of the traits that are examined in studies of human adaptability.
Figure 5. Predicted additive genetic correlations between apo B polygenotypes as a function of temperature difference. Nonunit correlations are indicative of PG × E interaction.
The genetic architecture of physiological traits involved in the response to environmental stresses is complex, involving both major genes with relatively large effects and numerous genes with small effects (polygenes). The evidence for major genes influencing critical physiological pathways is immense (as a random perusal of the genetic epidemiological literature will reveal). Therefore it is highly likely that some of the focal traits in human adaptability studies are also influenced by major genes. We have recently confirmed this expectation by demonstrating that a single major locus accounts for nearly 40% of the phenotypic variance in %O2 saturation of arterial hemoglobin in Tibetan high altitude dwellers (Beall et al. 1994). The analyses of response to environmental stress in such traits will require methods, such as the ones presented here, that include the potential for both MG × E and PG × E interaction. Only recently have the necessary statistical genetic tools become available for the detection of the effects of G × E interaction on complex quantitative traits (Eaves 1984; Moll et al. 1984; Bonney et al. 1988; Blangero et al. 1990; Blangero and Konigsberg 1991; Konigsberg et al. 1991; Pérusse et al. 1991; Blangero et al. 1992).
The methods described here will be particularly useful for studies that can incorporate a continuous environmental measure. In this regard the choice of environments to examine can be broad, encompassing both endogenous and exogenous factors that are not normally considered environments. For example, the age at which an individual is measured can be considered a measure of the endogenous developmental environment. Thus these methods will also be applicable to studies of growth and development. For example, using these methods Williams-Blangero et al. (1992) recently found evidence for a major gene influencing head breadth that showed distinct major genotype × age interaction, suggesting that genotypes exhibit differential growth patterns.
In summary, the statistical genetic methodology presented here can be used for a wide variety of problems regarding the genetic basis of human adaptability. Ultimately, such knowledge of the genetic components of intrapopulation variation will help us to understand the evolutionary dynamics of between population differentiation.
Acknowledgments
This research was supported by the National Institutes of Health under grants HL28972, HL45522, GM31575, DK44297, and contract HV5303. I thank Tom Dyer for providing expert computer programming assistance, Sarah Williams-Blangero, Lyle Konigsberg, and Brad Towne for helpful discussions, Glen Mott for performing the LDL C, apo B, and apo AI measurements, and Jim Hixson for allowing me to reanalyze his LDLR RFLP data.
The specialized Fortran subroutines used in the examples will be made available to interested individuals who already have official copies of PAP (version 3.0) and/or Fisher.
Received 9 November 1992; revision received 1 February 1993.
Literature Cited
Baker, P. 1976. Research strategies in population biology and environmental stress. In The Measures
of Man: Methodologies in Biological Anthropology, E. Giles and J. S. Friedlaender, eds. Cambridge, MA: Peabody Museum Press, 230–259.
Beall, C. M., J. Blangero, S. Williams-Blangero, and M. C. Goldstein. 1994. A major gene for saturation of arterial hemoglobin in Tibetan highlanders. Am. J. Phys. Anthropol. (in press).
Blangero, J. 1991. Complex segregation analysis incorporating genotype environment interaction. Am. J. Hum. Genet. S49:465.
Blangero, J., and L. W. Konigsberg. 1991. Multivariate segregation analysis using the mixed model. Genet. Epidemiol. 8:299–316.
Blangero, J., S. Williams-Blangero, and J. E. Hixson. 1992. Assessing the effects of candidate genes on quantitative traits in primate populations. Am. J. Primatol. 27:119–132.
Blangero, J., J. W. MacCluer, C. M. Kammerer, G. E. Mott, T. D. Dyer, and H. C. McGill Jr. 1990. Genetic analysis of apolipoprotein A I in two dietary environments. Am. J. Hum. Genet. 47:414–428.
Boerwinkle, E., R. Chakraborty, and C. F. Sing. 1986. The use of measured genotype information in the analysis of quantitative phenotypes in man. I. Models and analytical methods. Ann. Hum. Genet. 50:181–194.
Bonney, G. E. 1984. On the statistical determination of major gene mechanisms in continuous human traits: Regressive models. Am. J. Med. Genet. 18:731–749.
Bonney, G. E., G. M. Lathrop, and J. M. Lalouel. 1988. Combined linkage and segregation analysis using regressive models. Am. J. Hum. Genet. 43:29–37.
Bradshaw, A. D. 1965. Evolutionary significance of phenotypic plasticity in plants. Adv. Genet. 13:115–155.
Buxtorf, J. C., M. F. Baudet, C. Martin, J. L. Richard, and B. Jacotot. 1988. Seasonal variation of serum lipids and apoproteins. Ann. Nutr. Metab. 32:68–74.
Cannings, C., E. A. Thompson, and M. H. Skolnick. 1978. Probability functions on complex pedigrees. Adv. Appl. Probability 10:26–61.
Carroll, R. J., and D. Ruppert. 1988. Transformation and Weighting in Regression. New York: Chapman and Hall.
Chakraborty, R., J. Clench, R. E. Ferrell, S. A. Barton, and W. J. Schull. 1983. Genetic components of variations of red cell glycolytic intermediates at two altitudes among the South American Aymara. Ann. Hum. Biol. 10:173–184.
Chernoff, H. 1954. On the distribution of the likelihood ratio. Ann. Math. Soc. 25:573–578.Eaves, L. J. 1984. The resolution of genotype × environment interaction in segregation analysis of
nuclear families. Genet. Epidemiol. 1:215–228.Elston, R. C. 1981. Segregation analysis. Adv. Hum. Genet. 11:63–120.Elston, R. C., and J. Stewart. 1971. A general model for the genetic analysis of pedigree data. Hum.
Hered. 21:523–542.Falconer, D. S. 1952. The problem of environment and selection. Am. Natur. 86:293–298.Frisancho, A. R. 1979. Human Adaptation: A Functional Interpretation. St. Louis, MO: Mosby.Gordeuk, V., J. Mukiibi, S. J. Hasstedt, W. Samowitz, C. Q. Edwards, G. West, S. Ndambire, J. Em
manual, N. Nkanza, Z. Chapanduka et al. 1992. Iron overload in Africa: Interaction between a gene and dietary iron content. New Engl. J. Med. 326:95–100.
Gordon, D. J., J. Hyde, D. C. Trost, F. S. Whaley, P. J. Hannan, D. R. Jacobs, and L. G. Ekelund. 1988. Cyclic seasonal variation in plasma lipid and lipoprotein levels: The Lipid Research Clinics Coronary Primary Prevention Trial Placebo Group. J. Clin. Epidemiol. 41:679–689.
Harrison, G. A. 1966. Human adaptability with reference to the IBP proposals for high altitude research. In The Biology of Human Adaptability, P. T. Baker and J. S. Weiner, eds. Oxford, England: Clarendon Press, 509–520.
Hasstedt, S. J. 1982. A mixed model likelihood approximation for large pedigrees. Computers Biomed. Res. 15:295–307.
HB_56_FINAL.indb 545 5/3/2010 12:28:26 PM
546 / blangero
Hasstedt, S. J. 1989. Pedigree Analysis Package, V3.0. Salt Lake City, UT: Department of Human Genetics.
Hasstedt, S. J. 1991. A variance components/major locus likelihood approximation on quantitative data. Genet. Epidemiol. 8:113–125.
Hixson, J. E., C. M. Kammerer, L. A. Cox, and G. E. Mott. 1989. Identification of an LDL receptor gene marker associated with altered levels of LDL cholesterol and apolipoprotein B in baboons. Arterosclerosis 9:829–835.
Hopper, J. L., and J. D. Mathews. 1982. Extensions to multivariate normal models for pedigree analysis. Ann. Hum. Genet. 46:373–383.
Hopper, J. L., and J. D. Mathews. 1983. Extensions to multivariate normal models for pedigree analysis. II. Modeling the effect of shared environments in the analysis of variation in blood lead levels. Am. J. Epidemiol. 117:344–355.
Kendall, M. G., and A. Stuart. 1961. The Advanced Theory of Statistics, v. 2. London, England: Charles Griffin.
Konigsberg, L. W., J. Blangero, C. M. Kammerer, and G. E. Mott. 1991. Mixed model segregation analysis of LDL C concentration with genotype covariate interaction. Genet. Epidemiol. 8:69–80.
Lalouel, J. M. 1983. Segregation analysis of familial data. In Methods in Genetic Epidemiology, N. E. Morton, D. C. Rao, and J. M. Lalouel, eds. Basel, Switzerland: Springer Karger, 75–97.
Lalouel, J. M., D. C. Rao, N. E. Morton, and R. C. Elston. 1983. A unified model for complex segregation analysis. Am. J. Hum. Genet. 35:816–826.
Lange, K., and M. Boehnke. 1983. Extensions to pedigree analysis. IV. Covariance components models for multivariate traits. Am. J. Med. Genet. 14:513–524.
Lange, K., D. Weeks, and M. Boehnke. 1988. Programs for pedigree analysis: Mendel, Fisher, and dGene. Genet. Epidemiol. 5:471–472.
Little, R. J. A., and D. B. Rubin. 1987. Statistical Analysis with Missing Data. New York: Wiley.Mather, K. 1953. Genetical control of stability in development. Heredity 7:297–336.Moll, P. P., C. F. Sing, S. Ussier Cacan, and J. Davignon. 1984. An application of a model for a
genotype dependent relationship between a concomitant (age) and a quantitative trait (LDL cholesterol) in pedigree data. Genet. Epidemiol. 1:301–314.
Morton, N. E., and C. J. MacClean. 1974. Analysis of familial resemblance. III. Complex segregation analysis of quantitative traits. Am. J. Hum. Genet. 26:489–503.
Mueller, W. H., R. Chakraborty, S. A. Barton, F. Rothhammer, and W. J. Schull. 1980. Genes and epidemiology in anthropological adaptation studies: Familial correlations in lung function in populations residing at different altitudes in Chile. Med. Anthropol. 4:367–384.
Pérusse, L., P. P. Moll, and C. F. Sing. 1991. Evidence that a single gene with gender and age dependent effects influences systolic blood pressure determination in a population based sample. Am. J. Hum. Genet. 49:94–105.
Robertson, A. 1959. The sampling variance of the genetic correlation coefficient. Biometrics 15:469–485.
Towne, B., J. Blangero, and G. E. Mott. 1992. Genetic analysis of sexual dimorphism in serum APO AI and HDL C concentrations in baboons. Am. J. Primatol. 27:107–117.
Ward, R. H. 1985. Isolates in transition: A research paradigm for genetic epidemiology. In Diseases of Complex Etiology in Small Populations, E. Szathmary and R. Chakraborty, eds. New York: Alan R. Liss, 147–177.
Ward, R., and I. Prior. 1980. Genetic and sociocultural factors in the response of blood pressure to migration of the Tokelau population. Med. Anthropol. 4:339–366.
Williams Blangero, S., J. Blangero, and M. C. Mahaney. 1992. Segregation analysis of craniometric traits incorporating genotype specific growth patterns. Am. J. Hum. Genet. 51:A163.
HB_56_FINAL.indb 546 5/3/2010 12:28:26 PM
Genet. Res., Camb. (1994), 64, pp. 57–69. With 5 text-figures. Copyright © 1994 Cambridge University Press
Estimating the covariance structure of traits during growth and ageing, illustrated with lactation in dairy cattle
MARK KIRKPATRICK*†, WILLIAM G. HILL‡ AND ROBIN THOMPSON§
* Department of Zoology, University of Texas, Austin TX 78712, USA. ‡ Institute of Cell, Animal and Population Biology, University of Edinburgh, West Mains Road, Edinburgh EH9 3JT, U.K. § BBSRC Roslin Institute (Edinburgh), Roslin, Midlothian EH25 9PS, U.K.
(Received 20 October 1993 and in revised form 13 May 1994)
Summary
Quantitative variation in traits that change with age is important to both evolutionary biologists and breeders. We present three new methods for estimating the phenotypic and additive genetic covariance functions of a trait that changes with age, and illustrate them using data on daily lactation records from British Holstein-Friesian dairy cattle. First, a new technique is developed to fit a continuous covariance function to a covariance matrix. Secondly, this technique is used to estimate and correct for a bias that inflates estimates of phenotypic variances. Thirdly, we offer a numerical method for estimating the eigenvalues and eigenfunctions of covariance functions. Although the algorithms are moderately complex, they have been implemented in a software package that is made freely available.
Analysis of lactation shows the advantages of the new methods over earlier ones. Results suggest that phenotypic variances are inflated by as much as 39 % above the underlying covariance structure by measurement error and short-term environmental effects. Analysis of additive genetic variation indicates that about 90 % of the additive genetic variation for lactation during the first 10 months is associated with an eigenfunction that corresponds to increased (or decreased) production at all ages. Genetic trade-offs between early and late milk yield are seen in the second eigenfunction, but it accounts for less than 8 % of the additive variance. This illustrates that selection is expected to increase production throughout lactation.
1. Introduction
An individual's phenotype changes with age. A trait that changes with age can be represented as a trajectory, that is, a function of time. Because each character takes on a value at each of an infinite number of ages, and its value at each age can be considered as a distinct trait, such trajectories are referred to as 'infinite-dimensional' characters.
Many problems of interest to breeders and evolutionary biologists involve selection on this type of trait. The traditional way of analysing the quantitative genetics of infinite-dimensional traits involves focusing on the phenotypic values at a small number of landmark ages, making discrete what is intrinsically a continuous process. Recently, the methods of quantitative genetics have been extended to infinite-dimensional traits to overcome this deficiency (Kirkpatrick & Heckman, 1989; Kirkpatrick et al. 1990; Kirkpatrick & Lofsvold, 1992; Gomulkiewicz & Kirkpatrick, 1992).
† Corresponding author.
The infinite-dimensional approach can provide more accurate estimates of variation in the traits and improve estimates of their response to natural or artificial selection as compared to conventional methods. Improved estimates of phenotypic and genetic covariances can be realized using the fact that the measurements are ordered in time. The situation is analogous to the classical statistical problem of predicting the value of a dependent variable y as a function of an independent variable x. A standard approach is to regress observed values of y onto x. Then, given a value x*, a prediction for the corresponding value y* is determined by the regression equation. Alternatively, one might use the observed value of y corresponding to the observed value of x that is closest to x*. In many situations the regression prediction will be superior because measurement error in y makes prediction from a single pair of observed x and y unreliable, while the regression approach gains power by using information from all the observations.
An estimate of the covariance between the values of a trait at two ages can likewise be improved by using information about the covariances at other ages. The classical approach of treating the value at each age as a discrete trait without regard for its place in the sequence of ages loses substantial information. In contrast, the infinite-dimensional approach seeks to retain this information by using, in effect, a regression of covariance on age. Given the notoriously large sampling errors inherent in estimates of covariances, any gain in the power of estimation is welcome.
This paper extends the recently developed methods for the analysis of infinite-dimensional traits and demonstrates them using data on lactation records from British Holstein-Friesian dairy cattle. We begin by briefly reviewing the framework for estimating covariance functions that was introduced by Kirkpatrick et al. (1990). We then introduce three new methods within this framework. The first is a technique for estimating covariance functions referred to as the method of asymmetric coefficients. This method is illustrated with a simple worked example. The second is a technique for correcting the bias that appears along the diagonal of an estimated phenotypic covariance function or matrix. This bias arises because date-specific measurement errors inflate the phenotypic variances (the diagonal elements), but have no such effect on the phenotypic covariances (the off-diagonal elements) or any of the additive genetic parameters. Our strategy here is to use the unbiased off-diagonal elements to estimate the diagonal elements. The algorithm is demonstrated using a simple example. The third method is a numerical approach to calculate the eigenvalues and eigenfunctions of a covariance function, which is useful to describe the patterns of variation. After these new methods are introduced, they are applied to lactation records from British Holstein-Friesian dairy cattle.
2. Estimating covariance functions
For any trait that changes in time, the phenotype of an individual at age t can be written x(t). Variation in the population for this function is characterized by a covariance function. A covariance function is the infinite-dimensional analogue of a covariance matrix. The value of the phenotypic covariance function 𝒫(t1, t2) gives the phenotypic covariance between the value of the trait at ages t1 and t2. The phenotypic variance at age t1 is written 𝒫(t1, t1). Likewise, the additive genetic covariance structure of a population is described by the additive genetic covariance function 𝒢.
For any practical application, these covariance functions are estimated from breeding data. The approach advocated by Kirkpatrick et al. (1990) starts with measurements of individuals at each of n ages, denoted a1 through an. Standard quantitative-genetic methods are used to obtain an estimate of the n × n covariance matrix for the measurements at these ages.
The goal now is to estimate the underlying covariance function from this matrix. In general, this is done by interpolating between the values of the covariance matrix, perhaps smoothing them in order to damp out the sampling error in the elements of the matrix. A variety of functions can be used for the interpolation. The approach we developed earlier is based on orthogonal functions (Kirkpatrick & Heckman, 1989; Kirkpatrick et al. 1990), and we will again use that method here.
We begin by briefly reviewing the approach, which is referred to as the method of 'symmetric coefficients' in this paper. It starts with the fact that any continuous covariance function can be represented as a weighted sum of orthogonal functions. That is, given a set of functions φi, i = 0, 1, …, that are orthogonal over the interval [a1, an], we can write the covariance function 𝒫 as
𝒫(t1, t2) = Σ_{i=0}^{∞} Σ_{j=0}^{∞} C_ij φi(t1) φj(t2),   (1)
where the C_ij are constants. These constants form a symmetric matrix, C_ij = C_ji (whence the term 'symmetric coefficients'), which guarantees that 𝒫 is symmetric as required by the definition of a covariance function. The strategy developed by Kirkpatrick et al. (1990) is to use an estimated covariance matrix P, based on measurements taken at n ages, to estimate a truncated set of the weighting coefficients C_ij. Our estimate of the covariance function 𝒫, based on the first k orthogonal functions, is then
𝒫̂(t1, t2) = Σ_{i=0}^{k−1} Σ_{j=0}^{k−1} Ĉ_ij φi(t1) φj(t2),   (2)
where k ≤ n. The statistical problem, then, is to estimate the matrix of coefficients C so that they can be substituted into eqn (2) to yield an estimate of the covariance function. As discussed by Kirkpatrick et al. (1990), we can obtain a 'full fit', in which k = n, such that the value of 𝒫̂(t1, t2) exactly equals the corresponding value of P when t1 and t2 equal two of the ages at which the data were taken. Alternatively, we can seek a 'reduced fit', in which k < n. Under a reduced fit, there will generally be discrepancies between 𝒫̂ and P. The rationale for favouring a reduced fit is that the estimate P includes sampling error, and we might prefer an estimate 𝒫̂ that smooths out the fluctuations that these errors introduce.
The methods for both the full and reduced estimates of a covariance function that were developed earlier lead to a symmetric coefficient matrix C. In the next section we introduce a new method that leads to an asymmetric coefficient matrix, and show its advantages.
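The role of eqns (1)–(2) can be made concrete with a short numerical sketch. This is an illustration, not the authors' software: the function names and the coefficient values are assumptions, and the polynomial normalization follows the normalized-Legendre convention used by Kirkpatrick et al. (1990).

```python
import numpy as np

def phi(i, t, a1, an):
    """First two normalized Legendre polynomials, rescaled to the age interval [a1, an]."""
    x = 2.0 * (t - a1) / (an - a1) - 1.0          # map age onto [-1, 1]
    return [1.0 / np.sqrt(2.0), np.sqrt(1.5) * x][i]

def cov_estimate(C, t1, t2, a1, an):
    """Eqn (2): truncated double sum of coefficient-weighted orthogonal functions."""
    k = C.shape[0]
    return sum(C[i, j] * phi(i, t1, a1, an) * phi(j, t2, a1, an)
               for i in range(k) for j in range(k))

# hypothetical symmetric coefficient matrix for a two-age (k = n = 2) full fit
C = np.array([[5.0, 0.0],
              [0.0, 1.0 / 3.0]])
v10 = cov_estimate(C, 10.0, 10.0, 10.0, 11.0)     # variance implied at age 10 by this C
```

Given a coefficient matrix, the covariance function can be evaluated at any pair of ages, measured or not; this is what distinguishes the approach from simply tabulating the covariance matrix.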
3. The method of asymmetric coefficients
There are three reasons for developing the new method based on asymmetric coefficients. First, an estimate of a covariance function based on our earlier method has continuous first derivatives everywhere. This may be undesirable along the diagonal of the covariance function, where we might want to allow for the possibility of a crease, or discontinuous first derivative. A discontinuous first derivative along the diagonal is found in the covariance functions of several simple stochastic processes, including Brownian motion, and so it seems desirable to allow for this possibility. The method of asymmetric coefficients makes no assumption about the continuity of first derivatives of the estimated covariance function along the diagonal.
Secondly, estimates of a covariance function based on asymmetric coefficients may be somewhat better behaved than those based on symmetric coefficients. The reason lies in the fact that estimates using symmetric coefficients involve the products of higher-order terms that result in functions that are less smooth than their asymmetric counterparts. The coefficient matrix C derived using symmetric coefficients generally will have all non-zero elements. When substituted into eqn (2), this produces terms of order φ_{k−1}(t1) φ_{k−1}(t2). With orthogonal polynomials as the φs, for example, this corresponds to the product of two (k−1)th-order polynomials, which will often result in a quite 'wiggly' function. By contrast, the coefficient matrix C derived using asymmetric coefficients has zeros in all elements C_ij for which i + j ≥ k. Thus the terms of highest order to appear in eqn (2) are of the same order as φ_{k−1}. Hence asymmetric coefficients often lead to smoother estimates.
Fig. 1. Fits using the methods of symmetric coefficients (top) and asymmetric coefficients (bottom) with the example of eqn (5) discussed in the text. The solid circles show the original data points.
Thirdly, the asymmetric method can be used to correct for a bias in the diagonal elements of phenotypic covariance functions. We discuss this problem further in a later section ('Extrapolating to the diagonal').
These attractions of asymmetric coefficients are mitigated by the fact that some of the techniques developed earlier under the method of symmetric coefficients do not carry over to the new method. In particular, the algebraic technique for estimating the eigenfunctions and eigenvalues of the covariance matrix directly from the coefficients cannot be applied to asymmetric coefficients. It is still possible, however, to estimate these quantities by numerical methods, as we discuss in a later section ('Analysis of variation').
The method of asymmetric coefficients seeks an estimate of the covariance function 𝒫 that is of the form
𝒫̂(t1, t2) = Σ_{i=0}^{k−1} Σ_{j=0}^{k−1} Ĉ_ij φi(t1) φj(t2)   for t1 ≥ t2,
𝒫̂(t1, t2) = Σ_{i=0}^{k−1} Σ_{j=0}^{k−1} Ĉ_ij φi(t2) φj(t1)   for t1 < t2.   (3)
Unlike the earlier method, there is no requirement that C_ij = C_ji, because the form of eqn (3) guarantees that 𝒫̂ will be symmetric; hence we refer to this as the method of 'asymmetric coefficients'. The data matrix P contains n(n + 1)/2 parameters, and we can estimate no more than this number of coefficients. We choose to fit the coefficients C_ij with i + j ≤ k − 1, that is, the upper left half of the matrix C. This choice tends to result in a smoother estimate than if higher-order coefficients were fitted, as discussed above.
The strategy we use to fit C is to transform the problem into a standard least-squares formulation. By stacking the columns of the data matrix P to form a vector p, and similarly transforming the coefficient matrix C into a vector c, the statistical model can be written:
p = Xc + e,   (4)
where p is a vector of observations (the estimated variances and covariances), X is a matrix defined by the values of the orthogonal functions evaluated at the measured ages, c is a vector of coefficients, and e is a vector of error terms. Our goal is to solve for the vector c that minimizes the error vector according to the weighted sum of squares criterion.
Algorithms for this calculation are described in Appendix A for the cases of both a full and a reduced fit. The method has been implemented as a computer program in a Mathematica® notebook (Wolfram, 1991). The program (which also performs other analyses and displays them graphically) is available from the senior author.
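The least-squares step of eqn (4) can be sketched as follows. This is a hedged illustration, not the authors' code: the two-age data matrix, the helper names, and the polynomial normalization are all assumptions introduced for the example.

```python
import numpy as np

ages = np.array([10.0, 11.0])
a1, an = ages[0], ages[-1]
P = np.array([[3.0, 2.0],        # hypothetical estimated covariance matrix
              [2.0, 3.0]])

def phi(i, t):
    """First two normalized Legendre polynomials on the rescaled age interval."""
    x = 2.0 * (t - a1) / (an - a1) - 1.0
    return [1.0 / np.sqrt(2.0), np.sqrt(1.5) * x][i]

# Stack the lower triangle of P (t1 >= t2) into the observation vector p, and
# build the design matrix X from products of the orthogonal functions.  The
# unknowns are c = (C00, C01, C10), i.e. the coefficients with i + j <= k - 1.
rows, p = [], []
for r in range(len(ages)):
    for s in range(r + 1):                        # t1 = ages[r] >= t2 = ages[s]
        t1, t2 = ages[r], ages[s]
        rows.append([phi(0, t1) * phi(0, t2),     # C00 term
                     phi(0, t1) * phi(1, t2),     # C01 term
                     phi(1, t1) * phi(0, t2)])    # C10 term
        p.append(P[r, s])
X = np.array(rows)

c, *_ = np.linalg.lstsq(X, np.array(p), rcond=None)   # c approximates (C00, C01, C10)
```

For a full fit the system is square and full rank, so the least-squares solution reproduces the data exactly; for a reduced fit the same machinery yields the smoothed estimate.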
To illustrate our approach, consider the problem of fitting the covariance matrix
P = [ 3  2 ]
    [ 2  3 ]   (5)
based on measurements taken at the ages a = (10, 11)ᵀ, as shown in Figure 1. We will find the full estimate of 𝒫, and so k = n = 2. We choose to use normalized Legendre polynomials as the orthogonal functions. The first two of these polynomials are:
φ0(x) = 1/√2,   φ1(x) = √(3/2) x,   where x = −1 + 2(t − a1)/(an − a1)   (6)
(see Kirkpatrick et al. 1990). Calculation of the coefficient matrix is described in detail for this example in Appendix A. It leads to the result
Ĉ = [ 6      √3/3 ]
    [ −√3/3  0    ]   (7)
Notice that the matrix Ĉ is asymmetric, and that elements below the antidiagonal are zero. These two properties distinguish the asymmetric coefficient matrix from the symmetric matrix approach described by Kirkpatrick et al. (1990).
Substituting these coefficients into eqn (A 14) gives our estimate of the covariance function:
𝒫̂(t1, t2) = 3 + t1 − t2   for 10 ≤ t1 ≤ t2 ≤ 11,
𝒫̂(t1, t2) = 3 + t2 − t1   for 10 ≤ t2 ≤ t1 ≤ 11;   (8)
that is, 𝒫̂(t1, t2) = 3 − |t1 − t2|.
While the coefficient matrix from which it was calculated is asymmetric, the covariance function itself is symmetric (as required by the definition of a covariance function). Checking, we confirm that the entries in the original matrix P are recovered when we substitute t1, t2 = 10, 11 into eqn (8). A perfect fit of the estimated covariance function to the data matrix results whenever a full fit is calculated.
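The check that a full fit reproduces the data matrix is easy to carry out numerically. This is a minimal sketch under assumed values: the 2 × 2 data matrix and the creased form 3 − |t1 − t2| are illustrative reconstructions of the worked example, not quoted from the paper.

```python
import numpy as np

ages = [10.0, 11.0]
P = np.array([[3.0, 2.0],        # assumed worked-example data matrix
              [2.0, 3.0]])

def p_hat(t1, t2):
    """Asymmetric-coefficient full fit: linear in each argument, creased on the diagonal."""
    return 3.0 - abs(t1 - t2)

# a full fit must reproduce every element of P at the measured ages
recovered = np.array([[p_hat(s, t) for t in ages] for s in ages])
assert np.allclose(recovered, P)
```

Note that the fitted function is symmetric in its two arguments even though the coefficient matrix that generated it is not.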
The method of symmetric coefficients developed previously (Kirkpatrick et al. 1990) leads to somewhat different results. Using that method, the coefficient matrix for a full fit is
Ĉ = [ 5  0   ]
    [ 0  1/3 ]   (9)
Unlike the coefficient matrix (7), this matrix is symmetric. (The off-diagonal coefficients are zero in this example, but that is not generally true.) The corresponding estimate of the covariance function is
𝒫̂(t1, t2) = 5/2 + (1/2)(2t1 − 21)(2t2 − 21).   (10)
As with the earlier estimate using asymmetric coefficients, the original data matrix of eqn (5) is recovered when we substitute t1, t2 = 10, 11 into this equation.
The symmetric and asymmetric expressions for 𝒫̂ are quite different. The differences are seen clearly in Fig. 1. A conspicuous and diagnostic discrepancy is that the symmetric coefficient estimate of 𝒫̂ is smooth along the diagonal while the asymmetric coefficient estimate is not. The symmetric coefficient estimate also has more curvature.
4. Extrapolating to the diagonal
Estimates of phenotypic variances for the values of traits at specific ages are often inflated by factors that do not affect estimates of the covariances between ages. One source of this inflation is measurement error. A second source involves environmental factors that have effects over periods much shorter than the between-measurement intervals, such as weather, health, food quality, and hormonal state. This second type of factor tends to increase covariances close to the diagonal of the covariance function. For example, estimates of the phenotypic correlations of lactation test day records one day apart were 0.84, declining only to 0.82 for records five days apart (Pander et al. 1993). Thus we can view the diagonal elements of a phenotypic covariance matrix or function as being biased upwards, relative to a smoother underlying pattern that we expect on biological grounds. The upward bias appears as a ridge along the diagonal of estimated phenotypic covariance matrices and covariance functions. This bias distorts our picture of the covariance structure of the trait, and has practical implications in breeding programs that are based on age-specific variances.
We would therefore like to correct for the bias. Two strategies are available. A direct approach would be to estimate the measurement error directly, for example through repeated measures. A second, indirect approach is available when the characters of interest are age-specific measurements of the same trait through time. A familiar example is a growth trajectory, in which the data are measurements of the sizes of each individual at a series of ages. In this situation the basic phenotype of interest is a continuous function (the growth trajectory) that is an infinite-dimensional trait. Here we show how the phenotypic covariances for an infinite-dimensional trait can be used to estimate the variances (that is, the diagonal elements of the covariance matrix). These estimates are free of measurement error bias, and may lead to selection indices with increased efficiency.
Our strategy is as follows. On intuitive grounds, we expect the covariance function for growth processes to be continuous. (This is a biological rather than mathematical argument, since there is nothing in the definition of a covariance function that requires it to be continuous.) Using the unbiased estimates for the phenotypic covariances (that is, 𝒫(t1, t2) where t1 ≠ t2), we can extrapolate estimates of the phenotypic variances (that is, 𝒫(t1, t1)). The algorithm begins with an estimated phenotypic covariance matrix of the sizes of individuals at the n ages ai. We first estimate the phenotypic covariance function using only the n(n − 1)/2 unbiased subdiagonal elements of P. The method of asymmetric coefficients described above produces an estimate of the phenotypic covariance function in terms of a weighted sum of orthogonal functions. Because the diagonal elements of P were omitted, this estimate interpolates the values of 𝒫̂(t1, t2) over the ranges t1 ∈ [a2, an] and t2 ∈ [a1, a_{n−1}], where t1 > t2. Secondly, the coefficients are used to extrapolate the estimated covariance function: the range of t1 is extended downward from age a2 to a1 and the range of t2 upward from age a_{n−1} to an, giving us the full range t1, t2 ∈ [a1, an]. This extrapolation gives us an unbiased estimate of the diagonal of 𝒫 along with the rest of the covariance function.
Fig. 2. Fit using the method of extrapolating to the diagonal using the example discussed in the text. The solid circles show the original data points; the open circles are the extrapolated values for the diagonal elements (the variances).
A detailed description of the algorithm is given in Appendix B. It has been implemented in a Mathematica® notebook, which is available from the senior author. To illustrate, consider the estimated phenotypic covariance matrix
P = [ 7  3  2 ]
    [ 3  8  3 ]
    [ 2  3  9 ]   (11)
based on measurements of some character taken at ages a = (10, 11, 15)ᵀ, plotted in Fig. 2. This example will illustrate how the method naturally accommodates uneven intervals between the measured ages. The variances along the diagonal of P have been inflated by the biases described earlier, and our aim is to obtain corrected estimates for them. In this approach the diagonal elements are not used in the estimate of 𝒫̂, and so a full fit uses k = n − 1 = 2 orthogonal functions.
Appendix B shows that using the method of extrapolating to the diagonal, we obtain an estimated phenotypic covariance function
𝒫̂(t1, t2) = −17/4 + t1 − t2/4   for 10 ≤ t1 ≤ t2 ≤ 15,
𝒫̂(t1, t2) = −17/4 + t2 − t1/4   for 10 ≤ t2 ≤ t1 ≤ 15.   (12)
Evaluating this function at the measured ages (t = 10, 11, 15), we obtain the matrix
P̂ = [ 3.25  3  2 ]
    [ 3     4  3 ]
    [ 2     3  7 ]   (13)
The results suggest that the variances shown along the diagonal of eqn (11) are overestimated by as much as 115 % (7 v. 3.25 for the variance at age 10). These results are illustrated in Fig. 2.
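The extrapolation example can be checked numerically. This sketch assumes illustrative values consistent with the worked example above (the 3 × 3 matrix and the piecewise-linear fitted function are reconstructions, not quoted from the paper); the function is eqn (12) written via min and max.

```python
import numpy as np

ages = [10.0, 11.0, 15.0]
P = np.array([[7.0, 3.0, 2.0],   # assumed data matrix with inflated diagonal
              [3.0, 8.0, 3.0],
              [2.0, 3.0, 9.0]])

def p_hat(t1, t2):
    """Covariance function fitted to the off-diagonal elements only, then extrapolated."""
    lo, hi = min(t1, t2), max(t1, t2)
    return -17.0 / 4.0 + lo - hi / 4.0

est = np.array([[p_hat(s, t) for t in ages] for s in ages])

off = ~np.eye(3, dtype=bool)
assert np.allclose(est[off], P[off])     # off-diagonal elements are fitted exactly
corrected = np.diag(est)                 # extrapolated (bias-corrected) variances
bias_age10 = P[0, 0] / est[0, 0] - 1.0   # upward bias at age 10, roughly 1.15
```

Only the variances change: the off-diagonal covariances, being unbiased, are reproduced by the fit, while the diagonal is replaced by the extrapolated values.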
5. Analysis of variation
The covariance function is an important descriptor of variation in the trajectory of a character that changes through time, and a substantial amount can be learned from its analysis. The spectrum, or eigenvalues and eigenfunctions, of a covariance function is particularly useful. The leading eigenvalues and eigenfunctions visualize major patterns of variation, and describe these patterns with many fewer parameters than the full covariance function. One important application involves the additive genetic covariance function. Its leading eigenfunctions identify the types of evolutionary changes for which the population has substantial genetic variation available. Conversely, its spectrum also shows the types of changes for which there is not appreciable genetic variation, and which therefore will occur slowly if at all under selection.
The method of symmetric coefficients was specifically devised with this objective in mind. Calculations based on a symmetric coefficient matrix can be used to obtain estimates of eigenfunctions and eigenvalues directly (Kirkpatrick & Heckman, 1989; Kirkpatrick et al. 1990). The method of asymmetric coefficients, on the other hand, cannot be adapted to these calculations. We therefore propose an alternative using a numerical approach.
An estimate of a covariance function based on asymmetric coefficients can be evaluated on a square lattice of a moderate to large number of points. These values form a matrix whose spectrum (eigenvectors and eigenvalues) can then be calculated by standard methods. As the number of points on the lattice increases, the estimates of the eigenvalues will converge on those of the underlying covariance function (see Kirkpatrick & Heckman, 1989). The points of the eigenvectors can be interpolated to give estimates of the corresponding eigenfunctions (within a constant factor that is a function of the number of points in the lattice).
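The lattice procedure can be sketched as follows. This is an illustration, not the authors' Mathematica code; as a sanity check it is applied to the Brownian-motion covariance min(s, t) on [0, 1], whose leading eigenvalue is known to be 4/π².

```python
import numpy as np

def spectrum_on_lattice(cov_fn, a1, an, m=300):
    """Evaluate a covariance function on an m x m lattice and compute its spectrum.
    The eigenvalues, scaled by the lattice spacing, approximate those of the
    covariance function; the eigenvectors sample the eigenfunctions."""
    ts = np.linspace(a1, an, m)
    K = np.array([[cov_fn(s, t) for t in ts] for s in ts])
    h = (an - a1) / (m - 1)                  # lattice spacing
    evals, evecs = np.linalg.eigh(K)         # K is symmetric, so eigh applies
    order = np.argsort(evals)[::-1]          # largest eigenvalue first
    return evals[order] * h, evecs[:, order], ts

# sanity check against a kernel with a known spectrum
lam, vecs, ts = spectrum_on_lattice(lambda s, t: min(s, t), 0.0, 1.0)
# lam[0] should be close to 4 / pi**2, about 0.405
```

The scaling by the lattice spacing is what makes the matrix eigenvalues approximate those of the integral operator defined by the covariance function, which is the sense in which the lattice estimates converge.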
This may seem like a rather baroque method of estimating quantities that could be obtained much more directly by simply calculating the eigenvalues and eigenvectors of the original covariance matrix. The incentive for performing the less direct algorithm just described is that it is expected to give more accurate estimates (Kirkpatrick & Heckman, 1989). The reason for this seems to lie in the fact that simply calculating the spectrum of the original matrix discards all information about the ordering in time of the ages at which the measurements were taken. The methods for symmetric coefficients developed by Kirkpatrick et al. (1990) make use of this information; the indirect algorithm just described for use with asymmetric coefficients also does so.
Fig. 3. Estimates of the phenotypic covariance function for lactation in British Holstein-Friesian dairy cattle. The original data (top left) show a ridge corresponding to upward biases in the diagonal elements (variances). The full fit using symmetric coefficients (top right) overfits the data; the plot has been truncated in the vertical dimension. The full fit with asymmetric coefficients (middle left) is much smoother, but reproduces the inflated diagonal. The extrapolated full fit (middle right) is poorly behaved along the diagonal. An extrapolated reduced fit with k = 7 (bottom left) is well behaved everywhere. The discrepancy between this estimate and the original data (bottom right) is substantial along the diagonal, corresponding to the bias, but very small elsewhere. Variances and covariances are in units of kg².
6. Analysis of lactation in dairy cattle
We will illustrate the three methods developed above using lactation records of British Holstein-Friesian dairy cattle. The data are described in detail by Pander et al. (1992), and comprise their data set 2. Briefly, these were records of daily milk yield ('test day records') of 34029 heifers of known parents. The first record for each heifer was taken between day 5 and day 35 after the start of lactation, and successive records at monthly intervals for a total of 10 monthly records per individual. The data were analysed as if each measurement was taken at the midpoint of the month interval. Additive genetic and phenotypic parameters were calculated from the sire component of covariances using restricted maximum likelihood (REML; see Patterson & Thompson, 1971; Meyer, 1985).
We analysed these data with the methods described above using Legendre polynomials. When reduced estimates are computed, the matrix V of the error covariances between the estimated covariances is required (Kirkpatrick et al. 1990). Since the REML program used does not estimate V, we used the following approximation. The phenotypic and additive genetic covariances were viewed as if they had been estimated using a standard balanced half-sib breeding design with 700 random sires, each with 10 half-sib offspring. There was a total of 16000 residual degrees of freedom because daughters of a number of additional selected sires were used to increase connexions between herds. These sample sizes are a reasonable approximation to the more complex pedigree actually used to estimate the genetic parameters (see Pander et al. 1992). The V matrix was then estimated using the formulae in Appendix C of Kirkpatrick et al. (1990).
(i) Estimating the phenotypic covariance function
The phenotypic covariance matrix is plotted in three dimensions in Fig. 3 (top left). We begin by calculating full estimates (k = n = 10) of the phenotypic covariance function. The estimates based on symmetric coefficients and on asymmetric coefficients are compared in Fig. 3. Both show a conspicuous ridge running along the diagonal, corresponding to the date-specific measurement error described in the introduction. A secondary effect of the spike along the diagonal is to produce a series of parallel harmonic ridges in the symmetric-coefficient estimate. These are not seen in the original data, but rather reflect side effects of how the polynomials used to construct the estimate accommodate the large elements along the diagonal. The estimate based on asymmetric coefficients is substantially smoother, as anticipated for the reasons discussed earlier, but still captures the diagonal ridge corresponding to the inflated variance estimates.
The method of extrapolating to the diagonal is used to eliminate the upward bias in the estimates of the diagonal of P. The full fit (k = 9 polynomials) gives an unsatisfactory estimate of the covariance function (Fig. 3, middle right). The extrapolation based on these high-order polynomials causes the estimated covariance function to take on extremely small values along the diagonal. In fact, this estimate is not positive semidefinite, and so does not qualify as a covariance function. The reduced fit with k = 8 suffers the same problem.
A reduced estimate with k = 7 (Fig. 3, bottom left), however, shows a covariance function that is both well behaved and in keeping with our intuitive
expectation based on the original data. The function rises smoothly to the diagonal. It fits the off-diagonal elements of P very well: all of the differences are less than 2 % in magnitude. In contrast, there is a large discrepancy between the extrapolated estimates of the diagonal elements of P and those from the original matrix (Fig. 3, bottom right). The differences are, in fact, our estimates of the upward biases in the diagonal elements. They are substantial. Our extrapolated values differ by as much as 36 % from the values of the diagonal elements of the original matrix P.
An even simpler estimate of the covariance function would be one that depended only on the difference between any pair of ages. Inspection of the data, however, shows for example that the phenotypic correlation between the first and second month of lactation is different from that between the seventh and eighth (r_P = 0·64 v. r_P = 0·76, respectively; Pander et al. 1992, Appendix Table 1), and so this alternative is not appropriate in this case.
(ii) Estimation and analysis of the additive genetic covariance function
To illustrate the method of asymmetric coefficients, we will again use the data of Pander et al. (1992). Their estimate of the 10 × 10 additive genetic covariance matrix G and our estimates of the continuous covariance function are shown in Fig. 4. We first calculated full (k = n = 10) estimates of the covariance function using both symmetric and asymmetric coefficients. Both show severe fluctuations. The symmetric estimate takes on values that range from less than −3 to more than 7 kg² (Fig. 4, top right) even though the original matrix elements only span the range from 1·5 to 3·5 kg². The asymmetric estimate is again considerably better behaved, as expected, but it nevertheless takes on values as small as −0·12 kg² (Fig. 4, middle left).
We then calculated the symmetric and asymmetric estimates using reduced fits with k = 9 polynomials (Fig. 4, middle right and bottom left). Both are far better behaved than the full estimates. The goodness-of-fit tests give χ² (10 d.f.) = 36·4 for the symmetric fit and χ² (10 d.f.) = 30·8 for the asymmetric fit, indicating that the asymmetric fit is somewhat better. Both tests, however, show there are statistically significant discrepancies between the smoothed covariance function and the original data. We nevertheless prefer these reduced estimates because they are smoother and because the discrepancies between them and the original data matrix are small (less than 7 % for both the symmetric and asymmetric fits).
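The goodness-of-fit test quoted above can be sketched numerically. The residual form below is a standard generalized least squares statistic (the exact formulas are in Kirkpatrick et al. 1990, Appendix C, which we do not reproduce here); the function names and synthetic inputs are ours. The degrees of freedom it implies, n(n + 1)/2 − k(k + 1)/2, equal the 10 d.f. quoted in the text for n = 10 ages and k = 9 polynomials.

```python
import numpy as np

def gof_df(n, k):
    """Degrees of freedom: elements of p minus fitted coefficients."""
    return n * (n + 1) // 2 - k * (k + 1) // 2

def gof_chi2(p, X, c_hat, V):
    """Chi-square statistic for a reduced fit: (p - Xc)' V^{-1} (p - Xc)."""
    r = p - X @ c_hat
    return float(r @ np.linalg.solve(V, r))

# For the lactation analysis (n = 10 ages, k = 9 polynomials):
print(gof_df(10, 9))  # -> 10, matching the 10 d.f. quoted in the text
```

A perfect fit gives a statistic of zero; large values relative to a chi-square with `gof_df(n, k)` degrees of freedom signal significant discrepancy from the data matrix.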
We estimated the eigenvalues and eigenfunctions of the genetic covariance function using three different methods for comparison. First, we analysed the symmetric estimate with the algebraic method described by Kirkpatrick et al. (1990) using the reduced estimate with k = 9. Secondly, we carried
M. Kirkpatrick, W. G. Hill and R. Thompson
Fig. 4. Estimates of the additive genetic covariance function for lactation in British Holstein-Friesian dairy cattle. The original data are shown at top left. The full fit using symmetric coefficients (top right) overfits the data; the plot has been truncated in the vertical dimension. The full fit with asymmetric coefficients (middle left) is much smoother, but is poorly behaved in the off-diagonal corners. Reduced fits with k = 9 using symmetric coefficients (middle right) and asymmetric coefficients (bottom left) are much smoother. The discrepancy between the reduced asymmetric fit and the original data (bottom right) is very small everywhere. Variances and covariances are in units of kg².
out an analysis of the asymmetric estimate, again with k = 9, using the numerical approach outlined above with a 31 × 31 matrix reconstructed from the estimated covariance function. Thirdly, we calculated the eigenvalues and eigenvectors of the original matrix G. The
eigenvalues from the last two methods were renormalized to make them comparable to those from the first method. (The eigenvalues of a covariance function are defined by an integration rather than a summation. Eigenvalues calculated by the second and third
Fig. 5. Estimates of the first and second eigenfunctions (top and bottom panels, respectively) of additive genetic variation for lactation in British Holstein-Friesian dairy cattle. Each panel compares the estimates obtained via symmetric coefficients, asymmetric coefficients, and the corresponding eigenvector from the original additive genetic covariance matrix. Estimates using symmetric and asymmetric coefficients are reduced fits with k = 9 (see Fig. 4). Estimates of the corresponding eigenvalues are shown in the insets; note that estimates of λ1 are about an order of magnitude greater than those for λ2. Eigenvalues are in units of kg².
methods, in contrast, are defined by a matrix product that involves a summation whose value depends on the number of ages sampled. Renormalization accounts for the number of ages so that eigenvalues estimated by these different methods can be compared.)
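The renormalization just described can be illustrated numerically: eigenvalues of a discretized covariance function depend on the number of sampled ages unless they are scaled by the grid spacing, after which they approximate the integral-defined eigenvalues of the function itself. The kernel below is an arbitrary illustration, not one estimated in the paper.

```python
import numpy as np

def scaled_eigenvalues(kernel, a, b, n):
    """Eigenvalues of the kernel matrix on an n-point grid over [a, b],
    scaled by the grid spacing so that they approximate the eigenvalues
    of the covariance *function* (which are defined by an integral)."""
    t = np.linspace(a, b, n)
    K = kernel(t[:, None], t[None, :])
    dt = (b - a) / (n - 1)
    return np.sort(np.linalg.eigvalsh(K))[::-1] * dt

kernel = lambda s, t: np.exp(-np.abs(s - t))  # illustrative kernel only
lam20 = scaled_eigenvalues(kernel, -1.0, 1.0, 20)
lam40 = scaled_eigenvalues(kernel, -1.0, 1.0, 40)
# After scaling, the leading eigenvalue barely depends on the grid:
print(lam20[0], lam40[0])
```

Without the `dt` factor the matrix eigenvalues would roughly double on doubling the number of ages; with it, estimates from different grids (or from the algebraic and numerical methods) can be compared directly.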
The results are presented in Fig. 5. The first eigenfunction is positive everywhere, showing that the principal axis of genetic variation corresponds to simultaneous increases or decreases in lactation at all ages. The leading eigenvalue shows that this eigenfunction accounts for about 90 % of all additive genetic variation (λ1 = 228 and the sum of all eigenvalues = 254, using the method of asymmetric coefficients). Thus there appears to be substantial genetic variation for enhanced milk production throughout the entire lactation period. A consequence of practical importance is that there are not strong trade-offs between early and late lactation: genetic improvement at one age will tend to improve all ages. The second eigenfunction, which accounts for less than 8 % of the genetic variation, shows a trade-off between performance before v. after the fourth month of lactation.
All three methods give similar results in this example. Our experience with other examples, however, shows that this is not always so. It is likely that the agreement between the methods in this case results from the high precision of the parameter estimates in this very large data set and the relatively large number of measured ages. In other applications, we expect the infinite-dimensional methods will have substantially greater power than the conventional matrix-based ones (see Kirkpatrick & Heckman, 1989). Furthermore, the infinite-dimensional methods estimate the full eigenfunctions rather than a series of points along them.
7. Discussion
The methods developed here complement those developed earlier for estimating and analysing the structure of variation in traits that change with age. The technique of asymmetric coefficients may lead to smoother and more accurate estimates of the covariance and correlation functions whenever we are willing to allow there to be a crease (that is, a discontinuous first derivative) along the diagonal. The technique of extrapolating to the diagonal allows one to correct for biases that inflate the diagonal elements of a covariance matrix (the variances). The eigenvalues and eigenfunctions of covariance functions estimated by either method can be calculated numerically to reveal dominant components of variation and trade-offs.
A question common to all of these methods is how to decide the appropriate degree of smoothing for the estimate of the covariance function. Kirkpatrick et al. (1990) developed a goodness-of-fit test, and suggested we choose the smoothest function (the one with the smallest number of orthogonal polynomials) that does not differ significantly from the original data matrix. Our analysis of phenotypic variation in lactation shows that this criterion is not always adequate. The function that it chooses may not be positive semidefinite, and therefore not qualify as a covariance function. Our preference in this case is for a smoother estimate of the covariance function that is better behaved (Fig. 3). Although it does differ significantly from the original data, the discrepancies are small.
An issue related to smoothing involves the positive definiteness of the covariance functions estimated by these methods. A covariance function, by definition, must be positive semidefinite. Estimates of covariance functions, like estimates of covariance matrices, can violate this requirement. Even if the matrix on which it is based is positive semidefinite, there is no certainty that the covariance function estimated by interpolating between the points of the matrix will be. The problem is illustrated in Fig. 3 by the estimates with k = 9. One might choose to use positive semidefiniteness as one of the criteria for choosing among estimates of the covariance function that differ in the
degree of smoothing. One might expect smoother fits generally to be less prone to violate positive semidefiniteness if the original data matrix does not.
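The positive-semidefiniteness criterion discussed above can be checked mechanically: discretize a candidate covariance function on a grid of ages and inspect the smallest eigenvalue of the resulting matrix. The two kernels below are hypothetical illustrations (not estimates from this paper); the second fails the requirement and so would be rejected as a covariance function.

```python
import numpy as np

def is_psd_on_grid(F, ages, tol=1e-9):
    """True if the candidate covariance function F is positive
    semidefinite when evaluated on the given grid of ages."""
    t = np.asarray(ages, dtype=float)
    K = F(t[:, None], t[None, :])
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

ages = np.linspace(-1.0, 1.0, 10)
good = lambda s, t: 3.0 - 0.5 * np.abs(s - t)    # a valid covariance here
bad = lambda s, t: np.cos(4.0 * (s - t)) - 0.5   # violates the requirement
print(is_psd_on_grid(good, ages), is_psd_on_grid(bad, ages))  # True False
```

A grid check of this kind is necessary but not sufficient (a function can be positive semidefinite on one grid and fail on a finer one), so in practice one would test on a grid finer than the measured ages.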
Alternative methods for fitting functions might lead to better-behaved estimates of covariance functions: for example, estimates that are smoother, that conform better to the data, and that are less likely to violate the requirement of positive semidefiniteness. Polynomials, which are the basis for the estimates in this paper, are very often wiggly and can be poorly behaved when used for extrapolation. Other methods such as two-dimensional splines are available (Lancaster & Salkauskas, 1986). They might lead to improved estimates.
One clear opportunity for alternative methods involves our method for extrapolating to the diagonal of the covariance function. The algorithm we developed produces a covariance function estimate that has a crease (a discontinuous first derivative) along its diagonal. Unfortunately, trajectories corresponding to covariance functions of this sort are not smooth: they are continuous, but do not have continuous derivatives (Soong, 1973, chapter 4). We usually expect on biological grounds that a growth process will be smooth. We therefore would prefer an estimate with continuous derivatives everywhere. It may be possible to extend the approach described here to cure this weakness in our method. In any event, we suspect that this extension typically would make only small changes to the quantitative results. The data analysed in Section 6 above suggest that further changes in the variance estimates produced by smoothing the diagonal crease will be small relative to the corrections made by the algorithm presented here.
Lactation in dairy cattle is an excellent candidate for infinite-dimensional analyses because changes in rate of production throughout lactation are of interest. We would like to maximize production over the whole lactation, and need to be able to predict lactation yield from a small number of records early in lactation, both at the phenotypic level so as to make early culling decisions, and at the genetic level to make early selection decisions. Previous analyses of lactation curves and yield prediction (reviewed by Danell, 1990) have not considered the underlying continuous covariance structure of the records. The deviation of an individual from the population mean at one or two early ages can be used to estimate its performance at any later age or set of ages by the standard methods of part-whole correlation (see, e.g. Falconer, 1989; Van Raden et al. 1991). Given the economic incentives, the relatively small amount of additional computation required by the infinite-dimensional method seems a small price to pay.
Our results show that allowance for inflation of the phenotypic variance by measurement error and date-specific effects (such as illness and weather) needs to be made in computing the underlying phenotypic covariance structure. Analyses of daily milk records
have shown that almost all of the increase is associated with the variance of the daily record, but some residual effects span a few days. For example, the phenotypic correlations of records 1, 2, 10 and 30 days apart were 0·84, 0·82, 0·79 and 0·75 in a small data set (Pander et al. 1993). Test day records used in the present analysis were approximately 30 d apart, and the increase in diagonal elements of 30 % or so (Fig. 3) corresponds to these figures. In the repeatability model commonly used in the analysis of quantitative traits with multiple records (e.g. Falconer, 1989, chap. 8) it is assumed that the covariance function F satisfies F(t1, t2) = rV_P for all t1 ≠ t2, and that F(t1, t1) = V_P, where r is the repeatability. In our analysis we allow for this inflation of the variance but also for continuous changes in the covariance over the lactation.
The genetic analysis shows that, although there is substantial additive genetic variation, about 90 % of it is associated with the first eigenfunction, which is positive at all ages (Fig. 5). Trade-offs are seen in the second eigenfunction, which shows opposite effects on production before and after the fourth month of testing. This eigenfunction, however, accounts for less than 7 % of the genetic variation. The present analysis therefore formalizes what is known from examination of the genetic correlation matrix of test day records, which shows high positive values throughout (e.g. Pander et al. 1992): that selection on records from the first few months of production will have little negative consequence on performance in later lactation. A further development of the methods would involve defining joint covariance functions of yield of milk and, for example, of proportion of fat in the first lactation and (requiring more change in structure) of milk yield in the first and in later lactations.
Two directions for future work are suggested by this study. First, covariance functions would be better fitted by assuming that the error structure of the estimated variances and covariances follows a multivariate Wishart distribution and using the likelihood to evaluate the fit. This approach requires numerical iteration. A covariance function estimate based on the methods of this paper (using least squares, assuming normally distributed errors) would be a logical initial point for the iterations.
The ultimate extension would be a method in which the covariance function is estimated directly from the original observations, without the intermediate of a covariance matrix. In the analyses above, measurement records for individuals were grouped into 1-month categories for analysis. These pooled data were then analysed to give estimated covariance matrices, which in turn were analysed by the infinite-dimensional methods. A more direct approach would avoid the pooling entirely and instead treat each record according to the individual's actual age (or number of days since lactation began). Such a method might well increase the precision of covariance function estimates.
We thank Nick Barton and two anonymous referees for comments. This work was supported by NIH grant 4522601 and NSF grant 9107140 to M.K.
Appendix A
This appendix describes the method of asymmetric coefficients. Programs for this analysis have been implemented in a Mathematica® notebook that is available on request from the senior author.
Here we will follow Kirkpatrick et al. (1990) by fitting orthogonal functions. These functions, denoted φ_i (i = 0, 1, …), are orthogonal over the interval [u, v]. In the examples discussed in the text and below we use Legendre polynomials, in which case u = −1 and v = 1.
The method of asymmetric coefficients then proceeds according to the following seven steps.
(i) Form the vector p by stacking the successive columns of the lower diagonal part of the phenotypic covariance matrix:

p = (P_{11}, P_{21}, …, P_{n1}, P_{22}, P_{32}, …, P_{n2}, …, P_{nn})^T. (A 1)
(ii) Form the coefficient vector c by stacking the successive columns of the upper diagonal part of the matrix C:

c = (C_{00}, C_{10}, …, C_{k−1,0}, C_{01}, C_{11}, …, C_{k−2,1}, …, C_{0,k−1})^T. (A 2)

This contrasts with the method of symmetric coefficients, in which the coefficient vector is formed from the lower diagonal parts of the columns of C, in the same way that p is formed according to eqn (A 1). Note that the subscripts run from 0 to k − 1 rather than from 1 to k in order to conform with the conventional numbering of the orthogonal functions.
(iii) The estimated phenotypic covariance matrix P is based on measurements at n ages; these ages form the age vector a. We use a to calculate the adjusted age vector a*:

a*_j = u + (v − u)(a_j − a_1)/(a_n − a_1), (A 3)

j = 1, 2, …, n. This operation rescales the range of the measured ages to the range of the orthogonal functions.
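This rescaling is a one-line computation. The sketch below assumes a linear map of [a_1, a_n] onto [u, v] (the function name is ours); with that reading, the unequally spaced ages of the Appendix B example map onto the Legendre interval as shown.

```python
import numpy as np

def adjust_ages(a, u=-1.0, v=1.0):
    """Linearly map measured ages onto the interval [u, v] over which
    the orthogonal functions are defined (cf. step (iii))."""
    a = np.asarray(a, dtype=float)
    return u + (v - u) * (a - a[0]) / (a[-1] - a[0])

# The unequally spaced ages used in the Appendix B example:
print(adjust_ages([10, 11, 15]))  # values -1, -0.6, 1
```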
(iv) The next step is to form the matrix X from the orthogonal functions. The way in which the vectors p and c are formed makes the notation for specifying the elements of X somewhat awkward. We begin by defining five 'index functions', which are integer-valued functions that generate appropriate subscripts for the orthogonal functions and the adjusted age vector. The first two index functions are used to generate the subscripts for the matrix C as they appear on the right-hand side of (A 2). The index function I1(i, k) is based on the sequence

0, 1, …, k − 1, 0, 1, …, k − 2, …, 0. (A 4)
67
The value of I1(i, k) is given by the ith element of (A 4). For example, I1(5, 3) = 1. The function I2(i, k) is based on the sequence

0, 0, …, 0, 1, 1, …, 1, …, k − 1, (A 5)

in which there are k 0s, (k − 1) 1s, etc. The value of I2(i, k) is given by the ith element of (A 5); thus I2(6, 3) = 2.
The third index function is I3(i, k), and is based on the sequence

1, 2, …, k, 2, 3, …, k, …, k. (A 6)

The value of I3(i, k) is given by the ith element of (A 6). For example, I3(4, 3) = 2. The last two index functions are simply incremented and decremented versions of I2 and I3:
I4(i, k) = I2(i, k) + 1, (A 7)

I5(i, k) = I3(i, k) − 1. (A 8)
For all five index functions, the argument i can take on the values i = 1,2, ... , k(k+ 1)/2.
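The index functions are easy to implement and check against the worked values quoted above. The definitions of I4 and I5 below follow our reading of the 'incremented and decremented' description in eqns (A 7)–(A 8).

```python
def I1(i, k):
    """i-th element (1-based) of the sequence (A 4): 0..k-1, 0..k-2, ..., 0."""
    seq = [r for col in range(k) for r in range(k - col)]
    return seq[i - 1]

def I2(i, k):
    """i-th element of the sequence (A 5): k zeros, (k-1) ones, ..., k-1."""
    seq = [col for col in range(k) for _ in range(k - col)]
    return seq[i - 1]

def I3(i, k):
    """i-th element of the sequence (A 6): 1..k, 2..k, ..., k."""
    seq = [r for start in range(1, k + 1) for r in range(start, k + 1)]
    return seq[i - 1]

def I4(i, k):
    return I2(i, k) + 1   # incremented version of I2 (our reading of A 7)

def I5(i, k):
    return I3(i, k) - 1   # decremented version of I3 (our reading of A 8)

# The worked values quoted in the text:
print(I1(5, 3), I2(6, 3), I3(4, 3))  # -> 1 2 2
```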
With these definitions in hand, we now calculate the matrix X:

X_{ij} = φ_{I1(j,k)}(a*_{I3(i,n)}) φ_{I2(j,k)}(a*_{I4(i,n)}), (A 9)

where i = 1, 2, …, n(n + 1)/2 and j = 1, 2, …, k(k + 1)/2. By way of comparison, the previous method of symmetric coefficients calls for X to be of the form

X_{ij} = [φ_{I5(j,k)}(a*_{I3(i,n)}) φ_{I2(j,k)}(a*_{I4(i,n)}) + φ_{I2(j,k)}(a*_{I3(i,n)}) φ_{I5(j,k)}(a*_{I4(i,n)})] / (1 + δ[I2(j,k), I5(j,k)]), (A 10)
where δ[s, t] = 1 if s = t and is 0 otherwise.
(v) For a full estimate of the covariance function, we solve for c using the relation

c = X^{−1} p. (A 11)
Alternatively, we may be interested in a reduced estimate of the covariance function, in which the number of orthogonal coefficients that are fit is smaller than the number of observations (k < n). To do so, we first calculate the matrix V that is the error covariance matrix corresponding to the vector p; details are given in Kirkpatrick et al. (1990). We then calculate c using weighted least squares:
c = (X^T V^{−1} X)^{−1} X^T V^{−1} p. (A 12)
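Eqn (A 12) in code form; the synthetic X, V, and p below are illustrative only, and the function name is ours.

```python
import numpy as np

def reduced_fit(X, V, p):
    """Weighted least squares, eqn (A 12): c = (X'V^{-1}X)^{-1} X'V^{-1} p."""
    Vi = np.linalg.inv(V)
    return np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ p)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))                  # 6 stacked covariances, 3 coefficients
V = np.diag(rng.uniform(0.5, 2.0, size=6))   # error covariance of p
c_true = np.array([1.0, -2.0, 0.5])
c_hat = reduced_fit(X, V, X @ c_true)        # noise-free p recovers c exactly
print(np.allclose(c_hat, c_true))            # -> True
```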
(vi) We now form the estimated coefficient matrix C by unstacking the vector c (that is, performing the reverse of the operation described by eqn (A 2)). Using the notation of the index functions, the elements of C are found by plugging successive values of i into the relation

C_{I1(i,k), I2(i,k)} = c_i, (A 13)

where i = 1, 2, …, k(k + 1)/2; all other elements of C are 0.
(vii) The coefficient matrix generates our estimate F of the covariance function:

F(t1, t2) = Σ_{i=0}^{k−1} Σ_{j=0}^{k−1} C_{ij} φ_i(τ1) φ_j(τ2), (A 14)

where

τ1 = max(t1, t2), τ2 = min(t1, t2). (A 15)
The form of eqn (A 14) guarantees that the estimated covariance function is symmetric, as required by one of the defining criteria of covariance functions. In the case of a reduced fit (k < n), the consistency with the original data matrix can be tested for statistically significant deviation from the data using the goodness-of-fit test described in Kirkpatrick et al. (1990, Appendix C).
(i) A worked example
The method will be illustrated by the worked example presented by the covariance matrix of text eqn (5). We will use Legendre polynomials for the fit, which are defined over the interval [−1, 1], so u = −1 and v = 1.
Following Step (i) above, stack successive columns of the lower diagonal part of P to form the vector

p = (3, 2, 3)^T. (A 16)
From Step (ii), the (unknown) vector of coefficients is

c = (C_{00}, C_{10}, C_{01})^T. (A 17)
From Step (iii), the adjusted age vector is

a* = (−1, 1)^T. (A 18)
We form the matrix X from the orthogonal functions as described in Step (iv):

X = [ φ_0(a*_1)φ_0(a*_1)   φ_1(a*_1)φ_0(a*_1)   φ_0(a*_1)φ_1(a*_1) ]
    [ φ_0(a*_2)φ_0(a*_1)   φ_1(a*_2)φ_0(a*_1)   φ_0(a*_2)φ_1(a*_1) ]
    [ φ_0(a*_2)φ_0(a*_2)   φ_1(a*_2)φ_0(a*_2)   φ_0(a*_2)φ_1(a*_2) ]

  = [ 1/2   −√3/2   −√3/2 ]
    [ 1/2    √3/2   −√3/2 ]
    [ 1/2    √3/2    √3/2 ]. (A 19)
The orthogonal coefficients for the full fit are calculated as described in Step (v):

c = X^{−1} p = (6, −√3/3, √3/3)^T. (A 20)
By unstacking this vector according to Step (vi), we find the estimated coefficient matrix given by text eqn
(7). Finally, using that result in Step (vii) we arrive at the estimated covariance function given by text eqn (8).
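The worked example can be verified numerically. We assume the normalized Legendre polynomials φ_0(t) = 1/√2 and φ_1(t) = √(3/2)·t (the normalization used by Kirkpatrick et al. 1990); with this choice the linear solve reproduces the coefficients in eqn (A 20), and the resulting covariance function reproduces the original matrix entries.

```python
import numpy as np

# Normalized Legendre polynomials (assumed normalization; see lead-in).
phi = [lambda t: 1.0 / np.sqrt(2.0) + 0.0 * t,
       lambda t: np.sqrt(1.5) * t]

a_star = np.array([-1.0, 1.0])    # adjusted ages, eqn (A 18)
p = np.array([3.0, 2.0, 3.0])     # stacked covariances, eqn (A 16)

pairs = [(0, 0), (1, 0), (1, 1)]  # 0-based (row, column) index pairs of P
coefs = [(0, 0), (1, 0), (0, 1)]  # coefficient subscripts per eqn (A 17)

# Matrix X of eqn (A 19) and the full-fit solve of eqn (A 11)/(A 20):
X = np.array([[phi[l](a_star[r]) * phi[m](a_star[s]) for (l, m) in coefs]
              for (r, s) in pairs])
c = np.linalg.solve(X, p)         # -> (6, -sqrt(3)/3, sqrt(3)/3)

def F(t1, t2):
    """Estimated covariance function via eqns (A 14)-(A 15)."""
    hi, lo = max(t1, t2), min(t1, t2)
    return sum(ci * phi[l](hi) * phi[m](lo) for ci, (l, m) in zip(c, coefs))

# The full fit reproduces the original matrix entries:
print(F(-1.0, -1.0), F(1.0, -1.0), F(1.0, 1.0))  # approximately 3, 2, 3
```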
Appendix B
The technique of extrapolating to the diagonal makes use of the asymmetric coefficient fit described in Appendix A. Programs that run this analysis have been developed in a Mathematica® notebook that is available from the senior author on request.
As with the earlier methods, we are interested in fitting the data using k orthogonal functions. But because we are not using the diagonal elements of P in the fit, we now require k < n rather than k ≤ n. The algorithm proceeds by the following steps.
(i) Form the vector p by stacking the subdiagonal parts of the columns of P:

p = (P_{21}, P_{31}, …, P_{n1}, P_{32}, …, P_{n2}, …, P_{n,n−1})^T. (B 1)

This vector is of length n(n − 1)/2.
(ii) Form the coefficient vector c according to eqn (A 2).
(iii) We calculate two adjusted age vectors a* and b*:

a*_i = u + (v − u)(a_{i+1} − a_2)/(a_n − a_2) (B 2a)

and

b*_i = u + (v − u)(a_i − a_1)/(a_{n−1} − a_1), (B 2b)

i = 1, 2, …, n − 1.
(iv) Form the matrix X:

X_{ij} = φ_{I1(j,k)}(a*_{I3(i,n−1)}) φ_{I2(j,k)}(b*_{I4(i,n−1)}), (B 3)

where

I6(i, n) = I3(i, n − 1) + 1 (B 4)

gives the row index of P corresponding to element i of p, i = 1, 2, …, n(n − 1)/2, and j = 1, 2, …, k(k + 1)/2.
(v) As in the previous section, the coefficients are
calculated using eqn (A 11) for a full estimate, or eqn (A 12) for a reduced estimate. Note, however, that because only the off-diagonal elements of P are being used, a full fit implies k = n − 1 rather than k = n polynomials (and likewise a reduced fit implies k < n − 1). A reduced fit can be tested for consistency with the original data using the goodness-of-fit test described in Kirkpatrick et al. (1990, Appendix C). When extrapolating to the diagonal, however, the values along the diagonal are omitted from the test.
(vi) The coefficient matrix C is formed from the vector c via eqn (A 13).
(vii) Finally, the estimated covariance function is obtained by substituting C into eqn (A 14) using

τ1 = u + (v − u)(max(t1, t2) − a_2)/(a_n − a_2) (B 5a)

and

τ2 = u + (v − u)(min(t1, t2) − a_1)/(a_{n−1} − a_1), (B 5b)

where t1 and t2 range over the interval [a_1, a_n].
(i) A worked example
We will demonstrate the method of extrapolating to the diagonal using the covariance matrix given in text eqn (11), based on measurements taken at the ages

a = (10, 11, 15)^T.

(This age vector will illustrate how the infinite-dimensional method naturally accommodates unequal spacing of ages.) We will calculate a full estimate (that is, k = n − 1 = 2), again using Legendre polynomials.
The vectors p, c, a*, and b* are formed as in Steps (i)–(iii); in particular,

a* = b* = (−1, 1)^T.

Steps (iv)–(vi) produce the same values for X, c, and C found in the example of Section 2. Last, we follow Step (vii) to obtain the estimate of the covariance function given by text eqn (12).
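The adjusted age vectors of this example can be checked with a short sketch. It assumes a* linearly rescales the ages a_2, …, a_n onto [u, v] and b* rescales a_1, …, a_{n−1} (our reading of eqns (B 2a)–(B 2b)); with that reading both vectors collapse to (−1, 1) for these ages, as quoted above.

```python
import numpy as np

def adjusted_vectors(a, u=-1.0, v=1.0):
    """Adjusted age vectors for extrapolation to the diagonal, assuming
    a* rescales a_2..a_n and b* rescales a_1..a_{n-1} onto [u, v]."""
    a = np.asarray(a, dtype=float)
    a_star = u + (v - u) * (a[1:] - a[1]) / (a[-1] - a[1])
    b_star = u + (v - u) * (a[:-1] - a[0]) / (a[-2] - a[0])
    return a_star, b_star

a_star, b_star = adjusted_vectors([10, 11, 15])
print(a_star, b_star)  # both equal (-1, 1), as in the worked example
```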
References

Danell, B. (1990). Genetic aspects of different parts of the lactation. Proceedings of the 4th World Congress on Genetics Applied to Livestock Production, Edinburgh, 14, 114–117.
Falconer, D. S. (1989). Introduction to Quantitative Genetics, 3rd edition. New York: Longman.
Gomulkiewicz, R. & Kirkpatrick, M. (1992). Quantitative genetics and the evolution of reaction norms. Evolution 46, 390–411.
Kirkpatrick, M. & Heckman, N. (1989). A quantitative genetic model for growth, shape and other infinite-dimensional characters. Journal of Mathematical Biology 27, 429–450.
Kirkpatrick, M., Lofsvold, D. & Bulmer, M. (1990). Analysis of the inheritance, selection and evolution of growth trajectories. Genetics 124, 979–993.
Kirkpatrick, M. & Lofsvold, D. (1992). Measuring selection and constraint in the evolution of growth. Evolution 46, 954–971.
Lancaster, P. & Salkauskas, K. (1986). Curve and Surface Fitting: An Introduction. London: Academic Press.
Meyer, K. (1985). Maximum likelihood estimation of variance components for a multivariate mixed model with equal design matrices. Biometrics 41, 153–165.
Pander, B. L., Hill, W. G. & Thompson, R. (1992). Genetic parameters of test day records of British Holstein-Friesian heifers. Animal Production 55, 11–21.
Pander, B. L., Thompson, R. & Hill, W. G. (1993). The effect of increasing the interval between recordings on genetic parameters of test day yields of British Holstein-Friesian heifers. Animal Production 56, 159–164.
Pander, B. L., Thompson, R. & Hill, W. G. (1993). Phenotypic correlations among daily records of milk yields. Indian Journal of Animal Sciences 63, 1282–1286.
Patterson, H. D. & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545–554.
Soong, T. T. (1973). Random Differential Equations in Science and Engineering. New York: Academic Press.
Van Raden, P. M., Wiggans, G. R. & Ernst, C. A. (1991). Expansion of projected lactation yield to stabilize genetic variance. Journal of Dairy Science 74, 4344–4349.
Wolfram, S. (1991). Mathematica: A System for Doing Mathematics by Computer, 2nd edition. Redwood City: Addison-Wesley.
Copyright © 1999 by the Genetics Society of America

The Genetic Analysis of Age-Dependent Traits: Modeling the Character Process

Scott D. Pletcher*,† and Charles J. Geyer†

*Department of Ecology, Evolution and Behavior and †School of Statistics, University of Minnesota, Saint Paul, Minnesota 55108

Manuscript received March 15, 1999
Accepted for publication June 22, 1999
ABSTRACT

The extension of classical quantitative genetics to deal with function-valued characters (also called infinite-dimensional characters) such as growth curves, mortality curves, and reaction norms was begun by Kirkpatrick and co-workers. In this theory, the analogs of variance components for single traits are covariance functions for function-valued traits. In the approach presented here, we employ a variety of parametric models for covariance functions that have a number of desirable properties: the functions (1) are positive definite, (2) can be estimated using procedures like those currently used for single traits, (3) have a small number of parameters, and (4) allow simple hypotheses to be easily tested. The methods are illustrated using data from a large experiment that examined the effects of spontaneous mutations on age-specific mortality rates in Drosophila melanogaster. Our methods are shown to work better than a standard multivariate analysis, which assumes the character value at each age is a distinct character. Advantages over existing methods that model covariance functions as a series of orthogonal polynomials are discussed.
SINCE the introduction of quantitative genetics theory and methods to the study of evolution, a tremendous body of literature has developed, documenting patterns of quantitative genetic variation within and between species for a wide variety of continuous characters (Barton and Turelli 1989; Falconer 1989; Lynch and Walsh 1998). Evolutionary biologists use this information to predict how a population might respond to natural or artificial selection and to provide insight into the contributions of the various evolutionary processes to the levels of genetic variation seen in natural populations (Lande 1979, 1982; Houle 1992). Empirical estimates of genetic variances in single traits and genetic covariances between traits have contributed greatly to our knowledge of the evolution of biological characters.

Classical quantitative genetics theory covers the analysis of a single quantitative trait, such as bristle number in Drosophila, or at most a few traits. However, many interesting characters are inherently too complex to be described by classical theory. Most often this is because it is difficult to describe the character of interest by a single value. Examples can be found in the field of life-history evolution, where traits change over the lifetime of an individual. In fact, in many cases it is the change of the character with age that is the primary interest (Hughes and Charlesworth 1994; Promislow et al. 1996; Pletcher et al. 1998).

Function-valued traits are characters that change as a function of some independent and continuous variable. More specifically, a function-valued trait is a function x(t). In all of the work that has been done on function-valued traits, including ours, both the independent variable t and the dependent variable x(t) are single valued. These traits have also been called infinite-dimensional traits (Kirkpatrick and Heckman 1989) because the character can take on a value at an infinite number of ages. In principle, there is no reason why our methods or those of other workers in this area cannot be extended to allow t or x(t) or both to be multivariate. For the case of univariate t and x(t), we think "function-valued" is the more descriptive term. It avoids confusion with characters that are described by a multidimensional t or x(t). For specificity, we always refer to the independent variable t as time or age, although there is no reason why it cannot be any continuous variable.

In cases where the functional nature of the trait is of interest, classical methods are often employed by treating arbitrary, discrete age intervals as unique characters in a multivariate analysis (Hughes and Charlesworth 1994; Promislow et al. 1996; Tatar et al. 1996; Pletcher et al. 1998). This approach is problematic. As the number of ages of interest increases, the ability to produce precise estimates of statistical parameters is rapidly lost (Shaw 1987, 1991). In addition, when measurements are taken at irregular intervals, one might reasonably expect the trait to be more similar between ages separated by a short time as compared with more disparate ages. A standard variance component analysis ignores this type of information.

Corresponding author: Scott D. Pletcher, Max Planck Institute for Demographic Research, Doberaner Str. 114, D-18057 Rostock, Germany. E-mail: [email protected]

Genetics 151: 825–835 (October 1999)

826 S. D. Pletcher and C. J. Geyer

Recognizing the limits of the classical approach, Kirkpatrick and Heckman (1989) formulated a quantitative genetic model for function-valued traits, which has since served as the foundation for numerous theoretical and experimental investigations in this area. On the theoretical side, age-specific selection on a character and its interactions with genetic constraints have received considerable attention (Kirkpatrick et al. 1990; Kirkpatrick and Lofsvold 1992). The evolution of reaction norms over continuous environments has also been studied (Gomulkiewicz and Kirkpatrick 1992). On the experimental side, estimates of genetic variation for age-dependent growth patterns in birds (Gebhardt-Henrich and Marks 1993; Bjorklund 1997), mice (Kirkpatrick et al. 1990; Meyer and Hill 1997), and livestock (Kirkpatrick 1997) have been published. Moreover, the recent interest in age-specific components of genetic variation for other life-history characters (Engstrom et al. 1989; Houle et al. 1994; Hughes and Charlesworth 1994; Promislow et al. 1996; Pletcher et al. 1998) suggests that interest in function-valued traits is growing.

Cov(e_ij, e_kl) = δ_ik ε_jl, (2a)

where δ_ik are the elements of the identity matrix (δ_ik = 1 if i = k, and δ_ik = 0 otherwise), and

Cov(g_ij, g_kl) = r_ik γ_jl, (2b)

where the r_ik are the coefficients of relationship (elements of the A matrix) and the γ_jl and ε_jl are parameters to be estimated. Making matrices G and E with elements γ_jl and ε_jl allows us to write the matrix equation

Var(x) = A ⊗ G + I ⊗ E, (3)

where ⊗ denotes the Kronecker product of matrices (Searle et al. 1992, pp. 443 ff.) and x is a vector containing all data on all individuals in the order x_11, x_12, …, x_21, x_22, …. The matrices G and E are symmetric m × m matrices if there are m traits, and each has m(m + 1)/2 independent parameters. Statistical inference about the G matrix and the constraints it imposes on the dynamics of phenotypic evolution is the primary interest in these analyses (Lande 1979, 1982).

Function-valued traits add an additional level of com-
A quantitative genetics theory for functionvaluedplication. Now for individual i the trait is a function
traits is a straightforward extension to standard methodxi(t) of the continuous variable t. Equations 2a and 2b
ology. Classical quantitative genetics partitions an obare replaced by
servable trait asCovei(s), ek(t) 5 dikE(s, t) (4a)
x 5 m 1 g 1 e, (1)Covgi(s), gk(t) 5 rikG(s, t). (4b)
where m is the mean (fixed effect) and g and e areThe primary interest in analyses of functionvalued traitsthe genetic and environmental components (randomis statistical inference about the “G function,” G(s, t),effects). Assuming no geneenvironment interaction, galso called the additive genetic covariance function. The “Eand e are independent, hencefunction,” E(s, t), also called the environmental covariance
Var(x) 5 Var(g) 1 Var(e). function, is of lesser interest.In practice, data are only observed at a finite set of
If xi, etc. denote the effects for individual i, the simplest times t1, . . . , tm, rather than a continuum, so we haveassumptions are that ei and ej are uncorrelated if i ≠ j only a finite set of data on each individual, which weand that Cov(gi, gj) is proportional to the coefficient of can consider as a multivariate trait vector xi(t1), . . . ,relationship of i and j (Falconer 1989, pp. 111 ff., xi(tm). Although in theory the trait has a continuous Gespecially p. 156). Making a matrix A of the coefficients function, in practice the covariance structure is deof relationship (the socalled numerator relationship ma scribed by a “G matrix.” The elements of the G matrixtrix) allows us to write the matrix equation are genetic covariances between the trait measured at
different ages. The key idea here is that the elementsVar(x) 5 s2gA 1 s2
eI,of the G matrix do not consist of unique parameters
where I is the identity matrix and s2g and s2
e are two for all variances and covariances. Instead, all elementsparameters to be estimated, the genetic and environ of this matrix are obtained from a single G function.mental variances. Thus, the finite dimensional G matrix for the character
More complex genetic models partition the genetic process model has elements defined by gjl 5 G(tj, tl). Aeffect into additive, dominance, and other effects similar argument applies for the “E matrix.” Given the(Lynch and Walsh 1998). All the theory and examples new parameterization of the G and E matrices, Equationin this article consider only additive models. Extension 3 again describes the variance of the observed phenoof our methods to include dominance and other effects type considered as a multivariate trait vector xi(tj).is theoretically straightforward (though no doubt some Is that all there is to functionvalued traits? It appearspractical difficulties will arise). as though we have simply redefined the problem. Al
When more than one trait is modeled, we have covari though in principle there is a G function G(s, t), inances among traits as well as among individuals (Shaw practice there is only a G matrix G(tj, tl). Is anything1987, 1991). If xij, etc., now denote the effects for individ new introduced by talking about functionvalued traits?
The answer is “yes,” because classical multivariate methual i and trait j, the simplest assumptions are now
827Genetics of FunctionValued Traits
ods run into intractable difficulties when there are many oN
i51oN
j51
bibjrX(ti, tj) $ 0. (7)traits. Even five traits are trouble (Shaw et al. 1995;Shaw and Geyer 1997). Functionvalued traits are often
Most quantitative genetics theory is based on the asobserved at many times (or many values of t if t is notsumption that the character of interest or some transfortime), too many for classical multivariate quantitativemation of it is normally distributed (Lynch and Walshgenetics to cope with.1998). This assumption can be extended to a characterSome new idea has to be added to manage the paramprocess by utilizing the theory of Gaussian processeseter explosion, m(m 1 1) parameters to estimate in the(Hoel et al. 1972; Kirkpatrick and Heckman 1989).genetic covariance matrix alone if data are observed atA stochastic process X(t), t P T, is called a Gaussianm times. In the theory of functionvalued characters,process if the vector (X(t1), X(t2), . . . , X(tm)) has athe number of parameters in the finite dimensional Gmultivariate normal distribution for every choice ofmatrix is equal to the number of parameters in the Gtimes t1, . . . , tm (Hoel et al. 1972). As with any Gaussianfunction—this is independent of the number of agesrandom variable, the distribution of a Gaussian processexamined, and the task is to model and estimate the Gis completely determined by its mean and covariancefunction. There are two possible approaches: parametfunction.ric and nonparametric. This article explores the use of
Using the language of Gaussian processes, we canparametric models for the G function. Kirkpatrick andnow complete our description of quantitative geneticscoworkers and followers use an approach that is nonfor functionvalued traits. We assume the observed pheparametric in spirit, although for most experimentalnotypic character process X(t) is a Gaussian process anddesigns it is missing some important features that onecan be decomposed analogous to (1) asexpects in a nonparametric statistical method.
In the following sections we provide a brief review of X(t) 5 m(t) 1 g(t) 1 e(t), (8)the seminal work in this area, while focusing on the
where m(t) is a nonrandom function, the mean functiondifferences between previous work and our own. Weof X(t), and g(t) and e(t) are meanzero Gaussian propresent representative examples from an extensive secesses that are independent of each other and haveries of simulations in which we compared our approachcovariance functions G(s, t) and E(s, t), respectively.with those suggested previously. We then illustrate theBy the independence of g(t) and e(t), the covariancevarious techniques using real data on mortality rates infunction of X(t) is given by P(s, t) asfemale Drosophila. Last, we summarize some of the
benefits of our character process model over previous P(s, t) 5 G(s, t) 1 E(s, t). (9)methods and suggest promising avenues for future theoretical development. Each individual has a different realization of the charac
ter processes X(t), g(t), and e(t). The covariance of theprocesses for different individuals we have already de
GENERAL CONSIDERATIONSrived as (4a) and (4b).
The probabilistic framework for modeling a function Thus the character process approach, also called funcvalued trait is based on the theories of stochastic pro tionvalued quantitative genetics, can be simply butcesses. A stochastic process can be defined as a set of briefly described as replacing the Gaussian random varirandom variables X(t), t P T, where T is a subset of the ables or random vectors of classical quantitative geneticsreal line and termed the time parameter set (Hoel et by Gaussian stochastic processes and proceeding mutatisal. 1972). A specific realization of a process (i.e., the mutandis. What we have described so far includes allvalues of the random variables at each t) is called a sample approaches to functionvalued quantitative genetics:path of that process. We are interested in processes with that of Kirkpatrick and coworkers, that of Meyer andfinite variance, i.e., for which EX(t)2 , ∞, the socalled Hill (1997), and ours. The differences are in how thesecondorder processes. In such cases, we can define a G and E functions are modeled and in how the modelsmean function of the process by are fitted to data.
mX(t) 5 EX(t), t P T (5)
and a covariance function of the process by NONPARAMETRICS AND ORTHOGONALPOLYNOMIALS
rX(s, t) 5 CovX(s), X(t), s, t P T. (6)In the approaches of Kirkpatrick and coworkers and
Equation 5 is the function describing how the expected of Meyer and Hill, the G and E functions are modeledvalue of the character changes with age, and (6) de by a linear combination of orthogonal Legendre polynoscribes the covariance between the character at two sepa mialsrate ages. The covariance function must be nonnegativedefinite, that is, for any finite set of times (t1 . . . tN) and G(s, t) 5 o
m
i50om
j50
φi(s)φj(t)kij, (10)any real numbers (b1 . . . bN),
828 S. D. Pletcher and C. J. Geyer
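The finite-dimensional bookkeeping behind Equations 2-4 (a G matrix filled in from a single G function and assembled with the relationship matrix via the Kronecker form of Equation 3) can be sketched numerically. This is a minimal sketch, not the authors' C implementation; the Gaussian-shaped G and E functions, the three measurement ages, and the small example pedigree below are assumptions for illustration only.

```python
import math

def kron(a, b):
    """Kronecker product of two matrices stored as lists of lists."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Illustrative stationary covariance functions (the "standard normal"
# form of Table 1); the parameter values are arbitrary, not fitted.
def G_fn(s, t):
    return 0.5 * math.exp(-0.7 * (s - t) ** 2)

def E_fn(s, t):
    return 0.3 * math.exp(-0.2 * (s - t) ** 2)

ages = [1.0, 2.0, 3.0]                          # measurement ages t1, ..., tm
G = [[G_fn(s, t) for t in ages] for s in ages]  # g_jl = G(t_j, t_l)
E = [[E_fn(s, t) for t in ages] for s in ages]

# Numerator relationship matrix A for two full sibs plus one unrelated
# individual (an assumed pedigree for the example).
A = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
I = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# Equation 3: Var(x) = A (kron) G + I (kron) E,
# one row and column per (individual, age) pair.
V = mat_add(kron(A, G), kron(I, E))
print(len(V), len(V[0]))   # 9 9
```

The element linking individual 1 and individual 2 at age t1 comes out as r12 G(t1, t1) = 0.5 × 0.5 = 0.25, exactly the structure Equation 4b prescribes at the observed ages.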
Kirkpatrick and coworkers used fitting procedures that are no longer recommended, being superseded by the methods of Meyer and Hill (1997), who used restricted maximum likelihood (REML). Meyer and Hill estimated the parameters of the model (i.e., the kij in Equation 10) for each model with a fixed set of Legendre polynomials, which corresponds to fixing m in (10). They then used likelihood-ratio tests to determine a value of m that adequately fits the data.

We have no argument with model fitting by maximum likelihood (ML) or REML, but we propose a different way of modeling G and E functions. Covariance functions modeled with Legendre polynomials (or other orthogonal polynomials) have a number of potential drawbacks.

1. They are not automatically positive semidefinite. Although constrained ML or REML can be used to impose this condition, this greatly complicates hypothesis testing and other statistical procedures.
2. Legendre polynomials have no theoretical justification other than being one among many sets of orthogonal basis functions.
3. Polynomials do not fit covariance functions well. Polynomials of high degree are extremely "wiggly" and do not have asymptotes. Sensible covariance functions are extremely smooth and typically

       G(s, t) → 0 as |s − t| → ∞

   (an asymptote).
4. For the majority of genetic studies, trying to be nonparametric about the covariance function of an unobservable stochastic process may be optimistic. In time-series analysis and spatial statistics, where the stochastic process is observed directly, the most successful methods use parametric models [e.g., autoregressive integrated moving average (ARIMA) modeling of time series and variogram estimation in spatial statistics]. Experience in spatial statistics shows that the behavior of the covariance function at points closely related in time determines most of the behavior of the process, and it is difficult to distinguish different behaviors in the tails of the covariance function (Cressie 1993, section 3.2.1). It is even more difficult if the stochastic process is unobserved like the genetic and environmental processes in quantitative genetics. For realistic experimental designs, there is not enough information in the data for good nonparametric estimation.
5. Polynomial models for covariance functions often have a large number of parameters, most of which have no simple interpretation. Specific age-dependent hypotheses are not easily tested.

We avoid these problems by using parametric models for the G and E functions. We discuss a large family of parametric models, each with a small number of interpretable parameters, that satisfy theoretical requirements and that as a group exhibit a wide variety of behaviors. We (like Meyer and Hill 1997) use ML to estimate parameters. C code, implementing these procedures, is available from the first author.

PARAMETRIC CHARACTER PROCESS MODELS

Useful parametric models for covariance functions are limited by several theoretical requirements. First, covariance functions must be positive semidefinite, i.e., satisfy Equation 7. Second, biological processes are expected to be reasonably smooth, requiring their covariance functions to be smooth as well (Hoel et al. 1972). If a Gaussian stochastic process is to be considered smooth, it will have differentiable sample paths, and so must its covariance function. In general the covariance function has twice as many derivatives as the process itself (Hoel et al. 1972). Thus, because we expect biological processes to be relatively smooth, we choose covariance function models that are highly differentiable. Third, it is desirable for the covariance function to have parameters with biologically meaningful interpretations so that interesting hypotheses can be easily tested.

With these considerations in mind, we first concentrate on a simple model of a character process that nevertheless may adequately represent many age-dependent traits. We assume each process X(t) is second-order stationary, which means

    mX(t) is independent of t and
    rX(s, t) is a function of s − t

(Hoel et al. 1972). This stationarity assumption is necessary for several fundamental results, but it is relaxed later. Second-order stationarity requires that the mean value of the trait must not change with age and that the covariance between the value of the character at two different ages depends only on the time distance between the age classes.

For stationary models, the choice of a covariance function is greatly simplified by Bochner's theorem (Hoel et al. 1972), which asserts that a strictly positive covariance function is necessarily proportional to the characteristic function of some probability distribution. Thus, immediately we have a long menu of potential covariance functions from which to choose, as any real-valued characteristic function of a probability distribution is allowed. A number of satisfactory functions are presented in Table 1.
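The positive semidefiniteness requirement of Equation 7 is easy to probe numerically. The sketch below (with illustrative functions and ages, not values from the paper) evaluates the quadratic form for a Table 1-style Gaussian covariance, which Bochner's theorem guarantees is valid, and for a simple polynomial in (s − t), which is not:

```python
import math
import random

def quad_form(cov, times, b):
    """Equation 7 quadratic form: sum_i sum_j b_i b_j cov(t_i, t_j)."""
    n = len(times)
    return sum(b[i] * b[j] * cov(times[i], times[j])
               for i in range(n) for j in range(n))

times = [float(t) for t in range(1, 11)]

# "Standard normal" covariance from Table 1: a characteristic function,
# hence positive semidefinite by Bochner's theorem.
gauss = lambda s, t: 0.5 * math.exp(-0.7 * (s - t) ** 2)

rng = random.Random(1)
worst = min(quad_form(gauss, times, [rng.uniform(-1, 1) for _ in times])
            for _ in range(200))
print(worst >= -1e-9)              # True: Equation 7 holds at every trial

# A polynomial in (s - t) is not automatically a valid covariance:
poly = lambda s, t: 1.0 - (s - t) ** 2
b = [1.0] + [0.0] * 8 + [1.0]
print(quad_form(poly, times, b))   # -158.0: Equation 7 is violated
```

This is the content of drawback 1 above: an unconstrained polynomial fit can produce a "covariance" that assigns negative variance to some linear combination of the trait values.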
In many cases the characteristic function of one probability distribution is proportional to the probability density function of another. In such cases we refer to the covariance function by the name of the distribution with the proportional density. In cases where there is no such distribution, the covariance function is specifically referred to as the characteristic function of its parent distribution. The available functions exhibit a wide variety of behaviors, and some can be negative in sign.

TABLE 1

Covariance functions for the character process model

Name                                                       Covariance function
Standard normal                                            θ0 exp(−θc t²)
Cauchy                                                     θ0 / (1 + θc t²)
Bilateral exponential                                      θ0 exp(−θc |t|)
Hyperbolic cosine                                          θ0 / cosh(πθc t / 2)
Characteristic function of a uniform distribution          θ0 sin(θc t) / (θc t)
Characteristic function of a triangular distribution       θ0 [1 − cos(θc t)] / (θc t)²
Characteristic function of a general stable distribution   θ0 exp(−θc |t|^a)

Valid covariance functions derived from the characteristic functions of various probability distributions. The parameters satisfy θ0 > 0, θc > 0, and 0 < a ≤ 2. Characteristic functions were taken from Feller (1968). More general covariance functions can be obtained by replacing θ0 with a more general variance function (see text).

Although the assumptions of stationarity are rather strict, we can use the results for stationary processes to formulate models that account for age-dependent changes in the mean value of the character and that allow for more general covariance functions. The simplest way to achieve first-order stationarity (i.e., a constant mean over time) is to model the mean separately as in (8), where g(t) and e(t) have mean zero for all t, hence are first-order stationary. The nonstochastic function m(t), analogous to fixed effects in classical quantitative genetics, models the mean behavior. An alternative to modeling the mean function directly is to use methods analogous to those used to remove trends in time series (Box et al. 1994), such as differencing the series (replacing the value at time t by Xt+1 − Xt), and more generally using "integrated" models, such as ARIMA.

A relaxation of second-order stationarity—the condition that requires the covariance of the process between ages t1 and t2 to be only a function of t1 − t2—that still gives relatively simple models is second-order correlation (rather than covariance) stationarity. This relaxation allows variance to change with age. If rX(s − t) is the correlation function of a second-order stationary process and v(t) is an arbitrary function, then

    rX(s, t) = v(s) v(t) rX(s − t)   (11)

is a valid covariance function. Thus we can choose rX(t) to be any of the functions in Table 1 with the additional restriction that θ0 = 1 [so that the correlation of X(t) with itself is 1] and choose v(t) completely arbitrarily and still obtain a reasonable model. Although the model has stationary correlation, the variance

    Var X(t) = v(t)²

is not stationary and can be specified as we please. Hypotheses concerning the pattern of change in age-specific variances (genetic and otherwise) for a given character can be examined using this model.

The parameters of the model are estimated straightforwardly using ML or REML. The reason, as mentioned in the Introduction, is that the character process is only observed at a finite set of times; hence the observations form a multivariate normal random vector with mean and covariance that are specified by the models for the mean function and G and E covariance functions. In principle the estimation procedure is no different from classical quantitative genetics of multivariate traits. Only the model specification is new. In practice, however, the ideas of the character process model use reasonable assumptions to reduce the dimension of the parameter space and make an age-dependent quantitative analysis of the trait possible.

EXAMPLES

Simulation study: We investigated the behavior of the character process and orthogonal polynomial (OP) models through extensive simulations. Three representative examples are provided in this section. For each example, a single data set was generated assuming a standard half-sib design (Lynch and Walsh 1998) in which 20 sires were each mated to three dams and three progeny were measured from each dam. We assumed the character of interest was measured at 10 regularly spaced ages denoted 1, . . . , 10. It is important to note that such a balanced design is not required for applying these methods. Unequal family structure, as well as irregularly spaced measurements, are perfectly acceptable, although different designs will contain different amounts of genetic information (Shaw 1987). Details of the simulation procedure are available from the first author.

Because they are unobserved, we have no way of knowing what a typical genetic covariance function might look like. Therefore, these examples are rather arbitrary and serve mainly to illustrate the relationship between the character process and OP models.
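The recipe of Equation 11 (a stationary correlation from Table 1 scaled by an arbitrary variance function) is straightforward to realize in code. In this sketch the quadratic variance function and all parameter values are assumed purely for illustration, not fitted values:

```python
import math

THETA_C = 0.05   # illustrative decay parameter for the normal correlation

def rho(tau):
    """Table 1 "standard normal" correlation with theta_0 = 1."""
    return math.exp(-THETA_C * tau ** 2)

def v(t):
    """Square root of an assumed quadratic variance function v(t)^2."""
    return math.sqrt(0.4 + 0.2 * t - 0.01 * t ** 2)

def cov(s, t):
    # Equation 11: r_X(s, t) = v(s) v(t) rho(s - t)
    return v(s) * v(t) * rho(s - t)

ages = [float(t) for t in range(1, 11)]
C = [[cov(s, t) for t in ages] for s in ages]

# The correlation is stationary, but Var X(t) = v(t)^2 changes with age:
print([round(C[i][i], 2) for i in range(10)])
```

The diagonal traces out the chosen quadratic variance, while the correlation between any two ages one unit apart is the same everywhere, which is exactly the separation of variance and correlation that the character process model exploits.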
We present three relatively simple cases: case I, genetic variance is constant across all ages, and genetic covariance declines very quickly between adjacent ages; case II, genetic variance is constant across all ages, and genetic correlation declines very slowly; case III, the genetic covariance function is composed of four OPs (giving a covariance function of degree three).

Figure 1.—(A) Actual genetic covariance surface for simulated data from case I: constant genetic variance and rapidly declining covariance. The form of the covariance function is G(t1, t2) = 0.5 exp[−0.7(t1 − t2)²]. (B) Lack of fit of an estimated genetic covariance surface for a model consisting of five orthogonal polynomials. Lack of fit is defined as the absolute difference between the estimated surface and the actual surface. Darker regions indicate greater lack of fit.

Figure 2.—(A) Actual genetic covariance surface for simulated data from case II: constant genetic variance and slowly declining covariance. The form of the covariance function is G(t1, t2) = 0.5 exp[−0.01(t1 − t2)²]. (B) Lack of fit of an estimated genetic covariance surface for a model consisting of three orthogonal polynomials. Lack of fit is defined as the absolute difference between the estimated surface and the actual surface. Darker regions indicate greater lack of fit.

Figures 1–3 present the actual covariance functions for each of the three cases along with contour plots describing the fit of different models to the simulated data. The contour plots display the absolute difference between the fitted surface and the actual surface, with darker regions indicating regions of poor fit and lighter regions indicating regions of better fit. Contour shading is constant over all figures, allowing comparisons between them.

When character values at different ages are genetically uncorrelated (or nearly so), OP models provide a poor estimate of the covariance function (Figure 1). The five-polynomial model was determined to provide an adequate fit to the data via likelihood-ratio tests (a six-polynomial model did not fit significantly better), and although the fit is quite poor, genetic variances are estimated more accurately than covariances (Figure 1b). In our experience this is to be expected when covariances decline asymptotically toward zero within the range of the data. The wiggly nature of the polynomial model has difficulty reproducing such a structure. The OP model does a much better job of describing the covariance structure when genetic correlations are high between all ages in the data (Figure 2). In this case, the three-OP model was determined as the best fit, and it does a reasonable job of estimating the covariance structure. The fits of the character process models are not presented for these two examples. They are expected to fit well (and do) because they were used to generate the data.

Figure 3 presents a genetic covariance function generated directly from a four-OP model. In this case, it was the character process model (a linear variance model with normal correlation) that had trouble capturing the structure of the genetic covariances. Nevertheless, the fit of the character process model is not terrible, and essentially smooths over the undulations in the actual function. Surprisingly, the OP model has some difficulty reproducing the covariance structure. This is likely due to the number of parameters in the model (10) and the size of the simulated experiment. Even when the form of the underlying covariance function is known precisely, most experiments will not provide enough information to accurately estimate even a moderate number of parameters.

Figure 3.—(A) Actual genetic covariance surface for simulated data from case III: genetic covariance function based on four orthogonal polynomials. (B) Lack of fit of an estimated genetic covariance surface using a character process model with a linear variance and normal correlation function. (C) Lack of fit of an estimated genetic covariance surface for a model consisting of four orthogonal polynomials (the same form used to generate the data). For both B and C, lack of fit is defined as the absolute difference between the estimated surface and the actual surface. Darker regions indicate greater lack of fit.

In summary, OP models do not accurately describe the structure of the genetic covariance function when the genetic correlation is expected to decline significantly with age. We argued (see above) that it is these types of covariance functions that one might expect from natural stochastic processes. For relatively simple covariance structures, however, the OP models accurately estimate the surfaces (Figure 2). Flexibility from the range of allowable character process models allows a reasonable approximation to the actual covariance structure even when it is very irregular (Figure 3). Moreover, Figures 1–3 suggest that a significant strength of the character process model is its separation of variance functions from correlation functions. In all the examples, the majority of lack of fit is in the covariance (not variance) structure, suggesting the overall fit of the model is determined primarily by estimates of age-specific variances.

Age-specific mortality rates in Drosophila: In this example, our goal is to estimate the genetic covariance structure for age-specific mortality rates in lines of Drosophila melanogaster allowed to accumulate spontaneous mutations for 19 generations (Pletcher et al. 1998). The data are mortality rate estimates (5-day intervals) for 29 mutation-accumulation lines. For each accumulation line there are four mortality observations at each age, and mortality rates are presented for six different ages. A logarithmic transformation was used to normalize the data (Promislow et al. 1996; Pletcher et al. 1998). In this example, log-mortality rates are examined through age 30 days posteclosion.
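The contrast driving cases I and II can be reproduced directly from the G functions given in the captions of Figures 1 and 2; only the 0.5 scale and the decay rates 0.7 and 0.01 are taken from those captions, and the rest is a sketch:

```python
import math

def G_case(theta_c, ages):
    """G(t1, t2) = 0.5 exp(-theta_c (t1 - t2)^2), as in Figures 1 and 2."""
    return [[0.5 * math.exp(-theta_c * (t1 - t2) ** 2) for t2 in ages]
            for t1 in ages]

ages = list(range(1, 11))    # the 10 regularly spaced simulated ages
G1 = G_case(0.7, ages)       # case I: covariance declines rapidly
G2 = G_case(0.01, ages)      # case II: covariance declines slowly

# Genetic correlations between adjacent ages and between the extreme ages:
r1_adj, r1_far = G1[0][1] / 0.5, G1[0][9] / 0.5
r2_adj, r2_far = G2[0][1] / 0.5, G2[0][9] / 0.5
print(round(r1_adj, 3), round(r1_far, 3))   # 0.497 0.0
print(round(r2_adj, 3), round(r2_far, 3))   # 0.99 0.445
```

Case I's near-zero long-range correlations are exactly the asymptote-to-zero behavior that high-degree polynomials reproduce poorly, while case II stays highly correlated across the whole age range, which a low-order OP model can match.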
Data from the oldest ages were excluded because estimates of genetic variances and covariances among these ages were extremely imprecise when estimated using standard methods, and often this hindered our ability to compare estimation methods. Estimates of the mutational covariance structure based on the complete data set are presented in a companion article (Pletcher et al. 1999).

The data set was analyzed using three approaches. First, the genetic covariance structure was estimated completely nonparametrically (i.e., using standard multivariate techniques) by specifying a separate parameter for each age-specific variance and each covariance. Our sample size was far too small to estimate all 21 parameters in the 6 × 6 covariance matrix simultaneously, and we were forced to construct the matrix piecewise—by examining ages two at a time. Pairwise covariances were obtained using ML implemented in the program QUERCUS (Shaw 1987; Shaw and Shaw 1992). Second, a genetic covariance function composed of four Legendre polynomials (giving a polynomial of degree three) was estimated using ML procedures similar to those of Meyer and Hill (1997). Third, we used the character process approach to estimate a genetic covariance function based on a quadratic variance function and normal correlation function (see Table 1).

TABLE 2

Comparison of age-specific genetic variance matrices estimated by various methods

                          Age interval (days)
Method    0–4    5–9    10–14  15–19  20–24  25–29
SM
0–4       0.55   0.50   0.48   0.39   0.26   0.11
5–9              0.50   0.51   0.40   0.30   0.17
10–14                   0.54   0.47   0.40   0.21
15–19                          0.53   0.49   0.26
20–24                                 0.51   0.29
25–29                                        0.16
OP
0–4       0.62   0.60   0.52   0.39   0.23   0.04
5–9              0.58   0.51   0.38   0.23   0.05
10–14                   0.50   0.46   0.35   0.15
15–19                          0.52   0.48   0.26
20–24                                 0.51   0.30
25–29                                        0.20
CP
0–4       0.57   0.59   0.55   0.45   0.31   0.17
5–9              0.66   0.65   0.56   0.42   0.24
10–14                   0.67   0.62   0.49   0.29
15–19                          0.60   0.50   0.32
20–24                                 0.45   0.31
25–29                                        0.22

Genetic covariance (generated by spontaneous mutations) for age-specific mortality rates in female Drosophila melanogaster estimated by standard multivariate methods (SM), orthogonal polynomials (OP), and the character process model (CP). The SM matrix was estimated "piecewise," by estimating variances and covariances between pairs of ages. The OP matrix was based on a model of four orthogonal polynomials, and the CP matrix was based on a quadratic variance and normal correlation function.

The estimated genetic covariance matrices for the various methods are presented in Table 2. Although all procedures appear to capture the dominant aspects of the covariance structure, several issues make the character process approach desirable. First, using standard multivariate methods, covariances and their asymptotic standard errors were estimated pairwise and are too small when considering the matrix as a whole. Despite the small standard errors there is insufficient statistical power to detect a significant change in covariance as ages become further separated in time (analysis not shown). Second, because data from each age are considered separately, systematic relationships among the characters are ignored. Third, the sample size prohibits estimating the entire 6 × 6 covariance matrix simultaneously, and as a result the "piecewise" matrix (Table 2) is not even positive definite.

The genetic matrix produced by the four-polynomial model is quite similar to that produced by the standard methods. However, a primary concern remains the number of parameters in the model; we are estimating 10 parameters for the genetic matrix alone. As with the standard methods, the number of parameters demands a large sample size for accurate estimation, but unlike these methods, none of the parameters have a clear interpretation. Although we may have asymptotic variance estimates for the coefficients of the OP (as is the case when ML is used), it is difficult to establish simple tests of interesting hypotheses. For example, the rate of decline in covariance as ages become further separated in time is described by a complicated combination of the coefficients of the polynomial.

Many of the problems inherent in the standard and OP methods are alleviated under the character process model. The estimated genetic covariance functions are guaranteed to be positive definite, and data from all ages are analyzed simultaneously. Standard errors for the parameters of the model are obtained from the maximization procedure and error estimates on the individual age measures can be easily calculated. Most covariance functions have relatively few parameters, which are estimated with high precision. Finally, and perhaps most importantly, the parameters of the model have useful interpretations, which allow simple hypotheses to be easily tested.

To further investigate the behavior of the character process models, we fit several different covariance functions to the data. In all models, we estimated a nonparametric mean function—average mortality rates at each age were estimated simultaneously—to account for the increase in mortality rates with age.
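The parameter-count comparison among the three approaches can be made explicit: for m = 6 ages an unstructured symmetric matrix needs m(m + 1)/2 parameters, a k-polynomial OP model needs the symmetric k × k coefficient matrix kij, and the quadratic-variance, normal-correlation character process model needs only four. A quick sketch:

```python
def n_unstructured(m):
    # one parameter per variance and per covariance in an m x m symmetric matrix
    return m * (m + 1) // 2

def n_orthogonal_poly(k):
    # symmetric coefficient matrix k_ij for k Legendre polynomials
    return k * (k + 1) // 2

def n_character_process(n_variance_params, n_correlation_params):
    return n_variance_params + n_correlation_params

print(n_unstructured(6))           # 21, as for the piecewise SM matrix
print(n_orthogonal_poly(4))        # 10, as for the four-polynomial OP model
print(n_character_process(3, 1))   # 4: quadratic variance + normal correlation
```

Unlike the first two counts, the character process count does not grow with the number of ages at which the trait is measured.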
833Genetics of FunctionValued Traits
TABLE 3

Character process model estimates for genetic covariance functions

Correlation   Variance
function      function     u0            u1            u2            uc            Likelihood
Normal        Constant     0.38 (0.09)   —             —             0.05 (0.02)   −80.65
Normal        Linear       0.92 (0.26)   −0.13 (0.04)  —             0.04 (0.02)   −76.29
Normal        Quadratic    0.40 (0.25)   0.21 (0.16)   −0.04 (0.02)  0.03 (0.01)   −73.14
Cauchy        Constant     0.39 (0.10)   —             —             0.06 (0.03)   −80.93
Cauchy        Linear       0.94 (0.26)   −0.13 (0.04)  —             0.04 (0.02)   −76.51
Cauchy        Quadratic    0.47 (0.25)   0.10 (0.14)   −0.02 (0.02)  0.04 (0.02)   −74.71
Uniform       Constant     0.38 (0.09)   —             —             0.53 (0.09)   −80.24
Uniform       Linear       0.90 (0.26)   −0.12 (0.04)  —             0.47 (0.10)   −76.01
Uniform       Quadratic    0.46 (0.24)   0.10 (0.14)   −0.02 (0.02)  0.45 (0.10)   −74.03

Parameter estimates (standard errors) and the log likelihoods for nine character process models composed of all combinations of three variance and three correlation functions. Variance functions are as follows: constant (u0), linear (u0 + u1t), and quadratic (u0 + u1t + u2t²). Correlation functions are taken from Table 1 with u0 = 1. Estimates were obtained using maximum likelihood.
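As a concrete check on one of these fits, the genetic covariance function implied by a row of Table 3 can be reconstructed and verified to be a valid (positive definite) covariance, and the P values quoted in the text can be recomputed from the table's log likelihoods. The sketch below, in Python, uses the normal-correlation, quadratic-variance estimates (u0 = 0.40, u1 = 0.21, u2 = −0.04, uc = 0.03); the evaluation ages and the exact parameterization of the "normal" correlation, r(d) = exp(−uc·d²), are assumptions made for illustration, not details taken from the paper.

```python
import math

def quad_var(t, u0, u1, u2):
    # quadratic variance function v(t)^2 = u0 + u1*t + u2*t^2
    return u0 + u1 * t + u2 * t * t

def normal_corr(d, uc):
    # "normal" correlation family, assumed form r(d) = exp(-uc * d^2)
    return math.exp(-uc * d * d)

def cp_genetic_cov(ages, u0, u1, u2, uc):
    # character process covariance G(s, t) = v(s) v(t) r(s - t)
    v = [math.sqrt(quad_var(t, u0, u1, u2)) for t in ages]
    return [[v[i] * v[j] * normal_corr(ages[i] - ages[j], uc)
             for j in range(len(ages))] for i in range(len(ages))]

def is_positive_definite(m):
    # a symmetric matrix is positive definite iff Cholesky factorization succeeds
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if m[i][i] - s <= 0.0:
                    return False
                L[i][i] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return True

# normal-correlation / quadratic-variance estimates from Table 3;
# the five-age grid is a hypothetical stand-in for the measured ages
ages = [0, 1, 2, 3, 4]
G = cp_genetic_cov(ages, 0.40, 0.21, -0.04, 0.03)
print(is_positive_definite(G))  # True

def chi2_sf_1df(x):
    # survival function of a chi-square on 1 d.f.: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(x / 2.0))

# quadratic- vs. linear-variance deviances, from the table's log likelihoods
for ll_lin, ll_quad in [(-76.29, -73.14), (-76.51, -74.71), (-76.01, -74.03)]:
    dev = 2.0 * (ll_quad - ll_lin)
    print(round(dev, 2), round(chi2_sf_1df(dev), 2))
# 6.3 0.01 / 3.6 0.06 / 3.96 0.05, matching the P values quoted in the text
```

The recomputed deviances and P values agree with those reported below (0.01, 0.06, and 0.05 for the normal, Cauchy, and uniform correlation functions, respectively).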
and characteristic function of a uniform). For all analyses the constant variance and Cauchy correlation functions were chosen for modeling the environmental covariance—more complicated covariance functions did not provide a significantly better fit (details not shown). Parameter estimates for the genetic covariance functions are given in Table 3. The dynamics of age-specific genetic variance can be determined using likelihood ratio tests. Given a specific correlation function, twice the difference in log likelihoods between a more general variance model (e.g., quadratic variance) and a more constrained model (e.g., linear variance) has a chi-square distribution with degrees of freedom equal to the number of additional parameters in the more general model. The P values for the test that a quadratic variance function fits better than a linear one are 0.01 for the normal correlation function, 0.06 for the Cauchy, and 0.05 for the characteristic function of the uniform (the deviances being 6.3, 3.6, and 3.96, respectively, all asymptotically chi-square on 1 d.f.). A cubic variance function did not provide a significantly better fit to the data.

Given a particular model for the variance function, there is little difference between the fits of the correlation functions. For example, the log-likelihood values for the normal, Cauchy, and uniform correlation functions with a quadratic variance function are −73.14, −74.71, and −74.03, respectively. Although a rigorous test of nonnested hypotheses such as these is rather complicated (see Cox 1961, 1962), it is clear that there is little statistical power to detect subtle differences in the shapes of the underlying genetic correlation function.

Hypothesis tests concerning age-specific genetic variance for mortality are easily conducted. ML estimates are asymptotically normally distributed, and therefore their estimated standard errors can be used to construct confidence intervals and test statistics (Searle et al. 1992). Further, the significantly improved fit of the quadratic variance function over the constant and linear functions provides strong evidence for interesting changes in mutational properties across ages, although the low variance at ages 25–29 days may be driving this result. Such statements could not be made from the results of standard methods or from the fit of OPs.

The hypothesis that most mutations affect mortality equally at all ages can be tested by asking if the correlation in mortality rates between various ages is different from unity. Because, for all character process models, uc (see Table 1) is the rate of decrease in correlation with time, testing whether this value is significantly different from zero directly addresses this hypothesis. The parameter is significantly greater than zero in all models (P < 0.05), providing strong evidence that the majority of measured mutations exhibit some form of age specificity.

Despite the twofold increase in the number of parameters, a covariance function based on four OP did not provide a significantly better fit than the best-fit function from the character process model. Using two popular
834 S. D. Pletcher and C. J. Geyer
criteria, Akaike information criterion (AIC) and Bayesian information criterion (BIC; Schwarz 1978; Stone 1979), any of the character process models with a quadratic variance function would be chosen over the best OP model (data not shown).

DISCUSSION

The quantitative genetic analysis of function-valued traits, such as growth and mortality curves, starts with the fundamental recognition by Kirkpatrick and Heckman (1989) that the genetic and environmental components of such traits should be modeled as Gaussian stochastic processes. It continues with the recognition by Meyer and Hill (1997) that ML or REML can be used to fit such models, just as it can be used for all other quantitative genetics models. Our contribution to the subject is a method of finding valid parametric models for covariance functions of these Gaussian processes from theory in spatial and time-series statistics, where it is widely used (Cressie 1993, section 2.5.1).

These parametric models for covariance functions have many virtues. They are assured to be positive definite, hence valid covariance functions. They can be chosen to be highly differentiable, implying the character process itself is smooth, which we expect from a biological process. They have a small number of parameters, and models can be chosen to address specific biological hypotheses. Moreover, the flexibility of the approach means reasonable fits are obtained even when the actual covariance function is highly irregular (Figure 3).

It is important to recognize that parametric models have certain limitations. Although we have argued that our covariance functions are reasonable models, verifying the assumptions of the models, particularly stationarity in correlation, is exceedingly difficult (Matheron 1988). Stationarity will, however, often be a good approximation; and as George Box asserted, all models are wrong, but some are useful (Box 1976). Kirkpatrick and colleagues often focus on characterizing the dominant eigenfunctions of the genetic covariance function, which are thought to summarize patterns of genetic variation (Kirkpatrick et al. 1990). Although we have not pursued it here, it is likely that for a particular covariance function, the eigenfunctions are somewhat limited in their range of behaviors. One may argue, however, that the process of choosing a good model in effect searches a large space of possible eigenfunctions.

Implementing a nonparametric approach using Legendre polynomials (Kirkpatrick and Heckman 1989) is problematic. Subsequent covariance functions are not necessarily positive definite. Simple simulations show that polynomials of low degree do not closely approximate reasonable covariance functions unless character values at all measured ages are highly correlated (Figures 1 and 2). Polynomials of high degree have many parameters, more than are necessary to fit data.

Many of the problems with OPs were recognized by the original authors, and it has been suggested that more advanced "smoothing" techniques, such as cubic splines or wavelets, might be more well behaved (Kirkpatrick et al. 1994). This is a promising avenue for future research. Good parametric and nonparametric approaches complement one another. The strengths of the parametric approach are its great efficiency and its ease of interpretation. Unfortunately, if the assumed model is grossly incorrect, inferences can be misleading (Simonoff 1996). Good nonparametrics are less reliant on assumptions about the formal structure of the data. They do, however, require large sample sizes, much larger than many of the most ambitious quantitative genetic studies. If there is insufficient information in the data to support the accurate estimation of many parameters, one is essentially left with a bad parametric model.

An equally promising direction for the future might be the extension of our techniques to examine the relationship between multiple character processes. Two character processes can be examined by estimating covariance functions for each character and a cross-covariance function between the two (Kirkpatrick 1988). The approach is analogous to estimating the genetic covariance between two different characters, except in this case the covariance is estimated for the value of the two characters at every combination of the two ages. In this way age-dependent genetic constraints on the independent evolution of the two traits can be explored.

Comments provided by J. Curtsinger, R. Shaw, G. Oehlert, R. Lande, M. Kirkpatrick, A. Clark, and an anonymous reviewer greatly improved the quality and clarity of the manuscript. M. Kirkpatrick generously provided creative discussion throughout the development of this work. This work was supported by National Institutes of Health grants AG0871 and AG11722 to J. Curtsinger and by the University of Minnesota Graduate School.

LITERATURE CITED

Barton, N. H., and M. Turelli, 1989 Evolutionary quantitative genetics: how little do we know? Annu. Rev. Genet. 23: 337–370.
Bjorklund, M., 1997 Variation in growth in the blue tit (Parus caeruleus). J. Evol. Biol. 10: 139–155.
Box, G. E. P., 1976 Science and statistics. J. Am. Stat. Assoc. 71: 791–802.
Box, G. E. P., G. Jenkins and G. C. Reinsel, 1994 Time Series Analysis: Forecasting and Control, Ed. 3. Prentice Hall, Englewood Cliffs, NJ.
Cox, D. R., 1961 Tests of separate families of hypotheses. Proc. 4th Berkeley Symp. 1: 105–123.
Cox, D. R., 1962 Further results on tests of separate families of hypotheses. J. R. Stat. Soc. B 24: 406–424.
Cressie, N. A., 1993 Statistics for Spatial Data. John Wiley and Sons, New York.
Engstrom, G., L. E. Lilijedahl, M. Rasmuson and T. Bjorklund, 1989 Expression of genetic and environmental variation during ageing: 1. Estimation of variance components for number of adult offspring in Drosophila melanogaster. Theor. Appl. Genet. 77: 119–122.
Falconer, D. S., 1989 Introduction to Quantitative Genetics, Ed. 3. Longman, New York.
Feller, W., 1968 An Introduction to Probability Theory and its Applications, Vol. 1, Ed. 3. John Wiley and Sons, New York.
Gebhardt-Henrich, S. G., and H. L. Marks, 1993 Heritabilities of growth curve parameters and age-specific expression of genetic variation under two different feeding regimes in Japanese quail (Coturnix coturnix japonica). Genet. Res. 62: 42–55.
Gomulkiewicz, R., and M. Kirkpatrick, 1992 Quantitative genetics and the evolution of reaction norms. Evolution 46: 390–411.
Hoel, P. G., S. C. Port and C. Stone, 1972 Introduction to Stochastic Processes. Houghton Mifflin, Boston.
Houle, D., 1992 Comparing evolvability and variability of quantitative traits. Genetics 130: 195–204.
Houle, D., K. A. Hughes, D. K. Hoffmaster, J. Ihara, S. Assimacopoulos et al., 1994 The effects of spontaneous mutation on quantitative traits. I. Variances and covariances of life history traits. Genetics 138: 773–785.
Hughes, K. A., and B. Charlesworth, 1994 A genetic analysis of senescence in Drosophila. Nature 367: 64–66.
Kirkpatrick, M., 1988 The evolution of size in size-structured populations, pp. 13–28 in The Dynamics of Size-Structured Populations, edited by B. Ebenman and L. Persson. Springer-Verlag, Heidelberg, Germany.
Kirkpatrick, M., 1997 Genetic improvement of livestock growth using infinite-dimensional analysis. Anim. Biotech. 8: 55–56.
Kirkpatrick, M., and N. Heckman, 1989 A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters. J. Math. Biol. 27: 429–450.
Kirkpatrick, M., and D. Lofsvold, 1992 Measuring selection and constraint in the evolution of growth. Evolution 46: 954–971.
Kirkpatrick, M., D. Lofsvold and M. Bulmer, 1990 Analysis of the inheritance, selection and evolution of growth trajectories. Genetics 124: 979–993.
Kirkpatrick, M., W. G. Hill and R. Thompson, 1994 Estimating the covariance structure of traits during growth and ageing, illustrated with lactation in dairy cattle. Genet. Res. 64: 57–69.
Lande, R., 1979 Quantitative genetic analysis of multivariable evolution, applied to brain:body size allometry. Evolution 33: 402–416.
Lande, R., 1982 A quantitative genetic theory of life history evolution. Ecology 63: 607–615.
Lynch, M., and B. Walsh, 1998 Genetics and Analysis of Quantitative Traits. Sinauer Associates, Sunderland, MA.
Matheron, G., 1988 Estimating and Choosing: An Essay on Probability in Practice. Springer-Verlag, New York.
Meyer, K., and W. G. Hill, 1997 Estimation of genetic and phenotypic covariance functions for longitudinal or 'repeated' records by restricted maximum likelihood. Livest. Prod. Sci. 47: 185–200.
Pletcher, S. D., D. Houle and J. W. Curtsinger, 1998 Age-specific properties of spontaneous mutations affecting mortality in Drosophila melanogaster. Genetics 148: 287–303.
Pletcher, S. D., D. Houle and J. W. Curtsinger, 1999 The evolution of age-specific mortality rates in Drosophila melanogaster: genetic divergence among unselected lines. Genetics 153: 813–823.
Promislow, D. E. L., M. Tatar, A. A. Khazaeli and J. W. Curtsinger, 1996 Age-specific patterns of genetic variation in Drosophila melanogaster. I. Mortality. Genetics 143: 839–848.
Schwarz, G., 1978 Estimating the dimension of a model. Ann. Stat. 6: 461–464.
Searle, S. R., G. Casella and C. E. McCulloch, 1992 Variance Components. Wiley and Sons, New York.
Shaw, F. H., and C. J. Geyer, 1997 Estimation and testing in constrained covariance component models. Biometrika 84: 95–102.
Shaw, R. G., 1987 Maximum-likelihood approaches applied to quantitative genetics of natural populations. Evolution 41: 812–826.
Shaw, R. G., 1991 The comparison of quantitative genetic parameters between populations. Evolution 45: 143–151.
Shaw, R. G., and F. H. Shaw, 1992 QUERCUS: programs for quantitative genetic analysis using maximum likelihood.
Shaw, R. G., G. A. J. Platenkamp, F. H. Shaw and R. H. Podolsky, 1995 Quantitative genetics of response to competitors in Nemophila menziesii: a field experiment. Genetics 139: 397–406.
Simonoff, J. S., 1996 Smoothing Methods in Statistics. Springer-Verlag, New York.
Stone, M., 1979 Comments on model selection criteria of Akaike and Schwarz. J. R. Stat. Soc. Ser. B 41: 276–278.
Tatar, M., D. E. L. Promislow, A. A. Khazaeli and J. W. Curtsinger, 1996 Age-specific patterns of genetic variation in Drosophila melanogaster. II. Fecundity and its genetic correlation with mortality. Genetics 143: 849–858.
Communicating editor: A. G. Clark
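The AIC/BIC comparison invoked in the discussion (any quadratic-variance character process model being chosen over the best OP model) can be sketched numerically. In the sketch below, the character process log likelihood is the value from Table 3 (−73.14); the parameter counts, the orthogonal-polynomial log likelihood, and the sample size are hypothetical placeholders, since the paper reports only the outcome of the comparison, not these inputs. BIC here is written in the common −2 × log-likelihood form, so smaller values are better; the ranking is equivalent to maximizing the criterion as the text defines it.

```python
import math

def aic(loglik, k):
    # Akaike information criterion; smaller values indicate a preferred model
    return 2.0 * k - 2.0 * loglik

def bic(loglik, k, n):
    # Bayesian information criterion (Schwarz 1978); smaller is better here
    return k * math.log(n) - 2.0 * loglik

# Character process model: log likelihood from Table 3 (normal/quadratic fit).
# The parameter counts (k), the OP log likelihood, and n are hypothetical
# illustrations, not values reported in the paper.
cp_ll, cp_k = -73.14, 6
op_ll, op_k = -70.00, 12   # hypothetical four-OP fit with twice the parameters
n = 100                    # hypothetical number of observations

print(aic(cp_ll, cp_k) < aic(op_ll, op_k))       # True: extra parameters outweigh the gain
print(bic(cp_ll, cp_k, n) < bic(op_ll, op_k, n))  # True: BIC penalizes even harder
```

With these placeholder numbers, doubling the parameter count would require a log-likelihood improvement of more than 6 units to win on AIC, which illustrates why the OP model loses despite its flexibility.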
Copyright 2000 by the Genetics Society of America
Statistical Models for Estimating the Genetic Basis of Repeated Measures and Other Function-Valued Traits
Florence Jaffrezic* and Scott D. Pletcher†
*Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh EH9 3JT, Scotland and †Max Planck Institute of Demographic Research, D-18057 Rostock, Germany
Manuscript received February 18, 2000
Accepted for publication June 26, 2000
ABSTRACT

The genetic analysis of characters that are best considered as functions of some independent and continuous variable, such as age, can be a complicated matter, and a simple and efficient procedure is desirable. Three methods are common in the literature: random regression, orthogonal polynomial approximation, and character process models. The goals of this article are (i) to clarify the relationships between these methods; (ii) to develop a general extension of the character process model that relaxes correlation stationarity, its most stringent assumption; and (iii) to compare and contrast the techniques and evaluate their performance across a range of actual and simulated data. We find that the character process model, as described in 1999 by Pletcher and Geyer, is the most successful method of analysis for the range of data examined in this study. It provides a reasonable description of a wide range of different covariance structures, and it results in the best models for actual data. Our analysis suggests genetic variance for Drosophila mortality declines with age, while genetic variance is constant at all ages for reproductive output. For growth in beef cattle, however, genetic variance increases linearly from birth, and genetic correlations are high across all observed ages.
A simple and efficient procedure for the genetic analysis of characters that change as a function of age (or some other independent and continuous variable) is desirable for researchers in several fields of biology and genetics. Plant and animal breeders are often faced with the genetic analysis of "repeated measures" data, such as lactation in dairy cows or growth rates in important agricultural species. Biologists interested in the evolution of life histories study the genetic basis of age-specific fitness components, such as survival or reproductive output, while evolutionary ecologists often examine the genetic relationship between values of a single character expressed over a continuous range of environmental variables.

Recent conceptual and computational advancements have made the genetic analysis of such function-valued traits readily accessible. Three methods have been advanced in the literature. First, random regression models have been widely used for the analysis of longitudinal data in the traditional statistical literature (Diggle et al. 1994) and recently have been applied in the animal breeding context (Jamrozik et al. 1997b). Second, the use of orthogonal polynomials to approximate covariance matrices was initially suggested by Kirkpatrick and Heckman (1989) and is closely related to the random regression models (Meyer and Hill 1997; Meyer 1998). Third, the character process model was recently proposed by Pletcher and Geyer (1999) and is based on theories of stochastic processes. We develop and consider a general extension of the process model to take advantage of new methods for estimating complicated correlation structures. Each of these methods has been implemented in relatively easy to use computer software packages, and they are freely available.

The aim of this article is to compare and contrast random regression, orthogonal polynomials, and character process models and evaluate their performance. We focus first on examining the underlying assumptions of the three methods, while emphasizing fundamental similarities and differences when appropriate. Next, we explore a variety of simulated data sets and describe the types of covariance structures (genetic, environmental, and otherwise) accommodated by each method. Last, using empirical data on age-specific mortality and reproductive output in the fruit fly Drosophila melanogaster and on growth in beef cattle, we evaluate the ability of each model to adequately fit empirical data.

THE GENETIC ANALYSIS OF FUNCTION-VALUED TRAITS

Detailed descriptions of the extension of classical quantitative genetics to the analysis of function-valued traits are given in Kirkpatrick and Heckman (1989) and Pletcher and Geyer (1999). In short, the method assumes the observed character is best described by a function (or stochastic process) of some independent

Corresponding author: Scott D. Pletcher, Department of Biology, Wolfson House, 4 Stephenson Way, University College, London NW1 2HE, England. Email: [email protected]
Genetics 156: 913–922 (October 2000)
914 F. Jaffrezic and S. D. Pletcher
and continuous variable. Although any continuous variable is acceptable (e.g., the level of some environmental factor), the most common is age, and all of the examples in this article focus on characters that change with age. Further, it is assumed that the character values at each age constitute a multivariate normal distribution on some scale.

As with traditional quantitative genetics, it is assumed that the observed phenotypic trajectory of the character is random and influenced by one or more unobservable factors. In the simplest case one might consider the additive contribution of many genes along with unpredictable environmental effects. More complicated models involving interactions among different genes or specific environmental effects (e.g., maternal effects) are straightforward, although computational difficulties will likely arise. For the additive model, we assume the observed phenotype can be decomposed as

    X(t) = m(t) + g(t) + e(t) + ε,   (1)

where m(t) is a nonrandom function, the genotypic mean function of X(t), and g(t) and e(t) are Gaussian random functions, which are independent of one another and have an expected value of zero at each age (Kirkpatrick and Heckman 1989; Pletcher and Geyer 1999). They represent the age-dependent genetic and environmental deviations, respectively. In this context, e(t) is often referred to as the permanent environmental effect and ε is the residual variation—ε is assumed normally distributed with constant and unknown variance over time. The original development of the character process (CP) model did not include a residual variance term (Pletcher and Geyer 1999). Recently, however, we have found that data sets that exhibit a great deal of measurement error support a residual variance.

The goal of the analysis is to decompose the observed variation in X(t) into its genetic and environmental contributions by estimating covariance functions for g(t) and e(t). A covariance function, r(s, t), is a bivariate continuous function that describes the covariance between any two ages, r(s, t) = Cov(X(s), X(t)). By the independence of g(t) and e(t), the phenotypic covariance function of X(t) is given by P(s, t) as

    P(s, t) = G(s, t) + E(s, t),   (2)

where G(s, t) is the genetic covariance function, and E(s, t) the environmental covariance function, which also includes the residual variance. These functions are estimable via maximum likelihood (ML) or restricted maximum likelihood (REML) when there are data on individuals of various relatedness (Lynch and Walsh 1998; Pletcher and Geyer 1999).

There have been at least three different methods suggested for estimating the desired covariance functions: orthogonal polynomials (Kirkpatrick and Heckman 1989), random regression (Meyer 1998), and the character process model (Pletcher and Geyer 1999). All three methods are based on likelihood estimation—although the orthogonal polynomial approach was originally published as a least squares estimation (Kirkpatrick et al. 1990).

Random regression: Random regression (RR) models employ parametric forms for the unobserved functions in (1). Although traditionally a parametric mean curve is often used to estimate m(t), this is not essential. However, the individual deviations from this curve [i.e., the g(t) and e(t)] are assumed to be parametric functions of time, and polynomials are often used. For example, the age-dependent deviations from the population mean due to an individual's genotype might be linear in time, such that

    g(t) = a1 + a2t,

where the ai are random genetic regression coefficients. The regression coefficients are unobservable random effects; they have a specific value for each individual; and they are assumed to be multivariate normally distributed. The environmental deviations, e(t), are assumed independent of the genetic effects, and they are modeled similarly.

Genetic and environmental covariances as a function of age are determined by the variances and covariances among the regression coefficients. Following the example presented above, the genetic covariance between ages s and t is

    G(s, t) = Cov(g(s), g(t))
            = Cov(a1 + a2s, a1 + a2t)
            = Var(a1) + (s + t)Cov(a1, a2) + st Var(a2).   (3)

The primary objective in these models is to choose the most appropriate parametric functions for the genetic and the permanent environmental deviations. In many cases the parametric functions are nested and likelihood-ratio testing can be used. Since this involves testing the significance of parameters on the boundary of their feasible parameter space, the test statistics are often mixtures of chi-square distributions (Stram and Lee 1994).

Character process model: In contrast to the RR models, the character process model does not attempt to model the forms of the g(t) or e(t) functions. Instead, parametric models for the covariance functions themselves [i.e., G(s, t) and E(s, t) in Equation 2] are the target of analysis (Pletcher and Geyer 1999).

Again taking the genetic covariance function as an example, the covariance function can be decomposed into

    G(s, t) = vG(s)vG(t)rG(s − t),   (4)

where vG(t)² describes how the genetic variance changes
with age and rG(s − t) describes the genetic correlation between two ages. There are no restrictions on the form of vG(·), and it is often modeled using simple polynomials (linear, quadratic, etc.). As presented in Pletcher and Geyer (1999), the character process model assumes correlation stationarity; i.e., the correlation between two ages is assumed to be a function only of the time distance (s − t) between them. Although strictly speaking this assumption is almost surely wrong, experience suggests that it is expected to provide a reasonable approximation in most cases (Pletcher and Geyer 1999). The benefit of correlation stationarity is that it allows numerous choices for r(·), all of which satisfy several theoretical requirements (Pletcher and Geyer 1999).

We suggest an extension of the character process model for nonstationary correlations using a method proposed by Nunez-Anton (1998) and Nunez-Anton and Zimmerman (2000) in what they term structured antedependence models. The idea is to implement a nonlinear transformation upon the time axis, f(t), such that correlation stationarity holds on the transformed scale—on the original scale the correlation is nonstationary. The correlation function is then defined as r(s, t) = r(f(s) − f(t)), and the functions suggested by Pletcher and Geyer (1999) remain valid. Ideally the transformation function should contain a small number of parameters with interpretable effects. Nunez-Anton and Zimmerman (2000) suggest a Box-Cox power transformation such that

    f_λ(t) = (t^λ − 1)/λ   if λ ≠ 0
           = log t          if λ = 0,   (5)

where λ is a parameter to be estimated. Considering an absolute exponential correlation function, r(s, t) = u^|f(s) − f(t)|, the correlations on the subdiagonals are monotone increasing if λ < 1 or monotone decreasing if λ > 1. If λ = 1 the nonstationary model reduces to a stationary one. Thus, a likelihood-ratio test of the null hypothesis H0: λ = 1.0 can be used to quantitatively examine the extent of nonstationarity in the data. Additional flexibility in the nonstationary pattern might be achieved by considering more than one parameter λ. For example, one might incorporate distinct λi for different values of s − t, which is equivalent to a separate λi for each subdiagonal of the covariance structure.

Orthogonal polynomials: Kirkpatrick and Heckman (1989) originally presented the use of orthogonal polynomials (OPs) as a nonparametric way of "smoothing" previously estimated covariance matrices. This was the first attempt to formalize the estimation of covariance functions in a genetic context. As with the CP model, the shapes of the individual age-dependent deviations were not considered, and models for the structure of the variance-covariance matrix itself were the focus of attention. Kirkpatrick and Heckman (1989) suggest that the genetic covariance function be represented as

    G(s, t) = Σ_{i=0}^{m} Σ_{j=0}^{m} φ_i(s) φ_j(t) k_{ij},   (6)

where m determines the number of polynomial terms used in the model, the k_{ij} are the m(m + 1)/2 unknown parameters to be estimated (the coefficients of the linear combination), and φ_i is the ith Legendre polynomial (Kirkpatrick et al. 1990). The environmental covariance function is modeled similarly. Meyer and Hill (1997) present a method for estimating covariance functions such as (6) directly from the data using REML.

As originally presented, the orthogonal polynomial approach is similar in spirit to the CP model, and both differ in principle from the RR approach. In the RR methods, the primary model development occurs at the level of individual deviations (Equation 1). The analyst begins by considering the behavior of individual age-specific deviations. The resulting covariance structure is a consequence of these deviations. For the CP and OP models, the situation is reversed. The analyst begins by considering the structure of the covariance matrix (Equation 2), and the shapes of the individual deviations are a consequence of this structure. In some cases it may be possible to expose a duality between the two, as Meyer (1998) has done for certain RR and OP models. When the data are collected at equally spaced intervals, CP models with a constant variance and an absolute exponential correlation function (r(s, t) = uc^|s − t|) are equivalent to an autoregressive model of order 1. At present, however, analytical difficulties preclude more general results for the character process models.

EXAMPLES AND ANALYSES

Estimation procedures: All models were estimated using REML. In all cases a nonparametric mean function was used (i.e., a separate mean was fitted for each distinct age in the data), which ensures a consistent estimate of the covariance structure (Diggle et al. 1994). Comparison among models was based on the Bayesian information criterion (BIC; Schwarz 1978), which provides for likelihood-based comparison among nonnested models. BIC is

    log likelihood − (1/2) × (number of parameters in the model) × log n*,

where n* = n − p when using REML, with n the number of observations in the data set and p the number of fixed effects. The model selected is the one that maximizes the criterion.

To determine the best-fitting model under each technique, a large number of models were fit to each data set. For the character process method, >100 different models (i.e., different combinations of polynomial variance functions and stationary and nonstationary correlation functions) were investigated, and the best model was chosen according to the BIC criterion. We chose to examine a large number of CP models for reasons of thoroughness. The CP models are relatively new,
and the behavior of these models is not well known. In practice, such an exhaustive search is not required, as standard model selection procedures (e.g., sequential addition of polynomial terms to the variance function) result in identical conclusions (results not presented). For both random regression and orthogonal polynomial methods, the appropriate polynomials of increasing degree were fit until an increase in degree no longer resulted in a significant increase in the log-likelihood at the α = 0.05 level (Meyer and Hill 1997). We find that a reasonable approach to model selection requires on the order of 5–10 model fits for each method.

Estimates of the covariance structure based on random regression and orthogonal polynomial methods were obtained using the software package ASREML (Gilmour et al. 1997), while estimates of the character process model (and certain orthogonal polynomial models) were obtained using computer software developed by one of the authors (S. Pletcher; C code and executable files freely available). A series of exploratory analyses were conducted to ensure the two software packages produced comparable log-likelihoods. A small number of covariance structures could be fitted by both packages (models of constant variance and correlation across ages, and small orthogonal polynomial models) and these structures were fitted to several data sets. In all cases, identical log-likelihoods were reported by each package.

Simulated data: Many data sets were simulated according to various covariance structures from CP, RR, and OP models. All were built assuming a standard sire design (i.e., groups of half-sibs) in which 12 offspring from each of 70 sires were measured at five different ages (Lynch and Walsh 1998). Under such a design, the estimated between-sire covariance function is directly proportional to the genetic covariance function. The environmental covariance function and residual error are estimated based on the within-sire and the within-animal variation. We present the results of four representative data sets. Because the magnitudes of the variance and covariances were different among the simulations, we set the residual variance for all simulations to ≈10% of the total variance at age 0.

The first data set was simulated according to a stationary CP covariance structure, the purpose of which was to assess the behavior of RR and OP models when the genetic correlation decreases to zero within the range of the data. The genetic covariance function was composed of a quadratic variance [i.e., a quadratic v²(·) from Equation 4] and "normal" correlation (r(ti, tj) = exp(−0.8(ti − tj)²)) (Figure 1A). The environmental covariance function was composed of a linear variance and "Cauchy" correlation function (r(ti, tj) = 1/(1 + 0.05(ti − tj)²)) (Pletcher and Geyer 1999). We refer to this data set as the stationary CP data.

To examine a well-behaved covariance function with a […] with genetic variance function identical to that in the stationary CP data, but with an arbitrary nonstationary correlation structure (Figure 1B). The environmental covariance was assumed identical to that in the stationary CP data. This data set is the nonstationary CP data.

The third data set was simulated according to a random regression model with linear deviations for both the genetic and environmental parts. The chosen parameter values resulted in genetic and environmental correlations that remained quite high over all ages in the data (Figure 1C).

The last data set that we present was simulated according to an OP model, with quadratic Legendre polynomials for the genetic and environmental parts (i.e., m = 2 in Equation 6). The shapes of the covariance functions were rather undulating, as is expected from functions based on orthogonal polynomials. Parameter values were chosen such that the environmental correlation remained quite high over time while the genetic correlation was highly nonstationary (Figure 1D).

To compare the fit of the models we calculated goodness-of-fit statistics for the estimated variance and correlation functions under each model with respect to the simulated structure. Goodness of fit was quantified by the concordance correlation coefficient, rc, described by Vonesh et al. (1996; see appendix). The possible values of rc are in the range −1 ≤ rc ≤ 1, with a perfect fit corresponding to a value of 1 and a lack of fit to values ≤ 0.

Empirical data: Drosophila reproduction and mortality: Age-specific measurements of reproduction and mortality rates were obtained from 56 different recombinant inbred (RI) lines of D. melanogaster, which are expected to exhibit genetically based variation in longevity and reproduction (J. W. Curtsinger and A. A. Khazaeli, unpublished results). Age-specific measures of mortality and average female reproductive output were collected simultaneously from two replicate cohorts for each of 56 RI lines. Deaths were observed every day, while egg counts were made every other day. For both mortality and reproduction the data were pooled into 11 5-day intervals for analysis. Mortality rates were log transformed and reproductive measures were square-root transformed to insure the age-specific measures were normally distributed.

Growth in beef cattle: These data come from the Wokalup selection experiment in Western Australia and correspond to January weights of 436 beef cows from 77 sires. Weights were recorded between 19 and 82 months of age, with up to six records per cow. Analyses were carried out within 83 contemporary groups (year-paddock-age of weighing subclasses), fitted as fixed effects. Additional information, along with access to the data, can be obtained from Dr. Karin Meyer's web page at the Animal Genetics unit of the University of New
somewhat nonstationary correlation, we simulated data England, Australia (http://agbu.une.edu.au/zmeyer).
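The simulated-data design described above (a stationary CP covariance sampled under a half-sib sire design) can be sketched in code. This is an illustrative reconstruction, not the authors' software; numpy is assumed, and the numeric variance-function coefficients are invented for the example:

```python
import numpy as np

def cp_cov(ages, var_fn, corr_fn):
    """Covariance matrix with entries v(ti) v(tj) rho(ti, tj)."""
    sd = np.sqrt([var_fn(t) for t in ages])
    rho = np.array([[corr_fn(ti, tj) for tj in ages] for ti in ages])
    return np.outer(sd, sd) * rho

ages = np.arange(5.0)  # five measurement ages

# Genetic part: quadratic variance with "normal" correlation exp(-0.8 (ti - tj)^2);
# environmental part: linear variance with "Cauchy" correlation 1/(1 + 0.05 (ti - tj)^2).
# The variance-function coefficients below are arbitrary illustrative choices.
G = cp_cov(ages, lambda t: 1.0 + 0.2 * t + 0.05 * t ** 2,
           lambda ti, tj: np.exp(-0.8 * (ti - tj) ** 2))
E = cp_cov(ages, lambda t: 2.0 + 0.3 * t,
           lambda ti, tj: 1.0 / (1.0 + 0.05 * (ti - tj) ** 2))

rng = np.random.default_rng(1)
res_sd = np.sqrt(0.1 * (G[0, 0] + E[0, 0]))  # residual set to ~10% of total variance at age 0
records = []
for _ in range(70):                                   # 70 sires
    u = rng.multivariate_normal(np.zeros(5), G / 4)   # between-sire effect (= V_G / 4)
    for _ in range(12):                               # 12 half-sib offspring per sire
        w = rng.multivariate_normal(np.zeros(5), 3 * G / 4 + E)  # within-sire deviation
        records.append(u + w + rng.normal(0.0, res_sd, 5))
phenotypes = np.array(records)                        # 840 records x 5 ages
```

Under this design the between-sire covariance of the records equals G/4, which is why, as the text notes, the estimated between-sire covariance function is directly proportional to the genetic covariance function.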
Genetic Analysis of Function-Valued Traits
Figure 1.—Contour plots of the simulated genetic covariance structures for (A) data generated according to a stationary character process (CP) model, (B) data simulated according to a CP model with arbitrary and nonstationary correlation (this is a discrete value matrix rather than a continuous function), (C) data generated under a random regression (RR) model with linear deviations, and (D) data simulated assuming an orthogonal polynomial (OP) model of degree 2.
RESULTS

Simulations: For the stationary CP data, the best random regression model according to the BIC criterion was characterized by quadratic and linear deviations for the genetic and environmental parts, respectively. Higher-order polynomials did not converge to a maximum and could not be considered. The best OP model contained a cubic polynomial for the genetic covariance and a quadratic for the environmental part. As expected, the simulated structure was accurately recovered by the stationary character process model. Concordance coefficients rc describing the goodness of fit for the variance and correlation functions are given in Table 1. For the RR and OP models, the environmental covariance structure (including both the variance and correlation) was very well fitted (rc ≈ 1). The genetic variance was also well modeled, but both models had trouble dealing with the rapidly decreasing genetic correlation function. Although the OP model could better estimate the genetic correlation (rc = 0.61 for OP compared to 0.36 for RR), it contains significantly more parameters than the regression model (17 vs. 10), and both models exhibit similar behavior. The polynomial structures are unable to handle correlation patterns that decrease asymptotically to zero within the range of the data, and the correlation obtained by both models goes negative (Figure 2).

The aim of the second simulated data set was to investigate the behavior of these models in the case of a rather simple nonstationary genetic correlation structure. The best RR and OP models were the same as for the stationary CP data detailed in the previous paragraph. The RR model dealt very poorly with the nonstationary pattern of the genetic correlation (rc = 0.10); the correlation was estimated to be very high over all ages. Again, the greater number of parameters in the best-fitting OP model over the regression model provided a better fit to the correlation structure (rc = 0.70). Surprisingly, the CP model failed to accurately estimate the nonstationary correlation pattern (Table 1). Our nonstationary extension did not significantly improve the goodness of fit (BIC = −4454 and −4456 for stationary and nonstationary models, respectively; P = 0.052 for a likelihood-ratio test of λ = 1.0). However, the goodness of fit of the fitted nonstationary correlation (rc = 0.55) is substantially better than that of the stationary model (rc = 0.03), which provides an interesting commentary on model selection criteria. In
F. Jaffrezic and S. D. Pletcher
TABLE 1

Goodness-of-fit values for covariance functions estimated from three different methods on simulated data

Simulated covariance structure   Model   VarG   CorrG   VarE   CorrE   BIC
Stationary CP                    CP      0.98   1.0     1.0    1.0     -4591
                                 RR      0.96   0.36    0.93   0.87    -7414
                                 OP      0.98   0.61    0.98   0.98    -6605
Nonstationary CP                 CP      0.91   0.03    0.99   1.0     -4454
                                 RR      0.95   0.10    0.94   0.81    -7397
                                 OP      0.84   0.70    0.98   0.97    -6628
Random regression                CP^a    1.0    0.93    0.96   0.93    -3817
                                 RR      1.0    0.94    0.99   1.0     -3803
                                 OP      1.0    0.94    0.99   1.0     -3803
Orthogonal polynomial            CP^a    0.86   0.10    0.69   0.94    -14334
                                 RR      0.30   0.15    0.94   0.90    -14371
                                 OP      0.99   0.83    0.99   1.0     -14272

Concordance values (see appendix) for covariance functions estimated by three different methods on four representative covariance structures. The methods are CP, the character process model; RR, the random regression model; and OP, the orthogonal polynomial model. VarG represents the fit to age-specific genetic variances; CorrG refers to the fit to genetic correlations between ages; VarE represents the fit to environmental variances; and CorrE shows the fit to environmental correlations between ages. See text for details of the simulated covariance structures and details of the best-fitting models for each approach.
^a The best-fitting correlation function was a nonstationary CP model.
retrospect, the nonstationarity in this data set was predominantly between extreme ages (ages 1 and 5). It is possible that more observations per individual are needed to detect small to moderate levels of nonstationarity (see fly reproduction data). The genetic variance function and environmental covariance structure were identical to that for the stationary CP data and were well fit by all the methods (Table 1).

All methods did a reasonable job of estimating the genetic and environmental covariance structures generated according to a random regression model with linear deviations. Under this model the correlations (both genetic and environmental) remained quite high over time. Our nonstationary extension of the CP model was successful in providing a good fit to the data. The genetic covariance structure was described by a quadratic variance and nonstationary correlation given by the characteristic function of the Uniform distribution (Pletcher and Geyer 1999), and the environmental variance function was linear with a Cauchy correlation. The goodness of fit for the genetic correlation structure was improved substantially over a stationary model (rc = 0.74, BIC = −3819 and rc = 0.93, BIC = −3817 for the stationary and nonstationary CP models, respectively).

Figure 2.—Genetic correlations between age 1 and other ages for the simulated stationary character process data and fitted genetic correlations obtained from the random regression model with linear deviations and orthogonal polynomial of degree 3.

Although we have essentially no idea what a typical age-dependent genetic covariance function might look like, the data set simulated with an OP structure might be considered pathological in that the genetic covariance structure is highly irregular. In fact, the genetic correlation is negative between early ages but highly positive between late ages (Figure 1D). Such a structure is, however, typical for OP models (Kirkpatrick et al. 1994). Convergence problems hindered our ability to obtain estimates of high-dimensional random regression models, and the best RR model was not able to accommodate either the simulated genetic variance or correlation (rc = 0.30 and rc = 0.15, respectively). Both the genetic and environmental covariance structures were described by a quadratic variance and nonstationary correlation given by the characteristic function of the Uniform distribution. When compared to random regression, the CP model is much better at estimating the genetic variance function but is slightly worse at approximating the correlation structure (Table
TABLE 2

Results of covariance function estimation on empirical data

Method   Genetic         Environmental    NPCov   Log-likelihood   BIC
Fly mortality (N = 955), 11 fixed effects
  CP     Quad-Cauchy     Lin-Cauchy       7       -186.0           -247.7
  OP     Cubic           Quadratic        17      -242.1           -338.0
  RR     Quadratic       Quadratic        13      -298.2           -380.4
Fly reproduction (N = 1109), 11 fixed effects
  CP     Const-Exp^a     Quad-Cauchy^a    8       494.1            427.5
  OP     Cubic           Quadratic        17      451.4            353.4
  RR     Quadratic       Linear           10      374.0            300.5
Beef cattle growth (N = 1626), 24 fixed effects
  CP     Lin-Exp         Lin-Exp          7       -6895.6          -7010.0
  RR     Constant        Linear           6       -6910.7          -7021.4
  OP     Linear          Linear           8       -6908.3          -7026.4

The best-fitting genetic and environmental covariance functions for three different methods using empirical data on fruit fly mortality and reproduction and growth in beef cattle. Also presented is the log-likelihood of the models at their maximum and the BIC model selection criterion. NPCov represents the number of estimated parameters in the covariance structure for each model. The number of fixed effects reflects the number of different ages at which observations were obtained, and N is the total number of observations. Quad, quadratic; Const, constant; Exp, exponential; Lin, linear.
^a The best-fitting correlation function was a nonstationary CP model.
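The BIC values in Table 2 can be reproduced from the reported log-likelihoods. A minimal sketch, assuming the convention BIC = logL − (p/2) ln N with p counting covariance parameters plus fixed effects; this convention is inferred from the tabulated numbers rather than stated explicitly in the text:

```python
import math

def bic(loglik, n_cov_params, n_fixed, n_obs):
    """BIC on the log-likelihood scale (larger is better), assuming
    BIC = logL - (p / 2) * ln(N) with p = NPCov + number of fixed effects."""
    p = n_cov_params + n_fixed
    return loglik - 0.5 * p * math.log(n_obs)

# Fly mortality, CP model: logL = -186.0, NPCov = 7, 11 fixed effects, N = 955
print(round(bic(-186.0, 7, 11, 955), 1))  # close to the tabulated -247.7
```

The small discrepancies (on the order of 0.1) against the table come from the log-likelihoods being reported to only one decimal place.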
1). The environmental covariance is better behaved and much less of a problem. As seen with the random regression simulations, the strong positive correlations across all ages are well fit by all the methods.

Empirical: Drosophila reproduction and mortality: For age-specific mortality and reproduction in Drosophila, the character process model provided a significantly better fit, according to the BIC criterion, than either the orthogonal polynomial or random regression methods (Table 2). In fact, the CP models achieved higher likelihoods despite containing significantly fewer parameters than the OP or RR models. For age-specific mortality, the best-fitting model for the genetic covariance was a quadratic variance with a Cauchy correlation function (ρG(ti, tj) = 1/(1 + θ(ti − tj)²)). For fly reproduction the best character process model was a constant variance at all ages coupled with a nonstationary correlation function described by the absolute exponential, ρG(ti, tj) = θ^|f(ti) − f(tj)| (see text following Equation 5). Parameter estimates and their standard errors for the CP model are presented in Table 3, and the fitted genetic covariance structures are presented in Figure 3, A and B.

The simplicity of the character process model allows quantitative statements about the predominant attributes of the genetic covariance function. Genetic variance for Drosophila mortality declines significantly with age, while genetic variance is constant at all ages for reproductive output. For mortality, the parameter in the genetic correlation function was significantly different from zero (P < 0.0001), suggesting that mortality rates become less genetically correlated as ages become further separated in time. This is true for reproductive output as well, and the significant nonstationary parameter in the genetic correlation provides evidence for an increase in the correlation between two equidistant ages with increasing age.

Beef cattle: Although differences in fit among the methods are less dramatic for beef cattle than for Drosophila, the character process model again provides a significantly better fit (as determined by the BIC criterion) than either random regression or orthogonal polynomial methods (Table 2). The best-fitting model for the genetic part was a linear variance (increasing with age) and an absolute exponential correlation (ρG(ti, tj) = θ^|ti − tj|). There was no evidence for nonstationarity in the data. Parameter estimates and their standard errors for the CP model are presented in Table 3, and the fitted genetic covariance structure is shown in Figure 3C.

DISCUSSION

The quantitative genetic analysis of repeated measures and other function-valued traits requires the estimation of continuous covariance functions for each source of variation in a particular statistical model. Traditionally, statistical geneticists interested in characters that change gradually along some continuous scale have had to settle for models that are either overparameterized (i.e., standard multivariate methods) or oversimplified (e.g., composite character analysis; Meyer 1998; Pletcher and Geyer 1999). In recent years, however, the introduction and development of random regression models, orthogonal polynomial models, and models based on stochastic
TABLE 3

Character process model estimates of genetic and environmental covariance functions for empirical data

Parameter   Genetic            Environmental      Residual
Fly mortality
  θ0        0.28 (0.12)        0.53 (0.05)        None
  θ1        0.35 (0.08)        -0.03 (0.007)
  θ2        -0.03 (0.007)      —
  θC        0.10 (0.02)        1.76 (0.29)
Fly reproduction
  θ0        0.18 (0.03)        0.10 (0.02)        None
  θ1        —                  -0.01 (0.01)
  θ2        —                  -0.002 (0.001)
  θC        0.26 (0.15)        4.0 (2.0)
  λ         -0.63 (0.30)       0.51 (0.13)
Beef cattle growth
  θ0        0.0001^a (186.3)   0.0001^a (257.8)   1000.8 (85.35)
  θ1        4.12 (6.95)        38.94 (7.77)
  θC        0.99 (0.02)        0.99 (0.003)

Parameter estimates (and standard errors) for the best-fitting character process models for empirical data on fruit fly mortality and reproduction and growth in beef cattle. θ0, θ1, and θ2 represent parameters of the variance function such that a quadratic variance is represented as v²(t) = θ0 + θ1 t + θ2 t². In cases where the best-fitting model was constant or linear, the appropriate θi are omitted. θC and λ are parameters of the correlation function. A residual term is not always added in the model.
^a Parameter estimate is at the lower boundary and asymptotic standard errors may not be reliable.
process theory (i.e., the character process model) have provided important alternatives. Other types of random regression models (e.g., nonlinear models as suggested by Lindstrom and Bates 1990 and Davidian and Giltinan 1995) may prove useful, but they are currently difficult to implement.

Through extensive investigation of a variety of simulated covariance structures and empirical data, we find that under most conditions the CP models provide the best description of the underlying covariance structure. It is clear from the simulation results that the CP model is the only method that adequately captures a correlation that declines rapidly to zero as character values become further separated in time. Both random regression models and orthogonal polynomials have noticeable problems approximating such a structure (Table 1, stationary CP data; Figure 2). Polynomials do not have asymptotes, and the rapid decline in correlation tends to force both methods to estimate correlations that are strongly negative within the range of the data. Although the characteristics of covariance functions for natural organisms remain generally unknown, this is a serious limitation as asymptotic behavior in covariances/correlations is to be expected (Pletcher and Geyer 1999). Other parameterizations of the RR models (e.g., using orthogonal polynomials in the regression) may prove more useful in this regard. On the other hand, RR and OP models work quite well when the correlation structure remains high over time (see Table 1, environmental correlation in CP simulated data).

A further advantage of the CP models appears to be the ability to model the variance and correlation separately. As mentioned previously, for random regression models the entire covariance structure is implicitly determined by the shapes of the regression polynomials, and covariance surfaces described by orthogonal polynomials have a fixed relationship between variance and correlation. This limitation is exemplified in the analysis of growth in beef cattle. For the genetic deviation, the best-fitting RR model included only a random intercept. This implies not only that the variance is considered constant over time but also that the correlation is constant and equal to 1 across all ages, which is probably not appropriate (Figure 3C). Applying the same argument to the fertility data in Drosophila, the best-fitting CP model for the genetic part was a constant variance with a rather rapid decline in correlation between increasingly separated ages (Table 3). Such a combination is simply not possible under the RR or OP methods. It is also likely that the separation of variance and correlation was a major factor contributing to the ability of the CP model to reasonably estimate the genetic variation with a much smaller number of parameters (4 parameters) than random regression (10 parameters) or orthogonal polynomial (17 parameters) models (Table 2).

The data sets we examined were small in comparison to those commonly analyzed in agricultural and breeding contexts. Using extremely large data sets, complicated covariance and correlation models may be of greater use, and the random regression and orthogonal polynomial methods may begin to show an advantage. Large data sets would also relieve the convergence problems we experienced with high-order random regression and orthogonal polynomial models. Unfortunately, most quantitative genetic studies of natural and experimental populations are extremely labor intensive, and sample sizes will often be similar to those reported here. For these situations, the properties of the character process models (e.g., easy hypothesis testing, few and interpretable parameters) make them a useful option.

Despite their apparent success in this study, there are several important limitations of the process models that suggest avenues for further development. First, additional ways of relaxing the stationarity assumption (Pletcher and Geyer 1999) without greatly increasing the number of parameters are needed. Although not appropriate in all situations, a promising direction proposed by Nunez-Anton and Zimmerman (2000) has been studied here and seems to offer reasonable flexibility in practice. Second, CP models require the manipulation (inversion, factorization, etc.) of matrices whose dimensions are proportional to the number of ages in the data set, regardless of the size of the model itself (Meyer 1998). A method of reparameterization, similar to that used for RR and OP models (Meyer 1998), would be useful. Third, a method for estimating the eigenfunctions of covariance functions used by the process models would provide insight into patterns of genetic constraints across ages (Kirkpatrick et al. 1990; Kirkpatrick and Lofsvold 1992).

Last, the genetic analysis of two or more function-valued traits is an important goal. Generalization of regression models to multitrait analyses is straightforward and has already been used, for instance, to analyze age-dependent milk production, fat, and protein content in dairy cattle (Jamrozik et al. 1997a). Bivariate character process models might be implemented by defining a parametric cross-covariance function between the two traits, but appropriate forms for this function are yet to be discovered.

Figure 3.—Contour plots of genetic covariance functions fitted by the character process model. (A) Age-specific mortality in the fruit fly, Drosophila melanogaster; (B) age-specific reproduction in D. melanogaster; (C) age-specific growth in beef cattle.

W. Hill, N. Barton, and two anonymous reviewers provided valuable comments on the manuscript. Thanks to J. Curtsinger and A. Khazaeli for generously providing published and unpublished data. F.J. thanks the INRA for support during this project.

LITERATURE CITED

Davidian, M., and D. M. Giltinan, 1995 Nonlinear Models for Repeated Measurement Data. Chapman and Hall, London.
Diggle, P. J., K. Y. Liang and S. L. Zeger, 1994 Analysis of Longitudinal Data. Oxford University Press, Oxford.
Gilmour, A. R., R. Thompson, B. R. Cullis and S. J. Welham, 1997 ASREML Manual. New South Wales Department of Agriculture, Orange, 2800, Australia.
Jamrozik, J., L. Schaeffer, Z. Liu and G. Jansen, 1997a Multiple-trait random regression test day model for production traits. Proceedings of 1997 Interbull Meeting, Vol. 16, pp. 43–47.
Jamrozik, J., L. R. Schaeffer and J. C. M. Dekkers, 1997b Genetic evaluation of dairy cattle using test day yields and random regression model. J. Dairy Sci. 80: 1217–1226.
Kirkpatrick, M., and N. Heckman, 1989 A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters. J. Math. Biol. 27: 429–450.
Kirkpatrick, M., and D. Lofsvold, 1992 Measuring selection and constraint in the evolution of growth. Evolution 46: 954–971.
Kirkpatrick, M., D. Lofsvold and M. Bulmer, 1990 Analysis of the inheritance, selection and evolution of growth trajectories. Genetics 124: 979–993.
Kirkpatrick, M., W. G. Hill and R. Thompson, 1994 Estimating the covariance structure of traits during growth and ageing, illustrated with lactation in dairy cattle. Genet. Res. 64: 57–69.
Lindstrom, M. J., and D. M. Bates, 1990 Nonlinear mixed effects models for repeated measures data. Biometrics 46: 673–687.
Lynch, M., and B. Walsh, 1998 Genetics and Analysis of Quantitative Traits. Sinauer Associates, Sunderland, MA.
Meyer, K., 1998 Estimating covariance functions for longitudinal
data using a random regression model. Genet. Sel. Evol. 30: 221–240.
Meyer, K., and W. G. Hill, 1997 Estimation of genetic and phenotypic covariance functions for longitudinal or 'repeated' records by Restricted Maximum Likelihood. Livest. Prod. Sci. 47: 185–200.
Nunez-Anton, V., 1998 Longitudinal data analysis: nonstationary error structures and antedependent models. Appl. Stochastic Models Data Anal. 13: 279–287.
Nunez-Anton, V., and D. L. Zimmerman, 2000 Modeling nonstationary longitudinal data. Biometrics 56 (in press).
Pletcher, S. D., and C. J. Geyer, 1999 The genetic analysis of age-dependent traits: modeling a character process. Genetics 153: 825–833.
Schwarz, G., 1978 Estimating the dimension of a model. Ann. Stat. 6: 461–464.
Stram, D. O., and J. W. Lee, 1994 Variance components testing in the longitudinal and mixed effects model. Biometrics 50: 1171–1177.
Vonesh, E., V. Chinchilli and K. Pu, 1996 Goodness-of-fit in generalized nonlinear mixed-effects models. Biometrics 52: 572–587.

Communicating editor: C. Haley

APPENDIX: GOODNESS OF FIT OF THE COVARIANCE STRUCTURE

The concordance correlation coefficient rc described by Vonesh et al. (1996) was used in the simulation study to evaluate the goodness of fit for both the variance and correlation functions estimated by the models when compared to the simulated structure. For the correlation structure, for instance, we consider

rc = 1 − Σ_{i=1}^{T−1} Σ_{j=i+1}^{T} (ŷ_ij − y_ij)² / [ Σ_{i,j} (ŷ_ij − mean(ŷ))² + Σ_{i,j} (y_ij − mean(y))² + T(T − 1)(mean(ŷ) − mean(y))²/2 ],   (A1)

where ŷ_ij represents the estimated correlation between times t_i and t_j given by the model and y_ij is the correlation between times t_i and t_j in the simulated data. T represents the total number of times at which measurements were taken. mean(y) and mean(ŷ) are the means of the correlation values for the simulated data and for the model, respectively.

The concordance coefficient for the variance estimate is much simpler and given by

rc = 1 − Σ_{i=1}^{T} (ŷ_i − y_i)² / [ Σ_i (ŷ_i − mean(ŷ))² + Σ_i (y_i − mean(y))² + T(mean(ŷ) − mean(y))² ],   (A2)

where the y's now refer to the actual and estimated variances rather than correlations.

The coefficient rc is directly interpretable as a concordance coefficient between observed and predicted values. It directly measures the level of agreement (concordance) between ŷ_ij and y_ij, and its value is reflected in how well a scatter plot of ŷ_ij vs. y_ij falls about the line of identity. The possible values of rc are in the range −1 ≤ rc ≤ 1, with a perfect fit corresponding to a value of 1 and a lack of fit to values ≤ 0.
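The appendix formulas translate directly into code. A minimal sketch (the function names are mine, not the authors'), assuming numpy:

```python
import numpy as np

def concordance_variance(y_hat, y):
    """r_c for age-specific variances (Equation A2): y_hat from the model,
    y from the simulated structure."""
    y_hat, y = np.asarray(y_hat, float), np.asarray(y, float)
    T = y.size
    num = np.sum((y_hat - y) ** 2)
    den = (np.sum((y_hat - y_hat.mean()) ** 2) + np.sum((y - y.mean()) ** 2)
           + T * (y_hat.mean() - y.mean()) ** 2)
    return 1.0 - num / den

def concordance_correlation(Y_hat, Y):
    """r_c for between-age correlations (Equation A1), computed over the
    T(T - 1)/2 pairs i < j of a T x T correlation matrix."""
    Y_hat, Y = np.asarray(Y_hat, float), np.asarray(Y, float)
    T = Y.shape[0]
    i, j = np.triu_indices(T, k=1)          # upper-triangular pairs i < j
    yh, ys = Y_hat[i, j], Y[i, j]
    num = np.sum((yh - ys) ** 2)
    den = (np.sum((yh - yh.mean()) ** 2) + np.sum((ys - ys.mean()) ** 2)
           + T * (T - 1) * (yh.mean() - ys.mean()) ** 2 / 2)
    return 1.0 - num / den
```

A model whose fitted correlations coincide with the simulated ones gives r_c = 1; systematic disagreement drives r_c toward or below 0.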
BIOMETRICS 58, 157–162, March 2002
Generalized Character Process Models: Estimating the Genetic Basis of Traits That Cannot Be Observed and That Change with Age or Environmental Conditions
Scott D. Pletcher, Department of Biology, Galton Laboratory, University College London, NW1 2HE, U.K.
email: [email protected]
and
Florence Jaffrézic, Institute of Animal, Cell, and Population Biology, Edinburgh University, Edinburgh, Scotland
SUMMARY. The genetic analysis of characters that change as a function of some independent and continuous variable has received increasing attention in the biological and statistical literature. Previous work in this area has focused on the analysis of normally distributed characters that are directly observed. We propose a framework for the development and specification of models for a quantitative genetic analysis of function-valued characters that are not directly observed, such as genetic variation in age-specific mortality rates or complex threshold characters. We employ a hybrid Markov chain Monte Carlo algorithm involving a Monte Carlo EM algorithm coupled with a Markov chain approximation to the likelihood, which is quite robust and provides accurate estimates of the parameters in our models. The methods are investigated using simulated data and are applied to a large data set measuring mortality rates in the fruit fly, Drosophila melanogaster.
KEY WORDS: Age-specific mortality; Character process models; Genetic variation; Infinite-dimensional traits; Quantitative genetics; Repeated measures.
1. Introduction

Function-valued quantitative genetics (Pletcher and Geyer, 1999) or the genetics of infinite-dimensional characters (Kirkpatrick and Heckman, 1989) is concerned with estimating the genetic contribution to observed variation in characters that change as a function of age or some other continuous variable. Taking advantage of observations from related individuals, observed variation in the function-valued character is decomposed into genetic and nongenetic contributions by estimating continuous, bivariate covariance functions (Kirkpatrick and Heckman, 1989). These models have been shown to be effective when applied to a variety of characters, from age-dependent patterns of reproductive output in fruit flies to growth and lactation curves in cattle (Diggle, Liang, and Zeger, 1994; Kirkpatrick, Hill, and Thompson, 1994; Jaffrezic and Pletcher, 2000).
In this article, we present theory and implementation of the genetic analysis of survival and other threshold characters thought to be influenced by a continuously distributed underlying trait, commonly termed frailty or liability, that is unobserved and changes as a function of some continuous variable, such as age. An important example is inference concerning age-specific mortality rates, which are genetically influenced but unobserved (Shaw et al., 1999). Other applications include estimating the genetic component of variation in the appearance of an environmentally induced phenotype across different environmental conditions (Roff and Bradford, 2000) or in the expression of an ordered categorical character across age and space (Wright, 1934). As a foundation for our development of a generalized function-valued quantitative genetics, we have chosen the character process model (Pletcher and Geyer, 1999). It has several desirable properties; most important for us is its improved efficiency: this model fits many observed covariance structures better and with fewer parameters than other popular models such as random regression and other repeated measures-type analyses (Jaffrezic and Pletcher, 2000).
2. Generalized Process Models

We are interested in inferring the genetic basis of some character Y, which is not observed, given a series of measurements on an observed trait, which is denoted by X. We assume that some reasonable model for the relationship between X and Y is available and that all genetic and shared environmental effects are modeled with respect to the Y value. This is in keeping with the standard interpretation of threshold characters (Wright, 1934) and of correlated frailty (Yashin, Iachine, and Harris, 1999).
When considering function-valued traits, it is assumed that the trajectory (over some continuous variable) of the character is random and influenced by one or more unobservable factors. For the additive model, we assume the unobserved character can be decomposed as

y(t) = μ(t) + g(t) + e(t) + ε,   (1)

where t is some continuous measure and g(t) and e(t) are Gaussian random functions, which are independent of one another and have an expected value of zero at each age (Kirkpatrick and Heckman, 1989; Pletcher and Geyer, 1999). These represent genetic and environmental deviations at each value of t. μ(t) is the mean function, and ε is the residual variation.
In practice, a finite number of observations (each associated with a particular value of the continuous variable) are made on a number of individuals i of varying relatedness. Thus, let y_i(t_j), etc., denote the effects for individual i at point t_j and y be a vector containing all data on all individuals in the order y_1(t_1), y_1(t_2), ..., y_2(t_1), ...; then the distribution of y, f_{θ1}(y), is multivariate normal. In the density, ⊗ denotes the Kronecker product, A is a matrix containing coefficients of relatedness, I_k is the identity matrix of size k, n is the number of measurements on each individual, and N is the total number of observations in the data. The remaining matrices, Σ and R, are discrete representations of the covariance functions for the genetic and environmental processes given in (1). If G(s, t) = cov{g(s), g(t)} and E(s, t) = cov{e(s), e(t)}, then Σ[i, j] = G(t_i, t_j) and R[i, j] = E(t_i, t_j) (Pletcher and Geyer, 1999). The residual variance is σ_ε². The vector β describes the mean function nonparametrically by specifying a unique parameter for each value of t in the data set.
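The covariance structure just described can be assembled numerically. A sketch assuming the overall covariance takes the common quantitative-genetic form V = A ⊗ Σ + I_k ⊗ R + σ_ε² I_N; this explicit form is an assumption consistent with the definitions above (the paper's displayed equations are not reproduced in this text), and the kernel choices are purely illustrative:

```python
import numpy as np

k, n = 3, 4                       # 3 individuals, 4 ages each; N = k * n = 12
t = np.linspace(0.0, 3.0, n)

A = np.array([[1.0, 0.5, 0.25],   # hypothetical relatedness coefficients
              [0.5, 1.0, 0.5],    # (e.g., grandparent-parent-offspring trio)
              [0.25, 0.5, 1.0]])

d = np.abs(t[:, None] - t[None, :])
Sigma = np.exp(-0.8 * d ** 2)     # discretized genetic covariance G(s, t)
R = 1.0 / (1.0 + 0.05 * d ** 2)   # discretized environmental covariance E(s, t)
sigma2_eps = 0.1                  # residual variance

# Assumed overall covariance of the stacked vector y (individual-major order):
V = np.kron(A, Sigma) + np.kron(np.eye(k), R) + sigma2_eps * np.eye(k * n)
```

With the individual-major ordering y_1(t_1), y_1(t_2), ..., y_2(t_1), ..., the term np.kron(A, Sigma) places the age-by-age genetic covariance Σ in each individual-pair block, scaled by the corresponding relatedness coefficient.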
Parametric forms for the covariance functions are based on the character process model where, taking G(s, t) as an example, the functions are written as

G(s, t) = v_G(s) v_G(t) ρ_G(|s − t|),   (4)

where v_G(t)² describes how the genetic variance changes with age and ρ_G(|s − t|) describes the genetic correlation between two ages. There are no restrictions on the form of v_G(·), and it is often modeled using simple polynomials (linear, quadratic, etc.) either on the natural or the log scale. If the correlation between two ages is a function only of the time distance |s − t| between them (correlation stationarity), then numerous choices for ρ(·) are available, all of which satisfy several theoretical requirements (for a list, see Pletcher and Geyer, 1999). Strict correlation stationarity can be relaxed by implementing a nonlinear transformation on the time axis, f(t) (Nunez-Anton, 1998; Jaffrezic and Pletcher, 2000). The correlation function is then defined as ρ(s, t) = ρ(|f(s) − f(t)|), and the functions suggested by Pletcher and Geyer (1999) remain valid.
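Equation (4), together with the time transformation f(t), separates the variance profile from the correlation structure. A small sketch of this construction (the function name and all numeric choices are mine, for illustration only):

```python
import numpy as np

def cp_cov(ages, v, rho, f=lambda t: t):
    """G(s, t) = v(s) v(t) rho(|f(s) - f(t)|); the identity transform f
    gives the strictly stationary-correlation case."""
    a = np.asarray(ages, float)
    sd = np.array([v(t) for t in a])               # age-specific SD v(t)
    ft = np.array([f(t) for t in a])               # transformed time axis
    d = np.abs(ft[:, None] - ft[None, :])          # |f(s) - f(t)|
    return np.outer(sd, sd) * rho(d)

ages = np.arange(1.0, 6.0)
v = lambda t: 1.0 + 0.1 * t                        # linear SD (illustrative)
rho = lambda d: 1.0 / (1.0 + 0.05 * d ** 2)        # Cauchy-type correlation

C_stat = cp_cov(ages, v, rho)                      # stationary: f(t) = t
C_nonstat = cp_cov(ages, v, rho, f=lambda t: t ** 0.5)  # f(t) = t^lambda, lambda = 0.5
```

With f(t) = t^λ and λ < 1, the transformed distance between two ages a fixed lag apart shrinks as age increases, so correlations between equidistant later ages exceed those between equidistant early ages.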
The elements of the observed vector x are conditionally independent given y, and

f_θ2(x | y) = ∏_{i=1}^{N} f_θ2(x_i | y_i).  (5)
The likelihood associated with the observed data is

f_θ(x) = ∫ ∏_{i=1}^{N} f_θ2(x_i | y_i) f_θ1(y) dy,  (6)
where θ2 is a vector of parameters describing the relationship between x and y, and θ1 contains parameters describing the distribution of y, which includes parameters of the variance functions, mean function, and potential fixed effects (Meyer and Hill, 1997; Pletcher and Geyer, 1999).
3. Likelihood Maximization

The likelihood was maximized using a hybrid algorithm composed of Markov chain Monte Carlo EM (MCEM) (McCulloch, 1997) and Markov chain Monte Carlo integration/maximization (MCMLE) (Shaw et al., 1999; Geyer, 1995). The computational cost of the MCEM algorithm is much lower than that of the MCMLE. However, parameter estimates obtained from MCEM show a good deal of variation (McCulloch, 1997; S. Pletcher, unpublished results), and confidence intervals are not easily obtained. The MCMLE provides accurate parameter estimates and confidence intervals, but it is computationally expensive and requires a reference point in the parameter space of θ = (θ1, θ2) that is close to the MLE (Shaw et al., 1999). We found that the following three-step procedure combines the strengths of both methods. First, the MCEM is used to determine the reference point, call it θ0, for the MCMLE. Second, a single chain of random deviates from f_θ1(y | x) is obtained using a Metropolis algorithm (Shaw et al., 1999). These deviates are used to approximate the likelihood function (6) through a Monte Carlo evaluation of the integral (Geyer, 1995). Third, the approximation is maximized, and estimates and standard errors of the parameters are obtained. Details of the computational algorithms and relevant computer code are available from the first author (or see http://www.ucl.ac.uk/biology/goldstein/scottindex.htm).
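The Monte Carlo evaluation of an integral like (6) can be sketched for a toy one-dimensional model. The normal distribution for the unobserved character and the logistic link used below are illustrative assumptions of ours, not the paper's model:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(1)

def mc_likelihood(x, n_trials, mu, sigma, m=50000):
    # Monte Carlo evaluation of f(x) = integral of f(x | y) f(y) dy, as in (6):
    # draw y from its distribution and average the conditional likelihood.
    y = rng.normal(mu, sigma, size=m)      # draws from f(y), a scalar stand-in
    p = 1.0 / (1.0 + np.exp(-y))           # assumed link: character -> probability
    cond = comb(n_trials, x) * p**x * (1.0 - p)**(n_trials - x)
    return cond.mean()                     # Monte Carlo average of f(x | y)
```

In the paper the draws instead come from the conditional distribution at the reference point θ0 via a Metropolis sampler, with the ratio form of Geyer (1995); the plain average above is the simplest special case of that idea.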
4. Example

For the following examples, the character we are interested in, y(t), is the age-specific mortality function for a specific cohort of genetically identical individuals. The observed character x(t) is the number of individuals dying in that cohort at age t. Shaw and colleagues assumed parametric forms for the unobserved mortality curves using Gompertz and logistic functions (Shaw et al., 1999), which is analogous to a random regression on the age-dependent trajectories (Jaffrezic and Pletcher, 2000). Because the character process models have been shown to perform better than random regression models for observed function-valued characters (Jaffrezic and Pletcher, 2000), we extend the Shaw model to the generalized character process theory.
Measurements are taken at a finite number of ages, and therefore we observe a census vector x_i, whose element x_ij contains the number of individuals alive in cohort i at census number j. Similarly, we assume each cohort has a log-mortality rate y_ij at census number j, and t_ij is the age at which census number j was taken. We estimate genetic and environmental
Generalized Character Process Models 159
Table 1
Estimated genetic and environmental covariance functions for simulated data^a

                          V_G                          V_E
Data                      θ            θ_c             θ            θ_c
y-values (unobserved)     0.15 (0.03)  0.095 (0.041)   0.20 (0.006)  0.402 (0.006)
x-values (observed)       0.17 (0.17)  0.081 (0.150)   0.20 (0.032)  0.403 (0.012)

^a Covariance functions are composed of a constant variance across ages (v(t)² = θ²) and a normal correlation function ρ(s, t) = e^{−θ_c(s−t)²}. Asymptotic standard errors of the estimates are in parentheses. y-values indicate results from a function-valued analysis directly on the unobserved frailty; x-values represent the results of the Markov chain Monte Carlo models on the observed ages at death. Parameter estimates from the two methods are nearly identical, but standard errors are higher for the MCMC analysis.
covariance functions associated with y as well as a separate mean mortality rate (over all cohorts) for each t_ij (call these parameters μ_j). The mean parameters are incorporated into the conditional distribution f(x | y), resulting in the unobserved variable y(t) representing the cohort-specific deviation from this mean at each age, which has mean function equal to zero (μ = 0 in equation (2)).
Assuming a piecewise constant hazard function, the probability of an individual alive at the start of census j − 1 surviving the interval [t_{i(j−1)}, t_ij) is p_{t_ij} = exp(−μ_j(t_ij − t_{i(j−1)})). The number of deaths in the interval is binomially distributed with frequency p_{t_ij} and number of trials x_{i(j−1)}. Writing x_i = (x_{i1}, x_{i2}, ...) and y_i similarly, the conditional probability of observing a specific census vector for a specific cohort is
f_θ2(x_i | y_i) = ∏_{j=2}^{J} C(x_{i(j−1)}, x_ij) p_{t_ij}^{x_ij} (1 − p_{t_ij})^{x_{i(j−1)} − x_ij},
where J is the number of census times (Shaw et al., 1999). This distribution is substituted into (5) and combined with (2) to yield the likelihood (6) for use in the Metropolis algorithm and in likelihood maximization.
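The pieces of this conditional likelihood can be sketched directly. The function below is our illustration of the piecewise-constant-hazard binomial likelihood, with names of our own choosing:

```python
import math

def interval_survival(mu_j, dt):
    # Piecewise-constant hazard: P(survive an interval of length dt) = exp(-mu_j * dt).
    return math.exp(-mu_j * dt)

def census_log_lik(census, ages, mus):
    # Log-probability of a census vector x_i = (x_i1, ..., x_iJ): the survivors
    # of each interval are binomial with x_{i(j-1)} trials and survival
    # probability p_{t_ij}, so deaths are the complementary binomial count.
    ll = 0.0
    for j in range(1, len(census)):
        n, k = census[j - 1], census[j]            # trials, survivors
        p = interval_survival(mus[j], ages[j] - ages[j - 1])
        ll += (math.log(math.comb(n, k))
               + k * math.log(p)
               + (n - k) * math.log1p(-p))
    return ll
```

For instance, with a hazard of ln 2 per unit time, the survival probability over a unit interval is exactly 0.5, and a cohort of two with one survivor has conditional probability 2 × 0.5 × 0.5 = 0.5.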
4.1 Simulated Data

Simulated ages at death were generated for 600 distinct cohorts (20 replicate cohorts from each of 30 genetically distinct lines) of 500 individuals each. The data were simulated using a covariance function with a constant variance (i.e., v²(t) = 0.2 in equation (4)) and standard normal correlation function (i.e., ρ(s, t) = e^{−θ_c(s−t)²}) for both the genetic and environmental parts (θ_c = 0.1 and θ_c = 0.4 for the genetic and environmental correlations, respectively). Similar results were obtained using other covariance functions and experimental designs.
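The simulation design just described can be sketched as follows; this is a minimal version assuming the stated parameters, with function names and the age grid chosen by us:

```python
import numpy as np

rng = np.random.default_rng(0)

def cp_cov(ages, variance, theta_c):
    # Constant variance with normal correlation exp(-theta_c * (s - t)^2).
    d = np.subtract.outer(ages, ages)
    return variance * np.exp(-theta_c * d ** 2)

def simulate_line_mortality(ages, n_lines=30, reps=20):
    # One genetic deviation per line (theta_c = 0.1), shared by its replicate
    # cohorts, plus an independent environmental deviation per cohort
    # (theta_c = 0.4); both with variance 0.2, as in the text.
    G = cp_cov(ages, 0.2, 0.1)
    E = cp_cov(ages, 0.2, 0.4)
    g = rng.multivariate_normal(np.zeros(len(ages)), G, size=n_lines)
    e = rng.multivariate_normal(np.zeros(len(ages)), E, size=n_lines * reps)
    return np.repeat(g, reps, axis=0) + e   # 600 cohort-level trajectories

y = simulate_line_mortality(np.linspace(0.0, 6.0, 5))
```

The total phenotypic variance at each age is then about 0.4 (genetic plus environmental), with the genetic component estimable from the between-line structure.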
The small number of lines in the simulated data leaves open the possibility that the realized genetic variance and covariance among lines may deviate significantly from the target values. To compensate for this, we estimated covariance functions for the realized y-values themselves (i.e., the unobserved age-dependent mortality rates), which are saved during the course of the simulation, and we used these estimates as metrics for determining the accuracy of the covariance functions estimated from the x-values (the observed ages at death).
The MCEM routines provided an excellent θ0 for the MCMLE routines. The sample paths for the four covariance parameters (two each for the genetic and environmental covariance functions) show a rapid convergence to the neighborhood of the simulated value, with the genetic variance converging less quickly than the others (data not presented). θ0 for the MCMLE was obtained by averaging the values from the last 200 (of a total of 500) iterations.
The MCMLE routines were then used to obtain estimates and confidence intervals for the genetic and environmental covariance functions and for the mean mortality trajectory. The approximation of the likelihood was based on an MCMC sample size of 1000 random deviates sampled from the chain every 1000 steps. The genetic covariance function obtained from this analysis is in complete agreement with that obtained when standard methods are used on the unobserved y-values themselves (Table 1). The environmental covariance functions, which are estimated accurately with smaller sample sizes, are essentially identical. As expected, the asymptotic standard errors on the parameter estimates are much larger (up to five times larger) when obtained from the observed data (Table 1). It may be that increasing the MCMC sample size would reduce this difference. It is more likely, however, that there is simply more uncertainty in the estimates.
4.2 Mortality in Drosophila melanogaster

The Drosophila data are taken from a large mortality experiment composed of 29 genetically distinct lines of flies. The lines were created using an experimental mutagenesis technique whereby single mutational events were initiated in a genetically homogeneous background (S. D. Pletcher, unpublished data). Experimental populations differed among themselves genetically via one mutational event. Genetic variation in mortality rates as a function of age and genetic covariation in mortality between ages provide important insights into the age-specific properties of these mutations (Pletcher, Houle, and Curtsinger, 1998). Ages at death were recorded for four replicate cohorts (each of approximately 300 males) from each line and were pooled into 3-day intervals for analysis.
Exploratory analyses, including an examination of the phenotypic covariance structure and an estimate of the genetic variogram cloud (Jaffrezic, Pletcher, and Hill, 2001), suggested the use of genetic and environmental covariance functions that are composed of a linear variance function (i.e., v²(t) = γ0 + γ1t in equation (4)) and a normal correlation function (i.e., ρ(s, t) = e^{−γ_c(s−t)²}). The θ0 value
160 Biometrics, March 2002
Figure 1. Contour plots of the genetic and environmental covariance functions estimated from a large mortality experiment using the fruit fly, Drosophila melanogaster. Functions represent age-dependent covariance in log mortality rates. Both genetic and environmental covariance functions are described by a linear variance function v²(t) = γ0 + γ1t and normal correlation function ρ(s, t) = e^{−γ_c(s−t)²}. A. Estimated genetic covariance function: γ0 < 0.0001, γ1 = 0.049, γ_c = 0.072; γ0 was estimated at its lower boundary (≈ 0). B. Estimated environmental covariance function: γ0 = 0.21, γ1 = 0.022, γ_c = 0.60.
for the MCMLE procedures was obtained by averaging 500 consecutive iterations of the MCEM algorithm after it was determined to have converged to a stable region for each parameter. The MCMLE routines were then executed with an MCMC sample size of 2000, and the chain was sampled every 1000 steps.
We found that both genetic and environmental variance for age-specific mortality increased with age. Early in life, environmental variance was very high (γ0 = 0.21 and γ0 < 0.0001 for the environmental and genetic variances, respectively), but the rate of increase in genetic variance with age was faster (Figure 1). The correlation parameter was much higher in the environmental correlation function than it was in the genetic function (0.59 versus 0.075), implying that environmental covariance decreases much more rapidly as ages become more and more separated in time. This suggests a rather high degree of pleiotropy (single genes affecting mortality at more than one age) and a relatively transient influence of the environment. The degree of uncertainty in the parameter estimates of the genetic function is illustrated by profile likelihoods (Figure 2).
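As an aside on how a confidence interval is read off a likelihood profile, here is a small sketch; the grid and the quadratic profile are invented for illustration:

```python
def profile_interval(grid, logliks, chi2_cutoff=6.635):
    # Approximate 99% profile-likelihood interval: retain parameter values
    # whose profile log-likelihood is within 0.5 * chi2_cutoff of the
    # maximum (6.635 is the 99% quantile of chi-square with 1 d.f.).
    best = max(logliks)
    inside = [g for g, ll in zip(grid, logliks) if best - ll <= 0.5 * chi2_cutoff]
    return min(inside), max(inside)

grid = [i * 0.5 for i in range(9)]            # 0.0, 0.5, ..., 4.0
logliks = [-(g - 2.0) ** 2 for g in grid]     # toy quadratic profile
```

On a fine grid the endpoints approach the points where the profile drops 3.32 log-likelihood units below its maximum.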
5. Discussion

We present a flexible approach for examining the genetic basis of function-valued characters that are unobserved but that influence the expression of an observed character through some arbitrary, hypothesized form. The complexity of the models necessitated the use of stochastic methods for model specification, and we rely heavily on Markov chain Monte Carlo methods, which can be troublesome and difficult to implement. To alleviate some of the difficulties, we implemented a composite algorithm consisting of a Markov chain EM algorithm (MCEM) followed by a Markov chain approximation to the actual likelihood (MCMLE). This combination was found to work well for generalized linear mixed models (McCulloch, 1997), and many of the properties of convergence discussed in McCulloch (1997) apply to the models we developed here. The MCEM algorithm robustly provided excellent reference values (i.e., θ0) from a wide range of starting points, which were then used in the MCMLE to estimate parameters and to obtain likelihood statistics and confidence intervals. Results obtained through the analysis of simulated data and of mortality rates in the fruit fly, Drosophila melanogaster, show that variation accumulated through heterogeneity of starting values and through the stochastic nature of the Markov chain algorithms is surprisingly small (essentially inconsequential) in comparison with the support of the parameter estimates provided by the data (data not shown).
Although the algorithms are successful in recovering the underlying genetic structure in simulated data sets and in capturing the variation in Drosophila mortality rates, some limitations are apparent. Despite the large number of individuals in our data sets, the asymptotic standard errors (and profile likelihood functions) of the estimates in our analyses are considerable. This suggests that large sample sizes may be required for inference regarding the genetic basis of unobserved characters. In addition, our choice of covariance model was based on exploratory algorithms that will not apply in all situations (Jaffrezic et al., 2001). The development of model selection criteria similar to those used for observed functionvalued traits is an important issue, and work is currently underway in this area.
Our examples have focussed exclusively on agespecific mortality rates. However, precisely the same theory applies to any nonnormally distributed phenotype that is thought to be determined by an unobserved, normally distributed
Figure 2. Likelihood profiles for the parameters of the genetic covariance function estimated from a large mortality experiment in Drosophila. The estimated genetic covariance function is described by a linear variance function v_G²(t) = γ0 + γ1t and normal correlation function ρ(s, t) = e^{−γ_c(s−t)²}. Estimated values are γ0 < 0.0001, γ1 = 0.049, and γ_c = 0.072; γ0 was estimated at its lower boundary (≈ 0). Insets focus on a narrow range of parameter values and provide guidance for the construction of 99% confidence intervals on the estimates.
character (Wright, 1934). An example may be the expression of a threshold character, such as the occurrence of a disease, over space or time. The distribution of the observed trait given the unobserved liability, f_θ2(x | y), is the only aspect of the theory and computer code that requires change. Furthermore, although we prefer the character process model for describing the covariance structure of the unobserved character, random regression or orthogonal polynomial models could be implemented with equally small modifications.
ACKNOWLEDGEMENTS
We thank W. Hill, D. Commenges, and three reviewers for comments on the manuscript. We are indebted to F. Shaw for his clarification of many of the MCMC methods and A. Yashin for pointing out the problems of identifiability in survival models.
RESUME

The genetic analysis of characters whose expression is modified by a continuous independent variable is the subject of growing interest in the biological and statistical literature. Previous work in this area has concentrated on the analysis of normally distributed, directly observable traits. We propose a methodological framework for the development and specification of models for the quantitative genetic analysis of this type of character when it is not directly observable, such as genetic variation in age-specific mortality rates or complex threshold characters. We employ a hybrid MCMC algorithm (combining a Monte Carlo EM algorithm with a Markov chain approximation of the likelihood) that is robust and provides accurate estimation of the parameters of our models. These methods are studied in the context of simulated data, and they are applied to a large set of real data measuring mortality rates in the fruit fly, Drosophila melanogaster.
REFERENCES
Diggle, P. J., Liang, K. Y., and Zeger, S. L. (1994). Analysis of Longitudinal Data. Oxford: Oxford University Press.
Geyer, C. J. (1995). Estimation and optimization of functions. In Markov Chain Monte Carlo in Practice, W. R. Gilks, S. Richardson, and D. J. Spiegelhalter (eds), 241-258. London: Chapman and Hall.
Jaffrezic, F. and Pletcher, S. D. (2000). Statistical models for estimating the genetic basis of repeated measures and other function-valued traits. Genetics 156, 913-922.
Jaffrezic, F., Pletcher, S. D., and Hill, W. G. (2001). Nonparametric estimation of covariance structure for genetic analysis of repeated measures and other function-valued traits. Genetical Research, in press.
Kirkpatrick, M. and Heckman, N. (1989). A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters. Journal of Mathematical Biology 27, 429-450.
Kirkpatrick, M., Hill, W. G., and Thompson, R. (1994). Estimating the covariance structure of traits during growth and ageing, illustrated with lactation in dairy cattle. Genetical Research 64, 57-69.
McCulloch, C. E. (1997). Maximum likelihood algorithms for generalized linear mixed models. Journal of the American Statistical Association 92, 162-170.
Meyer, K. and Hill, W. G. (1997). Estimation of genetic and phenotypic covariance functions for longitudinal or 'repeated' records by restricted maximum likelihood. Livestock Production Science 47, 185-200.
Nunez-Anton, V. (1998). Longitudinal data analysis: Nonstationary error structures and antedependent models. Applied Stochastic Models and Data Analysis 13, 279-287.
Pletcher, S. D. and Geyer, C. J. (1999). The genetic analysis of age-dependent traits: Modeling a character process. Genetics 153, 825-833.
Pletcher, S. D., Houle, D., and Curtsinger, J. W. (1998). Age-specific properties of spontaneous mutations affecting mortality in Drosophila melanogaster. Genetics 148, 287-303.
Roff, D. A. and Bradford, M. J. (2000). A quantitative genetic analysis of phenotypic plasticity of diapause induction in the cricket Allonemobius socius. Heredity 84, 193-200.
Shaw, F., Promislow, D. E. L., Tatar, M., Hughes, K., and Geyer, C. J. (1999). Towards reconciling inferences concerning genetic variation in senescence. Genetics 152, 553-566.
Wright, S. (1934). An analysis of variability in number of digits in an inbred strain of guinea pigs. Genetics 19, 506-536.
Yashin, A. A., Iachine, I. A., and Harris, J. R. (1999). Half of variation in susceptibility to mortality is genetic: Findings from Swedish twin survival data. Behavior Genetics 29, 11-19.
Received November 2000. Revised September 2001. Accepted September 2001.
Copyright 2004 by the Genetics Society of America. DOI: 10.1534/genetics.103.019554
Multivariate Character Process Models for the Analysis of Two or More Correlated Function-Valued Traits
Florence Jaffrezic,*,1 Robin Thompson†,‡ and Scott D. Pletcher§
*INRA Quantitative and Applied Genetics, 78352 Jouy-en-Josas Cedex, France; †Rothamsted Research, Harpenden, Herts AL5 2JQ, United Kingdom; ‡Roslin Institute (Edinburgh), Roslin, Midlothian EH25 9PS, United Kingdom; and §Huffington Center on Aging and Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
Manuscript received July 1, 2003. Accepted for publication May 17, 2004.
ABSTRACT

Various methods, including random regression, structured antedependence models, and character process models, have been proposed for the genetic analysis of longitudinal data and other function-valued traits. For univariate problems, the character process models have been shown to perform well in comparison to alternative methods. The aim of this article is to present an extension of these models to the simultaneous analysis of two or more correlated function-valued traits. Analytical forms for stationary and nonstationary cross-covariance functions are studied. Comparisons with the other approaches are presented in a simulation study and in an example of a bivariate analysis of genetic covariance in age-specific fecundity and mortality in Drosophila. As in the univariate case, bivariate character process models with an exponential correlation were found to be quite close to first-order structured antedependence models. The simulation study showed that the choice of the most appropriate methodology is highly dependent on the covariance structure of the data. The bivariate character process approach proved to be able to deal with quite complex nonstationary and nonsymmetric cross-correlation structures and was found to be the most appropriate for the real data example of the fruit fly Drosophila melanogaster.
THE need for a rigorous method of analysis for biological characters that are best considered as functions of some independent and continuous variable is rapidly growing. Important examples of these so-called function-valued traits include growth curves (Meyer 2001), age-specific components of organismal fitness such as survival or reproductive output (Pletcher et al. 1998), lactation curves in dairy cattle (Meuwissen and Pool 2001; Jaffrezic et al. 2002), and gene expression profiles across age or environmental treatments (DeRisi et al. 1997; Pletcher et al. 2002).

Several techniques have been proposed for single-trait (univariate) analyses. These include random regression models, which are based on a parametric modeling of individual curves (Diggle et al. 1994); character process models, which focus on parametric modeling of the covariance structure (Pletcher and Geyer 1999); and structured antedependence models (SAD; Nunez-Anton and Zimmerman 2000; Jaffrezic et al. 2003), where an observation at time t is modeled via a regression over the preceding observations. The number of parameters is considerably reduced in the SAD approach compared to the traditional antedependence models (Gabriel 1962), thanks to a parametric modeling of the antedependence coefficients and innovation variances. A comparison among these methods revealed that, in many cases, character process models performed well in comparison to alternative methods, especially random regression, often providing a better fit to the covariance structure (genetic and nongenetic) with fewer parameters (Jaffrezic and Pletcher 2000).

A parsimonious method for the analysis of two or more correlated function-valued traits is needed. Although a multivariate extension of random regression models is straightforward, their sometimes poor performance in the univariate case argues for the development of alternative methods. Moreover, the nature of the parameterization results in a dramatic increase in the number of parameters required to describe complicated covariance structures, which is often problematic. The data sets that are generated in experimental sciences, such as genetics, and that are used to estimate different types of covariance structures (e.g., genetic and nongenetic) are often too small to support the estimation of many parameters (Pletcher et al. 1998). This would also preclude the use of other models such as spline functions.

The aim of this article is to investigate an extension of the character process (CP) models (Pletcher and Geyer 1999) to the multivariate case. The advantages that apply to the CP models in the univariate setting, i.e., a small number of parameters to model the covariance structure and a high degree of flexibility, are crucial for developing practical multivariate models. Several

¹Corresponding author: INRA-SGQA, 78352 Jouy-en-Josas Cedex, France. E-mail: [email protected]
Genetics 168: 477–487 (September 2004)
478 F. Jaffrezic, R. Thompson and S. D. Pletcher
cross-correlation and cross-covariance functions are studied, and their behavior is compared to multivariate random regression and structured antedependence models in a simulation study and in an example for the genetic analysis of age-specific fecundity and mortality in the fruit fly, Drosophila melanogaster.

MATERIALS AND METHODS

Bivariate character process models: A detailed description of the quantitative genetic model for univariate function-valued traits is given by Jaffrezic and Pletcher (2000) and Pletcher and Geyer (1999). In the genetic analysis of two correlated function-valued traits, it is assumed that the observed phenotypic characters can be decomposed as

Y(t) = μ(t) + g(t) + e(t),  (1)

where Y(t) = (Y1(t), Y2(t))′ represents the observed phenotypic trajectories for the two characters Y1(t) and Y2(t); t represents any continuous independent variable, which for clarity we assume is time; μ(t) = (μ1(t), μ2(t))′ are nonrandom functions that correspond to the genotypic mean functions of Y1(t) and Y2(t), respectively; and g(t) = (g1(t), g2(t))′ represents the genetic deviations for the two characters. Both deviations are correlated over time, and g(t) is a bivariate Gaussian process. Similarly, e(t) = (e1(t), e2(t))′ are the environmental deviations. Processes g(t) and e(t) are assumed independent of one another, with mean zero at each age and with covariance functions G(t, s) and E(t, s). Focus is on the modeling of these covariance functions.

In the univariate character process approach, there is only one function-valued trait, Y(t), and its covariance functions (genetic and environmental) are modeled as

G(t, s) = v(t)v(s)ρ(t, s),  (2)

where v²(t) represents the variance function and is usually a parametric function of the continuous variable, such as a polynomial, and ρ(t, s) is the correlation function. Assuming stationarity in the correlations, Pletcher and Geyer (1999) proposed parametric forms for the correlation function, including an exponential (ρ(t, s) = exp(−θ|t − s|)), a Gaussian (ρ(t, s) = exp(−θ(t − s)²)), and a Cauchy (ρ(t, s) = 1/(1 + θ(t − s)²)) function. Jaffrezic and Pletcher (2000) suggested a nonstationary extension of the models based on a nonlinear transformation of the timescale, f(t) (Nunez-Anton and Zimmerman 2000). Correlation stationarity is assumed to hold on the transformed scale: ρ(t, s) = ρ(f(t) − f(s)).

Models for bivariate Gaussian processes have been investigated previously (Sy et al. 1997), for example, the bivariate Ornstein-Uhlenbeck process. It corresponds to a continuous-time extension of a first-order autoregressive process [AR(1)], which is also equivalent to a CP model with an exponential correlation and a constant variance. We adapt these ideas to extend the character process methodology.

Let the continuous variable of interest be time and the object of analysis be the genetic covariance function. In the bivariate case, let g(t) = (g1(t), g2(t))′ be the genetic character process, where g1(t) is associated with trait 1 and g2(t) with trait 2. The bivariate covariance function of the process can be written as

Cov(g(t), g(s)) = V(t)^{1/2} Θ(t − s) (V(s)^{1/2})′  (3)

(for 0 ≤ s ≤ t), where

Cov(g(t), g(s)) = [ Cov(g1(t), g1(s))  Cov(g1(t), g2(s))
                    Cov(g2(t), g1(s))  Cov(g2(t), g2(s)) ].  (4)

As the covariance function has to be symmetric, it is required that

Cov(g(s), g(t)) = Cov(g(t), g(s))′.  (5)

Definition of matrix Θ(t − s): In the bivariate case, matrix Θ(t − s) is of dimension 2 × 2. The requirements on this matrix are that it is positive definite, equal to the identity matrix when t = s, and should verify the symmetry property Θ(t − s) = Θ(s − t)′. It corresponds to a bivariate extension of the correlation functions proposed for univariate character process models by Pletcher and Geyer (1999). All the functions proposed in their article can be extended. Among them, however, the most commonly used are the exponential, the Gaussian, and the Cauchy correlations. These functions are defined as follows:

Exponential: Θ(t − s) = exp(−Λ(t − s)).
Gaussian: Θ(t − s) = exp(−Λ(t − s)²).
Cauchy: Θ(t − s) = (I + Λ(t − s)²)⁻¹.

In the bivariate case, I is the 2 × 2 identity matrix and Λ is a 2 × 2 matrix, not necessarily symmetric, with positive eigenvalues. The matrix exponentiation corresponds to a series expansion and can be calculated using an eigenvalue decomposition as shown in APPENDIX A. The bivariate exponential function is also used in the statistical literature for the Ornstein-Uhlenbeck process (Sy et al. 1997).

Further extension to this framework includes a relaxation of stationarity of the correlation function. The nonstationary extension of the CP models proposed by Jaffrezic and Pletcher (2000) is implemented by replacing time lags (t − s) by a transformation (f(t) − f(s)). Considering a Box-Cox transformation, as suggested by Nunez-Anton and Zimmerman (2000), and an exponential CP model, the correlation function can be written as

ρ(t, s) = exp(−θ((t^λ − s^λ)/λ))  (6)

for λ ≠ 0 and

ρ(t, s) = exp(−θ(Log(t) − Log(s)))  (7)

when λ = 0.

Definition of matrix V(t): In the bivariate case, matrix V(t) is also of dimension 2 × 2. The requirements for this matrix are that it is symmetric and positive definite. It in fact corresponds to the covariance of the process at a given time t, as matrix Θ(t − s) is the identity matrix when t = s:

V(t) = Var(g(t)) = [ Var(g1(t))         Cov(g1(t), g2(t))
                     Cov(g1(t), g2(t))  Var(g2(t)) ].  (8)

We present here two possible ways of modeling matrix V(t). It is possible to use a polynomial of time to model V(t). That would correspond to a direct bivariate extension of the variance function of the character process model (Pletcher and Geyer 1999). When considering, for example, a quadratic function of time, the bivariate variance function can be written as

ln(V(t)) = A + Bt + Ct²,  (9)

where A, B, and C are 2 × 2 symmetric matrices. The ln( ) of the variance again corresponds to a series expansion and can be calculated, as the exponential of a matrix, by using an eigenvalue decomposition as explained in APPENDIX A.

The covariance matrix V(t) can also be decomposed in terms of variance and correlation functions such as
Genetic Analysis of Correlated Function-Valued Traits 479
TABLE 1
Likelihood values for the simulated data sets based on unstructured covariance matrices
Model          NPCov   Example 1   Example 2   Example 3
US             55      2746.7      2401.8      3801.9
CP QuadExp     13      551.0       799.4       588.1
CP QuadExpNS   14      566.3       1478.9      703.0
SAD(1)         12      262.2       1008.4      545.0
SAD(2)         14      430.8       1380.2      864.4
RR1            13      980.0       200.7       472.6

US, unstructured covariance matrix; CP QuadExpNS, quadratic polynomial used to model V(t), exponential function for Θ(t − s) with the nonstationary extension (Equation 6); RR1, linear random regression model with three additional parameters for the residual structure; NPCov, number of parameters in the covariance structure.
that the firstorder bivariate structured antedependenceV(t) v21(t) v1(t)v2(t)12(t)
v1(t)v2(t)12(t) v22(t) . (10)
Variance functions can be modeled as for univariate character process models with polynomial functions of time. For a quadratic function, for instance, v1²(t) = Var(g1(t)) = exp(a1 + b1 t + c1 t²) and v2²(t) = Var(g2(t)) = exp(a2 + b2 t + c2 t²). Function ρ12(t) represents the cross-correlation between the two traits at a given time t. A possible parametric modeling for this cross-correlation function is

    Corr(g1(t), g2(t)) = ρ12(t) = exp(-λ1 t) - exp(-λ2 t)    (11)

for λ1, λ2 > 0. For practical purposes, it is interesting to note that this correlation function is equal to 0 at t = 0, increases to a maximum at t = [ln(λ2/λ1)]/(λ2 - λ1), and then decreases to 0 at infinity.

A likelihood-ratio test can be used to examine specific hypotheses about the parameters. For example, testing if the cross-correlation between the two processes at all times t is equal to zero is equivalent to testing if λ1 = λ2. The cross-correlation function ρ12(t) can also be assumed constant: ρ12(t) = r, which would imply that the cross-correlations are equal for all t.

Estimation procedure: Parameters of these bivariate character process models can be estimated with REML procedures, using, for example, the OWN function of ASREML (Gilmour et al. 2002) as presented in appendix a. The nonstationary parameter ν (Equation 6) is estimated at the same time as the other covariance parameters with standard REML procedures. The properties of the proposed bivariate covariance function are studied in appendix b.

EXAMPLE

Simulation study: A simulation study was performed to understand better the analogies between the different methodologies: the bivariate CP model proposed here, the bivariate structured antedependence models presented in Jaffrezic et al. (2003), and the random regression models. In a first set of simulations, data were generated according to a bivariate CP model, with an exponential "correlation" function (exp(-Θ(t - s))) and a V(t) structure defined as ln V(t) = A + Bt + Ct². Different assumptions on the parameters Θ, A, B, and C were investigated, setting some elements to zero or giving various values to these parameters. It was found that the first-order SAD model [SAD(1)] was well able to capture the covariance structures simulated under all these different assumptions (results not shown). The similarity between these two approaches had already been pointed out in the univariate case for SAD(1) models and CP with an exponential correlation function (Jaffrezic et al. 2003). On the other hand, random regression models dealt poorly with all the different covariance structures considered here, even when a cubic polynomial was used (involving 36 parameters for the covariance structure).

Simulations with unstructured covariance models: To understand better the abilities and limitations of the different models, several patterns of covariance structures were investigated. To avoid favoring any of the methodologies, data were simulated with unstructured covariance matrices. A total of 2000 animals were considered with five observations for each trait. As focus was on the cross-correlation modeling, quite simple structures for the variances and correlations of both variables were chosen. Three examples are presented here.

In the first case, the data were generated using a cross-correlation that was stationary, symmetric, with quite high values. With regard to the likelihood value (see Table 1), a simple bivariate linear random regression model was found to be the most appropriate, followed by the bivariate CP models and then the SAD models (all models had about the same number of parameters: from 12 to 14). Estimated cross-correlations obtained with the unstructured model and the bivariate linear random regression model are presented in Figure 1.

In the second example, the cross-correlation was more complex. Although the correlations between the traits were still quite high, they were nonstationary and nonsymmetric. The bivariate quadratic random regression model did not converge and, on the other hand, the linear bivariate model was not able to deal adequately with this cross-correlation pattern. It was found for the character process model that the nonstationary extension, using only one extra parameter (parameter ν in Equation 6), considerably improved the fit as shown in
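The behavior of the cross-correlation function ρ12(t) = exp(-λ1 t) - exp(-λ2 t) is easy to check numerically; the sketch below uses illustrative rate values (not estimates from the paper) and verifies the stated peak location ln(λ2/λ1)/(λ2 - λ1):

```python
import numpy as np

# Cross-correlation function of Equation 11, with illustrative
# (not estimated) rate parameters lam2 > lam1 > 0.
def cross_corr(t, lam1, lam2):
    """Cross-correlation between the two traits at time t."""
    return np.exp(-lam1 * t) - np.exp(-lam2 * t)

lam1, lam2 = 0.5, 2.0

# Peak location claimed in the text: t* = ln(lam2/lam1)/(lam2 - lam1).
t_star = np.log(lam2 / lam1) / (lam2 - lam1)

# Numerical check that t* is indeed where the maximum occurs.
grid = np.linspace(0.0, 10.0, 2001)
t_num = grid[np.argmax(cross_corr(grid, lam1, lam2))]
print(round(t_star, 3), round(t_num, 3))
```

The function vanishes at t = 0 and at infinity, so the single interior maximum found on the grid should agree with the analytical expression.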
480 F. Jaffrezic, R. Thompson and S. D. Pletcher
Figure 1.—Estimated cross-correlations for example 1 of the simulation study for the unstructured model (US) and a bivariate linear random regression model (RR).
Table 1. The likelihood value was then higher than that for the second-order SAD model with the same number of parameters. Figure 2 gives the estimated cross-correlations obtained with the unstructured model and with the chosen bivariate CP model.

In the third example, the data were also generated with nonsymmetric and nonstationary cross-correlations, with lower values than those for the first two examples. The diagonal cross-correlations were lower for early ages and then increasing and decreasing for late ages. The likelihood value was higher for SAD(2) than for all the other models. It can be seen, however, in Figure 3, that this model was not able to adequately fit the diagonal cross-correlation terms. On the other hand, although the likelihood value was a little lower than that with the second-order SAD model, the character process model was better able to capture the diagonal cross-correlation pattern. These figures do show, however, that even for the chosen models, there is still scope for improving the fit, although this might be difficult while keeping the number of parameters reasonably low.

Figure 2.—Estimated cross-correlations for example 2 of the simulation study for the unstructured model (US) and the chosen bivariate CP model: quadratic polynomial used to model V(t), exponential function for ρ(t - s) with the nonstationary extension (Equation 6).

481 Genetic Analysis of Correlated Function-Valued Traits

Figure 3.—Estimated cross-correlations for example 3 of the simulation study, with data simulated with an unstructured covariance matrix. [US, unstructured covariance matrix; CP, quadratic polynomial used to model V(t), exponential function for ρ(t - s) with the nonstationary extension (Equation 6); SAD, second-order bivariate structured antedependence model; RR, linear random regression model.]

Empirical data—joint analysis of fecundity and mortality in Drosophila: Age-specific measurements of reproduction and mortality rates were obtained from 56 different recombinant inbred (RI) lines of D. melanogaster, which are expected to exhibit genetically based variation in longevity and reproduction (J. W. Curtsinger and A. A. Khazaeli, unpublished results). Age-specific measures of mortality and average female reproductive output were collected simultaneously from two replicate cohorts for each of 56 RI lines. Deaths were observed every day, while egg counts were made every other day. For both mortality and reproduction, the data were pooled into 11 5-day intervals for analysis. Mortality rates were log transformed and reproductive measures were square-root transformed so that the age-specific measures were approximately normally distributed.

Parameter estimates for the different methodologies were obtained with ASREML using the OWN function (Gilmour et al. 2002). Models were compared using the
TABLE 2
Likelihood values and BIC criterion (Schwarz 1978) for univariate and bivariate genetic analyses of fecundity and mortality in Drosophila

                        Genetic                  Environmental
                        Corr.      Var.          Corr.      Var.        NPCov   Log L   BIC

Univariate
  Mortality             Cauchy     Quad.         Cauchy     Lin.
  Fecundity             Exp. NS    Const.        Cauchy NS  Quad.       15      329.0   186.7
  Mortality             Cauchy NS  Quad.         Cauchy NS  Quad.
  Fecundity             Cauchy NS  Quad.         Cauchy NS  Quad.       20      337.2   175.6

Bivariate               Cauchy NS  Quad-Const.   Cauchy NS  Lin-Quad.   23      377.9   204.8
                        Cauchy NS  Quad.         Cauchy NS  Quad.       28      380.6   188.2
                        Exp. NS    Quad.         Cauchy NS  Quad.       28      370.2   177.8
                        Cauchy     Quad.         Cauchy     Quad.       26      352.9   168.2
                        Exp. NS    Quad.         Exp. NS    Quad.       28      354.6   162.2

In both cases the logarithms of the variances were modeled, such as ln v²(t) = a + bt + ct² and ln(V(t)) = A + Bt + Ct², with A, B, and C 2 × 2 symmetric matrices. Corr., correlation; Var., variance; Quad., quadratic; Lin., linear; Exp., exponential; Const., constant.
BIC criterion (Schwarz 1978; Jaffrezic and Pletcher 2000): BIC = ln L - 0.5 nc ln(N - p), where ln L is the REML likelihood value, nc is the number of covariance parameters in the model, p is the number of fixed effects, and N is the total number of observations. Standard likelihood-ratio tests could be used for nested models. Specific cases include testing if certain parameters in matrices V(t) or Θ are equal to zero. A nonparametric mean function was used for both traits (i.e., a separate mean was fitted for each distinct age in the data), which ensures a consistent estimate of the covariance structure (Diggle et al. 1994).

The best models chosen in the univariate analyses are given in the first part of Table 2. For the genetic part, a Cauchy correlation with quadratic variance was chosen for mortality and a nonstationary exponential correlation with a constant variance was chosen for fecundity. Many different correlation and variance functions were investigated for the bivariate analysis and the best ones regarding the likelihood value and BIC criterion are given in Table 2. In the bivariate model, the correlation function has to be the same for the two variables and was chosen here to be a nonstationary Cauchy correlation (with parameter ν of the nonstationary extension as in Equation 6). For the variance function, more flexibility can be achieved in the choice of the function by setting some parameters of matrices A, B, and C to zero. In the bivariate model, the chosen function was, as in the univariate case, quadratic for mortality and constant for fecundity. Estimates obtained for the variance and correlation functions for fecundity and mortality were very similar with the univariate and bivariate models (although their analytical forms were different, as shown in the methodology section). The main improvement of the bivariate model lies in its ability to model the cross-covariance structure. The likelihood value of the bivariate model (Log L = 377.9) was indeed much higher than that for the two univariate analyses (Log L = 329.0). Therefore, taking into account the correlation function between the two variables fits the actual process much better. Estimates obtained for the chosen bivariate model are given in Table 3 and the first graph of Figure 4 gives the genetic cross-correlation estimates. They were found to be negative at all ages, nonstationary and nonsymmetric. Fecundity and mortality were more strongly negatively correlated at a similar age (diagonal terms), and the correlation intensity decreased when ages became farther apart.

As they allow a simple and straightforward extension to the multivariate case, random regression models (RRM) are most often used for multivariate analyses of longitudinal data. They may not always, however, be the most appropriate methodology. In this example, for instance, the likelihood value was much higher for the character process approach (Log L = 377.9) than for a bivariate quadratic random regression model (Log L = 134.7), despite having far more parameters (42 for the RRM compared to 23 for the CP model). Moreover, increasing the order of the polynomials dramatically increases the number of parameters (for instance, from quadratic to cubic: 42 to 72 parameters).

Although the difference was not as important as for random regression models, the likelihood value was also higher, in this example, for the bivariate CP model than for a bivariate structured antedependence model (Jaffrezic et al. 2003; Log L = 322.8, 24 parameters).
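The BIC comparison described above can be sketched in a few lines; the observation and fixed-effect counts below are illustrative values chosen for the example, not taken from the analysis:

```python
import math

# Model-comparison criterion used in the text:
# BIC = ln L - 0.5 * nc * ln(N - p), where ln L is the REML
# log-likelihood, nc the number of covariance parameters,
# p the number of fixed effects and N the number of observations.
def bic(log_l, nc, n_obs, p_fixed):
    return log_l - 0.5 * nc * math.log(n_obs - p_fixed)

# Two hypothetical models: with this sign convention a HIGHER BIC
# is better, since the complexity penalty is subtracted.
# n_obs and p_fixed below are illustrative, not the paper's counts.
m1 = bic(log_l=377.9, nc=23, n_obs=1232, p_fixed=22)
m2 = bic(log_l=380.6, nc=28, n_obs=1232, p_fixed=22)
print(m1 > m2)
```

Note how the second model's slightly higher likelihood is outweighed by its five extra covariance parameters.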
TABLE 3
Parameter estimates (and standard errors) for the bivariate genetic analysis of fecundity and mortality in Drosophila with the best-fitting bivariate character process model, for the BIC criterion, given in Table 2

Parameters   Genetic         Environmental     Parameters   Genetic         Environmental
θ1           0.49 (0.22)     5.82 (2.45)       b1           14.91 (2.14)    0.04 (0.19)
θ2           1.20 (0.61)     18.17 (8.57)      b2           0.0             0.04 (0.70)
λ1           0.71 (0.44)     1.16 (0.30)       b3           0.0             2.22 (0.44)
λ2           0.18 (0.31)     2.32 (0.83)       c1           16.64 (2.19)    0.0
a1           2.71 (0.46)     0.92 (0.13)       c2           0.0             0.75 (0.56)
a2           1.99 (0.14)     2.40 (0.18)       c3           0.0             1.46 (0.36)
a3           0.59 (0.07)     0.41 (0.12)       ν            0.37 (0.14)     0.43 (0.11)

The variance functions are defined by ln(V(t)) = A + Bt + Ct², where t = age/10 and

    A = ( a1  a3 )
        ( a3  a2 )

and similarly for matrices B and C. Parameters θ1, θ2, λ1, and λ2 define matrix Θ as specified in appendix a for the Cauchy correlation function, and ν is the nonstationary parameter (Equation 6).
The estimated genetic cross-correlations obtained with the three methodologies are presented in Figure 4. Their patterns were found to be very different, even between the bivariate CP and SAD models, although there was only a small difference in their likelihood values. As the true genetic cross-correlations are not known, it is difficult, however, to know which pattern is the closest to reality and how much discrepancy still remains compared to the actual values.

To address these issues, a phenotypic analysis was performed on these data, which allows us to obtain estimates for an unstructured covariance matrix (22 × 22). This was not possible in the genetic study due to the very large number of parameters to be estimated. Estimated phenotypic cross-correlations obtained with the different models are presented in Figure 5 and the unstructured estimates were considered as the reference model. Once again, the four estimated patterns were found to be very different. As in the genetic analysis, the likelihood value was the highest for the character process model (Log L = 197.1, with a nonstationary Cauchy correlation function and quadratic V(t) function, with 14 parameters, BIC = 58.6), compared to a bivariate SAD(1) model (Log L = 183.8, with 12 parameters, BIC = 53.0), a bivariate SAD(2) model (Log L = 185.9, with 14 parameters, BIC = 47.4), and a quadratic bivariate random regression model (Log L = 67.7, 21 parameters, BIC = -97.7). The highest likelihood value, obtained here with the bivariate CP model, is still, however, quite far away from that of the unstructured model (Log L = 535.6). But as the number of parameters in the unstructured model is very large (253), its BIC value is extremely low (-522.4), and the best model with regard to the BIC criterion here, therefore, is the bivariate CP model.

To have a measure of the discrepancy between the estimated phenotypic cross-correlations and the unstructured estimates, the Vonesh concordance coefficient (Vonesh et al. 1996) was used, as presented by Jaffrezic and Pletcher (2000), considering the unstructured estimates as the correct values. The concordance coefficients were 0.77 for the CP model, 0.52 for the SAD model, and 0.73 for the RR model (a perfect fit being at 1.0). As shown with the likelihood value, the bivariate character process model fit best the phenotypic cross-correlation structure. On the other hand, the goodness of fit was found higher for the bivariate random regression model than for the structured antedependence model (0.73 compared to 0.52), although the likelihood value was much higher for the SAD model (Log L = 183.8) than for the RR model (Log L = 67.7). The SAD models were therefore in this case better able to model the covariance structure for each trait separately, as in univariate analysis, whereas the random regression models were better able to fit the cross-correlation structure. The choice of the model should therefore not be made regarding the likelihood value only, but also depends on the priorities of the study. In any case, in this particular study, the character process model was more appropriate than the other two methodologies.

Figure 5 shows, however, that the obtained cross-correlation patterns were still all quite different from the unstructured phenotypic estimates and that there is still, therefore, scope for improvement.

Figure 4.—Estimated genetic cross-correlations between fecundity and mortality obtained with the chosen CP model, a bivariate SAD(1) model, and a quadratic random regression model.

DISCUSSION

The character process model, originally proposed by Pletcher and Geyer (1999) to analyze function-valued traits, is based on a parametric modeling of the variance and correlation functions of a stochastic process. It models the covariance structure with a small number of interpretable parameters. A special case of these models has been independently proposed in the statistical literature, namely the Ornstein-Uhlenbeck process (Taylor et al. 1994). It is equivalent to a character process model with an exponential correlation function and constant variances and represents a continuous time extension of a first-order autoregressive model.

We proposed an extension of the univariate character process model to the multivariate case. Our goal was to develop a method of analysis for two or more correlated function-valued traits that would retain all the desirable properties of the univariate character process approach and simultaneously allow a parametric modeling of the cross-covariance structure. The proposed extension was based on an idea presented by Sy et al. (1997) for the Ornstein-Uhlenbeck process and was generalized to other kinds of correlation functions, including those that are nonstationary.

Models were presented here in the bivariate case, but extension to the analysis of more than two correlated function-valued traits is straightforward and accomplished by increasing the dimensions of matrices V and Θ in accord with the number of traits analyzed.
Figure 5.—Estimated phenotypic cross-correlations between fecundity and mortality obtained with the unstructured model (US); a character process model CP Quad-Cauchy-NS: quadratic polynomial used to model V(t), Cauchy function for ρ(t - s) with the nonstationary extension; a bivariate SAD(1) model; and a quadratic random regression model.
The first part of the simulation study highlighted the similarities between the bivariate CP models with an exponential correlation and bivariate first-order SAD models (Jaffrezic et al. 2003), as in the univariate case. Further differences between the two approaches appear when higher orders of antedependence are considered or when other parametric correlation functions are used in the CP models.

It was found in the second part of the simulation study that the choice of the most appropriate methodology is highly dependent on the covariance structure of the data and that the three models (random regression, structured antedependence, or character process) can be worthwhile depending on the particular biological phenomenon studied. When the cross-covariance structure is symmetric and stationary with quite high correlations, the most appropriate model to use might be a simple random regression model. When the cross-correlation structure becomes more complex it should be either structured antedependence or character process models, especially because the number of parameters required in a more complex random regression model
dramatically increases. For the Drosophila analysis, the bivariate character process model proved to be the most appropriate.

The multivariate extension of the character process models represents a flexible and powerful technique for the genetic analysis of two or more function-valued traits. Although the observed measurements are available only on a discrete time scale, this approach can model the fact that the underlying process is continuous and therefore can deal with highly unbalanced data. As variance parameters are assumed to change with time, other environmental factors of heterogeneity could be included in the variance modeling, as suggested by Foulley and Quaas (1995). Further research might extend these multivariate models to include the genetic analysis of nonnormally distributed traits, as studied by Pletcher and Jaffrezic (2002) in the univariate case.

We are most grateful to Jean-Louis Foulley, William G. Hill, Nancy Heckman, Jay Beder, and two anonymous referees for very interesting comments and ideas. Thanks go to J. Curtsinger and A. Khazaeli for generously providing published and unpublished data.

LITERATURE CITED

DeRisi, J. L., V. R. Iyer and P. O. Brown, 1997 Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278: 680-686.
Diggle, P. J., K. Y. Liang and S. L. Zeger, 1994 Analysis of Longitudinal Data. Oxford University Press, Oxford.
Foulley, J. L., and R. L. Quaas, 1995 Heterogeneous variances in Gaussian linear mixed models. Genet. Sel. Evol. 27: 211-228.
Gabriel, K. R., 1962 Ante-dependence analysis of an ordered set of variables. Ann. Math. Stat. 33: 201-212.
Gilmour, A. R., B. J. Gogel, B. R. Cullis, S. J. Welham and R. Thompson, 2002 ASREML User Guide Release 1.0. VSN International, Hemel Hempstead, UK.
Jaffrezic, F., and S. D. Pletcher, 2000 Statistical models for estimating the genetic basis of repeated measures and other function-valued traits. Genetics 156: 913-922.
Jaffrezic, F., I. M. S. White, R. Thompson and P. M. Visscher, 2002 Contrasting models for lactation curve analysis. J. Dairy Sci. 84: 968-975.
Jaffrezic, F., R. Thompson and W. G. Hill, 2003 Structured antedependence models for genetic analysis of multivariate repeated measures in quantitative traits. Genet. Res. 82: 55-65.
Meuwissen, T. H. E., and M. H. Pool, 2001 Autoregressive versus random regression test-day models for prediction of milk yields. Interbull Bull. 27: 172-178.
Meyer, K., 2001 Estimating genetic covariance functions assuming a parametric correlation structure for environmental effects. Genet. Sel. Evol. 33: 557-585.
Nunez-Anton, V., and D. L. Zimmerman, 2000 Modeling nonstationary longitudinal data. Biometrics 56: 699-705.
Pletcher, S. D., and C. J. Geyer, 1999 The genetic analysis of age-dependent traits: modeling a character process. Genetics 153: 825-833.
Pletcher, S. D., and F. Jaffrezic, 2002 Generalized character process models: estimating the genetic basis of traits that cannot be observed and that change with age or environmental conditions. Biometrics 58: 157-162.
Pletcher, S. D., D. Houle and J. W. Curtsinger, 1998 Age-specific properties of spontaneous mutations affecting mortality in Drosophila melanogaster. Genetics 148: 287-303.
Pletcher, S. D., S. J. Macdonald, R. Marguerie, U. Certa, S. C. Stearns et al., 2002 Genome-wide transcript profiles in aging and calorically restricted Drosophila melanogaster. Curr. Biol. 12(9): 712-723.
Schwarz, G., 1978 Estimating the dimension of a model. Ann. Stat. 6: 461-464.
Sy, J. P., J. M. G. Taylor and W. G. Cumberland, 1997 A stochastic model for the analysis of bivariate longitudinal AIDS data. Biometrics 53: 542-555.
Taylor, J. M. G., W. G. Cumberland and J. P. Sy, 1994 A stochastic model for analysis of longitudinal AIDS data. J. Am. Stat. Assoc. 89: 727-736.
Vonesh, E., V. Chinchilli and K. Pu, 1996 Goodness-of-fit in generalized nonlinear mixed-effects models. Biometrics 52: 572-587.

Communicating editor: M. K. Uyenoyama

APPENDIX A: IMPLEMENTATION

As suggested by Sy et al. (1997), to calculate the matrix exponentiation used in the correlation functions, diagonalization of matrix Θ is used,

    Θ = U Λ U^(-1),    (A1)

where Λ is a diagonal matrix of the distinct eigenvalues λ1 and λ2 of Θ, and U is a 2 × 2 matrix whose columns are the right eigenvectors. The matrix exponential is then written and evaluated as

    e^(-Θ(t-s)) = U e^(-Λ(t-s)) U^(-1).    (A2)

For the exponential correlation,

    exp(-Θ(t - s)) = ( θ1  θ2 ) ( e^(-λ1(t-s))       0       ) ( θ1  θ2 )^(-1)
                     (  1   1 ) (       0       e^(-λ2(t-s)) ) (  1   1 )          (A3)

where parameters θ1 and θ2 are the elements of matrix U (Sy et al. 1997). The Gaussian is similar, with (t - s) being replaced by (t - s)². For the Cauchy correlation, taking advantage of the fact that (I + Θ(t - s)²)^(-1) = U(I + Λ(t - s)²)^(-1)U^(-1), it follows that

    (I + Θ(t - s)²)^(-1) = ( θ1  θ2 ) ( 1/(1 + λ1(t-s)²)          0         ) ( θ1  θ2 )^(-1)
                           (  1   1 ) (         0         1/(1 + λ2(t-s)²) ) (  1   1 )

For the variance functions V(t), an eigenvalue decomposition, ln V(t) = P(t)Δ(t)P'(t), can also be used. It follows that V(t)^(1/2) = P(t)exp((1/2)Δ(t))P'(t).

Parameter estimations were obtained using the OWN function of ASREML (Gilmour et al. 2002), which requires us to provide the first derivatives of the covariance matrix with respect to each parameter. The nonstationary parameter ν of Equation 6 is obtained at the same time as the other parameters of the covariance matrix.

APPENDIX B: PROPERTIES OF THE DEFINED BIVARIATE CHARACTER PROCESS COVARIANCE FUNCTION

When J times of measurement are available for two variables Y1 and Y2, and for each individual i, observations are ordered as yi = (yi11, yi21, . . . , yi1J, yi2J). The
whole genetic covariance matrix G of dimension (2J × 2J) can be written as G = VRV'. By construction (Equation 5), matrix G will be symmetric. Matrix V is block diagonal: V = (Vj)j=1,J, where the Vj are 2 × 2 matrices defined by Vj = (V(tj))^(1/2), where ln V(tj) = A + Btj + Ctj², or is specified as in Equation 10. In both cases, matrices Vj, for j = 1, . . . , J, are positive definite. Matrix R is a 2J × 2J symmetric matrix defined, for (i, j = 1, . . . , J), by R(2(i - 1) + 1:2i, 2(j - 1) + 1:2j) = Rij, where Rij = exp(-Θ(ti - tj)), 1 ≤ j ≤ i, and Rji = R'ij, if an exponential function is considered. In this case, matrix Θ is defined as for the bivariate Ornstein-Uhlenbeck process (Sy et al. 1997) and therefore satisfies the positive definiteness property. When considering other functions as proposed in the univariate case by Pletcher and Geyer (1999), such as Gaussian or Cauchy, the property is maintained. Therefore, the proposed function for the bivariate CP model satisfies the theoretical requirements of a covariance function as it is symmetric and positive definite.
GAUSSIAN STOCHASTIC PROCESSES
S. Chaturvedi
School of Physics, University of Hyderabad
Hyderabad  500134, India
Introduction
Among all possible stochastic processes, Gaussian stochastic processes constitute a very important class. These occur in many areas of physics. A historically important example of a Gaussian stochastic process is that of Brownian motion. The intensity of the light emitted by a thermal source is another example of such a process. The main reason why Gaussian stochastic processes have been studied so extensively is that they are completely specified by the first two moments. This makes them particularly easy to handle.

We shall begin our discussion on Gaussian stochastic processes by studying Gaussian random variables. This, as we shall see later, will enable us to define Gaussian stochastic processes and to discuss some of their important properties.
Gaussian Random Variables

Let us briefly recapitulate what a Gaussian random variable is. A random variable X is defined by specifying

(i) the range of values x it can take and
(ii) a probability distribution over this range.

A random variable is said to be Gaussian if the range of values it can take extends from -∞ to +∞ and if the probability distribution over this range is a Gaussian distribution,

    P(x) = (2πσ²)^(-1/2) exp[-(x - <X>)²/(2σ²)].    (1)

If instead of a single variable we have a vector X having n components, then (1) generalizes to

    P(x1, ..., xn) = ((det A)^(1/2)/(2π)^(n/2)) exp[-(1/2)(x - <X>)^T A (x - <X>)],    (2)

where A is a positive definite symmetric matrix. The probability distribution (2) is known as a multivariate Gaussian distribution. We shall now discuss some of its properties.
(a) Let us first check that <X> appearing on the R.H.S. of (2) is indeed the mean value of X and that the distribution is correctly normalised:

    <X> = ((det A)^(1/2)/(2π)^(n/2)) ∫ dx x exp[-(1/2)(x - <X>)^T A (x - <X>)]

        = ((det A)^(1/2)/(2π)^(n/2)) ∫ dy (y + <X>) exp[-(1/2) y^T A y].

Since A is a symmetric matrix, it can be diagonalized by an orthogonal matrix S:

    S^T A S = Λ,    S^T S = 1.

Putting y = Sz in the expression above and making use of the fact that the Jacobian of the transformation is unity, we obtain

    <X> = ((det A)^(1/2)/(2π)^(n/2)) ∫ dz (Sz + <X>) exp[-(1/2) z^T Λ z].

The first term, being odd in z, vanishes, so that

    <X> = <X> ((det A)^(1/2)/(2π)^(n/2)) ∫ dz1 ... dzn exp[-(1/2) Σi λi zi²]

        = <X> ((det A)^(1/2)/(2π)^(n/2)) (2π)^(n/2)/(λ1 ... λn)^(1/2) = <X>.

Here the λi are the eigenvalues of A, so that det A = λ1 ... λn. This also shows that (2) is correctly normalised.

[Stochastic Processes: Formalism and Applications, Lecture Notes in Physics, 1983, Vol. 184, pp. 19-29]
(b) We now want to show that

    <(Xi - <Xi>)(Xj - <Xj>)> = (A^(-1))ij.    (3)

Indeed,

    <(Xi - <Xi>)(Xj - <Xj>)> = ((det A)^(1/2)/(2π)^(n/2)) ∫ dy yi yj exp[-(1/2) y^T A y]

        = ((det A)^(1/2)/(2π)^(n/2)) ∫ dz (Sz)i (Sz)j exp[-(1/2) z^T Λ z]

        = Σk Sik (1/λk) Sjk = (S Λ^(-1) S^T)ij = (A^(-1))ij.

We thus see that a Gaussian distribution is completely determined by the mean values of the variables and the second moments.
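Property (b) is easy to verify by simulation; the matrix A below is an arbitrary positive definite example, not one appearing in the text:

```python
import numpy as np

# For the multivariate Gaussian (2) with matrix A, the centered
# second moments should equal the elements of A^{-1}, as in (3).
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
cov = np.linalg.inv(A)  # the second moments predicted by (3)

# Draw zero-mean samples with covariance A^{-1} and compare
# the empirical second moments <Xi Xj> with A^{-1}.
samples = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
emp = samples.T @ samples / len(samples)
print(np.max(np.abs(emp - cov)))
```

With 200,000 draws the empirical moments agree with A^(-1) to a few parts in a thousand.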
Henceforth we shall assume for convenience that the variables Xi have zero mean, i.e. <X> = 0. All the results for the case <X> ≠ 0 can easily be obtained by replacing X in the following by X - <X>.

(c) We now establish a very useful property of the multivariate Gaussian distribution:

    <Xi f(X)> = Σj <Xi Xj> <∂f(X)/∂Xj>,    (4)

where f(X) is a polynomial in the Xi's.
To prove (4) we rewrite it using (3) as

    <Xi f(X)> = Σj (A^(-1))ij <∂f(X)/∂Xj>,

or

    Σj Aij <Xj f(X)> = <∂f(X)/∂Xi>.

Now

    Σj Aij <Xj f(X)> = ((det A)^(1/2)/(2π)^(n/2)) ∫ dx f(x) Σj Aij xj exp[-(1/2) Σl,m xl Alm xm]

        = -((det A)^(1/2)/(2π)^(n/2)) ∫ dx f(x) (∂/∂xi) exp[-(1/2) x^T A x],

and on integrating by parts

    Σj Aij <Xj f(X)> = <∂f(X)/∂Xi>.

Hence the proof.
(d) Repeated use of (4) enables us to show that all the even moments of a Gaussian distribution with zero mean factorize pairwise into the second moments. (The odd moments of such a distribution of course vanish, as is easily seen.) Consider for instance the fourth moment <Xi Xj Xk Xl>. From (4) we have

    <Xi Xj Xk Xl> = Σm <Xi Xm> <∂(Xj Xk Xl)/∂Xm>

        = <Xi Xj><Xk Xl> + <Xi Xk><Xj Xl> + <Xi Xl><Xj Xk>.    (5)

The R.H.S. of (5) can be written compactly as

    <Xi Xj Xk Xl> = Σ_pairs <Xp Xq><Xr Xs>,

where the indices p, q, r, s are the same as i, j, k, l and the summation extends over all different ways in which i, j, k, l can be divided into pairs. Proceeding in a similar fashion, we have, in general,

    <Xi Xj Xk Xl ...> = Σ_pairs <Xp Xq><Xr Xs> ...    (6)

Thus the even moments of a Gaussian with zero mean factorise as in (6). For a moment of order 2k, there are (2k)!/(2^k k!) terms on the R.H.S. of (6). Conversely, one can show that if the moments of a probability distribution factorise as in (6) then the distribution is a Gaussian. It therefore follows that (6) is both necessary and sufficient for a distribution to be a Gaussian and is called the moment theorem for Gaussian distributions.
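The fourth-moment factorization (5) can be checked by Monte Carlo; the covariance matrix below is an arbitrary example:

```python
import numpy as np

# Check of (5): for a zero-mean Gaussian vector, <Xi Xj Xk Xl>
# equals the sum over the three pairings of second moments.
rng = np.random.default_rng(1)
C = np.array([[1.0, 0.3, 0.1],
              [0.3, 2.0, 0.4],
              [0.1, 0.4, 1.5]])
x = rng.multivariate_normal(np.zeros(3), C, size=400_000)

i, j, k, l = 0, 1, 2, 1
lhs = np.mean(x[:, i] * x[:, j] * x[:, k] * x[:, l])
rhs = C[i, j] * C[k, l] + C[i, k] * C[j, l] + C[i, l] * C[j, k]
print(round(lhs, 3), round(rhs, 3))
```

Repeating this for other index choices, or for sixth moments with the fifteen pairings predicted by (6), gives the same agreement up to sampling error.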
(e) A convenient way of calculating the moments of a probability distribution is to work out its characteristic function

    C(h) = <exp(i h·X)> = Σ_{m1,m2,...=0} ((ih1)^m1/m1!) ((ih2)^m2/m2!) ... <X1^m1 X2^m2 ...>.    (7)

Given the characteristic function, an arbitrary moment can be worked out by differentiating it an appropriate number of times w.r.t. the hi's and then setting h = 0. For a Gaussian distribution with zero mean, C(h) has a very simple form,

    C(h) = exp[-(1/2) h^T A^(-1) h],    (8)

as is easily seen:

    C(h) = ((det A)^(1/2)/(2π)^(n/2)) ∫ dx exp[-(1/2) x^T A x + i h^T x]

         = ((det A)^(1/2)/(2π)^(n/2)) ∫ dx exp[-(1/2)(x - i A^(-1)h)^T A (x - i A^(-1)h) - (1/2) h^T A^(-1) h]

         = exp[-(1/2) h^T A^(-1) h] ((det A)^(1/2)/(2π)^(n/2)) ∫ dy exp[-(1/2) y^T A y]

         = exp[-(1/2) h^T A^(-1) h].

Using (3) we may write (8) as

    C(h) = exp[-(1/2) Σij hi <Xi Xj> hj].    (9)

Similarly, for a Gaussian with <X> ≠ 0, C(h) is found to be

    C(h) = exp[-(1/2) h^T A^(-1) h + i h^T <X>]
         = exp[i Σi hi <Xi> - (1/2) Σij hi <<Xi Xj>> hj].    (10)
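Equation (8) can likewise be verified by sampling; A and h below are arbitrary illustrative values:

```python
import numpy as np

# Monte Carlo check of (8): for a zero-mean Gaussian with matrix A,
# the characteristic function is exp(-h^T A^{-1} h / 2).
rng = np.random.default_rng(2)
A = np.array([[1.5, 0.4],
              [0.4, 1.0]])
h = np.array([0.7, -0.3])

x = rng.multivariate_normal(np.zeros(2), np.linalg.inv(A), size=300_000)
empirical = np.mean(np.exp(1j * x @ h))  # sample average of exp(i h.X)
exact = np.exp(-0.5 * h @ np.linalg.inv(A) @ h)
print(abs(empirical - exact) < 0.01)
```

The imaginary part of the sample average is pure sampling noise, as expected for a symmetric zero-mean distribution.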
Another useful quantity is the logarithm of the characteristic function, the cumulant generating function:

    K(h) = ln C(h) = Σ_{m1,m2,...=0} ((ih1)^m1/m1!) ((ih2)^m2/m2!) ... <<X1^m1 X2^m2 ...>>.

For a Gaussian we find that the cumulant generating function

    K(h) = i h^T <X> - (1/2) h^T A^(-1) h    (11)

         = i Σi hi <Xi> - (1/2) Σij hi <<Xi Xj>> hj    (12)

is at most quadratic in the auxiliary variables hi and hence all cumulants higher than the second vanish.
We mention here a theorem due to Marcinkiewicz [13] which states that the cumulant generating function of a probability distribution cannot be a polynomial in the hi's of degree greater than 2. In other words, either K(h) is at best quadratic in h or contains all powers of h. This in turn implies that either all but the first two cumulants of a probability distribution vanish or there are an infinite number of nonvanishing cumulants.
With this background on Gaussian random variables we now go over to defining Gaussian stochastic processes.

Gaussian Stochastic Processes

A stochastic process is a function of two variables: t, the time, and λ, a random variable,

    X_λ(t) = f(t, λ).    (13)

We may look upon (13) in two ways:

(i) For each value the random variable λ takes, X_λ(t) is just an ordinary function of time and is called a realisation of the stochastic process X_λ(t). The stochastic process is thus an ensemble of all such realisations.

(ii) For a fixed t, X_λ(t) is a stochastic variable, being a function of the random variable λ. The stochastic process X_λ(t) may be regarded as a continuum of random variables, one for each t.

From the second point of view it therefore logically follows that in order to define a stochastic process completely we need to specify an infinite number of joint probabilities,

    P1(x, t)
    P2(x2, t2; x1, t1)
    P3(x3, t3; x2, t2; x1, t1)
    ...............

which say what is the probability that X_λ(t) has a value x at t, what is the probability that X_λ(t) has a value x1 at t1 and x2 at t2, etc. Given this infinite (overcomplete) set of joint probabilities the stochastic process is completely defined.

A stochastic process is said to be Gaussian if all these joint probabilities are Gaussian:
    Pn(xn, tn; . . . ; x2, t2; x1, t1)
        = ((det A)^(1/2)/(2π)^(n/2)) exp[-(1/2) Σij (xi - <X(ti)>) Aij (xj - <X(tj)>)],    (14)

where the matrix Aij is the inverse of the matrix with elements

    (A^(-1))ij = <(X(ti) - <X(ti)>)(X(tj) - <X(tj)>)> = <<X(ti) X(tj)>>    (15)
in analogy with (3). <<X(ti) X(tj)>> is known as the autocorrelation function. Thus a Gaussian stochastic process is completely characterized by <X(t)> and the autocorrelation function. All the formulae we derived previously for Gaussian distributions can now be generalised to stochastic processes by replacing partial derivatives by functional derivatives, summation over i by integration over t, etc. We list them below.
(a) Novikov's theorem: For a functional $f[X]$ of the stochastic process $X(t)$ we have
$$\langle X(t)\, f[X]\rangle = \int dt'\, \langle X(t)X(t')\rangle \Big\langle \frac{\delta f[X]}{\delta X(t')}\Big\rangle \qquad (16)$$
(b) Moment theorem: For a Gaussian stochastic process with $\langle X(t)\rangle = 0$, the odd moments vanish and the even moments factorise pairwise:
$$\langle X(t_i)X(t_j)\cdots\rangle = \sum_{\text{pairs}} \langle X(t_p)X(t_q)\rangle\, \langle X(t_r)X(t_s)\rangle \cdots \qquad (17)$$
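The pairwise factorisation (17) is easy to probe numerically. The following sketch, assuming only `numpy` is available (the covariance matrix is an arbitrary illustrative choice, not from the text), estimates a fourth moment of a zero-mean Gaussian vector by Monte Carlo and compares it with the sum over pairings:

```python
import numpy as np

# Monte Carlo check of the Gaussian moment theorem, Eq. (17):
# for zero-mean Gaussians, E[X1 X2 X3 X4] = C12*C34 + C13*C24 + C14*C23.
rng = np.random.default_rng(0)
C = np.array([[1.0, 0.6, 0.3, 0.1],
              [0.6, 1.0, 0.5, 0.2],
              [0.3, 0.5, 1.0, 0.4],
              [0.1, 0.2, 0.4, 1.0]])   # an arbitrary valid covariance matrix
X = rng.multivariate_normal(np.zeros(4), C, size=1_000_000)

mc = np.mean(X[:, 0] * X[:, 1] * X[:, 2] * X[:, 3])          # estimated 4th moment
wick = C[0, 1]*C[2, 3] + C[0, 2]*C[1, 3] + C[0, 3]*C[1, 2]   # sum over pair partitions
print(mc, wick)   # the two agree to Monte Carlo accuracy
```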
(c) The characteristic functional
$$C[h] = \Big\langle \exp\Big[i\int dt\, h(t)X(t)\Big]\Big\rangle = \sum_{m=0}^{\infty} \frac{i^m}{m!} \int dt_1 \cdots \int dt_m\, h(t_1)\cdots h(t_m)\, \langle X(t_1)\cdots X(t_m)\rangle \qquad (18)$$
for a Gaussian stochastic process with zero mean is given by
$$C[h] = \exp\Big[-\frac{1}{2}\int dt_1 \int dt_2\, h(t_1)h(t_2)\, \langle X(t_1)X(t_2)\rangle\Big] \qquad (19)$$
and for $\langle X(t)\rangle \neq 0$ by
$$C[h] = \exp\Big[i\int dt_1\, h(t_1)\langle X(t_1)\rangle - \frac{1}{2}\int dt_1 \int dt_2\, h(t_1)h(t_2)\, \langle\langle X(t_1)X(t_2)\rangle\rangle\Big] \qquad (20)$$
where
$$\langle\langle X(t_1)X(t_2)\rangle\rangle = \big\langle \big(X(t_1) - \langle X(t_1)\rangle\big)\big(X(t_2) - \langle X(t_2)\rangle\big)\big\rangle \qquad (21)$$
(d) The cumulant generating functional
$$K[h] = \ln C[h] = \sum_{m=0}^{\infty} \frac{i^m}{m!} \int dt_1 \cdots \int dt_m\, h(t_1)\cdots h(t_m)\, \langle\langle X(t_1)\cdots X(t_m)\rangle\rangle \qquad (22)$$
for a Gaussian stochastic process with $\langle X(t)\rangle = 0$ reads
$$K[h] = -\frac{1}{2}\int dt_1 \int dt_2\, h(t_1)h(t_2)\, \langle X(t_1)X(t_2)\rangle \qquad (23)$$
and for $\langle X(t)\rangle \neq 0$ acquires, by (20), the additional term $i\int dt_1\, h(t_1)\langle X(t_1)\rangle$, implying that all the cumulants of a Gaussian stochastic process higher than the second vanish. The Marcinkiewicz theorem holds for stochastic processes as well.
Of special interest to physicists and mathematicians is the class of stochastic processes known as Markov processes [4]. A Markov process is fully determined by a single-time distribution $P_1(x,t)$ and a conditional probability defined as
$$P(x_1,t_1 \mid x_2,t_2) \equiv \frac{P_2(x_1,t_1;\,x_2,t_2)}{P_1(x_2,t_2)} \qquad (24)$$
satisfying
(i) the Chapman-Kolmogorov equation
$$P(x_3,t_3 \mid x_1,t_1) = \int dx_2\, P(x_3,t_3 \mid x_2,t_2)\, P(x_2,t_2 \mid x_1,t_1) \quad \text{for } t_3 > t_2 > t_1 \qquad (25)$$
and
(ii)
$$P_1(x_2,t_2) = \int dx_1\, P(x_2,t_2 \mid x_1,t_1)\, P_1(x_1,t_1) \qquad (26)$$
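For a Gaussian transition density with variance $t - t_0$ (the Wiener kernel met later in these notes), the Chapman-Kolmogorov equation (25) can be verified by direct numerical integration over the intermediate point. A minimal sketch in Python; the grid and time points are arbitrary illustrative choices:

```python
import numpy as np

# Check the Chapman-Kolmogorov equation (25) for the Gaussian transition
# density p(x,t | x0,t0) = exp(-(x-x0)^2 / 2(t-t0)) / sqrt(2*pi*(t-t0)).
def p(x, t, x0, t0):
    dt = t - t0
    return np.exp(-(x - x0)**2 / (2.0*dt)) / np.sqrt(2.0*np.pi*dt)

t1, t2, t3 = 0.0, 0.7, 1.5
x1, x3 = 0.3, -0.8
x2 = np.linspace(-12.0, 12.0, 4001)   # grid for the intermediate point
dx = x2[1] - x2[0]

lhs = p(x3, t3, x1, t1)                                    # direct transition
rhs = np.sum(p(x3, t3, x2, t2) * p(x2, t2, x1, t1)) * dx   # via intermediate point
print(lhs, rhs)   # the two sides agree to grid accuracy
```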
Another class of stochastic processes of special relevance to physics is the stationary processes. A stochastic process is stationary if all the joint probabilities are invariant under a shift in time:
$$P_n(x_n,t_n+\tau;\ldots;x_2,t_2+\tau;\,x_1,t_1+\tau) = P_n(x_n,t_n;\ldots;x_2,t_2;\,x_1,t_1) \qquad (27)$$
This necessarily implies that the single-time probability $P_1(x,t)$ is independent of time. Equation (27) in turn implies that
$$\langle X(t_n+\tau)\cdots X(t_2+\tau)\,X(t_1+\tau)\rangle = \langle X(t_n)\cdots X(t_2)\,X(t_1)\rangle \qquad (28)$$
Having thus defined these three important classes of stochastic processes, viz. Gaussian, Markovian and stationary stochastic processes, a natural question to ask is the following: among all Gaussian stochastic processes, what characterizes those which have the additional attributes of being stationary and Markovian? The answer to this question is provided by Doob's theorem:
A stationary Gaussian process is Markovian only if the auto-correlation function is an exponential,
$$\langle\langle X(t_1)X(t_2)\rangle\rangle \propto \exp\big[-\gamma\,|t_1-t_2|\big] \qquad (29)$$
(For a multicomponent stochastic process, (29) is to be replaced by its obvious generalisation
$$\langle\langle X(t_1)X^T(t_2)\rangle\rangle \propto \exp\big[-\Gamma\,|t_1-t_2|\big] \qquad (30)$$
where $\Gamma$ is a constant matrix.)
We now briefly outline the proof of this important theorem.
For a Gaussian stochastic process the joint probabilities have the form given in (14). (For simplicity we shall consider Gaussian processes with $\langle X(t)\rangle = 0$.) Substituting for $P_2(x_1,t_1;x_2,t_2)$ and $P_1(x_2,t_2)$ in (24), we find that the conditional probability for a Gaussian process has the following general form:
$$P(x_1,t_1 \mid x_2,t_2) = \frac{1}{\sqrt{2\pi\,\sigma^2(t_1)\big(1-\rho^2(t_1,t_2)\big)}}\, \exp\Big[-\frac{1}{2\,\sigma^2(t_1)\big(1-\rho^2(t_1,t_2)\big)}\Big(x_1 - \frac{\rho(t_1,t_2)\,\sigma(t_1)}{\sigma(t_2)}\,x_2\Big)^2\Big] \qquad (31)$$
where
$$\sigma^2(t) = \langle X(t)X(t)\rangle \qquad (32)$$
and
$$\rho(t_1,t_2) = \frac{\langle X(t_1)X(t_2)\rangle}{\sigma(t_1)\,\sigma(t_2)} \qquad (33)$$
$\rho(t_1,t_2)$ is known as the correlation coefficient.
From (31) it follows that the conditional average of $X(t)$ at time $t_1$, given that it had the value $x_3$ at time $t_3$,
$$\langle X(t_1)\rangle_{X(t_3)=x_3} \equiv \int dx_1\, x_1\, P(x_1,t_1 \mid x_3,t_3) \qquad (34)$$
is given by
$$\langle X(t_1)\rangle_{X(t_3)=x_3} = \frac{\rho(t_1,t_3)\,\sigma(t_1)}{\sigma(t_3)}\, x_3 \qquad (35)$$
For a Gauss-Markov process we have, on using (35) and the Chapman-Kolmogorov equation (25), for $t_1 \geq t_2 \geq t_3$,
$$\langle X(t_1)\rangle_{X(t_3)=x_3} = \int dx_1\, x_1\, P(x_1,t_1 \mid x_3,t_3) = \int\!\!\int dx_1\, dx_2\, x_1\, P(x_1,t_1 \mid x_2,t_2)\, P(x_2,t_2 \mid x_3,t_3)$$
$$= \frac{\rho(t_1,t_2)\,\sigma(t_1)}{\sigma(t_2)} \int dx_2\, x_2\, P(x_2,t_2 \mid x_3,t_3) = \frac{\rho(t_1,t_2)\,\sigma(t_1)}{\sigma(t_2)}\cdot \frac{\rho(t_2,t_3)\,\sigma(t_2)}{\sigma(t_3)}\, x_3 \qquad (36)$$
From (35) and (36) we have
$$\rho(t_1,t_3) = \rho(t_1,t_2)\,\rho(t_2,t_3), \quad t_1 \geq t_2 \geq t_3 \qquad (37)$$
Thus we find that a necessary condition for a Gaussian process to be Markovian is that the correlation coefficients satisfy (37). In fact this condition turns out to be both necessary and sufficient [5].
Let us now consider a stationary Gaussian process. Stationarity implies that $\sigma(t)$ is independent of time and that $\langle X(t_1)X(t_2)\rangle$, and hence $\rho(t_1,t_2)$, depends only on $t_1 - t_2$. From (37) it follows that for such a process to be Markovian we must have
$$\rho(t_1-t_3) = \rho(t_1-t_2)\,\rho(t_2-t_3) \qquad (38)$$
This functional equation is satisfied only if $\rho(t_1-t_3)$ is an exponential,
$$\rho(t_1-t_3) = \exp\big[-\gamma\,(t_1-t_3)\big]$$
i.e.
$$\langle X(t_1)X(t_2)\rangle \propto \exp\big[-\gamma\,|t_1-t_2|\big] \qquad (39)$$
Hence the proof.
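The functional equation at the heart of the proof, $\rho(t_1-t_3) = \rho(t_1-t_2)\,\rho(t_2-t_3)$, can be checked directly: an exponential correlation satisfies it identically, while other decay laws fail. A small sketch; the decay rate 0.7 and the time points are arbitrary:

```python
import math

# The functional equation (38) behind Doob's theorem: an exponential
# correlation satisfies rho(t1-t3) = rho(t1-t2)*rho(t2-t3) for every
# intermediate t2, while a non-exponential one (Gaussian shape) does not.
def check(rho, t1=2.0, t2=1.2, t3=0.5):
    return rho(t1 - t3) - rho(t1 - t2) * rho(t2 - t3)

exp_corr   = lambda s: math.exp(-0.7 * s)      # exponential, gamma = 0.7
gauss_corr = lambda s: math.exp(-0.7 * s * s)  # non-exponential

print(check(exp_corr))    # 0 up to round-off: consistent with Markovian
print(check(gauss_corr))  # nonzero: non-Markovian by Doob's theorem
```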
Examples of Gaussian Stochastic Processes
In conclusion we list some important Gaussian stochastic processes which one frequently encounters in physics. The list contains examples of Gaussian stochastic processes which have either the Markov property or stationarity, or both, or neither.
1. Gaussian white noise: The Gaussian "stochastic process" $\xi(t)$ characterized by
$$\langle \xi(t)\rangle = 0 \qquad (40)$$
$$\langle \xi(t)\,\xi(t')\rangle = \delta(t-t') \qquad (41)$$
is usually referred to as Gaussian white noise. Such a stochastic process was first introduced by Langevin in the context of Brownian motion. Gaussian white noise is not a stochastic process in a strict mathematical sense. However, in physics it is often used as a model for very rapid fluctuations.
2. Wiener process: The Wiener process $W(t)$ is an example of a Gaussian, Markovian, non-stationary stochastic process and is characterised by
$$\langle W(t)\rangle = 0 \qquad (42)$$
$$\langle W(t)W(t')\rangle = \min(t,t') \qquad (43)$$
That it is a non-stationary process is clear from (43). From (43) it also follows that $\sigma^2(t)$ and $\rho(t_1,t_2)$ for this process are given by
$$\sigma^2(t) = \langle W(t)W(t)\rangle = t \qquad (44)$$
$$\rho(t_1,t_2) = \frac{\langle W(t_1)W(t_2)\rangle}{\sigma(t_1)\,\sigma(t_2)} = \sqrt{t_2/t_1}\,; \quad t_1 > t_2 \qquad (45)$$
The single-time probability $P_1(w,t)$ is therefore given by
$$P_1(w,t) = \frac{1}{\sqrt{2\pi t}}\, \exp\Big[-\frac{w^2}{2t}\Big] \qquad (46)$$
Substituting from (44) and (45) in (31), we find that the conditional probability $P(w_1,t_1 \mid w_2,t_2)$ for this process is given by
$$P(w_1,t_1 \mid w_2,t_2) = \frac{1}{\sqrt{2\pi\,(t_1-t_2)}}\, \exp\Big[-\frac{(w_1-w_2)^2}{2\,(t_1-t_2)}\Big] \qquad (47)$$
That this process is a Markov process is easily checked by verifying that $\rho(t_1,t_2)$ satisfies (37).
We can regard the Wiener process as an integral of Gaussian white noise,
$$W(t) = \int_0^t dt'\, \xi(t') \qquad (48)$$
in the sense that (48) together with (40) and (41) reproduces (42) and (43). We can also write (48) formally as a differential equation
$$\frac{dW}{dt} = \xi(t) \qquad (49)$$
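Equations (42)-(44) and (48) suggest an immediate simulation: accumulate independent $N(0,\Delta t)$ increments (the integral of white noise over each step) and check the variance and covariance empirically. A sketch assuming `numpy`; step size, horizon and sample count are arbitrary:

```python
import numpy as np

# Simulate the Wiener process as a cumulative sum of discretized white noise,
# Eq. (48): over a step dt, the integral of xi contributes a N(0, dt) increment.
rng = np.random.default_rng(1)
dt, n_steps, n_paths = 0.01, 200, 50_000
xi = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))  # integrated white noise
W = np.cumsum(xi, axis=1)                                   # W(t) = int_0^t xi dt'

t = dt * np.arange(1, n_steps + 1)
var_t = W.var(axis=0)                      # should grow like t, Eq. (44)
cov = np.mean(W[:, 49] * W[:, 199])        # <W(0.5) W(2.0)>, should be min = 0.5
print(var_t[-1], cov)                      # roughly 2.0 and 0.5
```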
3. Ornstein-Uhlenbeck process: This is an example of a Gaussian, Markovian, stationary stochastic process. It is characterised by
$$\langle Y(t)\rangle = 0 \qquad (50)$$
$$\langle Y(t_1)Y(t_2)\rangle = \exp\big[-|t_1-t_2|\big] \qquad (51)$$
From (50) and (51) we can construct $P_1(y,t)$ and $P(y_1,t_1 \mid y_2,t_2)$ just as in the case of the Wiener process. These are given by
$$P_1(y,t) = \frac{1}{\sqrt{2\pi}}\, \exp\Big[-\frac{y^2}{2}\Big] \qquad (52)$$
$$P(y_1,t_1 \mid y_2,t_2) = \frac{1}{\sqrt{2\pi\,\big(1-e^{-2(t_1-t_2)}\big)}}\, \exp\Big[-\frac{\big(y_1 - y_2\,e^{-(t_1-t_2)}\big)^2}{2\,\big(1-e^{-2(t_1-t_2)}\big)}\Big] \qquad (53)$$
Again, as in the case of the Wiener process, we may express the Ornstein-Uhlenbeck process in terms of Gaussian white noise,
$$Y(t) = \int_{-\infty}^{t} dt'\, e^{-(t-t')}\, \xi(t') \qquad (54)$$
in the sense that (54) reproduces (50) and (51). Equation (54) may be written as a differential equation
$$\frac{dY}{dt} = -Y + \xi(t). \qquad (55)$$
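The conditional density (53) gives an exact one-step simulation rule for the Ornstein-Uhlenbeck process, with no discretization error: given $Y(t)$, the value $Y(t+\Delta)$ is Gaussian with mean $e^{-\Delta}Y(t)$ and variance $1 - e^{-2\Delta}$. The sketch below (parameters arbitrary) uses this rule to check the stationary autocorrelation (51):

```python
import numpy as np

# Simulate the Ornstein-Uhlenbeck process using the exact one-step rule
# implied by (53): Y(t+dt) = e^{-dt} Y(t) + sqrt(1 - e^{-2 dt}) * N(0,1).
rng = np.random.default_rng(2)
dt, n_steps, n_paths = 0.05, 400, 20_000
a, b = np.exp(-dt), np.sqrt(1.0 - np.exp(-2.0*dt))

Y = rng.normal(size=n_paths)               # start in the stationary law (52)
Y0 = Y.copy()
lagged = np.empty(n_steps)
for k in range(n_steps):
    Y = a * Y + b * rng.normal(size=n_paths)
    lagged[k] = np.mean(Y0 * Y)            # estimate of <Y(0) Y((k+1) dt)>

tau = dt * np.arange(1, n_steps + 1)
err = np.max(np.abs(lagged - np.exp(-tau)))
print(err)   # small: matches the exponential autocorrelation (51)
```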
4. An example of a Gaussian stochastic process which is stationary but non-Markovian is easy to construct: any stationary Gaussian process with a non-exponential auto-correlation function is, according to Doob's theorem, non-Markovian.
5. Finally, an example of a Gaussian stochastic process which is neither Markovian nor stationary is the integral of the Ornstein-Uhlenbeck process,
$$Z(t) = \int_0^t Y(t')\, dt' \qquad (56)$$
For this process we can deduce, using (50) and (51), that
$$\langle Z(t)\rangle = 0 \qquad (57)$$
$$\langle Z(t_1)Z(t_2)\rangle = e^{-t_1} + e^{-t_2} - 1 - e^{-|t_1-t_2|} + 2\min(t_1,t_2) \qquad (58)$$
This process is not stationary, as is evident from (58). It is also easy to check that (37) is not satisfied for this process and hence it is non-Markovian.
We may write (56) as a differential equation
$$\frac{dZ}{dt} = Y(t) \qquad (59)$$
where $Y(t)$ obeys (55). With $Z(t)$ and $Y(t)$ identified with the position and velocity of a Brownian particle, (59) and (55) are the Langevin equations for a free Brownian particle. Although $Z(t)$ is non-Markovian, $Z(t)$ and $Y(t)$ together constitute a Markov process.
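The covariance of the integrated process can be checked against its definition, $\langle Z(t_1)Z(t_2)\rangle = \int_0^{t_1}\!\int_0^{t_2} e^{-|u-v|}\, du\, dv$, by two-dimensional quadrature. A sketch; grid size and time points are arbitrary:

```python
import numpy as np

# Check the covariance (58) of the integrated OU process by evaluating
# <Z(t1) Z(t2)> = int_0^{t1} int_0^{t2} exp(-|u-v|) du dv on a midpoint grid.
def closed_form(t1, t2):
    return (np.exp(-t1) + np.exp(-t2) - 1.0
            - np.exp(-abs(t1 - t2)) + 2.0 * min(t1, t2))

t1, t2, n = 1.3, 2.1, 2000
u = (np.arange(n) + 0.5) * (t1 / n)        # midpoint rule in each variable
v = (np.arange(n) + 0.5) * (t2 / n)
K = np.exp(-np.abs(u[:, None] - v[None, :]))
quad = K.sum() * (t1 / n) * (t2 / n)

print(quad, closed_form(t1, t2))   # agree to quadrature accuracy
```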
The material presented above can be found in some form or another in any good textbook on stochastic processes. See for instance [6] and [7].
References
1. J. Marcinkiewicz, Math. Z. 44, 612 (1939).
2. D.W. Robinson, Comm. Math. Phys. 1, 89 (1965).
3. A.K. Rajagopal and E.C.G. Sudarshan, Phys. Rev. A 10, 1852 (1974).
4. See the lectures by R. Vasudevan in these proceedings.
5. W. Feller, "An Introduction to Probability Theory and Its Applications", Vol. 2, Wiley, New York (1966).
6. N.G. Van Kampen, "Stochastic Processes in Physics and Chemistry", North-Holland, Amsterdam, New York, Oxford (1981).
7. C.W. Gardiner, "A Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences", Springer, to be published (1983).
Chapter 5
Brownian Motion
Brownian motion processes originated with the study by the botanist Brown in 1827 of the movements of particles suspended in water. As a particle is occasionally hit by the surrounding water molecules, it moves continuously in three dimensions. Assuming the infinitesimal displacements of the particle are independent and identically distributed, the central limit theorem would imply that the size of a typical displacement (being the sum of many small ones) is normally distributed. Then the continuous trajectory of the particle in $\mathbb{R}^3$ would have increments that are stationary, independent and normally distributed. These are the defining properties of Brownian motion. This diffusion phenomenon, commonly encountered in other contexts as well, gave rise to the theory of Brownian motion and more general diffusion processes.
Brownian motion is one of the most prominent stochastic processes. Its importance is due in part to the central limit phenomenon that sums of random variables such as random walks, considered as processes in time, converge to a Brownian motion or to functions of it. Moreover, Brownian motion plays a key role in stochastic calculus involving integration with respect to Brownian motion and semimartingales. This calculus is used to study dynamical systems modeled by stochastic differential equations. For instance, in the area of stochastic finance, stochastic differential equations are the basis for pricing of options by Black-Scholes and related models. Brownian motion is an important example of a diffusion process and it is a Gaussian process as well. Several variations of Brownian motion arise in specific applications, such as the Brownian bridge in statistical hypothesis testing. In operations research, the major applications of Brownian motion have been in approximations for queueing systems, and there have also been applications in various areas such as financial models and supply chains.
This chapter begins by introducing a Brownian motion as a Markov process that satisfies the strong Markov property, and then characterizes a Brownian motion as a Gaussian process. The second part of the chapter is a study of hitting times of Brownian motion and its cumulative maximum process. This includes a reflection principle for Brownian sample paths, and an introduction to martingales and the optional stopping theorem for them.
The next major results are limit theorems: a strong law of large numbers for Brownian motion and its maximum process, a law of the iterated logarithm for Brownian motion, and Donsker's functional limit theorem showing that Brownian motion is an approximation to random walks. Applications of Donsker's theorem yield similar Brownian approximations for Markov chains, renewal and regenerative-increment processes, and G/G/1 queueing systems.
Other topics include peculiarities of Brownian sample paths, geometric Brownian motion, Brownian bridge processes, multidimensional Brownian motion, a Brownian/Poisson particle process, and Brownian motion in a random environment.
[R. Serfozo, Basics of Applied Stochastic Processes, Probability and its Applications, Springer-Verlag Berlin Heidelberg, 2009.]
5.1 Definition and Strong Markov Property
Recall that a random walk in discrete time on the integers is a Markov chain with stationary independent increments. An analogous process in continuous time on $\mathbb{R}$ is a Brownian motion. This section introduces Brownian motion as a real-valued Markov process on the nonnegative time axis. Its distinguishing features are that it has stationary, independent, normally distributed increments and continuous sample paths. It also satisfies the strong Markov property.
We begin by describing a "standard" Brownian motion, which is also called a Wiener process.
Definition 1. A real-valued stochastic process $B = \{B(t) : t \in \mathbb{R}_+\}$ is a Brownian motion if it satisfies the following properties.
(i) $B(0) = 0$ a.s.
(ii) $B$ has independent increments and, for $s < t$, the increment $B(t) - B(s)$ has a normal distribution with mean 0 and variance $t - s$.
(iii) The paths of $B$ are continuous a.s.
Property (ii) says that a Brownian motion $B$ has stationary, independent increments. From this one can show that $B$ is a Markov process. Consequently, a Brownian motion is a diffusion process: a Markov process with continuous sample paths. The next section establishes the existence of Brownian motion as a special type of Gaussian process. An introduction to Brownian motion in $\mathbb{R}^d$ is in Section 5.14.
Because the increments of a Brownian motion $B$ are stationary, independent and normally distributed, its finite-dimensional distributions are tractable. The normal density of $B(t)$ with mean 0 and variance $t$ is
$$f_{B(t)}(x) = \frac{1}{\sqrt{2\pi t}}\, e^{-x^2/2t}.$$
Denoting this density by $\varphi(x; \sqrt{t})$, it follows by induction and properties (i) and (ii) that, for $0 = t_0 < t_1 < \cdots < t_n$ and $x_0 = 0$, the joint density of $B(t_1), \ldots, B(t_n)$ is
$$f_{B(t_1),\ldots,B(t_n)}(x_1,\ldots,x_n) = \prod_{m=1}^{n} \varphi\big(x_m - x_{m-1};\, \sqrt{t_m - t_{m-1}}\big). \qquad (5.1)$$
Another nice feature is that the covariance between $B(s)$ and $B(t)$ is
$$E[B(s)B(t)] = s \wedge t. \qquad (5.2)$$
This follows since, for $s < t$,
$$\mathrm{Cov}(B(s), B(t)) = E[B(s)B(t)] = E\big[B(s)\big[(B(t)-B(s)) + B(s)\big]\big] = E[B(s)^2] = s.$$
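The product form (5.1) and the covariance (5.2) describe the same finite-dimensional distributions: the product of increment densities must equal the multivariate normal density with covariance matrix $C_{ij} = t_i \wedge t_j$. A quick check in Python; the times and evaluation point are arbitrary, and the helper `phi` below is parametrized by variance for convenience:

```python
import numpy as np

# Check that the product form (5.1) of the joint density of (B(t1),...,B(tn))
# matches the multivariate normal density with covariance E[B(s)B(t)] = min(s,t).
def phi(x, var):                           # normal density, mean 0, variance var
    return np.exp(-x*x / (2.0*var)) / np.sqrt(2.0*np.pi*var)

t = np.array([0.4, 1.0, 1.7])
x = np.array([0.2, -0.5, 0.9])

# product of increment densities, Eq. (5.1), with t0 = 0 and x0 = 0
dts = np.diff(np.concatenate(([0.0], t)))
dxs = np.diff(np.concatenate(([0.0], x)))
product_form = np.prod(phi(dxs, dts))

# multivariate normal density with C_ij = min(t_i, t_j)
C = np.minimum.outer(t, t)
Cinv = np.linalg.inv(C)
mvn = np.exp(-0.5 * x @ Cinv @ x) / np.sqrt((2.0*np.pi)**3 * np.linalg.det(C))

print(product_form, mvn)   # equal up to round-off
```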
Several elementary functions of a Brownian motion are also Brownian motions; see Exercise 1. Here is an obvious example.
Example 2. Symmetry Property. The process $-B(t)$, which is $B$ reflected about 0, is a Brownian motion (i.e., $-B \stackrel{d}{=} B$).
As a generalization of a standard Brownian motion B, consider the process
X(t) = x+ μt+ σB(t), t ≥ 0.
Any process equal in distribution to $X$ is a Brownian motion with drift: $x$ is its initial value, $\mu$ is its drift coefficient, and $\sigma > 0$ is its variation. Many properties of a Brownian motion with drift readily follow from properties of a standard Brownian motion. For instance, $X$ has stationary, independent increments, and $X(t+s) - X(s)$ is normally distributed with mean $\mu t$ and variance $\sigma^2 t$. Drift and variability parameters may be useful in Brownian models for representing certain trends and volatilities.
Now, let us see how Brownian motions are related to diffusion processes. Generally speaking, a diffusion process is a Markov process with continuous paths. Most diffusion processes in applications, however, have the following form. Suppose that $\{X(t) : t \geq 0\}$ is a real-valued Markov process with continuous paths a.s. that satisfies the following properties: for each $x \in \mathbb{R}$, $t \geq 0$, and $\varepsilon > 0$,
$$\lim_{h \downarrow 0} h^{-1} P\{|X(t+h) - X(t)| > \varepsilon \mid X(t) = x\} = 0,$$
$$\lim_{h \downarrow 0} h^{-1} E[X(t+h) - X(t) \mid X(t) = x] = \mu(x,t),$$
$$\lim_{h \downarrow 0} h^{-1} E[(X(t+h) - X(t))^2 \mid X(t) = x] = \sigma(x,t),$$
where $\mu$ and $\sigma$ are functions on $\mathbb{R} \times \mathbb{R}_+$. The $X$ is a diffusion process on $\mathbb{R}$ with drift parameter $\mu(x,t)$ and diffusion parameter $\sigma(x,t)$.
As a prime example, a Brownian motion with drift $X(t) = \mu t + \sigma B(t)$ is a diffusion process whose drift and diffusion parameters $\mu$ and $\sigma$ are independent of $x$ and $t$. Many functions of Brownian motions are also diffusions (e.g., the Ornstein-Uhlenbeck and Bessel processes in Examples 8 and 64).
We end this introductory section with the strong Markov property for Brownian motion. Suppose that $B$ is a Brownian motion on a probability space $(\Omega, \mathcal{F}, P)$. Let $\mathcal{F}^B_t \subseteq \mathcal{F}$ be the $\sigma$-field generated by $\{B(s) : s \in [0,t]\}$, and assume $\mathcal{F}^B_0$ includes all sets of $P$-probability 0 to make it complete. A stopping time of the filtration $\{\mathcal{F}^B_t\}$ is a random time $\tau$, possibly infinite, such that $\{\tau \leq t\} \in \mathcal{F}^B_t$, $t \in \mathbb{R}_+$. The $\sigma$-field of events up to time $\tau$ is
$$\mathcal{F}^B_\tau = \{A \in \mathcal{F} : A \cap \{\tau \leq t\} \in \mathcal{F}^B_t,\ t \in \mathbb{R}_+\}.$$
If $\tau_1$ and $\tau_2$ are two $\mathcal{F}^B_t$-stopping times and $\tau_1 \leq \tau_2$, then $\mathcal{F}^B_{\tau_1} \subseteq \mathcal{F}^B_{\tau_2}$.
Theorem 3. If $\tau$ is an a.s. finite stopping time for a Brownian motion $B$, then the process $B'(t) = B(\tau + t) - B(\tau)$, $t \in \mathbb{R}_+$, is a Brownian motion independent of $\mathcal{F}^B_\tau$.
Proof. We will prove this only for a stopping time $\tau$ that is a.s. bounded ($\tau \leq u$ a.s. for some $u > 0$). Clearly $B'$ has continuous sample paths a.s. It remains to show that the increments of $B'$ are independent and independent of $\mathcal{F}^B_\tau$, and that $B'(s+t) - B'(s)$ is normally distributed with mean 0 and variance $t$. These properties will follow upon showing that, for any $0 \leq t_0 < \cdots < t_n$ and $u_1, \ldots, u_n$ in $\mathbb{R}_+$,
$$E[e^{S_n} \mid \mathcal{F}^B_\tau] = e^{\frac{1}{2}\sum_{i=1}^{n} u_i^2 (t_i - t_{i-1})} \quad \text{a.s.}, \qquad (5.3)$$
where $S_n = \sum_{i=1}^{n} u_i [B'(t_i) - B'(t_{i-1})]$.
The proof of (5.3) will be by induction. First note that
$$E[e^{S_{n+1}} \mid \mathcal{F}^B_\tau] = E\Big[e^{S_n}\, E[e^{S_{n+1} - S_n} \mid \mathcal{F}^B_{\tau + t_n}] \,\Big|\, \mathcal{F}^B_\tau\Big]. \qquad (5.4)$$
Now, since $\tau + t_n$ is a bounded stopping time, using Example 26 below,
$$E[e^{S_{n+1} - S_n} \mid \mathcal{F}^B_{\tau + t_n}] = E\big[e^{u_{n+1}[B(\tau + t_{n+1}) - B(\tau + t_n)]} \mid \mathcal{F}^B_{\tau + t_n}\big] = e^{\frac{1}{2} u_{n+1}^2 (t_{n+1} - t_n)}.$$
This expression with $n = 0$ and $S_0 = 0$ proves (5.3) for $n = 1$. Next, assuming (5.3) is true for some $n$, using the last display and (5.3) in (5.4) yields (5.3) for $n + 1$.
5.2 Brownian Motion as a Gaussian Process
This section shows that Brownian motion is a special type of Gaussian process. Included is a proof of the existence of Gaussian processes, which leads to the existence of Brownian motion.
We begin with a discussion of multivariate normal distributions. Suppose that $X_1, \ldots, X_n$ are normally distributed (not necessarily independent) random variables with means $m_1, \ldots, m_n$. Then clearly, for $u_1, \ldots, u_n \in \mathbb{R}$,
$$E\Big[\sum_{i=1}^{n} u_i X_i\Big] = \sum_i u_i m_i, \qquad \mathrm{Var}\Big[\sum_{i=1}^{n} u_i X_i\Big] = \sum_i \sum_j u_i u_j c_{ij}, \qquad (5.5)$$
where $c_{ij} = \mathrm{Cov}(X_i, X_j)$. The vector $(X_1, \ldots, X_n)$ is said to have a multivariate normal (or Gaussian) distribution if $\sum_{i=1}^{n} u_i X_i$ has a normal distribution for any $u_1, \ldots, u_n$ in $\mathbb{R}$. In light of (5.5), $(X_1, \ldots, X_n)$ has a multivariate normal distribution if and only if its moment generating function has the form
$$E\big[e^{\sum_{i=1}^{n} u_i X_i}\big] = \exp\Big\{\sum_i u_i m_i + \frac{1}{2}\sum_i \sum_j u_i u_j c_{ij}\Big\}, \quad u_i \geq 0. \qquad (5.6)$$
The vector (or distribution) associated with the moment generating function (5.6) is called nondegenerate if the $n \times n$ matrix $C = \{c_{ij}\}$ has rank $n$. In this case, the joint density of $(X_1, \ldots, X_n)$ is
$$f(x_1, \ldots, x_n) = \frac{1}{\sqrt{(2\pi)^n |C|}}\, \exp\Big\{-\frac{1}{2}\sum_i \sum_j c^{ij}(x_i - m_i)(x_j - m_j)\Big\}, \qquad (5.7)$$
where $\{c^{ij}\}$ is the inverse of $C$ and $|C|$ is its determinant.
It turns out that any multivariate normal vector can be represented by a nondegenerate one as follows. When $C$ does not have rank $n$, it follows by a property of symmetric matrices that there exists a $k \times n$ matrix $A$ with transpose $A^t$, where $k \leq n$, such that $C = A^t A$. Let $X$ denote the multivariate normal vector as a $1 \times n$ matrix with mean vector $m$. Suppose that $Y$ is a $1 \times k$ nondegenerate multivariate vector of i.i.d. random variables $Y_1, \ldots, Y_k$ that are normally distributed with mean 0 and variance 1. Then the multivariate normal vector has the representation
$$X \stackrel{d}{=} m + YA. \qquad (5.8)$$
This equality in distribution follows because the moment generating function of $m + YA$ is equal to (5.6). Indeed,
$$E\Big[\exp\Big\{\sum_{i=1}^{n} u_i m_i + \sum_{j=1}^{k}\Big(\sum_{i=1}^{n} u_i a_{ji}\Big) Y_j\Big\}\Big] = \exp\Big\{\sum_i u_i m_i + \frac{v}{2}\Big\},$$
where, interchanging the summations and using $C = A^t A$,
$$v = \mathrm{Var}\Big[\sum_{j=1}^{k}\Big(\sum_{i=1}^{n} u_i a_{ji}\Big) Y_j\Big] = \sum_{j=1}^{k}\Big(\sum_{i=1}^{n} u_i a_{ji}\Big)^2 = \sum_{j=1}^{k}\Big(\sum_{i=1}^{n} u_i a^t_{ij}\Big)\Big(\sum_{\ell=1}^{n} u_\ell a_{j\ell}\Big) = \sum_i \sum_j u_i u_j c_{ij}.$$
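The factorisation $C = A^t A$ and the sampling representation (5.8) are easy to realize numerically; here the factor is obtained from an eigendecomposition. A sketch, where the mean vector and the rank-2 covariance are arbitrary illustrative choices:

```python
import numpy as np

# Represent a degenerate (rank-deficient) multivariate normal as X = m + Y A,
# Eq. (5.8), where C = A^t A and Y has i.i.d. N(0,1) entries.
rng = np.random.default_rng(3)
m = np.array([1.0, -2.0, 0.5])
G = np.array([[1.0, 0.5], [0.0, 1.0], [1.0, -1.0]])
C = G @ G.T                                   # 3x3 covariance of rank 2 < n = 3

w, V = np.linalg.eigh(C)                      # C = V diag(w) V^t
keep = w > 1e-10                              # keep the k = rank(C) positive eigenvalues
A = np.sqrt(w[keep])[:, None] * V[:, keep].T  # k x n factor with A^t A = C

Y = rng.normal(size=(500_000, A.shape[0]))
X = m + Y @ A                                 # samples of the degenerate normal
print(np.allclose(A.T @ A, C), np.max(np.abs(np.cov(X.T) - C)))
```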
A major characteristic of a Brownian motion is that its finite-dimensional distributions are multivariate normal. Other stochastic processes with this property are as follows.
Definition 4. A stochastic process $X = \{X(t) : t \in \mathbb{R}_+\}$ is a Gaussian process if $(X(t_1), \ldots, X(t_n))$ has a multivariate normal distribution for any $t_1, \ldots, t_n$ in $\mathbb{R}_+$. Discrete-time Gaussian processes are defined similarly.
A process $X$ is Gaussian, of course, if and only if $\sum_{i=1}^{n} u_i X(t_i)$ has a normal distribution for any $t_1, \ldots, t_n$ in $\mathbb{R}_+$ and $u_1, \ldots, u_n$ in $\mathbb{R}$. A Gaussian process $X$ is called nondegenerate if its covariance matrix $c_{ij} = \mathrm{Cov}(X(t_i), X(t_j))$ has rank $n$ for any $t_1, \ldots, t_n$ in $\mathbb{R}_+$. In that case, $(X(t_1), \ldots, X(t_n))$ has a multivariate normal density as in (5.7).
The next result establishes the existence of Gaussian processes. It also shows that the distribution of a Gaussian process is determined by its mean and covariance functions. So two Gaussian processes are equal in distribution if and only if their mean and covariance functions are equal. Let $c(s,t)$ be a real-valued function on $\mathbb{R}_+^2$ that satisfies the following properties:
$$c(s,t) = c(t,s), \quad s, t \in \mathbb{R}_+. \quad \text{(Symmetric)}$$
For any finite set $I \subset \mathbb{R}_+$ and $u_t \in \mathbb{R}$,
$$\sum_{t \in I} \sum_{s \in I} u_s u_t\, c(s,t) \geq 0. \quad \text{(Nonnegative-definite)}$$
Theorem 5. For any real-valued function $m(t)$ and the function $c(s,t)$ described above, there exists a Gaussian process $\{X(t) : t \in \mathbb{R}_+\}$ defined on a probability space $(\Omega, \mathcal{F}, P) = (\mathbb{R}^{\mathbb{R}_+}, \mathcal{B}^{\mathbb{R}_+}, P)$, with $E[X(t)] = m(t)$ and
$$\mathrm{Cov}(X(s), X(t)) = c(s,t), \quad s, t \in \mathbb{R}_+.$$
Furthermore, the distribution of this process is determined by the functions $m(t)$ and $c(s,t)$.
Proof. We begin by defining finite-dimensional probability measures $\mu_I$ for a process on $(\mathbb{R}^{\mathbb{R}_+}, \mathcal{B}^{\mathbb{R}_+}, P)$. For any finite subset $I$ of $\mathbb{R}_+$, let $\mu_I$ be the probability measure specified by $\mu_I(\times_{t \in I} A_t)$, $A_t \in \mathcal{B}$, $t \in I$, that has the joint normal moment generating function
$$G(u_I) = \exp\Big\{\sum_{t \in I} u_t m(t) + \frac{1}{2} \sum_{t \in I} \sum_{s \in I} u_s u_t\, c(s,t)\Big\}, \quad u_I = (u_t : t \in I).$$
Note that for $I \subseteq J$,
$$G(u_J) = G(u_I), \quad \text{if } u_t = 0 \text{ for } t \in J \setminus I.$$
Consequently, the joint normal distributions $\mu_I$ satisfy the consistency condition that, for any $I \subseteq J$ with $J$ finite and $A_t \in \mathcal{B}$, $t \in J$,
$$\mu_J(\times_{t \in J} A_t) = \mu_I(\times_{t \in I} A_t), \quad \text{if } A_t = \mathbb{R} \text{ for } t \in J \setminus I. \qquad (5.9)$$
Then it follows by Kolmogorov's extension theorem (Theorem 5 in the Appendix) that there exists a stochastic process $\{X(t) : t \in \mathbb{R}_+\}$ defined on the probability space $(\Omega, \mathcal{F}, P) = (\mathbb{R}^{\mathbb{R}_+}, \mathcal{B}^{\mathbb{R}_+}, P)$ whose finite-dimensional probability measures are given by the $\mu_I$. Since the $\mu_I$ are determined by $m(t)$ and $c(s,t)$, so is the distribution of $X$. Moreover, from the moment generating function for the $\mu_I$, it follows that
$$E[X(t)] = m(t), \qquad \mathrm{Cov}(X(s), X(t)) = c(s,t).$$
Brownian motion is a quintessential example of a Gaussian process.
Proposition 6. A Brownian motion with drift $X(t) = \mu t + \sigma B(t)$, $t \geq 0$, is a Gaussian process with continuous sample paths a.s. starting at $X(0) = 0$, and its mean and covariance functions are
$$E[X(t)] = \mu t, \qquad \mathrm{Cov}(X(s), X(t)) = \sigma^2 (s \wedge t), \quad s, t \in \mathbb{R}_+.$$
Proof. For any $0 = t_0 < t_1 < \cdots < t_n$, letting $Y_i = X(t_i) - X(t_{i-1})$, we have
$$\sum_{i=1}^{n} u_i X(t_i) = \sum_{i=1}^{n} u_i \sum_{k=1}^{i} Y_k = \sum_{k=1}^{n}\Big(\sum_{i=k}^{n} u_i\Big) Y_k, \quad u_1, \ldots, u_n \in \mathbb{R}.$$
Now the increments $Y_i$ are independent, normally distributed random variables with mean 0 and variance $\sigma^2(t_i - t_{i-1})$. Then the last double-sum term has a normal distribution, and so $(X(t_1), \ldots, X(t_n))$ has a multivariate normal distribution. Hence $X$ is a Gaussian process, and its mean and variance are clearly as shown.
The preceding characterization is useful for verifying that a process is a Brownian motion, especially when the multivariate normality condition is easy to verify (as in Exercise 2). There are other interesting Gaussian processes that do not have stationary independent increments; see Example 8 below and Exercise 10.
One approach for establishing the existence of a Brownian motion is to construct it as a Gaussian process as follows.
Theorem 7. There exists a stochastic process $\{B(t) : t \geq 0\}$ defined on a probability space $(\Omega, \mathcal{F}, P) = (\mathbb{R}^{\mathbb{R}_+}, \mathcal{B}^{\mathbb{R}_+}, P)$ such that $B$ is a Brownian motion.
Sketch of Proof. Let $\{B(t) : t \geq 0\}$ be a Gaussian process as constructed in the proof of Theorem 5 with the special Brownian functions $m(t) = 0$ and $c(s,t) = s \wedge t$. A major result (whose proof is omitted) says that this process has stationary independent increments, and $B(t) - B(s)$, for $s < t$, is normally distributed with mean 0 and variance $t - s$. A second step is needed, however, to justify that such a process has continuous sample paths.
Since $B(t) - B(s) \stackrel{d}{=} (t-s)^{1/2} B(1)$, for $s < t$, the process satisfies
$$E\big[|B(t) - B(s)|^a\big] = (t-s)^{a/2}\, E\big[|B(1)|^a\big] < \infty, \quad a > 0.$$
Using this property, another major result shows that $B$ can be chosen so that its sample paths are continuous, and hence it is a Brownian motion. The results that complete the preceding two steps are proved in [64].
A Brownian motion is an example of a Markov process with continuous paths that is a Gaussian process. Are there Markov processes with continuous paths (i.e., diffusion processes), other than Brownian motions, that are Gaussian? Yes there are; here is an important example.
Example 8. An Ornstein-Uhlenbeck process is a stationary Gaussian process $\{X(t) : t \geq 0\}$ with continuous sample paths whose mean function is 0 and whose covariance function is
$$\mathrm{Cov}(X(s), X(t)) = \frac{\sigma^2}{2\alpha}\, e^{-\alpha|s-t|}, \quad s, t \geq 0,$$
where $\alpha$ and $\sigma$ are positive. This process, as proved in [61], is the only stationary Gaussian process with a continuous covariance function that is a Markov process. (Exercise 9 shows that a Gaussian process $X$ is stationary if and only if its mean function is a constant and its covariance function $\mathrm{Cov}(X(s), X(t))$ depends only on $t - s$.)
It is interesting that the process $X$ is also a function of a Brownian motion $B$, in that $X$ is equal in distribution to the process
$$Y(t) = \sigma e^{-\alpha t} B\big(e^{2\alpha t}/2\alpha\big), \quad t \geq 0.$$
To see this, note that $Y$ has continuous sample paths, and clearly it is Gaussian since $B$ is. In addition, $E[Y(t)] = 0$ for each $t$ and, for $s < t$,
$$\mathrm{Cov}(Y(s), Y(t)) = \sigma^2 e^{-\alpha(s+t)}\, E\big[B(e^{2\alpha s}/2\alpha)\, B(e^{2\alpha t}/2\alpha)\big] = \frac{\sigma^2}{2\alpha}\, e^{-\alpha(t-s)}.$$
Consequently, $Y$ is a stationary Gaussian process and it has the same mean and covariance functions as $X$. Hence $Y \stackrel{d}{=} X$.
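The covariance computation above reduces to an elementary identity, since $E[B(u)B(v)] = u \wedge v$, which can be checked directly (the values of $\alpha$ and $\sigma$ below are arbitrary):

```python
import math

# Check the covariance of Y(t) = sigma * exp(-a t) * B(exp(2 a t)/(2 a)):
# with E[B(u)B(v)] = min(u, v), it collapses to the OU form sigma^2/(2a) e^{-a|t-s|}.
def cov_Y(s, t, a=0.8, sigma=1.5):
    return sigma**2 * math.exp(-a*(s+t)) * min(math.exp(2*a*s), math.exp(2*a*t)) / (2*a)

def cov_OU(s, t, a=0.8, sigma=1.5):
    return sigma**2 / (2*a) * math.exp(-a*abs(t - s))

for s, t in [(0.2, 1.0), (1.0, 0.2), (0.7, 0.7), (0.0, 3.0)]:
    print(cov_Y(s, t), cov_OU(s, t))   # each pair agrees
```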
The Ornstein-Uhlenbeck process $X$ defined above satisfies the stochastic differential equation
$$dX(t) = -\alpha X(t)\, dt + \sigma\, dB(t). \qquad (5.10)$$
The example above assumes that $X(0)$ has a normal distribution with mean 0 and variance $\sigma^2/2\alpha$. The solution of this equation is
$$X(t) = X(0)\, e^{-\alpha t} + \sigma \int_0^t e^{-\alpha(t-s)}\, dB(s).$$
The stochastic differential equation and the integral with respect to Brownian motion, which are beyond the scope of this work, are discussed in [61, 64].
Scientists realized that the Brownian motion representation for a particle moving in a medium was an idealized model in that it does not account for friction in the medium. To incorporate friction in the model, Langevin (1908) proposed that the Ornstein-Uhlenbeck process $X$ could represent the velocity of a particle undergoing a Brownian motion subject to friction. He assumed that the rate of change in the velocity satisfies (5.10), where $-\alpha X(t)\,dt$ models the change due to friction; the friction works in the opposite direction to the velocity, and $\alpha$ is the coefficient of friction divided by the mass of the particle. The stochastic process for this model was formalized later by Ornstein and Uhlenbeck (1930) and Doob (1942).
5.3 Maximum Process and Hitting Times
For a Brownian motion $B$, its cumulative maximum process is
$$M(t) = \max_{s \leq t} B(s), \quad t \geq 0.$$
This process is related to the hitting times
$$\tau_a = \inf\{t > 0 : B(t) = a\}, \quad a \in \mathbb{R}.$$
Namely, for each $a \geq 0$ and $t$,
$$\{\tau_a \leq t\} = \{M(t) \geq a\}. \qquad (5.11)$$
In other words, the distribution of the maximum process is determined by that of the hitting times and vice versa. This section presents expressions for these distributions and an important property of the hitting times.
We begin with a preliminary fact.
Remark 9. The hitting time $\tau_a$ is an a.s. finite stopping time of $B$.
The $\tau_a$ is a stopping time since $B$ has continuous paths a.s. Its finiteness follows by Theorem 32 below, which is proved by the martingale optional stopping theorem in the next section. The finiteness also follows from the consequence (5.30) of the law of the iterated logarithm in Theorem 38 below.
The first result is a reflection principle: an increment $B(t) - B(\tau)$ after a stopping time $\tau$ has the same distribution as the reflected increment $-(B(t) - B(\tau))$. This is basically the symmetry property $B \stackrel{d}{=} -B$ in Example 2 manifested at the stopping time $\tau$. A version of this principle for stochastic processes is in Exercises 20 and 21.
Proposition 10. (Reflection Principle) If $\tau$ is an a.s. finite stopping time of $B$, then, for any $a$ and $t$,
$$P\{B(t) - B(\tau) \leq a,\ \tau \leq t\} = P\{B(t) - B(\tau) \geq -a,\ \tau \leq t\}.$$
Proof. Letting $B'(t) = B(\tau + t) - B(\tau)$, $t \geq 0$, we can write
$$B(t) - B(\tau) = B'(t - \tau), \quad \text{for } \tau \leq t. \qquad (5.12)$$
By the strong Markov property in Theorem 3, $B'$ is a Brownian motion independent of $\mathcal{F}_\tau$. Using this and (5.12), along with the symmetry property $B' \stackrel{d}{=} -B'$ and $\{\tau \leq t\} \in \mathcal{F}_\tau$,
$$P\{B(t) - B(\tau) \leq a,\ \tau \leq t\} = E\big[P\{B'(t-\tau) \leq a \mid \tau \leq t,\ \mathcal{F}_\tau\}\big]\, P\{\tau \leq t\}$$
$$= E\big[P\{-B'(t-\tau) \leq a \mid \tau \leq t,\ \mathcal{F}_\tau\}\big]\, P\{\tau \leq t\} = P\{B'(t-\tau) \geq -a,\ \tau \leq t\}.$$
Then using (5.12) in the last probability completes the proof.
We will now apply the reflection principle to obtain an expression for the joint distribution of $B(t)$ and $M(t)$.
Theorem 11. For $x \leq y$ and $y \geq 0$,
$$P\{B(t) \leq x,\ M(t) \geq y\} = P\{B(t) \geq 2y - x\}, \qquad (5.13)$$
$$P\{M(t) \geq y\} = 2P\{B(t) \geq y\}.$$
Furthermore, $M(t) \stackrel{d}{=} |B(t)|$ for each $t$, and the density of $M(t)$ is
$$f_{M(t)}(x) = \frac{2}{\sqrt{2\pi t}}\, e^{-x^2/2t}, \quad x \geq 0.$$
Hence
$$E[M(t)] = \sqrt{2t/\pi}, \qquad \mathrm{Var}[M(t)] = (1 - 2/\pi)\, t. \qquad (5.14)$$
Proof. Assertion (5.13) follows since, by (5.11) and Proposition 10 with $\tau = \tau_y$, $B(\tau) = y$, and $a = x - y$, we have, for $x \leq y$ and $y \geq 0$,
$$P\{B(t) \leq x,\ M(t) \geq y\} = P\{B(t) \leq x,\ \tau_y \leq t\} = P\{B(t) \geq 2y - x,\ \tau_y \leq t\} = P\{B(t) \geq 2y - x\}.$$
The last equality is because $2y - x \geq y$ and
$$\{B(t) \geq 2y - x\} \subseteq \{B(t) \geq y\} \subseteq \{\tau_y \leq t\}.$$
Next, using what we just proved with $x = y$, we have
$$P\{M(t) \geq y\} = P\{B(t) \leq y,\ M(t) \geq y\} + P\{B(t) \geq y,\ M(t) \geq y\} = 2P\{B(t) \geq y\}.$$
Taking the derivative of this with respect to $y$ yields the density of $M(t)$. In addition, $2P\{B(t) \geq y\} = P\{|B(t)| \geq y\}$ implies $M(t) \stackrel{d}{=} |B(t)|$. Exercise 11 proves (5.14).
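Both conclusions of Theorem 11 are easy to probe by simulating Brownian paths as fine random walks. A Monte Carlo sketch assuming `numpy`; the step and sample counts are arbitrary, and the discrete-time maximum slightly undershoots the continuous one:

```python
import numpy as np

# Monte Carlo check of Theorem 11: P{M(t) >= y} = 2 P{B(t) >= y},
# using a fine random-walk approximation of Brownian paths on [0, t].
rng = np.random.default_rng(4)
t, y, n_steps, n_paths = 1.0, 0.8, 500, 20_000
dB = rng.normal(0.0, np.sqrt(t / n_steps), size=(n_paths, n_steps))
B = np.cumsum(dB, axis=1)

M = B.max(axis=1)                 # approximate running maximum M(t)
lhs = np.mean(M >= y)
rhs = 2.0 * np.mean(B[:, -1] >= y)
print(lhs, rhs)                   # close; the discrete maximum is biased slightly low
```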
Even though $M(t) \stackrel{d}{=} |B(t)|$ for each $t$, the processes $M$ and $|B|$ are not equal in distribution; $M$ is nondecreasing while $|B|$ is not. Exercise 19 points out the interesting equality in distribution $M \stackrel{d}{=} M - B$ for the processes.
Because of the reflection property $-B \stackrel{d}{=} B$ of a Brownian motion $B$, its minima are also reflections of its maxima.
Remark 12. The minimum process for $B$ is
$$\underline{M}(t) = \min_{s \leq t} B(s), \quad t \geq 0.$$
It is related to the maximum process $M$ by $\underline{M} \stackrel{d}{=} -M$. Hence
$$P\{\underline{M}(t) \leq a\} = 2P\{B(t) \geq -a\}, \quad a < 0.$$
That $\underline{M} \stackrel{d}{=} -M$ follows from $\underline{M}(t) = -\max_{s \leq t}\{-B(s)\}$ and the reflection property $-B \stackrel{d}{=} B$. Also, Theorem 11 yields the distribution of $\underline{M}(t)$.
We will now obtain the distribution of the hitting time $\tau_a$ from Theorem 11. This result is also a special case of Theorem 32 for hitting times of a Brownian motion with drift, which also contains the Laplace transform of $\tau_a$ and shows that $E[\tau_a] = \infty$.
Corollary 13. For any $a \geq 0$,
$$P\{\tau_a \leq t\} = 2\big[1 - \Phi(a/\sqrt{t})\big], \quad t > 0,$$
where $\Phi$ is the standard normal distribution. Hence, the density of $\tau_a$ is
$$f_{\tau_a}(t) = \frac{a}{\sqrt{2\pi t^3}}\, e^{-a^2/2t}, \quad t > 0.$$
Proof. From (5.11), Theorem 11, and $B(t) \stackrel{d}{=} \sqrt{t}\, B(1)$, we have
$$P\{\tau_a \leq t\} = P\{M(t) \geq a\} = 2P\{B(t) \geq a\} = 2\big[1 - \Phi(a/\sqrt{t})\big].$$
Taking the derivative of this yields the density of $\tau_a$.
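The relation between the hitting-time density and its distribution function can be verified numerically: integrating the density from 0 to $T$ should reproduce $2[1 - \Phi(a/\sqrt{T})]$. A sketch using only the standard library's `erf` plus `numpy` (the values of $a$ and $T$ are arbitrary):

```python
import numpy as np
from math import erf, sqrt, pi

# Check Corollary 13: integrating f_tau(t) = a/sqrt(2 pi t^3) * exp(-a^2/2t)
# over (0, T] reproduces P{tau_a <= T} = 2[1 - Phi(a/sqrt(T))].
def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

a, T, n = 1.2, 3.0, 200_000
t = (np.arange(n) + 0.5) * (T / n)                 # midpoint grid on (0, T]
dens = a / np.sqrt(2.0*pi*t**3) * np.exp(-a*a / (2.0*t))
cdf_from_density = dens.sum() * (T / n)
cdf_closed = 2.0 * (1.0 - Phi(a / sqrt(T)))
print(cdf_from_density, cdf_closed)   # agree to quadrature accuracy
```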
The family of hitting times τa : a ≥ 0 for B is an important process inits own right. It is the nondecreasing leftcontinuous inverse process of themaximum process M since
τa = inft : B(t) = a = inft : M(t) = a.
By Corollary 13, we know the density of τa and that E[τa] = ∞. Here is moreinformation about these hitting times.
Proposition 14. The process τa : a ≥ 0 has stationary independent increments and, for a < b, the increment τb − τa is independent of Fτa and it isequal in distribution to τ(b−a).
Proof. Since τb − τa = inft : B(τa + t) − B(τa) = b − a, it follows by thestrong Markov property at τa that τb − τa is independent of Fτa and it isequal in distribution to τb−a. Also, it follows by an induction argument thatτa : a ≥ 0 has stationary independent increments.
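Corollary 13 is easy to check numerically. The following sketch (an illustration, not part of the text; it assumes Python with NumPy) approximates Brownian paths on $[0,1]$ by Gaussian random walks on a fine grid and compares the empirical frequency of $\{\tau_a \le 1\} = \{M(1) \ge a\}$ with $2[1 - \Phi(a/\sqrt{t})]$:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
n_paths, n_steps, a = 10000, 1000, 1.0
dt = 1.0 / n_steps

# Approximate Brownian paths on [0, 1] by Gaussian random walks
incr = rng.normal(0.0, sqrt(dt), size=(n_paths, n_steps))
paths = np.cumsum(incr, axis=1)

# {tau_a <= 1} = {M(1) >= a}, estimated via the discretized maximum
est = (paths.max(axis=1) >= a).mean()

Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))
exact = 2.0 * (1.0 - Phi(a))   # Corollary 13 with t = 1
```

The discrete-time maximum slightly undershoots the continuous one (a path can cross the level between grid points), so the estimate sits a little below the exact value; refining the grid shrinks the gap.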
5.4 Special Random Times
In this section, we derive arc sine and arc cosine probabilities for certain random times of Brownian motion by applying properties of the maximum process.

We first consider two random times associated with a Brownian motion $B$ on the interval $[0,1]$ and its maximum process $M(t) = \max_{s \le t} B(s)$. These times have the same arc sine distribution, which is discussed in Exercise 14.

Theorem 15. (Lévy Arc Sine Law) For a Brownian motion $B$ on $[0,1]$, the time $\tau = \inf\{t \in [0,1] : B(t) = M(1)\}$ has the arc sine distribution
$P\{\tau \le t\} = \frac{2}{\pi} \arcsin\sqrt{t}, \quad t \in [0,1].$ (5.15)
In addition, the time $\tau' = \sup\{t \in [0,1] : B(t) = 0\}$ has the same distribution.

Proof. First note that, for $t \le 1$,
$\tau \le t \iff \max_{s \le t} B(s) - B(t) \ge \max_{t \le s \le 1} B(s) - B(t).$
Denote the last inequality as $Y_1 \ge Y_2$ and note that these random variables are independent since $B$ has independent increments. Now, by the translation and symmetry properties of $B$ and Theorem 11,
$Y_1 \overset{d}{=} M(t) \overset{d}{=} |B(t)| \overset{d}{=} t^{1/2}|Z_1|, \qquad Y_2 \overset{d}{=} M(1-t) \overset{d}{=} |B(1-t)| \overset{d}{=} (1-t)^{1/2}|Z_2|,$
where $Z_1$ and $Z_2$ are normal random variables with mean 0 and variance 1. From these observations, we have
$P\{\tau \le t\} = P\{Y_1 \ge Y_2\} = P\{tZ_1^2 \ge (1-t)Z_2^2\} = P\{Z_2^2/(Z_1^2 + Z_2^2) \le t\},$ (5.16)
where we may take $Z_1$ and $Z_2$ to be independent. Then (5.15) follows since the last probability, due to the symmetry property of the normal distribution, is the arc sine distribution by Exercise 14.
Next, note that $\tau' < t$ means that $B$ has no zero in $[t,1]$, so that $B$ stays either positive or negative on that interval. Then by Remark 12 on the minimum process, we have
$P\{\tau' < t\} = P\{\min_{t \le s \le 1} B(s) > 0\} + P\{\max_{t \le s \le 1} B(s) < 0\} = 2P\{Y < -B(t)\},$
where $Y = \max_{t \le s \le 1} B(s) - B(t)$ is independent of $B(t)$. By the symmetry of $B$ and Theorem 11, we have
$-B(t) \overset{d}{=} B(t) \overset{d}{=} t^{1/2}Z_1, \qquad Y \overset{d}{=} M(1-t) \overset{d}{=} |B(1-t)| \overset{d}{=} (1-t)^{1/2}|Z_2|,$
where $Z_1$ and $Z_2$ are normal random variables with mean 0 and variance 1. Assuming $Z_1$ and $Z_2$ are independent, the preceding observations and (5.16) yield
$P\{\tau' < t\} = 2P\{(1-t)^{1/2}|Z_2| < t^{1/2}Z_1\} = P\{(1-t)Z_2^2 < tZ_1^2\} = P\{\tau \le t\}.$
This proves that $\tau'$ also has the arc sine distribution.
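The arc sine law is striking enough to be worth seeing by simulation. In the sketch below (illustrative only, not part of the text; NumPy assumed), the time at which a discretized Brownian path attains its maximum is recorded, and its empirical CDF is compared with $\frac{2}{\pi}\arcsin\sqrt{t}$ at $t = 1/4$ and $t = 1/2$, which are $1/3$ and $1/2$ respectively:

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps = 10000, 1000

# Discretized Brownian paths on [0, 1], including the point B(0) = 0
incr = rng.normal(0.0, np.sqrt(1.0 / n_steps), size=(n_paths, n_steps))
paths = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(incr, axis=1)], axis=1)

# Time (as a fraction of [0, 1]) at which each path attains its maximum
argmax_t = paths.argmax(axis=1) / n_steps

p_quarter = (argmax_t <= 0.25).mean()  # theory: (2/pi) arcsin sqrt(1/4) = 1/3
p_half = (argmax_t <= 0.5).mean()      # theory: (2/pi) arcsin sqrt(1/2) = 1/2
```

A histogram of `argmax_t` shows the characteristic U-shape of the arc sine density: the maximum is most likely attained very early or very late in $[0,1]$.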
354 5 Brownian Motion
Next, we consider the event that a Brownian motion returns to the origin 0 in a future time interval.

Theorem 16. The event $A$ that a Brownian motion $B$ hits 0 in a time interval $[t, u]$ has the probability
$P(A) = \frac{2}{\pi} \arccos\sqrt{t/u}, \quad \text{where } 0 < t < u.$ (5.17)

Proof. For $t < u$ and $u = 1$, it follows by Theorem 15 that
$P(A) = P\{\tau' \ge t\} = 1 - \frac{2}{\pi}\arcsin\sqrt{t} = \frac{2}{\pi}\arccos\sqrt{t}.$
The proof for $u \ne 1$ is Exercise 15.
5.5 Martingales
A martingale is a real-valued stochastic process defined by the property that the conditional mean of an "increment" of the process conditioned on past information is 0. A random walk and Brownian motion whose mean step sizes are 0 have this property. However, the increments of a martingale are generally dependent, unlike the independent increments of a random walk or Brownian motion. Martingales are used for proving convergence theorems, analyzing hitting times of processes, finding optimal stopping rules, and providing bounds for processes. Moreover, they are key tools in the theory of stochastic differential equations.
In this section, we introduce martingales and discuss several examples associated with Brownian motion and compound Poisson processes. We also present the important submartingale convergence theorem. The next two sections cover the optional stopping theorem for martingales and its applications to Brownian motion.
Throughout this section, $X = \{X(t) : t \ge 0\}$ will denote a real-valued continuous-time stochastic process that has right-continuous paths and $E[|X(t)|] < \infty$, $t \ge 0$. Associated with the underlying probability space $(\Omega, \mathcal{F}, P)$ for the process $X$, there is a filtration $\{\mathcal{F}_t : t \ge 0\}$, which is a family of $\sigma$-fields contained in $\mathcal{F}$ that is increasing ($\mathcal{F}_s \subseteq \mathcal{F}_t$, $s \le t$) and right-continuous ($\mathcal{F}_t = \cap_{u > t}\mathcal{F}_u$), and $\mathcal{F}_0$ contains all events with $P$-probability 0. Furthermore, the process $X$ is adapted to the filtration $\mathcal{F}_t$ in that $\{X(t) \le x\} \in \mathcal{F}_t$, for each $t$ and $x$.
Definition 17. The process $X$ is a martingale with respect to $\mathcal{F}_t$ if
$E[X(t)\,|\,\mathcal{F}_s] = X(s) \quad \text{a.s.}, \quad 0 \le s < t.$ (5.18)
The process $X$ is a submartingale if
$E[X(t)\,|\,\mathcal{F}_s] \ge X(s) \quad \text{a.s.}, \quad 0 \le s < t.$
If the inequality is reversed, then $X$ is a supermartingale.
Taking the expectation of (5.18) yields the characteristic property of a martingale that
$E[X(t)] = E[X(s)], \quad s \le t.$
The martingale condition (5.18) is equivalent to
$E[X(t) - X(s)\,|\,\mathcal{F}_s] = 0,$
which says that the conditional mean of an increment conditioned on the past is 0.
A classic illustration of a martingale is the value $X(t)$ of an investment (or the fortune of a gambler) at time $t$ in a marketplace described by the events in $\mathcal{F}_t$. The martingale property (5.18) says that the investment is subject to a "fair market" in that its expected value at any time $t$, conditioned on the environment $\mathcal{F}_s$ up to some time $s < t$, is the same as the value $X(s)$.

On the other hand, the submartingale property implies that the market is biased toward "upward" movements of the value $X$, resulting in $E[X(t)\,|\,\mathcal{F}_s] \ge X(s)$ a.s., for $s \le t$. Similarly, the supermartingale property implies "downward" movements, resulting in $E[X(t)\,|\,\mathcal{F}_s] \le X(s)$ a.s.
In typical applications, $\mathcal{F}_t = \mathcal{F}^Y_t$, which is the $\sigma$-field generated by the events of a right-continuous process $\{Y(s) : s \le t\}$ on a general state space. In this setting, we say that $X$ is a martingale with respect to $Y$. In some instances, it is natural that $X$ is a martingale with respect to the filtration $\mathcal{F}_t = \mathcal{F}^X_t$ of its own history.

Martingales in discrete time are defined similarly. In particular, real-valued random variables $X_n$ with $E[|X_n|] < \infty$ form a martingale with respect to increasing $\sigma$-fields $\mathcal{F}_n$ if
$E[X_{n+1}\,|\,\mathcal{F}_n] = X_n, \quad n \ge 0.$
The $X_n$ form a submartingale or supermartingale if the equality is replaced by $\ge$ or $\le$, respectively. Standard examples are sums and products of independent random variables; see Exercise 30.
Note that a Brownian motion $B$ is a martingale with respect to itself since, for $s \le t$,
$E[B(t)\,|\,\mathcal{F}^B_s] = E[B(t) - B(s)\,|\,\mathcal{F}^B_s] + B(s) = B(s).$
Similarly, if $X(t) = x + \mu t + \sigma B(t)$ is a Brownian motion with drift, then
$E[X(t)\,|\,\mathcal{F}^B_s] = \mu(t - s) + X(s), \quad s \le t.$
Therefore, $X$ is a martingale, submartingale, or supermartingale with respect to $B$ according as $\mu$ is $= 0$, $> 0$, or $< 0$.
We will also encounter several functions of Brownian motion that are martingales of the following type.

Example 18. Martingales for Processes with Stationary Independent Increments. Suppose that $Y$ is a real-valued process that has stationary independent increments and, for simplicity, assume that $Y(0) = 0$. Suppose that the moment generating function $\psi(\alpha) = E[e^{\alpha Y(1)}]$ exists for $\alpha$ in a neighborhood of 0, and that $E[e^{\alpha Y(t)}]$ as a function of $t$ is continuous at 0, for fixed $\alpha$. Then by Exercise 7, $E[e^{\alpha Y(t)}] = \psi(\alpha)^t$ and
$E[Y(t)] = at, \qquad \mathrm{Var}[Y(t)] = bt,$
where $a = E[Y(1)]$ and $b = \mathrm{Var}[Y(1)]$. For instance, $Y$ may be a Brownian motion with drift, a Poisson process or a compound Poisson process.

An easy check shows that two martingales with respect to $Y$ are
$Y(t) - at \quad \text{and} \quad (Y(t) - at)^2 - bt, \quad t \ge 0.$
The means of these martingales are 0. Next, consider the process
$Z(t) = e^{\alpha Y(t)}/\psi(\alpha)^t, \quad t \ge 0.$
Clearly $Z(t)$ is a deterministic, nonnegative function of $\{Y(s) : s \le t\}$, and $E[Z(t)] = 1$. Then $Z$ is a martingale (sometimes called an exponential martingale) with respect to $Y$. Indeed,
$E[Z(t)\,|\,\mathcal{F}^Y_s] = Z(s)\,\frac{E[e^{\alpha(Y(t) - Y(s))}]}{\psi(\alpha)^{t-s}} = Z(s).$
Example 19. Martingales for Brownian Motion. For a Brownian motion with drift $Y(t) = x + \mu t + \sigma B(t)$, the preceding example justifies that the following functions of $Y$ are martingales with respect to $B$:
$(Y(t) - x - \mu t)^2 - \sigma^2 t, \qquad e^{c[Y(t) - x - \mu t] - c^2\sigma^2 t/2}, \quad t \ge 0, \ c \ne 0.$
In particular, $B(t)^2 - t$ and $e^{cB(t) - c^2t/2}$ are martingales with respect to $B$.
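A martingale has constant mean in $t$, which gives a quick numerical sanity check for Example 19. The sketch below (illustrative only, not part of the text; NumPy assumed) samples $B(t) \sim N(0, t)$ directly and verifies that $B(t)^2 - t$ has mean 0 and that the exponential martingale has mean 1, at several times $t$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, c = 400000, 0.7
means_quad, means_exp = [], []
for t in (0.5, 1.0, 2.0):
    b = rng.normal(0.0, np.sqrt(t), size=n)                # B(t) ~ N(0, t)
    means_quad.append((b**2 - t).mean())                   # martingale B(t)^2 - t
    means_exp.append(np.exp(c * b - c**2 * t / 2).mean())  # exponential martingale
# means_quad stay near 0 and means_exp near 1, independent of t
```

Only one-dimensional marginals are needed here because the quantities checked are unconditional means; the full martingale property is of course a statement about conditional means.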
Having a constant mean suggests that a martingale should also have nice limiting behavior. According to the next major theorem, many submartingales as well as martingales converge a.s. to a limit. This result, stated here for discrete-time processes, also holds in continuous time.
Theorem 20. (Submartingale Convergence) If $X_n$ is a martingale, or a submartingale, that satisfies $\sup_n E[X_n^+] < \infty$, then there exists a random variable $X$ with $E[|X|] < \infty$ such that $X_n \to X$ a.s. as $n \to \infty$.
5.5 Martingales 357
This convergence can be viewed as an extension of the fact that a nondecreasing sequence of real numbers that is bounded converges to a finite limit. For a submartingale, the nondecreasing tendency is $E[X_{n+1}\,|\,\mathcal{F}_n] \ge X_n$, and a bound on $E[X_n^+]$ is enough to ensure convergence a.s.; the submartingale itself need not be nondecreasing.

The theorem establishes the existence of the limit $X$, but it does not specify its distribution. Properties of $X$ can sometimes be derived in specific cases depending on characteristics of $X_n$.

In addition to the convergence $X_n \to X$ a.s., it follows by Theorem 15 in the Appendix that $E[|X_n - X|] \to 0$ as $n \to \infty$ when the $X_n$ are uniformly integrable.
Although Theorem 20 is a major result, we will only give the following example since it is not needed for our results. For a proof and other applications, see for instance [37, 61, 62, 64].
Example 21. Doob Martingale. Let $Z$ be a random variable with $E[|Z|] < \infty$, and let $\mathcal{F}_n$ be an increasing filtration on the underlying probability space for $Z$. The conditional expectation
$X_n = E[Z\,|\,\mathcal{F}_n], \quad n \ge 1,$
is a martingale with respect to $\mathcal{F}_n$. Then by Theorem 20,
$X = \lim_{n\to\infty} E[Z\,|\,\mathcal{F}_n] \quad \text{exists a.s.}$
That $X_n$ is a martingale follows since
$E[|X_n|] \le E\big[E[|Z|\,\big|\,\mathcal{F}_n]\big] = E[|Z|] < \infty,$
$E[X_{n+1}\,|\,\mathcal{F}_n] = E\big[E[Z\,|\,\mathcal{F}_{n+1}]\,\big|\,\mathcal{F}_n\big] = E[Z\,|\,\mathcal{F}_n] = X_n.$
Consider the case $X_n = E[Z\,|\,Y_1, \dots, Y_n]$, in which $X_n$ is a martingale with respect to $Y_n$. For instance, $X_n$ could be an estimate for the mean of $Z$ based on observations $Y_1, \dots, Y_n$ associated with $Z$. By an additional argument it follows that the limit of $X_n$ is $X = E[Z\,|\,Y_1, Y_2, \dots]$. Therefore,
$E[Z\,|\,Y_1, \dots, Y_n] \to E[Z\,|\,Y_1, Y_2, \dots] \quad \text{a.s. as } n \to \infty.$
In particular, if $Z$ is the indicator of an event $A$ in the $\sigma$-field generated by $Y_1, Y_2, \dots$, then
$P(A\,|\,Y_1, \dots, Y_n) \to 1(A) \quad \text{a.s. as } n \to \infty.$
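A minimal concrete instance of a Doob martingale (illustrative code, not from the text; NumPy assumed): take $Z = Y_1 + \cdots + Y_N$ for i.i.d. mean-zero $Y_i$. Then $X_n = E[Z\,|\,Y_1,\dots,Y_n]$ is simply the $n$-th partial sum, because the unseen terms have conditional mean 0, and $X_n$ reaches $Z$ exactly once all observations are in:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 10
y = rng.normal(size=N)   # observations Y_1, ..., Y_N, i.i.d. N(0, 1)
z = y.sum()              # Z = Y_1 + ... + Y_N

# X_n = E[Z | Y_1,...,Y_n] = Y_1 + ... + Y_n: the unseen terms have mean 0
x = np.cumsum(y)
err = np.abs(x - z)      # |X_n - Z| reaches 0 at n = N
```

Each `x[n]` updates the previous estimate by the newly observed `y[n]`, which is exactly the martingale-increment structure $X_{n+1} - X_n = Y_{n+1}$ with conditional mean 0.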
5.6 Optional Stopping of Martingales
This section presents the optional stopping theorem for martingales. It was instrumental in the proof of the strong Markov property for Brownian motion in Theorem 3; the next section uses it to analyze hitting times of Brownian motion.
For the following discussion, suppose that $X$ is a martingale with respect to a filtration $\mathcal{F}_t$, and that $\tau$ is a stopping time of the filtration: $\{\tau \le t\} \in \mathcal{F}_t$, for each $t$.

We will now address the following questions: Is the martingale property $E[X(t)] = E[X(0)]$ also true when $t$ is a stopping time? More generally, is $E[X(\sigma)] = E[X(\tau)]$ true for any stopping times $\sigma$ and $\tau$?

The optional stopping theorem below says that $E[X(\tau)] = E[X(0)]$ is indeed true for a bounded stopping time $\tau$. A corollary is that the equality is also true for a finite stopping time when $X$ satisfies certain bounds. This would imply, for instance, that the expected value of an investment in a fair market at the stopping time is the same as the initial value. In other words, in this fair market, there would be no benefit for the investor to choose to stop and freeze his investment at a time depending only on the past history of the market (independent of the future).
There are several optional stopping theorems with slightly different assumptions. For our purposes, we will use the following version from [61]. Its discrete-time version is Theorem 28 below.
Theorem 22. Associated with the martingale $X$, assume that $\sigma$ and $\tau$ are stopping times of $\mathcal{F}_t$ such that $\tau$ is bounded a.s. Then
$X(\sigma \wedge \tau) = E[X(\tau)\,|\,\mathcal{F}_\sigma] \quad \text{a.s.}$ (5.19)
Hence $E[X(\tau)] = E[X(0)]$. If $\sigma$ is also bounded, then $E[X(\sigma)] = E[X(\tau)]$.
Proof. Our proof for this continuous-time setting uses an approximation based on the analogous discrete-time result in Theorem 28 below. For a fixed $n \in \mathbb{Z}_+$, let $X_k = X(k2^{-n})$, $k \in \mathbb{Z}_+$. Clearly $X_k$ is a discrete-time martingale with respect to $\mathcal{F}_k = \mathcal{F}_{k2^{-n}}$. Define
$\sigma_n = (\lfloor 2^n\sigma \rfloor + 1)/2^n, \qquad \tau_n = (\lfloor 2^n\tau \rfloor + 1)/2^n.$
Now $\sigma'_m = 2^n\sigma_m$ for fixed $m$, and $\tau'_n = 2^n\tau_n$, are integer-valued stopping times of $\mathcal{F}_k$. Then by Theorem 28 below,
$X_{\sigma'_m \wedge \tau'_n} = E[X_{\tau'_n}\,|\,\mathcal{F}_{\sigma'_m}] \quad \text{a.s.}$
This expression in terms of the preceding definitions is
$X(\sigma_m \wedge \tau_n) = E[X(\tau_n)\,|\,\mathcal{F}_{\sigma_m}] \quad \text{a.s.}$
Letting $m \to \infty$ in this expression results in $\sigma_m \to \sigma$ and
$X(\sigma \wedge \tau_n) = E[X(\tau_n)\,|\,\mathcal{F}_\sigma] \quad \text{a.s.}$
Then letting $n \to \infty$ in this expression yields (5.19). The justifications of the last two limit statements, which are in [61], will not be covered here.

The assertion $E[X(0)] = E[X(\tau)]$ follows by taking expectations in (5.19) with $\sigma = 0$. Finally, when $\sigma$ as well as $\tau$ is bounded, then (5.19) and this expression with the roles of $\sigma$ and $\tau$ reversed yield
$E[X(\tau)\,|\,\mathcal{F}_\sigma] = X(\sigma \wedge \tau) = E[X(\sigma)\,|\,\mathcal{F}_\tau] \quad \text{a.s.}$
Then expectations of these terms give $E[X(\sigma)] = E[X(\tau)]$.
Theorem 22 can also be extended to stopping times that are a.s. finite, but not necessarily bounded. To see this, suppose that $\tau$ is an a.s. finite stopping time of $\mathcal{F}_t$. The key idea is that, for fixed $s$ and $t$, the times $\tau \wedge s$ and $\tau \wedge t$ are a.s. bounded stopping times of $\mathcal{F}_t$. Then by Theorem 22,
$X(\tau \wedge s) = E[X(\tau \wedge t)\,|\,\mathcal{F}_{\tau \wedge s}], \quad s < t.$
This property justifies the following fact, which is used in the proof below.

Remark 23. Stopped Martingales. The stopped process $X(\tau \wedge t)$ is a martingale with respect to $\mathcal{F}_t$.
Corollary 24. Associated with the martingale $X$, suppose that $\tau$ is an a.s. finite stopping time of $\mathcal{F}_t$, and that either one of the following conditions is satisfied:
(i) $E\big[\sup_{t \le \tau} |X(t)|\big] < \infty$.
(ii) $E[|X(\tau)|] < \infty$, and $\lim_{u\to\infty} E[|X(u)|1(\tau > u)] = 0$.
Then $E[X(\tau)] = E[X(0)]$.
Proof. Since $X(\tau \wedge t)$ is a martingale with respect to $\mathcal{F}_t$, Theorem 22 implies $E[X(\tau \wedge u)] = E[X(0)]$, for $u > 0$. Now, we can write
$\big|E[X(\tau)] - E[X(0)]\big| = \big|E[X(\tau)] - E[X(\tau \wedge u)]\big| \le E[|X(\tau) - X(u)|1(\tau > u)].$
If (i) holds, then $|X(\tau) - X(u)|1(\tau > u) \le 2Z$, where $Z = \sup_{t \le \tau}|X(t)|$. Since $\tau$ is finite a.s., $1(\tau > u) \to 0$ a.s. as $u \to \infty$, and so by the dominated convergence theorem,
$\big|E[X(\tau)] - E[X(0)]\big| \le 2E[Z1(\tau > u)] \to 0.$
On the other hand, if (ii) holds, then
$\big|E[X(\tau)] - E[X(0)]\big| \le E\big[(|X(\tau)| + |X(u)|)1(\tau > u)\big] \to 0.$
Thus, $E[X(\tau)] = E[X(0)]$ if either (i) or (ii) is satisfied.
The next proposition and example illustrate computations involving optional stopping.
Proposition 25. (Wald Identities) Let $X$ be a process with stationary independent increments as in Example 18, with $E[|X(1)|] < \infty$ and $\psi(\alpha) = E[e^{\alpha X(1)}]$. Suppose $\tau$ is an a.s. finite stopping time of $X$. Then
$E[X(\tau)] = E[X(1)]E[\tau].$
If in addition $\tau$ is bounded or $E\big[\sup_{t \le \tau}|X(t)|\big] < \infty$, then
$E[e^{\alpha X(\tau)}\psi(\alpha)^{-\tau}] = 1, \quad \text{for any } \alpha \text{ with } \psi(\alpha) \ge 1.$ (5.20)

Proof. Example 18 establishes that $X(t) - E[X(1)]t$ is a martingale with respect to $X$. Now, $\tau \wedge t$ is a bounded stopping time of $X$, and so by the optional stopping theorem, $E[X(\tau \wedge t) - E[X(1)](\tau \wedge t)] = 0$. Letting $t \to \infty$ in this expression, the dominated and monotone convergence theorems yield $E[X(\tau)] = E[X(1)]E[\tau]$.

Similarly, $Z(t) = e^{\alpha X(t)}\psi(\alpha)^{-t}$ is a martingale with respect to $X$, and under the assumptions the optional stopping theorem or Corollary 24 imply that $E[Z(\tau)] = E[Z(0)] = 1$, which gives (5.20).
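The first Wald identity already holds for discrete random walks, which makes it easy to test by simulation. The sketch below (illustrative only, not from the text; NumPy assumed) uses i.i.d. exponential mean-1 steps stopped when the walk first reaches a level $b$, and compares $E[S_\tau]$ with $E[\text{step}]\,E[\tau]$:

```python
import numpy as np

rng = np.random.default_rng(4)
n_runs, b = 5000, 5.0
s_tau = np.empty(n_runs)
tau = np.empty(n_runs)
for i in range(n_runs):
    s, n = 0.0, 0
    while s < b:                   # stop when the walk first reaches level b
        s += rng.exponential(1.0)  # i.i.d. steps with mean 1
        n += 1
    s_tau[i], tau[i] = s, n

lhs, rhs = s_tau.mean(), 1.0 * tau.mean()  # Wald: E[S_tau] = E[step] E[tau]
```

Note that $S_\tau$ overshoots the level $b$, so the identity is between $E[S_\tau]$ and $E[\tau]$, not between $b$ and $E[\tau]$.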
Example 26. Brownian Optional Stopping. For a Brownian motion $B$, we know by Example 19 that the processes $B(t)$ and $B(t)^2 - t$ are martingales with respect to $B$ with zero means. Then as in the preceding proposition, we have the following result.

If $\tau$ is an a.s. finite stopping time of $B$, then $E[B(\tau)] = 0$. In addition, $E[\tau] = E[B(\tau)^2]$ if $\tau$ is bounded a.s.

Example 19 also noted that $X(t) = e^{cB(t) - c^2t/2}$ is a martingale with respect to $B$ with mean 1. If $\tau$ is an a.s. bounded stopping time of $B$, then $E[X(\tau)] = 1$ by the optional stopping theorem. Consequently, the conditional moment generating function for an increment of $B$ following $\tau$ is
$E[e^{c[B(\tau + u) - B(\tau)]}\,|\,\mathcal{F}_\tau] = e^{c^2u/2} = E[e^{cB(u)}].$
This was the key step in proving the strong Markov property of $B$ for bounded stopping times.
The rest of this section is devoted to proving the discrete-time optional stopping theorem used in the proof of Theorem 22. We begin with a preliminary result.

Proposition 27. Let $X$ and $Y$ be random variables on a probability space, and let $\mathcal{F}$ and $\mathcal{G}$ be two $\sigma$-fields on the space. Suppose there is an event $A \in \mathcal{F} \cap \mathcal{G}$ such that $X = Y$ a.s. on $A$ and $\mathcal{F} = \mathcal{G}$ on $A$ (i.e., $A \cap \mathcal{F} = A \cap \mathcal{G}$). Then $E[X\,|\,\mathcal{F}] = E[Y\,|\,\mathcal{G}]$ a.s. on $A$.
Proof. Let $Z = E[X\,|\,\mathcal{F}] - E[Y\,|\,\mathcal{G}]$ and $C = A \cap \{Z > 0\}$. Under the hypotheses, $C \in \mathcal{F} \cap \mathcal{G}$ and
$E[Z1_C] = E\big[E[X\,|\,\mathcal{F}]1_C - E[Y\,|\,\mathcal{G}]1_C\big] = E[X1_C - Y1_C] = 0.$
Here $1_C$ is the random variable $1(\omega \in C)$. Because a nonnegative random variable $V$ is 0 a.s. if and only if $E[V] = 0$, it follows that $Z1_C = 0$ a.s., which implies $Z \le 0$ a.s. on $A$. A similar argument with $C = A \cap \{Z < 0\}$ shows $Z \ge 0$ a.s. on $A$. This proves the assertion.
Theorem 28. Suppose that $\{X_n : n \in \mathbb{Z}_+\}$ is a martingale with respect to $\mathcal{F}_n$, and that $\sigma$ and $\tau$ are stopping times of $\mathcal{F}_n$ such that $\tau$ is bounded a.s. Then
$X_{\sigma \wedge \tau} = E[X_\tau\,|\,\mathcal{F}_\sigma] \quad \text{a.s.}$
Hence $E[X_\tau] = E[X_0]$. If $\sigma$ is also bounded, then $E[X_\sigma] = E[X_\tau]$.
Proof. For $m \le n$, one can show that $\mathcal{F}_\tau = \mathcal{F}_m$ on $\{\tau = m\}$. Then by Proposition 27 and the martingale property,
$E[X_n\,|\,\mathcal{F}_\tau] = E[X_n\,|\,\mathcal{F}_m] = X_m = X_\tau \quad \text{a.s. on } \{\tau = m\}.$
Since this is true for each $m \le n$, we have
$E[X_n\,|\,\mathcal{F}_\tau] = X_\tau \quad \text{a.s. if } \tau \le n \text{ a.s.}$ (5.21)
Now, consider the case $\sigma \le \tau \le n$ a.s. Then $\mathcal{F}_\sigma \subseteq \mathcal{F}_\tau$. Using this and (5.21) for $\tau$ and for $\sigma$, we get
$E[X_\tau\,|\,\mathcal{F}_\sigma] = E\big[E[X_n\,|\,\mathcal{F}_\tau]\,\big|\,\mathcal{F}_\sigma\big] = E[X_n\,|\,\mathcal{F}_\sigma] = X_\sigma \quad \text{a.s.}$
In addition, $E[X_\tau\,|\,\mathcal{F}_\sigma] = X_\tau$ a.s. if $\tau \le \sigma \wedge n$.

For the general case, similar reasoning using the preceding two results and Proposition 27 yields
$E[X_\tau\,|\,\mathcal{F}_\sigma] = E[X_\tau\,|\,\mathcal{F}_{\sigma \wedge \tau}] = X_{\sigma \wedge \tau} \quad \text{a.s. on } \{\sigma \le \tau\},$
$E[X_\tau\,|\,\mathcal{F}_\sigma] = E[X_{\sigma \wedge \tau}\,|\,\mathcal{F}_\sigma] = X_{\sigma \wedge \tau} \quad \text{a.s. on } \{\sigma > \tau\}.$
This proves $X_{\sigma \wedge \tau} = E[X_\tau\,|\,\mathcal{F}_\sigma]$ a.s. The other assertions follow as in the proof of Theorem 22.
5.7 Hitting Times for Brownian Motion with Drift
We will now address the following questions for a Brownian motion with drift. What is the probability that the process hits $b$ before it hits $a$, for $a < b$?
What is the distribution and mean of the time for the process to hit $b$? We answer these questions by applications of the material in the preceding section on martingales and optional stopping.

Consider a Brownian motion with drift $X(t) = x + \mu t + \sigma B(t)$, where $B$ is a standard Brownian motion. For $a < x < b$, let $\tau_a$ and $\tau_b$ denote the times at which $X$ hits the respective states $a$ and $b$. In addition, let $\tau = \tau_b \wedge \tau_a$, which is the time at which $X$ escapes from the open strip $(a, b)$. Our focus will be on properties of these hitting times.
Remark 29. Finiteness of Hitting Times. If $\mu \ge 0$, then $\tau_b$ is finite a.s. since, using Remark 9,
$\tau_b = \inf\{t \ge 0 : X(t) = b\} \le \inf\{t \ge 0 : B(t) = (b - x)/\sigma\} < \infty \quad \text{a.s.}$
Similarly, $\tau_a$ is finite a.s. if $\mu \le 0$. Also, $\tau$ is finite since either $\tau_a$ or $\tau_b$ is necessarily finite.
We begin with a result for a Brownian motion with no drift.
Theorem 30. The probability that the process $X(t) = x + \sigma B(t)$ hits $b$ before $a$ is
$P\{\tau_b < \tau_a\} = (x - a)/(b - a).$ (5.22)
Also, $E[\tau] = (x - a)(b - x)/\sigma^2$.

Proof. By Example 19, $X$ is a martingale with respect to $B$ with mean $x$. Also, $E[\sup_{t \le \tau}|X(t)|]$ is finite since $X(t) \in (a, b)$ for $t \le \tau$. Then by Corollary 24 on optional stopping applied to $\tau$,
$E[X(\tau)] = E[X(0)] = x.$ (5.23)
Now, since $\tau = \tau_a \wedge \tau_b$, we can write
$X(\tau) = a1(\tau_a \le \tau_b) + b1(\tau_b < \tau_a).$ (5.24)
Then
$E[X(\tau)] = a[1 - P\{\tau_b < \tau_a\}] + bP\{\tau_b < \tau_a\}.$
Substituting this in (5.23) and solving for $P\{\tau_b < \tau_a\}$, we obtain (5.22).

Next, we know by Example 19 that $Z(t) = (X(t) - x)^2 - \sigma^2 t$ is a martingale with respect to $B$. Then the optional stopping theorem for the bounded stopping time $\tau \wedge t$ yields $E[Z(\tau \wedge t)] = E[Z(0)] = 0$. That is,
$\sigma^2 E[\tau \wedge t] = E[(X(\tau \wedge t) - x)^2].$
Now since $\tau \wedge t \uparrow \tau$ and $X(t)$ is bounded for $t \le \tau$, it follows by the monotone and bounded convergence theorems that
$\sigma^2 E[\tau] = E[(X(\tau) - x)^2].$
Then using (5.24) in the last expectation followed by (5.22), we have
$\sigma^2 E[\tau] = (a - x)^2 P\{\tau_a \le \tau_b\} + (b - x)^2 P\{\tau_b < \tau_a\} = (x - a)(b - x).$
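Theorem 30 can be checked by simulating the driftless process on a fine grid (an illustrative sketch, not part of the text; NumPy assumed). With $a = 0$, $b = 1$, $x = 0.3$, $\sigma = 1$, the theory gives $P\{\tau_b < \tau_a\} = 0.3$ and $E[\tau] = 0.3 \cdot 0.7 = 0.21$:

```python
import numpy as np

rng = np.random.default_rng(5)
n_paths, dt = 2000, 1e-4
a, b, x0, sigma = 0.0, 1.0, 0.3, 1.0

x = np.full(n_paths, x0)
t_exit = np.zeros(n_paths)
hit_b = np.zeros(n_paths, dtype=bool)
alive = np.ones(n_paths, dtype=bool)
t, max_t = 0.0, 20.0
while alive.any() and t < max_t:
    t += dt
    x[alive] += sigma * np.sqrt(dt) * rng.normal(size=alive.sum())
    up = alive & (x >= b)        # escaped through the top of the strip
    down = alive & (x <= a)      # escaped through the bottom
    hit_b |= up
    t_exit[up | down] = t
    alive &= ~(up | down)

p_b = hit_b.mean()        # theory (5.22): (x0 - a)/(b - a) = 0.3
mean_tau = t_exit.mean()  # theory: (x0 - a)(b - x0)/sigma^2 = 0.21
```

The discretization introduces a small overshoot at the boundaries, but with a step of $10^{-4}$ both estimates land close to the exact values.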
The preceding result for a Brownian motion with no drift has the following analogue for a Brownian motion $X(t) = x + \mu t + \sigma B(t)$ with drift $\mu \ne 0$.
Theorem 31. The probability that the process $X$ hits $b$ before $a$ is
$P\{\tau_b < \tau_a\} = \frac{e^{\alpha x} - e^{\alpha a}}{e^{\alpha b} - e^{\alpha a}},$ (5.25)
where $\alpha = -2\mu/\sigma^2$. In addition,
$E[\tau] = \mu^{-1}\big[(a - x) + (b - a)P\{\tau_b < \tau_a\}\big].$ (5.26)
Proof. As in Example 19, $Z(t) = \exp\{cX(t) - cx - (c\mu + c^2\sigma^2/2)t\}$ is a martingale with respect to $B$. Letting $c = \alpha$, this martingale reduces to $Z(t) = e^{\alpha X(t) - \alpha x}$.

Now, $E[\sup_{t \le \tau} Z(t)]$ is finite, since $X(t) \in [a, b]$ for $t \le \tau$. Then by Corollary 24 on optional stopping,
$1 = E[Z(0)] = E[Z(\tau)] = e^{-\alpha x}E[e^{\alpha X(\tau)}].$
Now, using $X(\tau) = a1(\tau_a \le \tau_b) + b1(\tau_b < \tau_a)$ in this expression yields
$e^{\alpha x} = e^{\alpha a}(1 - P\{\tau_b < \tau_a\}) + e^{\alpha b}P\{\tau_b < \tau_a\}.$
This proves (5.25).

To determine $E[\tau]$, we apply the optional stopping theorem to the martingale $B(t) = \sigma^{-1}[X(t) - x - \mu t]$ and the bounded stopping time $\tau \wedge t$ to get $0 = E[B(\tau \wedge t)]$. That is,
$\mu E[\tau \wedge t] + x = E[X(\tau \wedge t)].$
Letting $t \to \infty$ in this expression, we have $\tau \wedge t \uparrow \tau$, and so the monotone and bounded convergence theorems yield
$\mu E[\tau] + x = E[X(\tau)] = bP\{\tau_b < \tau_a\} + a(1 - P\{\tau_b < \tau_a\}).$
This proves (5.26).
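Formulas (5.25) and (5.26) are convenient to package as functions (a hypothetical helper, not from the text); note that letting $\mu \to 0$ recovers the driftless answers of Theorem 30:

```python
from math import exp

def p_hit_b_first(x, a, b, mu, sigma):
    """P{tau_b < tau_a} for X(t) = x + mu t + sigma B(t); (5.25), or (5.22) if mu = 0."""
    if mu == 0.0:
        return (x - a) / (b - a)
    alpha = -2.0 * mu / sigma**2
    return (exp(alpha * x) - exp(alpha * a)) / (exp(alpha * b) - exp(alpha * a))

def mean_exit_time(x, a, b, mu, sigma):
    """E[tau] for the exit time of (a, b); (5.26), or Theorem 30's formula if mu = 0."""
    if mu == 0.0:
        return (x - a) * (b - x) / sigma**2
    return ((a - x) + (b - a) * p_hit_b_first(x, a, b, mu, sigma)) / mu

p = p_hit_b_first(0.5, 0.0, 1.0, 1.0, 1.0)   # about 0.731
m = mean_exit_time(0.5, 0.0, 1.0, 1.0, 1.0)  # about 0.231
```

A positive drift pushes the exit probability through $b$ well above the symmetric value $0.5$, as expected.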
The last result of this section characterizes the distribution of the hitting time $\tau_b$ for a Brownian motion $X(t) = \mu t + \sigma B(t)$ with drift $\mu$.

Theorem 32. Let $\tau_b$ denote the time at which the Brownian motion $X$ hits $b > 0$. If $\mu \ge 0$, then $E[\tau_b] = b/\mu$ (which is $\infty$ if $\mu = 0$) and the Laplace transform and density of $\tau_b$ are
$E[e^{-\lambda\tau_b}] = \exp\big\{-b\sigma^{-2}\big[\sqrt{\mu^2 + 2\sigma^2\lambda} - \mu\big]\big\},$ (5.27)
$f_{\tau_b}(x) = \frac{b}{\sigma\sqrt{2\pi x^3}}\, \exp\big\{-(b - \mu x)^2/(2\sigma^2 x)\big\}.$ (5.28)
If $\mu < 0$, then $\tau_b$ may be infinite, and $P\{\tau_b < \infty\} = e^{2b\mu/\sigma^2}$.
Proof. For the case $\mu < 0$, it follows by (5.25) (with $x = 0$ and $\alpha > 0$) that
$P\{\tau_b < \infty\} = \lim_{a \to -\infty} P\{\tau_b < \tau_a\} = e^{2b\mu/\sigma^2}.$
Next, consider the case $\mu \ge 0$. For positive constants $\alpha$ and $\lambda$, consider the process $Z(t) = e^{\alpha X(t) - \lambda t}$. This is a martingale (see Example 19) and it reduces to $Z(t) = e^{cB(t) - c^2t/2}$, where
$c = \alpha\sigma, \qquad \alpha = \sigma^{-2}\big[\sqrt{\mu^2 + 2\sigma^2\lambda} - \mu\big].$
This choice of $\alpha$ ensures that $\alpha^2\sigma^2/2 + \alpha\mu - \lambda = 0$.

Now, applying the optional stopping theorem to the martingale $Z(t)$ and the bounded stopping time $\tau_b \wedge t$, we obtain
$1 = E[Z(0)] = E[Z(\tau_b \wedge t)] = E[e^{\alpha X(\tau_b \wedge t) - \lambda(\tau_b \wedge t)}].$
Since $X$ is continuous a.s., we have
$\lim_{t\to\infty}[\alpha X(\tau_b \wedge t) - \lambda(\tau_b \wedge t)] = \alpha b - \lambda\tau_b \quad \text{a.s.}$
Then by the preceding displays and the bounded convergence theorem,
$1 = E\big[\lim_{t\to\infty} Z(\tau_b \wedge t)\big] = e^{\alpha b}E[e^{-\lambda\tau_b}].$
This proves (5.27). Inverting this Laplace transform yields the density formula (5.28). Finally, the derivative of this transform at $\lambda = 0$ yields $E[\tau_b] = b/\mu$.
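The mean $E[\tau_b] = b/\mu$ for $\mu > 0$ can be corroborated by simulation (illustrative sketch, not part of the text; NumPy assumed), here with $\mu = \sigma = 1$ and $b = 2$:

```python
import numpy as np

rng = np.random.default_rng(6)
n_paths, dt = 2000, 1e-3
mu, sigma, b = 1.0, 1.0, 2.0

x = np.zeros(n_paths)
tau = np.zeros(n_paths)
alive = np.ones(n_paths, dtype=bool)
t, t_cap = 0.0, 50.0          # cap: P{tau_b > 50} is negligible here
while alive.any() and t < t_cap:
    t += dt
    x[alive] += mu * dt + sigma * np.sqrt(dt) * rng.normal(size=alive.sum())
    hit = alive & (x >= b)    # first passage over level b
    tau[hit] = t
    alive &= ~hit

mean_tau = tau[tau > 0].mean()   # theory: E[tau_b] = b/mu = 2
```

The density (5.28) is the inverse Gaussian density, so `tau` should also match that shape in a histogram.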
5.8 Limiting Averages and Law of the Iterated Logarithm
This section contains strong laws of large numbers for Brownian motion and its maximum process, and a law of the iterated logarithm for Brownian motion.
As usual, $B$ will denote a standard Brownian motion. The strong law of large numbers for it is as follows.
Theorem 33. A Brownian motion B has the limiting average
$\lim_{t\to\infty} t^{-1}B(t) = 0 \quad \text{a.s.}$
Proof. Since $B$ has stationary independent increments, it has regenerative increments with respect to the deterministic times $T_n = n$. Then the assertion will follow by the SLLN in Theorem 54 in Chapter 2 for processes with regenerative increments upon showing that $n^{-1}B(n) \to 0$ a.s., and
$E\big[\max_{n-1 \le t \le n} |B(t)|\big] < \infty.$ (5.29)
Now, the SLLN for i.i.d. random variables ensures that
$n^{-1}B(n) = n^{-1}\sum_{m=1}^n [B(m) - B(m-1)] \to E[B(1)] = 0 \quad \text{a.s.}$
Also, (5.29) follows since $E[M(1)] < \infty$ and
$\max_{n-1 \le t \le n} |B(t)| \overset{d}{=} \max_{0 \le t \le 1} |B(t)| \le M(1) - \hat{M}(1),$
where $\hat{M}(t) = \min_{s \le t} B(s) \overset{d}{=} -M(t)$ by Remark 12.
If a real-valued process $X$, such as a Brownian motion or a functional of a Markov process, has a limiting average $t^{-1}X(t) \to c$ a.s., you might wonder if its maximum $M(t) = \sup_{s \le t} X(s)$ also satisfies $t^{-1}M(t) \to c$ a.s. Wonder no longer. The answer is given by the next property, which is analogous to the elementary fact that if $n^{-1}c_n \to c$, then $n^{-1}\sum_{k=1}^n c_k \to c$.
Proposition 34. Let $x(t)$ and $a(t)$ be real-valued functions on $\mathbb{R}_+$ such that
$0 \le a(t) \to \infty, \qquad a(t)^{-1}x(t) \to c, \quad \text{as } t \to \infty.$
Then the maximum $m(t) = \sup_{s \le t} x(s)$ satisfies $\lim_{t\to\infty} a(t)^{-1}m(t) = c$.

Proof. For any $\varepsilon > 0$, let $t'$ be such that $a(t)^{-1}x(t) < c + \varepsilon$, for $t \ge t'$. Then for $t \ge t'$,
$a(t)^{-1}x(t) \le a(t)^{-1}m(t) = \max\big\{a(t)^{-1}m(t'),\ a(t)^{-1}\sup_{t' \le s \le t} x(s)\big\} \le \max\big\{a(t)^{-1}m(t'),\ c + \varepsilon\big\}.$
Letting $t \to \infty$ in this display, it follows that
$c \le \liminf_{t\to\infty} a(t)^{-1}m(t) \le \limsup_{t\to\infty} a(t)^{-1}m(t) \le c + \varepsilon.$
Since this is true for any $\varepsilon$, we have $a(t)^{-1}m(t) \to c$.
We will now apply this result to the maximum process $M(t) = \max_{s \le t} X(s)$ for a Brownian motion with drift $X(t) = x + \mu t + \sigma B(t)$.
Proposition 35. The Brownian motion with drift $X$ and its maximum process have the limiting averages
$t^{-1}X(t) \to \mu, \qquad t^{-1}M(t) \to \mu \quad \text{a.s. as } t \to \infty.$

Proof. This follows since the SLLN $t^{-1}B(t) \to 0$ implies that
$t^{-1}X(t) = t^{-1}x + \mu + \sigma t^{-1}B(t) \to \mu \quad \text{a.s.},$
and then $t^{-1}M(t) \to \mu$ a.s. follows by Proposition 34.
The preceding result implies that $X(t) \to \infty$ or $-\infty$ a.s. according as the drift $\mu$ is positive or negative. This tells us something about the maximum
$M(\infty) = \sup_{t \in \mathbb{R}_+} X(t)$
on the entire time axis. First, we have the obvious result
$M(\infty) = \lim_{t\to\infty} M(t) = \infty \quad \text{a.s. when } \mu > 0.$
Second, $M(\infty) = \infty$ a.s. when $\mu = 0$ by the law of the iterated logarithm in (5.30) below.

For the remaining case of $\mu < 0$, we have the following result.
Theorem 36. If $\mu < 0$ and $X(0) = 0$, then $M(\infty)$ has an exponential distribution with rate $-2\mu/\sigma^2$.

Proof. The assertion follows, since letting $t \to \infty$ in $\{M(t) > b\} = \{\tau_b < t\}$ and using Theorem 32,
$P\{M(\infty) > b\} = P\{\tau_b < \infty\} = e^{2\mu b/\sigma^2}.$
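Theorem 36 can also be seen numerically (illustrative sketch, not part of the text; NumPy assumed): with $\mu = -1$ and $\sigma = 1$, $M(\infty)$ should be exponential with rate 2, and by time $t = 10$ the running maximum has essentially stopped growing, since $X(t) \approx -t$ by then:

```python
import numpy as np

rng = np.random.default_rng(7)
n_paths, dt, n_steps = 4000, 1e-3, 10000   # time horizon t = 10
mu, sigma = -1.0, 1.0

x = np.zeros(n_paths)
m = np.zeros(n_paths)          # running maximum, starting from X(0) = 0
sd = sigma * np.sqrt(dt)
for _ in range(n_steps):
    x += mu * dt + sd * rng.normal(size=n_paths)
    np.maximum(m, x, out=m)    # update the maximum in place

mean_m = m.mean()              # theory: mean of Exp(rate 2) is 0.5
p_tail = (m > 0.5).mean()      # theory: P{M(inf) > 0.5} = e^{-1} ~ 0.368
```

The running maximum on the grid slightly undershoots the continuous-time maximum, so both estimates sit marginally below the exponential predictions.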
We will now consider fluctuations of Brownian motions that are described by a law of the iterated logarithm. Knowing that the limiting average of Brownian motion $B$ is 0 as $t \to \infty$, a follow-on issue is to characterize its fluctuations about 0. These fluctuations, of course, can be described for a "fixed" $t$ by the normal distribution of $B(t)$; e.g., $P\{B(t) \le 2\sqrt{t}\} \approx .95$.

However, to get a handle on rare fluctuations as $t \to \infty$, it is of interest to find constants $h(t)$ such that
$\limsup_{t\to\infty} \frac{B(t)}{h(t)} = 1 \quad \text{a.s.}$
In other words, $h(t)$ is the maximum height of the fluctuations of $B(t)$ above 0, and $B(t)$ gets near $h(t)$ infinitely often (i.o.) as $t \to \infty$ in that
$P\{B(t) \in [h(t) - \varepsilon, h(t)] \text{ i.o.}\} = 1, \quad \varepsilon > 0.$
Since the reflection $-B$ is a Brownian motion, the preceding would also yield
$\liminf_{t\to\infty} \frac{B(t)}{h(t)} = -1 \quad \text{a.s.}$
These fluctuations are related to those as $t \downarrow 0$ as follows.

Remark 37.
$\limsup_{t\to\infty} \frac{B(t)}{h(t)} = 1 \ \text{a.s.} \iff \limsup_{t\downarrow 0} \frac{B(t)}{t\,h(1/t)} = 1 \ \text{a.s.}$
This is because the time-inversion process $X(t) = tB(1/t)$ is a Brownian motion by Exercise 2. Indeed, the equivalence is true since, using $s = 1/t$,
$\limsup_{t\to\infty} \frac{B(t)}{h(t)} = \limsup_{s\downarrow 0} \frac{X(s)}{s\,h(1/s)}.$

Remark 37 says that $h(t)$ is the height function for fluctuations of $B$ as $t \to \infty$ if and only if $t\,h(1/t)$ is the height function for fluctuations as $t \downarrow 0$. The height functions for both of these cases are as follows. The proof, due to Khintchine 1924, is in [37, 61, 64].
Theorem 38. (Law of the Iterated Logarithm)
$\limsup_{t\downarrow 0} \frac{B(t)}{\sqrt{2t\log\log(1/t)}} = 1, \qquad \limsup_{t\to\infty} \frac{B(t)}{\sqrt{2t\log\log t}} = 1 \quad \text{a.s.}$
$\liminf_{t\downarrow 0} \frac{B(t)}{\sqrt{2t\log\log(1/t)}} = -1, \qquad \liminf_{t\to\infty} \frac{B(t)}{\sqrt{2t\log\log t}} = -1 \quad \text{a.s.}$
Note that the $\limsup_{t\downarrow 0}$ result implies that $B(t) > 0$ i.o. near 0, and so
$\inf\{t > 0 : B(t) > 0\} = 0 \quad \text{a.s.}$
Similarly, the $\liminf_{t\downarrow 0}$ result implies that $B(t) < 0$ i.o. near 0 a.s. Consequently, $B(t) = 0$ i.o. near 0 a.s. because $B$ has continuous paths a.s.

The other results for $t \to \infty$ imply that, for any fixed $a > 0$, we have $B(t) > a$ and $B(t) < -a$ i.o. a.s. as $t \to \infty$, and so $B$ passes through $[-a, a]$ i.o. a.s. Furthermore, the extremes of $B$ are
$\sup_{t \in \mathbb{R}_+} B(t) = \infty \quad \text{a.s.}, \qquad \inf_{t \in \mathbb{R}_+} B(t) = -\infty \quad \text{a.s.}$ (5.30)
5.9 Donsker’s Functional Central Limit Theorem
By the classical central limit theorem (Theorem 63 in Chapter 2), we know that a random walk under an appropriate normalization converges in distribution to a normal random variable. This section extends this result to stochastic processes. In particular, viewing a random walk as a process in continuous time, if the time and space parameters are rescaled appropriately, then the random walk process converges in distribution to a Brownian motion. This result, called Donsker's functional central limit theorem, also establishes that many functionals of random walks can be approximated by corresponding functionals of Brownian motion.
Throughout this section, $S_n = \sum_{k=1}^n \xi_k$ will denote a random walk in which the step sizes $\xi_n$ are i.i.d. with mean 0 and variance 1. For each $n$, consider the stochastic process
$X_n(t) = n^{-1/2}S_{\lfloor nt \rfloor}, \quad t \in [0, 1].$
That is,
$X_n(t) = n^{-1/2}S_k \quad \text{if } k/n \le t < (k+1)/n \text{ for some } k < n.$
This process is a continuous-time representation of the random walk $S_k$ in which the location $S_k$ is rescaled (or shrunk) to the value $n^{-1/2}S_k$, and the time scale is rescaled such that the walk takes $\lfloor nt \rfloor$ steps in time $t$. Then as $n$ becomes large the steps become very small and frequent and, as we will show, $X_n$ converges in distribution to a standard Brownian motion $B$ as $n \to \infty$.
We begin with the preliminary observation that the finite-dimensional distributions of $X_n$ converge in distribution to those of $B$. That is, for any fixed $t_1 < \dots < t_k$,
$(X_n(t_1), \dots, X_n(t_k)) \overset{d}{\to} (B(t_1), \dots, B(t_k)), \quad \text{as } n \to \infty.$ (5.31)
In particular, for each fixed $t$, we have $X_n(t) \overset{d}{\to} B(t)$, as $n \to \infty$. The latter follows since $n^{-1/2}S_n \overset{d}{\to} B(1)$ by the classical central limit theorem, and so
$X_n(t) = (\lfloor nt \rfloor/n)^{1/2}\,\lfloor nt \rfloor^{-1/2}S_{\lfloor nt \rfloor} \overset{d}{\to} t^{1/2}B(1) \overset{d}{=} B(t).$
Similarly, (5.31) follows by a multivariate central limit theorem.

Expression (5.31) only provides a partial description of the convergence in distribution of $X_n$ to $B$; we will now give a complete description of the convergence that includes sample path information.
Throughout this section, $D = D[0,1]$ will denote the set of all functions $x : [0,1] \to \mathbb{R}$ that are right-continuous with left-hand limits. Assume that the $\sigma$-field associated with $D$ is the smallest $\sigma$-field under which the projection
map $x \to x(t)$ is measurable, for each $t$. Almost every sample path of $X_n$ is a function in $D$, and so the process $X_n$ is a $D$-valued random variable (or a random element in $D$).

We will consider $D$ as a metric space in which the distance between two functions $x$ and $y$ is $\|x - y\|$, based on the uniform or supremum norm
$\|x\| = \sup_{t \le 1} |x(t)|.$
Other metrics for $D$ are discussed in [11, 115]. Convergence in distribution of random elements in $D$, as in other metric spaces, is as follows. Random elements $X_n$ in a metric space $S$ converge in distribution to $X$ in $S$ as $n \to \infty$, denoted by $X_n \overset{d}{\to} X$ in $S$, if
$\lim_{n\to\infty} E[f(X_n)] = E[f(X)],$
for any bounded continuous function $f : S \to \mathbb{R}$. The convergence $X_n \overset{d}{\to} X$ is equivalent to the weak convergence of their distributions
$P\{X_n \in \cdot\} \overset{w}{\to} P\{X \in \cdot\}.$ (5.32)
Several criteria for this convergence are in the Appendix. An important consequence of $X_n \overset{d}{\to} X$ in $S$ is that it readily leads to the convergence in distribution of a variety of functionals of the $X_n$, as follows.
Theorem 39. (Continuous Mapping) Suppose that $X_n \overset{d}{\to} X$ in $S$ as $n \to \infty$, and $f : S \to S'$ is a measurable mapping, where $S'$ is another metric space. If $C \subseteq S$ is in the $\sigma$-field of $S$ such that $f$ is continuous on $C$ and $X \in C$ a.s., then $f(X_n) \overset{d}{\to} f(X)$ in $S'$ as $n \to \infty$.
Proof. Recall that $X_n \overset{d}{\to} X$ is equivalent to (5.32), which we will denote by $\mu_n \overset{w}{\to} \mu$. Then $f(X_n) \overset{d}{\to} f(X)$ is equivalent to $\mu_n f^{-1} \overset{w}{\to} \mu f^{-1}$ since
$P\{f(X_n) \in A\} = P\{X_n \in f^{-1}(A)\} = \mu_n f^{-1}(A).$
Also note that by Theorem 10 in the Appendix, $\mu_n \overset{w}{\to} \mu$ is equivalent to
$\liminf_{n\to\infty} \mu_n(G) \ge \mu(G), \quad \text{for any open } G \subseteq S.$
Now using this characterization, for any open set $G \subseteq S'$,
$\liminf_{n\to\infty} \mu_n f^{-1}(G) \ge \liminf_{n\to\infty} \mu_n((f^{-1}(G))^\circ) \ge \mu((f^{-1}(G))^\circ).$
Here $A^\circ$ is the interior of the set $A$. Clearly $(f^{-1}(G))^\circ \supseteq C \cap f^{-1}(G)$, and $\mu(C) = 1$ by the assumption $X \in C$ a.s. Then $\mu((f^{-1}(G))^\circ) = \mu f^{-1}(G)$. Using
this in the preceding display yields $\liminf_{n\to\infty} \mu_n f^{-1}(G) \ge \mu f^{-1}(G)$, which proves $\mu_n f^{-1} \overset{w}{\to} \mu f^{-1}$, and hence $f(X_n) \overset{d}{\to} f(X)$.
We are now ready to present the functional central limit theorem proved by Donsker in 1951 for the continuous-time random walk process
$X_n(t) = n^{-1/2}S_{\lfloor nt \rfloor}, \quad t \in [0, 1].$

Theorem 40. (Donsker's FCLT) For the random walk process $X_n$ defined above, $X_n \overset{d}{\to} B$ in $D$ as $n \to \infty$, where $B$ is a standard Brownian motion.
The proof of this theorem will follow after a few observations and preliminary results. Donsker's theorem is called a "functional central limit theorem" because, under the continuous-mapping theorem, many functionals of the random walk also converge in distribution to the corresponding functionals of the limiting Brownian motion. Two classic examples are as follows; we cover other examples later.
Example 41. If Xn d→ B in D, then, for t1 < · · · < tk ≤ 1,

(n−1/2 S⌊nt1⌋, . . . , n−1/2 S⌊ntk⌋) d→ (B(t1), . . . , B(tk)). (5.33)

This convergence is equivalent to (5.31). Now (5.33) says f(Xn) d→ f(B), where f : D → Rk is defined, for fixed t1 < · · · < tk, by

f(x) = (x(t1), . . . , x(tk)).

Clearly f is continuous on the set C of continuous functions in D, and B ∈ C a.s. Then (5.33) follows from the continuous-mapping theorem.
Example 42. The convergence Xn d→ B implies

n−1/2 maxm≤n Sm d→ maxs≤1 B(s).

The distribution of the limit is given in Theorem 11. The convergence follows by the continuous-mapping theorem, since the function f : D → R+ defined by f(x) = maxs≤1 x(s) is continuous, in that ‖xn − x‖ → 0 implies maxs≤1 xn(s) → maxs≤1 x(s).
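As a quick numerical illustration of this example (not from the text; the walk with ±1 steps, the sample sizes, and the point x = 1 are arbitrary choices), one can compare the empirical distribution of the scaled maximum n−1/2 maxm≤n Sm with the reflection-principle limit P{maxs≤1 B(s) ≤ x} = 2Φ(x) − 1 from Theorem 11:

```python
from math import erf, sqrt

import numpy as np

rng = np.random.default_rng(0)
n, reps, x = 400, 20000, 1.0

# Simple random walk with +/-1 steps (mean 0, variance 1), many replications.
steps = rng.choice([-1.0, 1.0], size=(reps, n))
walks = np.cumsum(steps, axis=1)

# Scaled maxima n^{-1/2} max_{m<=n} S_m (including S_0 = 0).
maxima = np.maximum(walks.max(axis=1), 0.0) / sqrt(n)
empirical = (maxima <= x).mean()

# Reflection principle: P{max_{s<=1} B(s) <= x} = 2*Phi(x) - 1.
Phi = 0.5 * (1.0 + erf(x / sqrt(2.0)))
limit_cdf = 2.0 * Phi - 1.0
print(empirical, limit_cdf)
```

The two printed numbers should be close; the discrete maximum slightly undershoots the Brownian one, so the empirical probability tends to sit a little above the limit.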
Donsker's FCLT is also called an invariance principle because, in the convergence Xn d→ B, the Brownian motion limit B is the same for "any" distribution of the step sizes of the random walk, provided they have a finite mean and variance. When the mean and variance are not 0 and 1, respectively, the result applies with the following change in notation.
Remark 43. If ξn are i.i.d. random variables with finite mean μ and variance σ2, then the (ξk − μ)/σ are i.i.d. with mean 0 and variance 1, and hence Donsker's theorem holds for

Xn(t) = n−1/2 ∑⌊nt⌋k=1 (ξk − μ)/σ, t ∈ [0, 1].
Consequently, the random walk Sn = ∑nk=1 ξk, for large n, is approximately equal in distribution to a Brownian motion with drift. In particular, using n1/2B(t) d= B(nt),

S⌊nt⌋ d≈ μnt + σB(nt), Sn d≈ μn + σB(n).
Does the convergence in distribution in Donsker's theorem hold for processes defined on the entire time axis R+? To answer this, consider the space D[0, T ] of all functions x : [0, T ] → R that are right-continuous with left-hand limits, for fixed T > 0. As with D[0, 1], the space D[0, T ] is a metric space with the supremum norm. Now let D(R+) denote the space of all functions x : R+ → R that are right-continuous with left-hand limits. Consider D(R+) as a metric space in which convergence xn → x in D(R+) holds if xn → x in D[0, T ] holds for each T that is a continuity point of x.
Remark 44. Convergence in D(R+). Donsker's convergence Xn d→ B holds in D[0, T ] for each T, and in D(R+) as well. The proof for D[0, T ] is exactly the same as that for D[0, 1]. The convergence also holds in D(R+), since B is continuous a.s.
Donsker's approach for proving Theorem 40 is to prove the convergence (5.31) of the finite-dimensional distributions and then establish a certain tightness condition. This proof is described in Billingsley 1968; his book and one by Whitt 2002 cover many fundamentals of functional limit theorems and weak convergence of probability measures on metric spaces.
Another approach for proving Theorem 40, which we will now present, is by applying Skorohod's embedding theorem. The gist of this approach is that one can construct a Brownian motion B and stopping times τn for it such that {Sn} d= {B(τn)}. Then further analysis of Xn and B defined on the same probability space establishes ‖Xn − B‖ P→ 0, which yields Xn d→ B.
The key embedding theorem for this analysis is as follows. It says that any random variable ξ with mean 0 and variance 1 can be represented as B(τ) for an appropriately defined stopping time τ. Furthermore, any i.i.d. sequence ξn of such variables can be represented as an embedded sequence B(τn) − B(τn−1) in a Brownian motion B for appropriate stopping times τn. The proof is in [37, 61].
Theorem 45. (Skorohod Embedding) Associated with the random walk Sn, there exists a standard Brownian motion B with respect to a filtration, and stopping times 0 = τ0 ≤ τ1 ≤ · · · , such that the increments τn − τn−1 are i.i.d. with mean E[τ1] = 1, and {Sn} d= {B(τn)}.
Another preliminary leading to Donsker's theorem is the following Skorohod approximation result: the uniform difference between the random walk and a Brownian motion on [0, t] is o(t1/2) in probability as t → ∞. This material and the proof of Donsker's theorem below are from Kallenberg 2004.
Theorem 46. (Skorohod Approximation of Random Walks) There exists a standard Brownian motion B on the same probability space as the random walk Sn such that

t−1/2 sups≤t |S⌊s⌋ − B(s)| P→ 0, as t → ∞. (5.34)
Proof. Let B and τn be as in Theorem 45, and define them on the same probability space as the Sn (which is possible), so that Sn = B(τn) a.s. Define

D(t) = t−1/2 sups≤t |B(τ⌊s⌋) − B(s)|.

Then (5.34) is equivalent to P{D(t) > ε} → 0 for ε > 0.
To prove this convergence, let δt = sups≤t |τ⌊s⌋ − s|, t ≥ 0. For a fixed γ > 0, consider the inequality

P{D(t) > ε} ≤ P{D(t) > ε, t−1δt ≤ γ} + P{t−1δt > γ}. (5.35)

Note that n−1τn → 1 a.s. by the strong law of large numbers for i.i.d. random variables, and so t−1(τ⌊t⌋ − t) → 0 a.s. Then the supremum of these differences satisfies t−1δt → 0 a.s. by Proposition 34.
Next, consider the modulus of continuity of f : R+ → R, which is

w(f, t, γ) = supr,s≤t, |r−s|≤γ |f(r) − f(s)|, t ≥ 0.

Clearly

D(t) ≤ t−1/2 w(B, t + tγ, tγ), when t−1δt ≤ γ.

Using this observation in (5.35) and {t−1/2B(tr) : r ≥ 0} d= {B(r) : r ≥ 0} from the scaling property in Exercise 2, we have

P{D(t) > ε} ≤ P{t−1/2 w(B, t + tγ, tγ) > ε} + P{t−1δt > γ}
= P{w(B, 1 + γ, γ) > ε} + P{t−1δt > γ}.

Letting t → ∞ (since t−1δt → 0 a.s.), and then letting γ → 0 (since B has continuous paths a.s.), the last two probabilities tend to 0. Thus P{D(t) > ε} → 0, which proves (5.34).
We will now obtain Donsker’s theorem by applying Theorem 46.
Proof of Donsker's Theorem. Let B and Sn = B(τn) a.s. be as in the proof of Theorem 46, and define Bn(t) = n−1/2B(nt). Clearly

‖Xn − Bn‖ = n−1/2 supt≤1 |S⌊nt⌋ − B(nt)| = n−1/2 sups≤n |S⌊s⌋ − B(s)|.

Then ‖Xn − Bn‖ P→ 0 by Theorem 46.
Next, note that, by Exercise 1, the scaled process Bn is a Brownian motion. Now, as in [61], one can construct processes X̃n and a Brownian motion B̃ on the same probability space such that (X̃n, B̃) d= (Xn, Bn). Then we have

‖X̃n − B̃‖ d= ‖Xn − Bn‖ P→ 0.

This proves Xn d→ B.
5.10 Regenerative and Markov FCLTs
This section presents an extension of Donsker's FCLT for processes with regenerative increments. This in turn yields FCLTs for renewal processes and for ergodic Markov chains in discrete and continuous time.
For this discussion, suppose that {Z(t) : t ≥ 0} is a real-valued process with Z(0) = 0 that is defined on the same probability space as a renewal process N(t) whose renewal times are denoted by 0 = T0 < T1 < · · · . The increments of the two-dimensional process (N(t), Z(t)) in the interval [Tn−1, Tn) are denoted by

ζn = (Tn − Tn−1, {Z(t) − Z(Tn−1) : t ∈ [Tn−1, Tn)}).

Recall from Section 2.10 that Z(t) has regenerative increments over the Tn if the ζn are i.i.d.
Theorem 65 in Chapter 2 is a central limit theorem for processes with regenerative increments. An analogous FCLT is as follows. Assuming they are finite, let

μ = E[T1], a = E[Z(T1)]/μ, σ2 = Var[Z(T1) − aT1],

and assume σ > 0. In addition, let

Mn = supTn<t≤Tn+1 |Z(t) − Z(Tn)|, n ≥ 0,

and assume E[M1] and E[T1^2] are finite. For r > 0, consider the process
Xr(t) = (Z(rt) − art)/(σ√(r/μ)), t ∈ [0, 1].

This is the regenerative-increment process Z with space–time scale changes analogous to those for random walks. A real-valued parameter r, instead of an integer, is appropriate since Z is a continuous-time process.
Theorem 47. (Regenerative Increments) For the normalized regenerative-increment process Xr defined above, Xr d→ B as r → ∞, where B is a standard Brownian motion.
The proof uses the next two results. Let D1 denote the subspace of functions x in D that are nondecreasing with x(0) = 0 and x(t) ↑ 1 as t → 1. The composition mapping from the product space D × D1 to D, denoted by (x, y) → x ◦ y, is defined by x ◦ y(t) = x(y(t)), t ∈ [0, 1]. Let C and C1 denote the subspaces of continuous functions in D and D1, respectively.
Proposition 48. The composition mapping from D × D1 to D is continuous on the subspace C × C1.

Proof. Suppose (xn, yn) → (x, y) in D × D1, where (x, y) ∈ C × C1. Using the sup norm and the triangle inequality,

‖xn ◦ yn − x ◦ y‖ ≤ ‖xn ◦ yn − x ◦ yn‖ + ‖x ◦ yn − x ◦ y‖.

Now, the last term tends to 0 since x ∈ C is uniformly continuous. Also,

‖xn ◦ yn − x ◦ yn‖ ≤ ‖xn − x‖ → 0.

Thus xn ◦ yn → x ◦ y in D, which proves the assertion.
The continuity of composition mappings under weaker assumptions is discussed in [11, 115]. The importance of the composition mapping is illustrated by the following result. In the setting of Theorem 47, the regenerative-increment property of Z implies that the

ξn = Z(Tn) − Z(Tn−1) − a(Tn − Tn−1)

are i.i.d. with mean 0 and variance σ2.
Lemma 49. Under the preceding assumptions, define the process

X′r(t) = (1/(σ√(r/μ))) ∑N(rt)k=1 ξk, t ∈ [0, 1].

Then X′r d→ B as r → ∞.
Proof. Letting X̃r(t) = (1/(σ√(r/μ))) ∑⌊rt⌋k=1 ξk, it follows by Donsker's theorem that X̃r d→ μ1/2B as r → ∞. With no loss in generality, assume μ−1 < 1. Consider the process

Yr(t) = N(rt)/r if N(r)/r ≤ μ−1,
Yr(t) = t/μ if N(r)/r > μ−1.
Note that

X̃r ◦ Yr(t) = (1/(σ√(r/μ))) ∑rYr(t)k=1 ξk.

This equals X′r(t) when N(r)/r ≤ μ−1, and so for any ε > 0,

P{‖X′r − X̃r ◦ Yr‖ > ε} ≤ P{N(r)/r > μ−1} → 0.

The convergence follows since N(r)/r → μ−1 a.s. by the SLLN for renewal processes (Corollary 11 in Chapter 2). This proves X′r − X̃r ◦ Yr d→ 0. Then to prove X′r d→ B, it suffices by Exercise 53 to show that X̃r ◦ Yr d→ B.
Letting I(t) = t, t ∈ [0, 1], note that

‖Yr − μ−1I‖ ≤ supt≤1 |N(rt)/r − μ−1t| = r−1 sups≤r |N(s) − μ−1s| → 0 a.s.

The convergence follows by Proposition 34, since the SLLN for N implies r−1(N(r) − μ−1r) → 0 a.s. Now, we have (X̃r, Yr) d→ (μ1/2B, μ−1I), where the limit functions are continuous. Then Proposition 48 and Exercise 1 yield

X̃r ◦ Yr d→ μ1/2B ◦ μ−1I d= B.

Thus X̃r ◦ Yr d→ B, which completes the proof.
Remark 50. The assertion in Lemma 49 implies that

X′r(1) = (1/(σ√(r/μ))) ∑N(r)k=1 ξk d→ B(1),

which is Anscombe's result in Theorem 64 in Chapter 2.
We now establish the convergence of Xr(t) = (Z(rt) − art)/(σ√(r/μ)).
Proof of Theorem 47. We can write

Xr(t) = X′r(t) + (√μ/σ) Vr(t), (5.36)

where

X′r(t) = (Z(TN(rt)) − aTN(rt))/(σ√(r/μ)),
Vr(t) = r−1/2 [Z(rt) − Z(TN(rt)) − a(rt − TN(rt))].
Recognizing that X′r is the process in Lemma 49, we have X′r d→ B. Then the proof of Xr d→ B will be complete upon showing that Vr d→ 0.
Letting
ξ̃n = supTn<t≤Tn+1 |Z(t) − Z(Tn)| + |a|(Tn+1 − Tn),

it follows that

‖Vr‖ ≤ r−1/2 supt≤1 ξ̃N(rt) = √(N(r)/r) (N(r)−1/2 supk≤N(r) ξ̃k).

The regenerative-increment property of Z implies that the ξ̃n are i.i.d. Then

n−1/2 ξ̃n d= n−1/2 ξ̃1 → 0 a.s.

Now N(r)−1/2 supk≤N(r) ξ̃k P→ 0 by Proposition 34. Applying this to the preceding display and using N(r)/r → μ−1 a.s., we get ‖Vr‖ d→ 0.
Since renewal processes and ergodic Markov chains are regenerative processes, FCLTs for them are obtainable from Theorem 47. To see this, first note that a renewal process N(t) has regenerative increments over its renewal times Tn, and the parameters above are Mn = 1,

a = E[N(T1)]/μ = μ−1, σ2 = Var[N(T1) − μ−1T1] = μ−2Var[T1].
Then the following is an immediate consequence of Theorem 47.

Corollary 51. (Renewal Process) Suppose N(t) is a renewal process whose inter-renewal times have mean μ and variance σ2, and define

Xr(t) = (N(rt) − rt/μ)/(σ√(r/μ3)), t ∈ [0, 1].

Then Xr d→ B as r → ∞.
The particular case Xr(1) d→ B(1) is the classical central limit theorem for renewal processes, which we saw in Example 67 in Chapter 2; namely,

(N(r) − r/μ)/(σ√(r/μ3)) d→ B(1).
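Corollary 51 can be checked by a small simulation (illustrative only; the Uniform(0, 2) inter-renewal distribution, with μ = 1 and σ2 = 1/3, and the values of r and the replication count below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

mu, sigma = 1.0, np.sqrt(1.0 / 3.0)   # Uniform(0,2): mean 1, variance 1/3
r, reps = 400.0, 5000

# Inter-renewal times; 800 gaps per replication are ample to pass time r.
gaps = rng.uniform(0.0, 2.0, size=(reps, 800))
renewal_times = np.cumsum(gaps, axis=1)
N_r = (renewal_times <= r).sum(axis=1)          # N(r) per replication

# Normalized counts, which should be approximately standard normal.
Z = (N_r - r / mu) / (sigma * np.sqrt(r / mu**3))
print(Z.mean(), Z.std())
```

For large r, the printed sample mean and standard deviation should be near 0 and 1, respectively.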
For the next result, suppose that Y is an ergodic CTMC on a countable state space S with stationary distribution p. For a fixed state i, assume that Y(0) = i, and let 0 = T0 < T1 < · · · denote the times at which Y enters state i. Assume Ei[T1^2] < ∞ and let μ = Ei[T1]. For f : S → R, assuming the following integral exists, consider the process

Z(t) = ∫0^t f(Y(s)) ds, t ≥ 0.
This has regenerative increments over the Tn and, assuming the sum is absolutely convergent, Corollary 40 in Chapter 4 yields

a = Ei[Z(T1)]/μ = ∑j f(j)pj.
Assume Ei[M1] and σ2 = Var[Z(T1) − aT1] are finite, and σ > 0. Then Theorem 47 for the CTMC functional Z is as follows. An analogous result for discrete-time Markov chains is in Exercise 48.
Corollary 52. (CTMC) Under the preceding assumptions, for r > 0, define the process

Xr(t) = (∫0^rt f(Y(s)) ds − art)/(σ√(r/μ)), t ∈ [0, 1].

Then Xr d→ B as r → ∞.
5.11 Peculiarities of Brownian Sample Paths
While sample paths of a Brownian motion are continuous a.s., they are extremely erratic. This section describes their erratic behavior.
Continuous functions are typically monotone on certain intervals, but this is not the case for Brownian motion paths.

Proposition 53. Almost every sample path of a Brownian motion B is monotone on no interval.
Proof. For any a < b in R+, consider the event A = {B is nondecreasing on [a, b]}. Clearly A = ∩∞n=1 An, where

An = ∩ni=1 {B(ti) − B(ti−1) ≥ 0}

and ti = a + i(b − a)/n. The event A is measurable since each An is. Because P{B(ti) − B(ti−1) ≥ 0} = 1/2 and the increments of B are independent, we have P(An) = 2−n, and so P(A) ≤ limn→∞ P(An) = 0. This conclusion is also true for the event that B is nonincreasing on [a, b]. Thus B is monotone on no interval a.s.
For the next result, we say that for a Brownian motion B on a closed interval I, its local maximum is supt∈I B(t), and its local minimum is inft∈I B(t). There are processes whose local maxima on two disjoint intervals are equal with positive probability, but this is not the case for Brownian motion.
Proposition 54. The local maxima and minima of a Brownian motion B are a.s. distinct.
Proof. It suffices to show that, for disjoint closed intervals I and J in R+,

MI ≠ MJ a.s.,

where each of the quantities MI and MJ is either a local minimum or a local maximum.
First, suppose MI and MJ are both local maxima. Let u denote the right endpoint of I and v > u denote the left endpoint of J. Then

MJ − MI = supt∈J [B(t) − B(v)] − supt∈I [B(t) − B(u)] + B(v) − B(u).

The three terms on the right are independent, and the last one is nonzero a.s. (since the increments are normally distributed). Therefore, MI ≠ MJ a.s.
This result is also true, by similar arguments, when MI and MJ are both local minima, or when one is a local minimum and the other is a local maximum.
We now answer the question: how much time does a Brownian motion spend in a particular state?
Proposition 55. The amount of time that a Brownian motion B spends in a fixed state a over the entire time horizon is the Lebesgue measure La of the time set {t ∈ R+ : B(t) = a}, and La = 0 a.s.
Proof. Since La is nonnegative, it suffices to show E[La] = 0. For n ∈ Z+, consider the process Xn(t) = B(⌊nt⌋/n), t ≥ 0. Clearly Xn(t) → B(t) a.s. as n → ∞, for each t. Then by Fubini's theorem and Fatou's lemma,

E[La] = ∫R+ P{B(t) = a} dt = ∫R+ limn→∞ P{Xn(t) = a} dt
≤ liminfn→∞ ∫R+ P{Xn(t) = a} dt.

The last integral (of a piecewise-constant function of t) is 0 since Xn(t) has a normal distribution, and so E[La] = 0.
Proofs of the next two results are in [61, 64].
Theorem 56. (Dvoretzky, Erdős, and Kakutani 1961) Almost every sample path of a Brownian motion B does not have a point of increase: for positive t and δ,

P{B(s) ≤ B(t) ≤ B(u) : (t − δ)+ ≤ s < t < u ≤ t + δ} = 0.

Analogously, almost every sample path of B does not have a point of decrease.
Theorem 57. (Paley, Wiener and Zygmund 1933) Almost every sample path of a Brownian motion is nowhere differentiable.
More insights into the wild behavior of a Brownian motion path are given by its linear and quadratic variations. The (linear) variation of a real-valued function f on an interval [a, b] is

Vab(f) = sup{∑nk=1 |f(tk) − f(tk−1)| : a = t0 < t1 < · · · < tn = b}.

If this variation is finite, then f has the following properties:
• It can be expressed as the difference f(t) = f1(t) − f2(t) of two increasing functions, where f1(t) = Vat(f).
• f has a derivative at almost every point in [a, b].
• Riemann–Stieltjes integrals of the form ∫[a,b] g(t) df(t) exist.
In light of these observations, Theorem 57 implies that almost every sample path of a Brownian motion has unbounded variation on any finite interval of positive length. Further insight into the behavior of Brownian paths in terms of their quadratic variation is in Exercise 33.
Because the sample paths of a Brownian motion B have unbounded variation a.s., a stochastic integral ∫[a,b] X(t) dB(t) cannot be defined, for almost every sample path, as a classical Riemann–Stieltjes integral. Another approach is used for defining stochastic integrals with respect to a Brownian motion or, more generally, with respect to a martingale. Such integrals are the basis of the theory of stochastic differential equations.
5.12 Brownian Bridge Process
We will now study a special Gaussian process called a Brownian bridge. Such a process is equal in distribution to a Brownian motion on [0, 1] that is restricted to hit 0 at time 1. An important application is its use in the nonparametric Kolmogorov–Smirnov statistical test that a random sample comes from a specified distribution. In particular, for large samples, the normalized maximum difference between the empirical distribution and the true distribution is approximately distributed as the maximum of a Brownian bridge.
Throughout this section, {X(t) : t ∈ [0, 1]} will denote a stochastic process on R, and B(t) will denote a standard Brownian motion. The process X is a Brownian bridge if it is a Gaussian process with mean 0 and covariance function

E[X(s)X(t)] = s(1 − t), 0 ≤ s ≤ t ≤ 1.
Such a process is equal in distribution to the following Brownian motion "tied down" at 1.
Proposition 58. The process X(t) = B(t) − tB(1), t ∈ [0, 1], is a Brownian bridge.

Proof. This follows since X is clearly a Gaussian process with zero mean and

E[X(s)X(t)] = E[B(s)B(t) − tB(s)B(1)] − sE[B(1)B(t) − tB(1)2]
= s(1 − t), s < t.

The last equality uses E[B(u)B(v)] = u, for u ≤ v.
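Proposition 58 also gives a convenient way to simulate a Brownian bridge. The sketch below (grid size, path count, and the evaluation points s = 0.3, t = 0.7 are arbitrary choices, not from the text) builds paths of B(t) − tB(1) on a grid and checks the covariance s(1 − t) empirically:

```python
import numpy as np

rng = np.random.default_rng(2)

npaths, ngrid = 20000, 100
dt = 1.0 / ngrid
t = np.linspace(dt, 1.0, ngrid)

# Brownian motion paths on the grid, then the bridge X(t) = B(t) - t B(1).
B = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(npaths, ngrid)), axis=1)
X = B - t * B[:, -1:]

s_idx, t_idx = 29, 69                 # grid points s = 0.3 and t = 0.7
cov = np.mean(X[:, s_idx] * X[:, t_idx])
print(cov)                            # theory: s(1 - t) = 0.3 * 0.3 = 0.09
```

By construction every simulated path is exactly 0 at time 1, which is the "tied down" property.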
Because of its relation to Brownian motion, many basic properties of a Brownian bridge X can be obtained from those of Brownian motion. For instance, X has continuous paths that are not differentiable. Note that the negation −X(t) and the time reversal X(1 − t) are also Brownian bridges; related ideas are in Exercises 49 and 50.
We will now show how a Brownian bridge is a fundamental process related to empirical distributions. Suppose that ξ1, ξ2, . . . are i.i.d. random variables with distribution F. The empirical distribution associated with ξ1, . . . , ξn is

Fn(x) = n−1 ∑nk=1 1(ξk ≤ x), x ∈ R, n ≥ 1.
This function is an estimator of F based on n samples from it. The estimator is unbiased, since clearly E[Fn(x)] = F(x). It is also a consistent estimator, since by the classical SLLN,

Fn(x) → F(x) a.s. as n → ∞. (5.37)

This convergence is also uniform in x, as follows.
Proposition 59. (Glivenko–Cantelli) The empirical distributions satisfy

supx |Fn(x) − F(x)| → 0 a.s. as n → ∞.
Proof. Consider any −∞ = x1 < x2 < · · · < xm = ∞, and note that, since F and Fn are nondecreasing, for x ∈ [xk−1, xk],

Fn(xk−1) − F(xk) ≤ Fn(x) − F(x) ≤ Fn(xk) − F(xk−1).
Then

supx |Fn(x) − F(x)| ≤ maxk |Fn(xk−1) − F(xk)| + maxk |Fn(xk) − F(xk−1)|.

Letting n → ∞ and letting the differences xk − xk−1 tend to 0, and then applying (5.37) to the preceding display, proves the assertion for continuous F. Exercise 40 proves the assertion when F is not continuous.
An important application of empirical distributions concerns the following nonparametric test that a sample comes from a specified distribution.
Example 60. Kolmogorov–Smirnov Statistic. Suppose that ξ1, ξ2, . . . are i.i.d. random variables with a distribution F that is unknown. As mentioned above, the empirical distribution Fn(x) is a handy unbiased, consistent estimator of F. Now, suppose we want to test the simple hypothesis H0 that the sample is from a specified distribution F, versus the alternative hypothesis H1 that the sample is not from this distribution. One approach is to use the classical chi-square test.
Another approach is to use a test based on the Kolmogorov–Smirnov statistic defined by

Dn = supx |Fn(x) − F(x)|.
This is a measure of the distance between the empirical distribution Fn and F (which for simplicity we assume is continuous). The test rejects H0 if Dn > c, and accepts it otherwise. The constant c is determined by the probability P{Dn > c | H0} = α, for a specified level of significance α. The conditioning on H0 means conditioned on F being the true distribution.
When n is large, one can compute c by using the approximation

P{n1/2Dn > x | H0} ≈ P{sup0≤t≤1 |B(t) − tB(1)| > x}
= 2 ∑∞k=1 (−1)k+1 e−2k2x2.
This approximation follows from Theorem 61 below, and the summation formula is from [37].
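To make the test concrete, here is an illustrative sketch (the sample size, the seed, and the truncation of the series at 100 terms are my own choices): it computes Dn for a uniform sample via the order statistics and evaluates the tail series above for an approximate p-value.

```python
import math
import random

def kolmogorov_sf(x, terms=100):
    """Approximate P{sup_t |B(t) - tB(1)| > x} by truncating the series."""
    return 2.0 * sum((-1) ** (k + 1) * math.exp(-2.0 * k * k * x * x)
                     for k in range(1, terms + 1))

def ks_statistic(sample):
    """D_n = sup_x |F_n(x) - F(x)| for F = Uniform(0,1), via order statistics."""
    u = sorted(sample)
    n = len(u)
    return max(max((i + 1) / n - u[i], u[i] - i / n) for i in range(n))

random.seed(3)
n = 1000
dn = ks_statistic([random.random() for _ in range(n)])
p_value = kolmogorov_sf(math.sqrt(n) * dn)
print(dn, p_value)

# The classical 5% critical value for n^{1/2} D_n is about 1.358:
print(kolmogorov_sf(1.358))
```

Since the sample here really is uniform, the approximate p-value is typically well above the 5% level, so H0 would not be rejected.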
We will now establish the limiting distribution of the Kolmogorov–Smirnov statistic.

Theorem 61. The empirical distribution Fn associated with a sample from the distribution F satisfies

n1/2 supx |Fn(x) − F(x)| d→ sup0≤t≤1 |X(t)|, (5.38)

where X is a Brownian bridge.
Proof. From Exercise 40, we know that Fn = Gn(F(·)) and

supx |Fn(x) − F(x)| = sup0≤t≤1 |Gn(t) − t|,

where Gn(t) = n−1 ∑nk=1 1(Uk ≤ t) is the empirical distribution of the Un, which are i.i.d. with a uniform distribution on [0, 1]. The ξn and Un are defined on the same probability space.
In light of this observation, assertion (5.38) is equivalent to

n−1/2 ‖Yn‖ d→ ‖X‖,

where Yn(t) = ∑nk=1 (1(Uk ≤ t) − t), 0 ≤ t ≤ 1, and ‖x‖ = supt≤1 |x(t)| for x ∈ D. To prove this convergence, it suffices by the continuous-mapping theorem to show that n−1/2Yn d→ X in D, since the map x → ‖x‖ from D to R is continuous (in the uniform topology).
Let κn be a Poisson random variable with mean n that is independent of the Uk. We will prove n−1/2Yn d→ X based on Exercise 53 by verifying

n−1/2Yκn d→ X, (5.39)
n−1/2‖Yn − Yκn‖ P→ 0. (5.40)

Letting Nn(t) = ∑κn k=1 1(Uk ≤ t), so that Nn(1) = κn, we can write

n−1/2Yκn(t) = n−1/2(Nn(t) − nt) − tn−1/2(Nn(1) − n).

Now Nn is a Poisson process on [0, 1] with rate n, by the mixed-sample representation of Poisson processes in Theorem 26 of Chapter 3. Then, from the functional central limit theorem for renewal processes in Corollary 51, the process n−1/2(Nn(t) − nt) converges in distribution in D to a Brownian motion B.
Applying this to the preceding display, it follows that the process n−1/2Yκn(t) converges in distribution in D to the process B(t) − tB(1), which is a Brownian bridge. This proves (5.39).
Next, note that

n−1/2‖Yn − Yκn‖ d= n−1/2 sup0≤t≤1 |∑|n−κn| k=1 (1(Uk ≤ t) − t)|
= n−1/2 |κn − n| Zn, (5.41)

where Zn = sup0≤t≤1 |G|κn−n|(t) − t|. Since κn is the sum of n i.i.d. Poisson random variables with mean 1, it follows by the classical central limit theorem that n−1/2|κn − n| d→ |B(1)|. This convergence also implies |κn − n| P→ ∞. Now sup0≤t≤1 |Gn(t) − t| P→ 0 by Proposition 59, and so this convergence is also true with n replaced by |κn − n|; that is, Zn P→ 0. Applying these observations to (5.41) verifies (5.40), which completes the proof.
5.13 Geometric Brownian Motion
This section describes geometric Brownian motion and related processes that are used for modeling stock prices or values of investments.
Let X(t) denote the price of a stock (commodity or other financial instrument) at time t. Suppose the value of the stock has many small up and down movements due to continual trading. One possible model is a Brownian motion with drift, X(t) = x + μt + σB(t). This might be appropriate as a crude model for local or short-time behavior. It is not very good, however, for medium- or long-term behavior, since the stationary-increment property is not realistic (e.g., a change in price for the stock when it is $50 should be different from the change when the value is $200).
A more appropriate model for the stock price, which is used in practice, is

X(t) = xeμt+σB(t). (5.42)

Any process equal in distribution to X is a geometric Brownian motion with drift μ and volatility σ. Since E[eαB(t)] = eα2t/2, the moments of X(t) are given by

E[X(t)k] = xkekμt+k2tσ2/2, k ≥ 1.
For instance,
E[X(t)] = xeμt+tσ2/2 = x[1 + (μ+ σ2/2)t] + o(t) as t ↓ 0.
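As an illustrative check of this mean formula (the parameter values below are arbitrary), one can compare a Monte Carlo estimate of E[X(t)] with xe(μ+σ2/2)t:

```python
import numpy as np

rng = np.random.default_rng(4)

x, mu, sigma, t = 1.0, 0.05, 0.2, 1.0

# X(t) = x exp(mu*t + sigma*B(t)) with B(t) ~ N(0, t).
Z = rng.standard_normal(200000)
X_t = x * np.exp(mu * t + sigma * np.sqrt(t) * Z)

mc_mean = X_t.mean()
theory = x * np.exp((mu + sigma**2 / 2.0) * t)
print(mc_mean, theory)
```

Note that the growth rate of the mean is μ + σ2/2, not μ, which is the same correction that appears in the differential property below.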
The process X is a diffusion process that satisfies the differential property

dX(t) = (μ + σ2/2)X(t)dt + σX(t)dB(t).

We will not prove this characterization, but only note that, by the moment formula above, the instantaneous drift and diffusion parameters for X are

μ(x, t) = (μ + σ2/2)x, σ2(x, t) = σ2x2.
Although the geometric Brownian motion X does not have stationary independent increments, it does have a nice property of ratios of increments. In particular, the ratio of its values at the end and beginning of any time interval [s, s + t] is

X(t + s)/X(s) = eμt+σ(B(s+t)−B(s)) d= eμt+σB(t),

so its distribution is independent of s. Also, these ratios over disjoint, equal-length time intervals are i.i.d. This means that, as a model for a stock price, one cannot anticipate any upward or downward movements in the price "ratios". So in this sense, the market is equitable (or not biased).
Does this also mean that the market is fair in the martingale sense that X(t) is a martingale with respect to B? The answer is generally no.
However, X is a martingale if and only if μ + σ2/2 = 0 (a very special condition). This follows since e−t(μ+σ2/2)X(t) is a martingale with respect to B with mean x, by Example 19 (and E[X(t)] = x when X(t) is such a martingale).
The geometric Brownian motion model (5.42) has continuous paths and so does not account for large discrete jumps in stock prices. To incorporate such jumps, another useful model is as follows.
Example 62. Prices with Jumps. Suppose the price of a stock at time t is given by X(t) = eY(t), where Y(t) is a real-valued stochastic process with stationary independent increments (e.g., a compound Poisson or Lévy process). These properties of Y also ensure that the price ratios are i.i.d. over disjoint, fixed-length intervals.
Assume, as in Exercise 7, that the moment generating function ψ(α) = E[eαY(1)] exists for α in a neighborhood of 0, and that E[eαY(t)] is continuous at t = 0 for each α. Then it follows that

E[X(t)k] = ψ(k)^t, k ≥ 1.
In particular, if Y(t) is a compound Poisson process with rate λ whose jumps have the moment generating function G(α), then ψ(α) = e−λ(1−G(α)). Consequently,

E[X(t)k] = e−λt(1−G(k)), k ≥ 1.
Other possibilities are that Y is the sum of a Brownian motion and an independent compound Poisson process, or that X is the sum of a geometric Brownian motion and an independent compound Poisson process.
We will not get into advanced investment models using geometric Brownian motion, such as Black–Scholes option pricing. However, the following illustrates an elementary computation for an option.
Example 63. Stock Option. Suppose that the price of one unit of a stock at time t is given by a geometric Brownian motion X(t) = eB(t). A customer has the option of buying one unit of the stock at a fixed time T at a price K, but the customer need not make the purchase. The value of the option to the customer is (X(T) − K)+, since the customer will not buy the stock if X(T) < K. We will disregard any fee that the customer would pay in order to obtain the option.
The expectation of the option's value is

E[(X(T) − K)+] = ∫∞0 P{X(T) − K > x} dx
= ∫∞0 P{B(T) > log(x + K)} dx.
This integral can be evaluated numerically by using an approximation for the normal distribution of B(T). A variation of this option is in Exercise 39.
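For illustration, the expectation can also be expressed in normal-distribution terms by completing the square in the Gaussian density; the sketch below (the values of T and K and the closed-form expression are my own additions, not from the text) checks that form against a direct Monte Carlo estimate of E[(X(T) − K)+]:

```python
import math

import numpy as np

rng = np.random.default_rng(5)

T, K = 1.0, 1.0
Z = rng.normal(0.0, math.sqrt(T), size=500000)     # B(T) ~ N(0, T)
mc = np.maximum(np.exp(Z) - K, 0.0).mean()         # E[(e^{B(T)} - K)^+]

# Completing the square: E[e^Z 1{Z > a}] = e^{T/2} Phi((T - a)/sqrt(T)),
# with a = log K, so the expectation equals
# e^{T/2} Phi((T - log K)/sqrt(T)) - K Phi(-log K / sqrt(T)).
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
exact = (math.exp(T / 2.0) * Phi((T - math.log(K)) / math.sqrt(T))
         - K * Phi(-math.log(K) / math.sqrt(T)))
print(mc, exact)
```

The two printed values should agree to about two decimal places for this sample size.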
5.14 Multidimensional Brownian Motion
Brownian motions in the plane and in multidimensional spaces are natural models for phenomena driven by several independent (or dependent) one-dimensional Brownian motions. This section gives some insight into these multidimensional processes.
A stochastic process B(t) = (B1(t), . . . , Bd(t)), t ≥ 0, in Rd is a multidimensional Brownian motion if B1, . . . , Bd are independent Brownian motions on R. Many basic properties of this process follow from results in one dimension. For instance, the multidimensional integral formula

∫Rd P{x + B(t) ∈ A} dx = |A|,

the Lebesgue measure of A, follows from the similar formula for d = 1 in Exercise 6. The preceding integral is used in Section 5.15 for particle systems.
Applications of Brownian motions in Rd typically involve intricate functions of the one-dimensional components, whose distributions determine system parameters (e.g., Exercise 54). Here is another classical application.
Example 64. Bessel Processes. Associated with a Brownian motion B(t) in Rd, consider its radial distance to the origin defined by

R(t) = (B1(t)2 + · · · + Bd(t)2)1/2, t ≥ 0.
Any process equal in distribution to R is a Bessel process of order d. When d = 1, we have the familiar reflected Brownian motion process R(t) = |B(t)|. Exercise 19 mentioned that this is a Markov process, and it specifies its distribution (also recall Theorem 11).
The process R(t) is also a Markov process on R+ for general d. Its transition probability P{R(t) ∈ A | R(0) = x} = ∫A pt(x, y) dy has the density

pt(x, y) = t−1(xy)1−d/2 yd−1 e−(x2+y2)/2t Id/2−1(xy/t),

where Iβ is the modified Bessel function of order β > −1, defined by

Iβ(u) = ∑∞k=0 (u/2)2k+β / (k! Γ(k + β + 1)), u ∈ R.
This is proved in [61]. We will only derive the density of R(t) when R(0) = 0. To this end, consider
R(t)2/t = (B1(t)2 + · · · + Bd(t)2)/t d= B1(1)2 + · · · + Bd(1)2.

This last sum of squares of d independent standard normal random variables is known to have a chi-squared density f with d degrees of freedom. This f is a gamma density with parameters α = d/2 and λ = 1/2 (see the Appendix). Therefore, given that R(0) = 0,

P{R(t) ≤ r} = P{R(t)2/t ≤ r2/t} = ∫0^(r2/t) f(x) dx. (5.43)
The density of R(t) is given in Exercise 55.
Although the hitting times of R(t) are complicated, we can evaluate their means. Consider the time τa = inf{t ≥ 0 : R(t) = a} to hit a value a > 0. This is a stopping time of R(t), and τa ≤ inf{t ≥ 0 : B1(t) = a} < ∞ a.s., since the last stopping time is finite a.s. as noted in Theorem 11. Now, Exercise 56 shows that R(t)2 − dt is a martingale with mean 0. Then the optional stopping result in Corollary 24 yields E[R(τa)2 − dτa] = 0. Therefore E[τa] = a2/d.
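A quick numerical illustration of (5.43) (the choice d = 2 is mine; for d = 2 the chi-squared distribution with 2 degrees of freedom is exponential with mean 2, so the integral has a closed form):

```python
import math

import numpy as np

rng = np.random.default_rng(6)

d, t, r = 2, 1.0, 1.0
B = rng.normal(0.0, math.sqrt(t), size=(200000, d))   # (B_1(t),...,B_d(t))
R = np.sqrt((B ** 2).sum(axis=1))                     # R(t) with R(0) = 0

empirical = (R <= r).mean()
# For d = 2, R(t)^2/t is chi-squared with 2 df, i.e. exponential with
# mean 2, so P{R(t) <= r} = 1 - exp(-r^2/(2t)).
theory = 1.0 - math.exp(-r * r / (2.0 * t))
print(empirical, theory)
```

The empirical proportion should match the closed form to roughly two decimal places at this sample size.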
We will now consider a multidimensional process whose components are "dependent" one-dimensional Brownian motions with drift. Let B(t) be a Brownian motion in Rk, and let C = {cij} be a d × d matrix of real numbers that is symmetric (cij = cji) and nonnegative-definite (∑i∑j uiujcij ≥ 0, for u ∈ Rd). As in the representation (5.8) of a multivariate normal vector, let A be a k × d matrix, with transpose At and k ≤ d, such that AtA = C. Consider the process {X(t) : t ≥ 0} in Rd defined by

X(t) = x + μt + B(t)A,

where x and μ are in Rd.
Any process equal in distribution to X is a generalized Brownian motion in Rd with initial value x, drift μ, and covariance matrix C = AtA.
A major use for multidimensional Brownian motions is in approximating multidimensional random walks. The following result is an analogue of Donsker's Brownian motion approximation for one-dimensional random walks in Theorem 40.
Suppose that ξk, k ≥ 1, are i.i.d. random vectors in Rd with mean vector μ = (μ1, . . . , μd) and covariances cij = E[(ξk,i − μi)(ξk,j − μj)], 1 ≤ i, j ≤ d. Define the processes {Xn(t) : t ≥ 0} in Rd, for n ≥ 1, by

Xn(t) = n−1/2 ∑⌊nt⌋k=1 (ξk − μ), t ≥ 0.
Theorem 65. Under the preceding assumptions, Xn d→ X as n → ∞, where X is a generalized Brownian motion on Rd starting at 0, with no drift, and with covariance matrix {cij}.
Sketch of Proof. Consider Xn,i(t) = n−1/2 ∑⌊nt⌋k=1 (ξk,i − μi), which is the ith component of Xn. By Donsker's theorem, Xn,i d→ Xi for each i. Now, the Cramér–Wold theorem states that (Xn,1, . . . , Xn,d) d→ (X1, . . . , Xd) in Rd if and only if ∑di=1 aiXn,i d→ ∑di=1 aiXi in R for any a ∈ Rd. However, the latter holds by another application of Donsker's theorem. Therefore, the finite-dimensional distributions of Xn converge to those of X. To complete the proof that Xn d→ X, it suffices to verify a certain tightness condition on the distributions of the processes Xn, which we omit.
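A small simulation sketch of this theorem (the covariance matrix C and the sample sizes below are arbitrary choices): correlated steps are generated as zA with AtA = C, and the sample covariance of the scaled endpoints Xn(1) should be close to C.

```python
import numpy as np

rng = np.random.default_rng(7)

# Target covariance matrix C and a matrix A with A^T A = C.
C = np.array([[1.0, 0.5], [0.5, 1.0]])
A = np.linalg.cholesky(C).T
n, reps = 500, 2000

# i.i.d. mean-zero steps xi_k = z_k A (row-vector convention), summed and scaled.
Z = rng.standard_normal((reps, n, 2))
endpoints = (Z @ A).sum(axis=1) / np.sqrt(n)   # X_n(1) per replication

sample_cov = np.cov(endpoints.T)
print(sample_cov)                              # should be close to C
```

The Cholesky factor is one convenient choice of A here; any matrix with AtA = C would do, which reflects the fact that only C, not A, appears in the limit.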
5.15 Brownian/Poisson Particle System
This section describes a system in which particles occasionally enter a Euclidean space, move about independently as Brownian motions, and eventually exit. The system data and dynamics are represented by a marked Poisson process like those in Chapter 2. The focus is on characterizing certain Poisson processes describing particle locations over time, and departures, as intricate functions of the arrival process and particle trajectories. The Brownian motion structure of the trajectories leads to tractable probabilities.
Consider a system of discrete particles that move about in the space R^d as follows. The locations and entry times of the particles are represented by the space-time Poisson process N = Σ_n δ_{(X_n, T_n)} on R^d × R, where X_n is the location in R^d at which the nth particle enters at time T_n. This Poisson process is homogeneous in that

E[N(A × I)] = α|A| λ|I|,

where |A| is the Lebesgue measure of the Borel set A. Here λ is the arrival rate of particles per unit time in any unit area, and α is the arrival rate per unit area in any unit time period. Note that, for bounded sets A and I, N(A × I) is finite, but N(R^d × I) and N(A × R) are infinite a.s. (because their Poisson means are infinite).
We assume that each particle moves in R^d independently as a d-dimensional Brownian motion B(t), t ≥ 0, for a length of time V with distribution G and then exits the space.

More precisely, let V_n, n ∈ Z, be independent with V_n =d V, and let B_n, n ∈ Z, be independent with B_n =d B. Assume the B_n, V_n are independent and independent of N. Viewing the B_n and V_n as independent marks of (X_n, T_n), the data for the entire system is defined formally by the marked Poisson process

M = Σ_n δ_{(X_n, T_n, B_n, V_n)}, on S = R^d × R × C(R, R^d) × R_+.
Here C(R, R^d) denotes the set of continuous functions from R to R^d. The mean measure of M is given by

E[M(A × I × F × J)] = α|A| λ|I| P{B ∈ F} P{V ∈ J}.

The interpretation is that the nth particle has a space-time entry at (X_n, T_n) and its location at time t is X_n + B_n(t − T_n), for t − T_n ≤ V_n. At the end of its sojourn time V_n it exits the system at time T_n + V_n from the location X_n + B_n(V_n).
Let us see where the particles are at any time t. It is not feasible to account for all the particles that arrive up to time t, since N(R^d × (−∞, t]) = ∞. So we will consider particles that enter in a bounded time interval I prior to t, that is, with entry times in t − I (e.g., t − [a, b] = [t − b, t − a]).
Now, the number of particles that enter R^d in a time interval I prior to t and are in A at time t is

N_t(I × A) = Σ_n δ_{(t − T_n, X_n + B_n(t − T_n))}(I × A) 1(V_n > t − T_n).

Then N_t is a point process on R_+ × R^d.
Proposition 66. The family of point processes {N_t : t ∈ R} is stationary in t, and each N_t is a Poisson process on R_+ × R^d with mean measure

E[N_t(I × A)] = αλ|A| ∫_I (1 − G(u)) du.   (5.44)
Proof. By the form of its mean measure, the Poisson process M with its time axis shifted by an amount t satisfies

S_t M = Σ_n δ_{(X_n, T_n − t, B_n, V_n)} =d M, t ∈ R.

Therefore, M is stationary in the time axis. To prove that N_t is stationary in t, it suffices by Proposition 104 in Chapter 3 to show that N_t = f(S_t M), for some function f.
Accordingly, for a locally finite counting measure ν = Σ_n δ_{(x_n, t_n, b_n, v_n)} on S, define the counting measure f(ν) on R_+ × R^d by

f(ν) = Σ_n δ_{(−t_n, x_n + b_n(−t_n))} 1(v_n > −t_n).
Then clearly, N_t = f(S_t M), which proves that N_t is stationary.

Next, note that N_t is a deterministic map of the Poisson process M restricted to the subspace {(x, s, b, v) ∈ S : s ≤ t, v > t − s}, in which any point (x, s, b, v) in the subspace is mapped to (t − s, x + b(t − s)). Then by Theorem 32 in Chapter 2, N_t is a Poisson process with mean measure given by
E[N_t(I × A)] = αλ ∫_{t−I} ( ∫_{R^d} P{x + B(t − s) ∈ A} dx ) P{V > t − s} ds.

Because B is a Brownian motion, the integral in parentheses reduces to |A| by Exercise 6. Therefore, using the change of variable u = t − s in the last expression yields (5.44).
Next, let us consider departures from the system. The number of particles that enter R^d during the time set I and depart from A during the time set J is N(I × A × J), where N is a point process of the form

N = Σ_n δ_{(T_n, X_n + B_n(V_n), T_n + V_n)} on {(s, x, u) ∈ R × R^d × R : s ≤ u}.
Proposition 67. The point process of departures N is a Poisson process with mean measure given by

E[N(I × A × J)] = αλ|A| ∫_{R_+} |I ∩ (J − v)| dG(v).   (5.45)
Proof. By its definition, N is a deterministic map g of the Poisson process M, where g(X_n, T_n, B_n, V_n) = (T_n, X_n + B_n(V_n), T_n + V_n). Then by Theorem 32 in Chapter 2, N is a Poisson process with mean

E[N(I × A × J)] = αλ ∫_I ∫_{R_+} 1(s + v ∈ J) ( ∫_{R^d} P{x + B(v) ∈ A} dx ) dG(v) ds.

The integral in parentheses reduces to |A| by Exercise 6. Then an interchange of the order of integration in the last expression yields (5.45).
There are several natural generalizations of the preceding model with more dependencies among the marks, entry times, and entry points; see, for example, Exercise 51. Although the processes N_t and N may still be Poisson, their mean values would be more complicated.
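Formula (5.44) lends itself to a simulation check. The sketch below is our own illustration for d = 1 with an exponential sojourn distribution G; the spatial truncation at ±L and all parameter values are arbitrary choices, and entries far outside A are ignored because they have a negligible chance of being in A at time t.

```python
import numpy as np

rng = np.random.default_rng(1)

rate = 2.0            # alpha * lambda: entries per unit length per unit time
I = (0.0, 2.0)        # elapsed-time window: entries during [t - 2, t]
A = (0.0, 1.0)        # spatial set where particles are counted at time t
L = 8.0               # spatial truncation: entries outside [-L, L] are ignored

def sample_Nt(reps):
    """Simulate N_t(I x A) with G = Exp(1), truncating entries to [-L, L]."""
    counts = np.empty(reps)
    for i in range(reps):
        n = rng.poisson(rate * 2 * L * (I[1] - I[0]))      # Poisson entry count
        x = rng.uniform(-L, L, n)                          # entry locations X_n
        age = rng.uniform(I[0], I[1], n)                   # elapsed times t - T_n
        alive = rng.exponential(1.0, n) > age              # sojourn V_n > t - T_n
        pos = x + np.sqrt(age) * rng.standard_normal(n)    # X_n + B_n(t - T_n)
        counts[i] = np.count_nonzero(alive & (pos >= A[0]) & (pos <= A[1]))
    return counts

est = sample_Nt(4000).mean()
exact = rate * (A[1] - A[0]) * (1 - np.exp(-I[1]))   # (5.44) with 1 - G(u) = e^{-u}
print(est, exact)
```

The two printed numbers should agree up to Monte Carlo error, consistent with the Poisson mean in Proposition 66.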
5.16 G/G/1 Queues in Heavy Traffic
Section 4.20 of Chapter 4 showed that the waiting times W_n for successive items in a G/G/1 queueing system are a function of a random walk. This suggests that the asymptotic behavior of these times can be characterized by the Donsker Brownian motion approximation of a random walk, and that is what we shall do now. We first describe the limit of W_n when the traffic intensity ρ = 1, and then present a more general FCLT for the W_n when the system is in heavy traffic: the traffic intensity is approximately 1.
Consider a G/G/1 queueing system, as in Section 4.20 of Chapter 4, in which items arrive at times that form a renewal process with interarrival times U_n, and the service times are i.i.d. nonnegative random variables V_n that are independent of the arrival times. The service discipline is first-come-first-served with no preemptions. The interarrival and service times have finite means and variances, and the traffic intensity of the system is ρ = E[V_1]/E[U_1]. For simplicity, assume the system is empty at time 0.
Our interest is in the length of time W_n that the nth arrival waits in the queue before being processed. Section 4.20 of Chapter 4 showed that these waiting times satisfy the Lindley recursive equation

W_n = (W_{n−1} + V_{n−1} − U_n)^+, n ≥ 1,

and consequently,

W_n = max_{0≤m≤n} Σ_{k=m+1}^n (V_{k−1} − U_k).   (5.46)

Under the assumptions on the U_n and V_n, it follows that

W_n =d max_{0≤m≤n} S_m,   (5.47)

where S_n = Σ_{m=1}^n ξ_m and ξ_m = V_m − U_m.
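The equivalence of the Lindley recursion and the max representation (5.46) holds path by path, and it can be checked directly. A small sketch of ours (the exponential distributions and rate values are illustrative choices only):

```python
import numpy as np

rng = np.random.default_rng(2)

n = 200
U = rng.exponential(1.0, n + 1)    # interarrival times U_k (index 0 unused)
V = rng.exponential(0.9, n + 1)    # service times V_k; traffic intensity 0.9

# Lindley recursion: W_k = (W_{k-1} + V_{k-1} - U_k)^+.
W = np.zeros(n + 1)
for k in range(1, n + 1):
    W[k] = max(W[k - 1] + V[k - 1] - U[k], 0.0)

# Max representation (5.46): W_k = max_{0<=m<=k} (S_k - S_m),
# where S_j is the partial sum of the steps V_{j-1} - U_j.
S = np.concatenate(([0.0], np.cumsum(V[:n] - U[1:n + 1])))   # S_0, ..., S_n
W_max = np.array([max(S[k] - S[m] for m in range(k + 1)) for k in range(n + 1)])

print(np.allclose(W, W_max))   # the two formulas agree on every path
```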
In case ρ < 1, Theorem 118 of Chapter 4 noted that

W_n →d max_{0≤m<∞} S_m.
In this section, we consider the limiting behavior of the waiting times W_n when ρ equals or approaches 1, meaning that the system is in heavy traffic. We begin with the case ρ = 1, and describe the asymptotic behavior of the waiting times via the process

W_n(t) = W_{⌊nt⌋}/(σ√n), t ≥ 0.
Theorem 68. Suppose the G/G/1 system defined above has ρ = 1 and σ² = Var(ξ_1) > 0. Then

W_n →d M in D(R_+) as n → ∞,

where M(t) = max_{s≤t} B(s), the maximum process for a standard Brownian motion B. Hence

W_n/(σ√n) →d M(1) =d |B(1)|.
Proof. Note that

W_n(t) =d (1/(σ√n)) max_{m≤⌊nt⌋} S_m = (1/(σ√n)) sup_{s≤t} S_{⌊ns⌋}.

That is, W_n(t) =d f(X_n)(t), t ≥ 0, where

X_n(t) = S_{⌊nt⌋}/(σ√n), t ≥ 0,

and f : D(R_+) → D(R_+) is the supremum map defined by

f(x)(t) = sup_{0≤s≤t} x(s), x ∈ D(R_+).

Now the random walk S_n has steps with mean E[ξ_1] = 0, since ρ = 1, and variance σ² = Var(ξ_1). Then X_n →d B by Donsker's theorem.

Next, it is clear that if ‖x_n − x‖ → 0 in D[0, T], then

‖f(x_n) − f(x)‖ ≤ ‖x_n − x‖ → 0 in D[0, T].

Then since ‖X_n − B‖ →P 0 in D[0, T] for each T, it follows that

‖f(X_n) − f(B)‖ →P 0 in D(R_+).

This along with W_n =d f(X_n) and f(B) = M proves W_n →d M.

In particular, W_n(1) →d M(1) =d |B(1)|, which proves the second assertion; that M(1) =d |B(1)| follows by Theorem 11.
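Theorem 68 can be probed by simulation. Using Exp(1) interarrival and service times (so ρ = 1 and σ² = 2), the scaled waiting time W_n/(σ√n) should be close in mean to E|B(1)| = √(2/π) ≈ 0.798. The sketch below is our own illustration; the sample sizes are arbitrary, and the discrete walk slightly undershoots the Brownian limit for finite n.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = np.sqrt(2.0)   # Var(xi_1) = Var(V) + Var(U) = 2 for Exp(1) times

def mean_scaled_wait(n, reps, chunk=500):
    """Estimate E[W_n / (sigma sqrt(n))] via W_n =d max_{0<=m<=n} S_m, (5.47)."""
    total, done = 0.0, 0
    while done < reps:
        m = min(chunk, reps - done)
        xi = rng.exponential(1.0, (m, n)) - rng.exponential(1.0, (m, n))
        S = np.cumsum(xi, axis=1)
        # the maximum over 0 <= m <= n includes the term S_0 = 0
        total += np.maximum(S.max(axis=1), 0.0).sum()
        done += m
    return total / (reps * sigma * np.sqrt(n))

est = mean_scaled_wait(n=1000, reps=4000)
print(est, np.sqrt(2 / np.pi))   # estimate vs E|B(1)|
```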
The preceding result suggests that for any G/G/1 system in which ρ ≈ 1, the approximation W_n ≈ M in distribution would be valid. A formal statement to this effect is as follows.

Consider a family of G/G/1 systems indexed by a parameter r with interarrival times U_n^r and service times V_n^r. Denote the other quantities by ρ_r, W_n^r, S_n^r = Σ_{m=1}^n ξ_m^r, etc., and consider the process

W_r(t) = W^r_{⌊rt⌋}/(σ√r), t ≥ 0.
Theorem 69. Suppose the family of G/G/1 systems is such that ρ_r → 1,

sup_r E[(ξ_1^r − E[ξ_1^r])^{2+ε}] < ∞ for some ε > 0,

r^{1/2} E[ξ_1^r] → 0, Var(ξ_1^r) → σ² > 0, as r → ∞.

Then W_r →d M in D(R_+) as r → ∞.
Proof. As in the proof of Theorem 68, W_r = f(X_r), where f is the supremum map and X_r(t) = S^r_{⌊rt⌋}/(σ√r), t ≥ 0. Then to prove the assertion, it suffices to show that X_r →d B as r → ∞.

Now, we can write

X_r(t) = Y_r(t) + (⌊rt⌋/r) r^{1/2} E[ξ_1^r]/σ,

where

Y_r(t) = (1/(σ√r)) Σ_{m=1}^{⌊rt⌋} (ξ_m^r − E[ξ_1^r]).

Under the hypotheses, Y_r →d B by a theorem of Prokhorov [1956], and hence X_r →d B.
The preceding results are typical of many heavy-traffic limit theorems that one can obtain for queueing and related processes by the framework presented by Whitt [115]. In particular, when a system parameter, such as the waiting time above, can be expressed as a function of the system data (cumulative input and output processes), and that data under an appropriate normalization converges in distribution, then under further technical conditions the system parameter also converges in distribution to the function of the limits of the data. Here is one of the general models in [115].
Example 70. Generalized G/G/1 System. Consider a generalization of the G/G/1 systems above in which the interarrival times U_n^r and service times V_n^r (the system data) are general random variables that may be dependent. Then the waiting times W_n^r can still be expressed as a function of the system data as in (5.46). In other words,

W_n^r = S_n^r − min_{0≤m≤n} S_m^r,

where S_n^r = Σ_{k=1}^n (V_{k−1}^r − U_k^r). As above, consider the processes

W_r(t) = W^r_{⌊rt⌋}/(σ√r), X_r(t) = S^r_{⌊rt⌋}/(σ√r), t ≥ 0.

Then we can write W_r = h(X_r), where h : D(R) → D(R) is the one-sided reflection map defined by

h(x)(t) = x(t) − inf_{0≤s≤t} x(s), t ≥ 0.

The reflection map h (like the supremum map above) is continuous in the uniform topology on D[0, T] since

‖h(x) − h(y)‖ ≤ 2‖x − y‖, x, y ∈ D[0, T].
Then the continuous-mapping theorem yields the following result.

Convergence Criterion. If X_r →d X in D(R), then W_r →d W in D(R) as r → ∞, where

W(t) = X(t) − inf_{0≤s≤t} X(s), t ≥ 0.

To apply this for a particular situation, one would use properties of the interarrival times and service times (as in Theorem 69) to verify X_r →d X. There are a variety of conditions under which the limit X is a Brownian motion, a process with stationary independent increments, or an infinitely divisible process; and other topologies on D(R_+) are often appropriate [115].
5.17 Brownian Motion in a Random Environment
Section 3.14 describes a Poisson process with a random intensity measure, called a Cox process. The random intensity might represent a random environment or field that influences the locations of points. This section describes an analogous randomization for Brownian motions, in which the time scale is determined by a stochastic process.
Let {X(t) : t ∈ R_+} and η = {η(t) : t ∈ R_+} be real-valued stochastic processes defined on the same probability space, such that η(t) is a.s. nondecreasing with η(0) = 0 and η(t) → ∞ a.s. as t → ∞. The process X is a Brownian motion directed by η if the increments of X are conditionally independent given η and, for any s < t, the increment X(t) − X(s) has a conditional normal distribution with mean 0 and variance η(t) − η(s). These conditions, in terms of the moment generating function for the increments of X, say that, for 0 = t_0 < t_1 < ··· < t_n and u_1, ..., u_n in R_+,

E[ exp{ Σ_{i=1}^n u_i [X(t_i) − X(t_{i−1})] } | η ]   (5.48)
= exp{ (1/2) Σ_{i=1}^n u_i² [η(t_i) − η(t_{i−1})] } a.s.
A directed Brownian motion is equal in distribution to a standard Brownian motion with random time parameter as follows.
Remark 71. A process X is a Brownian motion directed by η if and only if X =d B ∘ η′, where B and η′ are defined on a common probability space such that B is a standard Brownian motion independent of η′, and η′ =d η. This follows from the definition above and consideration of the moment generating function of the increments of the processes. The process B ∘ η′ is a Brownian motion subordinated to η (like a Markov chain subordinated to a Poisson process, which we saw in Chapter 4). In case η is strictly increasing, Exercise 43 shows that X = B ∘ η a.s., where B is defined on the same probability space as X and η.
A Brownian motion X directed by η inherits many properties of standardBrownian motions. The proofs usually follow by conditioning on η and usingproperties of this process. Here are some examples.
Example 72. E[X(t)] = 0, and Var[X(t)] = E[η(t)].
Example 73. Consider τ_a^X = inf{t : X(t) ≥ a}. Then

P{τ_a^X ≤ t} = ∫_{R_+} P{η(t) ≥ u} P{τ_a ∈ du},

where τ_a = inf{t : B(t) = a}.
Example 74. Suppose that X_1, ..., X_m are Brownian motions directed by η_1, ..., η_m, respectively, and (X_1, η_1), ..., (X_m, η_m) are independent. Then X(t) = X_1(t) + ··· + X_m(t) is a Brownian motion directed by η(t) = η_1(t) + ··· + η_m(t).
Example 75. FCLT. For a Brownian motion X directed by η, define

X_r(t) = b_r^{−1/2} X(rt), t ≥ 0,

where b_r → ∞ are constants. By Remark 71 and the scaling b_r^{−1/2} B(b_r ·) =d B, we can write X_r =d B ∘ Y_r, where Y_r(t) = η′(rt)/b_r and the Brownian motion B and η′ are independent. Then by the property of the composition mapping in Proposition 48, we obtain the following result, where I is the identity function: If Y_r →d I in D(R_+), then X_r →d B in D(R_+).
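Remark 71 gives a direct way to simulate a directed Brownian motion and to check Example 72 numerically: conditionally on η, X(t) is normal with mean 0 and variance η(t). The directing process below, η(t) = Rt with R exponential (a clock run at a random speed), is our own illustrative choice, as are the sample sizes.

```python
import numpy as np

rng = np.random.default_rng(4)

t, reps = 2.0, 200000
# Illustrative directing process: eta(t) = R * t with R ~ Exp(1), i.e. the
# environment runs the clock at a random, constant-in-time speed.
R = rng.exponential(1.0, reps)
eta_t = R * t
# By Remark 71, X(t) =d B(eta'(t)); given eta, X(t) is N(0, eta(t)).
X = np.sqrt(eta_t) * rng.standard_normal(reps)

print(X.mean())                 # Example 72: E[X(t)] = 0
print(X.var(), eta_t.mean())    # Example 72: Var[X(t)] = E[eta(t)]
```

Note that X(t) here is a normal scale mixture, not normal, even though its conditional distribution given the environment is normal.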
5.18 Exercises
For the following exercises, B will denote a standard Brownian motion.
Exercise 1. Show that each of the following processes is also a Brownian motion:
(a) B(t + s) − B(s), the translated process for a fixed time s;
(b) −B(t), the reflected process;
(c) c^{−1/2} B(ct), scaling, for c > 0;
(d) B(T) − B(T − t), t ∈ [0, T], time-reversal on [0, T] for fixed T.
Exercise 2. The time-inversion of a Brownian motion B is the process X(0) = 0,

X(t) = tB(1/t), t > 0.

Prove that X is a Brownian motion. First show that X(t) → 0 a.s. as t ↓ 0.
Exercise 3. Suppose that h : R_+ → R_+ is a continuous, strictly increasing function with h(0) = 0 and h(t) ↑ ∞. Find the mean and covariance functions for the process B(h(t)). Show that B(h(t)) =d X(t) for each t, where X(t) = (h(t)/t)^{1/2} B(t). Are the processes B(h(·)) and X equal in distribution?
Exercise 4. For 0 < s < t, show that the conditional density of B(s) given B(t) = b is normal with conditional mean and variance

E[B(s) | B(t) = b] = bs/t, Var[B(s) | B(t) = b] = s(t − s)/t.

For t_1 < s < t_2, show that

P{B(s) ≤ x | B(t_1) = a, B(t_2) = b} = P{B(s − t_1) ≤ x − a | B(t_2 − t_1) = b − a}.

Using these properties, prove that the conditional density of B(s) given B(t_1) = a, B(t_2) = b is normal with conditional mean and variance

a + (b − a)(s − t_1)/(t_2 − t_1)  and  (s − t_1)(t_2 − s)/(t_2 − t_1).
Exercise 5. Consider the process X(t) = a e^{−αt} + σ² B(t), t ≥ 0, where a, α ∈ R and σ > 0. Find the mean and covariance functions for this process. Show that X is a Gaussian process by applying Theorem 5. Does X have independent increments, and are these increments stationary? Is X a martingale, submartingale or supermartingale with respect to B?
Exercise 6. For a density f on R that is symmetric (f(x) = f(−x), x ∈ R), show that

∫_R ( ∫_{A−x} f(y) dy ) dx = |A| (the Lebesgue measure of A).

Use this to verify that, for a Brownian motion B(t),

∫_R P{x + μt + σB(t) ∈ A} dx = |A|,

independent of t, μ and σ.
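The identity in this exercise is easy to confirm numerically for A = [0, 1]. The sketch below is ours; the grid width, truncation, and parameter triples are arbitrary choices.

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def shifted_mass(mu, sigma, t, a=0.0, b=1.0, L=30.0, m=100001):
    """Riemann sum of x -> P{x + mu*t + sigma*B(t) in [a, b]} over [-L, L]."""
    x = np.linspace(-L, L, m)
    s = sigma * sqrt(t)
    p = np.array([Phi((b - xi - mu * t) / s) - Phi((a - xi - mu * t) / s)
                  for xi in x])
    return p.sum() * (x[1] - x[0])

for mu, sigma, t in [(0.0, 1.0, 1.0), (2.0, 0.5, 3.0), (-1.0, 2.0, 0.5)]:
    print(shifted_mass(mu, sigma, t))   # each is |A| = 1 up to grid error
```

This is the computation underlying the "reduces to |A|" steps in the proofs of Propositions 66 and 67.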
Exercise 7. Let Y(t) be a real-valued process with stationary independent increments. Assume that the moment generating function ψ(α) = E[e^{αY(1)}] exists for α in a neighborhood of 0, and g(t) = E[e^{αY(t)}] is continuous at t = 0 for each α. Show that g(t) is continuous in t for each α, and that g(t) = ψ(α)^t. Use the fact that g(t + u) = g(t)g(u), t, u ≥ 0, and that the only continuous solution to this equation has the form g(t) = e^{tc}, for some c that depends on α. Show that

E[Y(t)] = tE[Y(1)], Var[Y(t)] = tVar[Y(1)].
Exercise 8. Let X(t) = μt + σB(t) be a Brownian motion with drift, where μ > 0. Suppose that a signal is triggered whenever M(t) = sup_{s≤t} X(s) reaches the levels 0, 1, 2, .... So the nth signal is triggered at time τ_n = inf{t : M(t) = n}, n ≥ 0. Show that these times of the signals form a renewal process, and find the mean, variance, and Laplace transform of the times between signals.

In addition, obtain this information under the variation in which a signal is triggered whenever M(t) reaches the levels 0 = L_0 < L_1 < L_2 < ..., where the L_n − L_{n−1} are independent exponential random variables with rate λ.
Exercise 9. When is a Gaussian Process Stationary? Recall that a stochastic process is stationary if its finite-dimensional distributions are invariant under shifts in time. This is sometimes called strong stationarity. A related notion is that a real-valued process {X(t) : t ≥ 0} is weakly stationary if its mean function E[X(t)] is a constant and its covariance function Cov(X(s), X(t)) depends only on t − s. Weak stationarity does not imply strong stationarity. However, if a real-valued process is strongly stationary and its mean and covariance functions are finite, then the process is weakly stationary. Show that a Gaussian process is strongly stationary if and only if it is weakly stationary.
Exercise 10. Suppose that Y_n, n ∈ Z, are independent normally distributed random variables with mean μ and variance σ². Consider the moving average process

X_n = a_0 Y_n + a_1 Y_{n−1} + ··· + a_m Y_{n−m}, n ∈ Z,

where a_0, ..., a_m are real numbers. Show that {X_n : n ∈ Z} is a stationary Gaussian process and specify its mean and covariance functions. Justify that this process is not Markovian and does not have stationary independent increments.
Exercise 11. Derive the mean and variance formulas in (5.14) for M(t).
Exercise 12. Equal Cruise-Control Settings. Two autos side by side on a highway moving at 65 mph attempt to move together at the same speed by setting their cruise-control devices at 65 mph. As in many instances, nature does not always correspond to one's wishes, and the actual cruise-control settings are independent normally distributed random variables V_1 and V_2 with mean μ = 65 and standard deviation σ = 0.4. Find the probability that the autos move at the same speed. Find P{|V_1 − V_2| < .3}.
Exercise 13. Letting τ_a = inf{t : B(t) = a}, find the probability that B hits 0 in the time interval (τ_a, τ_b), where 0 < a < b.
Exercise 14. Arc sine Distribution. Let U = sin² θ, where θ has a uniform distribution on [0, 2π]. Verify that P{U ≤ u} = (2/π) arcsin √u, u ∈ [0, 1], which is the arc sine distribution.

Let X_1, X_2 be independent normally distributed random variables with mean 0 and variance 1. Show that X_1²/(X_1² + X_2²) =d U. Hint: In the integral representation for P{X_1²/(X_1² + X_2²) ≤ u}, use polar coordinates where (x_1, x_2) is mapped to r = (x_1² + x_2²)^{1/2} and θ = arctan(x_2/x_1).

Is it true that U =d 1 − U?
Exercise 15. Prove Theorem 15 for u = 1. Using this result, find the distribution of η = sup{t ∈ [0, u] : B(t) = 0} for u = 1.
Exercise 16. Suppose that B and B̃ are independent Brownian motions. Find the moment generating function of B̃(τ_a) at the time when B hits a, which is τ_a = inf{t : B(t) = a}. Show that {B̃(τ_a) : a ∈ R_+} considered as a stochastic process has stationary independent increments.
Exercise 17. For the hitting time τ = inf{t > 0 : B(t) ∉ (−a, a)}, where a > 0, prove that its Laplace transform is

E[e^{−λτ}] = 1/cosh(a√(2λ)).

Mimic the proof of Theorem 32 using the facts that B(τ) is independent of τ, and P{B(τ) = −a} = P{B(τ) = a} = 1/2.
Exercise 18. Continuation. In the context of the preceding exercise, verify that E[τ] = a² and E[τ²] = 5a⁴/3.
Exercise 19. Let M(t) = sup_{s≤t} B(s), and consider the process X(t) = M(t) − B(t), t ≥ 0. Show that

X(t) =d M(t) =d |B(t)|, t ≥ 0.

(The processes X and |B(·)| are Markov processes on R_+ with the same transition probabilities, and hence they are equal in distribution. However, they are not equal in distribution to M, since the latter is nondecreasing.) Show that

P{X(t) ≤ z | X(s) = x} = ∫_0^z [φ(y − x; t − s) + φ(y + x; t − s)] dy,

where φ(x; t) = e^{−x²/2t}/√(2πt). In addition, verify that

P{M(t) > a | X(t) = 0} = e^{−a²/2t}.
Exercise 20. Reflection Principle for Processes. Suppose that τ is an a.s. finite stopping time for a Brownian motion B, and define

X(t) = B(t ∧ τ) − (B(t) − B(t ∧ τ)), t ≥ 0.

Prove that X is a Brownian motion. Hint: Show that X =d B by using the strong Markov property along with the process B′(t) = B(τ + t) − B(τ), t ≥ 0, and the representations

B(t) − B(t ∧ τ) = B′((t − τ)^+), B(t) = B(t ∧ τ) + B′((t − τ)^+).
Exercise 21. Continuation. For the hitting time τ_a = inf{t > 0 : B(t) = a}, show that the reflected process

X(t) = B(t)1(τ_a > t) + (2a − B(t))1(τ_a ≤ t)

is a Brownian motion. Use the result in the preceding exercise.
Exercise 22. Use the reflection principle to find an expression for

P{B(t) > y, min_{s≤t} B(s) > 0}.
Exercise 23. The value of an investment is modeled as a Brownian motion with drift X(t) = x + μt + σB(t), with an upward drift μ > 0. Find the distribution of M(t) = min_{s≤t} X(s). Use this to find the distribution of the lowest value M(∞) = inf_{t∈R_+} X(t) when x = 0. In addition, find

P{X(t) − M(t) > a}, a > 0.
Exercise 24. The values of two stocks evolve as independent Brownian motions X_1 and X_2 with drifts, where X_i(t) = x_i + μ_i t + σ_i B_i(t), and x_1 < x_2. Find the probability that X_2 will stay above X_1 for at least s time units. Let τ denote the first time that the two values are equal. Find E[τ] when μ_1 < μ_2 and when μ_1 > μ_2.
Exercise 25. Show that

P{B(1) ≤ x | B(s) ≥ 0, s ∈ [0, 1]} = 1 − e^{−x²/2}.

Hint: Consider B̃(t) = B(1) − B(1 − t) and show that the conditional probability is equal to P{B̃(1) ≤ x | M̃(1) = B̃(1)}, where M̃(t) = sup_{s≤t} B̃(s).
Exercise 26. Consider a compound Poisson process Y(t) = Σ_{n=1}^{N(t)} ξ_n, where N(t) is a Poisson process with rate λ and the ξ_n are i.i.d. and independent of N. Suppose ξ_1 has mean μ, variance σ², and moment generating function ψ(α) = E[e^{αξ_1}]. Show that the following are martingales with respect to Y:

X_1(t) = Y(t) − λμt,
X_2(t) = (Y(t) − λμt)² − tλ(μ² + σ²),
X_3(t) = e^{αY(t) − λt(ψ(α) − 1)}, t ≥ 0.

Find the mean E[X_i(t)] for each i.
Exercise 27. Suppose X(t) denotes the stock level of a certain product at time t and the holding cost up to time t is Y(t) = h ∫_0^t X(s) ds, where h is the cost per unit time of holding one unit in inventory. Show that if X is a Brownian motion B, then the mean and covariance functions of Y are

E[Y(t)] = 0, Cov(Y(s), Y(t)) = h² s²(t/2 − s/6), s ≤ t.

Find the mean and covariance functions of Y if X(t) = x + μt + σB(t), a Brownian motion with drift, or if X is a compound Poisson process as in the preceding problem.
Exercise 28. Prove that Y(t) = ∫_0^t B(s) ds, t ≥ 0, is a Gaussian process with mean 0 and E[Y(t)²] = t³/3.

Hint: Show that Z = Σ_{i=1}^n u_i Y(t_i) has a normal distribution for any t_1, ..., t_n in R_+ and u_1, ..., u_n in R. Since a Riemann integral is the limit of sums of rectangles, we know that Z = lim_{n→∞} Z_n, where

Z_n = Σ_{i=1}^n u_i Σ_{k=1}^n (t_i/n) B(k t_i/n).

Justify that each Z_n is normally distributed, and that its limit (using moment generating functions) must also be normally distributed.
Exercise 29. Continuation. Suppose X(t) = exp{∫_0^t B(s) ds}, t ≥ 0. Verify that E[X(t)] = e^{t³/6}.
Exercise 30. Let Y = {Y_n, n ≥ 0} be independent random variables (that need not be identically distributed) with finite means. Suppose X_0 is a deterministic function of Y_0 with finite mean. Define

X_n = X_0 + Σ_{i=1}^n Y_i, X′_n = X_0 Π_{i=1}^n Y_i, n ≥ 1.

Show that X_n is a discrete-time martingale with respect to Y if E[Y_i] = 0. How about if E[Y_i] ≥ 0? Is X_n a martingale with respect to itself? What can you say about X′_n if the Y_i are positive with E[Y_i] = 1? Or ≥ 1?
Exercise 31. Wald Equation for Discounted Sums. Suppose that ξ_0, ξ_1, ... are costs incurred at discrete times and they are i.i.d. with mean μ. Consider the discounted cost process Z_n = Σ_{m=0}^n α^m ξ_m, where α ∈ (0, 1) is a discount factor. Suppose that τ is a stopping time of the process {ξ_n} such that E[τ] < ∞ and E[α^τ] exists for some 0 < α < 1. Prove that

E[Z_τ] = μ(1 − αE[α^τ])/(1 − α).

Do this by finding a convenient martingale and applying the optional stopping theorem; there is also a direct proof without the use of martingales.

Next, consider the process S_n = Σ_{m=0}^n ξ_m, n ≥ 0, and show that

E[Σ_{m=0}^τ α^m S_m] = μE[τ]/(1 − α) − αμ(1 − αE[α^τ])/(1 − α)².
Exercise 32. Continuation. In the preceding problem, are the results true under the weaker assumption that ξ_0, ξ_1, ... are such that E[ξ_0] = μ and E[ξ_n | ξ_0, ..., ξ_{n−1}] = μ, n ≥ 1?
Exercise 33. Quadratic Variation. Consider the quadratic increments

V(t) = Σ_i (B(t_i) − B(t_{i−1}))²

over a partition 0 = t_0 < t_1 < ··· < t_k = t of [0, t]. Verify E[V(t)] = t and

Var[V(t)] = Σ_i (t_i − t_{i−1})² Var[B(1)²].

Next, for each n ≥ 1, let V_n(t) denote a similar quadratic increment sum for a partition 0 = t_{n0} < t_{n1} < ··· < t_{nk_n} = t, where max_k (t_{nk} − t_{n,k−1}) → 0. Show that E[(V_n(t) − t)²] → 0 (which says V_n(t) converges in mean square distance to t). The function t is the quadratic variation of B in that it is the unique function (called a compensator) such that B(t)² − t is a martingale. One can also show that V_n → t a.s. when the partitions are nested.
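The mean-square convergence in this exercise is easy to watch numerically on uniform partitions, where Var[V_n(t)] = 2t²/k for k pieces. The partition sizes and sample count below are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)

t, reps = 1.0, 2000

def quad_increments(k):
    """V_n(t) over the uniform partition of [0, t] into k pieces, `reps` paths."""
    dB = np.sqrt(t / k) * rng.standard_normal((reps, k))   # increments of B
    return (dB ** 2).sum(axis=1)

for k in (10, 100, 1000):
    V = quad_increments(k)
    # E[V] = t, and E[(V - t)^2] = 2 t^2 / k -> 0 as the mesh shrinks
    print(k, V.mean(), ((V - t) ** 2).mean())
```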
Exercise 34. Random Time Change of a Martingale. Suppose that X is a martingale with respect to F_t and that {τ_t : t ≥ 0} is a nondecreasing process of stopping times of F_t that are bounded a.s. Verify that X(τ_t) is a martingale with respect to F_{τ_t}, and that its mean is E[X(τ_t)] = E[X(0)].
Exercise 35. Optional Switching. Suppose that X_n and Y_n are two martingales with respect to F_n that represent values of an investment in a fair market that evolve under two different investment strategies. Suppose an investor begins with the X-strategy and then switches to the Y-strategy at an a.s. finite stopping time τ of F_n, and that X_τ = Y_τ. Then the investment value would be

Z_n = X_n 1(n < τ) + Y_n 1(n ≥ τ),

where X_τ is the value carried forward at time τ. Show that there is no benefit for the investor to switch at τ by showing that Z_n is a martingale. Use the representation

Z_{n+1} = X_{n+1} 1(n < τ) + Y_{n+1} 1(n ≥ τ) − (X_τ − Y_τ) 1(τ = n + 1).
Exercise 36. Prove that if σ and τ are stopping times of F_t, then so are σ ∧ τ and σ + τ.

Exercise 37. Prove that if X(t) and Y(t) are submartingales with respect to F_t, then so is X(t) ∨ Y(t).
Exercise 38. Consider a geometric Brownian motion X(t) = x e^{B(t)}. Find the mean and distribution of τ_a = inf{t : X(t) = a}.
Exercise 39. Recall the investment option in Example 63 in which a customer may purchase a unit of a stock at a price K at time T. Consider this option with the additional stipulation that the customer "must" purchase a unit of the stock before time T if its price reaches a prescribed level a, and consequently the other purchase at time T is not allowed. In this setting, the customer must purchase the stock at the price a prior to time t if max_{s≤t} X(s) = e^{M(t)} > a, where M(t) = max_{s≤t} B(s). Otherwise, the option of a purchase at the price K is still available at time T. In this case, the value of the option is

Z = (1 − a)1(M(T) > log a) + (X(T) − K)^+ 1(M(T) ≤ log a).

Prove that

E[Z] = 2(1 − a)[1 − Φ(log a/√T)] + ∫_0^{log a} ∫_{−∞}^y (e^x − K)^+ f_T(x, y) dx dy,

where f_t(x, y) is the joint density of B(t), M(t) and Φ is the standard normal distribution function. Verify that

f_t(x, y) = (2(2y − x)/√(2πt³)) e^{−(2y−x)²/2t}, x ≤ y, y ≥ 0.

Verify that E[Z] is minimized at the value a at which the integral term equals the preceding term. This would be the worst scenario for the customer.
Exercise 40. Prove Proposition 59 when F is not continuous. Use the fact from Exercise 11 in Chapter 1 that ξ_n =d F^{−1}(U_n), where the U_n are i.i.d. with the uniform distribution on [0, 1]. By Theorem 16 in the Appendix, you can assume the ξ_n and U_n are on the same probability space. Then the empirical distribution G_n(t) = n^{−1} Σ_{k=1}^n 1(U_k ≤ t) of the U_n satisfies F_n = G_n(F(·)). Conclude by verifying that

sup_x |F_n(x) − F(x)| = sup_{t≤1} |G_n(t) − t| → 0 a.s. as n → ∞,

where the limit is due to Proposition 59 for a continuous distribution.
Exercise 41. Let X be a Brownian motion directed by η. Suppose the process η has stationary independent nonnegative increments and E[e^{−αη(t)}] = ψ(α)^t, where ψ(α) = E[e^{−αη(1)}]. Determine the moment generating function of X(1) (as a function of ψ) and show that X has stationary independent increments.
Exercise 42. Show that if X_n is a nonnegative supermartingale, then the limit X = lim_{n→∞} X_n exists a.s. and E[X] ≤ E[X_0]. Use the submartingale convergence theorem and Fatou's lemma.
Exercise 43. Let X be a Brownian motion directed by η, where the paths of η are strictly increasing a.s. Show that X(t) = B(η(t)), t ∈ R_+, where B is a Brownian motion (on the same probability space as X and η) that is independent of η.

Hint: Define B(t) = X(η̂(t)), where η̂(t) = inf{s ≥ 0 : η(s) = t}. Argue that η̂(η(t)) = t and X(t) = B(η(t)), for each t, and that

E[ exp{ Σ_{i=1}^n u_i [B(t_i) − B(t_{i−1})] } | η ] = exp{ (1/2) Σ_{i=1}^n u_i² (t_i − t_{i−1}) } a.s.

Thus B is a Brownian motion, and it is independent of η since the last expression is not random.
Exercise 44. As a variation of the model in Section 5.17, a real-valued process X is a Brownian motion with drift μ and variability σ directed by η if, for 0 = t_0 < t_1 < ··· < t_n and u_1, ..., u_n in R_+,

E[ exp{ Σ_{i=1}^n u_i [X(t_i) − X(t_{i−1})] } | η ]
= exp{ Σ_{i=1}^n u_i μ [η(t_i) − η(t_{i−1})] + (1/2) Σ_{i=1}^n u_i² σ² [η(t_i) − η(t_{i−1})] } a.s.

Show that if t^{−1} η(t) → c a.s. for some c > 0, then

t^{−1} X(t) → cμ a.s. and t^{−1} max_{s≤t} X(s) → cμ a.s.
Exercise 45. Prove Theorem 39 on continuous mappings for "separable" metric spaces by applying the coupling result for the a.s. representation of convergence in distribution (Theorem 16 in the Appendix).
Exercise 46. Use Donsker's theorem to prove that

P{n^{−1/2}(S_n − min_{k≤n} S_k) > x} → e^{−x²/2}.
Exercise 47. In the context of Donsker's theorem, consider the range

Y_n = max_{k≤n} S_k − min_{k≤n} S_k

of the random walk. Show that n^{−1/2} Y_n →d Y, where E[Y] = 2√(2/π). Express Y as a functional of a Brownian motion.
Exercise 48. FCLT for Markov Chains. Let Y_n be an ergodic Markov chain on a countable state space S with stationary distribution π. For a function f : S → R, consider the process

X_n(t) = (1/(σ√n)) Σ_{k=1}^{⌊nt⌋} [f(Y_k) − a], t ∈ [0, 1].

Specify assumptions (and a, σ) under which X_n →d B as n → ∞, and prove it.
Exercise 49. Show that if B is a Brownian motion, then (1 − t)B(t/(1 − t)) and tB((1 − t)/t) are Brownian bridges. In addition, show that if X is a Brownian bridge, then (1 + t)X(t/(1 + t)) and (1 + t)X(1/(1 + t)) are Brownian motions. Hint: Take advantage of the Gaussian property.
Exercise 50. For a Brownian bridge X, find expressions for the distributions of the minimum m(1) = min_{t≤1} X(t) and the maximum M(1) = max_{t≤1} X(t).
Exercise 51. Consider the Brownian/Poisson model in Section 5.15 with the difference that the Poisson input process N is no longer time-homogeneous and its mean measure is

E[N(A × I)] = α|A| Λ(I),

where Λ is a measure on the time axis R. As in Section 5.15, let N_t(I × A) denote the number of particles that enter in the time interval t − I and are in A at time t. Verify that each N_t is a Poisson process with

E[N_t(I × A)] = α|A| ∫_{t−I} P{V > t − s} Λ(ds).

Is the family {N_t : t ∈ R} stationary as it is in Section 5.15?
Exercise 52. Continuity of Addition in D × D. Assume that (X_n, Y_n) →d (X, Y) in D × D and Disc(X) ∩ Disc(Y) is empty a.s., where Disc(x) denotes the discontinuity set of x. Prove that X_n + Y_n →d X + Y.
Exercise 53. Show that X_n →d X in D if X̄_n →d X in D and X_n − X̄_n →d 0. Do this by proving and applying the property that if X_n →d X in D and Y_n →d y in D for nonrandom y, then (X_n, Y_n) →d (X, y) in D² and X_n + Y_n →d X + y in D, when X has continuous paths a.s.
Exercise 54. Suppose that (B_1(t), B_2(t)) is a Brownian motion in R², and define τ_a = inf{t : B_1(t) = a}. Then X(a) = B_2(τ_a) is the value of B_2 when B_1 hits a. The process {X(a) : a ≥ 0} is, of course, a Brownian motion directed by {τ_a : a ≥ 0}. Show that X has stationary independent increments and that X(a) has a Cauchy distribution with density

f(x) = 1/(aπ(1 + (x/a)²)), x ∈ R.

Hint: Find the characteristic function of X(a).
Exercise 55. Consider the Bessel process R(t) = (B_1(t)² + ··· + B_d(t)²)^{1/2} as in (5.43). Show that its density is

f_{R(t)}(r) = (2/((2t)^{d/2} Γ(d/2))) r^{d−1} e^{−r²/2t}.

Evaluate this for d = 3 by using the fact that Γ(α) = (α − 1)Γ(α − 1). Show that Γ(1/2) = √π by its definition Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx and the property of the normal distribution that √2 ∫_0^∞ e^{−t²/2} dt = √π.
Exercise 56. Continuation. For the Bessel process R(t) in the preceding exercise, show that R(t)² − dt is a martingale with mean 0.
Exercise 57. Suppose that $X(t)$ is a Brownian bridge. Find an expression in terms of normal distributions for $p_t = P\{X(1/2) - X(t) > 0\}$. Is $p_t$ strictly increasing to 1 on $[1/2, 1]$?
Exercise 58. Let $X(t)$ denote a standard Brownian motion in $\mathbb{R}^3$ and let $A$ denote the unit ball. Find the distribution of the hitting time $\tau = \inf\{t : X(t) \in A^c\}$. Is $\tau \stackrel{d}{=} \inf\{t : B(t) > 1\}$?
Exercise 59. For the G/G/1 system described in Section 5.16, consider the waiting times
$$W_n = \max_{0 \le m \le n} \sum_{k=m+1}^{n} (V_{k-1} - U_k).$$
Show that if $\rho > 1$, then $n^{-1}W_n \to E[V_1 - U_1]$ a.s. as $n \to \infty$.
Chapter 12
Brownian Motion and Gaussian Processes
We started this text with discussions of a single random variable. We then proceeded to two, and more generally a finite number, of random variables. In the last chapter, we treated the random walk, which involved a countably infinite number of random variables, namely the positions $S_n$ of the random walk at times $n = 0, 1, 2, 3, \ldots$. The time parameter $n$ for the random walks we discussed in the last chapter belongs to the set of nonnegative integers, which is a countable set. We now look at a special continuous-time stochastic process, which corresponds to an uncountable family of random variables, indexed by a time parameter $t$ belonging to a suitable uncountable time set $T$. The process we mainly treat in this chapter is Brownian motion, although some other Gaussian processes are also treated briefly.
Brownian motion is one of the most important continuous-time stochastic processes. It has earned its special status because of its elegant theoretical properties, its numerous important connections to other continuous-time stochastic processes, and its physical origin and real applications. If we look at the path of a random walk when we run the clock much faster, and the steps of the walk are also suitably smaller, then the random walk converges to Brownian motion. This is an extremely important connection, and it is made precise later in this chapter. Brownian motion arises naturally in some form or other in numerous statistical inference problems. It is also used as a model for stock market behavior.
The process owes its name to the Scottish botanist Robert Brown, who noticed under a microscope that pollen particles suspended in fluid engaged in a zigzag and eccentric motion. It was, however, Albert Einstein who in 1905 gave Brownian motion a formal physical formulation. Einstein showed that the Brownian motion of a large particle visible under a microscope could be explained by assuming that the particle is ceaselessly bombarded by invisible molecules of its surrounding medium. The theoretical predictions made by Einstein were later experimentally verified by various physicists, including Jean Baptiste Perrin, who was awarded the Nobel prize in physics for this work. In particular, Einstein's work led to the determination of Avogadro's constant, perhaps the first major use of what statisticians call a moment estimate. The existence and construction of Brownian motion was first explicitly established by Norbert Wiener in 1923, which accounts for the other name, Wiener process, for a Brownian motion.
A. DasGupta, Probability for Statistics and Machine Learning: Fundamentals and Advanced Topics, Springer Texts in Statistics, DOI 10.1007/978-1-4419-9634-3_12, © Springer Science+Business Media, LLC 2011
There are numerous excellent references at various technical levels on the topics of this chapter. Comprehensive and lucid mathematical treatments are available in Freedman (1983), Karlin and Taylor (1975), Breiman (1992), Resnick (1992), Revuz and Yor (1994), Durrett (2001), Lawler (2006), and Bhattacharya and Waymire (2009). An elegant and unorthodox treatment of Brownian motion is given in Mörters and Peres (2010). Additional specific references are given in the sections.
12.1 Preview of Connections to the Random Walk
We remarked in the introduction that random walks and Brownian motion are interconnected in a suitable asymptotic paradigm. It would be helpful to understand this connection in a conceptual manner before going into technical treatments of Brownian motion.
Consider then the usual simple symmetric random walk defined by $S_0 = 0$, $S_n = X_1 + X_2 + \cdots + X_n$, $n \ge 1$, where the $X_i$ are iid with common distribution $P(X_i = \pm 1) = \frac{1}{2}$. Consider now a random walk that makes its steps at much smaller time intervals, but with jump sizes that are also smaller. Precisely, with the $X_i$, $i \ge 1$, still as above, define
$$S_n(t) = \frac{S_{\lfloor nt \rfloor}}{\sqrt{n}}, \qquad 0 \le t \le 1,$$
where $\lfloor x \rfloor$ denotes the integer part of a nonnegative real number $x$. This amounts to joining the points
$$(0, 0),\ \left(\frac{1}{n}, \frac{X_1}{\sqrt{n}}\right),\ \left(\frac{2}{n}, \frac{X_1 + X_2}{\sqrt{n}}\right),\ \ldots$$
by linear interpolation, thereby obtaining a curve. The simulated plot of $S_n(t)$ for $n = 1000$ in Fig. 12.1 shows the zigzag path of the scaled random walk. We can see that the plot is rather rough; the function takes the value zero at $t = 0$, that is, $S_n(0) = 0$, and $S_n(1) \ne 0$.
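The scaled walk just described is straightforward to simulate; the following minimal sketch (seed and step count are arbitrary choices, not from the text) builds $S_n(t) = S_{\lfloor nt \rfloor}/\sqrt{n}$ for $n = 1000$:

```python
import random

random.seed(1)

n = 1000
# Simple symmetric random walk: S_0 = 0, S_k = X_1 + ... + X_k,
# with P(X_i = +1) = P(X_i = -1) = 1/2.
S = [0]
for _ in range(n):
    S.append(S[-1] + random.choice((-1, 1)))

def S_n(t):
    """Scaled walk S_n(t) = S_{floor(n t)} / sqrt(n) for 0 <= t <= 1."""
    return S[int(n * t)] / n ** 0.5

print(S_n(0.0), S_n(1.0))  # S_n(0) = 0 by construction
```

Plotting `S_n(t)` over a fine grid of `t` values reproduces the rough zigzag path of Fig. 12.1.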
It turns out that, in a suitable precise sense, the graph of $S_n(t)$ on $[0,1]$ for large $n$ should mimic the graph of a random function called Brownian motion on $[0,1]$. Brownian motion is a special stochastic process: a collection of infinitely many random variables, say $W(t)$, $0 \le t \le 1$, each $W(t)$ for a fixed $t$ being a normally distributed random variable, with other additional properties for their joint distributions. They are introduced formally and analyzed in greater detail in the next sections.
The question arises of why the connection between a random walk and Brownian motion is of any use or interest to us. A short nontechnical answer is that, because $S_n(t)$ acts like a realization of a Brownian motion, we can use known properties of Brownian motion to approximately describe properties of $S_n(t)$ for large $n$. This is useful, because the stochastic process $S_n(t)$ arises in
Fig. 12.1 Simulated plot of a scaled random walk
numerous problems of interest to statisticians and probabilists. By simultaneously using the connection between $S_n(t)$ and Brownian motion, and known properties of Brownian motion, we can assert useful things concerning many problems in statistics and probability that would be nearly impossible to establish in a simple direct manner. That is why the connections are not just mathematically interesting, but also tremendously useful.
12.2 Basic Definitions
Our principal goal in the subsequent sections is to study Brownian motion and the Brownian bridge, due to their special importance among Gaussian processes.
The Brownian bridge is closely related to Brownian motion and shares many of its properties. Both arise in many statistical applications. It should also be understood that Brownian motion and the Brownian bridge are of enormous independent interest in the study of probability theory, regardless of their connections to problems in statistics.
We caution the reader that it is not possible to make all the statements in this chapter mathematically rigorous without using measure theory. This is because we are now dealing with uncountable collections of random variables, and problems with sets of measure zero can easily arise. However, the results are accurate, and they can be used in practice without knowing exactly how to fix the measure-theoretic issues.
We first give some general definitions for future use.
Definition 12.1. A stochastic process is a collection of random variables $\{X(t), t \in T\}$ taking values in some finite-dimensional Euclidean space $\mathbb{R}^d$, $1 \le d < \infty$, where the indexing set $T$ is a general set.
Definition 12.2. A real-valued stochastic process $\{X(t), -\infty < t < \infty\}$ is called weakly stationary if

(a) $E(X(t)) = \mu$ is independent of $t$;
(b) $E[X(t)^2] < \infty$ for all $t$, and $\mathrm{Cov}(X(t), X(s)) = \mathrm{Cov}(X(t+h), X(s+h))$ for all $s, t, h$.
Definition 12.3. A real-valued stochastic process $\{X(t), -\infty < t < \infty\}$ is called strictly stationary if for every $n \ge 1$, $t_1, t_2, \ldots, t_n$, and every $h$, the joint distribution of $(X_{t_1}, X_{t_2}, \ldots, X_{t_n})$ is the same as the joint distribution of $(X_{t_1+h}, X_{t_2+h}, \ldots, X_{t_n+h})$.
Definition 12.4. A real-valued stochastic process $\{X(t), -\infty < t < \infty\}$ is called a Markov process if for every $n \ge 1$, $t_1 < t_2 < \cdots < t_n$,
$$P(X_{t_n} \le x_{t_n} \mid X_{t_1} = x_{t_1}, \ldots, X_{t_{n-1}} = x_{t_{n-1}}) = P(X_{t_n} \le x_{t_n} \mid X_{t_{n-1}} = x_{t_{n-1}});$$
that is, the distribution of the future values of the process given the entire past depends only on the most recent past.
Definition 12.5. A stochastic process $\{X(t), t \in T\}$ is called a Gaussian process if for every $n \ge 1$ and $t_1, t_2, \ldots, t_n$, the joint distribution of $(X_{t_1}, X_{t_2}, \ldots, X_{t_n})$ is multivariate normal.

This is often stated as: a process is a Gaussian process if all its finite-dimensional distributions are Gaussian.
With these general definitions at hand, we now define Brownian motion and the Brownian bridge. Brownian motion is intimately linked to the simple symmetric random walk and to partial sums of iid random variables. The Brownian bridge is intimately connected to the empirical process of iid random variables. We focus on the properties of Brownian motion in this chapter, and postpone discussion of the empirical process and the Brownian bridge to a later chapter. However, we define both Brownian processes right now.
Definition 12.6. A stochastic process $W(t)$, $t \in [0, \infty)$, defined on a probability space $(\Omega, \mathcal{A}, P)$ is called a standard Wiener process or standard Brownian motion starting at zero if:

(i) $W(0) = 0$ with probability one.
(ii) For $0 \le s < t < \infty$, $W(t) - W(s) \sim N(0, t - s)$.
(iii) Given $0 \le t_0 < t_1 < \cdots < t_k < \infty$, the random variables $W(t_{j+1}) - W(t_j)$, $0 \le j \le k - 1$, are mutually independent.
(iv) The sample paths of $W(\cdot)$ are almost all continuous; that is, except for a set of sample points of probability zero, $W(t, \omega)$ is a continuous function of $t$.
Remark. Property (iv) can actually be proved to follow from the other three properties, but it is helpful to include it in the definition to emphasize the importance of the continuity of Brownian paths. Property (iii) is the celebrated independent increments property and lies at the heart of numerous further properties of Brownian motion. We often omit the word standard when referring to standard Brownian motion.
Definition 12.7. If $W(t)$ is a standard Brownian motion, then $X(t) = x + W(t)$, $x \in \mathbb{R}$, is called Brownian motion starting at $x$, and $Y(t) = \sigma W(t)$, $\sigma > 0$, is called Brownian motion with scale coefficient or diffusion coefficient $\sigma$.
Definition 12.8. Let $W(t)$ be a standard Wiener process on $[0, 1]$. The process $B(t) = W(t) - tW(1)$ is called a standard Brownian bridge on $[0, 1]$.
Remark. Note that the definition implies that $B(0) = B(1) = 0$ with probability one. Thus, the Brownian bridge on $[0,1]$ starts and ends at zero; hence the name tied-down Wiener process. The Brownian bridge on $[0,1]$ can be defined in various other equivalent ways. The definition we adopt here is convenient for many calculations.
Definition 12.9. Let $1 < d < \infty$, and let $W_i(t)$, $1 \le i \le d$, be independent Brownian motions on $[0, \infty)$. Then $\mathbf{W}_d(t) = (W_1(t), \ldots, W_d(t))$ is called $d$-dimensional Brownian motion.
Remark. In other words, if a particle performs independent Brownian movements along $d$ different coordinates, then we say that the particle is engaged in $d$-dimensional Brownian motion. Figure 12.2 demonstrates the case $d = 2$. When the dimension is not explicitly mentioned, it is understood that $d = 1$.
Example 12.1 (Some Illustrative Processes). We take a few stochastic processes and try to understand some of their basic properties. The processes we consider are the following.

(a) $X_1(t) \equiv X$, where $X \sim N(0,1)$.
(b) $X_2(t) = tX$, where $X \sim N(0,1)$.
Fig. 12.2 State visited by a planar Brownian motion
(c) $X_3(t) = A\cos\lambda t + B\sin\lambda t$, where $\lambda$ is a fixed positive number, $t \ge 0$, and $A, B$ are iid $N(0,1)$.
(d) $X_4(t) = \int_0^t W(u)\,du$, $t \ge 0$, where $W(u)$ is standard Brownian motion on $[0, \infty)$ starting at zero.
(e) $X_5(t) = W(t+1) - W(t)$, $t \ge 0$, where $W(t)$ is standard Brownian motion on $[0, \infty)$ starting at zero.
Each of these processes is a Gaussian process on the time domain on which it is defined. The mean function of each process is the zero function.
Coming to the covariance functions, for $s \le t$:
$$\mathrm{Cov}(X_1(s), X_1(t)) \equiv 1.$$
$$\mathrm{Cov}(X_2(s), X_2(t)) = st.$$
$$\mathrm{Cov}(X_3(s), X_3(t)) = \cos\lambda s \cos\lambda t + \sin\lambda s \sin\lambda t = \cos\lambda(s - t).$$
$$\mathrm{Cov}(X_4(s), X_4(t)) = E\left[\int_0^s W(u)\,du \int_0^t W(v)\,dv\right] = E\left[\int_0^t\!\int_0^s W(u)W(v)\,du\,dv\right]$$
$$= \int_0^t\!\int_0^s E[W(u)W(v)]\,du\,dv = \int_0^t\!\int_0^s \min(u, v)\,du\,dv$$
$$= \int_0^s\!\int_0^s \min(u, v)\,du\,dv + \int_s^t\!\int_0^s \min(u, v)\,du\,dv$$
$$= \int_0^s\!\int_0^v \min(u, v)\,du\,dv + \int_0^s\!\int_v^s \min(u, v)\,du\,dv + \int_s^t\!\int_0^s \min(u, v)\,du\,dv$$
$$= \frac{s^3}{6} + \left(\frac{s^3}{2} - \frac{s^3}{3}\right) + \frac{s^2}{2}(t - s) = \frac{s^2 t}{2} - \frac{s^3}{6} = \frac{s^2}{6}(3t - s).$$
$$\mathrm{Cov}(X_5(s), X_5(t)) = \mathrm{Cov}(W(s+1) - W(s),\, W(t+1) - W(t))$$
$$= s + 1 - \min(s+1, t) - s + s = \begin{cases} 0 & \text{if } t - s \ge 1, \\ s - t + 1 & \text{if } t - s < 1. \end{cases}$$

The two cases are combined into the single formula $\mathrm{Cov}(W(s+1) - W(s), W(t+1) - W(t)) = (s - t + 1)^+$. The covariance functions of $X_1(t)$, $X_3(t)$, and $X_5(t)$ depend only on $s - t$, and these processes are stationary.
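The closed-form covariance of the integrated process $X_4$ can be checked by simulation. The sketch below (discretization level, replication count, and seed are arbitrary choices) approximates $X_4(t) = \int_0^t W(u)\,du$ by a Riemann sum over a discretized Brownian path and compares the Monte Carlo covariance with $\frac{s^2}{6}(3t - s)$:

```python
import random

random.seed(2)

def brownian_path(n_steps, T=1.0):
    """Discretized standard Brownian motion on [0, T] via iid normal increments."""
    dt = T / n_steps
    W, path = 0.0, [0.0]
    for _ in range(n_steps):
        W += random.gauss(0.0, dt ** 0.5)
        path.append(W)
    return path, dt

s, t, n_steps, reps = 0.5, 1.0, 100, 5000
prods = xs = xt = 0.0
for _ in range(reps):
    path, dt = brownian_path(n_steps)
    # X_4(t) = integral of W over [0, t], approximated by a left Riemann sum
    X4_s = sum(path[: round(s / dt)]) * dt
    X4_t = sum(path[: round(t / dt)]) * dt
    prods += X4_s * X4_t
    xs += X4_s
    xt += X4_t

cov = prods / reps - (xs / reps) * (xt / reps)
exact = s * s * (3 * t - s) / 6   # the closed form derived above
print(cov, exact)
```

With $s = 0.5$, $t = 1$ the exact value is $0.25 \cdot 2.5 / 6 \approx 0.104$, and the simulated covariance lands near it up to Monte Carlo and discretization error.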
12.2.1 Condition for a Gaussian Process to be Markov
We show a simple and useful result characterizing Gaussian processes that are Markov. It turns out that one can tell whether a given Gaussian process is Markov by simply looking at its correlation function. Because we only need to consider finite-dimensional distributions to decide whether a stochastic process is Markov, it is only necessary to determine when a finite sequence of jointly normal variables has the Markov property. We start with that case.
Definition 12.10. Let $X_1, X_2, \ldots, X_n$ be $n$ jointly distributed continuous random variables. The $n$-dimensional random vector $(X_1, \ldots, X_n)$ is said to have the Markov property if for every $k \le n$, the conditional distribution of $X_k$ given $X_1, \ldots, X_{k-1}$ is the same as the conditional distribution of $X_k$ given $X_{k-1}$ alone.
Theorem 12.1. Let $(X_1, \ldots, X_n)$ have a multivariate normal distribution with means zero and correlations $\rho_{X_j, X_k} = \rho_{jk}$. Then $(X_1, \ldots, X_n)$ has the Markov property if and only if for $1 \le i \le j \le k \le n$, $\rho_{ik} = \rho_{ij}\rho_{jk}$.
Proof. We may assume that each $X_i$ has variance one. If $(X_1, \ldots, X_n)$ has the Markov property, then for any $k$, $E(X_k \mid X_1, \ldots, X_{k-1}) = E(X_k \mid X_{k-1}) = \rho_{k-1,k}X_{k-1}$ (see Chapter 5). Therefore, $X_k - \rho_{k-1,k}X_{k-1} = X_k - E(X_k \mid X_1, \ldots, X_{k-1})$ is independent of the vector $(X_1, \ldots, X_{k-1})$. In particular, each covariance $\mathrm{Cov}(X_k - \rho_{k-1,k}X_{k-1}, X_i)$ must be zero for all $i \le k-1$. This leads to $\rho_{ik} = \rho_{i,k-1}\,\rho_{k-1,k}$, and to the claimed identity $\rho_{ik} = \rho_{ij}\rho_{jk}$ by simply applying $\rho_{ik} = \rho_{i,k-1}\,\rho_{k-1,k}$ repeatedly.

Conversely, suppose the identity $\rho_{ik} = \rho_{ij}\rho_{jk}$ holds for all $1 \le i \le j \le k \le n$. Then it follows from the respective formulas for $E(X_k \mid X_1, \ldots, X_{k-1})$ and $\mathrm{Var}(X_k \mid X_1, \ldots, X_{k-1})$ (see Chapter 5) that $E(X_k \mid X_1, \ldots, X_{k-1}) = \rho_{k-1,k}X_{k-1} = E(X_k \mid X_{k-1})$, and $\mathrm{Var}(X_k \mid X_1, \ldots, X_{k-1}) = \mathrm{Var}(X_k \mid X_{k-1})$. All conditional distributions for a multivariate normal distribution are themselves normal; therefore the distribution of $X_k$ given $X_1, \ldots, X_{k-1}$ and the distribution of $X_k$ given just $X_{k-1}$ must be the same. This being true for all $k$, the full vector $(X_1, \ldots, X_n)$ has the Markov property. ∎
Because the Markov property for a continuous-time stochastic process is defined in terms of finite-dimensional distributions, the above result gives us the following simple and useful corollary.
Corollary. A Gaussian process $X(t)$, $t \in \mathbb{R}$, is Markov if and only if $\rho_{X(s), X(u)} = \rho_{X(s), X(t)}\,\rho_{X(t), X(u)}$ for all $s \le t \le u$ in $\mathbb{R}$.
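For standard Brownian motion the corollary's correlation identity can be verified directly: $\mathrm{Cov}(W(s), W(t)) = \min(s, t)$ gives $\rho_{W(s), W(t)} = \sqrt{s/t}$ for $s \le t$, so the product identity holds exactly. A minimal numerical check (the time points are arbitrary):

```python
import math

def rho(s, t):
    """Correlation of W(s) and W(t) for standard BM: Cov = min(s,t), Var(W(t)) = t."""
    return min(s, t) / math.sqrt(s * t)

s, t, u = 0.3, 0.7, 2.0
lhs = rho(s, u)             # rho_{W(s), W(u)}
rhs = rho(s, t) * rho(t, u) # rho_{W(s), W(t)} * rho_{W(t), W(u)}
print(lhs, rhs)             # both equal sqrt(s/u), confirming BM is Markov
```

By contrast, a Gaussian process whose correlation does not factor this way (for example the moving-window process $X_5$ of Example 12.1) fails the identity and is not Markov.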
12.2.2 Explicit Construction of Brownian Motion
It is not a priori obvious that an uncountable collection of random variables with the defining properties of Brownian motion can be constructed on a common probability space (a measure-theoretic notion). In other words, the existence of Brownian motion requires proof. Various proofs of the existence of Brownian motion can be given. We provide two explicit constructions, of which one is more classical in nature; but the second construction is also useful.
Theorem 12.2 (Karhunen–Loève Expansion). (a) Let $Z_1, Z_2, \ldots$ be an infinite sequence of iid standard normal variables. Then, with probability one, the infinite series
$$W(t) = \sqrt{2}\,\sum_{m=1}^{\infty} \frac{\sin\left(\left(m - \frac{1}{2}\right)\pi t\right)}{\left(m - \frac{1}{2}\right)\pi}\, Z_m$$
converges uniformly in $t$ on $[0,1]$, and the process $W(t)$ is a Brownian motion on $[0,1]$. The infinite series
$$B(t) = \sqrt{2}\,\sum_{m=1}^{\infty} \frac{\sin(m\pi t)}{m\pi}\, Z_m$$
converges uniformly in $t$ on $[0,1]$, and the process $B(t)$ is a Brownian bridge on $[0,1]$.

(b) For $n \ge 0$, let $I_n$ denote the set of odd integers in $[0, 2^n]$. Let $Z_{n,k}$, $n \ge 0$, $k \in I_n$, be a double array of iid standard normal variables. Let $H_{n,k}(t)$, $n \ge 0$, $k \in I_n$, be the sequence of Haar wavelets defined as
$$H_{n,k}(t) = \begin{cases} 2^{(n-1)/2} & \text{if } t \in \left[\frac{k-1}{2^n}, \frac{k}{2^n}\right), \\ -2^{(n-1)/2} & \text{if } t \in \left[\frac{k}{2^n}, \frac{k+1}{2^n}\right), \\ 0 & \text{if } t \notin \left[\frac{k-1}{2^n}, \frac{k+1}{2^n}\right). \end{cases}$$
Let $S_{n,k}(t)$ be the sequence of Schauder functions defined as $S_{n,k}(t) = \int_0^t H_{n,k}(s)\,ds$, $0 \le t \le 1$, $n \ge 0$, $k \in I_n$. Then the infinite series $W(t) = \sum_{n=0}^{\infty}\sum_{k \in I_n} Z_{n,k}\,S_{n,k}(t)$ converges uniformly in $t$ on $[0,1]$, and the process $W(t)$ is a Brownian motion on $[0,1]$.
Remark. See Bhattacharya and Waymire (2007, p. 135) for a proof. Both constructions of Brownian motion given above can be heuristically understood by using ideas of Fourier theory. If the sequence $f_0(t) \equiv 1, f_1(t), f_2(t), \ldots$ forms an orthonormal basis of $L^2[0,1]$, then we can expand a square integrable function, say $w(t)$, as an infinite series $\sum_i c_i f_i(t)$, where $c_i$ equals the inner product $\int_0^1 w(t)f_i(t)\,dt$. Thus, $c_0 = 0$ if the integral of $w(t)$ is zero. The Karhunen–Loève expansion can be heuristically explained as a random orthonormal expansion of $W(t)$. The basis functions $f_i(t)$ chosen do depend on the process $W(t)$, specifically on its covariance function. The inner products $\int_0^1 W(t)f_i(t)\,dt$, $i \ge 1$, form a sequence of independent normal variables. This is very far from a proof, but it provides a heuristic context for the expansion. The second construction is similarly based on an expansion using a wavelet basis instead of a Fourier basis.
12.3 Basic Distributional Properties
Distributional properties and formulas are always useful in doing further calculations and for obtaining concrete answers to questions. The most basic distributional properties of Brownian motion and the Brownian bridge are given first.
Throughout this chapter, the notation $W(t)$ and $B(t)$ means a (standard) Brownian motion and a (standard) Brownian bridge, respectively. The word standard is often dropped for brevity.
Proposition. (a) $\mathrm{Cov}(W(s), W(t)) = \min(s, t)$; $\mathrm{Cov}(B(s), B(t)) = \min(s, t) - st$.

(b) (The Markov Property). For any given $n$ and $t_0 < t_1 < \cdots < t_n$, the conditional distribution of $W(t_n)$ given that $W(t_0) = x_0, W(t_1) = x_1, \ldots, W(t_{n-1}) = x_{n-1}$ is the same as the conditional distribution of $W(t_n)$ given $W(t_{n-1}) = x_{n-1}$.

(c) Given $s < t$, the conditional distribution of $W(t)$ given $W(s) = w$ is $N(w, t - s)$.

(d) Given $t_1 < t_2 < \cdots < t_n$, the joint density of $W(t_1), W(t_2), \ldots, W(t_n)$ is given by the function
$$f(x_1, x_2, \ldots, x_n) = p(x_1, t_1)\,p(x_2 - x_1, t_2 - t_1)\cdots p(x_n - x_{n-1}, t_n - t_{n-1}),$$
where $p(x, t)$ is the density of $N(0, t)$; that is, $p(x, t) = \frac{1}{\sqrt{2\pi t}}\, e^{-x^2/(2t)}$.
Each part of this proposition follows by simple and direct calculation using the definitions of Brownian motion and the Brownian bridge. It is worth mentioning that the Markov property is extremely important and is a consequence of the independent increments property. Alternatively, one can simply use our previous characterization that a Gaussian process is Markov if and only if its correlation function satisfies $\rho_{X(s), X(u)} = \rho_{X(s), X(t)}\,\rho_{X(t), X(u)}$ for all $s \le t \le u$.
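The product form of the joint density in part (d) of the proposition lends itself to a direct numerical sanity check. The sketch below (the specific time points and evaluation points are arbitrary) implements the density and verifies that marginalizing out $W(s)$ recovers the $N(0, t)$ density of $W(t)$, as the Chapman–Kolmogorov relation requires:

```python
import math

def p(x, t):
    """Density of N(0, t)."""
    return math.exp(-x * x / (2 * t)) / math.sqrt(2 * math.pi * t)

def joint_density(ts, xs):
    """Joint density of (W(t1), ..., W(tn)) per part (d): product of increment densities."""
    dens, prev_t, prev_x = 1.0, 0.0, 0.0
    for t, x in zip(ts, xs):
        dens *= p(x - prev_x, t - prev_t)
        prev_t, prev_x = t, x
    return dens

# n = 1: the joint density reduces to the N(0, t) marginal
val = joint_density([1.0], [0.5])
print(val, p(0.5, 1.0))

# n = 2: numerically integrating out W(s) recovers the N(0, t) density of W(t)
s_, t_, y = 0.4, 1.0, 0.3
grid = [-10 + i * 0.01 for i in range(2001)]
marg = sum(joint_density([s_, t_], [x, y]) for x in grid) * 0.01
print(marg, p(y, t_))
```

The second check is just the convolution $N(0, s) * N(0, t - s) = N(0, t)$ computed by a Riemann sum.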
The Markov property can be strengthened to a very useful property, known as the strong Markov property. For instance, suppose you are waiting for the process to reach a level $b$ for the first time. The process will reach that level at some random time, say $\tau$. At this point, the process simply starts over, and $W(\tau + t) - b$ will act like a path of a standard Brownian motion from that point onwards. For the general description of this property, we need a definition.
Definition 12.11. A nonnegative random variable $\tau$ is called a stopping time for the process $W(t)$ if for any $s > 0$, whether $\tau \le s$ depends only on the values of $W(t)$ for $t \le s$.
Example 12.2. For $b > 0$, consider the first passage time $T_b = \inf\{t > 0 : W(t) = b\}$. Then $T_b > s$ if and only if $W(t) < b$ for all $t \le s$. Therefore, $T_b$ is a stopping time for the process $W(t)$.
Example 12.3. Let $X$ be a $U[0,1]$ random variable independent of the process $W(t)$. Then the nonnegative random variable $\tau = X$ is not a stopping time for the process $W(t)$.
Theorem 12.3 (Strong Markov Property). If $\tau$ is a stopping time for the process $W(t)$, then $W(\tau + t) - W(\tau)$ is also a Brownian motion on $[0, \infty)$ and is independent of $\{W(s), s \le \tau\}$.
See Bhattacharya and Waymire (2007, p. 153) for its proof.
12.3.1 Reflection Principle and Extremes
It is important in applications to be able to derive the distribution of special functionals of Brownian processes. They can be important because a Brownian process is used directly as a statistical model in some problem, or because the functional arises as the limit of some suitable sequence of statistics in a seemingly unrelated problem. Examples of the latter kind are seen in applications of the so-called invariance principle. For now, we provide formulas for the distributions of certain extremes and first passage times of a Brownian motion. The following notation is used:
$$M(t) = \sup_{0 < s \le t} W(s); \qquad T_b = \inf\{t > 0 : W(t) = b\}.$$
Theorem 12.4 (Reflection Principle). (a) For $b > 0$, $P(M(t) > b) = 2P(W(t) > b)$.

(b) For $t > 0$, $M(t) = \sup_{0 < s \le t} W(s)$ has the density
$$\sqrt{\frac{2}{\pi t}}\, e^{-x^2/(2t)}, \qquad x > 0.$$

(c) For $b > 0$, the first passage time $T_b$ has the density
$$\frac{b}{t^{3/2}}\,\phi\!\left(\frac{b}{\sqrt{t}}\right), \qquad t > 0,$$
where $\phi$ denotes the standard normal density function.

(d) (First Arcsine Law). Let $t^*$ be the point of maximum of $W(t)$ on $[0,1]$. Then $t^*$ is almost surely unique, and $P(t^* \le t) = \frac{2}{\pi}\arcsin(\sqrt{t})$.

(e) (Reflected Brownian Motion). Let $X(t) = \sup_{0 \le s \le t} |W(s)|$. Then $X(1) = \sup_{0 \le s \le 1} |W(s)|$ has the CDF
$$G(x) = \frac{4}{\pi}\sum_{m=0}^{\infty} \frac{(-1)^m}{2m+1}\, e^{-(2m+1)^2\pi^2/(8x^2)}, \qquad x \ge 0.$$

(f) (Maximum of a Brownian Bridge). Let $B(t)$ be a Brownian bridge on $[0,1]$. Then $\sup_{0 \le t \le 1} |B(t)|$ has the CDF
$$H(x) = 1 - 2\sum_{k=1}^{\infty} (-1)^{k-1} e^{-2k^2x^2}, \qquad x \ge 0.$$

(g) (Second Arcsine Law). Let $L = \sup\{t \in [0,1] : W(t) = 0\}$. Then $P(L \le t) = \frac{2}{\pi}\arcsin(\sqrt{t})$.

(h) Given $0 < s < t$, $P(W \text{ has at least one zero in the time interval } (s, t)) = \frac{2}{\pi}\arccos\left(\sqrt{\frac{s}{t}}\right)$.
Proof of the Reflection Principle: The reflection principle is of paramount importance, and we provide a proof of it. The reflection principle follows from two observations, the first of which is obvious, while the second needs a clever argument. The observations are:
$$P(T_b < t) = P(T_b < t, W(t) > b) + P(T_b < t, W(t) < b),$$
and
$$P(T_b < t, W(t) > b) = P(T_b < t, W(t) < b).$$
Because $P(T_b < t, W(t) > b) = P(W(t) > b)$ (since $W(t) > b$ implies that $T_b < t$), if we accept the second identity above, then we immediately have the desired result $P(M(t) > b) = P(T_b < t) = 2P(W(t) > b)$. Thus, only the second identity needs a proof. This is done by a clever argument.

The event $\{T_b < t, W(t) < b\}$ happens if and only if at some time point before $t$ the process reaches the level $b$, and then at time $t$ drops to a lower level $l$, $l < b$. However, once at level $b$, the process could equally well have taken the path reflected about the level $b$, which would have caused the process to end up at level $b + (b - l) = 2b - l$ at time $t$. We now observe that $2b - l > b$, meaning that corresponding to every path in the event $\{T_b < t, W(t) < b\}$, there is a path in the event $\{T_b < t, W(t) > b\}$, and so $P(T_b < t, W(t) < b)$ must equal $P(T_b < t, W(t) > b)$.

This is the famous reflection principle for Brownian motion. An analytic proof of the identity $P(T_b < t, W(t) < b) = P(T_b < t, W(t) > b)$ can be given by using the strong Markov property of Brownian motion.
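Part (a) of Theorem 12.4 is also easy to check by simulation. The sketch below (path discretization, replication count, and threshold are arbitrary illustrative choices) estimates $P(M(1) > b)$ from discretized paths and compares it with the exact value $2P(W(1) > b) = 2(1 - \Phi(b))$; discrete monitoring slightly undershoots the true maximum, so only rough agreement is expected:

```python
import math
import random

random.seed(4)

def sim_max(n_steps=200):
    """Running maximum of a discretized standard BM on [0, 1]."""
    dt = 1.0 / n_steps
    W = M = 0.0
    for _ in range(n_steps):
        W += random.gauss(0.0, math.sqrt(dt))
        M = max(M, W)
    return M

b, reps = 1.0, 5000
est = sum(1 for _ in range(reps) if sim_max() > b) / reps
exact = 2 * (1 - 0.5 * (1 + math.erf(b / math.sqrt(2))))  # 2 P(W(1) > b) = 0.3173...
print(est, exact)
```

Refining the time grid shrinks the discretization bias toward the exact reflection-principle value.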
Note that both parts (b) and (c) of the theorem are simply restatements of part (a). Many of the remaining parts follow from calculations that also use the reflection principle. Detailed proofs can be seen, for example, in Karlin and Taylor (1975, pp. 345–354). ∎

Example 12.4 (Density of the Last Zero Before $T$). Consider standard Brownian motion $W(t)$ on $[0, \infty)$ starting at zero, and fix a time $T > 0$. We want to find the density of the last zero of $W(t)$ before the time $T$. Formally, let $\tau = \tau_T = \sup\{t < T : W(t) = 0\}$. Then, we want to find the density of $\tau$.
By using part (g) of the previous theorem,
$$P(\tau > s) = P(\text{there is at least one zero of } W \text{ in } (s, T)) = \frac{2}{\pi}\arccos\left(\sqrt{\frac{s}{T}}\right).$$
Therefore, the density of $\tau$ is
$$f_\tau(s) = -\frac{d}{ds}\left[\frac{2}{\pi}\arccos\left(\sqrt{\frac{s}{T}}\right)\right] = \frac{1}{\pi\sqrt{s(T - s)}}, \qquad 0 < s < T.$$
Notice that the density is symmetric about $\frac{T}{2}$, and therefore $E(\tau) = \frac{T}{2}$. A calculation shows that $E(\tau^2) = \frac{3}{8}T^2$, and therefore $\mathrm{Var}(\tau) = \frac{3}{8}T^2 - \frac{T^2}{4} = \frac{1}{8}T^2$.
12.3.2 Path Properties and Behavior Near Zero and Infinity
A textbook example of a nowhere differentiable yet everywhere continuous function is Weierstrass's function $f(t) = \sum_{n=0}^{\infty} 2^{-n}\cos(b^n t)$, $-\infty < t < \infty$, for $b > 2 + 3\pi$. Constructing another example of such a function is not trivial. A result of some notoriety is that almost all sample paths of Brownian motion are functions of this kind; that is, as a function of $t$, $W(t)$ is continuous at every $t$ and differentiable at no $t$! The paths are extremely crooked. The Brownian bridge shares the same property. The sample paths show other evidence of extreme oscillation; for example, in any arbitrarily small interval containing the starting time $t = 0$, $W(t)$ changes its sign infinitely often. The various important path properties of Brownian motion are described and discussed below.
Theorem 12.5. Let $W(t)$, $t \ge 0$, be a Brownian motion on $[0, \infty)$. Then,

(a) (Scaling). For $c > 0$, $X(t) = c^{-1/2}W(ct)$ is a Brownian motion on $[0, \infty)$.
(b) (Time Reciprocity). $X(t) = tW\!\left(\frac{1}{t}\right)$, with the value defined as zero at $t = 0$, is a Brownian motion on $[0, \infty)$.
(c) (Time Reversal). Given $0 < T < \infty$, $X_T(t) = W(T) - W(T - t)$ is a Brownian motion on $[0, T]$.
Proof. Only part (b) requires a proof, the others being obvious. First note that for $s \le t$, the covariance function is
$$\mathrm{Cov}\left(sW\!\left(\frac{1}{s}\right), tW\!\left(\frac{1}{t}\right)\right) = st\,\min\left(\frac{1}{s}, \frac{1}{t}\right) = st \cdot \frac{1}{t} = s = \min\{s, t\}.$$
It is obvious that $X(t) - X(s) \sim N(0, t - s)$. Next, for $s < t < u$, $\mathrm{Cov}(X(t) - X(s), X(u) - X(t)) = t - t - s + s = 0$, and the independent increments property holds. The sample paths are continuous (including at $t = 0$) because $W(t)$ has continuous sample paths, and $X(0) = 0$. Thus, all the defining properties of a Brownian motion are satisfied, and hence $X(t)$ must be a Brownian motion. ∎
Part (b) leads to the following useful property.

Proposition. With probability one, $\frac{W(t)}{t} \to 0$ as $t \to \infty$.
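This strong law for Brownian motion is visible along a single simulated path. The sketch below (checkpoint times and seed are arbitrary) builds $W$ at integer times from unit-time $N(0,1)$ increments and tracks the ratio $W(t)/t$:

```python
import random

random.seed(6)

# One path of W at integer times: W(n) = Z_1 + ... + Z_n with Z_i iid N(0, 1).
checkpoints = [10, 100, 1000, 10000, 100000]
W, n, ratios = 0.0, 0, []
for cp in checkpoints:
    while n < cp:
        W += random.gauss(0.0, 1.0)
        n += 1
    ratios.append(W / n)   # W(t)/t at t = cp

print(ratios)  # the entries shrink toward 0 as t grows
```

Since $W(n)/n$ has standard deviation $1/\sqrt{n}$, the final ratio at $t = 10^5$ is essentially zero, illustrating the proposition.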
The behavior of Brownian motion near $t = 0$ is quite a bit more subtle, and we postpone its discussion until later. We next describe a series of classic results that illustrate the extremely rough nature of the paths of a Brownian motion. The results essentially tell us that at any instant, it is nearly impossible to predict what a particle performing a Brownian motion will do next. Here is a simple intuitive explanation for why the paths of a Brownian motion are so rough.
Take two time instants $s < t$. We then have the simple moment formula $E\left[(W(t) - W(s))^2\right] = t - s$. Writing $t = s + h$, we get
$$E\left[(W(s+h) - W(s))^2\right] = h \iff E\left[\left(\frac{W(s+h) - W(s)}{h}\right)^2\right] = \frac{1}{h}.$$
If the time instants $s, t$ are close together, then $h \approx 0$, and so $\frac{1}{h}$ is large. We can see that the difference quotient $\frac{W(s+h) - W(s)}{h}$ is blowing up in magnitude. Thus, differentiability is going to be a problem. In fact, not only is the path of a Brownian motion guaranteed to be nondifferentiable at any prespecified $t$, it is guaranteed to be nondifferentiable simultaneously at all values of $t$. This is a much stronger roughness property than lack of differentiability at a fixed $t$.
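The blow-up of the mean squared difference quotient is easy to see numerically: since $W(s+h) - W(s) \sim N(0, h)$, one can sample increments directly (the values of $h$, replication count, and seed below are arbitrary illustrative choices):

```python
import random

random.seed(8)

# Mean squared difference quotient E[((W(s+h) - W(s)) / h)^2] = 1/h,
# estimated by sampling increments W(s+h) - W(s) ~ N(0, h) directly.
reps = 20000
msq = {}
for h in (0.1, 0.01, 0.001):
    total = sum((random.gauss(0.0, h ** 0.5) / h) ** 2 for _ in range(reps))
    msq[h] = total / reps

print(msq)  # approximately 1/h: 10, 100, 1000
```

Each tenfold shrinkage of $h$ multiplies the mean squared quotient by ten, which is exactly the $1/h$ divergence behind nondifferentiability.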
The next theorem is regarded as one of the classic results of probability theory. We first need a few definitions.
Definition 12.12. Let $f$ be a real-valued continuous function defined on some open subset $T$ of $\mathbb{R}$. The upper and lower Dini right derivatives of $f$ at $t \in T$ are defined as
$$D^+ f(t) = \limsup_{h \downarrow 0} \frac{f(t+h) - f(t)}{h}, \qquad D_+ f(t) = \liminf_{h \downarrow 0} \frac{f(t+h) - f(t)}{h}.$$
Definition 12.13. Let $f$ be a real-valued function defined on some open subset $T$ of $\mathbb{R}$. The function $f$ is said to be Hölder continuous of order $\gamma > 0$ at $t$ if for some finite constant $C$ (possibly depending on $t$), $|f(t+h) - f(t)| \le C|h|^\gamma$ for all sufficiently small $h$. If $f$ is Hölder continuous of order $\gamma$ at every $t \in T$ with a universal constant $C$, it is called Hölder continuous of order $\gamma$ in $T$.
Theorem 12.6 (Crooked Paths and Unbounded Variation). (a) Given any $T > 0$, $P\left(\sup_{t \in [0,T]} W(t) > 0,\ \inf_{t \in [0,T]} W(t) < 0\right) = 1$. Hence, with probability one, in any nonempty interval containing zero, $W(t)$ changes sign at least once, and therefore infinitely often.

(b) (Nondifferentiability Everywhere). With probability one, $W(t)$ is (simultaneously) nondifferentiable at all $t > 0$; that is,
$$P(\text{for each } t > 0,\ W(t) \text{ is not differentiable at } t) = 1.$$

(c) (Unbounded Variation). For every $T > 0$, with probability one, $W(t)$ has unbounded total variation as a function of $t$ on $[0, T]$.

(d) With probability one, on no nonempty time interval can $W(t)$ be monotone increasing or monotone decreasing.

(e) $P(\text{for all } t > 0,\ D^+W(t) = +\infty \text{ or } D_+W(t) = -\infty \text{ or both}) = 1$.

(f) (Hölder Continuity). Given any finite $T > 0$ and $0 < \gamma < \frac{1}{2}$, with probability one, $W(t)$ is Hölder continuous on $[0, T]$ of order $\gamma$.

(g) For any $\gamma > \frac{1}{2}$, with probability one, $W(t)$ is nowhere Hölder continuous of order $\gamma$.

(h) (Uniform Continuity in Probability). Given any $\epsilon > 0$ and $0 < T < \infty$,
$$P\left(\sup_{0 \le s, t \le T,\ |t - s| < h} |W(t) - W(s)| > \epsilon\right) \to 0 \text{ as } h \to 0.$$
Proof. Each of parts (c) and (d) follows from part (b), because of the results in real analysis that monotone functions and functions of bounded variation must be differentiable almost everywhere. Part (e) is a stronger version of the nondifferentiability result in part (b); see Karatzas and Shreve (1991, pp. 106–111) for parts (e)–(h). Part (b) itself is proved in many standard texts on stochastic processes; the proof involves quite a bit of calculation. We show here that part (a) is a consequence of the reflection principle.

Clearly, it is enough to show that for any $T > 0$, $P\left(\sup_{t \in [0,T]} W(t) > 0\right) = 1$. This will imply that $P\left(\inf_{t \in [0,T]} W(t) < 0\right) = 1$, because $-W(t)$ is a Brownian motion if $W(t)$ is, and hence it will imply all the other statements in part (a). Fix $c > 0$. Then,
$$P\left(\sup_{t \in [0,T]} W(t) > 0\right) \ge P\left(\sup_{t \in [0,T]} W(t) > c\right) = 2P(W(T) > c) \quad (\text{reflection principle})$$
$$\to 1 \text{ as } c \downarrow 0,$$
and therefore $P\left(\sup_{t \in [0,T]} W(t) > 0\right) = 1$. ∎

Remark. It should be noted that the set of points at which the path of a Brownian motion is Hölder continuous of order $\frac{1}{2}$ is not empty, although in some sense such points are rare.

The oscillation properties of the paths of a Brownian motion are further illustrated by the laws of the iterated logarithm for Brownian motion. The path of a Brownian motion is a random function. Can we construct suitable deterministic functions, say $u(t)$ and $l(t)$, such that for large $t$ the Brownian path $W(t)$ will be bounded by the envelopes $l(t)$, $u(t)$? What are the tightest such envelope functions? Similar questions can be asked about small $t$. The law of the iterated logarithm answers these questions precisely. However, it is important to note that, in addition to the intellectual aspect of identifying the tightest envelopes, the iterated logarithm laws have other applications.
Theorem 12.7 (LIL). Let $f(t) = \sqrt{2t \log|\log t|}$, $t > 0$. With probability one,

(a) $\limsup_{t \to \infty} \frac{W(t)}{f(t)} = 1$; $\liminf_{t \to \infty} \frac{W(t)}{f(t)} = -1$.

(b) $\limsup_{t \to 0} \frac{W(t)}{f(t)} = 1$; $\liminf_{t \to 0} \frac{W(t)}{f(t)} = -1$.
Remark on Proof: Note that the liminf statement in part (a) follows from the limsup statement, because $-W(t)$ is also a Brownian motion if $W(t)$ is. On the other hand, the two statements in part (b) follow from the corresponding statements in part (a) by the time reciprocity property that $tW\!\left(\frac{1}{t}\right)$ is also a Brownian motion if $W(t)$ is. For a proof of part (a), see Karatzas and Shreve (1991), or Bhattacharya and Waymire (2007, p. 143). ∎
12.3.3 Fractal Nature of Level Sets
For a moment, let us consider a general question. Suppose $T$ is a subset of the real line, and $X(t), t \in T$ a real-valued stochastic process. Fix a number $u$, and ask how many times the path of $X(t)$ cuts the line drawn at level $u$; that is, consider $N_T(u) = \#\{t \in T : X(t) = u\}$. It is not a priori obvious that $N_T(u)$ is finite. Indeed, for Brownian motion, we already know that in any nonempty interval containing zero, the path hits zero infinitely often with probability one. One might guess that this lack of finiteness is related to the extreme oscillatory nature of the paths of a Brownian motion. Indeed, that is true. If the process $X(t)$ is a bit more smooth, then the number of level crossings will be finite. However, investigations into the distribution of $N_T(u)$ will still be a formidable problem. For the Brownian motion, it is not the number of level crossings, but the geometry of the set of times at which it crosses a given level $u$ that is of interest. In this section, we describe the fascinating properties of these level sets of the path of a Brownian motion. We also give a very brief glimpse into what we can expect for processes whose paths are more smooth, to draw the distinction from the case of Brownian motion.
Given $b \in \mathbb{R}$, let

$$C_b = \{t \ge 0 : W(t) = b\}.$$

Note that $C_b$ is a random set, in the sense that different sample paths will hit the level $b$ at different sets of times. We only consider the case $b = 0$ here, although most of the properties of $C_0$ extend in a completely evident way to the case of a general $b$.
Theorem 12.8. With probability one, $C_0$ is an uncountable, unbounded, closed set of Lebesgue measure zero, and has no isolated points; that is, in any neighborhood of an element of $C_0$, there is at least one other element of $C_0$.
Proof. It follows from an application of the reflection principle that $P(\sup_{t \ge 0} W(t) = \infty,\ \inf_{t \ge 0} W(t) = -\infty) = 1$ (check it!). Therefore, given any $T > 0$, there must be a time instant $t > T$ such that $W(t) = 0$. For if there were a finite last time that $W(t) = 0$, then for such a sample path, the supremum and the infimum cannot simultaneously be infinite. This means that the zero set $C_0$ is unbounded. It is closed because the paths of Brownian motion are continuous. We have not defined what Lebesgue measure means, therefore we cannot give a rigorous proof that $C_0$ has zero Lebesgue measure. Think of the Lebesgue measure of a set $C$ as its total length $\lambda(C)$. Then, by Fubini's theorem,

$$E[\lambda(C_0)] = E\int_{C_0} dt = E\int_{[0,\infty)} I_{\{W(t)=0\}}\,dt = \int_{[0,\infty)} P(W(t) = 0)\,dt = 0.$$

If the expected length is zero, then the length itself must be zero with probability one. That $C_0$ has no isolated points is entirely nontrivial to prove and we omit the proof. Finally, by a result in real analysis that any closed set with no isolated points must be uncountable unless it is empty, we have that $C_0$ is an uncountable set. □

Remark. The implication is that the set of times at which Brownian motion returns to zero is a topologically large set marked by holes, and collectively the holes are big enough that the zero set, although uncountable, has length zero. Such sets in one dimension are commonly called Cantor sets. Corresponding sets in higher dimensions often go by the name fractals.
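The Fubini calculation above suggests a quick numerical illustration: the expected time a Brownian path spends within $\epsilon$ of zero during $[0, 1]$ shrinks with $\epsilon$, consistent with the zero set having Lebesgue measure zero. A sketch on discretized paths (names are mine):

```python
import math
import random

def mean_time_near_zero(eps, n_steps=1000, n_paths=2000, seed=3):
    """Estimate E[Lebesgue measure of {t <= 1 : |W(t)| < eps}]
    by a Riemann sum along discretized Brownian paths."""
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    sd = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        w = 0.0
        near = 0
        for _ in range(n_steps):
            w += rng.gauss(0.0, sd)
            if abs(w) < eps:
                near += 1
        total += near * dt
    return total / n_paths

m_wide = mean_time_near_zero(0.1)
m_narrow = mean_time_near_zero(0.01)
```

For small $\epsilon$ the estimates scale roughly like $4\epsilon/\sqrt{2\pi}$, the value of $\int_0^1 P(|W(t)| < \epsilon)\,dt$.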
12.4 The Dirichlet Problem and Boundary Crossing Probabilities
The Dirichlet problem on a domain in $\mathbb{R}^d$, $1 \le d < \infty$, was formulated by Gauss in the mid-nineteenth century. It is a problem of special importance in the area of partial differential equations with boundary value constraints. The Dirichlet problem can also be interpreted as a problem in the physical theory of diffusion of heat in a $d$-dimensional domain with controlled temperature at the boundary points of the domain. According to the laws of physics, the temperature as a function of the location in the domain would have to be a harmonic function. The Dirichlet problem thus asks for finding a function $u(x)$ such that

$$u(x) = g(x)\ (\text{specified}),\ x \in \partial U; \qquad u(\cdot)\ \text{harmonic in } U,$$

where $U$ is a specified domain in $\mathbb{R}^d$. In this generality, solutions to the Dirichlet problem need not exist. We need the boundary value function $g$ as well as the domain $U$ to be sufficiently nice. The interesting and surprising thing is that solutions to the Dirichlet problem have connections to the $d$-dimensional Brownian motion. Solutions to the Dirichlet problem can be constructed by solving suitable problems (which we describe below) about $d$-dimensional Brownian motion. Conversely, these problems on the Brownian motion can be solved if we can directly find solutions to a corresponding Dirichlet problem, perhaps by inspection, or by using standard techniques in the area of partial differential equations. Thus, we have an altruistic connection between a special problem on partial differential equations and a problem on Brownian motion. It turns out that these connections are more than intellectual curiosities. For example, these connections were elegantly exploited in Brown (1971) to solve certain otherwise very difficult problems in the area of statistical decision theory.
We first provide the necessary definitions. We remarked before that the Dirichlet problem is not solvable on arbitrary domains. The domain must be such that it does not contain any irregular boundary points. These are points $x \in \partial U$ such that a Brownian motion starting at $x$ immediately falls back into $U$. A classic example is that of a disc from which the center has been removed. Then, the center is an irregular boundary point of the domain. We refer the reader to Karatzas and Shreve (1991, pp. 247–250) for the exact regularity conditions on the domain.
Definition 12.14. A set $U \subseteq \mathbb{R}^d$, $1 \le d < \infty$, is called a domain if $U$ is connected and open.
Definition 12.15. A twice continuously differentiable real-valued function $u(x)$ defined on a domain $U \subseteq \mathbb{R}^d$ is called harmonic if its Laplacian

$$\Delta u(x) = \sum_{i=1}^d \frac{\partial^2}{\partial x_i^2}\, u(x) \equiv 0 \quad \text{for all } x = (x_1, x_2, \ldots, x_d) \in U.$$
Definition 12.16. Let $U$ be a bounded regular domain in $\mathbb{R}^d$, and $g$ a real-valued continuous function on $\partial U$. The Dirichlet problem on the domain $U \subseteq \mathbb{R}^d$ with boundary value constraint $g$ is to find a function $u : \bar U \to \mathbb{R}$ such that $u$ is harmonic in $U$, and $u(x) = g(x)$ for all $x \in \partial U$, where $\partial U$ denotes the boundary of $U$ and $\bar U$ the closure of $U$.
Theorem 12.9. Let $U \subseteq \mathbb{R}^d$ be a bounded regular domain. Fix $x \in U$. Consider $X_d(t), t \ge 0$, $d$-dimensional Brownian motion starting at $x$, and let $\tau = \tau_U = \inf\{t > 0 : X_d(t) \notin U\} = \inf\{t > 0 : X_d(t) \in \partial U\}$. Define the function $u$ pointwise on $U$ by

$$u(x) = E_x[g(X_d(\tau))],\ x \in U; \qquad u = g \text{ on } \partial U.$$

Then $u$ is continuous on $\bar U$ and is the unique solution, continuous on $\bar U$, to the Dirichlet problem on $U$ with boundary value constraint $g$.
Remark. When $X_d(t)$ exits from $U$ having started at a point inside $U$, it can exit through different points on the boundary $\partial U$. If it exits at the point $y \in \partial U$, then $g(X_d(\tau))$ will equal $g(y)$. The exit point $y$ is determined by chance. If we average over $y$, then we get a function that is harmonic inside $U$ and equals $g$ on $\partial U$. We omit the proof of this theorem, and refer the reader to Karatzas and Shreve (1991, p. 244), and Körner (1986, p. 55).
Example 12.5 (Dirichlet Problem on an Annulus). Consider the Dirichlet problem on the $d$-dimensional annulus $U = \{z : r < \|z\| < R\}$, where $0 < r < R < \infty$. Specifically, suppose we want a function $u$ such that

$$u \text{ harmonic on } \{z : r < \|z\| < R\};\quad u = 1 \text{ on } \{z : \|z\| = R\};\quad u = 0 \text{ on } \{z : \|z\| = r\}.$$

A continuous solution to this can be found directly. The solution is

$$u(z) = \frac{|z| - r}{R - r} \quad \text{for } d = 1;$$

$$u(z) = \frac{\log\|z\| - \log r}{\log R - \log r} \quad \text{for } d = 2;$$

$$u(z) = \frac{r^{2-d} - \|z\|^{2-d}}{r^{2-d} - R^{2-d}} \quad \text{for } d > 2.$$
We now relate this solution to the Dirichlet problem on $U$ with $d$-dimensional Brownian motion. Consider then $X_d(t)$, $d$-dimensional Brownian motion started at a given point $x$ inside $U$, $r < \|x\| < R$. Because the function $g$ corresponding to the boundary value constraint in this example is $g(z) = I_{\{z : \|z\| = R\}}$, by the above theorem, $u(x)$ equals

$$u(x) = E_x[g(X_d(\tau))] = P_x\big(X_d(t) \text{ reaches the sphere } \|z\| = R \text{ before it reaches the sphere } \|z\| = r\big).$$

For now, let us consider the case $d = 1$. Fix positive numbers $r, R$ and suppose a one-dimensional Brownian motion starts at a number $x$ between $r$ and $R$, $0 < r < x < R < \infty$. Then the probability that it will hit the line $z = R$ before hitting the line $z = r$ is $\frac{x - r}{R - r}$. The closer the starting point $x$ is to $R$, the larger is the probability that it will first hit the line $z = R$. Furthermore, the probability is a very simple linear function. We revisit the case $d > 1$ when we discuss recurrence and transience of $d$-dimensional Brownian motion in the next section.
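The linear answer for $d = 1$ is straightforward to confirm by simulation. A sketch (my names; the Gaussian-increment discretization introduces a small overshoot bias at the boundaries):

```python
import math
import random

def prob_hit_R_first(x=0.3, r=0.0, R=1.0, dt=4e-4, n_paths=4000, seed=5):
    """Estimate the probability that one-dimensional Brownian motion
    started at x hits R before r; theory predicts (x - r)/(R - r)."""
    rng = random.Random(seed)
    sd = math.sqrt(dt)
    wins = 0
    for _ in range(n_paths):
        w = x
        while r < w < R:
            w += rng.gauss(0.0, sd)
        if w >= R:
            wins += 1
    return wins / n_paths

est = prob_hit_R_first()
exact = (0.3 - 0.0) / (1.0 - 0.0)
```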
12.4.1 Recurrence and Transience
We observed during our discussion of the lattice random walk (Chapter 11) that it is recurrent in dimensions $d = 1, 2$ and transient for $d > 2$. That is, in one and two dimensions the lattice random walk returns to any lattice point $x$ at least once (and hence infinitely often) with probability one, but for $d > 2$, the probability that the random walk returns at all to its starting point is less than one. For the Brownian motion, when the dimension is more than one, the correct question is not to ask if it returns to particular points $x$. The correct question to ask is if it returns to any fixed neighborhood of a particular point, however small. The answers are similar to the case of the lattice random walk; that is, in one dimension, Brownian motion returns to any point $x$ infinitely often with probability one, and in two dimensions, Brownian motion returns to any given neighborhood of a point $x$ infinitely often with probability one. But when $d > 2$, it diverges off to infinity. We can see this by using the connection of Brownian motion to the Dirichlet problem on discs. We first need two definitions.
Definition 12.17. For $d > 1$, a $d$-dimensional stochastic process $X_d(t), t \ge 0$ is called neighborhood recurrent if with probability one, it returns to any given ball $B(x, \epsilon)$ infinitely often.

Definition 12.18. For any $d$, a $d$-dimensional stochastic process $X_d(t), t \ge 0$ is called transient if with probability one, $X_d(t)$ diverges to infinity.
We now show how the connection of the Brownian motion to the solution of the Dirichlet problem will help us establish that Brownian motion is transient for $d > 2$. That is, if we let $B$ be the event that $\lim_{t\to\infty} \|W_d(t)\| \ne \infty$, then we show that $P(B)$ must be zero for $d > 2$. Indeed, to be specific, take $d = 3$, pick a point $x \in \mathbb{R}^3$ with $\|x\| > 1$, suppose that our Brownian motion is now sitting at the point $x$, and ask what the probability is that it will reach the unit ball $B_1$ before it reaches the sphere $\|z\| = R$. Here $R > \|x\|$. We have derived this probability. The Markov property of Brownian motion gives this probability to be exactly equal to

$$1 - \frac{1 - \frac{1}{\|x\|}}{1 - \frac{1}{R}}.$$

This clearly converges to $\frac{1}{\|x\|}$ as $R \to \infty$. Imagine now that the process has evolved for a long time, say $T$, and that it is now sitting at a very distant $x$ (i.e., $\|x\|$ is large). The LIL for Brownian motion guarantees that we can pick such a large $T$ and such a distant $x$. Then, the probability of ever returning from $x$ to the unit ball would be the small number $\epsilon = \frac{1}{\|x\|}$. We can make $\epsilon$ arbitrarily small by choosing $\|x\|$ sufficiently large, and what that means is that the probability of the process returning infinitely often to the unit ball $B_1$ is zero. The same argument works for $B_k$, the ball of radius $k$ for any $k \ge 1$, and therefore, $P(B) = P(\cup_{k=1}^\infty B_k) = 0$; that is, the process drifts off to infinity with probability one. The same argument works for any $d > 2$, not just $d = 3$. The case of $d = 1, 2$ is left as a chapter exercise. We then have the following theorem. □

Theorem 12.10. Brownian motion visits every real $x$ infinitely often with probability one if $d = 1$, is neighborhood recurrent if $d = 2$, and transient if $d > 2$. Moreover, by its neighborhood recurrence for $d = 2$, the graph of a two-dimensional Brownian path on $[0, \infty)$ is dense in the two-dimensional plane.
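The hitting probability used in the transience argument can be checked by simulating three-dimensional Brownian motion. The sketch below (my names; coarse time steps, so boundary overshoot introduces a small bias) starts at radius 2 and estimates the chance of reaching the unit sphere before the sphere of radius 4; the Dirichlet-problem answer is $1 - \frac{1 - 1/2}{1 - 1/4} = \frac13$.

```python
import math
import random

def hit_inner_before_outer(start=(2.0, 0.0, 0.0), r=1.0, R=4.0,
                           dt=4e-3, n_paths=1500, seed=19):
    """Estimate P(3-d Brownian motion from radius 2 reaches the sphere
    of radius r before the sphere of radius R)."""
    rng = random.Random(seed)
    sd = math.sqrt(dt)
    inner = 0
    for _ in range(n_paths):
        x, y, z = start
        while True:
            x += rng.gauss(0.0, sd)
            y += rng.gauss(0.0, sd)
            z += rng.gauss(0.0, sd)
            rho = math.sqrt(x * x + y * y + z * z)
            if rho <= r:
                inner += 1
                break
            if rho >= R:
                break
    return inner / n_paths

est = hit_inner_before_outer()
```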
12.5 The Local Time of Brownian Motion
For the simple symmetric random walk in one dimension, we derived the distribution of the local time $\ell(x, n)$, which is the total time the random walk spends at the integer $x$ up to the time instant $n$. It would not be interesting to ask exactly the same question about Brownian motion, because the number of time points $t$ up to some time $T$ at which the Brownian motion $W(t)$ equals a given $x$ is zero or infinity. Paul Lévy gave the following definition for the local time of a Brownian motion. Fix a set $A$ in the real line and a general time instant $T, T > 0$. Now ask what is the total size of the set of times $t$ up to $T$ at which the Brownian motion has resided in the given set $A$. That is, denoting Lebesgue measure on $\mathbb{R}$ by $\lambda$, look at the following kernel

$$H(A, T) = \lambda\{t \le T : W(t) \in A\}.$$

Using this, Lévy formulated the local time of the Brownian motion at a given $x$ as

$$\ell(x, T) = \lim_{\epsilon \downarrow 0} \frac{H([x - \epsilon, x + \epsilon], T)}{2\epsilon},$$

where the limit is supposed to mean a pointwise almost sure limit. It is important to note that the existence of the almost sure limit is nontrivial.
Instead of the clumsy notation $T$, we eventually simply use the notation $t$, and thereby obtain a new stochastic process $\ell(x, t)$, indexed simultaneously by two parameters $x$ and $t$. We can regard $(x, t)$ together as a vector-valued time parameter, and call $\ell(x, t)$ a random field. This is called the local time of one-dimensional Brownian motion. The local time of Brownian motion is generally regarded to be an analytically difficult process to study. We give a relatively elementary exposition to the local time of Brownian motion in this section.

Recall now the previously introduced maximum process of standard Brownian motion, namely $M(t) = \sup_{0 \le s \le t} W(s)$. The following major theorem on the distribution of the local time of Brownian motion at zero was proved by Paul Lévy.
Theorem 12.11. Let $W(s), s \ge 0$ be standard Brownian motion starting at zero. Consider the two stochastic processes $\{\ell(0, t),\ t \ge 0\}$ and $\{M(t),\ t \ge 0\}$. These two processes have the same distribution.

In particular, for any given fixed $t$, and $y > 0$,

$$P\left(\frac{\ell(0, t)}{\sqrt t} \le y\right) = \sqrt{\frac{2}{\pi}} \int_0^y e^{-z^2/2}\,dz = 2\Phi(y) - 1 \iff P(\ell(0, t) \le y) = \sqrt{\frac{2}{\pi t}} \int_0^y e^{-z^2/(2t)}\,dz.$$
For a detailed proof of this theorem, we refer to Mörters and Peres (2010, p. 160). A sketch of the proof can be seen in Révész (2005).
For a general level $x$, the corresponding result is as follows, and it follows from the case $x = 0$ treated above.

Theorem 12.12.

$$P(\ell(x, t) \le y) = 2\Phi\left(\frac{y + |x|}{\sqrt t}\right) - 1, \quad -\infty < x < \infty,\ t, y > 0.$$

It is important to note that if the level $x \ne 0$, then the local time $\ell(x, t)$ can actually be exactly equal to zero with a positive probability, and this probability is simply the probability that Brownian motion does not reach $x$ within time $t$, and equals $2\Phi\left(\frac{|x|}{\sqrt t}\right) - 1$. This is not the case if the level is zero, in which case the local time $\ell(0, t)$ possesses a density function.

The theorem above also says that the local time of Brownian motion grows at the rate of $\sqrt t$ for any level $x$. The expected value follows easily by evaluating the integral $\int_0^\infty [1 - P(\ell(x, t) \le y)]\,dy$, and one gets

$$E[\ell(x, t)] = 2\sqrt t\,\phi\!\left(\frac{|x|}{\sqrt t}\right) - 2|x|\left[1 - \Phi\!\left(\frac{|x|}{\sqrt t}\right)\right].$$
The limit of this as $x \to 0$ equals $\sqrt{\frac{2t}{\pi}}$, which agrees with $E[\ell(0, t)]$. The expected local time is plotted in Fig. 12.3.
Fig. 12.3 Plot of the expected local time as a function of $(x, t)$
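The expectation can be cross-checked numerically against Theorem 12.12 by integrating the tail probability $\int_0^\infty P(\ell(x, t) > y)\,dy$ directly; the $x \to 0$ limit should also reproduce $E[\ell(0, t)] = \sqrt{2t/\pi}$. A sketch (my function names):

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def mean_local_time_closed(x, t):
    """Closed form obtained by integrating the tail of Theorem 12.12."""
    a = abs(x) / math.sqrt(t)
    return 2.0 * math.sqrt(t) * phi(a) - 2.0 * abs(x) * (1.0 - Phi(a))

def mean_local_time_numeric(x, t, ymax=50.0, n=200000):
    """Midpoint-rule integral of P(l(x,t) > y) = 2(1 - Phi((y + |x|)/sqrt(t)))."""
    h = ymax / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * h
        total += 2.0 * (1.0 - Phi((y + abs(x)) / math.sqrt(t))) * h
    return total

t = 2.0
closed = mean_local_time_closed(0.7, t)
numeric = mean_local_time_numeric(0.7, t)
limit0 = mean_local_time_closed(0.0, t)
```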
12.6 Invariance Principle and Statistical Applications
We remarked in the first section of this chapter that scaled random walks mimic the Brownian motion in a suitable asymptotic sense. As a matter of fact, if $X_1, X_2, \ldots$ is any iid sequence of one-dimensional random variables satisfying some relatively simple conditions, then the sequence of partial sums $S_n = \sum_{i=1}^n X_i,\ n \ge 1$, when appropriately scaled, mimics Brownian motion in a suitable asymptotic sense. Why is this useful? This is useful because in many concrete problems of probability and statistics, suitable functionals of the sequence of partial sums arise as the objects of direct importance. The invariance principle allows us to conclude that if the sequence of partial sums $S_n$ mimics $W(t)$, then any nice functional of the sequence of partial sums will also mimic the same functional of $W(t)$. So, if we can figure out how to deal with the distribution of the needed functional of the $W(t)$ process, then we can use it in practice to approximate the much more complicated distribution of the original functional of the sequence of partial sums. It is a profoundly useful fact in the asymptotic theory of probability that all of this is indeed a reality. This section treats the invariance principle for the partial sum process of one-dimensional iid random variables. We recommend Billingsley (1968), Hall and Heyde (1980), and Csörgő and Révész (1981) for detailed and technical treatments; Erdős and Kac (1946), Donsker (1951), Komlós et al. (1975, 1976), Major (1978), Whitt (1980), and Csörgő and Hall (1984) for invariance principles for the partial sum process; and Pyke (1984) and Csörgő (2002) for lucid reviews. Also, see DasGupta (2008) for references to various significant extensions, such as the multidimensional and dependent cases.
Although the invariance principle for partial sums of iid random variables is usually credited to Donsker (1951), Erdős and Kac (1946) contained the basic idea behind the invariance principle and also worked out the asymptotic distribution of a number of key and interesting functionals of the partial sum process. Donsker (1951) provided the full generalization of the Erdős–Kac technique by providing explicit embeddings of the discrete sequence $\frac{S_k}{\sqrt n},\ k = 1, 2, \ldots, n$ into a continuous-time stochastic process $S_n(t)$ and by establishing the limiting distribution of a general continuous functional $h(S_n(t))$. In order to achieve this, it is necessary to use a continuous mapping theorem for metric spaces, as consideration of Euclidean spaces is no longer enough. It is also useful to exploit a property of the Brownian motion known as the Skorohod embedding theorem. We first describe this necessary background material.
Define

$$C[0,1] = \text{class of all continuous real-valued functions on } [0,1], \text{ and}$$

$$D[0,1] = \text{class of all real-valued functions on } [0,1] \text{ that are right continuous and have a left limit at every point in } [0,1].$$

Given two functions $f, g$ in either $C[0,1]$ or $D[0,1]$, let $\rho(f, g) = \sup_{0 \le t \le 1} |f(t) - g(t)|$ denote the supremum distance between $f$ and $g$. We refer to $\rho$ as the uniform metric. Both $C[0,1]$ and $D[0,1]$ are (complete) metric spaces with respect to the uniform metric $\rho$.
Suppose $X_1, X_2, \ldots$ is an iid sequence of real-valued random variables with mean zero and variance one. Two common embeddings of the discrete sequence $\frac{S_k}{\sqrt n},\ k = 1, 2, \ldots, n$ into a continuous-time process are the following:

$$S_{n,1}(t) = \frac{1}{\sqrt n}\left[S_{\lfloor nt\rfloor} + \{nt\}\, X_{\lfloor nt\rfloor + 1}\right],$$

and

$$S_{n,2}(t) = \frac{1}{\sqrt n}\, S_{\lfloor nt\rfloor},$$

$0 \le t \le 1$. Here, $\lfloor\cdot\rfloor$ denotes the integer part and $\{\cdot\}$ the fractional part of a positive real.

The first one simply continuously interpolates between the values $\frac{S_k}{\sqrt n}$ by drawing straight lines, but the second one is only right continuous, with jumps at the points $t = \frac{k}{n},\ k = 1, 2, \ldots, n$. For certain specific applications, the second embedding is more useful. It is because of these jump discontinuities that Donsker needed to consider weak convergence in $D[0,1]$. It led to some additional technical complexities.
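The two embeddings are simple enough to write down directly. A sketch (my function names), which also checks that both agree with $S_k/\sqrt n$ at the grid points $t = k/n$:

```python
import math

def partial_sums(xs):
    """Return [S_0, S_1, ..., S_n] for increments xs."""
    s = [0.0]
    for x in xs:
        s.append(s[-1] + x)
    return s

def embed_interpolated(xs, t):
    """S_{n,1}(t): linear interpolation of the scaled partial sums."""
    n = len(xs)
    s = partial_sums(xs)
    k = int(n * t)          # integer part of nt
    frac = n * t - k        # fractional part of nt
    if k >= n:
        return s[n] / math.sqrt(n)
    return (s[k] + frac * xs[k]) / math.sqrt(n)

def embed_step(xs, t):
    """S_{n,2}(t): the right-continuous step-function embedding."""
    n = len(xs)
    s = partial_sums(xs)
    return s[int(n * t)] / math.sqrt(n)

xs = [1.0, -1.0, 1.0, 1.0]   # a tiny sample of increments, n = 4
grid_vals = [embed_interpolated(xs, k / 4) for k in range(5)]
mid_lin = embed_interpolated(xs, 0.375)   # between t = 1/4 and t = 2/4
mid_step = embed_step(xs, 0.375)
```

Note that `xs[k]` plays the role of the increment $X_{k+1}$ in the formula, since Python lists are 0-indexed.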
The main idea from this point on is not difficult. One can produce a version of $S_n(t)$, say $\tilde S_n(t)$, such that $\tilde S_n(t)$ is close to a sequence of Wiener processes $W_n(t)$. Because $\tilde S_n(t) \approx W_n(t)$, if $h(\cdot)$ is a continuous functional with respect to the uniform metric, then one can expect that $h(\tilde S_n(t)) \approx h(W_n(t)) = h(W(t))$ in distribution. $\tilde S_n(t)$ being a version of $S_n(t)$, $h(S_n(t)) = h(\tilde S_n(t))$ in distribution, and so, $h(S_n(t))$ should be close to the fixed Brownian functional $h(W(t))$ in distribution, which is the question we wanted to answer.
The results leading to Donsker’s theorem are presented below.
Theorem 12.13 (Skorohod Embedding). Given a random variable $X$ with mean zero and a finite variance $\sigma^2$, we can construct (on the same probability space) a standard Brownian motion $W(t)$ starting at zero, and a stopping time $\tau$ with respect to $W(t)$ such that $E(\tau) = \sigma^2$ and $X$ and the stopped Brownian motion $W(\tau)$ have the same distribution.
Theorem 12.14 (Convergence of the Partial Sum Process to Brownian Motion). Let $S_n(t) = S_{n,1}(t)$ or $S_{n,2}(t)$ as defined above. Then there exists a common probability space on which one can define Wiener processes $W_n(t)$ starting at zero, and a sequence of processes $\{\tilde S_n(t)\},\ n \ge 1$, such that

(a) For each $n$, $S_n(t)$ and $\tilde S_n(t)$ are identically distributed as processes.

(b) $\sup_{0 \le t \le 1} |\tilde S_n(t) - W_n(t)| \xrightarrow{P} 0$.
We prove the last theorem, assuming the Skorohod embedding theorem. A proof of the Skorohod embedding theorem may be seen in Csörgő and Révész (1981), or in Bhattacharya and Waymire (2007, p. 160).
Proof. We treat only the linearly interpolated process $S_{n,1}(t)$, and simply call it $S_n(t)$. To reduce notational clutter, we write as if the version $\tilde S_n$ of $S_n$ is $S_n$ itself. Thus, the $\tilde S_n$ notation is dropped in the proof of the theorem. Without loss of generality, we take $E(X_1) = 0$ and $\mathrm{Var}(X_1) = 1$. First, by using the Skorohod embedding theorem, construct a stopping time $\tau_1$ with respect to the process $W(t), t \ge 0$ such that $E(\tau_1) = 1$ and such that $W(\tau_1) \overset{\mathcal{L}}{=} X_1$. Using the strong Markov property of Brownian motion, $W(t + \tau_1) - W(\tau_1)$ is also a Brownian motion on $[0, \infty)$, independent of $(\tau_1, W(\tau_1))$, and we can now pick a stopping time, say $\tau_2'$, with respect to this process, with the two properties $E(\tau_2') = 1$ and $W(\tau_2') \overset{\mathcal{L}}{=} X_2$. Therefore, if we define $\tau_2 = \tau_1 + \tau_2'$, then we have obtained a stopping time with respect to the original Brownian motion, with the properties that its expectation is 2, and $\tau_2 - \tau_1$ and $\tau_1$ are independent. Proceeding in this way, we can construct an infinite sequence of stopping times $0 = \tau_0 \le \tau_1 \le \tau_2 \le \tau_3 \le \cdots$, such that the $\tau_k - \tau_{k-1}$ are iid with mean one, and the two discrete-time processes $S_k$ and $W(\tau_k)$ have the same distribution. Moreover, by the usual SLLN,

$$\frac{\tau_n}{n} = \frac{1}{n} \sum_{k=1}^n [\tau_k - \tau_{k-1}] \xrightarrow{a.s.} 1,$$

from which it follows that

$$\frac{\max_{0 \le k \le n} |\tau_k - k|}{n} \xrightarrow{P} 0.$$
Set $W_n(t) = \frac{W(nt)}{\sqrt n},\ n \ge 1$. Therefore, in this notation, $W(\tau_k) = \sqrt n\, W_n(\frac{\tau_k}{n})$. Now fix $\epsilon > 0$ and consider the event $B_n = \{\sup_{0 \le t \le 1} |S_n(t) - W_n(t)| > \epsilon\}$. We need to show that $P(B_n) \to 0$.

Now, because $S_n(t)$ is defined by linear interpolation, in order that $B_n$ happens, at some $t$ in $[0,1]$ we must have one of $|S_k/\sqrt n - W_n(t)|$ and $|S_{k-1}/\sqrt n - W_n(t)|$ larger than $\epsilon$, where $k$ is the unique $k$ such that $\frac{k-1}{n} \le t < \frac{k}{n}$. Our goal is to show that the probability of the union of these two events is small. Now use the fact that in distribution, $S_k = W(\tau_k) = \sqrt n\, W_n(\frac{\tau_k}{n})$, and so it will suffice to show that the probability of the union of the two events $\{|W_n(\frac{\tau_k}{n}) - W_n(t)| > \epsilon\}$ and $\{|W_n(\frac{\tau_{k-1}}{n}) - W_n(t)| > \epsilon\}$ is small. However, the union of these two events can only happen if either $W_n$ differs by a large amount in a small interval, or one of the two time instants $\frac{\tau_k}{n}$ and $\frac{\tau_{k-1}}{n}$ is far from $t$. The first possibility has a small probability by the uniform continuity of paths of a Brownian motion (on any compact interval), and the second possibility has a small probability by our earlier observation that $\frac{\max_{0 \le k \le n} |\tau_k - k|}{n} \xrightarrow{P} 0$. This implies that $P(B_n)$ is small for all large $n$, as we wanted to show. □

This theorem implies the following important result by an application of the continuous mapping theorem, continuity being defined through the uniform metric on the space $C[0,1]$.
Theorem 12.15 (Donsker's Invariance Principle). Let $h$ be a continuous functional with respect to the uniform metric on $C[0,1]$ and let $S_n(t)$ be defined as either $S_{n,1}(t)$ or $S_{n,2}(t)$. Then $h(S_n(t)) \xrightarrow{\mathcal{L}} h(W(t))$, as $n \to \infty$.
Example 12.6 (CLT Follows from Invariance Principle). The central limit theorem for iid random variables having a finite variance follows as a simple consequence of Donsker's invariance principle. Suppose $X_1, X_2, \ldots$ are iid random variables with mean zero and variance 1. Let $S_k = \sum_{i=1}^k X_i,\ k \ge 1$. Define the functional $h(f) = f(1)$ on $C[0,1]$. This is obviously a continuous functional on $C[0,1]$ with respect to the uniform metric $\rho(f, g) = \sup_{0 \le t \le 1} |f(t) - g(t)|$. Therefore, with $S_n(t)$ as the linearly interpolated partial sum process, it follows from the invariance principle that

$$h(S_n) = S_n(1) = \frac{\sum_{i=1}^n X_i}{\sqrt n} \xrightarrow{\mathcal{L}} h(W) = W(1) \sim N(0, 1),$$

which is the central limit theorem.
Example 12.7 (Maximum of a Random Walk). We apply the Donsker invariance principle to the problem of determination of the limiting distribution of a functional of a random walk. Suppose $X_1, X_2, \ldots$ are iid random variables with mean zero and variance 1. Let $S_k = \sum_{i=1}^k X_i,\ k \ge 1$. We want to derive the limiting distribution of $G_n = \max_{1 \le k \le n} \frac{S_k}{\sqrt n}$. To derive its limiting distribution, define the functional $h(f) = \sup_{0 \le t \le 1} f(t)$ on $C[0,1]$. This is a continuous functional on $C[0,1]$ with respect to the uniform metric $\rho(f, g) = \sup_{0 \le t \le 1} |f(t) - g(t)|$. Further notice that our statistic $G_n$ can be represented as $G_n = h(S_n)$, where $S_n$ is the linearly interpolated partial sum process. Therefore, by Donsker's invariance principle, $G_n = h(S_n) \xrightarrow{\mathcal{L}} h(W) = \sup_{0 \le t \le 1} W(t)$, where $W(t)$ is standard Brownian motion on $[0,1]$. We know its CDF explicitly, namely, for $x > 0$, $P(\sup_{0 \le t \le 1} W(t) \le x) = 2\Phi(x) - 1$. Thus, $P(G_n \le x) \to 2\Phi(x) - 1$ for all $x > 0$.
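The convergence in this example can be watched numerically. A sketch with $\pm 1$ steps (my names; for finite $n$ a discreteness bias of order $n^{-1/2}$ remains):

```python
import math
import random

def max_cdf_estimate(x=1.0, n=500, n_reps=4000, seed=23):
    """Estimate P(G_n <= x) for G_n = max_{k<=n} S_k / sqrt(n),
    with iid +-1 increments."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_reps):
        s = 0
        m = 0
        for _ in range(n):
            s += 1 if rng.random() < 0.5 else -1
            if s > m:
                m = s
        if m / math.sqrt(n) <= x:
            count += 1
    return count / n_reps

est = max_cdf_estimate()
donsker_limit = 2 * 0.5 * (1 + math.erf(1.0 / math.sqrt(2))) - 1  # 2*Phi(1) - 1
```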
Example 12.8 (Sums of Powers of Partial Sums). Consider once again iid random variables $X_1, X_2, \ldots$ with zero mean and a unit variance. Fix a positive integer $m$ and consider the statistic $T_{m,n} = n^{-1-m/2} \sum_{k=1}^n S_k^m$. By direct integration of the polygonal curve $[S_n(t)]^m$, we find that $T_{m,n} = \int_0^1 [S_n(t)]^m\,dt$. This guides us to the functional $h(f) = \int_0^1 f^m(t)\,dt$. Because $[0,1]$ is a compact interval, it is easy to verify that $h$ is a continuous functional on $C[0,1]$ with respect to the uniform metric. Indeed, the continuity of $h(f)$ follows from simply the algebraic identity $|x^m - y^m| = |x - y|\,|x^{m-1} + x^{m-2}y + \cdots + y^{m-1}|$. It therefore follows from Donsker's invariance principle that $T_{m,n} \xrightarrow{\mathcal{L}} \int_0^1 W^m(t)\,dt$. At first glance it seems surprising that a nondegenerate limit distribution for partial sums of $S_k^m$ can exist with only two moments.
12.7 Strong Invariance Principle and the KMT Theorem
In addition to the weak invariance principle described above, there are also strong invariance principles. The first strong invariance principle for partial sums was obtained in Strassen (1964). Since then, a large literature has developed, including for the multidimensional case. Good sources for information are Strassen (1967), Komlós et al. (1976), Major (1978), Csörgő and Révész (1981), and Einmahl (1987).
It would be helpful to first understand exactly what a strong invariance principle is meant to achieve. Suppose $X_1, X_2, \ldots$ is a zero mean, unit variance iid sequence of random variables. For $n \ge 1$, let $S_n$ denote the partial sum $\sum_{i=1}^n X_i$, and $S_n(t)$ the interpolated partial sum process with the special values $S_n(\frac{k}{n}) = \frac{S_k}{\sqrt n}$ for each $n$ and $1 \le k \le n$. In the process of proving Donsker's invariance principle, we have shown that we can construct (on a common probability space) a process $\tilde S_n(t)$ (which is equivalent to the original process $S_n(t)$ in distribution) and a single Wiener process $W(t)$ such that $\sup_{0 \le t \le 1} |\tilde S_n(t) - \frac{1}{\sqrt n} W(nt)| \xrightarrow{P} 0$. Therefore,

$$\Big|\tilde S_n(1) - \frac{1}{\sqrt n} W(n)\Big| \xrightarrow{P} 0 \;\Rightarrow\; \frac{|\tilde S_n - W(n)|}{\sqrt n} \xrightarrow{P} 0.$$
The strong invariance principle asks if we can find suitable functions $g(n)$ such that we can make the stronger statement $\frac{|\tilde S_n - W(n)|}{g(n)} \xrightarrow{a.s.} 0$, and as a next step, what the best possible choice for such a function $g$ is.

The exact statements of the strong invariance principle results require us to say that we can construct an equivalent process $\tilde S_n(t)$ and a Wiener process $W(t)$ on some probability space such that $\frac{|\tilde S_n - W(n)|}{g(n)} \xrightarrow{a.s.} 0$ for some suitable function $g$. Due to the clumsiness of repeatedly having to mention these qualifications, we drop the $\tilde S_n$ notation and simply write $S_n(t)$, and we also do not mention that the processes have all been constructed on some new probability space. The important thing for applications is that we can use the approximations on the original process itself, by simply adopting the equivalent process on the new probability space.
Paradoxically, the strong invariance principle does not imply the weak invariance principle (i.e., Donsker's invariance principle) in general. This is because under the assumption of just the finiteness of the variance of the $X_i$, the best possible $g(n)$ increases faster than $\sqrt n$. On the other hand, if the common distribution of the $X_i$ satisfies more stringent moment conditions, then we can make $g(n)$ grow a lot more slowly, even slower than $\sqrt n$. The array of results that is available is bewildering, and they are all difficult to prove. We prefer to report a few results of great importance, including in particular the KMT theorem, due to Komlós et al. (1976).
Theorem 12.16. Let $X_1, X_2, \ldots$ be an iid sequence with $E(X_i) = 0$, $\mathrm{Var}(X_i) = 1$.

(a) There exists a Wiener process $W(t), t \ge 0$, starting at zero such that $\frac{S_n - W(n)}{\sqrt{n \log\log n}} \xrightarrow{a.s.} 0$.

(b) The $\sqrt{n \log\log n}$ rate of part (a) cannot be improved, in the sense that given any nondecreasing sequence $a_n \to \infty$ (however slowly), there exists a CDF $F$ with zero mean and unit variance, such that with probability one, $\limsup_n a_n \frac{S_n - W(n)}{\sqrt{n \log\log n}} = \infty$, for any iid sequence $X_i$ following the CDF $F$, and any Wiener process $W(t)$.

(c) If we make the stronger assumption that $X_i$ has a finite mgf in some open neighborhood of zero, then the statement of part (a) can be improved to $|S_n - W(n)| = O(\log n)$ with probability one.

(d) (KMT Theorem) Specifically, if we make the stronger assumption that $X_i$ has a finite mgf in some open neighborhood of zero, then we can find suitable positive constants $C, K, \lambda$ such that for any real number $x$ and any given $n$,

$$P\Big(\max_{1 \le k \le n} |S_k - W(k)| \ge C \log n + x\Big) \le K e^{-\lambda x},$$

where the constants $C, K, \lambda$ depend only on the common CDF of the $X_i$.
Remark. The KMT theorem is widely regarded as one of the major advances in the area of invariance principles and central limit problems. One should note that the inequality given in the above theorem has a qualitative nature attached to it, as we can only use the inequality with constants $C, K, \lambda$ that are known to exist, depending on the underlying $F$. Refinements of the version of the inequality given above are available. We refer to Csörgő and Révész (1981) for such refinements and a general detailed treatment of the strong invariance principle.
12.8 Brownian Motion with Drift and Ornstein–Uhlenbeck Process
We finish with two special processes derived from standard Brownian motion. Bothare important in applications.
Definition 12.19. Let $W(t), t \ge 0$ be standard Brownian motion starting at zero. Fix $\mu \in \mathbb{R}$ and $\sigma > 0$. Then the process $X(t) = \mu t + \sigma W(t),\ t \ge 0$ is called Brownian motion with drift $\mu$ and diffusion coefficient $\sigma$. It is clear that it inherits the major path properties of standard Brownian motion, such as nondifferentiability at all $t$ with probability one, the independent increments property, and the Markov property. Also, clearly, for fixed $t$, $X(t) \sim N(\mu t, \sigma^2 t)$.
12.8.1 Negative Drift and Density of Maximum
There are, however, also some important differences when a drift is introduced. For example, unless $\mu = 0$, the reflection principle no longer holds, and consequently one cannot derive the distribution of the running maximum $M(t) = \sup_{0 \le s \le t} X(s)$ by using symmetry arguments. If $\mu \ge 0$, then it is not meaningful to ask for the distribution of the maximum over all $t > 0$. However, if $\mu < 0$, then the process has a tendency to drift off towards negative values, and in that case the maximum in fact does have a nontrivial distribution. We derive the distribution of the maximum when $\mu < 0$ by using a result on a particular first passage time of the process.
Theorem 12.17. Let $X(t), t \ge 0$ be Brownian motion starting at zero, and with drift $\mu < 0$ and diffusion coefficient $\sigma$. Fix $a < 0 < b$, and let

$$T_{a,b} = \min\big[\inf\{t > 0 : X(t) = a\},\ \inf\{t > 0 : X(t) = b\}\big],$$

the first time $X(t)$ reaches either $a$ or $b$. Then,

$$P(X_{T_{a,b}} = b) = \frac{e^{-2\mu a/\sigma^2} - 1}{e^{-2\mu a/\sigma^2} - e^{-2\mu b/\sigma^2}}.$$
A proof of this theorem can be seen in Karlin and Taylor (1975, p. 361). By using this result, we can derive the distribution of $\sup_{t>0} X(t)$ in the case $\mu < 0$.
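The two-barrier exit probability can also be checked by direct simulation. The sketch below, which is an illustration rather than part of the text, uses a simple Euler discretization; the values of $\mu, \sigma, a, b$ are arbitrary, and a small discretization bias remains.

```python
import numpy as np

rng = np.random.default_rng(1)

def exit_at_b_prob(mu, sigma, a, b, n_paths=20_000, dt=1e-3, rng=rng):
    """Estimate P(X hits b before a) for Brownian motion with drift mu < 0,
    by stepping all paths on a fine grid until each exits (a, b)."""
    x = np.zeros(n_paths)
    hit_b = np.zeros(n_paths, dtype=bool)
    active = np.ones(n_paths, dtype=bool)
    while active.any():
        n = active.sum()
        x[active] += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
        hit_b |= active & (x >= b)          # exited through the upper barrier
        active &= (x > a) & (x < b)         # still strictly between the barriers
    return hit_b.mean()

mu, sigma, a, b = -1.0, 1.0, -1.0, 0.5
exact = (np.exp(-2 * mu * a / sigma**2) - 1) / (
    np.exp(-2 * mu * a / sigma**2) - np.exp(-2 * mu * b / sigma**2)
)
est = exit_at_b_prob(mu, sigma, a, b)
print(exact, est)
```

For these parameters the closed form gives about $0.335$, and the Monte Carlo estimate should agree to within a few hundredths.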
Theorem 12.18 (The Maximum of Brownian Motion with a Negative Drift). If $X(t), t \ge 0$, is Brownian motion starting at zero, with drift $\mu < 0$ and diffusion coefficient $\sigma$, then $\sup_{t>0} X(t)$ is distributed as an exponential with mean $\frac{\sigma^2}{2|\mu|}$.
Proof. In the theorem stated above, by letting $a \to -\infty$, we get
$$P\bigl(X(t) \text{ ever reaches the level } b > 0\bigr) = e^{2\mu b/\sigma^2}.$$
But this is the probability that an exponential variable with mean $\frac{\sigma^2}{2|\mu|}$ is larger than $b$. On the other hand, $P(X(t)$ ever reaches the level $b > 0)$ is the same as $P(\sup_{t>0} X(t) \ge b)$. Therefore, $\sup_{t>0} X(t)$ must have an exponential distribution with mean $\frac{\sigma^2}{2|\mu|}$. □
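A simulation sketch can corroborate the exponential law; this is an illustration, not part of the original text. The parameter values are arbitrary, and the running maximum over a long but finite horizon stands in for the supremum over all $t > 0$ (justified here because the drift is strongly negative).

```python
import numpy as np

rng = np.random.default_rng(2)

def sup_of_drifted_bm(mu, sigma, T=8.0, dt=1e-3, n_paths=10_000, rng=rng):
    """Approximate sup_{t>0} X(t) for X with drift mu < 0 by the running
    maximum of an Euler-discretized path over [0, T]."""
    n_steps = int(T / dt)
    x = np.zeros(n_paths)
    m = np.zeros(n_paths)
    for _ in range(n_steps):
        x += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
        np.maximum(m, x, out=m)        # update the running maximum in place
    return m

m = sup_of_drifted_bm(mu=-1.0, sigma=1.0)
print(m.mean())            # should be near sigma^2 / (2|mu|) = 0.5
print(np.mean(m > 1.0))    # should be near exp(-2) ≈ 0.135
```

Both the sample mean and the tail probability $P(\sup X > 1)$ match the $\mathrm{Exp}(\text{mean } 0.5)$ law up to Monte Carlo and discretization error.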
Example 12.9 (Probability That Brownian Motion Does Not Hit a Line). Consider standard Brownian motion $W(t)$ starting at zero on $[0, \infty)$, and consider a straight line $L$ with the equation $y = a + bt,\ a, b > 0$. Because $W(0) = 0$, $a > 0$, and paths of $W(t)$ are continuous, the probability that $W(t)$ does not hit the line $L$ is the same as $P(W(t) < a + bt\ \forall\, t > 0)$. However, if we define a new Brownian motion (with drift) $X(t)$ as $X(t) = W(t) - bt$, then
$$P(W(t) < a + bt\ \forall\, t > 0) = P\Bigl(\sup_{t>0} X(t) \le a\Bigr) = 1 - e^{-2ab},$$
by our theorem above on the maximum of a Brownian motion with a negative drift. We notice that the probability that $W(t)$ does not hit $L$ is monotone increasing in each of $a$ and $b$, as it should be.
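This non-hitting probability is easy to test numerically. The sketch below (an illustration with arbitrary $a, b$, not from the text) tracks whether standard Brownian motion paths stay below the line over a long finite horizon; missed crossings between grid points bias the estimate slightly upward.

```python
import numpy as np

rng = np.random.default_rng(3)

a, b = 0.5, 1.0
T, dt, n_paths = 8.0, 1e-3, 10_000
n_steps = int(T / dt)

# Fraction of standard BM paths staying below the line y = a + b*t on [0, T].
w = np.zeros(n_paths)
below = np.ones(n_paths, dtype=bool)
t = 0.0
for _ in range(n_steps):
    t += dt
    w += np.sqrt(dt) * rng.standard_normal(n_paths)
    below &= (w < a + b * t)

print(below.mean())    # should be near 1 - exp(-2*a*b) ≈ 0.632
```

With $a = 0.5$ and $b = 1$, the closed form gives $1 - e^{-1} \approx 0.632$; after time $T = 8$ the path sits far below the line, so truncating the horizon changes the answer negligibly.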
12.8.2 Transition Density and the Heat Equation
If we consider Brownian motion starting at some number $x$, with drift $\mu$ and diffusion coefficient $\sigma$, then by simple calculations, the conditional distribution of $X(t)$ given that $X(0) = x$ is $N(x + \mu t, \sigma^2 t)$, which has the density
$$p_t(x, y) = \frac{1}{\sigma\sqrt{2\pi t}}\, e^{-\frac{(y - x - \mu t)^2}{2\sigma^2 t}}.$$
This is called the transition density of the process. The transition density satisfies a very special partial differential equation, which we now prove.
By direct differentiation,
$$\frac{\partial}{\partial t} p_t(x, y) = \frac{(x - y)^2 - \mu^2 t^2 - \sigma^2 t}{2\sqrt{2\pi}\,\sigma^3 t^{5/2}}\, e^{-\frac{(y - x - \mu t)^2}{2\sigma^2 t}};$$
$$\frac{\partial}{\partial y} p_t(x, y) = \frac{x - y + \mu t}{\sqrt{2\pi}\,\sigma^3 t^{3/2}}\, e^{-\frac{(y - x - \mu t)^2}{2\sigma^2 t}};$$
$$\frac{\partial^2}{\partial y^2} p_t(x, y) = \frac{(x - y + \mu t)^2 - \sigma^2 t}{\sqrt{2\pi}\,\sigma^5 t^{5/2}}\, e^{-\frac{(y - x - \mu t)^2}{2\sigma^2 t}}.$$
On using these three expressions, it follows that the transition density $p_t(x, y)$ satisfies the partial differential equation
$$\frac{\partial}{\partial t} p_t(x, y) = -\mu\, \frac{\partial}{\partial y} p_t(x, y) + \frac{\sigma^2}{2}\, \frac{\partial^2}{\partial y^2} p_t(x, y).$$
This is the drift-diffusion equation in one dimension. In the particular case that $\mu = 0$ (no drift) and $\sigma = 1$, the equation reduces to the celebrated heat equation
$$\frac{\partial}{\partial t} p_t(x, y) = \frac{1}{2}\, \frac{\partial^2}{\partial y^2} p_t(x, y).$$
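Since the transition density is available in closed form, the drift-diffusion equation can be verified numerically by central finite differences. The sketch below is illustrative (arbitrary parameter values, not part of the text).

```python
import numpy as np

mu, sigma, x, t = 0.7, 1.2, 0.0, 1.5

def p(t, y):
    """Transition density of BM with drift: the N(x + mu*t, sigma^2*t) density at y."""
    return np.exp(-(y - x - mu * t) ** 2 / (2 * sigma**2 * t)) / (
        sigma * np.sqrt(2 * np.pi * t)
    )

y = np.linspace(-3, 5, 81)
ht, hy = 1e-5, 1e-4

# Central differences for d/dt, d/dy, and d^2/dy^2 of the density.
lhs = (p(t + ht, y) - p(t - ht, y)) / (2 * ht)
dp_dy = (p(t, y + hy) - p(t, y - hy)) / (2 * hy)
d2p_dy2 = (p(t, y + hy) - 2 * p(t, y) + p(t, y - hy)) / hy**2
rhs = -mu * dp_dy + 0.5 * sigma**2 * d2p_dy2

print(np.max(np.abs(lhs - rhs)))    # tiny: well below 1e-5
```

The maximum discrepancy over the grid is at the level of finite-difference truncation and roundoff error, confirming the equation term by term.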
Returning to the drift-diffusion equation for the transition density in general, if we now take a general function $f(x, y)$ that is twice continuously differentiable in $y$ and is bounded above by $Ke^{c|y|}$ for some finite $K, c > 0$, then integration by parts in the drift-diffusion equation produces the following expectation identity, which we state as a theorem.

Theorem 12.19. Let $x, \mu$ be any real numbers, and $\sigma > 0$. Suppose $Y \sim N(x + \mu t, \sigma^2 t)$, and $f(x, y)$ is twice continuously differentiable in $y$ such that for some $0 < K, c < \infty$, $|f(x, y)| \le Ke^{c|y|}$ for all $y$. Then,
$$\frac{\partial}{\partial t} E_x[f(x, Y)] = \mu\, E_x\!\left[\frac{\partial}{\partial y} f(x, Y)\right] + \frac{\sigma^2}{2}\, E_x\!\left[\frac{\partial^2}{\partial y^2} f(x, Y)\right].$$
This identity and a multidimensional version of it have been used in Brown et al. (2006) to derive various results in statistical decision theory.
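The identity can be checked numerically; the following sketch (illustrative, not part of the text) evaluates both sides for the sample choice $f(y) = y^3$, which satisfies the growth condition, using Gauss–Hermite quadrature for the normal expectations and a central difference in $t$.

```python
import numpy as np

mu, sigma, x = 0.4, 0.9, 0.3
f   = lambda y: y**3            # test function
fp  = lambda y: 3 * y**2        # its first derivative
fpp = lambda y: 6 * y           # its second derivative

nodes, weights = np.polynomial.hermite.hermgauss(40)

def E(g, t):
    """E[g(Y)] for Y ~ N(x + mu*t, sigma^2*t), via Gauss-Hermite quadrature."""
    m, s = x + mu * t, sigma * np.sqrt(t)
    return (weights * g(m + s * np.sqrt(2) * nodes)).sum() / np.sqrt(np.pi)

t, h = 2.0, 1e-5
lhs = (E(f, t + h) - E(f, t - h)) / (2 * h)
rhs = mu * E(fp, t) + 0.5 * sigma**2 * E(fpp, t)
print(lhs, rhs)    # agree to several decimal places
```

Because the quadrature is exact for polynomials of this degree, the only error is the $O(h^2)$ finite-difference truncation.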
12.8.3 The Ornstein–Uhlenbeck Process
The covariance function of standard Brownian motion $W(t)$ is $\mathrm{Cov}(W(s), W(t)) = \min(s, t)$. Therefore, if we scale by $\sqrt{t}$ and let $X(t) = \frac{W(t)}{\sqrt{t}},\ t > 0$, we get that $\mathrm{Cov}(X(s), X(t)) = \sqrt{\frac{\min(s,t)}{\max(s,t)}} = \sqrt{\frac{s}{t}}$ if $s \le t$. Therefore, the covariance is a function of only the time lag on a logarithmic time scale. This motivates the definition of the Ornstein–Uhlenbeck process as follows.
Definition 12.20. Let $W(t)$ be standard Brownian motion starting at zero, and let $\alpha > 0$ be a fixed constant. Then $X(t) = e^{-\frac{\alpha t}{2}} W(e^{\alpha t}),\ -\infty < t < \infty$, is called the Ornstein–Uhlenbeck process. The most general Ornstein–Uhlenbeck process is defined as
$$X(t) = \mu + \frac{\sigma}{\sqrt{\alpha}}\, e^{-\frac{\alpha t}{2}} W(e^{\alpha t}),\quad -\infty < \mu < \infty,\ \alpha, \sigma > 0.$$
In contrast to the Wiener process, the Ornstein–Uhlenbeck process has a locally time-dependent drift. If the present state of the process is larger than $\mu$, the global mean, then the drift drags the process back towards $\mu$, and if the present state of the process is smaller than $\mu$, then it does the reverse. The $\alpha$ parameter controls this tendency to return to the grand mean. The third parameter $\sigma$ controls the variability.
Theorem 12.20. Let $X(t)$ be a general Ornstein–Uhlenbeck process. Then $X(t)$ is a stationary Gaussian process with $E[X(t)] = \mu$ and $\mathrm{Cov}(X(s), X(t)) = \frac{\sigma^2}{\alpha}\, e^{-\frac{\alpha}{2}|s-t|}$.
Proof. It is obvious that $E[X(t)] = \mu$. By definition of $X(t)$,
$$\mathrm{Cov}(X(s), X(t)) = \frac{\sigma^2}{\alpha}\, e^{-\frac{\alpha}{2}(s+t)} \min(e^{\alpha s}, e^{\alpha t}) = \frac{\sigma^2}{\alpha}\, e^{-\frac{\alpha}{2}(s+t)}\, e^{\alpha \min(s,t)}$$
$$= \frac{\sigma^2}{\alpha}\, e^{\frac{\alpha}{2}\min(s,t)}\, e^{-\frac{\alpha}{2}\max(s,t)} = \frac{\sigma^2}{\alpha}\, e^{-(\alpha/2)|s-t|},$$
and inasmuch as $\mathrm{Cov}(X(s), X(t))$ is a function of only $|s - t|$, it follows that $X(t)$ is stationary. □

Example 12.10 (Convergence of Integrated Ornstein–Uhlenbeck to Brownian Motion). Consider an Ornstein–Uhlenbeck process $X(t)$ with parameters $\mu, \alpha$, and $\sigma^2$. In a suitable asymptotic sense, the integrated Ornstein–Uhlenbeck process converges to a Brownian motion with drift and an appropriate diffusion coefficient; the diffusion coefficient can be adjusted to be one. Towards this, define $Y(t) = \int_0^t X(u)\,du$. This is clearly also a Gaussian process. We show in this example that if $\sigma^2, \alpha \to \infty$ in such a way that $\frac{4\sigma^2}{\alpha^2} \to c^2,\ 0 < c < \infty$, then $\mathrm{Cov}(Y(s), Y(t)) \to c^2 \min(s, t)$. In other words, in the asymptotic paradigm where $\sigma, \alpha \to \infty$ but are of comparable order, the integrated Ornstein–Uhlenbeck process $Y(t)$ is approximately the same as a Brownian motion with some drift and some diffusion coefficient, in the sense of distribution.
We directly calculate $\mathrm{Cov}(Y(s), Y(t))$. There is no loss of generality in taking $\mu$ to be zero. Take $0 < s \le t < \infty$. Then
$$\mathrm{Cov}(Y(s), Y(t)) = \int_0^t \int_0^s E[X(u)X(v)]\,du\,dv = \frac{\sigma^2}{\alpha} \int_0^t \int_0^s e^{-\frac{\alpha}{2}|u-v|}\,du\,dv$$
$$= \frac{\sigma^2}{\alpha} \left[ \int_0^s \int_0^v e^{-\frac{\alpha}{2}|u-v|}\,du\,dv + \int_0^s \int_v^s e^{-\frac{\alpha}{2}|u-v|}\,du\,dv + \int_s^t \int_0^s e^{-\frac{\alpha}{2}|u-v|}\,du\,dv \right]$$
$$= \frac{\sigma^2}{\alpha} \left[ \int_0^s \int_0^v e^{-\frac{\alpha}{2}(v-u)}\,du\,dv + \int_0^s \int_v^s e^{-\frac{\alpha}{2}(u-v)}\,du\,dv + \int_s^t \int_0^s e^{-\frac{\alpha}{2}(v-u)}\,du\,dv \right]$$
$$= \frac{4\sigma^2}{\alpha^3} \left[ \alpha s - 1 + e^{-\alpha s/2} + e^{-\alpha t/2} - e^{-\alpha(t-s)/2} \right],$$
on doing the three integrals in the line before, and on adding them.
If $\alpha, \sigma^2 \to \infty$ and $\frac{4\sigma^2}{\alpha^2}$ converges to some finite and nonzero constant $c^2$, then for any $s, t$ with $0 < s < t$, the derived expression for $\mathrm{Cov}(Y(s), Y(t)) \to c^2 s = c^2 \min(s, t)$, which is the covariance function of Brownian motion with diffusion coefficient $c$.
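The closed-form covariance of the integrated process derived above can be confirmed by computing the double integral numerically. The sketch below is an illustration (arbitrary parameter values and grid size, not part of the text); it compares the formula against a midpoint-rule evaluation of $\frac{\sigma^2}{\alpha}\int_0^t\int_0^s e^{-\frac{\alpha}{2}|u-v|}\,du\,dv$.

```python
import numpy as np

alpha, sigma = 2.0, 1.5
s, t = 0.7, 1.9

def cov_closed_form(s, t, alpha, sigma):
    """Closed-form Cov(Y(s), Y(t)) for the integrated OU process, s <= t."""
    return (4 * sigma**2 / alpha**3) * (
        alpha * s - 1
        + np.exp(-alpha * s / 2)
        + np.exp(-alpha * t / 2)
        - np.exp(-alpha * (t - s) / 2)
    )

# Midpoint-rule double integral of (sigma^2/alpha) * exp(-alpha|u-v|/2).
n = 2000
u = (np.arange(n) + 0.5) * (t / n)     # midpoints covering [0, t]
v = (np.arange(n) + 0.5) * (s / n)     # midpoints covering [0, s]
K = np.exp(-alpha / 2 * np.abs(u[:, None] - v[None, :]))
numeric = (sigma**2 / alpha) * K.sum() * (t / n) * (s / n)

print(cov_closed_form(s, t, alpha, sigma), numeric)    # agree to ~4 decimals
```

Note in particular that the constant term $-1$ in the bracket is needed: without it the formula would not vanish as $s, t \to 0$.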
The Ornstein–Uhlenbeck process enjoys another important property besides stationarity. It is also a Markov process; indeed, it is essentially the only Gaussian process that is simultaneously stationary and Markov with continuous paths. This property explains some of the popularity of the Ornstein–Uhlenbeck process in fitting models to real data.
Exercises
Exercise 12.1 (Simple Processes). Let $X_0, X_1, X_2, \ldots$ be a sequence of iid standard normal variables, and $W(t), t \ge 0$, a standard Brownian motion independent of the $X_i$ sequence, starting at zero. Determine which of the following processes are Gaussian, and which are stationary.

(a) $X(t) \equiv \frac{X_1 + X_2}{\sqrt{2}}$.
(b) $X(t) \equiv \left|\frac{X_1 + X_2}{\sqrt{2}}\right|$.
(c) $X(t) = \frac{t X_1 X_2}{\sqrt{X_1^2 + X_2^2}}$.
(d) $X(t) = \sum_{j=0}^k \left[X_{2j} \cos jt + X_{2j+1} \sin jt\right]$.
(e) $X(t) = t^2 W\!\left(\frac{1}{t^2}\right),\ t > 0$, and $X(0) = 0$.
(f) $X(t) = W(t|X_0|)$.
Exercise 12.2. Let $X(t) = \sin(tU)$, where $U \sim U[0, 2\pi]$.

(a) Suppose the time parameter $t$ belongs to $T = \{1, 2, \ldots\}$. Is $X(t)$ stationary?
(b) Suppose the time parameter $t$ belongs to $T = [0, \infty)$. Is $X(t)$ stationary?
Exercise 12.3. Suppose $W(t), t \ge 0$, is a standard Brownian motion starting at zero, and $Y \sim N(0, 1)$, independent of the $W(t)$ process. Let $X(t) = Yf(t) + W(t)$, where $f$ is a deterministic function. Is $X(t)$ stationary?
Exercise 12.4 (Increments of Brownian Motion). Suppose $W(t), t \ge 0$, is a standard Brownian motion starting at zero, and $Y$ is a positive random variable independent of the $W(t)$ process. Let $X(t) = W(t + Y) - W(t)$. Is $X(t)$ stationary?
Exercise 12.5. Suppose $W(t), t \ge 0$, is a standard Brownian motion starting at zero. Let $X(n) = W(1) + W(2) + \cdots + W(n),\ n \ge 1$. Find the covariance function of the process $X(n), n \ge 1$.
Exercise 12.6 (Moments of the Hitting Time). Suppose $W(t), t \ge 0$, is a standard Brownian motion starting at zero. Fix $a > 0$ and let $T_a$ be the first time $W(t)$ hits $a$. Characterize all $\alpha$ such that $E[T_a^\alpha] < \infty$.
Exercise 12.7 (Hitting Time of the Positive Quadrant). Suppose $W(t), t \ge 0$, is a standard Brownian motion starting at zero. Let $T = \inf\{t > 0 : W(t) > 0\}$. Show that with probability one, $T = 0$.
Exercise 12.8. Suppose $W(t), t \ge 0$, is standard Brownian motion starting at zero. Fix $z > 0$ and let $T_z$ be the first time $W(t)$ hits $z$. Let $0 < a < b < \infty$. Find $E(T_b \mid T_a = t)$.
Exercise 12.9 (Running Maximum of Brownian Motion). Let $W(t), t \ge 0$, be standard Brownian motion on $[0, \infty)$ and $M(t) = \sup_{0 \le s \le t} W(s)$. Evaluate $P(M(1) = M(2))$.
Exercise 12.10. Let $W(t), t \ge 0$, be standard Brownian motion on $[0, \infty)$. Let $T > 0$ be a fixed finite time instant. Find the density of the first zero of $W(t)$ after the time $t = T$. Does it have a finite mean?
Exercise 12.11 (Integrated Brownian Motion). Let $W(t), t \ge 0$, be standard Brownian motion on $[0, \infty)$. Let $X(t) = \int_0^t W(s)\,ds$. Identify explicit positive constants $K, \alpha$ such that for any $t, c > 0$, $P(|X(t)| \ge c) \le \frac{K t^\alpha}{c}$.
Exercise 12.12 (Integrated Brownian Motion). Let $W(t), t \ge 0$, be standard Brownian motion on $[0, \infty)$. Let $X(t) = \int_0^t W(s)\,ds$. Prove that for any fixed $t$, $X(t)$ has a finite mgf everywhere, and use it to derive the fourth moment of $X(t)$.
Exercise 12.13 (Integrated Brownian Motion). Let $W(t), t \ge 0$, be standard Brownian motion on $[0, \infty)$. Let $X(t) = \int_0^t W(s)\,ds$. Find

(a) $E(X(t) \mid W(t) = w)$.
(b) $E(W(t) \mid X(t) = x)$.
(c) The correlation between $X(t)$ and $W(t)$.
(d) $P(X(t) > 0, W(t) > 0)$ for a given $t$.
Exercise 12.14 (Application of Reflection Principle). Let $W(t), t \ge 0$, be standard Brownian motion on $[0, \infty)$ and $M(t) = \sup_{0 \le s \le t} W(s)$. Prove that $P(W(t) \le w, M(t) \ge x) = 1 - \Phi\!\left(\frac{2x - w}{\sqrt{t}}\right)$ for $x \ge w,\ x \ge 0$. Hence, derive the joint density of $W(t)$ and $M(t)$.
Exercise 12.15 (Current Value and Current Maximum). Let $W(t), t \ge 0$, be standard Brownian motion on $[0, \infty)$ and $M(t) = \sup_{0 \le s \le t} W(s)$. Evaluate $P(W(t) = M(t))$ and find its limit as $t \to \infty$.
Exercise 12.16 (Current Value and Current Maximum). Let $W(t), t \ge 0$, be standard Brownian motion on $[0, \infty)$ and $M(t) = \sup_{0 \le s \le t} W(s)$. Find $E(M(t) \mid W(t) = w)$.
Exercise 12.17 (Predicting the Next Value). Let $W(t), t \ge 0$, be standard Brownian motion on $[0, \infty)$ and let $\bar{W}(t) = \frac{1}{t}\int_0^t W(s)\,ds$ be the current running average.

(a) Find $\hat{W}(t) = E(W(t) \mid \bar{W}(t) = w)$.
(b) Find the prediction error $E[|W(t) - \hat{W}(t)|]$.

Exercise 12.18 (Zero-Free Intervals). Let $W(t), t \ge 0$, be standard Brownian motion, and $0 < s < t < u < \infty$. Find the conditional probability that $W(t)$ has no zeroes in $[s, u]$ given that it has no zeroes in $[s, t]$.
Exercise 12.19 (Application of the LIL). Let $W(t), t \ge 0$, be standard Brownian motion, and let $X(t) = \frac{W(t)}{\sqrt{t}},\ t > 0$. Let $K, M$ be any two positive numbers. Show that, with probability one, infinitely often $X(t) > K$ and infinitely often $X(t) < -M$.
Exercise 12.20. Let $W(t), t \ge 0$, be standard Brownian motion, and $0 < s < t < u < \infty$. Find the conditional expectation of $W(t)$ given $W(s) = x, W(u) = y$.

Hint: Consider first the conditional expectation of $W(t)$ given $W(0) = W(1) = 0$.
Exercise 12.21 (Reflected Brownian Motion Is Markov). Let $W(t), t \ge 0$, be standard Brownian motion starting at zero. Show that $|W(t)|$ is a Markov process.
Exercise 12.22 (Adding a Function to Brownian Motion). Let $W(t)$ be standard Brownian motion on $[0, 1]$ and $f$ a general continuous function on $[0, 1]$. Show that with probability one, $X(t) = W(t) + f(t)$ is everywhere nondifferentiable.
Exercise 12.23 (No Intervals of Monotonicity). Let $W(t), t \ge 0$, be standard Brownian motion, and $0 < a < b < \infty$ two fixed positive numbers. Show, by using the independent increments property, that with probability one, $W(t)$ is nonmonotone on $[a, b]$.
Exercise 12.24 (Two-Dimensional Brownian Motion). Show that two-dimensional standard Brownian motion is a Markov process.
Exercise 12.25 (An Interesting Connection to the Cauchy Distribution). Let $W_1(t), W_2(t)$ be two independent standard Brownian motions on $[0, \infty)$ starting at zero. Fix a number $a > 0$ and let $T_a$ be the first time $W_1(t)$ hits $a$. Find the distribution of $W_2(T_a)$.
Exercise 12.26 (Time Spent in a Nonempty Set). Let $W(t), t \ge 0$, be a two-dimensional standard Brownian motion starting at zero, and let $C$ be a nonempty open set of $\mathbb{R}^2$. Show that with probability one, the Lebesgue measure of the set of times $t$ at which $W(t)$ belongs to $C$ is infinite.
Exercise 12.27 (Difference of Two Brownian Motions). Let $W_1(t), W_2(t), t \ge 0$, be two independent Brownian motions, and let $c_1, c_2$ be two constants. Show that $X(t) = c_1 W_1(t) + c_2 W_2(t)$ is another Brownian motion. Identify any drift and diffusion parameters.
Exercise 12.28 (Intersection of Brownian Motions). Let $W_1(t), W_2(t), t \ge 0$, be two independent standard Brownian motions starting at zero. Let $C = \{t > 0 : W_1(t) = W_2(t)\}$.

(a) Is $C$ nonempty with probability one?
(b) If $C$ is nonempty, is it a finite set, or is it an infinite set with probability one?
(c) If $C$ is an infinite set with probability one, is its Lebesgue measure zero or greater than zero?
(d) Does $C$ have accumulation points? Does it have accumulation points with probability one?
Exercise 12.29 (The $L_1$ Norm of Brownian Motion). Let $W(t), t \ge 0$, be standard Brownian motion starting at zero. Show that with probability one, $I = \int_0^\infty |W(t)|\,dt = \infty$.
Exercise 12.30 (Median Local Time). Find the median of the local time $L(x, t)$ of a standard Brownian motion on $[0, \infty)$ starting at zero.

Caution: For $x \ne 0$, the local time has a mixed distribution.
Exercise 12.31 (Monotonicity of the Mean Local Time). Give an analytical proof that the expected value of the local time $L(x, t)$ of a standard Brownian motion starting at zero is strictly decreasing in the spatial coordinate $x$.
Exercise 12.32 (Application of Invariance Principle). Let $X_1, X_2, \ldots$ be iid variables with the common distribution $P(X_i = \pm 1) = \frac{1}{2}$. Let $S_k = \sum_{i=1}^k X_i,\ k \ge 1$, and $\Pi_n = \frac{1}{n}\,\#\{k \le n : S_k > 0\}$. Find the limiting distribution of $\Pi_n$ by applying Donsker's invariance principle.
Exercise 12.33 (Application of Invariance Principle). Let $X_1, X_2, \ldots$ be iid variables with zero mean and a finite variance $\sigma^2$. Let $S_k = \sum_{i=1}^k X_i,\ k \ge 1$, and $M_n = n^{-1/2} \max_{1 \le k \le n} S_k$. Find the limiting distribution of $M_n$ by applying Donsker's invariance principle.
Exercise 12.34 (Application of Invariance Principle). Let $X_1, X_2, \ldots$ be iid variables with zero mean and a finite variance $\sigma^2$. Let $S_k = \sum_{i=1}^k X_i,\ k \ge 1$, and $A_n = n^{-1/2} \max_{1 \le k \le n} |S_k|$. Find the limiting distribution of $A_n$ by applying Donsker's invariance principle.
Exercise 12.35 (Application of Invariance Principle). Let $X_1, X_2, \ldots$ be iid variables with zero mean and a finite variance $\sigma^2$. Let $S_k = \sum_{i=1}^k X_i,\ k \ge 1$, and $T_n = n^{-3/2} \sum_{k=1}^n |S_k|$. Find the limiting distribution of $T_n$ by applying Donsker's invariance principle.
Exercise 12.36 (Distributions of Some Functionals). Let $W(t), t \ge 0$, be standard Brownian motion starting at zero. Find the density of each of the following functionals of the $W(t)$ process:

(a) $\sup_{t>0} W^2(t)$;
(b) $\dfrac{\int_0^1 W(t)\,dt}{W\!\left(\frac{1}{2}\right)}$;

Hint: The terms in the quotient are jointly normal with zero means.

(c) $\sup_{t>0} \dfrac{W(t)}{a + bt},\ a, b > 0$.
Exercise 12.37 (Ornstein–Uhlenbeck Process). Let $X(t)$ be a general Ornstein–Uhlenbeck process and $s < t$ two general times. Find the expected value of $|X(t) - X(s)|$.

Exercise 12.38. Let $X(t)$ be a general Ornstein–Uhlenbeck process and $Y(t) = \int_0^t X(u)\,du$. Find the correlation between $Y(s)$ and $Y(t)$ for $0 < s < t < \infty$, and find its limit when $\sigma, \alpha \to \infty$ and $\frac{\sigma}{\alpha} \to 1$.
Exercise 12.39. Let $W(t), t \ge 0$, be standard Brownian motion starting at zero, and $0 < s < t < \infty$ two general times. Find an expression for $P(W(t) > 0 \mid W(s) > 0)$, and its limit when $s$ is held fixed and $t \to \infty$.
Exercise 12.40 (Application of the Heat Equation). Let $Y \sim N(0, \sigma^2)$ and $f(Y)$ a twice continuously differentiable convex function of $Y$. Show that $E[f(Y)]$ is increasing in $\sigma$, assuming that the expectation exists.
References
Bhattacharya, R.N. and Waymire, E. (2007). A Basic Course in Probability Theory, Springer, New York.
Bhattacharya, R.N. and Waymire, E. (2009). Stochastic Processes with Applications, SIAM, Philadelphia.
Billingsley, P. (1968). Convergence of Probability Measures, John Wiley, New York.
Breiman, L. (1992). Probability, Addison-Wesley, New York.
Brown, L. (1971). Admissible estimators, recurrent diffusions, and insoluble boundary value problems, Ann. Math. Statist., 42, 855–903.
Brown, L., DasGupta, A., Haff, L.R., and Strawderman, W.E. (2006). The heat equation and Stein's identity: Connections, applications, J. Statist. Plann. Inference, 136, 2254–2278.
Csörgő, M. (2002). A glimpse of the impact of Pál Erdős on probability and statistics, Canad. J. Statist., 30, 4, 493–556.
Csörgő, M. and Révész, P. (1981). Strong Approximations in Probability and Statistics, Academic Press, New York.
Csörgő, S. and Hall, P. (1984). The KMT approximations and their applications, Austral. J. Statist., 26, 2, 189–218.
DasGupta, A. (2008). Asymptotic Theory of Statistics and Probability, Springer, New York.
Donsker, M. (1951). An invariance principle for certain probability limit theorems, Mem. Amer. Math. Soc., 6.
Durrett, R. (2001). Essentials of Stochastic Processes, Springer, New York.
Einmahl, U. (1987). Strong invariance principles for partial sums of independent random vectors, Ann. Prob., 15, 4, 1419–1440.
Erdős, P. and Kac, M. (1946). On certain limit theorems of the theory of probability, Bull. Amer. Math. Soc., 52, 292–302.
Freedman, D. (1983). Brownian Motion and Diffusion, Springer, New York.
Hall, P. and Heyde, C. (1980). Martingale Limit Theory and Its Applications, Academic Press, New York.
Karatzas, I. and Shreve, S. (1991). Brownian Motion and Stochastic Calculus, Springer, New York.
Karlin, S. and Taylor, H. (1975). A First Course in Stochastic Processes, Academic Press, New York.
Komlós, J., Major, P., and Tusnády, G. (1975). An approximation of partial sums of independent rvs and the sample df: I, Zeit. Wahr. Verw. Geb., 32, 111–131.
Komlós, J., Major, P., and Tusnády, G. (1976). An approximation of partial sums of independent rvs and the sample df: II, Zeit. Wahr. Verw. Geb., 34, 33–58.
Körner, T. (1986). Fourier Analysis, Cambridge University Press, Cambridge, UK.
Lawler, G. (2006). Introduction to Stochastic Processes, Chapman and Hall, New York.
Major, P. (1978). On the invariance principle for sums of iid random variables, J. Multivariate Anal., 8, 487–517.
Mörters, P. and Peres, Y. (2010). Brownian Motion, Cambridge University Press, Cambridge, UK.
Pyke, R. (1984). Asymptotic results for empirical and partial sum processes: A review, Canad. J. Statist., 12, 241–264.
Resnick, S. (1992). Adventures in Stochastic Processes, Birkhäuser, Boston.
Révész, P. (2005). Random Walk in Random and Nonrandom Environments, World Scientific Press, Singapore.
Revuz, D. and Yor, M. (1994). Continuous Martingales and Brownian Motion, Springer, Berlin.
Strassen, V. (1964). An invariance principle for the law of the iterated logarithm, Zeit. Wahr. Verw. Geb., 3, 211–226.
Strassen, V. (1967). Almost sure behavior of sums of independent random variables and martingales, Proc. Fifth Berkeley Symp., 1, 315–343, University of California Press, Berkeley.
Whitt, W. (1980). Some useful functions for functional limit theorems, Math. Oper. Res., 5, 67–85.