
Principles and Procedures of QTL Mapping

Zhiqiu Hu & Shizhong Xu

4/14/2010


The correct bibliographic citation for this program is: Zhiqiu Hu and Shizhong Xu (2009). PROC QTL - A SAS Procedure for Mapping Quantitative Trait Loci. International Journal of Plant Genomics 2009: 3. doi:10.1155/2009/141234.

PROC QTL Version 2.0

Copyright © 2008, University of California, Riverside, CA, USA

All rights reserved.

University of California, Riverside

900 University Ave., Riverside, CA 92521

Contents

INTRODUCTION TO QTL MAPPING
    Quantitative traits
    Mapping populations
    Molecular markers
    Linkage map of markers
    Interval mapping
    Multiple QTL mapping
    Association mapping
MULTIPOINT METHOD FOR QTL GENOTYPE INFERENCE
    Mapping function
    Markov chain property
    Virtual map
MAXIMUM LIKELIHOOD METHOD
    Likelihood function
    Newton-Raphson algorithm
    Information matrix and estimation errors
    Likelihood ratio test statistics
    Wald test statistics
BAYESIAN METHOD
    Introduction to Bayesian method
    Markov chain Monte Carlo algorithm
    Diagnoses of convergence of Markov chain
    Post MCMC analysis
INTERVAL MAPPING FOR NORMALLY DISTRIBUTED TRAITS
    Simple least squares method
    Weighted least squares
    Fisher scoring algorithm
    Maximum likelihood method
    Hypothesis testing
    Remarks on the four methods of interval mapping
INTERVAL MAPPING FOR DISCRETE TRAITS
    Generalized linear model for ordinal traits
    Expectation substitution method
    Fisher scoring method
    Approximate mixture model
    Mixture model maximum likelihood method
    Variance-covariance matrix for estimated parameters
    Hypothesis testing
    Extension to other traits
MAPPING QUANTITATIVE TRAIT LOCI UNDER SEGREGATION DISTORTION
    The likelihood of markers
    The likelihood of phenotypes
    Joint likelihood of markers and phenotypes
    EM algorithm for the joint analysis
    Hypothesis testing
    Standard errors of the estimated parameters
INTERVAL MAPPING FOR MULTIPLE TRAITS
    Multivariate model
    Least square method
    Maximum likelihood method
    Hypothesis testing
BAYESIAN SHRINKAGE METHOD FOR QTL MAPPING
    Multiple QTL model
    Prior, likelihood and posterior
    Fixed interval
    Random walk
    Moving interval
    Summary of the MCMC sampling process
    Post MCMC analysis
    Bayesian mapping for ordinal traits
    Sampling missing phenotypic values
    Permutation
BAYESIAN MAPPING FOR DISCRETE TRAITS
    Generalized linear model
    Binary data
    Binomial data
    Poisson data
EMPIRICAL BAYESIAN METHOD
    Main QTL effect model
    Epistatic QTL effect model
    Simplex algorithm
BAYESIAN MAPPING FOR MULTIPLE TRAITS
    Multiple continuous traits
    Multiple binary traits
    Mixture of continuous and binary traits
    Missing values
REFERENCE


INTRODUCTION TO QTL MAPPING

QUANTITATIVE TRAITS

A quantitative trait is a trait that varies quantitatively: the phenotypic values of individuals differ by degree rather than by kind. Such traits are usually controlled by the segregation of multiple genes plus environmental factors. Some genes have large effects and some have small effects; some traits are influenced more by genetic effects than by environmental effects, and others more by the environment than by genetics. Genes that control the genetic variation of quantitative traits are called quantitative trait loci (QTL) (TANKSLEY 1993). Because of their polygenic nature and sensitivity to environmental changes, these traits must be studied in large populations, using sophisticated statistical tools to dissect the genetic architecture. Finding the genome locations of QTL and estimating their effects using molecular markers as anchors is called QTL mapping (TANKSLEY 1993). QTL mapping almost exclusively uses the linear model to describe the relationship between the phenotypic value and the putative QTL. The most commonly used method of estimation is maximum likelihood, and the likelihood ratio test (WILKS 1938) is often used as the test statistic.

Some traits have a discrete distribution, e.g., disease resistance traits, where the phenotype is measured by kind, e.g., affected and normal (XU and ATCHLEY 1996). Very few disease resistance traits are controlled by a single gene (TURNPENNY and ELLARD 2005); most are controlled by multiple genes plus environmental effects. These traits, although phenotypically very simple, are genetically complicated. They are polygenic and thus are often called complex traits (LANDER and SCHORK 2006). QTL mapping also covers this kind of complex trait. The way to handle these traits is to hypothesize an underlying, continuously distributed liability beneath each discrete trait (WRIGHT 1934). The connection between the unobserved liability and the observed phenotype is through a threshold: below the threshold, the individual has the normal phenotype; above the threshold, it shows the abnormal (disease) phenotype. Using the threshold model, we can map QTL controlling the unobserved liability (RAO and XU 1998; XU et al. 2005b; XU and ATCHLEY 1996), with the QTL parameters estimated on the scale of the liability. Because we are mapping QTL for the liability and the liability is not observable, we often use the generalized linear model, which is, by definition, a generalization of the linear model. All technical tools developed for the general linear model apply to the generalized linear model.


MAPPING POPULATIONS

Quantitative genetic theory largely deals with allelic effects and allele frequencies. The accuracy of parameter estimation depends on the allele frequencies. In wild populations, we cannot control the allele frequencies, so genetic effects cannot be guaranteed to be estimated with optimal accuracy. In designed experiments, we can control the allele frequencies and thus design an optimal experiment to ensure that the genetic parameters are estimated with high accuracy. The current version of PROC QTL can handle the following mating designs: BC (backcross), FW (four-way cross), RIL (recombinant inbred lines) and DH (double haploid).

F2 mating design

The most popular design of experiments in QTL mapping is the F2 mating design. An F2 design arises from a line crossing experiment involving two inbred lines, called P1 and P2. The F1 hybrid from the cross of P1 and P2 is then selfed to generate a segregating F2 family, and QTL mapping can be performed using the F2 family. In terms of the allelic frequencies of segregating loci, an F2 family is optimal because each parent contributes an equal number of alleles. Let A1A1 and A2A2 be the genotypes of the two parents, respectively, and A1A2 be the genotype of the hybrid. There are three possible genotypes in the progeny of the F2 family, A1A1, A1A2 and A2A2, occurring in the ratio 1:2:1. Let G11, G12 and G22 be the genotypic values of the three genotypes, respectively. The additive and dominance effects of the locus are defined as

$$a = G_{11} - \tfrac{1}{2}(G_{11} + G_{22}) = \tfrac{1}{2}(G_{11} - G_{22}) \tag{1.1}$$

and

$$d = G_{12} - \tfrac{1}{2}(G_{11} + G_{22}) \tag{1.2}$$

respectively. Rather than estimating the genotypic values, we actually estimate and test a and d in QTL mapping. The linear model for a single QTL is

$$y_j = \mu + X_j a + W_j d + e_j \tag{1.3}$$

where y_j is the phenotypic value of individual j, mu is the population mean (or intercept), X_j is an indicator variable (for the additive effect) assigned a value of 1, 0 or -1, respectively, for the three genotypes A1A1, A1A2 and A2A2, W_j is an indicator variable (for the dominance effect) assigned a value of 0, 1 or 0, respectively, for the three genotypes, and e_j is the residual error following a $N(0, \sigma^2)$ distribution. The genotype indicator variables, X_j and W_j, can be defined in many different scales. The scales are usually chosen for statistical convenience rather than for biological meaningfulness, because the scales only affect the estimation of the genetic effects and do not affect the results of statistical tests. PROC QTL actually estimates the genotypic values, not the genetic effects. Users are asked to provide scales of their choice in the ESTIMATE statement of PROC QTL. If there is no segregation distortion, the following scales for X_j and W_j are recommended (YANG et al. 2006): $X_j = \sqrt{2}, 0, -\sqrt{2}$ and $W_j = 1, -1, 1$ for the three genotypes A1A1, A1A2 and A2A2. This scale choice leads to $\mathrm{var}(X_j) = \mathrm{var}(W_j) = 1$ and $\mathrm{cov}(X_j, W_j) = 0$, and thus

$$\mathrm{var}(y_j) = a^2 + d^2 + \sigma^2 \tag{1.4}$$

which is mathematically more attractive than any other scale.
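The variance and covariance claims for the recommended scale are easy to check numerically. The following Python sketch (purely illustrative; PROC QTL itself is a SAS procedure) computes var(X_j), var(W_j) and cov(X_j, W_j) under the 1:2:1 F2 genotype frequencies:

```python
from math import sqrt

# F2 genotype frequencies for A1A1, A1A2, A2A2 under Mendelian segregation.
freq = [0.25, 0.50, 0.25]
X = [sqrt(2), 0.0, -sqrt(2)]  # additive scores (Yang et al. 2006 scale)
W = [1.0, -1.0, 1.0]          # dominance scores

def mean(v):
    return sum(f * x for f, x in zip(freq, v))

def var(v):
    m = mean(v)
    return sum(f * (x - m) ** 2 for f, x in zip(freq, v))

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum(f * (x - mu) * (y - mv) for f, x, y in zip(freq, u, v))

print(round(var(X), 10), round(var(W), 10), round(cov(X, W), 10))  # 1.0 1.0 0.0
```

With unit variances and zero covariance, the genetic contribution to var(y_j) separates cleanly into a^2 + d^2, which is what makes this scale attractive.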

BC mating design

Starting from the two parents and the F1 hybrid, a BC family is generated by crossing the F1 back to one of the two parents. If P1 is the backcrossed parent, the BC family is called BC1; if P2 is the backcrossed parent, it is called BC2. Take BC1 for example: there are two possible genotypes, A1A1 and A1A2. It is impossible to estimate the dominance effect because there are not enough degrees of freedom to do so. The QTL effect is defined as

$$a' = G_{11} - G_{12} = a - d \tag{1.5}$$

Apparently, the QTL effect defined this way is the difference between the additive effect and the dominance effect. It is equivalent to the additive effect only if the dominance effect is absent. Using a BC family for QTL mapping is not as powerful as using the F2 family because (1) for the same sample size, a BC family carries only half the number of meioses of the F2 family, and (2) the additive and dominance effects are confounded. Even when the dominance effect is absent, we need double the sample size in a BC design to achieve the same statistical power as the F2 design. Under the assumption of no dominance, the model is

$$y_j = \mu + X_j a + e_j \tag{1.6}$$

where $X_j = 1, 0$ for the two genotypes A1A1 and A1A2. The fact that the BC design is less powerful than the F2 design can be shown by comparing the variances of X_j in the two families. To make the comparison fair, the scale of X_j in the F2 design must be $X_j = 1, 0, -1$ for the three genotypes. For the BC design $\mathrm{var}(X_j) = 1/4$, but $\mathrm{var}(X_j) = 1/2$ for the F2 design. The design with the larger $\mathrm{var}(X_j)$ has more power, which explains why the F2 design is more powerful than the BC design.
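The power comparison can be made concrete by computing var(X_j) for both designs under the comparable scale. A short illustrative Python calculation (not PROC QTL code):

```python
def var(freq, X):
    """Variance of the genotype score X under genotype frequencies freq."""
    m = sum(f * x for f, x in zip(freq, X))
    return sum(f * (x - m) ** 2 for f, x in zip(freq, X))

# BC1 genotypes A1A1, A1A2 occur 1:1; F2 genotypes A1A1, A1A2, A2A2 occur 1:2:1.
var_bc = var([0.5, 0.5], [1, 0])             # 0.25
var_f2 = var([0.25, 0.5, 0.25], [1, 0, -1])  # 0.5
print(var_bc, var_f2)
```

The F2 score variance is twice the BC score variance, matching the statement that a BC design needs roughly double the sample size when dominance is absent.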

RIL mating design

The recombinant inbred line (RIL) design also involves two inbred parents, the F1 hybrid and the F2 family. Each F2 progeny undergoes many generations of continuous selfing until all loci are fixed; this may take about 20 generations to reach 99% homozygosity. Eventually, each line descended from the cross is an inbred line, but it carries genes from different parents across loci. In other words, at each locus an RIL carries the same allele from one parent, but between loci the contributing parents may alternate. Therefore, an RIL carries a mosaic genome of the two parents. The two homozygotes are A1A1 and A2A2, with the QTL effect at this locus defined as

$$a' = G_{11} - G_{22} = a - (-a) = 2a \tag{1.7}$$

Therefore, the RIL mating design is more powerful than a BC or F2 design because the QTL effect so defined is doubled. If we define $X_j = 1, -1$ for the two homozygotes A1A1 and A2A2, the genetic variance at this locus is

$$\mathrm{var}(X_j)(a')^2 = \mathrm{var}(X_j)(2a)^2 = 4a^2 \tag{1.8}$$

which is much larger than the corresponding genetic variance for the BC design ($a^2/4$) and for the F2 design ($a^2/2$) under the same scale of the variable X_j. In addition, for a sample size n, an F2 family carries 2n meioses, but the same number of RIL individuals accumulates many more meioses. The genetic material has been shuffled many times across the genome (loci), leading to a high frequency of recombination between loci. Therefore, the RIL design offers both higher power and higher resolution (fine mapping) than the F2 design. Let r be the recombination frequency between two loci per meiosis (in a BC, for example). After many generations of accumulated meioses, the recombination frequency in the RIL becomes

$$c_1 = \frac{2r}{1 + 2r} \tag{1.9}$$

This multiple-meiosis corrected recombination fraction is larger than the original recombination fraction. Therefore, we expect to see many more crossovers between two loci in the RIL design than in a BC family. The genome essentially gets "longer" and thus allows fine mapping. Recombinant inbred lines generated through selfing are called RIL1. In animals, where selfing does not happen, recombinant inbred lines can be generated through continuous brother-sister mating. The sib-mating approach takes more generations to reach the same homozygosity as the selfing approach. This type of RIL is called RIL2. The corresponding correction for the recombination frequency is

$$c_2 = \frac{4r}{1 + 6r} \tag{1.10}$$

Statistical methods of QTL mapping for RIL and BC are identical once r in BC is replaced by c in RIL.
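Equations (1.9) and (1.10) are easy to tabulate. The Python sketch below (illustrative only, not part of PROC QTL) shows how much the effective recombination fraction is inflated in the two types of RIL:

```python
def c_ril1(r):
    """Corrected recombination fraction for selfing-derived RILs (Eq. 1.9)."""
    return 2 * r / (1 + 2 * r)

def c_ril2(r):
    """Corrected recombination fraction for sib-mating RILs (Eq. 1.10)."""
    return 4 * r / (1 + 6 * r)

for r in (0.01, 0.05, 0.10, 0.20):
    # Both corrections exceed the per-meiosis recombination fraction r.
    print(f"r={r:.2f}  RIL1 c={c_ril1(r):.4f}  RIL2 c={c_ril2(r):.4f}")
```

For any r below 0.5 the corrected fractions exceed r, and the sib-mating correction exceeds the selfing correction, reflecting the larger number of accumulated meioses.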

DH mating design

A double haploid (DH) individual is created by duplicating a gamete via chemical treatment. A DH individual is a diploid homozygote at all loci. Starting from an F2 progeny derived from the cross of two inbred lines, each of the two gametes of an F2 individual is duplicated and the two copies are fused to give two DH individuals. We need n F2 progeny to produce 2n independent DH individuals. Like the RIL design, there are two possible genotypes in a DH population, A1A1 and A2A2, with the QTL effect defined as

$$a' = G_{11} - G_{22} = a - (-a) = 2a \tag{1.11}$$

Therefore, the DH design should provide the same power as an RIL design. However, the resolution of DH mapping is equivalent to that of an F2 design because the number of meioses is the same. The mating designs discussed so far all involve only two parents; therefore, these designs are called bi-parental mating designs.

FW mating design

The four-way cross mating design requires four inbred lines and two rounds of crossing (XU 1996). In the first round of crossing, $P_1 \times P_2 \rightarrow F_1^{(12)}$ and $P_3 \times P_4 \rightarrow F_1^{(34)}$, two independent F1 hybrids are generated. In the second round of crossing, $F_1^{(12)} \times F_1^{(34)} \rightarrow FW$, the two different F1 hybrids are crossed to generate a four-way cross family. The mating design is more clearly described using the genotypic labels. In the first round of crossing, we have $A_1A_1 \times A_2A_2 \rightarrow A_1A_2$ and $A_3A_3 \times A_4A_4 \rightarrow A_3A_4$. In the second round of crossing, we get

$$A_1A_2 \times A_3A_4 \rightarrow A_1A_3,\ A_1A_4,\ A_2A_3,\ A_2A_4 \tag{1.12}$$

There are four possible genotypes in the four-way cross family. The labels of the alleles need to be changed in order to describe the genetic model for the FW cross design. In the first round of crossing, we now write $A_1^pA_1^p \times A_2^pA_2^p \rightarrow A_1^pA_2^p$ and $A_1^mA_1^m \times A_2^mA_2^m \rightarrow A_1^mA_2^m$. Note that the four alleles involved in the FW progeny, A1, A2, A3 and A4, have been relabeled as A1^p, A2^p, A1^m and A2^m, respectively, where the superscripts p and m indicate the paternal and maternal origins of the progeny and the subscripts 1 and 2 indicate the paternal and maternal origins of the parents. With this new notation, we get the FW cross family

$$A_1^pA_2^p \times A_1^mA_2^m \rightarrow A_1^pA_1^m,\ A_1^pA_2^m,\ A_2^pA_1^m,\ A_2^pA_2^m \tag{1.13}$$

We now assign a value to each allele, say $a_1^p$, $a_2^p$, $a_1^m$ and $a_2^m$, for the four alleles. The corresponding genotypic values are defined as

$$\begin{aligned}
G_{11} &= \mu + a_1^p + a_1^m + d_{11} \\
G_{12} &= \mu + a_1^p + a_2^m + d_{12} \\
G_{21} &= \mu + a_2^p + a_1^m + d_{21} \\
G_{22} &= \mu + a_2^p + a_2^m + d_{22}
\end{aligned} \tag{1.14}$$

where $d_{ij}$ is the interaction effect between the two alleles involved in the genotype. The model is over-parameterized because we cannot estimate nine parameters from four genotypes. Therefore, some restrictions are required to reduce the number of parameters. Many different schemes of restriction can be used, but one particular scheme leads to the following reduced parameters (XU 1998b),

$$\begin{aligned}
\alpha^p &= \tfrac{1}{2}(a_1^p - a_2^p) \\
\alpha^m &= \tfrac{1}{2}(a_1^m - a_2^m) \\
\delta &= \tfrac{1}{4}(d_{11} - d_{12} - d_{21} + d_{22})
\end{aligned} \tag{1.15}$$

Including $\mu$, we have four estimable parameters that are expressed as linear contrasts (combinations) of the genotypic values,

$$\begin{aligned}
\mu &= \tfrac{1}{4}(G_{11} + G_{12} + G_{21} + G_{22}) \\
\alpha^p &= \tfrac{1}{4}(G_{11} + G_{12} - G_{21} - G_{22}) \\
\alpha^m &= \tfrac{1}{4}(G_{11} - G_{12} + G_{21} - G_{22}) \\
\delta &= \tfrac{1}{4}(G_{11} - G_{12} - G_{21} + G_{22})
\end{aligned} \tag{1.16}$$

The reverse relationship is

$$\begin{aligned}
G_{11} &= \mu + \alpha^p + \alpha^m + \delta \\
G_{12} &= \mu + \alpha^p - \alpha^m - \delta \\
G_{21} &= \mu - \alpha^p + \alpha^m - \delta \\
G_{22} &= \mu - \alpha^p - \alpha^m + \delta
\end{aligned} \tag{1.17}$$

Let us define

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix} \tag{1.18}$$

We can see that

$$H^{-1} = \tfrac{1}{4}H = \begin{bmatrix} \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} \\ \tfrac{1}{4} & \tfrac{1}{4} & -\tfrac{1}{4} & -\tfrac{1}{4} \\ \tfrac{1}{4} & -\tfrac{1}{4} & \tfrac{1}{4} & -\tfrac{1}{4} \\ \tfrac{1}{4} & -\tfrac{1}{4} & -\tfrac{1}{4} & \tfrac{1}{4} \end{bmatrix} \tag{1.19}$$

We now give the model expressed as functions of only the estimable parameters,

$$y_j = X_j \beta + e_j \tag{1.20}$$

where

$$X_j = \begin{cases} H_1 & \text{for } A_1^pA_1^m \\ H_2 & \text{for } A_1^pA_2^m \\ H_3 & \text{for } A_2^pA_1^m \\ H_4 & \text{for } A_2^pA_2^m \end{cases}
\quad \text{and} \quad
\beta = \begin{bmatrix} \mu \\ \alpha^p \\ \alpha^m \\ \delta \end{bmatrix} \tag{1.21}$$

and $H_k$ is the kth row of matrix H. Under this scale of definition for X_j, the total phenotypic variance can be partitioned into four components,

$$\mathrm{var}(y_j) = (\alpha^p)^2 + (\alpha^m)^2 + \delta^2 + \sigma^2 \tag{1.22}$$

where $\sigma^2$ is the residual error variance.
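The algebra of H can be verified numerically. The following Python/NumPy sketch (illustrative only; the effect sizes are made-up numbers) checks that $H^{-1} = H/4$ and that the genetic variance among the four equally frequent FW genotypes equals $(\alpha^p)^2 + (\alpha^m)^2 + \delta^2$, as stated in Eq. (1.22):

```python
import numpy as np

# Contrast matrix H of Eq. (1.18); rows correspond to the four FW genotypes.
H = np.array([[1,  1,  1,  1],
              [1,  1, -1, -1],
              [1, -1,  1, -1],
              [1, -1, -1,  1]], dtype=float)

# H is orthogonal up to scale, so H^{-1} = H / 4 (Eq. 1.19).
assert np.allclose(np.linalg.inv(H), H / 4)

# Arbitrary illustrative effects (not from any real data set).
mu, alpha_p, alpha_m, delta = 10.0, 1.5, -0.8, 0.4
beta = np.array([mu, alpha_p, alpha_m, delta])
G = H @ beta  # genotypic values of the four FW genotypes (Eq. 1.17)

# The four genotypes are equally frequent, so the genetic variance is
genetic_var = np.mean((G - np.mean(G)) ** 2)
print(genetic_var, alpha_p**2 + alpha_m**2 + delta**2)
```

The two printed numbers agree, confirming the variance partition: the cross terms cancel because the columns of H are mutually orthogonal.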


Full-sib family

A full-sib family is a family of individuals generated by repeated matings between two parents. Individuals within the family are called full siblings; they all share the same father and the same mother. The genetic model described for the FW cross design applies directly to QTL mapping in a full-sib family. The father (paternal parent) of the full-sib family is equivalent to the F1 hybrid used as the paternal parent of a FW cross, and the mother (maternal parent) is equivalent to the F1 hybrid used as the maternal parent of a FW cross. We can only estimate the allelic difference between the two alleles of the father ($\alpha^p$), the allelic difference between the two alleles of the mother ($\alpha^m$) and the interaction effect ($\delta$). The full-sib family design differs from a FW cross in that we need to infer the linkage phases of the markers prior to QTL mapping, because we do not necessarily have the genotypic information of the grandparents of the full-sib family. Once the linkage phases are inferred, the statistical model and method of the FW cross apply to the full-sib family.

Half-sib family

In a half-sib family, each member has a different mother but all share the same father. This type of family is common in large animals such as beef cattle. Half-sib families can also be found in forest trees, where the common parent of each half-sib family is the female parent. PROC QTL can handle half-sib families using the same mating design as BC. The common parent of the half-sib family is treated as the "F1" hybrid in the BC, and the other parents (all independent) are treated as the backcrossed "parent." This comparison may be hard to grasp, but it holds from the statistical model point of view. In a BC family of mating type $A_1A_2 \times A_1A_1$, we estimate the difference between the two genotypes of the progeny, A1A1 and A1A2. In fact, we are estimating the difference between the two alleles carried by the F1 hybrid (A1A2); the common parent (A1A1) plays no role other than providing a background for evaluating the two alleles of the F1 hybrid. In half-sib QTL mapping, we are likewise estimating the difference between the two alleles of the common parent, with the background alleles provided by all the other independent parents. The difference between the two designs lies in the background alleles: a BC design has a uniform (homogeneous) background allele, while a half-sib family has a heterogeneous background allelic array. Since the background alleles play no role in the statistical model, you only need to manipulate the data a little to "fool" the program. First, infer the linkage phases of all markers for the common parent and label the paternal allele of the common parent by A1 and the maternal allele by A2. Secondly, recode the genotypes of the progeny as A1_ and A2_, where the underscore is a wild card representing the background alleles. The genotypes of the progeny are thereby relabeled so that there are only two possible "genotypes" in the progeny, and the half-sib family can be mapped using the BC mating design.
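The recoding step can be sketched as follows. This is a hypothetical Python illustration (the marker name, allele codes and data layout are invented for the example; they are not a PROC QTL input format):

```python
# Phased alleles of the common sire: paternal allele labeled A1, maternal A2.
# The marker name "m1" and allele codes are made-up illustration values.
sire_phase = {"m1": ("137", "141")}

def recode(marker, progeny_alleles):
    """Return 'A1_' or 'A2_' according to which sire allele the progeny
    received; None if the transmitted sire allele cannot be determined."""
    a1, a2 = sire_phase[marker]
    got_a1 = a1 in progeny_alleles
    got_a2 = a2 in progeny_alleles
    if got_a1 and not got_a2:
        return "A1_"
    if got_a2 and not got_a1:
        return "A2_"
    return None  # progeny carries both sire allele labels or neither: ambiguous

print(recode("m1", ("137", "120")))  # A1_
print(recode("m1", ("141", "152")))  # A2_
print(recode("m1", ("137", "141")))  # None
```

After this recoding, every progeny genotype takes one of two states, exactly as in a BC family, so the BC machinery applies unchanged.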

MOLECULAR MARKERS

QTL mapping requires two sets of data: the phenotype data of a quantitative trait and the marker genotype data. Depending on the method of QTL mapping, a marker map may also be needed; we assume that the markers are already mapped in the genome. What we do in QTL mapping is place detected QTL in the genome at positions defined relative to the positions of markers. What are molecular markers, and what is the difference between a marker and a gene? A molecular marker is a piece of "labeled" DNA in the genome. The alleles of a marker are inherited following Mendel's laws of inheritance. A marker acts like a gene because of its Mendelian behavior, but it has no known function for any trait; if its function were known, it would be called a gene. However, the genotype of a marker in an individual can be "observed" or measured using molecular techniques, in contrast to a gene, whose genotype is rarely observed. Because the genotypes of markers can be observed, their relative locations in the genome can be inferred. Genes have functions on a trait of interest, but their genotypes cannot be observed. Through linkage analysis, we can find the association of markers with the phenotype of interest, from which the relative positions of the genes (QTL) can be inferred. This process is called QTL mapping.

There are two kinds of molecular markers: dominant markers and co-dominant markers. A dominant marker has only two observed states, presence and absence. One allele is said to be dominant over other alleles if one copy of it is sufficient to suppress the expression of all other alleles. For example, if the A1 allele is dominant over the A2 allele, you cannot tell the difference between A1A1 and A1A2 because the A2 allele is not expressed. In terms of the A1 allele, an individual has only two observed states: presence (A1_) and absence (__) of the A1 allele. A co-dominant marker is a marker in which each allele is expressed (observed), so that you can directly see the alleles of a genotype. Dominant markers provide less information than co-dominant markers, but very often dominant markers occur at much higher density along the genome than co-dominant markers.


LINKAGE MAP OF MARKERS

Special software packages, e.g., MapMaker (LANDER et al. 1987), are required to put markers into linkage groups and to order the markers within each linkage group. The linear arrangement of the markers in the genome is called the linkage map of the markers. The marker map is usually stored in a separate file with three columns and m rows, where m is the total number of markers in the map. The first column stores the names of the markers, the second column gives the positions (cM) of the markers within the chromosomes, and the third column shows the chromosome identifications of the markers. The position of each marker is measured in cM relative to the position of the first marker of the chromosome. The distance in cM between two consecutive markers is converted from the recombination fraction between the two markers using either the Haldane (1919) mapping function or the Kosambi (1944) mapping function. The marker map is required for interval mapping, but not for individual marker analysis.
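Both mapping functions convert a recombination fraction r into a map distance. A Python sketch of the two conversions (illustrative only; PROC QTL performs these conversions internally):

```python
import math

def haldane_cm(r):
    """Haldane (1919) mapping function: map distance in cM (no interference)."""
    return -50.0 * math.log(1.0 - 2.0 * r)

def kosambi_cm(r):
    """Kosambi (1944) mapping function: map distance in cM (allows interference)."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

for r in (0.01, 0.10, 0.20):
    print(f"r={r:.2f}  Haldane={haldane_cm(r):6.2f} cM  Kosambi={kosambi_cm(r):6.2f} cM")
```

For small r both distances are close to 100r cM; as r grows, the Kosambi distance is shorter than the Haldane distance for the same recombination fraction because it accounts for crossover interference.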

INTERVAL MAPPING

Interval mapping was originally developed by Lander and Botstein (1989). Prior to interval mapping, QTL mapping already existed but was called individual marker analysis (SOLLER et al. 1976), which is simply a linear regression analysis or t-test applied repeatedly to every marker of the genome. The problem with individual marker analysis occurs when a QTL is located between two markers. In that situation, part of the effect of the QTL is absorbed by the marker on the left and part by the marker on the right, so the true location of the QTL and its effect are never estimated precisely. Lander and Botstein (1989) realized that a putative position between two markers with known positions can be evaluated for its association with the phenotype of a quantitative trait. When the position is fixed, the distances of this putative position from the two flanking markers are automatically given. The genotype of an individual at that putative locus is, of course, missing, but its probability distribution can be inferred from the genotypes of the flanking markers. Lander and Botstein (1989) then used a mixture model to fit the data so that the QTL effect at that position can be estimated and tested for statistical significance. We can evaluate every possible position within the interval; the position with the highest test statistic is a candidate QTL in that interval. We then search for the QTL in another interval using different flanking markers. All intervals within a chromosome must be searched, and eventually all chromosomes in the genome. The putative position of the entire genome that has the highest test statistic is a candidate QTL. If the test statistic passes a critical value, the candidate QTL can be safely claimed as a QTL.

Interval mapping, by definition, refers to the method of using two markers


only each time to infer the genotype of an internal locus. If one or both of the two flanking markers are missing (have missing genotypes), the nearest non-missing markers must be used in place of the missing ones. This makes interval mapping complicated, because the interval actually used varies from one individual to another, depending on the missing pattern of the flanking markers. With the advent of the multipoint method (JIANG and ZENG 1997), all markers can be used simultaneously to infer the genotype of any putative locus of the genome. This multipoint approach makes the name "interval mapping" no longer appropriate: there is no such thing as an interval, because every putative position is evaluated using markers of the entire genome. It is better to call the multipoint implementation of interval mapping genome scanning. In any case, the so-called interval mapping performed by PROC QTL is multipoint genome scanning.
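The scanning idea can be sketched as follows. This is a Haley-Knott-style regression approximation, shown purely for illustration (PROC QTL itself uses the mixture-model maximum likelihood method described later), and the genotype-probability input layout is a hypothetical one of our own.

```python
# A Haley-Knott-style regression sketch of genome scanning, for illustration
# only (PROC QTL itself uses mixture-model maximum likelihood).
# `genotype_probs[pos][j]` is a hypothetical layout holding the genotype
# probabilities of individual j at putative position `pos`.

def scan(phenotypes, genotype_probs, codes=(-1.0, 0.0, 1.0)):
    """Regress the phenotype on the expected genotype code at each putative
    position; return the residual sum of squares (smaller = better fit)."""
    n = len(phenotypes)
    ybar = sum(phenotypes) / n
    results = []
    for probs in genotype_probs:
        # expected genotype code of each individual at this position
        x = [sum(p * c for p, c in zip(pj, codes)) for pj in probs]
        xbar = sum(x) / n
        sxx = sum((v - xbar) ** 2 for v in x)
        sxy = sum((v - xbar) * (w - ybar) for v, w in zip(x, phenotypes))
        b = sxy / sxx if sxx > 0 else 0.0
        rss = sum((w - ybar - b * (v - xbar)) ** 2
                  for v, w in zip(x, phenotypes))
        results.append(rss)
    return results

phen = [1.0, 2.0, 3.0, 4.0]
probs = [
    [[1, 0, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1]],  # position tracking the trait
    [[0, 0, 1], [1, 0, 0], [0, 0, 1], [1, 0, 0]],  # unlinked position
]
rss = scan(phen, probs)
print(rss[0] < rss[1])  # True: the first position fits the phenotype better
```

In a real analysis the fit at each position would be converted into a test statistic and compared against a genome-wide threshold.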

MULTIPLE QTL MAPPING

Interval mapping is designed for mapping a single QTL per chromosome because the statistical model contains only a single QTL. If more than one QTL is present on a chromosome, interval mapping can still detect the multiple QTL, provided they are not too close to each other. If two QTL are closely linked, interval mapping may show only a single large peak in the test statistic profile. The estimated QTL effects under the interval mapping strategy will be biased if multiple QTL exist in the same linkage group. The best model for mapping multiple QTL is the multiple regression model. For marker analysis, one can evaluate the entire genome by fitting all markers in a single model. Because the number of markers may be huge, a model selection algorithm may be applied to select important markers. Forward selection, backward elimination or stepwise regression can be used to perform the variable selection, and existing SAS procedures are available for that purpose. Interval mapping has been extended to the multiple QTL model, the so-called multiple interval mapping (KAO et al. 1999). Once the important markers or intervals are found, the QTL effects are estimated via the ordinary least squares method or the classical maximum likelihood method by fitting all the detected markers or intervals in a single linear model. The Bayesian method is the state-of-the-art method for handling multiple QTL. Two Bayesian variable selection approaches are currently used for multiple QTL mapping. One is the reversible jump Markov chain Monte Carlo method (SILLANPÄÄ and ARJAS 1998). The other is the Bayesian shrinkage method of Wang et al. (2005b). In the Bayesian shrinkage method, all markers are fitted in a single model. Each regression coefficient (marker effect) is assigned a normal prior distribution with mean zero and its own variance component. This coefficient-specific prior variance is in turn assigned a scaled inverse chi-square hyperprior distribution, with hyperparameters either provided by the user or set to values as vague as possible. The current version of PROC QTL handles multiple QTL using the Bayesian shrinkage approach; the


reversible jump MCMC algorithm will be added later.

ASSOCIATION MAPPING

In contrast to linkage mapping, where a designed line crossing experiment is required, association mapping uses a random sample of a population to perform QTL mapping. Association mapping assumes that cumulative historical recombination events have destroyed the linkage disequilibrium between a QTL and any nearby marker that does not overlap with the QTL. It is a simple individual marker analysis applied to a random sample of a target population. If a marker is strongly associated with the trait phenotype, this marker is the QTL, because otherwise the association would have been destroyed by the cumulative historical recombination events. PROC QTL can perform association mapping by individual marker analysis or multiple marker analysis. The difference between the association mapping and the linkage mapping conducted by PROC QTL is that the data for association mapping must be pretreated by removing any influence of population structure (PRITCHARD et al. 2000) and hidden genetic relatedness among individuals (HANSEN et al. 1997). The processed data are then treated as the input data of PROC QTL for further analysis.


MULTIPOINT METHOD FOR QTL GENOTYPE INFERENCE

The key difference between individual marker analysis and interval mapping is the ability of the latter to estimate the QTL effect at a putative position that does not overlap with a marker. In interval mapping, a putative position bracketed by two markers has a missing genotype. The probability distribution of the genotype, however, can be inferred from marker information. This probability distribution provides the foundation for the mixture-model maximum likelihood method. The current method for this probability inference is the multipoint method, in which all markers are used simultaneously rather than two markers at a time. PROC QTL adopts the most current multipoint method for such probability inference.

MAPPING FUNCTION

The marker distances in the linkage map are almost always measured by the expected numbers of crossovers (additive distances) in centiMorgan (cM). The multipoint method, however, takes the distances measured in recombination fractions as the input data. Therefore, the additive distances between consecutive markers must be converted into recombination fractions prior to the multipoint analysis. There are two mapping functions commonly used in QTL mapping, the Haldane (1919) mapping function and the Kosambi (1944) mapping function. PROC QTL uses the Haldane

mapping function only. Let $d_{ij}$ and $r_{ij}$ be the additive distance and the recombination fraction between loci $i$ and $j$, respectively. The Haldane mapping function is

$$r_{ij} = \frac{1}{2}\left[1 - \exp(-2 d_{ij})\right] \qquad (2.1)$$

where the additive distance is measured in Morgans, not centiMorgans (1 M = 100 cM). If you provide additive distances measured in centiMorgans, you should convert them into Morgans before applying the Haldane mapping function. The Kosambi (1944) mapping function takes into consideration the crossover interference between consecutive intervals and is thus more realistic than the Haldane (1919) mapping function. However, the Haldane mapping function is mathematically more attractive than the Kosambi mapping function. For your convenience, we present the Kosambi mapping function below, although it is not used by PROC QTL,

$$r_{ij} = \frac{1}{2}\,\frac{1 - \exp(-4 d_{ij})}{1 + \exp(-4 d_{ij})} \qquad (2.2)$$
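As a minimal sketch, the two conversions (including the cM-to-Morgan rescaling mentioned above) can be written as follows; the function names are ours:

```python
# Sketch of the Haldane (2.1) and Kosambi (2.2) map functions; input is an
# additive distance in cM, output a recombination fraction.

import math

def haldane(d_cm):
    """r = (1/2)[1 - exp(-2 d)], with d in Morgans."""
    d = d_cm / 100.0  # cM -> Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def kosambi(d_cm):
    """r = (1/2)[1 - exp(-4 d)]/[1 + exp(-4 d)] = (1/2) tanh(2 d)."""
    d = d_cm / 100.0  # cM -> Morgans
    return 0.5 * math.tanh(2.0 * d)

print(round(haldane(10.0), 4))  # 0.0906
print(round(kosambi(10.0), 4))  # 0.0987
```

For the same additive distance, Kosambi gives a slightly larger recombination fraction than Haldane because it accounts for crossover interference.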


MARKOV CHAIN PROPERTY

The multipoint method of genotype probability inference was developed based on Markov chain properties (JIANG and ZENG 1997). A chromosome is treated as a Markov chain. You can treat the left end of the chromosome as the starting point of the chain and the right end as the ending point, or vice versa. A marker or any other locus on the chromosome is a time point within the chain, and for each locus the genotypes are the states at that point. The transition probabilities between two consecutive loci are functions of the recombination fraction between the two loci. Each Mendelian locus occupies a specific point on a chromosome. A linkage analysis requires two or more Mendelian loci, and thus involves two or more points. When a linkage analysis involves two Mendelian loci, the analysis is called two-point analysis; when more than two Mendelian loci are analyzed simultaneously, the method is called multipoint analysis. Multipoint analysis can extract more information from the data when markers are not fully informative, e.g., with missing genotypes, dominant alleles and so on.

When there is no interference between the crossovers of two consecutive chromosome segments, the joint distribution of the genotypes of the marker loci is Markovian. We can imagine that the entire chromosome behaves like a Markov chain, in which the genotype of one locus depends only on the genotype of the previous locus. A Markov chain has a direction, but a chromosome has no meaningful direction; its direction is defined in an arbitrary fashion. Therefore, we can use either a forward Markov chain or a backward Markov chain to define a chromosome, and the result will be identical regardless of which direction is taken.

A Markov chain is used to derive the joint distribution of all marker genotypes. The joint distribution is eventually used to construct a likelihood function for estimating multiple recombination fractions. Given the recombination fractions, one can derive the conditional distribution of the genotype of a locus bracketed by two marker loci given the genotypes of the markers. The conditional distribution is fundamentally important in genetic mapping for complex traits.

Joint distribution of multiple locus genotype

When three loci are considered jointly, the method is called three-point analysis. Theory developed for three-point analysis applies to an arbitrary number of loci. Let A, B and C be three ordered loci on the same chromosome with pairwise recombination fractions denoted by $r_{AB}$, $r_{BC}$ and $r_{AC}$. We can imagine that these loci form a Markov chain as either $A \rightarrow B \rightarrow C$ or $A \leftarrow B \leftarrow C$. The direction is arbitrary. Each locus represents a discrete variable with two or more distinct values (states). For an individual from a four-way (FW) cross, each locus takes one of four possible genotypes, and


thus four states. Let $A_1A_3$, $A_1A_4$, $A_2A_3$ and $A_2A_4$ be the four possible genotypes for locus A; $B_1B_3$, $B_1B_4$, $B_2B_3$ and $B_2B_4$ be the four possible genotypes for locus B; and $C_1C_3$, $C_1C_4$, $C_2C_3$ and $C_2C_4$ be the four possible genotypes for locus C. For convenience, each state is assigned a numerical value. For example, $A=1$ or $A=2$ indicates that an individual takes genotype $A_1A_3$ or $A_1A_4$. Let us take $A \rightarrow B \rightarrow C$ as the Markov chain; the joint distribution of the three-locus genotype is

$$\Pr(A,B,C) = \Pr(A)\Pr(B \mid A)\Pr(C \mid B), \qquad (2.3)$$

where $\Pr(A=1)=\Pr(A=2)=\Pr(A=3)=\Pr(A=4)=1/4$, assuming that there is no segregation distortion. The conditional probabilities, $\Pr(B \mid A)$ and $\Pr(C \mid B)$, are called the transition probabilities between loci A and B and between loci B and C, respectively. The transition probabilities depend on the genotypes of the two loci and the recombination fraction between the two loci. The transition probabilities from locus A to locus B can be found from the following $4 \times 4$ transition matrix,

$$T_{AB} = \begin{bmatrix}
(1-r_{AB})^2 & r_{AB}(1-r_{AB}) & r_{AB}(1-r_{AB}) & r_{AB}^2 \\
r_{AB}(1-r_{AB}) & (1-r_{AB})^2 & r_{AB}^2 & r_{AB}(1-r_{AB}) \\
r_{AB}(1-r_{AB}) & r_{AB}^2 & (1-r_{AB})^2 & r_{AB}(1-r_{AB}) \\
r_{AB}^2 & r_{AB}(1-r_{AB}) & r_{AB}(1-r_{AB}) & (1-r_{AB})^2
\end{bmatrix} \qquad (2.4)$$

The transition matrix from locus B to locus C is denoted by $T_{BC}$, which is equivalent to matrix $T_{AB}$ except that the subscript AB is replaced by subscript BC. Note that this transition matrix is obtained as the Kronecker square (denoted by a superscript [2]) of a $2 \times 2$ transition matrix,

$$H_{AB} = \begin{bmatrix} 1-r_{AB} & r_{AB} \\ r_{AB} & 1-r_{AB} \end{bmatrix}, \qquad (2.5)$$

that is,

$$T_{AB} = H_{AB}^{[2]} = H_{AB} \otimes H_{AB}.$$

The $4 \times 4$ transition matrix (2.4) may be called the zygotic transition matrix and the $2 \times 2$ transition matrix (2.5) the gametic transition matrix. That the zygotic transition matrix is the Kronecker square of the gametic transition matrix is very intuitive, because a zygote is the product of two gametes. Let $T_{AB}(k,l)$ be the element in the $k$th row and $l$th column of the $4 \times 4$ transition matrix $T_{AB}$, $k,l = 1,\ldots,4$. The joint probability of the three-locus genotype is expressed as


$$\Pr(A,B,C) = \frac{1}{4}\, T_{AB}(A,B)\, T_{BC}(B,C). \qquad (2.6)$$

Consider a single locus, say locus A. A FW progeny can take one of the four genotypes: $A_1A_3$, $A_1A_4$, $A_2A_3$ and $A_2A_4$. Let $A = 1,\ldots,4$ denote the numerical code for each of the four genotypes. The diagonal matrices, $D_A$, $D_B$ and $D_C$, are each defined as a $4 \times 4$ matrix. The numerical code $A = k$ is translated into a $D_A$ matrix whose elements are all zero except that the element in the $k$th row and $k$th column is unity. Having defined these diagonal matrices for all loci, we can rewrite the joint distribution of the three-locus genotype as

$$\Pr(A,B,C) = \frac{1}{4}\, J^T D_A T_{AB} D_B T_{BC} D_C J, \qquad (2.7)$$

where $J$ is a $4 \times 1$ vector of unity. For example, the joint probability that $A=3$, $B=1$ and $C=4$ is

$$\Pr(A=3, B=1, C=4) = \frac{1}{4}\, J^T D_A T_{AB} D_B T_{BC} D_C J = \frac{1}{4}\, T_{AB}(3,1)\, T_{BC}(1,4) = \frac{1}{4}\, r_{AB}(1-r_{AB})\, r_{BC}^2.$$
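Equations (2.4)-(2.7) can be checked with a small numerical sketch (pure Python; the function names are ours). It reproduces the worked example above:

```python
# A numerical sketch of equations (2.4)-(2.6) for a four-way cross, with
# genotype codes 1..4.

def gametic(r):
    """2x2 gametic transition matrix H (eq. 2.5)."""
    return [[1 - r, r], [r, 1 - r]]

def zygotic(r):
    """4x4 zygotic transition matrix T, the Kronecker square of H (eq. 2.4)."""
    h = gametic(r)
    return [[h[i // 2][j // 2] * h[i % 2][j % 2] for j in range(4)]
            for i in range(4)]

def joint_prob(a, b, c, r_ab, r_bc):
    """Pr(A=a, B=b, C=c) = (1/4) T_AB(a,b) T_BC(b,c) (eq. 2.6)."""
    t_ab, t_bc = zygotic(r_ab), zygotic(r_bc)
    return 0.25 * t_ab[a - 1][b - 1] * t_bc[b - 1][c - 1]

# The worked example: Pr(A=3, B=1, C=4) = (1/4) r_AB (1 - r_AB) r_BC^2
r_ab, r_bc = 0.1, 0.2
p = joint_prob(3, 1, 4, r_ab, r_bc)
print(abs(p - 0.25 * r_ab * (1 - r_ab) * r_bc ** 2) < 1e-12)  # True
```

Summing the joint probability over all $4^3$ genotype combinations gives 1, a quick sanity check on the transition matrices.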

The FW cross design described earlier represents a situation where all four genotypes in the progeny are distinguishable. In reality, not all genotypes are distinguishable, e.g., in the presence of dominant alleles. This may happen when two or more of the grandparents carry the same allele at the locus of interest. The consequence is that an $F_1$ hybrid initiated by the first level of the cross may be homozygous, or the two $F_1$ parents may have the same genotype. Assume that $F_1^{(34)}$ has genotype $A_3A_3$, which is homozygous. This may be caused by a cross between two parents, both of which are fixed for the $A_3$ allele. Regardless of the reason for the homozygosity of the $F_1$ hybrid, let us focus on the genotypes of the two parents and consider the four possible genotypes of the FW progeny. Assume that $F_1^{(12)}$ and $F_1^{(34)}$ have genotypes $A_1A_2$ and $A_3A_3$, respectively. The four possible genotypes of the progeny are $A_1A_3$, $A_1A_3$, $A_2A_3$ and $A_2A_3$. The first and the second genotypes are not distinguishable, although the $A_3$ allele carried by the two genotypes has different origins. The same applies to the third and fourth genotypes. Considering the allelic origins, we have four ordered genotypes, but we only observe two distinguishable genotypes. This phenomenon is called incomplete information for the genotype. Such a genotype is called a partially informative genotype. If we


observe genotype $A_1A_3$, the numerical code for the genotype is $A=(1,2)$. In matrix notation, it is represented by

$$D_A = \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{bmatrix}.$$

If an observed genotype is $A_2A_3$, the numerical code becomes $A=(3,4)$, represented by

$$D_A = \begin{bmatrix} 0&0&0&0 \\ 0&0&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix}.$$

If both parents are homozygous and fixed for the same allele, say $A_1$, then all four genotypes of the progeny have the same observed form, $A_1A_1$. The numerical code for the genotype is $A=(1,2,3,4)$, a situation called no information. Such a locus is called an uninformative locus and is usually excluded from the analysis. The diagonal matrix representing the genotype is simply a $4 \times 4$ identity matrix.

The following is an example showing how to calculate the three-locus joint genotype probability using the FW cross approach with partial information. Let $A_1A_3\,B_2B_3\,C_1C_1$ and $A_4A_4\,B_2B_3\,C_1C_2$ be the three-locus genotypes of the two parents. The linkage phases of the markers in the parents are assumed to be known, so that the order of the two alleles within a locus is meaningful. In fact, the phase-known genotypes of the parents are better denoted by $\frac{A_1 B_2 C_1}{A_3 B_3 C_1}$ and $\frac{A_4 B_2 C_1}{A_4 B_3 C_2}$, respectively. Assume that a progeny has genotype $A_3A_4\,B_2B_2\,C_1C_1$. We want to calculate the probability of observing such a progeny given the genotypes of the parents. First, we examine each single-locus genotype to see which of the four possible genotypes this individual belongs to. For locus A, the parental genotypes are $A_1A_3$ and $A_4A_4$. The four possible ordered genotypes of a progeny are $A_1A_4$, $A_1A_4$, $A_3A_4$ and $A_3A_4$, respectively. The single-locus genotype of the progeny is $A_3A_4$, matching the third and fourth genotypes, and thus $A=(3,4)$. For locus B, the parental genotypes are $B_2B_3$ and $B_2B_3$. The four possible genotypes of a progeny are $B_2B_2$, $B_2B_3$, $B_3B_2$ and $B_3B_3$, respectively. The single-locus genotype $B_2B_2$ of the progeny matches the


first genotype, and thus $B=1$. For locus C, the parental genotypes are $C_1C_1$ and $C_1C_2$. The four possible genotypes of a progeny are $C_1C_1$, $C_1C_2$, $C_1C_1$ and $C_1C_2$, respectively. The single-locus genotype of the progeny, $C_1C_1$, matches the first and the third genotypes, and thus $C=(1,3)$. In summary, the numerical codes for the three loci are $A=(3,4)$, $B=1$ and $C=(1,3)$, respectively. We now convert the three single-locus genotypes into their corresponding diagonal matrices,

$$D_A = \mathrm{diag}(0,0,1,1), \quad D_B = \mathrm{diag}(1,0,0,0) \quad \text{and} \quad D_C = \mathrm{diag}(1,0,1,0).$$

Substituting these matrices into equation (2.7), we have

$$\Pr[A=(3,4),\, B=1,\, C=(1,3)] = \frac{1}{4}\, J^T D_A T_{AB} D_B T_{BC} D_C J = \frac{1}{4}\left[T_{AB}(3,1)+T_{AB}(4,1)\right]\left[T_{BC}(1,1)+T_{BC}(1,3)\right] = \frac{1}{4}\, r_{AB}\,(1-r_{BC}).$$
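The same machinery extends to partially informative genotypes by letting each D matrix be a diagonal indicator over the observed code set. A sketch (our own helper names, assumed recombination fractions) reproducing the example above:

```python
# Sketch of equation (2.7) with partially informative genotypes: each D
# matrix is a diagonal indicator over the observed code set (codes 1..4).

def gametic(r):
    return [[1 - r, r], [r, 1 - r]]

def zygotic(r):
    h = gametic(r)
    return [[h[i // 2][j // 2] * h[i % 2][j % 2] for j in range(4)]
            for i in range(4)]

def indicator(codes):
    """Diagonal D matrix: unity on the diagonal for each observed code."""
    return [[1.0 if i == j and (i + 1) in codes else 0.0 for j in range(4)]
            for i in range(4)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def joint_prob(codes_a, codes_b, codes_c, r_ab, r_bc):
    """(1/4) J' D_A T_AB D_B T_BC D_C J (eq. 2.7)."""
    m = indicator(codes_a)
    for t, d in ((zygotic(r_ab), indicator(codes_b)),
                 (zygotic(r_bc), indicator(codes_c))):
        m = matmul(matmul(m, t), d)
    return 0.25 * sum(m[i][j] for i in range(4) for j in range(4))

# The example above: A=(3,4), B=1, C=(1,3) gives (1/4) r_AB (1 - r_BC)
r_ab, r_bc = 0.1, 0.2
p = joint_prob({3, 4}, {1}, {1, 3}, r_ab, r_bc)
print(abs(p - 0.25 * r_ab * (1 - r_bc)) < 1e-12)  # True
```

With all three loci uninformative (all D matrices equal to the identity), the probability is 1, as it should be.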

The populations handled by PROC QTL also include $F_2$, BC, DH and RILs. The four-way cross design is a general design, of which the BC, DH and $F_2$ designs are special cases with partial information. For example, the two parents of the $BC_1$ design have genotypes $A_1A_2$ and $A_1A_1$, respectively. If we treat the BC progeny as a special FW progeny, the four possible genotypes are $A_1A_1$, $A_1A_1$, $A_2A_1$ and $A_2A_1$, with only two distinguishable observed types. If a progeny has genotype $A_1A_1$, the numerical code of the genotype in terms of a FW cross is $A=(1,2)$. If a progeny has genotype $A_2A_1$, its numerical code becomes $A=(3,4)$. The two parents of the $BC_2$ design have genotypes $A_1A_2$ and $A_2A_2$, respectively. In terms of a FW cross, the four possible genotypes are $A_1A_2$, $A_1A_2$, $A_2A_2$ and $A_2A_2$. Again, there are only two distinguishable genotypes. The two parents of the $F_2$ design have genotypes $A_1A_2$ and $A_1A_2$, respectively. If we treat the $F_2$ progeny as a special FW progeny, the four possible genotypes are $A_1A_1$, $A_1A_2$, $A_2A_1$ and $A_2A_2$, with only three distinguishable genotypes. The numerical codes for the two types of homozygote are $A=1$ and $A=4$, respectively, whereas the numerical code


for the heterozygote is $A=(2,3)$. In summary, when the general FW design is applied to a BC design, only two of the four possible genotypes are distinguishable and the numerical codes are $A=(1,2)$ for one observed genotype and $A=(3,4)$ for the other. When the general FW design is applied to the $F_2$ design, the two forms of heterozygote are not distinguishable. When coding the genotype, we use $A=(2,3)$ to represent the heterozygote, and $A=1$ and $A=4$ to represent the two types of homozygote, respectively. The transition matrices remain the same as those used in the FW cross design. When using the FW design for the BC problem, we have combined the first and second genotypes to form the first observable genotype, and combined the third and fourth genotypes to form the second observable genotype. It can be shown that the joint probability calculated by the Markov chain with two states (using the $2 \times 2$ transition matrix) and that calculated by the Markov chain with four states (the $4 \times 4$ transition matrix) are identical. The $F_2$ design we learned earlier can be handled by combining the second and third genotypes into the observed heterozygote. The $4 \times 4$ transition matrix is converted into a $3 \times 3$ transition matrix,

$$T_{AB} = \begin{bmatrix}
(1-r_{AB})^2 & 2r_{AB}(1-r_{AB}) & r_{AB}^2 \\
r_{AB}(1-r_{AB}) & (1-r_{AB})^2 + r_{AB}^2 & r_{AB}(1-r_{AB}) \\
r_{AB}^2 & 2r_{AB}(1-r_{AB}) & (1-r_{AB})^2
\end{bmatrix}.$$

The joint probability of a multiple-locus genotype for an $F_2$ individual can be calculated using a Markov chain with the $3 \times 3$ transition matrix. The numerical code for a genotype must be redefined in the following way. The three genotypes, $A_1A_1$, $A_1A_2$ and $A_2A_2$, are numerically coded by $A=1$, $A=2$ and $A=3$, respectively. In matrix notation, the three genotypes are denoted by

$$D_A = \begin{bmatrix} 1&0&0 \\ 0&0&0 \\ 0&0&0 \end{bmatrix}, \quad D_A = \begin{bmatrix} 0&0&0 \\ 0&1&0 \\ 0&0&0 \end{bmatrix} \quad \text{and} \quad D_A = \begin{bmatrix} 0&0&0 \\ 0&0&0 \\ 0&0&1 \end{bmatrix}.$$
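A short sketch can confirm that merging the two FW heterozygote states (codes 2 and 3) collapses the $4 \times 4$ matrix into the $3 \times 3$ $F_2$ matrix above (helper names are ours):

```python
# The 3x3 F2 transition matrix arises from the 4x4 FW (zygotic) matrix by
# merging the two indistinguishable heterozygote states (FW codes 2 and 3).

def gametic(r):
    return [[1 - r, r], [r, 1 - r]]

def kron(a, b):
    m = len(b)
    n = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n)]
            for i in range(n)]

def f2_matrix(r):
    """3x3 F2 transition matrix given in the text."""
    return [[(1 - r) ** 2, 2 * r * (1 - r), r ** 2],
            [r * (1 - r), (1 - r) ** 2 + r ** 2, r * (1 - r)],
            [r ** 2, 2 * r * (1 - r), (1 - r) ** 2]]

def collapse(t):
    """Merge FW states {2,3}; rows 2 and 3 of t agree after the column merge,
    so one representative row per group suffices."""
    groups = [[0], [1, 2], [3]]
    return [[sum(t[g[0]][j] for j in cols) for cols in groups] for g in groups]

r = 0.1
h = gametic(r)
same = all(abs(a - b) < 1e-12
           for ra, rb in zip(collapse(kron(h, h)), f2_matrix(r))
           for a, b in zip(ra, rb))
print(same)  # True
```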

Recombinant inbred lines (RILs) are another widely used mapping population, produced by continued selfing or sib mating of the progeny of individual members of an $F_2$ population until complete homozygosity is achieved. RILs share the same genetic structure as a DH population, and the model defined for the BC population can also be applied to RILs with little modification. However, more recombinant individuals will be observed in RILs because of the multiple cycles of meiosis. Therefore, the recombination fraction calculated via the Haldane function needs to be further converted using equations (2.1) and (2.2) given in the introduction section.


The general FW design using a Markov chain with four states is computationally more intensive when applied to BC and $F_2$ designs than the specialized BC (with $2 \times 2$ transition matrix) and $F_2$ (with $3 \times 3$ transition matrix) algorithms. However, the difference in computing time is probably unnoticeable given current computing power. In addition, the $3 \times 3$ transition matrix is not symmetrical, a factor that may easily cause a programming error. Therefore, the general FW design is recommended and is actually used in PROC QTL for all the simple line crossing experiments.

Conditional distribution of genotype for a putative position

The joint distribution described above is not used very often in QTL mapping. It is mainly used for further calculating the conditional probabilities of QTL genotypes. Consider the following five loci denoted by ABCDE in that order. Assume that ABDE are markers and C is a putative position in the center of the four markers. The conditional probability of genotype for locus C is

$$\Pr(C=k \mid A,B,D,E) = \frac{\Pr(A)\Pr(B \mid A)\Pr(C=k \mid B)\Pr(D \mid C=k)\Pr(E \mid D)}{\sum_{k'=1}^{4}\Pr(A)\Pr(B \mid A)\Pr(C=k' \mid B)\Pr(D \mid C=k')\Pr(E \mid D)} \qquad (2.8)$$

In matrix notation, this conditional probability is expressed as

$$\Pr(C=k \mid A,B,D,E) = \frac{J^T D_A T_{AB} D_B T_{BC} D^{(k)} T_{CD} D_D T_{DE} D_E J}{\sum_{k'=1}^{4} J^T D_A T_{AB} D_B T_{BC} D^{(k')} T_{CD} D_D T_{DE} D_E J} \qquad (2.9)$$

where $D^{(k)}$ is a diagonal matrix with all elements being zero except that the element in the $k$th row and $k$th column is unity.
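Equation (2.9) can be sketched as follows (FW coding with four genotype states; the helper names are ours). The denominator uses the identity matrix at the putative locus, since the indicator matrices summed over $k'$ give the identity:

```python
# Sketch of equation (2.9): conditional genotype probability of a putative
# locus C given flanking markers, for loci ordered A, B, C, D, E (FW coding).
# Observed marker codes may be partial sets.

def gametic(r):
    return [[1 - r, r], [r, 1 - r]]

def zygotic(r):
    h = gametic(r)
    return [[h[i // 2][j // 2] * h[i % 2][j % 2] for j in range(4)]
            for i in range(4)]

def indicator(codes):
    return [[1.0 if i == j and (i + 1) in codes else 0.0 for j in range(4)]
            for i in range(4)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def conditional(obs, rs, k):
    """Pr(C = k | A, B, D, E). `obs` holds the observed code sets of A, B, D
    and E; `rs` holds the recombination fractions of the consecutive
    intervals A-B, B-C, C-D and D-E."""
    ts = [zygotic(r) for r in rs]
    def weight(c_codes):
        m = indicator(obs[0])
        ds = [indicator(obs[1]), indicator(c_codes),
              indicator(obs[2]), indicator(obs[3])]
        for t, d in zip(ts, ds):
            m = matmul(matmul(m, t), d)
        return sum(m[i][j] for i in range(4) for j in range(4))
    return weight({k}) / weight({1, 2, 3, 4})

obs = ({1}, {2}, {3}, {4})             # fully informative markers at A, B, D, E
rs = (0.1, 0.05, 0.05, 0.1)
probs = [conditional(obs, rs, k) for k in (1, 2, 3, 4)]
print(abs(sum(probs) - 1.0) < 1e-12)   # True: the four probabilities sum to 1
```

As a further check, setting the B-C recombination fraction to zero forces the putative locus to share the genotype of marker B.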

VIRTUAL MAP

A virtual map is required for interval mapping as implemented by PROC QTL. A virtual map is a map that contains more or less evenly distributed putative positions across the entire genome. The distance between two putative positions is 1 cM by default, but users can define their own virtual map with a different distance. The conditional probabilities of the genotypes of the putative positions are calculated using the above multipoint method prior to execution of the interval mapping procedure. Separating the calculation of the conditional genotype probabilities from the interval mapping itself is a cost-saving strategy. PROC QTL allows users to select from three different ways to construct the virtual map, which are described below.


1. Variable increment

Users can provide a maximum increment, with all markers forced to be included in the virtual map. For example, if you put the STEP = d option in the PROC QTL statement, the procedure will create a virtual map in which the distance between two consecutive putative positions is equal to or less than d cM and all markers are included in the virtual map. If the distance between two markers is an integer multiple of d cM, the increment within the interval is exactly d cM; otherwise, the increment within this interval is

$$d^{*} = \frac{x_{AB}}{\mathrm{int}(x_{AB}/d) + 1} \qquad (2.10)$$

where $x_{AB}$ is the distance measured in cM between locus A and locus B.

With this option, the increment can vary from one marker interval to another but within a marker interval, the increment is the same.
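A sketch of the variable-increment rule of equation (2.10); with d = 2.0 it yields the 1.96 cM and 1.70 cM steps used for the 9.8 cM and 10.2 cM intervals of the example below (the function name is ours):

```python
# Sketch of the variable-increment rule (eq. 2.10): the actual step used
# inside a marker interval of length x_ab cM for a requested maximum step d.

def variable_step(x_ab, d):
    if x_ab % d == 0:
        return float(d)          # interval length is an integer multiple of d
    return x_ab / (int(x_ab / d) + 1)

print(round(variable_step(9.8, 2.0), 2))   # 1.96
print(round(variable_step(10.2, 2.0), 2))  # 1.7
```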

2. Soft fixed increment

Soft fixed increment sets each increment to exactly d cM, except that the distance between a marker and the putative position on either side of the marker may be slightly less than d cM. All markers are forced to be included in the virtual map. The soft fixed increment search is turned on when the STEP = d/soft option is used in the PROC QTL statement.

3. Hard fixed increment

Hard fixed increment sets each increment to be exactly d cM throughout the entire genome. Markers are not necessarily included in the virtual map. A marker will be included in the virtual map if and only if its distance from the first marker of the chromosome is an integer multiple of d cM. The hard fixed increment search is turned on when STEP = d/hard option is used in the PROC QTL statement.

The following example shows the differences among the three options. In this example, there are three linked markers, M1, M2 and M3, on a chromosome 20 cM long. The two intervals are 9.8 cM and 10.2 cM, respectively. The user specifies d = 2.0 cM as the step size for QTL mapping. The virtual maps generated by the three different options are shown in Figure 1 below.


Figure 1. Virtual maps generated by the three options with d = 2.0: (a) variable, where the 9.8 cM interval is divided into five steps of 1.96 cM and the 10.2 cM interval into six steps of 1.70 cM; (b) soft fixed, where steps are 2.00 cM except a 1.80 cM step adjacent to M2 and a 0.20 cM step adjacent to M3; and (c) hard fixed, with uniform 2.00 cM steps, in which M2 is not included in the virtual map.


MAXIMUM LIKELIHOOD METHOD

Two major statistical methods are used to estimate QTL parameters: the maximum likelihood method and the Bayesian method. PROC QTL uses these two methods, and some variants or special cases of them, to perform QTL mapping. The maximum likelihood method is used for interval mapping, while the Bayesian method is used for mapping multiple QTL. This chapter briefly describes the concept of the maximum likelihood method and the likelihood ratio test statistic in general. The basic framework of the maximum likelihood method will be customized in later chapters when details of the QTL mapping procedures are discussed.

LIKELIHOOD FUNCTION

PROC QTL deals with line crossing data. All line crossing experiments share a common property: different individuals within the family are independent conditional on the parameters. This property makes the log likelihood function the sum of the individual log likelihoods. Let $y_j$ be the data point from the $j$th individual for $j = 1,\ldots,n$, where $n$ is the sample size. Let $\theta$ be an $m \times 1$ vector of parameters. The log likelihood function for individual $j$ is

$$L_j(\theta) = \ln f(y_j \mid \theta) \qquad (3.1)$$

where $f(y_j \mid \theta)$ is the probability density. The overall log likelihood function is

$$L(\theta) = \sum_{j=1}^{n} L_j(\theta) = \sum_{j=1}^{n} \ln f(y_j \mid \theta) \qquad (3.2)$$

The log likelihood function of the parameters and the logarithm of the probability density of the data differ only by a constant, which is a function of the data but not of the parameters. That constant is irrelevant to the maximum likelihood solution and is therefore always ignored. The maximum likelihood estimate (MLE) of $\theta$ is the value that maximizes $L(\theta)$ and is usually denoted by $\hat{\theta}$. A local maximum likelihood solution can be obtained by solving the following simultaneous equations,

$$\frac{\partial L(\theta)}{\partial \theta} = \sum_{j=1}^{n} \frac{\partial L_j(\theta)}{\partial \theta} = 0 \qquad (3.3)$$

In a very few situations, an explicit solution may exist, but most often there is no explicit solution. Therefore, a numerical solution must be found with some iteration schemes. Two numerical algorithms have been implemented


in PROC QTL. One is the Newton-Raphson algorithm, including the Fisher scoring algorithm as an improved version of the Newton-Raphson algorithm. The other is the Nelder and Mead (1965) simplex algorithm. The simplex algorithm is a derivative-free algorithm because it does not require the partial derivatives of the likelihood function with respect to the parameter vector. We only describe the Newton-Raphson and the Fisher scoring algorithms, leaving the simplex algorithm to the original paper (NELDER and MEAD 1965).

NEWTON-RAPHSON ALGORITHM

The Newton-Raphson algorithm requires both the first and the second partial derivatives. Let $\theta^{(t)}$ be the parameter value at iteration $t$; the parameter value at iteration $t+1$ is given by

$$\theta^{(t+1)} = \theta^{(t)} - \left[\sum_{j=1}^{n} \frac{\partial^2 L_j(\theta^{(t)})}{\partial\theta\,\partial\theta^T}\right]^{-1} \sum_{j=1}^{n} \frac{\partial L_j(\theta^{(t)})}{\partial\theta} \qquad (3.4)$$

We often call

$$S(\theta) = \sum_{j=1}^{n} S_j(\theta) = \sum_{j=1}^{n} \frac{\partial L_j(\theta)}{\partial \theta} \qquad (3.5)$$

the score vector and

$$H(\theta) = \sum_{j=1}^{n} H_j(\theta) = \sum_{j=1}^{n} \frac{\partial^2 L_j(\theta)}{\partial\theta\,\partial\theta^T} \qquad (3.6)$$

the Hessian matrix. Therefore, the Newton-Raphson algorithm can be rewritten as

$$\theta^{(t+1)} = \theta^{(t)} - H^{-1}(\theta^{(t)})\, S(\theta^{(t)}) \qquad (3.7)$$

The Newton-Raphson algorithm is fast in the sense that it takes only a few iterations to converge. Unfortunately, it is sensitive to the initial value of the parameters. If the initial value is not close to the true solution, the algorithm may fail to converge to the correct solution. The iteration process often stops before it converges because $H^{-1}(\theta^{(t)})$ does not exist. One modification of the algorithm is called the Newton-Raphson-ridge algorithm. This algorithm adds a small positive number to the diagonals of the Hessian matrix to make the matrix invertible, an idea borrowed from the ridge regression method (HOERL and KENNARD 2000; TYCHONOFF 1943). This modification is minor and thus the name of the algorithm still preserves the Newton-Raphson prefix.
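As a one-parameter illustration of iteration (3.7) (a toy example of ours, not one of PROC QTL's models), consider the Poisson log likelihood, whose score and Hessian are available in closed form:

```python
# Toy Newton-Raphson iteration (eq. 3.7) for the Poisson rate b:
# L(b) = sum_j (y_j ln b - b), score S(b) = sum(y)/b - n,
# Hessian H(b) = -sum(y)/b^2. The MLE is the sample mean.

def newton_raphson(y, b0=1.0, tol=1e-10, max_iter=50):
    b, n, s = b0, len(y), sum(y)
    for _ in range(max_iter):
        score = s / b - n
        hessian = -s / b ** 2
        step = score / hessian       # H^{-1} S for a scalar parameter
        b = b - step                 # eq. (3.7)
        if abs(step) < tol:
            break
    return b

y = [2, 4, 3, 5, 1, 3]
print(round(newton_raphson(y), 6))   # 3.0, the sample mean
```

Starting too far from the solution (here, beyond twice the sample mean) would make the iteration diverge, which illustrates the sensitivity to initial values discussed above.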


INFORMATION MATRIX AND ESTIMATION ERRORS

A further improvement on the Newton-Raphson algorithm is called the Fisher scoring algorithm (HAN and XU 2008). Because the modification is substantial, the Newton-Raphson name is no longer preserved in this algorithm. But the idea of the new algorithm still came from the original Newton-Raphson algorithm. The Fisher scoring algorithm first defines the information matrix, called Fisher’s information, which is

$$I(\theta) = -E[H(\theta)] \qquad (3.8)$$

the negative of the expectation of the Hessian matrix. The expectation is taken with respect to the observed data $y$. The Fisher scoring algorithm simply

replaces the Hessian matrix by the expectation of the Hessian matrix. You may be confused by this "expectation with respect to $y$" because the data do not explicitly appear in the notation of the Hessian matrix. Depending on the properties of the problem in question, sometimes $H(\theta)$ is a function of the data and sometimes it is not. When $H(\theta)$ is not a function of the data, $I(\theta) = -H(\theta)$ holds and thus the Fisher scoring algorithm is the same as the Newton-Raphson algorithm. The iteration equation of the Fisher scoring algorithm is

$$\theta^{(t+1)} = \theta^{(t)} + \left\{-E\left[H(\theta^{(t)})\right]\right\}^{-1} S(\theta^{(t)}) = \theta^{(t)} + I^{-1}(\theta^{(t)})\, S(\theta^{(t)}) \qquad (3.9)$$
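For the same toy Poisson example used for Newton-Raphson, replacing the Hessian by its expectation gives the Fisher scoring iteration (3.9); since $E(y_j) = b$, the information is $I(b) = n/b$:

```python
# Toy Fisher scoring iteration (eq. 3.9) for the Poisson rate b. Because
# E(y_j) = b, the information is I(b) = -E[H(b)] = n/b; for this model a
# single scoring step from any starting value lands exactly on the MLE.

def fisher_scoring(y, b0=1.0, tol=1e-10, max_iter=50):
    b, n, s = b0, len(y), sum(y)
    for _ in range(max_iter):
        score = s / b - n
        info = n / b                 # I(b) = -E[H(b)]
        step = score / info          # I^{-1} S
        b = b + step                 # eq. (3.9): note the plus sign
        if abs(step) < tol:
            break
    return b

y = [2, 4, 3, 5, 1, 3]
print(fisher_scoring(y))  # 3.0
```

Unlike the Newton-Raphson version, this iteration is stable for any positive starting value, illustrating the robustness gain discussed below.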

The Fisher scoring algorithm is much more stable than the Newton-Raphson algorithm because $I^{-1}(\theta^{(t)})$ almost always exists. Several other

properties make the Fisher scoring algorithm more appealing than the original Newton-Raphson algorithm. First, the inverse of the information matrix asymptotically represents the variance-covariance matrix of the estimated parameters,

$$\mathrm{var}(\hat{\theta}) = I^{-1}(\hat{\theta}) \qquad (3.10)$$

This means that $\mathrm{var}(\hat{\theta})$ is a by-product of the Fisher scoring iteration algorithm. Secondly, although

$$H(\theta) \neq -S(\theta)\,S^T(\theta) \qquad (3.11)$$

the following relationship holds,

$$E[H(\theta)] = -E\left[S(\theta)\,S^T(\theta)\right] \qquad (3.12)$$

This means that we do not have to know the Hessian matrix to find the expectation of the Hessian matrix. This way of finding the expectation of the Hessian matrix can be substantially easier than deriving the Hessian matrix itself, because we can use

Page 31: Principles and Procedures of QTL Mapping

26

1 1 1

( ) var ( ) ( ) ( )n n n

T

j j j

j j j

E H S E S E S

(3.13)

to calculate the expectation.
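To make the iteration in equation (3.9) concrete, the following sketch (ours, not part of PROC QTL) applies Fisher scoring to the simplest possible case, estimating θ = (μ, σ²) of an i.i.d. normal sample, where the score vector and information matrix have closed forms; all numerical values are illustrative.

```python
import numpy as np

def fisher_scoring_normal(y, mu=0.0, s2=1.0, tol=1e-10, max_iter=100):
    """Fisher scoring for theta = (mu, sigma^2) of an i.i.d. normal sample."""
    n = len(y)
    theta = np.array([mu, s2])
    for _ in range(max_iter):
        mu, s2 = theta
        # score vector S(theta): first derivatives of the log likelihood
        score = np.array([np.sum(y - mu) / s2,
                          -n / (2.0 * s2) + np.sum((y - mu) ** 2) / (2.0 * s2 ** 2)])
        # Fisher information I(theta) = -E[H(theta)]: here diagonal and data-free
        info = np.diag([n / s2, n / (2.0 * s2 ** 2)])
        step = np.linalg.solve(info, score)       # I^(-1) S
        theta = theta + step                      # iteration equation (3.9)
        if np.max(np.abs(step)) < tol:
            break
    return theta, np.linalg.inv(info)             # inv(I) approximates var, eq. (3.10)

rng = np.random.default_rng(1)
y = rng.normal(5.0, 2.0, size=500)
theta_hat, var_hat = fisher_scoring_normal(y)
```

For this toy model the algorithm converges in two or three iterations to the usual MLEs, and the returned inverse information matrix plays the role described in equation (3.10).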

LIKELIHOOD RATIO TEST STATISTICS

The null hypothesis can often be expressed as

$$ H_0: K\theta = C \qquad (3.14) $$

where K is a known p × m matrix with p < m and C is a known p × 1 vector. Under the null hypothesis, θ can be estimated by maximizing the log likelihood function under the constraint Kθ = C. Let θ̃ be the MLE of the parameters under the constraint Kθ = C. The likelihood ratio test statistic is given by

$$ \lambda = -2\big[L_0(\tilde\theta) - L_1(\hat\theta)\big] \qquad (3.15) $$

where θ̂ is the MLE of the parameters obtained by maximizing the full log likelihood function given in equation (3.2), and L_1(θ̂) is the log likelihood value evaluated at θ̂. L_1(θ̂) is also called the log likelihood value for the full model. Accordingly, L_0(θ̃) is called the log likelihood value for the restricted (or reduced) model; it is simply the value of equation (3.2) evaluated at θ̃. Note that Kθ is often a subset of θ. In such a case, θ̃ can be estimated by maximizing the so-called reduced log likelihood function, which has the same form as equation (3.2) except that only the subset of parameters appears in the function. Most hypothesis tests in PROC QTL are of this kind. A more general method to find θ̃ is through maximization of the quantity

$$ Q = L(\theta) + \eta^{T}(K\theta - C) \qquad (3.16) $$

where η is a p × 1 vector of Lagrange multipliers. Both θ and η are treated as unknowns when maximizing Q. A more rigorous expression of the solution is

$$ \tilde\theta = \arg\max_{\theta,\,\eta}\big[L(\theta) + \eta^{T}(K\theta - C)\big] \qquad (3.17) $$

where the arguments include both θ and η. Under the null hypothesis, the likelihood ratio test statistic asymptotically follows a chi-square distribution with p degrees of freedom. This asymptotic property allows us to use Piepho's (2001) simple method to calculate the genome-wide threshold value of the test statistic used for the significance test of a QTL.
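As a small illustration of the likelihood ratio test (our own sketch, not code from PROC QTL), the example below tests H₀: b₁ = 0 in a simple normal regression, which corresponds to K = [0, 1] and C = 0 in equation (3.14); the simulated data and the fixed 3.84 threshold (the 95% quantile of chi-square with one degree of freedom) are our own illustrative choices.

```python
import numpy as np

def profile_loglik(resid, n):
    """Log likelihood -n/2 * [ln(sigma2_hat) + 1] at the MLE of sigma^2."""
    s2 = np.mean(resid ** 2)
    return -0.5 * n * (np.log(s2) + 1.0)

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(size=n)

# full model: y = b0 + b1 * x, fitted by least squares (the normal-model MLE)
X1 = np.column_stack([np.ones(n), x])
b_full = np.linalg.lstsq(X1, y, rcond=None)[0]
L1 = profile_loglik(y - X1 @ b_full, n)

# reduced model under H0: b1 = 0, i.e. K = [0, 1] and C = 0 in eq. (3.14)
L0 = profile_loglik(y - y.mean(), n)

lam = -2.0 * (L0 - L1)     # likelihood ratio statistic, eq. (3.15)
threshold = 3.84           # 95% quantile of chi-square with p = 1 degree of freedom
significant = lam > threshold
```

Because the reduced model is nested within the full model, L₁ ≥ L₀ always holds, so λ is non-negative by construction.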


WALD TEST STATISTICS

The Wald test (WALD 1957) is an alternative to the likelihood ratio test. The test statistic is defined as

$$ W = (K\hat\theta - C)^{T}\big[\mathrm{var}(K\hat\theta - C)\big]^{-1}(K\hat\theta - C) = (K\hat\theta - C)^{T}\big[K\,\mathrm{var}(\hat\theta)\,K^{T}\big]^{-1}(K\hat\theta - C) \qquad (3.18) $$

The Wald test statistic has the same asymptotic property as the likelihood ratio test, that is, a chi-square distribution under the null hypothesis. When the sample size is small, the likelihood ratio test is preferable to the Wald test. However, an obvious advantage of the Wald test over the likelihood ratio test is that it avoids evaluating multiple log likelihood functions; only the full log likelihood function is maximized with the Wald test.
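A minimal sketch of the Wald statistic in equation (3.18), using hypothetical estimates and a hypothetical variance-covariance matrix V for θ = (β, a, d); the numbers are invented for illustration only.

```python
import numpy as np

def wald_statistic(theta_hat, V, K, C):
    """W = (K theta - C)' [K V K']^(-1) (K theta - C), equation (3.18)."""
    d = K @ theta_hat - C
    return float(d @ np.linalg.solve(K @ V @ K.T, d))

# hypothetical estimates for theta = (beta, a, d) and their var-cov matrix V
theta_hat = np.array([10.0, 1.2, 0.3])
V = np.diag([0.04, 0.01, 0.02])
K = np.array([[0.0, 1.0, 0.0],     # H0: a = 0 and d = 0 jointly
              [0.0, 0.0, 1.0]])
C = np.zeros(2)
W = wald_statistic(theta_hat, V, K, C)   # compare with chi-square, 2 d.f.
```

With the diagonal V above the statistic reduces to a²/var(a) + d²/var(d), which makes the computation easy to verify by hand.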


BAYESIAN METHOD

INTRODUCTION TO BAYESIAN METHOD

Parameters θ and data y are the two items required by maximum likelihood analysis. The Bayesian method requires, in addition to the parameters themselves and the data, a distribution for the parameters θ. Therefore, the parameters are treated as random variables in Bayesian analysis. The purpose of Bayesian analysis is to infer the posterior distribution of θ, also called the conditional distribution of θ given the data. The posterior distribution of θ contains much more information than the point estimate of θ from maximum likelihood analysis. Let p(y|θ) be the probability density of the data given the parameters; it is also called the likelihood function of the parameters. Let p(θ) be the probability density of the parameters; in the Bayesian framework, it is called the prior distribution. The posterior distribution of the parameters is

$$ p(\theta|y) = \frac{p(y|\theta)\,p(\theta)}{p(y)} \propto p(y|\theta)\,p(\theta) \qquad (4.1) $$

which is proportional to the joint distribution of the data and the parameters. The marginal distribution of the data in the denominator plays no role in the Bayesian analysis and thus is often ignored. The posterior distribution does not seem to be more complicated than the likelihood function, except that a prior distribution of the parameters is needed. This is true for a single-parameter problem. When the dimensionality of θ is more than one, however, the distribution required in Bayesian analysis is the marginal posterior distribution of each element of the parameter vector. The posterior distribution given by equation (4.1) is called the joint posterior distribution, which is not what we want. Let us partition the parameter vector θ into θ_k and θ_{−k}, where θ_k is the k-th element of θ and θ_{−k} is a vector containing the remaining elements of θ. The original parameter vector can be expressed as θ = (θ_k, θ_{−k}). The marginal posterior distribution of element θ_k of vector θ is

$$ p(\theta_k|y) = \int\!\cdots\!\int p(\theta_k, \theta_{-k}|y)\,d\theta_{-k} \qquad (4.2) $$

This marginal posterior distribution is what we need in Bayesian analysis. To find it, multiple integrations are required. In most situations, an explicit multiple integral does not exist and numerical multiple integration is needed. This explains why Bayesian analysis was not as popular in the past as it is today: numerical multiple integrations were computationally unmanageable. What is the Bayesian estimate of a parameter? The marginal posterior distribution is the Bayesian estimate of a parameter; it is a distribution rather than a single point estimate. The marginal posterior distribution can best be described by a few parameters of the distribution. Therefore, people often use the posterior mean or the posterior mode as the Bayesian estimate of the parameter. A more informative representation of the Bayesian estimate is the posterior mean accompanied by the posterior standard deviation. The most frequently used representations of the Bayesian estimate are the posterior mean, the equal-tail interval and the highest posterior density interval. In any case, the posterior distribution contains all the information we need and is the most informative representation of the Bayesian estimate.
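The marginal posterior in equation (4.2) can be illustrated numerically before turning to MCMC. The sketch below is our own toy example (flat prior, grid chosen by us): it evaluates the joint posterior of (μ, σ²) for a normal sample on a grid and sums over σ² to obtain the marginal posterior of μ.

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(2.0, 1.0, size=50)

# grid over theta = (mu, sigma2), with a flat prior p(theta) proportional to 1
mu_grid = np.linspace(0.0, 4.0, 201)
s2_grid = np.linspace(0.2, 3.0, 201)
M, S = np.meshgrid(mu_grid, s2_grid, indexing="ij")

# log joint posterior is the log likelihood under the flat prior, eq. (4.1)
loglik = (-0.5 * len(y) * np.log(2.0 * np.pi * S)
          - 0.5 * np.sum((y[:, None, None] - M) ** 2, axis=0) / S)
post = np.exp(loglik - loglik.max())
post /= post.sum()                      # normalize over the grid

# marginal posterior of mu: sum out sigma2 numerically, eq. (4.2)
marg_mu = post.sum(axis=1)
post_mean_mu = float(np.sum(mu_grid * marg_mu))
```

For two parameters this brute-force integration is trivial; the point of MCMC, introduced next, is that grids become hopeless as the dimensionality grows.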

MARKOV CHAIN MONTE CARLO ALGORITHM

The Markov chain Monte Carlo (MCMC) algorithm is a special numerical method for multiple integrations. It is more efficient than other numerical methods for multiple integrations when the dimensionality is high. The MCMC algorithm is a sampling-based algorithm in which parameters are repeatedly sampled from their conditional posterior distributions. Recall that the purpose of Bayesian analysis is to infer the marginal posterior distribution for each parameter. The MCMC algorithm, however, does not sample parameters from their marginal distributions, because we do not have explicit forms of the marginal distributions; if we knew the marginal posterior distributions, the problem would already be solved. The target distribution from which parameters are sampled is the joint posterior distribution, but MCMC does not sample parameters directly from the joint posterior distribution either. The MCMC algorithm actually samples each parameter from its fully conditional posterior distribution. In the end, we collect the random draws into a posterior sample, and the posterior sample is considered to be drawn from the joint posterior distribution. When only one parameter is considered in the posterior sample and all other parameters are ignored, the sample of this parameter is drawn from the marginal posterior distribution. The verbal description of MCMC may be confusing, so let us describe the MCMC sampling process using equations and symbols.

Let p(y|θ_k, θ_{−k}) be the likelihood of the parameters and p(θ_k, θ_{−k}) be the prior density of the parameters. The joint posterior distribution of the parameters is

$$ p(\theta_k, \theta_{-k}|y) \propto p(y|\theta_k, \theta_{-k})\,p(\theta_k, \theta_{-k}) \qquad (4.3) $$

Let θ_{−k}^{(t)} be the value of θ_{−k} at the t-th iteration. The fully conditional posterior distribution of θ_k is obtained simply by replacing θ_{−k} with θ_{−k}^{(t)} in the above equation. Therefore, the fully conditional posterior distribution of θ_k is

$$ p(\theta_k|y, \theta_{-k}^{(t)}) \propto p(\theta_k, \theta_{-k}^{(t)}|y) \propto p(y|\theta_k, \theta_{-k}^{(t)})\,p(\theta_k, \theta_{-k}^{(t)}) \qquad (4.4) $$

This distribution usually has a simple form, e.g., normal, Bernoulli, chi-square or another explicit distribution. As a result, θ_k can be sampled directly from that distribution. Each and every element of the vector θ is sampled from its own fully conditional posterior distribution. This particular MCMC sampling algorithm is called the Gibbs sampler (CASELLA and GEORGE 1992; GEMAN and GEMAN 1984). Let θ^{(t)} be the values of all parameters at iteration t. The sequence {θ^{(0)}, θ^{(1)}, …, θ^{(T)}} forms a Markov chain, meaning that θ^{(t+1)} depends on θ^{(t)}. Since the initial value θ^{(0)} is arbitrarily chosen, the Markov chain depends heavily on the initial values of the parameters in the early stage, and the change of parameter values in the early stage is usually chaotic. When the chain is sufficiently long, the parameters reach their stationary distribution, i.e., the change tends to stabilize. Parameter values in the early stage (before the chain reaches the stationary distribution) should not be counted. This early stage of the Markov chain is called the burn-in period. After the burn-in period, θ^{(t+1)} may still depend strongly on θ^{(t)}. To remove the autocorrelation, the chain needs to be trimmed or thinned. Depending on the autocorrelation, the thinning rate may vary. By default, PROC QTL saves one observation in every 10 iterations, i.e., the thinning rate is 1:10. The posterior sample contains only the observations saved after the burn-in and thinning. For example, suppose that the burn-in period is 1000 and the thinning rate is 1:10. If the total number of iterations is 11000, the posterior sample contains (11000 − 1000)/10 = 1000 observations, as shown in the following table.

Table 1. Posterior sample for five parameters with a burn-in period of 1000 and a thinning rate of 1:10.

Iteration   θ_1            θ_2            θ_3            θ_4            θ_5
1010        θ_1^(1010)     θ_2^(1010)     θ_3^(1010)     θ_4^(1010)     θ_5^(1010)
1020        θ_1^(1020)     θ_2^(1020)     θ_3^(1020)     θ_4^(1020)     θ_5^(1020)
1030        θ_1^(1030)     θ_2^(1030)     θ_3^(1030)     θ_4^(1030)     θ_5^(1030)
1040        θ_1^(1040)     θ_2^(1040)     θ_3^(1040)     θ_4^(1040)     θ_5^(1040)
1050        θ_1^(1050)     θ_2^(1050)     θ_3^(1050)     θ_4^(1050)     θ_5^(1050)
…           …              …              …              …              …
10980       θ_1^(10980)    θ_2^(10980)    θ_3^(10980)    θ_4^(10980)    θ_5^(10980)
10990       θ_1^(10990)    θ_2^(10990)    θ_3^(10990)    θ_4^(10990)    θ_5^(10990)
11000       θ_1^(11000)    θ_2^(11000)    θ_3^(11000)    θ_4^(11000)    θ_5^(11000)


Once the Markov chain reaches its stationary distribution, the parameter vector θ^{(t)} is considered to be sampled from the joint posterior distribution, i.e., each row of Table 1 is an observation from the joint posterior distribution. What is the marginal posterior distribution of a parameter? We do not have such a distribution explicitly, but we have a sample from it. Take the first parameter as an example: if we look at the column headed by θ_1 in Table 1 and ignore all other columns, the posterior sample formed by this column is a sample drawn from the marginal posterior distribution of θ_1. This is equivalent to the situation where we collect a sample with two variables, X and Y. It is a joint sample if both X and Y are considered, but if we only look at X and ignore Y, the sample is a marginal sample of X. The MCMC algorithm does not explicitly infer the marginal distribution; rather, it provides a sample of observations drawn from the target distribution.
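The burn-in and thinning scheme of Table 1 can be sketched with a toy Gibbs sampler for a normal model with unknown mean and variance; the fully conditional distributions used here (normal for μ, scaled inverse chi-square for σ² under a flat prior) are standard results, but the model and all numbers are our own illustration, not PROC QTL output.

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(5.0, 2.0, size=100)
n = len(y)

burn_in, thin, n_saved = 1000, 10, 1000
total_iter = burn_in + thin * n_saved       # 11000 iterations, as in the text

mu, s2 = 0.0, 1.0                           # arbitrary initial values
saved = []
for t in range(1, total_iter + 1):
    # fully conditional posteriors under the prior p(mu, s2) proportional to 1/s2
    mu = rng.normal(np.mean(y), np.sqrt(s2 / n))   # mu | s2, y
    ss = float(np.sum((y - mu) ** 2))
    s2 = ss / rng.chisquare(n)                     # s2 | mu, y (scaled inverse chi-square)
    if t > burn_in and t % thin == 0:              # burn-in 1000, thinning 1:10
        saved.append((mu, s2))

sample = np.array(saved)    # (11000 - 1000) / 10 = 1000 posterior observations
```

Each row of `sample` corresponds to one row of Table 1, and each column, viewed alone, is a draw from that parameter's marginal posterior.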

The MCMC algorithm described above is called the Gibbs sampler. In order to perform Gibbs sampling, we must know explicitly what the distribution p(θ_k|y, θ_{−k}^{(t)}) is and know how to draw a random observation from that distribution. Sometimes we do not know what p(θ_k|y, θ_{−k}^{(t)}) is, or do not know an easy way to draw a variable from it. In this case, we must use an alternative method, called the Metropolis-Hastings algorithm (HASTINGS 1970; METROPOLIS et al. 1953). This algorithm is an accept-reject method for drawing random variables. With this method, we first draw a variable from a proposal distribution q(θ_k), which should be as close to p(θ_k|y, θ_{−k}^{(t)}) as possible. We then decide whether the draw should be accepted or rejected. If we reject the draw, the old value of θ_k is carried over to the next iteration; otherwise, the old value is replaced by the new draw. Let θ_k^{(t)} be the old value of parameter θ_k and θ_k^* be the new value drawn from the proposal distribution q(θ_k^*). The acceptance probability is defined as

$$ \alpha = \min\left\{1,\ \frac{p(\theta_k^{*}|y,\theta_{-k}^{(t)})\,q(\theta_k^{(t)}|\theta_k^{*})}{p(\theta_k^{(t)}|y,\theta_{-k}^{(t)})\,q(\theta_k^{*}|\theta_k^{(t)})}\right\} \qquad (4.5) $$

If θ_k^* is accepted, we let θ_k^{(t+1)} = θ_k^*; otherwise, θ_k^{(t+1)} = θ_k^{(t)}. Depending on the acceptance rate, the autocorrelation may be high. The ideal acceptance rate should be around 0.6. It can be shown that if the proposal distribution is the fully conditional posterior distribution, then α = 1 and the acceptance rate is 100%. Therefore, the Gibbs sampler is a special case of the Metropolis-Hastings algorithm (TIERNEY 1994).

Another special case of the Metropolis-Hastings algorithm is the Metropolis algorithm, also called the random walk algorithm. With the random walk algorithm, the proposal distribution is symmetric, i.e., q(θ_k^{(t)}|θ_k^*) = q(θ_k^*|θ_k^{(t)}). Therefore, the acceptance probability is simply the posterior ratio. For example, let θ_k^* ~ U(θ_k^{(t)} − δ, θ_k^{(t)} + δ) be a uniform random variable centered at the old value of the parameter, where δ is a small constant. The proposal distribution is q(θ_k^*|θ_k^{(t)}) = 1/(2δ). Its counterpart is q(θ_k^{(t)}|θ_k^*) = 1/(2δ), and thus q(θ_k^*|θ_k^{(t)}) = q(θ_k^{(t)}|θ_k^*). This random walk MCMC can be used for some parameters (e.g., regression coefficients) but not others (e.g., the residual variance).
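A minimal random-walk Metropolis sketch using the uniform proposal described above; the target density and the tuning constant δ are toy choices of ours, not anything prescribed by PROC QTL.

```python
import numpy as np

def log_target(theta):
    """Log of an unnormalized toy target density: N(3, 1.5^2)."""
    return -0.5 * ((theta - 3.0) / 1.5) ** 2

rng = np.random.default_rng(5)
delta = 1.0                      # half-width of the uniform proposal window
theta = 0.0                      # arbitrary starting value
n_iter = 20000
draws, accepted = [], 0
for _ in range(n_iter):
    theta_new = theta + rng.uniform(-delta, delta)   # symmetric random-walk proposal
    # acceptance probability reduces to the posterior ratio, since q is symmetric
    alpha = min(1.0, float(np.exp(log_target(theta_new) - log_target(theta))))
    if rng.uniform() < alpha:
        theta, accepted = theta_new, accepted + 1
    draws.append(theta)

draws = np.array(draws[2000:])   # discard a burn-in of 2000 iterations
accept_rate = accepted / n_iter
```

Shrinking δ raises the acceptance rate but increases the autocorrelation, which is exactly the tuning trade-off discussed in the text.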

PROC QTL uses the Gibbs sampler, the random walk algorithm or the general Metropolis-Hastings algorithm to sample a parameter. This is different from PROC MCMC, which uses the random walk algorithm to sample all parameters. Because of this, PROC MCMC is so general that it can be used for almost any problem, including QTL mapping. However, PROC MCMC is not effective for QTL mapping because the sample size and the number of markers are usually too large for the MCMC procedure to complete the analysis within a reasonable time frame. The majority of the time spent by the MCMC procedure goes to tuning the parameters of the proposal distribution to obtain the optimal acceptance rate rather than to the actual sampling process. PROC QTL, however, does not manipulate the tuning parameters and is thus much faster than the MCMC procedure.

DIAGNOSES OF CONVERGENCE OF MARKOV CHAIN

The concept of convergence in MCMC differs from convergence in other numerical iteration algorithms. The MCMC iterations converge to a joint distribution rather than to a constant vector. We cannot collect the posterior sample before the chain converges to the stationary distribution. Therefore, an assessment of convergence is needed.

The simplest method for convergence assessment is to visualize the trace plots for a few parameters. The trace plot for a parameter is the plot of the sample value against the iteration number. Before the chain reaches its stationary distribution, the trace varies chaotically. Once it reaches the stationary distribution, the change becomes stabilized.

Figure 2 shows the trace plot for a particular parameter. The chain appears to have converged after 100 iterations; therefore, a burn-in period of 100 seems to be sufficient for this parameter. Strictly speaking, all parameters (not just a few) must reach their stationary distributions before we can collect observations to form the posterior sample. In practice, this may not be realistic because some parameters may never converge or may be trapped at some fixed values. As long as these parameters are not important, the results can still be used for inference. Bayesian theoreticians may not agree with this statement, but our experience with simulated data analyses does support it. A better assessment of convergence is to run multiple independent chains and monitor the trace plots of a few parameters for all chains. Figure 3 shows the trace plots of a parameter for three independent chains. The convergence appears to happen after 200 iterations.

Figure 2. The posterior TAD panels (trace, autocorrelation and density) for parameter beta (β).

Figure 3. The trace plot of a parameter for three independent chains.

The multiple-chain MCMC algorithm also provides an opportunity to test convergence statistically, using the Gelman-Rubin diagnostic test (GELMAN and RUBIN 1992). The test statistic is analogous to ANOVA, where the between-chain variance is compared with the within-chain variance. A between-chain variance substantially larger than the within-chain variance indicates that convergence has not been reached. Geweke's (1992) diagnostic test can also be used to assess convergence. This test does not require multiple chains, only a single long chain. We select x% of the posterior sample in the early stage of the chain and x% of the posterior sample in the late stage of the chain and compare the means of the two subsamples. If the means are not significantly different, the chain may have converged. PROC QTL does not provide these convergence diagnostic tests, but it does provide the entire posterior sample, from which the diagnostic test statistics can be obtained by the investigators using other programs.

Two additional diagnostic analyses are commonly used to evaluate the mixing properties of the sampler: the autocorrelation and the effective posterior sample size. Autocorrelation is a measure of the dependency of consecutive observations of the posterior sample; a high autocorrelation indicates high dependency among the observations within the posterior sample. The autocorrelation of lag h (h < n) is defined as

$$ \rho(\theta_k|h) = \frac{\mathrm{cov}(\theta_k|h)}{\mathrm{cov}(\theta_k|0)}, \quad h < n \qquad (4.6) $$

where n is the posterior sample size and

$$ \mathrm{cov}(\theta_k|h) = \frac{1}{n-h}\sum_{t=1}^{n-h}\big(\theta_k^{(t)} - \bar\theta_k\big)\big(\theta_k^{(t+h)} - \bar\theta_k\big), \quad 0 \le h < n \qquad (4.7) $$

is the covariance between the t-th observation and the (t+h)-th observation for parameter θ_k. The denominator of equation (4.6) is the covariance at h = 0, which is the variance of the posterior sample.

The effective sample size is determined by the autocorrelation; a high autocorrelation reduces the effective sample size. The functional relationship between the effective sample size and the autocorrelation is

$$ \mathrm{ESS}(\theta_k) = \frac{n}{1 + 2\sum_{h=1}^{\infty}\rho(\theta_k|h)} = \frac{n}{\tau_k} \qquad (4.8) $$

where τ_k is referred to as the autocorrelation time. In practice, the infinite sum is truncated at a cutoff point of h beyond which ρ(θ_k|h) is essentially zero. You can choose the cutoff point, for example, as the smallest h such that ρ(θ_k|h) < 0.05.
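Equations (4.6)-(4.8) can be sketched as follows; the AR(1) series below stands in for a posterior sample with known autocorrelation, and the 0.05 cutoff follows the text. The series and its parameters are our own illustration.

```python
import numpy as np

def autocorr(x, h):
    """Lag-h autocorrelation rho(h) = cov(h) / cov(0), equations (4.6)-(4.7)."""
    n = len(x)
    xbar = x.mean()
    cov_h = np.sum((x[:n - h] - xbar) * (x[h:] - xbar)) / (n - h)
    cov_0 = np.sum((x - xbar) ** 2) / n
    return cov_h / cov_0

def effective_sample_size(x, cutoff=0.05):
    """ESS = n / (1 + 2 * sum of rho(h)), truncated where rho(h) < cutoff, eq. (4.8)."""
    n = len(x)
    s = 0.0
    for h in range(1, n // 2):
        r = autocorr(x, h)
        if r < cutoff:           # the infinite sum is cut off here
            break
        s += r
    return n / (1.0 + 2.0 * s)

# an AR(1) chain with lag-1 autocorrelation 0.7 stands in for a posterior sample
rng = np.random.default_rng(6)
rho, n = 0.7, 20000
x = np.empty(n)
x[0] = 0.0
for t in range(1, n):
    x[t] = rho * x[t - 1] + rng.normal()
ess = effective_sample_size(x)
```

For this chain the autocorrelation time is roughly (1 + ρ)/(1 − ρ) ≈ 5.7, so the 20000 draws are worth only a few thousand independent observations.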

POST MCMC ANALYSIS

By default, PROC QTL generates a posterior sample that contains the observations saved after the burn-in period and thinning. Users can use other SAS procedures to perform post-MCMC analysis on the sampled parameters. Alternatively, users may run PROC QTL again using the posterior sample as the input dataset to report the summary statistics of the posterior sample. The summary statistics reported by PROC QTL only contain the posterior sample size, the posterior mean and the posterior standard deviation. More summary statistics will be added in a later version of PROC QTL, including the equal-tail credible interval and the highest posterior density credible interval. The α-level equal-tail credible interval (a, b) is defined by

$$ \Pr(\theta_k < a) = \Pr(\theta_k > b) = \alpha/2 \qquad (4.9) $$

while the highest posterior density credible interval is defined by

$$ \Pr(a \le \theta_k \le b) = 1 - \alpha \qquad (4.10) $$

such that b − a is the shortest interval among all choices of a and b.
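A sketch of the two credible intervals in equations (4.9) and (4.10), computed from a posterior sample. The HPD interval is found as the shortest window containing a fraction 1 − α of the sorted sample, a common sample-based approximation; this is our own illustration, not necessarily how a future version of PROC QTL will compute it.

```python
import numpy as np

def equal_tail_interval(sample, alpha=0.05):
    """Equal-tail interval: Pr(theta < a) = Pr(theta > b) = alpha / 2, eq. (4.9)."""
    return np.quantile(sample, alpha / 2), np.quantile(sample, 1 - alpha / 2)

def hpd_interval(sample, alpha=0.05):
    """Shortest interval containing a fraction 1 - alpha of the sample, eq. (4.10)."""
    s = np.sort(sample)
    m = int(np.ceil((1 - alpha) * len(s)))    # number of points inside the interval
    widths = s[m - 1:] - s[:len(s) - m + 1]   # width of every m-point window
    k = int(np.argmin(widths))                # left edge of the shortest window
    return s[k], s[k + m - 1]

rng = np.random.default_rng(7)
posterior = rng.normal(0.0, 1.0, size=100000)  # stand-in posterior sample
a_eq, b_eq = equal_tail_interval(posterior)
a_hpd, b_hpd = hpd_interval(posterior)
```

For a symmetric posterior the two intervals nearly coincide; for a skewed posterior the HPD interval is strictly shorter.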


INTERVAL MAPPING FOR NORMALLY DISTRIBUTED TRAITS

Interval mapping was originally developed by Lander and Botstein (1989) and has been further modified by numerous authors. Interval mapping revolutionized genetic mapping because it allows us to pinpoint the location of a QTL. In each of the four sections that follow, we will introduce one specific statistical method of interval mapping based on an F2 design.

The maximum likelihood (ML) method of interval mapping (Lander and Botstein 1989) is the optimal method for interval mapping. The least squares (LS) method (Haley and Knott 1992) is a simplified approximation of Lander and Botstein (1989). The iteratively reweighted least squares (IRLS) method (Xu 1998b) further improves on the least squares method. More recently, Feenstra et al. (2006) developed an estimating equation (EE) method for QTL mapping, an extension of IRLS with improved performance, and Han and Xu (2008) developed a Fisher scoring algorithm (FISHER) for QTL mapping. Both the EE and FISHER algorithms maximize the same likelihood function and thus generate identical results.

A special comment on interval mapping is warranted here in the introduction. Interval mapping is efficient for mapping a single QTL. However, it is frequently used for mapping multiple QTL, and the results are not too bad as long as the QTL are not too closely linked (Wang et al. 2005b). With the advent of more advanced statistical methods for mapping multiple QTL, such as composite interval mapping, multiple interval mapping and Bayesian mapping, interval mapping may seem out of date. What is the justification for using interval mapping when multiple-QTL models and methodology are already available? Our experience indicates that interval mapping is still the most reliable method for QTL mapping in most situations. Compared with interval mapping, the advanced statistical methods tend to be more sensitive to the experimental designs, the data structures, the initial values of the parameters and other factors such as model assumptions. The advanced methods require considerable experience from the investigators to make them work properly, while interval mapping does not. When applying advanced statistical methods, e.g., the Bayesian method, to analyze real data, we almost always compare the results with those of interval mapping to eliminate any suspicious artifacts. If the results are drastically different from those of interval mapping, we always double-check the models, the methods and the programs. In most situations, the advanced methods are supposed to improve on interval mapping, not to change the results in a fundamental way. Having said that, we are not discouraging the use of advanced models and methodologies; we simply recommend using them with extra caution to prevent potential artifacts.


In this chapter, we introduce the methods in order of simplicity rather than in the chronological order of their development. Therefore, the methods are introduced in the following order: LS, IRLS, FISHER and ML.

SIMPLE LEAST SQUARES METHOD

The LS method was introduced by Haley and Knott (1992) with the aim of improving computational speed. The statistical model for the phenotypic value of the j-th individual is

$$ y_j = X_j\beta + Z_j\gamma + \varepsilon_j \qquad (5.1) $$

where β is a p × 1 vector of model effects that are irrelevant to QTL effects, X_j is a 1 × p known design vector, γ = [a, d]^T is a 2 × 1 vector of QTL effects for a putative locus (a for the additive effect and d for the dominance effect), and Z_j is a 1 × 2 genotype indicator variable defined as

$$ Z_j = \begin{cases} H_1 & \text{for } A_1A_1 \\ H_2 & \text{for } A_1A_2 \\ H_3 & \text{for } A_2A_2 \end{cases} \qquad (5.2) $$

where H_k for k = 1, 2, 3 is the k-th row of the matrix

$$ H = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -1 & 0 \end{bmatrix} \qquad (5.3) $$

The residual error ε_j is assumed to be an N(0, σ²) variable. Although a normal distribution for ε_j is not a required assumption for the LS method, it is required for the ML method. It is important to include non-QTL effects in the model to keep the residual error variance as small as possible. For example, location and year effects are common in replicated experiments. These effects are not related to QTL but will contribute to the residual error if not included in the model. If there is no such non-QTL effect to consider, as in a well-designed experiment, β will be a single parameter (the intercept) and X_j will be unity for all j = 1, …, n.

With interval mapping, the QTL genotype is never known unless the putative QTL position overlaps with a fully informative marker. Therefore, Haley and Knott (1992) suggested replacing the unknown Z_j by the expectation of Z_j conditional on the flanking marker genotypes. Let p_j(1), p_j(0) and p_j(−1) be the conditional probabilities of the three genotypes given the flanking marker information. The LS model of Haley and Knott (1992) is

$$ y_j = X_j\beta + U_j\gamma + e_j \qquad (5.4) $$

where

$$ U_j = E(Z_j) = p_j(1)H_1 + p_j(0)H_2 + p_j(-1)H_3 \qquad (5.5) $$

is the conditional expectation of Z_j. The residual error e_j (different from ε_j) is assumed to remain normal with mean zero and variance σ², although this assumption is in fact violated (see the next section). The least squares estimates of β and γ are

$$ \begin{bmatrix} \hat\beta \\ \hat\gamma \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^{n} X_j^T X_j & \sum_{j=1}^{n} X_j^T U_j \\ \sum_{j=1}^{n} U_j^T X_j & \sum_{j=1}^{n} U_j^T U_j \end{bmatrix}^{-1} \begin{bmatrix} \sum_{j=1}^{n} X_j^T y_j \\ \sum_{j=1}^{n} U_j^T y_j \end{bmatrix} \qquad (5.6) $$

and the estimated residual error variance is

$$ \hat\sigma^2 = \frac{1}{n-p-2}\sum_{j=1}^{n}\big(y_j - X_j\hat\beta - U_j\hat\gamma\big)^2 \qquad (5.7) $$

The variance-covariance matrix of the estimated parameters is

$$ \mathrm{var}\begin{bmatrix} \hat\beta \\ \hat\gamma \end{bmatrix} = \hat\sigma^2\begin{bmatrix} \sum_{j=1}^{n} X_j^T X_j & \sum_{j=1}^{n} X_j^T U_j \\ \sum_{j=1}^{n} U_j^T X_j & \sum_{j=1}^{n} U_j^T U_j \end{bmatrix}^{-1} \qquad (5.8) $$

which is a (p+2) × (p+2) matrix. Let

$$ V = \begin{bmatrix} \mathrm{var}(\hat a) & \mathrm{cov}(\hat a,\hat d) \\ \mathrm{cov}(\hat a,\hat d) & \mathrm{var}(\hat d) \end{bmatrix} \qquad (5.9) $$

be the 2 × 2 lower diagonal block of matrix (5.8). The standard errors of the estimated additive and dominance effects are the square roots of the diagonal elements of matrix (5.9).

We can use either the F-test or the W-test statistic to test the hypothesis H_0: γ = 0. The W-test statistic is

$$ W = \hat\gamma^T V^{-1}\hat\gamma = \begin{bmatrix} \hat a & \hat d \end{bmatrix}\begin{bmatrix} \mathrm{var}(\hat a) & \mathrm{cov}(\hat a,\hat d) \\ \mathrm{cov}(\hat a,\hat d) & \mathrm{var}(\hat d) \end{bmatrix}^{-1}\begin{bmatrix} \hat a \\ \hat d \end{bmatrix} \qquad (5.10) $$

The corresponding F-test statistic is F = W/2 in the F2 design. The likelihood ratio test statistic can also be applied if we assume e_j ~ N(0, σ²) for all j = 1, …, n. The log likelihood function for the full model is

$$ L_1 = -\frac{n}{2}\ln(\hat\sigma^2) - \frac{1}{2\hat\sigma^2}\sum_{j=1}^{n}\big(y_j - X_j\hat\beta - U_j\hat\gamma\big)^2 = -\frac{n}{2}\big[\ln(\hat\sigma^2) + 1\big] \qquad (5.11) $$

The log likelihood under the reduced model H_0: γ = 0 is

$$ L_0 = -\frac{n}{2}\ln(\tilde\sigma^2) - \frac{1}{2\tilde\sigma^2}\sum_{j=1}^{n}\big(y_j - X_j\tilde\beta\big)^2 = -\frac{n}{2}\big[\ln(\tilde\sigma^2) + 1\big] \qquad (5.12) $$

where

$$ \tilde\beta = \Big(\sum_{j=1}^{n} X_j^T X_j\Big)^{-1}\sum_{j=1}^{n} X_j^T y_j \qquad (5.13) $$

and

$$ \tilde\sigma^2 = \frac{1}{n-p}\sum_{j=1}^{n}\big(y_j - X_j\tilde\beta\big)^2 \qquad (5.14) $$

The likelihood ratio test statistic is

$$ \lambda = -2(L_0 - L_1) \qquad (5.15) $$
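The LS (Haley-Knott) procedure can be sketched end to end on simulated data; the genotype probabilities below are hypothetical stand-ins for the conditional probabilities p_j(1), p_j(0), p_j(−1) that would come from flanking markers, and all effect sizes are toy values of ours.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 300
H = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])  # rows H1, H2, H3, eq. (5.3)

# simulate F2 genotypes (1:2:1) and hypothetical conditional probabilities
geno = rng.choice(3, size=n, p=[0.25, 0.5, 0.25])
probs = np.full((n, 3), 0.05)                 # p_j(1), p_j(0), p_j(-1)
probs[np.arange(n), geno] = 0.90              # stand-in for flanking-marker info

Z = H[geno]                                   # true (unobserved) indicator Z_j
beta0, a, d = 10.0, 1.0, 0.5                  # intercept and QTL effects (toy values)
y = beta0 + Z @ np.array([a, d]) + rng.normal(0.0, 1.0, size=n)

# Haley-Knott: replace Z_j by U_j = E(Z_j), eq. (5.5), and run least squares
U = probs @ H
X = np.column_stack([np.ones(n), U])          # [X_j, U_j] with X_j = 1 (intercept only)
est = np.linalg.lstsq(X, y, rcond=None)[0]
beta_hat, a_hat, d_hat = est

# likelihood ratio test of H0: gamma = 0, equations (5.11)-(5.15)
s2_full = np.mean((y - X @ est) ** 2)         # full-model residual variance
s2_null = np.mean((y - y.mean()) ** 2)        # reduced-model residual variance
lrt = n * (np.log(s2_null) - np.log(s2_full)) # lambda = -2(L0 - L1)
```

In genome scanning this computation would be repeated at every putative position, with the λ profile plotted against map position.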

WEIGHTED LEAST SQUARES

Xu (1995) realized that the LS method is flawed because the residual variance is heterogeneous after the unknown Z_j is replaced by its conditional expectation U_j. The conditional variance of Z_j given marker information varies from one individual to another, and it contributes to the residual variance. Xu (1998a, b) rewrote the exact model

$$ y_j = X_j\beta + Z_j\gamma + \varepsilon_j \qquad (5.16) $$

as

$$ y_j = X_j\beta + U_j\gamma + (Z_j - U_j)\gamma + \varepsilon_j \qquad (5.17) $$

which differs from Haley and Knott's (1992) model by the term (Z_j − U_j)γ. Since Z_j is not observable, this additional term is merged into the residual error. Let

$$ e_j = (Z_j - U_j)\gamma + \varepsilon_j \qquad (5.18) $$

be the new residual error. Haley and Knott's (1992) model can then be rewritten as

$$ y_j = X_j\beta + U_j\gamma + e_j \qquad (5.19) $$

Although we assume ε_j ~ N(0, σ²), this does not validate the normality assumption for e_j. The expectation of e_j is

$$ E(e_j) = E\big[(Z_j - U_j)\gamma\big] + E(\varepsilon_j) = 0 \qquad (5.20) $$

The variance of e_j is

$$ \mathrm{var}(e_j) = \mathrm{var}(Z_j\gamma) + \sigma^2 = \gamma^T\Sigma_j\gamma + \sigma^2 = \big(\sigma^{-2}\gamma^T\Sigma_j\gamma + 1\big)\sigma^2 \qquad (5.21) $$

where Σ_j = var(Z_j), defined as the conditional variance matrix of Z_j given the flanking marker information. The explicit form of Σ_j is

$$ \Sigma_j = E(Z_j^T Z_j) - E^T(Z_j)\,E(Z_j) \qquad (5.22) $$

where

$$ E(Z_j^T Z_j) = p_j(1)H_1^T H_1 + p_j(0)H_2^T H_2 + p_j(-1)H_3^T H_3 $$

and

$$ E(Z_j) = U_j = p_j(1)H_1 + p_j(0)H_2 + p_j(-1)H_3 \qquad (5.23) $$

Let

$$ \sigma_j^2 = \big(\sigma^{-2}\gamma^T\Sigma_j\gamma + 1\big)\sigma^2 = W_j^{-1}\sigma^2 \qquad (5.24) $$

where

$$ W_j = \big(\sigma^{-2}\gamma^T\Sigma_j\gamma + 1\big)^{-1} \qquad (5.25) $$

is the weight variable for the j-th observation. The weighted least squares estimates of the parameters are

$$ \begin{bmatrix} \hat\beta \\ \hat\gamma \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^{n} W_j X_j^T X_j & \sum_{j=1}^{n} W_j X_j^T U_j \\ \sum_{j=1}^{n} W_j U_j^T X_j & \sum_{j=1}^{n} W_j U_j^T U_j \end{bmatrix}^{-1} \begin{bmatrix} \sum_{j=1}^{n} W_j X_j^T y_j \\ \sum_{j=1}^{n} W_j U_j^T y_j \end{bmatrix} \qquad (5.26) $$

and

$$ \hat\sigma^2 = \frac{1}{n-p-2}\sum_{j=1}^{n} W_j\big(y_j - X_j\hat\beta - U_j\hat\gamma\big)^2 \qquad (5.27) $$

Since W_j is a function of γ and σ², iterations are required. The iteration process is demonstrated below.

1. Initialize γ and σ².
2. Update β and γ using equation (5.26).
3. Update σ² using equation (5.27).
4. Repeat steps 2 and 3 until a convergence criterion is satisfied.

The iteration process is very fast, usually taking fewer than five iterations to converge. Since the weight is not a constant (it is a function of the parameters), it must be updated repeatedly. Therefore, the weighted least squares method is also called iteratively reweighted least squares (IRLS). The few cycles of iteration make the results of IRLS very close to those of the maximum likelihood method (to be introduced later). A nice property of IRLS is that the variance-covariance matrix of the estimated parameters is automatically given as a by-product of the iteration process. This matrix is

$$ \mathrm{var}\begin{bmatrix} \hat\beta \\ \hat\gamma \end{bmatrix} = \hat\sigma^2\begin{bmatrix} \sum_{j=1}^{n} \hat W_j X_j^T X_j & \sum_{j=1}^{n} \hat W_j X_j^T U_j \\ \sum_{j=1}^{n} \hat W_j U_j^T X_j & \sum_{j=1}^{n} \hat W_j U_j^T U_j \end{bmatrix}^{-1} \qquad (5.28) $$

As a result, the F- or W-test statistic can be used for the significance test. As with the least squares method, a likelihood ratio test statistic can also be established. The L_0 under the null model is the same as that described in the least squares section. The L_1 under the alternative model is

$$ L_1 = -\frac{n}{2}\ln(\hat\sigma^2) + \frac{1}{2}\sum_{j=1}^{n}\ln(\hat W_j) - \frac{1}{2\hat\sigma^2}\sum_{j=1}^{n}\hat W_j\big(y_j - X_j\hat\beta - U_j\hat\gamma\big)^2 = -\frac{n}{2}\big[\ln(\hat\sigma^2) + 1\big] + \frac{1}{2}\sum_{j=1}^{n}\ln(\hat W_j) \qquad (5.29) $$
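The IRLS iteration (equations (5.22)-(5.27)) can be sketched as follows, reusing the same kind of toy simulation as in the LS section; the genotype probabilities are hypothetical, and an intercept-only X_j is assumed.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 300
H = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])  # eq. (5.3)

# hypothetical genotype probabilities and simulated phenotypes
geno = rng.choice(3, size=n, p=[0.25, 0.5, 0.25])
probs = np.full((n, 3), 0.05)
probs[np.arange(n), geno] = 0.90
y = 10.0 + H[geno] @ np.array([1.0, 0.5]) + rng.normal(0.0, 1.0, size=n)

U = probs @ H                                  # U_j = E(Z_j), eq. (5.5)
# Sigma_j = E(Z_j' Z_j) - E'(Z_j) E(Z_j), equations (5.22)-(5.23)
EZtZ = np.einsum("jk,ka,kb->jab", probs, H, H)
Sigma = EZtZ - np.einsum("ja,jb->jab", U, U)

X = np.column_stack([np.ones(n), U])           # intercept-only X_j plus U_j
theta = np.zeros(3)                            # (beta, a, d); gamma starts at zero
s2 = float(np.var(y))                          # step 1: initialize gamma and sigma^2
for _ in range(20):                            # steps 2-4: iterate to convergence
    gamma = theta[1:]
    # W_j = (gamma' Sigma_j gamma / sigma^2 + 1)^(-1), eq. (5.25)
    W = 1.0 / (np.einsum("a,jab,b->j", gamma, Sigma, gamma) / s2 + 1.0)
    XtW = (X * W[:, None]).T
    theta = np.linalg.solve(XtW @ X, XtW @ y)  # weighted LS update, eq. (5.26)
    resid = y - X @ theta
    s2 = float(np.sum(W * resid ** 2) / (n - 3))  # eq. (5.27) with p = 1
```

With γ initialized at zero all weights start at one, so the first pass reproduces the plain LS estimates and subsequent passes downweight individuals with uncertain genotypes.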

FISHER SCORING ALGORITHM

The weighted least squares solution described in the previous section does not maximize the log likelihood function (5.29); it can be shown to maximize equation (5.29) only with W_j treated as a constant. The fact that W_j is a function of the parameters makes the weighted least squares estimates suboptimal. The optimal solution is obtained by maximizing (5.29) fully, without assuming that W_j is a constant.

Recall that the linear model for y_j is

$$ y_j = X_j\beta + U_j\gamma + e_j $$

where the residual error e_j = (Z_j − U_j)γ + ε_j has zero mean and variance

$$ \sigma_j^2 = \big(\sigma^{-2}\gamma^T\Sigma_j\gamma + 1\big)\sigma^2 = W_j^{-1}\sigma^2 \qquad (5.30) $$

If we assume e_j ~ N(0, σ_j²), we can construct the following log likelihood function,

$$ L(\theta) = -\frac{n}{2}\ln(\sigma^2) + \frac{1}{2}\sum_{j=1}^{n}\ln(W_j) - \frac{1}{2\sigma^2}\sum_{j=1}^{n} W_j\big(y_j - X_j\beta - U_j\gamma\big)^2 \qquad (5.31) $$

where θ = {β, γ, σ²} is the vector of parameters. The maximum likelihood solution for this likelihood function is hard to obtain because W_j is not a constant but a function of the parameters. The Newton-Raphson algorithm may be adopted, but it requires the second partial derivatives of the log likelihood function with respect to the parameters, which are very complicated. In addition, the Newton-Raphson algorithm often misbehaves when the dimensionality of θ is high. We now introduce the Fisher scoring algorithm for finding the MLE of θ (Han and Xu 2008). The method requires the first partial derivatives of L(θ) with respect to the parameters, called the score vector and denoted by S(θ), and the information matrix, denoted by I(θ). The score vector has the following form,


S(\theta) = \begin{bmatrix} \frac{\partial L}{\partial\beta} \\ \frac{\partial L}{\partial\gamma} \\ \frac{\partial L}{\partial\sigma^2} \end{bmatrix} = \begin{bmatrix} \frac{1}{\sigma^2}\sum_{j=1}^{n} X_j^T W_j \left(y_j - \mu_j\right) \\ \frac{1}{\sigma^2}\sum_{j=1}^{n} U_j^T W_j \left(y_j - \mu_j\right) - \frac{1}{\sigma^2}\sum_{j=1}^{n} W_j \Sigma_j\gamma + \frac{1}{\sigma^4}\sum_{j=1}^{n} W_j^2 \left(y_j - \mu_j\right)^2 \Sigma_j\gamma \\ \frac{1}{2\sigma^4}\sum_{j=1}^{n} W_j^2 \left(y_j - \mu_j\right)^2 - \frac{1}{2\sigma^2}\sum_{j=1}^{n} W_j \end{bmatrix} \quad (5.32)

where

\mu_j = X_j\beta + U_j\gamma \quad (5.33)

The information matrix is given below

I(\theta) = \begin{bmatrix} \frac{1}{\sigma^2}\sum_{j=1}^{n} X_j^T W_j X_j & \frac{1}{\sigma^2}\sum_{j=1}^{n} X_j^T W_j U_j & 0 \\ \frac{1}{\sigma^2}\sum_{j=1}^{n} U_j^T W_j X_j & \frac{1}{\sigma^2}\sum_{j=1}^{n} U_j^T W_j U_j + \frac{2}{\sigma^4}\sum_{j=1}^{n} W_j^2 \Sigma_j\gamma\gamma^T\Sigma_j & \frac{1}{\sigma^4}\sum_{j=1}^{n} W_j^2 \Sigma_j\gamma \\ 0 & \frac{1}{\sigma^4}\sum_{j=1}^{n} W_j^2 \gamma^T\Sigma_j & \frac{1}{2\sigma^4}\sum_{j=1}^{n} W_j^2 \end{bmatrix} \quad (5.34)

The Fisher scoring algorithm is implemented using the following iteration equation,

\theta^{(t+1)} = \theta^{(t)} + I^{-1}(\theta^{(t)})\, S(\theta^{(t)}) \quad (5.35)

where \theta^{(t)} is the parameter value at iteration t and \theta^{(t+1)} is the updated value. Once the iteration process converges, the variance-covariance matrix of the estimated parameters is automatically given, which is

\mathrm{var}(\hat{\theta}) = I^{-1}(\hat{\theta}) \quad (5.36)

The detailed expression of this matrix is

\mathrm{var}(\hat{\theta}) = \begin{bmatrix} \frac{1}{\hat{\sigma}^2}\sum_{j=1}^{n} X_j^T \hat{W}_j X_j & \frac{1}{\hat{\sigma}^2}\sum_{j=1}^{n} X_j^T \hat{W}_j U_j & 0 \\ \frac{1}{\hat{\sigma}^2}\sum_{j=1}^{n} U_j^T \hat{W}_j X_j & \frac{1}{\hat{\sigma}^2}\sum_{j=1}^{n} U_j^T \hat{W}_j U_j + \frac{2}{\hat{\sigma}^4}\sum_{j=1}^{n} \hat{W}_j^2 \Sigma_j\hat{\gamma}\hat{\gamma}^T\Sigma_j & \frac{1}{\hat{\sigma}^4}\sum_{j=1}^{n} \hat{W}_j^2 \Sigma_j\hat{\gamma} \\ 0 & \frac{1}{\hat{\sigma}^4}\sum_{j=1}^{n} \hat{W}_j^2 \hat{\gamma}^T\Sigma_j & \frac{1}{2\hat{\sigma}^4}\sum_{j=1}^{n} \hat{W}_j^2 \end{bmatrix}^{-1} \quad (5.37)

which can be compared with the variance-covariance matrix of the iteratively reweighted least squares estimate given in the previous section.
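To make the iteration concrete, here is a minimal numerical sketch of the Fisher scoring update (5.32)-(5.35) in Python, assuming an intercept-only X_j, a scalar QTL effect gamma, and precomputed conditional moments U_j = E(Z_j) and Sigma_j = var(Z_j). The simulated data and all variable names are illustrative, not PROC QTL output.

```python
import numpy as np

# Minimal sketch of the Fisher scoring iteration (5.32)-(5.35), assuming an
# intercept-only X_j, a scalar QTL effect gamma, and precomputed conditional
# moments U_j = E(Z_j) and Sig_j = var(Z_j).  Data simulated for illustration.
rng = np.random.default_rng(1)
n = 200
U = rng.choice([-1.0, 0.0, 1.0], size=n)      # conditional mean of Z_j
Sig = np.full(n, 0.2)                         # conditional variance of Z_j
y = 1.0 + 0.8 * U + rng.normal(0.0, 1.0, n)   # true beta = 1, gamma = 0.8

beta, gamma, s2 = 0.0, 0.0, float(np.var(y))
for _ in range(100):
    W = s2 / (gamma**2 * Sig + s2)            # W_j^{-1} = gamma^2*Sig_j/s2 + 1
    r = y - beta - U * gamma                  # residuals y_j - mu_j
    score = np.array([                        # score vector (5.32)
        np.sum(W * r) / s2,
        np.sum(U * W * r) / s2
            - gamma * np.sum(W * Sig) / s2
            + gamma * np.sum(W**2 * r**2 * Sig) / s2**2,
        np.sum(W**2 * r**2) / (2 * s2**2) - np.sum(W) / (2 * s2),
    ])
    info = np.array([                         # information matrix (5.34)
        [np.sum(W) / s2, np.sum(W * U) / s2, 0.0],
        [np.sum(W * U) / s2,
         np.sum(W * U**2) / s2 + 2 * gamma**2 * np.sum(W**2 * Sig**2) / s2**2,
         gamma * np.sum(W**2 * Sig) / s2**2],
        [0.0, gamma * np.sum(W**2 * Sig) / s2**2, np.sum(W**2) / (2 * s2**2)],
    ])
    step = np.linalg.solve(info, score)       # update (5.35)
    beta, gamma = beta + step[0], gamma + step[1]
    s2 = max(s2 + step[2], 1e-3)              # guard against overshooting below zero
    if np.max(np.abs(step)) < 1e-10:
        break
```

At convergence, the inverse of `info` evaluated at the estimates approximates the variance-covariance matrix of the estimated parameters, as in (5.36) and (5.37).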


MAXIMUM LIKELIHOOD METHOD

The maximum likelihood method is the optimal one compared with all other methods described in this chapter. Recall that the linear model for the phenotypic value y_j is

y_j = X_j\beta + Z_j\gamma + \epsilon_j \quad (5.38)

where \epsilon_j \sim N(0, \sigma^2) is assumed. The genotype indicator variable Z_j is a missing value because we cannot observe the genotype of a putative QTL. Rather than replacing Z_j by U_j, as done in the least squares and weighted least squares methods, the maximum likelihood method takes into consideration the mixture distribution of y_j. When the genotype of the putative QTL is observed, the probability density of y_j is

f_k(y_j) = \Pr(y_j \mid Z_j = H_k) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{1}{2\sigma^2}\left(y_j - X_j\beta - H_k\gamma\right)^2\right], \quad k = 1, 2, 3 \quad (5.39)

When flanking marker information is used, the conditional probability that Z_j = H_k is p_j(1) = \Pr(Z_j = H_1), p_j(0) = \Pr(Z_j = H_2) or p_j(-1) = \Pr(Z_j = H_3), respectively, for the three genotypes (A_1A_1, A_1A_2, A_2A_2). These probabilities are different from the Mendelian segregation ratio (0.25, 0.5, 0.25). They are the conditional probabilities given marker information and thus vary from one individual to another because different individuals may have different marker genotypes. Using the conditional probabilities as weights, we get the mixture distribution

f(y_j) = \sum_{k=1}^{3} p_j(2-k)\, f_k(y_j) \quad (5.40)

where

p_j(2-k) = \begin{cases} p_j(1) & \text{for } k = 1 \\ p_j(0) & \text{for } k = 2 \\ p_j(-1) & \text{for } k = 3 \end{cases} \quad (5.41)

p_j(2-k) is a special notation for the conditional probability and should not be interpreted as p_j times (2-k). The log likelihood function is

L(\theta) = \sum_{j=1}^{n} L_j(\theta) = \sum_{j=1}^{n} \ln f(y_j) \quad (5.42)

where L_j(\theta) = \ln f(y_j).
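As a concrete illustration, the observed log likelihood (5.40)-(5.42) can be evaluated directly. The sketch below assumes a scalar additive effect with genotype codes H = (1, 0, -1); all inputs are hypothetical.

```python
import numpy as np

# Sketch of the observed log likelihood (5.40)-(5.42) for a scalar additive
# effect with genotype codes H = (1, 0, -1); inputs are hypothetical.
def mixture_loglik(y, p, beta, gamma, s2):
    """y: (n,) phenotypes; p: (n, 3) prior genotype probabilities p_j(2-k)."""
    H = np.array([1.0, 0.0, -1.0])
    res = y[:, None] - beta - gamma * H[None, :]               # y_j - X_j beta - H_k gamma
    f = np.exp(-0.5 * res**2 / s2) / np.sqrt(2.0 * np.pi * s2) # f_k(y_j), (5.39)
    return float(np.sum(np.log(np.sum(p * f, axis=1))))        # sum_j ln f(y_j), (5.42)

# With a degenerate prior, the mixture reduces to a single normal density,
# so this equals -0.5*log(2*pi):
ll = mixture_loglik(np.array([0.0]), np.array([[1.0, 0.0, 0.0]]), 0.0, 0.0, 1.0)
```

This is the quantity maximized over \theta by the EM algorithm below, and the one used later to form the likelihood ratio statistic (5.66).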


The EM algorithm

The MLE of \theta can be obtained using any numerical algorithm, but the EM algorithm is generally preferable because we can take advantage of the mixture distribution. Derivation of the EM algorithm is not provided in this manual; here we simply give the result. Assuming that the genotypes of all individuals were observed, the maximum likelihood estimates of the parameters would be

\begin{bmatrix} \hat{\beta} \\ \hat{\gamma} \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^{n} X_j^T X_j & \sum_{j=1}^{n} X_j^T Z_j \\ \sum_{j=1}^{n} Z_j^T X_j & \sum_{j=1}^{n} Z_j^T Z_j \end{bmatrix}^{-1} \begin{bmatrix} \sum_{j=1}^{n} X_j^T y_j \\ \sum_{j=1}^{n} Z_j^T y_j \end{bmatrix} \quad (5.43)

and

\hat{\sigma}^2 = \frac{1}{n}\sum_{j=1}^{n}\left(y_j - X_j\hat{\beta} - Z_j\hat{\gamma}\right)^2 \quad (5.44)

The EM algorithm takes advantage of the above explicit solutions by substituting all entities containing the missing value Z_j with their posterior expectations, i.e.,

\begin{bmatrix} \beta \\ \gamma \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^{n} X_j^T X_j & \sum_{j=1}^{n} X_j^T E(Z_j) \\ \sum_{j=1}^{n} E(Z_j^T) X_j & \sum_{j=1}^{n} E(Z_j^T Z_j) \end{bmatrix}^{-1} \begin{bmatrix} \sum_{j=1}^{n} X_j^T y_j \\ \sum_{j=1}^{n} E(Z_j^T) y_j \end{bmatrix} \quad (5.45)

and

\sigma^2 = \frac{1}{n}\sum_{j=1}^{n} E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^2\right] \quad (5.46)

where the expectations are taken using the posterior probabilities of QTL genotypes. Let

p_j^*(2-k) = \frac{p_j(2-k)\, f_k(y_j)}{\sum_{k'=1}^{3} p_j(2-k')\, f_{k'}(y_j)} \quad (5.47)

be the posterior probability of the k-th genotype for k = 1, 2, 3. The posterior expectations are


E(Z_j) = \sum_{k=1}^{3} p_j^*(2-k)\, H_k

E(Z_j^T Z_j) = \sum_{k=1}^{3} p_j^*(2-k)\, H_k^T H_k

E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^2\right] = \sum_{k=1}^{3} p_j^*(2-k)\left(y_j - X_j\beta - H_k\gamma\right)^2 \quad (5.48)

Since f_k(y_j) is a function of the parameters, p_j^*(2-k) is also a function of the parameters. However, the parameters are unknown and they are the very quantities we want to find. Therefore, iterations are required. Here is the iteration process,

1. Initialize \theta^{(t)} for t = 0

2. Calculate the posterior expectations using equations (5.47) and (5.48)

3. Update parameters using equations (5.45) and (5.46)

4. Increment t by 1 and repeat steps 2 and 3 until a certain criterion of convergence is satisfied.

Once the iteration converges, the MLE of the parameters is \hat{\theta} = \theta^{(t)}, where t is the number of iterations required for convergence.
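The full cycle can be sketched numerically. The following Python fragment implements (5.43)-(5.48) for a scalar additive effect with genotype codes H = (1, 0, -1); the phenotypes and the prior genotype probabilities are simulated, so all numbers are illustrative.

```python
import numpy as np

# Sketch of the EM algorithm (5.43)-(5.48) with a scalar additive effect and
# genotype codes H = (1, 0, -1).  Data and priors are simulated for illustration.
rng = np.random.default_rng(2)
n, H = 300, np.array([1.0, 0.0, -1.0])
Ztrue = rng.integers(0, 3, size=n)              # true genotype index per individual
p = np.full((n, 3), 1.0)                        # prior probabilities p_j(2-k)
p[np.arange(n), Ztrue] = 4.0                    # markers favor the true genotype
p /= p.sum(axis=1, keepdims=True)
y = 2.0 + 1.0 * H[Ztrue] + rng.normal(0.0, 0.5, n)   # true beta = 2, gamma = 1

beta, gamma, s2 = float(y.mean()), 0.0, float(y.var())
for _ in range(200):
    # E-step: posterior probabilities (5.47) and expectations (5.48)
    res = y[:, None] - beta - gamma * H[None, :]
    f = np.exp(-0.5 * res**2 / s2) / np.sqrt(2.0 * np.pi * s2)
    post = p * f
    post /= post.sum(axis=1, keepdims=True)
    EZ, EZZ = post @ H, post @ H**2             # E(Z_j) and E(Z_j^2)
    # M-step: normal equations (5.45) and residual variance (5.46)
    A = np.array([[n, EZ.sum()], [EZ.sum(), EZZ.sum()]])
    b = np.array([y.sum(), EZ @ y])
    beta, gamma = np.linalg.solve(A, b)
    s2 = float(np.sum(post * (y[:, None] - beta - gamma * H[None, :])**2) / n)
```

Each pass recomputes the posterior weights with the current parameters and then solves the weighted normal equations, exactly the two alternating steps listed above.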

Variance-covariance matrix of \hat{\theta}

Unlike the weighted least squares and the Fisher scoring algorithms where the variance-covariance matrix of the estimated parameters is automatically given as a by-product of the iteration process, the EM algorithm requires an additional step to calculate this matrix. The method was developed by Louis (1982) and it requires the score vectors and the Hessian matrix for the complete-data log likelihood function rather than the actual observed log likelihood function. The complete-data log likelihood function is the log

likelihood function as if Z_j were observed, which is

L(\theta, Z) = \sum_{j=1}^{n} L_j(\theta, Z) \quad (5.49)

where

L_j(\theta, Z) = -\frac{1}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\left(y_j - X_j\beta - Z_j\gamma\right)^2 \quad (5.50)

The score vector is

S(\theta, Z) = \sum_{j=1}^{n} S_j(\theta, Z) \quad (5.51)


where

S_j(\theta, Z) = \begin{bmatrix} \frac{\partial L_j(\theta, Z)}{\partial\beta} \\ \frac{\partial L_j(\theta, Z)}{\partial\gamma} \\ \frac{\partial L_j(\theta, Z)}{\partial\sigma^2} \end{bmatrix} = \begin{bmatrix} \frac{1}{\sigma^2} X_j^T \left(y_j - X_j\beta - Z_j\gamma\right) \\ \frac{1}{\sigma^2} Z_j^T \left(y_j - X_j\beta - Z_j\gamma\right) \\ -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4}\left(y_j - X_j\beta - Z_j\gamma\right)^2 \end{bmatrix} \quad (5.52)

The second partial derivative (Hessian matrix) is

H(\theta, Z) = \sum_{j=1}^{n} H_j(\theta, Z) \quad (5.53)

where

H_j(\theta, Z) = \begin{bmatrix} \frac{\partial^2 L_j(\theta, Z)}{\partial\beta\,\partial\beta^T} & \frac{\partial^2 L_j(\theta, Z)}{\partial\beta\,\partial\gamma^T} & \frac{\partial^2 L_j(\theta, Z)}{\partial\beta\,\partial\sigma^2} \\ \frac{\partial^2 L_j(\theta, Z)}{\partial\gamma\,\partial\beta^T} & \frac{\partial^2 L_j(\theta, Z)}{\partial\gamma\,\partial\gamma^T} & \frac{\partial^2 L_j(\theta, Z)}{\partial\gamma\,\partial\sigma^2} \\ \frac{\partial^2 L_j(\theta, Z)}{\partial\sigma^2\,\partial\beta^T} & \frac{\partial^2 L_j(\theta, Z)}{\partial\sigma^2\,\partial\gamma^T} & \frac{\partial^2 L_j(\theta, Z)}{\partial(\sigma^2)^2} \end{bmatrix} \quad (5.54)

The six different blocks of the above matrix are

\frac{\partial^2 L_j(\theta, Z)}{\partial\beta\,\partial\beta^T} = -\frac{1}{\sigma^2} X_j^T X_j

\frac{\partial^2 L_j(\theta, Z)}{\partial\beta\,\partial\gamma^T} = -\frac{1}{\sigma^2} X_j^T Z_j

\frac{\partial^2 L_j(\theta, Z)}{\partial\beta\,\partial\sigma^2} = -\frac{1}{\sigma^4} X_j^T \left(y_j - X_j\beta - Z_j\gamma\right)

\frac{\partial^2 L_j(\theta, Z)}{\partial\gamma\,\partial\gamma^T} = -\frac{1}{\sigma^2} Z_j^T Z_j

\frac{\partial^2 L_j(\theta, Z)}{\partial\gamma\,\partial\sigma^2} = -\frac{1}{\sigma^4} Z_j^T \left(y_j - X_j\beta - Z_j\gamma\right)

\frac{\partial^2 L_j(\theta, Z)}{\partial(\sigma^2)^2} = \frac{1}{2\sigma^4} - \frac{1}{\sigma^6}\left(y_j - X_j\beta - Z_j\gamma\right)^2 \quad (5.55)

We now have the score vector and the Hessian matrix available for the complete-data log likelihood function. The Louis information matrix is

I(\theta) = -E\left[H(\theta, Z)\right] - E\left[S(\theta, Z)\, S^T(\theta, Z)\right] \quad (5.56)


where the expectations are taken with respect to the missing value Z_j using the posterior probabilities of QTL genotypes. At the MLE of the parameters, E\left[S(\hat{\theta}, Z)\right] = 0. Therefore,

E\left[S(\theta, Z)\, S^T(\theta, Z)\right] = \mathrm{var}\left[S(\theta, Z)\right] + E\left[S(\theta, Z)\right] E\left[S^T(\theta, Z)\right] = \mathrm{var}\left[S(\theta, Z)\right] \quad (5.57)

As a result, an alternative expression of the Louis information matrix is

I(\theta) = -\sum_{j=1}^{n} E\left[H_j(\theta, Z)\right] - \sum_{j=1}^{n} \mathrm{var}\left[S_j(\theta, Z)\right] \quad (5.58)

The expectations and variances in the above information matrix are given below.

E\left[H_j(\theta, Z)\right] = \begin{bmatrix} E\left[\frac{\partial^2 L_j}{\partial\beta\,\partial\beta^T}\right] & E\left[\frac{\partial^2 L_j}{\partial\beta\,\partial\gamma^T}\right] & E\left[\frac{\partial^2 L_j}{\partial\beta\,\partial\sigma^2}\right] \\ E\left[\frac{\partial^2 L_j}{\partial\gamma\,\partial\beta^T}\right] & E\left[\frac{\partial^2 L_j}{\partial\gamma\,\partial\gamma^T}\right] & E\left[\frac{\partial^2 L_j}{\partial\gamma\,\partial\sigma^2}\right] \\ E\left[\frac{\partial^2 L_j}{\partial\sigma^2\,\partial\beta^T}\right] & E\left[\frac{\partial^2 L_j}{\partial\sigma^2\,\partial\gamma^T}\right] & E\left[\frac{\partial^2 L_j}{\partial(\sigma^2)^2}\right] \end{bmatrix} \quad (5.59)

where L_j is shorthand for L_j(\theta, Z).

The six different blocks of the above matrix are

E\left[\frac{\partial^2 L_j}{\partial\beta\,\partial\beta^T}\right] = -\frac{1}{\sigma^2} X_j^T X_j

E\left[\frac{\partial^2 L_j}{\partial\beta\,\partial\gamma^T}\right] = -\frac{1}{\sigma^2} X_j^T E(Z_j)

E\left[\frac{\partial^2 L_j}{\partial\beta\,\partial\sigma^2}\right] = -\frac{1}{\sigma^4} X_j^T \left[y_j - X_j\beta - E(Z_j)\gamma\right]

E\left[\frac{\partial^2 L_j}{\partial\gamma\,\partial\gamma^T}\right] = -\frac{1}{\sigma^2} E(Z_j^T Z_j)

E\left[\frac{\partial^2 L_j}{\partial\gamma\,\partial\sigma^2}\right] = -\frac{1}{\sigma^4} E\left[Z_j^T \left(y_j - X_j\beta - Z_j\gamma\right)\right]

E\left[\frac{\partial^2 L_j}{\partial(\sigma^2)^2}\right] = \frac{1}{2\sigma^4} - \frac{1}{\sigma^6} E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^2\right] \quad (5.60)


Again, all the expectations are taken with respect to the missing value Z_j, not the observed phenotype y_j. This is very different from the information matrix of the Fisher scoring algorithm. The variance-covariance matrix of the score vector is

\mathrm{var}\left[S(\theta, Z)\right] = \sum_{j=1}^{n} \mathrm{var}\left[S_j(\theta, Z)\right] \quad (5.61)

where

\mathrm{var}\left[S_j(\theta, Z)\right] = \begin{bmatrix} \mathrm{var}\left(\frac{\partial L_j}{\partial\beta}\right) & \mathrm{cov}\left(\frac{\partial L_j}{\partial\beta}, \frac{\partial L_j}{\partial\gamma^T}\right) & \mathrm{cov}\left(\frac{\partial L_j}{\partial\beta}, \frac{\partial L_j}{\partial\sigma^2}\right) \\ \mathrm{cov}\left(\frac{\partial L_j}{\partial\gamma}, \frac{\partial L_j}{\partial\beta^T}\right) & \mathrm{var}\left(\frac{\partial L_j}{\partial\gamma}\right) & \mathrm{cov}\left(\frac{\partial L_j}{\partial\gamma}, \frac{\partial L_j}{\partial\sigma^2}\right) \\ \mathrm{cov}\left(\frac{\partial L_j}{\partial\sigma^2}, \frac{\partial L_j}{\partial\beta^T}\right) & \mathrm{cov}\left(\frac{\partial L_j}{\partial\sigma^2}, \frac{\partial L_j}{\partial\gamma^T}\right) & \mathrm{var}\left(\frac{\partial L_j}{\partial\sigma^2}\right) \end{bmatrix} \quad (5.62)

The variances are calculated with respect to the missing value using the posterior probabilities of QTL genotypes. Following is the detailed expression of all blocks of the above matrix.

\mathrm{var}\left(\frac{\partial L_j}{\partial\beta}\right) = E\left[\frac{\partial L_j}{\partial\beta}\frac{\partial L_j}{\partial\beta^T}\right] - E\left[\frac{\partial L_j}{\partial\beta}\right] E\left[\frac{\partial L_j}{\partial\beta^T}\right]

\mathrm{cov}\left(\frac{\partial L_j}{\partial\beta}, \frac{\partial L_j}{\partial\gamma^T}\right) = E\left[\frac{\partial L_j}{\partial\beta}\frac{\partial L_j}{\partial\gamma^T}\right] - E\left[\frac{\partial L_j}{\partial\beta}\right] E\left[\frac{\partial L_j}{\partial\gamma^T}\right]

\mathrm{cov}\left(\frac{\partial L_j}{\partial\beta}, \frac{\partial L_j}{\partial\sigma^2}\right) = E\left[\frac{\partial L_j}{\partial\beta}\frac{\partial L_j}{\partial\sigma^2}\right] - E\left[\frac{\partial L_j}{\partial\beta}\right] E\left[\frac{\partial L_j}{\partial\sigma^2}\right]

\mathrm{var}\left(\frac{\partial L_j}{\partial\gamma}\right) = E\left[\frac{\partial L_j}{\partial\gamma}\frac{\partial L_j}{\partial\gamma^T}\right] - E\left[\frac{\partial L_j}{\partial\gamma}\right] E\left[\frac{\partial L_j}{\partial\gamma^T}\right]

\mathrm{cov}\left(\frac{\partial L_j}{\partial\gamma}, \frac{\partial L_j}{\partial\sigma^2}\right) = E\left[\frac{\partial L_j}{\partial\gamma}\frac{\partial L_j}{\partial\sigma^2}\right] - E\left[\frac{\partial L_j}{\partial\gamma}\right] E\left[\frac{\partial L_j}{\partial\sigma^2}\right]

\mathrm{var}\left(\frac{\partial L_j}{\partial\sigma^2}\right) = E\left[\left(\frac{\partial L_j}{\partial\sigma^2}\right)^2\right] - \left(E\left[\frac{\partial L_j}{\partial\sigma^2}\right]\right)^2

where

E\left[\frac{\partial L_j}{\partial\beta}\right] = \frac{1}{\sigma^2} X_j^T \left[y_j - X_j\beta - E(Z_j)\gamma\right]

E\left[\frac{\partial L_j}{\partial\gamma}\right] = \frac{1}{\sigma^2} E\left[Z_j^T\left(y_j - X_j\beta - Z_j\gamma\right)\right]

E\left[\frac{\partial L_j}{\partial\sigma^2}\right] = -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4} E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^2\right]

E\left[\frac{\partial L_j}{\partial\beta}\frac{\partial L_j}{\partial\beta^T}\right] = \frac{1}{\sigma^4} X_j^T E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^2\right] X_j

E\left[\frac{\partial L_j}{\partial\beta}\frac{\partial L_j}{\partial\gamma^T}\right] = \frac{1}{\sigma^4} X_j^T E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^2 Z_j\right]

E\left[\frac{\partial L_j}{\partial\beta}\frac{\partial L_j}{\partial\sigma^2}\right] = -\frac{1}{2\sigma^4} X_j^T \left[y_j - X_j\beta - E(Z_j)\gamma\right] + \frac{1}{2\sigma^6} X_j^T E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^3\right]

E\left[\frac{\partial L_j}{\partial\gamma}\frac{\partial L_j}{\partial\gamma^T}\right] = \frac{1}{\sigma^4} E\left[Z_j^T\left(y_j - X_j\beta - Z_j\gamma\right)^2 Z_j\right]

E\left[\frac{\partial L_j}{\partial\gamma}\frac{\partial L_j}{\partial\sigma^2}\right] = -\frac{1}{2\sigma^4} E\left[Z_j^T\left(y_j - X_j\beta - Z_j\gamma\right)\right] + \frac{1}{2\sigma^6} E\left[Z_j^T\left(y_j - X_j\beta - Z_j\gamma\right)^3\right]

E\left[\left(\frac{\partial L_j}{\partial\sigma^2}\right)^2\right] = \frac{1}{4\sigma^4} - \frac{1}{2\sigma^6} E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^2\right] + \frac{1}{4\sigma^8} E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^4\right]

The expectations in the above equations are calculated using the posterior probability of QTL genotype by

E(Z_j) = \sum_{k=1}^{3} p_j^*(2-k)\, H_k

E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^2 Z_j\right] = \sum_{k=1}^{3} p_j^*(2-k)\left(y_j - X_j\beta - H_k\gamma\right)^2 H_k

E\left[Z_j^T\left(y_j - X_j\beta - Z_j\gamma\right)^2 Z_j\right] = \sum_{k=1}^{3} p_j^*(2-k)\, H_k^T\left(y_j - X_j\beta - H_k\gamma\right)^2 H_k

E\left[\left(y_j - X_j\beta - Z_j\gamma\right)^m\right] = \sum_{k=1}^{3} p_j^*(2-k)\left(y_j - X_j\beta - H_k\gamma\right)^m, \quad m = 2, 3, 4

E\left[Z_j^T\left(y_j - X_j\beta - Z_j\gamma\right)\right] = \sum_{k=1}^{3} p_j^*(2-k)\, H_k^T\left(y_j - X_j\beta - H_k\gamma\right)

E\left[Z_j^T\left(y_j - X_j\beta - Z_j\gamma\right)^3\right] = \sum_{k=1}^{3} p_j^*(2-k)\, H_k^T\left(y_j - X_j\beta - H_k\gamma\right)^3

When calculating the information matrix, the parameter \theta is substituted by \hat{\theta}, the MLE of \theta. Therefore, the observed information matrix is

I(\hat{\theta}) = -E\left[H(\hat{\theta}, Z)\right] - \mathrm{var}\left[S(\hat{\theta}, Z)\right] \quad (5.63)

and the variance-covariance matrix of the estimated parameters is


\mathrm{var}(\hat{\theta}) = I^{-1}(\hat{\theta}) \quad (5.64)

HYPOTHESIS TESTING

The hypothesis H_0: \gamma = 0 can be tested in several different ways. If \mathrm{var}(\hat{\theta}) is already calculated, we can use the F- or W-test statistic, which requires \mathrm{var}(\hat{\gamma}), the variance-covariance matrix of the estimated QTL effects. It is a submatrix of \mathrm{var}(\hat{\theta}). The W-test statistic is

W = \hat{\gamma}^T\, \mathrm{var}^{-1}(\hat{\gamma})\, \hat{\gamma} \quad (5.65)
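For illustration, with hypothetical estimates of an additive and a dominance effect and a hypothetical covariance matrix, the statistic in (5.65) is just a quadratic form:

```python
import numpy as np

# Sketch of the W-test statistic (5.65); the effect estimates and their
# covariance matrix below are hypothetical numbers, not real output.
gamma_hat = np.array([0.8, 0.2])                     # additive, dominance effects
var_gamma = np.array([[0.04, 0.01],
                      [0.01, 0.09]])                 # submatrix of var(theta_hat)
W = float(gamma_hat @ np.linalg.solve(var_gamma, gamma_hat))
# Under H0, W is compared against a chi-square distribution whose degrees of
# freedom equal the number of tested QTL effects (here 2).
```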

Alternatively, the likelihood ratio test statistic can be applied to test H_0. We have presented two log likelihood functions: the complete-data log likelihood function, denoted by L(\theta, Z), and the observed log likelihood function, denoted by L(\theta). The log likelihood function used to construct the likelihood ratio test statistic is L(\theta), not L(\theta, Z). The complete-data log likelihood function, L(\theta, Z), is only used to derive the EM algorithm and the observed information matrix. The likelihood ratio test statistic is

\lambda = -2(L_0 - L_1) \quad (5.66)

where L_1 = L(\hat{\theta}) is the observed log likelihood function evaluated at \hat{\theta} = \{\hat{\beta}, \hat{\gamma}, \hat{\sigma}^2\} and L_0 is the log likelihood function evaluated at \tilde{\theta} = \{\tilde{\beta}, 0, \tilde{\sigma}^2\} under the restricted model. The estimated parameter \tilde{\theta} under the restricted model and L_0 are the same as those given in the least squares section (5.12).

REMARKS ON THE FOUR METHODS OF INTERVAL MAPPING

The LS method (Haley and Knott 1992) is an approximation of the ML method, aiming to improve computational speed. Because of its speed, the method remains popular, even though computing power has increased by many orders of magnitude since LS was developed. In some literature (e.g., Feenstra et al. 2006), the LS method is also called the H-K method in honor of the authors, Haley and Knott (1992). Xu (1995) noticed that the LS method, although a good approximation to ML in terms of the estimates of QTL effects and the test statistic, may lead to a biased (inflated) estimate of the residual error variance. Based on this work, Xu (1998a, b) eventually developed the iteratively reweighted least squares


(IRLS) method. In his works (Xu 1998a, b), the iteratively reweighted least squares method was abbreviated IRWLS. Xu (1998b) compared LS, IRLS and ML in a variety of situations and concluded that IRLS is always better than LS and as efficient as ML. When the residual error does not have a normal distribution, which is required by the ML method, LS and IRLS can be better than ML. In other words, LS and IRLS are more robust than ML to departure from normality. Kao (2000) and Feenstra et al. (2006) conducted more comprehensive investigations of LS, IRLS and ML and found that when epistatic effects exist, LS can generate unsatisfactory results, but IRLS and ML usually map QTL better than LS. In addition, Feenstra et al. (2006) modified the weighted least squares method by using the estimating equations (EE) algorithm. This algorithm further improved the efficiency of the weighted least squares method by maximizing an approximate likelihood function. Most recently, Han and Xu (2008) developed a Fisher scoring (FISHER) algorithm to maximize the approximate likelihood function. Both the EE and FISHER algorithms maximize the same likelihood function, and thus they produce identical results.

The LS method ignores the uncertainty of the QTL genotype. The IRLS, FISHER (or EE) and ML methods use different ways to extract information from the uncertainty of the QTL genotype. If the putative location of the QTL overlaps with a fully informative marker, all four methods produce identical results. Therefore, if the marker density is sufficiently high, there is virtually no difference among the four methods. For low marker density, when the putative position is far away from either flanking marker, the four methods will show some differences, and these differences are magnified by a large QTL. Han and Xu (2008) compared the four methods in a simulation experiment. When the putative QTL position was fixed in the middle of a 10 cM interval, the four methods generated almost identical results. However, when the interval was expanded to 20 cM, the differences among the four methods became noticeable.

A final remark on interval mapping concerns the way the QTL genotype is inferred from markers. If only flanking markers are used to infer the genotype of a putative position bracketed by the two markers, the method is called interval mapping. Strictly speaking, interval mapping only applies to fully informative markers because we always use flanking markers to infer the QTL genotype. However, almost all datasets obtained from real-life experiments contain missing, uninformative or partially informative markers. To extract maximum information from markers, people always use the multipoint method (Jiang and Zeng 1997) to infer a QTL genotype. The multipoint method uses more markers, or even all markers of the entire chromosome (not just the flanking markers), to infer the genotype of a putative position. With multipoint analysis, we no longer have the notion of an interval, and thus interval mapping is no longer an appropriate phrase to describe QTL mapping. Unfortunately, a more appropriate phrase has not been proposed and people are accustomed to the phrase interval mapping.


Therefore, the so-called interval mapping in the current literature means QTL mapping under a single-QTL model, regardless of whether the genotype of a putative QTL position is inferred from flanking markers or from all markers.


INTERVAL MAPPING FOR DISCRETE TRAITS

Many disease resistance traits in agricultural crops are measured in ordered categories. The generalized linear model (GLM) methodology (McCullagh and Nelder 1989) is an ideal tool for analyzing these traits. In QTL mapping for continuously distributed traits, the mixture model (Lander and Botstein 1989) is the most efficient way to take advantage of marker information. The least squares method of Haley and Knott (1992) is the simplest way to incorporate linked markers. The performances of the weighted least squares method of Xu (1998a, b) and the estimating equations (EE) algorithm of Feenstra et al. (2006) are usually between those of the least squares and mixture model methods. These methods have been successfully applied to QTL mapping for continuous traits, but they had not been investigated for ordinal trait QTL mapping. In the current version of PROC QTL, there are four GLM methods for mapping ordinal trait QTL: ML under the homogeneous variance model (the expectation substitution method, method = "LS"), ML under the heterogeneous residual variance model (Fisher scoring, method = "fisher"), ML under an approximate mixture distribution (a quasi-likelihood method, method = "irls"), and ML under the exact mixture model via the EM algorithm (method = "ml"). The algorithms of QTL mapping under the generalized linear model used in PROC QTL can be found in Xu and Hu (2010a).

GENERALIZED LINEAR MODEL FOR ORDINAL TRAITS

Suppose that the disease phenotype of individual j (j = 1, \ldots, n) is measured by an ordinal variable denoted by S_j = 1, \ldots, p+1, where p+1 is the total number of disease classes and n is the sample size. Let Y_j = \{Y_{jk}\}, k = 1, \ldots, p+1, be a (p+1) \times 1 vector that indicates the disease status of individual j. The k-th element of Y_j is defined as

Y_{jk} = \begin{cases} 1 & \text{if } S_j = k \\ 0 & \text{if } S_j \neq k \end{cases} \quad (6.1)

Using the probit link function, the expectation of Y_{jk} is defined as

\mu_{jk} = E(Y_{jk}) = \Phi\left(\alpha_k + X_j\beta + Z_j\gamma\right) - \Phi\left(\alpha_{k-1} + X_j\beta + Z_j\gamma\right) \quad (6.2)

where \alpha_k (with \alpha_0 = -\infty and \alpha_{p+1} = +\infty) is the intercept, \beta is a q \times 1 vector for some systematic effects (not related to the effects of quantitative trait loci), and \gamma is an r \times 1 vector for the effects of a quantitative trait locus. The symbol \Phi(\cdot) is the standardized cumulative normal function. The design matrix X_j is assumed to be known, but Z_j may not be fully observable


because it is determined by the genotype of individual j for the locus of interest. Because the link function is probit, this type of analysis is called probit analysis. Let \mu_j = \{\mu_{jk}\} be a (p+1) \times 1 vector. The expectation for vector Y_j is E(Y_j) = \mu_j and the variance matrix of Y_j is

V_j = \mathrm{var}(Y_j) = \Psi_j - \mu_j\mu_j^T \quad (6.3)

where \Psi_j = \mathrm{diag}(\mu_j). The method to be developed requires the inverse of matrix V_j. However, V_j is not of full rank. We can use a generalized inverse of V_j, such as V_j^-, in place of V_j^{-1}. The parameter vector is \theta = \{\alpha, \beta, \gamma\}, with a dimensionality of (p + q + r) \times 1. Binary data are a special case of ordinal data in that p = 1, so that there are only two categories, S_j = 1, 2. The expectation of Y_{jk} is

\mu_{jk} = \begin{cases} \Phi\left(\alpha_1 + X_j\beta + Z_j\gamma\right) - \Phi\left(\alpha_0 + X_j\beta + Z_j\gamma\right) & \text{for } k = 1 \\ \Phi\left(\alpha_2 + X_j\beta + Z_j\gamma\right) - \Phi\left(\alpha_1 + X_j\beta + Z_j\gamma\right) & \text{for } k = 2 \end{cases} \quad (6.4)

Because \alpha_0 = -\infty and \alpha_2 = +\infty in the binary case, we have

\mu_{jk} = \begin{cases} \Phi\left(\alpha_1 + X_j\beta + Z_j\gamma\right) & \text{for } k = 1 \\ 1 - \Phi\left(\alpha_1 + X_j\beta + Z_j\gamma\right) & \text{for } k = 2 \end{cases} \quad (6.5)

We can see that \mu_{j2} = 1 - \mu_{j1} and

\Phi^{-1}(\mu_{j1}) = \alpha_1 + X_j\beta + Z_j\gamma \quad (6.6)

The link function is \Phi^{-1}(\cdot) and thus it is called the probit link function. Once we take the probit transformation, the model becomes a linear model. Therefore, this type of model is called a generalized linear model (GLM). The usual linear model we deal with for continuous traits is a special case of the GLM because the link function is simply the identity, i.e.,

I(\mu_{j1}) = \alpha_1 + X_j\beta + Z_j\gamma \quad (6.7)

or simply

\mu_{j1} = \alpha_1 + X_j\beta + Z_j\gamma \quad (6.8)

Let us first assume that the genotypes of the QTL are observed for all individuals. In this case, variable Z_j is not missing. The log likelihood function under the probit model is


L(\theta) = \sum_{j=1}^{n} L_j(\theta) \quad (6.9)

where

L_j(\theta) = \sum_{k=1}^{p+1} Y_{jk} \ln\left[\Phi\left(\alpha_k + X_j\beta + Z_j\gamma\right) - \Phi\left(\alpha_{k-1} + X_j\beta + Z_j\gamma\right)\right] \quad (6.10)

and \theta = \{\alpha, \beta, \gamma\} is the vector of parameters. This is the simplest GLM problem and the classical iteratively reweighted least squares approach for GLM (Nelder and Wedderburn 1972; Wedderburn 1974) can be used without any modification. The iterative equation under the classical GLM is given below,

\theta^{(t+1)} = \theta^{(t)} + I^{-1}(\theta^{(t)})\, S(\theta^{(t)}) \quad (6.11)

where \theta^{(t)} is the value of the parameter at the current iteration, I(\theta^{(t)}) is the information matrix and S(\theta^{(t)}) is the score vector, both evaluated at \theta^{(t)}.

We can interpret

\Delta\theta = I^{-1}(\theta^{(t)})\, S(\theta^{(t)}) \quad (6.12)

in equation (6.11) as the adjustment to \theta^{(t)} that improves the solution in the direction leading to the ultimate maximum likelihood estimate of \theta. Equation (6.3) shows that the variance of Y_j is a function of the expectation of Y_j. This special relationship leads to a convenient way to calculate the information matrix and the score vector, as given by Wedderburn (1974),

information matrix and the score vector, as given by Wedderburn (1974),

I(\theta) = \sum_{j=1}^{n} D_j^T W_j D_j \quad (6.13)

and

S(\theta) = \sum_{j=1}^{n} D_j^T W_j \left(Y_j - \mu_j\right) \quad (6.14)

where W_j = V_j^-. Therefore, the increment (adjustment) of the parameter can be estimated using the following iteratively reweighted least squares approach,

\Delta\theta = \left[\sum_{j=1}^{n} D_j^T W_j D_j\right]^{-1} \left[\sum_{j=1}^{n} D_j^T W_j \left(Y_j - \mu_j\right)\right] \quad (6.15)

where D_j is a (p+1) \times (p+q+r) matrix of the first partial derivatives of \mu_j with respect to the parameters and W_j = V_j^- is the weight matrix. Matrix D_j can be partitioned into three blocks,


D_j = \frac{\partial \mu_j}{\partial \theta^T} = \left[\begin{array}{ccc} \frac{\partial \mu_j}{\partial \alpha^T} & \frac{\partial \mu_j}{\partial \beta^T} & \frac{\partial \mu_j}{\partial \gamma^T} \end{array}\right] \quad (6.16)

The first block \partial\mu_j/\partial\alpha^T = \{\partial\mu_{jk}/\partial\alpha_l\} is a (p+1) \times p matrix with

\frac{\partial \mu_{jk}}{\partial \alpha_k} = \phi\left(\alpha_k + X_j\beta + Z_j\gamma\right)

\frac{\partial \mu_{jk}}{\partial \alpha_{k-1}} = -\phi\left(\alpha_{k-1} + X_j\beta + Z_j\gamma\right)

\frac{\partial \mu_{jk}}{\partial \alpha_l} = 0, \quad \forall\, l \notin \{k-1, k\} \quad (6.17)

The second block \partial\mu_j/\partial\beta^T = \{\partial\mu_{jk}/\partial\beta^T\} is a (p+1) \times q matrix with

\frac{\partial \mu_{jk}}{\partial \beta^T} = \left[\phi\left(\alpha_k + X_j\beta + Z_j\gamma\right) - \phi\left(\alpha_{k-1} + X_j\beta + Z_j\gamma\right)\right] X_j \quad (6.18)

The third block \partial\mu_j/\partial\gamma^T = \{\partial\mu_{jk}/\partial\gamma^T\} is a (p+1) \times r matrix with

\frac{\partial \mu_{jk}}{\partial \gamma^T} = \left[\phi\left(\alpha_k + X_j\beta + Z_j\gamma\right) - \phi\left(\alpha_{k-1} + X_j\beta + Z_j\gamma\right)\right] Z_j \quad (6.19)

where \phi(\cdot) is the probability density of the standardized normal variable. In all the above partial derivatives, the range of k is k = 1, \ldots, p+1. The sequence of parameter values during the iteration process converges to a local maximum likelihood estimate, denoted by \hat{\theta}. The variance-covariance matrix of \hat{\theta} is approximately equal to \mathrm{var}(\hat{\theta}) = I^{-1}(\hat{\theta}), which is a by-product of the iteration process. Here, we are actually dealing with a situation where the QTL overlaps with a fully informative marker because the observed marker genotypes represent the genotypes of the disease locus.
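The classical GLM iteration (6.11)-(6.19) can be sketched for a three-category probit (p = 2) with an observed scalar genotype code; everything below (the simulated data, the intercept-only design with no systematic effects beta, and the starting values) is illustrative, not PROC QTL output.

```python
import numpy as np
from math import erf

# Sketch of the classical GLM scoring iteration (6.11)-(6.19) for an ordinal
# probit with three categories (p = 2), no systematic effects beta, and an
# observed scalar genotype code z_j.  Simulated data; names are illustrative.
Phi = np.vectorize(lambda x: 0.5 * (1.0 + erf(x / np.sqrt(2.0))))
phi = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

rng = np.random.default_rng(3)
n, z = 400, rng.choice([-1.0, 0.0, 1.0], size=400)
a1, a2, g = -0.5, 0.5, 0.8                       # true cut-points and QTL effect
P = np.stack([Phi(a1 + g * z), Phi(a2 + g * z) - Phi(a1 + g * z),
              1.0 - Phi(a2 + g * z)], axis=1)
S_obs = np.array([rng.choice(3, p=row) for row in P])
Y = np.eye(3)[S_obs]                             # indicator vectors Y_j, (6.1)

theta = np.array([-0.3, 0.3, 0.0])               # start: (alpha1, alpha2, gamma)
for _ in range(50):
    e1, e2 = theta[0] + theta[2] * z, theta[1] + theta[2] * z
    mu = np.stack([Phi(e1), Phi(e2) - Phi(e1), 1.0 - Phi(e2)], axis=1)
    info, score = np.zeros((3, 3)), np.zeros(3)
    for j in range(n):
        D = np.array([[phi(e1[j]), 0.0, phi(e1[j]) * z[j]],       # (6.17)-(6.19)
                      [-phi(e1[j]), phi(e2[j]), (phi(e2[j]) - phi(e1[j])) * z[j]],
                      [0.0, -phi(e2[j]), -phi(e2[j]) * z[j]]])
        W = np.linalg.pinv(np.diag(mu[j]) - np.outer(mu[j], mu[j]))  # W_j = V_j^-
        info += D.T @ W @ D                                          # (6.13)
        score += D.T @ W @ (Y[j] - mu[j])                            # (6.14)
    step = np.linalg.solve(info, score)                              # (6.11)
    theta += step
    if np.max(np.abs(step)) < 1e-10:
        break
```

The Moore-Penrose pseudoinverse serves as one convenient choice of the generalized inverse V_j^- required because V_j is not of full rank.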

EXPECTATION SUBSTITUTION METHOD

This method is also called the homogeneous variance model. If the QTL of interest does not overlap with any marker, the genotype of the QTL is not observable, i.e., Z_j is missing. The classical GLM does not apply directly to such a situation. The missing value Z_j still carries some information due to linkage with the markers. Again, we use an F_2 population as an example to show how to handle the missing value of Z_j. The ML estimation of the parameters under the homogeneous variance model is obtained simply by substituting Z_j with the conditional expectation of Z_j given flanking marker


information. Let

p_j(2-g) = \Pr(Z_j = H_g \mid \text{marker}), \quad g = 1, 2, 3 \quad (6.20)

be the conditional probability of the QTL genotype given marker information, where the marker information can be drawn either from two flanking markers (interval mapping, Lander and Botstein 1989) or from multiple markers (multipoint analysis, Jiang and Zeng 1997). Vector H_g is the g-th row of matrix H,

H = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -1 & 0 \end{bmatrix} \quad (6.21)

which has been defined earlier in the mixture model maximum likelihood method of interval mapping for continuous traits. Using marker information,

we can calculate the expectation of Z_j, which is

U_j = E(Z_j) = \sum_{g=1}^{3} p_j(2-g)\, H_g \quad (6.22)

The method is called ML under the homogeneous residual variance model because when we substitute Z_j by U_j, the residual error variance is no longer equal to unity; rather, it is inflated and varies across individuals. The homogeneous variance model, however, assumes that the residual variance is constant across individuals. This method is analogous to Haley and Knott's (1992) method of QTL mapping applied to continuous traits. As a result, it is invoked by the "LS" method when the trait is ordinal. In other words, if you choose the method option in the PROC QTL statement as method = "LS", this expectation substitution algorithm will be used to estimate the QTL effects. The method is exactly the same as that described in the generalized linear model section except that Z_j is replaced by U_j. In summary, the expectation of the data is \mu_j = \{\mu_{jk}\}, where

\mu_{jk} = E(Y_{jk}) = \Phi\left(\alpha_k + X_j\beta + U_j\gamma\right) - \Phi\left(\alpha_{k-1} + X_j\beta + U_j\gamma\right) \quad (6.23)

The weight matrix for individual j is

W_j = V_j^- = \left[\mathrm{diag}(\mu_j) - \mu_j\mu_j^T\right]^- \quad (6.24)

The partial derivative matrix is

D_j = \left[\begin{array}{ccc} \frac{\partial \mu_j}{\partial \alpha^T} & \frac{\partial \mu_j}{\partial \beta^T} & \frac{\partial \mu_j}{\partial \gamma^T} \end{array}\right] \quad (6.25)


The first block \partial\mu_j/\partial\alpha^T = \{\partial\mu_{jk}/\partial\alpha_l\} is a (p+1) \times p matrix with

\frac{\partial \mu_{jk}}{\partial \alpha_k} = \phi\left(\alpha_k + X_j\beta + U_j\gamma\right)

\frac{\partial \mu_{jk}}{\partial \alpha_{k-1}} = -\phi\left(\alpha_{k-1} + X_j\beta + U_j\gamma\right)

\frac{\partial \mu_{jk}}{\partial \alpha_l} = 0, \quad \forall\, l \notin \{k-1, k\} \quad (6.26)

The second block \partial\mu_j/\partial\beta^T = \{\partial\mu_{jk}/\partial\beta^T\} is a (p+1) \times q matrix with

\frac{\partial \mu_{jk}}{\partial \beta^T} = \left[\phi\left(\alpha_k + X_j\beta + U_j\gamma\right) - \phi\left(\alpha_{k-1} + X_j\beta + U_j\gamma\right)\right] X_j \quad (6.27)

The third block \partial\mu_j/\partial\gamma^T = \{\partial\mu_{jk}/\partial\gamma^T\} is a (p+1) \times r matrix with

\frac{\partial \mu_{jk}}{\partial \gamma^T} = \left[\phi\left(\alpha_k + X_j\beta + U_j\gamma\right) - \phi\left(\alpha_{k-1} + X_j\beta + U_j\gamma\right)\right] U_j \quad (6.28)

where \phi(\cdot) is the probability density of the standardized normal variable.

FISHER SCORING METHOD

The homogeneous variance model described above is only a first-moment approximation because the uncertainty of the estimated Z_j has been ignored. Let

\Sigma_j = \mathrm{var}(Z_j) = \sum_{g=1}^{3} p_j(2-g)\, H_g^T H_g - U_j^T U_j \quad (6.29)

be the conditional covariance matrix for Z_j. Note that model (6.2) with Z_j substituted by U_j is

\mu_{jk} = E(Y_{jk}) = \Phi\left(\alpha_k + X_j\beta + U_j\gamma\right) - \Phi\left(\alpha_{k-1} + X_j\beta + U_j\gamma\right) \quad (6.30)

An underlying assumption of this probit model is that the residual error variance for the "underlying liability" of the disease trait is unity across individuals. Once U_j is used in place of Z_j, the residual error variance becomes

\sigma_j^2 = \gamma^T \Sigma_j \gamma + 1 \quad (6.31)

This is an inflated variance and it is heterogeneous across individuals. In order to adjust for the inflation of the residual error variance, we need to


rescale the model effects as follows,

\mu_{jk} = E(Y_{jk}) = \Phi\left[\frac{1}{\sigma_j}\left(\alpha_k + X_j\beta + U_j\gamma\right)\right] - \Phi\left[\frac{1}{\sigma_j}\left(\alpha_{k-1} + X_j\beta + U_j\gamma\right)\right] \quad (6.32)

This modification leads to a change in the partial derivatives of \mu_j with respect to the parameters. The corresponding changes in the derivatives are given below.

\frac{\partial \mu_{jk}}{\partial \alpha_k} = \frac{1}{\sigma_j}\phi\left[\frac{1}{\sigma_j}\left(\alpha_k + X_j\beta + U_j\gamma\right)\right]

\frac{\partial \mu_{jk}}{\partial \alpha_{k-1}} = -\frac{1}{\sigma_j}\phi\left[\frac{1}{\sigma_j}\left(\alpha_{k-1} + X_j\beta + U_j\gamma\right)\right]

\frac{\partial \mu_{jk}}{\partial \alpha_l} = 0, \quad \forall\, l \notin \{k-1, k\} \quad (6.33)

\frac{\partial \mu_{jk}}{\partial \beta^T} = \frac{1}{\sigma_j}\left\{\phi\left[\frac{1}{\sigma_j}\left(\alpha_k + X_j\beta + U_j\gamma\right)\right] - \phi\left[\frac{1}{\sigma_j}\left(\alpha_{k-1} + X_j\beta + U_j\gamma\right)\right]\right\} X_j \quad (6.34)

and

\frac{\partial \mu_{jk}}{\partial \gamma^T} = \frac{1}{\sigma_j}\phi\left[\frac{1}{\sigma_j}\left(\alpha_k + X_j\beta + U_j\gamma\right)\right]\left[U_j - \frac{1}{\sigma_j^2}\left(\alpha_k + X_j\beta + U_j\gamma\right)\gamma^T\Sigma_j\right] - \frac{1}{\sigma_j}\phi\left[\frac{1}{\sigma_j}\left(\alpha_{k-1} + X_j\beta + U_j\gamma\right)\right]\left[U_j - \frac{1}{\sigma_j^2}\left(\alpha_{k-1} + X_j\beta + U_j\gamma\right)\gamma^T\Sigma_j\right] \quad (6.35)

The iteration formula remains the same as (6.11) except that the modified weight and partial derivatives are used under the heterogeneous residual variance model. To invoke this method, the method option in the PROC QTL statement must be set as method = "fisher".
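A small numeric sketch of the conditional moments used above may help: U_j from (6.22), Sigma_j from (6.29) and the inflated liability variance sigma_j^2 from (6.31). The genotype probabilities and effect values below are made up for illustration.

```python
import numpy as np

# Sketch of (6.22), (6.29) and (6.31) for one F2 individual; the genotype
# probabilities and effect values are made up for illustration.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, 0.0]])                 # rows H_1, H_2, H_3 of (6.21)
p = np.array([0.7, 0.2, 0.1])               # p_j(2-g) inferred from markers
gamma = np.array([0.5, 0.3])                # additive and dominance effects

U = p @ H                                   # U_j = sum_g p_j(2-g) H_g, (6.22)
Sigma = sum(pg * np.outer(Hg, Hg) for pg, Hg in zip(p, H)) - np.outer(U, U)
s2 = float(gamma @ Sigma @ gamma + 1.0)     # sigma_j^2 = gamma' Sigma gamma + 1, (6.31)
```

Here U = (0.6, 0.2) and s2 exceeds 1 by gamma' Sigma gamma, which is exactly the inflation the heterogeneous variance model rescales away in (6.32).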

APPROXIMATE MIXTURE MODEL

In this approximate model, we define genotype-specific expectations and then combine them using the probabilities of QTL genotypes inferred from markers as the weights. Let

\mu_{jk}(g) = E(Y_{jk}) = \Phi\left(\alpha_k + X_j\beta + H_g\gamma\right) - \Phi\left(\alpha_{k-1} + X_j\beta + H_g\gamma\right) \quad (6.36)

be the genotype-specific expectation of Y_{jk} when individual j takes the g-th genotype, for g = 1, 2, 3. The weighted expectation is


\mu_{jk} = \sum_{g=1}^{3} p_j(2-g)\, \mu_{jk}(g) \quad (6.37)

Let \mu_j = \{\mu_{jk}\} be a column vector containing the \mu_{jk}'s. The corresponding weight for individual j is

W_j = V_j^- = \left[\mathrm{diag}(\mu_j) - \mu_j\mu_j^T\right]^- \quad (6.38)

Let

D_j(g) = \frac{\partial \mu_j(g)}{\partial \theta^T} = \left[\begin{array}{ccc} \frac{\partial \mu_j(g)}{\partial \alpha^T} & \frac{\partial \mu_j(g)}{\partial \beta^T} & \frac{\partial \mu_j(g)}{\partial \gamma^T} \end{array}\right] \quad (6.39)

be the partial derivatives of the genotype-specific expectation with respect to the parameters. The corresponding elements of D_j(g) are

\frac{\partial \mu_{jk}(g)}{\partial \alpha_k} = \phi\left(\alpha_k + X_j\beta + H_g\gamma\right)

\frac{\partial \mu_{jk}(g)}{\partial \alpha_{k-1}} = -\phi\left(\alpha_{k-1} + X_j\beta + H_g\gamma\right)

\frac{\partial \mu_{jk}(g)}{\partial \alpha_l} = 0, \quad \forall\, l \notin \{k-1, k\} \quad (6.40)

\frac{\partial \mu_{jk}(g)}{\partial \beta^T} = \left[\phi\left(\alpha_k + X_j\beta + H_g\gamma\right) - \phi\left(\alpha_{k-1} + X_j\beta + H_g\gamma\right)\right] X_j \quad (6.41)

and

\frac{\partial \mu_{jk}(g)}{\partial \gamma^T} = \left[\phi\left(\alpha_k + X_j\beta + H_g\gamma\right) - \phi\left(\alpha_{k-1} + X_j\beta + H_g\gamma\right)\right] H_g \quad (6.42)

The weighted average of D_j(g) is

D_j = \sum_{g=1}^{3} p_j(2-g)\, D_j(g) \quad (6.43)

We have defined \mu_j, W_j and D_j, which are all we need to establish the following iteration equation for estimating the parameters,

\theta^{(t+1)} = \theta^{(t)} + \left[\sum_{j=1}^{n} D_j^T W_j D_j\right]^{-1} \left[\sum_{j=1}^{n} D_j^T W_j \left(Y_j - \mu_j\right)\right] \quad (6.44)

This approximate model is invoked by turning on the method = “irls” option in the PROC QTL statement.


MIXTURE MODEL MAXIMUM LIKELIHOOD METHOD

The exact mixture model approach defines a genotype-specific expectation, variance matrix and all derivatives for each individual. Let

\mu_{jk}(g) = E(Y_{jk}) = \Phi\left(\alpha_k + X_j\beta + H_g\gamma\right) - \Phi\left(\alpha_{k-1} + X_j\beta + H_g\gamma\right) \quad (6.45)

be the genotype-specific expectation of Y_{jk} when individual j takes the g-th genotype, for g = 1, 2, 3. The corresponding genotype-specific weight matrix is

W_j(g) = V_j^-(g) = \left[\mathrm{diag}\left(\mu_j(g)\right) - \mu_j(g)\,\mu_j^T(g)\right]^- \quad (6.46)

Let D_j(g) be the genotype-specific partial derivatives of the expectation with respect to the parameters, whose elements are defined as

\frac{\partial \mu_{jk}(g)}{\partial \alpha_k} = \phi\left(\alpha_k + X_j\beta + H_g\gamma\right)

\frac{\partial \mu_{jk}(g)}{\partial \alpha_{k-1}} = -\phi\left(\alpha_{k-1} + X_j\beta + H_g\gamma\right)

\frac{\partial \mu_{jk}(g)}{\partial \alpha_l} = 0, \quad \forall\, l \notin \{k-1, k\} \quad (6.47)

\frac{\partial \mu_{jk}(g)}{\partial \beta^T} = \left[\phi\left(\alpha_k + X_j\beta + H_g\gamma\right) - \phi\left(\alpha_{k-1} + X_j\beta + H_g\gamma\right)\right] X_j \quad (6.48)

and

\frac{\partial \mu_{jk}(g)}{\partial \gamma^T} = \left[\phi\left(\alpha_k + X_j\beta + H_g\gamma\right) - \phi\left(\alpha_{k-1} + X_j\beta + H_g\gamma\right)\right] H_g \quad (6.49)

Let us define the posterior probability of QTL genotype after incorporating the disease phenotype for individual j as

p_j^*(2-g) = \frac{p_j(2-g)\, Y_j^T \mu_j(g)}{\sum_{g'=1}^{3} p_j(2-g')\, Y_j^T \mu_j(g')} \quad (6.50)

The increment for parameter updating under the mixture model is

\Delta\theta = \left[\sum_{j=1}^{n} E\left(D_j^T W_j D_j\right)\right]^{-1} \left[\sum_{j=1}^{n} E\left[D_j^T W_j \left(Y_j - \mu_j\right)\right]\right] \quad (6.51)

where

E\left(D_j^T W_j D_j\right) = \sum_{g=1}^{3} p_j^*(2-g)\, D_j^T(g)\, W_j(g)\, D_j(g) \quad (6.52)


E\left[D_j^T W_j \left(Y_j - \mu_j\right)\right] = \sum_{g=1}^{3} p_j^*(2-g)\, D_j^T(g)\, W_j(g)\left[Y_j - \mu_j(g)\right] \quad (6.53)

and

W_j(g) = V_j^-(g) \quad (6.54)

This is actually an EM algorithm: calculating the posterior probabilities of the QTL genotypes and using them to evaluate E\left(D_j^T W_j D_j\right) and E\left[D_j^T W_j (Y_j - \mu_j)\right] constitutes the E-step, and calculating the increment of the parameter using the weighted least squares formula makes up the M-step.
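The E-step reweighting in (6.50) can be illustrated with made-up numbers: the prior genotype probabilities are multiplied by how well each genotype-specific model predicts the observed category, then renormalized.

```python
import numpy as np

# Sketch of the posterior genotype probabilities (6.50) for one individual;
# the prior p and the genotype-specific expectations mu are made-up numbers.
p = np.array([0.5, 0.3, 0.2])            # prior p_j(2-g) from markers
mu = np.array([[0.6, 0.3, 0.1],          # mu_j(g) for g = 1, 2, 3 (rows)
               [0.3, 0.4, 0.3],
               [0.1, 0.3, 0.6]])
Y = np.array([0.0, 0.0, 1.0])            # individual observed in category 3
w = p * (mu @ Y)                         # p_j(2-g) * Y_j^T mu_j(g)
post = w / w.sum()                       # p_j^*(2-g)
```

Observing category 3 shifts the posterior weight toward genotype 3, the genotype whose expected category probabilities best match the observation, even though its prior was the smallest.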

VARIANCE-COVARIANCE MATRIX FOR ESTIMATED PARAMETERS

A problem with this EM algorithm is that \mathrm{var}(\hat{\theta}) is not a by-product of the iteration process. For simplicity, if the markers are sufficiently close to the trait locus of interest, we can use

\mathrm{var}(\hat{\theta}) \approx \left[\sum_{j=1}^{n} E\left(D_j^T W_j D_j\right)\right]^{-1} \quad (6.55)

to approximate the covariance matrix of the estimated parameters. This is an underestimated variance matrix. A more precise method to calculate \mathrm{var}(\hat{\theta}) is to adjust the above equation for the information loss due to the uncertainty of the QTL genotype. Let

S_j(\hat{\theta} \mid Z) = D_j^T W_j \left(Y_j - \mu_j\right) \quad (6.56)

be the score vector as if Z were observed. Louis (1982) showed that the information loss is due to the variance-covariance matrix of the score vector, which is

\sum_{j=1}^{n} \mathrm{var}\left[S_j(\hat{\theta} \mid Z)\right] = \sum_{j=1}^{n} \mathrm{var}\left[D_j^T W_j \left(Y_j - \mu_j\right)\right] \quad (6.57)

The variance is taken with respect to the missing value Z using the posterior probabilities of the QTL genotypes. The information matrix after adjusting for the information loss is

I(\hat{\theta}) = \sum_{j=1}^{n} E\left(D_j^T W_j D_j\right) - \sum_{j=1}^{n} \mathrm{var}\left[D_j^T W_j \left(Y_j - \mu_j\right)\right] \quad (6.58)

The variance-covariance matrix for the estimated parameters is then approximated by \mathrm{var}(\hat{\theta}) = I^{-1}(\hat{\theta}). The first term in the above expression is


\sum_{j=1}^{n} E\left(D_j^T W_j D_j\right) = \sum_{j=1}^{n} \sum_{g=1}^{3} p_j^*(2-g)\, D_j^T(g)\, W_j(g)\, D_j(g) \quad (6.59)

which is the expected value of the negative Hessian matrix. The second term of equation (6.58) is

\sum_{j=1}^{n} \mathrm{var}\left[S_j(\hat{\theta} \mid Z)\right] = \sum_{j=1}^{n}\left[\sum_{g=1}^{3} p_j^*(2-g)\, S_j(\hat{\theta} \mid g)\, S_j^T(\hat{\theta} \mid g) - \bar{S}_j(\hat{\theta})\,\bar{S}_j^T(\hat{\theta})\right] \quad (6.60)

which is the variance matrix of the score vector, where

S_j(\hat{\theta} \mid g) = D_j^T(g)\, W_j(g)\left[Y_j - \mu_j(g)\right] \quad (6.61)

and

\bar{S}_j(\hat{\theta}) = \sum_{g=1}^{3} p_j^*(2-g)\, S_j(\hat{\theta} \mid g) \quad (6.62)

The mixture model maximum likelihood method is invoked by turning on the method = "ml" option in the PROC QTL statement.

HYPOTHESIS TESTING

The two statistical tests introduced in QTL mapping for continuous traits are also applicable for the analysis of ordinal traits. The variance-covariance

matrix of the QTL effects is easily obtained from a subset of $\operatorname{var}(\hat\theta)$. The Wald test for the QTL effects under the null hypothesis $H_0\colon \gamma = 0$ is

$$W = \hat\gamma^T\left[\operatorname{var}(\hat\gamma)\right]^{-1}\hat\gamma \quad (6.63)$$

Alternatively, the likelihood ratio test statistic can be applied to test $H_0$,

$$\lambda = -2(L_0 - L_1) \quad (6.64)$$

where

$$L_1 = \sum_{j=1}^{n}\log\left(Y_j^T\hat\mu_j\right) \quad\text{and}\quad L_0 = \sum_{j=1}^{n}\log\left(Y_j^T\tilde\mu_j\right) \quad (6.65)$$

In the above equation, $\hat\mu_j$ and $\tilde\mu_j$ are the expectations of $Y_j$ evaluated at $(\hat\beta, \hat\gamma, \hat\sigma^2)$ and $(\tilde\beta, 0, \tilde\sigma^2)$, respectively.
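Both tests can be sketched in a few lines; `scipy` is used only for the chi-square tail probability, and the function names and inputs are illustrative rather than part of PROC QTL.

```python
import numpy as np
from scipy.stats import chi2

def wald_test(gamma_hat, var_gamma):
    """Wald statistic of eq (6.63) for H0: gamma = 0."""
    W = float(gamma_hat @ np.linalg.inv(var_gamma) @ gamma_hat)
    return W, chi2.sf(W, df=len(gamma_hat))

def lr_test(L0, L1, df):
    """Likelihood ratio statistic of eq (6.64)."""
    lam = -2.0 * (L0 - L1)
    return lam, chi2.sf(lam, df)
```

Both statistics are compared against a chi-square distribution whose degrees of freedom equal the number of constrained QTL effects.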

EXTENSION TO OTHER TRAITS

Ordinal traits are the most commonly observed discrete traits in QTL


mapping experiments. Other discrete traits commonly seen in QTL mapping experiments are binary traits, binomial traits and Poisson traits. This section is dedicated to these commonly observed discrete traits. The mixture model algorithm and the heterogeneous variance approximation apply to all traits, as long as the traits can be analyzed by the generalized linear model. To apply the algorithms to any specific trait, we only need to find: (1) the distribution of the trait (the probability density of the data point), (2) the expectation of the data point, (3) the weight (inverse of the variance) of the data point, and (4) the partial derivative of the expectation with respect to the parameters. We now introduce these discrete traits and provide details of the formulas for interested readers.

Binary traits

Binary traits can be treated as a special case of ordinal traits with $p = 1$. Without any modification, the method developed for ordinal traits can be applied to binary traits with $Y_j = [Y_{j1}\;\;Y_{j2}]^T$ defined as a $2 \times 1$ vector. Each of the two components is a binary variable, and the two components are perfectly correlated. Here we simplify the problem by defining $Y_j$ as a univariate binary trait. This univariate treatment not only saves computing time but also simplifies the notation. We now use the univariate definition to define the binary phenotype,

$$Y_j = \begin{cases} 1 & \text{for trait presence} \\ 0 & \text{for trait absence} \end{cases} \quad (6.66)$$

The expectation and variance of the phenotype given the parameter values are

$$E(Y_j) = \mu_j = \Phi(X_j\beta + Z_j\gamma) \quad (6.67)$$

and

$$\operatorname{var}(Y_j) = V_j = \mu_j(1 - \mu_j) \quad (6.68)$$

The probability density is

$$p(Y_j) = \mu_j^{Y_j}(1 - \mu_j)^{1 - Y_j} \quad (6.69)$$

We now give the details for the mixture model and the heterogeneous variance model. Let $g = 1, 2, 3$ index the three genotypes and

$$\mu_j(g) = E(Y_j \mid X_j, H_g) = \Phi(X_j\beta + H_g\gamma) \quad (6.70)$$

The $D$ matrix for genotype $g$ is defined as

$$D_j(g) = \left[\frac{\partial \mu_j(g)}{\partial \beta^T} \;\; \frac{\partial \mu_j(g)}{\partial \gamma^T}\right] \quad (6.71)$$

where

$$\frac{\partial \mu_j(g)}{\partial \beta^T} = \phi(X_j\beta + H_g\gamma)X_j \quad (6.72)$$

and

$$\frac{\partial \mu_j(g)}{\partial \gamma^T} = \phi(X_j\beta + H_g\gamma)H_g \quad (6.73)$$

where $\phi(\cdot)$ is the standard normal density.

We now describe binary data under the heterogeneous variance model. Let the expectation of $Y_j$ be

$$\mu_j = E(Y_j) = \Phi\!\left[\frac{1}{\sigma_j}(X_j\beta + U_j\gamma)\right] \quad (6.74)$$

where $U_j = E(Z_j)$, $\Sigma_j = \operatorname{var}(Z_j)$, and $\sigma_j^2 = 1 + \gamma^T\Sigma_j\gamma$ absorbs the uncertainty of the QTL genotype into the residual variance. The $D$ matrix is defined as

$$D_j = \left[\frac{\partial \mu_j}{\partial \beta^T} \;\; \frac{\partial \mu_j}{\partial \gamma^T}\right] \quad (6.75)$$

where

$$\frac{\partial \mu_j}{\partial \beta^T} = \phi\!\left[\frac{1}{\sigma_j}(X_j\beta + U_j\gamma)\right]\frac{1}{\sigma_j}X_j \quad (6.76)$$

and

$$\frac{\partial \mu_j}{\partial \gamma^T} = \phi\!\left[\frac{1}{\sigma_j}(X_j\beta + U_j\gamma)\right]\left[\frac{1}{\sigma_j}U_j - \frac{1}{\sigma_j^3}(X_j\beta + U_j\gamma)\gamma^T\Sigma_j\right] \quad (6.77)$$

Binomial traits

Let 0 / ,1/ , ..., /j j j j jY n n n n be the phenotype of a binomial trait

(expressed as a ratio or fraction) with jn trials. The expectation and

variance of the phenotype are

( ) ( )j j j jE Y X Z (6.78)

and

1

var( ) (1 )j j j j

j

Y Vn

(6.79)

The weight is

Page 72: Principles and Procedures of QTL Mapping

67

1

1

j

j j

j j

nW V

(6.80)

The probability density is

(1 )( )!

(1 )( )!(

))!

( j j j jn Y n Yj

j j

j j j j

j

n

n np

nY

Y

(6.81)

The D matrix for the binomial data is exactly the same as that of the binary data.
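The binomial weight (6.80) and density (6.81) are easy to verify numerically. The helper below is a hypothetical sketch using `scipy.stats.binom`; the fraction-coded phenotype is converted back to a success count before evaluating the density.

```python
import numpy as np
from scipy.stats import binom

def binomial_weight_density(y_frac, n_trials, mu):
    """Weight (eq 6.80) and probability (eq 6.81) for a binomial trait
    recorded as the fraction y_frac = successes / n_trials."""
    weight = n_trials / (mu * (1.0 - mu))       # W_j = V_j^{-1}
    successes = int(round(y_frac * n_trials))   # n_j * Y_j
    prob = binom.pmf(successes, n_trials, mu)   # eq (6.81)
    return weight, prob
```

Note that the weight grows linearly with the number of trials: each extra trial shrinks the variance of the fraction and thus adds information.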

Poisson traits

Let 0,1, ...,jY be the phenotype of a Poisson trait. The expectation and

the variance of the phenotype are equivalent, ( ) var( )j j jE Y Y , where

expj j j jV X Z (6.82)

The weight is

1 1

expj j

j g

W VX H

(6.83)

The probability density is

( ) exp( )( )!

jY

j

j j

j

p YY

(6.84)

Let the expectation of $Y_j$ given genotype $g$ be

$$\mu_j(g) = E(Y_j \mid g) = \exp(X_j\beta + H_g\gamma), \quad g = 1, 2, 3 \quad (6.85)$$

Define

$$D_j(g) = \left[\frac{\partial \mu_j(g)}{\partial \beta^T} \;\; \frac{\partial \mu_j(g)}{\partial \gamma^T}\right] \quad (6.86)$$

where

$$\frac{\partial \mu_j(g)}{\partial \beta^T} = \exp(X_j\beta + H_g\gamma)X_j \quad\text{and}\quad \frac{\partial \mu_j(g)}{\partial \gamma^T} = \exp(X_j\beta + H_g\gamma)H_g \quad (6.87)$$

Under the heterogeneous variance model, the expectation of $Y_j$ is

$$\mu_j = E(Y_j) = \exp\!\left(X_j\beta + U_j\gamma + \tfrac{1}{2}\gamma^T\Sigma_j\gamma\right) \quad (6.88)$$

where $U_j = E(Z_j)$ and $\Sigma_j = \operatorname{var}(Z_j)$. The $D$ matrix is

$$D_j = \left[\frac{\partial \mu_j}{\partial \beta^T} \;\; \frac{\partial \mu_j}{\partial \gamma^T}\right] \quad (6.89)$$

where

$$\frac{\partial \mu_j}{\partial \beta^T} = \mu_j X_j \quad\text{and}\quad \frac{\partial \mu_j}{\partial \gamma^T} = \mu_j\left(U_j + \gamma^T\Sigma_j\right) \quad (6.90)$$
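The Poisson pieces of equations (6.82)-(6.87) translate directly into code. The helper and its inputs are illustrative; `scipy.stats.poisson` supplies the probability mass function.

```python
import numpy as np
from scipy.stats import poisson

def poisson_pieces(Xj, beta, Hg, gamma, yj):
    """Expectation, weight, density and derivative row for a Poisson
    trait under the mixture model (eqs 6.82-6.87)."""
    mu = np.exp(Xj @ beta + Hg @ gamma)        # E(Y) = var(Y), eq (6.85)
    weight = 1.0 / mu                          # inverse variance, eq (6.83)
    density = poisson.pmf(yj, mu)              # eq (6.84)
    D = np.concatenate([mu * Xj, mu * Hg])     # eq (6.87)
    return mu, weight, density, D
```

Because the log link makes the derivative of the mean equal to the mean itself, the rows of $D$ are just the regressors scaled by $\mu_j(g)$.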

MAPPING QUANTITATIVE TRAIT LOCI UNDER SEGREGATION DISTORTION

Segregation distortion is the phenomenon in which the genotypic frequency array of a locus does not follow the typical Mendelian ratio. Depending on the population under investigation, the Mendelian ratio of a locus varies from 1:1 for a backcross to 1:2:1 for an $F_2$ and to 1:1:1:1 for a four-way cross. For various reasons, a marker may not follow the typical Mendelian ratio. Such a marker is called a distorted marker. For a long time, the effects of distorted markers on the results of QTL mapping were not known. As a precaution, people simply discarded all distorted markers in QTL mapping. Recently, we found that distorted markers can be safely used for QTL mapping with no detrimental effect on the result (XU 2008). This finding can help QTL mappers save tremendous resources by using all available markers, regardless of whether they are Mendelian or not. We also found that if distorted markers are handled properly, they can even be beneficial to QTL mapping.

Marker segregation distortion is only a phenomenon. The distortion is caused by one or more segregation distortion loci (SDL). These loci are subject to gametic selection (FARIS et al. 1998) or zygotic selection (KÄRKKÄINEN et al. 1996), and their (unobservable) distorted segregation causes the observed markers to distort. Several investigators (FU and RITLAND 1994; LORIEUX et al. 1995a; LORIEUX et al. 1995b; LUO and XU 2003; LUO et al. 2005; VOGL and XU 2000; WANG et al. 2005a; ZHU and ZHANG 2007) have attempted to map these segregation distortion loci using molecular markers. It is natural to consider mapping QTL and SDL jointly in the same population. Agricultural scientists are interested in mapping QTL for economically important traits, while evolutionary biologists are interested in mapping SDL that respond to natural selection. Combining the two mapping strategies into one is beneficial to both communities, and performing such a joint mapping strategy is the main objective of this chapter. Since the theory of segregation distortion has been introduced and discussed in previous studies (LORIEUX et al. 1995a; LORIEUX et al. 1995b) and in our own research (XU 2008), this chapter only presents the EM (expectation-maximization) implementation of the statistical method. The variance-covariance matrix of the estimated parameters under the EM algorithm is also derived and presented. Details of the method can be found in our recent publication (XU and HU 2010b).

THE LIKELIHOOD OF MARKERS

Let $M$ and $N$ be the left and right flanking markers bracketing the QTL (denoted by $G$ for short). The interval of the genome carrying the three loci is labeled by the segment $M\text{-}G\text{-}N$. The three genotypes of the QTL are denoted by $G_1G_1$, $G_1G_2$ and $G_2G_2$, respectively. Similar notation also applies to the genotypes of the flanking markers. The interval defined by markers $M$ and $N$ is divided into two segments. Let $r_1$ and $r_2$ be the recombination fractions for segment $M\text{-}G$ and segment $G\text{-}N$, respectively. The joint distribution of the marker genotypes conditional on the QTL genotype can be derived using the Markov chain property under the assumption of no segregation interference between consecutive loci. Let us order the three genotypes $G_1G_1$, $G_1G_2$ and $G_2G_2$ as genotypes 1, 2 and 3, respectively. If individual $j$ takes the $k$th genotype for the QTL, we denote the event by $G_j = k$, $k = 1, 2, 3$. The joint probability of the two markers conditional on the genotype of the QTL is

$$\Pr(M_j, N_j \mid G_j) = \Pr(M_j \mid G_j)\Pr(N_j \mid G_j) \quad (7.1)$$

for all $M_j, N_j, G_j \in \{1, 2, 3\}$, where $\Pr(M_j = l \mid G_j = k) = T_1(k, l)$ and $\Pr(N_j = l \mid G_j = k) = T_2(k, l)$. We use $T_i(k, l)$ to denote the $k$th row and the $l$th column of the following transition matrix

$$T_i = \begin{bmatrix} (1 - r_i)^2 & 2r_i(1 - r_i) & r_i^2 \\ r_i(1 - r_i) & (1 - r_i)^2 + r_i^2 & r_i(1 - r_i) \\ r_i^2 & 2r_i(1 - r_i) & (1 - r_i)^2 \end{bmatrix}, \quad i = 1, 2 \quad (7.2)$$

For example,

$$\Pr(M_j = 1, N_j = 2 \mid G_j = 3) = \Pr(M_j = 1 \mid G_j = 3)\Pr(N_j = 2 \mid G_j = 3) = T_1(3, 1)T_2(3, 2) = r_1^2 \cdot 2r_2(1 - r_2) \quad (7.3)$$
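The transition matrix of equation (7.2) and the worked example in (7.3) can be checked numerically. The function name below is illustrative; genotypes are coded 0, 1, 2 for $G_1G_1$, $G_1G_2$, $G_2G_2$.

```python
import numpy as np

def transition_matrix(r):
    """F2 marker genotype distribution conditional on the QTL genotype,
    eq (7.2): row = QTL genotype, column = marker genotype."""
    return np.array([
        [(1 - r) ** 2,  2 * r * (1 - r),        r ** 2],
        [r * (1 - r),   (1 - r) ** 2 + r ** 2,  r * (1 - r)],
        [r ** 2,        2 * r * (1 - r),        (1 - r) ** 2],
    ])
```

Each row is a probability distribution (rows sum to one), and the product of an entry from $T_1$ and an entry from $T_2$ gives the joint flanking-marker probability of equation (7.1).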

Let $\omega_k = \Pr(G = k)$, $k = 1, 2, 3$, be the probability that a randomly sampled individual from the $F_2$ family has genotype $k$. We use the generic notation $p$ for probability, so that $p(G_j = k)$ represents $\Pr(G_j = k)$ and $p(M_j, N_j \mid G_j = k)$ stands for $\Pr(M_j, N_j \mid G_j = k)$. The log likelihood function of the flanking-marker genotypes in the $F_2$ population is

$$L(\omega \mid m) = \sum_{j=1}^{n}\ln\sum_{k=1}^{3} p(G_j = k)\,p(M_j, N_j \mid G_j = k) = \sum_{j=1}^{n}\ln\sum_{k=1}^{3}\omega_k T_1(k, M_j)T_2(k, N_j) \quad (7.4)$$

where $\omega = [\omega_1\;\;\omega_2\;\;\omega_3]^T$ is a vector of parameters with the constraint $\sum_{k=1}^{3}\omega_k = 1$, and where $m$ in $L(\omega \mid m)$ stands for the marker information. Note that without any prior information, $p(G_j = k) = \omega_k$ for $j = 1, \ldots, n$. Under the assumption of Mendelian segregation, $\omega = [\omega_1\;\;\omega_2\;\;\omega_3]^T = [1/4\;\;1/2\;\;1/4]^T$. However, we treat $\omega$ as unknown parameters. We postulate that deviation of $\omega$ from the Mendelian ratio will cause a marker linked to locus $G$ to show distorted segregation. This likelihood function is the one used in mapping viability loci (LUO et al. 2005).
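Equation (7.4) is a simple finite mixture over the three genotypes. A NumPy sketch (the function name is illustrative; genotypes are coded 0, 1, 2 and the transition matrices are supplied by the caller):

```python
import numpy as np

def marker_loglik(omega, M, N, T1, T2):
    """Log likelihood of flanking-marker genotypes, eq (7.4).
    M, N: integer genotype codes per individual; omega: genotype
    frequencies of the viability locus G."""
    # mixture over the three QTL genotypes, evaluated for every individual
    like = sum(omega[k] * T1[k, M] * T2[k, N] for k in range(3))
    return float(np.sum(np.log(like)))
```

Maximizing this function over $\omega$ (subject to the sum-to-one constraint) yields the viability-locus estimates referred to above.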

THE LIKELIHOOD OF PHENOTYPES

Let $y_j$ be the phenotypic value of a quantitative trait measured on individual $j$. The probability density of $y_j$ conditional on the genotype of individual $j$ is normal with mean

$$X_j\beta + H_k\gamma \quad (7.5)$$

and variance $\sigma^2$, i.e.,

$$p(y_j \mid G_j = k) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left[-\frac{1}{2\sigma^2}(y_j - X_j\beta - H_k\gamma)^2\right] \quad (7.6)$$

where $H_k$ is the $k$th row of matrix $H$ and

$$H = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -1 & 0 \end{bmatrix} \quad (7.7)$$

Vector $\gamma = [a\;\;d]^T$ contains the additive and dominance effects. The design matrix $X_j$ and the regression coefficients $\beta$ capture non-QTL effects, e.g., location effects, year effects and so on. The likelihood function of the phenotypic values in the $F_2$ population is

$$L(\beta, \gamma, \sigma^2, \omega \mid y) = \sum_{j=1}^{n}\ln\sum_{k=1}^{3} p(G_j = k)\,p(y_j \mid G_j = k) = -\frac{n}{2}\ln(2\pi\sigma^2) + \sum_{j=1}^{n}\ln\sum_{k=1}^{3}\omega_k\exp\!\left[-\frac{1}{2\sigma^2}(y_j - X_j\beta - H_k\gamma)^2\right] \quad (7.8)$$

where the letter $y$ in $L(\beta, \gamma, \sigma^2, \omega \mid y)$ stands for the phenotype. This likelihood function is the one used in segregation analysis of quantitative traits (ELSTON and STEWART 1973) because no marker information is required.


JOINT LIKELIHOOD OF MARKERS AND PHENOTYPES

Let $\theta = [\beta^T\;\;\gamma^T\;\;\sigma^2\;\;\omega^T]^T$ be the vector of all parameters in the joint analysis. The likelihood function can be obtained by combining equations (7.4) and (7.8),

$$L(\theta \mid m, y) = \sum_{j=1}^{n}\ln\sum_{k=1}^{3} p(G_j = k)\,p(y_j \mid G_j = k)\,p(M_j, N_j \mid G_j = k) = -\frac{n}{2}\ln(2\pi\sigma^2) + \sum_{j=1}^{n}\ln\sum_{k=1}^{3}\omega_k\exp\!\left[-\frac{1}{2\sigma^2}(y_j - X_j\beta - H_k\gamma)^2\right]T_1(k, M_j)T_2(k, N_j) \quad (7.9)$$

For QTL mapping under segregation distortion, this log likelihood function is the one that is subject to maximization. The previous two likelihood functions (for markers and for phenotypes) were presented as background information to introduce this joint log likelihood function.

EM ALGORITHM FOR THE JOINT ANALYSIS

The MLE (maximum likelihood estimate) of the parameters can be obtained via an EM algorithm (DEMPSTER et al. 1977). We need to rewrite the likelihood function in its complete-data form. Let us define a delta function as

$$\delta(G_j, k) = \begin{cases} 1 & \text{if } G_j = k \\ 0 & \text{if } G_j \neq k \end{cases} \quad (7.10)$$

If the genotypes of all individuals are known, i.e., given $\delta(G_j, k)$ for all $j = 1, \ldots, n$ and $k = 1, 2, 3$, the complete-data log likelihood is

$$L(\theta, \delta) = \sum_{j=1}^{n}\log\left[p(y_j \mid G_j)\,p(M_j, N_j \mid G_j)\,p(G_j)\right] \quad (7.11)$$

where

$$p(y_j \mid G_j) = \prod_{k=1}^{3}\left\{\frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left[-\frac{1}{2\sigma^2}(y_j - X_j\beta - H_k\gamma)^2\right]\right\}^{\delta(G_j, k)} \quad (7.12)$$

$$p(M_j, N_j \mid G_j) = \prod_{k=1}^{3}\left[p(M_j, N_j \mid G_j = k)\right]^{\delta(G_j, k)} = \prod_{k=1}^{3}\left[T_1(k, M_j)T_2(k, N_j)\right]^{\delta(G_j, k)} \quad (7.13)$$

and

$$p(G_j) = \prod_{k=1}^{3}\omega_k^{\delta(G_j, k)} \quad (7.14)$$


The delta variables are missing values. Therefore, we need to take the expectation of the likelihood with respect to $\delta$. The expected likelihood function is

$$L(\theta \mid \theta^{(t)}) = E\left[L(\theta, \delta) \mid \theta^{(t)}\right] = \Omega_0 + \Omega_1(\theta) + \Omega_2(\theta) \quad (7.15)$$

Note that $E[L(\theta, \delta) \mid \theta^{(t)}]$ stands for the expectation of $L(\theta, \delta)$ with respect to $\delta$, conditional on the parameters at the current state $\theta^{(t)}$ and the data (the symbol for the data is suppressed for simplicity). The three components of equation (7.15) are

$$\Omega_0 = \sum_{j=1}^{n}\sum_{k=1}^{3} E\left[\delta(G_j, k) \mid \theta^{(t)}\right]\log\left[T_1(k, M_j)T_2(k, N_j)\right] \quad (7.16)$$

$$\Omega_1(\theta) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{j=1}^{n}\sum_{k=1}^{3} E\left[\delta(G_j, k) \mid \theta^{(t)}\right](y_j - X_j\beta - H_k\gamma)^2 \quad (7.17)$$

and

$$\Omega_2(\theta) = \sum_{j=1}^{n}\sum_{k=1}^{3} E\left[\delta(G_j, k) \mid \theta^{(t)}\right]\ln\omega_k \quad (7.18)$$

The first component, $\Omega_0$, is a function of $\theta^{(t)}$ but not of $\theta$. Therefore, it is treated as a constant.

Expectation (E-step): The expectation step of the EM algorithm requires computing the expectation of $\delta$ conditional on the data and $\theta^{(t)}$. Because $\delta$ is a Bernoulli variable, the expectation is simply the probability of $\delta = 1$, i.e.,

$$E\left[\delta(G_j, k) \mid \theta^{(t)}\right] = \Pr\left[\delta(G_j, k) = 1 \mid \theta^{(t)}, m, y\right] = \frac{\omega_k\exp\!\left[-\frac{1}{2\sigma^2}(y_j - X_j\beta - H_k\gamma)^2\right]T_1(k, M_j)T_2(k, N_j)}{\displaystyle\sum_{k'=1}^{3}\omega_{k'}\exp\!\left[-\frac{1}{2\sigma^2}(y_j - X_j\beta - H_{k'}\gamma)^2\right]T_1(k', M_j)T_2(k', N_j)} \quad (7.19)$$

where all parameters on the right-hand side are evaluated at $\theta^{(t)}$.

Maximization (M-step): The maximization step of the EM algorithm requires taking the partial derivatives of $L(\theta \mid \theta^{(t)})$ with respect to $\theta$, setting the partial derivatives equal to zero, and solving for the parameters,

$$\frac{\partial L(\theta \mid \theta^{(t)})}{\partial\theta} = \frac{\partial\Omega_1(\theta)}{\partial\theta} + \frac{\partial\Omega_2(\theta)}{\partial\theta} = 0 \quad (7.20)$$

The solutions for the parameters are

Page 79: Principles and Procedures of QTL Mapping

74

13

1 1 1

13

1 1

32

2

1 1

1

( , )

( , ) ( )

1( , )

1( , ) , 1, 2,3

n nT

j j j j

j j

n nT

j j j

j j

n

j j j

j

n

j

j

X X E G y H

E G H H y X

E G y X Hn

E Gn

(7.21)
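One full M-step of equation (7.21) can be sketched with NumPy. This is a hypothetical illustration, not PROC QTL code: `post[j, k]` holds the posterior $E[\delta(G_j, k)]$ from the E-step, and the current $\gamma$ is used in the $\beta$ update, matching the alternating structure of (7.21).

```python
import numpy as np

def m_step(y, X, H, post, gamma):
    """One M-step of the joint EM algorithm, eq (7.21)."""
    n = len(y)
    Ez = post @ H                                     # E(Z_j) rows, shape (n, 2)
    beta = np.linalg.solve(X.T @ X, X.T @ (y - Ez @ gamma))
    r = y - X @ beta                                  # residuals without QTL effect
    wk = post.sum(axis=0)                             # column sums of posteriors
    A = sum(wk[k] * np.outer(H[k], H[k]) for k in range(3))
    b = sum(H[k] * (post[:, k] @ r) for k in range(3))
    gamma_new = np.linalg.solve(A, b)
    sigma2 = sum(post[:, k] @ (r - H[k] @ gamma_new) ** 2 for k in range(3)) / n
    omega = wk / n                                    # genotype frequencies
    return beta, gamma_new, sigma2, omega
```

In practice the E-step of equation (7.19) and this M-step would be iterated until the parameter estimates stabilize.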

HYPOTHESIS TESTING

Hypothesis tests are complicated when QTL segregate in a non-Mendelian fashion. There are many different hypotheses we can test here. Although the Wald test can be performed for testing the presence of a QTL, such a test is not justified for testing the null hypothesis of Mendelian segregation. Therefore, the likelihood ratio tests are more justifiable here. Regardless of which hypothesis is tested, the full-model joint log likelihood function given in equation (7.9) is required. Let us reintroduce this joint log likelihood function using a different notation so that the different likelihood ratio tests are easily interpreted. The joint likelihood is rewritten as

$$L_{QS}(\gamma, \omega) = -\frac{n}{2}\log(2\pi\sigma^2) + \sum_{j=1}^{n}\ln\sum_{k=1}^{3}\omega_k\exp\!\left[-\frac{1}{2\sigma^2}(y_j - X_j\beta - H_k\gamma)^2\right]T_1(k, M_j)T_2(k, N_j) \quad (7.22)$$

where $\gamma$ indicates the QTL effect and $\omega$ represents non-Mendelian segregation. The null hypothesis for QTL detection is $H_{QTL}\colon \gamma = 0$, while the null hypothesis for detecting segregation distortion is $H_{SDL}\colon \omega = \omega_0$, where $\omega_0 = [1/4\;\;1/2\;\;1/4]^T$ is the Mendelian ratio.

Testing the presence of QTL:

The null hypothesis is $H_{QTL}\colon \gamma = 0$. The likelihood ratio test statistic is

$$\lambda_{QTL} = -2\left[L_S(0, \hat\omega) - L_{QS}(\hat\gamma, \hat\omega)\right] \quad (7.23)$$

where $L_S(0, \hat\omega)$ is the log likelihood value under the null model $\gamma = 0$, which is

$$L_S(0, \hat\omega) = L(\hat\beta, \hat\sigma^2 \mid y) + L(\hat\omega \mid m) \quad (7.24)$$

where

$$L(\hat\beta, \hat\sigma^2 \mid y) = -\frac{n}{2}\ln(2\pi\hat\sigma^2) - \frac{1}{2\hat\sigma^2}\sum_{j=1}^{n}(y_j - X_j\hat\beta)^2 = -\frac{n}{2}\left[\ln(2\pi\hat\sigma^2) + 1\right] \quad (7.25)$$

and

$$L(\hat\omega \mid m) = \sum_{j=1}^{n}\ln\sum_{k=1}^{3}\hat\omega_k T_1(k, M_j)T_2(k, N_j) \quad (7.26)$$

The estimated parameters in (7.25) and (7.26) are obtained by maximizing the corresponding likelihood functions.

Testing non-Mendelian segregation:

The null hypothesis is $H_{SDL}\colon \omega = \omega_0$. The likelihood ratio test statistic is

$$\lambda_{SDL} = -2\left[L_Q(\hat\gamma, \omega_0) - L_{QS}(\hat\gamma, \hat\omega)\right] \quad (7.27)$$

where

$$L_Q(\hat\gamma, \omega_0) = -\frac{n}{2}\ln(2\pi\hat\sigma^2) + \sum_{j=1}^{n}\ln\sum_{k=1}^{3}\omega_{0k}\exp\!\left[-\frac{1}{2\hat\sigma^2}(y_j - X_j\hat\beta - H_k\hat\gamma)^2\right]T_1(k, M_j)T_2(k, N_j) \quad (7.28)$$

Again, the MLEs of the parameters in equation (7.28) are obtained by maximizing this likelihood function.

Testing both QTL and SDL:

The null hypothesis is $H_0\colon \gamma = 0$ and $\omega = \omega_0$. The likelihood ratio test statistic is

$$\lambda = -2\left[L(0, \omega_0) - L_{QS}(\hat\gamma, \hat\omega)\right] \quad (7.29)$$

where

$$L(0, \omega_0) = L(\hat\beta, \hat\sigma^2 \mid y) + L(\omega_0 \mid m) \quad (7.30)$$

The two components of (7.30) are

$$L(\hat\beta, \hat\sigma^2 \mid y) = -\frac{n}{2}\ln(2\pi\hat\sigma^2) - \frac{1}{2\hat\sigma^2}\sum_{j=1}^{n}(y_j - X_j\hat\beta)^2 = -\frac{n}{2}\left[\ln(2\pi\hat\sigma^2) + 1\right] \quad (7.31)$$

and

$$L(\omega_0 \mid m) = \sum_{j=1}^{n}\ln\sum_{k=1}^{3}\omega_{0k}T_1(k, M_j)T_2(k, N_j) \quad (7.32)$$

This hypothesis is rejected if either $\gamma \neq 0$ or $\omega \neq \omega_0$ or both. The QTL effect and the segregation distortion are confounded. This joint test may be useful in the following situation. Suppose that, for some reason, we know for sure that the population from which the sample is drawn is a Mendelian population. The sample drawn from this population is selected based on extreme phenotypes (selective genotyping). The sample is then non-Mendelian with respect to the QTL that control the trait subject to phenotypic selection. Rejecting this hypothesis is equivalent to rejecting the null hypothesis of no QTL, because segregation distortion in the sample is solely caused by the selective genotyping. Therefore, this joint test can be used to detect QTL under selective genotyping.

STANDARD ERRORS OF THE ESTIMATED PARAMETERS

Let us define the individual-wise complete-data log likelihood for individual $j$ as

$$L_j(\theta, \delta) = -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{k=1}^{3}\delta(G_j, k)(y_j - X_j\beta - H_k\gamma)^2 + \sum_{k=1}^{3}\delta(G_j, k)\left[\ln T_1(k, M_j) + \ln T_2(k, N_j)\right] + \sum_{k=1}^{3}\delta(G_j, k)\ln\omega_k \quad (7.33)$$

where $\omega_3 = 1 - \omega_1 - \omega_2$, so that $\omega_3$ is excluded from the parameter vector. To derive the variance-covariance matrix of the estimated parameters, we need to define the score vector $S_j(\theta, \delta)$ and the Hessian matrix $H_j(\theta, \delta)$ for the individual-wise complete-data log likelihood function. The Louis (1982) information matrix of the parameters under the EM algorithm is then

$$I(\hat\theta) = -\sum_{j=1}^{n} E\left[H_j(\hat\theta, \delta)\right] - \sum_{j=1}^{n}\operatorname{var}\left[S_j(\hat\theta, \delta)\right] \quad (7.34)$$

where the expectation and variance are taken with respect to the missing values $\delta$. Once the information matrix is defined, the variance matrix of the estimated parameters is simply

$$\operatorname{var}(\hat\theta) = I^{-1}(\hat\theta) \quad (7.35)$$

The standard error of each parameter is the square root of the corresponding diagonal element of the above matrix.

We now present the score vector and the Hessian matrix. The score vector is denoted by $S_j(\theta, \delta) = \partial L_j(\theta, \delta)/\partial\theta$, which consists of five blocks, as shown below,

$$\begin{aligned}
S_j(\beta) &= \frac{\partial L_j(\theta, \delta)}{\partial\beta} = \frac{1}{\sigma^2}\sum_{k=1}^{3}\delta(G_j, k)X_j^T(y_j - X_j\beta - H_k\gamma) \\
S_j(\gamma) &= \frac{\partial L_j(\theta, \delta)}{\partial\gamma} = \frac{1}{\sigma^2}\sum_{k=1}^{3}\delta(G_j, k)H_k^T(y_j - X_j\beta - H_k\gamma) \\
S_j(\sigma^2) &= \frac{\partial L_j(\theta, \delta)}{\partial\sigma^2} = -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{k=1}^{3}\delta(G_j, k)(y_j - X_j\beta - H_k\gamma)^2 \\
S_j(\omega_1) &= \frac{\partial L_j(\theta, \delta)}{\partial\omega_1} = \frac{1}{\omega_1}\delta(G_j, 1) - \frac{1}{1 - \omega_1 - \omega_2}\delta(G_j, 3) \\
S_j(\omega_2) &= \frac{\partial L_j(\theta, \delta)}{\partial\omega_2} = \frac{1}{\omega_2}\delta(G_j, 2) - \frac{1}{1 - \omega_1 - \omega_2}\delta(G_j, 3)
\end{aligned} \quad (7.36)$$

Concatenating the above five blocks vertically, we obtain the score vector,

$$S_j(\theta, \delta) = \begin{bmatrix} S_j(\beta) \\ S_j(\gamma) \\ S_j(\sigma^2) \\ S_j(\omega_1) \\ S_j(\omega_2) \end{bmatrix} \quad (7.37)$$

The Hessian matrix is denoted by $H_j(\theta, \delta) = \partial^2 L_j(\theta, \delta)/\partial\theta\,\partial\theta^T$, which is block diagonal with the non-zero blocks given below,

$$\begin{aligned}
\frac{\partial^2 L_j(\theta, \delta)}{\partial\beta\,\partial\beta^T} &= -\frac{1}{\sigma^2}X_j^T X_j \\
\frac{\partial^2 L_j(\theta, \delta)}{\partial\beta\,\partial\gamma^T} &= -\frac{1}{\sigma^2}\sum_{k=1}^{3}\delta(G_j, k)X_j^T H_k \\
\frac{\partial^2 L_j(\theta, \delta)}{\partial\beta\,\partial\sigma^2} &= -\frac{1}{\sigma^4}\sum_{k=1}^{3}\delta(G_j, k)X_j^T(y_j - X_j\beta - H_k\gamma) \\
\frac{\partial^2 L_j(\theta, \delta)}{\partial\gamma\,\partial\gamma^T} &= -\frac{1}{\sigma^2}\sum_{k=1}^{3}\delta(G_j, k)H_k^T H_k \\
\frac{\partial^2 L_j(\theta, \delta)}{\partial\gamma\,\partial\sigma^2} &= -\frac{1}{\sigma^4}\sum_{k=1}^{3}\delta(G_j, k)H_k^T(y_j - X_j\beta - H_k\gamma) \\
\frac{\partial^2 L_j(\theta, \delta)}{\partial(\sigma^2)^2} &= \frac{1}{2\sigma^4} - \frac{1}{\sigma^6}\sum_{k=1}^{3}\delta(G_j, k)(y_j - X_j\beta - H_k\gamma)^2 \\
\frac{\partial^2 L_j(\theta, \delta)}{\partial\omega_1^2} &= -\frac{1}{\omega_1^2}\delta(G_j, 1) - \frac{1}{(1 - \omega_1 - \omega_2)^2}\delta(G_j, 3) \\
\frac{\partial^2 L_j(\theta, \delta)}{\partial\omega_1\,\partial\omega_2} &= -\frac{1}{(1 - \omega_1 - \omega_2)^2}\delta(G_j, 3) \\
\frac{\partial^2 L_j(\theta, \delta)}{\partial\omega_2^2} &= -\frac{1}{\omega_2^2}\delta(G_j, 2) - \frac{1}{(1 - \omega_1 - \omega_2)^2}\delta(G_j, 3)
\end{aligned} \quad (7.38)$$

The Hessian matrix is obtained through

$$H_j(\theta, \delta) = \begin{bmatrix}
H_j(\beta\beta) & H_j(\beta\gamma) & H_j(\beta\sigma^2) & 0 & 0 \\
H_j^T(\beta\gamma) & H_j(\gamma\gamma) & H_j(\gamma\sigma^2) & 0 & 0 \\
H_j^T(\beta\sigma^2) & H_j^T(\gamma\sigma^2) & H_j(\sigma^2\sigma^2) & 0 & 0 \\
0 & 0 & 0 & H_j(\omega_1\omega_1) & H_j(\omega_1\omega_2) \\
0 & 0 & 0 & H_j(\omega_1\omega_2) & H_j(\omega_2\omega_2)
\end{bmatrix} \quad (7.39)$$

The expectation of the Hessian matrix, $E[H_j(\theta, \delta)]$, and the variance matrix of the score vector, $\operatorname{var}[S_j(\theta, \delta)]$, can be expressed explicitly because both the Hessian matrix and the score vector are linear functions of the missing vector

$$\delta_j = \left[\delta(G_j, 1)\;\;\delta(G_j, 2)\;\;\delta(G_j, 3)\right]^T \quad (7.40)$$

Therefore, $E[H_j(\theta, \delta)]$ and $\operatorname{var}[S_j(\theta, \delta)]$ can eventually be expressed as functions of the expectation and variance of $\delta_j$, which have simple expressions because $\delta_j$ is a multinomial variable. The Hessian matrix $H_j(\theta, \delta)$ is already expressed as a linear function of $\delta_j$, and thus its expectation is obtained simply by replacing $\delta_j$ with $E(\delta_j)$. The linearity of the score function is less obvious. The following gives the linear relationship using matrix notation. Let us define the following matrices,

$$C_{1j} = \frac{1}{\sigma^2}\left[X_j^T(y_j - X_j\beta - H_1\gamma)\;\;\;X_j^T(y_j - X_j\beta - H_2\gamma)\;\;\;X_j^T(y_j - X_j\beta - H_3\gamma)\right]$$

$$C_{2j} = \frac{1}{\sigma^2}\left[H_1^T(y_j - X_j\beta - H_1\gamma)\;\;\;H_2^T(y_j - X_j\beta - H_2\gamma)\;\;\;H_3^T(y_j - X_j\beta - H_3\gamma)\right]$$

$$C_{3j} = \frac{1}{2\sigma^4}\left[(y_j - X_j\beta - H_1\gamma)^2\;\;\;(y_j - X_j\beta - H_2\gamma)^2\;\;\;(y_j - X_j\beta - H_3\gamma)^2\right]$$

$$C_{4j} = \left[\frac{1}{\omega_1}\;\;\;0\;\;\;-\frac{1}{1 - \omega_1 - \omega_2}\right];\qquad C_{5j} = \left[0\;\;\;\frac{1}{\omega_2}\;\;\;-\frac{1}{1 - \omega_1 - \omega_2}\right] \quad (7.41)$$

The score vector in matrix notation is

$$S_j(\theta, \delta) = \begin{bmatrix} C_{1j} \\ C_{2j} \\ C_{3j} \\ C_{4j} \\ C_{5j} \end{bmatrix}\delta_j - \begin{bmatrix} 0 \\ 0 \\ \dfrac{1}{2\sigma^2} \\ 0 \\ 0 \end{bmatrix} = C_j\delta_j - c \quad (7.42)$$

As a result,

$$\operatorname{var}\left[S_j(\theta, \delta)\right] = C_j\operatorname{var}(\delta_j)C_j^T \quad (7.43)$$

where

$$\operatorname{var}(\delta_j) = \operatorname{diag}\left[E(\delta_j)\right] - E(\delta_j)E(\delta_j)^T \quad (7.44)$$

is the variance-covariance matrix of the multinomial variable $\delta_j$.
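Equation (7.44) is the standard multinomial covariance, which can be verified in one line. The helper name is illustrative.

```python
import numpy as np

def delta_cov(p):
    """Variance-covariance matrix of the multinomial indicator delta_j,
    eq (7.44): diag(E(delta)) - E(delta) E(delta)^T."""
    p = np.asarray(p, dtype=float)
    return np.diag(p) - np.outer(p, p)
```

Because the indicators sum to one, every row of the covariance matrix sums to zero, a useful sanity check on any implementation of (7.43)-(7.44).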


INTERVAL MAPPING FOR MULTIPLE TRAITS

Multiple traits are measured in virtually all line-crossing experiments of QTL mapping. Yet, almost all data collected for multiple traits are analyzed separately for different traits. Joint analysis of multiple traits has shed new light on QTL mapping by improving the statistical power of QTL detection and increasing the accuracy of QTL localization when the different traits segregating in the mapping population are genetically related. Joint analysis of multiple traits is defined as a method that includes all traits simultaneously in a single model, rather than analyzing one trait at a time and reporting the results in a format that merely appears to be multiple-trait analysis. In addition to the increased power and resolution of QTL detection, joint mapping can provide insights into fundamental genetic mechanisms underlying trait relationships, such as pleiotropy versus close linkage and genotype by environment (G×E) interaction, which would otherwise be difficult to address if traits were analyzed separately. The current version of PROC QTL can perform interval mapping for multiple continuously distributed traits. If one or more traits in the multiple-trait set are ordinal, you must use the Bayesian method to perform the multivariate analysis, which will be introduced later in the manual. Details of the multivariate QTL mapping can be found in Xu et al. (2005a).

MULTIVARIATE MODEL

Let $n$ be the sample size of a mapping population, say an $F_2$ cross, and $m$ be the number of traits. Let $y_j = [y_{j1}\;\cdots\;y_{jm}]^T$ be an $m \times 1$ column vector of the phenotypic values of the $m$ traits measured on individual $j$ for $j = 1, \ldots, n$. The linear model for $y_j$ can be described as

$$y_j = \beta X_j + \gamma Z_j + \epsilon_j \quad (8.1)$$

where $\beta$ is an $m \times p$ matrix of effects not related to the QTL, $X_j$ is a $p \times 1$ design vector connecting the non-QTL effects to the phenotypic values, $\gamma$ is an $m \times 2$ matrix of QTL effects, $Z_j$ is a $2 \times 1$ vector determined by the genotype of individual $j$ for the QTL, and $\epsilon_j$ is an $m \times 1$ vector of residual errors. The QTL effect matrix is defined as

$$\gamma = \begin{bmatrix} a_1 & d_1 \\ a_2 & d_2 \\ \vdots & \vdots \\ a_m & d_m \end{bmatrix} \quad (8.2)$$

where $a_i$ and $d_i$ are the additive and dominance effects for the $i$th trait, $i = 1, \ldots, m$. The vector $Z_j = [Z_{j1}\;\;Z_{j2}]^T$ is defined as

$$Z_{j1} = \begin{cases} 1 & \text{for } A_1A_1 \\ 0 & \text{for } A_1A_2 \\ -1 & \text{for } A_2A_2 \end{cases} \qquad\text{and}\qquad Z_{j2} = \begin{cases} 0 & \text{for } A_1A_1 \\ 1 & \text{for } A_1A_2 \\ 0 & \text{for } A_2A_2 \end{cases} \quad (8.3)$$

In matrix notation, $Z_j$ takes one of the three columns of matrix $H$,

$$H = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix} \quad (8.4)$$

In other words, $Z_j$ is defined as

$$Z_j = \begin{cases} H_1 & \text{for } A_1A_1 \\ H_2 & \text{for } A_1A_2 \\ H_3 & \text{for } A_2A_2 \end{cases} \quad (8.5)$$

where $H_k$ is the $k$th column of matrix $H$. The residual error $\epsilon_j$ is assumed to be multivariate normal,

$$\epsilon_j \sim N(0, \Sigma) \quad (8.6)$$

where $\Sigma$ is an $m \times m$ positive definite covariance matrix.

LEAST SQUARE METHOD

There are two methods that users can select for multivariate QTL mapping. The first is the so-called "least squares" (LS) method, which is the multivariate version (KNOTT and HALEY 2000) of the Haley-Knott method (1992). Under the LS method, the model is

$$y_j = \beta X_j + \gamma U_j + e_j \quad (8.7)$$

where

$$U_j = \sum_{k=1}^{3} p_j(k)H_k \quad (8.8)$$

is the expectation of $Z_j$ conditional on the marker information, with $p_j(k)$ being the conditional probability of the $k$th QTL genotype. The LS estimates of the parameters are obtained as follows. Let $\theta = [\beta \,\|\, \gamma]$ be the horizontal concatenation of matrices $\beta$ and $\gamma$, i.e., $\theta$ is an $m \times (p + 2)$ matrix combining $\beta$ and $\gamma$ horizontally (put side by side). Let $W_j = [X_j \,/\!/\, U_j]$ be the vertical concatenation of vectors $X_j$ and $U_j$. The linear model given in equation (8.7) is rewritten as

$$y_j = \theta W_j + e_j \quad (8.9)$$

This model provides an easy expression for the LS estimates of the parameters,

$$\hat\theta = \left(\sum_{j=1}^{n} y_j W_j^T\right)\left(\sum_{j=1}^{n} W_j W_j^T\right)^{-1} \quad (8.10)$$

and

$$\hat\Sigma = \frac{1}{n}\sum_{j=1}^{n}(y_j - \hat\theta W_j)(y_j - \hat\theta W_j)^T \quad (8.11)$$
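Equations (8.10)-(8.11) can be computed in a few lines of NumPy by stacking the individuals column-wise. This is a minimal sketch with hypothetical names, not PROC QTL code.

```python
import numpy as np

def ls_fit(Y, W):
    """Multivariate least squares of eqs (8.10)-(8.11).
    Y: (m, n) trait values by individual; W: (p + 2, n) stacked
    regressors [X_j // U_j] for each individual."""
    theta = (Y @ W.T) @ np.linalg.inv(W @ W.T)   # eq (8.10)
    R = Y - theta @ W                            # residual matrix
    Sigma = (R @ R.T) / Y.shape[1]               # eq (8.11)
    return theta, Sigma
```

With noise-free data the residual covariance estimate is exactly zero, which is a convenient correctness check.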

Since the variance-covariance matrix of the estimated parameters requires a complicated rearrangement of the elements within $\hat\theta$, we do not provide the Wald test statistic for the QTL effects; rather, we only give the likelihood ratio test statistic, which is defined as

$$\lambda = -2(L_0 - L_1) \quad (8.12)$$

where

$$L_1 = -\frac{n}{2}\ln|\hat\Sigma| - \frac{1}{2}\sum_{j=1}^{n}(y_j - \hat\theta W_j)^T\hat\Sigma^{-1}(y_j - \hat\theta W_j) \quad (8.13)$$

and

$$L_0 = -\frac{n}{2}\ln|\tilde\Sigma| - \frac{1}{2}\sum_{j=1}^{n}(y_j - \tilde\beta X_j)^T\tilde\Sigma^{-1}(y_j - \tilde\beta X_j) \quad (8.14)$$

The estimated parameters under the reduced model are

$$\tilde\beta = \left(\sum_{j=1}^{n} y_j X_j^T\right)\left(\sum_{j=1}^{n} X_j X_j^T\right)^{-1} \quad (8.15)$$

and

$$\tilde\Sigma = \frac{1}{n}\sum_{j=1}^{n}(y_j - \tilde\beta X_j)(y_j - \tilde\beta X_j)^T \quad (8.16)$$

MAXIMUM LIKELIHOOD METHOD

Let us denote the probability density of $y_j$ given the $k$th genotype of the QTL by

$$f_j(y_j \mid k) = \frac{1}{(2\pi)^{m/2}|\Sigma|^{1/2}}\exp\!\left[-\frac{1}{2}(y_j - \beta X_j - \gamma H_k)^T\Sigma^{-1}(y_j - \beta X_j - \gamma H_k)\right] \quad (8.17)$$

The log likelihood function of the parameters is

$$L(\theta, \Sigma) = \sum_{j=1}^{n}\ln\sum_{k=1}^{3} p_j(k)f_j(y_j \mid k) \quad (8.18)$$

The MLEs of the parameters are obtained via the EM algorithm described below. In the maximization step, the parameters are updated using the following equations,

$$\beta = \left[\sum_{j=1}^{n}\left(y_j - \gamma E(Z_j)\right)X_j^T\right]\left(\sum_{j=1}^{n} X_j X_j^T\right)^{-1} \quad (8.19)$$

$$\gamma = \left[\sum_{j=1}^{n}(y_j - \beta X_j)E(Z_j^T)\right]\left[\sum_{j=1}^{n} E(Z_j Z_j^T)\right]^{-1} \quad (8.20)$$

$$\Sigma = \frac{1}{n}\sum_{j=1}^{n} E\left[(y_j - \beta X_j - \gamma Z_j)(y_j - \beta X_j - \gamma Z_j)^T\right] \quad (8.21)$$

In the expectation step, we calculate the posterior probabilities of the QTL genotypes and the posterior expectations involved in the above three equations. The posterior probability of the $k$th QTL genotype is

$$p_j^*(k) = \frac{p_j(k)f_j(y_j \mid k)}{\displaystyle\sum_{k'=1}^{3} p_j(k')f_j(y_j \mid k')} \quad (8.22)$$

The posterior expectations are

$$E(Z_j) = \sum_{k=1}^{3} p_j^*(k)H_k \quad (8.23)$$

$$E(Z_j Z_j^T) = \sum_{k=1}^{3} p_j^*(k)H_k H_k^T \quad (8.24)$$

and

$$E\left[(y_j - \beta X_j - \gamma Z_j)(y_j - \beta X_j - \gamma Z_j)^T\right] = \sum_{k=1}^{3} p_j^*(k)(y_j - \beta X_j - \gamma H_k)(y_j - \beta X_j - \gamma H_k)^T \quad (8.25)$$
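The E-step of equation (8.22) is a standard mixture-model posterior. The sketch below is illustrative: the function name and argument layout are hypothetical, and `scipy.stats.multivariate_normal` supplies the density of equation (8.17).

```python
import numpy as np
from scipy.stats import multivariate_normal

def posterior_probs(yj, Xj, H, prior, beta, gamma, Sigma):
    """Posterior QTL genotype probabilities of eq (8.22) for one
    individual; prior[k] = p_j(k) from the multipoint method."""
    dens = np.array([
        multivariate_normal.pdf(yj, mean=beta @ Xj + gamma @ H[:, k], cov=Sigma)
        for k in range(3)
    ])
    w = prior * dens              # prior times genotype-specific density
    return w / w.sum()            # normalize over the three genotypes
```

These posteriors feed directly into the moment formulas (8.23)-(8.25) of the M-step.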

HYPOTHESIS TESTING

PROC QTL only reports the likelihood ratio test statistic for multivariate mapping. We have described the likelihood ratio test statistic when the LS method is used. The log likelihood value under the null model, $L_0$, is calculated using exactly the same formula as in the LS method. The likelihood value under the full model for the maximum likelihood method is

$$L_1 = L(\hat\theta, \hat\Sigma) = \sum_{j=1}^{n}\ln\sum_{k=1}^{3} p_j(k)\hat f_j(y_j \mid k) \quad (8.26)$$

where

$$\hat f_j(y_j \mid k) = \frac{1}{(2\pi)^{m/2}|\hat\Sigma|^{1/2}}\exp\!\left[-\frac{1}{2}(y_j - \hat\beta X_j - \hat\gamma H_k)^T\hat\Sigma^{-1}(y_j - \hat\beta X_j - \hat\gamma H_k)\right] \quad (8.27)$$

The likelihood ratio test statistic is then given by

$$\lambda = -2(L_0 - L_1) \quad (8.28)$$

The QTL procedure only provides the likelihood ratio test for overall QTL effect and there is no separate test for additive or dominance effect.

BAYESIAN SHRINKAGE METHOD FOR QTL MAPPING

Methods of interval mapping include the least squares method (HALEY and KNOTT 1992), the weighted least squares method (XU 1998a, b), the maximum likelihood method (LANDER and BOTSTEIN 1989) and the Fisher scoring method (HAN and XU 2008). All these methods were originally developed from the single-QTL model. Although interval mapping (under the single-QTL model) can detect multiple QTL by evaluating the number of peaks in the test statistic profile, it cannot provide accurate estimates of the QTL effects. The best way to handle multiple QTL is to use a multiple-QTL model. Such a model requires knowledge of the number of QTL. Most QTL mappers consider the number of QTL an important parameter that should be estimated in QTL analysis. Therefore, model selection is often conducted to determine the number of QTL (YI et al. 2003). Under the Bayesian framework, model selection is implemented through the reversible jump MCMC algorithm (YI and XU 2002). We (WANG et al. 2005b; XU 2003), however, believe that the number of QTL is not an important parameter. As a result, we proposed a model that includes as many QTL as the model can handle. Such a model is called an oversaturated model (WANG et al. 2005b). Some of the proposed QTL may be real, but most of them are spurious. As long as we can force the spurious QTL to have zero or close-to-zero estimated effects, the oversaturated model is considered satisfactory. The selective shrinkage Bayesian method generates exactly the expected result: spurious QTL effects are shrunk to zero, whereas true QTL effects are subject to essentially no shrinkage. In this chapter, we describe the Bayesian method for multiple QTL mapping, but only for a single trait. The multiple-trait Bayesian method will be given in the next chapter.

MULTIPLE QTL MODEL

The multiple QTL model can be described as

$$y_j = \sum_{i=1}^{p} X_{ji}\beta_i + \sum_{k=1}^{q} Z_{jk}\gamma_k + \epsilon_j \quad (9.1)$$

where $y_j$ is the phenotypic value of a trait for individual $j$ for $j = 1, \ldots, n$, and $n$ is the sample size. The non-QTL effects are included in the vector $\beta = \{\beta_1, \ldots, \beta_p\}$, with $X_j = \{X_{j1}, \ldots, X_{jp}\}$ being the design vector connecting $\beta$ to $y_j$. The effect of the $k$th QTL is denoted by $\gamma_k$ for $k = 1, \ldots, q$, where $q$ is the proposed number of QTL in the model. The vector $Z_j = \{Z_{j1}, \ldots, Z_{jq}\}$ is determined by the genotypes of the proposed QTL in the model. The residual error $\epsilon_j$ is assumed to be i.i.d. $N(0, \sigma^2)$. Let us use a BC population as an example. For the $k$th QTL, $Z_{jk} = 1$ for one genotype and $Z_{jk} = -1$ for the other genotype. Extension to an $F_2$ population and adding the dominance effects is straightforward (it only requires adding more QTL effects and increasing the model dimension). The proposed number of QTL is $q$, which must be larger than the true number of QTL to make sure that no QTL will be missed. The optimal strategy is to put one QTL in every $d$ cM of the genome, where $d$ can be any value between 5 and 50. If $d < 5$, the model may be ill-conditioned due to multicollinearity. If $d > 50$, some genome regions may not be visited by the proposed QTL even if true QTL are located in those regions. Of course, a larger sample size is required to handle a larger model (more QTL).

PRIOR, LIKELIHOOD AND POSTERIOR

The data involved in QTL mapping include the phenotypic values of the trait and the marker genotypes for all individuals in the mapping population. Unlike Wang et al. (2005b), who expressed the marker genotypes explicitly as data in the likelihood, here we suppress the marker genotypes from the data to simplify the notation. The linkage map of markers and the marker genotypes only affect the way the QTL genotypes are calculated. We first use the multipoint method to calculate the genotype probabilities for all putative loci of the genome. These probabilities are then treated as the prior probabilities of the QTL genotypes, from which the posterior probabilities are calculated by incorporating the phenotype and the current parameter values. Therefore, the data used to construct the likelihood are represented by $y=\{y_1,\ldots,y_n\}$.

The vector of parameters is denoted by $\theta$, which consists of the positions of the proposed QTL denoted by $\lambda=\{\lambda_1,\ldots,\lambda_q\}$, the effects of the QTL denoted by $\gamma=\{\gamma_1,\ldots,\gamma_q\}$, the non-QTL effects denoted by $\beta=\{\beta_1,\ldots,\beta_p\}$ and the residual error variance $\sigma^2$. Therefore, $\theta=\{\lambda,\gamma,\beta,\sigma^2,\sigma^2_\gamma\}$, where $\sigma^2_\gamma=\{\sigma^2_1,\ldots,\sigma^2_q\}$ will be defined later. The QTL genotypes $Z_j=\{Z_{j1},\ldots,Z_{jq}\}$ are not parameters but missing values. The missing genotypes can be redundantly expressed as $\delta_j=\{\delta_{j1},\ldots,\delta_{jq}\}$, where $\delta_{jk}=\delta(G_{jk},g)$ is the delta function. If $G_{jk}=g$, then $\delta(G_{jk},g)=1$; otherwise $\delta(G_{jk},g)=0$, where $G_{jk}$ is the genotype of the $k$th QTL for individual $j$ and $g=1,2$ for a BC population (two possible genotypes per locus). The probability density of $\delta_j$


is

$p(\delta_j|\lambda)=\prod_{k=1}^{q} p(\delta_{jk}|\lambda_k)$  (9.2)

The independence of the QTL genotypes across loci is due to the fact that they are the conditional probabilities given the marker information. So, the marker information has already entered here to infer the QTL genotypes. The prior for $\beta$ is

$p(\beta)=\prod_{i=1}^{p} p(\beta_i) \propto \text{constant}$  (9.3)

This is a uniform prior or, more appropriately, an uninformative prior. The reason for choosing an uninformative prior for $\beta$ is that the dimensionality of $\beta$ is usually very low, so that $\beta$ can be precisely estimated from the data alone without resorting to any prior knowledge. The prior for the QTL effects is

$p(\gamma|\sigma^2_\gamma)=\prod_{k=1}^{q} p(\gamma_k|\sigma^2_k)=\prod_{k=1}^{q} N(\gamma_k|0,\sigma^2_k)$  (9.4)

where $\sigma^2_k$ is the variance of the prior distribution for the $k$th QTL effect. Collectively, these variances are denoted by $\sigma^2_\gamma=\{\sigma^2_1,\ldots,\sigma^2_q\}$. This is a highly informative prior because of the zero expectation of the prior distribution. The variance of the prior distribution determines the relative weights of the prior information and the data. If $\sigma^2_k$ is very small, the prior will dominate the data and thus the estimated $\gamma_k$ will be shrunken towards the prior expectation, that is, zero. If $\sigma^2_k$ is large, the data will dominate the prior so that the estimated $\gamma_k$ will be largely unaltered (subject to no shrinkage). The key difference between this prior and the prior commonly used in Bayesian regression analysis is that each regression coefficient has its own prior variance and thus its own level of shrinkage. Therefore, this method is also called the selective shrinkage method. The classical Bayesian regression method, however, often uses a common prior for all regression coefficients, i.e., $\sigma^2_1=\sigma^2_2=\cdots=\sigma^2_q$. The problem with the selective shrinkage method is that there are too many prior variances and it is hard to choose appropriate values for them. There are two approaches to choosing the prior variances: empirical Bayes (XU 2007b) and hierarchical modeling (GELMAN 2006). The empirical Bayes approach estimates the prior variances under the mixed model methodology by treating each regression coefficient as a random effect. The hierarchical modeling approach treats the prior variances as parameters and assigns a higher level prior to each variance component. By treating the variances as parameters, rather than


as hyper-parameters, we can estimate the variances along with the regression coefficients. Here, we take the hierarchical model approach and assign each $\sigma^2_k$ a prior distribution. The scaled inverse chi-square distribution is chosen for each variance component,

$p(\sigma^2_k)=\text{Inv-}\chi^2(\sigma^2_k|\tau,\omega),\quad k=1,\ldots,q$  (9.5)

The degree of freedom $\tau$ and the scale parameter $\omega$ are hyper-parameters, and their influence on the estimated regression coefficients is much weaker because the influence is only through the $\sigma^2_k$. It is now easy to choose $\tau$ and $\omega$. The degree of freedom is also called the prior belief. Although a proper prior should have $\tau>0$ and $\omega>0$, our past experience showed that an improper prior works better than the proper prior. Therefore, we choose $\tau=\omega=0$ as the default value, which leads to

$p(\sigma^2_k)=\dfrac{1}{\sigma^2_k},\quad k=1,\ldots,q$  (9.6)

Users do have an option to choose a different set of hyper-parameters. The joint prior for all the $\sigma^2_k$ is

$p(\sigma^2_\gamma)=\prod_{k=1}^{q} p(\sigma^2_k)$  (9.7)

The residual error variance is also assigned the improper prior,

$p(\sigma^2)=\dfrac{1}{\sigma^2}$  (9.8)

The positions of the QTL depend on the number of QTL proposed, the number of chromosomes and the size of each chromosome. Based on the average coverage per QTL (e.g., 30 cM per QTL), the number of QTL allocated to each chromosome can be easily calculated. Let $q_c$ be the number of QTL proposed for chromosome $c$. These $q_c$ QTL should be placed evenly along the chromosome. We can keep the positions fixed throughout the MCMC process so that the positions are simply constants (not parameters of interest). In this case, more QTL should be proposed to make sure that the genome is well covered by QTL. The alternative and also more efficient approach is to allow the QTL positions to move along the genome during the MCMC process. There is a restriction on the moving range of each QTL: the positions are disjoint along the chromosome. The first QTL must move between the first marker and the second QTL. The last QTL must move between the last marker and the second last QTL. All other QTL must move between the QTL on the left and the QTL on the right of the current QTL, i.e., the QTL that flank the current QTL. Based on this search strategy, the joint prior probability of the QTL positions is


$p(\lambda)=p(\lambda_1)\,p(\lambda_2|\lambda_1)\cdots p(\lambda_{q_c}|\lambda_{q_c-1})$  (9.9)

Given the positions of all other QTL, the conditional probability of the position of QTL $k$ is

$p(\lambda_k|\lambda_{k-1},\lambda_{k+1})=\dfrac{1}{\lambda_{k+1}-\lambda_{k-1}}$  (9.10)

If QTL $k$ is located at either end of a chromosome, the above prior needs to be modified by replacing either $\lambda_{k-1}$ or $\lambda_{k+1}$ by the position of the nearest end marker. We now have a situation where the prior probability of one variable depends on the values of other variables. This type of prior is called an adaptive prior.

Since the marker information has been used to calculate the prior probabilities of the QTL genotypes, the markers are no longer expressed as data. The only data appearing explicitly in the model are the phenotypic values of the trait. Conditional on all parameters and the missing values, the probability density of $y_j$ is normal. Therefore, the joint probability density of all the $y_j$'s (called the likelihood) is

$p(y|\theta)=\prod_{j=1}^{n} p(y_j|\theta)=\prod_{j=1}^{n} N\!\left(y_j \,\Big|\, \sum_{i=1}^{p} X_{ji}\beta_i+\sum_{k=1}^{q} Z_{jk}\gamma_k,\ \sigma^2\right)$  (9.11)

The fully conditional posterior of each variable is defined as

$p(\theta_i|y,\theta_{-i},Z)\propto p(y,\theta,Z)$  (9.12)

where $\theta_i$ is a single element of the parameter vector and $\theta_{-i}$ is the collection of the remaining elements. The symbol $\propto$ means that there is an ignored constant irrelevant to parameter $\theta_i$. The joint probability density $p(y,\theta,Z)$ is expressed as

$p(y,\theta,Z)=p(y|\beta,\gamma,\sigma^2,Z)\,p(Z|\lambda)\,p(\lambda)\,p(\gamma|\sigma^2_\gamma)\,p(\sigma^2_\gamma)\,p(\beta)\,p(\sigma^2)$  (9.13)

The fully conditional posterior probability density for each variable is simply derived by treating all other variables as constants and comparing the kernel of the density with a standard distribution. After some algebraic manipulation, we obtain the fully conditional distribution for most of the unknown variables (including parameters and missing values).

The fully conditional posterior for the non-QTL effect $\beta_i$ is

$p(\beta_i|\cdots)=N(\beta_i|\mu_{\beta_i},\sigma^2_{\beta_i})$  (9.14)

The special notation $p(\beta_i|\cdots)$ is used to express the fully conditional probability density. The three dots after the symbol $|$ mean everything


else except the variable of interest. The posterior mean and posterior variance are calculated using

$\mu_{\beta_i}=\left(\sum_{j=1}^{n} X_{ji}^2\right)^{-1}\sum_{j=1}^{n} X_{ji}\left(y_j-\sum_{i'\neq i}^{p} X_{ji'}\beta_{i'}-\sum_{k=1}^{q} Z_{jk}\gamma_k\right)$  (9.15)

and

$\sigma^2_{\beta_i}=\left(\sum_{j=1}^{n} X_{ji}^2\right)^{-1}\sigma^2$  (9.16)

The fully conditional posterior for the $k$th QTL effect is

$p(\gamma_k|\cdots)=N(\gamma_k|\mu_{\gamma_k},\sigma^2_{\gamma_k})$  (9.17)

where

$\mu_{\gamma_k}=\left(\sum_{j=1}^{n} Z_{jk}^2+\dfrac{\sigma^2}{\sigma^2_k}\right)^{-1}\sum_{j=1}^{n} Z_{jk}\left(y_j-\sum_{i=1}^{p} X_{ji}\beta_i-\sum_{k'\neq k}^{q} Z_{jk'}\gamma_{k'}\right)$  (9.18)

and

$\sigma^2_{\gamma_k}=\left(\sum_{j=1}^{n} Z_{jk}^2+\dfrac{\sigma^2}{\sigma^2_k}\right)^{-1}\sigma^2$  (9.19)
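The shrinkage update of the posterior mean and variance can be sketched with simulated data. This is an illustrative numpy-only example (names such as `sample_gamma_k` are hypothetical, not PROC QTL internals); it shows how a large versus a small prior variance changes the draw:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_gamma_k(y, X, beta, Z, gamma, sigma2, sigma2_k, k, rng):
    """One Gibbs draw of the k-th QTL effect under the selective shrinkage prior."""
    # partial residual: remove non-QTL effects and all other QTL effects
    resid = y - X @ beta - Z @ gamma + Z[:, k] * gamma[k]
    denom = Z[:, k] @ Z[:, k] + sigma2 / sigma2_k   # shrinkage factor added here
    mu = (Z[:, k] @ resid) / denom                  # posterior mean as in (9.18)
    var = sigma2 / denom                            # posterior variance as in (9.19)
    return rng.normal(mu, np.sqrt(var))

# toy data: only QTL 0 has a real effect of size 2
n, p, q = 200, 1, 3
X = np.ones((n, p))
Z = rng.choice([-1.0, 1.0], size=(n, q))
beta, gamma = np.array([0.0]), np.zeros(q)
y = 2.0 * Z[:, 0] + rng.normal(0, 1, n)
g0 = sample_gamma_k(y, X, beta, Z, gamma, 1.0, 1e6, 0, rng)   # weak shrinkage
g1 = sample_gamma_k(y, X, beta, Z, gamma, 1.0, 1e-6, 0, rng)  # strong shrinkage
print(g0, g1)  # g0 is near 2, g1 is shrunken towards 0
```

With $\sigma^2_k=10^6$ the shrinkage factor is negligible and the draw sits near the least-squares value; with $\sigma^2_k=10^{-6}$ the draw is forced towards zero.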

Comparing the conditional posterior distribution of $\beta_i$ with that of $\gamma_k$, we notice the difference between a normal prior and a uniform prior with respect to their effects on the posterior distributions. When a normal prior is used, a shrinkage factor, $\sigma^2/\sigma^2_k$, is added to $\sum_{j=1}^{n} Z_{jk}^2$. If $\sigma^2_k$ is very large, the shrinkage factor disappears, meaning no shrinkage. On the other hand, if $\sigma^2_k$ is small, the shrinkage factor will dominate over $\sum_{j=1}^{n} Z_{jk}^2$ and, in the end, the denominator will become infinitely large, leading to zero expectation and zero variance for the conditional posterior distribution of $\gamma_k$. As such, the estimated $\gamma_k$ is completely shrunken to zero. The conditional posterior distribution for each variance component $\sigma^2_k$ is scaled inverse chi-square with probability density

$p(\sigma^2_k|\cdots)=\text{Inv-}\chi^2(1,\gamma_k^2)=\text{Inv-}\chi^2(\sigma^2_k|1,\gamma_k^2)$  (9.20)

This conditional posterior is proper, regardless of whether the hyper-parameters are zero or not, and thus Gibbs sampling can be performed. The conditional posterior density for the residual error variance is


$p(\sigma^2|\cdots)=\text{Inv-}\chi^2(n,SS)=\text{Inv-}\chi^2(\sigma^2|n,SS)$  (9.21)

where

$SS=\dfrac{1}{n}\sum_{j=1}^{n}\left(y_j-\sum_{i=1}^{p} X_{ji}\beta_i-\sum_{k=1}^{q} Z_{jk}\gamma_k\right)^2$  (9.22)
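The two scaled inverse chi-square draws can be sketched with numpy, using the standard identity that an $\text{Inv-}\chi^2(\nu, s^2)$ draw equals $\nu s^2/\chi^2_\nu$ (the helper name is illustrative, not a PROC QTL routine):

```python
import numpy as np

rng = np.random.default_rng(0)

def rinvchisq(df, scale, rng):
    """Draw from a scaled inverse chi-square: df * scale / chisq(df)."""
    return df * scale / rng.chisquare(df)

# variance component draw: sigma2_k | ... ~ Inv-chi2(1, gamma_k^2)
gamma_k = 0.8
sigma2_k = rinvchisq(1, gamma_k**2, rng)

# residual variance draw: sigma2 | ... ~ Inv-chi2(n, SS), SS = mean squared residual
resid = rng.normal(0, 2.0, size=5000)
n = resid.size
SS = np.mean(resid**2)
sigma2 = rinvchisq(n, SS, rng)
print(round(sigma2, 2))  # close to the true residual variance of 4.0
```

With a large $n$ the residual variance draw concentrates near the mean squared residual, while the one-degree-of-freedom draw for $\sigma^2_k$ remains highly variable, which is exactly why its posterior mean is not interpreted biologically.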

The next step is to sample the QTL genotypes, which determine the values of $Z_j$. Let us again use a BC population as an example and consider sampling the $k$th QTL genotype given that every other variable is known. There are two sources of information available to infer the probability of each of the two genotypes of the QTL. One source comes from the markers, denoted by $p_j(+1)$ and $p_j(-1)$, respectively, for the two genotypes, where $p_j(+1)+p_j(-1)=1$. These two probabilities are calculated from the multipoint method (JIANG and ZENG 1997). The other source of information comes from the phenotypic value. The connection between the phenotypic value and the QTL genotype is through the probability density of $y_j$ given the QTL genotype. For the two alternative genotypes of the QTL, i.e., $Z_{jk}=+1$ and $Z_{jk}=-1$, the two probability densities are

$p(y_j|Z_{jk}=+1,\cdots)=N\!\left(y_j \,\Big|\, \sum_{i=1}^{p} X_{ji}\beta_i+\sum_{k'\neq k}^{q} Z_{jk'}\gamma_{k'}+\gamma_k,\ \sigma^2\right)$

$p(y_j|Z_{jk}=-1,\cdots)=N\!\left(y_j \,\Big|\, \sum_{i=1}^{p} X_{ji}\beta_i+\sum_{k'\neq k}^{q} Z_{jk'}\gamma_{k'}-\gamma_k,\ \sigma^2\right)$  (9.23)

Therefore, the conditional posterior probabilities for the two genotypes of the QTL are

$p_j^*(+1)=\dfrac{p_j(+1)\,p(y_j|Z_{jk}=+1,\cdots)}{p_j(+1)\,p(y_j|Z_{jk}=+1,\cdots)+p_j(-1)\,p(y_j|Z_{jk}=-1,\cdots)}$

$p_j^*(-1)=\dfrac{p_j(-1)\,p(y_j|Z_{jk}=-1,\cdots)}{p_j(+1)\,p(y_j|Z_{jk}=+1,\cdots)+p_j(-1)\,p(y_j|Z_{jk}=-1,\cdots)}$  (9.24)

where $p_j^*(+1)=p(Z_{jk}=+1|\cdots)$ and $p_j^*(-1)=p(Z_{jk}=-1|\cdots)$. The genotype of the QTL is then set to $Z_{jk}=2u-1$, where $u$ is sampled from a Bernoulli distribution with probability $p_j^*(+1)$.
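The genotype update above can be sketched as follows. This is an illustrative stand-alone example (the normal density is written out so that no statistics library is needed; names are hypothetical):

```python
import math, random

def sample_genotype(y_j, mean_without_k, gamma_k, sigma2, p_plus, rng=random):
    """Sample Z_jk in {-1, +1} from its conditional posterior, as in (9.24)."""
    def ndens(x, m, v):
        return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
    like_plus = ndens(y_j, mean_without_k + gamma_k, sigma2)   # Z_jk = +1
    like_minus = ndens(y_j, mean_without_k - gamma_k, sigma2)  # Z_jk = -1
    post_plus = p_plus * like_plus / (p_plus * like_plus + (1 - p_plus) * like_minus)
    u = 1 if rng.random() < post_plus else 0   # Bernoulli draw
    return 2 * u - 1                           # map {0, 1} to {-1, +1}

random.seed(3)
# phenotype strongly favors the +1 genotype here; marker prior is uninformative
draws = [sample_genotype(2.0, 0.0, 2.0, 0.25, p_plus=0.5) for _ in range(200)]
print(sum(d == 1 for d in draws))  # almost every draw is +1
```

When the phenotype is uninformative (small $\gamma_k$ or large $\sigma^2$), the posterior falls back to the multipoint marker prior $p_j(+1)$.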

FIXED INTERVAL

So far we have completed the sampling process for all variables except the QTL positions. If we place a large number of QTL evenly along the genome, say one QTL in every 20 cM, we can keep the positions fixed (not moving) across the entire MCMC process. Although this fixed-position approach


does not generate accurate results, it does provide some general information about the ranges where the QTL are located. Suppose that the trait of interest is controlled by only 5 QTL and we place 100 QTL evenly on the genome; then the majority of the assumed QTL are spurious. The Bayesian shrinkage method allows the effects of the spurious QTL to be shrunken to zero. This is why the Bayesian shrinkage method does not need variable selection. A QTL with an estimated effect close to zero is equivalent to one excluded from the model. When the assumed QTL positions are fixed, investigators actually prefer to put the QTL at marker positions, because the marker positions contain the maximum information. This multiple marker analysis is recommended before conducting a detailed fully Bayesian analysis with moving QTL positions. The result of the detailed analysis will be more or less the same as that of the multiple marker analysis. Further detailed analysis is only conducted after the investigators get a general picture of the result.

RANDOM WALK

We now discuss several different ways to allow the QTL positions to move across the genome. If our purpose of QTL mapping is to find the regions of the genome that most likely carry QTL, the number of QTL is irrelevant and so are the QTL identities. If we allow the QTL positions to move, the most important information we want to capture is how many times a particular segment (position) of the genome is hit or visited by non-spurious QTL. If a position is visited many times by different QTL but all these QTL have negligible effects, such a position is not of interest. We are interested in positions that are visited repeatedly by large QTL. Keeping this in mind, we propose the first strategy of QTL moving, the random walk strategy. We start with a "sufficient" number of QTL evenly placed on the genome. How many is sufficient? This perhaps depends on the marker density and the sample size of the mapping population. Putting one QTL in every 20 cM seems to work well. Each QTL is allowed to travel freely between the left and the right QTL. In other words, the QTL are distributed along the genome in a disjoint manner. The positions of the QTL are moving, but the order of the QTL is preserved. This is the simplest method of QTL traveling.

Let us take the $k$th QTL for example; the current position of the QTL is denoted by $\lambda_k$. The new position can be sampled from the following distribution,

$\lambda_k^*=\lambda_k+\varepsilon$  (9.25)

where $\varepsilon\sim U(-\delta,\delta)$ and $\delta$ is the maximum distance (in cM) that the QTL is allowed to move away from the current position at each step. The restriction $\lambda_{k-1}<\lambda_k^*<\lambda_{k+1}$ is enforced to make sure that the order of the QTL does not change. Empirically, $\delta=2$ cM seems to work well. The new position is always accepted, regardless of whether it is more likely or less likely to carry a true QTL relative to the current position. The Markov chain should be sufficiently long to make sure all putative positions are visited a number of times. Theoretically, there is no need to enforce the disjoint distribution for the QTL positions. The only reason for such a restriction is the convenience of programming when the order is preserved. With the random walk strategy of QTL moving, the frequency of hits by QTL at a position is not of interest; instead, the average effect of all the QTL hitting that position is the important information. The random walk approach does not distinguish between "hot regions" (regions containing QTL) and "cold regions" (regions without QTL) of the genome. All regions are visited with equal frequency. The hot regions, however, are supposed to be visited more often than the cold regions to get a more accurate estimate of the average QTL effects for those regions. The random walk approach does not discriminate against the cold regions and thus needs a very long Markov chain to make sure that the hot regions are sufficiently visited for accurate estimation of the QTL effects.
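The random-walk move with the order-preserving restriction can be sketched as below. Resampling until the proposal falls inside the flanking interval is one simple way to enforce the restriction (the helper is illustrative, not the PROC QTL implementation):

```python
import random

def random_walk_move(lam_k, left, right, delta=2.0, rng=random):
    """Propose a new position within delta cM of lam_k, kept inside (left, right)."""
    while True:
        new = lam_k + rng.uniform(-delta, delta)   # lam_k + eps, eps ~ U(-delta, delta)
        if left < new < right:                     # keep the QTL order intact
            return new

random.seed(7)
new_pos = random_walk_move(35.0, left=30.0, right=50.0, delta=2.0)
print(33.0 < new_pos < 37.0 and 30.0 < new_pos < 50.0)  # True
```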

MOVING INTERVAL

The optimal strategy for QTL moving is to allow the QTL to visit the hot regions more often than the cold regions. This sampling strategy cannot be accomplished using the Gibbs sampler (GEMAN and GEMAN 1984) because the conditional posterior of the position of a QTL does not have a well known form of distribution. Therefore, the Metropolis-Hastings algorithm (HASTINGS 1970; METROPOLIS et al. 1953) is adopted here to sample the QTL positions. Again, the new position is randomly generated in the neighborhood of the old position using the same approach as in the random walk strategy, but the new position $\lambda_k^*$ is only accepted with a certain probability. The acceptance probability is determined by the Metropolis-Hastings rule, denoted by $\min[1,\alpha(\lambda_k,\lambda_k^*)]$. The new position $\lambda_k^*$ has a $1-\min[1,\alpha(\lambda_k,\lambda_k^*)]$ chance of being rejected, where

$\alpha(\lambda_k,\lambda_k^*)=\dfrac{\prod_{j=1}^{n}\sum_{l}\Pr(Z_{jk}=l|\lambda_k^*)\,p(y_j|Z_{jk}=l,\cdots)}{\prod_{j=1}^{n}\sum_{l}\Pr(Z_{jk}=l|\lambda_k)\,p(y_j|Z_{jk}=l,\cdots)}\times\dfrac{p(\lambda_k^*)\,q(\lambda_k|\lambda_k^*)}{p(\lambda_k)\,q(\lambda_k^*|\lambda_k)}$  (9.26)

If the new position is rejected, the QTL remains at the current position, i.e., $\lambda_k^*=\lambda_k$. If the new position is accepted, the old position is replaced by the new position, i.e., $\lambda_k=\lambda_k^*$. Whether the new position is accepted or not, all other variables are updated based on the information from position $\lambda_k^*$. In equation (9.26), $\Pr(Z_{jk}=+1|\lambda_k)$ and $\Pr(Z_{jk}=-1|\lambda_k)$ are the conditional probabilities


that $Z_{jk}=+1$ and $Z_{jk}=-1$, respectively, calculated from the multipoint method. These probabilities depend on position $\lambda_k$. Previously, these probabilities were denoted by $p_j(+1)=\Pr(Z_{jk}=+1|\lambda_k)$ and $p_j(-1)=\Pr(Z_{jk}=-1|\lambda_k)$, respectively. For the new position $\lambda_k^*$, these probabilities are $\Pr(Z_{jk}=+1|\lambda_k^*)$ and $\Pr(Z_{jk}=-1|\lambda_k^*)$, respectively. The prior distributions for the positions are denoted by $p(\lambda_k^*)$ and $p(\lambda_k)$, respectively. These two probability densities usually cancel each other out. The proposal probabilities $q(\lambda_k^*|\lambda_k)$ and $q(\lambda_k|\lambda_k^*)$ are normally equal to $1/(2\delta)$ and thus also cancel each other out. However, when $\lambda_k$ and $\lambda_k^*$ are near the boundaries, these two probabilities may not be the same. Since the new position is always restricted to the interval where the old position occurs, the proposal density $q(\lambda_k^*|\lambda_k)$ and its reverse partner $q(\lambda_k|\lambda_k^*)$ may be different. Let us denote the positions of the left and right QTL by $\lambda_{k-1}$ and $\lambda_{k+1}$, respectively. If $\lambda_k$ is close to the left QTL so that $\lambda_k-\lambda_{k-1}<\delta$, then the new position must be sampled from $\lambda_k^*\sim U(\lambda_{k-1},\ \lambda_k+\delta)$ to make sure that the new position is within the required sample space. Similarly, if $\lambda_k$ is close to the right QTL so that $\lambda_{k+1}-\lambda_k<\delta$, then the new position must be sampled from $\lambda_k^*\sim U(\lambda_k-\delta,\ \lambda_{k+1})$. In either case, the proposal density must be modified.

The general formula of the proposal density after incorporating the modification is

$q(\lambda_k^*|\lambda_k)=\begin{cases}\dfrac{1}{\lambda_k+\delta-\lambda_{k-1}} & \text{if } \lambda_k-\lambda_{k-1}<\delta\\[4pt] \dfrac{1}{\lambda_{k+1}-\lambda_k+\delta} & \text{if } \lambda_{k+1}-\lambda_k<\delta\\[4pt] \dfrac{1}{2\delta} & \text{otherwise}\end{cases}$  (9.27)

The assumption behind the above proposal density is that the distance between any two adjacent QTL is larger than $2\delta$, so that the two boundary cases cannot occur simultaneously. The reverse partner of this proposal density is


$q(\lambda_k|\lambda_k^*)=\begin{cases}\dfrac{1}{\lambda_k^*+\delta-\lambda_{k-1}} & \text{if } \lambda_k^*-\lambda_{k-1}<\delta\\[4pt] \dfrac{1}{\lambda_{k+1}-\lambda_k^*+\delta} & \text{if } \lambda_{k+1}-\lambda_k^*<\delta\\[4pt] \dfrac{1}{2\delta} & \text{otherwise}\end{cases}$  (9.28)
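The boundary-adjusted proposal density can be sketched with a single function, which evaluates both the forward density and its reverse partner by passing the old or the new position (an illustrative sketch, not the PROC QTL code):

```python
def proposal_density(pos, left, right, delta=2.0):
    """Density of proposing a move away from 'pos' inside (left, right)."""
    if pos - left < delta:          # truncated on the left side
        return 1.0 / (pos + delta - left)
    if right - pos < delta:         # truncated on the right side
        return 1.0 / (right - pos + delta)
    return 1.0 / (2.0 * delta)      # interior: plain U(pos - delta, pos + delta)

left, right, delta = 30.0, 50.0, 2.0
q_forward = proposal_density(31.0, left, right, delta)  # near the left neighbor
q_reverse = proposal_density(40.0, left, right, delta)  # interior position
print(q_forward, q_reverse)  # 1/3 versus 1/4: the two densities need not cancel
```

Only when one endpoint of the move sits within $\delta$ of a flanking QTL do the two densities differ, and then the ratio $q(\lambda_k|\lambda_k^*)/q(\lambda_k^*|\lambda_k)$ contributes to the acceptance probability in (9.26).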

The differences between sampling $\lambda_k$ and sampling the other variables are: (1) the proposed new position may or may not be accepted, while the new values of all other variables are always accepted; and (2) when calculating the acceptance probability for a new position, the likelihood does not depend on the QTL genotype (the genotypes are summed out), while the conditional posterior probabilities of all other variables depend on the sampled QTL genotypes.

SUMMARY OF THE MCMC SAMPLING PROCESS

The MCMC sampling process is summarized as follows.

1. Choose the number of QTL to be placed in the model, $q$
2. Initialize the parameters and missing values, $\theta=\theta^{(0)}$ and $Z_{jk}=Z_{jk}^{(0)}$
3. Sample $\beta_i$ from $N(\beta_i|\mu_{\beta_i},\sigma^2_{\beta_i})$
4. Sample $\gamma_k$ from $N(\gamma_k|\mu_{\gamma_k},\sigma^2_{\gamma_k})$
5. Sample $\sigma^2_k$ from $\text{Inv-}\chi^2(\sigma^2_k|1,\gamma_k^2)$
6. Sample $\sigma^2$ from $\text{Inv-}\chi^2(\sigma^2|n,SS)$
7. Sample $Z_{jk}$ from its conditional posterior distribution
8. Sample $\lambda_k$ using the Metropolis-Hastings algorithm
9. Repeat step (3) to step (8) until the Markov chain is sufficiently long

The length of the chain should be sufficient to make sure that, after burn-in deletion and chain trimming, the posterior sample size is large enough to allow accurate estimation of the posterior means (modes or medians) of all QTL parameters. Methods and computer programs are available to check whether the chain has converged to the stationary distribution (BROOKS and GELMAN 1998; GELMAN and RUBIN 1992; GEWEKE et al. 1992; RAFTERY and LEWIS 1992). Our past experience showed that the burn-in period may only need to contain a few thousand observations. A trimming frequency of saving one in every 20 observations is sufficient. A posterior sample size of 1000 usually works well. However, if the model is not very large, it is always a good idea to delete more observations for the burn-in and trim more observations to make the chain thinner.

POST MCMC ANALYSIS

The MCMC sampling process is much like running an experiment: it only generates data for further analysis. The Bayesian estimates will only be available after summarizing the data (the posterior sample). The parameter vector $\theta$ is very long, but not all parameters are of interest. Unlike other methods in which the number of QTL is an important parameter, the Bayesian shrinkage method uses a fixed number of QTL, and thus $q$ is not a parameter of interest. Although the variance component for the $k$th QTL, $\sigma^2_k$, is a parameter, it is not a parameter of interest; it only serves as a factor to shrink the estimated QTL effect. Since the marginal posterior of $\sigma^2_k$ does not exist, the empirical posterior mean or posterior mode of $\sigma^2_k$ does not have any biological meaning. In some observations the sampled $\sigma^2_k$ can be very large, and in others it may be very small. The residual error variance $\sigma^2$ is meaningful only if the number of QTL placed in the model is small to intermediate. When $q$ is very large, the residual error variance will be absorbed by the very large number of spurious QTL. The only parameters that are of interest are the QTL effects and the QTL positions.

However, the QTL identity, $k$, is also not of interest. Since the $k$th QTL may travel all over the chromosome region where it was originally placed, the average effect $\gamma_k$ does not have a meaningful biological interpretation. The only things left are the positions of the genome that are hit frequently by QTL with large effects. Let us consider a fixed position of the genome. A position of the genome is only a point, or a locus. Since the QTL position is a continuous variable, the probability that a particular point of the genome is hit by a QTL is zero. Therefore, we define a genome position by a bin with a width of $d$ cM, where $d$ can be 1 or 2 or any other suitable value. The mid value of the bin represents the genome location. For example, if $d=2$ cM, the genome location 15 cM actually represents the bin covering the region of the genome from 14 cM to 16 cM, where $14=15-d/2$ and $16=15+d/2$. Once we define the bin width of a genome location, we can count the number of QTL that hit the bin. For each hit, we record the effect of that hit. The same location may be hit many times by QTL with the same or different identities. The average effect of the QTL hitting the bin is the most important parameter in the Bayesian shrinkage analysis. Each and every bin of the genome has an average QTL effect. We can then plot the effect against the genome location to form a QTL (effect) profile. This profile represents the overall result of the Bayesian mapping. In the BC example of Bayesian analysis, the $k$th QTL effect is denoted by $\gamma_k$. Since the QTL identity $k$ is irrelevant, it is now replaced by


the average QTL effect at position $\lambda$, where $\lambda$ is a continuous variable. The $\lambda$ without a subscript indicates a genome location. The average QTL effect at position $\lambda$ can be expressed as $\gamma(\lambda)$ to indicate that the effect is a function of the genome location. The QTL effect profile is now represented by $\gamma(\lambda)$. If we use $\bar{\gamma}(\lambda)$ to denote the posterior mean of the QTL effect at position $\lambda$, we may use $\sigma^2(\lambda)$ to denote the posterior variance of the QTL effect at position $\lambda$. If the QTL moving is not random but guided by the Metropolis-Hastings rule, the posterior sample size at position $\lambda$ is a useful piece of information indicating how often position $\lambda$ is hit by a QTL. Let $n(\lambda)$ be the posterior sample size at $\lambda$; the standard error of the QTL effect at $\lambda$ is then $\sigma(\lambda)/\sqrt{n(\lambda)}$. Therefore, another useful profile is the so called $t$-test statistic profile, expressed as

$t(\lambda)=\sqrt{n(\lambda)}\,\dfrac{\bar{\gamma}(\lambda)}{\sigma(\lambda)}$  (9.29)

The corresponding $F$-test statistic profile is

$F(\lambda)=n(\lambda)\,\dfrac{\bar{\gamma}^2(\lambda)}{\sigma^2(\lambda)}$  (9.30)

The $t$-test statistic profile is more informative than the $F$-test statistic profile because it also indicates the direction of the QTL effect (positive or negative), while the $F$-test statistic profile is always positive. On the other hand, the $F$-test statistic can be extended to multiple effects per locus, e.g., additive and dominance effects in an F2 design. Both the $t$-test and $F$-test statistic profiles can be interpreted as kinds of weighted QTL effect profiles, because they incorporate the posterior frequency of the genome location. The current version of PROC QTL only reports the posterior sample size $n(\lambda)$, the posterior mean $\bar{\gamma}(\lambda)$ and the posterior standard deviation $\sigma(\lambda)$. Other posterior statistics, such as the equal-tail credibility interval, the highest posterior density interval and the $t$-test statistics, will be added later.
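Given posterior records of (position, effect) hits, the binned profiles and the $t$- and $F$-statistics of (9.29) and (9.30) can be computed as in this numpy sketch (the simulated hit records and the bin width are illustrative):

```python
import numpy as np

# hypothetical posterior records: (position in cM, sampled QTL effect) per hit
rng = np.random.default_rng(5)
pos = rng.uniform(0, 100, 5000)
eff = np.where(np.abs(pos - 40) < 5, 1.5, 0.0) + rng.normal(0, 0.1, 5000)

d = 2.0                                   # bin width in cM
bins = np.arange(0, 100 + d, d)
idx = np.digitize(pos, bins) - 1
profile = []
for b in range(len(bins) - 1):
    hits = eff[idx == b]
    n_lam = hits.size                     # n(lambda): posterior sample size in bin
    if n_lam < 2:
        continue
    mean, sd = hits.mean(), hits.std(ddof=1)
    t = np.sqrt(n_lam) * mean / sd        # t(lambda), as in (9.29)
    F = n_lam * mean**2 / sd**2           # F(lambda), as in (9.30); F = t^2 here
    profile.append((bins[b] + d / 2, n_lam, mean, t, F))

best = max(profile, key=lambda r: abs(r[3]))
print(best[0])  # bin center near 40 cM, where the simulated effect sits
```

Plotting the third column against the first gives the QTL effect profile; plotting the fourth gives the $t$-test statistic profile.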

Note that the $t$-test and $F$-test statistics depend strongly on the posterior standard deviations, which may be deflated by autocorrelation of the posterior samples. Therefore, the Markov chain should be thinned (trimmed) heavily to reduce the autocorrelation. In other words, you should delete more observations from the posterior sample.

BAYESIAN MAPPING FOR ORDINAL TRAITS

The Bayesian method for ordinal trait QTL mapping is different from the generalized linear model described in interval mapping. Here, we assume an underlying variable, called the liability, for each individual. This latent variable is connected to the ordinal phenotype by a series of thresholds. An individual falls into a particular category if the liability of this individual is within an interval defined by a particular pair of thresholds. Suppose that a disease phenotype of individual $j$ ($j=1,\ldots,n$) is measured by an ordinal variable denoted by $S_j=1,\ldots,r+1$, where $r+1$ is the total number of disease classes and $n$ is the sample size. Let $y_j$ be the underlying liability for individual $j$. Let $\alpha_k$ be the $k$th threshold for $k=0,\ldots,r+1$, where $\alpha_{k-1}\le\alpha_k$, $\alpha_0=-\infty$ and $\alpha_{r+1}=+\infty$. The connection between $y_j$ and $S_j$ is $S_j=k$ if $\alpha_{k-1}<y_j\le\alpha_k$ for $k=1,\ldots,r+1$. The underlying liability $y_j$ is treated as a regular quantitative trait and described by the usual linear model

$y_j=\sum_{i=1}^{p} X_{ji}\beta_i+\sum_{k=1}^{q} Z_{jk}\gamma_k+\varepsilon_j$  (9.31)

where $\varepsilon_j\sim N(0,1)$ is assumed. The parameter vector now contains all the parameters described in regular quantitative trait QTL mapping plus the $r$ thresholds of the liability, $\alpha=[\alpha_1\ \cdots\ \alpha_r]^T$. The Bayesian method for such an ordinal trait requires sampling $y_j$ conditional on $\{S_j,\theta,\alpha,Z\}$ and sampling $\alpha$ conditional on all the $y_j$'s.

Given $S_j=k$, $\theta$, $\alpha$ and $Z$, the liability $y_j$ follows a truncated normal distribution between $\alpha_{k-1}$ and $\alpha_k$ with mean $\mu_j$ and variance 1, where

$\mu_j=\sum_{i=1}^{p} X_{ji}\beta_i+\sum_{k=1}^{q} Z_{jk}\gamma_k$  (9.32)

This truncated normal distribution is denoted by

$p(y_j|\cdots)=N(y_j|\mu_j,1),\quad \alpha_{k-1}<y_j\le\alpha_k$  (9.33)

The algorithm we use to sample $y_j$ works through the inverse distribution function, as described below. First, we sample a uniform variable from

$u_j\sim U(a,b)$  (9.34)

where

$a=\Phi\!\left(\alpha_{k-1}-\sum_{i=1}^{p} X_{ji}\beta_i-\sum_{k'=1}^{q} Z_{jk'}\gamma_{k'}\right)$  (9.35)

and

$b=\Phi\!\left(\alpha_k-\sum_{i=1}^{p} X_{ji}\beta_i-\sum_{k'=1}^{q} Z_{jk'}\gamma_{k'}\right)$  (9.36)

where $\Phi(\cdot)$ is the standard normal distribution function. We then assign $y_j$ a value using

$y_j=\mu_j+\Phi^{-1}(u_j)$  (9.37)

A $y_j$ sampled this way is guaranteed to follow the truncated normal distribution.
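The inverse-CDF scheme of (9.34) through (9.37) can be sketched with the Python standard library, whose `statistics.NormalDist` provides $\Phi$ and $\Phi^{-1}$ (the helper name is illustrative):

```python
from statistics import NormalDist
import random

std_normal = NormalDist()  # standard normal: Phi via .cdf, inverse via .inv_cdf

def sample_liability(mu_j, lower, upper, rng=random):
    """Inverse-CDF draw of y_j from N(mu_j, 1) truncated to (lower, upper)."""
    a = std_normal.cdf(lower - mu_j)      # as in (9.35)
    b = std_normal.cdf(upper - mu_j)      # as in (9.36)
    u = rng.uniform(a, b)                 # as in (9.34)
    return mu_j + std_normal.inv_cdf(u)   # as in (9.37)

random.seed(11)
draws = [sample_liability(0.5, 0.0, 1.2) for _ in range(1000)]
print(min(draws) > 0.0 and max(draws) < 1.2)  # True: draws stay inside the interval
```

Every draw lands inside the threshold interval by construction, which is exactly the guarantee stated above.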

Given all the sampled $y_j$'s, we can sample the thresholds. First, we classify all individuals into $r+1$ groups based on their observed ordinal phenotypes. Let

$y_{\min}(k)=\min_{j:\,S_j=k} y_j$  (9.38)

be the minimum value of the $y_j$'s for all individuals with $S_j=k$, and

$y_{\max}(k)=\max_{j:\,S_j=k} y_j$  (9.39)

be the maximum value of the $y_j$'s for all individuals with $S_j=k$. The posterior distribution of $\alpha_k$ given all the $y_j$'s is uniform,

$p(\alpha_k|y)=U\!\left(\alpha_k \,\big|\, y_{\max}(k),\ y_{\min}(k+1)\right)$  (9.40)

from which $\alpha_k$ is sampled. Once all the $\alpha_k$'s and $y_j$'s are sampled, the other parameters are sampled in the same way as in Bayesian QTL mapping for regular quantitative traits.
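The threshold update then reduces to a uniform draw between the largest liability in class $k$ and the smallest liability in class $k+1$, as in this sketch with illustrative data:

```python
import random

def sample_threshold(k, y, S, rng=random):
    """Draw alpha_k ~ U(max y in class k, min y in class k+1), as in (9.38)-(9.40)."""
    y_max_k = max(yj for yj, sj in zip(y, S) if sj == k)       # eq (9.39)
    y_min_k1 = min(yj for yj, sj in zip(y, S) if sj == k + 1)  # eq (9.38)
    return rng.uniform(y_max_k, y_min_k1)

random.seed(2)
y = [-1.3, -0.2, 0.1, 0.9, 1.4, 2.2]   # sampled liabilities
S = [1, 1, 2, 2, 3, 3]                 # observed ordinal classes
alpha_1 = sample_threshold(1, y, S)
print(-0.2 <= alpha_1 <= 0.1)  # True: alpha_1 separates class 1 from class 2
```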

SAMPLING MISSING PHENOTYPIC VALUES

The problem of missing phenotypic values can be handled by the Bayesian method of QTL mapping. For single trait analysis, individuals with missing phenotypic values may be deleted from the analysis, because they do not provide any additional information. Alternatively, you may replace the missing values by the average of the trait in the mapping population. Another approach is to replace the missing values by their expectations, i.e., replace the missing $y_j$ by $\mu_j$, where

$\mu_j=\sum_{i=1}^{p} X_{ji}\beta_i+\sum_{k=1}^{q} Z_{jk}\gamma_k$  (9.41)

Since the expectation is a function of the parameters, it varies throughout the MCMC process. The Bayesian method of PROC QTL handles missing phenotypic values through sampling. Let $y_j$ be the missing phenotypic value and $y_j^*$ be the sampled value to replace $y_j$. The distribution used to sample $y_j^*$ is $N(\mu_j,\sigma^2)$, where $\sigma^2$ is the residual error variance. All parameters take their current values in the MCMC process.

Although deleting individuals with missing phenotypic values does not hurt the result for single trait QTL mapping, it does reduce the efficiency for multiple trait QTL mapping (to be described later).

PERMUTATION

The permutation option in the PROC QTL statement allows users to perform a permutation analysis for the Bayesian method. The permutation analysis is different from the permutation test in interval mapping (CHURCHILL and DOERGE 1994). You need to analyze the data using the Bayesian method first to provide the posterior means for all the QTL. The permutation analysis then draws the QTL effects from the null model (assuming there are no QTL effects). To generate the null distributions, you need to permute the data in every cycle of the MCMC sampling process. This is why you cannot conduct the permutation outside the QTL procedure. Once the permutation option is turned on, the phenotypic data are reshuffled before the parameters are sampled. After every parameter is drawn, the phenotypic data are reshuffled again. This process continues until the desired length of the Markov chain has been reached. The posterior sample will contain all the QTL effects drawn from the null distribution. You can then calculate the $\alpha\times 100\%$ and $(1-\alpha)\times 100\%$ percentiles for each QTL effect of the posterior sample, where $\alpha=0.05$ or another value of the user's choice. PROC QTL only helps you draw the posterior sample from the null model. You must perform the post-MCMC analysis using other methods, available either in SAS or in other software packages. If the posterior mean of a particular marker or QTL from the original (unshuffled) data is beyond the $(\alpha/2)\times 100\%$ and $(1-\alpha/2)\times 100\%$ interval, the QTL can be declared "significant" at the $\alpha$ level. Details of the Bayesian permutation analysis can be found in Che and Xu (2010). In summary, you need to perform two MCMC runs to complete a permutation-tested Bayesian mapping. One MCMC run is the analysis of the original data and the other is the MCMC run for the repeatedly permuted data. The original data analysis provides the estimated QTL effects. The permuted data analysis provides the $(\alpha/2)\times 100\%$ and $(1-\alpha/2)\times 100\%$ interval drawn from the null distribution.
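The post-MCMC significance check described above amounts to a percentile computation on the null sample, for example with numpy (the null sample and the observed effect here are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
null_effects = rng.normal(0.0, 0.5, size=1000)  # QTL effect drawn under the null
alpha = 0.05
lo, hi = np.percentile(null_effects, [100 * alpha / 2, 100 * (1 - alpha / 2)])

observed_effect = 2.1  # posterior mean from the original (unshuffled) analysis
significant = observed_effect < lo or observed_effect > hi
print(significant)  # True: 2.1 lies outside the 2.5%-97.5% null interval
```

This is exactly the comparison between the estimated QTL effect and the empirical $(\alpha/2)$ to $(1-\alpha/2)$ interval drawn from the permuted chain.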


[Figure 4. Estimated QTL effects for the entire genome and the empirical thresholds drawn from permutation within the Markov chain analysis at $\alpha=0.05$ (2.5%-97.5%, wider interval) and $\alpha=0.10$ (5%-95%, narrower interval). The plot shows the QTL effect (vertical axis, -6 to 6) against the genome position in centiMorgans (horizontal axis, 0 to 360).]


BAYESIAN MAPPING FOR DISCRETE TRAITS

The Bayesian methods for QTL mapping discussed so far cover quantitative traits and ordinal traits. Ordinal traits, however, belong to the category of discrete traits. Under the liability model, ordinal traits are simply discrete observations of underlying quantitative traits; therefore, we discussed ordinal traits in the chapter on Bayesian mapping for quantitative traits. There are many different discrete traits with a polygenic background. Three of them are very important in experiments and thus will be introduced in this chapter. These traits can be mapped using the so called generalized linear model (GLM). Although the generalized linear model has been used for mapping discrete traits in interval mapping, the GLM under the Bayesian mapping framework is slightly different from the GLM in interval mapping. The GLM used here takes advantage of the normal distribution, while the GLM described before does not require a normal distribution. The basic idea of the GLM here is to find a functional transformation of a discrete trait so that the transformed trait is normally distributed conditional on the parameters. Since the parameters are sampled during the MCMC sampling process, the transformation is constantly updated using the newly sampled parameters. Because the transformed trait is conditionally normal, the conditional posterior distribution of each QTL effect is also normal, provided that the prior is also normal. Therefore, we can fully take advantage of the sampling algorithms learned previously in Bayesian mapping for continuous traits.

GENERALIZED LINEAR MODEL

Let $w_j$ be the value of a trait for individual $j$ in a population of size $n$. The trait does not have to be continuously distributed; it may follow a binary, binomial, Poisson or some other distribution in the exponential family. We can analyze such traits using the generalized linear model, which consists of the following three components: (1) a linear predictor, (2) a monotonic mapping between the mean of the data and the linear predictor and (3) a response distribution in the exponential family of distributions (GELMAN 2005). Let

$\eta_j = \sum_{i=1}^{q} X_{ji}\beta_i + \sum_{k=1}^{p} Z_{jk}\gamma_k$   (10.1)

be the linear predictor for individual $j$. The QTL and non-QTL effects along with the design matrices are defined exactly as in Bayesian mapping for quantitative traits; therefore, no further discussion will be given for the sampling of these effects. The model can be simplified into the compact form

Page 108: Principles and Procedures of QTL Mapping

103

$\eta_j = X_j\beta + Z_j\gamma$   (10.2)

where $\beta$ and $\gamma$ are vectors rather than single elements. Let $\mu_j = E(w_j \mid \theta)$ be the expectation of $w_j$, so that

$\mu_j = \psi^{-1}(\eta_j) = \psi^{-1}(X_j\beta + Z_j\gamma)$   (10.3)

or

$\psi(\mu_j) = \eta_j = X_j\beta + Z_j\gamma$   (10.4)

where $\psi(\cdot)$ is called the link function that connects the expectation of the trait to the linear predictor. The parameters $\theta = \{\beta, \gamma\}$ include the non-QTL effects $\beta$ (nuisance parameters) and the QTL effects $\gamma$. Wolfinger and O'Connell (1993) showed that the likelihood function of the original datum $w_j$ can be approximated by a normal likelihood function of the pseudo datum $y_j$, where the pseudo datum is a function of the parameter $\theta$. Let

$\eta_j = X_j\beta + Z_j\gamma$   (10.5)

and

$\Delta_j = \left.\dfrac{\partial \mu_j}{\partial \eta_j}\right|_{\eta_j = X_j\beta + Z_j\gamma} = \left.\dfrac{\partial \psi^{-1}(\eta_j)}{\partial \eta_j}\right|_{\eta_j = X_j\beta + Z_j\gamma}$   (10.6)

The pseudo datum $y_j(\theta)$ is defined as

$y_j(\theta) = \eta_j + \Delta_j^{-1}(w_j - \mu_j)$   (10.7)

which is approximately normal with mean $\eta_j = X_j\beta + Z_j\gamma$ and variance

$s_j^2 = \Delta_j^{-1}\,\mathrm{var}(w_j \mid \theta)\,\Delta_j^{-1}$   (10.8)

The quantity inside the above "sandwich" expression of $s_j^2$ is $\mathrm{var}(w_j \mid \theta)$, which is the variance of the observed data point given $\theta$. The above mean and variance of the pseudo datum $y_j(\theta)$ are the conditional mean and conditional variance given $\eta_j = X_j\beta + Z_j\gamma$.

We now write the linear model for the pseudo data $y(\theta)$ in vector form,

$y = X\beta + Z\gamma + \epsilon$   (10.9)


which is distributed as

$y \sim N(X\beta,\ ZGZ^T + R)$   (10.10)

where $G = \mathrm{var}(\gamma) = \mathrm{diag}[\mathrm{var}(\gamma_1), \ldots, \mathrm{var}(\gamma_p)] = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2)$ and $R = \mathrm{diag}(s_1^2, \ldots, s_n^2)$. Now the typical mixed model Bayesian analysis can be performed using the pseudo data $y(\theta)$. The pseudo data and the matrix $R$, however, are functions of the parameters. Therefore, the MCMC-implemented Bayesian analysis involves an extra step within each cycle of the sampling, namely recalculation of the pseudo data and $R$ after all parameters have been sampled.

The non-QTL effect vector $\beta$ is $q \times 1$ and is not subject to shrinkage; therefore, a uniform prior is given to $\beta$. The QTL effects are denoted by a $p \times 1$ vector $\gamma$, with the $k$th component $\gamma_k$ representing the genetic effect of locus $k$. The prior distribution for $\gamma_k$ is $N(0, \sigma_k^2)$, and the variance in the prior is further described by

$p(\sigma_k^2) = \text{Inv-}\chi^2(\sigma_k^2 \mid \tau, \omega)$   (10.11)

The Bayesian shrinkage analysis of Xu (2003) actually adopted the Jeffreys' prior $p(\sigma_k^2) = 1/\sigma_k^2$, which is a special case of the scaled inverse chi-square with $(\tau, \omega) = (0, 0)$. This prior is improper but usually generates highly sparse models. ter Braak, Boer and Bink (2005) proposed using $(\tau, \omega) = (-2\delta, 0)$, where $0 < \delta \le 0.5$. This prior guarantees that a proper posterior distribution exists. The current version of PROC QTL simply takes $(\tau, \omega) = (0, 0)$; more options will be added later. We now provide three special cases of the GLM.

BINARY DATA

The first example is a binary trait, for which the probit link function is used. The trait is defined as $w_j \in \{0, 1\}$, where $\mu_j = E(w_j)$, and the link function is

$\eta_j = \psi(\mu_j) = \Phi^{-1}(\mu_j) = X_j\beta + Z_j\gamma$   (10.12)

where $\Phi(\cdot)$ and $\phi(\cdot)$ denote the standardized normal cumulative distribution and density functions, respectively. This link function leads to

$\Delta_j = \left.\dfrac{\partial \mu_j}{\partial \eta_j}\right|_{\eta_j = X_j\beta + Z_j\gamma} = \phi(\eta_j)$   (10.13)

and

$\mathrm{var}(w_j \mid \theta) = \Phi(\eta_j)\left[1 - \Phi(\eta_j)\right]$   (10.14)

Therefore, the pseudo datum is

$y_j(\theta) = \eta_j + \dfrac{w_j - \Phi(\eta_j)}{\phi(\eta_j)}$   (10.15)

The mean and variance of the pseudo datum are

$E[y_j(\theta)] = \eta_j = X_j\beta + Z_j\gamma$   (10.16)

and

$s_j^2 = \Delta_j^{-1}\,\mathrm{var}(w_j \mid \theta)\,\Delta_j^{-1} = \dfrac{\Phi(\eta_j)\left[1 - \Phi(\eta_j)\right]}{\phi^2(\eta_j)}$   (10.17)

respectively.
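As a quick numerical illustration of equations (10.15)–(10.17), the pseudo datum and its variance for a probit binary trait can be computed as follows (a minimal sketch; the function name is ours, and SciPy supplies $\Phi$ and $\phi$):

```python
import numpy as np
from scipy.stats import norm

def probit_pseudo_data(w, eta):
    """Pseudo data and variance for a binary trait under the probit link.

    w   : 0/1 phenotype vector
    eta : current linear predictor X*beta + Z*gamma
    Returns (y, s2) following equations (10.15) and (10.17).
    """
    mu = norm.cdf(eta)                # E(w | theta) = Phi(eta)
    delta = norm.pdf(eta)             # d mu / d eta = phi(eta)
    y = eta + (w - mu) / delta        # pseudo datum
    s2 = mu * (1.0 - mu) / delta**2   # pseudo variance
    return y, s2
```

Within the MCMC sampler, this function would be called once per cycle with the freshly sampled parameter values to refresh the pseudo data.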

BINOMIAL DATA

The trait is defined as $w_j = m_j / n_j$, where $m_j$ is called the number of events and $n_j$ is called the number of trials. Let $\mu_j = E(w_j)$; the link function is

$\eta_j = \psi(\mu_j) = \Phi^{-1}(\mu_j) = X_j\beta + Z_j\gamma$   (10.18)

This link function leads to

$\Delta_j = \left.\dfrac{\partial \mu_j}{\partial \eta_j}\right|_{\eta_j = X_j\beta + Z_j\gamma} = \phi(\eta_j)$   (10.19)

and

$\mathrm{var}(w_j \mid \theta) = \dfrac{1}{n_j}\Phi(\eta_j)\left[1 - \Phi(\eta_j)\right]$   (10.20)

Therefore, the pseudo datum is

$y_j(\theta) = \eta_j + \dfrac{w_j - \Phi(\eta_j)}{\phi(\eta_j)}$   (10.21)

The mean and variance of the pseudo datum are

$E[y_j(\theta)] = \eta_j = X_j\beta + Z_j\gamma$   (10.22)

and

$s_j^2 = \Delta_j^{-1}\,\mathrm{var}(w_j \mid \theta)\,\Delta_j^{-1} = \dfrac{\Phi(\eta_j)\left[1 - \Phi(\eta_j)\right]}{n_j\,\phi^2(\eta_j)}$   (10.23)

respectively.


The binary trait is a special case of the binomial trait with $n_j = 1$ for all $j = 1, \ldots, n$. Therefore, PROC QTL provides only two options for the generalized linear model: the binomial (which includes binary as a special case) and the Poisson.

POISSON DATA

The last example is Poisson data, $w_j \in \{0, 1, \ldots\}$. The probability mass function of the Poisson distribution is

$p(w_j) = \dfrac{\lambda_j^{w_j}}{w_j!}\exp(-\lambda_j)$   (10.24)

where $\lambda_j = E(w_j) = \mathrm{var}(w_j)$. The log link function is used for Poisson data,

$\eta_j = \psi(\lambda_j) = \ln(\lambda_j) = X_j\beta + Z_j\gamma$   (10.25)

This link function leads to

$\Delta_j = \left.\dfrac{\partial \lambda_j}{\partial \eta_j}\right|_{\eta_j = X_j\beta + Z_j\gamma} = \exp(\eta_j)$   (10.26)

Therefore, the pseudo datum and the variance of the pseudo datum are

$y_j(\theta) = \eta_j + \dfrac{w_j - \exp(\eta_j)}{\exp(\eta_j)}$   (10.27)

and

$s_j^2 = \Delta_j^{-1}\,\mathrm{var}(w_j \mid \theta)\,\Delta_j^{-1} = \dfrac{1}{\exp(\eta_j)}$   (10.28)

respectively.
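The Poisson pseudo data of equations (10.27)–(10.28) can be sketched in the same way (illustrative names; not PROC QTL code):

```python
import numpy as np

def poisson_pseudo_data(w, eta):
    """Pseudo data and variance for Poisson counts under the log link,
    following equations (10.27) and (10.28).

    w   : count phenotype vector
    eta : current linear predictor X*beta + Z*gamma
    """
    lam = np.exp(eta)            # lambda_j = exp(eta_j)
    y = eta + (w - lam) / lam    # pseudo datum: eta + exp(-eta)(w - exp(eta))
    s2 = 1.0 / lam               # pseudo variance: 1 / exp(eta)
    return y, s2
```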


EMPIRICAL BAYESIAN METHOD

MAIN QTL EFFECT MODEL

Empirical Bayes is still a Bayesian method, but the hyper-parameters (the parameters of the prior distribution) are not preselected by the investigator; instead, they are estimated from the same data used in the Bayesian analysis (XU 2007b). Once the hyper-parameters are estimated, they are used in the Bayesian analysis as if they were the true hyper-parameters of the prior distributions. The data are therefore used twice, once for estimating the hyper-parameters and once for estimating the Bayesian posterior means. In the QTL mapping problem, the flat prior for $\beta$ does not have any hyper-parameters. If a uniform prior is used for the residual variance $\sigma^2$, there is also no hyper-parameter for that prior. The prior for each QTL effect is independent normal,

$p(\gamma_k \mid \sigma_k^2) = N(\gamma_k \mid 0, \sigma_k^2)$   (11.1)

where $\sigma_k^2$ is the variance of the prior distribution. It is a hyper-parameter and is assigned another, higher-level prior. We use

$p(\sigma_k^2) = \text{Inv-}\chi^2(\sigma_k^2 \mid \tau, \omega)$   (11.2)

as the prior, where $\tau$ and $\omega$ are hyper-parameters at the higher level. For $q$ model effects, the number of $\sigma_k^2$ values is $q$, which can be a large number. In the fully Bayesian method under the hierarchical model, $\sigma_k^2$ is estimated simultaneously along with $\gamma_k$. In the empirical Bayes method, we estimate $\sigma_k^2$ first, independently of $\gamma_k$, from the same data set. Recall that the linear model for $y_j$ is

$y_j = \sum_{i=1}^{p} X_{ji}\beta_i + \sum_{k=1}^{q} Z_{jk}\gamma_k + \epsilon_j$   (11.3)

where $\epsilon_j \sim N(0, \sigma^2)$ is normally distributed. The compact matrix notation of this model is

$y = X\beta + Z\gamma + \epsilon$   (11.4)

When $\gamma_k$ is treated as a random effect, the expectation of $\gamma_k$ is zero. Therefore,

$E(y) = X\beta$   (11.5)


The variance-covariance matrix of $y$ is

$V = \mathrm{var}(y) = \sum_{k=1}^{q} Z_k Z_k^T \sigma_k^2 + I\sigma^2$   (11.6)

Let $\theta = \{\sigma_1^2, \ldots, \sigma_q^2, \sigma^2\}$ be the vector of variance components. The distribution of $y$ is multivariate normal,

$p(y \mid \beta, \theta) = N(y \mid X\beta, V)$   (11.7)

The log likelihood function for the parameters $\{\beta, \theta\}$ is

$L(\beta, \theta) = -\dfrac{1}{2}\ln|V| - \dfrac{1}{2}(y - X\beta)^T V^{-1}(y - X\beta) - \dfrac{1}{2}(\tau + 2)\ln|D| - \dfrac{1}{2}\mathrm{tr}(\omega D^{-1})$   (11.8)

where $D = \mathrm{diag}\{\sigma_1^2, \ldots, \sigma_q^2\}$. This likelihood is not a function of the QTL effects $\gamma$. We can maximize this log likelihood function with respect to the parameters to obtain the MLE of $\{\beta, \theta\}$, denoted by $\{\hat\beta, \hat\theta\}$. These parameters are then treated as known quantities and used to derive the posterior distribution of the QTL effects $\gamma_k$. The posterior mean of each $\gamma_k$, given $\hat\theta$, is the empirical Bayes estimate of $\gamma_k$. We need a conditional updating algorithm to find $\{\hat\beta, \hat\theta\}$, such as the algorithm given by Xu (2007b).

EPISTATIC QTL EFFECT MODEL

One advantage of the empirical Bayes method over the fully Bayesian method is its fast computational speed, because MCMC sampling is not required. As a result, the method can handle even larger models, e.g., the epistatic QTL effect model. Let $q$ be the number of loci included in the model; the total number of QTL effects is then $q(q+1)/2$, including $q$ main effects and $q(q-1)/2$ pairwise interaction effects. Higher-order interactions may also be included in the model if $q$ is not too large. The epistatic effect model is

$y = X\beta + \sum_{k=1}^{q} Z_k\gamma_k + \sum_{k=1}^{q-1}\sum_{k'=k+1}^{q} (Z_k \# Z_{k'})\gamma_{kk'} + \epsilon$   (11.9)

where $Z_k \# Z_{k'}$ represents the direct (element-wise) multiplication of the two vectors and $\gamma_{kk'}$ is the epistatic effect between loci $k$ and $k'$. Parameters of both the main effect model and the epistatic effect model are estimated with the same algorithm; therefore, we only describe the updating algorithm for the main effect model.


SIMPLEX ALGORITHM

The high dimensionality of the model ($p + q + 1$ parameters) prohibits a simultaneous search for all parameters. Instead, we adopt a conditional updating algorithm that searches for one parameter at a time, given the remaining parameters, and then takes turns updating each of the other parameters. The iteration continues for many cycles until convergence is reached. Updating a single parameter is a one-dimensional problem, and usually an explicit solution is possible. First, we denote the parameter values at the $t$th iteration by $\{\beta^{(t)}, \theta^{(t)}\}$. Let

$V^{(t)} = \sum_{k=1}^{q} Z_k Z_k^T \sigma_k^{2(t)} + I\sigma^{2(t)}$   (11.10)

We can write the conditional log likelihood function for each parameter. For the non-QTL effects, we have the following conditional log likelihood function,

$L(\beta \mid \theta^{(t)}) = -\dfrac{1}{2}\ln|V^{(t)}| - \dfrac{1}{2}(y - X\beta)^T (V^{(t)})^{-1}(y - X\beta) - \dfrac{1}{2}(\tau + 2)\ln|D^{(t)}| - \dfrac{1}{2}\mathrm{tr}\big(\omega (D^{(t)})^{-1}\big)$   (11.11)

Setting $\partial L(\beta \mid \theta^{(t)}) / \partial\beta = 0$ and solving for $\beta$, we obtain an updated $\beta$, denoted by

$\beta^{(t+1)} = \left[X^T (V^{(t)})^{-1} X\right]^{-1} X^T (V^{(t)})^{-1} y$   (11.12)

The conditional log likelihood function for the residual variance is

$L(\sigma^2 \mid \beta^{(t)}, \theta^{(t)}) = -\dfrac{n}{2}\ln(\sigma^2) - \dfrac{\sigma^{2(t)}}{2\sigma^2}(y - X\beta^{(t)})^T (V^{(t)})^{-1}(y - X\beta^{(t)})$   (11.13)

Setting $\partial L(\sigma^2 \mid \beta^{(t)}, \theta^{(t)}) / \partial\sigma^2 = 0$ and solving for $\sigma^2$ yields

$\sigma^{2(t+1)} = \dfrac{\sigma^{2(t)}}{n}(y - X\beta^{(t)})^T (V^{(t)})^{-1}(y - X\beta^{(t)})$   (11.14)

The conditional log likelihood function for $\sigma_k^2$ is

$L(\sigma_k^2 \mid \beta^{(t)}, \theta^{(t)}) = -\dfrac{1}{2}\ln\big[Z_k^T (V^{(t)})^{-1} Z_k(\sigma_k^2 - \sigma_k^{2(t)}) + 1\big] + \dfrac{1}{2}\dfrac{\big[(y - X\beta^{(t)})^T (V^{(t)})^{-1} Z_k\big]^2(\sigma_k^2 - \sigma_k^{2(t)})}{Z_k^T (V^{(t)})^{-1} Z_k(\sigma_k^2 - \sigma_k^{2(t)}) + 1} - \dfrac{1}{2}(\tau + 2)\ln(\sigma_k^2) - \dfrac{\omega}{2\sigma_k^2}$   (11.15)

An explicit solution does not exist in general, and thus the simplex algorithm of Nelder and Mead (1965) is used to search for the solution. When $\tau = -2$ and $\omega = 0$, however, there is an explicit solution. We set $\partial L(\sigma_k^2 \mid \beta^{(t)}, \theta^{(t)}) / \partial\sigma_k^2 = 0$ and solve for $\sigma_k^2$, yielding

$\sigma_k^{2(t+1)} = \dfrac{\big[(y - X\beta^{(t)})^T (V^{(t)})^{-1} Z_k\big]^2 - Z_k^T (V^{(t)})^{-1} Z_k}{\big[Z_k^T (V^{(t)})^{-1} Z_k\big]^2} + \sigma_k^{2(t)}$   (11.16)

The following initial values are used by PROC QTL,

$\beta^{(0)} = (X^T X)^{-1} X^T y$
$\sigma^{2(0)} = \dfrac{1}{n}(y - X\beta^{(0)})^T (y - X\beta^{(0)})$
$\sigma_k^{2(0)} = 1, \quad k = 1, \ldots, q$   (11.17)

The MLE of $\{\beta, \theta\}$ are used, as if they were the true values of the parameters, to derive the posterior means of the QTL effects,

$E(\gamma_k \mid y) = \hat\sigma_k^2 Z_k^T \hat V^{-1}(y - X\hat\beta)$   (11.18)

Such a posterior mean is called the best linear unbiased prediction (BLUP). The posterior variance is

$\mathrm{var}(\gamma_k \mid y) = \hat\sigma_k^2\big(1 - Z_k^T \hat V^{-1} Z_k\,\hat\sigma_k^2\big)$   (11.19)

Let $\hat\gamma_k = E(\gamma_k \mid y)$ and $\hat S_k = \mathrm{var}(\gamma_k \mid y)$; the $t$-test statistic is

$t_k = \dfrac{\hat\gamma_k}{\sqrt{\hat S_k}}$   (11.20)

which can be plotted against the genome location to produce a visual presentation of the QTL effects. Users have the option to choose $(\tau, \omega)$. The default setting is $(\tau, \omega) = (0, 0)$, corresponding to the Jeffreys' prior (FIGUEIREDO 2003).


BAYESIAN MAPPING FOR MULTIPLE TRAITS

MULTIPLE CONTINUOUS TRAITS

Let $y_j = [y_{j1} \ \ldots \ y_{jm}]^T$, for $j = 1, \ldots, n$, be an $m \times 1$ vector of the phenotypic values of $m$ quantitative traits measured on the $j$th individual of an $F_2$ mapping population, where $n$ is the sample size. The vector of phenotypic values is described by the following multivariate linear model,

$y_j = \mu + \sum_{k=1}^{p} X_{jk} a_k + \sum_{k=1}^{p} Z_{jk} d_k + e_j$   (12.1)

where $\mu = [\mu_1 \ \ldots \ \mu_m]^T$ is an $m \times 1$ vector of the population means (or intercepts) for the $m$ traits, $a_k = [a_{1k} \ \ldots \ a_{mk}]^T$ and $d_k = [d_{1k} \ \ldots \ d_{mk}]^T$ are the additive and dominance effects, respectively, for locus $k$ ($k = 1, \ldots, p$), and $p$ is the number of loci included in the model. Both $a_k$ and $d_k$ are $m \times 1$ vectors because $m$ traits are involved in the model. The residual error $e_j = [e_{j1} \ \ldots \ e_{jm}]^T$ is an $m \times 1$ vector with an assumed multivariate normal distribution $N(0, \Sigma)$, where $\Sigma$ is an $m \times m$ positive definite variance-covariance matrix. The independent variables, $X_{jk}$ and $Z_{jk}$, are defined as follows. Let $A_1A_1$, $A_1A_2$ and $A_2A_2$ be the three genotypes at locus $k$. The two variables are

$X_{jk} = \begin{cases} +1 & \text{for } A_1A_1 \\ 0 & \text{for } A_1A_2 \\ -1 & \text{for } A_2A_2 \end{cases} \qquad Z_{jk} = \begin{cases} 0 & \text{for } A_1A_1 \\ 1 & \text{for } A_1A_2 \\ 0 & \text{for } A_2A_2 \end{cases}$   (12.2)

The scales of these independent variables are arbitrary. Alternative scales have been used by other investigators, e.g., Yang et al. (2006).
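The coding in equation (12.2) is easy to mechanize; a minimal sketch (genotype labels and names are ours):

```python
# Additive (X) and dominance (Z) codes for the three F2 genotypes,
# following equation (12.2).
F2_CODES = {"A1A1": (1, 0), "A1A2": (0, 1), "A2A2": (-1, 0)}

def code_genotypes(genotypes):
    """Map a sequence of genotype labels to the (X, Z) coding vectors."""
    X = [F2_CODES[g][0] for g in genotypes]
    Z = [F2_CODES[g][1] for g in genotypes]
    return X, Z
```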

Under the assumption of a normal distribution for the residual errors, the conditional probability density of $y_j$ is

$p(y_j \mid \mu, a, d, \Sigma) = N\!\left(y_j \,\Big|\, \mu + \sum_{k=1}^{p} X_{jk} a_k + \sum_{k=1}^{p} Z_{jk} d_k,\ \Sigma\right)$   (12.3)

The likelihood function of the parameters is proportional to

$L(\mu, a, d, \Sigma) = \prod_{j=1}^{n} p(y_j \mid \mu, a, d, \Sigma)$   (12.4)

The parameter vector is $\theta = \{\mu, a, d, \lambda, \Sigma\}$, where $\lambda = \{\lambda_k\}$ is the array of QTL positions. The QTL genotype arrays, $X = \{X_j\}_{j=1}^{n}$ and $Z = \{Z_j\}_{j=1}^{n}$, are not parameters of interest but missing values in QTL mapping. (They become interesting quantities when marker-assisted selection is considered after QTL mapping.) The likelihood function serves as a link between the data, the parameters and the missing values. Combined with the prior distribution of the parameters, the likelihood function is used to derive the posterior distribution of the parameters. The number of QTL is $p$, which is supposed to be a parameter of interest in a classical QTL mapping experiment, but in the Bayesian shrinkage analysis it is a preset constant. We set $p$ equal to the number of marker intervals. If an interval does not contain a QTL, the corresponding QTL effects will be shrunken to zero; a QTL with zero effect is therefore equivalent to one excluded from the model. With this shrinkage analysis, model selection is conducted not explicitly but implicitly via shrinkage.

Each of the parameters is assigned a prior distribution. The population mean can be estimated accurately from the data, and thus a flat prior is given to $\mu$, i.e., $p(\mu) \propto \text{constant}$. Each of the QTL effect vectors is assigned a normal prior, $p(a_k) = N(a_k \mid 0, A_k)$ and $p(d_k) = N(d_k \mid 0, B_k)$, where $A_k$ and $B_k$ are unknown variance-covariance matrices of dimension $m \times m$. This notation for the probability distribution is adopted from GELMAN (2005); the priors can be equivalently expressed as $a_k \sim N(0, A_k)$ and $d_k \sim N(0, B_k)$. The key difference between the shrinkage analysis and the usual Bayesian regression analysis is that these prior variance-covariance matrices are effect specific, i.e., they vary across loci. Another difference between the two is that the hyper-parameters (parameters of the prior), $A_k$ and $B_k$, are not known a priori but are estimated from the data. To do this, we give each of them a prior distribution. Once we assign a prior distribution to a hyper-parameter, there is a multilevel prior assignment; this is called hierarchical modeling (LINDLEY and SMITH 1972). We assign the variance-covariance matrices the following inverse Wishart distributions, $p(A_k) = \text{Inv-Wishart}(A_k \mid \tau, \Omega)$ and $p(B_k) = \text{Inv-Wishart}(B_k \mid \tau, \Omega)$, where $\tau \ge m + 1$ is the prior degrees of freedom and $\Omega$ with $|\Omega| > 0$ is the prior scale matrix. These hyper-parameters are already remote from $a_k$ and $d_k$, and thus they can be preset at some convenient values (constant across loci) without affecting the posterior inference of the QTL effects. To reflect the lack of knowledge, $\tau$ and $\Omega$ are set at values as small as possible, e.g., $\tau = m$ and $\Omega = 0.1 I_m$, where $I_m$ is an $m \times m$ identity matrix. The residual variance-covariance matrix is also assigned the same inverse Wishart distribution, $p(\Sigma) = \text{Inv-Wishart}(\Sigma \mid \tau, \Omega)$. Although $\Sigma$ is a parameter of interest, the data are usually sufficient to provide an accurate estimate of $\Sigma$, and thus the hyper-parameters $\tau$ and $\Omega$ have little influence on the estimated $\Sigma$. Finally, a uniform prior distribution is chosen for $\lambda_k$. Since we assume that each marker interval contains one and only one QTL, the uniform distribution for $\lambda_k$ is $p(\lambda_k) = U(\lambda_k \mid \lambda_k^L, \lambda_k^R) = 1/(\lambda_k^R - \lambda_k^L)$, where $\lambda_k^L$ and $\lambda_k^R$ are the positions of the left and right flanking markers. All these priors are independent across loci. Therefore, the joint prior distribution of the parameters is

$p(\mu, a, d, \lambda, \Sigma) = p(\mu)\,p(\Sigma)\prod_{k=1}^{p} p(a_k)\,p(d_k)\,p(\lambda_k)$   (12.5)

The distribution of the QTL genotype array is

$p(X \mid \lambda) = \prod_{j=1}^{n} p(X_j \mid \lambda)$   (12.6)

To simplify the sampling process, it is easier to sample one variable at a time conditional on the values of all other variables. A single variable here means a vector of variables of the same type; for example, $a_k$ is defined as one variable, but it is a vector containing the additive effects for all traits. The conditional posterior distribution for one variable usually has an explicit form, making Monte Carlo simulation easy. We now provide the posterior distribution for each of the parameters.

The conditional posterior distribution of $\mu$ is multivariate normal with mean and variance given by

$E(\mu \mid \ldots) = \dfrac{1}{n}\sum_{j=1}^{n}\left(y_j - \sum_{k=1}^{p} X_{jk} a_k - \sum_{k=1}^{p} Z_{jk} d_k\right)$   (12.7)

and

$\mathrm{var}(\mu \mid \ldots) = \dfrac{1}{n}\Sigma$   (12.8)

respectively, where the special notation $(\cdot \mid \ldots)$ means conditional on all other variables.

The conditional posterior for $a_k$ is multivariate normal with the following mean and variance,

$E(a_k \mid \ldots) = \left(\sum_{j=1}^{n} X_{jk}^2\,\Sigma^{-1} + A_k^{-1}\right)^{-1}\sum_{j=1}^{n} X_{jk}\,\Sigma^{-1}\left(y_j - \mu - \sum_{k' \ne k}^{p} X_{jk'} a_{k'} - \sum_{k'=1}^{p} Z_{jk'} d_{k'}\right)$   (12.9)

and

$\mathrm{var}(a_k \mid \ldots) = \left(\sum_{j=1}^{n} X_{jk}^2\,\Sigma^{-1} + A_k^{-1}\right)^{-1}$   (12.10)

Similarly, the conditional posterior for $d_k$ is also multivariate normal, with mean and variance of

$E(d_k \mid \ldots) = \left(\sum_{j=1}^{n} Z_{jk}^2\,\Sigma^{-1} + B_k^{-1}\right)^{-1}\sum_{j=1}^{n} Z_{jk}\,\Sigma^{-1}\left(y_j - \mu - \sum_{k'=1}^{p} X_{jk'} a_{k'} - \sum_{k' \ne k}^{p} Z_{jk'} d_{k'}\right)$   (12.11)

and

$\mathrm{var}(d_k \mid \ldots) = \left(\sum_{j=1}^{n} Z_{jk}^2\,\Sigma^{-1} + B_k^{-1}\right)^{-1}$   (12.12)

respectively. The conditional posterior means of $a_k$ and $d_k$ are called the shrinkage estimates. A derivation of the shrinkage estimates can be found in a recent note by Xu (2007a).

The hierarchical model also requires sampling $A_k$ and $B_k$ from their conditional posterior distributions. The inverse Wishart prior is conjugate, and thus the conditional posteriors of $A_k$ and $B_k$ are also inverse Wishart,

$p(A_k \mid \ldots) = \text{Inv-Wishart}(A_k \mid \tau + 1,\ \Omega + a_k a_k^T)$   (12.13)

and

$p(B_k \mid \ldots) = \text{Inv-Wishart}(B_k \mid \tau + 1,\ \Omega + d_k d_k^T)$   (12.14)

The conditional posterior for the residual variance-covariance matrix is inverse Wishart, again due to the conjugate nature of the prior,

$p(\Sigma \mid \ldots) = \text{Inv-Wishart}(\Sigma \mid \tau + n,\ \Omega + SS)$   (12.15)

where

$SS = \sum_{j=1}^{n}\left(y_j - \mu - \sum_{k=1}^{p} X_{jk} a_k - \sum_{k=1}^{p} Z_{jk} d_k\right)\left(y_j - \mu - \sum_{k=1}^{p} X_{jk} a_k - \sum_{k=1}^{p} Z_{jk} d_k\right)^T$   (12.16)

The distribution of $X_{jk}$ is discrete, and thus its conditional posterior distribution can be obtained from Bayes' theorem. Let

$g = [g_1 \ g_2 \ g_3]^T = [1 \ 0 \ -1]^T$

be the three genotype indicators for variable $X_{jk}$ and

$h = [h_1 \ h_2 \ h_3]^T = [0 \ 1 \ 0]^T$

be the three genotype indicators for variable $Z_{jk}$. Assume that $m^L_{jk} = g_u$ ($u = 1, 2, 3$) and $m^R_{jk} = g_v$ ($v = 1, 2, 3$) are the observed genotypes of the two flanking markers. The conditional posterior probability for $X_{jk} = g_w$ ($w = 1, 2, 3$) is calculated using the following Bayes' theorem,

$p(X_{jk} = g_w \mid \ldots) = \dfrac{p(X_{jk} = g_w)\,H_{km^L}(w, u)\,H_{km^R}(w, v)\,N(y_j^* \mid g_w a_k + h_w d_k,\ \Sigma)}{\sum_{w'=1}^{3} p(X_{jk} = g_{w'})\,H_{km^L}(w', u)\,H_{km^R}(w', v)\,N(y_j^* \mid g_{w'} a_k + h_{w'} d_k,\ \Sigma)}$   (12.17)

where

$p(X_{jk} = g_1) = p(X_{jk} = g_3) = \tfrac{1}{2}\,p(X_{jk} = g_2) = \tfrac{1}{4}$

is the Mendelian segregation ratio. The other items in the above Bayes' theorem are defined as follows.

$y_j^* = y_j - \mu - \sum_{k' \ne k}^{p} X_{jk'} a_{k'} - \sum_{k' \ne k}^{p} Z_{jk'} d_{k'}$   (12.18)

is the phenotypic value of individual $j$ adjusted by removing the effects of all QTL other than $k$. $H_{km^L}$ is the transition matrix between QTL $k$ and the left flanking marker $m^L$, and $H_{km^R}$ is the transition matrix between QTL $k$ and the right flanking marker $m^R$.

The conditional posterior distribution of the position of QTL $k$, $p(\lambda_k \mid \ldots)$, has no explicit form because of the complexity of the model. Therefore, $\lambda_k$ must be sampled with a Metropolis-Hastings algorithm (METROPOLIS et al. 1953; HASTINGS 1970). The algorithm presented by Wang et al. (2005b) for univariate QTL mapping can be adopted directly for multivariate QTL mapping. Finally, when a marker genotype is missing, it must be sampled from its conditional posterior distribution.

The MCMC sampling process is summarized as follows:

1. Initialize all variables, including parameters and missing values, with some values in their legal domains;

2. Sample $\mu$ from its conditional posterior distribution (multivariate normal);

3. Sample $a_k$ and $d_k$ from their conditional posterior distributions (multivariate normal);

4. Sample $A_k$ and $B_k$ from their conditional posterior distributions (inverse Wishart);

5. Sample $\Sigma$ from its conditional posterior distribution (inverse Wishart);

6. Sample the QTL genotypes from their conditional posterior distributions (derived from Bayes' theorem);

7. Sample the genotypes of missing markers from their conditional posterior distributions (derived from Bayes' theorem);

8. Sample the QTL positions from their conditional posterior distributions using the Metropolis-Hastings algorithm;

9. Repeat steps (2)–(8) until the Markov chain is sufficiently long.

Steps (2)–(7) are called the Gibbs sampler steps, while step (8) is called the M-H step.

How long is sufficiently long for the Markov chain? Users can use the algorithm of Gelman et al. (2005) to check the convergence of the chain. The product of the MCMC sampling is a realized sample of all unknown variables from the joint posterior distribution. The MCMC does not produce a significance test but serves as a process for creating the empirical posterior distributions of the parameters, from which all information about the QTL is inferred. The most important parameters are the locations and effects of the QTL; the covariance matrices are not of immediate interest but assist in the estimation of the effects. In conventional Bayesian mapping analysis (SILLANPÄÄ and ARJAS 1998; XU 2002), the marginal posterior distribution of QTL position is summarized graphically by plotting the number of hits by a QTL in a short region against the location of that region in the genome. The resulting curve is called the QTL intensity profile. In this study, we assume that each marker interval is associated with a QTL, and thus all intervals are hit by a QTL the same number of times. If an interval contains a real QTL, the QTL intensity profile within the interval is expected to show a peak; otherwise, the intensity profile will be flat (uniform). Such a QTL intensity profile is denoted by $f(\lambda)$, where $\lambda$ now denotes a particular location in the genome.

The QTL intensity profile itself is not the best indicator of QTL location under the Bayesian shrinkage analysis. We propose to weight the intensity profile by the QTL effects and to use the weighted QTL intensity profile to indicate the locations of the QTL. The majority of the genome segments have negligible QTL effects, so only areas with nontrivial QTL effects will show clear peaks. Let $a(\lambda)$ and $d(\lambda)$ be $m \times 1$ vectors of additive and dominance effects, respectively, of the QTL collected at position $\lambda$ of the genome. There are many ways to present the QTL effects as functions of genome location; we choose the following profile,

$g(\lambda) = \dfrac{1}{2}\,a^T(\lambda)\,a(\lambda) + \dfrac{1}{4}\,d^T(\lambda)\,d(\lambda)$   (12.19)

The coefficients $1/2$ and $1/4$ in front of the quadratic terms are the expected variances of $X_{jk}$ and $Z_{jk}$ across individuals within the $F_2$ population (assuming no segregation distortion). This QTL effect profile reduces to $g(\lambda) = \frac{1}{2}a^2(\lambda) + \frac{1}{4}d^2(\lambda)$ in the special case of single-trait analysis, in which it is the QTL variance at location $\lambda$. If desired, one can also draw a QTL effect profile for each trait, or for each effect (additive or dominance) of each trait. The QTL effect profile presented here is the overall effect for the entire genome.

The weighted QTL intensity profile is defined as

$w(\lambda) = f(\lambda)\,g(\lambda)$   (12.20)

The intensity profile $f(\lambda)$ alone does not tell much about QTL across marker intervals, because each interval is hit by the same number of QTL; but if an interval contains a QTL, $f(\lambda)$ shows a peak within that interval. The QTL effect profile $g(\lambda)$, on the other hand, can pick out the intervals containing large-effect QTL, but it is not sensitive to the change of location within an interval. Therefore, the weighted intensity profile $w(\lambda)$ can pick out the intervals containing QTL and also show sharp peaks within intervals.
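One way to build $w(\lambda)$ from MCMC output is to bin the sampled QTL positions (giving $f$) and average the effect profile $g$ within bins; a rough sketch, under the assumption that one position draw and one matching pair of effect-vector draws are stored per sweep:

```python
import numpy as np

def weighted_intensity(positions, a_samples, d_samples, bins):
    """Weighted QTL intensity profile w = f * g, equations (12.19)-(12.20).

    positions : sampled QTL positions, one per MCMC draw
    a_samples, d_samples : matching (draws x m) effect samples
    bins : number of genome bins
    Returns (bin centers, w) for plotting.
    """
    f, edges = np.histogram(positions, bins=bins, density=True)  # QTL intensity
    idx = np.clip(np.digitize(positions, edges) - 1, 0, len(f) - 1)
    g = np.zeros(len(f))
    for b in range(len(f)):
        sel = idx == b
        if sel.any():
            # g(lambda) = 0.5 a'a + 0.25 d'd, averaged within the bin
            g[b] = np.mean(0.5 * np.sum(a_samples[sel] ** 2, axis=1)
                           + 0.25 * np.sum(d_samples[sel] ** 2, axis=1))
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, f * g
```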

In practice, not all traits are measured on the same scale, and the profile of the overall QTL effect may be dominated by the traits with large variances. Two approaches may be taken to eliminate this problem. One is to standardize all traits before the analysis so that they all have roughly the same variance (XU 2002). Alternatively, $g(\lambda)$ may be modified as

$g(\lambda) = \dfrac{1}{2}\,a^T(\lambda)\,\Sigma^{-1} a(\lambda) + \dfrac{1}{4}\,d^T(\lambda)\,\Sigma^{-1} d(\lambda)$   (12.21)

where $\Sigma$ is the residual covariance matrix.

Pleiotropic effects can be visualized by comparing the weighted QTL intensity profiles of individual traits. Let $a(\lambda) = [a_1(\lambda) \ \ldots \ a_m(\lambda)]^T$ be the additive effects of the QTL at location $\lambda$, where $a_i(\lambda)$ is the effect on the $i$th trait ($i = 1, \ldots, m$). A pleiotropic effect occurs at position $\lambda$ if more than one component of $a(\lambda)$ is noticeably large.

MULTIPLE BINARY TRAITS

With little extra effort, the method can be extended to handle binary traits. A binary trait is a categorical trait with two states: presence and absence. Recall that $y_j = [y_{j1} \ \ldots \ y_{jm}]^T$ is the vector of phenotypic values for $m$ quantitative traits. If the $i$th trait is binary, the phenotype is denoted by $w_{ji} \in \{0, 1\}$, with 0 representing absence and 1 representing presence. Under the threshold model for a binary trait (XU et al. 2005a), we postulate that trait $i$ is still a quantitative trait, but that $y_{ji}$ cannot be observed. This latent quantitative trait, however, determines the observed binary phenotype. We propose a hypothetical threshold $t_i = 0$ so that $w_{ji} = 0$ for $y_{ji} \le t_i$ and $w_{ji} = 1$ for $y_{ji} > t_i$. The latent variable is still described by the usual linear model with a normal residual error, except that the residual error variance is set to 1 because it is not estimable. Under this threshold model, we can derive the conditional posterior distribution of $y_{ji}$ given $w_{ji}$, the phenotypic values of all other traits, and the current parameter values. This conditional posterior distribution happens to be a truncated normal distribution, from which $y_{ji}$ is sampled. A detailed algorithm for sampling $y_{ji}$ has been given by Xu et al. (2005a). Korsgaard et al. (2003) provided a general method for sampling the liability of ordered categorical traits; their method, again, was developed not for QTL mapping but for classical quantitative trait analysis. Once $y_{ji}$ is sampled, $y_j$ becomes a full vector of quantitative trait values, and the MCMC sampling scheme described earlier applies. Therefore, mapping multiple traits with one or more binary components requires only one extra step: sampling the missing phenotype of the underlying quantitative trait.

MIXTURE OF CONTINUOUS AND BINARY TRAITS

PROC QTL can also perform QTL mapping for a trait set consisting of a mixture of continuous and binary traits. The key here is again to sample the latent liability for the binary trait components. Recall that $y_j = [y_{j1} \ \ldots \ y_{jm}]^T$ is the vector of continuous trait phenotypes. If a trait is binary, say the $i$th trait, the corresponding $y_{ji}$ is missing and must be sampled from a truncated normal distribution. This distribution is conditional on all components of $y_j$ other than $y_{ji}$, with the direction of truncation determined by the observed binary phenotype of the trait concerned.

MISSING VALUES

Missing values of a phenotype are sampled using the same approach as sampling the liability, except that if the missing phenotype is continuous, it is sampled from the conditional normal distribution; if the missing trait is binary, it is sampled from the truncated conditional normal distribution.


REFERENCES

BROOKS, S., and A. GELMAN, 1998 General methods for monitoring convergence of iterative simulations. J. Comput. Graph. Stat. 7: 434-455.

CASELLA, G., and E. I. GEORGE, 1992 Explaining the Gibbs sampler. Am. Stat. 46: 167-174.

CHE, X., and S. XU, 2010 Significance test and genome selection in Bayesian shrinkage analysis. Int. J. Plant Genomics (in press).

CHURCHILL, G. A., and R. W. DOERGE, 1994 Empirical threshold values for quantitative trait mapping. Genetics 138: 963-971.

DEMPSTER, A. P., N. M. LAIRD and D. B. RUBIN, 1977 Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. B 39: 1-38.

ELSTON, R. C., and J. STEWART, 1973 The analysis of quantitative traits for simple genetic models from parental, F1 and backcross data. Genetics 73: 695-711.

FARIS, J. D., B. LADDOMADA and B. S. GILL, 1998 Molecular mapping of segregation distortion loci in Aegilops tauschii. Genetics 149: 319-327.

FEENSTRA, B., I. M. SKOVGAARD and K. W. BROMAN, 2006 Mapping quantitative trait loci by an extension of the Haley-Knott regression method using estimating equations. Genetics 173: 2269-2282.

FIGUEIREDO, M., 2003 Adaptive sparseness for supervised learning. IEEE T. Pattern. Anal. 25: 1150-1159.

FU, Y. B., and K. RITLAND, 1994 On estimating the linkage of marker genes to viability genes-controlling inbreeding depression. Theor. Appl. Genet. 88: 925-932.

GELMAN, A., 2005 Analysis of variance: why it is more important than ever. Ann. Stat. 33: 1-31.

GELMAN, A., 2006 Prior distributions for variance parameters in hierarchical models (Comment on Article by Browne and Draper). Bayesian Anal. 1: 515–534.

GELMAN, A., and D. RUBIN, 1992 Inference from iterative simulation using multiple sequences. Stat. Sci. 7: 457-472.

GEMAN, S., and D. GEMAN, 1984 Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE T. Pattern. Anal. 6: 721-741.

GEWEKE, J., 1992 Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments, in Bayesian Statistics 4, edited by J. M. BERNARDO, J. O. BERGER, A. P. DAWID and A. F. M. SMITH. Oxford Univ. Press, Oxford.

HALDANE, J. B. S., 1919 The combination of linkage values and the calculation of distances between the loci of linked factors. J. Genet. 8: 299-309.

HALEY, C. S., and S. A. KNOTT, 1992 A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69: 315-324.

HAN, L., and S. XU, 2008 A Fisher scoring algorithm for the weighted regression method of QTL mapping. Heredity 101: 453-464.

HANSEN, M. M., E. E. NIELSEN and K.-L. D. MENSBERG, 1997 The problem of sampling families rather than populations: relatedness among individuals in samples of juvenile brown trout Salmo trutta L. Mol. Ecol. 6: 469-474.

HASTINGS, W., 1970 Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57: 97-109.

HOERL, A., and R. KENNARD, 2000 Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 42: 80-86.

JIANG, C., and Z. B. ZENG, 1997 Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica 101: 47-58.

KAO, C. H., 2000 On the differences between maximum likelihood and regression interval mapping in the analysis of quantitative trait loci. Genetics 156: 855-865.

KAO, C. H., Z. B. ZENG and R. D. TEASDALE, 1999 Multiple interval mapping for quantitative trait loci. Genetics 152: 1203-1216.

KÄRKKÄINEN, K., V. KOSKI and O. SAVOLAINEN, 1996 Geographical variation in the inbreeding depression of scots pine. Evolution 50: 111-119.

KNOTT, S. A., and C. S. HALEY, 2000 Multitrait least squares for quantitative trait loci detection. Genetics 156: 899-911.

KORSGAARD, I., M. LUND, D. SORENSEN, D. GIANOLA, P. MADSEN et al., 2003 Multivariate Bayesian analysis of Gaussian, right censored Gaussian, ordered categorical and binary traits using Gibbs sampling. Genet. Sel. Evol. 35: 159-183.

KOSAMBI, D. D., 1944 The estimation of map distances from recombination values. Ann. Eugenics 12: 172-175.

LANDER, E. S., and D. BOTSTEIN, 1989 Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185-199.

LANDER, E. S., P. GREEN, J. ABRAHAMSON, A. BARLOW, M. J. DALY et al., 1987 MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1: 174-181.

LANDER, E. S., and N. J. SCHORK, 2006 Genetic dissection of complex traits. Focus 4: 442-458.

LINDLEY, D., and A. SMITH, 1972 Bayes estimates for the linear model. J. Roy. Stat. Soc. B 34: 1-41.

LORIEUX, M., B. GOFFINET, X. PERRIER, D. G. LEÓN and C. LANAUD, 1995a Maximum-likelihood models for mapping genetic markers showing segregation distortion. 1. Backcross populations. Theor. Appl. Genet. 90: 73-80.

LORIEUX, M., X. PERRIER, B. GOFFINET, C. LANAUD and D. LEÓN, 1995b Maximum-likelihood models for mapping genetic markers showing segregation distortion. 2. F2 populations. Theor. Appl. Genet. 90: 81-89.

LOUIS, T., 1982 Finding the observed information matrix when using the EM algorithm. J. Roy. Stat. Soc. B 44: 226-233.

LUO, L., and S. XU, 2003 Mapping viability loci using molecular markers. Heredity 90: 459-467.

LUO, L., Y. M. ZHANG and S. XU, 2005 A quantitative genetics model for viability selection. Heredity 94: 347-355.

MCCULLAGH, P., and J. NELDER, 1989 Generalized Linear Models. Chapman and Hall, London.

METROPOLIS, N., A. W. ROSENBLUTH, M. N. ROSENBLUTH, A. H. TELLER and E. TELLER, 1953 Equation of state calculations by fast computing machines. J. Chem. Phys. 21: 1087-1092.

NELDER, J. A., and R. MEAD, 1965 A simplex method for function minimization. Comput. J. 7: 308-313.

NELDER, J. A., and R. W. M. WEDDERBURN, 1972 Generalized linear models. J. Roy. Stat. Soc. A 135: 370-384.

PIEPHO, H. P., 2001 A quick method for computing approximate thresholds for quantitative trait loci detection. Genetics 157: 425-432.

PRITCHARD, J. K., M. STEPHENS, N. A. ROSENBERG and P. DONNELLY, 2000 Association mapping in structured populations. Am. J. Hum. Genet. 67: 170-181.

RAFTERY, A., and S. LEWIS, 1992 How many iterations in the Gibbs sampler? Bayesian Stat. 4: 763-773.

RAO, S., and S. XU, 1998 Mapping quantitative trait loci for ordered categorical traits in four-way crosses. Heredity 81: 214-224.

SILLANPÄÄ, M. J., and E. ARJAS, 1998 Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics 148: 1373-1388.

SOLLER, M., T. BRODY and A. GENIZI, 1976 On the power of experimental designs for the detection of linkage between marker loci and quantitative loci in crosses between inbred lines. Theor. Appl. Genet. 47: 35-39.

TANKSLEY, S. D., 1993 Mapping polygenes. Annu. Rev. Genet. 27: 205-233.

TER BRAAK, C. J. F., M. P. BOER and M. C. A. M. BINK, 2005 Extending Xu's Bayesian model for estimating polygenic effects using markers of the entire genome. Genetics 170: 1435-1438.

TIERNEY, L., 1994 Markov chains for exploring posterior distributions. Ann. Stat. 22: 1701-1728.

TURNPENNY, P., and S. ELLARD, 2005 Emery's elements of medical genetics. Elsevier, Churchill Livingstone.

TYCHONOFF, A. N., 1943 On the stability of inverse problems. Dokl. Akad. Nauk SSSR 39: 195-198.

VOGL, C., and S. Z. XU, 2000 Multipoint mapping of viability and segregation distorting loci using molecular markers. Genetics 155: 1439-1447.

WALD, A., 1957 Tests of statistical hypotheses concerning several parameters when the number of observations is large. Selected papers in statistics and probability 54: 323.

WANG, C., C. ZHU, H. ZHAI and J. WAN, 2005a Mapping segregation distortion loci and quantitative trait loci for spikelet sterility in rice ( Oryza sativa L.). Genet. Res. 86: 97-106.

WANG, H., Y. ZHANG, X. LI, G. L. MASINDE, S. MOHAN et al., 2005b Bayesian shrinkage estimation of quantitative trait loci parameters. Genetics 170: 465-480.

WEDDERBURN, R. W. M., 1974 Generalized linear models specified in terms of constraints. J. Roy. Stat. Soc. B 36: 449-454.

WILKS, S. S., 1938 The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann. Math. Stat. 9: 60-62.

WOLFINGER, R., and M. O'CONNELL, 1993 Generalized linear mixed models: a pseudo-likelihood approach. J. Stat. Comput. Sim. 48: 233-243.

WRIGHT, S., 1934 An analysis of variability in number of digits in an inbred strain of guinea pigs. Genetics 19: 506-536.

XU, C., Z. LI and S. XU, 2005a Joint mapping of quantitative trait loci for multiple binary characters. Genetics 169: 1045-1059.

XU, C., Y. ZHANG and S. XU, 2005b An EM algorithm for mapping quantitative resistance loci. Heredity 94: 119-128.

XU, S., 1995 A Comment on the Simple Regression Method for Interval Mapping. Genetics 141: 1657-1659.

XU, S., 1996 Mapping quantitative trait loci using four-way crosses. Genet. Res. 68: 175-181.

XU, S., 1998a Further investigation on the regression method of mapping quantitative trait loci. Heredity 80: 364-373.

XU, S., 1998b Iteratively reweighted least squares mapping of quantitative trait loci. Behav. Genet. 28: 341-355.

XU, S., 2002 QTL analysis in plants, pp. 283-310 in Quantitative Trait Loci: Methods and Protocols, edited by N. J. CAMP and A. COX. Humana Press, Totowa, NJ.

XU, S., 2003 Estimating polygenic effects using markers of the entire genome. Genetics 163: 789-801.

XU, S., 2007a Derivation of the shrinkage estimates of quantitative trait locus effects. Genetics 177: 1255-1258.

XU, S., 2007b An empirical Bayes method for estimating epistatic effects of quantitative trait loci. Biometrics 63: 513-521.

XU, S., 2008 Quantitative trait locus mapping can benefit from segregation distortion. Genetics 180: 2201-2208.

XU, S., and W. R. ATCHLEY, 1996 Mapping Quantitative Trait Loci for Complex Binary Diseases Using Line Crosses. Genetics 143: 1417-1424.

XU, S., and Z. HU, 2010a Generalized linear model for interval mapping of quantitative trait loci. Theor. Appl. Genet.: doi: 10.1007/s00122-00010-01290-00120.

XU, S., and Z. HU, 2010b Mapping quantitative trait loci using distorted markers. Int. J. Plant Genomics 2009: 11 doi:10.1155/2009/410825.

YANG, R., Q. TIAN and S. XU, 2006 Mapping quantitative trait loci for longitudinal traits in line crosses. Genetics 173: 2339-2356.

YI, N., and S. XU, 2002 Linkage analysis of quantitative trait loci in multiple line crosses. Genetica 114: 217-230.

YI, N., S. XU and D. B. ALLISON, 2003 Bayesian model choice and search strategies for mapping interacting quantitative trait Loci. Genetics 165: 867-883.

ZHU, C., and Y. M. ZHANG, 2007 An EM algorithm for mapping segregation distortion loci. BMC Genet. 8: 82.
