
Annu. Rev. Genet. 2002. 36:721–50
doi: 10.1146/annurev.genet.36.050802.093940

Copyright © 2002 by Annual Reviews. All rights reserved
First published online as a Review in Advance on September 12, 2002

ESTIMATING F-STATISTICS

B. S. Weir
Program in Statistical Genetics, Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695-7566

W. G. Hill
Institute for Cell, Animal and Population Biology, University of Edinburgh, Edinburgh EH9 3JT, United Kingdom

Key Words population structure, forensic profiles, inbreeding, relatedness

Abstract A moment estimator of θ, the coancestry coefficient for alleles within a population, was described by Weir & Cockerham in 1984 (100) and is still widely cited. The estimate is used by population geneticists to characterize population structure, by ecologists to estimate migration rates, by animal breeders to describe genetic variation, and by forensic scientists to quantify the strength of matching DNA profiles. This review extends the work of Weir & Cockerham by allowing different levels of coancestry for different populations, and by allowing non-zero coancestries between pairs of populations. All estimates are relative to the average value of θ between pairs of populations. Moment estimates for within- and between-population θ values are likely to have large sampling variances, although these may be reduced by combining information over loci. Variances also decrease with the numbers of alleles at a locus, and with the numbers of populations sampled. This review also extends the work of Weir & Cockerham by employing maximum likelihood methods under the assumption that allele frequencies follow the normal distribution over populations. For the case of equal θ values within populations and zero θ values between populations, the maximum likelihood estimate is the same as that given by Robertson & Hill in 1984 (70). The review concludes by relating functions of θ values to times of population divergence under a pure drift model.

CONTENTS

INTRODUCTION
LITERATURE REVIEW
  Estimation Strategies
  Non-Frequency Measures
  Estimation of Migration Rates
  Allocation of Individuals to Populations
  Forensic Applications
ESTIMATION OF θ
MOMENT ESTIMATES
  Overall Estimates


  Properties of Moment Estimate
  Population-Specific Estimates
NORMAL THEORY APPROACH
  Overall Estimate
  Population-Specific Estimates
NUMERICAL RESULTS
DISCUSSION
APPENDIX

INTRODUCTION

In 1984, Weir & Cockerham (100) published a set of equations for estimating the parameter $F_{ST}$ or θ that describes the genetic structure of populations. The paper is still widely cited; in the first three months of 2002 the methods it described were applied to data on ash trees (59), Barbus (86), barley (42), barnacle (22), butterfly (18), cherry (54), cod (44), cord grass (85), Drosophila (32), eelgrass (64), frog (84), housefly (23), insects (58, 103), ladybird beetle (92), mackerel (11), moose (41), mountain lion (24), pig (45), pine (66, 68), quelea (19), red drum (33), redfish (72), river otter (9), rodent (14), salmon (37), scallops (67), sea trout (94), seaweed (88), shrimp (28), snail (13), stonefly (76), sugar beet (89), trout (38, 48), tsetse fly (47), wombat (7), zooplankton (34), and humans (1, 36, 53) among other species. Population biologists, ecologists and human geneticists have a substantial interest in being able to quantify the genetic relationships among their populations; it is therefore timely to re-visit the 1984 paper they cite. It may be especially useful to allow for different values of θ in different populations.

This discussion regards population structure, or the genetic differentiation of populations within the same species, as allelic frequency variation over populations. The restriction to allele frequencies, as opposed to genotypic frequencies, carries an implicit assumption of Hardy-Weinberg equilibrium at the loci under consideration. Even if two populations are maintained under the same evolutionary conditions they will have different allele frequencies because of the stochastic nature of these forces. Different evolutionary conditions for a set of populations will increase the differentiation among them, and θ can be defined in terms of variances and covariances of allele frequencies. The magnitude of these coefficients therefore reflects the evolutionary history of the populations being studied, although the observed allele frequencies also reflect the sampling processes within each population. The various approaches to estimating θ can differ according to whether they use only expected variances and covariances of allele frequencies or the entire frequency distributions. Use of the whole distribution may appear to be better, but there is an implicit constraint on the class of evolutionary scenarios if second-moment parameters are assumed to completely characterize a distribution.

The emphasis on within-species variation, and the usual use of unlinked loci, mean that coalescent approaches for non-recombining DNA sequences and deep evolutionary divergences [e.g., (61, 93)] are not considered.


LITERATURE REVIEW

Estimation Strategies

The search for the best estimators of θ, and the evaluation of existing estimators, continues. One way of distinguishing estimators is to consider how much of the distribution of allele frequencies across populations is used. It is shown below that the variances and covariances of allele frequencies across populations depend on θ as well as on the mean frequencies. This suggests that θ can be estimated from just the first and second moments of the allele frequency distribution, and this is the essence of the method of moments used by Weir & Cockerham (100). No particular evolutionary model leading to specific values for θ is assumed. Other methods assume the form of the whole distribution, which constrains applicability to certain evolutionary scenarios. The Dirichlet distribution used by Balding & Nichols (4) and Lange (49) assumes an evolutionary equilibrium, and is appropriate under the infinite alleles mutation model. Strictly, it is the Multinomial-Dirichlet distribution that is needed. The Dirichlet distribution is not appropriate for the stepwise mutation model (35). It is not clear that there is an evolutionary model for which the normal distribution used by Smouse & Williams (81), Long (51), and Nicholson et al. (60) and employed below in this review is appropriate, but it is justified by convenience and an appeal to large sample theory.

More statistical issues were addressed by Weicker et al. (95). The estimator of θ described by Weir & Cockerham (100) used the actual sample sizes in each sample in order to reduce bias, and Weicker et al. showed that good approximations to that estimator can be found that use the average sample size. These authors also presented confidence intervals found by bootstrapping over loci, with an implicit assumption that the number of loci is not small. Questions of both bias and variance were covered by Raufaste & Bonhomme (62) for loci with multiple alleles. The simplest models assume that allele frequency distributions have the same variances and covariances for all alleles, so that θ could be estimated separately for each allele. Raufaste & Bonhomme confirmed the prediction of Weir & Cockerham (100) that their weighting was satisfactory for larger values of θ, whereas an alternative weighting of Robertson & Hill (70) was better for small θ. The Robertson & Hill approach is equivalent to the multivariate approaches (51) described below.

This review is concerned with the relationships of pairs of alleles within and between populations, but there is a further hierarchy of relationships when there are subpopulations nested within populations, sub-subpopulations nested within subpopulations, and so on (97, 105). The nested analysis of variance structure is a natural framework for the analysis of that situation, and a generic definition of population-structure parameters for a hierarchy of populations was given by Rousset (75).

The growing application of Bayesian methods to population genetics is reflected by several papers that use such methods to characterize population structure (30, 39, 40, 71). Allele frequencies are assumed to follow a Dirichlet distribution across populations, or a beta distribution in the case of loci with two alleles.


Non-Frequency Measures

Although θ is defined in terms of variances of allele frequencies, there are parallel measures that use other parameters. The fact that mutation at microsatellite markers is generally between pairs of alleles with similar numbers of repeat units suggests that allele size (i.e., number of repeats) can be used in place of allele frequency (79). Balloux & Goudet (5) and Balloux & Lugon-Moulin (6) were concerned with the case where the stepwise mutation model holds for microsatellite markers. They compared two estimators of the form $\sum_{\text{loci}} V_a / \sum_{\text{loci}} V_t$, where the variance components ($V_a$ among populations and $V_t$ total) were for allele frequencies (100) or allele sizes (57). They compared the estimators for data simulated under a finite island model and concluded that neither estimator was best overall, although the Weir-Cockerham estimator was better for higher levels of gene flow. Weir & Cockerham (100) pointed out that the performance of their estimator reflects the method they used for combining information over multiple alleles at a locus, and they predicted better behavior for higher values of θ. It is the magnitude of the parameter, rather than the forces leading to its value, that should affect the quality of the estimator in the multiple-alleles case.

Merilä & Crnokrak (56) compared estimates of θ from various genetic markers with an analogous quantity, $Q_{ST}$, defined for quantitative traits (83). The estimate is based on the genetic variances of an additive quantitative trait, $V_a$ among populations and $V_w$ within populations, and is given by $V_a/(V_a + 2V_w)$. If allele frequencies are available for the same loci that affect the quantitative trait, values of θ and $Q_{ST}$ should be equal.
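The ratio defining $Q_{ST}$ can be written as a few lines of Python. This is only an illustrative sketch: the function name `q_st` and its argument names are hypothetical, and the variance components are assumed to have been estimated elsewhere.

```python
def q_st(v_among: float, v_within: float) -> float:
    """Q_ST = Va / (Va + 2 Vw) for an additive quantitative trait.

    v_among  -- additive genetic variance among populations (Va)
    v_within -- additive genetic variance within populations (Vw)
    """
    denominator = v_among + 2.0 * v_within
    if denominator <= 0.0:
        raise ValueError("Va + 2 Vw must be positive")
    return v_among / denominator


if __name__ == "__main__":
    # Hypothetical variance components, chosen only for illustration.
    print(q_st(1.0, 2.0))
```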

Estimation of Migration Rates

Molecular ecologists, in particular, have been interested in inferring migration rates from estimates of θ, usually by employing the equilibrium result for the infinite-island migration model: θ = 1/(1 + 4Nm). Here N is the effective population size of each island and m is the migration rate between each pair of islands. Because this is a monotonic transformation of θ, it is not clear that much is gained over simply presenting θ estimates, especially as real populations are unlikely to conform to the many assumptions that lead to this result (101). Cockerham & Weir (15, 16) discussed more general relationships between θ and m. Kinnison et al. (46) fitted Nm to estimated θ values without assuming equilibrium. A recent review is given by Rousset (74), and a multivariate normal approach was adopted by Tufto et al. (87). Analogous work uses estimates of θ to estimate effective population size (8, 90).
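To make the transformation concrete, the following Python sketch converts between θ and Nm using the infinite-island equilibrium result θ = 1/(1 + 4Nm). The function names are hypothetical, and the sketch inherits every assumption behind that formula; it is not a recommendation to report Nm in place of θ.

```python
def theta_from_nm(nm: float) -> float:
    """Equilibrium theta for the infinite-island model: 1 / (1 + 4Nm)."""
    if nm < 0.0:
        raise ValueError("Nm must be non-negative")
    return 1.0 / (1.0 + 4.0 * nm)


def nm_from_theta(theta: float) -> float:
    """Invert theta = 1 / (1 + 4Nm); defined only for 0 < theta <= 1."""
    if not 0.0 < theta <= 1.0:
        raise ValueError("theta must lie in (0, 1] for this inversion")
    return (1.0 / theta - 1.0) / 4.0


if __name__ == "__main__":
    # Small theta values map to widely different Nm values, so sampling
    # error in theta is strongly amplified on the Nm scale.
    for theta in (0.01, 0.02, 0.05, 0.10):
        print(theta, nm_from_theta(theta))
```

Because the map is monotonic, ranking populations by Nm gives the same order (reversed) as ranking them by θ, which is the point made in the paragraph above.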

Allocation of Individuals to Populations

Even though the genetic variation within human populations tends to be much greater than that among populations, there is often sufficient genetic differentiation among populations, as described by θ, to allow individuals to be allocated to populations. The problem was discussed for blood-type markers by Spielman & Smouse (82) and Smouse & Spielman (80). More recent studies, primarily by forensic scientists, have used microsatellite markers (10, 25, 52, 77, 78). Cornuet et al. (17) evaluated several methods for allocating individuals by assessing their behavior as functions of θ. Dawson & Belkhir (20) assessed the quality of their Bayesian method for assigning individuals to groups within a population by estimating θ from the resulting grouped data.

Forensic Applications

Genetic profiles are now widely used for human identification in a forensic setting, and also for inferring relationships in cases of disputed parentage or the identification of remains. The key question generally involves determining the probability of a set of profiles under alternative hypotheses about the sources of those profiles. In the simplest forensic situation where the profile of a suspect matches that of a stain found at the scene of a crime, this reduces to determining the probability that an unknown person in a population has the profile given that a suspect is known to have the profile (26). When allele frequencies are assumed to have a Dirichlet distribution over populations, this probability is a function of θ (3, 4), and forensic scientists routinely estimate θ for the populations with which they work (4, 30, 102).

ESTIMATION OF θ

The parameter θ provides a description of the relationship between pairs of alleles in a population. It could be defined as the probability that the two alleles are identical by descent, but this is restrictive in that its values are then constrained to lie in the range [0,1]. A more general definition is in terms of correlation coefficients, and can be expressed in terms of indicator variables $x_{ju}$ for the jth allele in a sample:

$$x_{ju} = \begin{cases} 1 & \text{allele is of type } A_u \\ 0 & \text{otherwise.} \end{cases}$$

Then θ is the correlation between $x_{ju}$ and $x_{j'u}$ for different alleles ($j \ne j'$), where the underlying expectation process is over replicates of the population. This correlation should be written as $\theta_u$ to allow for selection or mutation differences for different allelic types, but these differences generally are assumed not to exist. Although θ is designed to capture evolutionary variation, values of its estimates also reflect the sampling process leading to the data employed. Weir (97) made the distinction between genetic and statistical sampling for these two sources of variation. Another way of expressing this concept is to say that θ measures relatedness of pairs of alleles within a population relative to the total (i.e., the expected) population and this is why Wright (104) used the notation $F_{ST}$, where S denotes subpopulation and T denotes the total population.

Under the random mating assumption, expectations of the indicator variables do not depend on the particular values of j, and

$$E(x_{ju}) = p_u$$

$$E(x_{ju}^2) = p_u$$

$$E(x_{ju} x_{j'u}) = p_u^2 + p_u(1 - p_u)\theta, \quad j \ne j',$$

where $p_u$ is the population frequency of allele $A_u$, an expected value over replicates of the population. The expression for $E(x_{ju} x_{j'u})$ can be taken as a definition of θ, and clearly $\mathrm{Var}(x_{ju}) = p_u(1 - p_u)$, $\mathrm{Cov}(x_{ju}, x_{j'u}) = p_u(1 - p_u)\theta$ so that θ is indeed a correlation coefficient over replicate populations.

It may be convenient to write the expected value of $x_{ju} x_{j'u}$ as $P_{u,u}$, the probability with which the two alleles are both of type $A_u$. However, for a population mating by random union of gametes, this quantity is the same as the homozygote frequency $P_{uu}$. For nonrandom mating populations, it is necessary to distinguish the cases where the alleles are in the same or different individuals and the indicator variables need to be defined as $x_{jku}$ for the kth allele in the jth individual. Expectations are then

$$E(x_{jku}) = p_u$$

$$E(x_{jku}^2) = p_u$$

$$E(x_{jku} x_{j'k'u}) = \begin{cases} p_u^2 + p_u(1 - p_u)F, & j = j',\ k \ne k' \\ p_u^2 + p_u(1 - p_u)\theta, & j \ne j', \end{cases}$$

where F is the total inbreeding coefficient (sometimes written as $F_{IT}$). Then $P_{uu} = p_u^2 + p_u(1 - p_u)F$ differs from $P_{u,u} = p_u^2 + p_u(1 - p_u)\theta$.

Because θ refers to variation over the evolutionary process, it cannot be estimated from a sample from a single population. Inferences made from a single sample are for within-population parameters such as the within-population inbreeding coefficient f, or $F_{IS}$. This quantity satisfies f = (F − θ)/(1 − θ), and it describes the relationship of pairs of alleles within individuals relative to that between individuals within the same population. There is generally little interest in within-population analogs of θ, as the point of estimating θ is to make inferences about evolutionary processes.
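The relation f = (F − θ)/(1 − θ) is equivalent to the product form (1 − F) = (1 − f)(1 − θ). The short Python sketch below evaluates f and checks that identity numerically; the function name and the example values are hypothetical and serve only as illustration.

```python
def f_within(f_total: float, theta: float) -> float:
    """Within-population inbreeding coefficient f = (F - theta) / (1 - theta)."""
    if theta >= 1.0:
        raise ValueError("theta must be less than 1")
    return (f_total - theta) / (1.0 - theta)


if __name__ == "__main__":
    F, theta = 0.19, 0.10  # illustrative values, not estimates from data
    f = f_within(F, theta)
    # Both sides of (1 - F) = (1 - f)(1 - theta) should agree.
    print(f, 1.0 - F, (1.0 - f) * (1.0 - theta))
```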

MOMENT ESTIMATES

With the assumption of no local inbreeding, $F_{IS} = 0$, $F_{IT} = F_{ST} = \theta$, estimation of θ makes use only of sample allele frequencies, although these need to be inferred from sample genotype frequencies. Second moments of allele frequencies can be expressed in terms of θ, suggesting that estimators can be constructed from sample second moments.

Overall Estimates

The variation described by θ is estimated in practice from allele frequency variation among different populations, and it has been customary to regard extant populations as providing the replicates inherent in its definition. This carries the assumption that each sampled population has the same θ value, and this will now be relaxed. To distinguish the populations sampled, an index i is added to the indicator variables for the ith sample. A general set of expectations for the jth allele in the ith sample is

$$E(x_{iju}) = p_u$$

$$E(x_{iju}^2) = p_u$$

$$E(x_{iju} x_{i'j'u}) = \begin{cases} p_u^2 + p_u(1 - p_u)\theta_i, & i = i',\ j \ne j' \\ p_u^2 + p_u(1 - p_u)\theta_{ii'}, & i \ne i'. \end{cases}$$

Each population is assumed to have the same (expected) allele frequency. Weir & Cockerham (100) assumed that $\theta_{ii'} = 0$ for all $i' \ne i$. Later they relaxed those assumptions (15, 98).

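Sample allele frequencies are denoted by tildes, and the average frequency over samples is denoted by a bar. If there are $n_i$ alleles sampled from the ith of r populations:

$$\tilde{p}_{iu} = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{iju}$$

$$\bar{p}_u = \frac{1}{\sum_i n_i} \sum_{i=1}^{r} n_i \tilde{p}_{iu},$$

so that

$$E(\tilde{p}_{iu}) = p_u$$

$$E(\bar{p}_u) = p_u$$

$$\mathrm{Var}(\tilde{p}_{iu}) = \frac{1}{n_i}\, p_u(1 - p_u)[1 + (n_i - 1)\theta_i] \tag{1}$$

$$\mathrm{Cov}(\tilde{p}_{iu}, \tilde{p}_{i'u}) = p_u(1 - p_u)\theta_{ii'}. \tag{2}$$

Subsequent developments are simplified with additional notation:

$$\pi_u = p_u(1 - p_u)$$

$$\phi_i = \frac{1}{n_i}[1 + (n_i - 1)\theta_i]$$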


Equations 1, 2 can be taken as defining the θ parameters and therefore can serve as a starting point. They could be derived by considering two sets of expectations, one within (W) and one among (A) populations. If $p_{iu}$ is the frequency of allele $A_u$ in the ith population, the usual multinomial distribution gives:

$$\left.\begin{aligned} E_W(\tilde{p}_{iu}) &= p_{iu} \\ \mathrm{Var}_W(\tilde{p}_{iu}) &= \frac{1}{n_i}\, p_{iu}(1 - p_{iu}) \end{aligned}\right\} \tag{3}$$

Among populations, the moments are

$$\left.\begin{aligned} E_A(p_{iu}) &= p_u \\ \mathrm{Var}_A(p_{iu}) &= p_u(1 - p_u)\theta_i \end{aligned}\right\} \tag{4}$$

which introduce the θ's. The method of moments for estimating θ makes no more statements concerning the distribution of the $p_{iu}$'s about $p_u$. Balding & Nichols (3, 4) assumed a Dirichlet distribution with parameters $(1 - \theta_i)p_u/\theta_i$ for $A_u$ which also gives Equations 4, as does the normal distribution $N(p_u, \pi_u\theta_i)$ assumed by Nicholson et al. (60). Combining Equations 3 and 4 leads to Equations 1 and 2, emphasizing that expectations in such equations are total (within and among populations). Foulley & Hill (31) contrasted the use of the normal and Dirichlet distributions.
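The step from Equations 3 and 4 to Equation 1 is the law of total variance: the total variance of a sample frequency is the among-population expectation of the within-population variance plus the among-population variance of the within-population mean. The Python sketch below checks this numerically for the two-allele case, where the Dirichlet distribution with parameters $(1 - \theta_i)p_u/\theta_i$ reduces to a beta distribution. The function names and the numerical values are hypothetical, and only exact beta moments are used, so no simulation is involved.

```python
def beta_parameters(p, theta):
    """Two-allele Dirichlet (beta) parameters (1-theta)p/theta and (1-theta)(1-p)/theta."""
    scale = (1.0 - theta) / theta
    return scale * p, scale * (1.0 - p)


def beta_mean_var(a, b):
    """Exact mean and variance of a Beta(a, b) distribution."""
    total = a + b
    return a / total, a * b / (total ** 2 * (total + 1.0))


def total_variance(p, theta, n):
    """Combine Equations 3 and 4 by the law of total variance."""
    a, b = beta_parameters(p, theta)
    mean_a, var_a = beta_mean_var(a, b)        # among-population moments (Equations 4)
    second_moment = var_a + mean_a ** 2        # E_A[p_i^2]
    expected_within = (mean_a - second_moment) / n  # E_A[p_i (1 - p_i)] / n (Equations 3)
    return expected_within + var_a             # E_A[Var_W] + Var_A[E_W]


def equation_1(p, theta, n):
    """Var of a sample allele frequency as given by Equation 1."""
    return p * (1.0 - p) * (1.0 + (n - 1) * theta) / n


if __name__ == "__main__":
    p, theta, n = 0.3, 0.05, 20  # illustrative values only
    print(total_variance(p, theta, n), equation_1(p, theta, n))
```

The two printed values should agree up to floating-point rounding, and the same agreement would hold for any among-population distribution with the moments in Equations 4.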

When it is assumed that $\theta_i = \theta$ for all i and $\theta_{ii'} = 0$ for all $i \ne i'$, Weir & Cockerham (100) note that there are two unknown quantities, $\pi_u$ and θ, and define two mean squares. In the notation of Weir (97):

$$\mathrm{MSP}_u = \frac{1}{r - 1} \sum_{i=1}^{r} n_i (\tilde{p}_{iu} - \bar{p}_u)^2$$

$$\mathrm{MSG}_u = \frac{1}{\sum_{i=1}^{r} (n_i - 1)} \sum_{i=1}^{r} n_i \tilde{p}_{iu}(1 - \tilde{p}_{iu}).$$

The average allele frequency $\bar{p}_u$ includes sample size weights. An alternative is to use an unweighted average $p_u^* = \sum_{i=1}^{r} \tilde{p}_{iu}/r$. Estimates based on $\bar{p}_u$ or $p_u^*$ will be better when θ or $(1 - \theta)/n_i$, respectively, are larger. Following Robertson (69), a weighted estimate could be obtained from the two.

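Under the general model, the mean squares have expected values

$$E(\mathrm{MSP}_u) = \frac{\pi_u}{r - 1} \left[ \sum_{i=1}^{r} n_{ic}\phi_i - \frac{1}{\sum_{i=1}^{r} n_i} \sum_{\substack{i,i'=1 \\ i \ne i'}}^{r} n_i n_{i'} \theta_{ii'} \right]$$

$$E(\mathrm{MSG}_u) = \frac{\pi_u}{\sum_{i=1}^{r} (n_i - 1)} \left( \sum_{i=1}^{r} n_i - \sum_{i=1}^{r} n_i \phi_i \right),$$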


where $n_{ic} = n_i - n_i^2 / \sum_{i=1}^{r} n_i$. There are two special cases that lead to simplification.

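In the special case that $\theta_i = \theta$ for all i and $\theta_{ii'} = 0$ for all $i \ne i'$,

$$E(\mathrm{MSP}_u) = \pi_u[(1 - \theta) + n_c\theta]$$

$$E(\mathrm{MSG}_u) = \pi_u(1 - \theta),$$

where

$$n_c = \frac{1}{r - 1} \left( \sum_{i=1}^{r} n_i - \frac{\sum_{i=1}^{r} n_i^2}{\sum_{i=1}^{r} n_i} \right) = \frac{1}{r - 1} \sum_{i=1}^{r} n_{ic}.$$

This led Weir & Cockerham (100) to their moment estimator of θ:

$$\hat{\theta}_{Mu} = \frac{\mathrm{MSP}_u - \mathrm{MSG}_u}{\mathrm{MSP}_u + (n_c - 1)\mathrm{MSG}_u}.$$

To the extent that the expected value of this quantity is the ratio of expectations of its numerator and denominator, it is unbiased for θ.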
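The single-allele estimator can be sketched in Python directly from the definitions of $\mathrm{MSP}_u$, $\mathrm{MSG}_u$ and $n_c$. This is a minimal illustration, not a reference implementation: the function name, argument layout and example frequencies are all hypothetical, and inputs are assumed to be sample allele frequencies together with the numbers of alleles sampled per population.

```python
def theta_single_allele(p_tilde, n):
    """Moment estimate of theta for one allele.

    p_tilde[i] -- sample frequency of the allele in population i
    n[i]       -- number of alleles sampled from population i
    """
    r = len(n)
    if r < 2 or len(p_tilde) != r:
        raise ValueError("need matching inputs for at least two populations")
    total = sum(n)
    # Sample-size weighted average frequency (p-bar).
    p_bar = sum(ni * pi for ni, pi in zip(n, p_tilde)) / total
    # Mean square among populations.
    msp = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p_tilde)) / (r - 1)
    # Mean square within populations.
    msg = sum(ni * pi * (1.0 - pi) for ni, pi in zip(n, p_tilde)) / (total - r)
    n_c = (total - sum(ni * ni for ni in n) / total) / (r - 1)
    denominator = msp + (n_c - 1.0) * msg
    if denominator == 0.0:
        raise ValueError("allele shows no variation; estimate is undefined")
    return (msp - msg) / denominator


if __name__ == "__main__":
    # Two hypothetical populations with 10 sampled alleles each.
    print(theta_single_allele([0.2, 0.6], [10, 10]))
    # Identical sample frequencies give a negative estimate, -1/(n_c - 1).
    print(theta_single_allele([0.5, 0.5], [10, 10]))
```

The second call illustrates that the moment estimate is not constrained to [0, 1], in line with the correlation-based definition of θ given earlier.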

In the special case of balanced data, $n_i = n$ for all i,

$$E(\mathrm{MSP}_u) = \pi_u[(1 - \theta_w) + n(\theta_w - \theta_a)]$$

$$E(\mathrm{MSG}_u) = \pi_u(1 - \theta_w),$$

where

$$\theta_w = \frac{1}{r} \sum_{i=1}^{r} \theta_i$$

$$\theta_a = \frac{1}{r(r - 1)} \sum_{\substack{i,i'=1 \\ i \ne i'}}^{r} \theta_{ii'},$$

so that the moment estimate, now written as $\hat{\beta}$, is providing an estimate of $(\theta_w - \theta_a)/(1 - \theta_a)$. This result should also hold if all of the sample sizes are large and approximately equal. In general, however, the usual moment estimate is of a complex function of the $\theta_i$'s and $\theta_{ii'}$'s. Alternative statistics lead to estimates of weighted averages of $\theta_i$'s and $\theta_{ii'}$'s, as shown below.
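The claim that the usual moment estimate targets $(\theta_w - \theta_a)/(1 - \theta_a)$ for balanced data can be checked by plugging the general-model expectations of the two mean squares into the estimator's ratio. The Python sketch below does this; the function names and the θ values are hypothetical, and the computation uses expectations only, so it says nothing about sampling variability.

```python
def expected_mean_squares(n, theta_within, theta_between, pi_u=1.0):
    """Expected MSP and MSG under the general model.

    n[i]                -- number of alleles sampled from population i
    theta_within[i]     -- theta_i
    theta_between[i][k] -- theta_{ik} for i != k (diagonal entries are ignored)
    pi_u                -- p_u (1 - p_u); it cancels in the ratio below
    """
    r = len(n)
    total = sum(n)
    phi = [(1.0 + (ni - 1) * ti) / ni for ni, ti in zip(n, theta_within)]
    n_ic = [ni - ni * ni / total for ni in n]
    cross = sum(
        n[i] * n[k] * theta_between[i][k]
        for i in range(r) for k in range(r) if i != k
    )
    e_msp = pi_u / (r - 1) * (sum(w * f for w, f in zip(n_ic, phi)) - cross / total)
    e_msg = pi_u / (total - r) * (total - sum(ni * f for ni, f in zip(n, phi)))
    return e_msp, e_msg


def ratio_of_expectations(n, theta_within, theta_between):
    """(E[MSP] - E[MSG]) / (E[MSP] + (n_c - 1) E[MSG])."""
    r = len(n)
    total = sum(n)
    n_c = (total - sum(ni * ni for ni in n) / total) / (r - 1)
    e_msp, e_msg = expected_mean_squares(n, theta_within, theta_between)
    return (e_msp - e_msg) / (e_msp + (n_c - 1.0) * e_msg)


if __name__ == "__main__":
    n = [20, 20, 20]                      # balanced samples
    theta_w = [0.02, 0.05, 0.11]          # hypothetical within-population values
    theta_b = [[0.0, 0.01, 0.01], [0.01, 0.0, 0.01], [0.01, 0.01, 0.0]]
    mean_w = sum(theta_w) / len(theta_w)
    mean_a = 0.01
    print(ratio_of_expectations(n, theta_w, theta_b), (mean_w - mean_a) / (1.0 - mean_a))
```

With unequal sample sizes the two printed quantities would generally differ, which is the "complex function" mentioned in the text.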

Under the assumption that the same value of θ applies to each allele at a locus, Weir & Cockerham (100) combined information over alleles by summing numerator and denominator separately

$$\hat{\theta}_M = \frac{\sum_{u=1}^{m} (\mathrm{MSP}_u - \mathrm{MSG}_u)}{\sum_{u=1}^{m} [\mathrm{MSP}_u + (n_c - 1)\mathrm{MSG}_u]}, \tag{5}$$


and they found by simulation that this method of weighting over alleles generally provides low bias and variance. No explicit account is taken of the correlation among frequencies of different alleles. If data are collected from a series of L loci, and if θ is assumed to apply equally to each locus, then an obvious extension is to add mean squares over loci:

$$\hat{\theta}_M = \frac{\sum_{l=1}^{L} \sum_{u=1}^{m_l} (\mathrm{MSP}_{lu} - \mathrm{MSG}_{lu})}{\sum_{l=1}^{L} \sum_{u=1}^{m_l} [\mathrm{MSP}_{lu} + (n_c - 1)\mathrm{MSG}_{lu}]}.$$
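The ratio-of-sums form can be sketched in Python starting from allele counts. The data layout (`loci[l][i][u]` is the count of allele u in the sample from population i at locus l) and the function names are hypothetical. One assumption goes beyond the formula as written: $n_c$ is recomputed for each locus from that locus's sample sizes, which coincides with a common $n_c$ when every locus has the same sample sizes.

```python
def locus_components(counts):
    """Return (numerator, denominator) summed over alleles at one locus.

    counts[i][u] -- number of copies of allele u sampled from population i
    """
    r = len(counts)
    m = len(counts[0])
    if r < 2 or any(len(row) != m for row in counts):
        raise ValueError("need a rectangular table with at least two populations")
    n = [sum(row) for row in counts]
    total = sum(n)
    n_c = (total - sum(ni * ni for ni in n) / total) / (r - 1)
    numerator = denominator = 0.0
    for u in range(m):
        p = [counts[i][u] / n[i] for i in range(r)]
        p_bar = sum(counts[i][u] for i in range(r)) / total
        msp = sum(n[i] * (p[i] - p_bar) ** 2 for i in range(r)) / (r - 1)
        msg = sum(n[i] * p[i] * (1.0 - p[i]) for i in range(r)) / (total - r)
        numerator += msp - msg
        denominator += msp + (n_c - 1.0) * msg
    return numerator, denominator


def theta_multilocus(loci):
    """Sum numerators and denominators over alleles and loci, then take the ratio."""
    parts = [locus_components(counts) for counts in loci]
    return sum(a for a, _ in parts) / sum(b for _, b in parts)


if __name__ == "__main__":
    # Two hypothetical diallelic loci, two populations, 10 alleles per sample.
    locus_1 = [[2, 8], [6, 4]]
    locus_2 = [[5, 5], [5, 5]]
    print(theta_multilocus([locus_1]))
    print(theta_multilocus([locus_1, locus_2]))
```

Summing before dividing is a design choice inherited from the formula: loci or alleles with little variation contribute small terms to both sums rather than an unstable ratio of their own.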

Properties of Moment Estimate

Because of the difficulty in describing the properties of ratio estimates, Dodds (21) and Weir (97) suggested numerical resampling for obtaining the sampling distribution of $\hat{\theta}_M$. Resampling over populations would change the structure of the data, but resampling over loci would exploit the assumption that (unlinked) loci provide independent replicates of the evolutionary process. Resampling was also used by Raymond & Rousset (63). Jiang (43) used a Taylor series expansion and approximate higher-order moments of sample allele frequencies to obtain the mean and variance of $\hat{\theta}_M$. Li (50) appealed to asymptotic theory to show that the mean square $\mathrm{MSP}_u$ has a scaled chi-square distribution in the two-allele case,

$$\mathrm{MSP}_u \sim \pi_u[1 + (n_c - 1)\theta]\, \frac{\chi^2_{(r-1)}}{r - 1},$$

and that the mean square $\mathrm{MSG}_u$ tends to a constant value of $\pi_u(1 - \theta)$. This assumes that the $\theta_i$'s are equal and that the $\theta_{ii'}$'s are zero. These results allowed her to derive expressions for the mean and variance of $\hat{\theta}_M$:

$$E(\hat{\theta}_M) = \theta - \frac{2(1 - \theta)}{r - 1} \left( \frac{1 + (n_c - 1)\theta}{n_c} \right)^2$$

$$\mathrm{Var}(\hat{\theta}_M) = \frac{2(1 - \theta)^2}{r - 1} \left( \frac{1 + (n_c - 1)\theta}{n_c} \right)^2.$$

The variance formula differs slightly from the variance of the intraclass correlationgiven by Fisher (29), but is equal to that result for large sample sizes.
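The approximate mean and variance attributed to Li (50) are easy to evaluate for planning purposes. The Python sketch below does so; the function name and the example values of θ, r and $n_c$ are hypothetical, and the formulas apply only under the stated conditions (two alleles, equal $\theta_i$, zero $\theta_{ii'}$).

```python
def li_mean_and_variance(theta, r, n_c):
    """Approximate mean and variance of the moment estimate for a diallelic locus.

    theta -- common coancestry value
    r     -- number of populations sampled
    n_c   -- sample-size coefficient defined with the moment estimator
    """
    if r < 2 or n_c <= 0:
        raise ValueError("need r >= 2 and n_c > 0")
    scale = ((1.0 + (n_c - 1.0) * theta) / n_c) ** 2
    mean = theta - 2.0 * (1.0 - theta) / (r - 1) * scale
    variance = 2.0 * (1.0 - theta) ** 2 / (r - 1) * scale
    return mean, variance


if __name__ == "__main__":
    # The variance shrinks as the number of populations grows.
    for r in (3, 11, 51):
        print(r, li_mean_and_variance(0.1, r, 10.0))
```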

Population-Specific Estimates

If independent populations have different values of $\theta_i$, maybe reflecting the differences in population size or differences in environmental influences, there is the danger of having an over-parameterized model. There are $r$ independent sample allele frequencies $\tilde{p}_{iu}$ for allele $A_u$. In the two-allele case, this means $r$ observations but $(r + 1)$ parameters: the frequency $p_u$ and the $r$ values of $\theta_i$. It is possible to construct estimates, but they will not be unique. For $m > 2$ alleles at a locus, however, there are more [$r(m - 1)$] independent sample allele frequencies than


there are parameters: $(m - 1)$ parameters $p_u$ plus $r$ parameters $\theta_i$. Similarly, for $L > 1$ diallelic loci, there are more observations ($rL$ allele frequencies) than there are parameters ($L$ allele frequencies and $r$ $\theta$'s). The following discussion assumes that there are at least as many allele frequencies in the data as there are parameters to be estimated.

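If the terms in the mean square within populations are weighted by $n_{ic}$ instead of $n_i$, the sums of squares corresponding to MSP and MSG have expectations

$$E\left[\sum_{i=1}^{r} n_i(\tilde{p}_{iu} - \bar{p}_u)^2\right] = \pi_u\left[\sum_{i=1}^{r} n_{ic}\phi_i - \frac{1}{\sum_{i=1}^{r} n_i}\sum_{\substack{i,i'=1\\ i\neq i'}}^{r} n_i n_{i'}\theta_{ii'}\right]$$

$$E\left[\sum_{i=1}^{r} n_{ic}\,\tilde{p}_{iu}(1 - \tilde{p}_{iu})\right] = \pi_u\left[\sum_{i=1}^{r} n_{ic} - \sum_{i=1}^{r} n_{ic}\phi_i\right],$$

suggesting that, for independent populations ($\theta_{ii'} = 0$), $\pi_u$ can be estimated as

$$\hat{\pi}_u = \frac{\sum_{i=1}^{r} n_i(\tilde{p}_{iu} - \bar{p}_u)^2 + \sum_{i=1}^{r} n_{ic}\,\tilde{p}_{iu}(1 - \tilde{p}_{iu})}{\sum_{i=1}^{r} n_{ic}}.$$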

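Therefore, from the relationship

$$E\left[\sum_{u=1}^{m}\tilde{p}_{iu}(1 - \tilde{p}_{iu})\right] = \left(\sum_{u=1}^{m}\pi_u\right)(1 - \phi_i),$$

a moment estimate of $\phi_i$ for independent populations is

$$\hat{\phi}_i = 1 - \frac{\left(\sum_{i=1}^{r} n_{ic}\right)\sum_{u=1}^{m}\tilde{p}_{iu}(1 - \tilde{p}_{iu})}{\sum_{u=1}^{m}\sum_{i=1}^{r}\left[n_i(\tilde{p}_{iu} - \bar{p}_u)^2 + n_{ic}\,\tilde{p}_{iu}(1 - \tilde{p}_{iu})\right]}. \tag{6}$$

The estimate of the mean of the $\phi_i$'s is

$$\hat{\bar{\phi}} = 1 - \frac{\left(\sum_{i=1}^{r} n_{ic}\right)\sum_{u=1}^{m}\sum_{i=1}^{r}\tilde{p}_{iu}(1 - \tilde{p}_{iu})}{r\sum_{u=1}^{m}\sum_{i=1}^{r}\left[n_i(\tilde{p}_{iu} - \bar{p}_u)^2 + n_{ic}\,\tilde{p}_{iu}(1 - \tilde{p}_{iu})\right]}.$$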


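When the sample sizes are equal, $n_i = n$ for all $i$,

$$\hat{\phi}_i = 1 - \frac{\sum_{u=1}^{m}\tilde{p}_{iu}(1 - \tilde{p}_{iu})}{\sum_{u=1}^{m}\left[\frac{1}{r-1}\sum_{i=1}^{r}(\tilde{p}_{iu} - \bar{p}_u)^2 + \frac{1}{r}\sum_{i=1}^{r}\tilde{p}_{iu}(1 - \tilde{p}_{iu})\right]}.$$

Further, when the number $r$ of samples is large

$$\hat{\phi}_i \approx 1 - \frac{\sum_{u=1}^{m}\tilde{p}_{iu}(1 - \tilde{p}_{iu})}{\sum_{u=1}^{m}\bar{p}_u(1 - \bar{p}_u)}$$

$$\hat{\bar{\phi}} \approx \frac{\sum_{u=1}^{m}\sum_{i=1}^{r}(\tilde{p}_{iu} - \bar{p}_u)^2}{r\sum_{u=1}^{m}\bar{p}_u(1 - \bar{p}_u)}.$$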

For each independent locus indexed by $l = 1, 2, \ldots, L$, the estimate of $\phi_i$ may be written as $1 - x_{li}/y_l$ where

$$x_{li} = \sum_{u=1}^{m_l}\tilde{p}_{liu}(1 - \tilde{p}_{liu})$$

$$y_l = \frac{1}{\sum_{i=1}^{r} n_{lic}}\sum_{u=1}^{m_l}\sum_{i=1}^{r}\left[n_{li}(\tilde{p}_{liu} - \bar{p}_{lu})^2 + n_{lic}\,\tilde{p}_{liu}(1 - \tilde{p}_{liu})\right],$$

showing the addition of locus subscripts on sample sizes and allele frequencies. These terms have expectations

$$E(x_{li}) = (1 - \phi_i)\sum_{u=1}^{m_l}\pi_{lu}$$

$$E(y_l) = \sum_{u=1}^{m_l}\pi_{lu}.$$

Information from loci with the same values of $\phi_i$ can be combined as for the earlier Weir & Cockerham estimator (100): $\hat{\phi}_i = 1 - \left(\sum_l x_{li}\right)/\left(\sum_l y_l\right)$. The sampling distribution of this combined estimate may be found by bootstrapping over loci if $L$ is not small.
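The combination over loci and the bootstrap can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' software: the function names and the example frequencies are hypothetical, and the sketch assumes that $n_{ic} = n_i - n_i^2/\sum_i n_i$ and that $\bar{p}_u$ is the sample-size-weighted mean frequency.

```python
import random


def locus_terms(freqs, sizes):
    """Return (x, y) for one locus.

    freqs[i][u] is the sample frequency of allele u in population i and
    sizes[i] is n_i. x[i] is sum_u p_iu (1 - p_iu); y is the pooled term y_l.
    """
    r, m = len(freqs), len(freqs[0])
    total = sum(sizes)
    # Assumed definition: n_ic = n_i - n_i^2 / sum_i n_i.
    nic = [n - n * n / total for n in sizes]
    # Weighted mean frequency of each allele over populations.
    pbar = [sum(sizes[i] * freqs[i][u] for i in range(r)) / total
            for u in range(m)]
    x = [sum(p * (1 - p) for p in freqs[i]) for i in range(r)]
    between = sum(sizes[i] * (freqs[i][u] - pbar[u]) ** 2
                  for i in range(r) for u in range(m))
    within = sum(nic[i] * x[i] for i in range(r))
    return x, (between + within) / sum(nic)


def phi_hat(loci, pop):
    """Combined estimate 1 - (sum_l x_li) / (sum_l y_l) for population pop."""
    terms = [locus_terms(freqs, sizes) for freqs, sizes in loci]
    return 1 - sum(x[pop] for x, _ in terms) / sum(y for _, y in terms)


def bootstrap_phi(loci, pop, reps=1000, seed=0):
    """Resample loci with replacement and re-estimate phi_i each time."""
    rng = random.Random(seed)
    return [phi_hat([rng.choice(loci) for _ in loci], pop)
            for _ in range(reps)]


# Hypothetical data: three populations, three loci, 50 alleles per sample.
example_loci = [
    ([[0.20, 0.80], [0.50, 0.50], [0.35, 0.65]], [50, 50, 50]),
    ([[0.10, 0.60, 0.30], [0.25, 0.45, 0.30], [0.20, 0.50, 0.30]], [50, 50, 50]),
    ([[0.40, 0.60], [0.45, 0.55], [0.30, 0.70]], [50, 50, 50]),
]
estimate = phi_hat(example_loci, 0)
replicates = bootstrap_phi(example_loci, 0, reps=200)
```

With only three loci the bootstrap distribution is of little practical use, which is the point of the caveat that $L$ should not be small.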

Nicholson et al. (60) were especially interested in SNP loci, which generally have only two alleles. In that case, the two summands in the sums over alleles $u$


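are the same and only one needs to be used. If $\tilde{p}_i$ is the frequency of one of the alleles at a locus, the equal sample size estimate is

$$\hat{\phi}_i = 1 - \frac{\tilde{p}_i(1 - \tilde{p}_i)}{\frac{1}{r-1}\sum_{i=1}^{r}(\tilde{p}_i - \bar{p})^2 + \frac{1}{r}\sum_{i=1}^{r}\tilde{p}_i(1 - \tilde{p}_i)},$$

and, for a large number of samples,

$$\hat{\phi}_i \approx 1 - \frac{\tilde{p}_i(1 - \tilde{p}_i)}{\bar{p}(1 - \bar{p})}.$$

Averaging over samples recovers the “classical” estimate (27)

$$\hat{\bar{\phi}} \approx \frac{\sum_{i=1}^{r}(\tilde{p}_i - \bar{p})^2}{r\,\bar{p}(1 - \bar{p})}.$$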

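Care is needed in interpreting the values of the estimates $\hat{\phi}_i$, as differences may reflect differences among the sample sizes $n_i$ or among the coefficients $\theta_i$, or both.

When the populations are not independent, $\theta_{ii'} \neq 0$, the estimate of $\phi_i$ shown in Equation 6 is actually estimating $(\phi_i - \theta_A)/(1 - \theta_A)$, where

$$\theta_A = \frac{\sum_{\substack{i,i'=1\\ i\neq i'}}^{r} n_i n_{i'}\theta_{ii'}}{\sum_{\substack{i,i'=1\\ i\neq i'}}^{r} n_i n_{i'}}.$$

The weighted average $\theta_A$ reduces to the simple arithmetic mean, $\theta_a$, of the $\theta_{ii'}$'s when the sample sizes are equal. An estimate of $\beta_{ii'} = (\theta_{ii'} - \theta_A)/(1 - \theta_A)$ is given by

$$\beta_{ii'} = \frac{\theta_{ii'} - \theta_A}{1 - \theta_A} \;\hat{=}\; 1 - \frac{\left(\sum_{i=1}^{r} n_{ic}\right)\sum_{u=1}^{m}\left[\tilde{p}_{iu}(1 - \tilde{p}_{i'u}) + \tilde{p}_{i'u}(1 - \tilde{p}_{iu})\right]}{2\sum_{u=1}^{m}\sum_{i=1}^{r}\left[n_i(\tilde{p}_{iu} - \bar{p}_u)^2 + n_{ic}\,\tilde{p}_{iu}(1 - \tilde{p}_{iu})\right]}, \tag{7}$$

where $\hat{=}$ denotes “is estimated by.” These estimates sum to zero. In the case of only two samples, this estimate is zero as required. The corresponding single-population equation is

$$\beta_i = \frac{\theta_i - \theta_A}{1 - \theta_A} \;\hat{=}\; 1 - \frac{\left(\sum_{i=1}^{r} n_{ic}\right)\sum_{u=1}^{m}\frac{n_i}{n_i - 1}\,\tilde{p}_{iu}(1 - \tilde{p}_{iu})}{\sum_{u=1}^{m}\sum_{i=1}^{r}\left[n_i(\tilde{p}_{iu} - \bar{p}_u)^2 + n_{ic}\,\tilde{p}_{iu}(1 - \tilde{p}_{iu})\right]}. \tag{8}$$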
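Equations 7 and 8 share a denominator, so both can be computed from one pooled term. The following Python sketch is illustrative only: the function names and data are hypothetical, and it assumes $n_{ic} = n_i - n_i^2/\sum_i n_i$ with $\bar{p}_u$ the sample-size-weighted mean frequency.

```python
def pooled_terms(freqs, sizes):
    """Return (sum_i n_ic, shared denominator of Equations 7 and 8)."""
    r, m = len(freqs), len(freqs[0])
    total = sum(sizes)
    # Assumed definition: n_ic = n_i - n_i^2 / sum_i n_i.
    nic = [n - n * n / total for n in sizes]
    pbar = [sum(sizes[i] * freqs[i][u] for i in range(r)) / total
            for u in range(m)]
    denom = sum(sizes[i] * (freqs[i][u] - pbar[u]) ** 2
                + nic[i] * freqs[i][u] * (1 - freqs[i][u])
                for i in range(r) for u in range(m))
    return sum(nic), denom


def beta_pair(freqs, sizes, i, j):
    """Between-population estimate for populations i and j (Equation 7)."""
    sum_nic, denom = pooled_terms(freqs, sizes)
    cross = sum(pi * (1 - pj) + pj * (1 - pi)
                for pi, pj in zip(freqs[i], freqs[j]))
    return 1 - sum_nic * cross / (2 * denom)


def beta_single(freqs, sizes, i):
    """Within-population estimate for population i (Equation 8)."""
    sum_nic, denom = pooled_terms(freqs, sizes)
    n = sizes[i]
    het = sum(p * (1 - p) for p in freqs[i])
    return 1 - sum_nic * (n / (n - 1)) * het / denom


# Hypothetical data: three samples of unequal size at a three-allele locus.
freqs = [[0.20, 0.50, 0.30], [0.40, 0.40, 0.20], [0.30, 0.45, 0.25]]
sizes = [60, 140, 100]
within = [beta_single(freqs, sizes, i) for i in range(3)]
between = {(i, j): beta_pair(freqs, sizes, i, j)
           for i in range(3) for j in range(i + 1, 3)}
```

Restricting the data to two samples makes `beta_pair` return zero up to rounding, in line with the remark that follows Equation 7.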


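This is to replace Equation 6, although the difference between them is trivial for large sample sizes.

By analogy to $\theta_A$, the weighted average $\theta_W$ can be defined as

$$\theta_W = \frac{\sum_{i=1}^{r} n_i\theta_i}{\sum_{i=1}^{r} n_i},$$

which reduces to the simple arithmetic average, $\theta_w$, when the sample sizes are equal. The quantity $\beta_W = (\theta_W - \theta_A)/(1 - \theta_A)$ can be estimated as

$$\hat{\beta}_W = 1 - \frac{\left(\sum_{i=1}^{r} n_{ic}\right)\sum_{u=1}^{m}\sum_{i=1}^{r}\frac{n_i^2}{n_i - 1}\,\tilde{p}_{iu}(1 - \tilde{p}_{iu})}{\left(\sum_{i=1}^{r} n_i\right)\sum_{u=1}^{m}\sum_{i=1}^{r}\left[n_i(\tilde{p}_{iu} - \bar{p}_u)^2 + n_{ic}\,\tilde{p}_{iu}(1 - \tilde{p}_{iu})\right]}. \tag{9}$$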

For equal sample sizes this reduces to the estimator in Equation 5 given by Weir & Cockerham (100). Because it serves as an estimator in the case of unequal sample sizes, however, it may be preferred to the Weir & Cockerham estimator.

There are two unsatisfactory aspects of this development. In the first place, it is seen that the quantities being estimated depend on the sample sizes, unless those sizes are equal. A more serious problem is the involvement of the average between-population relatedness quantity $\theta_A$. Unless there are grounds for assuming this quantity is zero, all estimates are relative to that value. This does not prevent a comparison among the values of $\theta_i$ or $\theta_{ii'}$, but it does prevent their absolute value being estimated. There is the same need for a reference population when inbreeding coefficients $F_{IT}$ are to be estimated. The issue is similar to that faced in the reconstruction of phylogenetic trees. Trees cannot be rooted unless there is information from an outgroup.

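Finally, for large numbers of large samples,

$$\frac{\theta_i - \theta_A}{1 - \theta_A} \;\hat{=}\; 1 - \frac{\sum_{u=1}^{m}\tilde{p}_{iu}(1 - \tilde{p}_{iu})}{\sum_{u=1}^{m}\bar{p}_u(1 - \bar{p}_u)} \tag{10}$$

$$\frac{\theta_{ii'} - \theta_A}{1 - \theta_A} \;\hat{=}\; 1 - \frac{\sum_{u=1}^{m}\left[\tilde{p}_{iu}(1 - \tilde{p}_{i'u}) + \tilde{p}_{i'u}(1 - \tilde{p}_{iu})\right]}{2\sum_{u=1}^{m}\bar{p}_u(1 - \bar{p}_u)}. \tag{11}$$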


NORMAL THEORY APPROACH

Moment estimators have the property of being unbiased but little else is known about their sampling properties. If the sampling distribution for the data is known, then likelihood methods can be employed. If individuals, and hence genotypes, are sampled randomly from a single population their counts follow a multinomial distribution among samples from the same population. When there is random union of gametes in the population, allele counts are also multinomially distributed over samples from the population. For large samples, the multinomial distribution can be approximated by the multivariate normal distribution, and it will now be assumed that the normal distribution applies also across populations. Normality has also been assumed by previous authors (51, 60, 81, 87). If $\tilde{\mathbf{P}}$ is the vector of sample allele frequencies:

$$\tilde{\mathbf{P}} \sim \mathrm{MVN}(\mathbf{P}, \mathbf{V}),$$

where

$$\tilde{\mathbf{P}} = \begin{bmatrix}\tilde{\mathbf{p}}_1\\ \tilde{\mathbf{p}}_2\\ \vdots\\ \tilde{\mathbf{p}}_r\end{bmatrix},\quad \mathbf{P} = \begin{bmatrix}\mathbf{p}\\ \mathbf{p}\\ \vdots\\ \mathbf{p}\end{bmatrix},\quad \mathbf{V} = \begin{bmatrix}\mathbf{V}_{11} & \mathbf{V}_{12} & \cdots & \mathbf{V}_{1r}\\ \mathbf{V}_{21} & \mathbf{V}_{22} & \cdots & \mathbf{V}_{2r}\\ \vdots & \vdots & \ddots & \vdots\\ \mathbf{V}_{r1} & \mathbf{V}_{r2} & \cdots & \mathbf{V}_{rr}\end{bmatrix}.$$

The vectors $\tilde{\mathbf{p}}_i$ and $\mathbf{p}$ have $(m - 1)$ components $\tilde{p}_{iu}$ and $p_u$, one for each of $(m - 1)$ of the alleles at the locus. The $(m - 1) \times (m - 1)$ matrices $\mathbf{V}_{ii'}$ have elements $V_{ii'uu'}$. When $i = i'$ and $u = u'$ these elements are the variances of $\tilde{p}_{iu}$, otherwise they are the covariances of $\tilde{p}_{iu}$ and $\tilde{p}_{i'u'}$. Their values are:

$$V_{ii'uu'} = \begin{cases} p_u(1 - p_u)\,\phi_i & i = i',\ u = u'\\ -p_u p_{u'}\,\phi_i & i = i',\ u \neq u'\\ p_u(1 - p_u)\,\theta_{ii'} & i \neq i',\ u = u'\\ -p_u p_{u'}\,\theta_{ii'} & i \neq i',\ u \neq u'. \end{cases}$$

Overall Estimate

If there is no relationship among alleles from different populations, $\theta_{ii'} = 0$, then the vectors $\tilde{\mathbf{p}}_i$ are independent. These vectors also have the same expected value, but they have the same variances only if the $\phi_i$ values are the same. Unless the sample sizes are very large, this requires not only equal $\theta_i$ values, but also equal sample sizes $n_i$. Suppose now that $\phi_i = \phi$, because $\theta_i = \theta$ and because the $n_i$'s are either equal or so large that they are approximately equal. The sample allele frequency vectors $\tilde{\mathbf{p}}_i$ are then independently and identically distributed and, from


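standard theory, the quadratic form

$$Q = \sum_{i=1}^{r}(\tilde{\mathbf{p}}_i - \mathbf{p})'\,\mathbf{V}_{ii}^{-1}(\tilde{\mathbf{p}}_i - \mathbf{p}) = \frac{1}{\phi}\sum_{i=1}^{r}\sum_{u=1}^{m}\frac{(\tilde{p}_{iu} - p_u)^2}{p_u}$$

has a chi-square distribution

$$Q \sim \phi\,\chi^2_{(r-1)(m-1)}.$$

The mean allele frequencies are $\bar{p}_u = \sum_{i=1}^{r} n_i\tilde{p}_{iu}/\sum_{i=1}^{r} n_i$ as before, and the estimate of the common value $\theta$ is

$$\hat{\theta}_N = \frac{1}{n - 1}\left(\frac{n}{(r-1)(m-1)}\sum_{i=1}^{r}\sum_{u=1}^{m}\frac{(\tilde{p}_{iu} - \bar{p}_u)^2}{\bar{p}_u} - 1\right) \tag{12}$$

when the sample sizes are equal, or

$$\hat{\theta}_N = \frac{1}{(r-1)(m-1)}\sum_{i=1}^{r}\sum_{u=1}^{m}\frac{(\tilde{p}_{iu} - \bar{p}_u)^2}{\bar{p}_u} \tag{13}$$

when the sample sizes are large (70). If data are available from $L$ independent loci, the $l$th of which has $m_l$ alleles, the sum over loci of the quadratic forms has a chi-square distribution with $d = (r - 1)\sum_{l=1}^{L}(m_l - 1)$ df, and the estimates are simply averaged over loci.

From the properties of the chi-square distribution

$$E(\hat{\theta}_N) = \theta$$

$$\mathrm{Var}(\hat{\theta}_N) = \frac{2\left[1 + (n - 1)\theta\right]^2}{(n - 1)^2 d} \approx \frac{2\theta^2}{d}.$$

Similar expressions were given by Foulley & Hill (31).

The chi-square distribution also provides confidence intervals. For example, if $X_{0.025}$ and $X_{0.975}$ are the 2.5th and 97.5th percentiles of the $\chi^2_d$ distribution, a 95% confidence interval is

$$\left(\frac{d}{X_{0.975}}\left[\hat{\theta}_N + \frac{1}{n-1}\right] - \frac{1}{n-1},\ \ \frac{d}{X_{0.025}}\left[\hat{\theta}_N + \frac{1}{n-1}\right] - \frac{1}{n-1}\right)$$

for equal sample sizes, and

$$\left(\frac{d\,\hat{\theta}_N}{X_{0.975}},\ \ \frac{d\,\hat{\theta}_N}{X_{0.025}}\right)$$

for large sample sizes.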
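As an illustration of Equation 12 and the equal-sample-size interval, the short Python sketch below computes $\hat{\theta}_N$ for a single locus and converts it to a 95% interval. The function names and allele frequencies are hypothetical. The chi-square percentiles are passed in by the caller because the standard library has no chi-square quantile function; the values used for $d = 8$ (2.180 and 17.535) are taken from standard tables.

```python
def theta_normal(freqs, n):
    """Equation 12 for equal sample sizes n; returns (theta_hat, d)."""
    r, m = len(freqs), len(freqs[0])
    # With equal sample sizes the weighted mean is the simple mean.
    pbar = [sum(f[u] for f in freqs) / r for u in range(m)]
    q = sum((f[u] - pbar[u]) ** 2 / pbar[u] for f in freqs for u in range(m))
    d = (r - 1) * (m - 1)
    return (n * q / d - 1) / (n - 1), d


def theta_interval(theta_hat, n, d, x_low, x_high):
    """Interval from chi-square percentiles x_low (2.5%) and x_high (97.5%)."""
    shift = 1 / (n - 1)
    return (d / x_high * (theta_hat + shift) - shift,
            d / x_low * (theta_hat + shift) - shift)


# Hypothetical data: r = 3 samples of n = 100 alleles at an m = 5 allele locus.
freqs = [
    [0.25, 0.20, 0.20, 0.15, 0.20],
    [0.15, 0.25, 0.20, 0.20, 0.20],
    [0.20, 0.15, 0.25, 0.25, 0.15],
]
theta_hat, d = theta_normal(freqs, n=100)
# Percentiles of chi-square with d = 8 df, from standard tables.
low, high = theta_interval(theta_hat, n=100, d=d, x_low=2.180, x_high=17.535)
```

The lower limit can be negative when the estimate is small, which is a feature of the interval rather than of this sketch.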


Population-Specific Estimates

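When the populations are independent, $\theta_{ii'} = 0$ for all $i \neq i'$, but with different values of $\theta_i$, the variance matrix $\mathbf{V}$ can be written as a Kronecker product:

$$\mathbf{V} = \boldsymbol{\Phi} \otimes \boldsymbol{\Pi},$$

where

$$\boldsymbol{\Pi} = \begin{bmatrix} p_1(1 - p_1) & -p_1 p_2 & \cdots\\ -p_1 p_2 & p_2(1 - p_2) & \cdots\\ \vdots & \vdots & \ddots \end{bmatrix},\qquad \boldsymbol{\Phi} = \begin{bmatrix} \phi_1 & 0 & \cdots\\ 0 & \phi_2 & \cdots\\ \vdots & \vdots & \ddots \end{bmatrix}.$$

If there are $r$ samples and $m$ alleles at the locus, $\mathbf{V}$ has determinant

$$|\mathbf{V}| = \left(\prod_{i=1}^{r}\phi_i\right)^{m}\left(\prod_{u=1}^{m} p_u\right)^{r}$$

and inverse

$$\mathbf{V}^{-1} = \boldsymbol{\Phi}^{-1} \otimes \boldsymbol{\Pi}^{-1},$$

where

$$\boldsymbol{\Pi}^{-1} = \begin{bmatrix} \frac{1}{p_1} + \frac{1}{p_m} & \frac{1}{p_m} & \cdots\\ \frac{1}{p_m} & \frac{1}{p_2} + \frac{1}{p_m} & \cdots\\ \vdots & \vdots & \ddots \end{bmatrix},\qquad \boldsymbol{\Phi}^{-1} = \begin{bmatrix} \frac{1}{\phi_1} & 0 & \cdots\\ 0 & \frac{1}{\phi_2} & \cdots\\ \vdots & \vdots & \ddots \end{bmatrix}.$$

Ignoring terms that do not include the parameters of interest in likelihood expressions, the log-likelihood function is

$$\ln L = -\frac{1}{2}\ln(|\mathbf{V}|) - \frac{1}{2}(\tilde{\mathbf{P}} - \mathbf{P})'\,\mathbf{V}^{-1}(\tilde{\mathbf{P}} - \mathbf{P})$$

$$\phantom{\ln L} = -\frac{m}{2}\sum_{i=1}^{r}\ln(\phi_i) - \frac{r}{2}\sum_{u=1}^{m}\ln(p_u) - \frac{1}{2}\sum_{i=1}^{r}\sum_{u=1}^{m}\frac{(\tilde{p}_{iu} - p_u)^2}{\phi_i\,p_u}.$$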
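The stated inverse and determinant of the reduced $(m - 1) \times (m - 1)$ allele-frequency matrix can be checked numerically. The Python sketch below is only a sanity check with hypothetical frequencies and helper names; it builds $\boldsymbol{\Pi}$, multiplies it by the claimed inverse, and compares its determinant with $\prod_{u=1}^{m} p_u$. For independent populations $\mathbf{V}$ is block diagonal with blocks $\phi_i\boldsymbol{\Pi}$, so each block of the inverse is $\boldsymbol{\Pi}^{-1}/\phi_i$.

```python
from math import prod


def pi_matrix(p):
    """Reduced (m-1) x (m-1) matrix: p_u(1-p_u) on the diagonal, -p_u p_v off it."""
    k = len(p) - 1
    return [[p[u] * (1 - p[u]) if u == v else -p[u] * p[v] for v in range(k)]
            for u in range(k)]


def pi_inverse(p):
    """Claimed inverse: 1/p_u + 1/p_m on the diagonal, 1/p_m elsewhere."""
    k = len(p) - 1
    return [[(1 / p[u] if u == v else 0.0) + 1 / p[-1] for v in range(k)]
            for u in range(k)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n, d = len(a), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda row: abs(a[row][c]))
        if a[piv][c] == 0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for row in range(c + 1, n):
            f = a[row][c] / a[c][c]
            for j in range(c, n):
                a[row][j] -= f * a[c][j]
    return d


p = [0.4, 0.3, 0.2, 0.1]                          # hypothetical, m = 4 alleles
identity = matmul(pi_matrix(p), pi_inverse(p))    # should be close to I
determinant = det(pi_matrix(p))                   # should be close to prod(p)
target = prod(p)
```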

Because the $p_u$'s sum to one, it is necessary to add a Lagrangian term before maximizing this function in order to find the maximum likelihood estimates of the


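$p_u$'s and $\phi_i$'s. The modified function and its derivatives are

$$\ln L = -\frac{m}{2}\sum_{i=1}^{r}\ln(\phi_i) - \frac{r}{2}\sum_{u=1}^{m}\ln(p_u) - \frac{1}{2}\sum_{i=1}^{r}\sum_{u=1}^{m}\frac{\tilde{p}_{iu}^2}{\phi_i\,p_u} + \frac{1}{2}\sum_{i=1}^{r}\frac{1}{\phi_i} + \lambda\left(\sum_{u=1}^{m} p_u - 1\right)$$

$$\frac{\partial \ln L}{\partial \phi_i} = -\frac{m}{2\phi_i} + \frac{1}{2}\sum_{u=1}^{m}\frac{\tilde{p}_{iu}^2}{\phi_i^2\,p_u} - \frac{1}{2\phi_i^2}$$

$$\frac{\partial \ln L}{\partial p_u} = -\frac{r}{2p_u} + \frac{1}{2}\sum_{i=1}^{r}\frac{\tilde{p}_{iu}^2}{\phi_i\,p_u^2} + \lambda$$

$$\frac{\partial \ln L}{\partial \lambda} = \sum_{u=1}^{m} p_u - 1.$$

Setting the derivatives to zero provides equations that need to be solved numerically. One approach would be to iterate

$$\phi_i = \frac{1}{m}\sum_{u=1}^{m}\frac{(\tilde{p}_{iu} - p_u)^2}{p_u} \tag{14}$$

$$p_u = \frac{\sum_{i=1}^{r}\left(1 - \frac{\tilde{p}_{iu}^2}{\phi_i\,p_u}\right)}{\sum_{u=1}^{m}\sum_{i=1}^{r}\left(1 - \frac{\tilde{p}_{iu}^2}{\phi_i\,p_u}\right)}.$$

The $\theta_i$'s are then recovered from the $\phi_i$'s.

In the special case of equal $\phi_i$'s (which implies equal sample sizes as well as equal $\theta_i$'s), the log-likelihood becomes

$$\ln L = -\frac{rm}{2}\ln(\phi) - \frac{r}{2}\sum_{u=1}^{m}\ln(p_u) - \frac{1}{2\phi}\sum_{i=1}^{r}\sum_{u=1}^{m}\frac{\tilde{p}_{iu}^2}{p_u} + \frac{r}{2\phi} + \lambda\left(\sum_{u=1}^{m} p_u - 1\right).$$

This leads to the iterative equations

$$\phi = \frac{1}{rm}\sum_{i=1}^{r}\sum_{u=1}^{m}\frac{(\tilde{p}_{iu} - p_u)^2}{p_u}$$

$$p_u = \frac{\sum_{i=1}^{r}\left(1 - \frac{\tilde{p}_{iu}^2}{\phi\,p_u}\right)}{\sum_{u=1}^{m}\sum_{i=1}^{r}\left(1 - \frac{\tilde{p}_{iu}^2}{\phi\,p_u}\right)}.$$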


A comparison with the estimate of $\theta$ in Equations 12 and 13 emphasizes that the maximum likelihood estimates of allele frequencies are not the sample allele frequencies (see Appendix), although the two will be equal for large $m$ and $r$. It appears to be satisfactory in practice (simulation results not shown) to replace $p_u$ in the estimates of $\phi_i$ and $\phi$ by the sample average values $\bar{p}_u$ and change the $m$ divisor to $(m - 1)$:

$$\hat{\theta}_{iN} = \frac{1}{n - 1}\left(\frac{rn}{(r-1)(m-1)}\sum_{u=1}^{m}\frac{(\tilde{p}_{iu} - \bar{p}_u)^2}{\bar{p}_u} - 1\right). \tag{15}$$

Averaging the estimates from Equation 15 over samples gives the estimate in Equation 12 and there is a corresponding simplification for large sample sizes $n$. This approximation requires independent populations.

The advantage of the likelihood approach is that hypotheses about the $\phi_i$'s can be tested. The hypothesis $H_0: \phi_i = \phi$ can be tested by comparing the likelihoods maximized under no constraint and under the constraint of the hypothesis.
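The relationship between the population-specific estimates of Equation 15 and the overall estimate of Equation 12 can be confirmed with a small calculation. The Python sketch below uses hypothetical names and frequencies and assumes equal sample sizes, so that $\bar{p}_u$ is the simple mean over samples.

```python
def theta_population_normal(freqs, n):
    """Equation 15: one estimate per sample, equal sample sizes n."""
    r, m = len(freqs), len(freqs[0])
    pbar = [sum(f[u] for f in freqs) / r for u in range(m)]
    scale = r * n / ((r - 1) * (m - 1))
    return [(scale * sum((f[u] - pbar[u]) ** 2 / pbar[u] for u in range(m)) - 1)
            / (n - 1) for f in freqs]


def theta_overall_normal(freqs, n):
    """Equation 12: common value of theta, equal sample sizes n."""
    r, m = len(freqs), len(freqs[0])
    pbar = [sum(f[u] for f in freqs) / r for u in range(m)]
    q = sum((f[u] - pbar[u]) ** 2 / pbar[u] for f in freqs for u in range(m))
    return (n * q / ((r - 1) * (m - 1)) - 1) / (n - 1)


# Hypothetical data: three samples of 100 alleles at a four-allele locus.
freqs = [
    [0.30, 0.30, 0.25, 0.15],
    [0.20, 0.35, 0.25, 0.20],
    [0.25, 0.25, 0.30, 0.20],
]
per_population = theta_population_normal(freqs, n=100)
overall = theta_overall_normal(freqs, n=100)
mean_of_specific = sum(per_population) / len(per_population)
```

Here `mean_of_specific` and `overall` agree to rounding error, while the individual entries of `per_population` differ from one another.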

NUMERICAL RESULTS

The moment estimators discussed here were applied to the simple case of three populations having the tree structure shown in Figure 1. Data were simulated assuming a pure drift model, and means and standard deviations of estimates from 1000 replicates are shown in Table 1. The simulation was for a single locus with $m = 5$ alleles, all equally frequent initially. Population $i = 0$, of size 500 alleles, resulted from 5 generations of random mating. Population $i = 3$ was of size 300 alleles, and $t_1 + t_2$ was 20 generations. Population $i = 4$, of 500 alleles, resulted from $t_2 = 10$ generations of random mating from population $i = 0$.

Figure 1 Three-population tree.


TABLE 1 Moment estimates, using Equations 7 and 8, for populations in Figure 1. (Parameter values given in text.)

                             Populations
                   1      2      3     1&2    1&3    2&3     βW
θ parameter      .210   .053   .076   .032   .010   .010
β parameter*     .196   .036   .060   .015  −.007  −.007   .097
β estimate       .195   .033   .060   .017  −.008  −.008   .096
SD of estimate   .130   .047   .066   .049   .046   .037   .052

*β = (θ − θA)/(1 − θA)

Populations $i = 1$ and $i = 2$, of 50 and 500 alleles, respectively, resulted from $t_1 = 10$ generations of random mating after population $i = 4$. All sample sizes were $n_i = 100$, $i = 1, 2, 3$.

The moment methods were then applied to data made publicly available by the FBI (12). Three samples, each of about 200 people, were collected from the United States and typed at 13 microsatellite markers, the “CODIS” set of loci. Sample properties for these loci are shown in Table 2: the locus name, the number of alleles $m_l$ and the adjusted sample size terms $n_{lc}$ for the $l$th locus. Estimates of the within-population coancestries $\theta_i$ are shown in Table 3, and of the between-population coancestries $\theta_{ii'}$ in Table 4.

TABLE 2 Sample properties of FBI data (12)

                                         Heterozygosity
Locus     No. alleles   Sample size    AA     CA     HI
D3S135        10           414.6      .763   .795   .719
vWA           10           385.5      .809   .811   .769
FGA           22           385.5      .863   .860   .878
D8S117        13           385.5      .778   .797   .792
D21S11        21           384.8      .861   .853   .811
D18S51        17           385.5      .873   .876   .875
D5S818        10           384.9      .739   .682   .718
D13S31         9           384.8      .688   .771   .827
D7S820        10           414.6      .782   .806   .772
CSF1PO        11           414.6      .781   .734   .707
TPOX          11           414.0      .763   .621   .607
THO1           8           414.6      .727   .783   .757
D16S53         8           412.6      .798   .767   .771

AA: African American, CA: Caucasian, HI: Hispanic.


TABLE 3 Single-population estimates, from Equation 8, for FBI data (12)

                        βi
Locus       AA      CA      HI    Average     βW
D3S135     .010   −.030    .069    .017      .019
vWA        .000   −.002    .050    .017      .019
FGA        .007    .012   −.008    .003      .006
D8S117     .026    .003    .009    .012      .015
D21S11    −.012   −.003    .047    .012      .014
D18S51     .011    .008    .010    .010      .012
D5S818    −.018    .061    .012    .019      .021
D13S31     .132    .028   −.042    .036      .040
D7S820     .014   −.016    .026    .008      .011
CSF1PO    −.048    .015    .051    .006      .008
TPOX      −.118    .090    .112    .027      .030
THO1       .078    .008    .041    .043      .045
D16S53    −.011    .028    .024    .014      .016
All loci   .010    .017    .032    .020      .020

AA: African American, CA: Caucasian, HI: Hispanic.

The development based on normal theory shown above suggests that sample variances decrease with the number of alleles per locus, the number of loci, and the number of samples. The simulation results shown in Table 1 show rather large standard deviations for the case of only three samples, and this may account for the very large variation among loci for the results in Tables 3 and 4. Of course it may also be that the different loci are not providing replicates of the same evolutionary history. Loci may have been subjected to different selection pressures, for example, and variation among $\theta$ values has been suggested as a means of detecting selection, as recently reviewed by Vitalis et al. (91) and applied by Marshall & Ritland (55). If loci can be regarded as providing replication of the same process, however, then averaging over loci is appropriate. The variation among loci is much reduced when the three population-specific estimates are averaged, or when only a common value is estimated.

DISCUSSION

This review has extended Weir & Cockerham (100) in two directions. Most significantly, it has allowed the separate estimation of population- and population-pair specific values of $\theta$. Previously it was assumed that populations were independent


TABLE 4 Two-population estimates, from Equation 7, for FBI data (12)

                     βii′                        βW
Locus      AA&CA   AA&HI   CA&HI      AA&CA   AA&HI   CA&HI
D3S135     −.018    .026   −.009       .010    .016    .030
vWA        −.018    .006    .010       .019    .021    .017
FGA         .006   −.002   −.004       .006    .004    .008
D8S117     −.012    .002    .009       .029    .018    .000
D21S11     −.008   −.008    .015       .003    .029    .010
D18S51     −.004   −.008    .011       .016    .021    .001
D5S818      .003   −.039    .033       .021    .037    .006
D13S31      .058   −.021   −.032       .026    .067    .026
D7S820      .001    .004   −.006       .000    .019    .013
CSF1PO     −.024   −.009    .034       .010    .012    .002
TPOX       −.043   −.053    .097       .030    .049    .007
THO1       −.034    .028    .006       .077    .035    .021
D16S53     −.009   −.009    .018       .020    .018    .011
Total      −.008   −.006    .014       .021    .023    .020

AA: African American, CA: Caucasian, HI: Hispanic.

and that either each population had the same value of $\theta$ or a population-average value was being estimated. The other extension has been the adoption of multivariate normal methods as an alternative to the method of moments. There may be an increase in computational burden and increase in bias with these methods, but there is the gain of a distributional form for the estimates.

Natural populations of the same species are unlikely to have the same value of $\theta$, if only because they have different sizes. Although the reconstruction of intra-specific trees can proceed satisfactorily on the basis of the usual estimates of average $\theta$ values (65, 98), there are occasions when population-specific values are needed. There is the immediate issue of degrees of freedom. For $r$ populations, there are $r$ within-population values and $r(r - 1)/2$ between-population values to be estimated. As there are $m - 1$ independent allele frequencies for a locus with $m$ alleles, there are only $r(m - 1)$ independent observations in all, so only loci with large numbers of alleles can be used. With $L$ loci, there is an increase in the number of observed allele frequencies to $Lr(m - 1)$ and an increase to $r(r + 1)/2 + L(m - 1)$ parameters, so that even diallelic SNPs can be used. The constraints are less severe if the between-population coefficients $\theta_{ii'}$ are ignored, but it needs to be recognized that the estimates are then actually for a combination of within- and between-population values.

Under a pure drift model, values of $\theta$ are simple functions of population size and time. For a pair of populations, the values of $\theta$ within each can be expressed


Figure 2 Two populations.

in terms of $\theta$ for their most recent common ancestral population. For the situation in Figure 2:

$$\theta_i = 1 - (1 - \theta_{12})X_i^{t_1},\quad i = 1, 2,$$

where $X_i = (2N_i - 1)/2N_i$ and $N_i$ is the constant population size for populations $i = 1, 2$. Therefore,

$$\beta_i = \frac{\theta_i - \theta_{12}}{1 - \theta_{12}} = 1 - X_i^{t_1} \approx \frac{t_1}{2N_i}.$$

The $\beta$ parameters estimated by Equation 8 for a pair of populations are therefore furnishing estimates of the time since those populations diverged from an ancestral population. Although the two times must be the same, the pure drift model shows that the estimates will be different when the two population sizes are different. The estimate of Weir & Cockerham (100) is for

$$\beta_W = \frac{\theta_W - \theta_{12}}{1 - \theta_{12}} = 1 - \frac{X_1^{t_1} + X_2^{t_1}}{2} \approx \frac{1}{2}\left(\frac{1}{2N_1} + \frac{1}{2N_2}\right)t_1 = \frac{t_1}{2N_h},$$

where $N_h$ is the harmonic mean of the two population sizes. The quantity $\beta_W$ is proportional to the divergence time $t_1$ (65).

If populations $i = 1, 2, 3, 4$ in Figure 1 have sizes $N_i$, and if $X_i = (2N_i - 1)/2N_i$:

$$\theta_{12} = 1 - (1 - \theta_0)X_4^{t_2}$$

$$\theta_i = 1 - (1 - \theta_{12})X_i^{t_1} = 1 - (1 - \theta_0)X_i^{t_1}X_4^{t_2},\quad i = 1, 2$$

$$\theta_3 = 1 - (1 - \theta_0)X_3^{t_1 + t_2}$$

$$\theta_{13} = \theta_{23} = \theta_0.$$

The $\beta$ parameters being estimated from the three extant populations 1, 2 and 3 involve the average between-population quantity $\theta_A = (\theta_{12} + 2\theta_0)/3$ although this


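cancels out of the expressions needed to estimate the times:

$$\frac{\beta_i - \beta_{12}}{1 - \beta_{12}} = 1 - X_i^{t_1} \approx \frac{t_1}{2N_i},\quad i = 1, 2$$

$$\frac{\beta_3 - \beta_{i3}}{1 - \beta_{i3}} = 1 - X_3^{t_1 + t_2} \approx \frac{t_1 + t_2}{2N_3},\quad i = 1, 2.$$

The $\theta$'s of interest can be expressed in terms of the estimable $\beta$'s:

$$\frac{\theta_i - \theta_{12}}{1 - \theta_{12}} = \frac{\beta_i - \beta_{12}}{1 - \beta_{12}},\quad i = 1, 2.$$

If $\theta_0$ is assumed to be zero, the outgroup population 3 allows estimation of all three measures $\theta_1$, $\theta_2$ and $\theta_{12}$ for populations 1 and 2 since then $\beta_{12} = 2\theta_{12}/(3 - \theta_{12})$ and $\beta_i = (3\theta_i - \theta_{12})/(3 - \theta_{12})$, $i = 1, 2$.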
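The cancellation of $\theta_A$ can be seen numerically with the pure drift recursions for the three-population tree. The Python sketch below uses deliberately hypothetical population sizes, times and ancestral $\theta_0$ (not the simulation settings behind Table 1), and the helper names are illustrative.

```python
def drift_theta(theta_start, n_alleles, generations):
    """theta after drift: 1 - (1 - theta_start) X^t, with X = (2N - 1)/2N.

    n_alleles is 2N, the number of alleles in the population.
    """
    x = (n_alleles - 1) / n_alleles
    return 1 - (1 - theta_start) * x ** generations


def beta(theta, theta_a):
    """Relative coancestry (theta - theta_A) / (1 - theta_A)."""
    return (theta - theta_a) / (1 - theta_a)


# Hypothetical settings for the tree of Figure 1.
theta0 = 0.02
t1, t2 = 8, 12
alleles = {1: 100, 2: 400, 3: 250, 4: 600}   # 2N_i for each population

theta12 = drift_theta(theta0, alleles[4], t2)
theta1 = drift_theta(theta12, alleles[1], t1)
theta2 = drift_theta(theta12, alleles[2], t1)
theta3 = drift_theta(theta0, alleles[3], t1 + t2)
theta_a = (theta12 + 2 * theta0) / 3          # theta_13 = theta_23 = theta0

b1, b2, b3 = (beta(t, theta_a) for t in (theta1, theta2, theta3))
b12, b13 = beta(theta12, theta_a), beta(theta0, theta_a)

# theta_A drops out of these ratios, leaving functions of time and size only.
scaled_time_1 = (b1 - b12) / (1 - b12)        # equals 1 - X_1^t1
scaled_time_2 = (b2 - b12) / (1 - b12)        # equals 1 - X_2^t1
scaled_time_3 = (b3 - b13) / (1 - b13)        # equals 1 - X_3^(t1 + t2)
```

The two descendants of population 4 give different scaled times for the same $t_1$ because their sizes differ, which is the point made in the text about unequal population sizes.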

Moment estimates of the $\theta$'s involve only the second moments of sample allele frequencies, whereas likelihood or Bayesian methods use the whole distribution. Higher-order moments can be expressed in terms of analogs of $\theta$ (96). Ignoring sample-size terms

$$E(\tilde{p}_{iu} - p_u)^2 = p_u(1 - p_u)\,\theta$$

$$E(\tilde{p}_{iu} - p_u)^3 = p_u(1 - p_u)(1 - 2p_u)\,\gamma$$

$$E(\tilde{p}_{iu} - p_u)^4 = p_u(1 - p_u)(1 - 2p_u)(1 - 3p_u)\,\delta + 3p_u^2(1 - p_u)^2\,\Delta.$$

The normal distribution assumption implies that $\gamma = \delta = 0$, $\Delta = \theta^2$, or that there are no dependencies among a set of four alleles in addition to those between any pair of them. Assuming that allele frequencies have a Dirichlet distribution over populations, or that $p_{iu}$ has a Beta distribution with parameters $(1 - \theta)p_u/\theta$ and $(1 - \theta)(1 - p_u)/\theta$ (4), implies that $\gamma = 2\theta^2/(1 + \theta)$, $\delta = 6\theta^3/[(1 + \theta)(1 + 2\theta)]$, $\Delta = \ldots$ (99). These relations hold for the infinite-allele mutation model, but not for the stepwise mutation model (35).
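The second- and third-moment relations under the Beta assumption can be verified exactly from the raw moments of the Beta distribution. The Python sketch below is only a check of the stated values of $\theta$ and $\gamma$; the function names and the values $p_u = 0.3$, $\theta = 0.05$ are hypothetical.

```python
def beta_raw_moment(a, b, k):
    """k-th raw moment of a Beta(a, b) variable: prod_j (a + j)/(a + b + j)."""
    out = 1.0
    for j in range(k):
        out *= (a + j) / (a + b + j)
    return out


def beta_central_moments(p, theta):
    """Second and third central moments for Beta((1-theta)p/theta, (1-theta)(1-p)/theta)."""
    a = (1 - theta) * p / theta
    b = (1 - theta) * (1 - p) / theta
    m1, m2, m3 = (beta_raw_moment(a, b, k) for k in (1, 2, 3))
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    return mu2, mu3


p, theta = 0.3, 0.05                           # hypothetical values
mu2, mu3 = beta_central_moments(p, theta)
gamma = 2 * theta ** 2 / (1 + theta)
expected_mu2 = p * (1 - p) * theta
expected_mu3 = p * (1 - p) * (1 - 2 * p) * gamma
```

Both pairs agree to rounding error, so the Beta model reproduces the variance $p_u(1 - p_u)\theta$ and the third moment with $\gamma = 2\theta^2/(1 + \theta)$.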

ACKNOWLEDGMENTS

This work was supported in part by NIH Grant GM 45344. Very helpful discussions were held with Peter Donnelly and George Nicholson, and the review was completed while the first author enjoyed the hospitality of both the Mathematical Genetics group in the Department of Statistics and the Wellcome Trust Centre for Human Genetics at the University of Oxford.

APPENDIX

The failure of the maximum likelihood estimate of mean allele frequencies to equal their observed values reflects, in part, the approximation of a multinomial distribution by a multivariate normal. In the general setting of a population with


proportions $Q_u$ in the $u$th of $m$ categories, the probability of category counts $n_u$ in a sample of size $n = \sum_{u=1}^{m} n_u$ is

$$\Pr(\{n_u\}) = \frac{n!}{\prod_{u=1}^{m} n_u!}\prod_{u=1}^{m}(Q_u)^{n_u},$$

and the means, variances, and covariances of the counts are

$$E(n_u) = nQ_u$$

$$\mathrm{Var}(n_u) = nQ_u(1 - Q_u)$$

$$\mathrm{Cov}(n_u, n_{u'}) = -nQ_uQ_{u'},\quad u \neq u'.$$

The log-likelihood for the category probabilities is

$$\ln(L(\{Q_u\})) = \sum_{u=1}^{m} n_u\ln(Q_u).$$

To accommodate the dependency caused by $\sum_{u=1}^{m} Q_u = 1$, the Lagrange multiplier term $\lambda(1 - \sum_{u=1}^{m} Q_u)$ is added to the log-likelihood. Differentiating with respect to $Q_u$ gives

$$\frac{\partial\ln(L)}{\partial Q_u} = \frac{n_u}{Q_u} - \lambda,$$

which leads to the maximum likelihood estimates (MLEs) $\hat{Q}_u = \tilde{Q}_u$ where $\tilde{Q}_u = n_u/n$.

For large sample sizes, the multivariate normal distribution provides a good approximation to the multinomial. The appropriate normal distribution for category counts will have variance matrix $n\mathbf{V}$ where $\mathbf{V}$ has $u$th diagonal element $Q_u(1 - Q_u)$ and off-diagonal elements $-Q_uQ_{u'}$, $u \neq u'$. Omitting the $m$th row and column removes the singularity of this matrix. The mean vector is then $n\mathbf{Q} = n[Q_1, Q_2, \ldots, Q_{m-1}]'$. The determinant of the reduced matrix is $\prod_{u=1}^{m} Q_u$ and its inverse has $u$th diagonal element $[1/Q_u + 1/Q_m]$ and all off-diagonal elements equal to $1/Q_m$. These results lead to the log-likelihood

$$\ln(L) = -\frac{1}{2}\ln\left(\prod_{u=1}^{m} Q_u\right) - \frac{1}{2}\sum_{u=1}^{m}\frac{(n_u - nQ_u)^2}{nQ_u} - \lambda\left(1 - \sum_{u=1}^{m} Q_u\right),$$

where the Lagrange multiplier $\lambda$ allows all $m$ unknowns $Q_u$ to be included. Setting the derivative with respect to each $Q_u$ equal to zero gives

$$\frac{1}{n}\left(\lambda - \frac{1}{2Q_u}\right) + \left(\frac{\tilde{Q}_u - Q_u}{Q_u} + \frac{(\tilde{Q}_u - Q_u)^2}{2Q_u^2}\right) = 0.$$

Only for large $n$ will these equations be satisfied by $Q_u = \tilde{Q}_u$, so that $\hat{Q}_u = \tilde{Q}_u$ are


approximations to the MLEs in the normal approximation formulation. In general, however, the MLEs are not simply the observed values.

The Annual Review of Genetics is online at http://genet.annualreviews.org

