SELECTION IN EXPERIMENTAL POPULATIONS. I. LETHAL GENES

WYATT W. ANDERSON

Department of Biology, Yale University, New Haven, Connecticut 06520

Received April 16, 1969

THE main concerns of population genetics are the frequencies of genes in populations and the forces that alter them. Central in importance among these forces is natural selection. The analysis of selection in experimental populations began with the work of L’HÉRITIER and TEISSIER in 1934 and is today a standard laboratory procedure. Despite the attention of geneticists for thirty years, however, certain important aspects of selection remain undemonstrated. The purpose of this article is to investigate in detail two particular aspects of selection: how it varies, and how it is partitioned among its more important components. Lethal genes have been chosen for this inquiry into the mechanisms of selection, since for them the algebra of selection is greatly simplified.

We shall consider selection which acts on two alleles, one of them lethal, at a single locus on one of the autosomes. A lethal allele is one which, when homozygous, causes death before reproduction. The selection against the homozygotes for the lethal allele is thus complete; our task is to estimate the selection on heterozygotes for the lethal allele (LETH) and a non-lethal allele (NL).

COMPONENTS OF SELECTION

We shall first develop an algebraic model for the study of lethal genes. The model will involve discrete generations; that is, it will be constructed with specific, non-overlapping periods for reproduction. Let us assign the total selective value 1 to organisms carrying two non-lethal alleles and call that of the heterozygotes W. “W” is assigned to the heterozygotes in accordance with the usual convention in discussions of lethal genes. It measures the average number of offspring produced by heterozygotes relative to the number produced by organisms carrying two non-lethal alleles. The homozygotes for a lethal produce no offspring, so their selective value is 0. The number of offspring, of course, determines the number of alleles present in the next generation. Thus the selective values determine the frequencies of the alleles. In order to reproduce, a fly must live to reproductive age; thus, the total selective value W is composed of all the various types of selection which occur from the formation of zygotes in one generation to the formation of zygotes in the next. The terms “fitness” and “adaptive value” are frequently used as synonyms for selective value.

Genetics 62: 653-672 July 1969.

Consider a population in which we determine the frequencies of the genotypes among the adults of each generation before they are transferred to fresh food and allowed to lay eggs or bear young. This is, in fact, the common laboratory procedure for experimental populations where generations are discrete. Now, the overall fitness W extends over the entire interval from the formation of zygotes in one generation to their formation in the next. We are, however, considering the general model where the populations are sampled and the frequencies of the alleles determined some time during the zygote-to-zygote cycle. The results of selection may not then be described by W alone, because mating occurs during the interval between samples and separates the action of the principal components of selection. The selection which occurs between the scoring of the adults and the occurrence of mating alters the genotype frequencies. Mating then redistributes the alleles among the genotypes. The selection which next acts on the zygotes will have a different effect than if the two phases of selection were not separated by mating, since the genotype frequencies on which the selection acts are different in the two situations. PROUT (1965) first made this important point. Following his notation, we separate W into two components: E (for Early), that part of the selective value which operates from the formation of zygotes to our determination of allele frequencies in the adults; and L (for Late), that part of the selective value which operates from the determination of allele frequencies in the adults to the formation of the next generation's zygotes. The overall selective value is equal to the product of its components: W = E · L. Now E and L are not independent. E is the expected number of NL/LETH zygotes which survive to reproductive age, relative to the number of NL/NL zygotes which do so. L is the expected number of zygotes generated by the heterozygotes, relative to the number generated by the NL/NL genotype, given that the heterozygous parents have survived to reproduce. Hence W is the product of its components by the definition of conditional expectations, not by the law for combining independent ones. The E component of the selective value consists almost entirely of selection by viability from egg to adult. The L component consists largely of differences in mating activity and fecundity, but also includes any viability differences during the usually short reproductive phase.

THE ASSUMPTIONS

The following assumptions will be incorporated in the model.

1. Mutation will be ignored, as is justified in constructing a model for short-term changes (of the order of a few years at best) at a single gene locus.

2. The population size is assumed to be sufficiently large that random sampling effects will not be important, an assumption which is valid for the type of data we shall analyze.

3. The selective values of the genotypes are assumed to be constant, or very nearly so. This assumption is certainly not always justified, but we shall incorporate it in order to simplify the analysis. Later I shall outline a technique to reveal some types of variation in selection from generation to generation.

4. Mating is assumed to be random, and it is assumed that the effects of differential mating activity and differential fecundity are simply to alter the effective frequencies of alleles in the populations. Differential mating activity effectively alters the frequencies of the alleles as if there were more individuals with the favored genotype in the population. If the females lay approximately the same number of eggs from matings with either of the male genotypes, then differential fecundity between the female genotypes will likewise simply increase the effective frequencies of the favored alleles in the “mating pool” of gametes. If different types of matings result in different numbers of zygotes, however, a more complex analysis must be undertaken (see BODMER 1965).

5. Selection will initially be assumed to be the same, or at least very similar, in the two sexes. Later this assumption will be specifically evaluated for lethal genes and shown to be reasonable.

6. Segregation of the alleles at meiosis is assumed to be normal in both sexes; there is assumed to be no “meiotic drive” (SANDLER and NOVITSKI 1957).

THE ALGEBRA

Let Q be the frequency of the lethal allele and P the frequency of the non-lethal alleles. The algebra will begin, among the adults, at the point where the frequencies of the alleles are determined. There are only two genotypes among the adults: individuals with two non-lethal alleles and heterozygotes for the lethal. Only the heterozygotes carry the lethal allele, and in them only one of their two alleles is lethal. Hence the frequency of the heterozygotes is $2Q_t$.

$$
\begin{array}{lcc}
\text{Genotypes} & \text{NL/NL} & \text{NL/LETH} \\[4pt]
\text{Genotype frequencies among the adults at generation } t & 1-2Q_t & 2Q_t \\[4pt]
\text{“L” selection} & 1 & L \\[4pt]
\text{Genotype frequencies after “L” selection} & \dfrac{1-2Q_t}{1+2Q_t(L-1)} & \dfrac{2Q_tL}{1+2Q_t(L-1)}
\end{array}
$$

Allele frequencies in the “mating pool” of gametes:

$$
P_{mat} = \frac{1+Q_t(L-2)}{1+2Q_t(L-1)} \quad\text{and}\quad Q_{mat} = \frac{Q_tL}{1+2Q_t(L-1)}. \tag{1}
$$

Random mating occurs, and zygotes are formed. Then selection acts through the “E” component.

$$
\begin{array}{lccc}
\text{Genotypes} & \text{NL/NL} & \text{NL/LETH} & \text{LETH/LETH} \\[4pt]
\text{Genotype frequencies after random mating} & P_{mat}^2 & 2P_{mat}Q_{mat} & Q_{mat}^2 \\[4pt]
\text{“E” selection} & 1 & E & 0 \\[4pt]
\text{Genotype frequencies after “E” selection} & \dfrac{P_{mat}^2}{P_{mat}^2+2P_{mat}Q_{mat}E} & \dfrac{2P_{mat}Q_{mat}E}{P_{mat}^2+2P_{mat}Q_{mat}E} & 0
\end{array}
$$

It is at this point that allele frequencies are again scored; the full cycle of one generation has passed.

$$
Q_{t+1} = \frac{Q_{mat}E}{P_{mat}+2Q_{mat}E}. \tag{2}
$$

Substituting for $P_{mat}$ and $Q_{mat}$, and letting E · L = W where the product appears,

$$
Q_{t+1} = \frac{Q_tW}{1+Q_t(L-2)+2Q_tW}. \tag{3}
$$
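As a numerical check on the algebra of one generation, the following Python sketch carries an adult lethal frequency through "L" selection, random mating and "E" selection step by step, and compares the result with the one-line recurrence Q(t+1) = Q W / (1 + Q(L-2) + 2QW). The function names and the values of E, L and Q are illustrative assumptions, not taken from the article.

```python
def next_q_stepwise(q, E, L):
    """One generation, step by step, from the adult lethal-allele frequency q."""
    nn, nl = 1.0 - 2.0 * q, 2.0 * q      # adults: NL/NL and NL/LETH
    nl_sel = nl * L / (nn + nl * L)       # heterozygote share after "L" selection
    q_mat = nl_sel / 2.0                  # lethal allele frequency in the mating pool
    p_mat = 1.0 - q_mat
    z_nn = p_mat ** 2                     # zygotes after random mating
    z_nl = 2.0 * p_mat * q_mat            # (LETH/LETH zygotes get weight 0 below)
    s_nn, s_nl = z_nn, z_nl * E           # "E" selection: weights 1, E, 0
    return 0.5 * s_nl / (s_nn + s_nl)     # lethal frequency among the next adults


def next_q_formula(q, E, L):
    """Same step via Q(t+1) = Q W / (1 + Q (L - 2) + 2 Q W), with W = E * L."""
    W = E * L
    return q * W / (1.0 + q * (L - 2.0) + 2.0 * q * W)


if __name__ == "__main__":
    # Arbitrary illustrative values: E = 0.9, L = 0.8.
    for q in (0.05, 0.3, 0.5):
        print(q, next_q_stepwise(q, 0.9, 0.8), next_q_formula(q, 0.9, 0.8))
```

The two functions should agree to rounding error for any adult frequency between 0 and 0.5, which is a quick way to confirm that a hand-derived recurrence matches the life cycle it is meant to describe.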


Inverting both sides of this equation, we transform this non-linear difference equation into one which is linear in a simple function of the Q’s.

$$
\frac{1}{Q_{t+1}} = \frac{1}{W}\cdot\frac{1}{Q_t} + \frac{L-2+2W}{W}. \tag{4}
$$

Solving (4), we obtain

$$
\frac{1}{Q_t} - \frac{L-2+2W}{W-1} = \left(\frac{1}{W}\right)^{t}\left(\frac{1}{Q_0} - \frac{L-2+2W}{W-1}\right), \tag{5}
$$

provided W = E · L ≠ 1. The values of 1/Q converge monotonically to a final value by a factor of 1/W per generation; $(1/W)^t$ gives the convergence over t generations. If W > 1, then 1/W < 1 and the right hand side of (5) approaches zero as t increases. Hence $1/Q_t \to (L-2+2W)/(W-1)$ as $t \to \infty$ and $Q_t \to (W-1)/(L-2+2W)$, the equilibrium gene frequency. If W < 1, then 1/W > 1 and the right hand side of (5) increases without bound with increasing t; this requires that $Q_t \to 0$, so that the lethal allele will be eliminated. These equilibrium properties are just one case of the general formulation for two alleles at a single autosomal locus (FISHER 1922): an equilibrium at intermediate frequencies of the alleles is possible if and only if the selective value of the heterozygote is larger than that of either homozygote. Simplifying (5),

$$
Q_t = \frac{W^t(W-1)Q_0}{W-1+(L-2+2W)(W^t-1)Q_0}. \tag{6}
$$

The time required to change the frequency of the lethal allele from $Q_0$ to $Q_t$ is

$$
t = \frac{\log_e\left[\dfrac{Q_t(1+Q_0K)}{Q_0(1+Q_tK)}\right]}{\log_e W}, \quad\text{where } K = \frac{L-2+2W}{1-W} \text{ and } W \neq 1. \tag{7}
$$

If we consider the usual model where sampling is from the newly-formed zygotes, only the overall fitness W enters the equation and

$$
Q_t = \frac{W^t(W-1)Q_0}{W-1+(2W-1)(W^t-1)Q_0}. \tag{8}
$$

The time “t” is the same as in (7), but now K = (2W-1)/(1-W), provided W ≠ 1, that is, provided there has been some selection in the heterozygotes. Formulas (6) and (8) are not new, although they are seldom encountered in the literature. They were first developed by TEISSIER (1942), who later (TEISSIER 1944) considered the population genetics of lethal alleles in greater detail.

If the lethal is completely recessive (W = 1), so that the only selection is against the lethal homozygotes, then with sampling accomplished anywhere in the life cycle

$$
Q_t = \frac{Q_0}{1+tLQ_0} \quad\text{and}\quad t = \frac{1}{L}\left(\frac{1}{Q_t}-\frac{1}{Q_0}\right), \quad\text{where } W = E\cdot L = 1.
$$

Assuming again the model of change from newly-formed zygotes in one generation to those in the next, then for a completely recessive lethal

$$
Q_t = \frac{Q_0}{1+tQ_0} \quad\text{and}\quad t = \frac{1}{Q_t}-\frac{1}{Q_0}, \quad\text{where } W = E\cdot L = 1,
$$

as shown by JENNINGS (1916).


Lethal genes are one of the two special cases for which the basic recurrence formula relating gene frequencies in successive generations has been solved to give a closed form such as (6). The other case involves genotypes whose selective values form a geometric progression (LI 1959).
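The closed form can be checked against direct iteration of the recurrence. The Python sketch below assumes the recurrence Q(t+1) = Q W / (1 + Q(L-2) + 2QW), the closed form (6), and the time formula (7) with K = (L-2+2W)/(1-W); function names and parameter values are illustrative only.

```python
import math


def step(q, L, W):
    """Recurrence (3): one generation of change in the adult lethal frequency."""
    return q * W / (1.0 + q * (L - 2.0) + 2.0 * q * W)


def iterate(q0, L, W, t):
    """Apply the recurrence t times."""
    q = q0
    for _ in range(t):
        q = step(q, L, W)
    return q


def closed_form(q0, L, W, t):
    """Formula (6): frequency after t generations (requires W != 1)."""
    wt = W ** t
    return wt * (W - 1.0) * q0 / (W - 1.0 + (L - 2.0 + 2.0 * W) * (wt - 1.0) * q0)


def generations_needed(q0, qt, L, W):
    """Formula (7): time to move from q0 to qt (requires W != 1)."""
    K = (L - 2.0 + 2.0 * W) / (1.0 - W)
    return math.log(qt * (1.0 + q0 * K) / (q0 * (1.0 + qt * K))) / math.log(W)


if __name__ == "__main__":
    L, W = 0.8, 0.72                     # hypothetical: E = 0.9, L = 0.8
    q10 = closed_form(0.4, L, W, 10)
    print(q10, iterate(0.4, L, W, 10))   # the two values should coincide
    print(generations_needed(0.4, q10, L, W))
    # Heterotic case (W > 1): iteration approaches (W - 1) / (L - 2 + 2W).
    print(iterate(0.4, 1.0, 1.2, 400), 0.2 / 1.4)
```

With W < 1 the frequency declines toward zero; with W > 1 the iteration settles at the intermediate equilibrium, in line with the discussion of equation (5).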

THE EFFECT OF DIFFERENTIAL SELECTION IN THE SEXES

Suppose that selection is different in the two sexes, and let the selective value in female heterozygotes be $W_♀$, that in male heterozygotes $W_♂$. Let the average selective value be $\bar{W}$. Each generation will begin with gene frequencies equal in the two sexes, since mating distributes the genes equally between female and male zygotes. Let the frequencies of the newly-formed zygotes at time T be N for the NL/NL genotype and H for the NL/LETH genotype. The lethal gene frequencies in the “mating pool” of gametes, after the differential selection in the females and males, are

$$
Q_♀(T) = \frac{HW_♀}{2N+HW_♀} \quad\text{and}\quad Q_♂(T) = \frac{HW_♂}{2N+HW_♂}.
$$

The overall lethal gene frequency among the newly-formed zygotes at generation (T+1) is $Q(T+1) = \tfrac{1}{2}\,(Q_♀(T)+Q_♂(T))$. If the observed change in gene frequency had been due to selection which was identical in the sexes, the common selective value being Z, then

$$
Q(T+1)_Z = \frac{HZ}{2N+HZ}.
$$

We should like to know how Z is related to $\bar{W}$ and to the difference of the selective values in the females and the males. That is, if we estimate the selection assuming it is alike in females and males, just how will this overall value relate to the true selective values in the sexes? To find out, we set $Q(T+1)_Z = Q(T+1)$ and solve for Z. After simplification,

$$
Z = \bar{W} - D_W^2F, \quad\text{where } F < \tfrac{1}{4} \text{ and } D_W = |W_♀ - W_♂|.
$$

In a population where selection is different in females and males, then, the gene frequency changes approximately as if the two sexes had a common selective value equal to the average of the values in the two sexes. This common selective value differs from the average in the two sexes by a factor smaller than one-fourth of the square of the difference in selective values between females and males. This error factor will be negligible unless the male and female selective values are quite different.

THE ANALYSIS

Let us turn now to the problem of analyzing data and see how much information we can extract about the genetic changes in experimental populations. Several experiments with lethal alleles in laboratory populations of Drosophila have been undertaken, and we shall use some of these data for the analysis. There are four types of information we should like to get from the data. First, can we visualize the selection process in a simple way? Second, has selection been constant from generation to generation or has it changed? Third, what is the extent of the selection and how is it partitioned between the “early” and “late” components? Fourth, how reliable are our estimates of the selective values?

A GRAPHICAL TECHNIQUE

Recessive lethals which produce a visible effect in the heterozygotes can very readily be scored among the adults in a population by simple examination of the phenotype. It is easy to measure all the members of a small population. For this case an especially straightforward analysis is possible. Using equation (4) as our basis, we plot $1/Q_{t+1}$ on the ordinate versus $1/Q_t$ on the abscissa. The slope of the line through the points is 1/W, and it is easy to show that the points will fall along a straight line if and only if the fitnesses are constant. A convenient way to fit a line with fairly many points is BARTLETT’S (1949) procedure. Since the Q’s are known exactly in these experiments, there is no variance to the 1/Q’s.

This graphical technique is also useful in visualizing the course of selection in the early generations of experiments where the Q’s are determined from samples of the populations. In this case the Q’s have binomial variances and the 1/Q’s are vastly more variable; hence, the method can only give a rough idea of the extent of selection.
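The graphical technique can be sketched numerically. The Python example below fits a straight line to the points (1/Q_t, 1/Q_{t+1}) with a three-group method in the spirit of BARTLETT's procedure (slope from the means of the lowest and highest thirds of the points, line through the grand mean); this is my reading of that procedure, not code from the article. The frequencies are hypothetical and noise-free, generated from the recurrence with assumed values E = 0.9 and L = 0.8, so the fitted slope should return 1/W.

```python
def three_group_line(xs, ys):
    """Three-group straight-line fit: slope from the means of the lowest and
    highest thirds of the points (ordered by x), line through the grand mean."""
    pts = sorted(zip(xs, ys))
    k = len(pts) // 3
    if k == 0:
        raise ValueError("need at least three points")

    def mean(vals):
        return sum(vals) / len(vals)

    low, high = pts[:k], pts[-k:]
    slope = ((mean([y for _, y in high]) - mean([y for _, y in low]))
             / (mean([x for x, _ in high]) - mean([x for x, _ in low])))
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept


# Hypothetical, noise-free adult frequencies generated with E = 0.9, L = 0.8.
E, L = 0.9, 0.8
W = E * L
qs = [0.4]
for _ in range(6):
    q = qs[-1]
    qs.append(q * W / (1.0 + q * (L - 2.0) + 2.0 * q * W))

inv = [1.0 / q for q in qs]                    # the plotted quantities 1/Q
slope, intercept = three_group_line(inv[:-1], inv[1:])
w_from_slope = 1.0 / slope                     # slope of the line is 1/W
print(w_from_slope, intercept)
```

With real, sampled frequencies the points scatter about the line, and a visibly curved trend would suggest that the selective values were not constant.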

Example 1. The author maintained two populations of Drosophila melanogaster at 25°C in pint bottles with cornmeal-molasses-agar medium. One population contained the Cy allele and the other, the Pm allele. Both populations were initiated with heterozygotes for these lethals. The parents of each generation were put into bottles and allowed to lay eggs for seven days; they were then removed. All newly-emerging flies remained in the bottles until the eighteenth day after their parents began to lay eggs. This next generation was then scored for phenotype and put into a fresh bottle to begin egg laying. Then the sequence above was repeated. The data for five generations are graphed in Figure 1. The adults used to begin the populations were all heterozygotes for the lethal allele; hence there could be no differential fertility or viability (the “L” component) among them. Since there was no “L” component of selection in the first time interval, the initial frequency of 0.5 was not included in the graphs.

FIGURE 1. Graphical visualization of selection involving lethal genes (axis label: $1/Q_T$).

It is clear from Figure 1 that selection is eliminating both the Cy and Pm alleles in the heterozygotes as well as in the homozygotes for these lethal alleles. The close fit of the points to a straight line for each population indicates that the selection in each has been very nearly constant.

VARIATION OF SELECTION: W ESTIMATED OVER INTERVALS OF ONE GENERATION

By sampling among the newly-formed zygotes, we can estimate the actual selection which acts in each generation and get an idea of how it has varied with time or gene frequency. Lethal genes are tailor-made for studies of variable selection. They are, in fact, the only genetic situation in diploids where the selection in a single generation can be estimated. With K alleles at a single locus, the distribution of newly-formed genotypes after random mating gives (K-1) degrees of freedom, since the genotype frequencies are simple multiplicative functions of the gene frequencies. Over an interval of one generation there will be 2(K-1) degrees of freedom in the data. But there are (K(K+1)/2)-1 independent fitnesses to be estimated, as well as the (K-1) independent gene frequencies at the beginning of the interval. (K(K+1)/2)-1+K-1 > 2(K-1) for all K ≥ 2. There are, therefore, not enough degrees of freedom to estimate the selective values, no matter what the number of alleles at the locus. Only with lethal genes, where the selective value of one genotype is already known, do the data permit estimation of all the independent selective values. Suppose that samples of eggs were taken from populations in which a lethal allele were segregating, and that the eggs were cultured under near-optimal conditions so that selection between the NL/NL and NL/LETH genotypes was minimized. Then the allele frequencies among the zygotes may be inferred. There will be only two classes of genotypes in the cultures; over one generation there will be two samples and, thus, two degrees of freedom. Then we may estimate Q, the frequency of the lethal at the beginning of the interval, and W, the overall selective value of the heterozygotes. The expected frequencies of the genotypes which emerge from the sample at any generation t are:

$$
\begin{array}{cc}
\text{NL/NL} & \text{NL/LETH} \\[4pt]
\dfrac{1-Q_t}{1+Q_t} & \dfrac{2Q_t}{1+Q_t}
\end{array}
$$

where $Q_t$ is given by formula (8). Thus, the expected genotype frequencies at time t+1 (in terms of the frequencies at generation t) are found by substituting t=1 into formula (8):

$$
\begin{array}{llcc}
 & \text{Genotypes} & \text{NL/NL} & \text{NL/LETH} \\[4pt]
\text{Time } t & \text{Observed frequencies} & 1-H_t & H_t \\[4pt]
\text{Time } t+1 & \text{Expected frequencies} & \dfrac{1+Q_t(W-1)}{1+Q_t(3W-1)} & \dfrac{2Q_tW}{1+Q_t(3W-1)} \\[4pt]
 & \text{Observed frequencies} & 1-H_{t+1} & H_{t+1}
\end{array}
$$

$H_t$ is the observed frequency of the lethal heterozygotes from the sample taken at generation t.

We obtain the maximum likelihood estimates, which are those which maximize the probability of the particular frequencies which we observe, by equating the expected and observed frequencies (see appendix 1 to BAILEY 1961). Thus,

$$
\hat{Q}_t = \frac{H_t}{2-H_t} \quad\text{and}\quad \hat{W} = \frac{2H_{t+1}(1-H_t)}{H_t(2-3H_{t+1})}.
$$

The variances of these estimates were calculated by the “delta method”; a mathematical statement and justification of this procedure is given, for instance, by RAO (1965, pp. 319-322). In large samples they are, approximately:

$$
\operatorname{var}\hat{W} = W^2\left[\frac{1}{N_tH_t(1-H_t)} + \frac{4(1-H_{t+1})}{N_{t+1}H_{t+1}(2-3H_{t+1})^2}\right] \quad\text{and}\quad \operatorname{var}\hat{Q}_t = \frac{Q_t(1+Q_t)(1-Q_t^2)}{2N_t}.
$$

$N_t$ and $N_{t+1}$ are the total numbers of organisms recorded at times t and t+1, respectively. When a population is begun with heterozygotes, the first interval begins from a known heterozygote frequency of 2/3 among the newly-formed zygotes after the lethal homozygotes have been eliminated. For this interval the variance of $\hat{W}$ depends only on the variation in $H_{t+1}$; it is, approximately,

$$
\operatorname{var}\hat{W} = \frac{4W^2(1-H_{t+1})}{N_{t+1}H_{t+1}(2-3H_{t+1})^2}.
$$

The estimates of W from successive intervals are not independent, since they share one set of observations. This set is the end of one interval and the beginning of the next. Hence, if the variation in this sample causes one estimate to be too high, the other will be too low. The covariance of the estimates of W from successive intervals will be negative, and in large samples it is approximately

$$
\operatorname{cov}(\hat{W}_t,\hat{W}_{t+1}) = -\frac{2W_tW_{t+1}}{N_{t+1}H_{t+1}(2-3H_{t+1})}.
$$

Example 2. POLIVANOV and ANDERSON (unpublished) followed the frequency of the third chromosome lethal Sb in an experimental population of Drosophila melanogaster with the generations discrete. Samples of eggs were taken at the beginning of each generation and raised under nearly optimal conditions. The frequencies of the heterozygotes among the adults hatching in each sample and the numbers of flies counted are given in Table 1. Separate experiments showed that there was no differential viability between the NL/NL and NL/LETH genotypes which hatched from the egg samples. Estimates of W, plus or minus their standard errors (the square roots of the variances), and the correlation coefficients

$$
r_{t,t+1} = \frac{\operatorname{cov}(\hat{W}_t,\hat{W}_{t+1})}{\sqrt{\operatorname{var}\hat{W}_t\cdot\operatorname{var}\hat{W}_{t+1}}}
$$

between the successive estimates of W are also given in Table 1. Since the one available degree of freedom from each interval has been used to estimate W, the estimates will exactly account for the observed changes. It is apparent that the standard errors increase with the size of the W's; this is expected, since the changes in heterozygote frequency, on which the estimation depends, become smaller as the W's become larger. The fairly large standard errors indicate the difficulty in obtaining precision when estimating selection over a single generation. Had selection been constant, the best estimate of the selective value would be 0.777, a weighted average of the separate estimates. The frequencies of heterozygotes expected each generation were calculated with this selective value, and the observed and expected numbers of the genotypes in the egg samples were compared in a χ² test for goodness of fit. The χ², with 4 degrees of freedom, was 39.1; there were sizeable contributions to this total χ² from each generation. It is thus almost certain that the selective values were not constant, but varied significantly among the five generations.

TABLE 1

Frequency of Sb heterozygotes and estimates of the selective values over each generation in POLIVANOV'S and ANDERSON'S experimental population

Generation t    H_t      N_t    Ŵ ± S.E. (interval t to t+1)    r
0               0.667    ..     1.52 ± 0.42
1               0.547    342    1.09 ± 0.24                     -0.51
2               0.443    325    0.68 ± 0.12                     -0.53
3               0.298    382    0.88 ± 0.16                     -0.47
4               0.239    348    0.66 ± 0.13                     -0.53
5               0.159    454

H_t = frequency of Sb heterozygotes among the adults hatching from an egg sample taken at generation t.
N_t = total number of adults counted in sample at generation t.
r = coefficient of correlation between the estimates of W in successive generations; the value on each row refers to the two estimates which share the sample on that row.
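The single-generation estimates can be recomputed from the heterozygote frequencies and sample sizes of Table 1. The Python sketch below uses expressions re-derived for this purpose by equating observed and expected heterozygote frequencies under formula (8) with t = 1 and applying the delta method to binomial sampling variances; the function names are illustrative, and the formulas should be read as a reconstruction under those assumptions rather than as the article's printed equations. Small differences from the tabulated standard errors and correlations are to be expected, because the published frequencies are rounded to three places.

```python
import math

# Table 1: heterozygote frequencies H_t and sample sizes N_t, generations 0-5.
H = [0.667, 0.547, 0.443, 0.298, 0.239, 0.159]
N = [None, 342, 325, 382, 348, 454]   # generation 0 is known exactly, not sampled


def w_hat(h0, h1):
    """Estimate of W over one interval from successive heterozygote frequencies."""
    return 2.0 * h1 * (1.0 - h0) / (h0 * (2.0 - 3.0 * h1))


def var_w(h0, n0, h1, n1):
    """Delta-method variance; the first term drops out when h0 is known (n0 is None)."""
    start = 0.0 if n0 is None else 1.0 / (n0 * h0 * (1.0 - h0))
    end = 4.0 * (1.0 - h1) / (n1 * h1 * (2.0 - 3.0 * h1) ** 2)
    return w_hat(h0, h1) ** 2 * (start + end)


def cov_w(h0, h1, n1, h2):
    """Covariance of the two estimates that share the middle sample (h1, n1)."""
    return -2.0 * w_hat(h0, h1) * w_hat(h1, h2) / (n1 * h1 * (2.0 - 3.0 * h1))


W_EST = [w_hat(H[t], H[t + 1]) for t in range(5)]
SE = [math.sqrt(var_w(H[t], N[t], H[t + 1], N[t + 1])) for t in range(5)]
R = [cov_w(H[t - 1], H[t], N[t], H[t + 1]) / (SE[t - 1] * SE[t]) for t in range(1, 5)]

for t in range(5):
    print(f"interval {t}-{t + 1}: W = {W_EST[t]:.2f}, S.E. = {SE[t]:.2f}")
print("correlations:", [round(r, 2) for r in R])
```

Keeping the estimator, its variance and the covariance as separate functions makes it easy to see which sample feeds which quantity, and why adjacent estimates are negatively correlated.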

Estimates of the selective values in the population containing Sb are plotted versus the time intervals in Figure 2. There is a decline of Ŵ with time or with lethal frequency. Sb appears to be heterotic at first, with an erosion to appreciable selective disadvantage.

FIGURE 2. Variation of selection in POLIVANOV’S and ANDERSON’S population (ordinate: W; abscissa: time interval).

PARTITIONING THE SELECTIVE VALUE

The E (“early”) and L (“late”) components of the selective value may be estimated in experiments where the samples are taken among the adults after part of the selection (the E part) has occurred. The total selective value may thus be partitioned into two components, the first (E) largely selection by viability, and the second (L) largely selection through mating activity and fecundity. It is important to note that the partition can be accomplished in the population undergoing selection, without recourse to other experiments. To measure components of selection in experiments outside the population being studied usually requires special experimental arrangements. It is unlikely that the conditions of such measurements are like those in the population itself, and hence the components of selection being estimated are probably different than those acting in the population. It is risky to extrapolate from such “outside” experiments to a population where the environmental conditions are different. The only proviso attached to the method of estimating E and L is that there should be some change in frequency of the lethal during the observations; otherwise the population is at equilibrium. E and L may be estimated over every interval of two generations. But these estimates over two generations do not have much value for our objective of partitioning the selective value, for they are quite sensitive to sampling error. It is better, then, to estimate E and L over the whole course of an experiment. By generating the expected frequencies of the alleles from the estimates of the selection and the known initial frequencies, a chi-square test may then be applied to determine whether the estimates do adequately account for the data. For a sample at time i the model is as follows.

$$
\begin{array}{lccc}
\text{Genotypes} & \text{NL/NL} & \text{NL/LETH} & \text{TOTAL} \\[4pt]
\text{Expected frequencies} & 1-H(i) & H(i) & 1 \\[4pt]
\text{Observed numbers} & N_1(i) & N_2(i) & N(i)
\end{array}
$$

Most populations are started with a lethal frequency of 50%. Since the adults used to originate the population are all of the same genotype, the heterozygotes NL/LETH, there can be no selection among them corresponding to the “L” component during the first interval. Only the “E” component will operate during the interval from the beginning of the population to the first sample. The recurrence relation (6) can be modified to take this factor into account. The frequency of heterozygotes in generation i, beginning with a lethal frequency of 0.5, is

$$
H(i) = \frac{2EW^{i-1}(W-1)}{2EW^{i-1}(W-1)+W^i-1}. \tag{15}
$$

The technique of maximum likelihood scoring introduced by FISHER will be used to estimate E and L (See appendix 1 to BAILEY 1961 for a description of the tech- nique). The estimates will be under the hypothesis that there is a deterministic process which governs the frequencies of the alleles at any generation. The fre- quency of the heterozygotes at any time depends only on the initial frequency, the components of selection E and L, and the time elapsed since the beginning of the population, as given by formula (15). We assume that the observations are made on random samples of the individuals in the populations. The probability of observing N , organisms of genotype NL/NL and N , organisms of genotype NL/LETH at time i is the likelihood

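$$\mathrm{like}(i) = k\,[1 - H(i)]^{N_1(i)}\,H(i)^{N_2(i)},$$

where k is a constant. The maximum likelihood estimates are obtained by simultaneously reducing to zero the derivatives of the likelihood, or equivalently of its logarithm, with respect to each parameter being estimated. The scoring technique utilizes a slightly modified MACLAURIN series expansion of the derivative of the log likelihood, with terms above first order neglected, in order to locate this zero. The series expansion gives a vector of corrections to some initial vector of estimates as the product of a vector of “scores” and the inverse of an “information” matrix; this product is evaluated at the initial values. The whole process is repeated and new corrections are obtained at each cycle. Convergence is usually rapid. The scores for the ith sample, which are the derivatives of the log likelihood with respect to E and L, are

$$S_E(i) = \frac{\partial \log_e \mathrm{like}(i)}{\partial E} = \left[\frac{N_2(i)}{H(i)} - \frac{N_1(i)}{1 - H(i)}\right]\frac{\partial H(i)}{\partial E} \quad\text{and}\quad S_L(i) = \frac{\partial \log_e \mathrm{like}(i)}{\partial L} = \left[\frac{N_2(i)}{H(i)} - \frac{N_1(i)}{1 - H(i)}\right]\frac{\partial H(i)}{\partial L}.$$

The terms of the information matrix are given by

$$I_{rs} = -\mathbb{E}\left(\frac{\partial^2 \log_e \mathrm{like}(i)}{\partial\theta_r\,\partial\theta_s}\right),$$

where $\mathbb{E}$ denotes expectation and $\theta_r$ and $\theta_s$ each stand for E or L. For the ith sample, with $N(i) = N_1(i) + N_2(i)$, they are:

$$I_{EE}(i) = \frac{N(i)}{H(i)\,[1 - H(i)]}\left(\frac{\partial H(i)}{\partial E}\right)^2, \quad I_{EL}(i) = \frac{N(i)}{H(i)\,[1 - H(i)]}\,\frac{\partial H(i)}{\partial E}\,\frac{\partial H(i)}{\partial L}, \quad I_{LL}(i) = \frac{N(i)}{H(i)\,[1 - H(i)]}\left(\frac{\partial H(i)}{\partial L}\right)^2.$$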

Considering the samples at different times to be independent, $\mathrm{like} = \prod_i \mathrm{like}(i)$ and $\log_e \mathrm{like} = \sum_i \log_e \mathrm{like}(i)$. Hence $S_E = \sum_i S_E(i)$, $S_L = \sum_i S_L(i)$, $I_{EE} = \sum_i I_{EE}(i)$, $I_{EL} = I_{LE} = \sum_i I_{EL}(i)$, and $I_{LL} = \sum_i I_{LL}(i)$. The scores and the information matrix are first calculated for trial values of E and L. Let the elements of the inverse of the information matrix be denoted by superscripts. Then corrections to the initial estimates will be

$$\delta E_0 = I^{EE} S_E + I^{EL} S_L \quad\text{and}\quad \delta L_0 = I^{EL} S_E + I^{LL} S_L.$$

The improved estimates are $E_1 = E_0 + \delta E_0$ and $L_1 = L_0 + \delta L_0$.
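The scoring cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's program: the function names are hypothetical, the derivatives of H(i) with respect to E and L are taken by central differences rather than analytically, H(i) is computed from the all-heterozygote founding formula with W = EL, and the 2 x 2 information matrix is inverted by hand.

```python
def het_freq(i, E, L):
    """Expected heterozygote frequency H(i), with W = E * L."""
    W = E * L
    top = 2.0 * E * W ** (i - 1)
    return top / (top + sum(W ** k for k in range(i)))


def scores_and_information(samples, E, L, step=1e-6):
    """Sum the scores and information terms over independent samples.

    samples: iterable of (i, n1, n2), where n1 counts NL/NL and n2 counts
    NL/LETH individuals observed in generation i.
    """
    S_E = S_L = I_EE = I_EL = I_LL = 0.0
    for i, n1, n2 in samples:
        H = het_freq(i, E, L)
        # Central-difference derivatives (an assumption of this sketch).
        dH_dE = (het_freq(i, E + step, L) - het_freq(i, E - step, L)) / (2 * step)
        dH_dL = (het_freq(i, E, L + step) - het_freq(i, E, L - step)) / (2 * step)
        slope = n2 / H - n1 / (1.0 - H)        # d log like(i) / dH
        weight = (n1 + n2) / (H * (1.0 - H))   # binomial information about H
        S_E += slope * dH_dE
        S_L += slope * dH_dL
        I_EE += weight * dH_dE ** 2
        I_EL += weight * dH_dE * dH_dL
        I_LL += weight * dH_dL ** 2
    return (S_E, S_L), (I_EE, I_EL, I_LL)


def fisher_scoring(samples, E, L, tol=1e-10, max_iter=50):
    """Repeat the corrections until both are smaller than tol.

    Returns the estimates and the inverse information (I^EE, I^EL, I^LL)
    from the last cycle, which approximates the covariance matrix.
    """
    for _ in range(max_iter):
        (S_E, S_L), (I_EE, I_EL, I_LL) = scores_and_information(samples, E, L)
        det = I_EE * I_LL - I_EL ** 2
        inv_EE, inv_EL, inv_LL = I_LL / det, -I_EL / det, I_EE / det
        dE = inv_EE * S_E + inv_EL * S_L
        dL = inv_EL * S_E + inv_LL * S_L
        E, L = E + dE, L + dL
        if abs(dE) < tol and abs(dL) < tol:
            break
    return E, L, (inv_EE, inv_EL, inv_LL)
```

A call such as `fisher_scoring(samples, 0.8, 1.1)` starts the iteration from trial values; like any scoring scheme, the sketch has no safeguard against poor starting values, which is one reason to choose them from a preliminary survey of the likelihood surface.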

This procedure is repeated until the corrections become as small as we choose. The covariance matrix for the final estimates will be approximately $I^{-1}$ in large samples. The variance of $\hat{E}$ is approximately $I^{EE}$ and that of $\hat{L}$ is approximately $I^{LL}$. The maximum likelihood estimate of W is $\hat{W} = \hat{E}\hat{L}$. The variance of $\hat{W}$ is approximately

$$\operatorname{var}\hat{W} = \hat{L}^2 I^{EE} + \hat{E}^2 I^{LL} + 2\hat{E}\hat{L}\,I^{EL}.$$

The problems of averaging sets of $\hat{E}$ and $\hat{L}$ from different populations, or of comparing them, are identical to the general problems of combining or comparing estimates of selective values from different populations. The necessary algebraic procedures are discussed in detail by DUMOUCHEL and ANDERSON (1968), and their application to data is illustrated by ANDERSON et al. (1968).
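The approximate variance of the estimate of W can be obtained from a first-order expansion of the product of the two estimates about the true values. The short derivation below is a sketch added for clarity; the symbols $\delta E$ and $\delta L$ for the errors of estimation are notation assumed here.

```latex
\hat{W} = \hat{E}\hat{L} = (E + \delta E)(L + \delta L) \approx EL + L\,\delta E + E\,\delta L ,
\qquad
\operatorname{var}\hat{W} \approx L^{2}\operatorname{var}\hat{E} + E^{2}\operatorname{var}\hat{L}
  + 2EL\,\operatorname{cov}(\hat{E},\hat{L})
  = L^{2} I^{EE} + E^{2} I^{LL} + 2EL\, I^{EL} .
```

In practice E and L on the right-hand side are replaced by their estimates. Because the covariance between the two estimates is typically negative, the last term offsets much of the first two, so the result is sensitive to rounding in the covariance.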

Example 3. WALLACE (1963) maintained an experimental population of Drosophila melanogaster which contained a lethal allele at the It locus on chromosome 2. The generations were discrete, and samples of adult males were taken from the population and their genotypes determined every generation for ten generations. WALLACE'S data and my estimates of E, L, W and their standard errors are given in Table 2. Starting with a lethal heterozygote frequency of 0.5, the frequencies expected with these estimates were generated and compared with those observed. The expected frequencies and the chi-squares for goodness-of-fit of the observed and expected frequencies of both genotypes are given in Table 2. The estimates do provide a tolerable fit to the observations; notice that a major part of the chi-square comes from the one sample at generation four. The observed and expected frequencies of the genotypes are graphed in Figure 3.


TABLE 2

Components of selection in WALLACE’S experimental population

Generation    N_t      Observed H_t    Expected H_t    Chi-square
    0         ..          1.000           ....
    1         227         0.568           0.623           2.88
    2          97         0.454           0.441           0.21
    3         106         0.378           0.335           0.89
    4         130         0.376           0.265           8.23
    5         145         0.180           0.216           1.11
    6         199         0.170           0.180           0.13
    7         183         0.169           0.152           0.21
    8         191         0.130           0.130           0.00
    9         19[?]       0.108           0.112           0.04
   10         197         0.082           0.098           0.56

$\hat{E} = 0.825 \pm 0.082$
$\hat{L} = 1.112 \pm 0.145$
$\hat{W} = 0.916 \pm 0.034$
$\operatorname{Cov}(\hat{E}, \hat{L}) = -0.012$

N_t = number of individuals observed at generation t. H_t = relative frequency of lethal heterozygotes in sample at generation t.

Total chi-square for goodness-of-fit of both genotypes is 14.24 with 8 degrees of freedom; 0.05 < P < 0.1.
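As a worked illustration of the chi-square column, the contribution of a single sample can be recomputed from the tabulated values. The form below assumes the usual two-class goodness-of-fit statistic summed over both genotypes; since the tabulated inputs are rounded, agreement can be expected only to about two decimal places.

```latex
\chi^{2}_{t}
  = \frac{N_t\,(H_{\mathrm{obs}} - H_{\mathrm{exp}})^{2}}{H_{\mathrm{exp}}}
  + \frac{N_t\,(H_{\mathrm{obs}} - H_{\mathrm{exp}})^{2}}{1 - H_{\mathrm{exp}}}
  = \frac{N_t\,(H_{\mathrm{obs}} - H_{\mathrm{exp}})^{2}}{H_{\mathrm{exp}}\,(1 - H_{\mathrm{exp}})},
\qquad
\chi^{2}_{4} = \frac{130\,(0.376 - 0.265)^{2}}{0.265 \times 0.735} \approx 8.22 .
```

This agrees with the value of 8.23 listed for generation four to within rounding. The 8 degrees of freedom correspond to the ten samples less the two estimated parameters, E and L.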

It is clear that the lethal is partially dominant and will be eliminated faster than if only the lethal homozygotes were selected against. The selection is mostly a differential viability from the egg to the adult stages, there being little evidence for differential mating activity or differential fecundity. It is important to remember that in estimating the components of selection we assumed that they were alike in the sexes; if they are not, then we shall have estimated an average viability and an average fertility.

FIGURE 3. Observed and expected heterozygote frequencies in WALLACE’S population. [Plot of heterozygote frequency H_t (vertical axis, 0.1 to 1.0) against generation (horizontal axis, 0 to 10), with separate symbols for observed and expected values.]

A fuller explanation of the statistical procedure is perhaps in order here. The likelihood equations are quite complex, and it is not clear that they should have a unique solution set. In particular, the likelihood function might contain relative maxima to which our estimates could converge. I have therefore included in the computer program a preliminary analysis in which the scores and the chi-squares for goodness-of-fit are calculated for all possible combinations of E and L from 0.1 to 2.0, in increments of 0.1. These four hundred computations give a “surface” of scores, and a “surface” of chi-squares over a significant part of the E, L “plane.” For WALLACE’S data and for the other five sets of data I have analyzed there is a single minimum of these surfaces (corresponding to a maximum on the likelihood surface), and it is in the same region for the scores and for the chi-squares. The best three or four sets of E and L were fed into the maximum likelihood scoring program; in all cases they converged to the same values. The maximization of the likelihood has always corresponded to a minimization of the chi-square. Convergence is rapid; in nine iterations the scores for WALLACE’S data were reduced to $S_E = 0.00009$ and $S_L = 0.00005$. The estimates of E and L did not change in the third decimal place after only three iterations.
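The preliminary survey of the E, L plane can be sketched as follows. This Python fragment is only an illustration under stated assumptions: the function names are hypothetical, only the chi-square surface is computed (the surface of scores is omitted), and the expected frequencies come from the all-heterozygote founding formula with W = EL.

```python
def het_freq(i, E, L):
    """Expected heterozygote frequency H(i), with W = E * L."""
    W = E * L
    top = 2.0 * E * W ** (i - 1)
    return top / (top + sum(W ** k for k in range(i)))


def chi_square(samples, E, L):
    """Goodness-of-fit chi-square summed over both genotypes and all samples.

    samples: iterable of (i, n, h_obs) = generation, sample size and
    observed heterozygote frequency.
    """
    total = 0.0
    for i, n, h_obs in samples:
        h_exp = het_freq(i, E, L)
        total += n * (h_obs - h_exp) ** 2 / (h_exp * (1.0 - h_exp))
    return total


def grid_search(samples, keep=4):
    """Evaluate the chi-square at E, L = 0.1, 0.2, ..., 2.0.

    Returns the `keep` smallest (chi_square, E, L) triples and the number
    of grid points examined.
    """
    grid = [k / 10 for k in range(1, 21)]
    surface = [(chi_square(samples, E, L), E, L) for E in grid for L in grid]
    surface.sort()
    return surface[:keep], len(surface)
```

The few best grid points returned by `grid_search` would then serve as trial values for the scoring iteration; agreement of the final estimates from several starting points is the check against convergence to a relative maximum.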

THE RELIABILITY OF THE LARGE-SAMPLE FORMULAS FOR THE VARIANCES

In order to make the most effective use of the estimates of the selective value and its components, it is necessary to have some way of establishing their reliability. The large-sample formulas for the variances and covariances were derived for this purpose. What statistical theory tells us is that these formulas will be asymptotically approached as the sample sizes grow larger and larger, and that the formulas are true to a close approximation in “large” samples. In “large” samples the estimates will be normally distributed. But how large is “large”? The sample size required for the approximate formulas to be accurate depends on the particular situation; what is large for one type of experiment may be small for another. Do the large-sample formulas for the variances and covariances accurately measure the reliability of the estimates of E, L and W? Computer simulations were undertaken in order to decide this point.


A simplified flow diagram for the program to test the formulas for the variances of E and L as estimated over many generations is given in Figure 4. Beginning with an initial heterozygote frequency of 0.5, and using two sets of E and L, one heterotic and the other not, heterozygote frequencies among adults after “E” selection but before “L” selection were generated for ten generations with recurrence relation 15. A pseudo-random number generator was used, as shown in Figure 4, to simulate the random selection of either 100 or 200 individuals from populations with these true heterozygote frequencies. The “perturbed” heterozygote frequencies should differ from the true heterozygote frequencies with the binomial sampling errors expected in random samples from populations with two classes of individuals. Each resulting sequence of ten perturbed heterozygote frequencies was used to estimate an E and an L. This procedure was followed for 100 simulations with the two sets of selective values and with sample sizes in each generation of 100 or 200 individuals. The 100 estimates of E and L for each set of true selection components and for each sample size were used to calculate a mean $\hat{E}$, a mean $\hat{L}$, their variances about the true values, and the covariance between $\hat{E}$ and $\hat{L}$. The mean $\hat{E}$ and mean $\hat{L}$ will indicate how accurately the true values are estimated and whether or not there is a serious bias in the estimation. The empirical variances of the estimated E’s and L’s can be compared with the values from the formulas for the asymptotic sampling variances to see whether or not the formulas give close approximations. The ratio

$$\frac{\text{empirical variance}}{\text{sampling variance}} = \frac{s^2}{\sigma^2}$$

will be distributed as $\chi^2_N / N$, where N is the number of degrees of freedom. The ratios may be directly referred to a table of $\chi^2_N / N$.

The results of simulations are compared with the values calculated by maximum likelihood theory in Table 3. With the sample size of 100, the empirical variance of the L component was significantly larger than the asymptotic sampling variance for one set of selection components; no other comparison approached statistical significance. With 200 individuals per sample, the empirical variance was in no case significantly larger than the sampling variance. For one case, in fact, the empirical variance of the E component was significantly smaller than the sampling variance, although the ratio of the two was not far from unity; no special importance is attached to this one case. The means of the estimated E’s and L’s were quite close to the true values. The distributions of the E’s and L’s estimated from the simulations were examined. A chi-square test of goodness-of-fit, using ten equiprobable intervals about the mean, was employed to test the fit to a normal curve. For samples of size 100 the fit was good. But for samples of size 200 the distributions of $\hat{E}$ and $\hat{L}$ from the simulations were narrower than the normal distribution, and in some cases they were slightly skewed. The simulations suggest that confidence intervals based on the sampling formulas will be satisfactory indicators of reliability. They will be, if anything, slightly conservative.
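The sampling step of such a simulation, and the ratio of empirical to sampling variance, can be illustrated with a short Python sketch. It is a simplified stand-in, not the author's program: the names are hypothetical, and the ratio is formed for the sampled heterozygote frequency itself (whose sampling variance is the binomial H(1 - H)/n) rather than for the estimates of E and L. Each individual is drawn by comparing a uniform pseudo-random number with the true heterozygote frequency.

```python
import random


def perturbed_frequency(true_h, n, rng):
    """Sample n individuals one at a time; an individual is scored as a
    heterozygote when a uniform random number on [0, 1) falls below true_h."""
    heterozygotes = sum(1 for _ in range(n) if rng.random() < true_h)
    return heterozygotes / n


def variance_ratio(true_h, n, replicates, seed=1969):
    """Return the mean sampled frequency and the ratio of the empirical
    variance about the true value to the binomial sampling variance."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    draws = [perturbed_frequency(true_h, n, rng) for _ in range(replicates)]
    mean_h = sum(draws) / replicates
    s2 = sum((h - true_h) ** 2 for h in draws) / replicates
    sigma2 = true_h * (1.0 - true_h) / n
    return mean_h, s2 / sigma2


if __name__ == "__main__":
    print(variance_ratio(0.3, 100, 100))
```

Because the variance is taken about the true value, the ratio has as many degrees of freedom as there are replicates, and with 100 replicates it is expected to scatter around unity in the manner of $\chi^2_{100}/100$.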


FIGURE 4. Flow diagram of the computer program to investigate the accuracy in estimating E and L, and to investigate the validity of large-sample formulas for the variances and covariances of E and L with samples of 100 and 200 individuals. [Flow diagram; the legible steps, in order, are: (1) start with the components of selection; (2) compute “perfect” genotype frequencies H(K) for 10 generations, using formula 15; (3) calculate the asymptotic sampling variances and the covariance of E and L by maximum likelihood theory; (4) simulate a population which undergoes selection for 10 generations, beginning from H(0) = 0.5; (5) test a uniformly distributed pseudo-random number RN against H(K), a process equivalent to drawing one individual at random from an infinitely large population with heterozygote frequency H(K), and repeat until the sample is complete; (6) estimate E(K) and L(K) by the maximum likelihood scoring technique and store them for future use; (7) simulate another population until all have been simulated; (8) calculate the means of the estimates and their variances and covariance from the simulations; (9) compare the true E and L with the estimates from the simulations, and compare the sampling variances and covariance with the empirical variances and covariance from the simulations.]


TABLE 3

The average $\hat{E}$ and $\hat{L}$, their variances, and their covariance as estimated in 100 populations simulated by computer, compared with the true E and L and their asymptotic sampling variances and covariance. Estimation from genotype frequencies among adults in 10 generations; populations begun with heterozygote frequency of 0.5.

                    Asymptotic sampling values                 Empirical values from simulated populations           Empirical variance /
                                                     Sample                                                           sampling variance
True E    True L    Var E      Var L      Cov(E,L)   size      Mean E    Mean L    Var E      Var L        Cov(E,L)    E        L
0.8247    1.1122    0.01070    0.03467    -0.01893   100       0.8168    1.1436    0.01047    0.040[?]7    -0.01986    0.98     1.17
0.8247    1.1122    0.00535    0.01733    -0.00946   200       0.8092    1.1541    0.00518    0.01968      -0.00983    0.97     1.14
1.2000    1.1000    0.02751    0.05631    -0.03871   100       1.1839    1.1692    0.02981    0.08146      -0.04681    1.08     1.50*
1.2000    1.1000    0.01375    0.02816    -0.01935   200       1.1825    1.1354    0.00975    0.02336      -0.01443    0.71*    0.83

* Significant at 0.05 level.


To test the formula for the variance of W estimated over a single generation, two pairs of heterozygote frequencies were chosen: 0.5 and 0.3, and 0.548 and 0.472. The pairs of numbers represent heterozygote frequencies among the adults hatching from egg samples taken one generation apart. The first set corresponds to a non-heterotic W of about 0.55 and the second, a heterotic W of 1.3. A pseudo-random number generator was used, as before, to simulate the random selection of either 100 or 200 individuals from populations with the true heterozygote frequencies cited above. One hundred pairs of heterozygote frequencies were generated for each true W and for each sample size to represent 100 sample pairs perturbed with binomial sampling error. Each sample pair was used to estimate a W, and the 100 estimates for each true W at each sample size were used to obtain a mean $\hat{W}$ and the empirical variance about the true W. The results of the simulations are given in Table 4. With 100 individuals in the sample, the ratio of empirical variance to sampling variance was close to unity for one of the W's but significantly larger than unity for the other. With a sample size of 200, the ratios (empirical variance/sampling variance) were very close to one; the differences were statistically not at all significant. The means of the estimated W's were close to the true values. The distributions of the W's estimated from the simulations were noticeably narrower than the normal distribution and were somewhat skewed. The confidence intervals calculated by sampling theory will likely give a satisfactory idea of the reliability of the estimates.
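The correspondence between a pair of heterozygote frequencies and the true W can be checked numerically. The Python sketch below is an illustration under assumptions that are not taken from this section: the function name is hypothetical, mating is random, heterozygotes contribute to the next generation at rate W relative to NL/NL, and both frequencies refer to viable zygotes, so that the lethal gamete frequency is q = WH / [2(1 - H + WH)] and the next heterozygote frequency is H' = 2q / (1 + q). Solving these two relations for W gives the expression coded here.

```python
def w_from_pair(h_t, h_next):
    """Selective value W of heterozygotes implied by heterozygote
    frequencies h_t and h_next one generation apart.

    Derived from q = W*h_t / (2*(1 - h_t + W*h_t)) and
    h_next = 2*q / (1 + q); valid for 0 < h_t < 1 and 0 <= h_next < 2/3.
    """
    if not (0.0 < h_t < 1.0) or not (0.0 <= h_next < 2.0 / 3.0):
        raise ValueError("frequencies outside the range of the relation")
    return 2.0 * h_next * (1.0 - h_t) / (h_t * (2.0 - 3.0 * h_next))


if __name__ == "__main__":
    print(w_from_pair(0.500, 0.300))
    print(w_from_pair(0.548, 0.472))
```

Under these assumptions the two pairs of frequencies used in the simulations give values close to 0.5455 and 1.333, the true W's listed in Table 4.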

Thus, the computer simulations for both statistical procedures, the estimation of E and L over many generations and the estimation of W over one generation, indicate the validity of the formulas for the asymptotic sampling variances and covariances when as few as 200 individuals are included in each sample. Samples between 100 and 200 are probably adequate. The simulations uncovered no evidence of serious bias in the estimation of E and L or of W. It should be noted that usually 200 or more individuals are sampled in experiments of the type for which these statistical analyses are designed.

TABLE 4

The average $\hat{W}$ and its variance as estimated in 100 populations simulated by computer, compared with the true W and its asymptotic sampling variance. W estimated over one generation from samples among newly-formed zygotes.

True heterozygote                 Sampling                 Mean W            Empirical        Empirical variance /
frequencies                       variance      Sample     estimated from    variance of      sampling variance
Time t    Time t+1    True W      of true W     size       simulations       estimated W's
0.500     0.300       0.5455      0.0349        100        0.5388            0.0392           1.12
0.500     0.300       0.5455      0.0174        200        0.5573            0.0204           1.17
0.548     0.472       1.333       0.3050        100        1.427             0.5232           1.72*
0.548     0.472       1.333       0.1525        200        1.413             0.1633           1.07

* Significant at 0.05 level.


I am particularly grateful to Mr. ROLLIN RICHMOND for his assistance with the computer work. Drs. F. AYALA, M. DOWLER, TH. DOBZHANSKY, J. SVED, and W. WATT offered helpful criticism on many points, for which I am most grateful. Dr. S. POLIVANOV very kindly allowed me to cite some of our unpublished work.

SUMMARY

The deterministic theory of selection on an autosomal lethal allele is reviewed and extended, toward the end of investigating in detail the mechanisms of natural selection with this special case. A general equation for the frequency of the lethal at any time is formulated in terms only of the initial frequency, the selective values, and the generations elapsed. Whether a stable equilibrium or elimination will result is discussed, and the rate of convergence to these terminal states is given. A simple graphical means for visualizing the selection on a lethal allele is presented. A method for measuring the selection which acts within each generation in an experimental population is developed as a means of studying the variation of selection with time, or gene frequency, or some other factor of interest. A method is developed for partitioning the total selective value into components which are largely viability and fertility. This partition is accomplished from observations on experimental populations themselves, without the inaccuracies involved in separate experiments to determine each component. The reliability of the estimates of selection under both methods of analysis is assessed and formulas for the large-sample variances are given. Computer simulations indicate that the large-sample formulas are valid with the sample sizes usually employed in experiments with lethal genes. Data are analyzed to illustrate each concept.

LITERATURE CITED

ANDERSON, W. W., C. OSHIMA, T. WATANABE, TH. DOBZHANSKY, and O. PAVLOVSKY, 1968 Genetics of natural populations. XXXIX. A test of the possible influence of two insecticides on the chromosomal polymorphism in Drosophila pseudoobscura. Genetics 58: 423-434.

BAILEY, N. T. J., 1961 Maximum likelihood scoring. Appendix 1 to Introduction to the Mathematical Theory of Genetic Linkage. Clarendon Press, Oxford.

BARTLETT, M. S., 1949 Fitting a straight line when both variables are subject to error. Biometrics 5: 207-212.

BODMER, W. F., 1965 Differential fertility in population genetics models. Genetics 51: 411-424.

DUMOUCHEL, W. H., and W. W. ANDERSON, 1968 The analysis of selection in experimental populations. Genetics 58: 435-449.

FISHER, R. A., 1922 On the dominance ratio. Proc. Roy. Soc. Edinburgh 42: 399-433.

L’HÉRITIER, P. H., and G. TEISSIER, 1934 Une expérience de sélection naturelle. Courbe d'élimination du gène "bar" dans une population de Drosophiles en équilibre. Compt. Rend. Soc. Biol. Paris 117: 1049-1051.

JENNINGS, H. S., 1916 The numerical results of diverse systems of breeding. Genetics 1: 53-89.

LI, C. C., 1959 Notes on the relative fitness of genotypes that form a geometric progression. Evolution 13: 564-567.

PROUT, T., 1965 The estimation of fitnesses from genotypic frequencies. Evolution 19: 546-551.

RAO, C. R., 1965 Linear Statistical Inference and Its Applications. Wiley, New York.

SANDLER, L., and E. NOVITSKI, 1957 Meiotic drive as an evolutionary force. Am. Naturalist 91: 105-110.

TEISSIER, G., 1942 Persistance d'un gène léthal dans une population de Drosophiles. Compt. Rend. Acad. Sci. 214: 327-330.

TEISSIER, G., 1944 Equilibre des gènes léthaux dans les populations stationnaires panmictiques. Revue Scientifique, Paris: 82e année, fasc. 3, pp. 145-159.

WALLACE, B., 1963 The elimination of an autosomal lethal from an experimental population of Drosophila melanogaster. Am. Naturalist 97: 65-66.

