Edinburgh Research Explorer

Mapping Mendelian traits in asexual progeny using changes in marker allele frequency

Citation for published version: Logeswaran, S & Barton, NH 2011, 'Mapping Mendelian traits in asexual progeny using changes in marker allele frequency', Genetics Research, vol. 93, no. 3, pp. 221-232. https://doi.org/10.1017/S0016672311000115

Digital Object Identifier (DOI): 10.1017/S0016672311000115

Link: Link to publication record in Edinburgh Research Explorer

Document Version: Publisher's PDF, also known as Version of record

Published In: Genetics Research

Publisher Rights Statement: © Cambridge University Press 2011

General rights
Copyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s) and / or other copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights.

Take down policy
The University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorer content complies with UK legislation. If you believe that the public display of this file breaches copyright please contact [email protected] providing details, and we will remove access to the work immediately and investigate your claim.

Download date: 25. Feb. 2021


Mapping Mendelian traits in asexual progeny using changes in marker allele frequency

SAYANTHAN LOGESWARAN1* AND NICK H. BARTON1,2

1 Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, West Mains Road, Edinburgh EH9 3JT, UK
2 IST Austria, Am Campus 1, Klosterneuburg 3400, Austria

(Received 12 March 2010 and in revised form 9 February 2011)

Summary

Linkage between markers and genes that affect a phenotype of interest may be determined by examining differences in marker allele frequency in the extreme progeny of a cross between two inbred lines. This strategy is usually employed when pooling is used to reduce genotyping costs. When the cross progeny are asexual, the extreme progeny may be selected by multiple generations of asexual reproduction and selection. We analyse this method of measuring phenotype in asexual progeny and examine the changes in marker allele frequency due to selection over many generations. Stochasticity in marker frequency in the selected population arises due to the finite initial population size. We derive the distribution of marker frequency as a result of selection at a single major locus, and show that in order to avoid spurious changes in marker allele frequency in the selected population, the initial population size should be in the low to mid hundreds.

Introduction

Methods to map alleles responsible for variation in a particular trait rely on detecting linkage between known marker alleles and the trait (Sax, 1923; Thoday, 1961). In experimental crosses, linkage is inferred from statistical correlations between marker and phenotype in the progeny of a cross between two inbred lines that differ in trait value (Broman, 2001). In order to achieve reasonable power in detecting linkage, large numbers of cross progeny need to be genotyped and phenotyped. Consequently, this procedure can be very time consuming and expensive. Selective genotyping (Lander & Botstein, 1989; Darvasi & Soller, 1992) reduces time and costs by only analysing cross progeny with extreme phenotype, as these individuals provide the most linkage information. When analysing only the extreme progeny, one can use changes in marker allele frequency in the selected group to infer linkage (Lebowitz et al., 1987). Markers that are linked to alleles that influence the trait should change in frequency in the selected group, whereas the frequency of unlinked markers should remain unchanged.

This strategy of using change in marker allele frequency to detect linkage is usually employed when DNA pooling is used. Rather than individually genotyping each progeny in the selected group, DNA is pooled from all individuals in the group and marker frequencies are estimated from the intensities of marker bands (or similar signals) in the pooled DNA. This further reduces time and costs. This method is often referred to as bulk segregant analysis (Michelmore et al., 1991) or selective DNA pooling (Darvasi & Soller, 1994).

The other main occasion when marker frequencies are used to detect linkage is in artificial selection experiments, where two lines are divergently selected (Keightley & Bulfield, 1993; Nuzhdin et al., 1998, 2007). This strategy is used for quantitative traits, where the aim is to have many generations of sexual reproduction and selection, so that much greater phenotypic variation is produced than is present in an F2 or backcross population. The more extreme phenotypes generated result in larger differences in marker allele frequencies between the two lines, making it easier to detect linkage. A further advantage of this method is that the increased number of recombination events (due to the several generations of sexual reproduction) may result in greater mapping resolution.

* Corresponding author: Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, West Mains Road, Edinburgh EH9 3JT, UK. e-mail: [email protected]

Genet. Res., Camb. (2011), 93, pp. 221–232. © Cambridge University Press 2011. doi:10.1017/S0016672311000115


In all these methods that use change in allele frequency to detect linkage, one must measure the phenotype of the progeny in the F2/backcross population or in each generation in an artificial selection experiment to pick out the tails of the phenotypic distribution. In most studies, the cross progeny are sexual and the phenotype is measured in standard ways. However, when the cross progeny are asexual one can use selection to measure the phenotype. Artificially selecting the asexual cross progeny over many generations is equivalent to picking out the tail of the phenotypic distribution of sexual progeny in a single generation. The longer one selects the asexual progeny (and the larger the initial population), the more extreme the tail of the phenotypic distribution that is selected. This method has recently been used in gene mapping studies in microbes.

One such method is array-assisted bulk segregant analysis (Brauer et al., 2006), which has been used to map traits in yeast. Here, yeast strains differing in genetic background and trait value are crossed. The resulting asexual progenies are individually measured by selecting for the trait over a number of generations. A group of the selected individuals is then pooled to detect linkage. In this particular method, the allele frequencies are estimated by hybridizing the pooled DNA to a microarray.

When using this strategy in asexual cross progeny, one could also measure the phenotype directly within a pool of recombinant progeny. That is, rather than individually selecting each asexual recombinant and then pooling, one could pool the cross progeny together at the start and then select for the trait directly on this pooled progeny. The selected pool is then used to detect linkage. An example of this strategy is Linkage Group Selection (Culleton et al., 2005; Martinelli et al., 2005), which has been used to map genes in malaria parasites. Here, once again, malaria parasites with differing genetic background and trait value are crossed. The resulting asexual cross progenies are pooled and selected for the trait for many generations. Linkage is then determined by estimating changes in marker allele frequency from the selected pool. Similar strategies have been used in studies of yeast (Segre et al., 2006; Ehrenreich et al., 2010).

When using this method in asexual progeny, it is important to ensure that the changes in marker allele frequency in the selected pool are due to linkage to a selected allele and not just a result of random drift. Previous models (Lebowitz et al., 1987; Kim & Stephan, 1999) that have dealt with changes in marker allele frequency in gene mapping experiments have focused on artificial selection experiments in sexual progeny and examined changes in marker frequency as a result of several generations of sexual reproduction and selection. In this paper, however, we provide the basic theoretical framework for the strategy of picking out the extreme individuals in pooled asexual progeny by selecting for the trait over many generations. We concentrate on Mendelian traits and derive the distribution of marker frequency in a selected pool as a result of selection at just a single major locus. We show from this how large the initial population size should be in order to avoid spurious changes in marker allele frequency.

Theory

Model

A cross is made between two haploid lines that differ in trait value. This cross results in $N$ haploid recombinant progenies, each containing a random assortment of marker alleles from the parental lines, with each marker having an expected frequency of 0.5. We will concentrate on the simplest situation of a binary trait where the variation in phenotype between the two lines is due to just one major locus. A fitness advantage is assigned to the recombinants that contain the positive allele (i.e. the allele that increases the value of the trait), and so the initial population consists of two fitness classes. This recombinant population is then selected for the trait over many generations. As this population is asexual, no further recombination takes place during this multi-generation selection phase. It is assumed that selection is applied for long enough that only recombinants originating from the fitter class remain in the final population. Therefore, the positive allele should be fixed in the selected population, and because there is only one round of recombination, markers in a large region around the selected locus should also be at a higher frequency. The frequencies of markers in all other regions of the genome are expected to remain unchanged. So, from this model, we are interested in analysing the frequency of all markers in the selected population, and the stochasticity that arises in this frequency due to finite population size.

Deterministic expectation

If selection is continued until the fitter class of recombinants fixes in the population, then the selected allele will be at frequency 1. The expected frequency of all other markers in the selected population would be equal to the probability that the marker in question was on the same genotype as the selected allele in the initial population. For the positive markers (fitter parental markers), this probability would simply be $1 - r$, and for the negative markers (less fit parental markers) it would be just $r$, where $r$ is the probability of recombination between the selected allele and the marker in question.
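The deterministic expectation can be evaluated in a few lines. This is a minimal Python sketch that assumes the Haldane map function quoted in the Fig. 1 caption, $r = \tfrac{1}{2}(1 - e^{-2x})$, to convert a map distance $x$ in Morgans into $r$; the function names are hypothetical.

```python
import math


def haldane_r(x):
    """Recombination probability for a map distance x in Morgans (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * x))


def expected_marker_frequency(x, positive=True):
    """Deterministic expected frequency in the selected population of a marker at
    map distance x from the selected locus: 1 - r for a positive (fitter parental)
    marker and r for a negative (less fit parental) marker."""
    r = haldane_r(x)
    return 1.0 - r if positive else r


for x in (0.0, 0.1, 0.5, 2.0):
    print(f"x = {x:.1f} M: positive {expected_marker_frequency(x):.3f}, "
          f"negative {expected_marker_frequency(x, positive=False):.3f}")
```

As $x$ grows, $r$ approaches 1/2, so markers far from the selected locus stay near their initial expected frequency of 0.5.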


Stochastic distribution

With an infinite number of recombinants, the marker frequency will approach the deterministic expectation, but finite numbers will lead to variation around this expectation. In the extreme, suppose there was just one recombinant with the positive allele in the initial population. The typical marker composition of this recombinant will look like one of those given in Fig. 1a. As this single recombinant is the fittest in the initial population, selection (if applied for long enough) will pick out only its descendants. Therefore, all recombinants in the selected population will have exactly the same marker composition. Hence, the final marker frequencies will look like those in Fig. 1b, where a marker is either fixed or not present at all. With more than one recombinant with the positive allele present in the initial population, there will initially be much more diversity in the marker composition, but this diversity may not be reflected in the final population. For example, suppose there were 10 initial recombinants with the positive allele, each with a different marker composition. Again, selection will pick out only the descendants originating from these 10 initial recombinants. However, the number of descendants that each recombinant actually leaves may be highly random. One may leave no descendants in the final population, while another may leave hundreds. Consequently, some markers will be over-represented in the selected population, which, as can be seen from Fig. 1c, results in a very random pattern of marker frequency. This randomness is reduced by increasing the number of recombinants with the positive allele in the initial population, which results in a more balanced representation of all markers in the selected population. It can be seen from Fig. 1d that with this increase in the number of recombinants with the positive allele in the initial population, the marker frequencies approach the deterministic expectation, enabling much easier identification of the selected locus.

[Figure 1: (a) marker composition of ten recombinant genomes against marker position (Morgans); (b)–(d) positive marker frequency (0–1) against marker position (Morgans, 0–10).]

Fig. 1. (a) Each line represents the typical marker composition of a single recombinant with a selected allele at position 4 on the genome, represented by a circle. The black parts represent the fitter parental markers (positive markers) and the grey parts represent the less fit parental markers (negative markers). (b) Plot of the positive marker frequency in the selected population when there is just a single recombinant (the first genome in (a)) in the initial population. (c) The black and grey curves show two replicates of the positive marker frequencies in the selected population when all ten recombinants from the first graph are present in the initial population. It can be seen that the two replicates do not give the same frequencies. This reflects the random number of descendants each recombinant left in each replicate. (d) This shows the frequency of the positive markers in the selected population when there are 100 recombinants with the positive allele in the initial population. In (b), (c) and (d) the dotted curve represents the deterministic expectation for the positive marker frequency, which is $1 - r$, where $r$ is calculated from the Haldane map function $r = \tfrac{1}{2}(1 - e^{-2x})$, and $x$ is the map distance between the marker and selected locus.

So, in order to evaluate how much stochasticity in the marker frequency would be expected for a certain initial population size, we will next derive the distribution of the marker frequency in the selected population. From this, it is possible to calculate how large the initial population size needs to be in order to avoid spurious changes in marker frequency, and also work out the probability of getting false positives when we do have large stochasticity in frequency.

Branching process

To derive the distribution of marker frequency, the distribution of the number of descendants originating from a single recombinant needs to be obtained. This can be modelled as a branching process. That is, at each generation each selected recombinant leaves a number of offspring $\xi$, with mean $\mu$ and variance $\sigma^2$. This process can be modelled by the probability generating function $f(z) = \sum_{k=0}^{\infty} P_k z^k$, where $P_k$ is the probability that $\xi = k$. This represents the offspring distribution of a single recombinant for a single generation. This can be extended to get the offspring distribution after $t$ generations by $t$ iterations of $f(z)$, that is, $f_t(z) = f(f(\cdots f(z)\cdots))$. So, if we let $X$ denote the number of descendants originating from a single recombinant after $t$ generations, we have that $X$ has distribution $f_t(z)$. Obtaining probabilities from $f_t(z)$, however, can be computationally intensive, so instead just the moments of $X$ will be outlined. From the properties of generating functions, the mean $E(X)$ and variance $\mathrm{Var}(X)$ of the number of descendants originating from a single recombinant after $t$ generations are given by (1) and (2) (Jagers, 1975):

$$E(X) = \mu^{t}, \tag{1}$$

$$\mathrm{Var}(X) = \sigma^2 \mu^{t-1} \frac{\mu^{t} - 1}{\mu - 1}. \tag{2}$$
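Equations (1) and (2), and the survival probability $P_S = 1 - f_t(0)$ used later in the text, can be checked numerically. The Python sketch below assumes a Poisson offspring distribution (as in the simulations for Figs 2 and 3), so that $f(z) = e^{\mu(z-1)}$ and $\sigma^2 = \mu$; the function names are illustrative only. The variance recursion is the standard conditional-variance identity for a Galton–Watson process, included purely as a cross-check of (2).

```python
import math


def offspring_moments(mu, sigma2, t):
    """Mean and variance of X, the number of descendants of one recombinant after
    t generations of a Galton-Watson branching process: eqns (1) and (2)."""
    mean = mu ** t
    if mu == 1.0:
        var = sigma2 * t  # limit of eqn (2) as mu -> 1
    else:
        var = sigma2 * mu ** (t - 1) * (mu ** t - 1.0) / (mu - 1.0)
    return mean, var


def variance_by_recursion(mu, sigma2, t):
    """Cross-check of eqn (2) via Var(X_{k+1}) = sigma2*E(X_k) + mu^2*Var(X_k)."""
    mean, var = 1.0, 0.0
    for _ in range(t):
        mean, var = mu * mean, sigma2 * mean + mu ** 2 * var
    return var


def survival_probability(mu, t):
    """P_S = 1 - f_t(0), found by iterating the generating function t times.
    Assumes Poisson offspring, for which f(z) = exp(mu * (z - 1))."""
    z = 0.0
    for _ in range(t):
        z = math.exp(mu * (z - 1.0))
    return 1.0 - z


print(offspring_moments(2.0, 2.0, 3))  # (8.0, 56.0)
print(survival_probability(1.2, 20), survival_probability(3.0, 10))
```

With Poisson offspring, `survival_probability(1.2, 20)` and `survival_probability(3.0, 10)` should come out close to the values $P_S = 0.32$ and $0.94$ quoted for Fig. 3a, b.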

Moments

Using (1) and (2) it is possible to obtain the mean, variance and covariance of the number of copies of each marker in the selected population. Consider an initial population of size $N$ and a single marker $m$. Let $S_m$ be the number of copies of that marker in the selected population. We have that $S_m = \sum_{i=1}^{n} X_i$, where $n$ is a random variable representing the initial number of recombinants that had marker $m$. Expressions for the moments of $S_m$ are derived in the Appendix. Now, let $F_m = S_m/S_t$ be the frequency of marker $m$, where $S_t$ is the total number of recombinants in the selected population. Obtaining exact expressions for the moments of $F_m$ in the selected population is mathematically difficult, so approximations will be used instead. These approximations are given by (3)–(5). They are derived from the moments of $S_m$ and $S_t$ (derivation detailed in the Appendix). In (3) and (4), $P_m$ is the probability that marker $m$ is on the fittest genotype ($P_m = r$ if $m$ is a negative marker and $P_m = 1 - r$ if $m$ is a positive marker). In (5), $\mathrm{Cov}(F_{m_1}, F_{m_2})$ refers to the covariance in frequency between two markers $m_1$ and $m_2$, and $P_{m_1 m_2}$ is the probability that both markers $m_1$ and $m_2$ are on the fittest genotype.

$$E(F_m) = P_m, \tag{3}$$

$$\mathrm{Var}(F_m) = 2P_m(1 - P_m)\left(1 + \frac{\mathrm{Var}(X)}{E(X)^2}\right)\frac{1}{N}, \tag{4}$$

$$\mathrm{Cov}(F_{m_1}, F_{m_2}) = 2\left(P_{m_1 m_2} - P_{m_1} P_{m_2}\right)\left(1 + \frac{\mathrm{Var}(X)}{E(X)^2}\right)\frac{1}{N}. \tag{5}$$
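A minimal Python sketch of the approximations (4) and (5) follows, together with the $t \to \infty$ limit of (4) that the Fig. 4 caption uses. It assumes $\mu > 1$ so that (2) applies directly; (3) needs no code, since the mean frequency is simply $P_m$. The function names are hypothetical.

```python
def _one_plus_cv2(mu, sigma2, t):
    """1 + Var(X)/E(X)^2 from eqns (1) and (2); assumes mu != 1."""
    mean = mu ** t
    var = sigma2 * mu ** (t - 1) * (mu ** t - 1.0) / (mu - 1.0)
    return 1.0 + var / mean ** 2


def frequency_variance(p_m, n, mu, sigma2, t):
    """Approximate Var(F_m), eqn (4)."""
    return 2.0 * p_m * (1.0 - p_m) * _one_plus_cv2(mu, sigma2, t) / n


def frequency_covariance(p_m1m2, p_m1, p_m2, n, mu, sigma2, t):
    """Approximate Cov(F_m1, F_m2), eqn (5)."""
    return 2.0 * (p_m1m2 - p_m1 * p_m2) * _one_plus_cv2(mu, sigma2, t) / n


def frequency_variance_limit(p_m, n, mu, sigma2):
    """Limit of eqn (4) as t -> infinity (the expression in the Fig. 4 caption)."""
    return 2.0 * p_m * (1.0 - p_m) * (mu ** 2 + sigma2 - mu) / (n * (mu - 1.0) * mu)


# Unlinked marker (P_m = 0.5), Poisson offspring (sigma2 = mu), N = 100.
for mu in (1.2, 1.5, 3.0):
    print(mu, frequency_variance(0.5, 100, mu, mu, 20),
          frequency_variance_limit(0.5, 100, mu, mu))
```

For two markers on different chromosomes, $P_{m_1 m_2} = P_{m_1} P_{m_2}$, so (5) gives zero covariance; only markers on the same chromosome are correlated.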

Diffusion approximation

Although expressions for the moments of the number of copies of a marker and moments for the frequency of a marker have been obtained, in order to obtain a tractable expression for the distribution of these, we need to use a diffusion approximation. Diffusion theory predicts (Feller, 1951) that, starting with $n_0$ recombinants, after a long time, given that they survive, the numbers will increase as $n_0 x e^{st}$, where $0 < x < \infty$ is a measure of the acceleration relative to the expectation $n_0 e^{st}$, and its distribution is given by

$$\phi(x) = \frac{2 n_0 s\, e^{-2 n_0 s x}\, I_1\!\left(4 n_0 s \sqrt{x}\right)}{\left(e^{2 n_0 s} - 1\right)\sqrt{x}}, \tag{6}$$

where $I_1(x)$ is the modified Bessel function of the first kind and $s = \log(\mu)$. For small $n_0 s$, eqn (6) approximates to an exponential distribution. So, as an approximation, we can try to use an exponential distribution for the distribution of numbers from a single recombinant. The expected value $\lambda^{-1}$ for the exponential distribution would be the expected $x$ of a single recombinant given that its descendants have survived in the selected population. We have that the probability of survival is $P_S = 1 - f_t(0)$, and thus $\lambda^{-1} = P_S^{-1}$. Therefore, we get (7) as an approximation for the distribution of $x$:

$$\varphi(x) = P_S\, e^{-P_S x}. \tag{7}$$

It should be noted, however, that as (7) is an approximation derived from the diffusion result, which itself is an approximation of the general branching process, it is not expected that it will work well in all situations. Figure 2 shows the goodness of fit of (6) and (7) for simulated data. It can be seen that both work well for weak selection but decline in goodness of fit for strong selection. So, in the following section, we will use (7) to derive the distribution of marker frequency for situations when fitness is not too high, but, as we shall show later, for large fitness we can in most cases use a normal approximation for the distribution of frequency using the moment calculations (3)–(5).
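Both densities are easy to evaluate. The Python sketch below computes (6) with a power series for $I_1$ (to stay within the standard library) and compares it with (7). It assumes $n_0 = 1$ and $s = \log(\mu)$ with $\mu = 1.2$, as in Fig. 2, and uses $P_S = 0.32$, the value quoted for $\mu = 1.2$ in Fig. 3a; the helper names are hypothetical. As a sanity check, (6) should integrate to one, and on our reading its mean is $1/(1 - e^{-2 n_0 s})$, the reciprocal of the diffusion survival probability, just as the mean of (7) is $1/P_S$.

```python
import math


def bessel_i1(z, max_terms=200):
    """Modified Bessel function of the first kind I_1(z), summed from its power
    series sum_k (z/2)^(2k+1) / (k! (k+1)!)."""
    half = z / 2.0
    term = half  # k = 0 term
    total = term
    for k in range(1, max_terms):
        term *= half * half / (k * (k + 1))
        total += term
        if term < 1e-17 * total:
            break
    return total


def phi_diffusion(x, n0, s):
    """Eqn (6): density of the relative size x (> 0) of a surviving lineage."""
    a = n0 * s
    return (2.0 * a * math.exp(-2.0 * a * x) * bessel_i1(4.0 * a * math.sqrt(x))
            / ((math.exp(2.0 * a) - 1.0) * math.sqrt(x)))


def phi_exponential(x, p_s):
    """Eqn (7): exponential approximation with rate P_S, i.e. mean 1/P_S."""
    return p_s * math.exp(-p_s * x)


def integrate(f, upper, steps=10000):
    """Midpoint rule on (0, upper); never evaluates f at x = 0."""
    h = upper / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))


s = math.log(1.2)  # s = log(mu), n0 = 1 as in Fig. 2
print(integrate(lambda x: phi_diffusion(x, 1, s), 100.0))      # total probability, about 1
print(integrate(lambda x: x * phi_diffusion(x, 1, s), 100.0))  # mean of (6)
print(1.0 / (1.0 - math.exp(-2.0 * s)))                        # 1 / (1 - e^(-2 n0 s))
for x in (0.5, 2.0, 5.0):
    print(x, phi_diffusion(x, 1, s), phi_exponential(x, 0.32))
```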


Distribution of marker frequency

We will assume that the distribution of the number of descendants from a single recombinant, given that its descendants have survived in the selected population, is an exponential distribution with expectation $E(X)P_S^{-1}$. Now consider an initial population of size $N$ and a single positive marker $m^+$ at recombination rate $r$ from the selected locus. The number of copies of $m^+$ in the selected population is given by $S_{m^+} = \sum_{i=1}^{n_1} X_i$, where each $X_i$ is exponentially distributed and $n_1$ is a binomially distributed random variable with expectation $E(n_1) = \tfrac{1}{2} N P_S (1 - r)$. Thus, $S_{m^+}$ is distributed as $\Gamma(n_1, E(X)P_S^{-1})$, where $\Gamma$ represents a Gamma distribution (i.e. a sum of exponential distributions). So, the frequency of $m^+$ in the selected population is $S_{m^+}/(S_{m^-} + S_{m^+})$, where $S_{m^-}$ is the number of negative markers at that locus in the selected population, which has distribution $\Gamma(n_2, E(X)P_S^{-1})$, where $E(n_2) = \tfrac{1}{2} N P_S r$. Hence, conditional on $n_1$ and $n_2$, the marker frequency follows a Beta distribution $B(n_1, n_2)$. Averaging over $n_1$ and $n_2$, we get (8) as the probability density function for a positive marker frequency $u$, where $p_1 = \tfrac{1}{2} P_S (1 - r)$ and $p_2 = \tfrac{1}{2} P_S r$:

$$f(u) = \sum_{n_1=1}^{N} \sum_{n_2=1}^{N} \frac{N!}{n_1!\,(N - n_1)!}\,\frac{N!}{n_2!\,(N - n_2)!}\, p_1^{n_1}(1 - p_1)^{N - n_1}\, p_2^{n_2}(1 - p_2)^{N - n_2}\, \frac{\Gamma(n_1 + n_2)}{\Gamma(n_1)\,\Gamma(n_2)}\, u^{n_1 - 1}(1 - u)^{n_2 - 1}. \tag{8}$$

It should be noted that, as the Beta distribution is only defined for $n_1, n_2 > 0$, $f(u)$ does not take into account the case where there are zero copies of a particular marker at the locus (i.e. $n_1 = 0$ or $n_2 = 0$). This results in the density function $f(u)$ excluding the probability that a marker is fixed or lost in the selected population. Therefore, the full distribution is given by the density $f(u)$ together with the point masses $P(u = 0)$ and $P(u = 1)$, where $P(u = 0)$ is the probability that the marker is lost, and $P(u = 1)$ is the probability that the marker is fixed. If we again focus on a positive marker $m^+$, we have that $P(u = 1) = \left(1 - (1 - p_1)^N\right)(1 - p_2)^N$, where $1 - (1 - p_1)^N$ is the probability that at least one recombinant with marker $m^+$ survives in the selected population, and $(1 - p_2)^N$ is the probability that no recombinants with the negative marker at that locus survive in the selected population. Similarly, $P(u = 0) = \left(1 - (1 - p_2)^N\right)(1 - p_1)^N$. It should be noted that the inclusion of these two probabilities is only really needed in cases where the initial population size is very small or when a marker is extremely close to the selected locus, as the probability of a marker being fixed or lost in other situations is negligible.
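Equation (8) and the two point masses can be evaluated directly. The Python sketch below is a minimal, unoptimised illustration: it assumes $P_S$ is already known (here 0.32, the value quoted for Fig. 3a, with $N = 15$ and an unlinked marker, $r = 0.5$) and, like (8), treats $n_1$ and $n_2$ as independent binomials. The function names are hypothetical.

```python
import math


def marker_frequency_density(u, n, p_s, r):
    """Continuous part f(u), eqn (8), of the frequency distribution of a positive
    marker at recombination fraction r from the selected locus (0 < u < 1)."""
    p1 = 0.5 * p_s * (1.0 - r)
    p2 = 0.5 * p_s * r
    w1 = [math.comb(n, k) * p1 ** k * (1.0 - p1) ** (n - k) for k in range(n + 1)]
    w2 = [math.comb(n, k) * p2 ** k * (1.0 - p2) ** (n - k) for k in range(n + 1)]
    log_u, log_1mu = math.log(u), math.log(1.0 - u)
    total = 0.0
    for n1 in range(1, n + 1):
        for n2 in range(1, n + 1):
            # Beta(n1, n2) density, evaluated on the log scale for stability
            log_beta = (math.lgamma(n1 + n2) - math.lgamma(n1) - math.lgamma(n2)
                        + (n1 - 1) * log_u + (n2 - 1) * log_1mu)
            total += w1[n1] * w2[n2] * math.exp(log_beta)
    return total


def fixation_and_loss(n, p_s, r):
    """Point masses P(u = 1) and P(u = 0) for the same positive marker."""
    p1 = 0.5 * p_s * (1.0 - r)
    p2 = 0.5 * p_s * r
    p_fixed = (1.0 - (1.0 - p1) ** n) * (1.0 - p2) ** n
    p_lost = (1.0 - (1.0 - p2) ** n) * (1.0 - p1) ** n
    return p_fixed, p_lost


n, p_s, r = 15, 0.32, 0.5
steps = 1000
h = 1.0 / steps
mass = h * sum(marker_frequency_density((i + 0.5) * h, n, p_s, r) for i in range(steps))
p_fixed, p_lost = fixation_and_loss(n, p_s, r)
print(mass, p_fixed, p_lost, mass + p_fixed + p_lost)
```

The three pieces do not quite sum to one: the missing probability, $(1 - p_1)^N (1 - p_2)^N$, is the chance that no recombinant from the fitter class survives at all, a case that neither (8) nor the point masses cover.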

[Figure 2: three panels, for $\mu = 1.2$, $\mu = 2$ and $\mu = 3$, each plotting $\phi(x)$ (diffusion) and $\varphi(x)$ (exponential) against $x$.]

Fig. 2. Distribution of the relative numbers from a single recombinant given that its descendants have survived in the selected population. The diffusion curve represents (6) with parameters $n_0 = 1$ and $s = \log(\mu)$, and the exponential curve represents (7). The numbers of generations of growth were $t = \{20, 10, 10\}$ for $\mu = \{1.2, 2, 3\}$. The offspring distribution per generation was Poisson.

Figure 3 illustrates the goodness of fit of this approximation for various different parameters. We see that, as expected, eqn (8) works well for small fitness, but its goodness of fit declines as fitness gets larger. For large fitness, however, assuming $N$ is not too small, we can approximate the distribution of frequency by using a normal distribution with mean and variance given by (3) and (4). It can be seen from Fig. 3c, d that the normal distribution provides a good approximation when the initial population is not too small.

Effective initial population size

Using the moment calculations it is possible to work out how large the initial population size $N$ should be in order to avoid spurious changes in marker frequency. As seen in Fig. 1, the larger $N$ is, the less variation there is in frequency in the selected population. However, it can also be seen from Fig. 3 that even though the same initial population size can be present in two experiments, the distribution of marker frequency can be very different. In Fig. 3a, b, both simulations show large variation in frequency due to having only a small initial population size of $N = 15$. Figure 3a, however, shows far more variation than Fig. 3b. This discrepancy is due to the variation in the number of descendants each initial recombinant leaves in the selected population. The majority of this variation in the number of descendants can be attributed to the differences in the probability of survival of the initial recombinants in the two examples. That is, not all of the 15 recombinants in the initial population have survived and left descendants in the selected population. Only a certain portion of the initial population has actually contributed towards the final frequency. This subset of the initial population that actually leaves descendants in the selected population is what we will refer to as the effective initial population size $N^*$. Since it is assumed in this model that only the fittest genotype remains in the selected population, this effective initial population size $N^*$ can be defined as the number of initial recombinants within this fitter class that leave descendants in the selected population. As a result, $N^*$ is a binomially distributed random variable with $E(N^*) = 0.5 N P_S$.

[Figure 3: four panels of marker-frequency distributions on $[0, 1]$; (a) $N = 15$, $\mu = 1.2$; (b) $N = 15$, $\mu = 3$; (c), (d) $N = 100$, with $\mu = 1.2$ and $\mu = 3$.]

Fig. 3. Distribution of frequency for unlinked markers for a small initial population size of $N = 15$ and a larger initial population size of $N = 100$. For each of the initial population sizes, the distribution of frequency is plotted for small fitness $\mu = 1.2$ and large fitness $\mu = 3$. The black curve represents (8), while the grey curve in (c) and (d) is a normal approximation using (3) and (4). The number of generations of selection was 20 for small fitness and 10 for large fitness. The offspring distribution was Poisson.

The larger $N^*$ is, the less the variation in marker frequency. For instance, in Fig. 3a, the probability of survival is $P_S = 0.32$, and hence $E(N^*) = 2.38$, while in Fig. 3b $P_S = 0.94$ and $E(N^*) = 7.05$. So, although both examples started off with 15 unique recombinant genotypes, on average only about two unique genotypes are represented in the selected population in one example, whereas on average seven unique genotypes are represented in the selected population in the other. This reduction in the effective initial population size led to much more variation in frequency in the example in Fig. 3a. The same explanation is responsible for the differences in marker distribution in Fig. 3c, d. Hence, when determining how large the initial population size $N$ should be, one needs to take into account the probability of survival. In general, when the mean number of offspring per generation is small, the probability of survival will be quite low and a much larger $N$ will be needed to ensure that enough genotypes survive in the selected population. This can be seen in Fig. 4, which plots the variance in frequency in the selected population (using (4)) against $N$ for various different fitnesses. As expected, for small $N$ there is much more variation, and for small fitness the variance is even larger due to the smaller $N^*$. It can also be seen that an initial population size at least in the mid hundreds ensures only small variation in marker frequency in the selected population.
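The numbers in this paragraph can be reproduced with a short Python sketch. It assumes Poisson offspring (so $\sigma^2 = \mu$), the generation counts used for Fig. 3 ($t = 20$ for $\mu = 1.2$, $t = 10$ for $\mu = 3$) and, for sizing $N$, an arbitrary illustrative target variance of 0.01 (a standard deviation of 0.1 in marker frequency) applied to the $t \to \infty$ variance of Fig. 4. That target and the function names are ours, not the paper's.

```python
import math


def survival_probability(mu, t):
    """P_S = 1 - f_t(0) for Poisson offspring, f(z) = exp(mu * (z - 1))."""
    z = 0.0
    for _ in range(t):
        z = math.exp(mu * (z - 1.0))
    return 1.0 - z


def expected_effective_size(n, p_s):
    """E(N*) = 0.5 * N * P_S: expected number of fitter-class recombinants
    that leave descendants in the selected population."""
    return 0.5 * n * p_s


def initial_size_for_variance(target_var, mu, sigma2, p_m=0.5):
    """N at which the t -> infinity variance of Fig. 4,
    2 P_m (1 - P_m) (mu^2 + sigma2 - mu) / (N (mu - 1) mu), equals target_var."""
    return (2.0 * p_m * (1.0 - p_m) * (mu ** 2 + sigma2 - mu)
            / ((mu - 1.0) * mu * target_var))


for mu, t in ((1.2, 20), (3.0, 10)):
    p_s = survival_probability(mu, t)
    print(f"mu = {mu}: P_S = {p_s:.2f}, E(N*) for N = 15: {expected_effective_size(15, p_s):.2f}, "
          f"N for Var = 0.01: {initial_size_for_variance(0.01, mu, mu):.0f}")
```

For this target, $\mu = 1.2$ needs $N \approx 300$ whereas $\mu = 3$ needs only about 75, reflecting the larger variance in lineage size, and hence the smaller $N^*$, under weak selection.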

False positives

To get an idea of how this variation in marker frequency affects the mapping ability, we can calculate the number of false positives we would get when we try to identify markers linked to the selected locus. For instance, suppose we wanted to do an initial genome scan to see which chromosome the selected allele lies on. The deterministic expectation predicts that the closer a particular marker is to the selected locus, the more extreme the frequency of that marker becomes. Hence, identifying the marker with the highest (positive markers) or lowest (negative markers) frequency should reveal, at a minimum, which chromosome the selected allele lies on. Finite population sizes, however, may lead to more extreme marker frequencies on other chromosomes. So, for various initial population sizes, what is the probability that the most extreme marker frequency is that of the marker linked to the selected locus?

If we look at the positive markers we are interestedin finding the maximum marker frequency. In thiscase, we can define a false positive as a marker inunlinked regions that has a frequency greater than themarker that is closest to the selected locus. Hence, weneed to evaluate P (unull<ulinked), where unull is themaximum frequency in unlinked (or null) regions, andulinked is the frequency of the marker closest to theselected locus. To evaluate this probability, we willassume that we have c chromosomes of equal length lMorgans, and assume each chromosome has a total oft markers at equally spaced intervals d=l/(tx1). Forsimplicity, we will also assume that the selected alleleis positioned in the middle of two markers resulting inthe distance between the closest marker and theselected allele being d/2. Now, in order to evaluatethe distributions for unull and ulinked, we will use thenormal approximations using moment calculations(3)–(5). So, let fN(ulinked) be the normal approxi-mation for the probability density of ulinked, and letP (ulinked=1) be the probability that ulinked is fixed inthe selected population. For unull, the distribution ofthe maximum frequency from the set of markers inunlinked regions is needed. We need to use a multi-variate normal distribution for this probability as thefrequencies of markers on the same chromosome canbe correlated. So, for any given value of ulinked, sayulinked*, an approximate probability that the maxi-mum frequency in unlinked regions is less thanulinked*, is given by P (unull<ulinked*)=FCMVN(u)cx1,where FCMVN(u) is the cumulative multivariate nor-mal distribution, and u is a vector of length t with allelements equal to ulinked*. Integrating over all possiblevalues of ulinked, we get (9) as an approximation forthe probability of not getting a false positive.

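$$P(u_{\text{null}} < u_{\text{linked}}) = \int_0^1 F_{\text{CMVN}}(\mathbf{u})^{c-1} f_N(u_{\text{linked}})\,\mathrm{d}u_{\text{linked}} + P(u_{\text{linked}} = 1) \qquad (9)$$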
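Equation (9) can be cross-checked by brute force. The Python sketch below is a Monte Carlo stand-in for the integral, not the calculation behind Fig. 5: it draws $u_{\text{linked}}$ and the unlinked marker frequencies from normal approximations built from the appendix moments (A.7)–(A.9), with $V = 2(1 + \mathrm{Var}(X)/E(X)^2)/N$. It further assumes the Haldane map function and takes $P_{m_1m_2} = (1-r_{12})/2$ for two unlinked markers a recombination fraction $r_{12}$ apart, so that (A.9) gives a covariance $V(1-2r_{12})/4$, i.e. a correlation $e^{-2x}$ between unlinked markers $x$ Morgans apart; each chromosome can then be simulated as an AR(1) sequence. Clipping draws to [0, 1] only roughly mimics the fixation term $P(u_{\text{linked}} = 1)$. The function names and the `cv2` argument (standing for $\mathrm{Var}(X)/E(X)^2$) are hypothetical.

```python
import math
import random


def haldane_r(x):
    # Recombination fraction for map distance x (Morgans); Haldane map (assumed, no interference).
    return 0.5 * (1.0 - math.exp(-2.0 * x))


def clip01(u):
    # Frequencies live in [0, 1]; normal draws outside that range are truncated.
    return min(1.0, max(0.0, u))


def false_positive_prob(n_init, cv2, c=20, l=1.0, t=5, trials=2000, seed=1):
    """Monte Carlo estimate of 1 - P(u_null < u_linked) for positive markers.

    n_init is the initial population size N and cv2 stands for Var(X)/E(X)^2;
    both are hypothetical inputs used to build V = 2(1 + cv2)/N.
    """
    rng = random.Random(seed)
    v = 2.0 * (1.0 + cv2) / n_init
    d = l / (t - 1)                                   # equal marker spacing
    p_link = 1.0 - haldane_r(d / 2)                   # closest positive marker is d/2 away
    sd_link = math.sqrt(p_link * (1.0 - p_link) * v)  # (A.8)
    sd_null = math.sqrt(0.25 * v)                     # (A.8) with P_m = 0.5
    rho = math.exp(-2.0 * d)                          # correlation of adjacent unlinked markers
    innov = math.sqrt(1.0 - rho * rho)
    false_pos = 0
    for _ in range(trials):
        u_linked = clip01(rng.gauss(p_link, sd_link))
        u_null = 0.0
        for _chrom in range(c - 1):                   # chromosomes without the selected allele
            z = rng.gauss(0.0, 1.0)
            for k in range(t):
                if k > 0:                             # AR(1) step: corr(z_i, z_j) = rho**|i - j|
                    z = rho * z + innov * rng.gauss(0.0, 1.0)
                u_null = max(u_null, clip01(0.5 + sd_null * z))
        if u_null > u_linked:
            false_pos += 1
    return false_pos / trials


if __name__ == "__main__":
    for n in (20, 200):
        print(n, false_positive_prob(n, cv2=0.5))
```

As in Fig. 5, the estimate should fall as the initial population size grows; changing `t` shows the effect of marker density under these assumptions.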

Figure 5 compares (9) with simulation results. The solid curves are the theoretical results from (9) and the dashed curves are the corresponding simulation results. The curves plot the probability of getting a false positive against increasing effective initial population size. In this example there are $c = 20$ chromosomes, each of length $l = 1$ Morgan, and the false-positive probabilities were calculated for $t = 3$ and $t = 5$ markers per chromosome. The approximation (9) slightly overestimates the number of false positives. This is mainly due to the normal approximation for $u_{\text{linked}}$: the closer a marker is to the selected locus, the less closely its frequency follows a normal distribution, and as a result the false-positive rate is overestimated. For extremely small initial population sizes, (9) would not provide a good approximation for the number of false positives, as the marker frequencies can no longer be approximated by a normal distribution. In general, however, Fig. 5 shows that the false-positive rate falls, as expected, as the variation in marker frequency shrinks with increasing effective initial population size. With smaller effective initial population sizes, a higher marker density is needed to reduce the number of false positives. Note also that with extremely small initial population sizes (i.e. an effective initial population size of less than 15), the probability of fixation of an unlinked marker is greater than zero, and as a result the false-positive rate may remain high no matter how densely the markers are spaced.

[Fig. 4 plot: variance in frequency against initial population size (N), with curves for fitness μ = 3, 1.5 and 1.2.]

Fig. 4. The variance in frequency in the selected population for unlinked markers. The variance was calculated from $2P_m(1-P_m)(\mu^2+\sigma^2-\mu)\,(N(\mu-1)\mu)^{-1}$ (i.e. the limit of (4) as $t \to \infty$), where $P_m = 0.5$ and $\mu \in \{1.2, 1.5, 3\}$. The offspring distribution used was a Poisson distribution and thus $\sigma^2 = \mu$.
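The variance formula in the Fig. 4 caption is easy to evaluate directly. A minimal sketch follows; the function name is hypothetical. With Poisson offspring ($\sigma^2 = \mu$) and $P_m = 0.5$ the expression reduces to $\mu/(2N(\mu-1))$.

```python
def unlinked_variance(n_init, mu, sigma2=None, p_m=0.5):
    """Long-run variance of an unlinked marker frequency, as in the Fig. 4 caption.

    Computes 2 P_m (1 - P_m) (mu^2 + sigma^2 - mu) / (N (mu - 1) mu), the t -> infinity
    limit quoted there. sigma2 defaults to mu, i.e. Poisson offspring as in Fig. 4.
    """
    if mu <= 1.0:
        # The expression is only finite and positive for mu > 1.
        raise ValueError("the t -> infinity limit needs an offspring mean mu > 1")
    if sigma2 is None:
        sigma2 = mu
    return 2.0 * p_m * (1.0 - p_m) * (mu * mu + sigma2 - mu) / (n_init * (mu - 1.0) * mu)


for mu in (1.2, 1.5, 3.0):
    print(mu, [round(unlinked_variance(n, mu), 4) for n in (100, 200, 500)])
```

For example, at N = 100 the sketch gives 0.0075 for μ = 3 and about 0.03 for μ = 1.2, so smaller values of μ inflate the variance in unlinked regions, and the variance falls as 1/N.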

Discussion

Mendelian traits

The aim is to locate alleles that influence a trait by examining changes in marker allele frequency in pools of asexual selected cross progeny. The extreme progeny are selected by multiple generations of asexual reproduction and selection. We have shown that the ability to identify markers linked to a causative allele depends on the variance in marker frequency in the selected population: the larger this variance, the greater the chance of spurious peaks and valleys in frequency in unlinked regions. The amount of variation in frequency in unlinked regions is determined by the number of unique recombinant genotypes present in the selected population. The more unique recombinant genotypes present, the more balanced the representation of markers in the selected population, and the more likely that marker frequencies will approach the deterministic expectation, making causative loci much easier to identify. The number of unique recombinant genotypes in the selected population is determined by the size of the initial population. Fig. 4 indicates that an initial population size in the mid hundreds should ensure a small probability of spurious changes in marker frequency in unlinked regions.

However, the ease of detection also depends on the marker density. A simple way to identify the general location of the selected locus is to find the marker with the most extreme frequency. In this case, a very dense marker map ensures that some marker lies close enough to the selected locus for its frequency to be the most extreme in the genome, which makes the selected locus easier to locate. How dense the markers need to be is determined mainly by the effective initial population size, and also by the length and number of chromosomes. The example in Fig. 5 shows that relatively few markers per chromosome are needed to achieve a low false-positive rate, as long as the effective initial population size is not too small.

Maximum likelihood estimator

A more statistical approach to locating selected loci can also be based on the model developed in this paper. For example, a maximum likelihood approach using a standard interval mapping technique (Lander & Botstein, 1989) can be used to identify markers linked to selected alleles that have been fixed in the population. As in interval mapping, two markers at a time would be analysed on each chromosome. For each pair of markers analysed, a log likelihood ratio $\lambda = \log(L_0/L_A) = \log(L_0) - \log(L_A)$ would be calculated, where $L_A$ is the likelihood under the hypothesis that a single selected allele is fixed somewhere between the two markers, and $L_0$ is the likelihood under the null hypothesis that no selected allele lies between the two markers. Provided the effective initial population size is not extremely small, a bivariate normal distribution using the moments (3)–(5) can serve as the likelihood function for both $L_0$ and $L_A$. Apart from the location of the fixed selected allele (which enters through the recombination probabilities in (3)–(5)), both $L_0$ and $L_A$ contain one unknown parameter that must be estimated from the data. This is the constant

$$V = 2\left(1 + \frac{\mathrm{Var}(X)}{E(X)^2}\right)\frac{1}{N}$$

from the moments (4) and (5). A maximum likelihood estimator $\hat{V}$ for $V$ can be obtained by solving $\mathrm{d}\log(L_{0/A})/\mathrm{d}V = 0$ (for $L_0$ or $L_A$) for $V$.
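For the null model used in Fig. 6 (all markers treated as unlinked), the equation $\mathrm{d}\log L_0/\mathrm{d}V = 0$ has a closed-form solution. If the $M$ marker frequencies $\mathbf{F}$ are multivariate normal with mean 0.5 and covariance $V\Sigma_0$, where $\Sigma_0$ does not depend on $V$, then $\log L_0 = -\tfrac{M}{2}\log(2\pi V) - \tfrac12\log|\Sigma_0| - Q/(2V)$ with $Q = (\mathbf{F}-0.5)^{\top}\Sigma_0^{-1}(\mathbf{F}-0.5)$, and setting the derivative to zero gives $\hat{V} = Q/M$. The Python sketch below additionally assumes equally spaced markers, the Haldane map function and $P_{m_1m_2} = (1-r_{12})/2$ for unlinked markers, so that by (A.8) and (A.9) $\Sigma_0$ has entries $e^{-2|x_i-x_j|}/4$ (an AR(1) correlation structure whose inverse quadratic form needs no matrix algebra). The name `v_hat_null` is hypothetical.

```python
import math


def v_hat_null(chromosomes, spacing):
    """Closed-form MLE of V, treating every marker as unlinked to any selected locus.

    chromosomes: one non-empty list of equally spaced marker frequencies per chromosome.
    spacing: distance between adjacent markers in Morgans.
    Assumed null model: frequencies ~ MVN(0.5, V * Sigma0), Sigma0[i][j] = exp(-2|x_i - x_j|)/4,
    so along a chromosome the standardised frequencies form an AR(1) sequence.
    """
    rho = math.exp(-2.0 * spacing)
    quad, n = 0.0, 0
    for freqs in chromosomes:
        z = [f - 0.5 for f in freqs]
        # z' R^{-1} z for an AR(1) correlation matrix R: first term plus scaled innovations.
        quad += z[0] ** 2
        for prev, cur in zip(z, z[1:]):
            quad += (cur - rho * prev) ** 2 / (1.0 - rho ** 2)
        n += len(z)
    # Setting d log L0 / dV = 0 gives V_hat = z' Sigma0^{-1} z / n, and Sigma0 = R / 4.
    return quad / (0.25 * n)


print(v_hat_null([[0.52, 0.47, 0.55], [0.44, 0.49, 0.51]], spacing=0.05))
```

Because the estimate pools every marker, it is only as good as the assumption that almost all markers are unlinked, which is the situation the Fig. 6 caption describes.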

[Fig. 5 plot: probability of a false positive against E(N*), the expected effective initial population size (15 to 50), for 3 and 5 markers per chromosome.]

Fig. 5. The probability of getting a false positive plotted against the expected effective initial population size $E(N^*)$. The solid curves are the theoretical predictions using (9) (i.e. $1 - P(u_{\text{null}} < u_{\text{linked}})$) and the two dashed curves are simulation results. The parameters used were $c = 20$ chromosomes, each of length $l = 1$ Morgan, with $t = 3$ and $t = 5$ markers on each chromosome. The black curves are the results for $t = 3$ and the grey curves are the results for $t = 5$. The number of generations of selection was 10 and the overall fitness of the selected allele was 3.

Once $\hat{V}$ has been obtained, the log likelihood ratio $\lambda$ can be calculated at various positions along a chromosome. Significance levels for these log likelihood ratios can be obtained by permutation analysis (Churchill & Doerge, 1994), using data simulated from a multivariate normal with parameter $\hat{V}$. Figure 6 shows a simple example. Figure 6a plots the negative marker frequency in a single replicate where a selected allele is fixed at position 1 and the effective initial population size is $N^* = 20$. Figure 6b plots the corresponding log likelihood ratios and significance level. The likelihood model correctly identifies the general location of the selected allele.
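A simplified version of the Fig. 6 procedure is sketched below in Python. Several choices here are assumptions rather than details taken from the paper: recombination follows the Haldane map; under $L_A$ the two flanking negative markers have $P_{m_1} = r_1$ and $P_{m_2} = r_2$ (their recombination fractions with the selected allele) and, with no interference, $P_{m_1m_2} = r_1r_2$, so their covariance in (A.9) vanishes; the unknown position of the selected allele is profiled out by maximising $\log L_A$ over a grid; and the significance level is the lower 5% quantile of the smallest $\lambda$ on chromosomes simulated from the null multivariate normal with a given $\hat{V}$. With 5-cM marker spacing, `span=2` gives the overlapping 10-cM intervals of Fig. 6. All function names are hypothetical.

```python
import math
import random


def haldane_r(x):
    # Recombination fraction for map distance x (Morgans); Haldane map (an assumption).
    return 0.5 * (1.0 - math.exp(-2.0 * x))


def bvn_logpdf(x1, x2, m1, m2, v1, v2, c12):
    # Log density of a bivariate normal with means (m1, m2), covariance [[v1, c12], [c12, v2]].
    det = v1 * v2 - c12 * c12
    d1, d2 = x1 - m1, x2 - m2
    q = (v2 * d1 * d1 - 2.0 * c12 * d1 * d2 + v1 * d2 * d2) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * q


def interval_lambda(f1, f2, width, v, grid=50):
    """lambda = log L0 - max_s log LA(s) for two negative markers `width` Morgans apart."""
    # L0: both markers unlinked, P_m = 0.5; covariance from (A.8)-(A.9) with P_m1m2 = (1 - R)/2.
    c0 = 0.25 * v * math.exp(-2.0 * width)
    log_l0 = bvn_logpdf(f1, f2, 0.5, 0.5, 0.25 * v, 0.25 * v, c0)
    best = -math.inf
    for k in range(grid):
        s = width * (k + 0.5) / grid            # candidate position of the fixed allele
        p1, p2 = haldane_r(s), haldane_r(width - s)
        # LA: no interference gives P_m1m2 = p1 * p2, so the (A.9) covariance is zero.
        ll = bvn_logpdf(f1, f2, p1, p2, p1 * (1 - p1) * v, p2 * (1 - p2) * v, 0.0)
        best = max(best, ll)
    return log_l0 - best


def lambda_profile(freqs, spacing, v, span=1):
    # lambda for each interval bounded by markers i and i + span on one chromosome.
    return [interval_lambda(freqs[i], freqs[i + span], span * spacing, v)
            for i in range(len(freqs) - span)]


def null_threshold(n_markers, spacing, v, span=1, alpha=0.05, reps=500, seed=0):
    """Lower alpha-quantile of min(lambda) over chromosomes simulated with no selected allele."""
    rng = random.Random(seed)
    rho, sd = math.exp(-2.0 * spacing), math.sqrt(0.25 * v)
    minima = []
    for _ in range(reps):
        z = rng.gauss(0.0, 1.0)
        freqs = [0.5 + sd * z]
        for _ in range(n_markers - 1):          # AR(1) null frequencies, as in (A.8)-(A.9)
            z = rho * z + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
            freqs.append(0.5 + sd * z)
        minima.append(min(lambda_profile(freqs, spacing, v, span)))
    minima.sort()
    return minima[int(alpha * reps)]


if __name__ == "__main__":
    spacing, v = 0.05, 0.02
    # Noise-free negative-marker frequencies with a selected allele fixed at 0.525 Morgans.
    freqs = [haldane_r(abs(spacing * i - 0.525)) for i in range(21)]
    profile = lambda_profile(freqs, spacing, v)
    print(min(range(len(profile)), key=profile.__getitem__),
          null_threshold(21, spacing, v, reps=200))
```

Because $\lambda = \log(L_0/L_A)$, intervals containing a selected allele give large negative values, as in Fig. 6b, and would be called significant when $\lambda$ falls below the simulated threshold.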

Quantitative traits

So, overall, this selection technique in asexual cross progeny is a relatively efficient method for mapping simple Mendelian traits. For more complex traits, however, the situation is not as straightforward. If we apply the current technique to quantitative traits, we see first that the current strategy of letting the experiment run until a genotype fixes would not be the most efficient. This is because the longer one selects, the more stochasticity we would see in marker frequency in unlinked regions. For example, suppose $g$ loci influence the trait. There could then be $2^g$ possible genotypes in the initial haploid population. As selection is applied, the less fit genotypes are lost, and the genotypic composition of the population becomes increasingly biased towards the upper tail of the fitness distribution. However, if $g$ is large, the genotypes in the upper tail may have been present in only small numbers in the initial population. As a result, the effective initial population size may become very small as selection is applied, leading to large stochasticity in marker frequency in unlinked regions.

An example of this is shown in Fig. 7, which plots the marker frequencies at various generations of selection when there are five unlinked selected loci (one large-effect locus and four small-effect loci) and a relatively large initial population size of 200. The bar charts in Fig. 7 represent the genotypic composition of the population at each generation shown. With five unlinked selected loci there are $2^5 = 32$ possible genotypes, each with probability $2^{-5} = 0.03125$ of being produced at meiosis. So, in the bar charts in Fig. 7, each bar represents one of these 32 genotypes, with bar number 1 representing the least fit genotype and bar number 32 the fittest possible genotype. In the initial cross, most genotypes are equally represented in the population and marker frequencies are, as expected, around 0.5. After 10 generations of selection, most genotypes are still present in the population, but the frequencies of the genotypes in the upper half of the fitness distribution have increased. These genotypes all carry the large-effect allele, and consequently the frequencies of markers around the large-effect locus have increased. The frequencies of all other markers remain roughly the same. After 30 generations of selection, the fitter genotypes are starting to establish in the population, which increases the frequency of the smaller-effect alleles. Many of the genotypes in the lower half of the fitness distribution are now at insignificant numbers or no longer present in the population. This decreases the effective initial population size: after 30 generations of selection, the number of unique recombinant genotypes in the population has been reduced from 200 to 92, which results in slightly more variation in frequency in unlinked regions. After 100 generations of selection, only six fitness classes are present in the population, with the fittest (genotype 32) the only one in substantial numbers, so the frequencies of all the selected alleles are nearing fixation. However, with so few fitness classes remaining, the effective initial population size has become very small. There are now only 27 unique recombinant genotypes in the population, with the vast majority of the population originating from just six unique recombinant genotypes. Consequently, many markers in unlinked regions are also at very low or high frequency.

[Fig. 6 plots: (a) marker frequency and (b) log likelihood ratio λ, each against marker position (Morgans, 0 to 12).]

Fig. 6. (a) Plot of the negative marker frequency (one marker every 5 cM) in a single replicate where a selected allele is fixed at position 1 and the effective initial population size is $N^* = 20$. (b) The grey curve is a plot of the corresponding log likelihood ratios and the black line is the significance level. To calculate the log likelihood ratios, the genome was split into overlapping intervals of 10 cM, with an overlap of 5 cM. For each interval, the log likelihood ratio was calculated using the two markers that define the interval. The unknown parameter $V$ in the log likelihood functions was estimated using all markers, assuming they are all unlinked, and then solving $\mathrm{d}\log(L)/\mathrm{d}V = 0$ for $V$. The significance levels were obtained by permutation analysis of simulated data from a null region. The simulated data were obtained by directly simulating frequencies from a multivariate normal with parameter $\hat{V}$.

So, in any one replicate, if selection is continued for a very long time, it may be very difficult to tell which peaks and valleys in marker frequency are truly selected alleles and which are null regions, because of the very low effective initial population size. To avoid this, much larger initial population sizes would be needed, so that enough of the fitter genotypes are produced at meiosis. However, as $g$ gets larger, the population sizes needed may become prohibitively large. Also, with large $g$ the fitness differences between the various genotypes become quite small, so letting the experiment run until a genotype fixes would most likely be infeasible, as it would take an extremely long time for any one genotype to fix. Both these reasons suggest that, for quantitative traits, it is necessary to find an optimal running time for the experiment in order to extract the maximum amount of information from the changes in marker frequency.

[Fig. 7 plots: for the initial cross and after 10, 30 and 100 generations of selection, marker frequency against marker position (Morgans, 0 to 12), with bar charts of the percentage of the population carrying each genotype (1 to 32).]

Fig. 7. The marker frequencies and the genotypic composition of the population at various generations of selection when there are multiple selected loci. There are a total of five selected alleles at positions {2, 4, 6, 8, 10} (shown by the filled circles) with selection coefficients {0.2, 0.05, 0.01, 0.03, 0.04}. With five selected alleles there are 32 possible genotypes. The bar charts show the proportion of each of these 32 genotypes in the population at that particular generation. Genotype number 1 refers to the least fit genotype (relative fitness of 1) and genotype 32 refers to the fittest possible genotype (relative fitness of 1.36). The initial population size was 200.
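The genotype classes behind the bar charts in Fig. 7 can be enumerated directly. The sketch below assumes multiplicative fitness effects, which reproduces the relative fitness of 1.36 quoted for genotype 32 ($1.2 \times 1.05 \times 1.01 \times 1.03 \times 1.04 \approx 1.363$). It then computes deterministic genotype shares in an infinite population, where each genotype starts at $2^{-5}$ and grows as $w^t$. This deliberately ignores drift and the loss of recombinant lineages, which is what shrinks the effective initial population size in the simulations of Fig. 7, so it shows only the direction in which selection skews the genotypic composition. Function names are hypothetical.

```python
from itertools import product

# Selection coefficients of the five selected alleles in Fig. 7.
S = (0.2, 0.05, 0.01, 0.03, 0.04)


def genotype_fitnesses(s=S):
    """Relative fitness of all 2**g haploid genotypes, assuming multiplicative effects."""
    fits = []
    for alleles in product((0, 1), repeat=len(s)):
        w = 1.0
        for carries, coeff in zip(alleles, s):
            if carries:
                w *= 1.0 + coeff
        fits.append(w)
    return sorted(fits)   # position 0 is genotype 1 (least fit), position -1 is genotype 2**g


def deterministic_shares(fits, generations):
    """Genotype proportions after t generations of selection in an infinite population.

    Every genotype starts at 2**-g and grows as w**t, so shares are w**t / sum(w**t).
    """
    weights = [w ** generations for w in fits]
    total = sum(weights)
    return [x / total for x in weights]


fits = genotype_fitnesses()
print(len(fits), round(fits[-1], 3))   # 32 1.363
for t in (0, 10, 30, 100):
    shares = deterministic_shares(fits, t)
    # Share of the fittest genotype and of the lower half of the fitness distribution.
    print(t, round(shares[-1], 3), round(sum(shares[:16]), 3))
```

Even in this idealised setting the lower half of the fitness distribution is steadily depleted; in a finite population those classes are lost outright, which is the mechanism described in the text.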

We thank two anonymous reviewers for helpful comments and suggestions. This work was funded as part of the GENACT Project, funded by the Marie Curie Host Fellowships for Early Stage Research Training, as part of the 6th Framework Programme of the European Commission.

Appendix

Moments of the number of copies of a marker m in the selected population

Consider an initial population of size $N$ and a single marker $m$. Let $S_m$ be the number of copies of that marker in the selected population. We have $S_m = \sum_{i=1}^{n} X_i$, where $n$ is a random variable representing the initial number of recombinants that carried marker $m$. Since the model assumes that only the fitter class of recombinants survives in the selected population, $n$ is a binomially distributed random variable with expectation $E(n) = 0.5NP_m$, where $P_m$ is the probability that marker $m$ is on the fittest genotype ($P_m = r$ if $m$ is a negative marker and $P_m = 1 - r$ if $m$ is a positive marker). Therefore, the expected number of copies of marker $m$ in the selected population, $E(S_m)$, and its variance, $\mathrm{Var}(S_m)$, are given by (A.1) and (A.2).

$$E(S_m) = E(E(S_m \mid n)) = E(n)E(X) \qquad (A.1)$$

$$\mathrm{Var}(S_m) = E(\mathrm{Var}(S_m \mid n)) + \mathrm{Var}(E(S_m \mid n)) = E(n)\mathrm{Var}(X) + \mathrm{Var}(n)E(X)^2 \qquad (A.2)$$

Given two markers $m_1$ and $m_2$, the covariance $\mathrm{Cov}(S_{m_1}, S_{m_2})$ between the numbers of copies of each marker in the selected population is given by (A.3), where $P_{m_1m_2}$ is the probability that both markers $m_1$ and $m_2$ are on the fittest genotype.

$$\mathrm{Cov}(S_{m_1}, S_{m_2}) = NE(X)^2\left(0.5P_{m_1m_2} - 0.25P_{m_1}P_{m_2}\right) + 0.5N\,\mathrm{Var}(X)P_{m_1m_2} \qquad (A.3)$$
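Moments (A.1)–(A.3) can be checked by simulating the model directly. In the sketch below each of the $N$ initial recombinants enters the fittest class with probability 0.5 and, given that, carries the markers according to $P_{m_1}$, $P_{m_2}$ and $P_{m_1m_2}$. The descendant count $X$ is drawn from an exponential distribution only because its mean and variance are known ($\mathrm{Var}(X) = E(X)^2$); this distribution and all function names are illustrative assumptions, not part of the paper's model.

```python
import random
import statistics


def moments_a1_a3(n_init, p1, p2, p12, ex, varx):
    """E(S_m1) and Var(S_m1) from (A.1)-(A.2), and Cov(S_m1, S_m2) from (A.3)."""
    en1 = 0.5 * n_init * p1                       # n ~ Binomial(N, 0.5 P_m1)
    var_n1 = n_init * 0.5 * p1 * (1.0 - 0.5 * p1)
    e_s1 = en1 * ex                               # (A.1)
    var_s1 = en1 * varx + var_n1 * ex ** 2        # (A.2)
    cov_s12 = (n_init * ex ** 2 * (0.5 * p12 - 0.25 * p1 * p2)
               + 0.5 * n_init * varx * p12)       # (A.3)
    return e_s1, var_s1, cov_s12


def simulate_counts(n_init, p1, p2, p12, mean_x, reps=4000, seed=0):
    """Draw (S_m1, S_m2) by simulating each of the N initial recombinants.

    A recombinant is in the fittest class with probability 0.5; given that, it carries
    both markers with prob p12, only m1 with p1 - p12 and only m2 with p2 - p12.
    Its descendant count X ~ Exponential(mean mean_x) is an arbitrary stand-in.
    """
    rng = random.Random(seed)
    s1s, s2s = [], []
    for _ in range(reps):
        s1 = s2 = 0.0
        for _ in range(n_init):
            if rng.random() >= 0.5:
                continue                          # not in the fittest class: lost
            u = rng.random()
            x = rng.expovariate(1.0 / mean_x)
            if u < p1:                            # [0, p1) carries m1
                s1 += x
            if u < p12 or p1 <= u < p1 + p2 - p12:  # [0, p12) and [p1, p1 + p2 - p12) carry m2
                s2 += x
        s1s.append(s1)
        s2s.append(s2)
    return s1s, s2s


def sample_cov(a, b):
    # Unbiased sample covariance of two equal-length sequences.
    ma, mb = statistics.mean(a), statistics.mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


if __name__ == "__main__":
    s1, s2 = simulate_counts(50, 0.6, 0.5, 0.4, mean_x=3.0)
    print(moments_a1_a3(50, 0.6, 0.5, 0.4, ex=3.0, varx=9.0))
    print(statistics.mean(s1), statistics.variance(s1), sample_cov(s1, s2))
```

For these inputs, (A.1)–(A.3) give $E(S_{m_1}) = 45$, $\mathrm{Var}(S_{m_1}) = 229.5$ and $\mathrm{Cov}(S_{m_1}, S_{m_2}) = 146.25$; the simulated values should agree to within Monte Carlo error.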

Moments of the frequency of a marker m in the selected population

Obtaining exact expressions for the moments of marker frequency in the selected population is mathematically difficult, so approximations are used instead. Let $F_m = S_m/S_t$ be the frequency of marker $m$, where $S_t$ is the total number of recombinants in the selected population. Expanding $F_m$ as a Taylor series gives (A.4) and (A.5) as approximations for the mean and variance of marker frequency in the selected population. To derive the covariance in frequency, $\mathrm{Cov}(F_{m_1}, F_{m_2}) = E(F_{m_1}F_{m_2}) - E(F_{m_1})E(F_{m_2})$, we expand $F_{m_1}F_{m_2} = S_{m_1}S_{m_2}/S_t^2$ as a Taylor series and obtain (A.6) as an approximation for the covariance in frequency between markers $m_1$ and $m_2$.

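$$E(F_m) \approx \frac{E(S_m)}{E(S_t)} + \frac{\mathrm{Var}(S_t)E(S_m)}{E(S_t)^3} - \frac{\mathrm{Cov}(S_m, S_t)}{E(S_t)^2} \qquad (A.4)$$

$$\mathrm{Var}(F_m) \approx \frac{\mathrm{Var}(S_m)}{E(S_t)^2} + \frac{E(S_m)^2\,\mathrm{Var}(S_t)}{E(S_t)^4} - \frac{2E(S_m)\,\mathrm{Cov}(S_m, S_t)}{E(S_t)^3} \qquad (A.5)$$

$$\begin{aligned}\mathrm{Cov}(F_{m_1}, F_{m_2}) \approx{} & \frac{E(S_{m_1})E(S_{m_2})}{E(S_t)^2} + \frac{\mathrm{Cov}(S_{m_1}, S_{m_2})}{E(S_t)^2} - \frac{2E(S_{m_1})\,\mathrm{Cov}(S_{m_2}, S_t)}{E(S_t)^3} - \frac{2E(S_{m_2})\,\mathrm{Cov}(S_{m_1}, S_t)}{E(S_t)^3} \\ & + \frac{3E(S_{m_1})E(S_{m_2})\,\mathrm{Var}(S_t)}{E(S_t)^4} - E(F_{m_1})E(F_{m_2})\end{aligned} \qquad (A.6)$$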
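The approximations (A.4)–(A.6) translate term by term into code. The sketch below (function names hypothetical) evaluates them for user-supplied moments, then plugs in the single-surviving-class relations $E(S_m) = P_mE(S_t)$ and $\mathrm{Cov}(S_m, S_t) = P_m\mathrm{Var}(S_t)$ for illustrative values $N = 200$, $E(X) = 4$, $\mathrm{Var}(X) = 8$ and $P_m = 0.7$.

```python
def freq_mean(e_sm, e_st, var_st, cov_sm_st):
    # (A.4): second-order Taylor approximation of E(S_m / S_t).
    return e_sm / e_st + var_st * e_sm / e_st ** 3 - cov_sm_st / e_st ** 2


def freq_var(e_sm, var_sm, e_st, var_st, cov_sm_st):
    # (A.5): first-order (delta-method) approximation of Var(S_m / S_t).
    return (var_sm / e_st ** 2 + e_sm ** 2 * var_st / e_st ** 4
            - 2.0 * e_sm * cov_sm_st / e_st ** 3)


def freq_cov(e_s1, e_s2, cov_s12, cov_s1t, cov_s2t, e_st, var_st, e_f1, e_f2):
    # (A.6): second-order expansion of E(S_m1 S_m2 / S_t^2), minus E(F_m1) E(F_m2).
    return (e_s1 * e_s2 / e_st ** 2 + cov_s12 / e_st ** 2
            - 2.0 * e_s1 * cov_s2t / e_st ** 3 - 2.0 * e_s2 * cov_s1t / e_st ** 3
            + 3.0 * e_s1 * e_s2 * var_st / e_st ** 4 - e_f1 * e_f2)


# Single-surviving-class inputs (illustrative numbers): N = 200, E(X) = 4, Var(X) = 8, P_m = 0.7.
n, ex, varx, p = 200, 4.0, 8.0, 0.7
e_st = 0.5 * n * ex                                     # (A.1) with E(n) = 0.5 N
var_st = 0.5 * n * varx + 0.25 * n * ex ** 2            # (A.2) with Var(n) = 0.25 N
var_sm = 0.5 * n * p * varx + n * 0.5 * p * (1 - 0.5 * p) * ex ** 2  # (A.2) for marker m
print(freq_mean(p * e_st, e_st, var_st, p * var_st))
print(freq_var(p * e_st, var_sm, e_st, var_st, p * var_st))
```

Up to floating-point rounding, the two printed values are 0.7 and $2(0.7)(0.3)(1 + 8/16)/200 = 0.00315$, matching the closed forms (A.7) and (A.8).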

Since our model assumes that only recombinants from the fittest class survive in the selected population, we can simplify the above calculations. Given that only one fitness class survives, $E(S_m) = P_mE(S_t)$ and $\mathrm{Cov}(S_m, S_t) = P_m\mathrm{Var}(S_t)$, where $E(S_t)$ and $\mathrm{Var}(S_t)$ can be calculated using (A.1) and (A.2), with $n$ now a binomial random variable with expectation $0.5N$. Substituting these into (A.4), (A.5) and (A.6) gives (A.7) as the expected frequency, which is the same as the deterministic expectation, and (A.8) and (A.9) as the variance and covariance in frequency.

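$$E(F_m) = P_m \qquad (A.7)$$

$$\mathrm{Var}(F_m) = 2P_m(1 - P_m)\left(1 + \frac{\mathrm{Var}(X)}{E(X)^2}\right)\frac{1}{N} \qquad (A.8)$$

$$\mathrm{Cov}(F_{m_1}, F_{m_2}) = 2\left(P_{m_1m_2} - P_{m_1}P_{m_2}\right)\left(1 + \frac{\mathrm{Var}(X)}{E(X)^2}\right)\frac{1}{N} \qquad (A.9)$$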
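The closed form (A.8) rests on Taylor expansions, so a quick simulation of $F_m = S_m/S_t$ indicates how accurate it is for a moderate $N$. The sketch below simulates the single-surviving-class model, drawing each descendant count $X$ from an exponential distribution (an illustrative assumption, giving $\mathrm{Var}(X)/E(X)^2 = 1$), and compares the sample variance of $F_m$ with (A.8). Function names are hypothetical.

```python
import random
import statistics


def var_freq_closed_form(n_init, p_m, ex, varx):
    # (A.8): Var(F_m) = 2 P_m (1 - P_m) (1 + Var(X)/E(X)^2) / N
    return 2.0 * p_m * (1.0 - p_m) * (1.0 + varx / ex ** 2) / n_init


def simulate_freqs(n_init, p_m, mean_x, reps=2000, seed=0):
    """Draw F_m = S_m / S_t for the single-surviving-class model.

    X ~ Exponential(mean mean_x), so Var(X) = mean_x**2; this is an illustrative choice.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        s_m = s_t = 0.0
        for _ in range(n_init):
            if rng.random() < 0.5:             # recombinant falls in the fittest class
                x = rng.expovariate(1.0 / mean_x)
                s_t += x
                if rng.random() < p_m:         # and carries marker m
                    s_m += x
        if s_t > 0:
            out.append(s_m / s_t)
    return out


if __name__ == "__main__":
    f = simulate_freqs(200, 0.5, 5.0)
    print(statistics.variance(f), var_freq_closed_form(200, 0.5, 5.0, 25.0))
```

With $N = 200$, $P_m = 0.5$ and $E(X) = 5$, (A.8) gives $2(0.5)(0.5)(1 + 1)/200 = 0.005$, and the simulated variance should lie close to it.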


References

Brauer, M. J., Christianson, C. M., Pai, D. A. & Dunham, M. J. (2006). Mapping novel traits by array-assisted bulk segregant analysis in Saccharomyces cerevisiae. Genetics 173, 1813–1816.

Broman, K. W. (2001). Review of statistical methods for QTL mapping in experimental crosses. Lab Animal (NY) 30, 44–52.

Churchill, G. A. & Doerge, R. W. (1994). Empirical threshold values for quantitative trait mapping. Genetics 138, 963–971.

Culleton, R., Martinelli, A., Hunt, P. & Carter, R. (2005). Linkage group selection: rapid gene discovery in malaria parasites. Genome Research 15, 92–97.

Darvasi, A. & Soller, M. (1992). Selective genotyping for determination of linkage between a marker locus and a quantitative trait locus. Theoretical and Applied Genetics 85, 353–359.

Darvasi, A. & Soller, M. (1994). Selective DNA pooling for determination of linkage between a molecular marker and a quantitative trait locus. Genetics 138, 1365–1373.

Ehrenreich, I. M., Torabi, N., Jia, Y., Kent, J., Martis, S., Shapiro, J. A., Gresham, D., Caudy, A. A. & Kruglyak, L. (2010). Dissection of genetically complex traits with extremely large pools of yeast segregants. Nature 464, 1039–1042.

Feller, W. (1951). Diffusion processes in genetics. In Proceedings of Second Berkeley Symposium on Mathematics Statistics and Probability, pp. 227–246. Berkeley, CA: University of California Press.

Jagers, P. (1975). Branching Processes with Biological Applications. London, New York: Wiley.

Keightley, P. D. & Bulfield, G. (1993). Detection of quantitative trait loci from frequency changes of marker alleles under selection. Genetical Research 62, 195–203.

Kim, Y. & Stephan, W. (1999). Allele frequency changes in artificial selection experiments: statistical power and precision of QTL mapping. Genetical Research 73, 177–184.

Lander, E. S. & Botstein, D. (1989). Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121, 185–199.

Lebowitz, R. J., Soller, M. & Beckmann, J. S. (1987). Trait-based analyses for the detection of linkage between marker loci and quantitative trait loci in crosses between inbred lines. Theoretical and Applied Genetics 73, 556–562.

Martinelli, A., Cheesman, S., Hunt, P., Culleton, R., Raza, A., Mackinnon, M. & Carter, R. (2005). A genetic approach to the de novo identification of targets of strain-specific immunity in malaria parasites. Proceedings of National Academy of Sciences of the United States of America 102, 814–819.

Michelmore, R. W., Paran, I. & Kesseli, R. V. (1991). Identification of markers linked to disease-resistance genes by bulked segregant analysis: a rapid method to detect markers in specific genomic regions by using segregating populations. Proceedings of National Academy of Sciences of the United States of America 88, 9828–9832.

Nuzhdin, S. V., Harshman, L. G., Zhou, M. & Harmon, K. (2007). Genome-enabled hitchhiking mapping identifies QTLs for stress resistance in natural Drosophila. Heredity 99, 313–321.

Nuzhdin, S. V., Keightley, P. D., Pasyukova, E. G. & Morozova, E. A. (1998). Mapping quantitative trait loci affecting sternopleural bristle number in Drosophila melanogaster using changes of marker allele frequencies in divergently selected lines. Genetical Research 72, 79–91.

Sax, K. (1923). The association of size differences with seed-coat pattern and pigmentation in Phaseolus vulgaris. Genetics 8, 552–560.

Segre, A. V., Murray, A. W. & Leu, J. Y. (2006). High-resolution mutation mapping reveals parallel experimental evolution in yeast. PLoS Biology 4, e256.

Thoday, J. M. (1961). Location of polygenes. Nature 191, 368–370.
