Lecture 10: Linkage Analysis III

Date: 9/26/02

Topics: revisit segregation ratio distortion; haplotype coding; three point analysis; multipoint analysis.

Additive Segregation Ratio Distortion

Systematic genotype classification error occurs.

Power and estimates of recombination fraction are unaffected by additive distortion in the backcross configuration.

Estimates of recombination fraction are not affected for F2, but the false positive rate increases.

Additive Segregation - Backcross

Suppose the frequency of genotype Aa is increased because a fraction u of aa genotypes are misclassified.

Similarly, assume the frequency of genotype Bb is independently increased by fraction v.

We need to recalculate the expected frequencies under the new model with additional parameters u and v.

Additive Segregation – Backcross (contd)

Genotype    Expected frequency    Expected frequency with distortion
AaBb        0.5(1-θ)              0.5(1-θ) + u/2 + v/2
Aabb        0.5θ                  0.5θ + u/2 - v/2
aaBb        0.5θ                  0.5θ - u/2 + v/2
aabb        0.5(1-θ)              0.5(1-θ) - u/2 - v/2
Total: Aa   0.5                   0.5 + u
Total: aa   0.5                   0.5 - u
Total: Bb   0.5                   0.5 + v
Total: bb   0.5                   0.5 - v

Here θ is the recombination fraction.


The number of unknown parameters equals the number of degrees of freedom.

Use Bailey’s method to find the MLEs of the parameters (θ, u, v).

Bailey’s Method

Set the expected frequencies equal to the observed proportions and solve the system of equations for the unknown parameters. These are the MLEs.

Example: Suppose you observe 5 successes from a Binomial(10, p) distribution. Then

$$\hat{p}_{MLE} = 5/10 = 0.5$$


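Let f11, f12, f21, f22 be the observed counts of AaBb, Aabb, aaBb, aabb, and N their total. Setting the expected frequencies with additive distortion equal to the observed proportions and solving gives

$$\hat{\theta} = \frac{f_{12} + f_{21}}{N}$$

$$\hat{u} = \frac{f_{11} + f_{12} - f_{21} - f_{22}}{2N}$$

$$\hat{v} = \frac{f_{11} - f_{12} + f_{21} - f_{22}}{2N}$$

What do you notice about the MLE for the recombination fraction?

Is the MLE for the recombination fraction biased?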
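The following Python sketch illustrates Bailey's method for the additive backcross model. The function names and the counts are illustrative assumptions, not part of the lecture: the counts are simply the expected frequencies for θ = 0.2, u = 0.05, v = 0.02 scaled to N = 1000, so the solver should return those values.

```python
def expected_additive(theta, u, v):
    """Expected backcross frequencies (AaBb, Aabb, aaBb, aabb) with additive distortion."""
    return (
        0.5 * (1 - theta) + u / 2 + v / 2,
        0.5 * theta + u / 2 - v / 2,
        0.5 * theta - u / 2 + v / 2,
        0.5 * (1 - theta) - u / 2 - v / 2,
    )


def bailey_additive(f11, f12, f21, f22):
    """Solve expected frequency = observed proportion for (theta, u, v)."""
    n = f11 + f12 + f21 + f22
    theta = (f12 + f21) / n                    # u and v cancel in the recombinant classes
    u = (f11 + f12 - f21 - f22) / (2 * n)      # from the Aa and aa totals, 0.5 +/- u
    v = (f11 - f12 + f21 - f22) / (2 * n)      # from the Bb and bb totals, 0.5 +/- v
    return theta, u, v


# Illustrative counts: expected frequencies scaled to N = 1000.
counts = [1000 * p for p in expected_additive(0.2, 0.05, 0.02)]
theta_hat, u_hat, v_hat = bailey_additive(*counts)
print(theta_hat, u_hat, v_hat)
```

The estimate of θ is the plain recombinant proportion, which is the same estimator used when no distortion is assumed.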

Additive Segregation – F2-CC

Genotype    Expected frequency    Additive distortion
AABB        0.25(1-θ)^2           u/3 + v/3
AABb        0.5θ(1-θ)             u/3 - v/3
AAbb        0.25θ^2               u/3
AaBB        0.5θ(1-θ)             -u/3 + v/3
AaBb        0.5(1-2θ+2θ^2)        -u/3 - v/3
Aabb        0.5θ(1-θ)             -u/3
aaBB        0.25θ^2               v/3
aaBb        0.5θ(1-θ)             -v/3
aabb        0.25(1-θ)^2           0
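As a quick consistency check on the F2 table, the sketch below (function name and parameter values are illustrative assumptions) adds the distortion column to the standard F2 frequencies and confirms that the nine classes still sum to 1 and that the AA total moves from 0.25 to 0.25 + u.

```python
def f2_additive(theta, u=0.0, v=0.0):
    """F2 genotype frequencies with the additive distortion terms from the table."""
    t = theta
    base = {
        "AABB": 0.25 * (1 - t) ** 2,
        "AABb": 0.5 * t * (1 - t),
        "AAbb": 0.25 * t ** 2,
        "AaBB": 0.5 * t * (1 - t),
        "AaBb": 0.5 * (1 - 2 * t + 2 * t ** 2),
        "Aabb": 0.5 * t * (1 - t),
        "aaBB": 0.25 * t ** 2,
        "aaBb": 0.5 * t * (1 - t),
        "aabb": 0.25 * (1 - t) ** 2,
    }
    shift = {
        "AABB": u / 3 + v / 3,
        "AABb": u / 3 - v / 3,
        "AAbb": u / 3,
        "AaBB": -u / 3 + v / 3,
        "AaBb": -u / 3 - v / 3,
        "Aabb": -u / 3,
        "aaBB": v / 3,
        "aaBb": -v / 3,
        "aabb": 0.0,
    }
    return {g: base[g] + shift[g] for g in base}


freq = f2_additive(theta=0.2, u=0.03, v=0.06)
total = sum(freq.values())
total_AA = freq["AABB"] + freq["AABb"] + freq["AAbb"]
print(total, total_AA)
```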

Penetrance Distortion - Backcross

Selection, penetrance, and linkage to selected markers can all result in penetrance distortion, so it is quite common.

Suppose (100 × u)% of aa genotypes are misclassified as Aa. Similarly, and independently, assume that (100 × v)% of bb genotypes are misclassified as Bb.


Genotype    Expected frequency

AaBb        P(AaBb) + P(scored as Aa | aaBb) P(aaBb) + P(scored as Bb | Aabb) P(Aabb) + P(scored as AaBb | aabb) P(aabb)
            = 0.5(1-θ) + 0.5uθ + 0.5vθ + 0.5uv(1-θ)
            = 0.5[θ(u+v) + (1-θ)(1+uv)]

Aabb        0.5(1-v)[θ + u(1-θ)]

aaBb        0.5(1-u)[θ + v(1-θ)]

aabb        0.5(1-u)(1-v)(1-θ)


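With f11, f12, f21, f22 the observed counts of AaBb, Aabb, aaBb, aabb and N their total, Bailey's method gives

$$\hat{u} = \frac{f_{11} + f_{12} - f_{21} - f_{22}}{N}$$

$$\hat{v} = \frac{f_{11} - f_{12} + f_{21} - f_{22}}{N}$$

$$\hat{\theta} = 1 - \frac{2 f_{22}}{N (1 - \hat{u})(1 - \hat{v})}$$

Is the estimate for recombination fraction biased?

The power to detect linkage is decreased.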
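A small round-trip sketch of the penetrance-distortion model in Python. The frequencies for Aabb, aaBb and aabb are derived here with the same misclassification argument used for AaBb (an aa genotype is scored Aa with probability u, a bb genotype is scored Bb with probability v, independently); the function names and the counts are illustrative assumptions.

```python
def expected_penetrance(theta, u, v):
    """Scored frequencies (AaBb, Aabb, aaBb, aabb) under penetrance distortion."""
    p11 = 0.5 * (theta * (u + v) + (1 - theta) * (1 + u * v))
    p12 = 0.5 * (1 - v) * (theta + u * (1 - theta))
    p21 = 0.5 * (1 - u) * (theta + v * (1 - theta))
    p22 = 0.5 * (1 - u) * (1 - v) * (1 - theta)
    return p11, p12, p21, p22


def bailey_penetrance(f11, f12, f21, f22):
    """Solve expected frequency = observed proportion for (theta, u, v)."""
    n = f11 + f12 + f21 + f22
    u = (f11 + f12 - f21 - f22) / n            # Aa total is 0.5(1 + u)
    v = (f11 - f12 + f21 - f22) / n            # Bb total is 0.5(1 + v)
    theta = 1 - 2 * f22 / (n * (1 - u) * (1 - v))
    return theta, u, v


# Illustrative counts: expected frequencies scaled to N = 1000.
counts = [1000 * p for p in expected_penetrance(0.2, 0.1, 0.05)]
theta_hat, u_hat, v_hat = bailey_penetrance(*counts)
print(theta_hat, u_hat, v_hat)
```

Unlike the additive case, the recombinant proportion (f12 + f21)/N is no longer the MLE of θ here, because u and v enter the recombinant classes multiplicatively.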

Cost of Assuming Non-Distortion Model

The estimate for recombination fraction is biased. By how much?

$$\text{Bias} = E(\hat{\theta}) - \theta$$
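One way to put numbers on this bias is sketched below in Python. It assumes, as an illustration rather than as the lecture's own calculation, that the data follow the backcross penetrance-distortion model and that the analyst uses the non-distortion estimator, the recombinant proportion (f12 + f21)/N. The function name is hypothetical.

```python
def naive_bias(theta, u, v):
    """Bias of the recombinant proportion when penetrance distortion is ignored."""
    # Expected scored frequencies of the two recombinant classes under distortion.
    p_Aabb = 0.5 * (1 - v) * (theta + u * (1 - theta))
    p_aaBb = 0.5 * (1 - u) * (theta + v * (1 - theta))
    return p_Aabb + p_aaBb - theta


# Tabulate the bias for equal distortion u = v at three recombination fractions.
for theta in (0.1, 0.2, 0.3):
    row = [round(naive_bias(theta, d, d), 3) for d in (0.0, 0.1, 0.2, 0.3, 0.4)]
    print(theta, row)
```

With u = v the expression simplifies to u(1 - 2θ) - u^2(1 - θ), so the bias vanishes only when there is no distortion.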

Overall Impact of Segregation Distortion

[Figure: bias in the estimated recombination fraction (vertical axis) plotted against distortion with u = v (horizontal axis), one curve each for recombination fractions 0.1, 0.2, and 0.3.]

First Project

This slide marks the end of the material that will be needed to complete the first project.

Linkage Analysis for Multiple Loci

The haplotype is the sequence of alleles along one of the chromosomes in an individual.

In multipoint linkage analysis we are concerned not with the allele at each locus but with its parental origin.

Recoding Haplotypes

Suppose there are k loci. Recode each haplotype as a string of k-1 0’s and 1’s.

If the ith position is 0, the (i+1)th locus is not recombinant with respect to the first locus.

If the ith position is 1, the (i+1)th locus is recombinant with respect to the first locus.

Recoding Haplotypes (contd)

Haplotype ABC (x marks a crossover in the picture)

Code    Recombinant on interval    Picture
        AB     AC     BC
00      no     no     no           A-B-C
01      no     yes    yes          A-BxC
10      yes    no     yes          AxBxC
11      yes    yes    no           AxB-C
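The recoding can be sketched in a few lines of Python. This sketch assumes the convention that the three-locus table follows, where each code position compares a locus with the first locus (so 01 is recombinant on AC and BC but not AB). The input format, a string of parental-origin labels such as "MPM", and the function names are assumptions made for illustration.

```python
def recode(origins):
    """Recode a haplotype from the parental origin at each locus, e.g. "MPM".

    Position i of the code is 1 when locus i+1 differs in origin from locus 1.
    """
    first = origins[0]
    return "".join("1" if o != first else "0" for o in origins[1:])


def crossover_intervals(origins):
    """Indices of adjacent-locus intervals where the parental origin switches."""
    return [i for i in range(len(origins) - 1) if origins[i] != origins[i + 1]]


for haplotype in ("MMM", "MMP", "MPM", "MPP"):
    print(haplotype, recode(haplotype), crossover_intervals(haplotype))
```

Interval index 0 is AB and index 1 is BC, so "MPM" (code 10) has crossovers in both intervals while "MPP" (code 11) has a crossover only in AB.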

Recoding Haplotypes (contd)

[Figure: example haplotypes drawn with crossovers marked by x, shown beside their codes (101 and 000110).]

Recoded Haplotypes and Recombination Fractions

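Let p_xy denote the probability of haplotype class xy.

$$\theta_{AB} = p_{10} + p_{11}$$

$$\theta_{BC} = p_{10} + p_{01}$$

$$\theta_{AC} = p_{01} + p_{11}$$

$$1 = p_{00} + p_{01} + p_{10} + p_{11}$$

Sample Problem

Calculate the probabilities of the four haplotype classes (i.e. 00, 10, 01, 11) when θ_AB = 0.1 and θ_BC = 0.2 and θ_AC is unknown. Assume the Sturt map function with L = 1.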

Plan of Attack

1. Transform recombination fractions to genetic map units using the inverse map function.

2. Sum the genetic map units to obtain length of AC interval.

3. Calculate the recombination fraction between AC using the map function.

4. Solve the set of simultaneous equations for the haplotype frequencies.

Step 1

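The Sturt map function is

$$\theta = \frac{1}{2}\left[1 - \left(1 - \frac{m}{L}\right) e^{-m(2L-1)/L}\right]$$

Inverting it numerically with L = 1 gives

$$m_{AB} = 0.108, \qquad m_{BC} = 0.238$$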

Step 2

$$m_{AC} = m_{AB} + m_{BC} = 0.108 + 0.238 = 0.346$$

Step 3

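$$\theta_{AC} = \frac{1}{2}\left[1 - \left(1 - \frac{m_{AC}}{L}\right) e^{-m_{AC}(2L-1)/L}\right] = \frac{1}{2}\left[1 - (1 - 0.346)\, e^{-0.346}\right] = 0.269$$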

Step 4

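Solve the four simultaneous equations

$$p_{10} + p_{11} = 0.1$$

$$p_{10} + p_{01} = 0.2$$

$$p_{01} + p_{11} = 0.269$$

$$p_{00} + p_{01} + p_{10} + p_{11} = 1$$

to obtain

$$p_{00} = 0.7155, \quad p_{10} = 0.0155, \quad p_{01} = 0.1845, \quad p_{11} = 0.0845$$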
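The four steps can be reproduced with a short Python sketch. The bisection-based inverse and all function names are assumptions made for illustration (the lecture does not say how the inverse was computed), and the haplotype classes use the same labeling as the equations in Step 4. Because the sketch carries more decimal places than the slides, its values agree with the rounded ones only to about three decimals.

```python
import math


def sturt(m, L=1.0):
    """Sturt map function: map distance m (Morgans) to recombination fraction."""
    return 0.5 * (1 - (1 - m / L) * math.exp(-m * (2 * L - 1) / L))


def sturt_inverse(theta, L=1.0, tol=1e-12):
    """Invert the Sturt map function by bisection on [0, L] (it is increasing there)."""
    lo, hi = 0.0, L
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sturt(mid, L) < theta:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


theta_ab, theta_bc = 0.1, 0.2
m_ab = sturt_inverse(theta_ab)          # Step 1
m_bc = sturt_inverse(theta_bc)
m_ac = m_ab + m_bc                      # Step 2
theta_ac = sturt(m_ac)                  # Step 3

# Step 4: solve p10 + p11 = theta_ab, p10 + p01 = theta_bc, p01 + p11 = theta_ac.
p11 = (theta_ab + theta_ac - theta_bc) / 2
p10 = theta_ab - p11
p01 = theta_ac - p11
p00 = 1 - p01 - p10 - p11
print(round(m_ab, 3), round(m_bc, 3), round(theta_ac, 3))
print(round(p00, 4), round(p10, 4), round(p01, 4), round(p11, 4))
```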

Phase Known Three Point Analysis

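When all gametes in the sample are fully informative, the likelihood is simple.

$$l = \sum_{i=1}^{4} f_i \log p_i$$

where f_i is the observed count and p_i the probability of haplotype class i.

The log-likelihood can be written in terms of the three recombination fractions or, equivalently, two recombination fractions and the coefficient of coincidence c:

$$l(\theta_{AB}, \theta_{BC}, \theta_{AC}) = l(\theta_{AB}, \theta_{BC}, c), \qquad c = \frac{\theta_{AB} + \theta_{BC} - \theta_{AC}}{2\,\theta_{AB}\,\theta_{BC}}$$

How would you test for interference?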
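One possible answer, sketched in Python under stated assumptions: compare the unrestricted fit (each class probability estimated by its observed proportion) with the fit under no interference (c = 1, crossovers in AB and BC independent) using a likelihood ratio statistic with 1 degree of freedom. The class labels assume each code position is scored relative to locus A, so the first digit marks recombination between A and B and a crossover between B and C occurs when the two digits differ. The function name and the counts are illustrative.

```python
import math


def interference_test(f00, f01, f10, f11):
    """Return (coincidence estimate, LRT statistic, chi-square 1-df p-value).

    Assumes theta_AB and theta_BC estimates are both nonzero.
    """
    n = f00 + f01 + f10 + f11
    counts = {"00": f00, "01": f01, "10": f10, "11": f11}
    theta_ab = (f10 + f11) / n          # first digit is 1
    theta_bc = (f01 + f10) / n          # digits differ: crossover between B and C
    # Class probabilities when crossovers in AB and BC are independent (c = 1).
    null = {
        "00": (1 - theta_ab) * (1 - theta_bc),
        "01": (1 - theta_ab) * theta_bc,
        "10": theta_ab * theta_bc,
        "11": theta_ab * (1 - theta_bc),
    }
    lrt = 2 * sum(f * math.log((f / n) / null[h]) for h, f in counts.items() if f > 0)
    coincidence = (f10 / n) / (theta_ab * theta_bc)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(lrt, 0.0) / 2))
    return coincidence, lrt, p_value


print(interference_test(7155, 1845, 155, 845))
```

A coincidence estimate below 1 together with a small p-value points to positive interference.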

Multipoint Analysis – A Difficulty

Suppose there are k loci. How many haplotypes are possible? How many recombination fractions are there?

Recombination Value

Definition: The recombination value of a set of intervals is the probability of an odd number of crossovers occurring in the intervals.

How many sets of intervals are there?

Sample Problem – Four Point Analysis

Suppose loci A, B, C, and D are in syntenic order and θ_AB = 0.1, θ_BC = 0.2, and θ_CD = 0.3.

What are the probabilities of the haplotype classes given the Kosambi map function?

$$\theta = \frac{1}{2}\,\frac{e^{4m} - 1}{e^{4m} + 1}$$
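As a first step toward the four-point problem, the Python sketch below converts the three adjacent recombination fractions to Kosambi map distances, adds them, and converts back to get the recombination fractions for the longer intervals. The function names are illustrative; the closed-form inverse follows from solving the Kosambi formula for m.

```python
import math


def kosambi(m):
    """Kosambi map function: 0.5 * (e^{4m} - 1) / (e^{4m} + 1) = 0.5 * tanh(2m)."""
    return 0.5 * math.tanh(2 * m)


def kosambi_inverse(theta):
    """Map distance in Morgans for a recombination fraction 0 <= theta < 0.5."""
    return 0.25 * math.log((1 + 2 * theta) / (1 - 2 * theta))


m_ab, m_bc, m_cd = (kosambi_inverse(t) for t in (0.1, 0.2, 0.3))
theta_ac = kosambi(m_ab + m_bc)
theta_bd = kosambi(m_bc + m_cd)
theta_ad = kosambi(m_ab + m_bc + m_cd)
print(round(theta_ac, 4), round(theta_bd, 4), round(theta_ad, 4))
```

The recombination value for the non-adjacent interval set {AB, CD} does not correspond to a single map distance, so this sketch leaves it out.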

The Linear Equations

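Let p_xyz denote the probability of haplotype class xyz, and let θ_{AB,CD} denote the recombination value for the pair of intervals AB and CD.

$$1 = p_{111} + p_{101} + p_{011} + p_{110} + p_{100} + p_{010} + p_{001} + p_{000}$$

$$\theta_{AD} = p_{111} + p_{101} + p_{011} + p_{001}$$

$$\theta_{AC} = p_{111} + p_{110} + p_{011} + p_{010}$$

$$\theta_{AB,CD} = p_{111} + p_{100} + p_{010} + p_{001}$$

$$\theta_{AB} = p_{111} + p_{110} + p_{101} + p_{100}$$

$$\theta_{BD} = p_{110} + p_{100} + p_{011} + p_{001}$$

$$\theta_{BC} = p_{101} + p_{100} + p_{011} + p_{010}$$

$$\theta_{CD} = p_{110} + p_{101} + p_{010} + p_{001}$$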
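The pattern behind these equations can be generated mechanically. The Python sketch below is an illustration with a hypothetical function name; it assumes each haplotype code position records whether that locus is recombinant relative to the first locus. For every nonempty set of adjacent intervals it lists the haplotype classes with an odd number of crossovers in that set.

```python
from itertools import product


def recombination_value_rows(k):
    """Map each nonempty interval set to the haplotype classes it sums over.

    Interval sets are 0/1 tuples over the k-1 adjacent intervals, e.g. for
    k = 4, (1, 0, 1) is the set {AB, CD} and (1, 1, 1) is the whole span AD.
    """
    haplotypes = ["".join(bits) for bits in product("01", repeat=k - 1)]
    rows = {}
    for subset in product((0, 1), repeat=k - 1):
        if not any(subset):
            continue
        members = []
        for code in haplotypes:
            origin = [0] + [int(c) for c in code]   # origin relative to locus 1
            crossed = [origin[j] != origin[j + 1] for j in range(k - 1)]
            n_cross = sum(1 for c, s in zip(crossed, subset) if c and s)
            if n_cross % 2 == 1:
                members.append(code)
        rows[subset] = members
    return rows


rows = recombination_value_rows(4)
for subset, members in rows.items():
    print(subset, members)
```

For k = 4 this yields 7 rows of 4 classes each, one row per recombination value.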

Multipoint Likelihood

Can be written in terms of the 2^(k-1) - 1 recombination values or haplotype frequencies.

Can be reparameterized as k-1 recombination fractions and 2^(k-1) - k interference parameters. Then tests for interference are possible.

An alternative is to assume a map function with possibly unknown parameters, which constrains the gamete probabilities as functions of the k-1 recombination fractions.

Multilocus-Infeasible Map Functions

Kosambi, Carter-Falconer, and Felsenstein map functions are multilocus-infeasible because they can produce negative gametic frequencies.

The Morgan, Haldane, Sturt and generalized map functions are multilocus-feasible.

Haldane is most often used for its simplicity except when linkage is tight, e.g. m << 0.5.

Map Building

How many possible orders are there for k loci?

10 loci can be ordered in 10!/2 = 1,814,400 ways (an order and its reverse count once), which is over 1 million.

The solution is to generate a small number of probable orders and then analyze these few in depth.

Stepwise Approximate Ordering

Use likelihood analysis to order a few markers, say l.

Add each additional marker one at a time by considering all l+1 positions for it (the l-1 intervals between ordered markers plus the two ends). Choose the location that results in the highest likelihood.

Number of likelihood evaluations: 3+4+5...+k = (k-2)(k+3)/2.
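The two counts in this part of the lecture can be checked with a short Python sketch. The function names are illustrative, and the stepwise count assumes the procedure starts from two ordered markers, which is what the sum 3 + 4 + ... + k implies.

```python
import math


def number_of_orders(k):
    """Distinct orders of k >= 2 loci, counting an order and its reverse once."""
    return math.factorial(k) // 2


def stepwise_evaluations(k):
    """Likelihood evaluations when markers are added one at a time: 3 + 4 + ... + k."""
    return sum(range(3, k + 1))


k = 10
print(number_of_orders(k), stepwise_evaluations(k), (k - 2) * (k + 3) // 2)
```

The gap between the exhaustive count and the stepwise count is what makes the approximate ordering practical.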

Pairwise Approximate Ordering

Two point linkage analysis on all pairs of loci to obtain a recombination fraction estimate.

Multidimensional scaling analyses (multivariate exploratory analysis) to find approximate orders.

Final Step – Perfecting Order

Test the likelihood of various reorderings of neighboring groups of loci.

If a tested order has higher likelihood, keep it.

etc...

Disease Mapping

Condition on an ordering of all markers except disease locus.

Calculate a multilocus log-likelihood for each possible position x of the disease locus; call this lx.

Calculate the location score 2(lx - l) at point x, where l is the log-likelihood with disease locus unlinked to other markers.

Disease Mapping (contd)

Can also calculate multipoint LOD scores by dividing location scores by 2ln(10).

Plot location score or multipoint LOD score by position x. The peak is the likely position of the disease locus. If the peak exceeds some cutoff criterion, linkage to that region is significant.
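The conversion between location scores and multipoint LOD scores is sketched below in Python. The positions and log-likelihood values are made up purely for illustration, and the function names are hypothetical; log-likelihoods are assumed to be natural logs.

```python
import math


def location_score(loglik_at_x, loglik_unlinked):
    """Location score 2(lx - l) from natural-log likelihoods."""
    return 2 * (loglik_at_x - loglik_unlinked)


def multipoint_lod(score):
    """Multipoint LOD score: location score divided by 2 ln(10)."""
    return score / (2 * math.log(10))


# Hypothetical values, for illustration only.
positions_cM = [0, 5, 10, 15, 20]
logliks = [-104.2, -101.7, -99.3, -100.9, -103.8]
loglik_unlinked = -106.0

lods = [multipoint_lod(location_score(lx, loglik_unlinked)) for lx in logliks]
peak_position = positions_cM[lods.index(max(lods))]
print(peak_position, [round(x, 2) for x in lods])
```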

Multipoint vs. Single Point Disease Mapping

Multipoint analysis uses information from every sampled individual, even those who may be homozygous at a single marker.

A single marker can only provide information about crossovers on one side of the disease gene.

The more markers, the sharper the peak.

The disease gene is ultimately mapped to the smallest interval where there is no observed crossover between marker and disease gene in the entire sample.

Sample Size

Assuming no interference, crossovers occur at a mean rate of 1 per Morgan, and the distance between crossovers is exponentially distributed.

Sample n individuals and the mean rate is n per Morgan.

Therefore, the expected distance to the nearest crossover on either side of the disease locus is 1/n.

The interval containing the disease gene has length distributed as a gamma distribution with mean 2/n.

Example: You want to localize the disease gene to 1 cM = 1/100 M. Setting 2/n ≤ 1/100 gives n ≥ 200.
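The sample size rule can be written as a two-line Python helper. The function names are illustrative, and the calculation simply restates the mean interval length of 2/n Morgans, or 200/n centimorgans.

```python
import math


def expected_interval_cM(n):
    """Mean length in cM of the crossover-free interval around the disease locus."""
    return 200.0 / n


def required_sample_size(target_cM):
    """Smallest n whose expected interval length is at most target_cM."""
    return math.ceil(200.0 / target_cM)


print(required_sample_size(1), expected_interval_cM(200))
```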

Summary

Modeling of segregation distortion and the impact on linkage analysis.

Haplotype coding.

The use of map functions.

Overview of likelihood formulation for multipoint analysis.
