QTL Analysis: Concept Parents F1 F2 F2:3 × AB Generation Procedure Alternatives: BC1, RIL, DHL...

Post on 27-Mar-2015


QTL Analysis: Concept

Generation Procedure: Parents A × B → F1 → F2 → F2:3

Alternatives: BC1, RIL, DHL

Field: PHT [cm]

210 190 203 159 206 . . 171

Laboratory: marker genotypes (rows: individuals 1 to N; columns: markers 1 to M)

Marker #  1  2  3  4  5  ..  M
1         B  B  H  H  A  ..  A
2         H  A  H  A  A  ..  H
3         B  B  H  H  H  ..  A
4         H  H  B  B  B  ..  H
5         H  B  H  H  A  ..  B
.         .  .  .  .  .  ..  .
N         A  H  H  H  A  ..  A

Office: LOD score for PHT along Chromosome 1

QTL Analysis: Single Marker Analysis

[Figure: plant height (cm), axis from 160 to 240, with marker class means XMC (cm)]

Marker class means XMC (cm) of plant height; total mean 196 cm

Marker   AA    Aa    aa    F test
umc130   201   196   191   F = 6.47**
umc157   195   197   195   F = 0.48 ns
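The F values above come from comparing plant height among the three marker classes. A minimal sketch of that one-way ANOVA F test follows. The individual plant heights are simulated, not the slide's data: the class sizes (25, 50, 25) and the within-class standard deviation (12 cm) are assumptions chosen only for illustration, so the resulting F will not match 6.47.

```python
import random


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on lists of values."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-class sum of squares: class means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-class sum of squares: individuals around their own class mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Hypothetical F2 sample: class means follow the umc130 row, everything else is assumed.
rng = random.Random(1)
class_means = {"AA": 201.0, "Aa": 196.0, "aa": 191.0}
class_sizes = {"AA": 25, "Aa": 50, "aa": 25}  # assumed 1:2:1 sample
heights = {
    g: [rng.gauss(class_means[g], 12.0) for _ in range(class_sizes[g])]
    for g in class_means
}

f_value, df1, df2 = one_way_f(list(heights.values()))
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

A marker with no linked QTL, such as umc157, would be expected to give an F value near 1 under the same procedure.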

QTL Analysis: Single Marker Model (F2)

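[Diagram: marker locus M/m linked to QTL Q/q with recombination frequency r]

Additive effect: $(\mu_{MM} - \mu_{mm})/2 = a(1 - 2r)$

Dominance effect: $\mu_{Mm} - (\mu_{MM} + \mu_{mm})/2 = d(1 - 2r)^2$

F tests on the contrasts of marker classes test the following hypotheses:

a > 0, d > 0, r < 0.5

Expected frequencies of QTL genotypes within each marker class, with QTL genotypic values μ1, μ2, μ3:

      QQ         Qq               qq         Marker class mean
MM    (1-r)^2    2r(1-r)          r^2        μ(MM)
Mm    r(1-r)     (1-r)^2 + r^2    r(1-r)     μ(Mm)
mm    r^2        2r(1-r)          (1-r)^2    μ(mm)
      μ1         μ2               μ3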

Schön, 2002
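The table of conditional QTL genotype frequencies can be turned into marker class means directly. The sketch below does this and checks that the two marker contrasts equal a(1 - 2r) and d(1 - 2r)^2. It assumes the usual parametrization μ1 = μ + a, μ2 = μ + d, μ3 = μ - a; the numeric values of μ, a, d, and r are hypothetical.

```python
def marker_class_means(mu, a, d, r):
    """Expected trait mean of each F2 marker class for a QTL at recombination frequency r."""
    # Genotypic values of the QTL genotypes (assumed parametrization).
    value = {"QQ": mu + a, "Qq": mu + d, "qq": mu - a}
    # Conditional frequencies of QTL genotypes given the marker genotype.
    freq = {
        "MM": {"QQ": (1 - r) ** 2, "Qq": 2 * r * (1 - r), "qq": r ** 2},
        "Mm": {"QQ": r * (1 - r), "Qq": (1 - r) ** 2 + r ** 2, "qq": r * (1 - r)},
        "mm": {"QQ": r ** 2, "Qq": 2 * r * (1 - r), "qq": (1 - r) ** 2},
    }
    return {m: sum(p * value[q] for q, p in row.items()) for m, row in freq.items()}


# Hypothetical QTL: mean 196 cm, a = 10 cm, d = 4 cm, 20% recombination with the marker.
mu, a, d, r = 196.0, 10.0, 4.0, 0.2
means = marker_class_means(mu, a, d, r)

additive_contrast = (means["MM"] - means["mm"]) / 2
dominance_contrast = means["Mm"] - (means["MM"] + means["mm"]) / 2

print(additive_contrast, a * (1 - 2 * r))        # both close to 6.0
print(dominance_contrast, d * (1 - 2 * r) ** 2)  # both close to 1.44
```

Because only the products a(1 - 2r) and d(1 - 2r)^2 are observable from marker class means, a large QTL far from the marker and a small QTL close to it cannot be told apart by a single marker.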

QTL Analysis: Single Marker Model (F2)

[Diagrams: Case 1, marker M/m and QTL Q/q with r = 0; Case 2, marker M/m and QTL Q/q with r = 0.2]

Example: Plant height, umc130

X(MM) = 201 cm, X(Mm) = 196 cm, X(mm) = 191 cm

PHT (cm)      r = 0    r = 0.2   r = 0.4
Add. Effect   5.0      8.3       25.0
X(QQ)         201.0    204.3     221.0
X(Qq)         196.0    196.0     196.0
X(qq)         191.0    187.7     171.0
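The table values follow from dividing the observed marker contrast by (1 - 2r). A short sketch of that arithmetic is given below. It assumes no dominance, which is consistent with X(Mm) lying exactly midway between X(MM) and X(mm); the function name is hypothetical.

```python
def implied_qtl_values(x_mm_upper, x_het, x_mm_lower, r):
    """Back-calculate the additive effect and QTL genotype means from marker class means.

    Arguments are the means of marker classes MM, Mm, and mm; r is the assumed
    recombination frequency between marker and QTL. Assumes d = 0.
    """
    a = (x_mm_upper - x_mm_lower) / 2 / (1 - 2 * r)
    midpoint = (x_mm_upper + x_mm_lower) / 2
    return a, midpoint + a, x_het, midpoint - a


for r in (0.0, 0.2, 0.4):
    a, x_QQ, x_Qq, x_qq = implied_qtl_values(201.0, 196.0, 191.0, r)
    print(f"r = {r}: a = {a:.1f}, X(QQ) = {x_QQ:.1f}, X(Qq) = {x_Qq:.1f}, X(qq) = {x_qq:.1f}")
```

The same marker means are compatible with an additive effect of 5.0, 8.3, or 25.0 cm depending on r, which is the confounding of QTL effect and position in single marker analysis.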

4. Association Analysis

Concepts

Dissecting A Quantitative Trait: Time Versus Resolution

[Figure: research time in years (1 to 5) against resolution in bp (1x10^7 to 1, with 1x10^4 marked) for Associations, F2 QTL Mapping, RI QTL Mapping, NILs, and Positional Cloning]

Resolution Versus Allelic Range

[Figure: alleles evaluated (1 to >40) against resolution in bp (1x10^7 to 1, with 1x10^4 marked) for Associations In Diverse Germplasm, Associations In Narrow Germplasm, Pedigree, F2 or RIL Mapping, NIL, and Positional Cloning]

Association Tests

• Evaluate whether nucleotide polymorphisms associate with phenotype

• Natural populations

• Exploit extensive recombination

[Figure: six plants with heights 1.3 m, 1.5 m, 1.4 m, 1.8 m, 2.0 m, and 2.0 m, each shown with its nucleotide haplotype]

Association mapping

• Mainstay of human genetics
  – One of a few possible approaches
  – Reproducibility was an issue

• Cystic fibrosis
  – Kerem, et al. (1989). Science 245, 1073-1080.

• Alzheimer's disease
  – Corder et al. (1994). Nature Genet. 7, 180-184.

Associations may result from at least three causes

1. The locus is the cause of the phenotype

2. The locus is in linkage disequilibrium with the cause of the phenotype (linked and highly correlated)

Adapted from Rafalski (2002) Curr Opin Plant Biol 5:94-100.

Complete Linkage Disequilibrium (D' = 1, r^2 = 1)

Haplotype counts, Locus 1 × Locus 2: 6, 6

Same mutational history and no recombination. No resolution.

Linkage Disequilibrium (D' = 1, r^2 = 0.33)

Haplotype counts, Locus 1 × Locus 2: 3, 6, 3

Different mutational history and no recombination. Some resolution.

Linkage Equilibrium (D' = 0, r^2 = 0)

Haplotype counts, Locus 1 × Locus 2: 3, 3, 3, 3

Same mutational history with recombination. Resolution.

3. Population structure can produce associations

[Figure: G and T alleles in Andes and U.S. samples; panels of plant height (axis 80 to 200) and kernel hue (axis 0 to 10) by allele (G, T); P values shown: P = 0.04 and P << 0.001]

These non-functional associations can be accounted for by estimating the population structure using random markers.

5. QTL mapping analysis

QTL Analysis: Interval Mapping

[Plot: LOD profile along the chromosome, 0 to 150 cM, with marker positions indicated by M; peak at 96 cM, LOD = 4.7]

[Diagram: QTL Q/q located between flanking markers M1/m1 and M2/m2, with recombination frequencies r1, r2, and r]

Simple Interval Mapping

Composite Interval Mapping

PlabQTL

QTL Analysis: Power of QTL detection

Power: Probability of finding a QTL

Heritability: $h^2 = \sigma_g^2 / \sigma_p^2$

[Figure: power (%) of QTL detection, 0 to 100, against heritability, 0.4 to 1.0, for population sizes N = 100, N = 300, and N = 600]

Utz and Melchinger, 1994

QTL Analysis: Conclusions

A trait is usually controlled by a number of QTL. The largest ones are the easiest to detect, BUT

their presence makes detection of the others difficult.

Models can adjust for this and detect the others.

QTL Analysis: Conclusions

QTL mapping combines qualitative linkage analysis with quantitative genetic analysis. – Association between marker genotypes and phenotypic trait values.

Single marker analysis is easy to perform but QTL effect and position are confounded. This results in low power of QTL detection.

Interval mapping approaches increase power of QTL detection and allow the estimation of QTL effects and position.

QTL Analysis: Conclusions

Estimates of QTL effects and the proportion of the genotypic variance explained by QTL are biased due to genotypic and environmental sampling.

Estimates of QTL position show low precision.

With large populations a large number of QTL is found for complex traits.

When conducting a QTL study you may wish to use a large population size.

6. Candidate Genes

Functional Genomics Using Diversity

Forward Genetics

Trait

Positionally clone gene

Reverse Genetics

Trait

Candidate gene

QTL Candidate Polymorphism

Comparative Genomics

Candidate Genes

Mutagenesis, Molecular & Biochemical Analyses, Expression

Positional Candidate Genes

Evolutionary Association Tests

Identify Genes with Phenotypic Effects

Move Alleles into Elite Lines with Transgenics and Introgression

Survey Diverse Races For:
1. Phenotype
2. Candidate Gene Sequence
3. Population History

Evaluate Phenotypic Effects and Make Germplasm Available to Breeders

Morphology, Physiology

QTL Mapping

Association Analysis

Identification of More Favorable Alleles

Enhanced Marker Assisted Breeding

7. Linkage Disequilibrium Analysis

Properties of LD

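Haplotype frequencies for two loci with alleles A/a and B/b:

        B                              b                              Total
A       $P_{AB} = p_A p_B + D_{AB}$    $P_{Ab} = p_A p_b - D_{AB}$    $p_A$
a       $P_{aB} = p_a p_B - D_{AB}$    $P_{ab} = p_a p_b + D_{AB}$    $p_a$
Total   $p_B$                          $p_b$                          1

The basic measure of LD is:

$D_{AB} = P_{AB} - p_A p_B$ ($D_{AB} = -D_{Ab} = -D_{aB} = D_{ab}$)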

Linkage Disequilibrium versus Generations Since its Creation

[Figure: disequilibrium $r_{AB}$ (0.1 to 1) against generation $g$ (0 to 500) for recombination rates c = 0.1, 0.02, 0.01, 0.005, and 0.001]

$r_{AB} \propto (1 - c)^g$
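The decay curves in this figure can be tabulated with a few lines of code. The sketch below assumes the initial disequilibrium is 1 and that it shrinks by a factor (1 - c) per generation, with no drift, selection, or new mutation; the function names are hypothetical.

```python
import math


def ld_after(r0, c, generations):
    """Disequilibrium remaining after a number of generations at recombination rate c."""
    return r0 * (1 - c) ** generations


def generations_to_halve(c):
    """Number of generations needed for disequilibrium to fall to half its value."""
    return math.log(0.5) / math.log(1 - c)


# Recombination rates taken from the figure legend.
for c in (0.1, 0.02, 0.01, 0.005, 0.001):
    remaining = ld_after(1.0, c, 100)
    half_life = generations_to_halve(c)
    print(f"c = {c}: LD after 100 generations = {remaining:.3f}, half-life = {half_life:.0f} generations")
```

Tightly linked loci (small c) keep most of their disequilibrium for hundreds of generations, while loosely linked loci lose it within a few dozen.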

Other Measures of LD

Can divide $D_{AB}$ by the maximum value it can attain:

$D'_{AB} = D_{AB} / \max(-p_A p_B, -p_a p_b)$ if $D_{AB} < 0$

$D'_{AB} = D_{AB} / \min(p_A p_b, p_a p_B)$ if $D_{AB} > 0$

The sampling properties of $D'_{AB}$ are not well understood.

$r^2_{AB} = D^2_{AB} / (p_A p_B p_a p_b)$

$E(r^2) = 1 / (1 + 4Nc)$
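The measures D, D', and r^2 can be computed from the four haplotype counts. The sketch below applies them to the counts shown in the three linkage disequilibrium panels earlier in the transcript (6, 6; 3, 6, 3; 3, 3, 3, 3). How those counts are assigned to the cells AB, Ab, aB, and ab is an assumption, chosen because it reproduces the D' and r^2 values stated on the slides.

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r^2) from haplotype counts; both loci must be polymorphic."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    p_a, p_b = 1 - p_A, 1 - p_B
    d = n_AB / n - p_A * p_B
    # Scale D by the most extreme value it can take given the allele frequencies.
    if d < 0:
        d_max = max(-p_A * p_B, -p_a * p_b)
    else:
        d_max = min(p_A * p_b, p_a * p_B)
    d_prime = d / d_max
    r2 = d * d / (p_A * p_a * p_B * p_b)
    return d, d_prime, r2


# Assumed cell layouts (AB, Ab, aB, ab) for the three panels.
panels = {
    "Complete LD": (6, 0, 0, 6),
    "LD": (3, 0, 3, 6),
    "Linkage equilibrium": (3, 3, 3, 3),
}
for name, counts in panels.items():
    d, d_prime, r2 = ld_measures(*counts)
    print(f"{name}: D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```

The middle panel shows why the two measures differ: with only three of the four haplotypes present D' stays at 1, while r^2 drops to 0.33 because the allele frequencies at the two loci are unequal.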

LD generally decays rapidly with distance

[Figure: $r^2$ (0.00 to 1.00) against distance in bp (0 to 10000) for the genes d8, id1, sh1, tb1, d3, fae2, su1, bt2, sh2, and wx1]

Remington, D. L., et al. 2001. PNAS-USA 98:11479-11484. & unpublished

Population Effect on Linkage Disequilibrium in Maize

Investigator   Population Studied   Extent of LD
Gaut           Landraces            <1000 bp
Buckler        Diverse Inbreds      2000 bp
Rafalski       Elite Lines          100 kb? (6 kb euchromatin?)

Reviewed in Flint-Garcia, S. A. et al. 2003. Annual Review of Plant Biology 54:357-374.

8. Association Analysis

Allele Case-Control Test

             allele 1      allele 2      Total
Affected     n1|aff        n2|aff        2 naff
Unaffected   n1|unaff      n2|unaff      2 nunaff
Total        n1            n2            2 N individuals

For a marker with k alleles, if naff = nunaff:

$X^2 = \sum_i \frac{(n_{i|aff} - n_{i|unaff})^2}{n_{i|aff} + n_{i|unaff}} \sim \chi^2_{(k-1)}$
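A minimal sketch of the allele case-control statistic follows. It also computes the general Pearson chi-square for the same table to show that the two agree when the numbers of affected and unaffected individuals are equal. The allele counts are hypothetical, and the P value helper covers only the two-allele case (1 degree of freedom).

```python
import math


def allele_case_control(aff, unaff):
    """X^2 and degrees of freedom from allele counts in affected and unaffected groups."""
    if sum(aff) != sum(unaff):
        raise ValueError("this shortcut assumes equal numbers of affected and unaffected")
    x2 = sum((a - u) ** 2 / (a + u) for a, u in zip(aff, unaff) if a + u > 0)
    return x2, len(aff) - 1


def pearson_chi2(aff, unaff):
    """General Pearson chi-square for the same 2 x k table, for comparison."""
    total = sum(aff) + sum(unaff)
    x2 = 0.0
    for row in (aff, unaff):
        for i, observed in enumerate(row):
            expected = sum(row) * (aff[i] + unaff[i]) / total
            if expected > 0:
                x2 += (observed - expected) ** 2 / expected
    return x2


def p_value_df1(x2):
    """Upper-tail probability of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x2 / 2))


# Hypothetical counts: 50 affected and 50 unaffected individuals, 2 alleles each.
affected = [60, 40]
unaffected = [40, 60]
x2, df = allele_case_control(affected, unaffected)
print(x2, df, pearson_chi2(affected, unaffected), p_value_df1(x2))
```

With unequal group sizes the shortcut no longer applies and the general Pearson form should be used instead.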

Population Stratification: American Indian and Diabetes

Knowler 1988 Am J Hum Genet 43, 520-526.

Study without knowledge of genetic background:

Gm3;5,13,14 haplotype   Cases    Controls
+                       7.8%     29.0%
-                       92.2%    71.0%

OR = 0.27, 95% CI = 0.18 to 0.40

Haplotype frequency and NIDDM prevalence by population:

Population                      Gm3;5,13,14 +   Gm3;5,13,14 -   NIDDM Prevalence
Full heritage American Indian   ~1%             ~99%            40%
Caucasian                       ~66%            ~34%            15%

Proportion with NIDDM by heritage and marker status

Index of Indian Heritage   Gm3;5,13,14 +   Gm3;5,13,14 -
0                          17.8%           19.9%
4                          28.3%           28.8%
8                          35.9%           39.3%

Use SSR Markers to Estimate Population Structure

[Figure: % Stiff Stalk (0% to 100%) against % Non-Stiff Stalk (0% to 100%); groups: 8 Stiff Stalk, 38 Non-Stiff Stalk, 30 Sub-Tropical]

Method: Pritchard, J. K., M. Stephens, and P. Donnelly. 2000. Inference of population structure using multilocus genotype data. Genetics 155:945-59.

Example: Remington, D. L., et al. 2001. Proc Natl Acad Sci U S A 98:11479-11484.

Logistic Regression Ratio Test For Association

• Adapted from Pritchard case-control approach

$\Lambda = \frac{\Pr_1(C; T, \hat{Q})}{\Pr_0(C; \hat{Q})}$

• Where:
  – C = candidate polymorphism distribution
  – T = trait value
  – Q = matrix of population membership

• Evaluated by logistic regression

• Significance evaluated by permutation based on haplotype distribution in populations

Pritchard, J. K., M. Stephens, N. A. Rosenberg, and P. Donnelly. 2000. Am J Hum Genet 67:170-181.

Population Structure Estimates Greatly Reduce Estimated Type I Error Rates

[Figure: SSR estimated Type I error rate (0.00 to 0.25) in Fields 1 to 4 for Flowering Time and Height, comparing No Pop. Structure Estimate, With Pop. Structure Estimate, and Pop. Structure with Rescaling]

Su1

• Sugary1 is an isoamylase, a starch debranching enzyme

• Sequenced fully from 32 diverse lines

• Sampled 2 small parts of gene from 102 lines

11100bp

Whitt, S. R., et al. 2002. PNAS-USA 99:12959-12962.

su1 Promoter & 1st Exon

• Two distinct alleles

• Sweet phenotype not associated

[Figure: counts of Sweet, Dent + Flint, and Pop lines by haplotype]

su1 Coding Region

• Two distinct alleles

• Sweet phenotype associated with W578R

[Figure: counts of Sweet, Dent + Flint, and Pop lines by haplotype; amino acid polymorphism labels include 662:K E and 578:W R]


Su1

578:WR

Based on survey of 12kbp from 32-102 lines.

Dwarf8 functional variation

[Gene diagram; labels: 2 Amino Acid Deletion, SH2 Domain, MITE Indel]

When controlling for population structure, associates with flowering time & plant height across 12 environments.

Thornsberry et al. 2001 Nat. Genet.

[Figure: days to silking relative to B73 (0.6 to 1.8) by D8 SH2 variant]

9. Type I and Type II Error

Statistics - Hypothesis Test

                                 Null Hypothesis True   Null Hypothesis False
Reject Null Hypothesis           Type I Error (α)       Correct
Fail to Reject Null Hypothesis   Correct                Type II Error (β)

Power = 1 - β

Significance level (P-value threshold) = α

Experimentwise P value

• Each statistical test has a Type I error rate
  – Test 20 independent SNPs, and on average one will be significant at P<0.05 by chance

• Bonferroni correction essentially divides the P threshold by the number of tests
  – Often too conservative (no power), as markers are correlated

• Churchill and Doerge permutation helps estimate the experimentwise P
  – Permutes the entire genotype relative to the phenotypes
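The three ideas on this slide can be sketched in code: the chance of at least one false positive across many tests, the Bonferroni threshold, and a permutation estimate of an experimentwise threshold. The permutation part is only in the spirit of Churchill and Doerge: it uses the absolute marker-trait correlation as a stand-in test statistic, and the genotypes and phenotypes are randomly generated, hypothetical data.

```python
import random


def familywise_error(alpha, n_tests):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni(alpha, n_tests):
    """Per-test threshold that keeps the experimentwise error at or below alpha."""
    return alpha / n_tests


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as a simple association statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_threshold(genotypes, phenotypes, n_perm=200, alpha=0.05, seed=0):
    """Experimentwise threshold: (1 - alpha) quantile of the maximum statistic
    over all markers, with phenotypes shuffled relative to whole genotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks marker-trait association, keeps marker correlation
        maxima.append(max(abs_corr(marker, shuffled) for marker in genotypes))
    maxima.sort()
    index = min(n_perm - 1, round((1 - alpha) * n_perm))
    return maxima[index]


# Hypothetical data: 10 markers scored 0/1/2 on 60 individuals, trait unrelated to markers.
rng = random.Random(42)
genotypes = [[rng.choice((0, 1, 2)) for _ in range(60)] for _ in range(10)]
phenotypes = [rng.gauss(196.0, 12.0) for _ in range(60)]

print(familywise_error(0.05, 20))   # about 0.64
print(bonferroni(0.05, 20))
threshold = permutation_threshold(genotypes, phenotypes)
print(threshold)
```

Because each permutation shuffles phenotypes against intact multi-marker genotypes, the threshold automatically reflects correlation among markers, which is where Bonferroni tends to be too conservative.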

Power of approaches

• Sample size
  – 100 to 1000 are typical

• Heritability of trait
  – H2 = 10% - 90%
  – Depends on ability to measure trait
  – Interactions with environment

• Depends on statistical properties of test

Association Approaches Complement QTL Linkage Mapping

                               Association    Linkage (RILs)
Resolution                     2000 bp        10,000,000 bp
Genome Scan                    Little Power   High Power
Statistical Power per Allele   Low            High
Allelic Range                  High (10s)     Low (1 or 2)