Aladdin Hamwieh
Mapping and Applications of Linkage Disequilibrium and Association
Mapping in Crop Plants
Independent assortment and Punnett square
A dihybrid cross produces F2 progeny in the ratio 9:3:3:1.
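The 9:3:3:1 ratio can be checked by enumerating a Punnett square. The sketch below is illustrative only: it assumes two unlinked genes with alleles named A/a and B/b (names chosen here, not taken from the slides), with the uppercase allele fully dominant.

```python
from collections import Counter
from itertools import product

# The four gametes of an AaBb parent under independent assortment.
gametes = ["".join(g) for g in product("Aa", "Bb")]


def phenotype(gamete1, gamete2):
    """Return the phenotype class, assuming complete dominance of A and B."""
    a = "A" if "A" in (gamete1[0], gamete2[0]) else "a"
    b = "B" if "B" in (gamete1[1], gamete2[1]) else "b"
    return a + b


# Fill the 4 x 4 Punnett square and count the phenotype classes.
counts = Counter(phenotype(g1, g2) for g1, g2 in product(gametes, repeat=2))
print(dict(counts))
```

The 16 cells of the square fall into the classes AB, Ab, aB and ab in the proportions 9:3:3:1.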
Crossover
Independent assortment produces a recombinant frequency of 50 percent.
Linkage
• Loci that are close enough together on the same chromosome to deviate from independent assortment are said to display genetic linkage.
• BUT linked loci that are far from each other are more likely to be separated by CROSSING-OVER.
Deviations from independent assortment
In the early 1900s, William Bateson and R. C. Punnett were studying inheritance of two genes in the sweet pea.
In a standard self of a dihybrid F1, the F2 did not show the 9:3:3:1 ratio predicted by the principle of independent assortment.
In fact Bateson and Punnett noted that certain combinations of alleles showed up more often than expected, almost as though they were physically attached in some way. They had no explanation for this discovery.
Thomas Hunt Morgan found a similar deviation from Mendel’s second law while studying two autosomal genes in Drosophila. Morgan proposed a hypothesis to explain the phenomenon of apparent allele association.
One of the genes affected eye color (pr, purple, and pr+, red), and the other wing length (vg, vestigial, and vg+, normal). The wild-type alleles of both genes are dominant.
DEVIATIONS FROM INDEPENDENT ASSORTMENT
When two genes are close together on the same chromosome pair (i.e., linked), they do not assort independently.
• Chiasmata (the visible manifestations of crossing-over): cross-shaped structures forming the points of contact between non-sister chromatids of homologous chromosomes.
Frequencies of recombinants arising from crossing-over. The frequencies of such recombinants are less than 50 percent.
Linkage maps (distance between the genes.)
• Recombinant frequencies for linked genes are significantly lower than 50 percent; here the recombinant frequency was 12.97 percent: (146 + 157) × 100 / 2335 = 12.97
• Morgan studied more linked genes and found that the proportion of recombinant progeny varied considerably, depending on which linked genes were studied.
• Morgan concluded that this variation might reflect the actual distances separating genes on the chromosomes.
• Alfred Sturtevant suggested that we can use this percentage of recombinants as a quantitative index of the linear distance between two genes on a genetic map, or linkage map.
• Sturtevant postulated the greater the distance between the linked genes, the greater the chance of crossovers in the region between the genes.
• Sturtevant defined one genetic map unit (m.u.) as that distance between genes for which one product of meiosis in 100 is recombinant. Put another way, a recombinant frequency (RF) of 0.01 (1 percent) is defined as 1 m.u. A map unit is sometimes referred to as a centimorgan (cM) in honor of Thomas Hunt Morgan.
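As a minimal sketch of Sturtevant's definition, the recombinant frequency from the cross above (146 + 157 recombinants among 2335 progeny) can be converted to map units. The function names are illustrative, not part of any mapping package.

```python
def recombinant_frequency(recombinants, total):
    """Return the recombinant frequency (RF) as a fraction between 0 and 1."""
    if total <= 0:
        raise ValueError("total must be positive")
    return recombinants / total


def map_units(rf):
    """Convert an RF fraction to map units (1 m.u. = 1 cM = RF of 0.01)."""
    return rf * 100


rf = recombinant_frequency(146 + 157, 2335)
distance = map_units(rf)
print(f"RF = {rf:.4f}, distance = {distance:.2f} m.u.")
```

This direct conversion is only reasonable for short distances; for larger RF values a mapping function is normally applied.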
LINKAGE MAPS (DISTANCE BETWEEN THE GENES.)
A chromosome region containing three linked genes. Calculation of AB and AC distances leaves us with the two possibilities shown for the BC distance.
Recombination between linked genes can be used to map their distance apart on the chromosome. The unit of mapping (1 m.u.) is defined as a recombinant frequency of 1 percent.
Example
For the v and ct loci: 89 + 94 + 3 + 5 = 191
For the ct and cv loci: 45 + 40 + 3 + 5 = 93
For the v and cv loci: 45 + 40 + 89 + 94 = 268
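The three counts above are enough to infer the gene order. The sketch below assumes that the classes with 3 and 5 progeny are the double recombinants (they are counted in both of the shorter intervals). The pair with the most recombinants marks the two outer loci, and the sum of the two shorter intervals exceeds the outer count by twice the number of double recombinants.

```python
# Recombinant counts for each pair of loci, as listed in the example.
pair_recombinants = {
    ("v", "ct"): 89 + 94 + 3 + 5,
    ("ct", "cv"): 45 + 40 + 3 + 5,
    ("v", "cv"): 45 + 40 + 89 + 94,
}
double_crossovers = 3 + 5  # assumed: the two rarest classes

# The pair with the most recombinants marks the two outer loci.
outer = max(pair_recombinants, key=pair_recombinants.get)
middle = ({"v", "ct", "cv"} - set(outer)).pop()

# Sum of the two inner intervals versus the directly counted outer interval.
inner_sum = sum(n for pair, n in pair_recombinants.items() if pair != outer)
shortfall = inner_sum - pair_recombinants[outer]

print(f"outer loci: {outer}, middle locus: {middle}")
print(f"inner sum = {inner_sum}, outer count = {pair_recombinants[outer]}")
print(f"shortfall = {shortfall} (twice the {double_crossovers} double recombinants)")
```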
Fig. 5.15
Mapping the 12 chromosomes of tomatoes.
Morphological Markers
1. Small number
2. Limited genomic coverage
3. Can be influenced by the environment
4. Most of them are dominant in nature
Linkage Mapping
• Genes are points on the genome, and the flanking regions around them are linked to these genes.
• The central idea of linkage mapping is to place many points on the genome in order to find points that are linked to other points of interest (genes).
• These added points are called “MARKERS”.
Molecular Markers
• Dominant or co-dominant in nature, of different types:
1. Protein-based
– Isozyme
– Allozyme
2. Hybridization-based
– RFLP
– DArT
3. PCR-based
– RAPD, AP-PCR
– AFLP
– STS (SSR, ISSR, SCAR, CAPS)
– RGA
4. Single Nucleotide Polymorphism (SNP)
Linkage mapping populations
The mapping resolution and the genetic diversity in linkage mapping populations depend on the number of founders, the generations of inter-mating and the generations of selfing.
AI-RILs, advanced intercross-recombinant inbred lines
HIF, heterogeneous inbred family
MAGIC lines, multiparent advanced generation intercross lines
NIL, near-isogenic line
RILs, recombinant inbred lines
(Bergelson and Roux, 2010) Nature Reviews Genetics (December), Vol 11: 867-879
Hamwieh et al. 2005
Introduction
Molecular markers:
• RFLP
• AFLP
• RAPD
• SSR
• SNP
• STS
• ISSR
Genetic map of lentil
RAPD, AFLP, SSR
How to genotype?
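[Figure: gel images for Marker 1 and Marker 2, with lanes for the parents P1 and P2 and for progeny plants (e.g., Plant 85, Plant 86). Progeny lanes are scored as a, b or H; example scores: a b b b a a b b b a a a a b b b a b b H]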
How to genotype?
Qualitative traits:
[Figure: banding patterns of a co-dominant marker and a dominant marker in the parents P1 and P2 and in progeny lanes 1-14]
Chi-Square
• Obs.A = 45, Exp.A = 50
• Obs.B = 55, Exp.B = 50
$$\chi^2 = \sum \frac{(Obs - Exp)^2}{Exp}$$
$$\chi^2 = \frac{(50 - 45)^2}{50} + \frac{(50 - 55)^2}{50} = 1$$
Chi-Square Table
DF   0.995   0.950   0.100    0.050    0.025    0.010    0.005
1    0.000   0.004   2.706    3.842    5.024    6.635    7.879
2    0.010   0.103   4.605    5.992    7.378    9.210    10.597
3    0.072   0.352   6.251    7.815    9.348    11.345   12.838
4    0.207   0.711   7.779    9.488    11.143   13.277   14.860
5    0.412   1.146   9.236    11.071   12.833   15.086   16.750
6    0.676   1.635   10.645   12.592   14.449   16.812   18.548
7    0.989   2.167   12.017   14.067   16.013   18.475   20.278
8    1.344   2.733   13.362   15.507   17.535   20.090   21.955
9    1.735   3.325   14.684   16.919   19.023   21.666   23.589
10   2.156   3.940   15.987   18.307   20.483   23.209   25.188
11   2.603   4.575   17.275   19.675   21.920   24.725   26.757
12   3.074   5.226   18.549   21.026   23.337   26.217   28.300
13   3.565   5.892   19.812   22.362   24.736   27.688   29.819
14   4.075   6.571   21.064   23.685   26.119   29.141   31.319
15   4.601   7.261   22.307   24.996   27.488   30.578   32.801
16   5.142   7.962   23.542   26.296   28.845   32.000   34.267
17   5.697   8.672   24.769   27.587   30.191   33.409   35.718
18   6.265   9.390   25.989   28.869   31.526   34.805   37.156
19   6.844   10.117  27.204   30.144   32.852   36.191   38.582
20   7.434   10.851  28.412   31.410   34.170   37.566   39.997
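The chi-square test for the example (observed 45 and 55, expected 50 and 50) can be sketched as follows. The critical value is read from the table at DF = 1 and probability 0.050; the function name is illustrative.

```python
def chi_square(observed, expected):
    """Sum of (Obs - Exp)^2 / Exp over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


chi2 = chi_square([45, 55], [50, 50])

# Critical value taken from the table above (DF = 1, probability 0.050).
critical_value = 3.842
fits_expected_ratio = chi2 < critical_value

print(f"chi-square = {chi2}, fits 1:1 ratio at 0.05: {fits_expected_ratio}")
```

With two classes there is one degree of freedom, and a chi-square below the critical value means the deviation from the 1:1 expectation is not significant.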
Recombinant Fraction
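[Figure: banding patterns of markers M1 and M2 in the parents P1 and P2 and in progeny 1-14; 2 of the 14 progeny are recombinant]
$$\frac{2}{14} \times 100 = 14.3\ \text{cM}$$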
LOD Score
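$$\text{LOD score} = \log_{10}\frac{L(\theta < 0.5)}{L(\theta = 0.5)}$$
$$\text{Likelihood ratio} = \frac{L(\theta < 0.5)}{L(\theta = 0.5)} = \frac{(1 - \theta)^{n - m}\,\theta^{m}}{0.5^{n}}$$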
Zmax
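• n = 35
• m = 7
[Figure: LOD score (-1 to 3) plotted against recombinant fraction (0 to 0.5); the curve peaks at (0.20, 2.92978)]
$$Z_{max} = \max_{0 \le \theta \le 0.5} \log_{10}\frac{(1 - \theta)^{n - m}\,\theta^{m}}{0.5^{n}}$$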
m: recombinants
n: total number
n - m: non-recombinants

θ      0.001   0.01   0.05   0.1    0.2    0.3    0.4
Z      -6.0    -3.0   -1.1   -0.4   0.1    0.2    0.1

θ      0.05    0.1    0.15   0.2    0.25   0.3    0.4
Σ(Z)   28.2    31.2   30.4   27.8   24.0   19.4   9.0
Zmax = maximum likelihood score (MLS)
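The LOD curve for the example with 35 progeny and 7 recombinants can be reproduced with a short grid search. This is a sketch under the binomial likelihood used in the formula above; the grid step of 0.01 is an arbitrary choice made here.

```python
import math


def lod_score(theta, n, m):
    """LOD = log10[(1 - theta)^(n - m) * theta^m / 0.5^n], for 0 < theta <= 0.5."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    log_l_theta = (n - m) * math.log10(1 - theta) + m * math.log10(theta)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null


n, m = 35, 7  # total progeny and recombinants from the example

# Evaluate the LOD score on a grid of theta values and keep the maximum.
grid = [i / 100 for i in range(1, 51)]
best_theta = max(grid, key=lambda t: lod_score(t, n, m))
z_max = lod_score(best_theta, n, m)

print(f"theta = {best_theta:.2f}, Zmax = {z_max:.5f}")
```

The maximum falls at theta = m / n, which matches the peak marked on the plot.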
Mapping Function
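• Haldane: $$M = -0.5 \ln(1 - 2\theta)$$
• Kosambi: $$M = 0.25 \ln\frac{1 + 2\theta}{1 - 2\theta}$$
(M is in Morgans; multiply by 100 for centimorgans.)
[Figure: map distance (centimorgan, 0 to 200) plotted against recombinant fraction θ (0 to 0.5) for the Haldane and Kosambi functions]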
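A minimal sketch of the two mapping functions, returning distances in centimorgans (100 times the value in Morgans). The function names are illustrative.

```python
import math


def haldane_cm(theta):
    """Haldane map distance in cM for a recombinant fraction 0 <= theta < 0.5."""
    if not 0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    return -50 * math.log(1 - 2 * theta)


def kosambi_cm(theta):
    """Kosambi map distance in cM for a recombinant fraction 0 <= theta < 0.5."""
    if not 0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    return 25 * math.log((1 + 2 * theta) / (1 - 2 * theta))


for theta in (0.05, 0.1, 0.2, 0.3, 0.4):
    print(f"theta = {theta:.2f}: Haldane = {haldane_cm(theta):6.2f} cM, "
          f"Kosambi = {kosambi_cm(theta):6.2f} cM")
```

For small recombinant fractions both functions stay close to 100 × θ, and they diverge as θ approaches 0.5, where the Haldane distance grows faster.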
Software
Program | System | Lic. | Interface | Pop. Types | Ref.
CARTHAGENE | Win, UNIX | Free | Graphical, Command line | F2, backcross, RIL, outcross | de Givry et al. 2005
CRIMAP | Win, UNIX | Free | Command line | pedigree | Green et al. 1990
JOINMAP | Win | Com. | Graphical | F2, backcross, RIL, DH, outcross | Stam 1993
LINKMFEX | Win | Free | Graphical | outcross | Danzmann and Gharbi 2001
MAPMAKER | Win, UNIX, MAC | Free | Command line | F2, backcross, RIL, DH | Lander et al. 1987
MAPMANAGER | Win, MAC | Free | Graphical | F2, backcross, RIL | Manly and Olson 1999
QTL mapping
• Genotype and phenotype individuals
• Look for statistical correlation between genotype and phenotype
Quantitative traits:
Quantitative trait loci (QTL) analysis: correlate segregation of the quantitative trait with that of a qualitative trait, i.e., markers
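The idea of correlating marker segregation with a quantitative trait can be sketched with a single-marker comparison. The genotype scores below are those of marker _3_0363_ across 16 lines; the disease-severity values are hypothetical numbers invented for illustration and are not data from this presentation.

```python
from statistics import mean

# Genotype scores of one marker across 16 lines (A or B).
genotypes = list("ABBAAABABBABBBBB")
# HYPOTHETICAL disease-severity scores (0-10), invented for illustration only.
phenotypes = [2, 7, 8, 3, 2, 1, 6, 3, 7, 8, 2, 9, 6, 7, 8, 7]


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Mean phenotype of each marker class.
mean_a = mean(p for g, p in zip(genotypes, phenotypes) if g == "A")
mean_b = mean(p for g, p in zip(genotypes, phenotypes) if g == "B")

# Code genotypes numerically (A = 0, B = 1) and measure the association.
coded = [0 if g == "A" else 1 for g in genotypes]
r2 = r_squared(coded, phenotypes)

print(f"mean A = {mean_a:.2f}, mean B = {mean_b:.2f}, r2 = {r2:.2f}")
```

A large difference between the class means, with a correspondingly high r2, suggests that the marker is linked to a locus affecting the trait. Interval-based methods refine this by testing positions between markers.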
Marker     Distance  Lines 1-16
_3_0363_   0         A B B A A A B A B B A B B B B B
_1_1061_   0.8       A B B A A A B A B B A A A B B A
_3_0703_   1.5       B A A B B B A B A A B B B B B B
_1_1505_   1.5       B A A B B B A B A B B B B B B B
_1_0498_   1.5       B B B B B B B B B B B B B B B A
_2_1005_   3.8       A B B A A A B A B A A B B B B B
_1_1054_   3.8       A A A A A A A A A B A A A A A A
_2_0674_   6         A B B A A A B A B A A A A A A B
_1_0297_   8.8       A A B B B B B A A A A A A A A B
_1_0638_   10.7      A A B B B B B A A B A A A A A A
_1_1302_   11.4      B A A A B B A A A B A B B B B A
_1_0422_   11.4      B A A A B B A A A B A B B B B A
_2_0929_   15.3      A B B B A A B B B A B A A A A B
_3_1474_   15.4      A B B B A A B B B A B A A A A A
_1_1522_   17.3      A B B B A A B B B A B A A A A A
_2_1388_   17.3      A A A A A A A A A A A A A A A A
_3_0259_   18.1      B B B B B B B B B B B A A A A A
_1_0325_   18.1      B B B B B B B B B B B A A A A A
_2_0602_   20.8      A A B A A A A B A B A A A A A A
_1_0733_   23.9      B B B B B B B B B B B A A A A A
_2_0729    23.9      B B B B B B B B B B B A A A A A
_1_1272_   23.9      A B B B A A B B B B B B B B B B
_2_0891_   26.1      A A A A A A A A A B A A A A A A
_2_0748_   26.6      B B B B B B B B B A B B B B B B
_3_0251_   27.4      A B A A A B A A A B A A A B A A
_1_0997_   35.5      B B A A A B B B B B B B B B B B
_1_1133_   41.8      B B A A A B B B B A B A A A A A
_2_0500_   42.5      A A A A A A A A A B A B B B B B
_3_0634_   43.3      B B B B B B B B B A B A A A A A
[Figure: disease severity (scale 0 to 10) shown alongside the marker genotypes]
QTL Detection Software
Ref. | Software
Lander et al. 1987 | MapMaker/QTL
Basten et al. 1999 | QTL Cartographer
Broman et al. 2003 | R/qtl
Mester et al. 2004 | MultiQTL
van Ooijen and Maliepaard 1996 | MapQTL
Seaton et al. 2002 | QTL Express
Utz and Melchinger 1996 | PLABQTL
Meer et al. 2004 | MapManager/QTX
Wang et al. 2003 | WebQTL
Yang et al. 2005 | QTLNetwork
Statistical Models
1. Interval Mapping (IM)
2. Composite Interval Mapping (CIM)
3. Multiple Interval Mapping (MIM)
4. Bayesian Interval Mapping (BIM)
5. Single Marker Regression (MR)
6. Statistical Machine Learning (SML)
Association mapping
Comparison of Different Plant Breeding Materials for Association Mapping
Hamwieh, A., Udupa, S., Sarker, A., Jung, C. and Baum, M. (2009). Development of new microsatellite markers and their application in the analysis of genetic diversity in lentils. Breeding Science 59: 77-86.
Project 2: Genetic diversity in lentils
300 accessions
2915 accessions
Chickpea Reference Set (GCP)
Upadhyaya HD, Dwivedi SL, Baum M, Varshney RK, Udupa SM, Gowda CLL, Hoisington D and Singh S (2008) Genetic structure, diversity, and allelic richness in composite collection and reference set in chickpea (Cicer arietinum L.). BMC Plant Biology 8: 106.
Allele frequency
– frequency (A) = p,
– frequency (B) = q,
then the next generation will have:
– frequency of the AA genotype = p²
– frequency of the AB genotype = 2pq
– frequency of the BB genotype = q²
Allele and Genotype Frequencies in H-W equilibrium
p² (AA)
2pq (Aa)
q² (aa)
Hardy-Weinberg Equilibrium
Hardy–Weinberg equilibrium
                   Females
                   A (p)      a (q)
Males    A (p)     AA (p²)    Aa (pq)
         a (q)     Aa (pq)    aa (q²)
p² + 2pq + q² = 1
p = AA + ½ Aa
q = aa + ½ Aa
where p is the frequency of the A allele, q is the frequency of the a allele, and p + q= 1.
Basic Descriptors of Linkage Disequilibrium
• LD is measuring non random association between alleles
[Figure: markers m1 to m9 along a chromosome]
Hardy–Weinberg equilibrium
p + q = 1
p² + 2pq + q² = 1
Example
p: the frequency of the dominant allele.
q: the frequency of the recessive allele.
p²: the frequency of individuals with the homozygous dominant genotype.
2pq: the frequency of individuals with the heterozygous genotype.
q²: the frequency of individuals with the homozygous recessive genotype.
Hardy–Weinberg equilibrium
p + q = 1
p² + 2pq + q² = 1
There are 160 white fruits in a population of 1000. White fruits are the homozygous recessive class and have only one genotype (bb). Black fruits can have either the genotype Bb or the genotype BB, so their genotype frequencies cannot be determined directly.
Frequency of a class = number of individuals in the class / total population
160 / 1000 = 0.16
bb = q² = 0.16
q = 0.4
p = 1 - q = 1 - 0.4 = 0.6
2pq = 2 × 0.6 × 0.4 = 0.48
p² = 0.6² = 0.36
q² × total population = 0.16 × 1000 = 160 white fruits, bb genotype
p² × total population = 0.36 × 1000 = 360 black fruits, BB genotype
2pq × total population = 0.48 × 1000 = 480 black fruits, Bb genotype
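The fruit-colour example can be written as a short calculation. This sketch assumes the population is in Hardy-Weinberg equilibrium, so that the recessive class frequency equals q²; the function name is illustrative.

```python
import math


def hardy_weinberg_from_recessive(n_recessive, n_total):
    """Estimate allele and genotype frequencies from the recessive class count."""
    q2 = n_recessive / n_total      # frequency of bb
    q = math.sqrt(q2)               # frequency of allele b
    p = 1 - q                       # frequency of allele B
    return {"p": p, "q": q, "BB": p * p, "Bb": 2 * p * q, "bb": q2}


population = 1000
freqs = hardy_weinberg_from_recessive(160, population)

# Expected number of individuals in each genotype class.
expected_counts = {g: round(freqs[g] * population) for g in ("BB", "Bb", "bb")}
print(freqs)
print(expected_counts)
```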
[Figure: marker A plotted against marker B for each case]
Linkage equilibrium: random association
Linkage disequilibrium: there is a correlation between loci
Introduction to Linkage Disequilibrium
SNP1 (rows) and SNP2 (columns):
         B      b      Total
A        PAB    PAb    PA
a        PaB    Pab    Pa
Total    PB     Pb     1.0
Haplotypes: A B, A b, a B, a b
A, B: major alleles
a, b: minor alleles
PA: probability for A alleles at SNP1
Pa: probability for a alleles at SNP1
PB: probability for B alleles at SNP2
Pb: probability for b alleles at SNP2
PAB: probability for AB haplotypes
Pab: probability for ab haplotypes
Linkage Equilibrium
• PAB = PAPB
• PAb = PAPb = PA(1-PB)
• PaB = PaPB = (1-PA) PB
• Pab = PaPb = (1-PA) (1-PB)
SNP1 (rows) and SNP2 (columns):
         B      b      Total
A        PAB    PAb    PA
a        PaB    Pab    Pa
Total    PB     Pb     1.0
Linkage Disequilibrium
PAB ≠ PAPB
DAB=PAB-PAPB
Allele frequencies
         A1        A2        Total
B1       p1q1+D    p2q1-D    q1
B2       p1q2-D    p2q2+D    q2
Total    p1        p2
Linkage Disequilibrium
PAB ≠ PAPB
DAB = PAB - PAPB
When D ≥ 0: D’ = D/Dmax, where Dmax is the smaller of p1q2 and p2q1
When D < 0: D’ = D/Dmin, where Dmin is the larger of -p1q1 and -p2q2
Linkage Disequilibrium
Another LD measure is r², calculated as follows:
r² = D²/(p1p2q1q2)
0 ≤ r² ≤ 1
r² = 0: loci are in complete linkage equilibrium
r² = 1: loci are in complete linkage disequilibrium
Example
SNP locus A: A1 = T, A2 = C
SNP locus B: B1 = A, B2 = G

Haplotype   Observed frequency
A1B1        0.6
A1B2        0.1
A2B1        0.2
A2B2        0.1

Allele   Symbol   Allelic freq.
A1       p1       0.7
A2       p2       0.3
B1       q1       0.8
B2       q2       0.2

D = 0.6 - (0.7 × 0.8) = 0.04
D > 0, so we use Dmax
p1q2 = 0.14
p2q1 = 0.24
D’ = 0.04/0.14 = 0.286
r² = (0.04)²/(0.7 × 0.3 × 0.8 × 0.2) = 0.048
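The worked example can be reproduced from the four haplotype frequencies. This is a sketch with an illustrative function name. For negative D it scales by the smaller of p1q1 and p2q2, the bound that keeps every haplotype frequency non-negative.

```python
def ld_statistics(haplotypes):
    """Return (D, D', r2) from frequencies of haplotypes A1B1, A1B2, A2B1, A2B2."""
    p1 = haplotypes["A1B1"] + haplotypes["A1B2"]   # frequency of allele A1
    q1 = haplotypes["A1B1"] + haplotypes["A2B1"]   # frequency of allele B1
    p2, q2 = 1 - p1, 1 - q1

    d = haplotypes["A1B1"] - p1 * q1
    if d >= 0:
        bound = min(p1 * q2, p2 * q1)   # Dmax
    else:
        bound = min(p1 * q1, p2 * q2)   # absolute value of Dmin
    d_prime = abs(d) / bound if bound > 0 else 0.0
    r2 = d ** 2 / (p1 * p2 * q1 * q2)
    return d, d_prime, r2


observed = {"A1B1": 0.6, "A1B2": 0.1, "A2B1": 0.2, "A2B2": 0.1}
d, d_prime, r2 = ld_statistics(observed)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r2 = {r2:.3f}")
```

The function assumes both loci are polymorphic; if any allele frequency is 0, r2 is undefined.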
Examples
Disease
Linkage Disequilibrium
Likelihood ratio test for HWE
An Example of LD Bins (1/3)
• SNP1 and SNP2 cannot form an LD bin.
– e.g., A in SNP1 may imply either G or A in SNP2.

Individual   SNP1   SNP2   SNP3   SNP4   SNP5   SNP6
1            A      G      A      C      G      T
2            T      G      C      C      G      C
3            A      A      A      T      A      T
4            T      G      C      T      A      C
5            T      A      C      C      G      C
6            T      G      C      T      A      C
7            A      A      A      T      A      T
8            A      A      A      T      A      T
An Example of LD Bins (2/3)
• SNP1, SNP3, and SNP6 can form an LD bin.
– Any SNP in this bin is sufficient to predict the values of the others.
An Example of LD Bins (3/3)
• There are three LD bins, and only three tag SNPs are required to be genotyped (e.g., SNP1, SNP2, and SNP4).
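The bins in this example can be found mechanically from the eight-individual table. The sketch below uses a simple greedy grouping written for this illustration, not a published tag-SNP algorithm: two SNPs are placed in the same bin when each allele at one always occurs with exactly one allele at the other.

```python
# One string per individual: alleles at SNP1..SNP6, from the table above.
individuals = ["AGACGT", "TGCCGC", "AAATAT", "TGCTAC",
               "TACCGC", "TGCTAC", "AAATAT", "AAATAT"]

# Transpose rows into one allele column per SNP.
columns = {f"SNP{i + 1}": "".join(col) for i, col in enumerate(zip(*individuals))}


def in_perfect_ld(x, y):
    """True when the alleles of x and y correspond one-to-one across individuals."""
    pairs = set(zip(x, y))
    return len(pairs) == len(set(x)) == len(set(y))


# Greedy binning: join the first bin whose representative SNP predicts this SNP.
bins = []
for name, column in columns.items():
    for ld_bin in bins:
        if in_perfect_ld(columns[ld_bin[0]], column):
            ld_bin.append(name)
            break
    else:
        bins.append([name])

tag_snps = [ld_bin[0] for ld_bin in bins]
print(bins)
print(tag_snps)
```

On this table the grouping yields three bins, and the first SNP of each bin serves as its tag SNP.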
[Figure: LD (r²), from 0 to 1, plotted against distance (0 to 180), with curves for short LD extent and long LD extent]
Genome-Wide Association Studies (GWAS): Hunting for Genes in the New Millennium
• GWAS scan the genomes of thousands of individuals who have a particular phenotype for DNA sequences that they share but that are much rarer in individuals who do not have the trait.
• GWAS can identify new regions containing no a priori candidate genes, potentially enhancing the knowledge of complex traits.
Accessions with disorder Accessions without disorder
The new way to track genes (Genome wide association)
Advantages of combining association and traditional linkage mapping methods.
(Bergelson and Roux, 2010) Nature Reviews Genetics (December), Vol 11: 867-879
Thank you