Date posted: 21-Dec-2015
Thoughts about the TDT
Contribution of the TDT: Finding Genes for Three Complex Diseases
• PPAR-gamma in Type 2 diabetes
Altshuler et al. Nat Genet 26:76-80, 2000
• NOD2 in Crohn’s Disease
Hugot et al., Nature 411: 599-603, 2001
• ADAM33 in asthma
Van Eerdewegh et al., Nature 418: 426-430, 2002
The common PPAR-gamma Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes
Altshuler et al. Nat Genet 26:76-80, 2000
NOD2 Variants and Susceptibility to Crohn’s Disease
Hugot et al., Nature 411: 599-603, 2001
SNP13: $p = 6 \times 10^{-6}$ (chromosome 16q)
ADAM33 Gene: Asthma and Bronchial Hyperresponsiveness
Van Eerdewegh et al., Nature 418: 426-430, 2002
$p = 3 \times 10^{-6}$ to $0.04$ (chromosome 20p)
Supplementary Information, Table 2: Transmission disequilibrium test (TDT) for 5 SNPs in ADAM33
Alleles/haplotypes over-transmitted in asthma (T = transmitted, NT = not transmitted)
SNP/SNP combination   Allele/haplotype   T    NT   p-value
S1                    G                  37   20   0.033
T1                    T                  43   27   0.072
V-1                   C                  43   27   0.072
V1                    A                  7    7    1.00
V4                    C                  73   55   0.13
S1/T1                 GT                 72   38   0.0029
T1/V-1                TC                 80   46   0.0043
T1/V4                 TC                 97   60   0.0070
S1/T1/V-1             GTC                77   41   0.0029
S1/T1/V1              GTA                75   41   0.0047
S1/T1/V4              GTC                96   60   0.0084
T1/V-1/V1             TCA                76   45   0.015
T1/V-1/V4             TCC                97   59   0.0046
T1/V1/V4              TAC                98   58   0.0031
S1/T1/V-1/V1          GTCA               74   41   0.0068
S1/T1/V-1/V4          GTCC               96   58   0.0034
S1/T1/V1/V4           GTAC               97   58   0.0078
T1/V-1/V1/V4          TCAC               96   59   0.0063
S1/T1/V-1/V1/V4       GTCAC              95   58   0.0048
[Figure: SNP haplotypes]
Population distributions of (a) disease given genotype, and (b) genotype given disease.
(a) Disease given genotype
Genotype   Affected: Yes   Affected: No
M1M1       a               1 – a
M1M2       b               1 – b
M2M2       c               1 – c
(b) Genotype given disease
Genotype   Affected: Yes   Affected: No
M1M1       d               g
M1M2       e               h
M2M2       f               i
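Clayton
Clayton's $R(M_1M_1)$ and $R(M_1M_2)$ (each relative to M2M2) are odds ratios, computable from either table:
$R(M_1M_1) = \frac{a/(1-a)}{c/(1-c)} = \frac{d\,i}{f\,g}, \qquad R(M_1M_2) = \frac{b/(1-b)}{c/(1-c)} = \frac{e\,i}{f\,h}$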
He calls this the relative risk. Confusing!
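To see why the distinction matters, here is a small Python sketch (not from the source; the penetrances, genotype frequencies and function names are hypothetical). It builds the entries d, …, i of table (b) from table (a) by Bayes' rule and checks that the odds ratio is the same whichever table it is computed from, while the relative risk a/c is a different number.

```python
def relative_risk(a: float, c: float) -> float:
    """Ratio of penetrances P(affected | M1M1) / P(affected | M2M2)."""
    return a / c


def odds_ratio(a: float, c: float) -> float:
    """Odds of disease for M1M1 divided by odds of disease for M2M2 (table (a))."""
    return (a / (1 - a)) / (c / (1 - c))


def table_b_from_a(penetrances, genotype_freqs):
    """Bayes' rule: turn table (a) into the two columns of table (b).

    penetrances: (a, b, c) = P(affected | M1M1, M1M2, M2M2)
    genotype_freqs: population frequencies of M1M1, M1M2, M2M2 (hypothetical)
    Returns ((d, e, f), (g, h, i)).
    """
    prevalence = sum(x * pen for x, pen in zip(genotype_freqs, penetrances))
    affected = tuple(x * pen / prevalence
                     for x, pen in zip(genotype_freqs, penetrances))
    unaffected = tuple(x * (1 - pen) / (1 - prevalence)
                       for x, pen in zip(genotype_freqs, penetrances))
    return affected, unaffected


a, b, c = 0.4, 0.3, 0.2                 # hypothetical penetrances
freqs = (0.25, 0.5, 0.25)               # hypothetical genotype frequencies
(d, e, f), (g, h, i) = table_b_from_a((a, b, c), freqs)

print(relative_risk(a, c))                 # 2.0
print(odds_ratio(a, c), d * i / (f * g))   # both approximately 2.667
```

With a rare disease (small a and c) the odds ratio and a/c nearly coincide, which is one reason the two are easily conflated.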
Ott
Disease locus D (alleles D1, D2) and marker locus M (alleles M1, M2), with recombination fraction θ between them.
Null hypothesis: θ = ½
(Disease and marker loci unlinked)
Alternative hypothesis: θ < ½
(Disease and marker loci linked)
Association (gametic disequilibrium): freq(D1M1) ≠ freq(D1) × freq(M1), measured by
δ = freq(D1M1) – freq(D1) × freq(M1)
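As a small illustration (a Python sketch with made-up haplotype frequencies; the function name is hypothetical), δ can be computed from the four D/M haplotype frequencies. For two loci with two alleles each it also equals the cross-product difference freq(D1M1)freq(D2M2) – freq(D1M2)freq(D2M1).

```python
def gametic_disequilibrium(h11: float, h12: float, h21: float, h22: float) -> float:
    """delta = freq(D1M1) - freq(D1) * freq(M1).

    h11, h12, h21, h22 are the frequencies of the haplotypes
    D1M1, D1M2, D2M1, D2M2 (assumed to sum to 1).
    """
    freq_d1 = h11 + h12
    freq_m1 = h11 + h21
    return h11 - freq_d1 * freq_m1


# Hypothetical haplotype frequencies.
h = (0.3, 0.1, 0.2, 0.4)
delta = gametic_disequilibrium(*h)
cross_product = h[0] * h[3] - h[1] * h[2]
print(delta, cross_product)  # both approximately 0.1
```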
• We assume that we observe the marker locus genotypes, either M1M1, M1M2, or M2M2, of both parents and the affected sibs in all families in the data.
Probabilities for transmitted and non-transmitted marker alleles M1 and M2 from any parent of an affected child
                     Non-transmitted allele
Transmitted allele   M1       M2       Total
M1                   P(11)    P(12)    P(1.)
M2                   P(21)    P(22)    P(2.)
Total                P(.1)    P(.2)    1
$P(11) = q^2 + q\delta/p$
$P(12) = q(1 - q) + (1 - \theta - q)\delta/p$
$P(21) = q(1 - q) + (\theta - q)\delta/p$
$P(22) = (1 - q)^2 - (1 - q)\delta/p$
where p = freq(D1) and q = freq(M1).
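A direct transcription of these four formulas into Python (a sketch; the function name and the numerical values of p, q, δ and θ are arbitrary) makes two of their properties easy to check: the probabilities sum to 1, and P(12) – P(21) = (1 – 2θ)δ/p, which vanishes when θ = ½ or δ = 0.

```python
def transmission_probs(p: float, q: float, delta: float, theta: float):
    """Return (P11, P12, P21, P22) for a parent of an affected child.

    p = freq(D1), q = freq(M1), delta = gametic disequilibrium,
    theta = recombination fraction between D and M.
    """
    k = delta / p
    p11 = q * q + q * k
    p12 = q * (1 - q) + (1 - theta - q) * k
    p21 = q * (1 - q) + (theta - q) * k
    p22 = (1 - q) ** 2 - (1 - q) * k
    return p11, p12, p21, p22


probs = transmission_probs(p=0.1, q=0.3, delta=0.02, theta=0.1)
print([round(x, 4) for x in probs])   # [0.15, 0.33, 0.17, 0.35]
print(sum(probs))                     # approximately 1
```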
Numbers of transmitted and non-transmitted marker alleles M1 and M2 among the parents of the affected sibs
                     Non-transmitted allele
Transmitted allele   M1       M2
M1                   n11      n12
M2                   n21      n22
Put n = n12 + n21.
Only P(12) and P(21) depend on θ; indeed P(12) – P(21) = (1 – 2θ)δ/p.
Also, when θ = ½, P(12) = P(21).
So the “natural” (TDT) test statistic is
$\chi^2_{TDT} = \frac{(n_{12} - n_{21})^2}{n}$
This (McNemar) statistic has an asymptotic χ² distribution with 1 df when the null hypothesis is true.
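A minimal Python version of the TDT statistic (the function name `tdt` is hypothetical), with its asymptotic 1 df χ² p-value obtained from the identity P(χ²₁ > x) = erfc(√(x/2)). The usage line plugs in the S1 row of the ADAM33 table (37 transmitted, 20 not transmitted). The resulting p-value, about 0.024, is smaller than the 0.033 reported there, so the published values were presumably computed with some other variant of the test.

```python
import math


def tdt(n12: int, n21: int) -> tuple[float, float]:
    """TDT (McNemar) statistic (n12 - n21)^2 / n, n = n12 + n21, and its p-value.

    n12: transmissions of M1 from M1M2 parents
    n21: transmissions of M2 from M1M2 parents
    The p-value uses the asymptotic chi-square distribution with 1 df.
    """
    n = n12 + n21
    if n == 0:
        raise ValueError("no transmissions from informative parents")
    chi2 = (n12 - n21) ** 2 / n
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
    return chi2, p_value


chi2, p = tdt(37, 20)                 # S1 row of the ADAM33 table
print(round(chi2, 2), round(p, 3))    # 5.07 0.024
```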
• Note that this statistic depends only on n12 and n21, and ignores n11 and n22.
• This makes sense: the statistic uses data only from M1M2 parents, and only these are informative for linkage.
• We call these “informative” parents.
• So at the end of the day we consider only transmissions from informative parents.
• We will focus entirely on the denominator, n, of the TDT statistic.
• It is remarkable how many questions one can ask about this.
• But before asking these, we first ask: where does this denominator come from?
• Assuming the null hypothesis is true, n12 has a binomial (n, ½) distribution.
• Note: this is true even if the data contain several affected children from the same family.
• Since Var(n12) = n/4, the variance of n12 – n21 (= 2n12 – n) is 4 × n/4 = n.
• We will examine three situations, all focusing on the question: “Is n the correct (variance) denominator for the situation at hand?”
• Situation 1. Testing for association.
• Here the null hypothesis is “no association”, or δ = 0.
The problem here is that, when the marker is linked to the disease locus (θ < ½), transmissions to different affected sibs in the same family are not independent under this null hypothesis. Thus when there are several families in the data with more than one affected sib, n12 does not have a binomial distribution.
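The dependence can be illustrated by a toy simulation (a Python sketch; the model, its parameter r and the function names are assumptions, not from the source). Each family has one M1M2 parent and two affected sibs; with probability r the sibs receive the same allele from that parent (r > ½ mimics linkage, with affected sibs sharing more often), and since δ = 0 that allele is M1 or M2 with probability ½. Then Dj = n12j – n21j has variance 4r, whereas treating the family's two transmissions as binomial gives 2.

```python
import random


def family_d(r: float, rng: random.Random) -> int:
    """D_j for one family: +1 per M1 and -1 per M2 transmitted to the two sibs.

    r is a hypothetical probability that both sibs receive the same allele
    from the M1M2 parent; with delta = 0 the first allele is M1 or M2 at random.
    """
    first = 1 if rng.random() < 0.5 else -1
    second = first if rng.random() < r else -first
    return first + second


def variance_of_d(r: float, families: int = 200_000, seed: int = 1) -> float:
    """Empirical variance of D_j over simulated families."""
    rng = random.Random(seed)
    ds = [family_d(r, rng) for _ in range(families)]
    mean = sum(ds) / families
    return sum((d - mean) ** 2 for d in ds) / families


for r in (0.5, 0.8):
    # A binomial (independent transmissions) model gives variance 2 per family.
    print(r, round(variance_of_d(r), 2))
```

At r = ½ the two transmissions are independent and the binomial variance is recovered; for r > ½ the binomial denominator n understates the true variance.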
If H0 (δ = 0) is true, the cell probabilities for the simple random-mating case are
$P(11) = q^2, \quad P(12) = q(1 - q), \quad P(21) = q(1 - q), \quad P(22) = (1 - q)^2$
(Thus should we not be testing this H0 by using both n11n22 – n12n21 and n12 – n21 and a 2 degrees of freedom test?)
Let’s ignore this point for now.
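For a population made up of subpopulations i = 1, 2, …, the cell probabilities become
$P(11) = \frac{\sum_i \alpha_i \left(p_i^2 q_i^2 + \delta_i p_i q_i\right)}{\sum_i \alpha_i p_i^2}$
$P(12) = \frac{\sum_i \alpha_i \left(p_i^2 q_i (1 - q_i) + \delta_i p_i (1 - \theta - q_i)\right)}{\sum_i \alpha_i p_i^2}$
$P(21) = \frac{\sum_i \alpha_i \left(p_i^2 q_i (1 - q_i) + \delta_i p_i (\theta - q_i)\right)}{\sum_i \alpha_i p_i^2}$
$P(22) = \frac{\sum_i \alpha_i \left(p_i^2 (1 - q_i)^2 - \delta_i p_i (1 - q_i)\right)}{\sum_i \alpha_i p_i^2}$
where
αi = relative size of subpopulation i
δi = linkage disequilibrium in subpopulation i
pi = frequency of D1 in subpopulation i
qi = frequency of M1 in subpopulation i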
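The stratified formulas can be checked numerically in the same way. The Python sketch below (two hypothetical subpopulations, all numbers and names invented) confirms that the four probabilities sum to 1 and that P(12) = P(21) whenever θ = ½, whatever the δi, because the numerator of P(12) – P(21) is Σi αi δi pi (1 – 2θ).

```python
from dataclasses import dataclass


@dataclass
class Subpopulation:
    alpha: float  # relative size of the subpopulation
    delta: float  # gametic disequilibrium within it
    p: float      # frequency of D1
    q: float      # frequency of M1


def stratified_probs(subpops, theta):
    """(P11, P12, P21, P22) for a parent of an affected child, pooled over subpopulations."""
    denom = sum(s.alpha * s.p ** 2 for s in subpops)

    def pooled(term):
        return sum(s.alpha * term(s) for s in subpops) / denom

    p11 = pooled(lambda s: s.p ** 2 * s.q ** 2 + s.delta * s.p * s.q)
    p12 = pooled(lambda s: s.p ** 2 * s.q * (1 - s.q) + s.delta * s.p * (1 - theta - s.q))
    p21 = pooled(lambda s: s.p ** 2 * s.q * (1 - s.q) + s.delta * s.p * (theta - s.q))
    p22 = pooled(lambda s: s.p ** 2 * (1 - s.q) ** 2 - s.delta * s.p * (1 - s.q))
    return p11, p12, p21, p22


subpops = [Subpopulation(0.6, 0.02, 0.1, 0.3), Subpopulation(0.4, -0.01, 0.2, 0.6)]
print(stratified_probs(subpops, theta=0.5))
```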
Suppose that in family j, M1 is transmitted n12j times and M2 is transmitted n21j times from M1M2 parents.
Define Dj = n12j – n21j.
The test statistic is
$T = \frac{\sum_j D_j}{\sqrt{\sum_j D_j^2}}, \quad \text{where } \sum_j D_j = n_{12} - n_{21}.$
Suppose that there is only one affected child in each family.
Then Dj = ±1 (for all j), so $\sum_j D_j^2 = n$.
Equivalently, $T = z = \frac{n_{12} - n_{21}}{\sqrt{n_{12} + n_{21}}}$, and $T^2 = \chi^2_{TDT}$.
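In code (a Python sketch; `robust_t` is a hypothetical name and the lists of Dj values are invented), T needs only the per-family differences Dj. With one affected child per family it reproduces z, and T² equals the ordinary TDT χ²; with several affected sibs it replaces the binomial variance n by the empirical ΣDj².

```python
import math


def robust_t(ds):
    """T = sum(D_j) / sqrt(sum(D_j^2)), where D_j = n12j - n21j for family j."""
    denom = math.sqrt(sum(d * d for d in ds))
    if denom == 0:
        raise ValueError("all D_j are zero")
    return sum(ds) / denom


# One affected child per family: D_j = +1 (M1 transmitted) or -1 (M2 transmitted).
single = [1, 1, -1, 1, 1]
n12, n21 = single.count(1), single.count(-1)
z = (n12 - n21) / math.sqrt(n12 + n21)
print(robust_t(single), z)                                    # equal
print(robust_t(single) ** 2, (n12 - n21) ** 2 / (n12 + n21))  # agree up to rounding

# Families with two affected sibs: D_j can be -2, 0 or 2.
print(robust_t([2, 0, 2, -2]))
```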
• Situation 2. Suppose we have families in the data where both parents are dead, (so we do not know their marker locus genotypes), but where there are two affected sibs, one being M1M1, the other M2M2.
• We therefore can infer that both parents were informative.
• Should we use the data from these families in the analysis, using the standard TDT statistic?
• The answer is “no”. Why is this so?
• Because the very fact that we can infer the parental genotypes unambiguously means that one sib MUST be M1M1 and the other MUST be M2M2.
• In such families there is zero variance, rather than some binomial variance, for the number of M1 genes in the two sibs.
• Philosophical question: is there any difference between the actions you take in directly observing an event and having unambiguous evidence that the event occurred?
• In this case, “yes there is”.
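A brute-force check of Situation 2 (a Python sketch, not from the source; the function names are hypothetical): enumerating every parental mating compatible with one M1M1 sib and one M2M2 sib leaves only M1M2 × M1M2, and the sibs then necessarily carry two M1 and two M2 alleles. Such a family always adds 2 to n12, 2 to n21 and 4 to n, with no random variation at all.

```python
from itertools import product

GENOTYPES = [("M1", "M1"), ("M1", "M2"), ("M2", "M2")]


def offspring(father, mother):
    """Set of child genotypes (alleles sorted) possible from one mating."""
    return {tuple(sorted((x, y))) for x in father for y in mother}


def compatible_matings(sib1, sib2):
    """Parental genotype pairs that could produce both sibs."""
    return [(f, m) for f, m in product(GENOTYPES, repeat=2)
            if sib1 in offspring(f, m) and sib2 in offspring(f, m)]


sibs = (("M1", "M1"), ("M2", "M2"))
print(compatible_matings(*sibs))                     # only (M1M2, M1M2)
m1_transmitted = sum(g.count("M1") for g in sibs)    # always 2 of the 4 alleles
print(m1_transmitted)
```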
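Situation 3. Suppose that each family in the data has two affected sibs and one informative (i.e. M1M2) parent.
Numbers of transmissions from the informative parents
Sibs received   2 M1    1 M1, 1 M2    2 M2    Total
# families      i       j             k       n
H0 means        n/4     n/2           n/4     n
TDT χ² (M1 transmitted 2i + j times, M2 transmitted j + 2k times, out of 2n transmissions):
$\chi^2_{TDT} = \frac{2(i - k)^2}{n}$
Sharing χ² (sibs share the parental allele in i + k families, not in j):
$\chi^2_{SH} = \frac{(i + k - j)^2}{n}$
Total χ²:
$\chi^2_{TOT} = \frac{(i - n/4)^2}{n/4} + \frac{(j - n/2)^2}{n/2} + \frac{(k - n/4)^2}{n/4} = \chi^2_{TDT} + \chi^2_{SH}$
TDT χ² + sharing χ²: the sum is the total.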
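The decomposition can be confirmed numerically; the Python sketch below uses invented counts i, j, k and a hypothetical function name.

```python
def situation3_statistics(i: int, j: int, k: int):
    """TDT, sharing and total chi-squares for two affected sibs, one M1M2 parent.

    i: both sibs received M1; j: one M1, one M2; k: both received M2.
    """
    n = i + j + k
    tdt = 2 * (i - k) ** 2 / n              # (2i + j) vs (j + 2k) over 2n transmissions
    sharing = (i + k - j) ** 2 / n          # shared (i + k) vs not shared (j)
    total = ((i - n / 4) ** 2 / (n / 4)
             + (j - n / 2) ** 2 / (n / 2)
             + (k - n / 4) ** 2 / (n / 4))  # 2 df goodness of fit to n/4, n/2, n/4
    return tdt, sharing, total


tdt, sharing, total = situation3_statistics(40, 30, 10)
print(tdt, sharing, total)   # 22.5 5.0 27.5
```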
A weighted statistic $\chi^2_W$, combining $\chi^2_{TDT}$ and $\chi^2_{SH}$ through a weight W, is also considered, with the cases W = 0, ½ and 1 singled out.
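Suppose that a sharing χ² has been carried out, correctly, as a one-sided test.
Given i + k = s, what is the distribution of $\chi^2_{TDT}$?
Under H0, given s, $i \sim \mathrm{Bin}(s, \tfrac{1}{2})$, so $E(i - k) = 0$ and $\mathrm{Var}(i - k) = s$.
Original: $\chi^2_{TDT} = \frac{2(i - k)^2}{n}$
Correct: $\chi^2_{TDT} = \frac{(i - k)^2}{s}$
Correction factor: $\frac{n}{2s}$, which is less than 1 if $s > n/2$, as it will be when the one-sided sharing test rejects H0.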
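A Python sketch of this correction (counts and function names invented): an exact sum over the Bin(s, ½) distribution of i confirms Var(i – k) = s, and the corrected statistic (i – k)²/s equals the original 2(i – k)²/n multiplied by n/(2s).

```python
from math import comb


def conditional_variance(s: int) -> float:
    """Var(i - k) given i + k = s, with i ~ Bin(s, 1/2) under H0 (exact sum)."""
    # E(i - k) = 0, so the variance is E[(i - k)^2] = E[(2i - s)^2].
    return sum(comb(s, i) * (2 * i - s) ** 2 for i in range(s + 1)) / 2 ** s


def tdt_original(i: int, k: int, n: int) -> float:
    return 2 * (i - k) ** 2 / n


def tdt_conditional(i: int, k: int) -> float:
    s = i + k
    return (i - k) ** 2 / s


i, j, k = 40, 30, 10
n, s = i + j + k, i + k
print(conditional_variance(s))                                     # 50.0
print(tdt_original(i, k, n), tdt_conditional(i, k), n / (2 * s))   # 22.5 18.0 0.8
```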
One affected sib, two informative (M1M2) parents
Genotype of affected child   M1M1    M1M2    M2M2    Total
# families                   p       q       r       n
Expected when H0 (θ = ½)     n/4     n/2     n/4     n
$\chi^2_{WL} = \frac{(p - r)^2}{p + r}$
$\chi^2_{TDT} = \frac{2(p - r)^2}{n}$
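Counting transmissions shows where $\chi^2_{TDT} = 2(p - r)^2/n$ comes from: an M1M1 child received M1 from both parents, an M1M2 child one of each, an M2M2 child M2 from both. So n12 = 2p + q, n21 = q + 2r, and the 2n transmissions give (2p – 2r)²/(2n). A Python sketch with invented counts (the function name is hypothetical):

```python
def tdt_from_child_genotypes(p: int, q: int, r: int) -> float:
    """TDT chi-square from counts of affected children with two M1M2 parents.

    p, q, r: numbers of children who are M1M1, M1M2, M2M2 (here q is a count,
    not an allele frequency).
    """
    n12 = 2 * p + q          # transmissions of M1
    n21 = q + 2 * r          # transmissions of M2
    return (n12 - n21) ** 2 / (n12 + n21)


p, q, r = 30, 50, 20
n = p + q + r
print(tdt_from_child_genotypes(p, q, r), 2 * (p - r) ** 2 / n)   # 2.0 2.0
```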
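The HHRR statistic makes various assumptions:
$\chi^2_{HHRR} = \frac{2N(n_{12} - n_{21})^2}{(2n_{11} + n_{12} + n_{21})(n_{12} + n_{21} + 2n_{22})}$, where N = n11 + n12 + n21 + n22 is the number of parents.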
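The HHRR statistic above is an ordinary Pearson χ² on the 2 × 2 table of allele (M1, M2) by transmission status (transmitted, non-transmitted), which in effect treats the transmitted and non-transmitted alleles as two independent samples. The Python sketch below (invented counts, hypothetical function names) computes it both ways and alongside the TDT; the cross-product of that allele table equals (n12 – n21)N, which is why the closed form depends on the n's as shown.

```python
def pearson_2x2(a: float, b: float, c: float, d: float) -> float:
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]] (no continuity correction)."""
    total = a + b + c + d
    return total * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def hhrr(n11: int, n12: int, n21: int, n22: int) -> float:
    """2N (n12 - n21)^2 / ((2 n11 + n12 + n21)(n12 + n21 + 2 n22)), N = number of parents."""
    big_n = n11 + n12 + n21 + n22
    return 2 * big_n * (n12 - n21) ** 2 / ((2 * n11 + n12 + n21) * (n12 + n21 + 2 * n22))


n11, n12, n21, n22 = 10, 30, 20, 40
# Rows: M1, M2; columns: transmitted, non-transmitted.
allele_table = (n11 + n12, n11 + n21, n21 + n22, n12 + n22)
print(hhrr(n11, n12, n21, n22), pearson_2x2(*allele_table))   # both about 2.198
print((n12 - n21) ** 2 / (n12 + n21))                         # TDT: 2.0
```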
Subpopulation                           1    2    ……   i    ……   k
Relative size                           α1   α2   ……   αi   ……   αk
Coefficient of gametic disequilibrium   δ1   δ2   ……   δi   ……   δk
Generation 0: the subpopulations above.
Generation 1: parents of generation 1 mate only within their subpopulation; gametic disequilibrium Δ1.
Generation 2: parents of generation 2 mate at random throughout the population; gametic disequilibrium Δ2 = Δ1(1 – θ).
Generation 3: parents of generation 3 mate at random throughout the population; gametic disequilibrium Δ3 = Δ2(1 – θ).
Generation 4, etc.
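Once mating is random throughout the population, each generation multiplies the gametic disequilibrium by (1 – θ), so Δt = Δ1(1 – θ)^(t – 1). A short Python sketch (Δ1, θ and the function name are invented):

```python
def disequilibrium_by_generation(delta1: float, theta: float, generations: int):
    """[Delta_1, Delta_2, ...]: each generation of random mating multiplies by (1 - theta)."""
    values = [delta1]
    for _ in range(generations - 1):
        values.append(values[-1] * (1 - theta))
    return values


print(disequilibrium_by_generation(0.1, 0.1, 4))   # approximately [0.1, 0.09, 0.081, 0.0729]
```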
The value of the TDT statistic in two models
Generation   1. Immediate admixture   2. Gradual admixture
1            1.48                     1.48
2            2.07                     2.07
3            15.34                    8.53
4            12.43                    6.99