10 AM Tue 13-Feb
Genomics, Computing, Economics
Harvard Biophysics 101 (MIT-OCW Health Sciences & Technology 508)
http://openwetware.org/wiki/Harvard:Biophysics_101/2007
Binomial, Poisson, Normal
[Figure: frequency (0.00 to 0.10) vs. X (0 to 50) for Normal (m=20, s=4.47), Poisson (m=20), and Binomial (N=2020, p=.01)]
p and q: $p + q = 1$, so $q = 1 - p$, for two types of object or event.
Factorials: $0! = 1$, $n! = n\,(n-1)!$
Combinatorics: $C(n,X) = \frac{n!}{X!\,(n-X)!}$ is the number of subsets of size X that can be drawn from a set of total size n.
$B(X) = C(n,X)\,p^X q^{n-X}$, with mean $\mu = np$ and variance $\sigma^2 = npq$
$(p+q)^n = \sum_X B(X) = 1$
Binomial frequency distribution as a function of $X \in \{0, 1, \dots, n\}$ (integers)
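The definitions above can be checked numerically with a minimal Python sketch: factorials follow the recursive rule, C(n, X) is built from factorials, and B(X) is computed from both. The B(X) values should sum to 1, with mean np and variance npq. The function names and the test case n = 20, p = 0.3 are illustrative assumptions and do not come from the slides.

```python
import math

def factorial(n: int) -> int:
    """n! from the recursive definition 0! = 1, n! = n (n-1)!."""
    return 1 if n == 0 else n * factorial(n - 1)

def choose(n: int, x: int) -> int:
    """C(n, X) = n! / (X! (n-X)!): number of size-X subsets of an n-set."""
    return factorial(n) // (factorial(x) * factorial(n - x))

def binomial_pmf(x: int, n: int, p: float) -> float:
    """B(X) = C(n, X) p^X q^(n-X) with q = 1 - p."""
    q = 1 - p
    return choose(n, x) * p ** x * q ** (n - x)

n, p = 20, 0.3
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
total = sum(probs)                                    # (p+q)^n, should be 1
mean = sum(x * b for x, b in enumerate(probs))        # should equal np = 6
variance = sum((x - mean) ** 2 * b for x, b in enumerate(probs))  # npq = 4.2
print(f"sum={total:.6f} mean={mean:.4f} variance={variance:.4f}")
```

The recursive factorial follows the slide's definition. It would exceed Python's default recursion limit near n = 1000, so math.comb is the practical choice for large n.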
$B(X{=}350;\ n{=}700,\ p{=}0.1) = 1.53148 \times 10^{-157}$ in Mathematica (PDF[BinomialDistribution[700, 0.1], 350]), but ≈ 0.00 in Excel (=BINOMDIST(350,700,0.1,0)).
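One plausible reason a spreadsheet can show 0.00 here is underflow. In double precision, 0.1^350 = 10^-350 is smaller than the smallest representable positive float, so a naive product collapses to 0. The sketch below makes no assumption about how Excel computes BINOMDIST internally. It compares the naive product against two alternatives: exact rational arithmetic, and a log-space calculation using math.lgamma. The last two should both agree with the Mathematica value. The helper names are illustrative.

```python
import math
from fractions import Fraction

def binomial_pmf_exact(x: int, n: int, p: Fraction) -> Fraction:
    """B(X) in exact rational arithmetic; nothing is rounded until float()."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_log10_pmf(x: int, n: int, p: float) -> float:
    """log10 B(X), using lgamma for log n! so no huge or tiny intermediates."""
    log_c = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return (log_c + x * math.log(p) + (n - x) * math.log1p(-p)) / math.log(10)

# Naive double-precision product: 0.1 ** 350 underflows to 0.0
naive = math.comb(700, 350) * 0.1 ** 350 * 0.9 ** 350

exact = float(binomial_pmf_exact(350, 700, Fraction(1, 10)))
log10_b = binomial_log10_pmf(350, 700, 0.1)

print(naive)  # 0.0
print(f"{exact:.5e}")
print(f"{log10_b:.4f}")
```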
$P(X) = P(X-1)\,\mu/X = \mu^X e^{-\mu}/X!$, with $\sigma^2 = \mu$
For n large and p small, $P(X) \approx B(X)$ with $\mu = np$.
For example, this applies to estimating the expected number of positives in a library of given size (cDNAs, genomic clones, combinatorial chemistry, etc.), where X = number of hits.
Zero-hit term: $P(0) = e^{-\mu}$
Poisson frequency distribution as a function of $X \in \{0, 1, 2, \dots\}$ (integers)
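The following Python sketch illustrates the Poisson slide under stated assumptions. It does three things:

1. It builds P(X) using the recurrence P(X) = P(X-1) μ/X, starting from the zero-hit term e^-μ.
2. It compares the result with a binomial of large n and small p (n = 2000, p = 0.01, chosen so that np = 20).
3. It inverts the zero-hit term to size a library.

The per-clone hit probability of 1e-4 and the 1% miss tolerance are hypothetical numbers. clones_needed is an illustrative helper that treats hits as Poisson-distributed.

```python
import math

def poisson_pmf_table(mu: float, x_max: int) -> list:
    """P(0..x_max): start at the zero-hit term e^-mu, then P(X) = P(X-1) * mu / X."""
    probs = [math.exp(-mu)]
    for x in range(1, x_max + 1):
        probs.append(probs[-1] * mu / x)
    return probs

def binomial_pmf(x: int, n: int, p: float) -> float:
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def clones_needed(hit_fraction: float, p_miss: float) -> int:
    """Hypothetical helper: smallest N with zero-hit term e^-(N * hit_fraction) <= p_miss."""
    return math.ceil(-math.log(p_miss) / hit_fraction)

# n large, p small: Poisson(mu = np = 20) vs Binomial(n = 2000, p = 0.01)
poisson = poisson_pmf_table(20, 50)
binom = [binomial_pmf(x, 2000, 0.01) for x in range(51)]
max_gap = max(abs(a - b) for a, b in zip(poisson, binom))

# Hypothetical library: each clone carries the target with probability 1e-4,
# and we accept at most a 1% chance of zero hits
n_clones = clones_needed(1e-4, 0.01)
print(f"max |Poisson - Binomial| = {max_gap:.5f}; clones needed = {n_clones}")
```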
$Z = (X - \mu)/\sigma$
Normalized (standardized) variables
$N(X) = \exp(-Z^2/2)/(2\pi)^{1/2}$
probability density function
For npq large, $N(X) \approx B(X)$.
Normal frequency distribution as a function of $X \in (-\infty, \infty)$
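This sketch applies the normal approximation to the Binomial curve in the Binomial, Poisson, Normal plot (n = 2020, p = 0.01). It makes explicit one step the slide leaves implicit. The expression exp(-Z²/2)/(2π)^{1/2} is a density per unit Z, whereas B(X) is a probability per unit X. The sketch therefore divides by σ before comparing the two. The helper names are illustrative.

```python
import math

def standard_normal_density(z: float) -> float:
    """exp(-Z^2/2) / (2 pi)^(1/2), the density of the standardized variable Z."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def normal_approx(x: float, n: int, p: float) -> float:
    """Normal approximation to B(X): standardize X, then divide by sigma
    to convert a density per unit Z into a density per unit X."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    z = (x - mu) / sigma
    return standard_normal_density(z) / sigma

def binomial_pmf(x: int, n: int, p: float) -> float:
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 2020, 0.01  # parameters of the Binomial curve in the plot
for x in (10, 15, 20, 25, 30):
    print(x, round(binomial_pmf(x, n, p), 4), round(normal_approx(x, n, p), 4))
```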
Expectation E (r-th moment) of a random variable X for any distribution f(X)
First moment = mean $\mu$; variance $\sigma^2$ and standard deviation $\sigma$
$E(X^r) = \sum_X X^r f(X)$, $\mu = E(X)$, $\sigma^2 = E[(X-\mu)^2]$
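The moment definitions can be checked exactly on a simple distribution. This sketch uses a fair six-sided die, which is an illustrative choice and not from the slides. It relies on Python's fractions module so that the mean and variance come out as exact rationals.

```python
from fractions import Fraction

def moment(pmf: dict, r: int) -> Fraction:
    """r-th raw moment E(X^r) = sum over X of X^r f(X), for a pmf {X: f(X)}."""
    return sum((Fraction(x) ** r * f for x, f in pmf.items()), Fraction(0))

def mean_variance(pmf: dict):
    mu = moment(pmf, 1)  # first moment = mean
    # variance from its definition E[(X - mu)^2]
    var = sum(((Fraction(x) - mu) ** 2 * f for x, f in pmf.items()), Fraction(0))
    return mu, var

die = {x: Fraction(1, 6) for x in range(1, 7)}  # fair six-sided die
mu, var = mean_variance(die)
print(mu, var)  # 7/2 35/12
```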
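Pearson correlation coefficient: $C = \mathrm{cov}(X,Y)/(\sigma_X \sigma_Y) = E[(X-\mu_X)(Y-\mu_Y)]/(\sigma_X \sigma_Y)$
Independent X, Y implies $C = 0$, but $C = 0$ does not imply independent X, Y (e.g. $Y = X^2$ with X distributed symmetrically about 0).
P = TDIST(ABS(C)*SQRT((N-2)/(1-C^2)), N-2, 2), i.e. a two-tailed t test with N-2 degrees of freedom, where N is the sample size.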
Mean, variance, & linear correlation coefficient
www.stat.unipg.it/IASC/Misc-stat-soft.html
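This Python sketch of the correlation slide uses only the standard library. pearson computes C from population moments. The Y = X² case shows C = 0 even though Y depends completely on X. correlation_p computes the same quantity as the TDIST formula. Python's standard library has no t distribution, so two_tailed_p integrates the Student t density numerically with Simpson's rule. That is a substitute technique, not how Excel computes TDIST. All function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """C = cov(X,Y) / (sigma_X sigma_Y), using population (1/N) moments."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

def t_pdf(t, dof):
    """Student's t density with dof degrees of freedom."""
    log_k = math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2) - 0.5 * math.log(dof * math.pi)
    return math.exp(log_k) * (1 + t * t / dof) ** (-(dof + 1) / 2)

def two_tailed_p(t, dof, steps=10_000):
    """P(|T| >= |t|) = 1 - 2 * integral of t_pdf from 0 to |t| (Simpson's rule)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    s = t_pdf(0, dof) + t_pdf(a, dof)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    return 1 - 2 * s * h / 3

def correlation_p(c, n):
    """Two-tailed P for C, like TDIST(ABS(C)*SQRT((N-2)/(1-C^2)), N-2, 2); needs |C| < 1."""
    t = abs(c) * math.sqrt((n - 2) / (1 - c * c))
    return two_tailed_p(t, n - 2)

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # Y = X^2: fully dependent on X, yet C = 0
print(pearson(xs, ys))  # 0.0
```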
One form of HIV-1 Resistance
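Association test for CCR-5 & HIV resistance

| Alleles | Observed Neg | Observed SeroPos | Total | Expected Neg | Expected Pos |
|---|---|---|---|---|---|
| CCR-5+ | 1278 | 1368 | 2646 | 1305 | 1341 |
| ccr-5 | 130 | 78 | 208 | 103 | 105 |
| Total | 1408 | 1446 | 2854 | | |

dof = (r-1)(c-1) = 1; $\chi^2 = \sum (o-e)^2/e = 15.6$; P = 0.00008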
Samson et al. Nature 1996 382:722-5
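The chi-square test in the table can be reproduced from the observed counts alone. This sketch works in three steps:

1. It derives expected counts under independence as row total × column total / grand total.
2. It sums (o-e)²/e without a continuity correction.
3. For 1 degree of freedom, it converts χ² to a P value using the identity P(χ² ≥ x) = erfc(√(x/2)).

It should reproduce the expected counts, χ² ≈ 15.6 and P ≈ 0.00008 in the table. The function name is illustrative.

```python
import math

def chi_square_test(table):
    """Pearson chi-square for an r x c table of observed counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # expected count under independence: row total * column total / grand total
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    chi_sq = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(table, expected)
                 for o, e in zip(obs_row, exp_row))
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi_sq, dof, expected

# Allele counts from the table: rows CCR-5+, ccr-5; columns Neg, SeroPos
observed = [[1278, 1368],
            [130, 78]]
chi_sq, dof, expected = chi_square_test(observed)

# Valid for dof = 1 only: P(chi^2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi_sq / 2))
print(f"chi^2 = {chi_sq:.1f}, dof = {dof}, P = {p_value:.5f}")
```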
But what if we test more than one locus?
Ref: The future of genetic studies of complex human diseases.
[Figure: Y = number of sib pairs (association), 1E+2 to 1E+10, log scale; X = population frequency (p), 1E-09 to 1, log scale; GRR = 1.5, #alleles = 1E6]
[Figure: Y = number of sib pairs (association), 1E+1 to 1E+8, log scale; X = genotypic relative risk (GRR), 1.001 to 10,001 (GRR - 1 from 0.001 to 10,000), log scale; #alleles = 1E6, p = 0.5 (population frequency)]
[Based on Risch & Merikangas (1996) Science 273: 1516]
[Figure: Y = number of sib pairs (association), 0 to 1,600; X = number of alleles (hypotheses) tested, 1E+4 to 1E+22, log scale; GRR = 1.5, p = 0.5 (population frequency)]
GRR = Genotypic relative risk
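A Bonferroni correction illustrates why the third plot rises so slowly as the number of tests grows from 1E4 to 1E22. This is a simplified sketch, not the Risch & Merikangas calculation. It assumes:

- α = 0.05 is divided evenly across all tests;
- power is 80%;
- the required sample size is proportional to (z_α + z_β)².

Under these assumptions it reports the critical z and the sample size relative to a single test. It uses statistics.NormalDist from the Python standard library (3.8+). The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def bonferroni_z(alpha: float, tests: float) -> float:
    """Two-sided critical z when alpha is split evenly over `tests` hypotheses."""
    # Use the lower tail: 1 - alpha/(2*tests) would round to 1.0 for huge `tests`
    return -STD_NORMAL.inv_cdf(alpha / (2 * tests))

def relative_sample_size(tests: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Sample size vs. a single test, ASSUMING n is proportional to (z_alpha + z_beta)^2."""
    z_beta = STD_NORMAL.inv_cdf(power)
    single = bonferroni_z(alpha, 1) + z_beta
    return ((bonferroni_z(alpha, tests) + z_beta) / single) ** 2

for tests in (1.0, 1e4, 1e6, 1e12, 1e22):
    print(f"{tests:8.0e} tests: z = {bonferroni_z(0.05, tests):5.2f}, "
          f"relative n = {relative_sample_size(tests):5.2f}")
```

The normal tail falls off like exp(-z²/2), so z² grows roughly with the logarithm of the number of tests. Under this approximation, each tenfold increase in hypotheses adds only a modest increment to the required sample.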
Class outline
(1) Topic priorities for homework since last class
(2) Quantitative exercises so far: psycho-statistics, combinatorics, exponential/logistic, bits, association & multiple hypotheses
(3) Project level presentation & discussion
(4) Discuss communication/presentation tools
Spontaneous chalkboard discussions of t-test,
genetic code, non-coding RNAs & predicting
deleteriousness of various mutation types.