How to Measure Genetic Heterogeneity
International Workshop on Statistical-Mechanical Informatics
2009/09/13-2009/09/16
Unit of Statistical Genetics
Center for Genomic Medicine
Kyoto University
Ryo Yamada
What is genetic heterogeneity?
Biological Strategies
Landcover map by Environmental Research and Teaching at the University of Toronto
Life covers the land.
Slime mold changes its shape and moves around, but uses spores to reproduce.
Wikipedia
Slime mold keeps looking for new (better?) conditions.
Space is too big to be covered completely.
Therefore, multiple places are selected
and they are bridged without break.
Each part seems to act independently.
Slime mold is clever enough to find the shortest route between food sources in a labyrinth.
Its strategy is being investigated as a new model for parallel computing systems.
Phylogenetic tree
LIFE keeps looking for new (better?)
conditions.
Space is too big to be covered completely.
Therefore, multiple places are selected
and they are bridged without break.
Each part seems to act independently.
Phylogenetic tree
They are bridged without break.
WE are here because WE are all offspring of the “no-break” family, sharing the features of continuous LIFE.
?? Features of LIFE ??
• Keeps looking for something.
• Accepts multiple conditions as good ones.
• Stays contiguous with each other.
• Acts independently.
Slime mold distributes in physical space.
LIFE distributes in genetic space.
Distributions ~ Heterogeneity
LIFE distributes in genetic space.
What is genetic space?
DNA molecules: 4 letters, {A, T, G, C}; length $L = 3 \times 10^9$ (Homo sapiens)
Sequence variations: $4^L$ for $L = 1, 2, \dots$
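To make the count concrete, here is a small Python sketch (not part of the original slides) that enumerates every sequence for a few tiny lengths and checks that the count is $4^L$. Full enumeration is only feasible for very small L, which is exactly the point of the slide. The name `all_sequences` is hypothetical.

```python
from itertools import product

ALPHABET = "ATGC"  # the four DNA letters

def all_sequences(length):
    """Enumerate every DNA sequence of the given length (feasible only for tiny lengths)."""
    return ["".join(letters) for letters in product(ALPHABET, repeat=length)]

for L in (1, 2, 3):
    seqs = all_sequences(L)
    # Each of the L positions has 4 independent choices, so the count is 4**L.
    print(L, len(seqs), 4 ** L)
```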
Biological space is a part of physico-chemical space.
Biological space is far smaller than chemical space, but still enormously big.
Environmental fluctuations change the width of pathways in biological space.
Nature Reviews Genetics 3, 380-390 (2002); doi:10.1038/nrg795. "Genealogical trees, coalescent theory and the analysis of genetic polymorphisms"
Inter-species heterogeneity: phylogeny
Intra-species heterogeneity: recombination graph
Letters are changed: Mutation (producing a mutant)
Combinations of letters are changed: Recombination
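The two operations can be sketched in Python as string manipulations. This is an illustration only: `mutate` and `recombine` are hypothetical helpers, and recombination is simplified here to a single crossover between two equal-length sequences.

```python
def mutate(seq, position, new_letter):
    """Mutation: change the letter at one position."""
    return seq[:position] + new_letter + seq[position + 1:]

def recombine(seq_a, seq_b, breakpoint):
    """Recombination (simplified to one crossover): prefix of seq_a + suffix of seq_b.
    No letter is changed; only the combination of letters is."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return seq_a[:breakpoint] + seq_b[breakpoint:]

parent_a = "AAAAAA"
parent_b = "TTTTTT"
print(mutate(parent_a, 2, "G"))            # AAGAAA
print(recombine(parent_a, parent_b, 4))    # AAAATT
```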
$4^L \to 2^L$ (each variable site treated as having two alleles, 0 or 1)
Space is too big to be covered completely.
$L = 3 \times 10^9$
Variable sites $\sim 10 \times 10^6$
Population size of Homo sapiens $\sim 6 \times 10^9$
$2^{10{,}000{,}000} \gg 6 \times 10^9$
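The gap between the two numbers is easiest to see on a log scale. The Python sketch below (an illustration, not from the slides) uses the slide's rounded values and compares decimal digit counts instead of building $2^{10{,}000{,}000}$ as a string.

```python
import math

VARIABLE_SITES = 10_000_000       # ~10 x 10^6 variable sites (slide value)
POPULATION = 6 * 10**9            # ~6 x 10^9 people (slide value)

# Compare base-10 logarithms rather than the raw numbers.
log10_sequences = VARIABLE_SITES * math.log10(2)
log10_population = math.log10(POPULATION)

print(f"2^{VARIABLE_SITES:,} has {math.floor(log10_sequences) + 1:,} decimal digits")  # 3,010,300
print(f"the population size has {math.floor(log10_population) + 1} decimal digits")   # 10
# Upper bound on the share of possible sequences carried by at least one person:
print(f"coverage <= 10^{log10_population - log10_sequences:.0f}")
```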
$k$ sites → $2^k$ sequence variations
00…000 $p(1)$
00…001 $p(2)$
00…010 $p(3)$
00…011 $p(4)$
…
11…111 $p(2^k) = 1 - (p(1) + \dots + p(2^k - 1))$
• $2^k - 1$ parameters
• Flat and equal
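A minimal Python sketch of the frequency table above, assuming haplotypes are given as 0/1 strings of length k; the sample and the helper name `haplotype_frequencies` are hypothetical. Because the $2^k$ frequencies sum to 1, only $2^k - 1$ of them are free.

```python
from collections import Counter
from itertools import product

def haplotype_frequencies(haplotypes, k):
    """p(i) for all 2**k binary sequences, in the order 00..0, 00..1, ..., 11..1."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return {"".join(bits): counts["".join(bits)] / n for bits in product("01", repeat=k)}

sample = ["000", "000", "011", "111", "011", "000"]  # hypothetical sample, k = 3
freqs = haplotype_frequencies(sample, 3)
for seq, p in freqs.items():
    print(seq, round(p, 3))

# The frequencies sum to 1, so one of them is determined by the others.
free_parameters = len(freqs) - 1
print("free parameters:", free_parameters)  # 7 for k = 3
```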
Genetic heterogeneity
Dependency or association among variable sites
How can the heterogeneity be summarized, and with how many parameters?
Pairwise relation: $r^2$
Variance-covariance matrix
• describes the heterogeneity with $k(k-1)/2$ parameters, one for each pair of sites.
• predicts the test statistics of associated markers in association studies.
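The pairwise summary can be sketched in Python using the standard linkage-disequilibrium definition $r^2 = D^2 / (p_i(1-p_i)p_j(1-p_j))$ with $D = p_{ij} - p_i p_j$. This is a generic illustration, not the author's code; helper names and the sample are hypothetical, and sites are indexed from 0 rather than 1.

```python
from itertools import combinations

def allele_freq(haplotypes, i):
    """Frequency of allele '1' at site i."""
    return sum(h[i] == "1" for h in haplotypes) / len(haplotypes)

def r_squared(haplotypes, i, j):
    """r^2 = D^2 / (p_i(1-p_i) p_j(1-p_j)), where D = p_ij - p_i p_j."""
    p_i, p_j = allele_freq(haplotypes, i), allele_freq(haplotypes, j)
    p_ij = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / len(haplotypes)
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic site")
    d = p_ij - p_i * p_j
    return d * d / denom

def pairwise_r2(haplotypes):
    """One r^2 per pair of sites: k(k-1)/2 values (sites indexed from 0)."""
    k = len(haplotypes[0])
    return {(i, j): r_squared(haplotypes, i, j) for i, j in combinations(range(k), 2)}

sample = ["000", "000", "011", "111", "011", "001"]  # hypothetical haplotypes, k = 3
r2 = pairwise_r2(sample)
print(len(r2))  # 3 == 3 * (3 - 1) / 2
for pair, value in r2.items():
    print(pair, round(value, 3))
```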
Ψ: the power set of {1, 2, …, k}
• consists of $2^k$ subsets ($2^k - 1$ of them non-empty):
∅
{1}, {2}, …, {k}
{1,2}, {1,3}, …, {2,3}, {2,4}, …, {k-1,k} → pairwise
{1,2,3}, {1,2,4}, …, {2,3,4}, …, {k-2,k-1,k}
…
{1, 2, …, k}
• Hierarchic parameters in full.
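The hierarchy of Ψ can be listed level by level with the Python standard library. In this illustrative sketch (the helper name `power_set_by_level` is hypothetical), level 0 is the empty set, level 2 holds the pairs, and level k the full set; the level sizes are binomial coefficients and add up to $2^k$.

```python
from itertools import combinations
from math import comb

def power_set_by_level(k):
    """Group the subsets of {1, ..., k} by size."""
    sites = range(1, k + 1)
    return {size: list(combinations(sites, size)) for size in range(k + 1)}

k = 4
levels = power_set_by_level(k)
for size, subsets in levels.items():
    # The number of subsets at each level is the binomial coefficient C(k, size).
    assert len(subsets) == comb(k, size)
    print(size, subsets)

total = sum(len(s) for s in levels.values())
print(total == 2 ** k)   # True: all subsets
print(total - 1)         # 15 non-empty subsets, i.e. 2**k - 1
```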
Hypercube or lattice
Subsets with tandem elements in Ψ: {1,2}, {1,2,3}, {1,2,3,4}, …, {1,2,…,k}
Pairwise relation $r^2$: {1,2}, {1,3}, {1,4}, …, {1,k}
Tandem pairs such as {1,2} are elements of both.
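The overlap between the two families can be checked in Python. This sketch assumes "tandem" means runs of consecutive site indices with at least two elements; that reading, and the helper names, are assumptions. The shared members are exactly the adjacent pairs {i, i+1}.

```python
from itertools import combinations

def tandem_subsets(k):
    """Runs of consecutive sites {s, s+1, ..., e} with at least two elements.
    Assumption: 'tandem' is read as 'consecutive indices'."""
    return [tuple(range(start, end + 1))
            for start in range(1, k + 1)
            for end in range(start + 1, k + 1)]

def pairs(k):
    """All two-element subsets of {1, ..., k}: the sets covered by pairwise r^2."""
    return list(combinations(range(1, k + 1), 2))

k = 5
tandem = set(tandem_subsets(k))
pair_set = set(pairs(k))
shared = sorted(tandem & pair_set)
print(len(tandem), len(pair_set))  # 10 10
print(shared)                      # [(1, 2), (2, 3), (3, 4), (4, 5)]
```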
One parameter for heterogeneity (1)
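• Entropy: $H = -\sum_i p(i) \ln p(i)$
$H / \ln 2$ can be read as the effective number of sites needed to describe the heterogeneity.
$H = k \ln 2$ when all sites are independent and each allele has frequency 1/2 (all $2^k$ sequences equally frequent).
$H = 0$ for a clone (no variation).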
• Entropy-based standardized measure of allelic association: ε
ε = 0 when all sites are independent.
ε = 1 when only 2 types of sequence exist.
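A short Python sketch of the entropy bullet, checking the two reference cases and showing that independent sites with unequal allele frequencies give less than $k \ln 2$. The ε measure itself is not computed here because its formula is not given on the slide; reading $H/\ln 2$ as an effective number of sites is an interpretation.

```python
import math

def entropy(freqs):
    """H = -sum p(i) ln p(i); terms with p = 0 contribute nothing."""
    return sum(-p * math.log(p) for p in freqs if p > 0)

k = 3
uniform = [1 / 2**k] * 2**k          # all 2**k sequences equally frequent
clone = [1.0] + [0.0] * (2**k - 1)   # a single sequence, no variation

print(entropy(uniform), k * math.log(2))   # equal up to rounding: H = k ln 2
print(entropy(clone))                      # 0.0
print(entropy(uniform) / math.log(2))      # about 3 effective sites

# Independent sites whose allele '0' has frequency 0.9 give H < k ln 2.
q = 0.9
independent = [math.prod(q if b == "0" else 1 - q for b in format(i, f"0{k}b"))
               for i in range(2**k)]
print(entropy(independent) < k * math.log(2))  # True
```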
One parameter for heterogeneity (2)
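• Entropy-based measure of allelic association: ε
When $k = 2$, $\varepsilon = r^2 = \sum \frac{(\mathrm{obs} - \mathrm{exp})^2}{\mathrm{exp}} = \sum \frac{\mathrm{obs}^2}{\mathrm{exp}} - 1$, where obs and exp are the observed haplotype frequencies and those expected if the sites were independent (the last step uses $\sum \mathrm{obs} = \sum \mathrm{exp} = 1$).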
• $r_k$ keeps the shape of the equation for $r^2$ and scales its value to lie between 0 and 1 for any $k$: $r_k = \sum \frac{\mathrm{obs}^{1 + 1/(k-1)}}{\mathrm{exp}^{1/(k-1)}} - 1$
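The $r_k$ formula can be evaluated directly in Python. This sketch assumes haplotype frequencies are given as a dict of 0/1 strings and that exp is the product of per-site allele frequencies; the helper name `r_k` is hypothetical. For $k = 2$ it reduces to the usual $r^2$, and the two reference cases (independence, only two sequence types) give 0 and 1.

```python
import math
from itertools import product

def r_k(freqs):
    """r_k = sum(obs^(1 + 1/(k-1)) / exp^(1/(k-1))) - 1 over all 2**k sequences.
    `freqs` maps k-site 0/1 strings to observed frequencies summing to 1;
    exp is the frequency expected if the sites were independent."""
    k = len(next(iter(freqs)))
    if k < 2:
        raise ValueError("r_k needs at least two sites")
    # Frequency of allele '1' at each site.
    p1 = [sum(f for s, f in freqs.items() if s[i] == "1") for i in range(k)]
    a = 1 / (k - 1)
    total = 0.0
    for bits in product("01", repeat=k):
        seq = "".join(bits)
        obs = freqs.get(seq, 0.0)
        if obs == 0:
            continue  # obs^(1 + a) = 0, so this term vanishes
        exp = math.prod(p1[i] if b == "1" else 1 - p1[i] for i, b in enumerate(seq))
        total += obs ** (1 + a) / exp ** a
    return total - 1

print(r_k({"00": 0.5, "11": 0.5}))                       # 1.0: two sequence types, k = 2
print(r_k({"000": 0.5, "111": 0.5}))                     # ~1.0: two sequence types, k = 3
print(r_k({format(i, "03b"): 1 / 8 for i in range(8)}))  # ~0.0: independent sites
```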
Space is too big to be covered completely.
$2^{10{,}000{,}000} \gg 6 \times 10^9$
Every sequence is unique, so frequency is not useful.
Sparse graph
• Sequences can be placed at the nodes of a k-dimensional hypercube.
• The graph distance between two sequences is the number of mutations separating them.
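On the hypercube, neighbouring nodes differ at exactly one site, so the shortest-path distance equals the Hamming distance (the number of differing sites). The Python sketch below, an illustration with hypothetical helper names, checks this with a breadth-first search.

```python
from collections import deque

def hamming(a, b):
    """Number of differing sites = number of point mutations separating a and b."""
    return sum(x != y for x, y in zip(a, b))

def hypercube_distance(a, b):
    """Shortest path between two 0/1 strings on the hypercube (breadth-first search);
    neighbours differ at exactly one site."""
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for i in range(len(node)):
            flipped = node[:i] + ("1" if node[i] == "0" else "0") + node[i + 1:]
            if flipped not in seen:
                seen.add(flipped)
                queue.append((flipped, dist + 1))
    raise ValueError("b is not reachable from a")

print(hamming("00110", "10011"), hypercube_distance("00110", "10011"))  # 3 3
```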
Recombination’s distance?
Recombination is a three-term relation (two parental sequences and one recombinant), but a graph describes two-term relations. A more informative tool is necessary.
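To see why recombination needs three terms, the Python sketch below tests whether one sequence is a single-crossover recombinant of two others. The single-crossover model and the name `is_single_crossover` are simplifying assumptions. The example child is 3 mutations away from each parent, yet one recombination event explains it.

```python
def is_single_crossover(parent_a, parent_b, child):
    """Three-term relation: is `child` = prefix of one parent + suffix of the other?"""
    if not (len(parent_a) == len(parent_b) == len(child)):
        raise ValueError("sequences must have equal length")
    for cut in range(len(child) + 1):
        if child in (parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]):
            return True
    return False

a, b = "000000", "111111"
print(is_single_crossover(a, b, "000111"))  # True
print(is_single_crossover(a, b, "010101"))  # False

# The same child is 3 point mutations away from either parent.
print(sum(x != y for x, y in zip(a, "000111")))  # 3
```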
• Biological meaning of heterogeneity:
• It does not want to lose variations, even when a significant part of the population cannot survive, because they might be useful someday.
Survival curve of variable sites when a fraction of the population goes extinct
• Each sequence set draws a different survival curve.
• A measure to represent the curves:
• The area above the curve.
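The slides do not define the simulation precisely, so the Python sketch below rests on stated assumptions: a random fraction f of the sequences goes extinct, a site "survives" if both alleles still occur among the survivors, the curve is averaged over seeded random trials, and the area above it is taken by the trapezoidal rule. All names and the two example sequence sets are hypothetical.

```python
import random

def variable_sites(haplotypes):
    """Indices of sites where both alleles '0' and '1' occur."""
    k = len(haplotypes[0])
    return [i for i in range(k) if len({h[i] for h in haplotypes}) == 2]

def survival_curve(haplotypes, fractions, trials=200, seed=0):
    """Mean share of originally variable sites still variable after a random
    fraction f of the sequences goes extinct (assumed definition)."""
    rng = random.Random(seed)
    original = variable_sites(haplotypes)
    if not original:
        raise ValueError("no variable sites")
    n = len(haplotypes)
    curve = []
    for f in fractions:
        n_survivors = n - round(f * n)
        total = 0.0
        for _ in range(trials):
            survivors = rng.sample(haplotypes, n_survivors)
            still = set(variable_sites(survivors)) if survivors else set()
            total += sum(i in still for i in original) / len(original)
        curve.append(total / trials)
    return curve

def area_above(fractions, curve):
    """Trapezoidal area between the curve and the line y = 1."""
    area = 0.0
    for (f0, s0), (f1, s1) in zip(zip(fractions, curve), zip(fractions[1:], curve[1:])):
        area += (f1 - f0) * ((1 - s0) + (1 - s1)) / 2
    return area

fractions = [i / 10 for i in range(11)]
two_types = ["0000", "1111"] * 5                   # hypothetical: two sequence types
all_types = [format(i, "04b") for i in range(16)]  # hypothetical: all 16 types once
for name, seqs in [("two types", two_types), ("all types", all_types)]:
    curve = survival_curve(seqs, fractions)
    print(name, [round(s, 2) for s in curve], round(area_above(fractions, curve), 3))
```

A smaller area means the variable sites are retained longer as the population shrinks.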
Various ways to measure
• Pairwise relation $r^2$: $k(k-1)/2$ parameters
• Power set Ψ: $2^k - 1$ parameters, hierarchic
• Entropy $H$: 1 parameter
• Entropy-based ε: 1 parameter
• $r^2$-generalization $r_k$: 1 parameter
• Graph: mutation distance
• Graph + α: mutation and recombination distance
• Survival curve: 1 parameter; simulation; mutation and recombination distance
Unit of Statistical Genetics, Center for Genomic Medicine, Graduate School of Medicine, Kyoto University
http://www.genome.med.kyoto-u.ac.jp/wiki_tokyo/index.php/