Anitha Kannan and John Winn

Date post: 31-Dec-2015
Transcript
Page 1: Anitha Kannan and John Winn

24/07/2007 ISMB/ECCB 2007

Bayesian association of haplotypes and non-genetic factors to regulatory and phenotypic variation in human populations

Anitha Kannan and John Winn

Jim Huang*

Probabilistic and Statistical Inference Group, Edward S. Rogers Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada

Microsoft Research Cambridge, Machine Learning and Perception Group, Cambridge, UK

Page 2

Outline

• Main contributions:
– Joint Bayesian modelling of genetic variation data and quantitative trait measurements
– Rich probabilistic model for genotype data
– State-of-the-art results on predicting missing genotypes

Page 3

Outline

Genotype: unordered pair of SNP alleles at each locus, one from each of the two chromosomes

Haplotype: ordered sequence of SNP alleles along a single chromosome

Recombination hotspots partition haplotypes into blocks [Daly et al., 2001]
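Because a genotype keeps only unordered allele pairs, it does not determine the haplotypes (the phase): a genotype with h heterozygous loci is consistent with 2^(h-1) distinct haplotype pairs when h ≥ 1. The minimal sketch below makes this concrete; the 0/1 haplotype and 0/1/2 allele-count encodings, and the function names, are assumptions for illustration only.

```python
from itertools import product

# Assumed encoding (illustration only): a haplotype is a tuple of 0/1 alleles,
# a genotype is a tuple of allele counts (0, 1 or 2) per locus.

def genotype_from_haplotypes(h1, h2):
    """Collapse an ordered pair of haplotypes into an unordered genotype."""
    return tuple(a + b for a, b in zip(h1, h2))

def phasings(genotype):
    """Return the set of unordered haplotype pairs consistent with a genotype."""
    het = [k for k, g in enumerate(genotype) if g == 1]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        # Homozygous loci are fixed (0 -> 0, 2 -> 1); heterozygous loci get a
        # placeholder 0 here and are overwritten by the chosen phase below.
        h1 = [g // 2 for g in genotype]
        h2 = list(h1)
        for k, b in zip(het, bits):
            h1[k], h2[k] = b, 1 - b
        # frozenset makes (h1, h2) and (h2, h1) count as the same phasing.
        pairs.add(frozenset([tuple(h1), tuple(h2)]))
    return pairs

g = genotype_from_haplotypes((0, 1, 1, 0), (1, 1, 0, 0))
print(g, len(phasings(g)))  # (1, 2, 1, 0) 2
```

This ambiguity is one reason phase and parent-child information are worth modelling explicitly.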

Page 4

Part I: Learning haplotype block structure

• Our model for genotype data should:
– Account for phase and parent-child information
– Account for uncertainty in ancestral haplotypes
– Account for uncertainty in block structure
– Account for population-specific haplotype block statistics
– Allow for prior knowledge of haplotype block structure

Page 5

Previous models for genotype data

• Previous methods learn a low-dimensional representation of the genotype data:

• HAPLOBLOCK (Greenspan, G. and Geiger, D., RECOMB 2003)
– Hard partitioning of the data into a set of haplotype blocks using low-dimensional “ancestral” haplotypes

• fastPHASE (Scheet, P. and Stephens, M., Am J Hum Genet 2006)
– Learns ancestral haplotypes from high-dimensional genotype data while accounting for uncertainty in haplotype blocks

• Jojic, N., Jojic, V. and Heckerman, D., UAI 2004

Page 6

Probabilistic generative model for genotype data

[Diagram: a low-dimensional latent representation generates the high-dimensional data; the model is trained by unsupervised learning via maximum likelihood.]
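As a rough illustration of such a generative model, in the spirit of the ancestral-haplotype models cited above but not the authors' exact model, a haplotype can be treated as a mosaic copied from K low-dimensional "ancestral" haplotypes, switching ancestors along the chromosome. The likelihood that maximum-likelihood learning would maximize can then be computed with the forward algorithm. The parameter names `switch` and `eps`, and the restriction to a single phased haplotype with fixed ancestors, are simplifying assumptions for this sketch.

```python
import numpy as np

def haplotype_likelihood(h, ancestors, switch, eps):
    """P(h | ancestors) under a toy mosaic HMM, computed by the forward algorithm.

    h:         length-L array of 0/1 alleles (one phased haplotype)
    ancestors: K x L array of 0/1 alleles (the low-dimensional representation)
    switch:    probability of re-drawing the ancestor between adjacent loci
    eps:       probability that a copied allele is flipped
    """
    h = np.asarray(h)
    A = np.asarray(ancestors)
    K, L = A.shape
    # emit[k, l] = P(h[l] | locus l is copied from ancestor k)
    emit = np.where(A == h[None, :], 1.0 - eps, eps)
    alpha = emit[:, 0] / K  # uniform choice of the first ancestor
    for l in range(1, L):
        # Keep the same ancestor with prob 1 - switch, otherwise re-draw uniformly
        # (a re-draw may land on the same ancestor).
        alpha = ((1.0 - switch) * alpha + switch * alpha.sum() / K) * emit[:, l]
    return float(alpha.sum())

ancestors = np.array([[0, 0, 1, 1],
                      [1, 0, 1, 0],
                      [1, 1, 0, 0]])
print(haplotype_likelihood([0, 0, 1, 1], ancestors, switch=0.1, eps=0.01))
```

For realistic numbers of loci the forward variables underflow, so a practical version would rescale `alpha` at each step or work in log space.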

Page 7

Predicting missing genotype data

• Have we learned a good density model for genotype data?

• Gains from:
– Accounting for uncertainty in haplotype block structure
– Accounting for uncertainty in ancestral haplotypes
– Accounting for parental relationships

• Assess the model using cross-validation/test prediction error

Page 8

Predicting missing genotype data

• Crohn’s/5q31 data set (Daly et al., 2001)
– Crohn’s disease data from chromosome 5q31 containing genotypes for 129 children + 258 parents across 103 loci (phases given for children)

• For each test set, make a fraction ρ of the data missing

• Keep the model parameters learned from the training data fixed, then draw 1000 samples over the missing data

• Compute the fill-in error rate over the 1000 samples, for all missing data
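The evaluation protocol above can be sketched as follows, assuming genotypes are stored as an individuals × loci array coded 0/1/2 and that the imputed data sets come from some sampler. In the talk the samples are drawn from the trained model; here a uniform random guesser stands in as a clearly hypothetical placeholder, and the function names are illustrative.

```python
import numpy as np

def mask_fraction(shape, rho, rng):
    """Boolean mask (True = hidden) covering a fraction rho of the entries."""
    n = int(np.prod(shape))
    hidden = rng.choice(n, size=int(round(rho * n)), replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[hidden] = True
    return mask.reshape(shape)

def fill_in_error(G_true, samples, mask):
    """Fraction of hidden entries filled in wrongly, averaged over all samples.

    G_true:  individuals x loci array of true genotypes
    samples: S x individuals x loci array, one imputed data set per sample
    mask:    individuals x loci boolean array of hidden entries
    """
    wrong = samples[:, mask] != G_true[mask]  # shape S x (number hidden)
    return float(wrong.mean())

rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(20, 10))         # toy genotypes coded 0/1/2
mask = mask_fraction(G.shape, rho=0.1, rng=rng)
# Placeholder "sampler": guess uniformly at random instead of using a trained model.
samples = rng.integers(0, 3, size=(1000,) + G.shape)
print(fill_in_error(G, samples, mask))        # roughly 2/3 for random guessing
```

A useful model should drive this rate well below the random-guessing baseline.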

Page 9

Prediction error for Crohn’s/5q31 data

Page 10

Comparative performance for Crohn’s/5q31 data

Page 11

Establishing haplotype block boundaries

• Define a recombination prior γ on the transition probabilities
– Different values of γ correspond to different degrees of “blockiness” in the data

• For each locus k, compute the probability of a transition, p_k
– Thresholding p_k at a value t establishes the block boundaries

• Once blocks are defined, assign block labels l_b = (m, n)
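The thresholding step can be sketched as below. It assumes that p_k is the probability of a transition between locus k-1 and locus k, so a new block starts wherever p_k exceeds t; how p_k is computed from the model, and the assignment of block labels, are not shown. The function name is hypothetical.

```python
def block_boundaries(p, t):
    """Split loci 0..K-1 into contiguous haplotype blocks.

    p: per-locus transition probabilities; p[k] is assumed to be the probability
       of a transition between locus k-1 and locus k (p[0] is ignored).
    t: threshold; a new block starts at every locus k with p[k] > t.
    Returns (start, end) index pairs with end exclusive.
    """
    blocks, start = [], 0
    for k in range(1, len(p)):
        if p[k] > t:
            blocks.append((start, k))
            start = k
    blocks.append((start, len(p)))
    return blocks

p = [0.0, 0.01, 0.6, 0.02, 0.03, 0.9, 0.05]
print(block_boundaries(p, t=0.5))   # [(0, 2), (2, 5), (5, 7)]
```

Raising t yields fewer, longer blocks; with t above every p_k the whole region becomes a single block.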

Page 12

Haplotype block structure in the ENm006 region

• 573 SNP markers for 270 individuals from 3 sub-populations:
– 90 Yoruba individuals (30 parent-parent-offspring trios) from Ibadan, Nigeria (YRI)
– 90 individuals (30 trios) of European descent from Utah (CEU)
– 45 Han Chinese individuals from Beijing (CHB) and 45 Japanese individuals from Tokyo (JPT), analysed together as CHB+JPT

Page 13

Part II: Linking haplotype block structure and gene expression data

Page 14

A model for linking haplotype structure to quantitative trait measurements

[Figure: the observed quantitative trait profile for individuals 1 to 5 is built as a sum over haplotype blocks 1 and 2; each block contributes a latent block profile (indexed by block labels 1 to 4) multiplied by a relevance variable, e.g. ×1.0 for a relevant block and ×0.0 for an irrelevant one.]
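One linear-Gaussian reading of the model on this slide: each individual's trait value is the sum over haplotype blocks of a latent block profile, looked up by that individual's block label and scaled by the block's relevance variable, plus noise. The sketch below encodes that reading only; the array layout, the Gaussian noise parameterized by a precision, and the function names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def expected_trait(labels, profiles, relevance):
    """Noise-free trait value for each individual.

    labels:    B x J integer array; labels[b, j] is individual j's label in block b
    profiles:  list of B 1-D arrays; profiles[b][l] is block b's latent profile
               value for label l
    relevance: length-B array of relevance weights (e.g. 1.0 or 0.0)
    Returns sum_b relevance[b] * profiles[b][labels[b, j]] for j = 0..J-1.
    """
    labels = np.asarray(labels)
    mean = np.zeros(labels.shape[1])
    for b in range(labels.shape[0]):
        mean += relevance[b] * np.asarray(profiles[b])[labels[b]]
    return mean

def sample_trait(labels, profiles, relevance, precision, rng):
    """Add Gaussian noise with the given precision (variance = 1 / precision)."""
    mean = expected_trait(labels, profiles, relevance)
    return mean + rng.normal(0.0, 1.0 / np.sqrt(precision), size=mean.shape)

# Two blocks, five individuals, four labels per block (as in the slide's cartoon).
labels = np.array([[0, 1, 2, 3, 0],
                   [1, 1, 0, 2, 3]])
profiles = [np.array([0.5, -1.0, 2.0, 0.0]),
            np.array([10.0, 20.0, 30.0, 40.0])]
relevance = np.array([1.0, 0.0])   # block 1 relevant, block 2 switched off
print(expected_trait(labels, profiles, relevance))
print(sample_trait(labels, profiles, relevance, precision=4.0,
                   rng=np.random.default_rng(1)))
```

Setting a relevance weight to 0.0 removes that block's contribution entirely, which is what lets the relevance variables act as an association indicator.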

Page 15

A Bayesian model for linking haplotype structure to quantitative measurements

[Graphical model: plates over individuals j = 1,…,J, blocks b = 1,…,B and quantitative traits g = 1,…,G; variables S_bj, T_bj, z_gj, μ_bg, w_bg and ρ_g; hyperparameters α0, β0, τ0, μ0 and π0. Node labels: block label, observed trait, latent block profile, relevance variable, noise precision.]
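The hyperparameters α0, β0 are plausibly the shape and rate of a Gamma prior on the noise precision; that reading is an assumption, not something the transcript states. Under it, and holding the other variables fixed, the precision of one trait has a closed-form conditional posterior given the residuals (observed minus predicted trait values), which is the kind of update a Gibbs sampler or variational scheme would use. The function name is hypothetical.

```python
import numpy as np

def precision_posterior(residuals, alpha0, beta0):
    """Conjugate Gamma update for the precision of zero-mean Gaussian noise.

    Assumed prior: precision ~ Gamma(shape=alpha0, rate=beta0).
    Given N residuals r (observed trait minus predicted trait), the posterior is
    Gamma(shape=alpha0 + N/2, rate=beta0 + sum(r**2)/2).
    Returns (shape, rate, posterior mean) where the mean is shape / rate.
    """
    r = np.asarray(residuals, dtype=float)
    shape = alpha0 + 0.5 * r.size
    rate = beta0 + 0.5 * np.sum(r ** 2)
    return shape, rate, shape / rate

print(precision_posterior([1.0, -1.0, 2.0, 0.0], alpha0=1.0, beta0=1.0))
```

Larger residuals increase the rate and so lower the inferred precision, i.e. the model attributes more of the trait variation to noise.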

Page 16

Linking haplotype blocks to phenotype

• 387 individuals with Crohn’s (+1) or non-Crohn’s (-1) phenotype
• Link 10 haplotype blocks from 5q31 to phenotype
• Average cross-validation error: 23.1% ± 3.45%

Haplotype blocks 2 and 10 are the most relevant to the Crohn’s phenotype (p < 4.76 × 10^-5)

[Figure: predictions for test cases (sorted) across test data splits]
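A summary such as "23.1% ± 3.45%" is typically a mean and spread of per-split error rates. The sketch below assumes the ± figure is a sample standard deviation across test splits and that a real-valued prediction is thresholded at zero to give a +1/-1 phenotype call; both are assumptions, and the scores and error rates in the usage lines are hypothetical.

```python
import statistics

def split_error(scores, labels):
    """Fraction of cases whose predicted sign disagrees with the +1/-1 label.

    Assumption: a score >= 0 predicts +1 (Crohn's) and a score < 0 predicts -1.
    """
    wrong = sum(1 for s, y in zip(scores, labels) if (1 if s >= 0 else -1) != y)
    return wrong / len(labels)

def summarize_cv(per_split_errors):
    """Mean and sample standard deviation of per-split error rates, in percent."""
    pct = [100.0 * e for e in per_split_errors]
    return statistics.mean(pct), statistics.stdev(pct)

# Hypothetical scores for one split, then a summary over hypothetical splits.
print(split_error([0.3, -0.2, 0.1, -0.5], [1, 1, -1, -1]))  # 0.5
mean, sd = summarize_cv([0.20, 0.25, 0.30])
print(f"{mean:.1f}% ± {sd:.2f}%")  # 25.0% ± 5.00%
```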

Page 17

Linking haplotype blocks to gene expression

• ENm006 data set:
– 19 haplotype blocks (573 SNPs)
– 28 gene expression profiles in the ENm006 region (Stranger et al., 2007)

Page 18

Addressing population stratification

The population variable affects phenotype/gene expression…

…whereas variation between individuals is the effect we’re interested in
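The slides do not spell out here how the model separates population effects from individual-level variation. As a simpler, swapped-in illustration of the idea (not necessarily the talk's approach), one can remove each population's mean from the trait before testing for associations, so that differences between population means cannot masquerade as block effects. The function name and the example values are hypothetical.

```python
import numpy as np

def center_within_population(values, populations):
    """Subtract each population's own mean from its members' trait values.

    values:      length-J array of phenotype / expression values
    populations: length-J array of population codes (e.g. "YRI", "CEU")
    """
    values = np.asarray(values, dtype=float)
    populations = np.asarray(populations)
    centered = values.copy()
    for pop in np.unique(populations):
        members = populations == pop
        centered[members] -= values[members].mean()
    return centered

expr = [1.0, 2.0, 3.0, 10.0, 12.0]          # hypothetical expression values
pops = ["YRI", "YRI", "YRI", "CEU", "CEU"]
print(center_within_population(expr, pops))
```

The trade-off is that centering also discards any genuine genetic signal that differs only between population means.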

Page 19

Associations between haplotype blocks and gene expression

GDI1 - HapBlock2 (YRI): p < 2.5 × 10^-4

GDI1 - HapBlock5 (CHB+JPT): p < 3.33 × 10^-4

Page 20

Summary

• Enhanced version of the Jojic et al. (UAI 2004) model for haplotype inference and block-structure discovery

• Novel Bayesian model for associating haplotype blocks with gene expression

• We re-discover population-specific block structures in the HapMap data

• Predictions for Crohn’s disease from chromosome 5q31 data

• Cis-associations between blocks and gene expression in ENm006 in the presence of non-genetic factors

• Cis-associations between GDI1 and HapBlocks 2 and 5

Page 21

The road ahead…

• Applying the model to larger portions of the HapMap data
• Finding trans-associations
• Non-linear models for associating block structure to quantitative traits
• Joint learning of haplotype block structure and associations
• Accounting for patterns of gene co-expression/similar phenotypes

Page 22

Acknowledgements

• Manolis Dermitzakis and Richard Durbin, Wellcome Trust Sanger Institute

• Nebojsa Jojic, Microsoft Research Redmond

• Paul Scheet, University of Michigan - Ann Arbor

• US National Science Foundation (NSF)

