Bayesian Haplotype Inference via the Dirichlet Process. Eric Xing, Michael Jordan, Roded Sharan. Presented by Amrudin Agovic.
Page 1: Baysian Haplotype Inference via the Dirichlet Processbanerjee/Teaching/Fall07/... · 2009. 9. 18. · Pedigree-Haplotyper. Inference - Gibbs Sampling ... assignment and ancestral

Bayesian Haplotype Inference via the Dirichlet Process

Eric Xing, Michael Jordan, Roded Sharan

presented by Amrudin Agovic


Motivation

99.9% of human DNA is shared between individuals

The remaining 0.1% accounts for the differences

Need to determine what that 0.1% is

Find genes responsible for diseases


Background

Humans have 23 pairs of chromosomes in their cells

23 come from the father, 23 from the mother

Certain parts of the genome are inherited unchanged

Other genetic information gets mixed up 


Background

Allele: a variant of the genetic code that occupies a given position (locus) on a chromosome.

Genotype: unordered pairs of alleles in a region (one allele from each chromosome of a pair)

Phase: the association of each allele with its chromosome (not given)

SNP: Single Nucleotide Polymorphism, difference in one nucleotide (A,C,G,T)

Haplotype: set of associated SNP alleles in a region of a chromosome. A haplotype is inherited as a unit.
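
To illustrate why phase has to be inferred, the following minimal sketch enumerates every unordered haplotype pair that is consistent with one unphased genotype. The function name `compatible_haplotype_pairs` and the toy genotype are illustrative assumptions, not part of the presentation.

```python
from itertools import product

def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    genotype: list of 2-tuples of alleles, one tuple per SNP locus
    (hypothetical encoding chosen for this sketch).
    """
    pairs = set()
    # At each locus, choose which of the two alleles sits on the first chromosome.
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = "".join(site[c] for site, c in zip(genotype, choice))
        h2 = "".join(site[1 - c] for site, c in zip(genotype, choice))
        # Sort so that (h1, h2) and (h2, h1) count as the same unordered pair.
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)

# Toy genotype: heterozygous at loci 1 and 3, homozygous at locus 2.
genotype = [("A", "G"), ("C", "C"), ("T", "A")]
for h1, h2 in compatible_haplotype_pairs(genotype):
    print(h1, h2)
```

A genotype with k heterozygous loci (k at least 1) admits 2^(k-1) such pairs, so the number of candidate phasings grows exponentially with the number of heterozygous SNPs.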


Background


Dirichlet Process Representation

Let

$G_0(\phi)$ be a base measure for the Dirichlet process

$A^{(k)} := [A_1^{(k)}, \ldots, A_J^{(k)}]$ be a founding haplotype configuration (ancestral template) at loci $t = [1, \ldots, J]$

$\theta^{(k)}$ be the mutation rate of the ancestor

$\phi_k = \{A^{(k)}, \theta^{(k)}\}$ be the parameter associated with a mixture component


Dirichlet Process Representation

Use the Chinese restaurant process

Associate each population haplotype with a table

For each table, sample $\phi_k = \{A^{(k)}, \theta^{(k)}\}$
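
The Chinese restaurant process construction can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the binary allele coding `(0, 1)`, and the parameter values are assumptions made for the example. The base-measure draw uses a uniform ancestor and a Beta-distributed mutation rate.

```python
import random

def sample_crp(num_haplotypes, alpha, rng):
    """Draw table (ancestor) assignments from a Chinese restaurant process."""
    assignments = []
    counts = []  # counts[k] = number of haplotypes seated at table k
    for _ in range(num_haplotypes):
        # An existing table k has weight counts[k]; a new table has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new table
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts

def sample_from_base_measure(num_loci, alpha_h, beta_h, rng, alleles=(0, 1)):
    """Draw phi = (A, theta): A uniform over haplotypes, theta ~ Beta(alpha_h, beta_h)."""
    # Independent uniform draws per locus give a uniform draw over all haplotypes.
    ancestor = [rng.choice(alleles) for _ in range(num_loci)]
    theta = rng.betavariate(alpha_h, beta_h)
    return ancestor, theta

rng = random.Random(0)
assignments, counts = sample_crp(20, alpha=1.0, rng=rng)
tables = [sample_from_base_measure(5, 1.0, 10.0, rng) for _ in counts]
print(len(counts), "tables for", len(assignments), "haplotypes")
```

The number of tables is not fixed in advance; it grows with the data at a rate controlled by `alpha`, which is what lets the model leave the number of ancestral haplotypes open.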


The Model


Assumptions

$G_0(A, \theta) = p(A)\,p(\theta)$

$p(A)$ is a uniform distribution over all haplotypes

$p(\theta)$ is $\mathrm{Beta}(\alpha_h, \beta_h)$


Distributions

Considering mutations at all alleles:

Integrating out theta:
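
The equations on this slide did not survive in the transcript. As a hedged sketch of what integrating out theta looks like under a Beta prior, assume (this is an assumption, not the slide's formula) that each locus of a haplotype $h$ independently differs from its ancestor $a$ with probability $\theta$, that a mutation picks uniformly among the other $|B| - 1$ alleles, and that $m$ and $m'$ count the matching and mismatching loci. The authors' parameterization may differ, for example in the roles of $\alpha_h$ and $\beta_h$.

```latex
\begin{align*}
p(h \mid a, \theta)
  &= \prod_{j=1}^{J} (1-\theta)^{\mathbb{I}(h_j = a_j)}
     \left(\frac{\theta}{|B|-1}\right)^{\mathbb{I}(h_j \neq a_j)}
   = (1-\theta)^{m} \left(\frac{\theta}{|B|-1}\right)^{m'} \\
p(h \mid a)
  &= \int_0^1 p(h \mid a, \theta)\,
     \frac{\theta^{\alpha_h - 1}(1-\theta)^{\beta_h - 1}}{B(\alpha_h, \beta_h)}\, d\theta
   = \left(\frac{1}{|B|-1}\right)^{m'}
     \frac{B(\alpha_h + m',\ \beta_h + m)}{B(\alpha_h, \beta_h)}
\end{align*}
```

Here $B(\cdot,\cdot)$ in the ratio is the Beta function, while $|B|$ is the size of the allele alphabet. The closed form follows from Beta conjugacy and is what makes a collapsed sampler possible.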


Noisy Observation Model

The observed genotype at a locus is determined by the paternal and maternal alleles

If the observed genotype disagrees with them, penalize

γ has a Beta prior


Pedigree-Haplotyper


Inference - Gibbs Sampling

γ and θ are integrated out

Sample $C_{i_t}$, $A_j^{(k)}$, $H_{i_t,j}$

1) Given the current values of the hidden haplotypes, sample $c_{i_t}$ and $a_j^{(k)}$


Gibbs Sampling

2) Given the ancestral assignments and the ancestral pool, sample the haplotypes
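
The sampling formulas are not preserved in the transcript. As a rough sketch of step 1, the standard collapsed Gibbs update for a Dirichlet process mixture weights each existing ancestor by how many other haplotypes use it times the likelihood of the current haplotype, and weights a new ancestor by the concentration parameter times a marginal likelihood. The function name and the numeric inputs below are hypothetical, and the likelihood values are assumed to be computed elsewhere with θ already integrated out.

```python
def assignment_probabilities(counts, likelihoods, new_likelihood, alpha):
    """Normalized conditional probabilities for one haplotype's ancestor.

    counts[k]: number of *other* haplotypes currently assigned to ancestor k.
    likelihoods[k]: p(h | ancestor k), assumed precomputed.
    new_likelihood: marginal likelihood of h under a brand-new ancestor.
    alpha: concentration parameter of the Dirichlet process.
    The last entry of the result is the probability of creating a new ancestor.
    """
    weights = [n * lik for n, lik in zip(counts, likelihoods)]
    weights.append(alpha * new_likelihood)
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical numbers: two existing ancestors and one potential new ancestor.
probs = assignment_probabilities(
    counts=[3, 1], likelihoods=[0.2, 0.05], new_likelihood=0.01, alpha=1.0
)
print(probs)
```

When the marginal likelihood under a new ancestor is tiny, the last entry is close to zero and new ancestors are rarely created.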


Metropolis Hastings

With a long list of loci and a uniform prior p(a), the probability of sampling a new ancestor is very small.

This leads to slow mixing

Sample the ancestor assignment using a proposal distribution


Metropolis Hastings

In the acceptance probability, the proposal factor cancels out
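
The slide's acceptance formula is not preserved. As a generic illustration of how a proposal factor can cancel (an assumption about the mechanism, not the authors' exact expression), write the target as $\pi(x) \propto p(x)\,L(x)$ with prior $p$ and likelihood $L$, and suppose the new state $x'$ is proposed from the prior itself, $q(x' \mid x) = p(x')$.

```latex
\begin{align*}
\xi(x \to x')
  &= \min\!\left(1,\ \frac{\pi(x')\, q(x \mid x')}{\pi(x)\, q(x' \mid x)}\right)
   = \min\!\left(1,\ \frac{p(x')\,L(x')\,p(x)}{p(x)\,L(x)\,p(x')}\right)
   = \min\!\left(1,\ \frac{L(x')}{L(x)}\right)
\end{align*}
```

Under this choice the acceptance probability reduces to a likelihood ratio, so the proposal density never has to be evaluated.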


Experiments

Simulated Data: Haplotypes randomly paired to form genotypes. 

Performance compared to PHASE


Experiments

Two real data sets: 129 individuals, 90 individuals from 4 populations

Dataset 1:


Experiments

Dataset 2: small sample size, tougher data set

Haplotyper outperforms PHASE


Conclusions

The algorithm outperforms PHASE on two data sets, by a large margin on one of them.

The strength of the proposed approach lies in its flexibility

It can be extended to incorporate aspects of evolutionary dynamics and other information

Illustrated example: pedigree information

