
Yanxin Shi¹, Fan Guo¹, Wei Wu², Eric P. Xing¹

Yanxin Shi¹, Fan Guo¹, Wei Wu², Eric P. Xing¹. GIMscan: A New Statistical Method for Analyzing Whole-Genome Array CGH Data. RECOMB 2007 Presentation.
Page 1

Yanxin Shi¹, Fan Guo¹, Wei Wu², Eric P. Xing¹

GIMscan: A New Statistical Method for Analyzing Whole-Genome Array CGH Data

RECOMB 2007 Presentation

¹ School of Computer Science, Carnegie Mellon University

² Division of Pulmonary, Allergy, and Critical Care Medicine, University of Pittsburgh

Page 2

Outline

• Motivation and Background

• Computational framework

• Experiments and Results

• Summary

Page 3

Copy number aberration and Array CGH

• DNA copy number (a.k.a. dosage state)

– Normal: 2 DNA copies

– Aberrations: deletion (0 copies), loss (1 copy), gain (3 copies), amplification (>3 copies)

– Array CGH: a high-throughput method to measure DNA copy number

Page 4

Array CGH data

Ideally, the log-ratio (LR) takes one of a few discrete values (logarithms are base 2):

Deletion (0 copies): LR = log(0/2) = −∞

Loss (1 copy): LR = log(1/2) = −1

Normal (2 copies): LR = log(2/2) = 0

Gain (3 copies): LR = log(3/2) ≈ 0.58

Amplification (>=4 copies): LR >= log(4/2) = 1
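
As a quick check of the ideal values listed on this slide, here is a minimal sketch that computes the log-ratio for a given copy number. It assumes a base-2 logarithm and a diploid (2-copy) reference; the function name `ideal_log_ratio` is illustrative and does not come from the presentation.

```python
import math

def ideal_log_ratio(copies: int, reference: int = 2) -> float:
    """Ideal LR = log2(copies / reference); 0 copies maps to -inf."""
    if copies < 0:
        raise ValueError("copy number must be non-negative")
    if copies == 0:
        return float("-inf")  # log of zero: complete deletion
    return math.log2(copies / reference)

for state, copies in [("deletion", 0), ("loss", 1), ("normal", 2),
                      ("gain", 3), ("amplification", 4)]:
    print(f"{state:>13}: LR = {ideal_log_ratio(copies):.2f}")
```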

Page 5

However…

• Factors influencing the LR values

– Impurity of the test sample (e.g. mixture of normal and cancer cells)

– Variations of hybridization efficiency

– Base compositions of different probes

– Saturation of array

– Divergent sequence lengths of the clones

– Many others…

• Measurement noise, etc.
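
To illustrate the first factor, here is a minimal sketch of how sample impurity pulls LR values toward zero. It assumes a simple linear mixing model that is not stated on the slide: a fraction `tumor_fraction` of cells carry the aberrant copy number and the rest are normal 2-copy cells. The function name and parameters are illustrative.

```python
import math

def mixed_log_ratio(tumor_copies: int, tumor_fraction: float) -> float:
    """LR of a tumor/normal mixture under an assumed linear mixing model."""
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be in [0, 1]")
    # Average copy number over tumor cells and normal (2-copy) cells.
    mean_copies = tumor_fraction * tumor_copies + (1.0 - tumor_fraction) * 2
    if mean_copies == 0:
        return float("-inf")
    return math.log2(mean_copies / 2)

# A single-copy loss in a pure sample versus a sample that is 50% tumor cells.
print(mixed_log_ratio(1, 1.0))
print(mixed_log_ratio(1, 0.5))
```

Under this assumption, a one-copy loss in a sample that is half normal cells gives an LR of about −0.42 rather than −1, which is one reason fixed cutoffs on LR can misclassify.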

Page 6

Segmental pattern and spatial drift

[Figure: examples of spatial drift and a segmental pattern]

Page 7

Existing Computational Methods

• Threshold Method

• Mixture Models (e.g. Hodgson et al., 2001)

– Assume observations are iid samples from a mixture distribution.

• Regression Models (e.g. Hsu et al., 2005; Myers et al., 2004)

– Smoothing for visual inspection to detect copy number states.

• Segmentation Models (e.g. Hupé et al., 2004)

– Directly search for breakpoints in sequential data.

• Spatial Dynamics Models (e.g. Fridlyand et al., 2004)
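
The threshold method, the simplest baseline in this list, can be sketched as follows. The cutoffs are hypothetical (approximate midpoints between the ideal LR values for 1, 2, 3 and 4 copies); the slides do not say which thresholds were used, and the names `CUTOFFS`, `STATES` and `threshold_call` are illustrative.

```python
from bisect import bisect_right

# Hypothetical cutoffs: midpoints between the ideal LRs -1, 0, 0.58 and 1.
CUTOFFS = [-0.5, 0.29, 0.79]
STATES = ["loss", "normal", "gain", "amplification"]

def threshold_call(lr: float) -> str:
    """Assign a dosage state to one LR value using fixed cutoffs."""
    return STATES[bisect_right(CUTOFFS, lr)]

profile = [0.05, -0.9, -1.1, 0.1, 0.6, 1.3]
print([threshold_call(lr) for lr in profile])
```

Each clone is classified on its own here, so neighbouring clones do not inform one another.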

Page 8

Spatial Dynamic Methods

• Hidden Markov Models

– Dosage states form a Markov chain of hidden variables

– Observed LR values are generated from state-specific Gaussian distributions

[Figure: hidden chain of dosage states emitting the observed LR values]
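
A minimal sketch of the HMM described on this slide, with Viterbi decoding of the most likely dosage state sequence. The number of states, the state means, the shared standard deviation and the transition probabilities are illustrative assumptions, not values from the presentation.

```python
import math

def gaussian_logpdf(y: float, mean: float, std: float) -> float:
    return -0.5 * math.log(2 * math.pi * std**2) - (y - mean) ** 2 / (2 * std**2)

def viterbi(lrs, means, std, stay_prob):
    """Most likely hidden state path for Gaussian emissions with fixed means."""
    m = len(means)
    switch = (1.0 - stay_prob) / (m - 1)
    log_trans = [[math.log(stay_prob if i == j else switch) for j in range(m)]
                 for i in range(m)]
    # Uniform initial distribution over states.
    score = [-math.log(m) + gaussian_logpdf(lrs[0], means[k], std) for k in range(m)]
    back = []
    for y in lrs[1:]:
        # Best predecessor of each state j, then the updated path scores.
        prev = [max(range(m), key=lambda i: score[i] + log_trans[i][j]) for j in range(m)]
        score = [score[prev[j]] + log_trans[prev[j]][j] + gaussian_logpdf(y, means[j], std)
                 for j in range(m)]
        back.append(prev)
    # Trace back from the best final state.
    path = [max(range(m), key=lambda k: score[k])]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    return path[::-1]

means = [-1.0, 0.0, 0.58]  # loss, normal, gain (illustrative)
lrs = [0.1, -0.05, -0.9, -1.1, -0.8, 0.0, 0.5, 0.7]
print(viterbi(lrs, means, std=0.2, stay_prob=0.9))
```

Because each state has a fixed mean, this model has no way to follow a slowly drifting baseline.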

Page 9

Dosage-Specific Kalman Filters

• Introduce a hidden trajectory to model state-specific LR distributions (the mean is no longer fixed)

Linear Dynamics for dosage state m
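
For reference, a standard scalar linear-Gaussian state-space form for one dosage-specific trajectory is sketched below. The symbols are assumptions for illustration: $X_t^{(m)}$ is the hidden trajectory for dosage state $m$ at clone $t$, $Y_t$ the observed LR value, $a_m$ a dynamics coefficient, and $b_m$ and $r$ the variances of the dynamics noise and the observation noise. The exact parameterization used in GIMscan may differ.

```latex
\begin{aligned}
X_t^{(m)} &= a_m X_{t-1}^{(m)} + w_t^{(m)}, & w_t^{(m)} &\sim \mathcal{N}(0, b_m), \\
Y_t &= X_t^{(m)} + v_t, & v_t &\sim \mathcal{N}(0, r).
\end{aligned}
```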

Page 10

Switching Kalman Filters

• An SKF generates observations from one of the trajectories.

[Figure: dosage state chain selecting among Trajectory 1 … Trajectory M]
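
The generative process of a switching Kalman filter can be sketched as a small sampler. This is a sketch under stated assumptions, not the authors' implementation: trajectories are scalar, `b` and `r` are treated as noise variances, the dosage state chain stays put with probability `stay_prob`, and all parameter values are made up.

```python
import random

def sample_skf(length, a, b, r, start, stay_prob, seed=0):
    """Draw (states, observations) from a scalar switching Kalman filter."""
    rng = random.Random(seed)
    m = len(a)
    x = list(start)        # one hidden trajectory per dosage state
    s = rng.randrange(m)   # initial dosage state
    states, obs = [], []
    for _ in range(length):
        # Every trajectory evolves, whether or not it is currently selected.
        x = [a[k] * x[k] + rng.gauss(0.0, b[k] ** 0.5) for k in range(m)]
        # The dosage state chain switches as in an HMM.
        if rng.random() > stay_prob:
            s = rng.choice([k for k in range(m) if k != s])
        states.append(s)
        # The observation reads off the selected trajectory plus output noise.
        obs.append(x[s] + rng.gauss(0.0, r ** 0.5))
    return states, obs

states, lrs = sample_skf(length=50, a=[1.0, 1.0, 1.0], b=[1e-3] * 3,
                         r=0.01, start=[-1.0, 0.0, 0.58], stay_prob=0.95)
print(states[:10], [round(y, 2) for y in lrs[:10]])
```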

Page 11

Posterior Inference

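• Dosage annotation is equivalent to estimating the posterior distribution of the hidden dosage states given the observed LR values.

• Recovering the hidden trajectory: estimating the posterior distribution of the hidden trajectories given the observed LR values.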

Page 12

Variational Inference

• Posterior inference is intractable.

• Variational inference: decouple the hidden chains.

• Decoupled chains have tractable distributions.

Page 13

Variational Inference

• Use this tractable distribution to approximate the true distribution by minimizing KL divergence.

• Fixed point equations to update the variational parameters.
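
One common way to write the decoupling, in the style of structured variational approximations for switching state-space models, is sketched below. The notation is assumed for illustration: $S_{1:T}$ is the dosage state chain, $X^{(m)}_{1:T}$ the $m$-th hidden trajectory, and $Y_{1:T}$ the observed LR values. The variational family used in GIMscan may be parameterized differently.

```latex
\begin{aligned}
q\!\left(S_{1:T}, X^{(1:M)}_{1:T}\right) &= q_S(S_{1:T}) \prod_{m=1}^{M} q_m\!\left(X^{(m)}_{1:T}\right), \\
q^\ast &= \arg\min_{q} \; \mathrm{KL}\!\left( q\!\left(S_{1:T}, X^{(1:M)}_{1:T}\right) \,\middle\|\, p\!\left(S_{1:T}, X^{(1:M)}_{1:T} \mid Y_{1:T}\right) \right).
\end{aligned}
```

With a factorization of this kind, $q_S$ typically behaves like an HMM and each $q_m$ like a Kalman filter, so each factor can be updated in turn with standard recursions until the fixed-point equations converge.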

Page 14

Parameter Sharing

• The CGH dataset contains whole-genome measurements for multiple individuals.

• Chromosome-specific parameters shared across individuals: trajectory parameters

• Individual-specific parameters shared across chromosomes: all other parameters, e.g. output noise variance
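
The sharing scheme can be pictured as two lookups, one keyed by chromosome and one by individual. The sketch below is only an illustration: the class names, field names and values are hypothetical, and it follows one reading of the slide (trajectory parameters per chromosome, output noise variance per individual).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChromosomeParams:
    """Trajectory parameters: one set per chromosome, shared by all individuals."""
    drift: tuple         # hypothetical per-state dynamics coefficients
    dynamics_var: tuple  # hypothetical per-state dynamics noise variances

@dataclass(frozen=True)
class IndividualParams:
    """Other parameters: one set per individual, shared by all chromosomes."""
    output_noise_var: float

def params_for(individual, chromosome, by_individual, by_chromosome):
    """Assemble the parameters used for one (individual, chromosome) profile."""
    return by_individual[individual], by_chromosome[chromosome]

by_chromosome = {c: ChromosomeParams((1.0,) * 5, (1e-3,) * 5) for c in ("chr1", "chr8")}
by_individual = {i: IndividualParams(0.01 * (i + 1)) for i in range(3)}

ind, chrom = params_for(2, "chr8", by_individual, by_chromosome)
print(ind.output_noise_var, chrom.drift)
```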

Page 15

Experiment Design

• Simulation Analysis:

– Data generated from SKFs.

– Compare with: threshold, HMM.

• aCGH profiles of 125 colorectal tumors (Nakao et al. 2004)

– Case studies of 3 representative chromosomes.

– Populational analysis over 125 genomes.

Page 16

Simulation Analysis (1)

Performance of dosage state prediction (b – noise in hidden dynamics, r – noise in observation, M=5)

Page 17

Simulation Analysis (2)

[Figure panels: Synthetic Data; Prediction by HMM; Prediction by SKF]

Page 18

Experiment Design

• Simulation Analysis:

– Data generated from SKFs.

– Compare with: threshold, HMM.

• aCGH profiles of 125 colorectal tumors (Nakao et al. 2004)

– Case studies of 3 representative chromosomes.

– Populational analysis over 125 genomes.

Page 19

Real aCGH Profile

Spatial Patterns Difficult for Conventional Methods (1): Flat-Arch Pattern

Page 20

Real aCGH Profile

Spatial Patterns Difficult for Conventional Methods (2): Step Pattern

Page 21

Real aCGH Profile

Spatial Patterns Difficult for Conventional Methods (3): Spikes Pattern

Page 22

Populational Analysis

Frequency of dosage state alteration across 125 individuals

red bar – copy number gain or amplification

blue bar – copy number loss or deletion

solid vertical lines – boundary between chromosomes

Page 23

Populational Analysis

Frequency of dosage state alteration on 2 chromosomes

top, red square – copy number gain

top, blue circle – copy number loss

bottom, red square – copy number amplification

bottom, blue circle – copy number deletion

Page 24

Summary

• SKF for whole-genome analysis of aCGH data.

• SKF can capture variations in the hybridization efficiency.

• Parameter sharing scheme for data integration.

• Possible Extensions:

– Gene expression concordance analysis

– Incorporate information about sequence length and distance between clones

Page 25

Thank you!

Page 26

Populational Analysis

Detailed spectrum of GIM rates over 125 colorectal cancer patients in 4 hotspot regions, with annotation of cancer-related genes

Page 27

• M is selected by AIC.

• We have also run experiments comparing SKF with segmentation methods (results not shown here).
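
Choosing M by AIC can be sketched as follows, using the standard definition AIC = 2k − 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood. The candidate values of M, the parameter counts and the log-likelihoods below are made-up numbers for illustration only.

```python
def aic(num_params: int, log_likelihood: float) -> float:
    """Akaike information criterion: lower is better."""
    return 2 * num_params - 2 * log_likelihood

# Hypothetical fits: M -> (number of free parameters, maximized log-likelihood).
fits = {3: (12, -520.0), 4: (18, -498.0), 5: (26, -489.0), 6: (36, -487.5)}

scores = {m: aic(k, ll) for m, (k, ll) in fits.items()}
best_m = min(scores, key=scores.get)
print(scores, best_m)
```

The penalty term 2k stops the criterion from always preferring the largest M, which would otherwise win on likelihood alone.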

Page 28

Switching Kalman Filters

• An SKF generates observations from one of the trajectories.

• The dosage state chain is the switching process, as in an HMM.

• The outputs are the observed LR values.

