9/30/09 erlich@cshl.edu

Watson School of Biological Sciences

Cold Spring Harbor Laboratory

Yaniv Erlich

Compressed Sensing Approaches for High Throughput Carrier Screen

Joint work with Noam Shental, Amnon Amir and Or Zuk

Outline

• What is a carrier screen?

• Our vision - compressed sensing carrier screen

• Unique features of our setting

• Bayesian reconstruction algorithm

• Simulations

Intro - carrier screens

CS vision Unique features BP solver Simulations

Compressed sensing carrier screen

Rare recessive genetic diseases

Name       Genotype    Phenotype   Cystic Fibrosis
Normal     [diagram]   Healthy     ~29/30
Carrier    [diagram]   Healthy!    ~1/30
Affected   [diagram]   Disease     0.003%

Carrier breeding may lead to devastating results

Carrier couple → offspring:

Non-carrier   1:4
Carrier       1:2
Affected      1:4
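The offspring ratios for a carrier couple follow from enumerating the four equally likely allele combinations. The sketch below is only an illustration: the allele labels `"W"` and `"m"` and the function name are assumptions made for this example, not part of the slides.

```python
from fractions import Fraction
from itertools import product

def offspring_distribution(parent_a, parent_b):
    """Enumerate the four equally likely allele combinations of two parents."""
    labels = ("non-carrier", "carrier", "affected")
    counts = {label: Fraction(0) for label in labels}
    for allele_a, allele_b in product(parent_a, parent_b):
        # 0 mutant copies: non-carrier, 1: carrier, 2: affected (recessive disease)
        mutant_copies = (allele_a == "m") + (allele_b == "m")
        counts[labels[mutant_copies]] += Fraction(1, 4)
    return counts

carrier = ("W", "m")  # one wild-type allele and one mutant allele (assumed labels)
result = offspring_distribution(carrier, carrier)
```

Here `result` maps each outcome to its probability, which can be compared with the 1:4, 1:2 and 1:4 ratios on the slide.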

What can we do?

• Several countries employ nationwide programs

- screen the bulk population

- very limited set of genes

Carrier screen - the current mechanism

Input: Thousands of specimens.

Output: Finding carriers for rare genetic diseases

A needle in a haystack problem

Serial processing:

- sequence: 1 region of 1 person per reaction

- expensive and does not scale

Carrier screens - our vision

Ultra-high throughput carrier screen

Many specimens + many regions

• Add more genes to the test panel while keeping the task at a tractable scale

• Increase participation by reducing the cost

Next generation sequencers – parallel processing

Sequence 100 million DNA molecules in a single batch (~1 week)

BUT

• On pooled samples, the sequencer reports only a histogram of the DNA sequence types.

How to multiplex many specimens with next generation sequencers?

Example: pooling 4 normal specimens and 1 carrier

[Figure: bar chart of the fraction of reads (0% to 100%) for the WT allele and the mutant allele]
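The expected histogram for the pooling example can be worked out by counting alleles. This is a sketch under two stated assumptions that the slide does not spell out: every specimen contributes the same amount of DNA to the pool, and sequencing reads sample alleles without bias. The function name is hypothetical.

```python
from fractions import Fraction

def expected_mutant_fraction(n_normal, n_carriers):
    """Each specimen contributes two alleles; a carrier has exactly one mutant allele."""
    total_alleles = 2 * (n_normal + n_carriers)
    return Fraction(n_carriers, total_alleles)

# The example from the slide: 4 normal specimens pooled with 1 carrier.
fraction = expected_mutant_fraction(n_normal=4, n_carriers=1)
print(f"mutant reads: {float(fraction):.0%}, WT reads: {float(1 - fraction):.0%}")
```

Under these assumptions one carrier among five specimens yields 1 mutant allele out of 10, so the mutant bar sits at a tenth of the reads.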

Multiplexing - the compressed sensing approach

y = Φx

CS principle: when x is sparse, very few measurements are sufficient for faithful reconstruction.

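In y = Φx:

• x: vector over the N specimens; its nonzero entries mark the carriers

• Φ: pooling design, a 0-1 matrix with T rows (pools) and N columns (specimens)

• y: the ratio of carrier reads in each of the T pools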
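A small numeric instance of y = Φx may help. The sketch below uses a made-up 3-pool, 6-specimen design and a single carrier; the matrix, the helper names, and the normalisation of the carrier count by twice the pool size (two alleles per specimen) are assumptions for illustration.

```python
from fractions import Fraction

def carrier_counts(phi, x):
    """Matrix-vector product: number of carriers in each pool."""
    return [sum(entry * xi for entry, xi in zip(row, x)) for row in phi]

def carrier_read_ratios(phi, x):
    """Expected ratio of carrier reads per pool: carriers / (2 * pool size)."""
    return [Fraction(count, 2 * sum(row))
            for count, row in zip(carrier_counts(phi, x), phi)]

# Hypothetical pooling design: T = 3 pools (rows), N = 6 specimens (columns).
phi = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [1, 0, 0, 1, 0, 1],
]
x = [0, 0, 0, 1, 0, 0]  # sparse signal: only specimen 3 is a carrier

counts = carrier_counts(phi, x)
ratios = carrier_read_ratios(phi, x)
```

Only the two pools that contain specimen 3 report a nonzero ratio, which is the information a decoder uses to locate the carrier.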

Distinctions from traditional CS

• ‘On a budget’ compressed sensing

• Not all pools were born equal

• Signal domain

On a budget compressed sensing

• A heavy-weight design requires long pooling steps and higher material consumption

• A higher compression level is more prone to technical difficulties

• We want a very sparse sensing matrix

[Figure: sensing matrix Φ with pools (t) as rows and specimens (N) as columns, annotated with the weight (w) and the compression level; example shown: random matrix with p = 0.5]

Light Chinese Design

Inputs: N (number of specimens in the experiment)

w (weight, the pooling effort)

Algorithm:

1. Find w numbers {x_1, x_2, …, x_w} such that they are:

• Bigger than $\sqrt{N}$

• Pairwise coprime

2. Generate w modular equations:

$$
\begin{aligned}
\text{Pool} &\equiv \text{Specimen} \pmod{x_1} \\
\text{Pool} &\equiv \text{Specimen} \pmod{x_2} \\
&\;\;\vdots \\
\text{Pool} &\equiv \text{Specimen} \pmod{x_w}
\end{aligned}
$$

3. Construct the pooling design from the modular equations

Output: sparse pooling design with about $w\sqrt{N}$ pools

Advantages:

• (w-1)-disjunct matrix

• The weight does not explicitly depend on the number of specimens

• The compression level is about $\sqrt{N}$

• Easy to debug

[Figure: example pooling windows labeled mod 6 and mod 7]
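The construction can be sketched in a few lines. This is an illustrative reading of the slide, not the authors' code: it assumes the lower bound on the moduli is √N, picks the moduli greedily (the slide does not say how they are chosen), and uses hypothetical function names. With N = 30 the greedy choice happens to give the moduli 6 and 7 that label the figure.

```python
from math import gcd, isqrt

def choose_moduli(n_specimens, weight):
    """Greedily pick `weight` pairwise coprime integers, each larger than sqrt(N)."""
    moduli = []
    candidate = isqrt(n_specimens) + 1  # smallest integer strictly above sqrt(N)
    while len(moduli) < weight:
        if all(gcd(candidate, m) == 1 for m in moduli):
            moduli.append(candidate)
        candidate += 1
    return moduli

def chinese_design(n_specimens, moduli):
    """Return the pools as sets of specimen indices, one window of x pools per modulus x."""
    pools = []
    for x in moduli:
        window = [set() for _ in range(x)]
        for specimen in range(n_specimens):
            # Specimen s goes to the pool with index s mod x inside this window.
            window[specimen % x].add(specimen)
        pools.extend(window)
    return pools

moduli = choose_moduli(n_specimens=30, weight=2)
pools = chinese_design(30, moduli)
```

Because the product of any two moduli exceeds N, the Chinese remainder theorem guarantees that two different specimens share at most one pool, which is what makes the design easy to decode and to debug.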

Not all pools were born equal

• The sequencer does not report the absolute number of carriers in the pool

• Instead:

$$\#\,\text{carrier reads} \sim \mathrm{Binomial}(r, p)$$

where r is the total number of sequence reads from the pool and p is the fraction of carriers in the pool divided by 2.

• Pools with more sequence reads and fewer carriers provide more reliable information.

• The noise is not additive; it is correlated with the content of the pool.

• We need a reconstruction algorithm that takes into account the reliability of the data from each pool.
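The reliability claim can be made concrete with the standard error of a binomial proportion. The sketch below assumes the plain binomial read model stated on the slide and nothing else; the function name and the example read counts are invented for illustration.

```python
from math import sqrt

def carrier_fraction_estimate(carrier_reads, total_reads):
    """Estimate the carrier fraction of a pool and its binomial standard error."""
    p_hat = carrier_reads / total_reads  # estimates (fraction of carriers) / 2
    std_err_p = sqrt(p_hat * (1 - p_hat) / total_reads)
    return 2 * p_hat, 2 * std_err_p

# Same observed ratio, ten times more reads: the estimate is tighter.
few_reads = carrier_fraction_estimate(10, 100)
many_reads = carrier_fraction_estimate(100, 1000)

# Same read depth, fewer carrier reads: the estimate is tighter as well,
# because p never exceeds 0.5 and p * (1 - p) grows with p on that range.
low_content = carrier_fraction_estimate(5, 1000)
```

The error depends on p itself, which is the sense in which the noise is tied to the content of the pool rather than being additive.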

Signal Domain

In traditional CS:

$$x \in \mathbb{R}^N$$

In compressed carrier screen:

$$x \in \{0,1\}^N$$

Traditional CS decoder solves:

$$\hat{x} = \arg\min_{x \in \mathbb{R}^N} \|x\|_1 \quad \text{s.t.} \quad \|\Phi x - y\|_2 \le \varepsilon$$

• What are the implications of using a traditional decoder and then applying a rounding procedure?

• Can we find a reconstruction procedure that directly finds $x \in \{0,1\}^N$?

Bayesian reconstruction algorithm

$$x^* = \arg\max_{x \in \{0,1\}^N} \left\{ P(x \mid B)\, P(D \mid x) \right\}$$

Here P(x | B) carries the biological data and P(D | x) carries the pooling data.

Biological expectations: the genotype of one specimen does not depend on the genotype of another (unless they are relatives).

Pooling model and sequencing: only the specimens in a pool affect that pool's results.

$$x^* = \arg\max_{x \in \{0,1\}^N} \left\{ \prod_{i=1}^{N} P(x_i \mid B) \prod_{t \in T} P\left(D_t \mid \{x_i\}_{i \in t}\right) \right\}$$

Approximation by loopy Belief Propagation…

[Figure: factor graph built from Φ]
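For a handful of specimens the factorised objective (independent per-specimen priors times per-pool read likelihoods) can be maximised by exhaustive search, which makes its structure easy to inspect. The sketch below is not belief propagation and not the authors' solver: it enumerates all 2^N genotype vectors, assumes a binomial read model per pool, and adds a small `read_error` floor (an assumption of this example) so that a pool without carriers can still explain stray carrier reads. The pools, read counts and carrier rate are invented.

```python
from itertools import product
from math import lgamma, log

def log_binomial(k, r, p):
    """Log of the binomial pmf P(k | r, p)."""
    log_choose = lgamma(r + 1) - lgamma(k + 1) - lgamma(r - k + 1)
    return log_choose + k * log(p) + (r - k) * log(1 - p)

def map_genotypes(pools, reads, carrier_rate, n_specimens, read_error=0.001):
    """Exhaustive MAP over {0,1}^N: prior over specimens times likelihood over pools."""
    best_x, best_score = None, float("-inf")
    for x in product((0, 1), repeat=n_specimens):
        # Biological prior: specimens are independent, each a carrier with `carrier_rate`.
        score = sum(log(carrier_rate) if xi else log(1 - carrier_rate) for xi in x)
        # Pooling likelihood: only the specimens in a pool affect that pool's reads.
        for members, (carrier_reads, total_reads) in zip(pools, reads):
            p = sum(x[i] for i in members) / (2 * len(members))
            p = min(max(p, read_error), 1 - read_error)  # keep log() finite
            score += log_binomial(carrier_reads, total_reads, p)
        if score > best_score:
            best_x, best_score = x, score
    return best_x

# Hypothetical experiment: 6 specimens pooled with moduli 2 and 3; specimen 4 is the carrier.
pools = [{0, 2, 4}, {1, 3, 5}, {0, 3}, {1, 4}, {2, 5}]
reads = [(167, 1000), (0, 1000), (0, 1000), (250, 1000), (0, 1000)]
estimate = map_genotypes(pools, reads, carrier_rate=1 / 25, n_specimens=6)
```

Exhaustive search costs 2^N evaluations, so it only works as a reference for toy sizes; that cost is the reason an approximation such as loopy belief propagation is needed at realistic scales.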

Advantages of Belief Propagation

• Bottom up approach – weighs the reliability of each individual pool

• Bayesian – everything speaks the same language. Can incorporate a-priori medical information and familial connections.

• Encoding advantage – Chinese pooling ensures that there are no short cycles

• Binary results directly – no rounding procedure at the end

Simulations of compressed carrier screen in Ashkenazi Jews

Genetic Disorder         Carrier rate
Tay-Sachs                1:25
Cystic Fibrosis          1:30
Familial Dysautonomia    1:30
Usher Syndrome           1:40
Canavan                  1:40
Glycogen Storage         1:71
Fanconi Anemia C         1:80
Niemann-Pick             1:80
Mucolipidosis type 4     1:100
Bloom                    1:102
Nemaline Myopathy        1:108

• Finding carriers for two Ashkenazi Jewish diseases: Tay-Sachs and Bloom syndrome.

• Chinese pooling design

• Comparing GPSR (traditional solver) and BP

• Evaluating Nmax – the largest number of specimens for which at least 48 out of 50 runs give 100% accuracy.
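The Nmax criterion can be written as a small evaluation harness. This sketch only encodes the "at least 48 out of 50 runs" rule from the slide; `run_trial` is a hypothetical callback that would simulate one pooling experiment of size N and report whether reconstruction was 100% accurate, and the toy trial used here is a deterministic stand-in rather than a simulation.

```python
import random

def n_max(candidate_sizes, run_trial, runs=50, required=48, seed=0):
    """Largest N among the candidates for which at least `required` of `runs` trials succeed."""
    rng = random.Random(seed)  # shared generator so that results are reproducible
    best = None
    for n in sorted(candidate_sizes):
        successes = sum(bool(run_trial(n, rng)) for _ in range(runs))
        if successes >= required:
            best = n
    return best

def toy_trial(n_specimens, rng):
    """Stand-in for a real simulation: pretends decoding is perfect up to 2000 specimens."""
    return n_specimens <= 2000

largest = n_max([500, 1000, 2000, 4000], toy_trial)
```

In a real evaluation the callback would draw carriers at the disease's carrier rate, simulate the reads, run the decoder (BP or GPSR) and compare the result with the truth.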

Results

[Figure: results for Bloom and Tay-Sachs, comparing BP and GPSR; annotations: Pools/Specimens = 6.5% and Pools/Specimens = 13%]

Conclusions

• The CS framework can be used for ultra-high throughput carrier screens.

• Our setting shows several unique features not found in the traditional framework

- We suggest tailored encoding (light Chinese) and decoding (BP) procedures

• At least in our settings, a tailored decoder, BP, has an advantage over reconstruction with an off-the-shelf CS solver

• CS carrier screen has the potential to dramatically reduce the cost of sequencing.

An ongoing study…

The real thing

[Figure: labels Introduction, Naïve Solutions, Chinese Pooling, Analysis Results]

Acknowledgements

Greg Hannon

Noam Shental

Or Zuk & Amnon Amir

Igor Carron (Nuit Blanche)

Funding:

Lindsay Goldberg PhD Fellowship

ACM/IEEE-CS HPC PhD Fellowship

For more information: hannonlab.cshl.edu/labmembers/erlich

Loopy belief propagation is tricky

Damping is the key
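Damping usually means blending each freshly computed message with the one from the previous iteration, so that messages on a loopy graph change gradually instead of oscillating. The slide does not give the exact update, so the convex-combination form and the default factor below are assumptions, and the function name is hypothetical.

```python
def damp(old_message, new_message, damping=0.5):
    """Blend the previous message with the freshly computed one.

    damping = 0 keeps the new message unchanged; values closer to 1 move more slowly.
    A convex combination of two normalised messages stays normalised.
    """
    return [damping * old + (1 - damping) * new
            for old, new in zip(old_message, new_message)]

# Messages over the binary state of one specimen: [P(normal), P(carrier)].
previous = [0.9, 0.1]
proposed = [0.1, 0.9]
damped = damp(previous, proposed, damping=0.5)
```

A higher damping factor trades convergence speed for stability.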

DNA Sudoku

Pooling imperfections

• Background contamination

• Pooling failures (erasures)

[Figure: data from a real experiment; number of reads per pool, with windows labeled mod 377 and the pools not in use marked]

Distinctions from traditional CS

• ‘On a budget’ compressed sensing

• Not all pools were born equal

• Pooling imperfections

• Signal domain
