9/30/09 [email protected]
Compressed Sensing Approaches for High Throughput Carrier Screen
Yaniv Erlich
Watson School of Biological Sciences
Cold Spring Harbor Laboratory
Joint work with Noam Shental, Amnon Amir and Or Zuk
Outline
• What is a carrier screen?
• Our vision - compressed sensing carrier screen
• Unique features of our setting
• Bayesian reconstruction algorithm
• Simulations
Intro - carrier screens
CS vision Unique features BP solver Simulations
Compressed sensing carrier screen
Rare recessive genetic diseases
Name      Genotype    Phenotype   Frequency (Cystic Fibrosis)
Normal    [pictured]  Healthy     ~29/30
Carrier   [pictured]  Healthy!    ~1/30
Affected  [pictured]  Disease     0.003%
Carrier breeding may lead to devastating results
Carrier couple
Offspring: Non-carrier 1:4, Carrier 1:2, Affected 1:4
What can we do?
• Several countries employ nationwide programs
- screen the bulk population
- very limited set of genes
Carrier screen - the current mechanism
Input: Thousands of specimens.
Output: Finding carriers for rare genetic diseases
A needle in a haystack problem
Serial processing:
- sequence: 1 region of 1 person per reaction
- expensive and does not scale
Carrier screens - our vision
Ultra-high throughput carrier screen
Many specimens + many regions
• Add more genes to the test panel while keeping the task at a tractable scale
• Increase participation by reducing the cost
How to multiplex many specimens with next generation sequencers?
Next generation sequencers - parallel processing
Sequence 100 million DNA molecules in a single batch (~1 week)
BUT
• On pooled samples - only a histogram of the DNA sequence types.
Example: when pooling 4 normal specimens and 1 carrier
[Figure: bar chart of the fraction of reads (0% to 100%) for the WT allele and the mutant allele in this pool]
Multiplexing - the compressed sensing approach
y = Φx
x: carrier indicator vector over the N specimens
Φ: pooling design, a 0-1 matrix with T pools (rows) and N specimens (columns)
y: the ratio of carrier reads in each pool
CS principle: when x is sparse, very few measurements are sufficient for faithful reconstruction.
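A minimal sketch of the measurement model y = Φx, under stated assumptions: the 5 x 6 pooling matrix and the carrier vector below are made up for illustration, and the conversion from carrier counts to a read ratio assumes diploid specimens that contribute equal amounts of DNA (so a carrier contributes one mutant allele out of two). The function names are hypothetical.

```python
def carriers_per_pool(phi, x):
    """Compute the product Φx: the number of carriers that land in each pool."""
    return [sum(a * b for a, b in zip(row, x)) for row in phi]


def expected_carrier_read_ratio(phi, x):
    """Expected fraction of mutant reads per pool.

    Assumption: every specimen is diploid and contributes equally,
    so the ratio is (#carriers in pool) / (2 * pool size).
    """
    counts = carriers_per_pool(phi, x)
    return [c / (2 * sum(row)) for c, row in zip(counts, phi)]


# Illustrative 0-1 pooling design: 5 pools (rows) over 6 specimens (columns).
phi = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
]
# Sparse signal: specimen 4 is the only carrier.
x = [0, 0, 0, 0, 1, 0]

counts = carriers_per_pool(phi, x)
ratios = expected_carrier_read_ratio(phi, x)
```

Only the two pools that contain specimen 4 have a non-zero entry, which is the sparsity the CS principle relies on.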
Distinctions from traditional CS
• ‘On a budget’ compressed sensing
• Not all pools were born equal
• Signal domain
On a budget compressed sensing
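• A heavy-weight design requires long pooling steps and higher material consumption
• A higher compression level is more prone to technical difficulties
• We want a very sparse sensing matrix
[Figure: sensing matrix Φ with pools (t) as rows and specimens (N) as columns. Weight (w) is the number of pools each specimen enters; the compression level is the number of specimens per pool. Shown for comparison: a random matrix with p=0.5]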
Light Chinese Design
Inputs: N (number of specimens in the experiment)
        W (weight: pooling efforts)
Algorithm:
1. Find W numbers {x1, x2, …, xW} such that they are:
   • Bigger than $\sqrt{N}$
   • Pairwise coprime
2. Generate W modular equations:
   $\text{Specimen} \equiv \text{Pool} \pmod{x_1}$
   $\text{Specimen} \equiv \text{Pool} \pmod{x_2}$
   $\vdots$
   $\text{Specimen} \equiv \text{Pool} \pmod{x_W}$
3. Construct the pooling design upon the modular equations
Output: Sparse pooling design with …
Advantages:
• (w-1)-disjunct matrix
• The weight does not explicitly depend on the number of specimens
• The compression level is about $\sqrt{N}$
• Easy to debug
[Figure: example design built from the modular equations mod 6 and mod 7]
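The three steps of the light Chinese design can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: the greedy search for moduli (taking the smallest pairwise-coprime integers starting at ceil(√N)), the 0-based specimen numbering, and the function names are all assumptions.

```python
from math import gcd, isqrt


def choose_moduli(n_specimens, weight):
    """Step 1 (assumed greedy rule): the smallest `weight` pairwise-coprime
    integers starting from ceil(sqrt(N))."""
    candidate = isqrt(n_specimens - 1) + 1  # ceil(sqrt(N)) for N >= 1
    moduli = []
    while len(moduli) < weight:
        if all(gcd(candidate, m) == 1 for m in moduli):
            moduli.append(candidate)
        candidate += 1
    return moduli


def light_chinese_design(n_specimens, weight):
    """Steps 2-3: modulus x contributes x pools; specimen s (numbered from 0)
    joins the pool whose index is s mod x."""
    moduli = choose_moduli(n_specimens, weight)
    phi = [[0] * n_specimens for _ in range(sum(moduli))]
    offset = 0
    for x in moduli:
        for s in range(n_specimens):
            phi[offset + s % x][s] = 1
        offset += x
    return moduli, phi


# Example: 36 specimens with weight 2 yields the moduli 6 and 7 (13 pools).
moduli, phi = light_chinese_design(36, 2)
column_weights = [sum(row[s] for row in phi) for s in range(36)]
largest_pool = max(sum(row) for row in phi)
```

Because any two moduli are coprime and their product is at least N, two distinct specimens can share at most one pool, which is what keeps the design sparse yet decodable.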
Not all pools were born equal
• The sequencer does not report the absolute number of carriers in the pool
• Instead:
  $\#\,\text{carrier reads} \sim \mathrm{Binomial}(r, p)$
  r = # total sequence reads
  p = fraction of carriers in the pool / 2
• Pools with more sequence reads (↑) and fewer carriers (↓) provide more reliable information.
• The noise is not additive; it is correlated with the content of the pool.
• We need a reconstruction algorithm that takes into account the reliability of the data from each pool.
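A small sketch of the binomial read model, to make the reliability claim concrete. It assumes diploid specimens that contribute equal amounts of DNA and ignores sequencing errors; the function names and the example numbers are illustrative, not taken from the talk.

```python
import random
from math import sqrt


def mutant_read_probability(n_carriers, pool_size):
    """p = (fraction of carriers in the pool) / 2."""
    return n_carriers / (2 * pool_size)


def simulate_carrier_reads(n_carriers, pool_size, total_reads, rng):
    """Draw the number of carrier reads from Binomial(total_reads, p)."""
    p = mutant_read_probability(n_carriers, pool_size)
    return sum(rng.random() < p for _ in range(total_reads))


def read_fraction_std(n_carriers, pool_size, total_reads):
    """Standard deviation of the observed carrier-read fraction."""
    p = mutant_read_probability(n_carriers, pool_size)
    return sqrt(p * (1 - p) / total_reads)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
observed = simulate_carrier_reads(n_carriers=1, pool_size=5, total_reads=1000, rng=rng)

# More reads shrink the noise; more carriers in the pool enlarge it.
std_few_reads = read_fraction_std(1, 5, 100)
std_many_reads = read_fraction_std(1, 5, 1000)
std_more_carriers = read_fraction_std(2, 5, 100)
```

Since p never exceeds 0.5, the variance p(1 - p)/r grows with the number of carriers and falls with the number of reads, so the noise depends on the pool content rather than being additive.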
Signal Domain
In traditional CS: $x \in \mathbb{R}^N$
In compressed carrier screen: $x \in \{0,1\}^N$
Traditional CS decoder solves:
$\hat{x} = \arg\min_{x \in \mathbb{R}^N} \|x\|_1 \quad \text{s.t.} \quad \|y - \Phi x\|_2 \le \varepsilon$
• What are the implications of using a traditional decoder followed by a rounding procedure?
• Can we find a reconstruction procedure that directly finds $x \in \{0,1\}^N$?
Bayesian reconstruction algorithm
$x^* = \arg\max_{x \in \{0,1\}^N} \{ P(x \mid B)\, P(D \mid x) \}$
where P(x | B) comes from the biological data and P(D | x) from the pooling data.
Biological expectations: biologically, the genotype of one specimen does not depend on the genotype of another (unless they are relatives).
Pooling model and sequencing: only the specimens in a pool affect that pool's results.
$x^* = \arg\max_{x \in \{0,1\}^N} \left\{ \prod_{i=1}^{N} P(x_i \mid B) \prod_{t \in T} P(D_t \mid \{x_i\}_{i \in t}) \right\}$
Approximation by loopy Belief Propagation on the factor graph defined by Φ…
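To make the factorized objective concrete, the sketch below maximizes it by exhaustive enumeration over {0,1}^N. This is not loopy Belief Propagation: it is a brute-force stand-in that is only feasible for a handful of specimens. It assumes an independent prior with a single hypothetical `carrier_prior` for every specimen and a binomial read likelihood with p = carriers / (2 x pool size); the tiny design and read counts are made up.

```python
from itertools import product
from math import comb, inf, log


def log_binomial(k, r, p):
    """Log-probability of k carrier reads out of r when the mutant fraction is p."""
    if p == 0.0:
        return 0.0 if k == 0 else -inf
    return log(comb(r, k)) + k * log(p) + (r - k) * log(1.0 - p)


def map_genotypes(phi, mutant_reads, total_reads, carrier_prior):
    """Exhaustive MAP estimate: prior over specimens times likelihood over pools."""
    n = len(phi[0])
    best_x, best_score = None, -inf
    for x in product((0, 1), repeat=n):
        # Biological data: independent prior per specimen.
        score = sum(log(carrier_prior if xi else 1.0 - carrier_prior) for xi in x)
        # Pooling data: each pool depends only on the specimens it contains.
        for row, k, r in zip(phi, mutant_reads, total_reads):
            carriers = sum(a * b for a, b in zip(row, x))
            score += log_binomial(k, r, carriers / (2 * sum(row)))
        if score > best_score:
            best_x, best_score = x, score
    return best_x


# Four specimens pooled with moduli 2 and 3 (five pools); specimen 2 is the carrier.
phi = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]
estimate = map_genotypes(phi, mutant_reads=[25, 0, 0, 0, 50],
                         total_reads=[100] * 5, carrier_prior=1 / 25)
```

The search space doubles with every specimen, which is why an approximate message-passing method is needed at realistic scale.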
Advantages of Belief Propagation
• Bottom-up approach: weighs the reliability of each individual pool
• Bayesian: every source of evidence is expressed as a probability, so a priori medical information and familial connections can be incorporated
• Encoding advantage: Chinese pooling ensures that there are no short cycles
• Binary results directly: no rounding procedure at the end
Simulations of compressed carrier screen in Ashkenazi Jews
Genetic Disorder         Carrier rate
Tay-Sachs                1:25
Cystic Fibrosis          1:30
Familial Dysautonomia    1:30
Usher Syndrome           1:40
Canavan                  1:40
Glycogen Storage         1:71
Fanconi Anemia C         1:80
Niemann-Pick             1:80
Mucolipidosis type 4     1:100
Bloom                    1:102
Nemaline Myopathy        1:108
• Finding carriers for two diseases prevalent among Ashkenazi Jews: Tay-Sachs and Bloom syndrome.
• Chinese pooling design
• Comparing GPSR (traditional solver) and BP
• Evaluating Nmax – the largest number of specimens for which at least 48 out of 50 runs give 100% accuracy.
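The Nmax criterion can be written down as a small helper. This is only a sketch of the selection rule stated above; the success counts in the example are invented placeholders, not results from the talk, and the function name is hypothetical.

```python
def n_max(success_counts, required=48):
    """Largest N whose number of fully accurate runs (out of 50) meets the threshold.

    success_counts maps a number of specimens N to the count of runs
    that reached 100% accuracy. Returns None if no N qualifies.
    """
    passing = [n for n, successes in success_counts.items() if successes >= required]
    return max(passing) if passing else None


# Placeholder counts for illustration only (not data from the study).
example_counts = {500: 50, 1000: 49, 2000: 48, 4000: 41}
largest_passing = n_max(example_counts)
```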
Results
[Figure: results panels for Bloom and Tay-Sachs comparing BP and GPSR, annotated with Pools/Specimens = 6.5% and Pools/Specimens = 13%]
Conclusions
• The CS framework can be utilized for ultra-high throughput carrier screens.
• Our setting shows several unique features not found in the traditional framework
- We suggest tailored encoding (light Chinese) and decoding (BP) procedures
• At least in our settings, a tailored decoder, BP, has an advantage over reconstruction with an off-the-shelf CS solver
• A CS carrier screen has the potential to dramatically reduce the cost of sequencing.
An ongoing study…
The real thing
Acknowledgements
Greg Hannon
Noam Shental
Or Zuk & Amnon Amir
Igor Carron (Nuit Blanche)
Funding:
Lindsay Goldberg PhD Fellowship
ACM/IEEE-CS HPC PhD Fellowship
For more information: hannonlab.cshl.edu/labmembers/erlich
Pooling imperfections
• Background contamination
• Pooling failures (erasures)
[Figure: data from a real experiment, # reads per pool for the mod 377 pools, with the pools not in use marked]
Distinctions from traditional CS
• ‘On a budget’ compressed sensing
• Not all pools were born equal
• Pooling imperfections
• Signal domain