
OpenCL applications in genomics

Date post: 25-Jun-2015
Using OpenCL to accelerate genomic analysis Gary K. Chen June 16, 2011
Page 1: OpenCL applications in genomics

Using OpenCL to accelerate genomic analysis

Gary K. Chen

June 16, 2011

Page 2: OpenCL applications in genomics

An outline

OpenCL Introduction

Copy number inference in tumors

Data considerations

Hidden Relatedness

Variable Selection

Page 3: OpenCL applications in genomics

Scientific Programming on GPGPU devices

- nVidia and ATI are currently market leaders
  - Very competitive in performance and price
  - Impressive double-precision performance, though still about 4 times slower than 32-bit FP
  - ATI 9370 chipset: 528 GFLOPS (64-bit), 4GB GDDR5, $2,399
  - nVidia Tesla C2050: 520 GFLOPS (64-bit), 3GB GDDR5, $2,199
  - Source: www.sabrepc.com

Page 4: OpenCL applications in genomics
Page 5: OpenCL applications in genomics

Future multi-core CPUs

- Intel's 48 core SCC chip
  - Potentially a more powerful solution when considering data-intensive computing. Not constrained by the PCI bus

Page 6: OpenCL applications in genomics

An open-standards based development platform

Page 7: OpenCL applications in genomics

Same idea as CUDA, different terms

Page 8: OpenCL applications in genomics

Data parallel coding

Page 9: OpenCL applications in genomics

OpenCL Concepts

Page 10: OpenCL applications in genomics

An outline

OpenCL Introduction

Copy number inference in tumors

Data considerations

Hidden Relatedness

Variable Selection

Page 11: OpenCL applications in genomics

Biology background

- DNA
  - A string with a four letter alphabet: A, C, G, T
  - Humans have two copies: one from mom, one from dad
  - Most of the sequence between two strands is the same, except for a small proportion
- Example sequence: ATATTGC. We could have:
  - A single nucleotide polymorphism (common point mutation): ATATAGC
  - Copy number variants/aberrations (deletions, amplifications, translocations):
    - AT–GC
    - ATATTATTATTGC

Page 12: OpenCL applications in genomics

SNP microarrays

Page 13: OpenCL applications in genomics

What is observed

- Microarray output
  - Probes are dyed, and microarrays scanned with CCD cameras
  - X, Y: Intensities of A and B alleles (two possible variants)
  - R = X+Y: Overall intensity
  - LRR (log2 R ratio): Intensity relative to a standard intensity
  - BAF (B allele frequency): Ratio of allelic intensity between A and B

Page 14: OpenCL applications in genomics

Inferring CNVs from microarray output

Page 15: OpenCL applications in genomics

Hidden Markov Model

- A formalized statistical model
  - We want to use information from observables (LRR, BAF) to infer true state of nature (copy number, genotype)

Table: Example hidden states from PennCNV software

State  CN  Possible genotypes
1      0   Null
2      1   A, B
3      2   AA, AB, BB
4      2   AA, BB
5      3   AAA, AAB, ABB, BBB
6      4   AAAA, AAAB, AABB, ABBB, BBBB

Page 16: OpenCL applications in genomics

Copy number inference in tumors

- Inference is harder!
  - 1. When dissecting breast tissue for example, stromal (normal cell) contamination is almost inevitable. Hence you are modeling a mixture of two or more cell populations
    - Suppose you have a state assuming normal CN=2, tumor CN=4, α = .2
    - e.g. $r_i = \alpha r_{i,n} + (1 - \alpha) r_{i,t}$
    - expected mean intensity: .2(1) + .8(1.68) = 1.544
  - 2. Amplification events can be wilder than germline (e.g. blood) events, leading to greater copy number/genotype possibilities
  - Combine issues 1) and 2) and you can get a huge search space

Page 17: OpenCL applications in genomics

Expanded state space

ID  BACn  CNt  BACt  α    r̄     b̄
0   0     2    0     0.3  1     0
1   0     2    0     0.6  1     0
2   1     2    1     0.3  1     0.5
3   1     2    1     0.6  1     0.5
4   2     2    2     0.3  1     1
5   2     2    2     0.6  1     1
6   0     1    0     0.3  0.65  0
7   0     1    0     0.6  0.8   0
8   0     1    1     0.3  0.65  0.538462
9   0     1    1     0.6  0.8   0.25
10  1     1    0     0.3  0.65  0.230769
11  1     1    0     0.6  0.8   0.375
12  1     1    1     0.3  0.65  0.769231
13  1     1    1     0.6  0.8   0.625
14  2     1    0     0.3  0.65  0.461538
15  2     1    0     0.6  0.8   0.75
16  2     1    1     0.3  0.65  1
17  2     1    1     0.6  0.8   1
18  0     3    0     0.3  1.35  0
19  0     3    0     0.6  1.2   0
20  1     3    1     0.3  1.35  0.37037
21  1     3    1     0.6  1.2   0.416667
22  1     3    2     0.3  1.35  0.62963
23  1     3    2     0.6  1.2   0.583333
24  2     3    3     0.3  1.35  1
25  2     3    3     0.6  1.2   1
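The r̄ and b̄ columns of the table above can be reproduced from the other columns under a simple linear mixture model. This is inferred from the tabulated values rather than stated on the slide: it assumes α is the normal-cell fraction, normal cells carry two copies, and intensity scales linearly with copy number. The function and argument names are illustrative only.

```python
def expected_signals(bac_n, cn_t, bac_t, alpha, cn_n=2):
    """Expected intensity ratio (r_bar) and B allele frequency (b_bar)
    for a mixture of normal cells (fraction alpha) and tumor cells.

    bac_n, bac_t: B allele counts in normal and tumor cells
    cn_t, cn_n:   total copy number in tumor and normal cells
    """
    # Intensity relative to a diploid reference, averaged over the mixture.
    r_bar = alpha * (cn_n / 2) + (1 - alpha) * (cn_t / 2)
    # B allele frequency = B copies / all copies in the mixed sample.
    b_copies = alpha * bac_n + (1 - alpha) * bac_t
    all_copies = alpha * cn_n + (1 - alpha) * cn_t
    return r_bar, b_copies / all_copies


# Row 8 of the table: BACn=0, CNt=1, BACt=1, alpha=0.3
r_bar, b_bar = expected_signals(0, 1, 1, 0.3)
print(round(r_bar, 2), round(b_bar, 6))  # 0.65 0.538462
```

Note that the worked example on the previous slide uses 1.68 rather than 2 as the intensity of a four-copy tumor state, so the linear scaling here should be read as a property of this table only.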

Page 18: OpenCL applications in genomics
Page 19: OpenCL applications in genomics

Algorithm

- Initialize
  - Empirically estimate σ of BAF and LRR
  - Compute emission matrix O for each state/obs from a Gaussian pdf
- Train: Expectation Maximization
  - Forward backward: computes posterior probs and overall likelihood
  - Baum Welch: Compute MLE of transition probabilities in matrix T
- Traverse state path
  - Viterbi (dynamic programming): walk the state path based on max-product
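The final step of the algorithm, the Viterbi traversal, can be sketched serially as follows. This is a minimal illustration in plain Python, not the GPU code from the talk; it works in log space so that max-product becomes max-sum, and the function name and argument layout are assumptions made for the example.

```python
import math


def viterbi(log_init, log_T, log_O):
    """Most probable state path (max-product, evaluated in log space).

    log_init[s] : log prior probability of state s
    log_T[r][s] : log transition probability from state r to state s
    log_O[t][s] : log emission probability of observation t under state s
    """
    m = len(log_init)
    score = [log_init[s] + log_O[0][s] for s in range(m)]
    back = []  # back[t-1][s] = best predecessor of state s at time t
    for t in range(1, len(log_O)):
        prev, score, ptr = score, [], []
        for s in range(m):
            best_r = max(range(m), key=lambda r: prev[r] + log_T[r][s])
            ptr.append(best_r)
            score.append(prev[best_r] + log_T[best_r][s] + log_O[t][s])
        back.append(ptr)
    # Walk the back-pointers from the best final state.
    state = max(range(m), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Toy two-state example with made-up probabilities.
T = [[0.9, 0.1], [0.1, 0.9]]
O = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
path = viterbi(
    [math.log(0.6), math.log(0.4)],
    [[math.log(p) for p in row] for row in T],
    [[math.log(p) for p in row] for row in O],
)
print(path)
```

The inner maximization over predecessor states is the part that a GPU version would replace with a parallel max reduction.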

Page 20: OpenCL applications in genomics

- Parallel Forward Algorithm
  - We compute the probability vector at observation t: $f_{0:t} = f_{0:t-1} T O_t$
  - Each state (element of the m-state vector) can independently compute a sum-product
  - Threadblocks map to states
  - Threads calculate products in parallel, followed by a log2(m) addition reduction
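The mapping described above can be mimicked serially. In this sketch the outer loop stands in for threadblocks (one per destination state), the list comprehension stands in for the threads computing products, and `tree_sum` performs the pairwise addition in about log2(m) rounds. It uses plain probabilities rather than log space, and all names are illustrative assumptions.

```python
def tree_sum(values):
    """Pairwise (tree) reduction: about log2(len(values)) rounds of additions."""
    vals = list(values)
    while len(vals) > 1:
        if len(vals) % 2:          # pad odd-length rounds with a neutral element
            vals.append(0.0)
        vals = [vals[k] + vals[k + 1] for k in range(0, len(vals), 2)]
    return vals[0]


def forward_step(f_prev, T, o_t):
    """One forward update: f_t[j] = (sum_i f_prev[i] * T[i][j]) * o_t[j]."""
    m = len(f_prev)
    f_new = []
    for j in range(m):                                       # "threadblock" j
        products = [f_prev[i] * T[i][j] for i in range(m)]  # one "thread" per i
        f_new.append(tree_sum(products) * o_t[j])
    return f_new
```

On a device each round of `tree_sum` would be separated by a barrier so that all partial sums are visible before the next round starts.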

Page 21: OpenCL applications in genomics

Technical issue: Underflow

- Tiny probabilities often have to be represented in log space (even for FP64)
- How do we deal with adding log probabilities?
  - We usually exponentiate, add, then log
- Remedy
  - Add an offset to log before exponentiating
  - Subtract the offset from the log space answer
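A minimal sketch of the remedy, assuming the offset is chosen as the negative of the largest log value (a common choice, not one stated on the slide). The function name is illustrative.

```python
import math


def log_sum_exp(log_values):
    """Return log(sum(exp(v))) without underflow for very negative inputs.

    Assumes at least one finite entry in log_values.
    """
    offset = -max(log_values)                  # shifts the largest term to 0
    total = sum(math.exp(v + offset) for v in log_values)
    return math.log(total) - offset            # undo the shift in log space


# exp(-1000) underflows to 0.0 in FP64, so the naive route would take log(0).
result = log_sum_exp([-1000.0, -1000.0])
print(result)
```

With this choice of offset the largest term exponentiates to exactly 1, so the sum can never be zero. On the GPU this takes two reductions: a max reduction to find the offset, then the sum reduction.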

Page 22: OpenCL applications in genomics

Gridblocks: Forward Backward Calculation

Page 23: OpenCL applications in genomics

Code: Computing products in parallel

Page 24: OpenCL applications in genomics

Code: 2 Reductions: computing offset, sum-product

Page 25: OpenCL applications in genomics

Algorithm Improvements

- Examples:
  - Re-scaling transition matrix (accounting for SNP spacing)
    - Serial: $O(2nm^2)$; Parallel: $O(n)$
  - Forward backward
    - Serial: $O(2nm^2)$; Parallel: $O(n \log_2 m)$
  - Viterbi
    - Serial: $O(nm^2)$; Parallel: $O(n \log_2 m)$
  - Normalizing constant (Baum-Welch)
    - Serial: $O(nm)$; Parallel: $O(\log_2 n)$
  - MLE of transition matrix (Baum-Welch)
    - Serial: $O(nm^2)$; Parallel: $O(n)$

Page 26: OpenCL applications in genomics

Performance

Table: One EM iteration on Chr 1 (41,263 SNPs)

States  CPU     GPU     Fold-speedup
128     9.5m    37s     15x
512     2h 35m  1m 44s  108x

Page 27: OpenCL applications in genomics

An outline

OpenCL Introduction

Copy number inference in tumors

Data considerations

Hidden Relatedness

Variable Selection

Page 28: OpenCL applications in genomics

Storing data

- Global memory
  - Relatively abundant, but slow
  - However, even 4GB may be insufficient for modern datasets
- Genotype data
  - Highly compressible
  - We only care if a position differs from the canonical sequence
  - Thus: AA, AB, BB, NULL are 4 possible genotypes
  - Should be able to encode this into two bits, so 4 genotypes per byte
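The two-bit encoding can be illustrated in a few lines. The particular code assignment (AA=0, AB=1, BB=2, NULL=3) and the low-bits-first slot order are assumptions made for the example; the slides do not specify them.

```python
# Assumed 2-bit codes; any one-to-one assignment would work.
GENOTYPE_CODES = {"AA": 0, "AB": 1, "BB": 2, "NULL": 3}
CODE_TO_GENOTYPE = {code: g for g, code in GENOTYPE_CODES.items()}


def pack4(genotypes):
    """Pack four genotype labels into one byte, slot 0 in the lowest two bits."""
    byte = 0
    for slot, g in enumerate(genotypes):
        byte |= GENOTYPE_CODES[g] << (2 * slot)
    return byte


def unpack4(byte):
    """Recover the four genotype labels: shift right, then AND with 3."""
    return [CODE_TO_GENOTYPE[(byte >> (2 * slot)) & 3] for slot in range(4)]


print(pack4(["AA", "AB", "BB", "NULL"]))  # 228
```

Compared with one 32-bit float per genotype, this uses one sixteenth of the memory.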

Page 29: OpenCL applications in genomics

Possible approaches

- Store as a float array
  - +: Easy to implement
  - -: Uses 16 times as much memory as needed!
- Store as an int array
  - Allocate a local memory array of 256 rows, 4 cols for mapping all possible genotype 4-tuples
  - +: Uses global memory efficiently, maximizes bandwidth
  - -: You might not even have enough local memory for the lookup table, let alone for real work
- Store as a char array
  - Right bitshift pairs of bits, then AND mask with 3
  - +: Uses global memory efficiently, saves on local memory
  - -: Threads load a minimum of 4 bytes per word, so you use 25% of available bandwidth

Page 30: OpenCL applications in genomics

One solution: custom container

- Idea:
  - Designate each threadblock to handle 512 genotypes
  - First 32 threads: each loads a packedgeno_t element
- For each of the 32 threads:
  - Loop four times, extracting each char
  - Subloop four times, extracting each genotype via bitshift/mask
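The arithmetic behind the container is 32 threads × 4 chars × 4 genotypes = 512 genotypes per threadblock. The sketch below emulates it serially, assuming each packed element is a 32-bit word holding 16 two-bit codes with the first genotype in the lowest bits. That layout and the function names are assumptions for illustration, not the layout used in the talk.

```python
GENOS_PER_BLOCK = 512
WORDS_PER_BLOCK = 32          # one 32-bit word per loading thread
GENOS_PER_WORD = 16           # 4 chars x 4 genotypes


def pack_block(codes):
    """Pack 512 two-bit genotype codes (values 0..3) into 32 words."""
    if len(codes) != GENOS_PER_BLOCK:
        raise ValueError("expected exactly 512 genotype codes")
    words = []
    for w in range(WORDS_PER_BLOCK):
        word = 0
        for k in range(GENOS_PER_WORD):
            word |= (codes[GENOS_PER_WORD * w + k] & 3) << (2 * k)
        words.append(word)
    return words


def unpack_word(word):
    """What one of the 32 threads does after loading its word."""
    genotypes = []
    for c in range(4):                       # loop four times: extract each char
        char = (word >> (8 * c)) & 0xFF
        for g in range(4):                   # subloop: extract each genotype
            genotypes.append((char >> (2 * g)) & 3)
    return genotypes
```

Because each thread reads a full 4-byte word, the load uses the whole memory transaction instead of a quarter of it.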

Page 31: OpenCL applications in genomics

Illustration

Page 32: OpenCL applications in genomics

An outline

OpenCL Introduction

Copy number inference in tumors

Data considerations

Hidden Relatedness

Variable Selection

Page 33: OpenCL applications in genomics

Inferring Relatedness

- Inferring relatedness
  - The human race is one large pedigree
  - Individuals of the same ethnicity are expected to share more SNP alleles
  - We can summarize this relationship through a correlation matrix called 'K'

Page 34: OpenCL applications in genomics

Uses for the ’K’ matrix

- Principal Components Analysis
  - A singular value decomposition on 'K'
  - $K = V D V'$
  - V contains orthogonal axes, facilitating population structure inference
- Estimating heritability
  - In random effects models
  - $Y = \mu + \beta X + \gamma^2 K + \sigma^2 I$
  - $h^2 = \frac{\gamma^2}{\gamma^2 + \sigma^2}$

Page 35: OpenCL applications in genomics

Example: Latino samples in LA

Page 36: OpenCL applications in genomics

Computing K

- Essentially a matrix multiplication
  - $\hat{K}_{jk} = \frac{1}{m} \sum_{i=1}^{m} \frac{(x_{ij} - 2f_i)(x_{ik} - 2f_i)}{4 f_i (1 - f_i)}$
  - Or in other words: K = ZZ′
  - Including more SNPs adds more precise, subtle information
- Parallel code
  - Carrying out matrix multiplication is straightforward on GPU
  - Matrix multiplication is ideal for GPU: Approx. 240x speedup.
  - Because K is summed over SNPs, we can split genotype matrix by subsets of SNPs and run each K slice in parallel
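The slicing idea can be checked on a toy example: each slice of SNPs yields a partial sum, and the partial sums add up to the same K as a single pass. The sketch below is serial plain Python with illustrative names, and it assumes each allele frequency f_i lies strictly between 0 and 1.

```python
def kinship_partial(snp_rows, freqs, n_subjects):
    """Unnormalised contribution of one slice of SNPs to K.

    snp_rows[i][j] is the allele count (0, 1 or 2) of subject j at SNP i.
    """
    K = [[0.0] * n_subjects for _ in range(n_subjects)]
    for x, f in zip(snp_rows, freqs):
        scale = 4.0 * f * (1.0 - f)
        z = [x_j - 2.0 * f for x_j in x]       # centred genotypes
        for j in range(n_subjects):
            for k in range(n_subjects):
                K[j][k] += z[j] * z[k] / scale
    return K


def kinship(snp_rows, freqs, n_slices=1):
    """Sum the per-slice partial matrices, then divide by the SNP count m."""
    m, n = len(snp_rows), len(snp_rows[0])
    chunk = -(-m // n_slices)                  # ceiling division
    K = [[0.0] * n for _ in range(n)]
    for start in range(0, m, chunk):           # each slice could run on its own device
        part = kinship_partial(snp_rows[start:start + chunk],
                               freqs[start:start + chunk], n)
        for j in range(n):
            for k in range(n):
                K[j][k] += part[j][k]
    return [[v / m for v in row] for row in K]
```

Only the n × n partial matrices need to be combined at the end, so the amount of data exchanged does not grow with the number of SNPs.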

Page 37: OpenCL applications in genomics

An outline

OpenCL Introduction

Copy number inference in tumors

Data considerations

Hidden Relatedness

Variable Selection

Page 38: OpenCL applications in genomics

Variable Selection

- One goal in biomedical research is correlating DNA variation to disease phenotypes
- Genomics technology
  - The number of subjects n remains about the same (cost of recruiting, sample preps, etc), while number of features p is exploding
  - Rate that data is being generated per dollar surpasses Moore's Law

Page 39: OpenCL applications in genomics
Page 40: OpenCL applications in genomics

Regression

- Standard logistic regression
  - The usual method for hypothesis testing of candidate predictors
  - $\log\left(\frac{p}{1-p}\right) = \beta X$, p being the probability of affection
  - We apply Newton-Raphson scoring until $f(\beta)$ is maximized.
  - Logistic regression simply fails when p > n
- L1 penalized regression, aka LASSO
  - Idea: Fit the logistic regression model, but subject to a penalty parameter λ
  - $g(\beta) = f(\beta) - \lambda \sum_{j=1}^{p} |\beta_j|$

Page 41: OpenCL applications in genomics

Algorithms for fitting the LASSO

- One dimensional Newton Raphson at variable j:
  - Cyclic Coordinate Descent
  - $\Delta\beta_j = \beta_j^{(new)} - \beta_j = -\frac{g'(\beta_j)}{g''(\beta_j)}$
  - $g'(\beta_j) = \sum_{i=1}^{n} x_{i,j}\, y_i \frac{1}{1 + \exp(x_{i,j}\beta_j y_i)} - \mathrm{sgn}(\beta_j)\lambda$
  - $g''(\beta_j) = \sum_{i=1}^{n} x_{i,j}^2 \frac{\exp(x_{i,j}\beta_j y_i)}{(1 + \exp(x_{i,j}\beta_j y_i))^2}$
- We cycle through each j until likelihood stops increasing within some tolerance
- Performs great, but only allows parallelization across samples

ref: Genkin, Lewis, Madigan: Am Stat Assoc 2007 Vol 49, No. 3

Page 42: OpenCL applications in genomics

Distributed GPU implementation

- If possible to parallelize across variables, it is worth splitting up design matrix
- For really large dimensions, we can link up an arbitrary number of GPUs
- Message Passing Interface allows us to be agnostic to physical location of GPU devices

Page 43: OpenCL applications in genomics

Distributed GPU implementation

- Approach:
  - MPI master node delegates heavy lifting to slaves across network
  - Master node performs fast serial code, such as sampling new λ, comparing logLs, broadcasting gradients, etc.
  - Network traffic is kept to a minimum
  - Implemented for Greedy Coordinate Descent and Gradient Descent
  - Developed on server at USC Epigenome Center: 2 Tesla C2050s

Page 44: OpenCL applications in genomics
Page 45: OpenCL applications in genomics

Parallel algorithms for fitting the LASSO

- Greedy coordinate descent (ref)
  - Same algorithm as CCD, except that for each variable sweep, update only the j that gives the greatest increase in logL
  - No dependencies between subjects and variables, massive parallelization across subjects AND variables
  - Ideal if you have a huge dataset, and you want a stringent type 1 error rate (only care about a few variables)
  - Ayers and Cordell, Gen Epi 2010: Permute, and pick largest λ that allows first "false" variable to enter

ref: Wu, Lange: Annals Appl Stat 2008 Vol 2, No. 1

Page 46: OpenCL applications in genomics

Layout for greedy coordinate descent implementation

Page 47: OpenCL applications in genomics

Overview of Greedy CD algorithm

- Newton-Raphson kernel
  - Each threadblock maps to a block of 512 subjects (threads) for 1 variable
  - Each thread calculates subject's contribution to gradient and hessian
  - Sum (reduction) across 512 subjects
  - Sum (reduction) across subject blocks in new kernel
- Compute log-likelihood change for each variable (like above).
- Apply a max operator (log2 reduction) to select variable with greatest contribution to likelihood.
- Iterate repeatedly until likelihood increase is less than epsilon
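The steps above can be emulated serially to show the data flow: per-block partial sums of gradient and Hessian, a second sum across blocks, a trial Newton step per variable, and a max over variables. This is a sketch under assumptions, not the talk's kernels: labels are coded as ±1, the Hessian is accumulated as a positive magnitude so the step is gradient divided by Hessian, and the handling of β_j = 0 and of sign changes is one common way to deal with the L1 term. All function names are hypothetical.

```python
import math

BLOCK = 512  # subjects per threadblock, as on the slide


def block_partials(col, y, eta, start):
    """One 'threadblock': sum gradient/Hessian terms over up to BLOCK subjects."""
    grad = hess = 0.0
    for i in range(start, min(start + BLOCK, len(y))):
        s = 1.0 / (1.0 + math.exp(y[i] * eta[i]))
        grad += col[i] * y[i] * s
        hess += col[i] ** 2 * s * (1.0 - s)
    return grad, hess


def penalized_loglik(X, y, beta, lam):
    ll = 0.0
    for x_i, y_i in zip(X, y):
        margin = y_i * sum(b * x for b, x in zip(beta, x_i))
        ll -= math.log1p(math.exp(-margin))
    return ll - lam * sum(abs(b) for b in beta)


def newton_step(beta_j, grad, hess, lam):
    """One-dimensional penalized Newton step (assumed treatment of the L1 kink)."""
    if hess <= 0.0:
        return 0.0
    if beta_j == 0.0:
        if grad > lam:
            return (grad - lam) / hess
        if grad < -lam:
            return (grad + lam) / hess
        return 0.0
    step = (grad - math.copysign(lam, beta_j)) / hess
    return -beta_j if (beta_j + step) * beta_j < 0.0 else step


def greedy_iteration(X, y, beta, lam):
    """Update only the variable whose step raises the objective most."""
    n = len(y)
    eta = [sum(b * x for b, x in zip(beta, x_i)) for x_i in X]
    base = penalized_loglik(X, y, beta, lam)
    best_j, best_gain, best_value = None, 0.0, 0.0
    for j in range(len(beta)):                      # parallel across variables on GPU
        col = [x_i[j] for x_i in X]
        parts = [block_partials(col, y, eta, s) for s in range(0, n, BLOCK)]
        grad = sum(g for g, _ in parts)             # reduction across subject blocks
        hess = sum(h for _, h in parts)
        trial = list(beta)
        trial[j] += newton_step(beta[j], grad, hess, lam)
        gain = penalized_loglik(X, y, trial, lam) - base
        if gain > best_gain:                        # max operator over variables
            best_j, best_gain, best_value = j, gain, trial[j]
    if best_j is not None:
        beta[best_j] = best_value
    return best_j, best_gain
```

A driver would call `greedy_iteration` repeatedly and stop once the returned gain falls below epsilon.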

Page 48: OpenCL applications in genomics

Evaluation on large dataset

- GWAS data
  - 6,806 subjects in a case control study of prostate cancer
  - 1,047,986 SNPs typed
- Invoke approx. 7 billion threads per iteration
- Total walltime for 1 GCD iteration (sweep across all variables)
  - 15 minutes on optimized serial implementation split across 2 slave CPUs
  - 5.8 seconds on parallel implementation across 2 nVidia Tesla C2050 GPU devices
  - 155x speed up

Page 49: OpenCL applications in genomics

Parallel algorithms for fitting the LASSO

- (Stochastic Mirror) Gradient Descent (ref)
  - Sometimes, we are interested in tuning λ for, say, the best cross validation errors
  - Greedy descent seems awfully wasteful in that only one $\beta_j$ is updated
  - However, we can update all variables in parallel, cycling through subjects
- Algorithm
  - Extremely simple:
  - For subject i: gradient $g_i = \frac{-y_i}{1 + \exp(x_i \beta y_i)}$
  - Update his $\beta_i$ vector, where $\beta_{i,j} = \beta_{i,j} - \eta g_i x_{i,j}$
  - η is a learning parameter, set sufficiently small (e.g. .0001)

ref: Shwartz, Tewari: Proc. 26th Intern. Conf Machine Learning 2009
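The per-subject update shown above can be sketched as follows. It implements only the two formulas on the slide, with labels assumed to be coded as ±1; the L1 shrinkage step of the full stochastic mirror descent method is not shown on the slide and is left out here. The function names are illustrative.

```python
import math


def sgd_epoch(X, y, beta, eta=1e-4):
    """One pass over subjects; every beta[j] update for a subject is independent."""
    for x_i, y_i in zip(X, y):
        margin = y_i * sum(b * x for b, x in zip(beta, x_i))
        g_i = -y_i / (1.0 + math.exp(margin))       # gradient factor for subject i
        for j, x_ij in enumerate(x_i):              # one GPU thread per variable j
            beta[j] -= eta * g_i * x_ij
    return beta


def logistic_loss(X, y, beta):
    return sum(math.log1p(math.exp(-y_i * sum(b * x for b, x in zip(beta, x_i))))
               for x_i, y_i in zip(X, y))


# Toy data with an intercept column; a larger eta than on the slide is used
# so that the effect is visible after a few hundred passes.
X = [[1.0, 2.0], [1.0, -1.0], [1.0, 1.5], [1.0, -2.0]]
y = [1, -1, 1, -1]
beta = [0.0, 0.0]
start_loss = logistic_loss(X, y, beta)
for _ in range(200):
    sgd_epoch(X, y, beta, eta=0.1)
end_loss = logistic_loss(X, y, beta)
```

Because the inner loop over j touches each coefficient independently, the work per subject scales out across as many threads as there are variables, which is why the method needs very many SNPs to keep the device busy.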

Page 50: OpenCL applications in genomics

Gradient descent

- Performance
  - Slow convergence compared to serial cyclic coordinate descent, but far more scalable
  - For large lambdas, slower than greedy coordinate descent
  - Computation:bandwidth ratio not great
  - For 1 million SNPs, only about 15x speedup. Far more SNPs are needed
- Technical issues
  - Must store genotypes in subject major order to enable coalesced memory loads/stores
  - Makes SNP level summaries like means and SDs difficult to compute.
  - Heterogeneous data types: floats: (E, ExG), compressed chars: (G, GxG)
  - Memory constrained: can perform interactions on the fly with SNP major

Page 51: OpenCL applications in genomics

Potential for robust variable selection:

- Subsampling:
  - Applying LASSO once overfits data. Model selection inconsistent
  - Subsampling is preferable: Bootstrapping, stability selection, x-fold cross validation
  - Number of replicates << number of samples << number of features
- Bayesian variable selection:
  - If we assume $\beta_{LASSO}$ conditionally independent
  - Master node can (quickly) sample hyperparameters (e.g. λ) from a prior distribution

