
Instrumenting Genomic Sequence Analysis Pipeline Mothur on Shared Memory Architecture

Junqi Yin, Bhanu Rekepalli, Pragneshkumar Patel, Chanda Drennen, and Annette Engel
XSEDE 14, Atlanta, GA, July 15, 2014
Transcript
Page 1: Instrumenting Genomic  Sequence Analysis Pipeline Mothur on  Shared Memory  Architecture

Instrumenting Genomic Sequence Analysis Pipeline Mothur on Shared Memory Architecture

Junqi Yin, Bhanu Rekepalli, Pragneshkumar Patel, Chanda Drennen, and Annette Engel

XSEDE 14, Atlanta, GA, July 15, 2014

Page 2: Instrumenting Genomic  Sequence Analysis Pipeline Mothur on  Shared Memory  Architecture

Outline

· Introduction
– Motivation: ECSS
– Bioinformatics tool: Mothur
– SGI UV1000: Nautilus
· Porting OTU analysis pipeline
– Pre-clustering denoise
– Distance matrix calculation
– Sequence clustering
· Performance results on Nautilus
· Summary

Porting Mothur to Nautilus

Page 3: Instrumenting Genomic  Sequence Analysis Pipeline Mothur on  Shared Memory  Architecture

ECSS community code project: the effect of the Macondo oil spill on coastal ecosystems

· The ultimate goal is to improve society's ability to understand how to respond to and mitigate the effects of petroleum pollution and related stressors on the marine and coastal ecosystems of the Gulf of Mexico.
· The challenge is analyzing rapidly growing pyrosequencing data (millions of sequences), which is beyond the capability of a typical workstation.
· The solution is to develop a downstream analysis pipeline that can run on HPC systems.


Page 4: Instrumenting Genomic  Sequence Analysis Pipeline Mothur on  Shared Memory  Architecture


Mothur

· Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41 Cited by 2453

· Mothur is an expandable C++ code that re-implements a large number of popular algorithms within the community into a single, standalone executable for different platforms.

· However, it is not HPC ready.


Page 5: Instrumenting Genomic  Sequence Analysis Pipeline Mothur on  Shared Memory  Architecture

Mothur

· One important goal is to categorize sequences
· 3 ways to bin sequences in Mothur
– Operational Taxonomic Units (OTUs): sensitive to errors, but independent of any previous knowledge
– Taxonomic: bins sequences based on what they're named
– Phylogenetic: builds trees and uses the branching structure to bin sequences
· For more information
– Wiki: http://www.mothur.org/wiki/Main_Page
– User forum: http://www.mothur.org/forum


Page 6: Instrumenting Genomic  Sequence Analysis Pipeline Mothur on  Shared Memory  Architecture

Nautilus

· Single system image: 1024 cores
· Intel Nehalem EX processors
· 4 TB of global shared memory
· 8 NVIDIA Tesla GPUs
· NUMA


Page 7: Instrumenting Genomic  Sequence Analysis Pipeline Mothur on  Shared Memory  Architecture

Nautilus

· A single-node system with large global shared memory
· Offloads thread synchronization, data sharing, and message passing overhead from the CPUs
· Scalable interconnect with other blades via NUMAlink 5
· For more information, see http://www.nics.utk.edu/computing-resources/nautilus

Page 8: Instrumenting Genomic  Sequence Analysis Pipeline Mothur on  Shared Memory  Architecture

NICS and Nautilus

[Figure: NICS systems; surviving label: Darter, 11,968 physical cores]

Page 9: Instrumenting Genomic  Sequence Analysis Pipeline Mothur on  Shared Memory  Architecture

OTU Analysis Pipeline

· Clustering 16S rRNA sequences into operational taxonomic units (OTUs) is a critical step for the bioinformatic analysis of microbial diversity
– Pre-clustering denoise
– Distance matrix calculation
– Sequence clustering


Page 10: Instrumenting Genomic  Sequence Analysis Pipeline Mothur on  Shared Memory  Architecture

Pre-clustering denoise (pre.cluster)

· Remove sequences that are due to errors: if two sequences are each 1 bp different from a big group, the differences are assumed to be due to sequencing error.
· Time complexity O(N^2); the two loops are not independent, so the OpenMP directive is applied to the inner loop.

Preclustercommand::process

    for (int i = start; i < numSeqs; i++) {
        if (alignSeqs[i].active) { // this sequence has not been merged yet
            // try to merge it with all smaller seqs
            int sum = 0;
            #pragma omp parallel
            {
                string merge = ""; // thread-private list of merged names
                #pragma omp for nowait reduction(+:sum,count)
                for (int j = i+1; j < numSeqs; j++) {
                    if (alignSeqs[j].active) { // this sequence has not been merged yet
                        // are you within "diff" bases
                        int mismatch = calcMisMatches(alignSeqs[i].seq.getAligned(), alignSeqs[j].seq.getAligned());
                        if (mismatch <= diffs) {
                            // merge
                            merge += ',' + alignSeqs[j].names;
                            sum += alignSeqs[j].numIdentical;
                            alignSeqs[j].active = 0;
                            alignSeqs[j].numIdentical = 0;
                            count++;
                        }
                    } // end if j active
                } // end for loop j
                // append each thread's names one thread at a time
                #pragma omp critical
                alignSeqs[i].names += merge;
            }
            alignSeqs[i].numIdentical += sum;
            // remove from active list
            alignSeqs[i].active = 0;
        } // end if active i
    } // end for loop i
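The inner-loop pattern in the excerpt above can be tried in isolation. The following is a minimal, self-contained sketch and not Mothur's code: the `SeqGroup` struct, the `mismatches` helper, and the `preCluster` function are hypothetical names, sequences are assumed to be pre-aligned and sorted by decreasing abundance, and a `merged` flag stands in for the `active` field. Without OpenMP enabled the pragmas are ignored and the loop runs serially with the same result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record: one group of identical reads.
struct SeqGroup {
    std::string name;     // comma-separated read names
    std::string aligned;  // aligned sequence
    int count;            // number of identical reads in the group
    bool merged;          // true once absorbed into a larger group
};

// Count positions at which two aligned sequences differ.
int mismatches(const std::string& a, const std::string& b) {
    const std::size_t n = std::min(a.size(), b.size());
    int diff = 0;
    for (std::size_t k = 0; k < n; ++k) {
        if (a[k] != b[k]) ++diff;
    }
    return diff;
}

// Merge every later group within `diffs` mismatches into group i.
// The outer loop stays serial because iteration i depends on earlier merges.
void preCluster(std::vector<SeqGroup>& seqs, int diffs) {
    assert(diffs >= 0);
    const long n = static_cast<long>(seqs.size());
    for (long i = 0; i < n; ++i) {
        if (seqs[i].merged) continue;
        int sum = 0;
        std::string mergedNames;
        #pragma omp parallel
        {
            std::string local;  // thread-private name list
            #pragma omp for nowait reduction(+:sum)
            for (long j = i + 1; j < n; ++j) {
                if (!seqs[j].merged &&
                    mismatches(seqs[i].aligned, seqs[j].aligned) <= diffs) {
                    local += ',' + seqs[j].name;
                    sum += seqs[j].count;
                    seqs[j].merged = true;  // only this thread touches index j
                    seqs[j].count = 0;
                }
            }
            #pragma omp critical
            mergedNames += local;
        }
        seqs[i].name += mergedNames;
        seqs[i].count += sum;
    }
}
```

With several threads the order of names appended to a group is not deterministic, which is acceptable here because only group membership and counts matter.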

Page 11: Instrumenting Genomic  Sequence Analysis Pipeline Mothur on  Shared Memory  Architecture

Distance matrix calculation (dist.seqs)

· Calculate pairwise distances between sequences ( O(N^2) )
· Mothur uses MPI with an embarrassingly parallel scheme
· A shared MPI-IO pointer is employed and every MPI process writes to a single file in a line-by-line fashion, which causes write contention.
· Solution: one file per process; scales up to the number of Object Storage Targets (OSTs) of the parallel file system
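The slide does not show how rows are divided among processes or how the per-process files are named, so the sketch below makes two labeled assumptions: rows are split so that each process gets roughly the same number of pairs in the lower triangle (row i holds i pairs, so the boundaries follow a square-root rule), and each rank writes to its own file with a hypothetical `<base>.<rank>.dist` name. `RowRange`, `partitionRows`, and `outputName` are illustrative names, and no MPI calls are used so the arithmetic can be checked on its own.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Half-open range of matrix rows [begin, end) assigned to one process.
struct RowRange {
    std::size_t begin;
    std::size_t end;
};

// Split rows 0..numSeqs-1 so that each process computes about the same
// number of (i, j) pairs with j < i. Cumulative work up to row r grows
// like r^2 / 2, so the boundary for process p sits near numSeqs * sqrt(p / P).
std::vector<RowRange> partitionRows(std::size_t numSeqs, int numProcs) {
    assert(numProcs > 0);
    std::vector<RowRange> ranges;
    for (int p = 0; p < numProcs; ++p) {
        const double lo = std::sqrt(static_cast<double>(p) / numProcs);
        const double hi = std::sqrt(static_cast<double>(p + 1) / numProcs);
        const std::size_t begin = static_cast<std::size_t>(lo * numSeqs);
        // The last process always ends exactly at numSeqs.
        const std::size_t end = (p + 1 == numProcs)
            ? numSeqs
            : static_cast<std::size_t>(hi * numSeqs);
        ranges.push_back({begin, end});
    }
    return ranges;
}

// Hypothetical per-process output name, one file per rank.
std::string outputName(const std::string& base, int rank) {
    return base + "." + std::to_string(rank) + ".dist";
}
```

Because each rank owns its file, no shared file pointer is needed; the per-rank files can be concatenated afterwards if a single distance file is required.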

Page 12: Instrumenting Genomic  Sequence Analysis Pipeline Mothur on  Shared Memory  Architecture

Sequence clustering (cluster)

· Unweighted Pair Group Method with Arithmetic mean (UPGMA)
– Search the distance matrix and find the minimum cell ( O(N^2) )
– Treat the found cell as a node and update its distance to other cells ( O(N) )
– Repeat the first two steps N times or until the found minimum distance is larger than a predefined cutoff value
· Time complexity O(N^3); memory complexity O(N^2); sequential implementation in Mothur
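The three UPGMA steps above can be sketched on a small dense matrix. This is a minimal illustration under stated assumptions, not Mothur's implementation: the input is assumed to be a full symmetric matrix, and `Merge` and `upgma` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// One merge step: cluster `absorbed` is folded into cluster `kept`.
struct Merge {
    std::size_t kept;
    std::size_t absorbed;
    double dist;
};

// UPGMA on a dense symmetric distance matrix (taken by value, modified locally).
std::vector<Merge> upgma(std::vector<std::vector<double>> d, double cutoff) {
    const std::size_t n = d.size();
    for (const auto& row : d) assert(row.size() == n);
    std::vector<std::size_t> size(n, 1);   // members per cluster
    std::vector<bool> active(n, true);
    std::vector<Merge> merges;

    for (std::size_t step = 0; step + 1 < n; ++step) {
        // Step 1: find the minimum cell among active clusters, O(N^2).
        double best = std::numeric_limits<double>::max();
        std::size_t bi = 0, bj = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if (!active[i]) continue;
            for (std::size_t j = 0; j < i; ++j) {
                if (active[j] && d[i][j] < best) {
                    best = d[i][j];
                    bi = i;
                    bj = j;
                }
            }
        }
        if (best > cutoff) break;  // stop once the minimum exceeds the cutoff

        // Step 2: size-weighted average distance to every other cluster, O(N).
        for (std::size_t k = 0; k < n; ++k) {
            if (!active[k] || k == bi || k == bj) continue;
            const double avg = (size[bi] * d[bi][k] + size[bj] * d[bj][k]) /
                               static_cast<double>(size[bi] + size[bj]);
            d[bi][k] = avg;
            d[k][bi] = avg;
        }
        size[bi] += size[bj];
        active[bj] = false;
        merges.push_back({bi, bj, best});
    }
    return merges;
}
```

The outer loop runs at most N-1 times around an O(N^2) search, which is where the O(N^3) time bound comes from.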

Page 13: Instrumenting Genomic  Sequence Analysis Pipeline Mothur on  Shared Memory  Architecture

Sequence clustering (cluster)

· To use more than one socket, each thread should work on the part of the matrix allocated in its local memory
· The distance matrix is represented by an STL vector
· The "first touch" policy is enforced for NUMA
· Solution: customize memory allocation by replacing the allocator in std::vector<Type, Allocator<Type> > numa_seqVec

Page 14: Instrumenting Genomic  Sequence Analysis Pipeline Mothur on  Shared Memory  Architecture

Sequence clustering (cluster)

· The most important method in the custom allocator class is allocate()
· Objects with dynamic data are problematic, e.g. the vector::erase method can't be used

    pointer numaAllocator::allocate (size_type num, const void* = 0) {
        size_type len = num * sizeof(T);
        char *ret = (char*)(std::malloc(len));
        if (!omp_in_parallel()) {
            // first touch: each thread zeroes its own static chunk,
            // so those pages are placed on that thread's local memory
            #pragma omp parallel for schedule(static)
            for (size_type i = 0; i < len; i += sizeof(T)) {
                for (size_type j = 0; j < sizeof(T); ++j) {
                    ret[i+j] = 0;
                }
            }
        }
        return (pointer)(ret);
    }
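For readers who want to experiment with the idea, here is a minimal C++11-style allocator that can be plugged into std::vector. It is a sketch under stated assumptions rather than the class from the slides: `FirstTouchAllocator` is a hypothetical name, the `omp_in_parallel()` check is omitted so the code does not depend on omp.h, and page placement only follows first touch on systems whose NUMA policy works that way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Minimal allocator that zeroes new memory in a statically scheduled
// parallel loop, so each thread first-touches a contiguous chunk.
template <typename T>
struct FirstTouchAllocator {
    using value_type = T;

    FirstTouchAllocator() = default;
    template <typename U>
    FirstTouchAllocator(const FirstTouchAllocator<U>&) noexcept {}

    T* allocate(std::size_t num) {
        assert(num <= static_cast<std::size_t>(-1) / sizeof(T));  // no overflow
        const std::size_t len = num * sizeof(T);
        char* raw = static_cast<char*>(std::malloc(len));
        if (raw == nullptr && len != 0) throw std::bad_alloc();
        const long long bytes = static_cast<long long>(len);
        // Without OpenMP this pragma is ignored and the loop runs serially.
        #pragma omp parallel for schedule(static)
        for (long long i = 0; i < bytes; ++i) {
            raw[i] = 0;
        }
        return reinterpret_cast<T*>(raw);
    }

    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// All instances are interchangeable: memory from one can be freed by another.
template <typename T, typename U>
bool operator==(const FirstTouchAllocator<T>&, const FirstTouchAllocator<U>&) {
    return true;
}
template <typename T, typename U>
bool operator!=(const FirstTouchAllocator<T>&, const FirstTouchAllocator<U>&) {
    return false;
}
```

A vector declared as `std::vector<float, FirstTouchAllocator<float>>` then behaves like an ordinary vector. The placement only pays off if later loops over the data use the same static schedule, so that each thread reads the pages it touched.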

Page 15: Instrumenting Genomic  Sequence Analysis Pipeline Mothur on  Shared Memory  Architecture

Sequence clustering (cluster)

· The hot spot (over 90%) in cluster is the SparseDistanceMatrix::getSmallestCell method
· Same static scheduling
· Set OMP_PROC_BIND or use dplace

    ull SparseDistanceMatrix::getSmallestCell(ull& row){
        try {
            if (!sorted) { sortSeqVec(); sorted = true; }
            smallDist = 1e6;
            ull col;
            #pragma omp parallel
            {
                float dist_p = 1e6;
                ull row_p, col_p;
                #pragma omp for schedule(static) nowait
                for (int i = 0; i < numa_seqVec.size(); i++){
                    for (int j = 0; j < numa_seqVec[i].size(); j++) {
                        //already checked everyone else in row
                        int idx = numa_seqVec[i][j].index;
                        if(idx == INT_MAX) continue;
                        if (i < idx) {
                            float dist = numa_seqVec[i][j].dist;
                            if(dist < dist_p){ //found a new smallest distance
                                dist_p = dist;
                                row_p = i;
                                col_p = idx;
                            }
                        }else { j+=numa_seqVec[i].size(); } //stop looking
                    }
                }
                #pragma omp critical
                {
                    if(dist_p < smallDist){
                        smallDist = dist_p;
                        row = row_p;
                        col = col_p;
                    }
                }
            }
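The reduction pattern in this excerpt (a thread-private minimum, then a critical section to combine) can be shown as a complete function. The sketch below is an illustration only: `Cell`, `Smallest`, and `smallestCell` are hypothetical names, the sparse rows are plain nested vectors, and the early exit on sorted rows is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One stored entry of a sparse row: column index and distance.
struct Cell {
    std::size_t index;
    float dist;
};

// Result of the search; `found` is false when no upper-triangle cell exists.
struct Smallest {
    std::size_t row;
    std::size_t col;
    float dist;
    bool found;
};

Smallest smallestCell(const std::vector<std::vector<Cell>>& rows) {
    Smallest best{0, 0, 1e6f, false};
    const long n = static_cast<long>(rows.size());
    #pragma omp parallel
    {
        Smallest local{0, 0, 1e6f, false};  // thread-private running minimum
        #pragma omp for schedule(static) nowait
        for (long i = 0; i < n; ++i) {
            const std::size_t r = static_cast<std::size_t>(i);
            for (const Cell& c : rows[r]) {
                assert(c.dist >= 0.0f);  // distances are assumed non-negative
                // Only look at the upper triangle so each pair is seen once.
                if (r < c.index && c.dist < local.dist) {
                    local = {r, c.index, c.dist, true};
                }
            }
        }
        // Combine per-thread minima; one short critical section per thread.
        #pragma omp critical
        {
            if (local.found && local.dist < best.dist) best = local;
        }
    }
    return best;
}
```

When several cells share the minimum distance, which one is returned can depend on thread timing, so callers should not rely on tie order.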

Page 16: Instrumenting Genomic  Sequence Analysis Pipeline Mothur on  Shared Memory  Architecture

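Performance results on Nautilus

Method       Seqs    Cores   Time (s)
Pre.cluster  50000   1       1185
Pre.cluster  50000   2       668
Pre.cluster  50000   4       392
Pre.cluster  50000   8       243
Read.dist    5000    1       31.8
Read.dist    5000    2       18.8
Read.dist    5000    4       11.3
Read.dist    5000    8       7.7

[Figure: Scaling of distance matrix calculation for 10000 sequences on Nautilus]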
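The timings in the table translate into speedups with simple arithmetic: speedup is the 1-core time divided by the p-core time, and parallel efficiency is that speedup divided by the core ratio. The helper names below (`Timing`, `speedup`, `efficiency`) are hypothetical and only illustrate the calculation; for the 8-core rows it gives roughly 4.9x for Pre.cluster and 4.1x for Read.dist.

```cpp
#include <cassert>

// One measured run: core count and wall-clock time in seconds.
struct Timing {
    int cores;
    double seconds;
};

// Speedup of run `t` relative to the baseline run `base`.
double speedup(const Timing& base, const Timing& t) {
    assert(t.seconds > 0.0);
    return base.seconds / t.seconds;
}

// Parallel efficiency: speedup divided by the increase in core count.
double efficiency(const Timing& base, const Timing& t) {
    assert(t.cores > 0 && base.cores > 0);
    return speedup(base, t) * base.cores / t.cores;
}
```

For example, `speedup({1, 1185.0}, {8, 243.0})` evaluates 1185 / 243, and `efficiency` divides that by 8, giving about 61% for Pre.cluster on 8 cores.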

Page 17: Instrumenting Genomic  Sequence Analysis Pipeline Mothur on  Shared Memory  Architecture

Performance results on Nautilus

[Figure: Ratio of run time on 16 cores to run time on up to 160 cores, for 5000, 10000, and 2000 sequences]

[Figure: Run time and speedups for 5000 sequences on up to 128 cores]

Page 18: Instrumenting Genomic  Sequence Analysis Pipeline Mothur on  Shared Memory  Architecture


Page 19: Instrumenting Genomic  Sequence Analysis Pipeline Mothur on  Shared Memory  Architecture


Summary

· Pre-clustering and matrix loading have seen over 4x speedup on Nautilus

· Distance calculation shows linear scaling up to the number of OSTs

· Sequence clustering shows a 7x speedup when the number of cores is increased by 10x

· Overall, the OTU pipeline is accelerated by orders of magnitude on Nautilus, and the optimizations are generally applicable to other shared memory machines.

