
High performance bioinformatics

Date post: 13-Feb-2016
Page 1: High performance bioinformatics

HIGH PERFORMANCE BIOINFORMATICS

Group May 09-06
Bryan McCoy
Kinit Patel
Tyson Williams

Page 2: High performance bioinformatics

Problem/Need Statement
Current ways to solve Bioinformatics problems are either slow or very expensive.
A computer system is needed that solves Bioinformatics problems at lower cost while still delivering high performance.

Page 3: High performance bioinformatics

What is Bioinformatics?
Genetic sequencing.
Massive amounts of data.
Simple operations but many of them.
Perfect for distributed computing.

Page 4: High performance bioinformatics

Proposed Solution
Use a cluster of PS3s with their embedded Cell processors.

Page 5: High performance bioinformatics

Cell Broadband Engine
Has 1 central PowerPC-based PPE (Power Processor Element).
Has 8 surrounding SPEs (Synergistic Processor Elements).
The PPE and the 8 SPEs are connected via the Element Interconnect Bus (EIB).

Page 6: High performance bioinformatics

Cell Broadband Engine

Page 7: High performance bioinformatics

Functional Requirements
FR1. Ported applications shall run on the Cell B.E.
FR2. The results returned shall be the same as the original program.
FR3. The applications shall return their runtime.
FR4. The applications shall execute in parallel on multiple Cell B.E.s.
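FR3 asks each application to report its own runtime. A minimal sketch of how that could be done in C is shown here. It assumes a C11 toolchain that provides `timespec_get`, and the names `elapsed_seconds` and `time_run` are hypothetical helpers, not part of the project's code.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Seconds elapsed between two clock readings. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical wrapper: runs `work` once and returns its wall-clock
 * runtime in seconds. timespec_get reads the calendar clock, so a
 * monotonic clock may be preferable where one is available. */
double time_run(void (*work)(void))
{
    struct timespec t0, t1;

    assert(work != NULL);
    timespec_get(&t0, TIME_UTC);
    work();
    timespec_get(&t1, TIME_UTC);
    return elapsed_seconds(t0, t1);
}
```

A ported application could wrap its top-level search in `time_run` and print the returned value next to its results.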

Page 8: High performance bioinformatics

Non-Functional Requirements
NF1. The Cells shall all run on the Linux OS.
NF2. The runtimes of the ported applications shall be shorter than those of the original applications.
NF3. The ported application shall be coded in the C language.

Page 9: High performance bioinformatics

Operating Environment
Uses Fedora 9 as the OS, since it is currently supported by the Cell SDK 3.1.
Uses the command line for the user interface.
Uses the IBM XLC compiler and/or the current GCC compiler.

Page 10: High performance bioinformatics

Market Survey
Results of the survey point to a large speedup for computationally intensive programs.
Dr. Gaurav Khanna at the University of Massachusetts Dartmouth used a cluster of 8 PS3s to replace a supercomputer.
Universitat Pompeu Fabra, in Barcelona, deployed a BOINC system called PS3GRID in 2007 for collaborative biological computing.

Page 11: High performance bioinformatics

Deliverables
The Source Code.
Compiled Executable.
Runtime Comparisons.
Project Final Report.
Project Poster.
Project Final Presentation.

Page 12: High performance bioinformatics

Work Breakdown Structure

Port Apps to Cluster PS3s
  Problem Definition
    Research Cell/B.E.
    Research Bioperf Suite
    Research Distributed Parallel Algorithms
    Research Previously Done Work
  End Product Design
    Design Requirements
    Design Process
    Design Documents
  Considerations and Selections
    Decide Which Linux to Install
    Decide Which Applications to Port
  End Product Implementation
    Hardware Implementation
    Prototyping Implementation
    Software Implementation
  End Product Testing
    Ensure Correctness of Output Results
    Benchmarking
  Final Documentation and Demonstration
    Create Final Report
    Create Project Poster
    Prepare for Presentation

Page 13: High performance bioinformatics

Costs
Time
  Approximately 555 man hours total.
  Freely donated.
  Total cost $0.
Equipment
  3 PS3s
  Crossbar router
  Provided for us by client.
  Total cost $0.

Page 14: High performance bioinformatics

Resource Requirements
3 PlayStation 3s.
High performance network switch.
Books on distributed computing on Cell.
Time.

Page 15: High performance bioinformatics

Work Schedule
Gantt chart

Page 16: High performance bioinformatics

Risk Assessment
Slow network speed.
Software support.
Limited RAM.
Hardware failure.
Lower quality entertainment hardware.
Limited prior experience.
Software development schedule.

Page 17: High performance bioinformatics

Design
Further divide the application into multiple threads for SPE execution on multiple PS3s, alter the functional logic, and vectorize the code where possible.
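Dividing the application across SPEs on several PS3s requires some rule for assigning work items to workers. The sketch below shows one simple option, an even block partition. The `struct range` type and `worker_range` function are hypothetical names for illustration, and the figure of 18 workers assumes 3 PS3s with 6 usable SPEs each, matching the largest SPE count in the results table.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [begin, end) of work items assigned to one worker. */
struct range {
    size_t begin;
    size_t end;
};

/* Splits n_items as evenly as possible among n_workers. The first
 * (n_items % n_workers) workers each receive one extra item. */
struct range worker_range(size_t n_items, size_t n_workers, size_t worker)
{
    struct range r;
    size_t base, extra;

    assert(n_workers > 0 && worker < n_workers);
    base = n_items / n_workers;
    extra = n_items % n_workers;
    r.begin = worker * base + (worker < extra ? worker : extra);
    r.end = r.begin + base + (worker < extra ? 1 : 0);
    return r;
}
```

With 20 work items and 18 workers, workers 0 and 1 receive two items each and the remaining 16 workers receive one. A static split like this ignores how uneven the cost of each item may be, so a real port might prefer dynamic scheduling.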

Page 18: High performance bioinformatics

Software Decomposition Diagram

Page 19: High performance bioinformatics

System Requirements
SR1. The system shall allow the user to input multiple DNA sequences in FASTA format through a file interface.
SR2. The system shall output all of the most parsimonious trees implied by the input data to the screen.
SR3. The system shall share computational work among the PPE and SPEs available to each client/server process.
SR4. The front-end shall share computational work with available back-end processes.
SR5. The front-end shall be able to connect to at least 2 back-end processes via a high performance router.
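SR1 calls for reading multiple DNA sequences from a FASTA file. The following is a minimal sketch of such a reader, not the project's actual input code. `read_fasta`, `MAX_SEQS`, and `MAX_LEN` are hypothetical names and limits, and the sketch assumes that every line in the file is shorter than 255 characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_SEQS 16    /* assumed limit for this sketch */
#define MAX_LEN  1024  /* assumed maximum sequence length, incl. '\0' */

/* Reads FASTA records from fp and returns how many sequences were
 * stored. A line starting with '>' begins a new record; any other line
 * is appended to the current record, upper-cased, with whitespace
 * dropped. Bases beyond MAX_LEN - 1 are silently discarded. */
int read_fasta(FILE *fp, char seqs[][MAX_LEN], int max_seqs)
{
    char line[256];
    int count = 0;

    assert(fp != NULL);
    while (fgets(line, sizeof line, fp) != NULL) {
        if (line[0] == '>') {
            if (count == max_seqs)
                break;
            seqs[count][0] = '\0';
            count++;
        } else if (count > 0) {
            size_t len = strlen(seqs[count - 1]);
            const char *p;

            for (p = line; *p != '\0'; p++) {
                if (!isspace((unsigned char)*p) && len + 1 < MAX_LEN)
                    seqs[count - 1][len++] = (char)toupper((unsigned char)*p);
            }
            seqs[count - 1][len] = '\0';
        }
    }
    return count;
}
```

Header text after `>` is discarded here. A real front-end would presumably keep the sequence names so they can label the leaves of the output trees.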

Page 20: High performance bioinformatics

System Analysis
The key is data flow, which is broken into 3 stages.
1. DNA sequences are distributed to the PPEs and then down to the SPEs.
2. Each SPE searches the possible parsimony trees for the best possible score using a branch and bound heuristic.
3. Finally, the results are aggregated back to the main PPE and output.
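The search stage depends on scoring candidate trees and abandoning a candidate as soon as it cannot beat the best score found so far. The sketch below illustrates that idea with Fitch-style counting, a standard way to compute a parsimony score, on a fixed four-taxon tree ((a,b),(c,d)). It is a simplified illustration under stated assumptions, not DNAPenny's code: `base_mask`, `fitch_join`, and `score_quartet` are hypothetical names, unknown characters are treated as "any base", and the cutoff is applied per site instead of per partial tree.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* One bit per nucleotide; anything else is treated as "any base"
 * (an assumption made for this sketch). */
unsigned base_mask(char b)
{
    switch (toupper((unsigned char)b)) {
    case 'A': return 1u;
    case 'C': return 2u;
    case 'G': return 4u;
    case 'T': return 8u;
    default:  return 15u;
    }
}

/* Fitch step: the parent state set is the intersection of the child
 * sets if it is non-empty; otherwise it is the union and one change
 * is counted. */
unsigned fitch_join(unsigned left, unsigned right, int *cost)
{
    unsigned both = left & right;

    if (both != 0)
        return both;
    (*cost)++;
    return left | right;
}

/* Parsimony score of the tree ((a,b),(c,d)) over n_sites columns.
 * Returns `bound` early once the running score reaches it, since such
 * a tree cannot improve on the best score known so far. */
int score_quartet(const char *a, const char *b, const char *c,
                  const char *d, size_t n_sites, int bound)
{
    int cost = 0;
    size_t i;

    assert(a != NULL && b != NULL && c != NULL && d != NULL);
    for (i = 0; i < n_sites; i++) {
        unsigned ab = fitch_join(base_mask(a[i]), base_mask(b[i]), &cost);
        unsigned cd = fitch_join(base_mask(c[i]), base_mask(d[i]), &cost);

        (void)fitch_join(ab, cd, &cost);
        if (cost >= bound)
            return bound;
    }
    return cost;
}
```

Because each state set fits in four bits, many sites can be packed into one machine word or vector register, which is one reason this kind of scoring is a natural candidate for vectorization on the SPEs.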

Page 21: High performance bioinformatics

Specifications
Input
  DNA sequence files in FASTA format.
Output
  Runtime of the application.
  The most parsimonious phylogenetic tree.
  The parsimony score of the phylogenetic tree.

Page 22: High performance bioinformatics

Specifications
User Interface
  No changes to the user interface.
  Uses a command line interface.

Page 23: High performance bioinformatics

Specifications
Hardware
  3 PlayStation 3s
  High performance crossbar network switch.

Page 24: High performance bioinformatics

Specifications
Software
  Fedora 9 with Linux 2.6.25 kernel for the PowerPC
  IBM Cell SDK 3.1
  IBM XLC 9.0 and GCC 4.3 compilers.
  DNAPenny 3.6.
  Bioperf Suite

Page 25: High performance bioinformatics

Specifications
Testing
  Compare benchmarked runtimes over several iterations and inputs to get averages.
  Compare these runtimes with the previous group's runtimes on a single Cell processor.
  Compare these runtimes with the previous group's runtimes on a high performance server.
    Quad-core Intel Xeon 3.0GHz, 6GB RAM.
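The testing plan averages runtimes over several iterations before comparing systems. A minimal sketch of that averaging step follows. `mean_runtime` is a hypothetical helper, and the values in the usage note are made up purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n benchmark runtimes, in seconds. */
double mean_runtime(const double *runs, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(runs != NULL && n > 0);
    for (i = 0; i < n; i++)
        sum += runs[i];
    return sum / (double)n;
}
```

For three illustrative runs of 130.0, 131.0, and 132.0 seconds, the function returns 131.0. Reporting the spread of the runs alongside the mean would show how noisy the measurements are.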

Page 26: High performance bioinformatics

Acknowledgements
May08-24 group
  Kyle Byerly
  Shannon McCormick
  Matt Rohlf
  Bryan Venteicher
Bioperf developers
  David A. Bader, Georgia Tech
  Yue Li, Univ. of Florida
  Tao Li, Univ. of Florida
  Vipin Sachdeva, IBM Austin

Page 27: High performance bioinformatics

Questions?

Page 28: High performance bioinformatics

Previous Results and Projected Results

Code revision                            4-Way 3.0GHz Machine (seconds)  X Speedup   PlayStation 3 (seconds)  X Speedup
dnapenny_orig                            823.568                         1           7793.915                 1
dnapenny_slimmer                         360.131                         2.28685673  941.981                  8.273962
parallel_dnapenny_1.0                    221.432                         3.71928177  780.867                  9.9811043
supplement_spe_parallel_1SPE             -                               -           1111.471                 7.0122522
supplement_spe_parallel_3SPE             -                               -           443.521                  17.572821
supplement_spe_parallel_6SPE             -                               -           277.233                  28.11323
supplement_parallel_vector_1SPE          -                               -           260.952                  29.867236
supplement_parallel_vector_3SPE          -                               -           153.656                  50.723141
supplement_parallel_vector_6SPE          -                               -           130.59                   59.682326
Cluster with 3 PlayStations (Projected)  -                               -           ~54.8                    ~142.224
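Each "X Speedup" value in the results appears to be the runtime of dnapenny_orig on the same machine divided by the runtime of the given revision. That reading is inferred from the numbers themselves, for example 7793.915 / 130.59 ≈ 59.68. A minimal sketch of the calculation, with `speedup` as a hypothetical helper name:

```c
#include <assert.h>

/* Speedup of a revision relative to a baseline run on the same
 * machine: baseline runtime divided by the revision's runtime. */
double speedup(double baseline_seconds, double runtime_seconds)
{
    assert(baseline_seconds > 0.0 && runtime_seconds > 0.0);
    return baseline_seconds / runtime_seconds;
}
```

The same function can compare across machines. The fastest 4-way server result (221.432 s) divided by the projected cluster runtime (~54.8 s) gives roughly 4.04.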

Chart: X speedup (compared to original program running on one PPE) versus number of available SPEs + PPE.
Linear fit: f(x) = 5.72802144736842 x + 21.9361413947368, R² = 0.887915258548363

Page 29: High performance bioinformatics

Summary
Cost: $0. Equipment provided.
Time: approximately 555 man hours. Freely donated.
Results: 4x the performance of a similarly priced system.

