+ All Categories
Home > Documents > Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Date post: 02-Jan-2016
Category:
Upload: blake-tucker
View: 214 times
Download: 0 times
Share this document with a friend
30
HIGH PERFORMANCE BIOINFORMATICS Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang
Transcript
Page 1: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

HIGH PERFORMANCE

BIOINFORMATICS

Group May 09-06

Bryan McCoy

Kinit Patel

Tyson Williams

Advisor/Client: Zhao Zhang

Page 2: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

What is Bioinformatics? Genetic sequencing Massive amounts of data Many simple operations Perfect for distributed

computing

Page 3: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Problem Current solutions are not realistically feasible

Too expensive○ Super computers○ High powered servers

Too slow○ Some inputs can takes several days

Need for high speed, low cost solutions

Page 4: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Our Solution Cell Processor

Based on Phase 1 Cluster of PlayStation 3s MPI

Message Passing Interface

Page 5: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

IBM Cell Broadband Engine 1 Power Processing Element (PPE) 8 Synergistic Processing Elements (SPEs)

Only 6 SPEs are accessible on a PlayStation 3 4 high speed rings for processor communication

Page 6: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

DNAPenny Compares DNA strands

from different species Score indicates

evolution similarities between two species

Branch and bound search algorithm

Page 7: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Functional requirements

FR1. Ported applications shall run on the Cell B.E.

FR2. The results returned shall be the same as the original program.

FR3. The applications shall return their runtime.

FR4. The applications shall execute in parallel on multiple Cell B.E.s.

Page 8: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Non-Functional Requirements NF1. The Cells shall all run on the Linux

OS. NF2. The resulting runtimes of the

ported applications shall be faster than on the original applications.

NF3. The ported application shall be coded in the C language.

Page 9: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Market Survey Results of the survey point to a huge speed

up of computationally intensive programs. Dr. Gaurav Khanna at the University of

Massachusetts Dartmouth used cluster of 8 PS3s to replace a supercomputer.

Universitat Pompeu Fabra, in Barcelona, deployed in 2007 a BOINC system called PS3GRID for collaborative biological computing.

Page 10: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Risk Assessment

Slow network speed Software support Limited RAM Hardware Failure

Lower quality entertainment hardware Limited prior experience Software development schedule

Page 11: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Resource Requirements

3 PlayStation 3s High performance network switch Cell programming books Front node (desktop computer) Time

Page 12: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Software Environment Use Fedora 9 OS as it

is currently supported by the Cell SDK 3.1.

Uses the command line for user interface.

Use the IBM XLC compiler and/or the current GCC compiler.

Page 13: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Hardware Environment 3 PlayStation 3s High speed Crossbar switch Private network Front Node (desktop computer)

Proxy serverNetwork File Store (NFS)

Page 14: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

I/O Input

Inputs are DNA sequences stored in a text file.

Text is a CustalW alignment organized in Phylip format, a standard format for biological applications.

OutputOutputs are

○ The parsimony score○ The best trees○ The execution time

The score and best trees are output to the screen and to text files.

The execution time is output to a CSV (Comma Separated Value) file.

Page 15: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Work Breakdown Structure

Port Apps to Cluster PS3s

Problem Definition

Research Cell/B.E

Research Bioperf Suite

Research Distributed Parallel

Algorithms

Research Previously Done

Work

End Product Design

Design Requirements

Design Process

Design Documents

Considerations and Selections

Decide Which Linux to Install

Decide which applications to port

End Product Implementation

Hardware Implementation

Prototyping Implementation

Software Implementation

End Product Testing

Ensure Correctness of Output Results

Benchmarking

Final Documentation and

Demonstration

Create Final Report

Create Project Poster

Prepare for Presentation

Page 16: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Work Schedule Gant chart

Page 17: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Deliverables

Source Code Compiled Executable Runtime Comparisons Final Report Poster Final Presentation

Page 18: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Costs Time

Approximately 555 man hours total.

Freely donated.

Total cost $0.

Equipment3 PlayStation 3s

○ Provided by clientCrossbar router

○ Provided by clientStandard desktop computer

○ Provided by department

Total cost $0.

Page 19: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Development: Initial Overview Use MPI to distribute the program to the

multiple PlayStations. Each PlayStation would search one

branch of the tree. 1 function (supplement) took 90% of the

runtimePhase 1 ported this function to the SPEs

Page 20: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Development: Difficulties Found a bug in supplement. The bug did not affect results but did affect

runtime. We contacted the original developer, Dr.

Felsenstein at the University of Washington, who fixed the bug.

The fix significantly improved runtime. However, the fix negated all work done by

Phase 1 as supplement no longer took a significant amount of runtime.

Page 21: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Development: Reworking After the bug fix, no

single function took a significant amount of runtime.

We decided to distribute branches of the tree search to different processors.

Page 22: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Development: Results

Completed our goals Divided work among 3 PlayStation 3s.Produced faster code that comparable

sequential environment. Due to time constraints, we were not

able to port the code to the SPEs.

Page 23: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Testing

Used script to test multiple inputs. Averaged the runtimes. Used several different code revisions

and machines to provide comparisons. Projected the speedup that could be

attained if code was ported to SPEs.

Page 24: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Results: Actual Our current code is

20.76 times faster than the it was at the beginning of the semester.

Surpassed our original projections, which assumed the use of the SPEs.

Code revision Runtime (sec)

X Speedup X Speedup

(compared to desktop)

# of available cores

Original (Core2)

1861.66 0.116 1

With Bug Fixes (Core 2)

218.8 1 1

Original (1 PPE)

4953.51 1 0.044 1

With Bug Fixes (1 PPE)

662.70 7.47 0.330 1

MPI with Bug Fixes (3 PPEs)

238.57 20.76 0.917 3

MPI with Bug Fixes (3 PPEs, 18 SPEs)

(Projected)

34.08 145.35 6.420 21

Original Projections

334.82 14.79 0.653 21

Page 25: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Results: MPI The speedup for

MPI was 2.78. Excellent speedup

for 3 nodes.

Code revision Runtime (sec)

X Speedup X Speedup

(compared to desktop)

# of available cores

Original (Core2)

1861.66 0.116 1

With Bug Fixes (Core 2)

218.8 1 1

Original (1 PPE)

4953.51 1 0.044 1

With Bug Fixes (1 PPE)

662.70 7.47 0.330 1

MPI with Bug Fixes (3 PPEs)

238.57 20.76 0.917 3

MPI with Bug Fixes (3 PPEs, 18 SPEs)

(Projected)

34.08 145.35 6.420 21

Original Projections

334.82 14.79 0.653 21

Page 26: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Results: Comparison Our final code came

close to a high powered desktop.Core 2 Quad at

2.66 GHz Our projected

results indicate a speedup of 6.4.

Code revision Runtime (sec)

X Speedup X Speedup

(compared to desktop)

# of available cores

Original (Core2)

1861.66 0.116 1

With Bug Fixes (Core 2)

218.8 1 1

Original (1 PPE)

4953.51 1 0.044 1

With Bug Fixes (1 PPE)

662.70 7.47 0.330 1

MPI with Bug Fixes (3 PPEs)

238.57 20.76 0.917 3

MPI with Bug Fixes (3 PPEs, 18 SPEs)

(Projected)

34.08 145.35 6.420 21

Original Projections

334.82 14.79 0.653 21

Page 27: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Results: Projected

Using all SPEs, the speedup should be 145.35 Assuming SPEs run as fast

as the PPEs○ Before SPE vectorization

Code revision Runtime (sec)

X Speedup X Speedup

(compared to desktop)

# of available cores

Original (Core2)

1861.66 0.116 1

With Bug Fixes (Core 2)

218.8 1 1

Original (1 PPE)

4953.51 1 0.044 1

With Bug Fixes (1 PPE)

662.70 7.47 0.330 1

MPI with Bug Fixes (3 PPEs)

238.57 20.76 0.917 3

MPI with Bug Fixes (3 PPEs, 18 SPEs)

(Projected)

34.08 145.35 6.420 21

Original Projections

334.82 14.79 0.653 21

Page 28: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Conclusions

Achieved our goal of using MPI to get runtime improvement.

Contributed a major fix to a widely used application.

Surpassed our initial runtime goal. Projected results show an even larger

runtime improvement still possible.

Page 29: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Acknowledgements May08-24 group (phase I)

Kyle Byerly Shannon McCormick Matt Rohlf Bryan Venteicher

DNAPenny Author Dr. Felsenstein

Advisor Zhao Zhang

Environment Help Steve Nystrom

Page 30: Group May 09-06 Bryan McCoy Kinit Patel Tyson Williams Advisor/Client: Zhao Zhang.

Questions?


Recommended