Cray-NCI Announcement

October 26, 2001

Supercomputers for BioInformatics and The Grid

Raj Godhia

Consultant, Cray Inc.
c/o Mega Computing (S) Pte Ltd


Cray-NCI Announcement

CRAY INC. AND NATIONAL CANCER INSTITUTE COLLABORATE ON MORE-POWERFUL BIOINFORMATICS RESEARCH TOOLS

SEATTLE--(BUSINESS WIRE)--July 9, 2001-- Goal is to Exploit Unique Supercomputer Technologies to Identify and Analyze Genes Involved in Cancer and Other Diseases; Demonstration Project Produces Full STR Mapping of Genome

Cray Inc. (Nasdaq:CRAY) today announced it is collaborating with the National Cancer Institute (NCI) to develop bioinformatics research tools substantially more powerful than those available today. Bioinformatics is a high-potential market that involves applying computer technology to biology and medicine.

By exploiting several unique, ultra-fast technologies originally designed into Cray supercomputers for classified government use, the NCI and Cray are working to create genome analysis software capable of identifying and analyzing genes involved in cancer and other diseases.

In an initial demonstration project, scientists at the NCI's Advanced Biomedical Computing Center in Frederick, Md., produced a comprehensive map of short tandem repeat sequences (STRs) -- often used as gene markers -- for the entire human genome. Using the Cray SV1(TM) supercomputer located at the NCI, computations that previously took hours are being completed in seconds. This will enable biologists to do full-scale analyses that previously were impractical, Cray officials said.

"In preliminary testing, the unique technologies available on Cray vector supercomputers have provided enormous speed-ups for full-scale analysis of some common types of bioinformatics problems," said Bill Long, Cray's chief collaborator for the NCI work. "Assuming this validation continues, we believe there is a potential to make full-scale, exhaustive analysis of many bioinformatics

problems feasible for the first time." Although exhaustive analysis typically produces results that are more complete and reliable than those from methods based on statistical sampling, he said, to date exhaustive analysis has been too slow and expensive to use routinely.

Short tandem repeats, also known as microsatellites, are repetitive sequences of DNA that scientists have exploited for several years as tools to map new genes, study the structure of chromosomes, and compare the DNA of different species, all of which are major areas of interest in biology and medical research.

Other bioinformatics software tools under development in the NCI-Cray collaboration include: non-tandem repeats, EST cluster assembly, CG island detection, genome assembly from BAC clones, SNP (single nucleotide polymorphism) analysis, and the extension to protein sequences for proteomic applications.

"We are excited about the initial results of our collaboration with the NCI and optimistic about the larger potential for applying our unique technologies in the field of bioinformatics," said Jim Rottsolk, Cray Inc. chairman and CEO. Cray SV1 supercomputer systems start at under $1 million (U.S. list), are air cooled and fit easily into office environments.

About NCI's Advanced Biomedical Computing Center

The NCI's Advanced Biomedical Computing Center (Frederick, Md.) serves 1,800 biological researchers worldwide. Using a Cray supercomputer, ABCC played a critical role in solving the 3-D structure of HIV-1 protease, an enzyme that HIV utilizes to infect human immune cells. With the 3-D structure clarified, scientists were able to design highly effective protease inhibitors that are now the mainstay of AIDS therapy. For this work, ABCC was named a finalist for the prestigious Computerworld Smithsonian science award in 2000.


National Cancer Institute – Cray Collaboration

• Use the special hardware features of the Cray SV1 cluster to address genomic and proteomic issues.

• Integrate genomics, post-genomic, and proteomic methods to provide insights into the mechanism of cancer.

• NCI making results such as STR Database available via the web.


NCI’s Advanced Biomedical Computing Center

[Diagram: ABCC computing environment. Labels recoverable from the figure:]

• Parallel Vector Environment: Cray J90SE 16PE 1GW; Cray SV1 96PE 12GW; Cray J90 8PE 256MW

• GigaRing

• Origin 2000 64PE 32Gb; SGI Servers; Compaq 8400; IBM SP2

• Storagetek Tape Silo

• Workstations and File Servers


What Is an STR, and Why Do I Care?

• STR (Short Tandem Repeat)

– String of ‘n’ letters (nucleotides) repeated ‘m’ times (‘m’ usually >6): ATATATATATATAT

• Why STRs are important

– They can be associated with gene locations, diseases, and other important biology

– They can affect the accuracy of algorithms used to assemble the genome

– They are used for forensic identification

– …
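The STR definition above can be made concrete with a minimal sketch. This is a naive left-to-right scan in Python, not the Cray kernel or the NCI software. The function name `find_strs` and its parameters are hypothetical. The defaults assume unit lengths of 2 to 8 and at least 7 copies, which is one reading of "'m' usually >6".

```python
def find_strs(seq, min_unit=2, max_unit=8, min_copies=7):
    """Return (start, unit, copies) for each tandem run, scanning left to right.

    Illustrative only: thresholds are assumptions, and runs of a single
    repeated letter are reported as a 2-letter unit (e.g. "AA").
    """
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for n in range(min_unit, max_unit + 1):
            unit = seq[i:i + n]
            if len(unit) < n:
                break  # not enough sequence left for a unit of this length
            copies = 1
            # Count how many times the unit repeats back to back.
            while seq[i + copies * n:i + (copies + 1) * n] == unit:
                copies += 1
            if copies >= min_copies:
                best = (i, unit, copies)
                break  # prefer the shortest unit that explains the run
        if best:
            hits.append(best)
            i += len(best[1]) * best[2]  # jump past the whole run
        else:
            i += 1
    return hits


print(find_strs("GGC" + "AT" * 7 + "CCG"))
```

The scan tries every unit length at every position, so it is far slower than a vectorized kernel. It is only meant to show what "a string of n letters repeated m times" means in practice.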


Human Genome

• > 3 Billion Base Pairs of Nucleotides

• All Short Tandem Repeats (2-8) found in <10 minutes on Cray SV1 – 1 CPU; 150 sec on 15 CPUs of SV1e.

• NCI believes such methodologies show great promise for genome analysis and proteomics


Unique Cray Features

• Several capabilities, not just one

– Unique, hard-to-replicate combination of hardware features

– Benefits from applying multiple processors (CPUs)

• Originally created for intelligence community

– ~100x faster than anything else for classified problems

– Key bioinformatics problems look like classified problems

• Bioinformatics ‘connection’ was serendipitous

– One clever individual

• Resident in Cray SV1, MTA-2, SV2

– Experience to date is with SV1 series

Cray SV1™ Supercomputer


SV1 Kernel Performance

• Nucleotide encoding: 600M characters/sec.

• Difference counting: 200M starting points/sec.

– For a 32 nucleotide sequence, this would be 6.4G nucleotides/second

• Reverse complement: 4G nucleotides/sec.

– For example, the complete human genome can be reverse complemented in about 1 second
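To illustrate what the encoding and difference-counting kernels above might compute, here is a minimal Python sketch. The slides do not describe the actual implementations, so this is an assumption throughout. The 2-bit code table, the function names `encode` and `difference_counts`, and the reading of "difference counting" as a mismatch count at every starting point are all hypothetical. The last lines reproduce the slide's arithmetic.

```python
# Hypothetical 2-bit code for the four nucleotides (an assumption, not Cray's).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq):
    """Pack a nucleotide string into one integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def difference_counts(genome, query):
    """For each starting point, count positions where query differs from genome."""
    n = len(query)
    return [
        sum(g != q for g, q in zip(genome[s:s + n], query))
        for s in range(len(genome) - n + 1)
    ]


# Slide arithmetic: 200M starting points/sec, each comparing 32 nucleotides.
STARTS_PER_SEC = 200_000_000
QUERY_LEN = 32
comparisons_per_sec = STARTS_PER_SEC * QUERY_LEN  # 6.4G nucleotides/sec

# Roughly 3 billion bases at 4G nucleotides/sec is under one second.
genome_seconds = 3_000_000_000 / 4_000_000_000

print(encode("ACGT"), difference_counts("ACGTACGA", "ACGT"))
print(comparisons_per_sec, genome_seconds)
```

Packing bases into 2 bits lets 32 nucleotides fit in one 64-bit word, which is one plausible reason the slide uses a 32-nucleotide sequence as its example.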


Performance Comparisons

[Figure: two bar charts. Source: NCI]

• Millions of Characters/Second (1 processor): Alpha 69; Cray SV1 9000

• Solution Time (seconds), labeled "Benson's methodology": SGI O2K 1200; Cray SV1 10


Kernel Status & Plans

• Available:

– Nucleotide encoding

– Reverse complement (turn ACCTG into CAGGT)

– Difference count

– Tandem repeat search

• In progress:

– Amino acid encoding & comparison scoring

– Nucleotide sorting (for non-tandem repeats)

– Higher level drivers

– …
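The reverse-complement kernel listed above can be sketched in a few lines of Python. This is only an illustration of the operation, not the vectorized Cray implementation. The function name `reverse_complement` is hypothetical, and the sketch assumes uppercase input containing only A, C, G and T.

```python
# Map each base to its Watson-Crick partner: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("ACCTG"))  # the slide's example: ACCTG -> CAGGT
```

Applying the function twice returns the original sequence, which makes a convenient sanity check.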


Supercomputing and The Grid

• Several organizations in Asia intend to implement GLOBUS on Cray SV1 systems and make them available to BioInformatics users

• Cray systems will play a major role on The Grid

– Supercomputer centers like SDSC have always provided service to remote users

• Some organizations are confronting implementation issues running “coupled” jobs on The Grid using distributed memory techniques

– Shared memory supercomputers may play an important role as “couplers” for Grid-based distributed applications


Thank you.

godhia@cray.com
