
RNA-Seq 2013, Boston MA, 6/20/2013 Optimizing the National Cyberinfrastructure for Lower...

RNA-Seq 2013, Boston MA, 6/20/2013 Optimizing the National Cyberinfrastructure for Lower Bioinformatic Costs: Making the Most of Resources for Publicly Funded Research William K. Barnett, Ph.D. (Director) Richard LeDuc, Ph.D. (Manager) National Center for Genome Analysis Support
Transcript
Page 1: RNA-Seq 2013, Boston MA, 6/20/2013 Optimizing the National Cyberinfrastructure for Lower Bioinformatic Costs: Making the Most of Resources for Publicly.

RNA-Seq 2013, Boston MA, 6/20/2013

Optimizing the National Cyberinfrastructure for Lower Bioinformatic Costs: Making the Most of Resources for Publicly Funded Research

William K. Barnett, Ph.D. (Director)

Richard LeDuc, Ph.D. (Manager)

National Center for Genome Analysis Support

Page 2

National Center for Genome Analysis Support: http://ncgas.org

Summary

• NCGAS and its mission and cyberinfrastructure.

• Overview of NCGAS server-on-demand resources, available on a low-cost fee-for-cycles basis.

• Overview of XSEDE: for when you need truly large-scale resources.

Page 3

• Funded by the National Science Foundation

1. Large memory clusters for assembly

2. Bioinformatics consulting for biologists

3. Optimized software for better efficiency

• Collaboration across IU, TACC, SDSC, and PSC.

• Open for business at: http://ncgas.org

Page 4

Making it easier for Biologists

• Web interface to NCGAS resources

• Supports many bioinformatics tools

• Available for both research and instruction.

[Figure: labels "Common" and "Rare" shown against a "Computational Skills" scale running from LOW to HIGH.]

Page 5

NCGAS Cyberinfrastructure at IU

• Rockhopper: 11 servers with 48 cores and 128 GB RAM.

• Mason large memory cluster: 16 nodes with 32 cores each and 512 GB RAM per node.

• Data Capacitor: 1 PB at 20 Gbps throughput.

• Research Database Cluster for managing data sets.

• All interconnected with high speed internal network (40 Gbps)

• 100 Gbps Internet2 Backbone
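To put the figures above in perspective, the following is a minimal sketch of the aggregate numbers they imply. It uses only the Mason and Data Capacitor values quoted on the slide; the decimal definition of a petabyte (10**15 bytes) and the idea of streaming the full capacity at the quoted throughput are assumptions made for illustration, not statements from the presenters.

```python
# Figures quoted on the slide for the Mason large memory cluster.
MASON_NODES = 16
CORES_PER_NODE = 32
RAM_GB_PER_NODE = 512

total_cores = MASON_NODES * CORES_PER_NODE      # cores across the cluster
total_ram_gb = MASON_NODES * RAM_GB_PER_NODE    # RAM across the cluster, in GB

# Data Capacitor figures from the slide.
# Assumption: 1 PB is taken as 10**15 bytes (decimal units).
DC_CAPACITY_BYTES = 10**15
DC_THROUGHPUT_BITS_PER_S = 20 * 10**9           # 20 Gbps

# Idealized time to stream the full capacity once at the quoted throughput.
seconds_to_stream = DC_CAPACITY_BYTES * 8 // DC_THROUGHPUT_BITS_PER_S
hours_to_stream = seconds_to_stream / 3600

print(f"Mason: {total_cores} cores, {total_ram_gb} GB RAM in total")
print(f"Data Capacitor: about {hours_to_stream:.0f} hours to stream 1 PB at 20 Gbps")
```

Real transfers would be slower than this idealized figure, since it ignores protocol overhead and contention.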

Page 6

Rockhopper

• Penguin Computing's Penguin-On-Demand (POD) supercomputing cloud appliance hosted by Indiana University.

• A collaborative effort between Penguin Computing, IU, the University of Virginia, the University of California Berkeley, and the University of Michigan.

• Provides supercomputing cloud services in a secure US facility.

• Researchers at US institutions of higher education and Federally Funded Research and Development Centers (FFRDCs) can purchase computing time from Penguin Computing, and receive access via high-speed national research networks operated by IU.

Page 7

Standardized Trinity Analyses

[Figure: Cost by Input Size for Trinity Jobs on POD@IU. X-axis: size of each input file from paired-end library (0.0 GB to 8.0 GB). Y-axis: cost ($0.00 to $50.00). Series: cost by input size, with a linear trend line.]

Page 8

Page 9

GALAXY.NCGAS.ORG Model

Virtual box hosting Galaxy.ncgas.org

The host for each tool is configured individually

[Diagram labels: Quarry, Mason, Data Capacitor, Archive]

NCGAS establishes tools, hardens them, and moves them into production.

Custom Galaxy tools can be made for moving data

Individual projects can get duplicate boxes – provided they support it themselves.

Policies on the DC guarantee that untouched data is removed with time.
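The idea of removing untouched data over time can be illustrated with a minimal sketch. The slide does not give the actual Data Capacitor policy, so the function name `find_stale_files`, the idle threshold, and the use of file access time as the measure of "untouched" are all hypothetical choices for illustration. The sketch only lists candidate files and deletes nothing.

```python
import time
from pathlib import Path


def find_stale_files(root, max_idle_days, now=None):
    """List files under `root` whose last access is older than `max_idle_days`.

    Hypothetical helper: the real purge policy and its threshold are not
    stated in the presentation.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds per day
    stale = []
    for path in Path(root).rglob("*"):
        # Compare the last access time against the cutoff; directories are skipped.
        if path.is_file() and path.stat().st_atime < cutoff:
            stale.append(path)
    return sorted(stale)
```

A call such as `find_stale_files("/path/to/scratch", 60)` would return the files not accessed in the last 60 days; both the path and the 60-day value are placeholders.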

Page 10

[Diagram: network links labeled 10 Gbps and 100 Gbps, with the following components:]

• NCGAS Mason (free for NSF users)

• IU POD (12 cents per core hour)

• Data Capacitor (no data storage charges)

• Your Friendly Neighborhood Sequencing Center (shown three times)

Moving Forward

Other NCGAS XSEDE Resources…

Lustre WAN File System

Globus On-line and other tools

Optimized Software
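The IU POD rate quoted on this slide (12 cents per core hour) makes job costs easy to estimate. The following is a minimal sketch; the function name `estimate_pod_cost` and the example job size are hypothetical, and actual billing details (minimum charges, rounding, storage or transfer fees) are not covered by the slide.

```python
POD_RATE_USD_PER_CORE_HOUR = 0.12  # rate quoted on the slide


def estimate_pod_cost(cores, wall_hours, rate=POD_RATE_USD_PER_CORE_HOUR):
    """Estimate the cost in USD of a job billed per core hour.

    Hypothetical helper: assumes cost = cores * wall-clock hours * rate.
    """
    if cores < 0 or wall_hours < 0:
        raise ValueError("cores and wall_hours must be non-negative")
    return round(cores * wall_hours * rate, 2)


# Hypothetical job: 48 cores for 8 hours of wall-clock time.
print(f"${estimate_pod_cost(48, 8):.2f}")
```

Under this assumed billing model, halving the wall-clock time through better-optimized software halves the bill for the same core count.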

Page 11

Performance Improvement

Page 12

The National Cyberinfrastructure

Page 13

XSEDE: Extreme Science and Engineering Discovery Environment

• About 100 requests every quarter.

• About 50% of need is met.

• 75% from two systems.

• Allocations in the millions of SU.

Page 14

Page 15

In Sum…

• Next-generation sequencing is creating an analytical problem that cannot be solved at sequencing centers.

• NCGAS can provide a global scale infrastructure to better serve the needs of biologists who cannot become bioinformaticians to accomplish their research.

• XSEDE allows scaling to larger projects.

Page 16

Thank You

Questions?

Bill Barnett ([email protected])

Rich LeDuc ([email protected])

Le-Shin Wu ([email protected])

Carrie Ganote ([email protected])

