Columbia Supercomputer and the NASA Research & Education
Network
WGISS 19: CONAE
March, 2005
David Hartzell
NASA Ames / CSC
Agenda
• Columbia
• NREN-NG
• Applications
NASA’s Columbia System
• NASA Ames has embarked on a joint SGI and Intel Linux supercomputer project.
– Initially twenty 512-processor Intel IA-64 SGI Altix nodes
– NREN-NG: an optical support WAN
• NLR will be the optical transport for this network, delivering high bandwidth to other NASA centers.
• Achieved 51.9 Teraflops with all 20 nodes in November 2004
• Currently 2nd on the Top500 list
– Other systems have since come on-line that are now faster.
Columbia
Preliminary Columbia Uses
Space Weather Modeling Framework (SWMF): SWMF has been developed at the University of Michigan under the NASA Earth Science Technology Office (ESTO) Computational Technologies (CT) Project to provide “plug and play” Sun-to-Earth simulation capabilities to the space physics modeling community.
Estimating the Circulation and Climate of the Ocean (ECCO): Continued success in ocean modeling has improved the model, and the work continued even during heavy Return to Flight use of Columbia.
finite-volume General Circulation Model (fvGCM): Very promising results from 1/4° fvGCM runs encouraged use for real-time weather predictions during hurricane seasons - one goal is to predict hurricanes accurately in advance.
Return to Flight (RTF): Simulations of tumbling debris from foam and other sources are being used to assess the threat that shedding such debris poses to various elements of the Space Shuttle Launch Vehicle.
20 Nodes in Place
• Kalpana was on site at the beginning of the project.
• The first two new systems were received on 28 June and placed into service that week.
• As of late October 2004, all systems were in place.
Power
• Ordered and received twenty 125 kW PDUs
• Upgrade / installation of power distribution panels
Cooling
• New floor tiles
• Site visits conducted
• Plumbing in HSPA and HSPB complete
• Heating problem contingency plans developed
Networking
• Each Columbia node has four 1 GigE interfaces and one 10 GigE interface
• Plus Fibre Channel and InfiniBand
• Required all new fiber and copper infrastructure, plus switches
Components
Front End: 128p Altix 3700 (RTF)
Networking: 10GigE switch (32-port), 10GigE cards (1 per 512p), InfiniBand switch (288-port), InfiniBand cards (6 per 512p), Altix 3900 2048 NUMAlink kits
Compute Nodes: Altix 3700 12x512p, “Altix 3900” 8x512p
Storage Area Network: Brocade switch 2x128-port
Storage (440 TB): FC RAID 8x20 TB (8 racks), SATA RAID 8x35 TB (8 racks)
[System diagram: twelve Altix 3700 512p nodes and eight “Altix 3900” 512p nodes plus the RTF 128p front end, interconnected via InfiniBand and 10GigE, attached through two 128-port FC switches to eight 20 TB Fibre Channel RAID racks and eight 35 TB SATA racks.]
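As a quick sanity check, the component counts above tally to the headline figures: 20 nodes × 512 processors = 10,240 processors, and 8 × 20 TB + 8 × 35 TB = 440 TB. A minimal sketch, using only numbers from the component list:

    # Tally of the Columbia configuration from the component list above.
    altix_3700_nodes = 12            # Altix 3700, 512 processors each
    altix_3900_nodes = 8             # "Altix 3900", 512 processors each
    procs_per_node = 512

    total_processors = (altix_3700_nodes + altix_3900_nodes) * procs_per_node
    print(total_processors)          # 10240 processors across 20 nodes

    fc_raid_tb = 8 * 20              # eight 20 TB Fibre Channel RAID racks
    sata_raid_tb = 8 * 35            # eight 35 TB SATA racks
    print(fc_raid_tb + sata_raid_tb) # 440 TB, matching the stated total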
NREN Goals
• Provide a wide-area, high-speed network for large data distribution and real-time interactive applications
• Provide access to NASA research and engineering communities - primary focus: supporting distributed data access to/from Columbia
• Provide access to federal and academic entities via peering with High Performance Research and Engineering Networks (HPRENs)
• Perform early prototyping and proofs-of-concept of new technologies that are not yet ready for the production network (NASA Integrated Services Network - NISN)
NREN-NG
• The NREN Next Generation (NG) wide-area network will be upgraded from OC-12 to 10 GigE within the next 3-4 months to support Columbia applications.
• NREN will “ride” the National Lambda Rail (NLR) to reach the NASA research centers and major exchange locations.
NREN-NG Target Approach: Implementation Plan, Phase 1
[Map: NREN sites (ARC/NGIX-West, JPL, GSFC, JSC, GRC, MSFC, LRC) and peering points (NGIX-East, StarLight, MATP) interconnected at 10 GigE via NLR nodes at Sunnyvale, Los Angeles, Houston, Cleveland, Chicago, and MSFC.]
NREN-NG Progress
• Equipment order has been finalized.
• Construction of the network is starting from west to east.
• A temporary 1 GigE connection to JPL is in place, moving to 10 GigE by the end of summer.
• Current NREN paths to/from Columbia are seeing gigabit/s transfers.
• NREN-NG will ride the National Lambda Rail network in the US.
The NLR
• National Lambda Rail (NLR)
• NLR is a U.S. consortium of educational institutions and research entities that partnered to build a nationwide fiber network for research activities.
– NLR offers wavelengths to members and/or Ethernet transport services.
– NLR is buying a 20-year right-to-use of the fiber.
NLR – Optical Infrastructure - Phase 1
[Map: NLR Layer 1 route spanning Seattle, Portland, Boise, Ogden/Salt Lake, Denver, KC, Chicago, Cleveland, Pittsburgh, Washington DC, Raleigh, Atlanta, Jacksonville, San Diego, and LA.]
Some Current NLR Members
• CENIC
• Pacific Northwest GigaPOP
• Pittsburgh Supercomp. Center
• Duke (coalition of NC universities)
• Mid-Atlantic Terascale Partnership
• Cisco Systems
• Internet2
• Florida LambdaRail
• Georgia Institute of Technology
• Committee on Institutional Cooperation (CIC)
• Texas / LEARN
• Cornell
• Louisiana Board of Regents
• University of New Mexico
• Oklahoma State Regents
• UCAR/FRGP
Plus Agreements with:
• SURA (AT&T fiber donation)
• Oak Ridge National Lab (ORNL)
NLR Applications
• Pure optical wavelength research
• Transport of research and education traffic (like Internet2/Abilene today)
• Private transport of member traffic
• Experience operating and managing an optical network
– Development of new technologies to integrate optical networks into existing legacy networks
Columbia Applications
Distribution of Large Data Sets
– Finite Volume General Circulation Model (fvGCM): global atmospheric model
– Requirements (Goddard - Ames):
• ~23 million points
• 0.25 degree global grid
• 1 Terabyte data set per 5-day forecast
– No data compression required prior to data transfer
– Assumes BBFTP for file transfers, instead of FTP or SCP

GSFC - Ames transfer scenarios:
Scenario | Bandwidth (LAN/WAN) [Gigabits/sec] | Data Transfer Time (hours)
Current GSFC - Ames Performance | 1.00 / 0.155 | 17 - 22
GSFC - Ames Performance (1/10 Gig) | 1.00 / 10.00 | 3 - 5
GSFC - Ames Performance (Full 10 Gig) | 10.00 / 10.00 | 0.4 - 1.1
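The table rows can be roughly reproduced by dividing the 1 Terabyte forecast data set by the slower of the LAN and WAN legs and allowing for protocol and end-host overhead. A minimal sketch; the 70% efficiency factor is an illustrative assumption, not a figure from the slide:

    # Rough transfer-time estimate for the 1 TB fvGCM forecast data set.
    # The path is assumed to be limited by the slower of the LAN and WAN
    # legs; "efficiency" stands in for protocol and end-host overhead.
    def transfer_hours(data_tb, lan_gbps, wan_gbps, efficiency=0.7):
        bits = data_tb * 8e12                          # 1 TB = 8e12 bits
        rate = min(lan_gbps, wan_gbps) * 1e9 * efficiency
        return bits / rate / 3600

    for label, lan, wan in [("Current (OC-3 WAN)", 1.00, 0.155),
                            ("1/10 Gig",           1.00, 10.00),
                            ("Full 10 Gig",       10.00, 10.00)]:
        print(label, round(transfer_hours(1, lan, wan), 1), "hours")
    # ~20.5, 3.2 and 0.3 hours, broadly consistent with the 17-22,
    # 3-5 and 0.4-1.1 hour ranges in the table above.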
Columbia Applications
Distribution of Large Data Sets
• ECCO: Estimating the Circulation and Climate of the Ocean. A joint activity among Scripps, JPL, MIT, and others.
• Run requirements are increasing as model scope and resolution are expanded:
– November ’03 = 340 GBytes / day
– February ’04 = 2000 GBytes / day
– February ’05 = 4000 GBytes / day (est.)
– Bandwidth for distributed data-intensive applications can be a limiter
– Need high-bandwidth alternatives and better file transfer options

NREN transfer scenarios:
Scenario | Bandwidth (LAN/WAN) [Gigabits/sec] | Data Transfer Time (hours)
Previous NREN Performance | 1.0 / 0.155 | 6 - 12
NREN Feb 2005 (CENIC 1G) | 1.0 / 1.0 | 0.6 - 0.9
Projected NREN (CENIC 10G) | 10.0 / 10.0 | 0.2 - 0.4
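A useful way to read the daily volumes is as the sustained bandwidth needed just to move each day's output within a day; this is simple arithmetic from the slide, sketched below (GB taken as 10^9 bytes):

    # Sustained rate needed to move each day's ECCO output in 24 hours.
    SECONDS_PER_DAY = 86_400

    def sustained_gbps(gbytes_per_day):
        return gbytes_per_day * 8 / SECONDS_PER_DAY    # gigabits per second

    for label, volume in [("Nov '03", 340), ("Feb '04", 2000), ("Feb '05 est.", 4000)]:
        print(label, round(sustained_gbps(volume), 2), "Gbps")
    # ~0.03, 0.19 and 0.37 Gbps: the Feb '05 estimate alone already exceeds
    # the old 0.155 Gbps (OC-3) WAN rate, hence the CENIC 1G/10G upgrades.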
hyperwall-1: large images
Columbia Applications
Disaster Recovery/Backup
– Transfer up to seven 200-gigabyte files per day between Ames and JPL
– Limiting factors:
• Bandwidth: recent upgrade from OC-3 POS to 1 Gigabit Ethernet
• Compression: 4:1 compression utilized for WAN transfers at lower bandwidths; compression limited bandwidth to 29 Mbps (end-host constraint)

Projected Transfer Improvement (Ames - JPL):
Scenario | Bandwidth (LAN/WAN) [Gigabits/sec] | Data Compression Required | Data Transfer Time (hours)
JPL - Ames (OC-3 POS) | 1.00 / 0.155 | Yes (4:1) | 27 - 31
JPL - Ames (CENIC 1 GigE) | 1.00 / 1.00 | No | 4.4 - 6.2
JPL - Ames (CENIC 10 GigE) | 10.00 / 10.00 | No | 0.6 - 1.5
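The "Projected Transfer Improvement" rows follow from simple arithmetic on the daily volume (seven 200 GB files = 1400 GB per day). A minimal sketch; only the 29 Mbps figure and the 4:1 ratio come from the slide, while the effective 1 GigE and 10 GigE throughputs are illustrative assumptions:

    # Back-of-the-envelope check of the Ames - JPL transfer table.
    DAILY_GB = 7 * 200                 # 1400 GB of backup data per day

    def hours(gbytes, gbps):
        return gbytes * 8 / gbps / 3600

    # OC-3 POS: 4:1 compression, end hosts limit the path to ~29 Mbps
    print(round(hours(DAILY_GB / 4, 0.029), 1))    # ~26.8 h, cf. 27-31 in the table

    # CENIC 1 GigE, no compression, assumed ~0.5-0.7 Gbps effective
    print(round(hours(DAILY_GB, 0.7), 1), round(hours(DAILY_GB, 0.5), 1))   # ~4.4-6.2 h

    # CENIC 10 GigE, no compression, assumed ~2-5 Gbps effective
    print(round(hours(DAILY_GB, 5.0), 1), round(hours(DAILY_GB, 2.0), 1))   # ~0.6-1.6 h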