Transcript
Page 1: Xtw01t7v021711 cluster


This presentation is intended for the education of IBM and Business Partner sales personnel. It should not be distributed to customers. IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation

Introduction to Intelligent Clusters

XTW01 Topic 7

Page 2: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 2

Course Overview

The objectives of this course of study are:

>Describe a high-performance computing cluster

>List the business goals that Intelligent Clusters addresses

> Identify three core Intelligent Clusters components

>List the high-speed networking options available in Intelligent Clusters

>List three software tools used in Clusters

>Describe Cluster benchmarking

Page 3: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 3

Topic Agenda

>*Commodity Clusters*

>Overview of Intelligent Clusters

>Cluster Hardware

>Cluster Networking

>Cluster Management, Software Stack, and Benchmarking

Page 4: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 4

>Clusters are comprised of standard, commodity components that could be used separately in other types of computing configurations:
- Compute servers, a.k.a. nodes
- High-speed networking adapters and switches
- Local and/or external storage
- A commodity operating system such as Linux
- Systems management software
- Middleware libraries and application software

>Clusters enable “Commodity-based supercomputing”

What is a Commodity Cluster?

A multi-server system, comprised of interconnected computers and associated networking and storage devices, that is unified via systems management and networking software to accomplish a specific purpose.

Page 5: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 5

Conceptual View of a Cluster

[Diagram: a compute node rack and a storage rack. Users reach the user/login nodes over the public VLAN and LAN; the management node connects through an Ethernet switch to the management, storage, SOL, and cluster VLANs, with a link to the management network. Storage nodes sit on the storage VLAN and reach the storage rack through a Fibre SAN switch over the fiber network. Compute nodes on the cluster VLAN also attach to a high-speed network switch that carries the message-passing network.]

Page 6: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 6

Application of Clusters in Industry

[Table: example workloads by industry.]
- Energy: seismic analysis, reservoir analysis
- Finance: derivative analysis, actuarial analysis, asset liability management, portfolio risk analysis, statistical analysis
- Manufacturing: mechanical/electric design, process simulation, finite element analysis, failure analysis
- Life Sciences: drug discovery, protein folding, medical imaging
- Media: digital rendering, bandwidth consumption, gaming
- Public/Government: collaborative research, numerical weather forecasting, high energy physics

Page 7: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 7

Technology Innovation in HPC

>Multi-core enabled systems create new opportunities to advance applications and solutions
- Dual- and quad-core processors along with increased-density memory designs
- "8-way" x86, 128 GB-capable systems that begin at less than $10k

>Virtualization is a hot topic for architectures
- Possible workload consolidation for cost savings
- Power consumption reduced by optimizing system-level utilization

>Manageability is key to addressing complexity
- Effective power/thermal management through software tools
- Virtualization management tools must be integrated into the overall management scheme

Page 8: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 8

Topic Agenda

>Commodity Clusters

>*Overview of Intelligent Clusters*

>Cluster Hardware

>Cluster Networking

>Cluster Management, Software Stack and Benchmarking

Page 9: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 9

Approaches to Clustering

Roll Your Own (piece parts: client bears all risk for sizing, design, integration, deployment, and warranty issues)
• Client orders individual components from a variety of vendors, including IBM
• Client tests and integrates components or contracts with an integrator
• Client must address warranty issues with each vendor

BP Integrated
• BP orders servers & storage from IBM and networking from 3rd-party vendors
• BP builds and integrates components and delivers to the customer
• Client must address warranty issues with each vendor

IBM Racked and Stacked
• Client orders servers & storage in standard rack configurations from IBM
• Client integrates IBM racks with 3rd-party components or contracts with IGS or another integrator
• Client must address warranty issues with each vendor

Intelligent Clusters (integrated solution: a single vendor is responsible for sizing, design, integration, deployment, and all warranty issues)
• Client orders an integrated cluster solution from IBM, including servers, storage, and networking components
• IBM delivers a factory-built and tested cluster ready to "plug in"
• Client has a single point of contact for all warranty issues

IBM Delivers Across the Spectrum

Page 10: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 10

What is an IBM System Intelligent Cluster?

An IBM portfolio of components that have been cluster configured, tested, and work with a defined supporting software stack.

• Factory assembled
• Onsite installation
• One phone number for support
• Selection of options to customize your configuration, including Linux operating system (RHEL or SUSE), xCAT, & GPFS

The degree to which a multi-server system exhibits these characteristics determines if it is a cluster:
- Dedicated private VLAN
- All nodes running the same suite of applications
- Single point of control for software/application distribution and hardware management
- Inter-node communication
- Node interdependence
- Linux operating system

Core technologies:
- IBM servers: rack-mount servers (x3550 M3, x3650 M3) and blade servers (HS21-XM, HS22, HX5) as compute nodes; iDataPlex dx360 M3 scale-out servers; storage and management nodes; Intel® processors
- IBM TotalStorage® disk storage with ServeRAID storage software (fiber, SAS, iSCSI)
- Storage networking: Fibre Channel, iSCSI, FCoE
- Networks: Ethernet (1 GbE, 10 GbE) and InfiniBand (4X SDR, DDR, QDR)
- Out-of-band management and terminal servers

Page 11: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 11

IBM HPC Cluster Solution (Intelligent Clusters)

[Diagram: System x servers (rack-mount, blades, or iDataPlex) + switches & storage + cluster software (GPFS, xCAT, Linux or Windows) = HPC Cluster Solution; IBM or a Business Partner then adds the technical application (or "workload").]

Page 12: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 12

Topic Agenda

>Commodity Clusters

>Overview of Intelligent Clusters

>*Cluster Hardware*

>Cluster Networking

>Cluster Management, Software Stack, and Benchmarking

Page 13: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 13

Intelligent Clusters Overview - Servers

IBM System x™ 3550 M3 (1U) - high-performance compute nodes
• Dual socket Intel
• Integrated system management

IBM System x™ 3650 M3 (2U) - mission-critical performance
• Dual socket Intel
• Integrated system management

Active Energy Manager™: power management at your control

IBM BladeCenter® with HS21-XM, HS22, and HX5 (Intel processor-based blades) - industry-leading performance, reliability, and control
• IBM BladeCenter S: distributed, small office, easy to configure
• IBM BladeCenter H: high performance
• IBM BladeCenter E: best energy efficiency, best density
• HS21-XM: extended memory
• HS22: general-purpose enterprise
• HX5: scalable enterprise

Page 14: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 14

IBM System x iDataPlex

[Diagram: iDataPlex rack components - PDUs, 2U and 3U chassis, switches, the iDataPlex Rear Door Heat Exchanger, HPC and web server nodes, storage drives & options, and I/O and storage trays.]

Page 15: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation

Current iDataPlex Server Offerings

iDataPlex dx360 M2 - high-performance dual-socket
>Processor: Quad-Core Intel Xeon 5500
>QuickPath Interconnect up to 6.4 GT/s
>Memory: 16 DIMM DDR3, 128 GB max
>Memory speed: up to 1333 MHz
>PCIe: x16 electrical / x16 mechanical
>Chipset: Tylersburg-36D
>Last order date: December 31, 2010

iDataPlex 3U Storage Rich - file-intense dual-socket
>Processor: 6- or 4-core Intel Xeon 5600
>Memory: 16 DIMM, 128 GB max
>Chipset: Westmere
>Storage: 12 3.5" HDDs, up to 24 TB per node / 672 TB per rack

iDataPlex dx360 M3 - high-performance dual-socket
>Processor: 6- and 4-core Intel Xeon 5600
>QuickPath Interconnect up to 6.4 GT/s
>Memory: 16 DIMM DDR3, 128 GB max
>Memory speed: up to 1333 MHz
>PCIe: x16 electrical / x16 mechanical
>Chipset: Westmere, 12 MB cache
>Ship support: March 26, 2010

iDataPlex dx360 M3 Refresh - exa-scale hybrid CPU + GPU
>Processor: 6- and 4-core Intel Xeon 5600
>2 NVIDIA M1060 or M2050 GPUs
>QuickPath Interconnect up to 6.4 GT/s
>Memory: 16 DIMM DDR3, 128 GB max
>Memory speed: up to 1333 MHz
>PCIe: x16 electrical / x16 mechanical
>Chipset: Westmere, 12 MB cache
>Ship support: August 12, 2010

Page 16: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation

System x iDataPlex dx360 M3

iDataPlex flexibility with better performance, efficiency, and more options - tailored for your business needs.

[Diagram: building blocks - 1U compute node, 1U drive tray, 1U dual-GPU I/O tray, 3U storage chassis, and 550W, 900W, or 750W N+N redundant power supplies.]

>Maximize storage density: 3U chassis, 1 node slot & triple drive tray; HDD: 12 3.5" drives, up to 24 TB; I/O: PCIe for networking + PCIe for RAID
>Compute + storage (balanced storage and processing): 2U chassis, 1 node slot & drive tray; HDD: up to 5 (3.5")
>Compute intensive (maximum processing): 2U chassis, 2 compute nodes
>Acceleration compute + I/O (maximum component flexibility): 2U chassis, 1 node slot; I/O: up to 2 PCIe; HDD: up to 8 (2.5")

Page 17: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation

iDataPlex dx360 M3 Refresh

> Increased server efficiency & Westmere enablement
- Intel Westmere-EP 4- and 6-core processor support (up to 95 watts)
- 2 DIMMs per channel @ 1333 MHz with Westmere 95-watt CPUs
- Lower-power (1.35V) DIMMs (2GB, 4GB, 8GB)

>Expanded I/O performance capabilities
- New I/O tray and 3-slot "butterfly" PCIe riser to support 2 GPUs + network adapter
- Support for NVIDIA Tesla M1060 or "Fermi" M2050 in a 2U chassis + 4 HDDs

>Expanded power supply offerings
- Optional redundant 2U power supply for line feed (AC) and chassis (DC) protection
- High-efficiency power supplies fitted to workload power demands

>Storage performance, capacity, and flexibility
- Simple-swap SAS, SATA & SSD, 2.5" & 3.5" in any 2U configuration
- Increased capacities of 2.5" & 3.5" SAS, SATA, and SSD
- Increased capacity in 3U storage-dense configurations to 24 TB (with 2 TB 3.5" SATA/SAS drives)
- 6 Gbps backplane for performance
- Rear PCIe slot enablement in 2U chassis for RAID controller flexibility
- Higher-capacity / higher-performance solid-state drive controller

>Next-generation converged networking
- FCoE via 10G converged network adapters, dual-port 10Gb Ethernet

Page 18: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation

dx360 M3 Refresh - Power Supply Offerings

>Maximum efficiency for lower power requirements
- New high-efficiency 550W power supply for optimum efficiency in low-power configurations
- More efficiency by running higher on the power curve

>Flexibility to match the power supply to the workload
- 550W (non-redundant) for lower power demands
- 900W (non-redundant) for higher power demands
- 750W N+N for node and line-feed redundancy

>Redundant power supply option for the iDataPlex chassis
- Node-level power protection for smaller clusters, head nodes, 3U storage-rich, VM & enterprise configurations
- Rack-level line-feed redundancy with discrete feeds
- Tailor rack-level solutions that require redundant power in some or all nodes
- Maintains maximum floor-space density with the iDataPlex rack
- Graceful shutdown on power-supply failure for virtualized environments

>Flexibility per chassis to optimize rack power
- Power supply is per 2U or 3U chassis
- Mix across the rack to maximize flexibility and minimize stranded power

[Diagram: power supply options - 900W HE, 550W HE, and 750W N+N. In the redundant configuration, two AC feeds (AC 1, AC 2) drive PS 1 and PS 2 (750W max each), delivering 750W total in redundant mode; 200-240V only.]

Page 19: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation

dx360 M3 Refresh - Rack GPU Configuration

>42 high-performance GPU servers per rack
> iDataPlex efficiency drives more density on the floor
> In-rack networking will not reduce rack density, regardless of the topology required by the customer
>Rear Door Heat Exchanger provides further TCO value

Rack-level value:
- Greater density, easier to cool
- Flexibility of network topology without compromising density
- More density reduces the number of racks and power feeds in the data center
- The Rear Door Heat Exchanger provides the ultimate value in cooling and density

Page 20: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation

dx360 M3 Refresh - Server GPU Configuration

[Diagram: example 2U GPU server - four 2.5" simple-swap SAS drives (300 or 600GB, 10K RPM, 6Gbps; or SATA, 3.5", or SSD), an InfiniBand DDR adapter (or QDR, or 10GbE), and two NVIDIA "Fermi" M2050 GPUs (or M1060, FX3800, or Fusion-io).]

Server-level value:
> Each server is individually serviceable
> Balanced performance for demanding GPU workloads
> 6Gbps SAS drives and controller for maximum performance
> Service and support for server and GPU from IBM

Page 21: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 21

Intelligent Clusters Storage Portfolio Summary

> Intelligent Clusters BOM consists of the following storage components:
- Entry-level DS3000 series disk storage systems
- Mid-range DS4000 series disk storage systems
- High-end DS5000 series disk storage systems
- All standard hard disk drives (SAS/SATA/FC)
- Entry-level SAN fabric switches

>Majority of the HPC solutions use DS3000/DS4000 series disk storage with IBM GPFS parallel file system software

>A small percentage of HPC clusters use entry-level storage (DS3200/DS3300/DS3400/DS3500)

> Integrated business solutions (SAP-BWA, Smart Analytics, SoFS) use DS3500 storage (mostly)

>Smaller-size custom solutions use DS3000 entry-level storage

>A small percentage of special HPC bids use DDN DCS9550 storage

Page 22: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 22

Intelligent Clusters Storage Portfolio (Dec 2008)

- DS5020 (FC-SAN), DS5000 (FC-SAN)
- DS3400 (FC-SAN, SAS/SATA)
- DS3500 (SAS)
- DS3300 (iSCSI/SAS)
- EXP3000 storage expansion (JBOD)

Page 23: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 23

Topic Agenda

>Commodity Clusters

>Overview of Intelligent Clusters

>Cluster Hardware

>*Cluster Networking*

>Cluster Management Software Stack and Benchmarking

Page 24: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 24

Cluster Networking

>Networking is an integral part of any cluster system, providing communication across devices (including servers and storage) and carrying cluster management traffic

>All servers in the cluster - login, management, compute, and storage nodes - communicate over one or more network fabrics connecting them

>Typically clusters have one or more of the following networks:
- A cluster-wide management network
- A user/campus network through which users log in to the cluster and launch jobs
- A low-latency, high-bandwidth network such as InfiniBand used for inter-process communication
- A storage network used for communication across the storage nodes (optional)
- A Fibre Channel or Ethernet network (in the case of iSCSI traffic) used as the storage network fabric

Page 25: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation

InfiniBand Portfolio - Intelligent Cluster

QDR InfiniBand HCAs:
- ConnectX-2, dual port
- ConnectX-2, single port
- QLE 7340, single port

QDR InfiniBand switches:
- 4036: 1U, 36 ports
- 12200-36: 1U, 36 ports
- 12300-36: 1U managed, 36 ports
- InfiniScale IV: 1U, 36 ports
- Director class (InfiniScale IV): 6U, 108 ports; 10U, 216 ports; 17U, 324 ports; 29U, 648 ports
- Grid Director 4200: 11U, 110-160 ports
- Grid Director 4700: 18U, 324 ports
- 12800-180: 14U, 432 ports
- 12800-360: 29U, 864 ports

(Several of these items are new for the 10B release.)

Page 26: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation

Intelligent Cluster Ethernet Portfolio - Entry / Leaf / Top-of-Rack Switches

The portfolio spans industry low-cost, low-cost, and premium-brand options; the IBM FCX-48 was added in the Oct '10 BOM.

10G switches:
- Cisco 4900: 2U, 24 x 10Gb ports
- Blade G8124: 1U, 24 x 10Gb SFP+ ports

1G 48-port switches with 10G uplinks:
- SMC 8848M: 1U, 48 x 1Gb ports, 2 x 10Gb uplinks
- Cisco 4948: 1U, 48 x 1Gb ports, 2 x 10Gb optional uplinks
- Blade G8000-48: 1U, 48 x 1Gb ports, 4 x 10Gb uplinks
- IBM FCX-48 (Foxhound): 48 x 1Gb ports, 10Gb uplinks (iDataPlex)
- Force10 S60: 1U, 48 x 1Gb ports, up to 4 x 10Gb optional uplinks (stackable)

1G 24/48-port switches:
- SMC 8126L2: 1U, 26 x 1Gb ports
- SMC 8150L2: 1U, 50 x 1Gb ports
- Cisco 2960G-48: 1U, 48 x 1Gb ports
- Cisco 3750G-48: 1U, 48 x 1Gb ports with stacking

Page 27: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation

Ethernet Switch Portfolio - iDataPlex

Entry / Leaf / Top-of-Rack Switches; the Cisco 4948E was added in the Oct '10 BOM.

10G switches:
- Blade G8124: 1U, 24 x 10Gb SFP+ ports
- IBM B24X (TurboIron): 24 x 10Gb ports (iDataPlex)
- IBM DCN: 24-port 10Gb
- IBM DCN: 48-port 10Gb

1G 48-port switches with 10G uplinks:
- SMC 8848M: 1U, 48 x 1Gb ports, 2 x 10Gb uplinks (industry low cost)
- Cisco 4948E: 1U, 48 x 1Gb ports, 4 x 10Gb optional uplinks (premium brand)
- IBM B50C (NetIron 48): 1U, 48 x 1Gb ports with 2 x 10GbE optional (low cost)
- IBM FCX-48 (Foxhound): 48 x 1Gb ports, 10Gb uplinks (iDataPlex, low cost)
- Blade G8000-48: 1U, 48 x 1Gb ports, 4 x 10Gb uplinks (low cost)
- IBM J48 (Juniper EX4200-48): 48 x 1Gb ports, 10Gb uplinks, 2 VC ports (iDataPlex, premium brand)
- Force10 S60: 1U, 48 x 1Gb ports, 4 x 10Gb uplinks (alternative premium brand, stackable)

Page 28: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation

Ethernet Switch Portfolio - Intelligent Cluster

Core & Aggregate Switches and 10GbE HPC Adapters
(All core switches & 10GbE adapters are tested for compatibility with iDataPlex; the adapters below were added in the Oct '10 BOM.)

Core & aggregate switches:
- Cisco 6509-E: 15U, 9 slots, 384 x 1Gb ports, 32 x 10Gb ports
- IBM B08R (BigIron): 8 slots, 384 x 1Gb ports, 32 x 10Gb ports
- IBM B16R (BigIron): 16 slots, 768 x 1Gb ports, 256 x 10Gb ports
- Voltaire 8500: 15U, 12 slots, 288 x 10Gb ports
- Force10 E600i: 16U, 7 slots, 633 x 1Gb ports, 112 x 10Gb ports
- Force10 E1200i: 21U, 14 slots, 1260 x 1Gb ports, 224 x 10Gb ports
- Juniper 8208: 14U, 8 slots, 384 x 1Gb ports, 64 x 10Gb ports
- Juniper 8216: 21U, 16 slots, 768 x 1Gb ports, 128 x 10Gb ports

10GbE HPC adapters:
- Chelsio dual-port T3 SFP+ 10GbE PCI-E x8 line-rate adapter
- Chelsio dual-port T3 CX4 10GbE PCI-E x8 line-rate adapter
- Chelsio dual-port T3 10GbE CFFh high-performance daughter card for blades
- Mellanox ConnectX-2 EN 10GbE PCI-E x8 line-rate adapter

Page 29: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 29

High-speed Networking

>Many HPC applications are sensitive to network bandwidth and latency for performance

>Primary choices for high-speed cluster networking: InfiniBand and 10 Gigabit Ethernet (emerging)

> InfiniBand is an industry-standard low-latency, high-bandwidth server interconnect, ideal for carrying multiple traffic types (clustering, communications, storage, management) over a single connection

>10 Gigabit Ethernet (10GbE or 10GigE) is the IEEE 802.3ae Ethernet standard, which defines Ethernet technology with a data rate of 10 Gbit/s; it is the follow-on to 1 Gigabit Ethernet technology

Page 30: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 30

InfiniBand

>An industry standard low-latency, high-bandwidth server interconnect

> Ideal to carry multiple traffic types (clustering, communications, storage, management) over a single physical connection

>Serial I/O interconnect architecture operating at a base speed of 5Gb/s in each direction with DDR and 10Gb/s in each direction with QDR

>Provides the highest node-to-node bandwidth available today: 40Gb/s in each direction (80Gb/s bidirectional) with Quadruple Data Rate (QDR) technology

>Lowest end-to-end messaging latency, on the order of microseconds (1.2-1.5 µsec)

>Wide-industry adoption and multiple vendors (Mellanox, Voltaire, QLogic, etc.)

>Open source drivers and libraries are available for users (OFED)

InfiniBand Peak Bidirectional Bandwidth Table (per-lane signaling rate: SDR 2.5Gb/s, DDR 5Gb/s, QDR 10Gb/s, EDR 20Gb/s)

1x:  SDR (2.5 + 2.5) Gb/s | DDR (5 + 5) Gb/s   | QDR (10 + 10) Gb/s   | EDR (20 + 20) Gb/s
4x:  SDR (10 + 10) Gb/s   | DDR (20 + 20) Gb/s | QDR (40 + 40) Gb/s   | EDR (80 + 80) Gb/s
8x:  SDR (20 + 20) Gb/s   | DDR (40 + 40) Gb/s | QDR (80 + 80) Gb/s   | EDR (160 + 160) Gb/s
12x: SDR (30 + 30) Gb/s   | DDR (60 + 60) Gb/s | QDR (120 + 120) Gb/s | EDR (240 + 240) Gb/s
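Each per-direction figure in the table is just the per-lane signaling rate multiplied by the lane count. As a small illustration (not part of the original deck), a few lines of Python reproduce the table and the 4x QDR figure cited above:

```python
# Illustrative sketch: reproduce the InfiniBand peak-bandwidth table above.
# Per-lane signaling rates in Gb/s for each generation (from the table).
RATES_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0, "EDR": 20.0}

def peak_bandwidth(lanes, generation):
    """Return (per-direction, bidirectional) peak bandwidth in Gb/s."""
    per_direction = lanes * RATES_GBPS[generation]
    return per_direction, 2 * per_direction

if __name__ == "__main__":
    for lanes in (1, 4, 8, 12):
        row = []
        for gen in ("SDR", "DDR", "QDR", "EDR"):
            each_way, _both = peak_bandwidth(lanes, gen)
            row.append(f"{gen} ({each_way:g} + {each_way:g}) Gb/s")
        print(f"{lanes}x: " + " | ".join(row))
    # 4x QDR -> (40 + 40) Gb/s, matching the "40Gb/s in each direction" figure above.
```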

Page 31: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation

InfiniBand Portfolio - Intelligent Cluster (this slide repeats the QDR InfiniBand HCA and switch portfolio shown on Page 25).

Page 32: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 32

10 Gigabit Ethernet

>10GbE or 10GigE is an IEEE Ethernet standard 802.3ae, which defines Ethernet technology with data rates of 10 Gbits/sec

>Enables applications to take advantage of 10Gbps Ethernet

>Requires no changes to the application code

>High-speed interconnect choice for “loosely-coupled” HPC applications

>Wide industry support for 10GbE technology

>Growing user adoption for Data Center Ethernet (DCE) and Fibre Channel Over Ethernet (FCoE) technologies

> Intelligent Clusters supports 10GbE technologies at both the node level and the switch level, providing multiple vendor choices for adapters and switches (BNT, SMC, Force10, Brocade, Cisco, Chelsio, etc.)

Page 33: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 33

Topic Agenda

>Commodity Clusters

>Overview of Intelligent Clusters

>Cluster Hardware

>Cluster Networking

>*Cluster Management, Software Stack and Benchmarking*

Page 34: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 34

Cluster Management - xCAT

>xCAT - Extreme Cluster (Cloud) Administration Toolkit
- Open source Linux/AIX/Windows scale-out cluster management solution
- Leverages best practices for deploying and managing clusters at scale
- Scripts only (no compiled code)
- Community requirements driven

>xCAT capabilities (a short illustrative sketch follows below)
- Remote hardware control: power, reset, vitals, inventory, event logs, SNMP alert processing
- Remote console management: serial console, SOL, logging / video console (no logging)
- Remote OS boot target control: local/SAN boot, network boot, iSCSI boot
- Remote automated unattended network installation

For more information on xCAT, go to http://xcat.sf.net
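As a rough illustration of the remote hardware control capabilities listed above, the sketch below wraps a few xCAT commands (nodels, rpower, rinv) with Python's subprocess module; the node group name "compute" is a hypothetical example, and the exact output depends on your xCAT configuration:

```python
# Minimal sketch: drive xCAT remote hardware control from Python via subprocess.
# Assumes xCAT is installed on the management node and a node group named
# "compute" exists (hypothetical name).
import subprocess

def xcat(*args):
    """Run an xCAT command and return its stdout."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    print(xcat("nodels", "compute"))          # list nodes in the group
    print(xcat("rpower", "compute", "stat"))  # remote power status
    print(xcat("rinv", "compute", "all"))     # remote hardware inventory
    # xcat("rpower", "compute", "on")         # power the group on (left commented out)
```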

Page 35: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 35

Cluster Software Stack

IBM GPFS - General Parallel File System: a high-performance, scalable file management solution

>Provides fast and reliable access to a common set of file data from a single computer to hundreds of systems

>Brings together multiple systems to create a truly scalable cloud storage infrastructure

>GPFS-managed storage improves disk utilization and reduces footprint, energy consumption, and management effort

>GPFS removes client-server and SAN file system access bottlenecks

>All applications and users share all disks, with dynamic re-provisioning capability

Technology:
>OS support: Linux (on POWER and x86), AIX, Windows
> Interconnect support (with TCP/IP): 1GbE and 10GbE, InfiniBand (RDMA in addition to IPoIB), Myrinet, IBM HPS

[Diagram: GPFS nodes on the LAN accessing shared disks over a SAN.]

Page 36: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 36

What is GPFS ?

> IBM’s shared disk, parallel cluster file system.

>Product available on pSeries/xSeries clusters with AIX/Linux

>Used on many of the largest supercomputers in the world

>Cluster: 2400+ nodes, fast reliable communication, common admin domain.

>Shared disk: all data and metadata on disk accessible from any node through disk I/O interface.

>Parallel: data and metadata flows from all of the nodes to all of the disks in parallel.

[Diagram: GPFS file system nodes connected through a switching fabric (system or storage area network) to shared disks (SAN-attached or network block device).]

For more information on IBM GPFS, go to http://www-03.ibm.com/systems/clusters/software/gpfs/index.html
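As a supplementary sketch (not from the deck), an administrator could query a GPFS cluster from Python through the standard GPFS command-line tools; the file system device name "gpfs0" is a hypothetical example:

```python
# Illustrative sketch: query GPFS state and capacity via its command-line tools.
# Assumes the GPFS commands are on PATH (typically /usr/lpp/mmfs/bin) and a
# file system device named "gpfs0" exists (hypothetical name).
import subprocess

def run(cmd):
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    print(run(["mmgetstate", "-a"]))  # GPFS daemon state on all nodes
    print(run(["mmlsfs", "gpfs0"]))   # file system attributes (block size, replication, ...)
    print(run(["mmdf", "gpfs0"]))     # free/used capacity per disk
```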

Page 37: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 37

>Resource managers/schedulers queue, validate, manage, load-balance, and launch user programs/jobs (a small submission sketch follows below)

>Torque - Portable Batch System (free); works with the Maui Scheduler (free)

>LSF - Load Sharing Facility (commercial)

>Sun Grid Engine (free)

>Condor (free)

>MOAB Cluster Suite (commercial)

>LoadLeveler (commercial scheduler from IBM)

Resource Managers/Schedulers

[Diagram: users 1..N submit jobs to a job queue; the job scheduler and resource manager dispatch the jobs to nodes 1..N.]
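To make the queue-and-launch flow concrete, here is a minimal sketch (not part of the deck) that submits a batch job through Torque/PBS via qsub; the queue name, resource request, and application name are hypothetical examples:

```python
# Minimal sketch: submit a batch job to a Torque/PBS resource manager from Python.
# Assumes qsub/qstat are available; queue "batch" and ./my_app are example names.
import subprocess

JOB_SCRIPT = """#!/bin/bash
#PBS -N hello_cluster
#PBS -q batch
#PBS -l nodes=2:ppn=8,walltime=00:10:00
cd $PBS_O_WORKDIR
mpirun ./my_app
"""

def submit(script):
    """Pipe a job script to qsub and return the job ID it prints."""
    result = subprocess.run(["qsub"], input=script, capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()

if __name__ == "__main__":
    job_id = submit(JOB_SCRIPT)
    print("submitted job:", job_id)
    print(subprocess.run(["qstat", job_id], capture_output=True, text=True).stdout)
```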

Page 38: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 38

Message Passing Libraries

>Enable inter-process communication among processes of an application running across multiple nodes in the cluster (or on a symmetric multi-processing system)

>"Mask" the underlying interconnect from the user application, allowing the application programmer to use a "virtual" communication environment as the reference for programming cluster applications

>Two common models: the Message Passing Interface (MPI) and Parallel Virtual Machine (PVM)

>Implementations and the interconnects they support (a minimal MPI example follows this list):
- Included with most Linux distributions (open source): IP (Ethernet), GM (Myrinet)
- Linda (commercial): IP (Ethernet)
- MPICH2 (free): IP (Ethernet), MX (Myrinet), InfiniBand
- LAM-MPI (free): IP (Ethernet)
- Scali (commercial): IP (Ethernet), MX (Myrinet), InfiniBand
- OpenMPI (free): IP (Ethernet), InfiniBand
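A minimal MPI example, assuming the mpi4py Python bindings on top of one of the MPI implementations listed above (for example OpenMPI or MPICH2); the same code runs unchanged over Ethernet or InfiniBand because the library hides the interconnect:

```python
# Minimal MPI ping-pong sketch using mpi4py (run with at least 2 ranks).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

buf = np.zeros(1 << 20, dtype=np.uint8)  # 1 MiB message buffer

if rank == 0:
    comm.Send(buf, dest=1, tag=0)        # ping
    comm.Recv(buf, source=1, tag=1)      # pong
    print("rank 0: ping-pong complete")
elif rank == 1:
    comm.Recv(buf, source=0, tag=0)
    comm.Send(buf, dest=0, tag=1)
```

Launch with, for example, "mpirun -np 2 python pingpong.py".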

Page 39: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 39

Compilers & Other tools

>Compilers are critical for creating optimized binary code that takes advantage of specific processor architectural features, so that the application can exploit the full power of the system and run most efficiently

>Respective processor vendors typically have the best compilers for their processors – e.g. Intel, AMD, IBM, SGI, Sun, etc.

>Compilers are important to produce the best code for HPC applications as individual node performance is a critical factor for the overall cluster performance

>Open source and commercial compilers are available such as the GNU GCC compiler suite (C/C++, Fortran 77/90) (Free), and PathScale (owned by QLogic) compilers

>Support libraries and debugger tools are also packaged and made available with the compilers, such as Math libraries (e.g. Intel Math Kernel Libraries, AMD Core Math Library) and debuggers such as gdb (GNU debugger) and TotalView debugger used for debugging parallel applications on clusters

Page 40: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 40

HPC Software Stack

Intelligent Clusters supports a broad range of HPC software from industry-leading suppliers. Software is available directly from IBM or from the respective solution providers.

Functional Area / Software Product (Source) / Comments

Cluster systems management:
- xCAT2, IBM Director (IBM) - IBM CSM functionality is now merged into xCAT2

File systems:
- General Parallel File System (GPFS) for Linux; GPFS for Linux on POWER (IBM)
- PolyServe Matrix Server File System (HP)
- NFS (open source)
- Lustre (open source)

Workload management:
- OpenPBS (open source)
- PBS Pro (Altair)
- LoadLeveler (IBM)
- LSF (Platform Computing)
- MOAB (Cluster Resources) - commercial version of the Maui scheduler
- GridServer (DataSynapse)
- Maui Scheduler (open source) - interfaces to many schedulers

Message Passing Interface solutions:
- Scali MPI Connect™ (Scali)

Compilers:
- PGI Fortran 77/90, C/C++ (The Portland Group, STMicroelectronics) - 32/64-bit support
- Intel Fortran/C/C++ (Intel)
- NAG Fortran/C/C++ (NAG) - 32/64-bit
- Absoft® compilers (Absoft)
- PathScale™ compilers (PathScale) - AMD Opteron
- GCC (open source)

Page 41: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 41

HPC Software Stack (continued)

Functional Area / Software Product (Source) / Comments

Debuggers/tracers:
- TotalView (Etnus)
- CodeAnalyst (AMD) - timer/event profiling, pipeline simulations
- Fx2 Debugger™ (Absoft)
- Distributed Debugging Tool (DDT) (Allinea)

Math libraries:
- ACML, AMD Core Math Libraries (AMD/NAG) - BLAS, FFT, LAPACK
- Intel Integrated Performance Primitives (Intel)
- Intel Math Kernel Library (Intel)
- Intel Cluster Math Kernel Library (Intel)
- IMSL™, PV-WAVE® (Visual Numerics)

Message passing libraries:
- MPICH (open source) - TCP/IP networks
- MPICH-GM (Myricom) - Myrinet networks
- SCA TCP Linda™ (SCA)
- WMPI II™ (Critical Software)

Parallelization tools:
- TCP Linda® (SCA)

Interconnect management:
- Scali MPI Connect (Scali)

Performance tuning:
- Intel VTune™ Performance Analyzer (Intel)
- Optimization and Profiling Tool (OPT) (Allinea)
- High Performance Computing Toolkit (IBM) - http://www.research.ibm.com/actc

Threading tools:
- Intel Thread Checker (Intel)

Trace tools:
- Intel Trace Analyzer and Collector (Intel)

Page 42: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 42

Cluster Benchmarking

Benchmarking: running well-known reference applications on a cluster in order to exercise various system components and measure the cluster's performance characteristics (e.g. network bandwidth, latency, FLOPS). A rough illustrative sketch follows the list below.

>STREAM (memory access latency and bandwidth): http://www.cs.virginia.edu/stream/ref.html

>Linpack, the TOP500 benchmark: solves a dense system of linear equations; you are allowed to tune the problem size and benchmark parameters to optimize for your system: http://www.netlib.org/benchmark/hpl/index.html

>HPC Challenge: a set of HPC benchmarks that test various subsystems of a cluster: http://icl.cs.utk.edu/hpcc/

>SPEC: a set of commercial benchmarks that measure the performance of various server subsystems: http://www.spec.org/

>NAS 2.3 Parallel Benchmarks: http://www.nas.nasa.gov/Resources/Software/npb.html

> Intel MPI Benchmarks (previously the Pallas benchmarks): http://software.intel.com/en-us/articles/intel-mpi-benchmarks/

>Ping-Pong: a common MPI benchmark to measure point-to-point latency and bandwidth

>Customer's own code: provides a good representation of system performance specific to the application code
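As a rough, illustrative counterpart to the STREAM and Linpack entries above (not the official benchmarks, whose codes are linked in the list), a NumPy sketch can estimate a single node's memory bandwidth with a triad-style operation and its floating-point rate with a dense solve:

```python
# Rough single-node sketch in the spirit of STREAM (triad) and Linpack (dense solve).
# Not the official benchmarks; use the real STREAM/HPL codes for reportable numbers.
import time
import numpy as np

def triad_bandwidth(n=20_000_000, reps=5):
    """Estimate memory bandwidth (GB/s) for a = b + s * c."""
    b, c = np.random.rand(n), np.random.rand(n)
    s, best = 3.0, float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        a = b + s * c
        best = min(best, time.perf_counter() - t0)
    bytes_moved = 3 * n * 8          # read b, read c, write a (8 bytes each)
    return bytes_moved / best / 1e9

def solve_gflops(n=2000):
    """Estimate GFLOP/s from a dense solve, using the ~(2/3)n^3 LU flop count."""
    A, x = np.random.rand(n, n), np.random.rand(n)
    t0 = time.perf_counter()
    np.linalg.solve(A, x)
    elapsed = time.perf_counter() - t0
    return (2.0 / 3.0) * n**3 / elapsed / 1e9

if __name__ == "__main__":
    print(f"triad bandwidth : {triad_bandwidth():.1f} GB/s")
    print(f"dense solve rate: {solve_gflops():.1f} GFLOP/s")
```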

Page 43: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 43

Summary

> A cluster system is created out of commodity server hardware, high-speed networking, storage, and software technologies

> High-performance computing (HPC) takes advantage of cluster systems to solve complex problems in various industries

> IBM Intelligent Clusters provides a one-stop-shop for creating and deploying HPC solutions using IBM servers and third party Networking, Storage and Software

> InfiniBand, Myrinet (MX and Myri-10G), and 10Gigabit Ethernet technologies are more commonly used as the high-speed interconnect solution for Clusters

> IBM GPFS parallel file system provides a highly-scalable, and robust parallel file system and storage virtualization solution for Clusters and other general-purpose computing systems

> xCAT is an open-source, scalable cluster deployment and Cloud hardware management solution

> Cluster benchmarking enables performance analysis, debugging and tuning capabilities for extracting optimal performance from Clusters by isolating and fixing critical bottlenecks

> Message-passing middleware enables developing HPC applications for Clusters

> Several commercial software tools are available for Cluster computing

Page 44: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 44

Glossary of Terms

>Commodity Cluster

> InfiniBand

>Message Passing Interface (MPI)

>Extreme Cluster (Cloud) Administration Toolkit (xCAT)

>Network-attached storage (NAS)

>Cluster VLAN

>Message-Passing Libraries

>Management Node

>High Performance Computing (HPC)

>Roll Your Own (RYO)

>BP Integrated

>Distributed Network Topology

> Intelligent Clusters

>General Parallel File System (GPFS)

>Direct-attached storage (DAS).

> iDataPlex

> Inter-node communication

>Compute Network

>Centralized Network Topology

> IBM Racked and Stacked

>Leaf Switch

>Core/aggregate Switch

>Quadruple Data Rate

>Storage Area Network (SAN)

>Parallel Virtual Machine (PVM)

>Benchmarking

Page 45: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 45

Additional Resources

>IBM STG SMART Zone for more education:

Internal: http://lt.be.ibm.com

BP: http://lt2.portsmouth.uk.ibm.com/

>IBM System x

http://www-03.ibm.com/systems/x/

>IBM ServerProven

http://www-03.ibm.com/servers/eserver/serverproven/compat/us/

>IBM System x Support

http://www-947.ibm.com/support/entry/portal/

>IBM System Intelligent Clusters

http://www-03.ibm.com/systems/x/hardware/cluster/index.html

Page 46: Xtw01t7v021711 cluster

IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation 46

Trademarks

•The following are trademarks of the International Business Machines Corporation in the United States, other countries, or both.

>Not all common law marks used by IBM are listed on this page. Failure of a mark to appear does not mean that IBM does not use the mark nor does it mean that the product is not actively marketed or is not significant within its relevant market.

>Those trademarks followed by ® are registered trademarks of IBM in the United States; all others are trademarks or common law marks of IBM in the United States. For a complete list of IBM trademarks, see www.ibm.com/legal/copytrade.shtml

•The following are trademarks or registered trademarks of other companies.

>Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.

>Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefore.

>Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

>Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

>Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

>UNIX is a registered trademark of The Open Group in the United States and other countries.

>Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

>ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.

>IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.

•All other products may be trademarks or registered trademarks of their respective companies

>Notes:

>Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.

>IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.

>All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.

>This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.

>All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

>Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products

