HPC on Wall Street 2009 | Carl Trieloff – Red Hat | Lee Fisher – HP
FROM HPC TO THE CLOUD WITH AMQP AND OPEN SOURCE SOFTWARE
Carl Trieloff, cctrieloff@redhat.com, Red Hat
Lee Fisher, lee.fisher@hp.com, Hewlett-Packard
High Performance Computing on Wall Street conference, 14 September 2009
From simulation to trade
[Diagram labels: trader, scheduler, internal pool, another internal division, external resource (e.g. EC2), trade, messaging]
Latency
Scale up
Scale out
Grid
Red Hat Enterprise MRG
Integrated platform for high performance distributed computing
High speed, interoperable, open standard Messaging
Deterministic, low-latency Realtime kernel
High performance & throughput computing Grid scheduler for distributed workloads and Cloud computing
AMQP, HP Performance, scale up.
Two Intel(R) Xeon(R) CPU X5570 @ 2.93GHz per blade (Nehalem 2.93 GHz, 8MB L3 cache, 95W)
Memory: 24GB (6x4GB), memory type DDR3-1333, HT, Turbo 2/2/3/3
Infiniband 4X QDR IB dual-port mezzanine HCAs (1 port connected)
Infiniband switch: BLc 4X QDR IB Switch
[Chart: Single HP Nehalem BL460c 40G Infiniband AMQP Perftest. X axis: Number of Brokers on the Server (8, 4, 2, 1). Y axis: Messages/Sec, 0 to 12,000,000. Series: 8 bytes, 64 bytes, 256 bytes, 1024 bytes.]
AMQP Messaging on 8-node HP Nehalem, Infiniband 40Gbps: > 11 M msg/s
[Chart: X axis: Number of Brokers per Server (4, 2, 1). Y axis: Messages/Sec, 0 to 7,000,000. Secondary axis: 0 to 3. Series: Nehalem, Harpertown, % Nehalem vs Harpertown. Data labels: 3.1, 3.1, 3.1.]
KVM Performance – AMQP Messaging, Intel Nehalem, 2 x 10Gbit, VT-d: > 1 M msg/s
[Chart: RHEL 5.4 KVM AMQP 2-Guest. X axis: Msg Size (bytes): 16, 32, 64, 128, 256, 512, 1024, 2048, 4096. Y axes: Messages/Sec, 0 to 1,200,000; Throughput MB/sec, 0 to 900. Series: Msg/sec, Throughput MB/sec. Msg/sec data labels in printed order: 1046081, 1023869, 902689, 880965, 804045, 741297, 555465, 369145, 210634.]
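The chart plots both a message rate and a throughput in MB/sec. As a rough sketch of how the two relate, the snippet below multiplies message rate by payload size. It assumes 1 MB = 10**6 bytes and ignores protocol overhead; the function name and the example inputs are illustrative, not readings from the chart.

```python
def throughput_mb_per_sec(msgs_per_sec: float, msg_size_bytes: int) -> float:
    """Payload throughput in MB/s, assuming 1 MB = 10**6 bytes and no framing overhead."""
    return msgs_per_sec * msg_size_bytes / 1_000_000


# Illustrative inputs only (round numbers, not chart data):
# a high rate of small messages vs. a lower rate of large messages.
for rate, size in [(1_000_000, 16), (200_000, 4096)]:
    print(f"{rate} msg/s x {size} B = {throughput_mb_per_sec(rate, size):.1f} MB/s")
```

Small messages stress the per-message rate while large messages stress bandwidth, which is why the two series move in opposite directions as message size grows.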
MRG Messaging Infiniband RDMA Latency: Under 40 Microseconds, Reliably Acknowledged
[Chart: MRG Messaging Latency Test on HP BL460c G6 Infiniband, 100K Message Rate. X axis ticks: 1 to 99. Y axis: Average Latency (ms), 0.0340 to 0.0480. Series: 32 Bytes RDMA Nehalem, 256 Bytes RDMA Nehalem, 1024 Bytes RDMA Nehalem.]
Components of the Solution Stack
Solutions still matter in an industry-standard, open source world…
HP reduced-SMI BIOSes
Red Hat MRG – Realtime
Red Hat / HP Systems
Red Hat MRG – Messaging / Grid
Red Hat MRG – Tuning tools
Tuning & working in labs
HP – Voltaire / Red Hat RDMA
HP compute & storage
Determinism and performance need to work at each layer; HP & Red Hat are partnered across the stack
FSI-HPC Solution Stack
X86-64 Server Architecture
BIOS
Operating System
Server Interconnect L2 Fabric
Integrated Systems
Workload Middleware
Application Environment
Users
Services
Hardware matters…
Scale-Up Blades
Scale-Out Rack-Optimized SL6000
HP Low Latency Lab with MRG+
Red Hat MRG Lab with HP BL460/BL685 & IB
Today’s RFP Metrics: Performance/Watt, Performance/BTU, Performance/Rack
Dealing with SMIs
HP BIOS Option for Low Latency Apps
Disable frequent SMIs used for Dynamic Power Savings Mode, CPU Utilization monitoring, P-state monitoring and ECC reporting
Benefits both RHEL & MRG operating environments.
[Figures: latency spikes with standard BIOS settings; latencies when SMIs disabled in BIOS]
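A rough way to look for stalls of the kind SMIs cause is to busy-poll a clock and record any gap between consecutive reads that exceeds a threshold. The sketch below is an assumption-laden illustration, not the tool used for the plots: the function name, threshold, and duration are hypothetical, and in Python a gap can equally come from the interpreter, garbage collection, or OS scheduling, so it only shows the measurement idea.

```python
import time
from typing import Callable, List


def find_latency_spikes(duration_s: float = 0.2,
                        threshold_us: float = 100.0,
                        clock_ns: Callable[[], int] = time.perf_counter_ns) -> List[float]:
    """Busy-poll clock_ns and return every gap (in microseconds) above threshold_us."""
    spikes: List[float] = []
    start = prev = clock_ns()
    deadline = start + int(duration_s * 1e9)
    while prev < deadline:
        now = clock_ns()
        gap_us = (now - prev) / 1000.0
        if gap_us > threshold_us:
            # The CPU was taken away from this loop for longer than expected.
            spikes.append(gap_us)
        prev = now
    return spikes


if __name__ == "__main__":
    found = find_latency_spikes()
    worst = max(found, default=0.0)
    print(f"{len(found)} gaps over threshold; worst = {worst:.1f} us")
```

Running it before and after a BIOS change, on an otherwise idle and pinned core, gives a crude before/after comparison; a compiled tool with a realtime priority would be needed for trustworthy numbers.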
MRG – Realtime RHEL on HP systems
Enables applications and transactions to run predictably, with guaranteed response times
Upgrades RHEL 5 to realtime OS
Provides replacement kernel for RHEL5; x86/x86_64
Preserves RHEL Application Compatibility
Certified on HP hardware, see Red Hat / HP certifications
[Chart: Response time vs. Time]
MRG Realtime Scheduling Latency
Vanilla: Min: 1, Max: 2857, Mean: 11.47, Mode: 9.00, Median: 9.00, Std. Deviation: 54.94
MRG RT: Min: 4, Max: 43, Mean: 8.34, Mode: 8.00, Median: 8.00, Std. Deviation: 1.49
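The same six summary figures can be computed from any list of latency samples. The sketch below uses Python's standard `statistics` module; the sample values are made up for illustration, and the choice of the sample standard deviation (`statistics.stdev`) is an assumption, since the slide does not say which form was used.

```python
import statistics
from typing import Dict, Sequence


def summarize(latencies: Sequence[float]) -> Dict[str, float]:
    """Return min, max, mean, mode, median and sample standard deviation."""
    return {
        "min": min(latencies),
        "max": max(latencies),
        "mean": statistics.mean(latencies),
        "mode": statistics.mode(latencies),
        "median": statistics.median(latencies),
        "stdev": statistics.stdev(latencies),
    }


# Made-up samples: the second list has one large outlier.
steady = [8, 8, 9, 8, 7, 10]
with_outlier = [8, 8, 9, 8, 7, 2000]

for label, data in [("steady", steady), ("with outlier", with_outlier)]:
    stats = summarize(data)
    print(label, {k: round(v, 2) for k, v in stats.items()})
```

A single outlier barely moves the mode and median but dominates the max and standard deviation, which is why those two columns separate the two kernels most clearly.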
Networking matters…
Voltaire DDR and QDR InfiniBand:
RoEE – RDMA on Enhanced Ethernet
RoEE is defined to be a verbs-compliant IB transport running over the emerging IEEE Converged Enhanced Ethernet standard
www.openfabrics.org/archives/spring2009sonoma/monday/grun.pdf
[Switch photo labels: 36 QDR QSFP ports, Ethernet management port, serial port, USB port, LEDs]
Test Configuration:
Two Nehalem-based servers w/ ConnectX PCI-E HCAs, back-to-back
QDR – ConnectX HCA running at QDR
DDR – ConnectX HCA running at DDR
RHEL5 Update 2
Mellanox VERBs Performance Test
MRG Grid
Provides leading high performance & high throughput computing:
  Brings advantages of scale-out and flexible deployment to any application or workload
  Delivers better asset utilization, allowing applications to take advantage of all available computing resources
Enables building cloud infrastructure and aggregating multiple clouds:
  Integrated support for virtualization as well as public clouds
  Seamlessly aggregates multiple cloud resources into one compute pool
Provides seamless and flexible computing across:
  Local grids
  Remote grids
  Private and hybrid clouds
  Public clouds (Amazon EC2)
  Cycle-harvesting from desktop PCs
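As a toy illustration of treating several resource pools as one compute pool, the sketch below places each job in the first pool, in preference order, that still has a free slot. The `Pool` class, the pool names, and the slot counts are all hypothetical; this is not how MRG Grid or Condor is implemented, only a picture of the overflow idea.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Pool:
    name: str
    free_slots: int


def place_jobs(jobs: List[str], pools: List[Pool]) -> Dict[str, Optional[str]]:
    """Assign each job to the first pool (in preference order) with a free slot."""
    placement: Dict[str, Optional[str]] = {}
    for job in jobs:
        placement[job] = None  # stays None if every pool is full
        for pool in pools:
            if pool.free_slots > 0:
                pool.free_slots -= 1
                placement[job] = pool.name
                break
    return placement


# Hypothetical pools, ordered from most to least preferred.
pools = [Pool("local-grid", 2), Pool("remote-grid", 1), Pool("ec2", 2)]
result = place_jobs([f"job{i}" for i in range(6)], pools)
for job, pool_name in result.items():
    print(job, "->", pool_name or "unplaced (stays queued)")
```

Ordering the pools by preference keeps work on local capacity first and spills to remote or public resources only when the nearer pools are full.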
Based on Condor and Includes:
Enterprise Supportability: From Red Hat
Web-Based Management Console: Unified management across all of MRG for job, system, license management, and workload management/monitoring
Low Latency Scheduling: Enable job submission to Condor via AMQP Messaging clients. Enable sub-second, low-latency scheduling for sub-second jobs
Virtualization Support via libvirt Integration: Support scheduling of virtual machines on Linux using libvirt APIs
Cloud Integration with Amazon EC2: Enable automatic cloud provisioning, job submission, results storage, teardown via Condor scheduler. Extensible: it can be a dependency for other jobs or executed based on rules (e.g. add capacity in the cloud if the local grid is out of capacity)
Concurrency Limits: Set limits on how much of a certain resource (e.g. software licenses, db connections) can be used at once
Dynamic Slots: Mark slots as partitionable and sub-divide them dynamically so that more than one job can occupy a slot at once
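The last two features can be pictured with a small model: a partitionable slot hands out CPUs to several jobs at once, and a named limit caps how many running jobs may hold a given resource. Everything here (`Job`, `PartitionableSlot`, `Matchmaker`, the `db_license` limit) is a hypothetical sketch for illustration and does not reflect Condor's actual configuration or internals.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Job:
    name: str
    cpus: int
    limits: List[str] = field(default_factory=list)  # named resources the job holds


@dataclass
class PartitionableSlot:
    total_cpus: int
    used_cpus: int = 0
    running: List[str] = field(default_factory=list)

    def try_claim(self, job: Job) -> bool:
        """Carve job.cpus out of the slot if enough CPUs remain."""
        if self.used_cpus + job.cpus > self.total_cpus:
            return False
        self.used_cpus += job.cpus
        self.running.append(job.name)
        return True


class Matchmaker:
    def __init__(self, slot: PartitionableSlot, limits: Dict[str, int]):
        self.slot = slot
        self.limits = limits            # resource name -> max concurrent holders
        self.in_use: Dict[str, int] = {}

    def submit(self, job: Job) -> bool:
        # Reject if any named resource is already at its limit.
        for name in job.limits:
            if self.in_use.get(name, 0) >= self.limits.get(name, float("inf")):
                return False
        if not self.slot.try_claim(job):
            return False
        for name in job.limits:
            self.in_use[name] = self.in_use.get(name, 0) + 1
        return True


mm = Matchmaker(PartitionableSlot(total_cpus=8), {"db_license": 2})
jobs = [Job("a", 2, ["db_license"]), Job("b", 2, ["db_license"]),
        Job("c", 2, ["db_license"]), Job("d", 4), Job("e", 2)]
results = {job.name: mm.submit(job) for job in jobs}
print(results, mm.slot.running)
```

In this run job "c" is refused by the license limit even though CPUs are free, while job "e" is refused because the slot is fully sub-divided, which shows the two constraints acting independently.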
Testing and developing solutions working together…
...Delivered in reference papers & certifications
Red Hat / HP White Paper:
[Chart: Throughput Memory Usage. X axis: 1-GigE, 10-GigE, IPoIB, IB SDP, IB RDMA. Y axis: 60 to 74. Series: cache, buff, free.]
Additional Information
www.redhat.com/mrg
www.hp.com/go/fsi