Page 1: Maximizing Six-Core AMD Opteron Processor Performance …docs.huihoo.com/redhat/2009/bsarathy_11_maximizing_amd.pdf · • Improves HyperTransport™ technology link efficiency and

Red Hat Summit 2009 | Bhavna Sarathy

Maximizing Six-Core AMD Opteron™ Processor Performance with RHEL

Bhavna Sarathy, Red Hat Technical Lead, AMD

Sanjay Rao, Senior Software Engineer, Red Hat

Sept 4, 2009


Agenda

• Six-Core AMD Opteron™ processor codenamed “Istanbul” – overview

• Six-Core AMD Opteron™ processor feature support

• Continued virtualization support

• New Innovations

• Red Hat Enterprise Linux software support

• Performance benchmarking results

• Conclusions


Six-Core AMD Opteron™ Processor (“Istanbul”)

• Six True Cores
• New HyperTransport™ Technology HT Assist
• Increased HyperTransport™ 3.0 Technology (HT3) Bandwidth
• Higher Performing Integrated Memory Controller
• Same power/thermal envelopes as Quad-Core AMD Opteron™ Processor
• Continued AMD Virtualization™ (AMD-V™) technology support, Rapid Virtualization Indexing


Prior Generation Innovations that Continue

All the performance-enhancing features of Quad-Core AMD Opteron™ processor:

• AMD Wide Floating-Point Accelerator
• Dual Dynamic Power Management™
• AMD Memory Optimizer Technology
• AMD Virtualization™ (AMD-V™) technology
• AMD Balanced Smart Cache
• HyperTransport™ 3 Technology

[Diagram labels: two CPUs linked by HT3; VM1 and VM2 with Virtual Memory 1 and Virtual Memory 2 mapped onto Physical Memory]


Prior Generation Innovations that Continue

All the power-efficiency features of Quad-Core AMD Opteron™ processor:

• AMD PowerCap manager
• Core Select
• Independent Dynamic Core Technology
• AMD CoolCore™ Technology
• Dual Dynamic Power Management™
• Low-Power DDR2 Memory
• AMD Smart Fetch technology


New Innovations in Six-Core AMD Opteron™ processor (“Istanbul”)

• Six cores per socket
  • Six-core support for F (1207) socket infrastructure
  • Improves performance (compared to Quad-Core AMD Opteron™ processor)
• HT Assist, in multi-socket systems:
  • Reduces probe traffic
  • Resolves probes more quickly
• Higher HyperTransport™ 3.0 Technology speeds
  • Support for up to 4.8GT/s per link
  • Improves overall system performance


HT Assist: What is it?

• Micro-architectural feature in Six-Core AMD Opteron™ processor
• Helps reduce memory latency
• Helps increase overall system performance in 4-socket and 8-socket systems
• Improves HyperTransport™ technology link efficiency and increases performance by:
  • Reducing probe traffic
  • Resolving probes more quickly
• Probe “broadcasting” can be eliminated in 8 of 11 typical CPU-to-CPU transactions


HT Assist: How does it work?

[Diagram: query example on two four-socket systems (CPU 1 to CPU 4, each with an L3 cache). Without HT Assist: total 10 transactions. With HT Assist: total 2 transactions. Legend: Data Request, Data Response, Probe Request, Probe Response, L3 Directory, Directory Read.]


HT Assist: What is the cache directory?

• The HT Assist is a sparse directory cache
  • Associated with memory controller of home node
  • Tracks all lines cached in the system from home node
  • Logically part of the memory controller
  • Physically in L3 cache, occupying 1MB of L3 cache
• For many transactions, eliminates probe broadcasts
  • Host CPU knows exactly which CPU to probe for data
  • Local accesses get local DRAM latency
  • Less queuing delay due to lower HT traffic overhead
• Results in reduced latency and increased system performance in multi-socket systems
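The difference between broadcasting probes and consulting a directory can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class `HomeNode`, its fields, and the single-owner bookkeeping are hypothetical names invented for illustration, not AMD's implementation, and the probe counts it produces are not meant to reproduce the transaction totals in the slide diagram.

```python
from dataclasses import dataclass, field


@dataclass
class HomeNode:
    """Toy model of a home node's memory controller (hypothetical names)."""
    node_id: int
    num_nodes: int
    use_directory: bool
    # Sparse directory: cache-line address -> node that last cached the line.
    directory: dict = field(default_factory=dict)

    def probe_targets(self, requester: int, line: int) -> list:
        """Return the remote nodes that must be probed for a read of `line`."""
        if self.use_directory:
            owner = self.directory.get(line)
            # No entry, or the line is held by the requester or the home
            # node itself: no remote probe is needed.
            if owner in (None, requester, self.node_id):
                targets = []
            else:
                targets = [owner]
        else:
            # Without a directory, every other node has to be probed.
            targets = [n for n in range(self.num_nodes)
                       if n not in (requester, self.node_id)]
        # Simplification: remember the requester as the line's current holder.
        self.directory[line] = requester
        return targets


broadcast = HomeNode(node_id=2, num_nodes=4, use_directory=False)
assisted = HomeNode(node_id=2, num_nodes=4, use_directory=True)

print(broadcast.probe_targets(requester=0, line=0x40))  # [1, 3]
print(assisted.probe_targets(requester=0, line=0x40))   # []
print(assisted.probe_targets(requester=1, line=0x40))   # [0]
```

In this model the directory turns a broadcast into either no probe at all or a single directed probe, which is the effect the slide describes as knowing exactly which CPU to probe.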


HT Assist: What is the result?

• Helps reduce memory latency
• Helps increase overall system performance
• 4-way STREAM memory bandwidth performance improves by ~60% (42 GB/s with HT Assist, and 25.5 GB/s without HT Assist)*
• Can result in faster query times that can increase performance for cache sensitive applications:
  • Database
  • Virtualization
  • HPC

*See backup slides for performance and configuration information.
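As a quick check of the bandwidth figures, the relative improvement can be computed directly from the two quoted numbers. The helper name `percent_improvement` is only illustrative. The arithmetic gives roughly 65%, so the "~60%" on the slide appears to be a conservative rounding.

```python
def percent_improvement(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


with_ht_assist = 42.0      # GB/s, from the slide
without_ht_assist = 25.5   # GB/s, from the slide

gain = percent_improvement(with_ht_assist, without_ht_assist)
print(f"{gain:.1f}%")  # 64.7%
```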


HyperTransport™ 3.0 Technology

• Advantages of HyperTransport™ 3.0 technology (HT3)
  • Compared to HyperTransport™ 1.0 technology (HT1), improves system bandwidth between CPUs and I/O
  • Increased interconnect rate (from 2GT/s with HT1 up to 4.8GT/s per link with HT3)
  • Improves overall system balance and scalability, especially in commercial applications (database, web server, etc.)
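The transfer rates above translate into peak link bandwidth once a link width is fixed. The sketch below assumes a 16-bit link in each direction; that width is an assumption for illustration and is not stated on the slide. The results are theoretical peaks, not measured throughput.

```python
def link_bandwidth_gb_s(transfers_gt_s: float, link_width_bits: int = 16) -> float:
    """Peak bandwidth in GB/s for one direction of a link.

    The 16-bit default width is an assumption, not a figure from the slides.
    """
    return transfers_gt_s * link_width_bits / 8


ht1 = link_bandwidth_gb_s(2.0)   # HT1 at 2 GT/s
ht3 = link_bandwidth_gb_s(4.8)   # HT3 at 4.8 GT/s

print(f"HT1: {ht1:.1f} GB/s per direction")
print(f"HT3: {ht3:.1f} GB/s per direction")
print(f"Ratio: {ht3 / ht1:.1f}x")
```

Whatever width is assumed, the ratio between the two generations depends only on the transfer rates, so it stays at 2.4x.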


Six-Core AMD Opteron™ Processor Support For Red Hat Enterprise Linux®

• Excellent relationship with Red Hat
  • Hardware enablement
  • Virtualization and performance collaboration
• Six-Core AMD Opteron™ processor works best with RHEL5.4
  • Continued support for AMD-V™ with Rapid Virtualization Indexing
  • Continued support for AMD PowerNow!™ technology driver
  • Continued support for Xen 2MB super pages


Six-Core AMD Opteron™ Support For Red Hat Enterprise Linux®

• RHEL5.4: New features and support
  • Supports Six-Core AMD Opteron™ processors
  • AMD-Vi on SR5690 enabled systems
  • KVM virtualization support
• RHEL6.0: New features and support
  • 1GB huge page table
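One way to see whether a Linux host exposes the relevant processor features is to look at the `flags` line in `/proc/cpuinfo`. The sketch below parses such text for `svm` (AMD-V), `npt` (nested page tables, which Rapid Virtualization Indexing relies on) and `pdpe1gb` (1GB pages). The function name `detect_features` and the sample text are hypothetical and only for illustration; the sample is not taken from a real Istanbul system.

```python
# Linux cpuinfo flag -> feature described in the slides.
FEATURE_FLAGS = {
    "svm": "AMD-V (hardware virtualization)",
    "npt": "Rapid Virtualization Indexing (nested page tables)",
    "pdpe1gb": "1GB pages",
}


def detect_features(cpuinfo_text: str) -> dict:
    """Map each flag of interest to True/False based on cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    return {flag: flag in flags for flag in FEATURE_FLAGS}


# Hypothetical sample in the format of /proc/cpuinfo.
SAMPLE = "processor\t: 0\nflags\t\t: fpu vme svm npt pdpe1gb lahf_lm\n"

for flag, present in detect_features(SAMPLE).items():
    status = "yes" if present else "no"
    print(f"{FEATURE_FLAGS[flag]}: {status}")
```

On a real host, the same function could be applied to the contents of `/proc/cpuinfo` instead of `SAMPLE`.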


RHEL5.4 Performance Testing on Six-Core AMD Opteron™ “Istanbul”

• Bare Metal Scalability Testing with Oracle OLTP workload

• Multiple instance testing with OLTP workload

• Taking advantage of NUMA

• KVM multiguest testing with Oracle OLTP workload

• KVM multiguest testing with Sybase OLTP workload


Hardware Configuration

• System: 4 Socket - Six-Core AMD Opteron(tm) Processor 8431 @ 2400.099 MHz
• Memory: 64 GB
• Storage:
  • HP – HSV300
  • Fusion IO SSD Device


Scaling with Oracle OLTP workload

[Chart: Trans / Min (0 to 450000) versus Number of Users (10U, 20U, 40U, 60U, 80U, 100U); series: RHEL54 – FC and RHEL54 – SSD]

Graph shows scaling with Oracle OLTP workload (running in batch commit mode). Scaling improves with storage that has low latency and higher throughput characteristics.


KVM – 2 Vcpu Multi guest - Oracle OLTP

[Chart: 24 cpu Istanbul - 64G; Trans / min (0 to 250000) versus No. of Guests - Total Vcpu; secondary axis 0 to 800]

Data labels on the chart, in bar order:

• 1 Guest – 2 Vcpu: 100
• 2 Guests – 4 Vcpu: 216.15
• 4 Guests – 8 Vcpu: 398.53
• 8 Guests – 16 Vcpu: 742.56

Scaling with multiple 2 Vcpu guests running Oracle OLTP workload: near linear scaling.
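The "near linear" claim can be checked against the chart's data labels. The sketch below assumes those labels (100, 216.15, 398.53, 742.56) are throughput normalized so that one guest equals 100; that reading of the chart is an assumption, and `scaling_efficiency` is an illustrative helper name.

```python
def scaling_efficiency(guests, throughput):
    """Throughput at each point as a fraction of perfect linear scaling
    extrapolated from the first point."""
    per_guest_base = throughput[0] / guests[0]
    return [t / (g * per_guest_base) for g, t in zip(guests, throughput)]


guests = [1, 2, 4, 8]
labels = [100.0, 216.15, 398.53, 742.56]  # chart data labels, 1 guest = 100

for g, eff in zip(guests, scaling_efficiency(guests, labels)):
    print(f"{g} guest(s): {eff:.0%} of linear")
```

Under that assumption, eight guests deliver about 93% of ideal linear throughput, which is consistent with the slide's description.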


KVM – 4 Vcpu Multi guest - Oracle OLTP

[Chart: 24 cpu Istanbul - 64G; Trans / min (0 to 300000) versus No of Guests - Total Vcpu - Total Memory; secondary axis 0 to 500. Bars: 1 Guest – 4 Vcpu – 8G; 2 Guests – 8 Vcpu – 16G; 4 Guests – 16 Vcpu – 32G; 6 Guests – 24 Vcpu – 48G; 8 Guests – 32 Vcpu – 64G (oversubscribed)]

4 vcpu multi guest testing with Oracle OLTP workload shows good scaling. Last bar shows no significant penalty with oversubscription of cpus.
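The oversubscribed configuration in the last bar follows from simple bookkeeping of guest resources against the host. This sketch uses the guest sizes from the chart labels (4 vCPUs and 8G per guest) and the 24-CPU, 64G host; the function name `describe` and the dictionary keys are illustrative only.

```python
HOST_CPUS = 24     # 4 sockets x 6 cores
HOST_MEM_GB = 64


def describe(guests: int, vcpus_per_guest: int = 4, mem_per_guest_gb: int = 8) -> dict:
    """Total guest resources and whether they exceed the host's CPUs."""
    total_vcpus = guests * vcpus_per_guest
    total_mem_gb = guests * mem_per_guest_gb
    return {
        "vcpus": total_vcpus,
        "mem_gb": total_mem_gb,
        "cpu_ratio": total_vcpus / HOST_CPUS,
        "cpu_oversubscribed": total_vcpus > HOST_CPUS,
    }


for n in (1, 2, 4, 6, 8):
    info = describe(n)
    print(f"{n} guests: {info['vcpus']} vCPUs, {info['mem_gb']}G, "
          f"ratio {info['cpu_ratio']:.2f}, oversubscribed={info['cpu_oversubscribed']}")
```

Only the eight-guest case exceeds the host's 24 CPUs (32 vCPUs, a ratio of about 1.33), while its memory total just matches the 64G host.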


KVM – 8 Vcpu Multi guest - Oracle OLTP

[Chart: Istanbul - 24 cpus - 64G; Trans / min (0 to 300000) versus No of Guests - Total Vcpu - Total Memory; secondary axis 100 to 300]

Data labels on the chart, in bar order:

• 1 Guest – 8 Vcpu – 14G: 100
• 2 Guests – 16 Vcpu – 28G: 206.02
• 3 Guests – 24 Vcpu – 42G: 271.99
• 4 Guests – 32 Vcpu – 56G: 277.85

8 vcpu multi guest testing with Oracle OLTP workload shows linear scaling. Last bar shows no significant penalty with oversubscription of cpus.


NUMA – pinning with numactl

[Chart: Istanbul - 24 CPUs - 64G Mem; Trans / Min (0 to 400000) versus No of Guest - Total vCPU - Total Mem; secondary axis 0 to 450]

Data labels on the chart, in bar order:

• 1 Guest – 6 vcpu – 14G: 100
• 2 Guests – 12 vcpu – 28G: 197.56
• 3 Guests – 18 vcpu – 42G: 291.2
• 4 Guests – 24 vcpu – 56G: 326.68
• 1 Guest – 6 vcpu – 14G – NUMA: 100
• 2 Guests – 12 vcpu – 28G – NUMA: 196.66
• 3 Guests – 18 vcpu – 42G – NUMA: 295.76
• 4 Guests – 24 vcpu – 56G – NUMA: 389.1

Platform shows good scaling without NUMA tuning (bars 1-4). Using numactl, linear scaling is achieved with multiple guests (bars 5-8).
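Pinning one guest per NUMA node with numactl might look like the following sketch. The slides do not show the exact options used in the tests, so the choice of `--cpunodebind` and `--membind`, the one-guest-per-node layout, and the `qemu-kvm` guest command line are all assumptions for illustration. The code only builds and prints the commands; it does not launch anything.

```python
import shlex


def numactl_command(node: int, guest_command: list) -> list:
    """Prefix a command so its CPUs and memory are bound to one NUMA node."""
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", *guest_command]


# Hypothetical guest launch commands: 6 vCPUs and 14G each, as in the chart.
guest_commands = [
    ["qemu-kvm", "-smp", "6", "-m", "14336", "-name", f"guest{i}"]
    for i in range(4)
]

for node, cmd in enumerate(guest_commands):
    print(shlex.join(numactl_command(node, cmd)))
```

With six cores per socket, a 6-vCPU guest fits within a single node, so each guest's memory accesses can stay local to the node it runs on.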


NUMA – Pinning with taskset

The platform supports NUMA. By pinning 4 database instances into 4 NUMA nodes, a 10% performance improvement was seen (compare bars 3 and 4).

[Chart: RHEL5.4 Multi-Instance Scaling, Oracle database workload; No of Trans (0 to 600000) versus No of Instances. Bars: 1 Instance; 2 Instances; 4 Instances; 4 Instances NUMA]
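A taskset-based pinning of one database instance per node could be sketched as below. The CPU numbering (node 0 owns CPUs 0-5, node 1 owns 6-11, and so on) is an assumption; the real layout should be read from the system, for example with `numactl --hardware`. The script name `./start_db_instance.sh` is a hypothetical placeholder, and the code only builds the command lines.

```python
def node_cpu_list(node: int, cores_per_node: int = 6) -> str:
    """CPU range for a node, assuming contiguous numbering per node."""
    first = node * cores_per_node
    return f"{first}-{first + cores_per_node - 1}"


def taskset_command(node: int, command: list) -> list:
    """Prefix a command so it runs only on the CPUs of one node."""
    return ["taskset", "-c", node_cpu_list(node), *command]


for node in range(4):
    # Hypothetical launcher script for database instance number `node`.
    cmd = taskset_command(node, ["./start_db_instance.sh", str(node)])
    print(" ".join(cmd))
```

Note that taskset restricts CPU affinity only; memory placement then depends on the kernel's default allocation policy rather than an explicit binding.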


KVM – 4 Vcpu Multi guest - Sybase OLTP

[Chart: Istanbul - 24 cpu - 64G; Trans / min (0 to 160000) versus Guests - Total Vcpu - Total Memory. Bars: 1 Guest – 4 Vcpu – 8G; 2 Guests – 8 Vcpu – 16G; 4 Guests – 16 Vcpu – 32G; 6 Guests – 24 Vcpu – 48G]

4 vcpu guests showed a scaling trend as more guests were added. Scaling was not linear as the workload was not tuned to run in a KVM guest.


KVM – 8 Vcpu Multi guest - Sybase OLTP

[Chart: Istanbul 24 cpu - 64G; Trans / Min (0 to 140000) versus Guests - Total Vcpu - Total Memory. Bars: 1 Guest – 8 Vcpu – 14G; 2 Guests – 16 Vcpu – 28G; 3 Guests – 24 Vcpu – 42G]

8 vcpu guests showed a scaling trend as more guests were added. Scaling was not linear as the workload was not tuned to run in a KVM guest.


Conclusion

Six-core AMD Opteron™ Processor “Istanbul” has shown:

• Good Vertical scaling

• Storage (low latency)

• Memory (Dense memory)

• Good Horizontal scaling

• Consolidation

• Virtualization

• Storage (low latency, high bandwidth)

• Memory (Dense memory)

• NUMA


Conclusion (contd)

Six-core AMD Opteron™ Processor “Istanbul”:

• Retains the prior generation innovations

• Adds new innovations

• Six-core, HT Assist, higher HyperTransport 3.0 bandwidth

• Optimized on RHEL, new hardware features enabled

• System consolidation in data centers

• What is your data center story?


Questions?

Bhavna Sarathy

[email protected]

Sanjay Rao

[email protected]


Backup


Four-Socket STREAM Performance Improvement with HT Assist (slide 10)

• 42GB/s using 4 x Six-Core AMD Opteron™ processors (“Istanbul”) Model 8435 in Tyan Thunder n4250QE (S4985-E) motherboard, 32GB (16x2GB DDR2-800) memory, SuSE Linux® Enterprise Server 10 SP1 64-bit (with HT Assist enabled)
• 25.5GB/s using 4 x Six-Core AMD Opteron™ processors (“Istanbul”) Model 8435 in Tyan Thunder n4250QE (S4985-E) motherboard, 32GB (16x2GB DDR2-800) memory, SuSE Linux® Enterprise Server 10 SP1 64-bit (with HT Assist disabled)
• 24GB/s using 4 x Quad-Core AMD Opteron™ processors (“Shanghai”) Model 8384 in Tyan Thunder n4250QE (S4985-E) motherboard, 32GB (16x2GB DDR2-800) memory, SuSE Linux® Enterprise Server 10 SP1 64-bit
• 9GB/s using 4 x Hex-Core Intel Xeon processors (“Dunnington”) Model E7450 in Supermicro X7QC3+ motherboard, 32GB (16x2GB DDR2-667 FB-DIMM) memory, SuSE Linux® Enterprise Server 10 SP1 64-bit


Disclaimer & Attribution

DISCLAIMER

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2009 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD CoolCore, AMD Opteron, AMD PowerNow!, AMD Virtualization, AMD-V, Dual Dynamic Power Management, and combinations thereof are trademarks of Advanced Micro Devices, Inc. HyperTransport is a licensed trademark of the HyperTransport Technology Consortium. Microsoft, Windows, and Windows Vista are registered trademarks of Microsoft Corporation in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.

