Accelerating MySQL with JIT Compilers David Yeager Percona Live Santa Clara April 2018
Transcript
Page 1: Accelerating MySQL with JIT Compilers

David Yeager

Percona Live Santa Clara, April 2018

Page 2: What is a Just-In-Time Compiler?

Java: Java source code -> Java Compiler -> Bytecode -> Java JIT Compiler -> Machine code

Dynimizer: C/C++ source code -> C Compiler -> Machine code -> Dynimizer JIT Compiler (with Profiling Information) -> Machine code

Page 3: How MySQL benefits from JITs

[Figure: three plots over time]

OVH public cloud, 2 vCores x 2.3 GHz (Broadwell Xeon), template B2-7, BHS1 datacenter

Page 4: How MySQL benefits from JITs

MySQL 5.7 tpcc-mysql / WordPress

*OVH public cloud, 2 vCores x 2.3 GHz (Broadwell Xeon), template EG-7-SSD, BHS1 datacenter
*tpcc-mysql is not validated or certified by the TPC corporation and so this is not an official TPC-C result

Page 5: Dynimizer Usage In a Nutshell

Installation

$ sudo bash -c 'bash <(wget -O - https://dynimize.com/install) -default'

Usage

$ sudo dyni -start
Dynimizer started

$ sudo dyni -status
Dynimizer is running
mysqld, pid: 20722, dynimizing

$ sudo dyni -status
Dynimizer is running
mysqld, pid: 20722, dynimized

Page 6: Dynimizer Usage

1 Start

$ sudo dyni -start
Dynimizer started

2 Monitoring

$ sudo dyni -status
Dynimizer is running

3 Profiling

$ sudo dyni -status
Dynimizer is running
mysqld, pid: 20722, profiling

4 Dynimizing

$ sudo dyni -status
Dynimizer is running
mysqld, pid: 20722, dynimizing

5 Dynimized

$ sudo dyni -status
Dynimizer is running
mysqld, pid: 20722, dynimized

Does pid 20722 drastically change phase?
● Y: Reoptimize (can be disabled)
● N: No reoptimization
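
As a rough mental model of the lifecycle on this slide, the phases can be written as a small state machine. This is only an illustrative Python sketch, not Dynimizer's implementation. The `Phase` enum, `next_phase`, and its `phase_changed` and `optimize_once` parameters are hypothetical names. The sketch also assumes that each phase simply advances to the next one and that reoptimization re-enters the profiling phase, neither of which the slide states.

```python
from enum import Enum


class Phase(Enum):
    """Status strings shown by `dyni -status`, plus the monitoring stage."""
    MONITORING = "monitoring"
    PROFILING = "profiling"
    DYNIMIZING = "dynimizing"
    DYNIMIZED = "dynimized"


def next_phase(phase, phase_changed=False, optimize_once=False):
    """Return the phase that follows `phase` in this toy model.

    phase_changed: the workload of the target process changed drastically.
    optimize_once: reoptimization has been disabled (hypothetical flag name).
    """
    if phase is Phase.MONITORING:
        return Phase.PROFILING
    if phase is Phase.PROFILING:
        return Phase.DYNIMIZING
    if phase is Phase.DYNIMIZING:
        return Phase.DYNIMIZED
    # DYNIMIZED: reoptimize only on a drastic phase change, and only when
    # reoptimization is still enabled. Assumption: reoptimizing starts again
    # from profiling.
    if phase_changed and not optimize_once:
        return Phase.PROFILING
    return Phase.DYNIMIZED


# Walk the happy path from monitoring to the optimized state.
phase = Phase.MONITORING
seen = [phase.value]
while phase is not Phase.DYNIMIZED:
    phase = next_phase(phase)
    seen.append(phase.value)
print(" -> ".join(seen))
```

With reoptimization disabled, a dynimized process stays dynimized in this model even when `phase_changed` is true.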

Page 7: Hardening Dynimizer For Production

$ dyni -optimizeOnce:y

Default is to reoptimize after large changes in workload
● This setting disables it
● Prevents the temporary performance overhead of re-optimizing in the middle of a workload
● No changes to machine code == more stable
● More conservative
● If workload changes drastically, Dynimizer improvement will be reduced

Page 8: Hardening Dynimizer For Production

$ dyni -secureCodeCache:y

Default code cache is executable, readable and writable at the same time
● This setting makes code cache executable and read-only
● Enabled automatically on SELinux for extra security
● You may want this enabled regardless

$ dyni -pid <number>

You may want to limit Dynimizer to a specific mysqld process

Page 9: Configuring with /etc/dyni.conf

[options]
log:/var/log/dyni.log
maxLogSize:1MB
optimizeOnce: n
fastCompile: n
initdService: n
secureCodeCache: n

[exeList]
mysqld
#sysbench
#tpcc_start

#[users]
#mysql

● This is dyni.conf after default installation
● Overridden by command-line options
– For example: $ dyni -optimizeOnce:y will override dyni.conf
● Can target other programs by adding exe names under [exeList]
– Non-mysqld targets not supported yet so test thoroughly!
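
To make the override rule concrete, here is a minimal Python sketch that reads text in the layout shown on this slide and then applies a command-line value on top. It is not Dynimizer's parser. `parse_dyni_conf`, `effective_options`, and `SAMPLE` are hypothetical names, and the sketch assumes that `#` starts a commented-out line, that `[options]` holds `key:value` pairs, and that every other section is a plain list of names.

```python
def parse_dyni_conf(text):
    """Parse dyni.conf-style text into {"options": {...}, "exeList": [...]}."""
    conf = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out entry
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            conf[section] = {} if section == "options" else []
            continue
        if section is None:
            continue  # ignore anything before the first section header
        if section == "options":
            # Split on the first colon only, so paths keep their slashes.
            key, _, value = line.partition(":")
            conf[section][key.strip()] = value.strip()
        else:
            conf[section].append(line)
    return conf


def effective_options(conf, cli_overrides):
    """Command-line values win over values read from the file."""
    merged = dict(conf.get("options", {}))
    merged.update(cli_overrides)
    return merged


SAMPLE = """\
[options]
log:/var/log/dyni.log
maxLogSize:1MB
optimizeOnce: n
fastCompile: n
initdService: n
secureCodeCache: n
[exeList]
mysqld
#sysbench
#tpcc_start
#[users]
#mysql
"""

conf = parse_dyni_conf(SAMPLE)
# Models running: dyni -optimizeOnce:y
options = effective_options(conf, {"optimizeOnce": "y"})
print(conf["exeList"])
print(options["optimizeOnce"])
```

In this model the file still says `optimizeOnce: n`, but the merged result carries `y`, matching the slide's statement that command-line options override dyni.conf.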

Page 10: Sources of performance gain

OLTP workloads are mostly front-end CPU stalls
● Instruction cache misses, branch mispredictions, ITLB misses
● Use profiling information to better lay out the machine code, reduce branching

Other profile guided optimizations
● Hot call-site inlining, sparse conditional constant propagation
● Dead code elimination, copy propagation
● Loop unrolling, branch target alignment
● Other optimizations

Page 11: When can Dynimizer help?

Most Beneficial
● High CPU usage
● Long running workloads
● Well indexed queries
● Have fully optimized MySQL, want even more performance
● Read heavy workload
● SELECT: lots of front-end CPU stalls
● Working set fits into the buffer pool

Least Beneficial
● Low CPU usage scenarios
● Lots of writes to slow disks
– IO bottleneck
● Working set doesn't fit in buffer pool
● Full table scans
● Short mysqld process lifetime
● > 5 k threads
– Current ptrace scales poorly

Page 12: When can Dynimizer help?

 $ perf stat -e r0280:u,r0380 -p 30041 sleep 30

 Performance counter stats for process id '30041':

      3,224,918,396 r0280:u                [100.00%]
     39,530,772,359 r0380

● I-cache misses are a good indicator
● r0280 means I-cache misses for the last several generations of Intel CPUs
● :u means user mode, r0380 is instruction fetches
● 3,224,918,396/39,530,772,359 = 8%
● > 5% indicates instruction bandwidth is a serious bottleneck
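
The arithmetic on this slide can be reproduced with a few lines of Python. This is a minimal sketch: `parse_count`, `icache_miss_ratio`, and `THRESHOLD` are hypothetical names introduced here, and the two counts are copied from the perf output above.

```python
def parse_count(field):
    """Convert a perf counter value such as '3,224,918,396' to an int."""
    return int(field.replace(",", ""))


def icache_miss_ratio(misses, fetches):
    """Fraction of instruction fetches that missed the I-cache."""
    if fetches <= 0:
        raise ValueError("fetches must be positive")
    return misses / fetches


# Rule of thumb from the slide: above 5% suggests a serious
# instruction bandwidth bottleneck.
THRESHOLD = 0.05

misses = parse_count("3,224,918,396")    # r0280:u
fetches = parse_count("39,530,772,359")  # r0380
ratio = icache_miss_ratio(misses, fetches)

print(f"I-cache miss ratio: {ratio:.1%}")
print("above threshold" if ratio > THRESHOLD else "below threshold")
```

The ratio comes out at roughly 8.2%, which the slide rounds to 8%, so this process is above the 5% rule of thumb.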

Page 13: MySQL + Dynimizer Architecture

[Diagram: two processes]

TARGET PROCESS BEING OPTIMIZED
● ORIGINAL PROGRAM MACHINE CODE + CODE CACHE

DYNIMIZER PROCESS
● READ PROCESS STATE (MACHINE CODE, DATA): LINUX PTRACE
● COLLECT SAMPLE BASED PROFILING DATA: LINUX PERF_EVENTS SUBSYSTEM
● Stages connected by IR: HIGH-LEVEL OPTIMIZATIONS, MICROARCHITECTURE-SPECIFIC OPTIMIZATIONS, CONVERT TO MACHINE CODE
● COMMIT OPTIMIZED MACHINE CODE: LINUX PTRACE

Page 14: Dynimizer is the Everyman's PGO

Profile Guided Optimization
● Available in GCC
– Compile with instrumentation
– Training run with profiling
– Recompile
● Difficult to find a representative workload that will stand up over time
● Labour intensive
● For large scale MySQL deployments that can amortize the labour

Dynimizer JIT
● Orders of magnitude easier
– Trivial usage: $ dyni -start
– Not required to build from source
– 1-5 minutes to optimize
● Zero downtime
● Includes shared libraries
● Way more flexible
– Can optimize code for each run

Page 15: Supported Targets

● Linux x86-64
● That means mysqld

Optimization Target    Version
MySQL Server           5.5 – 5.7
MariaDB Server         5.5 – 10.2
Percona Server         5.5 – 5.7

Page 16: Sysbench: MySQL 5.7 OLTP-RO

[Figure: Transactions/Second (axis 1000 to 10000) vs. Threads (1, 2, 4, 8, 16, 32, 64, 128), WITH Dynimizer and WITHOUT Dynimizer]

CPU: Intel(R) Xeon(R) CPU E3-1270 v6 @ 3.80GHz, 4 cores, 8 Threads (Kaby Lake)
RAM: 32 GB of 2400 MHz DDR4

*This is a dedicated server rented from OVH, model: SP-32 Server, data center BHS 5
*Relative speedups are similar across various table sizes and numbers of tables, so long as the data fits into memory

Page 17: Sysbench: MySQL 5.7 OLTP Simple

[Figure: plots of Transactions/Second (axis 20000 to 140000) vs. Threads (1, 2, 4, 8, 16, 32, 64, 128), WITH Dynimizer and WITHOUT Dynimizer]

Page 18: Sysbench: TPS Increase

[Figure: TPS increase (axis 5% to 55%) vs. Threads (1, 2, 4, 8, 16, 32, 64, 128) for oltp read-only, oltp-simple, select, select-random-ranges]

Page 19: Reduction in Branch Mispredictions

[Figure: change in branch mispredictions (axis -70.0% to -20.0%) vs. Threads (1, 2, 4, 8, 16, 32, 64, 128) for oltp read-only, oltp-simple, select, select-random-ranges, next to a plot with a 5% to 55% axis for the same four tests]

Page 20: Reduction in ITLB Misses

[Figure: change in ITLB misses (axis -100.0% to 20.0%) vs. Threads (1, 2, 4, 8, 16, 32, 64, 128) for oltp read-only, oltp-simple, select, select-random-ranges, next to a plot with a 5% to 55% axis for the same four tests]

Page 21: Reduction in I-Cache Misses

[Figure: change in I-cache misses (axis -60.0% to -25.0%) vs. Threads (1, 2, 4, 8, 16, 32, 64, 128) for oltp read-only, oltp-simple, select, select-random-ranges, next to a plot with a 5% to 55% axis for the same four tests]

Page 22: Increase in Instructions Per Cycle

[Figure: increase in instructions per cycle (axis 0.0% to 70.0%) vs. Threads (1, 2, 4, 8, 16, 32, 64, 128) for oltp read-only, oltp-simple, select, select-random-ranges]

Page 23: Caveats: Steep warmup curve

Will be reduced in next major release

Page 24: Caveats: Memory Usage

● 4 GB per process during the dynimizing phase only
– Freed once optimized
– Extra RAM not necessary. Just increase swap by 4 GB
– May not be appropriate for some micro cloud instances
● Will be reduced in next major release.

Page 25: Noteworthy attributes

● Exploiting Run-Time Information
● Zero downtime
● Optimize in minutes
● Target app source code not required
● Optimize across shared libraries
● Simple usage
● Little to no configuration necessary

Page 26: Coming soon...

● Cache compilation for instant optimized restart of target processes (mysqld)
● Lower profiling and memory overheads
● Improved phase change detection
● More optimizations
● Toggle between code cache versions depending on program phase
● Many more target programs to optimize.
– Have observed similar improvements with MongoDB
● Many new optimizations and speedups along the way

Page 27: Questions?

To learn more visit dynimize.com

Page 28: Rate My Session

