Page 1: Benchmarking Swift

© Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Benchmarking Swift

Eamonn O’Toole

Mark Seger

Page 2: Benchmarking Swift

Agenda

• Benchmarking with HP’s getput
  – Procedure, tools and operation

• Case study
  – Selecting servers for HP’s public cloud

Page 3: Benchmarking Swift

The Benchmarking Bible

• Scripts work best for repeatability
  – Both for load generation and measurement
• Test from the bottom of the stack up
• Longer runs tend to reduce cache effects
• The middle of the test is as important as the duration
• Avoid changing more than 1 thing at a time
• It will take as long as it will take
• There’s no such thing as a coincidence!
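Scripting also helps enforce “avoid changing more than 1 thing at a time”. As a minimal sketch (a hypothetical helper of ours, not something getput provides), the generator below takes a baseline test configuration plus alternative values for each factor and yields configurations that each differ from the baseline in exactly one factor. The factor names (`size`, `procs`, `clients`) are illustrative only.

```python
from typing import Any, Dict, Iterator, List


def one_factor_at_a_time(
    baseline: Dict[str, Any], variations: Dict[str, List[Any]]
) -> Iterator[Dict[str, Any]]:
    """Yield the baseline, then configs that each change exactly one factor.

    Hypothetical helper: factor names are whatever your test driver uses.
    """
    yield dict(baseline)
    for factor, values in variations.items():
        if factor not in baseline:
            raise KeyError(f"unknown factor: {factor}")
        for value in values:
            if value == baseline[factor]:
                continue  # identical to the baseline run, so skip the duplicate
            config = dict(baseline)
            config[factor] = value
            yield config


if __name__ == "__main__":
    base = {"size": "1k", "procs": 1, "clients": 1}
    for cfg in one_factor_at_a_time(base, {"procs": [1, 2, 4], "size": ["4k"]}):
        print(cfg)
```

Each yielded configuration can then be handed to whatever actually launches the test, so a surprising result can be traced back to a single change.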

Page 4: Benchmarking Swift

Size Matters

• Large Objects
  – IOPS are small, so pay attention to MB/sec
  – These use a lot of bandwidth, so make sure the network is wide enough
  – They use a lot of CPU, so you could need ~1 core/stream/client
• Small Objects
  – MB/sec is low, so pay attention to IOPS
  – Network bandwidth is less of a concern, but latency is
  – CPU requirements are relatively low as well
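Both sets of bullets follow from one identity: MB/sec = ops/sec × object size. The sketch below is only a back-of-the-envelope helper of ours; it assumes getput’s “MB” is 2**20 bytes, which is what getput’s own numbers in this deck suggest (a 1g PUT taking 13.201 s is reported as 77.57 MB/Sec, and 4520.88 1k GETs/sec as 4.41 MB/Sec). The 1000 MB/sec link in the last line is a hypothetical figure.

```python
MIB = 1024 * 1024  # assumption: getput's "MB" is 2**20 bytes


def bandwidth_mib_s(ops_per_sec: float, object_size_bytes: int) -> float:
    """MB/sec moved by a given operation rate at a fixed object size."""
    return ops_per_sec * object_size_bytes / MIB


def iops(mib_per_sec: float, object_size_bytes: int) -> float:
    """Operations/sec needed to move a given MB/sec at a fixed object size."""
    return mib_per_sec * MIB / object_size_bytes


if __name__ == "__main__":
    # Small objects: 4520.88 1k GETs/sec is still only ~4.4 MB/sec.
    print(f"1k GETs: {bandwidth_mib_s(4520.88, 1024):.2f} MB/sec")
    # Large objects: one 1g PUT in 13.201 s is a fraction of an op/sec.
    print(f"1g PUT:  {bandwidth_mib_s(1 / 13.201, 1024 ** 3):.2f} MB/sec")
    # Filling a hypothetical 1000 MB/sec link with 1k objects needs ~1M ops/sec.
    print(f"1k objects at 1000 MB/sec: {iops(1000, 1024):,.0f} ops/sec")
```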

Page 5: Benchmarking Swift

Collectl

• Developed about a dozen years ago
• Open source, hosted on SourceForge
• Collects fine-grained metrics
  – CPU, disk, network, memory and more
  – Process level, including I/O
• Can generate stats in real time or record them for later playback
  – In playback mode, can summarize metrics for each process
• Colplot generates plots for visualizing overall performance
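On Linux, per-process I/O figures like collectl’s come from the kernel’s /proc/<pid>/io counters. The sketch below reads those counters directly; it is not collectl code, and our reading that a cache-inclusive column such as collectl’s RKBC corresponds to `rchar` while RKB corresponds to `read_bytes` is an assumption worth checking against the collectl documentation. Comparing the two deltas over an interval gives a rough idea of how many reads missed the page cache.

```python
from pathlib import Path
from typing import Dict


def parse_proc_io(text: str) -> Dict[str, int]:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)  # int() ignores surrounding spaces
    return counters


def read_proc_io(pid: int) -> Dict[str, int]:
    """Read one process's I/O counters (Linux only; may need root)."""
    return parse_proc_io(Path(f"/proc/{pid}/io").read_text())


def disk_read_fraction(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Share of bytes read during an interval that came from storage.

    rchar counts every byte returned by read()-style calls, cached or not;
    read_bytes counts bytes the process caused to be fetched from storage.
    Readahead can push the ratio above 1, so treat it as an indicator only.
    """
    logical = after["rchar"] - before["rchar"]
    physical = after["read_bytes"] - before["read_bytes"]
    return physical / logical if logical else 0.0
```

To use it, call `read_proc_io(pid)` twice a few seconds apart for, say, an object-server worker and pass both snapshots to `disk_read_fraction`.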

Page 6: Benchmarking Swift

Getput Tools

• Designed exclusively for Swift benchmarking
• Lots of options for simulating lots of behaviors
  – Puts, Gets, Deletes
  – Object sizes
  – Number of clients
  – Number of processes
  – Level of container sharing
• Options for running tests
  – Ranges for numbers of objects, processes and clients
  – Define pre/post test initialization/analysis scripts
• The complete list is beyond the scope of this talk
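Because getput is driven entirely by command-line options, a small wrapper that builds and logs the exact command line is an easy way to make runs repeatable. This sketch uses only the options that appear in this deck’s examples (`-c`, `-o`, `-n`, `-s`, `-t`, `--procs`); we read `-cc`/`-oo` as a container name `c` and object name `o`, which is an assumption, and `getput_argv` is our own hypothetical helper. Take the full option list from getput itself.

```python
import shlex
from typing import List, Sequence


def getput_argv(
    container: str,
    obj: str,
    nobjs: int,
    sizes: Sequence[str],
    tests: Sequence[str],
    procs: Sequence[int] = (),
    script: str = "./getput.py",
) -> List[str]:
    """Build a getput command line from the options shown in this deck."""
    argv = [
        script,
        f"-c{container}",         # container name (assumed meaning of -c)
        f"-o{obj}",               # object name (assumed meaning of -o)
        f"-n{nobjs}",             # number of objects
        "-s" + ",".join(sizes),   # comma-separated object sizes, e.g. 1k,2k
        "-t" + ",".join(tests),   # tests: p(ut), g(et), d(elete)
    ]
    if procs:
        argv += ["--procs", ",".join(str(p) for p in procs)]
    return argv


if __name__ == "__main__":
    argv = getput_argv("c", "o", 1, ["1k", "2k"], ["p"], procs=[1, 2])
    # Log the exact command so the run can be repeated later;
    # subprocess.run(argv, check=True) would launch it.
    print(shlex.join(argv))
```

The printed command can go into the results log next to the output, so the same test can be rerun later.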

Page 7: Benchmarking Swift

Getting Started

• You need your Swift credentials exported to your environment
• If swift stat works, getput will work, and if it doesn’t, it won’t!

$ ./getput.py -cc -oo -n1 -s1k -tp,g,d
Rank Test Clts Proc OSize Start    End      MB/Sec Ops Ops/Sec Errs Latency Median LatRange
0    put  1    1    1k    20:28:04 20:28:04 0.01   1   7.30    0    0.137   0.137  0.14-00.14
0    get  1    1    1k    20:28:04 20:28:04 0.13   1   132.40  0    0.008   0.008  0.01-00.01
0    del  1    1    1k    20:28:04 20:28:04 0.06   1   66.32   0    0.015   0.015  0.02-00.02

Simple put, get, del

$ ./getput.py -cc -oo -n1 -s1k,2k -tp --procs 1,2
Rank Test Clts Proc OSize Start    End      MB/Sec Ops Ops/Sec Errs Latency Median LatRange
0    put  1    1    1k    20:32:55 20:32:55 0.02   1   19.92   0    0.050   0.050  0.05-00.05
0    put  1    1    2k    20:32:55 20:32:55 0.07   1   37.50   0    0.027   0.027  0.03-00.03
Rank Test Clts Proc OSize Start    End      MB/Sec Ops Ops/Sec Errs Latency Median LatRange
0    put  1    2    1k    20:32:55 20:32:55 0.03   2   28.82   0    0.071   0.084  0.06-00.08
0    put  1    2    2k    20:32:56 20:32:56 0.21   2   109.18  0    0.019   0.022  0.02-00.02

Multiple sizes, multiple numbers of processes

Note that 1KB PUTs are a lot slower than 2KB PUTs
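The “if swift stat works, getput will work” rule makes a handy pre-flight check for scripted runs. A minimal sketch, assuming the python-swiftclient `swift` CLI is on the PATH and the credentials are already exported; `swift_stat_ok` is our own helper name, not part of getput:

```python
import shutil
import subprocess


def swift_stat_ok(timeout: float = 30.0) -> bool:
    """Return True if 'swift stat' succeeds with the current environment.

    Mirrors the rule of thumb above: if this works, getput should too.
    """
    swift = shutil.which("swift")
    if swift is None:
        return False  # swift CLI not installed or not on PATH
    try:
        result = subprocess.run(
            [swift, "stat"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False  # hung or could not be started
    return result.returncode == 0
```

A test driver can call `swift_stat_ok()` first and stop before launching getput if it returns False.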

Page 8: Benchmarking Swift

Watching with collectl

$ ./getput.py -cc -oo -n1 -s1g -tp
Rank Test Clts Proc OSize Start    End      MB/Sec Ops Ops/Sec Errs Latency Median LatRange
0    put  1    1    1g    20:43:51 20:44:05 77.57  1   0.08    0    13.201  13.201 13.20-13.20

Large object upload

#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#Time    cpu sys inter ctxsw KBRead Reads KBWrit Writes  KBIn PktIn  KBOut PktOut
20:52:28   4   0   440   217      0     0     12      1    13    26      3     24
20:52:29  13   3   454   100      0     0     44     11     0     3      0      2
20:52:30  10   1 14913 30949      0     0      0      0   841 14253  57030  39605
20:52:31  12   1 20892 44930      0     0      0      0  1221 20841  82666  57154
20:52:32  12   1 21092 44454      0     0      0      0  1248 21296  82315  56894
20:52:33  11   1 19808 40054      0     0      0      0  1162 19839  76518  52928
20:52:34   6   0 16505 33347      0     0      0      0   927 15824  69085  47908
20:52:35   7   0 17832 34715      0     0      0      0  1028 17541  67448  46858
20:52:36   6   0 20819 42114      0     0      0      0  1219 20785  80389  55628
20:52:37   9   0 10210 20885      0     0      0      0   591 10080  40941  28290
20:52:38   6   0 20067 39984      0     0     12      1  1160 19802  75784  52552
20:52:39   8   0 21208 44885      0     0     56     14  1263 21552  82416  56985
20:52:40  12   1 18289 36995      0     0      0      0  1073 18311  71868  49758
20:52:41   8   0 20044 37608      0     0      0      0  1223 20872  94048  64743
20:52:42   8   0 17100 28888      0     0      0      0   850 14503  91449  62516
20:52:43  12   0 19396 35053      0     0      0      0  1143 19512  92891  63792
20:52:44   6   0  5005  6023      0     0      0      0   178  3025  25813  17467
20:52:45   6   0   364   142      0     0      0      0     0     2      0      2
20:52:46   4   0   188    72      0     0      0      0     0     1      0      1

Collectl shows that the network rate is NOT smooth
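To put a number on “not smooth”, the sketch below takes the per-second KBOut values from the collectl samples on this slide, keeps the intervals where the upload was clearly running (more than 1000 KB/s, an arbitrary threshold of ours), and reports their spread and total. Assuming collectl’s KB is 1024 bytes, the total comes to roughly 1.04 GiB, roughly what sending one 1g object should take.

```python
from statistics import mean, pstdev

# (time, KBOut) pairs copied from the collectl samples on this slide.
SAMPLES = [
    ("20:52:28", 3), ("20:52:29", 0), ("20:52:30", 57030),
    ("20:52:31", 82666), ("20:52:32", 82315), ("20:52:33", 76518),
    ("20:52:34", 69085), ("20:52:35", 67448), ("20:52:36", 80389),
    ("20:52:37", 40941), ("20:52:38", 75784), ("20:52:39", 82416),
    ("20:52:40", 71868), ("20:52:41", 94048), ("20:52:42", 91449),
    ("20:52:43", 92891), ("20:52:44", 25813), ("20:52:45", 0),
    ("20:52:46", 0),
]


def active_rates(samples, threshold_kb=1000):
    """KBOut values for intervals where the upload was clearly running."""
    return [kb for _, kb in samples if kb > threshold_kb]


active = active_rates(SAMPLES)

if __name__ == "__main__":
    print(f"{len(active)} active seconds, {min(active)}-{max(active)} KB/s")
    print(f"mean {mean(active):.0f} KB/s, std dev {pstdev(active):.0f} KB/s")
    # Assumption: collectl's KB is 1024 bytes.
    print(f"total sent: {sum(active) / 1024 ** 2:.2f} GiB")
```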

Page 9: Benchmarking Swift

Running a Benchmark

• PUTs scale roughly linearly through 16 processes; rate increases are smaller at 32 and 48
• GETs look really good through 32 processes and slow down a bit at 48
• DELs had some irregular latencies in the upper range

$ gpsuite --suite 1kobjs
Test Clts Proc OSize Start    End      MB/Sec   Ops Ops/Sec Errs Latency Median LatRange
put     1    1    1k 11:36:24 11:38:24   0.02  2081   17.34    0   0.058  0.052 0.01-00.78
get     1    1    1k 11:38:54 11:39:11   0.12  2081  119.78    0   0.008  0.007 0.01-00.27
del     1    1    1k 11:39:41 11:40:15   0.06  2081   60.50    0   0.017  0.011 0.01-00.75
put     1    2    1k 11:40:45 11:42:45   0.03  4030   33.58    0   0.060  0.052 0.01-01.03
get     1    2    1k 11:43:15 11:43:31   0.25  4030  258.12    0   0.008  0.007 0.01-00.25
del     1    2    1k 11:44:01 11:44:33   0.12  4030  126.27    0   0.016  0.011 0.01-00.76
put     1    4    1k 11:45:03 11:47:03   0.06  7864   65.50    0   0.061  0.052 0.01-00.97
get     1    4    1k 11:47:33 11:47:48   0.50  7864  514.76    0   0.008  0.007 0.01-00.22
del     1    4    1k 11:48:18 11:49:01   0.21  7864  210.04    0   0.019  0.011 0.01-00.84
put     1    8    1k 11:49:31 11:51:31   0.12 14711  122.56    0   0.065  0.052 0.01-00.99
get     1    8    1k 11:52:01 11:52:16   0.95 14711  975.96    0   0.008  0.007 0.01-00.25
del     1    8    1k 11:52:46 11:53:37   0.29 14711  298.07    0   0.027  0.011 0.01-01.23
put     1   16    1k 11:54:07 11:56:07   0.24 29435  245.23    0   0.065  0.052 0.01-01.33
get     1   16    1k 11:56:37 11:56:52   1.88 29435 1927.82    0   0.008  0.007 0.01-00.26
del     1   16    1k 11:57:23 11:58:31   0.45 29435  459.14    0   0.035  0.012 0.01-00.96
put     1   32    1k 11:59:01 12:01:01   0.38 46277  385.58    0   0.083  0.053 0.01-01.04
get     1   32    1k 12:01:31 12:01:44   3.58 46277 3662.58    0   0.009  0.007 0.01-00.62
del     1   32    1k 12:02:14 12:03:40   0.54 46277  549.55    0   0.058  0.012 0.01-03.43
put     1   48    1k 12:04:11 12:06:11   0.51 62605  521.51    0   0.092  0.054 0.01-01.49
get     1   48    1k 12:06:41 12:06:56   4.41 62605 4520.88    0   0.011  0.007 0.01-00.53
del     1   48    1k 12:07:26 12:09:07   0.63 62605  640.82    0   0.075  0.021 0.01-02.23

gpsuite --suite 1kobjs
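One way to check the scaling bullets is scaling efficiency: throughput at N processes divided by N times the 1-process throughput, so 1.0 means perfectly linear. The sketch below applies this to the Ops/Sec column of the gpsuite 1kobjs run on this slide; the metric is our framing, not something gpsuite reports.

```python
# Ops/Sec by process count, copied from the gpsuite 1kobjs run on this slide.
OPS_PER_SEC = {
    "put": {1: 17.34, 2: 33.58, 4: 65.50, 8: 122.56, 16: 245.23, 32: 385.58, 48: 521.51},
    "get": {1: 119.78, 2: 258.12, 4: 514.76, 8: 975.96, 16: 1927.82, 32: 3662.58, 48: 4520.88},
    "del": {1: 60.50, 2: 126.27, 4: 210.04, 8: 298.07, 16: 459.14, 32: 549.55, 48: 640.82},
}


def scaling_efficiency(rates):
    """Map process count N to rate(N) / (N * rate(1)); 1.0 is linear."""
    base = rates[1]
    return {n: rate / (n * base) for n, rate in sorted(rates.items())}


if __name__ == "__main__":
    for test, rates in OPS_PER_SEC.items():
        eff = scaling_efficiency(rates)
        print(test, " ".join(f"{n}:{e:.2f}" for n, e in eff.items()))
```

With these numbers PUT efficiency is still about 0.88 at 16 processes but falls to roughly 0.63 at 48, while GETs stay above 0.95 through 32 processes and drop to about 0.79 at 48, in line with the bullets.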

Page 10: Benchmarking Swift

Example of getput maxing out

Test Clts Proc OSize Start    End      MB/Sec   Ops Ops/Sec Errs Latency LatRange
put     1    1    1k 16:43:50 16:48:50   0.02  6116   20.39    0   0.049 0.01-01.08
put     8   32    1k 16:50:17 16:55:18   0.24 62506  207.68    0   0.154 0.01-03.84
put     8  128    1k 16:56:51 17:01:56   0.08 32430  107.51    0   1.191 0.01-07.12
put     8  256    1k 17:03:40 17:08:45   0.08 35056  115.50    0   2.216 0.01-09.40
put     8  512    1k 17:09:36 17:14:44   0.16 44663  147.10    0   3.481 0.01-308.52
put     8 1024    1k 17:15:41 17:20:49   0.16 44179  145.98    0   7.122 0.01-308.87

gpsuite --suite 1kobjs

• Look at the latencies growing in both average and range
• Also notice we’ve hit the wall at a little under 150 IOPS
• BUT Swift did keep on chugging along, with no errors

Wow!

Note: this cluster had only 1 object server
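The latency growth is what Little’s law predicts for a saturated system: requests in flight = throughput × mean latency. Applying it to this table (our analysis, not the authors’) gives a product within about 2% of the Proc column on every row. That suggests Proc is the total number of outstanding requests across the clients and that, once throughput stops growing, each extra process simply adds queueing time.

```python
# (Proc, Ops/Sec, mean Latency in seconds) for the put rows on this slide.
ROWS = [
    (1, 20.39, 0.049),
    (32, 207.68, 0.154),
    (128, 107.51, 1.191),
    (256, 115.50, 2.216),
    (512, 147.10, 3.481),
    (1024, 145.98, 7.122),
]


def littles_law_concurrency(ops_per_sec: float, latency_s: float) -> float:
    """Little's law: average requests in flight = throughput * time in system."""
    return ops_per_sec * latency_s


if __name__ == "__main__":
    for procs, rate, lat in ROWS:
        in_flight = littles_law_concurrency(rate, lat)
        print(f"{procs:5d} procs -> {in_flight:7.1f} requests in flight")
```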

Page 11: Benchmarking Swift

Selecting servers for HP’s public cloud

• Get a better understanding of Swift performance and optimise the hardware/Swift combination

• Two different hardware configurations
  – 12-disk data servers
    • Dedicated proxy servers
    • Data servers host account/container/object services
    • 5:1 data-servers:proxy-servers
  – 60-disk data servers
    • Dedicated proxy servers
    • Data servers host object services only
    • Container/account services on servers separate from the object services
    • 1:1 data-servers:proxy-servers

• Concentrate on transaction rates, especially PUTs of small objects (1KB to 10KB)
  – Most objects in production are small (50% <= 20KB)
  – A high transaction rate exercises CPU, container & proxy services

Page 12: Benchmarking Swift

Configuration 1

[Diagram: proxy servers (Disk 1 and Disk 2, mirrored) in front of data servers Server 1 to Server 5, each with Disk 1 to Disk 12]

Proxy servers
• 12 physical cores, 2666MHz
• 96GB RAM
• 10 GigE
• 2*2TB 7200 RPM drives (mirror)
• ½ U width

Data servers
• 12 physical cores, 2666MHz
• 24GB RAM
• 1 GigE
• 12*2TB 7200 RPM drives
• 1U high
• Run object, container & account services

Page 13: Benchmarking Swift

First set of measurements: “idle” system

• Idle: no external PUTs, GETs, DELETEs, etc.
• This system has 123K containers & 17M objects per data-server
• The graph shows measurements with different services turned on and off
• Significant “idle” CPU load
• The biggest contributor to “idle” CPU burn is the container replicator

[Chart: Total CPU cores for different “idle” runs (data-server). X axis: idle run type (object & container servers only; object & container auditors on; object & container updaters on; object & container replicators on). Y axis: CPU cores, 0 to 4. Series: swift-object-updater, swift-object-server, swift-object-replicator, swift-object-expirer, swift-object-auditor, swift-container-updater, swift-container-sync, swift-container-stats-logger, swift-container-server, swift-container-replicator, swift-container-auditor, swift-account-server, swift-account-replicator, swift-account-reaper, swift-account-auditor.]

Page 14: Benchmarking Swift

CPU measurements: 1KB object PUTs

[Chart: CPU cores per process per data-server. X axis: PUTs/s (9, 137, 226, 271, 338). Y axis: CPU cores, 0 to 6. Series: object-updater, object-server, object-replicator, object-auditor, container-updater, container-sync, container-server, container-replicator, container-auditor.]

Page 15: Benchmarking Swift

I/O measurements: 1KB object PUTs

[Chart: Read from Cache & Read from Disk. X axis: PUTs/s (9, 137, 226, 271, 338). Y axis: KB/s, 0 to 600,000. Series: RKB, RKBC.]

[Chart: Write rate per process per PUT rate. X axis: Swift process (account-auditor, account-reaper, account-replicator, account-server, container-auditor, container-replicator, container-server, container-sync, container-updater, object-auditor, object-replicator, object-server, object-updater). Y axis: KB Write/s, 0 to 600. Series: 9, 137, 226, 271 and 338 Puts/Sec.]

[Chart: Read rate per process per PUT rate. X axis: Swift process (same processes). Y axis: KB Read/s, 0 to 3,000. Series: 9, 137, 226, 271 and 338 Puts/Sec.]

Page 16: Benchmarking Swift

Observations on Configuration 1 measurements

• Idle CPU burn is 34%, increasing to 38% at the maximum achieved PUT rate (approx. 338 PUTs/s)

• Container services are the major CPU hogs
• A small amount of memory hurts performance: most reads go to disk rather than cache
  – Major source of reads: object auditor
  – Object server reads grow approx. linearly with PUT rate (it reads 6x as much as it writes for 1KB PUTs)
• Running the container service alongside the object service hurts I/O: the container data flushes object data from cache
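The idle chart is in CPU cores while these bullets use percentages. Assuming the percentage is of the data server’s 12 physical cores (with hyperthreading the denominator could instead be 24 logical CPUs, which would double every figure), the conversion is simple arithmetic:

```python
PHYSICAL_CORES = 12  # data-server spec on the Configuration 1 slide


def percent_to_cores(percent: float, cores: int = PHYSICAL_CORES) -> float:
    """Convert a whole-server CPU percentage into a number of busy cores."""
    return percent / 100 * cores


if __name__ == "__main__":
    idle = percent_to_cores(34)
    busy = percent_to_cores(38)  # at ~338 PUTs/s
    print(f"idle: {idle:.2f} cores, at 338 PUTs/s: {busy:.2f} cores")
    print(f"PUT load adds {busy - idle:.2f} cores")
    # Alternative reading, if the percentage is of 24 hyperthreads.
    print(f"24-CPU reading: {percent_to_cores(34, 24):.2f} idle cores")
```

Under the 12-core reading, 34% is about 4.1 cores, in the same range as the idle chart’s 0 to 4 core axis, and the 338 PUTs/s load adds only about half a core on top.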

Page 17: Benchmarking Swift

Conclusions from Configuration 1 measurements

• PUT throughput (1KB) is limited by READ IOPS
• Keep container and object services separate
• The object services consume relatively little CPU
• Large amounts of RAM for buffer cache will help increase performance

Page 18: Benchmarking Swift

Configuration 2

[Diagram: proxy/container & account servers with Disk 1 to Disk 4, and object servers with Disk 1 to Disk 60]

Proxy/Container & Account Servers
• Same server type for proxy services & container/account services
• 12 physical cores, 2666MHz
• 96-192GB RAM
• 10 GigE
• 4*1TB 7200 RPM disks, in a variety of RAID configurations
• ½ U width

Object servers
• 12 physical cores, 2666MHz
• 96GB RAM
• 10 GigE
• 60*2TB 7200 RPM disks
• 4.3U high

Note
• Used many combinations of server & Swift services
• Used many variations of server details, e.g. RAID config
• Report results for a specific server/Swift service config

Page 19: Benchmarking Swift

Performance measurements: 4KB object PUTs

[Chart: CPU cores per process per object-server. X axis: PUTs/sec (85.95, 314.56, 769.62, 1117.17, 1372.35, 1578.39, 1542.03). Y axis: CPU cores, 0 to 18. Series: swift-object-server, swift-object-replicator.]

[Chart: CPU cores per process per “proxy” server. X axis: PUTs/sec (same runs). Y axis: CPU cores, 0 to 18. Series: swift-proxy-server, swift-container-updater, swift-container-sync, swift-container-server, swift-container-replicator.]

[Chart: Obj Read: Cache & Disk. X axis values: 20, 477, 986, 1033, 987, 894. Y axis: 0 to 350,000. Series: rkb, rkbc.]

Page 20: Benchmarking Swift

Observations on Configuration 2 measurements

• We achieved a maximum throughput of approx. 1600 PUTs/s using 1KB objects, and 2000 PUTs/s using 4KB objects

• Dramatic jump in CPU usage, particularly on the Object-server, for the 2000 PUTs/s run
  – Benefiting from hyperthreading

• On the Proxy/Account&Container-server, the dominant processes are the proxy-server and the container-server.

• All reads are satisfied from cache on Object-server and Proxy/Account&Container-server

Page 21: Benchmarking Swift

Conclusions from Configuration 2 measurements

• Massive increase in operation throughput
  – 5x System 1 (per rack U)

• Proxy services and account/container services can coexist
• Object auditing time is probably an issue with 60 disks
  – Estimate over 200 days for the auditor to walk that many disks on a “full” system
  – Possible solution: parallel object auditor
    • Patch under review https://review.openstack.org/#/c/59778/

• Next steps
  – Detailed object auditor measurements
  – Large container measurements using SSDs, striped disks
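The slide does not say what audit rate the 200-day estimate assumes, but the relationship is simple division. Working backwards, and assuming “full” means the raw 60 × 2TB = 120TB (decimal), 200 days implies an aggregate audit rate of only about 6.9 MB/s. The sketch also shows the idealized effect of a parallel auditor: auditing N disks concurrently cuts the time by N, provided the disks rather than CPU remain the limit. That model, and the helper names, are our simplification.

```python
SECONDS_PER_DAY = 86_400


def audit_days(capacity_bytes: float, bytes_per_sec: float, parallelism: int = 1) -> float:
    """Days to read every byte once at a given per-auditor rate (idealized)."""
    return capacity_bytes / (bytes_per_sec * parallelism) / SECONDS_PER_DAY


def implied_rate(capacity_bytes: float, days: float) -> float:
    """Aggregate bytes/sec needed to read capacity_bytes in the given days."""
    return capacity_bytes / (days * SECONDS_PER_DAY)


# Assumption: "full" = raw capacity of 60 x 2TB drives, decimal units.
CAPACITY_BYTES = 60 * 2 * 10**12

if __name__ == "__main__":
    rate = implied_rate(CAPACITY_BYTES, 200)
    print(f"200 days implies {rate / 1e6:.1f} MB/s")
    for n in (1, 4, 12):  # hypothetical degrees of auditor parallelism
        print(f"{n:2d} parallel auditors: {audit_days(CAPACITY_BYTES, rate, n):.0f} days")
```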

Page 22: Benchmarking Swift

Links

collectl: http://collectl.sourceforge.net/
getput: https://github.com/markseger/getput

