Presented by:
Leveraging open source tools to gain insight into OpenStack Swift
May 20, 2015
Michael Factor, IBM Fellow, Storage and Systems, IBM Research - Haifa
Dmitry Sotnikov, System and Storage Researcher, IBM Research - Haifa
Deep dive insights into Swift
The work was done with the help of Yaron Weinsberg and George Goldberg.
For more information contact: [email protected]
Swift Monitoring
• Monitoring Swift With StatsD
  https://swiftstack.com/blog/2012/04/11/swift-monitoring-with-statsd/
• Unified Instrumentation and Metering of Swift
  https://wiki.openstack.org/wiki/UnifiedInstrumentationMetering
• Administrator's Guide
  http://docs.openstack.org/developer/swift/admin_guide.html#cluster-telemetry-and-monitoring

"Once you have all this great data, what do you do with it? Well, that's going to require its own post."
— "Monitoring Swift With StatsD" by SwiftStack, Inc.
Swift Monitoring Flow
• StatsD allows deep instrumentation of the Swift code and can report over 100 metrics.
• Collectd gathers statistics about the system.
• Graphite is an enterprise-scale monitoring tool that stores and displays numeric time-series data.
• Logstash is a data pipeline that can normalize the data to a common format.
• Elasticsearch is a search server that allows indexing large amounts of data.
• Kibana is a browser-based analytics and search interface for Elasticsearch.
• Spark is a fast and general engine for large-scale data processing.
• RequestStopper catches a request and immediately returns success, which makes it possible to isolate per-component overheads in a non-production system.
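To make the StatsD piece concrete: Swift's instrumentation emits each metric as a tiny plaintext UDP datagram of the form "name:value|type". The sketch below is not Swift's actual client code; the daemon address is StatsD's conventional default, and the metric names are illustrative of the style Swift uses.

```python
import socket
import time

# Assumed address of a local StatsD daemon (StatsD's default UDP port).
STATSD_ADDR = ("127.0.0.1", 8125)

def counter_payload(name: str) -> bytes:
    # Counter increment, e.g. one tick per successful PUT.
    return f"{name}:1|c".encode()

def timer_payload(name: str, elapsed_ms: float) -> bytes:
    # Timer sample in milliseconds.
    return f"{name}:{elapsed_ms:.2f}|ms".encode()

def send(payload: bytes) -> None:
    # Fire-and-forget UDP: no connection, no ack, negligible overhead.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, STATSD_ADDR)
    finally:
        sock.close()

start = time.time()
# ... handle a PUT request here ...
send(counter_payload("proxy-server.object.PUT.201"))
send(timer_payload("proxy-server.object.PUT.timing", (time.time() - start) * 1000))
```

Because the transport is connectionless UDP, instrumented servers never block on the monitoring node, which is what makes it safe to report 100+ metrics from a busy Swift cluster.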
[Diagram: a Swift node running Proxy, Container, Object, and Account servers, instrumented with StatsD and RequestStopper, reports metrics together with CPU, RAM, and disk statistics to the monitoring, analytics, and visualization node.]
Benchmark Tool
• COSBench, Intel's Cloud Object Storage Benchmarking tool
  https://github.com/intel-cloud/cosbench
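COSBench drives a run from an XML workload file. The sketch below shows the general shape of a 100-worker PUT workload against 100 containers; the auth details, selector expressions, and all field values here are illustrative assumptions — consult the COSBench user guide for exact syntax.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<workload name="swift-put" description="100-worker PUT workload sketch">
  <storage type="swift" />
  <auth type="swauth"
        config="username=test:tester;password=testing;url=http://proxy:8080/auth/v1.0" />
  <workflow>
    <workstage name="prepare">
      <!-- create the 100 target containers first -->
      <work type="init" workers="1" config="containers=r(1,100)" />
    </workstage>
    <workstage name="main">
      <work name="put" workers="100" runtime="600">
        <operation type="write" ratio="100"
                   config="containers=u(1,100);objects=u(1,5000);sizes=c(15)KB" />
      </work>
    </workstage>
  </workflow>
</workload>
```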
Where Our Journey Starts
• Swift 1.13
• 1 container
• Half a million small objects
• 100 COSBench workers
• What should the cluster size be to run more than 1000 PUTs a second (with reasonable response time)?
Our Hardware - Story #1
• 3 proxy nodes (Proxy servers only)
• 7 storage nodes (Object, Container and Account servers)
  • 20 HDD
  • 2 SSD
  • 256 GB RAM
• 3 client machines connected to the proxies
• All network connections are 10 Gbps

520 operations per second
Swift Data Path Flow
• The PUT object request arrives at one of the proxies.
• The proxy sends the request to R (e.g., 3) storage nodes that will hold the R (e.g., 3) replicas of that object.
• Next, the container database is updated asynchronously to reflect the new object (https://swiftstack.com/openstack-swift/architecture/).
  • It is not fully asynchronous: the update is first attempted synchronously, falling back to async on a 0.5 sec timeout.
• When at least two of the three writes to the object servers return successfully, the proxy server notifies the client that the upload was successful.
[Diagram: Client PUT request and response through the Proxy Server, which writes to the Object Servers; the Container Server is then updated with the new object.]
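The "two of three" write rule described above is Swift's majority quorum for replicated writes. A minimal sketch (a simplification, not Swift's actual proxy code; the backend callables are hypothetical stand-ins for object-server connections):

```python
from concurrent.futures import ThreadPoolExecutor

REPLICAS = 3
# Majority quorum: 2 of 3 replica writes must succeed.
QUORUM = REPLICAS // 2 + 1

def put_object(object_servers, data):
    """Send the object to all replica nodes in parallel;
    succeed once a quorum of them returns a 2xx status."""
    with ThreadPoolExecutor(max_workers=REPLICAS) as pool:
        statuses = list(pool.map(lambda srv: srv(data), object_servers))
    successes = sum(1 for s in statuses if 200 <= s < 300)
    return 201 if successes >= QUORUM else 503

# Hypothetical backends: two healthy object servers and one failing disk.
ok = lambda data: 201
bad = lambda data: 507
print(put_object([ok, ok, bad], b"payload"))  # quorum of 2 reached -> 201
```

The quorum rule is why a single slow or failed object server does not stall the client, while the container update can lag behind on the async_pending path.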
Swift Data Path Flow
[Diagram: the same PUT flow, with "null" variants of the Proxy, Object, and Container Servers substituted to short-circuit each component.]

While nulling out a server is not useful for a production system, it is useful for diagnosing performance bottlenecks.
RequestStopper
https://gist.github.com/gilv/7e70ba055f24bcc472b6
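The gist above implements RequestStopper as Swift WSGI middleware. A minimal sketch of the idea (not the gist's exact code; the class name and wiring follow standard paste.deploy conventions, and the short-circuited verbs are an assumption):

```python
# Sketch of a WSGI middleware that answers requests immediately with
# success instead of forwarding them down the pipeline. Placing it in
# front of a given server isolates the overhead of everything behind it.
class RequestStopper:
    def __init__(self, app, methods=("PUT",)):
        self.app = app          # the rest of the Swift pipeline
        self.methods = methods  # which verbs to short-circuit

    def __call__(self, environ, start_response):
        if environ.get("REQUEST_METHOD") in self.methods:
            # Pretend the request succeeded without doing any work.
            start_response("201 Created", [("Content-Length", "0")])
            return [b""]
        return self.app(environ, start_response)

def filter_factory(global_conf, **local_conf):
    # paste.deploy entry point, the way Swift middleware is normally wired up
    def factory(app):
        return RequestStopper(app)
    return factory
```

In a (non-production) server config this is inserted into the pipeline just ahead of the component being measured; the difference in end-to-end response time with and without it is that component's contribution.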
Swift Data Path Flow – Put Request Response Time
[Diagram: PUT response times measured with RequestStopper inserted at successive points in the path: native path 192.47 ms; stopped at the Container Server 47.3 ms; at the Object Server 32.89 ms; at the Proxy Server 1.86 ms.]
Object PUT Operations Average Response Time per Swift Component
100 Workers, 500K Objects

[Chart: response time (ms) broken down into network RTT to proxy, Proxy Server, Object Server, and Container Server, for Swift 1.13 with 1 container vs. 100 containers; the 100-container case is ~4.7x faster.]
Object PUT Operations Average Response Time Comparison of Swift 1.13 vs Swift 2.2
100 Workers, 500K Objects

[Chart: the same per-component breakdown for Swift 1.13 and Swift 2.2, each with 1 and 100 containers; Swift 2.2 is ~3x faster with 1 container and ~1.5x faster with 100 containers.]
Mixed Workload: 1 Container, 100 Workers, 500K Objects
[Chart: read, write, and delete operations per second for the mixed workload with 1 container, Swift 1.13 vs. Swift 2.2.]

On the mixed workload, Swift 2.2 achieves a 70% performance improvement.
SWIFT Scalability – Swift 2.2
100 Containers, 100 Workers, 500K Objects
[Charts: PUT operations per second, and response-time distribution (60%/80%/90%/95%/99% percentiles, ms), as the number of storage servers grows from 2 to 7.]
This is the maximal performance that can be achieved by 100 COSBench workers on Swift 2.2, so adding a new node does not improve performance.
Influence of the Number of COSBench Workers on Performance – Swift 2.2
7 Storage Nodes, 500K Objects, 100 Containers

#Workers      Operations per second   Average Response Time (ms)
100 workers   2854.31                 38.98
200 workers   3455.17                 62.68
400 workers   4323.52                 101.96
Story #1 Conclusions: RequestStopper
• In some cases the limiting factor is not throughput but response time.
• The response time of native Swift 1.13 with 1 container is 192 ms, i.e. ~5.2 op/sec per COSBench worker, or 520 op/sec for 100 COSBench workers.
• Reducing the response time to 65 ms in Swift 2.2 yields ~1560 IOPS on the same cluster.
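The arithmetic behind these conclusions is Little's law: with closed-loop benchmark workers that each keep exactly one request in flight, throughput equals workers divided by response time. A quick check against the numbers above:

```python
def throughput(workers: int, response_time_ms: float) -> float:
    """Closed-loop throughput (op/sec): each worker keeps exactly one
    synchronous request in flight, so ops/sec = workers / response_time."""
    return workers / (response_time_ms / 1000.0)

# Swift 1.13, 1 container: 192 ms per PUT with 100 workers -> roughly 520 op/sec
print(throughput(100, 192))
# Swift 2.2: ~65 ms per PUT with the same 100 workers -> roughly 1540 op/sec
print(throughput(100, 65))
```

This is why, at a fixed worker count, cutting response time raises throughput almost proportionally even though no hardware was added.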
Story #1 Conclusions: Container Server
• The difference in the Container Server performance between Swift 2.2 and Swift 1.13 was due in large part to the container merge_items speedup patch (https://review.openstack.org/#/c/116992/)
• Container Sharding (https://review.openstack.org/#/c/139921/) still has the potential to improve performance for this workload by a factor of 1.5
System Size Influence on Performance
[Charts: average response time (ms) vs. the number of objects per container (10 to 100,000), and vs. the total number of objects in the system (5K to 500K), each for 1 container and 100 containers.]
Swift performance (response time) is influenced by the number of objects per container. In our environment we identified an optimal number of objects; it remains to evaluate what determines the optimal number of objects per container.
Where Our Journey Continues
• September 2014
• Swift 1.13
• 1 container
• Half a million small objects
• What should the cluster size be to run more than 1000 PUTs a second (with reasonable response time)?
Kibana – Put Request Response Time Percentiles
[Chart: PUT request response time percentiles (sec) as visualized in Kibana.]

Average Response Time over 1-second intervals – Swift 2.2
[Charts: average response time (ms) over 1-second intervals between ~2:23 and ~3:29, with recurring spikes.]

There is a peak every 30 seconds.
https://bugs.launchpad.net/swift/+bug/1450656 ?
Graphite – PUT Request Response Time
[Chart: PUT request response time (sec) in Graphite; outlier spikes recur every 30 seconds.]
Zoom-in on Swift Response Time Outliers (> 0.5 sec), Request Granularity
PUT workload, 500K objects, 100 Containers, 100 Workers, Swift 2.2

[Chart: per-request response times (sec) between 0:36:00 and 0:40:19; outlier clusters appear every 30 seconds.]
The effect of fs.xfs.xfssyncd_centisecs on PUT response time
PUT workload, 500K objects, 100 Containers, 100 Workers, Swift 2.2

[Chart: response time (ms) over time with the sync interval set to 300 seconds vs. 60 seconds; spikes follow the sync interval.]
PUT response time percentiles (ms) by fs.xfs.xfssyncd_centisecs value (in seconds):

Seconds   Avg-ResTime  60%-RT  80%-RT  90%-RT  95%-RT  99%-RT  100%-RT
10        83.26        30      50      350     520     700     1,620
30        43.34        30      40      50      60      530     3,690
60        38.81        30      40      50      70      270     5,900
300       31.89        30      40      50      70      220     9,530

Increasing fs.xfs.xfssyncd_centisecs improves the 99th percentile at the price of degrading the 100th percentile.
Story #2

Our Hardware - Story #2
• 2 proxy nodes (Proxy servers only)
• 4 object nodes (Object servers)
  • 15 HDD
  • 128 GB RAM
• 2 metadata nodes (Container and Account servers)
  • 2 SSD
  • 128 GB RAM
• 2 client machines connected to the proxies
• Internal network connections are 10 Gbps
Object PUT Workload
100 Workers, 100 Containers, 500K Objects
Clients Transmitted Throughput

[Chart: throughput (MB/sec) transmitted by the client machines over time.]
Clients Transmitted vs. Proxy Servers Received Throughput Comparison

[Chart: total client transmitted network throughput equals total proxy received network throughput (MB/sec) over time.]
Proxy Servers Received vs. Proxy Servers Transmitted Throughput Comparison

[Chart: the proxies transmit ~3x the throughput they receive (MB/sec), one copy per replica.]
Proxy Servers Received and Transmitted vs. Object Servers Received Throughput Comparison

[Chart: the object servers' received throughput matches the proxies' transmitted throughput, ~3x the proxies' received throughput.]
Network vs. Disks Throughput Comparison

[Chart: the object servers' disk write throughput is ~12x their received network throughput (MB/sec).]
Disks Capacity Utilization

[Chart: total disk capacity used over time, annotated with the new-object-creation part, the rewrite workload, and the disk capacity expected if the whole workload created new objects.]
Number of async_pending requests over time

[Chart: async_pending requests per second on each object server (Object1–Object4).]
Variable object size workload

[Chart: throughput (MB/sec) over time as the object size steps through 15 KB, 32 KB, 64 KB, 128 KB, 512 KB, and 1 MB; proxy received, proxy transmitted, and object servers' disk write throughput.]
Disk vs. Client Perceived Bandwidth

[Chart: ratio of disk write bandwidth to client-perceived bandwidth for object sizes from 15 KB to 1 MB.]

The overhead is not a flat 3x, but is instead a function of object size.
Our Hardware - Story #3 – without async_pendings
• 2 proxy nodes (Proxy servers only)
• 5 object nodes (Object servers)
  • 15 HDD
  • 128 GB RAM
• 3 metadata nodes (Container and Account servers)
  • 3 SSD
  • 128 GB RAM
• 2 client machines connected to the proxies
• Internal network connections are 10 Gbps
Network vs. Disks Throughput Comparison

[Chart: proxy received, proxy transmitted, and object servers' disk write throughput (MB/sec) over time.]
Disks Capacity Utilization

[Chart: disk capacity used over time.]
Number of async_pending requests over time

[Chart: async_pending requests per second over time.]
Back to Story #2
PUT Request Average Response Time (Lower is better)

[Chart: PUT response time (ms) over time for Object1–Object4 and the two proxies.]

Object server "Object1" has a much higher response time; "Object2/3/4" have lower response times than the proxies.
Disks Read and Write Throughputs

[Chart: read and write disk throughput for Object1–Object4.]
Process statistics over the object servers

[Chart: number of running vs. blocked processes over wall clock time; blocked processes are those waiting for an IO response.]
The idle CPU comparison over the object servers (Higher is better)

[Chart: idle percentage for CPU0–CPU39 (and stopped CPUs) over wall clock time.]
• Micro benchmark results (Vdbench 8k random write workload):
  • "Object1" shows an average response time of ~37 ms
  • "Object2", "Object3", "Object4" show an average response time of ~30 ms
• Our investigation revealed that the "Object1" server consists of older hardware components, even though all servers were "supposed" to be the same.
4 Object Servers vs. 3 Object Servers (without Object1):
~10% throughput improvement, despite a 25% reduction in object servers.
Effect of Object Size on Cluster Bandwidth

Object Size   Avg-Res Time   Avg-Proc Time   Throughput      Bandwidth
15 KB         38.98 ms       38.94 ms        2854.31 op/s    42.81 MB/s
1 MB          105.37 ms      103.6 ms        967.04 op/s     967.04 MB/s
10 MB         852.06 ms      578.62 ms       117.43 op/s     1.17 GB/s
Back-of-the-envelope calculation:
User Bandwidth × Number of Replicas < Total Proxy Backend Bandwidth
In our case we have 3 proxy servers and a 10 Gbit network:
User Bandwidth × 3 < 3 × 10 Gbit  ⇒  User Bandwidth < 1.25 GB/sec
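The calculation above can be checked numerically (a simple sketch; 10 Gbit/s is taken as 1.25 GB/s, ignoring protocol overhead):

```python
# Back-of-the-envelope ceiling on user-visible bandwidth: every user byte
# is replicated over the proxy backend network, so
# user_bw * replicas must fit within the total proxy backend bandwidth.
GBIT_TO_GBYTE = 1 / 8  # 10 Gbit/s ~= 1.25 GB/s, ignoring overhead

def max_user_bandwidth_gb(proxies: int, link_gbit: float, replicas: int) -> float:
    total_backend_gb = proxies * link_gbit * GBIT_TO_GBYTE
    return total_backend_gb / replicas

# 3 proxies, 10 Gbit links, 3 replicas -> 1.25 GB/s ceiling
print(max_user_bandwidth_gb(3, 10, 3))  # 1.25
```

This matches the measured 1.17 GB/s for 10 MB objects: at large object sizes the cluster is close to the proxy-network ceiling rather than being limited by response time.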