
Performance evaluation between checkpoint services in multi tier stateful

Date post: 15-Apr-2017
Upload: demis-gomes
Performance Evaluation Between Checkpoint Services in Multi-tier Stateful Applications
Demis Gomes
Advisor: Glauco Gonçalves
Co-Advisor: Patricia Endo


INTRODUCTION


Introduction

• Platform-as-a-Service (PaaS)

[Figure: PaaS actors: Developer, PaaS Provider, Application, and User]


Introduction

• Multi-tier stateful applications


Introduction

• It is important to keep an application in a PaaS running for as long as possible

• Downtime causes significant financial losses


Introduction

• The average cost of a critical application failure per hour is $500,000 to $1 million.

Source: https://devops.com/2015/02/11/real-cost-downtime/ . Last accessed 11 Oct. 2016

Checkpoint Services!


Introduction

[Figure: the Checkpoint Service in relation to Developers, Users, and PaaS Providers]


Background

• A checkpoint service is divided into three mechanisms:
– Checkpoint saving
– Failure detection
– Failover
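
The three mechanisms can be pictured with a minimal in-memory sketch. Everything here is hypothetical and only illustrative: the class name, the heartbeat-based detection, and the 5-second timeout are assumptions, not details taken from the evaluated services.

```python
import copy
import time


class CheckpointService:
    """Toy in-memory checkpoint service (illustrative, not the evaluated one)."""

    def __init__(self, heartbeat_timeout_s=5.0):
        self.heartbeat_timeout_s = heartbeat_timeout_s  # assumed timeout
        self.last_state = None       # most recent checkpoint
        self.last_heartbeat = None   # time of the last sign of life

    def save_checkpoint(self, state):
        # Checkpoint saving: keep a private copy of the application state.
        self.last_state = copy.deepcopy(state)

    def heartbeat(self, now=None):
        # Record that the active instance is alive (assumed detection scheme).
        self.last_heartbeat = time.monotonic() if now is None else now

    def failure_detected(self, now=None):
        # Failure detection: the active instance has been silent for too long.
        now = time.monotonic() if now is None else now
        if self.last_heartbeat is None:
            return False
        return now - self.last_heartbeat > self.heartbeat_timeout_s

    def failover(self):
        # Failover: hand the last saved state to the standby instance.
        return copy.deepcopy(self.last_state)


# Usage: save a state, then recover it once the active instance goes silent.
service = CheckpointService(heartbeat_timeout_s=5.0)
service.heartbeat(now=0.0)
service.save_checkpoint({"room": "general", "messages": ["hi"]})
if service.failure_detected(now=10.0):
    standby_state = service.failover()
```

Passing `now` explicitly keeps the sketch deterministic; a real service would rely on the monotonic clock default.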


Background

• Checkpoint Service

[Figure: active and standby App instances with the Checkpoint Service between them; labels: App State, Checkpoint Saving, Failure Detection, Failover]


Background

• Service Availability Forum (SAF)

• Three different implementations:
– Non-collocated
– Collocated warm
– Collocated hot



Checkpoint Services

[Figure: two architectures side by side. CS Application-level: a state-aware application (App and Agent) with a Checkpoint Manager. CS System-level: an HA-agnostic application (App and Agent) in a Container, with a Checkpoint Manager]


Motivation

• Previous works present either application-level [1] or system-level [2] checkpoint services

• Lack of consistent comparison between these services

• No implementation in accordance with the SAF standard


Motivation

• Carry out a performance evaluation of system-level and application-level checkpoint services that follow the SAF standard, and evaluate the impact of the different recovery modes on time and resource consumption


Answer three questions

• System-level ~= App-level?
• Impact of changing from non-collocated to collocated?
• Bottlenecks of the system-level and application-level?


CHECKPOINT SERVICES


Application

• State-aware application
• A multi-tier stateful chat
– Frontend: provides the interface and saves user’s data
– Backend: saves room messages
– Database: stores information related to rooms and users

[Figure: the Agent sends GET /state to the App, which answers 200 OK]


Application

• State provided via JSON (backend)
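
A state-aware backend that exposes its state as JSON on `GET /state` might look like the sketch below. The JSON layout (`rooms` mapping to message lists), the port, and all names are assumptions for illustration; the actual schema used by the chat is not given here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory state of the chat backend (room -> messages).
ROOM_MESSAGES = {"general": ["hello", "hi there"]}


def state_as_json(rooms):
    """Serialize the backend state to a UTF-8 encoded JSON document."""
    return json.dumps({"rooms": rooms}).encode("utf-8")


class StateHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/state":
            # Answer the Agent's request with the current state.
            body = state_as_json(ROOM_MESSAGES)
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)


def serve(port=8080):
    """Blocking call: serve GET /state until interrupted (port is assumed)."""
    HTTPServer(("", port), StateHandler).serve_forever()
```

Calling `serve()` starts the endpoint; an Agent could then poll `http://<app_host>:8080/state` and forward the body to the Checkpoint Manager.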


CS System-level

• We used well-known tools:
– LXC as container
– NFS as file system
– rsync to transfer files between instances
– CRIU to checkpoint and restore containers

CS: Checkpoint Service! :D


CS System-level

• We did not implement collocated hot because CRIU does not allow restoring into a running instance


CS System-level

• Checkpoint in non-collocated

[Figure: Checkpoint Manager, Active Instance, and Standby Instance; each instance has an App, an Agent, and a Container]


CS System-level

• Checkpoint in collocated warm

[Figure: Checkpoint Manager, Active Instance, and Standby Instance; each instance has an App, an Agent, and a Container; rsync links the instances]


CS System-level

• Failover in non-collocated

[Figure: Checkpoint Manager, Active Instance, and Standby Instance; each instance has an App, an Agent, and a Container]


CS System-level

• Failover in collocated warm

[Figure: Checkpoint Manager, Active Instance, and Standby Instance; each instance has an App, an Agent, and a Container; rsync links the instances]


CS App-level

• The application-level CS was developed from scratch for this work

• REST resources

Remember, CS: Checkpoint Service! :D

GET http://{manager_ip}:{manager_port}/config

RESPONSE 200 OK Content-type: application/json
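
A client for the `/config` resource above could be sketched as follows. Only the URL pattern, the 200 status, and the JSON content type come from the slide; the function names and the timeout are hypothetical, and the fields inside the JSON body are deliberately left unspecified.

```python
import json
from urllib.request import urlopen


def config_url(manager_ip, manager_port):
    """Build the URL of the manager's /config resource."""
    return f"http://{manager_ip}:{manager_port}/config"


def parse_config(body):
    """Decode a JSON response body; its fields are not specified here."""
    return json.loads(body.decode("utf-8"))


def fetch_config(manager_ip, manager_port, timeout_s=2.0):
    """GET /config from the manager and return the decoded JSON."""
    url = config_url(manager_ip, manager_port)
    with urlopen(url, timeout=timeout_s) as response:
        if response.status != 200:
            raise RuntimeError(f"unexpected status {response.status}")
        return parse_config(response.read())
```

An Agent would call `fetch_config(manager_ip, manager_port)` once at startup and keep the returned dictionary.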


CS App-level

• Checkpoint at Application-level

[Figure: Checkpoint Manager, Active Instance, and Standby Instance, each with a state-aware App and an Agent; modes shown: non-collocated, collocated warm, collocated hot]


CS App-level

• Failover in non-collocated

[Figure: Checkpoint Manager, Active Instance, and Standby Instance, each with an App and an Agent]


CS App-level

• Failover in collocated warm

[Figure: Checkpoint Manager, Active Instance, and Standby Instance, each with an App and an Agent]


CS App-level

• Failover in collocated hot

[Figure: Checkpoint Manager, Active Instance, and Standby Instance, each with an App and an Agent]


EVALUATION


Evaluation

• Two evaluations were conducted
– Evaluation I: failover time comparison
– Evaluation II: checkpoint time and resource consumption comparison


Evaluation

Physical Machines: 16 GB RAM, 8 cores, Gigabit Interface


Evaluation I

• Methodology
– Backend with state sizes of 1, 5, 10, 15, 20, and 25 MB
– Experiment Manager starts the experiment and generates a failure alert
– Failover process is executed
– Failover time is collected
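
The methodology above can be sketched as a small measurement loop. The callables `prepare_state`, `trigger_failure`, and `wait_for_recovery` are hypothetical hooks standing in for the Experiment Manager's actions; the number of repetitions is also an assumption.

```python
import time

STATE_SIZES_MB = [1, 5, 10, 15, 20, 25]  # sizes listed in the methodology


def measure_failover(trigger_failure, wait_for_recovery, clock=time.perf_counter):
    """Return the seconds elapsed between the failure alert and recovery."""
    start = clock()
    trigger_failure()     # generate the failure alert
    wait_for_recovery()   # block until the failover process has finished
    return clock() - start


def run_experiment(prepare_state, trigger_failure, wait_for_recovery, repetitions=1):
    """Collect failover times (in seconds) for every state size."""
    results = {}
    for size_mb in STATE_SIZES_MB:
        samples = []
        for _ in range(repetitions):
            prepare_state(size_mb)  # load the backend with size_mb of state
            samples.append(measure_failover(trigger_failure, wait_for_recovery))
        results[size_mb] = samples
    return results
```

Injecting the clock makes the timing logic easy to check without a real deployment.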


Failover time – Non-collocated

Application-level has a greater failover time

The growth is linear


Failover time – Non-collocated

We estimate the failover time as the state size increases up to 100 MB

App-level would be 66% faster


Failover time – Collocated

Application-level collocated warm is greatly affected by the increase in state size

The values of application-level collocated hot and system-level collocated warm are very similar


Failover time – Collocated

Linear regression shows:

Steep increase for application-level collocated warm

Slight increase for system-level collocated warm

Constant values for collocated hot
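
A least-squares line fit of failover time against state size, extrapolated to 100 MB, can be sketched as below. The sample times are synthetic placeholders chosen to be exactly linear; they are NOT the measured values, which appear only in the plots.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return intercept + slope * x


# Synthetic placeholder data (NOT the measured results): time = 0.5 + 0.1 * size.
sizes_mb = [1, 5, 10, 15, 20, 25]
times_s = [0.6, 1.0, 1.5, 2.0, 2.5, 3.0]

slope, intercept = fit_line(sizes_mb, times_s)
estimate_100mb = predict(slope, intercept, 100)  # extrapolation beyond the data
```

A near-zero slope corresponds to the constant behaviour of collocated hot, while a larger slope corresponds to the growth seen in the warm modes.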


Evaluation II

• Methodology
– As in the previous experiment, states are saved with the same state sizes
– Experiment Manager triggers a checkpoint process
– Checkpoint time is collected
– Resource consumption is evaluated


Evaluation II

• Methodology
– Resource consumption metrics

Metric | Measured in
Checkpoint time | s
CPU load | %
Memory occupation | %
Network I/O throughput | Mbps
Disk I/O throughput | b/s


Evaluation II – Checkpoint times


Evaluation II – Active Instance

At 25 MB | CPU | Memory | Network (I/O) | Disk (W)
Sys-lvl collocated warm | 6.8% | 9.4% | 0/59.8 Mbps | 1300 b/s
App-lvl collocated warm | 2.7% | 9.1% | 0/8.8 Mbps | 9220 b/s
App-lvl collocated hot | 2.53% | 9.5% | 0/8.64 Mbps | 8340 b/s

At 25 MB | CPU | Memory | Network (I/O) | Disk (W)
Sys-lvl non-collocated | 6% | 9.1% | 0/81 Mbps | 1780 b/s
App-lvl non-collocated | 2% | 8.92% | 0/11.6 Mbps | 2410 b/s


Evaluation II – Standby Instance

At 25 MB | CPU | Memory | Network (I/O) | Disk (W)
Sys-lvl collocated warm | 1.8% | 10.3% | 5.1/0 Mbps | 12500 b/s
App-lvl collocated warm | 2.5% | 11.9% | 8.5/8.5 Mbps | 7280 b/s
App-lvl collocated hot | 4.1% | 12.4% | 8.35/8.35 Mbps | 6900 b/s

At 25 MB | CPU | Memory | Network (I/O) | Disk (W)
Sys-lvl non-collocated | 0.16% | 9.8% | 0/0 Mbps | 800 b/s
App-lvl non-collocated | 0.2% | 11.4% | 0/0 Mbps | 2600 b/s


Discussion

• Availability analysis over one year
• Mean Time To Recovery (MTTR) taken as the failover time
• Mean Time To Failure (MTTF) taken as that of an Apache server (788.4 h/year) [3]
• Assuming a failover time 50 times greater than the measured one
• High Availability (HA) = 99.999% (five nines)


Discussion

Service | MTTR at 25 MB (s) | MTTR at 25 MB with factor 50 (s) | MTTF (s) | Availability with factor 50 (%)
System-level collocated warm | 0.38636 | 19.318 | 2838240 | 99.9993
Application-level collocated warm | 1.27823 | 63.9115 | 2838240 | 99.997
Application-level collocated hot | 0.25802 | 12.901 | 2838240 | 99.9995
System-level non-collocated | 3.5441 | 177.205 | 2838240 | 99.9937
Application-level non-collocated | 1.38795 | 69.3975 | 2838240 | 99.997

Availability analysis (25 MB)
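
The availability column can be reproduced under the assumption that the standard steady-state formula, availability = MTTF / (MTTF + MTTR), was used; the slides do not state the formula, but it matches the table to the displayed precision. The function name and defaults below are illustrative.

```python
MTTF_S = 788.4 * 3600  # 788.4 h expressed in seconds (2,838,240 s)


def availability_percent(mttr_s, mttf_s=MTTF_S, factor=50):
    """Steady-state availability in percent, with the MTTR scaled by `factor`.

    Assumes availability = MTTF / (MTTF + MTTR).
    """
    scaled_mttr = mttr_s * factor
    return 100.0 * mttf_s / (mttf_s + scaled_mttr)


# Measured failover times at 25 MB, taken from the table.
sys_collocated_warm = availability_percent(0.38636)
app_collocated_hot = availability_percent(0.25802)

# Both values stay above the five-nines threshold of 99.999%.
meets_five_nines = sys_collocated_warm >= 99.999 and app_collocated_hot >= 99.999
```

With the factor of 50 applied, only these two modes clear 99.999%; the other three rows of the table fall below it.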


Discussion

Service | MTTR at 100 MB (s) | MTTR at 100 MB with factor 50 (s) | MTTF (s) | Availability with factor 50 (%)
System-level collocated warm | 0.5902 | 29.51 | 2838240 | 99.9989
Application-level collocated warm | 3.8621 | 193.1 | 2838240 | 99.993
Application-level collocated hot | 0.2677 | 13.385 | 2838240 | 99.9995
System-level non-collocated | 9.7999 | 498.995 | 2838240 | 99.9824
Application-level non-collocated | 4.321 | 216.05 | 2838240 | 99.9923

Availability analysis (prediction until 100 MB)


CONCLUSIONS AND FUTURE WORKS


Conclusions

Answering the questions

• System-level ~= App-level?

Yes! In collocated warm


Conclusions

• Impact of changing from non-collocated to collocated?
– Failover: great decrease
– Checkpoint: great increase
– Resource consumption: similar, except for CPU and disk (greater in collocated)


Conclusions

• Bottlenecks of the system-level and application-level?

– App: disk, CPU in standby (hot), and development time

– Sys: CPU, network and NFS


Conclusions

• CS Application-level
– Private PaaS
– Apps with a large state size and a high rate of checkpoints (massive online applications)


Conclusions

• CS System-level
– PaaS with legacy applications
– Apps with a smaller state size and longer checkpoint intervals


Conclusions

• PaaS Business Model
– Non-collocated: Free plans
– Collocated: Premium plans


Contributions

• Short paper accepted with the results of Experiment I, entitled:

“Failover Time Evaluation Between Checkpoint Services in Multi-tier Stateful Applications”

IM-2017, Exp. Session (Qualis B1)


Future Works

As future work, we will study
• Scalability of the services
• Resource consumption on the Experiment Instance


Acknowledgments

• Thanks!

#CatãoEterno


References

• [1] KANSO, Ali; LEMIEUX, Yves. Achieving High Availability at the Application Level in the Cloud. In: 2013 IEEE Sixth International Conference on Cloud Computing. IEEE, 2013. p. 778-785.

• [2] LI, Wubin; KANSO, Ali; GHERBI, Abdelouahed. Leveraging Linux containers to achieve high availability for cloud services. In: Cloud Engineering (IC2E), 2015 IEEE International Conference on. IEEE, 2015. p. 76-83.

• [3] MELO, R. M. D. et al. Redundant VoD streaming service in a private cloud: availability modeling and sensitivity analysis. Mathematical Problems in Engineering, Hindawi Publishing Corporation, v. 2014, 2014.


BACKUP


Agenda

• Introduction
• Checkpoint Services
• Evaluation
– Experiment I
– Experiment II
• Conclusion and Future Works
• Acknowledgments


Introduction

• PaaS poses several challenges, one of which is the availability of its services

• Multi-tier stateful applications


Introduction

• Many PaaS offerings do not have a mechanism that handles application failures

• Some offer a backup, but it is not transparent


Introduction

Tsuru only restarts the application, without saving its last state


VM x Container

• VMs
• Containerization


Objectives

• General
– Carry out a consistent comparison between checkpointing at the system and application levels
• Specific
– Develop the two modes following the SAF standard
– Compare the services using the following metrics:
  • Failover time
  • Checkpoint time
  • Load generated in the application


Application

• The application generates a new base state if
– a threshold defined by the developer has been reached
– a time limit has been reached

[Figure: two examples: App with 20 new messages; App with 120 seconds without updates]
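
The two triggers can be sketched as a small policy object. The defaults of 20 messages and 120 seconds mirror the examples on the slide; the class, its method names, and the reading of the time limit as "seconds since the last update" are assumptions.

```python
class BaseStatePolicy:
    """Decides when the application should generate a new base state."""

    def __init__(self, max_new_messages=20, max_idle_seconds=120.0):
        self.max_new_messages = max_new_messages  # developer-defined threshold
        self.max_idle_seconds = max_idle_seconds  # time limit without updates
        self.new_messages = 0
        self.last_update = 0.0

    def record_message(self, now):
        # Count one more message since the last base state.
        self.new_messages += 1
        self.last_update = now

    def should_generate(self, now):
        # Trigger on either condition: message threshold or idle time limit.
        threshold_reached = self.new_messages >= self.max_new_messages
        time_limit_reached = (now - self.last_update) >= self.max_idle_seconds
        return threshold_reached or time_limit_reached

    def reset(self, now):
        # Call right after a new base state has been generated.
        self.new_messages = 0
        self.last_update = now
```

The application would call `record_message` on each chat message, check `should_generate` periodically, and call `reset` once the new base state is published.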


CS System-level

• Checkpoint/Restore In Userspace (CRIU)

• Saves memory context
• Freezes processes while reading memory
• Restores processes on machines with the same filesystem


CS System-level

• Phoenix!


Checkpoint Services Implementation

• URLs implemented by the chat


Checkpoint Services

• CS Application-level

[Figure: Checkpoint Manager, Active Instance, and Standby Instance, each with a state-aware App and an Agent; modes shown: non-collocated, collocated warm, collocated hot]


Checkpoint Services

• CS System-level

[Figure: Checkpoint Manager, Active Instance, and Standby Instance; each instance runs an HA-agnostic App and an Agent in a VM/Container; modes shown: non-collocated, collocated warm, collocated hot]


CS System-level

• LXC must be configured to allow CRIU to checkpoint and restore containers


Evaluation II

• Methodology
– Checkpoint time is presented as means with a 95% Confidence Interval (CI)
– Resource consumption values are means with a 95% CI for the active and standby instances
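
A mean with a 95% confidence interval can be computed as in the sketch below. It uses the normal approximation (z close to 1.96); the slides do not say which method was used, and for small samples a Student t quantile would give a wider interval.

```python
from statistics import NormalDist, mean, stdev


def mean_with_ci(samples, confidence=0.95):
    """Return (mean, lower, upper) using a normal-approximation interval."""
    n = len(samples)
    m = mean(samples)
    # Two-sided quantile of the standard normal, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(samples) / n ** 0.5
    return m, m - half_width, m + half_width


# Usage with arbitrary example values (not measured data).
m, low, high = mean_with_ci([1.0, 2.0, 3.0, 4.0, 5.0])
```

Non-overlapping intervals between two services suggest a real difference; overlapping ones call for a dedicated test.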


CS System-level

• In non-collocated, the checkpoint process consists of:
– saving the container via CRIU and storing its memory context in a file system shared between Manager and Agent
• In collocated:
– saving the container via CRIU and sending the state via rsync to all standby instances
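
The two checkpoint flows above can be sketched as command plans. This is an assumption about how the tools might be invoked, not the commands used in the study: `lxc-checkpoint` (which drives CRIU) is called with `-n` and `-D`, and `rsync` with `-a`; container names, paths, and hosts are hypothetical. The commands are only built here, not executed.

```python
def dump_command(container, checkpoint_dir):
    """Dump a running LXC container into checkpoint_dir via lxc-checkpoint."""
    return ["lxc-checkpoint", "-n", container, "-D", checkpoint_dir]


def restore_command(container, checkpoint_dir):
    """Restore a container from a previously dumped checkpoint (-r)."""
    return ["lxc-checkpoint", "-r", "-n", container, "-D", checkpoint_dir]


def sync_command(checkpoint_dir, standby_host):
    """Copy the checkpoint directory to the same path on a standby host."""
    src = checkpoint_dir.rstrip("/") + "/"
    return ["rsync", "-a", src, f"{standby_host}:{src}"]


def checkpoint_plan(mode, container, checkpoint_dir, standby_hosts=()):
    """Return the list of commands for one checkpoint in the given mode."""
    plan = [dump_command(container, checkpoint_dir)]
    if mode == "collocated":
        # Collocated: push the dumped state to every standby instance.
        plan += [sync_command(checkpoint_dir, host) for host in standby_hosts]
    elif mode != "non-collocated":
        raise ValueError(f"unknown mode: {mode}")
    # Non-collocated: checkpoint_dir is assumed to sit on the shared (NFS) mount.
    return plan
```

Each command list could be passed to `subprocess.run(cmd, check=True)` on a host where LXC, CRIU, and rsync are installed and configured.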


CS System-level

• Failover process (non-collocated)


CS System-level

• Failover process (collocated warm)


CS App-level

• In failover process (non-collocated)


CS App-level

• In failover process (collocated warm)


CS App-level

• In failover process (collocated hot)


Evaluation I

• T-test between app collocated hot and sys collocated warm

Evaluation II – Network received (collocated modes)

Evaluation II – Network received (non-collocated)

Evaluation II – CPU Load (collocated modes)

Evaluation II – CPU Load (non-collocated)

Evaluation II – Memory occupation (collocated modes)

Evaluation II – Memory occupation (non-collocated)

Evaluation II – Network sent (collocated modes)

Evaluation II – Network sent (non-collocated)

Evaluation II – Disk written (collocated modes)

Evaluation II – Disk written (non-collocated)


Acknowledgments

• Family
• Friends
• Creators
• UFRPE
• Advisors (the best)
• CNPq and FACEPE

