Design consequences of dev ops practices

NICTA Copyright 2012 From imagination to impact

Design

Consequences

of DevOps

Practices

Len Bass


Introductions

• Me

• You



Overview of Tutorial

• DevOps practices, when taken to the limit for internet scale organizations => continuous delivery

• Economics of deployment when there are many instances of services => rolling upgrade



Outline

• What is DevOps?

– Definitions

– Deriving architecturally significant requirement

• Architectural style elaboration

• Deployment

• Summary



What is DevOps?

• “DevOps is a software development method that stresses

communication, collaboration, and integration between software

developers and IT professionals” – Wikipedia

• From an architect’s or developer’s perspective it means treating system administrators and operators as first class stakeholders.



What is DevOps - 2

• DevOps is accompanied by a certain amount of

mysticism.

– “Be Self-Aware

– Be aware of a project’s maturity

– Be aware of others” http://architects.dzone.com/articles/zen-and-art-collaborative

• Similar to the early days of agile.










What problem is DevOps trying to solve?

• Poor communication between developers and

operations personnel

• Slow release schedule

• Limited capacity of operations staff

• Limited organizational insight into operations



Communication between developers and

operations staff

• Log messages

– What information is needed to do monitoring and

error diagnosis?

– Where is the best place to put particular types of

information?

• Release planning

– What is the schedule for the next release?

– What capacity is needed for the next release?

– What are the infrastructure compatibility requirements

for the next release?



Release plan

1. Define and agree release and deployment plans with customers/stakeholders.

2. Ensure that each release package consists of a set of related assets and service components that are compatible with each other.

3. Ensure that integrity of a release package and its constituent components is maintained throughout the transition activities and recorded accurately in the configuration management system.

4. Ensure that all release and deployment packages can be tracked, installed, tested, verified, and/or uninstalled or backed out, if appropriate.

5. Ensure that change is managed during the release and deployment activities.

6. Record and manage deviations, risks, issues related to the new or changed service, and take necessary corrective action.

7. Ensure that there is knowledge transfer to enable the customers and users to optimise their use of the service to support their business activities.

8. Ensure that skills and knowledge are transferred to operations and support staff to enable them to effectively and efficiently deliver, support and maintain the service, according to required warranties and service levels.

*http://en.wikipedia.org/wiki/Deployment_Plan


Limited capacity of operations staff

• The number of physical servers that a single sys admin can administer varies with context, but some data*:

– As low as 10 per admin

– Norm of 30 per admin at small-medium businesses

• Depends on whether admin performs just

maintenance or whether admin is also involved

in other projects

*http://www.computerworld.com.au/article/352635/there_best_practice_server_system_administrator_ratio_/


Limited Organizational insight into

operations

• An organization has budgetary insight into

operations.

• The impact of various operational activities on

business value is difficult to discern.

• This is a long running complaint that goes under

the heading of “aligning IT with the

business”. There are differences in

– Objectives

– Culture

– Incentives



DevOps can also be a role

• DevOps practices rely on a high degree of

automation and standardization of tools

• Someone has to be responsible for these tools.

• Person filling this role is “DevOps Engineer”



My Take on DevOps

• DevOps is a set of practices intended to

– Reduce management overhead

– Speed up deployment

– Move some (formerly) IT responsibilities to developers

– Increase communication between developers and operations

– Reduce operations costs

• Are there architecturally significant requirements

in these practices?



Architecturally significant requirement

• Speed up deployment through minimizing

synchronous coordination among development

teams.

• Synchronous coordination such as a meeting adds time since it requires

– Ensuring that all parties are available

– Ensuring that all parties have the background to make the coordination productive

– Following up on decisions made during the meeting


Summary of this section

• DevOps is a collection of practices designed, among

other things, to reduce time to deploy new features.

• Reducing time to deploy new features can be

accomplished by reducing synchronous coordination

among development teams

– This is an architecturally significant requirement that we will carry

forward.



Questions



Outline

• What is DevOps?

• Architectural Style Elaboration

– Micro Service Oriented Architecture

– Categories of design decisions

– How micro SOA specifies or delegates the categories

of design decisions

• Deployment

• Summary



Deployment pipeline

• Developers commit code

• Code is compiled

• Binary is processed by a build and unit test tool which

builds the service

• Integration tests are run followed by performance tests.

• Result is a machine image (assuming virtualization)

• The service (its image) is deployed to production.



Continuous Deployment

• Deployment pipeline is triggered by commit of

code

• All gates from one phase to the next are

automatic.
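A pipeline whose gates are all automatic can be sketched in a few lines. This is only an illustration, not any particular CI tool: the stage names follow the slides, while `Stage`, `run_pipeline`, and the stand-in check functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Stage:
    name: str
    check: Callable[[str], bool]  # returns True when the gate passes

def run_pipeline(commit: str, stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run every stage in order; each gate is automatic (no human approval)."""
    passed = []
    for stage in stages:
        if not stage.check(commit):
            return False, passed  # stop at the first failing gate
        passed.append(stage.name)
    return True, passed

# Hypothetical stand-ins for real build and test tools.
stages = [
    Stage("compile", lambda c: True),
    Stage("build and unit test", lambda c: True),
    Stage("integration tests", lambda c: "broken" not in c),
    Stage("performance tests", lambda c: True),
    Stage("bake machine image", lambda c: True),
    Stage("deploy to production", lambda c: True),
]

ok, done = run_pipeline("commit-123", stages)
bad, done_bad = run_pipeline("commit-broken", stages)
```

A commit reaches production only if every gate passes; a failing gate stops the run without anyone having to intervene.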



Requirements that drive the design in this

section

• Reduce synchronous communication among

development teams

– Continuous deployment

– Individual developers can commit to production (as

long as automated tests are passed)

• Scalability and performance

• Reliability

• A different ordering of requirements will produce

a different design



Architectural Style

• An architectural style (pattern) can specify many

decisions that might otherwise require

synchronous coordination among development

teams.

• The remainder of this section will justify why the

Micro Service Oriented Architecture style

satisfies our identified Architecturally Significant

Requirement.



Amazon design rules - 1

• All teams will henceforth expose their data and

functionality through service interfaces.

• Teams must communicate with each other

through these interfaces.

• There will be no other form of inter-process

communication allowed: no direct linking, no

direct reads of another team’s data store, no

shared-memory model, no back-doors

whatsoever. The only communication allowed is

via service interface calls over the network.



Amazon design rules - 2

• It doesn’t matter what technology they[services]

use.

• All service interfaces, without exception, must be

designed from the ground up to be

externalizable.

• Amazon is optimizing for its workload with these

requirements

– Mainly searching and browsing and web page

delivery

– Some transactions but not the dominant portion of the

workload.


Micro service oriented architecture


Service

• Each user request is

satisfied by some sequence

of services.

• Most services are not

externally available.

• Each service communicates

with other services through

service interfaces.

• Service depth may be 70,

e.g. LinkedIn


Relation of teams and services

• Each service is the responsibility of a single

development team

• Individual developers can deploy a new version without coordination with other developers.

• It is possible that a single development team is

responsible for multiple services

• Team size

• Coordination among team members

must be high bandwidth and low

overhead.

• Typically is done with small teams –

as in agile.



Design decisions

• Seven categories of design decisions*.

1. Allocation of responsibilities.

2. Coordination model.

3. Data model.

4. Management of resources.

5. Mapping among architectural elements.

6. Binding time decisions.

7. Choice of technology

*Software Architecture in Practice 3rd edition, Chap 4



Design decisions made or delegated by

choice of micro SOA

• Micro service oriented architecture either

specifies or delegates to the development team

five out of the seven categories of design

decisions.

1. Allocation of responsibilities.

2. Coordination model.

3. Data model.

4. Management of resources.

5. Mapping among architectural elements.

6. Binding time decisions.

7. Choice of technology



Roadmap for next several slides

• Micro service oriented architectural style will

either specify or allow delegation of five different

categories of design decisions.

• Each decision category will be discussed

separately.



Decision 1 – allocation of responsibilities

• This decision is not delegated to the team or

specified.

• Development teams must coordinate to divide

responsibilities for features that are to be added.

• Typically this happens at the beginning of each

iteration cycle.



Decision 2 - coordination model

• Elements of service interaction

– Services communicate asynchronously through

message passing

– Each service could (in principle) be deployed

anywhere on the net.

• Latency requirements will probably force particular

deployment location choices.

• Services must discover location of dependent services.



Service discovery


• When an instance of a

service is launched, it

registers with a

registry/load balancer

• When a client wishes

to utilize a service, it

gets the location of an

instance from the

registry/load balancer.

• Eureka is an open

source registry/load

balancer

[Figure: an instance of a service registers with the registry/load balancer; a client queries the registry and then invokes the instance.]


Subtleties of registry/load balancer

• When multiple instances of the same service

have registered, the load balancer can rotate

through them to equalize number of requests to

each instance.

• Each instance must renew its registration

periodically (~90 seconds) so that load balancer

does not schedule message to failed instance.

• Registry can keep other information as well as

address of instance. For example, version

number of service instance.
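The registration, renewal, and rotation behaviour described above can be sketched with a toy in-memory registry. This is not Eureka's API: the class `Registry`, its method names, and the example addresses are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
import itertools

class Registry:
    """Toy registry/load balancer: leases expire unless renewed."""

    def __init__(self, lease_seconds=90.0):
        self.lease_seconds = lease_seconds
        self.instances = {}  # address -> (version, time of last renewal)
        self._counter = itertools.count()

    def register(self, address, version, now):
        # The registry keeps extra information (here the version) with the address.
        self.instances[address] = (version, now)

    def renew(self, address, now):
        version, _ = self.instances[address]
        self.instances[address] = (version, now)

    def live(self, now):
        """Addresses whose lease has not expired, in a stable order."""
        return sorted(a for a, (_, t) in self.instances.items()
                      if now - t <= self.lease_seconds)

    def pick(self, now):
        """Rotate through live instances (round robin)."""
        live = self.live(now)
        if not live:
            raise LookupError("no live instance registered")
        return live[next(self._counter) % len(live)]

registry = Registry()
registry.register("10.0.0.1:8080", "1.0", now=0)
registry.register("10.0.0.2:8080", "1.0", now=0)
first_two = [registry.pick(now=10), registry.pick(now=10)]
registry.renew("10.0.0.1:8080", now=80)   # the second instance never renews
after_expiry = [registry.pick(now=100), registry.pick(now=100)]
```

While both leases are current, requests alternate between the two instances; once the second lease lapses, every request goes to the instance that renewed.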



Decision 3 – Data model

• Schema based database system (relational).

Requires coordination.

– Development teams must coordinate when schema is

defined or modified.

– Schema definition happens once when the

architecture is defined. Schema modification should

be rare occurrence. Schema extensions (new fields or

tables) do not cause problems.

• NoSQL systems. Will still require coordination

over semantics of data.

– Data written by one service is typically read by others,

they must agree on semantics.



Decision 4 – Resource Management

• Each instance of a service can process a certain

workload.

– Could be expressed in terms of requests

– Could be expressed in terms of resource

requirements – e.g. CPU

• Each client instance will require resources from

the service to process its requests.

• Service Level Agreements (SLAs) are a means

for automating the resource assumptions of the

clients and the resource requirements of the

service.



Managing SLAs

• A requirement for each service is to provide an SLA for

its response time in terms of the workload asked of it.

– E.g. For a workload of Y requests per second, I will

provide a response within X seconds.

• A requirement for each client is to provide an estimate of

the requests it will make of each dependent service.

– E.g. for each request I receive, I will make Z requests

for your service per second.

• This combination will enable a run time determination of

the number of instances required for each service to

meet its SLA.
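One way to picture the run time determination is as simple arithmetic over the two declarations. The sketch below assumes that each instance meets its response time SLA up to a fixed request rate and that load spreads evenly; the function name and the example numbers are hypothetical.

```python
import math

def instances_needed(client_rates, fanout, capacity_per_instance):
    """Instances required so each stays within the workload its SLA covers.

    client_rates: requests per second that each client receives
    fanout: Z, requests made of the service per request a client receives
    capacity_per_instance: Y, requests per second one instance can handle
        while still responding within X seconds
    """
    demand = sum(rate * fanout for rate in client_rates)
    return max(1, math.ceil(demand / capacity_per_instance))

needed = instances_needed(client_rates=[200, 150], fanout=3,
                          capacity_per_instance=400)
idle = instances_needed(client_rates=[], fanout=3, capacity_per_instance=400)
```

Here two clients receiving 200 and 150 requests per second, each making 3 requests of the service, generate 1050 requests per second, which needs 3 instances at 400 requests per second each.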



Provisioning new instances

• When the desired workload of a service is greater than

can be provided by the existing number of instances of

that service, new instances can be instantiated (at

runtime).

• Four possibilities for initiating new instance of a service:

1. Client. Client determines whether service is adequately provisioned for its needs based on the service's SLA and the service's current workload.

2. Service. Service determines whether it is adequately

provisioned based on number of requests it expects from

clients.

3. Registry/load balancer determines appropriate number of

instances of a service based on SLA and client instance

requests.

4. External entity can initiate creation of new instances



Responsibilities of development teams.

• SLA determination of a service is done by the

service development team prior to deployment

augmented by run time discovery.

• Determination of a client's requirements for a service is done by the client’s development team.

• Choice of which component has responsibility

for instantiating/deinstantiating instances of a

service is done as a portion of the architecture

definition.



Decision 5 – Mapping among architectural

elements

• Decisions about packaging modules into

processes and processes into a service are

delegated to the service development team.

• Decisions about deployment of a service will be

discussed in the next section.



Decision 6 – Binding time

• Configuration information binding time is

decided during the development of architecture

and the deployment pipeline.

• Other binding time decisions are delegated to

the service development team.



Decision 7 – Technology choices

• All technology choices are delegated to the

service development team.



Questions about Micro SOA

• /Q/ Isn’t it possible that different teams will implement the

same functionality, likely differently?

• /A/ Yes, but so what? Major duplications are avoided

through assignment of responsibilities to services. Minor

duplications are the price to be paid to avoid necessity

for synchronous coordination.

• /Q/ What about transactions?

• /A/ Micro SOA privileges flexibility above reliability and

performance. Transactions are recoverable through

logging of service interactions. This may introduce some

delays if failures occur.



Summary

• Synchronous coordination among development

teams is avoided by

– Using a micro SOA architecture

– Having the architecture specify the coordination

model and resource management techniques used by

the application.

– Delegating to the development team mapping,

binding time, and technology decisions.

– Having each service be the responsibility of a single

development team.

• Micro SOA privileges flexibility and development

team independence over performance and

reliability.



Questions



Outline

• What is DevOps?

• Overall Architectural Style

• Deployment

– Deployment strategies

– Maintaining Logical Consistency.

• Summary



Deployment Overview

Multiple instances of a service are executing

• Red is service being replaced with new version

• Blue are clients

• Green are dependent services

[Figure: instances labelled VA, VB, VB, VB; UAT / staging / performance tests]


Deployment goal and constraints

• Goal of a deployment is to move from current

state (N instances of version A of a service) to a

new state (N instances of version B of a service)

• Constraints:

– Any development team can deploy their service at

any time. I.e. New version of a service can be

deployed either before or after a new version of a

client. (no synchronization among development

teams)

– It takes time to replace one instance of version A with

an instance of version B (order of minutes)

– Service to clients must be maintained while the new

version is being deployed.


Deployment strategies

• Two basic all or nothing strategies

– Big Flip – leave N instances with version A as they

are, allocate and provision N instances with version B

and then switch to version B and release instances

with version A.

– Rolling Upgrade – allocate one instance, provision it

with version B, release one version A instance.

Repeat N times.

• Other deployment topics

– Partial strategies (canary testing, A/B testing). We

will discuss them later. For now we are discussing all

or nothing deployment.

– Rollback

– Packaging services into machine images
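The two all or nothing strategies can be compared with a small simulation that tracks how many instances exist at the peak and whether both versions are ever in service together. The function names and the list-of-labels representation are illustrative assumptions, not a provisioning tool.

```python
def big_flip(n):
    """Return (final versions, peak instance count) for a big flip."""
    old = ["A"] * n
    new = ["B"] * n            # provision all N new instances first
    peak = len(old) + len(new)
    return new, peak           # switch traffic, then release the old ones

def rolling_upgrade(n):
    """Return (final versions, peak instance count, mixed-version steps)."""
    fleet = ["A"] * n
    peak = len(fleet)
    mixed_steps = 0
    for _ in range(n):
        fleet.append("B")      # allocate and provision one new instance
        peak = max(peak, len(fleet))
        fleet.remove("A")      # release one old instance
        if "A" in fleet and "B" in fleet:
            mixed_steps += 1   # both versions are serving clients
    return fleet, peak, mixed_steps

flip_fleet, flip_peak = big_flip(4)
roll_fleet, roll_peak, roll_mixed = rolling_upgrade(4)
```

For N = 4 the big flip peaks at 8 instances and never mixes versions, while the rolling upgrade peaks at 5 instances but serves both versions side by side for most of the upgrade.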



Trade offs - Big Flip and Rolling Upgrade

• Big Flip

– Only one version available

to the client at any

particular time.

– Requires 2N instances

(additional costs)

• Rolling Upgrade

– Multiple versions are

available for service at the

same time

– Requires N+1 instances.

• Rolling upgrade is

commonly preferred.

[Figure: Rolling Upgrade in EC2. Boxes shown: Update Auto Scaling Group; Sort Instances; Remove & Deregister Old Instance from ELB; Confirm Upgrade Spec; Terminate Old Instance; Wait for ASG to Start New Instance; Register New Instance with ELB]


Types of failures during rolling upgrade

Rolling upgrade failure

• Provisioning – see references at end

• Logical failure – inconsistencies to be discussed

• Instance failure – handled by Auto Scaling Group in EC2


What are the problems with Rolling

Upgrade?

• Recall that any development team can deploy

their service at any time.

• Three concerns

– Maintaining consistency between different versions of

the same service when performing a rolling upgrade

– Maintaining consistency among different services

– Maintaining consistency between a service and

persistent data



Maintaining consistency between different

versions of the same service

• Key idea – differentiate between installing a new

version and activating a new version

• Involves “feature toggles” (described

momentarily)

• Sequence

– Develop version B with new code under control of

feature toggle

– Install each instance of version B with the new code

toggled off.

– When all of the instances of version A have been

replaced with instances of version B, activate new

code through toggling the feature.


Issues

• What is a feature toggle?

• How do I manage features that extend across

multiple services?

• How do I activate all relevant instances at once?



Feature toggle

• Place feature dependent new code inside of an

“if” statement where the code is executed if an

external variable is true. Removed code would

be the “else” portion.

• Used to allow developers to check in

uncompleted code. Uncompleted code is

toggled off.

• During deployment, until new code is activated,

it will not be executed.

• Removing feature toggles when a new feature

has been committed is important.
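A feature toggle in its simplest form looks like the sketch below. The toggle name `new_pricing`, the `TOGGLES` dictionary, and the pricing logic are all hypothetical; in practice the value would come from outside the code.

```python
# Hypothetical toggle store; in practice the value comes from outside the code.
TOGGLES = {"new_pricing": False}

def price(amount, toggles=TOGGLES):
    if toggles.get("new_pricing", False):
        return round(amount * 0.9, 2)   # new code, runs only when toggled on
    else:
        return amount                   # old code, the part to be removed later

before = price(100.0)
TOGGLES["new_pricing"] = True           # activation, separate from installation
after = price(100.0)
```

The same installed code returns the old result until the toggle changes, which is what lets installation and activation happen at different times.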



Multi service features

• Most features will involve multiple services.

• Each service has some code under control of a

feature toggle.

• Activate feature when all instances of all

services involved in a feature have been

installed.

– Maintain a catalog with feature vs service version

number.

– A feature toggle manager determines when all old

instances of each version have been replaced. This

could be done using registry/load balancer.

– The feature manager activates the feature.
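The check a feature toggle manager performs can be sketched as a lookup against the catalog. The catalog contents, the service names, and the shape of the data reported by the registry are assumptions made for illustration.

```python
# Catalog: feature -> minimum version each involved service must run.
CATALOG = {"gift_wrap": {"cart": 5, "checkout": 12}}

def ready_to_activate(feature, registered, catalog=CATALOG):
    """True when every registered instance of every involved service
    runs at least the version the feature needs.

    registered: service name -> list of version numbers, one per instance,
        as a registry/load balancer might report them.
    """
    for service, needed in catalog[feature].items():
        versions = registered.get(service, [])
        if not versions or min(versions) < needed:
            return False
    return True

mid_upgrade = {"cart": [5, 5, 4], "checkout": [12, 12]}
upgraded = {"cart": [5, 5, 5], "checkout": [12, 12]}
```

While one `cart` instance still runs version 4 the feature stays off; once the rolling upgrade has replaced it, the manager may activate the feature.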



Activating feature

• The feature toggle manager changes the value

of the feature toggle. Two possible techniques to

get new value to instances.

– Push. Broadcasting the new value will instruct each

instance to use new code. If a lag of several seconds

between the first service to be toggled and the last

can be tolerated, there is no problem. Otherwise

synchronizing value across network must be done.

– Pull. Querying the manager by each instance to get

latest value may cause performance problems.

• A coordination mechanism such as Zookeeper

will overcome both problems. I will discuss

Zookeeper if I have time at the end.


Maintaining consistency across versions

(summary)

• Install all instances before activating any new

code

• Use feature toggles to activate new code

• Use feature toggle manager to determine when

to activate new code

• Use Zookeeper to coordinate activation with low

overhead



Maintaining consistency among different

services

• Use case:

– Wish to deploy new version of service A without

coordinating with development team for clients of

service A.

• I.e. new version of service A should be backward compatible

in terms of its interfaces.

• May also require forward compatibility in certain

circumstances, e.g. rollback



Achieving Backwards Compatibility

• APIs can be extended but must always be

backward compatible.

• Leads to a translation layer

[Figure: layers between clients and a service. Clients call the external APIs.]

• External APIs (unchanging but with ability to extend or add new ones)

• Translation to internal APIs

• Internal APIs (changes require changes to translation layer but do not propagate further)
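A translation layer of this kind can be sketched as a stable outer function that forwards to whatever the current internal function is. The function names and the customer record are hypothetical.

```python
# Internal API: free to change between versions.
def _lookup_customer_v2(customer_id, include_orders):
    record = {"id": customer_id, "name": "Ada"}
    if include_orders:
        record["orders"] = []
    return record

# External API: existing calls keep working; optional parameters may be added.
def get_customer(customer_id, include_orders=False):
    """Translate the stable external call into the current internal call."""
    return _lookup_customer_v2(customer_id, include_orders)

old_client_result = get_customer(42)                  # written before the extension
new_client_result = get_customer(42, include_orders=True)
```

If the internal function is renamed or restructured, only `get_customer` changes; clients written against the original call are unaffected.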


What about dependent services?

• Dependent services that are within your control

should maintain backward compatibility

• Dependent services not within your control (third

party software) cannot be forced to maintain

backward compatibility.

– Minimize impact of changes by localizing interactions

with third party software within a single module.

– Keeping services independent and packaging as

much as possible into a virtual machine means that

only third party software accessed through message

passing will cause problems.



Forward Compatibility

• Gracefully handle unknown calls and data base schema

information

– Suppose your service receives a method call it does

not recognize. It could be intended for a later version

where this method is supported.

– Suppose your service retrieves a data base table with

an unknown field. It could have been added to

support a later version.

• Forward compatibility allows a version of a service to be

upgraded or rolled back independently from its clients. It

involves both

– The service handling unrecognized information

– The client handling returns that indicate unrecognized

information.
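Both halves of forward compatibility can be sketched briefly: a service that tolerates unknown fields and unknown method calls, and a client that copes with an "unsupported" reply. All names here (`KNOWN_FIELDS`, `dispatch`, `client_call`, the reply format) are hypothetical.

```python
KNOWN_FIELDS = {"id", "name"}

def read_row(row):
    """Keep the fields this version understands; ignore later additions."""
    return {k: v for k, v in row.items() if k in KNOWN_FIELDS}

class Service:
    def ping(self):
        return "pong"

def dispatch(service, method, *args):
    """Return an explicit 'unsupported' reply instead of crashing."""
    handler = getattr(service, method, None)
    if method.startswith("_") or not callable(handler):
        return {"status": "unsupported", "method": method}
    return {"status": "ok", "result": handler(*args)}

def client_call(service, method, fallback):
    """Client side: handle the return that indicates unrecognized information."""
    reply = dispatch(service, method)
    return reply["result"] if reply["status"] == "ok" else fallback

row = read_row({"id": 1, "name": "x", "loyalty_tier": "gold"})
known = client_call(Service(), "ping", fallback="n/a")
unknown = client_call(Service(), "ping_v2", fallback="n/a")
```

A call intended for a later version degrades to the fallback rather than failing, so the service can be rolled back without its clients being redeployed.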


Maintaining consistency between a service

and persistent data

• Assume new version is correct – we will discuss the

situation where it is incorrect in a moment.

• Inconsistency in persistent data can come about

because data schema or semantics change.

• Effect can be minimized by the following practices (if

possible).

– Only extend schema – do not change semantics of

existing fields. This preserves backwards

compatibility.

– Treat schema modifications as features to be toggled.

This maintains consistency among various services

that access data.



I really must change the schema

• In this case, apply pattern for backward

compatibility of interfaces to schemas.

• Use features of database system (I am

assuming a relational DBMS) to restructure data

while maintaining access to not yet restructured

data.



Summary of consistency discussion so far.

• Feature toggles are used to maintain

consistency within instances of a service

• Backward compatibility pattern is used to maintain consistency between a service and its clients.

• Discouraging modification of schema will

maintain consistency between services and

persistent data.

– If schema must be modified, then synchronize

modifications with feature toggles.



Canary testing

• Canaries are a small number of instances of a new

version placed in production in order to perform live

testing in a production environment.

• Canaries are observed closely to determine whether the

new version introduces any logical or performance

problems. If not, roll out new version globally. If so, roll

back canaries.

• Named after canaries

in coal mines.



Implementation of canaries

• Designate a collection of instances as canaries. They do

not need to be aware of their designation.

• Designate a collection of customers as testing the

canaries. Can be, for example

– Organizationally based

– Geographically based

• Then

– Activate feature or version to be tested for canaries.

Can be done through feature activation

synchronization mechanism

– Route messages from canary customers to canaries.

Can be done through making registry/load balancer

canary aware.
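Making the registry/load balancer canary aware might look like the sketch below. The class `CanaryAwareBalancer`, the instance labels, and the customer identifiers are hypothetical; the instances themselves carry no knowledge of being canaries.

```python
import itertools

class CanaryAwareBalancer:
    def __init__(self, stable, canaries, canary_customers):
        self.stable = itertools.cycle(stable)
        self.canaries = itertools.cycle(canaries)
        self.canary_customers = set(canary_customers)

    def route(self, customer):
        """Canary customers go to canary instances; everyone else to stable ones."""
        pool = self.canaries if customer in self.canary_customers else self.stable
        return next(pool)

lb = CanaryAwareBalancer(
    stable=["vA-1", "vA-2"],
    canaries=["vB-1"],
    canary_customers={"employee-7"},   # e.g. an organizationally based group
)
routes = [lb.route("employee-7"), lb.route("customer-1"), lb.route("customer-2")]
```

The same routing rule serves A/B testing if the designated customers are the audience for variant B.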



A/B testing

• Suppose you wish to test user response to a

system variant. E.g. UI difference or marketing

effort. A is one variant and B is the other.

• You simultaneously make available both

variants to different audiences and compare the

responses.

• Implementation is the same as canary testing.



Rollback

• New versions of a service may be unacceptable

either for logical or performance reasons.

• Two options in this case

• Roll back (undo deployment)

• Roll forward (discontinue current deployment and

create a new release without the problem).

• Decision to rollback or roll forward is almost

never automated because there are multiple

factors to consider.

• Forward or backward recovery

• Consequences and severity of problem

• Importance of upgrade


States of upgrade.

• An upgrade can be in one of two states when an

error is detected.

– Installed (fully or partially) but new features not

activated

– Installed and new features activated.



Possibilities

• Initially we will discuss the situation where

persistent data is not incorrect. Later we will

discuss persistent data.

• Installed but new features not activated

– Error must be in backward compatibility

– Halt deployment

– Roll back by reinstalling old version

– Roll forward by creating new version and installing

that

• Installed with new features activated

– Turn off new features

– If that is insufficient, we are at prior case.



Persistent data

• Keep log of user requests (each with their own

identification)

• Identification of incorrect persistent data

– Tag each data item with metadata that provides

• service and version that wrote that data

• user request that caused the data to be written

• Correction of incorrect persistent data (simplistic

version)

– Remove data written by incorrect version of a service

– Install correct version

– Replay user requests that caused incorrect data to be

written
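The simplistic correction procedure can be sketched with a dictionary as the data store. Everything here is an illustrative assumption: the store layout, the request log format, the `billing` service, and the replacement handler.

```python
def write(store, key, value, service, version, request_id):
    """Store a value tagged with who wrote it and for which user request."""
    store[key] = {"value": value, "service": service,
                  "version": version, "request_id": request_id}

def correct(store, request_log, service, bad_version, fixed_handler):
    """Remove data written by the bad version, then replay its requests."""
    bad = {k: v for k, v in store.items()
           if v["service"] == service and v["version"] == bad_version}
    for key in bad:
        del store[key]
    for request_id in sorted({v["request_id"] for v in bad.values()}):
        fixed_handler(store, request_log[request_id], request_id)

def fixed_handler(store, request, request_id):
    # Stand-in for the corrected version "C" of the service.
    write(store, "total:" + request["item"], request["qty"] * 10,
          "billing", "C", request_id)

request_log = {1: {"item": "a", "qty": 2}, 2: {"item": "b", "qty": 3}}
store = {}
write(store, "total:a", 20, "billing", "A", 1)    # written by a good version
write(store, "total:b", -30, "billing", "B", 2)   # written by the faulty version B
correct(store, request_log, "billing", "B", fixed_handler)
```

Only the item tagged with version B is removed and regenerated; data written by other versions is left alone. The sketch ignores the domino effects and escaped data discussed on the next slide.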



Persistent data correction problems

I will not present good solutions to these problems.

1. Replaying user requests may involve requesting

features that are not in the current version.

– Requests can be queued until they can be correctly

re-executed

– User can be informed of error (after the fact)

2. There may be domino effects from incorrect data, i.e. other calculations may be affected.

– Keep pedigree for data items that allows determining

which additional data items are incorrect. Remove

them and regenerate them when requests replayed.

– Data that escaped the system, e.g. sent to other

system or shown to a user, cannot be retrieved.



Summary of rollback options

• Can roll back or roll forward

• Rolling back without consideration of persistent

data is relatively straightforward.

• Managing erroneous persistent data is

complicated and will likely require manual

processing.



Packaging of services

• The last portion of the deployment pipeline is

packaging services into machine images for

installation.

• Two dimensions

– Flat vs deep service hierarchy

– One service per virtual machine vs many services per

virtual machine



Flat vs Deep Service Hierarchy

• Trading off independence of teams and

possibilities for reuse.

• Flat Service Hierarchy

– Limited dependence among services & limited

coordination needed among teams

– Difficult to reuse services

• Deep Service Hierarchy

– Provides possibility for reusing services

– Requires coordination among teams to discover

reuse possibilities. This can be done during

architecture definition.



Services per VM Image

[Figure: two packaging options. One service per VM: a service is developed and embedded in its own VM image. Multiple services per VM: Service 1 and Service 2 are each developed and both embedded in one VM image.]


One Possible Race Condition with Multiple

Services per VM

[Figure: timeline of two developers' actions]

Initial State: VM image with Version N of Service 1 and Version N of Service 2

• Developer 1: build new image with VN+1|VN; begin provisioning process with new image

• Developer 2: build new image with VN|VN+1; begin provisioning without new version of Service 1

Results in Version N+1 of Service 1 not being updated until next build of VM image

Could be prevented by VM image build tool
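The slide does not say how the build tool would prevent this. One possibility, shown purely as an assumption, is for the tool to reject a build that started from a stale base image (an optimistic concurrency check). `ImageRepository` and its methods are hypothetical.

```python
class ImageRepository:
    """Toy image store that rejects builds made from a stale base image."""

    def __init__(self, versions):
        self.versions = dict(versions)  # service -> version in latest image
        self.revision = 0

    def checkout(self):
        return self.revision, dict(self.versions)

    def publish(self, base_revision, service, new_version):
        if base_revision != self.revision:
            return False                 # someone else published first: rebuild
        self.versions[service] = new_version
        self.revision += 1
        return True

repo = ImageRepository({"service1": "N", "service2": "N"})
rev1, _ = repo.checkout()               # developer 1 starts a build
rev2, _ = repo.checkout()               # developer 2 starts from the same base
first = repo.publish(rev1, "service1", "N+1")
second = repo.publish(rev2, "service2", "N+1")   # stale base, rejected
rev2, _ = repo.checkout()               # developer 2 rebuilds on the new base
retry = repo.publish(rev2, "service2", "N+1")
```

Developer 2's first attempt is refused instead of silently dropping developer 1's change; after rebuilding on the latest image both new versions are present.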


Another Possible Race Condition with Multiple

Services per VM

[Figure: timeline of two developers' actions]

Initial State: VM image with Version N of Service 1 and Version N of Service 2

• Developer 1: build new image with VN+1|VN; begin provisioning, which overwrites image created by developer 2

• Developer 2: build new image with VN+1|VN+1; begin provisioning

Results in Version N+1 of Service 2 not being updated until next build of VM image

Could be prevented by provisioning tool


Trade offs

• One service per VM

– Message from one service to another must go

through inter VM communication mechanism – adds

latency

– No possibility of race condition

• Multiple Services per VM

– Inter VM communication requirements reduced –

reduces latency

– Adds possibility of race condition caused by

simultaneous deployment



Summary of Deployment

• Rolling upgrade is common deployment strategy

• Introduces requirements for consistency among

– Different versions of the same service

– Different services

– Services and persistent data

• Other deployment considerations include

– Canary deployment

– A/B testing

– Rollback



Question



Zookeeper

• What purpose does Zookeeper serve?

• Use cases

– Leader election

– Group membership

– Distributed locks

– Synchronization

– Configuration

• In our case, we will use Zookeeper to manage

activating features



Distributed applications

• Zookeeper provides guaranteed consistent

(mostly) data structure for every instance of a

distributed application.

– Definition of “mostly” is within eventual consistency

lag (but this is small)

• Zookeeper deals with managing failure as well

as consistency.

– Done using the Zab atomic broadcast protocol, which is similar to Paxos.

• Zookeeper guarantees that service requests are

linearly ordered and processed in a FIFO order


Model

• Zookeeper maintains a file type data structure

– Hierarchical

– Data in every node (called znode)

– Amount of data in each node assumed small (<1M)

– Intended for metadata

• Configuration

• Location

• Group


Zookeeper znode structure

/

<data>

/b1

<data>

/b1/c1

<data>

/b1/c2

<data>

/b2

<data>

/b2/c1

<data>


API

Function      Type
create        write
delete        write
exists        read
get children  read
get data      read
set data      write
+ others

• All calls return atomic views of state – either

succeed or fail. No partial state returned. Writes

also are atomic. Either succeed or fail. If they

fail, no side effects.


Use Case – leader election

• Many distributed applications have master

(leader)/slave structure

– One master, many slaves

– Master

• Sends work to slaves

• Monitors health of slaves and creates new ones as needed.


Using Zookeeper to elect master

• Suppose master fails. Then must create/choose

a new master.

• All candidates issue “create” call with node

name “master”.

• Only one of these create requests will succeed,

the rest will fail. This is one of the consistency

elements enforced by Zookeeper.

• Client who successfully creates znode named

“master” will become new master.
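The election rule can be illustrated with an in-memory stand-in for the "only one create succeeds" guarantee. `ToyZnodeStore` is not the ZooKeeper client API; it is a hypothetical model that captures just that one property. In real ZooKeeper the master znode is typically ephemeral, so it disappears when the session of the client that created it ends.

```python
class ToyZnodeStore:
    """In-memory stand-in for ZooKeeper's create semantics (not the real API)."""

    def __init__(self):
        self.znodes = {}

    def create(self, path, data):
        """Succeed for exactly one caller per path; later callers fail."""
        if path in self.znodes:
            return False
        self.znodes[path] = data
        return True

    def delete(self, path):
        self.znodes.pop(path, None)

def elect(store, candidates):
    """Every candidate tries to create /master; the one that succeeds leads."""
    for candidate in candidates:
        store.create("/master", candidate)   # only the first create succeeds
    return store.znodes["/master"]

store = ToyZnodeStore()
leader = elect(store, ["node-b", "node-a", "node-c"])
store.delete("/master")                      # the master fails; its znode goes away
new_leader = elect(store, ["node-a", "node-c"])
```

Whichever candidate's create request is processed first becomes master; the others learn from the failed create that a master already exists.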


Using Zookeeper to manage group membership

• App connects to Zookeeper

– Get list of Zookeeper servers

– Create session (if server fails – automatic fail over)

• Known group name

– Create /group_name

– If already exists get a failure

• Client joins group by creating /group_name/my_id

• Client can list children of /group_name and get members of group.

• Watcher will inform client if group members fail or leave.


Using Zookeeper to manage distributed locks - 1

• Naïve solution

– All clients attempt to create /lockname

– Successful client has lock.

• Client will delete znode when finished with lock

• Znode will be deleted if client fails

– Unsuccessful clients will watch /lockname. If it is

deleted then they will attempt to create it.

– Repeat


Distributed locks – 2

• Problem with naïve solution is “herd effect”.

– If many clients all wake up and try to grab lock at

once there will be an impact on the system load.

• Better solution is for each client to watch

predecessor.

– Zookeeper enforces order

– When predecessor deletes /lockname, then client will

acquire it.

– If predecessor fails, client is informed and will watch

predecessor’s predecessor. Etc.
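The "watch your predecessor" idea can be modelled without ZooKeeper by giving each waiting client a sequence number. `LockQueue` below is a hypothetical toy, not the ZooKeeper API; it shows only who holds the lock and whom each waiter watches.

```python
class LockQueue:
    """Toy model of the 'watch your predecessor' lock (not the ZooKeeper API)."""

    def __init__(self):
        self.next_seq = 0
        self.waiting = {}  # sequence number -> client name

    def join(self, client):
        """Register the client with the next sequence number."""
        seq = self.next_seq
        self.next_seq += 1
        self.waiting[seq] = client
        return seq

    def holder(self):
        """The lowest live sequence number holds the lock."""
        return self.waiting[min(self.waiting)] if self.waiting else None

    def watched_by(self, seq):
        """The single predecessor this client watches, or None for the holder."""
        earlier = [s for s in self.waiting if s < seq]
        return max(earlier) if earlier else None

    def leave(self, seq):
        """Release the lock, or drop out of the queue after a failure."""
        del self.waiting[seq]

q = LockQueue()
a, b, c = q.join("A"), q.join("B"), q.join("C")
holder_before = q.holder()
c_watches_before = q.watched_by(c)
q.leave(b)                  # B fails while waiting
c_watches_after = q.watched_by(c)
q.leave(a)                  # A releases the lock
holder_after = q.holder()
```

Each release or failure wakes at most one waiter, which is what avoids the herd effect of the naïve solution.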


Using Zookeeper for distributed synchronization

• Create new synchronization client.

– It creates synchronization node

– Other clients register on synchronization node at beginning of computation.

– At end of computation they remove themselves from synchronization node

• Synchronization client watches clients that have registered themselves. If one fails, it removes it from synchronization node.

• When synchronization node is empty, synchronization client deletes it and other clients (who are watching) can proceed.


Using Zookeeper for configuration

Each client records configuration information as

data in a child node it creates under a main

configuration node.

Checking configuration is a matter of getting data

from all of the children of the configuration node.


Using Zookeeper to synchronize activation

of features.

• Feature manager creates Znode containing

– <feature flag name, feature flag value>

– Written only when all services available.

• Service retrieves feature flag value from Znode

– if Znode_read_value(feature flag name) then

feature is active

else

feature is inactive

• Feature flag value guaranteed to be consistent across

services.

• Latency is low (order of microseconds) since Zookeeper keeps data structures in memory.
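The effect of a single shared flag can be shown with an in-memory stand-in for the znode. `ToyFlagNode`, `ServiceInstance`, and the flag name `gift_wrap` are hypothetical and do not use the ZooKeeper client API.

```python
class ToyFlagNode:
    """In-memory stand-in for a znode holding <flag name, flag value> pairs."""

    def __init__(self):
        self.flags = {}

    def set(self, name, value):
        self.flags[name] = value

    def get(self, name):
        return self.flags.get(name, False)   # an unwritten flag reads as off

class ServiceInstance:
    def __init__(self, node):
        self.node = node

    def handle(self):
        # Every request consults the shared flag, so all instances agree.
        return "new code" if self.node.get("gift_wrap") else "old code"

node = ToyFlagNode()
instances = [ServiceInstance(node) for _ in range(3)]
before = [i.handle() for i in instances]
node.set("gift_wrap", True)     # feature manager writes once all services are ready
after = [i.handle() for i in instances]
```

Because every instance reads the same value, a single write by the feature manager switches all of them to the new code path.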


Summary of tutorial

• DevOps practices lead to requirement to

minimize inter team coordination

• Continuous deployment has no human

intervention from developer commit until

deployment to production

• Micro SOA architectural style determines or

delegates 5 of 7 design decision categories

• Deployment strategies raise issues of

consistency. Separation of installation and

activation enables turning features on or off.

• Zookeeper is one tool to manage synchronizing

the activations of features.


NICTA Team

• Anna Liu

• Alan Fekete

• Min Fu

• Daniel Sun

• Hiroshi Wada

• Ingo Weber

• Xiwei Xu

• Liming Zhu



Readings

• http://www.slideshare.net/lenbass/what-is-dev-ops-for-review

• http://www.slideshare.net/lenbass/02-team-practcies-and-overall-architecture

• http://www.slideshare.net/lenbass/03-build-structure-and-testing

• NICTA research papers. https://ssrg.nicta.com.au/projects/cloud/