Best Practices in building scalable cloud-ready Service based systems
Discussion
Igor Moochnick, IgorShare ([email protected])
Blog: www.igorshare.com/blog
Is this system scalable?
What is scalability?
• Increase resources -> performance increases proportionally to the amount of added resources
• Increase performance -> more units of work processed
[Diagram: Scale Up vs. Scale Out (axes: Volume vs. # Machines). Scale Up: handle more volume with one bigger machine ($500 -> $1,000 -> $10,000). Scale Out: handle more volume by adding more $500 machines behind DNS/WWW load balancing.]
Is this system a scalable MADNESS?
Here is the gun. Go kill yourself!
Some Useful Definitions

Consistency Levels

Consistency Level | Changes are Visible | Example
Strong            | Now                 | Missile Launch
Eventual          | In the Future       | Address Change
Optimistic        | Maybe in the Future | Stock Ticker

Message Assurances

Assurance     | Message Delivery       | Example
Exactly Once  | No loss, no duplicates | Bank Transfer
At Least Once | No loss, duplicates    |
At Most Once  | Loss, no duplicates    | Streaming Video
Best Effort   | Loss, duplicates       | Stock Ticker
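A common way to bridge these assurance levels (not from the talk, but a standard technique): run the transport at-least-once and make the consumer idempotent by deduplicating on a message id, which yields effectively-exactly-once processing. A minimal sketch, with illustrative names:

```python
# Hypothetical sketch: an at-least-once channel plus an idempotent
# consumer gives effectively-exactly-once processing.

class IdempotentConsumer:
    def __init__(self):
        self.seen_ids = set()   # ids of messages already processed
        self.balance = 0        # example state: a bank account

    def handle(self, msg_id, amount):
        """Apply a transfer once, even if the message is redelivered."""
        if msg_id in self.seen_ids:
            return False        # duplicate delivery: ignore
        self.seen_ids.add(msg_id)
        self.balance += amount
        return True

consumer = IdempotentConsumer()
# The transport redelivers "tx-1" (the at-least-once assurance allows this).
for msg_id, amount in [("tx-1", 100), ("tx-2", 50), ("tx-1", 100)]:
    consumer.handle(msg_id, amount)

print(consumer.balance)  # 150, not 250: the duplicate was dropped
```

In a real system the `seen_ids` set would itself need to be durable and bounded (e.g. expired after a retention window).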
Where did you start? Where did you end up?

Developer's Experience:
  Shared State            -> Partitioned, Replicated State
  ACID Transactions       -> Eventual Consistency
  Exactly Once Messaging  -> Best Effort Messaging

The Infrastructure:
  Machine Loss is a Catastrophe -> Machine Loss is Business As Usual
  Keep Processes Running        -> Recovery-Oriented Computing
The law
• The least scalable component of your system becomes the bottleneck for the whole system
Recipe ingredients (Amazon guidelines)
• Autonomy
• Asynchrony
• Controlled concurrency
• Controlled parallelism
• Decentralize
• Decompose into small well-understood building blocks
• Failure tolerant
• Local responsibility
• Recovery built-in
• Simplicity
• Symmetry
Key principles
• Things fail all the time!
• Machines
  – Disposable
  – Nameless
  – Self-assembled
• State management
  – Caching
  – Loose consistency
  – Relaxed isolation
• Redundancy
• Partitioning
• Loosely coupled messaging
  – Best effort
  – Message loss
  – Retries
• Self monitors
• Self heals
• Designed to expect failures
• Continues to work seamlessly during the failure
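"Retries" and "designed to expect failures" usually imply bounded retries with backoff so a transient failure is absorbed without hammering the failing dependency. A minimal sketch, with illustrative names (the talk names the principle, not this implementation):

```python
import random
import time

def call_with_retries(op, max_attempts=4, base_delay=0.01):
    """Retry a failing operation with capped exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception:
            if attempt == max_attempts - 1:
                raise               # give up: the caller handles the failure
            delay = min(1.0, base_delay * (2 ** attempt))
            time.sleep(delay * random.random())  # jitter avoids thundering herds

calls = {"n": 0}
def flaky():
    """Simulated dependency that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(call_with_retries(flaky))  # "ok" after two transient failures
```

The jitter matters when many nodes retry at once: without it, synchronized retries re-create the very load spike that caused the failure.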
Application Development Patterns

• Architecture
  – Choose a high-level framework
  – Keep service and hosting code separate
  – Partition
• Design
  – Use loose coupling
  – Use caches and stale data
  – Have just a few simple recovery paths
  – Be topology-independent
  – Be hardware-independent
Challenges Of Scalability
• How do I ensure incoming requests are processed at the right location?
  – Partition on service-specific input
  – Dynamically route to the correct node
  – Fail over seamlessly
• How do I manage state inside my service?
  – Take a hard look at consistency requirements
  – Aggressively cache and use transient data
  – Partition the storage tier
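One concrete way (an assumption of mine, not prescribed by the talk) to partition on service-specific input with seamless failover is hash-based routing: hash the partition key to pick a node, and walk to the next node deterministically when the primary is down. A sketch with illustrative node names:

```python
import hashlib

# Hypothetical partition router: hash a service-specific key to pick a
# node; fail over deterministically to the next live node.
NODES = ["node-a", "node-b", "node-c", "node-d"]

def route(key, down=frozenset()):
    """Return the node responsible for `key`, skipping failed nodes."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    for i in range(len(NODES)):
        node = NODES[(h + i) % len(NODES)]
        if node not in down:
            return node
    raise RuntimeError("no nodes available")

primary = route("customer-42")
backup = route("customer-42", down={primary})
print(primary, backup)   # the same key always maps to the same node pair
```

Every router in the fleet computes the same answer from the same inputs, so no central routing table is needed; production systems typically use consistent hashing so that adding a node remaps only a fraction of the keys.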
ACID vs. BASE
• ACID
  – Atomic
  – Consistent
  – Isolated
  – Durable
• Modern BASE-based systems
  – Basically Available
  – Soft-state (or scalable)
  – Eventually consistent
What is the problem?
• Only two of the three:
  – Strong Consistency
    • All clients see the same view during updates
  – High Availability
    • Some data replica is always available despite failures
  – Partition Tolerance
    • All the properties hold even if the network partitions
Techniques
• Expiration-based caching: AP
• Quorum / majority algorithms: CP
• Two-phase commit: CA
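Expiration-based caching lands in the AP corner because it stays available by serving possibly stale data until a TTL forces a re-read. A minimal sketch of the technique (class and field names are mine, not from the talk):

```python
import time

class TTLCache:
    """Expiration-based cache: availability over consistency (AP).
    Reads may return stale data, but never block on the source of truth."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}                      # key -> (value, expiry time)

    def put(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if time.monotonic() > expiry:
            del self.store[key]              # expired: force a re-read
            return None
        return value                         # may be stale within the TTL

cache = TTLCache(ttl_seconds=0.05)
cache.put("price", 100)
print(cache.get("price"))   # 100 (fresh)
time.sleep(0.06)
print(cache.get("price"))   # None (expired, must re-fetch)
```

The TTL is the knob for "loose consistency" mentioned earlier: a longer TTL means fewer reads hit the backing store, at the cost of staler data.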
Scaling data in 3 steps
• Partitioning
• Routing
• State management
Solving the data congestion
• Throttling (especially on startup after a failure)
• Denormalization
• Scale vs. Performance
• Fault tolerance and recoverability
• Geo-distribution
• Content distribution providers (like Akamai)
Fault tolerance
• Throttling incoming traffic
• Limit retries
• Server failover
• Data center failover
• Consider using queues
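Throttling incoming traffic is often implemented as a token bucket: admit requests while tokens remain, shed the rest, and refill over time. A sketch of that technique (the talk names throttling but not this implementation; all names are illustrative):

```python
class TokenBucket:
    """Token-bucket throttle: allows short bursts up to `capacity`,
    then sheds load instead of overwhelming the backend."""
    def __init__(self, capacity, refill_per_tick):
        self.capacity = capacity
        self.tokens = capacity
        self.refill = refill_per_tick

    def tick(self):
        """Called periodically (e.g. once per second) to add tokens back."""
        self.tokens = min(self.capacity, self.tokens + self.refill)

    def allow(self):
        """Admit one request if a token is available; otherwise reject."""
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, refill_per_tick=1)
results = [bucket.allow() for _ in range(5)]   # a burst of 5 requests
print(results)   # [True, True, True, False, False]
bucket.tick()
print(bucket.allow())   # True again after refill
```

This is also the natural place to implement the "throttling on startup after a failure" point: start the bucket small and grow `capacity` as the recovering node warms up.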
Monitoring
• Monitor data about what the user sees – this is what matters most
• Make sure not to overdo it – excessive monitoring kills the components you rely on
• Be frugal
  – Build in counters and monitor the trends – they can help you predict spikes and allocate extra resources on demand
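The "frugal built-in counters" idea can be as small as a counter that keeps a short window of per-interval totals, enough to notice a spike against the recent trend. A hypothetical sketch (names and the spike rule are my assumptions):

```python
from collections import deque

class TrendCounter:
    """Cheap built-in counter: keeps a short window of per-interval
    counts so a traffic spike can be detected against the trend."""
    def __init__(self, window=5):
        self.current = 0
        self.history = deque(maxlen=window)  # oldest intervals fall off

    def hit(self):
        self.current += 1

    def roll_interval(self):
        """Close the current interval (e.g. called once a minute)."""
        self.history.append(self.current)
        self.current = 0

    def is_spiking(self, factor=2.0):
        """True if the latest interval exceeds `factor` times the average."""
        if len(self.history) < 2:
            return False
        *past, latest = self.history
        avg = sum(past) / len(past)
        return latest > factor * max(avg, 1)

counter = TrendCounter()
for count in [10, 11, 9, 10, 40]:       # requests per interval
    for _ in range(count):
        counter.hit()
    counter.roll_interval()

print(counter.is_spiking())   # True: the last interval is far above trend
```

The cost is a handful of integers per counter, which is what makes it frugal enough to build into every component.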
Monitoring
• Availability
• Performance
• Alerts
• Auto throttling
• Capacity thresholds
• Load
• Transactions
• Should measure realistic/relevant actions and behavior!
Diagnosing & Logging
• Non-blocking
• Asynchronous
• Size can become too big (there is "too much of a good thing")
  – Have control over "what" and "how much"
• Performance hit ("do no harm")
  – Should not become a bottleneck
• Be careful what you log
  – Horizontally
  – Vertically
• Should be able to replay logs and correlate the requests
  – <time><correlate-id><node-id><action><data><result>
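A non-blocking, asynchronous logger typically means the request path only enqueues a record and a background thread does the formatting and writing. The sketch below follows the slide's <time><correlate-id><node-id><action><data><result> record layout; everything else (function names, the in-memory sink) is illustrative:

```python
import queue
import threading
import time

log_queue = queue.Ueue() if False else queue.Queue()  # FIFO handoff queue
log_lines = []                                        # stand-in for a log file

def writer():
    """Background thread: drains the queue and writes formatted records."""
    while True:
        record = log_queue.get()
        if record is None:
            break                      # shutdown sentinel
        log_lines.append("|".join(str(field) for field in record))

def log(correlate_id, node_id, action, data, result):
    """Request-path call: never blocks, just enqueues the record."""
    log_queue.put((time.time(), correlate_id, node_id, action, data, result))

t = threading.Thread(target=writer)
t.start()
log("req-7", "node-a", "charge", "amount=10", "ok")
log("req-7", "node-b", "ship", "sku=42", "ok")
log_queue.put(None)
t.join()

# Replay/correlate: pull every step of request "req-7", in order.
print([line for line in log_lines if "|req-7|" in line])
```

Because every record carries the correlate-id, grepping one id reconstructs a single request's path across nodes, which is exactly what makes the logs replayable. A bounded queue (dropping records under pressure) would enforce the "do no harm" rule above.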
Troubleshooting the distributed systems

• Decoupling
• Role isolation
  – Allows separating the functionality from the rest of the system
• Single box
  – Allows running everything from a single box
• Have stubs and simulators
• Be able to "replay" the logs
Deployments
• Deployment packaging
• Rolling out gradually or atomically
• Automatic deployments
• Staging environment
• Building confidence with real customer data
• Rolling back
• Security trumps features
• Load balancing
• Consider linear scale
• Keep IT in mind
• Upgradability
Deployment
• It’s hard
• It’s hard to get right
• Automate everything – it makes the process repeatable
• Version forward/backward compatibility
• Rolling upgrade and rollback
• Be nice to your friends
• Know and manage your environments
• Compensate for gradual system recovery
• Clean the queues
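Forward/backward version compatibility during a rolling upgrade usually comes down to two rules: ignore unknown fields (forward compatible) and supply defaults for missing ones (backward compatible). A sketch under those assumptions, with illustrative field names:

```python
# Hypothetical versioned-message reader for a rolling upgrade:
# v1 messages lack the "currency" field; v3 messages add fields we
# don't know about yet. Both must be readable by a v2 node.

V2_DEFAULTS = {"amount": 0, "currency": "USD"}   # "currency" is new in v2

def read_order(msg):
    """Accept both older (v1) and newer (v3) messages."""
    order = dict(V2_DEFAULTS)                    # defaults cover old senders
    order.update({k: v for k, v in msg.items() if k in V2_DEFAULTS})
    return order     # unknown future fields are ignored, not rejected

old_msg = {"amount": 10}                                   # from a v1 node
new_msg = {"amount": 10, "currency": "EUR", "memo": "hi"}  # from a v3 node

print(read_order(old_msg))  # {'amount': 10, 'currency': 'USD'}
print(read_order(new_msg))  # {'amount': 10, 'currency': 'EUR'}
```

With this discipline, old and new nodes can coexist mid-rollout and a rollback never strands messages written by the newer version.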
Resources
• Availability & Consistency, a presentation by Amazon CTO Dr. Werner Vogels: http://www.infoq.com/presentations/availability-consistency
• Microsoft PDC’08 presentations: https://sessions.microsoftpdc.com/timeline.aspx
Q&A
Thank you!