
Scalability & Availability

Date post: 02-Jan-2016
Page 1: Scalability & Availability

Advanced Distributed Software Architectures and Technology group

ADSaT

Scalability & Availability

Paul Greenfield
CSIRO

Page 2: Scalability & Availability

Building Real Systems

• Scalable
  – Fast enough to handle expected load
  – Grow easily when load grows
• Available
  – Available enough of the time
• Performance and availability cost
  – Aim for ‘enough’ of each but not more

Page 3: Scalability & Availability

Scalable

• Scale-up
  – Bigger and faster systems
• Scale-out
  – Systems working together to handle load
  – Server farms
  – Clusters
• Implications for application design

Page 4: Scalability & Availability

Available

• Goal is 100% availability
  – 24x7 operations
• Redundancy is the key
  – No single points of failure
  – Spare everything
    • Disks, disk channels, processors, power supplies, fans, memory, …
• Automated fail-over and recovery
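
A rough illustration of why redundancy matters, assuming (and this is an assumption, not something the slides state) that components fail independently: if one working copy out of n is enough, combined availability is 1 - (1 - a)^n, while a chain of parts that must all work multiplies their availabilities. The function names are illustrative only.

```python
def parallel_availability(a: float, n: int) -> float:
    """Availability of n redundant copies, assuming independent failures
    and that a single working copy is enough."""
    if not 0.0 <= a <= 1.0 or n < 1:
        raise ValueError("need 0 <= a <= 1 and n >= 1")
    return 1.0 - (1.0 - a) ** n


def series_availability(parts):
    """Availability of a chain in which every part must work,
    i.e. each part is a single point of failure."""
    result = 1.0
    for a in parts:
        result *= a
    return result


if __name__ == "__main__":
    # Two 99%-available power supplies, either of which can carry the load.
    print(round(parallel_availability(0.99, 2), 6))
    # Two 99%-available parts that must both be up.
    print(round(series_availability([0.99, 0.99]), 6))
```

Shared causes of failure (one power feed, one operator error) break the independence assumption, so real figures are usually worse than this estimate.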

Page 5: Scalability & Availability

Performance

• How fast is this system?
  – Not the same as scalability but related
    • Scalability is concerned with the limits to possible performance
  – Measured by response time and throughput
  – Aim for enough performance
    • Have a performance target
    • Tune and add hardware until target hit
    • Then worry about tomorrow…

Page 6: Scalability & Availability

Performance Measures

• Response time
  – What delay does the user see?
  – Instantaneous is good but 95% under 2 seconds is acceptable
  – Response time varies with ‘heaviness’ of transactions
    • Fast read-only transactions
    • Slower update transactions
    • Effects of database contention
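
The ‘95% under 2 seconds’ target can be checked against measured response times. The sketch below uses the nearest-rank percentile definition, which is one of several common definitions; the function names and sample data are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_target(samples_ms, pct=95, limit_ms=2000):
    """True if pct percent of the response times are within limit_ms."""
    return percentile(samples_ms, pct) <= limit_ms


if __name__ == "__main__":
    # Made-up measurements: mostly fast reads plus a few slow updates.
    samples = [100] * 95 + [5000] * 5
    print(percentile(samples, 95), meets_target(samples))
```

A percentile target is stricter than an average: a handful of very slow update transactions can leave the mean looking healthy while the 95th percentile misses the limit.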

Page 7: Scalability & Availability

Response Times

[Figure: ‘Keytable performance’. Response time (ms, 0 to 14000) vs. Clients (1 to 1000). Series: Buy, Create, Get HS, Query C, Query ID, Sell, Update.]

Page 8: Scalability & Availability

Response Times

[Figure: ‘Identity performance’. Response time (ms, 0 to 3000) vs. Clients (1 to 1000). Series: Buy, Create, Get HS, Query C, Query ID, Sell, Update.]

Page 9: Scalability & Availability

Response Times

[Figure: ‘C++ response times, remote db - identity & keytable’. Response time (ms, 0 to 10000) vs. Clients (0 to 1200). Series: Read ident, Update ident, Average ident, Read key, Update key, Average key.]

Page 10: Scalability & Availability

Throughput

• How many transactions can be handled in some period of time
  – Transactions/second or tpm, tph or tpd
  – A measure of overall capacity
• Transaction Processing Performance Council (TPC)
  – Standard benchmarks for TP systems
  – TPC-C for typical transaction system
  – www.tpc.org
  – Current record is 227,000 tpmC

Page 11: Scalability & Availability

Throughput

• Throughput increases until some resource limit is hit
  – Adding more clients just increases the response time
  – Run out of processor, disk bandwidth, network bandwidth
  – Some resources overload badly
    • Ethernet network performance degrades
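
The saturation behaviour described above can be sketched with the standard asymptotic bounds from operational analysis for a closed system with one bottleneck resource. This is a simplified model rather than the measurements behind the charts; `demand_s` (service time per transaction at the bottleneck) and `think_s` (client think time) are hypothetical parameters.

```python
def throughput_bound(clients, demand_s, think_s=0.0):
    """Upper bound on throughput (transactions/second): grows with the
    number of clients until the bottleneck resource saturates at 1/demand_s."""
    if demand_s <= 0:
        raise ValueError("demand_s must be positive")
    return min(clients / (demand_s + think_s), 1.0 / demand_s)


def response_time_bound(clients, demand_s, think_s=0.0):
    """Lower bound on response time (seconds): flat at light load,
    then growing linearly with the number of clients after saturation."""
    if demand_s <= 0:
        raise ValueError("demand_s must be positive")
    return max(demand_s, clients * demand_s - think_s)


if __name__ == "__main__":
    # Made-up numbers: 10 ms of bottleneck time per transaction, 1 s think time.
    for n in (10, 50, 100, 200, 400, 800):
        x = throughput_bound(n, 0.01, 1.0)
        r = response_time_bound(n, 0.01, 1.0)
        print(f"{n:4d} clients  <= {x:6.1f} tps  >= {r * 1000:7.1f} ms")
```

Beyond the saturation point the throughput bound stays flat while the response-time bound keeps climbing, which matches the first sub-point above. The model does not capture resources that degrade under overload, such as the Ethernet case.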

Page 12: Scalability & Availability

Throughput

[Figure: ‘C++ transaction rates’. TPS (0 to 500) vs. Client threads (0 to 1200). Series: Local keytable, Local Identity, Remote identity 10M, Remote identity 100M, Remote keytable 100M.]

Page 13: Scalability & Availability

System Capacity

• How many clients can you support?
  – Name an acceptable response time
  – Average 95% under 2 secs is common
    • And what is ‘average’?
  – Plot response time vs # of clients
    • Great if you can run benchmarks
  – Reason for prototyping and proving proposed architectures before leaping into full-scale implementation
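
Once a benchmark has produced response time against number of clients, system capacity is the largest load that still meets the target. A minimal sketch, with made-up measurements and a hypothetical function name:

```python
def max_clients_within_target(measurements, limit_ms=2000):
    """measurements: iterable of (clients, p95_response_ms) pairs from a
    benchmark run. Returns the largest measured client count for which this
    and every lighter load met the limit, or None if even the lightest
    load missed it."""
    best = None
    for clients, p95_ms in sorted(measurements):
        if p95_ms > limit_ms:
            break
        best = clients
    return best


if __name__ == "__main__":
    # Hypothetical benchmark results, not taken from the charts.
    runs = [(1, 40), (100, 180), (400, 900), (600, 1900), (800, 4200), (1000, 9000)]
    print(max_clients_within_target(runs))
```

The answer is only as fine-grained as the loads that were measured, so it helps to benchmark more points around the knee of the curve.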

Page 14: Scalability & Availability

System Capacity

[Figure: ‘C++ average response times’. Response time (ms, 0 to 9000) vs. Client threads (0 to 1200). Series: Local keytable, Remote keytable, Local identity, Remote identity.]

Page 15: Scalability & Availability

Load Balancing I

• A few different but related meanings
• 1. Balancing across server processes
  – CORBA-style where clients use objects that live inside server processes
  – Want all server processes to be busy
  – Client calls have to go to the process containing their object, even if this process is busy and others are idle

Page 16: Scalability & Availability

Load Balancing I

[Figure: ‘Simple Load balancing’. %Load (0 to 14) vs. Servers (0 to 30). Series: No Load Balancing, Load Balanced.]

Page 17: Scalability & Availability

Load Balancing I

• Client calls on name server to find the location of a suitable server
• Name server can spread client objects across multiple servers
  – Often ‘round robin’
• Client is bound to server and stays bound forever
  – Can lead to performance problems

Page 18: Scalability & Availability

Load Balancing I

Initial

Server Object Reference   Client Numbers                    Total Clients per server object
1                         1-100                             100
2                         101-200                           100
3                         201-300                           100
4                         301-400                           100
5                         401-500                           100

Later

Server Object Reference   Client Numbers                    Total Clients per server object
1                         1-100, 201, 206, 211, …, 496      160
2                         101-200, 202, 207, 212, …, 497    160
3                         203, 208, 213, …, 498             60
4                         204, 209, 214, …, 499             60
5                         205, 210, 215, …, 500             60
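
The two allocations above can be reproduced with a short simulation. It rests on one reading of the table, which is an assumption: clients 1-200 stay bound to their original server objects, while clients 201-500 reconnect later and are handed servers round-robin.

```python
from collections import Counter

SERVERS = [1, 2, 3, 4, 5]


def initial_binding(num_clients=500, per_server=100):
    """Clients 1..500 bound in blocks of 100, as in the 'Initial' table."""
    return {c: SERVERS[(c - 1) // per_server] for c in range(1, num_clients + 1)}


def rebind_round_robin(binding, clients):
    """The given clients reconnect and are handed servers round-robin;
    every other client keeps its existing (static) binding."""
    updated = dict(binding)
    for i, client in enumerate(clients):
        updated[client] = SERVERS[i % len(SERVERS)]
    return updated


def clients_per_server(binding):
    """Count how many clients are bound to each server object."""
    counts = Counter(binding.values())
    return {s: counts[s] for s in SERVERS}


initial = initial_binding()
later = rebind_round_robin(initial, range(201, 501))
print(clients_per_server(initial))
print(clients_per_server(later))
```

Round-robin spreads only the reconnecting clients evenly. The statically bound clients never move, so servers 1 and 2 end up carrying 160 clients each while the others carry 60.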

Page 19: Scalability & Availability

Load Balancing I

• Solution to static allocation problem is for clients to throw away their server objects and get new ones every now and again
• Application coding problem
  – And can objects be discarded?
  – What kind of ‘objects’ are they if they can be discarded?

Page 20: Scalability & Availability

Name Servers

• Server processes call name server when they come up
  – Advertising their services
• Clients call name server to find the location of a server process
  – Up to the name server to match clients to servers
• Client calls server process to create objects
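
The advertise and lookup steps above can be sketched as a tiny in-memory name server that hands out server locations round-robin. This is an illustration only, not the CORBA Naming Service API; the class, method names and addresses are hypothetical.

```python
class NameServer:
    """Toy name server: servers advertise a service, clients resolve it."""

    def __init__(self):
        self._servers = {}  # service name -> list of advertised addresses
        self._next = {}     # service name -> index of the next address to hand out

    def advertise(self, service, address):
        """Called by a server process when it comes up."""
        self._servers.setdefault(service, []).append(address)

    def resolve(self, service):
        """Called by a client; matches it to a server in round-robin order."""
        servers = self._servers.get(service)
        if not servers:
            raise LookupError(f"no server advertises {service!r}")
        i = self._next.get(service, 0) % len(servers)
        self._next[service] = i + 1
        return servers[i]


if __name__ == "__main__":
    ns = NameServer()
    ns.advertise("orders", "hostA:9000")
    ns.advertise("orders", "hostB:9000")
    print([ns.resolve("orders") for _ in range(3)])
```

The name server is consulted once per client, after which the client talks to the server process directly.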

Page 21: Scalability & Availability

Load Balancing I

[Figure: three Clients, a Name Server and two Server processes. Labelled interactions: Advertise service; Request server reference; Return server reference; Get server object reference; Call server object’s methods. Caption: Load balancing across processes within a server.]

Page 22: Scalability & Availability

Load Balancing II

• What happens when our single system is full?
  – Use faster systems
    • Scale-up
  – Use additional systems
    • Scale-out
    • Now load-balancing is used to spread load across systems

Page 23: Scalability & Availability

Load Balancing II

• CORBA world…
  – Name server can distribute across server processes running on different systems
  – Scales well…
    • Name server only involved when handing out a reference to a server, not on every method call

Page 24: Scalability & Availability

Load Balancing II

[Figure: three Clients, a Name Server and two Server processes. Labelled interactions: Advertise service; Request server reference; Return server reference; Get server object reference; Call server object’s methods. Caption: Load balancing across multiple systems.]

Page 25: Scalability & Availability

Load Balancing II

• COM+ world…
  – No need for load-balancing within a system
    • Multithreaded server process
    • All objects live in a single process space
  – Component load balancing across systems
    • Client calls router when creating object
    • Router returns reference to an object in a COM+ server process
    • Load balanced at time of object creation

Page 26: Scalability & Availability

Load Balancing II

[Figure: diagram with labels Client (three), App, DLL, DCOM/MTS, MTS process, Thread pool, Shared object space, Application code. Caption: COM+/MTS using thread pools rather than load balancing within a single system.]

Page 27: Scalability & Availability

COM+ Component Load Balancing

[Figure: three Clients, a Router and a Response time tracker. Labelled interactions: Create object; Pass request to server; Create object and pass back reference; Call object’s methods. Caption: COM+ CLB balancing load across multiple systems.]

Page 28: Scalability & Availability

Load Balancing II

• COM+ scales well…
  – Router only involved when object is created
    • May change in later release to support dynamic re-balancing as server load changes
  – Method calls direct from client to server
  – Allocation based on response time rather than round-robin
    • Allocate to least-loaded server
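
Allocation by response time rather than round-robin can be sketched as below. This is not how COM+ CLB is implemented internally; the exponentially weighted moving average and all names are assumptions chosen to illustrate the idea.

```python
class ResponseTimeRouter:
    """Toy router: tracks a smoothed response time per server and sends
    each new object-creation request to the server that looks least loaded."""

    def __init__(self, servers, alpha=0.2):
        self._avg_ms = {s: 0.0 for s in servers}  # smoothed response time per server
        self._alpha = alpha                        # weight given to the newest sample

    def record(self, server, elapsed_ms):
        """Fold one observed response time into the server's moving average."""
        old = self._avg_ms[server]
        self._avg_ms[server] = (1 - self._alpha) * old + self._alpha * elapsed_ms

    def pick(self):
        """Return the server with the lowest smoothed response time."""
        return min(self._avg_ms, key=self._avg_ms.get)


if __name__ == "__main__":
    router = ResponseTimeRouter(["a", "b"])
    router.record("a", 100)
    router.record("b", 20)
    print(router.pick())     # "b" currently looks faster
    router.record("b", 900)  # "b" slows down under load
    print(router.pick())     # new objects now go to "a"
```

As on the slide, the router is consulted only when an object is created. Method calls then go directly from client to server.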

Page 29: Scalability & Availability

Load Balancing II

• No name server in COM world?
  – COM/MTS clients ‘know’ the name of the server
    • Set at client installation time
    • Can change using GUI tools
    • Admin problem if server app is moved
  – COM+ uses Active Directory to find services

Page 30: Scalability & Availability

Load Balancing II

• Some systems involve the router in every method call/request
  – Request goes to router process, which then passes it on to a server process
  – Scales poorly as the router can be a major bottleneck
  – Some availability concerns as well
    • What happens if the router fails?

Page 31: Scalability & Availability

Load Balancing II

[Figure: three Clients, a Router and two Server processes. Caption: Load balancing with router in main call path.]

Page 32: Scalability & Availability

Scale-up

• No need for load-balancing across systems
• Just use a bigger box
  – Add processors, memory, …
  – SMP (symmetric multiprocessing)
• Runs into limits eventually
• Could be less available

Page 33: Scalability & Availability

Scale-up

• Example from the Web
  – Large auction site
  – Server farm of NT boxes (scale-out)
  – Single database server (scale-up)
    • 64-processor Sun box
  – More capacity needed?
    • Add more NT boxes easily
    • Sun box is full so have to shift some databases to another box

Page 34: Scalability & Availability

Clusters

• A group of independent computers acting like a single system
  – Shared disks
  – Single IP address
  – Single set of services
  – Fail-over to other members of cluster
  – Load sharing within the cluster
  – DEC, IBM, MS, …

Page 35: Scalability & Availability

Clusters

[Figure: cluster diagram. Labels: Client PCs, Server A, Server B, Disk cabinet A, Disk cabinet B, Heartbeat, Cluster management.]

Page 36: Scalability & Availability

Clusters

• Address scalability
  – Add more boxes to the cluster
• Address availability
  – Fail-over
  – Add & remove boxes from the cluster for upgrades and maintenance
• Can be used as one element of a highly-available system
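
Fail-over within a cluster usually relies on a heartbeat between members. The sketch below shows the idea only. It is not the protocol of any particular cluster product, the timeout and names are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class HeartbeatMonitor:
    """Toy cluster membership: a node is live if it sent a heartbeat recently."""

    def __init__(self, nodes, timeout_s=5.0):
        self._last_seen = {n: None for n in nodes}  # node -> time of last heartbeat
        self._timeout_s = timeout_s

    def heartbeat(self, node, now_s):
        """Record that the node was alive at time now_s."""
        self._last_seen[node] = now_s

    def live_nodes(self, now_s):
        """Nodes whose last heartbeat is within the timeout window."""
        return [n for n, seen in self._last_seen.items()
                if seen is not None and now_s - seen <= self._timeout_s]

    def owner(self, preferred, now_s):
        """Node that should run the service: the preferred node while it is
        live, otherwise the first live node (fail-over), otherwise None."""
        live = self.live_nodes(now_s)
        if preferred in live:
            return preferred
        return live[0] if live else None


if __name__ == "__main__":
    monitor = HeartbeatMonitor(["A", "B"])
    monitor.heartbeat("A", 0.0)
    monitor.heartbeat("B", 0.0)
    print(monitor.owner("A", 3.0))  # A is still live
    monitor.heartbeat("B", 4.0)     # A has gone silent
    print(monitor.owner("A", 8.0))  # service fails over to B
```

Removing a box for maintenance looks the same to the other members as a failure: its heartbeats stop and its services move elsewhere.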

Page 37: Scalability & Availability

Web Server Farms

• Web servers are highly scalable
  – Web applications are normally stateless
    • Next request can go to any Web server
    • State comes from client or database
  – Just need to spread incoming requests
    • IP sprayers (hardware, software)
    • >1 Web server looking at same IP address with some coordination (see MS WLB docs)
  – Same technique for other network apps

Page 38: Scalability & Availability

Available System

[Figure: tiered system diagram. Labels: Web Clients; Web Servers load balanced using Convoy; COM+ LBS router node; App Servers use COM+ LB; Database is installed on Wolfpack cluster for high availability.]

Page 39: Scalability & Availability

Availability

• How much?
  – 99%: 87.6 hours of downtime a year
  – 99.9%: 8.76 hours of downtime a year
  – 99.99%: 0.876 hours of downtime a year
• Need to consider operations as well
  – Maintenance, software upgrades, backups, application changes
  – Not just faults and recovery time
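
The downtime figures above follow directly from the number of hours in a year (365 × 24 = 8760, ignoring leap years). A small sketch of the arithmetic, with an illustrative function name:

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_pct):
    """Hours per year a system may be down at the given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    for pct in (99, 99.9, 99.99):
        print(f"{pct}% -> {downtime_hours_per_year(pct):.3f} hours/year")
```

Planned work such as upgrades and backups has to fit inside the same budget, which is why 0.876 hours (about 53 minutes) a year leaves very little room for maintenance windows.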

Page 40: Scalability & Availability

Availability and Scalability

• Often a question of application design
  – Stateful vs stateless
    • What happens if a server fails?
    • Can requests go to any server?
  – What language and database API
    • Balance cost vs speed – VB/C++ - ODBC/ADO
  – Synchronous method calls or asynchronous messaging?
    • Reduce dependency between components
    • Failure tolerant designs
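
The point about asynchronous messaging can be illustrated with an in-process queue standing in for a message queue product (an assumption made for brevity). The sender does not need the receiver to be running when it sends, which is what reduces the dependency between components.

```python
import queue
import threading


def run_demo():
    """Send three requests before any consumer exists, then start a
    consumer that drains the queue. Returns the processed results."""
    requests = queue.Queue()
    processed = []

    def worker():
        while True:
            item = requests.get()
            if item is None:  # sentinel: no more work
                break
            processed.append(item.upper())

    # The client enqueues work while the server side is still "down".
    for message in ["buy", "sell", "update"]:
        requests.put(message)

    # The server comes up later and works through the backlog in order.
    consumer = threading.Thread(target=worker)
    consumer.start()
    requests.put(None)
    consumer.join()
    return processed


if __name__ == "__main__":
    print(run_demo())
```

With a synchronous method call the client would have failed, or blocked, for as long as the server was unavailable.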

Page 41: Scalability & Availability

Next Week

• Distributed application architectures
  – How to design systems that will work, scale and be available
  – Web-based systems
  – Web technology
