Post on 13-Dec-2015
Cluster-Based Scalable Network Service
Authors: Armando Fox
Steven D. Gribble
Yatin Chawathe
Eric A. Brewer
Paul Gauthier
Presenter: Kang Cao
Overview
• Introduction
• Cluster-Based Scalable Service Architecture
• Service Implementation
• Measurements
• Discussion
• Conclusion
Introduction
• Goal
• Advantages of clusters
• Challenges of cluster computing
• BASE semantics
Goal
• Scalability
– Keep the same per-user cost as load increases
• Availability
– Run 24 hours a day, 7 days a week
• Cost effectiveness
Advantages
• Scalability
– Clusters are well suited to Internet service workloads
– Incremental scalability
• High availability
• Commodity building blocks
– Cheap commodity PCs
– Service can be deployed quickly and cheaply
Challenges
• Administration
• Component vs. system replication
• Partial failures
• Shared state
BASE Semantics
In contrast to ACID (atomicity, consistency, isolation, durability), BASE tolerates:
• Stale data
• Soft state
• Approximate answers
Cluster-Based Scalable Service Architecture
• Layered architecture
• Separate network services from their implementation
• Stateless workers
Cluster-Based Scalable Service Architecture: layers
• SNS
• TACC
• Service
Scalable Network Service (SNS)
• Incremental and absolute scalability
• Worker load balancing and overflow management
• Front-end availability and fault-tolerance mechanisms
• System monitoring and logging
SNS
[Figure: SNS architecture. Labeled components: Internet, front ends (each with an MS), internal network, SNS manager, worker drivers with workers, and caches ($).]
Load Balancing
• Centralized load balancing
• Easy to implement
How to Handle Bursts
• An overflow pool of machines is kept available
• The manager can spawn workers on overflow machines on demand
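The overflow mechanism can be sketched roughly as follows. This is a minimal illustration, not the SNS implementation: the class name `OverflowManager`, the `queue_threshold` parameter, and the use of average queue length as the load signal are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OverflowManager:
    """Hypothetical manager that recruits overflow machines during bursts."""
    dedicated: List[str]            # machines that always run workers
    overflow: List[str]             # idle machines available on demand
    queue_threshold: float = 10.0   # assumed trigger: average queue length
    active: List[str] = field(init=False)

    def __post_init__(self) -> None:
        self.active = list(self.dedicated)

    def on_load_report(self, avg_queue_len: float) -> Optional[str]:
        """Spawn a worker on one overflow machine if the load is too high."""
        if avg_queue_len > self.queue_threshold and self.overflow:
            node = self.overflow.pop(0)
            self.active.append(node)
            return node
        return None
```

If `on_load_report` keeps returning overflow machines under ordinary load, that is a sign the dedicated pool is too small.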
Scalability
• Components are replicated
• The amount of additional resources required is a linear function of the increase in offered load
• Partition functionality between front ends and workers
• Keep workers as simple as possible
Fault Tolerance and Availability
• Process-peer fault tolerance
• Using soft state
• Timeouts as an additional fault-tolerance mechanism
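One way to picture soft state combined with timeouts is a table whose entries disappear unless they are periodically refreshed. The sketch below is an assumption-laden illustration: `SoftStateTable`, `ttl`, and the explicit `now` argument are names invented for the example, and the real system's beacon format is not shown in the slides.

```python
class SoftStateTable:
    """Hypothetical registry of workers kept alive only by periodic refreshes."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl            # seconds an entry survives without a refresh
        self._last_seen = {}      # worker name -> time of last refresh

    def refresh(self, worker: str, now: float) -> None:
        """Record a beacon from a worker."""
        self._last_seen[worker] = now

    def live(self, now: float) -> list:
        """Drop entries whose refresh is overdue and return the survivors."""
        self._last_seen = {
            w: t for w, t in self._last_seen.items() if now - t <= self.ttl
        }
        return sorted(self._last_seen)
```

Because nothing in the table has to be recovered after a crash, a restarted component simply starts empty and relearns the state from the next round of beacons.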
TACC
TACC: Transformation, Aggregation, Caching, Customization
• API for composition of stateless data transformation and content aggregation modules
• Uniform caching of original, post-aggregation, and post-transformation data
• Transparent access to the customization database
TACC
A programming model for Internet services
• Transformation
• Aggregation
• Caching
• Customization
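The composition idea can be illustrated with a small sketch. It is not the TACC API: `compose`, `CachingService`, the two example stages, and the `max_chars` preference are hypothetical names chosen for the example. It assumes each stage is a stateless function of the data and the user's preferences, and that both the original and the transformed data are cached.

```python
def compose(*stages):
    """Chain stateless stages; each takes (data, prefs) and returns new data."""
    def pipeline(data, prefs):
        for stage in stages:
            data = stage(data, prefs)
        return data
    return pipeline


def strip_whitespace(text, prefs):
    """Transformation: collapse runs of whitespace."""
    return " ".join(text.split())


def truncate(text, prefs):
    """Customization: cut the text to the user's preferred length."""
    return text[: prefs.get("max_chars", 20)]


class CachingService:
    """Caches original and post-transformation data under separate keys."""

    def __init__(self, fetch, pipeline):
        self.fetch = fetch
        self.pipeline = pipeline
        self.cache = {}
        self.fetches = 0  # counts trips to the origin, for illustration

    def get(self, url, prefs):
        orig_key = ("orig", url)
        if orig_key not in self.cache:
            self.cache[orig_key] = self.fetch(url)
            self.fetches += 1
        out_key = ("out", url, tuple(sorted(prefs.items())))
        if out_key not in self.cache:
            self.cache[out_key] = self.pipeline(self.cache[orig_key], prefs)
        return self.cache[out_key]
```

Keeping the stages stateless is what lets any worker run any request, so two users with different preferences can share one cached original.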
Service Implementation
• Workers that present a human interface to what the TACC modules do, including device-specific presentation
• User interface to control the service
• Most services can be implemented at the service and TACC layers
Example: TranSend
[Figure: TranSend deployment. Labeled components: modem pool, switch, workstations, Internet.]
TranSend
• Front ends
• Load-balancing manager
• User profile database
• Cache nodes
• Datatype-specific distillers
• Graphical monitor
Load Balancing Manager
• Client-side JavaScript support balances load across multiple front ends
• Centralized manager for internal load balancing
Load Balancing
• Components register with the manager
• A front end asks the manager for a worker when it has a task
• The manager locates a worker for the front end
• The manager may create a new distiller
• Workers report their load to the manager
Load Balancing (continued)
• The manager periodically broadcasts load information
• Front ends cache this information
• Front ends use the cached information to dispatch requests to workers
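The register, report, broadcast, and dispatch steps can be sketched as below. This is a simplified model under stated assumptions: `LoadManager` and `FrontEnd` are hypothetical classes, load is a single integer, and the front end picks the least-loaded worker, which is a stand-in for whatever selection policy the real system uses.

```python
class FrontEnd:
    """Dispatches requests using load hints cached from the last broadcast."""

    def __init__(self):
        self.load_hints = {}

    def on_broadcast(self, loads):
        # Keep a private copy; it may be stale until the next broadcast.
        self.load_hints = dict(loads)

    def pick_worker(self):
        if not self.load_hints:
            return None
        # Assumed policy: least loaded, ties broken by name.
        worker = min(self.load_hints, key=lambda w: (self.load_hints[w], w))
        self.load_hints[worker] += 1  # local estimate of the added load
        return worker


class LoadManager:
    """Collects load reports from workers and broadcasts them to front ends."""

    def __init__(self):
        self.loads = {}

    def register(self, worker):
        self.loads.setdefault(worker, 0)

    def report_load(self, worker, load):
        self.loads[worker] = load

    def broadcast(self, front_ends):
        for fe in front_ends:
            fe.on_broadcast(self.loads)
```

Dispatching from cached hints keeps the manager off the critical path of every request, at the cost of decisions based on slightly stale data, which BASE semantics permit.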
Fault Tolerance and Crash Recovery
• Using BASE semantics simplifies crash recovery
• The manager reports worker failures to the front ends
• The manager detects and restarts a crashed front end
• The front end detects and restarts a crashed manager
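The mutual watching between manager and front end can be sketched as two peers that check each other's heartbeats. This is only an illustration: the `Peer` class, the heartbeat timestamps, and the `timeout` value are assumptions, and the actual detection mechanism is not described in the slides.

```python
class Peer:
    """Hypothetical process that emits heartbeats and watches another peer."""

    def __init__(self, name: str, timeout: float) -> None:
        self.name = name
        self.timeout = timeout     # silence longer than this counts as a crash
        self.last_heartbeat = 0.0
        self.restarts = 0
        self.restarted_peers = []  # names of peers this process has restarted

    def heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def restart(self, now: float) -> None:
        """Simulate being restarted by a peer."""
        self.restarts += 1
        self.last_heartbeat = now

    def watch(self, other: "Peer", now: float) -> bool:
        """Restart `other` if its heartbeat is overdue; report whether we did."""
        if now - other.last_heartbeat > other.timeout:
            other.restart(now)
            self.restarted_peers.append(other.name)
            return True
        return False
```

Since either side can restart the other, neither needs a separate supervisor process, and soft state means the restarted peer has nothing to restore.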
Performance: Load Balancing
Conclusions
• Layered architecture for cluster-based scalable network services
• The architecture is reusable
• Cluster-based value-added network services will become an important Internet-service paradigm
Performance: Scalability
Question 1
• Why are cluster-based network services well suited to Internet service workloads?
Answer:
• The workload is highly parallel (many independent simultaneous users)
• The grain size typically corresponds to at most a few CPU seconds on a commodity PC
Question 2
• Why does the cluster-based network service use BASE semantics?
Answer:
• BASE semantics allow us to handle partial failures in clusters with less complexity and cost.
Question 3
• What should be done when the overflow machines are being recruited unusually often?
Answer:
• It is time to add new machines.
Question 4
• Does a front-end crash lose any information? If so, what kind of information is lost?
Answer:
• In-flight user requests are lost; the user needs to handle the timeout and resend the request.
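On the client side, handling a lost request might look like the sketch below. It is a generic retry loop rather than anything from TranSend: `send_with_retry`, `max_attempts`, and the convention that `send` raises `TimeoutError` when no reply arrives are assumptions for the example.

```python
def send_with_retry(send, request, max_attempts=3):
    """Resend a request when the front end fails to answer in time.

    `send` is any callable that returns a response or raises TimeoutError.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(request)
        except TimeoutError as err:
            last_error = err  # the request was lost; try again
    raise TimeoutError(f"gave up after {max_attempts} attempts") from last_error
```

Resending is safe here only because the requests are assumed to be idempotent reads; a request with side effects would need more care.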