Adaptive Cluster Computing using
JavaSpaces
The design, implementation and evaluation of a framework that uses JavaSpaces
2006/08/09 林子鐸
References
J. Batheja and M. Parashar, "Adaptive Cluster Computing using JavaSpaces", Proceedings of the 2001 IEEE International Conference on Cluster Computing.
Sun Microsystems, JavaSpaces Specification, www.javasoft.com/products/javaspaces/specs (1998).
E. Freeman, S. Hupfer, K. Arnold, JavaSpaces Principles, Patterns, and Practice (June 1999).
Outline
Introduction
A Framework for Adaptive Parallel Computing on Clusters
Experimental Evaluation of the Framework
Conclusions
Motivation: Traditional HPC is expensive
High Performance Computing (HPC): why so expensive?
- Massively parallel processors
- Supercomputers
- High-end workstation clusters
So, any alternatives?
Using the idle resources in a networked system can be a more cost-effective alternative
Opportunistic computing
What is Opportunistic Computing?
To provide large amounts of processing capacity by harnessing the idle and available resources on the network in an "opportunistic" manner
Two approaches:
- Job-level approach
- Adaptive approach
Opportunistic Computing: Job-Level Approach
- Entire application jobs are allocated to available idle resources for computation
- A passive approach
Opportunistic Computing: Adaptive Approach
- Available processors are treated as part of a dynamic resource pool and aggressively compete for application tasks
- An active approach
- Targets applications that can be decomposed into independent tasks
- Cluster-based or web-based
The Challenges in Achieving Opportunistic Computing
- Heterogeneity
- Intrusiveness
- System configuration and management overhead
- Adaptability to system and network dynamics
- Security and privacy
A suitable solution: JavaSpaces
What is it?
A shared, network-accessible repository for Java objects
The Principle of JavaSpaces
Master-worker parallel computing using JavaSpaces
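The master-worker pattern above can be sketched in plain Java. A real deployment uses a JavaSpace service (`net.jini.space.JavaSpace`, whose `write`/`take` operations exchange Entry objects) backed by a running Jini lookup service; the `InMemorySpace` class below is a hypothetical stand-in so the sketch is self-contained and runnable.

```java
import java.util.concurrent.*;

// Sketch of master-worker parallel computing in the JavaSpaces style.
// InMemorySpace is a stand-in for a real JavaSpace: write() ~ putting an
// Entry into the space, take() ~ removing a matching Entry.
public class MasterWorkerSketch {
    // Task and Result play the role of JavaSpaces Entry objects.
    record Task(int id, int payload) {}
    record Result(int id, int value) {}

    static class InMemorySpace {
        final BlockingQueue<Task> tasks = new LinkedBlockingQueue<>();
        final BlockingQueue<Result> results = new LinkedBlockingQueue<>();
    }

    // Master: decompose the job into tasks, put them into the space,
    // then take the results back and aggregate them.
    static int runJob(int nTasks, int nWorkers) throws InterruptedException {
        InMemorySpace space = new InMemorySpace();
        ExecutorService pool = Executors.newFixedThreadPool(nWorkers);
        for (int w = 0; w < nWorkers; w++) {
            pool.submit(() -> {
                // Worker loop: take a task, compute, put the result back.
                try {
                    while (true) {
                        Task t = space.tasks.take();
                        if (t.id() < 0) break;   // poison pill: no more work
                        space.results.put(new Result(t.id(), t.payload() * t.payload()));
                    }
                } catch (InterruptedException ignored) { }
            });
        }
        for (int i = 0; i < nTasks; i++) space.tasks.put(new Task(i, i));
        int sum = 0;                              // aggregate one result per task
        for (int i = 0; i < nTasks; i++) sum += space.results.take().value();
        for (int w = 0; w < nWorkers; w++) space.tasks.put(new Task(-1, 0));
        pool.shutdown();
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runJob(10, 3)); // sum of squares 0..9 = 285
    }
}
```

The key property the space provides is decoupling: workers never know about the master or each other, they only match task entries in the shared repository, which is what lets nodes join and leave dynamically.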
A Framework for Adaptive Parallel Computing on Clusters
Three features:
- Portability across heterogeneous platforms
- Minimal configuration overheads and runtime class loading at the participating nodes
- Automated system state monitoring
Targets applications that are divisible into subtasks that can be solved independently
The Framework: Architecture Overview
The Framework: The Master Module
The Master Module:
- Hosts the JavaSpaces service
- Decomposes the application into independent tasks
- Places the tasks into the space
- Takes the task results back
[Figure: the master puts jobs into the JavaSpace and gets the results back]
The Framework: The Worker Module
The Worker Module:
- A thin module
- Can be configured and loaded at runtime
- Gets tasks from the space
- Puts the task results back into the space
- Controlled by the network management module
[Figure: the worker gets jobs from the JavaSpace and puts the results back]
The Framework: The Worker Module
State transition diagram (worker state vs. CPU load on the node):
- Running: 0~25% load
- Paused: 25~50% load
- Stopped: 50~100% load
Transitions: Start, Pause, Resume, Stop
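The load thresholds above map directly to a small state machine. A minimal sketch, assuming the simple rule "pick the state from the current CPU load band" (the exact rule protocol of the framework's Inference Engine is not spelled out on the slides):

```java
// Sketch of the worker state decision driven by CPU load, using the
// thresholds from the state transition diagram: 0~25% run, 25~50% pause,
// 50~100% stop. The threshold-to-state mapping is taken from the slides;
// how the real Inference Engine debounces transitions is an assumption.
public class WorkerStateMachine {
    enum State { RUNNING, PAUSED, STOPPED }

    // Decide the worker state from the measured CPU load on the node.
    static State stateFor(double cpuLoadPercent) {
        if (cpuLoadPercent < 25) return State.RUNNING;  // node mostly idle: work
        if (cpuLoadPercent < 50) return State.PAUSED;   // moderate load: pause
        return State.STOPPED;                           // heavy load: stop the worker
    }

    public static void main(String[] args) {
        System.out.println(stateFor(10));  // RUNNING
        System.out.println(stateFor(40));  // PAUSED
        System.out.println(stateFor(90));  // STOPPED
    }
}
```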
The Framework: The Network Management Module
Two functions:
- Monitor the state of workers
- Provide a decision-making mechanism to facilitate the utilization of idle resources
Inference Engine
- Rule-based protocol
Monitoring Agent
- Uses SNMP
- Two components: the manager component and the worker-agent component
The Framework: The Implementations
Remote Node Configuration:
- Uses reflection to load classes dynamically
- Required worker classes are downloaded from the master server
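The reflection step can be sketched as follows. In the framework the class bytes come over the network from the master server's codebase; here a JDK class stands in for the downloaded worker class so the snippet runs anywhere.

```java
// Sketch of runtime class loading via reflection, as in the Remote Node
// Configuration step: resolve a class by name at runtime and instantiate it
// without compile-time knowledge of the type. "java.util.ArrayList" is only
// a stand-in for a worker class fetched from the master's codebase.
public class RuntimeLoading {
    static Object loadAndInstantiate(String className) throws Exception {
        Class<?> cls = Class.forName(className);            // resolve at runtime
        return cls.getDeclaredConstructor().newInstance();  // no-arg constructor
    }

    public static void main(String[] args) throws Exception {
        Object worker = loadAndInstantiate("java.util.ArrayList");
        System.out.println(worker.getClass().getName()); // java.util.ArrayList
    }
}
```

This is what keeps the worker module "thin": the node only needs the generic loader installed, and everything application-specific arrives at runtime.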
Dynamic Worker Management for Adaptive Cluster Computing
[Figure: interaction between the "Worker Module" and the "Network Management Module"]
The Evaluation: Parallel Ray Tracing
- An image generation technique
- Divide-and-conquer: a 600x600 image is divided into 24 independent 25x600 rectangular slices
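The decomposition is simple arithmetic: 600 rows / 24 tasks = 25 rows per slice, each slice the full 600 pixels wide. A small sketch (the `Slice` record and method names are illustrative, not from the paper):

```java
import java.util.ArrayList;
import java.util.List;

// The task decomposition behind the ray-tracing evaluation: a 600x600 image
// split into 24 horizontal slices of 25x600 pixels, one slice per task.
public class ImageSlicer {
    record Slice(int yStart, int height, int width) {}

    // Assumes height is evenly divisible by nSlices, as in the 600/24 case.
    static List<Slice> slice(int width, int height, int nSlices) {
        int sliceHeight = height / nSlices;   // 600 / 24 = 25 rows per slice
        List<Slice> slices = new ArrayList<>();
        for (int i = 0; i < nSlices; i++)
            slices.add(new Slice(i * sliceHeight, sliceHeight, width));
        return slices;
    }

    public static void main(String[] args) {
        List<Slice> s = slice(600, 600, 24);
        System.out.println(s.size() + " slices, each "
                + s.get(0).height() + "x" + s.get(0).width());
        // 24 slices, each 25x600
    }
}
```

Because each slice can be rendered without any data from its neighbors, the slices are exactly the kind of independent tasks the framework targets.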
Experiments (5 PCs):
- Scalability Analysis
- Adaptation Protocol Analysis
- Analysis of Dynamic Worker Behavior Patterns under Varying Load Conditions
The Evaluation: Scalability Analysis
Measures the overall scalability of the framework
Criteria:
- Max worker time: the maximum computation time among all workers
- Task planning time: the time required for dividing and putting the tasks
- Task aggregation time: the time required for collecting and aggregating the results
- Parallel time: the total execution time from start to finish
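These criteria roughly compose: if the planning, computation, and aggregation phases do not overlap, the total parallel time is approximately planning time plus the slowest worker's time plus aggregation time. This additive model is an assumption for illustration (the paper measures each quantity separately), and the numbers below are made up, not measurements:

```java
// Illustrative timing model for the scalability criteria: total parallel
// time approximated as planning + max worker + aggregation, assuming the
// three phases do not overlap. All inputs here are invented example values.
public class TimingModel {
    static double parallelTime(double planning, double maxWorker, double aggregation) {
        return planning + maxWorker + aggregation;
    }

    public static void main(String[] args) {
        // e.g. 1.5s planning, 30s slowest worker, 0.5s aggregation
        System.out.println(parallelTime(1.5, 30.0, 0.5)); // 32.0
    }
}
```

The model also shows why max worker time dominates scalability: planning and aggregation are fixed master-side overheads, while the worker term shrinks as nodes are added.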
The Evaluation: Scalability Analysis Result
The Evaluation: Adaptation Protocol Analysis
Analyzes the overheads involved in signaling worker nodes and adapting to their current CPU load
Criteria: signaling times under two load simulators
- Simulator 1: 30%~50% CPU load
- Simulator 2: 100% CPU load
The Evaluation: Adaptation Protocol Analysis Result
[Figure: CPU usage history on the worker machine and analysis of the signaling times, annotated with the start of Simulator 1 and Simulator 2]
The Evaluation: Dynamic Worker Behavior Patterns under Varying Load Conditions
Criteria:
- Maximum Worker Time
- Maximum Master Overhead
- Task Planning and Aggregation Time
- Total Parallel Time
The Evaluation: Dynamic Worker Behavior Patterns under Varying Load Conditions Result
The Evaluation: Dynamic Worker Behavior Patterns under Varying Load Conditions Result
Conclusions
Summary:
- Good scalability for loosely coupled applications
- Idle workstations can be used effectively
- Monitoring and reacting to system state minimizes intrusiveness to machines within the cluster