Performance Evaluation of Load Sharing Policies on a Beowulf Cluster
James Nichols
Marc Lemaire
Advisor: Mark Claypool
Outline
- Introduction
- Methodology
- Results
- Conclusions
Introduction
What is a Beowulf cluster?
- A cluster of computers networked together via Ethernet
Load distribution
- Shares load, decreasing response times and increasing overall throughput
- Typically requires expertise in a particular load distribution mechanism
Load
- Historically, CPU has been the load metric. What about disk and memory load? Or system events like interrupts and context switches?
PANTS: Application Node Transparency System
- Removes the need for expertise required by other load distribution mechanisms
PANTS
PANTS: Application Node Transparency System
- Developed in a previous MQP (CS-DXF-9918); enhanced the following year to use DIPC (CS-DXF-0021)
- Intercepts execve() system calls
- Uses the /proc file system to calculate CPU load and classify a node as "busy" or "free"
- Any workload that does not generate CPU load will not be distributed
- Near-linear speedup for computationally intensive applications
- New load metrics and policies!
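PANTS itself is a kernel-level C system; as an illustrative sketch (not PANTS source), the default busy/free classification described above can be modeled in Python, assuming the Linux 2.4-era /proc/stat "cpu" field layout:

```python
# Illustrative sketch (not PANTS code): classify a node "busy" or "free"
# from CPU utilization, computed between two /proc/stat-style "cpu" lines.
# Field layout (Linux 2.4): cpu user nice system idle, in jiffies (1/100 s).

CPU_THRESHOLD = 95.0  # percent; PANTS' default busy threshold

def cpu_percent(sample1, sample2):
    """CPU utilization between two 'cpu' lines sampled some interval apart."""
    u1, n1, s1, i1 = (int(x) for x in sample1.split()[1:5])
    u2, n2, s2, i2 = (int(x) for x in sample2.split()[1:5])
    busy = (u2 + n2 + s2) - (u1 + n1 + s1)   # jiffies spent working
    total = busy + (i2 - i1)                 # working + idle jiffies
    return 100.0 * busy / total if total else 0.0

def classify(sample1, sample2):
    return "busy" if cpu_percent(sample1, sample2) >= CPU_THRESHOLD else "free"

# 98 of 100 elapsed jiffies spent in user+system -> "busy"
print(classify("cpu 100 0 50 850", "cpu 190 0 58 852"))
```

Because only CPU jiffies are consulted, a node saturating its disk while nearly idle on CPU would be classified "free" here, which is exactly the gap the new metrics address.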
Outline
- Introduction
- Methodology
- Results
- Conclusions
Methodology
- Identified load parameters
- Implemented ways to measure those parameters
- Improved the PANTS implementation
- Built micro benchmarks that stressed each load metric
- Built a macro benchmark that stressed a more realistic mix of metrics
- Selected a real-world benchmark
Methodology: Load Metrics
Acquired via /proc/stat:
- CPU – Total jiffies (1/100ths of a second) that the processor spent on user, nice, and system processes, expressed as a percentage of the total.
- I/O – Blocks read from/written to disk per second.
- Memory – Page operations per second. Example: a virtual memory page fault requiring a page to be loaded into memory from disk.
- Interrupts – System interrupts per second. Example: an incoming Ethernet packet.
- Context switches – How many times the processor switched between processes per second.
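A hedged sketch of how these counters could be pulled from a Linux 2.4-era /proc/stat snapshot and turned into per-second rates. The field names follow that kernel's format; `parse_stat` and `rates` are illustrative helpers, not PANTS code, and the disk_io line is omitted for brevity:

```python
# Sketch: read the slide's event counters from a Linux 2.4-era /proc/stat
# snapshot (illustrative parser, not PANTS source). A rate is the difference
# between two snapshots divided by the sampling interval.

def parse_stat(text):
    """Return raw cumulative counters from a /proc/stat snapshot."""
    vals = {}
    for line in text.splitlines():
        f = line.split()
        if not f:
            continue
        if f[0] == "page":            # pages paged in + paged out
            vals["page_ops"] = int(f[1]) + int(f[2])
        elif f[0] == "intr":          # first field is the interrupt total
            vals["interrupts"] = int(f[1])
        elif f[0] == "ctxt":          # total context switches since boot
            vals["ctx_switches"] = int(f[1])
    return vals

def rates(before, after, seconds):
    """Per-second rates between two snapshots taken `seconds` apart."""
    a, b = parse_stat(before), parse_stat(after)
    return {k: (b[k] - a[k]) / seconds for k in a}

snap1 = "page 5000 1800\nintr 1462898 100 2\nctxt 32125929\n"
snap2 = "page 5400 1900\nintr 1473898 120 3\nctxt 32131929\n"
print(rates(snap1, snap2, 1.0))
# page_ops: (5400+1900)-(5000+1800) = 500/s; interrupts: 11000/s; ctxt: 6000/s
```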
Methodology: Micro & Macro Benchmarks
- Implemented micro and macro benchmarks
- Helped refine our understanding of how the system was performing and tested our load metrics
- (Not enough time to present)
Real-world benchmark: Linux kernel compile
- Distributed compilation of the Linux kernel, executed by the standard Linux program make
- Loads I/O and memory resources
- Details: kernel version 2.4.18; 432 files; mean source file size 19 KB; needed to expand relative path names to full paths
Thresholds
- PANTS default: CPU 95%
- New policy: CPU 95%, I/O 1000 blocks/sec, memory 4000 page faults/sec, IRQ 115,000 interrupts/sec, context switches 6000 switches/sec
- Compare the PANTS default policy to our new load metrics and policy
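The threshold values above come from the slide; as a minimal sketch, one plausible way to combine them is to mark a node busy when any single metric exceeds its threshold (that any-metric rule, and the names below, are assumptions for illustration, not PANTS source):

```python
# Hedged sketch of the new policy: a node is "busy" if ANY metric exceeds
# its threshold. Threshold values are from the slide; the any-metric
# combination rule is an assumption about how the metrics are combined.

THRESHOLDS = {
    "cpu_percent":  95.0,      # %
    "io_blocks":    1000.0,    # blocks/sec
    "page_faults":  4000.0,    # page faults/sec
    "interrupts":   115000.0,  # interrupts/sec
    "ctx_switches": 6000.0,    # switches/sec
}

def is_busy(metrics):
    """metrics: dict with the same keys, holding measured per-second rates."""
    return any(metrics[k] >= THRESHOLDS[k] for k in THRESHOLDS)

# A node with low CPU but heavy disk I/O now counts as busy, whereas the
# CPU-only default policy would have called it free.
print(is_busy({"cpu_percent": 20.0, "io_blocks": 2500.0,
               "page_faults": 100.0, "interrupts": 4000.0,
               "ctx_switches": 900.0}))
```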
Outline
- Introduction
- Methodology
- Results
- Conclusions
Results: CPU
[Bar chart: CPU % (y-axis 0–70) for PANTS default vs. PANTS new policies, showing average, standard deviation, max, and min]
Results: Memory
[Bar chart: page faults/sec (y-axis 0–20,000) for PANTS default vs. PANTS new policies, showing average, standard deviation, max, and min]
Results: Context Switches
[Bar chart: context switches/sec (y-axis 0–40,000) for PANTS default vs. PANTS new policies, showing average, standard deviation, max, and min]
Results: Summary
Summary results – distributed compilation
[Bar chart: compilation time in seconds (y-axis 0–1600) by compilation method: Local, NFS, PANTS no migration, PANTS default, PANTS new load metrics]
Conclusions
- Including I/O, memory, interrupt, and context-switch metrics achieves better throughput and a more balanced load distribution than CPU alone.
- Future work: use preemptive migration? Include a network usage load metric.
For more information visit: http://pants.wpi.edu
Questions?