Euro-Par 2007, Rennes, 29th August
The Characteristics and Performance of Groups of Jobs in
Grids
Alexandru Iosup, Mathieu Jan*, Ozan Sonmez and Dick Epema
PDS Group, Delft University of Technology
The Netherlands
*: now a postdoc at LRI/INRIA Futurs, Orsay (Paris South), France
Outline
• Why look at groups of jobs?
• Grid traces and environment summary
• Definitions of groups of jobs
• The characteristics of job groupings
  • Workload-level analysis
  • Group-level analysis
  • Job-level analysis
• Conclusion and future work
Why look at groups of jobs?
• Current grids run almost exclusively single-node jobs [Grid2006]
  • Trace analysis: LCG, Grid3, TeraGrid, DAS-2
• How are jobs related, then? What is their structure?
  • Batches of identical jobs?
  • Something else?
• No such analysis exists using long-term data from production and research grid environments
• No analysis exists of the impact of groups of jobs on the performance of grids
Our research questions
• What are the dependencies among the jobs submitted by a single user?
• What is the physical structure of such groupings?
• What is the impact of the job groupings on the performance of grids?
Grid traces: Grid’5000 (1/3)
• Experimental platform
  • Grid’5000: 9 sites, 15 clusters
  • All clusters managed by OAR
• Trace period: 05/2004 - 11/2006
  • CPUs: ~2500
  • Jobs: 951 K
  • Users: 473
  • Groups: 10
  • Consumed CPU time: 651 years
Grid traces: NorduGrid (2/3)
• Large-scale production grid
  • NorduGrid: ~75 sites
  • Handled via the ARC (Advanced Resource Connector) middleware
• Trace period: 05/2004 - 02/2006
  • CPUs: ~2000
  • Jobs: 781 K
  • Users: 387
  • Groups: 106
  • Consumed CPU time: 2443 years
Grid traces: GLOW (3/3)
• Grid Laboratory Of Wisconsin
  • Campus-wide distributed computing environment
  • Condor based
• Trace period: 09/2006 - 01/2007
  • CPUs: ~1400
  • Jobs: 216 K
  • Users: 18
  • Groups: 1
  • Consumed CPU time: 55 years
Grid traces summary
                    Grid’5000            NorduGrid            GLOW
Period              05/2004 - 11/2006    05/2004 - 02/2006    09/2006 - 01/2007
Sites               15                   ~75                  1
CPUs                ~2500                ~2000                ~1400
Jobs                951 K                781 K                216 K
Groups              10                   106                  1
Users               473                  387                  18
Consumed CPU time   651 years            2443 years           55 years
Groups of jobs: definitions (1/2)
• Batch submission
  Maximal contiguous subsequence G of a user's jobs (ordered by submission time) such that any two successive jobs J, J’ in G are submitted at most Δ seconds apart
• Parameter Sweep Application (PSA)
  • Batch submission + jobs execute the same application
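The batch definition can be illustrated with a minimal sketch. It assumes the grouping criterion is that successive jobs of the same user are submitted at most a threshold (called `delta` here) apart; the `(user, submit_time)` record layout, the function name `find_batches`, and the sample data are hypothetical and are not taken from the traces.

```python
from collections import defaultdict

def find_batches(jobs, delta=120):
    """Split each user's jobs into maximal runs of closely spaced submissions.

    jobs:  iterable of (user, submit_time_in_seconds) tuples (assumed layout).
    delta: maximum gap, in seconds, between two successive submissions
           that still belong to the same group (assumed criterion).
    Returns a list of (user, [submit_times]) groups. Groups holding one
    job correspond to single jobs rather than batches.
    """
    by_user = defaultdict(list)
    for user, submit in jobs:
        by_user[user].append(submit)

    groups = []
    for user, submits in by_user.items():
        submits.sort()
        current = [submits[0]]
        for t in submits[1:]:
            if t - current[-1] <= delta:
                current.append(t)               # gap small enough: same group
            else:
                groups.append((user, current))  # gap too large: close the group
                current = [t]
        groups.append((user, current))
    return groups

# Hypothetical sample: three close submissions, one late one, one other user.
jobs = [("alice", 0), ("alice", 60), ("alice", 170), ("alice", 1000), ("bob", 30)]
batches = find_batches(jobs)
```

In this sample, the first three jobs of `alice` form one group because each gap is at most 120 seconds, while her fourth job and the job of `bob` remain single jobs.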
Groups of jobs: definitions (2/2)
• In this talk, we focus on batch submissions
Characteristics of job groupings
• In our analysis, Δ = 120 seconds
Workload-level analysis
• Batches (trace totals in parentheses)
              Grid’5000      NorduGrid        GLOW
Submissions   26k            50k              13k
Jobs          808k (951k)    738k (781k)      205k (216k)
CPU time      193y (651y)    2192y (2443y)    53y (55y)
• Continued
  • NorduGrid & GLOW: identical to batches
  • Grid’5000: 14k submissions, 910k jobs, 462y
• Bursty: fewer submissions, more jobs
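As a worked check of the batch figures above, the share of jobs and of CPU time that falls inside batches can be computed from the batch values and the trace totals given in parentheses. This is only an arithmetic sketch; the dictionary layout and the name `batch_share` are illustrative assumptions.

```python
# (jobs in thousands, CPU time in years), copied from the workload-level figures.
batch = {"Grid'5000": (808, 193), "NorduGrid": (738, 2192), "GLOW": (205, 53)}
total = {"Grid'5000": (951, 651), "NorduGrid": (781, 2443), "GLOW": (216, 55)}

def batch_share(trace):
    """Return (percent of jobs, percent of CPU time) submitted in batches."""
    jobs_b, cpu_b = batch[trace]
    jobs_t, cpu_t = total[trace]
    return 100.0 * jobs_b / jobs_t, 100.0 * cpu_b / cpu_t

shares = {trace: batch_share(trace) for trace in batch}
for trace, (job_pct, cpu_pct) in shares.items():
    print(f"{trace}: {job_pct:.1f}% of jobs, {cpu_pct:.1f}% of CPU time")
```

For GLOW, 53 of 55 CPU-years fall inside batches, which is roughly 96% of the consumed CPU time.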
Group-level analysis: size of batches
• 75% of batches are of size 15-20 (Grid’5000 and NorduGrid) or <10 (GLOW)
• Average: 31+/-110 (Grid’5000), 15+/-33 (NorduGrid) and 15+/-38 (GLOW)
• Heavy-tailed distribution
Group-level analysis: inter-arrival time (seconds)
• As expected, the inter-arrival time is high for batches
• 50% of the values are between 400 and 700 seconds
• Reminder: Δ = 120 seconds
Group-level analysis: duration (seconds)
• The duration of batches is higher than that of single jobs
• For NorduGrid, the average duration of batches is 1.5 days vs. 1 day for single jobs
Group-level analysis: consumed CPU time (KCPUs)
• Consumed CPU time is much higher for batches than for single jobs!
Job-level analysis: run time (seconds)
• Average run time for batches
  • Grid’5000: 0.66+/-6.65 days
  • GLOW: 1.04+/-3.18 days
  • NorduGrid: 2.27+/-5.59 days
Job-level analysis: wait time (seconds)
• NorduGrid: no wait time information in the trace
• Average wait times of batches are higher than
  • The run time of batches
  • The wait time of single jobs
Job-level analysis: consumed CPU time (KCPUs)
• No clear distinction between batches and single jobs
Other analyses
• Do parallel jobs exist inside batches?
  • Average parallelism: 1+/-1 (Grid’5000), 2+/-7 (NorduGrid) and 1 (GLOW)
  • Grid’5000: 37% of batches are of size 2, 9% of size >2, max. = 325
• To what extent are batches PSAs?
  • In Grid’5000, 75% of batches are PSAs
  • PSAs compared to batches:
    • Group size increased by 9 on average
    • Average duration divided by 5.7
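Since a PSA is described earlier as a batch submission whose jobs execute the same application, the share of batches that are PSAs can be estimated with a short sketch. The `app` field, the helper names, and the sample batches are hypothetical; how the application of a job is identified in a real trace is not specified here.

```python
def is_psa(batch):
    """A batch counts as a PSA when all of its jobs run the same application."""
    apps = {job["app"] for job in batch}   # "app" is an assumed field name
    return len(batch) > 1 and len(apps) == 1

def psa_fraction(batches):
    """Fraction of multi-job batches that are PSAs (0.0 if there are none)."""
    multi = [b for b in batches if len(b) > 1]
    if not multi:
        return 0.0
    return sum(is_psa(b) for b in multi) / len(multi)

# Hypothetical sample: two PSAs, one mixed batch, one single job.
sample = [
    [{"app": "sim"}, {"app": "sim"}, {"app": "sim"}],
    [{"app": "sim"}, {"app": "render"}],
    [{"app": "blast"}, {"app": "blast"}],
    [{"app": "solo"}],
]
fraction = psa_fraction(sample)
```

The single-job group is excluded from the denominator, so the sample yields two PSAs out of three batches.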
Performance impact of grouped submissions
• Batches display a high AIT value
  • Over 4000% of the ART!
• Research direction for designing scheduling policies for batches: minimization of the AIT of batches
• Performance metrics
  • Group runtime (RT)
  • Group duration (DT)
  • Group idle time: IT = DT - RT

             Batches                Single jobs
             ART (s)    AIT (s)     ART (s)    AIT (s)
Grid’5000    14 181     568 483     4 127      4 233
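The idle-time metric and the "over 4000%" observation can be checked with a few lines of arithmetic. The sketch assumes that ART and AIT denote the averages of RT and IT, and that the Grid’5000 row reads as ART = 14,181 s and AIT = 568,483 s for batches, and ART = 4,127 s and AIT = 4,233 s for single jobs; the function and variable names are illustrative.

```python
def idle_time(duration, runtime):
    """Group idle time as defined on the slide: IT = DT - RT."""
    return duration - runtime

def ait_over_art_percent(art, ait):
    """Express the average idle time as a percentage of the average run time."""
    return 100.0 * ait / art

# Grid'5000 values in seconds (assumed reading of the table row).
batch_ratio = ait_over_art_percent(art=14_181, ait=568_483)
single_ratio = ait_over_art_percent(art=4_127, ait=4_233)

print(f"batches:     AIT is {batch_ratio:.0f}% of ART")
print(f"single jobs: AIT is {single_ratio:.0f}% of ART")
```

Under this reading, the idle time of batches is about 40 times their run time, whereas for single jobs the two are nearly equal.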
Conclusion & future work
• Formally defined 3 types of groups of jobs
  • Batch (and PSAs), continued and bursty
• Analysis of 3 long-term traces from large and different platforms
  • Up to 96% of CPU time consumed by batch submissions
• Performance analysis of batches compared to single jobs
• Future work
  • Deeper analysis (Grid Workloads Archive)
  • Research direction: minimization of idle time in groups
  • Trace-driven simulations
  • Dynamic resource availability [Grid2007]
Thank you! Questions? Remarks? Observations?
Help build our community’s Grid Workloads Archive:
http://gwa.ewi.tudelft.nl/