Condor Usage at Brookhaven National Lab
Alexander Withers (talk given by Tony Chan)
RHIC Computing Facility
Condor Week - March 15, 2005
About Brookhaven National Lab
● One of a handful of Laboratories supported and managed by the U.S. gov’t through DOE.
● Multi-disciplinary Lab with 2,700+ employees, Physics being the largest department.
● Physics Dept. has its own computing division (30+ FTEs) to support physics (HEP) projects.
● RHIC (nuclear) and ATLAS (HEP) are largest projects currently being supported.
Computing Facility Resources
● Full service facility: central/distributed storage capacity, large Linux Farm, robotic system for data storage, data backup, etc.
● 6+ PB permanent tape storage capacity.
● 500+ TB central/distributed disk storage capacity.
● 1.4 million SpecInt2000 aggregate computing power in Linux Farm.
History of Condor at Brookhaven
● First looked at Condor in 2003 as a replacement for LSF and in-house batch software.
● Installed 6.4.7 in August 2003.
● Upgraded to 6.6.0 in February 2004.
● Upgraded to 6.6.6 (with 6.7.0 startd binary) in August 2004.
● User base grew from 12 (April 2004) to 50+ (March 2005).
The Rise in Condor Usage
[Figure: kCPU-hours by month, Aug. through Mar. (est.), for ACF/RCF; vertical axis 0 to 1400.]
The Rise in Condor Usage
[Figure: avg. # of running jobs by month, Aug. through Mar. (est.), for ACF/RCF; vertical axis 0 to 1800.]
Condor Cluster Usage
[Figure: avg. cluster usage (%) by month, Aug. through Mar. (est.), for ACF/RCF; vertical axis 0 to 35.]
BNL’s modified Condorview
Overview of Computing Resources
● Total of 2750 CPUs (growing to 3400+ in 2005).
● Two central managers with one acting as a backup.
● Three specialized submit machines which handle ~600 simultaneous jobs each on average.
● 131 of the execute nodes can also act as submission nodes.
● One monitoring/Condorview server.
Overview of Computing Resources, cont.
● Six GLOBUS gateway machines for remote job submission.
● Most machines run SL-3.0.2 on the x86 platform, some still using RH 7.3.
● Running 6.6.6 with 6.7.0 startd binary to take advantage of multiple VM feature.
Overview of Configuration
● Computing resources divided into 6 pools.
● Two configuration models:
– Split pool resources into two parts and restrict which jobs can run in each part.
– More complex version of the Bologna Batch System.
– A pool uses one or both of these models.
● Some pools employ user priority preemption.
● Use “drop queue” method to fill fast machines first.
● Have tools to easily reconfigure nodes.
● All jobs use vanilla universe (no checkpointing).
Two Part Model
● Nodes are assigned one of two tasks irrespective of Condor: analysis or reconstruction.
● Within Condor, a node advertises itself as either an analysis node or a reconstruction node.
● A job must advertise itself in the same manner to match with an appropriate node.
● Only certain users may run reconstruction jobs but anyone can run an analysis job.
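The matching rule described above can be sketched in a few lines of Python. This is only an illustrative model of the logic, not Condor configuration and not the Condor API; the class names, the `role` field, and the `RECONSTRUCTION_USERS` allow-list are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical names throughout: this models the matching rule only.
RECONSTRUCTION_USERS = {"reco_operator"}  # assumed allow-list of users


@dataclass(frozen=True)
class Node:
    name: str
    role: str  # "analysis" or "reconstruction", as advertised by the node


@dataclass(frozen=True)
class Job:
    owner: str
    role: str  # "analysis" or "reconstruction", as advertised by the job


def may_submit(job):
    """Anyone may run analysis jobs; reconstruction is restricted."""
    if job.role == "analysis":
        return True
    return job.role == "reconstruction" and job.owner in RECONSTRUCTION_USERS


def matches(job, node):
    """A job matches only a node that advertises the same role."""
    return may_submit(job) and job.role == node.role


def eligible_nodes(job, nodes):
    """Return the names of all nodes the job is allowed to run on."""
    return [n.name for n in nodes if matches(job, n)]
```

For example, with one analysis node and one reconstruction node, an ordinary user's analysis job is eligible only for the analysis node, and the same user's reconstruction job is eligible for none.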
Analysis/Reconstruction
[Diagram: analysis/reconstruction nodes arranged in Groups 1 to 5 on a Fast/Slow scale, each node with vm1 and vm2. Example: "Reconstruction Job: wants group <= 2".]
● No suspension.
● No preemption.
● Will start a job if CPU is free.
A More Complex Version of the Bologna Model
● Two-CPU nodes, each with 8 VMs.
● 2 VMs per CPU.
● Only two jobs running at a time.
● Four job categories, each with its own priority.
● A high priority VM will suspend a random VM of lower priority.
● The random aspect is to prevent the same VM from always getting suspended.
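A minimal Python sketch of the suspension rule, assuming a node that runs at most two jobs at once. It is a toy model, not Condor configuration: the function name, the category labels, and the relative rank of the "mc" and "low" categories are assumptions, and resuming suspended jobs is left out.

```python
import random

# Assumed priority order (higher number wins). The four categories come from
# the slides; the relative rank of "mc" and "low" is a guess for this sketch.
PRIORITY = {"mc": 0, "low": 1, "med": 2, "high": 3}
MAX_RUNNING = 2  # only two jobs run at a time on a node


def start_job(running, suspended, category, rng=random):
    """Try to start a job of `category` on one node.

    `running` and `suspended` are lists of category names, mutated in place.
    Returns "started", "started_after_suspend", or "rejected".
    """
    if len(running) < MAX_RUNNING:
        running.append(category)
        return "started"
    # Indices of running jobs with strictly lower priority than the newcomer.
    lower = [i for i, c in enumerate(running) if PRIORITY[c] < PRIORITY[category]]
    if not lower:
        return "rejected"
    # Pick the victim at random so the same VM is not always suspended.
    victim = rng.choice(lower)
    suspended.append(running.pop(victim))
    running.append(category)
    return "started_after_suspend"
```

Passing a seeded `random.Random` instance as `rng` makes the choice of victim reproducible when experimenting with the model.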
Analysis/Reconstruction
[Diagram: analysis/reconstruction nodes arranged in Groups 1 to 5 on a Fast/Slow scale. Each node has four VM pairs on a High Prio/Low Prio scale: MC (vm1/vm2), Low (vm3/vm4), Med (vm5/vm6), High (vm7/vm8). Example: "Reconstruction Job: wants group == 3", Med. Priority (vm5/vm6).]
● Low priority VMs suspended.
● No preemption.
● Will start a job if CPU is free or is of higher priority.
Issues We've Had to Deal With
● Tune parameters to alleviate scalability problems.
– MATCH_TIMEOUT
– MAX_CLAIM_ALIVES_MISSED
● Panasas (proprietary file system) creates kernel threads with whitespace in the process name. This broke an fscanf in procapi.C; Panasas fixed the bug.
● High-volume users can dominate a pool; partially solved with PREEMPTION_REQUIREMENTS.
Issues We’ve Had to Deal With, cont.
● DAGMan problems (latency, termination): switched from DAGMan to plain Condor.
● Created own ClassAds and JobAds to create batch queues and handy management tools (i.e., our version of condor_off).
● Modified Condorview to meet our accounting & monitoring requirements.
Issues Not Yet Resolved
● Need job ClassAd which gives user's primary group --> better control over cluster usage.
● Transfer output files for debugging when job is evicted.
● Need option to force the schedd to release its claim after each job.
● Allow schedd to set a mandatory periodic_remove policy to avoid manual cleanup.
Issues Not Yet Resolved, cont.
● Shadow seems to make a large number of NIS calls. Possible problem with caching address shadows in vanilla universe?
● Need Kerberos support to comply with security mandates.
● Interested in Condor on Demand (COD), but lack of functionality prevents more usage.
● Need more (and more effective) cluster management tools (condor_off works?).
Near-Term Plans & Summary
● Waiting for 6.8.x series (late 2005?) to upgrade.
● Scalability concerns as usage rises.
● High availability more critical as usage rises.
● Integration of BNL Condor pools with external pools, but concerned about security.
● Need some functionalities listed above for a meaningful upgrade and to improve cluster management capability.