1www.cs.wisc.edu/condor
The Roadmap to New Releases
Todd Tannenbaum
Department of Computer Sciences
University of Wisconsin-Madison
http://www.cs.wisc.edu/condor
Stable vs. Development Series
› Much like the Linux kernel, Condor provides two different releases at any time:
    Stable series
    Development series
› Allows Condor to be both a research project and a production-ready system
Stable series
› Series number in version is even (e.g. 6.2.0)
› Releases are heavily tested
› Only bug fixes and ports to new platforms are added on a stable series
Stable series (cont.)
› A given stable release is always compatible with other releases from the same series
› Recommended for production pools
Development Series
› Series number in the version is odd (e.g. 6.1.17, 6.3.1)
› New features and new technology are added frequently
› Versions from the same development series are not always compatible with each other
Development Series (cont.)
› Releases are not as heavily tested
› Generally not recommended for production pools
    … unless new features are required
    … unless we recommend otherwise :^)
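The even/odd numbering rule for the two series can be checked mechanically. The following is a minimal Python sketch; the function name `series_kind` is hypothetical and is not part of Condor.

```python
def series_kind(version: str) -> str:
    """Classify a Condor version string such as "6.3.1" by its series number."""
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected MAJOR.SERIES[.RELEASE], got {version!r}")
    series = int(parts[1])  # the second component is the series number
    # Even series numbers are stable, odd ones are development.
    return "stable" if series % 2 == 0 else "development"


# The example versions quoted on the slides.
for v in ("6.2.0", "6.1.17", "6.3.1"):
    print(v, "->", series_kind(v))
```

By this rule, 6.3.2 belongs to a development series and 6.4.0 to a stable one.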
Where is Condor Today?
› Version 6.3.2 is being released ASAP; this is the v6.4.0 release candidate.
› We expect version 6.4.0 to be released by the end of March.
What’s new for Condor v6.4.0?
New Ports in 6.4.0
› Full support (with checkpointing and remote system calls):
    RedHat 7.x (Linux 2.4.x kernel + glibc 2.2.x)
New Ports in 6.4.0 (cont.)
› "Clipped" support (no checkpointing, PVM, or remote system calls, but all other functionality is available):
    Windows 2000
    Mac OS X
Secure Communication
› Secure network communication
    Strong user authentication
        • Multiple methods supported: Kerberos, X509, NT LanMan, …
    Encryption
    Integrity
› Authorization based on host or user
New Job Universes
› MPI Universe
    Launch MPI jobs linked with MPICH library
› Globus Universe
    Faster, more reliable, better integrated
› Java Universe
Java Universe Job
    universe = java
    executable = Main.class
    jar_files = MyLibrary.jar
    input = infile
    output = outfile
    arguments = Main 1 2 3
    queue
condor_submit
Why not use Vanilla Universe for Java jobs?
› Java Universe provides more than just inserting “java” at the start of the execute line
    Knows which machines have a JVM installed
    Knows the location, version, and performance of JVM on each machine
    Provides more information about Java job completion than just JVM exit code
        • Program runs in a Java wrapper, allowing Condor to report Java exceptions, etc.
Java support, cont.
condor_status -java
Name           JavaVendor   Ver    State    Activity  LoadAv  Mem
aish.cs.wisc.  Sun Microsy  1.2.2  Owner    Idle      0.000   249
anfrom.cs.wis  Sun Microsy  1.2.2  Owner    Idle      0.030   249
babe.cs.wisc.  Sun Microsy  1.2.2  Claimed  Busy      1.120   123
...
Condor File Transfer
› Condor will transfer job files from the submit machine to the execute machine
› Files to send and/or receive specified at submit time
› Transfer is atomic
    All files are transferred, or the transfer fails
› In v6.2, appeared only in Condor for Windows
File Transfer, cont.
› Example:
    transfer_input_files = x, y, z …
    transfer_output_files = a, b, c …
    transfer_files = [ ALWAYS | ONEXIT ]
› Note: Condor can automatically figure out output files
    Default: Send back any new/changed files
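One way to picture the "send back any new/changed files" default is a before/after snapshot of the job's working directory. The Python sketch below is only an illustration under that assumption; the slides do not say how Condor detects changes, and the helper names `snapshot` and `new_or_changed` are hypothetical.

```python
import os
import tempfile


def snapshot(directory):
    """Map each file name in `directory` to its (size, mtime_ns) pair."""
    result = {}
    for entry in os.scandir(directory):
        if entry.is_file():
            st = entry.stat()
            result[entry.name] = (st.st_size, st.st_mtime_ns)
    return result


def new_or_changed(before, after):
    """Return names absent from `before` or whose size/mtime differ."""
    return sorted(name for name, sig in after.items() if before.get(name) != sig)


with tempfile.TemporaryDirectory() as sandbox:
    # An input file that exists before the job starts.
    with open(os.path.join(sandbox, "x"), "w") as f:
        f.write("input")
    before = snapshot(sandbox)
    # Stand-in for the job: it creates one new file and leaves the input alone.
    with open(os.path.join(sandbox, "a"), "w") as f:
        f.write("result")
    changed = new_or_changed(before, snapshot(sandbox))
    print(changed)
```

Only the file created after the first snapshot is reported, so the untouched input would not be sent back.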
Remote I/O Socket
› Job can request that the condor_starter process on the execute machine create a Remote I/O Socket
› Used for online access to files on the submit machine, without Standard Universe.
    Use in Vanilla, Java, …
› Libraries provided for Java and for C, e.g.:
    Java: FileInputStream -> ChirpInputStream
    C: open() -> chirp_open()
[Figure: Secure Remote I/O. Submission Site: shadow, I/O Server, Home File System. Execution Site: starter, I/O Proxy, Fork, Job, I/O Library. Links: Secure Remote I/O, Local I/O (Chirp), Local System Calls.]
Job Policy Expressions
› User can supply job policy expressions in the submit file.
› Can be used to describe a successful run.
    on_exit_remove = <expression>
    on_exit_hold = <expression>
    periodic_remove = <expression>
    periodic_hold = <expression>
Job Policy Examples
› Do not remove if exits with a signal:
    on_exit_remove = ExitBySignal == False
› Place on hold if exits with nonzero status or ran for less than an hour:
    on_exit_hold = ((ExitBySignal == False) && (ExitCode != 0)) || ((ServerStartTime - JobStartDate) < 3600)
› Place on hold if job has spent more than 50% of its time suspended:
    periodic_hold = CumulativeSuspensionTime > (RemoteWallClockTime / 2.0)
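To see how these three policies behave on a concrete job, they can be mirrored as plain Python predicates over a dictionary of job attributes. This is a sketch only, not how Condor evaluates ClassAd expressions. It reads "nonzero status" as the `ExitCode` attribute, which is an assumption, and the sample attribute values are made up for illustration.

```python
def on_exit_remove(job):
    # Leave the queue only if the job did not exit by a signal.
    return job["ExitBySignal"] is False


def on_exit_hold(job):
    # Hold if the job exited normally with a nonzero status
    # (assumption: the status is read from ExitCode) ...
    nonzero_exit = (not job["ExitBySignal"]) and job["ExitCode"] != 0
    # ... or if it ran for less than an hour (3600 seconds).
    short_run = (job["ServerStartTime"] - job["JobStartDate"]) < 3600
    return nonzero_exit or short_run


def periodic_hold(job):
    # Hold if more than 50% of the wall-clock time was spent suspended.
    return job["CumulativeSuspensionTime"] > job["RemoteWallClockTime"] / 2.0


# Hypothetical job: exited with status 1 after two hours, suspended for 4000 s.
job = {
    "ExitBySignal": False,
    "ExitCode": 1,
    "JobStartDate": 1_000_000,
    "ServerStartTime": 1_007_200,
    "CumulativeSuspensionTime": 4000,
    "RemoteWallClockTime": 7200,
}

print(on_exit_remove(job), on_exit_hold(job), periodic_hold(job))
```

For this sample job all three predicates are true: it did not die from a signal, it exited with a nonzero status, and it was suspended for more than half of its 7200 seconds of wall-clock time.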
Firewall Support
› Port Restrictions
    In condor_config file can specify:
        LOWPORT = x
        HIGHPORT = y
    All dynamic ports will be between x and y inclusive
› Condor + Firewalls/Private Networks:
    Who: Se-Chang Son
    Time: 9am-12pm Weds
    Where: rm 3387
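The effect of the LOWPORT/HIGHPORT restriction described above can be illustrated with a small Python sketch that binds a socket only within an inclusive port range. This is not Condor's code; the slides do not describe how Condor selects a port inside the range, and the function name `bind_in_range` and the range 42000 to 42100 are assumptions chosen for the example.

```python
import socket


def bind_in_range(low, high, host="127.0.0.1"):
    """Return a TCP socket bound to the first free port in [low, high]."""
    for port in range(low, high + 1):  # inclusive on both ends
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock
        except OSError:
            sock.close()  # port in use or not permitted; try the next one
    raise OSError(f"no free port between {low} and {high}")


# Hypothetical range standing in for LOWPORT = x and HIGHPORT = y.
sock = bind_in_range(42000, 42100)
bound_port = sock.getsockname()[1]
print(bound_port)
sock.close()
```

A firewall then only needs to admit traffic on that one range rather than on every ephemeral port.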
Condor on Windows
› On both NT and Win2k
› New universes added: MPI, Java, Scheduler (and Globus in the works!)
› DAGMan ported
› CondorView ported
› Run shadow + DAGMan as the user
    Allows submission from directories on shared filesystems
And more…
› Unix Man pages
› Fetch/consolidate log files remotely
› ClassAd chaining
› Many DAGMan improvements
› Bug fixes, etc…
What’s Next? Future Directions
› Increased focus on standalone tools built with Condor Technology
    DAGMan
    NeST
    PFS
    HawkEye
    Condor-G
    …
What’s Next?
› Big Item: More focus on being a service provider than just an end-user tool:
    Developer APIs / libraries
    SOAP access to services
    XML representations of user logs, ClassAds, accounting info, etc.
More what’s next…
› Condor on Windows
    Increased support from Microsoft Research
    Remote I/O
    Complete Shared Filesystem support
    Condor-G
› MPI Scheduling Improvements
More what’s next…
› New version of ClassAds into Condor
    Conditionals !!
        • if/then/else
    Aggregates (lists, nested classads)
    Built-in functions
        • String operations, pattern matching, time operators, unit conversions
    Clean implementations in C++ and Java
    ClassAd collections
More what’s next…
› Re-write of the condor_schedd
    Performance enhancements and lowered resource requirements (particularly RAM)
› Re-write of the checkpoint server
    Add secure communication
    NEST technology infusion
    Enhanced support for multiple servers
    Store meta-data along with checkpoint files
Thank you for coming to Paradyn/Condor Week!