REI – Recipe Execution Infrastructure
Jens Knudstrup/2005-02-08
Purpose of REI
Main Objectives of REI
- Provide the services of a parallel Batch Queue System.
- Make it easy to control and monitor complicated batches with job synchronization.
- Make it possible to distribute tasks (processing load) over a cluster of CPUs/nodes.
Not Provided in the Present Implementation
- Services for distributing data within the cluster to the nodes doing the processing (data sharing/distribution is done via a common storage area/file server).
- Services for resource management and advertising.
- Services for explicit load balancing (optimized job distribution).
- Special features for GRID appliance.
Main Features
Main Features of REI
- Implemented in C++ (in-house implementation from scratch).
- Uses RDBMS for information sharing and task synchronization.
- Execution of shell commands or native execution of CPL Recipes (no generic interfacing to shared object files).
- Pworker task execution daemon provided, which can take one of three roles:
  - Process Master Commands (Master Pworker).
  - Process Standard Commands (Standard Pworker).
  - Process Master and Standard Commands.
- Command line utilities provided to add/remove/monitor commands and to control Pworkers.
- API provided for implementing Master Command Libraries (also referred to as Recipe Planners) and Standard Command Libraries.
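As an illustration of the three Pworker roles listed above, here is a minimal Python sketch. The `Role` type and the `accepts` helper are hypothetical names introduced for this example; they are not part of the REI API, which is implemented in C++.

```python
from enum import Flag, auto


class Role(Flag):
    """Hypothetical encoding of the three Pworker roles (not the REI API)."""
    MASTER = auto()             # processes Master Commands only
    STANDARD = auto()           # processes Standard Commands only
    BOTH = MASTER | STANDARD    # processes Master and Standard Commands


def accepts(role: Role, command_kind: Role) -> bool:
    """Return True if a Pworker with `role` may execute a command of `command_kind`."""
    return bool(role & command_kind)


if __name__ == "__main__":
    for role in (Role.MASTER, Role.STANDARD, Role.BOTH):
        print(role.name, accepts(role, Role.MASTER), accepts(role, Role.STANDARD))
```

Modelling the roles as bit flags makes the combined role a plain union of the other two, so a single check covers all three cases.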
Command Line Interface
Interaction with REI
- Command line interface provided:
  - addcmd: Add a Master Command to the Master Command Queue (handles ABs and SOFs, which are not part of the core of REI).
  - cmdstat: Query the status of all commands or of a specific command. ‘Tail’ feature provided.
  - rmcmd: Remove information for one command or all commands from the Command Queues (clean up).
  - pworker: The Pworker daemon.
  - stopworker: Stop one specific Pworker or all Pworkers running.
  - listworkers: List Pworkers running in the system.
  - rmworker: Remove a Pworker (make it exit) or all Pworkers.
- The command line tools are not part of the core REI system; they are convenience features built on the REI libraries.
- Commands can also be added to the DB directly via the REI libraries, i.e., the operation of REI can be controlled and monitored programmatically.
Command Lifecycle
Command States
- Each submitted command is in one of 7 states indicating its current status:
Command Transitions
Interprocess Synchronization
Interprocess Synchronization/Information Sharing
- Pworkers synchronize themselves via the DB.
- DB also used for exchanging information between processes in the system:
- Tables:
- pworker_registry: Information about Pworkers in the system (ID, node, Master and/or Standard Commands, …).
- pworker_master_command_queue: Contains information for the Master Commands waiting to be executed, under execution, and executed.
- pworker_master_sequencer: Contains information about Master Commands being BLOCKED.
- pworker_command_queue: Standard Commands waiting to be executed, under execution, and executed.
- pworker_command_sequencer: Used to sequence Standard Commands.
- pworker_log: Log messages from Pworker processes.
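To make the idea of synchronizing workers through a shared DB table concrete, here is a minimal Python sketch using SQLite. Only the table name `pworker_command_queue` comes from the slides; the column layout, the state names `WAITING` and `EXECUTING`, and the `claim_next` helper are assumptions made for illustration, and the RDBMS and schema actually used by REI are not specified here.

```python
import sqlite3

# Hypothetical, simplified schema: only the table name comes from the slides.
SCHEMA = """
CREATE TABLE pworker_command_queue (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    state   TEXT NOT NULL DEFAULT 'WAITING',  -- assumed state name
    worker  TEXT                              -- ID of the claiming Pworker
)
"""


def claim_next(db: sqlite3.Connection, worker_id: str):
    """Claim the oldest waiting command for `worker_id`.

    Returns the command name, or None if nothing could be claimed.
    """
    with db:  # commit on success, roll back on error
        row = db.execute(
            "SELECT id, name FROM pworker_command_queue "
            "WHERE state = 'WAITING' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cmd_id, name = row
        # The state check in the WHERE clause ensures that only one worker
        # wins if two workers selected the same row.
        cur = db.execute(
            "UPDATE pworker_command_queue SET state = 'EXECUTING', worker = ? "
            "WHERE id = ? AND state = 'WAITING'",
            (worker_id, cmd_id),
        )
        return name if cur.rowcount == 1 else None


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany(
        "INSERT INTO pworker_command_queue (name) VALUES (?)",
        [("split_1",), ("split_2",)],
    )
    db.commit()
    print(claim_next(db, "pworker-A"))
```

The conditional UPDATE is the part doing the synchronization: the database, not the workers, decides which process gets a given command.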
OmegaCam Demo Science Reduction Cascade/1
OmegaCam Science Demo Cascade – Example
- Used adapted WFI frames (8 extensions).
- Provided:
  - OCAM REI Recipe Planner Plug-In to schedule tasks for the recipes (general Recipe Planner made for all Recipes).
  - REI Standard Command Library Plug-Ins to do FITS file splitting and joining.
  - Cascade Scheduler Script to submit Master Commands and to create the SOFs needed.
- 6 Recipes executed during the cascade (6 Master Commands issued to REI).
- Total number of commands scheduled within REI for the cascade: ~100.
- Total number of intermediate/temporary and final data products: ~200.
- Number of SOFs involved: 10.
OmegaCam Demo Science Reduction Cascade/2
Setting up Cascade – Example:
$ addcmd -name ocam_reduce_sci_W_2005-02-08T16:29:05 -bg -waitfor ocam_reduce_std_W_2005-02-08T16:29:05 -recipe ocam_reduce_sci /data/ocam/sof/ocam_reduce_sci_W_2005-02-08T16:29:05.sof -out /raid/data/ocam/products/ocam_reduce_sci_W_2005-02-08T16:29:05
$ addcmd -name ocam_reduce_std_W_2005-02-08T16:29:05 -bg -waitfor ocam_mflat_W_2005-02-08T16:29:05 -trigger ocam_reduce_std_W_2005-02-08T16:29:05 -recipe ocam_reduce_std /raid/data/ocam/sof/ocam_reduce_std_W_2005-02-08T16:29:05.sof -out /raid/data/ocam/products/ocam_reduce_std_W_2005-02-08T16:29:05
$ addcmd -name ocam_mflat_W_2005-02-08T16:29:05 -bg -waitfor ocam_mtwilight_W_2005-02-08T16:29:05 -trigger ocam_mflat_W_2005-02-08T16:29:05 -recipe ocam_mflat /raid/data/ocam/sof/ocam_mflat_W_2005-02-08T16:29:05.sof -out /raid/data/ocam/products/ocam_mflat_W_2005-02-08T16:29:05
…
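The addcmd calls above chain commands through `-waitfor` and `-trigger`. The following Python sketch shows one way to read that chain. It assumes, based only on these examples, that `-trigger` names an event emitted when a command completes and that `-waitfor` names the event a command waits for; it is not the REI scheduler. Command names are shortened (timestamps dropped), and a `-waitfor` whose emitter is not in the list is treated as satisfied outside the excerpt.

```python
from graphlib import TopologicalSorter

# Each entry mirrors the -name / -waitfor / -trigger options of one addcmd call.
commands = [
    {"name": "ocam_reduce_sci", "waitfor": "ocam_reduce_std", "trigger": None},
    {"name": "ocam_reduce_std", "waitfor": "ocam_mflat", "trigger": "ocam_reduce_std"},
    {"name": "ocam_mflat", "waitfor": "ocam_mtwilight", "trigger": "ocam_mflat"},
]


def execution_order(cmds):
    """Order commands so each runs after the command emitting its -waitfor trigger."""
    # Map each trigger name to the command that emits it.
    emitter = {c["trigger"]: c["name"] for c in cmds if c["trigger"]}
    graph = {}
    for c in cmds:
        dep = emitter.get(c["waitfor"])  # None if emitted outside this list
        graph[c["name"]] = {dep} if dep else set()
    # TopologicalSorter takes a mapping of node -> predecessors.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(commands))
```

Under these assumptions, submission order does not matter: the commands can be added in reverse, as in the example, and the dependencies alone fix the order in which they run.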
Task Synchronization
[Figure: task synchronization for two Master Commands. Boxes, in order: Master, 4 × Split, 8 × BIAS, Join; then Master, 4 × Split, 8 × DOME, Join, Compl.]
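The Split, process, Join pattern named in the diagram can be sketched in Python as follows. This is only an illustration: threads stand in for Pworkers on cluster nodes, a list of numbers stands in for a multi-extension frame, and the `split`, `process`, `join`, and `run_master` names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(frame):
    """Split a multi-extension frame into its extensions (here: list items)."""
    return list(frame)


def process(extension):
    """Stand-in for a per-extension recipe; here it just doubles the value."""
    return extension * 2


def join(results):
    """Reassemble the processed extensions into one frame, preserving order."""
    return list(results)


def run_master(frame, workers=4):
    """Fan out over the extensions in parallel, then join once all are done."""
    extensions = split(frame)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order; consuming all of them in join()
        # acts as the synchronization point before the frame is reassembled.
        processed = pool.map(process, extensions)
        return join(processed)


if __name__ == "__main__":
    print(run_master(range(8)))  # 8 extensions, as in the WFI demo frames
```

The join step cannot start producing a complete frame until every extension has been processed, which is the synchronization the diagram depicts.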
Command Scheduling
[Figure: command scheduling for Frame A and Frame B. Each frame has a Split and a Join step; four Recipe boxes are shown.]
DFO Cascading
Controlling REI – DFO Environment
- Already used in operation by DFO for some time.
- DFO uses REI to control the scheduling of a UNIX shell script, which itself controls the execution of the recipes (internally calling esorex).
- DFO uses parallelism at frame level; there is no parallelism within the processing of each frame.
- REI is used as a queue system: jobs are submitted, and their scheduling and execution are carried out by REI.
- Example addcmd in the DFO environment:
$ addcmd -name SINFO.2004-08-21T20:25:28.895_tpl.ab -bg -trigger mflat_SINFO.2004-08-21T20:25:28.895_tpl.ab -exe processAB -a SINFO.2004-08-21T20:25:28.895_tpl.ab
$ addcmd -name SINFO.2004-08-21T19:55:07.961_tpl.ab -bg -trigger mwave_SINFO.2004-08-21T19:55:07.961_tpl.ab -waitfor mflat_SINFO.2004-08-21T20:25:28.895_tpl.ab -exe processAB -a SINFO.2004-08-21T19:55:07.961_tpl.ab
Using REI
How to Integrate a Pipeline in REI (Simplified …)
- Decide how to execute the recipes:
  1. Natively, in the form of CPL Recipes.
  2. Invoke the recipe library methods/functions from within Standard Commands.
  3. Execute via jacket scripts/applications encapsulating the recipe.
- Define the necessary/desirable level of parallelism.
- Define execution plans for the various cascades.
- Implement a Recipe Planner, if necessary, to do the internal coordination of the command scheduling (+ producing data for the Standard Commands).
- Implement a Standard Command Library with special commands that should execute internally within the REI environment (if required).
- Implement external control scripts to submit Master Commands, defining dependencies and providing data for the command execution if necessary.
- Decide the architecture of the processing cluster (number of Master Pworkers, Pworkers, CPUs, nodes, amount of memory per CPU, …).
- Start up the Pworkers, defining their proper role and referring to the Command Plug-in Libraries provided (if any) and/or possible CPL Recipe Plug-in Libraries.