AMH001 (thesis.ppt - 04/16/03)
REMOTE++: A Tool for Automatic Remote
Distribution of Programs on Windows Computers
Ashley Hopkins
Department of Computer Science and Engineering
University of South Florida
Tampa, Florida 33620
I wish to thank Dr. Kenneth Christensen for his encouragement, his enthusiasm, and his support in writing this thesis.
I also wish to thank my committee member Zornitza Genova
Prodanoff for taking the time to read this thesis and provide valuable feedback.
AMH002
Acknowledgements
• Introduction – remote distribution
• Description of remote distribution methods
• Design of REMOTE++
• Evaluation of REMOTE++
• Summary and future work
AMH003
Topics
• Two key issues addressed by remote distribution
1. Simulation programs require significant time to execute
   – Many require multiple runs to complete an experiment
2. Many computer resources are underutilized
AMH004
Introduction
• Parallelization of programs reduces overall execution time
• Two types of parallelization
1. Space-based parallelization
   – Addresses programs that can be broken down easily
2. Time-based parallelization
   – Addresses programs that require multiple executions
      » Many simulations fit this category
      » REMOTE++ implements time-based parallelization
AMH005
Introduction continued
• Remote Distribution Network
AMH006
Introduction continued
[Figure: a master PC connected through a network to five remote machines]
• Remote distribution of programs
   – Enables execution of independent programs in parallel
   – Harnesses the idle CPU cycles of remote machines
   – Reduces the overall execution time of experiments
AMH007
Introduction continued
AMH008
Introduction continued
• Requirements of Distribution Tools
1) Distribution must be automatic (no manual interaction)
2) Tool must be simple for easy maintenance and modification
3) Output files must be available on the master PC
4) A single process must be distributed to each remote machine
5) Once a job completes, the next job must be sent
6) Each job must be executed only once
7) The failure of a job to complete must be detected
8) The failure of a remote host must be detected
9) Error messages must be displayed at the master PC
10) A log file should be kept
• Methods for remote distribution
   – Remote shell (rsh) and remote execute (rexec) commands
   – Cluster systems
      » Beowulf
   – Grid computing
      » SETI@home
   – Unix-based remote distribution tools
      » Condor
   – Original REMOTE tool developed by Dr. Christensen
      » REMOTE++ is built upon this tool
AMH009
Remote Distribution Methods
• Drawbacks of current tools
   – Primarily designed for Unix platforms
   – Many are large or complex
   – Many require extensive installation and maintenance
AMH010
Remote Distribution Methods continued
• Key challenge is…
Develop a Windows-based remote distribution tool that is easy to use, maintain, and modify.
• Must be able to reduce overall execution time
   – Overhead in distribution of processes must be overcome
• Must be able to execute many different programs
   – No modification to the programs
   – Various input and output methods allowed
AMH011
Remote Distribution Methods continued
AMH012
Description of REMOTE++
• REMOTE++ is built upon REMOTE
   – Sockets interface replaced by rcp/rsh commands
   – Programs read/write to standard input/output
   – An invalid job is detected
   – An invalid host is detected
• REMOTE++ also has drawbacks
   – Each remote host is required to run an rsh/rcp daemon
   – Status feature of REMOTE is not available
   – Security concerns with remote shell commands
AMH013
Description of REMOTE++ continued
• Set-up of REMOTE++
1) Each client must have a remote shell/remote copy daemon.
2) REMOTE++ must be loaded on the master machine.
3) A joblist.txt file must contain a list of jobs to be executed.
4) A hostlist.txt file must contain a list of the hostnames of all remote machines.
5) A status.txt file must be created as a log file containing the success or failure of each job and each remote host.
AMH014
Description of REMOTE++ continued
• Sample joblist.txt file
file mm1.exe input1.txt output1.txt
std hello.exe input2.txt output2.txt
file mm1.exe input3.txt output3.txt
• Sample hostlist.txt file
giga2.csee.usf.edu
giga3.csee.usf.edu
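As a minimal sketch of how the two sample files could be read, assuming each job entry holds a mode keyword (file or std) followed by the executable, input file, and output file, and each host entry is a single hostname. Python is used for illustration only; the names `Job`, `parse_joblist`, and `parse_hostlist` are hypothetical and are not taken from REMOTE++.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    mode: str         # "file" (classic) or "std" (new)
    executable: str
    input_file: str
    output_file: str


def parse_joblist(text: str) -> List[Job]:
    """Parse joblist.txt content: one 'mode exe input output' entry per line."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4 or fields[0] not in ("file", "std"):
            raise ValueError(f"malformed job line: {line!r}")
        jobs.append(Job(*fields))
    return jobs


def parse_hostlist(text: str) -> List[str]:
    """Parse hostlist.txt content: one hostname per line, blank lines ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Rejecting malformed lines up front keeps a bad entry from reaching a remote host; the real tool may handle such entries differently.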
AMH015
Description of REMOTE++ continued
• Sample status.txt file
Mode is classic.
Executable file mm1.exe found
Input file input1.txt found
Output file output1.txt found

Mode is new.
Executable file hello.exe found
Input file input2.txt found
Output file output2.txt found

Mode is classic.
Input file input3.txt was not found
Output file output3.txt found

giga2.csee.usf.edu is a valid host
giga3.csee.usf.edu is a valid host
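The job checks recorded in status.txt could be produced by something like the following sketch. It is an illustration only: the function name `validate_job` is hypothetical, and the rule that only the executable and the input file must exist for a job to be valid is an assumption, not something the slides state.

```python
import os
from typing import List, Tuple

# Mode keywords from joblist.txt mapped to the labels used in status.txt.
MODE_LABELS = {"file": "classic", "std": "new"}


def validate_job(mode: str, executable: str, input_file: str,
                 output_file: str, directory: str = ".") -> Tuple[bool, List[str]]:
    """Return (is_valid, log_lines) for one job, using status.txt-style wording."""
    lines = [f"Mode is {MODE_LABELS.get(mode, 'unknown')}."]
    ok = mode in MODE_LABELS
    # Assumption: a missing executable or input file invalidates the job,
    # while a missing output file is only logged.
    checks = (("Executable", executable, True),
              ("Input", input_file, True),
              ("Output", output_file, False))
    for kind, name, required in checks:
        if os.path.isfile(os.path.join(directory, name)):
            lines.append(f"{kind} file {name} found")
        else:
            lines.append(f"{kind} file {name} was not found")
            if required:
                ok = False
    return ok, lines
```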
AMH016
Description of REMOTE++ continued
• Operation of REMOTE++
1) The existence of each job in joblist.txt is validated.
2) Threads are used to assign a job to each host in the host list.
3) The executable is remote copied (rcp) to the remote host.
   » An rcp failure marks the host as unavailable, and the job is reassigned
4) The job is executed using a remote shell (rsh) command.
5) When the job finishes, the host is assigned another job, until all jobs in joblist.txt are complete.
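The scheduling described in the operation steps can be sketched as one worker thread per host pulling from a shared job queue. This is a simplified illustration in Python, not the REMOTE++ source: `distribute` and `run_job` are hypothetical names, and `run_job(host, job)` stands in for the rcp and rsh steps, returning False when the host fails.

```python
import queue
import threading
from typing import Callable, Dict, List


def distribute(jobs: List[str], hosts: List[str],
               run_job: Callable[[str, str], bool]) -> Dict[str, str]:
    """Run each job once, one job at a time per host; return job -> host."""
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)
    done: Dict[str, str] = {}
    failed = set()
    lock = threading.Lock()

    def worker(host: str) -> None:
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return  # nothing left for this host to do
            if run_job(host, job):
                with lock:
                    done[job] = host
            else:
                with lock:
                    failed.add(host)  # host is treated as unavailable
                pending.put(job)      # job goes back for another host
                return

    alive = list(hosts)
    # Repeat in rounds so a job returned by a failed host is still picked up
    # even if the other workers had already found the queue empty.
    while alive and not pending.empty():
        threads = [threading.Thread(target=worker, args=(h,)) for h in alive]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        alive = [h for h in alive if h not in failed]
    return done
```

A job is recorded only after it succeeds and is re-queued only after a failure, so no job completes twice. Jobs missing from the returned mapping are those left over once every host had failed.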
AMH017
Description of REMOTE++ continued
• Sample execution of REMOTE++
AMH018
Description of REMOTE++ continued
• Two input/output methods are supported by REMOTE++
1) File or “Classic” method
   – Used with programs that read from and write to files
   – Implemented in original REMOTE tool
   – Requires transfer of input and output files
2) Std or “New” method
   – Used with programs that use standard input/output
   – New in REMOTE++ tool
   – Input and output redirected from files
   – No transfer of files required
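The difference between the two methods shows up in which commands the master would need to issue per job. The sketch below only builds command lists and does not run them. It assumes the conventional `rcp source host:dest` and `rsh host command` forms, which the rsh/rcp client actually in use may not accept unchanged; `build_commands` is a hypothetical name, and the slides do not say how a classic-mode program learns its file names.

```python
from typing import List


def build_commands(mode: str, host: str, exe: str,
                   infile: str, outfile: str) -> List[List[str]]:
    """Return the command lines a master might issue for one job (not executed)."""
    copy_exe = ["rcp", exe, f"{host}:{exe}"]
    if mode == "file":
        # Classic: ship the input file, run remotely, then fetch the output file.
        return [
            copy_exe,
            ["rcp", infile, f"{host}:{infile}"],
            ["rsh", host, exe],
            ["rcp", f"{host}:{outfile}", outfile],
        ]
    if mode == "std":
        # New: only the executable is copied. The input and output files stay on
        # the master and would be attached to the rsh process as stdin/stdout.
        return [copy_exe, ["rsh", host, exe]]
    raise ValueError(f"unknown mode: {mode!r}")
```

In this sketch the std method needs two commands per job where the file method needs four, which reflects the "no transfer of files required" point above.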
AMH019
Description of REMOTE++ continued
• The remote shell/remote copy daemon:
1) Vendor version (tested with Denicomp’s rshd)
   – Dependable
   – Cost prohibitive
   – Not open source
2) Free version (by Silviu Marghescu)
   – Free
   – Open source
   – Does not support standard input/output method
   – Not as reliable
AMH020
Evaluation of REMOTE++
• Queuing systems can be modeled using simulation
• Queue simulations must be executed numerous times with varying input to gather statistical information
• A queue simulation was utilized to evaluate the REMOTE++ tool
AMH021
Evaluation of REMOTE++ continued
• A queue is a sequence of customers waiting to receive service
• The following features determine the behavior of a queue:
   a. The distribution of time between arriving customers
   b. The distribution of time to service a customer
   c. The number of servers available to service the customers
   d. The capacity of the queue
   e. The population size of customers
• The queuing discipline determines the order of service
• An M/M/1 queue has the following features:
   1. Markovian (exponentially distributed) inter-arrival of customers
   2. Markovian (exponentially distributed) service times
   3. A single server
   4. An unlimited queue capacity
   5. An infinite customer population
• An M/M/1 queue has FIFO queuing discipline
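A minimal event-driven M/M/1 simulation, written as a Python sketch for illustration. It is not the mm1.exe program used in the evaluation, and the function name, parameters, and seed are all assumptions. It returns the time-averaged number of customers in the system.

```python
import random


def simulate_mm1(arrival_rate: float, service_rate: float,
                 sim_time: float, seed: int = 1) -> float:
    """Return the time-averaged number of customers in an M/M/1 system."""
    rng = random.Random(seed)
    clock = 0.0
    in_system = 0
    area = 0.0  # integral of in_system over simulated time
    next_arrival = rng.expovariate(arrival_rate)
    next_departure = float("inf")  # no customer in service yet
    while True:
        next_event = min(next_arrival, next_departure)
        if next_event > sim_time:
            area += in_system * (sim_time - clock)
            break
        area += in_system * (next_event - clock)
        clock = next_event
        if next_arrival <= next_departure:
            # Arrival: exponentially distributed gap until the next customer.
            in_system += 1
            next_arrival = clock + rng.expovariate(arrival_rate)
            if in_system == 1:
                next_departure = clock + rng.expovariate(service_rate)
        else:
            # Departure: the single server takes the next customer, if any.
            in_system -= 1
            if in_system > 0:
                next_departure = clock + rng.expovariate(service_rate)
            else:
                next_departure = float("inf")
    return area / sim_time
```

For example, `simulate_mm1(0.5, 1.0, 100000.0)` models a utilization of 0.5. The estimate can be compared with the theoretical M/M/1 mean number in the system, ρ/(1 − ρ).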
AMH022
Evaluation of REMOTE++ continued
[Figure: M/M/1 queue showing arrivals, the queue, the single server, and departures]
AMH023
Evaluation of REMOTE++ continued
• Evaluated REMOTE++ with an M/M/1 queue simulation
• Performance of an M/M/1 queue is measured by its utilization
   – Utilization (ρ) is the fraction of the time the system is busy
   – Utilization is the ratio of the arrival rate to the service rate
   – The length (L) of the queue depends on the utilization:
L = ρ / (1 − ρ)
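To make the dependence of L on utilization concrete, here is a short worked sketch assuming the standard M/M/1 result L = ρ / (1 − ρ). The function name `mm1_length` and the sample utilizations are chosen for illustration.

```python
def mm1_length(utilization: float) -> float:
    """Theoretical mean number of customers in an M/M/1 system."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1.0 - utilization)


# Print L for a few utilizations to show how sharply it rises near 100%.
for rho in (0.5, 0.9, 0.99, 0.995):
    print(f"rho = {rho:5.3f}  L = {mm1_length(rho):7.2f}")
```

L is 1 at ρ = 0.5, 9 at ρ = 0.9, and 199 at ρ = 0.995, so small increases in utilization near 100% produce very large increases in queue length.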
AMH024
Evaluation of REMOTE++ continued
• Goal of Evaluation…
Determine the relationship between the utilization and the simulation run time needed for the mean queue length to fall within a given percentage of the theoretical length
• At the same time… Evaluate the reduction in execution time when executing the simulation with REMOTE++ on five machines
AMH025
Evaluation of REMOTE++ continued
• M/M/1 queue simulation time was evaluated for...
   – Utilization from 1% to 99.5%
   – Length within 10% of the theoretical length
   – Statistical mean of 10 executions at each interval
AMH026
Evaluation of REMOTE++ continued
• As the target utilization approaches 100%, the simulation time of the M/M/1 queue grows increasingly long.
[Figure: simulation time (0 to 2,000,000) versus target utilization (90 to 100)]
AMH027
Evaluation of REMOTE++ continued
• Simulation time grows slightly faster than a sixth-order polynomial
[Figure: simulation time (0 to 2,000,000) versus target utilization (90 to 100)]
[Figure: bar chart of execution time in minutes (0 to 60) for a single machine, actual on five machines, and projected on five machines]
AMH028
Evaluation of REMOTE++ continued
• The M/M/1 queue execution...
   – Projected a fivefold speedup on five machines
   – Achieved a speedup of about two and a half times on five machines
• Seven seconds of overhead per job
• At low utilization, jobs executed in several seconds
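A rough model shows how a fixed per-job overhead can cut the speedup. It assumes identical jobs divided evenly among the hosts and no overhead in the single-machine run; both are simplifying assumptions, and `projected_speedup` is a hypothetical name. The seven-second job time in the usage note is picked only to illustrate "several seconds" and is not a measured value.

```python
def projected_speedup(hosts: int, job_seconds: float,
                      overhead_seconds: float) -> float:
    """Speedup over one machine when every distributed job pays a fixed overhead."""
    if hosts < 1 or job_seconds <= 0 or overhead_seconds < 0:
        raise ValueError("invalid arguments")
    # Single machine: n * job_seconds in total.
    # Distributed:    n * (job_seconds + overhead_seconds) / hosts in total.
    return hosts * job_seconds / (job_seconds + overhead_seconds)
```

With five hosts and no overhead the model gives 5.0. With seven seconds of overhead on jobs that themselves take about seven seconds, it gives 2.5, in line with the speedup observed. Longer-running jobs spread the same overhead more thinly and move the result back toward 5.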
AMH029
Summary and future work
• Remote distribution can be used to reduce execution time.
   – Existing systems are Unix-based and complex
   – Need a simple Windows-based tool
• REMOTE++ improves upon REMOTE
   – Complex sockets interface replaced by simple rsh/rcp script
   – Enables wider variety of programs to be executed
   – Able to recover from invalid jobs and hosts
AMH030
Summary and future work
• Improve free remote shell daemon
   – Support std or “new” input/output method
• Reduce distribution overhead to further cut execution time
• Support more and mixed input/output methods
• Implement security in REMOTE++
   – Currently relies on rsh daemon for security
• Implement status feature similar to original REMOTE tool
AMH031
Questions?
Ashley Hopkins
Department of Computer Science and Engineering
University of South Florida
Tampa, Florida
[email protected]
REMOTE++ soon available at:
• http://www.csee.usf.edu/~amhopki2/research
• http://www.csee.usf.edu/~christen/tools/toolpage.html
Thank You