The HTCondor Perl module automates the submission and monitoring of HTCondor jobs, as well as the administration of HTCondor. Its most common use is monitoring HTCondor jobs. The module can also be used as a metascheduler for the submission of HTCondor jobs.
The HTCondor Perl module provides several subroutines. Some are used as callbacks: an event, such as a job being evicted or exiting, triggers the execution of a specific subroutine. Others denote actions to be taken by Perl, such as submitting a job or vacating a machine. Some of these action subroutines take other subroutines (the callbacks) as arguments.
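To illustrate the callback style in isolation, the following is a minimal, self-contained Perl sketch that does not use HTCondor at all. The subroutines register_callback and trigger_event and the event name exit_success are hypothetical; they only imitate how an action subroutine can accept a code reference and how a callback later receives named parameters such as cluster and job. This is not how the Condor module is implemented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical event table mapping an event name to a code reference.
# This imitates the callback style only; it is not Condor module code.
my %callbacks;

# An "action" subroutine that takes another subroutine as an argument.
sub register_callback {
    my ($event, $code_ref) = @_;
    $callbacks{$event} = $code_ref;
    return;
}

# Fire an event, passing named parameters to the registered callback
# (if any), in the same key/value style the examples unpack with
# %parameters = @_. Returns undef when no callback is registered.
sub trigger_event {
    my ($event, %parameters) = @_;
    my $code_ref = $callbacks{$event} or return;
    return $code_ref->(%parameters);
}

# A callback written in the same style as the examples below.
my $on_exit = sub {
    my %parameters = @_;
    return "Job $parameters{'cluster'}.$parameters{'job'} exited";
};

register_callback('exit_success', $on_exit);
my $message = trigger_event('exit_success', cluster => 42, job => 0);
print "$message\n";    # prints: Job 42.0 exited
```

The same pattern of passing a subroutine reference to a registration call appears in the examples as calls such as &Condor::RegisterExitSuccess($normal).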
The following example uses the HTCondor Perl module. It uses the submit description file mycmdfile.cmd to specify the submission of a job. When the job is matched with a machine and begins to execute, a callback subroutine (called execute) sends a condor_vacate signal to the job and increments a counter that tracks the number of times this callback executes. A second callback (called evicted) counts the number of times the job is evicted before it completes. After the job completes, the termination callback (called normal) prints a summary of what happened.
#!/usr/bin/perl
use Condor;

$CMD_FILE = 'mycmdfile.cmd';
$evicts = 0;
$vacates = 0;

# A subroutine that will be used as the normal execution callback
$normal = sub {
    %parameters = @_;
    $cluster = $parameters{'cluster'};
    $job = $parameters{'job'};

    print "Job $cluster.$job exited normally without errors.\n";
    print "Job was vacated $vacates times and evicted $evicts times\n";
    exit(0);
};

# Callback used when the job is evicted: count it and reschedule
$evicted = sub {
    %parameters = @_;
    $cluster = $parameters{'cluster'};
    $job = $parameters{'job'};

    print "Job $cluster.$job was evicted.\n";
    $evicts++;
    &Condor::Reschedule();
};

# Callback used when the job starts executing: vacate the machine
$execute = sub {
    %parameters = @_;
    $cluster = $parameters{'cluster'};
    $job = $parameters{'job'};
    $host = $parameters{'host'};
    $sinful = $parameters{'sinful'};

    print "Job running on $sinful, vacating...\n";
    &Condor::Vacate($sinful);
    $vacates++;
};

$cluster = Condor::Submit($CMD_FILE);
if ($cluster == 0) {
    # submission failed, so there is nothing to monitor
    print "Could not open $CMD_FILE.\n";
    exit(1);
}

&Condor::RegisterExitSuccess($normal);
&Condor::RegisterEvicted($evicted);
&Condor::RegisterExecute($execute);
&Condor::Monitor($cluster);
&Condor::Wait();
This example program submits the submit description file mycmdfile.cmd and attempts to vacate any machine that the job runs on. The termination handler then prints a summary of what happened.
A second example Perl script performs metascheduling of two HTCondor jobs: it submits a second job only if the first job completes successfully.
#!/s/std/bin/perl

# tell Perl where to find the HTCondor library
use lib '/unsup/condor/lib';
# tell Perl to use what it finds in the HTCondor library
use Condor;

$SUBMIT_FILE1 = 'Asubmit.cmd';
$SUBMIT_FILE2 = 'Bsubmit.cmd';

# Callback used when first job exits without errors.
$firstOK = sub {
    %parameters = @_;
    $cluster = $parameters{'cluster'};
    $job = $parameters{'job'};

    $cluster = Condor::Submit($SUBMIT_FILE2);
    if ($cluster == 0) {
        print "Could not open $SUBMIT_FILE2.\n";
        exit(1);
    }

    &Condor::RegisterExitSuccess($secondOK);
    &Condor::RegisterExitFailure($secondfails);
    &Condor::Monitor($cluster);
};

# Callback used when first job exits WITH an error.
$firstfails = sub {
    %parameters = @_;
    $cluster = $parameters{'cluster'};
    $job = $parameters{'job'};

    print "The first job, $cluster.$job, failed, exiting with an error.\n";
    exit(0);
};

# Callback used when second job exits without errors.
$secondOK = sub {
    %parameters = @_;
    $cluster = $parameters{'cluster'};
    $job = $parameters{'job'};

    print "The second job, $cluster.$job, successfully completed.\n";
    exit(0);
};

# Callback used when second job exits WITH an error.
$secondfails = sub {
    %parameters = @_;
    $cluster = $parameters{'cluster'};
    $job = $parameters{'job'};

    print "The second job ($cluster.$job) failed.\n";
    exit(0);
};

$cluster = Condor::Submit($SUBMIT_FILE1);
if ($cluster == 0) {
    print "Could not open $SUBMIT_FILE1.\n";
    exit(1);
}

&Condor::RegisterExitSuccess($firstOK);
&Condor::RegisterExitFailure($firstfails);
&Condor::Monitor($cluster);
&Condor::Wait();
The same task could be accomplished with the HTCondor DAGMan metascheduler, with the first job as the parent and the second job as the child. The input file to DAGMan is significantly simpler than this Perl script.
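For comparison, a minimal DAGMan input file for the second example might look like the following. The file name AB.dag and the node names A and B are hypothetical; Asubmit.cmd and Bsubmit.cmd are the submit description files used by the Perl script. DAGMan submits the child node B only after the parent node A completes successfully.

```text
# AB.dag (hypothetical name): run B only after A succeeds
JOB A Asubmit.cmd
JOB B Bsubmit.cmd
PARENT A CHILD B
```

The DAG would be submitted with condor_submit_dag AB.dag.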
A third example using the HTCondor Perl module expands upon the second. Whereas the second example could have been implemented more easily with DAGMan, this third example shows the versatility of Perl as a metascheduler.
In this example, the result produced by the successful completion of the first job is used to decide which job to submit next. This is a very simple example of a branch and bound technique, used to focus the search for a solution to a problem.
#!/s/std/bin/perl

# tell Perl where to find the HTCondor library
use lib '/unsup/condor/lib';
# tell Perl to use what it finds in the HTCondor library
use Condor;

$SUBMIT_FILE1 = 'Asubmit.cmd';
$SUBMIT_FILE2 = 'Bsubmit.cmd';
$SUBMIT_FILE3 = 'Csubmit.cmd';

# Callback used when first job exits without errors.
$firstOK = sub {
    %parameters = @_;
    $cluster = $parameters{'cluster'};
    $job = $parameters{'job'};

    # open output file from first job, and read the result
    if ( -f "A.output" ) {
        open(RESULTFILE, "A.output") or die "Could not open result file.";
        $result = <RESULTFILE>;
        close(RESULTFILE);
        chomp($result);

        # next job to submit is based on output from first job
        if ($result < 100) {
            $cluster = Condor::Submit($SUBMIT_FILE2);
            if ($cluster == 0) {
                print "Could not open $SUBMIT_FILE2.\n";
                exit(1);
            }

            &Condor::RegisterExitSuccess($secondOK);
            &Condor::RegisterExitFailure($secondfails);
            &Condor::Monitor($cluster);
        }
        else {
            $cluster = Condor::Submit($SUBMIT_FILE3);
            if ($cluster == 0) {
                print "Could not open $SUBMIT_FILE3.\n";
                exit(1);
            }

            &Condor::RegisterExitSuccess($thirdOK);
            &Condor::RegisterExitFailure($thirdfails);
            &Condor::Monitor($cluster);
        }
    }
    else {
        print "Results file does not exist.\n";
        exit(1);
    }
};

# Callback used when first job exits WITH an error.
$firstfails = sub {
    %parameters = @_;
    $cluster = $parameters{'cluster'};
    $job = $parameters{'job'};

    print "The first job, $cluster.$job, failed, exiting with an error.\n";
    exit(0);
};

# Callback used when second job exits without errors.
$secondOK = sub {
    %parameters = @_;
    $cluster = $parameters{'cluster'};
    $job = $parameters{'job'};

    print "The second job, $cluster.$job, successfully completed.\n";
    exit(0);
};

# Callback used when third job exits without errors.
$thirdOK = sub {
    %parameters = @_;
    $cluster = $parameters{'cluster'};
    $job = $parameters{'job'};

    print "The third job, $cluster.$job, successfully completed.\n";
    exit(0);
};

# Callback used when second job exits WITH an error.
$secondfails = sub {
    %parameters = @_;
    $cluster = $parameters{'cluster'};
    $job = $parameters{'job'};

    print "The second job ($cluster.$job) failed.\n";
    exit(0);
};

# Callback used when third job exits WITH an error.
$thirdfails = sub {
    %parameters = @_;
    $cluster = $parameters{'cluster'};
    $job = $parameters{'job'};

    print "The third job ($cluster.$job) failed.\n";
    exit(0);
};

$cluster = Condor::Submit($SUBMIT_FILE1);
if ($cluster == 0) {
    print "Could not open $SUBMIT_FILE1.\n";
    exit(1);
}

&Condor::RegisterExitSuccess($firstOK);
&Condor::RegisterExitFailure($firstfails);
&Condor::Monitor($cluster);
&Condor::Wait();
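The branching decision in the third example does not depend on HTCondor, so it can be pulled out of the $firstOK callback and tested on its own. The following is a minimal sketch under that assumption; read_result and choose_next_submit_file are hypothetical helper names, not part of the Condor module, while the file names A.output, Bsubmit.cmd, Csubmit.cmd and the threshold of 100 come from the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first line of the first job's output
# file, without its newline, or undef if the file is missing or empty.
sub read_result {
    my ($path) = @_;
    return undef unless -f $path;
    open(my $fh, '<', $path) or die "Could not open $path: $!";
    my $line = <$fh>;
    close($fh);
    return undef unless defined $line;
    chomp $line;
    return $line;
}

# Hypothetical helper: pick the next submit description file using the
# same threshold (100) and file names as the third example.
sub choose_next_submit_file {
    my ($result) = @_;
    return $result < 100 ? 'Bsubmit.cmd' : 'Csubmit.cmd';
}

my $result = read_result('A.output');
if (defined $result) {
    print 'Next submit description file: ', choose_next_submit_file($result), "\n";
} else {
    print "Results file does not exist.\n";
}
```

Inside $firstOK, the two branches would still register different callbacks ($secondOK and $secondfails, or $thirdOK and $thirdfails), so only the choice of submit description file is factored out here.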