The HTCondor Job Router is an add-on to the condor_schedd that transforms jobs from one type into another according to a configurable policy. This process of transforming the jobs is called job routing.
One example of how the Job Router can be used is for the task of sending excess jobs to one or more remote grid sites. The Job Router can transform the jobs such as vanilla universe jobs into grid universe jobs that use any of the grid types supported by HTCondor. The rate at which jobs are routed can be matched roughly to the rate at which the site is able to start running them. This makes it possible to balance a large work flow across multiple grid sites, a local HTCondor pool, and any flocked HTCondor pools, without having to guess in advance how quickly jobs will run and complete in each of the different sites.
Job Routing is most appropriate for high throughput work flows, where there are many more jobs than computers, and the goal is to keep as many of the computers busy as possible. Job Routing is less suitable when there are a small number of jobs, and the scheduler needs to choose the best place for each job, in order to finish them as quickly as possible. The Job Router does not know which site will run the jobs faster, but it can decide whether to send more jobs to a site, based on whether jobs already submitted to that site are sitting idle or not, as well as whether the site has experienced recent job failures.
The condor_job_router daemon and configuration determine a policy for which jobs may be transformed and sent to grid sites. By default, a job is transformed into a grid universe job by making a copy of the original job ClassAd, and modifying some attributes in this copy of the job. The copy is called the routed copy, and it shows up in the job queue under a new job id.
Until the routed copy finishes or is removed, the original copy of the job passively mirrors the state of the routed job. During this time, the original job is not available for matchmaking, because it is tied to the routed copy. The original job also does not evaluate periodic expressions, such as PeriodicHold. Periodic expressions are evaluated for the routed copy. When the routed copy completes, the original job ClassAd is updated such that it reflects the final status of the job. If the routed copy is removed, the original job returns to the normal idle state, and is available for matchmaking or rerouting. If, instead, the original job is removed or goes on hold, the routed copy is removed.
Although the default mode routes vanilla universe jobs to grid universe jobs, the routing rules may be configured to do some other transformation of the job. It is also possible to edit the job in place rather than creating a new transformed version of the job.
The condor_job_router daemon utilizes a routing table, in which a ClassAd describes each site to where jobs may be sent. The routing table is given in the New ClassAd language, as currently used by HTCondor internally.
A good place to learn about the syntax of New ClassAds is the Informal Language Description in the C++ ClassAds tutorial: http://research.cs.wisc.edu/htcondor/classad/c++tut.html. Two essential differences distinguish the New ClassAd language from the current one. In the New ClassAd language, each ClassAd is surrounded by square brackets. And, in the New ClassAd language, each assignment statement ends with a semicolon. When the New ClassAd is embedded in an HTCondor configuration file, it may appear all on a single line, but the readability is often improved by inserting line continuation characters after each assignment statement. This is done in the examples. Unfortunately, this makes the insertion of comments into the configuration file awkward, because of the interaction between comments and line continuation characters in configuration files. An alternative is to use C-style comments (/* ... */). Another alternative is to read in the routing table entries from a separate file, rather than embedding them in the HTCondor configuration file.
If Job Routing is set up, then the following items ought to be considered for jobs to have the necessary prerequisites to be considered for routing.
should_transfer_files = YES when_to_transfer_output = ON_EXIT transfer_input_files = input1, input2 transfer_output_files = output1, output2
Vanilla universe jobs and most types of grid universe jobs differ in the set of files transferred back when the job completes. Vanilla universe jobs transfer back all files created or modified, while all grid universe jobs, except for HTCondor-C, only transfer back the output file, as well as those explicitly listed with transfer_output_files. Therefore, when routing jobs to grid universes other than HTCondor-C, it is important to explicitly specify all output files that must be transferred upon job completion.
An additional difference between the vanilla universe jobs and gt2 grid universe jobs is that gt2 jobs do not return any information about the job's exit status. The exit status as reported in the job ClassAd and job event log are always 0. Therefore, jobs that may be routed to a gt2 grid site must not rely upon a non-zero job exit status.
+WantJobRouter = True
This implementation can be taken further, allowing the job to first be rejected within the local pool, before being a candidate for Job Routing:
+WantJobRouter = LastRejMatchTime =!= UNDEFINED
x509userproxy = /tmp/x509up_u275
This is not necessary if the condor_job_router daemon is configured to add a grid proxy on behalf of jobs.
Job submission does not change for jobs that may be routed.
$ condor_submit job1.sub
where job1.sub might contain:
universe = vanilla executable = my_executable output = job1.stdout error = job1.stderr log = job1.ulog should_transfer_files = YES when_to_transfer_output = ON_EXIT +WantJobRouter = LastRejMatchTime =!= UNDEFINED x509userproxy = /tmp/x509up_u275 queue
The status of the job may be observed as with any other HTCondor job, for example by looking in the job's log file. Before the job completes, condor_q shows the job's status. Should the job become routed, a second job will enter the job queue. This is the routed copy of the original job. The command condor_router_q shows a more specialized view of routed jobs, as this example shows:
$ condor_router_q -S JOBS ST Route GridResource 40 I Site1 site1.edu/jobmanager-condor 10 I Site2 site2.edu/jobmanager-pbs 2 R Site3 condor submit.site3.edu condor.site3.edu
condor_router_history summarizes the history of routed jobs, as this example shows:
$ condor_router_history Routed job history from 2007-06-27 23:38 to 2007-06-28 23:38 Site Hours Jobs Runs Completed Aborted ------------------------------------------------------- Site1 10 2 0 Site2 8 2 1 Site3 40 6 0 ------------------------------------------------------- TOTAL 58 10 1
The following sample configuration sets up potential job routing to three routes (grid sites). Definitions of the configuration variables specific to the Job Router are in section 3.3.21. One route is an HTCondor site accessed via the Globus gt2 protocol. A second route is a PBS site, also accessed via Globus gt2. The third site is an HTCondor site accessed by HTCondor-C. The condor_job_router daemon does not know which site will be best for a given job. The policy implemented in this sample configuration stops sending more jobs to a site, if ten jobs that have already been sent to that site are idle.
These configuration settings belong in the local configuration file of the machine where jobs are submitted. Check that the machine can successfully submit grid jobs before setting up and using the Job Router. Typically, the single required element that needs to be added for GSI authentication is an X.509 trusted certification authority directory, in a place recognized by HTCondor (for example, /etc/grid-security/certificates). The VDT (http://vdt.cs.wisc.edu) project provides a convenient way to set up and install a trusted CA, if needed.
# These settings become the default settings for all routes JOB_ROUTER_DEFAULTS = \ [ \ requirements=target.WantJobRouter is True; \ MaxIdleJobs = 10; \ MaxJobs = 200; \ \ /* now modify routed job attributes */ \ /* remove routed job if it goes on hold or stays idle for over 6 hours */ \ set_PeriodicRemove = JobStatus == 5 || \ (JobStatus == 1 && (time() - QDate) > 3600*6); \ delete_WantJobRouter = true; \ set_requirements = true; \ ] # This could be made an attribute of the job, rather than being hard-coded ROUTED_JOB_MAX_TIME = 1440 # Now we define each of the routes to send jobs on JOB_ROUTER_ENTRIES = \ [ GridResource = "gt2 site1.edu/jobmanager-condor"; \ name = "Site 1"; \ ] \ [ GridResource = "gt2 site2.edu/jobmanager-pbs"; \ name = "Site 2"; \ set_GlobusRSL = "(maxwalltime=$(ROUTED_JOB_MAX_TIME))(jobType=single)"; \ ] \ [ GridResource = "condor submit.site3.edu condor.site3.edu"; \ name = "Site 3"; \ set_remote_jobuniverse = 5; \ ] # Reminder: you must restart HTCondor for changes to DAEMON_LIST to take effect. DAEMON_LIST = $(DAEMON_LIST) JOB_ROUTER # For testing, set this to a small value to speed things up. # Once you are running at large scale, set it to a higher value # to prevent the JobRouter from using too much cpu. JOB_ROUTER_POLLING_PERIOD = 10 #It is good to save lots of schedd queue history #for use with the router_history command. MAX_HISTORY_ROTATIONS = 20
The conversion of a job to a routed copy may require the job ClassAd to be modified. The Routing Table specifies attributes of the different possible routes and it may specify specific modifications that should be made to the job when it is sent along a specific route. In addition to this mechanism for transforming the job, external programs may be invoked to transform the job. For more information, see section 4.4.2.
The following attributes and instructions for modifying job attributes may appear in a Routing Table entry.
The Open Science Grid has a service called ReSS (Resource Selection Service). It presents grid sites as ClassAds in an HTCondor collector. This example builds a routing table from the site ClassAds in the ReSS collector.
Using JOB_ROUTER_ENTRIES_CMD, we tell the condor_job_router daemon to call a
simple script which queries the collector and outputs a routing table.
The script, called osg_ress_routing_table.sh
, is just this:
#!/bin/sh # you _MUST_ change this: export condor_status=/path/to/condor_status # if no command line arguments specify -pool, use this: export _CONDOR_COLLECTOR_HOST=osg-ress-1.fnal.gov $condor_status -format '[ ' BeginAd \ -format 'GridResource = "gt2 %s"; ' GlueCEInfoContactString \ -format ']\n' EndAd "$@" | uniq
Save this script to a file and make sure the permissions on the file mark it as executable. Test this script by calling it by hand before trying to use it with the condor_job_router daemon. You may supply additional arguments such as -constraint to limit the sites which are returned.
Once you are satisfied that the routing table constructed by the script is what you want, configure the condor_job_router daemon to use it:
# command to build the routing table JOB_ROUTER_ENTRIES_CMD = /path/to/osg_ress_routing_table.sh <extra arguments> # how often to rebuild the routing table: JOB_ROUTER_ENTRIES_REFRESH = 3600
Using the example configuration, use the above settings to replace JOB_ROUTER_ENTRIES. Or, leave JOB_ROUTER_ENTRIES there and have a routing table containing entries from both sources. When you restart or reconfigure the condor_job_router daemon, you should see messages in the Job Router's log indicating that it is adding more routes to the table.