5.4 The HTCondor Job Router
The HTCondor Job Router is an add-on to the condor_schedd that transforms jobs from one type
into another according to a configurable policy. This process of transforming the jobs is called job
routing.
One example of how the Job Router can be used is for the task of sending excess jobs to one or more remote grid
sites. The Job Router can transform jobs, such as vanilla universe jobs, into grid universe jobs that use
any of the grid types supported by HTCondor. The rate at which jobs are routed can be matched
roughly to the rate at which each site is able to start running them. This makes it possible to balance a
large workflow across multiple grid sites, a local HTCondor pool, and any flocked HTCondor pools,
without having to guess in advance how quickly jobs will run and complete at each of the different
sites.
Job Routing is most appropriate for high throughput workflows, where there are many more jobs than
computers, and the goal is to keep as many of the computers busy as possible. Job Routing is less suitable
when there are a small number of jobs, and the scheduler needs to choose the best place for each
job, in order to finish them as quickly as possible. The Job Router does not know which site will run
the jobs faster, but it can decide whether to send more jobs to a site, based on whether jobs already
submitted to that site are sitting idle or not, as well as whether the site has experienced recent job
failures.
5.4.1 Routing Mechanism
The condor_job_router daemon and its configuration determine a policy for which jobs may be transformed and sent
to grid sites. By default, a job is transformed into a grid universe job by making a copy of the original job ClassAd,
and modifying some attributes in this copy of the job. The copy is called the routed copy, and it shows up in the job
queue under a new job id.
Until the routed copy finishes or is removed, the original copy of the job passively mirrors the state of the routed
job. During this time, the original job is not available for matchmaking, because it is tied to the routed copy. The
original job also does not evaluate periodic expressions, such as PeriodicHold. Periodic expressions are evaluated
for the routed copy. When the routed copy completes, the original job ClassAd is updated such that it reflects the
final status of the job. If the routed copy is removed, the original job returns to the normal idle state, and is
available for matchmaking or rerouting. If, instead, the original job is removed or goes on hold, the routed copy is
removed.
Although the default mode routes vanilla universe jobs to grid universe jobs, the routing rules may be configured
to do some other transformation of the job. It is also possible to edit the job in place rather than creating a new
transformed version of the job.
The condor_job_router daemon uses a routing table, in which a ClassAd describes each site to which jobs
may be sent. The routing table is given in the New ClassAd language, as currently used internally by
HTCondor.
A good place to learn about the syntax of New ClassAds is the Informal Language Description in the C++
ClassAds tutorial: http://htcondor.org/classad/c++tut.html. Two essential differences distinguish the New ClassAd
language from the current one. In the New ClassAd language, each ClassAd is surrounded by square
brackets. And, in the New ClassAd language, each assignment statement ends with a semicolon. When
the New ClassAd is embedded in an HTCondor configuration file, it may appear all on a single line,
but the readability is often improved by inserting line continuation characters after each assignment
statement. This is done in the examples. Unfortunately, this makes the insertion of comments into the
configuration file awkward, because of the interaction between comments and line continuation characters in
configuration files. An alternative is to use C-style comments (/* …*/). Another alternative is to read in the
routing table entries from a separate file, rather than embedding them in the HTCondor configuration
file.
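For example, a single routing table entry, embedded in the configuration as one value with line continuation characters, might look like the following minimal sketch; the grid resource and route name are placeholders rather than a real site:
# One New ClassAd routing entry, written as a single configuration value:
JOB_ROUTER_ENTRIES = \
  [ GridResource = "gt2 example.edu/jobmanager-condor"; \
    name = "ExampleSite"; \
    MaxIdleJobs = 10; \
  ]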
5.4.2 Job Submission with Job Routing Capability
If Job Routing is set up, then consider the following items, which give jobs the necessary prerequisites
to be candidates for routing.
- Jobs appropriate for routing to the grid must not rely on access to a shared file system, or other
services that are only available on the local pool. The job will use HTCondor’s file transfer mechanism,
rather than relying on a shared file system to access input files and write output files. In the submit
description file, to enable file transfer, there will be a set of commands similar to
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = input1, input2
transfer_output_files = output1, output2
Vanilla universe jobs and most types of grid universe jobs differ in the set of files transferred back when the
job completes. Vanilla universe jobs transfer back all files created or modified, while all grid universe
jobs, except for HTCondor-C, only transfer back the output file, as well as those explicitly
listed with transfer_output_files. Therefore, when routing jobs to grid universes other than
HTCondor-C, it is important to explicitly specify all output files that must be transferred upon job
completion.
An additional difference between vanilla universe jobs and gt2 grid universe jobs is that gt2 jobs do not
return any information about the job’s exit status. The exit status as reported in the job ClassAd and job
event log is always 0. Therefore, jobs that may be routed to a gt2 grid site must not rely upon a non-zero job
exit status.
- One configuration for routed jobs requires the jobs to identify themselves as candidates for Job Routing. This
may be accomplished by inventing a ClassAd attribute that the configuration uses when setting the policy for
job identification; the job then defines this attribute to identify itself. If the invented attribute is called
WantJobRouter, then the job identifies itself as a job that may be routed by placing in the submit description
file:
+WantJobRouter = True
This implementation can be taken further, allowing the job to first be rejected within the local pool, before
being a candidate for Job Routing:
+WantJobRouter = LastRejMatchTime =!= UNDEFINED
- As appropriate to the potential grid site, create a grid proxy, and specify it in the submit description
file:
x509userproxy = /tmp/x509up_u275
This is not necessary if the condor_job_router daemon is configured to add a grid proxy on behalf of
jobs.
Job submission does not change for jobs that may be routed.
$ condor_submit job1.sub
where job1.sub might contain:
universe = vanilla
executable = my_executable
output = job1.stdout
error = job1.stderr
log = job1.ulog
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
+WantJobRouter = LastRejMatchTime =!= UNDEFINED
x509userproxy = /tmp/x509up_u275
queue
The status of the job may be observed as with any other HTCondor job, for example by looking in the job’s log
file. Before the job completes, condor_q shows the job’s status. Should the job become routed, a second job will
enter the job queue. This is the routed copy of the original job. The command condor_router_q shows a more
specialized view of routed jobs, as this example shows:
$ condor_router_q -S
JOBS ST Route GridResource
40 I Site1 site1.edu/jobmanager-condor
10 I Site2 site2.edu/jobmanager-pbs
2 R Site3 condor submit.site3.edu condor.site3.edu
condor_router_history summarizes the history of routed jobs, as this example shows:
$ condor_router_history
Routed job history from 2007-06-27 23:38 to 2007-06-28 23:38
Site       Hours   Jobs        Runs
                   Completed   Aborted
---------------------------------------
Site1         10           2         0
Site2          8           2         1
Site3         40           6         0
---------------------------------------
TOTAL         58          10         1
5.4.3 An Example Configuration
The following sample configuration sets up potential job routing to three routes (grid sites). Definitions of the
configuration variables specific to the Job Router are in section 3.5.19. One route is an HTCondor site accessed via
the Globus gt2 protocol. A second route is a PBS site, also accessed via Globus gt2. The third site is an HTCondor
site accessed by HTCondor-C. The condor_job_router daemon does not know which site will be best for a given
job. The policy implemented in this sample configuration stops sending more jobs to a site, if ten jobs that have
already been sent to that site are idle.
These configuration settings belong in the local configuration file of the machine where jobs are submitted.
Check that the machine can successfully submit grid jobs before setting up and using the Job Router. Typically, the
single required element that needs to be added for GSI authentication is an X.509 trusted certification authority
directory, in a place recognized by HTCondor (for example, /etc/grid-security/certificates). The
VDT (http://vdt.cs.wisc.edu) project provides a convenient way to set up and install a trusted CA, if
needed.
Note that, as of version 8.5.6, the configuration language supports multi-line values, as shown in the example
below (see section 3.3.5 for more details).
# These settings become the default settings for all routes
JOB_ROUTER_DEFAULTS @=jrd
[
requirements=target.WantJobRouter is True;
MaxIdleJobs = 10;
MaxJobs = 200;
/* now modify routed job attributes */
/* remove routed job if it goes on hold or stays idle for over 6 hours */
set_PeriodicRemove = JobStatus == 5 ||
(JobStatus == 1 && (time() - QDate) > 3600*6);
delete_WantJobRouter = true;
set_requirements = true;
]
@jrd
# This could be made an attribute of the job, rather than being hard-coded
ROUTED_JOB_MAX_TIME = 1440
# Now we define each of the routes to send jobs on
JOB_ROUTER_ENTRIES @=jre
[ GridResource = "gt2 site1.edu/jobmanager-condor";
name = "Site 1";
]
[ GridResource = "gt2 site2.edu/jobmanager-pbs";
name = "Site 2";
set_GlobusRSL = "(maxwalltime=$(ROUTED_JOB_MAX_TIME))(jobType=single)";
]
[ GridResource = "condor submit.site3.edu condor.site3.edu";
name = "Site 3";
set_remote_jobuniverse = 5;
]
@jre
# Reminder: you must restart HTCondor for changes to DAEMON_LIST to take effect.
DAEMON_LIST = $(DAEMON_LIST) JOB_ROUTER
# For testing, set this to a small value to speed things up.
# Once you are running at large scale, set it to a higher value
# to prevent the JobRouter from using too much cpu.
JOB_ROUTER_POLLING_PERIOD = 10
# It is good to save lots of schedd queue history
# for use with the condor_router_history command.
MAX_HISTORY_ROTATIONS = 20
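After JOB_ROUTER has been added to DAEMON_LIST and HTCondor has been restarted, the new daemon's activity can be observed in its log file. This is only a sketch; the log path below assumes a typical Linux installation, so adjust it to match the local LOG directory:
$ condor_restart
$ tail -f /var/log/condor/JobRouterLog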
5.4.4 Routing Table Entry ClassAd Attributes
The conversion of a job to a routed copy may require the job ClassAd to be modified. The Routing
Table specifies attributes of the different possible routes, and it may specify modifications that
should be made to the job when it is sent along a specific route. In addition to this mechanism for
transforming the job, external programs may be invoked to transform the job. For more information, see
section 4.4.2.
The following attributes and instructions for modifying job attributes may appear in a Routing Table
entry.
- GridResource: Specifies the value for the GridResource attribute that will be inserted into the routed
copy of the job’s ClassAd.
- Name: An optional identifier that will be used in log messages concerning this route. If no name is specified,
the default used will be the value of GridResource. The condor_job_router daemon distinguishes routes and
advertises statistics based on this attribute’s value.
- Requirements: A Requirements expression that identifies jobs that may be matched to the route. Note
that, as with all settings, requirements specified in the configuration variable JOB_ROUTER_ENTRIES
override the setting of JOB_ROUTER_DEFAULTS. To specify global requirements that are not overridden
by JOB_ROUTER_ENTRIES, use JOB_ROUTER_SOURCE_JOB_CONSTRAINT.
- MaxJobs: An integer maximum number of jobs permitted on the route at one time. The default is 100.
- MaxIdleJobs: An integer maximum number of routed jobs in the idle state. At or above this value, no more
jobs will be sent to this site. This is intended to prevent too many jobs from being sent to sites which
are too busy to run them. If the value set for this attribute is too small, the rate of job submission to
the site will slow, because the condor_job_router daemon will submit jobs up to this limit, wait to
see some of the jobs enter the running state, and then submit more. The disadvantage of setting this
attribute’s value too high is that a lot of jobs may be sent to a site, only to sit idle for hours or days.
The default value is 50.
- FailureRateThreshold: A maximum tolerated rate of job failures. Failure is determined by the expression
given for the attribute JobFailureTest. The default threshold is 0.03 jobs/second. If the
threshold is exceeded, submission of new jobs is throttled until jobs begin succeeding, such that the
failure rate is less than the threshold. This attribute implements black hole throttling, such that a site
at which jobs are sent only to fail (a black hole) receives fewer jobs.
- JobFailureTest: An expression evaluated for each job that finishes, to determine whether it was a failure.
The default value, if no expression is defined, assumes all jobs are successful. Routed jobs that are
removed are considered to be failures. An example expression to treat all jobs running for less than
30 minutes as failures is target.RemoteWallClockTime < 1800. A more flexible expression might
reference a property or expression of the job that specifies a failure condition specific to the type of
job.
- TargetUniverse: An integer value specifying the desired universe for the routed copy of the job. The default
value is 9, which is the grid universe.
- UseSharedX509UserProxy: A boolean expression that, when True, causes the value of SharedX509UserProxy
to be used as the X.509 user proxy for the routed job. Note that if the condor_job_router daemon is running
as root, the copy of this file that is given to the job will have its ownership set to that of the user running
the job. This requires the trust of the user. It is therefore recommended to avoid this mechanism when possible.
Instead, require users to submit jobs with X509UserProxy set in the submit description file. If this feature is
needed, use the boolean expression to only allow specific values of target.Owner to use this shared proxy file.
The shared proxy file should be owned by the condor user. Currently, to use a shared proxy, the job
must also turn on sandboxing by having the attribute JobShouldBeSandboxed evaluate to True; see the
sketch following this list.
- SharedX509UserProxy: A string giving the name of the file containing the X.509 user proxy for the routed job.
- JobShouldBeSandboxed: A boolean expression that, when True, causes the created copy of the job to be
sandboxed. A copy of the input files will be placed in the condor_schedd daemon’s spool area for the
target job, and when the job runs, the output will be staged back into the spool area. Once all of the
output has been successfully staged back, it will be copied again, this time from the spool area of the
sandboxed job back to the original job’s output locations. By default, sandboxing is turned off. Only
turn it on if using a shared X.509 user proxy or if direct staging of remote output files back to the
final output locations is not desired.
- OverrideRoutingEntry: A boolean value that, when True, indicates that this entry in the routing table
replaces any previous entry in the table with the same name. When False, it indicates that if there
is a previous entry by the same name, the previous entry should be retained and this entry should be
ignored. The default value is True.
- Set_<ATTR>: Sets the value of <ATTR> in the routed copy’s job ClassAd to the specified value. An
example of an attribute that might be set is PeriodicRemove. For example, if the routed job goes on
hold or stays idle for too long, remove it and return the original copy of the job to a normal state. An
example appears in the sketch following this list.
- Eval_Set_<ATTR>: Defines an expression. The expression is evaluated, and the resulting value sets the
value of the routed copy’s job ClassAd attribute <ATTR>. Use this attribute to set a custom or local
value, especially for modifying an attribute which may have been already specified in a default routing
table.
- Copy_<ATTR>: Defined with the name of a routed copy ClassAd attribute. Copies the value of <ATTR>
from the original job ClassAd into the named attribute of the routed copy. Useful to save the
value of an expression before replacing it with something else that references the original expression.
- Delete_<ATTR>: Deletes <ATTR> from the routed copy ClassAd. A value assigned to this attribute in
the routing table entry is ignored.
- EditJobInPlace: A boolean expression that, when True, causes the original job to be transformed in place
rather than creating a new transformed version (a routed copy) of the job. In this mode, the Job Router
Hook <Keyword>_HOOK_TRANSLATE_JOB and transformation rules in the routing table are applied
during the job transformation. The routing table attribute GridResource is ignored, and there is no
default transformation of the job from a vanilla job to a grid universe job as there is otherwise. Once
transformed, the job is still a candidate for matching routing rules, so it is up to the routing logic to
control whether the job may be transformed multiple times or not. For example, to transform the job
only once, an attribute could be set in the job ClassAd to prevent it from matching the same routing
rule in the future. To transform the job multiple times with limited frequency, a timestamp could be
inserted into the job ClassAd marking the time of the last transformation, and the routing entry could
require that this timestamp either be undefined or older than some limit.
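As a sketch that ties several of these attributes together, the following hypothetical routing table entry combines a shared proxy restricted to a single owner with failure throttling and a few attribute edits. The site, owner, proxy path, and the RoutedAtTime attribute are all invented for illustration:
JOB_ROUTER_ENTRIES @=jre
[ GridResource = "gt2 example.edu/jobmanager-condor";
  Name = "ExampleSite";
  MaxJobs = 100;
  MaxIdleJobs = 10;
  /* throttle this route if jobs fail too often; treat jobs that
     finish in under 30 minutes as failures */
  FailureRateThreshold = 0.03;
  JobFailureTest = target.RemoteWallClockTime < 1800;
  /* allow only one (hypothetical) user to use the shared proxy;
     a shared proxy requires sandboxing */
  UseSharedX509UserProxy = target.Owner == "alice";
  SharedX509UserProxy = "/home/condor/shared_proxy";
  JobShouldBeSandboxed = true;
  /* save the original requirements, then replace them */
  copy_Requirements = "OriginalRequirements";
  set_Requirements = true;
  /* record the routing time in a hypothetical attribute */
  eval_set_RoutedAtTime = time();
  delete_WantJobRouter = true;
]
@jre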
5.4.5 Example: constructing the routing table from ReSS
The Open Science Grid has a service called ReSS (Resource Selection Service). It presents grid sites as ClassAds
in an HTCondor collector. This example builds a routing table from the site ClassAds in the ReSS
collector.
Using JOB_ROUTER_ENTRIES_CMD, we tell the condor_job_router daemon to call a simple script which queries
the collector and outputs a routing table. The script, called osg_ress_routing_table.sh, is just
this:
#!/bin/sh
# you _MUST_ change this:
export condor_status=/path/to/condor_status
# if no command line arguments specify -pool, use this:
export _CONDOR_COLLECTOR_HOST=osg-ress-1.fnal.gov
$condor_status -format '[ ' BeginAd \
-format 'GridResource = "gt2 %s"; ' GlueCEInfoContactString \
-format ']\n' EndAd "$@" | uniq
Save this script to a file and make sure the permissions on the file mark it as executable. Test this script by
calling it by hand before trying to use it with the condor_job_router daemon. You may supply additional
arguments such as -constraint to limit the sites which are returned.
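For example, a hand test that limits the output to sites advertising a contact string might look like the following; the constraint and the resulting sites are only illustrations:
$ ./osg_ress_routing_table.sh -constraint 'GlueCEInfoContactString =!= UNDEFINED'
[ GridResource = "gt2 example1.edu/jobmanager-condor"; ]
[ GridResource = "gt2 example2.edu/jobmanager-pbs"; ]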
Once you are satisfied that the routing table constructed by the script is what you want, configure the
condor_job_router daemon to use it:
# command to build the routing table
JOB_ROUTER_ENTRIES_CMD = /path/to/osg_ress_routing_table.sh <extra arguments>
# how often to rebuild the routing table:
JOB_ROUTER_ENTRIES_REFRESH = 3600
Using the example configuration, use the above settings to replace JOB_ROUTER_ENTRIES. Or, leave
JOB_ROUTER_ENTRIES in place and have a routing table containing entries from both sources. When you restart or
reconfigure the condor_job_router daemon, you should see messages in the Job Router’s log indicating that it is
adding more routes to the table.