5.4 The HTCondor Job Router
The HTCondor Job Router is an add-on to the condor_schedd that transforms jobs from one type
into another according to a configurable policy. This process of transforming the jobs is called job
routing.
One example of how the Job Router can be used is for the task of sending excess jobs to one or more remote grid
sites. The Job Router can transform jobs, such as vanilla universe jobs, into grid universe jobs that use
any of the grid types supported by HTCondor. The rate at which jobs are routed can be matched
roughly to the rate at which each site is able to start running them. This makes it possible to balance a
large workflow across multiple grid sites, a local HTCondor pool, and any flocked HTCondor pools,
without having to guess in advance how quickly jobs will run and complete at each of the different
sites.
Job Routing is most appropriate for high throughput workflows, where there are many more jobs than
computers, and the goal is to keep as many of the computers busy as possible. Job Routing is less suitable
when there are a small number of jobs, and the scheduler needs to choose the best place for each
job, in order to finish them as quickly as possible. The Job Router does not know which site will run
the jobs faster, but it can decide whether to send more jobs to a site, based on whether jobs already
submitted to that site are sitting idle or not, as well as whether the site has experienced recent job
failures.
5.4.1 Routing Mechanism
The condor_job_router daemon and its configuration determine a policy for which jobs may be transformed and sent
to grid sites. By default, a job is transformed into a grid universe job by making a copy of the original job ClassAd,
and modifying some attributes in this copy of the job. The copy is called the routed copy, and it shows up in the job
queue under a new job id.
Until the routed copy finishes or is removed, the original copy of the job passively mirrors the state of the routed
job. During this time, the original job is not available for matchmaking, because it is tied to the routed copy. The
original job also does not evaluate periodic expressions, such as PeriodicHold. Periodic expressions are evaluated
for the routed copy. When the routed copy completes, the original job ClassAd is updated such that it reflects the
final status of the job. If the routed copy is removed, the original job returns to the normal idle state, and is
available for matchmaking or rerouting. If, instead, the original job is removed or goes on hold, the routed copy is
removed.
Although the default mode routes vanilla universe jobs to grid universe jobs, the routing rules may be configured
to do some other transformation of the job. It is also possible to edit the job in place rather than creating a new
transformed version of the job.
The condor_job_router daemon uses a routing table, in which a ClassAd describes each site to which jobs
may be sent. The routing table is given in the New ClassAd language, as currently used internally by
HTCondor.
A good place to learn about the syntax of New ClassAds is the Informal Language Description in the C++
ClassAds tutorial: http://htcondor.org/classad/c++tut.html. Two essential differences distinguish the New ClassAd
language from the current one. In the New ClassAd language, each ClassAd is surrounded by square
brackets. And, in the New ClassAd language, each assignment statement ends with a semicolon. When
the New ClassAd is embedded in an HTCondor configuration file, it may appear all on a single line,
but the readability is often improved by inserting line continuation characters after each assignment
statement. This is done in the examples. Unfortunately, this makes the insertion of comments into the
configuration file awkward, because of the interaction between comments and line continuation characters in
configuration files. An alternative is to use C-style comments (/* …*/). Another alternative is to read in the
routing table entries from a separate file, rather than embedding them in the HTCondor configuration
file.
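For example, a single routing table entry, embedded in the configuration as one value with line continuation characters, might look like the following minimal sketch; the grid resource and route name are placeholders rather than a real site:
# One New ClassAd routing entry, written as a single configuration value:
JOB_ROUTER_ENTRIES = \
  [ GridResource = "gt2 example.edu/jobmanager-condor"; \
    name = "ExampleSite"; \
    MaxIdleJobs = 10; \
  ]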
5.4.2 Job Submission with Job Routing Capability
If Job Routing is set up, then consider the following items, which give jobs the necessary prerequisites
to be candidates for routing.
- Jobs appropriate for routing to the grid must not rely on access to a shared file system, or other
services that are only available on the local pool. The job will use HTCondor’s file transfer mechanism,
rather than relying on a shared file system to access input files and write output files. In the submit
description file, to enable file transfer, there will be a set of commands similar to
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = input1, input2
transfer_output_files = output1, output2
Vanilla universe jobs and most types of grid universe jobs differ in the set of files transferred back when the
job completes. Vanilla universe jobs transfer back all files created or modified, while all grid universe
jobs, except for HTCondor-C, only transfer back the output file, as well as those explicitly
listed with transfer_output_files. Therefore, when routing jobs to grid universes other than
HTCondor-C, it is important to explicitly specify all output files that must be transferred upon job
completion.
An additional difference between vanilla universe jobs and gt2 grid universe jobs is that gt2 jobs do not
return any information about the job’s exit status. The exit status as reported in the job ClassAd and job
event log is always 0. Therefore, jobs that may be routed to a gt2 grid site must not rely upon a non-zero job
exit status.
- One configuration for routed jobs requires the jobs to identify themselves as candidates for Job Routing. This
may be accomplished by inventing a ClassAd attribute that the configuration uses when setting the policy for
job identification; the job then defines this attribute to identify itself. If the invented attribute is called
WantJobRouter, then the job identifies itself as a job that may be routed by placing in the submit description
file:
+WantJobRouter = True
This implementation can be taken further, allowing the job to first be rejected within the local pool, before
being a candidate for Job Routing:
+WantJobRouter = LastRejMatchTime =!= UNDEFINED
- As appropriate to the potential grid site, create a grid proxy, and specify it in the submit description
file:
x509userproxy = /tmp/x509up_u275
This is not necessary if the condor_job_router daemon is configured to add a grid proxy on behalf of
jobs.
Job submission does not change for jobs that may be routed.
$ condor_submit job1.sub
where job1.sub might contain:
universe = vanilla
executable = my_executable
output = job1.stdout
error = job1.stderr
log = job1.ulog
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
+WantJobRouter = LastRejMatchTime =!= UNDEFINED
x509userproxy = /tmp/x509up_u275
queue
The status of the job may be observed as with any other HTCondor job, for example by looking in the job’s log
file. Before the job completes, condor_q shows the job’s status. Should the job become routed, a second job will
enter the job queue. This is the routed copy of the original job. The command condor_router_q shows a more
specialized view of routed jobs, as this example shows:
$ condor_router_q -S
JOBS ST Route GridResource
40 I Site1 site1.edu/jobmanager-condor
10 I Site2 site2.edu/jobmanager-pbs
2 R Site3 condor submit.site3.edu condor.site3.edu
condor_router_history summarizes the history of routed jobs, as this example shows:
$ condor_router_history
Routed job history from 2007-06-27 23:38 to 2007-06-28 23:38
Site       Hours   Jobs        Runs
                   Completed   Aborted
---------------------------------------
Site1         10           2         0
Site2          8           2         1
Site3         40           6         0
---------------------------------------
TOTAL         58          10         1
5.4.3 An Example Configuration
The following sample configuration sets up potential job routing to three routes (grid sites). Definitions of the
configuration variables specific to the Job Router are in section 3.5.19. One route is an HTCondor site accessed via
the Globus gt2 protocol. A second route is a PBS site, also accessed via Globus gt2. The third site is an HTCondor
site accessed by HTCondor-C. The condor_job_router daemon does not know which site will be best for a given
job. The policy implemented in this sample configuration stops sending more jobs to a site, if ten jobs that have
already been sent to that site are idle.
These configuration settings belong in the local configuration file of the machine where jobs are submitted.
Check that the machine can successfully submit grid jobs before setting up and using the Job Router. Typically, the
single required element that needs to be added for GSI authentication is an X.509 trusted certification authority
directory, in a place recognized by HTCondor (for example, /etc/grid-security/certificates). The
VDT (http://vdt.cs.wisc.edu) project provides a convenient way to set up and install a trusted CA, if
needed.
Note that, as of version 8.5.6, the configuration language supports multi-line values, as shown in the example
below (see section 3.3.5 for more details).
# These settings become the default settings for all routes
JOB_ROUTER_DEFAULTS @=jrd
[
requirements=target.WantJobRouter is True;
MaxIdleJobs = 10;
MaxJobs = 200;
/* now modify routed job attributes */
/* remove routed job if it goes on hold or stays idle for over 6 hours */
set_PeriodicRemove = JobStatus == 5 ||
(JobStatus == 1 && (time() - QDate) > 3600*6);
delete_WantJobRouter = true;
set_requirements = true;
]
@jrd
# This could be made an attribute of the job, rather than being hard-coded
ROUTED_JOB_MAX_TIME = 1440
# Now we define each of the routes to send jobs on
JOB_ROUTER_ENTRIES @=jre
[ GridResource = "gt2 site1.edu/jobmanager-condor";
name = "Site 1";
]
[ GridResource = "gt2 site2.edu/jobmanager-pbs";
name = "Site 2";
set_GlobusRSL = "(maxwalltime=$(ROUTED_JOB_MAX_TIME))(jobType=single)";
]
[ GridResource = "condor submit.site3.edu condor.site3.edu";
name = "Site 3";
set_remote_jobuniverse = 5;
]
@jre
# Reminder: you must restart HTCondor for changes to DAEMON_LIST to take effect.
DAEMON_LIST = $(DAEMON_LIST) JOB_ROUTER
# For testing, set this to a small value to speed things up.
# Once you are running at large scale, set it to a higher value
# to prevent the JobRouter from using too much cpu.
JOB_ROUTER_POLLING_PERIOD = 10
# It is good to save lots of schedd queue history
# for use with the condor_router_history command.
MAX_HISTORY_ROTATIONS = 20
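After JOB_ROUTER has been added to DAEMON_LIST and HTCondor has been restarted, the new daemon's activity can be observed in its log file. This is only a sketch; the log path below assumes a typical Linux installation, so adjust it to match the local LOG directory:
$ condor_restart
$ tail -f /var/log/condor/JobRouterLog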
5.4.4 Routing Table Entry ClassAd Attributes
The conversion of a job to a routed copy may require the job ClassAd to be modified. The Routing
Table specifies attributes of the different possible routes, and it may specify modifications that
should be made to the job when it is sent along a specific route. In addition to this mechanism for
transforming the job, external programs may be invoked to transform the job. For more information, see
section 4.4.2.
The following attributes and instructions for modifying job attributes may appear in a Routing Table
entry.
- GridResource: Specifies the value for the GridResource attribute that will be inserted into the routed
copy of the job’s ClassAd.
- Name: An optional identifier that will be used in log messages concerning this route. If no name is specified,
the default used will be the value of GridResource. The condor_job_router daemon distinguishes routes and
advertises statistics based on this attribute’s value.
- Requirements: A Requirements expression that identifies jobs that may be matched to the route. Note
that, as with all settings, requirements specified in the configuration variable JOB_ROUTER_ENTRIES
override the setting of JOB_ROUTER_DEFAULTS. To specify global requirements that are not overridden
by JOB_ROUTER_ENTRIES, use JOB_ROUTER_SOURCE_JOB_CONSTRAINT.
- MaxJobs: An integer maximum number of jobs permitted on the route at one time. The default is 100.
- MaxIdleJobs: An integer maximum number of routed jobs in the idle state. At or above this value, no more
jobs will be sent to this site. This is intended to prevent too many jobs from being sent to sites which
are too busy to run them. If the value set for this attribute is too small, the rate of job submission to
the site will slow, because the condor_job_router daemon will submit jobs up to this limit, wait to
see some of the jobs enter the running state, and then submit more. The disadvantage of setting this
attribute’s value too high is that a lot of jobs may be sent to a site, only to sit idle for hours or days.
The default value is 50.
- FailureRateThreshold: A maximum tolerated rate of job failures. Failure is determined by the expression
given for the attribute JobFailureTest. The default threshold is 0.03 jobs/second. If the
threshold is exceeded, submission of new jobs is throttled until jobs begin succeeding, such that the
failure rate is less than the threshold. This attribute implements black hole throttling, such that a site
at which jobs are sent only to fail (a black hole) receives fewer jobs.
- JobFailureTest: An expression evaluated for each job that finishes, to determine whether it was a failure.
The default value, if no expression is defined, assumes all jobs are successful. Routed jobs that are
removed are considered to be failures. An example expression to treat all jobs running for less than
30 minutes as failures is target.RemoteWallClockTime < 1800. A more flexible expression might
reference a property or expression of the job that specifies a failure condition specific to the type of
job.
- TargetUniverse: An integer value specifying the desired universe for the routed copy of the job. The default
value is 9, which is the grid universe.
- UseSharedX509UserProxy: A boolean expression that, when True, causes the value of SharedX509UserProxy
to be used as the X.509 user proxy for the routed job. Note that if the condor_job_router daemon is running
as root, the copy of this file that is given to the job will have its ownership set to that of the user running
the job. This requires the trust of the user. It is therefore recommended to avoid this mechanism when possible.
Instead, require users to submit jobs with X509UserProxy set in the submit description file. If this feature is
needed, use the boolean expression to only allow specific values of target.Owner to use this shared proxy file.
The shared proxy file should be owned by the condor user. Currently, to use a shared proxy, the job
must also turn on sandboxing by having the attribute JobShouldBeSandboxed evaluate to True; see the
sketch following this list.
- SharedX509UserProxy: A string giving the name of the file containing the X.509 user proxy for the routed job.
- JobShouldBeSandboxed: A boolean expression that, when True, causes the created copy of the job to be
sandboxed. A copy of the input files will be placed in the condor_schedd daemon’s spool area for the
target job, and when the job runs, the output will be staged back into the spool area. Once all of the
output has been successfully staged back, it will be copied again, this time from the spool area of the
sandboxed job back to the original job’s output locations. By default, sandboxing is turned off. Only
turn it on if using a shared X.509 user proxy or if direct staging of remote output files back to the
final output locations is not desired.
- OverrideRoutingEntry: A boolean value that, when True, indicates that this entry in the routing table
replaces any previous entry in the table with the same name. When False, it indicates that if there
is a previous entry by the same name, the previous entry should be retained and this entry should be
ignored. The default value is True.
- Set_<ATTR>: Sets the value of <ATTR> in the routed copy’s job ClassAd to the specified value. An
example of an attribute that might be set is PeriodicRemove. For example, if the routed job goes on
hold or stays idle for too long, remove it and return the original copy of the job to a normal state. An
example appears in the sketch following this list.
- Eval_Set_<ATTR>: Defines an expression. The expression is evaluated, and the resulting value sets the
value of the routed copy’s job ClassAd attribute <ATTR>. Use this attribute to set a custom or local
value, especially for modifying an attribute which may have been already specified in a default routing
table.
- Copy_<ATTR>: Defined with the name of a routed copy ClassAd attribute. Copies the value of <ATTR>
from the original job ClassAd into the named attribute of the routed copy. Useful to save the
value of an expression before replacing it with something else that references the original expression.
- Delete_<ATTR>: Deletes <ATTR> from the routed copy ClassAd. A value assigned to this attribute in
the routing table entry is ignored.
- EditJobInPlace: A boolean expression that, when True, causes the original job to be transformed in place
rather than creating a new transformed version (a routed copy) of the job. In this mode, the Job Router
Hook <Keyword>_HOOK_TRANSLATE_JOB and transformation rules in the routing table are applied
during the job transformation. The routing table attribute GridResource is ignored, and there is no
default transformation of the job from a vanilla job to a grid universe job as there is otherwise. Once
transformed, the job is still a candidate for matching routing rules, so it is up to the routing logic to
control whether the job may be transformed multiple times or not. For example, to transform the job
only once, an attribute could be set in the job ClassAd to prevent it from matching the same routing
rule in the future. To transform the job multiple times with limited frequency, a timestamp could be
inserted into the job ClassAd marking the time of the last transformation, and the routing entry could
require that this timestamp either be undefined or older than some limit.
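As a sketch that ties several of these attributes together, the following hypothetical routing table entry combines a shared proxy restricted to a single owner with failure throttling and a few attribute edits. The site, owner, proxy path, and the RoutedAtTime attribute are all invented for illustration:
JOB_ROUTER_ENTRIES @=jre
[ GridResource = "gt2 example.edu/jobmanager-condor";
  Name = "ExampleSite";
  MaxJobs = 100;
  MaxIdleJobs = 10;
  /* throttle this route if jobs fail too often; treat jobs that
     finish in under 30 minutes as failures */
  FailureRateThreshold = 0.03;
  JobFailureTest = target.RemoteWallClockTime < 1800;
  /* allow only one (hypothetical) user to use the shared proxy;
     a shared proxy requires sandboxing */
  UseSharedX509UserProxy = target.Owner == "alice";
  SharedX509UserProxy = "/home/condor/shared_proxy";
  JobShouldBeSandboxed = true;
  /* save the original requirements, then replace them */
  copy_Requirements = "OriginalRequirements";
  set_Requirements = true;
  /* record the routing time in a hypothetical attribute */
  eval_set_RoutedAtTime = time();
  delete_WantJobRouter = true;
]
@jre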
5.4.5 Example: constructing the routing table from ReSS
The Open Science Grid has a service called ReSS (Resource Selection Service). It presents grid sites as ClassAds
in an HTCondor collector. This example builds a routing table from the site ClassAds in the ReSS
collector.
Using JOB_ROUTER_ENTRIES_CMD, we tell the condor_job_router daemon to call a simple script which queries
the collector and outputs a routing table. The script, called osg_ress_routing_table.sh, is just
this:
#!/bin/sh
# you _MUST_ change this:
export condor_status=/path/to/condor_status
# if no command line arguments specify -pool, use this:
export _CONDOR_COLLECTOR_HOST=osg-ress-1.fnal.gov
$condor_status -format '[ ' BeginAd \
-format 'GridResource = "gt2 %s"; ' GlueCEInfoContactString \
-format ']\n' EndAd "$@" | uniq
Save this script to a file and make sure the permissions on the file mark it as executable. Test this script by
calling it by hand before trying to use it with the condor_job_router daemon. You may supply additional
arguments such as -constraint to limit the sites which are returned.
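For example, a hand test that limits the output to sites advertising a contact string might look like the following; the constraint and the resulting sites are only illustrations:
$ ./osg_ress_routing_table.sh -constraint 'GlueCEInfoContactString =!= UNDEFINED'
[ GridResource = "gt2 example1.edu/jobmanager-condor"; ]
[ GridResource = "gt2 example2.edu/jobmanager-pbs"; ]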
Once you are satisfied that the routing table constructed by the script is what you want, configure the
condor_job_router daemon to use it:
# command to build the routing table
JOB_ROUTER_ENTRIES_CMD = /path/to/osg_ress_routing_table.sh <extra arguments>
# how often to rebuild the routing table:
JOB_ROUTER_ENTRIES_REFRESH = 3600
Using the example configuration, use the above settings to replace JOB_ROUTER_ENTRIES. Or, leave
JOB_ROUTER_ENTRIES in place and have a routing table containing entries from both sources. When you restart or
reconfigure the condor_job_router daemon, you should see messages in the Job Router’s log indicating that it is
adding more routes to the table.