Flocking is HTCondor’s way of allowing jobs that cannot run immediately within the pool where they were submitted to run instead in a different HTCondor pool. If a machine within HTCondor pool A can send jobs to run in HTCondor pool B, then we say that jobs from machine A flock to pool B. Flocking can be one-way, such as jobs from machine A flocking to pool B, or it can be set up in both directions. Flocking is implemented by the condor_schedd daemon, which runs on each machine that may submit jobs, and is controlled through configuration variables.
NOTE: Flocking to pools which use HTCondor’s high availability mechanisms is not advised. See section 3.13.2 for a discussion of the issues.
The simplest flocking configuration sets a few configuration variables. If jobs from machine A are to flock to pool B, then in machine A’s configuration, set the following configuration variables:
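A minimal sketch of such a configuration, assuming pool B's central manager has the hypothetical host name cm.poolB.example.org (substitute the real name):

```
# Hypothetical host name of pool B's central manager.
FLOCK_TO = cm.poolB.example.org
# Use the collector and negotiator on that same host.
FLOCK_COLLECTOR_HOSTS = $(FLOCK_TO)
FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)
# Allow the local negotiator and pool B's negotiator to contact this condor_schedd.
ALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)
```

Older releases may spell the last macro with a HOSTALLOW_ prefix instead of ALLOW_; use the form that matches the installed version.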
This example configuration presumes that the condor_collector and condor_negotiator daemons are running on the same machine. See section 3.8.7 on page 1032 for a discussion of security macros and their use.
In pool B, the configuration macros that must be set are those that authorize jobs from machine A to flock to pool B.
These configuration variables are more easily set by introducing a list of the machines from which jobs may flock. FLOCK_FROM is a comma-separated list of machines, and it is used in the default configuration setting of the security macros that do authorization:
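A sketch of the form these settings are believed to take in pool B, assuming the submitting machine has the hypothetical host name submit.poolA.example.org:

```
# Hypothetical submit machine in pool A whose jobs may flock to this pool.
FLOCK_FROM = submit.poolA.example.org
# Extend the collector and startd authorization lists with the flocking machines.
ALLOW_WRITE_COLLECTOR = $(ALLOW_WRITE), $(FLOCK_FROM)
ALLOW_WRITE_STARTD    = $(ALLOW_WRITE), $(FLOCK_FROM)
ALLOW_READ_COLLECTOR  = $(ALLOW_READ), $(FLOCK_FROM)
ALLOW_READ_STARTD     = $(ALLOW_READ), $(FLOCK_FROM)
```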
Wildcards may be used when setting the FLOCK_FROM configuration variable. For example, *.cs.wisc.edu specifies all hosts in the cs.wisc.edu domain.
Further, if using Kerberos or GSI authentication, then the setting becomes:
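A sketch of the form this setting is believed to take, offered as an assumption rather than a confirmed default. It identifies the negotiator by its authenticated identity rather than by host name alone:

```
# Assumed form: the authenticated condor identity on the central manager.
ALLOW_NEGOTIATOR = condor@$(UID_DOMAIN)/$(COLLECTOR_HOST)
```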
To enable flocking in both directions, consider each direction separately, following the guidelines given.
A job flocks to another pool only when it cannot currently run in the pool where it was submitted.
When submitting jobs other than standard universe jobs, the user must consider the location of input, output, and error files. Machines in separate pools commonly do not share a file system, so the user will need to enable HTCondor’s file transfer mechanisms at submission. These mechanisms are discussed in section 2.5.9 on page 91.
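As an illustration, a submit description file for a job that may flock could enable file transfer along these lines. This is a sketch only; the executable and file names are hypothetical placeholders:

```
# Hypothetical vanilla universe job; all file names are placeholders.
universe                = vanilla
executable              = analyze
# Files to ship to the execute machine, wherever the job ends up running.
transfer_input_files    = input.dat
# Always use file transfer rather than relying on a shared file system.
should_transfer_files   = YES
# Bring output files back when the job exits.
when_to_transfer_output = ON_EXIT
output                  = analyze.out
error                   = analyze.err
log                     = analyze.log
queue
```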