Please see section 3.7.3.
HTCondor will automatically recognize an SMP machine and advertise each CPU of the machine separately. For more details, see section 3.5.10.
Please see section 3.5.10 for a lengthy discussion on this topic.
Restrictions on what jobs will run on a given resource are enforced by only starting jobs that meet specific constraints, and these constraints are specified as part of the configuration.
To specify that a given machine should only run certain users' jobs, and always run the jobs regardless of other activity on the machine, load average, etc., place the following entry in the machine's HTCondor configuration file:
START = ( (User == "email@example.com") || \
          (User == "firstname.lastname@example.org") )
A more likely scenario is that the machine is restricted to run only specific users' jobs, contingent on the machine not having other interactive activity and not being heavily loaded. The following entries are in the machine's HTCondor configuration file. Note that extra configuration variables are defined to make the START variable easier to read.
# Only start jobs if:
# 1) the job is owned by the allowed users, AND
# 2) the keyboard has been idle long enough, AND
# 3) the load average is low enough OR the machine is currently
#    running an HTCondor job, and would therefore accept running
#    a different one
AllowedUser    = ( (User == "email@example.com") || \
                   (User == "firstname.lastname@example.org") )
KeyboardUnused = (KeyboardIdle > $(StartIdleTime))
NoOwnerLoad    = ($(CPUIdle) || (State != "Unclaimed" && State != "Owner"))
START          = $(AllowedUser) && $(KeyboardUnused) && $(NoOwnerLoad)
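To check how the three conditions combine, the policy can be mirrored in a few lines of Python. This is only a sketch of the boolean logic, not HTCondor code: the function name and its parameters are hypothetical, and the values of StartIdleTime and CPUIdle are assumed to come from elsewhere in the configuration.

```python
# Hypothetical model of the START policy; names and parameters are
# illustrative and not part of HTCondor.
ALLOWED_USERS = {"email@example.com", "firstname.lastname@example.org"}


def start(user, keyboard_idle, start_idle_time, cpu_idle, state):
    """Return True if the modeled START expression would be True."""
    # 1) the job is owned by one of the allowed users
    allowed_user = user in ALLOWED_USERS
    # 2) the keyboard has been idle long enough
    keyboard_unused = keyboard_idle > start_idle_time
    # 3) the load is low, OR the machine is already running a job
    #    (it is neither Unclaimed nor in the Owner state)
    no_owner_load = cpu_idle or state not in ("Unclaimed", "Owner")
    return allowed_user and keyboard_unused and no_owner_load
```

In this model, an allowed user's job still starts on a loaded machine when the state is "Claimed", because the load is then attributed to the job already running there rather than to the machine's owner.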
To configure multiple machines to do so, create a common configuration file containing this entry for them to share.
This is a two-step process. First, you need to tell the machines to report that they have special software installed, and second, you need to tell the jobs to require machines that have that software.
To tell the machines to report the presence of special software, first add a parameter to their configuration files like so:
HAS_MY_SOFTWARE = True
And then, if there are already STARTD_ATTRS defined in that file, add HAS_MY_SOFTWARE to them, or, if not, add the line:
STARTD_ATTRS = HAS_MY_SOFTWARE, $(STARTD_ATTRS)
NOTE: For these changes to take effect, each condor_startd you update needs to be reconfigured with condor_reconfig -startd.
Next, to tell your jobs to only run on machines that have this software, add a requirements statement to their submit files like so:
Requirements = (HAS_MY_SOFTWARE =?= True)
NOTE: Be sure to use =?= instead of ==. If a machine doesn't have the HAS_MY_SOFTWARE parameter defined, == makes the job's Requirements expression evaluate to "undefined", which prevents the job from running anywhere! With =?=, the comparison evaluates to False on such a machine instead.
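The difference between the two operators can be illustrated with a small Python model. This is a simplified sketch, not the ClassAd implementation: UNDEFINED, eq, meta_eq, and the dictionary standing in for a machine's attributes are all hypothetical names introduced for illustration.

```python
# Simplified model of ClassAd comparison semantics; all names here are
# hypothetical and exist only for illustration.
UNDEFINED = object()  # stand-in for the ClassAd "undefined" value


def eq(a, b):
    """Model of ==: the result is undefined if either side is undefined."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a == b


def meta_eq(a, b):
    """Model of =?=: the result is always True or False."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return type(a) is type(b) and a == b


# A machine that advertises the attribute, and one that does not.
with_software = {"HAS_MY_SOFTWARE": True}
without_software = {}


def lookup(machine, name):
    """Return the attribute value, or UNDEFINED if it is not advertised."""
    return machine.get(name, UNDEFINED)
```

Here eq(lookup(without_software, "HAS_MY_SOFTWARE"), True) yields UNDEFINED, while meta_eq on the same arguments yields False, so the expression always has a definite value.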
A commonly requested policy for running batch jobs is to only allow them to run at night, or at other pre-specified times of the day. HTCondor allows you to configure this policy with the use of the ClockMin and ClockDay condor_startd attributes. A complete example of how to use these attributes for this kind of policy is discussed in subsubsection 3.5.9.
The RANDOM_INTEGER() macro can help in this instance. Instead of defining PERIODIC_CHECKPOINT to be a fixed interval, each machine is configured to randomly choose one of a set of intervals. For example, to set a machine's interval for producing checkpoints to within the range of two to three hours, use the following configuration:
PERIODIC_CHECKPOINT = $(LastCkpt) > ( 2 * $(HOUR) + \
                        $RANDOM_INTEGER(0,60,10) * $(MINUTE) )
The interval used is set at configuration time. Each machine is randomly assigned a different interval (2 hours, 2 hours and 10 minutes, 2 hours and 20 minutes, etc.) at which to produce checkpoints. Therefore, the various machines will not all attempt to produce checkpoints at the same time.
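The set of intervals a machine can end up with may be enumerated with a short Python sketch. It assumes that $(HOUR) and $(MINUTE) expand to 3600 and 60 seconds, and that the upper bound of the random range is inclusive, which matches the two-to-three-hour range described above. The function name is hypothetical.

```python
import random

# Assumed values of the $(HOUR) and $(MINUTE) macros, in seconds.
HOUR = 3600
MINUTE = 60

# Every value that 2 * HOUR + RANDOM_INTEGER(0,60,10) * MINUTE can take,
# assuming the upper bound of 60 is inclusive.
POSSIBLE_INTERVALS = [2 * HOUR + m * MINUTE for m in range(0, 61, 10)]


def pick_checkpoint_interval(rng=random):
    """Model one machine choosing its interval at configuration time."""
    return 2 * HOUR + rng.randrange(0, 61, 10) * MINUTE


if __name__ == "__main__":
    for seconds in POSSIBLE_INTERVALS:
        print(f"{seconds // HOUR} h {seconds % HOUR // MINUTE:02d} min")
```

Under these assumptions there are seven possible intervals, from 2 hours to 3 hours in 10-minute steps, so a large pool spreads its checkpoints across seven different schedules.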
If a LOCAL_CONFIG_FILE is specified in the global configuration file, but the specified file does not exist, the condor_master will not start up, and it prints a variation of the following example message.
ERROR: Can't read config file /mnt/condor/hosts/bagel/condor_config.local
This is not a bug; it is a feature! HTCondor has always worked this way on purpose. There is a potentially large security hole if HTCondor is configured to read from a file that does not exist. By creating that file, a malicious user could change all sorts of HTCondor settings. On a machine where the daemons are running as root, this is an easy way to gain root access.
The intent is that if you've set up your global configuration file to read from a local configuration file, and the local file is not there, then something is wrong. It is better for the condor_master to exit right away and log an error message than to start up.
If the condor_master continued with the local configuration file missing, either A) someone could breach security or B) you will have potentially important configuration information missing. Consider the example where the local configuration file was on an NFS partition and the server was down. There would be all sorts of really important stuff in the local configuration file, and HTCondor might do bad things if it started without those settings.
If it is supplied with an empty file, the condor_master works fine.