This is an outdated version of the HTCondor Manual. You can find current documentation at http://htcondor.org/manual.

Next: 3.6 Security Up: 3. Administrators' Manual Previous: 3.4 User Priorities and Contents Index

Subsections

3.5 Policy Configuration for the condor_startd

This section describes the configuration of machines, such that they, through the condor_startd daemon, implement a desired policy for when remote jobs should start, be suspended, (possibly) resumed, vacate (with a checkpoint) or be killed (no checkpoint). This policy is the heart of Condor's balancing act between the needs and wishes of resource owners (machine owners) and resource users (people submitting their jobs to Condor). Please read this section carefully if you plan to change any of the settings described here, as a wrong setting can have a severe impact on either the owners of machines in your pool (they may ask to be removed from the pool entirely) or the users of your pool (they may stop using Condor).

Before the details, there are a few things to note:

Much of this section refers to ClassAd expressions. Please read through section 4.1 on ClassAd expressions before continuing.
If defining the policy for an SMP machine (a multi-CPU machine), also read section 3.13.9 for specific information on configuring the condor_startd daemon for SMP machines. Each slot represented by the condor_startd daemon on an SMP machine has its own state and activity (as described below). In the future, each slot will be able to have its own individual policy expressions defined. Within this manual section, the word ``machine'' refers to an individual slot within an SMP machine.

To define a policy, set expressions in the configuration file (see section 3.3 on Configuring Condor for an introduction to Condor's configuration files). The expressions are evaluated in the context of the machine's ClassAd and a job ClassAd. The expressions can therefore reference attributes from either ClassAd. See the unnumbered Appendix on page for a list of job ClassAd attributes. See the unnumbered Appendix on page for a list of machine ClassAd attributes. The START expression is explained. It describes the conditions that must be met for a machine to start a job. The RANK expression for a machine is described. It allows the specification of the kinds of jobs a machine prefers to run. A final discussion details how the condor_startd daemon works. Included are the machine states and activities, to give an idea of what is possible in policy decisions. Two example policy settings are presented.

3.5.1 Startd ClassAd Attributes

The condor_startd daemon represents the machine on which it is running to the Condor pool. The daemon publishes characteristics about the machine in the machine's ClassAd to aid matchmaking with resource requests. The values of these attributes may be listed by using the command: condor_status -l hostname. On an SMP machine, the condor_startd will break the machine up and advertise it as separate slots, each with its own name and ClassAd.

3.5.2 The `START` expression

The most important expression to the condor_startd is the START expression. This expression describes the conditions that must be met for a machine to run a job. This expression can reference attributes in the machine's ClassAd (such as KeyboardIdle and LoadAvg) and attributes in a job ClassAd (such as Owner, Imagesize, and Cmd, the name of the executable the job will run). The value of the START expression plays a crucial role in determining the state and activity of a machine.

The Requirements expression is used for matching machines with jobs.

The condor_startd defines the Requirements expression by logically anding the START expression and the IS_VALID_CHECKPOINT_PLATFORM expression.

In situations where a machine wants to make itself unavailable for further matches, the Requirements expression is set to FALSE. When the START expression locally evaluates to TRUE, the machine advertises the Requirements expression as TRUE and does not publish the START expression.

Normally, the expressions in the machine ClassAd are evaluated against certain request ClassAds in the condor_negotiator to see if there is a match, or against whatever request ClassAd currently has claimed the machine. However, by locally evaluating an expression, the machine only evaluates the expression against its own ClassAd. If an expression cannot be locally evaluated (because it references other expressions that are only found in a request ad, such as Owner or Imagesize), the expression is (usually) undefined. See section 4.1 for specifics on how undefined terms are handled in ClassAd expression evaluation.

A note of caution is in order when modifying the START to reference job ClassAd attributes. The default Is_OWNER expression is a function of the START expression

START =?= FALSE

See a detailed discussion of the IS_OWNER expression in section 3.5.7. However, the machine locally evaluates the IS_OWNER expression to determine if it is capable of running jobs for Condor. Any job ClassAd attributes appearing in the START expression, and hence in the IS_OWNER expression are undefined in this context, and may lead to unexpected behavior. Whenever the START expression is modified to reference job ClassAd attributes, the IS_OWNER expression should also be modified to reference only machine ClassAd attributes.

NOTE: If you have machines with lots of real memory and swap space such that the only scarce resource is CPU time, consider defining JOB_RENICE_INCREMENT so that Condor starts jobs on the machine with low priority. Then, further configure to set up the machines with:

        START = True
        SUSPEND = False
        PREEMPT = False
        KILL = False

In this way, Condor jobs always run and can never be kicked off from activity on the machine. However, because they would run with ``nice priority'', interactive response on the machines will not suffer. You probably would not notice Condor was running the jobs, assuming you had enough free memory for the Condor jobs that there was little swapping.

3.5.3 The `IS_VALID_CHECKPOINT_PLATFORM` expression

A checkpoint is the platform-dependent information necessary to continue the execution of a standard universe job. Therefore, the machine (platform) upon which a job executed and produced a checkpoint limits the machines (platforms) which may use the checkpoint to continue job execution. This platform-dependent information is no longer the obvious combination of architecture and operating system, but may include subtle items such as the difference between the normal, bigmem, and hugemem kernels within the Linux operating system. This results in the incorporation of a separate expression to indicate the ability of a machine to resume and continue the execution of a job that has produced a checkpoint. The REQUIREMENTS expression is dependent on this information.

At a high level, IS_VALID_CHECKPOINT_PLATFORM is an expression which becomes true when a job's checkpoint platform matches the current checkpointing platform of the machine. Since this expression is anded with the START expression to produce the REQUIREMENTS expression, it must also behave correctly when evaluating in the context of jobs that are not standard universe.

In words, the current default policy for this expression:

Any non standard universe job may run on this machine. A standard universe job may run on machines with the new checkpointing identification system. A standard universe job may run if it has not yet produced a first checkpoint. If a standard universe job has produced a checkpoint, then make sure the checkpoint platforms between the job and the machine match.

The following is the default boolean expression for this policy. A JobUniverse value of 1 denotes the standard universe. This expression may be overridden in the Condor configuration files.

IS_VALID_CHECKPOINT_PLATFORM = 
(
  ( (TARGET.JobUniverse == 1) == FALSE) ||

  (
    (MY.CheckpointPlatform =!= UNDEFINED) &&
    (
      (TARGET.LastCheckpointPlatform =?= MY.CheckpointPlatform) ||
      (TARGET.NumCkpts == 0)
    )
  )
)

IS_VALID_CHECKPOINT_PLATFORM is a separate policy expression because the complexity of IS_VALID_CHECKPOINT_PLATFORM can be very high. While this functionality is conceptually separate from the normal START policies usually constructed, it is also a part of the Requirements to allow the job to run.

3.5.4 The `RANK` expression

A machine may be configured to prefer certain jobs over others using the RANK expression. It is an expression, like any other in a machine ClassAd. It can reference any attribute found in either the machine ClassAd or a job ClassAd. The most common use of this expression is likely to configure a machine to prefer to run jobs from the owner of that machine, or by extension, a group of machines to prefer jobs from the owners of those machines.

For example, imagine there is a small research group with 4 machines called tenorsax, piano, bass, and drums. These machines are owned by the 4 users coltrane, tyner, garrison, and jones, respectively.

Assume that there is a large Condor pool in the department, and this small research group has spent a lot of money on really fast machines for the group. As part of the larger pool, but to implement a policy that gives priority on the fast machines to anyone in the small research group, set the RANK expression on the machines to reference the Owner attribute and prefer requests where that attribute matches one of the people in the group as in

  RANK = Owner == "coltrane" || Owner == "tyner" \
    || Owner == "garrison" || Owner == "jones"

The RANK expression is evaluated as a floating point number. However, like in C, boolean expressions evaluate to either 1 or 0 depending on if they are True or False. So, if this expression evaluated to 1, because the remote job was owned by one of the preferred users, it would be a larger value than any other user for whom the expression would evaluate to 0.

A more complex RANK expression has the same basic set up, where anyone from the group has priority on their fast machines. Its difference is that the machine owner has better priority on their own machine. To set this up for Garrison's machine (bass), place the following entry in the local configuration file of machine bass:

  RANK = (Owner == "coltrane") + (Owner == "tyner") \
    + ((Owner == "garrison") * 10) + (Owner == "jones")

Note that the parentheses in this expression are important, because the + operator has higher default precedence than ==.

The use of + instead of || allows us to distinguish which terms matched and which ones did not. If anyone not in the research group quartet was running a job on the machine called bass, the RANK would evaluate numerically to 0, since none of the boolean terms evaluates to 1, and 0+0+0+0 still equals 0.

Suppose Elvin Jones submits a job. His job would match the bass machine, assuming START evaluated to True for him at that time. The RANK would numerically evaluate to 1. Therefore, the Elvin Jones job could preempt the Condor job currently running. Further assume that later Jimmy Garrison submits a job. The RANK evaluates to 10 on machine bass, since the boolean that matches gets multiplied by 10. Due to this, Jimmy Garrison's job could preempt Elvin Jones' job on the bass machine where Jimmy Garrison's jobs are preferred.

The RANK expression is not required to reference the Owner of the jobs. Perhaps there is one machine with an enormous amount of memory, and others with not much at all. Perhaps configure this large-memory machine to prefer to run jobs with larger memory requirements:

  RANK = ImageSize

That's all there is to it. The bigger the job, the more this machine wants to run it. It is an altruistic preference, always servicing the largest of jobs, no matter who submitted them. A little less altruistic is the RANK on Coltrane's machine that prefers John Coltrane's jobs over those with the largest Imagesize:

  RANK = (Owner == "coltrane" * 1000000000000) + Imagesize

This RANK does not work if a job is submitted with an image size of more $10^{12}$ Kbytes. However, with that size, this RANK expression preferring that job would not be Condor's only problem!

3.5.5 Machine States

A machine is assigned a state by Condor. The state depends on whether or not the machine is available to run Condor jobs, and if so, what point in the negotiations has been reached. The possible states are

Owner

The machine is being used by the machine owner, and/or is not available to run Condor jobs. When the machine first starts up, it begins in this state.

Unclaimed

The machine is available to run Condor jobs, but it is not currently doing so.

Matched

The machine is available to run jobs, and it has been matched by the negotiator with a specific schedd. That schedd just has not yet claimed this machine. In this state, the machine is unavailable for further matches.

Claimed

The machine has been claimed by a schedd.

Preempting

The machine was claimed by a schedd, but is now preempting that claim for one of the following reasons.

the owner of the machine came back
another user with higher priority has jobs waiting to run
another request that this resource would rather serve was found

Backfill

The machine is running a backfill computation while waiting for either the machine owner to come back or to be matched with a Condor job. This state is only entered if the machine is specifically configured to enable backfill jobs.

Figure 3.3 shows the states and the possible transitions between the states.

**Figure 3.3:** Machine States
$\includegraphics{admin-man/machine-states.eps}$

Each transition is labeled with a letter. The cause of each transition is described below.

Transitions out of the Owner state

A

The machine switches from Owner to Unclaimed whenever the START expression no longer locally evaluates to FALSE. This indicates that the machine is potentially available to run a Condor job.
Transitions out of the Unclaimed state

B

The machine switches from Unclaimed back to Owner whenever the START expression locally evaluates to FALSE. This indicates that the machine is unavailable to run a Condor job and is in use by the resource owner.

C

The transition from Unclaimed to Matched happens whenever the condor_negotiator matches this resource with a Condor job.

D

The transition from Unclaimed directly to Claimed also happens if the condor_negotiator matches this resource with a Condor job. In this case the condor_schedd receives the match and initiates the claiming protocol with the machine before the condor_startd receives the match notification from the condor_negotiator.

E

The transition from Unclaimed to Backfill happens if the machine is configured to run backfill computations (see section 3.13.11) and the START_BACKFILL expression evaluates to TRUE.
Transitions out of the Matched state

F

The machine moves from Matched to Owner if either the START expression locally evaluates to FALSE, or if the MATCH_TIMEOUT timer expires. This timeout is used to ensure that if a machine is matched with a given condor_schedd, but that condor_schedd does not contact the condor_startd to claim it, that the machine will give up on the match and become available to be matched again. In this case, since the START expression does not locally evaluate to FALSE, as soon as transition F is complete, the machine will immediately enter the Unclaimed state again (via transition A). The machine might also go from Matched to Owner if the condor_schedd attempts to perform the claiming protocol but encounters some sort of error. Finally, the machine will move into the Owner state if the condor_startd receives a condor_vacate command while it is in the Matched state.

G

The transition from Matched to Claimed occurs when the condor_schedd successfully completes the claiming protocol with the condor_startd.
Transitions out of the Claimed state
H
From the Claimed state, the only possible destination is the Preempting state. This transition can be caused by many reasons:
- The condor_schedd that has claimed the machine has no more work to perform and releases the claim
- The PREEMPT expression evaluates to TRUE (which usually means the resource owner has started using the machine again and is now using the keyboard, mouse, CPU, etc)
- The condor_startd receives a condor_vacate command
- The condor_startd is told to shutdown (either via a signal or a condor_off command)
- The resource is matched to a job with a better priority (either a better user priority, or one where the machine rank is higher)
Transitions out of the Preempting state

I

The resource will move from Preempting back to Claimed if the resource was matched to a job with a better priority.

J

The resource will move from Preempting to Owner if the PREEMPT expression had evaluated to TRUE, if condor_vacate was used, or if the START expression locally evaluates to FALSE when the condor_startd has finished evicting whatever job it was running when it entered the Preempting state.
Transitions out of the Backfill state
K
The resource will move from Backfill to Owner for the following reasons:
- The EVICT_BACKFILL expression evaluates to TRUE
- The condor_startd receives a condor_vacate command
- The condor_startd is being shutdown
L

The transition from Backfill to Matched occurs whenever a resource running a backfill computation is matched with a condor_schedd that wants to run a Condor job.

M

The transition from Backfill directly to Claimed is similar to the transition from Unclaimed directly to Claimed. It only occurs if the condor_schedd completes the claiming protocol before the condor_startd receives the match notification from the condor_negotiator.

3.5.5.1 The Claimed State and Leases

When a condor_schedd claims a condor_startd, there is a claim lease. So long as the keep alive updates from the condor_schedd to the condor_startd continue to arrive, the lease is reset. If the lease duration passes with no updates, the condor_startd drops the claim and evicts any jobs the condor_schedd sent over.

The alive interval is the amount of time between, or the frequency at which the condor_schedd sends keep alive updates to all condor_schedd daemons. An alive update resets the claim lease at the condor_startd. Updates are UDP packets.

Initially, as when the condor_schedd starts up, the alive interval starts at the value set by the configuration variable ALIVE_INTERVAL . It may be modified when a job is started. The job's ClassAd attribute JobLeaseDuration is checked. If the value of JobLeaseDuration/3 is less than the current alive interval, then the alive interval is set to either this lower value or the imposed lowest limit on the alive interval of 10 seconds. Thus, the alive interval starts at ALIVE_INTERVAL and goes down, never up.

If a claim lease expires, the condor_startd will drop the claim. The length of the claim lease is the job's ClassAd attribute JobLeaseDuration. JobLeaseDuration defaults to 20 minutes time, except when explicitly set within the job's submit description file. If JobLeaseDuration is explicitly set to 0, or it is not set as may be the case for a Web Services job that does not define the attribute, then JobLeaseDuration is given the Undefined value. Further, when undefined, the claim lease duration is calculated with MAX_CLAIM_ALIVES_MISSED * alive interval. The alive interval is the current value, as sent by the condor_schedd. If the condor_schedd reduces the current alive interval, it does not update the condor_startd.

3.5.6 Machine Activities

Within some machine states, activities of the machine are defined. The state has meaning regardless of activity. Differences between activities are significant. Therefore, a ``state/activity'' pair describes a machine. The following list describes all the possible state/activity pairs.

Owner

Idle

This is the only activity for Owner state. As far as Condor is concerned the machine is Idle, since it is not doing anything for Condor.
Unclaimed

Idle

This is the normal activity of Unclaimed machines. The machine is still Idle in that the machine owner is willing to let Condor jobs run, but Condor is not using the machine for anything.

Benchmarking

The machine is running benchmarks to determine the speed on this machine. This activity only occurs in the Unclaimed state. How often the activity occurs is determined by the RUNBENCHMARKS expression.
Matched

Idle

When Matched, the machine is still Idle to Condor.
Claimed

Idle

In this activity, the machine has been claimed, but the schedd that claimed it has yet to activate the claim by requesting a condor_starter to be spawned to service a job. The machine returns to this state (usually briefly) when jobs (and therefore condor_starter) finish.

Busy

Once a condor_starter has been started and the claim is active, the machine moves to the Busy activity to signify that it is doing something as far as Condor is concerned.

Suspended

If the job is suspended by Condor, the machine goes into the Suspended activity. The match between the schedd and machine has not been broken (the claim is still valid), but the job is not making any progress and Condor is no longer generating a load on the machine.

Retiring

When an active claim is about to be preempted for any reason, it enters retirement, while it waits for the current job to finish. The MaxJobRetirementTime expression determines how long to wait (counting since the time the job started). Once the job finishes or the retirement time expires, the Preempting state is entered.
Preempting The preempting state is used for evicting a Condor job from a given machine. When the machine enters the Preempting state, it checks the WANT_VACATE expression to determine its activity.

Vacating

In the Vacating activity, the job that was running is in the process of checkpointing. As soon as the checkpoint process completes, the machine moves into either the Owner state or the Claimed state, depending on the reason for its preemption.

Killing

Killing means that the machine has requested the running job to exit the machine immediately, without checkpointing.
Backfill

Idle

The machine is configured to run backfill jobs and is ready to do so, but it has not yet had a chance to spawn a backfill manager (for example, the BOINC client).

Busy

The machine is performing a backfill computation.

Killing

The machine was running a backfill computation, but it is now killing the job to either return resources to the machine owner, or to make room for a regular Condor job.

Figure 3.4 on page gives the overall view of all machine states and activities and shows the possible transitions from one to another within the Condor system. Each transition is labeled with a number on the diagram, and transition numbers referred to in this manual will be bold.

**Figure 3.4:** Machine States and Activities
$\includegraphics{admin-man/machine-activities.eps}$

Various expressions are used to determine when and if many of these state and activity transitions occur. Other transitions are initiated by parts of the Condor protocol (such as when the condor_negotiator matches a machine with a schedd). The following section describes the conditions that lead to the various state and activity transitions.

3.5.7 State and Activity Transitions

This section traces through all possible state and activity transitions within a machine and describes the conditions under which each one occurs. Whenever a transition occurs, Condor records when the machine entered its new activity and/or new state. These times are often used to write expressions that determine when further transitions occurred. For example, enter the Killing activity if a machine has been in the Vacating activity longer than a specified amount of time.

3.5.7.1 Owner State

When the startd is first spawned, the machine it represents enters the Owner state. The machine remains in the Owner state while the expression IS_OWNER is TRUE. If the IS_OWNER expression is FALSE, then the machine transitions to the Unclaimed state. The default value for the IS_OWNER expression is optimized for a shared resource

START =?= FALSE

So, the machine will remain in the Owner state as long as the START expression locally evaluates to FALSE. Section 3.5.2 provides more detail on the START expression. If the START locally evaluates to TRUE or cannot be locally evaluated (it evaluates to UNDEFINED), transition 1 occurs and the machine enters the Unclaimed state. The IS_OWNER expression is locally evaluated by the machine, and should not reference job ClassAd attributes, which would be UNDEFINED.

For dedicated resources, the recommended value for the IS_OWNER expression is FALSE.

The Owner state represents a resource that is in use by its interactive owner (for example, if the keyboard is being used). The Unclaimed state represents a resource that is neither in use by its interactive user, nor the Condor system. From Condor's point of view, there is little difference between the Owner and Unclaimed states. In both cases, the resource is not currently in use by the Condor system. However, if a job matches the resource's START expression, the resource is available to run a job, regardless of if it is in the Owner or Unclaimed state. The only differences between the two states are how the resource shows up in condor_status and other reporting tools, and the fact that Condor will not run benchmarking on a resource in the Owner state. As long as the IS_OWNER expression is TRUE, the machine is in the Owner State. When the IS_OWNER expression is FALSE, the machine goes into the Unclaimed State.

Here is an example that assumes that an IS_OWNER expression is not present in the configuration. If the START expression is

START = KeyboardIdle > 15 * $(MINUTE) && Owner == "coltrane"

and if KeyboardIdle is 34 seconds, then the machine would remain in the Owner state. Owner is undefined, and anything && FALSE is FALSE.

If, however, the START expression is

        START = KeyboardIdle > 15 * $(MINUTE) || Owner == "coltrane"

and KeyboardIdle is 34 seconds, then the machine leaves the Owner state and becomes Unclaimed. This is because FALSE || UNDEFINED is UNDEFINED. So, while this machine is not available to just anybody, if user coltrane has jobs submitted, the machine is willing to run them. Any other user's jobs have to wait until KeyboardIdle exceeds 15 minutes. However, since coltrane might claim this resource, but has not yet, the machine goes to the Unclaimed state.

While in the Owner state, the startd polls the status of the machine every UPDATE_INTERVAL to see if anything has changed that would lead it to a different state. This minimizes the impact on the Owner while the Owner is using the machine. Frequently waking up, computing load averages, checking the access times on files, computing free swap space take time, and there is nothing time critical that the startd needs to be sure to notice as soon as it happens. If the START expression evaluates to TRUE and five minutes pass before the startd notices, that's a drop in the bucket of high-throughput computing.

The machine can only transition to the Unclaimed state from the Owner state. It does so when the IS_OWNER expression no longer evaluates to FALSE. By default, that happens when START no longer locally evaluates to FALSE.

Whenever the machine is not actively running a job, it will transition back to the Owner state if IS_OWNER evaluates to TRUE. Once a job is started, the value of IS_OWNER does not matter; the job either runs to completion or is preempted. Therefore, you must configure the preemption policy if you want to transition back to the Owner state from Claimed Busy.

3.5.7.2 Unclaimed State

If the IS_OWNER expression becomes TRUE, then the machine returns to the Owner state. If the IS_OWNER expression becomes FALSE, then the machine remains in the Unclaimed state. If the IS_OWNER expression is not present in the configuration files, then the default value for the IS_OWNER expression is

START =?= FALSE

so that while in the Unclaimed state, if the START expression locally evaluates to FALSE, the machine returns to the Owner state by transition 2.

When in the Unclaimed state, the RUNBENCHMARKS expression is relevant. If RUNBENCHMARKS evaluates to TRUE while the machine is in the Unclaimed state, then the machine will transition from the Idle activity to the Benchmarking activity (transition 3) and perform benchmarks to determine MIPS and KFLOPS. When the benchmarks complete, the machine returns to the Idle activity (transition 4).

The startd automatically inserts an attribute, LastBenchmark, whenever it runs benchmarks, so commonly RunBenchmarks is defined in terms of this attribute, for example:

        BenchmarkTimer = (CurrentTime - LastBenchmark)
        RunBenchmarks = $(BenchmarkTimer) >= (4 * $(HOUR))

Here, a macro, BenchmarkTimer is defined to help write the expression. This macro holds the time since the last benchmark, so when this time exceeds 4 hours, we run the benchmarks again. The startd keeps a weighted average of these benchmarking results to try to get the most accurate numbers possible. This is why it is desirable for the startd to run them more than once in its lifetime.

NOTE: LastBenchmark is initialized to 0 before benchmarks have ever been run. To have the condor_startd run benchmarks as soon as the machine is Unclaimed (if it has not done so already), include a term using LastBenchmark as in the example above.

NOTE: If RUNBENCHMARKS is defined and set to something other than FALSE, the startd will automatically run one set of benchmarks when it first starts up. To disable benchmarks, both at startup and at any time thereafter, set RUNBENCHMARKS to FALSE or comment it out of the configuration file.

From the Unclaimed state, the machine can go to four other possible states: Owner (transition 2), Backfill/Idle, Matched, or Claimed/Idle.

Once the condor_negotiator matches an Unclaimed machine with a requester at a given schedd, the negotiator sends a command to both parties, notifying them of the match. If the schedd receives that notification and initiates the claiming procedure with the machine before the negotiator's message gets to the machine, the Match state is skipped, and the machine goes directly to the Claimed/Idle state (transition 5). However, normally the machine will enter the Matched state (transition 6), even if it is only for a brief period of time.

If the machine has been configured to perform backfill jobs (see section 3.13.11), while it is in Unclaimed/Idle it will evaluate the START_BACKFILL expression. Once START_BACKFILL evaluates to TRUE, the machine will enter the Backfill/Idle state (transition 7) to begin the process of running backfill jobs.

3.5.7.3 Matched State

The Matched state is not very interesting to Condor. Noteworthy in this state is that the machine lies about its START expression while in this state and says that Requirements are False to prevent being matched again before it has been claimed. Also interesting is that the startd starts a timer to make sure it does not stay in the Matched state too long. The timer is set with the MATCH_TIMEOUT configuration file macro. It is specified in seconds and defaults to 120 (2 minutes). If the schedd that was matched with this machine does not claim it within this period of time, the machine gives up, and goes back into the Owner state via transition 8. It will probably leave the Owner state right away for the Unclaimed state again and wait for another match.

At any time while the machine is in the Matched state, if the START expression locally evaluates to FALSE, the machine enters the Owner state directly (transition 8).

If the schedd that was matched with the machine claims it before the MATCH_TIMEOUT expires, the machine goes into the Claimed/Idle state (transition 9).

3.5.7.4 Claimed State

The Claimed state is certainly the most complex state. It has the most possible activities and the most expressions that determine its next activities. In addition, the condor_checkpoint and condor_vacate commands affect the machine when it is in the Claimed state. In general, there are two sets of expressions that might take effect. They depend on the universe of the request: standard or vanilla. The standard universe expressions are the normal expressions. For example:

        WANT_SUSPEND            = True
        WANT_VACATE             = $(ActivationTimer) > 10 * $(MINUTE)
        SUSPEND                 = $(KeyboardBusy) || $(CPUBusy)
        ...

The vanilla expressions have the string``_VANILLA'' appended to their names. For example:

        WANT_SUSPEND_VANILLA    = True
        WANT_VACATE_VANILLA     = True
        SUSPEND_VANILLA         = $(KeyboardBusy) || $(CPUBusy)
        ...

Without specific vanilla versions, the normal versions will be used for all jobs, including vanilla jobs. In this manual, the normal expressions are referenced. The difference exists for the the resource owner that might want the machine to behave differently for vanilla jobs, since they cannot checkpoint. For example, owners may want vanilla jobs to remain suspended for longer than standard jobs.

While Claimed, the POLLING_INTERVAL takes effect, and the startd polls the machine much more frequently to evaluate its state.

If the machine owner starts typing on the console again, it is best to notice this as soon as possible to be able to start doing whatever the machine owner wants at that point. For SMP machines, if any slot is in the Claimed state, the startd polls the machine frequently. If already polling one slot, it does not cost much to evaluate the state of all the slots at the same time.

There are a variety of events that may cause the startd to try to get rid of or temporarily suspend a running job. Activity on the machine's console, load from other jobs, or shutdown of the startd via an administrative command are all possible sources of interference. Another one is the appearance of a higher priority claim to the machine by a different Condor user.

Depending on the configuration, the startd may respond quite differently to activity on the machine, such as keyboard activity or demand for the cpu from processes that are not managed by Condor. The startd can be configured to completely ignore such activity or to suspend the job or even to kill it. A standard configuration for a desktop machine might be to go through successive levels of getting the job out of the way. The first and least costly to the job is suspending it. This works for both standard and vanilla jobs. If suspending the job for a short while does not satisfy the machine owner (the owner is still using the machine after a specific period of time), the startd moves on to vacating the job. Vacating a standard universe job involves performing a checkpoint so that the work already completed is not lost. Vanilla jobs are sent a soft kill signal so that they can gracefully shut down if necessary; the default is SIGTERM. If vacating does not satisfy the machine owner (usually because it is taking too long and the owner wants their machine back now), the final, most drastic stage is reached: killing. Killing is a quick death to the job, using a hard-kill signal that cannot be intercepted by the application. For vanilla jobs that do no special signal handling, vacating and killing are equivalent.

The WANT_SUSPEND expression determines if the machine will evaluate the SUSPEND expression to consider entering the Suspended activity. The WANT_VACATE expression determines what happens when the machine enters the Preempting state. It will go to the Vacating activity or directly to Killing. If one or both of these expressions evaluates to FALSE, the machine will skip that stage of getting rid of the job and proceed directly to the more drastic stages.

When the machine first enters the Claimed state, it goes to the Idle activity. From there, it has two options. It can enter the Preempting state via transition 10 (if a condor_vacate arrives, or if the START expression locally evaluates to FALSE), or it can enter the Busy activity (transition 11) if the schedd that has claimed the machine decides to activate the claim and start a job.

From Claimed/Busy, the machine can transition to three other state/activity pairs. The startd evaluates the WANT_SUSPEND expression to decide which other expressions to evaluate. If WANT_SUSPEND is TRUE, then the startd evaluates the SUSPEND expression. If WANT_SUSPEND is any value other than TRUE, then the startd will evaluate the PREEMPT expression and skip the Suspended activity entirely. By transition, the possible state/activity destinations from Claimed/Busy:

Claimed/Idle

If the starter that is serving a given job exits (for example because the jobs completes), the machine will go to Claimed/Idle (transition 12).

Claimed/Retiring

If WANT_SUSPEND is FALSE and the PREEMPT expression is TRUE, the machine enters the Retiring activity (transition 13). From there, it waits for a configurable amount of time for the job to finish before moving on to preemption.

Another reason the machine would go from Claimed/Busy to Claimed/Retiring is if the condor_negotiator matched the machine with a ``better'' match. This better match could either be from the machine's perspective using the startd RANK expression, or it could be from the negotiator's perspective due to a job with a higher user priority.

Another case resulting in a transition to Claimed/Retiring is when the startd is being shut down. The only exception is a ``fast'' shutdown, which bypasses retirement completely.

Claimed/Suspended

If both the WANT_SUSPEND and SUSPEND expressions evaluate to TRUE, the machine suspends the job (transition 14).

If a condor_checkpoint command arrives, or the PERIODIC_CHECKPOINT expression evaluates to TRUE, there is no state change. The startd has no way of knowing when this process completes, so periodic checkpointing can not be another state. Periodic checkpointing remains in the Claimed/Busy state and appears as a running job.

From the Claimed/Suspended state, the following transitions may occur:

Claimed/Busy: If the CONTINUE expression evaluates to TRUE, the machine resumes the job and enters the Claimed/Busy state (transition 15) or the Claimed/Retiring state (transition 16), depending on whether the claim has been preempted.
Claimed/Retiring: If the PREEMPT expression is TRUE, the machine will enter the Claimed/Retiring activity (transition 16).
Preempting: If the claim is in suspended retirement and the retirement time expires, the job enters the Preempting state (transition 17). This is only possible if MaxJobRetirementTime decreases during the suspension.

For the Claimed/Retiring state, the following transitions may occur:

Preempting: If the job finishes or the job's run time exceeds the value defined for the job ClassAd attribute MaxJobRetirementTime, the Preempting state is entered (transition 18). The run time is computed from the time when the job was started by the startd minus any suspension time. When retiring due to condor_startd daemon shutdown or restart, it is possible for the administrator to issue a peaceful shutdown command, which causes MaxJobRetirementTime to effectively be infinite, avoiding any killing of jobs. It is also possible for the administrator to issue a fast shutdown command, which causes MaxJobRetirementTime to be effectively 0.
Claimed/Busy: If the startd was retiring because of a preempting claim only and the preempting claim goes away, the normal Claimed/Busy state is resumed (transition 19). If instead the retirement is due to owner activity (PREEMPT) or the startd is being shut down, no unretirement is possible.
Claimed/Suspended: In exactly the same way that suspension may happen from the Claimed/Busy state, it may also happen during the Claimed/Retiring state (transition 20). In this case, when the job continues from suspension, it moves back into Claimed/Retiring (transition 16) instead of Claimed/Busy (transition 15).

3.5.7.5 Preempting State

The Preempting state is less complex than the Claimed state. There are two activities. Depending on the value of WANT_VACATE, a machine will be in the Vacating activity (if TRUE) or the Killing activity (if FALSE).

While in the Preempting state (regardless of activity) the machine advertises its Requirements expression as FALSE to signify that it is not available for further matches, either because it is about to transition to the Owner state, or because it has already been matched with one preempting match, and further preempting matches are disallowed until the machine has been claimed by the new match.

The main function of the Preempting state is to get rid of the starter associated with the resource. If the condor_starter associated with a given claim exits while the machine is still in the Vacating activity, then the job successfully completed a graceful shutdown. For standard universe jobs, this means that a checkpoint was saved. For other jobs, this means the application was given an opportunity to do a graceful shutdown, by intercepting the soft kill signal.

If the machine is in the Vacating activity, it keeps evaluating the KILL expression. As soon as this expression evaluates to TRUE, the machine enters the Killing activity (transition 21).

When the starter exits, or if there was no starter running when the machine enters the Preempting state (transition 10), the other purpose of the Preempting state is completed: notifying the schedd that had claimed this machine that the claim is broken.

At this point, the machine enters either the Owner state by transition 22 (if the job was preempted because the machine owner came back) or the Claimed/Idle state by transition 23 (if the job was preempted because a better match was found).

If the machine enters the Killing activity, (because either WANT_VACATE was FALSE or the KILL expression evaluated to TRUE), it attempts to force the condor_starter to immediately kill the underlying Condor job. Once the machine has begun to hard kill the Condor job, the condor_startd starts a timer, the length of which is defined by the KILLING_TIMEOUT macro. This macro is defined in seconds and defaults to 30. If this timer expires and the machine is still in the Killing activity, something has gone seriously wrong with the condor_starter and the startd tries to vacate the job immediately by sending SIGKILL to all of the condor_starter's children, and then to the condor_starter itself.

Once the condor_starter has killed off all the processes associated with the job and exited, and once the schedd that had claimed the machine is notified that the claim is broken, the machine will leave the Preempting/Killing state. If the job was preempted because a better match was found, the machine will enter Claimed/Idle (transition 24). If the preemption was caused by the machine owner (the PREEMPT expression evaluated to TRUE, condor_vacate was used, etc), the machine will enter the Owner state (transition 25).

3.5.7.6 Backfill State

The Backfill state is used whenever the machine is performing low priority background tasks to keep itself busy. For more information about backfill support in Condor, see section 3.13.11 on page . This state is only used if the machine has been configured to enable backfill computation, if a specific backfill manager has been installed and configured, and if the machine is otherwise idle (not being used interactively or for regular Condor computations). If the machine meets all these requirements, and the START_BACKFILL expression evaluates to TRUE, the machine will move from the Unclaimed/Idle state to Backfill/Idle (transition 7).

Once a machine is in Backfill/Idle, it will immediately attempt to spawn whatever backfill manager it has been configured to use (currently, only the BOINC client is supported as a backfill manager in Condor). Once the BOINC client is running, the machine will enter Backfill/Busy (transition 26) to indicate that it is now performing a backfill computation.

NOTE: On SMP machines, the condor_startd will only spawn a single instance of the BOINC client, even if multiple slots are available to run backfill jobs. Therefore, only the first machine to enter Backfill/Idle will cause a copy of the BOINC client to start running. If a given slot on an SMP enters the Backfill state and a BOINC client is already running under this condor_startd, the slot will immediately enter Backfill/Busy without waiting to spawn another copy of the BOINC client.

If the BOINC client ever exits on its own (which normally wouldn't happen), the machine will go back to Backfill/Idle (transition 27) where it will immediately attempt to respawn the BOINC client (and return to Backfill/Busy via transition 26).

As the BOINC client is running a backfill computation, a number of events can occur that will drive the machine out of the Backfill state. The machine can get matched or claimed for a Condor job, interactive users can start using the machine again, the machine might be evicted with condor_vacate, or the condor_startd might be shutdown. All of these events cause the condor_startd to kill the BOINC client and all its descendants, and enter the Backfill/Killing state (transition 28).

Once the BOINC client and all its children have exited the system, the machine will enter the Backfill/Idle state to indicate that the BOINC client is now gone (transition 29). As soon as it enters Backfill/Idle after the BOINC client exits, the machine will go into another state, depending on what caused the BOINC client to be killed in the first place.

If the EVICT_BACKFILL expression evaluates to TRUE while a machine is in Backfill/Busy, after the BOINC client is gone, the machine will go back into the Owner/Idle state (transition 30). The machine will also return to the Owner/Idle state after the BOINC client exits if condor_vacate was used, or if the condor_startd is being shutdown.

When a machine running backfill jobs is matched with a requester that wants to run a Condor job, the machine will either enter the Matched state, or go directly into Claimed/Idle. As with the case of a machine in Unclaimed/Idle (described above), the condor_negotiator informs both the condor_startd and the condor_schedd of the match, and the exact state transitions at the machine depend on what order the various entities initiate communication with each other. If the condor_schedd is notified of the match and sends a request to claim the condor_startd before the condor_negotiator has a chance to notify the condor_startd, once the BOINC client exits, the machine will immediately enter Claimed/Idle (transition 31). Normally, the notification from the condor_negotiator will reach the condor_startd before the condor_schedd attempts to claim it. In this case, once the BOINC client exits, the machine will enter Matched/Idle (transition 32).

3.5.8 State/Activity Transition Expression Summary

This section is a summary of the information from the previous sections. It serves as a quick reference.

START

When TRUE, the machine is willing to spawn a remote Condor job.

RUNBENCHMARKS

While in the Unclaimed state, the machine will run benchmarks whenever TRUE.

MATCH_TIMEOUT

If the machine has been in the Matched state longer than this value, it will transition to the Owner state.

WANT_SUSPEND

If TRUE, the machine evaluates the SUSPEND expression to see if it should transition to the Suspended activity. If any value other than TRUE, the machine will look at the PREEMPT expression.

SUSPEND

If WANT_SUSPEND is TRUE, and the machine is in the Claimed/Busy state, it enters the Suspended activity if SUSPEND is TRUE.

CONTINUE

If the machine is in the Claimed/Suspended state, it enter the Busy activity if CONTINUE is TRUE.

PREEMPT

If the machine is either in the Claimed/Suspended activity, or is in the Claimed/Busy activity and WANT_SUSPEND is FALSE, the machine enters the Claimed/Retiring state whenever PREEMPT is TRUE.

CLAIM_WORKLIFE

If provided, this expression specifies the number of seconds during which a claim will continue accepting new jobs. Once this time expires, any existing job may continue to run as usual, but once it finishes or is preempted, the claim is closed. This may be useful if you want to force periodic renegotiation of resources without preemption having to occur. For example, if you have some low-priority jobs which should never be interrupted with kill signals, you could prevent them from being killed with MaxJobRetirementTime, but now high-priority jobs may have to wait in line when they match to a machine that is busy running one of these uninterruptible jobs. You can prevent the high-priority jobs from ever matching to such a machine by using a rank expression in the job or in the negotiator's rank expressions, but then the low-priority claim will never be interrupted; it can keep running more jobs. The solution is to use CLAIM_WORKLIFE to force the claim to stop running additional jobs after a certain amount of time. The default value for CLAIM_WORKLIFE is -1, which is treated as an infinite claim worklife, so claims may be held indefinitely (as long as they are not preempted and the schedd does not relinquish them, of course).

MAXJOBRETIREMENTTIME

If the machine is in the Claimed/Retiring state, this expression specifies the maximum time (in seconds) that the condor_startd will wait for the job to finish naturally (without any kill signals from the condor_startd). The clock starts when the job is started and is paused during any suspension. The job may provide its own expression for MaxJobRetirementTime, but this can only be used to take less than the time granted by the condor_startd, never more. For convenience, standard universe and nice_user jobs are submitted with a default retirement time of 0, so they will never wait in retirement unless the user overrides the default.

Once the job finishes or if the retirement time expires, the machine enters the Preempting state.

This expression is evaluated in the context of the job ClassAd, so it may refer to attributes of the current job as well as machine attributes. The expression is continually re-evaluated while the job is running, so it is possible, though unusual, to have an expression that changes over time. For example, if you want the retirement time to drop to 0 if an especially high priority job is waiting for the current job to retire, you could use PreemptingRank in the expression. Example:

MAXJOBRETIREMENTTIME = 3600 * ( \
  MY.PreemptingRank =?= UNDEFINED || \
  PreemptingRank < 600)

In this example, the retirement time is 3600 seconds, but if a job gets matched to this machine and it has a PreemptingRank of 600 or more, the retirement time drops to 0 and the current job is immediately preempted.

WANT_VACATE

This is checked only when the PREEMPT expression is TRUE and the machine enters the Preempting state. If WANT_VACATE is TRUE, the machine enters the Vacating activity. If it is FALSE, the machine will proceed directly to the Killing activity.

KILL

If the machine is in the Preempting/Vacating state, it enters Preempting/Killing whenever KILL is TRUE.

KILLING_TIMEOUT

If the machine is in the Preempting/Killing state for longer than KILLING_TIMEOUT seconds, the condor_startd sends a SIGKILL to the condor_starter and all its children to try to kill the job as quickly as possible.

PERIODIC_CHECKPOINT

If the machine is in the Claimed/Busy state and PERIODIC_CHECKPOINT is TRUE, the user's job begins a periodic checkpoint.

RANK

If this expression evaluates to a higher number for a pending resource request than it does for the current request, the machine preempts the current request (enters the Preempting/Vacating state). When the preemption is complete, the machine enters the Claimed/Idle state with the new resource request claiming it.

START_BACKFILL

When TRUE, if the machine is otherwise idle, it will enter the Backfill state and spawn a backfill computation (using BOINC).

EVICT_BACKFILL

When TRUE, if the machine is currently running a backfill computation, it will kill the BOINC client and return to the Owner/Idle state.

3.5.9 Policy Settings

This section describes the default configuration policy and then provides examples of extensions to these policies.

3.5.9.1 Default Policy Settings

These settings are the default as shipped with Condor. They have been used for many years with no problems. The vanilla expressions are identical to the regular ones. (They are not listed here. If not defined, the standard expressions are used for vanilla jobs as well).

The following are macros to help write the expressions clearly.

StateTimer: Amount of time in seconds in the current state.
ActivityTimer: Amount of time in seconds in the current activity.
ActivationTimer: Amount of time in seconds that the job has been running on this machine.
LastCkpt: Amount of time since the last periodic checkpoint.
NonCondorLoadAvg: The difference between the system load and the Condor load (the load generated by everything but Condor).
BackgroundLoad: Amount of background load permitted on the machine and still start a Condor job.
HighLoad: If the $(NonCondorLoadAvg) goes over this, the CPU is considered too busy, and eviction of the Condor job should start.
StartIdleTime: Amount of time the keyboard must to be idle before Condor will start a job.
ContinueIdleTime: Amount of time the keyboard must to be idle before resumption of a suspended job.
MaxSuspendTime: Amount of time a job may be suspended before more drastic measures are taken.
MaxVacateTime: Amount of time a job may be checkpointing before we give up and kill it outright.
KeyboardBusy: A boolean expression that evaluates to TRUE when the keyboard is being used.
CPUIdle: A boolean expression that evaluates to TRUE when the CPU is idle.
CPUBusy: A boolean expression that evaluates to TRUE when the CPU is busy.
MachineBusy: The CPU or the Keyboard is busy.
CPUIsBusy: A boolean value set to the same value as CPUBusy.
CPUBusyTime: The value 0 if CPUBusy is False; the time in seconds since CPUBusy became True.

##  These macros are here to help write legible expressions:
MINUTE          = 60
HOUR            = (60 * $(MINUTE))
StateTimer      = (CurrentTime - EnteredCurrentState)
ActivityTimer   = (CurrentTime - EnteredCurrentActivity)
ActivationTimer = (CurrentTime - JobStart)
LastCkpt        = (CurrentTime - LastPeriodicCheckpoint)

NonCondorLoadAvg        = (LoadAvg - CondorLoadAvg)
BackgroundLoad          = 0.3
HighLoad                = 0.5
StartIdleTime           = 15 * $(MINUTE)
ContinueIdleTime        = 5 * $(MINUTE)
MaxSuspendTime          = 10 * $(MINUTE)
MaxVacateTime           = 10 * $(MINUTE)

KeyboardBusy            = KeyboardIdle < $(MINUTE)
ConsoleBusy             = (ConsoleIdle  < $(MINUTE))
CPUIdle                = $(NonCondorLoadAvg) <= $(BackgroundLoad)
CPUBusy                = $(NonCondorLoadAvg) >= $(HighLoad)
KeyboardNotBusy         = ($(KeyboardBusy) == False)
MachineBusy             = ($(CPUBusy) || $(KeyboardBusy)

Macros are defined to want to suspend jobs (instead of killing them) in the case of jobs that use little memory, when the keyboard is not being used, and for vanilla universe jobs. We want to gracefully vacate jobs which have been running for more than 10 minutes or are vanilla universe jobs.

WANT_SUSPEND       = ( $(SmallJob) || $(KeyboardNotBusy) \
                       || $(IsVanilla) )
WANT_VACATE        = ( $(ActivationTimer) > 10 * $(MINUTE) \
                       || $(IsVanilla) )

Finally, definitions of the actual expressions. Start a job if the keyboard has been idle long enough and the load average is low enough OR the machine is currently running a Condor job. Note that Condor would only run one job at a time. It just may prefer to run a different job, as defined by the machine rank or user priorities.

START        = ( (KeyboardIdle > $(StartIdleTime)) \
                  && ( $(CPUIdle) || \
                       (State != "Unclaimed" && State != "Owner")) )

Suspend a job if the keyboard has been touched. Alternatively, suspend if the CPU has been busy for more than two minutes and the job has been running for more than 90 seconds.

SUSPEND         = ( $(KeyboardBusy) || \
                 ( (CpuBusyTime > 2 * $(MINUTE)) \
                    && $(ActivationTimer) > 90 ) )

Continue a suspended job if the CPU is idle, the Keyboard has been idle for long enough, and the job has been suspended more than 10 seconds.

CONTINUE        = ( $(CPUIdle) && ($(ActivityTimer) > 10) \
                  && (KeyboardIdle > $(ContinueIdleTime)) )

There are two conditions that signal preemption. The first condition is if the job is suspended, but it has been suspended too long. The second condition is if suspension is not desired and the machine is busy.

PREEMPT	        = ( ((Activity == "Suspended") && \
                    ($(ActivityTimer) > $(MaxSuspendTime))) \
                    || (SUSPEND && (WANT_SUSPEND == False)) )

Do not give jobs any time to retire on their own when they are about to be preempted.

MAXJOBRETIREMENTTIME = 0

Kill jobs that take too long leaving gracefully.

KILL            = $(ActivityTimer) > $(MaxVacateTime)

Finally, specify periodic checkpointing. For jobs smaller than 60 Mbytes, do a periodic checkpoint every 6 hours. For larger jobs, only checkpoint every 12 hours.

PERIODIC_CHECKPOINT     = ( (ImageSize < 60000) && \
                            ($(LastCkpt) > (6 * $(HOUR))) ) || \ 
                          ( $(LastCkpt) > (12 * $(HOUR)) )

At UW-Madison, we have a fast network. We simplify our expression considerably to

PERIODIC_CHECKPOINT     = $(LastCkpt) > (3 * $(HOUR))

For reference, the entire set of policy settings are included once more without comments:

##  These macros are here to help write legible expressions:
MINUTE          = 60
HOUR            = (60 * $(MINUTE))
StateTimer      = (CurrentTime - EnteredCurrentState)
ActivityTimer   = (CurrentTime - EnteredCurrentActivity)
ActivationTimer = (CurrentTime - JobStart)
LastCkpt        = (CurrentTime - LastPeriodicCheckpoint)

NonCondorLoadAvg        = (LoadAvg - CondorLoadAvg)
BackgroundLoad          = 0.3
HighLoad                = 0.5
StartIdleTime           = 15 * $(MINUTE)
ContinueIdleTime        = 5 * $(MINUTE)
MaxSuspendTime          = 10 * $(MINUTE)
MaxVacateTime           = 10 * $(MINUTE)

KeyboardBusy            = KeyboardIdle < $(MINUTE)
ConsoleBusy             = (ConsoleIdle  < $(MINUTE))
CPUIdle                = $(NonCondorLoadAvg) <= $(BackgroundLoad)
CPUBusy                = $(NonCondorLoadAvg) >= $(HighLoad)
KeyboardNotBusy         = ($(KeyboardBusy) == False)
MachineBusy             = ($(CPUBusy) || $(KeyboardBusy)

WANT_SUSPEND       = ( $(SmallJob) || $(KeyboardNotBusy) \
                       || $(IsVanilla) )
WANT_VACATE        = ( $(ActivationTimer) > 10 * $(MINUTE) \
                       || $(IsVanilla) )
START        = ( (KeyboardIdle > $(StartIdleTime)) \
                  && ( $(CPUIdle) || \
                       (State != "Unclaimed" && State != "Owner")) )
SUSPEND         = ( $(KeyboardBusy) || \
                 ( (CpuBusyTime > 2 * $(MINUTE)) \
                    && $(ActivationTimer) > 90 ) )
CONTINUE        = ( $(CPUIdle) && ($(ActivityTimer) > 10) \
                  && (KeyboardIdle > $(ContinueIdleTime)) )
PREEMPT	        = ( ((Activity == "Suspended") && \
                    ($(ActivityTimer) > $(MaxSuspendTime))) \
                    || (SUSPEND && (WANT_SUSPEND == False)) )
MAXJOBRETIREMENTTIME = 0
KILL            = $(ActivityTimer) > $(MaxVacateTime)
PERIODIC_CHECKPOINT     = ( (ImageSize < 60000) && \
                            ($(LastCkpt) > (6 * $(HOUR))) ) || \ 
                          ( $(LastCkpt) > (12 * $(HOUR)) )

3.5.9.2 Test-job Policy Example

This example shows how the default macros can be used to set up a machine for running test jobs from a specific user. Suppose we want the machine to behave normally, except if user coltrane submits a job. In that case, we want that job to start regardless of what is happening on the machine. We do not want the job suspended, vacated or killed. This is reasonable if we know coltrane is submitting very short running programs for testing purposes. The jobs should be executed right away. This works with any machine (or the whole pool, for that matter) by adding the following 5 expressions to the existing configuration:

        START      = ($(START)) || Owner == "coltrane"
        SUSPEND    = ($(SUSPEND)) && Owner != "coltrane"
        CONTINUE   = $(CONTINUE)
        PREEMPT    = ($(PREEMPT)) && Owner != "coltrane"
        KILL       = $(KILL)

Notice that there is nothing special in either the CONTINUE or KILL expressions. If Coltrane's jobs never suspend, they never look at CONTINUE. Similarly, if they never preempt, they never look at KILL.

3.5.9.3 Time of Day Policy

Condor can be configured to only run jobs at certain times of the day. In general, we discourage configuring a system like this, since you can often get lots of good cycles out of machines, even when their owners say ``I'm always using my machine during the day.'' However, if you submit mostly vanilla jobs or other jobs that cannot checkpoint, it might be a good idea to only allow the jobs to run when you know the machines will be idle and when they will not be interrupted.

To configure this kind of policy, you should use the ClockMin and ClockDay attributes, defined in section 3.5.1 on ``Startd ClassAd Attributes''. These are special attributes which are automatically inserted by the condor_startd into its ClassAd, so you can always reference them in your policy expressions. ClockMin defines the number of minutes that have passed since midnight. For example, 8:00am is 8 hours after midnight, or 8 * 60 minutes, or 480. 5:00pm is 17 hours after midnight, or 17 * 60, or 1020. ClockDay defines the day of the week, Sunday = 0, Monday = 1, and so on.

To make the policy expressions easy to read, we recommend using macros to define the time periods when you want jobs to run or not run. For example, assume regular ``work hours'' at your site are from 8:00am until 5:00pm, Monday through Friday:

WorkHours = ( (ClockMin >= 480 && ClockMin < 1020) && \
              (ClockDay > 0 && ClockDay < 6) ) 
AfterHours = ( (ClockMin < 480 || ClockMin >= 1020) || \
               (ClockDay == 0 || ClockDay == 6) )

Of course, you can fine-tune these settings by changing the definition of AfterHours and WorkHours for your site.

Assuming you are using the default policy expressions discussed above, there are only a few minor changes required to force Condor jobs to stay off of your machines during work hours:

# Only start jobs after hours.
START = $(AfterHours) && $(CPUIdle) && KeyboardIdle > $(StartIdleTime)

# Consider the machine busy during work hours, or if the keyboard or
# CPU are busy.
MachineBusy = ( $(WorkHours) || $(CPUBusy) || $(KeyboardBusy) )

By default, the MachineBusy macro is used to define the SUSPEND and PREEMPT expressions. If you have changed these expressions at your site, you will need to add $(WorkHours) to your SUSPEND and PREEMPT expressions as appropriate.

Depending on your site, you might also want to avoid suspending jobs during work hours, so that in the morning, if a job is running, it will be immediately preempted, instead of being suspended for some length of time:

WANT_SUSPEND = $(AfterHours)

3.5.9.4 Desktop/Non-Desktop Policy

Suppose you have two classes of machines in your pool: desktop machines and dedicated cluster machines. In this case, you might not want keyboard activity to have any effect on the dedicated machines. For example, when you log into these machines to debug some problem, you probably do not want a running job to suddenly be killed. Desktop machines, on the other hand, should do whatever is necessary to remain responsive to the user.

There are many ways to achieve the desired behavior. One way is to make a standard desktop policy and a standard non-desktop policy and to copy the desired one into the local configuration file for each machine. Another way is to define one standard policy (in condor_config) with a simple toggle that can be set in the local configuration file. The following example illustrates the latter approach.

For ease of use, an entire policy is included in this example. Some of the expressions are just the usual default settings.

# If "IsDesktop" is configured, make it an attribute of the machine ClassAd.
STARTD_ATTRS = IsDesktop

# Only consider starting jobs if:
# 1) the load average is low enough OR the machine is currently
#    running a Condor job
# 2) AND the user is not active (if a desktop)
START = ( ($(CPUIdle) || (State != "Unclaimed" && State != "Owner")) \
          && (IsDesktop =!= True || (KeyboardIdle > $(StartIdleTime))) )

# Suspend (instead of vacating/killing) for the following cases:
WANT_SUSPEND = ( $(SmallJob) || $(JustCpu) \
                 || $(IsVanilla) )

# When preempting, vacate (instead of killing) in the following cases:
WANT_VACATE  = ( $(ActivationTimer) > 10 * $(MINUTE) \
                 || $(IsVanilla) )

# Suspend jobs if:
# 1) The CPU has been busy for more than 2 minutes, AND
# 2) the job has been running for more than 90 seconds
# 3) OR suspend if this is a desktop and the user is active
SUSPEND = ( ((CpuBusyTime > 2 * $(MINUTE)) && ($(ActivationTimer) > 90)) \
            || ( IsDesktop =?= True && $(KeyboardBusy) ) )

# Continue jobs if:
# 1) the CPU is idle, AND 
# 2) we've been suspended more than 5 minutes AND
# 3) the keyboard has been idle for long enough (if this is a desktop)
CONTINUE = ( $(CPUIdle) && ($(ActivityTimer) > 300) \
             && (IsDesktop =!= True || (KeyboardIdle > $(ContinueIdleTime))) )

# Preempt jobs if:
# 1) The job is suspended and has been suspended longer than we want
# 2) OR, we don't want to suspend this job, but the conditions to
#    suspend jobs have been met (someone is using the machine)
PREEMPT = ( ((Activity == "Suspended") && \
            ($(ActivityTimer) > $(MaxSuspendTime))) \
           || (SUSPEND && (WANT_SUSPEND == False)) )

# Replace 0 in the following expression with whatever amount of
# retirement time you want dedicated machines to provide.  The other part
# of the expression forces the whole expression to 0 on desktop
# machines.
MAXJOBRETIREMENTTIME = (IsDesktop =!= True) * 0

# Kill jobs if they have taken too long to vacate gracefully
KILL = $(ActivityTimer) > $(MaxVacateTime)

With this policy in condor_config, the local configuration files for desktops can be easily configured with the following line:

IsDesktop = True

In all other cases, the default policy described above will ignore keyboard activity.

3.5.9.5 Disabling Preemption

Preemption can result in jobs being killed by Condor. When this happens, the jobs remain in the queue and will be automatically rescheduled. We highly recommend designing jobs that work well in this environment, rather than simply disabling preemption.

Planning for preemption makes jobs more robust in the face of other sources of failure. One way to live happily with preemption is to use Condor's standard universe, which provides the ability to produce checkpoints. If a job is incompatible with the requirements of standard universe, the job can still gracefully shutdown and restart by intercepting the soft kill signal.

All that being said, there may be cases where it is appropriate to force Condor to never kill jobs within some upper time limit. This can be achieved with the following policy in the configuration of the execute nodes:

# When we want to kick a job off, let it run uninterrupted for
# up to 2 days before forcing it to vacate.
MAXJOBRETIREMENTTIME = $(HOUR) * 24 * 2

Construction of this expression may be more complicated. For example, it could provide a different retirement time to different users or different types of jobs. Also be aware that the job may come with its own definition of MaxJobRetirementTime, but this may only cause less retirement time to be used, never more than what the machine offers.

The longer the retirement time that is given, the slower reallocation of resources in the pool can become if there are long-running jobs. However, by preventing jobs from being killed, you may decrease the number of cycles that are wasted on non-checkpointable jobs that are killed. That is the basic trade off.

Note that the use of MAXJOBRETIREMENTTIME limits the killing of jobs, but it does not prevent the preemption of resource claims. Therefore, it is technically not a way of disabling preemption, but simply a way of forcing preempting claims to wait until an existing job finishes or runs out of time. In other words, it limits the preemption of jobs but not the preemption of claims.

Limiting the preemption of jobs is often more desirable than limiting the preemption of resource claims. However, if you really do want to limit the preemption of resource claims, the following policy may be used. Some of these settings apply to the execute node and some apply to the central manager, so this policy should be configured so that it is read by both.

#Disable preemption by machine activity.
PREEMPT = False
#Disable preemption by user priority.
PREEMPTION_REQUIREMENTS = False
#Disable preemption by machine RANK by ranking all jobs equally.
RANK = 0
#Since we are disabling claim preemption, we
# may as well optimize negotiation for this case:
NEGOTIATOR_CONSIDER_PREEMPTION = False

Be aware of the consequences of this policy. Without any preemption of resource claims, once the condor_negotiator gives the condor_schedd a match to a machine, the condor_schedd may hold onto this claim indefinitely, as long as the user keeps supplying more jobs to run. If this is not desired, force claims to be retired after some amount of time using CLAIM_WORKLIFE . This enforces a time limit, beyond which no new jobs may be started on an existing claim; therefore the condor_schedd daemon is forced to go back to the condor_negotiator to request a new match, if there is still more work to do. Example execute machine configuration to include in addition to the example above:

# after 20 minutes, schedd must renegotiate to run
# additional jobs on the machine
CLAIM_WORKLIFE = 1200

Also be aware that in all versions of Condor prior to 6.8.1, it is not advisable to set NEGOTIATOR_CONSIDER_PREEMPTION to False, because of a bug that can lead to some machines never being matched to jobs.

3.5.9.6 Job Suspension

As new jobs are submitted that receive a higher priority than currently executing jobs, the executing jobs may be preempted. If the preempted jobs are not capable of writing checkpoints, they lose whatever forward progress they have made, and are sent back to the job queue to await starting over again as another machine becomes available. An alternative to this is to use suspension to freeze the job while some other task runs, and then unfreeze it so that it can continue on from where it left off. This does not require any special handling in the job, unlike most strategies that take checkpoints. However, it does require a special configuration of Condor. This example implements a policy that allows the job to decide whether it should be evicted or suspended. The jobs announce their choice through the use of the invented job ClassAd attribute IsSuspendableJob, that is also utilized in the configuration.

The implementation of this policy utilizes two categories of slots, identified as suspendable or nonsuspendable. A job identifies which category of slot it wishes to run on. This affects two aspects of the policy:

Of two jobs that might run on a slot, which job is chosen. The four cases that may occur depend on whether the currently running job identifies itself as suspendable or nonsuspendable, and whether the potentially running job identifies itself as suspendable or nonsuspendable.
1. If the currently running job is one that identifies itself as suspendable, and the potentially running job identifies itself as nonsuspendable, the currently running job is suspended, in favor of running the nonsuspendable one. This occurs independent of the user priority of the two jobs.
2. If both the currently running job and the potentially running job identify themselves as suspendable, then the relative priorities of the users and the preemption policy determines whether the new job will replace the existing job.
3. If both the currently running job and the potentially running job identify themselves as nonsuspendable, then the relative priorities of the users and the preemption policy determines whether the new job will replace the existing job.
4. If the currently running job is one that identifies itself as nonsuspendable, and the potentially running job identifies itself as suspendable, the currently running job continues running.
What happens to a currently running job that is preempted. A job that identifies itself as suspendable will be suspended, which means it is frozen in place, and will later be unfrozen when the preempting job is finished. A job that identifies itself as nonsuspendable is evicted, which means it writes a checkpoint, when possible, and then is killed. The job will return to the idle state in the job queue, and it can try to run again in the future.

# Lie to Condor, to achieve 2 slots for each real slot
NUM_CPUS = $(DETECTED_CORES)*2
# There is no good way to tell Condor that the two slots should be treated
# as though they share the same real memory, so lie about how much
# memory we have.
MEMORY = $(DETECTED_MEMORY)*2

# Slots 1 through DETECTED_CORES are nonsuspendable and the rest are
# suspendable
IsSuspendableSlot = SlotID > $(DETECTED_CORES)

# If I am a suspendable slot, my corresponding nonsuspendable slot is
# my SlotID plus $(DETECTED_CORES)
NonSuspendableSlotState = eval(strcat("slot",SlotID-$(DETECTED_CORES),"_State")

# The above expression looks at slotX_State, so we need to add
# State to the list of slot attributes to advertise.
STARTD_SLOT_ATTRS = $(STARTD_SLOT_ATTRS) State

# For convenience, advertise these expressions in the machine ad.
STARTD_ATTRS = $(STARTD_ATTRS) IsSuspendableSlot NonSuspendableSlotState

MyNonSuspendableSlotIsIdle = \
  (NonSuspendableSlotState =!= "Claimed" && NonSuspendableSlotState =!= "Preempting")

# NonSuspendable slots are always willing to start jobs.
# Suspendable slots are only willing to start if the NonSuspendable slot is idle.
START = \
  IsSuspendableSlot!=True && IsSuspendableJob=!=True || \
  IsSuspendableSlot && IsSuspendableJob==True && $(MyNonSuspendableSlotIsIdle)

# Suspend the suspendable slot if the other slot is busy.
SUSPEND = \
  IsSuspendableSlot && $(MyNonSuspendableSlotIsIdle)!=True

WANT_SUSPEND = $(SUSPEND)

CONTINUE = ($(SUSPEND)) != True

Note that in this example, the job ClassAd attribute IsSuspendableJob has no special meaning to Condor. It is an invented name chosen for this example. To take advantage of the policy, a job that wishes to be suspended must submit the job so that this attribute is defined. The following line should be placed in the job's submit description file:

+IsSuspendableJob = True

Next: 3.6 Security Up: 3. Administrators' Manual Previous: 3.4 User Priorities and Contents Index

htcondor-admin@cs.wisc.edu