
Subsections

3.10.1 Upgrading - Installing a New Version on an Existing Pool
3.10.2 Shutting Down and Restarting an HTCondor Pool
3.10.3 Reconfiguring an HTCondor Pool
3.10.4 Absent ClassAds
3.10.5 Monitoring
  3.10.5.1 Ganglia

3.10 Pool Management

HTCondor provides administrative tools to help with pool management. This section describes some of these tasks.

All of the commands described in this section are subject to the security policy chosen for the HTCondor pool. As such, the commands must be either run from a machine that has the proper authorization, or run by a user that is authorized to issue the commands. Section 3.6 on page [*] details the implementation of security in HTCondor.


3.10.1 Upgrading - Installing a New Version on an Existing Pool

An upgrade changes the running version of HTCondor from the current installation to a newer version. The safe method to install and start running a newer version of HTCondor in essence is: shut down the current installation of HTCondor, install the newer version, and then restart HTCondor using the newer version. To allow for falling back to the current version, place the new version in a separate directory. Copy the existing configuration files, and modify the copy to point to and use the new version, as well as incorporate any configuration variables that are new or changed in the new version. Set the CONDOR_CONFIG environment variable to point to the new copy of the configuration, so the new version of HTCondor will use the new configuration when restarted.
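
On a Unix machine, the procedure might look like the following sketch; the paths, version directories, and location of the configuration copy are placeholders to be adapted to the local installation:

  # shut down the running installation, including the condor_master
  condor_off -master <hostname>
  # install the new version in its own directory, e.g. /opt/condor-new,
  # then copy the existing configuration and edit the copy so that it
  # references the new directories
  cp /opt/condor-old/etc/condor_config /opt/condor-new/etc/condor_config
  # point the environment at the new configuration and start the new version
  export CONDOR_CONFIG=/opt/condor-new/etc/condor_config
  /opt/condor-new/sbin/condor_master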

When upgrading from a version of HTCondor earlier than 6.8 to a more recent version, note that the configuration settings must be modified for security reasons. Specifically, the HOSTALLOW_WRITE configuration variable must be explicitly changed, or no jobs may be submitted, and error messages will be issued by HTCondor tools.
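
The appropriate value depends entirely on the pool's security policy. As a sketch only, a pool that trusts every machine within its own domain (the domain name here is a placeholder) might use:

  HOSTALLOW_WRITE = *.example.com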

Another way to upgrade leaves HTCondor running. HTCondor will automatically restart itself if the condor_master binary is updated, and this method takes advantage of that. Download the newer version, placing it such that it does not overwrite the currently running version. The download includes a new set of configuration files; update this new set with any specializations implemented in the currently running version of HTCondor. Then, modify the currently running installation by changing its configuration such that the path to binaries points instead to the new binaries. One way to do that (under Unix) is to use a symbolic link that points to the current HTCondor installation directory (for example, /opt/condor). Change the symbolic link to point to the new directory. If HTCondor is configured to locate its binaries via the symbolic link, then after the symbolic link changes, the condor_master daemon notices the new binaries and restarts itself. How frequently it checks is controlled by the configuration variable MASTER_CHECK_NEW_EXEC_INTERVAL , which defaults to 5 minutes.
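
A hypothetical sequence of Unix commands for the symbolic link approach follows; the directory names are illustrative only:

  # unpack the new release into /opt/condor-new-version (name is illustrative),
  # then repoint the link through which HTCondor locates its binaries
  ln -sfn /opt/condor-new-version /opt/condor
  # within MASTER_CHECK_NEW_EXEC_INTERVAL seconds (default 300), the running
  # condor_master notices the new binaries and gracefully restarts itself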

When the condor_master notices new binaries, it begins a graceful restart. On an execute machine, a graceful restart means that running jobs are preempted. Standard universe jobs will attempt to take a checkpoint. This could be a bottleneck if all machines in a large pool attempt to do this at the same time. If they do not complete within the cutoff time specified by the KILL policy expression (defaults to 10 minutes), then the jobs are killed without producing a checkpoint. It may be appropriate to increase this cutoff time, and a better approach may be to upgrade the pool in stages rather than all at once.

For universes other than the standard universe, jobs are preempted. If jobs have been guaranteed a certain amount of uninterrupted run time with MaxJobRetirementTime, then the job is not killed until the specified amount of retirement time has been exceeded (which is 0 by default). The first step of killing the job is a soft kill signal, which can be intercepted by the job so that it can exit gracefully, perhaps saving its state. If the job has not gone away once the KILL expression fires (10 minutes by default), then the job is forcibly hard-killed. Since the graceful shutdown of jobs may rely on shared resources such as disks where state is saved, the same reasoning applies as for the standard universe: it may be appropriate to increase the cutoff time for large pools, and a better approach may be to upgrade the pool in stages to avoid jobs running out of time.

Another time limit to be aware of is the configuration variable SHUTDOWN_GRACEFUL_TIMEOUT. This defaults to 30 minutes. If the graceful restart is not completed within this time, a fast restart ensues. This causes jobs to be hard-killed.
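
As a sketch, this cutoff may be raised for a large pool; the value here, one hour, is chosen only for illustration:

  SHUTDOWN_GRACEFUL_TIMEOUT = 3600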


3.10.2 Shutting Down and Restarting an HTCondor Pool

Shutting Down HTCondor
There are a variety of ways to shut down all or parts of an HTCondor pool. All utilize the condor_off tool.

To stop a single execute machine from running jobs, the condor_off command specifies the machine by host name.

  condor_off -startd <hostname>
A running standard universe job will be allowed to take a checkpoint before the job is killed. A running job under another universe will be killed. If it is instead desired that the machine stops running jobs only after the currently executing job completes, the command is
  condor_off -startd -peaceful <hostname>
Note that this waits indefinitely for the running job to finish, before the condor_startd daemon exits.

To shut down all execute machines within the pool,

  condor_off -all -startd

To wait indefinitely for each machine in the pool to finish its current HTCondor job, shutting down each of the execute machines as soon as it no longer has a running job,

  condor_off -all -startd -peaceful

To shut down HTCondor on a machine from which jobs are submitted,

  condor_off -schedd <hostname>

If it is instead desired that the submit machine shuts down only after all jobs that are currently in the queue are finished, first disable new submissions to the queue by setting the configuration variable

  MAX_JOBS_SUBMITTED = 0
See instructions below in section 3.10.3 for how to reconfigure a pool. After the reconfiguration, the command to wait for all jobs to complete and shut down the submission of jobs is
  condor_off -schedd -peaceful <hostname>

Substitute the option -all for the host name, if all submit machines in the pool are to be shut down.
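
For example, to wait for all queued jobs on every submit machine in the pool to finish and then shut down job submission everywhere:

  condor_off -all -schedd -peaceful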

Restarting HTCondor, If HTCondor Daemons Are Not Running
If HTCondor is not running, perhaps because one of the condor_off commands was used, then starting HTCondor daemons back up depends on which part of HTCondor is currently not running.

If no HTCondor daemons are running, then starting HTCondor is a matter of executing the condor_master daemon. The condor_master daemon will then invoke all other specified daemons on that machine. The condor_master daemon executes on every machine that is to run HTCondor.

If a specific daemon needs to be started up, and the condor_master daemon is already running, then issue the command on the specific machine with

  condor_on -subsystem <subsystemname>
where <subsystemname> is replaced by the daemon's subsystem name. Or, this command might be issued from another machine in the pool (which has administrative authority) with
  condor_on <hostname> -subsystem <subsystemname>
where <subsystemname> is replaced by the daemon's subsystem name, and <hostname> is replaced by the host name of the machine where this condor_on command is to be directed.
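
As a concrete illustration, assuming proper authorization, the condor_startd daemon on a particular execute machine might be started with

  condor_on <hostname> -subsystem startd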

Restarting HTCondor, If HTCondor Daemons Are Running
If HTCondor daemons are currently running, but need to be killed and newly invoked, the condor_restart tool does this. This would be the case for a new value of a configuration variable for which using condor_reconfig is inadequate.

To restart all daemons on all machines in the pool,

  condor_restart -all

To restart all daemons on a single machine in the pool,

  condor_restart <hostname>
where <hostname> is replaced by the host name of the machine to be restarted.


3.10.3 Reconfiguring an HTCondor Pool

To change a global configuration variable and have all the machines start to use the new setting, change the value within the file, and send a condor_reconfig command to each host. Do this with a single command,

  condor_reconfig -all

If the global configuration file is not shared among all the machines, as it would be when using a shared file system, then the change must be made to each copy of the global configuration file before issuing the condor_reconfig command.

Issuing a condor_reconfig command is inadequate for some configuration variables. For those, a restart of HTCondor is required. The configuration variables that require a restart are listed in section 3.3.1 on page [*]. Further details are in the manual page for condor_restart.


3.10.4 Absent ClassAds

By default, HTCondor assumes that resources are transient: the condor_collector discards ClassAds older than CLASSAD_LIFETIME seconds. The default value of CLASSAD_LIFETIME is 15 minutes, which means that three default-length UPDATE_INTERVAL periods will pass before HTCondor forgets about a resource. In some pools, especially those with dedicated resources, this assumption may make it unnecessarily difficult to determine what the composition of the pool ought to be, in the sense of knowing which machines would be in the pool, if HTCondor were properly functioning on all of them.
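
The relationship between these two timing variables, shown here with what are believed to be their default values in seconds, is simply that the lifetime spans three update intervals:

  CLASSAD_LIFETIME = 900    # condor_collector discards ClassAds older than 15 minutes
  UPDATE_INTERVAL  = 300    # daemons send a fresh update every 5 minutes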

This assumption of transient machines can be modified by the use of absent ClassAds. When a machine ClassAd would otherwise expire, the condor_collector evaluates the configuration variable ABSENT_REQUIREMENTS against the machine ClassAd. If True, the machine ClassAd will be saved in a persistent manner and be marked as absent; this causes the machine to appear in the output of condor_status -absent. When the machine returns to the pool, its first update to the condor_collector will invalidate the absent machine ClassAd.

Absent ClassAds, like offline ClassAds, are stored to disk to ensure that they are remembered, even across condor_collector crashes. The configuration variable COLLECTOR_PERSISTENT_AD_LOG defines the file in which the ClassAds are stored, and replaces the no longer used variable OFFLINE_LOG. Absent ClassAds are retained on disk as maintained by the condor_collector for a length of time in seconds defined by the configuration variable ABSENT_EXPIRE_ADS_AFTER . A value of 0 for this variable means that the ClassAds are never discarded, and the default value is thirty days.
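
A minimal configuration sketch for the condor_collector host follows; the requirements expression and the log file location are illustrative only:

  ABSENT_REQUIREMENTS = True                        # remember every expiring machine ClassAd
  ABSENT_EXPIRE_ADS_AFTER = 0                       # 0 means absent ClassAds are never discarded
  COLLECTOR_PERSISTENT_AD_LOG = $(SPOOL)/AbsentAds  # illustrative file location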

Absent ClassAds are only returned by the condor_collector and displayed when the -absent option to condor_status is specified, or when the absent machine ClassAd attribute is mentioned on the condor_status command line. This renders absent ClassAds invisible to the rest of the HTCondor infrastructure.

A daemon may inform the condor_collector that the daemon's ClassAd should not expire, but should be removed right away; the daemon asks for its ClassAd to be invalidated. It may be useful to place such an invalidated ClassAd in the absent state, instead of removing it immediately. An example of a ClassAd that could benefit from being absent is that of a system with an uninterruptible power supply which shuts down cleanly but unexpectedly as a result of a power outage. To cause invalidated ClassAds to be treated as if they had expired, including the evaluation of ABSENT_REQUIREMENTS, set the configuration variable EXPIRE_INVALIDATED_ADS to True.
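
Enabling this behavior is a single setting on the condor_collector host:

  EXPIRE_INVALIDATED_ADS = True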


3.10.5 Monitoring

Information that the condor_collector collects can be used to monitor a pool. The condor_status command can be used to display a snapshot of the current state of the pool. Monitoring systems can be set up to track the state over time, and they might go further, alerting the system administrator about exceptional conditions.

Prevalent in the Open Science Grid is the Ganglia monitoring system (http://ganglia.info/). Nagios (http://www.nagios.org/) is often used to provide alerts based on data from the Ganglia monitoring system. The condor_gangliad daemon provides an efficient way to take information from an HTCondor pool and supply it to the Ganglia monitoring system.


3.10.5.1 Ganglia

Support for Ganglia monitoring is integral to HTCondor. The condor_gangliad gathers up data as specified by its configuration, and it streamlines getting that data to the Ganglia monitoring system. Updates sent to Ganglia are done using the Ganglia shared libraries for efficiency.

If Ganglia is already deployed in the pool, the monitoring of HTCondor is enabled by running the condor_gangliad daemon on a single machine within the pool. If the machine chosen is the one running Ganglia's gmetad, then the HTCondor configuration consists of adding GANGLIAD to the definition of configuration variable DAEMON_LIST on that machine. It may be advantageous to run the condor_gangliad daemon on the same machine as is running the condor_collector daemon, because on a large pool with many ClassAds, there is likely to be less network traffic. If the condor_gangliad daemon is to run on a different machine than the one running Ganglia's gmetad, modify configuration variable GANGLIA_GSTAT_COMMAND to get the list of monitored hosts from the master gmond program.
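
For example, on the machine chosen to run the daemon, the usual idiom for appending to the daemon list is:

  DAEMON_LIST = $(DAEMON_LIST) GANGLIAD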

If the pool does not use Ganglia, the pool can still be monitored by a separate server running Ganglia.

By default, the condor_gangliad will only propagate metrics to hosts that are already monitored by Ganglia. To monitor a pool that is not otherwise monitored by Ganglia, or a heterogeneous pool in which only some hosts are monitored, set the configuration variable GANGLIA_SEND_DATA_FOR_ALL_HOSTS to True. In this case, the default graphs that Ganglia provides will not be present, but the HTCondor metrics will appear.

On large pools, setting configuration variable GANGLIAD_PER_EXECUTE_NODE_METRICS to False will reduce the amount of data sent to Ganglia. The execute node data is the least important to monitor. One can also limit the amount of data by setting configuration variable GANGLIAD_REQUIREMENTS . Be aware that aggregate sums over the entire pool will not be accurate if this variable limits the ClassAds queried.
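
A sketch of such a reduced-traffic configuration follows; the GANGLIAD_REQUIREMENTS expression shown is only one example of restricting the query, in this case to scheduler and negotiator ClassAds:

  GANGLIAD_PER_EXECUTE_NODE_METRICS = False
  GANGLIAD_REQUIREMENTS = MyType == "Scheduler" || MyType == "Negotiator"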

Metrics to be sent to Ganglia are specified in all files within the directory specified by configuration variable GANGLIAD_METRICS_CONFIG_DIR . Each file in the directory is read, and the format within each file is that of New ClassAds. Here is an example of a single metric definition given as a New ClassAd:

[
  Name   = "JobsSubmitted";
  Desc   = "Number of jobs submitted";
  Units  = "jobs";
  TargetType = "Scheduler";
]

A nice set of default metrics is in file: $(GANGLIAD_METRICS_CONFIG_DIR)/00_default_metrics.

Recognized metric attribute names and their use:

Name
The name of this metric, which corresponds to the ClassAd attribute name. Metrics published for the same machine must have unique names.

Value
A ClassAd expression that produces the value when evaluated. The default value is the value in the daemon ClassAd of the attribute with the same name as this metric.

Desc
A brief description of the metric. This string is displayed when the user holds the mouse over the Ganglia graph for the metric.

Verbosity
The integer verbosity level of this metric. Metrics with a higher verbosity level than that specified by configuration variable GANGLIA_VERBOSITY will not be published.

TargetType
A string containing a comma-separated list of daemon ClassAd types that this metric monitors. The specified values should match the value of MyType of the daemon ClassAd. In addition, there are special values that may be included. "Machine_slot1" may be specified to monitor the machine ClassAd for slot 1 only. This is useful when monitoring machine-wide attributes. The special value "ANY" matches any type of ClassAd.

Requirements
A boolean expression that may restrict how this metric is incorporated. It defaults to True, which places no restrictions on the collection of this ClassAd metric.

Title
The graph title used for this metric. The default is the metric name.

Group
A string specifying the name of this metric's group. Metrics are arranged by group within a Ganglia web page. The default is determined by the daemon type. Metrics in different groups must have unique names.

Cluster
A string specifying the cluster name for this metric. The default cluster name is taken from the configuration variable GANGLIAD_DEFAULT_CLUSTER .

Units
A string describing the units of this metric.

Scale
A scaling factor that is multiplied by the value of the Value attribute. The scale factor is used when the value is not in the basic unit or a human-interpretable unit. For example, duty cycle is commonly expressed as a percent, but the HTCondor value ranges from 0 to 1. So, duty cycle is scaled by 100. Some metrics are reported in Kbytes. Scaling by 1024 allows Ganglia to pick the appropriate units, such as number of bytes rather than number of Kbytes. When scaling by large values, converting to the "float" type is recommended.

Derivative
A boolean value that specifies if Ganglia should graph the derivative of this metric. Ganglia versions prior to 3.4 do not support this.

Type
A string specifying the type of the metric. Possible values are "double", "float", "int32", "uint32", "int16", "uint16", "int8", "uint8", and "string". The default is "string" for string values, "int32" for integer values, "float" for real values, and "int8" for boolean values. Integer values can be coerced to "float" or "double". This is especially important for values stored internally as 64-bit values.

Regex
This string value specifies a regular expression that matches attributes to be monitored by this metric. This is useful for dynamic attributes that cannot be enumerated in advance, because their names depend on dynamic information such as the users who are currently running jobs. When this is specified, one metric per matching attribute is created. The default metric name is the name of the matched attribute, and the default value is the value of that attribute. As usual, the Value expression may be used when the raw attribute value needs to be manipulated before publication. However, since the name of the attribute is not known in advance, a special ClassAd attribute in the daemon ClassAd is provided to allow the Value expression to refer to it. This special attribute is named Regex. Another special feature is the ability to refer to text matched by regular expression groups defined by parentheses within the regular expression. These may be substituted into the values of other string attributes such as Name and Desc. This is done by putting macros in the string values. "\\1" is replaced by the first group, "\\2" by the second group, and so on. An illustrative Regex-based metric appears in the sketch following this list.

Aggregate
This string value specifies an aggregation function to apply, instead of publishing individual metrics for each daemon ClassAd. Possible values are "sum", "avg", "max", and "min".

AggregateGroup
When an aggregate function has been specified, this string value specifies which aggregation group the current daemon ClassAd belongs to. The default is the metric Name. This feature works like GROUP BY in SQL. The aggregation function produces one result per value of AggregateGroup. A single aggregate group would therefore be appropriate for a pool-wide metric. Example use of aggregate grouping: if one wished to publish the sum of an attribute across different types of slot ClassAds, one could make the metric name an expression that is unique to each type. The default AggregateGroup would be set accordingly. Note that the assumption is still that the result is a pool-wide metric, so by default it is associated with the condor_collector daemon's host. To group by machine and publish the result into the Ganglia page associated with each machine, one should make the AggregateGroup contain the machine name and override the default Machine attribute to be the daemon's machine name, rather than the condor_collector daemon's. A sketch of an aggregated pool-wide metric also appears after this list.

Machine
The name of the host associated with this metric. If configuration variable GANGLIAD_DEFAULT_MACHINE is not specified, the default is taken from the Machine attribute of the daemon ClassAd. If the daemon name is of the form name@hostname, this may indicate that there are multiple instances of HTCondor running on the same machine. To avoid the metrics from these instances overwriting each other, the default machine name is set to the daemon name in this case. For aggregate metrics, the default value of Machine will be the name of the condor_collector host.

IP
A string containing the IP address of the host associated with this metric. If GANGLIAD_DEFAULT_IP is not specified, the default is extracted from the MyAddress attribute of the daemon ClassAd. This value must be unique for each machine published to Ganglia. It need not be a valid IP address. If the value of Machine contains an "@" sign, the default IP value will be set to the same value as Machine in order to make the IP value unique to each instance of HTCondor running on the same host.
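
The following two metric definitions, referenced from the Regex and AggregateGroup descriptions above, are sketches only. The attribute names (a per-user pattern of the form Owner_<user>_JobsRunning, and TotalRunningJobs), the verbosity, and the group substitutions are assumptions chosen for illustration and would need to match attributes actually present in the monitored daemon ClassAds.

[
  Regex = "Owner_(.+)_JobsRunning";
  Name  = "JobsRunning_\\1";
  Desc  = "Running jobs for the user matched by group \\1";
  Verbosity = 1;
  TargetType = "Scheduler";
]

[
  Name      = "JobsRunningInPool";
  Value     = TotalRunningJobs;
  Desc      = "Running jobs summed over all condor_schedd daemons";
  Aggregate = "sum";
  TargetType = "Scheduler";
]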

