This is an outdated version of the HTCondor Manual. You can find current documentation at http://htcondor.org/manual.


3.12 Setting Up for Special Environments

The following sections describe how to set up Condor for use in special environments or configurations.


3.12.1 Using Condor with AFS

Configuration variables that allow machines to interact with and use a shared file system are described in section 3.3.7.

Limitations with AFS arise because Condor does not currently have a way to authenticate itself to AFS. This applies both to the Condor daemons, which would like to authenticate as the AFS user condor, and to the condor_shadow, which would like to authenticate as the user who submitted the job it is serving. Since neither is possible yet, some special steps are required when Condor interacts with AFS. Some of these must be performed by the administrator(s) installing Condor; others must be performed by the Condor users who submit jobs.


3.12.1.1 AFS and Condor for Administrators

The largest consequence of the lack of authentication with AFS is that, on each machine, the directory defined by the configuration variable LOCAL_DIR and its subdirectories log and spool must either be writable by unauthenticated users or not be on AFS. Making these directories writable by unauthenticated users would open a serious security hole, so that is not a viable solution. Placing LOCAL_DIR on NFS is acceptable. To avoid AFS, place the directory defined by LOCAL_DIR on a local partition on each machine in the pool. In practice, this means running condor_configure to install the release directory and configure the pool, setting the LOCAL_DIR variable to a local partition. When that is complete, log into each machine in the pool and run condor_init to set up the local Condor directory.
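One way to confirm that LOCAL_DIR really ended up off AFS on a given machine is to look up the file system type of the mount that contains it. The following is a minimal sketch of a hypothetical helper, not a Condor tool. It assumes a Linux-style mount table in the /proc/mounts format, and it assumes the AFS client mount is reported with the file system type afs.

```python
import os


def mount_fstype(path, mounts_text):
    """Return the file system type of the mount containing `path`.

    `mounts_text` is mount-table text in the Linux /proc/mounts format:
    "device mountpoint fstype options dump pass" on each line.
    """
    path = os.path.normpath(path)
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        point, fstype = fields[1], fields[2]
        # The mount contains `path` if it is the same directory or a prefix of it.
        inside = path == point or path.startswith(point.rstrip("/") + "/")
        # Prefer the longest (most specific) matching mount point.
        if inside and len(point) > len(best_point):
            best_point, best_type = point, fstype
    return best_type


def local_dir_on_afs(local_dir, mounts_text):
    # Assumption: the AFS client mount is listed with file system type "afs".
    return mount_fstype(local_dir, mounts_text) == "afs"
```

On a Linux machine, the table could be obtained by reading /proc/mounts and passed in together with the configured LOCAL_DIR value; a result of True means the directory still needs to be moved to a local partition.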

The directory defined by RELEASE_DIR, which holds all the Condor binaries, libraries, and scripts, can be on AFS. None of the Condor daemons need to write to these files; they only need to read them. So, the directory defined by RELEASE_DIR only needs to be world readable for Condor to function. Keeping it on AFS makes it easier to upgrade the binaries to a newer version at a later date, and means that users can find the Condor tools in a consistent location on all the machines in the pool. The Condor configuration files may also be placed in a centralized location. This is what we do for the UW-Madison CS department's Condor pool, and it works quite well.

Finally, consider setting up some targeted AFS groups to help users deal with Condor and AFS better. This is discussed in the following manual subsection. In short, create an AFS group that contains all users, authenticated or not, but which is restricted to a given host or subnet. Ideally, these would be implemented as host-based ACLs within AFS, but here at UW-Madison we have had some trouble getting that to work. Instead, we have a special group for all machines in our department. Users here are required to make their output directories on AFS writable to any process running on any of our machines, rather than to any process on any machine with AFS on the Internet.


3.12.1.2 AFS and Condor for Users

The condor_shadow daemon runs on the machine where jobs are submitted. It performs all file system access on behalf of the jobs. Because the condor_shadow daemon is not authenticated to AFS as the user who submitted the job, the condor_shadow daemon will not normally be able to write any output. Therefore the directories in which the job will be creating output files will need to be world writable; they need to be writable by non-authenticated AFS users. In addition, the program's stdout, stderr, log file, and any file the program explicitly opens will need to be in a directory that is world-writable.

An administrator may be able to set up special AFS groups that make unauthenticated access to the program's files less risky. For example, AFS is supposed to support granting access to any unauthenticated process on a given host. If this is set up, write access need only be granted to unauthenticated processes on the submit machine, as opposed to any unauthenticated process on the Internet. Similarly, unauthenticated read access could be granted only to processes running on the submit machine.

Another solution is to not use AFS for output files. If disk space on the submit machine is available in a partition not on AFS, submit the jobs from there. While the condor_shadow daemon is not authenticated to AFS, it does run with the effective UID of the user who submitted the jobs. So, on a local (or NFS) file system, the condor_shadow daemon will be able to access the files, and no special permissions need be granted to anyone other than the job submitter. However, if the Condor daemons are not started as root, the condor_shadow daemon cannot run with the submitter's effective UID, which leads to a problem similar to the one with files on AFS.


3.12.2 Enabling the Transfer of Files Specified by a URL

Because staging data on the submit machine is not always efficient, Condor permits input files to be transferred from a location specified by a URL; likewise, output files may be transferred to a location specified by a URL. All transfers (both input and output) are accomplished by invoking a plug-in, an executable or shell script that handles the task of file transfer.

For transferring input files, URL specification is limited to jobs running under the vanilla universe and to a vm universe VM image file. The execute machine retrieves the files. This differs from the normal file transfer mechanism, in which transfers are from the machine where the job is submitted to the machine where the job is executed. Each file to be transferred by URL, and therefore by a plug-in, is listed separately in the job's submit description file with the command transfer_input_files; see section 2.5.4 for details.

For transferring output files, either the entire output sandbox (all files produced or modified by the job as it executes) or the subset of these files named by the submit description file command transfer_output_files is transferred to the directory specified by the URL. The URL itself is specified with the separate submit description file command output_destination; see section 2.5.4 for details. The plug-in is invoked once for each output file to be transferred.

The available plug-ins are identified through configuration. The plug-ins must be installed and available on every execute machine that may run a job which might specify a URL, either for input or for output.

URL transfers are enabled by default in the configuration of execute machines. Disabling URL transfers is accomplished by setting

ENABLE_URL_TRANSFERS = FALSE

All available plug-ins are given as a comma-separated list of absolute paths, as in this example:

FILETRANSFER_PLUGINS = /opt/condor/plugins/wget-plugin, \
                       /opt/condor/plugins/hdfs-plugin, \
                       /opt/condor/plugins/custom-plugin

The condor_starter invokes all listed plug-ins to determine their capabilities. Each may handle one or more protocols (scheme names), and the plug-in's response to this invocation identifies which protocols it can handle. When a job specifies a URL transfer, the condor_starter invokes the appropriate plug-in to do the transfer. If more than one plug-in can handle a particular protocol, the one listed last in FILETRANSFER_PLUGINS is used.

Condor assumes that all plug-ins will respond in specific ways. To determine which protocols each plug-in handles, the condor_starter daemon invokes it with the command line argument -classad. In response, the plug-in must output three ClassAd attributes. The first two are fixed:

PluginVersion = "0.1"
PluginType = "FileTransfer"

The third ClassAd attribute is SupportedMethods. This attribute is a string containing a comma-separated list of the protocols that the plug-in handles. For example,

SupportedMethods = "http,ftp,file"
indicates that the plug-in supports the three protocols http, ftp, and file. These strings are matched against the protocol given within a URL in a transfer_input_files command or in an output_destination command in a job's submit description file.
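The capability query and the "last plug-in wins" rule can be modeled in a few lines. This is a minimal sketch under stated assumptions, not the condor_starter's actual code: the function names are hypothetical, each -classad response is assumed to consist of simple `Name = "value"` lines as in the examples above, and responses whose PluginType is not FileTransfer are skipped.

```python
import re
from urllib.parse import urlsplit

# Matches one `Name = "value"` line of a -classad response.
ATTR_RE = re.compile(r'^\s*(\w+)\s*=\s*"([^"]*)"\s*$')


def parse_classad(text):
    """Parse simple `Name = "value"` lines into a dict (string values only)."""
    attrs = {}
    for line in text.splitlines():
        match = ATTR_RE.match(line)
        if match:
            attrs[match.group(1)] = match.group(2)
    return attrs


def build_scheme_map(plugin_responses):
    """Map each protocol to a plug-in path.

    `plugin_responses` is a list of (plugin_path, classad_text) pairs in
    FILETRANSFER_PLUGINS order; a later plug-in replaces an earlier one
    for any protocol both support.
    """
    scheme_map = {}
    for path, text in plugin_responses:
        ad = parse_classad(text)
        if ad.get("PluginType") != "FileTransfer":
            continue
        for method in ad.get("SupportedMethods", "").split(","):
            method = method.strip()
            if method:
                scheme_map[method] = path
    return scheme_map


def plugin_for_url(url, scheme_map):
    """Return the plug-in path for the URL's scheme, or None if unsupported."""
    return scheme_map.get(urlsplit(url).scheme)
```

With the wget-plugin claiming http,ftp,file and the custom-plugin listed later claiming http,zkm, http URLs would go to the custom-plugin while ftp URLs would still go to the wget-plugin.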

When a job specifies a URL transfer, the plug-in is invoked without the command line argument -classad. Instead, it is given two other command line arguments. For the transfer of input file(s), the first is the URL of the file to retrieve, and the second is the absolute path identifying where to place the transferred file. For the transfer of output file(s), the first is the absolute path on the local machine of the file to transfer, and the second is the URL of the directory and file name at the destination.

The plug-in is expected to do the transfer, exiting with status 0 if the transfer was successful and with a non-zero status if it was not. When the transfer fails, the job is placed on hold, and the job ClassAd attribute HoldReason is set to describe the failure. The job ClassAd attribute HoldReasonSubCode is set to the exit status of the plug-in.
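The protocol described above can be illustrated with a minimal plug-in sketch. This is not a plug-in shipped with Condor. It handles only file:// URLs, and it distinguishes input from output transfers by checking which argument looks like a URL, which is an assumption of this sketch. It answers -classad with the three required attributes, copies the file otherwise, and exits non-zero on any failure so that the job would be held.

```python
#!/usr/bin/env python3
"""Minimal file-transfer plug-in sketch that understands only file:// URLs."""
import shutil
import sys
from urllib.parse import unquote, urlsplit

# Response to -classad: the two fixed attributes plus the supported protocols.
CLASSAD = (
    'PluginVersion = "0.1"\n'
    'PluginType = "FileTransfer"\n'
    'SupportedMethods = "file"\n'
)


def url_to_path(url):
    """Translate file:///some/path into /some/path; reject other schemes."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("unsupported URL: " + url)
    return unquote(parts.path)


def main(argv):
    if argv[1:] == ["-classad"]:
        sys.stdout.write(CLASSAD)
        return 0
    if len(argv) != 3:
        sys.stderr.write("usage: plugin -classad | plugin SOURCE DESTINATION\n")
        return 1
    source, destination = argv[1], argv[2]
    try:
        if "://" in source:
            # Input transfer: URL first, absolute local path second.
            shutil.copyfile(url_to_path(source), destination)
        else:
            # Output transfer: absolute local path first, destination URL second.
            shutil.copyfile(source, url_to_path(destination))
    except (OSError, ValueError) as err:
        sys.stderr.write(f"transfer failed: {err}\n")
        return 1  # non-zero exit status: the job is placed on hold
    return 0


# The condor_starter always passes arguments; running without any does nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Because the exit status becomes HoldReasonSubCode, a production plug-in might return distinct non-zero codes for distinct failure causes; this sketch uses 1 for all of them.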

As an example of the transfer of a subset of output files, assume that the submit description file contains

output_destination = url://server/some/directory/
transfer_output_files = foo, bar, qux
Condor invokes the plug-in that handles the url protocol three times. The directory delimiter (/ on Unix, and \ on Windows) is appended to the destination URL, such that the three (Unix) invocations of the plug-in will appear similar to
url_plugin /path/to/local/copy/of/foo url://server/some/directory//foo
url_plugin /path/to/local/copy/of/bar url://server/some/directory//bar
url_plugin /path/to/local/copy/of/qux url://server/some/directory//qux
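The doubled slash in these invocations follows directly from appending the delimiter to an output_destination value that already ends in /. The hypothetical function below reproduces that documented behavior to show how the per-file arguments are formed; the local directory argument is a placeholder for wherever the output files actually reside.

```python
def output_invocations(destination, transfer_output_files, local_dir, delimiter="/"):
    """Return (local_path, destination_url) pairs, one per plug-in invocation.

    The delimiter is appended to `destination` exactly as given, so a
    destination that already ends in "/" produces a doubled slash.
    """
    names = [n.strip() for n in transfer_output_files.split(",") if n.strip()]
    return [
        (local_dir.rstrip("/") + "/" + name, destination + delimiter + name)
        for name in names
    ]
```

Called with "url://server/some/directory/", "foo, bar, qux", and "/path/to/local/copy/of", it yields the three argument pairs listed above.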

Note that this functionality is not limited to a predefined set of protocols. New ones can be invented. As an invented example, the zkm transfer type writes random bytes to a file. The plug-in that handles zkm transfers would respond to invocation with the -classad command line argument with:

PluginVersion = "0.1"
PluginType = "FileTransfer"
SupportedMethods = "zkm"
Then, when a job requests this plug-in, as in the invented example:
transfer_input_files = zkm://128/r-data
the plug-in will be invoked with a first command line argument of zkm://128/r-data and a second command line argument giving the full path along with the file name r-data as the location for the plug-in to write 128 bytes of random data.
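A sketch of the invented zkm plug-in follows, assuming (as in the example) that the number between zkm:// and the next / is the byte count. Like the protocol itself, it is purely illustrative: it answers -classad with SupportedMethods = "zkm" and otherwise writes that many random bytes to the path given as the second argument.

```python
#!/usr/bin/env python3
"""Sketch of the invented zkm plug-in: zkm://<count>/<name> yields <count> random bytes."""
import os
import sys
from urllib.parse import urlsplit


def transfer(url, dest_path):
    parts = urlsplit(url)
    if parts.scheme != "zkm":
        raise ValueError("not a zkm URL: " + url)
    count = int(parts.netloc)  # "128" in zkm://128/r-data
    with open(dest_path, "wb") as out:
        out.write(os.urandom(count))


def main(argv):
    if argv[1:] == ["-classad"]:
        print('PluginVersion = "0.1"')
        print('PluginType = "FileTransfer"')
        print('SupportedMethods = "zkm"')
        return 0
    if len(argv) != 3:
        print("usage: zkm-plugin -classad | zkm-plugin URL PATH", file=sys.stderr)
        return 1
    try:
        transfer(argv[1], argv[2])
    except (OSError, ValueError) as err:
        print(f"zkm transfer failed: {err}", file=sys.stderr)
        return 1
    return 0


# The condor_starter always passes arguments; running without any does nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```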

The transfer of output files in this manner was introduced in Condor version 7.6.0. It cannot function if the condor_starter or condor_shadow executable is from a version earlier than 7.6.0. Here is the expected behavior for these cases, which cannot be made backward compatible.


3.12.3 Configuring Condor for Multiple Platforms

A single, global configuration file may be used for all platforms in a Condor pool, with only platform-specific settings placed in separate files. This greatly simplifies administration of a heterogeneous pool by allowing changes of platform-independent, global settings in one place, instead of separately for each platform. This is made possible by treating the LOCAL_CONFIG_FILE configuration variable as a list of files, instead of a single file. Of course, this only helps when using a shared file system for the machines in the pool, so that multiple machines can actually share a single set of configuration files.

With multiple platforms, put all platform-independent settings (the vast majority) into the regular condor_config file, which would be shared by all platforms. This global file would be the one that is found with the CONDOR_CONFIG environment variable, the user condor's home directory, or /etc/condor/condor_config.

Then set the LOCAL_CONFIG_FILE configuration variable from that global configuration file to specify both a platform-specific configuration file and optionally, a local, machine-specific configuration file (this parameter is described in section 3.3.3 on ``Condor-wide Configuration File Entries'').

The order of file specification in the LOCAL_CONFIG_FILE configuration variable is important, because settings in files at the beginning of the list are overridden if the same settings occur in files later within the list. So, if specifying the platform-specific file and then the machine-specific file, settings in the machine-specific file would override those in the platform-specific file (as is likely desired).
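The override rule can be modeled as an ordered merge in which a later file's value replaces an earlier one. The sketch below is a simplified, hypothetical model, not Condor's configuration parser, and the file names and values in the usage note are placeholders. It also records which file supplied each winning value, which is the question an administrator usually asks when a setting behaves unexpectedly.

```python
def effective_config(files_in_order):
    """Merge settings in LOCAL_CONFIG_FILE order.

    `files_in_order` is a list of (file_name, {variable: value}) pairs.
    Returns (settings, origin), where origin maps each variable to the
    file whose value won.
    """
    settings, origin = {}, {}
    for file_name, entries in files_in_order:
        for variable, value in entries.items():
            settings[variable] = value  # a later file replaces an earlier value
            origin[variable] = file_name
    return settings, origin
```

If a platform-specific file sets MAIL = /bin/mail and a machine-specific file listed after it sets MAIL = /usr/local/bin/mail, the merged value is the machine-specific one; swapping the order reverses the outcome.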


3.12.3.1 Utilizing a Platform-Specific Configuration File

The names of platform-specific configuration files may be built from the ARCH and OPSYS configuration variables, which Condor defines automatically. For example, for 32-bit Intel Windows 7 machines and 64-bit Intel Linux machines, the files would be named:

  condor_config.INTEL.WINDOWS
  condor_config.X86_64.LINUX

Then, assuming these files are in the directory defined by the ETC configuration macro, and machine-specific configuration files are in the same directory, named by each machine's host name, the LOCAL_CONFIG_FILE configuration macro should be:

LOCAL_CONFIG_FILE = $(ETC)/condor_config.$(ARCH).$(OPSYS), \
                    $(ETC)/$(HOSTNAME).local
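To see which files this setting names on a particular machine, the $(NAME) references can be expanded by hand. The sketch below is a simplified stand-in for Condor's own macro expansion: it substitutes references recursively from a dictionary, treats undefined names as empty (an assumption of the sketch), and splits the result on commas. The ETC path and host name in the usage note are hypothetical.

```python
import re

MACRO = re.compile(r"\$\((\w+)\)")


def expand(value, macros, depth=0):
    """Recursively substitute $(NAME) references from the `macros` dict.

    Undefined names expand to the empty string in this sketch.
    """
    if depth > 20:
        raise ValueError("macro expansion nested too deeply")
    return MACRO.sub(
        lambda m: expand(macros.get(m.group(1), ""), macros, depth + 1), value
    )


def local_config_files(value, macros):
    """Expand a LOCAL_CONFIG_FILE value and split it into file names."""
    return [part.strip() for part in expand(value, macros).split(",") if part.strip()]
```

With ETC set to /usr/local/condor/etc, ARCH to X86_64, OPSYS to LINUX, and HOSTNAME to node17, the value above expands to /usr/local/condor/etc/condor_config.X86_64.LINUX followed by /usr/local/condor/etc/node17.local.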

Alternatively, when using AFS, an ``@sys link'' may be used to specify the platform-specific configuration file, and let AFS resolve this link differently on different systems. For example, consider a soft link named condor_config.platform that points to condor_config.@sys. In this case, the files might be named:

  condor_config.i386_linux2
  condor_config.platform -> condor_config.@sys

and the LOCAL_CONFIG_FILE configuration variable would be set to:

LOCAL_CONFIG_FILE = $(ETC)/condor_config.platform, \
                    $(ETC)/$(HOSTNAME).local


3.12.3.2 Platform-Specific Configuration File Settings

The configuration variables that are truly platform-specific are:

RELEASE_DIR
Full path to the installed Condor binaries. While the configuration files may be shared among different platforms, the binaries certainly cannot. Therefore, maintain separate release directories for each platform in the pool. See section 3.3.3 on ``Condor-wide Configuration File Entries'' for details.

MAIL
The full path to the mail program. See section 3.3.3 on ``Condor-wide Configuration File Entries'' for details.

CONSOLE_DEVICES
Which devices in /dev should be treated as console devices. See section 3.3.10 on ``condor_startd Configuration File Entries'' for details.

DAEMON_LIST
Which daemons the condor_master should start up. This setting is platform-specific because of the condor_kbdd, which is needed on many Linux and Windows machines but not on other platforms. See section 3.3.9 for details.

Reasonable defaults for all of these configuration variables will be found in the default configuration files inside a given platform's binary distribution (except RELEASE_DIR, since the location of the Condor binaries and libraries is installation specific). With multiple platforms, start from one of the condor_config files, either the one produced by running condor_configure or <release_dir>/etc/examples/condor_config.generic. Take these settings out of it, save them into a platform-specific file, and install the resulting platform-independent file as the global configuration file. Then, find the same settings in the configuration files for each other platform to be set up, and put them in their own platform-specific files. Finally, set the LOCAL_CONFIG_FILE configuration variable to point to the appropriate platform-specific file, as described above.

Not all of these configuration variables will necessarily differ between platforms. For example, if /usr/local/bin/mail is installed on all platforms and understands the -s option, the MAIL macro may be set to that path in the global configuration file and not defined anywhere else. For a pool with only Linux and/or Windows machines, the DAEMON_LIST will be the same for each, so there is no reason not to put it in the global configuration file.


3.12.3.3 Other Uses for Platform-Specific Configuration Files

It is certainly possible that an installation may want other configuration variables to be platform-specific as well. Perhaps a different policy is desired for one of the platforms. Perhaps different people should get the e-mail about problems with the different platforms. There is nothing hard-coded about any of this. What is shared and what is not shared is entirely configurable.

Since the LOCAL_CONFIG_FILE macro can be an arbitrary list of files, an installation can even break up the global, platform-independent settings into separate files. In fact, the global configuration file might only contain a definition for LOCAL_CONFIG_FILE, and all other configuration variables would be placed in separate files.

Different people may be given different permissions to change different Condor settings. For example, if a user is to be able to change certain settings but nothing else, place those settings in a file early in the LOCAL_CONFIG_FILE list, give that user write permission on that file, and include all the other files after it. In this way, if the user tries to change settings she or he should not, those changes are simply overridden by the later files.

This mechanism is quite flexible and powerful. Very specific configuration needs can probably be met by using file permissions, the LOCAL_CONFIG_FILE configuration variable, and imagination.


3.12.4 Full Installation of condor_compile

In order to take advantage of two major Condor features, checkpointing and remote system calls, users of the Condor system need to relink their binaries. Programs that are not relinked for Condor can run in Condor's ``vanilla'' universe just fine; however, they cannot checkpoint and migrate, or run on machines without a shared file system.

To relink your programs with Condor, we provide a special tool, condor_compile. As installed by default, condor_compile works with the following commands: gcc, g++, g77, cc, acc, c89, CC, f77, fort77, ld. On Solaris and Digital Unix, f90 is also supported. See the condor_compile(1) man page for details on using condor_compile.

However, you can make condor_compile work transparently with all commands on your system whatsoever, including make.

The basic idea is to replace the system linker (ld) with the Condor linker. When a program is to be linked, the Condor linker determines whether the result should be a Condor binary or a normal binary. For a normal link, the original ld is called. For a Condor link, the script performs the operations necessary to prepare a binary that can be used with Condor. To distinguish between normal builds and Condor builds, the user simply places condor_compile before the build command; condor_compile sets the environment variable that tells the Condor linker script to do its magic.
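The dispatch decision can be sketched as a small function. This models the idea only and is not Condor's linker script: the environment variable name CONDOR_COMPILE_ACTIVE and the extra Condor link arguments are hypothetical placeholders, since this section does not name them, and /usr/bin/ld.real assumes the Linux location after the rename.

```python
import os

# Hypothetical names for illustration only.
REAL_LD = "/usr/bin/ld.real"                    # Linux location after the rename
CONDOR_FLAG = "CONDOR_COMPILE_ACTIVE"           # hypothetical variable name
CONDOR_LINK_ARGS = ["-L/usr/local/condor/lib"]  # hypothetical placeholder


def choose_link_command(argv, environ):
    """Return the command the wrapper would run for this invocation of ld."""
    if environ.get(CONDOR_FLAG):
        # Condor build: pass the real linker extra Condor-specific arguments.
        return [REAL_LD, *argv[1:], *CONDOR_LINK_ARGS]
    # Normal build: behave exactly like the original ld.
    return [REAL_LD, *argv[1:]]


def exec_linker(argv, environ=os.environ):
    """Replace this process with the chosen linker, as a real wrapper would."""
    command = choose_link_command(argv, environ)
    os.execv(command[0], command)
```

The normal path passes the arguments through unchanged, which is why linking keeps working even when condor_compile is not in use.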

In order to perform this full installation of condor_compile, the following steps need to be taken:

  1. Rename the system linker from ld to ld.real.
  2. Copy the condor linker to the location of the previous ld.
  3. Set the owner of the linker to root.
  4. Set the permissions on the new linker to 755.

The actual commands that you must execute depend upon your system. The location of the system linker (ld) is as follows:

	Operating System              Location of ld (ld-path)
	Linux                         /usr/bin
	Solaris 2.X                   /usr/ccs/bin
	OSF/1 (Digital Unix)          /usr/lib/cmplrs/cc

On these platforms, issue the following commands (as root), where ld-path is replaced by the path to your system's ld.

        mv /[ld-path]/ld /[ld-path]/ld.real
        cp /usr/local/condor/lib/ld /[ld-path]/ld
        chown root /[ld-path]/ld
        chmod 755 /[ld-path]/ld

If you remove Condor from your system later on, linking will continue to work, since the Condor linker always defaults to compiling normal binaries and simply calls the real ld. In the interest of simplicity, however, it is recommended that you reverse the above changes by moving your ld.real linker back to its former position as ld, overwriting the Condor linker.

NOTE: If you ever upgrade your operating system after performing a full installation of condor_compile, you will probably have to re-do all the steps outlined above. Generally speaking, new versions or patches of an operating system might replace the system ld binary, which would undo the full installation of condor_compile.


3.12.5 The condor_kbdd

The Condor keyboard daemon (condor_kbdd) monitors X events on machines where the operating system does not provide a way of monitoring the idle time of the keyboard or mouse. On UNIX platforms, it is needed to detect USB keyboard activity but otherwise is not needed. On Windows the condor_kbdd is the primary method of monitoring both keyboard and mouse idleness.

With the move of user sessions out of session 0 on Windows Vista, the condor_startd service is no longer able to listen to keyboard and mouse events as all services run in session 0. As such, any execute node will require condor_kbdd to accurately monitor and report system idle time. This is achieved by auto-starting the condor_kbdd whenever a user logs into the system. The daemon will run in an invisible window and should not be noticeable by the user except for a listing in the task manager. When the user logs out, the program is terminated by Windows. This change has been made even to pre-Vista Windows versions because it adds the capability of monitoring keyboard activity from multiple users.

To achieve the auto-start with user login, the Condor installer adds a condor_kbdd entry to the registry key at HKLM\Software\Microsoft\Windows\CurrentVersion\Run. On 64-bit versions of Vista and higher, the entry is actually placed in HKLM\Software\Wow6432Node\Microsoft\Windows\CurrentVersion\Run. If the condor_kbdd is unable to connect to the condor_startd on Windows XP SP2 or higher, the likely cause is that an exception was not properly added to the Windows firewall.

On UNIX, great measures have been taken to make this daemon as robust as possible, but the X window system was not designed to facilitate such a need, so the daemon is less than optimal on machines where many users log in and out on the console frequently.

In order to work with X authority, the system by which X authorizes processes to connect to X servers, the condor_kbdd needs to run with super user privileges. Currently, the daemon assumes that X uses the HOME environment variable in order to locate a file named .Xauthority, which contains keys necessary to connect to an X server. The keyboard daemon attempts to set this environment variable to various users' home directories in order to gain a connection to the X server and monitor events. This may fail to work on your system if you are using a non-standard approach. If the keyboard daemon is not allowed to attach to the X server, the state of a machine may be incorrectly set to idle when a user is, in fact, using the machine.

In some environments, the condor_kbdd will not be able to connect to the X server because the user currently logged into the system keeps their authentication token for using the X server in a place that no local user on the current machine can get to. This may be the case for AFS where the user's .Xauthority file is in an AFS home directory. There may also be cases where the condor_kbdd may not be run with super user privileges because of political reasons, but it is still desired to be able to monitor X activity. In these cases, change the XDM configuration in order to start up the condor_kbdd with the permissions of the currently logging in user. Although your situation may differ, if you are running X11R6.3, you will probably want to edit the files in /usr/X11R6/lib/X11/xdm. The .xsession file should have the keyboard daemon start up at the end, and the .Xreset file should have the keyboard daemon shut down. The -l option can be used to write the daemon's log file to a place where the user running the daemon has permission to write a file. We recommend something akin to $HOME/.kbdd.log, since this is a place where every user can write, and it will not get in the way. The -pidfile and -k options allow for easy shut down of the daemon by storing the process id in a file. It will be necessary to add lines to the XDM configuration that look something like:

  condor_kbdd -l $HOME/.kbdd.log -pidfile $HOME/.kbdd.pid

This will start the condor_kbdd as the user who is currently logging in and write the log to the file $HOME/.kbdd.log. It will also save the process id of the daemon to ~/.kbdd.pid, so that when the user logs out, XDM can do:

  condor_kbdd -k $HOME/.kbdd.pid

This will shut down the process recorded in ~/.kbdd.pid and exit.
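The -pidfile / -k pairing follows the common pid-file pattern: record the daemon's process id at startup, then read it back later to signal that process. The sketch below illustrates that general pattern only. Which signal condor_kbdd -k actually sends is not stated here, so SIGTERM is an assumption, and removing the pid file afterwards is a choice of this sketch.

```python
import os
import signal


def write_pidfile(path, pid=None):
    """Record a process id (the current process by default) in `path`."""
    with open(path, "w") as f:
        f.write(f"{os.getpid() if pid is None else pid}\n")


def stop_from_pidfile(path, sig=signal.SIGTERM):
    """Signal the process recorded in `path`, remove the file, return the pid."""
    with open(path) as f:
        pid = int(f.read().strip())
    os.kill(pid, sig)
    os.remove(path)
    return pid
```

Keeping the pid file under $HOME, as in the XDM lines above, means the logging-out user can always read it and has permission to signal the process, since the daemon runs as that same user.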

To see how well the keyboard daemon is working, review the log for the daemon and look for successful connections to the X server. If there are none, the condor_kbdd is unable to connect to the machine's X server.


3.12.6 Configuring The CondorView Server

The CondorView server is an alternate use of the condor_collector that logs information on disk, providing a persistent, historical database of pool state. This includes machine state, as well as the state of jobs submitted by users.

An existing condor_collector may act as the CondorView collector through configuration. This is the simplest situation, because the only change needed is to turn on the logging of historical information. The alternative of configuring a new condor_collector to act as the CondorView collector is slightly more complicated, but it offers the advantage that the same CondorView collector may be used for several pools, aggregating their information into one place.

The following sections describe how to configure a machine to run a CondorView server and to configure a pool to send updates to it.


3.12.6.1 Configuring a Machine to be a CondorView Server

To configure the CondorView collector, a few configuration variables are added or modified for the condor_collector chosen to act as the CondorView collector. These configuration variables are described in section 3.3.16. Here are brief explanations of the entries that must be customized:

POOL_HISTORY_DIR
The directory where historical data will be stored. This directory must be writable by whatever user the CondorView collector is running as (usually the user condor). The maximum space used by all the files created by the CondorView server is limited by the configuration variable POOL_HISTORY_MAX_STORAGE.

NOTE: This directory should be separate and different from the spool or log directories already set up for Condor. There are a few problems putting these files into either of those directories.

KEEP_POOL_HISTORY
A boolean value that determines if the CondorView collector should store the historical information. It is False by default, and must be specified as True in the local configuration file to enable data collection.

Once these settings are in place in the configuration file for the CondorView server host, create the directory specified in POOL_HISTORY_DIR and make it writable by the user the CondorView collector is running as. This is the same user that owns the CollectorLog file in the log directory. The user is usually condor.

If using the existing condor_collector as the CondorView collector, no further configuration is needed. To run a different condor_collector to act as the CondorView collector, configure Condor to automatically start it.

If using a separate host for the CondorView collector, to start it, add the value COLLECTOR to DAEMON_LIST, and restart Condor on that host. To run the CondorView collector on the same host as another condor_collector, ensure that the two condor_collector daemons use different network ports. Here is an example configuration in which the main condor_collector and the CondorView collector are started up by the same condor_master daemon on the same machine. In this example, the CondorView collector uses port 12345.

  VIEW_SERVER = $(COLLECTOR)
  VIEW_SERVER_ARGS = -f -p 12345
  VIEW_SERVER_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/ViewServerLog"
  DAEMON_LIST = MASTER, NEGOTIATOR, COLLECTOR, VIEW_SERVER

For this change to take effect, restart the condor_master on this host. This may be accomplished with the condor_restart command, if the command is run with administrator access to the pool.


3.12.6.2 Configuring a Pool to Report to the CondorView Server

For the CondorView server to function, configure the existing collector to forward ClassAd updates to it. This configuration is only necessary if the CondorView collector is a different collector from the existing condor_collector for the pool. All the Condor daemons in the pool send their ClassAd updates to the regular condor_collector, which in turn will forward them on to the CondorView server.

Define the following configuration variable:

  CONDOR_VIEW_HOST = full.hostname[:portnumber]
where full.hostname is the full host name of the machine running the CondorView collector. The full host name is optionally followed by a colon and port number. This is only necessary if the CondorView collector is configured to use a port number other than the default.

Place this setting in the configuration file used by the existing condor_collector. It is acceptable to place it in the global configuration file. The CondorView collector will ignore this setting (as it should) as it notices that it is being asked to forward ClassAds to itself.

Once the CondorView server is running with this change, send a condor_reconfig command to the main condor_collector for the change to take effect, so it will begin forwarding updates. A query to the CondorView collector will verify that it is working. A query example:

  condor_status -pool condor.view.host[:portnumber]

A condor_collector may also be configured to report to multiple CondorView servers. The configuration variable CONDOR_VIEW_HOST can be given as a list of CondorView servers separated by commas and/or spaces.
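How such a list breaks down into individual servers can be shown with a small parser. The functions below are a hypothetical sketch, not Condor code: they split on commas and/or whitespace, return the port as None when it is omitted (meaning the default port is used), and drop any entry identical to the collector's own host and port, mirroring how a collector ignores a request to forward ClassAds to itself. In this sketch an entry matches only if both host and port are equal.

```python
import re


def parse_view_hosts(value):
    """Split CONDOR_VIEW_HOST into (host, port) pairs; port is None if omitted."""
    servers = []
    for entry in re.split(r"[,\s]+", value.strip()):
        if not entry:
            continue
        host, sep, port = entry.partition(":")
        servers.append((host, int(port) if sep else None))
    return servers


def forward_targets(value, own_host, own_port):
    """List the servers a collector at (own_host, own_port) would forward to."""
    return [s for s in parse_view_hosts(value) if s != (own_host, own_port)]
```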

The following demonstrates an example configuration for two CondorView servers, where both CondorView servers (and the condor_collector) are running on the same machine, localhost.localdomain:

VIEWSERV01 = $(COLLECTOR)
VIEWSERV01_ARGS = -f -p 12345 -local-name VIEWSERV01
VIEWSERV01_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/ViewServerLog01"
VIEWSERV01.POOL_HISTORY_DIR = $(LOCAL_DIR)/poolhist01
VIEWSERV01.KEEP_POOL_HISTORY = TRUE
VIEWSERV01.CONDOR_VIEW_HOST =

VIEWSERV02 = $(COLLECTOR)
VIEWSERV02_ARGS = -f -p 24680 -local-name VIEWSERV02
VIEWSERV02_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/ViewServerLog02"
VIEWSERV02.POOL_HISTORY_DIR = $(LOCAL_DIR)/poolhist02
VIEWSERV02.KEEP_POOL_HISTORY = TRUE
VIEWSERV02.CONDOR_VIEW_HOST =

CONDOR_VIEW_HOST = localhost.localdomain:12345 localhost.localdomain:24680
DAEMON_LIST = $(DAEMON_LIST) VIEWSERV01 VIEWSERV02

Note that the value of CONDOR_VIEW_HOST for VIEWSERV01 and VIEWSERV02 is unset, to prevent them from inheriting the global value of CONDOR_VIEW_HOST and attempting to report to themselves or each other. If the CondorView servers are running on different machines where there is no global value for CONDOR_VIEW_HOST, this precaution is not required.
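The precedence at work here can be modeled as a lookup that first tries the LOCALNAME.PARAM form and falls back to the plain PARAM. This is a simplified, hypothetical model of the behavior the example relies on, not Condor's parameter code. The important detail is that an empty VIEWSERV01.CONDOR_VIEW_HOST still counts as defined and therefore hides the global value.

```python
def lookup(config, param, local_name=None):
    """Return the value seen by a daemon started with -local-name `local_name`.

    A LOCALNAME.PARAM entry, even an empty one, takes precedence over PARAM.
    """
    if local_name is not None and f"{local_name}.{param}" in config:
        return config[f"{local_name}.{param}"]
    return config.get(param)


# A subset of the two-server example configuration.
EXAMPLE = {
    "CONDOR_VIEW_HOST": "localhost.localdomain:12345 localhost.localdomain:24680",
    "VIEWSERV01.CONDOR_VIEW_HOST": "",
    "VIEWSERV02.CONDOR_VIEW_HOST": "",
    "VIEWSERV01.POOL_HISTORY_DIR": "$(LOCAL_DIR)/poolhist01",
    "VIEWSERV02.POOL_HISTORY_DIR": "$(LOCAL_DIR)/poolhist02",
}
```

Under this model, the main collector sees both CondorView servers, while VIEWSERV01 and VIEWSERV02 each see an empty list and forward nothing.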


3.12.7 Running Condor Jobs within a Virtual Machine

Condor jobs are formed from executables that are compiled to execute on specific platforms. This in turn restricts the machines within a Condor pool where a job may be executed. A Condor job may now be executed on a virtual machine system running VMware, Xen, or KVM. This allows Windows executables to run on a Linux machine, and Linux executables to run on a Windows machine.

In older versions of Condor, other parts of the system were also referred to as virtual machines, but in all cases, those are now known as slots. A virtual machine here describes the environment in which the outside operating system (called the host) emulates an inner operating system (called the inner virtual machine), such that an executable appears to run directly on the inner virtual machine. In other parts of Condor, a slot (formerly known as a virtual machine) refers to one of the units into which the CPUs and other resources of an SMP machine are divided. Also, be careful not to confuse the virtual machines discussed here with the Java Virtual Machine (JVM) referenced in other parts of this manual.

Condor has the flexibility to run a job on either the host or the inner virtual machine, so two platforms appear to exist on a single machine. Since the two platforms are an illusion, Condor understands the illusion and allows a Condor job to execute on only one of them at a time.


3.12.7.1 Installation and Configuration

Condor must be separately installed, separately configured, and separately running on both the host and the inner virtual machine.

The configuration for the host specifies VMP_VM_LIST. This specifies host names or IP addresses of all inner virtual machines running on this host. An example configuration on the host machine:

VMP_VM_LIST = vmware1.domain.com, vmware2.domain.com

The configuration for each separate inner virtual machine specifies VMP_HOST_MACHINE. This specifies the host for the inner virtual machine. An example configuration on an inner virtual machine:

VMP_HOST_MACHINE = host.domain.com

Given this configuration, as well as communication between Condor daemons running on the host and on the inner virtual machine, the policy for when jobs may execute is set by Condor. While the host is executing a Condor job, the START policy on the inner virtual machine is overridden with False, so no Condor jobs will be started on the inner virtual machine. Conversely, while the inner virtual machine is executing a Condor job, the START policy on the host is overridden with False, so no Condor jobs will be started on the host.

The inner virtual machine is further provided with a new syntax for referring to the machine ClassAd attributes of its host. Any machine ClassAd attribute with a prefix of the string HOST_ explicitly refers to the host's ClassAd attributes. The START policy on the inner virtual machine ought to use this syntax to avoid starting jobs when its host is too busy processing other items. An example configuration for START on an inner virtual machine:

START = ( (KeyboardIdle > 150 ) && ( HOST_KeyboardIdle > 150 ) \
        && ( LoadAvg <= 0.3 ) && ( HOST_TotalLoadAvg <= 0.3 ) )
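The interplay of the overridden START policy and the HOST_ prefix can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions: ClassAds are plain dictionaries, every attribute is assumed to be defined (real ClassAd evaluation also handles UNDEFINED), and lookup and inner_vm_start are hypothetical names.

```python
def lookup(attr, inner_ad, host_ad):
    """Resolve an attribute; a HOST_ prefix redirects the lookup to the host's ad."""
    prefix = "HOST_"
    if attr.startswith(prefix):
        return host_ad[attr[len(prefix):]]
    return inner_ad[attr]


def inner_vm_start(inner_ad, host_ad, host_running_job):
    """START on the inner VM: the example expression, forced False while the host runs a job."""
    if host_running_job:
        return False  # Condor overrides START with False in this case

    def get(name):
        return lookup(name, inner_ad, host_ad)

    return (get("KeyboardIdle") > 150 and get("HOST_KeyboardIdle") > 150
            and get("LoadAvg") <= 0.3 and get("HOST_TotalLoadAvg") <= 0.3)


inner = {"KeyboardIdle": 600, "LoadAvg": 0.05}
host = {"KeyboardIdle": 900, "TotalLoadAvg": 0.1}
print(inner_vm_start(inner, host, host_running_job=False))
```

The same override applies in the other direction: while the inner virtual machine runs a Condor job, the host's START is forced to False.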


3.12.8 Configuring The condor_startd for SMP Machines

This section describes how to configure the condor_startd for SMP (Symmetric Multi-Processor) machines. Machines with more than one CPU may be configured to run more than one job at a time. As always, owners of the resources have great flexibility in defining the policy under which multiple jobs may run, suspend, vacate, etc.


3.12.8.1 How Shared Resources are Represented to Condor

The way SMP machines are represented to the Condor system is that the shared resources are broken up into individual slots. Each slot can be matched and claimed by users. Each slot is represented by an individual ClassAd (see the ClassAd reference, section 4.1, for details). In this way, each SMP machine will appear to the Condor system as a collection of separate slots. As an example, an SMP machine named vulture.cs.wisc.edu would appear to Condor as multiple machines, named slot1@vulture.cs.wisc.edu, slot2@vulture.cs.wisc.edu, slot3@vulture.cs.wisc.edu, and so on.

The way that the condor_startd breaks up the shared system resources into the different slots is configurable. All shared system resources (like RAM, disk space, swap space, etc.) can either be divided evenly among all the slots, with each CPU getting its own slot, or you can define your own slot types, so that resources can be unevenly partitioned. Regardless of the partitioning scheme used, it is important to remember the goal is to create a representative slot ClassAd, to be used for matchmaking with jobs. Condor does not directly enforce slot shared resource allocations, and jobs are free to oversubscribe to shared resources.

Consider an example where two slots are each defined with 50% of available RAM. The resultant ClassAd for each slot will advertise one half the available RAM. Users may submit jobs with RAM requirements that match these slots. However, jobs run on either slot are free to consume more than 50% of available RAM. Condor will not directly enforce a RAM utilization limit on either slot. If a shared resource enforcement capability is needed, it is possible to write a Startd policy that will evict a job that oversubscribes to shared resources; see section 3.12.8.

The following section gives details on how to configure Condor to divide the resources on an SMP machine into separate slots.


3.12.8.2 Dividing System Resources in SMP Machines

This section describes the settings that allow you to define your own slot types and to control how many slots of each type are reported to Condor.

There are two main ways to go about partitioning an SMP machine:

Define your own slot types.
By defining your own types, you can specify what fraction of shared system resources (CPU, RAM, swap space and disk space) go to each slot. Once you define your own types, you can control how many of each type are reported at any given time.

Evenly divide all resources.
If you do not define your own types, the condor_startd will automatically partition your machine into slots for you. It will do so by placing a single CPU in each slot, and evenly dividing all shared resources among the slots. With this default partitioning, you only specify how many slots are reported at a time. By default, all slots are reported to Condor.

The number of each type being reported can be changed at run-time, by issuing a reconfiguration command to the condor_startd daemon (sending a SIGHUP or using condor_reconfig). However, the definitions for the types themselves cannot be changed with reconfiguration. If you change any slot type definitions, you must restart the condor_startd for that change to take effect:

condor_restart -startd


3.12.8.3 Defining Slot Types

To define your own slot types, add configuration file parameters that list how much of each system resource you want in the given slot type. Do this by defining configuration variables of the form SLOT_TYPE_<N>. The <N> represents an integer (for example, SLOT_TYPE_1), which specifies the slot type defined. Note that there may be multiple slots of each type. The number created is configured with NUM_SLOTS_TYPE_<N> as described later in this section.

A type describes what share of the total system resources a given slot has available to it.

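The type can be defined by:

  1. a simple fraction, such as 1/4
  2. a simple percentage, such as 25%
  3. a comma-separated list of attributes, with a percentage, fraction, numerical value, or auto for each one
  4. a comma-separated list that also includes a blanket value, which serves as the default for any resources not explicitly specified in the list

A simple fraction or percentage causes an allocation of the total system resources. This includes the number of CPUs. A comma-separated list allows a fine-tuning of the amounts for specific attributes.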

The attributes that specify the number of CPUs and the total amount of RAM in the SMP machine do not change. For these attributes, specify either absolute values or percentages of the total available amount (or auto). For example, in a machine with 128 Mbytes of RAM, all the following definitions result in the same allocation amount.

mem=64
mem=1/2
mem=50%
mem=auto

Other attributes are dynamic, such as disk space and swap space. For these, specify a percentage or fraction of the total value that is allocated to each slot, instead of specifying absolute values. As the total values of these resources change on your machine, each slot will take its fraction of the total and report that as its available amount.

The disk space allocated to each slot is taken from the disk partition containing the slot's execute directory (configured with EXECUTE or SLOT<N>_EXECUTE). If every slot is in a different partition, then each one may be defined with up to 100% for its disk share. If some slots are in the same partition, then their total is not allowed to exceed 100%.

The four attribute names are case insensitive when defining slot types. The first letter of the attribute name distinguishes between the attributes. The four attributes, with several examples of acceptable names for each, are

  1. CPUs: cpus, c
  2. RAM: ram, memory, mem, m
  3. swap space: swap, virt, v
  4. disk space: disk, d

As an example, consider a host of 4 CPUs and 256 Mbytes of RAM. Here are valid example slot type definitions. Types 1-3 are all equivalent to each other, as are types 4-6. Note that in a real configuration, you would not use all of these slot types together because they add up to more than 100% of the various system resources. Also note that in a real configuration, you would need to also define NUM_SLOTS_TYPE_<N> for each slot type.

SLOT_TYPE_1 = cpus=2, ram=128, swap=25%, disk=1/2

SLOT_TYPE_2 = cpus=1/2, memory=128, virt=25%, disk=50%

SLOT_TYPE_3 = c=1/2, m=50%, v=1/4, disk=1/2

SLOT_TYPE_4 = c=25%, m=64, v=1/4, d=25%

SLOT_TYPE_5 = 25%

SLOT_TYPE_6 = 1/4
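As a cross-check on the equivalences above, the following Python sketch turns a SLOT_TYPE_<N> value into the share of each resource it requests. It is a hypothetical model written for this explanation, not Condor's parser. The names to_share, parse_slot_type and RESOURCE_BY_LETTER are illustrative, the letter table covers only the spellings used in the examples above, and auto is left unresolved.

```python
from fractions import Fraction

# Hypothetical table: the first letter of an attribute name selects the resource.
RESOURCE_BY_LETTER = {"c": "cpus", "r": "ram", "m": "ram",
                      "s": "swap", "v": "swap", "d": "disk"}
RESOURCES = ("cpus", "ram", "swap", "disk")


def to_share(value, total):
    """Convert one value (N%, a/b, auto, or an absolute amount) to a share."""
    value = value.strip().lower()
    if value == "auto":
        return "auto"
    if value.endswith("%"):
        return Fraction(value[:-1]) / 100
    if "/" in value:
        return Fraction(value)
    if total is None:
        raise ValueError("absolute values are only meaningful for CPUs and RAM")
    return Fraction(int(value), total)


def parse_slot_type(spec, totals):
    """Return each resource's share of the machine for one SLOT_TYPE_<N> value."""
    explicit = {}
    blanket = "auto"  # resources left out of the list default to auto
    for item in spec.split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            resource = RESOURCE_BY_LETTER[key.strip()[0].lower()]
            explicit[resource] = to_share(value, totals.get(resource))
        else:
            blanket = to_share(item, None)  # a bare value covers unlisted resources
    return {r: explicit.get(r, blanket) for r in RESOURCES}


totals = {"cpus": 4, "ram": 256}  # 4 CPUs, 256 Mbytes of RAM
types = {
    1: "cpus=2, ram=128, swap=25%, disk=1/2",
    2: "cpus=1/2, memory=128, virt=25%, disk=50%",
    3: "c=1/2, m=50%, v=1/4, disk=1/2",
    4: "c=25%, m=64, v=1/4, d=25%",
    5: "25%",
    6: "1/4",
}
parsed = {n: parse_slot_type(spec, totals) for n, spec in types.items()}
print(parsed[1] == parsed[2] == parsed[3], parsed[4] == parsed[5] == parsed[6])
```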

The default value for each resource share is auto. The share may also be explicitly set to auto. All slots with the value auto for a given type of resource will evenly divide whatever remains after subtracting out whatever was explicitly allocated in other slot definitions. For example, if one slot is defined to use 10% of the memory and the rest define it as auto (or leave it undefined), then the rest of the slots will evenly divide 90% of the memory between themselves.

In both of the following examples, the disk share is set to auto, cpus is 1, and everything else is 50%:

SLOT_TYPE_1 = cpus=1, ram=1/2, swap=50%

SLOT_TYPE_1 = cpus=1, disk=auto, 50%
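The auto rule can be checked with exact fractions. The sketch below is a hypothetical model of that rule for a single resource, not Condor code. Here resolve_auto takes each slot's explicit share (a Fraction) or the string "auto", and it reproduces the 10%/90% example above with four slots.

```python
from fractions import Fraction


def resolve_auto(shares):
    """Replace 'auto' shares with an even split of what explicit shares leave over."""
    explicit = sum((s for s in shares if s != "auto"), Fraction(0))
    if explicit > 1:
        raise ValueError("explicit shares add up to more than 100%")
    auto_count = shares.count("auto")
    leftover = (1 - explicit) / auto_count if auto_count else Fraction(0)
    return [leftover if s == "auto" else s for s in shares]


# Memory shares for four slots: one explicit 10%, three left as auto.
print(resolve_auto([Fraction(1, 10), "auto", "auto", "auto"]))
```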

The number of slots of each type is set with the configuration variable NUM_SLOTS_TYPE_<N>, where N is the type as given in the SLOT_TYPE_<N> variable.

Note that it is possible to set the configuration variables such that they specify an impossible configuration. If this occurs, the condor_startd daemon fails after writing a message to its log attempting to indicate the configuration requirements that it could not implement.


3.12.8.4 Evenly Divided Resources

If you are not defining your own slot types, then all resources are divided equally among the slots. The number of slots within the SMP machine is the only attribute that needs to be defined. Its definition is accomplished by setting the configuration variable NUM_SLOTS to the integer number of slots desired. If variable NUM_SLOTS is not defined, it defaults to the number of CPUs within the SMP machine. You cannot use NUM_SLOTS to make Condor advertise more slots than there are CPUs on the machine. To do that, use NUM_CPUS.
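A hypothetical Python sketch of the default, evenly divided case follows. It models only memory (the other shared resources are divided the same way), uses the slot<N>@host naming shown earlier, and treats a request for more slots than CPUs as an error, as described above. The name even_slots is illustrative.

```python
def even_slots(host, total_cpus, memory_mb, num_slots=None):
    """Sketch of evenly divided resources: one equal memory share per slot."""
    if num_slots is None:
        num_slots = total_cpus  # NUM_SLOTS defaults to the CPU count
    if num_slots > total_cpus:
        # Advertising more slots than CPUs needs NUM_CPUS, not NUM_SLOTS.
        raise ValueError("NUM_SLOTS may not exceed the number of CPUs")
    return {f"slot{i}@{host}": memory_mb / num_slots
            for i in range(1, num_slots + 1)}


print(even_slots("vulture.cs.wisc.edu", total_cpus=4, memory_mb=4096))
```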


3.12.8.5 Configuring Startd Policy for SMP Machines

Section 3.5 details the Startd Policy Configuration. This section continues the discussion with respect to SMP machines.

Each slot within an SMP machine is treated as an independent machine, each with its own view of its machine state. There is a single set of policy expressions for the SMP machine as a whole. This policy may consider the slot state(s) in its expressions. This makes some policies easy to set, but it makes other policies difficult or impossible to set.

An easy policy to set configures how many of the slots notice console or tty activity on the SMP machine as a whole. Slots that are not configured to notice any activity report ConsoleIdle and KeyboardIdle times measured from when the condor_startd daemon was started (plus a configurable number of seconds). With this, you can set up a multiple CPU machine with the default policy settings, with the keyboard and console noticed by only one slot. Assuming a reasonable load average (see section 3.12.8.6 below on "Load Average for SMP Machines"), only that one slot will suspend or vacate its job when the owner starts typing at the machine again. The rest of the slots could be matched with jobs and keep them running, even while the user is interactively using the machine. If the default policy is used, all slots notice tty and console activity, and all currently running jobs would be suspended or preempted.

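This example policy is controlled with the following configuration variables:

  SLOTS_CONNECTED_TO_CONSOLE, the number of slots that notice console activity
  SLOTS_CONNECTED_TO_KEYBOARD, the number of slots that notice keyboard activity
  DISCONNECTED_KEYBOARD_IDLE_BOOST, the number of seconds added to the idle times reported by the slots that are not connected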

These configuration variables are fully described in section 3.3.10, which lists all the configuration file settings for the condor_startd.

The configuration of slots allows each slot to advertise its own machine ClassAd. Yet, there is only one set of policy expressions for the SMP machine as a whole. This makes the implementation of certain types of policies impossible. While evaluating the state of one slot (within the SMP machine), the state of the other slots (again within the SMP machine) is not available. Decisions for one slot cannot be based on what other slots within the SMP machine are doing.

Specifically, the evaluation of a slot policy expression works in the following way.

  1. The configuration file specifies policy expressions that are shared among all of the slots on the SMP machine.
  2. Each slot reads the configuration file and sets up its own machine ClassAd.
  3. Each slot is now separate from the others. It has a different state, a different machine ClassAd, and if there is a job running, a separate job ad. Each slot periodically evaluates the policy expressions, changing its own state as necessary. This occurs independently of the other slots on the machine. So, if the condor_startd daemon is evaluating a policy expression on a specific slot, and the policy expression refers to ProcID, Owner, or any attribute from a job ad, it always refers to the ClassAd of the job running on the specific slot.

To set a different policy for each of the slots within an SMP machine, write the policy expression (SUSPEND, in this example) in the form

SUSPEND = ( (SlotID == 1) && (PolicyForSlot1) ) || \
            ( (SlotID == 2) && (PolicyForSlot2) )
where (PolicyForSlot1) and (PolicyForSlot2) are the desired expressions for each slot.
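The per-slot pattern can be illustrated in Python, treating each slot's ClassAd as a dictionary. In this hypothetical sketch, suspend evaluates one shared expression against whichever slot's ad it is given, just as each slot evaluates the shared configuration independently. The two policy functions are made-up stand-ins for (PolicyForSlot1) and (PolicyForSlot2).

```python
def suspend(slot_ad, policy_for_slot):
    """Evaluate SUSPEND = (SlotID == 1 && P1) || (SlotID == 2 && P2) || ..."""
    return any(slot_ad["SlotID"] == slot_id and policy(slot_ad)
               for slot_id, policy in policy_for_slot.items())


def policy_for_slot1(ad):
    return ad["KeyboardIdle"] < 60  # hypothetical: suspend on recent keyboard use


def policy_for_slot2(ad):
    return False  # hypothetical: never suspend


policies = {1: policy_for_slot1, 2: policy_for_slot2}
for ad in ({"SlotID": 1, "KeyboardIdle": 5}, {"SlotID": 2, "KeyboardIdle": 5}):
    print(f"slot{ad['SlotID']}: SUSPEND = {suspend(ad, policies)}")
```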


3.12.8.6 Load Average for SMP Machines

Most operating systems define the load average for an SMP machine as the total load on all CPUs. For example, if you have a 4-CPU machine with 3 CPU-bound processes running at the same time, the load would be 3.0. In Condor, we maintain this view of the total load average and publish it in all resource ClassAds as TotalLoadAvg.

Condor also provides a per-CPU load average for SMP machines. This nicely represents the model that each node on an SMP is a slot, separate from the other nodes. All of the default, single-CPU policy expressions can be used directly on SMP machines, without modification, since the LoadAvg and CondorLoadAvg attributes are the per-slot versions, not the total, SMP-wide versions.

The per-CPU load average on SMP machines is a Condor invention. No system call exists to ask the operating system for this value. Condor already computes the load average generated by Condor on each slot. It does this by closely monitoring all processes spawned by any of the Condor daemons, even ones that are orphaned and then inherited by init. This Condor load average per slot is reported as the attribute CondorLoadAvg in all resource ClassAds, and the total Condor load average for the entire machine is reported as TotalCondorLoadAvg. The total, system-wide load average for the entire machine is reported as TotalLoadAvg.

Basically, Condor walks through all the slots and assigns portions of the total load average to each one. First, Condor assigns the known Condor load average to each slot that is generating load. If any load average is left in the total system load, it is considered owner load. Slots that Condor believes are in the Owner state (like ones that have keyboard activity) are the first to be assigned this owner load. Condor hands out owner load in increments of at most 1.0, so generally speaking, no slot has a load average above 1.0. If Condor runs out of total load average before it runs out of slots, all the remaining slots believe that they have no load average at all. If, instead, Condor runs out of slots and still has owner load remaining, it starts assigning that load to slots running Condor jobs as well, which can give individual slots a load average higher than 1.0.
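The assignment procedure can be sketched as follows. This is a simplified Python model written for this explanation, not Condor's source. The name distribute_load is hypothetical, and spreading leftover owner load evenly over the slots running Condor jobs is an assumption, since the text does not say how that remainder is divided. Applied to the figures in the second log excerpt of the next subsection, it reproduces the per-slot loads shown there.

```python
EPS = 1e-9  # treat tiny floating-point leftovers as zero


def distribute_load(total_load, condor_loads, in_owner_state):
    """Split a machine-wide load average into per-slot values (simplified model)."""
    n = len(condor_loads)
    loads = list(condor_loads)  # each slot first receives its own Condor load
    owner_load = max(0.0, total_load - sum(condor_loads))

    # Owner load goes first to slots in the Owner state, then to other slots
    # that generate no Condor load, at most 1.0 per slot.
    order = [i for i in range(n) if in_owner_state[i]]
    order += [i for i in range(n) if not in_owner_state[i] and condor_loads[i] == 0.0]
    for i in order:
        if owner_load <= EPS:
            break  # the remaining slots report no owner load at all
        share = min(1.0, owner_load)
        loads[i] += share
        owner_load -= share

    # Assumption: leftover owner load is spread evenly over the slots running
    # Condor jobs (or over all slots if none are), which can exceed 1.0.
    if owner_load > EPS:
        targets = [i for i in range(n) if condor_loads[i] > 0.0] or list(range(n))
        for i in targets:
            loads[i] += owner_load / len(targets)
    return loads


# slot1 in the Owner state, slot2 running a job generating 0.996 Condor load.
print(distribute_load(1.25, [0.0, 0.996], [True, False]))
```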


3.12.8.7 Debug logging in the SMP Startd

This section describes how the condor_startd daemon handles its debugging messages for SMP machines. In general, a given log message will either be something that is machine-wide (like reporting the total system load average), or it will be specific to a given slot. Any log entries specific to a slot have an extra header printed out in the entry: slot#:. So, for example, here is the output about system resources that are being gathered (with D_FULLDEBUG and D_LOAD turned on) on a 2-CPU machine with no Condor activity, and the keyboard connected to both slots:

11/25 18:15 Swap space: 131064
11/25 18:15 number of Kbytes available for (/home/condor/execute): 1345063
11/25 18:15 Looking up RESERVED_DISK parameter
11/25 18:15 Reserving 5120 Kbytes for file system
11/25 18:15 Disk space: 1339943
11/25 18:15 Load avg: 0.340000 0.800000 1.170000
11/25 18:15 Idle Time: user= 0 , console= 4 seconds
11/25 18:15 SystemLoad: 0.340   TotalCondorLoad: 0.000  TotalOwnerLoad: 0.340
11/25 18:15 slot1: Idle time: Keyboard: 0        Console: 4
11/25 18:15 slot1: SystemLoad: 0.340  CondorLoad: 0.000  OwnerLoad: 0.340
11/25 18:15 slot2: Idle time: Keyboard: 0        Console: 4
11/25 18:15 slot2: SystemLoad: 0.000  CondorLoad: 0.000  OwnerLoad: 0.000
11/25 18:15 slot1: State: Owner           Activity: Idle
11/25 18:15 slot2: State: Owner           Activity: Idle

If, on the other hand, this machine only had one slot connected to the keyboard and console, and the other slot was running a job, it might look something like this:

11/25 18:19 Load avg: 1.250000 0.910000 1.090000
11/25 18:19 Idle Time: user= 0 , console= 0 seconds
11/25 18:19 SystemLoad: 1.250   TotalCondorLoad: 0.996  TotalOwnerLoad: 0.254
11/25 18:19 slot1: Idle time: Keyboard: 0        Console: 0
11/25 18:19 slot1: SystemLoad: 0.254  CondorLoad: 0.000  OwnerLoad: 0.254
11/25 18:19 slot2: Idle time: Keyboard: 1496     Console: 1496
11/25 18:19 slot2: SystemLoad: 0.996  CondorLoad: 0.996  OwnerLoad: 0.000
11/25 18:19 slot1: State: Owner           Activity: Idle
11/25 18:19 slot2: State: Claimed         Activity: Busy

As you can see, shared system resources (like total swap space) are printed without the header, and slot-specific messages (like the load average or state of each slot) get the special header prepended.


3.12.8.8 Configuring STARTD_ATTRS on a per-slot basis

The STARTD_ATTRS (and legacy STARTD_EXPRS) settings can be configured on a per-slot basis. The condor_startd daemon builds the list of items to advertise by combining the lists in this order:

  1. STARTD_ATTRS
  2. STARTD_EXPRS
  3. SLOT<N>_STARTD_ATTRS
  4. SLOT<N>_STARTD_EXPRS

For example, consider the following configuration:

STARTD_ATTRS = favorite_color, favorite_season
SLOT1_STARTD_ATTRS = favorite_movie
SLOT2_STARTD_ATTRS = favorite_song

This will result in the condor_startd ClassAd for slot1 defining values for favorite_color, favorite_season, and favorite_movie. slot2 will have values for favorite_color, favorite_season, and favorite_song.
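The combination order can be sketched in Python. In this hypothetical model (advertised_attrs is an illustrative name), configuration values are plain strings and list entries may be separated by commas and/or spaces. A name that appears in more than one list is advertised once, which is an assumption about duplicate handling.

```python
def advertised_attrs(settings, slot_id):
    """Names a slot advertises, combining the lists in the documented order."""
    keys = ["STARTD_ATTRS", "STARTD_EXPRS",
            f"SLOT{slot_id}_STARTD_ATTRS", f"SLOT{slot_id}_STARTD_EXPRS"]
    names = []
    for key in keys:
        for name in settings.get(key, "").replace(",", " ").split():
            if name not in names:  # assumption: a repeated name is listed once
                names.append(name)
    return names


settings = {
    "STARTD_ATTRS": "favorite_color, favorite_season",
    "SLOT1_STARTD_ATTRS": "favorite_movie",
    "SLOT2_STARTD_ATTRS": "favorite_song",
}
for slot in (1, 2):
    print(f"slot{slot}:", advertised_attrs(settings, slot))
```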

Attributes themselves in the STARTD_ATTRS list can also be defined on a per-slot basis. Here is another example:

favorite_color = "blue"
favorite_season = "spring"
STARTD_ATTRS = favorite_color, favorite_season
SLOT2_favorite_color = "green"
SLOT3_favorite_season = "summer"

For this example, the condor_startd ClassAds are

slot1:
favorite_color = "blue"
favorite_season = "spring"
slot2:
favorite_color = "green"
favorite_season = "spring"
slot3:
favorite_color = "blue"
favorite_season = "summer"
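The per-slot override amounts to a lookup with a fallback. The minimal sketch below reproduces the three ClassAds above. It assumes configuration values are kept as strings in a dictionary, uses a hypothetical name (slot_value), and performs a case-sensitive lookup for simplicity.

```python
def slot_value(settings, slot_id, attr):
    """Return SLOT<N>_<attr> if it is set, otherwise the machine-wide <attr>."""
    return settings.get(f"SLOT{slot_id}_{attr}", settings.get(attr))


settings = {
    "favorite_color": '"blue"',
    "favorite_season": '"spring"',
    "SLOT2_favorite_color": '"green"',
    "SLOT3_favorite_season": '"summer"',
}
for slot in (1, 2, 3):
    print(f"slot{slot}:")
    for attr in ("favorite_color", "favorite_season"):
        print(f"{attr} = {slot_value(settings, slot, attr)}")
```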


3.12.8.9 Dynamic condor_startd Provisioning: Dynamic Slots

Dynamic provisioning, also referred to as a partitionable condor_startd or as dynamic slots, allows users to mark slots as partitionable. This means that more than one job can occupy a single slot at any one time. Typically, slots have a fixed set of resources, including the CPUs, memory and disk space. By partitioning the slot, these resources become more flexible and able to be better utilized.

Dynamic provisioning provides powerful configuration possibilities, and so should be used with care. Specifically, while preemption occurs for each individual dynamic slot, it cannot occur directly for the partitionable slot, or for groups of dynamic slots. For example, for a large number of jobs requiring 1GB of memory, a pool might be split up into 1GB dynamic slots. In this instance a job requiring 2GB of memory will be starved and unable to run. A partial solution to this problem is provided by condor_defrag, which is discussed in section 3.12.8.10.

Here is an example that demonstrates how more than one job can be matched to a single slot using dynamic provisioning. In this example, slot1 has the following resources:

cpu=10
memory=10240
disk=BIG
Assume that JobA is allocated to this slot. JobA includes the following requirements:
cpu=3
memory=1024
disk=10240
The portion of the slot that is utilized is referred to as Slot1.1, and after allocation, the slot advertises that it has the following resources still available:
cpu=7
memory=9216
disk=BIG-10240
As each new job is allocated to Slot1, it breaks into Slot1.1, Slot1.2, etc., until the entire set of available resources have been consumed by jobs.
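The bookkeeping in this example can be sketched with a small Python class. PartitionableSlot and claim are hypothetical names, memory is assumed to be in Mbytes and disk in Kbytes to match the request_* commands, and the value BIG is replaced by an assumed 1,000,000 Kbytes. The matchmaking and preemption that Condor performs are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionableSlot:
    name: str
    cpus: int
    memory: int  # Mbytes (assumed)
    disk: int    # Kbytes
    children: list = field(default_factory=list)

    def claim(self, cpus, memory, disk):
        """Carve a dynamic slot out of what is left, or return None if it does not fit."""
        if cpus > self.cpus or memory > self.memory or disk > self.disk:
            return None
        self.cpus -= cpus
        self.memory -= memory
        self.disk -= disk
        child = f"{self.name}.{len(self.children) + 1}"
        self.children.append(child)
        return child


slot1 = PartitionableSlot("Slot1", cpus=10, memory=10240, disk=1_000_000)  # BIG assumed
print(slot1.claim(cpus=3, memory=1024, disk=10240))  # JobA's request
print(slot1.cpus, slot1.memory, slot1.disk)          # what Slot1 still advertises
```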

To enable dynamic provisioning, set the SLOT_TYPE_<N>_PARTITIONABLE configuration variable to True. The string N within the configuration variable name is the slot type number.

In a pool using dynamic provisioning, jobs can specify the resources they need with the following submit description file commands:

request_cpus
request_memory
request_disk (in kilobytes)

This example shows a portion of the job submit description file for use when submitting a job to a pool with dynamic provisioning.

universe = vanilla

request_cpus = 3
request_memory = 1024
request_disk = 10240

queue

For each type of slot, the original partitionable slot and the new, smaller dynamic slots, an attribute is added to identify it. The original slot will have an attribute stating

  PartitionableSlot = True
and the dynamic slots will have the attribute
  DynamicSlot = True
These attributes may be used in a START expression for the purposes of creating detailed policies.

A partitionable slot will always appear as though it is not running a job. It will eventually show as having no available resources, which will prevent further matching to new jobs. Because it has been effectively broken up into smaller slots, these will show as running jobs directly. These dynamic slots can also be preempted in the same way as nonpartitioned slots.


3.12.8.10 Defragmenting Dynamic Slots

When partitionable slots are used, some attention must be given to the problem of the starvation of large jobs due to the fragmentation of resources. The problem is that over time the machine resources may become partitioned into slots suitable for running small jobs. If a sufficient number of these slots do not happen to become idle at the same time on a machine, then a large job will not be able to claim that machine, even if the large job has a better priority than the small jobs.

One way of addressing the partitionable slot fragmentation problem is to periodically drain all jobs from fragmented machines so that they become defragmented. The condor_defrag daemon implements a configurable policy for doing that. To use this daemon, DEFRAG must be added to DAEMON_LIST, and the defragmentation policy must be configured. Typically, only one instance of the condor_defrag daemon would be run per pool. It is a lightweight daemon that should not require a lot of system resources.

Here is an example configuration that puts the condor_defrag daemon to work:

DAEMON_LIST = $(DAEMON_LIST) DEFRAG
DEFRAG_INTERVAL = 3600
DEFRAG_DRAINING_MACHINES_PER_HOUR = 1.0
DEFRAG_MAX_WHOLE_MACHINES = 20
DEFRAG_MAX_CONCURRENT_DRAINING = 10

This example policy tells condor_defrag to initiate draining jobs from 1 machine per hour, but to avoid initiating new draining if there are 20 completely defragmented machines or 10 machines in a draining state. A full description of each configuration variable used by the condor_defrag daemon may be found in section 3.3.37.
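A rough model of one policy decision, using the values from this example, is sketched below. It is a simplification under stated assumptions and not the condor_defrag algorithm: machines_to_drain is a hypothetical name, and fractional draining rates are simply truncated here, which the real daemon may handle differently.

```python
def machines_to_drain(whole_machines, draining, interval=3600, per_hour=1.0,
                      max_whole=20, max_concurrent=10):
    """How many machines one simplified defrag cycle would start draining.

    Defaults mirror the example: DEFRAG_INTERVAL = 3600,
    DEFRAG_DRAINING_MACHINES_PER_HOUR = 1.0, DEFRAG_MAX_WHOLE_MACHINES = 20,
    DEFRAG_MAX_CONCURRENT_DRAINING = 10.
    """
    if whole_machines >= max_whole or draining >= max_concurrent:
        return 0  # enough whole machines, or too many already draining
    wanted = int(per_hour * interval / 3600)  # simplification: truncate fractions
    return min(wanted, max_concurrent - draining)


print(machines_to_drain(whole_machines=5, draining=2))
```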

By default, when a machine is drained, existing jobs are gracefully evicted. This means that each job will be allowed to use the remaining time promised to it by MaxJobRetirementTime. If the job has not finished when the retirement time runs out, the job will be killed with a soft kill signal, so that it has an opportunity to save a checkpoint (if the job supports this). No new jobs will be allowed to start while the machine is draining. To reduce unused time on the machine caused by some jobs having longer retirement time than others, the eviction of jobs with shorter retirement time is delayed until the job with the longest retirement time needs to be evicted.
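The effect of delaying the shorter evictions can be illustrated with a tiny sketch. Assuming the remaining retirement times are known in seconds (drain_plan is a hypothetical name), every job is scheduled for eviction at the moment the longest retirement time runs out, rather than each at its own deadline.

```python
def drain_plan(retirement_left):
    """Evict every job when the longest remaining retirement time runs out.

    retirement_left maps a job id to its remaining retirement time in seconds.
    Jobs that finish on their own before that moment simply exit early.
    """
    deadline = max(retirement_left.values(), default=0)
    return {job: deadline for job in retirement_left}


print(drain_plan({"job1": 600, "job2": 7200, "job3": 1800}))
```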

There is a trade-off between reduced starvation and throughput. Frequent draining of machines reduces the chance of starvation of large jobs. However, frequent draining reduces total throughput. Some of the machine's resources may go unused during draining, if some jobs finish before others. If jobs that cannot produce checkpoints are killed because they run past the end of their retirement time during draining, this also adds to the cost of draining.

To help gauge the costs of draining, the condor_startd advertises the accumulated time that was unused due to draining and the time spent by jobs that were killed due to draining. These are advertised respectively in the attributes TotalMachineDrainingUnclaimedTime and TotalMachineDrainingBadput. The condor_defrag daemon averages these values across the pool and advertises the result in its daemon ClassAd in the attributes AvgDrainingBadput and AvgDrainingUnclaimed. Details of all attributes published by the condor_defrag daemon are described in section 11.

The following command may be used to view the condor_defrag daemon ClassAd:

condor_status -l -any -constraint 'MyType == "Defrag"'


3.12.9 Condor's Dedicated Scheduling

The dedicated scheduler is a part of the condor_schedd that handles the scheduling of parallel jobs that require more than one machine concurrently running per job. MPI applications are a common use for the dedicated scheduler, but parallel applications which do not require MPI can also be run with the dedicated scheduler. All jobs which use the parallel universe are routed to the dedicated scheduler within the condor_schedd they were submitted to. A default Condor installation does not configure a dedicated scheduler; the administrator must designate one or more condor_schedd daemons to perform as dedicated scheduler.


3.12.9.1 Selecting and Setting Up a Dedicated Scheduler

We recommend that you select a single machine within a Condor pool to act as the dedicated scheduler. This becomes the machine from which all users submit their parallel universe jobs. The perfect choice for the dedicated scheduler is the single, front-end machine for a dedicated cluster of compute nodes. For the pool without an obvious choice for a submit machine, choose a machine that all users can log into, as well as one that is likely to be up and running all the time. All of Condor's other resource requirements for a submit machine apply to this machine, such as having enough disk space in the spool directory to hold jobs. See section 3.2.2 for details on these issues.


3.12.9.2 Configuration Examples for Dedicated Resources

Each machine may have its own policy for the execution of jobs. This policy is set by configuration. Each machine with aspects of its configuration that are dedicated identifies the dedicated scheduler. And, the ClassAd representing a job to be executed on one or more of these dedicated machines includes an identifying attribute. An example configuration file with the following various policy settings is /etc/condor_config.local.dedicated.resource.

Each dedicated machine defines the configuration variable DedicatedScheduler, which identifies the dedicated scheduler it is managed by. The local configuration file for any dedicated resource contains a modified form of

DedicatedScheduler = "DedicatedScheduler@full.host.name"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler

Substitute the host name of the dedicated scheduler machine for the string "full.host.name".

If running personal Condor, the name of the scheduler includes the user name it was started as, so the configuration appears as:

DedicatedScheduler = "DedicatedScheduler@username@full.host.name"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler

All dedicated resources must have policy expressions which allow for jobs to always run, but not be preempted. The resource must also be configured to prefer jobs from the dedicated scheduler over all other jobs. Therefore, configuration gives the dedicated scheduler of choice the highest rank. It is worth noting that Condor puts no other requirements on a resource for it to be considered dedicated.

Job ClassAds from the dedicated scheduler contain the attribute Scheduler. The attribute is defined by a string of the form

Scheduler = "DedicatedScheduler@full.host.name"
The host name of the dedicated scheduler substitutes for the string full.host.name.

Different resources in the pool may have different dedicated policies by varying the local configuration.

Policy Scenario: Machine Runs Only Jobs That Require Dedicated Resources

One possible scenario for the use of a dedicated resource is to only run jobs that require the dedicated resource. To enact this policy, configure the machine with the following expressions:

START     = Scheduler =?= $(DedicatedScheduler)
SUSPEND   = False
CONTINUE  = True
PREEMPT   = False
KILL      = False
WANT_SUSPEND   = False
WANT_VACATE    = False
RANK      = Scheduler =?= $(DedicatedScheduler)

The START expression specifies that a job's Scheduler attribute must match the string in the corresponding DedicatedScheduler attribute of the machine ClassAd. The RANK expression specifies that this same job (with the Scheduler attribute) has the highest rank. This prevents other jobs from preempting it based on user priorities. The rest of the expressions disable all of the condor_startd daemon's regular policies for evicting jobs when keyboard and CPU activity is discovered on the machine.

Policy Scenario: Run Both Jobs That Do and Do Not Require Dedicated Resources

While the first example works nicely for jobs requiring dedicated resources, it can lead to poor utilization of the dedicated machines. A more sophisticated strategy allows the machines to run other jobs, when no jobs that require dedicated resources exist. The machine is configured to prefer jobs that require dedicated resources, but not prevent others from running.

To implement this, configure the machine as a dedicated resource (as above) modifying only the START expression:

START = True

Policy Scenario: Adding Desk-Top Resources To The Mix

A third policy example allows all jobs. These desk-top machines use a preexisting START expression that takes the machine owner's usage into account for some jobs. The machine does not preempt jobs that must run on dedicated resources, while it will preempt other jobs based on a previously set policy. So, the default pool policy is used for starting and stopping jobs, while jobs that require a dedicated resource always start and are not preempted.

The START, SUSPEND, PREEMPT, and RANK policies are set in the global configuration. Locally, the configuration is modified to this hybrid policy by adding a second case.

SUSPEND    = Scheduler =!= $(DedicatedScheduler) && ($(SUSPEND))
PREEMPT    = Scheduler =!= $(DedicatedScheduler) && ($(PREEMPT))
RANK_FACTOR    = 1000000
RANK   = ((Scheduler =?= $(DedicatedScheduler)) * $(RANK_FACTOR)) \
               + $(RANK)
START  = (Scheduler =?= $(DedicatedScheduler)) || ($(START))

Define RANK_FACTOR to be a larger value than the maximum value possible for the existing rank expression. RANK is just a floating point value, so there is no harm in having a value that is very large.
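The reason a large RANK_FACTOR works can be checked with a short inequality. Write R_d and R_o for the values the existing RANK expression gives a job that requires the dedicated resource and some other job, let the =?= comparison contribute 1 for a matching job and 0 otherwise, and assume, as a simplifying hypothesis, that the existing expression always lies in 0 <= R < RANK_FACTOR = 1000000:

$$
\text{RANK}_{\text{dedicated}} = 1 \times 10^{6} + R_d \;\ge\; 10^{6} \;>\; R_o = 0 \times 10^{6} + R_o = \text{RANK}_{\text{other}}
$$

So under that assumption every job that requires the dedicated resource outranks every other job, while jobs of the same kind keep the ordering given by the original RANK expression.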

Policy Scenario: Parallel Scheduling Groups

In some parallel environments, machines are divided into groups, and jobs should not cross groups of machines - that is, all the nodes of a parallel job should be allocated to machines within the same group. The most common example is a pool of machines using InfiniBand switches. Each switch might connect 16 machines, and a pool might have 160 machines on 10 switches. If the InfiniBand switches are not routed to each other, each job must run on machines connected to the same switch.

The dedicated scheduler's parallel scheduling groups feature supports jobs that must not cross group boundaries. Define a group by having each machine within the group set the configuration variable ParallelSchedulingGroup to a string that uniquely names the group. The submit description file for a parallel universe job that must not cross group boundaries contains

+WantParallelSchedulingGroups = True

The dedicated scheduler then restricts the allocation to machines within a single group.


3.12.9.3 Preemption with Dedicated Jobs

The dedicated scheduler can optionally preempt running MPI jobs in favor of higher priority MPI jobs in its queue. Note that this is different from preemption in non-parallel universes, and MPI jobs cannot be preempted either by a machine's user pressing a key or by other means.

By default, the dedicated scheduler will never preempt running MPI jobs. Two configuration file items control dedicated preemption: SCHEDD_PREEMPTION_REQUIREMENTS and SCHEDD_PREEMPTION_RANK . These have no default value, so if either is not defined, preemption will never occur. SCHEDD_PREEMPTION_REQUIREMENTS must evaluate to True for a machine to be a candidate for this kind of preemption. If more machines are candidates for preemption than are needed to satisfy a higher priority job, the machines are sorted by SCHEDD_PREEMPTION_RANK, and only the highest ranked machines are taken.

Note that preempting one node of a running MPI job requires killing the entire job on all of its nodes. So, when preemption happens, it may end up freeing more machines than are strictly needed. Also, because Condor cannot produce checkpoints for MPI jobs, preempted jobs will be re-run, starting again from the beginning. Thus, the administrator should be careful when enabling dedicated preemption. The following example shows how to enable dedicated preemption.

STARTD_JOB_EXPRS = JobPrio
SCHEDD_PREEMPTION_REQUIREMENTS = (My.JobPrio < Target.JobPrio)
SCHEDD_PREEMPTION_RANK = 0.0

In this case, preemption is driven by job priority. If a set of machines is running a job at job priority 5, and the user submits a new job at job priority 10, the running job will be preempted for the new job. The old job is put back in the queue, and will begin again from the beginning when assigned to a new set of machines.
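For illustration, the two jobs in this scenario could come from submit description files that differ only in their priority command, which sets the job ClassAd attribute JobPrio that STARTD_JOB_EXPRS then copies into the machine ClassAd. These are sketches: they omit the other commands a parallel job needs.

```
# Fragment of the submit description file of the job already running
priority = 5
```

```
# Fragment of the submit description file of the new job; because
# 5 < 10, SCHEDD_PREEMPTION_REQUIREMENTS evaluates to True for the
# machines running the first job
priority = 10
```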


3.12.9.4 Grouping dedicated nodes into parallel scheduling groups

In some parallel environments, machines are divided into groups, and jobs should not cross groups of machines - that is, all the nodes of a parallel job should be allocated to machines in the same group. The most common example is a pool of machines using InfiniBand switches. Each switch might connect 16 machines, and a pool might have 160 machines on 10 switches. If the InfiniBand switches are not routed to each other, each job must run on machines connected to the same switch. The dedicated scheduler's parallel scheduling groups feature supports this operation.

Each condor_startd must define which group it belongs to by setting the ParallelSchedulingGroup variable in the configuration file and advertising it in the machine ClassAd. The value of this variable is a string, which should be the same for all condor_startd daemons in a given group. The property must be advertised in the condor_startd ClassAd by appending ParallelSchedulingGroup to the STARTD_ATTRS configuration variable. Then, parallel jobs which want to be scheduled by group declare this by setting +WantParallelSchedulingGroups = True in their submit description file.
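As a sketch, the configuration for one machine attached to a hypothetical switch named "switch-a" might look like the following; every machine on the same switch would use the same string. The group name is an assumption, and the STARTD_ATTRS line assumes the variable may already hold other attributes (such as DedicatedScheduler) that must keep being advertised.

```
# Name of the group this machine belongs to (hypothetical switch name)
ParallelSchedulingGroup = "switch-a"

# Advertise the group name in the machine ClassAd
STARTD_ATTRS = $(STARTD_ATTRS), ParallelSchedulingGroup
```

Parallel universe jobs that must stay within one such group still need +WantParallelSchedulingGroups = True in their submit description files.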


3.12.10 Configuring Condor for Running Backfill Jobs

Condor can be configured to run backfill jobs whenever the condor_startd has no other work to perform. These jobs are considered the lowest possible priority, but when machines would otherwise be idle, the resources can be put to good use.

Currently, Condor only supports using the Berkeley Open Infrastructure for Network Computing (BOINC) to provide the backfill jobs. More information about BOINC is available at http://boinc.berkeley.edu.

The rest of this section provides an overview of how backfill jobs work in Condor, details for configuring the policy for when backfill jobs are started or killed, and details on how to configure Condor to spawn the BOINC client to perform the work.


3.12.10.1 Overview of Backfill jobs in Condor

Whenever a resource controlled by Condor is in the Unclaimed/Idle state, it is totally idle; neither the interactive user nor a Condor job is performing any work. Machines in this state can be configured to enter the Backfill state, which allows the resource to attempt a background computation to keep itself busy until other work arrives (either a user returning to use the machine interactively, or a normal Condor job). Once a resource enters the Backfill state, the condor_startd will attempt to spawn another program, called a backfill client, to launch and manage the backfill computation. When other work arrives, the condor_startd will kill the backfill client and clean up any processes it has spawned, freeing the machine resources for the new, higher priority task. More details about the different states a Condor resource can enter and all of the possible transitions between them are described in section 3.5, especially sections 3.5.5, 3.5.6, and 3.5.7.

At this point, the only backfill system supported by Condor is BOINC. The condor_startd has the ability to start and stop the BOINC client program at the appropriate times, but otherwise provides no additional services to configure the BOINC computations themselves. Future versions of Condor might provide additional functionality to make it easier to manage BOINC computations from within Condor. For now, the BOINC client must be manually installed and configured outside of Condor on each backfill-enabled machine.


3.12.10.2 Defining the Backfill Policy

There is a small set of policy expressions that determine if a condor_startd will attempt to spawn a backfill client at all and, if so, control the transitions into and out of the Backfill state. This section briefly lists these expressions. More detail can be found in section 3.3.10.

ENABLE_BACKFILL
A boolean value to determine if any backfill functionality should be used. The default value is False.

BACKFILL_SYSTEM
A string that defines what backfill system to use for spawning and managing backfill computations. Currently, the only supported string is "BOINC".

START_BACKFILL
A boolean expression to control if a Condor resource should start a backfill client. This expression is only evaluated when the machine is in the Unclaimed/Idle state and the ENABLE_BACKFILL expression is True.

EVICT_BACKFILL
A boolean expression that is evaluated whenever a Condor resource is in the Backfill state. A value of True indicates the machine should immediately kill the currently running backfill client and any other spawned processes, and return to the Owner state.

The following example shows a possible configuration to enable backfill:

# Turn on backfill functionality, and use BOINC
ENABLE_BACKFILL = TRUE
BACKFILL_SYSTEM = BOINC

# Spawn a backfill job if we've been Unclaimed for more than 5
# minutes 
START_BACKFILL = $(StateTimer) > (5 * $(MINUTE))

# Evict a backfill job if the machine is busy (based on keyboard
# activity or cpu load)
EVICT_BACKFILL = $(MachineBusy)


3.12.10.3 Overview of the BOINC system

The BOINC system is a distributed computing environment for solving large scale scientific problems. A detailed explanation of this system is beyond the scope of this manual. Thorough documentation about BOINC is available at their website: http://boinc.berkeley.edu. However, a brief overview is provided here for sites interested in using BOINC with Condor to manage backfill jobs.

BOINC grew out of the relatively famous SETI@home computation, where volunteers installed special client software, in the form of a screen saver, that contacted a centralized server to download work units. Each work unit contained a set of radio telescope data, and the computation tried to find patterns in the data that could be a sign of intelligent life elsewhere in the universe (hence the name: ``Search for Extra Terrestrial Intelligence at home''). BOINC is developed by the Space Sciences Lab at the University of California, Berkeley, by the same people who created SETI@home. However, instead of being tied to the specific radio telescope application, BOINC is a generic infrastructure by which many different kinds of scientific computations can be solved. The current generation of SETI@home now runs on top of BOINC, along with various physics, biology, climatology, and other applications.

The basic computational model for BOINC and the original SETI@home is the same: volunteers install BOINC client software which runs whenever the machine would otherwise be idle. However, the BOINC installation on any given machine must be configured so that it knows what computations to work for (each computation is referred to as a project using BOINC's terminology), instead of always working on a hard coded computation. A given BOINC client can be configured to donate all of its cycles to a single project, or to split the cycles between projects so that, on average, the desired percentage of the computational power is allocated to each project. Once the client software (a program called the boinc_client) starts running, it attempts to contact a centralized server for each project it has been configured to work for. The BOINC software downloads the appropriate platform-specific application binary and some work units from the central server for each project. Whenever the client software completes a given work unit, it once again attempts to connect to that project's central server to upload the results and download more work.

BOINC participants must register at the centralized server for each project they wish to donate cycles to. The process produces a unique identifier so that the work performed by a given client can be credited to a specific user. BOINC keeps track of the work units completed by each user, so that users providing the most cycles get the highest rankings (and therefore, bragging rights).

Because BOINC already handles the problems of distributing the application binaries for each scientific computation, the work units, and compiling the results, it is a perfect system for managing backfill computations in Condor. Many of the applications that run on top of BOINC produce their own application-specific checkpoints, so even if the boinc_client is killed (for example, when a Condor job arrives at a machine, or if the interactive user returns) an entire work unit will not necessarily be lost.


3.12.10.4 Installing the BOINC client software

If a working installation of BOINC currently exists on machines where backfill is desired, skip the remainder of this section. Continue reading with the section titled ``Configuring the BOINC client under Condor''.

In Condor Version 7.7.6, the BOINC client software that actually spawns and manages the backfill computations (the boinc_client) must be manually downloaded, installed and configured outside of Condor. Hopefully in future versions, the Condor package will include the boinc_client, and there will be a way to automatically install and configure the BOINC software together with Condor.

The boinc_client executables can be obtained at one of the following locations:

http://boinc.berkeley.edu/download.php
This is the official BOINC download site, which provides binaries for MacOS 10.3 or higher, Linux/x86, and Windows/x86. From the download table, use the ``Recommended version'', and use the ``Core client only (command-line)'' package when available.

http://boinc.berkeley.edu/download_other.php
This page contains links to sites that distribute boinc_client binaries for other platforms beyond the officially supported ones.

Once the BOINC client software has been downloaded, the boinc_client binary should be placed in a location where the Condor daemons can use it. The path will be specified via a Condor configuration setting, BOINC_Executable , described below.

Additionally, a local directory on each machine should be created where the BOINC system can write files it needs. This directory must not be shared by multiple instances of the BOINC software, just like the spool or execute directories used by Condor. The location of this directory is defined using the BOINC_InitialDir macro, described below. The directory must be writable by whatever user the boinc_client will run as. This user is either the same as the user the Condor daemons are running as (if Condor is not running as root), or a user defined via the BOINC_Owner setting described below.

Finally, Condor administrators wishing to use BOINC for backfill jobs must create accounts at the various BOINC projects they want to donate cycles to. The details of this process vary from project to project. Beware that this step must be done manually, as the BOINC software spawned by Condor (the boinc_client) cannot automatically register a user at a given project (unlike the fancier GUI version of the BOINC client software which many users run as a screen saver). For example, to configure machines to perform work for the Einstein@home project (a physics experiment run by the University of Wisconsin at Milwaukee), Condor administrators should go to http://einstein.phys.uwm.edu/create_account_form.php, fill in the web form, and generate a new Einstein@home identity. This identity takes the form of a project URL (such as http://einstein.phys.uwm.edu) followed by an account key, which is a long string of letters and numbers that is used as a unique identifier. This URL and account key will be needed when configuring Condor to use BOINC for backfill computations (described in the next section).


3.12.10.5 Configuring the BOINC client under Condor

This section assumes that the BOINC client software has already been installed on a given machine, that the BOINC projects to join have been selected, and that a unique project account key has been created for each project. If any of these steps has not been completed, please read the previous section titled ``Installing the BOINC client software''.

Whenever the condor_startd decides to spawn the boinc_client to perform backfill computations (when ENABLE_BACKFILL is True, when the resource is in Unclaimed/Idle, and when the START_BACKFILL expression evaluates to True), it will spawn a condor_starter to directly launch and monitor the boinc_client program. This condor_starter is just like the one used to spawn normal Condor jobs. In fact, the argv[0] of the boinc_client will be renamed to ``condor_exec'', as described in section 2.15.1.

The condor_starter for spawning the boinc_client reads values out of the Condor configuration files to define the job it should run, as opposed to getting these values from a job classified ad in the case of a normal Condor job. All of the configuration settings to control things like the path to the boinc_client binary to use, the command-line arguments, the initial working directory, and so on, are prefixed with the string "BOINC_". Each possible setting is described below:

Required settings:

BOINC_Executable
The full path to the boinc_client binary to use.

BOINC_InitialDir
The full path to the local directory where BOINC should run.

BOINC_Universe
The Condor universe used for running the boinc_client program. This must be set to "vanilla" for BOINC to work under Condor.

BOINC_Owner
What user the boinc_client program should be run as. This macro is only used if the Condor daemons are running as root. In this case, the condor_starter must be told what user identity to switch to before spawning the boinc_client. This can be any valid user on the local system, but it must have write permission in whatever directory is specified in BOINC_InitialDir.

Optional settings:

BOINC_Arguments
Command-line arguments that should be passed to the boinc_client program. For example, one way to specify the BOINC project to join is to use the --attach_project argument with a project URL and account key:

BOINC_Arguments = --attach_project http://einstein.phys.uwm.edu [account_key]

BOINC_Environment
Environment variables that should be set for the boinc_client.

BOINC_Output
Full path to the file where STDOUT from the boinc_client should be written. If this macro is not defined, STDOUT will be discarded.

BOINC_Error
Full path to the file where STDERR from the boinc_client should be written. If this macro is not defined, STDERR will be discarded.

The following example shows one possible usage of these settings:

# Define a shared macro that can be used to define other settings.
# This directory must be manually created before attempting to run
# any backfill jobs.
BOINC_HOME = $(LOCAL_DIR)/boinc

# Path to the boinc_client to use, and required universe setting
BOINC_Executable = /usr/local/bin/boinc_client
BOINC_Universe = vanilla

# What initial working directory should BOINC use?
BOINC_InitialDir = $(BOINC_HOME)

# Save STDOUT and STDERR
BOINC_Output = $(BOINC_HOME)/boinc.out
BOINC_Error = $(BOINC_HOME)/boinc.err

If the Condor daemons reading this configuration are running as root, an additional macro must be defined:

# Specify the user that the boinc_client should run as:
BOINC_Owner = nobody

In this case, Condor would spawn the boinc_client as ``nobody'', so the directory specified in $(BOINC_HOME) would have to be writable by the ``nobody'' user.

A better choice would probably be to create a separate user account just for running BOINC jobs, so that the local BOINC installation is not writable by other processes running as ``nobody''. Alternatively, the BOINC_Owner could be set to ``daemon''.

Attaching to a specific BOINC project

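There are a few ways to attach a Condor/BOINC installation to a given BOINC project, including:

  1. Passing the --attach_project command-line argument to the boinc_client through the BOINC_Arguments setting, as shown above.
  2. Running the boinc_cmd tool. Because boinc_cmd communicates with a boinc_client that is already running, this method can only be used after the machine has entered the Backfill state and spawned the boinc_client.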

In the first two cases (using command-line arguments for boinc_client or running the boinc_cmd tool), BOINC will write out the resulting account file to the local BOINC directory on the machine, and then future invocations of the boinc_client will already be attached to the appropriate project(s). More information about participating in multiple BOINC projects can be found at http://boinc.berkeley.edu/multiple_projects.php.


3.12.10.6 BOINC on Windows

The Windows version of BOINC has multiple installation methods. The preferred method of installation for use with Condor is the ``Shared Installation'' method. Using this method gives all users access to the executables. During the installation process:

  1. Deselect the option which makes BOINC the default screen saver
  2. Deselect the option which runs BOINC on start-up.
  3. Do not launch BOINC at the conclusion of the installation.

There are three major differences from the Unix version to keep in mind when dealing with the Windows installation:

  1. The Windows executables have different names from the Unix versions. The Windows client is called boinc.exe. Therefore, the configuration variable BOINC_Executable is written:

    BOINC_Executable = C:\PROGRA~1\BOINC\boinc.exe
    

    The Unix administrative tool boinc_cmd is called boinccmd.exe on Windows.

  2. When using BOINC on Windows, the configuration variable BOINC_InitialDir will not be respected fully. To work around this difficulty, pass the BOINC home directory directly to the BOINC application via the BOINC_Arguments configuration variable. For Windows, rewrite the argument line as:

    BOINC_Arguments = --dir $(BOINC_HOME) \
              --attach_project http://einstein.phys.uwm.edu [account_key]
    

    As a consequence of setting the BOINC home directory, some projects may fail with the authentication error:

    Scheduler request failed: Peer 
    certificate cannot be authenticated 
    with known CA certificates.
    

    To resolve this issue, copy the ca-bundle.crt file from the BOINC installation directory to $(BOINC_HOME). This file appears to be project and machine independent, and it can therefore be distributed as part of an automated Condor installation.

  3. The BOINC_Owner configuration variable behaves differently on Windows than it does on Unix. Its value may take one of two forms:

    Setting this option causes the addition of the job attribute

    RunAsUser = True
    
    to the backfill client. This further implies that the configuration variable STARTER_ALLOW_RUNAS_OWNER be set to True to ensure that the local condor_starter is able to run jobs in this manner. For more information on the RunAsUser attribute, see section 6.2.4. For more information on the STARTER_ALLOW_RUNAS_OWNER configuration variable, see section 3.3.7.
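Pulling the Windows-specific points together, a backfill configuration on Windows might look like the sketch below. It assumes a BOINC_HOME macro defined as in the earlier Unix example (with a path appropriate for the Windows machine), keeps the [account_key] placeholder, and leaves out the BOINC_Owner value itself, since its form depends on the site's accounts.

```
# Windows BOINC client, installed with the Shared Installation method
BOINC_Executable = C:\PROGRA~1\BOINC\boinc.exe
BOINC_Universe = vanilla
BOINC_InitialDir = $(BOINC_HOME)

# BOINC_InitialDir is not fully respected on Windows, so also pass
# the BOINC home directory to the client explicitly
BOINC_Arguments = --dir $(BOINC_HOME) \
          --attach_project http://einstein.phys.uwm.edu [account_key]

# Needed when BOINC_Owner is set, so that the condor_starter may run
# the backfill client with RunAsUser = True
STARTER_ALLOW_RUNAS_OWNER = True
```

Remember to copy ca-bundle.crt from the BOINC installation directory into $(BOINC_HOME) if projects report the certificate error described above.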


3.12.11 Group ID-Based Process Tracking

One function that Condor often must perform is keeping track of all processes created by a job. This is done so that Condor can provide resource usage statistics about jobs, and also so that Condor can properly clean up any processes that jobs leave behind when they exit.

In general, tracking process families is difficult to do reliably. By default Condor uses a combination of process parent-child relationships, process groups, and information that Condor places in a job's environment to track process families on a best-effort basis. This usually works well, but it can falter for certain applications or for jobs that try to evade detection.

Jobs that run with a user account dedicated for Condor's use can be reliably tracked, since all Condor needs to do is look for all processes running using the given account. Administrators must specify in Condor's configuration what accounts can be considered dedicated via the DEDICATED_EXECUTE_ACCOUNT_REGEXP setting. See Section 3.6.13 for further details.

Ideally, jobs can be reliably tracked regardless of the user account they execute under. This can be accomplished with group ID-based tracking. This method of tracking requires that a range of dedicated group IDs (GID) be set aside for Condor's use. The number of GIDs that must be set aside for an execute machine is equal to its number of execution slots. GID-based tracking is only available on Linux, and it requires that Condor either runs as root or uses privilege separation (see Section 3.6.14).

GID-based tracking works by placing a dedicated GID in the supplementary group list of a job's initial process. Since modifying the supplementary group ID list requires root privilege, the job will not be able to create processes that go unnoticed by Condor.

Once a suitable GID range has been set aside for process tracking, GID-based tracking can be enabled via the USE_GID_PROCESS_TRACKING parameter. The minimum and maximum GIDs included in the range are specified with the MIN_TRACKING_GID and MAX_TRACKING_GID settings. For example, the following would enable GID-based tracking for an execute machine with 8 slots.

USE_GID_PROCESS_TRACKING = True
MIN_TRACKING_GID = 750
MAX_TRACKING_GID = 757

If the defined range is too small, such that there is not a GID available when starting a job, then the condor_starter will fail as it tries to start the job. An error message will be logged stating that there are no more tracking GIDs.
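Because both bounds are inclusive, the number of tracking GIDs is one more than their difference, and it must be at least the number of execution slots. For the 8-slot example above:

$$
N_{\text{GID}} = \text{MAX\_TRACKING\_GID} - \text{MIN\_TRACKING\_GID} + 1 = 757 - 750 + 1 = 8 \;\ge\; N_{\text{slots}} = 8
$$

By the same arithmetic, a machine with 16 slots whose range starts at 750 would need MAX_TRACKING_GID = 750 + 16 - 1 = 765.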

GID-based process tracking requires use of the condor_procd. If USE_GID_PROCESS_TRACKING is true, the condor_procd will be used regardless of the USE_PROCD setting. Changes to MIN_TRACKING_GID and MAX_TRACKING_GID require a full restart of Condor.


3.12.12 Cgroup-Based Process Tracking

Linux kernel version 2.6.24 introduced a feature that allows Condor to more accurately and safely manage jobs composed of sets of processes. This Linux feature is called Control Groups, or cgroups for short, and it is available starting with RHEL 6, Debian 6, and related distributions. Documentation about Linux kernel support for cgroups can be found in the Documentation directory in the kernel source code distribution. Another good reference is the Red Hat Enterprise Linux 6 Resource Management Guide, at http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/index.html. Even if cgroup support is built into the kernel, many distributions do not install the cgroup tools by default. In order to use cgroups, the tools must be installed. On RPM-based systems, these can be installed with the command

yum install libcgroup\*

Starting with Condor version 7.7.0, the condor_starter daemon can optionally use cgroups to accurately track all the processes started by a job, even when quickly-exiting parent processes spawn many child processes. As with the GID-based tracking, this is only implemented when a condor_procd daemon is running. The Condor team recommends enabling this feature on Linux platforms that support it. When cgroup tracking is enabled, Condor is able to report a much more accurate measurement of the physical memory used by a set of processes.

Kernel cgroups are named in a virtual file system hierarchy. Condor will put each running job on the execute node in a separate cgroup, named using the job's attributes by job_<ClusterId>_<ProcId>, where <ClusterId> is replaced by the job ClassAd attribute ClusterId, and <ProcId> is replaced by the job ClassAd attribute ProcId. These directories will be under a base directory named by the Condor configuration variable BASE_CGROUP . This variable has no default value, so if the variable is not set, cgroup tracking will not be used. Unless there is a need for integration of Condor jobs with other cgroup-based tracking, a good choice for BASE_CGROUP location might be /condor.
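A minimal configuration sketch that enables cgroup-based tracking with the suggested base location is shown below. It assumes the cgroup controllers are already mounted, and setting USE_PROCD explicitly is only a precaution, since the tracking is carried out by the condor_procd. The base name lines up with the group condor entry in the cgconfig.conf example that follows.

```
# Each job's cgroup, job_<ClusterId>_<ProcId>, is created under this base
BASE_CGROUP = /condor

# cgroup-based tracking is performed by the condor_procd
USE_PROCD = True
```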

Condor itself will not mount the virtual cgroup file systems. This can either be done by hand at each system reboot, by the cgconfig service which reads a file called /etc/cgconfig.conf, or automatically by the systemd service on systems which use systemd instead of init.

Here is an example of the contents of the file /etc/cgconfig.conf:

mount {
        cpuacct = /mnt/cgroups/cpuacct;
        memory  = /mnt/cgroups/memory;
        freezer = /mnt/cgroups/freezer;
        blkio   = /mnt/cgroups/blkio;
}

group condor {
        cpuacct {}
        memory {}
        freezer {}
        blkio {}
}

If the mount command shows that no cgroup file systems are mounted, then the four controllers that Condor needs (cpuacct, memory, freezer, and blkio) must be mounted, either by hand or by the cgconfig service.

Once cgroup-based tracking is configured, usage should be invisible to the user and administrator. The condor_procd log, as defined by configuration variable PROCD_LOG, will mention that it is using this method, but no user visible changes should occur, other than the impossibility of a quickly-forking process escaping from the control of the condor_starter, and the more accurate reporting of memory usage.


3.12.13 Limiting Resource Usage

An administrator can strictly limit the system resources used by any job that can be wrapped by the script defined by the configuration variable USER_JOB_WRAPPER . These are jobs within universes that are controlled by the condor_starter daemon: the vanilla, standard, java, local, and parallel universes.

The job's ClassAd is written by the condor_starter daemon. It will need to contain attributes that the script defined by USER_JOB_WRAPPER can use to implement platform specific resource limiting actions. Examples of resources that may be referred to for limiting purposes are RAM, swap space, file descriptors, stack size, and core file size.

An initial sample of a USER_JOB_WRAPPER script is provided in the installation at $(LIBEXEC)/condor_limits_wrapper.sh. Here are the contents of that file:

#!/bin/bash
# Copyright 2008 Red Hat, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

if [[ $_CONDOR_MACHINE_AD != "" ]]; then
   mem_limit=$((`egrep '^Memory' $_CONDOR_MACHINE_AD | cut -d ' ' -f 3` * 1024))
#   block_size=$((`stat -f -c %s .` / 1024))
#   disk_limit=$((`egrep '^Disk' $_CONDOR_MACHINE_AD | cut -d ' ' -f 3` / $block_size))
   disk_limit=`egrep '^Disk' $_CONDOR_MACHINE_AD | cut -d ' ' -f 3`
   vm_limit=`egrep '^VirtualMemory' $_CONDOR_MACHINE_AD | cut -d ' ' -f 3`

   ulimit -d $mem_limit
   if [[ $? != 0 ]] || [[ $mem_limit = "" ]]; then
      echo "Failed to set Memory Resource Limit" > $_CONDOR_WRAPPER_ERROR_FILE
      exit 1
   fi
   ulimit -f $disk_limit
   if [[ $? != 0 ]] || [[ $disk_limit = "" ]]; then
      echo "Failed to set Disk Resource Limit" > $_CONDOR_WRAPPER_ERROR_FILE
      exit 1
   fi
   ulimit -v $vm_limit
   if [[ $? != 0 ]] || [[ $vm_limit = "" ]]; then
      echo "Failed to set Virtual Memory Resource Limit" > $_CONDOR_WRAPPER_ERROR_FILE
      exit 1
   fi
fi

exec "$@"
error=$?
echo "Failed to exec($error): $@" > $_CONDOR_WRAPPER_ERROR_FILE
exit 1

If used in an unmodified form, this script sets the job's limits on a per-slot basis for memory, disk, and virtual memory usage, with the limits defined by the values in the machine ClassAd. This example file will need to be modified and merged for use with a preexisting USER_JOB_WRAPPER script.
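To use the sample script unmodified on an execute machine, point USER_JOB_WRAPPER at the installed copy; a minimal sketch:

```
# Wrap every job started by the condor_starter with the sample
# script, which applies per-slot memory, disk and virtual memory limits
USER_JOB_WRAPPER = $(LIBEXEC)/condor_limits_wrapper.sh
```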

If additional functionality is added to the script, an administrator is likely to use the USER_JOB_WRAPPER script in conjunction with SUBMIT_EXPRS to force the job ClassAd to contain attributes that the USER_JOB_WRAPPER script expects to have defined.
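For instance, suppose an extended wrapper read a job attribute named MaxOpenFiles from the job ClassAd to set a file descriptor limit. The attribute name and its default value are hypothetical; the sketch below shows how configuration on the submit machines could make condor_submit insert a default value into every job ClassAd.

```
# Hypothetical attribute consumed by a customized USER_JOB_WRAPPER
MaxOpenFiles = 1024

# Insert MaxOpenFiles into every job ClassAd created by condor_submit
SUBMIT_EXPRS = $(SUBMIT_EXPRS) MaxOpenFiles
```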

The following variables are set by the condor_starter daemon in the environment of the USER_JOB_WRAPPER script, when USER_JOB_WRAPPER is defined.

_CONDOR_MACHINE_AD
The full path and file name of the file containing the machine ClassAd.
_CONDOR_JOB_AD
The full path and file name of the file containing the job ClassAd.
_CONDOR_WRAPPER_ERROR_FILE
The full path and file name of the file that the USER_JOB_WRAPPER script should create, if there is an error. The text in this file will be included in any Condor failure messages.


3.12.14 Concurrency Limits

Condor's implementation of the mechanism called concurrency limits allows an administrator to define and set integer limits on consumable resources. These limits are enforced during matchmaking: once all of a resource's available quantity is allocated, no further matches are made for jobs that need it. Typical uses of this mechanism will include the management of software licenses, database connections, and any other consumable resource external to Condor.

Using concurrency limits requires two things: configuration variables that set each distinct limit, and jobs that identify the specific resources they need.

In the configuration, choose a string as the name of the particular resource. This name forms part of the name of a condor_negotiator daemon configuration variable that defines the concurrency limit: the integer quantity of this resource that is available. For example, assume that there are 3 licenses for the X software. The configuration variable that sets this concurrency limit may be:

XSW_LIMIT = 3
where "XSW" is the invented name of this resource, with the string _LIMIT appended. With this limit, at most 3 jobs declaring that they need this resource may execute concurrently.
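As a rough model of the effect of XSW_LIMIT = 3, the POSIX sh sketch below offers five jobs that each need one XSW license and admits them only while licenses remain. It is a simplification for illustration, not the condor_negotiator daemon's actual matchmaking code, and limit_allows is a hypothetical name.

```sh
# Simplified model (not the negotiator's real code): a job needing $3 units
# matches only if the units already in use ($2) plus $3 stay within limit $1.
limit_allows() {
    [ $(( $2 + $3 )) -le "$1" ]
}

XSW_LIMIT=3
in_use=0
# Offer five jobs that each declare concurrency_limits = XSW.
for job in 1 2 3 4 5; do
    if limit_allows "$XSW_LIMIT" "$in_use" 1; then
        in_use=$((in_use + 1))
        echo "job $job: matched (XSW in use: $in_use)"
    else
        echo "job $job: not matched, XSW limit reached"
    fi
done
```

Jobs 1 through 3 match. Jobs 4 and 5 stay idle until a running XSW job finishes and frees a license.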

In addition to named limits, such as XSW in the example above, the configuration may specify a concurrency limit for all resources that are not covered by specifically named limits. The configuration variable CONCURRENCY_LIMIT_DEFAULT sets this value. For example,

CONCURRENCY_LIMIT_DEFAULT = 1
sets a limit of 1 for each resource that a job declares but that is not named in the configuration, so at most 1 job requiring such a resource may execute at a time. If CONCURRENCY_LIMIT_DEFAULT is omitted from the configuration, then no limit is placed on the number of concurrently executing jobs that require a resource for which there is no specifically named concurrency limit.

The job must declare its need for a resource by placing a command in its submit description file or adding an attribute to the job ClassAd. In the submit description file, an example job that requires the X software adds:

concurrency_limits = XSW
This results in the job ClassAd attribute
ConcurrencyLimits = "XSW"

The job ClassAd attribute ConcurrencyLimits is more general than this example shows: its value is either a string or a string list, where list items are delimited by space characters and comma characters. Therefore, a job that requires the 3 separate resources named "XSW", "y", and "Z" will contain in its submit description file:

concurrency_limits = y,XSW,Z

Additionally, the number of units of a resource that a job requires may be specified by following the resource name with a colon character and an integer. Modifying the given example to specify that 3 of the "XSW" resource are needed results in:

concurrency_limits = y,XSW:3,Z
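The list syntax above can be illustrated with a small POSIX sh parser that turns a concurrency_limits value into one "name count" line per resource, using a count of 1 when no ":N" suffix is given. It is a sketch of the syntax only; parse_limits is a hypothetical helper, not a Condor tool.

```sh
# Hypothetical parser: split a concurrency_limits value on commas and/or
# spaces, and print "name count" for each item (count defaults to 1).
parse_limits() {
    # Turn commas into spaces, then let the shell split on whitespace.
    for item in $(printf '%s\n' "$1" | tr ',' ' '); do
        case $item in
            *:*) name=${item%%:*}; count=${item#*:} ;;  # "XSW:3" -> XSW 3
            *)   name=$item; count=1 ;;                 # "Z"     -> Z 1
        esac
        printf '%s %s\n' "$name" "$count"
    done
}

parse_limits 'y,XSW:3,Z'
```

For the value y,XSW:3,Z this prints y 1, XSW 3, and Z 1 on separate lines.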

Concurrency limit defaults may also be declared for named groups, which allow default limits to be "scoped" by a group name, as in this example:

CONCURRENCY_LIMIT_DEFAULT = 5
CONCURRENCY_LIMIT_DEFAULT_LARGE = 100
CONCURRENCY_LIMIT_DEFAULT_SMALL = 25

With the above configuration, a concurrency limit named "large.swlicense" will receive a default limit of 100. A concurrency limit named "large.dbsession" will also receive a default limit of 100. A limit named "small.dbsession" will receive a default limit of 25. A concurrency limit "other.license" will receive the global default limit of 5, as there is no declaration for CONCURRENCY_LIMIT_DEFAULT_OTHER.
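The lookup order implied by the examples above can be sketched in POSIX sh. The sketch checks a specifically named <name>_LIMIT first. It then checks the group default for the part of the name before the period, then CONCURRENCY_LIMIT_DEFAULT, and otherwise applies no limit. CONFIG, config_value, and effective_limit are hypothetical stand-ins for the negotiator's configuration handling. The case-insensitive key comparison is an assumption drawn from the "large.swlicense" example.

```sh
# Hypothetical stand-in for the negotiator configuration, one
# "NAME = VALUE" line per variable.
CONFIG='CONCURRENCY_LIMIT_DEFAULT = 5
CONCURRENCY_LIMIT_DEFAULT_LARGE = 100
CONCURRENCY_LIMIT_DEFAULT_SMALL = 25
XSW_LIMIT = 3'

# Print the value of configuration key $1 (compared case-insensitively,
# an assumption), or nothing if the key is not set.
config_value() {
    printf '%s\n' "$CONFIG" | awk -v key="$1" '
        BEGIN { key = toupper(key) }
        {
            pos = index($0, "=")
            if (pos == 0) next
            k = substr($0, 1, pos - 1); v = substr($0, pos + 1)
            gsub(/[ \t]/, "", k); gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (toupper(k) == key) { print v; exit }
        }'
}

# Print the limit that applies to the resource named $1.
effective_limit() {
    # 1. A specifically named limit, such as XSW_LIMIT.
    v=$(config_value "${1}_LIMIT")
    # 2. For "group.name" limits, the group default.
    case $1 in
        *.*) [ -n "$v" ] || v=$(config_value "CONCURRENCY_LIMIT_DEFAULT_${1%%.*}") ;;
    esac
    # 3. The global default.
    [ -n "$v" ] || v=$(config_value CONCURRENCY_LIMIT_DEFAULT)
    # 4. Otherwise, no limit at all.
    printf '%s\n' "${v:-unlimited}"
}
```

With this CONFIG, effective_limit large.swlicense gives 100, small.dbsession gives 25, other.license gives 5, and XSW gives 3.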

Note that the maximum for any given limit, as specified with the configuration variable <*>_LIMIT, is enforced as strictly as possible. However, in the presence of preemption, or of dropped updates from the condor_startd daemon to the condor_collector daemon, the limit can be exceeded. Condor never kills a job to free up a limit, even when the limit's maximum has been exceeded.


condor-admin@cs.wisc.edu