condor_submit is the program for submitting jobs for execution under HTCondor. condor_submit requires a submit description file which contains commands to direct the queuing of jobs. One submit description file may contain specifications for the queuing of many HTCondor jobs at once. A single invocation of condor_submit may create one or more clusters. A cluster is a set of jobs specified in the submit description file between queue commands for which the executable is not changed. It is advantageous to submit multiple jobs as a single cluster because:
Multiple clusters may be specified within a single submit description file. Each cluster must specify a single executable.
The job ClassAd attribute ClusterId identifies a cluster. See the Appendix for specifics on this attribute.
Note that submission of jobs from a Windows machine requires a stashed password to allow HTCondor to impersonate the user submitting the job. To stash a password, use the condor_store_cred command. See its manual page for details.
For lengthy lines within the submit description file, the backslash (\) is a line continuation character. Placing the backslash at the end of a line causes the current line's command to be continued with the next line of the file. Submit description files may contain comments. A comment is any line beginning with a pound character (#).
The submit description file must contain one executable command and at least one queue command. All of the other commands have default actions.
The commands which can appear in the submit description file are numerous. They are listed here in alphabetical order by category.
BASIC COMMANDS
In the java universe, the first argument must be the name of the class containing main.
There are two permissible formats for specifying arguments, identified as the old syntax and the new syntax. The old syntax supports white space characters within arguments only in special circumstances; when used, the command line arguments are represented in the job ClassAd attribute Args. The new syntax supports uniform quoting of white space characters within arguments; when used, the command line arguments are represented in the job ClassAd attribute Arguments.
Old Syntax
In the old syntax, individual command line arguments are delimited (separated) by space characters. To allow a double quote mark in an argument, it is escaped with a backslash; that is, the two character sequence \" becomes a single double quote mark within an argument.
Further interpretation of the argument string differs depending on the operating system. On Windows, the entire argument string is passed verbatim (other than the backslash in front of double quote marks) to the Windows application. Most Windows applications will allow spaces within an argument value by surrounding the argument with double quote marks. In all other cases, there is no further interpretation of the arguments.
Example:
arguments = one \"two\" 'three'
Produces in Unix vanilla universe:
argument 1: one
argument 2: "two"
argument 3: 'three'
New Syntax
Here are the rules for using the new syntax:
Example:
arguments = "3 simple arguments"
Produces:
argument 1: 3
argument 2: simple
argument 3: arguments
Another example:
arguments = "one 'two with spaces' 3"
Produces:
argument 1: one
argument 2: two with spaces
argument 3: 3
And yet another example:
arguments = "one ""two"" 'spacey ''quoted'' argument'"
Produces:
argument 1: one
argument 2: "two"
argument 3: spacey 'quoted' argument
Notice that in the new syntax, the backslash has no special meaning. This is for the convenience of Windows users.
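The quoting behavior shown by the three new syntax examples can be modeled with a short Python sketch. It is not HTCondor's parser. The function name parse_new_syntax is hypothetical, and the rules it applies (outer double quotes, white space as separator, single quotes for grouping, doubled quote marks as literals) are inferred from the examples above.

```python
def parse_new_syntax(value):
    """Split a new-syntax arguments value into a list of arguments.

    Assumed rules, inferred from the examples in this section:
      - the whole value is enclosed in double quote marks
      - white space separates arguments
      - single quote marks group text that contains white space
      - "" yields a literal double quote mark
      - '' inside a single-quoted section yields a literal single quote mark
    """
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        raise ValueError("new syntax requires surrounding double quote marks")
    body = value[1:-1]
    args, current = [], []
    in_arg = False      # True once the current argument has started
    in_single = False   # True while inside a single-quoted section
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '"':
            if body[i:i + 2] != '""':
                raise ValueError("a literal double quote mark must be doubled")
            current.append('"')
            in_arg = True
            i += 2
        elif ch == "'":
            if in_single and body[i:i + 2] == "''":
                current.append("'")      # doubled quote inside a section
                i += 2
            else:
                in_single = not in_single  # open or close a section
                in_arg = True
                i += 1
        elif ch.isspace() and not in_single:
            if in_arg:
                args.append("".join(current))
                current, in_arg = [], False
            i += 1
        else:
            current.append(ch)
            in_arg = True
            i += 1
    if in_single:
        raise ValueError("unterminated single-quoted section")
    if in_arg:
        args.append("".join(current))
    return args


EXAMPLE_3 = "\"one \"\"two\"\" 'spacey ''quoted'' argument'\""
print(parse_new_syntax(EXAMPLE_3))
```

Note that the backslash receives no special handling in the sketch, matching the statement above.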
There are two different formats for specifying the environment variables: the old format and the new format. The old format is retained for backward-compatibility. It suffers from a platform-dependent syntax and the inability to insert some special characters into the environment.
The new syntax for specifying environment values:
<name>=<value>
Example:
environment = "one=1 two=""2"" three='spacey ''quoted'' value'"
Produces the following environment entries:
one=1
two="2"
three=spacey 'quoted' value
Under the old syntax, there are no double quote marks surrounding the environment specification. Each environment entry remains of the form
<name>=<value>
Under Unix, list multiple environment entries by separating them with a semicolon (;). Under Windows, separate multiple entries with a vertical bar (|). There is no way to insert a literal semicolon under Unix or a literal vertical bar under Windows. Note that spaces are accepted, but rarely desired, characters within parameter names and values, because they are treated as literal characters, not separators or ignored white space. Place spaces within the parameter list only if required.
A Unix example:
environment = one=1;two=2;three="quotes have no 'special' meaning"
This produces the following:
one=1
two=2
three="quotes have no 'special' meaning"
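As a rough illustration of the old Unix format, the following Python sketch splits an environment value on semicolons and keeps every other character literally, including spaces and quote marks. The function name is hypothetical and the sketch is a model of the description above, not HTCondor code.

```python
def parse_old_unix_environment(value):
    """Split an old-format (Unix) environment value into a dict.

    Entries are separated by semicolons. Spaces and quote marks are kept
    as literal characters, as the text above describes.
    """
    entries = {}
    for entry in value.split(";"):
        if not entry:
            continue  # ignore empty pieces such as a trailing semicolon
        name, separator, val = entry.partition("=")
        if not separator:
            raise ValueError("entry is not of the form <name>=<value>: " + entry)
        entries[name] = val
    return entries


print(parse_old_unix_environment(
    "one=1;two=2;three=\"quotes have no 'special' meaning\""))
```

Because spaces are literal, a value such as "a = b" produces a variable whose name ends in a space, which is why spaces are rarely desired.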
If the environment is set with the environment command and getenv is also set to true, values specified with environment override values in the submitter's environment (regardless of the order of the environment and getenv commands).
If no path or a relative path is used, then the executable file is presumed to be relative to the current working directory of the user as the condor_submit command is issued.
If submitting into the standard universe, then the named executable must have been re-linked with the HTCondor libraries (such as via the condor_compile command). If submitting into the vanilla universe (the default), then the named executable need not be re-linked and can be any process which can run in the background (shell scripts work fine as well). If submitting into the Java universe, then the argument must be a compiled .class file.
If the environment is set with the environment command and getenv is also set to true, values specified with environment override values in the submitter's environment (regardless of the order of the environment and getenv commands).
Note that this command does not refer to the command-line arguments of the program. The command-line arguments are specified by the arguments command.
job-owner@UID_DOMAIN
where the configuration variable UID_DOMAIN is specified by the HTCondor site administrator. If UID_DOMAIN has not been specified, HTCondor sends the e-mail to:
job-owner@submit-machine-name
Note that if a program explicitly opens and writes to a file, that file should not be specified as the output file.
COMMANDS FOR MATCHMAKING
requirements = Memory > 60
rank = Memory
asks HTCondor to find all available machines with more than 60 megabytes of memory and give to the job the machine with the largest amount of memory. See section 2.5.2 within the HTCondor Users Manual for complete information on the syntax and available attributes that can be used in the ClassAd expression.
&& (RequestCpus <= Target.Cpus)
is appended to the requirements expression for the job.
For pools that enable dynamic condor_startd provisioning (see section 3.5.10), specifies the minimum number of CPUs requested for this job, resulting in a dynamic slot being created with this many cores.
&& (RequestDisk <= Target.Disk)
is appended to the requirements expression for the job.
For pools that enable dynamic condor_startd provisioning (see section 3.5.10), a dynamic slot will be created with at least this much disk space.
Characters may be appended to a numerical value to indicate units. K or KB indicates Kbytes. M or MB indicates Mbytes. G or GB indicates Gbytes. T or TB indicates Tbytes.
For pools that enable dynamic condor_startd provisioning (see section 3.5.10), a dynamic slot will be created with at least this much RAM.
The expression
&& (RequestMemory <= Target.Memory)
is appended to the requirements expression for the job.
Characters may be appended to a numerical value to indicate units. K or KB indicates Kbytes. M or MB indicates Mbytes. G or GB indicates Gbytes. T or TB indicates Tbytes.
For scheduler and local universe jobs, the requirements expression is evaluated against the Scheduler ClassAd which represents the condor_schedd daemon running on the submit machine, rather than a remote machine. Like all commands in the submit description file, if multiple requirements commands are present, all but the last one are ignored. By default, condor_submit appends the following clauses to the requirements expression:
FILE TRANSFER COMMANDS
For more information about this and other settings related to transferring files, see section 2.5.4.
Note that should_transfer_files is not supported for jobs submitted to the grid universe.
When a path to an input file or directory is specified, this specifies the path to the file on the submit side. The file is placed in the job's temporary scratch directory on the execute side, and it is named using the base name of the original path. For example, /path/to/input_file becomes input_file in the job's scratch directory.
A directory may be specified using a trailing path separator. An example of a trailing path separator is the slash character on Unix platforms; a directory example using a trailing path separator is input_data/. When a directory is specified with a trailing path separator, the contents of the directory are transferred, but the directory itself is not transferred. It is as if each of the items within the directory were listed in the transfer list. When there is no trailing path separator, the directory is transferred, its contents are transferred, and these contents are placed inside the transferred directory.
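The naming rules for transferred files and directories can be sketched in Python as follows. This is only a model of the rules stated above: the function scratch_names and its directory_contents parameter are hypothetical, and the directory listing is passed in by the caller rather than read from disk.

```python
import posixpath


def scratch_names(spec, directory_contents=()):
    """Return the names one transfer-list entry produces in the scratch directory.

    spec               -- the path as written in the transfer list
    directory_contents -- names inside the directory, when spec names a
                          directory (hypothetical input for this sketch)
    """
    if spec.endswith("/"):
        # Trailing separator: only the contents are transferred.
        return sorted(directory_contents)
    base = posixpath.basename(spec)
    # No trailing separator: the item itself arrives under its base name,
    # and a directory's contents are placed inside it.
    return [base] + sorted(posixpath.join(base, item) for item in directory_contents)


print(scratch_names("/path/to/input_file"))
print(scratch_names("input_data/", ["a.txt", "b.txt"]))
print(scratch_names("input_data", ["a.txt", "b.txt"]))
```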
For grid universe jobs other than HTCondor-C, the transfer of directories is not currently supported.
Symbolic links to files are transferred as the files they point to. Transfer of symbolic links to directories is not currently supported.
For vanilla and vm universe jobs only, a file may be specified by giving a URL, instead of a file name. The implementation for URL transfers requires both configuration and an available plug-in. See section 3.12.2 for details.
For more information about this and other settings related to transferring files, see section 2.5.4.
For HTCondor-C jobs and all other non-grid universe jobs, if transfer_output_files is not specified, HTCondor will automatically transfer back all files in the job's temporary working directory which have been modified or created by the job. Subdirectories are not scanned for output, so if output from subdirectories is desired, the output list must be explicitly specified. For grid universe jobs other than HTCondor-C, desired output files must also be explicitly listed. Another reason to explicitly list output files is for a job that creates many files, and the user wants only a subset transferred back.
For grid universe jobs other than with grid type condor, to have files other than standard output and standard error transferred from the execute machine back to the submit machine, use transfer_output_files, listing all files to be transferred. These files are found on the execute machine in the working directory of the job.
When a path to an output file or directory is specified, it specifies the path to the file on the execute side. As a destination on the submit side, the file is placed in the job's initial working directory, and it is named using the base name of the original path. For example, path/to/output_file becomes output_file in the job's initial working directory. The name and path of the file that is written on the submit side may be modified by using transfer_output_remaps. Note that this remap function works only with files, not with directories.
A directory may be specified using a trailing path separator. An example of a trailing path separator is the slash character on Unix platforms; a directory example using a trailing path separator is input_data/. When a directory is specified with a trailing path separator, the contents of the directory are transferred, but the directory itself is not transferred. It is as if each of the items within the directory were listed in the transfer list. When there is no trailing path separator, the directory is transferred, its contents are transferred, and these contents are placed inside the transferred directory.
For grid universe jobs other than HTCondor-C, the transfer of directories is not currently supported.
Symbolic links to files are transferred as the files they point to. Transfer of symbolic links to directories is not currently supported.
For more information about this and other settings related to transferring files, see section 2.5.4.
name describes an output file name produced by your job, and newname describes the file name it should be downloaded to. Multiple remaps can be specified by separating each with a semicolon. If you wish to remap file names that contain equals signs or semicolons, these special characters may be escaped with a backslash. You cannot specify directories to be remapped.
Setting when_to_transfer_output equal to ON_EXIT will cause HTCondor to transfer the job's output files back to the submitting machine only when the job completes (exits on its own).
The ON_EXIT_OR_EVICT option is intended for fault tolerant jobs which periodically save their own state and can restart where they left off. In this case, files are spooled to the submit machine any time the job leaves a remote site, either because it exited on its own, or was evicted by the HTCondor system for any reason prior to job completion. The files spooled back are placed in a directory defined by the value of the SPOOL configuration variable. Any output files transferred back to the submit machine are automatically sent back out again as input files if the job restarts.
For more information about this and other settings related to transferring files, see section 2.5.4.
POLICY COMMANDS
The process by which the condor_schedd claims a condor_startd is somewhat time-consuming. To amortize this cost, the condor_schedd tries to reuse claims to run subsequent jobs, after a job using a claim is done. However, it can only do this if there is an idle job in the queue at the moment the previous job completes. Sometimes, and especially for the node jobs when using DAGMan, there is a subsequent job about to be submitted, but it has not yet arrived in the queue when the previous job completes. As a result, the condor_schedd releases the claim, and the next job must wait an entire negotiation cycle to start. When this submit command is defined with a non-negative integer, the condor_schedd tries as usual to reuse the claim when the job exits. If it cannot, instead of releasing the claim, the condor_schedd keeps the claim until either the number of seconds given as a parameter has elapsed or a new job which matches that claim arrives, whichever comes first. The condor_startd in question will remain in the Claimed/Idle state, and the original job will be "charged" (in terms of priority) for the time in this state.
As an example, if the job is to be removed once the output is retrieved with condor_transfer_data, then use
leave_in_queue = (JobStatus == 4) && ((StageOutFinish =?= UNDEFINED) ||\
                 (StageOutFinish == 0))
This command has been historically used to implement a form of job start throttling from the job submitter's perspective. It was effective for the case of multiple job submission where the transfer of extremely large input data sets to the execute machine caused machine performance to suffer. This command is no longer useful, as throttling should be accomplished through configuration of the condor_schedd daemon.
For example: Suppose a job is known to run for a minimum of an hour. If the job exits after less than an hour, the job should be placed on hold and an e-mail notification sent, instead of being allowed to leave the queue.
on_exit_hold = (CurrentTime - JobStartDate) < (60 * $(MINUTE))
This expression places the job on hold if it exits for any reason before running for an hour. An e-mail will be sent to the user explaining that the job was placed on hold because this expression became True.
periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions.
Only job ClassAd attributes will be defined for use by this ClassAd expression. This expression is available for the vanilla, java, parallel, grid, local and scheduler universes. It is additionally available, when submitted from a Unix machine, for the standard universe.
For example, suppose a job occasionally segfaults, but chances are that the job will finish successfully if the job is run again with the same data. The on_exit_remove expression can cause the job to run again with the following command. Assume that the signal identifier for the segmentation fault is 11 on the platform where the job will be running.
on_exit_remove = (ExitBySignal == False) || (ExitSignal != 11)
This expression lets the job leave the queue if the job was not killed by a signal or if it was killed by a signal other than 11, representing segmentation fault in this example. So, if the job exited due to signal 11, it will stay in the job queue. In any other case of the job exiting, the job will leave the queue as it normally would have done.
As another example, if the job should only leave the queue if it exited on its own with status 0, this on_exit_remove expression works well:
on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)
If the job was killed by a signal or exited with a non-zero exit status, HTCondor would leave the job in the queue to run again.
periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions.
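To make the outcome of the two on_exit_remove expressions concrete, the Python sketch below mirrors them as ordinary boolean functions. The function names are hypothetical. Python's None stands in for an attribute that is not defined for a given kind of exit, which is a simplification of ClassAd evaluation.

```python
def leaves_queue_unless_segfault(exit_by_signal, exit_signal=None):
    """Mirror of: (ExitBySignal == False) || (ExitSignal != 11)"""
    return (not exit_by_signal) or (exit_signal != 11)


def leaves_queue_only_on_success(exit_by_signal, exit_code=None):
    """Mirror of: (ExitBySignal == False) && (ExitCode == 0)"""
    return (not exit_by_signal) and (exit_code == 0)


# A job killed by signal 11 stays in the queue under the first expression.
print(leaves_queue_unless_segfault(True, exit_signal=11))
# A job that exits on its own with status 1 stays in the queue under the second.
print(leaves_queue_only_on_success(False, exit_code=1))
```

A result of True means the job leaves the queue; False means it stays in the queue to run again.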
Only job ClassAd attributes will be defined for use by this ClassAd expression. This expression is available for the vanilla, java, parallel, grid, local and scheduler universes. It is additionally available, when submitted from a Unix machine, for the standard universe. Note that the condor_schedd daemon, by default, only checks these periodic expressions once every 300 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL configuration macro.
periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions.
Only job ClassAd attributes will be defined for use by this ClassAd expression. This expression is available for the vanilla, java, parallel, grid, local and scheduler universes. It is additionally available, when submitted from a Unix machine, for the standard universe. Note that the condor_schedd daemon, by default, only checks these periodic expressions once every 300 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL configuration macro.
Only job ClassAd attributes will be defined for use by this ClassAd expression. This expression is available for the vanilla, java, parallel, grid, local and scheduler universes. It is additionally available, when submitted from a Unix machine, for the standard universe. Note that the condor_schedd daemon, by default, only checks periodic expressions once every 300 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL configuration macro.
See section 11, the Examples section of the condor_submit manual page, for an example of a periodic_remove expression.
periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions. So, the periodic_remove expression takes precedence over the on_exit_remove expression, if the two describe conflicting actions.
Only job ClassAd attributes will be defined for use by this ClassAd expression. This expression is available for the vanilla, java, parallel, grid, local and scheduler universes. It is additionally available, when submitted from a Unix machine, for the standard universe. Note that the condor_schedd daemon, by default, only checks periodic expressions once every 300 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL configuration macro.
COMMANDS SPECIFIC TO THE STANDARD UNIVERSE
#! /bin/sh
# get the host name of the machine
host=`uname -n`
# grab a standard universe executable designed specifically
# for this host
scp elsewhere@cs.wisc.edu:${host} executable
# The PID MUST stay the same, so exec the new standard universe process.
exec executable ${1+"$@"}
If this command is not present (defined), then the value defaults to false.
If your job attempts to access a file mentioned in this list, HTCondor will force all writes to that file to be appended to the end. Furthermore, condor_submit will not truncate it. This list uses the same syntax as compress_files, shown above.
This option may yield some surprising results. If several jobs attempt to write to the same file, their output may be intermixed. If a job is evicted from one or more machines during the course of its lifetime, such an output file might contain several copies of the results. This option should only be used when you wish a certain file to be treated as a running log instead of a precise result.
This option only applies to standard-universe jobs.
These options only apply to standard-universe jobs.
If needed, you may set the buffer controls individually for each file using the buffer_files option. For example, to set the buffer size to 1 Mbyte and the block size to 256 Kbytes for the file input.data, use this command:
buffer_files = "input.data=(1000000,256000)"
Alternatively, you may use these two options to set the default sizes for all files used by your job:
buffer_size = 1000000
buffer_block_size = 256000
If you do not set these, HTCondor will use the values given by these two configuration file macros:
DEFAULT_IO_BUFFER_SIZE = 1000000
DEFAULT_IO_BUFFER_BLOCK_SIZE = 256000
Finally, if no other settings are present, HTCondor will use a buffer of 512 Kbytes and a block size of 32 Kbytes.
If your job attempts to access any of the files mentioned in this list, HTCondor will automatically compress them (if writing) or decompress them (if reading). The compress format is the same as used by GNU gzip.
The files given in this list may be simple file names or complete paths and may include * as a wild card. For example, this list causes the file /tmp/data.gz, any file named event.gz, and any file ending in .gzip to be automatically compressed or decompressed as needed:
compress_files = /tmp/data.gz, event.gz, *.gzip
Due to the nature of the compression format, compressed files must only be accessed sequentially. Random access reading is allowed but is very slow, while random access writing is simply not possible. This restriction may be avoided by using both compress_files and fetch_files at the same time. When this is done, a file is kept in the decompressed state at the execution machine, but is compressed for transfer to its original location.
This option only applies to standard universe jobs.
This option only applies to standard universe jobs.
Directs HTCondor to use a new file name in place of an old one. name describes a file name that your job may attempt to open, and newname describes the file name it should be replaced with. newname may include an optional leading access specifier, local: or remote:. If left unspecified, the default access specifier is remote:. Multiple remaps can be specified by separating each with a semicolon.
This option only applies to standard universe jobs.
If you wish to remap file names that contain equals signs or semicolons, these special characters may be escaped with a backslash.
file_remaps = "dataset.1=other.dataset"
file_remaps = "very.big = local:/bigdisk/bigfile"
file_remaps = "very.big = local:/bigdisk/bigfile ; dataset.1 = other.dataset"
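The remap syntax shown in these examples can be modeled with a small Python sketch. It is an illustration under stated assumptions rather than HTCondor's parser: the function names are hypothetical, the value is passed without its surrounding double quote marks, and white space around names is assumed to be insignificant, as the spaced examples suggest.

```python
def split_unescaped(text, separator):
    """Split text on separator, leaving backslash-escaped '=' and ';' in place."""
    parts, current, i = [], [], 0
    while i < len(text):
        if text[i] == "\\" and text[i + 1:i + 2] in ("=", ";"):
            current.append(text[i:i + 2])  # keep the escape until unescape()
            i += 2
        elif text[i] == separator:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def unescape(text):
    """Turn the escaped forms of '=' and ';' back into the plain characters."""
    return text.replace("\\=", "=").replace("\\;", ";")


def parse_file_remaps(value):
    """Return {name: (access_specifier, newname)} for a file_remaps value."""
    remaps = {}
    for entry in split_unescaped(value, ";"):
        if not entry.strip():
            continue
        pieces = split_unescaped(entry, "=")
        if len(pieces) != 2:
            raise ValueError("expected name = newname, got: " + entry)
        name = unescape(pieces[0].strip())
        newname = unescape(pieces[1].strip())
        access = "remote"  # default access specifier, per the text above
        for specifier in ("local", "remote"):
            if newname.startswith(specifier + ":"):
                access, newname = specifier, newname[len(specifier) + 1:]
                break
        remaps[name] = (access, newname)
    return remaps


print(parse_file_remaps(
    "very.big = local:/bigdisk/bigfile ; dataset.1 = other.dataset"))
```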
If your job attempts to access a file mentioned in this list, HTCondor will cause it to be read or written at the execution machine. This is most useful for temporary files not used for input or output. This list uses the same syntax as compress_files, shown above.
local_files = /tmp/*
This option only applies to standard universe jobs.
COMMANDS FOR THE GRID
For a grid-type-string of batch, the single parameter is the name of the local batch system, and will be one of pbs, lsf, or sge. See section 5.3.5 for details.
For a grid-type-string of condor, the first parameter is the name of the remote condor_schedd daemon. The second parameter is the name of the pool to which the remote condor_schedd daemon belongs. See section 5.3.1 for details.
For a grid-type-string of cream, there are three parameters. The first parameter is the web services address of the CREAM server. The second parameter is the name of the batch system that sits behind the CREAM server. The third parameter identifies a site-specific queue within the batch system. See section 5.3.7 for details.
For a grid-type-string of deltacloud, the single parameter is the URL of the deltacloud service requested. See section 5.3.8 for details.
For a grid-type-string of ec2, one additional parameter specifies the EC2 URL. See section 5.3.6 for details.
For a grid-type-string of gt2, the single parameter is the name of the pre-WS GRAM resource to be used. See section 5.3.2 for details.
For a grid-type-string of gt5, the single parameter is the name of the pre-WS GRAM resource to be used, which is the same as for the grid-type-string of gt2. See section 5.3.2 for details.
For a grid-type-string of lsf, no additional parameters are used. See section 5.3.5 for details.
For a grid-type-string of nordugrid, the single parameter is the name of the NorduGrid resource to be used. See section 5.3.3 for details.
For a grid-type-string of pbs, no additional parameters are used. See section 5.3.5 for details.
For a grid-type-string of sge, no additional parameters are used. See section 5.3.5 for details.
For a grid-type-string of unicore, the first parameter is the name of the Unicore Usite to be used. The second parameter is the name of the Unicore Vsite to be used. See section 5.3.4 for details.
For transferring files other than stdin, see transfer_input_files.
For transferring files other than stdout, see transfer_output_files.
x509userproxy is relevant when the universe is vanilla, or when the universe is grid and the type of grid system is one of gt2, gt5, condor, cream, or nordugrid. Defining a value causes the proxy to be delegated to the execute machine. Further, VOMS attributes defined in the proxy will appear in the job ClassAd. See the unnumbered subsection labeled Job ClassAd Attributes for all job attribute descriptions.
COMMANDS FOR PARALLEL, JAVA, and SCHEDULER UNIVERSES
remove_kill_sig = SIGUSR1
remove_kill_sig = 10
If this command is not present, the value of kill_sig is used.
COMMANDS FOR THE VM UNIVERSE
An example that specifies two disk files:
vm_disk = /myxen/diskfile.img:sda1:w,/myxen/swap.img:sda2:w
ADVANCED COMMANDS
(:) and the numerical value.
See section 3.12.15 for details on concurrency limits.
See section 2.12.1 for further details and examples.
See section 2.12.1 for further details.
See section 2.12.1 for further details and examples.
Due to implementation details, a deferral time may not be used for scheduler universe jobs.
See section 2.12.1 for further details and examples.
For vanilla universe jobs where there is a shared file system, it is the current working directory on the machine where the job is executed.
For vanilla or grid universe jobs where file transfer mechanisms are utilized (there is not a shared file system), it is the directory on the machine from which the job is submitted where the input files come from, and where the job's output files go to.
For standard universe jobs, it is the directory on the machine from which the job is submitted where the condor_shadow daemon runs; the current working directory for file input and output accomplished through remote system calls.
For scheduler universe jobs, it is the directory on the machine from which the job is submitted where the job runs; the current working directory for file input and output with respect to relative path names.
Note that the path to the executable is not relative to initialdir; if it is a relative path, it is relative to the directory in which the condor_submit command is run.
SIGTSTP which tells the HTCondor libraries to initiate a checkpoint of the process. For jobs submitted to other universes, the default value, when not defined, is SIGTERM, which is the standard way to terminate a program in Unix.
LastMatchName0 = "most-recent-Name"
LastMatchName1 = "next-most-recent-Name"
The value for each introduced ClassAd is given by the value of the Name attribute from the machine ClassAd of a previous execution (match). As a job is matched, the definitions for these attributes will roll, with LastMatchName1 becoming LastMatchName2, LastMatchName0 becoming LastMatchName1, and LastMatchName0 being set by the most recent value of the Name attribute.
An intended use of these job attributes is in the requirements expression. The requirements can allow a job to prefer a match with either the same or a different resource than a previous match.
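The rolling of the LastMatchName attributes can be pictured with a short Python sketch. The function names, the length parameter (standing in for however many attributes are kept) and the machine names are all hypothetical; the sketch only models the behavior described above.

```python
def record_match(history, machine_name, length):
    """Return the match history after a new match.

    Index 0 holds the most recent Name; older names shift up by one, and
    anything beyond `length` entries is dropped.
    """
    return ([machine_name] + list(history))[:length]


def as_attributes(history):
    """Render the history as LastMatchName<N> job attributes."""
    return {"LastMatchName%d" % i: name for i, name in enumerate(history)}


history = []
for name in ("slot1@a.example", "slot1@b.example", "slot1@c.example"):
    history = record_match(history, name, length=2)
print(as_attributes(history))
```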
Setting this expression does not affect the job's resource requirements or preferences. For a job to only run on a machine with a minimum MachineMaxVacateTime, or to preferentially run on such machines, explicitly specify this in the requirements and/or rank expressions.
When a resource claim is to be preempted, this expression in the submit file specifies the maximum run time of the job (in seconds, since the job started). This expression has no effect if it is greater than the maximum retirement time provided by the machine policy. If the resource claim is not preempted, this expression and the machine retirement policy are irrelevant. If the resource claim is preempted, the job will be allowed to run until the retirement time expires, at which point it is hard-killed. The job will be soft-killed when it is getting close to the end of retirement in order to give it time to gracefully shut down. The amount of lead-time for soft-killing is determined by the maximum vacating time granted to the job.
Standard universe jobs and any jobs running with nice_user priority have a default max_job_retirement_time of 0, so no retirement time is utilized by default. In all other cases, no default value is provided, so the maximum amount of retirement time is utilized by default.
Setting this expression does not affect the job's resource requirements or preferences. For a job to only run on a machine with a minimum MaxJobRetirementTime, or to preferentially run on such machines, explicitly specify this in the requirements and/or rank expressions.
PRE AND POST SCRIPTS IMPLEMENTED WITH SPECIALLY-NAMED ATTRIBUTES
Note that if both +PreArgs and +PreArguments are specified, the +PreArguments value is used and the +PreArgs value is ignored.
Note that if both +PostArgs and +PostArguments are specified, the +PostArguments value is used and the +PostArgs value is ignored.
If any of the prescript or postscript values are not enclosed in double quotes, they are silently ignored.
Below is an example of the use of starter pre and post scripts:
+PreCmd = "my_pre"
+PreArgs = "pre\"arg1 prea'rg2"
+PostCmd = "my_post"
+PostArguments = "post\"arg1 'post''ar g2'"
For this example PreArgs generates a first argument of pre"arg1 and a second argument of prea'rg2. PostArguments generates a first argument of post"arg1 and a second argument of post'ar g2.
MACROS AND COMMENTS
In addition to commands, the submit description file can contain macros and comments:
<macro_name> = <string>
Two pre-defined macros are supplied by the submit description file parser. The $(Cluster) macro supplies the value of the ClusterId job ClassAd attribute, and the $(Process) macro supplies the value of the ProcId job ClassAd attribute. These macros are intended to aid in the specification of input/output files, arguments, etc., for clusters with lots of jobs, and/or could be used to supply an HTCondor process with its own cluster and process numbers on the command line.
The $(Node) macro is defined for parallel universe jobs, and is especially relevant for MPI applications. It is a unique value assigned for the duration of the job that essentially identifies the machine (slot) on which a program is executing. Values assigned start at 0 and increase monotonically. The values are assigned as the parallel job is about to start.
Recursive definition of macros is permitted. An example of a construction that works is the following:
foo = bar
foo = snap $(foo)
As a result, foo = snap bar.
Note that both left- and right-recursion work, so
foo = bar
foo = $(foo) snap
has as its result foo = bar snap.
The construction
foo = $(foo) bar
by itself will not work, as it does not have an initial base case. Mutually recursive constructions such as:
B = bar
C = $(B)
B = $(C) boo
will not work, and will fill memory with expansions.
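One way to think about the working cases is that a self-reference is replaced by the macro's earlier value at the moment of redefinition. The Python sketch below models only that idea; define is a hypothetical helper, not the parser's actual algorithm, and raising an error for the missing base case is this sketch's own choice.

```python
import re


def define(macros, name, value):
    """Store name = value, resolving references to name's own earlier value."""
    pattern = re.compile(r"\$\(" + re.escape(name) + r"\)")
    if pattern.search(value):
        if name not in macros:
            # No base case: there is nothing to substitute for $(name).
            raise ValueError(name + " refers to itself but has no earlier definition")
        previous = macros[name]
        # A function replacement avoids backslash processing of the old value.
        value = pattern.sub(lambda _match: previous, value)
    macros[name] = value


macros = {}
define(macros, "foo", "bar")
define(macros, "foo", "snap $(foo)")
print(macros["foo"])
```

The sketch does not attempt to model mutually recursive definitions.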
To use the dollar sign character ($) as a literal, without macro expansion, use
$(DOLLAR)
In addition to the normal macro, there is also a special kind of macro called a substitution macro that allows the substitution of a machine ClassAd attribute value defined on the resource machine itself (gotten after a match to the machine has been made) into specific commands within the submit description file. The substitution macro is of the form:
$$(attribute)
As this form of the substitution macro is only evaluated within the context of the machine ClassAd, use of a scope resolution prefix TARGET. or MY. is not allowed.
A common use of this form of the substitution macro is for the heterogeneous submission of an executable:
executable = povray.$$(OpSys).$$(Arch)
Values for the OpSys and Arch attributes are substituted at match time for any given resource. This example allows HTCondor to automatically choose the correct executable for the matched machine.
An extension to the syntax of the substitution macro provides an alternative string to use if the machine attribute within the substitution macro is undefined. The syntax appears as:
$$(attribute:string_if_attribute_undefined)
An example using this extended syntax provides a path name to a required input file. Since the file can be placed in different locations on different machines, the file's path name is given as an argument to the program.
arguments = $$(input_file_path:/usr/foo)
On the machine, if the attribute input_file_path is not defined, then the path /usr/foo is used instead.
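The two forms described so far, $$(attribute) and $$(attribute:string_if_attribute_undefined), can be illustrated with a short Python sketch that treats the machine ClassAd as a dictionary. The helper name substitute and the sample attribute values are hypothetical. The sketch matches attribute names case-sensitively as a simplification, and raising KeyError for an undefined attribute without a fallback is this sketch's own choice rather than documented behavior.

```python
import re

# $$(name) or $$(name:fallback); the bracketed expression form is not handled.
SUBSTITUTION = re.compile(r"\$\$\((\w+)(?::([^)]*))?\)")


def substitute(command, machine_ad):
    """Fill in $$() macros from the matched machine's attributes."""
    def lookup(match):
        attribute, fallback = match.group(1), match.group(2)
        if attribute in machine_ad:
            return str(machine_ad[attribute])
        if fallback is not None:
            return fallback
        raise KeyError(attribute)  # undefined and no fallback given

    return SUBSTITUTION.sub(lookup, command)


# Hypothetical attributes of a matched machine.
machine_ad = {"OpSys": "LINUX", "Arch": "INTEL"}
print(substitute("executable = povray.$$(OpSys).$$(Arch)", machine_ad))
print(substitute("arguments = $$(input_file_path:/usr/foo)", machine_ad))
```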
A further extension to the syntax of the substitution macro allows the evaluation of a ClassAd expression to define the value. In this form, the expression may refer to machine attributes by prefacing them with the scope resolution prefix TARGET., as specified in section 4.1.3. To place a ClassAd expression into the substitution macro, square brackets are added to delimit the expression. The syntax appears as:
$$([ClassAd expression])
An example of a job that uses this syntax may be one that wants to know how much memory it can use. The application cannot detect this itself, as it would potentially use all of the memory on a multi-slot machine. So the job determines the memory per slot, reducing it by 10% to account for miscellaneous overhead, and passes this as a command line argument to the application. In the submit description file will be
arguments = --memory $$([TARGET.Memory * 0.9])
To insert two dollar sign characters ($$) as literals into a ClassAd string, use
$$(DOLLARDOLLAR)
The environment macro, $ENV, allows the evaluation of an environment variable to be used in setting a submit description file command. The syntax used is
$ENV(variable)
An example submit description file command that uses this functionality evaluates the submitter's home directory in order to set the path and file name of a log file:
log = $ENV(HOME)/jobs/logfile
The environment variable is evaluated when the submit description file is processed.
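As a rough illustration of evaluation at processing time, the Python sketch below expands $ENV(variable) from an environment mapping. The helper expand_env and the home directory /home/alice are hypothetical, and leaving an unset variable unexpanded is this sketch's own choice, not a statement about condor_submit.

```python
import os
import re


def expand_env(line, environ=None):
    """Replace each $ENV(name) with the value of environment variable name."""
    environ = os.environ if environ is None else environ

    def lookup(match):
        # Unset variables are left as written (a choice made for this sketch).
        return environ.get(match.group(1), match.group(0))

    return re.sub(r"\$ENV\((\w+)\)", lookup, line)


# A made-up environment keeps the example independent of the real one.
print(expand_env("log = $ENV(HOME)/jobs/logfile", {"HOME": "/home/alice"}))
```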
The $RANDOM_CHOICE macro allows a random choice to be made from a given list of parameters at submission time. For an expression, if some randomness needs to be generated, the macro may appear as
$RANDOM_CHOICE(0,1,2,3,4,5,6)
When evaluated, one of the parameter values will be chosen.
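The behavior can be imitated in a few lines of Python. This is a sketch of the idea only: expand_random_choice is a hypothetical helper, and nothing here reflects how condor_submit selects its value internally.

```python
import random
import re


def expand_random_choice(line, rng=random):
    """Replace each $RANDOM_CHOICE(a,b,...) with one of its parameters."""
    def pick(match):
        options = [item.strip() for item in match.group(1).split(",")]
        return rng.choice(options)

    return re.sub(r"\$RANDOM_CHOICE\(([^)]*)\)", pick, line)


# A seeded generator makes repeated runs of the sketch reproducible.
rng = random.Random(0)
print(expand_random_choice("arguments = $RANDOM_CHOICE(0,1,2,3,4,5,6)", rng))
```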
condor_submit will exit with a status value of 0 (zero) upon success, and a non-zero value upon failure.
####################
#
# submit description file
# Example 1: queuing multiple jobs with differing
# command line arguments and output files.
#
####################

Executable = foo
Universe = standard

Arguments = 15 2000
Output = foo.out1
Error = foo.err1
Queue

Arguments = 30 2000
Output = foo.out2
Error = foo.err2
Queue

Arguments = 45 6000
Output = foo.out3
Error = foo.err3
Queue
####################
#
# Example 2: Show off some fancy features including
# use of pre-defined macros and logging.
#
####################

Executable = foo
Universe = standard
Requirements = OpSys == "LINUX" && Arch =="INTEL"
Rank = Memory >= 64
Request_Memory = 32 Mb
Image_Size = 28 Mb

Error = err.$(Process)
Input = in.$(Process)
Output = out.$(Process)
Log = foo.log
Queue 150
####################
#
# Example 3: Run on a RedHat 6 machine
#
####################

Universe = vanilla
Executable = /bin/sleep
Arguments = 30
Requirements = (OpSysAndVer == "RedHat6")

Error = err.$(Process)
Input = in.$(Process)
Output = out.$(Process)
Log = sleep.log
Queue
condor_submit -a "log = out.log" -a "error = error.log" mysubmitfile
Note that each of the added commands is contained within quote marks because there are space characters within the command.
Including the command
periodic_remove = CumulativeSuspensionTime > ((RemoteWallClockTime - CumulativeSuspensionTime) / 2.0)
in the submit description file causes this to happen.
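Read arithmetically, the periodic_remove expression compares the accumulated suspension time against half of the wall clock time that was not spent suspended. A minimal Python sketch of the same comparison follows; the function and parameter names are hypothetical stand-ins for the two job ClassAd attributes.

```python
def should_remove(remote_wall_clock_time, cumulative_suspension_time):
    """Mirror the periodic_remove expression using plain numbers (seconds)."""
    running_time = remote_wall_clock_time - cumulative_suspension_time
    return cumulative_suspension_time > running_time / 2.0


# 300 s of wall clock time, 120 s of it suspended: 120 > (180 / 2).
print(should_remove(300, 120))
# 300 s of wall clock time, 90 s of it suspended: 90 > (210 / 2) is false.
print(should_remove(300, 90))
```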
+WantCheckpoint = False
in the submit description file before the queue command(s).
See the HTCondor Version 8.0.1 Manual or http://research.cs.wisc.edu/htcondor/ for additional notices.