12. Appendix A: ClassAd Attributes
ClassAd Types
ClassAd attributes vary,
depending on the entity producing the ClassAd.
Therefore, each ClassAd has an attribute named MyType,
which describes the type of ClassAd.
In addition, the condor_collector appends attributes to
any daemon's ClassAd, whenever the condor_collector is
queried. These additional attributes are listed in
the unnumbered subsection labeled ClassAd Attributes Added by the condor_collector.
Here is a list of defined values for MyType,
as well as a reference to a list of attributes relevant to
that type.
- Job
- Each submitted job describes its state, for use by the
condor_negotiator daemon in finding a machine upon which to
run the job.
ClassAd attributes that appear in a job ClassAd are listed and described in
the unnumbered subsection labeled Job ClassAd Attributes.
- Machine
- Each machine in the pool (and hence, the condor_startd daemon running
on that machine) describes its state.
ClassAd attributes that appear in a machine ClassAd
are listed and described in
the unnumbered subsection labeled Machine ClassAd Attributes.
- DaemonMaster
- Each condor_master daemon describes its state.
ClassAd attributes that appear in a DaemonMaster ClassAd
are listed and described in
the unnumbered subsection labeled DaemonMaster ClassAd Attributes.
- Scheduler
- Each condor_schedd daemon describes its state.
ClassAd attributes that appear in a Scheduler ClassAd
are listed and described in
the unnumbered subsection labeled Scheduler ClassAd Attributes.
- Negotiator
- Each condor_negotiator daemon describes its state.
ClassAd attributes that appear in a Negotiator ClassAd
are listed and described in
the unnumbered subsection labeled Negotiator ClassAd Attributes.
- Submitter
- Each submitter is described by a ClassAd.
ClassAd attributes that appear in a Submitter ClassAd
are listed and described in
the unnumbered subsection labeled Submitter ClassAd Attributes.
- Defrag
- Each condor_defrag daemon describes its state.
ClassAd attributes that appear in a Defrag ClassAd
are listed and described in
the unnumbered subsection labeled Defrag ClassAd Attributes.
- Collector
- Each condor_collector daemon describes its state.
ClassAd attributes that appear in a Collector ClassAd
are listed and described in
the unnumbered subsection labeled Collector ClassAd Attributes.
- Query
-
In addition, statistics are published for each DaemonCore daemon.
These attributes are listed and described in
the unnumbered subsection labeled DaemonCore Statistics Attributes.
Job ClassAd Attributes
- Absent:
- Boolean set to True if the ad is absent.
- AllRemoteHosts:
- String containing a comma-separated list
of all the remote machines running a parallel or mpi universe job.
- Args:
- A string representing the command line arguments
passed to the job, when those arguments are specified using the
old syntax, as specified in section 11.
- Arguments:
- A string representing the command line arguments
passed to the job, when those arguments are specified using the
new syntax, as specified in section 11.
- BatchQueue:
- For grid universe jobs destined for
PBS, LSF or SGE, the name of the queue in the remote batch system.
- CkptArch:
- String describing the architecture of the machine
this job executed on at the time it last produced a checkpoint.
If the job has never produced a checkpoint,
this attribute is undefined.
- CkptOpSys:
- String describing the operating system of
the machine
this job executed on at the time it last produced a checkpoint.
If the job has never produced a checkpoint,
this attribute is undefined.
- ClusterId:
- Integer cluster identifier for this job.
A cluster is a group of jobs that were submitted together. Each
job has its own unique job identifier within the cluster, but shares a
common cluster identifier.
The value changes each time a job or set of jobs is queued for
execution under HTCondor.
- Cmd:
- The path to and the file name of the job to be executed.
- CommittedTime:
- The number of seconds of wall clock time
that the job has been allocated a machine,
excluding the time spent on run attempts that
were evicted without a checkpoint.
Like RemoteWallClockTime,
this includes time the job spent in a suspended state,
so the total committed wall time spent running is
CommittedTime - CommittedSuspensionTime
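As a small illustration of the subtraction above, the following Python sketch computes the committed running time from a job ad represented as a plain dictionary. The dictionary representation, the function name, and the sample values are assumptions made for this example; they are not part of HTCondor.

```python
def committed_run_seconds(job_ad):
    """Return CommittedTime minus CommittedSuspensionTime, in seconds.

    job_ad is a hypothetical dict of job ClassAd attributes; an attribute
    that is missing from the dict is treated as 0.
    """
    committed = job_ad.get("CommittedTime", 0)
    suspended = job_ad.get("CommittedSuspensionTime", 0)
    return committed - suspended


# Sample values invented for this example.
example_ad = {"CommittedTime": 3600, "CommittedSuspensionTime": 600}
print(committed_run_seconds(example_ad))
```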
- CommittedSlotTime:
- This attribute is identical to
CommittedTime except that the time is multiplied by the
SlotWeight of the machine(s) that ran the job. This relies
on SlotWeight being listed in SYSTEM_JOB_MACHINE_ATTRS .
- CommittedSuspensionTime:
- A running total of the number of
seconds the job has spent in suspension during time in which the job was
not evicted without a checkpoint. This number is updated when the job is
checkpointed and when it exits.
- CompletionDate:
- The time when the job completed,
or the value 0 if the job has not yet completed.
Measured in the
number of seconds since the epoch (00:00:00 UTC, Jan 1, 1970).
- ConcurrencyLimits:
- A string list,
delimited by commas and space characters.
The items in the list
identify named resources that the job requires.
- CumulativeSlotTime:
- This attribute is identical to
RemoteWallClockTime except that the time is multiplied by the
SlotWeight of the machine(s) that ran the job. This relies
on SlotWeight being listed in SYSTEM_JOB_MACHINE_ATTRS .
- CumulativeSuspensionTime:
- A running total of the number of
seconds the job has spent in suspension for the life of the job.
- CumulativeTransferTime:
- The total time, in seconds, that
HTCondor has spent transferring the input and output sandboxes for the life of the job.
- CurrentHosts:
- The number of hosts in the claimed state,
due to this job.
- DAGManJobId:
- For a DAGMan node job only,
the ClusterId job ClassAd attribute
of the condor_dagman job which is the parent of this node job.
For nested DAGs, this attribute holds only the ClusterId of
the job's immediate parent.
- DAGParentNodeNames:
- For a DAGMan node job only,
a comma-separated list of each JobName which is a parent node of
this job's node.
This attribute is passed through to the job via the condor_submit
command line, if it does not exceed the line length defined with
_POSIX_ARG_MAX. For example, if a node job has two parents
with JobNames B and C, the condor_submit command line will
contain
-append +DAGParentNodeNames=B,C
- DAGManNodesLog:
- For a DAGMan node job only, gives the path to
an event log used exclusively by DAGMan to monitor the state of the DAG's jobs.
Events are written to this log file in addition to any log file
specified in the job's submit description file.
- DAGManNodesMask:
- For a DAGMan node job only,
a comma-separated list of the event codes that should be written
to the log specified by DAGManNodesLog,
known as the auxiliary log.
All events not specified in the
DAGManNodesMask string are not written to the auxiliary event log.
The value of this attribute is determined
by DAGMan, and it is passed to the job via the condor_submit command line.
By default, the following events are written to the
auxiliary job log:
- Submit, event code is 0
- Execute, event code is 1
- Executable error, event code is 2
- Job evicted, event code is 4
- Job terminated, event code is 5
- Shadow exception, event code is 7
- Job aborted, event code is 9
- Job suspended, event code is 10
- Job unsuspended, event code is 11
- Job held, event code is 12
- Job released, event code is 13
- Post script terminated, event code is 16
- Globus submit, event code is 17
- Grid submit, event code is 27
If DAGManNodesLog is
not defined, DAGManNodesMask has no effect. The value of DAGManNodesMask does not
affect events recorded in the user log file referred to by UserLog.
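To make the mask format concrete, here is a minimal Python sketch that builds a comma-separated mask string from the default event codes listed above and checks whether a given event code is covered by a mask. The helper names build_mask and is_logged are hypothetical, and the sketch only illustrates the string format; it does not describe how DAGMan itself builds or evaluates the mask.

```python
# Default event codes written to the auxiliary log, copied from the list above.
DEFAULT_MASK_EVENTS = {
    0: "Submit",
    1: "Execute",
    2: "Executable error",
    4: "Job evicted",
    5: "Job terminated",
    7: "Shadow exception",
    9: "Job aborted",
    10: "Job suspended",
    11: "Job unsuspended",
    12: "Job held",
    13: "Job released",
    16: "Post script terminated",
    17: "Globus submit",
    27: "Grid submit",
}


def build_mask(codes):
    """Format an iterable of integer event codes as a comma-separated string."""
    return ",".join(str(code) for code in sorted(codes))


def is_logged(mask, event_code):
    """Return True if event_code appears in the comma-separated mask string."""
    allowed = {int(token) for token in mask.split(",") if token.strip()}
    return event_code in allowed


default_mask = build_mask(DEFAULT_MASK_EVENTS)
print(default_mask)
print(is_logged(default_mask, 5))   # Job terminated is in the default list
print(is_logged(default_mask, 3))   # event code 3 is not in the default list
```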
- DelegateJobGSICredentialsLifetime:
- An integer that specifies the maximum number of seconds for which
delegated proxies should be valid. The default behavior is determined
by the configuration
setting DELEGATE_JOB_GSI_CREDENTIALS_LIFETIME , which
defaults to one day. A value of 0 indicates that the delegated proxy
should be valid for as long as allowed by the credential used to
create the proxy. This setting currently only applies to proxies
delegated for non-grid jobs and HTCondor-C jobs. It does not currently
apply to globus grid jobs, which always behave as though this setting
were 0. This setting has no effect if the configuration
setting DELEGATE_JOB_GSI_CREDENTIALS is false, because in
that case the job proxy is copied rather than delegated.
- DeltacloudAvailableActions:
- Used for grid-type deltacloud jobs.
For a running job,
HTCondor sets this string to contain a comma-separated list of actions
that can be performed on a Deltacloud instance,
as given by the selected service.
- DeltacloudHardwareProfile:
- String taken from the submit description file command
deltacloud_hardware_profile. Specifies the
hardware configuration to be used for a grid-type deltacloud
job.
- DeltacloudHardwareProfileCpu:
- String taken from the submit description file command
deltacloud_hardware_profile_cpu. Specifies CPU
details in the hardware configuration to be used for a grid-type
deltacloud job.
- DeltacloudHardwareProfileMemory:
- String taken from the submit description file command
deltacloud_hardware_profile_memory. Specifies memory (RAM)
details in the hardware configuration to be used for a grid-type
deltacloud job.
- DeltacloudHardwareProfileStorage:
- String taken from the submit description file command
deltacloud_hardware_profile_storage. Specifies storage (disk)
details in the hardware configuration to be used for a grid-type
deltacloud job.
- DeltacloudImageId:
- String taken from the submit description file command
deltacloud_image_id.
Specifies the virtual machine image to use for a grid-type deltacloud
job.
- DeltacloudKeyname:
- String taken from the submit description file command
deltacloud_keyname.
Specifies the SSH key pair to use for a grid-type deltacloud job.
- DeltacloudPasswordFile:
- String taken from the submit description file command
deltacloud_password_file.
Specifies a file containing the secret key to be used to authenticate
with the Deltacloud service for a grid-type deltacloud job.
- DeltacloudPrivateNetworkAddresses:
- For a running Deltacloud instance,
HTCondor receives and sets this comma-separated list of the private IP addresses
allocated to the running virtual machine.
- DeltacloudPublicNetworkAddresses:
- For a running Deltacloud instance,
HTCondor receives and sets this comma-separated list of the public IP addresses
allocated to the running virtual machine.
- DeltacloudRealmId:
- String taken from the submit description file command
deltacloud_realm_id.
Specifies the realm to be used for a grid-type deltacloud job.
- DeltacloudUserData:
- String taken from the submit description file command
deltacloud_user_data.
Specifies a block of data to be provided to the instance
for a grid-type deltacloud job.
- DeltacloudUsername:
- String taken from the submit description file command
deltacloud_username.
Specifies the user name to be used to authenticate
with the Deltacloud service for a grid-type deltacloud job.
- DiskUsage:
- Amount of disk space (Kbytes) in the HTCondor
execute directory on the execute machine that this job has used.
An initial value may be set at the job's request, placing into the
job's submit description file a setting such as
# 1 megabyte initial value
+DiskUsage = 1024
vm universe jobs will default to an initial value of the disk
image size.
If not initialized by the job,
non-vm universe jobs will default to an initial value of the
sum of the job's executable and all input files.
- EC2AccessKeyId:
- Used for grid type ec2 jobs;
a string taken from the definition of the submit description file command
ec2_access_key_id.
Defines the path and file name of the file containing the EC2 Query API's
access key.
- EC2AmiID:
- Used for grid type ec2 jobs;
a string taken from the definition of the submit description file command
ec2_ami_id.
Identifies the machine image of the instance.
- EC2ElasticIp:
- Used for grid type ec2 jobs;
a string taken from the definition of the submit description file command
ec2_elastic_ip.
Specifies an Elastic IP address to associate with the instance.
- EC2InstanceName:
- Used for grid type ec2 jobs;
a string set for the job once the instance starts running,
as assigned by the EC2 service,
that represents the unique ID assigned to the instance by the EC2 service.
- EC2InstanceType:
- Used for grid type ec2 jobs;
a string taken from the definition of the submit description file command
ec2_instance_type.
Specifies a service-specific instance type.
- EC2KeyPair:
- Used for grid type ec2 jobs;
a string taken from the definition of the submit description file command
ec2_key_pair.
Defines the key pair associated with the EC2 instance.
- EC2SpotPrice:
- Used for grid type ec2 jobs;
a string taken from the definition of the submit description file command
ec2_spot_price.
Defines the maximum amount per hour a job submitter is willing to
pay to run this job.
- EC2SpotRequestID:
- Used for grid type ec2 jobs;
identifies the spot request HTCondor made on behalf of this job.
- EC2StatusReasonCode:
- Used for grid type ec2 jobs;
reports the reason for the most recent EC2-level state transition.
Can be used to determine if a spot request was terminated
due to a rise in the spot price.
- EC2TagNames:
- Used for grid type ec2 jobs;
a string taken from the definition of the submit description file command
ec2_tag_names.
Defines the set, and case, of tags associated with the EC2 instance.
- EC2KeyPairFile:
- Used for grid type ec2 jobs;
a string taken from the definition of the submit description file command
ec2_key_pair_file.
Defines the path and file name of the file
into which to write the SSH key used to access the image, once it is running.
- EC2RemoteVirtualMachineName:
- Used for grid type ec2 jobs;
a string set for the job once the instance starts running,
as assigned by the EC2 service, that represents
the host name upon which the instance runs, such that the
user can communicate with the running instance.
- EC2SecretAccessKey:
- Used for grid type ec2 jobs;
a string taken from the definition of the submit description file command
ec2_secret_access_key.
Defines the path and file name of the file
containing the EC2 Query API's secret access key.
- EC2SecurityGroups:
- Used for grid type ec2 jobs;
a string taken from the definition of the submit description file command
ec2_security_groups.
Defines the list of EC2 security groups which should be associated with the job.
- EC2UserData:
- Used for grid type ec2 jobs;
a string taken from the definition of the submit description file command
ec2_user_data.
Defines a block of data that can be accessed by the virtual machine.
- EC2UserDataFile:
- Used for grid type ec2 jobs;
a string taken from the definition of the submit description file command
ec2_user_data_file.
Specifies a path and file name of a file containing
data that can be accessed by the virtual machine.
- EmailAttributes:
- A string containing a comma-separated
list of job ClassAd attributes. For each attribute name in the list,
its value will be included in the e-mail notification upon job completion.
- EnteredCurrentStatus:
- An integer containing the
epoch time of when the job entered its current status.
For example, if the job is on hold, the ClassAd expression
time() - EnteredCurrentStatus
will equal the number of seconds that the job has been on hold.
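The same calculation can be sketched outside of a ClassAd expression. The Python example below assumes the job ad is available as a plain dictionary; the function name and the sample timestamps are invented for illustration.

```python
import time


def seconds_in_current_status(job_ad, now=None):
    """Mirror the ClassAd expression time() - EnteredCurrentStatus.

    job_ad is a hypothetical dict holding the job's ClassAd attributes.
    now may be supplied (in epoch seconds) to make the result reproducible.
    """
    if now is None:
        now = int(time.time())
    return now - job_ad["EnteredCurrentStatus"]


# Invented timestamps: the job changed status 90 seconds before "now".
example_ad = {"EnteredCurrentStatus": 1372000000}
print(seconds_in_current_status(example_ad, now=1372000090))
```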
- Env:
- A string representing the environment variables
passed to the job, when those variables are specified using the
old syntax, as specified in section 11.
- Environment:
- A string representing the environment variables
passed to the job, when those variables are specified using the
new syntax, as specified in section 11.
- ExecutableSize:
- Size of the executable in Kbytes.
- ExitBySignal:
- An attribute that is True
when a user job exits via a signal and False otherwise.
For some grid universe jobs, how the job exited is
unavailable. In this case, ExitBySignal is set to False.
- ExitCode:
- When a user job exits by means other than a signal,
this is the exit return code of the user job.
For some grid universe jobs, how the job exited is
unavailable. In this case, ExitCode is set to 0.
- ExitSignal:
- When a user job exits by means of an unhandled
signal, this attribute takes on the numeric value of the signal.
For some grid universe jobs, how the job exited is
unavailable. In this case, ExitSignal will be undefined.
- ExitStatus:
- The way that HTCondor previously dealt with
a job's exit status.
This attribute should no longer be used.
It is not always accurate in
heterogeneous pools, or if the job exited with a signal.
Instead, see the attributes: ExitBySignal,
ExitCode, and
ExitSignal.
- GridJobStatus:
- A string containing the job's status as
reported by the remote job management system.
- GridResource:
- A string defined by the right hand side
of the submit description file command grid_resource.
It specifies the target grid type, plus additional parameters
specific to the grid type.
- AcctGroup:
- The accounting group name, as set in the
condor_submit file via the accounting_group command. This attribute
is only present if an accounting group was requested by the
submission. See section 3.4.7 for more
information about accounting groups.
- AcctGroupUser:
- The user name associated with the
accounting group. This attribute is only present if an
accounting group was requested by the submission.
- HoldKillSig:
- Currently only for scheduler and local
universe jobs,
a string containing a name of
a signal to be sent to the job if the job is put on hold.
- HoldReason:
- A string containing a human-readable
message about why a job is on hold.
This is the message that will be displayed in response to
the command condor_q -hold.
It can be used to determine if a job should be released or not.
- HoldReasonCode:
- An integer value that represents the
reason that a job was put on hold.
Integer Code | Reason for Hold | HoldReasonSubCode
1 | The user put the job on hold with condor_hold. |
2 | Globus middleware reported an error. | The GRAM error number.
3 | The PERIODIC_HOLD expression evaluated to True. |
4 | The credentials for the job are invalid. |
5 | A job policy expression evaluated to Undefined. |
6 | The condor_starter failed to start the executable. | The Unix error number.
7 | The standard output file for the job could not be opened. | The Unix error number.
8 | The standard input file for the job could not be opened. | The Unix error number.
9 | The standard output stream for the job could not be opened. | The Unix error number.
10 | The standard input stream for the job could not be opened. | The Unix error number.
11 | An internal HTCondor protocol error was encountered when transferring files. |
12 | The condor_starter failed to download input files. | The Unix error number.
13 | The condor_starter failed to upload output files. | The Unix error number.
14 | The initial working directory of the job cannot be accessed. | The Unix error number.
15 | The user requested the job be submitted on hold. |
16 | Input files are being spooled. |
17 | A standard universe job is not compatible with the condor_shadow version available on the submitting machine. |
18 | An internal HTCondor protocol error was encountered when transferring files. |
19 | <Keyword>_HOOK_PREPARE_JOB was defined but could not be executed or returned failure. |
20 | The job missed its deferred execution time and therefore failed to run. |
21 | The job was put on hold because WANT_HOLD in the machine policy was true. |
22 | Unable to initialize user log. |
23 | Failed to access user account. |
24 | No compatible shadow. |
25 | Invalid cron settings. |
26 | SYSTEM_PERIODIC_HOLD evaluated to true. |
27 | The system periodic job policy evaluated to undefined. |
32 | The maximum total input file transfer size was exceeded. (See MAX_TRANSFER_INPUT_MB.) |
33 | The maximum total output file transfer size was exceeded. (See MAX_TRANSFER_OUTPUT_MB.) |
- HoldReasonSubCode:
- An integer value that represents further
information to go along with the HoldReasonCode, for
some values of HoldReasonCode.
See HoldReasonCode for the values.
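A tool that reports why a job is held might pair HoldReasonCode with HoldReasonSubCode as in the following Python sketch. Only a few rows of the HoldReasonCode table are copied into the lookup, and the dictionary-based job ad and the function name describe_hold are assumptions made for this example.

```python
# A few rows copied from the HoldReasonCode table; extend as needed.
HOLD_REASONS = {
    1: "The user put the job on hold with condor_hold.",
    2: "Globus middleware reported an error.",
    3: "The PERIODIC_HOLD expression evaluated to True.",
    6: "The condor_starter failed to start the executable.",
    12: "The condor_starter failed to download input files.",
    13: "The condor_starter failed to upload output files.",
    32: "The maximum total input file transfer size was exceeded.",
    33: "The maximum total output file transfer size was exceeded.",
}

# Meaning of HoldReasonSubCode for the codes above that define one.
SUBCODE_MEANING = {
    2: "GRAM error number",
    6: "Unix error number",
    12: "Unix error number",
    13: "Unix error number",
}


def describe_hold(job_ad):
    """Return a one-line description of why a (hypothetical) job ad is held."""
    code = job_ad.get("HoldReasonCode")
    reason = HOLD_REASONS.get(code, "reason not in this lookup table")
    text = f"HoldReasonCode {code}: {reason}"
    meaning = SUBCODE_MEANING.get(code)
    # Only report the sub code when the table defines a meaning for it.
    if meaning is not None and "HoldReasonSubCode" in job_ad:
        text += f" ({meaning} {job_ad['HoldReasonSubCode']})"
    return text


print(describe_hold({"HoldReasonCode": 12, "HoldReasonSubCode": 13}))
```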
- HookKeyword:
- A string that uniquely identifies
a set of job hooks; it is added to the ClassAd once a job is fetched.
- ImageSize:
- Maximum observed memory image size
(i.e. virtual memory) of the
job in Kbytes. The initial value is equal to the size of the
executable for non-vm universe jobs, and 0 for vm universe jobs.
When the job writes a checkpoint, the ImageSize
attribute is set to the size of the checkpoint file (since the
checkpoint file contains the job's memory image).
A vanilla universe job's ImageSize is recomputed
internally every 15 seconds.
How quickly this updated information becomes visible to condor_q is
controlled by SHADOW_QUEUE_UPDATE_INTERVAL and
STARTER_UPDATE_INTERVAL.
Under Linux, ProportionalSetSize is a better indicator of
memory usage for jobs with significant sharing of memory between
processes, because ImageSize is simply the sum of virtual
memory sizes across all of the processes in the job, which may count
the same memory pages more than once.
- IwdFlushNFSCache:
- A boolean expression that controls
whether or not HTCondor attempts to flush a submit machine's NFS cache,
in order to refresh an HTCondor job's initial working directory.
The value will be True, unless a job explicitly adds this
attribute, setting it to False.
- JobAdInformationAttrs:
- A comma-separated list
of attribute names. The named attributes and their values are written
in the user log whenever any event is being written to the log.
This is the same as the configuration setting
EVENT_LOG_INFORMATION_ATTRS, but it applies to
the user log instead of the system event log.
- JobDescription:
- A string that may be defined for
a job by setting description in the submit description file.
When set, tools which display the executable such as condor_q
will instead use this string.
For interactive jobs that do not have a submit description file,
this string will default to "Interactive job".
- JobCurrentStartDate:
- Time at which the job most recently began
running. Measured in the
number of seconds since the epoch (00:00:00 UTC, Jan 1, 1970).
- JobCurrentStartExecutingDate:
- Time at which the job most recently finished
transferring its input sandbox and began executing. Measured in the
number of seconds since the epoch (00:00:00 UTC, Jan 1, 1970).
- JobCurrentStartTransferOutputDate:
- Time at which the job most recently finished
executing and began transferring its output sandbox. Measured in the
number of seconds since the epoch (00:00:00 UTC, Jan 1, 1970).
- JobLeaseDuration:
- The number of seconds set for
a job lease, the amount of time that a job may continue running
on a remote resource,
despite its submitting machine's lack of response.
See section 2.13.4 for details on job leases.
- JobMaxVacateTime:
- An integer expression that specifies
the time in seconds requested by the job for being allowed to
gracefully shut down.
- JobNotification:
- An integer indicating what events should
be emailed to the user. The integer values correspond to the user
choices for the submit command notification.
Value | Notification value
0 | Never
1 | Always
2 | Complete
3 | Error
- JobPrio:
- Integer priority for this job, set by
condor_submit or condor_prio. The default value is 0. The higher
the number, the greater (better) the priority.
- JobRunCount:
- This attribute is retained for backwards
compatibility. It may go away in the future. It is equivalent to
NumShadowStarts for all universes except scheduler.
For the scheduler universe, this attribute is equivalent to
NumJobStarts.
- JobStartDate:
- Time at which the job first began
running. Measured in the
number of seconds since the epoch (00:00:00 UTC, Jan 1, 1970).
- JobStatus:
- Integer which indicates the current
status of the job.
Value | Status
1 | Idle
2 | Running
3 | Removed
4 | Completed
5 | Held
6 | Transferring Output
7 | Suspended
- JobUniverse:
- Integer which indicates the job
universe.
Value | Universe
1 | standard
5 | vanilla
7 | scheduler
8 | MPI
9 | grid
10 | java
11 | parallel
12 | local
13 | vm
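The integer values of JobStatus and JobUniverse are often easier to read once translated back to names. The Python sketch below copies the two tables above into dictionaries and formats a short summary; the dictionary-based job ad and the function name summarize_job are hypothetical.

```python
# Values copied from the JobStatus table.
JOB_STATUS = {
    1: "Idle",
    2: "Running",
    3: "Removed",
    4: "Completed",
    5: "Held",
    6: "Transferring Output",
    7: "Suspended",
}

# Values copied from the JobUniverse table.
JOB_UNIVERSE = {
    1: "standard",
    5: "vanilla",
    7: "scheduler",
    8: "MPI",
    9: "grid",
    10: "java",
    11: "parallel",
    12: "local",
    13: "vm",
}


def summarize_job(job_ad):
    """Translate the integer JobUniverse and JobStatus of a job ad to names."""
    universe = JOB_UNIVERSE.get(job_ad.get("JobUniverse"), "unknown")
    status = JOB_STATUS.get(job_ad.get("JobStatus"), "unknown")
    return f"{universe} universe job, status {status}"


print(summarize_job({"JobUniverse": 5, "JobStatus": 2}))
```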
- KeepClaimIdle:
- An integer value that represents the number
of seconds that the condor_schedd will continue to keep a claim,
in the Claimed Idle state,
after the job with this attribute defined completes,
and there are no other jobs ready to run from this user.
This attribute may improve the performance of linear DAGs,
in the case when a dependent job can not be scheduled until its
parent has completed.
Extending the claim on the machine may permit the dependent job to be
scheduled with less delay than with waiting for the condor_negotiator
to match with a new machine.
- KillSig:
- The Unix signal number that the job wishes to be
sent before being forcibly killed.
It is relevant only for jobs running on Unix machines.
- KillSigTimeout:
- This attribute is replaced by the
functionality in JobMaxVacateTime as of HTCondor version 7.7.3.
The number of seconds that the job
(other than the standard universe) requests the condor_starter wait
after sending the signal defined as KillSig and before forcibly
removing the job.
The actual amount of time will be the minimum of this value
and the execute machine's configuration variable KILLING_TIMEOUT .
- LastCheckpointPlatform:
- An opaque string which is the
CheckpointPlatform identifier from the last machine where this
standard universe job had successfully produced a checkpoint.
- LastCkptServer:
- Host name of the last checkpoint
server used by this job. When a pool is using multiple checkpoint
servers, this tells the job where to find its checkpoint file.
- LastCkptTime:
- Time at which the job last performed a
successful checkpoint. Measured in the number of seconds since the
epoch (00:00:00 UTC, Jan 1, 1970).
- LastMatchTime:
- An integer containing the epoch time
when the job was last successfully matched with a resource (gatekeeper) Ad.
- LastRejMatchReason:
- If, at any point in the past,
this job failed to match with a resource ad,
this attribute will contain a string with a
human-readable message about why the match failed.
- LastRejMatchTime:
- An integer containing the epoch
time when HTCondor-G last tried to find a match for the job,
but failed to do so.
- LastRemotePool:
- The name of the condor_collector of the pool
in which a job ran via flocking in the most recent run attempt.
This attribute is not defined if the job did not run via flocking.
- LastSuspensionTime:
- Time at which the job last performed a
successful suspension. Measured in the number of seconds since the
epoch (00:00:00 UTC, Jan 1, 1970).
- LastVacateTime:
- Time at which the job was last
evicted from a remote workstation. Measured in the number of seconds
since the epoch (00:00:00 UTC, Jan 1, 1970).
- LeaveJobInQueue:
- A boolean expression that defaults to
False, causing the job to be removed from the queue upon completion.
An exception is if the job is submitted using condor_submit -spool.
For this case, the default expression causes the job to be kept in the queue
for 10 days after completion.
- LocalSysCpu:
- An accumulated number of seconds of
system CPU time that the job caused to the machine upon which
the job was submitted.
- LocalUserCpu:
- An accumulated number of seconds of
user CPU time that the job caused to the machine upon which
the job was submitted.
- MachineAttr<X><N>:
- Machine attribute of name <X> that is placed into this job ClassAd,
as specified by the configuration variable
SYSTEM_JOB_MACHINE_ATTRS.
With the potential for multiple run attempts, <N> represents
an integer value providing historical values of this machine attribute
for multiple runs.
The most recent run will have a value of <N> equal to 0.
The next most recent run will have a value of <N> equal to 1.
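The naming scheme can be illustrated with a short Python sketch that assembles the attribute name from <X> and <N> and collects the recorded values, most recent run first. The helper names, the dictionary-based job ad, and the sample SlotWeight values are assumptions for illustration only.

```python
def machine_attr_name(attr, run_index):
    """Build the job attribute name MachineAttr<X><N> for attr and run_index."""
    if run_index < 0:
        raise ValueError("run_index must be 0 (most recent run) or greater")
    return f"MachineAttr{attr}{run_index}"


def machine_attr_history(job_ad, attr):
    """Collect recorded values of attr, most recent run first.

    Stops at the first index that is missing from the hypothetical job_ad dict.
    """
    values = []
    n = 0
    while machine_attr_name(attr, n) in job_ad:
        values.append(job_ad[machine_attr_name(attr, n)])
        n += 1
    return values


# Invented job ad with SlotWeight recorded for two run attempts.
example_ad = {"MachineAttrSlotWeight0": 4, "MachineAttrSlotWeight1": 1}
print(machine_attr_name("SlotWeight", 0))
print(machine_attr_history(example_ad, "SlotWeight"))
```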
- MaxHosts:
- The maximum number of hosts that this job would
like to claim. As long as CurrentHosts is the same as
MaxHosts, no more hosts are negotiated for.
- MaxJobRetirementTime:
- Maximum time in seconds to let this
job run uninterrupted before kicking it off when it is being preempted.
This can only decrease the amount of time from what the corresponding
startd expression allows.
- MaxTransferInputMB:
- This integer expression specifies the maximum allowed total size in
Mbytes of the input files that are transferred for a job. This
expression does not apply to grid universe, standard universe, or
files transferred via file transfer plug-ins. The expression may refer
to attributes of the job. The special value -1 indicates no limit.
If not set, the system setting MAX_TRANSFER_INPUT_MB is
used. If the observed size of all input files at submit time is
larger than the limit, the job will be immediately placed on hold with
a HoldReasonCode value of 32.
If the job passes this initial test, but the size of
the input files increases or the limit decreases so that the limit is
violated, the job will be placed on hold at the time when the file
transfer is attempted.
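The limit check described above can be sketched as follows. This Python example only models the comparison and the special value -1; the function name and the sample sizes are invented, and the sketch does not claim to reproduce how HTCondor measures file sizes or evaluates the expression.

```python
HOLD_CODE_INPUT_LIMIT = 32  # HoldReasonCode used when the input limit is exceeded


def exceeds_input_limit(total_input_mb, limit_mb):
    """Return True if the total input size violates the limit.

    A limit of -1 is the special value meaning "no limit".
    """
    if limit_mb == -1:
        return False
    return total_input_mb > limit_mb


# Invented sizes: 150 Mbytes of input against a 100 Mbyte limit.
if exceeds_input_limit(150, 100):
    print(f"job would be held with HoldReasonCode {HOLD_CODE_INPUT_LIMIT}")
```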
- MaxTransferOutputMB:
- This integer expression specifies the maximum allowed total size in
Mbytes of the output files that are transferred for a job. This
expression does not apply to grid universe, standard universe, or
files transferred via file transfer plug-ins. The expression may refer
to attributes of the job. The special value -1 indicates no limit.
If not set, the system setting MAX_TRANSFER_OUTPUT_MB is
used. If the total size of the job's output files to be transferred
is larger than the limit, the job will be placed on hold with
a HoldReasonCode value of 33.
The output will be transferred up to the point when the
limit is hit, so some files may be fully transferred, some partially,
and some not at all.
- MemoryUsage:
- An integer expression in units of Mbytes that
represents the peak memory usage for the job.
Its purpose is to be compared with the value defined by a job with the
request_memory submit command,
for purposes of policy evaluation.
- MinHosts:
- The minimum number of hosts that must be in
the claimed state for this job, before the job may enter the running state.
- NextJobStartDelay:
- An integer number of seconds delay time
after this job starts until the next job is started. The value is limited
by the configuration variable MAX_NEXT_JOB_START_DELAY .
- NiceUser:
- Boolean value which when True indicates
that this job is a nice job, raising its user priority value,
thus causing it to run on a machine only when no other HTCondor jobs want
the machine.
- Nonessential:
- A boolean value only relevant to grid universe
jobs, which when True tells HTCondor to simply abort (remove)
any problematic job, instead of putting the job on hold.
It is the equivalent of doing condor_rm followed by
condor_rm -forcex any time the job would have otherwise gone on hold.
If not explicitly set to True, the default value will be False.
- NTDomain:
- A string that identifies the NT domain under
which a job's owner authenticates on a platform running Windows.
- NumCkpts:
- A count of the number of checkpoints
written by this job during its lifetime.
- NumGlobusSubmits:
- An integer that is incremented each
time the condor_gridmanager receives confirmation of a successful job
submission into Globus.
- NumJobMatches:
- An integer that is incremented by the
condor_schedd each time the job is matched with a resource ad by the
negotiator.
- NumJobReconnects:
- An integer count of the number of times a
job successfully reconnected after being disconnected.
This occurs when the
condor_shadow and condor_starter lose contact,
for example because of
transient network failures or a condor_shadow or condor_schedd
restart.
This attribute is only defined for jobs that can reconnect:
those in the vanilla and java universes.
- NumJobStarts:
- An integer count of the number of times the
job started executing.
This is not (yet) defined for standard universe jobs.
- NumPids:
- A count of the number of child processes that
this job has.
- NumRestarts:
- A count of the number of restarts from a
checkpoint attempted by this job during its lifetime.
- NumShadowExceptions:
- An integer count of the number of
times the condor_shadow daemon had a fatal error for a given job.
- NumShadowStarts:
- An integer count of the number of
times a condor_shadow daemon was started for a given job.
This attribute is not defined for
scheduler universe jobs, since
they do not have a condor_shadow daemon associated with them.
For local universe jobs, this attribute is
defined, even though the process that manages the job is technically
a condor_starter rather than a condor_shadow.
This keeps the management of the
local universe and other universes as similar as possible.
- NumSystemHolds:
- An integer that is incremented each time
HTCondor-G places a job on hold due to some sort of error condition. This
counter is useful, since HTCondor-G will always place a job on hold when it
gives up on some error condition. Note that if the user places the job
on hold using the condor_hold command, this attribute is not incremented.
- OtherJobRemoveRequirements:
- A string that defines a list of jobs.
When the job with this attribute defined is removed,
all other jobs defined by the list are also removed.
The string is an expression that defines a constraint equivalent to
the one implied by the command
condor_rm -constraint <constraint>
This attribute is used for jobs managed with condor_dagman to ensure
that node jobs of the DAG are removed when the condor_dagman job
itself is removed. Note that the list of jobs defined by this attribute
must not form a cyclic removal of jobs,
or the condor_schedd will go into an infinite loop
when any of the jobs is removed.
- Owner:
- String describing the user who submitted this
job.
- ParallelShutdownPolicy:
- A string that is only relevant
to parallel universe jobs. Without this attribute defined, the default
policy applied to parallel universe jobs is to consider the whole job
completed when the first node exits, killing processes running on
all remaining nodes. If defined to the following strings, HTCondor's
behavior changes:
- "WAIT_FOR_ALL"
- HTCondor will wait until every node in
the parallel job has completed to consider the job finished.
- PreserveRelativeExecutable:
- When True,
the condor_starter will not prepend Iwd
to Cmd, when Cmd is a relative path name
and TransferExecutable is False.
The default value is False.
This attribute is primarily of interest for users of
USER_JOB_WRAPPER
for the purpose of allowing an executable's location to be resolved
by the user's path in the job wrapper.
- ProcId:
- Integer process identifier for this job.
Within a cluster of many jobs,
each job has the same ClusterId, but will have a unique ProcId.
Within a cluster, assignment of a ProcId value will start
with the value 0.
The job (process) identifier described here is unrelated to operating
system PIDs.
- ProportionalSetSizeKb:
- On Linux execute machines with kernel version more recent than 2.6.27,
this is the maximum observed proportional set size (PSS) in Kbytes,
summed across all processes in the job.
If the execute machine does not
support monitoring of PSS or PSS has not yet been measured,
this attribute will be undefined.
PSS differs from ImageSize in how memory shared
between processes is accounted.
The PSS for one process is the sum of that process' memory pages
divided by the number of processes sharing each of the pages.
ImageSize is the same,
except there is no division by the number of processes sharing the pages.
- QDate:
- Time at which the job was submitted to the job
queue. Measured in the
number of seconds since the epoch (00:00:00 UTC, Jan 1, 1970).
- ReleaseReason:
- A string containing a human-readable
message about why the job was released from hold.
- RemoteIwd:
- The path to the directory in which
a job is to be executed on a remote machine.
- RemotePool:
- The name of the condor_collector of the pool
in which a job is running via flocking.
This attribute is not defined if the job is not running via flocking.
- RemoteSysCpu:
- The total number of seconds
of system CPU time (the time spent at system calls) the job used
on remote machines. This does not count time spent on run attempts that
were evicted without a checkpoint.
- RemoteUserCpu:
- The total number of seconds
of user CPU time the job used on remote machines. This does not
count time spent on run attempts that were evicted without a checkpoint.
- RemoteWallClockTime:
- Cumulative number of seconds
the job has been allocated a machine.
This also includes time spent in suspension (if any),
so the total real time spent running is
RemoteWallClockTime - CumulativeSuspensionTime
Note that this number does not get reset to
zero when a job is forced to migrate from one machine to another.
CommittedTime, on the other hand, is just like
RemoteWallClockTime except it does get reset to 0 whenever
the job is evicted without a checkpoint.
- RemoveKillSig:
- Currently only for scheduler universe jobs,
a string containing a name of
a signal to be sent to the job if the job is removed.
- RequestCpus:
- The number of CPUs requested for this job.
If dynamic condor_startd provisioning is enabled,
it is the minimum number of CPUs that are needed in the created dynamic slot.
- RequestDisk:
- The amount of disk space in Kbytes requested
for this job.
If dynamic condor_startd provisioning is enabled,
it is the minimum amount of disk space needed in the created dynamic slot.
- RequestedChroot:
- A full path to the directory that the job
requests the condor_starter use as an argument to chroot().
- RequestMemory:
- The amount of memory space in Mbytes
requested for this job.
If dynamic condor_startd provisioning is enabled,
it is the minimum amount of memory needed in the created dynamic slot.
If not set by the job, its definition is specified by
configuration variable JOB_DEFAULT_REQUESTMEMORY .
- ResidentSetSize:
- Maximum observed
physical memory in use by the job in Kbytes while running.
- StackSize:
- Utilized for Linux jobs only,
the number of bytes allocated for stack space for this job.
This number of bytes replaces the default allocation of 512 Mbytes.
- StageOutFinish:
- An attribute representing a Unix epoch time that is defined for a job that is
spooled to a remote site using condor_submit -spool or HTCondor-C
and causes HTCondor to hold the output in the spool while the job waits
in the queue in the Completed state.
This attribute is defined when retrieval of the output finishes.
- StageOutStart:
- An attribute representing a Unix epoch time that is defined for a job that is
spooled to a remote site using condor_submit -spool or HTCondor-C
and causes HTCondor to hold the output in the spool while the job waits
in the queue in the Completed state.
This attribute is defined when retrieval of the output begins.
- StreamErr:
- An attribute utilized only for grid universe jobs.
The default value is True.
If True, and TransferErr is True, then
standard error is streamed back to the submit machine, instead
of doing the transfer (as a whole) after the job completes.
If False, then
standard error is transferred back to the submit machine
(as a whole) after the job completes.
If TransferErr is False, then this job attribute is ignored.
- StreamOut:
- An attribute utilized only for grid universe jobs.
The default value is True.
If True, and TransferOut is True, then
job output is streamed back to the submit machine, instead
of doing the transfer (as a whole) after the job completes.
If False, then
job output is transferred back to the submit machine
(as a whole) after the job completes.
If TransferOut is False, then this job attribute is ignored.
- SubmitterAutoregroup:
- A boolean attribute defined
by the condor_negotiator when it makes a match.
It will be True if the resource was claimed via negotiation
when the configuration variable GROUP_AUTOREGROUP was True.
It will be False otherwise.
- SubmitterGroup:
- The accounting group name defined
by the condor_negotiator when it makes a match.
- SubmitterNegotiatingGroup:
- The accounting group name under
which the resource negotiated when it was claimed,
as set by the condor_negotiator.
- TotalSuspensions:
- A count of the number of times this job
has been suspended during its lifetime.
- TransferErr:
- An attribute utilized only for grid universe jobs.
The default value is True.
If True, then the error output from the job
is transferred from the remote machine back to the submit machine.
The name of the file after transfer is the file referred to
by job attribute Err.
If False, no transfer takes place (remote to submit machine),
and the name of the file is the file referred to
by job attribute Err.
- TransferExecutable:
- An attribute utilized only for grid universe jobs.
The default value is True.
If True, then the job executable is transferred from the submit
machine to the remote machine.
The name of the file (on the submit machine)
that is transferred is given by the
job attribute Cmd.
If False, no transfer takes place, and
the name of the file used (on the remote machine) will be as
given in the job attribute Cmd.
- TransferIn:
- An attribute utilized only for grid universe jobs.
The default value is True.
If True, then the job input is transferred from the submit
machine to the remote machine.
The name of the file that is transferred is given by the
job attribute In.
If False, then the job's input is taken from a file on the
remote machine (pre-staged), and
the name of the file is given by the job attribute In.
- TransferInputSizeMB:
- The total size in Mbytes of input files to be transferred for the
job. Files transferred via file transfer plug-ins are not included.
This attribute is automatically set by condor_submit; jobs submitted
via other submission methods, such as SOAP, may not define this
attribute.
- TransferOut:
- An attribute utilized only for grid universe jobs.
The default value is True.
If True, then the output from the job
is transferred from the remote machine back to the submit machine.
The name of the file after transfer is the file referred to
by job attribute Out.
If False, no transfer takes place (remote to submit machine),
and the name of the file is the file referred to
by job attribute Out.
- TransferringInput:
- A boolean value that indicates whether the job is currently
transferring input files. The value is Undefined if the job is
not scheduled to run or has not yet attempted to start transferring
input. When this value is True, to see whether the transfer is
active or queued, check TransferQueued.
- TransferringOutput:
- A boolean value that indicates whether the job is currently
transferring output files. The value is Undefined if the job
is not scheduled to run or has not yet attempted to start transferring
output. When this value is True, to see whether the transfer
is active or queued, check TransferQueued.
- TransferQueued:
- A boolean value that indicates whether the job is currently waiting to
transfer files because of limits placed by
MAX_CONCURRENT_DOWNLOADS or
MAX_CONCURRENT_UPLOADS .
- UserLog:
- The full path and file name on the submit machine
of the log file of job events.
- WantGracefulRemoval:
- A boolean expression that,
when True, specifies that a graceful shutdown of the job
should be done when the job is removed or put on hold.
- WindowsBuildNumber:
- An integer, extracted from the
platform type of the machine upon which this job is submitted,
representing a build number for a Windows operating system.
This attribute only exists for jobs submitted from Windows machines.
- WindowsMajorVersion:
- An integer, extracted from the
platform type of the machine upon which this job is submitted,
representing a major version number (currently 5 or 6)
for a Windows operating system.
This attribute only exists for jobs submitted from Windows machines.
- WindowsMinorVersion:
- An integer, extracted from the
platform type of the machine upon which this job is submitted,
representing a minor version number (currently 0, 1, or 2)
for a Windows operating system.
This attribute only exists for jobs submitted from Windows machines.
- X509UserProxy:
- The full path and file name of the file containing the X.509 user proxy.
- X509UserProxyEmail:
- For a job with an X.509 proxy credential, this is the email
address extracted from the proxy.
- X509UserProxyExpiration:
- For a job that defines the submit description file command
x509userproxy, this is the time at which the indicated
X.509 proxy credential will expire, measured in the
number of seconds since the epoch (00:00:00 UTC, Jan 1, 1970).
- X509UserProxyFirstFQAN:
- For a vanilla or grid universe job that defines the submit description
file command x509userproxy,
this is the VOMS Fully Qualified Attribute Name (FQAN) of
the primary role of the credential.
A credential may have multiple roles defined,
but by convention the one listed first is the primary role.
- X509UserProxyFQAN:
- For a vanilla or grid universe job that defines the submit description
file command x509userproxy,
this is a serialized list of the DN and all FQAN.
A comma is used as a separator,
and any existing commas in the DN or FQAN are replaced with the string
&comma;.
Likewise, any ampersands in the DN or FQAN are replaced with
&amp;.
- X509UserProxySubject:
- For a vanilla or grid universe job that defines the submit description
file command x509userproxy,
this attribute contains the Distinguished Name (DN) of the credential
used to submit the job.
- X509UserProxyVOName:
- For a vanilla or grid universe job that defines the submit description
file command x509userproxy,
this is the name of the VOMS virtual organization (VO) that
the user's credential is part of.
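As an illustration of the X509UserProxyFQAN serialization described above, the following Python sketch splits such a value back into its DN and FQAN entries. It assumes that the escape strings are &comma; for a comma and &amp; for an ampersand, and that ampersands were escaped before commas when the list was written. The function name decode_fqan_list and the sample value are hypothetical and are not part of HTCondor.

```python
import re

# Assumed escape strings: "&comma;" stands for "," and "&amp;" for "&".
_ESCAPES = {"&comma;": ",", "&amp;": "&"}


def decode_fqan_list(serialized):
    """Split a serialized DN/FQAN list and undo the escaping.

    Each entry is unescaped in a single pass, so an escaped ampersand
    that happens to be followed by the text "comma;" is not misread
    as an escaped comma.
    """
    return [
        re.sub(r"&comma;|&amp;", lambda m: _ESCAPES[m.group(0)], raw)
        for raw in serialized.split(",")
    ]


if __name__ == "__main__":
    # Hypothetical value: a DN containing a comma, followed by two FQANs.
    sample = "/DC=org/CN=Doe&comma; Jane,/vo/Role=NULL,/vo/R&amp;D"
    for entry in decode_fqan_list(sample):
        print(entry)
```

By the convention noted for X509UserProxyFirstFQAN, the first FQAN in the resulting list would be the primary role.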
The following job ClassAd attributes are relevant only for
vm universe jobs.
-
- VM_MACAddr:
- The MAC address of the virtual
machine's network interface,
in the standard format of six groups of
two hexadecimal digits separated by colons.
This attribute is currently limited to apply only to Xen virtual machines.
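The format stated for VM_MACAddr (six groups of two hexadecimal digits separated by colons) can be checked with a short pattern. This is only a sketch of the format as worded above. The helper name is_standard_mac and the sample addresses are hypothetical, and whether HTCondor accepts both upper and lower case digits is an assumption.

```python
import re

# Six groups of two hexadecimal digits, separated by colons.
# Assumption: both upper and lower case hexadecimal digits are allowed.
_MAC_PATTERN = re.compile(r"[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}")


def is_standard_mac(value):
    """Return True if value matches the six-group colon-separated form."""
    return _MAC_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("00:16:3e:5a:01:ff", "00:16:3e:5a:01", "00-16-3e-5a-01-ff"):
        print(candidate, is_standard_mac(candidate))
```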
The following job ClassAd attributes appear in the job ClassAd
only for the condor_dagman
job submitted under DAGMan.
They represent status information for the DAG.
-
- DAG_InRecovery:
- The value 1 if the DAG is in recovery mode, and
the value 0 otherwise.
- DAG_NodesDone:
- The number of DAG nodes that have finished successfully.
This means that the entire node has finished,
not only an actual HTCondor job or jobs.
- DAG_NodesFailed:
- The number of DAG nodes that have failed.
This value includes all retries, if there are any.
- DAG_NodesPostrun:
- The number of DAG nodes for which a POST script is running
or has been deferred because of a POST script throttle setting.
- DAG_NodesPrerun:
- The number of DAG nodes for which a PRE script is running
or has been deferred because of a PRE script throttle setting.
- DAG_NodesQueued:
- The number of DAG nodes for which the actual HTCondor job or jobs
are queued.
The queued jobs may be in any state.
- DAG_NodesReady:
- The number of DAG nodes that are ready to run,
but which have not yet started running.
- DAG_NodesTotal:
- The total number of nodes in the DAG, including the FINAL node, if there
is a FINAL node.
- DAG_NodesUnready:
- The number of DAG nodes that are not ready to run.
A node is not ready when one or more of its parent nodes has not yet finished.
- DAG_Status:
- The overall status of the DAG, with the same values as the _DAG_STATUS
macro.
Value | Status |
0 | OK |
1 | error; an error condition different than those listed here |
2 | one or more nodes in the DAG have failed |
3 | the DAG has been aborted by an ABORT-DAG-ON specification |
4 | removed; the DAG has been removed by condor_rm |
5 | a cycle was found in the DAG |
6 | the DAG has been suspended (see section 2.10.6) |
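A script that inspects the condor_dagman job ClassAd may find it convenient to translate the integer DAG_Status value into the descriptions tabulated above. The following Python sketch shows one way to do so. The class name DagStatus, its member names, and the function describe_dag_status are hypothetical; only the integer values and their meanings come from the table.

```python
from enum import IntEnum


class DagStatus(IntEnum):
    """DAG_Status values; the member names are illustrative only."""
    OK = 0
    ERROR = 1
    NODE_FAILED = 2
    ABORTED = 3
    REMOVED = 4
    CYCLE = 5
    SUSPENDED = 6


_DESCRIPTIONS = {
    DagStatus.OK: "OK",
    DagStatus.ERROR: "error; an error condition different than those listed here",
    DagStatus.NODE_FAILED: "one or more nodes in the DAG have failed",
    DagStatus.ABORTED: "the DAG has been aborted by an ABORT-DAG-ON specification",
    DagStatus.REMOVED: "removed; the DAG has been removed by condor_rm",
    DagStatus.CYCLE: "a cycle was found in the DAG",
    DagStatus.SUSPENDED: "the DAG has been suspended",
}


def describe_dag_status(value):
    """Return the description for a DAG_Status value, or a fallback."""
    try:
        return _DESCRIPTIONS[DagStatus(value)]
    except ValueError:
        # Values outside the table are reported rather than guessed at.
        return "unknown DAG_Status value: %r" % (value,)


if __name__ == "__main__":
    for status in (0, 2, 6, 7):
        print(status, "->", describe_dag_status(status))
```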
Machine ClassAd Attributes
-
- Activity:
- String which describes HTCondor job activity on the machine.
Can have one of the following values:
- "Idle":
- There is no job activity
- "Busy":
- A job is busy running
- "Suspended":
- A job is currently suspended
- "Vacating":
- A job is currently checkpointing
- "Killing":
- A job is currently being killed
- "Benchmarking":
- The startd is running benchmarks
- "Retiring":
- Waiting for a job to finish or for the maximum retirement time to expire
- Arch:
- String with the architecture of the machine.
Currently supported architectures have the following string
definitions:
- "INTEL":
- Intel x86 CPU (Pentium, Xeon, etc).
- "X86_64":
- AMD/Intel 64-bit X86
These strings show definitions for architectures no longer supported:
- "IA64":
- Intel Itanium
- "SUN4u":
- Sun UltraSparc CPU
- "SUN4x":
- A Sun Sparc CPU other than an UltraSparc, i.e.
sun4m or sun4c CPU found in older Sparc workstations such as the Sparc 10,
Sparc 20, IPC, IPX, etc.
- "PPC":
- 32-bit PowerPC
- "PPC64":
- 64-bit PowerPC
- CanHibernate:
- The condor_startd has the capability to
shut down or hibernate a machine when certain configurable criteria are met.
However, before the condor_startd can shut down a machine,
the hardware itself must support hibernation, as must the operating system.
When the condor_startd initializes,
it checks for this support.
If the machine has the ability to hibernate,
then this boolean ClassAd attribute will be True.
By default, it is False.
- CheckpointPlatform:
- A string which opaquely encodes various
aspects about a machine's operating system, hardware, and kernel
attributes.
It is used to identify systems where previously taken checkpoints for
the standard universe may resume.
- ClockDay:
- The day of the week,
where 0 = Sunday, 1 = Monday, ... , and 6 = Saturday.
- ClockMin:
- The number of minutes passed since midnight.
- CondorLoadAvg:
- The portion of the load average generated
by HTCondor, either from remote jobs or running benchmarks.
- ConsoleIdle:
- The number of seconds since activity on the system
console keyboard or console mouse has last been detected.
- Cpus:
- The number of CPUs in this slot.
It is 1 for a single CPU slot, 2 for a dual CPU slot, etc.
- CurrentRank:
- A float which represents this machine
owner's affinity
for running the HTCondor job which it is currently hosting. If not
currently hosting an HTCondor job, CurrentRank is 0.0.
When a machine is claimed,
the attribute's value is computed by evaluating the machine's
Rank expression with respect to the current job's ClassAd.
- Disk:
- The amount of disk space on this machine available for
the job in Kbytes (e.g. 23000 = 23 megabytes). Specifically, this
is the amount of disk space available in the directory specified in
the HTCondor configuration files by the EXECUTE macro, minus any
space reserved with the RESERVED_DISK macro.
- Draining:
- This attribute is True when the slot
is draining and undefined if not.
- DrainingRequestId:
- This attribute contains a string that
is the request id of the draining request that put this slot in a draining
state. It is undefined if the slot is not draining.
- DotNetVersions:
- The .NET framework versions
currently installed on this computer.
Default format is a comma delimited list.
Current definitions:
- "1.1":
- for .Net Framework 1.1
- "2.0":
- for .Net Framework 2.0
- "3.0":
- for .Net Framework 3.0
- "3.5":
- for .Net Framework 3.5
- "4.0Client":
- for .Net Framework 4.0 Client install
- "4.0Full":
- for .Net Framework 4.0 Full install
- DynamicSlot:
- For SMP machines that allow dynamic
partitioning of a slot,
this boolean value identifies that this dynamic slot may be partitioned.
- EnteredCurrentActivity:
- Time at which the machine
entered the current Activity (see Activity entry above). On
all platforms (including NT), this is measured in the number of
integer seconds since the Unix epoch (00:00:00 UTC, Jan 1, 1970).
- ExpectedMachineGracefulDrainingBadput:
- The
job runtime in cpu-seconds that would be lost if graceful draining
were initiated at the time this ad was published. This calculation assumes
that jobs will run for the full retirement time and then be evicted
without saving a checkpoint.
- ExpectedMachineGracefulDrainingCompletion:
- Time at
which graceful draining of the machine could complete if it were
initiated at the time this ad was published. This is measured in the
number of integer seconds since the Unix epoch (00:00:00 UTC, Jan 1,
1970). This value is computed with the assumption that the machine
policy will not suspend jobs during draining while the machine is
waiting for the job to use up its retirement time. If suspension
happens, the upper bound on how long draining could take is
unlimited. To avoid suspension during draining, the SUSPEND
and CONTINUE expressions could be configured to pay
attention to the Draining attribute.
- ExpectedMachineQuickDrainingBadput:
- The
job runtime in cpu-seconds that would be lost if quick draining
were initiated at the time this ad was published. This calculation assumes
that all evicted jobs will not save a checkpoint.
- ExpectedMachineQuickDrainingCompletion:
- Time at
which quick draining of the machine could complete if it were
initiated at the time this ad was published. This is measured in the
number of integer seconds since the Unix epoch (00:00:00 UTC, Jan 1,
1970).
- FileSystemDomain:
- A domain name configured by the
HTCondor administrator which describes a cluster of machines which all
access the same, uniformly-mounted, networked file systems usually via
NFS or AFS. This is useful for Vanilla universe jobs which require
remote file access.
- Has_sse4_1:
- A boolean value set to True
if the machine being advertised supports
the SSE 4.1 instructions, and Undefined otherwise.
- Has_sse4_2:
- A boolean value set to True
if the machine being advertised supports
the SSE 4.2 instructions, and Undefined otherwise.
- has_ssse3:
- A boolean value set to True
if the machine being advertised supports
the SSSE 3 instructions, and Undefined otherwise.
- HasVM:
- A boolean value added to the machine ClassAd
when the configuration triggers the detection of virtual machine
software.
- IsWakeAble:
- A boolean value that when True identifies
that the machine has the capability to be woken into a
fully powered and running state by receiving a Wake On LAN (WOL) packet.
This ability is a function of the operating system,
the network adapter in the machine
(notably, wireless network adapters usually do not have this function),
and BIOS settings.
When the condor_startd initializes,
it tries to detect if the operating system and network adapter both support
waking from hibernation by receipt of a WOL packet.
The default value is False.
- IsWakeEnabled:
- If the hardware and software have the capacity
to be woken into a fully powered and running state by receiving
a Wake On LAN (WOL) packet,
this feature can still be disabled via the BIOS or software.
If BIOS or the operating system have disabled this feature,
the condor_startd sets this boolean attribute to False.
- JobVM_VCPUS:
- An attribute defined if a vm universe job
is running on this slot. Defined by the number of virtualized CPUs
in the virtual machine.
- KeyboardIdle:
- The number of seconds since activity on any
keyboard or mouse associated with this machine has last been detected.
Unlike ConsoleIdle, KeyboardIdle also takes activity
on pseudo-terminals into
account (i.e. virtual "keyboard" activity from telnet and rlogin
sessions as well). Note that KeyboardIdle will always be equal to or
less than ConsoleIdle.
- KFlops:
- Relative floating point performance as determined via a
Linpack benchmark.
- LastDrainStartTime:
- Time when draining of this
condor_startd was last initiated (e.g. due to condor_defrag or
condor_drain).
- LastHeardFrom:
- Time when the HTCondor central manager last
received a status update from this machine.
Expressed as
the number of integer seconds since the Unix epoch (00:00:00 UTC, Jan 1, 1970).
Note: This attribute is only inserted by the central manager once it
receives the ClassAd.
It is not present in the condor_startd copy of the ClassAd.
Therefore, you could not use this attribute in defining condor_startd
expressions (and you would not want to).
- LoadAvg:
- A floating point number with the machine's current load
average.
- Machine:
- A string with the machine's fully qualified host name.
- MachineMaxVacateTime:
- An integer expression that specifies
the time in seconds the machine will allow the job to gracefully shut
down.
- Memory:
- The amount of RAM in megabytes.
- Mips:
- Relative integer performance as determined via a Dhrystone
benchmark.
- MonitorSelfAge:
- The number of seconds that this daemon
has been running.
- MonitorSelfCPUUsage:
- The fraction of recent CPU time utilized
by this daemon.
- MonitorSelfImageSize:
- The amount of virtual memory consumed by
this daemon in Kbytes.
- MonitorSelfRegisteredSocketCount:
- The current number of sockets
registered by this daemon.
- MonitorSelfResidentSetSize:
- The amount of resident memory
used by this daemon in Kbytes.
- MonitorSelfSecuritySessions:
- The number of open (cached)
security sessions for this daemon.
- MonitorSelfTime:
- The time, represented as the number of
seconds elapsed since the Unix epoch (00:00:00 UTC, Jan 1, 1970),
at which this daemon last checked and set the attributes with names that
begin with the string MonitorSelf.
- MyAddress:
- String with the IP and port address of the
condor_startd daemon which is publishing this machine ClassAd.
When using CCB, condor_shared_port, and/or an additional private
network interface, that information will be included here as well.
- MyType:
- The ClassAd type; always set to the literal string "Machine".
- Name:
- The name of this resource; typically the same value as
the Machine attribute, but could be customized by the site
administrator.
On SMP machines, the condor_startd will divide the CPUs up into separate
slots, each with a unique name.
These names will be of the form "slot#@full.hostname", for example,
"slot1@vulture.cs.wisc.edu", which signifies slot number 1 from
vulture.cs.wisc.edu.
- OpSys:
- String describing the operating system running on this
machine.
Currently supported operating systems have the following string
definitions:
- "LINUX":
- for LINUX 2.0.x, LINUX 2.2.x,
LINUX 2.4.x, or LINUX 2.6.x kernel systems, as well as Scientific Linux
- "OSX":
- for Darwin
- "FREEBSD7":
- for FreeBSD 7
- "FREEBSD8":
- for FreeBSD 8
- "WINDOWS":
- for all versions of Windows
- "SOLARIS5.10":
- for Solaris 2.10 or 5.10
- "SOLARIS5.11":
- for Solaris 2.11 or 5.11
These strings show definitions for operating systems no longer supported:
- "SOLARIS28":
- for Solaris 2.8 or 5.8
- "SOLARIS29":
- for Solaris 2.9 or 5.9
- OpSysAndVer:
- A string indicating an operating system and
a version number.
For Linux operating systems, it is the value of the OpSysName attribute
concatenated with the string version of the OpSysMajorVersion attribute:
- "RedHat5":
- for RedHat Linux version 5
- "RedHat6":
- for RedHat Linux version 6
- "Fedora16":
- for Fedora Linux version 16
- "Debian5":
- for Debian Linux version 5
- "Debian6":
- for Debian Linux version 6
- "SL5":
- for Scientific Linux version 5
- "SL6":
- for Scientific Linux version 6
- "SLFermi5":
- for Fermi's Scientific Linux version 5
- "SLFermi6":
- for Fermi's Scientific Linux version 6
- "SLCern5":
- for CERN's Scientific Linux version 5
- "SLCern6":
- for CERN's Scientific Linux version 6
For MacOS operating systems, it is the value of the OpSysShortName
attribute concatenated with the string version of the OpSysVer attribute:
- "MacOSX605":
- for MacOS version 10.6.5 (Snow Leopard)
- "MacOSX703":
- for MacOS version 10.7.3 (Lion)
For BSD operating systems, it is the value of the OpSysName attribute
concatenated with the string version of the OpSysMajorVersion attribute:
- "FREEBSD7":
- for FreeBSD version 7
- "FREEBSD8":
- for FreeBSD version 8
For Solaris Unix operating systems,
it is the same value as the OpSys attribute:
- "SOLARIS5.10":
- for Solaris 2.10 or 5.10
- "SOLARIS5.11":
- for Solaris 2.11 or 5.11
For Windows operating systems, it is the value of the OpSys attribute
concatenated with the string version of the OpSysMajorVersion attribute:
- "WINDOWS500":
- for Windows 2000
- "WINDOWS501":
- for Windows XP
- "WINDOWS502":
- for Windows Server 2003
- "WINDOWS600":
- for Windows Vista
- "WINDOWS601":
- for Windows 7
- OpSysLegacy:
- A string that holds the long-standing values for the OpSys attribute.
Currently supported operating systems have the following string
definitions:
- "LINUX":
- for LINUX 2.0.x, LINUX 2.2.x, LINUX 2.4.x, or LINUX 2.6.x kernel systems, as well as Scientific Linux versions
- "OSX":
- for Darwin
- "FREEBSD7":
- for FreeBSD version 7
- "FREEBSD8":
- for FreeBSD version 8
- "SOLARIS5.10":
- for Solaris 2.10 or 5.10
- "SOLARIS5.11":
- for Solaris 2.11 or 5.11
- "WINDOWS":
- for all versions of Windows
- OpSysLongName:
- A string giving a full description of
the operating system.
For Linux platforms, this is generally the string taken from /etc/issue,
with extra characters stripped off Debian versions.
- "Red Hat Enterprise Linux Server release 5.7 (Tikanga)":
- for RedHat Linux version 5
- "Red Hat Enterprise Linux Server release 6.2 (Santiago)":
- for RedHat Linux version 6
- "Fedora release 16 (Verne)":
- for Fedora Linux version 16
- "MacOSX 6.5":
- for MacOS version 10.6.5 (Snow Leopard)
- "MacOSX 7.3":
- for MacOS version 10.7.3 (Lion)
- "FreeBSD8.2-RELEASE-p3":
- for FreeBSD version 8
- "SOLARIS5.10":
- for Solaris 2.10 or 5.10
- "SOLARIS5.11":
- for Solaris 2.11 or 5.11
- "Windows XP SP3":
- for Windows XP
- "Windows 7 SP2":
- for Windows 7
- OpSysMajorVersion:
- An integer value representing the major version of the operating system.
- 5:
- for RedHat Linux version 5
and derived platforms such as Scientific Linux
- 6:
- for RedHat Linux version 6
and derived platforms such as Scientific Linux
- 16:
- for Fedora Linux version 16
- 6:
- for MacOS version 10.6.5 (Snow Leopard)
- 7:
- for MacOS version 10.7.3 (Lion)
- 7:
- for FreeBSD version 7
- 8:
- for FreeBSD version 8
- 5:
- for Solaris 2.10, 5.10, 2.11, or 5.11
- 501:
- for Windows XP
- 600:
- for Windows Vista
- 601:
- for Windows 7
- OpSysName:
- A string containing a terse description of the operating system.
- "RedHat":
- for RedHat Linux version 6
- "Fedora":
- for Fedora Linux version 16
- "SnowLeopard":
- for MacOS version 10.6.5 (Snow Leopard)
- "Lion":
- for MacOS version 10.7.3 (Lion)
- "FREEBSD":
- for FreeBSD version 7 or 8
- "SOLARIS5.10":
- for Solaris 2.10 or 5.10
- "SOLARIS5.11":
- for Solaris 2.11 or 5.11
- "WindowsXP":
- for Windows XP
- "WindowsVista":
- for Windows Vista
- "Windows7":
- for Windows 7
- "SL":
- for Scientific Linux
- "SLFermi":
- for Fermi's Scientific Linux
- "SLCern":
- for CERN's Scientific Linux
- OpSysShortName:
- A string containing a short name for
the operating system.
- "RedHat":
- for RedHat Linux version 5 or 6
- "Fedora":
- for Fedora Linux version 16
- "Debian":
- for Debian Linux version 5 or 6
- "MacOSX":
- for MacOS version 10.6.5 (Snow Leopard) or
for MacOS version 10.7.3 (Lion)
- "FreeBSD":
- for FreeBSD version 7 or 8
- "SOLARIS5.10":
- for Solaris 2.10 or 5.10
- "SOLARIS5.11":
- for Solaris 2.11 or 5.11
- "XP":
- for Windows XP
- "Vista":
- for Windows Vista
- "7":
- for Windows 7
- "SL":
- for Scientific Linux
- "SLFermi":
- for Fermi's Scientific Linux
- "SLCern":
- for CERN's Scientific Linux
- OpSysVer:
- An integer value representing the operating system
version number.
- 602:
- for RedHat Linux version 6.2
- 1600:
- for Fedora Linux version 16.0
- 704:
- for FreeBSD version 7.4
- 802:
- for FreeBSD version 8.2
- 605:
- for MacOS version 10.6.5 (Snow Leopard)
- 703:
- for MacOS version 10.7.3 (Lion)
- 500:
- for Windows 2000
- 501:
- for Windows XP
- 502:
- for Windows Server 2003
- 600:
- for Windows Vista or Windows Server 2008
- 601:
- for Windows 7 or Windows Server 2008 R2
- Requirements:
- A boolean, which when evaluated within the context
of the machine ClassAd and a job ClassAd, must evaluate to
TRUE before HTCondor will allow the job to use this machine.
- MaxJobRetirementTime:
- When the condor_startd wants
to kick the job off, a job which has run for less than this number
of seconds will not be hard-killed. The condor_startd will wait
for the job to finish or to exceed this amount of time, whichever
comes sooner. If the job vacating policy grants the job X seconds
of vacating time, a preempted job will be soft-killed X seconds
before the end of its retirement time, so that hard-killing of the
job will not happen until the end of the retirement time if the job
does not finish shutting down before then. This is an expression
evaluated in the context of the job ClassAd, so it may refer to job
attributes as well as machine attributes.
- RetirementTimeRemaining:
- An integer number of seconds
after MyCurrentTime when the running job can be evicted.
MaxJobRetirementTime is the expression of how much retirement
time the machine offers to new jobs, whereas RetirementTimeRemaining
is the negotiated amount of time remaining for the current running
job. This may be less than the amount offered by the machine's
MaxJobRetirementTime expression, because the job may
ask for less.
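The timing described for MaxJobRetirementTime can be sketched in a few lines of Python. This is an illustration of the prose only, not the condor_startd implementation. The function name and its parameters are hypothetical. The case where less than the vacating time remains is not covered by the text; the sketch assumes the soft kill is then sent immediately and the full vacating time is still granted.

```python
def eviction_schedule(job_runtime, max_job_retirement_time, vacate_time):
    """Return (soft_kill_in, hard_kill_in): seconds from now until each event.

    All arguments are in seconds. Hypothetical helper that mirrors the
    description of MaxJobRetirementTime; it is not HTCondor code.
    """
    # Retirement time still owed to the job; never negative.
    retirement_remaining = max(0, max_job_retirement_time - job_runtime)
    # The soft kill is sent vacate_time seconds before retirement ends,
    # but it cannot be sent earlier than now.
    soft_kill_in = max(0, retirement_remaining - vacate_time)
    # ASSUMPTION: the job always receives its full vacating time after the
    # soft kill, so the hard kill follows vacate_time seconds later.
    hard_kill_in = soft_kill_in + vacate_time
    return soft_kill_in, hard_kill_in
```

For a job that has run 600 seconds with a retirement time of 3600 seconds and 300 seconds of vacating time, the sketch places the soft kill 2700 seconds from now and the hard kill 3000 seconds from now, which is the end of the retirement time.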
- PartitionableSlot:
- For SMP machines,
a boolean value identifying that this slot may be partitioned.
- SlotID:
- For SMP machines, the integer
that identifies the slot.
The value will be X for the slot with
name="slotX@full.hostname".
For non-SMP machines with one slot, the value will be 1.
NOTE: This attribute was added in HTCondor version 6.9.3.
For older versions of HTCondor, see VirtualMachineID below.
- SlotWeight:
- This specifies the weight of the slot when
calculating usage, computing fair shares, and enforcing group
quotas. For example, claiming a slot with SlotWeight = 2 is
equivalent to claiming two SlotWeight = 1 slots.
See also the description of the SLOT_WEIGHT configuration variable.
- StartdIpAddr:
- String with the IP and port address of the
condor_startd daemon which is publishing this machine ClassAd.
When using CCB, condor_shared_port, and/or an additional private
network interface, that information will be included here as well.
- State:
- String which publishes the machine's HTCondor state.
Can be:
- "Owner":
- The machine owner is using the machine, and
it is unavailable to HTCondor.
- "Unclaimed":
- The machine is available to run HTCondor jobs,
but a good match is either not available or not
yet found.
- "Matched":
- The HTCondor central manager has found a good
match for this resource, but an HTCondor scheduler has not yet claimed it.
- "Claimed":
- The machine is claimed by a remote
condor_schedd and is probably running a job.
- "Preempting":
- An HTCondor job is being preempted (possibly
via checkpointing) in order to clear the machine for either a higher
priority job or because the machine owner wants the machine back.
- "Drained":
- This slot is not accepting jobs,
because the machine is being drained.
- TargetType:
- Describes what type of ClassAd to match with.
Always set to the string literal "Job", because machine ClassAds
always want to be matched with jobs, and vice-versa.
- TotalCpus:
- The number of CPUs that are on the machine.
This is in contrast with Cpus,
which is the number of CPUs in the slot.
- TotalMachineDrainingBadput:
- The
total job runtime in cpu-seconds that has been lost due to job evictions
caused by draining since this condor_startd began executing. In
this calculation, it is assumed that jobs are evicted without
checkpointing.
- TotalMachineDrainingUnclaimedTime:
- The
total machine-wide time in cpu-seconds that has not been used
(i.e. not matched to a job submitter) due to draining since this
condor_startd began executing.
- TotalTimeBackfillBusy:
- The number of seconds
that this machine (slot) has accumulated within the
backfill busy state and activity pair since the condor_startd
began executing.
This attribute will only be defined if it has a value greater than 0.
- TotalTimeBackfillIdle:
- The number of seconds
that this machine (slot) has accumulated within the
backfill idle state and activity pair since the condor_startd
began executing.
This attribute will only be defined if it has a value greater than 0.
- TotalTimeBackfillKilling:
- The number of seconds
that this machine (slot) has accumulated within the
backfill killing state and activity pair since the condor_startd
began executing.
This attribute will only be defined if it has a value greater than 0.
- TotalTimeClaimedBusy:
- The number of seconds
that this machine (slot) has accumulated within the
claimed busy state and activity pair since the condor_startd
began executing.
This attribute will only be defined if it has a value greater than 0.
- TotalTimeClaimedIdle:
- The number of seconds
that this machine (slot) has accumulated within the
claimed idle state and activity pair since the condor_startd
began executing.
This attribute will only be defined if it has a value greater than 0.
- TotalTimeClaimedRetiring:
- The number of seconds
that this machine (slot) has accumulated within the
claimed retiring state and activity pair since the condor_startd
began executing.
This attribute will only be defined if it has a value greater than 0.
- TotalTimeClaimedSuspended:
- The number of seconds
that this machine (slot) has accumulated within the
claimed suspended state and activity pair since the condor_startd
began executing.
This attribute will only be defined if it has a value greater than 0.
- TotalTimeMatchedIdle:
- The number of seconds
that this machine (slot) has accumulated within the
matched idle state and activity pair since the condor_startd
began executing.
This attribute will only be defined if it has a value greater than 0.
- TotalTimeOwnerIdle:
- The number of seconds
that this machine (slot) has accumulated within the
owner idle state and activity pair since the condor_startd
began executing.
This attribute will only be defined if it has a value greater than 0.
- TotalTimePreemptingKilling:
- The number of seconds
that this machine (slot) has accumulated within the
preempting killing state and activity pair since the condor_startd
began executing.
This attribute will only be defined if it has a value greater than 0.
- TotalTimePreemptingVacating:
- The number of seconds
that this machine (slot) has accumulated within the
preempting vacating state and activity pair since the condor_startd
began executing.
This attribute will only be defined if it has a value greater than 0.
- TotalTimeUnclaimedBenchmarking:
- The number of seconds
that this machine (slot) has accumulated within the
unclaimed benchmarking state and activity pair since the condor_startd
began executing.
This attribute will only be defined if it has a value greater than 0.
- TotalTimeUnclaimedIdle:
- The number of seconds
that this machine (slot) has accumulated within the
unclaimed idle state and activity pair since the condor_startd
began executing.
This attribute will only be defined if it has a value greater than 0.
- UidDomain:
- A domain name configured by the HTCondor
administrator which describes a cluster of machines which all have
the same passwd file entries, and therefore all have the same logins.
- VirtualMachineID:
- Starting with HTCondor version 6.9.3, this attribute is no longer used.
Instead, use SlotID, as described above.
This will only be present if ALLOW_VM_CRUFT is TRUE.
- VirtualMemory:
- The amount of currently available virtual memory
(swap space) expressed in Kbytes.
On Linux platforms, it is the sum of paging space and physical memory,
which more accurately represents the virtual memory size of the machine.
- VM_AvailNum:
- The maximum number of vm universe jobs that
can be started on this machine. This maximum is set by the configuration
variable VM_MAX_NUMBER.
- VM_Guest_Mem:
- An attribute defined if a vm universe job
is running on this slot. Defined by the amount of memory in use by the
virtual machine, given in Mbytes.
- VM_Memory:
- Gives the amount of memory available for starting
additional VM jobs on this machine, given in Mbytes.
The maximum value is set by the configuration variable VM_MEMORY.
- VM_Networking:
- A boolean value indicating whether networking
is allowed for virtual machines on this machine.
- VM_Type:
- The type of virtual machine software that can run
on this machine. The value is set by the configuration variable
VM_TYPE.
- WindowsBuildNumber:
- An integer, extracted from the
platform type, representing a build number
for a Windows operating system.
This attribute only exists on Windows machines.
- WindowsMajorVersion:
- An integer, extracted from the
platform type, representing a major version number (currently 5 or 6)
for a Windows operating system.
This attribute only exists on Windows machines.
- WindowsMinorVersion:
- An integer, extracted from the
platform type, representing a minor version number (currently 0, 1, or 2)
for a Windows operating system.
This attribute only exists on Windows machines.
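The Windows values listed earlier for OpSysVer (500, 501, 502, 600, 601) look like the major version multiplied by 100 plus the minor version. That relationship is inferred from the listed examples and is not stated in this appendix, so the following Python sketch is an assumption. The function name is hypothetical.

```python
def opsys_ver_from_windows(major, minor):
    """Combine WindowsMajorVersion and WindowsMinorVersion into one integer.

    ASSUMPTION: OpSysVer on Windows equals major * 100 + minor. This is
    inferred from the example values in this appendix, not documented.
    """
    return major * 100 + minor


# Example values taken from the OpSysVer list in this appendix.
WINDOWS_RELEASES = {
    500: "Windows 2000",
    501: "Windows XP",
    502: "Windows Server 2003",
    600: "Windows Vista",
    601: "Windows 7",
}
```

For example, a machine advertising WindowsMajorVersion = 5 and WindowsMinorVersion = 1 would map to 501, the value listed for Windows XP.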
In addition, there are a few attributes that are automatically
inserted into the machine ClassAd whenever a resource is in the
Claimed state:
- ClientMachine:
- The host name of the machine that has
claimed this resource
- RemoteAutoregroup:
- A boolean attribute which is True
if this resource was claimed via negotiation
when the configuration variable GROUP_AUTOREGROUP is True.
It is False otherwise.
- RemoteGroup:
- The accounting group name corresponding to
the submitter that claimed this resource.
- RemoteNegotiatingGroup:
- The accounting group name under
which this resource negotiated when it was claimed. This attribute will
frequently be the same as attribute RemoteGroup,
but it may differ in cases such
as when configuration variable GROUP_AUTOREGROUP is True,
in which case it will have the name of the root group,
identified as <none>.
- RemoteOwner:
- The name of the user who originally
claimed this resource.
- RemoteUser:
- The name of the user who is currently
using this resource.
In general, this will be the same as the RemoteOwner,
but in some cases, a resource can be claimed by one entity that hands
off the resource to another entity which uses it.
In that case, RemoteUser would hold the name of the entity
currently using the resource, while RemoteOwner would hold
the name of the entity that claimed the resource.
- PreemptingOwner:
- The name of the user who is preempting
the job that is currently running on this resource.
- PreemptingUser:
- The name of the user who is preempting
the job that is currently running on this resource. The relationship
between PreemptingUser and PreemptingOwner is the same
as the relationship between RemoteUser and RemoteOwner.
- PreemptingRank:
- A float which represents this machine
owner's affinity for running the HTCondor job which is waiting for the
current job to finish or be preempted. If not currently hosting an
HTCondor job, PreemptingRank is undefined. When a machine is
claimed and there is already a job running, the attribute's value is
computed by evaluating the machine's Rank expression with
respect to the preempting job's ClassAd.
- TotalClaimRunTime:
- A running total of the amount of
time (in seconds) that all jobs under the same claim have spent
running (in the Claimed/Busy state).
- TotalClaimSuspendTime:
- A running total of the amount of
time (in seconds) that all jobs (under the same claim) have been
suspended (in the Claimed/Suspended state).
- TotalJobRunTime:
- A running total of the amount of
time (in seconds) that a single job has spent running
(in the Claimed/Busy state).
- TotalJobSuspendTime:
- A running total of the amount of
time (in seconds) that a single job has been suspended
(in the Claimed/Suspended state).
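As a small illustration of how the claim totals above can be combined, the following Python sketch computes the fraction of a claim's time spent suspended. It assumes the machine ClassAd has already been loaded into a plain dictionary keyed by attribute name; that representation and the function name are hypothetical.

```python
def claim_suspended_fraction(ad):
    """Return the share of claim time spent suspended, or None if unknown.

    `ad` is assumed to be a dict holding machine ClassAd attributes.
    """
    run = ad.get("TotalClaimRunTime", 0)
    suspended = ad.get("TotalClaimSuspendTime", 0)
    total = run + suspended
    if total == 0:
        # Resource is not claimed, or no time has been accumulated yet.
        return None
    return suspended / total
```

The same calculation applies to TotalJobRunTime and TotalJobSuspendTime for a single job.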
There are a few attributes that are only inserted into the
machine ClassAd if a job is currently executing.
If the resource is claimed but no jobs are running, none of these
attributes will be defined.
- JobId:
- The job's identifier (for example, 152.3), as seen from condor_q
on the submitting machine.
- JobStart:
- The time stamp in integer seconds of when the job began
executing, since the Unix epoch (00:00:00 UTC, Jan 1, 1970). For idle
machines, the value is UNDEFINED.
- LastPeriodicCheckpoint:
- If the job has performed a
periodic checkpoint, this attribute will be defined and will hold the
time stamp of when the last periodic checkpoint was begun.
If the job has yet to perform a periodic checkpoint, or cannot
checkpoint at all, the LastPeriodicCheckpoint attribute will
not be defined.
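Time stamps such as JobStart are integer seconds since the Unix epoch. The following Python sketch converts one to a UTC date and computes how long the job has been running. It assumes the ClassAd is available as a plain dictionary and that an UNDEFINED attribute is simply absent from it; both choices and the function names are hypothetical.

```python
from datetime import datetime, timezone


def job_start_utc(ad):
    """Return JobStart as a timezone-aware UTC datetime, or None if undefined."""
    job_start = ad.get("JobStart")
    if job_start is None:
        # Idle machine: JobStart is UNDEFINED.
        return None
    return datetime.fromtimestamp(job_start, tz=timezone.utc)


def job_elapsed_seconds(ad, now):
    """Return seconds since the job began executing, or None if no job runs.

    `now` is the current time in integer seconds since the Unix epoch.
    """
    job_start = ad.get("JobStart")
    if job_start is None:
        return None
    return now - job_start
```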
There are a few attributes that are applicable to machines that
are offline, that is, hibernating.
- MachineLastMatchTime:
- The Unix epoch time when this offline
ClassAd
would have been matched to a job, if the machine were online.
In addition,
the slot1 ClassAd of a multi-slot machine will have
slot<X>_MachineLastMatchTime defined,
where <X> is replaced by the slot id of each of the slots
with MachineLastMatchTime defined.
- Offline:
- A boolean value, that when True,
indicates this machine is in an offline state in the condor_collector.
Such ClassAds are stored persistently,
such that they will continue to exist after the condor_collector restarts.
- Unhibernate:
- A boolean expression that specifies when
a hibernating machine should be woken up, for example, by condor_rooster.
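The per-slot attributes named slot<X>_MachineLastMatchTime can be collected with a simple pattern match. The Python sketch below assumes the slot1 ClassAd is available as a plain dictionary; the function name and the sample values in the usage note are hypothetical. Matching is case-insensitive because ClassAd attribute names are not case sensitive.

```python
import re

# Matches names such as slot2_MachineLastMatchTime and captures the slot id.
_SLOT_ATTR = re.compile(r"^slot(\d+)_MachineLastMatchTime$", re.IGNORECASE)


def per_slot_last_match(slot1_ad):
    """Map slot id -> MachineLastMatchTime for the per-slot attributes."""
    result = {}
    for name, value in slot1_ad.items():
        match = _SLOT_ATTR.match(name)
        if match:
            result[int(match.group(1))] = value
    return result
```

Given a dictionary holding slot2_MachineLastMatchTime and slot3_MachineLastMatchTime, the function returns a mapping with keys 2 and 3; the plain MachineLastMatchTime attribute of slot1 itself is not included.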
Finally, the single attribute,
CurrentTime, is defined by the ClassAd
environment.
- CurrentTime:
- Evaluates to the
integer number of seconds since the Unix epoch (00:00:00 UTC, Jan 1, 1970).
DaemonMaster ClassAd Attributes
- CkptServer:
- A string with the fully qualified
host name of the machine running a checkpoint server.
- DaemonStartTime:
- The time that this daemon was
started, represented as the number of seconds elapsed since
the Unix epoch (00:00:00 UTC, Jan 1, 1970).
- Machine:
- A string with the machine's fully qualified
host name.
- MasterIpAddr:
- String with the IP and port address of the
condor_master daemon which is publishing this DaemonMaster ClassAd.
- MonitorSelfAge:
- The number of seconds that this daemon
has been running.
- MonitorSelfCPUUsage:
- The fraction of recent CPU time utilized
by this daemon.
- MonitorSelfImageSize:
- The amount of virtual memory consumed by
this daemon in Kbytes.
- MonitorSelfRegisteredSocketCount:
- The current number of sockets
registered by this daemon.
- MonitorSelfResidentSetSize:
- The amount of resident memory
used by this daemon in Kbytes.
- MonitorSelfSecuritySessions:
- The number of open (cached)
security sessions for this daemon.
- MonitorSelfTime:
- The time, represented as the number of
seconds elapsed since the Unix epoch (00:00:00 UTC, Jan 1, 1970),
at which this daemon last checked and set the attributes with names that
begin with the string MonitorSelf.
- MyAddress:
- Description is not yet written.
- MyCurrentTime:
- The time, represented as the number of
seconds elapsed since the Unix epoch (00:00:00 UTC, Jan 1, 1970),
at which the condor_master daemon last sent a ClassAd update to the
condor_collector.
- Name:
- The name of this resource; typically the same value as
the Machine attribute, but could be customized by the site
administrator.
On SMP machines, the condor_startd will divide the CPUs up into separate
slots, each with a unique name.
These names will be of the form "slot#@full.hostname", for example,
"slot1@vulture.cs.wisc.edu", which signifies slot number 1 from
vulture.cs.wisc.edu.
- PublicNetworkIpAddr:
- Description is not yet written.
- RealUid:
- The UID under which the condor_master is started.
- UpdateSequenceNumber:
- An integer, starting at zero,
and incremented with each ClassAd update sent to the condor_collector.
The condor_collector uses this value to sequence the updates it
receives.
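One use of a sequence number like UpdateSequenceNumber is spotting updates that never arrived. The Python sketch below counts gaps in a list of received sequence numbers. It illustrates the idea only and is not the condor_collector's actual logic; the function name is hypothetical, and treating a lower number as a daemon restart is an assumption.

```python
def count_missed_updates(sequence_numbers):
    """Count sequence numbers skipped between consecutive received updates."""
    missed = 0
    previous = None
    for seq in sequence_numbers:
        if previous is not None and seq > previous + 1:
            # Every number strictly between previous and seq was never seen.
            missed += seq - previous - 1
        # ASSUMPTION: a number at or below the previous one means the daemon
        # restarted and began counting again, so nothing is counted as missed.
        previous = seq
    return missed
```

For the received sequence 0, 1, 2, 5, 6 the sketch reports two missed updates (numbers 3 and 4).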
Scheduler ClassAd Attributes
- CollectorHost:
- The name of the main condor_collector
which this condor_schedd daemon reports to,
as copied from COLLECTOR_HOST.
If a condor_schedd flocks to other condor_collector daemons,
this attribute still represents the "home" condor_collector,
so this value can be used to discover if a condor_schedd
is currently flocking.
- DaemonCoreDutyCycle:
- A Statistics attribute defining
the ratio of the time spent handling
messages and events to the elapsed time for the time period defined by
StatsLifetime of this condor_schedd.
A value near 0.0 indicates an idle daemon,
while a value near 1.0 indicates a daemon running at or above capacity.
- DaemonStartTime:
- The time that this daemon was
started, represented as the number of seconds elapsed since
the Unix epoch (00:00:00 UTC, Jan 1, 1970).
- DetectedCpus:
- The number of detected machine CPUs/cores.
- DetectedMemory:
- The amount of detected machine RAM in MBytes.
- JobQueueBirthdate:
- Description is not yet written.
- JobsAccumBadputTime:
- A Statistics attribute defining
the sum of all of the time that jobs which did not complete successfully
have spent running over the lifetime of this condor_schedd.
- JobsAccumRunningTime:
- A Statistics attribute defining
the sum of all of the time that jobs have spent running
in the time interval defined by attribute StatsLifetime.
- JobsAccumTimeToStart:
- A Statistics attribute defining
the sum of all the time jobs have spent waiting to start
in the time interval defined by attribute StatsLifetime.
- JobsBadputRuntimes:
- A Statistics attribute defining
a histogram count of jobs that did not complete successfully,
as classified by time spent running,
over the lifetime of this condor_schedd.
Counts within the histogram are separated by a comma and a space,
where the time interval classification is defined in the ClassAd attribute
JobsRuntimesHistogramBuckets.
- JobsBadputSizes:
- A Statistics attribute defining
a histogram count of jobs that did not complete successfully,
as classified by image size,
over the lifetime of this condor_schedd.
Counts within the histogram are separated by a comma and a space,
where the size classification is defined in the ClassAd attribute
JobsSizesHistogramBuckets.
- JobsCheckpointed:
- A Statistics attribute defining
the number of times that jobs have exited
with a condor_shadow exit code of JOB_CKPTED
in the time interval defined by attribute StatsLifetime.
- JobsCompleted:
- A Statistics attribute defining
the number of jobs successfully completed
in the time interval defined by attribute StatsLifetime.
- JobsCompletedRuntimes:
- A Statistics attribute defining
a histogram count of jobs that completed successfully
as classified by time spent running,
over the lifetime of this condor_schedd.
Counts within the histogram are separated by a comma and a space,
where the time interval classification is defined in the ClassAd attribute
JobsRuntimesHistogramBuckets.
- JobsCompletedSizes:
- A Statistics attribute defining
a histogram count of jobs that completed successfully
as classified by image size,
over the lifetime of this condor_schedd.
Counts within the histogram are separated by a comma and a space,
where the size classification is defined in the ClassAd attribute
JobsSizesHistogramBuckets.
- JobsCoredumped:
- A Statistics attribute defining
the number of times that jobs have exited
with a condor_shadow exit code of JOB_COREDUMPED
in the time interval defined by attribute StatsLifetime.
- JobsDebugLogError:
- A Statistics attribute defining
the number of times that jobs have exited
with a condor_shadow exit code of DPRINTF_ERROR
in the time interval defined by attribute StatsLifetime.
- JobsExecFailed:
- A Statistics attribute defining
the number of times that jobs have exited
with a condor_shadow exit code of JOB_EXEC_FAILED
in the time interval defined by attribute StatsLifetime.
- JobsExited:
- A Statistics attribute defining
the number of times that jobs have exited
(successfully or not)
in the time interval defined by attribute StatsLifetime.
- JobsExitedAndClaimClosing:
- A Statistics attribute defining
the number of times jobs have exited
with a condor_shadow exit code of JOB_EXITED_AND_CLAIM_CLOSING
in the time interval defined by attribute StatsLifetime.
- JobsExitedNormally:
- A Statistics attribute defining
the number of times that jobs have exited
with a condor_shadow exit code of JOB_EXITED or with an
exit code of JOB_EXITED_AND_CLAIM_CLOSING
in the time interval defined by attribute StatsLifetime.
- JobsExitException:
- A Statistics attribute defining
the number of times that jobs
have exited with a condor_shadow exit code of JOB_EXCEPTION
or with an unknown status
in the time interval defined by attribute StatsLifetime.
- JobsKilled:
- A Statistics attribute defining
the number of times that jobs
have exited with a condor_shadow exit code of JOB_KILLED
in the time interval defined by attribute StatsLifetime.
- JobsMissedDeferralTime:
- A Statistics attribute defining
the number of times that jobs
have exited with a condor_shadow exit code of
JOB_MISSED_DEFERRAL_TIME
in the time interval defined by attribute StatsLifetime.
- JobsNotStarted:
- A Statistics attribute defining
the number of times that jobs
have exited with a condor_shadow exit code of
JOB_NOT_STARTED
in the time interval defined by attribute StatsLifetime.
- JobsRunningRuntimes:
- A Statistics attribute defining
a histogram count of jobs currently running,
as classified by elapsed runtime.
Counts within the histogram are separated by a comma and a space,
where the time interval classification is defined in the ClassAd attribute
JobsRuntimesHistogramBuckets.
- JobsRunningSizes:
- A Statistics attribute defining
a histogram count of jobs currently running,
as classified by image size.
Counts within the histogram are separated by a comma and a space,
where the size classification is defined in the ClassAd attribute
JobsSizesHistogramBuckets.
- JobsRuntimesHistogramBuckets:
- A Statistics attribute defining
the predefined bucket boundaries for histogram statistics that
classify run times.
Defined as
JobsRuntimesHistogramBuckets = "30Sec, 1Min, 3Min, 10Min, 30Min, 1Hr, 3Hr,
6Hr, 12Hr, 1Day, 2Day, 4Day, 8Day, 16Day"
- JobsShadowNoMemory:
- A Statistics attribute defining
the number of times that jobs have exited
because there was not enough memory to start the condor_shadow
in the time interval defined by attribute StatsLifetime.
- JobsShouldHold:
- A Statistics attribute defining
the number of times that jobs
have exited with a condor_shadow exit code of JOB_SHOULD_HOLD
in the time interval defined by attribute StatsLifetime.
- JobsShouldRemove:
- A Statistics attribute defining
the number of times that jobs
have exited with a condor_shadow exit code of JOB_SHOULD_REMOVE
in the time interval defined by attribute StatsLifetime.
- JobsShouldRequeue:
- A Statistics attribute defining
the number of times that jobs
have exited with a condor_shadow exit code of JOB_SHOULD_REQUEUE
in the time interval defined by attribute StatsLifetime.
- JobsSizesHistogramBuckets:
- A Statistics attribute defining
the predefined bucket boundaries for histogram statistics that
classify image sizes.
Defined as
JobsSizesHistogramBuckets = "64Kb, 256Kb, 1Mb, 4Mb, 16Mb, 64Mb, 256Mb,
1Gb, 4Gb, 16Gb, 64Gb, 256Gb"
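The histogram attributes hold their counts as a string separated by a comma and a space, and the bucket boundaries are published in the same form. The Python sketch below pairs the two. It assumes that the i-th count belongs to the bucket ending at the i-th boundary and that any additional count is an overflow bucket beyond the last boundary; this appendix does not spell out that pairing. The function name and the sample count string in the usage note are hypothetical.

```python
def label_histogram(counts, buckets):
    """Pair each histogram count with a label built from the bucket boundaries.

    `counts` is a histogram attribute value such as JobsCompletedRuntimes;
    `buckets` is the matching boundaries string, for example the value of
    JobsRuntimesHistogramBuckets.
    """
    count_values = [int(part.strip()) for part in counts.split(",")]
    boundaries = [part.strip() for part in buckets.split(",")]
    labeled = []
    for index, value in enumerate(count_values):
        if index < len(boundaries):
            # ASSUMPTION: this count covers values below this boundary
            # (and at or above the previous boundary, if there is one).
            label = "< " + boundaries[index]
        else:
            # ASSUMPTION: extra counts are overflow past the last boundary.
            label = ">= " + boundaries[-1]
        labeled.append((label, value))
    return labeled
```

With the shortened boundaries "30Sec, 1Min, 3Min" and the made-up counts "4, 0, 2, 1", the result pairs 4 with "< 30Sec", 0 with "< 1Min", 2 with "< 3Min", and 1 with ">= 3Min".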
- JobsStarted:
- A Statistics attribute defining
the number of jobs started
in the time interval defined by attribute StatsLifetime.
- JobsSubmitted:
- A Statistics attribute defining
the number of jobs submitted
in the time interval defined by attribute StatsLifetime.
- Machine:
- A string with the machine's fully qualified
host name.
- MaxJobsRunning:
- The same integer value as set by the
evaluation of the configuration variable MAX_JOBS_RUNNING.
See the definition in section 3.3.11.
- MonitorSelfAge:
- The number of seconds that this daemon
has been running.
- MonitorSelfCPUUsage:
- The fraction of recent CPU time utilized
by this daemon.
- MonitorSelfImageSize:
- The amount of virtual memory consumed by
this daemon in Kbytes.
- MonitorSelfRegisteredSocketCount:
- The current number of sockets
registered by this daemon.
- MonitorSelfResidentSetSize:
- The amount of resident memory
used by this daemon in Kbytes.
- MonitorSelfSecuritySessions:
- The number of open (cached)
security sessions for this daemon.
- MonitorSelfTime:
- The time, represented as the number of
seconds elapsed since the Unix epoch (00:00:00 UTC, Jan 1, 1970),
at which this daemon last checked and set the attributes with names that
begin with the string MonitorSelf.
- MyAddress:
- Description is not yet written.
- MyCurrentTime:
- The time, represented as the number of
seconds elapsed since the Unix epoch (00:00:00 UTC, Jan 1, 1970),
at which the condor_schedd daemon last sent a ClassAd update to the
condor_collector.
- Name:
- The name of this resource; typically the same value as
the Machine attribute, but could be customized by the site
administrator.
On SMP machines, the condor_startd will divide the CPUs up into separate
slots, each with a unique name.
These names will be of the form "slot#@full.hostname", for example,
"slot1@vulture.cs.wisc.edu", which signifies slot number 1 from
vulture.cs.wisc.edu.
- NumUsers:
- The integer number of distinct users with jobs in
this condor_schedd's queue.
- PublicNetworkIpAddr:
- Description is not yet written.
- RecentDaemonCoreDutyCycle:
- A Statistics attribute defining
the ratio of the time spent
handling messages and events to the elapsed time
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsAccumBadputTime:
- A Statistics attribute defining
the sum of all of the time that jobs which did not complete successfully
have spent running
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsAccumRunningTime:
- A Statistics attribute defining
the sum of all of the time spent running by jobs
which have exited
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsAccumTimeToStart:
- A Statistics attribute defining
the sum of all of the time spent waiting to start by jobs
which have exited
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsBadputRuntimes:
- A Statistics attribute defining
a histogram count of jobs that did not complete successfully,
as classified by time spent running,
in the previous time interval defined by attribute RecentStatsLifetime.
Counts within the histogram are separated by a comma and a space,
where the time interval classification is defined in the ClassAd attribute
JobsRuntimesHistogramBuckets.
- RecentJobsBadputSizes:
- A Statistics attribute defining
a histogram count of jobs that did not complete successfully,
as classified by image size,
in the previous time interval defined by attribute RecentStatsLifetime.
Counts within the histogram are separated by a comma and a space,
where the size classification is defined in the ClassAd attribute
JobsSizesHistogramBuckets.
- RecentJobsCheckpointed:
- A Statistics attribute defining
the number of times that jobs have exited
with a condor_shadow exit code of JOB_CKPTED
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsCompleted:
- A Statistics attribute defining
the number of jobs successfully completed
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsCompletedRuntimes:
- A Statistics attribute defining
a histogram count of jobs that completed successfully,
as classified by time spent running,
in the previous time interval defined by attribute RecentStatsLifetime.
Counts within the histogram are separated by a comma and a space,
where the time interval classification is defined in the ClassAd attribute
JobsRuntimesHistogramBuckets.
- RecentJobsCompletedSizes:
- A Statistics attribute defining
a histogram count of jobs that completed successfully,
as classified by image size,
in the previous time interval defined by attribute RecentStatsLifetime.
Counts within the histogram are separated by a comma and a space,
where the size classification is defined in the ClassAd attribute
JobsSizesHistogramBuckets.
- RecentJobsCoredumped:
- A Statistics attribute defining
the number of times that jobs have exited
with a condor_shadow exit code of JOB_COREDUMPED
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsDebugLogError:
- A Statistics attribute defining
the number of times that jobs have exited
with a condor_shadow exit code of DPRINTF_ERROR
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsExecFailed:
- A Statistics attribute defining
the number of times that jobs have exited
with a condor_shadow exit code of JOB_EXEC_FAILED
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsExited:
- A Statistics attribute defining
the number of times that jobs have exited normally
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsExitedAndClaimClosing:
- A Statistics attribute defining
the number of times that jobs
have exited with a condor_shadow exit code of
JOB_EXITED_AND_CLAIM_CLOSING
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsExitedNormally:
- A Statistics attribute defining
the number of times that jobs have exited
with a condor_shadow exit code of JOB_EXITED or with an
exit code of JOB_EXITED_AND_CLAIM_CLOSING
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsExitException:
- A Statistics attribute defining
the number of times that jobs
have exited with a condor_shadow exit code of JOB_EXCEPTION
or with an unknown status
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsKilled:
- A Statistics attribute defining
the number of times that jobs
have exited with a condor_shadow exit code of JOB_KILLED
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsMissedDeferralTime:
- A Statistics attribute defining
the number of times that jobs
have exited with a condor_shadow exit code of
JOB_MISSED_DEFERRAL_TIME
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsNotStarted:
- A Statistics attribute defining
the number of times that jobs
have exited with a condor_shadow exit code of JOB_NOT_STARTED
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsShadowNoMemory:
- A Statistics attribute defining
the number of times that jobs have exited
because there was not enough memory to start the condor_shadow
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsShouldHold:
- A Statistics attribute defining
the number of times that jobs
have exited with a condor_shadow exit code of JOB_SHOULD_HOLD
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsShouldRemove:
- A Statistics attribute defining
the number of times that jobs
have exited with a condor_shadow exit code of JOB_SHOULD_REMOVE
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsShouldRequeue:
- A Statistics attribute defining
the number of times that jobs
have exited with a condor_shadow exit code of JOB_SHOULD_REQUEUE
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsStarted:
- A Statistics attribute defining
the number of jobs started
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentJobsSubmitted:
- A Statistics attribute defining
the number of jobs submitted
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentShadowsReconnections:
- A Statistics attribute defining
the number of times that condor_shadow daemons lost
connection to their condor_starter daemons and successfully reconnected
in the previous time interval defined by attribute RecentStatsLifetime.
This statistic only appears in the Scheduler ClassAd if the level of
verbosity set by the configuration variable STATISTICS_TO_PUBLISH
is set to 2 or higher.
- RecentShadowsRecycled:
- A Statistics attribute defining
the number of times condor_shadow
processes have been recycled for use with a new job
in the previous time interval defined by attribute RecentStatsLifetime.
This statistic only appears in the Scheduler ClassAd if the level of
verbosity set by the configuration variable STATISTICS_TO_PUBLISH
is set to 2 or higher.
- RecentShadowsStarted:
- A Statistics attribute defining
the number of condor_shadow daemons started
in the previous time interval defined by attribute RecentStatsLifetime.
- RecentStatsLifetime:
- A Statistics attribute defining
the time in seconds over which statistics values have been collected
for attributes with names that begin with Recent.
This value starts at 0, and it may grow to a value as large as
the value defined for attribute RecentWindowMax.
- RecentStatsTickTime:
- A Statistics attribute defining
the time that attributes with names that begin with Recent
were last updated,
represented as the number of seconds elapsed since
the Unix epoch (00:00:00 UTC, Jan 1, 1970).
This statistic only appears in the Scheduler ClassAd if the level of
verbosity set by the configuration variable STATISTICS_TO_PUBLISH
is set to 2 or higher.
- RecentWindowMax:
- A Statistics attribute defining
the maximum time in seconds over which
attributes with names that begin with Recent are collected.
The value is set by the configuration variable
STATISTICS_WINDOW_SECONDS, which defaults to 1200 seconds
(20 minutes).
This statistic only appears in the Scheduler ClassAd if the level of
verbosity set by the configuration variable STATISTICS_TO_PUBLISH
is set to 2 or higher.
- ScheddIpAddr:
- String with the IP and port address of the
condor_schedd daemon which is publishing this Scheduler ClassAd.
- ServerTime:
- Description is not yet written.
- ShadowsReconnections:
- A Statistics attribute defining
the number of times that condor_shadow daemons lost
connection to their condor_starter daemons and successfully reconnected
in the previous StatsLifetime seconds.
This statistic only appears in the Scheduler ClassAd if the level of
verbosity set by the configuration variable STATISTICS_TO_PUBLISH
is set to 2 or higher.
- ShadowsRecycled:
- A Statistics attribute defining
the number of times condor_shadow processes have been
recycled for use with a new job
in the previous StatsLifetime seconds.
This statistic only appears in the Scheduler ClassAd if the level of
verbosity set by the configuration variable STATISTICS_TO_PUBLISH
is set to 2 or higher.
- ShadowsRunning:
- A Statistics attribute defining
the number of condor_shadow daemons currently running
that are owned by this condor_schedd.
- ShadowsRunningPeak:
- A Statistics attribute defining
the maximum number of condor_shadow daemons running at one time
that were owned by this condor_schedd over the lifetime of
this condor_schedd.
- ShadowsStarted:
- A Statistics attribute defining
the number of condor_shadow daemons started
in the previous time interval defined by attribute StatsLifetime.
- StartLocalUniverse:
- The same boolean value as set in the
configuration variable START_LOCAL_UNIVERSE.
See the definition in section 3.3.11.
- StartSchedulerUniverse:
- The same boolean value as set in the
configuration variable START_SCHEDULER_UNIVERSE.
See the definition in section 3.3.11.
- StatsLastUpdateTime:
- A Statistics attribute defining
the time that statistics about jobs were last updated,
represented as the number of seconds elapsed since
the Unix epoch (00:00:00 UTC, Jan 1, 1970).
This statistic only appears in the Scheduler ClassAd if the level of
verbosity set by the configuration variable STATISTICS_TO_PUBLISH
is set to 2 or higher.
- StatsLifetime:
- A Statistics attribute defining
the time in seconds over which statistics have been collected
for attributes with names that do not begin with Recent.
This statistic only appears in the Scheduler ClassAd if the level of
verbosity set by the configuration variable STATISTICS_TO_PUBLISH
is set to 2 or higher.
- TotalFlockedJobs:
- The total number of jobs from this
condor_schedd daemon that are currently flocked to other pools.
- TotalHeldJobs:
- The total number of jobs from this
condor_schedd daemon that are currently on hold.
- TotalIdleJobs:
- The total number of jobs from this
condor_schedd daemon that are currently idle.
- TotalJobAds:
- The total number of all jobs (in all
states) from this condor_schedd daemon.
- TotalLocalIdleJobs:
- The total number of
local universe jobs from this
condor_schedd daemon that are currently idle.
- TotalLocalRunningJobs:
- The total number of
local universe jobs from this
condor_schedd daemon that are currently running.
- TotalRemovedJobs:
- The current number of all running jobs
from this condor_schedd daemon that have remove requests.
- TotalRunningJobs:
- The total number of jobs from this
condor_schedd daemon that are currently running.
- TotalSchedulerIdleJobs:
- The total number of
scheduler universe jobs from this
condor_schedd daemon that are currently idle.
- TotalSchedulerRunningJobs:
- The total number of
scheduler universe jobs from this
condor_schedd daemon that are currently running.
- UpdateInterval:
- The interval, in seconds,
between publication of this condor_schedd ClassAd and
the previous publication.
- UpdateSequenceNumber:
- An integer, starting at zero,
and incremented with each ClassAd update sent to the condor_collector.
The condor_collector uses this value to sequence the updates it
receives.
- VirtualMemory:
- Description is not yet written.
- WantResAd:
- A boolean value that when True
causes the condor_negotiator daemon to send to this condor_schedd
daemon a full machine ClassAd corresponding to a matched job.
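The relationship between the Recent counters and RecentStatsLifetime can be illustrated with a minimal Python sketch. It assumes the Scheduler ClassAd has already been read into a plain dictionary; the helper name recent_rate_per_second and the sample values are hypothetical and are not part of HTCondor.

```python
def recent_rate_per_second(ad, counter):
    """Return events per second for a Recent* counter.

    Returns None while RecentStatsLifetime is still 0, because no
    time has been accumulated in the window yet.
    """
    window = ad.get("RecentStatsLifetime", 0)
    if window <= 0:
        return None
    return ad[counter] / window


# Hypothetical values, not output from a real pool.
sample_ad = {
    "RecentStatsLifetime": 600,   # seconds collected so far
    "RecentWindowMax": 1200,      # upper bound on the window
    "RecentJobsStarted": 30,
    "RecentJobsSubmitted": 45,
}

# 30 jobs over 600 seconds is 0.05 jobs per second.
start_rate = recent_rate_per_second(sample_ad, "RecentJobsStarted")

# The window is full once RecentStatsLifetime reaches RecentWindowMax.
window_is_full = sample_ad["RecentStatsLifetime"] >= sample_ad["RecentWindowMax"]
```

Dividing by RecentStatsLifetime rather than by RecentWindowMax avoids understating the rate shortly after the condor_schedd starts, when the window has not yet grown to its maximum.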
When using file transfer concurrency limits,
the following additional I/O usage statistics are published.
These include the sum and rate of bytes
transferred as well as time spent reading and writing to files and
to the network. These statistics are reported for the sum of all
users and may also be reported individually for recently active users
by setting the verbosity level STATISTICS_TO_PUBLISH = TRANSFER:2.
Each of the per-user statistics is prefixed by a
user name in the form Owner_<username>_FileTransferUploadBytes.
In this case, the attribute represents activity by the specified user.
The published user name is actually the file transfer queue name,
as defined by configuration variable TRANSFER_QUEUE_USER_EXPR.
This expression defaults to Owner_ followed by the name of the job
owner.
The attributes that are rates have a
suffix that specifies the time span of the exponential moving average.
By default the time spans that are published are 1m, 5m, 1h, and 1d.
This can be changed by setting the configuration variable
TRANSFER_IO_REPORT_TIMESPANS. These attributes are only
reported once a full time span has accumulated.
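The naming scheme described above can be sketched in Python. The helper transfer_attr_names is hypothetical and only assembles attribute names as strings; it assumes the default queue name form, Owner_ followed by the job owner, and the default time spans. The user name alice is made up for illustration.

```python
# Default time spans named in the text above.
DEFAULT_TIMESPANS = ("1m", "5m", "1h", "1d")


def transfer_attr_names(base, queue_user=None, timespans=()):
    """Build the attribute names published for one file transfer statistic.

    base       -- statistic name, for example "FileTransferUploadBytes"
    queue_user -- file transfer queue name, for example "Owner_alice";
                  None selects the sum over all users
    timespans  -- suffixes for rate attributes; empty for totals
    """
    prefix = f"{queue_user}_" if queue_user else ""
    if not timespans:
        return [f"{prefix}{base}"]
    return [f"{prefix}{base}_{span}" for span in timespans]


# Total for one (hypothetical) user.
per_user_total = transfer_attr_names("FileTransferUploadBytes", "Owner_alice")

# Rate attributes summed over all users, one per default time span.
all_user_rates = transfer_attr_names(
    "FileTransferUploadBytesPerSecond", timespans=DEFAULT_TIMESPANS
)
```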
- FileTransferDownloadBytes
- Total number of bytes
downloaded as output from jobs since this condor_schedd was
started.
If STATISTICS_TO_PUBLISH contains TRANSFER:2,
for each active user, this attribute is also published prefixed by
the user name, with the name Owner_<username>_FileTransferDownloadBytes.
The published user name is actually the file transfer queue name, as
defined by configuration variable TRANSFER_QUEUE_USER_EXPR.
- FileTransferDownloadBytesPerSecond_<timespan>
- Exponential moving average over the specified time span of the rate
at which bytes have been downloaded as output from jobs.
The time spans that are published are configured by
TRANSFER_IO_REPORT_TIMESPANS, which defaults to
1m, 5m, 1h, and 1d. When less than one full time span has
accumulated, the attribute is not published.
If STATISTICS_TO_PUBLISH contains TRANSFER:2,
for each active user, this attribute is also published prefixed by
the user name, with the name
Owner_<username>_FileTransferDownloadBytesPerSecond_<timespan>.
The published user name is actually the file transfer queue name, as
defined by configuration variable TRANSFER_QUEUE_USER_EXPR.
- FileTransferFileReadLoad_<timespan>
- Exponential
moving average over the specified time span of the rate at which
submit-side file transfer processes have spent time reading from
files to be transferred as input to jobs. One file transfer process
spending nearly all of its time reading files will generate a load
close to 1.0.
The time spans that are published are configured by
TRANSFER_IO_REPORT_TIMESPANS, which defaults to
1m, 5m, 1h, and 1d. When less than one full time span has
accumulated, the attribute is not published.
If STATISTICS_TO_PUBLISH contains TRANSFER:2,
for each active user, this attribute is also published prefixed by
the user name, with the name
Owner_<username>_FileTransferFileReadLoad_<timespan>.
The published user name is actually the file transfer queue name, as
defined by configuration variable TRANSFER_QUEUE_USER_EXPR.
- FileTransferFileReadSeconds
- Total number of
submit-side transfer process seconds spent reading from files to be
transferred as input to jobs since this condor_schedd was started.
If STATISTICS_TO_PUBLISH contains TRANSFER:2,
for each active user, this attribute is also published prefixed by
the user name, with the name
Owner_<username>_FileTransferFileReadSeconds.
The published user name is actually the file transfer queue name, as
defined by configuration variable TRANSFER_QUEUE_USER_EXPR.
- FileTransferFileWriteLoad_<timespan>
- Exponential moving average over the specified time span of the rate
at which submit-side file transfer processes have spent time writing to files
transferred as output from jobs. One file transfer process spending
nearly all of its time writing to files will generate a load close
to 1.0.
The time spans that are published are configured by
TRANSFER_IO_REPORT_TIMESPANS, which defaults to
1m, 5m, 1h, and 1d. When less than one full time span has
accumulated, the attribute is not published.
If STATISTICS_TO_PUBLISH contains TRANSFER:2,
for each active user, this attribute is also published prefixed by
the user name, with the name
Owner_<username>_FileTransferFileWriteLoad_<timespan>.
The published user name is actually the file transfer queue name, as
defined by configuration variable TRANSFER_QUEUE_USER_EXPR.
- FileTransferFileWriteSeconds
- Total number of
submit-side transfer process seconds spent writing to files
transferred as output from jobs since this condor_schedd was
started.
If STATISTICS_TO_PUBLISH contains TRANSFER:2,
for each active user, this attribute is also published prefixed by
the user name, with the name
Owner_<username>_FileTransferFileWriteSeconds.
The published user name is actually the file transfer queue name, as
defined by configuration variable TRANSFER_QUEUE_USER_EXPR.
- FileTransferNetReadLoad_<timespan>
- Exponential moving
average over the specified time span of the rate at which submit-side
file transfer processes have spent time reading from the network
when transferring output from jobs. One file transfer process
spending nearly all of its time reading from the network will
generate a load close to 1.0. The reason a file transfer process may
spend a long time reading from the network could be a network
bottleneck on the path between the submit and execute machine. It
could also be caused by slow reads from the disk on the execute side.
The time spans that are published are configured by
TRANSFER_IO_REPORT_TIMESPANS, which defaults to
1m, 5m, 1h, and 1d. When less than one full time span has
accumulated, the attribute is not published.
If STATISTICS_TO_PUBLISH contains TRANSFER:2,
for each active user, this attribute is also published prefixed by
the user name, with the name
Owner_<username>_FileTransferNetReadLoad_<timespan>.
The published user name is actually the file transfer queue name, as
defined by configuration variable TRANSFER_QUEUE_USER_EXPR.
- FileTransferNetReadSeconds
- Total number of submit-side
transfer process seconds spent reading from the network when
transferring output from jobs since this condor_schedd was
started. The reason a file transfer process may
spend a long time reading from the network could be a network
bottleneck on the path between the submit and execute machine. It
could also be caused by slow reads from the disk on the execute side.
If STATISTICS_TO_PUBLISH contains TRANSFER:2,
for each active user, this attribute is also published prefixed by
the user name, with the name
Owner_<username>_FileTransferNetReadSeconds.
The published user name is actually the file transfer queue name, as
defined by configuration variable TRANSFER_QUEUE_USER_EXPR.
- FileTransferNetWriteLoad_<timespan>
- Exponential
moving average over the specified time span of the rate at which
submit-side file transfer processes have spent time writing to the
network when transferring input to jobs. One file transfer process
spending nearly all of its time writing to the network will generate
a load close to 1.0. The reason a file transfer process may spend
a long time writing to the network could be a network bottleneck on
the path between the submit and execute machine. It could also be
caused by slow writes to the disk on the execute side.
The time spans that are published are configured by
TRANSFER_IO_REPORT_TIMESPANS, which defaults to
1m, 5m, 1h, and 1d. When less than one full time span has
accumulated, the attribute is not published.
If STATISTICS_TO_PUBLISH contains TRANSFER:2,
for each active user, this attribute is also published prefixed by
the user name, with the name
Owner_<username>_FileTransferNetWriteLoad_<timespan>.
The published user name is actually the file transfer queue name, as
defined by configuration variable TRANSFER_QUEUE_USER_EXPR.
- FileTransferNetWriteSeconds
- Total number of
submit-side transfer process seconds spent writing to the network
when transferring input to jobs since this condor_schedd was
started. The reason a file transfer process may spend
a long time writing to the network could be a network bottleneck on
the path between the submit and execute machine. It could also be
caused by slow writes to the disk on the execute side.
If STATISTICS_TO_PUBLISH contains TRANSFER:2,
for each active user, this attribute is also published prefixed by
the user name, with the name
Owner_<username>_FileTransferNetWriteSeconds.
The published user name is actually the file transfer queue name, as
defined by configuration variable TRANSFER_QUEUE_USER_EXPR.
- FileTransferUploadBytes
- Total number of bytes uploaded
as input to jobs since this condor_schedd was started.
If STATISTICS_TO_PUBLISH contains TRANSFER:2,
for each active user, this attribute is also published prefixed by
the user name, with the name
Owner_<username>_FileTransferUploadBytes.
The published user name is actually the file transfer queue name, as
defined by configuration variable TRANSFER_QUEUE_USER_EXPR.
- FileTransferUploadBytesPerSecond_<timespan>
- Exponential moving average over the specified time span of the rate
at which bytes have been uploaded as input to jobs.
The time spans that are published are configured by
TRANSFER_IO_REPORT_TIMESPANS, which defaults to
1m, 5m, 1h, and 1d. When less than one full time span has
accumulated, the attribute is not published.
If STATISTICS_TO_PUBLISH contains TRANSFER:2,
for each active user, this attribute is also published prefixed by
the user name, with the name
Owner_<username>_FileTransferUploadBytesPerSecond_<timespan>.
The published user name is actually the file transfer queue name, as
defined by configuration variable TRANSFER_QUEUE_USER_EXPR.
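The load attributes above are exponential moving averages maintained by the condor_schedd. For a rough cross-check, a plain average load over an interval can be derived from two samples of one of the cumulative Seconds attributes. The Python sketch below shows that simpler calculation; it is not the moving average that HTCondor publishes, and the function name and sample numbers are hypothetical.

```python
def average_load(seconds_before, seconds_after, wall_seconds):
    """Plain average load between two samples of a cumulative counter.

    seconds_before, seconds_after -- two readings of an attribute such as
        FileTransferFileReadSeconds, taken wall_seconds apart
    A result near 1.0 corresponds to one transfer process busy for
    nearly the whole interval.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return (seconds_after - seconds_before) / wall_seconds


# Hypothetical readings taken 300 seconds apart:
# 270 process-seconds of reading over 300 wall seconds.
load = average_load(1000.0, 1270.0, 300.0)
```

A value above 1.0 is possible with this calculation when several transfer processes are active at once, since their seconds are summed.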
Negotiator ClassAd Attributes
- DaemonStartTime:
- The time that this daemon was
started, represented as the number of seconds elapsed since
the Unix epoch (00:00:00 UTC, Jan 1, 1970).
- LastNegotiationCycleActiveSubmitterCount<X>:
- The integer number of submitters
the condor_negotiator attempted to negotiate with in the negotiation cycle.
The number <X> appended to the attribute name indicates how
many negotiation cycles ago this cycle happened.
- LastNegotiationCycleCandidateSlots<X>:
- The number of slot ClassAds after filtering by
NEGOTIATOR_SLOT_POOLSIZE_CONSTRAINT.
This is the number of slots actually considered for matching.
The number <X> appended to the attribute name indicates how many
negotiation cycles ago this cycle happened.
- LastNegotiationCycleDuration<X>:
- The number of seconds
that it took to complete the negotiation cycle. The number <X>
appended to the attribute name indicates how many negotiation cycles
ago this cycle happened.
- LastNegotiationCycleEnd<X>:
- The time, represented as the number of seconds since the Unix epoch,
at which the negotiation cycle ended.
The number <X> appended to the attribute name
indicates how many negotiation cycles ago this cycle happened.
- LastNegotiationCycleMatches<X>:
- The number of successful
matches that were made in the negotiation cycle.
The number <X> appended to the attribute name
indicates how many negotiation cycles ago this cycle happened.
- LastNegotiationCycleMatchRate<X>:
- The number of matched jobs divided by the duration of this cycle giving
jobs per second.
The number <X> appended to the attribute name indicates
how many negotiation cycles ago this cycle happened.
- LastNegotiationCycleMatchRateSustained<X>:
- The number of matched jobs divided by the period of this cycle giving
jobs per second.
The period is the time elapsed between the end of the previous cycle
and the end of this cycle,
and so this rate includes the interval between cycles.
The number <X> appended to the attribute name indicates
how many negotiation cycles ago this cycle happened.
- LastNegotiationCycleNumIdleJobs<X>:
- The number of idle jobs considered for matchmaking.
The number <X> appended to the attribute name indicates
how many negotiation cycles ago this cycle happened.
- LastNegotiationCycleNumJobsConsidered<X>:
- The number of job requests returned from the schedulers for consideration.
The number <X> appended to the attribute name indicates
how many negotiation cycles ago this cycle happened.
- LastNegotiationCycleNumSchedulers<X>:
- The number of individual schedulers negotiated with during matchmaking.
The number <X> appended to the attribute name indicates
how many negotiation cycles ago this cycle happened.
- LastNegotiationCyclePeriod<X>:
- The number of seconds elapsed between the end of the previous
negotiation cycle and the end of this cycle.
The number <X> appended to the attribute name indicates
how many negotiation cycles ago this cycle happened.
- LastNegotiationCyclePhase1Duration<X>:
- The duration, in seconds, of Phase 1 of the negotiation cycle:
the process of getting job, submitter and claim ClassAds from the
condor_collector.
The number <X> appended to the attribute name indicates
how many negotiation cycles ago this cycle happened.
- LastNegotiationCyclePhase2Duration<X>:
- The duration, in seconds, of Phase 2 of the negotiation cycle:
the process of filtering slots and processing accounting group configuration.
The number <X> appended to the attribute name indicates
how many negotiation cycles ago this cycle happened.
- LastNegotiationCyclePhase3Duration<X>:
- The duration, in seconds, of Phase 3 of the negotiation cycle:
sorting submitters by priority.
The number <X> appended to the attribute name indicates
how many negotiation cycles ago this cycle happened.
- LastNegotiationCyclePhase4Duration<X>:
- The duration, in seconds, of Phase 4 of the negotiation cycle:
the process of matching slots to jobs in conjunction with the schedulers.
The number <X> appended to the attribute name indicates
how many negotiation cycles ago this cycle happened.
- LastNegotiationCycleRejections<X>:
- The number of
rejections that occurred in the negotiation cycle. The number <X>
appended to the attribute name indicates how many negotiation cycles
ago this cycle happened.
- LastNegotiationCycleSlotShareIter<X>:
- The number of iterations performed during the negotiation cycle.
Each iteration includes the reallocation of remaining slots to
accounting groups,
as defined by the implementation of hierarchical group quotas,
together with the negotiation for those slots.
The maximum number of iterations is limited by the configuration variable
GROUP_QUOTA_MAX_ALLOCATION_ROUNDS.
The number <X> appended to the attribute name indicates
how many negotiation cycles ago this cycle happened.
- LastNegotiationCycleSubmittersFailed<X>:
- A string containing
a space and comma-separated list of the names of all submitters who
failed to negotiate in the negotiation cycle. One possible cause of
failure is a communication timeout. This list does not include
submitters who ran out of time due
to NEGOTIATOR_MAX_TIME_PER_SUBMITTER. Those are listed
separately in LastNegotiationCycleSubmittersOutOfTime<X>.
The number <X> appended to the attribute name indicates how
many negotiation cycles ago this cycle happened.
- LastNegotiationCycleSubmittersOutOfTime<X>:
- A string containing
a space and comma-separated list of the names of all submitters who
ran out of time due to NEGOTIATOR_MAX_TIME_PER_SUBMITTER
in the negotiation cycle. The number <X> appended to the
attribute name indicates how many negotiation cycles ago this cycle
happened.
- LastNegotiationCycleSubmittersShareLimit<X>:
- A string containing a space and comma-separated list of names of submitters
who encountered their fair-share slot limit during the negotiation cycle.
The number <X> appended to the attribute name indicates how
many negotiation cycles ago this cycle happened.
- LastNegotiationCycleTime<X>:
- The time, represented as the number of seconds elapsed since the Unix
epoch (00:00:00 UTC, Jan 1, 1970), at which the negotiation cycle started.
The number <X> appended to the attribute name
indicates how many negotiation cycles ago this cycle happened.
- LastNegotiationCycleTotalSlots<X>:
- The total number of slot ClassAds received by the condor_negotiator.
The number <X> appended to the attribute name indicates
how many negotiation cycles ago this cycle happened.
- LastNegotiationCycleTrimmedSlots<X>:
- The number of slot ClassAds left after trimming currently claimed slots
(when enabled).
The number <X> appended to the attribute name indicates
how many negotiation cycles ago this cycle happened.
- Machine:
- A string with the machine's fully qualified
host name.
- MyAddress:
- Description is not yet written.
- MyCurrentTime:
- The time, represented as the number of
seconds elapsed since the Unix epoch (00:00:00 UTC, Jan 1, 1970),
at which the condor_negotiator daemon last sent a ClassAd update to the
condor_collector.
- Name:
- The name of this resource; typically the same value as
the Machine attribute, but could be customized by the site
administrator.
On SMP machines, the condor_startd will divide the CPUs up into separate
slots, each with a unique name.
These names will be of the form slot#@full.hostname, for example,
slot1@vulture.cs.wisc.edu, which signifies slot number 1 from
vulture.cs.wisc.edu.
- NegotiatorIpAddr:
- String with the IP and port address of the
condor_negotiator daemon which is publishing this Negotiator ClassAd.
- PublicNetworkIpAddr:
- Description is not yet written.
- UpdateSequenceNumber:
- An integer, starting at zero,
and incremented with each ClassAd update sent to the condor_collector.
The condor_collector uses this value to sequence the updates it
receives.
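The two match rate attributes differ only in their denominator: the cycle duration or the cycle period. A minimal Python sketch of both calculations follows, assuming the Negotiator ClassAd has been read into a dictionary and that a suffix of 0 selects the most recent cycle. The helper names and sample values are hypothetical, and returning 0.0 for a zero-length cycle is a choice made for this sketch only.

```python
def cycle_attr(ad, name, cycles_ago=0):
    """Look up LastNegotiationCycle<name><X> for the cycle X cycles ago."""
    return ad[f"LastNegotiationCycle{name}{cycles_ago}"]


def match_rates(ad, cycles_ago=0):
    """Return (match rate, sustained match rate) in jobs per second."""
    matches = cycle_attr(ad, "Matches", cycles_ago)
    duration = cycle_attr(ad, "Duration", cycles_ago)  # length of the cycle
    period = cycle_attr(ad, "Period", cycles_ago)      # end-to-end interval
    # Guard against division by zero (sketch-only convention).
    rate = matches / duration if duration > 0 else 0.0
    sustained = matches / period if period > 0 else 0.0
    return rate, sustained


# Hypothetical values for the most recent cycle.
sample_ad = {
    "LastNegotiationCycleMatches0": 120,
    "LastNegotiationCycleDuration0": 40,
    "LastNegotiationCyclePeriod0": 300,
}

rate, sustained = match_rates(sample_ad)
```

Because the period includes the idle interval between cycles, the sustained rate is never larger than the plain match rate for the same cycle.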
Submitter ClassAd Attributes
- FlockedJobs:
- The number of jobs from this submitter
that are running in another pool.
- HeldJobs:
- The number of jobs from this submitter
that are in the hold state.
- IdleJobs:
- The number of jobs from this submitter
that are now idle.
- Name:
- The fully qualified name of the user or accounting group.
It will be of the form name@submit.domain.
- RunningJobs:
- The number of jobs from this submitter
that are running now.
- ScheddIpAddr:
- The IP address associated with the
condor_schedd daemon used by the submitter.
- ScheddName:
- The fully qualified host name of the machine
that the submitter submitted from.
It will be of the form submit.domain.
- SubmitterTag:
- The fully qualified host name of the
central manager of the pool used by the submitter,
if the job flocked to the local pool.
It will be the empty string if the submitter submitted from within
the local pool.
- WeightedIdleJobs:
- The total number of requested
cores across all Idle jobs from the submitter.
- WeightedRunningJobs:
- The total number of requested
cores across all Running jobs from the submitter.
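As an illustration of the Name format and of the weighted job counts, the following Python sketch splits a submitter name and sums requested cores over idle jobs. It assumes jobs are given as dictionaries carrying the job ClassAd attributes RequestCpus and JobStatus, with JobStatus 1 meaning idle; treating a missing RequestCpus as 1 is an assumption of this sketch. The function names and sample data are hypothetical.

```python
IDLE = 1  # JobStatus value for an idle job


def split_submitter_name(name):
    """Split 'name@submit.domain' into (name, submit.domain)."""
    user, _, domain = name.partition("@")
    return user, domain


def weighted_idle_jobs(jobs):
    """Sum the requested cores over all idle jobs."""
    return sum(job.get("RequestCpus", 1) for job in jobs
               if job["JobStatus"] == IDLE)


# Hypothetical submitter and jobs.
user, domain = split_submitter_name("alice@submit.example.org")
jobs = [
    {"JobStatus": 1, "RequestCpus": 4},  # idle, 4 cores
    {"JobStatus": 1},                    # idle, defaults to 1 core here
    {"JobStatus": 2, "RequestCpus": 8},  # running, not counted
    {"JobStatus": 1, "RequestCpus": 1},  # idle, 1 core
]
idle_cores = weighted_idle_jobs(jobs)
```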
Defrag ClassAd Attributes
- AvgDrainingBadput:
- Fraction of time CPUs
in the pool have spent on jobs that were killed during draining of the
machine. This is calculated in each polling interval by looking
at TotalMachineDrainingBadput.
Therefore, it treats evictions of jobs that do and do not produce
checkpoints the same.
When the condor_startd restarts, its counters start over from 0, so the
average is only over the time since the daemons have been alive.
- AvgDrainingUnclaimedTime:
- Fraction of time CPUs
in the pool have spent unclaimed by a user during
draining of the machine. This is calculated in each polling interval
by looking at TotalMachineDrainingUnclaimedTime.
When the condor_startd restarts, its counters start over from 0, so the
average is only over the time since the daemons have been alive.
- DaemonStartTime:
- The time that this daemon was started,
represented as the number of seconds elapsed since
the Unix epoch (00:00:00 UTC, Jan 1, 1970).
- DrainFailures:
- Total count of failed attempts
to initiate draining during the lifetime of this condor_defrag daemon.
- DrainSuccesses:
- Total count of successful attempts
to initiate draining during the lifetime of this condor_defrag daemon.
- Machine:
- A string with the machine's fully qualified
host name.
- MachinesDraining:
- Number of machines that were observed
to be draining in the last polling interval.
- MachinesDrainingPeak:
- Largest number of machines that were
ever observed to be draining.
- MonitorSelfAge:
- The number of seconds that this daemon
has been running.
- MonitorSelfCPUUsage:
- The fraction of recent CPU time utilized
by this daemon.
- MonitorSelfImageSize:
- The amount of virtual memory consumed by
this daemon in Kbytes.
- MonitorSelfRegisteredSocketCount:
- The current number of sockets
registered by this daemon.
- MonitorSelfResidentSetSize:
- The amount of resident memory
used by this daemon in Kbytes.
- MonitorSelfSecuritySessions:
- The number of open (cached)
security sessions for this daemon.
- MonitorSelfTime:
- The time, represented as the number of
seconds elapsed since the Unix epoch (00:00:00 UTC, Jan 1, 1970),
at which this daemon last checked and set the attributes with names that
begin with the string MonitorSelf.
- MyAddress:
- Description is not yet written.
- MyCurrentTime:
- The time, represented as the number of
seconds elapsed since the Unix epoch (00:00:00 UTC, Jan 1, 1970),
at which the condor_defrag daemon last sent a ClassAd update to the
condor_collector.
- Name:
- The name of this daemon; typically the same value as
the Machine attribute, but could be customized by the site
administrator via the configuration variable DEFRAG_NAME.
- RecentDrainFailures:
- Count of failed attempts
to initiate draining during the past RecentStatsLifetime seconds.
- RecentDrainSuccesses:
- Count of successful attempts
to initiate draining during the past RecentStatsLifetime seconds.
- RecentStatsLifetime:
- A Statistics attribute defining
the time in seconds over which statistics values have been collected
for attributes with names that begin with Recent.
- UpdateSequenceNumber:
- An integer, starting at zero,
and incremented with each ClassAd update sent to the condor_collector.
The condor_collector uses this value to sequence the updates it
receives.
- WholeMachines:
- Number of machines that were observed
to be defragmented in the last polling interval.
- WholeMachinesPeak:
- Largest number of machines that were
ever observed to be simultaneously defragmented.
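Several attributes in these lists, such as DaemonStartTime, MonitorSelfTime and MyCurrentTime, hold a time as the number of seconds elapsed since the Unix epoch. A minimal Python sketch of converting such a value to a readable UTC timestamp follows; the helper name and the sample value are hypothetical.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Convert seconds since 00:00:00 UTC, Jan 1, 1970 to an aware datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Hypothetical DaemonStartTime value.
daemon_start = epoch_to_utc(1370000000)
daemon_start_text = daemon_start.isoformat()
```

Passing tz=timezone.utc keeps the result independent of the local time zone of the machine running the script.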
Collector ClassAd Attributes
- CollectorIpAddr:
- String with the IP and port address of the
condor_collector daemon which is publishing this ClassAd.
- CurrentJobsRunningAll:
- An integer value representing the sum of
all jobs running under all universes.
- CurrentJobsRunning<universe>:
- An integer value representing
the current number of jobs running under the universe which forms
the attribute name. For example,
CurrentJobsRunningVanilla = 567
indicates that the condor_collector counts 567 vanilla universe jobs
currently running.
<universe> is one of
Unknown, Standard, Vanilla, Scheduler,
Java, Parallel, VM, or Local.
There are other universes, but they are not listed here, as they represent
ones that are no longer used in Condor.
- DaemonStartTime:
- The time that this daemon was
started, represented as the number of seconds elapsed since
the Unix epoch (00:00:00 UTC, Jan 1, 1970).
- HostsClaimed:
- Description is not yet written.
- HostsOwner:
- Description is not yet written.
- HostsTotal:
- Description is not yet written.
- HostsUnclaimed:
- Description is not yet written.
- IdleJobs:
- Description is not yet written.
- Machine:
- A string with the machine's fully qualified
host name.
- MaxJobsRunningAll:
- An integer value representing
the sum of all MaxJobsRunning<universe> values.
- MaxJobsRunning<universe>:
- An integer value representing
the largest number of currently running jobs ever seen
under the universe which forms the attribute name,
over the life of this condor_collector process.
For example
MaxJobsRunningVanilla = 401
identifies that the condor_collector saw 401 vanilla universe jobs
currently running at one point in time, and that was the largest
number it had encountered.
<universe> is one of
Unknown, Standard, Vanilla, Scheduler,
Java, Parallel, VM, or Local.
There are other universes, but they are not listed here, as they represent
ones that are no longer used in Condor.
- MyAddress:
- Description is not yet written.
- MyCurrentTime:
- The time, represented as the number of
seconds elapsed since the Unix epoch (00:00:00 UTC, Jan 1, 1970),
at which the condor_collector daemon last published this ClassAd.
- Name:
- The name of this resource; typically the same value as
the Machine attribute, but could be customized by the site
administrator.
On SMP machines, the condor_startd will divide the CPUs up into separate
slots, each with a unique name.
These names will be of the form slot#@full.hostname, for example,
slot1@vulture.cs.wisc.edu, which signifies slot number 1 from
vulture.cs.wisc.edu.
- RunningJobs:
- Description is not yet written.
- UpdateInterval:
- Description is not yet written.
- UpdateSequenceNumber:
- An integer that begins at 0,
and increments by one each time the same ClassAd is again advertised.
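The per-universe counters and the All totals are related by a simple sum. The Python sketch below adds up the per-universe attributes for the universes listed above, assuming the Collector ClassAd has been read into a dictionary. The function name and sample values are hypothetical. Because the universes that are no longer used are not listed here, this sum could be smaller than the published All value if any such jobs exist.

```python
# Universe names as listed in the attribute descriptions above.
UNIVERSES = ("Unknown", "Standard", "Vanilla", "Scheduler",
             "Java", "Parallel", "VM", "Local")


def sum_running(ad, prefix="CurrentJobsRunning"):
    """Sum <prefix><universe> over the listed universes; missing ones count as 0."""
    return sum(ad.get(f"{prefix}{universe}", 0) for universe in UNIVERSES)


# Hypothetical Collector ClassAd values.
sample_ad = {
    "CurrentJobsRunningVanilla": 567,
    "CurrentJobsRunningJava": 12,
    "CurrentJobsRunningLocal": 3,
    "MaxJobsRunningVanilla": 601,
    "MaxJobsRunningJava": 20,
}

current_total = sum_running(sample_ad)
max_total = sum_running(sample_ad, prefix="MaxJobsRunning")
```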
ClassAd Attributes Added by the condor_collector
- AuthenticatedIdentity:
- The authenticated name assigned
by the condor_collector to the daemon that published the ClassAd.
- LastHeardFrom:
- The time inserted into a daemon's
ClassAd representing the time that this condor_collector
last received a message from the daemon.
Time is represented as the number of seconds elapsed since
the Unix epoch (00:00:00 UTC, Jan 1, 1970).
This attribute is added if COLLECTOR_DAEMON_STATS is True.
- UpdatesHistory:
- A bitmap representing the status of
the most recent updates received from the daemon.
This attribute is added only if COLLECTOR_DAEMON_STATS is True and
COLLECTOR_DAEMON_HISTORY_SIZE is non-zero.
- UpdatesLost:
- An integer count of the number of updates
from the daemon that the condor_collector can definitively determine
were lost since the condor_collector started running.
This attribute is added if COLLECTOR_DAEMON_STATS is True.
- UpdatesSequenced:
- An integer count of the number of updates
received from the daemon,
for which the condor_collector can tell how many were or were not lost,
since the condor_collector started running.
This attribute is added if COLLECTOR_DAEMON_STATS is True.
- UpdatesTotal:
- An integer count started when the
condor_collector started running, representing the sum
of the number of updates actually received from the daemon plus
the number of updates that the condor_collector determined were lost.
This attribute is added if COLLECTOR_DAEMON_STATS is True.
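The update counters can be related to the UpdateSequenceNumber values that daemons send. The Python sketch below shows one way that lost updates could be inferred from gaps in a series of sequence numbers, and how a loss fraction follows from UpdatesLost and UpdatesTotal. It illustrates the idea only and is not the condor_collector's actual implementation; the function names and sample data are hypothetical.

```python
def count_sequence_gaps(sequence_numbers):
    """Count updates missing from an increasing series of sequence numbers."""
    lost = 0
    previous = None
    for seq in sequence_numbers:
        # A jump of more than one means the numbers in between never arrived.
        if previous is not None and seq > previous + 1:
            lost += seq - previous - 1
        previous = seq
    return lost


def loss_fraction(updates_total, updates_lost):
    """Fraction of updates lost, where the total is received plus lost."""
    if updates_total <= 0:
        return 0.0
    return updates_lost / updates_total


# Hypothetical series of received sequence numbers: 3, 4, 7 and 8 are missing.
received = [0, 1, 2, 5, 6, 9]
lost = count_sequence_gaps(received)
total = len(received) + lost  # received plus lost
fraction = loss_fraction(total, lost)
```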
DaemonCore Statistics Attributes
- DebugOuts:
- Description not yet written.
- PipeMessages:
- Description not yet written.
- PipeRuntime:
- Description not yet written.
- SelectWaittime:
- Description not yet written.
- SignalRuntime:
- Description not yet written.
- Signals:
- Description not yet written.
- SocketRuntime:
- Description not yet written.
- SockMessages:
- Description not yet written.
- TimerRuntime:
- Description not yet written.
- TimersFired:
- Description not yet written.