10.3 Development Release Series 8.1
This is the development release series of HTCondor.
The details of each version are described below.
Version 8.1.2
Release Notes:
- HTCondor version 8.1.2 released on October 31, 2013.
This 8.1.2 release contains all bug fixes from HTCondor version 8.0.4.
New Features:
- condor_config_val now supports -dump and -verbose
options to query configuration remotely from daemons.
(Ticket #3894).
- The condor_chirp protocol and command line tool have been
enhanced to support lower-cost, delayed updates to the job
ClassAd residing in the condor_schedd; updates occur as other communications
take place, eliminating the overhead of a separate update.
The two new Chirp commands,
set_job_attr_delayed and get_job_attr_delayed, allow the job
to send lightweight notifications for events such as progress
monitoring, which need not be durable.
(Ticket #3353).
- condor_history has been enhanced to support
remote history using new -pool and -name options.
(Ticket #3897).
- Matchmaking in the condor_negotiator may be made aware of resources
available for partitionable slots.
This permits multiple jobs to be matched against a partitionable slot
during a single negotiation cycle.
These new policies, known as consumption policies,
are set using new configuration variables and are discussed in section 3.5.10.
(Ticket #3435).
- Definition syntax for the authorization configuration variables
ALLOW_* and DENY_* has been expanded to permit
the specification of Unix netgroups.
See section 3.6.7 for the syntax.
(Ticket #3859).
- Definition syntax for the configuration variable
QUEUE_SUPER_USERS has been expanded to accept a specification
of Unix user groups.
See section 3.3.11 for the syntax.
(Ticket #3859).
- To ensure that a grid universe job running at an EC2 service
terminates,
HTCondor now checks after a fixed time interval
that the job actually has terminated,
instead of relying on the service's potentially unreliable
job shutdown indication.
If the job has not terminated after a total of four checks,
the job is placed on hold; it does not leave the queue marked as completed.
(Ticket #3438).
- Email alerts about file transfers taking longer than
MAX_TRANSFER_QUEUE_AGE are now grouped together
to reduce the number of email messages that are sent.
- Floating point values in Old ClassAds are now printed in a more
human-readable format, while retaining 64-bit double precision.
In previous versions, these values were always printed in scientific
notation.
(Ticket #3928).
- condor_ssh_to_job now works with grid universe jobs
which use EC2 resources.
(Ticket #1548).
- Machine ClassAd attributes Disk and TotalDisk
are now published as 64-bit integers,
rather than being capped at the maximum value of a 32-bit integer.
(Ticket #1784).
- In an effort to improve scalability under heavy load, the tuning
configuration variable MAX_REAPS_PER_CYCLE is now exposed,
as defined in section 3.3.5.
The default for this variable has changed from 1 to 0.
(Ticket #3992).
- To reduce the overwhelming quantity of per-user condor_schedd
statistics that are generated when configuration variables
SCHEDD_COLLECT_STATS_FOR_<Name> or
SCHEDD_COLLECT_STATS_BY_<Name> are used,
the statistics are now published at verbosity level 2,
instead of verbosity level 1.
(Ticket #3980).
- The Python bindings now include the Negotiator class to
manage users and their priorities.
(Ticket #3893).
- The Python bindings now provide automatic conversions from
dictionaries to ClassAds,
so they can accept a dictionary directly as an argument,
rather than constructing a ClassAd from the dictionary.
(Ticket #3892).
- The Python bindings ClassAd module has
quote() and unquote()
methods to help create string literals.
(Ticket #3900).
- The Python bindings ClassAd module has new
methods parseAds() and parseOldAds()
that implement an iterator over ClassAds, in the New ClassAd and
Old ClassAd format.
(Ticket #3918).
- The ordering of adding attributes to the machine ClassAd has been
changed, such that the attributes Draining, DrainingRequestId,
and LastDrainStartTime are now added before the job retirement
is calculated.
This allows a decision about preemption to be made based on whether
a machine is currently draining.
(Ticket #3901).
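The EC2 termination check described in the list above can be pictured with a minimal Python sketch. This is not HTCondor's implementation: the function name confirm_terminated, its parameters, and the returned strings are hypothetical. Only the fixed interval between checks and the limit of four checks come from the text.

```python
import time
from typing import Callable

# The release note says the job is held after a total of four checks.
MAX_CHECKS = 4


def confirm_terminated(
    is_terminated: Callable[[], bool],
    interval_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a (hypothetical) termination probe at a fixed interval.

    Returns "completed" as soon as one check sees the job gone,
    or "held" if all MAX_CHECKS checks still see it running.
    """
    for _ in range(MAX_CHECKS):
        sleep(interval_seconds)  # wait the fixed interval before each check
        if is_terminated():
            return "completed"
    return "held"
```

Passing the sleep function as a parameter is a design choice of this sketch, made so that the loop can be exercised without real waiting.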
Bugs Fixed:
- When USE_PID_NAMESPACES is True,
the soft kill signal is now successfully sent to the job.
Previously, running condor_rm
on such a job would not remove the job until the
killing timeout had expired.
(Ticket #3981).
- If a standard universe job exited without producing any
checkpoints and no checkpoint server was used,
two spurious error messages would be logged to the SchedLog,
as it tried to remove the old checkpoint images from the
non-existent checkpoint server.
These error messages are no longer logged.
(Ticket #3919).
- Setting configuration variable STARTER_RLIMIT_AS
to its default value of 0 means that there is no limit.
Previously, this value was logged as a limit of 0Mb, leading to confusion.
Now, no message is logged in this default case.
(Ticket #3914).
- Improved how the condor_schedd notifies the condor_shadow
and condor_gridmanager about modifications to job ClassAds made using
condor_qedit.
(Ticket #3909).
- Grid universe jobs now use the correct executable file when
copy_to_spool is set to True.
Previously, the executable file named in the submit description file
would be copied to the remote server,
rather than the copy of the executable file stored in the spool directory.
(Ticket #3589).
- The example configuration provided within files
condor_config.generic and condor_config.generic.redhat
has been updated to fix an inadequate expression defining
NEGOTIATOR_POST_JOB_RANK when the condor_startd is
configured to not run benchmarks, as Kflops would not be defined.
(Ticket #3589).
- Fixed a Python binding crash due to a segmentation fault,
when evaluating an expression tree with an undefined reference.
The fix allows the user to define the ClassAd scope
within which an expression tree is evaluated.
(Ticket #3910).
- The Python bindings now include a correct conversion of
absTime and relTime ClassAd literals to the
corresponding Python types.
(Ticket #3911).
Version 8.1.1
Release Notes:
- HTCondor version 8.1.1 released on September 17, 2013.
This release contains all bug fixes from the stable release version 8.0.2.
New Features:
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable USE_RESOURCE_REQUEST_COUNTS
is a boolean value that defaults to True.
When True, it reduces the latency of negotiation
when there are many jobs next to each other in the queue
with the same auto cluster, and many matches are being made.
(Ticket #3585).
- Four new machine ClassAd attributes are advertised.
TotalJobStarts is the total number of jobs started by
this condor_startd daemon since it booted.
RecentTotalJobStarts is the number of jobs started in the
last 20 minutes.
Similarly, TotalPreemptions is
the number of jobs preempted since the condor_startd daemon started,
and RecentTotalPreemptions is
the number in the last 20 minutes.
(Ticket #3712).
- FILE_TRANSFER_DISK_LOAD_THROTTLE now accepts tabs in addition to spaces as delimiters.
(Ticket #3798).
- Configuration variable VALID_SPOOL_FILES has been expanded
to accept a single asterisk wild card character in each listed file name.
(Ticket #3764).
- The new configuration variable GAHP_DEBUG_HIDE_SENSITIVE_DATA
is a boolean value that defaults to hiding sensitive data
such as security keys and passwords
when communication with a GAHP server is written to a daemon log.
(Ticket #3536).
- The default value of configuration variable
ENABLE_CLASSAD_CACHING has changed to True for all
daemons other than the condor_shadow, condor_starter, and condor_master.
(Ticket #3441).
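To make the single-asterisk wild card accepted by VALID_SPOOL_FILES more concrete, here is a minimal Python sketch of one way such matching could work. It is an illustration, not HTCondor's code: the function names are hypothetical, and the choice to let the asterisk match zero or more characters is an assumption of this sketch.

```python
from typing import Iterable


def matches_single_wildcard(pattern: str, filename: str) -> bool:
    """Match a file name against a pattern holding at most one '*'.

    Assumption of this sketch: '*' stands for zero or more characters.
    """
    if pattern.count("*") > 1:
        raise ValueError("only one asterisk is allowed per pattern")
    if "*" not in pattern:
        return pattern == filename
    prefix, suffix = pattern.split("*")
    # The length check stops the prefix and suffix from overlapping.
    return (
        len(filename) >= len(prefix) + len(suffix)
        and filename.startswith(prefix)
        and filename.endswith(suffix)
    )


def is_listed(filename: str, patterns: Iterable[str]) -> bool:
    """True if any listed pattern matches the file name."""
    return any(matches_single_wildcard(p, filename) for p in patterns)
```

For example, under these assumptions the hypothetical pattern job_*.log matches job_42.log but not job_42.txt.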
Bugs Fixed:
- The condor_gridmanager now does proper failure recovery when
submitting EC2 grid universe jobs to services that do not support
the EC2 ClientToken parameter.
Previously, if there was a failure when submitting jobs to OpenStack
or Eucalyptus, the jobs could be submitted twice.
(Ticket #3682).
- Fixed the printing of nested ClassAds, so that the nested ClassAds
can be read back properly.
(Ticket #3772).
- Fixed a bug between the condor_gridmanager and condor_ft-gahp
that caused file transfers to fail if one of the two daemons was older
than version 8.1.0.
(Ticket #3856).
- Fixed a bug that caused substitution in configuration variable
evaluation to ignore per-daemon overrides.
This is a long-standing bug, and the fix may result in subtle changes
to the way your configuration files are processed.
An example of how substitution works with the per-daemon overrides
is in section 3.3.1.
(Ticket #3822).
- Fixed a bug that caused the command
condor_submit -
to be interpreted as an interactive submit,
rather than a request to read input from stdin.
condor_qsub was also modified to be immune to this bug,
such that it will still work with other versions of HTCondor containing
the bug.
(Ticket #3902).
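The fix for substitution with per-daemon overrides, listed above, can be illustrated with a simplified Python model. This is a sketch under stated assumptions, not HTCondor's configuration parser: overrides are written as SUBSYS.NAME keys in a plain dictionary, undefined macros expand to an empty string, and the variable names EXAMPLE_DIR and EXAMPLE_FILE are hypothetical.

```python
import re

# Matches a $(NAME) macro reference.
MACRO = re.compile(r"\$\(([^)]+)\)")


def lookup(config: dict, name: str, subsys: str) -> str:
    """Prefer the per-daemon override SUBSYS.NAME over the plain NAME."""
    return config.get(f"{subsys}.{name}", config.get(name, ""))


def expand(config: dict, name: str, subsys: str) -> str:
    """Expand $(NAME) references, honoring per-daemon overrides.

    Note: this sketch does not guard against self-referencing macros.
    """
    value = lookup(config, name, subsys)
    return MACRO.sub(lambda m: expand(config, m.group(1), subsys), value)


config = {
    "EXAMPLE_DIR": "/shared",
    "SCHEDD.EXAMPLE_DIR": "/schedd-only",
    "EXAMPLE_FILE": "$(EXAMPLE_DIR)/data",
}
```

In this model, expand(config, "EXAMPLE_FILE", "SCHEDD") yields /schedd-only/data, while the same call for STARTD yields /shared/data. A substitution that ignored the override would give /shared/data in both cases.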
Known Bugs:
- DAGMan recovery mode does not work for Pegasus-generated SUBDAGs.
For SUBDAGs, doing condor_hold or condor_release on
the condor_dagman job, or stopping and re-starting the
condor_schedd with the DAGMan
job in the queue will result in failure of the DAG. This can be
avoided by doing a condor_rm of the DAGMan job, which produces a Rescue
DAG, and re-submitting the DAG; the Rescue DAG is automatically run.
This bug was introduced in HTCondor version 8.0.1, and it also appears
in versions 8.0.2, 8.1.0, and 8.1.1.
(Ticket #3882).
Additions and Changes to the Manual:
Version 8.1.0
Release Notes:
- HTCondor version 8.1.0 released on August 5, 2013.
This release contains all bug fixes from the stable release version 8.0.1.
New Features:
- Added support for publishing information about an HTCondor pool
to Ganglia™.
See section 3.3.38 for configuration variable details.
(Ticket #3515).
- Improved the performance of the condor_collector daemon when running
at sites that do not observe daylight saving time.
(Ticket #2898).
- condor_q, condor_rm, condor_status and condor_qedit are now
more consistent in the way they handle the -constraint option.
(Ticket #1156).
- The new condor_dagman_metrics_reporter executable,
which has its own manual page,
reports metrics for DAGMan workflows running under Pegasus. condor_dagman
now generates an output file of the relevant metrics.
(Ticket #3532).
Configuration Variable and ClassAd Attribute Additions and Changes:
- The default value of configuration variable
COLLECTOR_MAX_FILE_DESCRIPTORS has changed to 10240,
and the default value of configuration variable
SCHEDD_MAX_FILE_DESCRIPTORS has changed to 4096.
This increases the scalability of the default configuration.
(Ticket #3626).
- The new configuration variable
FILE_TRANSFER_DISK_LOAD_THROTTLE enables dynamic
adjustment of the level of file transfer concurrency in order to
keep the disk load generated by transfers below a specified level.
Supporting this new feature are configuration variables
FILE_TRANSFER_DISK_LOAD_THROTTLE_WAIT_BETWEEN_INCREMENTS ,
FILE_TRANSFER_DISK_LOAD_THROTTLE_SHORT_HORIZON , and
FILE_TRANSFER_DISK_LOAD_THROTTLE_LONG_HORIZON .
(Ticket #3613).
- The following new condor_schedd ClassAd attributes are for
monitoring file transfer activity:
TransferQueueMBWaitingToDownload,
TransferQueueMBWaitingToUpload,
FileTransferDiskThrottleLevel,
FileTransferDiskThrottleHigh, and
FileTransferDiskThrottleLow.
(Ticket #3613).
- The default value for the configuration variable
PASSWD_CACHE_REFRESH has been changed from 300 seconds to
72000 seconds (20 hours).
(Ticket #3723).
- The new configuration variables
DAGMAN_PEGASUS_REPORT_METRICS and
DAGMAN_PEGASUS_REPORT_TIMEOUT
set defaults used by the new condor_dagman_metrics_reporter executable,
which reports metrics for DAGMan jobs running under Pegasus.
(Ticket #3532).
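The idea behind FILE_TRANSFER_DISK_LOAD_THROTTLE, adjusting transfer concurrency to keep disk load below a level, can be sketched as a generic feedback step in Python. This is not HTCondor's algorithm. The function name, the one-step adjustment, and the use of separate low and high thresholds are assumptions of this sketch (loosely suggested by the FileTransferDiskThrottleLow and FileTransferDiskThrottleHigh attributes), and the horizon and wait-between-increments variables are not modeled.

```python
def adjust_concurrency(
    current: int,
    disk_load: float,
    low: float,
    high: float,
    max_concurrency: int,
) -> int:
    """Return the next concurrency level for one adjustment step.

    Hypothetical rule: back off by one when load exceeds `high`,
    ramp up by one when load is under `low`, otherwise hold steady.
    """
    if disk_load > high and current > 1:
        return current - 1  # disk is overloaded: allow fewer transfers
    if disk_load < low and current < max_concurrency:
        return current + 1  # disk has headroom: allow more transfers
    return current  # load is within the band, or a bound was reached
```

Keeping a band between the two thresholds is a common way to avoid oscillating between levels when the measured load hovers near a single cutoff.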
Bugs Fixed:
- HTCondor version 8.0.0 had an unintended change in the Chirp
wire protocol.
This change caused condor_chirp with the put option
to fail when the execute node
was running HTCondor version 7.8.x or earlier versions.
HTCondor 8.0.1 and later
versions will now send the original wire protocol, and accept either the
original protocol, or the variant that HTCondor version 8.0.0 sends.
(Ticket #3735).
- Fixed a bug that could cause the daemons to crash on Unix platforms,
if the operating system reported that a job owner's account
did not exist, for example due to a temporary NIS or LDAP failure.
(Ticket #3723).
- Fixed a bug that resulted in a misleading error message when
condor_status with the -constraint option specified a constraint
that could not be parsed.
(Ticket #1319).
- Fixed a typo in the output of condor_q,
where a period was erroneously present within a heading.
(Ticket #3703).
Known Bugs:
Additions and Changes to the Manual: