10.3 Stable Release Series 8.4
This is a stable release series of HTCondor.
As usual, only bug fixes (and potentially, ports to new platforms)
will be provided in future 8.4.x releases.
New features will be added in the 8.5.x development series.
The details of each version are described below.
Version 8.4.9
Release Notes:
- HTCondor version 8.4.9 released on September 29, 2016.
New Features:
- Increased the maximum number of unique attributes that can be
set by the condor_chirp command set_job_attr_delayed from
50 to 100, and added the configuration knob CHIRP_DELAYED_UPDATE_MAX_ATTRS.
See section 3.3.12 for more information.
(Ticket #5891).
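As an illustration of the limit described above, the following Python sketch models a cap on the number of unique delayed attributes. It is not HTCondor's implementation: the class DelayedAttrBuffer and its method are hypothetical, and rejecting an update once the cap is reached is an assumption, since the release note does not say what happens in that case. Only the value 100 and the knob name CHIRP_DELAYED_UPDATE_MAX_ATTRS come from the note.

```python
class DelayedAttrBuffer:
    """Toy model of a per-job buffer of delayed attribute updates.

    Hypothetical class for illustration; not part of HTCondor.
    """

    def __init__(self, max_attrs=100):
        # max_attrs plays the role of CHIRP_DELAYED_UPDATE_MAX_ATTRS.
        self.max_attrs = max_attrs
        self.pending = {}

    def set_delayed(self, name, value):
        """Queue an update; return False if the unique-name cap is hit."""
        key = name.lower()  # ClassAd attribute names are case-insensitive
        if key not in self.pending and len(self.pending) >= self.max_attrs:
            return False
        self.pending[key] = value
        return True


buf = DelayedAttrBuffer(max_attrs=100)
results = [buf.set_delayed(f"Attr{i}", i) for i in range(101)]
overwrite_ok = buf.set_delayed("attr0", 42)  # existing name, still accepted
```

In this model the cap counts unique names, so updating an attribute that is already queued succeeds even when the buffer is full.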
Bugs Fixed:
- Fixed a bug where if the condor_startd crashed while running a
Docker universe job, the job would be left running and not removed when
the condor_startd restarted. The condor_startd now removes any
orphaned Docker universe jobs on restart.
(Ticket #5858).
- Fixed a bug that printed spurious locking-related warnings to
the StarterLog when running Docker universe jobs.
(Ticket #5876).
- The Job Router and HTCondor-C now properly send a RESCHEDULE
command to the condor_schedd after submitting a job.
(Ticket #5903).
- Fixed bugs in the Job Router that could cause a routed job to be
aborted if the UPDATE_JOB_INFO hook printed attributes to be set in
the job ad.
(Ticket #5899).
- The Job Router now uses the correct name for the configuration
parameter for the JOB_FINALIZE hook.
Previously, the Job Router used the name JOB_EXIT, counter to what was
documented.
(Ticket #5802).
- Updated systemd configuration to start HTCondor after NIS has started.
(Ticket #5814).
- Updated systemd configuration to start HTCondor after local LDAP name
service daemon has started.
(Ticket #5836).
- Updated systemd configuration to attempt restart of HTCondor daemons
after 1 minute.
(Ticket #5836).
- In the RPM packages, move the systemd tmpfiles configuration file to
the recommended directory (/usr/lib/tmpfiles.d).
(Ticket #5896).
- Fixed a bug introduced in 8.4.5 that caused configuration variables starting with
STARTD. or STARTER. to be ignored.
(Ticket #5861).
- Fixed a typo in the desired value of rmem_max in the Linux kernel
tuning script. Improved logging of the Linux kernel tuning script by including
the name of the file being changed (or left unchanged).
(Ticket #5829).
- Fixed a bug that could cause the condor_master to crash after
restarting the condor_shared_port daemon.
(Ticket #5801).
- Fixed a bug that could cause the wrong dynamic slots to be preempted
for a match when ALLOW_PSLOT_PREEMPTION is set to True.
(Ticket #5748).
- Fixed a bug in cream_gahp that caused it to delegate
RFC-format X.509 proxies incorrectly to the CREAM service.
(Ticket #5773).
- Fixed a bug where the Windows version information was set to
a single value for multiple programs.
This resulted in crash boxes for most of the HTCondor tools being reported
as a crash of condor_gpu_discovery.
(Ticket #5795).
- Fixed a bug whereby the condor_collector process would exit with
an error several times per hour if the configuration knob NO_DNS
is set to True.
(Ticket #5762).
- Fixed monitoring of memory and CPU usage of running jobs on Mac OS X.
This monitoring didn't work for a personal installation of HTCondor.
With Mac OS X 10.11 and above, this monitoring resulted in a flood of
error messages to the system logs for a root-based installation.
(Ticket #5777).
- Fixed a bug when attempting to authenticate using multiple
methods wherein if a method failed, the remaining methods were not
always attempted.
(Ticket #5674).
- Fixed a bug where condor_userprio may fail to display
the correct priority factor value for a user associated
with a group.
(Ticket #5848).
- Fixed a bug that can cause the condor_procd to crash.
Fixed a bug that prevented other daemons from talking to the
condor_procd when it is restarted after a crash.
(Ticket #5863).
- If the condor_procd crashes, the condor_master now tries to
restart it several times. Previously, only one restart attempt was made.
(Ticket #3655).
- Fixed a bug that resulted in the condor_starter crashing when
attempting to run a BOINC backfill job.
(Ticket #5862).
- Fixed a bug in the configuration language where an if defined
test would reject a valid variable name when it had both an underscore and a digit.
(Ticket #5914).
- Fixed a bug that caused the condor_ssh_to_job command to fail
when using the HTCondor RPM installation.
(Ticket #5591).
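The authentication fix in the list above (Ticket #5674) concerns falling through to the next method when one fails. A minimal Python sketch of that control flow follows. The functions authenticate and fake_attempt are hypothetical and do not reflect HTCondor's internal code; the method names are used only as labels.

```python
def authenticate(methods, attempt):
    """Try each method in order; return the first that succeeds, else None.

    `attempt` is a caller-supplied callable (hypothetical) that returns True
    on success and either returns False or raises on failure.
    """
    for method in methods:
        try:
            if attempt(method):
                return method
        except Exception:
            # A failing method must not stop the remaining ones from running.
            pass
    return None


def fake_attempt(method):
    # Stand-in for a real handshake: one method errors, one is refused.
    if method == "KERBEROS":
        raise RuntimeError("no credentials")
    return method == "FS"


chosen = authenticate(["KERBEROS", "SSL", "FS"], fake_attempt)
```

The point of the sketch is that both kinds of failure, an exception and a plain refusal, lead to the next method being tried.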
Version 8.4.8
Release Notes:
- HTCondor version 8.4.8 released on July 5, 2016.
New Features:
Bugs Fixed:
- Fixed a memory leak in the condor_q client code that impacted
users of the Python API call htcondor.Schedd().query().
(Ticket #5727).
- Fixed a bug that caused file transfers to fail when using Bosco.
(Ticket #5710).
- Fixed a bug that could cause the condor_schedd to crash when using
SCHEDD_CRON_JOBLIST.
(Ticket #5715).
- The condor_schedd now rejects job submissions when the job owner
doesn't have a user account on the machine.
Previously, the condor_schedd would accept such jobs and then fail
to run them.
(Ticket #5734).
- Fixed a bug introduced in the 8.4.7 release that resulted in the remote
condor_history command failing unless the -limit argument is used.
(Ticket #5735).
- Fixed a bug in condor_history that caused it to treat all unrecognized arguments
as user names.
(Ticket #5706).
- The high-availability daemon now properly detects changes to the
HAD_LIST when reconfigured.
(Ticket #5753).
- The high-availability daemon now properly internalizes the
HAD_LIST when reconfigured.
(Ticket #5754).
- Fixed a bug that caused the condor_master to stop responding after it
restarted a child daemon when shared port is enabled on Windows. This bug could also
result in a hang on shutdown.
(Ticket #5713).
- Fixed a bug that could cause condor_status or condor_q to
crash when the -xml option is used.
(Ticket #5718).
- Fixed a bug introduced in the 8.4.7 release that resulted in a parse error from condor_submit
when JobAdInformationAttrs was set in the configuration variable SUBMIT_ATTRS.
(Ticket #5720).
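The change to reject submissions from owners without a local account (Ticket #5734) amounts to validating early rather than failing later. The following Python sketch shows one way such a check could look on a Unix system. The function names are hypothetical, and the use of pwd.getpwnam is an assumption of this sketch, not a description of how the condor_schedd performs the lookup.

```python
import pwd


def owner_has_account(owner, lookup=pwd.getpwnam):
    """Return True if `owner` resolves to a local account.

    `lookup` defaults to pwd.getpwnam, which raises KeyError for
    unknown names; it is injectable so the check can be tested.
    """
    try:
        lookup(owner)
    except KeyError:
        return False
    return True


def accept_submission(owner, lookup=pwd.getpwnam):
    """Reject at submit time instead of accepting a job that cannot run."""
    if not owner_has_account(owner, lookup):
        raise PermissionError(f"no user account for job owner {owner!r}")
    return True
```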
Version 8.4.7
Release Notes:
- HTCondor version 8.4.7 released on June 6, 2016.
New Features:
- Docker universe jobs now drop all Linux capabilities by default.
The new knob DOCKER_DROP_ALL_CAPABILITIES, which defaults to true,
can be set to false to revert to the old behavior.
(Ticket #5679).
- Added configuration variable MAX_TIME_SKIP to control
how much system clock skip is allowed before the HTCondor daemons
restart. See Section 3.3.4 for more information.
- On Linux, HTCondor appropriately tunes kernel parameters
root_maxkeys and root_maxbytes to prevent condor_master
startup failures on older Linux kernels.
(Ticket #5671).
- The configuration variable SUBMIT_ATTRS now understands the +Attr
syntax that condor_submit uses to inject attributes directly into the job ClassAd.
(Ticket #5694).
- The condor_submit variable job_lease_duration can now be an
expression.
(Ticket #5694).
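To make the idea behind MAX_TIME_SKIP concrete, here is a small Python sketch of detecting a clock jump by comparing the observed wall-clock change with the time that was expected to pass. The function clock_skipped is hypothetical, and the threshold of 1200 seconds is an illustrative value chosen for the sketch; see Section 3.3.4 for the actual semantics.

```python
def clock_skipped(prev_wall, now_wall, expected_elapsed, max_time_skip):
    """Return True if the wall clock moved by more than the allowed skip.

    prev_wall, now_wall: wall-clock readings in seconds (e.g. time.time()).
    expected_elapsed: seconds that should have passed, e.g. a timer interval
    or a monotonic-clock difference.
    max_time_skip: tolerated discrepancy, in the role of MAX_TIME_SKIP.
    """
    observed = now_wall - prev_wall
    return abs(observed - expected_elapsed) > max_time_skip


# The threshold of 1200 seconds is an illustrative value for this sketch.
small_drift = clock_skipped(1000.0, 1062.0, 60.0, 1200)
forward_jump = clock_skipped(1000.0, 5000.0, 60.0, 1200)
backward_jump = clock_skipped(5000.0, 1000.0, 60.0, 1200)
```

A drift of a couple of seconds stays under the threshold, while a jump of about an hour in either direction exceeds it.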
Bugs Fixed:
- All $function macro substitutions in configuration files
will now correctly handle variables with subsystem and localname prefixes
as well as self references. In particular,
VAR = $F(VAR)
now substitutes
correctly rather than hanging forever.
(Ticket #5565).
- Fixed a bug in Docker universe where the job would
not run with the correct group id.
(Ticket #5649).
- Fixed a performance problem in the condor_schedd that could
cause it to become unresponsive for several minutes after the
set of significant attributes for negotiation changes.
(Ticket #5648).
- Fixed a bug where the Python bindings ClassAd parser would fail to detect
whether old or new format ClassAds were present in a stream, even though the
ClassAd format was specified in advance.
(Ticket #5643).
- Fixed a bug where some floating point values would have an extra
.0 appended to the end when printed (e.g. 2E40.0).
These values could not be read properly by normal number parsing functions.
(Ticket #5682).
- When using GRIDMANAGER_SELECTION_EXPR, grid ads from
different condor_gridmanager instances will no longer overwrite
each other in the condor_collector.
(Ticket #5683).
- In addition to logging to the file KERNEL_TUNING_LOG,
the default LINUX_KERNEL_TUNING_SCRIPT now also logs to
syslog and /etc/systcl.d/99-htcondor.conf.
(Ticket #5489).
- Fixed a bug in condor_history that could result in truncation of
the job id field.
(Ticket #5527).
- On Windows, configuring HTCondor to restrict the range of outbound
port numbers may cause substantial delays when using the command-line
tools. Because of this cost, LOWPORT
and HIGHPORT no longer restrict the port numbers of outbound
connections on Windows. If you still require this functionality, use
OUT_LOWPORT and OUT_HIGHPORT.
(Ticket #4711).
- Fixed a bug that would cause condor_submit to create extra, incorrectly named
output and error files when $$ substitution is used as part of the filenames.
(Ticket #2720).
- Fixed a bug that would cause the condor_history_helper to be invoked using
the wrong name on Windows.
(Ticket #5656).
- Fixed a bug that would sometimes cause configuration variables with a subsystem prefix
to be ignored.
(Ticket #5310).
- Fixed a bug that could cause HAD to fail if a machine has an IPv6
address.
(Ticket #5659).
- Fixed bugs in condor_history when fetching history from a remote condor_schedd. The
bugs caused complete failure when the remote condor_schedd was running Windows, and
would corrupt some string values when the remote condor_schedd was running any other operating system.
(Ticket #5701).
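The floating point fix in the list above (Ticket #5682) is about text such as 2E40.0, which ordinary number parsers reject. The following Python sketch shows one way to print a value so that it still looks like a real number and parses back. The function format_real is hypothetical, and the rule it applies (append ".0" only to integer-looking text) is an assumption of the sketch rather than the rule HTCondor uses.

```python
def format_real(x):
    """Format a float so it still reads as a real number and parses back.

    Uses %.15g and appends ".0" only when the text looks like an integer.
    """
    text = "%.15g" % x
    if text.lstrip("-").isdigit():
        text += ".0"
    return text


def parses(text):
    """Return True if Python's float() accepts the text."""
    try:
        float(text)
    except ValueError:
        return False
    return True


broken_form_parses = parses("2E40.0")        # the malformed style from the bug
exponent_form = format_real(2e40)            # keeps the exponent, no ".0"
integer_like = format_real(3.0)              # gains ".0" so it reads as a real
```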
Version 8.4.6
Release Notes:
- HTCondor version 8.4.6 released on April 21, 2016.
New Features:
- condor_advertise -multiple now tolerates multiple blank lines in the
input file. It no longer quits parsing on the first blank line that does not
follow a valid ClassAd.
(Ticket #5147).
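The blank-line tolerance described above can be sketched as a small splitter that treats any run of blank lines as a single separator. This Python example is an illustration only: the function split_ads is hypothetical, the sample attributes are made up, and the sketch does not parse ClassAd syntax.

```python
def split_ads(text):
    """Split input into ads (lists of lines) separated by blank lines.

    Any run of blank lines acts as one separator, and leading or
    trailing blank lines are ignored.
    """
    ads, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            ads.append(current)
            current = []
    if current:
        ads.append(current)
    return ads


# Made-up input with leading blank lines and three blank lines between ads.
sample = '\n\nMyType = "Generic"\nName = "a"\n\n\n\nMyType = "Generic"\nName = "b"\n'
ads = split_ads(sample)
```

A blank line that does not follow any ad content is simply skipped, so parsing continues to the next ad.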
Bugs Fixed:
- Fixed a bug where, when partitionable slots were
enabled in the condor_startd, a job would be unable
to start running on that machine in some cases.
(Ticket #5626).
- Fixed a bug that would cause the condor_startd
to crash when ALLOW_PSLOT_PREEMPTION was enabled.
(Ticket #5586).
- Fixed a bug introduced in version 8.3 that
removed the attribute REMOTE_GROUP_RESOURCES_IN_USE
from the job ad in the negotiator.
(Ticket #5593).
- Fixed a bug where HTCondor would regard as invalid text representations
of IPv6 addresses which were the longest possible. This bug typically
manifested as a failure to contact hosts which were advertising IPv6 addresses
of this sort.
(Ticket #5585).
- Fixed a memory leak in the condor_negotiator when
ALLOW_PSLOT_PREEMPTION was enabled.
(Ticket #5571).
- Fixed a bug where after a condor_schedd restart
the submitter attribute WEIGHTED_JOBS_RUNNING
would be incorrectly computed.
(Ticket #5637).
- Fixed a bug when using CLAIM_PARTITIONABLE_LEFTOVERS
and flocking.
Machines from a remote pool could be treated as if they were in the local
pool.
As a result, the RemotePool attribute would not be set in the ads
of jobs running on these machines, and the FlockedJobs and
RunningJobs attributes of submitter ads would have incorrect
values.
(Ticket #5577).
- Fixed a bug that could cause a job's supplemental groups to be set
incorrectly when SOFT_UID_DOMAIN is set to True.
(Ticket #5603).
- Fixed a bug that caused supplemental groups to be set incorrectly
when executing file transfer plugins and various hooks.
(Ticket #5600).
- Fixed a bug that resulted in Windows 10 being reported as
WindowsUnknown in the OPSYSNAME attribute of the condor_startd
ClassAd.
(Ticket #5575).
- Fixed a typo in the LIMIT_JOB_RUNTIMES policy metaknob
that prevented the policy from working as intended.
(Ticket #5307).
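For the IPv6 fix in the list above (Ticket #5585), it can help to see what the longest textual forms look like. The Python sketch below builds them and checks them with the standard ipaddress module. Which of these lengths triggered the bug is not stated in the note, so the examples are assumptions, and the helper is_valid_ipv6 is hypothetical.

```python
import ipaddress


def is_valid_ipv6(text):
    """Return True if `text` parses as an IPv6 address."""
    try:
        ipaddress.IPv6Address(text)
    except ValueError:
        return False
    return True


# Eight groups of four hex digits plus seven colons: 39 characters.
longest_hex = ":".join(["ffff"] * 8)
# The mixed form with a dotted-quad tail can reach 45 characters.
longest_mixed = "ffff:ffff:ffff:ffff:ffff:ffff:255.255.255.255"
```

Any validation or buffer sizing has to accommodate these full-length forms as well as the shortened "::" notation.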
Version 8.4.5
Release Notes:
- HTCondor version 8.4.5 released on March 22, 2016.
New Features:
- The default for DAGMAN_LOG_ON_NFS_IS_ERROR has
been changed from True to False. This is the result
of changes in the 8.3 series that mean that file locking is no
longer required on user logs.
(Ticket #5516).
Bugs Fixed:
- Fixed a bug where HTCondor would unconditionally retry non-successful
DNS lookups of the local system's hostname; this could cause delays of up
to sixty seconds when using command-line tools on systems whose hostname
was not in DNS. We no longer retry on errors at all, and only retry
failures which are temporary.
(Ticket #5553).
- Fixed a bug that would cause condor_schedds flocking to remote
pools to send no jobs, or fewer jobs than possible, to the
remote pool. This was a result of not correctly setting
the submitter attribute WeightedJobsRunning for
flocked pools.
(Ticket #5539).
- Accounting group names that contain spaces are now rejected by
condor_submit and ignored by the condor_negotiator.
Previously, submitting a job with an accounting group name that contained
a space would cause the condor_negotiator to fail at startup.
(Ticket #5538).
- Fixed a bug whereby per-job history files (enabled by
the configuration setting PER_JOB_HISTORY_DIR) may briefly
appear to be empty or incomplete.
(Ticket #5562).
- Fixed a bug whereby ClassAds written into history files
may contain the same attribute multiple times.
(Ticket #5548).
- Fixed a bug that caused DAGMan to not work correctly with
some local universe node jobs. (This bug was introduced in version
8.3.0.)
(Ticket #5299).
- Fixed a bug that resulted in jobs managed by the condor_job_router
not reporting memory and disk usage of the job correctly.
(Ticket #5552).
- Reworked a bug fix from the 8.4.3 release, which was designed to allow
more than 100 dynamic slots, to be a bit more generous in allocating Disk to
those slots.
Now, those slots are less prone to fail to match subsequent jobs.
(Ticket #5535).
- Fixed a bug in the randomization of ports within the LOWPORT to HIGHPORT range
that would sometimes generate ports outside of this range on Windows.
(Ticket #5555).
- Fixed a bug in condor_off -peaceful that could result in never
sending the "off" command to machines when at least one of the machines could
not be contacted when sending the previous "peaceful" command.
(Ticket #5504).
- When cgroups are in use, limit the amount of file system cache in the
kernel to prevent the OOM killer from killing jobs that use a large amount of
file system cache.
(Ticket #5500).
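The DNS fix in the list above (Ticket #5553) distinguishes temporary lookup failures from permanent ones. A minimal Python sketch of that policy follows, retrying only when the resolver reports EAI_AGAIN. The function resolve_with_retry, its resolver parameter, and the retry count are all hypothetical choices for this illustration; a real implementation would likely also wait between attempts.

```python
import socket


def resolve_with_retry(hostname, resolver=socket.getaddrinfo, max_tries=3):
    """Resolve `hostname`, retrying only on temporary failures.

    `resolver` is injectable for testing; `max_tries` is an arbitrary
    value chosen for this sketch.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return resolver(hostname, None)
        except socket.gaierror as err:
            temporary = err.errno == socket.EAI_AGAIN
            if not temporary or attempt == max_tries:
                raise
```

With this policy, a host name that simply does not exist fails on the first attempt instead of incurring the full retry delay.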
Version 8.4.4
Release Notes:
- HTCondor version 8.4.4 released on February 4, 2016.
New Features:
Bugs Fixed:
- Fixed a bug that caused the condor_collector to crash if
CONDOR_DEVELOPERS_COLLECTOR failed to resolve.
(Ticket #5492).
- Fixed a bug that caused HTCondor-C jobs to fail when
JobLeaseDuration was set to less than one hour (3600 seconds).
The remote job would be aborted due to the job lease not being renewed.
(Ticket #5446).
- Fixed a bug that could cause HTCondor to misreport an EC2 job as running
when it had in fact been purged from the service. Fixed bugs that could
cause a running EC2 job to be misreported as idle. HTCondor also no longer
sends EC2 services superfluous queries. (This may increase the latency
of HTCondor job status updates.)
(Ticket #4568).
- The grid manager now aborts if the GAHP hangs, which we detect by
the absence of a response after GRIDMANAGER_GAHP_RESPONSE_TIMEOUT
seconds. The EC2 GAHP now performs many fewer memory allocations in the
course of normal operations, which improves stability on some systems.
(Ticket #5442).
- Fixed a bug that caused the condor_master to fail if a shared port
daemon address file written by a version of HTCondor prior to 8.4.0
is present.
(Ticket #5488).
- Fixed a bug that caused updates to the job attribute
TimerRemove to not be respected while the job was being managed
by the condor_shadow, condor_gridmanager, or the Job Router.
(Ticket #5470).
- Fixed a bug where the job policy expression of a job could appear
in one of the Reason attributes of another job.
(Ticket #5466).
- Fixed a bug on the Windows platform that would cause
the condor_shadow to hang while trying to delete old shadow logs when the
value of MAX_NUM_SHADOW_LOG was larger than the default value of 1.
This bug would also sometimes result in the condor_schedd hanging.
(Ticket #5499).
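The hang detection mentioned in the list above (Ticket #5442) is based on the absence of a response for a configured number of seconds. The following Python sketch models that idea with explicit timestamps. The class ResponseWatchdog is hypothetical, and the timeout of 20 seconds is an arbitrary value for the example, not a documented default.

```python
class ResponseWatchdog:
    """Flags a helper process as hung after `timeout` seconds of silence.

    Hypothetical class; `timeout` plays the role of
    GRIDMANAGER_GAHP_RESPONSE_TIMEOUT.
    """

    def __init__(self, timeout, now):
        self.timeout = timeout
        self.last_response = now

    def record_response(self, now):
        # Call whenever the helper produces any response.
        self.last_response = now

    def is_hung(self, now):
        return now - self.last_response > self.timeout


dog = ResponseWatchdog(timeout=20, now=0.0)
dog.record_response(now=5.0)
alive = not dog.is_hung(now=25.0)   # exactly 20 s of silence: not yet hung
hung = dog.is_hung(now=25.5)
```

Passing the current time in explicitly keeps the sketch deterministic; production code would typically read a monotonic clock instead.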
Version 8.4.3
Release Notes:
- HTCondor version 8.4.3 released on December 16, 2015.
New Features:
Bugs Fixed:
- Fixed a bug that caused the -append option to be handled too
late to apply to the first Queue statement in a condor_submit file.
(Ticket #5414).
- Fixed a bug that prevented running more than 100 slots on a single
condor_startd with partitionable slots.
(Ticket #5398).
- Fixed a bug which caused ec2_iam_profile_name
not to work for Spot instances.
(Ticket #5410).
- Fixed a bug where the cgroup VM limit would not be set for sizes over
2 gibibytes.
(Ticket #5434).
- Fixed bugs that prevented the HTCondor daemons from working promptly at
startup when the condor_shared_port daemon was in use on Windows platforms.
(Ticket #5283).
(Ticket #5430).
(Ticket #5431).
(Ticket #5432).
(Ticket #5433).
- Added SELinux type enforcement rules to allow the condor_schedd
to use sendmail on Enterprise Linux 7 platforms.
(Ticket #5418).
- Fixed a bug where the HTCondor service would not start if the
condor_master.pid file was empty on Linux platforms.
(Ticket #5427).
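The last item above (Ticket #5427) involves an empty condor_master.pid file blocking startup. One defensive way to read a PID file is sketched below in Python: a missing, empty, or malformed file is treated as "no PID recorded". The function read_pid and the file names other than condor_master.pid are hypothetical, and this is not a description of the actual startup script.

```python
import os
import tempfile


def read_pid(path):
    """Return the PID stored in `path`, or None if the file is missing,
    empty, or does not contain a positive integer."""
    try:
        with open(path) as handle:
            text = handle.read().strip()
    except FileNotFoundError:
        return None
    if not text.isdigit() or int(text) == 0:
        return None
    return int(text)


with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "condor_master.pid")
    open(empty, "w").close()                      # zero-length PID file
    empty_result = read_pid(empty)
    missing_result = read_pid(os.path.join(tmp, "absent.pid"))
    valid = os.path.join(tmp, "valid.pid")
    with open(valid, "w") as handle:
        handle.write("4242\n")
    valid_result = read_pid(valid)
```

A caller that receives None can proceed as if no daemon is running instead of aborting.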
Version 8.4.2
Release Notes:
- HTCondor version 8.4.2 released on November 17, 2015.
New Features:
- condor_history no longer reports an error when run on a system that does
not have a history file.
This change was made because the history file is not created until after the
first job runs.
So, users were always seeing an error message on a fresh installation of
HTCondor.
(Ticket #5374).
Bugs Fixed:
- Fixed a bug introduced in 8.4.1 that could cause the condor_schedd
to exit.
This affected remote submit, HTCondor-CE, and HTCondor-C.
(Ticket #4522).
- The TCP_FORWARDING_HOST is now honored by
HTCondor client programs.
(Ticket #5339).
- Fixed a problem where Standard Universe jobs could not restart
from a checkpoint in the Enterprise Linux 6 RPM distribution.
(Ticket #5382).
(Ticket #5383).
- Fixed bugs in the function of the DAGMan
DAGMAN_MAX_JOBS_IDLE/-maxidle throttle,
especially for node jobs that create multiple procs.
(Ticket #5333).
- Fixed a problem where the RPMs would claim to publicly provide
Globus shared libraries that are in a private location.
(Ticket #5349).
- Added a default request_memory for condor_submit -interactive
of 512 megabytes. Formerly, the default was one, which is
insufficient in environments that strictly enforce memory
usage.
(Ticket #5344).
- Fixed a problem where the condor_classad RPM would claim to
provide a replacement for the classad RPM in EPEL.
(Ticket #5400).
- HTCondor now applies the configuration settings
GRIDMANAGER_GAHP_CALL_TIMEOUT and
GRIDMANAGER_CONNECT_FAILURE_RETRY_COUNT
when running grid universe jobs for EC2 or Google Compute Engine.
(Ticket #5300).
- Fixed a crash in the condor_schedd that happened when the
schedd was under load and being shut down in fast mode.
(Ticket #5371).
- Added a timeout to the condor_fetchlog command so that it
will not hang forever waiting for an unresponsive daemon.
(Ticket #5325).
- Fixed a problem that prevented HTCondor from building on some 64-bit Linux
platforms such as Arm64.
This was reported by Debian maintainers as their Bug 804386.
(Ticket #5380).
- Fixed a problem where the platform string was incorrect in the RPM
packages.
(Ticket #5384).
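The idle throttle item above (Ticket #5333) mentions node jobs that create multiple procs. The Python sketch below illustrates why such nodes matter: if the limit is checked before each node submission, the idle count has to grow by the number of procs in the node, not by one. The function nodes_to_submit is hypothetical, and both the check-before-submit behavior and the treatment of 0 as "no limit" are assumptions of this sketch.

```python
def nodes_to_submit(ready_nodes, idle_procs, max_idle):
    """Pick nodes to submit without starting while at or over the limit.

    ready_nodes: list of (node_name, procs_per_submit) tuples.
    idle_procs: idle procs currently in the queue for this DAG.
    max_idle: limit in the role of DAGMAN_MAX_JOBS_IDLE; 0 means no limit.
    """
    chosen = []
    for name, procs in ready_nodes:
        if max_idle and idle_procs >= max_idle:
            break
        chosen.append(name)
        # Count every proc the node creates, not one per node.
        idle_procs += procs
    return chosen, idle_procs


ready = [("A", 1), ("B", 5), ("C", 1)]
chosen, idle_after = nodes_to_submit(ready, idle_procs=2, max_idle=4)
```

In this example node B is submitted because the count is still below the limit, its five procs push the count past the limit, and node C is held back.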
Known Issues:
- The DAGMan workflow log file is not correctly written for local
universe DAG node jobs that have no log file specified in the submit file,
which causes DAGMan to wait forever, thinking the jobs have not completed.
Note that this problem can be worked around by specifying any
log file for the job, even log = /dev/null.
(This bug is a regression that was introduced some time after version
8.2.4.)
(Ticket #5299).
- DAG node retries do not work correctly with DAG node submit files
that create more than one proc in the resulting cluster (such nodes
cause DAGMan to hang if the retry is activated).
We believe that this bug has existed since DAGMan first supported
multi-proc node jobs.
(Ticket #5350).
Version 8.4.1
Release Notes:
- HTCondor version 8.4.1 released on October 27, 2015.
Known Issues:
- Remote submit to an 8.4.1 condor_schedd is broken if file transfer is
used. This also means HTCondor-CE and HTCondor-C are broken. This bug will
be fixed in version 8.4.2.
(Ticket #4522).
- TCP_FORWARDING_HOST is disregarded by HTCondor clients
starting in version 8.3.6. This bug will be fixed in versions 8.4.2 and 8.5.1.
(Ticket #5339).
New Features:
- Added support that allows an administrator to always volume mount
certain directories into Docker universe containers running
on a host.
(Ticket #5308).
- Added four policy metaknobs to simplify configuring a policy
to either preempt or hold jobs that use more memory
or CPU cores than provisioned in the slot. See the POLICY
category of metaknobs in section 3.3.1 for
additional information.
(Ticket #5250).
- Added configuration variables and documentation so that we uniformly prefer
<var>_ATTRS over <var>_EXPRS but support both. This includes
STARTD_ATTRS, STARTD_JOB_ATTRS and SUBMIT_ATTRS
which are often used by HTCondor sites which customize the configuration. These
configuration variables are now exclusively for use by HTCondor administrators.
The former default values for these variables have been moved into other configuration
which is reserved for use by HTCondor developers. This is done to prevent administrators
from accidentally removing the necessary defaults.
A warning about use of STARTD_EXPRS has been disabled unless
STARTD_ATTRS or SLOT_TYPE_<n>_STARTD_ATTRS is also used, since
the use of all three of these at the same time is not supported.
(Ticket #5326).
- When condor_reconfig and condor_restart are run as root,
they will check to see if the condor user has read access to all of the
configuration files before sending the command. This is done to prevent aborting the daemons
accidentally by sending reconfig after the admin creates a new config file and
forgets to give the condor user read access to that file.
(Ticket #4506).
- Added the -natural sort option to condor_status to sort the slots
in numerical order rather than alphabetical order.
(Ticket #5131).
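The difference between alphabetical and numerical ordering of slot names, as addressed by the -natural option above, can be shown with a short Python sketch. The function natural_key and the slot names are made up for the example, and the sketch does not claim to reproduce the exact ordering rules of condor_status.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks for numeric-aware sorting."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


# Made-up slot names for illustration.
slots = ["slot10@node1", "slot2@node1", "slot1@node1", "slot1@node10", "slot1@node2"]
alphabetical = sorted(slots)
natural = sorted(slots, key=natural_key)
```

With plain string comparison, slot10 sorts ahead of slot2; with the numeric-aware key, digit runs are compared as integers, so slot2 comes before slot10.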
Bugs Fixed:
Version 8.4.0
Release Notes:
- HTCondor version 8.4.0 released on September 14, 2015.
New Features:
Bugs Fixed:
- Fixed a bug introduced in HTCondor version 8.3.7 that caused the
condor_shared_port daemon to leak file descriptors.
Also made HTCondor work better when some HTCondor daemons
are using shared port, but the condor_master is not.
(Ticket #5259).
- The condor_starter lowers the OOM (out of memory) score of jobs
so the OOM killer is more likely to choose an HTCondor job rather than
an HTCondor daemon or other user process.
(Ticket #5249).
- Job submission fails if X.509 certificates are advertised with EC2
grid universe jobs.
Therefore EC2 grid universe jobs no longer advertise their access keys.
(Ticket #5252).