11.4 Stable Release Series 8.6
This is a stable release series of HTCondor.
As usual, only bug fixes (and potentially, ports to new platforms)
will be provided in future 8.6.x releases.
New features will be added in the 8.7.x development series.
The details of each version are described below.
- HTCondor version 8.6.10 released on March 13, 2018.
- Fixed a bug that caused condor_preen to crash before it finished
cleaning the spool directory, leaving a core file of its own in the log directory.
This problem occurred on submit nodes that had running jobs when
condor_preen was invoked.
- Improved the systemd configuration to clean up HTCondor processes
on shutdown in the event that the condor_master fails to do so.
- HTCondor daemons will do fast shutdown whenever their parent process
- Fixed a bug that would cause condor_q to crash if the hostname
was longer than 64 bytes.
- Fixed a bug where if an administrator configured a Concurrency Limit
whose name ended in a number, condor_userprio -allusers would show
additional bogus user entries.
- Fixed a bug where the condor_starter would crash when talking
to a shadow running a condor version older than 8.5 and match authentication
- Fixed a bug in Python API htcondor.Secman().ping() method which
would sometimes result in a RunTimeError exception.
- Fixed a bug where policy: want_hold_if would always
evict standard universe jobs instead of putting them on hold. Instead,
this policy now ignores standard universe jobs entirely. This means
that the metaknobs policy: hold_if_memory_exceeded and
policy: hold_if_cpus_exceeded also ignore
standard universe jobs entirely (instead of their previous bad behavior
of letting standard universe jobs use more than their requested
memory until the first time they were evicted, after which each restart
would be immediately evicted).
- The metaknobs policy: hold_if_memory_exceeded and
policy: preempt_if_memory_exceeded now ignore VM universe
jobs, which cannot exceed their requested memory.
- Fixed a bug which mischaracterized the MemoryUsage of VM
universe jobs. This should allow VM universe jobs to run when
feature: Hold_If_Memory_Exceeded is enabled.
- Fixed a bug where the condor_shadow could accidentally kill itself
by not checking if it was attempting to change immutable attributes.
- Fixed a bug that could cause the condor_collector to exit with an
assertion error under certain (rare) conditions when it has no
outgoing connectivity to the Internet.
- Fixed a bug that would cause any daemons interfacing with the CREDMON to
retry indefinitely when polling for credentials.
- Fixed a bug that prevented grid-type batch jobs from being removed
after an attempt to submit to the underlying batch system failed.
- Fixed a bug in Python plugin support for the condor_collector that would result
in the condor_collector switching from writing to the CollectorLog to writing to
the ToolLog after a reconfig.
- Fixed a bug in the
$F() macro expansion in submit and configuration files
that would cause a crash if the argument to the macro was a file literal rather than
a variable name.
- Fixed a bug that allowed the condor_schedd to attempt to run jobs
on a dynamic slot that requested more resources than the slot provided.
- HTCondor version 8.6.9 released on January 4, 2018.
- When a daemon crashes, more information about the cause is now
written to its log file.
- Fixed a bug in the group quotas that would give too much surplus
quota to some groups when ACCEPT_SURPLUS is on and
NEGOTIATOR_ALLOW_QUOTA_OVERSUBSCRIPTION is true (the default).
- Fixed a bug in the Python bindings when doing queries that specify a
projection with the ``attr_list'' argument. The bug could
potentially result in memory corruption of the Python interpreter process.
- Reduced the amount of time that condor_preen will block the
condor_preen now connects only when specifically needed,
and automatically disconnects after
- Fixed a bug on Windows that would often result in the job sandbox
on the execute node not being deleted when the condor_schedd relinquished
its claim on the slot before the condor_starter had exited.
- Fixed a bug where the condor_master stopped sending watchdog
notifications to systemd after restarting itself.
This resulted in systemd killing the condor_master shortly after
- Updated the systemd configuration to only restart HTCondor upon
failure. Otherwise, systemd would restart HTCondor if condor_off
requested the condor_master to exit.
- Fixed a bug with the use of the scheduler parameter
MAX_JOBS_SUBMITTED. If this limit was ever reached by a
submit with more than one proc in the cluster, the limit would be
reduced by the difference until the condor_schedd was restarted.
- Fixed a bug that caused very large RequestDisk requests
to fail, and cause the Disk attribute in the machine ad to go
- Fixed a bug with the RESERVED_DISK parameter that would not
accept an argument larger than 2 Gigabytes.
- Improved validation of the lengths of messages in PASSWORD
and SSL authentication methods.
- Fixed a problem where the VM universe would be taken offline
on the execute node if the qcow2 disk image was corrupt.
The offending job is now put on hold with an appropriate hold message.
- Fixed a problem which would prevent Java universe jobs from working
when using a relative path name to a jar file and submitting from Linux to
Windows or vice versa.
- Fixed a bug on 32-bit Linux systems that caused the starter to crash
on startup if cgroup limits were enabled.
- Fixed a bug in Startd Cron (see 4.4.3)
where, in effect, SlotMergeConstraint was ignored.
- Fixed a bug when IPv6 is enabled which could cause the
condor_startd to crash when spawning a starter.
- Fixed a bug in condor_q which could cause the DONE amount to be
incorrect when multiple clusters shared a batch name.
- Fixed an issue on newer versions of Linux where core files generated
by a daemon were not usable by gdb.
A side effect of this fix is that the configuration parameter
CORE_FILE_NAME no longer has any effect on Linux.
- condor_chirp no longer aborts when given a command that it
cannot successfully execute, such as fetching a file that does not exist.
- Removed an unneeded copy_to_spool statement from the default
interactive submit file.
- HTCondor version 8.6.8 released on November 14, 2017.
- HTCondor version 8.6.7 released on October 31, 2017.
- Added support for HTTPS transfers in the curl_plugin utility.
- Job attributes that are recognized by the batch_gahp
but not by HTCondor can now be specified in the job ad without using
a prefix of Remote_.
- Fixed a bug that caused systems using cgroup memory limits to
not properly reset the memory limit after the first use of a slot. The memory
limit would get reused from the previous slot value.
- Added SELinux type enforcement rules to allow condor_ssh_to_job
to function on Enterprise Linux 7.
- Asking systemd to stop condor now allows the HTCondor daemons to properly
clean up, instead of simply immediately sending a SIGKILL. As a result,
HTCondor daemons stopped via systemd will no longer continue to appear
alive with condor_status.
- Fixed problems in the Python bindings when using the Submit class
to submit jobs specifying environment variables or file redirection.
- Changed the default value of STARTD_RECOMPUTE_DISK_FREE to false, so
that the Disk attribute is mostly correct for partitionable slots.
- Docker universe now sets the cgroup cpu-shares field to 100 times the
number of requested cores, which matches vanilla universe.
- MOUNT_UNDER_SCRATCH when used in Docker universe can now be an
expression, not just a literal string.
This matches the way it works in vanilla universe.
- Fixed a bug that could cause the condor_startd to crash when spawning
a condor_starter with mixed mode networking.
- Fixed a bug that caused the condor_collector on Windows to refuse
connections whenever the number of open sockets was more than 820 even though
space was allocated for 1024 open sockets.
- Fixed a bug that caused the configuration variable
DEFAULT_MASTER_SHUTDOWN_SCRIPT to be ignored on Windows when the
condor_master was running as a service.
- condor_status now prints longer lines when its output, rather than its
input, is redirected into a pipe.
- Fixed a crash in condor_transferer when a connection can't be
established with its peer.
- Fixed a bug that caused condor_job_router_info to crash if
configuration parameter JOB_ROUTER_ENTRIES_REFRESH was
set to a positive value.
- Fixed a bug in condor_history that caused it to print invalid
XML or JSON syntax when reading from multiple history files.
- Fixed a bug in the condor_schedd which resulted in the IsNoopJob
job attribute sometimes being ignored if the value of this attribute was
changed after the job was submitted.
- Fixed a bug that rarely caused Slurm jobs to be held.
When the memory utilization that Slurm reports is a multiple of 1024k,
Slurm uses the 'M' suffix.
The parsing logic was extended to also interpret the 'M', 'G', 'T', and 'P'
suffixes for memory utilization.
- The condor-bosco RPM ensures that rsync is installed, as required
by the Bosco scripts.
- To avoid unnecessary transfers when copy_to_spool is set in
the submit file, HTCondor no longer copies the executable to the
local spool directory more than once for a cluster.
- HTCondor version 8.6.6 released on September 12, 2017.
- Fixed a bug that might cause the condor_schedd or other daemons to
crash when logging on Linux to the syslog facility, and the condor_reconfig
command was run.
- Fixed a bug that prevented condor daemons from writing out a
core file for debugging in the very unlikely event that one of them
- Fixed a bug where the negotiator would make matches where the daemons
involved did not share an IP version, and thus could not talk to each other.
- HTCondor now works properly with systemd's watchdog feature on
all flavors of Linux.
Previously, the condor_master wouldn't send alive messages to systemd
if systemd wasn't part of the Linux distribution's standard configuration.
This would result in systemd killing the HTCondor daemons after a
short period of time.
- Fixed handling of backslashes in string values in the old ClassAd
format in the Python bindings.
- Fixed a bug in how the CPU usage of Slurm jobs is interpreted.
- Fixed a bug that caused a machine claimed by a parallel universe
job to stick in the Claimed/Idle state forever. This could only happen
if the job was removed as it was in the process of claiming resources.
- Fixed a bug that caused a machine to stick in the
Preempting/Vacating state after a job was removed
when a JOB_EXIT_HOOK was defined.
- Added type enforcement rules for cgroups to HTCondor's SELinux
- Fixed a bug where setting delegate_job_gsi_credentials_lifetime
to 0 in a submit description file was treated the same as not setting
it at all.
- Fixed handling of octal escape sequences in ClassAd strings.
- Updated Boost external to version 1.64.
- HTCondor version 8.6.5 released on August 1, 2017.
- Added avx2 to the set of processor flags advertised by the
- Fixed a bug in socket clean-up that was causing a memory leak. This
may have been particularly noticeable in the condor_collector.
- Fixed a bug that caused an infinite loop in the condor_starter when
cgroups were enabled on systems (such as Debian) where the kernel has disabled
the memory accounting controller. A job on such a system would go into the
"R" state, but never actually start running.
- Fixed a bug where setting NETWORK_INTERFACE to an
IPv6 address could cause HTCondor daemons to except.
- Fixed a bug where a cross protocol CCB connection would cause the
condor_shadow or condor_schedd to except.
- Fixed a bug where the wildcard '*' in ALLOW or DENY lists was
being interpreted as only matching IPv4 addresses. It now properly
matches any address family.
- Fixed a bug where reverse resolutions could return the string
representation of the address in question instead of failing. This
resulted in spurious warnings of the form "WARNING: forward resolution of
2001:630:10:f001::19a0 doesn't match 2001:630:10:f001::19a0!"
- Fixed a bug which prevented using an ImDisk RAM disk
as the execute directory on Windows.
- Fixed a bug where removal of a job could cause another job from
the same user to also be removed.
This was most likely to happen when the condor_schedd is under
- Fixed a bug that caused parallel universe jobs not to start on
pools with partitionable slots.
- Fixed a problem, introduced in HTCondor 8.6.4, where the
condor_collector plugins were loaded but not used.
- Fixed a bug where "condor_q -grid" did not display the
status column for any non-Globus job.
- Fixed bugs in the condor_schedd and condor_negotiator that
could cause jobs to not be negotiated for when
NEGOTIATOR_PREFETCH_REQUESTS is set to TRUE.
- HTCondor version 8.6.4 released on June 22, 2017.
- Fixed a bug with PASSWORD authentication that would sporadically cause
it to fail to exchange keys, due to whether or not the first round-trip of
communications blocked on reading from the network.
- Pslot preemption now properly handles machine custom resources,
such as GPUs.
- Fixed a bug that prevented HTCondor from correctly setting
virtual memory cgroup limits when soft physical memory limits
were being used.
- Fixed a bug that prevented parallel universe jobs that used
$$() expansion in submit files from running.
- Added a new knob, STARTD_RECOMPUTE_DISK_FREE, which defaults
to true and tells the condor_startd to periodically recompute and advertise free
disk space. Administrators can set this to false for partitionable slots whose execute
directory is used by HTCondor alone.
- Fixed a bug that could cause condor_submit to fail to submit a
job with a proxy file to a condor_schedd older than 8.5.8, due to the
absence of an X.509 CA certificates directory.
- Restored a check in condor_submit about whether the job's X.509
proxy has sufficient lifetime remaining.
- Fixed a bug in condor_dagman where the DAG status file showed an
incorrect status code if submit attempts failed for the final node.
- Bosco now properly identifies CentOS 7 as a supported platform.
- Fixed a bug when Bosco is used to submit jobs to multiple remote
clusters. When arguments to remote_gahp are provided in the
GridResource attribute, jobs could be submitted to the wrong cluster.
- To speed up the installation process on Enterprise Linux 7, the
SELinux profile is now reloaded only once, when setting the HTCondor
daemons to run in permissive mode.
- Updated the systemd configuration on Enterprise Linux 7 to start the
condor_master after time synchronization is achieved. This prevents
unnecessary daemon restarts due to sudden time shifts.
- The condor_shadow will now ignore updates of JobStartDate
from the condor_starter since the condor_schedd already sets this
attribute correctly and the condor_starter incorrectly tries to set it
even if the job has already run once. A consequence of this fix is that the
value of JobStartDate that the condor_startd uses for policy
expressions will be different from the value that the condor_schedd uses.
Resolving this problem could break existing policy expressions
in the condor_startd, so it will not be changed in the 8.6 series,
but will be fixed in the 8.7 series.
- Fixed a bug where per-instance job attributes like RemoteHost
would show up in the history file for completed jobs. This bug occurred if
a job happened to complete while the condor_schedd was in the process of a
- The condor_convert_history command is present again in this release.
- The parameter SETTABLE_ATTRS_ADMINISTRATOR now correctly
appears in condor_config_val.
- HTCondor version 8.6.3 released on May 9, 2017.
- Fixed a bug that rarely corrupts the condor_schedd's job queue
log file when the input sandbox of a job with an X.509 proxy file is
- Fixed a memory leak in the Python bindings related to logging.
- HTCondor version 8.6.2 released on April 24, 2017.
- Added metaknobs for defining map files for use with the ClassAd usermap function
in the condor_schedd, and a metaknob for automatically assigning an accounting group to
a job based on a mapping of the owner name of the job.
- When the condor_credd is polling for credentials, the timeout is now
configurable using CREDD_POLLING_TIMEOUT.
- The reverse option for condor_q was changed to reverse-analyze,
and it now implies better-analyze. Formerly, the reverse option was ignored
unless -better-analyze was also specified.
- Fixed a bug that could cause condor_store_cred to fail on
Windows due to a case-sensitive check of the user's account name.
- Updated Open MPI helper script to catch and handle SIGTERM and
to use bash explicitly.
- Docker universe jobs now update the RemoteSysCpu attribute in the job ad
and in the job log. Previously, this field was always 0.
- Docker universe detection is now more robust in the
face of extraneous output to standard error on docker startup.
This was preventing HTCondor from detecting that Docker was working
properly on hosts.
- Fixed a bug that prevented SUBMIT_REQUIREMENT and
JOB_TRANSFORM expressions from referencing job attributes
describing the job's X.509 proxy credential.
- The Linux kernel tuning script no longer adjusts some kernel parameters
unless a condor_schedd will be started by the master.
- Fixed a bug that caused all but the first in a list of metaknobs to be ignored
unless commas separated the list items. For example, use ROLE : Execute CentralManager
incorrectly added only the Execute role, whereas
use ROLE : Execute, CentralManager correctly added both roles.
- Worked around a problem with FORTRAN programs built with condor_compile
and recent versions of gfortran (4.7.2 was OK, 4.8.5 was not), where those
executables would not write to standard out if started in the standard universe.
Also, updated the checkpointing library to permit condor_compile to
successfully link FORTRAN (and other) programs calling certain math
functions and built against up-to-date versions of glibc.
- The default values for HAD_SOCKET_NAME and
REPLICATION_SOCKET_NAME have changed so that the documented
configuration for using these services with shared port works.
- Fixed a bug that caused condor_dagman to sometimes (rarely, but
repeatably) crash when parsing DAGs containing splices.
- The configuration parameters that control when job policy expressions
are evaluated now work as documented.
Previously, the default value for PERIODIC_EXPR_INTERVAL was
300, not 60 as intended.
Also, the parameters MAX_PERIODIC_EXPR_INTERVAL and
PERIODIC_EXPR_TIMESLICE were ignored for grid universe jobs.
- Fixed a bug that could cause the Job Router to crash if the
job_queue.log contained invalid or incomplete records.
- Fixed a bug that caused updates of the job attribute
x509UserProxyExpiration to be ignored for job policy evaluation
when the job was managed by the Job Router.
- Changed the default value of configuration parameters
CREAM_GAHP_WORKER_THREADS to the value of
This should prevent a back-log of commands in the CREAM GAHP observed
by some users.
- Fixed modification of PYTHONPATH environment variable that
could fail in bash if set -u is enabled.
- bosco_quickstart no longer assumes that submitting to a Slurm
cluster requires the PBS emulation module.
- Fixed a bug that caused condor_submit -dump to crash when
the submit file had an attribute to enable the use of an x509 user proxy.
- Updated the supported platform list in the Bosco installer script to
include Ubuntu 16 and Mac OSX 10.12. Also, dropped Ubuntu 12 and Mac OSX
10.6 through 10.9.
- Fixed a bug which, in some obscure configurations, caused a spurious
PERMISSION DENIED error to be printed in the StartLog when activating a claim.
- Fixed a bug which forced the administrator to restart (rather than
reconfigure) running daemons after adding an entry to a DENY_*
- HTCondor version 8.6.1 released on March 2, 2017.
- condor_q now checks whether authentication and security negotiation are enabled before attempting to
request only the current user's jobs from the condor_schedd. Prior to this change, configurations that disabled
security or authentication also needed to set CONDOR_Q_ONLY_MY_JOBS to false.
- The CLAIMTOBE authentication method is now in the list of methods for READ access if no list of
authentication methods for READ or DEFAULT is specified in the configuration. This change allows sites that
use the default host based security model to use condor_q -global with the only-my-jobs feature
without making changes to their security configuration.
- The collector now records the authentication method used to determine the authenticated identity.
- Updated the Docker interface to be able to retrieve usage information
from running containers and to remove containers when certain errors
occurred when using Docker version 1.13.
- In Docker universe, all writes to files in /tmp and /var/tmp by default
go inside the container. There is a limit on the file size within the container,
and jobs that write a lot to /tmp may hit that limit. Now, if a Docker universe job runs
on a system with MOUNT_UNDER_SCRATCH defined, HTCondor adds those
mounts as volume mounts, so file writes do not go to the container, but to the host
- Fixed a bug in condor_status -format and condor_q -format that caused the
tools to truncate output to the width specified in the format specifier. The most likely manifestation of
this bug was that punctuation after the format would not be printed when the format had an explicit width.
- Fixed a bug that caused spurious shared port-related error
messages to appear in the dagman.out file (by adding the
new DAGMAN_USE_SHARED_PORT configuration macro).
- Fixed a bug that caused VM universe jobs to fail if the
vm_disk submit command contained spaces after a comma.
- Fixed a bug that can cause the Job Router and condor_c-gahp to
crash if they fail to submit a job due to submit transforms or
- Fixed a bug that caused the Job Router to not route any jobs if
the JOB_ROUTER_DEFAULTS configuration parameter value
started with white space.
- Fixed several bugs in how the Job Router writes to job event logs.
- Removed Bosco's attempt to configure a default value for
grid_resource in the submit description file, as
condor_submit no longer supports this ability.
Also, Bosco now works with Slurm clusters.
- Changed Bosco's configuration of the condor_ft-gahp to eliminate
worrying error messages in the condor_ft-gahp's log file.
- Fixed a bug that could cause a grid batch job submitted to PBS or
Slurm to go on hold when the job's X.509 proxy is refreshed.
- Fixed a bug where the condor_gridmanager failed to put a job on
hold due to the desired hold reason containing invalid characters.
- Improved the hold reason when submission of a grid-type batch
- Update helper scripts to work with current versions of Open MPI and MPICH2.
- Fixed a bug that could cause events for local universe jobs to not
be written to the global event log.
- Fixed a bug on execute machines that enable PID namespaces that
would generate a spurious error message in the daemon log when condor_off -fast was issued.
- Fixed a bug that could corrupt the job queue log file such that
the condor_schedd cannot restart.
The bug is most likely to occur if the disk becomes full.
- Incremented the ClassAd library version number, since the deprecated
iostream interface has been removed.
- HTCondor version 8.6.0 released on January 26, 2017.
- Added two new job ClassAd attributes, CumulativeRemoteSysCpu and
CumulativeRemoteUserCpu, which keep a running total of system and user
CPU usage, respectively, across all job restarts. Also, the attributes
RemoteSysCpu and RemoteUserCpu are now cleared immediately on job start, instead of on the first update.
- Added a new configuration knob, ALWAYS_REUSEADDR, which defaults
to True. When True, it tells HTCondor to set the
SO_REUSEADDR socket option, so that
the schedd can run large numbers of very short jobs without exhausting the
number of local ports needed for shadows.
- Changed the default value of IGNORE_LEAF_OOM to True.
- Fixed a bug causing unnecessarily slow updates from the condor_startd.
If you depend on the old behavior, set UPDATE_SPREAD_TIME to 8. A
value of 0 enables the fix.
- Fixed a race condition when running multiple concurrent jobs on the same claim.
When the starter exits, it notifies the shadow, which tells the startd to kill the starter.
Immediately after the shadow tells the startd, it fetches the next job and tries to start it.
If the starter has not completely exited yet (perhaps it needs to clean up a large sandbox),
it notices that the shadow has closed the command socket, goes into disconnected
mode, and gets confused. This has been fixed.
- Fixed an infelicity with condor_submit -i and Docker universe,
where it would start an interactive shell without a container. Added an error
message stating that this combination is not currently supported.
- When a job claimed by the Job Router is held or removed, it is no
longer considered a failure of the job route chosen for that job.
- Fixed a bug in recovering a Google Compute Engine (GCE) job if the
condor_gridmanager restarts during submission of the instance request.
- Fixed a bug that could cause re-installation of a remote cluster
to fail in Bosco.
- Fixed a bug with handling the proxy files of grid-type batch jobs
when the proxy's file name is a relative path.
- Fixed a bug that caused the batch_gahp to crash when a job's
X.509 proxy is refreshed and the batch_gahp is configured to not
create a limited copy of the proxy.
- Fixed a bug in the virtual machine universe where RequestMemory
and RequestCPUs were not changing the resources assigned to the VM
created by HTCondor. Now, VM_Memory defaults to RequestMemory,
and the number of CPUs defaults to RequestCPUs.
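The 8.6.7 fix for Slurm jobs being held comes down to accepting unit suffixes when parsing the memory figure that Slurm reports. Below is a minimal Python sketch of that kind of parsing. It is not HTCondor's implementation. The function name slurm_memory_to_kb, the 1024-based multipliers, and the choice to treat a bare number as kilobytes are all assumptions made for illustration.

```python
# Hypothetical helper; HTCondor's own parsing code is not shown in these notes.
# Assumes 1024-based suffixes and that a bare number is already in kilobytes.
KB_PER_UNIT = {"K": 1, "M": 1024, "G": 1024 ** 2, "T": 1024 ** 3, "P": 1024 ** 4}


def slurm_memory_to_kb(value: str) -> int:
    """Convert a Slurm memory figure such as '2048K', '2M', or '1.5G' to KB."""
    text = value.strip().upper()
    if text and text[-1] in KB_PER_UNIT:
        # Split off the trailing unit letter.
        number, unit = text[:-1], text[-1]
    else:
        # Assumption: no suffix means the value is already in kilobytes.
        number, unit = text, "K"
    return int(float(number) * KB_PER_UNIT[unit])


print(slurm_memory_to_kb("2048K"))  # 2048
print(slurm_memory_to_kb("2M"))     # 2048
print(slurm_memory_to_kb("1G"))     # 1048576
```

Because '2M' and '2048K' map to the same number, the same usage parses consistently whichever suffix Slurm happens to choose.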
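The ALWAYS_REUSEADDR knob added in 8.6.0 controls whether HTCondor sets the standard SO_REUSEADDR socket option. The sketch below shows only what setting that option looks like with Python's socket module. The helper name make_socket and the loopback bind address are hypothetical, and the sketch makes no claim about where inside HTCondor the option is applied.

```python
import socket


def make_socket(reuse_addr: bool = True) -> socket.socket:
    """Hypothetical helper: create a TCP socket, optionally with SO_REUSEADDR."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuse_addr:
        # On Linux, this lets bind() reuse a local port whose earlier
        # connection is still in TIME_WAIT, instead of failing.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Port 0 asks the operating system to pick any free local port.
    sock.bind(("127.0.0.1", 0))
    return sock


if __name__ == "__main__":
    s = make_socket()
    enabled = s.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
    print("SO_REUSEADDR enabled:", bool(enabled))
    s.close()
```

When many short-lived connections are opened and closed quickly, ports in TIME_WAIT can otherwise pile up. That is the situation the release note describes for a condor_schedd running large numbers of very short jobs.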
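After the 8.6.2 metaknob fix, the items in a metaknob list may be separated by commas, by whitespace, or by both. Below is a minimal Python sketch of that splitting rule. It only illustrates the documented behavior and is not HTCondor's configuration parser. The function name split_metaknob_args is hypothetical.

```python
import re
from typing import List


def split_metaknob_args(args: str) -> List[str]:
    """Split a metaknob argument list on commas and/or whitespace.

    Both 'Execute CentralManager' and 'Execute, CentralManager' yield
    two items, matching the behavior described for the 8.6.2 fix.
    """
    # One or more commas or whitespace characters act as a single separator;
    # empty strings from leading or trailing separators are dropped.
    return [item for item in re.split(r"[,\s]+", args) if item]


print(split_metaknob_args("Execute CentralManager"))   # ['Execute', 'CentralManager']
print(split_metaknob_args("Execute, CentralManager"))  # ['Execute', 'CentralManager']
```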
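The 8.6.1 fix to -format in condor_status and condor_q restores standard printf behavior, in which a field width is a minimum, not a maximum. Python's % operator follows the same C convention, so the sketch below shows the expected output. format_row is a hypothetical helper, and the format string is an example rather than one taken from HTCondor.

```python
def format_row(owner: str, job_id: str) -> str:
    # '%-8s' left-justifies and pads to at least 8 characters, but never
    # truncates a longer value, so the ', ' after it is always printed.
    return "%-8s, %s" % (owner, job_id)


print(repr(format_row("alice", "12.0")))         # 'alice   , 12.0'
print(repr(format_row("averylongname", "1.0")))  # 'averylongname, 1.0'
```

Before the fix, output was cut at the declared width, so punctuation following a width-limited field could be lost.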
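The 8.6.7 condor_history fix concerns producing one valid JSON (or XML) document when records come from several history files. The Python sketch below shows the JSON side of that idea. A single array and a single "first record" flag are shared across all sources, so the output never becomes '[...][...]'. The function write_json_array and the sample records are hypothetical. ClusterId and ProcId appear only as familiar job attribute names.

```python
import io
import json
from typing import Dict, Iterable, List, TextIO


def write_json_array(sources: Iterable[List[Dict]], out: TextIO) -> None:
    """Stream records from several sources as ONE JSON array.

    Writing a separate '[...]' per source would yield '[...][...]',
    which is not valid JSON, so the brackets and the comma logic
    span all sources together.
    """
    out.write("[")
    first = True
    for records in sources:
        for record in records:
            if not first:
                out.write(",\n")
            out.write(json.dumps(record))
            first = False
    out.write("]\n")


history_a = [{"ClusterId": 1, "ProcId": 0}]
history_b = [{"ClusterId": 2, "ProcId": 0}, {"ClusterId": 2, "ProcId": 1}]
buf = io.StringIO()
write_json_array([history_a, history_b], buf)
print(len(json.loads(buf.getvalue())))  # 3
```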