This is an outdated version of the HTCondor Manual. You can find current documentation at http://htcondor.org/manual.


10.3 Stable Release Series 8.6

This is a stable release series of HTCondor. As usual, only bug fixes (and potentially, ports to new platforms) will be provided in future 8.6.x releases. New features will be added in the 8.7.x development series.

The details of each version are described below.

Version 8.6.5

Release Notes:

HTCondor version 8.6.5 released on August 1, 2017.

New Features:

Added avx2 to the set of processor flags advertised by the condor_startd. (Ticket #6317).

Bugs Fixed:

Fixed a bug in socket clean-up that was causing a memory leak. This may have been particularly noticeable in the condor_collector. (Ticket #6342).
Fixed a bug that caused an infinite loop in the condor_starter when cgroups were enabled on systems (such as Debian) where the kernel has disabled the memory accounting controller. A job on such a system would go into the "R" state, but never actually start running. (Ticket #6347).
Fixed a bug where setting NETWORK_INTERFACE to an IPv6 address could cause HTCondor daemons to except. (Ticket #6339).
Fixed a bug where a cross protocol CCB connection would cause the condor_shadow or condor_schedd to except. (Ticket #6344).
Fixed a bug where the wildcard '*' in ALLOW or DENY lists was being interpreted as only matching IPv4 addresses. It now properly matches any address family. (Ticket #6340).
Fixed a bug where reverse resolutions could return the string representation of the address in question instead of failing. This resulted in spurious warnings of the form "WARNING: forward resolution of 2001:630:10:f001::19a0 doesn't match 2001:630:10:f001::19a0!" (Ticket #6338).
Fixed a bug which prevented using an ImDisk RAM disk as the execute directory on Windows. (Ticket #6324).
Fixed a bug where removal of a job could cause another job from the same user to also be removed. This was most likely to happen when the condor_schedd was under heavy load. (Ticket #6353).
Fixed a bug that caused parallel universe jobs not to start on pools with partitionable slots. (Ticket #6308).
Fixed a problem, introduced in HTCondor 8.6.4, where the condor_collector plugins were loaded but not used. (Ticket #6343).
Fixed a bug where "condor_q -grid" did not display the status column for any non-Globus job. (Ticket #6306).
Fixed bugs in the condor_schedd and condor_negotiator that could cause jobs to not be negotiated for when NEGOTIATOR_PREFETCH_REQUESTS is set to TRUE. (Ticket #6336). (Ticket #6312).
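
The reverse resolution fix above (Ticket #6338) concerns lookups that returned the address's own string form instead of failing. As a rough illustration only, and not HTCondor's actual implementation, a caller could distinguish a real host name from an echoed address as sketched below. The function name looks_like_hostname is hypothetical.

```python
import ipaddress


def looks_like_hostname(name: str) -> bool:
    """Return True if `name` is not simply an IPv4/IPv6 address literal.

    Hypothetical helper: a reverse lookup that hands back the address
    itself should be treated as a failed resolution, not as a host name.
    """
    try:
        ipaddress.ip_address(name)
    except ValueError:
        # Not parseable as an address, so it may be a genuine host name.
        return True
    return False


# The address from the spurious warning is recognised as an address literal.
echoed = looks_like_hostname("2001:630:10:f001::19a0")
named = looks_like_hostname("submit.example.org")
```

With a check like this, the forward-resolution comparison would only be attempted when the reverse lookup yields an actual name.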

Version 8.6.4

Release Notes:

HTCondor version 8.6.4 released on June 22, 2017.

New Features:

Python bindings are now available on MacOSX. (Ticket #6244).
Python modules can now be used as condor_collector plugins. This undocumented feature is to be used by expert developers only. (Ticket #6213). (Ticket #6295).

Bugs Fixed:

Fixed a bug with PASSWORD authentication that would sporadically cause it to fail to exchange keys, due to whether or not the first round-trip of communications blocked on reading from the network. (Ticket #6265).
Pslot preemption now properly handles machine custom resources, such as GPUs. (Ticket #6297).
Fixed a bug that prevented HTCondor from correctly setting virtual memory cgroup limits when soft physical memory limits were being used. (Ticket #6294).
Fixed a bug that prevented parallel universe jobs from running that used $$() expansion in submit files. (Ticket #6299).
Added a new knob, STARTD_RECOMPUTE_DISK_FREE, which defaults to true. It tells the condor_startd to periodically recompute and advertise free disk space. Administrators can set this to false for partitionable slots whose execute directory is used by HTCondor alone. (Ticket #6301).
Fixed a bug that could cause condor_submit to fail to submit a job with a proxy file to a condor_schedd older than 8.5.8, due to the absence of an X.509 CA certificates directory. (Ticket #6258).
Restored a check in condor_submit about whether the job's X.509 proxy has sufficient lifetime remaining. (Ticket #6283).
Fixed a bug in condor_dagman where the DAG status file showed an incorrect status code if submit attempts failed for the final node. (Ticket #6069).
Bosco now properly identifies CentOS 7 as a supported platform. (Ticket #6303).
Fixed a bug when Bosco is used to submit jobs to multiple remote clusters. When arguments to remote_gahp are provided in the GridResource attribute, jobs could be submitted to the wrong cluster. (Ticket #6277).
To speed up the installation process on Enterprise Linux 7, the SELinux profile is now reloaded only once, when setting the HTCondor daemons to run in permissive mode. (Ticket #6304).
Update the systemd configuration on Enterprise Linux 7 to start the condor_master after time synchronization is achieved. This prevents unnecessary daemon restarts due to sudden time shifts. (Ticket #6255).
The condor_shadow now ignores updates of JobStartDate from the condor_starter, since the condor_schedd already sets this attribute correctly and the condor_starter incorrectly tries to set it even if the job has already run once. A consequence of this fix is that the value of JobStartDate that the condor_startd uses for policy expressions will differ from the value that the condor_schedd uses. Resolving this discrepancy could break existing policy expressions in the condor_startd, so it will not be changed in the 8.6 series, but will be fixed in the 8.7 series. (Ticket #6280).
Fixed a bug where per-instance job attributes like RemoteHost would show up in the history file for completed jobs. This bug occurred if a job happened to complete while the condor_schedd was in the process of a graceful shutdown. (Ticket #6251).
The condor_convert_history command is present again in this release. (Ticket #6282).
The parameter SETTABLE_ATTRS_ADMINISTRATOR now correctly appears in condor_config_val. (Ticket #6286).

Version 8.6.3

Release Notes:

HTCondor version 8.6.3 released on May 9, 2017.

Bugs Fixed:

Fixed a bug that rarely corrupts the condor_schedd's job queue log file when the input sandbox of a job with an X.509 proxy file is spooled. (Ticket #6240).
Fixed a memory leak in the Python bindings related to logging. (Ticket #6227).

Version 8.6.2

Release Notes:

HTCondor version 8.6.2 released on April 24, 2017.

New Features:

Added metaknobs for defining map files for use with the ClassAd usermap function in the condor_schedd, and a metaknob for automatically assigning an accounting group to a job based on a mapping of the owner name of the job. (Ticket #6179).
When the condor_credd is polling for credentials, the timeout is now configurable using CREDD_POLLING_TIMEOUT.
The -reverse option for condor_q was changed to -reverse-analyze, and it now implies -better-analyze. Formerly, the -reverse option was ignored unless -better-analyze was also specified. (Ticket #6167).

Bugs Fixed:

Fixed a bug that could cause condor_store_cred to fail on Windows due to a case-sensitive check of the user's account name. (Ticket #6200).
Updated Open MPI helper script to catch and handle SIGTERM and to use bash explicitly. (Ticket #6194).
Docker universe jobs now update the RemoteSysCpu attribute for the job and in the job log. Previously, this field was always 0. (Ticket #6173).
Docker universe detection is now more robust in the face of extraneous output to standard error on docker startup. Such output was preventing HTCondor from detecting that docker was working properly on hosts. (Ticket #6185).
Fixed a bug that prevented SUBMIT_REQUIREMENT and JOB_TRANSFORM expressions from referencing job attributes describing the job's X.509 proxy credential. (Ticket #6188).
The Linux kernel tuning script no longer adjusts some kernel parameters unless a condor_schedd will be started by the master. (Ticket #6208).
Fixed a bug that caused all but the first in a list of metaknobs to be ignored unless commas separated the list items. Previously, use ROLE : Execute CentralManager would incorrectly add only the Execute role, whereas use ROLE : Execute, CentralManager would correctly add both roles. (Ticket #6171).
Worked around a problem with FORTRAN programs built with condor_compile and recent versions of gfortran (4.7.2 was OK, 4.8.5 was not), where those executables would not write to standard out if started in the standard universe. Also, updated the checkpointing library to permit condor_compile to successfully link FORTRAN (and other) programs calling certain math functions and built against up-to-date versions of glibc. (Ticket #6026).
The default values for HAD_SOCKET_NAME and REPLICATION_SOCKET_NAME have changed to enable the documented configuration for using these services with shared port to work. (Ticket #6186).
Fixed a bug that caused condor_dagman to sometimes (rarely, but repeatably) crash when parsing DAGs containing splices. (Ticket #6170).
The configuration parameters that control when job policy expressions are evaluated now work as documented. Previously, the default value for PERIODIC_EXPR_INTERVAL was 300, not 60 as intended. Also, the parameters MAX_PERIODIC_EXPR_INTERVAL and PERIODIC_EXPR_TIMESLICE were ignored for grid universe jobs. (Ticket #6199).
Fixed a bug that could cause the Job Router to crash if the job_queue.log contained invalid or incomplete records. (Ticket #6195).
Fixed a bug that caused updates of the job attribute x509UserProxyExpiration to be ignored for job policy evaluation when the job was managed by the Job Router. (Ticket #6209).
Changed the default value of the configuration parameter CREAM_GAHP_WORKER_THREADS to the value of GRIDMANAGER_MAX_PENDING_REQUESTS. This should prevent a back-log of commands in the CREAM GAHP observed by some users. (Ticket #6071).
Fixed modification of PYTHONPATH environment variable that could fail in bash if set -u is enabled. (Ticket #6211).
bosco_quickstart no longer assumes that submitting to a Slurm cluster requires the PBS emulation module. (Ticket #6211).
Fixed a bug that caused condor_submit -dump to crash when the submit file had an attribute to enable the use of an x509 user proxy. (Ticket #6197).
Updated the supported platform list in the Bosco installer script to include Ubuntu 16 and Mac OSX 10.12. Also, dropped Ubuntu 12 and Mac OSX 10.6 through 10.9. (Ticket #6178).
Fixed a bug which, in some obscure configurations, caused a spurious PERMISSION DENIED error to be printed in the StartLog when activating a claim. (Ticket #6172).
Fixed a bug which forced the administrator to restart (rather than reconfigure) running daemons after adding an entry to a DENY_* authorization list. (Ticket #6172).
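
The metaknob list fix above (Ticket #6171) comes down to treating commas and white space as equivalent separators. The following is a minimal sketch of that splitting rule, not HTCondor's configuration parser. The helper name parse_metaknob_items is hypothetical.

```python
import re
from typing import List


def parse_metaknob_items(value: str) -> List[str]:
    """Split a metaknob item list on commas and/or white space.

    Hypothetical helper illustrating the intended behaviour: both
    "Execute, CentralManager" and "Execute CentralManager" yield two items.
    """
    return [item for item in re.split(r"[,\s]+", value.strip()) if item]


with_commas = parse_metaknob_items("Execute, CentralManager")
without_commas = parse_metaknob_items("Execute CentralManager")
```

Under this rule both spellings of the list produce the same two roles, which matches the behavior described after the fix.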

Version 8.6.1

Release Notes:

HTCondor version 8.6.1 released on March 2, 2017.

New Features:

condor_q now checks to see if authentication and security negotiation are enabled before attempting to request only the current user's jobs from the condor_schedd. Prior to this change, configurations that disabled security or authentication would also need to set CONDOR_Q_ONLY_MY_JOBS to false. (Ticket #6125).
The CLAIMTOBE authentication method is now in the list of methods for READ access if no list of authentication methods for READ or DEFAULT is specified in the configuration. This change allows sites that use the default host based security model to use condor_q -global with the only-my-jobs feature without making changes to their security configuration. (Ticket #6125).
The collector now records the authentication method used to determine the authenticated identity. (Ticket #6122).

Bugs Fixed:

Updated the Docker interface to be able to retrieve usage information from running containers and to remove containers when certain errors occur when using Docker version 1.13. (Ticket #6088).
In Docker universe, all writes to files in /tmp and /var/tmp by default go inside the container. There is a limit on the file size within the container, and jobs that write a lot to /tmp may hit it. When a Docker universe job runs on a system with MOUNT_UNDER_SCRATCH defined, HTCondor now adds those mounts as volume mounts, so file writes go to the host file system rather than to the container. (Ticket #6080).
Fixed a bug in condor_status -format and condor_q -format that caused the tools to truncate output to the width specified in the format specifier. The most likely manifestation of this bug was that punctuation after the format would not be printed when the format had an explicit width. (Ticket #6120).
Fixed a bug that caused spurious shared port-related error messages to appear in the dagman.out file (by adding the new DAGMAN_USE_SHARED_PORT configuration macro). (Ticket #6156).
Fixed a bug that caused VM universe jobs to fail if the vm_disk submit command contained spaces after a comma. (Ticket #6132).
Fixed a bug that can cause the Job Router and condor_c-gahp to crash if they fail to submit a job due to submit transforms or submit requirements. (Ticket #6152).
Fixed a bug that caused the Job Router to not route any jobs if the JOB_ROUTER_DEFAULTS configuration parameter value started with white space. (Ticket #6128).
Fixed several bugs in how the Job Router writes to job event logs. (Ticket #6092).
Removed Bosco's attempt to configure a default value for grid_resource in the submit description file, as condor_submit no longer supports this ability. Also, Bosco now works with Slurm clusters. (Ticket #6106).
Changed Bosco's configuration of the condor_ft-gahp to eliminate worrying error messages in the condor_ft-gahp's log file. (Ticket #6107).
Fixed a bug that could cause a grid batch job submitted to PBS or Slurm to go on hold when the job's X.509 proxy is refreshed. (Ticket #6136).
Fixed a bug where the condor_gridmanager fails to put a job on hold due to the desired hold reason containing invalid characters. (Ticket #6142).
Improved the hold reason when submission of a grid-type batch job fails. (Ticket #3377).
Update helper scripts to work with current versions of Open MPI and MPICH2. (Ticket #6024).
Fixed a bug that could cause events for local universe jobs to not be written to the global event log. (Ticket #6100).
Fixed a bug on execute machines that enable PID namespaces that would generate a spurious error message in the daemon log when condor_off -fast was issued. (Ticket #6137).
Fixed a bug that could corrupt the job queue log file such that the condor_schedd cannot restart. The bug is most likely to occur if the disk becomes full. (Ticket #6153).
Incremented the ClassAd library version number, since the deprecated iostream interface has been removed. (Ticket #6050). (Ticket #6115).
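
The -format fix above (Ticket #6120) rests on the printf convention that a field width is a minimum, not a maximum, and that only a precision truncates a string. The sketch below illustrates that convention with Python's % operator, which follows the same rules as C printf for %s. It is an illustration of the formatting semantics only and does not call condor_q or condor_status. The sample owner name is made up.

```python
owner = "averylongusername"  # hypothetical attribute value

# A width of 8 is a minimum: longer values are printed in full,
# and the trailing comma after the specifier is preserved.
full = "%-8s," % owner

# Shorter values are padded out to the width.
padded = "%-8s," % "bob"

# Only an explicit precision truncates the value.
cut = "%.8s," % owner
```

Here full is "averylongusername,", padded is "bob     ," and cut is "averylon,". The bug described above behaved as if the width were a precision and also dropped the trailing punctuation.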

Version 8.6.0

Release Notes:

HTCondor version 8.6.0 released on January 26, 2017.

New Features:

Added two new job ClassAd attributes, CumulativeRemoteSysCpu and CumulativeRemoteUserCpu, which keep a running total of system and user CPU usage, respectively, across all job restarts. Also, the attributes RemoteSysCpu and RemoteUserCpu are now cleared immediately on job start, instead of on first update. (Ticket #6022).
Added a new configuration knob, ALWAYS_REUSEADDR, which defaults to True. When True, it tells HTCondor to set the SO_REUSEADDR socket option, so that the schedd can run large numbers of very short jobs without exhausting the number of local ports needed for shadows. (Ticket #6040).
Changed the default value of IGNORE_LEAF_OOM to True. (Ticket #5775).
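
For readers unfamiliar with the socket option behind ALWAYS_REUSEADDR, the following is a minimal sketch of setting SO_REUSEADDR with Python's standard socket module. It shows the operating system call only. How HTCondor applies the option internally is not described here, and the helper name make_reusable_socket is hypothetical.

```python
import socket


def make_reusable_socket() -> socket.socket:
    """Create a TCP socket with SO_REUSEADDR enabled.

    Hypothetical helper: the option lets a local address be bound again
    while earlier connections on it are still in TIME_WAIT.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    return sock


sock = make_reusable_socket()
# Read the option back; a non-zero value means it is enabled.
reuse_enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0
sock.close()
```

The option must be set before the socket is bound for it to affect that bind.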

Bugs Fixed:

Fixed a bug causing unnecessarily slow updates from the condor_startd. If you depend on the old behavior, set UPDATE_SPREAD_TIME to 8. A value of 0 enables the fix. (Ticket #6062).
Fixed a race condition when running multiple concurrent jobs on the same claim. When the starter exits, it notifies the shadow, which tells the startd to kill the starter. Immediately after the shadow tells the startd, it fetches the next job and tries to start it. If the starter hasn't completely exited yet (perhaps it needs to clean up a large sandbox), it will notice the shadow has closed the command socket, go into disconnected mode, and get confused. This has been fixed. (Ticket #6049).
Fixed an infelicity with condor_submit -i and docker universe, where it would start an interactive shell without a container. Added an error message stating that this combination is not currently supported. (Ticket #6083).
When a job claimed by the Job Router is held or removed, it is no longer considered a failure of the job route chosen for that job. (Ticket #5968).
Fixed a bug in recovering a Google Compute Engine (GCE) job if the condor_gridmanager restarts during submission of the instance request. (Ticket #6078).
Fixed a bug that could cause re-installation of a remote cluster to fail in Bosco. (Ticket #6042).
Fixed a bug with handling the proxy files of grid-type batch jobs when the proxy's file name is a relative path. (Ticket #6053).
Fixed a bug that caused the batch_gahp to crash when a job's X.509 proxy is refreshed and the batch_gahp is configured to not create a limited copy of the proxy. (Ticket #6051).
Fixed a bug in the virtual machine universe where RequestMemory and RequestCPUs were not changing the resources assigned to the VM created by HTCondor. Now, VM_Memory defaults to RequestMemory, and the number of CPUs defaults to RequestCPUs. (Ticket #5998).
