11.4 Stable Release Series 8.8

This is the stable release series of HTCondor. As usual, only bug fixes (and potentially, ports to new platforms) will be provided in future 8.8.x releases. New features will be added in the 8.9.x development series.

The details of each version are described below.

Version 8.8.2

Release Notes:

HTCondor version 8.8.2 released on April 11, 2019.

New Features:

Added a new configuration parameter SINGULARITY_IS_SETUID, which defaults to true. Setting it to false allows condor_ssh_to_job to work when Singularity is configured to run without the setuid wrapper. (Ticket #6931).
Added the negotiator parameter NEGOTIATOR_DEPTH_FIRST. When partitionable slots are in use, it makes the negotiator fill each machine with jobs before trying to use the next available machine. (Ticket #5884).
The classad module of the Python bindings has a new printJson() method that serializes a ClassAd into a string in JSON format. (Ticket #6950).
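A small toy model shows the difference between depth-first filling, as described for NEGOTIATOR_DEPTH_FIRST above, and spreading jobs across machines. It is a sketch built on simplifying assumptions. The machine names, core counts, and the first-fit and round-robin loops are all hypothetical. The real negotiator also weighs ranks, priorities, and slot resources other than cores.

```python
from typing import Dict, List


def place_jobs(free_cores: Dict[str, int], job_cores: List[int],
               depth_first: bool) -> Dict[str, List[int]]:
    """Toy placement of jobs (by index) onto machines with enough free cores.

    depth_first=True:  first-fit, always starting from the first machine,
                       so each machine fills up before the next one is used.
    depth_first=False: round-robin, so consecutive jobs land on different
                       machines when they fit.
    Jobs that fit nowhere are simply left unplaced.
    """
    free = dict(free_cores)                  # copy; the caller's dict is untouched
    machines = list(free)                    # insertion order = machine order
    placement: Dict[str, List[int]] = {m: [] for m in machines}
    next_start = 0                           # round-robin cursor
    for job_id, need in enumerate(job_cores):
        if depth_first:
            order = machines
        else:
            order = machines[next_start:] + machines[:next_start]
        for m in order:
            if free[m] >= need:
                free[m] -= need
                placement[m].append(job_id)
                if not depth_first:
                    # Advance the cursor past the machine just used.
                    next_start = (machines.index(m) + 1) % len(machines)
                break
    return placement


if __name__ == "__main__":
    pool = {"node1": 4, "node2": 4}
    jobs = [1, 1, 1, 1]
    print(place_jobs(pool, jobs, depth_first=True))
    print(place_jobs(pool, jobs, depth_first=False))
```

With two 4-core machines and four 1-core jobs, the depth-first run puts all four jobs on node1. The round-robin run alternates between node1 and node2. That leaves no machine fully free for a later large job.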

Bugs Fixed:

Using the condor_ssh_to_job command with a Singularity job requires the nsenter command. Previous versions of HTCondor relied on nsenter features that are not universally available; HTCondor 8.8.2 works with all known versions of nsenter. (Ticket #6934).
For Singularity jobs, USER_JOB_WRAPPER is now executed outside the container instead of inside it. (Ticket #6904).
Fixed a bug where condor_ssh_to_job did not work with a Docker universe job when file transfer was disabled. (Ticket #6945).
Included a patch from the development series that fixes problems that could cause condor_annex to crash. (Ticket #6980).
Fixed a bug that could cause the job_queue.log file to be corrupted when the condor_schedd compacts it. (Ticket #6929).
When given the -negotiator and -l options, the condor_userprio command used to include the values of the concurrency limits in the single large ClassAd it printed. This output was removed in 8.8.0 and has been restored in 8.8.2. (Ticket #6948).
In some situations, the GPU monitoring code could disagree with the GPU discovery code about the mapping between GPU device indices and physical devices. Both now use PCI bus IDs to establish the mapping. One consequence of this change is that discovery now prefers NVIDIA’s management library (NVML) over the CUDA run-time library. (Ticket #6903). (Ticket #6901).
Corrected the documentation of CHIRP_DELAYED_UPDATE_PREFIX: despite its name, it is neither a single value nor a prefix, but a list. Also fixed a problem where administrators had to list every attribute explicitly rather than matching attributes with prefixes or wildcards. (Ticket #6958).
The condor_master now waits until the condor_procd has exited before exiting itself. This helps prevent problems on Windows when using the Service Control Manager to restart the Condor service. (Ticket #6952).
Fixed a bug on Windows that could cause a delay of up to 2 minutes in responding to condor_reconfig, condor_restart or condor_off commands when using shared port. (Ticket #6960).
Fixed a bug that could cause the condor_schedd on Windows to restart with the message "fd_set is full". This change reduces the maximum number of active connections that a condor_collector or condor_schedd on Windows will allow from 1023 to 1014. (Ticket #6957).
Fixed a bug where local universe jobs were unable to use condor_submit to submit to their local condor_schedd. (Ticket #6920).
Restored the old Python bindings for reading the job event log (EventIterator and read_events()). These bindings are marked as deprecated, are not available in Python 3, and will likely be removed permanently in the 8.9 series. Users should transition to the replacement binding, JobEventLog. (Ticket #6939).
Fixed a bug that could cause entries in the job event log to be written with the wrong job id when a condor_shadow process is used to run multiple jobs. (Ticket #6919).
In some situations, the bytes sent and bytes received values in the termination event of the job event log could be reversed. This has been fixed. (Ticket #6914).
For grid universe jobs of type batch, the job now receives a signal when the batch system wants it to exit, giving the job a chance to shut down gracefully. (Ticket #6915).
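The GPU mapping fix above depends on the fact that two libraries can number the same GPUs differently. By default, for example, CUDA's run-time orders devices fastest-first (CUDA_DEVICE_ORDER=FASTEST_FIRST), so its index 0 is not necessarily the device at the lowest PCI bus address. The sketch below shows how two enumerations can be reconciled by PCI bus ID. The bus IDs, the two views, and the helper name are hypothetical sample data, not output from HTCondor or from either library.

```python
from typing import Dict, List, Tuple

# Hypothetical (index, PCI bus ID) pairs for the same two physical GPUs,
# as seen by two different enumerations that order devices differently.
runtime_view: List[Tuple[int, str]] = [(0, "0000:3B:00.0"), (1, "0000:18:00.0")]
mgmt_view: List[Tuple[int, str]] = [(0, "0000:18:00.0"), (1, "0000:3b:00.0")]


def map_by_bus_id(a: List[Tuple[int, str]],
                  b: List[Tuple[int, str]]) -> Dict[int, int]:
    """Map each index in view `a` to the index of the same physical device
    in view `b`, matching on PCI bus ID (case-insensitively) instead of
    assuming that equal index numbers mean the same device."""
    index_in_b = {bus.lower(): idx for idx, bus in b}
    return {idx: index_in_b[bus.lower()] for idx, bus in a}


if __name__ == "__main__":
    # Index 0 in one view is index 1 in the other, and vice versa.
    print(map_by_bus_id(runtime_view, mgmt_view))
```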
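The CHIRP delayed-update fix above lets administrators allow attributes by prefix or wildcard instead of naming each one. The sketch below shows that kind of check, assuming the list is a comma- or space-separated set of shell-style patterns. It uses Python's fnmatch, which may not match HTCondor's own wildcard rules exactly, and the helper names are hypothetical.

```python
import re
from fnmatch import fnmatchcase
from typing import List


def parse_patterns(value: str) -> List[str]:
    """Split a comma- and/or whitespace-separated configuration value
    (an assumed format for this sketch)."""
    return [p for p in re.split(r"[,\s]+", value.strip()) if p]


def update_allowed(attr: str, patterns: List[str]) -> bool:
    """True if `attr` matches any pattern. ClassAd attribute names are
    case-insensitive, so both sides are lower-cased before matching."""
    name = attr.lower()
    return any(fnmatchcase(name, p.lower()) for p in patterns)


if __name__ == "__main__":
    allowed = parse_patterns("Chirp*, MyProgress")
    print(update_allowed("ChirpIterations", allowed))  # True: prefix via wildcard
    print(update_allowed("myprogress", allowed))       # True: exact name, any case
    print(update_allowed("RemoteUserCpu", allowed))    # False: not in the list
```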

Version 8.8.1

Release Notes:

HTCondor version 8.8.1 released on February 19, 2019.

Known Issues:

GPU resource monitoring is no longer enabled by default after we received reports indicating excessive CPU usage. We believe we’ve fixed the problem, but would like to get updated reports from users who were previously affected. To enable (the patched) GPU resource monitoring, add ‘use feature: GPUsMonitor’ to the HTCondor configuration. Thank you. (Ticket #6857).
Discovered a bug in DAGMan where graph metrics reporting could sometimes send the condor_dagman process into an infinite loop. We worked around this by disabling graph metrics reporting by default, via the new DAGMAN_REPORT_GRAPH_METRICS configuration knob. (Ticket #6896).

New Features:

None.

Bugs Fixed:

Fixed a bug that caused condor_gpu_discovery to report the wrong value for DeviceMemory and possibly other attributes of the GPU when CUDA 10 was installed as the default run-time. Also fixed a bug that would sometimes cause the reported value of DeviceMemory to be limited to 4 Gigabytes. (Ticket #6883).
Fixed a bug that prevented HTCondor on Windows from running jobs in the default configuration when started as a service. (Ticket #6853).
The Job Router no longer sets an incorrect User job attribute when routing a job between two condor_schedds with different values for configuration parameter UID_DOMAIN. (Ticket #6856).
Made the Collector.locateAll() method in the Python bindings more efficient. (Ticket #6831).
Improved efficiency of negotiation code in the condor_schedd. (Ticket #6834).
The new minihtcondor package now starts HTCondor automatically after installation. (Ticket #6888).
The condor_master now sends status updates to systemd every 10 seconds. (Ticket #6888).
condor_q -autocluster data is now much more up-to-date. (Ticket #6833).
In order to work better with HTCondor 8.9.1 and later, removed support for remote submission to condor_schedds older than version 7.5.0. (Ticket #6844).
Fixed a bug that would cause DAGMan jobs to fail when using Kerberos authentication on Debian or Ubuntu. (Ticket #6917).
Fixed a bug that caused execute nodes to ignore the configuration knob CREDD_POLLING_TIMEOUT. (Ticket #6887).
The Python bindings methods Schedd.submit() and Schedd.submitMany() now edit the job’s Requirements expression to take the job ad’s RequestCPUs and RequestGPUs attributes into account. (Ticket #6918).
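The Requirements change to Schedd.submit() mentioned above can be illustrated with a hypothetical helper. It appends resource clauses to a Requirements expression when the job ad requests CPUs or GPUs. The clause text and the function name are assumptions written in the style of ClassAd Requirements expressions. This is not the bindings' actual code.

```python
from typing import Mapping


def add_resource_clauses(requirements: str, job_ad: Mapping[str, object]) -> str:
    """Return `requirements` with hypothetical CPU/GPU clauses appended.

    ClassAd attribute names are case-insensitive, so the job ad's keys are
    lower-cased before checking for RequestCPUs and RequestGPUs.
    """
    present = {key.lower() for key in job_ad}
    clauses = []
    if "requestcpus" in present:
        clauses.append("(TARGET.Cpus >= RequestCPUs)")
    if "requestgpus" in present:
        clauses.append("(TARGET.GPUs >= RequestGPUs)")
    if not clauses:
        return requirements                    # nothing requested; unchanged
    # Parenthesize the original expression so && binds as intended.
    parts = [f"({requirements})"] if requirements.strip() else []
    return " && ".join(parts + clauses)


if __name__ == "__main__":
    ad = {"RequestCPUs": 4, "RequestGPUs": 1}
    print(add_resource_clauses('TARGET.OpSys == "LINUX"', ad))
```

Wrapping the original expression in parentheses keeps any || it contains from changing meaning once the new clauses are joined with &&.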
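The condor_master status updates mentioned above are sent to systemd. A common way for a service to do this is the sd_notify protocol: it sends a datagram such as STATUS=... to the Unix socket named in the NOTIFY_SOCKET environment variable. The sketch below assumes that protocol and uses an illustrative message. It does not reproduce HTCondor's exact messages or timer code.

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send one sd_notify-style message; return False when not under systemd."""
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False                          # no notification socket provided
    if addr.startswith("@"):                  # Linux abstract-namespace socket
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), addr)
    return True


if __name__ == "__main__":
    # Illustrative status text; a daemon would send this from a periodic timer.
    print(sd_notify("STATUS=Daemon running"))
```

A daemon would call sd_notify() from a periodic timer, for example every 10 seconds. systemd sets NOTIFY_SOCKET for units configured with Type=notify.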

Version 8.8.0

Release Notes:

HTCondor version 8.8.0 released on January 3, 2019.

New Features:

Added a new package: minicondor on Red Hat based systems and minihtcondor on Debian and Ubuntu based systems. This mini-HTCondor package configures HTCondor to work on a single machine. (Ticket #6823).
Made the Python bindings’ JobEvent API more Pythonic by handling optional event attributes as if the JobEvent object were a dictionary. See section 7.1.1 for details. (Ticket #6820).
Added job ad attributes BlockReadKbytes and BlockWriteKbytes, which give the number of kilobytes read and written by the job to the sandbox directory. These attributes are defined only for vanilla universe jobs on Linux machines with cgroup support enabled. (Ticket #6826).
The new IOWait attribute gives the I/O Wait time recorded by the cgroup controller. (Ticket #6830).
condor_ssh_to_job is now configured to be more secure. In particular, it will only use FIPS 140-2 approved algorithms. (Ticket #6822).
Added the configuration parameter CRED_SUPER_USERS, a list of users who are permitted to store credentials for any user when using the condor_store_cred command. Normally, users can only store credentials for themselves. (Ticket #6346).
For packaged HTCondor installations, the package version is now present in the HTCondor version string. (Ticket #6828).
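The CRED_SUPER_USERS rule described above reduces to a simple permission check: users may store their own credentials, and listed super users may store credentials for anyone. This toy model is only a sketch. The list parsing (commas or whitespace) and the function names are assumptions, and it does not model any authentication the daemons perform.

```python
import re
from typing import Set


def parse_user_list(value: str) -> Set[str]:
    """Split a CRED_SUPER_USERS-style value on commas and/or whitespace
    (an assumed format for this sketch)."""
    return {user for user in re.split(r"[,\s]+", value.strip()) if user}


def may_store_credential(requester: str, owner: str,
                         super_users: Set[str]) -> bool:
    """Users may always store their own credential; super users may store
    a credential on behalf of any user."""
    return requester == owner or requester in super_users


if __name__ == "__main__":
    supers = parse_user_list("alice, bob")
    print(may_store_credential("carol", "carol", supers))  # True: own credential
    print(may_store_credential("carol", "dave", supers))   # False: not a super user
    print(may_store_credential("alice", "dave", supers))   # True: alice is listed
```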

Bugs Fixed:

Fixed a problem where a daemon would queue updates indefinitely when another daemon is offline. This is most noticeable as excess memory utilization when a condor_schedd is trying to flock to an offline HTCondor pool. (Ticket #6837).
Fixed a bug where invoking the Python bindings as root could change the effective uid of the calling process. (Ticket #6817).
Jobs in REMOVED status now properly leave the queue when evaluation of their LeaveJobInQueue attribute changes from True to False. (Ticket #6808).
Fixed a rarely occurring bug where the condor_schedd would crash when jobs were submitted with a queue statement with multiple keys. The bug was introduced in the 8.7.10 release. (Ticket #6827).
Fixed a couple of bugs in the job event log reader code that were made visible by the new JobEventLog Python object. The reader did not correctly read all of the available information for remote error and job terminated events. (Ticket #6816). (Ticket #6836).
On Debian and Ubuntu systems, the templates for condor_ssh_to_job and interactive submits are no longer installed in /etc/condor. (Ticket #6770).
