10.4 Stable Release Series 8.2
This is a stable release series of HTCondor.
As usual, only bug fixes (and potentially, ports to new platforms)
will be provided in future 8.2.x releases.
New features will be added in the 8.3.x development series.
The details of each version are described below.
Version 8.2.9
Release Notes:
- HTCondor version 8.2.9 not yet released.
New Features:
Bugs Fixed:
Version 8.2.8
Release Notes:
- HTCondor version 8.2.8 released on April 7, 2015.
New Features:
Bugs Fixed:
- Fixed a bug that could cause updates sent to
the host defined by CONDOR_VIEW_HOST
to fail once the HTCondorView collector is restarted,
if the updates are being sent via TCP.
(Ticket #4915).
- Fixed a bug that could have caused the condor_schedd to start more
jobs than specified by the value of configuration variable
MAX_JOBS_RUNNING,
and then later kill the excess running jobs.
(Ticket #4554).
- Fixed a bug in which HTCondor was looking for configuration variable
IsOwner, instead of IS_OWNER.
Now, the system permits specification of either spelling,
and IsOwner takes precedence.
(Ticket #4949).
- Fixed a bug that could cause the condor_schedd to crash
when removing jobs that were in the process of being submitted
using condor_submit with a -spool or -remote
command line option.
(Ticket #4866).
- Fixed a minor bug that would emit a grep not found warning
when using condor_ssh_to_job to log in to a MacOS execute node.
(Ticket #4789).
- Fixed problems running a central manager on a Windows operating
system that could prevent the condor_collector from being restarted,
and/or could cause delays when shutting down the HTCondor service.
(Ticket #4923).
- Fixed bugs in ClassAds and the Python bindings with handling
Daylight Saving Time when using timestamp values.
(Ticket #4936).
(Ticket #4937).
- Fixed a bug that, after a reboot of the submit machine,
would prevent successful reconnection with
a previously running job
using streaming I/O via submit file
options stream_output or stream_error.
- If CONSUMPTION_POLICY is enabled for a partitionable slot,
then CLAIM_PARTITIONABLE_LEFTOVERS will be treated as
False for that slot.
This avoids a problem, because claims using a consumption policy
and the use of CLAIM_PARTITIONABLE_LEFTOVERS conflict
with each other.
(Ticket #4950).
- Fixed a bug that could have caused a partitionable slot to remain in
the Matched state forever when configuration variables
CONSUMPTION_POLICY and NEGOTIATOR_INFORM_STARTD
were both set to True.
(Ticket #4945).
- Updated the Globus GRAM library to allow the TLS encryption standard
to be used, instead of always using the old SSLv3 standard.
(Ticket #4964).
- Fixed the condor_schedd daemon's default value for configuration
variable MAX_JOBS_RUNNING,
when the variable is cleared in the configuration file.
The default value was ten times too large.
The default value could also become negative on a machine with a large
amount of memory (over 500 GiB).
(Ticket #4966).
- For DAGMan-related configuration,
added missing entries to the default parameter table and documentation,
and removed obsolete parameters from the default parameter table.
(Ticket #4826).
- Fixed a bug in condor_who that,
when processing the -daemons command line option,
prevented it from detecting that the
condor_shared_port or condor_job_router daemons had exited.
(Ticket #4962).
- Fixed a bug that could cause the cream_gahp to fail at start up,
if an incompatible version of the Globus libraries was installed.
(Ticket #4865).
- Fixed a bug that could cause grid universe jobs of grid type lsf
to fail when attempting to submit to LSF.
(Ticket #4938).
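The interaction between CONSUMPTION_POLICY and CLAIM_PARTITIONABLE_LEFTOVERS described in the list above can be read as a small decision rule. The following is a minimal sketch of that rule only; the function name and its arguments are hypothetical and do not correspond to HTCondor source code.

```python
def effective_claim_leftovers(consumption_policy, claim_partitionable_leftovers):
    """Return the value CLAIM_PARTITIONABLE_LEFTOVERS is treated as for one slot.

    Hypothetical helper: both arguments are the already-evaluated boolean
    settings for a single partitionable slot.
    """
    if consumption_policy:
        # A consumption policy conflicts with claiming leftovers,
        # so the leftovers setting is treated as False.
        return False
    return claim_partitionable_leftovers


# With a consumption policy, the leftovers setting is overridden.
print(effective_claim_leftovers(True, True))
# Without one, the configured value is used unchanged.
print(effective_claim_leftovers(False, True))
```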
Version 8.2.7
Release Notes:
New Features:
Bugs Fixed:
- Grid universe jobs with the grid type gce now
work with the current version of Google Compute Engine (GCE).
(Ticket #4586).
- Improved the cleanup of Google Compute Engine (GCE) instances that
are terminated.
(Ticket #4832).
- Fixed a usability problem with EC2 grid universe jobs.
HTCondor now ignores trailing white space characters within the files
identified by submit commands ec2_access_key_id
and ec2_secret_access_key.
(Ticket #4791).
- Fixed a race condition that prevented jobs from being held when
the job went over its memory allocation. On hosts where memory.use_hierarchy
was set to 1 in the memory cgroup controller, jobs would frequently receive
the SIGKILL signal or be requeued with a shadow exception instead of
being put on hold for going over memory. (Ticket #4774).
- Fixed a bug in ClassAds that could cause a crash if an attribute's
value includes an eval() function that references the attribute's
name.
(Ticket #4813).
- Fixed a rare bug that could cause the condor_schedd to write to an
old daemon log file after log rotation.
(Ticket #4761).
- Fixed a slow memory leak in the Access Control List of the
Windows desktop when configuration variable
USE_VISIBLE_DESKTOP was enabled.
(Ticket #4815).
- Fixed a problem with the Python bindings in which an
invocation of function list()
on specific forms of an ExprTree object would cause an infinite loop.
(Ticket #4737).
- Fixed a rare bug in which an attempt to send session invalidations
via UDP occurred when no UDP socket was available.
(Ticket #4556).
- A regular expression specifies files within a configuration directory
to be ignored when reading the HTCondor configuration.
This regular expression has been expanded to also
ignore backup files left by CFEngine and dpkg.
(Ticket #4760).
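To illustrate how an exclusion regular expression of this kind filters a configuration directory, here is a minimal sketch. The pattern below is an assumption written for illustration, not HTCondor's actual default value. The dpkg suffixes (such as .dpkg-old) and the CFEngine suffixes (.cfsaved, .cf-before-edit) are likewise illustrative choices.

```python
import re

# Hypothetical pattern, NOT HTCondor's real default. It skips hidden files,
# editor backups, RPM leftovers, dpkg leftovers, and CFEngine backups.
EXCLUDE = re.compile(
    r"^(\..*|.*~|#.*|.*\.rpmsave|.*\.rpmnew"
    r"|.*\.dpkg-[a-z]+|.*\.cfsaved|.*\.cf-before-edit)$"
)


def config_files_to_read(names):
    """Return the file names that survive the exclusion pattern, sorted."""
    return sorted(name for name in names if not EXCLUDE.match(name))


listing = [
    "10-base.conf",
    "10-base.conf.dpkg-old",   # left by dpkg
    "20-site.conf.cfsaved",    # left by CFEngine (assumed suffix)
    "30-local.conf~",          # editor backup
    ".hidden",
    "40-extra.conf.rpmnew",
]
print(config_files_to_read(listing))
```

Only 10-base.conf survives the filter in this example; every other name matches one of the excluded forms.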
Version 8.2.6
Release Notes:
- HTCondor version 8.2.6 released on December 16, 2014.
- Memory usage of the ec2_gahp
may grow without bound due to a bug in some versions of libcurl.
This occurs when jobs use x509:// URLs.
Testing shows that libcurl version 7.38.0 does not have this issue.
The fix may have been introduced as early as version 7.24.0.
Therefore,
if this problem occurs,
consider upgrading the installation of libcurl.
For operating systems whose vendor
does not provide a new enough version of libcurl,
build a more recent version,
and use the configuration of EC2_GAHP to specify
a wrapper script that sets up and invokes an ec2_gahp which
uses the updated libcurl.
New Features:
Bugs Fixed:
- Corrected command line arguments to /bin/mail,
adding the option to use sendmail.
(Ticket #4764).
- Fixed a bug introduced in HTCondor version 8.2.4 that caused the
condor_schedd daemon log file to not rotate when
configuration variable
USE_CLONE_TO_CREATE_PROCESSES was set to True.
(Ticket #4753).
- Fixed a race condition that could cause the transfer of output files
for HTCondor-C jobs to fail.
HTCondor-C jobs are grid universe jobs
with a grid type of condor.
(Ticket #3379).
- Fixed a bug in the Windows version of condor_submit
that prevented a job from being submitted,
if a directory specified with the submit command
transfer_input_files contained a trailing forward slash
character (/).
condor_submit failed with an error message indicating that
the directory could not be accessed,
even when there was no problem accessing the directory.
(Ticket #4747).
- Fixed a problem that prevented HTCondor from starting
on RHEL 7 platforms.
The ownership of the directory /var/lock/condor/ was incorrect.
(Ticket #4775).
- Fixed a bug in which condor_qsub mishandled setting a disk space
request with a command line argument of the form -l file=2048MB.
(Ticket #4606).
Version 8.2.5
Release Notes:
- HTCondor version 8.2.5 released on December 1, 2014.
New Features:
Bugs Fixed:
- Updated the post install script in the RPM packages,
to preserve file /etc/condor/condor_config.local,
if this file was modified since the last installation.
(Ticket #4731).
- Updated Windows builds of HTCondor to use the latest version of OpenSSL.
(Ticket #4733).
- Fixed a bug that caused file transfer to or from a condor_schedd
daemon version 8.0 or older
to fail when using the Python bindings, or when using
the -address option with condor_submit or condor_transfer_data.
(Ticket #4720).
- Fixed a bug in condor_urlfetch that caused it to sometimes
not fetch the URL when it should have,
because the cache file did not exist.
(Ticket #4732).
- Fixed a bug that prevented grid-type batch jobs from being removed
if they had an X.509 proxy that had been deleted.
(Ticket #3072).
- Fixed an inconsistency in which configuration variable
JOB_RENICE_INCREMENT,
if not explicitly set, defaulted to the value 10
for vanilla universe jobs and to the value 0 for standard universe jobs.
It now defaults to 0, which matches the documentation.
(Ticket #4697).
- Fixed a bug in which condor_submit in interactive mode did
not properly handle the -name command line option.
(Ticket #4728).
- Fixed a bug in error checking while performing job output file
transfer for jobs in which job ClassAd attribute OutputDestination
is used.
(Ticket #4739).
- Fixed a bug that caused nordugrid grid jobs to be held with
the hold reason "Unspecified gridmanager error"
when runtime information was not reported by the remote server.
(Ticket #4736).
Version 8.2.4
Release Notes:
- HTCondor version 8.2.4 released on November 12, 2014.
New Features:
Bugs Fixed:
- Fixed a bug in which a condor_schedd daemon of an 8.2 version
could not send jobs to,
or obtain a claim on, a condor_startd daemon of an 8.0 or previous version.
(Ticket #4687).
- Fixed a bug that could cause removed jobs to return to idle status.
If a running job was removed at the same time that an error occurred that
caused the condor_shadow to put the job on hold, the job would be put
in the held status, but change to idle status when released.
(Ticket #4619).
- Changed the default value of configuration variable
CONDOR_Q_USE_V3_PROTOCOL from True to False,
and raised the default value of configuration variable
SCHEDD_QUERY_WORKERS from 3 to 8.
This works around condor_schedd performance issues caused by using
this protocol when querying schedulers that handle large numbers of jobs.
(Ticket #4696).
- Fixed a bug that resulted in the condor_kbdd on Windows platforms
sometimes exiting with error 0xC0000374,
which indicates heap corruption.
(Ticket #4634).
- Fixed a bug that caused the condor_startd to report available disk
attributes in bytes rather than kibibytes on Windows platforms.
(Ticket #4638).
- Changed the condor_master to work around a bug in the C Runtime
on Windows platforms that resulted in the condor_master restarting
whenever the system clock was changed to account for Daylight Saving Time.
(Ticket #3572).
- Fixed a rare bug that could cause a daemon to core dump with
a log message of
"child failed because PID XXX is still in use by DaemonCore".
(Ticket #4646).
- Fixed a bug in the condor_shadow daemon that caused
the user's supplemental
groups to be unset when the condor_shadow process is reused
to run another job.
This could result in the job being held with a hold reason of
"Failed to initialize user log to <path>".
(Ticket #4672).
- The RPM and Debian distributions no longer include a configuration
file called condor_config.local,
as this file is reserved for the use of local administration.
In addition, condor_install and condor_configure no longer create
file condor_config.local; they instead append to condor_config.
(Ticket #4552).
- Using condor_compile on programs which call posix_memalign()
no longer causes a link error.
(Ticket #4486).
- Fixed a bug in which condor_router_q queried the wrong queue
if the job router was configured to route jobs away from the source.
(Ticket #4599).
- Fixed a bug that prevented condor_chirp from finding its configuration
file in the default location.
(Ticket #4625).
- Fixed a bug that could cause a daemon to write to an old daemon log
file after log rotation.
(Ticket #3106).
- The HTCondor DRMAA library now works correctly when
SCHEDD_HOST is set in the configuration file.
(Ticket #4629).
- Fixed the default value of the previously undocumented
configuration variable HISTORY_HELPER_MAX_CONCURRENCY.
It incorrectly defaulted to 10000, rather than the correct value of 2.
(Ticket #4644).
- Fixed a bug in the condor_schedd daemon
that caused remote condor_history
commands to fail if the configuration variable LIBEXEC was not
explicitly set in a configuration file.
(Ticket #4678).
- For grid type condor grid universe jobs, if commands
to the remote condor_schedd fail but the daemon appears to be running,
then affected jobs will be placed in the Hold state.
Previously, any failure to talk to the remote daemon would result in
the condor_gridmanager considering the remote condor_schedd
temporarily unavailable,
and the condor_gridmanager waited for the remote condor_schedd
to become available again.
(Ticket #4557).
- Fixed a bug in the condor_gangliad daemon
that caused it to incorrectly log
that gmetric was being used when condor_reconfig was invoked.
(Ticket #4680).
- Corrected the default value of configuration variable
GANGLIAD_METRICS_CONFIG_DIR to be
/etc/condor/ganglia.d in the RPM and Debian distributions.
With the bug,
the condor_gangliad daemon would fail to start
when it could not locate this incorrectly specified directory.
(Ticket #4709).
Version 8.2.3
Release Notes:
- HTCondor version 8.2.3 released on October 1, 2014.
- This version of HTCondor includes a full port for
Ubuntu 14.04 on the x86_64 architecture.
A full port includes support for the standard universe.
(Ticket #4562).
New Features:
- The new configuration variable
RUN_FILETRANSFER_PLUGINS_WITH_ROOT permits file transfer
plug-ins to run with root privilege,
when HTCondor daemons are run as root,
and when set to the non-default value of True.
(Ticket #4561).
- The new configuration variable NETWORK_HOSTNAME sets
the host name that HTCondor uses to identify the local machine.
If NETWORK_HOSTNAME is not set,
then HTCondor uses the gethostname() function to determine
the machine's host name.
This variable is useful if a machine has multiple network interfaces
with different host names.
(Ticket #4570).
- Configuration variable JOB_ROUTER_DEFAULTS tolerates
the syntax of omitting the outer square brackets that would be
required by new ClassAd syntax,
in order to facilitate appending to an existing value.
If the value of JOB_ROUTER_DEFAULTS does not have
enclosing square brackets,
the value will be parsed as if they are present.
(Ticket #4433).
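The bracket tolerance described for JOB_ROUTER_DEFAULTS can be sketched as a small normalization step. This is only an illustration of the idea, assuming a naive check of the first and last characters. The function name is hypothetical, the attribute in the example is used purely for illustration, and HTCondor's actual parser may decide differently.

```python
def with_outer_brackets(value):
    """Wrap a new-ClassAd body in [ ] unless it is already enclosed.

    Naive sketch: only the first and last non-blank characters are
    inspected, so unusual values could be misjudged.
    """
    text = value.strip()
    if text.startswith("[") and text.endswith("]"):
        return text
    return "[" + text + "]"


# A value written without brackets is parsed as if they were present.
print(with_outer_brackets("MaxIdleJobs = 10;"))
# A value that already has brackets is left alone.
print(with_outer_brackets("[ MaxIdleJobs = 10; ]"))
```

Omitting the brackets makes it easier to append further attributes to an existing value, because the new text does not have to be inserted before a closing bracket.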
Bugs Fixed:
- The RedHat 7 RPM contains the service file to start up
HTCondor via systemd instead of via init scripts.
(Ticket #4534).
- EC2 grid universe jobs which use the X.509 authentication method will
no longer crash if environment variable USER is not set.
(Ticket #4540).
- Fixed a rare memory leak.
The leak occurred when IPv6 was disabled,
but configuration variables NETWORK_INTERFACE
and COLLECTOR_HOST were set to IPv6 addresses.
(Ticket #4502).
- Fixed a bug in which condor_qsub mishandled setting a memory request
with a command line argument similar to -l mem=2048MB.
(Ticket #4549).
- Fixed a bug that caused the condor_gridmanager to fail to talk
to the condor_schedd if the user's account was in a Windows domain.
(Ticket #4568).
- On Windows platforms, users listed in the QUEUE_SUPER_USERS
configuration variable are now checked in a case-insensitive way,
since user names are case-insensitive on Windows.
(Ticket #4579).
- Fixed a bug that could prevent the condor_schedd job queue log
from rotating on Windows platforms.
(Ticket #4548).
- Fixed a bug that caused all HTCondor daemons to leak
a small amount of memory upon reconfiguration.
(Ticket #4582).
- Fixed a bug that caused condor_config_val -verbose to sometimes append incorrect meta-knob
information to the file and line number information for a configuration variable.
(Ticket #4559).
- Fixed a bug that sometimes prevented adding a .txt file name
extension to the log file name of an HTCondor daemon on Windows platforms.
(Ticket #4571).
- Fixed a bug that caused condor_dagman to crash if
configuration variable
DAGMAN_ALWAYS_USE_NODE_LOG was set to False and
configuration variable
DAGMAN_USE_STRICT was set to 1 or a higher value.
(Ticket #4600).
- Fixed a bug that caused the DAG node status file (if one is specified)
to have the wrong final status for a DAG that is aborted by an
ABORT-DAG-ON specification.
(Ticket #4312).
- Fixed a bug in the batch_gahp that could cause it to fail
when attempting to query the status of an LSF job.
(Ticket #4592).
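The QUEUE_SUPER_USERS change in the list above amounts to comparing user names without regard to case on Windows. A minimal sketch of such a check follows; the function and its parameters are hypothetical and stand in for whatever HTCondor does internally.

```python
def is_queue_super_user(user, super_users, case_insensitive):
    """Report whether user appears in the super-user list.

    case_insensitive would be True on Windows, where user names
    are case-insensitive, and False elsewhere.
    """
    if case_insensitive:
        folded = {name.casefold() for name in super_users}
        return user.casefold() in folded
    return user in super_users


configured = ["administrator", "condor"]
print(is_queue_super_user("Administrator", configured, True))
print(is_queue_super_user("Administrator", configured, False))
```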
Known Bugs:
- On Windows platforms only, issuing condor_rm on a
condor_dagman job does not work correctly.
The condor_dagman process is immediately killed,
and it does not write a Rescue DAG or remove its node jobs.
Note that this bug has probably existed since DAGMan was first
implemented on the Windows platform.
(Ticket #4566).
Version 8.2.2
Release Notes:
- HTCondor version 8.2.2 released on August 7, 2014.
- This version of HTCondor includes a full port for
Red Hat Enterprise Linux 7.0 on the x86_64 architecture.
A full port includes support for the standard universe.
(Ticket #4511).
- The RPM for RHEL 7 contains several subpackages for elements of HTCondor,
modernizing the RPM-based installation.
(Ticket #4518).
New Features:
Bugs Fixed:
- When using the Windows installer,
the choice of a new pool caused an invalid value in the configuration of
$$(FULL_HOSTNAME) to be used,
instead of the correct value of $(FULL_HOSTNAME).
This prevented all daemons from talking to the condor_collector daemon.
(Ticket #4509).
- Fixed a bug that only manifested on Linux 3.14 or more recent kernels,
which caused the condor_collector to respond very slowly to queries.
(Ticket #4489).
- Fixed a Windows platform bug that caused condor_status to abort
when ENABLE_CLASSAD_CACHING was set to True.
(Ticket #4459).
- Fixed a bug that prevented the detection of hyper-threaded cores
on Linux platforms.
All cores were reported as full cores without hyper-threading.
(Ticket #4458).
- Fixed the detection of hyper-threaded cores on Mac OS X platforms.
(Ticket #4516).
- Fixed a Windows platform bug that caused the condor_starter
to abort while creating the job sandbox.
The bug presents as a minor memory leak in all versions of HTCondor
for Windows prior to versions 8.2.2 and 8.3.0.
In HTCondor version 8.2.0, this bug could sometimes
present as an abrupt condor_starter exit with status -1073740940.
(Ticket #4467).
- Fixed a file descriptor leak in the condor_shared_port
daemon.
(Ticket #4456).
- Fixed a bug existing on Linux platforms with newer kernels.
With cgroups enabled, the OOM killer killed the job when the job
went over its memory allocation.
Now, the condor_starter catches the OOM signal and
places the job on hold with an appropriate message.
(Ticket #4435).
- Fixed a bug in which the expression set by submit command
periodic_remove would not remove
jobs running on Linux machines when PID namespaces were enabled.
(Ticket #4421).
- Fixed a Windows-specific bug: specifying a DAG node status file
caused DAGMan to fail.
(Ticket #4361).
- Fixed a problem in which job rank may not have always worked
as documented due to a bug in HTCondor's auto cluster mechanism.
(Ticket #4403).
- Updated the HTCondor DRMAA library to version 1.6.2.
This version fixes minor bugs in the functions for querying how a job exited.
(Ticket #4413).
- condor_submit no longer fails if the value of
x509userproxy is a relative path,
and the value of initialdir is set to a directory
that is not the current working directory of condor_submit.
(Ticket #4415).
- Fixed a bug that caused condor_submit_dag to core dump if
a non-existent DAG file was specified.
(Ticket #4423).
- Fixed a bug that resulted in output of the string "undefined",
instead of printing nothing,
when using the %s format specifier to
condor_q -format.
(Ticket #4418).
- Fixed a bug in the condor_shadow that caused the user's supplemental
groups to be unset when trying to write to the user's job event log.
This could result in the job being held with the hold reason
"Failed to initialize user log to <path>".
(Ticket #4437).
- Fixed a bug in the cream_gahp that would corrupt memory when
using more than the default number of worker threads.
(Ticket #4416).
- Fixed a bug that could cause the cream_gahp to fail at
start up, because it could not locate a Globus threading library.
(Ticket #4440).
- When a daemon checks whether a user has execute permission for a
directory, it now considers supplemental groups and POSIX ACLs in the
determination.
(Ticket #4402).
- Fixed a bug that could cause GSI security operations to fail if
GLOBUS_THREAD_MODEL was set in the environment.
(Ticket #4464).
- Fixed a bug in condor_ft-gahp that caused it to ignore the peer
version given by the CONDOR_VERSION command, causing it to think that
its file transfer peer was the same version as itself.
(Ticket #4473).
- Fixed the handling of optional authentication parameters given to
remote_gahp. This is used as part of the batch grid-type when
submitting jobs to a remote system via ssh.
(Ticket #4434).
- Fixed a bug in parsing the value set for the
Detected<Tag> attribute of the output of a script specified by
configuration variable MACHINE_RESOURCE_INVENTORY_<TAG>.
If the value of Detected<Tag> was not a string,
then it would not be parsed correctly.
As a result the resource quantity would be set to 0.
(Ticket #4427).
Version 8.2.1
New Features:
Bugs Fixed:
Version 8.2.0
Release Notes:
- HTCondor version 8.2.0 released on June 24, 2014.
New Features:
- The new configuration variable SOCKET_LISTEN_BACKLOG
controls the listen backlog setting for a daemon's command port.
The default value of 500 implements the previously hard coded value.
(Ticket #4393).
- Streamlined the network protocol used by condor_submit,
resulting in faster job submission times and less condor_schedd overhead,
especially when performing a submit to a remote condor_schedd.
(Ticket #3846).
- The default value for configuration variable CLAIM_WORKLIFE
has changed from 60 minutes to 20 minutes.
(Ticket #4374).
- The default value for configuration variable
NEGOTIATOR_PRE_JOB_RANK has changed to prefer to match
multi-core jobs to dynamic slots in a best-fit manner.
In addition, the default value for configuration variable
PREEMPTION_RANK has changed to first choose the user with the
worst priority, and then choose the job of that user with the least
amount of accumulated run time.
(Ticket #4374).
- The default set of metrics published by the condor_gangliad has been
reduced to an essential set of scheduler and negotiator metrics.
Also, the units for accumulated times have changed from seconds to hours.
(Ticket #4299).
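The ordering described for the new PREEMPTION_RANK default (worst-priority user first, then that user's job with the least accumulated run time) can be sketched as a two-part sort key. This is not the actual configuration expression. The field names are hypothetical, and the sketch assumes that a numerically larger user priority value means a worse priority.

```python
def pick_preemption_victim(candidates):
    """Choose the job to preempt from a list of candidate dicts.

    Each candidate has "user_prio" (assumed: larger is worse) and
    "run_time" in seconds. Worst priority wins; ties go to the job
    with the least accumulated run time.
    """
    return max(candidates, key=lambda c: (c["user_prio"], -c["run_time"]))


candidates = [
    {"job": "a", "user_prio": 5.0, "run_time": 100},
    {"job": "b", "user_prio": 50.0, "run_time": 900},
    {"job": "c", "user_prio": 50.0, "run_time": 30},
]
print(pick_preemption_victim(candidates)["job"])
```

Jobs b and c belong to the user with the worst priority, and c has run for less time, so c is chosen. Preferring the job with less run time discards less completed work.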
Bugs Fixed:
- Fixed a bug that caused a memory leak in the condor_procd
when cgroup tracking is enabled.
(Ticket #4408).
- Fixed a bug that caused a memory leak in the condor_collector
under heavy load. This bug was introduced in HTCondor version 8.1.5.
(Ticket #4370).
- Windows machines with more than nine dynamic slots may have
failed to start jobs due to a limit on the number of characters
in a user name.
To address this limit, the user name is shortened from
condor-reuse-slot<N> to condor-slot<N>.
(Ticket #4388).
- Fixed a bug in which condor_q failed to communicate with a
condor_schedd of HTCondor version 8.1.4.
(Ticket #4384).
- Fixed bugs introduced in HTCondor version 8.1.5 that caused communication
between the cream_gahp and the remote CREAM servers to fail.
(Ticket #4392).
- Fixed a bug introduced in HTCondor version 8.1.2 that caused grid-type
cream jobs to fail when copy_to_spool was set to True
in the submit description file.
(Ticket #4391).
- When submitting a grid universe job with a grid type of batch and
setting request_memory, the job would fail if the remote
batch system was HTCondor. This has been fixed.
(Ticket #4367).
- Improved the detection of IPv4 link-local addresses.
(Ticket #4341).
- Fixed a bug in which the HTCondor central manager may attempt to
send email to a user named NONE, if configuration variable
CONDOR_DEVELOPERS is left unset.
(Ticket #4399).
- Fixed a bug in which condor_user_prio could result in a
segmentation fault when given the -grouporder option.
(Ticket #4407).
- Fixed a bug that caused frequent crashes of the cream_gahp.
(Ticket #4406).
- Fixed a bug that prevented attribute SubmitterUserPrio from
properly functioning in PREEMPTION_REQUIREMENTS and
PREEMPTION_RANK expressions as documented in
section 3.4.3.
(Ticket #4369).
- Fixed a bug that could cause some commands sent to HTCondor daemons
to fail, especially when sent over a slow network.
This bug was introduced in HTCondor version 8.1.5.
(Ticket #4368).