10.4 Stable Release Series 8.0
This is a stable release series of HTCondor.
As usual, only bug fixes (and potentially, ports to new platforms)
will be provided in future 8.0.x releases.
New features will be added in the 8.1.x development series.
The details of each version are described below.
Version 8.0.3
Release Notes:
- HTCondor version 8.0.3 has not yet been released.
New Features:
Configuration Variable and ClassAd Attribute Additions and Changes:
Bugs Fixed:
- Fixed a bug with condor_ssh_to_job and USE_PID_NAMESPACES. If the latter option was enabled, the condor_ssh_to_job session would run in a private PID namespace, and the user would not be able to see their job with ps or gdb. condor_ssh_to_job now runs in the global namespace, so that it can see the processes in the user's job.
(Ticket #3872).
- Fixed a performance problem with the condor_qedit command that caused the condor_schedd to run very slowly when condor_qedit was run on a large number of jobs. condor_qedit no longer writes an event to the job log.
(Ticket #3827).
- Fixed a problem where the classad Python module would return
incorrect results when ClassAd caching was enabled.
(Ticket #3879).
Known Bugs:
Additions and Changes to the Manual:
Version 8.0.2
Release Notes:
- HTCondor version 8.0.2 released on August 22, 2013.
- Debian 5 is past its end of life.
Starting with this release, we no longer provide native packages or
tarballs for Debian 5.
(Ticket #3852).
New Features:
Configuration Variable and ClassAd Attribute Additions and Changes:
- The default value of ENABLE_DEPRECATION_WARNINGS
has been changed to False.
(Ticket #3848).
Bugs Fixed:
- Implemented a workaround to avoid triggering a Linux kernel defect
when using cgroups and suspending jobs.
(Ticket #3847).
- Fixed a Python bindings problem of missing converters by providing
pyclassad as a shared library.
(Ticket #3780).
- Fixed a file permission bug introduced in HTCondor version 7.9.2 that
prevented vm universe jobs from working when using the Xen or KVM
hypervisor.
(Ticket #3781).
- Fixed a bug that could cause the condor_collector to
become unresponsive if the remote HTCondorView server,
specified with configuration variable CONDOR_VIEW_HOST,
became unavailable.
(Ticket #3758).
- The code that publishes Linux distribution attributes in the machine ClassAd
is now more robust in the event that the /etc/issue file has been edited.
(Ticket #3854).
- Fixed a bug that could cause jobs to be incorrectly placed on hold upon
completion with a hold reason claiming an out-of-memory event.
(Ticket #3824).
- Fixed a bug that prevented work fetch scripts from running
on systems where cgroup based tracking and management was enabled.
(Ticket #3819).
- Fixed a bug that could cause the condor_negotiator to give out the same
slot twice, and result in a scary entry in the NegotiatorLog file
with the wording:
INSANE: bestCached != bestSoFar
(Ticket #2245).
- Fixed a bug introduced in HTCondor version 7.9.3,
in which concurrency limits were not respected across negotiation cycles when
NEGOTIATOR_CONSIDER_PREEMPTION was False.
(Ticket #3815).
- Fixed a bug from HTCondor version 7.9.6.
The bug exhibited itself when using CCB to connect to the condor_startd;
the condor_negotiator and condor_schedd would sometimes crash and then be restarted
with the following error message in the log:
ERROR "Selector::add_fd(): fd -1 outside valid range 0-1023"
A workaround for HTCondor versions 7.9.6 through
8.0.1 is to configure
SERVICE_COMMAND_SOCKET_MAX_SOCKET_INDEX = -1
(Ticket #3801).
- Fixed a bug in the condor_qsub script that caused it to exit
with a syntax error when a job with a memory requirement was
submitted.
(Ticket #3808).
- Fixed a bug causing security groups for EC2 jobs to be ignored.
Also, the code now respects the use of commas, as documented,
to separate the items in the list of security groups specified by
the submit description file command ec2_security_groups.
(Ticket #3787).
- When invoking glexec, environment variable
GLEXEC_TARGET_PROXY is now set to /dev/null.
In some situations, it was previously set
to a nonexistent path, which caused errors in some configurations.
(Ticket #3794).
- HTCondor daemons are now less vulnerable to long connection delays
when attempting to connect to hosts that are off-line. A specific case
where this helps is when condor_schedd is using a high availability
configuration, and the primary machine running the condor_collector
is off-line.
(Ticket #3828).
- Fixed a bug that could cause condor_dagman to hang
due to a rarely seen event ordering.
This bug could have been triggered when using the
configuration variable DAGMAN_MAX_JOBS_IDLE,
or its equivalent command line option -maxidle.
(Ticket #3834).
- Fixed a bug that caused job submission from Windows platforms
using condor_submit with the -spool option to always fail.
(Ticket #3791).
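The ec2_security_groups fix above (Ticket #3787) concerns a comma-separated list. As a minimal sketch of that kind of parsing, and not HTCondor's actual implementation, the hypothetical helper parse_security_groups below splits a value on commas and trims whitespace. The group names are made up for illustration.

```python
def parse_security_groups(value):
    """Split a comma-separated ec2_security_groups value into names.

    Hypothetical helper for illustration only; it is not part of HTCondor.
    """
    # Split on commas, trim surrounding whitespace, and drop empty items.
    return [item.strip() for item in value.split(",") if item.strip()]


# Made-up group names; spacing around the commas does not matter.
groups = parse_security_groups("web-servers, batch-workers,db")
print(groups)
```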
Known Bugs:
- DAGMan recovery mode does not work for Pegasus-generated SUBDAGs.
For SUBDAGs, doing condor_hold or condor_release on
the condor_dagman job, or stopping and re-starting the
condor_schedd with the DAGMan
job in the queue will result in failure of the DAG. This can be
avoided by doing a condor_rm of the DAGMan job, which produces a Rescue
DAG, and re-submitting the DAG; the Rescue DAG is automatically run.
This bug was introduced in HTCondor version 8.0.1.
(Ticket #3882).
Additions and Changes to the Manual:
Version 8.0.1
Release Notes:
- HTCondor version 8.0.1 released on July 17, 2013.
New Features:
- HTCondor now provides the Debian Linux 7.0 (wheezy) platform,
including support for the standard universe.
(Ticket #3665).
Configuration Variable and ClassAd Attribute Additions and Changes:
Bugs Fixed:
- Fixed a bug that prevented per-slot settings of the
STARTD_ATTRS configuration variable from being set
correctly for partitionable slots named with a SLOTX_ prefix.
(Ticket #3726).
- Fixed a bug that caused condor_status -submitters to report twice
as many jobs running as were actually running.
This bug appeared in HTCondor versions 7.9.6 and 8.0.0.
(Ticket #3713).
- Fixed a bug with hierarchical group quotas in the condor_negotiator
in which group hierarchies with parent groups that set
configuration variable GROUP_ACCEPT_SURPLUS to
False would be assigned allocations above their quota.
(Ticket #3695).
- Fixed a bug in which scheduler universe jobs that
have an on_exit_hold
expression that evaluates to True could have duplicate hold messages
in the user log.
(Ticket #3651).
- Fixed a bug in which condor_dagman would submit multiple copies of the
same job, fail, write a Rescue DAG, and leave the jobs in the queue.
This was due to a warning from condor_submit that the submit description file
was not using lines containing the string "cluster".
As a fix, condor_dagman will search for the
string " submitted to cluster ".
This will generate fewer false alarms.
If the submission succeeds and condor_dagman gets confused,
the jobs will be removed when condor_dagman writes a Rescue DAG.
(Ticket #3658).
- Added libdate-manip-perl as a dependency to the Debian packages.
It is required in order to run the condor_gather_info script.
(Ticket #3692).
- Configuration variable CCB_ADDRESS did not correctly
support a list of CCB servers. Only the first one in the list was used.
(Ticket #3699).
- Fixed a bug that caused some communication layer log messages
to end with binary characters.
(Ticket #3706).
- Fixed a bug that can cause the condor_procd to erroneously exit
on Mac OS X when many processes are created in a short period of time.
(Ticket #3725).
- Fixed a bug that caused condor_dagman to have problems restarting
after an upgrade from HTCondor version 7.8.
(Ticket #3707).
- Fixed a bug that caused the command
condor_q -dag -run
to print garbage.
(Ticket #3578).
- Fixed a bug that prevented jobs with an invalid ec2_keypair
from being removed.
(Ticket #3485).
- Fixed a memory leak and potential crash in the condor_gridmanager
when requests to an EC2 service fail.
(Ticket #3701).
- Fixed a bug in the condor_gridmanager that can cause EC2 jobs to be
submitted a second time during recovery.
(Ticket #3705).
- Fixed a memory leak in the condor_gridmanager that was triggered when
submitting EC2 grid universe jobs.
(Ticket #3720).
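The condor_dagman change in Ticket #3658 above relies on matching the longer phrase " submitted to cluster " rather than the bare word "cluster". The following is a rough sketch of that idea, not condor_dagman's actual code. The helper find_cluster_id, the regular expression, and the sample output (including the wording of the warning line) are assumptions made for illustration.

```python
import re

# Assumed shape of the success line, for example:
# "1 job(s) submitted to cluster 42."
SUBMIT_RE = re.compile(r"(\d+) job\(s\) submitted to cluster (\d+)")


def find_cluster_id(output):
    """Return the cluster id found in submit output, or None.

    Hypothetical helper; it only illustrates the stricter match.
    """
    for line in output.splitlines():
        # Skip lines that merely mention the word "cluster".
        if " submitted to cluster " not in line:
            continue
        match = SUBMIT_RE.search(line)
        if match:
            return int(match.group(2))
    return None


# Made-up output: a warning that mentions "cluster", then the success line.
sample = (
    "WARNING: a line mentioning cluster was unused.\n"
    "Submitting job(s).\n"
    "1 job(s) submitted to cluster 42.\n"
)
print(find_cluster_id(sample))
```

Matching the full phrase means the warning line is skipped and only the success line yields a cluster id.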
Known Bugs:
Additions and Changes to the Manual:
Version 8.0.0
Release Notes:
- HTCondor version 8.0.0 released on June 6, 2013.
New Features:
- The condor_chirp write command now accepts an
optional numbytes parameter following the local file name.
This allows the write to be limited to the specified number of bytes.
(Ticket #3548).
- The HTCondor Python bindings now build on Mac OS X.
(Ticket #3584).
- Updated the sample condor.plist file to work better with
current versions of Mac OS X.
(Ticket #3624).
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable
DEDICATED_SCHEDULER_WAIT_FOR_SPOOLER
permits the specification of a very strict execution order for
parallel universe jobs handed to a remote scheduler.
(Ticket #2946).
Bugs Fixed:
- Fixed a bug that occurred with partitionable slots when jobs
requested more than one CPU and the configuration variable
NEGOTIATOR_CONSIDER_PREEMPTION was set to False. This would
cause the negotiator to incorrectly assume that each slot had
a slot weight of one.
(Ticket #3737).
- The redundant configuration variable CheckpointPlatform has
been removed, and the configuration variable CHECKPOINT_PLATFORM
is now documented.
(Ticket #3544).
- A standard universe job will no longer crash, and it will no longer
cause the condor_shadow to crash
if the job calls mmap() with an unsupported combination of flags.
(Ticket #3565).
- Support for VMware Workstation and VMware Player
under the vm universe now works properly on Windows platforms.
(Ticket #3627).
- For grid universe jobs intended for an EC2 grid resource,
errors which have no response body now report the HTTP code.
(Ticket #3541).
- condor_chirp put would experience an assertion failure when
used on an empty file. This bug has been fixed, and put can now be
used on an empty file.
(Ticket #3542).
- The 32-bit condor_starter could fail to execute jobs when the initial
working directory of the job was on a subsystem containing 64-bit metadata,
such as inode numbers.
(Ticket #3605).
- condor_dagman failed to react correctly if a nested DAG file
did not exist. It now reacts correctly and prints a more
helpful error message.
(Ticket #3623).
- Fixed a bug that caused the condor_master daemon on Windows platforms
to think there were new binaries
when changing to or from daylight saving time.
The condor_master daemon would then kill and restart itself,
as well as all of the daemons,
if configuration variable MASTER_NEW_BINARY_RESTART was set
to its default value of GRACEFUL.
(Ticket #3572).
- Fixed a bug that caused redundant lines to show up in the user log
at the end of the partitionable resource usage summary.
(Ticket #3621).
- Fixed several bugs that can cause the condor_procd to fail on Mac OS X
and not be restartable.
(Ticket #3617).
(Ticket #3618).
(Ticket #3620).
- The condor_procd now ignores process id 0 on Mac OS X.
(Ticket #3516).
- Fixed memory leaks in the condor_shadow and the condor_startd;
fixed a file descriptor leak in the standard universe condor_starter.
(Ticket #3590).
- Fixed a bug in which condor_dagman would miscount the number
of held jobs when
multiple copies of hold events were written to the user log.
(Ticket #3650).
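The held-job miscount in Ticket #3650 above is an instance of a general problem: a plain counter over-counts when the same event is logged more than once. One way to make such a count tolerant of duplicates is to track job ids in a set, as in the sketch below. This is not condor_dagman's actual implementation; the function count_held_jobs, the event names, and the job ids are assumptions made for illustration.

```python
def count_held_jobs(events):
    """Count currently held jobs from (event, job_id) pairs.

    Hypothetical helper. Job ids are kept in a set, so a repeated
    hold event for the same job does not change the count.
    """
    held = set()
    for event, job_id in events:
        if event == "held":
            held.add(job_id)
        elif event == "released":
            # discard() is a no-op if the job is not currently held.
            held.discard(job_id)
    return len(held)


# Job 1.0 has its hold event logged twice; job 2.0 is held, then released.
events = [
    ("held", "1.0"),
    ("held", "1.0"),
    ("held", "2.0"),
    ("released", "2.0"),
]
print(count_held_jobs(events))  # prints 1
```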
Known Bugs:
- The following obsolete binaries have not yet been removed from
the HTCondor tarballs:
- classad_functional_tester
- classad_version
- condor_test_match
- condor_userlog_job_counter
(Ticket #3670).
- condor_status -submitters reports twice
as many jobs running as were actually running.
(Ticket #3713).
Additions and Changes to the Manual:
- Fixed the condor_configure man page and added a corresponding
condor_install man page.
(Ticket #3619).
- Added stub man pages for the Bosco commands.
(Ticket #3634).
htcondor-admin@cs.wisc.edu