8.3 Stable Release Series 7.6
This is a stable release series of Condor.
As usual, only bug fixes (and potentially, ports to new platforms)
will be provided in future 7.6.x releases.
New features will be added in the 7.7.x development series.
The details of each version are described below.
Version 7.6.3
Release Notes:
- Condor version 7.6.3 released on August 23, 2011.
New Features:
Configuration Variable and ClassAd Attribute Additions and Changes:
Bugs Fixed:
- Fixed a bug causing parallel universe jobs to be preempted upon
renewal of the job lease,
which by default happens within 20 minutes.
This meant that essentially no parallel universe job that took
longer than 20 minutes would ever finish.
(Ticket #2317).
- When the job requirements expression contained a
reference to RequestMemory, the behavior was inconsistent:
in some cases the default RequestMemory requirements were
suppressed, and in other cases they were not. Now, the default
RequestMemory requirements are always suppressed when the job
requirements explicitly reference RequestMemory.
- Fixed a bug that could cause Condor to crash when using
the Local Credential Mapping Service (LCMAPS) with
GSI authentication.
(Ticket #2340).
- Fixed a bug that caused the condor_collector daemon to crash
upon receiving a CCB command,
when ENABLE_CCB_SERVER was changed from True to False
without restarting the daemon.
(Ticket #2357).
- The GT2 GAHP no longer consumes all of the CPU when compiled
with threaded Globus libraries.
(Ticket #2345).
- Fixed a problem introduced in Condor version 7.5.6
that caused local lock files for user log locking to be created
even when ENABLE_USERLOG_LOCKING was set to False.
(Ticket #2116).
- When the MSI installer on Windows platforms installs Condor as a service,
it now sets the default startup type to Automatic Delayed.
(Ticket #2318).
- In PrivSep mode, if started as root,
the condor_master re-executes itself as the condor user.
Previously, supplementary groups were preserved.
Now supplementary groups are cleared and set to the list of groups
to which the condor user belongs.
(Ticket #2376).
- Fixed a bug where setting DAGMAN_PROHIBIT_MULTI_JOBS to
True caused SUBDAGs to stop working.
(Ticket #2331).
- Fixed a bug that caused scheduler universe jobs submitted via
Condor-C or condor_submit -spool to be held and unable to run.
The hold reason given was "File <filename> is missing or not executable."
(Ticket #2396).
- condor_submit now exits with an error
if the submit description file contains the command hold = True
and the -spool or -remote command-line option is used.
This combination of settings resulted in jobs that were unable to run.
(Ticket #2398).
Known Bugs:
Additions and Changes to the Manual:
Version 7.6.2
Release Notes:
- Condor version 7.6.2 released on July 19, 2011.
New Features:
- Improved how condor_dagman handles certain parse errors:
a missing node name or submit description file in JOB lines.
Also, condor_dagman
now prints DAG input file lines as they are parsed
if the debug verbosity setting is 6 or above,
as set with the condor_submit_dag command-line option -debug.
Configuration Variable and ClassAd Attribute Additions and Changes:
Bugs Fixed:
- Fixed a bug in the condor_negotiator that impacted the processing
of machine RANK such that condor_startd RANK
preemption only occurred if the preempting user had sufficient user priority
to claim another machine.
- Fixed a problem in which condor_ssh_to_job did not work on systems
using the dash shell as /bin/sh.
- condor_ssh_to_job now works with jobs that are run via
glexec. Previously, it did not.
- When glexec was configured with linger=on,
the condor_starter would become unresponsive for the duration of the job.
For jobs running longer than the value of the configuration variable
NOT_RESPONDING_TIMEOUT,
this caused the job to be aborted.
It also prevented job resource usage monitoring from working
while the job was running.
- Fixed a bug in hierarchical group quotas that caused
a warning to be logged even though the quotas were handled correctly.
- condor_preen now properly respects the convention that
the -debug option causes dprintf() logging to stderr.
- Fixed a problem introduced in Condor version 7.5.5
that could cause the condor_schedd to crash when a job was removed
during negotiation or when an idle parallel universe job left the queue.
- Fixed a problem that sometimes caused the condor_procd to die.
The sequence of events was that
the condor_startd killed an unresponsive condor_starter,
after which the condor_procd died.
The condor_startd then logged the message
"ProcD has failed" and exited.
- Fixed a problem introduced in Condor version 7.6.1
that caused the condor_shadow to crash without successfully putting the job
on hold when the user log could not be opened for writing.
- condor_history no longer crashes when given a constraint expression
longer than 512 characters.
- PBS and LSF grid jobs that arrive in a queue via Condor-C
or remote submission again work properly.
- Fixed a bug that could cause the condor_gridmanager to crash
when a CREAM job ClassAd was missing the X509UserProxy attribute.
- Fixed a bug that caused CREAM jobs to have incomplete input files
if the condor_gridmanager crashed during transfer of those input files.
- Fixed a bug in condor_submit that caused grid jobs intended for
CREAM services whose names contain extra dashes to become held.
- Fixed a bug in which condor_submit would try, but fail,
to open the Deltacloud password file
when the file name depended on an expression specified with $$().
- If the Owner attribute was not set in the ClassAd associated
with a cluster of jobs,
shared spooled executables were not correctly cleaned up.
- Fixed a bug for 64-bit versions of Windows in which the
user condor-reuse-slot<N> showed up on the login screen.
- Fixed a bug introduced in Condor version 7.5.5
that changed the default value of the configuration variable
INVALID_LOG_FILES from the empty set to a file called core.
This caused core files to be removed unexpectedly by condor_preen,
which complicated debugging of Condor.
The previous behavior has been restored.
- Fixed a bug present since Condor version 7.5.5 on Windows platforms.
The installer placed Java jar files in the correct $(BIN) directory,
while the value of the configuration variable
JAVA_CLASSPATH_DEFAULT referred to the obsolete $(LIB) directory.
The installer now correctly sets JAVA_CLASSPATH_DEFAULT
to the $(BIN) directory.
- Fixed a problem that caused Condor to fail to start when
PrivSep was enabled and the environment contained any variables
whose values included newlines.
Known Bugs:
Additions and Changes to the Manual:
Version 7.6.1
Release Notes:
- Condor version 7.6.1 released on June 3, 2011.
New Features:
Configuration Variable and ClassAd Attribute Additions and Changes:
Bugs Fixed:
- A bug introduced in Condor version 7.5.5 caused the condor_schedd
to die when its attempt to claim a slot for a parallel universe job
was rejected by the condor_startd.
- In some cases, condor_q -analyze failed to provide a detailed analysis of
the job's requirements expression when the expression contained ClassAd
function calls.
- Fixed an incorrect exit code from condor_q
when invoked with the -name option and using Quill.
- Fixed a segmentation fault introduced in Condor version 7.5.5
in the checkpoint and restart of standard universe jobs
that use compressed checkpoint images.
By default, Condor does not compress checkpoints in the standard universe,
and jobs that do not compress their checkpoints were immune to this bug.
Compressed checkpoints are only available in 32-bit versions of Condor;
checkpoint generation in 64-bit versions of Condor is unaffected.
- In Condor version 7.6.0, the condor_schedd would create a
spool directory for every job. The previous, correct behavior,
which is to create a spool directory only when needed,
has been restored.
- Fixed a bug introduced in Condor version 7.5.2
that caused the condor_negotiator daemon to crash
if any machine ClassAds contained cyclical attribute references.
- Fixed a bug that caused usage by nice_user jobs to
be charged to the user directly rather than to nice-user.user.
This bug was introduced in the 7.5 series.
- Fixed bugs in the RPM init script that could cause some
shutdown failures to go unreported
and could cause status requests,
such as service condor status,
to always report Condor as inactive.
- Fixed a bug in the condor_gridmanager that could cause a crash
when a grid type amazon job was missing required attributes.
- Fixed a bug in which the condor_shadow treated
the closed socket to the execute machine as an error
when it had asked for the claim to be deactivated and the
condor_schedd daemon was busy.
As a result, a busy condor_schedd could cause the job to be re-run.
- The matchmaking attributes
SubmitterUserResourcesInUse and RemoteUserResourcesInUse
no longer ignore SlotWeight when the condor_negotiator uses it.
- On Windows, the condor_kbdd daemon did not detect changes to the
port on which the condor_startd was listening.
As a result, the condor_kbdd failed to send updates about
keyboard and mouse activity,
which in turn caused policies that rely on
knowledge of that activity to fail.
- Fixed a bug present throughout ClassAds
in which an expression expected to yield a floating point value returned an error
if it actually evaluated to a boolean.
This situation is most common in machine RANK expressions.
- Fixed a bug in the condor_negotiator daemon
that caused it to crash if it was reconfigured
during a negotiation cycle
while hierarchical group quotas were in use.
- Fixed a bug in which, when a job was submitted to the condor_schedd
remotely or with spooling,
the file transfer plug-ins activated on the submit machine
and downloaded all the URLs specified in the transfer list
to the spool directory.
URLs are now only downloaded
when the job is actually running under a condor_starter.
This is usually on an execute node, but could also be in the local universe.
- Removed the requirement that the Condor GAHP have DAEMON-level
authorization access to the condor_gridmanager.
- On Windows platforms only,
fixed a bug that could cause a sporadic access violation
when a Condor daemon spawned another process.
- Fixed a bug that would cause the condor_startd to
incorrectly report its activity as Benchmarking instead of Idle
when there was a problem launching the benchmarking programs.
- Fixed a bug in which the condor_startd could get stuck in a loop
trying to execute an invalid or non-existent Daemon ClassAd Hook job.
- Fixed a bug in which the dedicated scheduler did not correctly
deactivate claims,
tending to result in jobs that appear to move back and forth between
the Idle and Running states,
with the condor_shadow daemon exiting each time with status 108.
Known Bugs:
Additions and Changes to the Manual:
Version 7.6.0
Release Notes:
- Condor version 7.6.0 released on April 19, 2011.
- Prior to Condor version 7.5.0, commenting out PREEN in the
default configuration file disabled condor_preen.
Starting in Condor version 7.5.0,
PREEN has an internal default value, so if
the configuration variable is not set in any configuration file,
condor_preen still runs.
To disable condor_preen, explicitly set PREEN to
nothing.
New Features:
- Condor can now create and manage virtual machine instances in a
cloud service via Deltacloud. This is done via the new
deltacloud grid type in the grid universe.
See section 5.3.9 for details.
- Improved scalability of submission of cream grid type jobs.
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable DELTACLOUD_GAHP specifies
where the deltacloud_gahp binary is located. This binary is used to
manage deltacloud grid type jobs in the grid universe.
In a normal Condor installation, the value should be
$(SBIN)/deltacloud_gahp.
- Several new job ClassAd attributes have been added to support
the deltacloud grid type in the grid universe.
These attributes are taken from the submit description file:
DeltacloudUsername,
DeltacloudPasswordFile,
DeltacloudImageId,
DeltacloudRealmId,
DeltacloudHardwareProfile,
DeltacloudHardwareProfileCpu,
DeltacloudHardwareProfileMemory,
DeltacloudHardwareProfileStorage,
DeltacloudKeyname, and
DeltacloudUserData.
These attributes are set by Condor when the instance runs:
DeltacloudAvailableActions,
DeltacloudPrivateNetworkAddresses, and
DeltacloudPublicNetworkAddresses.
See section 5.3.9 for details of running jobs under
Deltacloud, and see section 10
for definitions of these job ClassAd attributes.
- The configuration variable JAVA_MAXHEAP_ARGUMENT
has been removed.
This means that Java universe jobs will now run with the JVM's
default maximum heap setting,
unless the administrator specifies otherwise with the configuration
variable JAVA_EXTRA_ARGUMENTS,
or the job specifies otherwise with
java_vm_args in the submit description file,
as described in section 2.8.
- The configuration variable TRUST_UID_DOMAIN
was set to True in the default condor_config.local
in the RPM and Debian packages. This is no longer the case,
so this setting now uses its default value, False.
- The configuration variable NEGOTIATOR_INTERVAL was set
to 20 in the default condor_config.local in the RPM and
Debian packages. This is no longer the case, so this setting
now uses its default value, 60.
Bugs Fixed:
- Fixed a bug in condor_dagman that caused it to fail when in recovery
mode in the face of a specific combination of node job failures with
retries.
- Fixed a bug that resulted in the spooled user log not being
fetched by condor_transfer_data. Prior to Condor version 7.5.4, this
problem affected spooled jobs submitted with an explicit list of
output files to transfer. In Condor version 7.5.4, this problem also
affected spooled jobs that auto-detected output files.
- When a job is submitted with the -spool option to condor_submit,
the condor_schedd now correctly writes the submit event to the user log
in its spool directory.
Previously, it would try to write the event using the user
log path given to condor_submit by the user,
which condor_submit may not have access to.
- Fixed a file descriptor leak in the condor_vm-gahp. The leak would
cause the daemon to fail if a VMware job ran for more than 16 hours on a
Linux machine.
- Fixed a bug in condor_dagman that caused it to treat any dollar
sign in the log file name of a node job's submit description file
as an illegal condor_dagman macro.
Now only the sequence of characters $( delimits a macro.
Known Bugs:
- There are two known issues related to the automatic creation
of checkpoints with the Condor checkpointing library in
Condor version 7.6.0.
The first is that compression of
standalone checkpoints is disabled for 32-bit binaries;
it has always been disabled for 64-bit binaries.
A standalone checkpoint is one taken by a program running outside
of Condor's standard universe. The second problem concerns compressed
32-bit checkpoint files within the standard universe.
If a user requests a compressed 32-bit checkpoint in the standard universe,
the resulting checkpoint will not be compressed.
As with standalone checkpoints, this has never been supported
in 64-bit binaries. We hope to fix both problems in
Condor version 7.6.1.
Additions and Changes to the Manual:
condor-admin@cs.wisc.edu