10.5 Development Release Series 7.7
This is the development release series of Condor.
The details of each version are described below.
Version 7.7.6
Release Notes:
- Condor version 7.7.6 released on April 24, 2012.
This release contains all bug fixes from Condor version 7.6.7,
as listed in this manual's version history.
- In the Condor directory defined by $(SBIN),
condor_vm_vmware.pl was
renamed to condor_vm_vmware and grid_monitor.sh was
renamed to grid_monitor.
This makes Condor more compliant with Linux native packaging rules.
Symbolic links to the old locations are included to ease upgrading.
(Ticket #2940).
New Features:
- The values of request_memory, request_disk and
request_cpus submit description file commands will now be
automatically included in the job Requirements expression by
condor_submit. This is part of several changes
in code and policy intended to make partitionable slots easier to deploy
and use. The requested values for memory, disk and cpus, as well as the
amount of these resources that a job actually uses, are now printed in the
user log when the job exits.
(Ticket #2843).
- The new keep_claim_idle submit description
file command requests that the condor_schedd keep a claim for a defined
number of seconds after the job exits.
The job ClassAd attribute KeepClaimIdle was introduced in
Condor version 7.7.1 to implement this functionality.
See the definition of this command at
section 11.
(Ticket #2094).
- Changed the default for condor_history to print out
items in reverse chronological order.
The new -forwards option enables the previous behavior of
printing historical jobs in chronological order.
(Ticket #2808).
- Enhanced the condor_negotiator to provide the names of
concurrency limits that cause negotiation to fail, so that
condor_q -analyze can provide more informative failure information.
(Ticket #2878).
- Concurrency limit defaults may now be declared for named groups
using CONCURRENCY_LIMIT_DEFAULT_<group> so that any
concurrency limit with a name of the form <group>.<name> will get its
default limit from CONCURRENCY_LIMIT_DEFAULT_<group>.
(Ticket #2863).
- Condor binaries will now look for the Condor configuration file in
$(HOME)/.condor/condor_config, in addition to the locations where
they already look.
Within the ordered search,
$(HOME)/.condor/condor_config is checked immediately after the
CONDOR_CONFIG environment variable.
(Ticket #2657).
- The condor_hdfs daemon is now available with the source code,
and is no longer distributed as part of the Condor binaries.
See documentation in section .
(Ticket #2797).
- Several of the Condor programs used to be given by a single executable
hard linked to multiple file names.
Now, symbolic links are used; this fixes problems with Debian installations.
(Ticket #2140).
- New ClassAd functions pow(), quantize(),
splitUserName(), and splitSlotName() are available.
See section 4.1.2 for definitions of these functions.
(Ticket #2856).
(Ticket #2891).
- New format tags %v and %V have been added for use by the
condor_status -format option.
These tags request that the value of the expression or attribute be printed
using a format appropriate to its type.
When using the %V format tag, string values appear as they would in
the output of condor_q -long or condor_submit -long.
(Ticket #2857).
- condor_ssh_to_job now provides support for X11 forwarding
via the new -X option.
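The group-based concurrency limit defaults listed among the new features above can be pictured as a lookup with fallbacks. The following Python sketch is only an illustration: the function name, the dictionary standing in for the configuration, and the lookup order (an explicit <name>_LIMIT entry first, then the group default, then the pool-wide CONCURRENCY_LIMIT_DEFAULT) are assumptions and are not stated in these release notes.

```python
def resolve_concurrency_limit(config, limit_name, fallback=None):
    """Return the limit that applies to limit_name, or fallback if none is set.

    Assumed lookup order (hypothetical, for illustration only):
      1. an explicit <limit_name>_LIMIT entry
      2. CONCURRENCY_LIMIT_DEFAULT_<group> for names of the form <group>.<name>
      3. the pool-wide CONCURRENCY_LIMIT_DEFAULT
    """
    explicit = config.get(f"{limit_name}_LIMIT")
    if explicit is not None:
        return int(explicit)

    # Split "<group>.<name>" at the first dot; dot is "" when there is no group.
    group, dot, _name = limit_name.partition(".")
    if dot:
        group_default = config.get(f"CONCURRENCY_LIMIT_DEFAULT_{group}")
        if group_default is not None:
            return int(group_default)

    pool_default = config.get("CONCURRENCY_LIMIT_DEFAULT")
    return int(pool_default) if pool_default is not None else fallback
```

With CONCURRENCY_LIMIT_DEFAULT_sw set to 3 in this model, a limit named sw.idl that has no explicit entry resolves to 3, while a limit named db falls through to the pool-wide default.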
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new machine ClassAd attributes RemoteGroup,
RemoteNegotiatingGroup, and RemoteAutoregroup,
and the new job ClassAd attributes SubmitterGroup,
SubmitterNegotiatingGroup, and SubmitterAutoregroup
enhance support for preemption policies with accounting group awareness.
(Ticket #2885).
- The new configuration variable
NEGOTIATOR_READ_CONFIG_BEFORE_CYCLE is a boolean which causes the
condor_negotiator to re-read the configuration prior to each
negotiation cycle when set to True.
(Ticket #2851).
- The new configuration variable MASTER_NEW_BINARY_RESTART
specifies how the condor_master will restart,
when it notices that the condor_master binary has changed.
Valid values are GRACEFUL, PEACEFUL and NEVER.
The default value is GRACEFUL.
(Ticket #2779).
- The configuration variable WANT_HOLD now takes effect
whether or not WANT_VACATE is True. Previously,
it only took effect if WANT_VACATE was True.
(Ticket #2855).
- The new configuration variables MEMORY_USAGE_METRIC and
MEMORY_USAGE_METRIC_VM specify the value that the
condor_starter will
set into the MemoryUsage attribute for a job. It is expected that
this will be a ClassAd expression that defines the job memory usage in terms
of other job attributes.
(Ticket #2843).
- The configuration variable DAGMAN_SUBMIT_DELAY can now be any
non-negative integer. It was formerly limited to values between 0 and 60,
inclusive.
(Ticket #2864).
- New configuration variables have been added,
such that the condor_schedd may
define statistics that count subsets of jobs.
These variables
have the form SCHEDD_COLLECT_STATS_FOR_<name> and
are defined by a boolean ClassAd expression.
<name> is prefixed to the names of the statistics attributes in the
condor_schedd ClassAd.
For example, if SCHEDD_COLLECT_STATS_FOR_physics is defined,
the attribute physicsJobsStarted counts the jobs that have started
and for which the expression evaluates to True.
(Ticket #2862).
- Several OpSys related attributes were added or updated to assist with selection of execute resources.
- OpSysAndVer:
- A string containing the value of the OpSysName attribute with the OpSysMajorVersion attribute appended.
- OpSysLegacy:
- A string that holds the long-standing values for the OpSys attribute.
- OpSysLongName:
- A string containing a full description of the operating system.
- OpSysMajorVersion:
- An integer value representing the major version of the operating system.
- OpSysName:
- A string containing a terse description of the operating system.
- OpSysShortName:
- A string containing a short description of the operating system.
- OpSysVer:
- An integer value representing the operating system version number.
(Ticket #2366).
- New configuration variables have been added to provide default values for attributes needed to provision
dynamic slots. condor_submit will insert the values of these variables into the job ClassAd when the submit
file does not provide a value for the attribute.
- RequestMemory:
- specified by JOB_DEFAULT_REQUESTMEMORY, defaults to MemoryUsage.
- RequestDisk:
- specified by JOB_DEFAULT_REQUESTDISK, defaults to DiskUsage.
- RequestCpus:
- specified by JOB_DEFAULT_REQUESTCPUS, defaults to 1.
(Ticket #2835).
- New configuration variables have been added for the condor_startd to enable default rules for partitionable
resources. The configuration variables are expected to be expressions that quantize or otherwise modify the job's
requested sizes of resources.
- RequestMemory:
- modify with MODIFY_REQUEST_EXPR_REQUESTMEMORY.
- RequestDisk:
- modify with MODIFY_REQUEST_EXPR_REQUESTDISK.
- RequestCpus:
- modify with MODIFY_REQUEST_EXPR_REQUESTCPUS.
(Ticket #2850).
- There is a new configuration variable MUST_MODIFY_REQUEST_EXPRS for the condor_startd.
If false, then MODIFY_REQUEST_EXPRs are only applied if the job claim still matches the partitionable
slot after modification. If true, the modifications always take place, and if the modifications cause the claim
to no longer match, then the startd will simply refuse the claim. The default value is false.
(Ticket #2850).
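The interaction between the MODIFY_REQUEST_EXPR_* variables and MUST_MODIFY_REQUEST_EXPRS described above can be sketched as a small decision function. This Python model is an assumption-laden illustration, not the condor_startd implementation: the dictionaries for the request and the slot, the quantize_up helper (a stand-in for a quantizing expression), and the function names are all hypothetical.

```python
def quantize_up(value, step):
    """Round value up to the next multiple of step (stand-in for a quantizing expression)."""
    return -(-value // step) * step


def apply_modify_request_exprs(request, slot, modify, must_modify=False):
    """Return the resources to provision, or None if the claim is refused.

    request, slot: dicts mapping a resource name to an amount (hypothetical model).
    modify: dict mapping a resource name to a function applied to the requested
            amount, standing in for the MODIFY_REQUEST_EXPR_* expressions.
    """
    def fits(req):
        return all(amount <= slot.get(name, 0) for name, amount in req.items())

    modified = {
        name: modify.get(name, lambda amount: amount)(amount)
        for name, amount in request.items()
    }
    if fits(modified):
        return modified       # the modified request still matches the slot
    if must_modify:
        return None           # modifications always apply, so the claim is refused
    return dict(request)      # modifications are skipped; the original request is used
```

In this model, with a slot offering 1000 MB and a rule that quantizes memory to multiples of 128 MB, a 950 MB request is kept unmodified when must_modify is False and is refused when it is True.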
Bugs Fixed:
- Fixed a bug in condor_vm-gahp that caused 64-bit guest OSes that
need network access to fail on start-up when run under VMware.
(Ticket #2922).
- Submit command remote_initialdir now works for pbs and lsf
grid universe jobs.
(Ticket #2913).
- Fixed the path to sftp_server on Mac OS X and Debian
platforms.
(Ticket #2789).
- Fixed a rare problem that caused a 20 second timeout to occur in
the condor_collector when authenticating.
(Ticket #2817).
- Fixed a rare bug in which the condor_schedd would sometimes not reuse
an existing claim to run a new job when an existing job exited.
This would result in the condor_schedd daemon
waiting for a new negotiation cycle to make a new match,
and thus producing a small performance penalty due to the
wasted time during the interval between negotiation cycles.
This bug was actually fixed in Condor version 7.7.5.
(Ticket #2802).
- Fixed a bug in condor_q, such that it no longer emits a parse
error when it times out attempting to talk to the condor_schedd daemon.
(Ticket #2854).
- The shared library libcondor_utils now includes the Condor
version in its name. This will reduce the chance of a Condor binary
using the wrong version of the library, which can result in a crash or
other bad behavior.
(Ticket #2613).
- Fixed a bug in which the daemons were stopped in a random order
on GRACEFUL and PEACEFUL shutdown.
This resulted in the checkpoint server
sometimes being shut down before the condor_startd.
The condor_startd is now always shut down first on GRACEFUL or PEACEFUL
shutdown,
with other daemons being shut down only after the condor_startd has exited.
(Ticket #2779).
- Under some circumstances,
a job in the removed ("X") state may have ignored the -forcex option
to condor_rm.
The condor_schedd is now more aggressive about removing such jobs
from the queue.
(Ticket #2809).
- Fixed the copying of scaling factors on ClassAd literal values.
(Ticket #2839).
- When a job is killed and put on hold because of
WANT_HOLD, the maximum vacate time is now enforced. If
it takes longer than the maximum vacate time for the job to be
gracefully killed, the job is hard-killed. Previously, no upper
limit was enforced.
(Ticket #2855).
- When selecting an IPv4 network interface to use, Condor would erroneously prefer private networks over public networks in some cases. This has been fixed; Condor again prefers public networks over private networks.
(Ticket #2853).
- The condor_gridmanager is much better at sending commit signals
to the GRAM job-manager in a timely manner. As a result, the occurrence of
GRAM errors 111 and 130 should be greatly reduced.
(Ticket #2859).
- Fixed a bug that caused condor_submit to warn about
dag_status and failed_count not being used in the
submit files of most DAG node jobs (DAGMan now automatically defines
these macros for all node jobs). This bug was introduced in 7.7.5.
(Ticket #2814).
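The preference for public over private IPv4 networks mentioned in the fixes above can be illustrated with a short Python sketch. It is not Condor's selection code: the function name is hypothetical, and the notion of a private address is simply whatever is_private reports in Python's standard ipaddress module.

```python
import ipaddress


def pick_ipv4_address(candidates):
    """Return the first public IPv4 address, else the first candidate, else None.

    candidates: list of dotted-quad strings, in the order the interfaces were found.
    """
    parsed = [ipaddress.IPv4Address(text) for text in candidates]
    public = [addr for addr in parsed if not addr.is_private]
    chosen = public or parsed  # fall back to private addresses only if no public one exists
    return str(chosen[0]) if chosen else None
```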
Known Bugs:
Additions and Changes to the Manual:
- The condor_submit man page contains descriptions of condor_starter
prescripts and postscripts.
See 11 and 11
for the descriptions.
(Ticket #2379).
Version 7.7.5
Release Notes:
- Condor version 7.7.5 released on February 28, 2012.
This release contains all features and bug fixes from Condor version 7.6.6.
- Support for the gt4 grid type (that is, Web Services GRAM) in the grid
universe has been removed.
(Ticket #2782).
New Features:
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable JOB_QUEUE_LOG
specifies an alternative path and file name for the job_queue.log file.
The default value is $(SPOOL)/job_queue.log.
This alternative location can be
useful if there is a solid state drive which is big enough to hold the
frequently written to job_queue.log,
but not big enough to hold the whole contents of the spool directory.
(Ticket #2598).
- The new configuration variable DAGMAN_HOLD_CLAIM_TIME
specifies the amount of time in seconds that the condor_schedd
will hold a claim idle for a DAGMan job,
using the KeepClaimIdle attribute in the job ClassAd.
(Ticket #2673).
- The job ClassAd attributes
ResidentSetSize and ProportionalSetSizeKb now
report the maximum observed memory usage.
Previously, they reported the most recently observed memory usage.
This change makes these attributes similar to ImageSize,
which also reports the maximum observed value.
Previously, ResidentSetSize was
usually reported as 0 in the job history for completed jobs, because
when the job was finished, the final observation of memory usage
was 0.
(Ticket #2725).
- The job ClassAd attribute ResidentSetSize is now rounded
by default,
using the new default configuration setting
SCHEDD_ROUND_ATTR_ResidentSetSize = 25%.
(Ticket #2729).
- The configuration variable PROCD_LOG now defaults to
$(LOG)/ProcLog. Previously, there was no default value,
so the condor_procd did not log by default.
(Ticket #2775).
- The meaning of the VirtualMemory attribute of the condor_startd
has been changed for Linux platforms.
Previously, it was the amount of paging space configured for the system.
So, if a machine with a lot of memory had no paging space,
the VirtualMemory attribute would report zero.
Now, the VirtualMemory attribute on Linux platforms
is the sum of paging space and physical memory,
which more accurately represents the virtual memory size of the machine.
(Ticket #2763).
- The submit command globus_xml is no longer
recognized. Therefore, the following configuration variables are no longer
recognized:
- GRIDFTP_SERVER
- GRIDFTP_SERVER_WRAPPER
- GRIDFTP_URL_BASE
- GT4_GAHP
- GT4_LOCATION
- GT42_GAHP
- GT42_LOCATION
- GRIDMANAGER_MAX_WS_DESTROYS_PER_RESOURCE
(Ticket #2782).
- The new configuration variable
GRIDMANAGER_PROXY_REFRESH_TIME controls when the
condor_gridmanager forwards a refreshed proxy to the remote GRAM server.
The lifetime remaining on the proxy on the remote server (in seconds) must
fall below this value before the condor_gridmanager will forward a
refreshed proxy.
The default value is 21600 seconds (6 hours).
Previously, this value was not configurable.
(Ticket #2792).
- New job ClassAd attributes were added to assist in tracking the time
jobs spend transferring output. JobCurrentStartExecutingDate is the
time that execution actually begins, and JobCurrentStartTransferOutputDate
is the time that transfer output begins (and execution ends). In addition,
CumulativeTransferTime is the total amount of time the job spent transferring
data. This includes input and output.
(Ticket #2783).
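The change to ResidentSetSize and ProportionalSetSizeKb described above amounts to reporting the maximum of the observed samples instead of the last one. A minimal Python sketch, with a hypothetical function name and made-up sample values, shows why the old behavior tended to leave 0 in the job history:

```python
def summarize_memory(samples_kb):
    """Return (last_observed, max_observed) for a series of memory usage samples."""
    last_observed = samples_kb[-1] if samples_kb else 0
    max_observed = max(samples_kb, default=0)
    return last_observed, max_observed


# Hypothetical samples: usage grows, then the final observation after exit is 0.
samples = [1200, 48000, 51200, 0]
last_observed, max_observed = summarize_memory(samples)
```

Here the last observation is 0, while the maximum observed value, 51200, is the more useful figure for a completed job.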
Bugs Fixed:
- Fixed a bug in which condor_submit allowed the specification of
ec2_secret_access_key and ec2_access_key_id
to be directories instead of files.
condor_submit now generates an error in these cases.
(Ticket #2619).
- Communication errors were not always correctly handled when
fetching results of a query when using the -stream option to
condor_q. This problem was introduced in Condor version 7.7.0.
(Ticket #2601).
- Fixed Condor's CronTab (Crondor, section 2.12.2)
scheduling of jobs,
which did not correctly take into account
shifts in time caused by daylight saving time transitions.
(Ticket #2620).
- Previously, condor_ssh_to_job sessions inherited the condor_starter
environment. Now, this only happens when
JOB_INHERITS_STARTER_ENVIRONMENT is True.
(Ticket #2621).
- On Linux platforms, the memory usage was ignored for job sub-processes
that were created via fork() without calling exec().
This problem affected ImageSize and ResidentSetSize,
but not ProportionalSetSize.
- Fixed a rare condition that could cause a job to remain in the
running state indefinitely when the job was removed or put on hold
and there was a communication failure between the condor_shadow
and the condor_starter.
This problem was introduced in Condor version 7.7.2.
(Ticket #2591).
- Fixed a bug in the condor_gridmanager that could cause crashes
and prevent the attribute x509UserProxyEmail from being set properly for
jobs forwarded via Condor-C.
(Ticket #2655).
- Fixed the output of condor_q -dag,
such that children of a non-existent DAG node would not be mistakenly
shown as belonging to another instance of condor_dagman.
This can happen, for example, when a condor_dagman process dies while
its children are still running.
(Ticket #2463).
- Fixed a bug in condor_dagman that caused a DAG to fail if node
job user log files were actually symbolic links.
This problem was introduced in the Condor 7.7 development series.
(Ticket #2704).
- Fixed a bug in the collection of Statistics attributes,
introduced in Condor version 7.7.2.
Condor did not count completed scheduler universe jobs in reported statistics.
(Ticket #2731).
- Fixed a rare bug in which the condor_c-gahp process could get
into an infinite loop on start up,
if more than one condor_c-gahp was running under different users,
and the names of the users only differed in their last character.
(Ticket #2749).
Known Bugs:
Additions and Changes to the Manual:
- Condor's ability to use cgroup-based process tracking,
available since Condor version 7.7.0,
has now been documented in section 3.12.12.
(Ticket #1831).
(Ticket #2120).
- Submitter ClassAd attributes are now documented in the unnumbered
appendix on page .
Version 7.7.4
Release Notes:
- Condor version 7.7.4 released on December 21, 2011.
This release contains all features and bug fixes from Condor version 7.6.5
as are currently documented (section 10.6) in this manual.
New Features:
- Condor version 7.7.4 has all of the features and fixes of 7.7.3, and it
includes work toward running on a pure IPv6 network. This is disabled by
default. There is a severe bug where enabling IPv6 in a multi-computer pool
may cause the condor_starter to crash. For
more information on enabling IPv6 support in the 7.7 series of Condor, see https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToEnableIpvSix.
(Ticket #9).
Configuration Variable and ClassAd Attribute Additions and Changes:
Bugs Fixed:
Known Bugs:
- When IPv6 is enabled and you have multiple computers in your pool, the condor_starter may crash.
Additions and Changes to the Manual:
Version 7.7.3
Release Notes:
- Condor version 7.7.3 not yet released.
- On Linux and Mac OS X, the Condor binaries now dynamically link with
libcondor_utils,
a shared library that contains all Condor code that is
used by multiple binaries.
This library is not meant to be linked with user applications.
(Ticket #2132).
- Condor now dynamically links with the ClassAds, Globus and VOMS
libraries on Mac OS X.
A copy of these libraries is included with Condor.
(Ticket #2482).
New Features:
- In Condor version 7.7.2, multiple Condor installations led to the
possibility for some installations to use the wrong version of the ClassAds
library.
This should no longer be an issue,
as the binaries now use RUNPATH instead of RPATH,
allowing use of the LD_LIBRARY_PATH environment variable
to set where to look for the shared libraries.
(Ticket #2539).
- The Amazon SOAP interface is no longer present or supported in Condor.
The EC2 REST interface is favored and supported in Condor
using a grid_resource of ec2.
(Ticket #2523).
- The new condor_gather_info tool introduced in
Condor version 7.5.6 has been updated and enhanced.
It collects data about a Condor installation, and, if desired,
about a specific job.
This information is useful to Condor developers to help
debug problems in a pool or with a job.
(Ticket #1664).
(Ticket #2372).
- The condor_userprio tool supports two new command line options.
The -grouporder flag displays submitter entries
for accounting groups at the top of the list,
in breadth-first order by group hierarchy.
The -grouprollup flag reports accounting statistics for groups
as summed at a level within the group hierarchy.
(Ticket #1926).
- The condor_collector now avoids the performance problems caused
previously when clients initiated communication with the condor_collector,
but then delayed sending input.
(Ticket #2506).
- When using versions of glexec that create a copy of the proxy
for use by the job,
Condor now ensures that this copy of the proxy is cleaned up
when the job is done.
(Ticket #2501).
- The condor_startd now logs a clear message, if it rejects a job
because no valid condor_starter daemons were detected.
(Ticket #2470).
- The new submit command want_graceful_removal
may be used to specify that a job being removed or put on hold should
be shut down gracefully, rather than being immediately hard-killed.
This allows the job to perform some final actions such as cleaning
up or saving state. The usual policies governing the Preempting/Vacating
state apply in this case.
This new submit command replaces a different mechanism that was added
in Condor version 7.5.2 to achieve some of the same effects.
The version 7.5.2 mechanism applied to vanilla jobs under Linux;
if the job set remove_kill_sig or kill_sig,
the hard-kill signal that Condor would normally send to end the job was
replaced with the signal specified by the user.
With the new submit command, the version 7.5.2 mechanism is no longer used.
The soft-kill signal may still be customized using
kill_sig, so a similar effect can be achieved by setting
want_graceful_removal=True and setting kill_sig
to an alternative signal, if desired. The new mechanism works on all
platforms and works for all universes in which the job is managed by
the condor_startd; as such the new mechanism is not supported
in the grid, local, or scheduler universes.
In addition, the new submit command job_max_vacate_time
replaces the kill_sig_timeout command.
job_max_vacate_time
adjusts the time given to an evicted job for gracefully shutting down.
(Ticket #2536).
- The condor_master now logs a more informative error message
when it fails to start a daemon.
(Ticket #2580).
- The condor_schedd daemon now logs a more informative error message
when it rejects job ClassAd updates from the condor_shadow due to
authorization problems.
(Ticket #2581).
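The graceful removal behavior described above for want_graceful_removal and job_max_vacate_time follows a common pattern: send a soft-kill signal, wait up to a maximum vacate time, then hard-kill. The Python sketch below illustrates that general pattern for an ordinary child process on a POSIX system. It is not Condor's implementation, and the function and parameter names are hypothetical.

```python
import signal
import subprocess


def stop_gracefully(child, max_vacate_time, soft_signal=signal.SIGTERM):
    """Soft-kill child, wait up to max_vacate_time seconds, then hard-kill it.

    child: a subprocess.Popen object for a running process.
    Returns the child's exit status as reported by Popen.wait().
    """
    child.send_signal(soft_signal)          # give the process a chance to clean up
    try:
        return child.wait(timeout=max_vacate_time)
    except subprocess.TimeoutExpired:
        child.kill()                        # SIGKILL on POSIX; cannot be caught or ignored
        return child.wait()
```

Passing a different soft_signal plays a role similar to customizing kill_sig: the process is told to finish up with a signal of the user's choosing, and the hard kill only happens if it overstays the allowed time.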
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable MachineMaxVacateTime is
now used to express the maximum time in seconds that the machine is
willing to wait for a job to gracefully shut down.
The default is 600 seconds (10 minutes).
The boolean KILL expression was
previously used to terminate the graceful shutdown of jobs.
It should normally be set to False now. If desired, it may be
used to abort the graceful shutdown of the job earlier than
MachineMaxVacateTime.
(Ticket #2536).
- The new configuration variable NEGOTIATOR_SLOT_CONSTRAINT
defines an expression which constrains which ClassAds are fetched
by the condor_negotiator from the condor_collector
for the negotiation cycle.
(Ticket #2277).
- The new configuration variable
NEGOTIATOR_SLOT_POOLSIZE_CONSTRAINT
replaces GROUP_DYNAMIC_MACH_CONSTRAINT .
GROUP_DYNAMIC_MACH_CONSTRAINT may still be used,
but a warning is written to the log,
identifying that the configuration needs to be updated to use the new name.
The pool size resulting from applying this constraint is used
to determine quotas, both for dynamic quotas in hierarchical groups
and when there are no groups.
(Ticket #2277).
- The configuration variable NEGOTIATOR_STARTD_CONSTRAINT_REMOVE
was introduced in Condor version 7.7.1.
It has now been removed, as its functionality
was made obsolete by NEGOTIATOR_SLOT_CONSTRAINT.
(Ticket #2277).
- The configuration variables IGNORE_NFS_LOCK_ERRORS
and BIND_ALL_INTERFACES no longer support the undocumented use of
'Y' or 'y' to mean True.
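The renaming of GROUP_DYNAMIC_MACH_CONSTRAINT to NEGOTIATOR_SLOT_POOLSIZE_CONSTRAINT described above, where the old name still works but produces a logged warning, can be sketched as a lookup with a deprecated fallback. In this Python illustration the function name, the logger name, and the dictionary standing in for the configuration are hypothetical, and letting the new name win when both are set is an assumption not stated in these notes.

```python
import logging

log = logging.getLogger("negotiator_sketch")  # hypothetical logger name


def poolsize_constraint(config):
    """Return the pool-size constraint expression, honoring the deprecated name."""
    new_name = "NEGOTIATOR_SLOT_POOLSIZE_CONSTRAINT"
    old_name = "GROUP_DYNAMIC_MACH_CONSTRAINT"
    if old_name in config:
        # The old name is still accepted, but its use is reported in the log.
        log.warning("%s is deprecated; update the configuration to use %s",
                    old_name, new_name)
    if new_name in config:
        return config[new_name]  # assumption: the new name wins when both are set
    return config.get(old_name)
```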
Bugs Fixed:
- Fixed a bug from Condor version 7.7.1
that caused submit description file commands using a substitution macro,
$$(),
to not work correctly when a condor_shadow daemon is recycled,
as it is when the configuration variable SHADOW_WORKLIFE
is set to a non-zero value.
(Ticket #2552).
- When the condor_procd's named command pipe is removed,
or when the inode of the pipe has been changed while the daemon is running,
the condor_procd will now exit.
Its previous behavior had the condor_procd continue to execute
in a useless mode of operation, unable to receive any communication.
(Ticket #2500).
- For Mac OS X platforms,
improper detection of a nonexistent process led to lines such as
ProcAPI sanity failure on pid 1317, age = -1901476270
appearing in the condor_master daemon log.
This should no longer be the case.
(Ticket #2594).
- Fixed a bug introduced with hierarchical group quotas that
failed to correctly initialize table entries.
The fix adds logic to the accounting mechanism in the
condor_negotiator daemon,
such that initialization occurs correctly
when starting up and upon reconfiguration.
(Ticket #2509).
- Using condor_advertise with the -tcp option no longer causes
the following log message to appear in the condor_collector
log:
DaemonCore: Can't receive command request from IP (perhaps a timeout?)
(Ticket #2483).
- Fixed a bug introduced in Condor version 7.7.0,
in which the setting of NETWORK_INTERFACE did not have any effect.
(Ticket #2513).
- glexec now also works when Condor is running as root.
(Ticket #2503).
- The condor_master daemon now successfully advertises itself in
a Personal Condor installation,
when the condor_collector is configured to use port 0
and to operate through a shared port.
(Ticket #2555).
- Fixed a bug, present since Condor version 7.7.1,
in which the configuration variable WANT_HOLD did not work
unless WANT_HOLD_SUBCODE was set to a non-zero value.
(Ticket #2565).
- Fixed a rare condition, present since Condor version 7.7.2, that could cause
a job to be removed from the queue,
if the job was put on hold while it was running.
In such cases, there was also a spurious
unsuspend event logged in the job's user log.
(Ticket #2577).
- Fixed a bug introduced in Condor version 7.7.2 by the change
of OpSys to "WINDOWS".
Submit description files that used old syntax for the
environment command
were using Unix syntax rather than Windows syntax.
(Ticket #2607).
- Fixed the linking of Kerberos libraries on RHEL 3.
The bug could cause
the Condor binaries to fail on some systems with the error:
relocation error: /usr/kerberos/lib/libgssapi_krb5.so.2:
undefined symbol: krb5int_enc_arcfour
(Ticket #2627).
Known Bugs:
Additions and Changes to the Manual:
Version 7.7.2
Release Notes:
- Condor version 7.7.2 released on October 11, 2011.
This release contains all features and bug fixes from Condor version 7.6.4
as are currently documented (section 10.6) in this manual.
- Condor now dynamically links with the ClassAds, Globus and VOMS libraries on
Linux.
A copy of these libraries is included with Condor, under
lib/condor/ in the tarball releases and under
/usr/lib/condor/ or /usr/lib64/condor/ in the native package
releases.
(Ticket #2389).
(Ticket #2390).
New Features:
- Condor's standard universe now supports reading from and writing to
files that are larger than 2 GBytes,
when the standard universe application and
the condor_shadow daemon are both 64-bit executables.
(Ticket #2337).
- There is command line support to both suspend and continue jobs.
The new tools condor_suspend and condor_continue will
suspend and continue running jobs.
(Ticket #2368).
- The EC2 GAHP now supports X.509 for connecting to and authenticating
with EC2 services. See section 5.3.6 for details
on using the X.509 protocol.
(Ticket #2084).
- Previously, the dedicated scheduler attempted to change the
Scheduler attribute on all parallel job processes in a durable fashion,
resulting in an fsync() for each process.
This has been changed to be not durable,
thereby improving the scalability by reducing the
number of fsync() calls without impacting correctness.
(Ticket #2367).
- In PrivSep mode, when an error is encountered when trying to
switch to the user account chosen for running the job,
the error message has been improved to make debugging easier.
Now, the error message distinguishes between safety check failures
for the UID, tracking group ID, primary group ID, and supplementary group IDs.
(Ticket #2364).
- The name of the user used to execute the job is now logged in
the condor_starter log, except when using glexec.
(Ticket #2268).
- condor_dagman now defaults to writing a partial DAG file
for a Rescue DAG,
as opposed to a full DAG file.
The Rescue DAG file is parsed in combination with the original DAG file,
meaning that any
changes to the original DAG input file take effect when running a Rescue DAG.
(Ticket #2165).
- The behavior of DAGMan is changed, such that, by default,
POST scripts will be run regardless of the return value from
the PRE script of the same node as described in section 2.10.2.
The previous behavior of not running the POST script can be restored by
either adding the -DontAlwaysRunPost option to the condor_submit_dag
command line,
or by setting the new configuration variable
DAGMAN_ALWAYS_RUN_POST to False,
as defined at 3.3.25.
(Ticket #2057).
- DAGMan will now copy PRIORITY values from the DAG input file to
the JobPrio attribute in the job ClassAd.
Furthermore, the PRIORITY values are propagated to child nodes and SUBDAGs,
so that each child node always has a priority at least that
of the maximum of the priorities of its parents.
This has been a cause of confusion for DAGMan users.
(Ticket #2167).
- A matchmaking optimization has significantly improved the speed
of matching,
when there are machines with many slots.
(Ticket #2403).
- When the condor_schedd is starting up and it encounters corruption
in its job transaction log, the error message in the log file now reports
the offset within the file at which the error occurred.
(Ticket #2450).
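The PRIORITY propagation rule described above, in which a child node ends up with a priority at least as high as the maximum of its parents' priorities, can be sketched as one pass over the DAG in topological order. This Python example is an illustration under stated assumptions, not DAGMan's code: the input dictionaries and the function name are hypothetical, nodes without a PRIORITY are assumed to default to 0, and SUBDAGs are not modeled. It uses graphlib from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def propagate_priorities(parents, priority):
    """Return each node's effective priority: the max of its own and its parents'.

    parents:  dict mapping a node to the set of its parent nodes (hypothetical form).
    priority: dict mapping a node to its PRIORITY value; missing nodes count as 0
              (an assumption for this sketch).
    """
    effective = {}
    # static_order() yields every node only after all of its parents.
    for node in TopologicalSorter(parents).static_order():
        inherited = [effective[parent] for parent in parents.get(node, ())]
        effective[node] = max([priority.get(node, 0)] + inherited)
    return effective
```

For a diamond-shaped DAG where A is the parent of B and C, and both are parents of D, priorities A=5, B=10, C=0, D=1 become A=5, B=10, C=5, D=10 under this rule.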
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new job ClassAd attribute PreserveRelativeExecutable,
when True, prevents the condor_starter from
prepending Iwd to the command executable Cmd,
when Cmd is a relative path name and TransferExecutable
is False.
(Ticket #2460).
- Attributes have been added to all daemons to publish statistics
about the number of timers, signals, socket, and pipe messages
that have been handled, as well as the amount of time spent handling them. Statistics attributes for DaemonCore
have names that begin with DC or RecentDC.
(Ticket #2354).
- The default value of OpSys on Windows machines has been changed
to "WINDOWS", and a new attribute OpSysVer has been added
that contains the version number of the operating system.
This behavior is controlled by a new configuration variable
ENABLE_VERSIONED_OPSYS which defaults to False on Windows
and to True on other platforms.
The new machine ClassAd attribute OpSys_And_Ver will always contain
the versioned operating system.
Note that this change could cause problems with mixed pools,
because Condor version 7.7.2 condor_submit may add OpSys="WINDOWS",
but machines running Condor versions prior to 7.7.2 will be publishing
a versioned OpSys value,
unless there is an override in the configuration.
(Ticket #2366).
- Configuration variable COLLECTOR_ADDRESS_FILE is now set
in the example configuration,
similar to MASTER_ADDRESS_FILE.
This configuration variable is required when COLLECTOR_HOST
has the port set to 0, which means to select any available port.
In other environments, it should have no visible impact.
(Ticket #2375).
- Attributes have been added to the condor_schedd
to publish aggregate statistics
about jobs that are running and have completed, as well as counts of various
failures.
(Ticket #2197).
- The new configuration variable DAGMAN_WRITE_PARTIAL_RESCUE
enables the new feature of writing a partial DAG file, instead of a full
DAG input file, as a Rescue DAG.
See section 3.3.25 for a definition.
Also, the configuration variable
DAGMAN_OLD_RESCUE no longer exists,
as it is incompatible with the implementation of partial Rescue DAGs.
(Ticket #2165).
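The effect of the PreserveRelativeExecutable attribute described above can be sketched as a small path rule for the case where TransferExecutable is False. The Python function below is a hypothetical model of that rule only: its name and parameters are not Condor identifiers, and it does not cover the case where the executable is transferred.

```python
import posixpath


def command_path_without_transfer(cmd, iwd, preserve_relative_executable=False):
    """Return the path used for Cmd when TransferExecutable is False.

    cmd: the job's Cmd value; iwd: the job's Iwd (initial working directory).
    """
    if posixpath.isabs(cmd) or preserve_relative_executable:
        return cmd                    # absolute paths and preserved relative paths are left alone
    return posixpath.join(iwd, cmd)   # otherwise Iwd is prepended to the relative Cmd
```

With the attribute set, a relative Cmd such as bin/run.sh is left as written, which lets it be resolved on the execute side rather than against the submit-side Iwd.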
Bugs Fixed:
- Fixed a bug introduced in Condor version 7.7.1,
in the standard universe,
where the getdirentries() call failed during remote I/O situations.
(Ticket #2467).
- Fixed a bug in the condor_startd that was preventing dynamic slots
from being properly instantiated from partitionable slots.
(Ticket #2507).
- Fixed a bug introduced in Condor version 7.7.0,
in which the condor_startd may erroneously report
Can't find hostname of client machine.
In cases where Condor was unable to identify the host name,
the ClientMachine
attribute in the machine ClassAd would have gone unset.
(Ticket #2382).
- Fixed a bug existing since April 2001,
in which on start up of the condor_schedd, with parallel universe jobs,
the job queue sanity checking code would change the Scheduler
attribute on jobs,
only to have the attribute changed later by the dedicated scheduler.
(Ticket #2367).
- Fixed a bug in which machine ClassAds with the Offline attribute set to True,
and with neither the MyType nor the TargetType
attribute defined, caused
the condor_collector to fail to start the next time it was restarted.
(Ticket #2417).
- Fixed a file descriptor leak in the EC2 GAHP,
which would cause grid-type ec2 jobs to become held.
The HoldReason for most such jobs would be
Unable to read from accesskey file.
(Ticket #2447).
- Fixed a bug that could cause a job's standard output and error to
be written to the wrong location when should_transfer_files was
set to IF_NEEDED,
and the job ran on a machine where file transfer was not needed.
If the standard output or error file names contained any path information,
the output would be written to _condor_stdout or
_condor_stderr in the job's initial working directory.
(Ticket #1811).
- Fixed a bug introduced in Condor version 7.7.1
that could cause the condor_schedd daemon to crash after
failing to expand a $$ macro in the job ClassAd.
(Ticket #2491).
Known Bugs:
- In Condor version 7.7.2,
the Condor daemons on Linux platforms rely on shared libraries.
A bug in Condor version 7.7.1 and all previous versions of Condor
prevents a 7.7.1 condor_master from starting 7.7.2 or later daemons.
This also means that a 7.7.1 condor_master cannot upgrade itself to
version 7.7.2.
If a 7.7.1 condor_master binary is replaced with
a 7.7.2 condor_master binary,
Condor will shut down and must be restarted by hand.
Additions and Changes to the Manual:
Version 7.7.1
Release Notes:
- Condor version 7.7.1 released on September 12, 2011.
This developer release contains all bug fixes from Condor version 7.6.3.
New Features:
- Condor now dynamically links with the OpenSSL and Kerberos security
libraries, and Condor will use the operating system's version of these
libraries, when they are available.
The tarball release of Condor on Linux platforms includes
a copy of these libraries.
If the operating system's version is incompatible with Condor,
Condor will use its own copy instead.
Condor's copy of these libraries is located under lib/condor/.
To prevent Condor from using its own copy, delete these libraries.
(Ticket #1874).
- The ClassAd language now has an unparse() function.
It converts an expression into a string,
which is handy with the new eval() function.
(Ticket #1613).
- The new job ClassAd attribute KeepClaimIdle is defined with an integer
number of seconds in the job submit description file, as in the example:
+KeepClaimIdle = 300
If set, then when the job exits,
if there are no other jobs immediately ready to run for this user,
the condor_schedd daemon,
instead of relinquishing the claim back to the condor_negotiator,
will keep the claim for the specified number of seconds.
This is useful if another job will be arriving soon,
which can happen with linear DAGs.
The condor_startd slot
will go to the Claimed Idle state for at least that many seconds until
either a new job arrives or the timeout occurs.
See the unnumbered Appendix A for a complete definition of this
job ClassAd attribute.
(Ticket #2094).
- The new PRE_SKIP key word in DAGMan changes the
behavior of DAG node execution such that the node's job and POST script
may be skipped based on the exit value of the PRE script.
See section 2.10.2 for details.
(Ticket #2122).
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable
NEGOTIATOR_STARTD_CONSTRAINT_REMOVE defaults to False.
When True, any ClassAds not satisfying the expression
in GROUP_DYNAMIC_MACH_CONSTRAINT are removed from the
list of condor_startd ClassAds considered for negotiation.
(Ticket #2232).
- The new configuration variable
NEGOTIATOR_UPDATE_AFTER_CYCLE defaults to False.
When True, it forces the condor_negotiator daemon
to update the negotiator ClassAd in the condor_collector daemon
at the end of every negotiation cycle.
This is handy for monitoring and debugging activities.
(Ticket #2373).
Bugs Fixed:
- Expressions for periodic policies such as
PERIODIC_HOLD and PERIODIC_RELEASE
could inadvertently cause a claim to be released,
if the condor_shadow exited without waiting for the final update from the
condor_starter.
(Ticket #2329).
- condor_submit previously could incorrectly detect references
in the requirements expression to special attributes such as
Memory when the name of the attribute happened to appear in a
string literal or as part of the name of some other attribute.
The detection of references to various special attributes influences the
automatic requirements which are appended to the job requirements.
(Ticket #2350).
- In rare cases, CCB requests could cause the server to hang for
20 seconds while waiting for all of the request to arrive.
(Ticket #2360).
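The reference-detection fix above (Ticket #2350) can be illustrated with a small Python sketch. This is not the code used by condor_submit; the function names and the simplified tokenizing rules are assumptions made only for illustration. It contrasts a plain substring search, which is fooled by string literals and by longer attribute names, with a check that ignores string literals and compares whole identifiers.

```python
import re

# Hypothetical, simplified lexical rules for illustration only.
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"')
IDENTIFIER = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')

def naive_references(expr, attr):
    """Substring search: also matches inside strings and longer names."""
    return attr.lower() in expr.lower()

def token_references(expr, attr):
    """Blank out string literals, then compare whole identifiers only."""
    without_strings = STRING_LITERAL.sub('""', expr)
    names = {m.group(0).lower() for m in IDENTIFIER.finditer(without_strings)}
    return attr.lower() in names

# "Memory" appears only in a string literal and inside "TotalMemory".
misleading = 'TotalMemory > 0 && Name == "Memory"'
# A genuine reference to the Memory attribute.
genuine = 'TARGET.Memory >= 1024'
```

With these definitions, naive_references(misleading, "Memory") is true although the expression never refers to the attribute, while token_references reports a reference only for the second expression.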
Known Bugs:
Additions and Changes to the Manual:
Version 7.7.0
Release Notes:
- Condor version 7.7.0 released on July 29, 2011.
This developer release contains all bug fixes from Condor version 7.6.2.
New Features:
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable NEGOTIATOR_UPDATE_AFTER_CYCLE
defaults to False.
If set to True, it will force the condor_negotiator daemon
to publish an update ClassAd to the condor_collector at the end of
every negotiation cycle.
This is useful when monitoring cycle-based statistics.
- The configuration variables for security
DENY_CLIENT and HOSTDENY_CLIENT
now also look for the prefixes TOOL and SUBMIT.
- CONDOR_VIEW_HOST is now a comma and/or white space separated
list of hosts, in order to support more than one CondorView host.
- For a job with an X.509 proxy credential, the new job ClassAd
attribute X509UserProxyEmail is the email address extracted
from the proxy.
- On Linux execute machines with kernel version more recent than 2.6.27,
the proportional set size (PSS) in Kbytes summed across all
processes in the job is now reported in the attribute
ProportionalSetSizeKb. If the execute machine does not
support monitoring of PSS or PSS has not yet been measured, this
attribute will be undefined. PSS differs from ImageSize in
how memory shared between processes is accounted for. The PSS for one
process is the sum, over that process's memory pages, of each page's size
divided by the number of processes sharing that page. ImageSize is
the same, except there is no division by the number of processes
sharing the pages.
- The new configuration variable DAGMAN_USE_STRICT
turns warnings into errors, as defined in section 3.3.25.
- The condor_schedd now publishes performance-related statistics.
Appendix A contains
definitions for these new attributes:
- DetectedMemory
- DetectedCpus
- UpdateInterval
- WindowedStatWidth
- ExitCode<N>
- ExitCodeCumulative<N>
- JobsSubmitted
- JobsSubmittedCumulative
- JobsStarted
- JobsStartedCumulative
- JobsCompleted
- JobsCompletedCumulative
- JobsExited
- JobsExitedCumulative
- ShadowExceptions
- ShadowExceptionsCumulative
- JobSubmissionRate
- JobStartRate
- JobCompletionRate
- MeanTimeToStart
- MeanTimeToStartCumulative
- MeanRunningTime
- MeanRunningTimeCumulative
- SumTimeToStartCumulative
- SumRunningTimeCumulative
- For Windows platforms, the condor_startd now publishes the
ClassAd attribute DotNetVersions,
containing a comma separated list of installed .NET versions.
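The difference between the proportional set size reported in ProportionalSetSizeKb and an undivided sum, as described in the list above, can be shown with a short worked example in Python. This is only a sketch of the arithmetic; the function names and the page data are made up for illustration, and Condor itself obtains these values from the operating system rather than computing them this way.

```python
def proportional_set_size_kb(pages):
    """Sum each page's size divided by the number of processes sharing it.

    pages is a list of (size_kb, sharers) tuples for one process.
    """
    return sum(size_kb / sharers for size_kb, sharers in pages)

def undivided_size_kb(pages):
    """Sum the same pages with no division by the number of sharers."""
    return sum(size_kb for size_kb, _ in pages)

# Hypothetical process with three 4 Kbyte pages: one private,
# one shared by 2 processes, and one shared by 4 processes.
example_pages = [(4, 1), (4, 2), (4, 4)]

pss = proportional_set_size_kb(example_pages)   # 4/1 + 4/2 + 4/4
total = undivided_size_kb(example_pages)        # 4 + 4 + 4
```

Here the proportional figure is 7 Kbytes while the undivided sum is 12 Kbytes, because the shared pages are charged only fractionally to this process.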
Bugs Fixed:
- Fixed a bug in which the condor_startd daemon could get stuck in a
loop trying to execute an invalid
(that is, nonexistent) Daemon ClassAd Hook job.
- Fixed a bug that would cause the condor_startd daemon to incorrectly
report Benchmarking activity instead of Idle activity
when there was a problem launching the benchmarking programs.
- On Windows only, fixed a rare bug that could cause
a sporadic access violation when a Condor daemon spawned another process.
- Fixed a bug introduced in Condor version 7.5.5,
which caused the condor_schedd to die managing parallel jobs.
- The condor_startd daemon now looks up the condor_kbdd daemon address
on every update.
This fixes problems that occurred if the condor_kbdd daemon was restarted
during the lifespan of the condor_startd daemon.
- Fixed a bug in condor_hold that occurred if the hold
reason contained a double quote character.
- Fixed a bug introduced in Condor version 7.5.6 that
caused any Daemon ClassAd hook job with a non-empty value for
STARTD_CRON_<JobName>_ARGS,
SCHEDD_CRON_<JobName>_ARGS
or BENCHMARKS_<JobName>_ARGS to fail.
Also, the specification of
STARTD_CRON_<JobName>_ENV,
SCHEDD_CRON_<JobName>_ENV,
or BENCHMARKS_<JobName>_ENV for these jobs was ignored.
- Fixed a bug in the RPM init script.
A status request would always report Condor as inactive,
and a shutdown request would not report failure if there was a
timeout shutting down Condor.
- File transfer plug-ins now have a correctly set environment.
- Fixed a problem with detecting IBM Java Virtual Machines whose
version strings have embedded newline characters.
- condor_q -analyze now works with ClassAd built-in functions.
- Fixed a bug in condor_q -run; it now displays
the host name correctly for local and scheduler universe jobs.
- Standalone checkpointing now works with compressed checkpoints again.
This had been broken in Condor version 7.5.4.
- On Windows, net stop condor would sometimes cause the
condor_master daemon to crash. This is now fixed.
- JobUniverse was effectively a required attribute for
jobs created via the Fetch Work hook,
due to the need to set the IS_VALID_CHECKPOINT_PLATFORM
expression, such that it would not evaluate to Undefined.
Now the default IS_VALID_CHECKPOINT_PLATFORM expression
evaluates to True when JobUniverse is not defined.
- When there are multiple cpus but only one slot, the slot name no
longer begins with slot1@.
- The tool condor_advertise performed more host name lookups than
necessary. It now performs only the minimum number of lookups
required.
Known Bugs:
Additions and Changes to the Manual: