10.4 Development Release Series 7.9
This is the development release series of HTCondor.
The details of each version are described below.
Version 7.9.6
Release Notes:
- HTCondor version 7.9.6 released on May 8, 2013.
New Features:
- The new condor_ping command line tool attempts one or more
targeted security negotiations to see whether they succeed or fail,
which can help in debugging a security configuration.
(Ticket #3371).
- The condor_schedd will now also advertise demand by jobs for slots,
weighted by the count of requested CPUs, if the configuration variable
NEGOTIATOR_USE_WEIGHTED_DEMAND is set to True.
The default value is False.
(Ticket #3574).
- Negotiation under groups now prefers the specification of
groups with the new submit commands accounting_group and
accounting_group_user.
See section 3.4.7 for details.
(Ticket #2728).
- The vm universe now supports VMware Workstation
and VMware Player.
(Ticket #740).
- On Linux platforms where cgroups are supported and enabled, the
condor_starter will now detect and trap if a vanilla universe job
would otherwise be killed by the system Out Of Memory (OOM) killer.
This situation is especially likely when a job sets RequestMemory
lower than needed. The job will now be put on hold.
(Ticket #2992).
- The new -force-graceful command-line option to condor_off
allows administrators to issue a graceful shutdown command, even after
issuing a -peaceful command. Previously, a peaceful condor_off
command would preclude a -graceful off command.
(Ticket #2949).
- The condor_gather_info tool now includes the output of the Unix
uptime and free programs,
as well as logs of the condor_master, condor_startd, and condor_starter
of the machine where the job most recently ran,
if condor_gather_info has the necessary permissions to fetch those logs.
(Ticket #3246).
- The rotation of a daemon log file can now be specified
in terms of time (seconds) or in terms of maximum size (bytes).
Only size was allowed previously.
(Ticket #3560).
- The new condor_qsub command line tool emulates submission to PBS, SGE,
and Torque-like systems. It handles both scripts and command line options.
(Ticket #2699).
- When submitting a grid universe job with a grid type of batch,
the value of request_memory is now propagated to the batch system
submission request.
(Ticket #3398).
- The set of Python bindings introduced in HTCondor version 7.9.4 is now
distributed as part of HTCondor, not as a contrib module.
(Ticket #3586).
- Several improvements have been made to the
condor_gather_info tool. It now prints the name of the
tarball it emits, and now also checks the history file for the
job in question, and if found, uses and displays the information
there.
(Ticket #3239).
(Ticket #3240).
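The weighted demand item in the list above (NEGOTIATOR_USE_WEIGHTED_DEMAND) can be illustrated with a minimal sketch. This is not HTCondor's implementation; it only shows one plausible reading of "demand weighted by the count of requested CPUs". The function name slot_demand, the dictionary representation of job ClassAds, and the default of 1 CPU for a job that does not state RequestCpus are all assumptions made for illustration.

```python
def slot_demand(idle_jobs, use_weighted_demand=False):
    """Return the demand for slots represented by a list of idle jobs.

    Each job is a plain dict standing in for a job ClassAd (hypothetical
    representation). A job without RequestCpus is assumed to want 1 CPU.
    """
    if not use_weighted_demand:
        # Unweighted: every idle job counts as one unit of demand.
        return len(idle_jobs)
    # Weighted: each job counts as many units as the CPUs it requests.
    return sum(job.get("RequestCpus", 1) for job in idle_jobs)


jobs = [{"RequestCpus": 1}, {"RequestCpus": 4}, {}]
print(slot_demand(jobs))        # unweighted demand
print(slot_demand(jobs, True))  # demand weighted by requested CPUs
```

With the three sample jobs, the unweighted demand is 3 and the weighted demand is 6, which shows why a pool with many multi-core jobs may look under-subscribed when demand is counted per job.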
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable
GSI_DELEGATION_CLOCK_SKEW_ALLOWABLE , expressed in seconds,
allows HTCondor to adjust the amount of
allowable clock skew between two parties.
This is relevant when delegating X.509 proxies.
(Ticket #3557).
- The new configuration variable CheckpointPlatform
is a string that may be set by
an administrator to override the auto-detected
platform used to determine if a standard universe job that produced
a checkpoint on one machine can be started on another.
(Ticket #3544).
- The new Machine ClassAd attributes Has_sse4_1,
Has_sse4_2, and Has_ssse3 are set to True
if the corresponding instruction set additions exist on that machine.
These attributes will be undefined otherwise.
(Ticket #3544).
- The name of the configuration variable MEMORY_LIMIT
introduced in HTCondor version 7.9.2 has changed.
This variable is now called CGROUP_MEMORY_LIMIT_POLICY .
(Ticket #3564).
- The new configuration variable EXPIRE_INVALIDATED_ADS ,
when set to True, causes invalidated ClassAds that would have
been removed from the condor_collector right away to instead
be treated as expired ClassAds, such that they may become absent ClassAds.
See section 3.10.4 for details on absent ClassAds.
(Ticket #3085).
- The new configuration variable
GLEXEC_HOLD_ON_INITIAL_FAILURE controls whether jobs are put
on hold when a failure is encountered in the glexec setup phase of
managing the job. The default value is True,
which implements the previous behavior of putting a job on hold when
there is a failure.
(Ticket #3569).
- The new configuration variable
NEGOTIATOR_CONSIDER_EARLY_PREEMPTION controls whether jobs
can be matched to slots that still have retirement time remaining
before the existing job can be evicted. The default is False.
The old behavior can be enabled by setting it to True. The
new default behavior is intended to improve scheduling behavior
when MaxJobRetirementTime is used.
(Ticket #3539).
- The new configuration variable SCHEDD_AUDIT_LOG
defines a file name, such that the
condor_schedd can now write an audit log that records all
commands issued by users that modify the job queue.
(Ticket #3493).
- The per-user file transfer I/O statistics now have a prefix of
Owner_<username>_. In HTCondor version 7.9.5,
they had a prefix of
<username>_. This can be configured via
TRANSFER_QUEUE_USER_EXPR .
(Ticket #3496).
- The new configuration variable
BATCH_GAHP_CHECK_STATUS_ATTEMPTS controls how often the
condor_gridmanager should retry a failed job status check when using
the batch_gahp. The default is 5.
(Ticket #3533).
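The retry behavior controlled by BATCH_GAHP_CHECK_STATUS_ATTEMPTS in the list above can be sketched as a bounded retry loop. This is an illustration only, not the condor_gridmanager's code. The helper name check_status_with_retries and the use of RuntimeError to signal a failed check are hypothetical, and the sketch assumes the value counts the total number of tries rather than the number of retries after the first failure.

```python
def check_status_with_retries(check_status, attempts=5):
    """Call check_status() until it succeeds or the attempts run out.

    check_status is a caller-supplied function (hypothetical) that returns
    a job status or raises RuntimeError when the status check fails.
    The default of 5 mirrors the documented default of the variable.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return check_status()
        except RuntimeError as err:
            # Remember the failure and try again.
            last_error = err
    raise RuntimeError(
        "status check failed after %d attempts" % attempts
    ) from last_error
```

A caller would pass in whatever function queries the batch system; only after every try has failed does the error propagate.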
Bugs Fixed:
Known Bugs:
- Using condor_userprio with the -grouprollup option
will fail to produce any output
if the condor_negotiator it queries is of a version older
than HTCondor version 7.9.6 and the condor_userprio executable is
HTCondor version 7.9.6.
(Ticket #3600).
Additions and Changes to the Manual:
Version 7.9.5
Release Notes:
- HTCondor version 7.9.5 released on April 17, 2013.
New Features:
- The new command line tool condor_tail
displays files that are in the sandbox of a running job.
See details in the manual page at
section 11.
(Ticket #3522).
- When there are multiple users
waiting to transfer files within the limits set by
configuration variables
MAX_CONCURRENT_UPLOADS and/or
MAX_CONCURRENT_DOWNLOADS , the scheduling algorithm
now gives the users
an equal share of the transfer slots. How shares are counted can be
configured with TRANSFER_QUEUE_USER_EXPR .
(Ticket #3487).
- When using the -remote or -spool options to
condor_submit, the job owner will now be set based
upon how the job submitter was authenticated.
This will make it easier to submit jobs
to a remote condor_schedd where the credentials may map to a different
account name.
(Ticket #3370).
- New functions are available in the Python Bindings contrib module.
ClassAds now more closely mimic Python dictionaries and provide
support for lists and values that are ClassAds.
(Ticket #3494).
- If a job is submitted specifying keep_claim_idle,
the claim is kept not only when the job exits,
but also when the job is removed.
(Ticket #3491).
- condor_dagman now publishes in its own job ClassAd,
attributes with the DAG status,
such as total number of nodes, nodes queued, and nodes finished.
See section 2.10.13 for more information.
(Ticket #1782).
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable GSI_DELEGATION_KEYBITS
allows the number of bits in a delegated proxy to be specified
by the receiving side.
(Ticket #3503).
- When using file transfer concurrency limits, additional I/O
usage statistics are now published as attributes in the ClassAd of the
condor_schedd. This includes the sum and rate of bytes
transferred as well as time spent reading and writing to files and
to the network. These statistics are reported for the sum of all
users and, when increased verbosity is configured, individually for
recently active users.
These ClassAd attributes are fully described
within the section on scheduler attributes.
(Ticket #3496).
- FileTransferUploadBytes
- FileTransferUploadBytesPerSecond_<timespan>
- FileTransferDownloadBytes
- FileTransferDownloadBytesPerSecond_<timespan>
- FileTransferFileReadSeconds
- FileTransferFileReadLoad_<timespan>
- FileTransferFileWriteSeconds
- FileTransferFileWriteLoad_<timespan>
- FileTransferNetReadSeconds
- FileTransferNetReadLoad_<timespan>
- FileTransferNetWriteSeconds
- FileTransferNetWriteLoad_<timespan>
- NOT_RESPONDING_TIMEOUT now internally adds some random skew
to avoid synchronization of heartbeat messages, which can lead to UDP
buffer overflow and incorrect determination that daemons are hung.
(Ticket #3510).
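The random skew now applied to NOT_RESPONDING_TIMEOUT, described in the last item above, follows a common pattern for keeping periodic messages from lining up. The sketch below shows the general idea only. The function name skewed_timeout, the 10 percent bound on the skew, and the uniform distribution are assumptions; the release notes do not say how much skew HTCondor adds or how it is drawn.

```python
import random


def skewed_timeout(base_timeout, max_skew_fraction=0.1, rng=random):
    """Return base_timeout plus a random, non-negative skew.

    The skew is drawn uniformly from [0, base_timeout * max_skew_fraction].
    Both the distribution and the 10% bound are illustrative assumptions.
    """
    return base_timeout + rng.uniform(0, base_timeout * max_skew_fraction)


# Daemons configured with the same base value end up with slightly
# different effective values, so their heartbeats drift apart.
for _ in range(3):
    print(skewed_timeout(3600))
```

Because each daemon draws its own skew, heartbeats that would otherwise arrive in the same instant are spread out, which reduces the chance of overflowing a UDP buffer.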
Bugs Fixed:
- The EC2 GAHP now treats OpenStack's stopped state as if it
were shutoff, terminating instances which enter this state and
preventing the instances from remaining in the queue forever.
(Ticket #3507).
- Two EC2 GAHP bugs are fixed.
It now correctly parses XML namespaces as returned for
some installations of Eucalyptus.
The second bug caused HTCondor to put the job on hold,
as it incorrectly believed that the cloud
service had purged it.
(Ticket #3492).
- The EC2 GAHP now reports the
bidding status, defined at
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances-bid-status.html,
for spot instances.
(Ticket #3388).
- The condor_negotiator now checks to see if
the time set by configuration variable
NEGOTIATOR_MAX_TIME_PER_SUBMITTER
has been exceeded while negotiating with a single condor_schedd daemon.
This configuration variable was previously only effective
if a submitter used multiple condor_schedd daemons.
(Ticket #3504).
- condor_dagman will now recover correctly in a DAG where a node has been
skipped because a PRE_SKIP was triggered.
(Ticket #2966).
- Fixed a bug in the condor_gridmanager and condor_ft-gahp that
could cause a crash when transferring files for grid universe jobs of
grid type batch going to a remote cluster.
(Ticket #3529).
- Fixed a bug in which condor_status queries would not work,
with output of "Access denied", unless the condor_collector
and the machine doing the query had synchronized clocks.
(Ticket #3360).
- Fixed a Linux platform bug in which mount points were leaked
to the greater namespace when the configuration set
MOUNT_UNDER_SCRATCH for file systems
that have been mounted with shared propagation enabled.
(Ticket #3505).
- Fixed a bug in the logging code that was causing grid universe batch jobs
to abort and drop a dprintf error file during file transfer once the log had
grown large enough to rotate.
(Ticket #3528).
- On Windows platforms, running the condor_kbdd no
longer creates a visible console window.
(Ticket #2805).
Known Bugs:
- Running condor_rm with the -f option on a parallel universe
job can cause the condor_schedd to crash.
(Ticket #3561).
- Using privilege separation may cause execute directories to be leaked,
if the condor_starter is shut down prematurely;
for example, shut down may occur by a hard kill signal or power interruption.
(Ticket #3573).
- If a job has any output files and uses the file transfer mechanism,
the job ClassAd attribute ExitCode may be lost,
causing its value to be reported as 0.
(Ticket #3577).
Additions and Changes to the Manual:
Version 7.9.4
Release Notes:
- HTCondor version 7.9.4 released on February 20, 2013.
New Features:
- Per job PID namespaces are available for Linux RHEL 6 platforms.
See section 3.12.10 for details.
(Ticket #1959).
- The EC2 GAHP now batches requests for status updates, significantly
reducing its resource requirements.
(Ticket #3436).
- The maximum total size of file transfers for a job may now be
specified using the new configuration variables
MAX_TRANSFER_INPUT_MB and MAX_TRANSFER_OUTPUT_MB
and/or the new submit commands
max_transfer_input_mb and
max_transfer_output_mb.
(Ticket #3333).
- The batch_gahp no longer relies on programs
grid-proxy-info and grid-proxy-init from the Globus
Toolkit to handle the X.509 proxies of jobs.
(Ticket #3431).
- When the job's executable is transferred, the execute
bits are now always set on the copy.
(Ticket #3028).
- By default, condor_dagman now issues a fatal error
if any log file, which is either
the default log file or the log file specified for a node job,
is in /tmp, because this can cause DAGMan to fail.
This error can be downgraded to a warning by setting the
configuration variable
DAGMAN_USE_STRICT to 0.
(Ticket #1419).
- The condor_collector will accept and display collector ClassAds for
multiple collectors from the same machine. For this to work, the
collectors must be configured with different values for configuration
variable COLLECTOR_NAME.
(Ticket #3467).
- condor_dagman now will successfully set attributes for submitted jobs
using the condor_submit syntax of placing a + sign just to the
left of the attribute name.
See section 2.10.7 for more details.
(Ticket #3469).
- The HTCondor contrib now includes a set of Python bindings in
two modules.
The htcondor module interacts with the condor_schedd and
condor_collector daemons.
The classad module provides an interface to work with ClassAds.
(Ticket #3407).
- When using condor_compile,
Pthreads are not normally permitted to be used by standard universe jobs.
However, condor_compile will now tell a user that they
should be linking to the GNU Pth library,
which is built with the
--enable-pthread
flag.
This will permit jobs that use Pthreads to be built with condor_compile.
(Ticket #3319).
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable USE_PID_NAMESPACES
enables per job PID namespaces for Linux RHEL 6 platforms when True.
(Ticket #1959).
- The new configuration variable FLOCK_INCREMENT allows
administrators to more aggressively flock to remote condor_collector daemons,
as more pools will be considered.
(Ticket #3375).
- The new configuration variable HOST_ALIAS specifies the
fully qualified host name that clients authenticating this daemon with
GSI should
expect the daemon's certificate to match. The alias is advertised
to the condor_collector as part of the address of the daemon.
When this is not set, clients validate the daemon's certificate
host name by matching it against DNS A records for the host they
are connected to. See GSI_SKIP_HOST_CHECK for ways
to disable this validation step.
(Ticket #1605).
- The configuration variable DAGMAN_USE_STRICT now
defaults to a value of 1, rather than 0.
See the definition at section 3.3.25.
(Ticket #3418).
- The new configuration variable GRACEFULLY_REMOVE_JOBS
is a boolean value that controls whether jobs to be removed are
gracefully removed.
The default is to do graceful removal.
(Ticket #3470).
Bugs Fixed:
- When HTCondor creates a key pair at an EC2 job's request, it no
longer fails to remove the private key from disk when the job leaves
the queue.
(Ticket #3477).
- The EC2 GAHP now recognizes the OpenStack shutoff state and
terminates instances which enter this state,
preventing the instances from remaining in the queue forever.
(Ticket #3367).
- condor_dagman no longer does unnecessary sleeps for log file
consistency when a single default/workflow log file is used.
(Ticket #3456).
- Fixed a bug introduced in HTCondor version 7.9.0 that caused
the following configuration variables to not sort ClassAds properly
when they evaluated to True or False:
NEGOTIATOR_PRE_JOB_RANK ,
NEGOTIATOR_POST_JOB_RANK , PREEMPTION_RANK , and
SCHEDD_PREEMPTION_RANK .
(Ticket #3468).
- Fixed a bug that can cause grid universe jobs of type batch
to fail when submitted to an HTCondor cluster with a large history file.
(Ticket #3429).
- Corrected the submission of interactive jobs for cases
in which the submit description file specified Arguments.
(Ticket #3455).
- The semantics of signals sent to jobs were changed.
They have been changed back to the semantics defined in version 7.6.
(Ticket #3470).
Known Bugs:
Additions and Changes to the Manual:
Version 7.9.3
Release Notes:
- HTCondor version 7.9.3 released on January 16, 2013.
New Features:
- When the new configuration variable ASSIGN_CPU_AFFINITY
is set to True,
the condor_startd will automatically set the CPU affinity
mask that jobs run with, so that a multi-threaded job will not use
more cores than the number it requests.
(Ticket #3348).
- When configuration variable NEGOTIATOR_CONSIDER_PREEMPTION
is False, the condor_negotiator
now fetches machine ClassAds more quickly from the condor_collector
by skipping most attributes of the busy machines.
This can make negotiation much faster in
a very large pool of mostly claimed machines.
(Ticket #3366).
- Round-robin scheduling is now used when there are multiple users
waiting to transfer files within the limits set by
MAX_CONCURRENT_UPLOADS and/or
MAX_CONCURRENT_DOWNLOADS . Previously, the file transfer
queue was scheduled in first-in-first-out order, so one user with
many files to transfer could delay other users for as long as it took
to transfer those files. Now, when choosing a new job to allow to
transfer, the first job belonging to the user who has least
recently been given an opportunity to transfer will be selected.
The old behavior, or variations on the new behavior, can be achieved
by configuring TRANSFER_QUEUE_USER_EXPR .
(Ticket #3333).
- condor_dagman will now try twice to write a POST script terminate
event, rather than trying once and exiting.
If it is unable to write the event, condor_dagman exits,
writing a Rescue DAG.
(Ticket #965).
- The condor_gridmanager now cleans up temporary files and directories
that are sometimes left by the batch_gahp when executing a grid
universe job of grid type batch.
(Ticket #3276).
- Added counts of nodes in various states to the condor_dagman
node status file. Refer to section 2.10.11 for
more information.
(Ticket #2075).
- When submitting jobs to a remote batch system (for example, BOSCO),
file transfer no longer requires a network connection from the remote machine
back to the local one.
(Ticket #3293).
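The round-robin file transfer scheduling described in the list above selects the first waiting job of the user who has least recently been given an opportunity to transfer. A minimal sketch of that selection rule follows. It is not the condor_schedd's implementation; the function names, the (user, job_id) tuples, and the logical clock used to record when a user was last served are all hypothetical.

```python
def pick_next_transfer(waiting_jobs, last_served):
    """Choose the next job allowed to transfer files.

    waiting_jobs: list of (user, job_id) tuples in arrival order.
    last_served:  dict mapping user -> logical time of that user's most
                  recent transfer opportunity (absent means never served).
    Returns the chosen (user, job_id), or None if nothing is waiting.
    """
    if not waiting_jobs:
        return None
    # Find the first waiting job of each user, preserving arrival order.
    first_job = {}
    for user, job_id in waiting_jobs:
        first_job.setdefault(user, job_id)
    # Users never served sort ahead of everyone else (-1 < any tick).
    user = min(first_job, key=lambda u: last_served.get(u, -1))
    return user, first_job[user]


def drain(waiting_jobs):
    """Return the order in which all waiting jobs would be served."""
    queue = list(waiting_jobs)
    last_served = {}
    order = []
    tick = 0
    while queue:
        choice = pick_next_transfer(queue, last_served)
        queue.remove(choice)
        last_served[choice[0]] = tick
        tick += 1
        order.append(choice)
    return order


jobs = [("alice", 1), ("alice", 2), ("alice", 3), ("bob", 4)]
print(drain(jobs))
```

Under first-in-first-out ordering, bob's single job would wait behind all three of alice's jobs. With the rule above it is served second, right after alice's first job.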
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new expert-only configuration variable
STATISTICS_WINDOW_QUANTUM
allows administrators to set the time interval,
known as a quantum, that divides a window over which statistics are
kept into smaller pieces. The window advances one quantum at a time.
(Ticket #3288).
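The relationship between a statistics window and its quantum, as described for STATISTICS_WINDOW_QUANTUM above, can be sketched with a ring of buckets. This is an illustration of the general technique, not HTCondor's statistics code. The class name WindowedCounter and its methods are hypothetical, and the sketch assumes the window length is a whole multiple of the quantum.

```python
from collections import deque


class WindowedCounter:
    """Count events over a sliding window that advances one quantum at a time."""

    def __init__(self, window_seconds, quantum_seconds):
        self.quantum = quantum_seconds
        # The window is divided into this many quantum-sized buckets.
        n = max(1, window_seconds // quantum_seconds)
        self.buckets = deque([0] * n, maxlen=n)

    def add(self, amount=1):
        # New events always land in the newest bucket.
        self.buckets[-1] += amount

    def advance(self):
        # Appending to a full deque with maxlen discards the oldest
        # bucket, so the window moves forward by exactly one quantum.
        self.buckets.append(0)

    def total(self):
        return sum(self.buckets)


counter = WindowedCounter(window_seconds=20, quantum_seconds=10)
counter.add(3)
counter.advance()
counter.add(2)
print(counter.total())  # both quanta are still inside the window
counter.advance()
print(counter.total())  # the oldest quantum has dropped out
```

A smaller quantum gives a window that tracks time more smoothly at the cost of more buckets to maintain.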
Bugs Fixed:
- Jobs of the EC2 grid type which make invalid requests of the
service no longer go on hold when removed.
An example of this is when a job specifies a nonexistent AMI.
(Ticket #3287).
- Jobs of the EC2 grid type which cannot authenticate with the
service no longer go on hold when removed.
(Ticket #3387).
- Fixed a problem with glexec that caused jobs not to start
due to permission errors on the execute directory.
(Ticket #3369).
- A change was made to more accurately implement the
minimum time defined by the configuration variable
NEGOTIATOR_CYCLE_DELAY .
(Ticket #3332).
- The batch_gahp is no longer dependent on the Perl module
XML::Simple when submitting jobs to SGE.
(Ticket #3350).
- The batch_gahp now properly handles job X.509 proxies that
are not in the old proxy format.
(Ticket #3362).
- On 32-bit platforms,
setting configuration variable STARTER_RLIMIT_AS to a value
larger than 4096 could cause jobs to abort on start up.
Since values larger than 2047 have no real meaning on 32-bit platforms,
the fix treats values larger than 2047 as no limit on 32-bit platforms.
(Ticket #3309).
- Fixed a bug that can cause proxy refresh to fail for pbs, lsf,
and sge grid jobs.
(Ticket #3383).
- When doing remote pbs, lsf, or sge grid job submissions, the
condor_gridmanager now ensures that no unusual characters are used in
the name of the job sandbox directory it creates.
(Ticket #3294).
- When a GAHP server fails to start, the condor_gridmanager now
puts the affected jobs on hold.
(Ticket #3301).
- Environment variable GLOBUS_LOCATION is now set for
batch_gahp,
allowing it to find proxy management that it needs for jobs that have an
X.509 proxy.
(Ticket #3015).
- The installation RPM now requires Security Enhanced Linux (SELinux)
scripts at post install time,
so that the scripts can set the appropriate security contexts.
(Ticket #3313).
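The STARTER_RLIMIT_AS fix in the list above treats values larger than 2047 as "no limit" on 32-bit platforms. That rule can be written as a small function. This is a sketch of the rule as stated, not the condor_starter's code; the function name effective_rlimit_as and the use of None to mean "no limit" are assumptions made for illustration.

```python
def effective_rlimit_as(configured_value, is_32bit):
    """Return the address space limit to apply, or None for no limit.

    configured_value is the integer given for STARTER_RLIMIT_AS, or None
    when the variable is unset (hypothetical convention).
    """
    if configured_value is None:
        return None
    if is_32bit and configured_value > 2047:
        # Larger values have no real meaning on 32-bit platforms,
        # so they are treated as "no limit".
        return None
    return configured_value


print(effective_rlimit_as(4096, is_32bit=True))   # None: treated as no limit
print(effective_rlimit_as(2047, is_32bit=True))   # 2047
print(effective_rlimit_as(4096, is_32bit=False))  # 4096
```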
Known Bugs:
Additions and Changes to the Manual:
Version 7.9.2
Release Notes:
- HTCondor version 7.9.2 released on December 11, 2012.
This release contains all of the bug fixes in the version 7.8.6
stable release,
and most of the bug fixes in the
soon to be released version 7.8.7 stable release.
New Features:
- The permissions for the temporary execute directory of a job
have been tightened for vanilla universe jobs,
such that only the owner of the job is allowed to see or
modify the contents.
(Ticket #3315).
- Added experimental support for EC2 spot instances.
(Ticket #3209).
- (This feature was added in version 7.9.1.)
There are two new protocols for the submission of grid type EC2 jobs,
euca3:// and euca3s://.
These protocols exist to work correctly when the resources do not support
the InstanceInitiatedShutdownBehavior parameter.
(Ticket #2974).
- (This feature was added in version 7.9.1.)
Added the command line options -suppress_notification and
-dont_suppress_notification, and the corresponding
DAGMAN_SUPPRESS_NOTIFICATION configuration variable,
to condor_dagman and condor_submit_dag.
This enables a user of DAGMan to stop email notification of job
events for jobs submitted by condor_dagman. The value of
DAGMAN_SUPPRESS_NOTIFICATION defaults to True,
so that jobs submitted
by condor_dagman will not send email notification.
(Ticket #3352).
- The default for job notification email has changed
from Complete to Never.
There is also a new configuration variable, JOB_DEFAULT_NOTIFICATION ,
which permits administrators to change the default for all jobs.
(Ticket #2155).
- For platforms supporting cgroups,
resource limits can now be applied per job,
where a job may consist of multiple processes.
See section 3.12.14 for details.
(Ticket #2734).
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable MEMORY_LIMIT
supports implementing memory resource limits on a per-job basis under cgroups.
(Ticket #2734).
Bugs Fixed:
- condor_schedd and condor_shadow were not respecting the
DAGManNodesMask attribute. This caused extra events to be written to
the DAGMan node log.
(Ticket #3311).
- Removed a spurious newline from the output of condor_submit.
(Ticket #3316).
- Fixed a bug that caused the condor_shadow to set job attribute
X509UserProxySubject to the wrong value when the job's X.509
proxy file was updated. It incorrectly set the value to be
the proxy's subject name, rather than to the correct value, which is
its identity.
(Ticket #3265).
- The batch_gahp no longer modifies the environment variable
LD_LIBRARY_PATH.
In some instances, modifying LD_LIBRARY_PATH caused the
batch system's command line tools to fail when run by the batch_gahp.
(Ticket #3317).
- Grid-type batch jobs now work properly on machines
where the gLite software has been installed.
(Ticket #3269).
- The condor_shadow would never print the allocated amount of
partitionable resources in the job log.
(Ticket #3318).
- condor_who would sometimes incorrectly display blank or partial
values in the PROGRAM column.
(Ticket #3314).
Known Bugs:
Additions and Changes to the Manual:
Version 7.9.1
Release Notes:
- Condor version 7.9.1 released on October 22, 2012.
- Condor no longer looks for its main configuration file in the
location $(GLOBUS_LOCATION)/etc/condor_config.
(Ticket #2830).
- Security Item: This version contains an important security bug fix. See below
for details of this and other bugs fixed.
New Features:
- There are two new protocols for the submission of grid type EC2 jobs,
euca3:// and euca3s://.
These protocols exist to work correctly when the resources do not support
the InstanceInitiatedShutdownBehavior parameter.
(Ticket #2974).
- condor_job_router can now submit the routed copy of jobs to a
different condor_schedd than the one that serves as the source of
jobs to be routed. The spool directories of the two
condor_schedds must still be directly accessible to
condor_job_router. This feature is enabled by using the new
optional configuration settings:
- JOB_ROUTER_SCHEDD1_SPOOL
See definition at section 3.3.21.
- JOB_ROUTER_SCHEDD2_SPOOL
See definition at section 3.3.21.
- JOB_ROUTER_SCHEDD1_NAME
See definition at section 3.3.21.
- JOB_ROUTER_SCHEDD2_NAME
See definition at section 3.3.21.
- JOB_ROUTER_SCHEDD1_POOL
See definition at section 3.3.21.
- JOB_ROUTER_SCHEDD2_POOL
See definition at section 3.3.21.
(Ticket #3030).
- The condor_job_router can now optionally transform jobs in place,
rather than creating a second transformed version (copy) of the job.
(Ticket #3185).
- The condor_defrag daemon now has a policy option implemented
by configuration to cancel the draining
of a machine that is in the Draining mode. This can be used to effect
partial draining of machines.
(Ticket #2993).
- Communication between the condor_c-gahp and the condor_schedd has
been improved. A large number of Condor-C jobs should no longer cause
other clients of the remote condor_schedd to time out trying to get the
condor_schedd daemon's attention.
(Ticket #2575).
- condor_history and condor_q can now be told to read job records
from a user log, instead of parsing the history file or querying the
condor_schedd. This can be used to monitor the status of jobs with
reduced load on the condor_schedd.
(Ticket #3188).
- Eucalyptus 3.x support has been added to the EC2 GAHP.
(Ticket #2974).
- File transfer remaps now support remapping directories.
(Ticket #3039).
- The condor_schedd can now dynamically spawn a local condor_startd
to manage local universe jobs.
(Ticket #3129).
- condor_q -jobads will now respect the -constraint option.
(Ticket #3191).
- Added BOSCO, a set of tools that makes it easy to use a Personal
Condor to run jobs on remote batch systems without administrator
assistance or manual installation of software on the remote systems.
See https://twiki.grid.iu.edu/bin/view/CampusGrids/BoSCO
for more
information about BOSCO.
(Ticket #2421).
Configuration Variable and ClassAd Attribute Additions and Changes:
- Dynamic slots now fill the values for attributes with names
that begin with
TotalSlot,
for configured local resources in a way consistent with standard resources
such as TotalSlotCpus.
Previously those values were all given the value zero on dynamic slots.
(Ticket #3229).
- The condor_schedd now advertises the value of configuration variable
COLLECTOR_HOST as attribute CollectorHost in
its daemon ClassAd. This allows one to determine if a given
condor_schedd reporting to a condor_collector is flocking to that
condor_collector or not.
(Ticket #3202).
- Added the attribute DAGManNodesMask to control the verboseness of
the log referred to by DAGManNodesLog.
(Ticket #3351).
- The new configuration variable
QUEUE_SUPER_USER_MAY_IMPERSONATE specifies a regular
expression that matches the user names that
the queue super user may impersonate when managing jobs. When not
set, the default behavior is to allow impersonation of any user who
has had a job in the queue during the life of the condor_schedd. For
proper functioning of the condor_shadow, the condor_gridmanager, and
the condor_job_router, this expression, if set, must match the owner
names of all jobs that these daemons will manage.
(Ticket #3030).
- The new configuration variable DEFRAG_CANCEL_REQUIREMENTS
is an expression that specifies which draining machines should have
draining canceled.
This defaults to $(DEFRAG_WHOLE_MACHINE_EXPR).
This could be used to drain partial rather than whole machines.
(Ticket #2993).
- The new submit command use_x509userproxy can be set
to True to indicate that an X.509 user proxy is required for the job.
If x509userproxy is not set,
then the proxy file will be looked for in the standard locations.
(Ticket #3025).
- If condor_submit is used to submit an interactive job,
and the job is interrupted before the interactive job starts,
an attempt is made to immediately remove the interactive job from the queue.
Similarly, condor_ssh_to_job has a new option -remove-on-interrupt.
(Ticket #3242).
- Changes were made to the ClassAd machine attributes
OpSys, OpSysVer, Distro, as well as others,
in order to do a better job of identifying the operating system.
(Ticket #2366).
- GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE can now be a
list, specifying different values for different hosts.
(Ticket #3220).
- The new configuration parameter GRIDMANAGER_JOB_PROBE_RATE
limits the number of job status requests sent to each remote resource.
(Ticket #3023).
- The default value of GRIDMANAGER_JOB_PROBE_INTERVAL has
changed from 300 to 60.
(Ticket #3023).
- The configuration parameters CONDOR_JOB_POLL_INTERVAL and
INFN_JOB_POLL_INTERVAL should no longer be used. Use
GRIDMANAGER_JOB_PROBE_INTERVAL_CONDOR and
GRIDMANAGER_JOB_PROBE_INTERVAL_BATCH instead.
(Ticket #3023).
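The QUEUE_SUPER_USER_MAY_IMPERSONATE item in the list above describes two cases: a regular expression when the variable is set, and a fallback to users who have had a job in the queue when it is not. The sketch below models that decision. It is not the condor_schedd's code. The function name may_impersonate and the known_owners argument are hypothetical, and requiring the expression to match the entire user name is an assumption; the release notes do not say whether a partial match suffices.

```python
import re


def may_impersonate(owner, pattern=None, known_owners=()):
    """Decide whether the queue super user may act as `owner`.

    pattern:      value of the regular expression, or None when unset.
    known_owners: users who have had a job in the queue during the life
                  of the scheduler (hypothetical stand-in for that state).
    """
    if pattern is None:
        # Default behavior: any user who has had a job in the queue.
        return owner in known_owners
    # Assumption: the whole user name must match the expression.
    return re.fullmatch(pattern, owner) is not None


print(may_impersonate("alice", r"alice|bob"))
print(may_impersonate("mallory", r"alice|bob"))
print(may_impersonate("carol", known_owners={"carol"}))
```

As the item above notes, an expression that is set must cover the owners of every job that the condor_shadow, condor_gridmanager, and condor_job_router will manage, so an overly narrow pattern can break those daemons.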
Bugs Fixed:
Known Bugs:
Additions and Changes to the Manual:
Version 7.9.0
Release Notes:
- Condor version 7.9.0 released on August 16, 2012.
New Features:
- Machine slots can now be configured to identify and
divide customized local resources.
Jobs may then request these resources.
See section 3.5.10 for details.
(Ticket #2905).
- Condor now supports and implements the caching of ClassAds
to reduce memory footprints.
This feature is experimental and is currently disabled by default.
It can be enabled by setting
the new configuration variable ENABLE_CLASSAD_CACHING
to True.
(Ticket #2541).
(Ticket #3127).
- condor_status now returns the condor_schedd ClassAd directly
from the condor_schedd daemon,
if both options -direct and -schedd are given on the command line.
(Ticket #2492).
- The new -status and -echo command line options to
condor_wait cause it to show job start and terminate information,
and to print events to stdout.
(Ticket #2926).
- Added a DEBUG logging level output flag D_CATEGORY,
which causes Condor to include the logging level
flags in effect for each line of logged output.
(Ticket #2712).
- condor_status and condor_q each have a new -autoformat option
to make some output format specifications easier than the existing
-format option.
See the condor_status and condor_q manual pages
for details.
(Ticket #2941).
- Enhanced the ClassAd log system to report the log line number
on parse failures,
and improved the ability to detect parse failures closer to
the point of corruption.
(Ticket #2934).
- Added an -evaluate option to condor_config_val, which causes the configured value queried from
a given daemon to be evaluated with respect to that daemon's ClassAd.
(Ticket #856).
- Added code to condor_dagman,
such that a VARS assignment in a top-level DAG is applied to splices.
(Ticket #1780).
- Condor now uses libraries from Globus 5.2.1.
(Ticket #2838).
- When authenticating Condor daemons with GSI and
configuration variable GSI_DAEMON_NAME is undefined,
Condor checks that the server name in the certificate matches the
host name that the client is connecting to.
When GSI_DAEMON_NAME is defined,
the old behavior is preserved: only certificates matching
GSI_DAEMON_NAME pass the authentication step,
and no host name check is performed.
The behavior of the host name check
may be further controlled with the new configuration variables
GSI_SKIP_HOST_CHECK and
GSI_SKIP_HOST_CHECK_CERT_REGEX.
(Ticket #1605).
- Added new capability to condor_submit to allow recursive macros in
submit description files.
This facility allows one to update variables recursively.
Before this new capability was added,
recursive definition would send condor_submit into an
infinite loop of expanding the macro,
such that the expansion would fill up memory.
See section 11 for details.
(Ticket #406).
- A DAGMan limitation and restriction has been removed.
It is now permitted to define a log command using a macro,
within a node job's submit description file.
(Ticket #2428).
- To enforce the dependencies of a DAG,
DAGMan now uses and watches only the default node
user log of the condor_dagman job for events.
DAGMan requests the condor_schedd and condor_shadow daemons to write each
event to this default log,
in addition to writing to a log specified by the node job.
condor_dagman writes POST script terminate events only to its default log;
these terminate events are not written to the user log.
This behavior can be turned off by setting the configuration variable
DAGMAN_ALWAYS_USE_NODE_LOG to False.
For correct behavior,
DAGMAN_ALWAYS_USE_NODE_LOG should be set to False
if condor_dagman version 7.9.0 or later is submitting jobs
to an older version of
a condor_schedd daemon or of a condor_submit executable.
(Ticket #2807).
- condor_submit has a new -interactive option for
platforms other than Windows,
which schedules and runs a job that provides a shell prompt
on the execute machine.
(Ticket #3088).
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variables MACHINE_RESOURCE_NAMES
(see section 3.3.10)
and MACHINE_RESOURCE_<name>
(see section 3.3.10)
identify and specify the use of customized local machine resources.
(Ticket #2905).
- The new configuration variable ENABLE_CLASSAD_CACHING
controls whether the new caching feature of ClassAds is used.
The default value is False.
(Ticket #3127).
- The new configuration variable CLASSAD_LOG_STRICT_PARSING
controls whether ClassAd log files such as the job queue
log are read with strict parse checking on ClassAd expressions.
(Ticket #3069).
- The default value for configuration variable USE_PROCD
is now True for the condor_master daemon.
This means that by
default the condor_master will start a condor_procd daemon to be used
by it and all other daemons on that machine.
(Ticket #2911).
- There is a new configuration variable used by the condor_starter.
If STARTER_RLIMIT_AS is set to an integer value,
the condor_starter
will use the setrlimit() system call with the
RLIMIT_AS parameter to
limit the virtual memory size of each process in the user job.
The value of this configuration variable is a ClassAd expression,
evaluated in the context of both the machine and job ClassAds,
where the machine ClassAd is the my ClassAd,
and the job ClassAd is the target ClassAd.
(Ticket #1663).
- New configuration variables were added to the condor_schedd to
define statistics that count subsets of jobs.
These variables have the form SCHEDD_COLLECT_STATS_BY_<Name>,
and should be defined by a ClassAd expression that evaluates to a string.
See section 3.3.11
for the complete definition.
The optional configuration variable of the form
SCHEDD_EXPIRE_STATS_BY_<Name> can be used to set an expiration time,
in seconds, for each set of statistics.
(Ticket #2862).
- The new batch_queue submit description file command
and new job ClassAd attribute BatchQueue specify which job
queue to use for grid universe jobs of type
pbs, lsf, and sge.
(Ticket #2996).
- The new configuration variable GSI_SKIP_HOST_CHECK is
a boolean that controls whether a check is performed during
GSI authentication of a Condor daemon.
When set to the default value of False,
the check is not skipped, so the daemon host name must match the
host name in the daemon's certificate, unless otherwise exempted
by values of GSI_DAEMON_NAME or
GSI_SKIP_HOST_CHECK_CERT_REGEX.
When True, this check is skipped, and hosts will not be rejected
due to a mismatch of certificate and host name.
(Ticket #1605).
- The new configuration variable
GSI_SKIP_HOST_CHECK_CERT_REGEX may be set to a
regular expression. A Condor daemon whose GSI certificate has a
subject name matched in full by this regular expression
is not required to have a daemon host name that matches the certificate
host name. The default is an empty regular expression, which will
not match any certificates, even if they have an empty subject name.
(Ticket #1605).
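The interaction of GSI_DAEMON_NAME, GSI_SKIP_HOST_CHECK, and
GSI_SKIP_HOST_CHECK_CERT_REGEX described above can be summarized with a small
Python model. This is a sketch of the documented decision order only, not
Condor's implementation. The function name and its parameters are
hypothetical, GSI_DAEMON_NAME is simplified to an exact comparison against a
single subject name, and extracting the host name from a certificate is not
modeled.

```python
import re


def gsi_daemon_check_passes(cert_subject, cert_host, connect_host,
                            daemon_name=None, skip_host_check=False,
                            skip_cert_regex=""):
    """Toy model of the checks described in the text (hypothetical names)."""
    if daemon_name is not None:
        # GSI_DAEMON_NAME defined: old behavior, no host name check.
        # Simplification: exact comparison against one subject name.
        return cert_subject == daemon_name
    if skip_host_check:
        # GSI_SKIP_HOST_CHECK = True: never reject on a host name mismatch.
        return True
    # An empty regex matches nothing, not even an empty subject name, so it
    # is guarded explicitly (re.fullmatch("", "") would otherwise succeed).
    if skip_cert_regex and re.fullmatch(skip_cert_regex, cert_subject):
        return True
    # Default: certificate host name must match the host being contacted.
    return cert_host.lower() == connect_host.lower()
```

For example, with no variables set, a certificate issued for a.example.org
passes only when the client is connecting to a.example.org. A regular
expression that matches just a prefix of the subject name does not exempt the
certificate, because the match must cover the whole subject name.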
Bugs Fixed:
- Fixed a bug in which usage of cgroups incorrectly included the page cache
in the maximum memory usage.
This bug fix is also included in Condor version 7.8.2.
(Ticket #3003).
- The EC2 GAHP will now respect the value of the environment variable
X509_CERT_DIR and the configuration variable
GSI_DAEMON_TRUSTED_CA_DIR for all secure connections.
(Ticket #2823).
- Condor now avoids selecting down (disabled) network interfaces. Previously, Condor could select a down interface over an up (active) interface.
(Ticket #2893).
- Made logic in the condor_negotiator that computes submitter limits
properly aware of the configuration variable
NEGOTIATOR_CONSIDER_PREEMPTION.
(Ticket #2952).
- Condor no longer back-dates file modification times by 3 minutes
when transferring job input files into the job spool directory or the job
execute directory.
(Ticket #2423).
- Fixed a bug in which the use of a pipe in the configuration file
on Windows platforms would cause a visible console window
to show up whenever the configuration was read.
(Ticket #1534).
Known Bugs:
Additions and Changes to the Manual:
- Machine ClassAd attribute string values relating to OpSys have
been documented for Scientific Linux platforms.
(Ticket #2366).
htcondor-admin@cs.wisc.edu