10.4 Development Release Series 8.1
This is the development release series of HTCondor.
The details of each version are described below.
Version 8.1.6
Release Notes:
- HTCondor version 8.1.6 released on May 22, 2014.
New Features:
- HTCondor can discover, schedule, and manage GPUs in an
exceedingly simple way by inserting
use feature : GPUs
in the configuration file.
The HTCondor wiki page,
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToManageGpus,
describes the capabilities.
- The grid universe can now be used to submit and manage jobs on
a BOINC server, using the new grid type boinc.
(Ticket #3540).
- The configuration system has been restructured, and new semantics
for describing configuration have been implemented.
As part of this effort, almost all configuration variables have
compile-time defaults specified and incorporated into the code.
Therefore, they no longer appear in the example configuration file
that is distributed.
A variable needs to be placed into a configuration file
only when its value differs from the default.
For current installations wishing to transition to the new, stripped-down
configuration files,
the new -writeconfig option to condor_config_val will
help to identify values different from defaults.
New configuration semantics permit
- the inclusion of configuration defined elsewhere.
See section 3.3.1 for a description.
- metaknobs, which incorporate predefined sets of configuration
that are commonly used.
See section 3.3.1 for a description.
- a simple if/else syntax for conditional specification of
configuration.
See section 3.3.1 for a description.
(Ticket #4325).
(Ticket #3894).
(Ticket #4319).
(Ticket #4031).
(Ticket #4211).
- When hierarchical group quotas are used, and surplus
sharing is enabled, the quotas are now correctly computed
if slot weights are also enabled.
(Ticket #4324).
- The default priority factor set for new users is now 1000.
This was changed from a default value of 1, because a value of 1
leaves no room to boost the priority factor.
(Ticket #4282).
- The condor_schedd may now keep open a configurable number
of job event log files.
This improves performance over the previous behavior of
open, write, close done for each event.
New configuration variables USERLOG_FILE_CACHE_MAX and
USERLOG_FILE_CACHE_CLEAR_INTERVAL specify, respectively, the number
of job event log files that may be kept open at the same time
and the interval of time that passes
before the set of open files is closed.
(Ticket #4040).
- The curl file transfer plug-in can now be used to transfer output
files in addition to input files.
(Ticket #4190).
- New Python bindings allow the user access to the same
file locking protocol as HTCondor daemons.
(Ticket #4315).
- The DAGMan node status file formatting has changed.
The format of the DAG node status file is now New ClassAds,
and the amount of information in the file has increased.
Section 2.10.12 has details on node status files.
(Ticket #4115).
- The new configuration variable STARTER_LOG_NAME_APPEND
controls the file name extension of the log used by the condor_starter.
(Ticket #4244).
- The new configuration variable
ENVIRONMENT_VALUE_FOR_UnAssigned<name>
is intended for use with GPUs, where <name> is GPUs.
It defines what GPU ID to assign to slots that have no assigned GPU.
Without this, the CUDA runtime would allow slots with no assigned GPU to use
all of the GPUs.
(Ticket #4320).
- The batch system name HTCondor is now published in
each job's environment.
(Ticket #4233).
- New configuration variables UDP_NETWORK_FRAGMENT_SIZE and
UDP_LOOPBACK_FRAGMENT_SIZE have been added to control the UDP message
fragmentation size over the network and loopback interfaces,
respectively.
(Ticket #4321).
- The new condor_pool_job_report tool for Linux platforms
composes and mails a report about all jobs run in the previous
24 hours on all execute machines within the pool.
(Ticket #4267).
- HTCondor now publishes more I/O statistics as job ClassAd attributes.
The new attributes are
BlockReads,
BlockWrites,
RecentBlockReads,
RecentBlockWrites,
RecentBlockReadKbytes, and
RecentBlockWriteKbytes.
(Ticket #3850).
- The new job ClassAd attribute SpoolOnEvict facilitates
the debugging of failed jobs.
(Ticket #4292).
- Memory corruption mitigation is enabled by additional linker flags,
when building HTCondor from source against system-shared
libraries installed by the distribution.
(Ticket #4153).
- An experimental new feature to overlap the transfer of job output
with the execution of a subsequent job is documented with a link from
the HTCondor wiki page,
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=ExperimentalFeatures.
(Ticket #4291).
- An experimental new feature to provide custom output formatting
for condor_q and condor_status is documented with a link from
the HTCondor wiki page,
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=ExperimentalFeatures.
(Ticket #4241).
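The job event log file cache listed above (USERLOG_FILE_CACHE_MAX and
USERLOG_FILE_CACHE_CLEAR_INTERVAL) can be pictured with a small sketch.
This is not the condor_schedd implementation: the class LogFileCache,
its method names, the least-recently-used eviction, and the treatment of
a maximum of 0 as "no caching" are all assumptions made for illustration.
The sketch only shows why a bounded set of open files avoids an
open, write, close sequence for each event.

```python
import time
from collections import OrderedDict


class LogFileCache:
    """Hypothetical bounded cache of open, append-mode log files."""

    def __init__(self, max_open, clear_interval, clock=time.monotonic):
        self.max_open = max_open              # role of USERLOG_FILE_CACHE_MAX
        self.clear_interval = clear_interval  # role of USERLOG_FILE_CACHE_CLEAR_INTERVAL, in seconds
        self._clock = clock
        self._last_clear = clock()
        self._open = OrderedDict()            # path -> file object, least recently used first

    def write_event(self, path, text):
        if self.max_open <= 0:
            # Assumed meaning of 0: no caching, so open, write, close per event.
            with open(path, "a") as handle:
                handle.write(text)
            return
        # The interval is checked lazily, on the next write after it has elapsed.
        if self._clock() - self._last_clear >= self.clear_interval:
            self.close_all()
        handle = self._open.pop(path, None)
        if handle is None:
            # Make room by closing the least recently used files.
            while len(self._open) >= self.max_open:
                _, oldest = self._open.popitem(last=False)
                oldest.close()
            handle = open(path, "a")
        self._open[path] = handle             # now the most recently used entry
        handle.write(text)
        handle.flush()                        # the event reaches the file while it stays open

    def open_count(self):
        return len(self._open)

    def close_all(self):
        for handle in self._open.values():
            handle.close()
        self._open.clear()
        self._last_clear = self._clock()
```

With max_open set to 2, writing events for three different jobs keeps at
most two files open; the file used least recently is closed to make room.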
Bugs Fixed:
- The condor_shared_port daemon no longer blocks
on a very unresponsive daemon.
(Ticket #4314).
- vm universe jobs now report attribute RemoteUserCPU when
run on a KVM hypervisor.
CPU usage remains unreported by VMware hypervisors.
(Ticket #4337).
- The condor_gridmanager no longer assumes that a NorduGrid ARC job
with a reported exit code greater than 128 exited abnormally via a signal.
(Ticket #4342).
- Many tools, including condor_off and condor_restart, interpreted
the command line argument -defrag incorrectly as -debug,
since both words start with the string "de".
The confusion has been fixed.
Use of -defrag will now produce an error message,
since it is not a valid option for these tools.
(Ticket #3717).
- Fixed a crash by the condor_gpu_discovery tool,
when running on a 32-bit platform or on Windows and detecting via OpenCL.
(Ticket #4339).
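The -defrag and -debug confusion listed above is the kind of error that
appears when option names are matched on too few leading characters.
The following sketch contrasts a matcher that compares only the first
two characters (an assumed model of the bug, not the actual tool code)
with one that accepts an argument only if it is a prefix of exactly one
option name. The option table and both function names are hypothetical.

```python
KNOWN_OPTIONS = ("debug", "help", "version")   # illustrative subset, not a real tool's option table


def match_two_chars(arg):
    """Buggy model: compare only the first two characters after the dash."""
    word = arg.lstrip("-")
    for option in KNOWN_OPTIONS:
        if word[:2] == option[:2]:
            return option
    return None


def match_prefix(arg, min_len=1):
    """Stricter model: the whole argument must be a prefix of exactly one option."""
    word = arg.lstrip("-")
    if len(word) < min_len:
        return None
    hits = [option for option in KNOWN_OPTIONS if option.startswith(word)]
    return hits[0] if len(hits) == 1 else None
```

Under the buggy model, -defrag selects debug. Under the stricter model,
-defrag matches nothing and can be reported as an invalid option, while
abbreviations such as -de still select debug.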
Version 8.1.5
Release Notes:
- HTCondor version 8.1.5 released on April 15, 2014.
New Features:
- The default configuration now implements a policy
that disables preemption.
(Ticket #4281).
- The protocol for interaction between condor_q and the
condor_schedd daemon has been rewritten.
The new protocol does not require the condor_schedd to fork a child process
and does not cause blocking;
the result is that the condor_schedd should be able to handle
many concurrent condor_q requests with minimal resource usage.
(Ticket #4111).
- The specification in configuration for the size or amount of time
that a log file may grow has changed.
An explicit size or amount of time may still be specified for any
individual log file.
However, any log files not explicitly specified have a default maximum
size specified by the new configuration variable
MAX_DEFAULT_LOG.
(Ticket #4246).
- The new condor_urlfetch tool enables the acquisition of
configuration with a query to a URL.
(Ticket #4018).
- The cream_gahp and nordugrid_gahp can now talk to
servers over IPv6.
(Ticket #4243).
- The Python bindings can now accept a list of condor_collector hosts
in the constructor of the Collector object.
This eases use of the bindings for high availability setups.
(Ticket #4245).
- The new Python binding transaction creates a transaction
with the condor_schedd,
providing a way to submit multiple clusters of jobs
or edit multiple attributes atomically.
(Ticket #4225).
- New configuration variable NEGOTIATOR_MAX_TIME_PER_CYCLE
places an upper time limit on the time spent in each negotiation cycle.
(Ticket #4271).
- The configuration variable VALID_SPOOL_FILES has been redefined
to list only files that the system administrator determines must not
be removed by condor_preen.
The new configuration variable SYSTEM_VALID_SPOOL_FILES contains
a predetermined list of files that are known to be valid at
the time HTCondor was built.
condor_preen will use the union of these two configuration variables
as the set of valid files that should not be removed from the SPOOL
directory.
(Ticket #4257).
- The new configuration variable OFFLINE_MACHINE_RESOURCE_<name>
is used to identify a custom machine resource as offline,
so that the resource will not be allocated to any slot.
(Ticket #4177).
- The default value of configuration variable
NEGOTIATOR_USE_WEIGHTED_DEMAND has been changed from
False to True.
(Ticket #4238).
- The new configuration variable
NEGOTIATOR_TRIM_SHUTDOWN_THRESHOLD can be used to avoid
making matches to resources that are about to go away.
It is primarily of interest to glidein pools.
Section 3.3.17 details the new
configuration variable.
(Ticket #4266).
- The quantity of unused memory within DaemonCore data structures
has been reduced. This change is not visible to users.
(Ticket #4206).
- The condor_negotiator logs more information about its round robin
iteration to ease debugging.
(Ticket #3871).
- Some communications between daemons will cause fewer network timeouts,
as the reading of commands no longer blocks while
waiting for completion of the command.
(Ticket #4237).
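The VALID_SPOOL_FILES item above says that condor_preen protects the
union of VALID_SPOOL_FILES and SYSTEM_VALID_SPOOL_FILES. A minimal sketch
of that set arithmetic follows. The function name and every file name in
it are hypothetical, and the sketch ignores any other checks that
condor_preen performs before removing a file.

```python
def files_preen_may_remove(spool_listing, valid_spool_files, system_valid_spool_files):
    """Return the spool entries that fall outside the union of both lists."""
    keep = set(valid_spool_files) | set(system_valid_spool_files)
    return sorted(name for name in spool_listing if name not in keep)


admin_list = ["site_notes.txt"]               # hypothetical VALID_SPOOL_FILES value
system_list = ["job_queue.log", "history"]    # hypothetical SYSTEM_VALID_SPOOL_FILES value
listing = ["job_queue.log", "history", "site_notes.txt", "stray.tmp"]
print(files_preen_may_remove(listing, admin_list, system_list))   # ['stray.tmp']
```

The administrator's list only needs to name site-specific files,
since the files known at build time are covered by the second list.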
Bugs Fixed:
Version 8.1.4
Release Notes:
- HTCondor version 8.1.4 released on February 27, 2014.
- This version of HTCondor includes all bug fixes from version 8.0.6,
as well as the new full port for the Red Hat Enterprise Linux 7.0 Beta
release on the x86_64 architecture.
A full port includes support for the standard universe.
New Features:
- When configured to use partitionable slots,
those slots running jobs can now be preempted by the
condor_negotiator daemon based on the value of
the machine's configuration of RANK.
(Ticket #3667).
- Improved support for publishing monitoring information about an
HTCondor pool to Ganglia(TM).
Added Ganglia statistics for total job starts and total job preemptions
within a condor_startd.
This allows Ganglia to graph the total job preemptions
across all condor_startd daemons in a pool.
See section 3.3.37 for configuration variable definitions,
and section 3.10.1 for details about monitoring
with Ganglia.
(Ticket #4151).
(Ticket #3965).
- The grid universe can now be used to create and manage VM instances
in Google Compute Engine (GCE), using the new grid type gce.
(Ticket #3833).
- As a scalability improvement for Unix platforms,
the condor_shared_port daemon no longer forks on incoming connections.
(Ticket #4094).
- condor_ssh_to_job and interactive jobs no longer try to
connect to held jobs.
They instead report the hold and the reason why the job is being held.
(Ticket #3867).
- Improved the restart time of the condor_schedd after it has crashed.
(Ticket #4169).
- The new configuration variable EC2_RESOURCE_TIMEOUT sets
the amount of time that HTCondor will wait for an unresponsive EC2 service
before placing the corresponding jobs on hold.
(Ticket #4113).
- The new Python binding refreshGSIProxy()
can refresh a remote job's GSI proxy as a part of the Schedd object.
(Ticket #4116).
- By default,
the TCP keep alive interval is automatically tuned to 5 minutes.
This causes at least one packet to be sent on established,
but idle, TCP connections once every 5 minutes,
and it speeds up the detection of connections that were silently dropped
by NAT or firewall devices.
Without this,
the condor_shadow may not reliably recover from transient network failures.
This behavior is controlled by the new configuration variable
TCP_KEEPALIVE_INTERVAL.
Setting this variable to 0 restores the prior behavior.
In addition, the default value of configuration variable
CCB_HEARTBEAT_INTERVAL has been reduced to 5 minutes.
(Ticket #4122).
- New Python ClassAd module function calls
Attribute(), Function(), Literal(),
flatten(), matches(), and symmetricMatch()
aid the composition of ClassAd expressions.
It should now be possible to build expressions directly
in Python, without having to resort to string manipulation.
(Ticket #4154).
- For those who use the Python bindings,
the LD_LIBRARY_PATH environment variable no longer needs to be set.
(Ticket #4128).
- The Python bindings are now compatible with Python 3.
(Ticket #4146).
- Setting configuration variable
DAGMAN_ALWAYS_USE_NODE_LOG to False
or using the corresponding -dont_use_default_node_log option
to condor_submit_dag is no longer recommended.
It is no longer recommended to have condor_dagman read the log files
specified in the node job submit description files.
(Ticket #4091).
- Invoking condor_fetchlog with the STARTD_HISTORY argument
now fetches all condor_startd history by concatenating all instances
of log files resulting from rotation to the current history log.
(Ticket #4152).
- Several general mechanisms for specifying user-defined condor_startd
resources have been enhanced,
so that GPUs can be easily defined and used.
New to this 8.1.4 version of HTCondor is the allocation of user defined
resources (especially GPUs) with partitionable and dynamic slots.
This includes having HTCondor automatically set the environment variable
CUDA_VISIBLE_DEVICES for jobs that use CUDA GPUs
and GPU_DEVICE_ORDINAL for jobs that use OpenCL GPUs.
The mechanism defines configuration variables
MACHINE_RESOURCE_<name> and
MACHINE_RESOURCE_INVENTORY_<name>
to specify the definition of user-defined resources with a list of resource
identifiers.
When HTCondor allocates one of these user-defined resources to a slot,
it will also publish this assignment within the slot's ClassAd
using the new job ClassAd attribute Assigned<name>.
In addition, it will define in the job's environment the variable
_CONDOR_Assigned<name>.
The new configuration variable ENVIRONMENT_FOR_Assigned<name>
also sets further environment variables.
(Ticket #4141).
(Ticket #4148).
- The new condor_gpu_discovery tool detects CUDA and OpenCL GPUs,
reporting them in the format needed to configure GPU resources
using the configuration variable
MACHINE_RESOURCE_INVENTORY_GPUs.
(Ticket #3386).
- Two new pre-defined configuration variables are referenced with
$(DETECTED_PHYSICAL_CPUS) and $(DETECTED_CPUS).
$(DETECTED_PHYSICAL_CPUS) contains the number of
physical (non-hyperthreaded) CPUs.
$(DETECTED_CPUS) will match the value of
either DETECTED_CORES or DETECTED_PHYSICAL_CPUS,
depending on the state of COUNT_HYPERTHREAD_CPUS.
NUM_CPUS now defaults to the value
of DETECTED_CPUS.
(Ticket #4197).
- condor_q will now show the macro-expanded job description from the attribute
MATCH_EXP_JobDescription instead of JobDescription if it is available.
(Ticket #4110).
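The TCP keep alive item above (TCP_KEEPALIVE_INTERVAL) relies on a
standard socket facility. The sketch below shows how a program can
request keep alive probes on one socket using the Python standard
library. It is not HTCondor code. The function name is hypothetical, and
TCP_KEEPIDLE is a platform-specific option (present on Linux), so the
sketch checks for it before use. Treating an interval of 0 as "leave the
operating system defaults alone" mirrors the description of the
configuration variable and is an assumption of the sketch.

```python
import socket


def enable_keepalive(sock, interval_seconds=300):
    """Request keep alive probes; return True if the idle time was set."""
    if interval_seconds <= 0:
        return False                       # 0: keep the operating system defaults
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE is the idle time, in seconds, before the first probe.
    # It is not defined on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, interval_seconds)
        return True
    return False
```

With an interval of 300 seconds, an established but idle connection
carries at least one packet every 5 minutes, which is what allows a
silently dropped connection to be noticed.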
Bugs Fixed:
- Fixed a small memory leak that was triggered by failed
file transfer attempts.
(Ticket #4134).
- Fixed a bug that would leak one socket in each daemon,
when NO_DNS = True.
(Ticket #4140).
- Changed the way the condor_startd allocates CPUs to
slots in configurations where there are more slots than CPUs.
CPUs are now distributed equally between slots that are not configured
to receive a specific number
(using configuration variable SLOT_TYPE_<N>).
Before this change, these slots received 1 CPU each.
The new behavior matches how other slot resources are distributed.
(Ticket #3249).
- The failure to terminate an EC2 grid universe job instance,
because the instance no longer exists at the service,
is now considered a successful termination.
This allows EC2 grid universe jobs to exit the queue,
if the service purges termination records quickly.
(Ticket #4133).
- HTCondor now interacts with EC2 services by using POST
instead of GET,
which permits more services to accept user data with size greater than 8Kbytes.
(Ticket #4004).
- Improved the handling of the coresize
submit description file command,
by allowing values larger than 4Gbytes.
(Ticket #4155).
- Fixed a bug that caused job arguments to not be displayed in the
default output of condor_q when the submit description file used the
new syntax for job arguments.
(Ticket #2875).
- The condor_startd daemon will no longer abort when it exhausts
the supply of user-defined resources such as GPUs
while assigning automatic resource shares to slots.
(Ticket #4176).
Version 8.1.3
Release Notes:
- HTCondor version 8.1.3 released on December 23, 2013.
This developer release contains all bug fixes from HTCondor version 8.0.5.
New Features:
- The parsing of configuration has changed with respect to how
line continuation characters and comments interact.
The line continuation character no longer takes precedence over the
comment character.
(Ticket #4027).
- When the super user issues a command
or when the new condor_sos tool invokes another tool,
the command can be serviced with a higher priority.
This should be useful when attempting to get information from an
overloaded daemon, in order to diagnose or fix a problem.
Commands directed at the condor_schedd or condor_collector daemons
have this ability by default.
Other DaemonCore daemons require configuration using the new
configuration variable
<SUBSYS>_SUPER_ADDRESS_FILE.
(Ticket #4029).
- The dedicated scheduler CPU usage within the condor_schedd is now
throttled, so that it cannot consume all of the CPU and starve the vanilla
scheduler. This throttle can be adjusted by the new configuration variable
DEDICATED_SCHEDULER_DELAY_FACTOR.
This variable, which defaults to five,
sets the ratio of time spent not in the dedicated scheduler to the
time scheduling parallel jobs.
With this default of five,
a maximum of 20% of the scheduler's time will go to scheduling
parallel jobs.
(Ticket #4048).
- The new condor_defrag daemon ClassAd attribute
MeanDrainedArrived
measures the mean time between arrivals of fully drained machines,
and the new attribute DrainedMachines
measures the total number of fully drained machines
which have arrived during the run time of this condor_defrag daemon.
(Ticket #4055).
- The new -defrag option for condor_status queries ClassAds
of the condor_defrag daemon.
(Ticket #4039).
- Machine ClassAd attributes ExpectedMachineQuickDrainingCompletion
and ExpectedMachineGracefulDrainingCompletion are updated with their
completion times if there are no active claims,
making these attributes more useful in setting policy for
partitionable slots.
(Ticket #3481).
- In a DAG, the node retry number is now available as a VARS macro
(see section 2.10.8).
(Ticket #4032).
- Macro substitution both within configuration and within submit
description files has been extended to specify and use
an optional default value if a value is not defined.
Section 3.3.1 has details for configuration.
(Ticket #4033).
- The Python bindings htcondor module has
a new read_events() method to acquire an iterator of
an HTCondor event log file.
(Ticket #4071).
- The new -daemons option to condor_who prints information
about the HTCondor daemons running on the specified machine,
including the daemon's PID, IP address and command port.
(Ticket #4007).
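The macro substitution item above mentions an optional default value.
The sketch below models that behavior in a few lines. The notation
$(NAME:default) used here is an assumption of this sketch, since the
entry above does not spell out the syntax; section 3.3.1 is the
authority. The function expand() is hypothetical, treats names as case
sensitive, does not expand nested references, and replaces an undefined
macro without a default by the empty string, which is also an assumption.

```python
import re

# $(NAME) or $(NAME:default); the default may not contain parentheses here.
_MACRO = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_.]*)(?::([^()]*))?\)")


def expand(text, table):
    """Expand macro references in text using the dict `table`."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in table:
            return table[name]
        # Undefined macro: use the default if one was given.
        return default if default is not None else ""
    return _MACRO.sub(replace, text)
```

For example, expand("$(LOG:/var/log/condor)/SchedLog", {}) yields
/var/log/condor/SchedLog, while a table that defines LOG overrides
the default.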
Configuration Variable and ClassAd Attribute Additions and Changes:
- Configuration variable DAGMAN_DEFAULT_NODE_LOG
has been made more powerful,
so that it can be defined in HTCondor configuration files,
instead of being useful only when defined in a per-DAG configuration file.
See section 3.3.25 for details.
(Ticket #3930).
- The new configuration variable CORE_FILE_NAME is used to set
the name that DaemonCore uses to create a core file,
in the event of a daemon crash.
The default value for this configuration variable appends the daemon name,
so a crash of the condor_schedd would create a core file named
core.SCHEDD.
(Ticket #4100).
- The new configuration variable JOB_EXECDIR_PERMISSIONS
defines the permissions on a job's scratch directory.
It defaults to setting permissions as 0700.
(Ticket #4016).
- The following recently added machine ClassAd attributes have been renamed.
- TotalJobStarts became JobStarts.
- RecentTotalJobStarts became RecentJobStarts.
- TotalPreemptions became JobPreemptions.
- RecentPreemptions became RecentJobPreemptions.
- TotalRankPreemptions became JobRankPreemptions.
- RecentTotalRankPreemptions became RecentJobRankPreemptions.
- TotalUserPrioPreemptions became JobUserPrioPreemptions.
- RecentTotalUserPrioPreemptions became RecentJobUserPrioPreemptions.
(Ticket #4101).
- The new condor_schedd statistics ClassAd attribute
Autoclusters gives the number of active autoclusters.
(Ticket #4020).
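Scripts that read saved machine ClassAds may still contain the old
attribute names from the renaming listed above. The mapping below is
taken directly from that list. The helper rename_attributes() is a
hypothetical convenience that assumes the ad has already been loaded
into a plain Python dictionary; it does not use the HTCondor Python
bindings.

```python
RENAMED_ATTRIBUTES = {
    "TotalJobStarts": "JobStarts",
    "RecentTotalJobStarts": "RecentJobStarts",
    "TotalPreemptions": "JobPreemptions",
    "RecentPreemptions": "RecentJobPreemptions",
    "TotalRankPreemptions": "JobRankPreemptions",
    "RecentTotalRankPreemptions": "RecentJobRankPreemptions",
    "TotalUserPrioPreemptions": "JobUserPrioPreemptions",
    "RecentTotalUserPrioPreemptions": "RecentJobUserPrioPreemptions",
}


def rename_attributes(ad):
    """Return a copy of a dict-shaped ad with old attribute names replaced."""
    return {RENAMED_ATTRIBUTES.get(name, name): value for name, value in ad.items()}
```

Attributes that are not in the mapping pass through unchanged, and the
input dictionary is not modified.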
Bugs Fixed:
Known Bugs:
Additions and Changes to the Manual:
Version 8.1.2
Release Notes:
- HTCondor version 8.1.2 released on October 31, 2013.
This 8.1.2 release contains all bug fixes from HTCondor version 8.0.4.
New Features:
- condor_config_val now supports -dump and -verbose
options to query configuration remotely from daemons.
(Ticket #3894).
- The condor_chirp protocol and command line tool have been
enhanced to support lower-cost, delayed updates to the job
ClassAd residing in the condor_schedd; updates occur as other communications
take place, eliminating the overhead of a separate update.
The two new Chirp commands,
set_job_attr_delayed and get_job_attr_delayed, allow the job
to send lightweight notifications for events such as progress
monitoring, which need not be durable.
(Ticket #3353).
- condor_history has been enhanced to support
remote history using new -pool and -name options.
(Ticket #3897).
- Matchmaking in the condor_negotiator may be made aware of resources
available for partitionable slots.
This permits multiple jobs to be matched against a partitionable slot
during a single negotiation cycle.
The new policies discussed in Section 3.5.10
are set using new configuration variables and are known as consumption policies.
(Ticket #3435).
- Definition syntax for the authorization configuration variables
ALLOW_* and DENY_* has been expanded to permit
the specification of Unix netgroups.
See section 3.6.7 for the syntax.
(Ticket #3859).
- Definition syntax for the configuration variable
QUEUE_SUPER_USERS has been expanded to accept a specification
of Unix user groups.
See section 3.3.11 for the syntax.
(Ticket #3859).
- To ensure that a grid universe job running at an EC2 service
terminates,
HTCondor now checks after a fixed time interval
that the job actually has terminated,
instead of relying on the service's potentially unreliable
job shut down indication.
If the job has not terminated after a total of four checks,
the job is placed on hold; it does not leave the queue marked as completed.
(Ticket #3438).
- Email alerts about file transfers taking longer than
MAX_TRANSFER_QUEUE_AGE are now grouped together
to reduce the number of email messages that are sent.
- Floating point values in Old ClassAds are now printed in a more
human-readable format, while retaining 64-bit double precision.
In previous versions, these values were always printed in scientific
notation.
(Ticket #3928).
- condor_ssh_to_job now works with grid universe jobs
which use EC2 resources.
(Ticket #1548).
- Machine ClassAd attributes Disk and TotalDisk
are now published as 64-bit integers,
rather than being capped at the maximum value of a 32-bit integer.
(Ticket #1784).
- In an effort to improve scalability under heavy load, the tuning
configuration variable MAX_REAPS_PER_CYCLE is exposed,
as defined at section 3.3.5.
The default for this variable changed from 1 to 0.
(Ticket #3992).
- To reduce the overwhelming quantity of per-user condor_schedd
statistics that are generated when configuration variables
SCHEDD_COLLECT_STATS_FOR_<Name> or
SCHEDD_COLLECT_STATS_BY_<Name> are used,
the statistics are now published at verbosity level 2,
instead of verbosity level 1.
(Ticket #3980).
- The Python bindings now include the Negotiator class to
manage users and their priorities.
(Ticket #3893).
- The Python bindings now provide automatic conversions from
dictionaries to ClassAds,
so they can accept a dictionary directly as an argument,
rather than constructing a ClassAd from the dictionary.
(Ticket #3892).
- The Python bindings ClassAd module has
quote() and unquote()
methods to help create string literals.
(Ticket #3900).
- The Python bindings ClassAd module has new
methods parseAds() and parseOldAds()
that implement an iterator over ClassAds, in the New ClassAd and
Old ClassAd format.
(Ticket #3918).
- The ordering of adding attributes to the machine ClassAd has been
changed, such that the attributes Draining, DrainingRequestId,
and LastDrainStartTime are now added before the job retirement
is calculated.
This allows a decision about preemption to be made based on whether
a machine is currently draining.
(Ticket #3901).
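The EC2 termination check listed above (a fixed interval between checks,
and a hold after a total of four unsuccessful checks) can be sketched as
a short polling loop. This is an illustration only, not the
condor_gridmanager code: the function name, the is_terminated callable,
the returned strings, and the choice to wait before the first check are
assumptions.

```python
import time


def confirm_termination(is_terminated, interval_seconds, max_checks=4, sleep=time.sleep):
    """Poll until the instance is gone; return 'completed' or 'held'."""
    for _ in range(max_checks):
        sleep(interval_seconds)      # wait a fixed interval before each check
        if is_terminated():
            return "completed"
    return "held"                    # still present after the final check
```

Passing a replacement for sleep makes the loop easy to exercise without
waiting, for example confirm_termination(lambda: False, 30, sleep=lambda s: None),
which returns "held".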
Bugs Fixed:
- When USE_PID_NAMESPACES is True,
the soft kill signal is now successfully sent to the job.
Previously, a condor_rm
command of such a job would not remove the job until the
killing timeout had expired.
(Ticket #3981).
- If a standard universe job exited without producing any
checkpoints and no checkpoint server was used,
two spurious error messages would be logged to the SchedLog,
as it tried to remove the old checkpoint images from the
non-existent checkpoint server.
These error messages are no longer logged.
(Ticket #3919).
- When configuration variable STARTER_RLIMIT_AS is set
to its default value of 0, it means that there is no limit.
This value was logged as a limit of 0Mb, leading to confusion.
Now, no message is logged in this default case.
(Ticket #3914).
- Improved how the condor_schedd notifies the condor_shadow
and condor_gridmanager about modifications to job ClassAds made using
condor_qedit.
(Ticket #3909).
- Grid universe jobs now use the correct executable file when
copy_to_spool is set to True.
Previously, the executable file named in the submit description file
would be copied to the remote server,
rather than the copy of the executable file stored in the spool directory.
(Ticket #3589).
- The example configuration provided within files
condor_config.generic and condor_config.generic.redhat
has been updated to fix an inadequate expression defining
NEGOTIATOR_POST_JOB_RANK when the condor_startd is
configured to not run benchmarks, as Kflops would not be defined.
(Ticket #3589).
- Fixed a Python binding crash due to a segmentation fault,
when evaluating an expression tree with an undefined reference.
The fix allows the user to define the ClassAd scope
within which an expression tree is evaluated.
(Ticket #3910).
- The Python bindings now include a correct conversion of
absTime and relTime ClassAd literals to the
corresponding Python types.
(Ticket #3911).
Version 8.1.1
Release Notes:
- HTCondor version 8.1.1 released on September 17, 2013.
This release contains all bug fixes from the stable release version 8.0.2.
New Features:
Configuration Variable and ClassAd Attribute Additions and Changes:
- The new configuration variable USE_RESOURCE_REQUEST_COUNTS
is a boolean value that defaults to True,
reducing the latency of negotiation
when there are many jobs next to each other in the queue
with the same auto cluster, and many matches are being made.
(Ticket #3585).
- Four new machine ClassAd attributes are advertised.
TotalJobStarts is the total number of jobs started by
this condor_startd daemon since it booted.
RecentTotalJobStarts is the number of jobs started in the
last 20 minutes.
Similarly, TotalPreemptions is
the number of jobs preempted since the condor_startd daemon started,
and RecentTotalPreemptions is
the number in the last 20 minutes.
(Ticket #3712).
- FILE_TRANSFER_DISK_LOAD_THROTTLE now accepts tabs in addition to spaces as delimiters.
(Ticket #3798).
- Configuration variable VALID_SPOOL_FILES has been expanded
to accept a single asterisk wild card character in each listed file name.
(Ticket #3764).
- The new configuration variable GAHP_DEBUG_HIDE_SENSITIVE_DATA
is a boolean value that defaults to hiding sensitive data
such as security keys and passwords
when communication with a GAHP server is written to a daemon log.
(Ticket #3536).
- The default value of configuration variable
ENABLE_CLASSAD_CACHING has changed to True for all
daemons other than the condor_shadow, condor_starter, and condor_master.
(Ticket #3441).
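The VALID_SPOOL_FILES item above allows a single asterisk wild card in
each listed file name. A minimal sketch of matching with at most one
asterisk follows. The function name and the example file names are
hypothetical, and the sketch makes no claim about how condor_preen
implements the comparison.

```python
def matches_single_wildcard(pattern, filename):
    """Match a pattern containing at most one '*' against a file name."""
    if pattern.count("*") > 1:
        raise ValueError("only one asterisk is supported in this sketch")
    if "*" not in pattern:
        return pattern == filename
    prefix, suffix = pattern.split("*")
    # The length test stops the prefix and suffix from overlapping.
    return (len(filename) >= len(prefix) + len(suffix)
            and filename.startswith(prefix)
            and filename.endswith(suffix))
```

The asterisk stands for any run of characters, including none, so
a pattern such as job_*.log matches job_12.log and also job_.log.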
Bugs Fixed:
- The condor_gridmanager now does proper failure recovery when
submitting EC2 grid universe jobs to services that do not support
the EC2 ClientToken parameter.
Previously, if there was a failure when submitting jobs to OpenStack
or Eucalyptus, the jobs could be submitted twice.
(Ticket #3682).
- Fixed the printing of nested ClassAds, so that the nested ClassAds
can be read back properly.
(Ticket #3772).
- Fixed a bug between the condor_gridmanager and condor_ft-gahp
that caused file transfers to fail if one of the two daemons was older
than version 8.1.0.
(Ticket #3856).
- Fixed a bug that caused substitution in configuration variable
evaluation to ignore per-daemon overrides.
This is a long standing bug that may result in subtle changes
to the way your configuration files are processed.
An example of how substitution works with the per-daemon overrides
is in section 3.3.1.
(Ticket #3822).
- Fixed a bug that caused the command
condor_submit -
to be interpreted as an interactive submit,
rather than a request to read input from stdin.
condor_qsub was also modified to be immune to this bug,
such that it will still work with other versions of HTCondor containing
the bug.
(Ticket #3902).
Known Bugs:
- DAGMan recovery mode does not work for Pegasus-generated sub-DAGs.
For sub-DAGs, doing condor_hold or condor_release on
the condor_dagman job, or stopping and re-starting the
condor_schedd with the DAGMan
job in the queue will result in failure of the DAG. This can be
avoided by doing a condor_rm of the DAGMan job, which produces a Rescue
DAG, and re-submitting the DAG; the Rescue DAG is automatically run.
This bug was introduced in HTCondor version 8.0.1, and it also appears
in versions 8.0.2, 8.1.0, and 8.1.1.
(Ticket #3882).
Additions and Changes to the Manual:
Version 8.1.0
Release Notes:
- HTCondor version 8.1.0 released on August 5, 2013.
This release contains all bug fixes from the stable release version 8.0.1.
New Features:
- Added support for publishing information about an HTCondor pool
to Ganglia(TM).
See section 3.3.37 for configuration variable details.
(Ticket #3515).
- Improved the performance of the condor_collector daemon when running
at sites that do not observe daylight saving time.
(Ticket #2898).
- condor_q, condor_rm, condor_status and condor_qedit are now
more consistent in the way they handle the -constraint option.
(Ticket #1156).
- The new condor_dagman_metrics_reporter executable,
which has its own manual page,
reports metrics for DAGMan workflows running under Pegasus. condor_dagman
now generates an output file of the relevant metrics.
(Ticket #3532).
Configuration Variable and ClassAd Attribute Additions and Changes:
- The default value of configuration variable
COLLECTOR_MAX_FILE_DESCRIPTORS has changed to 10240,
and the default value of configuration variable
SCHEDD_MAX_FILE_DESCRIPTORS has changed to 4096.
This increases the scalability of the default configuration.
(Ticket #3626).
- The new configuration variable
FILE_TRANSFER_DISK_LOAD_THROTTLE enables dynamic
adjustment of the level of file transfer concurrency in order to
keep the disk load generated by transfers below a specified level.
Supporting this new feature are configuration variables
FILE_TRANSFER_DISK_LOAD_THROTTLE_WAIT_BETWEEN_INCREMENTS,
FILE_TRANSFER_DISK_LOAD_THROTTLE_SHORT_HORIZON, and
FILE_TRANSFER_DISK_LOAD_THROTTLE_LONG_HORIZON.
(Ticket #3613).
- The following new condor_schedd ClassAd attributes are for
monitoring file transfer activity:
TransferQueueMBWaitingToDownload,
TransferQueueMBWaitingToUpload,
FileTransferDiskThrottleLevel,
FileTransferDiskThrottleHigh, and
FileTransferDiskThrottleLow.
(Ticket #3613).
- The default value for the configuration variable
PASSWD_CACHE_REFRESH has been changed from 300 seconds to
72000 seconds (20 hours).
(Ticket #3723).
- The new configuration variables
DAGMAN_PEGASUS_REPORT_METRICS and
DAGMAN_PEGASUS_REPORT_TIMEOUT
set defaults used by the new condor_dagman_metrics_reporter executable,
which reports metrics for DAGMan jobs running under Pegasus.
(Ticket #3532).
Bugs Fixed:
- HTCondor version 8.0.0 had an unintended change in the Chirp
wire protocol.
This change caused condor_chirp with the put option
to fail when the execute node
was running HTCondor version 7.8.x or earlier versions.
HTCondor 8.0.1 and later
versions will now send the original wire protocol, and accept either the
original protocol, or the variant that HTCondor version 8.0.0 sends.
(Ticket #3735).
- Fixed a bug that could cause the daemons to crash on Unix platforms,
if the operating system reported that a job owner's account
did not exist, for example due to a temporary NIS or LDAP failure.
(Ticket #3723).
- Fixed a bug that resulted in a misleading error message when
condor_status with the -constraint option specified a constraint
that could not be parsed.
(Ticket #1319).
- Fixed a typo in the output of condor_q,
where a period was erroneously present within a heading.
(Ticket #3703).
Known Bugs:
Additions and Changes to the Manual: