Upgrading from the 7.8 series of HTCondor to the 8.0 series brings the many new features and improvements introduced in the 7.9 series of HTCondor. It also introduces changes that administrators of sites running an older HTCondor version should be aware of when planning an upgrade. Here is a list of those changes.
To avoid starting DAGMan jobs from the beginning after the upgrade, the administrator should ensure that no condor_dagman jobs are queued. Do a condor_rm on all condor_dagman jobs and wait for Rescue DAGs to be written before shutting down HTCondor to perform the upgrade. Any condor_dagman jobs that are on hold should be released before being removed. After the upgrade is complete and HTCondor has restarted, all of these DAGMan jobs should be re-submitted. This will cause them to read the appropriate Rescue DAGs and continue on.
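The removal steps above might be scripted roughly as follows. This is a sketch under assumptions not stated in this section: it selects DAGMan jobs with the constraint JobUniverse == 7 (scheduler universe), which would also match any other scheduler universe jobs in the queue. The helper names run and drain_dagman and the DRY_RUN variable are hypothetical. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Assumption: condor_dagman jobs are identified as scheduler universe jobs
# (JobUniverse == 7). Other scheduler universe jobs would match as well.
DAG_CONSTRAINT='JobUniverse == 7'

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

drain_dagman() {
    # Release held DAGMan jobs first (JobStatus == 5 means held) ...
    run condor_release -constraint "$DAG_CONSTRAINT && JobStatus == 5"
    # ... then remove all DAGMan jobs so that Rescue DAGs are written.
    run condor_rm -constraint "$DAG_CONSTRAINT"
}

drain_dagman
```

Setting DRY_RUN=0 executes the commands instead of printing them. Held jobs may need a moment to leave the held state before they are removed, and HTCondor should not be shut down until condor_q shows that the condor_dagman jobs have left the queue.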
To avoid losing work within partially-completed node jobs, an alternative is to use the halt file feature, as described in section 2.10.7. This will cause all condor_dagman jobs to eventually drain from the queue(s). This will take longer than doing a condor_rm on those jobs. condor_dagman jobs drained via the halt file method will also have to be re-submitted after the upgrade.
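As a minimal sketch of the halt file approach, the script below assumes that the halt file for a DAG is a file named after the DAG input file with .halt appended, located in the same directory as the DAG input file. The function name halt_dags is hypothetical, and the script takes the DAG input files as its arguments.

```shell
#!/bin/sh
# Hypothetical helper: create a halt file next to each DAG input file given.
# Assumes the halt file is named <DAG input file>.halt.
halt_dags() {
    for dag in "$@"; do
        if [ -f "$dag" ]; then
            touch "$dag.halt" && echo "halted: $dag"
        else
            echo "skipped (no such DAG file): $dag" >&2
        fi
    done
}

# Halt every DAG input file passed on the command line.
halt_dags "$@"
```

Once the halt files exist, each condor_dagman job should leave the queue after its running node jobs finish. The DAGs still need to be re-submitted after the upgrade, as noted above.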
For example, if machine attribute CheckpointPlatform changed from "LINUX INTEL 2.6.x normal N/A" to "LINUX INTEL 2.6.x normal N/A ssse3 sse4_1 sse4_2", use the following command:
condor_qedit -constraint 'LastCheckpointPlatform == "LINUX INTEL 2.6.x normal N/A"' LastCheckpointPlatform '"LINUX INTEL 2.6.x normal N/A ssse3 sse4_1 sse4_2"'