When a condor_shadow daemon exits, the condor_shadow exit code is recorded in the condor_schedd log, and it identifies why the job exited. Prose in the log appears of the form
where YYY is the exit code, or
where the exit code is the value 100. Table B.1 lists these codes.
Value | Error Name | Description |
4 | JOB_EXCEPTION | the job exited with an exception |
44 | DPRINTF_ERROR | there was a fatal error with dprintf() |
100 | JOB_EXITED | the job exited (not killed) |
101 | JOB_CKPTED | the job did produce a checkpoint |
102 | JOB_KILLED | the job was killed |
103 | JOB_COREDUMPED | the job was killed and a core file was produced |
105 | JOB_NO_MEM | not enough memory to start the condor_shadow |
106 | JOB_SHADOW_USAGE | incorrect arguments to condor_shadow |
107 | JOB_NOT_CKPTED | the job vacated without a checkpoint |
107 | JOB_SHOULD_REQUEUE | same number as JOB_NOT_CKPTED, |
to achieve the same behavior. | ||
This exit code implies that we want | ||
the job to be put back in the job queue | ||
and run again. | ||
108 | JOB_NOT_STARTED | can not connect to the condor_startd or request refused |
109 | JOB_BAD_STATUS | job status != RUNNING on start up |
110 | JOB_EXEC_FAILED | exec failed for some reason other than ENOMEM |
111 | JOB_NO_CKPT_FILE | there is no checkpoint file (as it was lost) |
112 | JOB_SHOULD_HOLD | the job should be put on hold |
113 | JOB_SHOULD_REMOVE | the job should be removed |
114 | JOB_MISSED_DEFERRAL_TIME | the job goes on hold, because it did not run within the |
specified window of time | ||
115 | JOB_EXITED_AND_CLAIM_CLOSING | the job exited (not killed) but the condor_startd |
is not accepting any more jobs on this claim | ||