When a condor_shadow daemon exits, its exit code is recorded in the condor_schedd log, and it identifies why the job exited. The log entry takes one of two forms:
Shadow pid XXXXX for job XX.X exited with status YYY
where YYY is the exit code, or
Shadow pid XXXXX for job XX.X reports job exit reason 100.
where the exit code is the value 100. Table 13.1 lists these codes.
Value | Error Name | Description |
4 | JOB_EXCEPTION | the job exited with an exception |
44 | DPRINTF_ERROR | there was a fatal error with dprintf() |
100 | JOB_EXITED | the job exited (not killed) |
101 | JOB_CKPTED | the job produced a checkpoint |
102 | JOB_KILLED | the job was killed |
103 | JOB_COREDUMPED | the job was killed and a core file was produced |
105 | JOB_NO_MEM | not enough memory to start the condor_shadow |
106 | JOB_SHADOW_USAGE | incorrect arguments to condor_shadow |
107 | JOB_NOT_CKPTED | the job vacated without a checkpoint |
107 | JOB_SHOULD_REQUEUE | same number as JOB_NOT_CKPTED, to achieve the same behavior; this exit code implies that the job should be put back in the job queue and run again |
108 | JOB_NOT_STARTED | cannot connect to the condor_startd, or the request was refused |
109 | JOB_BAD_STATUS | job status != RUNNING on start up |
110 | JOB_EXEC_FAILED | exec failed for some reason other than ENOMEM |
111 | JOB_NO_CKPT_FILE | there is no checkpoint file (as it was lost) |
112 | JOB_SHOULD_HOLD | the job should be put on hold |
113 | JOB_SHOULD_REMOVE | the job should be removed |
114 | JOB_MISSED_DEFERRAL_TIME | the job goes on hold, because it did not run within the specified window of time |
115 | JOB_EXITED_AND_CLAIM_CLOSING | the job exited (not killed) but the condor_startd is not accepting any more jobs on this claim |
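As a minimal sketch of how these codes might be looked up when reading a condor_schedd log, the following Python fragment matches the two message forms quoted above and maps the numeric code to its name from Table 13.1. The function name parse_shadow_exit and the sample log line are illustrative assumptions, not part of HTCondor; real log lines carry additional prefix text (such as timestamps), which is why the pattern is searched for anywhere in the line.

```python
import re

# Shadow exit codes, copied from Table 13.1.
SHADOW_EXIT_CODES = {
    4: "JOB_EXCEPTION",
    44: "DPRINTF_ERROR",
    100: "JOB_EXITED",
    101: "JOB_CKPTED",
    102: "JOB_KILLED",
    103: "JOB_COREDUMPED",
    105: "JOB_NO_MEM",
    106: "JOB_SHADOW_USAGE",
    107: "JOB_NOT_CKPTED / JOB_SHOULD_REQUEUE",  # two names share this value
    108: "JOB_NOT_STARTED",
    109: "JOB_BAD_STATUS",
    110: "JOB_EXEC_FAILED",
    111: "JOB_NO_CKPT_FILE",
    112: "JOB_SHOULD_HOLD",
    113: "JOB_SHOULD_REMOVE",
    114: "JOB_MISSED_DEFERRAL_TIME",
    115: "JOB_EXITED_AND_CLAIM_CLOSING",
}

# Matches both message forms: "exited with status YYY" and
# "reports job exit reason YYY".
SHADOW_EXIT_RE = re.compile(
    r"Shadow pid (?P<pid>\d+) for job (?P<job>\d+\.\d+) "
    r"(?:exited with status|reports job exit reason) (?P<code>\d+)"
)


def parse_shadow_exit(line):
    """Return (job_id, code, name) for a shadow exit message, else None.

    Hypothetical helper for illustration only.
    """
    match = SHADOW_EXIT_RE.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    return match.group("job"), code, SHADOW_EXIT_CODES.get(code, "UNKNOWN")


if __name__ == "__main__":
    # Made-up sample line in the form described above.
    print(parse_shadow_exit("Shadow pid 12345 for job 42.0 exited with status 107"))
```

Codes that are not in the table are reported as UNKNOWN rather than raising an error, since the table may not be exhaustive for every version.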
Table 13.2 lists codes that appear as the first field within a job event log file. See more detailed descriptions of these values in section 2.6.6.
Event Code | Description |
000 | Submit |
001 | Execute |
002 | Executable error |
003 | Checkpointed |
004 | Job evicted |
005 | Job terminated |
006 | Image size |
007 | Shadow exception |
008 | Generic |
009 | Job aborted |
010 | Job suspended |
011 | Job unsuspended |
012 | Job held |
013 | Job released |
014 | Node execute |
015 | Node terminated |
016 | Post script terminated |
017 | Globus submit (no longer used) |
018 | Globus submit failed |
019 | Globus resource up (no longer used) |
020 | Globus resource down (no longer used) |
021 | Remote error |
022 | Job disconnected |
023 | Job reconnected |
024 | Job reconnect failed |
025 | Grid resource up |
026 | Grid resource down |
027 | Grid submit |
028 | Job ClassAd attribute values added to event log |
029 | Job status unknown |
030 | Job status known |
031 | Grid job stage in |
032 | Grid job stage out |
033 | Job ClassAd attribute update |
034 | DAGMan PRE_SKIP defined |
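The lookup in Table 13.2 can be sketched in Python as follows. The fragment reads the first field of a line and, if it is a three-digit code, returns the matching description. The function name event_name and the sample line are assumptions made for illustration; the only property relied on is the one stated above, that the event code is the first field of the line.

```python
import re

# Event descriptions from Table 13.2; the list index is the event code.
EVENT_NAMES = [
    "Submit", "Execute", "Executable error", "Checkpointed", "Job evicted",
    "Job terminated", "Image size", "Shadow exception", "Generic",
    "Job aborted", "Job suspended", "Job unsuspended", "Job held",
    "Job released", "Node execute", "Node terminated",
    "Post script terminated", "Globus submit (no longer used)",
    "Globus submit failed", "Globus resource up (no longer used)",
    "Globus resource down (no longer used)", "Remote error",
    "Job disconnected", "Job reconnected", "Job reconnect failed",
    "Grid resource up", "Grid resource down", "Grid submit",
    "Job ClassAd attribute values added to event log",
    "Job status unknown", "Job status known", "Grid job stage in",
    "Grid job stage out", "Job ClassAd attribute update",
    "DAGMan PRE_SKIP defined",
]

# A line that starts an event begins with exactly three digits.
EVENT_CODE_RE = re.compile(r"([0-9]{3})\b")


def event_name(line):
    """Return the description for the event code that starts `line`.

    Returns None when the line does not begin with a three-digit code.
    Hypothetical helper for illustration only.
    """
    match = EVENT_CODE_RE.match(line)
    if match is None:
        return None
    code = int(match.group(1))
    if code < len(EVENT_NAMES):
        return EVENT_NAMES[code]
    return "Unknown event"


if __name__ == "__main__":
    # Made-up sample line; only the leading code matters here.
    print(event_name("005 (123.000.000) sample event text"))
```

Lines that do not start with a code, such as the continuation lines of a multi-line event, yield None and can simply be skipped by the caller.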
Server | Port Number |
---|---|
condor_negotiator | 9614 (obsolete, now dynamically allocated) |
condor_collector | 9618 |
GT2 gatekeeper | 2119 |
gridftp | 2811 |
GT4 web services | 8443 |
Number | Name |
60000 | DC_RAISESIGNAL |
60001 | DC_PROCESSEXIT |
60002 | DC_CONFIG_PERSIST |
60003 | DC_CONFIG_RUNTIME |
60004 | DC_RECONFIG |
60005 | DC_OFF_GRACEFUL |
60006 | DC_OFF_FAST |
60007 | DC_CONFIG_VAL |
60008 | DC_CHILDALIVE |
60009 | DC_SERVICEWAITPIDS |
60010 | DC_AUTHENTICATE |
60011 | DC_NOP |
60012 | DC_RECONFIG_FULL |
60013 | DC_FETCH_LOG |
60014 | DC_INVALIDATE_KEY |
60015 | DC_OFF_PEACEFUL |
60016 | DC_SET_PEACEFUL_SHUTDOWN |
60017 | DC_TIME_OFFSET |
60018 | DC_PURGE_LOG |
Exit Code | Description |
0 | Normal exit of daemon |
99 | DAEMON_SHUTDOWN evaluated to True |