Condor Administration Tutorial: Hands On Workbook

You will probably want to keep two windows open, one for root, one for student.

These examples assume the use of a Bourne-style shell.

Installing Condor

root: Installation and Configuration

These steps should all be done as the root user.

First, create a condor user. This isn't strictly necessary, but it reduces the amount of configuration we'll need to do.

% adduser condor
% chmod a+rx ~condor

Now we will install and configure Condor. We'll put it into /tmp/condor now. In a production system you might want to place it onto a shared filesystem and share the installation between machines.

We'll be using the most recent release, Condor version 6.5.5. You can download it from the official site, but we'll try to provide a local mirror for this tutorial.

% cd /tmp
% wget http://local.mirror.example.com/condor-6.5.5-linux-x86-glibc23.tar.gz
% tar -xzf condor-6.5.5-linux-x86-glibc23.tar.gz
% cd condor-6.5.5
% ./condor_configure --install --install-dir=/tmp/condor \
--local-dir=/tmp/condor/var --owner=condor
 
Unable to find a valid Java installation
Java Universe will not work properly until the JAVA
(and JAVA_MAXHEAP_ARGUMENT) parameters are set in the configuration file!
 
Condor has been installed into:
    /tmp/condor
 
In order for Condor to work properly you must set your
CONDOR_CONFIG environment variable to point to your
Condor configuration file:
    /tmp/condor/etc/condor_config
before running Condor commands/daemons.  

It's worth noting that the default policy generated by condor_configure sets the machine up to always accept and run jobs, a good default for testing and our tutorial. (START = TRUE, SUSPEND = FALSE, PREEMPT = FALSE, VACATE = FALSE)
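Spelled out as configuration macros, that testing policy amounts to the following (the exact lines condor_configure writes may be formatted slightly differently):

```
START   = TRUE
SUSPEND = FALSE
PREEMPT = FALSE
VACATE  = FALSE
```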

Condor looks in several locations for its configuration file, including the location specified in the CONDOR_CONFIG environment variable, but ~condor/condor_config is the most straightforward. We'll link the configuration file to ~condor/condor_config.

% ln -s /tmp/condor/etc/condor_config ~condor/
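If you prefer not to rely on the symlink, you can point the CONDOR_CONFIG environment variable directly at the file instead. A Bourne-shell sketch, using the install path from above:

```shell
# Point Condor commands and daemons at the configuration file
# explicitly (alternative to the ~condor/condor_config symlink).
CONDOR_CONFIG=/tmp/condor/etc/condor_config
export CONDOR_CONFIG
echo "$CONDOR_CONFIG"
```

You would normally put this in a shell startup file or an init script so it is set for every session that runs Condor tools.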

For ease of use, put the Condor binaries in your path:

% PATH=$PATH:/tmp/condor/bin:/tmp/condor/sbin

condor_configure has made several guesses that aren't strictly correct in our environment. The system would work, but we'd run into some nuisances, so we'll correct those assumptions.

Condor assumes that systems with the same FILESYSTEM_DOMAIN have a shared filesystem. condor_configure has guessed that these systems do share a filesystem and has set the FILESYSTEM_DOMAIN to nesc.ed.ac.uk. However, these systems don't share a filesystem, so we need to set FILESYSTEM_DOMAIN to a unique value for each computer. $(FULL_HOSTNAME) should be unique and is a good option.

Because Condor takes the last setting in its configuration files, we can simply append the correct value to the end of the local configuration file. This is convenient for a tutorial, but in a production environment it will leave a lot of out-of-date information in the file.


% echo 'FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)' >> /tmp/condor/var/condor_config.local
% condor_config_val FILESYSTEM_DOMAIN
lab-07.nesc.ed.ac.uk
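The "last setting wins" rule can be sketched in plain shell. This is a toy illustration of the rule, not Condor's actual parser: given two definitions of the same macro, only the final assignment matters.

```shell
# Build a throwaway config file with two definitions of the
# same macro, then keep only the last assignment seen --
# the same effect Condor's "last setting wins" rule has.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
FILESYSTEM_DOMAIN = nesc.ed.ac.uk
FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)
EOF
value=$(awk -F' = ' '/^FILESYSTEM_DOMAIN/ { v = $2 } END { print v }' "$cfg")
echo "$value"
rm -f "$cfg"
```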

If you're not sure what value Condor is using, you can check with condor_config_val. The -verbose option will tell you where it is defined, useful for complex files.

% condor_config_val -verbose FILESYSTEM_DOMAIN
FILESYSTEM_DOMAIN: lab-07.nesc.ed.ac.uk
  Defined in '/tmp/condor/var/condor_config.local', line 38.

Start up Condor:

% condor_master

Verify that it started:

% ps -efwwww | grep condor_
condor    2782     1  0 16:39 ?        00:00:00 condor_master
condor    2786  2782  0 16:39 ?        00:00:00 condor_collector -f
condor    2787  2782  0 16:39 ?        00:00:00 condor_negotiator -f
condor    2788  2782 81 16:39 ?        00:00:08 condor_startd -f
condor    2789  2782  0 16:39 ?        00:00:00 condor_schedd -f

You might also see some condor_starters and perhaps Java applications. Those are transient; Condor is simply probing for capabilities on start up.

Check out the condor_master's log file:

% condor_config_val MASTER_LOG
/tmp/condor/var/log/MasterLog
% cat /tmp/condor/var/log/MasterLog
10/21 17:27:00 ******************************************************
10/21 17:27:00 ** condor_master (CONDOR_MASTER) STARTING UP
10/21 17:27:00 ** $CondorVersion: 6.5.5 Sep 16 2003 $
10/21 17:27:00 ** $CondorPlatform: INTEL-LINUX-GLIBC23 $
10/21 17:27:00 ** PID = 3723
10/21 17:27:00 ******************************************************
10/21 17:27:00 Using config file: /home/condor/condor_config
10/21 17:27:00 Using local config files: /tmp/condor/var/condor_config.local
10/21 17:27:00 DaemonCore: Command Socket at <129.215.30.215:2008>
10/21 17:27:00 Started DaemonCore process "/tmp/condor/sbin/condor_collector", pid and pgroup = 3724
10/21 17:27:00 Started DaemonCore process "/tmp/condor/sbin/condor_negotiator", pid and pgroup = 3725
10/21 17:27:00 Started DaemonCore process "/tmp/condor/sbin/condor_startd", pid and pgroup = 3726
10/21 17:27:00 Started DaemonCore process "/tmp/condor/sbin/condor_schedd", pid and pgroup = 3727

Looks healthy.

Check that you have one machine (the one you're sitting in front of) in your pool. If no machines are returned, try again in a moment; there can be a brief delay while the various daemons get in touch with each other.

% condor_status
 
Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
 
lab-07.nesc.e LINUX       INTEL  Unclaimed  Idle       0.540   122  0+00:00:04
 
                     Machines Owner Claimed Unclaimed Matched Preempting
 
         INTEL/LINUX        1     0       0         1       0          0
 
               Total        1     0       0         1       0          0

student: Test

Let's verify that everything is working with a minimal test. As the normal user, put the Condor user binaries in your path.

% PATH=$PATH:/tmp/condor/bin

Create a submit file. This is a vanilla job, so we'll need either a shared filesystem or Condor's file transfer support. We don't have a shared filesystem, so we'll use file transfer.

% cd
% mkdir testjob
% cd testjob
% cat > myjob.submit
executable=myprog
universe=vanilla
arguments=Example.$(Cluster).$(Process) 100
output=results.output.$(Process)
error=results.error.$(Process)
log=results.log
notification=never
should_transfer_files=YES
when_to_transfer_output = ON_EXIT
queue
Ctrl-D
% cat myjob.submit
executable=myprog
universe=vanilla
arguments=Example.$(Cluster).$(Process) 100
output=results.output.$(Process)
error=results.error.$(Process)
log=results.log
notification=never
should_transfer_files=YES
when_to_transfer_output = ON_EXIT
queue
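To make the macros concrete, here is a small hypothetical sketch (plain shell, not Condor) of the filenames this submit file would produce for a three-job cluster; the cluster number 5 is made up for the example:

```shell
# Illustration of $(Cluster)/$(Process) expansion: each queued
# process gets its own output and error files, while all of them
# share the single results.log file.
cluster=5
for process in 0 1 2; do
  echo "job $cluster.$process -> results.output.$process, results.error.$process"
done
```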

Create a simple program to run.

% cat > myprog
#! /bin/sh

echo "I'm process id $$ on" `hostname`
echo "This is sent to standard error" 1>&2
date
echo "Running as binary $0" "$@"
echo "My name (argument 1) is $1"
echo "My sleep duration (argument 2) is $2"
sleep $2
echo "Sleep of $2 seconds finished.  Exiting"
exit 42
Ctrl-D 
% chmod a+x myprog
% cat myprog
#! /bin/sh

echo "I'm process id $$ on" `hostname`
echo "This is sent to standard error" 1>&2
date
echo "Running as binary $0" "$@"
echo "My name (argument 1) is $1"
echo "My sleep duration (argument 2) is $2"
sleep $2
echo "Sleep of $2 seconds finished.  Exiting"
exit 42

Submit the test:

% condor_submit myjob.submit
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 1.

Watch the job with condor_q:

% condor_q
 
 
-- Submitter: lab-07.nesc.ed.ac.uk : <129.215.30.76:2683> : lab-07.nesc.ed.ac.uk
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   1.0   adesmet        10/21 18:24   0+00:00:21 R  0   0.0  myprog Example.1.0
 
1 jobs; 0 idle, 1 running, 0 held 

Depending on how quickly you run condor_q, the job might be in the Idle state (I) waiting to run, Running (R), Completed (C) finishing up, or finished and gone from the list. You can continue to monitor the run with condor_q (perhaps using the "watch" program) or by examining the results.log file.

You may also be able to catch the machine being marked as busy while the job runs:

% condor_status
 
Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
 
lab-07.nesc.e LINUX       INTEL  Claimed    Busy       0.170   122  0+00:00:04
 
                     Machines Owner Claimed Unclaimed Matched Preempting
 
         INTEL/LINUX        1     0       1         0       0          0
 
               Total        1     0       1         0       0          0

When the job disappears from the condor_q list (in a few minutes) check that the output is what you expect:

% tail --lines=100 results.*
==> results.error <==
This is sent to standard error
                                                                                
==> results.log <==
000 (001.000.000) 10/21 18:24:12 Job submitted from host: <129.215.30.76:2683>
...
001 (001.000.000) 10/21 18:24:18 Job executing on host: <129.215.30.76:2682>
...
006 (001.000.000) 10/21 18:24:26 Image size of job updated: 3720
...
005 (001.000.000) 10/21 18:25:58 Job terminated.
        (1) Normal termination (return value 42)
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        295  -  Run Bytes Sent By Job
        273  -  Run Bytes Received By Job
        295  -  Total Bytes Sent By Job
        273  -  Total Bytes Received By Job
...
                                                                                
==> results.output <==
I'm process id 5220 on lab-07
Tue Oct 21 18:24:18 BST 2003
Running as binary /tmp/condor/var/execute/dir_5218/condor_exec.exe Example.1.0 100
My name (argument 1) is Example.1.0
My sleep duration (argument 2) is 100
Sleep of 100 seconds finished.  Exiting

Simple Policy

root: Installation and Configuration

condor_configure set the START expression to TRUE. As a result the machine defaults to Idle and will always accept jobs. Try toggling START to FALSE and check the difference with condor_status. You may need to wait a bit for the collector to learn about the change in state.

% condor_status
 
Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
 
lab-07.nesc.e LINUX       INTEL  Unclaimed  Idle       0.180   122  0+00:00:04
 
                     Machines Owner Claimed Unclaimed Matched Preempting
 
         INTEL/LINUX        1     0       0         1       0          0
 
               Total        1     0       0         1       0          0
% echo "START=FALSE" >> /tmp/condor/var/condor_config.local
% condor_reconfig
Sent "Reconfig" command to local master
% sleep 60
% condor_status
 
Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
 
lab-07.nesc.e LINUX       INTEL  Owner      Idle       0.180   122  0+00:00:04
 
                     Machines Owner Claimed Unclaimed Matched Preempting
 
         INTEL/LINUX        1     1       0         0       0          0
 
               Total        1     1       0         0       0          0

The machine is now in the Owner state. So long as START evaluates to FALSE the machine will remain in the Owner state and will refuse jobs.

As the student user:

% rm results.output.* results.error.* results.log
% condor_submit myjob.submit
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 9.
% condor_q -analyze 9
 

-- Submitter: lab-07.nesc.ed.ac.uk : <129.215.30.76:1534> : lab-07.nesc.ed.ac.uk
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
---
009.000:  Run analysis summary.  Of 1 machines,
      0 are rejected by your job's requirements
      1 reject your job because of their own requirements
      0 match, but are serving users with a better priority in the pool
      0 match, but prefer another specific job despite its worse user-priority
      0 match, but will not currently preempt their existing job
      0 are available to run your job
        No successful match recorded.
        Last failed match: Wed Oct 22 14:24:11 2003
        Reason for last match failure: no match found
 
WARNING:  Be advised:   Request 9.0 did not match any resource's constraints

Sure enough, no machines can run your job. The message refers to the machine's requirements not matching your job because START is typically defined, in part, as a set of requirements about the job. In this particular case no job can satisfy the requirement of FALSE.
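For contrast, a more typical desktop-style START expression combines conditions on the machine's state; the thresholds below are illustrative, not from this tutorial's configuration:

```
# Only start jobs when the console has been idle for 15 minutes
# and the machine's load average is low (illustrative values).
START = KeyboardIdle > 15 * 60 && LoadAvg < 0.3
```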

Set START back to TRUE. As root:

% echo "START=TRUE" >> /tmp/condor/var/condor_config.local
% condor_reconfig

In a bit your job should run and exit.

Debugging Shadow Exceptions

Debugging jobs that generate shadow exceptions comes up occasionally. A shadow exception typically means that there is a configuration problem, either in the machine's configuration or in the user's job.

As root:

% chmod a-w /tmp/condor/var/execute/

As student:

% rm results.output.* results.error.* results.log
% condor_submit myjob.submit
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 10.

After a few moments, check condor_q:

% condor_q
 
 
-- Submitter: lab-07.nesc.ed.ac.uk : <129.215.30.76:1534> : lab-07.nesc.ed.ac.uk
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  10.0   adesmet        10/22 13:54   0+00:00:07 I  0   0.0  myprog Example.10.
 
1 jobs; 1 idle, 0 running, 0 held

The job has collected RUN_TIME, but is Idle. We know that our policy doesn't allow evictions, so this suggests a problem. Check your job log. The first command is a useful way to find the user log if it's not obvious:

% condor_q -format '%s\n' UserLog
/home/adesmet/testjob/results.log
% cat /home/adesmet/testjob/results.log
000 (010.000.000) 10/22 13:54:13 Job submitted from host: <129.215.30.76:1534>
...
007 (010.000.000) 10/22 13:54:20 Shadow exception!
        Can no longer talk to condor_starter on execute machine (129.215.30.76)
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
...
007 (010.000.000) 10/22 13:54:21 Shadow exception!
        Can no longer talk to condor_starter on execute machine (129.215.30.76)
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job

The error is probably repeated many times. The job is trying to start, but something is going wrong.

Usually the user log will contain some information on why the condor_starter failed, but not always, and this is one such case. We'll need to check the StarterLog on the machine in question. Here that is your local machine, but usually it will be a different machine, so we'll walk through the process of tracking down the machine in question.

Using the IP address in the error message, find out the machine name.

% host 129.215.30.76
76.30.215.129.in-addr.arpa domain name pointer lab-07.nesc.ed.ac.uk.

You could log into the machine with the problem (using the name or IP address) and check the StarterLog. If you can't easily find the StarterLog after logging in, "condor_config_val STARTER_LOG" will locate it. Or, you can use condor_fetchlog; you'll need to run this as a user with administrator access to the pool (root in our setup):

% /tmp/condor/sbin/condor_fetchlog lab-07.nesc.ed.ac.uk STARTER
10/22 14:09:29 ******************************************************
10/22 14:09:29 ** condor_starter (CONDOR_STARTER) STARTING UP
10/22 14:09:29 ** $CondorVersion: 6.5.5 Sep 16 2003 $
10/22 14:09:29 ** $CondorPlatform: INTEL-LINUX-GLIBC23 $
10/22 14:09:29 ** PID = 5532
10/22 14:09:29 ******************************************************
10/22 14:09:29 Using config file: /home/condor/condor_config
10/22 14:09:29 Using local config files: /tmp/condor/var/condor_config.local
10/22 14:09:29 DaemonCore: Command Socket at <129.215.30.76:1853>
10/22 14:09:29 Done setting resource limits
10/22 14:09:29 Starter communicating with condor_shadow <129.215.30.76:1850>
10/22 14:09:29 Submitting machine is "lab-07.nesc.ed.ac.uk"
10/22 14:09:29 couldn't create dir /tmp/condor/var/execute/dir_5532: Permission denied
10/22 14:09:29 Failed to initialize JobInfoCommunicator, aborting
10/22 14:09:29 Unable to start job.
10/22 14:09:29 **** condor_starter (condor_STARTER) EXITING WITH STATUS 1

Sure enough, "couldn't create dir /tmp/condor/var/execute/dir_5532: Permission denied" is the problem. If we re-enable writing to that directory, things will begin working again. We can use condor_reschedule to tell Condor to try matching jobs to machines again. As root:

% chmod a+w /tmp/condor/var/execute/
% condor_reschedule

After a few moments the job should run and finish.

Big Pool

At the moment we have a bunch of one-machine pools; let's merge them into a single big pool. All we need to do is pick one machine and have all of the others report to it.

root: Installation and Configuration

Change your CONDOR_HOST to point to the shared machine. There is no need to continue running a negotiator and collector on individual machines, so remove them from the DAEMON_LIST. You can edit /tmp/condor/var/condor_config.local, or use the following commands:

% echo 'DAEMON_LIST = MASTER, STARTD, SCHEDD' >> /tmp/condor/var/condor_config.local
% echo 'CONDOR_HOST = shared.machine.name.example' >> /tmp/condor/var/condor_config.local

We need to let Condor know that we made a change. Normally condor_reconfig will do the job, but for a handful of changes (including changes to DAEMON_LIST), you need to restart Condor:

% condor_restart
Sent "Restart" command to local master

After a moment condor_status should report the machines in the larger pool.

% condor_status
 
Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
 
lab-01.nesc.e LINUX       INTEL  Unclaimed  Idle       0.540   122  0+00:00:04
lab-02.nesc.e LINUX       INTEL  Unclaimed  Idle       0.200   122  0+00:01:00
lab-03.nesc.e LINUX       INTEL  Unclaimed  Idle       0.000   122  0+00:03:34
lab-07.nesc.e LINUX       INTEL  Unclaimed  Idle       1.000   122  0+00:02:23
lab-09.nesc.e LINUX       INTEL  Unclaimed  Idle       0.120   122  0+00:00:47
lab-21.nesc.e LINUX       INTEL  Unclaimed  Idle       0.930   122  0+00:30:39
 
                     Machines Owner Claimed Unclaimed Matched Preempting
 
         INTEL/LINUX        6     0       0         6       0          0
 
               Total        6     0       0         6       0          0

student: Test

Modify the submit file to submit several more copies of the job. You might also want to modify the arguments so that the second argument is only 10 instead of 100. Submit the jobs.
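One way to shorten the sleep argument is an in-place edit with GNU sed. The sketch below operates on a throwaway copy of the submit file (at a made-up path) so the command is safe to experiment with; on your real file you would target myjob.submit directly:

```shell
# Write a throwaway copy of the arguments line, then rewrite the
# trailing "100" to "10" in place (GNU sed -i; adjust for other seds).
printf 'arguments=Example.$(Cluster).$(Process) 100\nqueue\n' > /tmp/myjob.submit.copy
sed -i 's/^\(arguments=.*\) 100$/\1 10/' /tmp/myjob.submit.copy
grep '^arguments' /tmp/myjob.submit.copy
```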

% echo "queue 4" >> myjob.submit
% rm results.output.* results.error.* results.log
% condor_submit myjob.submit
Submitting job(s).....
Logging submit event(s).....
5 job(s) submitted to cluster 5.

Monitor the jobs with condor_q or by watching results.log. When the jobs finish, examine the output files or the results.log to confirm that your jobs ran on other machines. (There is a chance that all of your jobs ran on just your machine, but it is unlikely.)

% grep 'executing' results.log
001 (005.000.000) 10/22 12:05:26 Job executing on host: <129.215.30.81:3171>
001 (005.001.000) 10/22 12:05:38 Job executing on host: <129.215.30.90:3171>
001 (005.002.000) 10/22 12:05:51 Job executing on host: <129.215.30.81:3171>
001 (005.003.000) 10/22 12:06:03 Job executing on host: <129.215.30.81:3171>
001 (005.004.000) 10/22 12:06:16 Job executing on host: <129.215.30.77:3171>
% grep 'process id' results.output.*
results.output.0:I'm process id 4119 on lab-07
results.output.1:I'm process id 4127 on lab-21
results.output.2:I'm process id 4136 on lab-07
results.output.3:I'm process id 4145 on lab-07
results.output.4:I'm process id 4153 on lab-01

START Expression

You can create very complex policies using the startd expressions. START is the most important expression. As a simple example, we'll limit your machine to running your own jobs.

root: Configuration

Normally you would use START = (Owner == "username"). Unfortunately, in this tutorial everyone's username is "student", so for this lab we'll add a custom attribute, RealName, for the same purpose. Where it says "YourName" in the following, enter your own name, email address, or something else unique.

% echo 'START=RealName=="YourName"' >> /tmp/condor/var/condor_config.local
% condor_reconfig

Wait a bit for everyone to get this far.

student: Test job

Edit your submit file to add the RealName entry to your job. Submit the job.

% mv myjob.submit myjob.submit.orig
% echo '+RealName="YourName"' > myjob.submit
% cat myjob.submit.orig >> myjob.submit
% rm myjob.submit.orig
% condor_submit myjob.submit

If you catch it before the job finishes, you can see RealName in the job's ClassAd:

% condor_q -l | grep RealName
RealName = "YourName"

When the job finishes, examine the user log (results.log as the student user) and the StarterLog (/tmp/condor/var/log/StarterLog). Your job should have run on your machine, and no one else's jobs should have run on your machine.

Bad Requirements

It's not uncommon to need to determine why a particular machine won't accept a given job, or why a given job won't run on a particular machine.

student: Submit a Bad Job

% cat > badjob.submit
executable=myprog
universe=vanilla
arguments=Example.$(Cluster).$(Process) 10
output=results.output.$(Process)
error=results.error.$(Process)
log=results.log
notification=never
should_transfer_files=YES
when_to_transfer_output = ON_EXIT
requirements=Memory>2000
+RealName="Bad User Name"
queue
Ctrl-D 
% cat badjob.submit
executable=myprog
universe=vanilla
arguments=Example.$(Cluster).$(Process) 10
output=results.output.$(Process)
error=results.error.$(Process)
log=results.log
notification=never
should_transfer_files=YES
when_to_transfer_output = ON_EXIT
requirements=Memory>2000
+RealName="Bad User Name"
queue
% condor_submit badjob.submit
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 14.

You could wait a while for the job to run, but it won't. Typically a user might contact you after their job fails to run for "a while" (anywhere from five minutes to several days, depending on how impatient they are).

% condor_q -analyze 14
 
 
-- Submitter: lab-07.nesc.ed.ac.uk : <129.215.30.76:1534> : lab-07.nesc.ed.ac.uk
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
---
014.000:  Run analysis summary.  Of 1 machines,
      1 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match, but are serving users with a better priority in the pool
      0 match, but prefer another specific job despite its worse user-priority
      0 match, but will not currently preempt their existing job
      0 are available to run your job
        No successful match recorded.
        Last failed match: Wed Oct 22 14:56:43 2003
        Reason for last match failure: no match found
 
WARNING:  Be advised:
   No resources matched request's constraints
   Check the Requirements expression below:
 
Requirements = (Memory > 2000) && (Arch == "INTEL") &&
 (OpSys == "LINUX") && (Disk >= DiskUsage) && (HasFileTransfer)
 
 
1 jobs; 1 idle, 0 running, 0 held

root: Install condor_analyze

condor_analyze provides a much more detailed analysis. It will be included in Condor 6.6 and later, and is present in some 6.5 releases, but unfortunately it's missing from 6.5.5. Install a copy as root:

% cd /tmp/condor/bin
% wget http://www.cs.wisc.edu/~adesmet/condor_analyze.gz
--12:23:26--  http://www.cs.wisc.edu/%7Eadesmet/condor_analyze.gz
           => `condor_analyze.gz'
Resolving www.cs.wisc.edu... done.
Connecting to www.cs.wisc.edu[128.105.7.11]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 781,581 [text/plain]
 
100%[====================================>] 781,581      360.54K/s    ETA 00:00
 
15:12:42 (360.54 KB/s) - `condor_analyze.gz' saved [781581/781581]
 
% gunzip condor_analyze.gz
% chmod a+x condor_analyze

student

Now, as the student user we can run it:

% condor_analyze
 
 
-- Submitter: lab-07.nesc.ed.ac.uk : <129.215.30.76:1534> : lab-07.nesc.ed.ac.uk
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
---
014.000:  Run analysis summary.  Of 1 machines,
      1 are rejected by your job's requirements
        No successful match recorded.
        Last failed match: Wed Oct 22 15:16:51 2003
        Reason for last match failure: no match found
 
 
WARNING:  Be advised:
   No machines matched job's requirements
 
 
The Requirements expression for your job is:
 
( target.Memory > 2000 ) && ( target.Arch == "INTEL" ) &&
( target.OpSys == "LINUX" ) && ( target.Disk > DiskUsage ) &&
( target.HasFileTransfer )
 
    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   ( target.Memory > 2000 )          0                   MODIFY TO 121
2   ( target.Arch == "INTEL" )        1
3   ( target.OpSys == "LINUX" )       1
4   ( target.Disk > 1 )               1
5   ( target.HasFileTransfer )        1
 
1 jobs; 1 idle, 0 running, 0 held

Sure enough, we're asking for too much memory.

We can modify the job's requirements, so let's do so:

% condor_q -format '%s\n' Requirements 14
% condor_qedit 14 requirements '(Arch == "INTEL") && (OpSys == "LINUX") && (Disk >= DiskUsage) && (HasFileTransfer)'
Set attribute "Requirements".
% condor_reschedule

Unfortunately our job still doesn't work. Let's see why:

% condor_analyze 14
 
 
-- Submitter: lab-07.nesc.ed.ac.uk : <129.215.30.76:1534> : lab-07.nesc.ed.ac.uk
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
---
014.000:  Run analysis summary.  Of 1 machines,
      0 are rejected by your job's requirements
      1 reject your job because of their own requirements
      0 match, but are serving users with a better priority in the pool
      0 match, but prefer another specific job despite its worse user-priority
      0 match, but will not currently preempt their existing job
      0 are available to run your job
        No successful match recorded.
        Last failed match: Wed Oct 22 15:27:12 2003
        Reason for last match failure: no match found
 
WARNING:  Be advised:   Job 14.0 did not match any machine's requirements
 
 
The following attributes should be added or modified:
 
Attribute               Suggestion
---------               ----------
RealName                change to "Alan De Smet"
 
1 jobs; 1 idle, 0 running, 0 held