Before you can submit a job to Condor, you need a job. We will quickly write a small program in C. If you aren't an expert C programmer, fear not: we will hold your hand throughout this process.
First, create a file called simple.c using your favorite editor and put the following text in it (copying and pasting is a good choice):
#include <stdio.h>
#include <stdlib.h>   /* atoi() */
#include <unistd.h>   /* sleep() */

int main(int argc, char **argv)
{
    int sleep_time;
    int input;
    int failure;

    if (argc != 3) {
        printf("Usage: simple <sleep-time> <integer>\n");
        failure = 1;
    } else {
        /* Read the two command-line arguments as integers. */
        sleep_time = atoi(argv[1]);
        input = atoi(argv[2]);
        printf("Thinking really hard for %d seconds...\n", sleep_time);
        sleep(sleep_time);
        printf("We calculated: %d\n", input * 2);
        failure = 0;
    }
    return failure;
}
Now compile that program:
nova 1% gcc -o simple simple.c
nova 2% ls -lh simple
-rwxr-xr-x    1 alainroy math          14k Dec 18 00:57 simple
Finally, run the program and tell it to sleep for four seconds and calculate 10 * 2:
nova 3% ./simple 4 10
Thinking really hard for 4 seconds...
We calculated: 20
Great! You have a job you can tell Condor to run! Although it clearly isn't an interesting job, it models some of the aspects of a real scientific program. It takes a while to run and it does a calculation.
Now that you have a job, you just have to tell Condor to run it. Put the following text into a file called submit:
Universe = vanilla
Executable = simple
Arguments = 4 10
Log = simple.log
Output = simple.out
Error = simple.error
Queue
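Let's examine each of these lines:
Universe: The vanilla universe is the simplest way to run a job. Condor runs your program as-is, without requiring any changes to it.
Executable: The program to run, here the simple program we just compiled.
Arguments: The command-line arguments passed to the program. Here we ask it to sleep for 4 seconds and then double 10.
Log: The file in which Condor records what happens to your job: when it was submitted, when it started executing, and when it terminated.
Output: The file that receives the job's standard output.
Error: The file that receives the job's standard error.
Queue: Tells Condor to submit a job using the settings above.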
Next, tell Condor to run your job:
nova 4% condor_submit submit
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 6075.
Now, watch your job run:
nova 5% condor_q
-- Submitter: nova.cs.tau.ac.il : <132.67.192.133:43609> : nova.cs.tau.ac.il
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
4589.0   doronn          3/30  18:07 19+09:26:01 I  0   0.0  go1
...
6073.3   zomerosn       12/17 17:46   0+07:30:15 R  0   1.3  q1 Arabidopsis_tha
6075.0   alainroy       12/18 01:16   0+00:00:00 I  0   0.0  simple 4 10
nova 6% condor_q -sub alainroy
-- Submitter: alainroy@cs.tau.ac.il : <132.67.192.133:43609> : nova.cs.tau.ac.il
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
6075.0   alainroy       12/18 01:16   0+00:00:04 R  0   0.0  simple 4 10
nova 7% condor_q -sub alainroy
-- Submitter: alainroy@cs.tau.ac.il : <132.67.192.133:43609> : nova.cs.tau.ac.il
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
6075.0   alainroy       12/18 01:16   0+00:00:29 R  0   0.0  simple 4 10
nova 8% condor_q -sub alainroy
-- Submitter: alainroy@cs.tau.ac.il : <132.67.192.133:43609> : nova.cs.tau.ac.il
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
0 jobs; 0 idle, 0 running, 0 held
Notice a few things here. First, when I ran condor_q, I got a long list of everyone's jobs (I trimmed the output above), so I told condor_q to list only my jobs with the -sub option, which is short for -submitter. You will want to substitute your own user name for alainroy. Second, once my job was done, it was no longer listed. Because I told Condor to log information about my job, I can see what happened:
nova 9% cat simple.log
000 (6075.000.000) 12/18 01:16:45 Job submitted from host: <132.67.192.133:43609>
...
001 (6075.000.000) 12/18 01:17:10 Job executing on host: <132.67.105.236:35676>
...
005 (6075.000.000) 12/18 01:17:14 Job terminated.
    (1) Normal termination (return value 0)
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
    0  -  Run Bytes Sent By Job
    0  -  Run Bytes Received By Job
    0  -  Total Bytes Sent By Job
    0  -  Total Bytes Received By Job
...
That looks good: It took less than 30 seconds for the job to start up (Condor doesn't optimize for fast job startup, but for high throughput), and the job ran for about four seconds. But did our job execute correctly?
nova 10% cat simple.out
Thinking really hard for 4 seconds...
We calculated: 20
Excellent! We ran our sophisticated scientific job on a remote computer!
If you only ever had to run a single job, you probably wouldn't need Condor. But we would like to have our program calculate a whole set of values for different inputs. How can we do that? Let's change our submit file to look like this:
Universe = vanilla
Executable = simple
Arguments = 4 10
Log = simple.log
Output = simple.$(Process).out
Error = simple.$(Process).error
Queue
Arguments = 4 11
Queue
Arguments = 4 12
Queue
There are two important differences to notice here. First, the Output and Error lines contain the $(Process) macro, so each job's output and error files are named after its process number. You'll see what this looks like in a moment. Second, we told Condor to run the program two more times, each with different arguments, by adding extra Arguments and Queue statements. This is a simple parameter sweep over the values 10, 11, and 12. Let's see what happens:
nova 11% condor_submit submit
Submitting job(s)...
Logging submit event(s)...
3 job(s) submitted to cluster 6076.
nova 12% condor_q -sub alainroy
-- Submitter: alainroy@cs.tau.ac.il : <132.67.192.133:43609> : nova.cs.tau.ac.il
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
6076.0   alainroy       12/18 01:35   0+00:00:19 R  0   0.0  simple 4 10
6076.1   alainroy       12/18 01:35   0+00:00:14 R  0   0.0  simple 4 11
6076.2   alainroy       12/18 01:35   0+00:00:17 R  0   0.0  simple 4 12
3 jobs; 0 idle, 3 running, 0 held
nova 13% condor_q -sub alainroy
-- Submitter: alainroy@cs.tau.ac.il : <132.67.192.133:43609> : nova.cs.tau.ac.il
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
0 jobs; 0 idle, 0 running, 0 held
nova 14% ls simple*out
simple.0.out  simple.1.out  simple.2.out  simple.out
nova 15% cat simple.0.out
Thinking really hard for 4 seconds...
We calculated: 20
nova 16% cat simple.1.out
Thinking really hard for 4 seconds...
We calculated: 22
nova 17% cat simple.2.out
Thinking really hard for 4 seconds...
We calculated: 24
Notice that we had three jobs with the same cluster number (6076) but different process numbers (0, 1, and 2). They share a cluster number because they were all submitted by a single condor_submit command from one submit file, and each process number is exactly what $(Process) expanded to in that job's file names. When the jobs ran, they created three different output files, each with the desired output. (simple.out is left over from our first run.)
You are now ready to submit lots of jobs! Although this example was simple, Condor has many more options that control how your jobs run; most of them are described in the documentation for condor_submit.