This is a good point for your tutorial leader to jump in and describe the Java universe to you. Strictly speaking, the Java universe is not necessary, because you can run Java jobs in the vanilla universe. However, we discovered that life can get complicated. For instance, how can Condor tell whether a job exited normally but with an error code, or whether the Java virtual machine exited abnormally because of a problem with the computer? Condor should act differently in these two cases: if the JVM is bad, the job can be tried on another computer, but if the job itself failed, that's the fault of the job's author.
The Java universe addresses this problem, among others, as your tutorial leader will describe to you. There is a really wonderful paper about grid computing and the Java universe, and I can't recommend it highly enough. You should read it.
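To make the first of those two cases concrete, here is a minimal sketch of a job that exits normally but with an error code. The class name ExitDemo, the method run, and the choice of 1 as the failure code are all hypothetical, picked only for illustration. The second case, a JVM that crashes or cannot start, cannot be shown from inside a Java program, which is part of why it is hard to tell the two apart from the outside.

```java
// Hypothetical illustration; none of these names come from Condor.
final class ExitDemo {
    // Exit code this sketch uses to report a failure of the job itself (an assumption).
    static final int JOB_FAILED = 1;

    // Returns the exit code for the given arguments; bad input is the job's own fault.
    static int run(String[] args) {
        if (args.length != 1) {
            return JOB_FAILED;
        }
        try {
            int n = Integer.parseInt(args[0]);
            System.out.println("Result: " + (n * 2));
            return 0;
        } catch (NumberFormatException e) {
            return JOB_FAILED;
        }
    }

    public static void main(String[] args) {
        // The JVM starts and stops cleanly here; only the exit code signals the failure.
        System.exit(run(args));
    }
}
```

Running this with a non-numeric argument ends with a non-zero exit code even though the JVM itself worked. Retrying the job on another computer would not help in that case.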
First you need a Java program. Here's one to start you off. Save it in a file named simple.java.
    public class simple {
        public static void main(String[] args) {
            // Expect exactly two arguments: a sleep time in seconds and an integer.
            if (args.length != 2) {
                System.out.println("Usage: java simple <sleep-time> <integer>");
                System.exit(1);
            }

            int sleep_time = Integer.parseInt(args[0]);
            int input = Integer.parseInt(args[1]);

            try {
                System.out.println("Thinking really hard for " + sleep_time + " seconds...");
                Thread.sleep(sleep_time * 1000);
                System.out.println("We calculated: " + input * 2);
            } catch (InterruptedException exception) {
                // Interrupted while sleeping; there is nothing to clean up.
            }
        }
    }
Then compile and try out the program:
    nova 1% javac simple.java
    nova 2% ls -lh simple.class
    -rw-r--r--  1 alainroy math 1008 Dec 20 04:01 simple.class
    nova 3% java simple 4 10
    Thinking really hard for 4 seconds...
    We calculated: 20
Create a submit file and save it as submit.java, the name passed to condor_submit in the next step. Note that the first argument in Arguments is "simple", which is the name of the class you want to invoke. Leaving this argument out is a common mistake.
    Universe   = java
    Executable = simple.class
    Arguments  = simple 4 10
    Log        = simple.log
    Output     = simple.out
    Error      = simple.error
    Queue
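If you end up writing many submit files like the one above, a small helper can keep the class name from being dropped from Arguments. The following is only a sketch: the class SubmitFileSketch and its build method are hypothetical, and deriving the log, output, and error file names from the class name is an assumption made here to mirror the example, not something Condor requires.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of Condor.
final class SubmitFileSketch {
    // Builds the text of a Java universe submit file shaped like the example above.
    static String build(String mainClass, List<String> programArgs) {
        // The class name always goes first, followed by the program's own arguments.
        String arguments = mainClass;
        if (!programArgs.isEmpty()) {
            arguments += " " + String.join(" ", programArgs);
        }
        return String.join("\n",
            "Universe   = java",
            "Executable = " + mainClass + ".class",
            "Arguments  = " + arguments,
            "Log        = " + mainClass + ".log",
            "Output     = " + mainClass + ".out",
            "Error      = " + mainClass + ".error",
            "Queue") + "\n";
    }

    public static void main(String[] args) {
        // Prints a submit file equivalent to the one in this tutorial.
        System.out.print(build("simple", Arrays.asList("4", "10")));
    }
}
```

Because the class name is a required parameter of build, it cannot be forgotten the way it can when the Arguments line is typed by hand.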
Now submit your job:
    nova 4% rm simple.log
    nova 5% condor_submit submit.java
    Submitting job(s).
    Logging submit event(s).
    1 job(s) submitted to cluster 6088.

    nova 6% condor_q -sub alainroy

    -- Submitter: alainroy@cs.tau.ac.il : <132.67.192.133:49346> : nova.cs.tau.ac.il
     ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
    6088.0   alainroy       12/20 04:05   0+00:00:01 R  0   0.0  simple.class simpl

    nova 7% condor_q -sub alainroy

    -- Submitter: alainroy@cs.tau.ac.il : <132.67.192.133:49346> : nova.cs.tau.ac.il
     ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD

    0 jobs; 0 idle, 0 running, 0 held

    nova 8% cat simple.log
    000 (6088.000.000) 12/20 04:05:38 Job submitted from host: <132.67.192.133:49346>
    ...
    001 (6088.000.000) 12/20 04:05:46 Job executing on host: <132.67.105.212:33960>
    ...
    005 (6088.000.000) 12/20 04:05:51 Job terminated.
        (1) Normal termination (return value 0)
            Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
            Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
            Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
            Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
        0  -  Total Bytes Sent By Job
        0  -  Total Bytes Received By Job
    ...
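The log above records the job's return value on the "Normal termination" line. If you want to check that value from a program instead of by eye, something like the following sketch could work. The class LogReturnValue is hypothetical, and the pattern it searches for is an assumption based only on the log excerpt shown here; other Condor versions may word the line differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Condor.
final class LogReturnValue {
    // Matches lines such as "(1) Normal termination (return value 0)".
    private static final Pattern NORMAL_TERMINATION =
        Pattern.compile("Normal termination \\(return value (-?\\d+)\\)");

    // Returns the return value from the last normal termination in the log text, if any.
    static OptionalInt lastReturnValue(String logText) {
        Matcher matcher = NORMAL_TERMINATION.matcher(logText);
        OptionalInt result = OptionalInt.empty();
        while (matcher.find()) {
            result = OptionalInt.of(Integer.parseInt(matcher.group(1)));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the log file named in this tutorial's submit file.
        String path = args.length > 0 ? args[0] : "simple.log";
        String text = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        OptionalInt value = lastReturnValue(text);
        if (value.isPresent()) {
            System.out.println("Job returned " + value.getAsInt());
        } else {
            System.out.println("No normal termination recorded in " + path);
        }
    }
}
```

The sketch keeps the last match because a log file that is not removed between runs, as it is with rm simple.log above, can hold events from more than one job.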
Congratulations, you've submitted a Java job to Condor!
Condor keeps track of which computers have a functional Java virtual machine and which version it is. You can find this out by using condor_status:
    nova 9% condor_status -java

    Name          JavaVendor  Ver    State      Activity  LoadAv  Mem  ActvtyTime

    abel-01.cs.ta Blackdown J 1.4.1  Claimed    Busy      1.230   494  0+00:16:46
    abel-03.cs.ta Blackdown J 1.4.1  Unclaimed  Idle      0.150   494  0+03:15:00
    abel-05.cs.ta Blackdown J 1.4.1  Unclaimed  Idle      0.060   494  0+01:10:04
    ...