Title: Condor Practical
Subtitle: Submitting a Java Universe Job
Tutors: Alain Roy and Todd Tannenbaum
Authors: Alain Roy and Ben Burnett

5.0 Submitting a Java universe job

5.1 What is the Java universe?

Strictly speaking, the Java universe is not necessary because you can use the vanilla universe for Java jobs. However, we discovered that life can get complicated sometimes. For instance, how can Condor tell if a job exited normally but with an error code, or if the Java virtual machine exited abnormally due to a problem with the computer? Condor should act differently in these two cases: if the JVM is bad, another computer can be tried, but if the job failed, that's the fault of the author.

The Java universe addresses this problem, among others. For a thorough treatment of error handling in grid computing and the Java universe, we highly recommend reading this paper:

Douglas Thain and Miron Livny, "Error Scope on a Computational Grid: Theory and Practice", Proceedings of the Eleventh IEEE Symposium on High Performance Distributed Computing (HPDC11), Edinburgh, Scotland, July 2002.
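To make the first of the two failure cases from section 5.1 concrete, here is a minimal sketch of a job that fails on its own terms: it checks its input and reports a problem through a non-zero exit code. The class name ErrorScopeDemo, the method checkInput, and the particular exit codes are hypothetical and chosen only for illustration; they are not part of Condor.

```java
// Hypothetical illustration; the names and exit codes are not part of Condor.
final class ErrorScopeDemo {
    /**
     * Returns the exit code the program would use:
     * 0 for good input, 1 for a wrong argument count, 2 for a non-integer.
     */
    static int checkInput(String[] args) {
        if (args.length != 1) {
            return 1; // usage error: the job was started incorrectly
        }
        try {
            Integer.parseInt(args[0]);
            return 0; // input is fine
        } catch (NumberFormatException e) {
            return 2; // bad input: trying another computer would not help
        }
    }

    public static void main(String[] args) {
        int code = checkInput(args);
        if (code != 0) {
            System.err.println("Bad input, exiting with code " + code);
        }
        // The JVM exits normally here, even when the code is non-zero.
        System.exit(code);
    }
}
```

In every branch the JVM itself works correctly, so a non-zero code here belongs to the job. The second case, where the JVM cannot start or run at all, is outside the program's control and cannot be expressed inside the program in the same way.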

5.2 Creating a Java program

First you need a Java program. Here's one to start you off. Save it in a file named simple.java.

public class simple
{
    public static void main(String[] args)
    {
        // Expect exactly two arguments: a sleep time in seconds and an integer.
        if (args.length != 2) {
            System.out.println("Usage: java simple <sleep-time> <integer>");
            System.exit(1);
        }

        // Parse both arguments; a non-integer argument throws NumberFormatException.
        int sleep_time = Integer.parseInt(args[0]);
        int input      = Integer.parseInt(args[1]);

        try {
            System.out.println("Thinking really hard for " + sleep_time + " seconds...");
            // Thread.sleep takes milliseconds; 1000L avoids int overflow.
            Thread.sleep(sleep_time * 1000L);
            System.out.println("We calculated: " + input * 2);
        } catch (InterruptedException exception) {
            // Restore the interrupt flag and report failure through the exit code.
            Thread.currentThread().interrupt();
            System.exit(1);
        }
    }
}

Then compile and try out the program:

C:\condor-test> javac simple.java

C:\condor-test> dir simple.class
 Volume in drive C has no label.
 Volume Serial Number is 14E3-4F7E

 Directory of C:\condor-test

11/15/2007  03:38 PM             1,082 simple.class
               1 File(s)          1,082 bytes
               0 Dir(s)  30,566,567,936 bytes free

C:\condor-test> java simple 4 10
Thinking really hard for 4 seconds...
We calculated: 20


5.3 Submitting a Java job

Create a submit file. Name this file simple.java.sub.

Universe                = java
Executable              = simple.class
Arguments               = simple 4 10
Log                     = simple.log.txt
Output                  = simple.out.txt
Error                   = simple.err.txt
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
Queue

Now submit your job:

C:\condor-test> del simple.log.txt

C:\condor-test> condor_submit simple.java.sub
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 8.

C:\condor-test> condor_q

-- Submitter: lab-21 : <129.215.30.181:2207> : lab-21
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   8.0   Administrator  11/27 10:54   0+00:00:00 I  0   0.0  java simple 4 10

1 jobs; 1 idle, 0 running, 0 held

C:\condor-test> condor_q

-- Submitter: lab-21 : <129.215.30.181:2207> : lab-21
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   8.0   Administrator  11/27 10:54   0+00:00:02 R  0   0.0  java simple 4 10

1 jobs; 0 idle, 1 running, 0 held

C:\condor-test> condor_q

-- Submitter: lab-21 : <129.215.30.181:2207> : lab-21
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD

0 jobs; 0 idle, 0 running, 0 held

C:\condor-test> more simple.log.txt

000 (008.000.000) 11/27 10:54:01 Job submitted from host: <129.215.30.181:2207>
...
001 (008.000.000) 11/27 10:54:07 Job executing on host: <129.215.30.173:2217>
...
005 (008.000.000) 11/27 10:54:12 Job terminated.
        (1) Normal termination (return value 0)
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        58  -  Run Bytes Sent By Job
        1082  -  Run Bytes Received By Job
        58  -  Total Bytes Sent By Job
        1082  -  Total Bytes Received By Job

Congratulations, you've submitted a Java job to Condor!


5.4 Java on your Condor pool

Condor keeps track of which computers have a functional Java virtual machine and which version each one runs. You can find this out by using condor_status:

C:\condor-test> condor_status -java
Name               JavaVendor Ver    State     Activity LoadAv Mem   ActvtyTime

lab-01             Sun Micros 1.6.0_ Unclaimed Idle     0.000   502  0+03:15:04
lab-02             Sun Micros 1.6.0_ Unclaimed Idle     0.000   502  0+03:15:04
lab-03             Sun Micros 1.6.0_ Unclaimed Idle     0.050   502  0+02:45:04
lab-04             Sun Micros 1.6.0_ Unclaimed Idle     0.000   502  0+00:19:05
lab-05             Sun Micros 1.6.0_ Unclaimed Idle     0.090   502  0+03:05:04
lab-06             Sun Micros 1.6.0_ Unclaimed Idle     0.040   502  0+00:15:04
lab-07             Sun Micros 1.6.0_ Unclaimed Idle     0.000   502  0+00:10:04
lab-08             Sun Micros 1.6.0_ Unclaimed Idle     0.040   502  0+03:05:04
lab-09             Sun Micros 1.6.0_ Unclaimed Idle     0.100   502  0+03:05:04
lab-10             Sun Micros 1.6.0_ Unclaimed Idle     0.000   502  0+03:05:04
lab-11             Sun Micros 1.6.0_ Unclaimed Idle     0.020   502  0+03:00:04
lab-13             Sun Micros 1.6.0_ Unclaimed Idle     0.020   502  0+00:01:32
lab-14             Sun Micros 1.6.0_ Unclaimed Idle     0.050   502  0+03:00:04
lab-15             Sun Micros 1.6.0_ Unclaimed Idle     0.000   502  0+02:55:04
lab-16             Sun Micros 1.6.0_ Unclaimed Idle     0.070   502  0+03:40:04
lab-17             Sun Micros 1.6.0_ Unclaimed Idle     0.000   502  0+02:55:04
lab-19             Sun Micros 1.6.0_ Unclaimed Idle     0.000   502  0+02:55:04
lab-20             Sun Micros 1.6.0_ Unclaimed Idle     0.000   502  0+02:50:04
lab-21             Sun Micros 1.6.0_ Unclaimed Idle     0.000   502  0+02:50:04
lab-22             Sun Micros 1.6.0_ Unclaimed Idle     0.090   502  0+00:15:19

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

       INTEL/WINNT51    20     0       0        20       0          0        0

               Total    20     0       0        20       0          0        0

C:\condor-test> condor_status -l lab-01 | find "Java"
JavaVendor = "Sun Microsystems Inc."
JavaVersion = "1.6.0_03"
JavaMFlops = 422.381805
HasJava = TRUE
StarterAbilityList = "HasFileTransfer,HasPerFileEncryption,HasReconnect,
HasMPI,HasTDP,HasJobDeferral,HasJICLocalConfig,HasJICLocalStdin,HasJava,HasVM,
HasWindowsRunAsOwner"
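The JavaVendor and JavaVersion values above look like the standard java.vendor and java.version system properties of the JVM. Assuming that is where they come from (this tutorial does not say how Condor collects them), the following sketch prints the same two values for whichever JVM runs it. The class name JavaInfo and the method describe are hypothetical.

```java
// Hypothetical helper; assumes the attributes mirror standard JVM system properties.
final class JavaInfo {
    /** Formats the local JVM's vendor and version in the style of the output above. */
    static String describe() {
        return "JavaVendor = \"" + System.getProperty("java.vendor") + "\"\n"
             + "JavaVersion = \"" + System.getProperty("java.version") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this on your own machine with java JavaInfo gives you something to compare against the condor_status output for that machine.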

Extra credit

Write another scientific program that takes its input from a file. Then submit three copies of this program, with each input file in a separate directory. Use the initialdir option described in the lecture or in the manual.
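If you want a starting point for the extra-credit program, here is one possible sketch, not the only way to do it. It reads one integer per line from a file and prints the sum. The class name SumInput and the default file name input.txt are hypothetical; pick whatever names suit your own directories.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical example program; the names are placeholders.
final class SumInput {
    /** Adds up the integers found one per line, skipping blank lines. */
    static long sum(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        long total = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.length() > 0) {
                total += Long.parseLong(line);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // The file name is resolved relative to the job's working directory.
        String fileName = (args.length > 0) ? args[0] : "input.txt";
        Reader file = new FileReader(fileName);
        try {
            System.out.println("Sum: " + sum(file));
        } finally {
            file.close();
        }
    }
}
```

Because the file name is relative, the same program and the same arguments can be reused for every copy of the job, as long as each directory holds its own input file.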

We don't yet have checkpointing for Java. We implemented it in the past and found it to be unusably slow. Your tutorial leader can fill you in on the details. However, the Java universe has something similar to the remote I/O provided by the standard universe. You can read about the Java universe and Chirp I/O to learn more.

Next: Coordinating a set of jobs: A simple DAG
