5.0 Submitting a Java universe job

5.1 What is the Java universe?

Strictly speaking, the Java universe is not necessary, because you can run Java jobs in the vanilla universe. However, we discovered that life can get complicated. For instance, how can Condor tell whether a job exited normally but with an error code, or whether the Java virtual machine (JVM) exited abnormally because of a problem with the computer? Condor should act differently in these two cases: if the JVM is bad, the job can be tried on another computer, but if the job itself failed, that is the fault of its author and retrying elsewhere will not help. The Java universe addresses this problem, among others. There is a wonderful paper about grid computing and the Java universe that I can't recommend highly enough:
Douglas Thain and Miron Livny, "Error Scope on a Computational Grid:
Theory and Practice", Proceedings of the Eleventh IEEE Symposium on
High Performance Distributed Computing (HPDC11), Edinburgh, Scotland,
July 2002.
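To make the distinction in section 5.1 concrete, here is a minimal sketch of sorting a failure into "the job's fault" or "the machine's fault". It is not Condor's actual implementation. The class name ErrorScopeSketch, the Scope enum, and the rule that a VirtualMachineError or LinkageError points at the environment are all assumptions made for illustration. In practice a missing class can also be the job author's mistake, and a real job may report failure through its exit code rather than an exception.

```java
// Hypothetical sketch of error scoping; NOT Condor's actual Java wrapper.
class ErrorScopeSketch {
    // Who is responsible for the outcome (names are illustrative assumptions).
    enum Scope { JOB_SUCCEEDED, JOB_FAILED, ENVIRONMENT_FAILED }

    // Classify a failure. null means the job finished without throwing.
    static Scope classify(Throwable failure) {
        if (failure == null) {
            return Scope.JOB_SUCCEEDED;
        }
        // Assumption: JVM-level errors (out of memory, broken or missing
        // classes) suggest a bad machine, so another computer could be tried.
        if (failure instanceof VirtualMachineError || failure instanceof LinkageError) {
            return Scope.ENVIRONMENT_FAILED;
        }
        // Everything else is treated as the program's own failure.
        return Scope.JOB_FAILED;
    }

    // Run a job and report whose fault any failure was.
    static Scope runAndClassify(Runnable job) {
        try {
            job.run();
            return classify(null);
        } catch (Throwable t) {
            return classify(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(runAndClassify(() -> { }));
        System.out.println(runAndClassify(() -> {
            throw new IllegalStateException("bad input");
        }));
        System.out.println(runAndClassify(() -> {
            throw new NoClassDefFoundError("missing.Library");
        }));
    }
}
```

Run as written, the sketch prints JOB_SUCCEEDED, JOB_FAILED, and ENVIRONMENT_FAILED, in that order. Only the last outcome would justify retrying the job on a different computer.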
5.2 Creating a Java program

First you need a Java program. Here's one to start you off. Save it in a file named simple.java.

    public class simple {
        public static void main(String[] args) {
            if (args.length != 2) {
                System.out.println("Usage: java simple <sleep-time> <integer>");
                // Stop here with a non-zero status instead of reading missing arguments.
                System.exit(1);
            }

            // Parse both command-line arguments as integers.
            int sleep_time = Integer.parseInt(args[0]);
            int input = Integer.parseInt(args[1]);

            try {
                System.out.println("Thinking really hard for " + sleep_time + " seconds...");
                Thread.sleep(sleep_time * 1000);
                System.out.println("We calculated: " + input * 2);
            } catch (InterruptedException exception) {
                // Interrupted while sleeping: exit without printing a result.
            }
        }
    }

Then compile and try out the program:

    % javac simple.java
    % ls -lh simple.class
    -rw-r--r-- 1 aroy users 1001 Feb 4 23:00 simple.class
    % java simple 4 10
    Thinking really hard for 4 seconds...
    We calculated: 20

5.3 Submitting a Java job

Create a submit file. Note that the first argument is "simple", which is the name of the class you want to invoke. Skipping this argument is a common mistake. Name this file submit.java.

    Universe = java
    Executable = simple.class
    Arguments = simple 4 10
    Log = simple.log
    Output = simple.out
    Error = simple.error
    java_vm_args = -Xmx500m
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    Queue

Now submit your job:

    % rm -f simple.log
    % condor_submit submit.java
    Submitting job(s).
    Logging submit event(s).
    1 job(s) submitted to cluster 45029.
    % condor_q

    -- Submitter: osg-edu.cs.wisc.edu : <192.168.0.1:46374> : osg-edu.cs.wisc.edu
     ID       OWNER   SUBMITTED     RUN_TIME   ST PRI SIZE CMD
    45029.0   roy     4/28 15:46    0+00:00:02 R  0   0.0  java simple 4 10

    1 jobs; 0 idle, 1 running, 0 held

    % condor_q

    -- Submitter: osg-edu.cs.wisc.edu : <192.168.0.1:46374> : osg-edu.cs.wisc.edu
     ID       OWNER   SUBMITTED     RUN_TIME   ST PRI SIZE CMD

    0 jobs; 0 idle, 0 running, 0 held

    % cat simple.log
    000 (45029.000.000) 04/28 15:46:39 Job submitted from host: <192.168.0.1:46374>
    ...
    001 (45029.000.000) 04/28 15:46:45 Job executing on host: <192.168.0.4:33478>
    ...
    005 (45029.000.000) 04/28 15:46:49 Job terminated.
        (1) Normal termination (return value 0)
            Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
            Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
            Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
            Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        56  -  Run Bytes Sent By Job
        1001  -  Run Bytes Received By Job
        56  -  Total Bytes Sent By Job
        1001  -  Total Bytes Received By Job
    ...

Congratulations, you've submitted a Java job to Condor!

5.4 Java on your Condor pool

Condor keeps track of which computers have a functional Java virtual machine and which version it is. You can find this out by using condor_status:

    % condor_status -java

    Name          JavaVendor  Ver    State     Activity LoadAv Mem  ActvtyTime

    vm2@osgs-c03. Sun Microsy 1.5.0_ Unclaimed Idle     0.000  1013 0+01:56:45
    vm1@osgs-c05. Sun Microsy 1.4.2_ Unclaimed Idle     0.000   253 0+00:54:41
    vm2@osgs-c05. Sun Microsy 1.4.2_ Unclaimed Idle     0.000   253 0+00:54:37
    vm3@osgs-c05. Sun Microsy 1.4.2_ Unclaimed Idle     0.000   253 0+00:58:23
    vm4@osgs-c05. Sun Microsy 1.4.2_ Unclaimed Idle     0.000   253 0+00:58:24
    vm5@osgs-c05. Sun Microsy 1.4.2_ Unclaimed Idle     0.000   253 0+00:58:25
    vm6@osgs-c05. Sun Microsy 1.4.2_ Unclaimed Idle     0.000   253 0+00:58:26
    vm7@osgs-c05. Sun Microsy 1.4.2_ Unclaimed Idle     0.000   253 0+00:58:27
    vm2@osgs-c06. Sun Microsy 1.4.2_ Unclaimed Idle     0.000  1013 0+02:12:26
    vm2@osgs-c09. Sun Microsy 1.4.2_ Unclaimed Idle     0.000  1013 0+01:02:24

                 Total Owner Claimed Unclaimed Matched Preempting Backfill

     INTEL/LINUX    10     0       0        10       0          0        0

           Total    10     0       0        10       0          0        0

    % condor_status -l osgs-c03 | grep -i java
    JavaVendor = "Sun Microsystems Inc."
    JavaVersion = "1.5.0_13"
    JavaMFlops = 437.715149
    HasJava = TRUE
    StarterAbilityList = "HasFileTransfer,HasPerFileEncryption,HasReconnect,HasMPI,HasTDP,HasJobDeferral,HasJICLocalConfig,HasJICLocalStdin,HasJava,HasPVM,HasRemoteSyscalls,HasCheckpointing"
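If you want to inspect these machine attributes from a program rather than by eye, the "Name = value" lines printed by condor_status -l are easy to pick apart. The following is a rough sketch only: the class JavaAdSketch and its methods are hypothetical helpers, not part of Condor, and the parser assumes one attribute per line with the name before the first "=" sign, as in the output shown in section 5.4.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for reading "Name = value" attribute lines; not a Condor API.
class JavaAdSketch {
    // Split each line at the first '=' and strip surrounding double quotes.
    static Map<String, String> parseAttributes(String text) {
        Map<String, String> attrs = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // not an attribute line
            }
            String name = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            attrs.put(name, value);
        }
        return attrs;
    }

    // True when the machine advertises a working JVM.
    static boolean hasJava(Map<String, String> attrs) {
        return "TRUE".equalsIgnoreCase(attrs.get("HasJava"));
    }

    public static void main(String[] args) {
        // Sample lines copied from the condor_status -l output in section 5.4.
        String sample = String.join("\n",
                "JavaVendor = \"Sun Microsystems Inc.\"",
                "JavaVersion = \"1.5.0_13\"",
                "JavaMFlops = 437.715149",
                "HasJava = TRUE");
        Map<String, String> attrs = parseAttributes(sample);
        System.out.println("Has Java: " + hasJava(attrs));
        System.out.println("Version:  " + attrs.get("JavaVersion"));
    }
}
```

With the sample lines above, the sketch reports that the machine has Java and that its version is 1.5.0_13.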
Extra credit
Make another scientific program that takes its input from a file. Now submit 3 copies of this program, where each input file is in a separate directory. Use the initialdir option described in the lecture or in the manual.

Condor does not yet have checkpointing for Java. In the past we implemented it and found it to be unusably slow. Your tutorial leader can fill you in on the details. However, Java has something similar to the remote I/O provided by the standard universe. You can read about the Java universe and Chirp I/O to learn more.
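As a possible starting point for the extra credit exercise, here is a small sketch of a program that takes its input from a file instead of the command line. The class name FileInputSketch, the default file name input.txt, and the choice of "science" (doubling each integer and summing the results) are all made up for illustration. Feel free to substitute something more interesting.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical starting point for the exercise; names are illustrative only.
class FileInputSketch {
    // Double every integer found (one per line) and add the results together.
    static long sumOfDoubled(List<String> lines) {
        long total = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            total += 2L * Long.parseLong(trimmed);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // The file name is relative, so it is looked up in the directory
        // the job runs in. "input.txt" is an assumed default.
        String fileName = args.length > 0 ? args[0] : "input.txt";
        List<String> lines = Files.readAllLines(Paths.get(fileName));
        System.out.println("We calculated: " + sumOfDoubled(lines));
    }
}
```

Because the program opens a relative file name, the same class can serve all three copies of the job as long as each job sees its own copy of the input file. Getting the right file to each job, using initialdir and the file transfer settings in the submit file, is the part left for you to work out.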