You might want to refer to the online Condor manual. You may also enjoy browsing the Condor web page.
You will need a laptop with a web browser and an ssh client. The web browser is to read these directions, and the ssh client is to log into a computer that has Condor set up and ready to go. The computer's name is tg-condor.purdue.teragrid.org, and we have set up guest accounts for you. We'll hand out the usernames and passwords.
Using the VM universe, Condor allows jobs to be virtual machines instead of simply executables. Virtual machines give users much more flexibility in the kinds of jobs they can submit: an application written for one platform can run on top of an arbitrary host platform, without the need to port the original application to the new platform. The VM universe supports several virtual machine technologies; today we will be looking at VMware Server, but similar jobs can be run using Xen, etc.
For your convenience, we have created a VM for this exercise. It is a small
Linux VM. You can find it under $TG_COMMUNITY/osg-vm/condorvm/
on the tutorial machine.
You can also download the
configuration file and
disk image from this webpage.
You also need a Condor submit file. We've provided one under
$TG_COMMUNITY/osg-vm/condorvm.desc.
Let's take a look at it...
universe = vm
executable = any_name_you_like
log = condorvm.log
vm_type = vmware
vm_memory = 64
vmware_dir = $ENV(TG_COMMUNITY)/osg-vm/condorvm
vmware_should_transfer_files = yes
queue
Note the lack of a real executable in this universe (as we mentioned
above, the VM image itself is the executable in this universe). So why do we
have an executable name? The executable name is provided to identify the job
when you run condor_q. Accordingly, you can change it to something more
representative, like linux_vm_test or something similar.
Now submit your job:
% mkdir ~/condor-test
% cd ~/condor-test
% condor_submit $TG_COMMUNITY/osg-vm/condorvm.desc
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 26.
% condor_q

-- Submitter: leovinus : <128.105.48.96:50589> : leovinus
 ID      OWNER/NODENAME      SUBMITTED     RUN_TIME ST PRI SIZE CMD
   6.0   aroy               11/20 15:31   0+00:02:46 R  0   0.0  any_name_you_like

1 jobs; 0 idle, 1 running, 0 held

-- Submitter: leovinus : <128.105.48.96:50589> : leovinus
 ID      OWNER/NODENAME      SUBMITTED     RUN_TIME ST PRI SIZE CMD
   6.0   aroy               11/20 15:31   0+00:02:56 R  0   0.0  any_name_you_like

1 jobs; 0 idle, 1 running, 0 held

-- Submitter: leovinus : <128.105.48.96:50589> : leovinus
 ID      OWNER/NODENAME      SUBMITTED     RUN_TIME ST PRI SIZE CMD
   6.0   aroy               11/20 15:31   0+00:03:06 R  0   0.0  any_name_you_like

1 jobs; 0 idle, 1 running, 0 held

-- Submitter: leovinus : <128.105.48.96:50589> : leovinus
 ID      OWNER/NODENAME      SUBMITTED     RUN_TIME ST PRI SIZE CMD
   6.0   aroy               11/20 15:31   0+00:03:16 R  0   0.0  any_name_you_like

1 jobs; 0 idle, 1 running, 0 held

...

-- Submitter: leovinus : <128.105.48.96:50589> : leovinus
 ID      OWNER/NODENAME      SUBMITTED     RUN_TIME ST PRI SIZE CMD

0 jobs; 0 idle, 0 running, 0 held
The first time the image starts up, it will run its fake "job", which will
run for 10-15 minutes: just enough time to ask your instructors some difficult
questions. The second time the job is run, it will do nothing. This is so we
can open up the image using VMware Server and view /root/job.out
for the results.
This is how the Linux image works: the job is run from
/etc/rc.d/rc.start/60.job, which invokes /root/job in
the background to do the actual work. The job itself runs if and only if
/root/job.out doesn't exist, so you can extract the
output during the next boot. By removing /root/job.out you can
force the job to run again.
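The run-once logic described above can be sketched as a small shell script. This is a hypothetical reconstruction from the description, not the actual /root/job from the image; for the demo it writes to a temporary file instead of /root/job.out:

```shell
#!/bin/sh
# Hypothetical sketch of the guard logic in /root/job. The real image
# uses OUT=/root/job.out; a temp file is used here so the demo can run
# anywhere.
OUT="$(mktemp)"

run_job() {
    if [ -s "$OUT" ]; then
        # Output from a previous run is still present: skip the work so
        # the results survive until they can be copied out.
        echo "output present, skipping run"
        return 0
    fi
    # Otherwise do the real work and record the results.
    echo "job ran at $(date)" > "$OUT"
    echo "job ran"
}

run_job    # first boot: does the work
run_job    # second boot: skips, so the output survives
rm -f "$OUT"
run_job    # after removing the output file, the job runs again
```

The same pattern explains why removing /root/job.out inside the VM forces the job to execute on the next boot.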
When the job completes (and disappears from condor_q),
Condor will transfer the modified VM files back to your submit machine.
% ls
condorvm-000001.vmdk  vmware-0.log  vmxHpmJe_condor.vmsd
condorvm.log          vmware.log    vmxHpmJe_condor.vmx
nvram                 vmxHpmJe_condor-Snapshot1.vmsn
%
condorvm.log is a file written by Condor that
contains the execution history of your job.
When the job completes, it'll look something like this:
% cat condorvm.log
000 (013.000.000) 06/20 01:41:16 Job submitted from host: <193.10.156.74:40295>
...
001 (013.000.000) 06/20 01:41:20 Job executing on host: <193.10.156.74:40304>
...
006 (013.000.000) 06/20 01:42:12 Image size of job updated: 66632
...
005 (013.000.000) 06/20 01:41:35 Job terminated.
        (1) Normal termination (return value 0)
                Usr 0 00:00:00, Sys 0 00:00:11  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:11  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        1237936  -  Run Bytes Sent By Job
        10945025  -  Run Bytes Received By Job
        1237936  -  Total Bytes Sent By Job
        10945025  -  Total Bytes Received By Job
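Because the user log is plain text with a numeric event code at the start of each event (000 = submitted, 001 = executing, 006 = image size updated, 005 = terminated, as seen above), you can check a job's fate with ordinary tools. A small sketch, using lines copied from the condorvm.log sample:

```shell
#!/bin/sh
# Write a few sample lines (copied from the condorvm.log output above)
# and grep for the termination event.
cat > condorvm.log <<'EOF'
000 (013.000.000) 06/20 01:41:16 Job submitted from host: <193.10.156.74:40295>
...
001 (013.000.000) 06/20 01:41:20 Job executing on host: <193.10.156.74:40304>
...
006 (013.000.000) 06/20 01:42:12 Image size of job updated: 66632
...
005 (013.000.000) 06/20 01:41:35 Job terminated.
    (1) Normal termination (return value 0)
EOF
# Event code 005 marks job termination; its presence means the job is done.
grep '^005' condorvm.log
# -> 005 (013.000.000) 06/20 01:41:35 Job terminated.
```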
For VMware VMs, Condor creates a snapshot of the original VM image and
returns the snapshot disk image (condorvm-000001.vmdk
).
This snapshot image contains the changes made to the original disk
image and is much smaller than the original image. The file contains
a reference to the original image file.
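If you are curious how the snapshot points back at its base, descriptor-style VMware .vmdk files are plain text and record their parent image in a parentFileNameHint field. The stub descriptor below is hand-made for illustration: the field names follow the VMDK descriptor format, but the parent name condorvm.vmdk and the exact contents are assumptions, and a real snapshot may embed its descriptor inside a binary sparse file (in which case you would need strings first):

```shell
#!/bin/sh
# Hand-made stub of a VMware snapshot disk descriptor (for illustration
# only; a Condor-produced snapshot's actual contents will differ).
cat > condorvm-000001.vmdk <<'EOF'
# Disk DescriptorFile
version=1
CID=fffffffe
parentCID=fffffffe
createType="monolithicSparse"
parentFileNameHint="condorvm.vmdk"
EOF
# The parent hint records which base image the snapshot's changes apply to.
grep parentFileNameHint condorvm-000001.vmdk
# -> parentFileNameHint="condorvm.vmdk"
```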
Congratulations, you've submitted a VM job to Condor!
Condor keeps track of which computers have a functional VMware Server installed, and which version it is. You can find this out by using condor_status:
% condor_status -vm

Name               VMType VMVer     State     Activity LoadAv VMMem ActvtyTime VMNetworking

slot1@tg-data-01.r vmware server1.0 Claimed   Busy      0.000   960 0+00:03:36 nat
slot2@tg-data-01.r vmware server1.0 Unclaimed Idle      0.000   960 1+00:11:40 nat
...
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX    32     2       1        29       0          0        0

               Total    32     2       1        29       0          0        0
% condor_status -l slot1@tg-data-01.rcac.purdue.edu | grep VM
...
HasVM = TRUE
VM_AvailNum = 10000
VM_GAHP_VERSION = "0.0.1"
VM_Type = "vmware"
VM_Version = "server1.0"
VM_Memory = 960
VM_Networking = TRUE
VM_Networking_Types = "nat"