2.0 Looking at Condor
2.1 Where is Condor?
You will find the Condor binaries in /usr/bin:
% which condor_q
/usr/bin/condor_q
You can see the version of Condor with
% condor_version
$CondorVersion: 6.8.8 Dec 19 2007 $
$CondorPlatform: I386-LINUX_RHEL3 $
You might be surprised that it reports RHEL3 instead of Scientific Linux 4 (the version of Linux installed on these computers). It is reporting the operating system that it was compiled on, not the operating system that is in use. Don't worry, the RHEL 3 binaries work just fine on Scientific Linux 4.
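If you want just the version number and the build platform, the two lines printed by condor_version can be picked apart with awk. This is only a small sketch: the function name condor_version_summary is made up for this example, and the sample input is copied from the output shown earlier.

```shell
#!/bin/sh
# Hypothetical helper (not part of Condor): read condor_version-style output
# on standard input and print the version number and the build platform.
condor_version_summary() {
    awk '
        /^\$CondorVersion:/  { version = $2 }
        /^\$CondorPlatform:/ { platform = $2 }
        END { printf "version=%s platform=%s\n", version, platform }
    '
}

# Sample input copied from the tutorial output.
condor_version_summary <<'EOF'
$CondorVersion: 6.8.8 Dec 19 2007 $
$CondorPlatform: I386-LINUX_RHEL3 $
EOF
# prints: version=6.8.8 platform=I386-LINUX_RHEL3
```

On a computer with Condor installed, you could instead run "condor_version | condor_version_summary". Remember that the platform is the one Condor was compiled on, not necessarily the one it is running on.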
Check if Condor is running:
% condor_master
% ps auwx | grep condor | grep -v condor_stater
  PID TTY      STAT   TIME COMMAND
 4554 ?        Ss   592:11 /usr/sbin/condor_master -pid /var/condor/condor.pid
 4556 ?        Ss    42:03 condor_collector -f
 4558 ?        Ss    21:39 condor_negotiator -f
 4559 ?        Ss    58:25 condor_schedd -f
Excellent! It's running!
The output you see from ps may be slightly different from ours, but as long as it lists all of those Condor programs, it's okay. Let's look at what we see:
condor_master: This program runs constantly and ensures that all other parts of Condor are running. If they hang or crash, it restarts them.
condor_collector: This program is part of the Condor central manager. It collects information about all computers in the pool as well as which users want to run jobs. It is what normally responds to the condor_status command.
condor_negotiator: This program is part of the Condor central manager. It decides what jobs should be run where.
condor_startd: If this program is running, it allows jobs to be started on this computer--that is, your computer is an "execute machine". It advertises your computer to the central manager (more on that later, but in this case the central manager is also your computer) so that the central manager knows about it, and it starts the jobs that run here. It isn't listed on osg-edu, but it is on the other computers in our Condor pool.
condor_schedd: If this program is running, it allows jobs to be submitted from this computer--that is, your computer is a "submit machine". It advertises jobs to the central manager so that it knows about them, and it contacts a condor_startd on an execute machine for each job that needs to be started.
condor_shadow: (Not shown above.) For each job that has been submitted from this computer, there is one condor_shadow running. It watches over the job as it runs remotely. In some cases it provides some assistance (see the standard universe later). You may or may not see any condor_shadow processes running, depending on what is happening on the computer when you try it out.
We have a graphic representation of these daemons, drawn by Sarah Miller, age 12.
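The check we did by eye on the ps output can also be scripted. The sketch below is only an illustration: check_condor_daemons is a name invented for this example, and the sample listing is copied from the ps output shown earlier. condor_shadow is left out on purpose, since it only appears while jobs submitted from this computer are running.

```shell
#!/bin/sh
# Hypothetical helper (not part of Condor): read a process listing on
# standard input and report whether each daemon described above appears in it.
check_condor_daemons() {
    listing=$(cat)
    for daemon in condor_master condor_collector condor_negotiator \
                  condor_startd condor_schedd; do
        if printf '%s\n' "$listing" | grep -q "$daemon"; then
            echo "$daemon: listed"
        else
            echo "$daemon: not listed"
        fi
    done
}

# Sample listing copied from the tutorial output.
check_condor_daemons <<'EOF'
 4554 ?        Ss   592:11 /usr/sbin/condor_master -pid /var/condor/condor.pid
 4556 ?        Ss    42:03 condor_collector -f
 4558 ?        Ss    21:39 condor_negotiator -f
 4559 ?        Ss    58:25 condor_schedd -f
EOF
# prints:
#   condor_master: listed
#   condor_collector: listed
#   condor_negotiator: listed
#   condor_startd: not listed
#   condor_schedd: listed
```

On a live system you could feed it real data with "ps -e -o args | check_condor_daemons". A missing condor_startd is not an error by itself: it just means the computer is not an execute machine.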
You can find out what jobs have been submitted on your computer with the condor_q command:
% condor_q
-- Submitter: osg-edu.cs.wisc.edu : <192.168.0.1:46374> : osg-edu.cs.wisc.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
0 jobs; 0 idle, 0 running, 0 held
Nothing is running right now. If something were running, you would see output like this:
% condor_q
-- Submitter: royal01.cs.wisc.edu : <220.127.116.11:32775> : royal01.cs.wisc.edu
 ID      OWNER      SUBMITTED      RUN_TIME   ST PRI SIZE CMD
4589.0   doronn     3/30 18:07  19+09:26:01   I  0   0.0  go1
5140.0   araddan    7/18 15:59   8+08:16:47   I  0   0.0  .condor_run.23359
5145.0   araddan    7/18 17:22   0+21:29:41   I  0   0.0  matlab-script.txt
6041.0   grishas   12/7  18:41   7+08:03:25   R  0   45.7 a.out
6042.0   grishas   12/7  18:42   8+07:47:14   R  0   45.7 a.out
6044.0   grishas   12/9  11:15   6+17:14:46   R  0   45.7 a.out
The output that you see will be different depending on what jobs are running. Notice what we can see from this:
What else can you find out with condor_q? Try any one of:
How do you use the -constraint or -format options to condor_q? When would you want them? When would you use the -l option? This might be an easier exercise to try once you submit some jobs.
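Once there are jobs in the queue, one simple thing to do with a condor_q listing is to count the jobs in each state using the ST column (I for idle, R for running, H for held). The following is a rough sketch, not an official tool: count_jobs_by_state is a name made up for this example, it assumes the column layout shown in the sample listing earlier, and the sample rows are copied from that listing.

```shell
#!/bin/sh
# Hypothetical helper (not part of Condor): read condor_q-style output on
# standard input and count jobs by the ST column (I, R, H).
# Assumes the column layout shown in this tutorial, where ST is field 6.
count_jobs_by_state() {
    awk '
        # Job lines start with an ID of the form cluster.process, e.g. 6041.0
        $1 ~ /^[0-9]+\.[0-9]+$/ { count[$6]++ }
        END {
            printf "idle=%d running=%d held=%d\n",
                count["I"], count["R"], count["H"]
        }
    '
}

# Sample rows copied from the tutorial output.
count_jobs_by_state <<'EOF'
 ID      OWNER      SUBMITTED      RUN_TIME   ST PRI SIZE CMD
4589.0   doronn     3/30 18:07  19+09:26:01   I  0   0.0  go1
5140.0   araddan    7/18 15:59   8+08:16:47   I  0   0.0  .condor_run.23359
5145.0   araddan    7/18 17:22   0+21:29:41   I  0   0.0  matlab-script.txt
6041.0   grishas   12/7  18:41   7+08:03:25   R  0   45.7 a.out
6042.0   grishas   12/7  18:42   8+07:47:14   R  0   45.7 a.out
6044.0   grishas   12/9  11:15   6+17:14:46   R  0   45.7 a.out
EOF
# prints: idle=3 running=3 held=0
```

On a submit machine you could run "condor_q | count_jobs_by_state" and compare the result with the summary line that condor_q prints at the end of its own output.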
You can find out what computers are in the Condor pool. (A pool is similar to a cluster, but it doesn't have the connotation that all computers are dedicated full-time to computation: some may be desktop computers owned by users.) To look, use condor_status:
% condor_status
Name          OpSys  Arch   State      Activity  LoadAv  Mem   ActvtyTime
vm1@osgs-c03. LINUX  INTEL  Unclaimed  Idle      0.000   1013    3+21:40:55
vm2@osgs-c03. LINUX  INTEL  Unclaimed  Idle      0.000   1013    0+02:35:29
osgs-c05.cs.w LINUX  INTEL  Unclaimed  Idle      0.010   2026    0+02:00:58
vm1@osgs-c06. LINUX  INTEL  Owner      Idle      0.010   1013    0+01:14:17
vm2@osgs-c06. LINUX  INTEL  Unclaimed  Idle      0.000   1013  107+03:55:50
vm1@osgs-c07. LINUX  INTEL  Unclaimed  Idle      0.000   1013  107+03:53:50
vm2@osgs-c07. LINUX  INTEL  Unclaimed  Idle      0.000   1013    0+02:48:48
vm1@osgs-c08. LINUX  INTEL  Unclaimed  Idle      0.000   1013    0+02:20:03
vm2@osgs-c08. LINUX  INTEL  Unclaimed  Idle      0.000   1013  152+19:19:58
vm1@osgs-c09. LINUX  INTEL  Unclaimed  Idle      0.000   1013    0+02:26:46
vm2@osgs-c09. LINUX  INTEL  Unclaimed  Idle      0.000   1013  106+08:38:57
             Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill
INTEL/LINUX     11      1        0         10        0           0         0
      Total     11      1        0         10        0           0         0
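The totals at the bottom of the condor_status output can be recomputed from the individual lines, which is a handy way to check that you are reading the State column correctly. The sketch below is only an illustration: count_slots_by_state is a name invented for this example, it assumes the column layout shown above, and the sample lines are copied from that output.

```shell
#!/bin/sh
# Hypothetical helper (not part of Condor): read condor_status-style output
# on standard input and count entries by the State column.
# Assumes the layout shown in this tutorial: State is field 4, LoadAv field 6.
count_slots_by_state() {
    awk '
        # Keep only machine lines: field 4 is a known state and field 6 is a
        # load average such as 0.010. This skips the headers and totals.
        $4 ~ /^(Owner|Claimed|Unclaimed|Matched|Preempting|Backfill)$/ &&
        $6 ~ /^[0-9]+\.[0-9]+$/ { total++; count[$4]++ }
        END {
            printf "total=%d owner=%d claimed=%d unclaimed=%d\n",
                total, count["Owner"], count["Claimed"], count["Unclaimed"]
        }
    '
}

# Sample lines copied from the tutorial output.
count_slots_by_state <<'EOF'
Name          OpSys  Arch   State      Activity  LoadAv  Mem   ActvtyTime
vm1@osgs-c03. LINUX  INTEL  Unclaimed  Idle      0.000   1013    3+21:40:55
vm2@osgs-c03. LINUX  INTEL  Unclaimed  Idle      0.000   1013    0+02:35:29
osgs-c05.cs.w LINUX  INTEL  Unclaimed  Idle      0.010   2026    0+02:00:58
vm1@osgs-c06. LINUX  INTEL  Owner      Idle      0.010   1013    0+01:14:17
vm2@osgs-c06. LINUX  INTEL  Unclaimed  Idle      0.000   1013  107+03:55:50
vm1@osgs-c07. LINUX  INTEL  Unclaimed  Idle      0.000   1013  107+03:53:50
vm2@osgs-c07. LINUX  INTEL  Unclaimed  Idle      0.000   1013    0+02:48:48
vm1@osgs-c08. LINUX  INTEL  Unclaimed  Idle      0.000   1013    0+02:20:03
vm2@osgs-c08. LINUX  INTEL  Unclaimed  Idle      0.000   1013  152+19:19:58
vm1@osgs-c09. LINUX  INTEL  Unclaimed  Idle      0.000   1013    0+02:26:46
vm2@osgs-c09. LINUX  INTEL  Unclaimed  Idle      0.000   1013  106+08:38:57
             Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill
INTEL/LINUX     11      1        0         10        0           0         0
EOF
# prints: total=11 owner=1 claimed=0 unclaimed=10
```

On a computer in the pool you could run "condor_status | count_slots_by_state" and compare the result with the Total line that condor_status prints itself.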
On some of your computers, you might see two apparent computers because we have multiple CPUs. These are listed as "vm1" or "vm2". Let's look at exactly what you can see:
What else can you find out with condor_status? Try any one of: