2.0 The Condor Installation
In this section, you will learn a little about your Condor installation.
2.1 The Installation
Since Condor has been installed for you, we think it important to first show you a little about what's under the hood of a Condor installation. Condor should already be running on your machine, but you can prove it to yourself by launching the Task Manager. Either right-click on the Taskbar (that colorful bar across the bottom of your screen) and choose Task Manager, or use the keyboard shortcut Ctrl + Shift + Esc (which you all probably have committed to memory already).
Once you have the Task Manager up and running, click on the Image Name column heading to sort the list of programs alphabetically (if it isn't already). If the Condor programs aren't immediately visible, use the scroll bar on the right to find them. Also, make sure that "Show processes from all users" is checked (even though it's not shown that way in the screenshot).
The programs listed may be slightly different from ours, but as long as the list includes all of those Condor programs, it's okay. The picture on the right gives a slightly different view than the Task Manager can; it is there to illustrate that the master, as elaborated on below, is really in charge: it's the parent process of all the other Condor programs. Let's take a closer look at what we see:
condor_master: This program runs constantly and ensures that all other parts of Condor are running. If they hang or crash, it restarts them.
condor_collector: This program is part of the Condor central manager. It collects information about all computers in the pool as well as which users want to run jobs. It is what normally responds to the condor_status command.
condor_negotiator: This program is part of the Condor central manager. It decides what jobs should be run where.
condor_startd: If this program is running, it allows jobs to be started on this computer; that is, your computer is an "execute machine". It advertises your computer to the central manager (more on that later, but in this case it's also your computer) so that the central manager knows about it. It also starts up the jobs that run.
condor_schedd: If this program is running, it allows jobs to be submitted from this computer; that is, your computer is a "submit machine". It advertises jobs to the central manager so that it knows about them. It contacts a condor_startd on other execute machines for each job that needs to be started.
condor_shadow: (Not shown above.) For each job that has been submitted from this computer, there is one condor_shadow running. It watches over the job as it runs remotely, and in some cases it provides some assistance. You may or may not see any condor_shadow processes running, depending on what is happening on the computer when you try it out.
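The descriptions above amount to a small lookup from program name to machine role. As a minimal sketch (the function names and the process list are hypothetical, and nothing here talks to Condor or to the Task Manager), the following Python takes a list of image names and reports which roles the machine plays. It assumes, as a simplification, that seeing either central manager program is enough to call the machine a central manager.

```python
# Map each Condor program described above to the role it gives a machine.
# The role labels come from the descriptions in this section.
DAEMON_ROLES = {
    "condor_collector": "central manager",
    "condor_negotiator": "central manager",
    "condor_startd": "execute machine",
    "condor_schedd": "submit machine",
}


def _base_name(image_name):
    """Lower-case an image name and drop a trailing .exe, if present."""
    name = image_name.strip().lower()
    return name[:-4] if name.endswith(".exe") else name


def machine_roles(image_names):
    """Return (master_running, sorted list of roles) for a process list."""
    names = {_base_name(n) for n in image_names}
    # Simplification: either central manager program counts for that role.
    roles = {role for daemon, role in DAEMON_ROLES.items() if daemon in names}
    return "condor_master" in names, sorted(roles)


# Hypothetical list, as it might be read off the Image Name column.
running = [
    "condor_master.exe",
    "condor_collector.exe",
    "condor_negotiator.exe",
    "condor_startd.exe",
    "condor_schedd.exe",
    "explorer.exe",
]
master_ok, roles = machine_roles(running)
print(master_ok, roles)
```

On a machine set up like the one in this tutorial, all three roles should come back, since the same computer is central manager, submit machine, and execute machine at once.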
Now that you know all the names of the main Condor programs, as well as their functions, you should all be experts and require no further guidance... right? Well, just in case everything isn't quite perfectly clear, we have included the following diagram to illustrate the relationships among all the programs we've just discussed:
We also have a less formal graphic representation of these programs, drawn by Sarah Miller, age 12. (We Windows people might not get the UNIX joke right away, but services on UNIX are called daemons, which is commonly misread as demons. And to be fair, some of the daemons out there can be pretty nasty to work with. Of course, that doesn't apply to any of ours.)
Condor, unfortunately, has no GUI, so before we can interact with it, we'll need to open a Command Prompt. As with many things in Windows, there are several ways to do this. You can click on the Start Menu and then on Run..., which will bring up the Run dialog. In the dialog, type cmd and click the OK button. If you are more keyboard inclined, you can skip the clicks and press WinKey + R (which brings up the same Run dialog); you'll still need to type cmd, but you can press Enter rather than clicking the OK button.
You can find out what jobs have been submitted on your computer with the condor_q command:
C:\> condor_q

-- Submitter: lab-21 : <126.96.36.199:2207> : lab-21
 ID      OWNER      SUBMITTED     RUN_TIME ST PRI SIZE CMD

0 jobs; 0 idle, 0 running, 0 held
Nothing is running right now. If something were running, you would see output like this:
C:\> condor_q

-- Submitter: royal01.cs.wisc.edu : <188.8.131.52:32775> : royal01.cs.wisc.edu
 ID      OWNER      SUBMITTED     RUN_TIME ST PRI SIZE CMD
4589.0   doronn     3/30 18:07  19+09:26:01 I  0   0.0  go1
5140.0   araddan    7/18 15:59   8+08:16:47 I  0   0.0  .condor_run.23359
5145.0   araddan    7/18 17:22   0+21:29:41 I  0   0.0  matlab-script.txt
6041.0   grishas    12/7 18:41   7+08:03:25 R  0   45.7 a.out
6042.0   grishas    12/7 18:42   8+07:47:14 R  0   45.7 a.out
6044.0   grishas    12/9 11:15   6+17:14:46 R  0   45.7 a.out

The output that you see will be different depending on what jobs are running. Notice what we can see from this:
What else can you find out with condor_q? Try any one of:
How do you use the -constraint or -format options to condor_q? When would you want them? When would you use the -l option? This might be an easier exercise to try once you submit some jobs.
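If you would rather summarize a condor_q listing than read it by eye, the text can be picked apart with a few lines of Python. The sketch below is only an illustration: it works on the sample output shown earlier in this section, pasted in as a string, and does not run condor_q itself. The function names are hypothetical, and it assumes the column layout of that sample (ID, OWNER, submit date and time, RUN_TIME as days+HH:MM:SS, ST, PRI, SIZE, CMD); other Condor versions may print different columns.

```python
from collections import Counter

# Sample condor_q output copied from the example earlier in this section.
SAMPLE = """\
-- Submitter: royal01.cs.wisc.edu : <188.8.131.52:32775> : royal01.cs.wisc.edu
 ID      OWNER      SUBMITTED     RUN_TIME ST PRI SIZE CMD
4589.0   doronn     3/30 18:07  19+09:26:01 I  0   0.0  go1
5140.0   araddan    7/18 15:59   8+08:16:47 I  0   0.0  .condor_run.23359
5145.0   araddan    7/18 17:22   0+21:29:41 I  0   0.0  matlab-script.txt
6041.0   grishas    12/7 18:41   7+08:03:25 R  0   45.7 a.out
6042.0   grishas    12/7 18:42   8+07:47:14 R  0   45.7 a.out
6044.0   grishas    12/9 11:15   6+17:14:46 R  0   45.7 a.out
"""


def parse_run_time(text):
    """Convert a 'days+HH:MM:SS' run time into a number of seconds."""
    days, clock = text.split("+")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return ((int(days) * 24 + hours) * 60 + minutes) * 60 + seconds


def parse_jobs(output):
    """Return one dict per job row, skipping headers and other lines."""
    jobs = []
    for line in output.splitlines():
        fields = line.split()
        # Job rows have at least nine fields and start with an ID like 4589.0
        if len(fields) < 9 or not fields[0].replace(".", "", 1).isdigit():
            continue
        jobs.append({
            "id": fields[0],
            "owner": fields[1],
            "run_seconds": parse_run_time(fields[4]),
            "state": fields[5],          # I = idle, R = running
            "cmd": " ".join(fields[8:]),
        })
    return jobs


jobs = parse_jobs(SAMPLE)
by_state = Counter(job["state"] for job in jobs)
by_owner = Counter(job["owner"] for job in jobs)
print(dict(by_state))
print(dict(by_owner))
```

Splitting on whitespace is good enough for this sample; a command name containing spaces would still be kept whole because everything from the ninth field onward is joined back together.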
You can find out what computers are in your Condor pool. (A pool is similar to a cluster, but it doesn't have the connotation that all computers are dedicated full-time to computation: some may be desktop computers owned by users.) To look, use condor_status:
C:\> condor_status

Name     OpSys    Arch   State      Activity  LoadAv  Mem  ActvtyTime

lab-01   WINNT51  INTEL  Unclaimed  Idle      0.000   502  0+01:05:04
lab-02   WINNT51  INTEL  Unclaimed  Idle      0.010   502  0+01:05:04
lab-03   WINNT51  INTEL  Unclaimed  Idle      0.000   502  0+00:40:04
lab-04   WINNT51  INTEL  Unclaimed  Idle      0.090   502  0+01:00:04
lab-05   WINNT51  INTEL  Unclaimed  Idle      0.040   502  0+01:00:04
lab-06   WINNT51  INTEL  Unclaimed  Idle      0.050   502  0+02:05:04
lab-07   WINNT51  INTEL  Unclaimed  Idle      0.120   502  0+02:05:04
lab-08   WINNT51  INTEL  Unclaimed  Idle      0.020   502  0+01:00:04
lab-09   WINNT51  INTEL  Unclaimed  Idle      0.000   502  0+00:55:04
lab-10   WINNT51  INTEL  Unclaimed  Idle      0.020   502  0+00:55:04
lab-11   WINNT51  INTEL  Unclaimed  Idle      0.050   502  0+00:55:04
lab-13   WINNT51  INTEL  Unclaimed  Idle      0.020   502  0+00:50:04
lab-14   WINNT51  INTEL  Unclaimed  Idle      0.010   502  0+00:50:04
lab-15   WINNT51  INTEL  Unclaimed  Idle      0.070   502  0+00:50:04
lab-16   WINNT51  INTEL  Unclaimed  Idle      0.020   502  0+01:30:04
lab-17   WINNT51  INTEL  Unclaimed  Idle      0.040   502  0+00:50:04
lab-19   WINNT51  INTEL  Unclaimed  Idle      0.090   502  0+00:45:04
lab-20   WINNT51  INTEL  Unclaimed  Idle      0.000   502  0+00:45:04
lab-21   WINNT51  INTEL  Unclaimed  Idle      0.030   502  0+00:45:04
lab-22   WINNT51  INTEL  Unclaimed  Idle      0.030   502  0+03:00:05

              Total Owner Claimed Unclaimed Matched Preempting Backfill

INTEL/WINNT51    20     0       0        20       0          0        0

        Total    20     0       0        20       0          0        0
The example above shows a pool of 20 computers. If your computer is not part of a pool, you will see only your own; otherwise your list will reflect whatever pool you belong to. On some of your computers, you might see two or more virtual computers (called slots) due to hyperthreading, dual-core CPUs or computers with multiple CPUs.
Let's look at exactly what condor_status is telling us:
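The condor_status listing above can be tallied the same way you might tally it by hand. The following Python is a rough sketch under a few assumptions: it reads the listing from a pasted string rather than running condor_status, the function name is hypothetical, and it identifies machine rows by their last column looking like days+HH:MM:SS. The two slot rows at the end are made up for illustration; they assume slots are named in the form slotN@hostname.

```python
import re
from collections import Counter

# condor_status output copied from the example above.
SAMPLE = """\
Name     OpSys    Arch   State      Activity  LoadAv  Mem  ActvtyTime
lab-01   WINNT51  INTEL  Unclaimed  Idle      0.000   502  0+01:05:04
lab-02   WINNT51  INTEL  Unclaimed  Idle      0.010   502  0+01:05:04
lab-03   WINNT51  INTEL  Unclaimed  Idle      0.000   502  0+00:40:04
lab-04   WINNT51  INTEL  Unclaimed  Idle      0.090   502  0+01:00:04
lab-05   WINNT51  INTEL  Unclaimed  Idle      0.040   502  0+01:00:04
lab-06   WINNT51  INTEL  Unclaimed  Idle      0.050   502  0+02:05:04
lab-07   WINNT51  INTEL  Unclaimed  Idle      0.120   502  0+02:05:04
lab-08   WINNT51  INTEL  Unclaimed  Idle      0.020   502  0+01:00:04
lab-09   WINNT51  INTEL  Unclaimed  Idle      0.000   502  0+00:55:04
lab-10   WINNT51  INTEL  Unclaimed  Idle      0.020   502  0+00:55:04
lab-11   WINNT51  INTEL  Unclaimed  Idle      0.050   502  0+00:55:04
lab-13   WINNT51  INTEL  Unclaimed  Idle      0.020   502  0+00:50:04
lab-14   WINNT51  INTEL  Unclaimed  Idle      0.010   502  0+00:50:04
lab-15   WINNT51  INTEL  Unclaimed  Idle      0.070   502  0+00:50:04
lab-16   WINNT51  INTEL  Unclaimed  Idle      0.020   502  0+01:30:04
lab-17   WINNT51  INTEL  Unclaimed  Idle      0.040   502  0+00:50:04
lab-19   WINNT51  INTEL  Unclaimed  Idle      0.090   502  0+00:45:04
lab-20   WINNT51  INTEL  Unclaimed  Idle      0.000   502  0+00:45:04
lab-21   WINNT51  INTEL  Unclaimed  Idle      0.030   502  0+00:45:04
lab-22   WINNT51  INTEL  Unclaimed  Idle      0.030   502  0+03:00:05
              Total Owner Claimed Unclaimed Matched Preempting Backfill
INTEL/WINNT51    20     0       0        20       0          0        0
        Total    20     0       0        20       0          0        0
"""

# Machine rows end with an activity time such as 0+01:05:04.
ACTIVITY_TIME = re.compile(r"^\d+\+\d{2}:\d{2}:\d{2}$")


def parse_status(output):
    """Return one dict per machine row, skipping headers and totals."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 8 or not ACTIVITY_TIME.match(fields[7]):
            continue
        name = fields[0]
        rows.append({
            "name": name,
            # Assumed slot naming: slot1@lab-01 belongs to host lab-01.
            "host": name.split("@", 1)[-1],
            "state": fields[3],
            "activity": fields[4],
            "load": float(fields[5]),
        })
    return rows


machines = parse_status(SAMPLE)
by_state = Counter(m["state"] for m in machines)
busiest = max(machines, key=lambda m: m["load"])
print(len(machines), dict(by_state), busiest["name"])

# Hypothetical rows for a two-slot computer (not from the listing above).
SLOT_ROWS = """\
slot1@lab-30 WINNT51 INTEL Unclaimed Idle 0.000 251 0+00:10:04
slot2@lab-30 WINNT51 INTEL Unclaimed Idle 0.000 251 0+00:10:04
"""
slots = parse_status(SLOT_ROWS)
print(len(slots), {s["host"] for s in slots})
```

The count of parsed machine rows should agree with the Total line that condor_status prints at the bottom, which makes a handy sanity check on the parsing.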
What else can you find out with condor_status? Try any one of: