Condor User Tools

The Condor team has striven to build user tools that make analysis of a given pool straightforward. Here we will give you an introduction to these tools.


Examining the Condor Pool With condor_status

The condor_status command is useful for monitoring many things, the most general being the state of all the machines in your pool. Try running it with no arguments:

%  condor_status

The condor_status command will print a table containing information about the state of every machine in the pool, followed by a summary of the machine states at the end. Next try:

%  condor_status -pool condor.cs.wisc.edu -total

The -pool option queries the pool whose central manager is condor.cs.wisc.edu. The -total option simply displays the summary information, without a line for every machine. (Later we will limit which machines we see with the -constraint option.) Next:

%  condor_status -pool condor.cs.wisc.edu -submitters

The -submitters option summarizes information about the jobs that have been submitted to the pool. The first table organizes them by machine, the second by user. Next, type:

%  condor_status -pool condor.cs.wisc.edu -run

The -run option gives information about all the machines in the pool on which Condor jobs are currently running. Next try:

%  condor_status -pool condor.cs.wisc.edu -constraint "Memory >= 256"

This will display only machines with at least 256 MB of memory. The -constraint option is very powerful, since it allows you to make complex queries about the state of your pool. For example, to see all the machines that have at least 256 MB of memory and more than two processors, or that are faster than 400 MIPS, we could do:

%  condor_status -pool condor.cs.wisc.edu -constraint "(Memory >= 256 && Cpus > 2) || Mips > 400"
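To see how such a constraint selects machines, here is a small Python sketch that evaluates the same boolean expression against a few made-up machine ads (the machine names and attribute values are illustrative, not from a real pool):

```python
# Evaluate the constraint "(Memory >= 256 && Cpus > 2) || Mips > 400"
# against some hypothetical machine ads.
machines = [
    {"Name": "vulture", "Memory": 512, "Cpus": 4, "Mips": 350},
    {"Name": "crow",    "Memory": 128, "Cpus": 1, "Mips": 450},
    {"Name": "finch",   "Memory": 256, "Cpus": 2, "Mips": 300},
]

def matches(ad):
    # Same logic as the -constraint expression above
    return (ad["Memory"] >= 256 and ad["Cpus"] > 2) or ad["Mips"] > 400

for ad in machines:
    print(ad["Name"], matches(ad))
# vulture and crow match; finch does not
```

Each clause tests a machine ClassAd attribute, and condor_status shows only the machines for which the whole expression is true.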

Examining the Condor Job Queue With condor_q

Preliminary: Please change to the example1 directory by typing:
%  cd ~/workbook/tools/example1

The condor_q command can be used to view the state of the submit queue. With no arguments, it builds a table of the jobs submitted from the local machine.

%  condor_q

Right now this is empty because we don't have any jobs submitted. Let's submit the example in this directory.

%  condor_submit example1.submit
%  condor_q
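Though the contents of example1.submit are not reproduced here, a simple Condor submit file generally looks something like the following sketch (the executable name, file names, and job count are illustrative guesses, not the actual contents of example1.submit):

```
# Hypothetical sketch of a simple submit file
executable = example1
universe   = vanilla
output     = example1.$(Process).out
error      = example1.$(Process).err
log        = example1.log
queue 5
```

The queue command at the end tells Condor how many copies of the job to place in the queue.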

Now we should have a list of the several jobs we just submitted. The condor_q command can also be used to view the global queue of all Condor jobs in the pool:

%  condor_q -pool condor.cs.wisc.edu -global

We can also find out information about a particular person's jobs only, by using the -submitter option and replacing your_username with your username:

%  condor_q -pool condor.cs.wisc.edu -global -submitter your_username

One of the most common problems users have with Condor is jobs that won't run. There is now an option to condor_q that attempts to figure out why a job isn't running. Let's submit a bogus submit file and then ask Condor why the job isn't running:

%  cd ../example2
%  condor_submit example2.submit
%  condor_q

You will see from condor_q that the job is not running, so we ask condor_q to figure out what is wrong. Remember to replace your_cluster and your_proc with the cluster and proc numbers your job was submitted as:

%  condor_q your_cluster.your_proc -analyze

Condor now does an analysis of the job, and reports the following:

-- Submitter: dukat.cs.wisc.edu : <128.105.45.39:48418> : dukat.cs.wisc.edu
---
009.000:  Run analysis summary.  Of 324 resource offers,
          324 do not satisfy the request's constraints
            0 resource offer constraints are not satisfied by this request
            0 are serving equal or higher priority customers
            0 are serving more preferred customers
            0 cannot preempt because preemption has been held
            0 are available to service your request

WARNING:  Be advised:
   No resources matched request's constraints
   Check the Requirements expression below:

Requirements = (0 == 1) && (Arch == "INTEL") && (OpSys == "SOLARIS251") && (Disk
 >= ExecutableSize) && (VirtualMemory >= ImageSize) && (FileSystemDomain == "cs.
wisc.edu")    

From the above output, I am advised that there is something wrong with my Requirements expression, and indeed, I am asking for 0 to equal 1, which is never true.
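A requirements line like the one below would produce the unsatisfiable (0 == 1) clause seen in the analysis output. This is a hedged guess at the relevant part of example2.submit; only the requirements line is grounded in the output above, and the rest is illustrative:

```
# Hypothetical sketch: a requirements expression that can never
# be satisfied, matching the (0 == 1) clause in the analysis
executable   = example2
requirements = (0 == 1)
queue
```

Condor appends its own clauses (Arch, OpSys, Disk, and so on) to whatever the submit file specifies, which is why the analyzed Requirements expression is longer than the submit-file line.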


Removing Jobs From the Queue With condor_rm

Since the job we just submitted is obviously never going to run, we need to remove it from the queue. We do this using condor_rm:

%  condor_rm your_cluster.your_proc

The job is now gone.


Changing Job Priorities With condor_prio

Preliminary: Please change to the example3 directory by typing:
%  cd ~/workbook/tools/example3

Users can set the priority with which they would like their jobs to run. Once the jobs are submitted, this is accomplished using the condor_prio command. Let's look at an example:

%  cat example3.submit

You can see that we can set the priority of a job in the submit file; in this case, we set the priority based on the process number, which will lead to the jobs running in reverse order. Now submit the job:
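A submit file that derives the job priority from the process number, as described above, might look like the following sketch (the executable name and job count are illustrative, not the actual contents of example3.submit):

```
# Hypothetical sketch: priority grows with the process number,
# so the highest-numbered process runs first (reverse order)
executable = example3
priority   = $(Process)
queue 5
```

Because higher-priority jobs run first, giving process 4 priority 4 and process 0 priority 0 reverses the run order.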

%  condor_submit example3.submit
%  condor_q

The default priority is 0, but it can be set as high as +20 or as low as -20. Let's change the priority of the first job (process 0) to 20, so that it will run next. Remember to replace the cluster number (5 in this example) with your own:

%  condor_prio -p 20 5.0
%  condor_q

You will notice that condor_q reflects the change in the priority column. We can run condor_q over and over, watching the jobs get started in the order in which we prioritized them.


Viewing User Priorities With condor_userprio

Besides job priorities, every user also has a priority, based on resource usage and the pool's policy. To view the current user priorities, type condor_userprio:

%  condor_userprio

Note that lower priority values are better in the user priority system.