Installing a Personal Condor Pool


For the first tutorial, you will install a "Personal Condor Pool" on your own machine. Once you have the pool up and running, you will monitor the daemons, check the status of your own pool, and submit a test job. At the end, we will join all the machines together into one large pool.

In your home directory, you have a Condor binary release file, "condor-6.1.7-linux-x86-glibc.tar.gz". This is exactly what you'd download from Condor's web page if you were installing Condor from scratch. First, unpack this file:

% tar -zxvf condor-6.1.7-linux-x86-glibc.tar.gz

Now, you'll have a "condor-6.1.7" directory. cd into that:

% cd condor-6.1.7

Now, just run "condor_install":

% ./condor_install

Please read the instructions and questions carefully. If something is unclear, just ask. Here are the answers you should give:

Question: Press enter to begin Condor installation
Answer:   {enter}

Question: Do you want to do a full installation of Condor?
Answer:   yes

Question: Are you planning to setup Condor on multiple machines? [yes]
Answer:   no

Question: Have you installed a release directory already? [no]
Answer:   {enter}

Question: Where would you like to install the Condor release directory?
Answer:   /local/condor

Question: If something goes wrong with Condor, who should get email about it? [root@infn-corsi06.corsi.infn.it]
Answer:   {enter}

Question: What is the full path to a mail program that understands "-s" means you want to specify a subject? [/bin/mail]
Answer:   {enter}

Question: Do all of the machines in your pool from your domain ("corsi.infn.it") share a common filesystem? [no]
Answer:   {enter}

Question: Do all of the users across all the machines in your domain have a unique UID (in other words, do they all share a common passwd file)? [no]
Answer:   {enter}

Question: Shall I create links in some other directory? [yes]
Answer:   no

Question: What is the full hostname of the central manager? [infn-corsi**.corsi.infn.it]
          (Note: condor_install will show the full hostname of your local machine here, which is what "infn-corsi**" stands for above.)
Answer:   {enter}

Question: You have a "condor" user on this machine. Do you want to put all the Condor directories in /local/condor/home? [yes]
Answer:   {enter}

Question: Should I put a "condor_config.local" file in /local/condor/home? [yes]
Answer:   {enter}

Question: What name would you like to use for this pool? ...
          (Note: We're using a trick for this answer. You're going to use a "macro", $(FULL_HOSTNAME), to define this value. Condor automatically replaces the macro with the full hostname of your local machine whenever it uses the value. We'll talk more about macros later in the tutorial.)
Answer:   $(FULL_HOSTNAME)

Question: Should I put in a soft link from /local/condor/home/condor_config to /local/condor/etc/condor_config [yes]
Answer:   {enter}
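
To make the $(FULL_HOSTNAME) trick a little more concrete, here is a minimal shell sketch of what macro substitution amounts to. It is an illustration only, not Condor's actual implementation: the function name "expand_full_hostname" is hypothetical, and it handles just this one macro, whereas Condor expands the macros defined in its configuration.

```shell
# Hypothetical helper (not part of Condor): replace every literal
# "$(FULL_HOSTNAME)" in a string with the given hostname.
expand_full_hostname() {
    value=$1      # text that may contain the macro
    host=$2       # hostname to substitute for it
    # [$] matches a literal dollar sign; in a basic regular expression
    # the parentheses are ordinary characters.
    printf '%s\n' "$value" | sed "s/[\$](FULL_HOSTNAME)/$host/g"
}
```

For example, calling expand_full_hostname '$(FULL_HOSTNAME)' infn-corsi06.corsi.infn.it prints infn-corsi06.corsi.infn.it. The single quotes matter: they stop the shell itself from interpreting "$(...)".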

That's it: Condor is now installed. All that remains is to start the "condor_master" daemon, and you will have your own Condor pool running on your machine:

% /local/condor/sbin/condor_master

Condor is now running. To see its daemons, use "ps":

% ps auwwx | grep condor_
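
If you would rather not pick the daemons out of the "ps" listing by eye, a small filter can do it for you. The sketch below is only a convenience, not a Condor tool: the function name "missing_daemons" is hypothetical, and the list of five daemons is an assumption about what a personal pool that is its own central manager normally runs.

```shell
# Hypothetical helper: read a process listing on standard input and print
# the name of each expected daemon that does not appear in it.
# The daemon list is an assumption, not something condor_install reports.
missing_daemons() {
    listing=$(cat)
    for daemon in condor_master condor_collector condor_negotiator \
                  condor_startd condor_schedd; do
        # -w matches whole words only, so the "grep condor_" process
        # itself is never mistaken for a daemon.
        if ! printf '%s\n' "$listing" | grep -qw "$daemon"; then
            echo "$daemon"
        fi
    done
}
```

Run it as "ps auwwx | missing_daemons". It prints nothing when all five daemons are found, and otherwise prints one missing daemon name per line.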

You can also look at your pool with "condor_status":

% condor_status

(Note: If you don't see anything yet, the condor_startd is still running the benchmarks it always runs on startup. Wait a few seconds and try again, and you should see your machine.)
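
Instead of retrying by hand, you can let the shell do the waiting. This is a minimal sketch under one assumption: that the command prints nothing at all until your machine has reported in. The function name "wait_for_output" is hypothetical and not part of Condor.

```shell
# Hypothetical helper: run a command repeatedly until it prints something,
# or give up after a fixed number of attempts.
# Usage: wait_for_output ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for_output() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        output=$("$@" 2>/dev/null)
        if [ -n "$output" ]; then
            printf '%s\n' "$output"
            return 0
        fi
        attempts=$((attempts - 1))
        # Pause before the next attempt, but not after the last one.
        [ "$attempts" -gt 0 ] && sleep "$delay"
    done
    return 1
}
```

For example, "wait_for_output 6 5 condor_status" tries up to six times, five seconds apart, and returns a non-zero status if nothing ever appeared.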

"condor_status" accepts a number of different options. Try each of the following and note the kind of information it gives:

% condor_status -l
% condor_status -master -l
% condor_status -schedd
% condor_status -schedd -l

You can also look at the status of another Condor pool with the "-pool" option:

% condor_status -pool infn-corsi98

You can use this option in addition to any of the others, so feel free to experiment with different combinations.

Now, you can build a Condor job:

% cd examples
% make registers.remote

This will run "condor_compile" to link your job with Condor's libraries.

Next, you will submit the job to Condor. First, look at the "submit file":

% cat registers.cmd
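
The contents of registers.cmd are not reproduced in this handout, so rely on what "cat" shows you. As a rough guide only, a minimal submit file generally names the executable and the files for output, errors, and the job log, then ends with a "queue" statement. The sketch below writes such a file; the function name and all of the file names are hypothetical, and the real registers.cmd may well differ.

```shell
# Hypothetical helper: write a minimal, illustrative submit file to the
# given path. The values are examples, not the contents of registers.cmd.
write_example_submit_file() {
    cat > "$1" <<'EOF'
executable = registers.remote
output     = registers.out
error      = registers.err
log        = registers.log
queue
EOF
}
```

For example, "write_example_submit_file example.cmd" creates a five-line file that you can compare with the real one.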

Then submit the job to Condor:

% condor_submit registers.cmd

Now, you can view the job with "condor_q":

% condor_q

To see the entire job ClassAd, use "-long" (or "-l" for short):

% condor_q -l

This job will not run as long as you continue typing on your machine, since the default policy is to only start Condor jobs on machines that have had their keyboard idle for 15 minutes. We'll describe the Condor policy expressions in great detail in the next section. Later, we'll change the policy of your machines to always run jobs, even when you are typing.
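
The 15-minute rule can be pictured as a simple threshold test. The sketch below is an illustration only: Condor's real policy is written as expressions in its configuration file, covered in the next section, and may take more than keyboard activity into account. The names "START_IDLE_SECONDS" and "may_start_job" are hypothetical, and treating exactly 15 minutes as "idle enough" is an assumption.

```shell
# Hypothetical illustration of the default start policy described above:
# a job may start only once the keyboard has been idle for 15 minutes.
START_IDLE_SECONDS=$((15 * 60))

# Succeeds (exit status 0) when the given idle time, in seconds,
# has reached the threshold.
may_start_job() {
    keyboard_idle_seconds=$1
    [ "$keyboard_idle_seconds" -ge "$START_IDLE_SECONDS" ]
}
```

With this sketch, "may_start_job 30" fails because you typed 30 seconds ago, while "may_start_job 3600" succeeds.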

As the final step, we'll join all the machines together into a single pool. There's already a central manager running at infn-corsi98. You simply have to change the configuration of your machine to tell it that infn-corsi98 should be its central manager. You can do this by editing your configuration file:

  1. Edit ~/condor_config (use vi or emacs)
  2. Search for "CONDOR_HOST" (the first real entry)
  3. Change it to look like this:
      CONDOR_HOST   = infn-corsi98.corsi.infn.it
  4. Save the file and exit
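
If you prefer not to open an editor, the same change can be scripted. This is a sketch under stated assumptions, not a Condor tool: the function name "set_condor_host" is hypothetical, and it assumes the entry sits on a single line that begins with "CONDOR_HOST". It keeps a copy of the original file with a ".bak" suffix.

```shell
# Hypothetical helper: set CONDOR_HOST in a configuration file.
# Usage: set_condor_host CONFIG_FILE CENTRAL_MANAGER_HOSTNAME
set_condor_host() {
    config=$1
    host=$2
    tmp="$config.tmp.$$"
    # Keep a backup of the original file before touching it.
    cp "$config" "$config.bak" || return 1
    # Rewrite only the line that begins with CONDOR_HOST followed by "=";
    # every other line is copied through unchanged.
    sed "s/^CONDOR_HOST[[:space:]]*=.*/CONDOR_HOST = $host/" "$config" > "$tmp" || return 1
    # Write back with cat rather than mv, so that a soft link stays a link.
    cat "$tmp" > "$config" && rm -f "$tmp"
}
```

For example, "set_condor_host ~/condor_config infn-corsi98.corsi.infn.it" makes the same edit as the steps above. Check the result with "grep CONDOR_HOST ~/condor_config" before going on.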

Finally, you just have to send a "condor_reconfig" command to have the daemons re-read their configuration and put the changes into effect:

% condor_reconfig

Now, your machine will start reporting to the "big" pool. You can run "condor_status" and you will see the status of this new pool we have just created.

% condor_status

As one last interesting step, take a look at "condor_status -submit":

% condor_status -submit

You'll see that the job you submitted to your schedd when your machine was in its own pool is now being reported to the big pool. You submit jobs to a specific schedd, not a specific pool. Your schedd can switch pools and still keep your jobs. In fact, you can configure a schedd to ask for machines from multiple pools. This is called "flocking". We don't have time today to discuss flocking, but this gives you a flavor of how it works.

You're done. Time to eat!