This is an outdated version of the HTCondor Manual. You can find current documentation at http://htcondor.org/manual.
next up previous contents index
Next: 6.3 HTCondor Annex Customization Up: 6. Cloud Computing Previous: 6.1 Introduction   Contents   Index

Subsections


6.2 HTCondor Annex User's Guide

A user of condor_annex may be a regular job submitter, or she may be an HTCondor pool administrator. This guide will cover basic condor_annex usage first, followed by advanced usage that may be of less interest to the submitter. Users interested in customizing condor_annex should consult section 6.3.

6.2.1 Considerations and Limitations

When you run condor_annex, you are adding (virtual) machines to an HTCondor pool. As a submitter, you probably don't have permission to add machines to the HTCondor pool you're already using; generally speaking, security concerns will forbid this. If you're a pool administrator, you can of course add machines to your pool as you see fit. By default, however, condor_annex instances will only start jobs submitted by the user who started the annex, so pool administrators using condor_annex on their users' behalf will probably want to use the -owners option or -no-owner flag; see the man page (section 12). Once the new machines join the pool, they will run jobs as normal.

Submitters, however, will have to set up their own personal HTCondor pool, so that condor_annex has a pool to join, and then work with their pool administrator if they want to move their existing jobs to their new pool. Otherwise, jobs will have to be manually divided (removed from one and resubmitted to the other) between the pools. For instructions on creating a personal condor pool, configuring condor_annex to use a particular AWS account, and then setting up that account for use with condor_annex, see

https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=UsingCondorAnnexForTheFirstTimeEightSevenFour.

Starting in v8.7.1, condor_annex will check for inbound access to the collector (usually port 9618) before starting an annex (it does not support other network topologies). When checking connectivity from AWS, the IP(s) used by the AWS Lambda function implementing this check may not be in the same range(s) as those used by AWS instance; please consult AWS's list of all their IP ranges6.2when configuring your firewall.

Starting in v8.7.2, condor_annex requires that the AWS secret (private) key file be owned by the submitting user and not readable by anyone else. This helps to ensure proper attribution.

6.2.2 Basic Usage

A quick-start guide with basic usage information is kept on our Wiki:

https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToUseCondorAnnexWithOnDemandInstancesEightSevenFour.

6.2.3 Advanced Usage

The basic usage guide on the Wiki covered using what AWS calls ``on-demand'' instances. (An ``instance'' is ``a single occurrence of something,'' in this case, a virtual machine. The intent is to distinguish between the active process that's pretending to be a real piece of hardware - the ``instance'' - and the template it used to start it up, which may also be called a virtual machine.) An on-demand instance has a price fixed by AWS; once acquired, AWS will let you keep it running as long as you continue to pay for it.

In constrast, a ``Spot'' instance has a price determined by an (automated) auction; when you request a ``Spot'' instance, you specify the most (per hour) you're willing to pay for that instance. If you get an instance, however, you pay only what the spot price is for that instance; in effect, AWS determines the spot price by lowering it until they run out of instances to rent. AWS advertises savings of up to 90% over on-demand instances.

There are two drawbacks to this cheaper type of instance: first, you may have to wait (indefinitely) for instances to become available at your preferred price-point; the second is that your instances may be taken away from you before you're done with them because somebody else will pay more for them. (You won't be charged for the hour in which AWS kicks you off an instance, but you will still owe them for all of that instance's previous hours.) Both drawbacks can be mitigated (but not eliminated) by bidding the on-demand price for an instance; of course, this also minimizes your savings.

Determining an appropriate bidding strategy is outside the purview of this manual.

6.2.3.1 Using AWS Spot Fleet

condor_annex supports Spot instances via an AWS technology called ``Spot Fleet''. Normally, when you request instances, you request a specific type of instance (the default on-demand instance is, for instance, `m4.large'.) However, in many cases, you don't care too much about how many cores an intance has - HTCondor will automatically advertise the right number and schedule jobs appropriately, so why would you? In such cases - or in other cases where your jobs will run acceptably on more than one type of instance - you can make a Spot Fleet request which says something like ``give me a thousand cores as cheaply as possible'', and specify that an `m4.large' instance has two cores, while `m4.xlarge' has four, and so on. (The interface actually allows you to assign arbitrary values - like HTCondor slot weights - to each instance type6.3, but the default value is core count.) AWS will then divide the current price for each instance type by its core count and request spot instances at the cheapest per-core rate until the number of cores (not the number of instances!) has reached a thousand, or that instance type is exhausted, at which point it will request the next-cheapest instance type.

(At present, a Spot Fleet only chooses the cheapest price within each AWS region; you would have to start a Spot Fleet in each AWS region you were willing to use to make sure you got the cheapest possible price. For fault tolerance, each AWS region is split into independent zones, but each zone has its own price. Spot Fleet takes care of that detail for you.)

In order to create an annex via a Spot Fleet, you'll need a file containing a JSON blob which describes the Spot Fleet request you'd like to make. (It's too complicated for a reasonable command-line interface.) The AWS web console can be used to create such a file; the button to download that file is (currently) in the upper-right corner of the last page before you submit the Spot Fleet request; it is labeled `JSON config'. You may need to create an IAM role the first time you make a Spot Fleet request; please do so before running condor_annex.

You must select the instance role profile used by your on-demand instances for condor_annex to work. This value will be stored in the configuration macro ANNEX_DEFAULT_ODI_INSTANCE_PROFILE_ARN by the setup procedure.

Specify the JSON configuration file using -aws-spot-fleet-config-file, or set the configuration macro ANNEX_DEFAULT_SFR_CONFIG_FILE to the full path of the file you just downloaded, if you'd like it to become your default configuration for Spot annexes. Be aware that condor_annex does not alter the validity period if one is set in the Spot Fleet configuration file. You should remove the references to `ValidFrom' and `ValidTo' in the JSON file to avoid confusing surprises later.

Additionally, be aware that condor_annex uses the Spot Fleet API in its ``request'' mode, which means that an annex created with Spot Fleet has the same semantics with respect to replacement as it would otherwise: if an instance terminates for any reason, including AWS taking it away to give to someone else, it is not replaced.

You must specify the number of cores (total instance weight; see above) using -slots. You may also specify -aws-spot-fleet, if you wish; doing so may make this condor_annex invocation more self-documenting. You may use other options as normal, excepting those which begin with -aws-on-demand, which indicates an option specific to on-demand instances.

6.2.3.2 Custom HTCondor Configuration

When you specify a custom configuration, you specify the full path to a configuration directory which will be copied to the instance. The customizations performed by condor_annex will be applied to a temporary copy of this directory before it is uploaded to the instance. Those customizations consist of creating two files: password_file.pl (named that way to ensure that it isn't ever accidentally treated as configuration), and 00ec2-dynamic.config. The former is a password file for use by the pool password security method, which if configured, will be used by condor_annex automatically. The latter is an HTCondor configuration file; it is named so as to sort first and make it easier to over-ride with whatever configuration you see fit.

6.2.3.3 AWS Instance User Data

HTCondor doesn't interfere with this in anyway, but if you'd like to set an instance's user data, you may do so. However, as of v8.7.2, the -user-data options don't work for on-demand instances (the default type). If you'd like to specify user data for your Spot Fleet -driven annex, you may do so in four different ways: on the command-line or from a file, and for all launch specifications or for only those launch specifications which don't already include user data. These two choices correspond to the absence or presence of a trailing -file and the absence or presence of -default immediately preceding -user-data.

A ``launch specification,'' in this context, means one of the virtual machine templates you told Spot Fleet would be an acceptable way to accomodate your resource request. This usually corresponds one-to-one with instance types, but this is not required.

6.2.3.4 Expert Mode

The man page (in section 12) lists the ``expert mode'' options.

Four of the ``expert mode'' options set the URLs used to access AWS services, not including the CloudFormation URL needed by the -setup flag. (To change the region for the -setup flag, you must change the HTCondor configuration macro ANNEX_DEFAULT_CF_URL, which you may of course change temporarily via the environment on the command line.) This can be useful if you'd like to temporarily use a different region than the one in your configuration, which defaults to us-east-1. Be sure to change all of the URLs - Lambda functions and CloudWatch events in one region don't work with instances in another region.

You may also temporarily specify a different AWS account by using the access (-aws-access-key-file) and secret key (-aws-secret-key-file) options. Regular users may have an accounting reason to do this.

The options labeled ``developers only'' control implementation details and may change without warning; they are probably best left unused unless you're a developer.


next up previous contents index
Next: 6.3 HTCondor Annex Customization Up: 6. Cloud Computing Previous: 6.1 Introduction   Contents   Index