Aside from the configuration macros (see section 6.5, below), the major way to customize condor_annex is by customizing the default disk image. Because the implementation of condor_annex varies from service to service, and that implementation determines the constraints on the disk image, this section is divided by service.
Requirements for an Annex-compatible AMI are driven by how condor_annex securely transports HTCondor configuration and security tokens to the instances; we will discuss that implementation briefly, to help you understand the requirements, even though it will hopefully never matter to you.
For on-demand or Spot instances, we begin by making a single resource request whose client token is the annex name concatenated with an underscore and then a newly-generated GUID. This construction allows us to terminate on-demand instances belonging to a particular annex (by its name), as well as discover the annex name from inside an instance.
An on-demand instance may obtain its instance ID directly from the AWS metadata server, and then ask another AWS API for that instance ID’s client token. Since GUIDs do not contain underscores, we can be certain that anything to the left of the last underscore is the annex’s name.
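The client token construction and the reverse lookup described above can be sketched as follows. This is only an illustration of the string handling, not the code condor_annex uses; the function names are hypothetical, and Python's uuid module stands in for whatever GUID generator the tool actually calls.

```python
import uuid


def make_client_token(annex_name: str) -> str:
    """Concatenate the annex name, an underscore, and a new GUID."""
    # str(uuid.uuid4()) contains only hex digits and hyphens, never underscores.
    return f"{annex_name}_{uuid.uuid4()}"


def annex_name_from_token(client_token: str) -> str:
    """Return everything to the left of the last underscore."""
    name, _guid = client_token.rsplit("_", 1)
    return name


token = make_client_token("my_annex")
print(annex_name_from_token(token))  # prints "my_annex"
```

Splitting on the last underscore, rather than the first, is what allows annex names to contain underscores themselves.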
An instance started by a Spot Fleet has a client token generated by the Spot Fleet. Instead of performing a direct lookup, a Spot Fleet instance must therefore determine which Spot Fleet started it, and then obtain that Spot Fleet’s client token. A Spot Fleet will tag an instance with the Spot Fleet’s identity after the instance starts up. This usually only takes a few minutes, but the default image waits for up to 50 minutes, since you’re already paying for the first hour anyway.
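The wait for the Spot Fleet tag amounts to a bounded polling loop. The sketch below assumes a caller-supplied lookup function (hypothetical here) that returns the Spot Fleet's identity once the tag is present and None before that; the 30-second polling interval is also an assumption, as the text only gives the 50-minute limit.

```python
import time
from typing import Callable, Optional


def wait_for_fleet_id(
    lookup: Callable[[], Optional[str]],
    timeout_s: float = 50 * 60,   # the 50-minute limit mentioned above
    interval_s: float = 30,       # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll lookup() until it returns a fleet ID or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        fleet_id = lookup()
        if fleet_id is not None:
            return fleet_id
        if clock() >= deadline:
            return None  # the tag never appeared; the caller decides what to do
        sleep(interval_s)
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting.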
At this point, the instance knows its annex’s name. This allows the instance to construct the name of the tarball it should download (config-AnnexName.tar.gz), but does not tell it from where a file with that name should be downloaded.
(Because the user data associated with a resource request is not secure, and because we want to leave the user data available for its normal usage, we can’t just encode the tarball or its location in the user data.)
The instance determines from which S3 bucket to download by asking the metadata server which role the instance is playing. (An instance without a role is unable to make use of any AWS services without acquiring valid AWS tokens through some other method.) The instance role created by the setup procedure includes permission to read files matching the pattern config-*.tar.gz from a particular private S3 bucket. If the instance finds permissions matching that pattern, it assumes that the corresponding S3 bucket is the one from which it should download, and does so; if successful, it untars the file in /etc/condor/config.d.
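The bucket discovery step can be sketched as a search through the role's permissions for a resource ending in config-*.tar.gz. The text does not say how the script reads those permissions, so this example assumes the role's policy has already been fetched and parsed into a dictionary in the standard IAM policy document layout; the function names and the bucket name are hypothetical.

```python
from typing import Optional

S3_ARN_PREFIX = "arn:aws:s3:::"
CONFIG_PATTERN = "config-*.tar.gz"


def find_config_bucket(policy: dict) -> Optional[str]:
    """Return the bucket whose permitted key pattern is config-*.tar.gz."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):      # a single statement may not be in a list
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        resources = statement.get("Resource", [])
        if isinstance(resources, str):    # a single resource may be a bare string
            resources = [resources]
        for arn in resources:
            if not arn.startswith(S3_ARN_PREFIX):
                continue
            bucket, _, key = arn[len(S3_ARN_PREFIX):].partition("/")
            if key == CONFIG_PATTERN:
                return bucket
    return None


def tarball_location(bucket: str, annex_name: str) -> str:
    """Combine the discovered bucket with the tarball name for this annex."""
    return f"s3://{bucket}/config-{annex_name}.tar.gz"


example_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/config-*.tar.gz",
    }],
}
bucket = find_config_bucket(example_policy)
print(tarball_location(bucket, "MyAnnex"))  # prints "s3://example-bucket/config-MyAnnex.tar.gz"
```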
In v8.7.1, the script executing these steps is named 49ec2-instance.sh, and is called during configuration when HTCondor first starts up.
In v8.7.2, the script executing these steps is named condor-annex-ec2, and is called during system start-up.
The HTCondor configuration and security tokens are at this point protected on the instance’s disk by the usual filesystem permissions. To prevent HTCondor jobs from using the instance’s permissions for anything, and in particular from downloading their own copy of the security tokens, the last thing the script does is use the Linux kernel firewall to forbid any non-root process from accessing the metadata server.
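One way to express such a firewall rule is with the iptables owner match, rejecting outbound packets to the metadata server unless the sending process runs as UID 0. The sketch below only builds the command line and does not run it; the exact rule the script installs may differ, and the function name is hypothetical. The address 169.254.169.254 is the standard AWS instance metadata address.

```python
from typing import List

METADATA_SERVER = "169.254.169.254"


def metadata_block_rule(address: str = METADATA_SERVER) -> List[str]:
    """Build an iptables command that rejects non-root traffic to the address."""
    return [
        "iptables", "-A", "OUTPUT",          # append to the outbound chain
        "-d", address,                       # packets destined for the metadata server
        "-m", "owner", "!", "--uid-owner", "0",  # sent by any process not owned by root
        "-j", "REJECT",
    ]


# Running the command requires root, for example via subprocess.run(rule, check=True).
rule = metadata_block_rule()
print(" ".join(rule))
```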
Thus, to work with condor_annex, an AWS AMI must:
The second item could be construed as optional, but if left unimplemented, will disable the -idle command-line option.
The default disk image implements the above as follows:
We also strongly recommend that every condor_annex disk image:
The default disk image is configured to do all of this.
Not implemented as of v8.7.8.
Not implemented as of v8.7.8.
Not implemented as of v8.7.8.