


6.1 Web Service

HTCondor's Web Service (WS) API provides a way for application developers to interact with HTCondor, without needing to utilize HTCondor's command-line tools. In keeping with the HTCondor philosophy of reliability and fault-tolerance, this API is designed to provide a simple and powerful way to interact with HTCondor. HTCondor daemons understand and implement the SOAP (Simple Object Access Protocol) XML API to provide a web service interface for HTCondor job submission and management.

To deal with the issues of reliability and fault-tolerance, a two-phase commit mechanism provides a transaction-based protocol. The following API description covers the interaction between a client using the API and both the condor_schedd and condor_collector daemons, illustrating transactions for use in job submission, queue management, and ClassAd management functions.


6.1.1 Transactions

All applications using the API to interact with the condor_schedd will need to use transactions. A transaction is an ACID unit of work (atomic, consistent, isolated, and durable). The API limits the lifetime of a transaction, and both the client (application) and the server (the condor_schedd daemon) may place a limit on the lifetime. The server reserves the right to specify a maximum duration for a transaction.

The client initiates a transaction using the beginTransaction() method. It ends the transaction with either a commit (using commitTransaction()) or an abort (using abortTransaction()).
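The begin, commit, and abort pattern described above can be sketched as a Python context manager. This is a minimal illustration under stated assumptions, not HTCondor code: FakeSchedd is a hypothetical in-memory stand-in for a toolkit-generated SOAP stub, and the helper name transaction() and the (status, transaction) return shape are likewise assumptions. Only the method names beginTransaction, commitTransaction, and abortTransaction come from this API.

```python
from contextlib import contextmanager

SUCCESS = 0  # StatusCode value for success (see Table 6.1)


class FakeSchedd:
    """Hypothetical in-memory stand-in for a generated SOAP stub."""

    def __init__(self):
        self.next_id = 1
        self.log = []  # records the calls made, for illustration only

    def beginTransaction(self, duration):
        txn = {"id": self.next_id, "duration": duration}
        self.next_id += 1
        self.log.append(("begin", txn["id"]))
        return SUCCESS, txn

    def commitTransaction(self, txn):
        self.log.append(("commit", txn["id"]))
        return SUCCESS

    def abortTransaction(self, txn):
        self.log.append(("abort", txn["id"]))
        return SUCCESS


@contextmanager
def transaction(schedd, duration):
    """Commit if the body completes; abort if the body raises."""
    status, txn = schedd.beginTransaction(duration)
    if status != SUCCESS:
        raise RuntimeError(f"beginTransaction failed with status {status}")
    try:
        yield txn
    except Exception:
        schedd.abortTransaction(txn)
        raise
    else:
        status = schedd.commitTransaction(txn)
        if status != SUCCESS:
            raise RuntimeError(f"commitTransaction failed with status {status}")
```

Wrapping the calls this way means that every code path ends the transaction explicitly, rather than leaving it to expire on the server.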

Not all operations in the API need to be performed within a transaction. Some accept a null transaction. A null transaction is a SOAP message with

<transaction xsi:type="ns1:Transaction" xsi:nil="true"/>
Often this is achieved by passing the programming language's equivalent of null in place of a transaction identifier. Some operations may have access to more information when they are used inside a transaction. For instance, a getJobAds() query would have access to the jobs that are pending in a transaction, which are not committed and therefore not visible outside of the transaction. Transactions are as ACID compliant as possible. Therefore, do not base a decision made inside a transaction on the results of a query issued outside of that transaction.
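As a rough illustration of how a language-level null maps to the nil element shown above, the following sketch builds the element with Python's standard xml.etree.ElementTree module. The function name transaction_element and the child element id used for a non-null transaction are hypothetical; the real structure of the Transaction type is defined by the WSDL, and a generated stub would normally do this serialization itself.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI)  # serialize the namespace with the xsi prefix


def transaction_element(txn_id=None):
    """Build a <transaction> element; None becomes xsi:nil="true"."""
    elem = ET.Element("transaction")
    if txn_id is None:
        # Null transaction: the call is made outside of any transaction.
        elem.set(f"{{{XSI}}}nil", "true")
    else:
        # Hypothetical layout for a real transaction identifier.
        ET.SubElement(elem, "id").text = str(txn_id)
    return elem
```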


6.1.2 Job Submission

A ClassAd is required to describe a job. The job ClassAd is submitted to the condor_schedd within a transaction using the submit() method. The createJobTemplate() method may simplify the creation of a job ClassAd. It returns an instance of a ClassAd structure that may be further modified. Necessary parts of the job ClassAd are the job attributes ClusterId and ProcId, which uniquely identify the cluster and the job within the cluster. The newCluster() method allocates and assigns (monotonically increasing) ClusterId values. Jobs may be submitted within the assigned cluster only until the newCluster() method is invoked again. The newJob() method allocates and assigns each job a (monotonically increasing) ProcId within the current cluster. Therefore, the sequence of method calls to submit a set of jobs begins with a call to newCluster(). This is followed by calls to newJob() and then submit() for each job within the cluster.

As an example, here are sample cluster and job numbers that result from the ordered calls to submission methods:

  1. A call to newCluster() assigns a ClusterId of 6.
  2. A call to newJob() assigns a ProcId of 0, as this is the first job within the cluster.
  3. A call to submit() results in a job submission numbered 6.0.
  4. A call to newJob() assigns a ProcId of 1.
  5. A call to submit() results in a job submission numbered 6.1.
  6. A call to newJob() assigns a ProcId of 2.
  7. A call to submit() results in a job submission numbered 6.2.
  8. A call to newCluster() assigns a ClusterId of 7.
  9. A call to newJob() assigns a ProcId of 0, as this is the first job within the cluster.
  10. A call to submit() results in a job submission numbered 7.0.
  11. A call to newJob() assigns a ProcId of 1.
  12. A call to submit() results in a job submission numbered 7.1.
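The numbering in the sequence above can be reproduced with a small in-memory model. This sketch is illustrative only: FakeQueue and submit_jobs are hypothetical names, transactions and job ClassAds are omitted for brevity, and the starting ClusterId of 6 is chosen simply to match the example.

```python
class FakeQueue:
    """Hypothetical in-memory model of cluster and job numbering."""

    def __init__(self, first_cluster=1):
        self._next_cluster = first_cluster
        self._current_cluster = None
        self._next_proc = 0
        self.submitted = []  # job identifiers in "ClusterId.ProcId" form

    def newCluster(self):
        self._current_cluster = self._next_cluster
        self._next_cluster += 1
        self._next_proc = 0  # ProcId numbering restarts in each new cluster
        return self._current_cluster

    def newJob(self, cluster_id):
        # Only the most recently created cluster accepts new jobs.
        if cluster_id != self._current_cluster:
            raise ValueError("cluster is not the currently active one")
        proc_id = self._next_proc
        self._next_proc += 1
        return proc_id

    def submit(self, cluster_id, proc_id):
        self.submitted.append(f"{cluster_id}.{proc_id}")


def submit_jobs(queue, count):
    """Call newCluster() once, then newJob() and submit() for each job."""
    cluster = queue.newCluster()
    for _ in range(count):
        proc = queue.newJob(cluster)
        queue.submit(cluster, proc)
    return cluster
```

Calling submit_jobs(queue, 3) and then submit_jobs(queue, 2) on a FakeQueue(first_cluster=6) walks through the same twelve calls as the list.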

A call to submit() may fail. Failure means that the job is in the queue, but it typically indicates that something needed by the job has not been sent. As a result, the job has no hope of running successfully. It is possible to recover from such a failure by resending the information that the job needs. It is also completely acceptable to abort and make another attempt. To simplify the client's effort in figuring out what the job requires, the discoverJobRequirements() method accepts a job ClassAd and returns a list of things that should be sent along with the job.


6.1.3 File Transfer

A common job submission case requires the job's executable and input files to be transferred from the machine where the application is running to the machine where the condor_schedd daemon is running. This is analogous to running condor_submit using the -spool or -remote option. The executable and input files must be sent directly to the condor_schedd daemon, which places all files in a spool location.

The two methods declareFile() and sendFile() work in tandem to transfer files to the condor_schedd daemon. The declareFile() method causes the condor_schedd daemon to create the file in its spool location, or indicate in its return value that the file already exists. This increases efficiency, as resending an existing file is a waste of resources. The sendFile() method sends base64 encoded data. sendFile() may be used to send an entire file, or chunks of files as desired.

The declareFile() method has both required and optional arguments. declareFile() requires the name of the file and its size in bytes. The optional arguments relate to hash information. A hash type of NOHASH disables file verification; the condor_schedd daemon will then not have a reliable way to determine the existence of the file being declared.
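The client-side preparation for declareFile() and sendFile() can be sketched with the Python standard library. The helper names md5_hex and chunks_for_send are hypothetical, and the hexadecimal encoding of the MD5 digest is an assumption, since the expected string encoding of the hash is not stated here. Each yielded pair corresponds to the offset and data arguments of one sendFile() call.

```python
import base64
import hashlib


def md5_hex(data: bytes) -> str:
    """MD5 digest as a hex string (the encoding is an assumption)."""
    return hashlib.md5(data).hexdigest()


def chunks_for_send(data: bytes, chunk_size: int):
    """Yield (offset, base64_text) pairs that cover data in order."""
    for offset in range(0, len(data), chunk_size):
        block = data[offset:offset + chunk_size]
        yield offset, base64.b64encode(block).decode("ascii")
```

Sending in chunks keeps each SOAP message small; a file that fits comfortably in one message can be sent with a single call at offset 0.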

Methods for retrieving files are most useful when a job is completed. Consider the categorization of the typical life-cycle for a job:

Birth:
The birth of a job begins with submit().
Childhood:
The job executes.
Middle Age:
A completed job waits to be removed. As the job enters Middle Age, its JobStatus ClassAd attribute becomes Completed (the value 4).
Old Age:
The job's information goes into the history log.

Once the job enters Middle Age, the getFile() method retrieves a file. The listSpool() method assists by providing a list of all the job's files in the spool location.
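Retrieval can mirror sending: given a file size as reported by listSpool(), a client can request successive ranges with getFile() and reassemble them. The sketch below assumes a callable that takes (name, offset, length) and returns base64 text; fetch_file and make_fake_spool are hypothetical names, and the fake spool is an in-memory stand-in rather than a real condor_schedd.

```python
import base64


def fetch_file(get_file, name, size, chunk_size=4096):
    """Reassemble a spooled file from successive getFile-style calls."""
    out = bytearray()
    offset = 0
    while offset < size:
        length = min(chunk_size, size - offset)
        out += base64.b64decode(get_file(name, offset, length))
        offset += length
    return bytes(out)


def make_fake_spool(files):
    """Hypothetical stand-in for a spool: maps file names to bytes."""
    def get_file(name, offset, length):
        block = files[name][offset:offset + length]
        return base64.b64encode(block).decode("ascii")
    return get_file
```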

The job enters Old Age by the application's use of the closeSpool() method. It causes the condor_schedd daemon to remove the job from the queue, and the job's spool files are no longer available. As there is no requirement for the application to invoke the closeSpool() method, jobs can potentially remain in the queue forever. The configuration variable SOAP_LEAVE_IN_QUEUE may mitigate this problem. When this boolean variable evaluates to False, a job enters Old Age. A reasonable example for this configuration variable is

SOAP_LEAVE_IN_QUEUE = ((JobStatus==4) && ((ServerTime - CompletionDate) < (60 * 60 * 24)))
This expression results in Old Age for a job (removed from the queue), once the job has been Middle Aged (been completed) for 24 hours.
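To make the arithmetic of the example expression concrete, here is the same condition written as a plain Python function. It only mirrors the expression for illustration; the function name leave_in_queue is hypothetical, and how the condor_schedd applies the variable to jobs that have not completed is outside this sketch.

```python
COMPLETED = 4                # JobStatus value for a completed job
DAY_SECONDS = 60 * 60 * 24   # 86400 seconds


def leave_in_queue(job_status, server_time, completion_date):
    """True while a completed job is less than 24 hours past completion."""
    return (job_status == COMPLETED
            and (server_time - completion_date) < DAY_SECONDS)
```

One hour after completion the function returns True, so the job stays in Middle Age; exactly 24 hours after completion it returns False, and the job may move on to Old Age.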


6.1.4 Implementation Details

HTCondor daemons understand and communicate using the SOAP XML protocol. An application seeking to use this protocol will require code that handles the communication. The XML WSDL (Web Services Description Language) that HTCondor implements is included with the HTCondor distribution. It is in $(RELEASE_DIR)/lib/webservice. The WSDL must be run through a toolkit to produce language-specific routines that do communication. The application is compiled with these routines.

HTCondor must be configured to enable responses to SOAP calls. Please see section 3.3.31 for definitions of the configuration variables related to the web services API. The WS interface listens on the condor_schedd daemon's command port. To obtain a list of all the condor_schedd daemons in the pool with a WS interface, issue the command:

  %  condor_status -schedd -constraint "HasSOAPInterface=?=TRUE"
With this information, a further command locates the port number to use:
  % condor_status -schedd -constraint "HasSOAPInterface=?=TRUE" -l | grep MyAddress 
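A client script may want to extract the host and port from the MyAddress line that the second command prints. The sketch below assumes the value has the form "<host:port>", optionally followed by extra parameters before the closing angle bracket; that format, the sample address, and the function name parse_my_address are assumptions made for illustration, and IPv6 addresses are not handled.

```python
import re

# Assumed line shape: MyAddress = "<host:port>" or "<host:port?params>"
_ADDR = re.compile(
    r'MyAddress\s*=\s*"<(?P<host>[^:>]+):(?P<port>\d+)[^>]*>"'
)


def parse_my_address(line):
    """Return (host, port) from a MyAddress line, or None if no match."""
    m = _ADDR.search(line)
    if m is None:
        return None
    return m.group("host"), int(m.group("port"))
```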

HTCondor's security configuration must be set up such that access is authorized for the SOAP client. See Section 3.6.7 for information on how to set the ALLOW_SOAP and DENY_SOAP configuration variables.

The API's routines can be roughly categorized into ones that deal with

  • Transactions
  • Job Submission
  • File Transfer
  • Job Management
  • ClassAd Management
  • Version Information

The routines for each of these categories are detailed below. Note that the signature provided will accurately reflect a routine's name, but that return values and parameter specification will vary according to the target programming language.


6.1.5 Get These Items Correct


6.1.6 Methods for Transaction Management

beginTransaction
Begin a transaction. A prototype is

StatusAndTransaction beginTransaction(int duration);

Parameters
  • duration The expected duration of the transaction.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values. Additionally, on success, the return value contains the new transaction.

commitTransaction
Commits a transaction. A prototype is

Status commitTransaction(Transaction transaction);

Parameters
  • transaction The transaction to be committed.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values.

abortTransaction
Abort a transaction. A prototype is

Status abortTransaction(Transaction transaction);

Parameters
  • transaction The transaction to be aborted.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values.

extendTransaction
Request an extension in duration for a specific transaction. A prototype is

StatusAndTransaction extendTransaction(Transaction transaction, int duration);

Parameters
  • transaction The transaction to be extended.
  • duration The duration of the extension.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values. Additionally, on success, the return value contains the transaction with the extended duration.


6.1.7 Methods for Job Submission

submit
Submit a job. A prototype is

StatusAndRequirements submit(Transaction transaction, int clusterId, int jobId, ClassAd jobAd);

Parameters
  • transaction The transaction in which the submission takes place.
  • clusterId The cluster identifier.
  • jobId The job identifier.
  • jobAd The ClassAd describing the job. Creation of this ClassAd can be simplified with createJobTemplate().
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values. Additionally, the return value contains the job's requirements.

createJobTemplate
Request a job ClassAd, given some of the job requirements. This job ClassAd will be suitable for use when submitting the job. Note that the job attribute NTDomain is not set by this function, but must be set for jobs that will execute on Windows platforms. A prototype is

StatusAndClassAd createJobTemplate(int clusterId, int jobId, String owner, UniverseType type, String command, String arguments, String requirements);

Parameters
  • clusterId The cluster identifier.
  • jobId The job identifier.
  • owner The name to be associated with the job.
  • type The universe under which the job will run, where type can be one of the following:

    enum UniverseType { STANDARD = 1, VANILLA = 5, SCHEDULER = 7, MPI = 8, GRID = 9, JAVA = 10, PARALLEL = 11, LOCALUNIVERSE = 12, VM = 13 };

  • command The command to execute once the job has started.
  • arguments The command-line arguments for command.
  • requirements The requirements expression for the job. For further details and examples of the expression syntax, please refer to section 4.1.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values.

discoverJobRequirements
Discover the requirements of a job, given a ClassAd. This may be helpful in determining what should be sent along with the job. A prototype is

StatusAndRequirements discoverJobRequirements(ClassAd jobAd);

Parameters
  • jobAd The ClassAd of the job.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values. Additionally, on success, the return value contains the job's requirements.


6.1.8 Methods for File Transfer

declareFile
Declare a file that may be used by a job. A prototype is

Status declareFile(Transaction transaction, int clusterId, int jobId, String name, int size, HashType hashType, String hash);

Parameters
  • transaction The transaction in which this file is declared.
  • clusterId The cluster identifier.
  • jobId An identifier of the job that will use the file.
  • name The name of the file.
  • size The size of the file.
  • hashType The type of hash mechanism used to verify file integrity, where hashType can be one of the following:

    enum HashType { NOHASH, MD5HASH };

  • hash An optionally zero-length string encoding of the file hash.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values.

sendFile
Send a file that a job may use. A prototype is

Status sendFile(Transaction transaction, int clusterId, int jobId, String name, int offset, Base64 data);

Parameters
  • transaction The transaction in which this file is sent.
  • clusterId The cluster identifier.
  • jobId An identifier of the job that will use the file.
  • name The name of the file being sent.
  • offset The starting offset within the file being sent.
  • length The length from the offset to send.
  • data The data block being sent. This could be the entire file or a sub-section of the file as defined by offset and length.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values.

getFile
Get a file from a job's spool. A prototype is

StatusAndBase64 getFile(Transaction transaction, int clusterId, int jobId, String name, int offset, int length);

Parameters
  • transaction An optionally nullable transaction, meaning this call does not need to occur in a transaction.
  • clusterId The cluster in which to search.
  • jobId The job identifier the file is associated with.
  • name The name of the file to retrieve.
  • offset The starting offset within the file being retrieved.
  • length The length from the offset to retrieve.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values. Additionally, on success, the return value contains the file or a sub-section of the file as defined by offset and length.

closeSpool
Close a job's spool. All the files in the job's spool can be deleted. A prototype is

Status closeSpool(Transaction transaction, int clusterId, int jobId);

Parameters
  • transaction An optionally nullable transaction, meaning this call does not need to occur in a transaction.
  • clusterId The cluster identifier which the job is associated with.
  • jobId The job identifier for which the spool is to be removed.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values.

listSpool
List the files in a job's spool. A prototype is

StatusAndFileInfoArray listSpool(Transaction transaction, int clusterId, int jobId);

Parameters
  • transaction An optionally nullable transaction, meaning this call does not need to occur in a transaction.
  • clusterId The cluster in which to search.
  • jobId The job identifier to search for.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values. Additionally, on success, the return value contains a list of files and their respective sizes.


6.1.9 Methods for Job Management

newCluster
Create a new job cluster. A prototype is

StatusAndInt newCluster(Transaction transaction);

Parameters
  • transaction The transaction in which this cluster is created.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values. Additionally, on success, the return value contains the cluster id.

removeCluster
Remove a job cluster, and all the jobs within it. A prototype is

Status removeCluster(Transaction transaction, int clusterId, String reason);

Parameters
  • transaction An optionally nullable transaction, meaning this call does not need to occur in a transaction.
  • clusterId The cluster to remove.
  • reason The reason for the removal.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values.

newJob
Creates a new job within the most recently created job cluster. A prototype is

StatusAndInt newJob(Transaction transaction, int clusterId);

Parameters
  • transaction The transaction in which this job is created.
  • clusterId The cluster identifier of the most recently created cluster.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values. Additionally, on success, the return value contains the job id.

removeJob
Remove a job, regardless of the job's state. A prototype is

Status removeJob(Transaction transaction, int clusterId, int jobId, String reason, boolean forceRemoval);

Parameters
  • transaction An optionally nullable transaction, meaning this call does not need to occur in a transaction.
  • clusterId The cluster identifier to search in.
  • jobId The job identifier to search for.
  • reason The reason for the removal.
  • forceRemoval Set if the job should be forcibly removed.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values.

holdJob
Put a job into the Hold state, regardless of the job's current state. A prototype is

Status holdJob(Transaction transaction, int clusterId, int jobId, String reason, boolean emailUser, boolean emailAdmin, boolean systemHold);

Parameters
  • transaction An optionally nullable transaction, meaning this call does not need to occur in a transaction.
  • clusterId The cluster in which to search.
  • jobId The job identifier to search for.
  • reason The reason for the hold.
  • emailUser Set if the submitting user should be notified.
  • emailAdmin Set if the administrator should be notified.
  • systemHold Set if the job should be put on hold.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values.

releaseJob
Release a job that has been in the Hold state. A prototype is

Status releaseJob(Transaction transaction, int clusterId, int jobId, String reason, boolean emailUser, boolean emailAdmin);

Parameters
  • transaction An optionally nullable transaction, meaning this call does not need to occur in a transaction.
  • clusterId The cluster in which to search.
  • jobId The job identifier to search for.
  • reason The reason for the release.
  • emailUser Set if the submitting user should be notified.
  • emailAdmin Set if the administrator should be notified.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values.

getJobAds
A prototype is

StatusAndClassAdArray getJobAds(Transaction transaction, String constraint);

Parameters
  • transaction An optionally nullable transaction, meaning this call does not need to occur in a transaction.
  • constraint A string constraining the number of ClassAds to return. For further details and examples of the constraint syntax, please refer to section 4.1.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values. Additionally, on success, the return value contains all job ClassAds matching the given constraint.

getJobAd
Finds a specific job ClassAd.

This method returns much the same result as the first element of the array returned by

getJobAds(transaction, "(ClusterId==clusterId && ProcId==jobId)")

A prototype is

StatusAndClassAd getJobAd(Transaction transaction, int clusterId, int jobId);

Parameters
  • transaction An optionally nullable transaction, meaning this call does not need to occur in a transaction.
  • clusterId The cluster in which to search.
  • jobId The job identifier to search for.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values. Additionally, on success, the return value contains the requested ClassAd.

requestReschedule
Request a condor_reschedule from the condor_schedd daemon. A prototype is

Status requestReschedule();

Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values.


6.1.10 Methods for ClassAd Management

insertAd
A prototype is

Status insertAd(ClassAdType type, ClassAdStruct ad);

Parameters
  • type The type of ClassAd to insert, where type can be one of the following:

    enum ClassAdType { STARTD_AD_TYPE, QUILL_AD_TYPE, SCHEDD_AD_TYPE, SUBMITTOR_AD_TYPE, LICENSE_AD_TYPE, MASTER_AD_TYPE, CKPTSRVR_AD_TYPE, COLLECTOR_AD_TYPE, STORAGE_AD_TYPE, NEGOTIATOR_AD_TYPE, HAD_AD_TYPE, GENERIC_AD_TYPE };

  • ad The ClassAd to insert.
Return Value
If the function succeeds, the return value is SUCCESS; otherwise, see StatusCode for valid return values.

queryStartdAds
A prototype is

ClassAdArray queryStartdAds(String constraint);

Parameters
  • constraint A string constraining the number of ClassAds to return. For further details and examples of the constraint syntax, please refer to section 4.1.
Return Value
A list of all the condor_startd ClassAds matching the given constraint.

queryScheddAds
A prototype is

ClassAdArray queryScheddAds(String constraint);

Parameters
  • constraint A string constraining the number of ClassAds to return. For further details and examples of the constraint syntax, please refer to section 4.1.
Return Value
A list of all the condor_schedd ClassAds matching the given constraint.

queryMasterAds
A prototype is

ClassAdArray queryMasterAds(String constraint);

Parameters
  • constraint A string constraining the number of ClassAds to return. For further details and examples of the constraint syntax, please refer to section 4.1.
Return Value
A list of all the condor_master ClassAds matching the given constraint.

querySubmittorAds
A prototype is

ClassAdArray querySubmittorAds(String constraint);

Parameters
  • constraint A string constraining the number of ClassAds to return. For further details and examples of the constraint syntax, please refer to section 4.1.
Return Value
A list of all the submitter ClassAds matching the given constraint.

queryLicenseAds
A prototype is

ClassAdArray queryLicenseAds(String constraint);

Parameters
  • constraint A string constraining the number of ClassAds to return. For further details and examples of the constraint syntax, please refer to section 4.1.
Return Value
A list of all the license ClassAds matching the given constraint.

queryStorageAds
A prototype is

ClassAdArray queryStorageAds(String constraint);

Parameters
  • constraint A string constraining the number of ClassAds to return. For further details and examples of the constraint syntax, please refer to section 4.1.
Return Value
A list of all the storage ClassAds matching the given constraint.

queryAnyAds
A prototype is

ClassAdArray queryAnyAds(String constraint);

Parameters
  • constraint A string constraining the number of ClassAds to return. For further details and examples of the constraint syntax, please refer to section 4.1.
Return Value
A list of all the ClassAds matching the given constraint.


6.1.11 Methods for Version Information

getVersionString
A prototype is

StatusAndString getVersionString();

Return Value
Returns the HTCondor version as a string.

getPlatformString
A prototype is

StatusAndString getPlatformString();

Return Value
Returns, as a string, information about the platform on which HTCondor is running.


6.1.12 Common Data Structures

Many methods return a status. Table 6.1 lists and defines the StatusCode return values.


Table 6.1: StatusCode definitions
Value  Identifier          Definition
0      SUCCESS             All OK
1      FAIL                An error occurred that is not specific to another error code
2      INVALIDTRANSACTION  No such transaction exists
3      UNKNOWNCLUSTER      The specified cluster is not the currently active one
4      UNKNOWNJOB          The specified job does not exist within the specified cluster
5      UNKNOWNFILE
6      INCOMPLETE
7      INVALIDOFFSET
8      ALREADYEXISTS       For this job, the specified file already exists
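Client code often benefits from mirroring Table 6.1 as a named enumeration, so that a status is reported by identifier rather than by number. The following Python sketch copies the values and identifiers from the table; the helper name check is hypothetical, and a toolkit-generated stub may already provide an equivalent type.

```python
from enum import IntEnum


class StatusCode(IntEnum):
    """Values and identifiers as listed in Table 6.1."""
    SUCCESS = 0
    FAIL = 1
    INVALIDTRANSACTION = 2
    UNKNOWNCLUSTER = 3
    UNKNOWNJOB = 4
    UNKNOWNFILE = 5
    INCOMPLETE = 6
    INVALIDOFFSET = 7
    ALREADYEXISTS = 8


def check(status):
    """Raise on any status other than SUCCESS; return the code otherwise."""
    code = StatusCode(status)
    if code is not StatusCode.SUCCESS:
        raise RuntimeError(f"call failed: {code.name} ({code.value})")
    return code
```

A caller of declareFile() would likely treat ALREADYEXISTS as information rather than an error, so a real client may want to handle that code before calling a helper such as check.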



htcondor-admin@cs.wisc.edu