Contents

1 Overview
 1.1 High-Throughput Computing (HTC) and its Requirements
 1.2 HTCondor’s Power
 1.3 Exceptional Features
 1.4 Current Limitations
 1.5 Availability
 1.6 Contributions and Acknowledgments
 1.7 Contact Information
 1.8 Privacy Notice
2 Users’ Manual
 2.1 Welcome to HTCondor
 2.2 Introduction
 2.3 Matchmaking with ClassAds
  2.3.1 Inspecting Machine ClassAds with condor_status
 2.4 Running a Job: the Steps To Take
  2.4.1 Choosing an HTCondor Universe
 2.5 Submitting a Job
  2.5.1 Sample submit description files
  2.5.2 Using the Power and Flexibility of the Queue Command
  2.5.3 Variables in the Submit Description File
  2.5.4 Including Submit Commands Defined Elsewhere
  2.5.5 Using Conditionals in the Submit Description File
  2.5.6 Function Macros in the Submit Description File
  2.5.7 About Requirements and Rank
  2.5.8 Submitting Jobs Using a Shared File System
  2.5.9 Submitting Jobs Without a Shared File System: HTCondor’s File Transfer Mechanism
  2.5.10 Environment Variables
  2.5.11 Heterogeneous Submit: Execution on Differing Architectures
  2.5.12 Jobs That Require GPUs
  2.5.13 Interactive Jobs
 2.6 Managing a Job
  2.6.1 Checking on the progress of jobs
  2.6.2 Removing a job from the queue
  2.6.3 Placing a job on hold
  2.6.4 Changing the priority of jobs
  2.6.5 Why is the job not running?
  2.6.6 Job in the Hold State
  2.6.7 In the Job Event Log File
  2.6.8 Job Completion
 2.7 Priorities and Preemption
  2.7.1 Job Priority
  2.7.2 User priority
  2.7.3 Details About How HTCondor Jobs Vacate Machines
 2.8 Java Applications
  2.8.1 A Simple Example Java Application
  2.8.2 Less Simple Java Specifications
  2.8.3 Chirp I/O
 2.9 Parallel Applications (Including MPI Applications)
  2.9.1 How Parallel Jobs Run
  2.9.2 Parallel Jobs and the Dedicated Scheduler
  2.9.3 Submission Examples
  2.9.4 MPI Applications Within HTCondor’s Vanilla Universe
 2.10 DAGMan Applications
  2.10.1 DAGMan Terminology
  2.10.2 The DAG Input File: Basic Commands
  2.10.3 Command Order
  2.10.4 Node Job Submit File Contents
  2.10.5 DAG Submission
  2.10.6 File Paths in DAGs
  2.10.7 DAG Monitoring and DAG Removal
  2.10.8 Suspending a Running DAG
  2.10.9 Advanced Features of DAGMan
  2.10.10 The Rescue DAG
  2.10.11 DAG Recovery
  2.10.12 Visualizing DAGs with dot
  2.10.13 Capturing the Status of Nodes in a File
  2.10.14 A Machine-Readable Event History, the jobstate.log File
  2.10.15 Status Information for the DAG in a ClassAd
  2.10.16 Utilizing the Power of DAGMan for Large Numbers of Jobs
  2.10.17 Workflow Metrics
  2.10.18 DAGMan and Accounting Groups
 2.11 Virtual Machine Applications
  2.11.1 The Submit Description File
  2.11.2 Checkpoints
  2.11.3 Disk Images
  2.11.4 Job Completion in the vm Universe
  2.11.5 Failures to Launch
 2.12 Docker Universe Applications
 2.13 Time Scheduling for Job Execution
  2.13.1 Job Deferral
  2.13.2 CronTab Scheduling
 2.14 Special Environment Considerations
  2.14.1 AFS
  2.14.2 NFS
  2.14.3 HTCondor Daemons That Do Not Run as root
  2.14.4 Job Leases
 2.15 Potential Problems
  2.15.1 Renaming of argv[0]
3 Administrators’ Manual
 3.1 Introduction
  3.1.1 The Different Roles a Machine Can Play
  3.1.2 The HTCondor Daemons
 3.2 Installation, Start Up, Shut Down, and Reconfiguration
  3.2.1 Obtaining the HTCondor Software
  3.2.2 Installation on Unix
  3.2.3 Installation on Windows
  3.2.4 Upgrading – Installing a New Version on an Existing Pool
  3.2.5 Shutting Down and Restarting an HTCondor Pool
  3.2.6 Reconfiguring an HTCondor Pool
 3.3 Introduction to Configuration
  3.3.1 HTCondor Configuration Files
  3.3.2 Ordered Evaluation to Set the Configuration
  3.3.3 Configuration File Macros
  3.3.4 Comments and Line Continuations
  3.3.5 Multi-Line Values
  3.3.6 Executing a Program to Produce Configuration Macros
  3.3.7 Including Configuration from Elsewhere
  3.3.8 Reporting Errors and Warnings
  3.3.9 Conditionals in Configuration
  3.3.10 Function Macros in Configuration
  3.3.11 Macros That Will Require a Restart When Changed
  3.3.12 Pre-Defined Macros
 3.4 Configuration Templates
  3.4.1 Configuration Templates: Using Predefined Sets of Configuration
  3.4.2 Available Configuration Templates
  3.4.3 Configuration Template Transition Syntax
  3.4.4 Configuration Template Examples
 3.5 Configuration Macros
  3.5.1 HTCondor-wide Configuration File Entries
  3.5.2 Daemon Logging Configuration File Entries
  3.5.3 DaemonCore Configuration File Entries
  3.5.4 Network-Related Configuration File Entries
  3.5.5 Shared File System Configuration File Macros
  3.5.6 Checkpoint Server Configuration File Macros
  3.5.7 condor_master Configuration File Macros
  3.5.8 condor_startd Configuration File Macros
  3.5.9 condor_schedd Configuration File Entries
  3.5.10 condor_shadow Configuration File Entries
  3.5.11 condor_starter Configuration File Entries
  3.5.12 condor_submit Configuration File Entries
  3.5.13 condor_preen Configuration File Entries
  3.5.14 condor_collector Configuration File Entries
  3.5.15 condor_negotiator Configuration File Entries
  3.5.16 condor_procd Configuration File Macros
  3.5.17 condor_credd Configuration File Macros
  3.5.18 condor_gridmanager Configuration File Entries
  3.5.19 condor_job_router Configuration File Entries
  3.5.20 condor_lease_manager Configuration File Entries
  3.5.21 Grid Monitor Configuration File Entries
  3.5.22 Configuration File Entries Relating to Grid Usage
  3.5.23 Configuration File Entries for DAGMan
  3.5.24 Configuration File Entries Relating to Security
  3.5.25 Configuration File Entries Relating to Virtual Machines
  3.5.26 Configuration File Entries Relating to High Availability
  3.5.27 MyProxy Configuration File Macros
  3.5.28 Configuration File Entries Relating to condor_ssh_to_job
  3.5.29 condor_rooster Configuration File Macros
  3.5.30 condor_shared_port Configuration File Macros
  3.5.31 Configuration File Entries Relating to Hooks
  3.5.32 Configuration File Entries Only for Windows Platforms
  3.5.33 condor_defrag Configuration File Macros
  3.5.34 condor_gangliad Configuration File Macros
  3.5.35 condor_annex Configuration File Macros
 3.6 User Priorities and Negotiation
  3.6.1 Real User Priority (RUP)
  3.6.2 Effective User Priority (EUP)
  3.6.3 Priorities in Negotiation and Preemption
  3.6.4 Priority Calculation
  3.6.5 Negotiation
  3.6.6 The Layperson’s Description of the Pie Spin and Pie Slice
  3.6.7 Group Accounting
  3.6.8 Accounting Groups with Hierarchical Group Quotas
 3.7 Policy Configuration for Execute Hosts and for Submit Hosts
  3.7.1 condor_startd Policy Configuration
  3.7.2 condor_schedd Policy Configuration
 3.8 Security
  3.8.1 HTCondor’s Security Model
  3.8.2 Security Negotiation
  3.8.3 Authentication
  3.8.4 The Unified Map File for Authentication
  3.8.5 Encryption
  3.8.6 Integrity
  3.8.7 Authorization
  3.8.8 Security Sessions
  3.8.9 Host-Based Security in HTCondor
  3.8.10 Examples of Security Configuration
  3.8.11 Changing the Security Configuration
  3.8.12 Using HTCondor w/ Firewalls, Private Networks, and NATs
  3.8.13 User Accounts in HTCondor on Unix Platforms
 3.9 Networking (includes sections on Port Usage and CCB)
  3.9.1 Port Usage in HTCondor
  3.9.2 Reducing Port Usage with the condor_shared_port Daemon
  3.9.3 Configuring HTCondor for Machines With Multiple Network Interfaces
  3.9.4 HTCondor Connection Brokering (CCB)
  3.9.5 Using TCP to Send Updates to the condor_collector
  3.9.6 Running HTCondor on an IPv6 Network Stack
 3.10 The Checkpoint Server
  3.10.1 Preparing to Install a Checkpoint Server
  3.10.2 Installing the Checkpoint Server Module
  3.10.3 Configuring the Pool to Use Multiple Checkpoint Servers
  3.10.4 Checkpoint Server Domains
 3.11 DaemonCore
  3.11.1 DaemonCore and Unix signals
  3.11.2 DaemonCore and Command-line Arguments
 3.12 Monitoring
  3.12.1 Ganglia
  3.12.2 Absent ClassAds
 3.13 The High Availability of Daemons
  3.13.1 High Availability of the Job Queue
  3.13.2 High Availability of the Central Manager
 3.14 Setting Up for Special Environments
  3.14.1 Using HTCondor with AFS
  3.14.2 Enabling the Transfer of Files Specified by a URL
  3.14.3 Enabling the Transfer of Public Input Files over HTTP
  3.14.4 Configuring HTCondor for Multiple Platforms
  3.14.5 Full Installation of condor_compile
  3.14.6 The condor_kbdd
  3.14.7 Configuring The HTCondorView Server
  3.14.8 Running HTCondor Jobs within a Virtual Machine
  3.14.9 HTCondor’s Dedicated Scheduling
  3.14.10 Configuring HTCondor for Running Backfill Jobs
  3.14.11 Per Job PID Namespaces
  3.14.12 Group ID-Based Process Tracking
  3.14.13 Cgroup-Based Process Tracking
  3.14.14 Limiting Resource Usage with a User Job Wrapper
  3.14.15 Limiting Resource Usage Using Cgroups
  3.14.16 Concurrency Limits
 3.15 Java Support Installation
 3.16 Setting Up the VM and Docker Universes
  3.16.1 The VM Universe
  3.16.2 The Docker Universe
 3.17 Singularity Support
 3.18 Power Management
  3.18.1 Entering a Low Power State
  3.18.2 Returning From a Low Power State
  3.18.3 Keeping a ClassAd for a Hibernating Machine
  3.18.4 Linux Platform Details
  3.18.5 Windows Platform Details
4 Miscellaneous Concepts
 4.1 HTCondor’s ClassAd Mechanism
  4.1.1 ClassAds: Old and New
  4.1.2 Old ClassAd Syntax
  4.1.3 Old ClassAd Evaluation Semantics
  4.1.4 Old ClassAds in the HTCondor System
  4.1.5 Extending ClassAds with User-written Functions
 4.2 HTCondor’s Checkpoint Mechanism
  4.2.1 Standalone Checkpoint Mechanism
  4.2.2 Checkpoint Safety
  4.2.3 Checkpoint Warnings
  4.2.4 Checkpoint Library Interface
 4.3 Computing On Demand (COD)
  4.3.1 Overview of How COD Works
  4.3.2 Authorizing Users to Create and Manage COD Claims
  4.3.3 Defining a COD Application
  4.3.4 Managing COD Resource Claims
  4.3.5 Limitations of COD Support in HTCondor
 4.4 Hooks
  4.4.1 Job Hooks That Fetch Work
  4.4.2 Hooks for a Job Router
  4.4.3 Daemon ClassAd Hooks
 4.5 Logging in HTCondor
  4.5.1 Job and Daemon Logs
  4.5.2 DAGMan Logs
5 Grid Computing
 5.1 Introduction
 5.2 Connecting HTCondor Pools with Flocking
  5.2.1 Flocking Configuration
  5.2.2 Job Considerations
 5.3 The Grid Universe
  5.3.1 HTCondor-C, The condor Grid Type
  5.3.2 HTCondor-G, the gt2, and gt5 Grid Types
  5.3.3 The nordugrid Grid Type
  5.3.4 The unicore Grid Type
  5.3.5 The batch Grid Type (for PBS, LSF, SGE, and SLURM)
  5.3.6 The EC2 Grid Type
  5.3.7 The GCE Grid Type
  5.3.8 The Azure Grid Type
  5.3.9 The cream Grid Type
  5.3.10 The BOINC Grid Type
  5.3.11 Matchmaking in the Grid Universe
 5.4 The HTCondor Job Router
  5.4.1 Routing Mechanism
  5.4.2 Job Submission with Job Routing Capability
  5.4.3 An Example Configuration
  5.4.4 Routing Table Entry ClassAd Attributes
  5.4.5 Example: constructing the routing table from ReSS
6 Cloud Computing
 6.1 Introduction
  6.1.1 Use Case: Deadlines
  6.1.2 Use Case: Capabilities
  6.1.3 Use Case: Capacities
 6.2 HTCondor Annex User’s Guide
  6.2.1 Considerations and Limitations
  6.2.2 Basic Usage
  6.2.3 Start an Annex
  6.2.4 Monitor your Annex
  6.2.5 Run a Job
  6.2.6 Stop an Annex
  6.2.7 Using Different or Multiple AWS Regions
  6.2.8 Advanced Usage
 6.3 Using condor_annex for the First Time
  6.3.1 Install a Personal HTCondor
  6.3.2 Prepare your AWS account
  6.3.3 Configure condor_annex
 6.4 HTCondor Annex Customization Guide
  6.4.1 Amazon Web Services
  6.4.2 Azure
  6.4.3 Google Cloud Platform
 6.5 HTCondor Annex Configuration
  6.5.1 User Settings
  6.5.2 Logging
  6.5.3 Expert Settings
  6.5.4 Developer Settings
7 Application Programming Interfaces (APIs)
 7.1 Python Bindings
  7.1.1 htcondor Module
  7.1.2 Sample Code using the htcondor Python Module
  7.1.3 ClassAd Module
  7.1.4 Sample Code using the classad Module
 7.2 Chirp
 7.3 The HTCondor User and Job Log Reader API
  7.3.1 Constants and Enumerated Types
  7.3.2 Constructors and Destructors
  7.3.3 Initializers
  7.3.4 Primary Methods
  7.3.5 Accessors
  7.3.6 Methods for saving and restoring persistent reader state
  7.3.7 Save state to persistent storage
  7.3.8 Restore state from persistent storage
  7.3.9 API Reference
  7.3.10 Access to the persistent state data
  7.3.11 Future persistence API
 7.4 The Command Line Interface
 7.5 The DRMAA API
  7.5.1 Implementation Details
8 Platform-Specific Information
 8.1 Linux
  8.1.1 Linux Address Space Randomization
 8.2 Microsoft Windows
  8.2.1 Limitations under Windows
  8.2.2 Supported Features under Windows
  8.2.3 Secure Password Storage
  8.2.4 Executing Jobs as the Submitting User
  8.2.5 The condor_credd Daemon
  8.2.6 Executing Jobs with the User’s Profile Loaded
  8.2.7 Using Windows Scripts as Job Executables
  8.2.8 How HTCondor for Windows Starts and Stops a Job
  8.2.9 Security Considerations in HTCondor for Windows
  8.2.10 Network files and HTCondor
  8.2.11 Interoperability between HTCondor for Unix and HTCondor for Windows
  8.2.12 Some differences between HTCondor for Unix -vs- HTCondor for Windows
 8.3 Macintosh OS X
9 Frequently Asked Questions (FAQ)
10 Contrib and Source Modules
 10.1 Introduction
 10.2 The HTCondorView Client Contrib Module
  10.2.1 Step-by-Step Installation of the HTCondorView Client
 10.3 Job Monitor/Log Viewer
  10.3.1 Transition States
  10.3.2 Events
  10.3.3 Selecting Jobs
  10.3.4 Zooming
  10.3.5 Keyboard and Mouse Shortcuts
11 Version History and Release Notes
 11.1 Introduction to HTCondor Versions
  11.1.1 HTCondor Version Number Scheme
  11.1.2 The Stable Release Series
  11.1.3 The Development Release Series
 11.2 Development Release Series 8.9
 11.3 Upgrading from the 8.6 series to the 8.8 series of HTCondor
 11.4 Stable Release Series 8.8
 11.5 Development Release Series 8.7
 11.6 Stable Release Series 8.6
12 Command Reference Manual (man pages)
 bosco_cluster
 bosco_findplatform
 bosco_install
 bosco_ssh_start
 bosco_start
 bosco_stop
 bosco_uninstall
 condor_advertise
 condor_annex
 condor_check_userlogs
 condor_checkpoint
 condor_chirp
 condor_cod
 condor_compile
 condor_config_val
 condor_configure
 condor_continue
 condor_convert_history
 condor_dagman
 condor_dagman_metrics_reporter
 condor_drain
 condor_fetchlog
 condor_findhost
 condor_gather_info
 condor_gpu_discovery
 condor_history
 condor_hold
 condor_install
 condor_job_router_info
 condor_master
 condor_now
 condor_off
 condor_on
 condor_ping
 condor_pool_job_report
 condor_power
 condor_preen
 condor_prio
 condor_procd
 condor_q
 condor_qedit
 condor_qsub
 condor_reconfig
 condor_release
 condor_reschedule
 condor_restart
 condor_rm
 condor_rmdir
 condor_router_history
 condor_router_q
 condor_router_rm
 condor_run
 condor_set_shutdown
 condor_ssh_to_job
 condor_sos
 condor_stats
 condor_status
 condor_store_cred
 condor_submit
 condor_submit_dag
 condor_suspend
 condor_tail
 condor_top
 condor_transfer_data
 condor_transform_ads
 condor_update_machine_ad
 condor_updates_stats
 condor_urlfetch
 condor_userlog
 condor_userprio
 condor_vacate
 condor_vacate_job
 condor_version
 condor_wait
 condor_who
 gidd_alloc
 procd_ctl
A ClassAd Attributes
 A.1 ClassAd Types
 A.2 Job ClassAd Attributes
 A.3 Machine ClassAd Attributes
 A.4 DaemonMaster ClassAd Attributes
 A.5 Scheduler ClassAd Attributes
 A.6 Negotiator ClassAd Attributes
 A.7 Submitter ClassAd Attributes
 A.8 Defrag ClassAd Attributes
 A.9 Collector ClassAd Attributes
 A.10 ClassAd Attributes Added by the condor_collector
 A.11 DaemonCore Statistics Attributes
B Codes and Other Needed Values
 B.1 condor_shadow Exit Codes
 B.2 Job Event Log Codes
 B.3 Well-known Port Numbers
 B.4 DaemonCore Command Numbers
 B.5 DaemonCore Daemon Exit Codes

LICENSING AND COPYRIGHT

HTCondor is released under the Apache License, Version 2.0.

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

Copyright © 1990-2019 Center for High Throughput Computing, Computer Sciences Department, University of Wisconsin-Madison, WI.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

For complete information and additional license notices see

http://htcondor.org/license.html
