Condor - High Throughput Computing

Condor Pool Goodput

Goodput is the portion of allocation time during which an application uses a remote workstation to make forward progress. Goodput can be significantly less than total allocation time: an application cannot use the workstation while it is waiting for the network, and forward progress is lost when an application must roll back to an earlier state after a failure.

Goodput = Throughput - Network Wait Time - Roll-back

Network Wait Time includes:

Roll-back occurs when the application does not successfully write a checkpoint when preempted.
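The accounting in the formula above can be illustrated with a small worked example. This is only a sketch: it assumes the first term of the formula is the total allocation time and that all three terms are measured in seconds. The `Allocation` class, its field names, and the sample numbers are hypothetical and are not part of Condor.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """One allocation of a remote workstation (hypothetical record layout)."""
    allocated_s: float     # total time the workstation was allocated
    network_wait_s: float  # time the application spent waiting for the network
    rollback_s: float      # progress lost by rolling back to an earlier state

    def goodput_s(self) -> float:
        # Goodput = allocation time - network wait time - roll-back
        return self.allocated_s - self.network_wait_s - self.rollback_s


def goodput_fraction(a: Allocation) -> float:
    """Share of the allocation that produced forward progress."""
    if a.allocated_s <= 0:
        return 0.0
    return a.goodput_s() / a.allocated_s


# Made-up numbers: a one-hour allocation with 5 minutes of network wait
# and 10 minutes of work lost to a failed checkpoint.
example = Allocation(allocated_s=3600.0, network_wait_s=300.0, rollback_s=600.0)
print(example.goodput_s(), goodput_fraction(example))
```

With these sample values, 45 of the 60 allocated minutes count as goodput.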

We published a paper on this subject, "Improving Goodput by Co-scheduling CPU and Network Capacity," in the International Journal of High Performance Computing Applications, Volume 13(3), Fall 1999.

Checkpoint Server Statistics

Statistics maintained by the checkpoint server are also useful for monitoring goodput. The checkpoint server maintains a record of all attempted checkpoint transfers. This record is used to report network usage by the system, including success rate and network throughput.
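As an illustration of how success rate and network throughput could be derived from such a record, here is a minimal sketch. The `TransferRecord` layout is an assumption made for this example and is not the checkpoint server's actual log format. Throughput is computed over successful transfers only, which is one of several reasonable definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TransferRecord:
    """One attempted checkpoint transfer (hypothetical fields)."""
    bytes_sent: int
    duration_s: float
    succeeded: bool


def success_rate(records: List[TransferRecord]) -> float:
    """Fraction of attempted transfers that completed successfully."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.succeeded) / len(records)


def throughput_bytes_per_s(records: List[TransferRecord]) -> float:
    """Bytes moved per second, counting successful transfers only."""
    ok = [r for r in records if r.succeeded]
    total_time = sum(r.duration_s for r in ok)
    if total_time == 0:
        return 0.0
    return sum(r.bytes_sent for r in ok) / total_time


# Made-up sample log: three successful transfers and one failure.
log = [
    TransferRecord(100_000_000, 10.0, True),
    TransferRecord(50_000_000, 5.0, True),
    TransferRecord(20_000_000, 8.0, False),
    TransferRecord(150_000_000, 15.0, True),
]
print(success_rate(log), throughput_bytes_per_s(log))
```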

Matchmaker Statistics

The matchmaker initiates all job placements in the Condor pool. It maintains a record of every match it performs, including an estimate of the placement cost of each job.

Idle Machine Statistics

Condor monitors keyboard activity on all of the machines it manages. Statistics about when and how long machines are idle can influence goodput scheduling decisions. For example, if machines are mostly idle for short periods of time, checkpointing plays an increased role in the goodput of the pool.
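One way to summarize idle-period data is sketched below. It assumes idle periods are available as `(start, end)` pairs in seconds. The function names, the 15-minute threshold, and the sample data are hypothetical, chosen only to show how a pool dominated by short idle periods might be detected.

```python
from statistics import median
from typing import List, Tuple

IdlePeriod = Tuple[float, float]  # (start_s, end_s), an assumed representation


def idle_durations(periods: List[IdlePeriod]) -> List[float]:
    """Length of each idle period in seconds."""
    return [end - start for start, end in periods]


def fraction_shorter_than(periods: List[IdlePeriod], threshold_s: float) -> float:
    """Share of idle periods shorter than threshold_s.

    A high value suggests jobs are preempted often, so checkpointing
    has a larger influence on the goodput of the pool.
    """
    durations = idle_durations(periods)
    if not durations:
        return 0.0
    return sum(1 for d in durations if d < threshold_s) / len(durations)


# Made-up sample: three short idle periods and one long one.
periods = [(0, 600), (1000, 1300), (2000, 9200), (10000, 10240)]
print(median(idle_durations(periods)))
print(fraction_shorter_than(periods, threshold_s=900))
```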

Representative Application Statistics

One method we use to monitor goodput is to keep a small number of representative applications running at all times and track the goodput obtained by these applications. Together, these representative applications form a "goodput index." Note: we are not currently running the representative applications.

Goodput statistics are updated daily. All graphs are also available on a single goodput statistics page.


condor-admin@cs.wisc.edu