MW Overview

MW is a set of C++ abstract base classes that allow rapid development of sophisticated scientific computing applications based on the master-worker paradigm. We outline here the design of MW and indicate how it can be used to build an application.

There are three abstract base classes to implement. The MWDriver class corresponds to the master process and contains the control center for distributing tasks to workers. The MWTask class describes the inputs and outputs - the data and results - that are associated with a single unit of work. The MWWorker class contains code to initialize a worker process and to execute any tasks that are sent to it by the master.

MWDriver

To create the MWDriver - the master process - the user need only implement four pure virtual functions:
  • get_userinfo() -- Do basic setup, process any arguments, etc. This is called once when the master is started up.
  • setup_initial_tasks() -- Returns a set of tasks for the computation to begin work on. Also executed once at the beginning.
  • pack_worker_init_data() -- When a worker starts up, it may have to be sent some initial data. That is done here.
  • act_on_completed_task() -- This is called every time a task finishes. Some actions that the user could take might include adding more tasks or making calculations based on the result of the task.
The MWDriver manages a set of MWTasks and a set of MWWorkers to execute those tasks. The MWDriver handles workers joining and leaving the computation, assigns tasks to appropriate workers and rematches running tasks when workers are lost. Despite this internal complexity, a minimal implementation can be quite simple. The MWDriver offers more advanced functionality that will be explained below.
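
As a rough illustration of how the four functions fit together, here is a minimal sketch that sums the squares 1..n. It does not use the real MW headers: SketchDriver, SketchTask, SumSquaresDriver, and run_sum_squares() are hypothetical stand-ins, and the argument and return types are assumptions chosen so the example compiles on its own. The actual MW signatures may differ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in types for illustration only; these are NOT the real MW classes.
struct SketchTask {
    int input = 0;   // data describing the work
    int result = 0;  // filled in by a worker
};

// Hypothetical base class mirroring the four functions named in the text.
// The signatures are assumptions, not the MW API.
class SketchDriver {
public:
    virtual ~SketchDriver() = default;
    virtual void get_userinfo(const std::vector<std::string>& args) = 0;
    virtual std::vector<SketchTask> setup_initial_tasks() = 0;
    virtual std::vector<int> pack_worker_init_data() = 0;
    virtual void act_on_completed_task(const SketchTask& task) = 0;
};

// Example application: one task per integer, results are accumulated.
class SumSquaresDriver : public SketchDriver {
public:
    void get_userinfo(const std::vector<std::string>& args) override {
        if (!args.empty()) n_ = std::stoi(args[0]);  // called once at startup
    }
    std::vector<SketchTask> setup_initial_tasks() override {
        std::vector<SketchTask> tasks;
        for (int i = 1; i <= n_; ++i) {
            SketchTask t;
            t.input = i;
            tasks.push_back(t);
        }
        return tasks;
    }
    std::vector<int> pack_worker_init_data() override { return {n_}; }
    void act_on_completed_task(const SketchTask& task) override {
        total_ += task.result;  // could also create follow-up tasks here
    }
    long total() const { return total_; }

private:
    int n_ = 4;
    long total_ = 0;
};

// Simulates the master locally: each task is "executed" inline
// where a real run would hand it to a remote worker.
long run_sum_squares(int n) {
    assert(n >= 0);
    SumSquaresDriver driver;
    driver.get_userinfo({std::to_string(n)});
    for (SketchTask& t : driver.setup_initial_tasks()) {
        t.result = t.input * t.input;
        driver.act_on_completed_task(t);
    }
    return driver.total();
}
```

Calling run_sum_squares(3) walks through setup, three simulated task completions, and the final accumulated total.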

MWTask

The MWTask is the abstraction of one unit of work. The class holds both the data describing that task and the results computed by the worker. The derived task class must also implement functions for sending and receiving its data between the master and worker. The names of these functions are self-explanatory: pack_work(), unpack_work(), pack_results(), and unpack_results().
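
The round trip these four functions implement might look like the sketch below. SketchBuffer is a hypothetical in-memory byte buffer standing in for MW's communication layer, and IntervalTask and round_trip() are invented for this example; only the four function names come from the text, and their parameters here are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical message buffer; the real MW transport is not modeled here.
class SketchBuffer {
public:
    void pack(double v) {
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
        bytes_.insert(bytes_.end(), p, p + sizeof v);
    }
    double unpack() {
        assert(read_ + sizeof(double) <= bytes_.size());
        double v;
        std::memcpy(&v, bytes_.data() + read_, sizeof v);
        read_ += sizeof v;
        return v;
    }

private:
    std::vector<unsigned char> bytes_;
    std::size_t read_ = 0;
};

// A task asking a worker for the area under x^2 on [lo, hi].
class IntervalTask {
public:
    double lo = 0.0, hi = 0.0;  // work: inputs set by the master
    double area = 0.0;          // result: output set by the worker

    void pack_work(SketchBuffer& b) const { b.pack(lo); b.pack(hi); }
    void unpack_work(SketchBuffer& b) { lo = b.unpack(); hi = b.unpack(); }
    void pack_results(SketchBuffer& b) const { b.pack(area); }
    void unpack_results(SketchBuffer& b) { area = b.unpack(); }
};

// Master -> worker -> master, with one buffer per direction.
double round_trip(double lo, double hi) {
    IntervalTask master_copy;
    master_copy.lo = lo;
    master_copy.hi = hi;
    SketchBuffer to_worker;
    master_copy.pack_work(to_worker);

    IntervalTask worker_copy;
    worker_copy.unpack_work(to_worker);
    const double a = worker_copy.lo, b = worker_copy.hi;
    worker_copy.area = (b * b * b - a * a * a) / 3.0;  // the "computation"

    SketchBuffer to_master;
    worker_copy.pack_results(to_master);
    master_copy.unpack_results(to_master);
    return master_copy.area;
}
```

Note that each pack function and its matching unpack function must agree on the order of the fields, since the buffer carries no field names.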

MWWorker

The MWWorker class is the core of the worker executable. Two pure virtual functions must be implemented:
  • unpack_init_data() -- Unpacks the initialization information passed in the MWDriver's pack_worker_init_data().
  • execute_task() -- Given a task, compute the results.
After it does some basic initialization, the MWWorker sits in a simple loop. It asks the master for a task, computes the results, reports the results back and waits for another task. The loop finishes when the master asks the worker to end. It is an easy matter to bring in other libraries, such as highly optimized FORTRAN routines, to the worker. They can be linked in with the C++ code, and called in the execute_task() function.
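
The worker loop described above can be sketched as follows. SketchMaster simulates the master's side of the conversation with an in-memory queue, and SketchWorker, ScaleWorker, go(), and run_worker() are hypothetical names; only unpack_init_data() and execute_task() come from the text, with assumed signatures.

```cpp
#include <cassert>
#include <deque>
#include <vector>

struct SketchTask {
    int input = 0;
    long result = 0;
};

// Stand-in for the master's side of the connection (not the MW API).
class SketchMaster {
public:
    explicit SketchMaster(const std::vector<int>& inputs)
        : pending_(inputs.begin(), inputs.end()) {}

    // Returns false when there is no more work, i.e. the worker should end.
    bool next_task(SketchTask& t) {
        if (pending_.empty()) return false;
        t.input = pending_.front();
        t.result = 0;
        pending_.pop_front();
        return true;
    }
    void report(const SketchTask& t) { results_.push_back(t.result); }
    const std::vector<long>& results() const { return results_; }

private:
    std::deque<int> pending_;
    std::vector<long> results_;
};

// Hypothetical worker base class with the two user-supplied functions.
class SketchWorker {
public:
    virtual ~SketchWorker() = default;
    virtual void unpack_init_data(int init) = 0;
    virtual void execute_task(SketchTask& t) = 0;

    // Initialize once, then: ask for a task, compute, report, repeat.
    void go(SketchMaster& master, int init) {
        unpack_init_data(init);
        SketchTask t;
        while (master.next_task(t)) {
            execute_task(t);
            master.report(t);
        }
    }
};

// Example worker: multiplies each input by a factor received at startup.
class ScaleWorker : public SketchWorker {
public:
    void unpack_init_data(int init) override { factor_ = init; }
    void execute_task(SketchTask& t) override {
        // A call into an external numerical library could go here.
        t.result = static_cast<long>(factor_) * t.input;
    }

private:
    int factor_ = 1;
};

std::vector<long> run_worker(const std::vector<int>& inputs, int factor) {
    SketchMaster master(inputs);
    ScaleWorker worker;
    worker.go(master, factor);
    assert(master.results().size() == inputs.size());
    return master.results();
}
```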

Other MWDriver features

To make computations reliable, MWDriver can checkpoint the state of the computation at a user-defined frequency and restart from that state after a crash of the master. The user need only implement functions for writing and reading the state held in the application's master and tasks. This feature can also be used to perform rudimentary "computational steering": the user stops the computation by hand, modifies the checkpoint file, and then restarts from that checkpoint.
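
A minimal sketch of such a pair of write and read functions is shown below, using a plain-text format so that the checkpoint can be edited by hand. SketchState, write_ckpt(), read_ckpt(), save(), and restore() are hypothetical names, and the one-line file layout is an assumption; MW's own checkpoint functions and format are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical application state: one master value plus the pending task inputs.
struct SketchState {
    long best_so_far = 0;
    std::vector<int> pending_inputs;

    // Layout (an assumption): "<best> <count> <input> <input> ...\n"
    void write_ckpt(std::ostream& out) const {
        out << best_so_far << ' ' << pending_inputs.size();
        for (int v : pending_inputs) out << ' ' << v;
        out << '\n';
    }

    // Reads into temporaries first, so a truncated or malformed
    // checkpoint leaves the current state untouched.
    bool read_ckpt(std::istream& in) {
        long best = 0;
        std::size_t n = 0;
        if (!(in >> best >> n)) return false;
        std::vector<int> inputs;
        for (std::size_t i = 0; i < n; ++i) {
            int v = 0;
            if (!(in >> v)) return false;
            inputs.push_back(v);
        }
        best_so_far = best;
        pending_inputs.swap(inputs);
        return true;
    }
};

// Helpers that checkpoint to and from a string instead of a file.
std::string save(const SketchState& s) {
    std::ostringstream out;
    s.write_ckpt(out);
    return out.str();
}

bool restore(const std::string& text, SketchState& s) {
    std::istringstream in(text);
    return s.read_ckpt(in);
}
```

In this sketch, steering amounts to editing the saved text, for example dropping pending inputs and lowering the count, before calling restore().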

To help the user make the best use of available resources, MWDriver has abstract mechanisms to sort the task pool according to user-supplied priorities. MWDriver also maintains information about each participating worker. The user can draw on this information to develop advanced scheduling policies that match tasks with the best-suited workers.
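
One way to picture priority-based sorting is a user-supplied key function applied to every task in the pool, as in the sketch below. The names SketchTask, TaskKey, sort_task_pool(), by_bound(), and dispatch_order() are hypothetical, and the convention that a smaller key means higher priority is an assumption made for this example rather than a statement about MW.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Stand-in task with a field the user wants to prioritize on.
struct SketchTask {
    int id;
    double bound;
};

// User-supplied priority: maps a task to a sortable key.
using TaskKey = std::function<double(const SketchTask&)>;

// Orders the pool so that tasks with smaller keys are dispatched first.
// stable_sort keeps the submission order among tasks with equal keys.
void sort_task_pool(std::vector<SketchTask>& pool, const TaskKey& key) {
    assert(key);  // a key function must be provided
    std::stable_sort(pool.begin(), pool.end(),
                     [&key](const SketchTask& a, const SketchTask& b) {
                         return key(a) < key(b);
                     });
}

// Example key: prefer the task with the lowest bound.
double by_bound(const SketchTask& t) { return t.bound; }

// Returns task ids in the order they would be handed to workers.
std::vector<int> dispatch_order(std::vector<SketchTask> pool, const TaskKey& key) {
    sort_task_pool(pool, key);
    std::vector<int> ids;
    for (const SketchTask& t : pool) ids.push_back(t.id);
    return ids;
}
```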

The user can set the number of workers desired for a computation by using the set_target_num_workers() function. This function can be called as needed to change the target size of the pool of workers. To deal with heterogeneous resources, MWDriver currently makes use of the multi-architecture features of HTCondor, but this support can easily be extended to other resource managers as well. The user compiles the workers for the targeted architectures, and the MWDriver takes care of selecting the correct executable as new workers enter the computation.
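
The bookkeeping involved can be pictured with the sketch below, which tracks a target pool size and picks an executable by architecture when a worker joins. SketchPool, add_executable(), admit(), and active() are hypothetical, as are the architecture strings and file names used with them; only the name set_target_num_workers() comes from the text, and its real signature and behavior in MW may differ.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of worker-pool bookkeeping; not the MW implementation.
class SketchPool {
public:
    // May be called repeatedly to grow or shrink the desired pool size.
    void set_target_num_workers(int n) {
        assert(n >= 0);
        target_ = n;
    }

    // Registers the worker binary compiled for one architecture.
    void add_executable(const std::string& arch, const std::string& path) {
        exe_[arch] = path;
    }

    // Returns the executable to start on a joining machine, or an empty
    // string if the pool is at its target or the architecture is unknown.
    std::string admit(const std::string& arch) {
        auto it = exe_.find(arch);
        if (active_ >= target_ || it == exe_.end()) return "";
        ++active_;
        return it->second;
    }

    int active() const { return active_; }

private:
    int target_ = 0;
    int active_ = 0;
    std::map<std::string, std::string> exe_;
};
```

Raising the target after the pool is full lets further machines be admitted on later calls to admit().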