MW Overview
MW is a set of C++ abstract base classes that allow rapid development of sophisticated scientific computing applications based on the master-worker paradigm. Here we outline the design of MW and show how it can be used to build an application.
There are three abstract base classes to implement. The MWDriver class corresponds to the master process and contains the control center for distributing tasks to workers. The MWTask class describes the inputs and outputs (the data and results) associated with a single unit of work. The MWWorker class contains code to initialize a worker process and to execute any tasks the master sends to it.
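To make the division of labor concrete, here is a minimal single-process sketch of the three roles. It does not use the MW library: the base classes Task, Worker, and Driver and their members (execute, on_completed, run, pool) are hypothetical simplifications, and direct function calls stand in for the messages MW would exchange between the master and worker processes.

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical, simplified stand-ins for MWTask, MWWorker and MWDriver.
// They do NOT reproduce MW's real pure virtual functions.
struct Task {                        // one unit of work: input data + result
    virtual ~Task() = default;
};

struct Worker {                      // executes any task the master sends it
    virtual ~Worker() = default;
    virtual void execute(Task& t) = 0;
};

struct Driver {                      // master: owns the pool of pending tasks
    virtual ~Driver() = default;
    std::queue<std::unique_ptr<Task>> pool;
    virtual void on_completed(Task& t) = 0;

    // Serial stand-in for farming out tasks: one worker, direct calls, no messages.
    void run(Worker& w) {
        while (!pool.empty()) {
            std::unique_ptr<Task> t = std::move(pool.front());
            pool.pop();
            w.execute(*t);
            on_completed(*t);
        }
    }
};

// Example application: square each input and sum the squares.
struct SquareTask : Task {
    int input;
    long long result = 0;
    explicit SquareTask(int x) : input(x) {}
};

struct SquareWorker : Worker {
    void execute(Task& t) override {
        auto& st = dynamic_cast<SquareTask&>(t);
        st.result = static_cast<long long>(st.input) * st.input;
    }
};

struct SumDriver : Driver {
    long long total = 0;
    void on_completed(Task& t) override {
        total += dynamic_cast<SquareTask&>(t).result;
    }
};
```

Running a SumDriver over tasks 1 through 4 with one SquareWorker accumulates 1 + 4 + 9 + 16 = 30. In MW proper, the driver would instead hand tasks to many remote workers concurrently and react to results as they arrive.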
MWDriver
To create the MWDriver (the master process), the user need only implement four pure virtual functions:
MWTask
The MWTask is the abstraction of one unit of work. The class holds both the data describing that task and the results computed by the worker. The derived task class must also implement functions for sending and receiving its data between the master and the worker. The names of these functions are self-explanatory: pack_work(), unpack_work(), pack_results(), and unpack_results().
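The pack/unpack pairing can be illustrated with a self-contained sketch. The Buffer class below is a hypothetical byte buffer standing in for MW's communication layer, and the signatures given to pack_work(), unpack_work(), pack_results(), and unpack_results() are assumed for illustration; only the function names come from the text. The essential rule is that each unpack reads fields in exactly the order the matching pack wrote them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical byte buffer standing in for MW's message layer.
class Buffer {
public:
    template <typename T>
    void pack(const T& v) {
        static_assert(std::is_trivially_copyable<T>::value, "only raw values can be packed");
        const auto* p = reinterpret_cast<const unsigned char*>(&v);
        bytes_.insert(bytes_.end(), p, p + sizeof(T));
    }
    template <typename T>
    T unpack() {
        static_assert(std::is_trivially_copyable<T>::value, "only raw values can be unpacked");
        assert(pos_ + sizeof(T) <= bytes_.size() && "read past end of message");
        T v;
        std::memcpy(&v, bytes_.data() + pos_, sizeof(T));
        pos_ += sizeof(T);
        return v;
    }
private:
    std::vector<unsigned char> bytes_;
    std::size_t pos_ = 0;   // read cursor
};

// Hypothetical task that sums the integers in [lo, hi).
struct RangeSumTask {
    long long lo = 0, hi = 0;   // work: sent master -> worker
    long long sum = 0;          // result: sent worker -> master

    void pack_work(Buffer& b) const { b.pack(lo); b.pack(hi); }
    void unpack_work(Buffer& b) { lo = b.unpack<long long>(); hi = b.unpack<long long>(); }
    void pack_results(Buffer& b) const { b.pack(sum); }
    void unpack_results(Buffer& b) { sum = b.unpack<long long>(); }
};

// Worker-side computation (hypothetical helper).
void compute(RangeSumTask& t) {
    t.sum = 0;
    for (long long i = t.lo; i < t.hi; ++i) t.sum += i;
}
```

A round trip packs the work on the master, unpacks it on the worker, computes, and sends the result back; for the range [1, 101) the master receives 5050. Because this Buffer copies raw bytes, it only suits trivially copyable fields exchanged between machines with the same data layout; a pool mixing architectures would need a portable encoding.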
MWWorker
The MWWorker class is the core of the worker executable. Two pure virtual functions must be implemented:
Other MWDriver features
To make computations reliable, MWDriver can checkpoint the state of the computation at a user-defined frequency and restart from that state after a crash of the master. The user only has to implement functions for writing and reading the state held in the application's master and tasks. This feature can also be used for rudimentary "computational steering": the user stops the computation by hand, modifies the checkpoint file, and then restarts from that checkpoint.
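A checkpoint sketch, assuming the master's state reduces to a running total plus the inputs of unfinished tasks. MasterState, Checkpointer, and their members are hypothetical names, not MW functions. A plain-text format is used so the file stays human-editable, which suits the hand-editing form of steering described above.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical master state: a running total plus inputs of unfinished tasks.
struct MasterState {
    long long total = 0;
    std::vector<int> pending;

    // Plain text on one line, so the file can be edited by hand between runs.
    void write(std::ostream& os) const {
        os << total << ' ' << pending.size();
        for (int x : pending) os << ' ' << x;
        os << '\n';
    }
    // Returns false if the checkpoint is malformed or truncated.
    bool read(std::istream& is) {
        std::size_t n = 0;
        if (!(is >> total >> n)) return false;
        pending.assign(n, 0);
        for (int& x : pending)
            if (!(is >> x)) return false;
        return true;
    }
};

// Signals that a checkpoint is due after every `freq` completed tasks.
class Checkpointer {
public:
    explicit Checkpointer(int freq) : freq_(freq) {}
    bool task_completed() {
        if (++since_last_ >= freq_) { since_last_ = 0; return true; }
        return false;
    }
private:
    int freq_;
    int since_last_ = 0;
};
```

On restart, the master would read the state back and resubmit the pending tasks. A real implementation should write each checkpoint to a temporary file and rename it into place, so a crash in mid-write cannot leave a truncated checkpoint behind.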
To help the user make the best use of available resources, MWDriver provides mechanisms to sort the task pool according to user-supplied priorities. MWDriver also maintains information about each participating worker, which the user can use to develop advanced scheduling policies that match tasks with the best-suited workers.
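One possible scheduling policy, sketched under stated assumptions: each task carries a numeric priority where larger means more urgent, and each worker record carries a speed score (for example, from a benchmark). TaskInfo, WorkerInfo, sort_pool, and match are hypothetical names; MW's own priority and worker-information interfaces are not reproduced here. The policy pairs the highest-priority tasks with the fastest idle workers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct TaskInfo {
    int id;
    double priority;   // user-supplied key; larger = run sooner (assumption)
};

struct WorkerInfo {
    std::string name;
    double speed;      // e.g. a measured benchmark score (hypothetical)
    bool idle;
};

// Order the pool so the highest-priority task comes first.
void sort_pool(std::vector<TaskInfo>& pool) {
    std::stable_sort(pool.begin(), pool.end(),
        [](const TaskInfo& a, const TaskInfo& b) { return a.priority > b.priority; });
}

// Assign the highest-priority tasks to the fastest idle workers.
// Assigned tasks leave the pool and their workers are marked busy.
std::vector<std::pair<int, std::string>>
match(std::vector<TaskInfo>& pool, std::vector<WorkerInfo>& workers) {
    sort_pool(pool);
    std::vector<WorkerInfo*> idle;
    for (auto& w : workers)
        if (w.idle) idle.push_back(&w);
    std::sort(idle.begin(), idle.end(),
        [](const WorkerInfo* a, const WorkerInfo* b) { return a->speed > b->speed; });

    std::vector<std::pair<int, std::string>> assignments;
    std::size_t n = std::min(pool.size(), idle.size());
    for (std::size_t i = 0; i < n; ++i) {
        assignments.emplace_back(pool[i].id, idle[i]->name);
        idle[i]->idle = false;
    }
    pool.erase(pool.begin(), pool.begin() + static_cast<std::ptrdiff_t>(n));
    return assignments;
}
```

Using stable_sort keeps tasks of equal priority in their original order, so ties are served first-come, first-served. Other policies (matching on memory, or reserving fast machines for long tasks) would change only the two comparison lambdas.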
The user can set the number of workers desired for a computation with the set_target_num_workers() function, which can be called whenever the target size of the worker pool needs to change. To deal with heterogeneous resources, MWDriver currently uses the multi-architecture features of HTCondor, though this mechanism can be extended to other resource managers as well. The user compiles the worker for each targeted architecture, and MWDriver takes care of selecting the correct executable as new workers join the computation.
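The bookkeeping behind a heterogeneous pool with a target size can be sketched as follows. ExecutableRegistry, WorkerPool, and their methods are hypothetical; set_target only mirrors the idea of set_target_num_workers(), not its real signature or behavior. The architecture and operating-system strings follow the values of HTCondor's Arch and OpSys machine attributes, and the executable names are made up.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry of worker executables, one build per platform.
class ExecutableRegistry {
public:
    void add(const std::string& arch, const std::string& opsys, const std::string& path) {
        table_[{arch, opsys}] = path;
    }
    // Executable for a newly joining machine, or "" if no build exists for it.
    std::string select(const std::string& arch, const std::string& opsys) const {
        auto it = table_.find({arch, opsys});
        return it == table_.end() ? std::string() : it->second;
    }
private:
    std::map<std::pair<std::string, std::string>, std::string> table_;
};

// Hypothetical pool-size bookkeeping for a target number of workers.
class WorkerPool {
public:
    void set_target(int n) { target_ = n; }
    void worker_joined() { ++current_; }
    void worker_left() { --current_; }
    // > 0: request more workers; < 0: workers can be released.
    int shortfall() const { return target_ - current_; }
private:
    int target_ = 0;
    int current_ = 0;
};
```

A positive shortfall() tells the driver to request more machines from the resource manager; for each machine that arrives, select() picks the matching worker binary, and an empty result means no build exists for that platform.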