|
|
MW Overview
MW is a set of C++ abstract base classes that allow rapid
development of sophisticated scientific computing applications based
on the master-worker paradigm. We outline here the design of MW and
indicate how it can be used to build an application.
There are three abstract base classes to implement. The
MWDriver class corresponds to the master process and contains
the control center for distributing tasks to workers. The
MWTask class describes the inputs and outputs - the data and
results - that are associated with a single unit of work. The
MWWorker class contains code to initialize a worker process and
to execute any tasks that are sent to it by the master.
MWDriver
To create the MWDriver - the master process - the user need only
implement four pure virtual functions:
- get_userinfo() -- Do basic setup, process any
arguments, etc. This is called once when the master is started up.
- setup_initial_tasks() -- Returns a set of tasks for the
computation to begin work on. Also executed once at the beginning.
- pack_worker_init_data() -- When a worker starts up, it may
have to be sent some initial data. That is done here.
- act_on_completed_task()- - This is called every time a task
finishes. Some actions that the user could take might include adding
more tasks or making calculations based on the result of the task.
The MWDriver manages a set of MWTasks and a set of MWWorkers
to execute those tasks. The MWDriver handles workers joining and
leaving the computation, assigns tasks to appropriate workers and
rematches running tasks when workers are lost. Despite this internal
complexity, a minimal implementation can be quite simple. The
MWDriver offers more advanced functionality that will be explained
below.
MWTask
The MWTask is the abstraction of one unit of work. The class holds
both the data describing that task and the results computed by the worker.
The derived task class must also implement functions for sending and
receiving its data between the master and worker. The names of these
functions are self-explanatory: pack_work(), unpack_work(),
pack_results(), and unpack_results().
MWWorker
The MWWorker class is the core of the worker executable. Two pure
virtual functions must be implemented:
- unpack_init_data()-- Unpacks the initialization
information passed in the MWDriver's pack_worker_init_data().
- execute_task()-- Given a task, compute the results.
After it does some basic initialization, the MWWorker sits in a simple
loop. It asks the master for a task, computes the results, reports
the results back and waits for another task. The loop finishes when
the master asks the worker to end. It is an easy matter to bring in
other libraries, such as highly optimized FORTRAN routines, to the
worker. They can be linked in with the C++ code, and called in the
execute_task() function.
Other MWDriver features
To make computations fully reliable, MWDriver offers features to checkpoint
the state of the computation on a user-defined frequency. MWDriver can restart
from that state after a crash of the master. The user has just
to implement functions for writing and reading the state contained in its
application's master and tasks. This feature can be used to perform rudimentary
``computational steering'' if the user stops the computation by hand,
modifies the checkpoint file, and then restarts from that checkpoint.
To help the user make the best use of available resources, MWDriver
has abstract mechanisms to sort the task pool according to
user-supplied priorities. MWDriver also maintains
information about each participating workers. This information can be
used by the user to develop advanced scheduling policies which match
tasks with the best suited workers.
The user can set the number of workers desired for a computation by
using the set_target_num_workers() function. This function
can be called as needed to change the target size of the pool of workers.
To deal with heterogeneous resources, MWDriver currently makes use of
the multi-architecture features of HTCondor. But one can easily extend this for
other resource manager as well. The user compiles
the workers for the targeted architectures, and the MWDriver takes care of
selecting the correct executable as new workers enter the
computation.
|