This class is responsible for managing an application in an opportunistic environment. The goal is to be completely fault-tolerant, dealing with all possibilities of host (worker) problems. To do this, the MWDriver class manages a set of tasks and a set of workers. It monitors messages about hosts coming up and going down, and assigns tasks appropriately.
This class is built on a lower layer that provides resource management and message passing. Previously, it was built directly on top of Condor-PVM, but the interface to that layer has been abstracted away so that any facility providing resource management and message passing can be used. See the abstract MWRMComm class for details of this lower layer. When interfacing with this layer, you'll have to use the RMC object, which is a static member of the MWDriver, MWTask, and MWWorker classes.
To implement an application, a user must derive a class from this base class and implement the following methods:
- get_userinfo()
- setup_initial_tasks()
- pack_worker_init_data()
- act_on_completed_task()
For finer control over how tasks are distributed to workers, the following methods must also be implemented:
- set_workClasses()
- act_on_starting_worker()
Similar application-dependent methods must be implemented for the "Task" (the unit of work to be done, derived from MWTask) and the "Worker" that performs the tasks (derived from MWWorker).
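The derivation pattern above can be sketched in self-contained C++. `DriverSketch`, `TaskSketch`, the `Status` enum and the `go()` entry point are hypothetical stand-ins, not the real MWDriver/MWTask headers: workers are replaced by a local loop, and the sketch only mimics the order in which a driver might call the four required hooks.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical return code; the real framework defines its own type.
enum class Status { OK, QUIT };

// Hypothetical stand-in for MWTask.
struct TaskSketch {
    virtual ~TaskSketch() = default;
    virtual void execute() = 0;  // in the real framework this runs on a worker
};

using TaskList = std::vector<std::unique_ptr<TaskSketch>>;

// Hypothetical stand-in for MWDriver: the base class owns the control flow
// and calls the user's overrides at fixed points.
class DriverSketch {
public:
    virtual ~DriverSketch() = default;

    Status go(int argc, char *argv[]) {
        if (get_userinfo(argc, argv) != Status::OK) return Status::QUIT;
        TaskList todo;
        if (setup_initial_tasks(todo) != Status::OK) return Status::QUIT;
        if (pack_worker_init_data() != Status::OK) return Status::QUIT;
        for (auto &t : todo) {
            t->execute();  // stands in for "send to a worker, wait for the result"
            if (act_on_completed_task(*t) != Status::OK) return Status::QUIT;
        }
        return Status::OK;
    }

protected:
    virtual Status get_userinfo(int argc, char *argv[]) = 0;
    virtual Status setup_initial_tasks(TaskList &todo) = 0;
    virtual Status pack_worker_init_data() = 0;
    virtual Status act_on_completed_task(TaskSketch &t) = 0;
};

// Application side: one derived task type and one derived driver.
struct SquareTask : TaskSketch {
    int input, result = 0;
    explicit SquareTask(int x) : input(x) {}
    void execute() override { result = input * input; }
};

class SumOfSquaresDriver : public DriverSketch {
public:
    int n = 0;
    long total = 0;

protected:
    Status get_userinfo(int argc, char *argv[]) override {
        n = (argc > 1) ? std::stoi(argv[1]) : 3;  // problem size from argv
        return n > 0 ? Status::OK : Status::QUIT;
    }
    Status setup_initial_tasks(TaskList &todo) override {
        for (int i = 1; i <= n; ++i) todo.push_back(std::make_unique<SquareTask>(i));
        return Status::OK;
    }
    Status pack_worker_init_data() override { return Status::OK; }  // nothing to send
    Status act_on_completed_task(TaskSketch &t) override {
        auto *sq = dynamic_cast<SquareTask *>(&t);
        if (!sq) return Status::QUIT;  // not one of our tasks
        total += sq->result;
        return Status::OK;
    }
};
```

Running `SumOfSquaresDriver` with the argument `4` creates tasks for 1 through 4 and accumulates 1 + 4 + 9 + 16. The design choice being illustrated is that the base class, not the user, decides when each hook runs.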
The MWTasks pointed to should be instances of the task type derived
for your application.
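A minimal sketch of handing back initial tasks through out-parameters, assuming a count plus an array of base-class pointers. The parameter list is modeled on the description above but is an assumption here, `TaskSketch` and `MyTask` are hypothetical, and the function is written as a free function for brevity.

```cpp
#include <cassert>

// Hypothetical stand-in for MWTask.
struct TaskSketch {
    virtual ~TaskSketch() = default;
};

// The application's own task type.
struct MyTask : TaskSketch {
    int index;
    explicit MyTask(int i) : index(i) {}
};

// Hypothetical out-parameter signature: report how many tasks were made and
// hand back a new[]-allocated array of base-class pointers, each pointing at
// a MyTask. The caller (the driver) takes ownership of the array and tasks.
void setup_initial_tasks(int *n_init, TaskSketch ***init_tasks) {
    const int n = 5;
    *n_init = n;
    *init_tasks = new TaskSketch *[n];
    for (int i = 0; i < n; ++i)
        (*init_tasks)[i] = new MyTask(i);  // derived object behind a base pointer
}
```

Because each slot holds a base-class pointer to a derived object, code that later receives one of these tasks can recover the derived type with `dynamic_cast`; had a plain `TaskSketch` been created instead, that cast would return a null pointer.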
This function packs all of the user's initial data. It is unpacked
in the worker class, in unpack_init_data().
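Because the worker unpacks exactly what the driver packed, the two sides must handle fields in the same order. A minimal in-process sketch, assuming a hypothetical `MessageBuffer` in place of the framework's RMC communication layer and a hypothetical `InitData` struct; nothing here is the real packing API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical in-memory message buffer standing in for the framework's
// communication layer; values are copied byte-for-byte in and out.
class MessageBuffer {
public:
    template <typename T>
    void pack(const T &value) {
        static_assert(std::is_trivially_copyable<T>::value, "plain data only");
        const auto *p = reinterpret_cast<const unsigned char *>(&value);
        bytes_.insert(bytes_.end(), p, p + sizeof(T));
    }

    template <typename T>
    T unpack() {
        static_assert(std::is_trivially_copyable<T>::value, "plain data only");
        if (pos_ + sizeof(T) > bytes_.size())
            throw std::out_of_range("unpacked past the end of the buffer");
        T value{};
        std::memcpy(&value, bytes_.data() + pos_, sizeof(T));
        pos_ += sizeof(T);
        return value;
    }

private:
    std::vector<unsigned char> bytes_;
    std::size_t pos_ = 0;  // read cursor for unpack()
};

// Hypothetical initial data every worker needs.
struct InitData {
    int num_items;
    double tolerance;
};

// Driver side, in the role of pack_worker_init_data(): fixed field order.
void pack_init(MessageBuffer &buf, const InitData &d) {
    buf.pack(d.num_items);
    buf.pack(d.tolerance);
}

// Worker side, in the role of unpack_init_data(): the same order again.
InitData unpack_init(MessageBuffer &buf) {
    InitData d{};
    d.num_items = buf.unpack<int>();
    d.tolerance = buf.unpack<double>();
    return d;
}
```

Swapping the two `unpack` calls in `unpack_init()` would still compile and run, but would silently produce garbage values, which is why the packing and unpacking functions are best written side by side.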
Potential "initial" information that might be useful is...
These sorts of things could be useful in building some
scheduling intelligence into the driver.
In the long run, a better solution is probably to give
users hooks into these functions. Basic default functionality that updates the known
status of the virtual machine is provided.
Then MWDriver does the rest for you. When checkpoint() is called (see below), it opens a known filename for writing and passes the file pointer of that file to write_master_state(), which dumps the "state" of the master to that fp. Here "state" includes all the variables, info, etc. of YOUR class that was derived from MWDriver. All state in MWDriver.C itself is taken care of (there's not much). Next, checkpoint() walks down the running queue and the todo queue and calls each member's write_ckpt_info().
Upon restart, MWDriver detects the presence of the checkpoint file and restarts from it. It calls read_master_state(), which is the inverse of write_master_state(). Then, for each task in the checkpoint file, it creates a new MWTask, calls read_ckpt_info() on it, and adds it to the todo queue. Execution then proceeds from there as normal.
One can set the "frequency" at which checkpoint files are written using set_checkpoint_frequency(). The default frequency is zero, meaning no checkpointing. When the frequency is set to n, a checkpoint is taken immediately after every nth call to act_on_completed_task(). If your application involves "work steps", you will probably want to leave the frequency at zero and call checkpoint() yourself at the end of each work step.
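The checkpoint cycle just described (dump state to a known file, restore it on restart, checkpoint after every nth completed task) can be sketched as follows. `CheckpointSketch`, the file name and the two state fields are hypothetical, and the walk over the running and todo queues is only marked by a comment. The point it illustrates is that `read_master_state()` must be the exact inverse of `write_master_state()`.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical sketch of the checkpoint flow; not the real MWDriver code.
class CheckpointSketch {
public:
    explicit CheckpointSketch(std::string path) : path_(std::move(path)) {}

    void set_checkpoint_frequency(int n) { frequency_ = n; }

    // Called after each completed task; checkpoints on every nth call.
    void after_completed_task(long result) {
        if (result > best_) best_ = result;  // largest result seen so far
        ++completed_;
        if (frequency_ > 0 && completed_ % frequency_ == 0) checkpoint();
    }

    // Opens the known file and dumps the master state; false on I/O failure.
    bool checkpoint() {
        std::FILE *fp = std::fopen(path_.c_str(), "w");
        if (!fp) return false;
        write_master_state(fp);
        // A real driver would now walk the running and todo queues and call
        // each task's write_ckpt_info(fp).
        std::fclose(fp);
        ++checkpoints_written_;
        return true;
    }

    // On restart: if the checkpoint file exists, restore state from it.
    bool restart() {
        std::FILE *fp = std::fopen(path_.c_str(), "r");
        if (!fp) return false;  // no checkpoint file: start fresh
        bool ok = read_master_state(fp);
        std::fclose(fp);
        return ok;
    }

    long best() const { return best_; }
    int completed() const { return completed_; }
    int checkpoints_written() const { return checkpoints_written_; }

private:
    // The derived driver's own state; these two must mirror each other.
    void write_master_state(std::FILE *fp) const {
        std::fprintf(fp, "%d %ld\n", completed_, best_);
    }
    bool read_master_state(std::FILE *fp) {
        return std::fscanf(fp, "%d %ld", &completed_, &best_) == 2;
    }

    std::string path_;
    int frequency_ = 0;  // default zero: no automatic checkpointing
    int completed_ = 0;
    long best_ = 0;
    int checkpoints_written_ = 0;
};
```

With a frequency of 2 and five completed tasks, checkpoints are written after the second and fourth tasks, so a restarted driver resumes from the state as of the fourth.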
If your application dumps core when trying to restart from
a checkpoint, it might be because you haven't implemented this function.
The return value is taken as the return value from the user's
pack_worker_init_data() function.
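// Downcast the generic MWTask pointer t to your derived task type
My_Task *dt = dynamic_cast<My_Task *> ( t );
assert( dt );   // dt is null if t is not actually a My_Task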
MWTask* gimme_a_task() {
    // Replace My_Task with your own class derived from MWTask
    return new My_Task;
}
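The factory is needed because, on restart, the base class only knows the generic task type and cannot construct your derived class by itself. A sketch of the checkpoint round trip, assuming hypothetical `save_tasks()`/`restore_tasks()` helpers, a stand-in `TaskSketch` base and a simplified `bool`-returning `read_ckpt_info()`; the record format (a task count, then one line per task) is also an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <deque>
#include <memory>
#include <utility>

// Stand-in for MWTask carrying the two checkpoint hooks named in the text
// (signatures simplified: read_ckpt_info() reports success here).
struct TaskSketch {
    virtual ~TaskSketch() = default;
    virtual void write_ckpt_info(std::FILE *fp) const = 0;
    virtual bool read_ckpt_info(std::FILE *fp) = 0;
};

using TaskQueue = std::deque<std::unique_ptr<TaskSketch>>;

// The application's task type: a half-open index range [lo, hi).
struct MyTask : TaskSketch {
    int lo = 0, hi = 0;
    void write_ckpt_info(std::FILE *fp) const override {
        std::fprintf(fp, "%d %d\n", lo, hi);
    }
    bool read_ckpt_info(std::FILE *fp) override {
        return std::fscanf(fp, "%d %d", &lo, &hi) == 2;
    }
};

// The user-supplied factory: the only place that names the derived type.
std::unique_ptr<TaskSketch> gimme_a_task() { return std::make_unique<MyTask>(); }

// Checkpoint side: a task count, then one record per task.
void save_tasks(std::FILE *fp, const TaskQueue &queue) {
    std::fprintf(fp, "%zu\n", queue.size());
    for (const auto &t : queue) t->write_ckpt_info(fp);
}

// Restart side: for each record, build a fresh task via the factory, let it
// read its own fields, and append it to the todo queue.
bool restore_tasks(std::FILE *fp, TaskQueue &todo) {
    int n = 0;
    if (std::fscanf(fp, "%d", &n) != 1 || n < 0) return false;
    for (int i = 0; i < n; ++i) {
        std::unique_ptr<TaskSketch> t = gimme_a_task();
        if (!t->read_ckpt_info(fp)) return false;
        todo.push_back(std::move(t));
    }
    return true;
}
```

`restore_tasks()` never mentions `MyTask`; if `gimme_a_task()` were missing or returned the wrong type, the restored tasks would not be able to read their own records, which matches the restart failure described above.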
argv - The argv from the command line
this page has been generated automatically by doc++
(c)opyright by Malte Zöckler, Roland Wunderling
contact: doc++@zib.de