This class is responsible for managing an application in an opportunistic environment. The goal is to be completely fault-tolerant, dealing with all possibilities of host (worker) problems. To do this, the MWDriver class manages a set of tasks and a set of workers. It monitors messages about hosts coming up and going down, and assigns tasks appropriately.
This class is built upon a lower layer that provides resource management and message passing. Previously, it was built directly on top of Condor-PVM, but the interface to that has been abstracted away so that it can use any facility that provides resource management and message passing. See the abstract MWRMComm class for details of this lower layer. When interfacing with this level, you'll have to use the RMC object that's a static member of the MWDriver, MWTask, and MWWorker classes.
To implement an application, a user must derive a class from this base class and implement the following methods:
- get_userinfo()
- setup_initial_tasks()
- pack_worker_init_data()
- act_on_completed_task()
For a higher level of control regarding the distribution of tasks to workers, the following methods have to be implemented:
- set_workClasses()
- act_on_starting_worker()
Similar application-dependent methods must be implemented for the "Task" of work to be done and the "Worker" that performs the tasks.
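As a rough illustration of the four required methods, here is a minimal sketch of a derived driver. The MWReturn values (OK, ABORT), the stripped-down MWTask and MWDriver base classes, and the names My_Task, My_Driver, and run_my_driver_demo() are hypothetical stand-ins so the example compiles on its own; they are not the real MW declarations. Only the four method signatures are taken from this page.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-ins for the real MW declarations (assumptions, not the MW headers).
enum MWReturn { OK, ABORT };

class MWTask {
public:
    virtual ~MWTask() {}
};

class MWDriver {
public:
    virtual ~MWDriver() {}
    virtual MWReturn get_userinfo(int argc, char *argv[]) = 0;
    virtual MWReturn setup_initial_tasks(int *n, MWTask ***task) = 0;
    virtual MWReturn pack_worker_init_data(void) = 0;
    virtual MWReturn act_on_completed_task(MWTask *t) = 0;
};

// Hypothetical application task: square one integer.
class My_Task : public MWTask {
public:
    explicit My_Task(int in) : input(in), result(0) {}
    int input;
    int result;
};

class My_Driver : public MWDriver {
public:
    My_Driver() : num_tasks(0), sum(0) {}

    // Read the task count from the command line (default 4).
    MWReturn get_userinfo(int argc, char *argv[]) {
        num_tasks = (argc > 1) ? std::atoi(argv[1]) : 4;
        return (num_tasks > 0) ? OK : ABORT;
    }

    // Allocate the initial tasks; each entry is of the derived task type.
    MWReturn setup_initial_tasks(int *n, MWTask ***task) {
        *n = num_tasks;
        *task = new MWTask *[num_tasks];
        for (int i = 0; i < num_tasks; ++i)
            (*task)[i] = new My_Task(i + 1);
        return OK;
    }

    // Nothing to send to workers in this sketch.
    MWReturn pack_worker_init_data(void) { return OK; }

    // Fold a finished task's result into the running total.
    MWReturn act_on_completed_task(MWTask *t) {
        My_Task *dt = dynamic_cast<My_Task *>(t);
        if (!dt) return ABORT;
        sum += dt->result;
        return OK;
    }

    int num_tasks;
    long sum;
};

// Drive the four hooks by hand, standing in for the real master loop.
// In this sketch the caller frees the tasks; real ownership rules may differ.
long run_my_driver_demo() {
    My_Driver d;
    char arg0[] = "demo", arg1[] = "3";
    char *argv[] = { arg0, arg1, 0 };
    if (d.get_userinfo(2, argv) != OK) return -1;
    int n = 0;
    MWTask **tasks = 0;
    d.setup_initial_tasks(&n, &tasks);
    d.pack_worker_init_data();
    for (int i = 0; i < n; ++i) {
        My_Task *t = static_cast<My_Task *>(tasks[i]);
        t->result = t->input * t->input;  // pretend a worker computed this
        d.act_on_completed_task(tasks[i]);
        delete tasks[i];
    }
    delete[] tasks;
    return d.sum;
}
```

In a real application the loop in run_my_driver_demo() would not exist: MWDriver calls these methods itself and the workers fill in the results.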
The MWTasks pointed to should be of the task type derived for your application.
This one packs all the user's initial data. It is unpacked in the worker class, in unpack_init_data().
Potential "initial" information that might be useful is...
These sorts of things could be useful in building some
scheduling intelligence into the driver.
A better solution in the long run is probably to provide users with hooks into these functions.
Basic default functionality that updates the known status of our virtual machine is provided.
Then MWDriver does the rest for you. When checkpoint() is called (see below), it opens a known filename for writing. It passes the file pointer of that file to write_master_state(), which dumps the "state" of the master to that fp. Here "state" includes all the variables, info, etc. of YOUR CLASS THAT WAS DERIVED FROM MWDRIVER. All state in MWDriver.C is taken care of (there's not much). Next, checkpoint() walks down the running queue and the todo queue and calls each member's write_ckpt_info().
Upon restart, MWDriver detects the presence of the checkpoint file and restarts from it. It calls read_master_state(), which is the inverse of write_master_state(). Then, for each task in the checkpoint file, it creates a new MWTask, calls read_ckpt_info() on it, and adds it to the todo queue. We start from there and proceed as normal.
One can set the "frequency" with which checkpoint files are written (using set_checkpoint_frequency()). The default frequency is zero, meaning no checkpointing. When the frequency is set to n, every nth time that act_on_completed_task() gets called, we checkpoint immediately afterwards. If your application involves "work steps", you will probably want to leave the frequency at zero and call checkpoint() yourself at the end of a work step.
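The frequency rule and the write/read symmetry described above can be sketched as follows. CkptCounter, MasterState, count_checkpoints(), and state_round_trips() are hypothetical names invented for this illustration, and the free functions taking a MasterState argument are a simplification of the member functions write_master_state( FILE *fp ) and read_master_state( FILE *fp ); the actual MWDriver bookkeeping may differ.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the driver's frequency bookkeeping.
struct CkptCounter {
    int frequency;            // 0 means: never checkpoint automatically
    int completed;            // calls to act_on_completed_task() so far
    int checkpoints_written;  // how many times checkpoint() would have run

    CkptCounter() : frequency(0), completed(0), checkpoints_written(0) {}

    void set_checkpoint_frequency(int freq) { frequency = freq; }

    // Call once after each act_on_completed_task().
    void task_completed() {
        ++completed;
        if (frequency > 0 && completed % frequency == 0)
            ++checkpoints_written;  // the real driver would call checkpoint() here
    }
};

// Count automatic checkpoints for a given frequency and number of completed tasks.
int count_checkpoints(int frequency, int tasks) {
    CkptCounter c;
    c.set_checkpoint_frequency(frequency);
    for (int i = 0; i < tasks; ++i)
        c.task_completed();
    return c.checkpoints_written;
}

// Hypothetical user state of a derived driver.
struct MasterState {
    int best_value;
    double elapsed;
};

// Dump the user state to fp; read_master_state() must consume exactly this format.
void write_master_state(FILE *fp, const MasterState &s) {
    std::fprintf(fp, "%d %.17g\n", s.best_value, s.elapsed);
}

// Inverse of write_master_state(); returns false on a malformed file.
bool read_master_state(FILE *fp, MasterState &s) {
    return std::fscanf(fp, "%d %lf", &s.best_value, &s.elapsed) == 2;
}

// Write the state to a temporary file and read it back.
bool state_round_trips() {
    FILE *fp = std::tmpfile();
    if (!fp) return false;
    MasterState out = { 42, 1.5 };
    MasterState in = { 0, 0.0 };
    write_master_state(fp, out);
    std::rewind(fp);
    bool ok = read_master_state(fp, in)
              && in.best_value == out.best_value
              && in.elapsed == out.elapsed;
    std::fclose(fp);
    return ok;
}
```

The point of the second half is that whatever write_master_state() emits, read_master_state() has to parse in the same order; a mismatch between the two is a likely source of restart failures.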
If your application coredumps when trying to restart from a checkpoint, it might be because you haven't implemented this function.
The return value is taken as the return value from the user's pack_worker_init_data() function.
virtual ~MWDriver()
void go( int argc, char *argv[] )
void go()
virtual void printresults()
static MWRMComm* RMC
A. Pure Virtual Methods
virtual MWReturn get_userinfo( int argc, char *argv[] )
virtual MWReturn setup_initial_tasks( int *n, MWTask ***task )
virtual MWReturn act_on_completed_task( MWTask * )
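// Recover the application's task type from the generic MWTask pointer t.
My_Task *dt = dynamic_cast<My_Task *> ( t );
assert( dt );  // fails if t is not actually a My_Task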
virtual MWReturn act_on_starting_worker( MWWorkerID *w )
virtual MWReturn pack_worker_init_data( void )
virtual void unpack_worker_initinfo( MWWorkerID *w )
virtual void pack_driver_task_data( void )
B. Task List Management
void workClasses_set( int num )
int workClasses_get( )
int workClasses_getworkers( int num )
int workClasses_gettasks( int num )
int refreshWorkers( int i, MWREFRESH_TYPE )
void addTask( MWTask * )
void addTasks( int, MWTask ** )
void addSortedTasks( int n, MWTask **add_tasks )
void addTaskByKey( MWTask *add_task )
void set_task_key_function( MWKey (*)( MWTask * ) )
int set_task_add_mode( MWTaskAdditionMode )
int set_task_retrieve_mode( MWTaskRetrievalMode )
int sort_task_list( void )
int delete_tasks_worse_than( MWKey )
int get_number_tasks()
int get_number_running_tasks()
int print_task_keys( void )
C. Worker Policy Management
Task timeout policy.
bool worker_timeout
double worker_timeout_limit
int worker_timeout_check_frequency
int next_worker_timeout_check
void reassign_tasks_timedout_workers()
void set_worker_timeout_limit(double timeout_limit, int timeout_frequency)
D. Event Handling Methods
virtual MWReturn handle_benchmark( MWWorkerID *w )
virtual void handle_hostdel()
virtual void handle_hostsuspend()
virtual void handle_hostresume()
virtual void handle_taskexit()
virtual void handle_checksum()
E. Checkpoint Handling Functions
void checkpoint()
void restart_from_ckpt()
int set_checkpoint_frequency( int freq )
int set_checkpoint_time( int secs )
virtual void write_master_state( FILE *fp )
virtual void read_master_state( FILE *fp )
virtual MWTask* gimme_a_task()
MWTask* gimme_a_task() {
return new <your derived task class>;
}
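A concrete version of the gimme_a_task() pattern might look like the sketch below. It assumes that the restart path uses gimme_a_task() to allocate each task before calling read_ckpt_info() on it, which this page suggests but does not state outright. The reduced MWTask and MWDriver classes, My_Task, My_Driver, and count_restored_my_tasks() are hypothetical stand-ins, not the MW declarations.

```cpp
#include <cassert>

// Hypothetical stand-ins for the real MW classes (assumptions, not the MW headers).
class MWTask {
public:
    virtual ~MWTask() {}
};

class My_Task : public MWTask {
public:
    My_Task() : input(0) {}
    int input;
};

class MWDriver {
public:
    virtual ~MWDriver() {}
    // The base class cannot know the derived task type, so the user supplies it.
    virtual MWTask *gimme_a_task() = 0;
};

class My_Driver : public MWDriver {
public:
    // Return a fresh task of the application's own type.
    MWTask *gimme_a_task() { return new My_Task; }
};

// Mimic a restart loop: allocate n tasks through the factory and count how many
// have the expected dynamic type. A real driver would call read_ckpt_info() on
// each task and put it on the todo queue instead of deleting it.
int count_restored_my_tasks(MWDriver &d, int n) {
    int count = 0;
    for (int i = 0; i < n; ++i) {
        MWTask *t = d.gimme_a_task();
        if (dynamic_cast<My_Task *>(t))
            ++count;
        delete t;
    }
    return count;
}
```

Because the driver only ever sees MWTask pointers, this factory is the one place where the derived task type is named on the restart path.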
Benchmarking
double timeval_to_double( struct timeval t )
Main Internal Handling Routines
MWReturn master_setup( int argc, char *argv[] )
argv - The argv from the command line
MWReturn master_mainloop()
MWReturn worker_init( MWWorkerID *w )
MWReturn create_initial_tasks()
MWReturn handle_worker_results( MWWorkerID *w )
void send_task_to_worker( MWWorkerID *w )
void rematch_tasks_to_workers( MWWorkerID *nosend )
void call_hostaddlogic()
void kill_workers()
void hostPostmortem( MWWorkerID *w )
void ControlPanel( )
Internal Task List Routines
void pushTask( MWTask * )
MWTask* getNextTask( MWGroup *grp )
void putOnRunQ( MWTask *t )
MWTask* rmFromRunQ( int jobnum )
void printRunQ()
void ckpt_addTask( MWTask * )
MWWorkerID* task_assigned( MWTask *t )
bool task_in_todo_list( MWTask *t )
MWKey (*task_key)( MWTask * )
MWTaskAdditionMode addmode
MWTaskRetrievalMode getmode
MWKey (*worker_key)( MWWorkerID * )
int task_counter
bool listsorted
MWList* todo
MWTask* get_todo_head()
MWList* running
Worker management methods
void addWorker( MWWorkerID *w )
MWWorkerID* lookupWorker( int tid )
MWWorkerID* rmWorker( int tid )
void worker_last_rites( MWWorkerID *w )
void printWorkers()
MWWorkerID* get_workers_head()
MWList* workers
MWSuspensionPolicy suspensionPolicy
void sort_worker_list()
int numWorkers()
int numWorkers( int arch )
int numWorkersInState( int ThisState )
MWKey return_best_todo_keyval( void )
MWKey return_best_running_keyval( void )
Checkpoint internal helpers...
int checkpoint_frequency
int checkpoint_time_freq
long next_ckpt_time
int num_completed_tasks
MWStatistics stats
double get_instant_pool_perf()
char* getHostName()
XML and Status Methods.
void write_XML_status()
virtual char* get_XML_results_status(void )
char* get_XML_status()
char* get_XML_job_information()
char* get_XML_problem_description()
char* get_XML_interface_remote_files()
char* get_XML_resources_status()
const char* xml_filename
const char* xml_menus_filename
const char* xml_jobinfo_filename
const char* xml_pbdescrib_filename
void get_machine_info()
char* get_Arch()
char* get_OpSys()
char* get_IPAddress()
double get_CondorLoadAvg()
double get_LoadAvg()
int get_Memory()
int get_Cpus()
int get_VirtualMemory()
int get_Disk()
int get_KFlops()
int get_Mips()
char Arch[64]
char OpSys[64]
char IPAddress[64]
double CondorLoadAvg
double LoadAvg
int Memory
int Cpus
int VirtualMemory
int Disk
int KFlops
int Mips
int check_for_int_val(char* name, char* key, char* value)
char mach_name[64]
double defaultTimeInterval