class MWDriver

This class is responsible for managing an application in an opportunistic environment

Public Fields

static MWRMComm* RMC
A static instance of our Resource Management / Communication class

Public Methods

void go( int argc, char *argv[] )
This method runs the entire fault-tolerant application in the Condor environment
void go()
This version of go simply calls go(0, NULL)
MWDriver()
Default constructor
virtual void printresults()
Prints the Results
virtual ~MWDriver()
Destructor - walks through lists of tasks & workers and deletes them

Protected Methods

double timeval_to_double( struct timeval t )
A helper function

Protected

Task Timeout Policy
bool worker_timeout
If false, workers never time out and can potentially work forever on a task. If true, workers time out after worker_timeout_limit seconds
double worker_timeout_limit
Time limit, in seconds, after which a worker is considered timed out and its task is reassigned
int worker_timeout_check_frequency
frequency at which we check if there are timed out workers
int next_worker_timeout_check
based on the time out frequency, next timeout check time
void reassign_tasks_timedout_workers()
Go through the list of timed out WORKING workers and reschedule tasks
void set_worker_timeout_limit(double timeout_limit, int timeout_frequency)
Sets the timeout limit and sets worker_timeout to true
A. Pure Virtual Methods
virtual MWReturn get_userinfo( int argc, char *argv[] )
This function is called to read in all information specific to a user's application and do any initialization on this information
virtual MWReturn setup_initial_tasks( int *n, MWTask ***task )
This function must return a number n > 0 of pointers to Tasks to "jump start" the application
virtual MWReturn act_on_completed_task( MWTask * )
This function performs actions that happen once the Driver receives notification of a completed task
virtual MWReturn act_on_starting_worker( MWWorkerID *w )
This function should be implemented by the application to assign the workClass number to the worker if it is doing intelligent work scheduling
virtual MWReturn pack_worker_init_data( void )
A common theme of Master-Worker applications is that there is a base amount of "initial" data defining the problem, and then just incremental data defining "Tasks" to be done by the Workers
virtual void unpack_worker_initinfo( MWWorkerID *w )
This one unpacks the "initial" information sent to the driver once the worker initializes
virtual void pack_driver_task_data( void )
OK, This one is not pure virtual either, but if you have some "driver" data that is conceptually part of the task and you wish not to replicate the data in each task, you can pack it in a message buffer by implementing this function
B. Task List Management
void workClasses_set( int num )
Set up workClasses
int workClasses_get( )
Get the number of workclasses
int workClasses_getworkers( int num )
get workers in the present class
int workClasses_gettasks( int num )
get tasks in the present class
int refreshWorkers( int i, MWREFRESH_TYPE )
Send all workers in group i a RE_INIT message
void addTask( MWTask * )
Add a task to the list
void addTasks( int, MWTask ** )
Add a bunch of tasks to the list
void addSortedTasks( int n, MWTask **add_tasks )
This will add a list of tasks that are sorted by key
void addTaskByKey( MWTask *add_task )
This is a helper function for addSortedTasks().
void set_task_key_function( MWKey (*)( MWTask * ) )
Sets the function that MWDriver uses to get the "key" for a task
int set_task_add_mode( MWTaskAdditionMode )
Set the mode you wish for task addition.
int set_task_retrieve_mode( MWTaskRetrievalMode )
Set the mode you wish for task retrieval.
int sort_task_list( void )
This sorts the task list by the key that is set
int delete_tasks_worse_than( MWKey )
This deletes all tasks in the task list with a key worse than the one specified
int get_number_tasks()
returns the number of tasks on the todo list.
int get_number_running_tasks()
returns the number of running tasks.
int print_task_keys( void )
(Mostly for debugging) -- Prints the task keys in the todo list
Benchmarking
void register_benchmark_task( MWTask *t )
register the task that will be used for benchmarking
MWTask* get_benchmark_task()
get the benchmark task
C. Worker Policy Management
void set_suspension_policy( MWSuspensionPolicy )
Set the policy to use when suspending
int set_machine_ordering_policy( MWMachineOrderingPolicy )
Sets the machine ordering policy.
D. Event Handling Methods
virtual MWReturn handle_benchmark( MWWorkerID *w )
Here, we get back the benchmarking results, which tell us something about the worker we've got
virtual void handle_hostdel()
This is what gets called when a host goes away
virtual void handle_hostsuspend()
Implements a suspension policy
virtual void handle_hostresume()
Here's where you go when a host gets resumed
virtual void handle_taskexit()
We do basically the same thing as handle_hostdel()
virtual void handle_checksum()
Routine to handle when the communication layer says that a checksum error happened
E. Checkpoint Handling Functions
void checkpoint()
This function writes the current state of the job to disk
void restart_from_ckpt()
This function does the inverse of checkpoint
int set_checkpoint_frequency( int freq )
This function sets the frequency with which checkpoints are done
int set_checkpoint_time( int secs )
Set a time-based frequency for checkpoints
virtual void write_master_state( FILE *fp )
Here you write out all 'state' of the driver to fp
virtual void read_master_state( FILE *fp )
Here, you read in the 'state' of the driver from fp
virtual MWTask* gimme_a_task()
It's really annoying that the user has to do this, but they do

Private Fields

MWStatistics stats
The instance of the stats class that takes workers and later prints out relevant stats

Private Methods

double get_instant_pool_perf()
This returns the sum of the bench results for the currently working machines
char* getHostName()
This returns the hostname

Private

Checkpoint internal helpers...
int checkpoint_frequency
How often to checkpoint? Task frequency based
int checkpoint_time_freq
How often to checkpoint? Time based
long next_ckpt_time
Time to do next checkpoint
int num_completed_tasks
The number of tasks acted upon up to now
MWTask* bench_task
The benchmark task
const char* ckpt_filename
The name of the checkpoint file
Internal Task List Routines
void pushTask( MWTask * )
This puts a (generally failed) task at the beginning of the list
MWTask* getNextTask( MWGroup *grp )
Get a Task.
void putOnRunQ( MWTask *t )
This puts a task at the end of the running list
MWTask* rmFromRunQ( int jobnum )
Removes a task from the queue of things to run
void printRunQ()
Print the tasks in the list of tasks to do
void ckpt_addTask( MWTask * )
Add one task to the todo list; do NOT set the 'number' of the task - useful in restarting from a checkpoint
MWWorkerID* task_assigned( MWTask *t )
returns the worker this task is assigned to, NULL if none.
bool task_in_todo_list( MWTask *t )
Returns true if "t" is still in the todo list
MWKey (*task_key)( MWTask * )
A pointer to a (user written) function that takes an MWTask and returns the "key" for this task
MWTaskAdditionMode addmode
Where should tasks be added to the list?
MWTaskRetrievalMode getmode
Where should tasks be retrieved from the list?
MWKey (*worker_key)( MWWorkerID * )
A pointer to the function that returns the "key" by which machines are ranked
int task_counter
MWDriver keeps a unique identifier for each task -- here's the counter
bool listsorted
Is the list sorted by the current key function
MWList* todo
The head of the list of tasks to do
MWTask* get_todo_head()
This is Jeff's nasty addition so that he can get access to the tasks on the master
MWList* running
The head of the list of tasks that are actually running
Main Internal Handling Routines
MWReturn master_setup( int argc, char *argv[] )
This method is called before master_mainloop() is
MWReturn master_mainloop()
This is the main controlling routine of the master
MWReturn worker_init( MWWorkerID *w )
Unpacks the initial worker information, and sends the application startup information (by calling pure virtual pack_worker_init_data())

The return value is taken as the return value from the user's pack_worker_init_data() function

MWReturn create_initial_tasks()
This routine sets up the list of initial tasks to do on the todo list
MWReturn handle_worker_results( MWWorkerID *w )
Act on a "completed task" message from a worker
void send_task_to_worker( MWWorkerID *w )
We grab the next task off the todo list, build a work message, and send it to a worker
void rematch_tasks_to_workers( MWWorkerID *nosend )
After each result message is processed, we try to match up tasks with workers
void call_hostaddlogic()
A wrapper around the lower level's hostaddlogic
void kill_workers()
Kill all the workers
void hostPostmortem( MWWorkerID *w )
This is called in both handle_hostdel() and handle_taskexit()
void ControlPanel( )
The control panel that controls the execution of the independent mode
Worker management methods
void addWorker( MWWorkerID *w )
Adds a worker to the list of available workers
MWWorkerID* lookupWorker( int tid )
Looks up information about a worker given its task ID
MWWorkerID* rmWorker( int tid )
Removes a worker from the list of available workers
void worker_last_rites( MWWorkerID *w )
This function removes the worker from the list and deletes the structure.
void printWorkers()
Prints the available workers
MWWorkerID* get_workers_head()
Another terrible addition so that Jeff can print out the worker list in his own format
MWList* workers
The head of the list of workers.
MWSuspensionPolicy suspensionPolicy
Here's where we store what should happen on a suspension...
void sort_worker_list()
Based on the ordering policy, place w in the worker list appropriately
int numWorkers()
Counts the existing workers
int numWorkers( int arch )
Counts the number of workers in the given arch class
int numWorkersInState( int ThisState )
Counts the number of workers in the given state
MWKey return_best_todo_keyval( void )
Returns the value (only) of the best key in the Todo list
MWKey return_best_running_keyval( void )
Returns the value (only) of the best key in the Running list.
XML and Status Methods.
void write_XML_status()
virtual char* get_XML_results_status(void )
If you want to display information about status of some results variables of your solver, you have to dump a string in ASCII, HTML or XML format out of the following method
char* get_XML_status()
char* get_XML_job_information()
char* get_XML_problem_description()
char* get_XML_interface_remote_files()
char* get_XML_resources_status()
const char* xml_filename
const char* xml_menus_filename
const char* xml_jobinfo_filename
const char* xml_pbdescrib_filename
void get_machine_info()
Set the current machine information
char* get_Arch()
Returns a pointer to the machine's Arch
char* get_OpSys()
Returns a pointer to the machine's OpSys
char* get_IPAddress()
Returns a pointer to the machine's IPAddress
double get_CondorLoadAvg()
double get_LoadAvg()
int get_Memory()
int get_Cpus()
int get_VirtualMemory()
int get_Disk()
int get_KFlops()
int get_Mips()
char Arch[64]
char OpSys[64]
char IPAddress[64]
double CondorLoadAvg
double LoadAvg
int Memory
int Cpus
int VirtualMemory
int Disk
int KFlops
int Mips
int check_for_int_val(char* name, char* key, char* value)
Utility function used by get_machine_info()
char mach_name[64]
The name of the machine the worker is running on.
double defaultTimeInterval
for measuring network connectivity

Documentation

This class is responsible for managing an application in an opportunistic environment. The goal is to be completely fault-tolerant, dealing with all possibilities of host (worker) problems. To do this, the MWDriver class manages a set of tasks and a set of workers. It monitors messages about hosts coming up and going down, and assigns tasks appropriately.

This class is built upon a resource management and message passing lower layer. Previously, it was built directly on top of Condor-PVM, but the interface to that has been abstracted away so that it can use any facility that provides for resource management and message passing. See the abstract MWRMComm class for details of this lower layer. When interfacing with this level, you'll have to use the RMC object that's a static member of the MWDriver, MWTask, and MWWorker classes.

To implement an application, a user must derive a class from this base class and implement the methods listed under "A. Pure Virtual Methods" below.

For a higher level of control regarding the distribution of tasks to workers, see act_on_starting_worker() and the workClasses functions under "B. Task List Management".

Similar application dependent methods must be implemented for the "Task" of work to be done and the "Worker" who performs the tasks.
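
As a rough sketch of what deriving from the driver looks like, the following uses hypothetical stand-in declarations (MWReturn, MWTask, and a stripped-down MWDriver with only three of the virtual methods) so that it compiles without the MW headers. The real class declares more members and the real MWReturn values may differ; MyTask and MyDriver are illustrative names only.

```cpp
#include <cstdlib>

// Stand-ins for the MW declarations (assumptions, not the real headers).
enum MWReturn { OK, ABORT };
struct MWTask { virtual ~MWTask() {} };

class MWDriver {
public:
    virtual ~MWDriver() {}
    virtual MWReturn get_userinfo(int argc, char *argv[]) = 0;
    virtual MWReturn setup_initial_tasks(int *n, MWTask ***task) = 0;
    virtual MWReturn act_on_completed_task(MWTask *t) = 0;
};

// Hypothetical application task: one integer in, one integer out.
struct MyTask : MWTask {
    int input;
    int result;
    explicit MyTask(int i) : input(i), result(0) {}
};

class MyDriver : public MWDriver {
public:
    int num_tasks;
    long total;
    MyDriver() : num_tasks(0), total(0) {}

    // Read application-specific settings from the command line.
    MWReturn get_userinfo(int argc, char *argv[]) {
        num_tasks = (argc > 1) ? std::atoi(argv[1]) : 4;
        return (num_tasks > 0) ? OK : ABORT;
    }

    // Hand back n > 0 task pointers to jump-start the application.
    MWReturn setup_initial_tasks(int *n, MWTask ***task) {
        *n = num_tasks;
        *task = new MWTask *[num_tasks];
        for (int i = 0; i < num_tasks; ++i) (*task)[i] = new MyTask(i + 1);
        return OK;
    }

    // Cast back to the derived task type and accumulate its result.
    MWReturn act_on_completed_task(MWTask *t) {
        MyTask *mt = dynamic_cast<MyTask *>(t);
        if (!mt) return ABORT;
        total += mt->result;
        return OK;
    }
};
```

In the real library the driver, not the application, calls these methods from go(); the sketch only shows the shape of the overrides.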

MWDriver()
Default constructor

virtual ~MWDriver()
Destructor - walks through lists of tasks & workers and deletes them

void go( int argc, char *argv[] )
This method runs the entire fault-tolerant application in the Condor environment. What it really does is call master_setup(), then master_mainloop(), then printresults(), and then ends. See the other functions for details.

void go()
This version of go simply calls go(0, NULL)

virtual void printresults()
Prints the Results. Applications may re-implement this to print their application specific results. It is meant to be over-ridden.

static MWRMComm* RMC
A static instance of our Resource Management / Communication class. It's a member of this class because that way derived classes can use it easily; it's static because there should only be one instance EVER. The instance of RMC in the MWTask class is actually a pointer to this one...

A. Pure Virtual Methods
These are the methods from the MWDriver class that a user must reimplement in order to create an application.

virtual MWReturn get_userinfo( int argc, char *argv[] )
This function is called to read in all information specific to a user's application and do any initialization on this information

virtual MWReturn setup_initial_tasks( int *n, MWTask ***task )
This function must return a number n > 0 of pointers to Tasks to "jump start" the application.

The MWTasks pointed to should be of the task type derived for your application

virtual MWReturn act_on_completed_task( MWTask * )
This function performs actions that happen once the Driver receives notification of a completed task. You will need to cast the MWTask * to a pointer of the Task type derived for your application. For example:

      My_Task *dt = dynamic_cast<My_Task *> ( t );
      assert( dt );
				

virtual MWReturn act_on_starting_worker( MWWorkerID *w )
This function should be implemented by the application to assign the workClass number to the worker if it is doing intelligent work scheduling

virtual MWReturn pack_worker_init_data( void )
A common theme of Master-Worker applications is that there is a base amount of "initial" data defining the problem, and then just incremental data defining "Tasks" to be done by the Workers.

This one packs all the user's initial data. It is unpacked in the worker class, in unpack_init_data().

virtual void unpack_worker_initinfo( MWWorkerID *w )
This one unpacks the "initial" information sent to the driver once the worker initializes.

Potential "initial" information that might be useful is...

  • Information on the worker characteristics etc...
  • Information on the bandwidth between MWDriver and worker

These sorts of things could be useful in building some scheduling intelligence into the driver.

virtual void pack_driver_task_data( void )
OK, This one is not pure virtual either, but if you have some "driver" data that is conceptually part of the task and you wish not to replicate the data in each task, you can pack it in a message buffer by implementing this function. If you do this, you must implement a matching unpack_worker_task_data() function.

B. Task List Management
These functions are to manage the list of Tasks. MW provides default useful functionality for managing the list of tasks.

void workClasses_set( int num )
Set up workClasses

int workClasses_get( )
Get the number of workclasses

int workClasses_getworkers( int num )
get workers in the present class

int workClasses_gettasks( int num )
get tasks in the present class

int refreshWorkers( int i, MWREFRESH_TYPE )
Send all workers in group i a RE_INIT message

void addTask( MWTask * )
Add a task to the list

void addTasks( int, MWTask ** )
Add a bunch of tasks to the list. You do this by making an array of pointers to MWTasks and giving that array to this function. The MWDriver will take over memory management for the MWTasks, but not for the array of pointers, so don't forget to delete [] it!
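
The ownership rule above can be illustrated with a small sketch. SketchDriver and add_batch() are hypothetical stand-ins, not MW code; they only mimic the described behaviour, in which the driver takes over the tasks while the caller keeps responsibility for the pointer array.

```cpp
#include <vector>

// Stand-ins (assumptions): a minimal task and a driver that owns its tasks.
struct MWTask { virtual ~MWTask() {} };

class SketchDriver {
    std::vector<MWTask *> todo;
public:
    // Copies the pointers; the caller's array itself is not kept.
    void addTasks(int n, MWTask **tasks) {
        for (int i = 0; i < n; ++i) todo.push_back(tasks[i]);
    }
    int get_number_tasks() const { return static_cast<int>(todo.size()); }
    // The driver deletes the tasks it took over.
    ~SketchDriver() {
        for (MWTask *t : todo) delete t;
    }
};

// Build an array of task pointers, hand it over, then free only the array.
int add_batch(SketchDriver &driver, int n) {
    MWTask **batch = new MWTask *[n];
    for (int i = 0; i < n; ++i) batch[i] = new MWTask;
    driver.addTasks(n, batch);
    delete [] batch;   // the array is still ours to free; the tasks are not
    return driver.get_number_tasks();
}
```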

void addSortedTasks( int n, MWTask **add_tasks )
This will add a list of tasks that are sorted by key. Efficiency can be greatly improved by using this function

void addTaskByKey( MWTask *add_task )
This is a helper function for addSortedTasks().

void set_task_key_function( MWKey (*)( MWTask * ) )
Sets the function that MWDriver uses to get the "key" for a task

int set_task_add_mode( MWTaskAdditionMode )
Set the mode you wish for task addition.

int set_task_retrieve_mode( MWTaskRetrievalMode )
Set the mode you wish for task retrieval.

int sort_task_list( void )
This sorts the task list by the key that is set

int delete_tasks_worse_than( MWKey )
This deletes all tasks in the task list with a key worse than the one specified
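
How a key function, sorting, and pruning might fit together is sketched below. This is not the MW implementation: MWKey is assumed to be a double, a smaller key is assumed to be "better", and the return value of delete_tasks_worse_than() is assumed to be the number of tasks removed. SketchTaskList, bound_key(), and best_key() are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

typedef double MWKey;                 // assumption: the real MWKey may differ
struct MWTask { double bound; };      // hypothetical task payload

// User-written key function: here the key is simply the task's bound.
MWKey bound_key(MWTask *t) { return t->bound; }

class SketchTaskList {
    std::vector<MWTask *> todo;
    MWKey (*task_key)(MWTask *);
public:
    SketchTaskList() : task_key(0) {}
    ~SketchTaskList() { for (MWTask *t : todo) delete t; }
    void set_task_key_function(MWKey (*f)(MWTask *)) { task_key = f; }
    void addTask(MWTask *t) { todo.push_back(t); }
    int get_number_tasks() const { return static_cast<int>(todo.size()); }
    MWKey best_key() const { return task_key(todo.front()); }

    // Sort ascending by key (assumption: smaller is better).
    void sort_task_list() {
        MWKey (*key)(MWTask *) = task_key;
        std::sort(todo.begin(), todo.end(),
                  [key](MWTask *a, MWTask *b) { return key(a) < key(b); });
    }

    // Delete every task whose key is worse (larger) than the cutoff.
    int delete_tasks_worse_than(MWKey cutoff) {
        int removed = 0;
        std::vector<MWTask *> kept;
        for (MWTask *t : todo) {
            if (task_key(t) > cutoff) { delete t; ++removed; }
            else kept.push_back(t);
        }
        todo.swap(kept);
        return removed;
    }
};
```

A branch-and-bound style application could call the pruning step whenever a better incumbent is found, which is one plausible use of a "worse than" cutoff.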

int get_number_tasks()
returns the number of tasks on the todo list.

int get_number_running_tasks()
returns the number of running tasks.

int print_task_keys( void )
(Mostly for debugging) -- Prints the task keys in the todo list

C. Worker Policy Management

void set_suspension_policy( MWSuspensionPolicy )
Set the policy to use when suspending. Currently this can be either DEFAULT or REASSIGN

int set_machine_ordering_policy( MWMachineOrderingPolicy )
Sets the machine ordering policy.

Task Timeout Policy
MW provides a mechanism for rescheduling tasks assigned to workers that are potentially "lost". If the RMComm fails to notify MW of a worker going away in a timely fashion, the state of the computing platform and MW's view of that state may fall out of sync. To make sure that all tasks are done in a timely fashion, the user may set a time limit after which a task running on a "lost" worker may be rescheduled.

bool worker_timeout
If false, workers never time out and can potentially work forever on a task. If true, workers time out after worker_timeout_limit seconds

double worker_timeout_limit
Time limit, in seconds, after which a worker is considered timed out and its task is reassigned

int worker_timeout_check_frequency
frequency at which we check if there are timed out workers

int next_worker_timeout_check
based on the time out frequency, next timeout check time

void reassign_tasks_timedout_workers()
Go through the list of timed out WORKING workers and reschedule tasks

void set_worker_timeout_limit(double timeout_limit, int timeout_frequency)
Sets the timeout limit and sets worker_timeout to true
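
The interplay of the four timeout fields can be sketched as follows. This is an illustration under assumptions, not the MW code: SketchWorker and TimeoutPolicy are hypothetical, times are plain integer seconds, and the extra "now" argument exists only so the sketch does not depend on the system clock.

```cpp
#include <vector>

// Hypothetical stand-in for the worker bookkeeping the driver keeps.
struct SketchWorker {
    bool working;        // currently running a task?
    double task_start;   // time (seconds) at which its task was sent
};

struct TimeoutPolicy {
    bool worker_timeout;
    double worker_timeout_limit;
    int worker_timeout_check_frequency;
    int next_worker_timeout_check;

    TimeoutPolicy() : worker_timeout(false), worker_timeout_limit(0.0),
                      worker_timeout_check_frequency(0),
                      next_worker_timeout_check(0) {}

    // Store the limit and check frequency, and enable timeouts.
    void set_worker_timeout_limit(double limit, int frequency, int now) {
        worker_timeout = true;
        worker_timeout_limit = limit;
        worker_timeout_check_frequency = frequency;
        next_worker_timeout_check = now + frequency;
    }

    // Returns the indices of WORKING workers that exceeded the limit.
    // Does nothing until the next scheduled check time is reached.
    std::vector<int> timed_out(const std::vector<SketchWorker> &ws, int now) {
        std::vector<int> late;
        if (!worker_timeout || now < next_worker_timeout_check) return late;
        next_worker_timeout_check = now + worker_timeout_check_frequency;
        for (int i = 0; i < static_cast<int>(ws.size()); ++i)
            if (ws[i].working && now - ws[i].task_start > worker_timeout_limit)
                late.push_back(i);
        return late;
    }
};
```

The tasks of the workers returned by timed_out() are the ones that reassign_tasks_timedout_workers() would reschedule.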

D. Event Handling Methods
In the case that the user wants to take specific actions when notified of processors going away, these methods may be reimplemented. Care must be taken when reimplementing these, or else things may get messed up.

Probably a better solution in the long run is to provide users hooks into these functions or something.

Basic default functionality that updates the known status of our virtual machine is provided.

virtual MWReturn handle_benchmark( MWWorkerID *w )
Here, we get back the benchmarking results, which tell us something about the worker we've got. Also, we could get some sort of error back from the worker at this stage, in which case we remove it.

virtual void handle_hostdel()
This is what gets called when a host goes away. We figure out who died, remove that worker from our records, remove its task from the running queue (if it was running one) and put that task back on the todo list.
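
The bookkeeping described above might look roughly like this. SketchPool, SketchWorker, and host_deleted() are hypothetical stand-ins; placing the requeued task at the front of the todo list is an assumption, chosen to match what pushTask() is documented to do for failed tasks.

```cpp
#include <deque>
#include <map>

struct MWTask { int number; };                          // stand-in
struct SketchWorker { int tid; MWTask *runningtask; };  // stand-in

struct SketchPool {
    std::map<int, SketchWorker> workers;   // keyed by worker id
    std::deque<MWTask *> todo;
    std::deque<MWTask *> running;

    // Forget the departed worker; requeue the task it was running, if any.
    // Returns true if a task was put back on the todo list.
    bool host_deleted(int tid) {
        std::map<int, SketchWorker>::iterator it = workers.find(tid);
        if (it == workers.end()) return false;
        MWTask *t = it->second.runningtask;
        workers.erase(it);
        if (!t) return false;
        for (std::deque<MWTask *>::iterator r = running.begin();
             r != running.end(); ++r)
            if (*r == t) { running.erase(r); break; }
        todo.push_front(t);   // assumption: failed tasks go to the front
        return true;
    }
};
```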

virtual void handle_hostsuspend()
Implements a suspension policy. Currently either DEFAULT or REASSIGN, depending on how suspensionPolicy is set.

virtual void handle_hostresume()
Here's where you go when a host gets resumed. Usually, you do nothing...but it's nice to know...

virtual void handle_taskexit()
We do basically the same thing as handle_hostdel(). One might think that we could restart something on that host; in practice, however -- especially with the Condor-PVM RMComm implementation -- it means that the host has gone down, too. We put that host's task back on the todo list.

virtual void handle_checksum()
Routine to handle when the communication layer says that a checksum error happened. If the underlying Communicator provides reliable communication, then this message need not be generated. But for some Communicators, such as MW-File, we may need something like this.

E. Checkpoint Handling Functions
These are logical checkpoint handling functions. They are virtual, and are *entirely* application-specific. In them, the user must save the "state" of the application to permanent storage (disk). To do this, you need to:

  • Implement the methods write_master_state() and read_master_state() in your derived MWDriver app.
  • Implement the methods write_ckpt_info() and read_ckpt_info() in your derived MWTask class.

Then MWDriver does the rest for you. When checkpoint() is called (see below) it opens up a known filename for writing. It passes the file pointer of that file to write_master_state(), which dumps the "state" of the master to that fp. Here "state" includes all the variables, info, etc. of YOUR CLASS THAT WAS DERIVED FROM MWDRIVER. All state in MWDriver.C is taken care of (there's not much). Next, checkpoint will walk down the running queue and the todo queue and call each member's write_ckpt_info().

Upon restart, MWDriver will detect the presence of the checkpoint file and restart from it. It calls read_master_state(), which is the inverse of write_master_state(). Then, for each task in the checkpoint file, it creates a new MWTask, calls read_ckpt_info() on it, and adds it to the todo queue.

We start from there and proceed as normal.

One can set the "frequency" that checkpoint files will be written (using set_checkpoint_frequency()). The default frequency is zero - no checkpointing. When the frequency is set to n, every nth time that act_on_completed_task gets called, we checkpoint immediately afterwards. If your application involves "work steps", you probably will want to leave the frequency at zero and call checkpoint yourself at the end of a work step.
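
The "every nth completed task" rule can be sketched with a simple counter. CheckpointCounter and task_completed() are hypothetical; the checkpoint() here only counts calls instead of writing to disk, and the modulo test is one plausible way to realize the rule rather than the confirmed MW logic.

```cpp
// Hypothetical stand-in showing only the task-count checkpoint trigger.
struct CheckpointCounter {
    int checkpoint_frequency;   // 0 means no automatic checkpointing
    int num_completed_tasks;
    int checkpoints_written;

    CheckpointCounter()
        : checkpoint_frequency(0), num_completed_tasks(0),
          checkpoints_written(0) {}

    // Returns the former frequency, as documented for the real function.
    int set_checkpoint_frequency(int freq) {
        int old = checkpoint_frequency;
        checkpoint_frequency = freq;
        return old;
    }

    // Stand-in for the real checkpoint(), which writes state to disk.
    void checkpoint() { ++checkpoints_written; }

    // Call once after each act_on_completed_task().
    void task_completed() {
        ++num_completed_tasks;
        if (checkpoint_frequency > 0 &&
            num_completed_tasks % checkpoint_frequency == 0)
            checkpoint();
    }
};
```

An application organized in "work steps" would leave the frequency at zero and call checkpoint() itself at the end of each step.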

void checkpoint()
This function writes the current state of the job to disk. See the section header to see how it does this.
See Also:
MWTask

void restart_from_ckpt()
This function does the inverse of checkpoint. It opens the checkpoint file, calls read_master_state(), then, for each task class in the file, creates a MWTask, calls read_ckpt_info on it, and adds that class to the todo list.

int set_checkpoint_frequency( int freq )
This function sets the frequency with which checkpoints are done. It returns the former frequency value. The default frequency is zero (no checkpoints). If the frequency is n, then a checkpoint will occur after every nth call to act_on_completed_task(). A good place to set this is in get_userinfo().
Returns:
The former frequency value.
Parameters:
freq - The frequency to set checkpoints to.

int set_checkpoint_time( int secs )
Set a time-based frequency for checkpoints. The time units are in seconds. A value of 0 "turns off" time-based checkpointing. Time-based checkpointing cannot be "turned on" unless the checkpoint_frequency is set to 0. A good place to do this is in get_userinfo().
Returns:
The former time frequency value.
Parameters:
secs - Checkpoint every "secs" seconds

virtual void write_master_state( FILE *fp )
Here you write out all 'state' of the driver to fp
Parameters:
fp - A file pointer that has been opened for writing.

virtual void read_master_state( FILE *fp )
Here, you read in the 'state' of the driver from fp. Note that this is the reverse of write_master_state().
Parameters:
fp - A file pointer that has been opened for reading.
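
A minimal sketch of a matching write/read pair is shown below. MyDriverState and its two fields are hypothetical, and the reader returns a bool for convenience even though the documented read_master_state() returns void. The important point is that the fields are read back in exactly the order they were written.

```cpp
#include <cstdio>

// Hypothetical driver state; only the derived class's own data is saved.
struct MyDriverState {
    int tasks_done;
    double best_value;

    MyDriverState() : tasks_done(0), best_value(0.0) {}

    // Dump the state as text to an already opened file.
    void write_master_state(FILE *fp) const {
        std::fprintf(fp, "%d %.17g\n", tasks_done, best_value);
    }

    // Read the fields back in the same order they were written.
    bool read_master_state(FILE *fp) {
        return std::fscanf(fp, "%d %lf", &tasks_done, &best_value) == 2;
    }
};
```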

virtual MWTask* gimme_a_task()
It's really annoying that the user has to do this, but they do. The thing is, we have to make a new task of the user's derived type when we read in the checkpoint file.

If your application coredumps when trying to restart from a checkpoint, it might be because you haven't implemented this function.

      MWTask* gimme_a_task() {
      return new <your derived task class>;
      }
      

Benchmarking
We now have a user-defined benchmarking phase. The user can "register" a task that is sent to each worker upon startup. This way, the user knows which machines are fastest, and MW can perform an automatic "normalization" of the equivalent CPU time.

void register_benchmark_task( MWTask *t )
register the task that will be used for benchmarking

MWTask* get_benchmark_task()
get the benchmark task

double timeval_to_double( struct timeval t )
A helper function... Returns double value of seconds in timeval t.
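
One plausible implementation of such a helper, assuming a POSIX system where struct timeval carries tv_sec and tv_usec, is sketched here as a free function rather than a class member:

```cpp
#include <sys/time.h>   // struct timeval (POSIX)

// Seconds plus microseconds scaled to a fraction of a second.
double timeval_to_double(struct timeval t) {
    return static_cast<double>(t.tv_sec) +
           static_cast<double>(t.tv_usec) / 1000000.0;
}
```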

Main Internal Handling Routines

MWReturn master_setup( int argc, char *argv[] )
This method is called before master_mainloop() is. It does some setup, including calling the get_userinfo() and create_initial_tasks() methods. It then figures out how many machines it has and starts worker processes on them.
Returns:
This is the return value from the user's get_userinfo() routine. If get_userinfo() returns OK, then the return value is from the user's setup_initial_tasks() function.
Parameters:
argc - The argc from the command line
argv - The argv from the command line

MWReturn master_mainloop()
This is the main controlling routine of the master. It sits in a loop that accepts a message and then (in a big switch statement) calls routines to deal with that message. This loop ends when there are no jobs on either the running or todo queues. It is probably best to see the switch statement yourself to see which routines are called when.
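
The overall shape of such a loop is sketched below with heavy simplification. The message tags, the SketchMaster type, the single-worker assumption, and the use of plain counters in place of the task lists are all hypothetical; the real message set is defined by the RMComm layer and the real loop blocks on message receipt.

```cpp
#include <queue>

// Hypothetical message tags; the real set is defined by the RMComm layer.
enum SketchMsg { MSG_RESULTS, MSG_HOSTDEL };

struct SketchMaster {
    int todo;                     // number of tasks waiting
    int running;                  // number of tasks on workers
    std::queue<SketchMsg> inbox;  // stand-in for blocking message receipt

    // Loop until both queues are empty, dispatching on the message tag.
    // Returns the number of messages handled.
    int mainloop() {
        int handled = 0;
        while (todo > 0 || running > 0) {
            // Assumption: one worker, so at most one task runs at a time.
            while (todo > 0 && running < 1) { --todo; ++running; }
            if (inbox.empty()) break;   // sketch only: nothing left to receive
            SketchMsg m = inbox.front();
            inbox.pop();
            ++handled;
            switch (m) {
            case MSG_RESULTS: --running; break;          // task finished
            case MSG_HOSTDEL: --running; ++todo; break;  // task goes back
            }
        }
        return handled;
    }
};
```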

MWReturn worker_init( MWWorkerID *w )
Unpacks the initial worker information, and sends the application startup information (by calling pure virtual pack_worker_init_data())

The return value is taken as the return value from the user's pack_worker_init_data() function

MWReturn create_initial_tasks()
This routine sets up the list of initial tasks to do on the todo list. It calls the pure virtual function setup_initial_tasks().
Returns:
Is taken from the return value of setup_initial_tasks().

MWReturn handle_worker_results( MWWorkerID *w )
Act on a "completed task" message from a worker. Calls pure virtual function act_on_completed_task().
Returns:
Is from the return value of act_on_completed_task().

void send_task_to_worker( MWWorkerID *w )
We grab the next task off the todo list, build a work message, and send it to a worker. That worker is marked as "working" and has its runningtask pointer set to that task. The worker pointer in the task is set to that worker. The task is then placed on the running queue.

void rematch_tasks_to_workers( MWWorkerID *nosend )
After each result message is processed, we try to match up tasks with workers. (New tasks might have been added to the list during processing of a message). Don't send a task to "nosend", since he just reported in.

void call_hostaddlogic()
A wrapper around the lower level's hostaddlogic. Handles things like counting machines and deleting surplus

void kill_workers()
Kill all the workers

void hostPostmortem( MWWorkerID *w )
This is called in both handle_hostdel() and handle_taskexit(). It removes the host from our records and cleans up relevant pointers with the task it's running.

void ControlPanel( )
The control panel that controls the execution of the independent mode

Internal Task List Routines
These methods and data are responsible for managing the list of tasks to be done

void pushTask( MWTask * )
This puts a (generally failed) task at the beginning of the list

MWTask* getNextTask( MWGroup *grp )
Get a Task.

void putOnRunQ( MWTask *t )
This puts a task at the end of the running list

MWTask* rmFromRunQ( int jobnum )
Removes a task from the queue of things to run

void printRunQ()
Print the tasks in the list of tasks to do

void ckpt_addTask( MWTask * )
Add one task to the todo list; do NOT set the 'number' of the task - useful in restarting from a checkpoint

MWWorkerID* task_assigned( MWTask *t )
returns the worker this task is assigned to, NULL if none.

bool task_in_todo_list( MWTask *t )
Returns true if "t" is still in the todo list

MWKey (*task_key)( MWTask * )
A pointer to a (user written) function that takes an MWTask and returns the "key" for this task. The user is allowed to change the "key" by simply changing the function

MWTaskAdditionMode addmode
Where should tasks be added to the list?

MWTaskRetrievalMode getmode
Where should tasks be retrieved from the list?

MWKey (*worker_key)( MWWorkerID * )
A pointer to the function that returns the "key" by which machines are ranked. Right now, we offer only some (hopefully useful) default functions that are set through the machine_ordering_policy

int task_counter
MWDriver keeps a unique identifier for each task -- here's the counter

bool listsorted
Is the list sorted by the current key function

MWList* todo
The head of the list of tasks to do

MWTask* get_todo_head()
This is Jeff's nasty addition so that he can get access to the tasks on the master

MWList* running
The head of the list of tasks that are actually running

Worker management methods
These methods act on the list of workers (or, specifically, the IDs of workers) that the driver knows about.

void addWorker( MWWorkerID *w )
Adds a worker to the list of available workers

MWWorkerID* lookupWorker( int tid )
Looks up information about a worker given its task ID

MWWorkerID* rmWorker( int tid )
Removes a worker from the list of available workers

void worker_last_rites( MWWorkerID *w )
This function removes the worker from the list and deletes the structure.

void printWorkers()
Prints the available workers

MWWorkerID* get_workers_head()
Another terrible addition so that Jeff can print out the worker list in his own format

MWList* workers
The head of the list of workers.

MWSuspensionPolicy suspensionPolicy
Here's where we store what should happen on a suspension...

void sort_worker_list()
Based on the ordering policy, place w in the worker list appropriately

int numWorkers()
Counts the existing workers

int numWorkers( int arch )
Counts the number of workers in the given arch class

int numWorkersInState( int ThisState )
Counts the number of workers in the given state

MWKey return_best_todo_keyval( void )
Returns the value (only) of the best key in the Todo list

MWKey return_best_running_keyval( void )
Returns the value (only) of the best key in the Running list.

Checkpoint internal helpers...

int checkpoint_frequency
How often to checkpoint? Task frequency based

int checkpoint_time_freq
How often to checkpoint? Time based

long next_ckpt_time
Time to do next checkpoint...valid when using time-based checkpointing.

int num_completed_tasks
The number of tasks acted upon up to now. Used with checkpoint_frequency

MWTask* bench_task
The benchmark task

const char* ckpt_filename
The name of the checkpoint file

MWStatistics stats
The instance of the stats class that takes workers and later prints out relevant stats...

double get_instant_pool_perf()
This returns the sum of the bench results for the currently working machines
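
Read literally, the description amounts to a filtered sum. The sketch below uses a hypothetical SketchWorker record and a free function; how the real driver stores worker state and benchmark results is not shown in this page.

```cpp
#include <vector>

// Hypothetical per-worker record: state plus benchmark result.
struct SketchWorker {
    bool working;
    double bench_result;
};

// Sum the benchmark results of the workers that are currently working.
double instant_pool_perf(const std::vector<SketchWorker> &workers) {
    double sum = 0.0;
    for (const SketchWorker &w : workers)
        if (w.working) sum += w.bench_result;
    return sum;
}
```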

char* getHostName()
This returns the hostname

XML and Status Methods.
This function is called by the CORBA layer to get the XML status of the MWDriver.

void write_XML_status()

virtual char* get_XML_results_status(void )
If you want to display information about status of some results variables of your solver, you have to dump a string in ASCII, HTML or XML format out of the following method. The iMW interface will be in charge of displaying this information on the user's browser.

char* get_XML_status()

char* get_XML_job_information()

char* get_XML_problem_description()

char* get_XML_interface_remote_files()

char* get_XML_resources_status()

const char* xml_filename

const char* xml_menus_filename

const char* xml_jobinfo_filename

const char* xml_pbdescrib_filename

void get_machine_info()
Set the current machine information

char* get_Arch()
Returns a pointer to the machine's Arch

char* get_OpSys()
Returns a pointer to the machine's OpSys

char* get_IPAddress()
Returns a pointer to the machine's IPAddress

double get_CondorLoadAvg()

double get_LoadAvg()

int get_Memory()

int get_Cpus()

int get_VirtualMemory()

int get_Disk()

int get_KFlops()

int get_Mips()

char Arch[64]

char OpSys[64]

char IPAddress[64]

double CondorLoadAvg

double LoadAvg

int Memory

int Cpus

int VirtualMemory

int Disk

int KFlops

int Mips

int check_for_int_val(char* name, char* key, char* value)
Utility function used by get_machine_info()

char mach_name[64]
The name of the machine the worker is running on.

double defaultTimeInterval
for measuring network connectivity


This class has no child classes.
Author:
Mike Yoder, Jeff Linderoth, Jean-Pierre Goux, Sanjeev Kulkarni
See Also:
MWTask
MWWorker
MWRMComm
