GLEXEC-2011-0001

Summary:

When gLExec is used in a typical configuration, a malicious job can continue consuming resources after it appears to have finished and, in certain situations, attack subsequent jobs.

Component:
Vulnerable Versions: all
Platform: all
Availability: not known to be publicly available
Fix Available:
Status: Verified
Access Required: Any user with permission to execute a job through gLExec.
Host Type Required:
Effort Required: low
Impact/Consequences: medium
Fixed Date:
Credit: Daniel Crowell, James A. Kupsch

Effort Required:

low

The attacker only needs to get the batch system to start a malicious job through gLExec.

Impact/Consequences:

medium

An attacker can continue consuming resources after the batch system thinks the original job has exited and, in certain situations, attack subsequent jobs.

Full Details:

GLExec is an identity-switching program that allows a job to execute under a different identity. The identity is selected using the user-provided X.509 credential and a mapping service such as a grid-map file or GUMS. GLExec switches users when starting the job and has a mode called linger, in which it waits until the job has terminated. In linger mode, gLExec starts the job, waits for it to finish, writes a log record, and then exits. Without linger mode, gLExec execs the job itself: the gLExec process is replaced by the job's process and therefore cannot perform any additional monitoring.

The problem is that linger mode merely waits for the initial child process using waitpid (and non-linger mode does nothing at all to monitor the job). To get around this check, the job can fork and exec another child process of its own and then allow the initial job process to terminate. As soon as the initial process terminates, gLExec logs that the job has finished and exits. Meanwhile, the child process continues running.
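The effect can be reproduced outside gLExec with a minimal sketch. The function below is a hypothetical stand-in for linger mode, not gLExec code: it forks a "job", waits only for that initial process with waitpid, and returns the PID of a grandchild that is still running at that point. The name linger_wait_misses_grandchild, the pipe used to report the PID, and the 30-second sleep are all assumptions made for illustration.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for linger mode (not gLExec code).
 * Returns the PID of a grandchild that is still running after
 * waitpid() on the initial job process has returned, or -1 on error. */
pid_t linger_wait_misses_grandchild(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t job = fork();
    if (job == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (job == 0) {
        /* Initial job process: fork a grandchild, report its PID, exit. */
        close(fds[0]);
        pid_t grandchild = fork();
        if (grandchild == 0) {
            close(fds[1]);
            sleep(30);          /* stands in for long-running work */
            _exit(0);
        }
        (void)write(fds[1], &grandchild, sizeof grandchild);
        _exit(0);
    }

    /* Wrapper side: learn the grandchild PID (for the demo only). */
    close(fds[1]);
    pid_t survivor = -1;
    ssize_t n = read(fds[0], &survivor, sizeof survivor);
    close(fds[0]);

    /* This is the only process the wrapper waits for. */
    int status;
    if (waitpid(job, &status, 0) == -1 || n != (ssize_t)sizeof survivor)
        return -1;

    /* A linger-style wrapper would now log "job finished" and exit,
     * although the grandchild is still alive. */
    return survivor;
}
```

If the returned PID is positive, kill(pid, 0) on it succeeds, which shows that a process belonging to the job outlived the wait. The caller should kill that process afterwards.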

Child job processes still running after the job has finished can cause several problems. First, resources can continue to be used indefinitely on the system. Second, depending on the policy, subsequent jobs may be attacked if user IDs are reused. In this situation, a later job submitted to gLExec can end up with the same user ID as the first malicious process. Since the two processes are owned by the same user, the malicious process can attack the new job by sending it signals, accessing its files, or controlling the process using a debugging interface.

A simple example of how a job could leave malicious code running:


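  #include <stdlib.h>
  #include <sys/types.h>
  #include <unistd.h>

  int main(void) {
      pid_t pid = fork();

      if (pid == -1) {
          /* fork failed: no child was created */
          return EXIT_FAILURE;
      }

      if (pid == 0) {
          /* child process: keeps running after the parent has exited */
          /* malicious code here */
      } else {
          /* parent process: exit at once so that gLExec's wait returns */
          exit(EXIT_SUCCESS);
      }
      return EXIT_SUCCESS;
  }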

Cause:

GLExec is not designed to perform a full cleanup of a job it executes.

Proposed Fix:

The best solution would be for gLExec to perform a full cleanup. To do this, gLExec would need to track and kill all processes created by the job. The second-best option would be an official LCMAPS plug-in for gLExec that provides this functionality. An example of such a plug-in, called gLExec-osg, can be obtained from VDT. In general this idea would work, but there are problems with the current OSG version (see OSG-GLEXEC_2011-0001). If gLExec itself is not changed to support job tracking and an official plug-in is not implemented, the gLExec documentation needs to describe this limitation and warn users that they must provide tracking and cleanup functionality themselves.

Acknowledgment:

This research was funded in part by Department of Homeland Security grant FA8750-10-2-0030 (funded through AFRL) and NATO grant ICS.MD.CLG 984138.