3.5 Serialization Attack Exercise

1. Introduction

Serialization is a technique to convert in-memory data structures into a stable byte representation that can later be converted back into an equivalent in-memory structure. This is most often used for transmitting data over a network or persistently storing program state. While serialization and deserialization offer a powerful tool for abstraction, they can contain dangerous vulnerabilities when applied to untrusted data. See Chapter 3.5 on Serialization for more information on general serialization security concerns.

1.1 Exercise Description

For this exercise, we provide a simple server and client that communicate serialized strings over socket connections. This is to emulate a situation in which some server deserializes data from an untrusted client. For example, consider a situation in which you write and distribute a Python program that sends some serialized program state along with a bug report. Since the client program can be trivially reverse engineered, an attacker can send any serialized data to your server.

The program for this exercise uses Python's pickle library to serialize an object, send it to our server, deserialize the object, and print its value. Serialization (pickling) in Python works in an interesting, flexible, and somewhat complicated way. For an object that defines __reduce__(self), the pickled form is essentially a pair of fields. The first field is a callable object (basically a function name) and the second is a tuple of the data elements to be passed as arguments to that callable. The Python function pickle.dumps(...) internally calls the object's __reduce__(self) method, which returns this pair of fields. These two fields are sent to the recipient, who deserializes (unpickles) the data by calling the callable with the arguments in the tuple. The result of this call is the deserialized object. Because the sender chooses the callable, a recipient that unpickles untrusted data can be made to call functions it never intended to call. Our implementation of this client-server program is in three Python files, described below. Your objective is to change the client such that deserializing the malicious object will execute arbitrary shell commands on the server.
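The (callable, arguments) pair described above can be seen in a small standalone sketch. It assumes Python 3 and is not the exercise's client.py: the class name Surprise and the use of the built-in str as the decoding callable are stand-ins chosen for illustration, in place of the exercise's own codec functions.

```python
import pickle
import pickletools


class Surprise:
    """Illustrative stand-in for the exercise's "surprise" object."""

    def __init__(self, text):
        self.text = text

    def __reduce__(self):
        # Return (callable, args). The unpickler will later evaluate
        # str(b"...", "utf-8"), which decodes the bytes back into a string.
        return (str, (self.text.encode("utf-8"), "utf-8"))


payload = pickle.dumps(Surprise("my message"))

# Optional: list the pickle opcodes to see the callable and its arguments.
pickletools.dis(payload)

# Unpickling calls the recorded callable with the recorded arguments.
restored = pickle.loads(payload)
print(type(restored).__name__, repr(restored))
```

Note that the value that comes back is whatever the callable returns (here a plain string), not an instance of Surprise.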

Filename		Purpose
`codec.py`		This file defines two trivial functions to convert a string object into a byte representation. This must be in a separate file because the client and server must be using consistent encoding/decoding mechanisms. You will not need to change this file.
`server.py`		This file implements our simple server, which listens for socket connections and deserializes the data it receives. It expects to deserialize a string using the functions defined in our codec. You will change this file as a part of mitigating the vulnerability.
`client.py`		This file implements a "good" client that connects to our server. It serializes our "surprise" object by overriding the `__reduce__(self)` method and sends that serialized data to our server. The override for `__reduce__(self)` simply calls our `myEncode` function on a member string, uses this as the single argument in our data tuple, and specifies `myDecode` as the deserialization function. You will change this file to create your exploit and again as a part of mitigating the vulnerability.

1.2 Vulnerability Mitigation

There are many considerations when trying to mitigate serialization vulnerabilities. The text covers some of these general concepts. For this exercise, we will mitigate the vulnerability by restricting which methods can be called by the unpickling process. The Python pickle documentation describes this mechanism in more detail. Your objective will be to implement this restriction such that only our expected decoding methods are possible to invoke. Note that it is still important that we consider the data to be untrusted. For example, imagine we are populating an object that contains a "privilege" field; we must deliberately prevent an attacker from specifying an unauthorized privilege.
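The restriction mechanism mentioned above is based on overriding the unpickler's find_class method, as the pickle documentation shows. The following is a minimal sketch of that pattern, assuming Python 3. The allow-list contents, the helper name restricted_loads, and the two demonstration classes are hypothetical; in the exercise, the allow-list would presumably name the codec's decoding function rather than the built-in str.

```python
import io
import pickle

# Hypothetical allow-list of (module, name) pairs the unpickler may resolve.
ALLOWED_GLOBALS = {("builtins", "str")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream refers to a global by name.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Drop-in replacement for pickle.loads that enforces the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Allowed:
    def __reduce__(self):
        return (str, (b"hello", "utf-8"))  # str is on the allow-list


class Blocked:
    def __reduce__(self):
        return (len, ("abc",))  # harmless, but len is not on the allow-list


good = restricted_loads(pickle.dumps(Allowed()))

try:
    restricted_loads(pickle.dumps(Blocked()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(good, blocked)
```

The lookup is rejected before the callable is ever invoked, so a forbidden function never runs. The arguments passed to an allowed callable are still attacker-controlled and need their own validation.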

In the case of this exercise, restricting the available methods may sufficiently mitigate the vulnerability. However, in many cases, it is preferable to avoid deserializing untrusted data altogether. For example, instead of serializing an object from a client, we might send only the necessary information as primitive values (ensuring that each value is validated and sanitized). This is often more secure and more computationally efficient than object serialization. Another option is to use general data formats such as JSON or YAML, along with libraries that convert object members to those formats. For example, the third-party PyYAML library for Python provides a safe_load function that parses YAML into basic Python types only.
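As an illustration of the primitive-value approach, the sketch below sends the message as JSON and validates it on receipt. It uses only Python's standard json module. The function names, the "message" key, and the length limit are assumptions made for this example and are not part of the exercise files.

```python
import json

MAX_MESSAGE_LENGTH = 256  # hypothetical limit chosen for this sketch


def encode_message(text):
    """Client side: wrap the string in a JSON object and encode it as bytes."""
    return json.dumps({"message": text}).encode("utf-8")


def decode_message(data):
    """Server side: parse the bytes, then validate before using the value."""
    parsed = json.loads(data.decode("utf-8"))
    if not isinstance(parsed, dict) or set(parsed) != {"message"}:
        raise ValueError("unexpected structure")
    message = parsed["message"]
    if not isinstance(message, str) or len(message) > MAX_MESSAGE_LENGTH:
        raise ValueError("invalid message")
    return message


wire_bytes = encode_message("my message")
received = decode_message(wire_bytes)
print(received)
```

Parsing JSON only ever produces dictionaries, lists, strings, numbers, booleans, and None, so the receiver never calls a function named by the sender.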

2. Exercise Instructions

This exercise will be completed entirely on the command line terminal of the provided virtual machine. To open the terminal, right-click on the "EXERCISES" directory and select "Open in Terminal". Enter the following command to change into the exercise directory:

cd 3.5_serialization

You will need two shell windows for this exercise, one for the server and one for the client. Perform the two steps above once for each window.

2.1 Run the Program

To run this program, you will need to first start the server in one shell window. Enter the following command in one of your shell windows:

python server.py

The server will continue to run until you press Ctrl+C in the server's shell window.

The client program takes an optional argument string, which is effectively the string that is sent to the server. With the server running in one window, switch to your other shell window and enter the following command:

python client.py "my message"

You should see some output in both your server and your client shell windows. The server will output the string that our client sent (namely "my message" in this case) and the client will output the serialized data along with the response string from the server.

2.2 Inspect the Program Code

Now that you understand the basic operation of our client-server program, it's time to look at the implementation to find our attack vector. A good place to start is in server.py. Use your favorite text editor to open this file. Enter the following command to open it in Nano:

nano server.py

First, familiarize yourself with the general flow of data into the server. The server endlessly waits for new socket connections. When one is accepted, the server forks (creates a new process). The parent process, our original server process, goes back to waiting for another connection, while the child process, the newly forked process, handles data coming into the opened socket. Once you have a clear understanding of where our incoming data is processed, move on to the client program.
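The accept-and-fork flow described above generally has the following shape. This is a sketch under stated assumptions, not the provided server.py: it assumes Python 3 on a Unix-like system, and the function names, port number, buffer size, and reply bytes are all hypothetical.

```python
import os
import pickle
import signal
import socket


def handle_connection(conn):
    """Child-side work: read the bytes and unpickle them."""
    with conn:
        data = conn.recv(4096)
        value = pickle.loads(data)  # untrusted input reaches the unpickler here
        print("Server Received:", value, flush=True)
        conn.sendall(b"ok")


def serve_forever(host="127.0.0.1", port=9999):
    # Ask the OS to reap finished children so they do not linger as zombies.
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind((host, port))
        listener.listen()
        while True:
            conn, _addr = listener.accept()
            if os.fork() == 0:
                # Child process: handle this one connection, then exit.
                listener.close()
                handle_connection(conn)
                os._exit(0)
            # Parent process: drop its copy and wait for the next connection.
            conn.close()
```

Calling serve_forever() would start the loop. The line to look for in any server of this shape is the one where received bytes are handed to the unpickler.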

Open the client.py file and follow the execution path of the program. You should see that it creates a "surprise" object, sets the member string, serializes the object, and sends it over the wire. We know that pickle.dumps will internally call the __reduce__(self) method of our object. Note that the override of __reduce__(self) "encodes" the string using our codec function and specifies the myDecode function as the function to deserialize the object.

2.3 Exploit the Vulnerability

We now have a clear attack vector to the unprotected deserialization method of the vulnerable server. It's time to exploit it. Using what you've learned above, try to change our "surprise" object so that it runs a shell command when the server deserializes it. In a real attack, this vulnerability could do considerable damage, from destroying file system trees to installing malware, but we recommend something innocuous like echo ATTACK_SUCCESSFUL for this exercise. Run the server and make changes to client.py. When you run your malicious client, you should be able to see the output from your shell command in the server's shell window.

Once you see "ATTACK_SUCCESSFUL" appear in your server's shell window, you've exploited the serialization vulnerability!

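Note: This exploit is different from simply encoding the string "ATTACK_SUCCESSFUL" by, for example, running python client.py "ATTACK_SUCCESSFUL", because it invokes the os.system function on the server. You can tell this is the case because the "ATTACK_SUCCESSFUL" string appears before the server's output line "Server Received: 0". The shell command prints the string when it runs, and the 0 is the status value that os.system returns for a successful command, which the server then treats as the deserialized value.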
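The ordering in this note can be reproduced without any pickling. The short sketch below assumes Python 3 and a system shell that provides echo; the final print only imitates the server's output line and is not taken from server.py.

```python
import os

# The shell runs the command and writes its output straight to the terminal.
status = os.system("echo ATTACK_SUCCESSFUL")

# os.system returns a status value, which is 0 when the command succeeds.
print("Server Received:", status)
```

The echoed string appears first because the command has already finished by the time the return value is printed.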

2.4 Mitigate the Vulnerability

Our exploit works because Python's deserialization can invoke arbitrary functions. An intuitive solution is to allow only the functions we expect to encounter and block all other functions. Luckily, Python's pickle API has a convenient way to do this. Follow the example from the "Restricting Globals" section of Python's pickle documentation.

Now mitigate the vulnerability in our server.py program. Implement the restricted unpickler class in server.py and test to make sure the vulnerability is mitigated. Test good inputs to ensure the program still works, and test malicious inputs to ensure the vulnerability is mitigated.