Serialization is a technique to convert in-memory data structures into a stable byte representation that can later be converted back into an equivalent in-memory structure. This is most often used for transmitting data over a network or persistently storing program state. While serialization and deserialization offer a powerful tool for abstraction, they can contain dangerous vulnerabilities when applied to untrusted data. See Chapter 3.5 on Serialization for more information on general serialization security concerns.
For this exercise, we provide a simple server and client that communicate serialized strings over socket connections. This is to emulate a situation in which some server deserializes data from an untrusted client. For example, consider a situation in which you write and distribute a Python program that sends some serialized program state along with a bug report. Since the client program can be trivially reverse engineered, an attacker can send any serialized data to your server.
The program for this exercise uses Python's pickle library to
serialize an object, send it to our server, deserialize the object, and
print its value.
Serialization (pickling) in Python works in an interesting, flexible, and somewhat complicated way. For the objects in this exercise, the pickled object is effectively a pair of fields. The first field is a callable object (basically a function name) and the second is a tuple of the data elements to be passed as arguments to that callable. The Python function pickle.dumps(...) internally calls the object's __reduce__(self) method, which returns this pair. These two fields are sent to the recipient, who deserializes (unpickles) the object by calling the callable, passing it the arguments in the tuple. The result of this call is the deserialized object.
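The pair returned by __reduce__ can be seen in a small standalone sketch. The Message class below is hypothetical and is not one of the exercise files; codecs.decode from the standard library stands in for the exercise's own decode function.

```python
import codecs
import pickle


class Message:
    """Hypothetical stand-in for the exercise's "surprise" object."""

    def __init__(self, text):
        self.text = text

    def __reduce__(self):
        # Return the (callable, args) pair that pickle stores.
        # codecs.decode stands in for the exercise's decode function.
        return (codecs.decode, (self.text.encode("utf-8"), "utf-8"))


payload = pickle.dumps(Message("my message"))

# Unpickling calls codecs.decode(b"my message", "utf-8") and uses the
# return value as the deserialized object: a plain str, not a Message.
restored = pickle.loads(payload)
print(type(restored).__name__, restored)
```

Note that the unpickled value is whatever the callable returns, so the receiver never needs the Message class at all. The payload only names the callable; the receiving side looks it up and calls it.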
Our implementation of this client-server program is split across three Python files, described in the table below.
Your objective is to change the client such that deserializing the
malicious object will execute arbitrary shell commands on the server.
||This file defines two trivial functions to convert a string object into a byte representation. This must be in a separate file because the client and server must be using consistent encoding/decoding mechanisms. You will not need to change this file.|
||This file implements our simple server, which listens for socket connections and deserializes the data it receives. It expects to deserialize a string using the functions defined in our codec. You will change this file as a part of mitigating the vulnerability.|
||This file implements a "good" client that connects to our server. It serializes our "surprise" object by overriding the __reduce__(self) method.|
There are many considerations when trying to mitigate serialization vulnerabilities. The text covers some of these general concepts. For this exercise, we will mitigate the vulnerability by restricting which methods can be called by the unpickling process. The Python pickle documentation describes this mechanism in more detail. Your objective will be to implement this restriction such that only our expected decoding methods are possible to invoke. Note that it is still important that we consider the data to be untrusted. For example, imagine we are populating an object that contains a "privilege" field; we must deliberately prevent an attacker from specifying an unauthorized privilege.
In the case of this exercise, restricting the available methods may mitigate the vulnerability. However, in many cases, it is preferable to avoid deserializing untrusted data altogether. For example, instead of serializing an object from a client, we could send only the necessary information as primitive values (ensuring that each value is validated and sanitized). This is often more secure and more computationally efficient than object serialization. Another option is to use general data formats such as JSON or YAML and libraries to convert object members to those formats.
For example, a third-party YAML library for Python has a safe_load method that can conveniently load such data without constructing arbitrary objects.
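As a rough illustration of the primitive-values approach, the sketch below parses a JSON report and validates each field before use. The function name parse_report, the field names, and the length limit are all hypothetical; they are not part of the exercise code.

```python
import json

MAX_MESSAGE_LENGTH = 1024  # hypothetical limit chosen for this sketch


def parse_report(raw: bytes) -> dict:
    """Parse an untrusted JSON report and keep only validated fields."""
    try:
        data = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("malformed report") from exc

    if not isinstance(data, dict):
        raise ValueError("report must be a JSON object")

    message = data.get("message")
    if not isinstance(message, str) or len(message) > MAX_MESSAGE_LENGTH:
        raise ValueError("invalid message field")

    # Copy only the expected field. Anything else the client sent,
    # including a "privilege" value, is ignored; the server decides it.
    return {"message": message, "privilege": "user"}


print(parse_report(b'{"message": "my message", "privilege": "admin"}'))
```

Parsing JSON only ever produces dictionaries, lists, strings, numbers, booleans, and None, so no callable is invoked on the sender's behalf. The validation step is still needed because the values themselves remain untrusted.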
This exercise will be completed entirely on the command line terminal of the provided virtual machine. To open the terminal, right-click on the "EXERCISES" directory and select "Open in Terminal". Enter the following command to change into the exercise directory:
You will need two shell windows for this exercise, one for the server and one for the client. Perform the above two actions twice.
To run this program, you will need to first start the server in one shell window. Enter the following command in one of your shell windows:
The server will continue to run until you press Ctrl+C in the server's shell window.
The client program takes an optional argument string, which is effectively the string that is sent to the server. With the server running in one window, switch to your other shell window and enter the following command:
python client.py "my message"
You should see some output in both your server and your client shell windows. The server will output the string that our client sent (namely "my message" in this case) and the client will output the serialized data along with the response string from the server.
Now that you understand the basic operation of our client-server program, it's time to look at the implementation to find our attack vector. A good place to start is the server file. Use your favorite text editor to open this file.
Enter the following command to open it in Nano:
First, familiarize yourself with the general flow of data into the server. The server endlessly waits for new socket connections. When one is accepted, the server forks (creates a new process). The parent process, our original server process, goes back to waiting for another connection, while the child process, the newly forked process, handles the data coming into the opened socket. Once you have a clear understanding of where our incoming data is processed, move on to the client program.
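The accept-and-fork flow described above can be sketched roughly as follows. This is not the exercise's server file: the function names, the port number, and the "ack" reply are assumptions made for illustration, the handler only decodes text instead of unpickling, and os.fork is available only on Unix-like systems.

```python
import os
import signal
import socket


def handle_connection(conn):
    """Child-side work: read the incoming bytes and acknowledge them."""
    data = conn.recv(4096)
    conn.sendall(b"ack")
    # The real server would deserialize here; this sketch only decodes text.
    return data.decode("utf-8", errors="replace")


def serve(host="127.0.0.1", port=5000):  # hypothetical address and port
    # Ask the kernel to reap finished children so they do not become zombies.
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind((host, port))
        listener.listen()
        while True:
            conn, _addr = listener.accept()
            if os.fork() == 0:
                # Child process: handle this one connection, then exit.
                listener.close()
                try:
                    print("Server Received:", handle_connection(conn))
                finally:
                    conn.close()
                    os._exit(0)
            # Parent process: drop its copy and wait for the next client.
            conn.close()
```

Calling serve() would block and accept connections until interrupted. The point to notice is that all untrusted input passes through the child's handler, which is where deserialization happens in the real server.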
Open the client.py file and follow the execution path of the program. You should see that it creates a "surprise" object, sets the member string, serializes the object, and sends it over the wire. We know that pickle.dumps will internally call the __reduce__(self) method of our object. Note that the override of __reduce__(self) "encodes" the string using our codec function and specifies the myDecode function as the function to deserialize the string.
We now have a clear attack vector to the unprotected deserialization
method of the vulnerable server. It's time to exploit it.
Using what you've learned above, try to change our "surprise" object
so that it runs a shell command when the server deserializes it.
In a real attack, you can be very creative in the destructive
potential of this vulnerability, from destroying file system trees
to installing malware, but we recommend something innocuous like
echo ATTACK_SUCCESSFUL for this exercise.
Run the server and make changes to the client. When you run your malicious client, you should see the output of your shell command in the server's terminal. Once "ATTACK_SUCCESSFUL" appears there, you've exploited the serialization vulnerability!
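Note: This exploit is different from simply encoding the string "ATTACK_SUCCESSFUL" by, for example, running python client.py "ATTACK_SUCCESSFUL", because it causes the server to call the os.system function. You can tell this is the case because the "ATTACK_SUCCESSFUL" string appears before the print statement saying "Server Received: 0". The shell command prints the string, and the 0 is the value os.system returns for a successful command, which becomes the "deserialized" object.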
Our exploit works because Python's deserialization can invoke arbitrary functions. An intuitive solution is to allow only the functions we expect to encounter and block all others. Luckily, Python's pickle API has a convenient way to do this. Follow the example in the Restricting Globals section of Python's pickle documentation.
Now mitigate the vulnerability in our server. Implement the restricted unpickler class in the server file. Then test good inputs to ensure the program still works, and test malicious inputs to ensure the vulnerability is mitigated.
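For orientation, a restricted unpickler in the style of the pickle documentation might look like the sketch below. It is generic rather than a drop-in solution: the allow-list contains the standard library's codecs.decode as a stand-in, whereas your server should list only the exercise's own decoding function. The Call helper class is hypothetical and exists only to build test payloads.

```python
import codecs
import io
import os
import pickle

# Allow-list of (module, name) pairs the unpickler may resolve.
# Stand-in entry: codecs.decode is implemented in the _codecs module.
ALLOWED_GLOBALS = {("_codecs", "decode")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # pickle calls this for every global a payload refers to.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads, but only allow-listed callables can be invoked."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Call:
    """Hypothetical helper: pickles as a call to func(*args)."""

    def __init__(self, func, *args):
        self._pair = (func, args)

    def __reduce__(self):
        return self._pair


good = pickle.dumps(Call(codecs.decode, b"my message", "utf-8"))
bad = pickle.dumps(Call(os.system, "echo ATTACK_SUCCESSFUL"))

print(restricted_loads(good))
try:
    restricted_loads(bad)
except pickle.UnpicklingError as exc:
    # The lookup is refused before the command can run.
    print("blocked:", exc)
```

The check happens when the callable is looked up, before it is ever called, so the rejected payload never runs its command. The arguments passed to an allowed callable are still attacker-controlled and must be treated as untrusted.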