3.8.4 XML Injection Exercise

1. Introduction

XML stands for Extensible Markup Language, a derivative of SGML (on which HTML is also based) used to represent structured data objects as human-readable text. An XML parser extracts the data from complex structured XML files. Unless a program simply copies a whole XML file as an opaque unit, any program that consumes XML must implement or call an XML parser. Several types of XML injection attacks can cause damage. Although many languages and libraries now offer improved safety configurations and security features, vulnerabilities in the XML specification and in XML parsers can still be exploited. See the chapter on XML Injection Attacks for more details on XML bomb attacks and XML external entity (XXE) attacks.

1.1 Exercise Description

In this exercise, we provide a Java XML parser built on the standard Java libraries, which can parse, modify, and query XML documents, and a PHP XML parser built on the PHP XML Parser extension. You will use them to carry out three different types of XML attack.

There are three exercises:

Coercive parsing: parsing a large junk data file to consume memory and slow down, and hence affect, the entire system.
Information disclosure: exploiting an external entity to force an information disclosure.
Remote code execution: exploiting the PHP EXPECT extension to execute an external entity as a command.

1.2 Vulnerability Mitigation

To mitigate these XML attacks, we will follow two approaches: modifying the parser's configuration settings, and modifying the parser code to add time-outs and allow lists. The chapter on XML Injection Attacks covers more of these mitigation practices.

2. Exercise Instructions

These exercises will be done using the command-line terminal (shell) of the virtual machine. To open the terminal, right-click the "EXERCISES" directory and select "Open in Terminal". Enter the following command to change into the exercise directory:

cd 3.8.4_XML_Injections

You will need two shell windows for this exercise: one for running commands and one for inspecting the XML file or the parser source.

2.1 Exercise 1

A coercive parsing attack feeds the parser a deeply nested XML document that contains opening tags without the corresponding closing tags. The goal is to make the victim use up, and eventually exhaust, the machine's resources, causing a denial of service on the target. Omitting the closing tags makes the attack cheaper, since the file needs only about half the size of a well-formed document to produce the same nesting. In this virtual machine, the number of tags being processed eventually causes an error message. If the attack is run on another computer, such as a Linux system in the CS Department, a heap out-of-memory or garbage-collector error might occur instead.

Enter the following command to change into the exercise one directory:

cd Exercise_One

2.1.1 Compile the Parser

To compile this parser, enter the following command in one of your shell windows:

make

The parser we are using for this exercise is xmlParser.java, which uses the Java XMLReader interface (part of the SAX API).

2.1.2 Run the Parser

To run this parser, enter the following command in one of your shell windows:

java xmlParser

Because no file name was given, the parser throws an I/O exception and prints the following in the shell window:

java.io.IOException: Need a valid XML file name.

The parser program requires an argument with the name of the XML file you want to parse. To parse a valid XML file called books.xml, enter:

java xmlParser books.xml

You should see output in your window showing content from the books.xml file.

For this exercise, to parse the large, malformed XML file largeFile.xml (900 MB), enter:

java xmlParser largeFile.xml

Note: Be careful! Parsing the malformed largeFile.xml (900 MB) may crash your virtual machine. Save any open work before parsing it.

You should see output in your window printing tag names from the largeFile.xml file. The printing then slows down, and eventually VirtualBox detects an error and reports:

An error has occurred during virtual machine execution!
The error details are shown below.
You may try to correct the error and resume the virtual machine execution.

If you parse this file on one of the Linux systems in the CS computer lab, the JVM will instead throw one of the following errors:

java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: GC overhead limit exceeded

2.1.3 Inspect the File and Parser

Now that you have seen the result of our coercive parsing attack, it is time to inspect largeFile.xml to understand the cause of the problem:

more largeFile.xml

This file contains 80,000,000 nested opening tags with no matching closing tags, and the parser must process every one of them. Any attacker can craft a long, malformed XML input file like this.
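
To see why such a file is costly, the sketch below builds a miniature version of largeFile.xml and counts how many start tags a standard Java SAX parser reports before it notices that the document is malformed. The class and method names are hypothetical, and the sketch assumes the JDK's built-in parser obtained through SAXParserFactory. Because missing closing tags can only be detected at the end of the input, the parser does all of the work first.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class UnclosedTagsSketch {
    // Builds a malformed document of n nested opening tags and no closing tags.
    static String unclosedTags(String name, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('<').append(name).append('>');
        }
        return sb.toString();
    }

    // Parses xml and returns how many start tags were reported before the
    // parser gave up on the malformed document.
    static int startTagsBeforeError(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Expected: the structural error only surfaces at end of input.
            return count[0];
        }
        throw new IllegalStateException("document unexpectedly parsed as well-formed");
    }

    public static void main(String[] args) throws Exception {
        String xml = unclosedTags("a", 5);
        System.out.println(xml);
        System.out.println(startTagsBeforeError(xml));
    }
}
```

For n = 5 the sketch prints <a><a><a><a><a> followed by 5: every start tag reaches the handler before the error is reported, which is why scaling n to 80,000,000 exhausts the machine.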

Now let's look at the implementation of the XML parser to better understand the attack vector. A good place to start is in xmlParser.java. Use your favorite text editor to open this file. For example:

vim xmlParser.java

In xmlParser.java, the XMLReaderFactory class creates an XML reader, and the XMLReader interface is used to read and parse an XML file. In this parser, MyContentHandler implements the default content handler to print out content of the file while the file is being parsed.
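
The structure described above can be sketched as follows. This is not the exercise's xmlParser.java but a minimal stand-in under two assumptions: the content handler extends DefaultHandler, and the XMLReader is obtained through SAXParserFactory, since XMLReaderFactory has been deprecated since Java 9 (both yield an XMLReader). The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class TagPrinterSketch {
    // Content handler that records and prints each start tag as the parser reports it.
    static class TagNameHandler extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            names.add(qName);
            System.out.println("start: " + qName);
        }
    }

    // Creates an XMLReader, attaches the handler, and parses the given text.
    static List<String> parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        TagNameHandler handler = new TagNameHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        parse("<library><book><title>Dune</title></book></library>");
    }
}
```

Running main prints one "start:" line per element (library, book, title). The parser pushes events to the handler as it reads, so a file with no end is processed for as long as input keeps arriving.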

2.1.4 Mitigate the Vulnerability: Forcing a Time-Out

In this mitigation, you will set a timer that goes off if parsing a long, malformed XML file takes too long.

To mitigate this vulnerability, go to the Mitigation folder in the Exercise_One folder:

cd Mitigation

In the Mitigation folder, there are three files:

Filename		Purpose
`xmlParser.java`		This file has a copy of the xmlParser we use to parse the largeFile.xml. Use this copy to edit and write your own code to mitigate this vulnerability.
`Makefile`		This Makefile is for you to compile your own version of the xmlParser.
`testFile.xml`		Because parsing largeFile.xml can destabilize your virtual machine, we do not recommend testing your code with that file. This testFile.xml has fewer nested tags and is meant for testing.

Because largeFile.xml is over 900 MB, do not make another copy of it in this subdirectory. For a final test of your modified parser, run it on the original file:

java xmlParser ../largeFile.xml

For this mitigation, start by familiarizing yourself with the XMLReader interface. Your code should create a new thread that does the XML parsing; the main thread then waits for this thread to complete, with a time-out in case parsing takes too long. If you don't have experience with threading in Java, you may want to read one (or both) of these tutorials: Tutorial 1, Tutorial 2. There is more than one correct way to implement this mitigation; this guidance is a resource for you to use, not a requirement.
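
One possible shape for this mitigation is sketched below, assuming the thread-plus-join approach just described. SAX parsing does not stop on its own when its thread is interrupted, so the hypothetical CancellableHandler checks the interrupt flag in its callbacks and throws a SAXException to unwind the parser. All names are illustrative; in your own solution the check would go into the existing MyContentHandler.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class TimeoutParserSketch {
    // Handler that aborts the parse once its thread has been interrupted.
    static class CancellableHandler extends DefaultHandler {
        private void checkCancelled() throws SAXException {
            if (Thread.currentThread().isInterrupted()) {
                throw new SAXException("parsing cancelled by time-out");
            }
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            checkCancelled();
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            checkCancelled();
        }
    }

    // Returns true if parsing finished (successfully or with an error) within
    // timeoutMillis, false if it had to be cancelled.
    static boolean parseWithTimeout(Reader xml, long timeoutMillis) throws InterruptedException {
        Thread worker = new Thread(() -> {
            try {
                XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
                reader.setContentHandler(new CancellableHandler());
                reader.parse(new InputSource(xml));
            } catch (Exception e) {
                System.err.println("parser stopped: " + e.getMessage());
            }
        });
        worker.setDaemon(true);       // a stuck parser cannot keep the JVM alive
        worker.start();
        worker.join(timeoutMillis);   // main thread waits, but only up to the limit
        if (worker.isAlive()) {
            worker.interrupt();       // handler throws at its next callback
            worker.join(1000);        // give it a moment to unwind
            return false;
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseWithTimeout(new StringReader("<books><book/></books>"), 5000));
    }
}
```

In xmlParser, main could call parseWithTimeout(new FileReader(args[0]), 10_000) and report a time-out when it returns false. An ExecutorService with Future.get(timeout, unit) is an equally valid way to wait with a limit.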

2.2 Exercise 2

In this exercise, you will see how an XML parser can be made to read a local confidential file (through passwd.xml), and learn two ways to prevent it.

Change to the Exercise 2 directory:

cd /home/user/Desktop/EXERCISES/3.8.4_XML_Injections/Exercise_Two/AttackOne

2.2.1 Compile the Parser

To compile this parser, enter the following command:

make

2.2.2 Run the Parser

To run this parser, enter the following command in one of your shell windows:

java xmlParser passwd.xml

You should see the contents of the /etc/passwd file as output.

2.2.3 Inspect the File and Parser

Now that you have observed an information disclosure through an external entity, look at passwd.xml to understand how the disclosure happened:

vim passwd.xml

This file contains an external entity referencing a local file, /etc/passwd. While processing this file, the parser replaces XXE with the contents of /etc/passwd.
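
The same expansion can be reproduced with a harmless file. The sketch below (hypothetical class and entity names) writes a temporary file, builds a document whose external entity points at that file's URI, and parses it with the JDK's default SAX settings, under which, we assume, external general entities are resolved. The file's contents end up in the character data the handler receives.

```java
import java.io.File;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class XxeDemoSketch {
    // Parses xml with default parser settings and returns all character data reported.
    static String textOf(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    // Builds a document with one external entity pointing at the given file.
    static String documentReferencing(File target) {
        return "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE root [ <!ENTITY xxe SYSTEM \"" + target.toURI() + "\"> ]>\n"
                + "<root>&xxe;</root>";
    }

    public static void main(String[] args) throws Exception {
        // A harmless stand-in for a confidential file.
        File secret = File.createTempFile("secret", ".txt");
        secret.deleteOnExit();
        Files.write(secret.toPath(), "top secret".getBytes(StandardCharsets.UTF_8));
        System.out.println(textOf(documentReferencing(secret)));
    }
}
```

Nothing in the handler asks for the file; the parser opens it on its own while expanding &xxe;, which is exactly what happens with /etc/passwd in passwd.xml.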

Now that you understand the basic format of an external entity, it's time to look at the implementation of the parser to find the attack vector. Again, we will start in xmlParser.java.

vim xmlParser.java

2.2.4 Mitigate the Vulnerability

For this mitigation, we will follow two approaches: (1) disabling external entity expansions by changing the configuration settings and (2) comparing the external entity that is referenced with the contents of an allow list.

To start these mitigations, go into the Mitigation folder:

cd ../Mitigation

Approach (1): Disable the External Entity

The first approach consists of turning off the option that allows the use of external entities. This option is turned on by default.

Go into the ApproachOne folder:

cd ApproachOne

In this folder, we have three files used for this exercise:

Filename		Purpose
`xmlParser.java`		This file is a copy of the xmlParser that we used to parse passwd.xml. Use this copy to edit and write your own code to mitigate this vulnerability.
`Makefile`		This Makefile is for you to compile your own version of the xmlParser.
`passwd.xml`		This file is used to test your mitigation.

The XMLReader interface has a method, setFeature, for changing the parser's configuration settings. To disable external entity processing, you will need to insert a couple of calls to this method on your xmlReader object:

xmlReader.setFeature(featureName, flag);   // featureName is a String, flag is a boolean

featureName is a URI (written like a URL) that identifies a specific parser feature. To figure out which feature names and flag values disable external entities, consult the article from apache.org on Setting Features for detailed information.
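
One possible configuration is sketched below. It turns off the SAX2 standard features external-general-entities and external-parameter-entities, and the Xerces feature load-external-dtd, which the JDK's built-in parser recognizes. The class name is hypothetical. A stricter alternative is setting http://apache.org/xml/features/disallow-doctype-decl to true, which rejects any document containing a DOCTYPE at all.

```java
import java.io.File;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class HardenedReaderSketch {
    // Returns an XMLReader with external entity processing turned off.
    static XMLReader newHardenedReader() throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // SAX2 standard features: do not include external general or parameter entities.
        reader.setFeature("http://xml.org/sax/features/external-general-entities", false);
        reader.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        // Xerces-specific feature: do not fetch an external DTD either.
        reader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return reader;
    }

    // Parses xml with the given reader and returns all character data reported.
    static String textOf(XMLReader reader, String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        File secret = File.createTempFile("secret", ".txt");
        secret.deleteOnExit();
        Files.write(secret.toPath(), "top secret".getBytes(StandardCharsets.UTF_8));
        String xml = "<!DOCTYPE root [ <!ENTITY xxe SYSTEM \"" + secret.toURI() + "\"> ]>"
                + "<root>before &xxe; after</root>";
        // The entity reference is skipped, so only the surrounding text is printed.
        System.out.println("[" + textOf(newHardenedReader(), xml) + "]");
    }
}
```

With these features off, the parser reports the reference as a skipped entity instead of opening the file, so the surrounding text survives and the file's contents never appear.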

Approach (2): Permit access only if reference is on an allow list

Turning off the external entity processing is a safe solution to our problem but lacks flexibility. There are cases where it might be appropriate to use an external entity, such as wanting to include the output of a program (specified in a URL) in your XML file.

As a result, we will use an allow list to keep track of which files or URLs are OK to parse.

To try this second approach, go into the ApproachTwo folder:

cd /home/user/Desktop/EXERCISES/3.8.4_XML_Injections/Exercise_Two/Mitigation/ApproachTwo

You will modify the XML parser to compare the external entity reference with the strings on your allow list. If the external entity is on the allow list, it will be parsed as normal. If it is not on the allow list, the parser will ignore that external entity.

There are seven files in the directory for this exercise:

Filename		Purpose
`xmlParser.java`		Contains the same xmlParser that we used previously. Use this copy to edit and write your own code to mitigate the vulnerability.
`Makefile`		This Makefile is for you to compile your own version of the xmlParser.
`passwd.xml`		Used to test your mitigation. If it works, the confidential file /etc/passwd will not be parsed.
`allowListForXMLXXEAccess.txt`		This file contains the list of strings that the xmlParser will allow to be used as external entities.
`readable.xml`		This file is used to test your mitigation. If it works, the local file readableFile.txt will be parsed.
`normalHtml.xml`		This file is used to test your mitigation. If it works, the remote resource referenced by normalHtml.xml will be parsed.
`readableFile.txt`		This file is on the allow list. If your mitigation works, this file will be parsed.

To implement the mitigation, add the line below to the main method of xmlParser.java. It registers your own resolver, so that the parser calls your resolveEntity method every time it finds an external entity.

xmlReader.setEntityResolver(new MyResolver());

Now you are ready to write your own resolver. Add the following class to xmlParser.java:

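// Requires these imports at the top of xmlParser.java:
//   import org.xml.sax.EntityResolver;
//   import org.xml.sax.InputSource;
class MyResolver implements EntityResolver {
    public InputSource resolveEntity(String publicId, String systemId) {
        //
        // Your allow list checking code goes here
        //
        return null; // placeholder: null tells the parser to expand the entity as usual
    }
}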

The parameter systemId is the system identifier (usually a file path or URL) of the external entity found by the XML parser. This is the string that you need to check.

Your resolveEntity method must return null if the external entity was found on the allow list. Returning null tells the parser to open the entity in the usual way, so it will be expanded.

If the entity was not found on the allow list, your resolveEntity method must instead return a non-null InputSource, for example one that reads from the empty string "". The parser then reads this substitute in place of the real entity, which effectively ignores it, and continues with the parsing process.

Below is an example of returning an empty InputSource so that the parser does not expand the external entity.

return new InputSource(new StringReader(""));
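
A minimal sketch of such a resolver follows, assuming the allow list holds one system identifier per line. The class name AllowListResolverSketch and its fromFile helper are hypothetical. One subtlety: the SAX EntityResolver documentation says that a system identifier which is a URL is fully resolved before it is reported to the application, so a relative reference such as readableFile.txt may reach resolveEntity as an absolute file: URI. Printing systemId inside resolveEntity is a quick way to see which form your allow list needs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical allow-list resolver: one possible shape for MyResolver.
public class AllowListResolverSketch implements EntityResolver {
    private final Set<String> allowed;

    AllowListResolverSketch(Set<String> allowed) {
        this.allowed = allowed;
    }

    // Loads one allowed systemId per line, trimming whitespace and skipping blank lines.
    static AllowListResolverSketch fromFile(Path allowList) throws IOException {
        Set<String> entries = new HashSet<>();
        for (String line : Files.readAllLines(allowList)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return new AllowListResolverSketch(entries);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId != null && allowed.contains(systemId)) {
            return null; // allowed: the parser opens and expands the entity normally
        }
        // Not allowed: substitute empty content so the entity is ignored.
        return new InputSource(new StringReader(""));
    }
}
```

It would be registered in main with xmlReader.setEntityResolver(AllowListResolverSketch.fromFile(Paths.get("allowListForXMLXXEAccess.txt"))). Matching exact strings in a Set keeps each lookup cheap and avoids the trap of a prefix check, which would accept any identifier that merely begins with an allowed one.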

2.3 Exercise 3

In this exercise, we provide a PHP XML parser that uses the PHP XML Parser extension to parse, modify, and query XML documents. We are using PHP 5.6 with EXPECT, one of PHP's process control extensions. EXPECT allows interaction with processes through a PTY: streams opened via the expect:// protocol handler give access to the process's stdin, stdout, and stderr. Anyone who can choose what gets opened through expect:// can therefore execute malicious code. See the PHP manual on EXPECT for more details on PHP and the EXPECT extension.

Change to the Exercise 3 directory:

cd /home/user/Desktop/EXERCISES/3.8.4_XML_Injections/Exercise_Three

2.3.1 Run the Parser

To run this parser, enter the following command in one of your shell windows:

php XMLParser.php

The parser prints the following prompt in the shell window:

Your input XML file:

The parser program asks you to enter the name of the XML file you want to parse. To parse a valid XML file called books.xml, enter:

books.xml

You should see output in your window showing content from the books.xml file.

2.3.2 Inspect the Program Code

Now let's look at the implementation of the PHP XML parser to better understand the attack vector. A good place to start is in XMLParser.php. Use your favorite text editor to open this file. For example:

vim XMLParser.php

In XMLParser.php, the PHP XML Parser extension creates an XML parser, and the script then registers handlers for different XML events. The startElement function is called whenever the parser encounters a start tag, and the endElement function whenever it encounters an end tag. The char function is called for the non-markup content (character data) of the document. Finally, the externalEntityRefHandler function is called whenever the parser finds a reference to an external parsed general entity.

In the externalEntityRefHandler function, fopen() binds a named resource to a stream. The resource is named by the handler's fourth parameter, systemId, which is the system identifier of the external entity referenced in the XML file. The function then prints the contents of this stream until it reaches the end.

The format of this systemId is as follows:

protocol://path

When fopen() opens this systemId, it first reads the protocol name and looks for a protocol handler that can turn the systemId into a stream. The following table lists the main protocols and how their handlers behave:

Format	Referenced	Protocol Handler Behavior
`file://[filepath]`	local file	Access content of this file
`http://[filepath] or ftp://[filepath] or other web protocol`	online resource	Access content of the online resource
`expect://cmdline`	command or executable file	Execute this cmdline if EXPECT extension installed
`Others`	possibly malformed reference	Report an error
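
The dispatch in this table hinges on the text before "://". The sketch below mirrors the table by extracting that protocol name; it is written in Java for consistency with the earlier examples, is hypothetical, and is simplified (PHP supports more protocol handlers than the table lists). It only classifies the string and never opens anything, which is the kind of check a defensive parser can make before calling fopen().

```java
import java.util.Locale;

public class SystemIdClassifierSketch {
    // Splits a systemId of the form protocol://path and reports which row of
    // the table applies. Purely string inspection; nothing is opened or run.
    static String classify(String systemId) {
        int sep = systemId.indexOf("://");
        if (sep <= 0) {
            return "malformed: report an error";
        }
        String protocol = systemId.substring(0, sep).toLowerCase(Locale.ROOT);
        switch (protocol) {
            case "file":
                return "local file: read its contents";
            case "http":
            case "https":
            case "ftp":
                return "online resource: fetch its contents";
            case "expect":
                return "command: executed if the EXPECT extension is installed";
            default:
                return "other protocol: possibly malformed, report an error";
        }
    }

    public static void main(String[] args) {
        for (String id : new String[] {"file:///tmp/notes.txt", "http://example.com/a.xml", "expect://ls", "books.xml"}) {
            System.out.println(id + " -> " + classify(id));
        }
    }
}
```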

Now spend some time looking at the code and tracing the flow of execution. You are looking for an attack surface and a corresponding attack vector. In this case, consider how an attacker can use the expect:// protocol and the EXPECT extension to execute a malicious command or executable file.

2.3.3 Exploit the Vulnerability

To exploit the vulnerability, you will write external entities that use the EXPECT extension to carry out two kinds of attacks: an infinite stream and information disclosure.

To inspect the skeleton XML file, enter:

vim infiniteStream.xml

The file infiniteStream.xml contains the following:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE root [
    <!ENTITY content SYSTEM "// Your version of external entity goes here">
]>
<root>&content;</root>

This skeleton XML file first declares a document type whose root element is root, together with one external entity named content. It then references that entity with the syntax &content; between the start tag and end tag of root, so the parser replaces the reference with whatever the entity's system identifier resolves to.

Infinite Stream

You need to use the infiniteStream.xml file to implement an attack. The attack consists of executing a program or command that never ends. Open and edit the XML file infiniteStream.xml:

vim infiniteStream.xml

Find a command that produces endless output in the shell. Then edit infiniteStream.xml so that the PHP parser executes this command through the EXPECT extension.

Information Disclosure

To implement the attack of information disclosure, you will open and edit an XML file informationDisclosure.xml. It shares the same structure and content with infiniteStream.xml. To start, enter:

vim informationDisclosure.xml

Recall the information disclosure attack in Exercise 2, in which /etc/passwd was referenced by an external entity and read by the XML parser. In this attack, find a command that prints the contents of /etc/passwd. Then edit informationDisclosure.xml so that the PHP parser executes this command through the EXPECT extension.

2.4 Mitigate the Vulnerability (Extra Credit)

Mitigate the Vulnerability: Forcing a Time-Out

In this mitigation, you will set a timer to go off if it takes too long for the parser to execute a command or an executable file.

To mitigate this vulnerability, go to the MitigationOne folder in the Exercise_Three folder:

cd MitigationOne

Please copy your finished attack file infiniteStream.xml to this directory by entering:

cp ../infiniteStream.xml infiniteStream.xml

After copying the attack file, there are two files in the MitigationOne folder:

Filename		Purpose
`XMLParser.php`		This file is a copy of the XMLParser that we use to exploit the vulnerability. Use this copy to edit and write your own code to mitigate this vulnerability.
`infiniteStream.xml`		The malicious XML file you wrote to make the XML parser produce an infinite stream. If your mitigation is successful, this attack will not work.

To implement this mitigation, keep track of how long the XML parser has spent printing the infinite stream of content from the external entity. If it takes too long, your code should stop reading the stream and return from the externalEntityRefHandler function.
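
The time-keeping logic does not depend on the language, so it is sketched here in Java for consistency with the earlier examples rather than in PHP. The hypothetical copyWithDeadline method copies a stream in chunks and stops once a time budget is spent; in XMLParser.php the same loop structure would sit inside externalEntityRefHandler around the reads from the stream that fopen() returns. Note the limitation: the clock is checked only between reads, so a command that blocks without producing any output is not stopped by this loop alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class DeadlineCopySketch {
    // Copies in to out chunk by chunk. Returns true if the end of the stream was
    // reached within maxMillis, false if the time budget ran out first.
    static boolean copyWithDeadline(InputStream in, OutputStream out, long maxMillis) throws IOException {
        long deadline = System.nanoTime() + maxMillis * 1_000_000L;
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            if (System.nanoTime() - deadline > 0) {
                return false; // budget used up: stop printing this entity
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        InputStream in = new ByteArrayInputStream("short output".getBytes(StandardCharsets.UTF_8));
        System.out.println(copyWithDeadline(in, out, 1000) + ": " + out.toString("UTF-8"));
    }
}
```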

Mitigate the Vulnerability: Using an Allow List

For this mitigation, we will compare the external entity that is referenced with the content of an allow list.

To start this mitigation, go into the MitigationTwo folder:

cd ../MitigationTwo

To mitigate the attack of information disclosure, disallowing the EXPECT extension is a safe solution to our problem, but lacks flexibility. There are cases where it might be appropriate to execute an external entity, such as wanting to include the output of a program (specified in your XML file).

As a result, we will use an allow list to keep track of which commands and executables are OK to execute.

You will modify the XML parser to compare each external entity reference with the strings on your allow list. If the external entity is on the allow list, it will be executed through the EXPECT extension as before. If it is not on the allow list, the parser will ignore that external entity.

Please copy your finished attack file informationDisclosure.xml to this directory by entering:

cp ../informationDisclosure.xml informationDisclosure.xml

After copying the attack file, there are five files in the directory for this exercise:

Filename		Purpose
`XMLParser.php`		This file contains the same XMLParser that we used previously. Use this copy to edit and write your own code to mitigate the vulnerability.
`informationDisclosure.xml`		The malicious XML file you wrote to make the XML parser output a local confidential file. If your mitigation is successful, this attack will not work.
`allowList.txt`		This file contains the list of strings that the xmlParser will allow to be used as external entities.
`importantData`		This executable file is on the allow list. If your mitigation is successful, this file will be executed.
`showImportantData.xml`		This XML file makes the XML parser execute the local executable file `importantData`. If your mitigation is successful, `importantData` will be executed.

For this mitigation, you will first inspect allowList.txt by entering:

vim allowList.txt

Then read allowList.txt line by line, comparing each line with systemId in the externalEntityRefHandler function. If a line matches, execute the systemId; otherwise, skip it.
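
The line-by-line comparison is sketched below, again in Java for consistency with the earlier examples; in XMLParser.php the loop would read allowList.txt and compare each line with systemId. The method name and the allow-list entry expect://./importantData are hypothetical. The key design choice is exact equality after removing the line terminator: a prefix test such as startsWith would also accept expect://./importantData; id, which could smuggle an extra command after an allowed one if the command line is interpreted by a shell.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class CommandAllowListSketch {
    // Reads the allow list line by line and reports whether systemId equals one line exactly.
    static boolean isAllowed(BufferedReader allowList, String systemId) throws IOException {
        String line;
        while ((line = allowList.readLine()) != null) {
            // readLine() drops "\n" or "\r\n"; trim() also removes stray spaces.
            if (line.trim().equals(systemId)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical allow-list contents; in the exercise they come from allowList.txt.
        String list = "expect://./importantData\n";
        System.out.println(isAllowed(new BufferedReader(new StringReader(list)), "expect://./importantData"));
        System.out.println(isAllowed(new BufferedReader(new StringReader(list)), "expect://./importantData; id"));
    }
}
```

With the real file, the reader would come from Files.newBufferedReader(Paths.get("allowList.txt")), reopened (or cached in a set) for each entity that is checked.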