XML stands for eXtensible Markup Language, a derivative of SGML (upon which HTML is also based) used to represent structured data objects as human-readable text. An XML parser extracts the data from complex structured XML files. Unless a program simply copies the whole XML file as a unit, every program must implement or call on an XML parser. There are various types of XML injection attacks that can cause damage. Though many computer languages and libraries have improved safety configurations and security features, there are still vulnerabilities in XML specifications and XML parsers to be exploited. See the chapter on XML Injection Attacks for more details on XML bomb attacks and XML external entity attacks.
In this exercise, we provide a Java XML parser built on standard libraries, along with two exercises that apply two different types of XML attacks. Java provides libraries to parse, modify, and query XML documents.
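There are three exercises: a coercive parsing attack and an external entity attack against the Java parser, followed by an external entity attack against a PHP parser that uses the EXPECT extension.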
To mitigate the XML attacks we will follow two approaches. We will modify the parser settings, and we will modify the parser to include time-outs and support allow lists. The chapter on XML Injection Attacks covers more of these mitigation practices.
These two exercises will be done using the command line terminal (shell) of the virtual machine. To open the terminal, right-click on the "EXERCISES" directory and select "Open in Terminal". Enter the following command to change into the exercise directory:
cd 3.8.4_XML_Injections
You will need two shell windows for this exercise, one for executing the command line and one for inspecting the XML file or parser.
A coercive attack in XML involves parsing deeply nested XML documents that contain opening tags but not the corresponding closing tags. The idea is to make the victim use up and eventually deplete the machine's resources, causing a denial of service on the target. Removing the closing tags simplifies the attack, since a document half the size of a well-formed one accomplishes the same result. The number of tags being processed eventually causes an error message in this virtual machine. If run on another computer, such as a Linux system in the CS Department, a heap out-of-memory or garbage collector error might occur.
Enter the following commands to change into the Exercise One directory and build the parser:
cd Exercise_One
make
The parser we are using for this exercise is xmlParser.java, which makes use of the Java interface XMLReader. To run the parser without an argument, enter:
java xmlParser
The parser will throw an IOException in the shell window, as follows:
java.io.IOException: Need a valid XML file name.
The parser program requires one argument: the name of the XML file you want to parse. To parse a valid XML file called books.xml, enter:
java xmlParser books.xml
You should see output in your window showing the content of the books.xml file.
For this exercise, to parse a large malformed XML file, largeFile.xml (900 MB), enter:
java xmlParser largeFile.xml
Note: Be careful! Parsing the malformed largeFile.xml (900 MB) may crash your computer. Save any open work before parsing it.
You should see some output in your window, printing out tag names from the largeFile.xml file. The printing in the console will then slow down, and eventually VirtualBox will report an error stating:
An error has occurred during virtual machine execution!
The error details are shown below.
You may try to correct the error and resume the virtual machine execution.
If you parse this file on one of the Linux systems in the CS computer lab, an exception will be thrown:
java.lang.OutOfMemoryError: Java heap space or GC overhead limit exceeded
To inspect this file, enter:
more largeFile.xml
This file contains 80,000,000 nested starting tags, which forces the parser to parse starting tags that don't have ending tags. Any attacker could craft a long malformed XML input file.
Now let's look at the implementation of the XML parser to better understand the attack vector. A good place to start is in xmlParser.java. Use your favorite text editor to open this file. For example:
vim xmlParser.java
In xmlParser.java, the XMLReaderFactory class creates an XML reader, and the XMLReader interface is used to read and parse an XML file. In this parser, MyContentHandler implements the default content handler to print out the content of the file while the file is being parsed.
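The reader-plus-handler pattern described here can be sketched as follows. This is a minimal illustration, not the course's xmlParser.java: the class `SaxSketch`, the handler `TagCollector`, and the tag name `a` are made up for the example, and the reader is obtained from `SAXParserFactory` rather than `XMLReaderFactory` (an assumption, chosen because `XMLReaderFactory` is deprecated in recent JDKs). The input is a short string of nested start tags without end tags, so the sketch is safe to run.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the exercise parser; all names here are illustrative.
final class SaxSketch {

    // Collects start-tag names in the order the parser reports them.
    static final class TagCollector extends DefaultHandler {
        final List<String> tags = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            tags.add(qName);
        }
    }

    // Builds `depth` nested start tags and no end tags (the tag name "a" is made up).
    static String nestedOpenTags(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("<a>");
        }
        return sb.toString();
    }

    // Parses the text and returns the start tags reported before parsing ended or failed.
    static List<String> startTagsSeen(String xml) {
        TagCollector handler = new TagCollector();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler); // DefaultHandler rethrows fatal errors
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Malformed input: keep whatever was reported before the error.
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return handler.tags;
    }

    public static void main(String[] args) {
        System.out.println(startTagsSeen("<books><book>Title</book></books>"));
        System.out.println(startTagsSeen(nestedOpenTags(50)).size());
    }
}
```

The handler is invoked for every start tag as the input is read, and the parser can only report that the document is malformed once the input ends. That is why a very long run of unclosed tags keeps the parser busy for the whole file.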
To mitigate this vulnerability, go to the Mitigation folder in the Exercise_One folder:
cd Mitigation
In the Mitigation folder, there are three files:
Filename | Purpose |
---|---|
xmlParser.java | This file has a copy of the xmlParser we use to parse the largeFile.xml. Use this copy to edit and write your own code to mitigate this vulnerability. |
Makefile | This Makefile is for you to compile your own version of the xmlParser. |
testFile.xml | As parsing the largeFile.xml might cause negative effects to your computer, we do not recommend testing your code with this file. This testFile.xml has fewer nested tags for testing purposes. |
As largeFile.xml is over 900 MB, you probably shouldn't make another copy in this subdirectory. For a final test of your modified parser with this file, enter:
java xmlParser ../largeFile.xml
For this mitigation, start by familiarizing yourself with the XMLReader interface. The goal is for your code to create a new thread to do the XML parsing. The main thread will then wait for this thread to complete, with a time-out if the parsing thread takes too long. If you don't have experience with threading in Java, you may want to read one (or both) of these tutorials: Tutorial 1, Tutorial 2.
Note that there is more than one correct way to implement this mitigation; this guidance is merely a resource for you to use, not a requirement.
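One possible shape for the wait-with-time-out idea is sketched below. It is only one of several workable designs, and every name in it (`TimedParseSketch`, `timedOut`, `pretendToParse`) is hypothetical. It uses an `ExecutorService` and `Future.get` with a time limit instead of managing a `Thread` by hand, and a sleep stands in for the real parse call.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: class and method names are not part of the exercise code.
final class TimedParseSketch {

    // Runs the task on a worker thread; returns true if it did not finish in time.
    static boolean timedOut(Runnable task, long limitMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "xml-parse-worker");
            t.setDaemon(true); // a stuck parse must not keep the JVM alive
            return t;
        });
        Future<?> pending = worker.submit(task);
        try {
            pending.get(limitMillis, TimeUnit.MILLISECONDS);
            return false; // finished within the limit
        } catch (TimeoutException e) {
            pending.cancel(true); // requests interruption; the task may ignore it
            return true;
        } catch (ExecutionException e) {
            return false; // the task failed, but it did not time out
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return true; // treat an interrupted wait as giving up
        } finally {
            worker.shutdownNow();
        }
    }

    // Stand-in for a long-running parse; a real task would call the XML reader.
    static void pretendToParse(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(timedOut(() -> pretendToParse(10), 2000));
        System.out.println(timedOut(() -> pretendToParse(60_000), 200));
    }
}
```

Interrupting a thread is only a request. A parser that never checks the interrupt flag keeps running, which is why the worker is marked as a daemon thread here so that the program can still exit after the main thread gives up.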
In this exercise, you will get familiar with how an XML parser might get access to a local confidential file (passwd.xml), and learn two ways to prevent it.
Change to the Exercise 2 directory:
cd /home/user/Desktop/EXERCISES/3.8.4_XML_Injections/Exercise_Two/AttackOne
make
java xmlParser passwd.xml
You should see the contents of the /etc/passwd file as output.
To inspect the XML file, enter:
vim passwd.xml
This file contains an external entity referencing a local file, /etc/passwd. While processing this file, the parser replaces XXE with the contents of /etc/passwd.
Now that you understand the basic format of an external entity, it's time to look at the implementation of the parser to find the attack vector. Again, we will start in xmlParser.java.
vim xmlParser.java
To start these mitigations, go into the Mitigation folder:
cd ../Mitigation
The first approach consists of turning off the option that allows the use of external entities. This option is turned on by default.
Go into the ApproachOne folder:
cd ApproachOne
In this folder, we have three files used for this exercise:
Filename | Purpose |
---|---|
xmlParser.java | This file is a copy of the xmlParser used in the attack. Use this copy to edit and write your own code to mitigate this vulnerability. |
Makefile | This Makefile is for you to compile your own version of the xmlParser. |
passwd.xml | This file is used to test your mitigation. |
The xmlReader object has a method, setFeature, for changing the settings of the parser configuration. To disable external entity processing, you will need to insert a couple of calls to this method:
xmlReader.setFeature(featureName: String, Flag: boolean);
featureName is a URI that identifies a specific feature of the parser. To figure out which feature name and flag to use for disabling external entities, you can reference the article from apache.org on Setting Features for detailed information.
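As a hedged illustration of how `setFeature` calls look in practice, the sketch below turns off two standard SAX features, `external-general-entities` and `external-parameter-entities`, and then parses a document that references a throwaway temporary file. The class `FeatureSketch`, its methods, the entity name `xxe`, and the temporary file are all invented for the example. Treat the choice of features as one common option to compare against the Apache article, not as the required answer.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only; not the exercise's xmlParser.java.
final class FeatureSketch {
    static final String GENERAL = "http://xml.org/sax/features/external-general-entities";
    static final String PARAMETER = "http://xml.org/sax/features/external-parameter-entities";

    // Parses the text with external entities switched off and returns the character data.
    static String textWithExternalEntitiesOff(String xml)
            throws IOException, SAXException, ParserConfigurationException {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(GENERAL, false);
        reader.setFeature(PARAMETER, false);
        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    // Creates a throwaway "secret" file and parses a document that references it.
    static String demo() {
        try {
            Path secret = Files.createTempFile("secret", ".txt");
            Files.write(secret, "TOP-SECRET".getBytes(StandardCharsets.UTF_8));
            String xml = "<!DOCTYPE root [<!ENTITY xxe SYSTEM \"" + secret.toUri() + "\">]>"
                    + "<root>before &xxe; after</root>";
            String text = textWithExternalEntitiesOff(xml);
            Files.deleteIfExists(secret);
            return text;
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The file content should not appear in the printed text.
        System.out.println(demo());
    }
}
```

With both features off, the parser skips the reference instead of reading the file, so the surrounding text is still parsed while the file content never reaches the content handler.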
Turning off the external entity processing is a safe solution to our problem but lacks flexibility. There are cases where it might be appropriate to use an external entity, such as wanting to include the output of a program (specified in a URL) in your XML file.
As a result, we will use an allow list to keep track of which files or URLs are OK to parse.
To try this second approach, go into the ApproachTwo folder:
cd /home/user/Desktop/EXERCISES/3.8.4_XML_Injections/Exercise_Two/Mitigation/ApproachTwo
You will modify the XML parser to compare the external entity reference with the strings on your allow list. If the external entity is on the allow list, this entity will be parsed as normal. If it is not on the allow list, then the parser will ignore that external entity.
There are seven files in the directory for this exercise:
Filename | Purpose |
---|---|
xmlParser.java | Contains the same xmlParser that we used previously. Use this copy to edit and write your own code to mitigate the vulnerability. |
Makefile | This Makefile is for you to compile your own version of the xmlParser. |
passwd.xml | Used to test your mitigation. If it works, the confidential file /etc/passwd will not be parsed. |
allowListForXMLXXEAccess.txt | This file contains the list of strings that the xmlParser will allow to be used as external entities. |
readable.xml | This file is used to test your mitigation. If it works, the local file readableFile.txt will be parsed. |
normalHtml.xml | This file is used to test your mitigation. If it works, the remote file normalHtml.xml will be parsed. |
readableFile.txt | This file is on the allow list. If your mitigation works, this file will be parsed. |
To implement the mitigation, add the line below in the main method of xmlParser.java. That will allow your replacement resolveEntity method to be called by the parser. The resolveEntity method is called every time that your parser finds an external entity.
xmlReader.setEntityResolver(new MyResolver());
Now you are ready to write your own resolver. Add the following to xmlParser.java:
class MyResolver implements EntityResolver {
    public InputSource resolveEntity(String publicId, String systemId) {
        //
        // Your allow list checking code goes here
        //
    }
}
The parameter systemId is the system identifier of the external entity found by the XML parser. This is the string that you need to check. Your resolveEntity method must return null if the external entity was found on the allow list. That means that the parser will expand it. Your resolveEntity method must return something other than null (for example, an InputSource that wraps the empty string) if the entity was not found on the allow list. That means that the parser will ignore that entity and continue with the parsing process.
Below is an example of returning an empty InputSource so that the parser does not expand the external entity.
return new InputSource(new StringReader(""));
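Putting the two return rules together, a resolver might look like the sketch below. It is a minimal illustration under several assumptions: the class name `AllowListResolver` and the `fromFile` helper are hypothetical, the allow-list file is assumed to hold one system identifier per line, and matching is assumed to be an exact string comparison. Parsers typically pass `systemId` as a full URI (for example with a `file://` prefix), so it can help to print the value you actually receive before deciding how entries should be written.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Illustrative allow-list resolver; exact-match comparison is an assumption.
final class AllowListResolver implements EntityResolver {
    private final Set<String> allowed = new HashSet<>();

    AllowListResolver(Collection<String> entries) {
        for (String entry : entries) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                allowed.add(trimmed); // ignore blank lines
            }
        }
    }

    // Assumes the allow-list file holds one system identifier per line.
    static AllowListResolver fromFile(Path allowListFile) throws IOException {
        return new AllowListResolver(Files.readAllLines(allowListFile, StandardCharsets.UTF_8));
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId != null && allowed.contains(systemId)) {
            return null; // null tells the parser to resolve the entity itself
        }
        // Anything else is replaced by empty content, so nothing is read.
        return new InputSource(new StringReader(""));
    }
}
```

A reader would use it through `setEntityResolver`, for example with `AllowListResolver.fromFile(Paths.get("allowListForXMLXXEAccess.txt"))`. Denying by default, including when `systemId` is null, keeps unexpected references from slipping through.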
In this exercise, we provide a PHP XML parser that uses the PHP XML Parser extension to parse, modify, and query XML documents. We are using PHP 5.6 with a Process Control extension, EXPECT. EXPECT is an extension that allows interaction with processes through PTY. Streams opened via the expect:// protocol handler provide access to a process's stdio, stdout and stderr via PTY. This could be used to execute malicious code. See the PHP manual on EXPECT for more details on PHP and the EXPECT extension.
Change to the Exercise 3 directory:
cd /home/user/Desktop/EXERCISES/3.8.4_XML_Injections/Exercise_Three
To run this parser, enter the following command in one of your shell windows:
php XMLParser.php
The parser will print out a line in the shell window as follows:
Your input XML file:
The parser program prompts you for the name of the XML file you want to parse. To parse a valid XML file called books.xml, enter:
books.xml
You should see output in your window showing the content of the books.xml file.
Now let's look at the implementation of the PHP XML parser to better understand the attack vector. A good place to start is in XMLParser.php. Use your favorite text editor to open this file. For example:
vim XMLParser.php
In XMLParser.php, the PHP XML Parser extension creates an XML parser, and then defines handlers for different XML events. In this parser, the startElement function is called whenever the XML parser encounters a start tag. Similarly, the endElement function is called whenever the XML parser encounters an end tag. The char function is called whenever the XML parser encounters the non-markup content of an XML document. Additionally, the externalEntityRefHandler function is called whenever the XML parser finds a reference to an externally parsed general entity.
In the externalEntityRefHandler function, fopen() binds a named resource to a stream. This named resource is specified by the fourth parameter, systemId, which is the name of the external entity referenced in the XML file. The function then prints out the content of this stream until it reaches the end.
The format of this systemId is as follows:
protocol://path
When fopen() is used to open this systemId, it first reads the name of the protocol and then searches for a protocol handler to turn this systemId into a stream. The following table lists the main types of protocols and the behavior of their protocol handlers:
Format | Referenced | Protocol Handler Behavior |
---|---|---|
file://[filepath] | local file | Access content of this file |
http://[filepath] or ftp://[filepath] or other web protocol | online resource | Access content of the online resource |
expect://cmdline | command or executable file | Execute this cmdline if EXPECT extension installed |
Others | possibly malformed reference | Report an error |
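The dispatch summarized in the table can be mirrored in a few lines of code, which may help when deciding what a mitigation should inspect. The sketch below is written in Java to match the earlier exercises and is purely illustrative: PHP's fopen() does its own dispatch internally, and the class `SystemIdScheme` and its methods are hypothetical names. It only classifies a systemId by the text before `://` and never opens or executes anything.

```java
import java.util.Locale;

// Illustrative classifier for the "protocol://path" format; it opens nothing.
final class SystemIdScheme {

    // Returns the lower-cased text before "://", or "" if there is no such marker.
    static String schemeOf(String systemId) {
        int marker = systemId.indexOf("://");
        if (marker <= 0) {
            return "";
        }
        return systemId.substring(0, marker).toLowerCase(Locale.ROOT);
    }

    // Maps a systemId to the "Referenced" column of the table.
    static String describe(String systemId) {
        switch (schemeOf(systemId)) {
            case "file":
                return "local file";
            case "http":
            case "https":
            case "ftp":
                return "online resource";
            case "expect":
                return "command or executable file";
            default:
                return "possibly malformed reference";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("file:///etc/passwd"));
        System.out.println(describe("expect://ls"));
        System.out.println(describe("no-protocol-here"));
    }
}
```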
Now spend some time looking at the code and tracing the flow of execution. You are looking for an attack surface and a corresponding attack vector. In this case, consider how an attacker can make use of the expect:// protocol and the EXPECT extension to execute a malicious command or an executable file.
In order to exploit the vulnerability, you will write the external entity, which makes use of the EXPECT extension to implement two kinds of attacks: infinite stream and information disclosure.
To inspect the skeleton XML file, enter:
vim infiniteStream.xml
The file infiniteStream.xml contains the following:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE root [
<!ENTITY content SYSTEM "// Your version of external entity goes here ">
]>
<root>&content;</root>
This skeleton XML file first declares a document type, root, and one external entity, content. It then references the external entity using the syntax &content; between the start tag and end tag of the root element.
You need to use the infiniteStream.xml file to implement an attack. The attack consists of executing a program or command that will never end. Open and edit the XML file infiniteStream.xml.
vim infiniteStream.xml
Try a command that would cause infinite output on the shell. Then edit the infiniteStream.xml file and make use of the EXPECT extension to make the PHP parser execute this command.
To implement the information disclosure attack, you will open and edit the XML file informationDisclosure.xml. It shares the same structure and content with infiniteStream.xml. To start, enter:
vim informationDisclosure.xml
Recall the information disclosure attack in Exercise 2 in XML Injection Attacks. In that attack, /etc/passwd is referenced by the external entity and parsed by the XML parser. In this attack, try a command that could print out the content of /etc/passwd. Then edit the informationDisclosure.xml file to make use of the EXPECT extension to make the PHP parser execute this command.
In this mitigation, you will set a timer to go off if it takes too long for the parser to execute a command or an executable file.
To mitigate this vulnerability, go to the MitigationOne folder in the Exercise_Three folder:
cd MitigationOne
Please copy your finished attack file infiniteStream.xml to this directory by entering:
scp ../infiniteStream.xml infiniteStream.xml
After copying the attack file, there are two files in the MitigationOne folder as follows:
Filename | Purpose |
---|---|
XMLParser.php | This file is a copy of the XMLParser that we use to exploit the vulnerability. Use this copy to edit and write your own code to mitigate this vulnerability. |
infiniteStream.xml | The malicious XML file you designed to make the XML parser produce an infinite stream. If your mitigation is successful, this attack will not work. |
To implement this mitigation, you should keep track of the time the XML parser spends printing out the infinite stream of content from the external entity source. If it takes too long, your code should stop the externalEntityRefHandler function.
For this mitigation, we will compare the external entity that is referenced with the content of an allow list.
To start this mitigation, please go into the MitigationTwo folder:
cd ../MitigationTwo
To mitigate the attack of information disclosure, disallowing the EXPECT extension is a safe solution to our problem, but lacks flexibility. There are cases where it might be appropriate to execute an external entity, such as wanting to include the output of a program (specified in your XML file).
As a result, we will use an allow list to keep track of which commands and executables are OK to execute.
You will modify the XML parser to compare the external entity references with the strings on your allow list. If the external entity is on the allow list, this entity will be executed by PHP using EXPECT. If it is not on the allow list, then the parser will ignore that external entity.
Please copy your finished attack file informationDisclosure.xml to this directory by entering:
scp ../informationDisclosure.xml informationDisclosure.xml
After copying the attack file, there are five files in the directory for this exercise:
Filename | Purpose |
---|---|
XMLParser.php | This file contains the same XMLParser that we used previously. Use this copy to edit and write your own code to mitigate the vulnerability. |
informationDisclosure.xml | The malicious XML file you designed to make the XML parser output a local confidential file. If your mitigation is successful, this attack will not work. |
allowList.txt | This file contains the list of strings that the XMLParser will allow to be used as external entities. |
importantData | This executable file is on the allow list. If your mitigation is successful, this file will be executed. |
showImportantData.xml | This XML file makes the XML parser execute a local executable file, importantData. If your mitigation is successful, importantData will be executed. |
For this mitigation, you will first inspect allowList.txt by entering:
vim allowList.txt
Then you will read allowList.txt one line at a time and compare each line with systemId in the externalEntityRefHandler function. If a match is found, execute the systemId. Otherwise, skip the systemId.