Go back to software page
ALICE: Application-Level Intelligent Crash Explorer
This is a user documentation of the ALICE tool, which can be used to discover “crash vulnerabilities” in applications. Crash vulnerabilities are problems that get exposed by a sudden power loss or system crash while the application is running, and the application cannot recover correctly after rebooting the machine. ALICE focuses on single-node applications that run atop file systems. ALICE is different from similar tools in that it aims to find vulnerabilities that might occur across all file systems, including future ones. ALICE is also unique in targeting vulnerabilities associated with different source-code lines of the application, instead of checking the application’s correctness atop arbitrarily (or systematically, but less useful) simulated crash scenarios. ALICE is designed to be extensible: both how it checks different source lines, and the combined behavior it assumes of underlying file systems, can be customized. The ALICE tool is a by-product of a research project (http://research.cs.wisc.edu/adsl/Publications/alice-osdi14.html) in the University of Wisconsin-Madison.
The following are the steps to install ALICE:
The toy application can be found in alice/example/toy/toy.c; the reader is encouraged to go through it. The application does the following:
A script that runs the application and records a trace, along with all initialization setup, can be found in alice/example/toy/toy_workload.sh; the reader is encouraged to go through it.
To perform Step 1, two directories are needed. The first, the workload directory, is where the files of the application will be stored. The application, as it runs, will modify the workload directory and its contents. For the toy application, this is the place where file1, link1, and link2, are placed. The toy_workload.sh script first creates the workload directory, workload_dir, and then initializes it with the file file1 containing “hello”.
The other needed directory, traces directory is for storing the (multiple) traces that are recorded as the application is run. The toy_workload.sh script next creates this directory, traces_dir. After setting up the workload directory and the traces directory, the toy_workload.sh script does a few more initialization things: compiling the toy.c application, and cding into workload_dir so that the toy application can be run within there.
The toy_workload.sh script finally runs the application and records traces, by issuing the following command:
alice-record --workload_dir . \
--traces_dir ../traces_dir \
If the reader is familiar with the strace utility, the above command is similar to an invocation of strace: alice-record is a script that records traces, while ../a.out is the actual application to be run (the process and all subprocesses of ../a.out are traced, similar to strace with the -ff option). The alice-record script requires two mandatory arguments: the workload directory and the traces directory (alice-record takes one more optional argument, --verbose, to control verbosity).
Step 2 requires the user to supply ALICE with a checker script. The checker script will be invoked multiple times by ALICE, each invocation corresponding to a (simulated) system crash scenario that could have happened while the application was running in Step 1. During each invocation, the checker script will be given a directory that reflects the state of the workload directory if the (simulated) crash had really happened. If the given crashed-state workload directory has an expected (i.e., consistent) set of files, the checker script should exit with status zero, and should exit with a non-zero status otherwise.
ALICE supplies the checker script with two command-line arguments. The first is the path to the crashed-state workload directory. The second command-line argument to the checker script is the path to an stdout file. The stdout file contains all the messages that had been printed to the user’s terminal at the time of the crash (corresponding to the supplied crashed-state workload directory), and can be used by the checker to check for durability, as explained below. Note that the crashed-state workload directory supplied by ALICE might differ from the original workload directory in Step 1. Hence, for applications that expect the absolute path of the contents within the workload directory to not have changed (a small subset of applications in our experience), the checker script needs to move the supplied directory to the original directory, and then operate atop the original directory.
The checker script for the toy application can be found in alice/example/toy/toy_checker.py, and the reader is encouraged to go through it. The script first changes the current working directory into the crashed-state directory supplied by ALICE, and reads all the messages printed in the terminal at the time of the crash by reading the stdout file supplied by ALICE. If the application has printed the “Updated file1 to world” message, the checker script makes sure that file1 contains “world”; otherwise, the checker script makes sure that file1 contains either “hello” or “world”. The checker script then makes sure that either link1 and link2 are both present, or are both not present. If any of the checked conditions do not hold, the checker script results in an assertion failure, thus exiting with a non-zero status (and thus informing ALICE that the application will fail if the simulated crash scenario happens in real).
After writing the checker script, the user can invoke the alice-check script to actually run ALICE and get the list of vulnerabilities. The reader is encouraged to run the following command from within the alice/example/toy directory, to get a list of vulnerabilities discovered in the toy application (after running toy_workload.sh first).
alice-check --traces_dir=traces_dir --checker=./toy_checker.py
The alice-check script has the following arguments:
ALICE first outputs a list of list of the logical operations that form the update protocol used by the application workload invoked in Step 1. The logical operations displayed is similar to a system-call trace, except that it is easier to understand, for example substituiting file names instead of file descriptor numbers.
ALICE then displays any discovered vulnerabilities. Vulnerabilities are displayed in two ways: dynamic vulnerabilities, relating to different operations in the update protocol, and static vulnerabilities, relating to source-code lines. The proper display of static vulnerabilities requires the originally traced application to have debugging symbols; also, ALICE associates each logical operation to one of the stack frames in the logical operation’s stack trace to display static vulnerabilities, and this association can sometimes be faulty.
ALICE is designed to be extensible. The current version of ALICE strips off many features that were previously implemented, in hopes that a smaller code base promotes extensions. However, the current version is also not sufficiently commented, and does not follow some good coding practices; a well-commented version of the software might be released in the future if users shows interest.
To extend ALICE, readers are required to go through our publication (http://research.cs.wisc.edu/adsl/Publications/alice-osdi14.html) to understand ALICE’s design and philosophy. Note that there is some terminology difference between the publication and ALICE’s source code; in particular, logical operations discussed in the publication correspond to micro operations in the source code, while micro operations in the publication correspond to disk operations in the source code.
ALICE’s default exploration strategy, which investigates the ordering and atomicity of each system call and reports any associated vulnerabilities, is coded in alice/alicedefaultexplorer.py, and can be easily changed. The alicedefaultexplorer.py code is complicated since it displays static vulnerabilities and invokes checkers in multiple threads. A functionally equivalent exploration strategy can be simpler.
ALICE’s default APM is coded in alice/alicedefaultfs.py, and can be easily changed. The alicedefaultfs.py code is complicated since it models a file system that can be configured to split file operations in different granularities. A functionally equivalent file system (with a single granularity) can be simpler.
Other than the extensions discussed till now, users might try to add support for more system calls, file attributes, symbolic links, or other such details, in ALICE. Relevant to these, the _aliceparsesyscalls.py script contains code that converts system calls into logical operations, while the replay_disk_ops() function from the alice.py script contains code that re-constructs a directory from a given list of micro-ops.
ALICE is a safe, but not a complete tool. That is, the application might have additional vulnerabilities than those discovered and reported. ALICE is thus not aligned towards comparing the correctness of different applications; specifically, any comparisons when not using equivalent workloads and checkers can easily produce confusing, wrong inferences. Also, any vulnerability displayed by ALICE might already be known to an application developer: the application documentation might explicitly require that the underlying file system not behave in those ways that will expose the vulnerability, or might simply not provide those guarantees that are being checked by the checker.
The default file-system model (APM) used by ALICE is designed to also find vulnerabilities that can get exposed by future file systems; some crash scenarios that are possible with the default model do not happen in common current file systems. Also, ALICE’s output (a list of vulnerabilities) is only designed to show the number of source lines that require ordering or atomicity. It is thus erraneous to directly correlate the number of vulnerabilities shown by ALICE with current real-world impact.
ALICE does not currently attempt to deal with any file attributes (including modification time) other than the file size, or with the FD_CLOEXEC and O_CLOEXEC facilities. If the application’s logic (that is invoked in the workload and the checker) depends on these, ALICE’s output is probably wrong. Support for a few rarely-used system calls is also lacking; warning or error messages are displayed by ALICE if the application workload had invoked such calls. The situation for symlinks is similar; while the current version of ALICE attempts to support them slightly, if the application logic depends on symlinks, ALICE’s output might be wrong.
The current version of ALICE also does not support tracing memory-mapped writes; applications that use such writes as a part of their (relevant) update protocol cannot use ALICE. Note that a version of ALICE used in our published research paper (http://research.cs.wisc.edu/adsl/Publications/alice-osdi14.html) traced memory-mapped writes, but support was removed in the interest of distributability.
Adding support for file attributes, CLOEXEC, symlinks, and mmap() writes does not require any changes to the design of ALICE, and might be done in future versions if users deem them helpful.
Thanumalayan Sankaranarayana Pillai, Vijay Chidambaram, Ramnatthan Alagappan, and Samer Al-Kiswany were involved in various aspects of design and testing of the ALICE tool. Thanumalayan Sankaranarayana Pillai (email@example.com) is the primary author of the tool, and might serve to be the best contact for bug reports, feature requests, or other general discussions. Ramnatthan Alagappan extensively tested the tool, and Vijay Chidambaram also wrote a part of the code.
The ALICE tool is a by-product of a research project (http://research.cs.wisc.edu/adsl/Publications/alice-osdi14.html) in the University of Wisconsin-Madison, and due credit must be given to all parties who were involved in or contributed to the project.
The alice-strace tracing framework is a slight customization of the strace tool (http://sourceforge.net/projects/strace/), along with some code adapted from strace-plus (https://code.google.com/p/strace-plus/). Credits must be given to the authors and contributors of strace and strace-plus.