Details About Privacy Measures

Your Computer, Your Choice

Protection starts from the moment you install our software. Only a system administrator can install our application packages for system-wide use. The first time you run any piece of instrumented software, a one-time opt-in dialog appears. No data will ever be collected or reported until you have used this dialog to give your permission. If for any reason the opt-in dialog cannot be displayed, the entire feedback system is disabled.
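
To make that fail-safe rule concrete, here is a minimal sketch of the start-up decision. Every name in it is invented and the real implementation is more involved; the point is simply that no recorded consent means no data collection.

    /* Illustrative sketch only: every name below is invented.  The rule it
     * demonstrates is the fail-safe one: no recorded consent, no reporting. */
    #include <stdbool.h>
    #include <stdio.h>

    typedef enum { CONSENT_UNKNOWN, CONSENT_YES, CONSENT_NO } consent_t;

    /* Placeholder: would read the user's previously saved answer. */
    static consent_t read_saved_consent(void) { return CONSENT_UNKNOWN; }

    /* Placeholder: would display the one-time opt-in dialog.  Returns false
     * if the dialog cannot be shown at all (no display, broken GUI, ...). */
    static bool ask_user(consent_t *answer) { *answer = CONSENT_NO; return false; }

    static bool feedback_enabled(void)
    {
        consent_t consent = read_saved_consent();
        if (consent == CONSENT_UNKNOWN) {
            /* First run: ask.  If the dialog cannot appear, stay disabled. */
            if (!ask_user(&consent))
                return false;
        }
        return consent == CONSENT_YES;  /* collect only after an explicit yes */
    }

    int main(void)
    {
        printf("feedback reporting is %s\n",
               feedback_enabled() ? "enabled" : "disabled");
        return 0;
    }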

The Bug Isolation Monitor is a visual reminder that you are using our instrumented applications. It appears in the notification area of the GNOME or KDE panel whenever an instrumented application is running. The monitor shows at a glance whether the feedback system is enabled or disabled. A mouse click brings up a popup notification with easy controls for turning the system on or off, or for learning more about the project.

Secrecy of Feedback Data

As an instrumented application is running, it collects measurements into global variables within the program itself. When the program exits, it sends these measurements to the launcher application across a dedicated pipe. The launcher is responsible for assembling the feedback report and uploading it to our server here at Bug Isolation Headquarters. The feedback report is only held in the memory of running processes. It is never stored in any temporary file on your computer.
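
The sketch below illustrates this hand-off under some invented conventions (the environment variable name and report format are made up for the example): counters live in ordinary global variables and, at exit, are written straight down a pipe inherited from the launcher, so nothing ever touches the disk.

    /* Minimal sketch with invented names; the real protocol and report
     * format differ.  Counters stay in memory and go straight to the
     * launcher's pipe at exit -- never into a temporary file. */
    #include <stdio.h>
    #include <stdlib.h>

    #define NUM_COUNTERS 4
    static unsigned long counters[NUM_COUNTERS];   /* in-memory measurements */

    static void send_report_at_exit(void)
    {
        /* Hypothetical convention: the launcher passes the write end of a
         * dedicated pipe as a file descriptor number in the environment. */
        const char *fd_text = getenv("SAMPLER_REPORT_FD");
        if (!fd_text)
            return;                                /* no launcher, no report */

        FILE *pipe_end = fdopen(atoi(fd_text), "w");
        if (!pipe_end)
            return;

        for (int i = 0; i < NUM_COUNTERS; i++)
            fprintf(pipe_end, "%d\t%lu\n", i, counters[i]);
        fclose(pipe_end);                  /* launcher assembles and uploads */
    }

    int main(void)
    {
        atexit(send_report_at_exit);
        counters[0]++;                     /* stand-in for real instrumentation */
        return 0;
    }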

Report submission takes place using HTTP over an encrypted SSL connection, also known as https://. The connection is encrypted using our server’s 1024-bit RSA key, so third parties have no practical means to eavesdrop on your report as it travels to us.
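
For the curious, an upload along these lines can be written with a standard HTTPS client library such as libcurl. The sketch below is not the launcher’s actual code, and the URL and report body are placeholders; it simply shows a POST over a certificate-verified SSL connection.

    /* Hedged sketch using libcurl (link with -lcurl); the URL and report
     * body are placeholders, not the project's real submission endpoint. */
    #include <curl/curl.h>

    static int upload_report(const char *report, long length)
    {
        CURL *curl = curl_easy_init();
        if (!curl)
            return -1;

        curl_easy_setopt(curl, CURLOPT_URL, "https://example.invalid/submit");
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, report);
        curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, length);
        /* Verify the server's certificate so the encrypted channel cannot
         * be silently intercepted. */
        curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 1L);
        curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 2L);

        CURLcode rc = curl_easy_perform(curl);
        curl_easy_cleanup(curl);
        return rc == CURLE_OK ? 0 : -1;
    }

    int main(void)
    {
        const char report[] = "site-id\tcounter-value\n";
        curl_global_init(CURL_GLOBAL_DEFAULT);
        int failed = upload_report(report, (long)(sizeof report - 1));
        curl_global_cleanup();
        return failed ? 1 : 0;
    }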

The report collection server at the University of Wisconsin-Madison is managed by a highly skilled system administration team. Reports are held in a secure, Kerberos-authenticated AFS file system, with access restricted to system administrators and CBI project members. Reports may be moved into databases, collated for analysis, or discussed in aggregate in future research publications. In all cases we will continue to apply the same stringent controls to protect your privacy.

The server records upload client IP addresses in a rotating log for system administration purposes only. These log files are accessible only by the server administrator. We do not associate IP addresses with individual reports. We have no practical means to connect specific reports with specific users. You are anonymous.

How Much Does a Report Reveal?

In spite of all of these precautions, suppose a clever attacker gets his hands on your feedback report. How much can the attacker learn?

Not much. If you’ve read about feedback reports, you know that the measurements are collected into a set of predicate counters testing various properties of program data values. Reports do not directly reveal the values themselves. We might check whether a particular file name variable is NULL, but we never record the file name itself.
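
To make the file-name example concrete, here is a hypothetical illustration of such a counter. The function and variable names are invented, and real instrumentation is inserted automatically rather than written by hand; only the outcome of the NULL test is counted, never the string itself.

    /* Invented example: count the outcomes of "filename == NULL" without
     * ever recording the file name itself. */
    #include <stdio.h>

    static unsigned long filename_null_true;    /* predicate observed true  */
    static unsigned long filename_null_false;   /* predicate observed false */

    static void open_log(const char *filename)
    {
        /* Instrumentation site: only the outcome of the test is counted. */
        if (filename == NULL)
            filename_null_true++;
        else
            filename_null_false++;

        /* ... the program's real work with the file would go here ... */
    }

    int main(void)
    {
        open_log(NULL);
        open_log("/tmp/example.log");
        printf("filename == NULL: %lu true, %lu false\n",
               filename_null_true, filename_null_false);
        return 0;
    }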

Furthermore, if you’ve read about sparse random sampling, you know that most of what happens is never recorded at all. If we are sampling at a rate of 1/100, then 99% of what a program does will not appear in the feedback report. So it is very difficult to tell exactly what went on during any single run simply by looking at one feedback report.
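
One common way to achieve such sparse sampling, roughly in the spirit of the sampling transformation described in the project’s publications, is to skip ahead by a random, geometrically distributed number of observations between samples. The sketch below is a simplification with invented names, not the real instrumentation.

    /* Simplified sketch of 1/100 sparse random sampling via a geometric
     * countdown; names are invented and many details are omitted.
     * Link with -lm. */
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define RATE 0.01                      /* average sampling rate of 1/100 */

    static long countdown;                 /* observations to skip before sampling */
    static unsigned long predicate_count;  /* one sampled predicate counter */

    static long next_countdown(void)
    {
        /* Geometric gaps between samples give an average rate of RATE. */
        double u = (rand() + 1.0) / (RAND_MAX + 2.0);
        return (long)ceil(log(u) / log(1.0 - RATE));
    }

    static void instrumentation_site(int predicate_holds)
    {
        if (--countdown > 0)
            return;                        /* fast path: observation skipped */
        countdown = next_countdown();      /* schedule the next sample */
        if (predicate_holds)
            predicate_count++;             /* slow path: actually recorded */
    }

    int main(void)
    {
        countdown = next_countdown();
        for (int i = 0; i < 1000000; i++)
            instrumentation_site(i % 2);   /* predicate holds half the time */
        printf("recorded %lu of the ~500000 true observations\n",
               predicate_count);
        return 0;
    }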

The details on feedback reports include a few examples of feedback data from Gaim, an instant messaging program. We will go even further and tell you that this Gaim run included a conversation between one of the project members and a good friend, and that much of this conversation was spent discussing a date one of the conversants had recently been on.

What can an attacker learn from the Gaim feedback report? The obvious details come from code coverage. We can see that several plugins were loaded. If you count up all of the non-zero predicates in each plugin, you might be able to guess whether the conversation took place over Jabber, Yahoo! Messenger, MSN, or AIM.

But can you tell what messages were exchanged? No: there is no trace of the conversation text, and even guessing how many messages were exchanged from code coverage would be highly error-prone because of the sparse sampling. Can you learn the names of the participants? Can you learn the passwords used to sign on to the various messaging services? Can you tell how the date went, or even whether this is really a truthful description of what went on during that Gaim session? No. These high-level details are impossible to recover: the collected data is too low-level, it contains no actual program data values, and sparse sampling renders even code coverage a noisy metric at best.

Statistical debugging is a strategy for attributing failures to anomalous program behavior, but it works by finding patterns in hundreds or thousands of runs. Only on that kind of scale can we overcome the noise and uncertainty introduced by sparse random sampling. At the level of an individual feedback report, no sensitive or personally identifying information is revealed.
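
As a toy illustration of why that scale matters (this is not the project’s actual analysis, and the inputs are made up purely for the example), the question the analysis asks is only meaningful across many reports at once: of all the runs in which a given predicate was observed true, what fraction failed?

    /* Toy illustration with made-up inputs, not the project's real analysis:
     * how strongly a predicate is associated with failure can only be
     * estimated by pooling many reports; one report says almost nothing. */
    #include <stdio.h>

    struct run {
        int failed;           /* did this run end in failure? */
        int predicate_true;   /* was the predicate ever observed true? */
    };

    int main(void)
    {
        /* Stand-ins for thousands of real feedback reports. */
        struct run runs[] = {
            {1, 1}, {1, 1}, {0, 0}, {0, 1}, {0, 0}, {1, 1}, {0, 0}, {0, 1},
        };
        int n = sizeof runs / sizeof runs[0];

        int true_total = 0, true_and_failed = 0;
        for (int i = 0; i < n; i++)
            if (runs[i].predicate_true) {
                true_total++;
                if (runs[i].failed)
                    true_and_failed++;
            }

        printf("predicate true in %d runs; %d of those failed (%.0f%%)\n",
               true_total, true_and_failed,
               100.0 * true_and_failed / true_total);
        return 0;
    }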