Computer Sciences Dept.

Probabilistic Methods for Interpreting Electron-Density Maps

Frank DiMaio

With recent advances in structural genomics, there has been considerable interest in the rapid determination of protein structures in a high-throughput setting. One bottleneck in this process arises in protein crystallography: interpreting the electron-density map, the three-dimensional "picture" of the protein that crystallography produces. This thesis presents a novel solution to this important problem, applying probabilistic methods to automate the interpretation of poor-quality electron-density maps.

I show that my probabilistic approach to density-map interpretation produces more complete and more accurate protein models than other automated approaches do. Completeness is measured by the fraction of the protein automatically interpreted, and accuracy by the RMS error of my method's inferred models versus "ground truth" (the deposited structure). My probabilistic approach is also amenable to producing multiple protein models that explain the observed density. I show that multiple static conformations generated by my framework explain the observed density better than a single structure does, based on the R-free metric, which measures the difference between observed and predicted crystallographic reflection data on a held-aside test set. My method accurately interprets 3-4 Å density maps, extending the range of resolutions at which density maps can be automatically interpreted.
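As a rough illustration of the two evaluation measures named above, the following Python sketch computes an RMS error between two sets of matched atom coordinates and an R-factor restricted to held-aside reflections. It is not the thesis's evaluation code. The function names, the boolean test-set mask, and the toy numbers are hypothetical, and the sketch assumes the observed and calculated structure-factor amplitudes are already on a common scale and that atoms are already paired and superimposed.

```python
import math


def rms_error(coords_a, coords_b):
    """Root-mean-square distance between matched 3D points.

    Assumes coords_a[i] and coords_b[i] refer to the same atom and that
    both models are already in the same coordinate frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = [
        sum((p - q) ** 2 for p, q in zip(a, b))
        for a, b in zip(coords_a, coords_b)
    ]
    return math.sqrt(sum(squared) / len(squared))


def r_factor(f_obs, f_calc):
    """Standard R-factor: sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    Assumes the two amplitude lists are already scaled to each other.
    """
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_free(f_obs, f_calc, is_test):
    """R-factor computed only over reflections flagged as held-aside.

    is_test is a hypothetical boolean mask, True for test-set reflections.
    """
    obs = [o for o, t in zip(f_obs, is_test) if t]
    calc = [c for c, t in zip(f_calc, is_test) if t]
    return r_factor(obs, calc)


# Toy data, invented for illustration only.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
deposited = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
observed = [10.0, 20.0, 30.0, 40.0]
calculated = [9.0, 22.0, 30.0, 44.0]
held_aside = [False, True, False, True]

print(rms_error(model, deposited))
print(r_free(observed, calculated, held_aside))
```

Lower values are better for both measures. Because the held-aside reflections are not used when fitting a model, an R-factor computed on them is less prone to rewarding overfitting than one computed on all reflections.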

This thesis also describes several computational contributions. I describe a significant improvement over previous work in three-dimensional template matching in electron-density maps: I use the spherical-harmonic decomposition of a template to rapidly search over all rotations of the template. This offers both improved efficiency and improved accuracy compared to previous work, producing better models in 60% of the running time. I present a novel joint type, as well as improved methods for collision handling, in part-based object recognition. Finally, I present a general part-based object-recognition framework specialized for identifying topologically complex objects in large three-dimensional images. My framework introduces an algorithm that improves the efficiency of current probabilistic inference algorithms, which allows recognition of objects with hundreds of parts. Although originally developed for density-map interpretation, these computational contributions may be beneficial in other problem domains.
