Observing the Universe Can Drown You in Images: Data Mining Solutions at JPL

Usama Fayyad, Microsoft Research (formerly at JPL)

2:30 pm Wed. May 1 in 1325 CS&S

Modern science instruments can gather data at rates that make traditional inspection by humans infeasible. Techniques that automate the initial stages of analysis, reducing the data to a form that analysts can handle with traditional methods, are becoming a necessity in many fields. The talk will describe efforts to develop a new generation of data mining systems in which users specify what to search for simply by providing the system with training examples and letting it learn automatically what to do. The system then sifts through the data and catalogs objects of interest for analysis.

Two applications at JPL will be used to illustrate the techniques and their effects. The first automates the cataloging of sky objects in a digitized sky survey consisting of three terabytes of image data and containing on the order of two billion sky objects. The system (SKICAT) provides automated and accurate classification, enabling the automated cataloging of billions of objects, the majority of which are too faint for visual recognition by astronomers. The second part of the talk will cover JARtool (JPL Adaptive Recognition Tool), which targets the detection and cataloging of about one million small volcanoes visible in the Magellan SAR database of over 30,000 images of Venus.

The techniques described are applicable to a wide range of problems and do not depend on the data being images. Potential applications include medical imaging, automated inspection and diagnosis in manufacturing, decision support systems, database marketing, and summarization/visualization of large databases. More information on the JPL Machine Learning Systems Group is at http://www-aig.jpl.nasa.gov/mls/.