This talk describes work that is part of a project to build tools that help biologists interpret the results of microarray experiments. The goal is to combine data from a variety of sources to generate good characterizations of the genes whose expression levels changed significantly under some treatment (e.g., an antibiotic). The data we currently use include the expression-level data from the microarray experiment and textual information about genes from two different databases. The task differs from traditional machine learning tasks in several ways, which raises interesting issues, particularly in evaluating proposed characterizations. The talk will present a new algorithm for generating characterizations and a method for evaluating them, both of which are appropriate for datasets consisting of a small number of instances with large descriptions, such as the microarray data. It will also present ways in which this algorithm could be incorporated into a larger characterization system. This work is part of a project with Jude Shavlik and Michael Molla.