This talk describes work from a project to build tools that help
biologists interpret the results of microarray experiments.  The goal is to
combine data from a variety of sources to generate good characterizations of
the genes whose expression levels changed significantly under some treatment
(e.g., an antibiotic). The data that we currently use includes the expression
level data from the microarray experiment and textual information about genes
from two different databases.

The task differs from traditional machine learning tasks in several ways,
which raises interesting issues, particularly in evaluating
proposed characterizations.  The talk will present a new algorithm for
generating characterizations and a method for evaluating them that are
appropriate for datasets consisting of a small number of instances with large
descriptions, such as the microarray data.  The talk will also present ways in
which this algorithm could be incorporated into a larger characterization
system.

This work is part of a project with Jude Shavlik and Michael Molla.