Extracting Comprehensible Symbolic Representations from Trained Neural
Networks
Mark Craven
Computer Sciences Department, UW-Madison
12:05 pm, Wednesday April 25 in 8417 Social Science
Neural networks offer an appealing approach to concept learning because they
are applicable to a large class of problems, and because they have demonstrated
good generalization performance on a number of difficult real-world tasks. A
limitation of neural networks, however, is that the concept representations
they form are nearly impenetrable to human understanding. To address this
limitation, we have been developing algorithms for extracting comprehensible,
symbolic representations from trained neural networks. I will first discuss why
it is important to be able to understand the concept representations formed by
neural networks, and then describe our approach to this task. We have developed
a novel method that involves viewing the rule-extraction task as a separate
learning problem in which the target concept is the function computed by the
trained network itself. In addition to learning from the original training
examples, our method exploits the property that a trained network can be
queried: it will label any input example we construct, giving the extraction
algorithm an effectively unlimited supply of labeled data.
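The query-based view of rule extraction can be illustrated with a toy sketch
(not the speaker's actual algorithm). Here an opaque function stands in for a
trained network; we query it on sampled inputs to generate labeled data, then
search for the single-feature threshold rule that best mimics its behavior.
All names (network, the linear decision boundary, the stump learner) are
invented for illustration only.

```python
import random

# Hypothetical stand-in for a trained network: an opaque function
# we can only query for output labels (the internals are hidden).
def network(x1, x2):
    return 1 if 0.8 * x1 + 0.3 * x2 > 0.5 else 0

random.seed(0)
# Query the "network" on sampled inputs -- the key property the
# abstract highlights: the oracle labels as many examples as we like.
queries = [(random.random(), random.random()) for _ in range(500)]
labeled = [((x1, x2), network(x1, x2)) for x1, x2 in queries]

# Extract a simple symbolic rule: the single-feature threshold test
# that agrees most often with the network's labels (its "fidelity").
best = None
for feat in (0, 1):
    for t in [i / 100 for i in range(101)]:
        acc = sum((xs[feat] > t) == bool(y) for xs, y in labeled) / len(labeled)
        if best is None or acc > best[0]:
            best = (acc, feat, t)

acc, feat, t = best
print(f"IF x{feat + 1} > {t:.2f} THEN class 1  (fidelity {acc:.2f})")
```

The extracted rule is judged by fidelity to the network rather than accuracy
on the original data, which is what makes the network itself, not the
underlying task, the target concept of this second learning problem.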