Extracting Comprehensible Symbolic Representations from Trained Neural Networks

Mark Craven
Computer Sciences Department, UW-Madison

12:05 pm, Wednesday April 25 in 8417 Social Science

Neural networks offer an appealing approach to concept learning because they are applicable to a large class of problems, and because they have demonstrated good generalization performance on a number of difficult real-world tasks. A limitation of neural networks, however, is that the concept representations they form are nearly impenetrable to human understanding. To address this limitation, we have been developing algorithms for extracting comprehensible, symbolic representations from trained neural networks. I will first discuss why it is important to be able to understand the concept representations formed by neural networks, and then describe our approach to this task. We have developed a novel method that involves viewing the rule-extraction task as a separate learning problem in which the target concept is the network itself. In addition to learning from training examples, our method exploits the property that networks can be queried.
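The abstract does not give algorithmic details, but the core idea — treating the trained network as a queryable oracle whose answers supervise a symbolic learner — can be sketched as follows. This is an illustrative stand-in, not the speaker's actual method: the `net` function plays the role of a trained network, and a small ID3-style decision-tree inducer plays the role of the symbolic learner. Because the network can be queried at will, the extractor here labels the entire (tiny) boolean input space rather than relying only on the original training examples.

```python
from itertools import product
from collections import Counter
import math

def net(x):
    """Hypothetical 'trained network': a hand-wired threshold unit
    standing in for a real net. Any callable classifier would do."""
    return int(x[0] + x[1] + x[2] >= 2)   # fires when >= 2 inputs are on

def entropy(labels):
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def build_tree(examples, features):
    """Induce an ID3-style decision tree from (input, label) pairs."""
    labels = [y for _, y in examples]
    if len(set(labels)) == 1:             # pure node: emit a leaf
        return labels[0]
    if not features:                      # no splits left: majority leaf
        return Counter(labels).most_common(1)[0][0]

    def gain(f):                          # information gain of splitting on f
        subsets = {}
        for x, y in examples:
            subsets.setdefault(x[f], []).append(y)
        remainder = sum(len(s) / len(examples) * entropy(s)
                        for s in subsets.values())
        return entropy(labels) - remainder

    best = max(features, key=gain)
    branches = {}
    for v in (0, 1):
        subset = [(x, y) for x, y in examples if x[best] == v]
        rest = [f for f in features if f != best]
        branches[v] = (build_tree(subset, rest) if subset
                       else Counter(labels).most_common(1)[0][0])
    return (best, branches)

def predict(tree, x):
    """Follow the tree's tests down to a leaf label."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[x[feature]]
    return tree

# Query the network over the whole boolean input space to get labels,
# then induce a comprehensible surrogate tree from those answers.
examples = [(x, net(x)) for x in product((0, 1), repeat=3)]
tree = build_tree(examples, features=[0, 1, 2])
```

Here the extracted tree reproduces the network's behavior exactly, because the oracle could be queried exhaustively; on realistic input spaces one would instead sample queries, and fidelity to the network becomes an empirical question.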