Growing Simpler Decision Trees to Facilitate Knowledge Discovery
Kevin Cherkauer,
(joint work with Jude W. Shavlik)
UW Computer Sciences
2:30 pm, Fri Sep 27, 2310 CS & Stats
When using machine learning techniques for knowledge discovery, output that
is comprehensible to a human is as important as predictive accuracy. We
introduce a new algorithm, SET-Gen, that improves the comprehensibility of
decision trees grown by standard C4.5 without reducing accuracy. It does this
by using genetic search to select the set of input features C4.5 is allowed to
use to build its tree. We test SET-Gen on a wide variety of real-world datasets
and show that SET-Gen trees are significantly smaller and reference
significantly fewer features than trees grown by C4.5 alone.
Statistical significance tests show that, on all ten datasets tested,
SET-Gen's trees are either indistinguishable in accuracy from the original
C4.5 trees or more accurate than them.
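The abstract does not spell out SET-Gen's encoding, genetic operators, or fitness function. Purely as an illustration of genetic search over feature subsets, here is a minimal Python sketch that assumes a bit-mask encoding (one bit per input feature), tournament selection, uniform crossover, and bit-flip mutation. The names `genetic_feature_search`, `toy_fitness`, and `RELEVANT` are hypothetical. In a setting like the one described, the fitness call would instead run C4.5 restricted to the selected features and score the resulting tree, for example on held-out accuracy and tree size; the toy function below only stands in for that step.

```python
import random
from typing import Callable, Dict, List, Tuple

Mask = List[bool]  # Mask[i] is True if feature i may be used by the tree learner


def tournament(population: List[Mask], scores: List[float], rng: random.Random) -> Mask:
    """Pick two individuals at random and return the fitter one."""
    i, j = rng.randrange(len(population)), rng.randrange(len(population))
    return population[i] if scores[i] >= scores[j] else population[j]


def genetic_feature_search(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    crossover_rate: float = 0.8,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Mask:
    """Search over feature subsets with a simple generational GA.

    Returns the fittest mask seen in any generation.
    """
    rng = random.Random(seed)
    cache: Dict[Tuple[bool, ...], float] = {}

    def score(mask: Mask) -> float:
        # Fitness is assumed deterministic, so each distinct subset is scored once.
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask)
        return cache[key]

    population = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    best = max(population, key=score)

    for _ in range(generations):
        scores = [score(m) for m in population]
        children: List[Mask] = []
        while len(children) < pop_size:
            a = tournament(population, scores, rng)
            b = tournament(population, scores, rng)
            if rng.random() < crossover_rate:
                # Uniform crossover: each gene is copied from one parent or the other.
                child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            else:
                child = list(a)
            # Bit-flip mutation: occasionally add or drop a feature.
            child = [not g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        candidate = max(population, key=score)
        if score(candidate) > score(best):
            best = candidate
    return list(best)


# Hypothetical stand-in for "grow a tree on these features and evaluate it".
RELEVANT = {0, 2}


def toy_fitness(mask: Mask) -> float:
    # Reward relevant features and charge a small cost for every feature used,
    # loosely mimicking a preference for accurate trees that use few features.
    return sum(1.0 for i in RELEVANT if mask[i]) - 0.1 * sum(mask)


if __name__ == "__main__":
    best = genetic_feature_search(n_features=8, fitness=toy_fitness)
    print([i for i, used in enumerate(best) if used])
```

Caching scores matters in practice because, in a setup like the one described, each fitness evaluation would mean growing and evaluating a full decision tree.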