Importance-Based Feature Extraction for Reinforcement Learning
David Finton, UW Computer Sciences
2:30 pm, Fri Oct 4, Room 2310 CS & Stats
For on-line learning techniques to be truly autonomous, the learner must
be able to develop an effective representation of the important aspects
of its environment. This is a challenging problem even when the feedback
given to the learner is complete, and more challenging still for
reinforcement learning. In fully-supervised learning problems, the learner is told
the correct responses, and can construct an error function by comparing
its behavior with the correct behavior. In reinforcement learning, the
learner is not told the correct behavior, but instead receives occasional
reinforcement feedback which indicates the level of success of its actions
over time. So the feedback to the learner is ambiguous; it does not
indicate whether a failure results from an inadequate representation or
from a poor strategy applied to good features.
A typical reinforcement learner learns to estimate action values, and
follows a policy which maximizes the return from its actions. I have
developed a new criterion -- "importance" -- for evaluating the
effectiveness of features in terms of the emerging action values. The goal
of "importance-based feature extraction" is to produce feature detectors
which reliably indicate the utility of choosing particular actions. Hence,
importance-based feature extraction constructs a representation which
is relevant to the particular task faced by the learner.
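To make the setting concrete, here is a minimal tabular Q-learning sketch in Python. It is an illustration of action-value learning over binary features, not the talk's actual algorithm; in particular, the `importance` function below is a hypothetical stand-in (the spread of a feature's learned action values), since the abstract does not define the criterion.

```python
import random

# Minimal Q-learning sketch: states are sets of active binary features,
# Q-values are kept per (feature, action) pair, and a state's action
# value is the sum over its active features.
# All names and the importance measure are illustrative assumptions.

ACTIONS = ["left", "right"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

q = {}  # (feature, action) -> estimated contribution to the action value

def q_value(features, action):
    """Action value of a state, summed over its active features."""
    return sum(q.get((f, action), 0.0) for f in features)

def choose_action(features):
    """Epsilon-greedy policy: mostly pick the highest-valued action."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_value(features, a))

def update(features, action, reward, next_features):
    """Standard Q-learning temporal-difference update."""
    target = reward + GAMMA * max(q_value(next_features, a) for a in ACTIONS)
    error = target - q_value(features, action)
    for f in features:
        q[(f, action)] = q.get((f, action), 0.0) + ALPHA * error

def importance(feature):
    # Hypothetical stand-in for the talk's criterion: a feature is
    # treated as important to the extent that its learned values
    # discriminate between actions.
    vals = [q.get((feature, a), 0.0) for a in ACTIONS]
    return max(vals) - min(vals)
```

Under this stand-in measure, a feature whose presence makes one action clearly better than another acquires high importance, while a feature whose action values are all equal is uninformative for action selection.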