Importance-Based Feature Extraction for Reinforcement Learning

David Finton, UW Computer Sciences

2:30 pm, Fri Oct 4, Room 2310 CS & Stats

For on-line learning techniques to be truly autonomous, the learner must be able to develop an effective representation of the important aspects of its environment. This is a challenging problem even when the feedback given to the learner is complete, and more so for reinforcement learning. In fully supervised learning problems, the learner is told the correct responses, and can construct an error function by comparing its behavior with the correct behavior. In reinforcement learning, the learner is not told the correct behavior; instead, it receives occasional reinforcement feedback which indicates the level of success of its actions over time. The feedback to the learner is therefore ambiguous: it doesn't indicate whether failure results from an inadequate representation or from a wrong strategy despite good features.
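To make the contrast concrete, here is a minimal sketch (my own illustration, not code from the talk; the function names are invented for this example). A supervised learner can compute an explicit per-response error, while a reinforcement learner sees only an occasional scalar reward and must derive a learning signal from it, for instance a one-step temporal-difference error:

    # Supervised feedback: the correct response is known, so a
    # per-example error signal can be computed directly.
    def supervised_error(prediction, correct_response):
        return correct_response - prediction  # explicit teaching signal

    # Reinforcement feedback: only a scalar reward arrives, and the
    # learner must infer long-term value from it. One common signal is
    # the one-step temporal-difference (TD) error.
    def td_error(reward, value_next, value_current, gamma=0.9):
        return reward + gamma * value_next - value_current

Note that the TD error says nothing about which response would have been correct; it only measures how much better or worse things went than expected, which is exactly the ambiguity described above.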

A typical reinforcement learner learns to estimate action values, and follows a policy which maximizes the return from its actions. I have developed a new criterion -- "importance" -- for evaluating the effectiveness of features in terms of the emerging action values. The goal of "importance-based feature extraction" is to produce feature detectors which reliably indicate the utility of choosing particular actions. Hence, importance-based feature extraction constructs a representation which is relevant to the particular task faced by the learner.
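As a concrete illustration (my own sketch under stated assumptions, not code from the talk), the following linear Q-learning update over binary feature vectors shows how action values emerge from reinforcement feedback, together with one hypothetical way a feature's relevance to action choice might be scored. The names `w`, `update`, and `importance`, and the peak-to-peak scoring rule, are assumptions for this example only; they are not Finton's actual criterion, which the abstract does not define.

    import numpy as np

    rng = np.random.default_rng(0)
    n_features, n_actions = 8, 4

    # Linear action-value estimate: Q(x, a) = w[a] . x, where x is the
    # binary feature vector produced by the feature detectors.
    w = np.zeros((n_actions, n_features))

    def q_values(x):
        return w @ x

    def update(x, a, reward, x_next, alpha=0.1, gamma=0.9):
        # One-step Q-learning update on the weights of the chosen action.
        target = reward + gamma * np.max(q_values(x_next))
        w[a] += alpha * (target - q_values(x)[a]) * x

    # One illustrative notion of a feature's relevance to action choice
    # (an assumption for this sketch): a feature matters insofar as its
    # learned weights discriminate among the available actions.
    def importance(feature_index):
        return np.ptp(w[:, feature_index])  # spread across actions

    x = rng.integers(0, 2, size=n_features)   # a binary feature vector
    update(x, a=0, reward=1.0, x_next=x)      # one learning step

The intuition the sketch tries to capture is the one stated in the abstract: a feature is useful to the extent that it reliably indicates the utility of choosing particular actions, so its effectiveness is judged in terms of the emerging action values rather than by any task-independent measure.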