1. This paper shows that a motion database can be preprocessed for flexibility in behavior and efficient search, and then exploited for real-time avatar control.

2. Real-time control of three-dimensional avatars is an important problem in computer games and virtual environments. Avatar animation and control is difficult, because a large repertoire of avatar behaviors must be made available, and the user must be able to select from this set of behaviors, possibly with a low-dimensional input device.

3. The motion is preprocessed to add variety and flexibility by creating connecting transitions where good matches in poses, velocities, and contact state of the character exist. The motion is then clustered into groups for efficient searching and for presentation in the interfaces. Three different interfaces are proposed to provide the user with intuitive control of the avatar's motion: choice, sketch, and performance.
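As an illustration of the matching step, here is a minimal sketch that lists candidate transition pairs. The `Frame` fields, the weighted Euclidean distance, `velocity_weight`, and `threshold` are assumptions made for this example; the summary does not state the paper's actual distance function.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Frame:
    pose: tuple      # joint angles (hypothetical layout)
    velocity: tuple  # joint velocities (hypothetical layout)
    contact: tuple   # e.g. (left_foot_down, right_foot_down)

def frame_distance(a, b, velocity_weight=0.5):
    """Assumed metric: pose distance plus weighted velocity distance."""
    return math.dist(a.pose, b.pose) + velocity_weight * math.dist(a.velocity, b.velocity)

def find_transitions(frames, threshold, velocity_weight=0.5):
    """Return index pairs (i, j) whose frames match well enough to connect."""
    transitions = []
    for i, a in enumerate(frames):
        for j, b in enumerate(frames):
            if i == j:
                continue  # a frame trivially matches itself
            if a.contact != b.contact:
                continue  # contact states must agree exactly
            if frame_distance(a, b, velocity_weight) <= threshold:
                transitions.append((i, j))
    return transitions
```

The double loop compares every pair of frames, which is the quadratic preprocessing cost mentioned in 6.2.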

4. A two-layer structure is proposed to represent the motion data. The higher layer is a statistical model that provides support for the user interfaces by clustering the data to capture similarities among character states. The lower layer is a Markov process that creates new motion sequences by selecting transitions between motion frames based on the high-level directions of the user. A unique aspect of this data structure is the link between these layers: trees of accessible clusters are stored in the lower layer on a frame-by-frame basis, providing a high-level and correct description of the set of behaviors achievable from each specific motion frame.

4.1 Lower Layer: Markov Process
Motion data is modeled as a first-order Markov process: the transition from one frame to the next depends only on the current frame. Blending is used to make the transitions smooth.
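A first-order Markov process over frames can be sketched as a row-stochastic transition matrix. The exponential weighting, the `sigma` parameter, and the linear `blend` are assumptions for illustration (real joint rotations would typically need rotation-aware interpolation), and the sketch assumes at least two frames.

```python
import math
import random

def transition_matrix(frames, sigma=1.0):
    """Row i holds the probabilities of playing each frame right after frame i."""
    n = len(frames)
    matrix = []
    for i in range(n):
        # Jumping from i to j is likely when frame i resembles frame j-1,
        # the recorded predecessor of j. Frame 0 has no predecessor.
        row = [0.0] + [
            math.exp(-math.dist(frames[i], frames[j - 1]) / sigma)
            for j in range(1, n)
        ]
        total = sum(row)
        matrix.append([w / total for w in row])
    return matrix

def next_frame(matrix, current, rng=random):
    """Sample the next frame; it depends only on the current frame."""
    return rng.choices(range(len(matrix)), weights=matrix[current], k=1)[0]

def blend(pose_a, pose_b, t):
    """Linear blend between two pose vectors, t in [0, 1]."""
    return tuple((1.0 - t) * a + t * b for a, b in zip(pose_a, pose_b))
```

In this sketch the recorded successor of a frame always receives the largest weight, so playback follows the original clip unless another frame is a close match.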

4.2 Higher Layer: Statistical Models
The higher layer is a generalization of the motion data that captures the distribution of frames and transitions through cluster analysis. Clusters are formed from the original motion data. These clusters capture similarities in motion frames, but they do not capture the connections between frames. To capture these connections, or the tree of choices available to the avatar at any given motion frame, a cluster tree at each motion frame is constructed. The entire higher layer is then called a cluster forest.
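The cluster tree at a frame can be sketched as a bounded traversal of the transition graph that records each cluster boundary crossed. The names `successors`, `cluster_of`, and `max_steps`, and the nested-dictionary representation, are assumptions for this example. The root of each tree is implicitly the frame's own cluster.

```python
def cluster_tree(frame, successors, cluster_of, max_steps):
    """Nested dict {cluster: subtree} of clusters reachable within max_steps."""
    tree = {}
    _grow(frame, max_steps, tree, successors, cluster_of)
    return tree

def _grow(frame, steps_left, node, successors, cluster_of):
    if steps_left == 0:
        return
    for nxt in successors.get(frame, ()):
        if cluster_of[nxt] == cluster_of[frame]:
            # Still inside the same cluster: keep walking, no new node.
            _grow(nxt, steps_left - 1, node, successors, cluster_of)
        else:
            # Crossed a cluster boundary: add (or reuse) a child node.
            child = node.setdefault(cluster_of[nxt], {})
            _grow(nxt, steps_left - 1, child, successors, cluster_of)

def cluster_forest(successors, cluster_of, max_steps):
    """One cluster tree per motion frame."""
    return {f: cluster_tree(f, successors, cluster_of, max_steps) for f in successors}
```

The step bound keeps the traversal finite even when the transition graph contains cycles. This plain recursion can revisit frames many times, so a practical version would need pruning or memoization.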

5. Avatar control interfaces
5.1 Choice-based interface: the user is continuously presented with a set of actions from which to choose.

5.2 Sketch-based interface: the user is allowed to draw paths on the screen using a mouse. The two-dimensional paths are projected onto the surfaces of environment objects to provide three-dimensional coordinates.
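The projection step can be illustrated with a ray-plane intersection. This sketch assumes that each mouse position has already been turned into a camera ray (an origin and a direction), that y is the up axis, and that the only surface is a flat ground plane; general environment objects would need a full ray-surface test. The function name and parameters are hypothetical.

```python
def project_to_ground(origin, direction, ground_height=0.0, eps=1e-9):
    """Intersect a camera ray with the plane y = ground_height.

    Returns the 3D hit point, or None if the ray misses the plane.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    if abs(dy) < eps:
        return None  # ray runs parallel to the ground
    t = (ground_height - oy) / dy
    if t < 0:
        return None  # intersection lies behind the camera
    return (ox + t * dx, ground_height, oz + t * dz)

def project_path(rays, ground_height=0.0):
    """Project a sketched path, given as (origin, direction) rays, dropping misses."""
    points = (project_to_ground(o, d, ground_height) for o, d in rays)
    return [p for p in points if p is not None]
```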

5.3 Vision-based interface: the user acts out the desired motion in front of a camera. Visual features are extracted from video and used to determine avatar motion.

6. Potential Improvements
6.1 Keeping database size manageable is a concern, and determining when a database has sufficient variety for a given space of motions is an open and interesting research problem.

6.2 A much larger database might require more aggressive pruning of transitions to prevent the computation time of the preprocessing step from scaling as the square of the number of frames.

6.3 Automatically picking the initial member of each cluster would be a desirable improvement.