Automated Cinematography

What is automated cinematography?

Cinematography is what film directors do. Automated cinematography, then, is the computer scientist/animator playing Steven Spielberg. Loosely defined, automated cinematography is any technique that produces meaningful visual sequences of a dynamic scene.

Why is automated cinematography hard? Given a mere iota of creative sense, playing director is easy. (Who here isn't a movie critic?) Somewhat more difficult is translating one's sense of film direction into a set of rules that can be applied to the scene (i.e., the data set). But the core problem is even more challenging: how can we develop a high-level interpretation of the scene from the data and a general set of assumptions? To make it tractable, the problem must be attacked as a set of smaller subproblems.

The problem and my approach to it

The problem I attempted to solve was that of "naturally" tracking a single figure engaged in an arbitrary motion. Of course, "naturally" is a subjective measure. My criteria were:

The entire figure should always be visible in the frame
The camera should initially target the front side of the figure
The camera should move smoothly, without jerks or sudden accelerations
The camera should anticipate the figure's movement, not blindly follow every action

The first two criteria were relatively easy to satisfy. Fitting the figure in the frame was a matter of setting the field of view (given a fixed camera distance) so that the figure's volume fit in the frame. Locating the front of the figure was a matter of determining the orientation of the hip node. As it turns out, looking at the front of the figure does not produce the most interesting viewpoint. Perhaps a better technique would have been to choose the viewpoint that shows the most horizontal motion.
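As a rough illustration, these two steps might be sketched as follows. This is not the project's code. It assumes the figure's volume is approximated by a bounding sphere around its joint positions, that the hip node supplies a forward direction vector, and that the function names and the margin parameter are hypothetical.

```python
import math

def bounding_sphere(joints):
    """Return (center, radius) of a sphere enclosing all joint positions.

    The center is the midpoint of the axis-aligned bounding box, which is
    simple to compute but not the minimal enclosing sphere.
    """
    mins = [min(j[i] for j in joints) for i in range(3)]
    maxs = [max(j[i] for j in joints) for i in range(3)]
    center = tuple((lo + hi) / 2.0 for lo, hi in zip(mins, maxs))
    radius = max(math.dist(center, j) for j in joints)
    return center, radius

def fit_field_of_view(radius, camera_distance, margin=1.1):
    """Full field of view in degrees that contains a sphere of the given
    radius whose center is camera_distance away (margin is an assumed pad)."""
    # Clamp so asin stays defined if the camera is very close to the figure.
    ratio = min(radius * margin / camera_distance, 0.999)
    return math.degrees(2.0 * math.asin(ratio))

def front_camera_position(hip_position, hip_forward, camera_distance):
    """Place the camera in front of the figure, along the hip's forward axis."""
    length = math.hypot(*hip_forward)
    return tuple(h + camera_distance * f / length
                 for h, f in zip(hip_position, hip_forward))
```

With a fixed camera distance, only the sphere radius changes from motion to motion, so the field of view can be computed once from the largest radius seen over the whole sequence.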

Collectively satisfying the other two criteria was more difficult. The truly naive approach is to blindly focus the camera on the figure's hips. This leads to unnecessary camera motion, especially during highly dynamic motions (see the Dancer example). Simply smoothing this path of camera targets does not fix the problem. Instead I used a combination of the figure's hips and the center of the figure's volume as the camera's target. When the figure is standing upright, focusing on its center of volume produces the ideal view. When the figure is contorted, however, focusing on its hips tends to keep the camera still, in anticipation that the figure will regain its normal (extended) configuration. Even so, this did not solve the problem: the camera still moved too much.
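The combined target can be pictured as a weighted average of the two points. The sketch below is only an assumption about how such a combination might look. The text does not say how the two points were weighted, so the fixed hip_weight parameter and the function name are hypothetical.

```python
def blend_target(hips, volume_center, hip_weight=0.5):
    """Camera target as a linear blend of the hip position and the center
    of the figure's volume. hip_weight=1.0 tracks the hips only and
    hip_weight=0.0 tracks the center of volume only."""
    return tuple(hip_weight * h + (1.0 - hip_weight) * c
                 for h, c in zip(hips, volume_center))
```

When the figure is upright the two points nearly coincide, so the weight matters little. It matters most in contorted poses, where the center of volume drifts away from the hips.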

It was necessary to limit the camera's movement by doing what a real camera operator would do: anticipate the figure's next move. I did this by looking three frames ahead in the motion sequence. If the figure's net (average) movement over those frames exceeded the variance of the movement, then the camera began to follow the figure at the current frame. The justification was that the net movement should be followed only if it is significant; that is, in excess of some threshold. The threshold used was the statistical variance of the motion over those same three frames. If the variance was low, the motion was directed (e.g., climbing a ladder). If the variance was high, the movement was likely sporadic and not maintainable (e.g., the dancer's swinging hips); the figure was likely to return to where it started and therefore should not be followed. This worked well in practice across a large variety of motions. On its own, however, the technique tended to produce very choppy camera movements, as the camera would often "snap" to a target when the figure made a significant movement. Applying a smoothing filter resolved this problem.
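A minimal sketch of this look-ahead test and the smoothing pass is given below. It is an interpretation, not the original implementation. It assumes the per-frame targets are 3D points, that "net movement" is the magnitude of the mean per-frame displacement, that "variance" is the mean squared deviation of the displacements from that mean, and that the smoothing filter is a simple exponential filter. All function names are hypothetical.

```python
import math

LOOKAHEAD = 3  # frames to look ahead, as described in the text

def should_follow(targets, i, lookahead=LOOKAHEAD):
    """True if the mean movement over the next `lookahead` frames exceeds
    the variance of that movement. Note this compares a distance with a
    squared distance, following the description literally."""
    window = targets[i:i + lookahead + 1]
    if len(window) < lookahead + 1:
        return False  # not enough frames left to judge; hold the camera
    steps = [tuple(b[k] - a[k] for k in range(3))
             for a, b in zip(window, window[1:])]
    n = len(steps)
    mean = tuple(sum(s[k] for s in steps) / n for k in range(3))
    variance = sum(math.dist(s, mean) ** 2 for s in steps) / n
    return math.hypot(*mean) > variance

def camera_targets(targets):
    """Follow the figure only on frames where the look-ahead test passes;
    otherwise keep the previous camera target."""
    cam = [targets[0]]
    for i in range(1, len(targets)):
        cam.append(targets[i] if should_follow(targets, i) else cam[-1])
    return cam

def smooth(path, factor=0.5):
    """Exponential smoothing; a larger factor gives a slower, smoother camera."""
    out = [path[0]]
    for p in path[1:]:
        out.append(tuple(factor * a + (1.0 - factor) * b
                         for a, b in zip(out[-1], p)))
    return out
```

A steady walk in one direction gives identical displacements, so the variance is zero and the camera follows. A figure that swings back and forth gives a small mean and a large variance, so the camera holds still.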

Results

The technique described above was a successful first shot at the problem. Download the executable (RunMe) and check out the different motions (in the /data subdirectory). The program only runs on a Windows platform, preferably with a graphics accelerator that supports OpenGL. (Note that there are several options on the toolbar not related to this project. For a description, go here.) The only (optional) user input related to cinematography is a smoothing factor. See the Cinematography menu and experiment.

Notice that the technique works especially well on the Cartwheeling Dancer and Ladder examples. Other examples reveal the technique's weaknesses. In several motions, the camera shifts slightly to follow the figure even though the figure never leaves the frame through the entire motion.

Source code

If you are interested, you can look at the source code. The motionAnalysis class has a simple interface that you can plug into your favorite motion capture viewer. Given motion capture data (in .bvh format), it plots the camera path and provides the field of view.

Discussion

One possible way to correct the problem described above, in which the camera shifts even though the figure never leaves the frame, is to consider the figure's volume in the frame. The camera should not move as long as the volume is completely contained inside the frame. I would also like to devise a system for automatically choosing cuts between multiple cameras. I attempted this in the implementation but it worked poorly.
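One hypothetical form of such a containment test is sketched below. It was not part of the implementation. It assumes the figure's volume is approximated by a bounding sphere and the frame by a circular view cone, ignoring aspect ratio, and the function name is made up for illustration.

```python
import math

def sphere_in_view(cam_pos, cam_target, fov_degrees, center, radius):
    """True if a sphere lies entirely inside the camera's view cone."""
    view = [t - p for p, t in zip(cam_pos, cam_target)]
    to_center = [c - p for p, c in zip(cam_pos, center)]
    view_len = math.hypot(*view)
    dist = math.hypot(*to_center)
    if view_len == 0.0 or dist <= radius:
        return False  # degenerate view direction, or camera inside the sphere
    cos_angle = sum(v * c for v, c in zip(view, to_center)) / (view_len * dist)
    # Angle between the view direction and the direction to the sphere center.
    off_axis = math.acos(max(-1.0, min(1.0, cos_angle)))
    # Angular radius of the sphere as seen from the camera.
    angular_radius = math.asin(radius / dist)
    return off_axis + angular_radius <= math.radians(fov_degrees) / 2.0
```

Under this test, the camera target would be updated only on frames where the function returns False for the current camera.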

Automated cinematography does not appear to be a well-studied problem. This is probably due in part to the fact that cinematography itself is more an art than a science. It is also a challenging problem and therefore an interesting one. The best part is being able to see the results and compare them to our intuition. It would be pleasing to ultimately develop a system that animators find useful.