1. Title and Due Date

GrabCut, GMMs, and Augmenting Image Cutout with Image Saliency
Due: March 10, 2006

2. Problem Statement

Lazy Snapping successfully cuts out a region of an image when provided with good user-defined label seeds, a sufficiently well-segmented image, and accurate edge weights. The method relies heavily on the quality of this user interaction. A desirable alternative would be an image cutout system that minimizes the level of user interaction needed to complete a segmentation. My goal is to implement a system that minimizes user interaction and, ideally, eliminates the need for it altogether.

3. Methods

One system that already addresses image segmentation while reducing the amount of user interaction is GrabCut. Instead of requiring explicit foreground and background seeds, GrabCut asks the user only to draw a rectangular bounding box around the object: all pixels outside the box are marked as explicit background, and all pixels inside it are left uncertain. Implementing a "hard segmentation" version of GrabCut is the first step toward the final goal. (The emphasis on "hard segmentation" indicates that GrabCut's border matting stage will not be part of my implementation.) Once GrabCut is working, eliminating user interaction from image cutout can be characterized as finding the best bounding box around the "object of interest" for GrabCut to use as its input. One suggested method for finding this box is to evaluate the saliency of the target image and draw a box that encapsulates its most salient areas. Since this is a relatively untested approach, it will also be necessary to find a way of evaluating it as a solution for unsupervised image cutout.

4. Plan

Milestone 1: Implement the basic GrabCut image cutout system.

Milestone 2: GrabCut models the image's color data with Gaussian mixtures. Code that builds a Gaussian mixture model already exists, but its current form is not suitable for use as a modular tool. Turning this code into a standalone GMM tool will be the next goal after GrabCut.

Milestone 3: Tune GrabCut's performance.

Milestone 4: With Feng's help, evaluate the use of saliency to compute an input bounding box for GrabCut.

Milestone 5: Replace K-means with GMMs in Lazy Snapping to evaluate how the results are affected by a more accurate color model.

5. Evaluation Criteria

GrabCut should be able to perform a hard segmentation of an object as outlined in the original paper. This includes implementing user editing to tune the cutout after the initial segmentation is computed. (This should be relatively easy, since GrabCut's user interaction at this stage is almost identical to the interaction we use in Lazy Snapping.) The final result should be a well-performing image cutout system. We should also be able to tell whether image saliency is a viable replacement for interactive user input; this will be judged by the results achieved when saliency alone defines the seed regions.

6. Deliverables

a) An implementation of GrabCut that outputs a hard-segmentation cutout and supports post-processing user editing.

b) A version of the GMM toolkit I am currently using in GrabCut that does not rely on file I/O but can instead be used as a C++ library (a rough sketch of the intended interface follows this list).

c) An extension of GrabCut that generates a bounding box by analyzing the saliency of the image. This will be an optional replacement for the initial user interaction; all other aspects of the system will remain unchanged.

d) An extension of Lazy Snapping that allows the user to pick between K-means and GMMs for modeling the color data.
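To make deliverable b) more concrete, the sketch below shows the kind of in-memory interface the standalone GMM tool could expose. Every name and signature here is hypothetical; it only illustrates the intended "no file I/O, usable as a C++ library" shape, and the actual toolkit may end up looking quite different.

    // Hypothetical interface for the standalone GMM tool (deliverable b).
    // All names and signatures are illustrative only; the point is that the
    // model is built and queried entirely in memory, with no file I/O.
    #include <array>
    #include <cstddef>
    #include <vector>

    using Color = std::array<double, 3>;   // one RGB sample

    class GaussianMixtureModel {
    public:
        explicit GaussianMixtureModel(std::size_t numComponents);

        // Estimate component weights, means, and covariances from raw samples
        // (e.g. the pixels currently labeled foreground or background),
        // typically with EM initialized by K-means.
        void fit(const std::vector<Color>& samples, int iterations = 10);

        // Log-likelihood of a single color under the full mixture; a data
        // term can be built from the negative of this value.
        double logLikelihood(const Color& c) const;

        // Index of the component with the highest posterior for this color,
        // for use when each pixel is assigned a single GMM component.
        std::size_t mostLikelyComponent(const Color& c) const;

        std::size_t numComponents() const;
    };

Keeping the interface this small would also make it straightforward to drop the same library into Lazy Snapping for Milestone 5 and deliverable d).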
7. Readings

Old readings:
a) Lazy Snapping
b) Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects
c) GrabCut
d) Fast Approximate Energy Minimization via Graph Cuts
e) Book sections on the topic of watershed segmentation
f) Book sections on the topic of color clustering techniques

New readings:
g) Interactive Image Segmentation Using an Adaptive GMMRF Model
h) Maximum Likelihood from Incomplete Data via the EM Algorithm
i) Papers on image saliency

8. Risks

My immediate problem, and currently my biggest stumbling block, is how to define the region edge weights for the graph cut. The paper does not clearly specify the terminal edge weights, either for explicitly marked regions or for the edge connecting a region to the terminal of the complementary model, i.e. the GMM other than the one from which the region's component was assigned. (For example: pixel p is assigned component Kf from the foreground GMM, and the weight of the edge between p and the foreground terminal Tf is defined as a function of Kf, but no weight is given for the edge between p and the background terminal Tb.) My intention is to contact one of the authors for clarification; I have already asked Feng, but he is unable to help at this time. One risk is not reaching anyone who can give me a definitive answer. A sketch of the analogous terminal-weight convention from the Boykov & Jolly paper (reading b), which may or may not be what GrabCut intends, appears at the end of this proposal.

The other algorithm-specific stumbling block concerns re-learning the GMM parameters once each pixel in the uncertain region has been assigned a component from either the foreground or the background model. The paper's definition of this step is slightly ambiguous, and I have not yet found a way to test my interpretation of it for correctness. It is also possible that some of my implementation-specific issues come from the GMM tools I am using; the authors may have used a GMM implementation better suited to image cutout.

9. Motivation

A fast and accurate image cutout tool would be very helpful for other image-related projects, such as creating dynamic photo collages, especially if the cutout process can be automated to the point that it requires little or no user interaction.
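As mentioned in the Risks section, the sketch below writes out the terminal (t-link) weight convention as I read it from the Boykov & Jolly paper (reading b). This is my interpretation of that construction, not a confirmed answer to the GrabCut ambiguity above; the function name, the lambda parameter, and the per-pixel negative log-likelihood inputs (which would come from the foreground and background GMMs, e.g. via the toolkit sketched after section 6) are all placeholders.

    // Sketch of the Boykov & Jolly t-link convention (reading b), offered as a
    // best guess at what GrabCut intends. Names and parameters are placeholders.
    #include <cstddef>
    #include <utility>
    #include <vector>

    enum class Trimap { Background, Foreground, Unknown };

    // For each pixel (or region) return the two terminal edge capacities:
    //   .first  = capacity of the edge to the foreground terminal Tf
    //   .second = capacity of the edge to the background terminal Tb
    // nllFg/nllBg are the negative log-likelihoods of the pixel's color under
    // the foreground and background GMMs; lambda weights the data term against
    // the smoothness term; K must exceed any possible sum of neighborhood edge
    // weights at one pixel, so hard constraints can never be cut.
    std::vector<std::pair<double, double>> terminalWeights(
        const std::vector<Trimap>& trimap,
        const std::vector<double>& nllFg,
        const std::vector<double>& nllBg,
        double lambda, double K)
    {
        std::vector<std::pair<double, double>> w(trimap.size());
        for (std::size_t p = 0; p < trimap.size(); ++p) {
            switch (trimap[p]) {
            case Trimap::Foreground:           // explicitly marked foreground
                w[p] = { K, 0.0 };
                break;
            case Trimap::Background:           // outside the bounding box
                w[p] = { 0.0, K };
                break;
            case Trimap::Unknown:
                // Cutting the edge to Tf labels the pixel as background, so
                // that edge carries the background penalty, and the edge to Tb
                // carries the foreground penalty.
                w[p] = { lambda * nllBg[p], lambda * nllFg[p] };
                break;
            }
        }
        return w;
    }

Under this convention, the edge between p and Tb carries p's foreground data term and the edge to Tf carries the background term. Whether GrabCut uses the same assignment, and whether it evaluates each term with the assigned component (e.g. Kf) or the full mixture, is exactly what I hope to confirm with the authors.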