1. Title and Due Date

GrabCut, GMMs, and Augmenting Image Cutout with Image Saliency
Due: March 10, 2006

2. Problem Statement

Lazy Snapping successfully cuts out a region of an image when provided with good user-defined label seeds, a sufficiently well-segmented image, and accurate edge weights. The method relies heavily on the quality of this user interaction. A desirable alternative would be an image cutout system that minimizes the level of user interaction needed to complete a segmentation. My goal is to implement a system that minimizes user interaction and, ideally, eliminates the need for it altogether.

3. Methods

One system that already addresses image segmentation while reducing the amount of user interaction is GrabCut. Instead of requiring explicit foreground and background seeds, GrabCut asks the user only to draw a rectangular bounding box around the object: all pixels outside the box are marked as explicit background, and all pixels inside it are left uncertain. Implementing a "hard segmentation" version of GrabCut is the first step toward the final goal. (The emphasis on "hard segmentation" indicates that GrabCut's border matting stage will not be part of my implementation.) Once GrabCut is working, eliminating user interaction from image cutout can be characterized as finding the best bounding box around the "object of interest" for GrabCut to use as its input. One suggested method for finding this box is to evaluate the saliency of the target image and draw a box that encapsulates its most salient areas. Since this is a relatively untested approach, it will also be necessary to find a way of evaluating it as a solution for unsupervised image cutout.

4. Plan

Milestone 1: Implement the basic GrabCut image cutout system.

Milestone 2: GrabCut models the image's color data with Gaussian mixtures. Code that builds a Gaussian mixture model already exists, but its current form is not suitable for use as a modular tool. Turning this code into a standalone GMM tool will be the next goal after GrabCut.

Milestone 3: Tune GrabCut's performance.

Milestone 4: With Feng's help, evaluate the use of saliency to compute an input bounding box for GrabCut.

Milestone 5: Replace K-means with GMMs in Lazy Snapping to evaluate how the results are affected by a more accurate color model.

5. Evaluation Criteria

GrabCut should be able to perform a hard segmentation of an object as outlined in the original paper. This includes implementing user editing to tune the cutout after the initial segmentation is computed. (This should be relatively easy, since GrabCut's user interaction at this stage is almost identical to the interaction we use in Lazy Snapping.) The final result should be a well-performing image cutout system. We should also be able to tell whether image saliency is a viable replacement for interactive user input; this will be judged by the results achieved when saliency alone defines the seed regions.

6. Deliverables

a) An implementation of GrabCut that outputs a hard-segmentation cutout and supports post-processing user editing.

b) A version of the GMM toolkit I am currently using in GrabCut that does not rely on file I/O but can instead be used as a C++ library (a rough sketch of the intended interface follows this list).

c) An extension of GrabCut that generates a bounding box by analyzing the saliency of the image. This will be an optional replacement for the initial user interaction; all other aspects of the system will remain unchanged.

d) An extension of Lazy Snapping that allows the user to pick between K-means and GMMs for modeling the color data.
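To make deliverable b) more concrete, the sketch below shows the kind of in-memory interface the standalone GMM tool could expose. Every name and signature here is hypothetical; it only illustrates the intended "no file I/O, usable as a C++ library" shape, and the actual toolkit may end up looking quite different.

    // Hypothetical interface for the standalone GMM tool (deliverable b).
    // All names and signatures are illustrative only; the point is that the
    // model is built and queried entirely in memory, with no file I/O.
    #include <array>
    #include <cstddef>
    #include <vector>

    using Color = std::array<double, 3>;   // one RGB sample

    class GaussianMixtureModel {
    public:
        explicit GaussianMixtureModel(std::size_t numComponents);

        // Estimate component weights, means, and covariances from raw samples
        // (e.g. the pixels currently labeled foreground or background),
        // typically with EM initialized by K-means.
        void fit(const std::vector<Color>& samples, int iterations = 10);

        // Log-likelihood of a single color under the full mixture; a data
        // term can be built from the negative of this value.
        double logLikelihood(const Color& c) const;

        // Index of the component with the highest posterior for this color,
        // for use when each pixel is assigned a single GMM component.
        std::size_t mostLikelyComponent(const Color& c) const;

        std::size_t numComponents() const;
    };

Keeping the interface this small would also make it straightforward to drop the same library into Lazy Snapping for Milestone 5 and deliverable d).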
7. Readings

Old readings:
a) Lazy Snapping
b) Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects
c) GrabCut
d) Fast Approximate Energy Minimization via Graph Cuts
e) Book sections on the topic of watershed segmentation
f) Book sections on the topic of color clustering techniques

New readings:
g) Interactive Image Segmentation Using an Adaptive GMMRF Model
h) Maximum Likelihood from Incomplete Data via the EM Algorithm
i) Papers on image saliency

8. Risks

My immediate problem, and currently my biggest stumbling block, is how to define the region edge weights for the graph cut. The paper does not clearly specify the terminal edge weights, either for explicitly marked regions or for the edge connecting a region to the terminal of the complementary model, i.e. the GMM other than the one from which the region's component was assigned. (For example: pixel p is assigned component Kf from the foreground GMM, and the weight of the edge between p and the foreground terminal Tf is defined as a function of Kf, but no weight is given for the edge between p and the background terminal Tb.) My intention is to contact one of the authors for clarification; I have already asked Feng, but he is unable to help at this time. One risk is not reaching anyone who can give me a definitive answer. A sketch of the analogous terminal-weight convention from the Boykov & Jolly paper (reading b), which may or may not be what GrabCut intends, appears at the end of this proposal.

The other algorithm-specific stumbling block concerns re-learning the GMM parameters once each pixel in the uncertain region has been assigned a component from either the foreground or the background model. The paper's definition of this step is slightly ambiguous, and I have not yet found a way to test my interpretation of it for correctness. It is also possible that some of my implementation-specific issues come from the GMM tools I am using; the authors may have used a GMM implementation better suited to image cutout.

9. Motivation

A fast and accurate image cutout tool would be very helpful for other image-related projects, such as creating dynamic photo collages, especially if the cutout process can be automated to the point that it requires little or no user interaction.
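As mentioned in the Risks section, the sketch below writes out the terminal (t-link) weight convention as I read it from the Boykov & Jolly paper (reading b). This is my interpretation of that construction, not a confirmed answer to the GrabCut ambiguity above; the function name, the lambda parameter, and the per-pixel negative log-likelihood inputs (which would come from the foreground and background GMMs, e.g. via the toolkit sketched after section 6) are all placeholders.

    // Sketch of the Boykov & Jolly t-link convention (reading b), offered as a
    // best guess at what GrabCut intends. Names and parameters are placeholders.
    #include <cstddef>
    #include <utility>
    #include <vector>

    enum class Trimap { Background, Foreground, Unknown };

    // For each pixel (or region) return the two terminal edge capacities:
    //   .first  = capacity of the edge to the foreground terminal Tf
    //   .second = capacity of the edge to the background terminal Tb
    // nllFg/nllBg are the negative log-likelihoods of the pixel's color under
    // the foreground and background GMMs; lambda weights the data term against
    // the smoothness term; K must exceed any possible sum of neighborhood edge
    // weights at one pixel, so hard constraints can never be cut.
    std::vector<std::pair<double, double>> terminalWeights(
        const std::vector<Trimap>& trimap,
        const std::vector<double>& nllFg,
        const std::vector<double>& nllBg,
        double lambda, double K)
    {
        std::vector<std::pair<double, double>> w(trimap.size());
        for (std::size_t p = 0; p < trimap.size(); ++p) {
            switch (trimap[p]) {
            case Trimap::Foreground:           // explicitly marked foreground
                w[p] = { K, 0.0 };
                break;
            case Trimap::Background:           // outside the bounding box
                w[p] = { 0.0, K };
                break;
            case Trimap::Unknown:
                // Cutting the edge to Tf labels the pixel as background, so
                // that edge carries the background penalty, and the edge to Tb
                // carries the foreground penalty.
                w[p] = { lambda * nllBg[p], lambda * nllFg[p] };
                break;
            }
        }
        return w;
    }

Under this convention, the edge between p and Tb carries p's foreground data term and the edge to Tf carries the background term. Whether GrabCut uses the same assignment, and whether it evaluates each term with the assigned component (e.g. Kf) or the full mixture, is exactly what I hope to confirm with the authors.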