Results

Many improvements were made to the Lazy Snapping implementation I started last semester. My contributions were: (1) the improvement of the K-means clustering by optimizing the current method, (2) improving the general performance time of lazy snapping by refining my graph cut code and completely re-writing my lazy snapping algorithm implementation to exploit certain assumptions about the nature of this particular image cutout problem. In a nutshell, I reduced the modularity of the individual algorithm steps so that everything is performed in a more "pipelined" fashion. The results of these improvements, and those made by Nick, produced an implementation of Lazy Snapping that performs at near-interactive speeds. Since Nick's work dealt primarily with Lazy Snapping, I will focus the rest of this section on my work with GrabCut. GrabCut Method (1) GrabCut first creates an initial trimap labeling (background, foreground, unknown) of the target image from a user-defined bounding box drawn around the desired object. Everything outside of this box is labeled as explicitly background and everything inside the box is labeled as unknown. From this, an initial segmentation is created; all unknown pixels are tentatively placed in the foreground class and all known background pixels are placed in the background class. (2) Separate Gaussian Mixture Models are created from the sampled color spaces for both the foreground and background classes. Each model contains 5 components (10 components total). This is done by using the CLUSTER algorithm and source code written by Charles Bouman of Purdue. For my tests, an externally compiled executable (GaussMixture.exe) was used along with text file I/O to generate each GMM. (3) Once the GMMs are created, a probability heuristic function is used to assign each pixel in the foreground class to the most likely Gaussian component in the foreground model, and each pixel in the background class is similarly assigned to the most likely Gaussian component in the background model. (4) New GMMs for both the foreground and background are then created based on these new labelings. The idea is to best separate the two models so that there is a definitive difference between them. (5) From the newly-calculated GMMs, a graph is built and graph cut is used to find a new classification for all pixels. The terminal weights of the graph are calculated based on the sum of the probabilities that a pixel belongs in either the foreground or background models. The neighboring edge weights are calculated in a way that supports smoothness and penalizes sharp edges (aka, sharp edges are preferable candidates for cuts). (6) Steps 3-5 are repeated until the cutout converges and no further changes are made in the pixel classifications. GrabCut Results I was unable to get GrabCut to yield the proper results on my set of test images. All parts of the system have been implemented, but despite my best efforts in the time I had I still could not get the iterated graph cut method to properly cut out my basic test case image (GCBox.png), nor any of my other test images (Box6.png, Gauss.png, Flower-small.png, Flower-small256.png). The best I get is an image cutout that is close or nearly-identical to the cutout of the initial bounding box that the user defines. I have created two result set images to illustrate how GrabCut performs and offer some analysis about the results. Below are the explanations for each file. 
GrabCut Results

I was unable to get GrabCut to yield the proper results on my set of test images. All parts of the system have been implemented, but despite my best efforts in the time I had, I still could not get the iterated graph cut method to properly cut out my basic test case image (GCBox.png), nor any of my other test images (Box6.png, Gauss.png, Flower-small.png, Flower-small256.png). The best I get is an image cutout that is close or nearly identical to the cutout of the initial bounding box that the user defines. I have created two result set images to illustrate how GrabCut performs and to offer some analysis of the results. Below are the explanations for each file.

Explanation of Mixture Window

The Gaussian Mixture window plots the means of the 5 mixture components calculated from the sampled image. The yellow dots indicate background components and the green dots indicate foreground components. The x-axis represents the red color channel and the y-axis represents the green color channel. The blue channel would be represented by the z-axis but is flattened here for easier viewing. Effectively, this is a 2D projection of the full 3D mixture.

GrabCut Test Cases

(1) "Result-GCBox.png"

GrabCut starts with the original image in the window labeled 'GrabCut.' In this case, the image is a green rectangle drawn inside a purple rectangle against a black background. The bounding box is drawn around the green rectangle to indicate that this is the object we wish to cut out. The 'Results' window displays the final image cutout and the 'Gaussian Mixture' window displays a 2D representation of the final 3D Gaussian mixture components. At each iterative step, these components are recalculated to fit the pixel sets assigned to them by the highest-probability heuristic function. A graph cut is then performed, the pixels are re-labeled and re-assigned to new components as necessary, and the process repeats until the cutout converges (no pixels are re-assigned). In this case, the cutout converges after only one iteration.

A numerical study of the weights being passed to the graph cut indicates that there may be an issue with how a neighboring edge weight is calculated when the color exists in both a foreground and a background mixture component. The Gaussian mixture plot suggests that one of the background components has the greatest probability of containing the color black, and I have verified that the terminal edge weights assigned to the interior black regions are consistent with this analysis. However, the edge penalty assigned between the neighboring pixels on either side of the dividing line is still great enough to cause the interior black pixels to remain labeled as part of the foreground. Consequently, the iterated algorithm determines that no pixel labels were changed and concludes that the cutout has converged. My intuition, based on how the edge weights are calculated, is that there is a problem with how the algorithm handles edges that are perfectly smooth across label boundaries. Because the weights preserve smoothness, the chances of a cut happening along the initial boundary are very small. This is problematic because, in the first graph cut, the only neighboring edges that have weights above zero are those on the dividing line of the initial bounding box. The neighboring edges therefore act as a barrier against any further segmentation, and the classification converges prematurely. Predictably, setting all edge penalties to zero would eliminate this particular barrier, though at the cost of discarding the smoothness term entirely.
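To make the barrier effect concrete, below is a small sketch of the pairwise smoothness penalty as defined in equation (5) of the original GrabCut paper (gamma = 50 is the value the paper suggests; sqrColorDiff stands for ||z_m - z_n||^2 and is assumed to be computed elsewhere). Note how a perfectly smooth boundary drives the penalty to its maximum.

    // Sketch of the pairwise smoothness penalty from equation (5) of
    // the GrabCut paper: V = gamma * exp(-beta * ||z_m - z_n||^2),
    // charged only when the two neighbors receive different labels.
    // gamma = 50 is the value suggested in the paper; sqrColorDiff is
    // assumed to be ||z_m - z_n||^2 for the two neighboring pixels.
    #include <cmath>

    double smoothnessPenalty(double gamma, double beta, double sqrColorDiff)
    {
        return gamma * std::exp(-beta * sqrColorDiff);
    }

    // Worked case from Result-GCBox.png: across a perfectly smooth
    // boundary z_m == z_n, so sqrColorDiff == 0 and the penalty is
    // gamma * exp(0) == gamma -- the largest value the term can take.
    // Every edge straddling the initial bounding box in a flat color
    // region therefore carries the maximum cut cost, which is exactly
    // the "barrier" behavior described above.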
(2) "Result-Flower.png"

In the second test image, I perform a GrabCut on the flower. The initial bounding box is drawn around the magenta flower, and the background is made up of the green foliage behind it. Once again, the results of the cutout are displayed in the 'Results' window and the Gaussian models for the final calculated components are displayed in the 'Gaussian Mixture' window. As you can see, GrabCut fails to properly cut out the flower from the green foliage. In this case, however, we do see minor exclusions along the border of the initial foreground region. Indeed, when GrabCut was performed on this image, the algorithm took multiple cycles to converge. To explain why, I look at the clues gathered from the previous test; namely, at what is happening around the border of the initial cutout. Closer inspection of the output image reveals that the places where the initial foreground pixels were correctly reclassified are those where there is less than perfect smoothness over the dividing line. Moreover, the places where the pixel classifications begin to converge are exactly the spots where perfect smoothness occurs between a background-classified pixel and its foreground-classified neighbors. This leads me to conclude that the edge penalties on the neighboring pixel edges that straddle the line between foreground and background pixels are too strong and end up skewing the graph cut. If this is indeed the case, it would explain why my results fail to turn out correctly.

Problem Evaluation

If my conclusions about the edge weights are correct, which I believe they are based on the empirical results and my understanding of the algorithm, then these edge weights must be better tuned to handle the perfectly smooth boundary case. However, tuning these weights myself is difficult because neither of the two reference papers I've been using explains how to calculate the expectation term over two sample pixels, by which I mean the inner term in equation (5) of the original GrabCut paper. For this term, I calculate the intensity difference of two neighboring pixels and plug that value in to calculate the final beta term. It is very possible that my choice of this term is the sole cause of my inaccurate results, but neither of the papers gives any clue to the correctness of this choice. For debugging purposes, I tried several different approaches to calculating this "expectation term," but none of them proved any more successful.
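For reference, here is one common reading of that expectation term, in which <.> is taken as the average squared color difference over all pairs of neighboring pixels in the image, giving beta = 1 / (2 * average). I want to stress that this is an interpretation rather than something either reference paper spells out; the Pixel type and the 4-connected neighborhood below are simplifications of my own.

    // One common reading of the expectation term in equation (5):
    // take <.> as the average of ||z_m - z_n||^2 over every pair of
    // neighboring pixels, so beta = 1 / (2 * average). This is an
    // interpretation, not something either reference paper spells out.
    #include <vector>
    #include <cstddef>

    struct Pixel { double r, g, b; };

    double sqrDiff(const Pixel& a, const Pixel& b)
    {
        double dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
        return dr * dr + dg * dg + db * db;
    }

    // width*height image stored row-major; 4-connected neighbors only.
    double computeBeta(const std::vector<Pixel>& img, int width, int height)
    {
        double sum = 0.0;
        std::size_t pairs = 0;
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x)
            {
                const Pixel& p = img[y * width + x];
                if (x + 1 < width)  { sum += sqrDiff(p, img[y * width + x + 1]); ++pairs; }
                if (y + 1 < height) { sum += sqrDiff(p, img[(y + 1) * width + x]); ++pairs; }
            }
        double avg = (pairs > 0) ? sum / pairs : 0.0;
        return (avg > 0.0) ? 1.0 / (2.0 * avg) : 0.0;   // guard against a flat image
    }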
Limitations

The most obvious limitation of this system is color depth. The algorithm assumes there are at least 5 distinct colors in both the foreground and background mixture models; with fewer than that, the algorithm fails to initialize properly. It is also possible that this problem is unique to my implementation, because I used a third-party implementation for calculating the GMMs, which is quite different from the method used in both of the reference papers.

Another limitation regarding the color space is that of color variety. The time it takes to calculate an accurate Gaussian Mixture Model grows with the number of color samples that need to be modeled, and modeling duplicate colors proved especially computationally expensive. To alleviate this, I tracked every color added to the model and removed all duplicates, which improved performance dramatically. I also found that reducing the color depth of the image from the full 256^3 colors down to 256 guaranteed that my Gaussian model always dealt with a very manageable color space. In hindsight, however, I am sure that forcing this reduced model also skewed the weight parameter of each component. For example, in my GCBox.png test, the large band is a solid black strip of one color; because of my method of truncating the input data for creating the model components, the single color black would be grossly misrepresented. Preserving the full color space may be important to how the terminal weights are calculated, but doing so is computationally infeasible. Once again, both of the reference papers are unclear about what to do in this case.

Hand-In

My source code is located in the ZIP files: "GrabCut.zip", "GraphCut.zip"
My test images are: "Box6.png", "Flower-small256.png", "Flower-small.png", "GCBox.png"
My result images are: "Result-Flower.png", "Result-GCBox.png"
My proposal files are: "Proposal-Original.txt", "Proposal-Revised.txt"
My bibliography file is: "Bibliography.txt"
My final document files are: "results.txt", "work.txt", "learned.txt", "self-evaluation.txt"

Instructions for Compiling GrabCut

My code uses FLTK 1.1.6, the GLEW OpenGL extension library, and my own GraphCut library, the source code for which is included in a separate zip file ("GraphCut.zip"). All dependencies and includes in the projects are relative, based on environment variables. The working directory of all projects is set to a "..\DLLs" folder located in the parent folder of the source code. It is here that GrabCut looks for the compiled GraphCut DLL.