Image Parsing: Segmentation + Grouping + Recognition

Natural images consist of an overwhelming number of visual patterns generated by very diverse stochastic processes in nature. The objective of image understanding is to parse an input image into its constituent patterns. The figure below shows an example of hierarchically parsing a stadium scene: humans (faces and clothes), the sports field (a point process, a curve process, homogeneous color regions, text), and spectators (textures, persons).

Depending on the type of patterns that a task is interested in, the parsing problem is called, respectively: 1) image segmentation, for homogeneous grey/color/texture region processes; 2) perceptual grouping, for point, curve, and general graph processes; 3) object recognition, for text and objects. These three traditional vision problems therefore ought to be solved in a unified way, and this can be achieved in a Bayesian framework. Our goal is to develop effective Markov chain Monte Carlo algorithms to search for globally optimal solutions in complex solution spaces. The computation involves both stochastic diffusions by partial differential equations and reversible jumps with the Metropolis-Hastings method.
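To fix ideas, the jump part of such a sampler rests on the Metropolis-Hastings acceptance rule. The sketch below is a minimal random-walk Metropolis-Hastings sampler on a toy one-dimensional target, not the reversible-jump machinery used for segmentation; the function names are ours.

```python
import math
import random

def metropolis_hastings(log_post, init, n_steps, step=0.5, rng=random):
    """Random-walk Metropolis-Hastings sampler for a 1D target.

    log_post: unnormalized log posterior density
    init:     starting state
    Returns the list of visited states (rejections repeat the old state).
    """
    x = init
    lp = log_post(x)
    samples = []
    for _ in range(n_steps):
        # Symmetric Gaussian proposal, so the proposal ratio cancels.
        x_new = x + rng.gauss(0.0, step)
        lp_new = log_post(x_new)
        # Accept with probability min(1, p(x_new)/p(x)).
        if math.log(rng.random() + 1e-300) < lp_new - lp:
            x, lp = x_new, lp_new
        samples.append(x)
    return samples

# Example: sample a standard normal posterior, p(x) ∝ exp(-x²/2).
random.seed(0)
chain = metropolis_hastings(lambda x: -0.5 * x * x, init=0.0, n_steps=5000)
mean = sum(chain) / len(chain)
```

In DDMCMC the proposals are not blind random walks but are informed by bottom-up, data-driven cues, which is what makes the chain mix fast in the much larger segmentation space.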

We have nine projects in image segmentation/parsing with grey level, color, texture, 3D range, motion, and curve processes.

1. Region Competition [1995-1996]

2. Image Segmentation by Data Driven Markov Chain Monte Carlo [1999-2001]

3. 3D Range Segmentation by DDMCMC [2000-2001]

4. Parsing Image Into Regions, Curves and Curve Groups [2001-2002]

5. Motion Segmentation (integrated with intensity) [2000-2003]

6. Graph Partition by Swendsen-Wang Cuts [2000-2003]

7. Integrating Image Segmentation with Object Recognition [2003-2004]

8. Bayesian Reconstruction of 3D Scenes From a Single Image [2002-2004]

9. Top-down/Bottom-up Inference with Graph Grammar [2004-2005]

A General Introduction to Image Segmentation

Image segmentation is a long-standing problem in computer vision. It is often viewed as an ill-defined problem in comparison to other vision tasks that have apparently well-defined objectives, such as detection, recognition, and tracking. Unfortunately, without addressing segmentation, those special-purpose vision tasks are themselves fundamentally ill-defined. It is fair to say that computer vision, or image understanding, is all about parsing images. The confusion in defining image segmentation (and also perceptual grouping) simply reflects the following two facts.

1) It is difficult to model all types of stochastic patterns in generic vision. Real-world images consist of multiple layers of stochastic processes, such as texture, texton, stochastic point, line, curve, graph, region, and object processes, which generate images through spatial organization. Thus the appropriate formulation is image decomposition or parsing, which decomposes an image into its natural constituents as various stochastic processes. This subsumes image segmentation as a region process, and naturally integrates object recognition and perceptual organization, which deal with point, line, curve, and object processes. Implicit in this formulation is the notion of generative models for image interpretation, in contrast to classification and discrimination methods. This observation motivated our work, and that of many others, on modeling and learning various stochastic patterns (see Texture Analysis and Synthesis). A segmentation algorithm must be general enough to handle many families of image models in a principled way.

2) Image segmentation is a computational process and should NOT be treated as a task. Real images are intrinsically ambiguous, and our perception changes dynamically, even over a very short time, depending on our attention. Generally speaking, the more one looks at an image, the more one sees. It is narrow-minded to think that a segmentation algorithm should output just one final result. Instead, a segmentation algorithm should capture the intrinsic ambiguities, characterized, say, in a Bayesian posterior probability, and output multiple distinct solutions dynamically and endlessly, so that these solutions, as samples, "best preserve" the posterior density. We then need a mathematical principle for pruning and selecting the outputs. This motivated our work on DDMCMC, which can explore a complicated solution space, and on a K-adventurers algorithm that prunes and selects multiple segmentations.
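The pruning idea, keeping a small set of solutions that is both probable and mutually distinct, can be caricatured by a greedy selection that trades posterior weight against diversity. The sketch below is our illustration of that principle, not the exact K-adventurers criterion; the function name, the `weight × distance` score, and the toy data are all assumptions.

```python
def prune_k_adventurers(samples, weights, distance, k):
    """Greedy pruning: keep k solutions that are probable AND mutually distinct.

    samples:  candidate solutions (e.g. segmentations)
    weights:  their (unnormalized) posterior weights
    distance: dissimilarity between two solutions
    """
    # Start from the highest-weight solution.
    kept = [max(range(len(samples)), key=lambda i: weights[i])]
    while len(kept) < min(k, len(samples)):
        best_i, best_score = None, float("-inf")
        for i in range(len(samples)):
            if i in kept:
                continue
            # Trade posterior mass against distance to the kept set,
            # so the retained solutions spread over distinct modes.
            score = weights[i] * min(distance(samples[i], samples[j]) for j in kept)
            if score > best_score:
                best_i, best_score = i, score
        kept.append(best_i)
    return [samples[i] for i in kept]

# Toy example: scalar "solutions", two near 0 and two near 5, with |a - b|
# as dissimilarity. The pruner keeps one representative from each cluster.
picked = prune_k_adventurers(
    samples=[0.0, 0.1, 5.0, 5.1],
    weights=[1.0, 0.9, 0.8, 0.7],
    distance=lambda a, b: abs(a - b),
    k=2,
)
```

For real segmentations, `distance` would compare label maps rather than scalars, but the selection logic is the same.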

Our Main Contributions in Image Segmentation and Grouping

I) Region competition equations (Zhu and Yuille, 1995-1996). Our first work, region competition, is a variational (PDE) method. It unifies previous algorithms such as region growing, SNAKE/balloon, and split-and-merge. Like all PDE methods, it cannot handle changes of dimension (such as determining the number of regions or switching image models). This is remedied by the reversible jumps in an MCMC method.

II) Data-Driven Markov Chain Monte Carlo (DDMCMC) scheme (Tu and Zhu, 2000-2003). We analyze the solution space and design importance proposal probabilities to drive the Markov chain search in this complex space. This unifies generative methods with discriminative methods: the generative methods optimize a complex posterior probability in the Bayesian framework, while the discriminative methods approximate the posterior using local features. The latter are used to drive the former for fast mixing and convergence. Comparison experiments on a generic natural-image dataset showed that DDMCMC easily beats graph-theoretic (graph cuts) algorithms and is much more general in formulation. For example, it can handle global image models, such as shading and range-surface models, which discriminative methods do not handle well. It can also integrate various prior models.

III) Graph partition by Swendsen-Wang cut (Barbu and Zhu, 2002-2003). Our most recent progress is a generalization of the famous Swendsen-Wang algorithm to general graph partition. The SW algorithm is a fast algorithm for sampling the Ising model at critical (low) temperature. This means we can sample a probability without an annealing procedure: the Markov chain moves quickly at low temperature by swapping large chunks. In segmentation and grouping, this means flipping sub-graphs of large size.
The SW-cut simulates Markov chains that sample the graph-partition space; it makes the split-merge moves reversible and is thus capable of reaching globally optimal solutions for general posterior probabilities or energy functions.
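The original Swendsen-Wang move on the Ising model is easy to state: turn on a bond between equal-spin neighbours with probability 1 - exp(-2β), then flip each connected component as a block. The sketch below implements one such sweep on a small grid, to illustrate why the chain moves in large chunks; it is the classic Ising version, not the SW-cut generalization to arbitrary posteriors, and the function names are ours.

```python
import math
import random

def swendsen_wang_step(spins, beta, rng=random):
    """One Swendsen-Wang cluster update for an Ising model on an n x n grid.

    A bond between equal-spin neighbours is switched on with probability
    1 - exp(-2*beta); each connected component of 'on' bonds is then
    flipped as a whole, so large chunks move in a single Markov step.
    """
    n = len(spins)
    p_bond = 1.0 - math.exp(-2.0 * beta)

    # Union-find over the n*n grid sites, to track bond components.
    parent = list(range(n * n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in range(n):
        for j in range(n):
            for ni, nj in ((i + 1, j), (i, j + 1)):  # right and down neighbours
                if ni < n and nj < n and spins[i][j] == spins[ni][nj]:
                    if rng.random() < p_bond:
                        parent[find(i * n + j)] = find(ni * n + nj)

    # Flip each cluster independently with probability 1/2.
    flip = {}
    for i in range(n):
        for j in range(n):
            root = find(i * n + j)
            if root not in flip:
                flip[root] = rng.random() < 0.5
            if flip[root]:
                spins[i][j] = -spins[i][j]
    return spins

# Run a few sweeps on a random 8 x 8 configuration.
random.seed(1)
spins = [[random.choice((-1, 1)) for _ in range(8)] for _ in range(8)]
for _ in range(20):
    swendsen_wang_step(spins, beta=0.6)
```

The SW-cut replaces the uniform coin-flip of a cluster's label with a Metropolis-Hastings acceptance step, which is what makes the sampler valid for general posterior probabilities rather than just the Ising model.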

[Figure: diagram integrating the generative method (top-down MCMC inference) with discriminative methods (bottom-up, data-driven)]