Image Parsing: Segmentation + Grouping + Recognition

    Natural images consist of an overwhelming number of visual patterns generated by very diverse stochastic processes in nature.
The objective of image understanding is to parse an input image into its constituent patterns. The figure below is an example of
parsing a stadium scene hierarchically:  human (face and clothes), sports field (a point process, a curve process, homogeneous color
regions, text) and spectators ( textures, persons) 

Depending on the type of patterns that a task is interested in, the parsing problem is called respectively
      
    1).  Image segmentation  --- for homogeneous grey/color/texture region processes.
    2).  Perceptual grouping  --- for point, curve, and general graph  processes
    3).  Object recognition     --- for text and objects.

  Therefore these three traditional vision problems ought to be solved in a unified way, and this can be achieved in a
Bayesian framework.   Our goal is to develop effective Markov chain Monte Carlo algorithms to search for globally optimal
solutions in complex solution spaces.  The computation involves both stochastic diffusions by partial differential equations
 and reversible jumps with Metropolis-Hastings method. 
We have nine projects in image segmentation/parsing with grey level, color, texture, 3D range, motion, and curve processes.
1. Region Competition [1995-1996]
2. Image Segmentation by Data Driven Markov Chain Monte Carlo  [1999-2001]
3. 3D Range Segmentation by DDMCMC [2000-2001]
4. Parsing Image Into Regions, Curves and Curve Groups [2001-2002]

5. Motion Segmentation (integrated with intensity) [2000-2003]

6. Graph Partition by Swendsen-Wang Cuts [2000-2003]

7. Integrating Image Segmentation with Object Recognition [2003-2004]

8. Bayesian Reconstruction of 3D Scenes From a Single Image [2002-2004]

9. Top-down/Bottom-up Inference with Graph Grammar [2004-2005]
 
   A general Introduction for image Segmentation 
  Image segmentation is a long standing problem in computer vision. Often it is viewed as an
ill-defined problem in comparison to other vision tasks which have apparently well defined   
objectives, such as detection, recognition, and tracking. Unfortunately, without addressing 
segmentation problems, those special purpose vision tasks are fundamentally ill-defined. 
It is fair to say that computer vision or image understanding is all about parsing images.
The confusion in defining image segmentation (and also perceptual grouping)simply reflect
 the following two facts.


 1). It is difficult to model all types of stochastic patterns in generic vision.

Real world images consist of multiple layers of stochastic processes, such as texture, texton,
 stochastic point, line, curve, graph, region, and object processes, which generate images 
through spatial organizations. Thus an appropriate formulation should be image decomposition 
or parsing, which decomposes an  image into its natural constituents as various stochastic processes. 
This subsumes image segmentation as region process, and naturally integrates object 
recognition and perceptual organization.  The latter deal with point, line, curve, and
 object processes. Implicit in  this formulation is the notion of generative models for
 image interpretation in contrast to classification and discrimination methods. This 
observation motivated our work and many others in modeling and learning various stochastic 
patterns. See Texture Analysis and Synthesis. A segmentation algorithm must be general enough 
to handle many families of image models in a principled way.  
 2). Image segmentation is a computational process and should 
     NOT be treated as a task. 

   Real images are intrinsically ambiguous, and our perception changes dynamically, even 
in a very short time duration, depending on our attention. Generally speaking, the more one
looks at an image, the more one sees. It seems narrow-minded to think that a segmentation 
algorithm just outputs one final result. Instead a segmentation algorithm should realize the 
intrinsic ambiguities characterized, say, in a Bayesian posterior probability, and outputs 
multiple distinct solutions dynamically and endlessly so that these solutions, as
samples, ``best preserve'' the posterior density. Then we need a mathematical principle
for pruning and selecting the outputs. This motivated our work on DDMCMC which can explore
a complicated solution space and a K-adventurer's algorithm that prunes and select multiple
 segmentations.
Our main contributions in image segmentation and grouping.
   I). Region competition equations (Zhu and Yuille 95-96)
     Our first work on region competition is a variational method (PDE). This unifies previous
     algorithms, like region growing, SNAKE/Balloon, Split-merge etc. As all PDE methods, 
     it cannot handle dimension change (such as determining the number of regions and switching 
     image models). This is amended by the reversible jumps in a MCMC method.

   II). Data-Driven Markov Chain Monet Carlo (DDMCMC) scheme (Tu and Zhu, 2000-2003)
     We analyze the solution space and design importance proposal probabilities to drive
     the Markov chain search in complex space. This unifies generative methods with discriminative
     methods. The generative methods optimize a complex Posterior probability in the Bayes framework,
     and the discriminative methods approximate the posterior using local features etc.
     The latter are used to drive the former for fast mixing and convergence

     Some comparison experiments on generic natural image dataset showed that 
     DDMCMC can easily beats graph theoretic (graph Cuts) algorithms and is much more 
     general in formulation.  For example, it can handle global images models such as 
     shading model and range surface model, which are not handled well by discriminative 
     methods. It also can integrate various prior models.

   III). Graph partition by Swendsen-Wang Cut (Barbu and Zhu, 2002-2003)
    Our most recent progress is the generalization of the famous Swendsen-Wang algorithm for general
    graph partition. The SW algorithm was a fast algorithm for sampling Ising model at critical (low)
    temperature. This means that we can sample a probability without an annealing procedure. I.e. the Markov 
    chain walks fast at low temperature by swapping large chunks.  In segmentation and grouping, this means
    flipping sub-graphs of large size.  

      The SW-cut can simulate Markov chains sampling the graph partition space, and it makes the 
    split-merge moves reversible and thus is capable of globally optimal solutions for general posterior
    probabilities or energy functions.

     The Diagram integrating generative method (top-down MCMC inference) 
                 with discriminative methods (bottom-up, data-driven)