Visual Learning by Integrating Descriptive and Generative Models

There are two general learning paradigms in the literature.

1. One is the descriptive method, including Markov random fields, Gibbs models, and FRAME, all of which are exponential-family models. In a descriptive method, one extracts feature statistics from the observed signal, and a model is then constructed by maximum entropy under the constraint that it reproduces the observed statistics. This is the minimax entropy learning paradigm (Zhu, Wu, and Mumford, 1996-98).
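As a toy sketch of this maximum-entropy construction (the domain, features, and observed statistics below are all hypothetical), one can fit an exponential-family model p(x) proportional to exp(lam . f(x)) on a small discrete domain by gradient ascent until the model reproduces the observed feature statistics:

```python
import numpy as np

# Toy domain: a signal x takes one of 10 discrete values (normalized to [0, 1)).
states = np.arange(10) / 10.0

def f(x):
    # Feature statistics: first and second moments.
    return np.stack([x, x ** 2])

F = f(states)                    # shape (2, 10): features of each state
obs = np.array([0.45, 0.28])     # assumed observed feature averages

# Maximum-entropy model p(x) ~ exp(lam . f(x)); maximum likelihood is
# reached exactly when the model expectations match the observed statistics.
lam = np.zeros(2)
for _ in range(5000):
    logits = lam @ F
    p = np.exp(logits - logits.max())
    p /= p.sum()                 # current model distribution
    model = F @ p                # model expectations of the features
    lam += 0.5 * (obs - model)   # gradient of the log-likelihood
```

At convergence the model's feature expectations agree with the observed ones, which is the constraint the minimax entropy construction enforces.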

2. The other is the generative method, including PCA, TCA, ICA, HMM, mixtures of experts, image coding, and the Helmholtz machine, etc. This leads to mixture models. These models assume that observed signals are generated by hidden variables, and the model can be estimated through EM-type learning.
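A minimal sketch of EM-type learning for such a generative model, using a two-component 1-D Gaussian mixture in which the hidden variable is the component label (the data and initial parameters are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "observed" signal: generated by a hidden two-state variable.
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 700)])

mu = np.array([-1.0, 1.0])       # component means (initial guess)
sigma = np.array([1.0, 1.0])     # component std deviations
pi = np.array([0.5, 0.5])        # mixing weights

for _ in range(100):
    # E-step: posterior responsibility of each hidden component for each sample.
    dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
           / (sigma * np.sqrt(2 * np.pi))
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from responsibility-weighted data.
    n = r.sum(axis=0)
    pi = n / len(x)
    mu = (r * x[:, None]).sum(axis=0) / n
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n)
```

The alternation of inferring hidden variables (E-step) and re-fitting parameters (M-step) is the common skeleton behind EM learning of the generative models listed above.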

We argue that the two paradigms must be integrated to benefit from each other, and the following relationships are revealed:

a). Descriptive models (Gibbs) are precursors of generative models. Although one can always keep asking "why", a generative model must stop at certain hidden variables that are not caused by deeper variables. Such "root" variables must then be characterized by descriptive models. In the literature, from PCA to image coding, the root variables are assumed to be iid, a trivial case of the Gibbs model.

b). Generative models are a dimension-reduction step for descriptive models. With "meaningful" variables, the descriptive models become simple and easy to compute.

The evolution of visual learning is the process of replacing descriptive models with generative + simple descriptive models.

For the ivy-and-wall image shown above, although a Gibbs model can characterize and synthesize this image at the pixel level, it is more interesting if the model captures the explicit notions of ivy and bricks. This leads us to a two-layered texture model (shown on the right). Each layer is a stochastic texton process whose basic elements are called textons. The spatial arrangements of textons are governed by a Gibbs model (the Gestalt ensemble).

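A crude illustrative sketch of the two-layer idea (not the authors' actual model): sample texton centers for each layer from a toy Gibbs model with pairwise repulsion via Metropolis updates, then render a simple square element at each center, with the foreground layer occluding the background.

```python
import numpy as np

rng = np.random.default_rng(2)
H = W = 64

def sample_texton_layer(n, min_dist, iters=2000):
    """Sample n texton centers from a toy Gibbs model that penalizes
    pairs closer than min_dist (a stand-in for the Gestalt ensemble)."""
    pts = rng.uniform(0, H, size=(n, 2))

    def energy(p):
        d = np.linalg.norm(p[:, None] - p[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        return np.sum(np.maximum(0.0, min_dist - d))  # repulsion energy

    e = energy(pts)
    for _ in range(iters):
        i = rng.integers(n)
        prop = pts.copy()
        prop[i] = rng.uniform(0, H, 2)               # propose moving one texton
        e2 = energy(prop)
        if e2 < e or rng.random() < np.exp(e - e2):  # Metropolis acceptance
            pts, e = prop, e2
    return pts

def render(pts, size, value, img):
    """Place a square texton element of half-width `size` at each center."""
    for y, x in pts.astype(int):
        img[max(0, y - size):y + size, max(0, x - size):x + size] = value
    return img

img = np.zeros((H, W))
img = render(sample_texton_layer(20, 12), 4, 1.0, img)  # background layer ("bricks")
img = render(sample_texton_layer(15, 10), 2, 2.0, img)  # foreground layer ("ivy")
```

Each layer is sampled from its own Gibbs model over texton positions, and the final image is generated by composing the rendered layers, mirroring the sample-then-generate pipeline described above.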
[1]. C. E. Guo, S. C. Zhu, and Y. N. Wu, "Visual learning by integrating descriptive and generative
     methods," Proc. of Int'l Conf. on Computer Vision, Vancouver, Canada, July 2001.

[2]. S. C. Zhu and C. E. Guo, "Mathematical modeling of clutter: descriptive vs. generative models,"
     Proc. of SPIE AeroSense Conf. on Automatic Target Recognition, Orlando, FL, 2000.

[3]. C. E. Guo, S. C. Zhu, and Y. N. Wu, "Modeling Visual Patterns by Integrating Descriptive and
     Generative Models," Int'l J. of Computer Vision, 53(1), 5-29, 2003.

Click here to see some results of two-layered texture synthesis, obtained by first sampling the Gibbs model for each layer and then generating an image through the texton patterns.