Hierarchical models, semantic contexts, compositionality, taxonomy of visual categories, visual event ontology, stochastic graph matching, and bottom-up/top-down inference are popular research topics in computer vision and pattern recognition. They can be viewed as different aspects of stochastic image grammars. The virtue of image grammars lies in their expressive power: they can represent an exponentially large number of object and event configurations using a much smaller vocabulary and a few compositional rules. In addition to objects and events, various semantic contexts can be associated with all levels of hierarchical descriptions in grammars, facilitating rich image interpretations.

After lying nearly dormant for a quarter of a century, stochastic image grammars are resurging as a common framework for studying diverse vision problems. This is, in part, thanks to recent advances in modeling, learning, and inference techniques. It seems, however, that the renewed interest in image grammars often ignores the problem formulations and solutions of earlier work. Also, old solutions to fundamental questions (e.g., "what is a visual word", "what are object/event parts", and "what is a visual context") are too often taken for granted by the new generation of researchers. In addition, this renaissance of image grammars seems to be a result of individual team efforts, rather than a collaborative study toward a unifying theory. Therefore, we believe that the timing is right to organize SIG-09, aimed at:

  • Illuminating directions for future research,
  • Reducing the historical disconnect from early work, and
  • Bringing together researchers from different subcommunities, thereby increasing interdisciplinary awareness and collaboration.

The major theme of SIG-09 will be to identify the challenges facing work toward a unifying theoretical foundation of stochastic image grammars. This unification is twofold: it concerns formulating a grammar that would:

  • Jointly address vision problems that are traditionally viewed as distinct and thus solved separately (e.g., object segmentation and activity recognition), and
  • Encompass mathematical theories and techniques traditionally viewed as distinct and used by different subcommunities (e.g., harmonic analysis, Bayesian inference, sparse coding, Markov random fields, and graphical models).