Natural scenes contain a vast number of visual patterns generated by diverse stochastic processes. How to represent and model these patterns, and how to learn and infer them efficiently, are fundamental problems in computer vision.
We seek a generative probabilistic method for generic image understanding. We hope that such a generative model can answer questions like "What do we perceive when we look at an image or a video sequence?" Early psychological studies of this question led to early vision theories, including Julesz's texton concept and Marr's primal sketch scheme. These theories shed light on our perception-based generative model.
The goal of my research is to compute the semantic content of images and represent it with symbolic graphs, using a generic visual vocabulary learned from natural images. Our image model studies and integrates four important aspects of image modeling: appearance, geometry, dynamics, and topology. These four aspects span sub-dimensions of the image space. Within this unified framework, we studied the spatial and temporal relationships among graph elements and their interactions, and inferred the hidden dynamic graph structures. We augmented the model complexity progressively and consistently to learn increasingly complicated visual patterns. The learned models are verified by synthesizing a series of real examples. The research results can be broadly applied to applications such as document image understanding, motion recognition, video annotation, tracking, and surveillance.
Textured motion and complex motion are two typical complicated cases of generic motion. They exhibit rich stochastic appearance variations, geometric deformations, and topological changes, and thus fall beyond the scope of conventional motion models. The links below show how our generative model successfully handles these two challenging cases.
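As a toy illustration of what "stochastic appearance variation" means, the sketch below synthesizes a scalar state trajectory with a first-order autoregressive (AR) process, a standard building block of statistical motion models. This is deliberately simplified illustrative code, not the generative model from our papers; all names and parameters are made up for the example.

```python
import random

def synthesize_ar1(x0, a=0.9, noise_std=0.1, steps=100, seed=0):
    """Generate a scalar state trajectory x_t = a * x_{t-1} + Gaussian noise.

    A first-order autoregressive process: the simplest example of a state
    driven by a stochastic process rather than a deterministic trajectory.
    (Toy illustration only -- not the model used in our papers.)
    """
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(steps):
        xs.append(a * xs[-1] + rng.gauss(0.0, noise_std))
    return xs

# With |a| < 1 the process is stable: it relaxes toward a stationary
# distribution around zero instead of diverging, so repeated samples
# look "similar but never identical" -- the hallmark of textured motion.
traj = synthesize_ar1(1.0, a=0.9, noise_std=0.1, steps=200)
```

In a full dynamic-texture model the scalar state would be replaced by a vector of appearance coefficients, but the stochastic temporal structure has the same flavor.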
In the real world, an object, or the concept of an object, exists only within a range of scales. The descriptions and features of objects depend strongly on the scale at which the world is observed and modeled. Traditional scale-space theory and related research areas, such as image pyramids, multi-resolution analysis, and invariant or salient feature pursuit, are mainly concerned with the continuous change of deterministic feature detectors. My earlier study of topological transitions led me to rethink the remaining problems in traditional scale-space theory: how may we account for (1) the lifespan of image features, and (2) perceptual transitions in representation over scales, in terms of generative image representations?
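For readers unfamiliar with the traditional theory, the following minimal sketch builds a Gaussian scale space: the image is smoothed with Gaussians of increasing standard deviation, and structure (here, a sharp step edge) gradually dissolves as scale increases. This deterministic smoothing is exactly the kind of continuous scale change the traditional theory studies; the code and its names are illustrative only and not from our system.

```python
import math

def gaussian_kernel(sigma):
    """1-D Gaussian kernel, truncated at 3 sigma and normalized to sum 1."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def convolve_1d(row, kernel):
    """Convolve one row with the kernel, clamping indices at the borders."""
    r = len(kernel) // 2
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - r, 0), n - 1)  # replicate border pixels
            acc += w * row[idx]
        out.append(acc)
    return out

def gaussian_blur(img, sigma):
    """Separable Gaussian blur on a 2-D list of floats (rows, then columns)."""
    k = gaussian_kernel(sigma)
    rows = [convolve_1d(row, k) for row in img]
    cols = [convolve_1d(list(col), k) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back

def scale_space(img, sigmas=(1.0, 2.0, 4.0)):
    """Return the image at a sequence of increasingly coarse scales."""
    return [gaussian_blur(img, s) for s in sigmas]

# A 4x16 image with a vertical step edge between columns 7 and 8:
img = [[0.0] * 8 + [1.0] * 8 for _ in range(4)]
levels = scale_space(img)
# At coarser scales, pixel values near the edge move toward 0.5 --
# the edge blurs out, and eventually the feature "dies".
```

In this classical picture every feature fades continuously; the open questions above concern how to describe, within a generative representation, when a feature's lifespan ends and when the representation itself should switch.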
1. Y. Z. Wang and S. C. Zhu, "Analysis and Synthesis of Textured Motion: Particle, Wave and Cartoon Sketch", IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), Oct. 2004. (.pdf 1.3M)
2. S. C. Zhu, C. Guo, Y. Z. Wang, and Z. J. Xu, "What are Textons?", International Journal of Computer Vision (IJCV), 2005. (.pdf 1.88M)
3. Y. Z. Wang, S. Bahrami, and S. C. Zhu, "Perceptual Scale Space and Its Applications", Proc. of International Conference on Computer Vision (ICCV), Beijing, Oct. 2005.
4. Y. Z. Wang and S. C. Zhu, "Modeling Textured Motion: Particle, Wave and Sketch", Proc. of International Conference on Computer Vision (ICCV), Nice, Oct. 2003. (.pdf 625K)
5. Y. Z. Wang and S. C. Zhu, "Modeling Complex Motion by Tracking and Editing Hidden Markov Graphs", Proc. of Computer Vision and Pattern Recognition (CVPR), Washington DC, Jun. 2004. (.pdf 494K)
6. Y. Z. Wang and S. C. Zhu, "A Generative Method for Textured Motion: Analysis and Synthesis", Proc. of European Conference on Computer Vision (ECCV), Copenhagen, Jun. 2002. (.pdf 2.2M)
7. S. C. Zhu, C. Guo, Y. N. Wu, and Y. Z. Wang, "What are Textons?", Proc. of European Conference on Computer Vision (ECCV), Copenhagen, Jun. 2002. (.pdf 767K)
2. Related Research Groups
3. Raw Image Sequences for Download
Birds; Cloud; Fall13; Fall14; Firework10; Firework12; Floating Ball; Floating Foam; Snow;
Other sequences can be downloaded from the MIT temporal texture site.