Visual Learning, Modeling, Conceptualization, and Knowledge Representation

Ten Topics in Modeling and Conceptualization

1). Four research streams in visual knowledge representation
2). Four categories of visual models
3). General problem formulation as visual learning
4). The 1/f-power law and its descriptive and generative model
5). The scale invariance in natural image statistics and its models
6). Modeling and definition of textures
7). Modeling and definition of shapes
8). Modeling and definition of textons
9). Modeling of human faces
10). A unifying picture: where Markov random fields, wavelets, and PDEs meet!

Examples of natural images. Such images are very rare (they occupy almost zero volume) in the space of all possible images, and they look very different from white noise.

General Introduction

Reading novels, we can imagine vivid scenes; asleep, we can dream; and some blind people can paint pictures. This simply means that we have visual knowledge represented in our brains. Such knowledge is not only for imagination, dreaming, and planning, but is also crucial for interpreting real-world images, because there is simply not enough information in a 2D image about the 3D world. Our brains have to make up the missing information! This is the view of Bayesian statistics. Those friends who are against Bayesian methods, or against the use of prior knowledge and models, fall into two cases:
1). They use some knowledge implicitly and don't know how to formulate it properly.
2). They find their dreams are white noise, like a television without a signal.

Can we make a computer dream? How can we represent visual knowledge in a computer and let it learn from new images? From the perspective of pattern theory, a subject pioneered by Ulf Grenander and others, anything we can perceive is called a "pattern"; thus our objective is to represent and compute patterns. More specifically, we should address the following issues.

i). Conceptualization of visual patterns: What is a quantitative definition of a visual pattern? For example, how do we define mathematically a texture, a willow tree, or a human face?

ii). Statistical modeling of visual patterns: In the 1970s and 1980s, artificial intelligence (AI) represented knowledge by logic, using propositional calculus or predicate calculus. This is clearly ineffective for representing stochastic patterns. A more general scheme for representing knowledge is the calculus of probability. Why do we need to play with statistical models? What is the origin of statistical models? What is the relationship between a model and its mathematical definition?

iii). Learning the visual vocabulary: Speech and language have a vocabulary of phonemes, words, phrases, clauses, sentences, and so on. What is the corresponding vocabulary for visual description? Can it be learned from images?

iv). Knowledge for computational heuristics: How do we infer the visual patterns effectively?

These questions are addressed in a recent epistemological paper: S.C. Zhu, "Statistical modeling and conceptualization of visual patterns", IEEE Trans. on PAMI, vol. 25, no. 6, pp. 691-712, 2003.
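
The Bayesian view described above can be stated compactly. The following is only a minimal sketch, and its symbols are assumptions introduced here for illustration rather than notation from the original text: I denotes the observed 2D image, W an interpretation of the 3D world, p(W) the prior that encodes visual knowledge, and p(I | W) a model of how the world produces an image. Choosing the most probable interpretation (the second formula) is one common choice, not the only one.

```latex
p(W \mid I) \;=\; \frac{p(I \mid W)\, p(W)}{p(I)} \;\propto\; p(I \mid W)\, p(W),
\qquad
W^{*} \;=\; \arg\max_{W}\; p(I \mid W)\, p(W).
```

Many different worlds W are consistent with the same image I, so the likelihood p(I | W) alone cannot single out an interpretation. The prior p(W) supplies the missing information.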