Perceptual Transitions in Scale-Space

Y.Z. Wang, S. Bahrami, and S.C. Zhu, "Perceptual Scale Space and Its Applications", Int'l Conf. on Computer Vision, 2005. Submitted to IJCV.

This project aims to enrich conventional scale-space theory by studying the perceptual transitions over a wide range of scales and explaining them with a set of graph grammar rules. The perceptual transition mechanism in scale-space is formulated as a model comparison problem.

Perceptual Transitions in Scale-Space

Conventional scale-space theory and related research areas, such as image pyramids, multi-resolution analysis, and invariant or salient feature pursuit, are mainly concerned with the continuous change of deterministic feature detectors. Our observations are: (1) image features have life spans, and (2) there exist perceptual transitions in scale-space.

Fig. 1 Examples of square images in a 7-level Gaussian pyramid. The perceptual transition from structures to textures and to nearly white noise is clearly visible.


The above figure exhibits the perceptual transitions from structure to texture and to random noise in scale-space. The following figure is an image pyramid for a 1D signal, a slice from an image. Fig. 2(d) shows the topological change of the sketch trajectories in scale-space. All these figures suggest that both the image and our perception of the pattern change over scales.

Fig. 2 Scale-space of a 1D signal. (a) A 1D signal (marked as a black line) from the roaster image. (b) Trajectories of the zero-crossings of the 2nd derivative of the 1D signal (Witkin 1983). The finest scale is at the bottom. (c) The 1D signal at different scales. The y axis is scale. The gray curves are the 1D signal at different scales. The black segments on the curves correspond to the primal sketch primitives on the image slice. (d) Symbolic representation of the sketch trajectories in scale-space.



Topological Model in Scale-Space

In this project, we adopt the primal sketch model and represent raw images by attribute graphs without perceptual loss. We enrich the traditional scale-space by studying how these graphs change, and thus the perceptual transitions, over a very broad range of scales. The perceptual transitions are explained by a set of grammar rules in the generative graph representation. We identify three types of perceptual transitions when up-scaling in scale-space:

(1) Continuous sharpening of image primitives.

(2) Topology changes in graph representation defined by a graph grammar.

(3) Catastrophic changes from texture to structures with explosive birth of new image primitives.

By properly addressing perceptual transitions, we expect performance improvements in many vision applications, e.g. image enhancement, super-resolution, multi-scale object recognition, and coarse-to-fine tracking.


Fig. 3 Examples of perceptual transitions. (a) Continuous sharpening/blurring of image primitives. (b) Topology changes in graph representation defined by a graph grammar, including (1) birth/death of a node, (2) birth/death of a junction, (3) extending/shrinking a node, (4) a ridge terminator changing into a pair of step-edges with a set of corners and its reverse, (5) a ridge changing into a pair of step-edges and its reverse. (c) Catastrophic changes with explosive birth of image primitives.

As shown in Fig. 4, let I[0,n] be the discrete levels of the Gaussian pyramid with increasing resolution. I(k) is a version of I(k+1) smoothed by an isotropic Gaussian kernel or, equivalently, by running a heat diffusion process. The difference (band-pass) images I(k+) = I(k+1) - I(k) for k = 0, 1, ..., n-1 form the Laplacian pyramid. When images are viewed at increasing resolutions in the Gaussian pyramid, more semantic content is revealed. This evokes quantum jumps in visual perception amid continuous intensity changes (diffusion).
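The level and band-pass definitions above can be sketched for a 1D signal, in the spirit of Fig. 2. This is a minimal illustration under stated assumptions, not the project's implementation: all levels keep the same length (no subsampling), the kernel width sigma, the border clamping, and every function name are choices made here for illustration.

```python
import math

def gaussian_kernel(sigma):
    """Discrete, normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma):
    """Convolve with the Gaussian kernel, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def build_scale_space(signal, n_levels, sigma=1.0):
    """Return levels I(0..n) and band-pass differences I(k+1) - I(k).

    I(n) is the input (finest level); I(k) is I(k+1) smoothed once.
    """
    levels = [[float(v) for v in signal]]
    for _ in range(n_levels):
        levels.append(smooth(levels[-1], sigma))
    levels.reverse()  # index 0 is now the coarsest level
    bands = [[a - b for a, b in zip(levels[k + 1], levels[k])]
             for k in range(n_levels)]
    return levels, bands

def reconstruct(coarsest, bands):
    """Add the band-pass signals back onto the coarsest level."""
    out = list(coarsest)
    for band in bands:
        out = [o + b for o, b in zip(out, band)]
    return out
```

Because the band-pass signals are plain differences of adjacent levels, summing them onto I(0) recovers I(n) up to floating-point error, which is the sense in which the two stacks carry the same information.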

Fig. 4 Augmenting the image scale-space to perceptual scale-space which includes a sketch pyramid and a series of graph grammar rules for perceptual transitions.

Fig. 5 illustrates a four-level sketch pyramid S0, S1, S2, S3 and a series of graph grammars R0, R1, R2 for the graph expansion. Each R(k) includes production rules r(k,i), i=1,2,...,m(k), and each rule extends a subgraph g (possibly g=NULL) conditional on its neighborhood dg.

R(k) = { r(k,i): g(k,i) | dg(k,i) -----> g'(k,i) | dg(k,i) }.

The expansion of a graph is realized through a series of rules r(k,1), ..., r(k,m(k)),

S(k) -----> S(k+1).
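The expansion S(k) -----> S(k+1) can be illustrated with a small data-structure sketch. Everything here is hypothetical and chosen for illustration: the SketchGraph class, the primitive labels, and the two example rules (a birth rule with g=NULL and a ridge splitting into a pair of step-edges, loosely following Fig. 3(b)). Each rule rewrites a subgraph while leaving its neighborhood dg untouched, and returns None when it does not apply.

```python
from dataclasses import dataclass, field

@dataclass
class SketchGraph:
    labels: dict = field(default_factory=dict)  # node id -> primitive type
    edges: set = field(default_factory=set)     # frozenset({u, v}), no self-loops

    def neighbors(self, node):
        """Nodes sharing an edge with `node` (its neighborhood dg)."""
        return {v for e in self.edges if node in e for v in e if v != node}

    def copy(self):
        return SketchGraph(dict(self.labels), set(self.edges))

def birth_node(graph, node, label="blob"):
    """Rule with g = NULL: add a new isolated primitive."""
    if node in graph.labels:
        return None  # rule does not apply
    new = graph.copy()
    new.labels[node] = label
    return new

def split_ridge(graph, node):
    """Rule: a ridge becomes a pair of step-edges; neighbors are preserved."""
    if graph.labels.get(node) != "ridge":
        return None  # rule does not apply
    context = graph.neighbors(node)
    new = graph.copy()
    del new.labels[node]
    new.edges = {e for e in new.edges if node not in e}
    for child in (f"{node}_a", f"{node}_b"):
        new.labels[child] = "step_edge"
        for nb in context:
            new.edges.add(frozenset((child, nb)))
    return new

def expand(graph, rules):
    """Apply rules r(k,1), ..., r(k,m) in order to turn S(k) into S(k+1)."""
    for rule, node in rules:
        result = rule(graph, node)
        if result is not None:
            graph = result
    return graph
```

For example, starting from a ridge "r" attached to a corner "c", `expand(s0, [(split_ridge, "r"), (birth_node, "b")])` yields a graph with two step-edges attached to "c" plus one new isolated primitive, while the input graph is left unchanged.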

Fig. 5 An example of a 4-level sketch pyramid and corresponding graph grammar for perceptual transitions.


Perceptual Transition Mechanism in Scale-Space

The perceptual transition mechanism is posed as a model comparison problem and handled in a Bayesian framework. It is essentially MDL (model selection) with a ratio-of-posteriors test. Suppose we are moving up-scale in the pyramid, and let G+ denote the newly introduced structures (covering all three types of transitions). By Bayes' rule,

log [ p(G, G+ | I) / p(G | I) ] = log [ p(I | G, G+) / p(I | G) ] + log p(G+ | G).

The 1st term is the log-likelihood ratio, which is usually > 0 because a more complex model always explains the data better. The 2nd term, log p(G+ | G), however, penalizes model complexity. Thus a perceptual transition happens only when the more complex model explains the image well at the current resolution but over-explains it at a coarser resolution.
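The trade-off between the two terms can be made concrete with a toy 1D example. The sketch below is an illustration under assumptions that are not from the paper: the simple model G is a constant signal, the added structure G+ is a single step edge at a given position, the noise is i.i.d. Gaussian with known sigma, and the prior term log p(G+ | G) is a fixed hypothetical constant.

```python
def sse(values):
    """Sum of squared errors around the mean (values must be non-empty)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def log_likelihood_ratio(signal, split, sigma):
    """log p(I | G, G+) - log p(I | G) under i.i.d. Gaussian noise.

    G fits one constant; G with G+ fits one constant on each side of `split`.
    `split` must satisfy 0 < split < len(signal).
    """
    simple = sse(signal)
    complex_ = sse(signal[:split]) + sse(signal[split:])
    return (simple - complex_) / (2.0 * sigma ** 2)

def accept_transition(signal, split, sigma=1.0, log_prior=-5.0):
    """Accept the new structure when likelihood gain outweighs the penalty."""
    score = log_likelihood_ratio(signal, split, sigma) + log_prior
    return score > 0, score
```

With these assumed values, a sharp step such as `[0]*8 + [10]*8` is accepted, whereas the same region after heavy smoothing, e.g. `[4.9]*8 + [5.1]*8`, is rejected: the likelihood gain is still non-negative but no longer pays for the added complexity.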



Our goal is to infer the perceptual sketch pyramid together with the optimal path of transitions by maximizing a Bayesian posterior probability,

(S[0,n], R[0,n-1])* = arg max p(S[0,n], R[0,n-1] | I[0,n]).


We use the original sketch pursuit algorithm to obtain initial sketch graphs, and then adopt MCMC reversible jumps to track and edit the sketch graphs iteratively, both upwards and downwards across scales. Our Markov chain consists of six pairs of reversible jumps, e.g. death/birth of a node, death/birth of a junction, extending/shrinking of a node, and so on. They correspond to the graph grammar rules. Each pair of reversible jumps is selected probabilistically and observes the detailed balance equations. These steps simulate a Markov chain with invariant probability p(S[0,n], R[0,n-1] | I[0,n]). For details, please refer to our paper.
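The birth/death mechanism can be illustrated with a deliberately simplified Metropolis sampler. This is a toy sketch, not the sampler used in the project: the state is just a set of candidate primitives, the log posterior is a hypothetical sum of per-primitive gains minus a fixed complexity penalty, and a single toggle move stands in for one birth/death pair. Because the toggle proposal is symmetric, the plain Metropolis acceptance ratio satisfies detailed balance here; genuine reversible jumps between graphs also need the proposal ratio and any dimension-matching terms.

```python
import math
import random
from collections import Counter

def log_posterior(state, gains, penalty):
    """Toy log posterior: each present primitive adds its gain minus a penalty."""
    return sum(gains[i] - penalty for i in state)

def run_chain(gains, penalty, steps, seed=0):
    """Simulate birth/death toggles and count how often each state is visited."""
    rng = random.Random(seed)
    state = frozenset()
    visits = Counter()
    for _ in range(steps):
        i = rng.randrange(len(gains))
        proposal = state ^ {i}  # birth if i is absent, death if present
        delta = (log_posterior(proposal, gains, penalty)
                 - log_posterior(state, gains, penalty))
        # Metropolis rule: always accept uphill moves, downhill with prob exp(delta).
        if delta >= 0 or rng.random() < math.exp(delta):
            state = proposal
        visits[state] += 1
    return visits
```

With gains such as `[6.0, 0.5, 4.0, 0.1]` and a penalty of 2.0, the chain spends most of its time in the state containing only primitives 0 and 2, the ones whose gain exceeds the penalty.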



Application I: Multi-scale Object Tracking

Most work in motion tracking assumes that certain object structures (like a contour or small Markov graphs) appear in a narrow range of scales. When the object motion occurs over a wide range of scales, we observe significant structural changes in the graph representation. This is widely considered a challenging problem in the motion tracking literature. Here we choose the example of tracking a car driving towards the camera. The sketches for the car and background are shown in two different colors.

Scaling is a type of motion that differs from the traditional tracking task, as it involves many photometric and topological changes. Additionally, since our sketch graphs are inferred both upwards and downwards in the pyramid to maintain consistency, we can even "hallucinate" the detailed sketches of the car at a far distance. Furthermore, the background sketches are also stabilized across frames.

Original Sequence
Original Sketch Sequence
Tracked Car Sketch Sequence
Car tracking sequence. (a) Sample frames from the observed sequence. The largest image size is 352x240. (b) Corresponding initial sketches from the bottom-up algorithm. (c) Corresponding tracked car sketches. The tracked car sketches are in black. The background sketches are in grey.


Application II: Adaptive Image Display

This task has recently emerged from the growing need to display large digital images (say 2048x2048 pixels) on a small screen (say 128x128 pixels), such as those of PDAs, cellular phones, and digital cameras. Normally a user has to manually browse through an image by selecting a location and zooming to the desired level. This is inconvenient for very large images, especially with today's shrinking screens. It is therefore desirable to present the user with a "tour" of the image that summarizes its informational content in as few frames as possible. Each frame would then be at a different location and resolution.

The problem formulation naturally leads to the sketch pyramid if we interpret informational content as sketch content. The solution then is to associate each subregion of an image with a scale such that any further zooming would not expand its sketch graph, i.e. there is no perceptual gain to further zooming. A tour would then consist of visiting these subregions at their identified scales.
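The scale assignment described above can be sketched as follows. The sketch assumes, purely for illustration, that each subregion is summarized by the number of sketch primitives it contains at every pyramid level (index 0 is the coarsest), and that "the sketch graph does not expand" can be approximated by that count staying constant; the function names and the region data are hypothetical.

```python
def stopping_level(primitive_counts):
    """Coarsest level beyond which zooming in adds no sketch primitives.

    primitive_counts[k] is the primitive count at level k, coarsest first.
    """
    level = len(primitive_counts) - 1
    while level > 0 and primitive_counts[level - 1] == primitive_counts[level]:
        level -= 1
    return level

def plan_tour(regions):
    """Map each subregion name to the level at which it should be displayed."""
    return {name: stopping_level(counts) for name, counts in regions.items()}
```

For instance, a region with counts `[0, 2, 5, 5, 5]` would be shown at level 2, while a structureless region with counts `[0, 0, 0, 0, 0]` can stay at level 0.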

These images represent the replacement of each sketch partition with an image region from the corresponding level in the Gaussian pyramid. Note how the areas of higher structural content are in higher resolution (e.g. face, house), and areas of little structure are in low resolution (e.g. landscape, grass). A limited view screen can thus view each partition at its appropriate scale without loss of structural information.


Application III: PDA/Cell-Phone Picture Auto-Browsing


Visiting the nodes of a quad-tree decomposition of a sketch pyramid is an efficient way to automatically convey a large image's informational content.
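A quad-tree tour of this kind might look like the sketch below. It is an assumption-laden illustration rather than the project's procedure: the image is reduced to a square grid of per-cell sketch primitive counts whose side is a power of two, and a block is split into four quadrants whenever it holds more than a hypothetical budget of primitives (max_primitives). Each leaf becomes one frame (x, y, size) of the tour.

```python
def quadtree_tour(density, x=0, y=0, size=None, max_primitives=4):
    """Return frames (x, y, size) covering the grid, finer where content is dense.

    density is a square list of lists with a power-of-two side length;
    density[row][col] is the sketch primitive count of that cell.
    """
    if size is None:
        size = len(density)
    total = sum(density[r][c]
                for r in range(y, y + size)
                for c in range(x, x + size))
    if size == 1 or total <= max_primitives:
        return [(x, y, size)]  # one frame is enough for this block
    half = size // 2
    frames = []
    for dy in (0, half):        # top row of quadrants first
        for dx in (0, half):    # left quadrant before right
            frames += quadtree_tour(density, x + dx, y + dy, half, max_primitives)
    return frames
```

On a 4x4 grid whose structure is concentrated in the top-left quadrant, the tour visits that quadrant cell by cell and shows each of the other three quadrants in a single frame, so regions with little structure cost only one frame each.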