The code and data for this experiment are available on the Main page (newest code) and on the Change Log page (with code archives).

To perform the experiment, run StartFromHere.m. Full code documentation is available here. Below we explain how the experiment is conducted.

Step 1. We design three shape motifs: an ellipsoid, an angle and a pair of parallel lines. Each shape motif is an active basis template learned from a single clean training image (see this MATLAB function for details). The figure below shows the three learned shape motifs, each composed of Gabor wavelet elements.
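Learning a template from a single image can be pictured as picking a few strong, well-separated Gabor elements. The sketch below uses a simplified greedy selection (pick the strongest response, then inhibit its neighborhood), which is an assumption here and not necessarily what the linked MATLAB function does. The function `learn_template`, its parameters and the precomputed response maps are all hypothetical.

```python
def learn_template(response, num_elements, inhibit_radius=2, inhibit_orient=1):
    """Greedily pick Gabor elements (x, y, o) from one image's response maps.

    Hypothetical sketch: response[o][y][x] is a nonnegative Gabor response
    for orientation index o at pixel (x, y).
    """
    num_orient = len(response)
    height, width = len(response[0]), len(response[0][0])
    # Copy the maps so inhibition does not modify the caller's data.
    r = [[row[:] for row in plane] for plane in response]
    template = []
    for _ in range(num_elements):
        # Strongest remaining response over all orientations and positions.
        value, x, y, o = max((r[o][y][x], x, y, o)
                             for o in range(num_orient)
                             for y in range(height)
                             for x in range(width))
        if value <= 0:
            break  # nothing left worth selecting
        template.append((x, y, o))
        # Local inhibition: zero out nearby positions and orientations
        # (orientation indices wrap around).
        for do in range(-inhibit_orient, inhibit_orient + 1):
            plane = r[(o + do) % num_orient]
            for yy in range(max(0, y - inhibit_radius), min(height, y + inhibit_radius + 1)):
                for xx in range(max(0, x - inhibit_radius), min(width, x + inhibit_radius + 1)):
                    plane[yy][xx] = 0.0
    return template
```

The inhibition step is what keeps the selected elements spread along the shape instead of piling up on one strong edge.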

Step 2. A shape script template for egret/pelican is composed of one ellipsoid as the body, one angle as the mouth, and two parallel lines as the flexible neck. We don't need to place the shapes at precise positions as a professional artist would; a rough layout is enough. Below is the template we constructed (see this MATLAB function for details), shown side by side with an egret image and the dictionary of rotated/scaled/stretched versions of the three shape motifs (generated by this MATLAB function).
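One way to picture the dictionary of rotated/scaled/stretched motifs is to apply a linear map to each element's position and direction. This is a minimal sketch under assumptions: a motif is a list of `(x, y, theta)` elements relative to its center, stretching acts along the x axis before scaling and rotation, and `transform_motif` and `build_dictionary` are hypothetical names, not the linked MATLAB function.

```python
import math

def transform_motif(elements, angle, scale, stretch):
    """Stretch along x, then scale, then rotate a motif about its center.

    elements: list of (x, y, theta), positions relative to the motif center,
    theta = edge direction of the Gabor element in radians (mod pi).
    """
    c, s = math.cos(angle), math.sin(angle)

    def apply(u, v):
        # Linear map R(angle) @ diag(scale * stretch, scale)
        u, v = scale * stretch * u, scale * v
        return c * u - s * v, s * u + c * v

    out = []
    for x, y, theta in elements:
        nx, ny = apply(x, y)
        # Transform the element's direction vector, then re-read its angle.
        dx, dy = apply(math.cos(theta), math.sin(theta))
        out.append((nx, ny, math.atan2(dy, dx) % math.pi))
    return out

def build_dictionary(motif, angles, scales, stretches):
    """All (angle, scale, stretch) versions of one motif, keyed by parameters."""
    return {(a, sc, st): transform_motif(motif, a, sc, st)
            for a in angles for sc in scales for st in stretches}
```

Transforming the direction vector, rather than just adding the rotation angle, keeps element orientations consistent under anisotropic stretching.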

     

Step 3. Match the shape script template to newly observed examples. This is done by a recursive SUM/MAX procedure.

1) SUM1 maps are computed by convolving the source image with Gabor wavelet filters and taking the local energy of the responses. MAX1 maps are computed from SUM1 maps by local maximum pooling, where the maximum is taken over neighboring positions and orientations.

2) SUM2 maps are computed by "convolving" all the rotated/scaled/stretched shape motifs over the MAX1 maps, since the matching score of a shape motif is linear in the MAX1 responses. We subsample positions when computing SUM2 maps. MAX2 maps are computed from SUM2 maps by local maximum pooling over neighboring positions, rotations, scalings and stretchings.

3) SUM3 maps are computed by "convolving" the shape script template over the MAX2 maps, since the matching score of a shape script is also linear in the MAX2 responses.
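The three SUM/MAX levels above can be sketched in plain Python on nested lists. This is a minimal sketch under assumptions: every function and parameter name (`local_energy`, `max1`, `sum2`, `max2`, `sum3`, `half_width`, `half_orient`, `step`) is hypothetical, motif elements are assumed already discretized to integer offsets on the MAX1 grid, the weights stand in for the templates' coefficients, and SUM3 takes one MAX2 map per part. It is not the MATLAB implementation, whose neighborhood sizes are set in the linked function.

```python
from itertools import product

def window(n, c, r):
    """Indices within distance r of c, clipped to [0, n)."""
    return range(max(0, c - r), min(n, c + r + 1))

def local_energy(cos_resp, sin_resp):
    """SUM1-style energy from a quadrature pair of Gabor responses."""
    return [[c * c + s * s for c, s in zip(cr, sr)]
            for cr, sr in zip(cos_resp, sin_resp)]

def max1(sum1, half_width=1, half_orient=1):
    """MAX1: local max over nearby positions and orientations (wrapping).
    sum1[o][y][x] is the SUM1 map for orientation index o."""
    n_o, h, w = len(sum1), len(sum1[0]), len(sum1[0][0])
    return [[[max(sum1[(o + do) % n_o][yy][xx]
                  for do in range(-half_orient, half_orient + 1)
                  for yy in window(h, y, half_width)
                  for xx in window(w, x, half_width))
              for x in range(w)] for y in range(h)] for o in range(n_o)]

def sum2(max1_maps, elements, weights, step=2):
    """SUM2: weighted sum of MAX1 values at one transformed motif's
    elements, evaluated on a grid subsampled by `step`.
    elements: integer (dx, dy, o) offsets from the motif center."""
    h, w = len(max1_maps[0]), len(max1_maps[0][0])
    return [[sum(lam * max1_maps[o][cy + dy][cx + dx]
                 for (dx, dy, o), lam in zip(elements, weights)
                 if 0 <= cy + dy < h and 0 <= cx + dx < w)
             for cx in range(0, w, step)] for cy in range(0, h, step)]

def max2(sum2_maps, key, half_width=1):
    """MAX2 at transform key (i_rot, i_scale, i_stretch): max over nearby
    positions and over neighboring transforms present in sum2_maps."""
    keys = [k for k in (tuple(a + d for a, d in zip(key, delta))
                        for delta in product((-1, 0, 1), repeat=3))
            if k in sum2_maps]
    h, w = len(sum2_maps[key]), len(sum2_maps[key][0])
    return [[max(sum2_maps[k][yy][xx] for k in keys
                 for yy in window(h, y, half_width)
                 for xx in window(w, x, half_width))
             for x in range(w)] for y in range(h)]

def sum3(max2_maps, script):
    """SUM3: weighted sum of MAX2 values at each part's offset from a
    candidate object center. script: list of (map_name, dy, dx, weight)."""
    first = next(iter(max2_maps.values()))
    h, w = len(first), len(first[0])
    return [[sum(wt * max2_maps[name][cy + dy][cx + dx]
                 for name, dy, dx, wt in script
                 if 0 <= cy + dy < h and 0 <= cx + dx < w)
             for cx in range(w)] for cy in range(h)]
```

Under these assumptions the matched object center is the position with the largest SUM3 value, for example `max((v, y, x) for y, row in enumerate(s3) for x, v in enumerate(row))`. The MAX pooling is what lets each part shift a little relative to the center.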

Below we show, for one egret example, the MAX2 maps of the three shape motifs selected in the shape script, the SUM3 map and the final matching result. See this MATLAB function for details about the neighborhood sizes used for the MAX1 and MAX2 maps.

Panels: source image | MAX2 map for ellipsoid | MAX2 map for angle | MAX2 map for parallel lines | SUM3 map | matched shape script

Matching results on egret and pelican images.


Failed examples are also shown below. Some of them contain heavy background clutter. In almost all of the mismatched templates, the egret parts are out of position. Such "dislocation" is possible because we allow each part to perturb relative to the object center, and the perturbations of the parts are independent of each other. In future work, we will improve this basic model by adding pairwise constraints, so that two parts always stay close to each other while still being able to move together over a long distance relative to the object center.


Step 4. We verify that the matching score can distinguish egret/pelican from random background. We collect SUM3 scores on random natural images and on correctly detected egret/pelican instances. Below is the comparative histogram of the two populations of SUM3 scores. The simple shape script model is able to reasonably tell an egret or pelican apart from background clutter. See this MATLAB function for more details.
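The comparison of the two score populations can be sketched as histograms on shared bins. The `auc` summary below is an added statistic that the page itself does not report (the probability that a random positive score beats a random background score); `shared_histograms`, `auc` and their inputs are hypothetical, and no actual scores from the experiment are used.

```python
def shared_histograms(background, positives, num_bins=10):
    """Bin two score populations on the same equal-width bins."""
    lo = min(min(background), min(positives))
    hi = max(max(background), max(positives))
    width = (hi - lo) / num_bins or 1.0  # avoid zero width if all scores are equal

    def counts(scores):
        c = [0] * num_bins
        for s in scores:
            # The maximum score falls on the top edge; put it in the last bin.
            c[min(int((s - lo) / width), num_bins - 1)] += 1
        return c

    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts(background), counts(positives)

def auc(background, positives):
    """Chance that a random positive outscores a random background score
    (ties count 1/2); 0.5 means no separation, 1.0 means perfect."""
    wins = sum(1.0 if p > b else 0.5 if p == b else 0.0
               for p in positives for b in background)
    return wins / (len(positives) * len(background))
```

Using the same bin edges for both populations is what makes the two histograms directly comparable.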