Shape Context

Despite the lack of a perfect representation of visual objects, supervised learning has provided a natural and successful framework for studying object recognition.

Some approaches choose the representation of the object carefully, so as to obtain a rich and meaningful feature descriptor; others put more emphasis on the "learning" aspect of the recognition task and rely on more sophisticated learning techniques.

Methods for shape matching can be broadly divided into two groups:

  1. Feature-based methods extract points from the image (usually edge or corner points) and reduce the problem to point set matching. Examples:
    1. The medial axis transform, which captures the structure of the shape.
    2. The Hausdorff distance.
    3. Geometric hashing, which uses configurations of keypoints to vote for a particular shape.
  2. Brightness-based methods work with the intensity of the pixels and use the image itself as a feature descriptor.

Shape context works well with learning techniques, e.g., nearest neighbor. Other learning techniques used in object recognition include:

  1. PCA for face recognition
  2. Bayes classifiers and decision tree learning for more general object recognition tasks
  3. Boosting for feature selection
  4. SVMs for template matching, e.g., recognizing pedestrians
  5. Neural networks for digit classification

A broad class of object recognition problems has benefited from statistical learning machinery.

The basic idea of shape context:

  1. Take N samples from the edge elements on the shape.
    • The points can lie on internal or external contours. They need not correspond to keypoints such as maxima of curvature or inflection points.
  2. Compute the vectors from one reference point to all other sample points on the shape.
    • These vectors express the appearance of the entire shape relative to the reference point.
    • The set of vectors is a rich description.
    • As N gets large, the representation of the shape becomes exact.
    • Describe each vector by its Euclidean distance r and angle a. Normalize r by the median distance, and measure the angle relative to the positive x-axis.
  3. Take the log of the distances r.
    • A log-polar coordinate system makes the histogram distinguish more finely among nearby points than among points farther away.
  4. For each reference point, count the number of points that lie in each bin.
    • The result is a coarse histogram of the relative coordinates of the remaining points.
    • The reference orientation can be absolute or relative to a given axis, depending on the problem setting.
  5. Each shape context is thus a log-polar histogram of the coordinates of the N-1 other points, measured from the reference point as origin.
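The steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the original implementation: the function name `shape_contexts`, the bin counts (5 radial, 12 angular), and the radial range `r_inner` to `r_outer` are assumptions chosen for the example, and the median is taken over all pairwise distances. It also assumes at least two distinct sample points.

```python
import math
from statistics import median


def shape_contexts(points, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Return one flattened log-polar histogram (n_r * n_theta bins) per point.

    All parameter defaults are illustrative assumptions.
    """
    # Pairwise Euclidean distances between the sample points.
    dists = [[math.dist(p, q) for q in points] for p in points]
    # Normalize by the median pairwise distance (gives scale invariance).
    med = median(d for i, row in enumerate(dists)
                 for j, d in enumerate(row) if i != j)
    # Radial bin edges are uniform in log space.
    log_lo, log_hi = math.log(r_inner), math.log(r_outer)

    histograms = []
    for i, (xi, yi) in enumerate(points):
        hist = [0] * (n_r * n_theta)
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue  # a point does not vote in its own histogram
            r = dists[i][j] / med
            if r > r_outer:
                continue  # too far away: outside the outermost ring
            # Points closer than r_inner are clamped into the innermost ring.
            frac = (math.log(max(r, r_inner)) - log_lo) / (log_hi - log_lo)
            r_bin = min(int(frac * n_r), n_r - 1)
            # Angle measured relative to the positive x-axis, in [0, 2*pi).
            theta = math.atan2(yj - yi, xj - xi) % (2 * math.pi)
            t_bin = min(int(theta / (2 * math.pi) * n_theta), n_theta - 1)
            hist[r_bin * n_theta + t_bin] += 1
        histograms.append(hist)
    return histograms
```

Because the vectors are relative to the reference point and the distances are divided by the median, shifting or uniformly scaling the input points leaves the histograms unchanged in this sketch.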

Shape context encodes the density of boundary points at various distances and angles. It may also be desirable to include an additional feature that accounts for the local appearance of the reference point itself, such as:

    • local orientation
    • vectors of filter outputs
    • color histograms

Matching Shape Context

How can we assign the sample points of shape P to correspond to those of shape Q?

  • Corresponding points have very similar descriptors.
    • Matching cost = weighted shape context distance + weighted local appearance dissimilarity.
    • The shape context distance is measured between the two normalized histograms.
    • The local appearance term is the dissimilarity of the tangent angles.
    • The dissimilarity between two shapes can be computed as the sum of matching errors between the corresponding points, together with a term measuring the magnitude of the aligning transform.
    • Modelling the transform: given a set of correspondences, estimate a transformation that maps the model onto the target, e.g., Euclidean, affine, or thin plate spline.
  • The correspondences are unique (one-to-one).
  • Bipartite graph matching is used to find the correspondences.
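The matching step can be illustrated with a small sketch. Several choices here are assumptions rather than something stated in these notes: the chi-squared statistic is used as the histogram distance, the tangent-angle dissimilarity is taken as 0.5 * (1 - cos(difference)), `beta` is a hypothetical weight, and the one-to-one assignment is found by brute force over all permutations, which is only feasible for a handful of points. The histograms are assumed to be non-empty.

```python
import math
from itertools import permutations


def chi2_cost(g, h):
    """Chi-squared distance between two histograms after normalization."""
    sg, sh = sum(g), sum(h)
    total = 0.0
    for a, b in zip(g, h):
        a, b = a / sg, b / sh
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def tangent_cost(angle_p, angle_q):
    """Dissimilarity of two tangent angles, in [0, 1] (assumed form)."""
    return 0.5 * (1.0 - math.cos(angle_p - angle_q))


def matching_cost(g, h, angle_p, angle_q, beta=0.1):
    """Weighted sum of shape context distance and local appearance term."""
    return (1.0 - beta) * chi2_cost(g, h) + beta * tangent_cost(angle_p, angle_q)


def best_assignment(cost):
    """Brute-force one-to-one matching for a small square cost matrix.

    Returns (perm, total), where point i of P is matched to point perm[i] of Q.
    """
    n = len(cost)

    def total(perm):
        return sum(cost[i][perm[i]] for i in range(n))

    best = min(permutations(range(n)), key=total)
    return list(best), total(best)
```

For realistic numbers of sample points, the brute-force search would be replaced by a polynomial-time assignment solver such as the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`).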

Classification based on a fixed shape

The nearest neighbor classifier effectively weighs the information at each sample point equally. Yet usually some parts of the shape are better than others at telling two classes apart.

Given a dissimilarity measure, a k-NN technique can be used for object classification/recognition.
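As a minimal sketch of that idea, the classifier below takes any dissimilarity function as an argument. The names `knn_classify`, `prototypes`, and `dissimilarity` are hypothetical; in the setting of these notes, `dissimilarity` would be the shape dissimilarity computed from the matched shape contexts.

```python
from collections import Counter


def knn_classify(query, prototypes, dissimilarity, k=3):
    """Label `query` by majority vote among its k most similar prototypes.

    `prototypes` is a list of (shape, label) pairs, and
    `dissimilarity(a, b)` returns a non-negative number.
    """
    # Rank the stored shapes from most to least similar to the query.
    ranked = sorted(prototypes, key=lambda item: dissimilarity(query, item[0]))
    # Let the k nearest prototypes vote on the label.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

With k=1 this reduces to the plain nearest neighbor classifier.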


Shape context incorporates invariance to:

  • Translation
  • Scale
  • Rotation
  • Occlusion


Limitations:

  1. Sensitive to local distortion or blurred edges
  2. Problems with cluttered backgrounds


Applications:

  • Digit recognition
  • Silhouette similarity-based retrieval
  • 3D object recognition
  • Trademark retrieval

Database for evaluation

  • MNIST dataset of handwritten digits: 60000 training and 10000 testing digits
  • MPEG-7 shape silhouette database: core experiment CE-Shape-1 part B, 1400 images in 70 shape classes, 20 images per class
  • COIL-20 database for 3D recognition: 20 common household objects, each rotated in 5-degree steps for a total of 72 views per object
  • Trademark database: 300 different real-world trademarks




