shape context 【matching】theory from wiki

via wikipedia

The shape context is intended to be a way of describing shapes that allows for measuring shape similarity and the recovering of point correspondences.

The basic idea is to pick points on the contours of a shape. For each point pi on the shape, consider the n − 1 vectors obtained by connecting pi to all other points. The set of all these vectors is a rich description of the shape localized at that point but is far too detailed.

The key idea is that the distribution over relative positions is a robust, compact, and highly discriminative descriptor. So, for the point pi, the coarse histogram of the relative coordinates of the remaining n − 1 points,

h_i(k) = \#\{q \ne p_i  :  (q - p_i) \in \mbox{bin}(k)\}

is defined to be the shape context of p_i.

The bins are normally taken to be uniform in log-polar space.

The fact that the shape context is a rich and discriminative descriptor can be seen in the figure below, in which the shape contexts of two different versions of the letter “A” are shown.


Now in order for a feature descriptor to be useful, it needs to have certain invariances. In particular it needs to be invariant to translation, scale, small perturbations, and depending on application rotation.

  • Translational invariance come naturally to shape context.
  • Scale invariance is obtained by normalizing all radial distances by the mean distance \alpha between all the point pairs in the shape  although the median distance can also be used.
  • Shape contexts are empirically demonstrated to be robust to deformations, noise, and outliers[4] using synthetic point set matching experiments.

One can provide complete rotation invariance in shape contexts.

  • One way is to measure angles at each point relative to the direction of the tangent at that point (since the points are chosen on edges).
  • This results in a completely rotationally invariant descriptor.
  • But of course this is not always desired since some local features lose their discriminative power if not measured relative to the same frame.
  • Many applications in fact forbid rotation invariance

A complete system that uses shape contexts for shape matching consists of the following steps:

  1. Randomly select a set of points that lie on the edges of a known shape and another set of points on an unknown shape.
  2. Compute the shape context of each point found in step 1.
  3. Match each point from the known shape to a point on an unknown shape.
    • To minimize the cost of matching, first choose a transformation (e.g. affinethin plate spline, etc.) that warps the edges of the known shape to the unknown (essentially aligning the two shapes).
    • Then select the point on the unknown shape that most closely corresponds to each warped point on the known shape.
    • Consider two points p and q that have normalized K-bin histograms (i.e. shape contexts) g(k) and h(k):
      •  As shape contexts are distributions represented as histograms, it is natural to use the χ2 test statistic as the “shape context cost” of matching the two points:C_S = \frac{1}{2}\sum_{k=1}^K \frac{[g(k) - h(k)]^2}{g(k) + h(k)}, the values of this range from 0 to 1.
      • an extra cost based on the appearance can be added: For instance, it could be a measure of tangent angle dissimilarity (particularly useful in digit recognition):
        C_A = \frac{1}{2}\begin{Vmatrix}
 \dbinom{\cos(\theta_1)}{\sin(\theta_1)} - \dbinom{\cos(\theta_2)}{\sin(\theta_2)}
      • Now the total cost of matching the two points could be a weighted-sum of the two costs:C = (1 - \beta)C_S + \beta C_A\!\,
      • for each point pi on the first shape and a point qj on the second shape, calculate the cost as described and call it Ci,j. This is the cost matrix.
  4. Calculate the “shape distance” between each pair of points on the two shapes.
    • Use a weighted sum of the shape context distance, the image appearance distance, and the bending energy (a measure of how much transformation is required to bring the two shapes into alignment).
    • a transformation T : \Bbb{R}^2 \to \Bbb{R}^2 can be estimated to map any point from one shape to the other, There are several choices for this transformation:
      • Affine

        The affine model is a standard choice: T(p) = Ap + o\!.

      • The thin plate spline (TPS) model is the most widely used model for transformations when working with shape contexts. A 2D transformation can be separated into two TPS function to model a coordinate transform:
         T(x,y) = \left (f_x(x,y),f_y(x,y)\right )
    • Now, a shape distance between two shapes P\! and Q\!. This distance is going to be a weighted sum of three potential terms:
      • Shape context distance: this is the symmetric sum of shape context matching costs over best matching points:
        D_{sc}(P,Q) = \frac{1}{n}\sum_{p \in P} \arg \underset{q \in Q}{\min} C(p,T(q)) + \frac{1}{m}\sum_{q \in Q} \arg \underset{p \in P}{\min} C(p,T(q))

        where T(·) is the estimated TPS transform that maps the points in Q to those in P.

      • Appearance cost: After establishing image correspondences and properly warping one image to match the other, one can define an appearance cost as the sum of squared brightness differences in Gaussian windows around corresponding image points:
        D_{ac}(P,Q) = \frac{1}{n}\sum_{i=1}^n\sum_{\Delta \in Z^2} G(\Delta)\left [I_P(p_i + \Delta) - I_Q(T(q_{\pi(i)}) + \Delta)\right ]^2

        where I_P\! and I_Q\! are the gray-level images (I_Q\! is the image after warping) and G\! is a Gaussian windowing function.

      • Transformation cost: The final cost D_{be}(P,Q)\!\, measures how much transformation is necessary to bring the two images into alignment. In the case of TPS, it is assigned to be the bending energy.
  5. To identify the unknown shape, use nearest-neighbor classifier to compare its shape distance to shape distances of known objects.
    • Now that we have a way of calculating the distance between two shapes, we can use a nearest neighbor classifier (k-NN) with distance defined as the shape distance calculated here.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s




Just another site

Jing's Blog

Just another site

Start from here......







Just another site

Where On Earth Is Waldo?

A Project By Melanie Coles

the Serious Computer Vision Blog

A blog about computer vision and serious stuff

Cauthy's Blog

paper review...

Cornell Computer Vision Seminar Blog

Blog for CS 7670 - Special Topics in Computer Vision


Life through nerd-colored glasses

Luciana Haill

Brainwaves Augmenting Consciousness



Dr Paul Tennent

and the university of nottingham

turn off the lights, please

A bunch of random, thinned and stateless thoughts around the Web

%d bloggers like this: