Keypoint Matching

Keypoint extractors are visual modules that detect salient and highly distinguishable points and/or regions in an image.

Keypoint matching between two images, referred to as A and B, works as follows:

1. Keypoints are found in A.
2. Keypoints are found in B.
3. The closest matching keypoints are found between the two images.

Each keypoint is described by a 128-dimensional feature vector (in the case of SIFT).

1. Euclidean distance

The distance between two keypoints is the Euclidean distance between their feature vectors. Each keypoint in A is compared with every keypoint in B by Euclidean distance to find its closest match.
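This exhaustive comparison can be sketched in a few lines of NumPy (toy descriptors stand in for real keypoint features; this is illustrative, not any particular library's API):

```python
import numpy as np

def match_nearest(desc_a, desc_b):
    """For each descriptor in A, return the index of the closest descriptor
    in B by Euclidean distance (exhaustive comparison)."""
    diffs = desc_a[:, None, :] - desc_b[None, :, :]   # shape (|A|, |B|, 128)
    dists = np.sqrt((diffs ** 2).sum(axis=2))         # pairwise distances
    return dists.argmin(axis=1)

# Toy 128-dimensional descriptors; b[0] is an exact copy of a[3],
# so keypoint 3 in A should map to index 0 in B.
rng = np.random.default_rng(0)
a = rng.random((5, 128))
b = np.vstack([a[3], rng.random((4, 128))])
print(match_nearest(a, b)[3])  # prints 0
```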

A match is determined as follows:

Let a’ be a keypoint in A. Let b’ and b” be the closest and second-closest matching keypoints in B to a’. Let D(x, y) denote the Euclidean distance between x and y.

If D(a’, b’) < 0.6 · D(a’, b”), then b’ is accepted as the match for a’. Otherwise, no match is recorded (no matching keypoint).

According to Lowe, merely taking the closest matching keypoint does not yield a reliable match: the closest match may still be a poor one. A more reliable criterion also considers the second-closest matching keypoint, using its distance as a threshold, so a match is accepted only when the best candidate is clearly closer than the runner-up.
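A minimal sketch of this ratio test with toy NumPy descriptors (the 0.6 threshold comes from the text above; the function name is invented for illustration):

```python
import numpy as np

def ratio_test_match(desc_a, desc_b, ratio=0.6):
    """Accept a keypoint's nearest neighbour in B only if it is clearly
    closer than the second-nearest; otherwise record no match."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        nearest, second = np.argsort(dists)[:2]
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, int(nearest)))
    return matches

# b[0] is a near-duplicate of a[0] (a confident match), while b[1] and b[2]
# are identical copies of a[1], so a[1]'s best match is ambiguous and rejected.
rng = np.random.default_rng(1)
a = rng.random((3, 128))
b = np.vstack([a[0] + 0.001, a[1], a[1], rng.random((4, 128))])
matches = ratio_test_match(a, b)   # keeps (0, 0); no match for a[1]
```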

2. Iteratively finding matched key features against all database members – Random Sample Consensus (RANSAC) affine matching

The algorithm estimates the best affine transformation by repeatedly sampling the matched key features, and returns the features that lie near enough to the fitted transformation.

The thresholds are heuristic – the returned values need only be “good enough” matches.

In practice, the most perceptually adequate results came from returning the database image with the greatest number of matched features.
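A bare-bones RANSAC affine loop in NumPy may make the sampling idea concrete (the iteration count and pixel threshold here are illustrative assumptions, not values from any particular implementation):

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares affine transform mapping src -> dst: solves
    [x y 1] @ M = [x' y'] for the 3x2 matrix M."""
    A = np.hstack([src, np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M

def ransac_affine(src, dst, n_iters=200, thresh=3.0, seed=0):
    """RANSAC: repeatedly fit an affine transform to 3 random correspondences
    and keep the model with the most "good enough" matches (inliers)."""
    rng = np.random.default_rng(seed)
    ones = np.ones((len(src), 1))
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), size=3, replace=False)
        M = estimate_affine(src[idx], dst[idx])
        err = np.linalg.norm(np.hstack([src, ones]) @ M - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```

Keypoint matches surviving the distance-based stage would be fed in as src/dst coordinate pairs; the boolean mask marks the matches consistent with the winning transformation.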

SIFT requires a Euclidean-distance-based matcher (such as FLANN), whereas FREAK and other binary descriptors require a Hamming-distance-based matcher.
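For binary descriptors, the Hamming distance is simply the number of differing bits. A minimal NumPy sketch (storing descriptors as packed uint8 bytes is an assumption about the format):

```python
import numpy as np

def hamming(d1, d2):
    """Hamming distance between two binary descriptors stored as packed
    uint8 arrays: XOR the bytes, then count the differing bits."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())

# Two toy 1-byte descriptors differing in exactly two bit positions.
d1 = np.array([0b10110000], dtype=np.uint8)
d2 = np.array([0b10010001], dtype=np.uint8)
print(hamming(d1, d2))  # prints 2
```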

SIFT and SURF features consist of two parts: the detector and the descriptor.

  • The detector finds a point in some n-dimensional space (4D for SIFT),
  • the descriptor robustly describes the surroundings of that point.
  • The latter is increasingly used for image categorization and identification in what is commonly known as the “bag of words” or “visual words” approach.
  • In its simplest form, one can collect all descriptors from all images and cluster them, for example with k-means.
  • Every original image then contributes descriptors to a number of clusters. The centroids of those clusters, i.e. the visual words, can be used as a new descriptor for the image. The VLFeat website contains a nice demo of this approach, which uses VLFeat to train and test an image classifier on the Caltech-101 dataset:

The classifier achieves 65% average accuracy using a single feature type and 15 training images per class. It uses:

    • PHOW features (dense multi-scale SIFT descriptors)
    • Elkan k-means for fast visual word dictionary construction
    • Spatial histograms as image descriptors
    • A homogeneous kernel map to transform a Chi2 support vector machine (SVM) into a linear one
    • An internal SVM (based on PEGASOS) for classification

The program is fully contained in a single MATLAB M-file and can easily be adapted to use your own data (change conf.calDir).
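The vocabulary-building and histogram steps of the bag-of-words pipeline can be sketched with a plain Lloyd's k-means in NumPy (a toy stand-in for the Elkan k-means the demo uses; dimensions and cluster counts here are illustrative):

```python
import numpy as np

def kmeans(data, k, n_iters=20, seed=0):
    """Plain Lloyd's k-means: returns k cluster centroids (the "visual words"),
    initialized from randomly chosen data points."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each descriptor to its nearest centroid, then re-estimate.
        d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = data[labels == j].mean(axis=0)
    return centroids

def bow_histogram(descriptors, vocabulary):
    """Represent one image as a normalized histogram of visual-word counts."""
    d = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()
```

Each image would then be represented by the histogram of its own descriptors against the shared vocabulary, and those histograms fed to the SVM.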

