【paper】Feature Point Matching for CBIR

via this paper

Feature points are a set of points with sharp variations of luminance; they combine characteristics of the color, texture, and shape in an image.

  • The RGB color space of an image is first transformed into luminance and chrominance components; only the luminance component is used in the feature point extraction process.
  1. The image is decomposed into a multi-resolution representation using the discrete wavelet transform, yielding 4 subimages.
  2. Row and column low-pass filtering produces the smoothed, down-sampled subimage: the low-low image, plus the low-high, high-low, and high-high detail images.
  3. The decomposition level j gives the scale of the decomposition process.
  4. Multiscale edge detection: points with sharp variations of luminance are detected from the modulus of the wavelet gradient vectors.
  5. Points with gradient values larger than a predefined threshold are retained as possible feature points.
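The extraction steps above can be sketched with a one-level Haar DWT built directly from 2×2 blocks. The paper does not name its wavelet, so Haar is an assumption here, as is the threshold value:

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar DWT via 2x2 block filtering.
    Returns (LL, LH, HL, HH) subimages at half resolution."""
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # row and column low-pass: smoothed image
    lh = (a + b - c - d) / 4.0   # vertical detail (low-high)
    hl = (a - b + c - d) / 4.0   # horizontal detail (high-low)
    hh = (a - b - c + d) / 4.0   # diagonal detail (high-high)
    return ll, lh, hl, hh

def candidate_feature_points(img, threshold):
    """Retain points whose wavelet-gradient modulus exceeds the threshold."""
    ll, lh, hl, hh = haar_dwt2(np.asarray(img, dtype=float))
    modulus = np.sqrt(lh**2 + hl**2)   # modulus of the wavelet gradient vector
    ys, xs = np.nonzero(modulus > threshold)
    return list(zip(ys, xs)), modulus
```

Running this on a synthetic step edge keeps only the points along the edge, since the gradient modulus is zero in flat regions.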

edge correlation – reduces the effect of noise (which would otherwise increase the computational cost and degrade the matching accuracy). By multiplying the gradient values over multiple scales, noise is suppressed while sharp feature points are sharpened and retained. Feature points whose normalized edge correlation value is larger than the corresponding gradient value are kept.
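A minimal sketch of the cross-scale product, assuming gradient-modulus maps already resampled to a common resolution. The normalization used here (matching the energy of the finest scale) is an assumption, not the paper's exact formula:

```python
import numpy as np

def edge_correlation(moduli):
    """Multiply wavelet-gradient moduli over consecutive scales.
    `moduli` is a list of gradient-modulus maps at the same resolution;
    noise responses decay across scales while true edges persist,
    so the product suppresses noise and sharpens real feature points."""
    corr = np.ones_like(moduli[0])
    for m in moduli:
        corr *= m
    return corr

def keep_correlated_points(moduli):
    """Retain points whose normalized edge correlation exceeds the
    gradient modulus at the finest scale (one plausible normalization;
    the paper's exact formulation may differ)."""
    corr = edge_correlation(moduli)
    m0 = moduli[0]
    # normalize the product to the energy range of the finest scale
    scale = np.sqrt(np.sum(m0**2) / (np.sum(corr**2) + 1e-12))
    return (corr * scale) > m0
```

A point that responds at only one scale (noise) gets a zero product and is dropped; a point that persists across scales survives.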

Then check whether the gradient value of each retained point is the local maximum within a window: an n×n window is centered at each feature point, and the point is retained only if its gradient value is the local maximum within that n×n window.
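The window check can be sketched as a straightforward non-maximum test over an n×n neighborhood:

```python
import numpy as np

def local_maximum_filter(points, modulus, n=3):
    """Keep a candidate feature point only if its gradient value is the
    local maximum inside an n x n window centered on it."""
    r = n // 2
    h, w = modulus.shape
    kept = []
    for (y, x) in points:
        y0, y1 = max(0, y - r), min(h, y + r + 1)   # clip window to image
        x0, x1 = max(0, x - r), min(w, x + r + 1)
        if modulus[y, x] >= modulus[y0:y1, x0:x1].max():
            kept.append((y, x))
    return kept
```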

Matching algorithm

1. Finding all possible matching pairs

Assume I1 and I2 are two images, and N1, N2 represent the numbers of feature points in FP1 and FP2.

FP1 = {pm(x, y), m = 1, 2, …, N1}

FP2 = {qn(x, y), n = 1, 2, …, N2}

The normalized cross-correlation of any two feature points p and q is used as the matching metric: for each feature point p, the cross-correlation between an n×n matching window centered at p and one centered at q is calculated and normalized to the range -1 to 1.
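A direct implementation of the normalized cross-correlation between two windows:

```python
import numpy as np

def ncc(w1, w2):
    """Normalized cross-correlation of two equal-size windows.
    The result lies in [-1, 1]; 1 means identical up to an affine
    change of brightness and contrast, -1 means inverted."""
    a = w1 - w1.mean()
    b = w2 - w2.mean()
    denom = np.sqrt((a**2).sum() * (b**2).sum())
    if denom == 0:
        return 0.0   # flat window: correlation is undefined, report 0
    return float((a * b).sum() / denom)
```

Because the windows are mean-subtracted and scaled, the metric is invariant to linear brightness changes, which is why it suits window matching across images.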



For each feature point in I2, the best matching point in I1 is found. E.g., to find the matching point of q1, the normalized cross-correlations between q1 and each pi are calculated, and the feature point with the highest correlation value with q1 is taken as the matching point of q1.

  • If more than one point qn is matched to the same point pm, only the matching pair (pm, qn) with the highest correlation value is retained; all the others are removed.
  • Finally, matching pairs with a normalized cross-correlation smaller than a threshold are eliminated.
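The three matching steps above can be sketched from a precomputed correlation matrix; the 0.8 threshold is an assumed value:

```python
import numpy as np

def match_feature_points(corr, threshold=0.8):
    """Greedy matching from a correlation matrix.
    corr[m, n] = NCC between feature point p_m of I1 and q_n of I2.
    Step 1: each q_n picks its best p_m.  Step 2: if several q_n map
    to the same p_m, keep only the pair with the highest correlation.
    Step 3: drop pairs whose correlation falls below the threshold."""
    best_p = corr.argmax(axis=0)          # best p_m for each q_n
    pairs = {}                            # p index -> (q index, correlation)
    for n, m in enumerate(best_p):
        v = corr[m, n]
        if v < threshold:
            continue                      # step 3: too weak to keep
        if m not in pairs or v > pairs[m][1]:
            pairs[m] = (n, v)             # step 2: keep the strongest claim
    return sorted((m, n) for m, (n, _) in pairs.items())
```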

2. Grouping the matching pairs

The matching pairs between I1 and I2: MP(I1, I2) = {(pi, qi), i = 1, 2, …, N},

where N represents the number of matching pairs.

The translation vector of the matching pair (pi, qi) can be computed as Ti = qi - pi.

  • Two matching pairs (pm, qm) and (pn, qn) are considered to be consistent if the difference of their translation vectors is less than a threshold.
  • The set of consistent matching pairs forms a group, that is, a group represents an area of similar content in the two images.
    1. The matching pairs are stored in a list, sorted by position from left to right and top to bottom.



(p1, q1) and (p2, q2) are determined to be in the same group because they have similar translation vectors.
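One plausible reading of the grouping step, sketched as a single-link scan over translation vectors (the paper's exact grouping procedure may differ); the distance threshold is an assumed value:

```python
import numpy as np

def group_matching_pairs(pairs, threshold=5.0):
    """Group matching pairs with consistent translation vectors.
    `pairs` is a list of ((px, py), (qx, qy)); two pairs belong to the
    same group when their translation vectors T = q - p differ by less
    than the threshold.  Each group then represents an area of similar
    content in the two images."""
    groups = []          # list of lists of pair indices
    reps = []            # representative translation vector per group
    for i, (p, q) in enumerate(pairs):
        t = np.subtract(q, p)
        for g, r in zip(groups, reps):
            if np.linalg.norm(t - r) < threshold:
                g.append(i)              # consistent with this group
                break
        else:
            groups.append([i])           # start a new group
            reps.append(t.astype(float))
    return groups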

3. Similarity measures

The similarity degree between two images can be measured in two ways, depending on whether the spatial relationship of the corresponding groups is taken into account.

  • Ignoring the spatial relationship of the corresponding groups: the number of matching pairs between I and Q divided by the number of feature points in Q.



  • Considering the spatial relationship of the corresponding groups: the displacement of each group is the average length of its translation vectors.



Ng – the number of groups

Li and Mi – the displacement and the number of matching pairs in group Gi

Fq – the number of feature points of Q

MaxL – the maximal possible length of a translation vector; if the image size is m×n, then MaxL = sqrt(m² + n²)
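The first measure follows directly from the text. Since the paper's second formula is not reproduced in these notes, the spatial version below is only one plausible combination of the quantities listed above, with each group's contribution down-weighted by its displacement:

```python
import math

def similarity_simple(num_matching_pairs, fq):
    """Measure 1 (as stated): matching pairs between I and Q divided
    by the number of feature points Fq in Q."""
    return num_matching_pairs / fq

def similarity_spatial(groups, fq, image_size):
    """Measure 2, spatial version -- an assumed formula, not the
    paper's: each group G_i contributes its pair count M_i weighted by
    how small its displacement L_i is relative to MaxL."""
    m, n = image_size
    max_l = math.sqrt(m**2 + n**2)     # maximal possible translation length
    total = 0.0
    for li, mi in groups:              # (displacement L_i, pair count M_i)
        total += mi * (1.0 - li / max_l)
    return total / fq
```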










4. Evaluation 

  • precision (number of retrieved images relevant to the query / number of retrieved images) – what percentage of the retrieved images is actually relevant to the query
  • recall (number of retrieved images relevant to the query / number of images in the database relevant to the query) – what percentage of the images relevant to the query is retrieved
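Both metrics, computed from sets of image ids:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query, given the set of retrieved
    image ids and the set of ids relevant to the query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```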















































