Match corners between images

What is interest point detection?

  • Visually ‘salient’ features.
  • Localized in 2D.
  • Sparse.
  • High ‘information’ content.
  • Repeatable between images.

Extract a small square of pixels (e.g. 11×11) around each FAST interest point and treat it as a vector. Match two points by taking the norm of the difference between their patch vectors. Then compare every point in the first image against every point in the second to find the best match.
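A minimal pure-Python sketch of this patch-as-vector matching (the function names and the use of SSD as the dissimilarity norm are illustrative choices, not from the original):

```python
def extract_patch(img, cx, cy, size=11):
    """Flatten a size x size square around (cx, cy) into a vector.
    img is a list of rows of grey values; the caller must ensure the
    patch stays inside the image."""
    r = size // 2
    return [img[y][x] for y in range(cy - r, cy + r + 1)
                      for x in range(cx - r, cx + r + 1)]

def ssd(a, b):
    """Squared norm of the difference between two patch vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def match(desc1, desc2):
    """Brute force: for each descriptor from the first image, find the
    index of the lowest-SSD descriptor in the second image."""
    return [min(range(len(desc2)), key=lambda j: ssd(d, desc2[j]))
            for d in desc1]
```

A real implementation would vectorise this, but the brute-force structure (every point in the first image against every point in the second) is the same.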

If you wish to build your own FAST detector (e.g. trained on your own data, targeting another language, or using some new optimizations), then the FAST-ER code provides programs for training new FAST-N detectors as well as FAST-ER detectors.

Quantised Patches:

  • Sparsely sample 8×8 patches around corners
  • Quantise to 5 levels, relative to mean and standard deviation of samples
  • Use independent features from different scales and orientations
  • Matching problem is simplified; however, lots of features will be needed to cover the range of views to be matched
  • 252 viewpoint bins (each covering 10 degrees of rotation, a scale reduction of 0.8, and up to 30 degrees of out-of-plane view); around 50 features from each viewpoint, so around 13,000 features per target
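A sketch of the quantisation step in plain Python. The exact bin edges are not given above, so the thresholds at ±0.5σ and ±1.5σ here are an assumption:

```python
import math

def quantise_patch(samples):
    """Quantise sparsely sampled patch values to five levels relative
    to the mean and standard deviation of the samples.  The 0.5 and
    1.5 sigma thresholds are illustrative, not from the original."""
    n = len(samples)
    mean = sum(samples) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / n) or 1.0
    def level(s):
        z = (s - mean) / sd
        if z < -1.5: return 0
        if z < -0.5: return 1
        if z < 0.5:  return 2
        if z < 1.5:  return 3
        return 4
    return [level(s) for s in samples]
```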

Combining Quantised Patches

Combine quantised patches from different images where an interest point is detected nearby

The radius you need depends on the scale of the features, rather than the size of the image: if the features are very blurry, you will need a bigger ring. The easiest and most efficient way to handle this is to subsample the image, e.g. by taking 2×2 squares and averaging the pixels inside each to make a single output pixel.
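The 2×2 averaging step can be sketched as:

```python
def half_sample(img):
    """Subsample an image by averaging each 2x2 block of pixels into a
    single output pixel; img is a list of rows, and any odd trailing
    row/column is dropped."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2*y][2*x] + img[2*y][2*x + 1] +
              img[2*y + 1][2*x] + img[2*y + 1][2*x + 1]) / 4.0
             for x in range(w)] for y in range(h)]
```

Applying this repeatedly gives a coarse-to-fine pyramid, so a fixed-radius detector ring covers proportionally more of a blurry feature at each level.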

Histograms quantised to binary representation

FAST is used on greyscale images. The proper way to convert to grey is to use the CIE weightings. The easiest/quickest way is to use the green channel, which is not a bad approximation of the CIE weightings.
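Both conversions in one sketch; the 0.299/0.587/0.114 luma weights used here are one common weighting choice standing in for "the CIE weightings" above:

```python
def to_grey(r, g, b, use_weights=True):
    """Convert one RGB pixel to grey: either a weighted sum of the
    channels, or the cheap approximation of just taking green."""
    return 0.299 * r + 0.587 * g + 0.114 * b if use_weights else g
```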

FAST, extract patches, matching, pose

Matching speed is of key importance in real-time vision applications

  • Frame-to-frame tracking can be efficient, but requires initialisation
  • Fast localisation methods are needed
  • Local features
    • Naturally handle partial occlusion and some incorrect correspondences
    • Represent a target as a small set of key features (~100s)
    • Attempt to identify and match key features in any view of the target
    • Existing local feature approaches:
      • Descriptor-based, e.g. SIFT: factor out as much variation as possible; soft-binned histograms
      • Classification-based, e.g. Ferns: train classifiers on different views of the same feature; lower runtime computational cost, but high memory usage

Just require features to match under small viewpoint variations; this simplifies the matching problem.

Independent sets of features can handle large viewpoint change

Classification-based (runtime speed is key)

Desired runtime operations:
– FAST-9 Corner Detection
– Simple “descriptor”
– Efficient dissimilarity score computation
– (PROSAC for pose estimation)

Shape Context algorithm

I was trying to achieve rotation invariance for Shape Context.
The general approach for Shape Context is:
  • Compute the distances and angles between each pair of interest points in a given image.
  • Bin these values into a histogram according to the ranges they fall into.
You do this for both a reference image and a test image.
  • To match two different images, use a chi-square function to estimate a "cost" between each possible pair of points in the two sets of histograms.
  • Finally, use an optimisation technique such as the Hungarian algorithm to find the optimal assignment of points, then sum up the total cost, which will be lower for good matches.
They say that to make the above approach rotation invariant,
you need to calculate each angle between each pair of points using the tangent vector as the x-axis (i.e. page 513).
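The tangent-frame measurement can be sketched as follows (relative_angle is a hypothetical helper; the tangent angle itself would come from the edge direction at the reference point):

```python
import math

def relative_angle(p, q, tangent_angle):
    """Angle of the vector p -> q measured with the local tangent at p
    as the x-axis, which makes the angle binning rotation invariant."""
    theta = math.atan2(q[1] - p[1], q[0] - p[0]) - tangent_angle
    return theta % (2 * math.pi)
```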

Learning grasping points with shape context

Color-shape context for object recognition

Advanced shape context for plant species identification using leaf image retrieval

Shape Context

Despite the lack of a perfect representation of visual objects, supervised learning has provided a natural and successful framework for studying object recognition.

Some make careful choices of the representation of the object, so as to obtain a rich and meaningful feature descriptor; others put more emphasis on the “learning” aspect of the recognition task, making use of more sophisticated learning techniques.

Many shape matching methods can be divided into two groups:

  1. Feature-based methods extract points from the image (usually edge or corner points) and reduce the problem to point set matching.
    1. The medial axis transform captures the structure of the shape.
    2. The Hausdorff distance compares the point sets directly.
    3. Geometric hashing uses configurations of keypoints to vote for a particular shape.
  2. Brightness-based methods work with the intensity of the pixels and use the image itself as the feature descriptor.

Shape Context works well with learning techniques, e.g. nearest neighbor:

  1. PCA for face recognition
  2. Bayes classifiers and decision tree learning, for more general object recognition tasks
  3. Boosting for feature selection
  4. SVM for template matching, e.g. recognizing pedestrians
  5. Neural networks for digit classification

A broad class of object recognition problems has benefited from statistical learning machinery.

The basic idea of Shape Context:

  1. Take N samples from the edge elements of the shape.
    • The points can be on internal or external contours. They need not correspond to keypoints such as maxima of curvature or inflection points.
  2. Calculate the vectors originating from one point to all other points in the shape.
    • The vectors express the appearance of the entire shape relative to the reference point.
    • The set of vectors is a rich description.
    • As N gets large, the representation of the shape becomes exact.
    • Each vector gives a Euclidean distance r and an angle a; normalise r by the median distance and measure the angle relative to the positive x-axis.
  3. Compute the log of the r values.
    • The histogram should distinguish more finely among differences in nearby pixels, so use a log-polar coordinate system.
  4. For each origin point, count the number of points that lie in a given bin.
    • This gives a coarse histogram of the relative coordinates of the remaining points.
    • The reference orientation can be absolute or relative to a given axis, depending on the problem setting.
  5. Each shape context is a log-polar histogram of the coordinates of the N−1 other points, measured from the origin reference point.
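The steps above can be sketched as a single function. Note two simplifications: the bin edges are an illustrative choice, and the radii are normalised by the median distance from the reference point rather than the median over all pairwise distances:

```python
import math

def shape_context(points, i, n_r=5, n_theta=12):
    """Log-polar histogram of the other sample points as seen from
    points[i]: rows are log-radius bins, columns are angle bins."""
    px, py = points[i]
    dists = [math.hypot(qx - px, qy - py)
             for j, (qx, qy) in enumerate(points) if j != i]
    med = sorted(dists)[len(dists) // 2] or 1.0  # distance normaliser
    hist = [[0] * n_theta for _ in range(n_r)]
    for j, (qx, qy) in enumerate(points):
        if j == i:
            continue
        r = math.hypot(qx - px, qy - py) / med
        theta = math.atan2(qy - py, qx - px) % (2 * math.pi)
        # log-radius binning: fine near the reference point, coarse far away
        rbin = 0 if r <= 0 else min(n_r - 1,
                                    max(0, int(math.floor(math.log2(r))) + n_r // 2))
        tbin = int(theta / (2 * math.pi) * n_theta) % n_theta
        hist[rbin][tbin] += 1
    return hist
```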

Shape Context encodes a description of the density of boundary points at various distances and angles; it may be desirable to include an additional feature that accounts for the local appearance of the reference point itself:

    • local orientation
    • vectors/filters outputs
    • color histograms

Matching Shape Context

How can we assign the sample points of shape P to correspond to those of shape Q?

  • Corresponding points have very similar descriptors
    • Matching cost = weighted shape context + weighted local appearance
    • The shape context term is the distance between the two normalised histograms
    • The local appearance term is the dissimilarity of the tangent angles
    • The dissimilarity between two shapes can be computed as the sum of matching errors between the corresponding points, together with a term measuring the magnitude of the aligning transform
    • Modelling transform: given a set of correspondences, estimate a transformation that maps the model into the target, e.g. Euclidean, affine, thin plate spline, etc.
  • The correspondences are unique
  • Use bipartite matching to find the correspondence
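A toy version of the cost and correspondence steps (brute-force search over permutations stands in for the Hungarian algorithm, which computes the same minimum-cost assignment in O(n³)):

```python
from itertools import permutations

def chi_square(h1, h2):
    """Chi-square distance between two normalised histograms (flattened)."""
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(h1, h2) if a + b > 0)

def best_assignment(cost):
    """Minimum-cost one-to-one assignment of rows to columns of a
    square cost matrix, by exhaustive search (illustration only)."""
    n = len(cost)
    return min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
```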

Classification based on a fixed shape

The nearest neighbor classifier effectively weighs the information on each sample point equally. Yet, usually some parts of the shape are better at telling two classes apart.

Given a dissimilarity measure, a k-NN technique can be used for object classification/recognition


Shape Context incorporates invariance to:

  • Translation
  • Scale
  • Rotation
  • Occlusion


Limitations:

  1. Sensitive to local distortion or blurred edges
  2. Problems with cluttered backgrounds


Applications:

  • Digit recognition
  • Silhouette similarity-based retrieval
  • 3D object recognition
  • Trademark retrieval

Database for evaluation

  • MNIST dataset of handwritten digits: 60,000 training and 10,000 testing digits
  • MPEG-7 shape silhouettes: core experiment CE-Shape-1 part B, 1400 images with 70 shape classes, 20 images per class
  • COIL-20 database for 3D recognition: 20 common household objects, turned every 5 degrees for a total of 72 views per object
  • Trademark database: 300 different real-world trademarks




[Repost] Local feature descriptors

This article briefly lists the popular local feature description methods. A more comprehensive survey of image features or feature models would be welcome, e.g. HOG, Part Model, Exemplar, Sparse Coding, Local…, and so on.

In terms of pipeline stages, HOG counts as a feature, Sparse Coding as pooling, and Part Model and Exemplar as models.

Among the many local image feature descriptors, SIFT (Scale Invariant Feature Transform) is the most widely used. It was first proposed by D. Lowe in 1999 and refined by 2004, and its introduction was a milestone in the study of local image feature descriptors. Because SIFT is invariant to scale, rotation, and a certain amount of viewpoint and illumination change, and is also highly discriminative, it was quickly applied to object recognition, wide-baseline image matching, 3D reconstruction, and image retrieval. Local image feature descriptors consequently received much wider attention in computer vision, and a large number of distinctive descriptors emerged.
SURF (Speeded Up Robust Features) is an improved version of SIFT. It uses Haar wavelets to approximate the gradient operations in SIFT, together with integral images for fast computation. SURF is 3-7 times faster than SIFT, and in most cases its performance is comparable, so it has been adopted in many applications, especially those with tight runtime requirements.
DAISY is a local image feature descriptor designed for fast, dense feature extraction. Its core idea is the same as SIFT's: pool gradient orientation histograms over sub-regions. The difference is that DAISY improves the pooling strategy, aggregating the gradient orientation histograms with Gaussian convolution; since Gaussian convolution can be computed quickly, descriptors can be extracted densely and efficiently. Coincidentally, this pooling strategy was shown by some researchers (Matthew Brown, Gang Hua, Simon Winder), using machine learning, to be optimal compared with several other pooling strategies (gridding in Cartesian coordinates, gridding in polar coordinates).
ASIFT (Affine SIFT) matches features by simulating the images obtained under all imaging viewpoints, and so handles viewpoint changes well, especially image matching under large viewpoint changes.
MROGH (Multi-support Region Order-based Gradient Histogram) innovates in the pooling strategy. Earlier local image feature descriptors pooled features based on the geometric positions of neighbouring points, whereas MROGH pools features based on the intensity order of the points.
BRIEF (Binary Robust Independent Elementary Features) builds a local image feature descriptor from the intensity comparisons of random point pairs within a local image neighbourhood. The resulting binary descriptor is not only fast to match but also has low memory requirements, so it has good prospects for mobile applications. In fact, the idea of describing features using intensity comparisons between pairs of neighbouring points had already appeared in SMD (ECCV'08).
Besides BRIEF, many other binary feature descriptors have been proposed in the last couple of years, such as ORB, BRISK, and FREAK. All of the above descriptors are hand-designed; some research instead tries to obtain the desired descriptors in a data-driven way using machine learning. Such descriptors include PCA-SIFT, Linear Discriminative Embedding, and LDA-Hash. Of course, there are many other feature descriptors beyond those mentioned, which will not be enumerated here.
Well-known researchers of local image feature descriptors include Krystian Mikolajczyk of the University of Surrey. While a postdoc at INRIA, he evaluated the performance of SIFT, Shape Context, PCA-SIFT, invariant moments, and several other local image descriptors in a wide-baseline matching setting; the related paper was published in PAMI in 2005, and the evaluation methodology he proposed is still the one widely used in the field.

C. Schmid of INRIA began studying local image description methods in the 1990s and is one of the pioneers of the field, although in recent years her team has been shifting its focus to applications such as large-scale image retrieval and action recognition.
Tinne Tuytelaars of the University of Leuven in Belgium is a proposer of the well-known SURF descriptor; the SURF paper received the CVIU most-cited paper award in 2011. She has written three surveys on local image feature description: "Local Invariant Feature Detectors: A Survey", "Local Image Features", and "Wide baseline matching".
Andrea Vedaldi of the University of Oxford is the initiator and main author of Vlfeat, an open-source library that includes SIFT and MSER and is widely used by researchers. Vlfeat is gradually implementing other commonly used feature descriptors.
Vincent Lepetit and Pascal Fua of EPFL in Switzerland lead a team devoted to developing fast, efficient local image feature descriptors for template matching, 3D reconstruction, virtual reality, and similar applications. Their work includes the DAISY descriptor for dense stereo matching, template matching based on Random Trees, and template matching based on Random Ferns. LDA-Hash, BRIEF, and D-BRIEF (ECCV 2012) are also their work.



