Match corners between images

What is interest point detection?

  • Visually ‘salient’ features.
  • Localized in 2D.
  • Sparse.
  • High ‘information’ content.
  • Repeatable between images.

Extract a small square of pixels (e.g. 11×11) around each FAST interest point and treat it as a vector. Match two points by computing the norm of the difference between their patches. Then compare every point in the first image to every point in the second to find the best match.
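A minimal sketch of this brute-force patch matching in NumPy; the function names are illustrative, and the use of squared L2 distance and (row, col) point coordinates are assumptions, since the post only gives the general idea:

```python
import numpy as np

def extract_patch(image, point, size=11):
    """Extract a size×size square around an interest point, flattened to a vector.
    Assumes a 2D greyscale image and (row, col) points at least size//2 from the border."""
    r = size // 2
    y, x = point
    return image[y - r:y + r + 1, x - r:x + r + 1].astype(np.float64).ravel()

def match_points(img1, pts1, img2, pts2, size=11):
    """Brute-force matching: compare every point in the first image to every
    point in the second, scoring each pair by the squared L2 norm of the
    patch difference, and return the index of the best match for each point."""
    v1 = np.stack([extract_patch(img1, p, size) for p in pts1])
    v2 = np.stack([extract_patch(img2, p, size) for p in pts2])
    d = ((v1[:, None, :] - v2[None, :, :]) ** 2).sum(axis=2)  # all pairwise distances
    return d.argmin(axis=1)
```

This is O(N·M) in the number of points, which is why the later sections care so much about cheap per-pair dissimilarity scores.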

If you wish to build your own FAST detector (e.g. trained on your own data, targeting another language, or using some new optimizations), then the FAST-ER code provides programs for training new FAST-N detectors as well as FAST-ER detectors.

Quantised Patches:

  • Sparsely sample 8×8 patches around corners
  • Quantise to 5 levels, relative to mean and standard deviation of samples
  • Use independent features from different scales and orientations
  • Matching problem is simplified; however, lots of features are needed to cover the range of views to be matched
  • 252 viewpoint bins (each spanning 10 degrees of rotation, a scale reduction of 0.8, and up to 30 degrees of out-of-plane view), with around 50 features per viewpoint, so around 13000 features per target
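The quantisation step above might look like the following sketch; the exact thresholds (here mean ± 0.5σ and mean ± 1.5σ, giving five bins) are an assumption, since the post only says five levels relative to the mean and standard deviation of the samples:

```python
import numpy as np

def quantise_patch(patch, levels=5):
    """Quantise a sampled patch to `levels` intensity levels, relative to the
    mean and standard deviation of the samples. A flat patch (zero std)
    maps everything to the middle level."""
    mu, sigma = patch.mean(), patch.std()
    if sigma == 0:
        return np.full(patch.shape, levels // 2, dtype=np.int8)
    z = (patch - mu) / sigma               # express pixels in units of std dev
    edges = np.array([-1.5, -0.5, 0.5, 1.5])  # 4 edges -> 5 bins (assumed)
    return np.digitize(z, edges).astype(np.int8)
```

Quantising relative to the patch's own statistics makes the descriptor invariant to affine changes in brightness, which is presumably the point of this step.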

Combining Quantised Patches

Combine quantised patches from different images where an interest point is detected nearby

The radius you need depends on the scale of the features, rather than the size of the image. If the features are very blurry, then you will need a bigger ring. The easiest and most efficient way to do this is to subsample the image, e.g. by taking 2×2 squares, and averaging the pixels inside to make a single output pixel.

Histograms quantised to binary representation
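One way to read this line: once a histogram is reduced to a binary string, dissimilarity between two descriptors becomes a Hamming distance (XOR, then count set bits), which is extremely cheap. This interpretation is an assumption; the post does not spell out the binary scheme:

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as Python
    integers: XOR the bits, then count the positions that differ."""
    return bin(a ^ b).count("1")
```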

FAST is used on greyscale images. The proper way is to convert to grey using the CIE weightings. The easiest/quickest way is to use the green channel, which is not a bad approximation of the CIE weightings.
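A quick sketch of both conversions; the weights 0.299/0.587/0.114 are the common Rec. 601 luma coefficients (derived from CIE primaries), which is an assumption since the post doesn't name a specific weighting:

```python
import numpy as np

def rgb_to_grey(image):
    """Weighted greyscale conversion using the Rec. 601 luma coefficients."""
    return image[..., :3] @ np.array([0.299, 0.587, 0.114])

def green_channel(image):
    """Cheap approximation: just take the green channel."""
    return image[..., 1]
```

Note that green alone already carries most of the luma weight (0.587), which is why it works acceptably in practice.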

FAST, extract patches, matching, pose

Matching speed is of key importance in real-time vision applications

  • Frame-to-frame tracking can be efficient, but requires initialisation
  • Fast localisation methods are needed
  • Local Features
    • Naturally handle partial occlusion and some incorrect correspondences
    • Represent a target as a small set of key features (~100s)
    • Attempt to identify and match key features in any view of target
    • Existing Local Feature Approaches:
      • Descriptor-based, e.g. SIFT: factor out as much variation as possible; soft-binned histograms
      • Classification-based, e.g. Ferns: train classifiers on different views of the same feature; lower runtime computational cost, but high memory usage

Only require features to match under small viewpoint variations; this simplifies the matching problem.

Independent sets of features can handle large viewpoint changes

Classification-based (runtime speed is key)

Desired runtime operations:
– FAST-9 Corner Detection
– Simple “descriptor”
– Efficient dissimilarity score computation
– (PROSAC for pose estimation)

http://mirror2image.wordpress.com/2009/01/25/fast-with-surf-descriptor/

http://fastcorner.wordpress.com/2010/09/09/fast-faq/


Shape context algorithm

I was trying to achieve rotation invariance for Shape Context.
The general approach for shape context is
  • to compute distances and angles between each set of interest points in a given image.
  • Then bin into a histogram based on whether these calculated values fall into certain ranges.
You do this for both a standard and a test image.
  • To match two different images, use a chi-square function to estimate a “cost” between each possible pair of points in the two histograms.
  • Finally, use an optimization technique such as the Hungarian algorithm to find the optimal assignment of points, then sum up the total cost, which will be lower for good matches.
They say that to make the above approach rotation invariant,
you need to calculate each angle between each pair of points using the tangent vector as the x-axis (see http://www.cs.berkeley.edu/~malik/papers/BMP-shape.pdf page 513).
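The matching steps above (chi-square cost over all point pairs, then optimal assignment) might be sketched as follows. The brute-force permutation search here is only a stand-in for the Hungarian algorithm, feasible only for tiny point sets; a real implementation would use e.g. scipy.optimize.linear_sum_assignment:

```python
import numpy as np
from itertools import permutations

def chi_square_cost(h1, h2, eps=1e-10):
    """Chi-square test statistic between two (normalised) histograms."""
    h1, h2 = np.asarray(h1, float), np.asarray(h2, float)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def match_shapes(hists_p, hists_q):
    """Pair each point of shape P with a point of shape Q so that the total
    chi-square matching cost is minimal. Brute force over all permutations
    for clarity, so this only scales to a handful of points."""
    n = len(hists_p)
    cost = np.array([[chi_square_cost(hp, hq) for hq in hists_q] for hp in hists_p])
    best = min(permutations(range(n)), key=lambda p: cost[np.arange(n), list(p)].sum())
    return list(best), cost[np.arange(n), list(best)].sum()
```

A lower total cost indicates a better match between the two shapes.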

Learning grasping points with shape context

Color-shape context for object recognition

Advanced shape context for plant species identification using leaf image retrieval

Shape Context

Despite the lack of a perfect representation of visual objects, supervised learning has provided a natural and successful framework for studying object recognition.

Some make careful choices of the representation of the object, so as to obtain a rich and meaningful feature descriptor; others put more emphasis on the “learning” aspect of the recognition task, making use of more sophisticated learning techniques.

Methods for shape matching can be broadly divided into:

  1. Feature-based methods extract points from the image (usually edge or corner points) and reduce the problem to point set matching.
    1. Capture the structure of the shape using the medial axis transform
    2. Hausdorff distance
    3. Geometric hashing uses configuration of keypoints to vote for a particular shape.
  2. Brightness-based methods work with the intensity of the pixels and use the image itself as a feature descriptor.

Shape Context works well with learning techniques, e.g., nearest neighbor:

  1. PCA for face recognition
  2. Bayes classifiers and decision tree learning, for more general object recognition tasks
  3. Boosting for feature selection
  4. SVM for template matching, e.g. recognizing pedestrians
  5. Neural networks for digit classification

A broad class of object recognition problems have benefited from statistical learning machinery.

The basic idea of Shape Context:

  1. Take N samples from the edge elements on the shape.
    • The points can be on internal or external contours. They need not correspond to keypoints such as maxima of curvature or inflection points.
  2. Calculate the vectors originating from one point to all other points in the shape.
    • The vectors express the appearance of the entire shape relative to the reference point.
    • The set of vectors is a rich description.
    • As N gets large, the representation of the shape becomes exact.
    • For each vector, take the Euclidean distance r and angle a; normalize r by the median distance, and measure the angle relative to the positive x-axis.
  3. Compute the log of the r values.
    • The histogram should distinguish more finely among differences in nearby pixels, so use a log-polar coordinate system.
  4. For each origin point, count the number of points that lie in a given bin.
    • This gives a coarse histogram of the relative coordinates of the remaining points.
    • The reference orientation can be absolute or relative to a given axis, depending on the problem setting.
  5. Each shape context is a log-polar histogram of the coordinates of the N−1 points measured from the origin reference point.
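The five steps above can be sketched as follows; the bin counts (5 radial × 12 angular) and the radial range (1/8 to 2 median distances) are assumptions, since the post does not fix them:

```python
import numpy as np

def shape_context(points, n_r=5, n_theta=12):
    """Compute a shape context for each of the N sample points: a log-polar
    histogram of the positions of the other N-1 points relative to it."""
    pts = np.asarray(points, dtype=float)
    d = pts[None, :, :] - pts[:, None, :]      # vectors from each point to the others
    r = np.hypot(d[..., 0], d[..., 1])         # pairwise Euclidean distances
    theta = np.arctan2(d[..., 1], d[..., 0])   # angles relative to the positive x-axis
    r = r / np.median(r[r > 0])                # normalize by the median distance
    # log-spaced radial bin edges (assumed range: 1/8 to 2 median distances)
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    hists = np.zeros((len(pts), n_r, n_theta))
    for i in range(len(pts)):
        for j in range(len(pts)):
            if i == j:
                continue
            ri = np.searchsorted(r_edges, r[i, j]) - 1     # log-radial bin
            if 0 <= ri < n_r:
                ti = int((theta[i, j] + np.pi) / (2 * np.pi) * n_theta) % n_theta
                hists[i, ri, ti] += 1                      # count point j in bin
    return hists
```

Points falling outside the radial range are simply not counted, which mirrors the descriptor's locality.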

Since Shape Context encodes a description of the density of boundary points at various distances and angles, it may be desirable to include an additional feature that accounts for the local appearance of the reference point itself:

    • local orientation
    • vectors/filters outputs
    • color histograms

Matching Shape Context

How can we assign the sample points of shape P to correspond to those of shape Q?

  • Corresponding points have very similar descriptors
    • Matching cost = weighted shape context + weighted local appearance
    • shape context distance between the two normalized histograms
    • local appearance is the dissimilarity of the tangent angles.
    • The dissimilarity between two shapes can be computed as the sum of matching errors between the corresponding points, together with a term measuring the magnitude of the aligning transform.
    • modelling the transform: given a set of correspondences, estimate a transformation that maps the model onto the target, e.g. Euclidean, affine, thin plate spline, etc.
  • the correspondences are unique
  • Bipartite graph matching for correspondence

Classification based on a fixed shape

The nearest neighbor classifier effectively weighs the information on each sample point equally. Yet, usually some parts of the shape are better at telling two classes apart.

Given a dissimilarity measure, a k-NN technique can be used for object classification/recognition
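A minimal k-NN classifier over an arbitrary dissimilarity measure, as described; the function names are illustrative:

```python
def knn_classify(query, examples, labels, dissimilarity, k=3):
    """Classify a query shape by majority vote among the k training examples
    that are nearest to it under the given dissimilarity measure."""
    order = sorted(range(len(examples)),
                   key=lambda i: dissimilarity(query, examples[i]))
    votes = [labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count)  # most common label among the k nearest
```

Here `dissimilarity` could be the shape-context matching cost from the previous section, or any other measure.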

Advantages

incorporates invariance to:

  • Translation
  • Scale
  • Rotation
  • Occlusion

Drawbacks

  1. Sensitive to local distortion or blurred edges
  2. Problems with cluttered backgrounds

Application

  • Digit recognition
  • Silhouette similarity based retrieval
  • 3D object recognition
  • Trademark retrieval

Database for evaluation

  • MNIST datasets of handwritten digits: 60000 training and 10000 testing digits
  • MPEG-7 shape silhouette: core experiment CE-Shape-1 part B, 1400 images with 70 shape classes, 20 images per class
  • COIL-20 database for 3D recognition: 20 common household objects; each turned every 5 degrees for a total of 72 views per object
  • Trademark database: 300 different real-world trademarks

[Repost] Local feature descriptors

This article roughly lists the more popular local feature description methods; a more comprehensive survey of image features or feature models would be welcome, covering e.g. HOG, Part Model, Exemplar, Sparse Coding, Local…, and so on.

In terms of the processing pipeline, HOG counts as a feature, Sparse Coding as pooling, and Part Model and Exemplar as models.

Local image feature description is a fundamental research problem that plays an important role in finding corresponding points between images and in describing object features. It underlies many methods and is therefore a current hot topic in vision research; every year high-quality feature description papers appear at the top vision conferences ICCV/CVPR/ECCV. It also has wide applications. For example, when reconstructing the 3D structure of a scene from multiple 2D images, the basic starting point is a reliable set of corresponding image points, and automatically establishing reliable point-to-point correspondences between images usually depends on an excellent local image feature descriptor. As another example, one of the most popular and practical approaches to object recognition today is based on local features; thanks to the locality of the features, object recognition can cope with occlusion, cluttered backgrounds, and other complex situations.
The core issues in local image feature description are invariance (robustness) and distinctiveness. A local image feature descriptor is usually used to handle various image transformations robustly, so invariance is the first thing to consider when designing a descriptor. In wide-baseline matching, the descriptor must be invariant to viewpoint changes, scale changes, rotations, and so on; in shape recognition and object retrieval, it must be invariant to shape.
However, a descriptor's distinctiveness often conflicts with its invariance: a descriptor with many invariances is weaker at distinguishing local image content, while a descriptor that distinguishes different local image content very easily tends to be less robust. For example, suppose we need to describe the local image content of a fixed-size region around a point. If we simply unroll the image content into a column vector and use that as the description, then even a small change in the local content causes a large change in the descriptor; such a description easily distinguishes different local image content, but when the same local content merely rotates it also changes greatly, i.e. its invariance is weak.
On the other hand, if we describe the region by a histogram of its grey levels, the description is strongly invariant and robust to rotations of the local content, but its distinguishing power is weak: for instance, it cannot tell apart two local image patches that have identical grey-level histograms but different content.
In summary, an excellent feature descriptor should have not only strong invariance but also strong distinctiveness.
Among the many local image feature descriptors, SIFT (Scale Invariant Feature Transform) is the most widely used. It was first proposed by D. Lowe in 1999 and refined by 2004, and its introduction was a milestone in the field. Because SIFT is invariant to scale, rotation, and a certain range of viewpoint and illumination changes, and is also highly distinctive, it was quickly applied to object recognition, wide-baseline image matching, 3D reconstruction, and image retrieval. Local image feature descriptors then attracted much broader attention within computer vision, and a large number of descriptors with their own characteristics emerged.
SURF (Speeded Up Robust Features) is an improved version of SIFT. It approximates the gradient operations of SIFT with Haar wavelets and uses integral images for fast computation. SURF is 3-7 times faster than SIFT and performs comparably in most cases, so it has been adopted in many applications, especially where runtime requirements are demanding.
DAISY is a local image feature descriptor designed for fast, dense feature extraction. Its underlying idea is the same as SIFT's: block-wise statistics of gradient orientation histograms. The difference is that DAISY improves the pooling strategy, aggregating the gradient orientation histograms over blocks with Gaussian convolutions; since Gaussian convolutions can be computed quickly, descriptors can be extracted densely and fast. Coincidentally, this pooling strategy was shown by some researchers (Matthew Brown, Gang Hua, Simon Winder), using machine learning, to be optimal compared with several other pooling strategies (blocks in Cartesian coordinates, blocks in polar coordinates).
ASIFT (Affine SIFT) matches features by simulating the images obtained under all imaging viewpoints, and so handles viewpoint changes very well, especially image matching under large viewpoint changes.
MROGH (Multi-support Region Order-based Gradient Histogram) innovates in the pooling strategy. Earlier local image feature descriptors pooled features based on the geometric positions of points within a neighbourhood, whereas MROGH pools features based on the intensity order of the points.
BRIEF (Binary Robust Independent Elementary Features) builds a local image feature descriptor from the intensity comparisons of random point pairs within a local image neighbourhood. The resulting binary descriptor is not only fast to match but also cheap to store, so it has good prospects for mobile applications. In fact, the idea of describing a feature via intensity comparisons of point pairs in a neighbourhood already appeared in SMD (ECCV'08).
Besides BRIEF, many other binary descriptors have been proposed in the last couple of years, e.g. ORB, BRISK, and FREAK. All the descriptors above are hand-designed; some research instead tries to learn the desired descriptor from data using machine learning. Such descriptors include PCA-SIFT, Linear Discriminative Embedding, and LDA-Hash. There are of course many other descriptors beyond the ones mentioned here, which will not be enumerated one by one.
Well-known researchers in local image feature description internationally include Mikolajczyk of the University of Surrey (UK). While a postdoc at INRIA, he evaluated the performance of SIFT, Shape Context, PCA-SIFT, invariant moments, and several other local image descriptors in the wide-baseline setting; the paper was published in PAMI in 2005, and the evaluation methodology he proposed is still the one widely used in this field.

C. Schmid of INRIA started working on local image description in the nineties and is one of the pioneers of the field, although in recent years her group has been shifting its focus to applications such as large-scale image retrieval and action recognition.
Tinne Tuytelaars of Leuven University (Belgium) proposed the famous SURF descriptor; the SURF paper won the CVIU most-cited paper award in 2011. She has written three surveys on local image feature description: “Local Invariant Feature Detectors: A Survey”, “Local Image Features”, and “Wide baseline matching”.
Andrea Vedaldi of Oxford University (UK) is the initiator and main author of Vlfeat, an open-source library that includes SIFT and MSER and is widely used by researchers. Vlfeat is gradually implementing other common feature descriptors.
Vincent Lepetit and Pascal Fua of EPFL (Switzerland) lead a team devoted to developing fast, efficient local image feature descriptors for template matching, 3D reconstruction, virtual reality, and similar applications. Their work includes the DAISY descriptor for dense stereo matching, a template matching method based on Random Trees, and another based on Random Ferns. LDA-Hash, BRIEF, and D-BRIEF (ECCV 2012) are also their work.
Fuchao Wu of the Institute of Automation, Chinese Academy of Sciences, has also done in-depth research in this area and proposed many good local image feature extraction and description methods. These are all names we see frequently when reading papers.
The recent trend in local image feature descriptors is towards speed and low storage. These two trends let descriptors play a role in fast real-time and large-scale applications, and make it easier to build applications on mobile phones, genuinely bringing computer vision technology into the world around us. To meet these two demands, binary descriptors have attracted wide attention from researchers; most of the recent CVPR and ICCV papers on local image feature descriptors are of this kind. They will likely continue to receive attention in the coming years; hopefully some successful applications that reach everyday life will appear.
