Rereading SIFT

Image matching can be used for object and scene recognition, 3D reconstruction (from multiple images), stereo matching, and motion tracking.

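SIFT features are partially invariant to image rotation and scale changes, illumination changes, and changes in camera viewpoint. The steps for describing image features with SIFT (that is, for obtaining keypoints that stay invariant under image scaling and rotation) are: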

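  1. Extrema detection: extrema are detected across multiple octaves, using the difference-of-Gaussian function to identify potential interest points that are invariant to scale and rotation.
  2. Keypoint localization: at each candidate location, a detailed model is fitted to determine location and scale. Keypoints are selected based on how stable they are.
  3. Orientation assignment: one or more orientations are assigned to each keypoint location, based on the local image gradient directions.
  4. Keypoint description: the local image gradients are measured at the selected scale in the neighborhood around each keypoint. A 16×16 neighborhood centered on the keypoint is taken as the sampling window, the gradient directions of the sample points relative to the keypoint orientation are Gaussian-weighted and accumulated into orientation histograms with 8 bins, and the result is a 4×4×8 = 128-dimensional feature descriptor.
  5. Matching.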
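  • The scale space is generated by convolving a variable-scale Gaussian function G(x, y, σ) with the image I(x, y).
  • In a multi-scale representation, a coarser scale should be a simplification of a finer scale: the coarser-scale image is obtained from the finer-scale image by smoothing in some fixed way, and the Gaussian function is the only smoothing function suitable for this.
  • A Gaussian filter has a fixed size. If a filter of the same size is used to find local extrema in two images that contain the same object at different sizes, an extremum may be found in one image but not in the other. If the object had the same size in both images, however, their local extrema would coincide.
  • The elegance of SIFT lies in solving this problem with an image pyramid. When two images are compared, each is turned into a pyramid whose every cross-section resembles the original image, so the two pyramids must contain infinitely many cross-sections in which the object has the same size. In practice the pyramid has to be discrete, so only a finite number of layers can be built. More layers are better but increase processing time, while with too few layers the downsampled cross-sections may not contain any pair of images in which the object has the same size.
  • With the image pyramid, local extrema can be computed on every layer, but this produces a very large number of candidate points, so some method is needed to suppress part of them and keep only the stable points at each scale.
  • Building the image pyramid: the pyramid has O octaves (groups) with S layers each, and the images of the next octave are obtained by downsampling the images of the previous octave.

    1) Increasing σ moves the image plane upward in scale-space, and the image becomes increasingly blurred as the higher spatial frequencies are filtered out.

    2) Convolving the image I(x, y) with G(x, y, σ) creates the scale-space representation L(x, y, σ) of the image: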

    L(x, y, σ) = G(x, y, σ) ∗ I(x, y)
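As a rough illustration of L(x, y, σ) = G(x, y, σ) ∗ I(x, y), the following pure-Python sketch blurs a small image stored as a list of rows. The function names, the kernel truncation at about 3σ, and the clamped border handling are assumptions made for this example, not details taken from the SIFT paper.

```python
import math

def gaussian_kernel(sigma):
    """Return a normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the border."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(kernel[j + radius] * row[min(max(x + j, 0), width - 1)]
             for j in range(-radius, radius + 1))
         for x in range(width)]
        for row in image
    ]

def transpose(image):
    """Swap rows and columns of a list-of-rows image."""
    return [list(column) for column in zip(*image)]

def gaussian_blur(image, sigma):
    """Approximate L = G * I by blurring rows, then columns (G is separable)."""
    kernel = gaussian_kernel(sigma)
    rows_blurred = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(rows_blurred), kernel))
```

Calling `gaussian_blur` with a larger `sigma` spreads a bright pixel over a wider area, which is the "moving upward in scale-space" described in item 1).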

    3) The SIFT algorithm uses extrema of the difference-of-Gaussian function, which is computed from two adjacent scales of the convolved image pyramid separated by a constant factor k.

    4) The difference-of-Gaussian function D(x, y, σ) is obtained by subtracting two adjacent scales of L(x, y, σ) in the Gaussian scale-space pyramid, separated by the constant factor k:
    D(x, y, σ) = L(x, y, kσ) − L(x, y, σ)
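A minimal sketch of this subtraction, assuming the blurred images of one octave are already available as a list ordered by increasing σ, each image being a list of rows. The function name is made up for this example.

```python
def difference_of_gaussians(gaussian_stack):
    """Return D = L(k * sigma) - L(sigma) for each pair of adjacent scales."""
    dogs = []
    for lower, upper in zip(gaussian_stack, gaussian_stack[1:]):
        dogs.append([
            [u - l for l, u in zip(row_lower, row_upper)]
            for row_lower, row_upper in zip(lower, upper)
        ])
    return dogs
```

A stack of n blurred images gives n - 1 difference images, which is why the DoG pyramid has one interval fewer per octave than the Gaussian pyramid.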

    5) To create the image pyramid, scale-space is sampled at discrete intervals by multiplying the scale parameter σ by the constant factor k from one interval to the next.
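The σ values inside one octave can be sketched as follows. The choices k = 2^(1/s), a base scale of 1.6, and s + 3 blurred images per octave follow Lowe's (2004) paper as far as I recall it; treat them, and the function name, as assumptions of this example.

```python
def octave_sigmas(sigma0=1.6, intervals=3):
    """Return the blur levels of one octave: sigma0 * k**i with k = 2**(1/intervals)."""
    k = 2 ** (1.0 / intervals)
    # intervals + 3 images are needed so that extrema can be searched
    # over a full doubling of sigma in the difference images.
    return [sigma0 * k ** i for i in range(intervals + 3)]
```

With `intervals=2` this gives five blur levels, and the third one is exactly twice the first, which is the point at which the next octave can start from a downsampled image.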

    Figure 1: Two octaves of a Gaussian scale-space image pyramid with s = 2 intervals. The first image in the second octave is created by down sampling the second to last image in the previous octave by a factor of 2 (shown in green).

    Figure 2: The difference of two adjacent intervals in the Gaussian scale-space pyramid creates an interval in the difference-of-Gaussian pyramid (shown in green).

    6) Extrema, which are the feature candidates, are found by comparing each pixel to its 8 neighbours in the current image and to the 18 (9×2) neighbours in the scales above and below. A pixel is kept only if it is a local maximum or minimum among all 26 neighbours.

    Figure 3: Extrema: the local maxima or minima that identify potential interest features.
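The 26-neighbour comparison can be sketched like this, assuming three adjacent difference-of-Gaussian images given as lists of rows and an interior pixel (y, x). The function name and the strict comparison are choices made for this example.

```python
def is_extremum(dog_below, dog_current, dog_above, y, x):
    """Return True if dog_current[y][x] is strictly larger or strictly
    smaller than all 26 neighbours in the three adjacent DoG images."""
    center = dog_current[y][x]
    neighbours = []
    for level, layer in enumerate((dog_below, dog_current, dog_above)):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if level == 1 and dy == 0 and dx == 0:
                    continue  # skip the pixel being tested
                neighbours.append(layer[y + dy][x + dx])
    return center > max(neighbours) or center < min(neighbours)
```

The caller is expected to skip border pixels and the first and last DoG image of each octave, since they lack a full set of neighbours.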

    7) Keypoint localization 

    Because the DoG function has a strong response along edges, not every feature point is stable, and these undesired keypoints reduce matching accuracy and noise resistance. After the potential interest points have been identified, a filtering step is therefore required to reject low-contrast points and edge points.

    The first test is for contrast. If the magnitude of D(x, y, σ) at the keypoint location is below a contrast threshold, the keypoint is discarded as unstable and susceptible to low levels of noise. Lowe's (2004) paper filters out keypoints with |D(x, y, σ)| less than 0.03 as low contrast (with pixel values scaled to the range [0, 1]).

    The more recent SIFT paper (Lowe, 2004) proposes interpolating the keypoint location by fitting a 3D quadratic function to the nearby sample data. If the calculated offset from the sample point is greater than 0.5 in any dimension, the extremum lies closer to a neighbouring sample point, so the keypoint is moved to that sample point and the fit is repeated there.
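For reference, the quadratic fit can be written out as below. This is the second-order Taylor expansion of D around the sample point, as I understand it from Lowe's (2004) paper; here x = (x, y, σ)ᵀ is the offset from the sample point, and D and its derivatives are evaluated at the sample point.

```latex
D(\mathbf{x}) = D
  + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x}
  + \frac{1}{2}\, \mathbf{x}^{T} \frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\, \mathbf{x}

\hat{\mathbf{x}} = -\left( \frac{\partial^{2} D}{\partial \mathbf{x}^{2}} \right)^{-1}
  \frac{\partial D}{\partial \mathbf{x}}

D(\hat{\mathbf{x}}) = D + \frac{1}{2}\, \frac{\partial D}{\partial \mathbf{x}}^{T} \hat{\mathbf{x}}
```

The offset x̂ is obtained by setting the derivative of the expansion to zero, and the interpolated value D(x̂) is the quantity compared against the contrast threshold.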

    8) Orientation Assignment

    The orientations need to be assigned to each of the remaining keypoints so that local pixel data can be described relative to the orientation of a keypoint for rotational invariance.

    The orientation histogram is built from the gradient orientations of the sample points in a region around the keypoint, and the highest peak in the histogram is taken as the dominant orientation. The histogram consists of 36 bins (one for each 10° step). Each sample contributes to its bin with its gradient magnitude, weighted by a Gaussian window.

    Once the histogram has been calculated, a keypoint is created for every orientation whose histogram value is at least 80% of the maximum histogram value.

    Figure 4: Orientation
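The histogram and the 80% rule can be sketched as follows. The input format (a list of `(magnitude, angle_in_degrees, gaussian_weight)` tuples) and both function names are assumptions of this example. It also returns bin centres only and restricts the 80% rule to local peaks, whereas a full implementation would typically refine each peak by interpolation.

```python
def orientation_histogram(samples, num_bins=36):
    """Accumulate weighted gradient magnitudes into num_bins orientation bins."""
    hist = [0.0] * num_bins
    bin_width = 360.0 / num_bins
    for magnitude, angle, weight in samples:
        index = int((angle % 360.0) // bin_width) % num_bins
        hist[index] += magnitude * weight
    return hist

def dominant_orientations(hist, peak_ratio=0.8):
    """Return the bin-centre angles of local peaks that reach at least
    peak_ratio times the highest histogram value."""
    n = len(hist)
    top = max(hist)
    bin_width = 360.0 / n
    angles = []
    for i, value in enumerate(hist):
        is_local_peak = value > hist[(i - 1) % n] and value > hist[(i + 1) % n]
        if is_local_peak and value >= peak_ratio * top:
            angles.append((i + 0.5) * bin_width)
    return angles
```

Each returned angle would produce its own keypoint at the same location and scale, so a region with two strong gradient directions yields two keypoints.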

    9) Keypoint Descriptor

    Once all keypoint locations have been determined and have orientations assigned to them, the next stage is to create a descriptor to represent the image data around the keypoint in an invariant form.

    Figure 5: Orientation histogram

    The orientation histogram is used for the keypoint descriptor because it is built from gradients, which are more robust to illumination changes.

    When the illumination of the image changes as if every pixel were multiplied by a constant, the gradients are multiplied by the same constant. This constant is cancelled by the normalization.
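A small 1D check of this argument, using made-up pixel values and hypothetical helper names: multiplying the signal by a constant multiplies every gradient by the same constant, and unit-length normalization removes it again.

```python
import math

def central_gradients(signal):
    """Central differences of a 1D signal (interior samples only)."""
    return [signal[i + 1] - signal[i - 1] for i in range(1, len(signal) - 1)]

def normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors stay unchanged)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

signal = [1.0, 4.0, 9.0, 16.0, 25.0]
brighter = [3.0 * v for v in signal]  # every pixel multiplied by a constant

original = normalize(central_gradients(signal))
scaled = normalize(central_gradients(brighter))
# original and scaled agree up to floating-point rounding
```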

    Take an 8×8 window centered on the keypoint. Each cell represents one pixel of the scale-space image in the keypoint's neighborhood, the arrow direction is that pixel's gradient direction, and the arrow length is the gradient magnitude. The blue circle indicates the Gaussian-weighted region: the closer a pixel is to the keypoint, the more its gradient direction contributes.

    In this illustration the keypoint is described by 2×2 = 4 sub points, each holding gradient information for 8 directions. This differs from HOG in that HOG uses a dense description: whereas SIFT uses sparsely distributed descriptors positioned at extrema found in scale-space, HOG uses a dense array of overlapping histograms across a sample window.

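    In practice, to make matching more robust, Lowe recommends describing each keypoint with 4×4 = 16 sub points.

    This yields 4×4×8 = 128 values per keypoint, which form the final 128-dimensional SIFT feature vector. At this point the vector is already robust to geometric changes such as scale and rotation; normalizing it to unit length further removes the effect of illumination changes.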
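The 4×4×8 layout can be sketched as below, assuming a 16×16 window of gradient magnitudes and of gradient angles in degrees that are already expressed relative to the keypoint orientation. The function name is invented for this example, and the sketch leaves out the Gaussian weighting and the interpolation between neighbouring bins that a full implementation would use.

```python
import math

def sift_like_descriptor(magnitudes, angles, cells=4, bins=8):
    """Build a cells x cells x bins histogram vector and normalize it to unit length."""
    size = len(magnitudes)            # 16 for the standard window
    cell_size = size // cells         # 4 pixels per cell side
    descriptor = [0.0] * (cells * cells * bins)
    bin_width = 360.0 / bins
    for y in range(size):
        for x in range(size):
            cell = (y // cell_size) * cells + (x // cell_size)
            b = int((angles[y][x] % 360.0) // bin_width) % bins
            descriptor[cell * bins + b] += magnitudes[y][x]
    norm = math.sqrt(sum(v * v for v in descriptor))
    return [v / norm for v in descriptor] if norm > 0 else descriptor
```

With the default arguments the returned list has 4 × 4 × 8 = 128 entries, one block of 8 orientation bins per sub point.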

    10) Matching

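    After the SIFT feature vectors of the two images have been generated, the Euclidean distance between feature vectors, which are computed relative to the location, scale and orientation of each keypoint, is used as the similarity measure. For each keypoint, the best match is the neighbour with the smallest Euclidean distance.

    Take a keypoint in image 1 and find the two keypoints in image 2 that are closest to it in Euclidean distance. If the closest distance divided by the second-closest distance is less than a ratio threshold, the pair is accepted as a match. Lowering this ratio threshold reduces the number of SIFT matches but makes them more stable.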
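A brute-force sketch of this ratio test, with descriptors given as plain lists of numbers. The function names are made up for this example, and the default ratio of 0.8 is the value I recall from Lowe's (2004) paper; treat it as an assumption and tune it for your data.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def ratio_test_matches(descriptors1, descriptors2, ratio=0.8):
    """Return (i, j) index pairs whose nearest / second-nearest distance
    ratio is below the threshold."""
    matches = []
    for i, d1 in enumerate(descriptors1):
        ranked = sorted((euclidean(d1, d2), j) for j, d2 in enumerate(descriptors2))
        if len(ranked) < 2:
            continue  # the ratio needs two candidates
        (best, j), (second, _) = ranked[0], ranked[1]
        if second > 0 and best / second < ratio:
            matches.append((i, j))
    return matches
```

A keypoint that is equally close to two candidates has a ratio near 1 and is rejected, which is how the test removes ambiguous matches.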

  • SIFT detects keypoints at a given scale by subtracting the images of two adjacent Gaussian scale-space levels, which gives a DoG (difference-of-Gaussian) response image D(x, y, σ). Keypoints are then located in position and in scale by searching the response image D(x, y, σ) for local extrema.
  • The Difference of Gaussian module is a filter that identifies edges.
  • The DoG performs edge detection by first applying a Gaussian blur to an image at a specified theta (also known as sigma or standard deviation). The resulting image is a blurred version of the source image. The module then performs another blur with a smaller theta that blurs the image less than the first one.
  • The final image is then calculated by replacing each pixel with the difference between the two blurred images and detecting where the values cross zero, i.e. where negative becomes positive and vice versa. The resulting zero crossings are concentrated at edges or in areas where pixels have some variation in their surrounding neighborhood.
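The zero-crossing idea can be tried on a 1D step edge. This is a sketch under assumptions: the function names, the kernel truncation at about 3σ, the clamped borders, and the small `eps` guard against rounding noise are all choices made for this example rather than details of any particular DoG module.

```python
import math

def blur_1d(signal, sigma):
    """Gaussian-blur a 1D signal (kernel cut at about 3 sigma, clamped borders)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(j * j) / (2 * sigma * sigma)) for j in offsets]
    total = sum(weights)
    last = len(signal) - 1
    return [
        sum(w * signal[min(max(x + j, 0), last)] for j, w in zip(offsets, weights)) / total
        for x in range(len(signal))
    ]

def dog_zero_crossings(signal, sigma_wide, sigma_narrow, eps=1e-9):
    """Return indices i where the DoG response changes sign between i and i + 1."""
    wide = blur_1d(signal, sigma_wide)
    narrow = blur_1d(signal, sigma_narrow)
    dog = [n - w for n, w in zip(narrow, wide)]
    return [
        i for i in range(len(dog) - 1)
        if (dog[i] < -eps and dog[i + 1] > eps) or (dog[i] > eps and dog[i + 1] < -eps)
    ]

step = [0.0] * 10 + [1.0] * 10   # an ideal edge between index 9 and 10
crossings = dog_zero_crossings(step, sigma_wide=2.0, sigma_narrow=1.0)
```

For this step signal the only sign change of the response lies between index 9 and 10, which is where the edge is.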
http://underthehood.blog.51cto.com/2531780/658350

http://www.prg-cn.com/article-7614-3.html

http://www.myexception.cn/image/293782.html

http://www.404qa.com/q-15057.html

More useful papers:

An Exploration of the SIFT Operator

Monitoring 3D camera rigs for film production
