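Revisiting HoG and PHoG (pyramid HoG)

I started learning the HoG feature from scratch. At first I searched online for material to get a rough idea; since my background was weak, the paper by the original authors, Navneet Dalal and Bill Triggs [4], made little sense to me. I now realize that whoever tied the bell must untie it: to understand the essence you have to go back to the source.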

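The HoG explanations circulating widely on the Chinese-language web are mostly one of the following:

1. Timehandle's Baidu blog: code + illustrations  http://hi.baidu.com/timehandle/blog/item/ca6e3cdfab738fe376c638a8.html

2. 丕子's 《hog理解与源码》 ("Understanding HoG, with source code"): a theoretical explanation http://www.zhizhihu.com/html/y2010/1690.html

3. A HoG write-up reposted all over CSDN whose original source is unknown: I personally find this one easier to follow, and it also quotes key points from the original author's PhD thesis, which is very helpful. http://blog.csdn.net/pp5576155/article/details/7023709

4. Others, such as Z-Wang.com and 冒泡的崔 (with MATLAB code).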

 

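HoG, as the name suggests, is a histogram; specifically, a histogram of the gradient orientations at image edges. The creative step in Dalal & Triggs's work is a divide-and-conquer approach: the image is divided into a number of cells (which is why HoG can resist the negative effects of geometric and photometric deformations), and the normalized histogram of every cell is then computed through some strategy (for example, local normalization over blocks of 4 cells, with the blocks slid across the image as overlapping windows to reduce contrast differences between cells).

PS: According to some research, in particular the PASCAL Challenge 2005-2008, sliding window classifiers have become increasingly important in object detection.

The implementation is as follows:

1. Compute the histogram of each cell (8*8 pixels):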

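  • First compute the gradient of every pixel in the cell, both magnitude and orientation.
  • Vote into the histogram: the "orientation-based histogram channels" in the paper are simply the bins of the histogram, and different applications can use different numbers of bins, e.g.
  1. The Dalal-Triggs paper [4] found that, for pedestrian detection, dividing 0-180 degrees (unsigned gradient) into 9 histogram bins works best, i.e. a 9-dimensional HoG per cell;
  2. [2] normalizes the 9-orientation histogram of a cell in 4 different ways (once with respect to the gradient energy of each of the four blocks that contain the cell), giving a 4*9=36-dimensional HoG;
  3. [1] uses 31 dimensions: 9 contrast-insensitive orientations (under a single normalization) + 4 texture gradients + 18 contrast-sensitive orientations.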
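The per-cell step above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `cell_histogram` is hypothetical, gradients are taken with NumPy's central differences, and each pixel votes into a single bin (hard assignment), whereas the Dalal-Triggs paper interpolates votes between neighbouring bins.

```python
import numpy as np

def cell_histogram(cell, n_bins=9):
    """Unsigned-orientation histogram of one cell, weighted by gradient magnitude."""
    cell = np.asarray(cell, dtype=float)
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    gy, gx = np.gradient(cell)
    magnitude = np.hypot(gx, gy)
    # Fold the orientation into [0, 180) degrees (unsigned gradient).
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    bin_width = 180.0 / n_bins
    # Clamp guards against a rounding result of exactly 180.0.
    bins = np.minimum((orientation // bin_width).astype(int), n_bins - 1)
    # Each pixel casts one vote into its bin, weighted by its gradient magnitude.
    return np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)

# Usage: an 8x8 cell whose intensity increases from left to right.
cell = np.tile(np.arange(8, dtype=float), (8, 1))
hist = cell_histogram(cell)
```

For this horizontal ramp every pixel has the same gradient direction, so all of the votes fall into the first bin.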

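The paper proposes several voting schemes. Each vote is weighted by the gradient strength, so the final value of each histogram bin, e.g. {m1, m2, …, m9}, is a function of the gradient magnitude of each pixel:

  1. The magnitude itself
  2. The square of the gradient magnitude
  3. The square root of the magnitude
  4. A clipped version of the magnitude

2. Normalization

L2-Hys, L2-norm, and L1-sqrt perform equally well, while L1-norm is slightly less reliable.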
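The four vote weightings listed above could be written as one small helper. This is only a sketch: the name `vote_weights`, the scheme labels, and the `clip_value` parameter are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def vote_weights(magnitude, scheme="magnitude", clip_value=None):
    """Map gradient magnitudes to vote weights under one of four schemes."""
    magnitude = np.asarray(magnitude, dtype=float)
    if scheme == "magnitude":
        return magnitude
    if scheme == "square":
        return magnitude ** 2
    if scheme == "sqrt":
        return np.sqrt(magnitude)
    if scheme == "clipped":
        if clip_value is None:
            raise ValueError("clip_value is required for the clipped scheme")
        # Magnitudes above the threshold all cast the same vote.
        return np.minimum(magnitude, clip_value)
    raise ValueError(f"unknown scheme: {scheme}")

# Usage on three example magnitudes.
m = np.array([0.0, 4.0, 9.0])
squared = vote_weights(m, "square")
rooted = vote_weights(m, "sqrt")
clipped = vote_weights(m, "clipped", clip_value=5.0)
```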
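The four block normalization schemes can be sketched as below, following the definitions in the Dalal-Triggs paper [4] as I understand them. The function name `normalize_block` and the value of the small constant `eps` are assumptions; the clipping threshold 0.2 for L2-Hys is the value reported in the paper. The input is assumed to be a non-negative block vector (concatenated cell histograms).

```python
import numpy as np

def normalize_block(v, scheme="L2-Hys", eps=1e-5, clip=0.2):
    """Normalize a non-negative block descriptor v with the chosen scheme."""
    v = np.asarray(v, dtype=float)
    if scheme == "L2-norm":
        return v / np.sqrt(np.sum(v ** 2) + eps ** 2)
    if scheme == "L2-Hys":
        # L2-norm, then clip large values, then renormalize.
        v = v / np.sqrt(np.sum(v ** 2) + eps ** 2)
        v = np.minimum(v, clip)
        return v / np.sqrt(np.sum(v ** 2) + eps ** 2)
    if scheme == "L1-norm":
        return v / (np.sum(np.abs(v)) + eps)
    if scheme == "L1-sqrt":
        # Square root of the L1-normalized vector.
        return np.sqrt(v / (np.sum(np.abs(v)) + eps))
    raise ValueError(f"unknown scheme: {scheme}")

# Usage on a toy 4-element block vector.
block = np.array([3.0, 4.0, 0.0, 0.0])
l2 = normalize_block(block, "L2-norm")
l2hys = normalize_block(block, "L2-Hys")
l1 = normalize_block(block, "L1-norm")
l1sqrt = normalize_block(block, "L1-sqrt")
```

In this toy case L2-Hys flattens the two dominant entries to the same value, which shows how clipping limits the influence of very strong gradients.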

3. Training a model (HoG vector)

The classifier is trained on the HoG vectors using a linear SVM.
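To make the training step concrete, here is a minimal sketch of a linear SVM trained by sub-gradient descent on the regularized hinge loss. It is a stand-in, not the solver used in [4]; the function names, the hyperparameters `lam`, `lr`, `epochs`, and the 2-D toy data (which takes the place of real HoG vectors) are all assumptions.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """X: (n, d) feature vectors; y: labels in {-1, +1}. Returns (w, b)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        violators = margins < 1.0  # samples inside the margin or misclassified
        # Sub-gradient of lam/2 * ||w||^2 + mean hinge loss.
        grad_w = lam * w - (y[violators, None] * X[violators]).sum(axis=0) / n
        grad_b = -y[violators].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Classify each row of X as +1 or -1 by the sign of the linear score."""
    return np.where(np.asarray(X, dtype=float) @ w + b >= 0.0, 1, -1)

# Usage: toy, linearly separable data standing in for HoG vectors.
X = np.array([[2.0, 0.0], [3.0, 1.0], [2.5, -0.5],
              [-2.0, 0.0], [-3.0, 1.0], [-2.5, -0.5]])
y = np.array([1, 1, 1, -1, -1, -1])
w, b = train_linear_svm(X, y)
labels = predict(X, w, b)
```

In practice each row of `X` would be the concatenated, normalized HoG descriptor of one training window.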

4. Testing the model

A filter is a rectangular template defined by an array of d-dimensional weight vectors. The response, or score, of a filter F at a position (x,y) in a feature map G is the “dot product” of the filter and a subwindow of the feature map with top-left corner at (x,y).
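The filter response defined above can be sketched directly in NumPy. The array layout (rows, columns, feature dimension) and the names `filter_response` and `response_map` are assumptions made for this illustration.

```python
import numpy as np

def filter_response(F, G, x, y):
    """Score of filter F (h, w, d) with its top-left corner at column x, row y of G (H, W, d)."""
    h, w, _ = F.shape
    window = G[y:y + h, x:x + w, :]
    if window.shape != F.shape:
        raise ValueError("filter does not fit inside the feature map at this position")
    # "Dot product" of the filter and the subwindow: sum of elementwise products.
    return float(np.sum(F * window))

def response_map(F, G):
    """Responses at every position where the filter fits entirely inside G."""
    h, w, _ = F.shape
    H, W, _ = G.shape
    out = np.empty((H - h + 1, W - w + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = filter_response(F, G, x, y)
    return out

# Usage: a 4x5 feature map with 2-dimensional features and a 2x2 filter of ones.
G = np.arange(4 * 5 * 2, dtype=float).reshape(4, 5, 2)
F = np.ones((2, 2, 2))
score = filter_response(F, G, 0, 0)
scores = response_map(F, G)
```

Evaluating the filter at every position in this way is exactly what a sliding window detector does on each level of the feature pyramid.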

Optimizing HoG:

By doing principal component analysis (PCA) on HOG features, the dimensionality of the feature vector can be significantly reduced with no noticeable loss of information. At the same time, by examining the principal eigenvectors we discover structure that leads to “analytic” versions of low-dimensional features which are easily interpretable and can be computed efficiently.
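A PCA projection of HoG feature vectors can be sketched with an SVD, as below. The function name `pca_project`, the random stand-in data, and the choice of 11 components are illustrative assumptions; real input would be one 36-dimensional HoG vector per row.

```python
import numpy as np

def pca_project(features, n_components):
    """Project row vectors onto their top principal components."""
    X = np.asarray(features, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Fraction of the total variance captured by each kept component.
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return centered @ components.T, components, explained

# Usage: random data standing in for 200 HoG vectors of dimension 36.
rng = np.random.default_rng(0)
features = rng.random((200, 36))
reduced, components, explained = pca_project(features, n_components=11)
```

Inspecting the rows of `components` is the step that, according to the quoted passage, reveals the structure behind the low-dimensional "analytic" features.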

 

Problems:

1. HoG is only sensitive to the direction of gradients.

This causes aliasing: two image blocks that are perceptually very different can end up with very similar HoG features, for example a diffuse gradient and a step edge.
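The aliasing between a diffuse gradient and a step edge can be reproduced with a small sketch. It assumes a single 8x8 cell, hard bin assignment, and L2 normalization; the function name and the two synthetic patches are made up for the demonstration.

```python
import numpy as np

def normalized_orientation_histogram(patch, n_bins=9, eps=1e-5):
    """L2-normalized, magnitude-weighted histogram of unsigned gradient orientations."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((orientation // (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    return hist / np.sqrt(np.sum(hist ** 2) + eps ** 2)

# A smooth left-to-right ramp (diffuse gradient) and a sharp vertical step edge.
ramp = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))
step = np.tile(np.r_[np.zeros(4), np.ones(4)], (8, 1))

h_ramp = normalized_orientation_histogram(ramp)
h_step = normalized_orientation_histogram(step)
```

Both patches have gradients in one direction only, so after normalization their histograms are practically identical even though the patches look very different.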

2. Much useful image information comes from sharply changing edges, and applying a Gaussian filter before computing the gradients smooths these edges away. Adding this Gaussian smoothing therefore makes detection worse.

3. Poor performance on occluded objects. Compared to part-based detectors, the sliding window approach handles partial occlusions poorly, because the features inside the scanning window are densely selected: in a sliding window approach, a classifier is applied at all positions, scales, and, in some cases, orientations of an image.

Improvement: An HOG-LBP Human Detector with Partial Occlusion Handling

4. High resource cost (time and space).

 

Pyramid HoG (PHoG)

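This feature descriptor works best for part-based detection, because the detector can be evaluated against the HoG features of image windows at various scales and positions.

Put simply, PHoG is a combination of several layers of HoG, where each layer is the HoG of the image at a different scale; that is, we enlarge or shrink the image and then compute its standard HoG features. How many times should the image be rescaled? That is a design choice. The more levels the pyramid has, the longer detection takes.

In the series of part-based models proposed by Felzenszwalb et al. [1,2], the authors use 5 pyramid levels per octave during training and 10 levels per octave at detection time.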
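A feature pyramid of this kind can be sketched as follows. Everything here is an illustrative assumption rather than the method of [1,2]: the names `resize_nearest`, `orientation_histogram`, and `feature_pyramid`, the parameters `n_levels` and `scale_step`, nearest-neighbour resizing, and the use of a single global orientation histogram as a stand-in for a full cell-and-block HoG.

```python
import numpy as np

def resize_nearest(image, scale):
    """Rescale a 2-D image by nearest-neighbour sampling."""
    image = np.asarray(image)
    h, w = image.shape
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[rows][:, cols]

def orientation_histogram(image, n_bins=9):
    """Stand-in for HoG: one magnitude-weighted orientation histogram per image."""
    gy, gx = np.gradient(np.asarray(image, dtype=float))
    magnitude = np.hypot(gx, gy)
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((orientation // (180.0 / n_bins)).astype(int), n_bins - 1)
    return np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)

def feature_pyramid(image, feature_fn, n_levels=5, scale_step=0.8):
    """Apply feature_fn to the image at scales 1, scale_step, scale_step**2, ..."""
    return [feature_fn(resize_nearest(image, scale_step ** i)) for i in range(n_levels)]

# Usage: a 64x48 test image, three levels, each half the size of the previous one.
image = np.tile(np.arange(48, dtype=float), (64, 1))
sizes = feature_pyramid(image, lambda im: im.shape, n_levels=3, scale_step=0.5)
pyramid = feature_pyramid(image, orientation_histogram, n_levels=3, scale_step=0.5)
```

Each extra level adds another full feature computation and another pass of the detector, which is why more levels mean slower detection.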

 

References

[1] P. Felzenszwalb, D. McAllester, D. Ramanan. A Discriminatively Trained, Multiscale, Deformable Part Model. Proceedings of the IEEE CVPR 2008.

[2] P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan. Object Detection with Discriminatively Trained Part Based Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 9, September 2010.

[3] P. Felzenszwalb, R. Girshick, D. McAllester. Cascade Object Detection with Deformable Part Models. Proceedings of the IEEE CVPR 2010. The star-cascade source code is also available; the project is now hosted on GitHub.

[4] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2005.
