Easy-to-recognize objects often have many visual associations created around the object of interest.

Augment each detection’s raw score with a context score: a weighted sum of the local association score and the context score from overlapping detections.
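The scoring rule above can be sketched as follows. This is only one reading of the note: the function name, the weights `w_local` and `w_context`, and the choice to add the context term to the raw score are all assumptions for illustration, not taken from a specific paper.

```python
def augmented_score(raw_score, local_assoc_score, overlap_context_score,
                    w_local=0.5, w_context=0.5):
    """Return the raw detection score plus a weighted context term.

    Assumption: the context term is a weighted sum of the local
    association score and the score from overlapping detections,
    and it is simply added to the raw score.
    """
    context = w_local * local_assoc_score + w_context * overlap_context_score
    return raw_score + context


# Hypothetical numbers: a detection with raw score 1.0 and weak context.
print(augmented_score(1.0, 0.5, 0.5))
```

In practice the weights would be tuned or learned on validation data; the equal weights here are placeholders.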

Various representations such as GIST, SIFT, HOG, and wavelets try to capture locally salient structure: high gradient and high contrast. More informative parts should be given higher importance, so estimating the importance of each feature with respect to a particular scene’s overall visual impression is crucial.

The same local features can represent very different visual content depending on context, so each query image should decide the best way to weight its parts.

Shechtman and Irani [2007] described an image in terms of local self-similarity descriptors that are invariant across visual domains.

SHECHTMAN, E., AND IRANI, M. 2007. Matching local self-similarities across images and videos. In CVPR.


HOG is able to describe an image based on “the distribution of local intensity gradients or edge directions”.

A major downside to the HOG+SVM approach to object detection is that it runs very slowly. Full-frame detection on a 640×480 pixel frame takes 4 seconds for hazmat signs. On an 800×600 frame it takes 12 seconds. Luckily, the algorithm is highly amenable to parallelisation, and a modern GPU can take this processing time down to 66 milliseconds per 640×480 frame. It can even process a 1280×960 frame in 184 ms.

The SVM is able to detect any HOG with a width and height of 16 cells (16×16 cells, each with 9 unsigned orientation bins). The edges contribute a strong vote to the orientation bins of the HOG.
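To make the "9 unsigned orientation bins per cell" concrete, here is a minimal pure-Python sketch of one cell's histogram. It is a simplification and not the implementation the timings above refer to: it uses plain central differences, hard bin assignment, and no block normalisation. The names `cell_histogram` and `descriptor_length` are made up for illustration.

```python
import math


def cell_histogram(cell, n_bins=9):
    """Histogram of unsigned gradient orientations for one cell.

    `cell` is a 2D list of pixel intensities. Orientations are folded
    into [0, 180) degrees, so opposite gradient directions share a bin.
    """
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the cell border.
            gx = cell[y][min(x + 1, w - 1)] - cell[y][max(x - 1, 0)]
            gy = cell[min(y + 1, h - 1)][x] - cell[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Each pixel votes with its gradient magnitude, so strong
            # edges dominate the bin for their orientation.
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


def descriptor_length(cells_per_side=16, n_bins=9):
    """Number of values in the window descriptor fed to the SVM."""
    return cells_per_side * cells_per_side * n_bins


# A 4x4 cell containing a vertical edge: all votes land in the first bin.
edge_cell = [[0, 0, 255, 255] for _ in range(4)]
print(cell_histogram(edge_cell))
print(descriptor_length())
```

The edge example shows why edges "contribute a strong vote": pixels in flat regions have zero gradient magnitude and add nothing to any bin.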

There are many overlapping rectangles because the full frame image is scanned by sampling a dense grid of windows at various scales across the whole frame. Each window is fed into the SVM to determine whether that rectangle contains that object or not.
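The dense multi-scale scan can be sketched like this. The stride fraction, the window sizes, and the `classify` callback (standing in for the HOG+SVM decision) are assumptions chosen for illustration.

```python
def sliding_windows(frame_w, frame_h, win_sizes, stride_frac=0.25):
    """Yield (x, y, w, h) for a dense grid of square windows at each size."""
    for win in win_sizes:
        stride = max(1, int(win * stride_frac))
        for y in range(0, frame_h - win + 1, stride):
            for x in range(0, frame_w - win + 1, stride):
                yield (x, y, win, win)


def detect(frame_w, frame_h, win_sizes, classify):
    """Return every window the classifier accepts.

    `classify` is a hypothetical callback that takes a window tuple and
    returns True if the object is present; a real system would compute
    the window's HOG and evaluate the SVM here.
    """
    return [w for w in sliding_windows(frame_w, frame_h, win_sizes)
            if classify(w)]


# Number of windows evaluated on a 640x480 frame at a single 64-pixel scale.
print(sum(1 for _ in sliding_windows(640, 480, [64])))
```

Because neighbouring windows overlap heavily, one object usually triggers several accepted windows, which is why detections appear as clusters of overlapping rectangles. The per-window work is independent, which is what makes the scan easy to parallelise.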

The HOG cells can cover an arbitrary number of pixels; for example, each cell may cover 4×4 pixels or 64×64 pixels, and the HOG would still have 16×16 cells. This means that an input image may be scanned at various scales to find both large and small hazmat signs in the scene.
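As a quick worked example of the relationship between cell size and window size, the sketch below computes the window side for a few cell sizes and checks which fit in a 640×480 frame. The list of cell sizes and the helper names are arbitrary choices for illustration.

```python
CELLS_PER_SIDE = 16  # the HOG always has 16x16 cells


def window_side(cell_px, cells_per_side=CELLS_PER_SIDE):
    """Side length in pixels of the square window covered by the HOG."""
    return cell_px * cells_per_side


def cell_sizes_that_fit(frame_w, frame_h, cell_sizes):
    """Cell sizes whose full window fits inside the frame."""
    limit = min(frame_w, frame_h)
    return [c for c in cell_sizes if window_side(c) <= limit]


for cell_px in (4, 8, 16, 32, 64):
    print(cell_px, "px cells ->", window_side(cell_px), "px window")

print(cell_sizes_that_fit(640, 480, [4, 8, 16, 32, 64]))
```

A 4-pixel cell gives a 64-pixel window for small, distant signs, while a 64-pixel cell gives a 1024-pixel window, which only fits in frames larger than the ones timed above.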

