Inverted File System Structure

An inverted file system is an array of key-value pairs: each key is a Sift descriptor taken from the set of known images, and each value is an array of image entries.

An image entry consists of an image id and possibly other refinement data, e.g. a Hamming embedding [3], for improved matching accuracy.
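The structure described above can be sketched roughly as follows. This is only an illustration: the names `ImageEntry`, `InvertedFile`, `add` and `lookup` are hypothetical, and storing the key as a tuple of floats is an assumption made so that it can be used as a dictionary key.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Key = Tuple[float, ...]  # assumption: a descriptor (or codeword) stored as a tuple


@dataclass
class ImageEntry:
    image_id: int
    # Optional refinement data, e.g. a Hamming-embedding signature.
    refinement: Optional[bytes] = None


class InvertedFile:
    """Maps each descriptor key to the list of images that contain it."""

    def __init__(self) -> None:
        self.table: Dict[Key, List[ImageEntry]] = defaultdict(list)

    def add(self, key: Sequence[float], image_id: int,
            refinement: Optional[bytes] = None) -> None:
        self.table[tuple(key)].append(ImageEntry(image_id, refinement))

    def lookup(self, key: Sequence[float]) -> List[ImageEntry]:
        # .get() avoids creating an empty list for unknown keys.
        return self.table.get(tuple(key), [])
```

Calling `add` twice with the same key and two different image ids leaves one key whose value is an array of two image entries.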

The k-means algorithm is used to cluster the descriptors into codewords, so each codeword is then associated with multiple images. If no clustering is done, each codeword is associated with only one image, so its array of image ids has only one element.
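As a rough illustration of the clustering step, here is a minimal pure-Python sketch of Lloyd's k-means. The function names, the fixed iteration count and the choice of the first k points as initial centroids are all assumptions made to keep the example short and deterministic; a real system would more likely use a library implementation and a better initialisation.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int,
           iterations: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Return (codewords, assignment) after a fixed number of Lloyd iterations."""
    # Assumption: the first k points seed the centroids (deterministic, not robust).
    centroids = [list(map(float, p)) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each descriptor goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment
```

The returned centroids play the role of the codewords, and the assignment says which codeword each descriptor (and therefore each image) falls under.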

To match a query descriptor to a codeword, a brute-force search is used to find the closest codeword.
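A brute-force search of this kind might look like the sketch below. The function name `nearest_codeword` and the use of squared Euclidean distance are assumptions; the post does not say which distance is used.

```python
from typing import Sequence, Tuple


def nearest_codeword(query: Sequence[float],
                     codebook: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Linear scan over every codeword; returns (index, squared distance)."""
    best_index, best_dist = -1, float("inf")
    for i, codeword in enumerate(codebook):
        dist = sum((q - c) ** 2 for q, c in zip(query, codeword))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist
```

The cost grows linearly with the number of codewords, which is acceptable for a small index but is the first thing to replace (for example with a tree or hashing scheme) as the codebook grows.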

A keypoint in an image is usually an extremum of the response obtained by convolving the image with a finite-length filter, such as:

  • Difference of Gaussian (DoG)
  • Laplacian of Gaussian (LoG)

Convolution is expensive, and several rounds of downsampling are often required to calculate the filter response extrema over several image scales.

Motilal Agrawal et al. suggested using three integral images:

  1. one square integral image,
  2. two slanted integral images

and an octagonal bi-level filter to approximate the LoG calculation [2]. Keypoints are found by applying octagonal filters of different sizes to an image and finding the locations of the response extrema. If an extremum is above a certain threshold, its location is considered a keypoint.
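To show why integral images make this cheap, here is a simplified sketch. It is an assumption-laden stand-in rather than the method of [2]: it uses only the square integral image and a box-shaped (not octagonal) bi-level centre-surround filter, so the two slanted integral images are not needed. All function names are hypothetical.

```python
from typing import List, Sequence


def integral_image(img: Sequence[Sequence[float]]) -> List[List[float]]:
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x0: int, y0: int, x1: int, y1: int):
    """Sum over the half-open rectangle [x0, x1) x [y0, y1) in four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


def center_surround(ii, cx: int, cy: int, inner: int, outer: int) -> float:
    """Bi-level response: mean of the inner box minus mean of the surrounding ring.

    `inner` and `outer` are half-widths in pixels; the caller must keep the
    outer box inside the image.
    """
    inner_sum = box_sum(ii, cx - inner, cy - inner, cx + inner + 1, cy + inner + 1)
    outer_sum = box_sum(ii, cx - outer, cy - outer, cx + outer + 1, cy + outer + 1)
    inner_area = (2 * inner + 1) ** 2
    ring_area = (2 * outer + 1) ** 2 - inner_area
    return inner_sum / inner_area - (outer_sum - inner_sum) / ring_area
```

Whatever the filter size, each response costs a constant number of lookups, so evaluating several scales does not require repeated convolution or downsampling. A flat region gives a response of zero, while a bright blob on a dark background gives a large positive one that can then be thresholded.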

An octagon is chosen as the filter shape because it closely approximates a circular LoG filter. Seven scales of the filter are applied to the image for scale invariance, with sizes set according to the original paper.

  • The Censure detector is used in place of the Sift detector to find keypoints.
  • The Sift descriptor is used to describe Censure keypoints.

The calculation of the Sift descriptor involves computing statistics, i.e. binning, of the gradient over a small image patch in the vicinity of the keypoint.

The size of the image patch is determined by the size of the keypoint scale given by the Censure detector. Usually the size of the image patch is 1.5 to 3 times the size of the keypoint.

After computing the gradient magnitude of the image patch, the gradient magnitude is weighted by a Gaussian window to emphasize the importance of closer samples.
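The binning and Gaussian weighting can be sketched as below. This is a deliberately reduced example: it builds a single orientation histogram for the whole patch, whereas the full Sift descriptor concatenates such histograms over a grid of sub-regions. The function name, the 8 bins, the central-difference gradient and the default `sigma` of half the patch width are assumptions.

```python
import math
from typing import List, Optional, Sequence


def orientation_histogram(patch: Sequence[Sequence[float]], bins: int = 8,
                          sigma: Optional[float] = None) -> List[float]:
    """Gaussian-weighted histogram of gradient orientations over one patch."""
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    if sigma is None:
        sigma = 0.5 * w  # assumed default, not taken from the post
    hist = [0.0] * bins
    # Border pixels are skipped because central differences need both neighbours.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            # Samples closer to the patch centre get a larger weight.
            weight = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            b = int(angle / (2 * math.pi) * bins) % bins
            hist[b] += weight * magnitude
    return hist
```

For a patch whose intensity increases only from left to right, all of the weighted magnitude lands in the first bin (orientation 0).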

40 sample images are resized to 256×256 and then indexed by their Sift descriptors on the server machine and stored in an inverted file system (IFS) in RAM. The id of the closest matching image (from 1 to 40) is returned to the client.

 

For each image, four anchor points are added to the corners. When the client scans the image, the four corners will be used as reference points to do a perspective transform to correct the image to its upright position.
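The perspective correction from four corner correspondences can be sketched as follows. This is a self-contained illustration that solves the standard 8-unknown linear system for a homography by Gauss-Jordan elimination; the function names are hypothetical, and in practice a library routine (for example OpenCV's `getPerspectiveTransform`) would normally be used instead.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting (assumes a is non-singular)."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 matrix H (with H[2][2] = 1) mapping each of four src points to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(H: List[List[float]], x: float, y: float) -> Point:
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

With `src` set to the four detected anchor positions and `dst` set to the corners of the upright target square, `warp_point` maps any pixel of the scanned frame into the corrected image.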

The client tries to locate the four anchors in each frame using template matching. The image is then realigned to 256×256, its Censure keypoints are localized, and their Sift descriptors are calculated. These descriptors are then Base64-encoded and sent to the server using the HTTP POST method.
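The encoding and upload step might look like the sketch below. The wire format (little-endian 32-bit floats, concatenated descriptor after descriptor), the `text/plain` content type and the function names are assumptions for illustration only; the post does not specify them. The request object is built but not sent.

```python
import base64
import struct
import urllib.request
from typing import List, Sequence


def encode_descriptors(descriptors: Sequence[Sequence[float]]) -> bytes:
    """Pack descriptors as little-endian float32 values and Base64-encode them."""
    flat = [v for d in descriptors for v in d]
    raw = struct.pack("<%df" % len(flat), *flat)
    return base64.b64encode(raw)


def decode_descriptors(payload: bytes, dim: int) -> List[List[float]]:
    """Inverse of encode_descriptors; dim is the descriptor length."""
    raw = base64.b64decode(payload)
    flat = struct.unpack("<%df" % (len(raw) // 4), raw)
    return [list(flat[i:i + dim]) for i in range(0, len(flat), dim)]


def build_request(url: str,
                  descriptors: Sequence[Sequence[float]]) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying the encoded descriptors."""
    return urllib.request.Request(
        url,
        data=encode_descriptors(descriptors),
        method="POST",
        headers={"Content-Type": "text/plain"},
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the actual upload; the server side would call `decode_descriptors` with the descriptor length it expects.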

The server tries to match the received descriptors against its IFS using a brute-force method. The id of the closest matching image (from 1 to 40) is returned to the client.
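One plausible way to turn per-descriptor matches into a single image id is simple voting, sketched below. The voting rule is an assumption, since the post does not say how "closest matching image" is scored; `match_image` and the list-of-pairs layout of `index` are hypothetical as well.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

IndexRow = Tuple[Sequence[float], List[int]]  # (descriptor key, image ids)


def match_image(query_descriptors: Sequence[Sequence[float]],
                index: Sequence[IndexRow]) -> Optional[int]:
    """Each query descriptor votes for the images under its nearest key."""
    votes: Counter = Counter()
    for q in query_descriptors:
        best_ids: List[int] = []
        best_dist = float("inf")
        for key, image_ids in index:  # brute-force scan of the whole index
            dist = sum((a - b) ** 2 for a, b in zip(q, key))
            if dist < best_dist:
                best_dist, best_ids = dist, image_ids
        votes.update(best_ids)
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

The image that collects the most votes is returned; an empty query or an empty index yields `None`.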

A grayscale image I is generated by a scene of piecewise smooth (multiply-connected) surfaces S and albedo. Nuisances are divided into those that form a group g (contrast transformations, local changes of viewpoint) and those that form a non-invertible map (quantization, occlusions).

Deviations from this model (non-diffuse reflectance, mutual illumination, cast shadows, sensor noise) are not represented explicitly and are lumped together as an additive error n.

As abstract “visual recognition” tasks we consider classifications (detection, localization, categorization and recognition) that boil down to learning and evaluating the likelihood p(I|c) of a class c that affects the data via a Markov chain running from c, through the scene, to I.

 

http://www.vision.ee.ethz.ch/research/projects_icu.cgi?topic=4

http://www.willowgarage.com/pages/software/overview

http://vision.ucla.edu/papers/leeS11imavis.pdf

http://www.csse.uwa.edu.au/~du/ps/Huynh-et-al-IVCNZ09.pdf
