Hash-based similarity computation

1. Min-hash

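Min-hash is a commonly used method in the locality-sensitive hashing (LSH) family. The basic idea is to hash the items in each object's item list so that similar items are mapped to the same bucket with high probability. This preserves, as far as possible, the similarity between two objects after hashing: objects that were highly similar before remain highly similar. Because the number of buckets is far smaller than the number of input items, hashing also reduces the computational complexity.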

Feature uniqueness means that, with a large dictionary, the probability of two detected features sharing the same word index is very low.

Repeatability means that, compared to an s-tuple of s randomly selected min-hashes, a geometrically selected s-tuple (also with similar scales) has a higher probability of repeating in a related image.

Hashing-based methods are more suitable for partial-duplicate image discovery, because all images can be hashed into a hash table and hash collisions can be retrieved as similar images, which can then be further expanded into more complete image clusters by image retrieval.
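As a rough illustration of the hash-table idea, here is a minimal Python sketch. It assumes each image is a non-empty set of integer visual-word ids and uses an s-tuple of min-hashes (the min-hash itself is described further down) as the hash-table key. The names make_table, sketch, find_collisions, num_sketches and s are my own placeholders, and the retrieval step that expands collisions into clusters is not shown.

```python
import random
from collections import defaultdict
from itertools import combinations

def make_table(vocab_size, seed):
    """Look-up table: one random float per visual word in the vocabulary."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(vocab_size)]

def sketch(words, tables):
    """s-tuple of min-hashes, one per look-up table (words must be non-empty)."""
    return tuple(min(words, key=table.__getitem__) for table in tables)

def find_collisions(images, vocab_size, num_sketches=50, s=2, seed=0):
    """Return pairs of image ids that fall into the same bucket at least once."""
    pairs = set()
    for k in range(num_sketches):
        # A fresh group of s tables, and a fresh hash table, for each sketch.
        tables = [make_table(vocab_size, seed + k * s + t) for t in range(s)]
        buckets = defaultdict(list)
        for image_id, words in images.items():
            buckets[sketch(words, tables)].append(image_id)
        for ids in buckets.values():
            pairs.update(combinations(ids, 2))
    return pairs

images = {
    "a": {1, 2, 3, 4, 5},
    "b": {1, 2, 3, 4, 5},   # same visual words as "a"
    "c": {50, 51, 52},      # shares no visual word with "a" or "b"
}
print(find_collisions(images, vocab_size=100))  # {('a', 'b')}
```

Images with identical word sets always collide, and images with no word in common never do; for everything in between, the chance of a collision grows with the overlap, which is why collisions are useful candidates for similar images.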

An image is represented as a set of visual words, which can be obtained by quantizing local SIFT feature descriptors. The similarity between two images can be defined as the Jaccard similarity between the two corresponding sets of visual words I1 and I2, which is simply the ratio of the intersection to the union of the two sets:

sim(I1, I2) = |I1 ∩ I2| / |I1 ∪ I2|
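For a concrete feel, the ratio can be computed directly with Python sets. This is only a small sketch: the function name jaccard and the visual-word ids are made up for illustration, and returning 0.0 for two empty sets is a convention I chose here.

```python
def jaccard(words_a, words_b):
    """Jaccard similarity between two collections of visual-word ids."""
    a, b = set(words_a), set(words_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)

image_1 = {3, 17, 42, 99}
image_2 = {17, 42, 99, 256, 1024}
# 3 shared words out of 6 distinct words in total
print(jaccard(image_1, image_2))  # 0.5
```

Computing this exactly for every pair of images is quadratic in the number of images, which is the cost that hashing is meant to avoid.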

Min-hash and its variants can then be applied to finding similar sets and therefore similar images.

Min-hash is a hash function h : I -> v, which maps a set I to some value v. More specifically, a hash function is applied to each visual word in the set I, and the visual word that has minimum hashed value is returned as the min-hash h(I).

One way to implement the hash function is with a look-up table that assigns a random floating-point value to each visual word in the vocabulary, followed by a min operator. Computing the min-hash of a set I involves computing a hash of every element in the set, so the time taken is linear in the size of the set |I|.
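The look-up table approach can be sketched in a few lines of Python. The names make_lookup_table, min_hash and estimate_similarity are placeholders of mine, and visual words are assumed to be integer ids in range(vocab_size). The last function relies on the standard min-hash property that, for a random table, two sets get the same min-hash with probability equal to their Jaccard similarity, so the fraction of agreeing tables is an estimate of it.

```python
import random

def make_lookup_table(vocab_size, seed):
    """One random float per visual word; the seed makes the table repeatable."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(vocab_size)]

def min_hash(visual_words, table):
    """Return the visual word with the smallest table value: O(|I|) look-ups."""
    return min(visual_words, key=table.__getitem__)

def estimate_similarity(set_a, set_b, tables):
    """Fraction of independent tables on which the two min-hashes agree."""
    matches = sum(min_hash(set_a, t) == min_hash(set_b, t) for t in tables)
    return matches / len(tables)

# Hand-made table for a 3-word vocabulary: word 1 has the smallest value.
toy_table = [0.9, 0.1, 0.5]
print(min_hash({0, 1, 2}, toy_table))  # 1
print(min_hash({0, 2}, toy_table))     # 2

tables = [make_lookup_table(2000, seed) for seed in range(2000)]
image_1 = {3, 17, 42, 99}
image_2 = {17, 42, 99, 256, 1024}
# True Jaccard similarity is 3/6; the estimate should land close to it.
print(estimate_similarity(image_1, image_2, tables))
```

More tables give a tighter estimate at the cost of more look-ups per image, so the number of tables is a trade-off between accuracy and speed.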

2. Partition min-Hash (PmH)

3. Geometric min-hash (GmH)

To further utilize geometric constraints among visual words, we augment PmH by encoding the geometric structure in the sketches.
