Feature uniqueness means that under a large-size dictionary, the probability of at two detected features sharing the same word index is very low.
The repeatablity shows that compared to randomly selected s min-hashes for a s-tuple, geometrically selected s-tuple (also with similar scales) has higher probability to repeat in a related image.
Hashing based methods are more suitable for partial duplicate image discovery, because all images can be hashed into a hash table and hash collisions can be retrieved as similar images, which can then be further expanded into more complete
image clusters by image retrieval.
An image is represented as a set of visual words, which could be obtained by quantizing local SIFT feature descriptors. The similarity between two images can be deﬁned as the Jaccard similarity between the two corresponding sets of visual words I1 and I2, which is simply the ratio of the intersection to the union of the two sets:
Min-hash and its variants can then be applied to ﬁnding similar sets and therefore similar images.
Min-hash is a hash function h : I -> v, which maps a set I to some value v. More speciﬁcally, a hash function is applied to each visual word in the set I, and the visual word that has minimum hashed value is returned as the min-hash h(I).
One way to implement the hash function is by a look-up table, with a random ﬂoating-point value assigned for each visual word in the vocabulary, followed by a min operator. The computation of the min-hash of a set I involves computing a hash of every element in the set and the time taken is therefore linear in the size of the set |I|
2. Partition min-Hash (PmH)
3. Geometric min-hash (GmH)
To further utilize geometric constraints among visual words, we augment PmH by encoding the geometric structure in the sketches.