【Near-duplicate images】Highly similar images

>Exact duplicate detection: no changes to the image are allowed.

>Near-duplicate image detection (NDID): images of the same scene or object.

Detecting near-duplicate images in large databases imposes two challenging constraints on the methods used:

  1. for each image, only a small amount of data (a fingerprint) can be stored;
  2. queries must be very cheap to evaluate.

>The choice of an image representation

>The choice of the distance measure


>The amount of stored data – methods range from storing a constant (small) amount of data per image to storing large sets of image features, whose size often far exceeds the size of the images themselves.

>A bag of visual words with tf-idf: 

The tf part of the weighting scheme captures the number of features described by a given visual word. The frequency of a visual word in an image provides useful information about repeated structures and textures.

The idf part captures the informativeness of visual words – visual words that appear in many different images are less informative than those that appear rarely
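As a rough sketch of the weighting scheme described above (the function and variable names here are illustrative, not taken from any particular paper), the tf-idf weight of each visual word in an image can be computed as:

```python
import math
from collections import Counter

def tfidf_vector(image_words, database):
    """tf-idf weights for one image's bag of visual words.

    image_words: list of visual-word ids for the query image.
    database: list of such lists, one per database image.
    (A real system would keep an inverted file rather than rescan.)
    """
    n_images = len(database)

    # idf part: document frequency, i.e. in how many images each word appears
    df = Counter()
    for words in database:
        df.update(set(words))

    # tf part: how often each word occurs in this image
    tf = Counter(image_words)
    n_words = len(image_words)

    return {
        w: (count / n_words) * math.log(n_images / df[w])
        for w, count in tf.items()
        if df[w] > 0  # ignore words unseen in the database
    }
```

Words that occur in every database image get an idf factor of log(1) = 0, matching the intuition that ubiquitous visual words carry no discriminative information.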

>The min-Hash method:

The min-Hash method stores only a small constant amount of data per image, and its complexity for duplicate enumeration is close to linear in the number of duplicates returned.

  • The image is represented by a sparse set of visual words;
  • Similarity is measured by set overlap (the ratio of the size of the intersection to the size of the union);
  • The drawback is that some relevant information is not preserved in the binary set-of-visual-words representation.
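A minimal min-Hash sketch of the idea above, using simple affine hash functions (real systems additionally group several min-hashes into sketches and hash those into buckets for sub-linear retrieval, which is omitted here):

```python
import random

P = 2_147_483_647  # a large prime (2**31 - 1)

def make_hashes(k, seed=0):
    """k random affine hash functions h(x) = (a*x + b) mod P."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(P)) for _ in range(k)]

def minhash_signature(word_set, hashes):
    """One min-hash per hash function; this list is the stored fingerprint."""
    return [min((a * w + b) % P for w in word_set) for a, b in hashes]

def estimate_overlap(sig_a, sig_b):
    """The fraction of agreeing min-hashes is an unbiased estimate
    of the set overlap (Jaccard similarity) of the two word sets."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

The key property is that, for each hash function, the probability that two sets share the same min-hash equals their set overlap, so averaging over many independent hashes converges to the true overlap.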

>Comparison of two schemes for near duplicate image and video-shot detection:

1) Locality Sensitive Hashing for fast retrieval on global hierarchical tiled colour histograms

2) Local feature descriptors (SIFT)

Here, two images are deemed near-duplicates when:

(i) they are perceptually identical (e.g. up to noise, discretization effects, small photometric distortions, etc.); and

(ii) they are images of the same 3D scene (so allowing for viewpoint changes and partial occlusion).
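Scheme 1 can be illustrated with the simplest form of locality-sensitive hashing, random-hyperplane (cosine) LSH over a colour histogram; the hierarchical tiling of the actual scheme is omitted, and all names here are illustrative:

```python
import random

def make_planes(n_bits, dim, seed=0):
    """n_bits random Gaussian hyperplanes in histogram space."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def lsh_signature(histogram, planes):
    """One bit per hyperplane: which side of the plane the histogram lies on.
    Similar histograms agree on most bits, so hashing chunks of the
    signature into buckets retrieves candidates without a linear scan."""
    return tuple(
        sum(h * p for h, p in zip(histogram, plane)) >= 0
        for plane in planes
    )
```

The probability that two histograms agree on a given bit is 1 − θ/π, where θ is the angle between them, so near-duplicate histograms produce nearly identical signatures.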

Near-duplicate images typically differ in size, colour adjustment, compression level, etc. Exact-duplicate detection will therefore fail to group all similar results together.

>Near-duplicate shot detection (NDSD): 

Given a reference shot, this can be used to find all shots in a database that are near-duplicates of the reference, where we define this to mean that a high proportion of images in the reference shot have near-duplicates in the returned shot.
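Given any frame-level near-duplicate predicate (here a hypothetical `frames_match` function, e.g. a min-Hash overlap test), the shot-level rule described above can be sketched as:

```python
def near_duplicate_shots(ref_shot, database_shots, frames_match,
                         threshold=0.5):
    """Return the shots in which at least `threshold` of the reference
    shot's frames have a near-duplicate frame.  `frames_match` is a
    hypothetical frame-level near-duplicate test; the threshold value
    is an assumption, not taken from the source."""
    hits = []
    for shot in database_shots:
        matched = sum(
            any(frames_match(f, g) for g in shot) for f in ref_shot
        )
        if matched / len(ref_shot) >= threshold:
            hits.append(shot)
    return hits
```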

Application: detecting copies within large amounts of copyrighted television and film content.

