[paper]Fast Multiresolution Image Querying

User may paint on a “canvas” of any aspect ratio. However, the painted query is internally rescaled to a square aspect ratio and searched against a database in which all images have been similarly rescaled as a preprocess.

how a user-specified aspect ratio might also be used to improve the match?

To evaluate our image querying algorithm, we collected three types of query data:

  • The first set, called “scanned queries,” were obtained by printing out small 1/2×1/2  thumbnails of our database images. 270: 100 were reserved for evaluating our metric, and the other 170 were used as a training set.
  • The second set, called “painted queries,” were obtained by asking 20 subjects, most of whom were first-time users of the system, to paint complete image queries, in the non-interactive mode, while looking at thumbnail-sized versions of the images they were attempting to retrieve. 270
  • The third set, called “memory queries,” without being able to see the image directly. 100 queries only for evaluation

To see if painting from memory would affect retrieval time: 3 subjects

  • Each subject was then asked to paint the 10 images from the first set while looking at a thumbnail of the image, and the 10 images from the second set from memory
  • median query time increased from 18 seconds when the subjects were looking at the thumbnails, to 22 seconds when the queries were painted from memory.

Users will typically be able to sketch all the information they know about an image in a minute or less, whether they are looking at a thumbnail or painting from memory.

  1. If the query fails to bring up the intended target within a minute or so, users will typically try adding some random details, which sometimes help in bringing up the image.
  2. If this tactic fails, users will simply give up and, in a real system, would presumably fall back on some other method of searching for the image.

benefits of painting queries interactively:

  1. the time to retrieve an image is generally reduced because the user simply paints until the target image appears, rather than painting until the query image seems finished.
  2. the interactive mode subtly helps “train” the user to find images more efficiently, because the application is always providing feedback about the relative effectiveness of an unfinished query while it is being painted.

Multiresolution Query by Image Content

The key to the algorithm is the establishment of an effective and efficient metric capable of computing the distance between a query image Q and a potential target image T. The chosen metric use the YIQ color space and the Haar wavelet.

figure1.gif (82382 bytes)

Wavelet decomposition:

  • few coefficients provides a good approximation of the original image retaining information from existing edges;
  • presents relative invariance to resolution changes;
  • it is fast to compute, running in linear time in size of the image;
  • spatial localization of the frequencies.

Compute Haar wavelet transform For each image, truncating and quantizing its coeffecients. Those remaining coeffecients represent the image signature.

figure3.gif (19639 bytes)

A painting its truncated and quantized wavelet decomposition with 2000 coefficients
(Y color channel)
the actual decomposition used with 60 coefficients
(Y color channel)


To use a low-resolution database, and also use a low-resolution version of the image provided for querying. This speeds up the process substantially and gives essentially the same results since the continuous wavelet transform is invariant under change of scale and almost all of the largest m coefficients are located in the low resolution.

We implemented an estimated perceptual of approximation between the query image and the images in the data base.

  • We also tried to use another Haar decomposition, which is faster than the one used by Salesin et al.. The results were good, in spite of using the same weights of the Haar initially used.
  • shark.gif (360661 bytes)

Stackoverflow:I implemented a very reliable algorithm for this called Fast Multiresolution Image Querying. My (ancient, unmaintained) code for that is here.

What Fast Multiresolution Image Querying does is:

  • split the image into 3 pieces based on the YIQ colorspace (better for matching differences than RGB).
  • Then the image is essentially compressed using a wavelet algorithm until only the most prominent features from each colorspace are available.
  • These points are stored in a data structure.
  • Query images go through the same process, and the prominent features in the query image are matched against those in the stored database.
  • The more matches, the more likely the images are similar.

The algorithm is often used for “query by sketch” functionality. My software only allowed entering query images via URL, so there was no user interface. However, I found it worked exceptionally well for matching thumbnails to the large version of that image.

Much more impressive than my software is retrievr which lets you try out the FMIQ algorithm using Flickr images as the source. Very cool! Try it out via sketch or using a source image, and you can see how well it works.


Mona Lisa (monalisa.jpg)

person has a coarse-scale idea of what the Mona Lisa, for instance, looks like. This information should be fairly useful for finding an actual image of the Mona Lisa, but given current techniques, searches for visual data break down as effective strategies when the database size increases to even a small fraction of the number of images on the World Wide Web.

Our algorithm should also be well suited to matching coarse-scale versions of images to high detail versions of the same image. Users should be able to sketch an image in a simple drawing application where a lot of detail is not easy to add to the query image. They should also be able to enter images that have been digitized by the use of a scanner, which we assume introduces blurriness and additional noise such as scratches, dust, etc, to the extent that they would find it highly useful to search for a higher-resolution version of the image online.

Ideally, we would also like our algorithm to be able to handle affine transformations, such as translation, rotation, and scaling. It is unreasonable to expect a user to be able to draw parts of an image in exactly the same region that they appear in the original image. While these three transformations are all important components of an image querying system, we made the decision to focus on translation because it seems like the most likely type of error that a user would make.

The primary drawback of FMIQ is that the approach is ineffective for detecting shifts of an image since the separable discrete wavelet basis is not shift-invariant. Therefore, we propose the use of the complex discrete wavelet basis which possesses a high degree of shift-invariance in its magnitude. When coupled appropriately with the two-dimensional Discrete Fourier Transform, the two-dimensional Complex Discrete Wavelet Transform allows us to match shifted versions of an image with a significantly higher degree of certainty than does the approach of Jacobs, et al.

The signatures are computed as follows:

  1. Compute the discrete wavelet transform of the image.
  2. Set all but the highest magnitude wavelet coefficients to 0.
  3. Of the remaining coefficients, quantize the positive coefficients to +1 and the negative ones to –1.
  4. These +1’s and –1’s correspond to the feature points in an image, and basically characterize the image structure. Jacobs et al. concluded, after some experimentation, that on their database, considering the top 60 magnitude coefficients worked well for scanned images, while the top 40 coefficients gave best results for hand-drawn images.
  5. The signatures in our implementation were compared using the generic L1 norm of the difference between signature matrices. Jacobs et al. use the non-intuitive “Lq” norm, which somehow weights the coefficients corresponding to different scales differently. This idea definitely carries some merit, but Jacobs et al. do not provide a very good explanation of this scheme, and we don’t believe that it will improve the performance of their querying algorithm significantly.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




Just another WordPress.com site

Jing's Blog

Just another WordPress.com site

Start from here......







Just another WordPress.com site

Where On Earth Is Waldo?

A Project By Melanie Coles

the Serious Computer Vision Blog

A blog about computer vision and serious stuff

Cauthy's Blog

paper review...

Cornell Computer Vision Seminar Blog

Blog for CS 7670 - Special Topics in Computer Vision


Life through nerd-colored glasses

Luciana Haill

Brainwaves Augmenting Consciousness



Dr Paul Tennent

and the university of nottingham

turn off the lights, please

A bunch of random, thinned and stateless thoughts around the Web

%d bloggers like this: