Occlusions, cluttered backgrounds, and viewpoint or orientation changes that occur in real-world images motivated the development of object recognition and image retrieval methods that model image appearance locally, using so-called “local features”. For example, the neighbourhoods of corners, lines/edges, contours, or homogeneous regions capture interesting aspects of images that can be used to classify or compare them.
However, a single detector might not capture enough information to distinguish all images.
Alternatives are to use several detectors, or to extract patches on a uniform grid or even at random. For example, in the random subwindow sampling scheme, square patches of random sizes are extracted at random locations in the image, resized by bilinear interpolation to a fixed size of 16×16 pixels, and described by their HSV values for color images (16×16×3 = 768 dimensions) or by their gray intensities for grayscale images (16×16 = 256 dimensions).
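The sampling scheme above can be sketched in plain Python as follows. This is a minimal illustration, not the original implementation: the image is assumed to be a list of rows of pixel tuples already converted to HSV (e.g. with `colorsys.rgb_to_hsv`) or holding a single gray value, the patch side is assumed to be drawn uniformly between 1 and the smaller image dimension, and the helper names `bilinear_resize` and `random_subwindow` are hypothetical.

```python
import random
from typing import List, Tuple

Pixel = Tuple[float, ...]   # one value per channel: (H, S, V) or (gray,)
Image = List[List[Pixel]]   # image[row][col]


def bilinear_resize(patch: Image, out: int = 16) -> Image:
    """Resize a square patch to out x out pixels by bilinear interpolation."""
    size = len(patch)
    channels = len(patch[0][0])
    result: Image = []
    for i in range(out):
        # Map the output pixel centre back into patch coordinates (clamped).
        y = min(max((i + 0.5) * size / out - 0.5, 0.0), size - 1)
        y0, wy = int(y), y - int(y)
        y1 = min(y0 + 1, size - 1)
        row = []
        for j in range(out):
            x = min(max((j + 0.5) * size / out - 0.5, 0.0), size - 1)
            x0, wx = int(x), x - int(x)
            x1 = min(x0 + 1, size - 1)
            row.append(tuple(
                (1 - wy) * ((1 - wx) * patch[y0][x0][c] + wx * patch[y0][x1][c])
                + wy * ((1 - wx) * patch[y1][x0][c] + wx * patch[y1][x1][c])
                for c in range(channels)))
        result.append(row)
    return result


def random_subwindow(image: Image, rng: random.Random, out: int = 16) -> List[float]:
    """Extract one square patch of random size at a random location and
    return its flattened out x out x channels descriptor."""
    height, width = len(image), len(image[0])
    size = rng.randint(1, min(height, width))   # random side length (assumed range)
    top = rng.randint(0, height - size)          # random location
    left = rng.randint(0, width - size)
    patch = [row[left:left + size] for row in image[top:top + size]]
    return [v for row in bilinear_resize(patch, out) for pixel in row for v in pixel]


rng = random.Random(0)
hsv_image = [[(rng.random(), rng.random(), rng.random()) for _ in range(40)] for _ in range(30)]
gray_image = [[(rng.random(),) for _ in range(40)] for _ in range(30)]
print(len(random_subwindow(hsv_image, rng)))   # 768 = 16 * 16 * 3
print(len(random_subwindow(gray_image, rng)))  # 256 = 16 * 16
```

Because every patch is resized to the same 16×16 grid, descriptors of patches of any size can be compared component by component, which is what the tree tests below rely on.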
Extraction of random subwindows described by raw pixel values, indexed with randomized trees:
The method recursively partitions the training sample of subwindows by randomly generated tests.
- Each test is chosen by selecting a random pixel component among the 768 subwindow descriptor components, and a random cut-point in the range of variation of that pixel component within the subset of subwindows associated with the node to be split.
- Each test associated with an internal node of a tree simply compares a pixel component to a numerical threshold.
- The development of a node stops when either all descriptors in the leaf are identical (constant in every component) or the number of subwindows in the leaf is smaller than a predefined threshold.
- A number of such trees are grown from the training sample.
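The growing procedure described in these bullets can be sketched as below. It is an illustrative sketch under stated assumptions: `n_min` is a hypothetical name for the predefined stopping threshold, constant components are skipped by drawing components in random order (which still picks uniformly among the components that can be split), and the `Node`, `grow_tree`, `grow_ensemble` and `leaf_of` names are hypothetical.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Iterator, List, Optional

Descriptor = List[float]


@dataclass
class Node:
    component: int = -1              # pixel component tested (internal nodes only)
    threshold: float = 0.0           # random cut-point (internal nodes only)
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    leaf_id: int = -1                # >= 0 only for leaves
    size: int = 0                    # number of training subwindows that reached this node


def grow_tree(samples: List[Descriptor], rng: random.Random,
              n_min: int, ids: Iterator[int]) -> Node:
    """Recursively partition the subwindows with randomly generated tests."""
    if len(samples) >= n_min:
        dims = len(samples[0])
        # Visit components in random order: the first non-constant one is a
        # uniformly random choice among the components that can still be split.
        for d in rng.sample(range(dims), dims):
            values = [s[d] for s in samples]
            lo, hi = min(values), max(values)
            if lo < hi:
                while True:  # random cut-point in the component's range of variation
                    cut = rng.uniform(lo, hi)
                    left = [s for s in samples if s[d] < cut]
                    right = [s for s in samples if s[d] >= cut]
                    if left and right:
                        break
                return Node(component=d, threshold=cut,
                            left=grow_tree(left, rng, n_min, ids),
                            right=grow_tree(right, rng, n_min, ids),
                            size=len(samples))
    # Leaf: fewer than n_min subwindows, or all descriptors in the node are identical.
    return Node(leaf_id=next(ids), size=len(samples))


def grow_ensemble(samples: List[Descriptor], n_trees: int, n_min: int,
                  seed: int = 0) -> List[Node]:
    """Grow several independent totally randomized trees from the same sample."""
    rng = random.Random(seed)
    return [grow_tree(samples, rng, n_min, itertools.count()) for _ in range(n_trees)]


def leaf_of(tree: Node, x: Descriptor) -> Node:
    """Propagate a descriptor: each internal node compares one component to a threshold."""
    node = tree
    while node.leaf_id < 0:
        node = node.left if x[node.component] < node.threshold else node.right
    return node


# Usage on synthetic 768-dimensional descriptors (random values, not real images).
rng = random.Random(42)
training = [[rng.random() for _ in range(768)] for _ in range(200)]
forest = grow_ensemble(training, n_trees=5, n_min=5, seed=1)
print(leaf_of(forest[0], training[0]).size)  # training subwindows sharing this leaf
```

No split criterion is evaluated: each node only looks at the values of the one component it ends up testing, so the cost of a test does not grow with the number of descriptor components beyond drawing the random order.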
There exist a number of indexing techniques based on recursive partitioning. The method described here differs from them mainly in two respects, while sharing their computational advantages:
- It uses an ensemble of trees rather than a single tree.
- It selects tests at random, in place of more elaborate splitting strategies such as splits based on a distance computed over the whole descriptors, or cut-points taken at the median of the pixel component whose distribution exhibits the greatest spread.
- Like other tree methods, its computational complexity is essentially independent of the dimensionality of the feature space, and is O(N log N) in the number N of subwindows.
Indexing with totally randomized trees is also related to the random-projection method of locality-sensitive hashing (LSH), which is used to approximate nearest-neighbour searches.
- The assumption is that nearby objects are more likely to be hashed to the same bucket than distant ones. Once a new object is hashed to a bucket, a similarity measure is computed between this object and all reference objects that were hashed to the same bucket during the indexing phase.
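For comparison, the random-projection variant of LSH can be sketched as follows: each hash bit records on which side of a random hyperplane through the origin a vector lies, so vectors separated by a small angle tend to share all bits and hence a bucket. This is a minimal sketch; the class name `RandomProjectionLSH`, the number of bits, and the use of cosine similarity as the within-bucket measure are assumptions for illustration.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, used here as the (assumed) within-bucket measure."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class RandomProjectionLSH:
    """Hash = signs of the projections on random hyperplanes through the origin."""

    def __init__(self, dim: int, n_bits: int, seed: int = 0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
        self.refs: List[Vector] = []

    def _hash(self, v: Vector) -> Tuple[int, ...]:
        # One bit per hyperplane: which side of it the vector lies on.
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0.0) for plane in self.planes)

    def index(self, v: Vector) -> None:
        """Indexing phase: store a reference object in its bucket."""
        self.refs.append(v)
        self.buckets[self._hash(v)].append(len(self.refs) - 1)

    def query(self, q: Vector) -> List[Tuple[int, float]]:
        """Compare the query only with the reference objects hashed to its bucket."""
        candidates = self.buckets.get(self._hash(q), [])
        scored = [(i, cosine(q, self.refs[i])) for i in candidates]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)


rng = random.Random(3)
vectors = [[rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(100)]
lsh = RandomProjectionLSH(dim=32, n_bits=8, seed=7)
for v in vectors:
    lsh.index(v)
noisy = [x + 0.05 * rng.gauss(0.0, 1.0) for x in vectors[7]]
print(lsh.query(noisy)[:3])  # best candidates from the query's bucket, if any
```

As with a single tree leaf, a single bucket is a hard partition of the space; both approaches only compare the query with the few reference objects that share its cell.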
Extra-Trees: Totally randomized trees
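Extremely randomized trees have also been used to build visual vocabularies from labeled images, encoding subwindows in binary form by the tree leaves they reach, before applying an SVM classifier to perform image classification. Here, rather than this binary encoding of the leaves, the trees are used to define a real-valued similarity measure.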
Image similarities from tree ensembles
- Two subwindows are very similar if they fall into the same leaf and that leaf contains only a very small subset of the training subwindows.
- Two subwindows are similar if they are considered similar by a large proportion of the trees.
- The smoothness of the similarity is controlled by two parameters: the threshold on the number of subwindows below which a node is no longer split (the bigger it is, the more subwindows fall into the same leaf, which yields higher similarities) and the number of trees.
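These properties can be written compactly. The following formalization is a sketch consistent with the bullets above, with notation introduced here for illustration: $T$ is the number of trees and $N_t(s)$ is the number of training subwindows in the leaf of tree $t$ reached by subwindow $s$.

$$
k_t(s, s') =
\begin{cases}
\dfrac{1}{N_t(s)} & \text{if } s \text{ and } s' \text{ reach the same leaf of tree } t,\\[4pt]
0 & \text{otherwise,}
\end{cases}
\qquad
k(s, s') = \frac{1}{T}\sum_{t=1}^{T} k_t(s, s')
$$

A small leaf yields a large per-tree contribution, and averaging over $T$ trees makes $k(s, s')$ grow with the proportion of trees in which the two subwindows meet.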
The similarity between two images
It is defined as the average similarity between all pairs of their subwindows. Although finite, the number of different subwindows of variable size and location that can be extracted from a given image is in practice very large, so we propose to estimate this average by extracting at random an a priori fixed number of subwindows from each image.
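The average over all pairs does not require enumerating the pairs. Assuming that two subwindows sharing a leaf of a tree score 1/N_leaf in that tree (0 otherwise) and that scores are averaged over trees, a leaf holding n_a subwindows of one image and n_b of the other contributes n_a·n_b/N_leaf. The sketch below uses this grouping; the tree interface (a leaf-lookup function plus a table of leaf sizes) and the two hand-made one-split trees are hypothetical, chosen so the result can be checked by hand.

```python
from collections import Counter
from collections.abc import Hashable
from typing import Callable, Dict, List, Sequence, Tuple

Descriptor = Sequence[float]
# Hypothetical tree interface: (function returning the leaf reached by a descriptor,
#                               number of training subwindows stored in each leaf).
Tree = Tuple[Callable[[Descriptor], Hashable], Dict[Hashable, int]]


def image_similarity(sub_a: List[Descriptor], sub_b: List[Descriptor],
                     forest: List[Tree]) -> float:
    """Average over all subwindow pairs (s, s') and all trees of 1 / N_leaf
    when s and s' reach the same leaf (0 otherwise)."""
    total = 0.0
    for leaf_of, leaf_size in forest:
        counts_a = Counter(leaf_of(s) for s in sub_a)
        counts_b = Counter(leaf_of(s) for s in sub_b)
        # A leaf holding n_a subwindows of one image and counts_b[leaf] of the other
        # accounts for n_a * counts_b[leaf] pairs, each worth 1 / N_leaf.
        total += sum(n_a * counts_b[leaf] / leaf_size[leaf] for leaf, n_a in counts_a.items())
    return total / (len(forest) * len(sub_a) * len(sub_b))


# Toy forest of two one-split trees (hand-made, not grown from data).
forest: List[Tree] = [
    (lambda x: "L" if x[0] < 0.5 else "R", {"L": 2, "R": 4}),
    (lambda x: "L" if x[1] < 0.5 else "R", {"L": 3, "R": 3}),
]
img_a = [[0.1, 0.9], [0.2, 0.8]]
img_b = [[0.3, 0.7], [0.9, 0.1]]
print(round(image_similarity(img_a, img_b, forest), 4))  # 0.2083 (= 5/24)
```

Only the pairs formed by either subwindow of `img_a` with the first subwindow of `img_b` meet in a leaf (in both trees), giving (1/2 + 1/3)/2 for each of these 2 pairs, averaged over the 4 pairs.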
Given a set of Nr reference images, the images of this set that are most similar to a query image Iq are found as follows:
- Randomly extract Nls subwindows of variable size and location from each reference image, resize them to 16×16, and grow an ensemble of totally randomized trees from them.
- Extract subwindows from Iq in the same way, compute k(Iq, Ir) for each reference image Ir, and return the N images most similar to the query according to these similarities.
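The two retrieval steps can be sketched end to end as follows, under several assumptions: each image is represented directly by its list of already extracted descriptors (extraction and resizing are left out, and 8-dimensional synthetic descriptors replace the 768-dimensional ones to keep the demo fast), two subwindows sharing a leaf score 1/N_leaf in that tree, and the names `grow`, `leaf_of` and `RandomSubwindowIndex` are hypothetical. The leaf counts of each reference image are precomputed at indexing time, so a query is propagated through the trees only once.

```python
import heapq
import random
from collections import Counter
from typing import List, Sequence

Descriptor = Sequence[float]


def grow(samples, rng, n_min, sizes):
    """Totally randomized tree as nested tuples (component, cut, left, right).
    A leaf is an int index into `sizes`, which stores its number of training subwindows."""
    if len(samples) >= n_min:
        dims = len(samples[0])
        for d in rng.sample(range(dims), dims):      # random component, skipping constant ones
            lo = min(s[d] for s in samples)
            hi = max(s[d] for s in samples)
            if lo < hi:
                while True:                          # random cut-point that splits the node
                    cut = rng.uniform(lo, hi)
                    left = [s for s in samples if s[d] < cut]
                    right = [s for s in samples if s[d] >= cut]
                    if left and right:
                        return (d, cut, grow(left, rng, n_min, sizes), grow(right, rng, n_min, sizes))
    sizes.append(len(samples))                       # leaf: too few subwindows, or all identical
    return len(sizes) - 1


def leaf_of(node, x: Descriptor) -> int:
    while not isinstance(node, int):
        d, cut, left, right = node
        node = left if x[d] < cut else right
    return node


class RandomSubwindowIndex:
    def __init__(self, references: List[List[Descriptor]], n_trees: int = 10,
                 n_min: int = 2, seed: int = 0):
        rng = random.Random(seed)
        pooled = [s for image in references for s in image]   # all reference subwindows
        self.forest = []
        for _ in range(n_trees):
            sizes: List[int] = []
            self.forest.append((grow(pooled, rng, n_min, sizes), sizes))
        # Indexing: per reference image and per tree, how many subwindows reach each leaf.
        self.ref_counts = [self._leaf_counts(image) for image in references]
        self.ref_sizes = [len(image) for image in references]

    def _leaf_counts(self, subwindows):
        return [Counter(leaf_of(root, s) for s in subwindows) for root, _ in self.forest]

    def _similarity(self, q_counts, n_q, r):
        """k(Iq, Ir): pairs meeting in leaf l add 1 / N_l, averaged over pairs and trees."""
        total = 0.0
        for t, (_, sizes) in enumerate(self.forest):
            ref = self.ref_counts[r][t]
            total += sum(n * ref[leaf] / sizes[leaf] for leaf, n in q_counts[t].items())
        return total / (len(self.forest) * n_q * self.ref_sizes[r])

    def query(self, subwindows: List[Descriptor], n_results: int) -> List[int]:
        """Indices of the n_results reference images most similar to the query."""
        q_counts = self._leaf_counts(subwindows)
        scores = [(self._similarity(q_counts, len(subwindows), r), r)
                  for r in range(len(self.ref_counts))]
        return [r for _, r in heapq.nlargest(n_results, scores)]


# Synthetic demo: 5 "images" of 20 descriptors each, clustered per image.
rng = random.Random(1)
references = [[[10.0 * i + rng.random() for _ in range(8)] for _ in range(20)] for i in range(5)]
index = RandomSubwindowIndex(references, n_trees=10, n_min=2)
query_subwindows = [[30.0 + rng.random() for _ in range(8)] for _ in range(20)]  # near image 3
print(index.query(query_subwindows, 3))
```

With `n_min=2` the trees are fully developed, so every leaf holds a single training subwindow; larger values give bigger leaves and a smoother similarity, as noted earlier.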