Shape databases of images

As mentioned in this post

A shape-based retrieval system:

The use of shape as a cue for indexing into pictorial databases

This has traditionally been based on global invariant statistics and deformable templates on the one hand, and local edge correlation on the other.

an intermediate approach based on a characterization of the symmetry in edge maps


How are datasets used to test Shape Context?

1. Original SC+TPS

1-NN classifier with shape context dissimilarity as the distance measure:

  1. estimate affine transforms between the query shape and prototype
  2. apply the affine transform and recompute the shape contexts for the transformed point set
  3. score the match by summing up the shape context distances between each point on a shape to its most similar point on the other shape
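Step 3 and the 1-NN decision can be sketched as follows. This is a minimal sketch, not the original implementation: it assumes the shape context histograms have already been computed for each sampled point, uses the chi-squared histogram cost, and sums the best-match costs in both directions. The function names and the symmetric sum are illustrative choices.

```python
def chi2_cost(h1, h2):
    """Chi-squared cost between two histograms of equal length."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def shape_distance(descs_a, descs_b):
    """Sum, over the points of each shape, of the cost to the most
    similar point on the other shape (assumed symmetric form)."""
    a_to_b = sum(min(chi2_cost(ha, hb) for hb in descs_b) for ha in descs_a)
    b_to_a = sum(min(chi2_cost(hb, ha) for ha in descs_a) for hb in descs_b)
    return a_to_b + b_to_a


def classify_1nn(query_descs, prototypes):
    """prototypes: list of (label, descriptors). Returns the label of the
    prototype with the smallest shape distance to the query."""
    label, _ = min(
        ((lab, shape_distance(query_descs, descs)) for lab, descs in prototypes),
        key=lambda pair: pair[1],
    )
    return label
```

Each descriptor list holds one histogram per sampled point, so a shape with n = 100 points and nd = 5, no = 12 would be a list of 100 histograms of length 60.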

Digit recognition

    • MNIST hand-written dataset: 60,000 training and 10,000 test digits, n=100(Canny edge), nd = 5, no = 12

3D object recognition

    • COIL-20 dataset: 72 views of each of 20 common household objects

MPEG-7 shape silhouette database –

    • CE (core experiment)-shape-1 part B: 1400 images from 70 shape classes * 20 images each

 Trademark Retrieval

    • Trademarks are often visually best described by their shape information
    • The Vienna code broadly categorizes trademarks, which are manually classified by perceptual similarity
    • 300 trademarks, n=300, 8 query trademarks, top 4 hits


3. Inner distance shape context [pdf]

optional invariance

    • n sampled landmark points: larger n produces greater accuracy at the cost of efficiency
    • size of histogram:

– nd: number of inner-distance bins = 5; sometimes nd = 8 gives better results

– no: number of inner-angle bins = 12

– nr: number of relative orientation bins = 8

    • k: number of different starting points for alignment in dynamic programming
      • larger k can improve the performance further
    • t: penalty for one occlusion; values in [0.25, 0.5] do not affect the results much

Test Dataset

    • an articulated shape dataset: 40 images from 8 different objects, n = 200, nd = 5, no = 12, k = 1


Each object has 5 images articulated to different degrees

    • MPEG7 CE-Shape-1: 1400 silhouette images from 70 classes, n = 100, nd = 8, no = 12, k = 8

Typical shape images from the MPEG7 CE-Shape-1, one image from each class.

The complexity of the shapes is mainly due to the part structures, not articulations

    • Kimia silhouettes: 
      • (a) 25 images from 6 classes, n = 100, nd = 5, no = 12, k = 4
      • (b) 99 images from 9 classes, n = 300, nd = 8, no = 12, k = 4


    • ETH-80 data set: 80 objects from 8 classes, 41 views for each object, 3280 images in total. n = 128, nd = 8, no = 12, k = 16

Leave-one-object-out cross-validation is used as the test mode: each image is compared to all images from the other 79 objects
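The leave-one-object-out protocol can be sketched as below. This is an illustrative sketch only: the sample layout (object id, class label, descriptor), the function name, and the use of a nearest-neighbor decision are assumptions, and `distance` stands for whatever shape distance is being evaluated.

```python
def leave_one_object_out_accuracy(samples, distance):
    """samples: list of (object_id, class_label, descriptor).

    Each image is compared only with images of *other* objects, so the
    remaining views of the same object can never serve as the match.
    Returns the fraction of images whose nearest neighbor shares their class.
    """
    correct = 0
    for obj_id, label, desc in samples:
        # Hold out every view of the current object.
        candidates = [s for s in samples if s[0] != obj_id]
        nearest = min(candidates, key=lambda s: distance(desc, s[2]))
        if nearest[1] == label:
            correct += 1
    return correct / len(samples)
```

With ETH-80 this means each of the 3280 images is matched against the 79 * 41 images of the other objects.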

    • Swedish leaf dataset: 75 leaves from each of 15 species, n = 128, nd = 8, no = 12, nr = 8, k = 1

Each species contains 25 training samples and 50 testing samples; recognition uses a 1-nearest-neighbor classifier

    • Smithsonian leaf dataset: 343 leaves from 93 species, nd = 5, nr = 8, no = 12, DP not used

One typical image from each species is shown

187 of them are used as the training set and 156 as the testing set

    • human motion silhouette dataset: human body matching

4. SC + Chamfer matching in cluttered scene

5. Partial shape matching [pdf]

 ETHZ Shape Classes dataset v1.2

INRIA horses dataset v1.03

6. Angular Partitioning Sketch-Based Image Matching [pdf], 2005

    • Model: 4000 full color heterogeneous images of various sizes (500 in 8 groups), which is a true-balanced combination of:
      • 250 art works, gained from the World Art Kiosk, California State University
      • 250 real natural photographs from set S3 of the MPEG-7 database
      • Each group contains 8 similar images created by rotation in steps of 45°
    • Query: 400 sketches (100 in 4 groups):
      • hand-drawn black and white sketches similar to 100 arbitrary candidates from the model and their rotated versions (90°, 180°, and 270°)

      • scanned with 200 dpi resolution
      • each input query has eight similar images in the database, in the best case there are eight nonsimilar images in the retrieval list


  • Elastic Matching of User Sketches, 1997 [pdf]
    • 100 test images: 22 Morandi paintings, 10 sacred pictures, and sample pictures of diverse objects with dissimilar shapes

  • CBIR by shape matching, 2006 [pdf], VRIMA [pdf]
    • 20 Morandi’s bottle paintings
    • each database image is stored as a collection of object shapes
    • each shape is represented by a list of 30 vertices
    • qualitative:
      • 25 sampled object images
      • 3 templates
      • 15 people classify 25 paintings by their similarity to each of the 3 templates
    • quantitative: precision and recall
      • 5 queries:
        • oblong shape
        • squared shape
        • round bottle shape
        • squared bottle shape
        • irregular shape
  • Shape Similarity with Perceptual Distance and Effective Indexing, 2000 [pdf]
      • test database: 1637 shapes of objects extracted from 20th century paintings.

      • Each shape has been sampled at 100 equally spaced points

      • M-tree indexing structure
      • qualitative effectiveness: measure to what extent the system agrees with human perception,
        • 22 sample images representing bottles
        • 3 reference bottle sketches
      • 42 people: for each sketch, assign a score [0,1] to its retrieved image
    • quantitative: precision and recall
    • occlusion test: three reference bottle sketches
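The quantitative evaluations listed above report precision and recall. As a reminder of what the two numbers measure, here is a minimal sketch for a single query; the function name and the identifiers in the example are hypothetical and not taken from any of the cited papers.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of one retrieval list.

    retrieved: items returned by the system for a query
    relevant:  items that a human judged similar to the query
    """
    retrieved = list(retrieved)
    relevant = set(relevant)
    hits = sum(1 for item in retrieved if item in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Example: 4 images returned, 3 images are truly similar, 2 of them were found.
p, r = precision_recall(["img1", "img2", "img3", "img4"], {"img1", "img3", "img9"})
```

Precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that were retrieved.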



【dataset】Snodgrass&Vanderwart line drawing

  • The dataset contains line drawings of 260 general objects, a standard set that has been frequently used in the psychophysics community for tests with human subjects.
  • Only line drawings are available, without any texture or color information
  • There is only one image per object

Only shape matters, color and other surface characteristics are not part of an object representation (e.g. Biederman, 1987)

However, object naming is facilitated by congruent surface color and photographic detail as compared to line drawings (Price & Humphreys, 1989)


Texture and color contribute to object recognition for all categories of objects, including artefacts without any diagnostic color.

  • Better segmentation ? (perceptual contribution)
  • Better recognition ? (“knowledge-based” contribution)

When all information is available, objects are recognized at the same speed, suggesting that recognition of an object is based on multiple cues: contour and surface information are both part of an object representation and both provide important information for recognition.


Snodgrass, J.G. & Vanderwart, M. (1980). A standardized set of 260 pictures: norms for name agreement, image agreement, familiarity, and visual complexity. JEP:HPP, 6, 174-215.

Rossion, B., & Pourtois, G. (2004). Revisiting Snodgrass and Vanderwart’s object set: The role of surface detail in basic-level object recognition. Perception, 33, 217-236. PDF file. Download the colored and shaded image set of “Snodgrass and Vanderwart-like” objects

【Dataset】Visipedia CUB-200-2011


CUB Birds-200: a challenging dataset of birds in unconstrained pose and environment.

Visipedia, short for “Visual Encyclopedia,” is an augmented version of Wikipedia, where pictures are first-class citizens alongside text.

Goals of Visipedia include

  • creation of hyperlinked, interactive images embedded in Wikipedia articles,
  • scalable representations of visual knowledge,
  • large scale machine vision datasets, and
  • visual search capabilities.

Human in the loop: Visipedia advocates interaction and collaboration between machine vision and human users and experts.


【dataset】Weizmann Horses

The Weizmann Horse Database consists of 328 side-view color images of horses that were also manually segmented.

The images were randomly collected from the WWW to evaluate the top-down segmentation scheme as well as its combination with bottom-up processing.

For more, refer to Eran Borenstein from MIT

Other horse dataset:

【shape dataset】ETHZ Shape Classes

a dataset for testing object class detection algorithms from ETH Zurich CV lab

Some object classes, by their nature, are better represented by contour features than by image patches or interest points, e.g. mugs.

ETHZ Shape Classes

contains images of five diverse shape-based classes, collected from Flickr and Google Images. The images represent the clutter, intra-class shape variability, and scale changes.

  • The dataset tries to include objects appearing at a wide range of scales. For example, in some images the object comprises only a rather small portion of the image.
  • The objects are mostly unoccluded and are all taken from approximately the same viewpoint (the side).

The dataset was collected and annotated by Vittorio Ferrari, and experiments on it first appeared in [1]. They tackled the challenge of detecting objects in real images given a single hand-drawn example as the ‘model’; the hand-drawings are included in the release of version 1.2.

  • Detect objects with hand-drawings: all 255 images are used as the test set for every class. Hence, when searching for one class, e.g. mugs, the images of the other classes act as negatives, making for a large negative test set. This is important because it gives a reliable value for the incidence of false positives generated by the detection algorithm.
  • Train models from real images: the dataset is also suited to the conventional setting in which models are learnt from real images (for example, by splitting the dataset into half training / half testing). Further results in this setting are reported in [2,3,4]. Moreover, [3,4] also report experiments in the setting of [1], i.e. using a single hand-drawn example as a model.

Five classes are covered (in total 255 images, 289 instances):

  1. apple logos, 40 images, 44 instances
  2. bottles, 48 images, 55 instances
  3. giraffes, 87 images, 91 instances
  4. mugs, 48 images, 66 instances
  5. swans, 32 images, 33 instances

Most images contain a single instance of an object class, while some contain multiple instances. No image contains instances of different classes.

Groundtruth bounding-boxes:

Object bounding boxes are included in files


  • Each line in the file encodes the bounding-box of an instance of <class> in <image>
  • The coordinates of the bounding-box appear in the following format

top_left_x top_left_y bottom_right_x bottom_right_y
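A minimal sketch of reading boxes in this format is given below. It assumes whitespace-separated numeric fields, one box per line, as described above; the names `BoundingBox`, `parse_bounding_boxes` and `box_size` are hypothetical, and the groundtruth file names are not given here, so the function takes the file contents as a string.

```python
from typing import List, NamedTuple, Tuple


class BoundingBox(NamedTuple):
    top_left_x: float
    top_left_y: float
    bottom_right_x: float
    bottom_right_y: float


def parse_bounding_boxes(text: str) -> List[BoundingBox]:
    """Parse one bounding box per non-empty line."""
    boxes = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 4:
            raise ValueError("expected 4 coordinates, got: %r" % line)
        boxes.append(BoundingBox(*(float(f) for f in fields)))
    return boxes


def box_size(box: BoundingBox) -> Tuple[float, float]:
    """Width and height of a box in pixels."""
    return (box.bottom_right_x - box.top_left_x,
            box.bottom_right_y - box.top_left_y)
```

An image with several instances simply yields several lines, and therefore several `BoundingBox` entries.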

Groundtruth outlines(NOT used during training)

  • Object outlines for applelogos, bottles, and giraffes are included in files (a separate file per object instance ):


  • For mugs and swans, the outlines are in files(All instances are in the same file, and different instances have different greylevels):


Performance plots

The complete detection-rate vs FPPI performance plots for all their works [1,2,3,4], as they appeared in [4], are included in this release (as well as plots for the Chamfer Matching baseline).

  • plots/real-images:

plots in this directory correspond to figure 12 of [4]. The models used for these plots have been trained from a subset of the ETHZ Shape Classes. The test images are a disjoint subset of the dataset.

  • plots/hand-drawings:

plots in this directory correspond to figure 17 of [4]. The models used for these plots are the hand-drawings from the ETHZ Shape Classes. The test set consists of all real images in the ETHZ Shape Classes.

Refer to [4] for details of the meaning of each curve, and for the exact experimental setup.
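For readers unfamiliar with this kind of plot, the sketch below shows how a detection-rate vs FPPI (false positives per image) curve can be computed. It is only an illustration under stated assumptions: each detection is assumed to have already been labelled as a true or false positive (the matching criterion is defined in [4] and not reproduced here), each true positive is assumed to cover a distinct groundtruth instance, and the function name is hypothetical.

```python
def detection_rate_vs_fppi(detections, num_instances, num_images):
    """detections: list of (score, is_true_positive) over the whole test set.

    Sweeps the score threshold from the highest score downwards and returns
    one (fppi, detection_rate) point per detection.
    """
    curve = []
    true_pos = 0
    false_pos = 0
    for _score, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            true_pos += 1
        else:
            false_pos += 1
        curve.append((false_pos / num_images, true_pos / num_instances))
    return curve


# Toy example: 4 detections, 4 groundtruth instances, 2 test images.
curve = detection_rate_vs_fppi(
    [(0.9, True), (0.8, False), (0.7, True), (0.4, False)],
    num_instances=4,
    num_images=2,
)
```

Because every image of the other classes counts toward `num_images`, a large negative test set makes the FPPI estimate more reliable.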

Edge maps

The files *_edges.tif contain edge maps produced by the Berkeley ‘natural boundary detector’. Using this edge detector instead of the standard Canny resulted in a significant improvement in object detection performance. We recommend using these edge maps.

[1] Vittorio Ferrari, Tinne Tuytelaars and Luc Van Gool, Object Detection by Contour Segment Networks, ECCV 2006, Graz, Austria

[2] Vittorio Ferrari, Loic Fevrier, Frederic Jurie, and Cordelia Schmid, Groups of Adjacent Segments for Object Detection, PAMI, January 2008

[3] Vittorio Ferrari, Frederic Jurie, and Cordelia Schmid, Accurate Object Detection with Deformable Shape Models Learnt from Images, CVPR 2007, Minneapolis, USA

[4] Vittorio Ferrari, Frederic Jurie, and Cordelia Schmid, From Images to Shape Models for Object Detection, IJCV 2009 (to appear)

Yann LeCun’s MNIST【dataset】

The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.


【dataset】Chui-Rangarajan Synthesized Data Sets

Synthetic data is easy to get and can be designed to test a specified aspect of the algorithms.

There are three sets of data designed to measure the robustness of an algorithm under deformation, noise and outliers.

In each test, the model point set is subjected to one of the above distortions to create a “target” point set. And then we run our algorithm to find the correspondence between these two sets of points and use the estimated correspondence to warp the model.

The accuracy of the matching is quantified as the average Euclidean distance between a point in the warped model and the corresponding point in the target. 100 samples are generated for each degradation level.
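The error measure described above can be sketched in a few lines. This assumes the warped model and the target are given as equally long lists of 2D points in corresponding order; the function name is hypothetical.

```python
import math


def mean_point_error(warped_model, target):
    """Average Euclidean distance between corresponding 2D points."""
    if len(warped_model) != len(target) or not warped_model:
        raise ValueError("point sets must be non-empty and of equal length")
    total = 0.0
    for (px, py), (qx, qy) in zip(warped_model, target):
        total += math.hypot(px - qx, py - qy)
    return total / len(warped_model)
```

A perfect match gives 0, and the value grows as the warped model drifts away from the target.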

The value of m refers to the test number, n refers to the test example number. The range of m is 1-5 for deformation and outlier, 1-6 for noise. The range of n is 1-100.

Example: save_fish_outlier_3_2.mat
This corresponds to the “fish” shape, outlier test, test no. 3 (out of 5), corresponding to 100% outlier to data ratio, example no. 2.

Each file contains the model data (x1), the target data (y2a), and in the case of the noise test, the target data before adding noise (y2).
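The file name in the example can be split into its parts with a small helper. This is a sketch only: the pattern `save_<shape>_<test>_<m>_<n>.mat` is inferred from the single example above and is an assumption, as is the function name.

```python
import re

# Assumed naming pattern, inferred from "save_fish_outlier_3_2.mat".
_NAME_PATTERN = re.compile(
    r"^save_(?P<shape>[A-Za-z]+)_(?P<test>[A-Za-z]+)_(?P<m>\d+)_(?P<n>\d+)\.mat$"
)


def parse_test_file_name(name):
    """Return shape, test type, test number m and example number n."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError("unexpected file name: %r" % name)
    return {
        "shape": match.group("shape"),
        "test": match.group("test"),
        "m": int(match.group("m")),
        "n": int(match.group("n")),
    }


info = parse_test_file_name("save_fish_outlier_3_2.mat")
```

The variables x1, y2a and y2 inside each file would then be read with a MATLAB-file reader of your choice.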

A matching approach should be:

  1. invariant under scaling and translation
    • Invariance to translation is intrinsic to the shape context, since all measurements are taken w.r.t points on the object
    • scale invariance: we normalize all radial distances by the mean distance between the n*n point pairs in the shape.
  2. robust under small geometrical distortions, occlusion and presence of outliers
    • Shape context is insensitive to small perturbations of parts of the shape.
    • Robustness to small nonlinear transformations, occlusions and presence of outliers is evaluated experimentally
    • Complete rotation invariance: use a relative frame that treats the tangent vector at each point as the positive x-axis, so the reference frame turns with the tangent angle.
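The invariance properties listed above can be seen in a small descriptor sketch. This is a minimal illustration, not the reference implementation: the log-spaced radial range (`r_inner`, `r_outer`), the bin layout, and the function name are assumptions, and the tangent angles for the rotation-invariant variant must be supplied by the caller.

```python
import math


def shape_context(points, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0,
                  tangents=None):
    """Log-polar histogram of relative point positions, one per point.

    points:   list of (x, y), at least two distinct points
    tangents: optional list of tangent angles (radians); if given, angles
              are measured relative to the tangent at each point.
    """
    n = len(points)
    dists = [[math.hypot(px - qx, py - qy) for (qx, qy) in points]
             for (px, py) in points]
    # Scale invariance: normalize by the mean distance over the n*n pairs.
    mean_dist = sum(map(sum, dists)) / (n * n)
    # Outer edges of the log-spaced radial bins (assumed range).
    lo, hi = math.log(r_inner), math.log(r_outer)
    edges = [math.exp(lo + (hi - lo) * k / (n_r - 1)) for k in range(n_r)]

    histograms = []
    for i, (px, py) in enumerate(points):
        hist = [0] * (n_r * n_theta)
        for j, (qx, qy) in enumerate(points):
            if i == j:
                continue
            r = dists[i][j] / mean_dist
            r_bin = next((k for k, e in enumerate(edges) if r <= e), None)
            if r_bin is None:
                continue  # farther than the outermost ring
            # Translation invariance: only the offset q - p is used.
            angle = math.atan2(qy - py, qx - px)
            if tangents is not None:
                angle -= tangents[i]  # frame turns with the tangent
            angle %= 2 * math.pi
            t_bin = min(int(angle / (2 * math.pi) * n_theta), n_theta - 1)
            hist[r_bin * n_theta + t_bin] += 1
        histograms.append(hist)
    return histograms
```

Translating or uniformly scaling the point set leaves the histograms unchanged; rotating it changes them unless `tangents` is supplied.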

In certain applications, one may want complete invariance under rotation, or even the full group of affine transformations.

But in many applications, complete invariance impedes recognition performance; e.g., when distinguishing 6 from 9, rotation invariance would be completely inappropriate. In addition, many points will not have well-defined or reliable tangents.

Many local appearance features lose their discriminative power if they are not measured in the same coordinate system.

Thin Plate Spline model is commonly used for representing flexible coordinate transformations.

TPS is the 2D generalization of the cubic spline

Handwriting is a kind of non-rigid shape.


















