Mathematicians define shape as an equivalence class under a group of transformations. This tells us when two shapes are exactly the same; a theory of shape similarity/shape distance needs more than that.
Statisticians define shape in a way that addresses the problem of shape distance, but this assumes that correspondences between points are known. Other statistical approaches to shape comparison do not require correspondences – e.g., one can compare feature vectors of descriptors such as area or moments – but these discard detailed shape information.
Broadly speaking, there are two approaches:
- feature-based: use spatial arrangements of extracted features, e.g., edge elements/junctions
- Boundaries of silhouette image:
- Silhouettes do not have holes or internal markings, so the associated boundary is a single closed curve that can be parametrized by arclength.
- Fourier descriptors
- medial axis transform – captures the part structure of the shape in the graph structure of the skeleton
- methods exploiting the 1D nature of silhouette curves
- silhouette comparison in the MPEG-7 standard
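Since a silhouette boundary is a single closed curve, Fourier descriptors fall out almost directly from the FFT of the complex boundary coordinates. A minimal sketch (the function name and normalization choices are illustrative, not from the notes): zeroing the DC term gives translation invariance, taking magnitudes gives rotation and start-point invariance, and dividing by the first harmonic gives scale invariance.

```python
import numpy as np

def fourier_descriptors(points, k=8):
    """Sketch of a Fourier descriptor for a closed 2D contour.
    points: (N, 2) array of boundary samples, ordered along the curve."""
    z = points[:, 0] + 1j * points[:, 1]  # boundary as a complex signal
    F = np.fft.fft(z)
    F[0] = 0.0                  # drop DC term -> translation invariance
    mags = np.abs(F)            # magnitudes -> rotation/start-point invariance
    mags = mags / mags[1]       # normalize by first harmonic -> scale invariance
                                # (assumes mags[1] != 0, true for generic contours)
    # keep a few low-frequency harmonics from each end of the spectrum
    return np.concatenate([mags[2:2 + k], mags[-k:]])
```

Mirror reflections are not handled by this normalization; descriptors of two similarity-transformed copies of the same contour come out (numerically) identical.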
- Silhouettes ignore internal contours and are difficult to extract from real images, so instead treat the shape as a set of points in the 2D image (e.g., the output of an edge detector).
- The Hausdorff distance has been extended to deal with partial matching and clutter, but it returns no correspondences.
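The point-set view above can be made concrete with a brute-force Hausdorff distance. A minimal sketch, assuming small 2D point sets; the quantile-based "partial" variant (function names are mine) shows the standard way the max is relaxed to tolerate clutter and occlusion:

```python
import numpy as np

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two 2D point sets (brute force)."""
    # pairwise Euclidean distances, shape (len(A), len(B))
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    h_ab = D.min(axis=1).max()  # farthest A-point from its nearest B-point
    h_ba = D.min(axis=0).max()
    return max(h_ab, h_ba)

def partial_hausdorff(A, B, q=0.8):
    """Directed partial variant: replace the max with the q-th quantile of
    nearest-neighbor distances, so outlier/clutter points are ignored."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return np.quantile(D.min(axis=1), q)
```

Note that neither function says *which* point matches which: the distance is computed without ever committing to correspondences, which is exactly the limitation the notes point out.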
- Several approaches to shape recognition are based on spatial configurations of a small number of keypoints or landmarks:
- Geometric hashing votes for a model without explicitly solving for correspondences
- Train decision trees for recognition by learning discriminative spatial configurations of keypoints
- Gray-level information at the keypoints provides greater discriminative power
- Not all objects have distinguished keypoints (e.g., a circle), and using keypoints alone sacrifices the shape information available in smooth portions of object contours
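Geometric hashing, mentioned above, can be sketched in a few lines. This is a simplified similarity-invariant 2D version (function names, the quantization step, and the single scene basis are my assumptions): offline, every ordered pair of model points defines a basis, and the remaining points' coordinates in that basis are quantized and stored; online, one scene basis is tried and matching bins vote for (model, basis) entries, so a model is identified without explicitly solving for correspondences.

```python
import numpy as np
from collections import defaultdict
from itertools import permutations

def _coords(points, b0, b1):
    """Express points in the similarity frame mapping (b0, b1) to (0,0)-(1,0)."""
    e = b1 - b0
    c = e[0] + 1j * e[1]                       # basis vector as a complex number
    z = (points[:, 0] - b0[0]) + 1j * (points[:, 1] - b0[1])
    w = z / c                                  # rotate + scale into the frame
    return np.stack([w.real, w.imag], axis=1)

def build_table(models, step=0.1):
    """Hash quantized invariant coordinates -> list of (model_id, basis)."""
    table = defaultdict(list)
    for mid, pts in enumerate(models):
        for i, j in permutations(range(len(pts)), 2):
            for q in np.round(_coords(pts, pts[i], pts[j]) / step).astype(int):
                table[tuple(q)].append((mid, (i, j)))
    return table

def recognize(table, scene, step=0.1):
    """Vote with one scene basis; returns the best-supported model id."""
    votes = defaultdict(int)
    b0, b1 = scene[0], scene[1]    # assume this scene pair is some model basis
    for q in np.round(_coords(scene, b0, b1) / step).astype(int):
        for entry in table.get(tuple(q), []):
            votes[entry] += 1
    (mid, _), _ = max(votes.items(), key=lambda kv: kv[1])
    return mid
```

A real system would try many scene bases and verify the winning hypothesis; this sketch only shows the voting mechanism that avoids explicit correspondence search.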
- brightness (appearance)-based: make more direct use of pixel brightness values
- complement to feature-based methods
- Make direct use of gray values within the visible portion of the object, instead of focusing on the shape of the occluding contour or other extracted features.
- Build classifiers without explicitly finding correspondences; this relies on the learning algorithm having enough examples to acquire the appropriate invariances
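The appearance-based idea in its simplest form is nearest-neighbor classification on raw gray values. A minimal sketch (function name and setup are mine): no correspondences are ever computed, and any invariance must already be represented in the stored examples.

```python
import numpy as np

def nn_classify(query, examples, labels):
    """Appearance-based recognition sketch: nearest neighbor on raw
    gray values.  query: 2D image patch; examples: list of same-size
    patches; labels: class label per example."""
    X = np.stack([e.ravel() for e in examples]).astype(float)
    d = np.linalg.norm(X - query.ravel().astype(float), axis=1)
    return labels[int(np.argmin(d))]
```

This is why such methods need many training examples: a rotated or shifted object only matches if a similarly transformed example is in the training set.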