1. Image segmentation is an integral part of image-processing applications such as medical image analysis and photo editing.
- Intermediate-level vision problems such as shape from silhouette, shape from stereo and object tracking could make use of reliable segmentation of the object of interest from the rest of the scene.
- Higher-level problems such as recognition and image indexing can also make use of segmentation results in matching.
2. The shape of an object (as conveyed by edge curves) is among its most distinctive features. However, shape matchers confront a difficult global/local dilemma:
- locally, edges carry too little information for reliable matching, leading to a combinatorial explosion in the number of potential matches.
- globally, images have too much variability: matching suffers from occlusions and dropouts, and the search space of possible deformations is huge.
Given a single image, it is extremely difficult to partition it into semantically meaningful elements, not just blobs of similar color or texture.
??? how would the algorithm figure out that doors and windows on a building, which look quite different, belong to the same segment?
??? how could I build a useful (e.g. rotation-invariant) feature descriptor?
??? how would it decide that the grey pavement and a grey house next to it are different segments?
3. interactive image segmentation
- graph cuts
- random walker
- cellular automata
- Magic Wand – gathers color statistics from a user-specified image point (or region) and selects the connected image region whose pixels' color properties fall within a given tolerance of the gathered statistics.
- Intelligent Paint – a region-based interactive segmentation built on hierarchical image segmentation by tobogganing; it uses a connect-and-collect strategy to define an object's region.
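The Magic Wand behaviour above reduces to a tolerance-based flood fill. A minimal sketch (the function name `magic_wand` and the simplification of comparing against the seed value alone, rather than gathered statistics, are my own):

```python
from collections import deque

def magic_wand(image, seed, tol):
    """Select the connected region around `seed` (4-connectivity) whose
    pixel values stay within `tol` of the seed pixel's value."""
    h, w = len(image), len(image[0])
    sy, sx = seed
    ref = image[sy][sx]
    selected = {(sy, sx)}
    queue = deque([(sy, sx)])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in selected:
                if abs(image[ny][nx] - ref) <= tol:
                    selected.add((ny, nx))
                    queue.append((ny, nx))
    return selected

# a bright 2x2 blob on a dark background
img = [
    [10, 10,  10,  10],
    [10, 200, 205, 10],
    [10, 198, 202, 10],
    [10, 10,  10,  10],
]
region = magic_wand(img, (1, 1), tol=20)
```

Clicking the seed (1, 1) selects only the four bright pixels; the dark surround falls outside the tolerance.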
4. Some recent shape-matching approaches
use grouped edge fragments as intermediate representations
6. Active contour
Active contour methods for image segmentation allow a contour to deform iteratively to partition an image into regions.
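A full active-contour model balances internal (smoothness) energy against external (image) energy. The sketch below shows only the internal-energy part, assuming a simple neighbour-averaging update (the function name, `alpha`, and the iteration count are illustrative, not a reference implementation):

```python
import math, random

def smooth_contour(points, alpha=0.3, iters=50):
    """Simplified 'internal energy' update of a closed active contour:
    each point moves a fraction `alpha` toward the midpoint of its two
    neighbours, penalising contour stretching and roughness."""
    pts = [list(p) for p in points]
    n = len(pts)
    for _ in range(iters):
        new = []
        for i in range(n):
            py, px = pts[i - 1]
            qy, qx = pts[(i + 1) % n]
            y, x = pts[i]
            new.append([y + alpha * ((py + qy) / 2 - y),
                        x + alpha * ((px + qx) / 2 - x)])
        pts = new
    return pts

random.seed(0)
# a noisy circle of radius ~10 around (10, 10)
noisy = [(10 + (10 + random.uniform(-1, 1)) * math.sin(2 * math.pi * i / 40),
          10 + (10 + random.uniform(-1, 1)) * math.cos(2 * math.pi * i / 40))
         for i in range(40)]

def radial_std(pts):
    r = [math.hypot(y - 10, x - 10) for y, x in pts]
    m = sum(r) / len(r)
    return (sum((v - m) ** 2 for v in r) / len(r)) ** 0.5

smoothed = smooth_contour(noisy)
```

Iterating the update irons out the noise (the contour also shrinks slightly, which is why real snakes add an image-driven external force to stop the contour at object boundaries).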
7. CBIR using Component Trees
Some expected advantages of using Component Trees to describe images would be:
- The Component Tree representation of an image would be largely insensitive to deformations of the image (even projective ones)
- Examining different levels of the tree would allow comparisons and operations up to a different level of detail
- Discrimination and description should work better than current techniques on low-textured images
- more accurately match image regions instead of just points, because regions might be more discriminative
- Spatial Consistency, where a group of matches between feature points is accepted only if the feature points keep a similar spatial configuration in both images.
- matching is not only dependent on the type of feature extracted (DoG, MSER,…) or the descriptor (SIFT), but it also looks at the wider surroundings of a feature point, making it (at least a little) region dependent.
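The structure a Component Tree encodes can be illustrated by thresholding a grayscale image at increasing levels: every component at a higher level nests inside some component at a lower level, and the tree records exactly this nesting. A toy sketch (recomputing components per level for clarity; a real max-tree construction works incrementally, e.g. with union-find over pixels sorted by value):

```python
def components_at_level(image, t):
    """Connected components (4-connectivity) of the pixels with value >= t."""
    h, w = len(image), len(image[0])
    seen, comps = set(), []
    for y in range(h):
        for x in range(w):
            if image[y][x] >= t and (y, x) not in seen:
                stack, comp = [(y, x)], set()
                seen.add((y, x))
                while stack:
                    cy, cx = stack.pop()
                    comp.add((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and (ny, nx) not in seen and image[ny][nx] >= t):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                comps.append(frozenset(comp))
    return comps

# two bright peaks on a common plateau
img = [
    [0, 0, 0, 0, 0],
    [0, 5, 5, 5, 0],
    [0, 9, 5, 9, 0],
    [0, 0, 0, 0, 0],
]
low  = components_at_level(img, 5)   # the whole plateau: one component
high = components_at_level(img, 9)   # the two peaks: two components
```

The two peak components are children of the plateau component in the tree; comparing trees at shallow depths compares coarse structure, while deeper levels expose detail.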
8. Low-level local approach
Shape Adapted (centered on corner-like features) and Maximally Stable (corresponding to blobs of high contrast). More methods today combine multiple ways of detecting features, and then use SIFT descriptors (which still rock as descriptors) to calculate the invariant vector representations.
E.g. combination of
- DoG (Difference of Gaussians): focus on blob-like parts of images across scales
- MSER (Maximally Stable Extremal Regions): focus on blob-like distinguished regions through multiple scales
- Keypoint detector:
- Harris corner detector
- Harris-Laplace – scale-invariant version of the Harris detector (an affine-invariant version also exists, presented by Mikolajczyk and Schmid, and I believe it is also patent-free).
- Multi-Scale Oriented Patches (MOPs) – although it is patented, the detector is basically the multi-scale Harris, so there would be no problems with that
- LoG filter – since the patented SIFT uses a DoG (Difference of Gaussian) approximation of LoG (Laplacian of Gaussian) to localize interest points in scale, LoG alone can be used in a modified, patent-free algorithm, though the implementation could run a little slower
- Keypoint descriptor:
- Normalized gradient – simple, working solution
- Wavelet filtered image patch – similar to gradient, the details are given in MOPs paper, but can be implemented differently to avoid the patent issue (e.g. using different wavelet basis or different indexing scheme)
- Histogram of oriented gradients
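As a patent-free baseline from the keypoint-detector list above, the Harris response can be computed directly from image gradients. A minimal sketch, assuming an unweighted box window instead of the usual Gaussian weighting (k=0.05 and the window size are just common choices):

```python
import numpy as np

def harris_response(img, k=0.05, win=2):
    """Harris corner response R = det(M) - k * trace(M)^2, where M is the
    structure tensor summed over a (2*win+1)^2 box window."""
    img = img.astype(float)
    Iy, Ix = np.gradient(img)                 # gradients along rows / cols
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    h, w = img.shape
    R = np.zeros((h, w))
    for y in range(win, h - win):
        for x in range(win, w - win):
            sxx = Ixx[y-win:y+win+1, x-win:x+win+1].sum()
            syy = Iyy[y-win:y+win+1, x-win:x+win+1].sum()
            sxy = Ixy[y-win:y+win+1, x-win:x+win+1].sum()
            R[y, x] = (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2
    return R

# white square on a black background: the response should peak near corners,
# stay negative along edges, and be zero in flat regions
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
y, x = np.unravel_index(R.argmax(), R.shape)
```

Edges produce a large eigenvalue in only one direction (det(M) ≈ 0, so R < 0), while corners produce two large eigenvalues, which is why the maximum lands near a corner of the square.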
9. Semantic approaches
are typically based on hierarchical representations of the whole image. The general idea is to represent an image with a tree-shaped structure, where leaves contain the image details and objects can be found in the nodes closer to the root of such trees. Then, somehow, you compare the sub-trees to identify the objects contained in different images.
- make the tree based on the existing features (where the cluster centers in each level are actually the representative, “central”, features for each cluster).
- Once the tree is constructed to the required depth (they recommend 10 clusters over 6 levels), the cluster centers at the last level are representative of a very small number of features.
- Each original feature can be represented by the corresponding cluster center, and instead of descriptors, for each image you only need to store the information about which cluster centers – features – it contains.
This is much easier, since you only need to store one or two integers per feature – coding its path through the tree. To add a new picture to the dataset, all you need to do is determine which of the calculated cluster centers represent the image features best (as mentioned before, the last level of cluster centers is quite precise) and store the information about those cluster centers (the before-mentioned integers). Looking up a cluster center should be really fast – there are only 10 comparisons at each of the 6 levels.
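The cluster-center tree and the path-of-integers coding described above can be sketched with a toy hierarchical k-means (branching factor 2 and depth 2 instead of the recommended 10 and 6; the deterministic initialization and function names are simplifications of mine, not the referenced method):

```python
import numpy as np

def build_vocab_tree(feats, branch=2, depth=2):
    """Hierarchical k-means ("vocabulary tree"): recursively split the
    features into `branch` clusters, `depth` levels deep."""

    def kmeans(pts, k, iters=10):
        centres = pts[:k].copy()  # deterministic init; real systems use k-means++
        for _ in range(iters):
            labels = np.argmin(((pts[:, None] - centres[None]) ** 2).sum(-1), axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centres[j] = pts[labels == j].mean(axis=0)
        return centres, labels

    def grow(pts, level):
        if level == depth or len(pts) < branch:
            return {"centres": None, "children": None}   # leaf
        centres, labels = kmeans(pts, branch)
        children = [grow(pts[labels == j], level + 1) for j in range(branch)]
        return {"centres": centres, "children": children}

    return grow(np.asarray(feats, dtype=float), 0)

def quantize(tree, feat):
    """Encode a feature as its path of branch indices through the tree --
    the one or two integers per feature mentioned above."""
    path, node, feat = [], tree, np.asarray(feat, dtype=float)
    while node["centres"] is not None:
        j = int(np.argmin(((node["centres"] - feat) ** 2).sum(-1)))
        path.append(j)
        node = node["children"][j]
    return path

# two well-separated blobs of toy 2-D features
feats = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5),
         (10, 10), (11, 10), (10, 11), (11, 11), (10.5, 10.5)]
tree = build_vocab_tree(feats)
pA = quantize(tree, (0, 0))
pB = quantize(tree, (10, 10))
```

Each lookup costs only `branch` comparisons per level, and an image can be stored as the multiset of leaf paths its features map to.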