localized contour segmentation

1. Image segmentation is an integral part of image-processing applications such as medical image analysis and photo editing.

  • Intermediate-level vision problems such as shape from silhouette, shape from stereo and object tracking could make use of reliable segmentation of the object of interest from the rest of the scene.
  • Higher-level problems such as recognition and image indexing can also make use of segmentation results in matching.

2. The shape of an object (as conveyed by edge curves) is among its most distinctive features.  However, shape matchers confront a difficult global/local dilemma:

 

  • locally, edges carry too little information for reliable matching, leading to a combinatorial explosion in the number of potential matches.
  • globally, images have too much variability: matching suffers from occlusions and dropouts, and the search space of possible deformations is huge.

 

Given a single image, it is extremely difficult to partition it into semantically meaningful elements rather than just blobs of similar color or texture. Some open questions:

  • How would the algorithm figure out that doors and windows on a building, which look quite different, belong to the same segment?
  • How could I build a useful (e.g. rotation-invariant) feature descriptor?
  • How would it decide that the grey pavement and a grey house next to it are different segments?
  • Regarding the statement from the linked question, how does he accomplish rotation invariance for free?

3. Interactive image segmentation

  • graph cuts (e.g. GrabCut; see the sketch after this list)
  • random walker
  • cellular automata
  • Magic Wand – gathers color statistics from a user-specified image point (or region) and segments the connected image region whose pixel colors fall within a given tolerance of those statistics.
  • Intelligent paint – a region-based interactive segmentation method built on hierarchical image segmentation by tobogganing. It uses a connect-and-collect strategy to define an object's region.
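
The graph-cuts item above maps naturally onto OpenCV's GrabCut. Below is a minimal sketch, assuming a hypothetical input image and a user-drawn rectangle standing in for real interaction (scribble-based refinement works the same way through the mask).

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg")                      # hypothetical input image
mask = np.zeros(img.shape[:2], np.uint8)           # per-pixel labels, filled in by grabCut
bgd_model = np.zeros((1, 65), np.float64)          # internal GMM state (background)
fgd_model = np.zeros((1, 65), np.float64)          # internal GMM state (foreground)

rect = (50, 50, 300, 400)                          # user-drawn box around the object (x, y, w, h)
cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Definite + probable foreground labels form the segmented object.
fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype("uint8")
cv2.imwrite("segmented.png", img * fg[:, :, None])
```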

4. Some recent shape-matching approaches

use grouped edge fragments as intermediate representations, sitting between unreliable individual edges and highly variable global shapes.

6. Active contour

Active contour methods for image segmentation deform a contour iteratively, typically by minimizing an energy that trades off contour smoothness against image cues such as edge strength, until the contour settles on boundaries that partition the image into regions.
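
As an illustration, here is a minimal sketch of a classic snake-style active contour using scikit-image's active_contour; the sample image, the circular initialization, and the parameter values are illustrative assumptions, not a specific method from the literature.

```python
import numpy as np
from skimage import data, filters
from skimage.segmentation import active_contour

img = data.coins()                          # sample grayscale image shipped with scikit-image
smoothed = filters.gaussian(img, sigma=3)   # smoothing helps the snake latch onto edges

# Initialize the contour as a circle roughly around one coin (row, col coordinates);
# the snake then deforms iteratively, trading off smoothness against edge attraction.
theta = np.linspace(0, 2 * np.pi, 200)
init = np.column_stack([110 + 35 * np.sin(theta), 100 + 35 * np.cos(theta)])

snake = active_contour(smoothed, init, alpha=0.015, beta=10, gamma=0.001)
print(snake.shape)   # (200, 2) array of contour points after the iterations stop
```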

7. CBIR using Component Trees

Some expected advantages of using Component Trees to describe images would be:

  • The Component Tree representation of an image would be largely insensitive to deformations of the image (even projective ones).
  • Examining different levels of the tree would allow comparisons and operations at different levels of detail.
  • Discrimination and description should work better than current techniques on low-textured images.
  • It could match image regions instead of just points, and regions tend to be more discriminative.
  • It supports spatial consistency, where a group of matches between feature points is accepted only if the feature points keep a similar spatial configuration in both images.
  • Matching would depend not only on the type of feature extracted (DoG, MSER, …) or the descriptor (SIFT), but also on the wider surroundings of a feature point, making it (at least a little) region dependent.
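
To make the structure concrete, here is a minimal sketch of building a component tree (a max-tree, i.e. the component tree of upper level sets) for a grayscale image with scikit-image; the tiny input array is only for illustration, and the CBIR matching itself is not shown.

```python
import numpy as np
from skimage.morphology import max_tree

img = np.array([[15, 13, 16],
                [12, 12, 10],
                [16, 12, 14]], dtype=np.uint8)    # tiny illustrative image

# parent has the same shape as img; parent[r, c] holds the raveled index of the
# parent pixel in the max-tree. traverser lists pixels from the root (global
# minimum) outward, so ascending parent pointers from any pixel walks through
# the nested connected components that contain it.
parent, traverser = max_tree(img, connectivity=2)

root = traverser[0]
print("root pixel (raveled index):", root, "grey level:", img.ravel()[root])
```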

8. Low-level local approach

Shape-Adapted regions (centered on corner-like features) and Maximally Stable regions (corresponding to blobs of high contrast) are two commonly used feature types; most methods today combine multiple ways of detecting features and then use SIFT descriptors (which still hold up very well as descriptors) to compute the invariant vector representations.

E.g., a combination of the following (a short code sketch follows these lists):

  1. DoG (Difference of Gaussians): focuses on blob- and corner-like parts of images across scales (the detector used by SIFT)
  2. MSER (Maximally Stable Extremal Regions): focuses on blob-like distinguished regions that are stable through multiple scales
  3. Keypoint detector:
    • Harris corner detector
    • Harris-Laplace – a scale-invariant version of the Harris detector (an affine-invariant version also exists, presented by Mikolajczyk and Schmid, and I believe it is also patent free).
    • Multi-Scale Oriented Patches (MOPs) – although it is patented, the detector is basically multi-scale Harris, so there would be no problems with that part.
    • LoG filter – since the patented SIFT uses the DoG (Difference of Gaussians) approximation of the LoG (Laplacian of Gaussian) to localize interest points in scale, LoG alone can be used in a modified, patent-free algorithm, though the implementation could run a little slower.
    • FAST
    • BRISK
  4. Keypoint descriptor:
    • Normalized gradient – simple, working solution
    • Wavelet-filtered image patch – similar to the gradient; the details are given in the MOPs paper, but it can be implemented differently to avoid the patent issue (e.g. using a different wavelet basis or a different indexing scheme)
    • Histogram of oriented gradients
    • GLOH
    • LESH
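
To tie the lists together, here is a minimal sketch of pairing one of the detectors above with a separately chosen descriptor in OpenCV; the FAST + SIFT pairing, the threshold, and the image path are illustrative assumptions.

```python
import cv2

img = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)        # hypothetical input image

detector = cv2.FastFeatureDetector_create(threshold=25)    # FAST keypoint detector
descriptor = cv2.SIFT_create()                              # SIFT used only as a descriptor
# (BRISK or ORB descriptors, cv2.BRISK_create() / cv2.ORB_create(), are drop-in
#  alternatives if SIFT is to be avoided.)

keypoints = detector.detect(img, None)                        # localize interest points
keypoints, descriptors = descriptor.compute(img, keypoints)   # one 128-D vector per keypoint

print(len(keypoints), "keypoints, descriptor matrix shape:", descriptors.shape)
```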

9. Semantic approaches

are typically based on hierarchical representations of the whole image. The general idea is to represent an image with a tree-shaped structure, where leaves contain the image details and objects can be found in the nodes closer to the root of such trees. Then, somehow, you compare the sub-trees to identify the objects contained in different images.

  • Build the tree from the extracted features, where the cluster centers at each level are the representative, “central”, features for their clusters.
  • Once the tree is constructed to the required depth (they recommend 10 clusters over 6 levels), each cluster center at the last level is representative of only a very small number of features.
  • Each original feature can then be represented by its nearest cluster center; instead of storing descriptors, for each image you only need to store which cluster centers – which quantized features – it contains.

This is much more compact, since you only need to store one or two integers per feature, encoding its path through the tree. The only thing you need to do when adding a new picture to the dataset is determine which of the precomputed cluster centers best represent the image's features (as mentioned before, the last level of cluster centers is quite precise) and store those cluster-center indices. Looking up the cluster centers is also fast: there are only 10 comparisons at each of the 6 levels.
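
A minimal sketch of this vocabulary-tree idea, assuming hierarchical k-means via scikit-learn; the branching factor and depth follow the 10-clusters-over-6-levels suggestion above, and the random descriptors stand in for real SIFT vectors.

```python
import numpy as np
from sklearn.cluster import KMeans

BRANCH = 10   # clusters per node, as suggested above
DEPTH = 6     # levels, as suggested above

def build_tree(descriptors, depth=DEPTH):
    """Recursively cluster descriptors into BRANCH children per node."""
    if depth == 0 or len(descriptors) < BRANCH:
        return None
    km = KMeans(n_clusters=BRANCH, n_init=4).fit(descriptors)
    children = [build_tree(descriptors[km.labels_ == i], depth - 1)
                for i in range(BRANCH)]
    return {"kmeans": km, "children": children}

def quantize(tree, descriptor):
    """Return the feature's path through the tree (one small integer per level)."""
    path, node = [], tree
    while node is not None:
        idx = int(node["kmeans"].predict(descriptor[None, :])[0])
        path.append(idx)
        node = node["children"][idx]
    return path

# Usage with random stand-in descriptors (128-D SIFT vectors in practice);
# a shallow tree keeps the demo fast.
descs = np.random.rand(5000, 128).astype(np.float32)
tree = build_tree(descs, depth=3)
print(quantize(tree, descs[0]))   # e.g. [3, 7, 1]
```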

 
