Is categorization helpful for computer vision?

Categorization is problematic for several reasons:

  • Many object classes are functional: membership is defined by use rather than appearance.
  • They often show visual polysemy (view dependence).
  • They are language-dependent.
  • They are individual-dependent.

So, why categorize?

“A part of the world is explained well by parametric/mathematical models, and a part of the world is better explained by data.”

[Halevy et al., 2009]

The thesis here: the “visual world can be sufficiently explained by data”, and the scale of available data supports it.

  • In image classification, dataset sizes typically range from a few thousand to a million images;
  • in object detection, however, the number of negative windows can reach hundreds of millions.

The power of object association goes beyond object category detection.

To match individual objects within scenes, we must partition the image into chunks that are small enough to be matchable in a reasonably sized database, yet large enough to encode specific objects rather than generic “visual words” [Sivic and Zisserman, 2003]: object-sized image chunks.
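The chunk-matching step can be sketched as a nearest-neighbour lookup over descriptors of object-sized windows. This is a minimal illustration, not the authors' method; the random "database" and 128-D descriptors are placeholders for real features such as GIST or HOG:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical database: one descriptor per object-sized chunk.
# In practice these would be features extracted from image windows.
database = rng.normal(size=(10_000, 128)).astype(np.float32)
database /= np.linalg.norm(database, axis=1, keepdims=True)

def match_chunk(query, db):
    """Return the index of the most similar chunk (cosine similarity)."""
    q = query / np.linalg.norm(query)
    return int(np.argmax(db @ q))

# A query chunk that is a noisy copy of entry 42 should match entry 42.
query = database[42] + 0.05 * rng.normal(size=128).astype(np.float32)
best = match_chunk(query, database)
print(best)
```

The trade-off in the text shows up directly here: smaller chunks make the database lookup easier but the descriptors less distinctive, while larger chunks are distinctive but harder to match.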

Consider the cases where the visual content is similar only at the higher scene level but quite dissimilar at the pixel level:

  • scene matching
  • re-texturing an image
  • NOT pixel-wise matching

“visual memex” — a dataset that explores the visual similarities and contexts of a set of photos. 

Source: http://www.gizmag.com/carnegie-mellon-computer-image-matching/20757/

Every year, thousands of tourists stand in front of Paris’ Eiffel Tower to have their picture taken, painted, or sketched. Though every image is different, each contains the sky-piercing tower. Now, a computer can match up all those images based on that one identifying feature.

This could be useful, for example, to someone who is wondering how the Eiffel Tower and its surroundings have changed since their grandparents had their picture painted in front of it on their honeymoon. In this case, the computer could find a match to the painting by searching online for a modern match.

The technique differs from photo-matching methods that focus on similarities in shape, color, or composition; those work well when searching for exact or very close matches, but fail across domains, such as pictures taken in different seasons, or a painting and a photograph.

“The language of a painting is different than the language of a photograph,” Alexei Efros, an associate professor of computer science and robotics at Carnegie Mellon University in Pittsburgh, Penn., said in a news release. “Most computers latch onto the language, not what’s being said.”

Source: http://futureoftech.msnbc.msn.com/_news/2011/12/06/9252228-computer-mimics-human-ability-to-match-images
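One way to read Efros's remark computationally (a toy sketch of my own construction, not the published method): down-weight descriptor dimensions that look like the generic "background" of images, so that matching is driven by what is unusual about the query (the tower) rather than by the style-dependent "language" (photo vs. painting). All descriptors below are synthetic placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 64-D descriptors. The background set stands in for a large
# pool of unrelated images used to estimate what is "generic".
background = rng.normal(size=(5000, 64))
mu, sigma = background.mean(axis=0), background.std(axis=0)

def uniqueness_weights(x):
    """Weight each dimension by how far it deviates from the background."""
    return np.abs(x - mu) / (sigma + 1e-8)

def weighted_similarity(query, candidate):
    """Cosine similarity after re-weighting both vectors by the query's
    uniqueness: common, style-like dimensions count for less."""
    w = uniqueness_weights(query)
    a, b = w * query, w * candidate
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# A scene with a distinctive structure (the "tower") in a few dimensions,
# rendered twice with different style noise (photo vs. painting).
scene = np.zeros(64)
scene[:16] = 5.0
photo = scene + rng.normal(size=64)      # "photograph" of the scene
painting = scene + rng.normal(size=64)   # "painting" of the same scene
unrelated = background[0]                # some other image entirely

print(weighted_similarity(painting, photo) >
      weighted_similarity(painting, unrelated))
```

The painting matches the photo rather than the unrelated image because the similarity is dominated by the dimensions where the painting deviates most from the background, which are exactly the scene-specific ones, not the style noise.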


Jing's Blog

Just another WordPress.com site
