- Feature tuning
- Feature selection
- Feature construction
> The chosen features typically optimize the discriminatory power with respect to the ground truth, which consists of positive and negative examples.
It is ironic that human visual understanding, which is effortless and instantaneous, would be perplexing and difficult to reproduce in computers.
To select the feature subset which gives us the highest classification accuracy, one possibility is to try every possible feature subset. BUT! The number of possibilities grows exponentially with the number of candidate features, which means that it is not computable in practice for a large set of candidate features.
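To make the exponential growth concrete, here is a minimal sketch (not from the notes; the feature names are made up) that enumerates every non-empty subset of a candidate feature set:

```python
from itertools import combinations

def all_feature_subsets(features):
    """Enumerate every non-empty subset of the candidate features."""
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            yield subset

# With 4 candidate features there are 2^4 - 1 = 15 non-empty subsets;
# with 40 features there would be over 10^12, far too many to evaluate.
features = ["color", "texture", "edge", "shape"]
subsets = list(all_feature_subsets(features))
print(len(subsets))  # 15
```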
Use all features???
Decreasing the classification error this way seems appealing; however, in most practical applications the sample sets do not have infinite extent, which means there may be insufficient data to arrive at statistically representative values for the unknown parameters associated with the features.
That is, the classifier will be well tuned for the training set but will lack the ability to generalize to new instances: the curse of dimensionality.
We wish to minimize the size of the selected feature subset (thus minimizing the number of unknown parameters and requiring a smaller training set).
We also want LESS overlap between classes in pattern space; the general approach is to select the feature subspace with the maximum class separation.
Execution time is less important than the classification accuracy!!!
- at a glance
- Picard asks how people can identify classes of photos without taking time to look at the precise content of each photo
- Society of models
Automatic visual concept learning!!!
For each visual concept, a large set of positive and negative examples is collected.
Dynamic feature sets
The critical disadvantage of feature selection techniques is that they are limited by the discriminative power of the set of candidate features, which is prechosen by the system designer.
Automatic feature construction
Rules are automatically generated from positive and negative training examples using decision trees, and concepts are represented as AND/OR expressions.
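A toy illustration of a concept represented as an AND/OR expression, of the kind a decision tree might yield (the feature names and thresholds here are invented for illustration, not taken from the notes):

```python
def concept_rule(features):
    """Hypothetical learned rule for a 'sunset' concept:
    (lots of warm color AND low edge density) OR a strong horizon line."""
    return ((features["warm_fraction"] > 0.3 and features["edge_density"] < 0.2)
            or features["horizon_strength"] > 0.8)

positive = {"warm_fraction": 0.5, "edge_density": 0.1, "horizon_strength": 0.2}
negative = {"warm_fraction": 0.1, "edge_density": 0.6, "horizon_strength": 0.3}
print(concept_rule(positive), concept_rule(negative))  # True False
```

Rules of this flat AND/OR form are easy to generate automatically, which is exactly why their limited expressiveness (noted below) becomes the bottleneck.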
BUT, the automatically discovered rules were not sufficiently complex to discriminate between a wide range of visual concepts.
BETTER performance might be obtained by splitting the training examples into clusters, then finding the appropriate feature subset for each cluster. Depending on how the visual concept is clustered, the classification rate may vary.
- To develop techniques that perform feature selection using sparse training sets
- To automatically determine when to split the training examples into multiple classes and find feature subsets for each of them
- To design the feature subset so that the performance degrades gracefully when a greater percentage of the visual concept is occluded.
- To develop multiple modality integration techniques for content-based retrieval.
Capability for learning new visual concepts
- 1. Spatial models are represented by static instances of the shape of the object
- 2. Spatial reasoning about the class of visual concepts
- 3. It is essential to prune large areas of the search tree
- 4. Higher degree of spatial model reasoning to the feature selection based methods
Obstacles include, but are not limited to, the following:
How can we avoid overfitting in feature selection when there is minimal training data?
How can we determine when to split the training examples into multiple subclasses and find separate feature sets for each subclass?
How can we detect partially occluded examples of visual concepts?
How can we combine multiple modalities, e.g., text, audio, visual imagery, to maximize the classification rate?
How can we add a deeper level of spatial reasoning to visual concept detection methods?
Why are histograms often used?
- they achieve significant data reduction
- they are robust to noise and local image transformations
- they can be computed easily and efficiently
drawback: they cannot encode spatial information about the image
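The loss of spatial information is easy to demonstrate; a minimal sketch (the two tiny "images" are invented for illustration):

```python
from collections import Counter

def histogram(image):
    """Gray-level histogram: count of pixels at each intensity value."""
    return Counter(pixel for row in image for pixel in row)

# Two visually different images: a left/right split vs. a top/bottom split...
img_a = [[0, 0, 255, 255],
         [0, 0, 255, 255]]
img_b = [[0, 0, 0, 0],
         [255, 255, 255, 255]]

# ...yield exactly the same histogram: the spatial layout is discarded.
print(histogram(img_a) == histogram(img_b))  # True
```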
If we convolve both images with the same smoothing filter, both their histograms will converge to the same single middle impulse (the midpoint between the max. and min. values), BUT they will converge at different rates.
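A minimal 1-D sketch of this convergence-rate difference (the signals and the 3-tap box filter are assumptions for illustration): two signals with identical values, and hence identical histograms, but different spatial arrangements are smoothed repeatedly, and the fine pattern collapses toward the midpoint much faster than the coarse one.

```python
def smooth(signal):
    """One pass of a 3-tap box (averaging) filter, edge values replicated."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3
            for i in range(1, len(padded) - 1)]

def value_range(signal):
    return max(signal) - min(signal)

# Same gray levels, different layout: fine alternation vs. one coarse step.
fine = [0, 255] * 8
coarse = [0] * 8 + [255] * 8

for _ in range(5):
    fine, coarse = smooth(fine), smooth(coarse)

# Fine detail is averaged away quickly; the coarse step survives longer.
print(value_range(fine) < value_range(coarse))  # True
```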
coarse-to-fine multi-resolution searching
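A minimal sketch of the coarse-to-fine idea on a 1-D signal (the downsampling scheme and the peak-finding "matcher" are simplifying assumptions, not from the notes): locate the target cheaply at low resolution, then refine only in a small window at full resolution instead of scanning every position.

```python
def downsample(signal):
    """Halve the resolution by averaging adjacent pairs of samples."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]

def best_match(signal, target):
    """Index whose value is closest to target (stand-in for a real matcher)."""
    return min(range(len(signal)), key=lambda i: abs(signal[i] - target))

def coarse_to_fine(signal, target, levels=2):
    """Search coarsely on a reduced signal, then refine near the coarse hit."""
    coarse = signal
    for _ in range(levels):
        coarse = downsample(coarse)
    guess = best_match(coarse, target) * (2 ** levels)   # map back to full res
    window = range(max(0, guess - 2 ** levels),
                   min(len(signal), guess + 2 ** levels + 1))
    return min(window, key=lambda i: abs(signal[i] - target))

signal = [0.0] * 20 + [9.0] + [0.0] * 11   # peak hidden at index 20
print(coarse_to_fine(signal, 9.0))  # 20
```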