

Synergy Improves Automated Scene Recognition

Richard Gaughan

In the same way that human vision provides sensory input to our brains, effective computer vision could provide unprecedented data input capabilities for automated systems. Computer vision, however, involves more than pixel intensities — it requires the ability to classify and identify objects within an image. But the promise of automatic object recognition is equaled by the technical challenge it presents.


The goal of computer vision is to enable an automated system to classify a scene and take appropriate action, which requires more than simple object recognition. Researchers are developing a method to integrate object recognition with viewpoint determination and surface geometry, using the synergy among the three to create a coherent scene analysis. For example, applying the synergistic technique to an image (top) improves the system’s ability to recognize cars (bottom).

Our brains recognize objects by correlating past experiences, contextual clues and input from other sensory systems. To ask a computer to consistently recognize objects based simply on area, orientation or pixel intensities is ambitious. Now researchers at Carnegie Mellon University in Pittsburgh have developed a method of integrating several levels of image information to gain a coherent understanding of an entire scene, including its objects, its surfaces and its perspectives.

The technique integrates three facets of image information: camera viewpoint, object classification and surface geometry. The key innovation is recognizing that each of these parameters influences and improves perception of the others. In an outdoor scene, for example, if the ground level is identified, then a block of windows midway up the side of a building is less likely to be classified as a car, even if the pixel intensities and orientation might otherwise imply that a car exists at that height. But the inference also works in the opposite way: If a line of car-shaped objects is receding into the distance, the probability is high that the objects are at ground level.
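To see how viewpoint can constrain object hypotheses in this way, consider a minimal sketch in Python. Under a ground-plane assumption, an object standing on the ground projects to an image height proportional to how far its footprint falls below the horizon line, so detections whose size conflicts with that relationship can be downweighted. The function names, the Gaussian penalty on log-size mismatch, and the default camera and car heights here are illustrative assumptions, not the researchers' actual formulation.

```python
import math

# Illustrative sketch only: a Gaussian penalty on log-size mismatch stands in
# for a full probabilistic model; names and default values are assumptions.

def expected_pixel_height(v_bottom, v_horizon, camera_height, object_height):
    """For an object standing on the ground, its pixel height is proportional
    to how far its footprint row (v_bottom) falls below the horizon row."""
    return object_height * (v_bottom - v_horizon) / camera_height

def geometry_rescored(score, v_bottom, pixel_height, v_horizon,
                      camera_height=1.6, object_height=1.5, sigma=0.4):
    """Downweight a detector score when the hypothesis's size in the image
    conflicts with the estimated viewpoint (horizon row, camera height)."""
    expected = expected_pixel_height(v_bottom, v_horizon,
                                     camera_height, object_height)
    if expected <= 0:
        return 0.0  # footprint at or above the horizon: a grounded car cannot be here
    mismatch = math.log(pixel_height / expected)
    return score * math.exp(-0.5 * (mismatch / sigma) ** 2)
```

A block of windows midway up a facade fails this test: its footprint row implies a car of a very different size than the detector's bounding box, so its score collapses even when its texture and shape resemble a car.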

Derek Hoiem, Alexei A. Efros and Martial Hebert of the university’s Robotics Institute are developing an algorithm that enables each of the three classes of image information to improve the other two. The result is a net improvement in the fidelity of object recognition, even for complex, cluttered scenes with multiple pedestrians and cars.
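That mutual reinforcement can be caricatured as a simple alternating loop: confident detections vote on where the horizon must be, and the refined horizon in turn rescores the detections. The toy loop below, which reuses the hypothetical geometry_rescored sketch above, is only an assumption about how such synergy could be coded; the researchers' actual algorithm performs joint inference over viewpoint, objects and surfaces rather than this naive alternation.

```python
def implied_horizon(v_bottom, pixel_height, camera_height=1.6, object_height=1.5):
    """Invert the ground-plane relation: a grounded object of known height
    pins the horizon at v_bottom - pixel_height * camera_height / object_height."""
    return v_bottom - pixel_height * camera_height / object_height

def refine_scene(detections, v_horizon, iterations=3, threshold=0.5):
    """Alternately rescore detections under the current viewpoint estimate,
    then re-estimate the horizon from the detections that survive."""
    scored = []
    for _ in range(iterations):
        scored = [(geometry_rescored(d["score"], d["v_bottom"],
                                     d["height"], v_horizon), d)
                  for d in detections]
        strong = [d for s, d in scored if s > threshold]
        if strong:
            v_horizon = sum(implied_horizon(d["v_bottom"], d["height"])
                            for d in strong) / len(strong)
    return v_horizon, scored
```

Even this crude feedback captures the paper's central point: a line of mutually consistent car hypotheses sharpens the viewpoint estimate, which then suppresses the hypotheses that were geometrically implausible all along.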

According to Hoiem, the algorithm moves beyond the independent, local processing that is prevalent in recent computer vision efforts and toward the broader problem of scene understanding.
