
Synergy Improves Automated Scene Recognition

Viewpoint, object classification and surface geometry together aid understanding of the whole picture.

Richard Gaughan

In the same way that human vision provides sensory input to our brains, effective computer vision could provide unprecedented data input capabilities for automated systems. Computer vision, however, involves more than pixel intensities — it requires the ability to classify and identify objects within an image. But the promise of automatic object recognition is equaled by the technical challenge it presents.


The goal of computer vision is to enable an automated system to classify a scene and take appropriate action, which requires more than simple object recognition. Researchers are developing a method to integrate object recognition with viewpoint determination and surface geometry, using the synergy among the three to create a coherent scene analysis. For example, applying the synergistic technique to an image (top) improves the system’s ability to recognize cars (bottom).

Our brains recognize objects by correlating past experiences, contextual clues and input from other sensory systems. To ask a computer to consistently recognize objects based simply on area, orientation or pixel intensities is ambitious. Now researchers at Carnegie Mellon University in Pittsburgh have developed a method of integrating several levels of image information to gain a coherent understanding of an entire scene, including its objects, its surfaces and its perspectives.

The technique integrates three facets of image information: camera viewpoint, object classification and surface geometry. The key innovation is recognizing that each of these parameters influences and improves perception of the others. In an outdoor scene, for example, if the ground level is identified, then a block of windows midway up the side of a building is less likely to be classified as a car, even if the pixel intensities and orientation might otherwise imply that a car exists at that height. But the inference also works in the opposite way: If a line of car-shaped objects is receding into the distance, the probability is high that the objects are at ground level.
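This mutual constraint can be sketched in code. The toy function below is purely illustrative, not the Robotics Institute algorithm: it assumes an estimated horizon line and uses a logistic geometric prior to down-weight "car" detections whose bases sit above ground level, such as a block of windows partway up a building. The function name, the (confidence, bottom_y) detection representation, and the prior's form are all assumptions for the sketch.

```python
import math

def reweight_detections(detections, horizon_y, sigma=30.0):
    """Scale each detection's confidence by how plausible its vertical
    position is, given an estimated horizon line.

    detections: list of (confidence, bottom_y) pairs; image y grows downward.
    horizon_y:  estimated horizon row; car bases should lie below it.
    sigma:      softness of the below-horizon prior, in pixels.
    """
    reweighted = []
    for conf, bottom_y in detections:
        # Logistic prior: near 1 well below the horizon, near 0 above it,
        # so car-shaped patterns floating up a facade are suppressed.
        prior = 1.0 / (1.0 + math.exp(-(bottom_y - horizon_y) / sigma))
        reweighted.append((conf * prior, bottom_y))
    return reweighted
```

The same scores could be fed back the other way: a confident line of car detections would itself sharpen the horizon estimate, which is the inverse inference the text describes.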

Derek Hoiem, Alexei A. Efros and Martial Hebert of the university’s Robotics Institute are developing an algorithm that enables each of the three classes of image information to improve the other two. The result is a net improvement in the fidelity of object recognition, even for complex, cluttered scenes with multiple pedestrians and cars.
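The alternating refinement described above can be caricatured as a fixed-point iteration. In the deliberately simplified sketch below, a scalar "belief" stands in for each of the three estimate classes; the structure to note is that each estimate is updated conditioned on the current values of the other two, and repeated passes converge on a mutually consistent interpretation. Nothing here reproduces the published method; it only illustrates the coupled-update pattern.

```python
def joint_inference(viewpoint, surfaces, objects, n_iters=10, step=0.5):
    """Pull each scalar estimate toward agreement with the other two.

    Stand-in for coupled scene inference: viewpoint, surface geometry,
    and object estimates each improve as the others improve.
    """
    for _ in range(n_iters):
        viewpoint += step * ((surfaces + objects) / 2 - viewpoint)
        surfaces += step * ((viewpoint + objects) / 2 - surfaces)
        objects += step * ((viewpoint + surfaces) / 2 - objects)
    return viewpoint, surfaces, objects
```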

According to Hoiem, the algorithm goes beyond the independent and local processing that is prevalent in recent computer vision efforts, and toward the broader problem of scene understanding.

Photonics Spectra, August 2006
