Camera-processor Chip Brings Computer Vision Closer to Natural Perception

Taking their cues from nature, researchers at the University of Bristol and the University of Manchester are developing methods for increasingly intelligent cameras for AI systems. Applications for the cameras span autonomous vehicles and other intelligent-vision-enabled devices.

Current vision systems combine sensors, such as digital cameras, with computing devices. The problem with such systems is that they perceive the environment only after the visual information has been recorded and transmitted. The information being sent also tends to include a fair amount of data that is irrelevant to the user, whether machine or human; in an autonomous vehicle, for example, a system may capture details about leaves on a tree at the side of the road. Such details consume power and processing time, reducing the efficiency of the system.

A Convolutional Neural Network (CNN) on the SCAMP-5D vision system classifying hand gestures at 8,200 frames per second. Courtesy of University of Bristol.

“We can borrow inspiration from the way natural systems process the visual world — we do not perceive everything — our eyes and our brains work together to make sense of the world, and in some cases, the eyes themselves do processing to help the brain reduce what is not relevant,” said Walterio Mayol-Cuevas, professor in robotics, computer vision, and mobile systems at the University of Bristol and principal investigator on the project.

The collaboration yielded two papers on the subject, one led by Laurie Bose and the other by Yanan Liu, and two refinements toward the goal of more efficient intelligent cameras. The researchers implemented convolutional neural networks (CNNs) directly on the image plane. The CNNs developed by the team classified frames thousands of times per second without ever needing to record them or send them down the processing pipeline. The researchers demonstrated the technology by classifying handwritten numbers, hand gestures, and even plankton.


The work was made possible by the SCAMP architecture developed by Piotr Dudek, professor of circuits and systems at the University of Manchester, and his team. SCAMP is a camera-processor chip that the team has described as a pixel processor array (PPA). A PPA has a processor embedded in each pixel, which allows the pixels to communicate with one another and process data in a truly parallel fashion, making the architecture well suited to CNNs and vision algorithms.
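To illustrate why a pixel processor array suits convolutional workloads, the following is a minimal sketch in plain Python, not the SCAMP instruction set or the researchers' code. It assumes a toy model in which every pixel holds one value, can read a neighbor's value through a shift, and executes the same instruction as every other pixel. The function names (`shift`, `add`, `scale`, `convolve3x3`) and the example image and kernel are hypothetical, chosen only for illustration.

```python
# Toy emulation of a pixel processor array (PPA): each pixel holds its own
# value, and all pixels apply the same instruction at the same time.
# Illustrative model only; the names and data here are assumptions.

def shift(plane, dx, dy, fill=0):
    """Every pixel reads the value of its neighbor at offset (dx, dy)."""
    h, w = len(plane), len(plane[0])
    return [
        [plane[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else fill
         for x in range(w)]
        for y in range(h)
    ]

def add(a, b):
    """Element-wise add: one instruction carried out by every pixel."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(plane, k):
    """Element-wise multiply by a constant weight."""
    return [[k * p for p in row] for row in plane]

def convolve3x3(plane, kernel):
    """Sum shifted, weighted copies of the plane (cross-correlation form)."""
    h, w = len(plane), len(plane[0])
    acc = [[0] * w for _ in range(h)]
    for ky in (-1, 0, 1):
        for kx in (-1, 0, 1):
            weight = kernel[ky + 1][kx + 1]
            if weight:  # skip zero weights, which contribute nothing
                acc = add(acc, scale(shift(plane, kx, ky), weight))
    return acc

# A 4x4 image containing a bright 2x2 square.
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# Horizontal gradient: right neighbor minus left neighbor.
edge_kernel = [
    [0, 0, 0],
    [-1, 0, 1],
    [0, 0, 0],
]
edges = convolve3x3(image, edge_kernel)
for row in edges:
    print(row)
```

In this model the convolution never loops over pixels at the top level; it is a short sequence of whole-plane shift, scale, and add steps. On hardware with one processor per pixel, each such step could in principle run on all pixels at once, which is the property the architecture description above points to.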

SCAMP-5d vision system. Courtesy of The University of Manchester.

“Integration of sensing, processing, and memory at the pixel level is not only enabling high-performance, low-latency systems, but also promises low-power, highly efficient hardware,” Dudek said. “SCAMP devices can be implemented with footprints similar to current camera sensors, but with the ability to have a general purpose massively parallel processor right at the point of image capture.”

“What is so exciting about these cameras is not only the newly emerging machine learning capability, but the speed at which they run and the lightweight configuration,” said Tom Richardson, senior lecturer in flight mechanics at the University of Bristol. Richardson has been working to integrate the technology into lightweight drones.

The research could lead to dedicated intelligent AI cameras: visual systems that send high-level information to the rest of the system, such as the type of object or event taking place in front of the camera. This approach could make systems more efficient and more secure, as no images would need to be recorded.
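As a rough back-of-the-envelope sketch of the efficiency argument, the snippet below compares the data volume of streaming raw frames with that of streaming only a class label per frame. The frame rate is the 8,200 frames per second quoted for the gesture-classification demo; the 256 x 256 resolution, 8-bit pixels, and one-byte label are assumptions made for illustration and are not stated in the article.

```python
# Back-of-the-envelope comparison: raw frames vs. per-frame class labels.
# Resolution, bit depth, and label size are illustrative assumptions.

FRAME_WIDTH = 256          # assumed sensor width in pixels
FRAME_HEIGHT = 256         # assumed sensor height in pixels
BYTES_PER_PIXEL = 1        # assumed 8-bit grayscale
LABEL_BYTES = 1            # assumed: one byte encodes the predicted class
FRAMES_PER_SECOND = 8200   # rate quoted for the gesture-classification demo

def bytes_per_second(payload_bytes, fps):
    """Data volume per second for a fixed per-frame payload."""
    return payload_bytes * fps

raw_rate = bytes_per_second(
    FRAME_WIDTH * FRAME_HEIGHT * BYTES_PER_PIXEL, FRAMES_PER_SECOND
)
label_rate = bytes_per_second(LABEL_BYTES, FRAMES_PER_SECOND)
reduction = raw_rate // label_rate

print(f"raw frames:  {raw_rate:,} bytes/s")
print(f"labels only: {label_rate:,} bytes/s")
print(f"reduction:   {reduction:,}x")
```

Under these assumptions, sending labels instead of frames reduces the data leaving the sensor by a factor equal to the number of bytes in a frame. Real figures would depend on the actual sensor format and on how much high-level information is reported.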

The research papers were presented at the European Conference on Computer Vision 2020. Demos and videos are accessible here.

Published: January 2021

