Cameras Record Object Density More Accurately

Crowd counting, the process of estimating the density or number of objects such as vehicles or people in an image, can benefit from the same deep learning techniques that have been used for image and video processing. Scientists at the Japan Advanced Institute of Science and Technology (JAIST), in collaboration with researchers at the Sirindhorn International Institute of Technology (SIIT) in Thailand, developed a deep neural network (DNN) that uses a backward connection to achieve higher crowd-counting performance.
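Density-based crowd counting is commonly framed like this: each object is annotated with a single point, each point is spread into a small kernel whose weights sum to 1, and the estimated count is the sum of the resulting density map. The sketch below illustrates that idea with Gaussian kernels and a fixed `sigma`; the kernel shape, `sigma` value, and function names are assumptions for illustration, since the article does not say how the JAIST/SIIT ground-truth maps were generated.

```python
import math


def density_map(points, height, width, sigma=1.5):
    """Build a density map from point annotations, one (row, col) per object.

    Assumption: each point becomes a 2D Gaussian with a fixed sigma,
    normalized over the grid so that every object contributes exactly 1.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for pr, pc in points:
        # Unnormalized Gaussian weights centered on the annotated point
        weights = [[math.exp(-((r - pr) ** 2 + (c - pc) ** 2) / (2 * sigma ** 2))
                    for c in range(width)] for r in range(height)]
        total = sum(sum(row) for row in weights)
        # Normalize so this object adds exactly 1 to the map's sum
        for r in range(height):
            for c in range(width):
                dmap[r][c] += weights[r][c] / total
    return dmap


def count_from_density(dmap):
    """The estimated count is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in dmap)


heads = [(2, 3), (5, 7), (8, 1)]
dmap = density_map(heads, height=10, width=10)
print(round(count_from_density(dmap), 6))  # 3.0
```

A network trained to regress such maps can then report a count simply by summing its output, which is why the quality of the density map matters in the work described here.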

The estimation network proposed by the researchers consists of two identical networks: one extracts high-level features and the other estimates the final result. To preserve semantic information, the network uses dilated convolution so that the feature map never has to be resized.
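The point of dilated convolution here is that it spreads a kernel's taps apart instead of shrinking the feature map. The following minimal 1D sketch (a simplification of the 2D case; the function name and zero "same" padding are assumptions, not details from the paper) shows that a dilated convolution returns an output with exactly the input's length.

```python
def dilated_conv1d(signal, kernel, dilation):
    """1D dilated convolution (cross-correlation, as in DNN libraries) with
    zero "same" padding, so the output length equals the input length.

    Assumes an odd kernel size so the padding is symmetric.
    """
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    pad = span // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i - pad + j * dilation
            if 0 <= idx < len(signal):  # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
out = dilated_conv1d(impulse, [1, 1, 1], dilation=2)
print(out)                       # [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
print(len(out) == len(impulse))  # True
```

With dilation 2, the three taps sit two samples apart, so the single impulse shows up at three spread-out positions while the resolution stays at nine samples; pooling or strided convolution would instead have shortened the output.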

Instead of a normal skip connection or forward connection, the researchers use a backward connection, which passes feature maps from a deeper layer back to a shallower layer. This helps the shallow layer recognize the characteristics of the target in advance, so false positives are reduced before the density map is formed. Small-scale and large-scale objects correspond to shallow and deep layers, respectively.
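To make the idea of a backward connection concrete, here is a toy sketch, not the authors' architecture: a hypothetical two-layer 1D network is run once to obtain deep features, and those features are then added element-wise to the shallow layer's input on a second pass. The fixed kernel weights, the element-wise addition as the merge operation, and reusing the same layers for both passes are all assumptions made for illustration (the paper describes two identical networks, which may have separate weights).

```python
def conv_same(x, kernel, dilation=1):
    """Zero-padded 1D dilated convolution whose output length equals len(x).

    Assumes an odd kernel size so the padding is symmetric.
    """
    pad = (len(kernel) - 1) * dilation // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i - pad + j * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def relu(x):
    return [max(0.0, v) for v in x]


# Hypothetical fixed weights; a real network would learn these.
SHALLOW_K = [0.25, 0.5, 0.25]
DEEP_K = [0.25, 0.5, 0.25]


def run_layers(x, feedback=None):
    """One pass through a toy two-layer network.

    If `feedback` is given, it is added element-wise to the shallow layer's
    input: this plays the role of the backward connection.
    """
    if feedback is not None:
        x = [a + b for a, b in zip(x, feedback)]
    shallow = relu(conv_same(x, SHALLOW_K, dilation=1))
    deep = relu(conv_same(shallow, DEEP_K, dilation=2))
    return deep


def forward_with_backward_connection(x):
    # First pass: extract deep (high-level) features from the input.
    deep = run_layers(x)
    # Second pass: feed those features back into the shallow layer. The
    # element-wise merge only works because every layer keeps the input
    # resolution (no pooling or striding).
    return run_layers(x, feedback=deep)


signal = [0, 0, 1, 0, 0, 0, 0, 0]
print(forward_with_backward_connection(signal))
```

The design choice worth noting is the resolution constraint: because no layer downsamples, the deep feature map lines up position by position with the shallow layer's input, so feeding it back needs no upsampling step.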

An example of a density map obtained from an image in the TRANCOS data set. Courtesy of JAIST.

To ensure the quality of a density map, feature maps in every layer should have the same resolution, the researchers said. In the team's approach, dilated convolution is used in the skip network to enlarge the receptive field while retaining high-level features for feature map integration in the skip connection. When dilated convolution layers are stacked with exponentially increasing dilation rates, the receptive field grows exponentially while the resolution of the feature map is preserved.
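The exponential growth claim can be checked with a short calculation. For stride-1 convolutions, each layer with kernel size k and dilation d widens the receptive field by (k − 1) · d. The sketch below assumes 3-wide kernels and dilation rates that double at each layer, a common scheme; the article does not state the dilation rates the team actually used.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 convolutions.

    Each layer with kernel size k and dilation d adds (k - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


depth = 5
plain = receptive_field(3, [1] * depth)                       # 1 + 2 * 5
dilated = receptive_field(3, [2 ** i for i in range(depth)])  # 1 + 2 * (1 + 2 + 4 + 8 + 16)
print(plain, dilated)  # 11 63
```

With doubling dilation rates, the field after L layers is 2^(L+1) − 1, so it grows exponentially with depth, whereas ordinary 3-wide convolutions grow it only linearly (2L + 1). Both stacks use stride 1, so neither shrinks the feature map.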

The researchers tested their method on three data sets for counting humans and vehicles in crowd images. They evaluated counting performance by mean absolute error (MAE) and root mean squared error (RMSE), which indicate the accuracy and robustness of the network, respectively. The experimental results showed that the network outperformed related networks on high-density crowds and could be effective in reducing overcounting errors.
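The two metrics can be computed over per-image counts as follows. The predicted and true counts below are made-up numbers for illustration, not results from the study.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and true per-image counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; squaring weights large misses more heavily."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


pred = [102, 48, 250, 13]  # hypothetical predicted counts
true = [100, 50, 240, 15]  # hypothetical ground-truth counts
print(mae(pred, true))             # 4.0
print(round(rmse(pred, true), 3))  # 5.292
```

Here a single image with an error of 10 pushes RMSE well above MAE, which is why RMSE is read as a measure of robustness: a network that occasionally miscounts badly is penalized more by RMSE than by MAE.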

Crowd counting is challenging because of variations in object scale and crowd density. Existing approaches emphasize skip connections that integrate shallower layers with deeper layers, with each layer extracting features at a different object scale and crowd density. In these approaches, only high-level features are emphasized while low-level features are ignored, the researchers said. The new DNN with a backward connection could estimate object density more accurately, and it could be applied to estimating human density in public spaces or vehicle density on roads to improve public safety, security, and traffic efficiency.

“Backward connection in DNN enables [us] to take advantages obtained from both high-level feature and low-level feature in an image, and therefore achieves higher performance than before,” said Professor Atsuo Yoshitaka, head of the Yoshitaka lab. The lab is currently developing different kinds of DNNs for industrial applications such as object detection and identification in micrographs and defect detection in industrial products.

The research was published in the Journal of Imaging (www.doi.org/10.3390/jimaging6050028).

Published: July 2020
