AI Learns to Predict Human Behavior from Videos

NEW YORK, June 28, 2021 — A team from Columbia University has developed an algorithm to give machines a more intuitive sense of what may happen next after an action. The computer vision algorithm is designed to predict human interactions and body language in video. The researchers said this capability could have applications for assistive technology, autonomous vehicles, and collaborative robots, and that it is the most accurate method to date for predicting video action events up to several minutes in the future.

“Our algorithm is a step toward machines being able to make better predictions about human behavior and thus better coordinate their actions with ours,” said Carl Vondrick, assistant professor of computer science at Columbia, who directed the study. “Our results open a number of possibilities for human-robot collaboration, autonomous vehicles, and assistive technology.”

Previous attempts at predictive machine learning, including those by the team, focused on predicting just one action at a time. The algorithms decide whether to classify the action as a hug, high-five, handshake, or even a nonaction like “ignore.” But when the uncertainty is high, most machine learning models are unable to find commonalities between the possible options.

This approach looks at the longer-range prediction problem from a different angle. After analyzing thousands of hours of movies, sports games, and shows like “The Office,” the system learns to predict hundreds of activities by leveraging higher-level associations between people, animals, and objects.

“Not everything in the future is predictable,” said Didac Suris, a Ph.D. student in engineering and co-lead author of the paper. “When a person cannot foresee exactly what will happen, they play it safe and predict at a higher level of abstraction. Our algorithm is the first to learn this capability to reason abstractly about future events.”

The team said it had to revisit questions in mathematics dating back to the ancient Greeks. In geometry, students learn rules about straight lines, parallel lines, and so on. Machine learning systems typically obey these rules as well, but other geometries have counterintuitive properties, such as straight lines that bend or triangles with curved sides. The team used these unusual geometries to build AI models that organize high-level concepts and predict human behavior in the future.
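The article does not name the geometry involved. As an illustration only, the sketch below assumes the Poincaré ball model of hyperbolic space, a standard example of a geometry in which shortest paths appear curved. The function name `poincare_distance` is hypothetical, and this is not the researchers' code.

```python
import math


def poincare_distance(u, v):
    """Distance between two points strictly inside the unit ball.

    Assumption: the Poincare ball model of hyperbolic space, used here
    only as an example of a non-Euclidean geometry.
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # Standard closed form: arcosh(1 + 2|u-v|^2 / ((1-|u|^2)(1-|v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# The same Euclidean gap of 0.1 counts for more near the edge of the ball
# than near its center.
near_center = poincare_distance((0.0, 0.0), (0.1, 0.0))
near_edge = poincare_distance((0.8, 0.0), (0.9, 0.0))
```

In this model, distances stretch toward the boundary, which leaves room to place many specific concepts far apart near the edge while keeping general concepts close to the center.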

“Prediction is the basis of human intelligence,” said Aude Oliva, a senior research scientist at MIT and co-director for the MIT-IBM Watson AI Lab; she is an expert in AI and human cognition who was not involved in the study. “Machines make mistakes that humans never would because they lack our ability to reason abstractly. This work is a pivotal step toward bridging this technological gap.”

The mathematical framework developed by the researchers enables machines to organize events by how predictable they are in the future. The system categorizes activities on its own, is aware of uncertainty, and provides more specific actions when there is certainty — as opposed to more generic predictions when there is not.
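The idea of giving a specific answer when confident and a generic one otherwise can be sketched in a few lines. This is a simplified stand-in, not the researchers' method: it replaces their learned framework with a hand-written tree and a fixed confidence threshold. The hierarchy, the `predict` function, and the threshold value are all hypothetical; only the action labels come from the article.

```python
# Hypothetical hand-written hierarchy: each label maps to its parent.
PARENT = {
    "hug": "greeting",
    "handshake": "greeting",
    "high-five": "greeting",
    "greeting": "interaction",
    "ignore": "interaction",
    "interaction": None,
}


def ancestors(node):
    """Yield the node itself, then each ancestor up to the root."""
    while node is not None:
        yield node
        node = PARENT[node]


def depth(node):
    """Number of steps from the node up to the root."""
    return sum(1 for _ in ancestors(node)) - 1


def predict(leaf_probs, threshold=0.6):
    """Return the most specific label whose total probability meets the threshold."""
    mass = {}
    for leaf, p in leaf_probs.items():
        # Each leaf's probability also counts toward every label above it.
        for node in ancestors(leaf):
            mass[node] = mass.get(node, 0.0) + p
    confident = [n for n, m in mass.items() if m >= threshold]
    # Prefer deeper (more specific) labels; break ties by probability mass.
    return max(confident, key=lambda n: (depth(n), mass[n]), default=None)


certain = predict({"hug": 0.9, "handshake": 0.05, "high-five": 0.03, "ignore": 0.02})
unsure = predict({"hug": 0.35, "handshake": 0.3, "high-five": 0.25, "ignore": 0.1})
```

With these made-up probabilities, the first call settles on the specific action "hug", while the second backs off to the broader "greeting" because no single action is likely enough on its own.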

The technique could move computers closer to being able to interpret a situation and make a nuanced decision, rather than take a preprogrammed action, the researchers said. It’s a critical step in building trust between humans and computers. “If machines can understand and anticipate our behaviors, computers will be able to seamlessly assist people in daily activity,” said Ruoshi Liu, a Ph.D. student in engineering at Columbia and co-lead author of the paper.

Though the new algorithm is able to make more accurate predictions on benchmark tasks than previous methods, the next steps are to verify that it works outside the lab, Vondrick said. If the system can work in diverse settings, it could open possibilities to deploy machines and robots to improve safety, health, and security, the researchers said. The group intends to continue improving the algorithm’s performance with larger data sets and computers, and other forms of geometry.

The study was presented at the Conference on Computer Vision and Pattern Recognition, June 24, 2021 (www.arxiv.org/pdf/2101.01600.pdf).

Published: June 2021

Glossary

computer vision: Computer vision enables computers to interpret and make decisions based on visual data, such as images and videos. It involves the development of algorithms, techniques, and systems that enable machines to gain an understanding of the visual world, similar to how humans perceive and interpret visual information. Key aspects and tasks within computer vision include: Image recognition: Identifying and categorizing objects, scenes, or patterns within images. This involves training algorithms...
machine learning: Machine learning (ML) is a subset of artificial intelligence (AI) that focuses on the development of algorithms and statistical models that enable computers to improve their performance on a specific task through experience or training. Instead of being explicitly programmed to perform a task, a machine learning system learns from data and examples. The primary goal of machine learning is to develop models that can generalize patterns from data and make predictions or decisions without being...
algorithm: A precisely defined series of steps that describes how a computer performs a task.
machine vision: Machine vision, also known as computer vision or computer sight, refers to the technology that enables machines, typically computers, to interpret and understand visual information from the world, much like the human visual system. It involves the development and application of algorithms and systems that allow machines to acquire, process, analyze, and make decisions based on visual data. Key aspects of machine vision include: Image acquisition: Machine vision systems use various...
artificial intelligence: The ability of a machine to perform certain complex functions normally associated with human intelligence, such as judgment, pattern recognition, understanding, learning, planning, and problem solving.
