Multicamera Technology Enables Multiperson-Pose Estimation Method

A method has been developed that enables computers to understand the body poses and movements of multiple people — including facial expressions and hand positions — using video in real time. The novel method was created using the Panoptic Studio, a multiview system for social motion capture. The Panoptic Studio consists of a two-story dome embedded with 500 video cameras and was developed by researchers at Carnegie Mellon University’s Robotics Institute.

Carnegie Mellon University researchers have developed methods to detect the body pose, including facial expressions and hand positions, of multiple individuals. This enables computers to not only identify parts of the body, but to understand how they are moving and positioned. Courtesy of Carnegie Mellon University.

Tracking multiple people in real time, particularly in situations where they may be in contact with each other, presents a number of challenges. The research team took a bottom-up approach, first localizing all the body parts in a scene — arms, legs, faces, etc. — and then associating those parts with specific individuals.
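The bottom-up idea can be illustrated with a minimal Python sketch. It is not the researchers' released code: the function name `group_parts`, the input format and the use of plain pixel distance to decide which parts belong together are all assumptions made for illustration, since the article does not say how the association is scored. The sketch takes part candidates that have already been located anywhere in the image and then assigns them to individuals.

```python
from math import dist

def group_parts(detections, links, root="neck"):
    """Greedy bottom-up grouping (illustrative only).

    detections: hypothetical dict mapping a part name to a list of (x, y)
                candidates found anywhere in the image.
    links:      hypothetical list of (parent, child) part pairs, ordered so
                that each parent is assigned before its children.
    """
    # Step 1: every candidate of the root part seeds one person.
    people = [{root: p} for p in detections.get(root, [])]

    # Step 2: along each link, attach child candidates to people,
    # closest (person, candidate) pairs first.
    for parent, child in links:
        pairs = sorted(
            (dist(person[parent], cand), idx, cand)
            for idx, person in enumerate(people)
            if parent in person
            for cand in detections.get(child, [])
        )
        used = set()
        for _, idx, cand in pairs:
            if child in people[idx] or cand in used:
                continue  # person already has this part, or candidate taken
            people[idx][child] = cand
            used.add(cand)
    return people

# Two people standing side by side; the candidates arrive in no particular order.
detections = {
    "neck": [(10, 10), (50, 12)],
    "head": [(51, 2), (10, 1)],
    "hand": [(14, 30), (47, 31)],
}
links = [("neck", "head"), ("neck", "hand")]
people = group_parts(detections, links)
```

Distance is only a stand-in here and would fail when people overlap or touch, which is exactly the case the article describes as hard. The point of the sketch is the order of operations: locate all parts first, then decide whom each part belongs to.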

Hand detection can be an even greater challenge, because a camera is unlikely to see all parts of the hand at the same time. But for every image that shows only part of the hand, there often exists another image from a different angle with a full or complementary view of the hand, said researcher Hanbyul Joo. That’s where the researchers made use of the multicamera Panoptic Studio.

“A single shot gives you 500 views of a person’s hand, plus it automatically annotates the hand position,” said Joo. “Hands are too small to be annotated by most of our cameras, however, so for this study we used just 31 high-definition cameras, but still were able to build a massive data set.”

The novel method for tracking 2D human form and motion could open up new ways for people and machines to interact with each other, such as communicating with computers simply by pointing at things, said professor Yaser Sheikh.

Detecting the nuances of nonverbal communication between individuals would allow robots to serve in social spaces. A self-driving car could get an early warning that a pedestrian was about to step into the street by monitoring body language. Enabling machines to understand human behavior also could lead to novel approaches to behavioral diagnosis and rehabilitation.

In sports analytics, real-time pose detection would make it possible for computers not only to track the position of each player on the field of play, as is now the case, but also to know what players are doing with their arms, legs and heads at each point in time.

Developed a decade ago, the Panoptic Studio served as the site of the experiments that led to the new method.

“The Panoptic Studio supercharges our research,” Sheikh said.

The studio is now being used to improve body, face and hand detectors by training them jointly. As work progresses from 2D models of humans to 3D models, the facility’s ability to automatically generate annotated images will be crucial.

To encourage more research and applications, the team has released its computer code for both multiperson and hand-pose estimation. It is being used by research groups, and more than 20 commercial groups, including automotive companies, have expressed interest in licensing the technology, Sheikh said.

When the Panoptic Studio was built with support from the National Science Foundation, it was not clear what impact it would have, said Sheikh.

“Now, we’re able to break through a number of technical barriers primarily as a result of that NSF grant 10 years ago,” he added. “We’re sharing the code, but we’re also sharing all the data captured in the Panoptic Studio.”

The research on multiperson and hand-pose detection methods will be presented at the Computer Vision and Pattern Recognition Conference, CVPR 2017, July 21-26, 2017, in Honolulu.

Carnegie Mellon University researchers have developed methods that enable computers to track the body pose of multiple individuals. Courtesy of Carnegie Mellon University.