HOLLY O'DELL, CONTRIBUTING EDITOR
In 1970, robotics professor Masahiro Mori proposed his “uncanny valley” hypothesis, which states that people react positively to a robot that has some humanlike features, but are repulsed when it starts looking too much like us.
A robot named Pepper was developed to bridge that gap by providing human-robot interaction to improve quality of life. Made by SoftBank Robotics Corp., the compact, mobile robot features a design mimicking that of a traditional robot but with eyes that move and arms that articulate. The robot is equipped with advanced computer vision and machine learning technologies — developed by Rensselaer Polytechnic Institute’s (RPI’s) Intelligent Systems Lab following nearly 20 years of research — so it can accurately
detect and recognize nonverbal cues
to naturally interact with humans.
Computer vision and custom algorithms allow Pepper to maintain eye contact with humans. Courtesy of Rensselaer Polytechnic Institute.
Its software uses custom algorithms to detect face and body movement in real time. Images from the small video camera mounted on Pepper’s head let the software recognize facial expressions and estimate face poses, enabling the robot to identify happiness, sadness, or surprise. It can also estimate an individual’s age and gender, and even maintain eye contact with a human.
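To give a sense of how such a pipeline fits together, the sketch below runs OpenCV’s stock face detector on frames from a head-mounted camera and hands each detected face to an expression classifier. The classifier here is a placeholder standing in for a trained model, not RPI’s actual software.

```python
# Minimal real-time face pipeline sketch, assuming OpenCV and a camera feed.
import cv2

EXPRESSIONS = ["happiness", "sadness", "surprise", "neutral"]

# OpenCV's bundled Haar cascade for frontal faces
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def classify_expression(face_img):
    """Placeholder for a trained expression classifier."""
    # A real system would run a neural network here; we return a dummy label.
    return EXPRESSIONS[-1]

cap = cv2.VideoCapture(0)  # stands in for the robot's head-mounted camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Detect every face in the current frame
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        label = classify_expression(gray[y:y + h, x:x + w])
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 1)
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```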
Pepper takes a similar approach to recognizing the body. “First we start with detecting the body as a 2D image,” says Qiang Ji, professor of electrical, computer, and systems engineering at RPI. “From there we can analyze the body pose, which is determined by the position and angle of the shoulders. Then from the body pose, we can recognize different body gestures.”
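A rough sketch of that detect-then-interpret chain, assuming 2D keypoints are already available from a pose-estimation network, might look like the following; the keypoint names and the arms-crossed rule are illustrative assumptions rather than Ji’s actual algorithms.

```python
# From 2D keypoints to body pose (shoulder angle) to a simple gesture rule.
import math

def shoulder_pose(keypoints):
    """Return the shoulder line's angle (degrees) relative to horizontal."""
    (lx, ly), (rx, ry) = keypoints["left_shoulder"], keypoints["right_shoulder"]
    return math.degrees(math.atan2(ry - ly, rx - lx))

def arms_crossed(keypoints):
    """Arms crossed if each wrist lies on the opposite side of the torso midline."""
    mid_x = (keypoints["left_shoulder"][0] + keypoints["right_shoulder"][0]) / 2
    return (keypoints["left_wrist"][0] > mid_x and
            keypoints["right_wrist"][0] < mid_x)

# Example with hand-made keypoints (pixel coordinates):
pose = {
    "left_shoulder": (100, 200), "right_shoulder": (220, 205),
    "left_wrist": (190, 320), "right_wrist": (130, 320),
}
print(f"shoulder angle: {shoulder_pose(pose):.1f} degrees")
print("arms crossed" if arms_crossed(pose) else "arms open")
```

Simple geometric rules like these can then trigger the robot’s spoken responses.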
Pepper uses deep learning to articulate gestures such as a handshake. Combined with the small
camera mounted on the robot’s head, the software helps Pepper recognize facial expressions and
estimate age and gender. Courtesy of Rensselaer Polytechnic Institute.
For example, when Pepper sees you cross your arms, it says, “Hey, be friendly to me.” When it recognizes
the drinking motion, it says, “Cheers!”
Ji predicts that computer vision will also be added to voice-only assistants such as Amazon’s Alexa or Apple’s Siri. This multimodal approach allows for more humanlike interaction initiated by the robot.
Pepper is only one platform to which the RPI researchers have applied computer vision technology. Augmented by the computer vision algorithms developed in Ji’s group, Milo, a robot from RoboKind, is equipped with motors on its face that act as artificial muscles. These “muscles” produce and replicate different facial expressions, enabling the robot to interact with children with autism, for example.
The computer screen on Pepper’s chest can be
programmed to display various content during
interactions, such as information about what the
robot is seeing in real time. Courtesy of Rensselaer Polytechnic Institute.
Other use cases include human state
monitoring and prediction, driver behavior estimation and prediction, and
security and surveillance. Research supported by DARPA uses aerial video to detect suspicious actions and activities.
Meanwhile, with funding from Honda
and the U.S. Department of Transportation, RPI is focusing research on detecting distracted and fatigued drivers
through behaviors such as head and
eye movements, gaze, yawning, and
nodding, all captured by a dashboard camera.
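One common way eye-closure cues like these are quantified is the eye aspect ratio, which drops toward zero as the eyelids close. The sketch below is an illustration of that idea under the usual six-landmark eye model, assuming the landmarks come from a dashboard-camera face tracker; it is not RPI’s actual detector.

```python
# Eye aspect ratio (EAR) and a simple consecutive-frame drowsiness rule.
import math

def _dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks ordered around the eye contour."""
    vertical = _dist(eye[1], eye[5]) + _dist(eye[2], eye[4])
    horizontal = _dist(eye[0], eye[3])
    return vertical / (2.0 * horizontal)

def is_drowsy(ear_history, threshold=0.21, min_frames=15):
    """Flag fatigue when the eyes stay nearly closed for many consecutive frames."""
    return (len(ear_history) >= min_frames and
            all(e < threshold for e in ear_history[-min_frames:]))

# Example: a sustained run of low EAR values trips the drowsiness flag.
print(is_drowsy([0.15] * 20))  # True
```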
The Milo robot has facial motors that act as artificial muscles, producing and replicating various facial expressions. RPI professor Qiang Ji sees the robot’s ability to
interact with children with autism as a
promising application. Courtesy of Rensselaer Polytechnic Institute.
Into the deep
While the camera has been important in the development of Pepper and other highly automated applications, advances in deep learning software are responsible for the applications’ continued performance improvements. Deep learning allows computers to learn to perform classification tasks directly from images, rather than from a task-specific algorithm. A large set of labeled data, in this case many thousands of images displaying various expressions and gestures, is used along with neural networks to train the computer model, allowing it to learn much as a human does.
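As a rough illustration of that recipe, the sketch below fits a small off-the-shelf network to a folder of labeled expression images. The “expressions/” directory layout (one subfolder per label), the ResNet-18 backbone, and the hyperparameters are assumptions made for the example, not details of RPI’s system.

```python
# Training a classifier directly from labeled images with PyTorch/torchvision.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Assumed layout: expressions/happy/, expressions/sad/, expressions/surprised/, ...
train_set = datasets.ImageFolder("expressions/", transform=transform)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

# A small off-the-shelf CNN with its final layer resized to the label count
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(10):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()      # learn directly from pixels and labels,
        optimizer.step()     # with no hand-written task-specific rules
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```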
Still, the technology needs to mature
to make the Pepper robot safely deployable in homes and other locations. “In a real-world environment, it is very hard for current deep learning software to predict a task,” Ji says. “If you train the model for one task, it cannot automatically adapt to another, even similar, task without another retraining process.”
In its early deployments, Pepper has been used by businesses to promote products. Ji also sees the robot’s
potential to act as a museum guide.
But the ultimate goal is to improve people’s lives. “Pepper’s ability to perceive and understand humans’ emotional states, and then respond in an empathetic manner, makes it ideal as a companion robot for the ill or older populations,” Ji says.