

AI Method Enables Controllable Image Synthesis

A method that controls how AI systems create images has applications for autonomous robotics and AI training. The method builds on the AI task of conditional image generation to give users more control over the content and layout of the resulting images.

Developed by researchers at North Carolina State University (NC State), the new technique specifically trains the AI system to control certain image characteristics across a series of pictures that may show movement or other changes.

In conditional image generation, the AI system is trained to create images that meet a specific set of conditions. For example, the system could be trained to create original images of cats or dogs, depending on which animal the user requested. Advances to this technique allow the system to be trained to meet conditions that specify image layout, such as where a tree should appear in the image.
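
For readers unfamiliar with the underlying machinery, the following minimal sketch illustrates what "conditional" means in practice. It is written in PyTorch with hypothetical class names and dimensions, and it depicts the general task rather than the NC State model: the generator receives a noise vector together with a class label and produces an image meant to satisfy that condition.

```python
# Minimal, hypothetical sketch of class-conditional image generation
# (illustration of the task, not the NC State model): the generator takes a
# noise vector plus a class label ("cat" or "dog") and produces an image.
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, noise_dim=128, num_classes=2, img_size=64):
        super().__init__()
        self.label_embed = nn.Embedding(num_classes, noise_dim)  # embed the condition
        self.net = nn.Sequential(
            nn.Linear(noise_dim * 2, 512), nn.ReLU(),
            nn.Linear(512, 3 * img_size * img_size), nn.Tanh())
        self.img_size = img_size

    def forward(self, noise, labels):
        cond = self.label_embed(labels)           # condition vector from the label
        x = torch.cat([noise, cond], dim=1)       # noise + condition
        return self.net(x).view(-1, 3, self.img_size, self.img_size)

# Request two "cat" (class 0) and two "dog" (class 1) images
generator = ConditionalGenerator()
images = generator(torch.randn(4, 128), torch.tensor([0, 0, 1, 1]))
```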

In new work, the NC State team has taken these techniques even further to give users control of image synthesis from reconfigurable structured inputs.

“Like previous approaches, ours allows users to have the system generate an image based on a specific set of conditions. But ours also allows you to retain that image and add to it,” professor Tianfu Wu said. “For example, users could have the AI create a mountain scene. The users could then have the system add skiers to that scene. Our approach is highly reconfigurable.”


The new AI method enables the system to create and retain a background image, while also creating figures that are consistent from picture to picture, but show change or movement. Courtesy of North Carolina State University.
The NC State technique allows users to train the AI system to manipulate specific image characteristics so that the images retain their identity even if they move or otherwise change. For example, the AI system could create a series of images showing the same skiers turning toward the viewer as they move across the landscape.

The researchers formulated the task as layout-to-mask-to-image, in which the AI system learns, in a weakly supervised way, to unfold object masks from an input layout and per-object style codes. To ensure a strong connection between the input layout and the synthesized images, the researchers connected the layout-to-mask component to layers deep in the generator network.
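
The sketch below illustrates, in highly simplified form, the layout-to-mask-to-image idea: each object's class label and style code is unfolded into a soft mask, the masks are placed into a layout canvas, and a decoder renders an image from that canvas. The module names, classes, and dimensions are illustrative assumptions; the published model learns the masks in a weakly supervised way and ties the layout into deep layers of a GAN generator rather than using a separate decoder.

```python
# Simplified, hypothetical sketch of the layout-to-mask-to-image flow.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskPredictor(nn.Module):
    """Object class label + style code -> soft per-object mask."""
    def __init__(self, num_classes=10, style_dim=64, mask_size=32):
        super().__init__()
        self.embed = nn.Embedding(num_classes, style_dim)
        self.net = nn.Sequential(
            nn.Linear(2 * style_dim, 256), nn.ReLU(),
            nn.Linear(256, mask_size * mask_size), nn.Sigmoid())
        self.mask_size = mask_size

    def forward(self, labels, styles):
        x = torch.cat([self.embed(labels), styles], dim=1)
        return self.net(x).view(-1, 1, self.mask_size, self.mask_size)

class ImageDecoder(nn.Module):
    """Stacked per-class mask canvas -> RGB image (placeholder decoder)."""
    def __init__(self, num_classes=10, img_size=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())
        self.img_size = img_size

    def forward(self, canvas):
        return self.net(F.interpolate(canvas, size=self.img_size))

# One object of (hypothetical) class 3, with its own style code
labels, styles = torch.tensor([3]), torch.randn(1, 64)
mask = MaskPredictor()(labels, styles)       # (1, 1, 32, 32) soft mask
canvas = torch.zeros(1, 10, 32, 32)
canvas[:, 3:4] = mask                        # place the mask in its class channel
image = ImageDecoder()(canvas)               # (1, 3, 64, 64) synthesized image
```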

The researchers then built their layout-to-mask-to-image synthesis method on generative adversarial networks (GANs), allowing layout and style to be controlled at both the image level and the object level. To provide this controllability, they introduced an instance-sensitive and layout-aware normalization (ISLA-Norm) scheme.
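
The following sketch conveys the general idea behind such instance-sensitive, layout-aware normalization: features are normalized without learned affine parameters, then re-scaled and re-shifted per pixel using gamma and beta maps composed from each object's mask and style code. This is a hedged approximation of the concept under assumed shapes and names, not the published ISLA-Norm implementation.

```python
# Hypothetical sketch of instance-sensitive, layout-aware feature modulation:
# normalize, then modulate each pixel by the objects whose masks cover it.
import torch
import torch.nn as nn

class ISLANormSketch(nn.Module):
    def __init__(self, num_features, style_dim=64):
        super().__init__()
        self.norm = nn.BatchNorm2d(num_features, affine=False)  # parameter-free normalization
        self.to_gamma = nn.Linear(style_dim, num_features)       # per-object scale
        self.to_beta = nn.Linear(style_dim, num_features)        # per-object shift

    def forward(self, feat, obj_masks, obj_styles):
        # feat: (B, C, H, W); obj_masks: (B, N, H, W); obj_styles: (B, N, style_dim)
        x = self.norm(feat)
        gamma = torch.einsum('bnc,bnhw->bchw', self.to_gamma(obj_styles), obj_masks)
        beta = torch.einsum('bnc,bnhw->bchw', self.to_beta(obj_styles), obj_masks)
        return x * (1 + gamma) + beta   # per-pixel, per-object modulation

# Usage: two objects in the layout, each with its own mask and style code
feat = torch.randn(1, 128, 32, 32)
masks = torch.rand(1, 2, 32, 32)
styles = torch.randn(1, 2, 64)
out = ISLANormSketch(128)(feat, masks, styles)   # (1, 128, 32, 32)
```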

The team tested its new approach using the COCO-Stuff and Visual Genome data sets. Based on standard measures of image quality, the new approach outperformed previous image creation techniques. Although the researchers required a four-GPU workstation for training the AI system, they said that deploying the trained system is less computationally expensive. “We found that one GPU gives you almost real-time speed,” Wu said.

One application for the method, the researchers said, could be to help autonomous robots “imagine” what the end result might look like before they begin an assigned task. “You could also use the system to generate images for AI training. So, instead of compiling images from external sources, you could use this system to create images for training other AI systems,” Wu said.

Next, the researchers plan to extend their approach to video and 3D images. They have made the source code for their method available on GitHub. “We’re always open to collaborating with industry partners,” Wu said.

The research was published in IEEE Transactions on Pattern Analysis and Machine Intelligence (www.ieeexplore.ieee.org/document/9427066).
