AI Software Uses Programmatic Imaging to Train Vision Systems

Princeton University researchers have developed a software system that aims to overcome the limits of existing generative AI systems and quickly create the image sets needed to prepare machines for nearly any visual setting. The system, called Infinigen, creates natural-looking objects and environments in three dimensions.

Though AI presents an opportunity for creating the massive sets of images necessary to train autonomous cars and other machines to see their environments, current generative AI systems have shortcomings that can limit their use. Infinigen is a procedural generator, meaning that it creates content based on automated, human-designed algorithms rather than labor-intensive manual data entry or the neural networks that power modern AI. In this way, the new program generates myriad 3D objects using only randomized mathematical rules.
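To make the idea of "randomized mathematical rules" concrete, here is a minimal sketch of a procedural generator. It is an illustration only, not Infinigen's code or API: the function name `grow_tree`, the 2D simplification, and every numeric range are assumptions chosen for brevity.

```python
import math
import random


def grow_tree(seed, depth=4):
    """Return a list of 2D branch segments ((x0, y0), (x1, y1)).

    Hypothetical toy generator: each branch spawns two children whose
    angle and length are drawn from fixed random ranges.
    """
    rng = random.Random(seed)  # same seed -> same tree
    segments = []

    def branch(x, y, angle, length, level):
        # Rule 1: a branch is a straight segment in the given direction.
        x1 = x + length * math.cos(angle)
        y1 = y + length * math.sin(angle)
        segments.append(((x, y), (x1, y1)))
        if level == 0:
            return
        # Rule 2: split in two, with randomized turn and shrink factor.
        for _ in range(2):
            branch(
                x1,
                y1,
                angle + rng.uniform(-0.6, 0.6),
                length * rng.uniform(0.6, 0.8),
                level - 1,
            )

    branch(0.0, 0.0, math.pi / 2, 1.0, depth)  # trunk grows straight up
    return segments
```

Calling `grow_tree(1)` and `grow_tree(2)` yields two different trees from the same rules, while repeating a seed reproduces the same tree exactly. No hand-drawn asset or neural network is involved, only the rules and a random number generator.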

Infinigen is an open-source software system that generates an infinite number of photorealistic scenes of the natural world, an advancement that could improve the training of autonomous cars and other robots. Courtesy of Princeton University.

Infinigen’s mathematical approach allows it to create labeled visual data, which is needed to train computer vision systems, including those deployed on home robots and autonomous cars. Because Infinigen generates every image programmatically (it creates a 3D world first, populates it with objects, and places a camera to take a picture), it can automatically provide detailed labels for each image, including the category and location of each object.
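The three steps described above (build a world, populate it, place a camera) can be sketched in a few lines. This is a toy model under stated assumptions, not Infinigen's implementation: objects are reduced to spheres, the camera is an ideal pinhole looking along the +z axis, and the names `Sphere`, `generate_scene`, and `label_scene` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Sphere:
    category: str
    x: float       # metres right of the camera axis
    y: float       # metres below the camera axis
    z: float       # metres in front of the camera
    radius: float


def generate_scene(seed, n_objects=5):
    """Populate a toy 3D world with randomly placed, randomly typed objects."""
    rng = random.Random(seed)
    categories = ["rock", "bush", "tree"]  # assumed example categories
    return [
        Sphere(
            category=rng.choice(categories),
            x=rng.uniform(-2.0, 2.0),
            y=rng.uniform(-1.0, 1.0),
            z=rng.uniform(4.0, 12.0),
            radius=rng.uniform(0.2, 0.8),
        )
        for _ in range(n_objects)
    ]


def label_scene(scene, focal=500.0, width=640, height=480):
    """Project each object through a pinhole camera and emit its label."""
    labels = []
    for obj in scene:
        u = width / 2 + focal * obj.x / obj.z   # pixel column of the centre
        v = height / 2 + focal * obj.y / obj.z  # pixel row of the centre
        r = focal * obj.radius / obj.z          # approximate radius in pixels
        labels.append(
            {
                "category": obj.category,
                "center": (u, v),
                "bbox": (u - r, v - r, u + r, v + r),
                "depth": obj.z,
            }
        )
    return labels
```

Because the program placed every object itself, `label_scene(generate_scene(42))` returns the category, image location, and distance of each one without any human annotation. In this sketch the labels fall out of the same variables that define the scene.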

The resulting labeled images can be used to train a robot to recognize and locate objects given only an image as input. Such labeled visual data would not be possible with existing AI image generators, according to Jia Deng, an associate professor of computer science at Princeton and senior author of a study that details the software system. This is because those programs generate images using a deep neural network that does not allow the extraction of labels, Deng said.

In addition, Infinigen’s users have detailed control of the system’s settings, such as the precise lighting and viewing angle, and they can fine-tune the system to make images more useful as training data.

Besides generating virtual worlds populated by digital objects with natural shapes, sizes, textures, and colors, Infinigen’s capabilities extend to synthetic representations of natural phenomena including fire, clouds, rain, and snow. By vastly expanding the menu of 3D-rendered objects and landscapes, Infinigen also boosts machines’ ability to reconstruct in 3D, from just 2D pixels, the complex spaces in which they will operate.

Training cars and robots that will move in the real world on synthetic images rather than real-world ones might seem counterintuitive, but real image data sets have key limitations, Deng said.

For example, the computers that guide robots and smart cars do not perceive images and other visual objects as humans do. An image that looks three-dimensional to a human is just a two-dimensional collection of pixels to a computer. To allow robots to perceive an image in 3D, the image needs to come with additional data called “3D ground truth,” such as the distance from the camera to each visible point. This data is difficult to obtain for existing 2D images.
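A small sketch shows why 3D ground truth is easy to obtain for a synthetic scene: when the geometry is known, the distance at every pixel can be computed exactly. The example assumes a single sphere and an ideal pinhole camera at the origin looking along +z. The function name `render_depth` and all parameters are hypothetical and unrelated to Infinigen's actual renderer.

```python
import math


def render_depth(center, radius, width=9, height=7, focal=6.0):
    """Return a per-pixel depth map (z distance), or None where nothing is hit."""
    cx, cy, cz = center
    depth = []
    for row in range(height):
        line = []
        for col in range(width):
            # Unit ray direction through the centre of this pixel.
            dx = (col + 0.5 - width / 2) / focal
            dy = (row + 0.5 - height / 2) / focal
            dz = 1.0
            norm = math.sqrt(dx * dx + dy * dy + dz * dz)
            dx, dy, dz = dx / norm, dy / norm, dz / norm
            # Solve |t*d - c|^2 = r^2 for the nearest intersection t.
            b = dx * cx + dy * cy + dz * cz
            disc = b * b - (cx * cx + cy * cy + cz * cz - radius * radius)
            if disc < 0:
                line.append(None)  # ray misses the sphere
            else:
                t = b - math.sqrt(disc)
                line.append(t * dz if t > 0 else None)
        depth.append(line)
    return depth
```

For a sphere of radius 1 centred 5 units in front of the camera, `render_depth((0.0, 0.0, 5.0), 1.0)[3][4]` is 4.0, the exact distance to the nearest point on its surface. A photograph of a real scene offers no comparable way to read off such values.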

An array of Infinigen-generated trees showing the variation and control users have over their images. Courtesy of Princeton University.

According to Deng, the developers expect the system to be a useful resource for augmented and virtual reality, and for additive manufacturing.

A study describing Infinigen was presented at the 2023 Conference on Computer Vision and Pattern Recognition (www.doi.org/10.48550/arXiv.2306.09310).