Multiview 3-D photography simplified

Ashley N. Rice

A small checkerboard-patterned plastic film inserted beneath the lens of an ordinary camera can transform the device into a light-field camera capable of producing multiperspective images.

Current light-field cameras trade a good deal of resolution for the extra information they record about the angle at which light arrives: A camera with a 20-megapixel sensor, for instance, will yield a refocused image of only 1 megapixel. And such devices cost nearly $400.

Researchers in the Camera Culture Group at MIT’s Media Lab aim to change that with a system they’re calling Focii. Their device – which can produce a full 20-megapixel multiview 3-D image from a single exposure of a 20-megapixel sensor – relies on a small rectangle of plastic film printed with a unique checkerboard pattern that is inserted beneath the lens of an ordinary digital single-lens-reflex camera. Software does the rest.

The new work complements the Camera Culture Group’s ongoing glasses-free 3-D-display research, said postdoc Gordon Wetzstein. “Generating live-action content for these types of displays is very difficult,” Wetzstein said. “The future vision would be to have a completely integrated pipeline from live-action shooting to editing to display. We’re developing core technologies for that pipeline.”

Because a light-field camera captures information about not only the intensity of light rays but also their angle of arrival, the images it produces can be refocused later. Scientists from the Camera Culture Group at MIT’s Media Lab have developed a system called Focii that relies on a small rectangle of plastic film printed with a unique checkerboard pattern. When inserted beneath the lens of an ordinary camera, it can produce a full 20-megapixel multiview 3-D image from a single exposure of a 20-megapixel sensor. Photo courtesy of Kshitij Marwah.

In 2007, Ramesh Raskar – the NEC Career Development Associate Professor of Media Arts and Sciences, and head of the Camera Culture Group – and colleagues at Mitsubishi Electric Research showed that a mask and some algorithmic wizardry could produce a light-field camera whose resolution matched that of cameras that used arrays of tiny lenses, the approach adopted in today’s commercial devices.

“It has taken almost six years now to show that we can actually do significantly better in resolution, not just equal,” Raskar said.

Focii represents a light field as a grid of square patches; each patch, in turn, consists of a 5 x 5 grid of blocks. Each block represents a different perspective on a 121-pixel region of the light field, so Focii captures 25 perspectives in all; conventional 3-D systems capture only two.
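The arithmetic behind that layout can be checked in a few lines. The sketch below is illustrative only: the 5 x 5 grid and the 121-pixel block come from the description above, while the assumption that each block is a square of 11 x 11 pixels and the helper name patch_shape are ours.

```python
# Figures taken from the article's description of one Focii patch.
VIEWS_PER_SIDE = 5        # each patch is a 5 x 5 grid of blocks
PIXELS_PER_BLOCK = 121    # each block covers 121 pixels

# Assumption (not stated in the article): a block is square, 11 x 11 pixels.
BLOCK_SIDE = 11


def patch_shape():
    """Return (perspectives, pixels per perspective, values per patch)."""
    views = VIEWS_PER_SIDE * VIEWS_PER_SIDE
    return views, PIXELS_PER_BLOCK, views * PIXELS_PER_BLOCK


views, pixels, total = patch_shape()
print(views, "perspectives x", pixels, "pixels =", total, "values per patch")
```

Under those assumptions, a single patch holds 3025 values per color channel, which is the quantity the dictionary representation has to describe.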

The key to the system is a novel way to represent the grid of patches corresponding to any given light field.

In particular, Focii describes each patch as the weighted sum of a number of atoms, or reference patches, stored in a dictionary of about 5000 patches. Rather than describing the upper-left corner of a light field by specifying the individual values of all 121 pixels in each of 25 blocks, Focii simply describes it as some weighted combination of, say, atoms 796, 23 and 4231, the investigators said.
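A rough sketch of that idea follows. It is not the group's code: the atoms here are seeded pseudo-random stand-ins for a learned dictionary, the weights are made up, and the names make_atom and synthesize are hypothetical. Only the patch size, the dictionary size and the three atom indices come from the description above.

```python
import random

PATCH_LEN = 25 * 121   # 25 perspectives x 121 pixels, as described above
NUM_ATOMS = 5000       # "about 5000" reference patches


def make_atom(index, length=PATCH_LEN):
    """Hypothetical stand-in for a learned atom: seeded pseudo-random values."""
    rng = random.Random(index)
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]


def synthesize(code, length=PATCH_LEN):
    """Build a patch as a weighted sum of atoms; code maps atom index -> weight."""
    patch = [0.0] * length
    for index, weight in code.items():
        for i, value in enumerate(make_atom(index, length)):
            patch[i] += weight * value
    return patch


# Illustrative weights only; the article gives the atom numbers, not the weights.
code = {796: 0.6, 23: -0.3, 4231: 1.1}
patch = synthesize(code)
print(len(patch), "values described by", len(code), "index/weight pairs")
```

The point of the representation is the compression: three index/weight pairs stand in for every pixel value in the patch.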

Each atom is itself a 5 x 5 grid of 121-pixel blocks, each consisting of arbitrary-seeming combinations of color: The blocks in one atom might all be green in the upper-left corner and red in the lower right, with lines at slightly different angles separating the regions of color; the blocks of another atom might all feature slightly different-sized blobs of yellow invading a region of blue.

To build the dictionary, the researchers tested several combinations of colored blobs to determine which, empirically, enabled the most efficient representation of actual light fields. Once the dictionary was established, however, they still had to calculate the optimal design of the mask they use to record light-field information.

Visiting scientist Yosuke Bando explained the principle behind mask design by analogy with the Fourier transform.

“If a mask has a particular frequency in the vertical direction” – say, a regular pattern of light and dark bars – “you only capture that frequency component of the image,” Bando said. “So you have no way of recovering the other frequencies. If you use frequency domain reconstruction, the mask should contain every frequency in a systematic manner.”
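Bando's point can be illustrated with a toy one-dimensional example. This is a simplified sketch of the analogy, not a model of the Focii optics: it treats a measurement as the sum of signal times mask, uses a signed cosine as the mask (a physical mask can only attenuate light), and the frequencies 3 and 7 are arbitrary choices.

```python
import math

N = 64  # samples in a hypothetical 1-D image row


def sinusoid(freq, n=N):
    """One cosine wave with `freq` full cycles across the row."""
    return [math.cos(2 * math.pi * freq * i / n) for i in range(n)]


def measure(signal, mask):
    """Sum of signal times mask: a stand-in for one masked measurement."""
    return sum(s * m for s, m in zip(signal, mask))


def two_tone(amp_low, amp_high):
    """Signal with a frequency-3 component and a frequency-7 component."""
    low, high = sinusoid(3), sinusoid(7)
    return [amp_low * a + amp_high * b for a, b in zip(low, high)]


mask = sinusoid(3)  # a mask that contains only frequency 3
reading_a = measure(two_tone(1.0, 0.5), mask)
reading_b = measure(two_tone(1.0, 2.0), mask)
print(reading_a, reading_b)
```

The two signals differ only in how much frequency-7 content they contain, yet the two readings agree to within rounding error. A single-frequency mask is blind to that difference, which is why a mask meant for frequency-domain reconstruction has to contain every frequency.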

“Think of atoms as the new frequency,” said graduate student Kshitij Marwah. “In our case, we need a mask pattern that can effectively cover as many atoms as possible.”
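To make the role of the atoms concrete, the sketch below recovers a sparse set of atom weights from a short vector of measurements. The article does not say which solver Focii uses; matching pursuit, a generic greedy sparse-coding method, is swapped in here purely for illustration. The sizes, the random "atoms as seen through the mask" and the true weights are all hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit(v):
    """Scale a vector to unit length."""
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]


def matching_pursuit(y, atoms, n_iter):
    """Greedily approximate y as a weighted sum of unit-norm atoms.

    Returns (weights, residual); weights maps atom index -> coefficient.
    """
    residual = list(y)
    weights = {}
    for _ in range(n_iter):
        # Pick the atom most correlated with what is still unexplained.
        corr = [dot(residual, a) for a in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(corr[k]))
        weights[best] = weights.get(best, 0.0) + corr[best]
        # Remove that atom's contribution from the residual.
        residual = [r - corr[best] * a for r, a in zip(residual, atoms[best])]
    return weights, residual


rng = random.Random(0)
M, K = 40, 100  # hypothetical: 40 masked sensor readings, 100 atoms
# Each entry stands for one dictionary atom as the masked sensor would see it.
atoms = [unit([rng.gauss(0, 1) for _ in range(M)]) for _ in range(K)]

true_code = {7: 1.0, 42: -0.8, 90: 0.5}  # illustrative sparse code
y = [sum(w * atoms[k][i] for k, w in true_code.items()) for i in range(M)]

weights, residual = matching_pursuit(y, atoms, n_iter=10)
print("unexplained energy:", dot(residual, residual), "of", dot(y, y))
```

Each iteration can only shrink the unexplained energy, but how well the recovered weights match the true ones depends on how distinguishable the atoms remain after passing through the mask. That is one way to read Marwah's remark: a good mask keeps as many atoms as possible distinguishable in the sensor readings.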

Assembling an image from the information captured by the mask is currently computationally intensive, said Kari Pulli, senior director of research at graphics-chip company Nvidia. Moreover, he said, the examples of light fields used to assemble the dictionary may have omitted some types of features common in the real world.

“There’s still work to be done for this to be actually something that consumers would embrace,” Pulli said.

The work was presented at Siggraph 2013 in Anaheim, Calif.