3D Vision Transforms Robotic Piece Picking by Imaging Transparent Materials

Structured light and 3D imaging are the keys to imaging transparent packaging materials that are popular in e-commerce, such as poly bags, bubble packaging, and shrink-wrap.


E-commerce has had a profound effect on everyday life, making it possible for goods to be bought and delivered across the world. This seemingly limitless marketplace is available from smartphones everywhere, and fulfilling purchasing desires has become incredibly simple from the customer’s end.

As a result, statistics indicate a substantial increase in online retail sales in the past 10 years, starting at around $1.3 trillion in 2014 and projected to reach more than $6.4 trillion by 2024¹. This roughly fivefold growth demonstrates the significant shift toward e-commerce as a prominent mode of shopping worldwide, reflecting consumers’ increasing reliance on online platforms for their shopping needs.

An example of a piece picking application using Zivid 2+ 3D vision. Courtesy of Zivid.


But behind this consumer ease is a vast and complex world of supply chains and logistical wizardry. Not only is the scope of choice immense, but delivery times are also becoming shorter and shorter. The integration of robotics in e-commerce fulfillment centers is playing a fundamental role in continually improving capability, efficiency, and consistency.

One of the most noteworthy effects of robotics in e-commerce fulfillment centers is the improvement in operational efficiency. Robots are adept at handling repetitive and labor-intensive tasks, such as picking and packing items from shelves. Automated guided vehicles and robotic arms work seamlessly together to navigate through warehouses, reducing the time it takes to locate and retrieve products. This efficiency not only accelerates the order fulfillment process but also minimizes the likelihood of errors that can occur with manual handling.

In addition to efficiency gains, robots contribute to increased accuracy in order fulfillment. Advanced robotic systems are equipped with sensors and cameras that allow them to identify and pick items with high precision. This reduces the likelihood of errors in product selection, leading to improved customer satisfaction and a decrease in returns. The integration of robotics in fulfillment centers thus helps to enhance overall supply chain reliability.

The increasing speed at which robots can operate is another significant improvement that comes with the use of robotics in e-commerce fulfillment centers. Unlike human workers, robots can work continuously without the need for breaks, leading to a considerable reduction in order processing times. This accelerated pace not only enables businesses to meet the growing demands of e-commerce but also provides customers with faster delivery times. In the highly competitive e-commerce landscape, swift order fulfillment has become a crucial factor in customer retention and satisfaction.

Transparent bottles imaged using a Zivid 2+ 3D camera. Courtesy of Zivid.


Furthermore, the use of robotics in fulfillment centers allows for better inventory management. Automated systems can keep track of inventory levels in real time, providing accurate and up-to-date information on stock levels. This not only helps in preventing overstock or stockouts but also enables businesses to optimize their warehouse space efficiently. As a result, companies can reduce carrying costs and make more informed decisions about restocking and replenishing inventory.

The eyes have it

Piece picking robots in fulfillment centers comprise the following system elements:
• Industrial and/or collaborative robots.
• A 2D vision sensor.
• A 3D vision sensor.
• A barcode reader.
• A gripper/end-of-arm picking tool.
• Detection/segmentation/picking software.
• An infeed system of singular or multiple stock-keeping unit (SKU) items for picking.
• An outfeed system of picked items for order fulfillment.

Arguably the most critical elements in this system are the 2D and 3D imaging sensors and the detection and picking software. Other components, such as gripping devices that can consistently pick items securely, are still being developed and refined to reach the required level of reliability. So, while machine vision is not the only technology still evolving for piece picking use cases, 2D and 3D imaging remains probably the most critical part of the system, because the software for detection, segmentation, and picking relies on the completeness and accuracy of the data it receives from the machine vision sensors.

A dispersed reflection from a Lambertian surface (top). A specular reflection from a very shiny surface (middle). Reflections from multiple sources with transparency (bottom). Courtesy of Zivid.


There has also been a trend toward machine vision that incorporates both high-resolution 2D and high-accuracy 3D in a single unit. This unification of data for detection and picking has advanced the possibilities of deep learning software by offering image and point cloud data in a pixel-by-pixel format that simplifies the calibration and coherence between the sets of imaging data, which are essential for the picking cell.
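
As a rough illustration of why pixel-aligned 2D and 3D data simplifies things, the sketch below stores color and 3D coordinates on the same row/column grid, so a segmentation mask found in the 2D image can index the 3D data directly with no extra registration step. The `AlignedFrame` container and `mask_centroid` helper are hypothetical names invented for this example and are not taken from any camera SDK.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Color = Tuple[int, int, int]
Point = Optional[Tuple[float, float, float]]  # None marks a pixel with no valid 3D data


@dataclass
class AlignedFrame:
    """Hypothetical container: rgb[row][col] and xyz[row][col] describe the same pixel."""
    rgb: List[List[Color]]
    xyz: List[List[Point]]


def mask_centroid(frame: AlignedFrame, mask: Sequence[Tuple[int, int]]) -> Point:
    """Average 3D position of the valid points under a 2D segmentation mask.

    The mask is a list of (row, col) pixels produced from the 2D image; because
    both data sets share one grid, the same indices select the 3D points.
    """
    points = [frame.xyz[r][c] for r, c in mask if frame.xyz[r][c] is not None]
    if not points:
        return None
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


# Tiny 2x2 frame with made-up values (millimeters); one pixel has no 3D data.
frame = AlignedFrame(
    rgb=[[(255, 0, 0), (255, 0, 0)], [(0, 0, 0), (0, 0, 0)]],
    xyz=[[(0.0, 0.0, 500.0), (10.0, 0.0, 500.0)], [None, (10.0, 10.0, 800.0)]],
)
print(mask_centroid(frame, [(0, 0), (0, 1), (1, 0)]))
```

In a real cell the mask would come from the detection software and the grid would hold millions of pixels, but the indexing idea is the same.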

Reliable piece picking

While modern e-commerce fulfillment centers hum with highly integrated automation, one area of operations has proved to be particularly problematic to automate reliably: the order picking stations. These picking stations are still the province of human activity. There are two prime causes for this being such a difficult automation challenge to address.

The first challenge is simply accounting for the scope of products that can be encountered, identified, and handled. Typically, smaller specialist warehouses and fulfillment centers will have several thousand separate items, called SKUs. This can easily increase up to hundreds of thousands and even millions of SKUs at the very largest e-commerce businesses. For context, Amazon, one of the largest e-commerce companies in the world, has 12 million products on its books, and when the Amazon Marketplace is considered, the number rises to 350 million separate products.

A variety of transparent items and materials imaged using a Zivid 2+ 3D camera. Courtesy of Zivid.


As anyone who has shopped on Amazon will know, the shape, material, and size of these products vary almost without limit, covering everything from toys to aerosol cans, books to car tires, and televisions to kitchen utensils. It is a vast challenge. Conquering this challenge is reliant on sophisticated and reliable detection and segmentation software. In this area, smart and adaptable algorithms that can learn in real time have been significant. The latest advancements in AI and deep learning have made this task achievable. From a machine vision perspective, the very highest quality of 2D and 3D data is essential for success; all AI models are reliant on the quality of the data that they receive to perform at their best.


The next fundamental challenge is transparency. Transparent materials are prevalent in e-commerce packaging. Certain items are themselves transparent or semitransparent, but a significant number of items are enclosed in some form of transparent material, such as poly bags, bubble packaging, and/or shrink-wrap. Transparency has long been considered the ultimate challenge in 3D imaging. This is intuitive to understand given how transparency appears to our human eyes. If you look at an object inside a poly bag, you will see various surfaces: the surface of the bag, the surface of the object, and even the surface of the back of the bag. These multiple reflective surfaces are the source of the challenge. Humans have a very well-trained visual system that correctly differentiates and classifies these surfaces. For a machine vision system to make similar decisions is extremely complicated and challenging.

Why is transparency so difficult?

Imagine a setup in which two or more cameras are fixed at different angles around an object, such as a coffee mug on a table. Each camera captures a photo from its unique perspective, akin to how our two eyes perceive slightly different views of the same scene. This is the essence of stereoscopic 3D imaging systems — they analyze these multiple fixed-viewpoint images to discern the depth and position of the object’s surfaces. It is not about creating new visuals, but rather precisely measuring and mapping the 3D space the object occupies. This technique is like a high-tech ruler, offering a detailed understanding of the object’s physical presence in 3D.
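
The measurement idea can be made concrete with the standard triangulation relation for two parallel, rectified cameras: depth equals focal length times baseline divided by disparity. The sketch below is a minimal illustration under that idealized assumption; the focal length and baseline values are invented for the example and do not describe any particular camera.

```python
def depth_from_disparity(focal_px: float, baseline_mm: float, disparity_px: float) -> float:
    """Depth Z (mm) of a point seen by two rectified, parallel cameras.

    Z = f * B / d, where f is the focal length in pixels, B is the distance
    between the two cameras, and d is the horizontal shift (disparity) of the
    same point between the two images.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_mm / disparity_px


# Illustrative values only: 1000 px focal length, 100 mm baseline.
near = depth_from_disparity(1000.0, 100.0, 100.0)  # large shift: close object
far = depth_from_disparity(1000.0, 100.0, 50.0)    # small shift: distant object
print(near, far)
```

Halving the disparity doubles the computed depth, which is why small matching errors matter more for distant surfaces.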

In the field of stereoscopic 3D imaging, the essence of creating a detailed 3D map lies in the ability to find corresponding points between multiple images. This task, however, presents significant challenges in certain scenarios, particularly when dealing with surfaces that lack distinct visual features. Imagine the difficulty in identifying similar points on a completely white wall using two different cameras. The absence of distinctive features or textures on such uniform surfaces makes this a daunting task for these systems. To overcome this, a blend of sophisticated processing techniques and innovative imaging methods is often employed. Key areas in this space include feature enhancement, active illumination, multispectral imaging, temporal analysis, and machine learning.
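
The white-wall problem can be shown with a toy matcher. The sketch below compares a small patch from one image row against shifted patches in the other row using a sum of absolute differences (SAD), one simple matching cost among many. On a textured row exactly one shift fits; on a uniform row every shift fits equally well, so the correspondence is ambiguous. The pixel values and function names are made up for illustration.

```python
from typing import List


def sad_costs(left_row: List[int], right_row: List[int], x: int,
              half_window: int, max_disparity: int) -> List[int]:
    """SAD cost for each candidate disparity d = 0..max_disparity.

    Compares the patch centered at x in the left row with the patch centered
    at x - d in the right row. Lower cost means a better match.
    """
    costs = []
    for d in range(max_disparity + 1):
        xr = x - d
        if xr - half_window < 0:
            break  # candidate patch would fall off the image edge
        costs.append(sum(abs(left_row[x + k] - right_row[xr + k])
                         for k in range(-half_window, half_window + 1)))
    return costs


def best_disparities(costs: List[int]) -> List[int]:
    """All disparities that share the minimum cost; more than one means ambiguity."""
    lowest = min(costs)
    return [d for d, c in enumerate(costs) if c == lowest]


# Textured scene: the left row is the right row shifted by 2 pixels.
right_textured = [10, 80, 30, 200, 50, 120, 90, 15, 60, 170, 40, 110]
left_textured = [0, 0] + right_textured[:-2]
print(best_disparities(sad_costs(left_textured, right_textured, 7, 1, 4)))

# Uniform "white wall": every pixel looks the same in both rows.
flat = [128] * 12
print(best_disparities(sad_costs(flat, flat, 7, 1, 4)))
```

The first call returns a single winning shift, while the second returns every candidate, which is the situation active illumination is meant to resolve.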

A demonstration of a high-speed transparent piece picking cell with Zivid, Fizyr, and The Gripper Company at Automatica 2023. Courtesy of Zivid.


One innovative approach is to use structured light, where one or more of the cameras in a stereoscopic system is replaced by a projection system. This method involves projecting a known pattern onto the scene, akin to casting a net of light that drapes over the surfaces being imaged. Time-multiplexed structured light takes this a step further by capturing a sequence of images, each slightly different, effectively coding each location in a unique light pattern. This technique not only overcomes the texture problem but also enhances the accuracy and resolution of the 3D images obtained.
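
To show how a sequence of patterns can give each location a unique code, the sketch below uses binary-reflected Gray code stripes, one widely used time-multiplexing scheme. This is an assumption chosen for illustration; the article does not state which patterns any specific camera projects. Each projector column is lit or dark in each pattern, and the on/off sequence a camera pixel observes identifies the column that illuminated it.

```python
from typing import List


def gray_encode(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def gray_decode(g: int) -> int:
    """Inverse of gray_encode."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def make_patterns(num_columns: int, num_bits: int) -> List[List[int]]:
    """patterns[b][col] is 1 if projector column col is lit in pattern b (most significant bit first)."""
    return [[(gray_encode(col) >> (num_bits - 1 - b)) & 1 for col in range(num_columns)]
            for b in range(num_bits)]


def decode_column(observed_bits: List[int]) -> int:
    """Recover the projector column from the lit/dark sequence seen at one camera pixel."""
    g = 0
    for bit in observed_bits:
        g = (g << 1) | bit
    return gray_decode(g)


# Three patterns are enough to label eight projector columns uniquely.
patterns = make_patterns(num_columns=8, num_bits=3)
for col in range(8):
    seen = [p[col] for p in patterns]  # what a pixel lit by this column would observe
    print(col, seen, decode_column(seen))
```

Gray code is a common choice because neighboring columns differ in only one pattern, so a single misread bit at a stripe boundary shifts the result by at most one column. Once the column is known, depth follows by triangulation between the projector and the camera, much as with a second camera.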

Yet, the efficacy of these imaging techniques is not solely dependent on the systems themselves but also significantly influenced by the surfaces they are imaging. In computer vision, much of the focus has traditionally been on Lambertian surfaces, which evenly scatter light in all directions. These surfaces are relatively easy to image because their uniform dispersion of light means that the viewing angle does not significantly alter the appearance of these surfaces, simplifying their imaging.

However, the real world presents more complexity. For instance, when imaging surfaces with specular reflections, such as mirrors, the challenge is different. A mirror does not provide a return signal from its own surface. Instead, it reflects the environment, so the signal captured is that of whatever is reflected on the mirror from the specific viewpoint of the camera. This adds a layer of complication because the system must interpret and separate the reflected images from the actual surface characteristics.
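
The difference between the two surface types can be sketched with two textbook idealizations: a perfectly diffuse (Lambertian) surface, whose apparent brightness depends only on the angle of the incoming light, and a perfect mirror, which sends light only along the reflection direction. Real materials fall somewhere between these extremes; the function names and the one-degree tolerance are assumptions made for this example.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def normalize(v: Vec) -> Vec:
    length = math.sqrt(sum(c * c for c in v))
    return (v[0] / length, v[1] / length, v[2] / length)


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def lambertian_brightness(normal: Vec, to_light: Vec, to_viewer: Vec, albedo: float = 1.0) -> float:
    """Relative brightness of an ideal diffuse surface; to_viewer is deliberately unused."""
    n, l = normalize(normal), normalize(to_light)
    return albedo * max(0.0, dot(n, l))


def mirror_brightness(normal: Vec, to_light: Vec, to_viewer: Vec, tolerance_deg: float = 1.0) -> float:
    """Ideal mirror: the viewer sees the light only near the reflection direction."""
    n, l, v = normalize(normal), normalize(to_light), normalize(to_viewer)
    n_dot_l = dot(n, l)
    if n_dot_l <= 0.0:
        return 0.0  # light arrives from behind the surface
    reflected = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))
    cos_angle = max(-1.0, min(1.0, dot(reflected, v)))
    return 1.0 if math.degrees(math.acos(cos_angle)) <= tolerance_deg else 0.0


normal = (0.0, 0.0, 1.0)
light = (1.0, 0.0, 1.0)  # light arriving at 45 degrees
for viewer in [(-1.0, 0.0, 1.0), (1.0, 0.0, 1.0)]:
    print(viewer,
          lambertian_brightness(normal, light, viewer),
          mirror_brightness(normal, light, viewer))
```

The diffuse value is identical for both viewpoints, while the mirror returns signal for only one of them, which is why a projected pattern can vanish entirely from a camera's view of a shiny surface.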

This nuanced understanding of surface properties, from the simple Lambertian to the complex reflective surfaces, sets the stage for exploring the intricacies of imaging one of the most challenging types of surfaces — transparent objects. How does one effectively capture 3D images of materials designed to let light pass through them? These materials come with their unique set of complexities, varying in how they absorb, bend, and reflect light, each adding layers of intricacy to the imaging process.

While transparent and specular surfaces might initially seem to pose similar challenges due to their low signal return from the desired surface, their imaging intricacies differ significantly. Specular objects often present a mix of properties; not every surface is a polished mirror and many objects have elements that are specular to varying degrees. Similarly, transparent surfaces are not always perfectly clear like crystal glass. They exhibit a range of transparency levels, with some being semitransparent or translucent, as in the case of frosted glass. Each variation alters the way light interacts with the surface, thereby affecting the imaging process.

In practical applications, such as piece picking in automation, the reality is a wide array of object types, many of which are transparent or partially so. One of the most common examples is objects wrapped in plastic. These materials pose a unique challenge for stereoscopic 3D imaging systems. The system must not only discern the object’s shape and position but also navigate the additional layer of complexity introduced by the transparent or translucent wrapping. Understanding these diverse surface properties is key to advancing the capabilities of 3D imaging technologies in handling real-world scenarios.

In the case of transparent materials, the imaging challenge intensifies. A stereoscopic 3D system might inadvertently capture the 3D data of objects located behind the transparent surface. This scenario is particularly problematic in automation tasks such as piece picking, in which precision is paramount. Picking based on the background data instead of the object’s actual surface can lead to errors, such as the picking tool crashing through the transparent layer, damaging both the mechanism and the object.
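
The failure mode can be illustrated with a simple check. The sketch below assumes, purely for illustration, that the vision system can report several candidate depths along one camera ray (for example the front of a bag, the item inside, and the back of the bag). It then picks the surface nearest the camera and flags any planned pick depth that lies behind it. The function names, the clearance value, and the depths are all hypothetical; this is not a description of how any specific product resolves transparency.

```python
from typing import List, Optional


def nearest_surface(candidates_mm: List[float]) -> Optional[float]:
    """Depth of the surface closest to the camera among the candidate returns."""
    valid = [z for z in candidates_mm if z > 0.0]
    return min(valid) if valid else None


def crash_risk(planned_depth_mm: float, candidates_mm: List[float],
               clearance_mm: float = 2.0) -> bool:
    """True if the planned pick depth lies behind a nearer surface by more than the clearance."""
    nearest = nearest_surface(candidates_mm)
    return nearest is not None and planned_depth_mm > nearest + clearance_mm


# Made-up depths along one ray: bag front, item inside, bag back (mm from camera).
returns = [612.0, 640.0, 655.0]
print(nearest_surface(returns))    # the wrapping, which the gripper touches first
print(crash_risk(640.0, returns))  # aiming at the item inside would push through the bag
print(crash_risk(612.0, returns))  # aiming at the wrapping surface is safe
```

The hard part in practice is obtaining trustworthy candidates in the first place, which is exactly where transparent materials defeat many 3D sensors.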

Additionally, when dealing with transparent objects, the system often encounters the complexity of layered visuals. Seeing through one object often means capturing multiple objects superimposed upon each other, especially in environments with multiple layers of transparent materials. This overlapping of data creates a convoluted 3D map, challenging the system to distinguish between the actual target object and the background or adjacent objects.

These complexities necessitate sophisticated imaging techniques and algorithms capable of discerning the true surface of the object to be picked. They highlight the need for advanced 3D imaging solutions that can effectively handle the diverse and often challenging real-world scenarios encountered in automated piece picking and similar applications.

During a recent exhibit, a collaborative effort between Zivid, Fizyr, and The Gripper Company, high-speed piece picking technology was displayed, including a piece picking cell capable of handling approximately 1000 picks per hour. Its most notable feature was the proficient handling of transparent objects and those wrapped in transparent materials, a task traditionally challenging for automated systems. This showcase marked a significant advancement in 3D machine vision technology and demonstrated reliable detection and handling of transparent items, a feat not commonly achieved in the field. The technology used a sophisticated 3D camera system, which did not require preprogrammed information about the items it handled. Instead, it generated detailed point clouds that enabled efficient object recognition and handling.

Challenges and future directions

While the demonstration was a breakthrough, it also highlighted ongoing challenges and areas for future research. The technology showed proficiency in handling items with varying degrees of transparency, but struggled with materials that were completely clear, such as certain glass types, or those that had mirror-like surfaces. These scenarios still require further R&D.

The approach, based on structured light in 3D machine vision, has shown potential for significant advancements in the field. This technology opens possibilities for future improvements, particularly in addressing the remaining 10% of challenges related to complete transparency. Continued research in the field bolsters optimism that these hurdles can be overcome, leading to a more comprehensive and robust solution for handling a wide range of transparent materials in automated systems.

Meet the authors

Martin Ingvaldsen leads Zivid’s visionary pursuits, driving innovation as head of vision. With expertise in imaging and robotics, he pioneers cutting-edge solutions in 3D vision technology.

John Leonard drives Zivid’s market strategies as product marketing manager. With a fusion of technical expertise and consumer insights, he shapes the narrative for innovative 3D vision solutions.

Henrik Schumann-Olsen is the cofounder and CTO for Zivid, having spearheaded the company’s growth from SINTEF incubation to helping it become a global leader in 3D robotics. He has won many awards as a senior researcher in machine vision, pattern recognition, 3D cameras, optics, robotics, and programming.


1. Statista. Retail e-commerce sales worldwide from 2014 to 2026,

Published: March 2024