Slicing and Dicing Food and Agricultural Data

Multivariate analysis helps parse calibration data that take into account conditions encountered from the farm to the market.

William A. Ivancic and K. David Monson, Battelle

Farming has gone high-tech, with optical monitoring taking place from the air and from satellites, on planting and harvesting equipment, and on ground-based sensors. The food and beverage industry uses optical sensing to ensure that raw materials, additives and final products are of consistently high quality. Transporters and receivers of goods also use optical sensing because it is noninvasive, and shipments can be inspected for freshness or purity without being opened.

Apple growers, for example, examine levels of ethylene, fructose, galactose and sucrose because they can be correlated with apple ripeness. Measurements of these compounds can be performed on the tree or after picking by aiming a hand-held grating-based spectrometer closely at the surface of the fruit. A heated near-IR source produces a signal that penetrates to a depth of a few millimeters. The detected signal combines the reflectance and transmission components of the penetrated signal (transflection).

By quantifying the spectral pattern of the near-IR signals passing through the fruit skin (measuring the levels of the sugars and the ethylene), multivariate techniques can determine whether the fruit is ripe and ready for harvest.

Often, phenomena are sufficiently complex that information must be collected from multiple sensor measurements or as a time series of sensor measurements.

Pollen identification

Spectral analysis also is used to study processes of fertilization. Identifying pollens collected from the atmosphere and mapped by geography can be important in understanding the processes of fertility in a particular species to aid placement of crops.

For example, we studied pollen samples with Raman, IR and fluorescence spectroscopies to learn whether pollen could be uniquely identified. IR transmittance was measured through a microscope on pressed or crushed samples. Fluorescence and Raman spectra were measured directly on the pollen surfaces after the particles were deposited on aluminum-coated slides.

Combining multiple data collection techniques differentiated four types of pollen samples (labeled P1, P2, P3 and P4 in Figure 1) much better than IR transmittance alone. The plot axes correspond to principal components PC1 to PC3, so each individual sample of each type appears as a point defined by its three scores. The left side of Figure 1, using only IR transmittance, shows incomplete separation of the individual pollen samples into clear categories. The right side, using three spectral techniques in the analysis, shows the classification into four clear categories.

Figure 1. The plot on the left shows components associated with four pollens analyzed by principal components analysis using only the IR spectra. The plot on the right shows the result using principal components analysis with three measurements (Raman, fluorescence and IR), where separation of the four pollen types is definitive.
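The effect of combining techniques can be illustrated with a minimal sketch on synthetic numbers. Everything in it is an assumption made for illustration, not the pollen data behind Figure 1: the simulated "IR" block distinguishes the four types only in pairs, the "Raman" block distinguishes them in different pairs, and the "fluorescence" block adds a weak type-specific pattern. The helper names pca_scores and separation are hypothetical. Principal components are computed by singular value decomposition of the mean-centered data, and the three blocks are fused by simple concatenation.

```python
import numpy as np

def pca_scores(X, n_components=3):
    """Return the first n_components principal-component scores of X."""
    Xc = X - X.mean(axis=0)                       # mean-center each variable
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def separation(scores, labels):
    """Smallest centroid-to-centroid distance divided by mean within-class spread."""
    classes = np.unique(labels)
    centroids = np.array([scores[labels == k].mean(axis=0) for k in classes])
    spread = np.mean([
        np.sqrt(((scores[labels == k] - centroids[i]) ** 2).sum(axis=1).mean())
        for i, k in enumerate(classes)
    ])
    dists = [np.linalg.norm(centroids[i] - centroids[j])
             for i in range(len(classes)) for j in range(i + 1, len(classes))]
    return min(dists) / spread

rng = np.random.default_rng(0)
n_per_type, n_channels, noise = 15, 20, 0.3       # assumed sizes and noise level
labels = np.repeat(np.arange(4), n_per_type)      # 0..3 stand in for P1..P4

# Assumed class patterns: IR and Raman each resolve the types only in pairs
ir_patterns = rng.normal(size=(2, n_channels))
raman_patterns = rng.normal(size=(2, n_channels))
fluor_patterns = rng.normal(size=(4, n_channels))

def block(means):
    """Add measurement noise to the per-sample mean spectra."""
    return means + noise * rng.normal(size=means.shape)

ir = block(ir_patterns[labels // 2])              # P1,P2 alike; P3,P4 alike
raman = block(raman_patterns[labels % 2])         # P1,P3 alike; P2,P4 alike
fluor = block(0.5 * fluor_patterns[labels])       # weak pattern per type

fused = np.hstack([ir, raman, fluor])             # fuse blocks by concatenation

sep_ir = separation(pca_scores(ir), labels)
sep_fused = separation(pca_scores(fused), labels)
print(f"separation, IR only: {sep_ir:.2f}")
print(f"separation, fused:   {sep_fused:.2f}")
```

In this toy case the fused score space separates all four groups, while the IR-only scores leave at least one pair of types overlapping. With real spectra, the blocks usually need to be scaled relative to one another before concatenation so that no single technique dominates the components.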

Additives in oil

Spectral analysis can be used to quantify an additive (e.g., vitamin E) as it is added to an edible oil mixture on a processing line. We observed an additive-oil mixture in the process stream at the additive mixing tank. A small portion of the stream was diverted through a quartz cell in the sample position of a process fluorescence system. The measurements and analysis for additive concentration were designed to operate during the mixing process in real time. However, the concentration cannot be read directly from the signal because different oils produce different fluorescence levels from the additive, even at the same temperature.

In this process, the additive in various oils was found to have different fluorescence levels (Figure 2). In each plot, the additive fluoresces strongly at 320 nm when excited at 280 nm, but the oil does not fluoresce significantly at 320 nm. When excited at 315 nm, the additive does not fluoresce, but the oil does. Note, however, that the additive's fluorescence intensity differs by more than 15 percent between two oil lots (Oil 2 and Oil 3), even though the additive was prepared at nearly the same concentration in each: 1880 ppm in Oil 2 and 1830 ppm in Oil 3, a difference of less than 3 percent.


Figure 2. The data from two lots of oil that have nearly the same additive concentration reveal that differences in the additives’ fluorescence signals are much greater than the concentration difference and result from interaction effects between the oil and the additive.

We found that temperature also caused a difference — accounting for an additional process variation of 10 percent. These large differences make measuring the concentration difficult. However, the latent variable technique provides a solution.

Samples representing a wide variety of oil lots and temperatures were used in developing calibrations. The temperature range completely spanned the expected manufacturing conditions. The analysis method yielded concentration predictions with errors of 5 percent or less, even though the original intensity of the additive fluorescence varies by as much as 25 percent at a given concentration (Table 1).

Table 1. Partial least squares analysis results show the additive concentration errors of multiple lots of oils and additives using five partial least squares factors.
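To illustrate why a latent variable calibration can succeed where a single-wavelength calibration struggles, the following is a minimal sketch on simulated data, not the calibration behind Table 1. The response model is an assumption: the additive signal is scaled by a hypothetical gain that shifts by roughly ±11 percent with oil lot and temperature, the oil's own fluorescence and the temperature reading are included as extra channels, and the band shapes are invented. The function pls1_fit is a plain NIPALS implementation of partial least squares for one response, and five factors are used to match the table caption.

```python
import numpy as np

def pls1_fit(X, y, n_factors):
    """PLS1 by NIPALS on centered, scaled X and centered y; returns coefficients."""
    X, y = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = X.T @ y
        w /= np.linalg.norm(w)          # weight vector
        t = X @ w                       # scores
        tt = t @ t
        p = X.T @ t / tt                # X loadings
        qa = (y @ t) / tt               # y loading
        X -= np.outer(t, p)             # deflate X
        y -= qa * t                     # deflate y
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

def simulate(n, rng):
    """Simulated process samples; every number here is an assumption."""
    conc = rng.uniform(1500, 2200, n)   # additive concentration, ppm
    oil = rng.uniform(0.8, 1.2, n)      # lot-dependent oil fluorescence strength
    temp = rng.uniform(20, 60, n)       # process temperature, deg C
    gain = 1 - 0.3 * (oil - 1) - 0.0025 * (temp - 40)
    add_shape = np.array([0.6, 0.9, 1.0, 0.8, 0.5])   # additive band, 280 nm excitation
    oil_shape = np.array([0.4, 0.7, 1.0, 0.9, 0.6])   # oil band, 315 nm excitation
    X = np.hstack([np.outer(conc * gain / 1000, add_shape),
                   np.outer(oil, oil_shape),
                   temp[:, None]])
    X += 0.005 * rng.normal(size=X.shape)             # measurement noise
    return X, conc

def rel_rmse(pred, true):
    """Root-mean-square relative error, in percent."""
    return 100 * np.sqrt(np.mean(((pred - true) / true) ** 2))

rng = np.random.default_rng(1)
X_cal, y_cal = simulate(60, rng)        # calibration set
X_val, y_val = simulate(30, rng)        # independent validation set

# Autoscale with calibration statistics, then fit five PLS factors
mu, sd, y_mu = X_cal.mean(axis=0), X_cal.std(axis=0), y_cal.mean()
B = pls1_fit((X_cal - mu) / sd, y_cal - y_mu, n_factors=5)
pred_pls = ((X_val - mu) / sd) @ B + y_mu

# Baseline: straight-line calibration on the additive peak channel alone
slope, intercept = np.polyfit(X_cal[:, 2], y_cal, 1)
pred_uni = slope * X_val[:, 2] + intercept

print(f"single-channel error: {rel_rmse(pred_uni, y_val):.2f} %")
print(f"PLS (5 factors) error: {rel_rmse(pred_pls, y_val):.2f} %")
```

The multivariate model can use the oil and temperature channels to compensate for the gain changes, so its validation error is lower than that of the single-channel fit in this simulation. The specific percentages depend entirely on the assumed model and should not be compared with the values reported in Table 1.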

Making sense of data

Spectral analysis as used in such agricultural and food applications produces a multitude of data. Multivariate methods use those data to classify and quantify, matching spectral patterns to a desired quality. By using calibration and validation techniques, patterns can be recognized and variables quantified even in the midst of other complex signals.

Multivariate analysis is results-based analysis; that is, the correlation to a value is trained into the algorithm so that, when the algorithm encounters the same patterns, it can predict the correct values as results.

Training occurs on as many factors as are required to recognize variations in the data. The range of the independent variable values must cover the expected variations found in the application.
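One simple safeguard that follows from this requirement is to check each new measurement against the span of the calibration set before trusting a prediction. The sketch below assumes a hypothetical calibration matrix with two variables (temperature and one intensity channel), and the function name outside_calibration_range is illustrative only.

```python
import numpy as np

def outside_calibration_range(X_cal, x_new):
    """Return the indices of variables in x_new that lie outside the calibration span."""
    lo, hi = X_cal.min(axis=0), X_cal.max(axis=0)
    return np.flatnonzero((x_new < lo) | (x_new > hi))

# Hypothetical calibration set: columns are temperature (deg C) and intensity (a.u.)
X_cal = np.array([[20.0, 0.80],
                  [35.0, 1.00],
                  [50.0, 1.15],
                  [60.0, 1.30]])

print(outside_calibration_range(X_cal, np.array([45.0, 1.10])))  # []
print(outside_calibration_range(X_cal, np.array([72.0, 1.10])))  # [0]
```

A per-variable check of this kind is necessary but not sufficient, because a sample can sit inside every individual range and still represent a combination of conditions that the calibration never saw.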

Finding correlations reduces the number of variables (multiple wavelengths, temperature, pressure) into a smaller set of orthogonal components that account for the greatest variation in the data values. These are the principal components or latent variables, which are then used to predict changes in observables that are characterized by scores or by quantitative values.
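The reduction described here can be sketched with a singular value decomposition. The data are synthetic and all numbers are assumptions: six measured variables are generated from two hidden factors plus a small amount of noise, so most of the variation should collapse onto two orthogonal components.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples = 200

# Two hidden factors drive six correlated measured variables (assumed model)
hidden = rng.normal(size=(n_samples, 2))
mixing = rng.normal(size=(2, 6))
X = hidden @ mixing + 0.05 * rng.normal(size=(n_samples, 6))

Xc = X - X.mean(axis=0)                       # mean-center each variable
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = U * S                                # sample coordinates on the components
loadings = Vt.T                               # orthonormal component directions
explained = S**2 / np.sum(S**2)               # fraction of variance per component

print("explained variance:", np.round(explained, 3))
print("loadings orthonormal:", np.allclose(loadings.T @ loadings, np.eye(6)))
```

The components come out ordered by the variance they explain, and the scores on the first few components can stand in for the original variables in a later classification or regression step.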

Spectroscopic sensors can produce an abundance of raw data defining the environments and conditions in which they are placed. Simplistic treatment of the raw data often leads to indeterminate or ambiguous conclusions. However, the same data can be transformed into meaningful and useful information when correlated with qualities and quantities in a calibration/validation methodology using multivariate algorithms. Data analyzed with calibrated multivariate approaches result in predictions that are well characterized qualitatively and that are accurate quantitatively.

Meet the authors

William A. Ivancic is principal research scientist and K. David Monson is agrifood program manager, both at Battelle in Columbus, Ohio; e-mail: [email protected].