Daniel A. Shiley, PANalytical Boulder
Near-infrared spectroscopy moves out of the lab and into manufacturing.
Economic volatility continues to put pressure on manufacturers, increasing the focus on cost reduction and process optimization across applications. Improvements in these areas require a solid understanding of the materials and the overall process, supported by decisive metrics. Generally, more data comes at a higher cost, at least when considering standard chemical analyses. To break this link, many companies are turning to near-infrared (NIR) spectroscopy to provide the real-time data needed to optimize their processes at a lower cost.
Near-infrared spectroscopy traditionally was used in the lab, but industry is starting to adopt it for production applications.
Historically, NIR has been used in laboratory environments. But as instruments have become smaller and more robust, an increasing number have found their way into production environments as portable, at-line and online systems. This includes applications in raw material certification, qualitative measurement of in-process materials, and quantitative measurement of in-process and finished product analyses.
NIR spectroscopy is a secondary technique and as such requires the creation of spectral libraries or calibration models. NIR systems are being used in several ways, with each having its own set of calibration development challenges.
Spectral libraries can easily be constructed for raw material certification and validation, but care must be taken to represent all the sources of variability in raw materials. Many experts recommend that there be examples for each raw material to be measured. Spectral libraries can be useful for identifying gross similarities between incoming raw materials by using a pattern-matching algorithm. Determination of more subtle differences requires a multivariate model.
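As a rough illustration of the pattern-matching idea, the sketch below scores an incoming spectrum against each library entry using a simple correlation coefficient. This is only one of several possible matching algorithms and is not necessarily the one a given instrument uses; the material names, the five-point spectra and the 0.95 acceptance threshold are all hypothetical.

```python
import math

def correlation(a, b):
    """Pearson correlation between two spectra sampled at the same wavelengths."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def best_match(unknown, library, threshold=0.95):
    """Return (name, score) for the closest library entry.

    The name is None when even the best score falls below the threshold,
    which is an assumed acceptance limit, not a recommended value.
    """
    name, score = max(
        ((entry, correlation(unknown, spectrum)) for entry, spectrum in library.items()),
        key=lambda pair: pair[1],
    )
    return (name if score >= threshold else None), score

# Hypothetical five-point absorbance spectra, for illustration only.
library = {
    "lactose": [0.10, 0.40, 0.90, 0.40, 0.10],
    "starch": [0.80, 0.50, 0.20, 0.50, 0.80],
}
incoming = [0.12, 0.41, 0.88, 0.43, 0.11]
match, score = best_match(incoming, library)
```

A score like this separates grossly different materials but says little about small compositional shifts within one material, which is where a multivariate model becomes necessary.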
Production applications where NIR spectroscopy can be particularly useful include raw material certification, qualitative measurement of in-process materials and product analysis.
Multivariate models are, quite simply, models that use multiple variables (spectral points) in the creation of the prediction or classification model. Multivariate models are necessary because the spectral features in the visible and NIR regions overlap and are not the result of a single molecular group. Considering the entire spectrum within the visible to NIR regions (350 to 2500 nm) leads to a more definitive understanding of the sample composition.
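A small numerical sketch can show why a single spectral point is not enough when bands overlap. It assumes a simple linear mixture in which two components both absorb at two wavelengths; the sensitivity values are hypothetical, and real chemometric models (partial least squares, for example) use many more spectral points than this two-point case.

```python
def solve_two_component(absorbances, sensitivities):
    """Solve the 2x2 linear mixture model A = K c for concentrations c (Cramer's rule)."""
    (k11, k12), (k21, k22) = sensitivities
    a1, a2 = absorbances
    det = k11 * k22 - k12 * k21
    if det == 0:
        raise ValueError("the two wavelengths carry the same information")
    c1 = (a1 * k22 - k12 * a2) / det
    c2 = (k11 * a2 - a1 * k21) / det
    return c1, c2

# Hypothetical sensitivities: rows are wavelengths, columns are components.
K = [(1.0, 0.6),
     (0.4, 1.0)]

def mixture_absorbance(c1, c2):
    """Absorbance at both wavelengths for a mixture with concentrations c1 and c2."""
    return (K[0][0] * c1 + K[0][1] * c2,
            K[1][0] * c1 + K[1][1] * c2)

# Two different mixtures that look the same at wavelength 1 alone.
mix_a = mixture_absorbance(2.0, 1.0)
mix_b = mixture_absorbance(1.4, 2.0)

# Using both wavelengths recovers the two compositions.
recovered_a = solve_two_component(mix_a, K)
recovered_b = solve_two_component(mix_b, K)
```

In this sketch the first wavelength reads the same for both mixtures, so a single-point calibration could not tell them apart, whereas the two-point solution can.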
Calibration models are needed whether the NIR is used in the laboratory, at-line or as part of an online system. To create the multivariate model, users must consider sources of error and understand the effects of these errors on the models.
Anytime a model is developed, there should be independent calibration and validation sets. Samples used for development of the calibration and validation sets should include the entire range of composition that would be expected from routine samples. The sets also should include any contaminants or constituent combinations that would be anticipated. Both sets should be developed from the total available samples at the time of calibration creation.
A common mistake is to create the calibration and then obtain more samples to use as a validation set. This is problematic for a number of reasons, but primarily because the second set of samples may have an entirely different distribution of concentration values. That can cause difficulties in interpretation of the validation results because the R² value and standard error of prediction are both somewhat influenced by the concentration distribution of the samples.
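The R² part of this point can be illustrated with a short sketch. It applies identical prediction errors to a wide-range and a narrow-range sample set; all values are hypothetical. R² is computed here as 1 minus the ratio of residual to total sum of squares, and the standard error of prediction (SEP) as the bias-corrected standard deviation of the residuals. Both are common definitions, but individual software packages may define them differently.

```python
import math

def r_squared(reference, predicted):
    """R² as 1 - SS_residual / SS_total (one common definition)."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1 - ss_res / ss_tot

def sep(reference, predicted):
    """Bias-corrected standard error of prediction."""
    residuals = [p - r for r, p in zip(reference, predicted)]
    bias = sum(residuals) / len(residuals)
    return math.sqrt(sum((e - bias) ** 2 for e in residuals) / (len(residuals) - 1))

# The same hypothetical prediction errors applied to two concentration ranges.
errors = [0.2, -0.2, 0.1, -0.1, 0.0]
wide = [10.0, 12.0, 14.0, 16.0, 18.0]
narrow = [13.0, 13.5, 14.0, 14.5, 15.0]

wide_pred = [r + e for r, e in zip(wide, errors)]
narrow_pred = [r + e for r, e in zip(narrow, errors)]

r2_wide, r2_narrow = r_squared(wide, wide_pred), r_squared(narrow, narrow_pred)
sep_wide, sep_narrow = sep(wide, wide_pred), sep(narrow, narrow_pred)
```

Because the errors are held fixed in this sketch, the SEP is the same for both sets, yet the narrow-range set reports a lower R². A validation set with a different concentration distribution can therefore look worse (or better) than the calibration without the model itself having changed.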
A new set of samples can be used to test the model, but this set should not be expected to provide exactly the same statistics as the model, whereas the calibration and validation sets should have very similar performance statistics. Selection of the validation set is critical: The samples selected for the validation set must encompass the same variation and sample characteristics as are contained in the calibration set. The number of samples in the validation set should be at least 20 percent of the number in the calibration set.
An easy way to ensure that the validation set is representative is to choose the calibration and validation sets at the same time from all available samples. View all the reference data in a spreadsheet, rank-order the samples by their constituent value, then mark every fifth sample for use in the validation set. Validation samples must be selected without knowledge of their predicted values. Also, samples should not be removed from the validation set due to lack of fit from the prediction unless the reference data is found to be erroneous upon retest of these samples.
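The rank-and-pick procedure is easy to script instead of doing it by hand in a spreadsheet. The following is a minimal sketch; the sample IDs and reference values are hypothetical, and the function name is chosen for illustration only.

```python
def split_every_fifth(samples, step=5):
    """Rank samples by reference value and send every `step`-th one to validation.

    samples: list of (sample_id, reference_value) pairs.
    Returns (calibration, validation), each sorted by reference value.
    """
    ranked = sorted(samples, key=lambda sample: sample[1])
    calibration, validation = [], []
    for position, sample in enumerate(ranked, start=1):
        if position % step == 0:
            validation.append(sample)
        else:
            calibration.append(sample)
    return calibration, validation

# Hypothetical reference values (for example, percent of a constituent).
values = [12.1, 9.8, 14.3, 11.0, 10.4, 13.7, 15.2, 9.1, 12.9, 11.6,
          14.8, 10.9, 13.1, 12.4, 9.5, 15.9, 11.3, 13.4, 10.1, 14.0]
samples = [(f"S{i:02d}", value) for i, value in enumerate(values, start=1)]

calibration, validation = split_every_fifth(samples)
```

With 20 samples this puts 16 in the calibration set and four in the validation set, so the validation set is 25 percent of the size of the calibration set. The split uses only the reference values, never the model's predictions.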
NIR is a molecular technique, not an elemental technique, so it is important to consider this when choosing the reference method. The reference method used for the calibration and validation should have the lowest possible laboratory error.
The best way to determine reproducibility of a method is through the use of replicate analysis of a limited number of samples (6 to 12) by the reference method. The standard error of laboratory (SEL) should be calculated using these replicates.
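One common way to compute the SEL from replicates is as a pooled standard deviation: the within-sample sums of squares are added up and divided by the total degrees of freedom. The sketch below assumes that definition, which may differ in detail from the one a particular laboratory uses, and the duplicate results are hypothetical.

```python
import math

def standard_error_of_laboratory(replicates):
    """Pooled standard deviation of replicate reference results.

    replicates: one inner list of repeated reference results per sample.
    """
    ss_within = 0.0
    degrees_of_freedom = 0
    for results in replicates:
        if len(results) < 2:
            raise ValueError("each sample needs at least two replicate results")
        mean = sum(results) / len(results)
        ss_within += sum((x - mean) ** 2 for x in results)
        degrees_of_freedom += len(results) - 1
    return math.sqrt(ss_within / degrees_of_freedom)

# Hypothetical duplicate reference results for six samples.
replicates = [
    [10.2, 10.4],
    [12.1, 11.9],
    [14.0, 14.3],
    [9.8, 9.9],
    [11.5, 11.2],
    [13.3, 13.3],
]
sel = standard_error_of_laboratory(replicates)
```

The result is in the same units as the reference values, so it can be compared directly with the error that is acceptable for the application.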
It is always best to determine the lab error rather than assuming that it is acceptable. If the reference error as measured by SEL is higher than acceptable, it is likely that the calibration model will also have substantial error.
The reference data used for creation of the multivariate model can be either from a chemical test or from a physical test, such as viscosity. In either case, the changes in the spectra must be related to the property that is measured using the reference assay – or no calibration will be possible.
Collection of spectra
When collecting spectra for multivariate modeling, it is important to control as many relevant variables as possible. System performance should be verified prior to collection of spectra. At a bare minimum, the wavelength stability should be checked to ensure that the system is functioning properly.
The calibration and validation samples should be prepared in the same manner as would be used for new unknown samples measured by the system. If the goal of the model will be to measure materials online, the sample spectra should be collected in the form and condition that the online system would be expected to measure.
Several chemometric programs are available to create the multivariate model. These programs combine the spectral data and reference data to produce a multivariate calibration model. Commercially available programs include GRAMS/IQ from Thermo Fisher Scientific of Woburn, Mass.; The Unscrambler from Camo Software Inc. of Woodbridge, N.J.; PLS_Toolbox from Eigenvector Research of Wenatchee, Wash.; and SIMCA-P from Umetrics Inc. of San Jose, Calif.
Although the exact sequence and calibration development tools differ from program to program, the basic steps remain the same. Significant differences between these model development software tools exist, and the exact features of each program are not duplicated in the others.
The most important statistics that describe the calibration are the standard error of cross-validation (SECV) and the coefficient of determination (R²). Samples removed from the model as concentration outliers should be rechecked by the primary assay and, if possible, should have new spectra acquired.
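To make the standard error of cross-validation concrete, the sketch below runs a leave-one-out cross-validation: each sample is predicted by a model built without it, and the prediction errors are combined into a root-mean-square value. A one-variable straight-line fit stands in for the multivariate model purely to keep the example short, and the data are hypothetical. Commercial packages may use different segment sizes or a slightly different formula.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def secv_leave_one_out(x, y):
    """Root-mean-square error of leave-one-out predictions."""
    residuals = []
    for i in range(len(x)):
        # Build the model without sample i, then predict sample i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        residuals.append(y[i] - (slope * x[i] + intercept))
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))

# Hypothetical absorbance readings and reference concentrations.
absorbance = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
concentration = [5.1, 9.8, 15.2, 19.9, 25.1, 29.8]
secv = secv_leave_one_out(absorbance, concentration)
```

Because every prediction comes from a model that never saw the sample, the value is a more honest estimate of performance than the fit error of the calibration itself.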
Samples removed as spectral outliers should be thoroughly investigated, as the leading cause of spectral outliers is inadequate representation in the calibration set. Outliers can provide the calibration developer with important information, so they should not simply be discarded without additional investigation. Too often, samples removed during the calibration development cycle are not adequately investigated, which can result in models that are not robust. The goal is always to remove as few samples as possible during creation of the model.
However, the calibration developer always has to decide whether a sample fits the primary method poorly because of laboratory error or because that sample type is not adequately represented in the model. It is quite common to resubmit some samples to the primary assay. Samples with high spectral deviation are poorly represented in the model, regardless of whether that sample type is likely to be encountered again or was a one-time occurrence.
Have a plan
NIR calibrations will need to be updated on a frequent basis, or whenever the prediction error rises beyond acceptable limits. Most models will need updates within the first year, and less frequently after that as more samples are added and the model explains more of the variability. The most important thing is to have a plan that covers how the model will be evaluated on an ongoing basis and which conditions should prompt a remodel outside the normal remodeling schedule. The plan should also include a budget to support it. It should specify how often samples will be submitted for comparison to the primary method, how these samples will be selected, how many samples will be used for the comparisons, what statistics will be reviewed and what limits of R² or SEL will prompt a general remodel.
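Parts of such a plan can be written down as an explicit rule, so that the decision to remodel does not depend on who happens to review the numbers. The sketch below is one hypothetical way to do that. The class name, the choice of bias and standard error of prediction as the review statistics, and every limit shown are placeholders and not recommended values.

```python
import math
from dataclasses import dataclass

@dataclass
class ReviewPlan:
    """Hypothetical review limits; the numbers are placeholders."""
    min_samples: int = 10        # check samples needed before judging the model
    max_sep: float = 0.30        # limit on bias-corrected standard error of prediction
    max_abs_bias: float = 0.10   # limit on mean difference from the primary method

def review_model(plan, reference, predicted):
    """Compare NIR predictions with primary-method results and return a decision."""
    n = len(reference)
    if n < plan.min_samples:
        return "collect more check samples"
    differences = [p - r for r, p in zip(reference, predicted)]
    bias = sum(differences) / n
    sep = math.sqrt(sum((d - bias) ** 2 for d in differences) / (n - 1))
    if sep > plan.max_sep or abs(bias) > plan.max_abs_bias:
        return "remodel"
    return "keep"

# Hypothetical check samples: primary-method results and two sets of NIR predictions.
reference = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0]
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
stable = [r + e for r, e in zip(reference, noise)]
drifted = [r + e + 0.5 for r, e in zip(reference, noise)]

plan = ReviewPlan()
decision_stable = review_model(plan, reference, stable)
decision_drifted = review_model(plan, reference, drifted)
```

Keeping the limits in one place also makes it straightforward to record which version of the plan was in force when a remodel was triggered.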
Additionally, the plan should set time limits for the model review. The performance of all multivariate models should be continually verified using comparisons to the primary reference assays, and when submitting samples for primary reference assays, it is desirable to include a few samples that have been previously measured. This resubmission can help identify any changes in the primary reference method. If changes have been made to the method, the new data will likely cause a reduction in calibration performance if added to the original data without accounting for the new bias.
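One simple way to check resubmitted samples for a shift in the reference method is a paired comparison of the original and repeat results. The sketch below computes the mean difference and a paired t statistic. This is an assumed approach and not one prescribed here; the data are hypothetical, and the critical value is passed in by the caller (2.776 corresponds to a two-sided 95 percent test with four degrees of freedom).

```python
import math

def reference_shift(original, resubmitted, critical_t):
    """Paired comparison of original and repeat reference results.

    Returns (mean_difference, t_statistic, shifted), where shifted is True
    when |t| exceeds the critical value supplied by the caller.
    """
    differences = [new - old for old, new in zip(original, resubmitted)]
    n = len(differences)
    mean_diff = sum(differences) / n
    sd = math.sqrt(sum((d - mean_diff) ** 2 for d in differences) / (n - 1))
    if sd == 0:
        # Identical differences: any nonzero offset is a perfectly consistent shift.
        t_stat = math.inf if mean_diff != 0 else 0.0
    else:
        t_stat = mean_diff / (sd / math.sqrt(n))
    return mean_diff, t_stat, abs(t_stat) > critical_t

# Hypothetical results for five samples measured before and after a method change.
original = [10.2, 12.1, 14.0, 9.8, 11.5]
resubmitted = [10.5, 12.5, 14.3, 10.2, 11.8]
mean_diff, t_stat, shifted = reference_shift(original, resubmitted, critical_t=2.776)
```

If a consistent offset like this appears, the new reference data would need to be handled with the bias in mind before being combined with the original calibration data.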
Toward better control
Using NIR throughout the manufacturing process can allow for better decision making, tighter process controls and reduction of off-spec product. NIR used for raw material ID helps to ensure that the process starts on the right foot. At-line NIR can allow instrumentation to be placed in the production area, where it can frequently help control the process. Online NIR allows for real-time monitoring of processes, which can help businesses save money by reducing the normal overage associated with creating a product to a specification. This results in cost savings because real-time monitoring allows for a set point closer to the real product specification.
Online NIR systems are even becoming part of process-line multivariate sensor systems that can be used to predict the final outcome. This allows the plant to project ahead of time when a batch is likely to go out of control, and to modify process parameters so that off-spec product is never produced. And, of course, preventing rework is better than merely having a test plan to identify it.
Meet the author
Daniel A. Shiley is the manager of the SummitCAL Solutions Team at PANalytical Boulder (formerly ASD Inc.) in Colorado.