This step may actually take more time than the analysis itself, and more often than
not the process consists of an iterative procedure where data preprocessing steps are
alternated with data analysis steps.
Some problems can immediately be recognized, such as measurement noise,
spikes, non-detects, and unrealistic values. In these cases, taking appropriate action
is rarely a problem. More difficult are the cases where it is not obvious which
characteristics of the data contain information, and which do not. There are many examples
where chance correlations lead to statistical models that are perfectly able to describe
the training data (the data used to set up the model in the first place) but have no
predictive abilities whatsoever.
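
The effect of chance correlations can be illustrated with a small simulation. The following is a minimal sketch in Python with NumPy, not a method taken from this text: the function name, the sample sizes, and the choice of ordinary least squares are all assumptions made for illustration. Predictors and response are independent random numbers, so there is no real relation to be found, yet a model with more variables than training samples reproduces the training data almost exactly.

```python
import numpy as np


def chance_correlation_demo(n_train=20, n_test=1000, n_vars=50, seed=0):
    """Fit least squares to pure noise; return (training R^2, test R^2).

    All names and default sizes here are hypothetical, chosen only to
    illustrate the point made in the text.
    """
    rng = np.random.default_rng(seed)

    # Predictors and response are drawn independently, so any relation
    # the model finds in the training set is a chance correlation.
    X_train = rng.normal(size=(n_train, n_vars))
    y_train = rng.normal(size=n_train)
    X_test = rng.normal(size=(n_test, n_vars))
    y_test = rng.normal(size=n_test)

    # Minimum-norm least-squares solution. With more variables than
    # training samples the system is underdetermined, so the training
    # responses can be matched (almost) exactly.
    coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

    def r_squared(X, y):
        # Fraction of variance explained; negative values mean the model
        # does worse than simply predicting the mean of y.
        residuals = y - X @ coef
        return 1.0 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)

    return r_squared(X_train, y_train), r_squared(X_test, y_test)


r2_train, r2_test = chance_correlation_demo()
print(f"training R^2: {r2_train:.3f}")
print(f"test R^2:     {r2_test:.3f}")
```

In runs of this sketch the training R^2 should be essentially one, while the R^2 on the independent test data is expected to be close to or below zero: the model describes the training data but predicts nothing.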

This chapter will focus on standard preprocessing techniques used in the natural
sciences and the life sciences. Data are typically spectra or chromatograms, and topics
include noise reduction, baseline removal, peak alignment, peak picking, and scaling.
Only the basic general techniques are mentioned here; some more specific ways to
improve the quality of the data will be treated in later chapters. Examples include
Orthogonal Partial Least Squares for removing uncorrelated variation (Sect. 11.4)
and variable selection (Chap. 10).