Drift Correction by Cubic Spline Smoothing
Source: R/correct-drift-batch.R, correct_drift_cubicspline.Rd
This function corrects run-order drift within or across batches using cubic spline smoothing. The correction is typically based on quality control (QC) samples measured at regular intervals throughout the run sequence. The smoothed curve fitted to the QC samples is then used to adjust all other samples in the dataset. The correction can be applied to "intensity", "norm_intensity", or "conc" data.
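The workflow described above can be sketched in base R as follows. This is a minimal illustration on simulated data, not the package's implementation. It assumes that the smoother is `stats::smooth.spline()` (inferred from the shared `cv`, `spar`, `lambda` and `penalty` argument names), that the spline is fitted to log-transformed QC intensities, and that each value is divided by the trend and rescaled to the median QC intensity.

```r
# Minimal sketch, NOT the package's implementation. Assumptions: stats::smooth.spline()
# as the smoother, log-scale fit, multiplicative correction, simulated data.
set.seed(42)
n <- 60
run_order <- seq_len(n)
is_qc <- run_order %% 5 == 1              # simulated QC injected every 5th run

# Simulated intensities: gradual signal loss over the run plus random noise
true_drift <- 1 - 0.004 * run_order
intensity <- 1000 * true_drift * exp(rnorm(n, sd = 0.05))

# 1. Fit a smoothing spline to the log QC intensities versus run order
fit <- smooth.spline(run_order[is_qc], log(intensity[is_qc]), cv = TRUE)

# 2. Evaluate the trend at every injection and back-transform to the original scale
trend <- exp(predict(fit, run_order)$y)

# 3. Divide out the trend and rescale to the median QC intensity
corrected <- intensity / trend * median(intensity[is_qc])

cv_pct <- function(v) 100 * sd(v) / mean(v)
round(c(before = cv_pct(intensity), after = cv_pct(corrected)), 1)
```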
The cubic spline smoothing approach, particularly when used with the regularization parameter `lambda`, is similar, but not identical, to previously described QC-based drift correction methods such as QC-RSC (Quality Control-Robust Spline Correction), described in Dunn et al. (Nat Protoc, 2011) and Kirwan et al. (Anal Bioanal Chem, 2013).
By default, the smoothing parameter is determined by cross-validation, which can lead to overfitting. To reduce overfitting, the regularization parameter `lambda` can be set; `lambda = 0.01` is a good starting point. Additionally, the global smoothing parameter can be specified via `spar`.
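To get a feel for how these settings change the flexibility of the trend, the sketch below fits the same simulated QC data with cross-validation, a fixed `lambda` and a fixed `spar`. It again assumes `stats::smooth.spline()` as the underlying smoother; the data are simulated. The equivalent degrees of freedom (`df`) drop as `lambda` or `spar` increases, i.e., the curve becomes smoother and less prone to chasing individual QC values.

```r
# Sketch (assumption: smoothing via stats::smooth.spline(); QC values simulated)
set.seed(1)
qc_run <- seq(1, 96, by = 5)                                  # 20 simulated QC positions
log_qc <- log(1000) - 0.002 * qc_run + rnorm(length(qc_run), sd = 0.05)

fit_cv      <- smooth.spline(qc_run, log_qc, cv = TRUE)       # leave-one-out CV
fit_gcv     <- smooth.spline(qc_run, log_qc, cv = FALSE)      # generalized CV (GCV)
fit_lambda  <- smooth.spline(qc_run, log_qc, lambda = 0.01)   # fixed lambda
fit_lambda0 <- smooth.spline(qc_run, log_qc, lambda = 1e-6)   # almost no penalty
fit_spar    <- smooth.spline(qc_run, log_qc, spar = 0.8)      # fixed spar

# Equivalent degrees of freedom: lower values mean a smoother, less flexible trend
round(sapply(list(cv = fit_cv, gcv = fit_gcv, lambda_0.01 = fit_lambda,
                  lambda_1e6 = fit_lambda0, spar_0.8 = fit_spar), `[[`, "df"), 2)
```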
It is recommended to visually inspect the correction using `plot_runscatter()`. Set `recalc_trend_after = TRUE` so that the trends after correction are also available for plotting. For further details, refer to the description of `plot_runscatter()`.
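As an assumption about what `recalc_trend_after = TRUE` does conceptually, the trend is re-estimated on the corrected data. The sketch below uses simulated data, `stats::smooth.spline()` as the assumed smoother and a hypothetical helper `fit_trend()`; it shows what to look for in the plot: after a successful correction, the re-estimated trend should be nearly flat.

```r
# Sketch (assumptions: stats::smooth.spline() as smoother; fit_trend() is hypothetical;
# data simulated)
set.seed(7)
run_order <- 1:80
is_qc <- run_order %% 8 == 1                      # QCs at runs 1, 9, ..., 73
intensity <- 500 * exp(-0.005 * run_order) * exp(rnorm(80, sd = 0.03))

# Hypothetical helper: QC-based trend, fitted on the log scale, returned on original scale
fit_trend <- function(x, y, qc, spar = 0.7) {
  fit <- smooth.spline(x[qc], log(y[qc]), spar = spar)
  exp(predict(fit, x)$y)
}

trend_before <- fit_trend(run_order, intensity, is_qc)
corrected    <- intensity / trend_before * median(intensity[is_qc])
trend_after  <- fit_trend(run_order, corrected, is_qc)   # re-estimated trend

# Spread of the fitted trend: should shrink to (almost) flat after correction
round(c(before = diff(range(trend_before)), after = diff(range(trend_after))), 1)
```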
The correction can be applied on a batch-by-batch basis (`batch_wise = TRUE`, default) or across all batches (`batch_wise = FALSE`). Existing corrections are either replaced (`replace_previous = TRUE`) or the new correction is added on top of them (`replace_previous = FALSE`).
Furthermore, drift correction can be applied unconditionally (`conditional_correction = FALSE`) or conditionally (`conditional_correction = TRUE`), depending on whether the change in sample CV before and after correction is below a defined threshold (`cv_diff_threshold`). If `batch_wise = TRUE`, this condition is assessed independently for each batch by comparing the median of the CV changes within that batch with the threshold.
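A conditional rule of this kind can be sketched as follows. The exact definition of the CV change is not given here, so the hypothetical helper `apply_if_cv_improves()` assumes it is CV(after) minus CV(before), in percentage points, and keeps the corrected values only if that difference is below `cv_diff_threshold` (0 by default, i.e., the CV must decrease). The per-batch median step is not shown, and the two features are toy values.

```r
# Sketch (assumptions: CV change = CV_after - CV_before in percentage points;
# correction kept only if the change is below cv_diff_threshold)
cv_pct <- function(v) 100 * sd(v) / mean(v)

# Hypothetical helper: return corrected values only if the sample CV change passes
apply_if_cv_improves <- function(raw, corrected, cv_diff_threshold = 0) {
  cv_change <- cv_pct(corrected) - cv_pct(raw)
  if (is.finite(cv_change) && cv_change < cv_diff_threshold) corrected else raw
}

# Feature A: drifting raw data, correction reduces the CV -> corrected values kept
raw_a  <- c(100, 95, 90, 85, 80, 75)
corr_a <- c(88, 87, 88, 87, 88, 87)

# Feature B: stable raw data, correction adds noise -> raw values kept
raw_b  <- c(100, 101, 99, 100, 101, 99)
corr_b <- c(90, 110, 95, 105, 92, 108)

res_a <- apply_if_cv_improves(raw_a, corr_a)
res_b <- apply_if_cv_improves(raw_b, corr_b)
```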
Note: The function outputs a message indicating the median CV change and the mean absolute CV before and after correction for all samples. However, these metrics are experimental and should not be used as definitive criteria for correction (see Details below).
This cubic spline method is implemented using the base R function
Usage
correct_drift_cubicspline(
  data = NULL,
  variable,
  ref_qc_types,
  batch_wise = TRUE,
  ignore_istd = TRUE,
  replace_previous = TRUE,
  cv = TRUE,
  spar = NULL,
  lambda = NULL,
  penalty = 1,
  conditional_correction = FALSE,
  recalc_trend_after = FALSE,
  log_transform_internal = TRUE,
  feature_list = NULL,
  cv_diff_threshold = 0,
  use_original_if_fail = FALSE
)
Arguments
- data
MidarExperiment object
- variable
The variable to be corrected for drift effects. Must be one of "intensity", "norm_intensity", or "conc"
- ref_qc_types
QC types used for drift correction
- batch_wise
Logical. Apply the correction to each batch separately (`TRUE`, default) or across all batches (`FALSE`).
- ignore_istd
Logical. Exclude internal standards (ISTDs) from the correction if `TRUE` (default).
- replace_previous
Logical. Replace existing corrections (`TRUE`, default) or apply the new correction on top of them (`FALSE`).
- cv
Logical. Use ordinary leave-one-out cross-validation (`TRUE`, default) or generalized cross-validation (GCV, `FALSE`) to compute the smoothing parameter; used only when `spar` is not specified.
- spar
Smoothing parameter for cubic spline smoothing, typically (but not necessarily) in (0, 1]. If `NULL` (default), the smoothing parameter is computed using the cross-validation method set by `cv`.
- lambda
Regularization parameter for cubic spline smoothing. Default is `NULL`, i.e., `lambda` is not set and the smoothing parameter is determined via `spar` or cross-validation.
- penalty
The coefficient of the penalty for degrees of freedom in the GCV criterion.
- conditional_correction
Logical. Apply drift correction to all features unconditionally (`FALSE`, default) or conditionally, based on the change in sample CV (`TRUE`).
- recalc_trend_after
Logical. Recalculate the trend after drift correction for plotting with `plot_runscatter()`. This doubles the calculation time.
- log_transform_internal
Logical. Log-transform the data for the correction if `TRUE` (default). Note: the log transformation is applied only internally for smoothing; the returned results are not log-transformed.
- feature_list
Regular expression used to subset the features for correction; only features whose names match are corrected. Default is `NULL`.
- cv_diff_threshold
Maximum allowable change in the CV ratio before and after smoothing for the correction to be applied (see `conditional_correction`).
- use_original_if_fail
Logical. Determines the action when smoothing fails or yields invalid values for a feature. If `FALSE` (default), the result for that feature will be `NA` for all batches; if `TRUE`, the original data are kept.
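The interplay of `log_transform_internal` and `use_original_if_fail` for a single feature can be sketched as follows. The helper `correct_feature()` is hypothetical and assumes `stats::smooth.spline()` as the smoother: the spline is fitted on log-transformed QC values, the trend is back-transformed with `exp()` so that results stay on the original scale, and any error or non-finite result falls back to either the original values or `NA`. With fewer than four distinct QC positions, `smooth.spline()` stops with an error, which is used here to trigger the fallback.

```r
# Sketch (assumptions: stats::smooth.spline() as smoother, rescaling to the QC median;
# correct_feature() is hypothetical)
correct_feature <- function(run_order, y, is_qc, spar = NULL,
                            log_transform_internal = TRUE,
                            use_original_if_fail = FALSE) {
  to_fit   <- if (log_transform_internal) log else identity
  from_fit <- if (log_transform_internal) exp else identity
  res <- tryCatch({
    fit   <- smooth.spline(run_order[is_qc], to_fit(y[is_qc]), spar = spar, cv = TRUE)
    trend <- from_fit(predict(fit, run_order)$y)   # back on the original scale
    out   <- y / trend * median(y[is_qc])
    if (any(!is.finite(out))) stop("non-finite values after correction")
    out
  }, error = function(e) NULL)
  if (!is.null(res)) return(res)
  # Smoothing failed: keep the original data or return NA for this feature
  if (use_original_if_fail) y else rep(NA_real_, length(y))
}

run_order  <- 1:20
y          <- 100 * exp(-0.01 * run_order) * (1 + 0.01 * sin(run_order))
enough_qc  <- run_order %% 4 == 1            # 5 QCs
too_few_qc <- run_order %in% c(1, 10, 20)    # 3 QCs: smooth.spline() needs at least 4

ok       <- correct_feature(run_order, y, enough_qc, spar = 0.5)
failed   <- correct_feature(run_order, y, too_few_qc)
fallback <- correct_feature(run_order, y, too_few_qc, use_original_if_fail = TRUE)
```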
Details
In the output message, the median CV change is computed as the median of the CV changes of all features for global (unconditional) correction, or of the features for which the correction passed the defined CV difference threshold for conditional correction (`conditional_correction = TRUE`). For batch-wise correction, the change is calculated per batch, and the final median CV change is the median of these batch medians across features.
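One reading of the batch-wise summary above is sketched below with made-up CV changes (after minus before, in percentage points): take the median CV change of the features within each batch, then the median of these batch medians. The data frame and its values are purely illustrative and do not come from the package.

```r
# Sketch of one reading of the batch-wise summary (hypothetical CV changes, in %-points)
cv_change <- data.frame(
  feature = rep(c("F1", "F2", "F3"), times = 2),
  batch   = rep(c(1, 2), each = 3),
  change  = c(-4.0, -1.0, 0.5, -3.0, -2.0, 1.0)
)

# Median CV change of the features within each batch
batch_medians <- tapply(cv_change$change, cv_change$batch, median)
batch_medians                 # batch 1: -1, batch 2: -2

# Final summary: median of the batch medians
median_cv_change <- median(batch_medians)
median_cv_change              # -1.5
```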
References
Dunn, W., Broadhurst, D., Begley, P. et al. Procedures for large-scale metabolic profiling of serum and plasma using gas chromatography and liquid chromatography coupled to mass spectrometry. Nat Protoc 6, 1060–1083 (2011). https://doi.org/10.1038/nprot.2011.335
Kirwan, J.A., Broadhurst, D.I., Davidson, R.L. et al. Characterising and correcting batch variation in an automated direct infusion mass spectrometry (DIMS) metabolomics workflow. Anal Bioanal Chem 405, 5147–5157 (2013). https://doi.org/10.1007/s00216-013-6856-7