Calibrate Feature Values Using a Reference Sample
Source: R/calc-ref-normalization.R
calibrate_by_reference.Rd
This function calibrates feature abundances based on a specified reference sample.
Calibration can be applied to the entire dataset using one or more reference samples,
or batch-wise using reference sample analyses present within each batch.
For both approaches, multiple measurements of the same reference sample are
summarized using either the mean (default) or the median (set by the summarize_fun argument).
Usage
calibrate_by_reference(
  data,
  variable,
  reference_sample_id,
  absolute_calibration,
  batch_wise = FALSE,
  summarize_fun = "mean",
  store_conc_ratio = NULL,
  undefined_conc_action = NULL,
  store_normalized = FALSE
)
Arguments
- data
A MidarExperiment object containing the metabolomics data to be normalized.
- variable
Character string indicating which data type to calibrate. Must be one of: "intensity", "norm_intensity", or "conc".
- reference_sample_id
Character vector specifying the sample ID(s) to use as reference(s) or standards
- absolute_calibration
Logical indicating whether to perform absolute calibration using known concentrations of the reference sample (TRUE) or relative calibration (FALSE).
- batch_wise
Logical indicating whether to perform calibration for each batch separately (TRUE) or for all samples together (FALSE).
- summarize_fun
Either "mean" or "median". If absolute_calibration = TRUE, this function is used to summarize the reference sample concentrations across analyses of the specified reference_sample_id. Default is "mean".
- store_conc_ratio
Logical. Whether to store the ratio of the measured (non-calibrated) to the expected (known) concentrations. Only applied if absolute_calibration = TRUE. This ratio is stored under the feature variable feature_conc_ratio. By default it is TRUE when variable = "conc", otherwise FALSE.
- undefined_conc_action
Character string specifying how to handle features without defined concentrations in reference samples when absolute_calibration = TRUE. Must be one of: "original" (keep original values), "na" (set to NA), or "error" (stop with an error). The argument defaults to NULL in the function signature; the Details section lists "error" as the default behavior.
- store_normalized
Logical indicating whether to keep the normalized values in the dataset when absolute_calibration = TRUE. Default is FALSE. These values are then stored as [VARIABLE]_normalized, where [VARIABLE] is the input variable, e.g., conc.
Details
Calibration can be performed in two ways: absolute, resulting in concentrations, or relative, resulting in ratios.
Absolute calibration (absolute_calibration = TRUE)
Calibrates (or re-calibrates) feature abundances based on known concentrations of the corresponding features defined for a reference sample. The calibrated concentration for a given analyte is calculated as:
$$c_\textrm{cal}^\textrm{Analyte} = \frac{c_\textrm{sample}^\textrm{Analyte}}{c_\textrm{ref}^\textrm{Analyte}} \times c_\textrm{known}^\textrm{Analyte}$$
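The formula can be traced by hand with a few made-up numbers. The sketch below uses base R only and is not the package's implementation; all values and object names (ref_measured, ref_known, sample_conc) are hypothetical and chosen purely for illustration.

```r
# Toy illustration of the absolute calibration formula (hypothetical values).

# Three repeated measurements of one analyte in the reference sample:
ref_measured <- c(0.90, 1.00, 1.10)

# Known concentration of that analyte in the reference sample:
ref_known <- 1.20

# Measured concentrations of the same analyte in three study samples:
sample_conc <- c(0.50, 1.00, 2.00)

# Summarize the repeated reference measurements
# (mean here; median would be the alternative).
c_ref <- mean(ref_measured)

# c_cal = c_sample / c_ref * c_known
c_cal <- sample_conc / c_ref * ref_known
print(c_cal)  # approximately 0.6 1.2 2.4
```

Because the summarized reference value is 1.0 and the known value is 1.2, every sample concentration is scaled up by a factor of 1.2.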
The input variable can be conc, norm_intensity, or intensity, whereas the result is always stored under the variable conc (concentration), in the unit defined for the feature concentrations in the reference sample.
Metadata requirements:
- sample_id and analyte_id must be defined for the reference sample and features in the analysis and feature metadata, respectively.
- Known analyte concentrations must be defined in the QC concentration metadata for the reference sample.
- An error will be raised if no concentrations are defined for any features.
Missing analyte concentrations for the reference sample can be handled via undefined_conc_action with the following options:
- original: Keep original feature values, i.e., the non-calibrated values will be returned. Note: this is only available when variable = "conc". Use with caution to avoid mixing units.
- na: Set affected feature values to NA.
- error (default): The function stops with an error if any reference sample feature concentration is undefined.
If all feature concentrations are undefined, the function will stop with an error.
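The three options can be sketched in base R as follows. The helper handle_undefined_conc() and the analyte names are hypothetical and only mimic the documented behavior; this is not how the package implements it internally.

```r
# Hypothetical helper mimicking the documented options; not part of midar.
handle_undefined_conc <- function(conc, c_ref, c_known,
                                  action = c("error", "original", "na")) {
  action <- match.arg(action)
  undefined <- is.na(c_known)

  # No known concentration at all: always an error.
  if (all(undefined)) stop("No reference concentrations are defined.")

  if (any(undefined) && action == "error") {
    stop("Undefined reference concentration for: ",
         paste(names(conc)[undefined], collapse = ", "))
  }

  # Features without a known concentration become NA here.
  calibrated <- conc / c_ref * c_known

  if (action == "original") {
    # Fall back to the non-calibrated values for undefined features.
    calibrated[undefined] <- conc[undefined]
  }
  calibrated
}

conc    <- c(analyte_A = 2.0, analyte_B = 4.0)  # measured in one sample
c_ref   <- c(analyte_A = 1.0, analyte_B = 2.0)  # measured in the reference
c_known <- c(analyte_A = 1.5, analyte_B = NA)   # analyte_B is undefined

res_original <- handle_undefined_conc(conc, c_ref, c_known, "original")
res_na       <- handle_undefined_conc(conc, c_ref, c_known, "na")
res_error    <- tryCatch(
  handle_undefined_conc(conc, c_ref, c_known, "error"),
  error = function(e) "error raised"
)
```

With "original", analyte_B keeps its non-calibrated value while analyte_A is calibrated, which is why the output can mix units unless the input is already a concentration.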
The re-calibrated feature concentrations are stored as conc, overwriting existing conc values. The original conc values are stored as conc_beforecal.
The ratio between the measured and expected (known) concentrations in the reference sample is available via the feature variable feature_conc_ratio and is calculated as follows:
$$c_\textrm{ratio}^\textrm{Analyte} = \frac{c_\textrm{measured}^\textrm{Analyte}}{c_\textrm{expected}^\textrm{Analyte}}$$
where \(c_\textrm{measured}\) is the measured (non-calibrated) concentration, and \(c_\textrm{expected}\) is the known or reference concentration for the same analyte. A ratio of 1 indicates perfect agreement; values above or below 1 indicate over- or underestimation, respectively.
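As a small worked example of this ratio, the following base R sketch uses made-up values for three hypothetical analytes; the object names are illustrative and do not correspond to package internals.

```r
# Hypothetical measured and expected concentrations in the reference sample.
c_measured <- c(analyte_A = 1.10, analyte_B = 0.80, analyte_C = 2.00)
c_expected <- c(analyte_A = 1.00, analyte_B = 1.00, analyte_C = 2.00)

# c_ratio = c_measured / c_expected
conc_ratio <- c_measured / c_expected

# Classify each analyte relative to a ratio of 1.
status <- ifelse(conc_ratio > 1, "overestimated",
                 ifelse(conc_ratio < 1, "underestimated", "in agreement"))

print(data.frame(conc_ratio, status))
```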
To export the calibrated concentrations, use save_dataset_csv() with variable = "conc"; to export the non-calibrated values, use variable = "conc_beforecal". When saving the MiDAR XLSX report, the calibrated concentrations will also be stored as conc.
Normalization (relative calibration, absolute_calibration = FALSE)
Normalizes feature abundances with the corresponding feature abundances in a reference sample, resulting in ratios. Any available feature abundance variable (i.e., conc, norm_intensity, or intensity) can be used as input. The normalization is calculated for all features present. The resulting output is stored as [VARIABLE]_normalized, where [VARIABLE] is the input variable, e.g., conc_normalized.
To export the normalized abundances, use save_dataset_csv() with variable = "[VARIABLE]_normalized". When saving the MiDAR XLSX report via save_report_xlsx(), available unfiltered normalized feature abundances are included by default. To include filtered normalized feature abundances, set filtered_variable = "[VARIABLE]_normalized".
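The difference between normalizing across the whole dataset and normalizing batch-wise can be illustrated with a toy table. This is a base R sketch under assumed column names (batch, sample_id, conc) and invented values for a single feature; it does not reproduce the package's internal code.

```r
# Toy long-format table for one feature (hypothetical names and values).
d <- data.frame(
  batch     = c(1, 1, 1, 1, 2, 2, 2, 2),
  sample_id = c("REF", "REF", "S1", "S2", "REF", "REF", "S3", "S4"),
  conc      = c(1.0, 3.0, 4.0, 1.0, 4.0, 6.0, 10.0, 5.0)
)
is_ref <- d$sample_id == "REF"

# Summary of repeated reference measurements; swap in median if preferred.
summarize_ref <- function(x) mean(x)

# All samples together: one reference value for the whole dataset.
ref_all <- summarize_ref(d$conc[is_ref])
d$conc_normalized_global <- d$conc / ref_all

# Batch-wise: one reference value per batch, looked up by batch label.
ref_by_batch <- tapply(d$conc[is_ref], d$batch[is_ref], summarize_ref)
d$conc_normalized_batch <-
  d$conc / as.numeric(ref_by_batch[as.character(d$batch)])

print(d)
```

In this toy data the reference sample reads higher in batch 2 than in batch 1, so the batch-wise ratios remove that offset while the global ratios retain it.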
Examples
dat_file <- system.file("extdata", "S1P_MHQuant.csv", package = "midar")
meta_file <- system.file("extdata", "S1P_metadata_tables.xlsx", package = "midar")
# Load data and metadata
mexp <- MidarExperiment()
mexp <- import_data_masshunter(mexp, dat_file, import_metadata = FALSE)
#> ✔ Imported 65 analyses with 16 features
#> ℹ `feature_area` selected as default feature intensity. Modify with `set_intensity_var()`.
mexp <- import_metadata_analyses(mexp, path = meta_file, sheet = "Analyses")
#> ✔ Analysis metadata associated with 65 analyses.
mexp <- import_metadata_features(mexp, path = meta_file, sheet = "Features")
#> ✔ Analysis metadata associated with 65 analyses.
#> ✔ Feature metadata associated with 16 features.
mexp <- import_metadata_istds(mexp, path = meta_file, sheet = "ISTDs")
#> ✔ Analysis metadata associated with 65 analyses.
#> ✔ Feature metadata associated with 16 features.
#> ✔ Internal Standard metadata associated with 2 ISTDs.
# Load known feature concentrations in the reference sample
mexp <- import_metadata_qcconcentrations(mexp, path = meta_file, sheet = "QCconcentrations")
#> ✔ Analysis metadata associated with 65 analyses.
#> ✔ Feature metadata associated with 16 features.
#> ✔ Internal Standard metadata associated with 2 ISTDs.
#> ✔ QC concentration metadata associated with 1 annotated samples and 6 annotated analytes
mexp <- normalize_by_istd(mexp)
#> ! Interfering features defined in metadata, but no correction was applied. Use `correct_interferences()` to correct.
#> ✔ 14 features normalized with 2 ISTDs in 65 analyses.
mexp <- quantify_by_istd(mexp)
#> ✔ 14 feature concentrations calculated based on 2 ISTDs and sample amounts of 65 analyses.
#> ℹ Concentrations are given in μmol/L.
# Absolute calibration
# --------------------
mexp <- calibrate_by_reference(
  data = mexp,
  variable = "conc",
  reference_sample_id = "SRM1950",
  absolute_calibration = TRUE,
  batch_wise = FALSE,
  summarize_fun = "mean",
  undefined_conc_action = "original"
)
#> ! One or more feature concentration are not defined in the reference sample SRM1950. Original values will be returned for these. To change this, modify `undefined_conc_action` argument.
#> ✔ 6 feature concentrations were re-calibrated using the reference sample SRM1950.
#> ℹ Concentrations are given in umol/L.
# Export absolute calibration concentrations
save_dataset_csv(mexp, "calibrated.csv", variable = "conc", filter_data = FALSE)
#> ✔ Concentration values for 65 analyses and 7 features have been exported to 'calibrated.csv'.
# Export non-calibrated concentrations
save_dataset_csv(mexp, "noncalibrated.csv", variable = "conc_beforecal", filter_data = FALSE)
#> ✔ Conc_beforecal values for 65 analyses and 16 features have been exported to 'noncalibrated.csv'.
# Create XLSX report with calibrated concentrations as filtered dataset
save_report_xlsx(mexp, "report.xlsx", filtered_variable = "conc")
#> Saving report to disk - please wait...
#> ✔ The data processing report has been saved to 'report.xlsx'.
# Relative calibration
# --------------------
mexp <- calibrate_by_reference(
  data = mexp,
  variable = "conc",
  reference_sample_id = "SRM1950",
  batch_wise = FALSE,
  absolute_calibration = FALSE
)
#> ✔ All features were normalized with reference sample SRM1950 features.
#> ℹ Unit is: sample [conc] / SRM1950 [conc]
# Export SRM1950-normalized concentrations
save_dataset_csv(mexp, "normalized.csv", variable = "conc_normalized", filter_data = FALSE)
#> ✔ Conc_normalized values for 65 analyses and 16 features have been exported to 'normalized.csv'.
# Create XLSX report with SRM1950-normalized concentrations as filtered dataset
save_report_xlsx(mexp, "report.xlsx", filtered_variable = "conc_normalized")
#> Saving report to disk - please wait...
#> ✔ The data processing report has been saved to 'report.xlsx'.