Feature Variables in MiDAR

Feature variables store values that are associated with a feature in a specific sample. They describe, e.g., the absolute or relative abundance, the chromatographic retention time, the peak shape, or, especially when processed data are imported, properties such as measurement accuracy.

Feature variables can be accessed in MiDAR functions either by their internal name, which always starts with feature_ (e.g., feature_intensity), or by their short name, such as conc, intensity, norm_intensity, and rt.
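The relationship between short and internal names can be sketched in base R as follows. The helper `to_internal_name()` is hypothetical and not part of MiDAR; it assumes the internal name is simply the short name with the feature_ prefix, so any variable that deviates from this pattern would need an explicit lookup instead.

```r
# Hypothetical helper, not part of MiDAR: expand a short name to its
# internal name by adding the "feature_" prefix when it is missing.
to_internal_name <- function(x) {
  ifelse(startsWith(x, "feature_"), x, paste0("feature_", x))
}

# Names that already carry the prefix are returned unchanged.
resolved <- to_internal_name(c("conc", "rt", "feature_intensity"))
```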

Key Feature Variables

The following feature variables are essential for organizing the data processing flow in MiDAR. The intensity variable corresponds to the raw signal (e.g., peak area) of the analysis, which is retrieved from one of the original feature variables (see below). All downstream feature variables are the result of data processing, including the processing steps available in the MiDAR workflow. If raw, partially or fully post-processed data are imported into MiDAR, e.g. via import_data_csv(), the imported values must be assigned to, or match, these variable names.

Many data processing and plotting functions in MiDAR use these variables as input. All of these variables are stored in the dataset table of the MidarExperiment object.

Variable Short Name	Internal Name	Description
intensity	feature_intensity	Raw signal intensity. A copy of either `area`, `height`, `response`, or `intensity` from the imported raw data. See below.
norm_intensity	feature_norm_intensity	Internal standard-normalized raw signal intensity
pmol_total	feature_pmol_total	Total feature (analyte) amount in the measured sample (pmoles/sample)
conc	feature_conc	Feature (analyte) concentration
conc_normalized	feature_conc_normalized	Feature (analyte) concentration normalized by, e.g., a reference sample or other normalization methods
conc_bias	feature_conc_ratio	The ratio of measured and expected/known feature concentration defined for a reference sample. The variable is generated by the function `calibrate_by_reference()` in case of absolute calibration.
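As a minimal worked example of two of the derived variables above, the following base R sketch uses made-up numbers. It assumes that internal standard normalization means dividing the analyte signal by the signal of its internal standard; the exact calculations MiDAR performs may differ, and all object names here are illustrative.

```r
# Made-up example values; not taken from a real dataset.
analyte_intensity <- c(2000, 2400)   # raw signal of the analyte
istd_intensity    <- c(1000, 800)    # raw signal of its internal standard

# Assumed definition: analyte signal divided by internal standard signal.
norm_intensity <- analyte_intensity / istd_intensity

# Ratio of measured to expected concentration in a reference sample.
measured_conc <- 9
expected_conc <- 10
conc_ratio <- measured_conc / expected_conc
```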

Backup Feature Variables

Some processing steps overwrite feature variables with recalculated values, e.g., feature intensities (such as peak areas) after interference correction, or concentrations after drift/batch correction or reference sample-based re-calibration. In these cases, the original feature values are stored in a new ‘backup’ feature variable to keep a record and to allow the values to be explored at a later stage. Furthermore, some processing functions create specific variables, e.g., the data points of the fitted curve (model fit) during drift correction.

These variables are also stored in the dataset table. The variable name is the initial name with one of the following postfixes:

Postfix of Variable Name	Examples	Description
_orig	feature_intensity_orig	Currently only used for the backup made before any interference correction is applied. Always corresponds to the raw intensity values that were imported into the `MidarExperiment` object.
_raw	feature_conc_raw, feature_intensity_raw	Corresponds to the raw (= uncorrected) calculated feature values, i.e., before a correction step such as drift/batch correction was applied.
_before	feature_conc_before, feature_intensity_before	Corresponds to the last calculated value before a new correction step, e.g., drift/batch correction, was applied. For example, the adjusted concentration after drift correction, before batch correction was applied.
_beforecal	feature_conc_beforecal	The non-calibrated raw or adjusted concentration before `calibrate_by_reference()` was applied. Only applies to the concentration variable.
_fit	conc_before_fit	Model fit data points calculated by drift and batch correction functions. Used by `runscatter()` to plot the trends.
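The backup pattern described above can be illustrated with a toy data frame in base R. This is only a sketch of the naming convention: `apply_correction()` is a hypothetical stand-in for a correction step, the values are made up, and MiDAR's own functions may handle the backups differently.

```r
# Toy dataset; column names follow the conventions above, values are made up.
dataset <- data.frame(feature_conc = c(10, 12, 14))

# Hypothetical stand-in for a correction step (not a MiDAR function).
apply_correction <- function(x, factor) x * factor

# First correction: keep the uncorrected values in the `_raw` backup.
dataset$feature_conc_raw <- dataset$feature_conc
dataset$feature_conc <- apply_correction(dataset$feature_conc, 0.5)

# Second correction: keep the most recent values in the `_before` backup.
dataset$feature_conc_before <- dataset$feature_conc
dataset$feature_conc <- apply_correction(dataset$feature_conc, 2)
```

After both steps, feature_conc_raw still holds the values from before any correction, while feature_conc_before holds the values from just before the latest correction.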

Raw Feature Variables

The supported feature variables listed below characterize additional aspects of the features. Importantly, only one of these variables can be used as the “feature raw intensity” for all data processing steps. This variable is copied to intensity after importing the data: by default area if available, otherwise height, response, or intensity, in this order. The feature variable used for intensity can also be set manually via `set_intensity_var()`.
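The default selection order described above can be sketched in base R as follows. The helper `pick_intensity_var()` is hypothetical and does not reproduce MiDAR's implementation; it only illustrates choosing the first available candidate in the stated order, using the internal variable names as column names.

```r
# Hypothetical helper, not MiDAR's implementation: return the first
# candidate column that is present in the imported data.
pick_intensity_var <- function(data,
                               candidates = c("feature_area", "feature_height",
                                              "feature_response", "feature_intensity")) {
  available <- candidates[candidates %in% names(data)]
  if (length(available) == 0) stop("No raw intensity variable found")
  available[1]
}

# Toy data without a peak area column, so the next candidate is chosen.
raw <- data.frame(feature_height = c(120, 95), feature_rt = c(3.2, 3.3))
chosen <- pick_intensity_var(raw)
```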

These variables are stored in the dataset_orig table and are not modified by any MiDAR function. They are typically not available in MiDAR’s data processing functions, but some plotting functions support them. Support will be extended in upcoming versions.

Variable Short Name	Internal Name	Description
area	feature_area	Peak area
height	feature_height	Peak height
response	feature_response	Feature response (vendor specific)
intensity	feature_intensity	Feature intensity (vendor specific)
rt	feature_rt	Retention time
fwhm	feature_fwhm	Full width at half maximum peak height
width	feature_width	Peak width
int_start	feature_int_start	Peak integration start time
int_end	feature_int_end	Peak integration end time
symmetry	feature_symmetry	Peak symmetry
sn_ratio	feature_sn_ratio	Signal-to-noise ratio
accuracy	feature_accuracy	Accuracy (often in reference to target value)
ionratio	feature_ionratio	Ion ratio

Each QC type is represented with a consistent color scheme (both fill and line colors) and a specific point shape in all plots generated by the MiDAR package. This visual coding allows consistent identification and comparison of different QC types across visualizations.