A basic MiDAR workflow
This tutorial outlines some key steps of a MiDAR workflow, based on a lipidomics dataset. See the tutorial Lipidomics Data Processing for a more detailed example.
Keep in mind that the examples below are simplified and may not be directly applicable to your data and experimental setup. Please consult the other tutorials and recipes for information on other workflows and data types.
Setting up an RStudio project
To start a new MiDAR data analysis, creating an RStudio project is recommended (see Using RStudio Projects). This helps keep the data analysis organized and makes it easier to share with others. The project should contain the following subfolders: data and output. Add your data and metadata files to the data folder.
Notebooks such as R Notebooks (.Rmd) or Quarto notebooks (.qmd) are a good choice for creating documented data processing workflows, as these formats allow combining code with formatted text to document the data processing steps.
Start with a new notebook or R script and load the midar package:
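library(midar)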
Creating a MidarExperiment object
The MidarExperiment object is the main data container used in the MiDAR workflow; see The MidarExperiment data object (articles/03_midarexperiment.html) for more information.
We start by creating a new MidarExperiment object (myexp), which will be used in all subsequent steps.
myexp <- MidarExperiment()
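Printing the object (e.g., by typing myexp at the console) displays a summary of its current status, which is a convenient check after each processing step:
# Show a status summary of the (currently empty) experiment
myexp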
Importing analysis results
As introduced in Preparing and importing data, we first import the analytical data, in this case from an MRMkit result file. This file also contains some metadata, such as qc_type and batch_id (see Data Identifiers in MiDAR), which we will import as well.
myexp <- import_data_mrmkit(data = myexp,
                            path = "datasets/sPerfect_MRMkit.tsv",
                            import_metadata = TRUE)
#> ✔ Imported 499 analyses with 503 features
#> ℹ `feature_area` selected as default feature intensity. Modify with `set_intensity_var()`.
#> ✔ Analysis metadata associated with 499 analyses.
#> ✔ Feature metadata associated with 503 features.
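As the import message notes, feature_area was selected as the default feature intensity, which can be changed with set_intensity_var(). The call below is a hypothetical sketch, assuming the function takes the experiment and the name of an alternative intensity variable (here a feature_height column, if present in your imported data):
# Hypothetical: use peak height instead of peak area as the intensity variable
# (assumes a feature_height column was imported with the data)
myexp <- set_intensity_var(myexp, "feature_height")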
Adding metadata
The subsequent processing steps require additional information that is not available from the imported analysis data, such as which internal standards are used for normalization, their concentrations, and the sample amounts analysed. These metadata can be imported from separate files or R data frames, as described in Preparing and importing data. To keep the code concise in this example, we will import the metadata from an MSOrganiser template. The validation checks result in some warnings, which would by default cause the metadata import to fail. However, assuming we understand what we are doing, we decide to ignore these warnings by setting ignore_warnings = TRUE. They will still be shown in the table printed in the console, labelled with an asterisk (*) in the status column.
myexp <- import_metadata_msorganiser(
  myexp,
  path = "datasets/sPerfect_Metadata.xlsm",
  ignore_warnings = TRUE
)
#> ! Metadata has following warnings and notifications:
#> --------------------------------------------------------------------------------------------
#> Type Table Column Issue Count
#> 1 W* Analyses analysis_id Analyses not in analysis data 15
#> 2 W* Features feature_id Feature(s) not in analysis data 4
#> 3 W* Features feature_id Feature(s) without metadata 1
#> 4 W* ISTDs quant_istd_feature_id Internal standard(s) not used 1
#> --------------------------------------------------------------------------------------------
#> E = Error, W = Warning, W* = Suppressed Warning, N = Note
#> --------------------------------------------------------------------------------------------
#> ✔ Analysis metadata associated with 499 analyses.
#> ✔ Feature metadata associated with 502 features.
#> ✔ Internal Standard metadata associated with 17 ISTDs.
#> ✔ Response curve metadata associated with 12 annotated analyses.
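Metadata can also be supplied directly from R data frames. The following is a hypothetical sketch only, assuming a function such as import_metadata_analyses() accepts a data frame via a table argument; see Preparing and importing data for the actual interface and required columns:
# Hypothetical sketch: add analysis metadata from a data frame
# (analysis_id and qc_type follow the Data Identifiers in MiDAR conventions;
# the sample IDs and the `table` argument name are assumptions)
analysis_metadata <- data.frame(
  analysis_id = c("Longit_S1", "Longit_S2"),
  qc_type = c("SPL", "BQC")
)
myexp <- import_metadata_analyses(myexp, table = analysis_metadata)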
Applying data processing
Now we are ready to proceed with data processing. In this example we apply some basic processing steps; the corresponding code should be self-explanatory. At the end of this code block we also define QC criteria, based on which features can be filtered on demand later in the workflow.
myexp <- normalize_by_istd(myexp)
#> ! Interfering features defined in metadata, but no correction was applied. Use `correct_interferences()` to correct.
#> ✔ 460 features normalized with 17 ISTDs in 499 analyses.
myexp <- quantify_by_istd(myexp)
#> ✔ 460 feature concentrations calculated based on 42 ISTDs and sample amounts of 499 analyses.
#> ℹ Concentrations are given in μmol/L.
myexp <- correct_drift_gaussiankernel(
  data = myexp,
  variable = "conc",
  ref_qc_types = c("SPL"))
#> ℹ Applying `conc` drift correction...
#> ℹ 2 feature(s) contain one or more zero or negative `conc` values. Verify your data or use `log_transform_internal = FALSE`.
#> - trend smoothing done!
#> ! 1 features showed no variation in the study sample's original values across analyses.
#> ! 1 features have invalid values after smoothing. NA will be returned for all values of these features. Set `use_original_if_fail = FALSE` to return original values.
#> ✔ Drift correction was applied to 459 of 460 features (batch-wise).
#> ℹ The median CV change of all features in study samples was -0.56% (range: -10.22% to 2.49%). The median absolute CV of all features across batches decreased from 38.96% to 38.56%.
myexp <- midar::correct_batch_centering(
  myexp,
  variable = "conc",
  ref_qc_types = "SPL")
#> ℹ Adding batch correction on top of `conc` drift-correction.
#> ✔ Batch median-centering of 6 batches was applied to drift-corrected concentrations of all 460 features.
#> ℹ The median CV change of all features in study samples was -0.50% (range: -27.90% to 10.30%). The median absolute CV of all features decreased from 39.89% to 39.46%.
myexp <- filter_features_qc(
  data = myexp,
  include_qualifier = FALSE,
  include_istd = FALSE,
  min.signalblank.median.spl.sblk = 10,
  max.cv.conc.bqc = 25)
#> Calculating feature QC metrics - please wait...
#> ✔ New feature QC filters were defined: 182 of 423 quantifier features meet QC criteria (not including the 25 quantifier ISTD features).
Plotting data
MiDAR provides various plot functions that can be useful for understanding analytical performance, trends, and issues in the data. Plots are also available to inspect the effect of data processing (e.g., drift/batch-effect correction) and of QC-based feature filtering.
Below we create a runscatter plot to visualize the concentration of specific features across the analytical series in different QC sample types. The plot can also be saved to a PDF file.
plot_runscatter(
  data = myexp,
  variable = "conc",
  include_feature_filter = "PC 4",
  include_istd = FALSE,
  cap_outliers = TRUE,
  log_scale = FALSE,
  output_pdf = FALSE,
  path = "./output/runscatter_PC408_beforecorr.pdf",
  cols_page = 3, rows_page = 2
)
#> Generating plots (1 page)...
#> - done!
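To write the same plots to the PDF file given in path instead of the plotting device, the call can be re-run with output_pdf = TRUE:
# Save the runscatter pages to a PDF file
plot_runscatter(
  data = myexp,
  variable = "conc",
  include_feature_filter = "PC 4",
  include_istd = FALSE,
  cap_outliers = TRUE,
  log_scale = FALSE,
  output_pdf = TRUE,
  path = "./output/runscatter_PC408_beforecorr.pdf",
  cols_page = 3, rows_page = 2
)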
Exporting and sharing data
Finally, we can export specific datasets as plain csv tables, create a detailed data report, and share the entire MidarExperiment object with collaborators, who can then run their own data processing, plots, and QC checks without needing any of the original code.
# Saves a detailed report in Excel format with multiple sheets
midar::save_report_xlsx(myexp, path = "./output/myexp-midar_report.xlsx")
#> Saving report to disk - please wait...
#> ✔ The data processing report has been saved to './output/myexp-midar_report.xlsx'.
# Saves a flat csv table with the concentration values that passed the
# previously set QC criteria, for each feature in each sample.
midar::save_dataset_csv(
  data = myexp,
  path = "./output/sperfect_filt_uM.csv",
  variable = "conc",
  qc_types = "SPL",
  include_qualifier = FALSE,
  filter_data = TRUE)
#> ✔ Concentration values for 378 analyses and 182 features have been exported to './output/sperfect_filt_uM.csv'.
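Being a plain csv file, the exported table can be inspected with any downstream tool, for example directly in base R:
# Read the exported concentration table back into R for a quick check
conc_table <- read.csv("./output/sperfect_filt_uM.csv", check.names = FALSE)
dim(conc_table)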
# Saves the entire MidarExperiment object as an RDS file, which can be
# opened in R without MiDAR or used with MiDAR again.
saveRDS(myexp, file = "./output/myexp-midar.rds", compress = TRUE)
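A collaborator can then restore the object in their own R session with base R and continue the analysis with MiDAR:
# Load the shared MidarExperiment object in a new session
myexp_shared <- readRDS("./output/myexp-midar.rds")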