{MiDAR}

Small Molecule Mass Spectrometry Large-Scale Data Processing, Quality Control, Analysis and Reporting

R/Basel 2023

Bo Burla - Singapore Lipidomics Incubator - National University of Singapore

SLING

Singapore Lipidomics Incubator

  • State-of-the-art facility (mass spec, automation, data processes)
  • Strong base of skilled & well-trained staff
  • Trusted partner for clinical cohort studies
  • Clinical translation of assays
  • NUS-Agilent Hub (Video)
  • Conferences (isls11.eventengage.live)
  • Prof Markus R Wenk & Dr Anne K Bendt

More info: sling.sg

Big Interest in Small Molecule Analyses

Metabolites - Lipids - Signalling molecules - Natural Products - Drugs

Mass Spectrometry-Based Workflow

Available Tools for Data Post-Processing

Issues

  • Limited and static functions
  • By and for bioinformaticians
  • Black boxes for the end-user
  • Assuming ‘perfect’ analyses and data
  • Manual (meta)data preparation often required

Analytical ‘Reality’

  • Project Diversity
  • Methods/Platform Diversity
  • Analytical performance/failures
  • Mistakes
  • Communication between wet and dry lab
  • Processing scripts rarely published

Commercial tools

  • MS vendor software

  • EXCEL / ‘in-house’ R

Packages for lipidomics

  • lipidr Mohamed et al. (2020)

Packages for Metabolomics

  • pmp Jankevics et al. (2023)
  • metaX Wen et al. (2017)
  • tidyMS Riquelme et al. (2020)

Proteomics/Transcriptomics

  • Diverse Bioconductor packages

Data and Metadata

Data

  • Vendor peak integration tool outputs
  • Open source tools (e.g. Teo et al. (2020))
  • Generic formats (incl. pre-processed data)

Metadata

  • Analyses, Samples, Features, Calibration, exp. groups etc
  • CSV or XLS(M) templates (github/slinghub)
  • LIMS

{MiDAR} - Enabling data processing by the lab

For

  • Bioanalytical scientists and
  • Data scientists
  • Publishing

Aims

  • Supervised automation
  • Flexible modular steps
  • Standardized and reproducible
  • Validated
  • Data and Process sharing
  • Tool and library

Workflow

  • Lab people analyse data with {MiDAR}
  • Lab people share {MiDAR} project file
  • Data people support and amend
  • {MiDAR} project returns to the lab people

Challenges

  • Learning R
  • User acceptance
  • Scripts vs UI
  • Resources for SW development

Basic Workflow

library(midar)

mexp <- MidarExperiment()

mexp <- read_masshunter_csv(mexp, file_dir_names = data_path)
mexp <- read_msorganizer_xlm(mexp, filename = meta_path)

mexp <- normalize_by_istd(mexp)
mexp <- quantitate_by_istd(mexp)

mexp <- calculate_qc_metrics(mexp)
mexp <- apply_qc_filter(mexp, CV_BQC_max = 20, SB_RATIO_min = 5, R2_min = 0.8, RQC_CURVE = 1)
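
What the normalisation, quantitation and QC filter steps compute can be sketched in base R on toy data. This is only an illustration of the concepts, not the {MiDAR} implementation: the data frame, its column names, `istd_conc` and the single-point quantitation formula are assumptions made for this example.

```r
# Toy data: 3 BQC and 3 study-sample (SPL) injections, 2 features each.
# All values are made up for illustration.
dat <- data.frame(
  qc_type        = rep(c("BQC", "BQC", "BQC", "SPL", "SPL", "SPL"), each = 2),
  feature        = rep(c("Cer 16:0", "Cer 24:1"), times = 6),
  intensity      = c(1000, 400, 1100, 900, 950, 150, 1200, 500, 800, 450, 1050, 480),
  istd_intensity = rep(c(500, 550, 480, 600, 400, 520), each = 2)
)
istd_conc <- 10  # hypothetical spiked internal standard concentration

# Normalisation: analyte response relative to its internal standard
dat$norm_intensity <- dat$intensity / dat$istd_intensity

# Single-point quantitation: scale the ratio by the known ISTD concentration
dat$conc <- dat$norm_intensity * istd_conc

# QC metric: coefficient of variation (%) of each feature across the BQC injections
cv <- function(x) 100 * sd(x) / mean(x)
bqc <- dat[dat$qc_type == "BQC", ]
cv_bqc <- vapply(split(bqc$conc, bqc$feature), cv, numeric(1))

# QC filter: keep only features with BQC CV of at most 20 %
keep <- names(cv_bqc)[cv_bqc <= 20]
dat_filtered <- dat[dat$feature %in% keep, ]

print(round(cv_bqc, 1))
print(keep)
```

In this toy set only the feature measured reproducibly in the BQC injections passes the 20 % threshold; the other feature is dropped from all samples.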

Another example: Drift smoothing

mexp <- corr_drift_loess(
  data = mexp,
  qc_types = "BQC",
  within_batch = TRUE, 
  apply_conditionally = TRUE,
  min_sample_cv_ratio_before_after = 1,
  log2_transform = TRUE,
  span = 0.75)
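
The idea behind QC-based drift smoothing can be sketched with `stats::loess()` on simulated data. This is a minimal sketch under stated assumptions, not the {MiDAR} algorithm: the simulated drift, the BQC spacing and the reading of `min_sample_cv_ratio_before_after` as "keep the correction only if the sample CV does not get worse" are assumptions for this example.

```r
set.seed(42)

# Simulate one feature over 60 injections with a downward signal drift
n         <- 60
run_order <- seq_len(n)
is_bqc    <- run_order %% 5 == 1            # every 5th injection is a BQC
drift     <- 1 - 0.005 * run_order          # signal drops over the run
intensity <- 1000 * drift * exp(rnorm(n, sd = 0.05))

# Fit the trend on log2-transformed BQC intensities only
bqc <- data.frame(x = run_order[is_bqc], y = log2(intensity[is_bqc]))
fit <- loess(y ~ x, data = bqc, span = 0.75,
             control = loess.control(surface = "direct"))  # allows extrapolation

# Predict the trend for every injection and remove it, re-centring on the BQC median
trend     <- predict(fit, newdata = data.frame(x = run_order))
corrected <- 2^(log2(intensity) - trend + median(bqc$y))

cv <- function(x) 100 * sd(x) / mean(x)
cv_bqc_before <- cv(intensity[is_bqc])
cv_bqc_after  <- cv(corrected[is_bqc])

# Conditional application (assumed meaning): keep the corrected values only if
# the CV of the study samples before/after correction is at least 1
cv_ratio <- cv(intensity[!is_bqc]) / cv(corrected[!is_bqc])
final    <- if (cv_ratio >= 1) corrected else intensity

print(round(c(before = cv_bqc_before, after = cv_bqc_after), 1))
```

Fitting on the QC samples only, and checking the effect on the study samples, guards against a correction that merely overfits the QC injections.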

QC of high-dimensional large-scale data

Lipidomics analysis of 5000 samples took ~ 6 months, of which 2-3 months were instrument time

QC Plots as sharable HTML pages

plot_runscatter(data = mexp, 
                y_var = "Intensity", 
                feature_filter = "^Cer|Hex3Cer|GM3", 
                cap_values = TRUE, 
                show_batches = TRUE,
                outputPDF = FALSE, 
                annot_scale = 1.9, point_size = 4) 
[1] "Plotting 3 pages..."
[1] "page 1"
[1] "page 2"
[1] "page 3"

Modifiable {MiDAR} outputs

library(plotly)  # provides ggplotly()

plt <- plot_pca_qc(data = mexp, variable = "Concentration", 
                        log_transform = FALSE, dim_x = 1, dim_y = 2)
ggplotly(plt)  # turn the returned plot object into an interactive plot
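
What a PCA-based QC view is meant to show can be sketched with `stats::prcomp()` on simulated concentrations. This is an illustration only, not how `plot_pca_qc()` works internally: the simulated matrices and the `qc_type` labels are assumptions for this example.

```r
set.seed(7)

# Simulate 20 features: study samples (SPL) vary biologically,
# pooled QC samples (BQC) should differ only by analytical noise
n_feat  <- 20
spl     <- matrix(rlnorm(30 * n_feat, meanlog = 5, sdlog = 0.5),  nrow = 30)
bqc     <- matrix(rlnorm(8  * n_feat, meanlog = 5, sdlog = 0.05), nrow = 8)
conc    <- rbind(spl, bqc)
qc_type <- rep(c("SPL", "BQC"), times = c(30, 8))

# PCA on log2-transformed, centred and scaled concentrations
pca    <- prcomp(log2(conc), center = TRUE, scale. = TRUE)
scores <- as.data.frame(pca$x[, 1:2])
scores$qc_type <- qc_type

# Spread of the scores along PC1 per sample type:
# tightly clustered BQC samples indicate stable analytical performance
spread <- vapply(split(scores$PC1, scores$qc_type), sd, numeric(1))
print(round(spread, 2))
```

A BQC cluster that is wide, or that drifts along a component, points to analytical rather than biological variation.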

Share data and the processing steps

print(mexp)
save(mexp, file = "D:/my-data/mexp.RData")  # base::save() needs a file name; "mexp.RData" is a placeholder

 MidarExperiment 
 
   Data:  
   • Samples:  215 
   • Features:   428 
 
   Metadata:  
   • Sample annotation:  ✓ 
   • Feature annotation:  ✓ 
   • Internal standard annotation:  ✓ 
   • Response curves annotation:  ✓ 
   • Study samples annotation:  ✗ 
 
   Processing status:  Adjusted Quantitated Data 
 
   Processing:  
   • ISTD normalized:  ✓ 
   • ISTD quantitated:  ✓ 
   • Drift corrected:  ✓ 
   • Batch corrected:  ✗ 
   • Interference (isotope) corrected:  ✗ 

Outlook

  • Development
  • Collaborations
  • UI (Shiny)

Info

Acknowledgements

  • R/Basel organizers

  • Markus R. Wenk
  • Federico Torta
  • Hyung Won Choi
  • Shanshan Ji
  • Alicia Chan
  • Jeremy Selva
  • Anne K. Bendt

This presentation was made with Quarto with elements taken from https://github.com/garthtarr/sydney_quarto/

References

Buergel, T., Steinfeldt, J., Ruyoga, G., Pietzner, M., Bizzarri, D., Vojinovic, D., … Landmesser, U. (2022). Metabolomic profiles predict individual multidisease outcomes. Nature Medicine, 28(11), 2309–2320. DOI: 10.1038/s41591-022-01980-3
Huynh, K., Lim, W.L.F., Giles, C., Jayawardana, K.S., Salim, A., Mellett, N.A., … Meikle, P.J. (2020). Concordant peripheral lipidome signatures in two large clinical studies of Alzheimer’s disease. Nature Communications, 11(1), 5698. DOI: 10.1038/s41467-020-19473-7
Jankevics, A., Lloyd, G.R., & Weber, R.J.M. (2023). pmp: Peak Matrix Processing and signal batch correction for metabolomics datasets. Manual. DOI: 10.18129/B9.bioc.pmp
Laaksonen, R., Ekroos, K., Sysi-Aho, M., Hilvo, M., Vihervaara, T., Kauhanen, D., … Lüscher, T.F. (2016). Plasma ceramides predict cardiovascular death in patients with stable coronary artery disease and acute coronary syndromes beyond LDL-cholesterol. European Heart Journal, 37(25), 1967–1976. DOI: 10.1093/eurheartj/ehw148
Mohamed, A., Molendijk, J., & Hill, M.M. (2020). lipidr: A Software Tool for Data Mining and Analysis of Lipidomics Datasets. Journal of Proteome Research, 19(7), 2890–2897. DOI: 10.1021/acs.jproteome.0c00082
Riquelme, G., Zabalegui, N., Marchi, P., Jones, C.M., & Monge, M.E. (2020). A Python-Based Pipeline for Preprocessing LC–MS Data for Untargeted Metabolomics Workflows. Metabolites, 10(10), 416. DOI: 10.3390/metabo10100416
Tan, S.H., Koh, H.W.L., Chua, J.Y., Burla, B., Ong, C.C., Teo, L.S.L., … Chan, M.Y. (2022). Variability of the Plasma Lipidome and Subclinical Coronary Atherosclerosis. Arteriosclerosis, Thrombosis, and Vascular Biology, 42(1), 100–112. DOI: 10.1161/ATVBAHA.121.316847
Teo, G., Chew, W.S., Burla, B., Herr, D., Tai, E.S., Wenk, M., … Choi, H. (2020). MRMkit: Automated Data Processing for Large-Scale Targeted Metabolomics Analysis. Analytical Chemistry, 92(20), 13677–13682. DOI: 10.1021/acs.analchem.0c03060
Wen, B., Mei, Z., Zeng, C., & Liu, S. (2017). metaX: A flexible and comprehensive software for processing metabolomics data. BMC Bioinformatics, 18(1), 183. DOI: 10.1186/s12859-017-1579-y