{MiDAR}

Small Molecule Mass Spectrometry Large-Scale Data Processing, Quality Control, Analysis and Reporting

R/Basel 2023

Bo Burla - Singapore Lipidomics Incubator - National University of Singapore

SLING

Singapore Lipidomics Incubator

State-of-the-art facility (mass spec, automation, data processes)
Strong base of skilled & well-trained staff
Trusted partner for clinical cohort studies
Clinical translation of assays
NUS-Agilent Hub (Video)
Conferences (isls11.eventengage.live)
Prof Markus R Wenk & Dr Anne K Bendt

More info: sling.sg

Big Interest in Small Molecule Analyses

Metabolites - Lipids - Signalling molecules - Natural Products - Drugs

Mass Spectrometry-Based Workflow

Available Tools for Data Post-Processing

Issues

Limited and static functions
By and for bioinformaticians
Black boxes for the end-user
Assuming ‘perfect’ analyses and data
Manual (meta)data preparation often required

Analytical ‘Reality’

Project Diversity
Methods/Platform Diversity
Analytical performance/failures
Mistakes
Communication between wet and dry lab
Processing scripts rarely published

Commercial tools

MS vendor software
EXCEL / ‘in-house’ R

Packages for lipidomics

lipidr Mohamed et al. (2020)

Packages for Metabolomics

pmp Jankevics et al. (2023)
metaX Wen et al. (2017)
tidyMS Riquelme et al. (2020)

Proteomics/Transcriptomics

Diverse Bioconductor packages

Data and Metadata

Data

Vendor peak integration tool outputs
Open source tools (e.g. Teo et al. (2020))
Generic formats (incl. pre-processed data)

Metadata

Analyses, samples, features, calibration, experimental groups, etc.
CSV or XLS(M) templates (github/slinghub)
LIMS

{MiDAR} - Enabling data processing by the lab

For

Bioanalytical scientists and
Data scientists
Publishing

Aims

Supervised automation
Flexible modular steps
Standardized and reproducible
Validated
Data and Process sharing
Tool and library

Workflow

Lab people analyse data with {MiDAR}
Lab people share {MiDAR} project file
Data people support and amend
{MiDAR} project returns to the lab people

Challenges

Learning R
User acceptance
Scripts vs UI
Resources for SW development

Basic Workflow

library(midar)

mexp <- MidarExperiment()

mexp <- read_masshunter_csv(mexp, file_dir_names = data_path)
mexp <- read_msorganizer_xlm(mexp, filename = meta_path)

mexp <- normalize_by_istd(mexp)
mexp <- quantitate_by_istd(mexp)

mexp <- calculate_qc_metrics(mexp)
mexp <- apply_qc_filter(mexp, CV_BQC_max = 20, SB_RATIO_min = 5, R2_min = 0.8, RQC_CURVE = 1)
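To make the normalization and QC filter steps more concrete, here is a minimal base-R sketch of the underlying arithmetic. It is not MiDAR's implementation: the toy data, the column names and the `cv_pct` helper are assumptions for illustration, and only the 20% CV threshold mirrors the `CV_BQC_max = 20` argument above.

```r
# Toy data: 2 features x 6 analyses; all names and values are made up
raw <- data.frame(
  analysis_id    = rep(paste0("A", 1:6), times = 2),
  qc_type        = rep(c("BQC", "SPL", "BQC", "SPL", "BQC", "SPL"), times = 2),
  feature        = rep(c("Cer d18:1/16:0", "Cer d18:1/24:0"), each = 6),
  intensity      = c(1000, 1500, 1100, 1400, 1050, 1600,
                     200, 900, 400, 850, 100, 950),
  istd_intensity = rep(c(500, 520, 510, 490, 505, 515), times = 2)
)

# ISTD normalization: analyte signal divided by internal standard signal
raw$norm_intensity <- raw$intensity / raw$istd_intensity

# Coefficient of variation in percent (hypothetical helper)
cv_pct <- function(x) 100 * sd(x) / mean(x)

# CV per feature, computed on the batch QC (BQC) injections only
bqc    <- raw[raw$qc_type == "BQC", ]
cv_bqc <- tapply(bqc$norm_intensity, bqc$feature, cv_pct)

# Keep only features whose BQC CV is at most 20%
keep     <- names(cv_bqc)[cv_bqc <= 20]
filtered <- raw[raw$feature %in% keep, ]

print(round(cv_bqc, 1))
print(keep)
```

In this toy set the second feature varies strongly across the BQC injections and is dropped, while the first is retained.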

Another example: Drift smoothing

mexp <- corr_drift_loess(
  data = mexp,
  qc_types = "BQC",
  within_batch = TRUE, 
  apply_conditionally = TRUE,
  min_sample_cv_ratio_before_after = 1,
  log2_transform = TRUE,
  span = 0.75)
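The idea behind QC-based drift smoothing can be sketched with base R's `loess()`. This is an illustration on simulated data, not the code behind `corr_drift_loess()`: the simulation, the variable names and the reading of `min_sample_cv_ratio_before_after` as "apply only if the sample CV does not get worse" are assumptions.

```r
set.seed(1)

# Simulated run of 60 injections with a downward signal drift
n         <- 60
run_order <- seq_len(n)
qc_type   <- ifelse(run_order %% 5 == 1, "BQC", "SPL")  # every 5th injection is a BQC
intensity <- 1000 * (1 - 0.005 * run_order) * exp(rnorm(n, sd = 0.03))

# Work on the log2 scale, as with log2_transform = TRUE
y      <- log2(intensity)
is_bqc <- qc_type == "BQC"

# Fit the trend on the BQC injections only; surface = "direct" allows
# prediction for injections after the last BQC
fit <- loess(y ~ x,
             data    = data.frame(x = run_order[is_bqc], y = y[is_bqc]),
             span    = 0.75,
             control = loess.control(surface = "direct"))
trend <- predict(fit, newdata = data.frame(x = run_order))

# Remove the trend and re-centre on the BQC median, then back-transform
corrected <- 2^(y - trend + median(y[is_bqc]))

# Conditional application: keep the correction only if the CV of the
# study samples does not increase (ratio before/after >= 1)
cv_pct    <- function(v) 100 * sd(v) / mean(v)
cv_before <- cv_pct(intensity[!is_bqc])
cv_after  <- cv_pct(corrected[!is_bqc])
final     <- if (cv_before / cv_after >= 1) corrected else intensity
```

Fitting on the BQCs and judging the result on the study samples keeps the check independent of the data used for the fit.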

QC of high-dimensional, large-scale data

Lipidomics analysis of 5000 samples took ~ 6 months, of which 2-3 months were instrument time

QC Plots as sharable HTML pages

plot_runscatter(data = mexp, 
                y_var = "Intensity", 
                feature_filter = "^Cer|Hex3Cer|GM3", 
                cap_values = TRUE, 
                show_batches = TRUE,
                outputPDF = FALSE, 
                annot_scale = 1.9, point_size = 4)

[1] "Plotting 3 pages..."
[1] "page 1"

[1] "page 2"

[1] "page 3"

Modifiable {MiDAR} outputs

library(plotly)  # provides ggplotly()

plt <- plot_pca_qc(data = mexp, variable = "Concentration", 
                        log_transform = FALSE, dim_x = 1, dim_y = 2)
ggplotly(plt)  # convert the returned plot into an interactive widget
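What a PCA-based QC view typically looks for can be sketched with base R's `prcomp()`. The simulated matrix and all names below are assumptions for illustration and say nothing about how `plot_pca_qc()` is implemented.

```r
set.seed(42)

# Simulated data: 30 study samples with biological spread and
# 8 pooled QC (BQC) injections with technical spread only
n_feat  <- 20
spl     <- matrix(rnorm(30 * n_feat, mean = 10, sd = 1),   nrow = 30)
bqc     <- matrix(rnorm(8  * n_feat, mean = 10, sd = 0.1), nrow = 8)
x       <- rbind(spl, bqc)
qc_type <- c(rep("SPL", 30), rep("BQC", 8))

# PCA on centred and scaled features
pca    <- prcomp(x, center = TRUE, scale. = TRUE)
scores <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2], qc_type = qc_type)

# In a well-behaved run the BQCs cluster tightly compared with the samples
spread <- tapply(scores$PC1, scores$qc_type, sd)
print(round(spread, 2))
```

A scores table like this can be handed to any plotting function and modified further, which is the point of returning plot objects rather than fixed images.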

Share data and the processing steps

print(mexp)
save(mexp, "D:/my-data/")


 MidarExperiment 
 
   Data:  
   • Samples:  215 
   • Features:   428 
 
   Metadata:  
   • Sample annotation:  ✓ 
   • Feature annotation:  ✓ 
   • Internal standard annotation:  ✓ 
   • Response curves annotation:  ✓ 
   • Study samples annotation:  ✗ 
 
   Processing status:  Adjusted Quantitated Data 
 
   Processing:  
   • ISTD normalized:  ✓ 
   • ISTD quantitated:  ✓ 
   • Drift corrected:  ✓ 
   • Batch corrected:  ✗ 
   • Interference (isotope) corrected:  ✗
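One generic way to hand a processed R object, together with a record of its processing steps, from the lab to the data team is base R serialization. The sketch below uses a hypothetical stand-in list, not a real MidarExperiment, and MiDAR's own project file mechanism may differ.

```r
# Hypothetical stand-in for a processed experiment object
project <- list(
  data = data.frame(analysis_id = c("A1", "A2"), conc = c(1.2, 3.4)),
  processing_log = c("ISTD normalized", "ISTD quantitated", "Drift corrected")
)

# Write to a single file that can be shared, then read it back
path <- tempfile(fileext = ".rds")
saveRDS(project, path)
restored <- readRDS(path)

# The restored object carries the same data and processing record
same <- isTRUE(all.equal(project, restored))
print(same)
```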

Outlook

Development
Collaborations
UI (Shiny)

Info

Bo Burla bo.burla@nus.edu.sg
Slides: https://slinghub.github.io/rbasel2023/

Acknowledgements

R/Basel organizers
Markus R. Wenk
Federico Torta
Hyung Won Choi
Shanshan Ji
Alicia Chan
Jeremy Selva
Anne K. Bendt

This presentation was made with Quarto with elements taken from https://github.com/garthtarr/sydney_quarto/

References

Buergel, T., Steinfeldt, J., Ruyoga, G., Pietzner, M., Bizzarri, D., Vojinovic, D., … Landmesser, U. (2022). Metabolomic profiles predict individual multidisease outcomes. Nature Medicine, 28(11), 2309–2320. DOI: 10.1038/s41591-022-01980-3

Huynh, K., Lim, W.L.F., Giles, C., Jayawardana, K.S., Salim, A., Mellett, N.A., … Meikle, P.J. (2020). Concordant peripheral lipidome signatures in two large clinical studies of Alzheimer’s disease. Nature Communications, 11(1), 5698. DOI: 10.1038/s41467-020-19473-7

Jankevics, A., Lloyd, G.R., & Weber, R.J.M. (2023). pmp: Peak Matrix Processing and signal batch correction for metabolomics datasets. Manual. DOI: 10.18129/B9.bioc.pmp

Laaksonen, R., Ekroos, K., Sysi-Aho, M., Hilvo, M., Vihervaara, T., Kauhanen, D., … Lüscher, T.F. (2016). Plasma ceramides predict cardiovascular death in patients with stable coronary artery disease and acute coronary syndromes beyond LDL-cholesterol. European Heart Journal, 37(25), 1967–1976. DOI: 10.1093/eurheartj/ehw148

Mohamed, A., Molendijk, J., & Hill, M.M. (2020). lipidr: A Software Tool for Data Mining and Analysis of Lipidomics Datasets. Journal of Proteome Research, 19(7), 2890–2897. DOI: 10.1021/acs.jproteome.0c00082

Riquelme, G., Zabalegui, N., Marchi, P., Jones, C.M., & Monge, M.E. (2020). A Python-Based Pipeline for Preprocessing LC–MS Data for Untargeted Metabolomics Workflows. Metabolites, 10(10), 416. DOI: 10.3390/metabo10100416

Tan, S.H., Koh, H.W.L., Chua, J.Y., Burla, B., Ong, C.C., Teo, L.S.L., … Chan, M.Y. (2022). Variability of the Plasma Lipidome and Subclinical Coronary Atherosclerosis. Arteriosclerosis, Thrombosis, and Vascular Biology, 42(1), 100–112. DOI: 10.1161/ATVBAHA.121.316847

Teo, G., Chew, W.S., Burla, B., Herr, D., Tai, E.S., Wenk, M., … Choi, H. (2020). MRMkit: Automated Data Processing for Large-Scale Targeted Metabolomics Analysis. Analytical Chemistry, 92(20), 13677–13682. DOI: 10.1021/acs.analchem.0c03060

Wen, B., Mei, Z., Zeng, C., & Liu, S. (2017). metaX: A flexible and comprehensive software for processing metabolomics data. BMC Bioinformatics, 18(1), 183. DOI: 10.1186/s12859-017-1579-y