Data Collaboration

New Tools from the R world and SLING



Bo Burla - Research Update Q2 2023

An Experiment

  • Process
  • Explore
  • Collaborate
  • Share

View these slides (live) on your device
https://slinghub.github.io/Talk_SLING_RU2023Q2

What actually are R and RStudio?


R

  • Statistical Programming Language

  • V1.0 in February 2000

  • R packages

    • CRAN (about 19,000 packages)

    • Bioconductor (genomics)

    • GitHub

  • FDA: R is accepted for regulatory submissions

  • R Foundation (non-profit)

RStudio and Posit PBC (formerly RStudio Inc.)

  • Graphical Interface for R (IDE)

  • Tidyverse

  • Shiny

  • RMarkdown -> R/Notebooks

  • For-Profit

Quarto - creating content…

https://quarto.org

MiDAR - An R package

  • Data and Metadata
  • Processing
  • Exploration (QC)
  • Reporting

  • Validated
  • Traceable
  • Shareable
  • Accessible

Basic Data Processing Workflow

library(midar)

panelA <- MidarExperiment()

panelA <- read_masshunter_csv(panelA, file_dir_names = data_path)
panelA <- read_msorganizer_xlm(panelA, filename = meta_path)

panelA <- normalize_by_istd(panelA)
panelA <- quantitate_by_istd(panelA)

panelA <- calculate_qc_metrics(panelA)
panelA <- apply_qc_filter(panelA,
                          CV_BQC_max = 20,
                          SB_RATIO_min = 5,
                          R2_min = 0.8,
                          RQC_CURVE = 1)
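What these steps compute can be sketched in base R on a toy table. This is not MiDAR's implementation: the column names, the invented numbers, and the single-point formula (analyte/ISTD ratio times spiked ISTD amount per sample volume) are assumptions for illustration. Only the 20 % CV threshold for BQC samples is taken from the slide.

```r
# Toy long-format data: two features measured in three pooled QC (BQC)
# injections, each with the internal standard (ISTD) signal of that injection.
# All values are invented.
dat <- data.frame(
  sample    = rep(c("BQC1", "BQC2", "BQC3"), each = 2),
  feature   = rep(c("Cer 16:0", "Cer 24:1"), times = 3),
  intensity = c(1000, 400, 1100, 900, 900, 300),
  istd_int  = c(500, 500, 550, 550, 450, 450)
)

istd_pmol <- 10  # assumed amount of ISTD spiked per sample (pmol)
sample_ul <- 5   # assumed sample volume (uL)

# Step 1: ISTD normalization (ratio of analyte signal to ISTD signal)
dat$norm_intensity <- dat$intensity / dat$istd_int

# Step 2: single-point quantitation (pmol per uL of sample)
dat$conc <- dat$norm_intensity * istd_pmol / sample_ul

# Step 3: QC metric, here the %CV of each feature across the BQC injections
cv_pct <- function(x) 100 * sd(x) / mean(x)
qc <- aggregate(conc ~ feature, data = dat, FUN = cv_pct)
names(qc)[2] <- "cv_bqc"

# Step 4: QC filter, flagging features whose BQC CV is below the threshold
qc$pass <- qc$cv_bqc < 20
print(qc)
```

In this toy set the first feature tracks its ISTD perfectly and passes, while the second one scatters and is flagged.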

Drift and Batch Correction

panelA <- corr_drift_loess(
  data = panelA,
  qc_types = "BQC",
  within_batch = TRUE, 
  apply_conditionally = TRUE,
  min_sample_cv_ratio_before_after = 1,
  log2_transform = TRUE,
  span = 0.75)
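The idea behind a QC-based LOESS drift correction can be sketched with base R's loess(). This is a simplified illustration on simulated data, not the code inside corr_drift_loess(); in particular, reading min_sample_cv_ratio_before_after as "apply only if the sample CV before divided by the CV after reaches this ratio" is an assumption based on the parameter name.

```r
set.seed(1)

# Simulated run: 60 injections with a pooled QC (BQC) every 5th position
n         <- 60
run_order <- seq_len(n)
qc_type   <- ifelse(run_order %% 5 == 1, "BQC", "SPL")
qc_type[n] <- "BQC"  # end the run on a QC so no extrapolation is needed

# One feature with a linear signal loss over the run plus random noise
intensity <- 1000 * (1 - 0.005 * run_order) * exp(rnorm(n, sd = 0.05))

# Fit the trend on log2-transformed QC samples only (span as on the slide)
is_qc   <- qc_type == "BQC"
log_int <- log2(intensity)
fit <- loess(y ~ x,
             data = data.frame(x = run_order[is_qc], y = log_int[is_qc]),
             span = 0.75)

# Predict the trend for every injection and remove it,
# re-centering on the median QC level
trend     <- predict(fit, newdata = data.frame(x = run_order))
corrected <- 2^(log_int - trend + median(log_int[is_qc]))

# Conditional application: keep the correction only if it does not
# increase the CV of the study samples
cv_pct    <- function(x) 100 * sd(x) / mean(x)
cv_before <- cv_pct(intensity[!is_qc])
cv_after  <- cv_pct(corrected[!is_qc])
final     <- if (cv_before / cv_after >= 1) corrected else intensity
```

Fitting on QC samples only means that biological differences between study samples cannot be mistaken for drift.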

Plots as web pages

plot_runscatter(data = panelA, 
                y_var = "Intensity", 
                feature_filter = "^Cer|Hex3Cer|GM3", 
                cap_values = TRUE, 
                show_batches = TRUE,
                outputPDF = FALSE, 
                annot_scale = 1.9, point_size = 4) 
[1] "Plotting 3 pages..."
[1] "page 1"
[1] "page 2"
[1] "page 3"
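The feature_filter value reads like a regular expression. Assuming it is matched against feature names the way grepl() would do it (an assumption, and the lipid names below are made up), the selection behaves as follows:

```r
feature_filter <- "^Cer|Hex3Cer|GM3"

# Hypothetical feature names for illustration
features <- c("Cer d18:1/16:0", "Hex3Cer d18:1/16:0", "GM3 d18:1/24:1",
              "HexCer d18:1/16:0", "PC 34:1")

# "^" anchors only the first alternative: names must START with "Cer",
# while "Hex3Cer" and "GM3" may match anywhere in the name
keep     <- grepl(feature_filter, features)
selected <- features[keep]
print(selected)
```

Here "HexCer d18:1/16:0" is not selected, because it neither starts with "Cer" nor contains "Hex3Cer" or "GM3".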

Interactive Plots

library(plotly)  # provides ggplotly()

plotA <- plot_pca_qc(data = panelA, variable = "Concentration", 
                        log_transform = FALSE, dim_x = 1, dim_y = 2)
ggplotly(plotA)  # turn the static ggplot into an interactive widget
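The same pattern works for any ggplot object. A minimal sketch with simulated concentrations and base R's prcomp() follows; the data, column names, and the scaling choice are assumptions, and this is not what plot_pca_qc() does internally.

```r
set.seed(42)

# Simulated concentration matrix: 20 samples x 5 features
conc <- matrix(rlnorm(100, meanlog = 2, sdlog = 0.3), nrow = 20,
               dimnames = list(paste0("S", 1:20), paste0("F", 1:5)))
qc_type <- rep(c("BQC", "SPL"), times = c(5, 15))

# PCA on centered and scaled data, keeping the first two components
pca <- prcomp(conc, center = TRUE, scale. = TRUE)
scores <- data.frame(sample  = rownames(conc),
                     qc_type = qc_type,
                     PC1     = pca$x[, 1],
                     PC2     = pca$x[, 2])

# Build the plot only if ggplot2 and plotly are installed
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("plotly", quietly = TRUE)) {
  p <- ggplot2::ggplot(scores,
                       ggplot2::aes(x = PC1, y = PC2,
                                    colour = qc_type, text = sample)) +
    ggplot2::geom_point(size = 3)
  # ggplotly() converts the ggplot into an HTML widget with hover tooltips
  widget <- plotly::ggplotly(p, tooltip = "text")
}
```

Printing widget in an interactive session, or including it in a Quarto document rendered to HTML, shows the sample name on hover.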

Share data and process

info(panelA)
save(panelA, file = "D:/my-data/panelA.RData")  # save() needs file =; the file name is illustrative

 MidarExperiment 
 
   Data:  
   • Samples:  215 
   • Features:   428 
 
   Metadata:  
   • Sample annotation:  ✓ 
   • Feature annotation:  ✓ 
   • Internal standard annotation:  ✓ 
   • Response curves annotation:  ✓ 
   • Study samples annotation:  ✗ 
 
   Processing status:  Adjusted Quantitated Data 
 
   Processing:  
   • ISTD normalized:  ✓ 
   • ISTD quantitated:  ✓ 
   • Drift corrected:  ✓ 
   • Batch corrected:  ✗ 
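Sharing one object that carries data, metadata, and processing status can be sketched with base R serialization. The list below is only a stand-in for a MidarExperiment, and the file name is hypothetical.

```r
# Stand-in for a processed experiment object (invented content)
experiment <- list(
  data   = data.frame(sample = c("S1", "S2"), conc = c(1.2, 3.4)),
  status = "Adjusted Quantitated Data"
)

# Write the object to a single file that can be sent to a collaborator
path <- file.path(tempdir(), "panelA.rds")
saveRDS(experiment, file = path)

# The collaborator restores it under any name they like
restored <- readRDS(path)
identical(experiment, restored)
```

Unlike save(), saveRDS() stores one object without its name, so loading it cannot overwrite a variable in the recipient's workspace.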

QA/QC … of scripts and tools!

  • At design
  • At programming
  • Make packages

  • Automated testing
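Automated testing of a small helper might look like the following sketch. The function cv_pct() is a hypothetical example, and the tests run only if the testthat package is installed.

```r
# Hypothetical helper: percent coefficient of variation
cv_pct <- function(x) {
  stopifnot(is.numeric(x), length(x) >= 2)
  100 * sd(x) / mean(x)
}

if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("cv_pct returns the percent coefficient of variation", {
    testthat::expect_equal(cv_pct(c(10, 10, 10)), 0)   # no spread
    testthat::expect_equal(cv_pct(c(9, 10, 11)), 10)   # sd = 1, mean = 10
    testthat::expect_error(cv_pct("a"))                # non-numeric input
  })
}
```

In a package, such tests live under tests/testthat/ and run on every check, so a change that breaks a calculation is caught before results are shared.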

Thank You 🙂