Batch-effect correction • midar

MiDAR currently supports median centering-based batch effect correction.

The data must be provided via a MidarExperiment object, whereby raw data that was imported or processed data can be corrected, such as intensity or conc values.

Import data

In this tutorial, we import pre-calculcated concentration values from a CSV file. This file must contain a column with batch information (batch_id), see import_data_csv() for more information.

library(midar)

myexp <- midar::MidarExperiment()

myexp <- import_data_csv(
  myexp,
  path = "simdata-u1000-sd100_7batches.csv", 
  variable_name = "conc", 
  import_metadata = TRUE)

Apply batch-centering

Now we center the concentration values of batches based on a reference qc type, the samples (ref_qc_types = "SPL") in this case. The correct_scale parameter

myexp <- correct_batch_centering(
  data = myexp, 
  variable = "conc",
  ref_qc_types = "SPL", 
  correct_scale = FALSE
)
#> ℹ Adding batch correction to `conc` data.
#> ✔ Batch median-centering of 7 batches was applied to raw concentrations of all 1 features.
#> ℹ The median CV change of all features in study samples was -30.49% (range: -30.50% to -30.50%).  The median absolute CV of all features decreased from 44.05% to 13.56%.

Next, we inspect the data before and after batch correction. We observe that the batches are now aligned. Note: If the samples or other quality control types do not follow the reference samples, they may not be appropriately corrected.

plot_runscatter(myexp, variable = "conc_before", rows_page = 1, cols_page = 1)

RunScatter plots before and after batch correction

plot_runscatter(myexp, variable = "conc", rows_page = 1, cols_page = 1)

RunScatter plots before and after batch correction

Batch-centering with variance scaling

Above, we observe that while the batches are aligned, the spread (variance) of the data points varies considerably between the batches. We can correct this by scaling the variance via correct_scale = TRUE. As shown in the plot below, the variance of the data points now appears fairly consistent across the batches.

myexp <- midar::correct_batch_centering(
  myexp, 
  ref_qc_types = "SPL", 
  variable = "conc",
  correct_scale = TRUE
)
#> ℹ Replacing previous `conc` batch correction.
#> ✔ Batch median-centering of 7 batches was applied to raw concentrations of all 1 features.
#> ℹ The median CV change of all features in study samples was -32.20% (range: -32.20% to -32.20%).  The median absolute CV of all features decreased from 44.05% to 11.85%.

plot_runscatter(myexp, variable = "conc", rows_page = 1, cols_page = 1)

RunScatter plots before and after batch correction

Export batch-corrected data

Next, we can either continue to work with the corrected data using MiDAR functions or export the data.

save_dataset_csv(
  myexp, 
  path = "batch-corrected-conc-data.csv", 
  variable = "conc", 
  filter_data = FALSE
  )
#> ✔ Concentration values for 700 analyses and 1 features have been exported to 'batch-corrected-conc-data.csv'.