Importing analytical data

Analytical data, i.e. preprocessed data from mass spectrometry experiments, can be imported from different sources.

All data files in a folder can also be imported and merged. This is useful when raw data processing is split into batches that produce separate result files.

Data Sources

The following formats are currently supported:

Source	MiDAR function	Details	Extension
Agilent MassHunter	`import_data_masshunter()`	Flat and nested tables from MassHunter Quant.	.csv
MRMkit	`import_data_mrmkit()`	Long-format output	.csv or .tsv
Skyline	`import_data_skyline()`	Skyline Small Molecule Transition Results (long format)	.csv
Generic wide CSV	`import_data_csv_wide()`	Samples/analyses as rows and features as columns. Can contain columns with sample-specific annotations.	.csv
Generic long CSV	`import_data_csv_long()`	Long-format table with each row being a unique `analysis_id` and `feature_id` pair and additional columns representing feature variables and sample/method information	.csv

Metadata within analytical results

When the analytical results contain metadata, such as sample and feature annotations, these can be imported as metadata in the MidarExperiment object as well. The imported metadata is checked for integrity and consistency (see TODO) and then added to the annotation tables within the MidarExperiment. To include available metadata, set the argument import_metadata = TRUE.

MRMkit

Output files from MRMkit, an open-source peak integration software for MRM data, can be imported directly. Specific metadata present in the data file can be imported as well (`import_metadata = TRUE`).

library(midar)
filepath <- system.file("extdata/MRMkit_demo.tsv", package = "midar")
myexp <- MidarExperiment()

myexp <- import_data_mrmkit(myexp, filepath, import_metadata = TRUE)

Agilent MassHunter Quantitative

Peak integration results exported from Agilent MassHunter Quant in CSV format can be imported. Samples must be in rows and features in columns. Import of qualifier results is supported. Sample, method, and result metadata present in the files can also be imported (`import_metadata = TRUE`).

filepath <- system.file("extdata/MHQuant_demo.csv", package = "midar")
myexp <- MidarExperiment()

myexp <- import_data_masshunter(myexp, filepath, import_metadata = TRUE)

Skyline Molecule Transition Results

Small molecule peak integration results from Skyline can be imported using long-format CSV reports. To uniquely define feature_ids, the CSV file must include the Molecule Name along with either the corresponding precursor and product m/z values, or their names. The analysis_id is mapped from the Replicate Name column, which must also be present in the CSV file.

During the import process, unique feature_ids are generated by appending either the precursor/product names or m/z values to the Molecule Name, unless the Molecule Name alone uniquely identifies the features. This behavior is controlled by the transition_id_columns argument. Sample, method, and result metadata present in the files can also be imported by setting import_metadata = TRUE. For more details, refer to the documentation of the import_data_skyline() function.
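The ID generation described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code used by import_data_skyline(): the helper make_feature_id(), the example rows, and the "precursor>product" separator are all hypothetical.

```r
# Hypothetical rows resembling a Skyline "Molecule Transition Results" export
skyline <- data.frame(
  `Replicate Name` = c("S1", "S1", "S1", "S2", "S2", "S2"),
  `Molecule Name`  = rep(c("Cer d18:1/16:0", "Cer d18:1/16:0", "S1P d18:1"), 2),
  `Precursor Mz`   = rep(c(538.5, 538.5, 380.3), 2),
  `Product Mz`     = rep(c(264.3, 282.3, 264.3), 2),
  Area             = c(1000, 400, 250, 1100, 420, 260),
  check.names = FALSE
)

# Hypothetical helper: return the molecule name as the feature ID when it is
# already unique per transition, otherwise append the precursor/product m/z.
make_feature_id <- function(molecule, precursor, product) {
  transitions <- unique(data.frame(molecule, precursor, product))
  if (all(table(transitions$molecule) == 1)) {
    return(molecule)
  }
  # The separator format is an assumption made for this sketch
  paste0(molecule, " ", precursor, ">", product)
}

skyline$analysis_id <- skyline[["Replicate Name"]]
skyline$feature_id <- make_feature_id(
  skyline[["Molecule Name"]],
  skyline[["Precursor Mz"]],
  skyline[["Product Mz"]]
)
print(unique(skyline$feature_id))
```

Here the ceramide has two transitions, so the molecule name alone is ambiguous and the m/z values are appended to every feature.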

To export results from Skyline, navigate to “File” > “Export” and select the “Molecule Transition Results” format. Ensure your export includes Replicate Name, Molecule Name, and either Precursor Mz and Product Mz, or Precursor Name and Product Name. Also export at least one feature variable, such as Area or Retention Time (RT).

filepath <- system.file("extdata/Skyline_MoleculeTransitionResults.csv", package = "midar")
myexp <- MidarExperiment()

myexp <- import_data_skyline(myexp,
                             filepath,
                             import_metadata = TRUE,
                             transition_id_columns = "mz")

Generic Wide-Format CSV Files

Analysis results, whether raw intensities (e.g., peak areas) or preprocessed data (e.g., concentrations), can be provided as plain wide-format CSV tables. In these tables, analyses (samples) should be arranged in rows, and features in columns. The specific data type in the table (e.g., area or concentration) is defined using the variable_name argument.

filepath <- system.file("extdata/plain_wide_dataset.csv", package = "midar")
myexp <- MidarExperiment()

myexp <- midar::import_data_csv_wide(
  myexp, 
  path = filepath,
  variable_name = "area", 
  import_metadata = TRUE)
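To make the expected layout concrete, the following base R sketch builds a small wide-format table and reshapes it into one row per analysis and feature. The sample names, feature names, and values are made up, and the reshaping only illustrates the relationship between the two layouts; it is not how import_data_csv_wide() works internally.

```r
# Hypothetical wide-format table: one row per analysis, one column per feature
wide <- data.frame(
  analysis_id = c("S1", "S2"),
  `PC 32:0`   = c(100, 120),
  `PC 34:1`   = c(200, 210),
  check.names = FALSE
)

# All columns except the ID column are treated as features in this sketch
feature_cols <- setdiff(names(wide), "analysis_id")

# Equivalent long-format table; the values are peak areas here, which is what
# variable_name = "area" would declare in the import call
long <- data.frame(
  analysis_id  = rep(wide$analysis_id, times = length(feature_cols)),
  feature_id   = rep(feature_cols, each = nrow(wide)),
  feature_area = unlist(wide[feature_cols], use.names = FALSE)
)
print(long)
```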

Generic Long-Format CSV Files

Analysis results containing various feature variables (e.g., peak areas, retention times) can be provided as generic long-format CSV tables. In this format, each row represents one feature in one analysis, i.e. a unique analysis_id and feature_id pair, while additional columns hold feature variables as well as sample- or method-related metadata.

This long-format structure is a common export format supported by many vendor and open-source software tools for both targeted and untargeted raw data processing.

By default, the CSV file must include at least the following columns: analysis_id, feature_id, and one feature variable column such as area. Additional metadata and feature variable columns are also supported; refer to the documentation of import_data_csv_long() for further details.

If your CSV file uses different column names, you can import it by specifying a column name mapping. This mapping associates the MiDAR expected column names with the corresponding column names in your CSV file (refer to the MiDAR Manual for details).

The column mapping is defined as a named vector, where the names correspond to the MiDAR column names and the values correspond to the column names in your CSV.

file_path <- system.file("extdata", "plain_long_dataset.csv", package = "midar")

mexp <- MidarExperiment()

# Define a column mapping: names (left) are the MiDAR column names,
# values (right) are the column names in the CSV file
col_map <- c(
  "analysis_id" = "raw_data_filename",
  "qc_type" = "qc_type",
  "feature_id" = "feature_id",
  "feature_class" = "feature_class",
  "istd_feature_id" = "istd_feature_id",
  "feature_rt" = "rt",
  "feature_area" = "area")

# Import data
mexp <- import_data_csv_long(
  data = mexp,
  path = file_path,
  column_mapping = col_map,
  import_metadata = TRUE)
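To illustrate what such a named-vector mapping expresses, the sketch below applies one to a small data frame using base R only. The example table is hypothetical, and the renaming logic is an assumption made for illustration, not the implementation inside import_data_csv_long().

```r
# Hypothetical table as it might be read from a user's CSV file
csv_data <- data.frame(
  raw_data_filename = c("run_001", "run_001"),
  feature_id        = c("PC 32:0", "PC 34:1"),
  rt                = c(5.1, 5.6),
  area              = c(100, 200)
)

# Names are the MiDAR column names, values are the column names in the CSV
col_map <- c(
  analysis_id  = "raw_data_filename",
  feature_id   = "feature_id",
  feature_rt   = "rt",
  feature_area = "area"
)

# Fail early if the mapping refers to columns that are not in the table
missing_cols <- setdiff(col_map, names(csv_data))
if (length(missing_cols) > 0) {
  stop("Columns not found in CSV: ", paste(missing_cols, collapse = ", "))
}

# Rename the mapped columns to the MiDAR column names
names(csv_data)[match(col_map, names(csv_data))] <- names(col_map)
print(names(csv_data))
```

Each MiDAR column name should appear only once in the mapping, since a repeated name would make the assignment ambiguous.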

Importing and Merging Multiple Files

Multiple data files can be imported and merged. Users can either provide a list of file paths or specify a folder path to import all data files within that directory. This support for multiple files is useful when raw data processing is divided into batches, leading to separate result files.

The merged data is checked for consistency to ensure that each analysis ID and feature ID pair is unique. The same feature therefore cannot be reported more than once for the same analysis, which can happen, for example, if the same feature in the same sample was integrated in different raw data processing batches.
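The merge and the uniqueness check can be sketched with base R alone. The example below writes two hypothetical batch files to a temporary folder, merges them, and flags analysis/feature pairs that occur more than once. It only illustrates the idea and is not the check performed by the MiDAR import functions; all file names and values are invented.

```r
# Create a temporary folder holding two hypothetical batch result files
batch_dir <- tempfile("batches_")
dir.create(batch_dir)

batch1 <- data.frame(analysis_id  = c("S1", "S2"),
                     feature_id   = c("PC 32:0", "PC 32:0"),
                     feature_area = c(100, 120))
# "S1" / "PC 32:0" was integrated again in the second batch
batch2 <- data.frame(analysis_id  = c("S1", "S3"),
                     feature_id   = c("PC 32:0", "PC 32:0"),
                     feature_area = c(101, 130))

write.csv(batch1, file.path(batch_dir, "batch1.csv"), row.names = FALSE)
write.csv(batch2, file.path(batch_dir, "batch2.csv"), row.names = FALSE)

# Read every CSV file in the folder and stack the tables
files  <- list.files(batch_dir, pattern = "\\.csv$", full.names = TRUE)
merged <- do.call(rbind, lapply(files, read.csv))

# Flag every row whose analysis_id/feature_id pair occurs more than once
keys <- merged[c("analysis_id", "feature_id")]
dup  <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

if (any(dup)) {
  message("Duplicated analysis/feature pairs found:")
  print(merged[dup, ])
}
```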