Analytical data, i.e. preprocessed data from mass spectrometry experiments, can be imported from different sources.
Data files in a folder can also be imported and merged. This is useful when raw data processing is split into batches, resulting in separate result files.
Data Sources
The following formats are currently supported:
Source | MiDAR function | Details | Extension |
---|---|---|---|
Agilent MassHunter | `import_data_masshunter()` | Flat and nested tables from MassHunter Quant. | .csv |
MRMkit | `import_data_mrmkit()` | Long-format output | .csv or .tsv |
Skyline | `import_data_skyline()` | Skyline Small Molecule Transition Results (long format) | .csv |
Generic wide CSV | `import_data_csv_wide()` | Samples/analyses as rows and features as columns. Can contain columns with sample annotations. | .csv |
Generic long CSV | `import_data_csv_long()` | Long-format table in which each row is a unique `analysis_id` and `feature_id` pair, with additional columns for feature variables and sample/method information | .csv |
Metadata within analytical results
When the analytical results contain metadata, such as sample and feature annotations, these can also be imported as metadata into the `MidarExperiment` object. The imported metadata is checked for integrity and consistency (see TODO) and then added to the annotation tables within the `MidarExperiment`. To include available metadata, set the argument `import_metadata = TRUE`.
MRMkit
Output files from MRMkit, an open-source peak integration software for MRM data, can be imported directly. Specific metadata present in the data file can be imported as well (`import_metadata = TRUE`).
library(midar)
filepath <- system.file("extdata/MRMkit_demo.tsv", package = "midar")
myexp <- MidarExperiment()
myexp <- import_data_mrmkit(myexp, filepath, import_metadata = TRUE)
Agilent MassHunter Quantitative
Peak integration results exported from Agilent MassHunter Quant in CSV format can be imported. Samples must be in rows and features in columns. Import of qualifier results is supported. Sample, method and result metadata present in the files can also be imported (`import_metadata = TRUE`).
filepath <- system.file("extdata/MHQuant_demo.csv", package = "midar")
myexp <- MidarExperiment()
myexp <- import_data_masshunter(myexp, filepath, import_metadata = TRUE)
Skyline Molecule Transition Results
Small molecule peak integration results from Skyline can be imported using long-format CSV reports.
To uniquely define `feature_id`s, the CSV file must include the `Molecule Name` along with either the corresponding precursor and product m/z values, or their names. The `analysis_id` is mapped from the `Replicate Name` column, which must also be present in the CSV file. During the import process, unique `feature_id`s are generated by appending either the precursor/product names or m/z values to the `Molecule Name`, unless the `Molecule Name` alone uniquely identifies the features. This behavior is controlled by the `transition_id_columns` argument described below. Sample, method, and result metadata present in the files can also be imported by setting `import_metadata = TRUE`. For more details, refer to the documentation of the `import_data_skyline()` function.
To export results from Skyline, navigate to “File” > “Export” and select the 'Molecule Transition Results' format. Ensure your export includes `Replicate Name`, `Molecule Name`, and either `Precursor Mz` and `Product Mz`, or `Precursor Name` and `Product Name`. Also export at least one feature variable, such as `Area` or `Retention Time (RT)`.
filepath <- system.file("extdata/Skyline_MoleculeTransitionResults.csv", package = "midar")
myexp <- MidarExperiment()
myexp <- import_data_skyline(
  myexp,
  filepath,
  import_metadata = TRUE,
  transition_id_columns = "mz"
)
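The way transition details can disambiguate a molecule name is sketched below in base R. This is only an illustration of the idea described above, not the code used by `import_data_skyline()`: the helper `make_feature_id()`, the example values, and the `name precursor>product` label format are all assumptions made for this example.

```r
# Hypothetical Skyline-style long-format report (values are made up)
report <- data.frame(
  `Replicate Name` = c("S1", "S1", "S2", "S2"),
  `Molecule Name`  = c("PC 34:1", "PC 34:1", "PC 34:1", "PC 34:1"),
  `Precursor Mz`   = c(760.6, 760.6, 760.6, 760.6),
  `Product Mz`     = c(184.1, 86.1, 184.1, 86.1),
  Area             = c(1200, 300, 1500, 350),
  check.names = FALSE
)

# make_feature_id() is an illustrative helper, not a midar function
make_feature_id <- function(df) {
  key <- paste(df[["Replicate Name"]], df[["Molecule Name"]], sep = "|")
  if (!anyDuplicated(key)) {
    # The molecule name alone identifies each feature within a replicate
    return(as.character(df[["Molecule Name"]]))
  }
  # Otherwise append the transition m/z values (assumed label format)
  paste0(df[["Molecule Name"]], " ", df[["Precursor Mz"]], ">", df[["Product Mz"]])
}

report$analysis_id <- report[["Replicate Name"]]
report$feature_id <- make_feature_id(report)
unique(report$feature_id)
```

In this example the molecule name occurs twice per replicate, so the m/z values are appended and each replicate ends up with two distinct feature IDs.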
Generic Wide-Format CSV Files
Analysis results, whether raw intensities (e.g., peak areas) or preprocessed data (e.g., concentrations), can be provided as plain wide-format CSV tables. In these tables, analyses (samples) should be arranged in rows, and features in columns. The specific data type in the table (e.g., area or concentration) is defined using the `variable_name` argument.
filepath <- system.file("extdata/plain_wide_dataset.csv", package = "midar")
myexp <- MidarExperiment()
myexp <- midar::import_data_csv_wide(
  myexp,
  path = filepath,
  variable_name = "area",
  import_metadata = TRUE
)
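To make the wide layout concrete, the following base R sketch builds a small wide table and stacks it into one row per analysis and feature. It shows why a single variable name is needed for a wide table, since all feature columns hold the same kind of value. The table contents and the output column name `feature_area` are assumptions for this example, and the code does not represent how midar performs the conversion internally.

```r
# Hypothetical wide table: one row per analysis, one column per feature
wide <- data.frame(
  analysis_id = c("S1", "S2"),
  `PC 34:1` = c(1200, 1500),
  `PC 36:2` = c(800, 950),
  check.names = FALSE
)

# Every column except the analysis identifier is treated as a feature
feature_cols <- setdiff(names(wide), "analysis_id")

# Stack the feature columns into a single value column
long <- data.frame(
  analysis_id  = rep(wide$analysis_id, times = length(feature_cols)),
  feature_id   = rep(feature_cols, each = nrow(wide)),
  feature_area = unlist(wide[feature_cols], use.names = FALSE)
)
long
```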
Generic Long-Format CSV Files
Analysis results containing various feature variables (e.g., peak areas, retention times) can be provided as generic long-format CSV tables. In this format, each row represents a unique observation of a feature-value pair for a given sample, while additional columns capture feature variables as well as sample- or method-related metadata.
This long-format structure is a common export format supported by many vendor and open-source software tools for both targeted and untargeted raw data processing.
By default, the CSV file must include at least the following columns: `analysis_id`, `feature_id`, and one feature variable column such as `area`. Additional metadata and feature variable columns are also supported; please refer to the documentation for `import_data_csv_long()` for further details.
If your CSV file uses different column names, you can import it by specifying a column name mapping. This mapping associates the MiDAR expected column names with the corresponding column names in your CSV file (refer to the MiDAR Manual for details).
The column mapping is defined as a named vector, where the names correspond to the MiDAR column names and the values correspond to the column names in your CSV.
file_path <- system.file("extdata", "plain_long_dataset.csv", package = "midar")
mexp <- MidarExperiment()
# Define a column mapping: names are the MiDAR column names,
# values are the column names in the CSV file
col_map <- c(
  "analysis_id" = "raw_data_filename",
  "qc_type" = "qc_type",
  "feature_id" = "feature_id",
  "feature_class" = "feature_class",
  "istd_feature_id" = "istd_feature_id",
  "feature_rt" = "rt",
  "feature_area" = "area"
)
# Import data
mexp <- import_data_csv_long(
  data = mexp,
  path = file_path,
  column_mapping = col_map,
  import_metadata = TRUE
)
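The effect of such a named-vector mapping can be illustrated with a few lines of base R. The sketch below selects and renames columns of a small made-up table. The table `raw` and the mapping `demo_map` are hypothetical, and this is not the code that `import_data_csv_long()` runs.

```r
# Hypothetical long-format table with software-specific column names
raw <- data.frame(
  raw_data_filename = c("S1", "S1"),
  feature_id = c("PC 34:1", "PC 36:2"),
  rt = c(5.2, 6.1),
  area = c(1200, 800)
)

# Names are the target column names, values are the names used in the file
demo_map <- c(
  "analysis_id" = "raw_data_filename",
  "feature_id" = "feature_id",
  "feature_rt" = "rt",
  "feature_area" = "area"
)

# Stop early if a mapped column is absent from the table
missing_cols <- setdiff(demo_map, names(raw))
if (length(missing_cols) > 0) {
  stop("Columns not found in file: ", paste(missing_cols, collapse = ", "))
}

# Keep the mapped columns and give them the target names
renamed <- raw[, unname(demo_map), drop = FALSE]
names(renamed) <- names(demo_map)
names(renamed)
```

Checking for missing columns before renaming gives a clearer error than a failed column lookup would.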
Multiple File Import and Merging
Multiple data files can be imported and merged. Users can either provide a list of file paths or specify a folder path to import all data files within that directory. This support for multiple files is useful when raw data processing is divided into batches, leading to separate result files.
The merged data is checked for consistency to ensure that each analysis ID and feature ID pair is unique. The same feature therefore cannot be reported more than once within the same analysis, which can happen, for example, if the same feature in the same sample was integrated in different raw data processing batches.
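The kind of conflict this check guards against can be reproduced with base R alone. The sketch below writes two made-up batch files to a temporary folder, stacks them, and flags analysis/feature pairs that occur more than once. File names, column names and values are assumptions for this example, and the code is not midar's own consistency check.

```r
# Write two hypothetical batch result files to a fresh temporary folder
batch_dir <- tempfile("batch_results_")
dir.create(batch_dir)
write.csv(
  data.frame(analysis_id = c("S1", "S2"), feature_id = "PC 34:1", area = c(1200, 1500)),
  file.path(batch_dir, "batch1.csv"), row.names = FALSE
)
write.csv(
  data.frame(analysis_id = c("S2", "S3"), feature_id = "PC 34:1", area = c(1490, 900)),
  file.path(batch_dir, "batch2.csv"), row.names = FALSE
)

# Read every CSV file in the folder and stack the tables
files <- list.files(batch_dir, pattern = "\\.csv$", full.names = TRUE)
merged <- do.call(rbind, lapply(files, read.csv))

# Flag analysis_id/feature_id pairs that are reported more than once
pair <- merged[, c("analysis_id", "feature_id")]
is_dup <- duplicated(pair) | duplicated(pair, fromLast = TRUE)
merged[is_dup, ]
```

Here sample `S2` was integrated in both batches, so both of its rows are flagged and would need to be resolved before the files can be merged.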