Import Analysis Results from Plain Wide-Format CSV Files
Source: R/data-import.R
Imports analysis result data from wide-format .csv files. In these files each row corresponds to one analysis and each feature occupies its own column. Additional columns may contain analysis-specific variables such as metadata.
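The following base R sketch illustrates this layout. It reads a small made-up wide table containing analysis IDs, a qc_type column, and two hypothetical features, F1 and F2. It then stacks the table into one row per analysis-feature pair. The sketch only shows the data shape and is not how import_data_csv_wide() works internally.

```r
# Made-up wide table: one row per analysis, one column per feature (F1, F2)
wide <- read.csv(text = paste(
  "analysis_id,qc_type,F1,F2",
  "S1,SPL,10.2,3.1",
  "S2,SPL,11.5,",
  "S3,BQC,9.8,2.9",
  sep = "\n"
))

feature_cols <- c("F1", "F2")

# stack() concatenates the feature columns in order (all F1 values, then all F2)
long <- stack(wide[feature_cols])
names(long) <- c("value", "feature_id")
# Repeat the analysis IDs once per feature so they line up with the stacked values
long$analysis_id <- rep(wide$analysis_id, times = length(feature_cols))

long[c("analysis_id", "feature_id", "value")]
```

The blank F2 field for S2 becomes NA. Each analysis-feature pair appears exactly once.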
Usage
import_data_csv_wide(
data = NULL,
path,
variable_name,
analysis_id_col = NA,
import_metadata = TRUE,
first_feature_column = NA,
na_strings = "NA"
)
Arguments
- data
  A MidarExperiment object.
- path
  A file path, a vector of file paths, or a directory path. If a directory is provided, all .csv files within it are read.
- variable_name
  A character string specifying the variable type contained in the data. Must be one of "intensity", "norm_intensity", "conc", "area", "height", or "response".
- analysis_id_col
  The column name or index to use as analysis_id. Defaults to NA. In that case, "analysis_id" is used if present. Otherwise, the first column is used if it contains unique values.
- import_metadata
  Logical, indicating whether to import additional metadata columns (e.g., batch ID, sample type) into the MidarExperiment object. Supported metadata columns are "qc_type", "batch_id", "is_quantifier", "is_istd", and "analysis_order".
- first_feature_column
  Integer giving the index of the column at which the feature value columns start.
- na_strings
  Character vector of strings to interpret as NA values. Blank fields are also treated as NA.
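The path argument accepts a single file, a vector of files, or a directory. The sketch below shows one way to expand such input into a list of .csv files. resolve_csv_paths() is a hypothetical helper written for illustration and is not a midar function.

```r
# Hypothetical helper (not part of midar): expand `path` into .csv file paths
resolve_csv_paths <- function(path) {
  files <- unlist(lapply(path, function(p) {
    if (dir.exists(p)) {
      # A directory: take every file ending in .csv (case-insensitive)
      list.files(p, pattern = "\\.csv$", full.names = TRUE, ignore.case = TRUE)
    } else {
      p
    }
  }))
  missing <- files[!file.exists(files)]
  if (length(missing) > 0) {
    stop("File(s) not found: ", paste(missing, collapse = ", "))
  }
  files
}

# Demo: a temporary directory with two .csv files and one unrelated .txt file
tmp_dir <- file.path(tempdir(), "wide_csv_demo")
dir.create(tmp_dir, showWarnings = FALSE)
writeLines("analysis_id,F1", file.path(tmp_dir, "part1.csv"))
writeLines("analysis_id,F1", file.path(tmp_dir, "part2.csv"))
writeLines("not data", file.path(tmp_dir, "notes.txt"))
resolve_csv_paths(tmp_dir)  # the two .csv files; notes.txt is skipped
```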
Details
The dataset must contain one row per analysis. Each analysis is identified by a unique value in an "analysis_id" column, or in the column chosen as described below. Each feature is stored in its own column, and the column header serves as the "feature_id". As a result, every analysis-feature pair occurs exactly once in the table. The table must contain at least one feature column, and the feature columns hold values of the single variable type given by variable_name. Some downstream functions may require specific variable types to be present.
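In a wide table, each analysis-feature pair is unique exactly when two conditions hold: the analysis identifiers are unique, and no column name is repeated. The hypothetical check_wide_layout() below sketches such a validation in base R. It assumes the table has already been reduced to the identifier column plus numeric feature columns, with metadata columns removed. midar's own checks may differ.

```r
# Hypothetical validation helper, not midar's internal check
check_wide_layout <- function(df, id_col = "analysis_id") {
  if (!id_col %in% names(df)) stop("Missing identifier column '", id_col, "'")
  if (anyDuplicated(names(df)) > 0) stop("Column names must be unique")
  if (anyDuplicated(df[[id_col]]) > 0) stop("Analysis IDs must be unique")
  feature_cols <- setdiff(names(df), id_col)
  if (length(feature_cols) == 0) stop("At least one feature column is required")
  # Every feature column should hold numeric values (NA is allowed)
  is_num <- vapply(df[feature_cols], is.numeric, logical(1))
  if (!all(is_num)) {
    stop("Non-numeric feature column(s): ", paste(feature_cols[!is_num], collapse = ", "))
  }
  invisible(feature_cols)
}

ok <- data.frame(analysis_id = c("S1", "S2"), F1 = c(1.5, 2.0), F2 = c(3.1, NA))
check_wide_layout(ok)  # passes and invisibly returns the feature column names
```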
The variable_name argument specifies the data type represented in the table and must be one of "area", "height", "intensity", "norm_intensity", "response", "conc", "conc_raw", "rt", or "fwhm".
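The sketch below checks variable_name strictly against the allowed values listed above. check_variable_name() is a hypothetical helper. It uses %in% rather than match.arg(), so an abbreviation such as "norm" is rejected rather than being partially matched to "norm_intensity".

```r
# Allowed values as listed in this section; hypothetical helper, not midar code
allowed_variables <- c("area", "height", "intensity", "norm_intensity",
                       "response", "conc", "conc_raw", "rt", "fwhm")

check_variable_name <- function(variable_name) {
  if (!is.character(variable_name) || length(variable_name) != 1 ||
      !(variable_name %in% allowed_variables)) {
    stop("`variable_name` must be one of: ",
         paste0('"', allowed_variables, '"', collapse = ", "))
  }
  variable_name
}

check_variable_name("conc")    # accepted
try(check_variable_name("norm"))  # error: no partial matching
```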
If analysis_id_col is not set and there is no column named analysis_id, the first column is used as analysis_id, provided it contains unique values.
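The fallback rule for the analysis identifier can be sketched as follows. infer_analysis_id_col() is hypothetical and works in three steps:

- It accepts an explicit column name or index, as analysis_id_col does.
- If none is given, it prefers "analysis_id".
- Otherwise, it falls back to the first column.

In every case, it rejects a column whose values are duplicated.

```r
# Hypothetical helper illustrating the fallback rule; not part of midar
infer_analysis_id_col <- function(df, analysis_id_col = NA) {
  if (!is.na(analysis_id_col)) {
    # Explicit choice: a column name or a column index
    col <- if (is.numeric(analysis_id_col)) names(df)[analysis_id_col] else analysis_id_col
  } else if ("analysis_id" %in% names(df)) {
    col <- "analysis_id"
  } else {
    col <- names(df)[1]
  }
  if (is.na(col) || !col %in% names(df)) stop("Analysis ID column not found")
  if (anyDuplicated(df[[col]]) > 0) {
    stop("Column '", col, "' does not contain unique values")
  }
  col
}

samples <- data.frame(sample_name = c("S1", "S2"), F1 = c(10.2, 9.8))
infer_analysis_id_col(samples)     # falls back to the first column
infer_analysis_id_col(samples, 1)  # explicit column index
```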
When import_metadata is set to TRUE, the following metadata columns are imported if present:
- analysis_order
- qc_type
- batch_id
- is_quantifier
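Selecting whichever of these metadata columns are present amounts to intersecting column names. The sketch below uses a made-up table. Apart from the metadata columns listed above, its column names (F1, F2) are hypothetical.

```r
# Metadata columns listed in this section; sketch only, not midar code
metadata_cols <- c("analysis_order", "qc_type", "batch_id", "is_quantifier")

df <- data.frame(
  analysis_id = c("S1", "S2"),
  qc_type     = c("SPL", "BQC"),
  batch_id    = c(1, 1),
  F1          = c(10.2, 9.8),
  F2          = c(3.1, 2.9)
)

import_metadata <- TRUE
# Keep only the known metadata columns that actually exist in the table
found <- if (import_metadata) intersect(metadata_cols, names(df)) else character(0)
metadata <- df[c("analysis_id", found)]
found
```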
To prevent additional non-metadata columns from being misinterpreted as features, use the first_feature_column parameter to specify the column where the feature data start.
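The hypothetical split_columns() helper below illustrates the effect of first_feature_column. When no start column is given, the sketch assumes that every column other than the identifier and the known metadata columns is a feature. Under that assumption, the free-text operator_note column is wrongly picked up as a feature. Setting the start column to 4 excludes it.

```r
# Hypothetical helper, not a midar function
split_columns <- function(df, first_feature_column = NA, id_col = "analysis_id",
                          metadata_cols = c("analysis_order", "qc_type",
                                            "batch_id", "is_quantifier")) {
  if (is.na(first_feature_column)) {
    # Assumption: every column that is neither the ID nor known metadata is a feature
    features <- setdiff(names(df), c(id_col, metadata_cols))
  } else {
    if (first_feature_column < 1 || first_feature_column > ncol(df)) {
      stop("`first_feature_column` is out of range")
    }
    features <- names(df)[first_feature_column:ncol(df)]
  }
  list(features = features, other = setdiff(names(df), features))
}

df <- data.frame(
  analysis_id   = c("S1", "S2"),
  qc_type       = c("SPL", "BQC"),
  operator_note = c("ok", "rerun"),  # free-text column, not a feature
  F1            = c(10.2, 9.8),
  F2            = c(3.1, 2.9)
)

split_columns(df)$features     # wrongly includes "operator_note"
split_columns(df, 4)$features  # only "F1" and "F2"
```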
If a directory path is provided to path, all .csv files in that directory are processed and merged into a single dataset. This makes it easier to handle datasets that were split into multiple files during preprocessing. Ensure that each combination of analysis and feature appears only once across all files to avoid duplication errors.
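Files may be split either by analyses or by features. One way to merge both kinds is to convert each file to long form before combining them, then check for repeated analysis-feature pairs. The sketch below does this with base R in a temporary directory. to_long() and the file names are hypothetical, and midar may merge files differently.

```r
# Hypothetical helper: wide table -> long table (analysis_id, feature_id, value)
to_long <- function(df, id_col = "analysis_id") {
  feature_cols <- setdiff(names(df), id_col)
  data.frame(
    analysis_id = rep(df[[id_col]], times = length(feature_cols)),
    feature_id  = rep(feature_cols, each = nrow(df)),
    value       = unlist(df[feature_cols], use.names = FALSE)
  )
}

tmp_dir <- file.path(tempdir(), "wide_merge_demo")
dir.create(tmp_dir, showWarnings = FALSE)
# Two files split by features: same analyses, different feature columns
write.csv(data.frame(analysis_id = c("S1", "S2"), F1 = c(1, 2)),
          file.path(tmp_dir, "part1.csv"), row.names = FALSE)
write.csv(data.frame(analysis_id = c("S1", "S2"), F2 = c(5, 6)),
          file.path(tmp_dir, "part2.csv"), row.names = FALSE)

files  <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
merged <- do.call(rbind, lapply(files, function(f) to_long(read.csv(f, check.names = FALSE))))

# Any analysis-feature pair present in more than one file is an error
dups <- duplicated(merged[c("analysis_id", "feature_id")])
if (any(dups)) stop("Duplicated analysis-feature pairs across files")
merged
```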
The na_strings parameter specifies character strings to be interpreted as missing values (NA). Blank fields are also treated as missing.
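In base R, this behavior corresponds to the na.strings argument of read.csv(). The sketch below adds "" to the user-supplied strings, so blank fields become NA in text columns as well as in numeric ones. "n.d." is only an example marker, and midar may use a different reader internally.

```r
na_strings <- c("NA", "n.d.")  # example user-supplied missing-value markers

csv_text <- paste(
  "analysis_id,F1,F2",
  "S1,10.2,n.d.",
  "S2,,3.1",
  "S3,NA,2.9",
  sep = "\n"
)

# Adding "" treats blank fields as missing in every column type
df <- read.csv(text = csv_text, na.strings = c(na_strings, ""))
str(df)  # F1 and F2 are numeric; "n.d.", blanks, and "NA" became NA
```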
Examples
file_path <- system.file("extdata", "plain_wide_dataset.csv", package = "midar")
mexp <- MidarExperiment()
mexp <- import_data_csv_wide(
data = mexp,
path = file_path,
variable_name = "conc",
import_metadata = TRUE
)
#> ℹ Metadata column(s) 'qc_type, batch_id' imported. To ignore, set `import_metadata = FALSE`
#> ✔ Imported 87 analyses with 5 features
#> ✔ Analysis metadata associated with 87 analyses.
#> ✔ Feature metadata associated with 5 features.
#> ℹ Analysis order was based on `analysis_order` column of imported data. Use `set_analysis_order` to change the order.
print(mexp)
#>
#> ── MidarExperiment ─────────────────────────────────────────────────────────────
#> Title:
#>
#> Processing status: Annotated raw CONC values
#>
#> ── Annotated Raw Data ──
#>
#> • Analyses: 87
#> • Features: 5
#> • Raw signal used for processing: `feature_conc`
#>
#> ── Metadata ──
#>
#> • Analyses/samples: ✔
#> • Features/analytes: ✔
#> • Internal standards: ✖
#> • Response curves: ✖
#> • Calibrants/QC concentrations: ✖
#> • Study samples: ✖
#>
#> ── Processing Status ──
#>
#> • Isotope corrected: ✖
#> • ISTD normalized: ✖
#> • ISTD quantitated: ✔
#> • Drift corrected variables: ✖
#> • Batch corrected variables: ✖
#> • Feature filtering applied: ✖
#>
#> ── Exclusion of Analyses and Features ──
#>
#> • Analyses manually excluded (`analysis_id`): ✖
#> • Features manually excluded (`feature_id`): ✖