2 Import data into R
2.1 Introduction
Importing data is often the first step in a data analysis project. Quality and integrity of imported data is key for the subsequent analysis - Quality In Quality Out. A good overview of data import in R helps to ensure integrity, completeness and efficient import of data .
2.2 Libraries
2.3 Read CSV files
The console output summarizes the data types assigned to the different columns
#> Rows: 215 Columns: 430
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): DataFileName, SPLType, VialPosition
#> dbl (426): CE 14:0, CE 15:0, CE 16:0, CE 16:1, CE 16:2, CE 17:0, CE 17:1, C...
#> dttm (1): AcqTimeStamp
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Output
A data frame (tibble)
Parameters
-
col_names = TRUE
means the first row defines column names (default is TRUE) -
trim_ws = TRUE
removes white-spaces before and after text (default is TRUE) -
na
defines which text values are to be treated as missing values (NA). See also Section 2.5.2
2.4 Read EXCEL sheets
Output
A data frame (tibble)
Parameters
-
sheet
defines the Excel sheet to be imported - For other parameters, see Section 2.3
2.5 Remarks
Verify the integrity of imported data - do not just assume all is fine
2.5.1 Date File Formats
- Use structured and consistent data files as input data
- First row should contain column names.
- Avoid rows before and after the table (e.g. descriptions or notes)
- Rows should contain observations (e.g. samples) and columns the variables (e.g. compounds)
2.5.2 Missing Values
-
read_csv()
orread_xlsx()
assign empty/missing values asNA
(Not Available). - Data files may contain specific text values for missing values (e.g.,“ND”,”LOD”, “hemolytic”). By default,
read_csv()
andread_xlsx()
assign columns with such values as text (character) instead as numeric - even if there is just one such text value. You can use thena
parameter to set text values to be interpreted as NA (see example) - Column that contain only missing values will be assigned the data type
logical
2.5.3 Data Types
-
Note:
read_csv()
orread_xlsx()
will by default guess the data type of each column. - Typical data types are numbers (
dbl
), text (chr
), logical (lgl
) and factor (fct
) - If a column contains only one cell with text, the full column will be defined as text.
- Columns that contains only missing values will be assigned the data type
logical
- You can define the column types already at import (see
?read_csv
).