Import, Export, and Convert Data Files

The idea behind rio is to simplify the process of importing data into R and exporting data from R. This process is, probably unnecessarily, extremely complex for beginning R users. Indeed, R supplies an entire manual describing the process of data import/export. And, despite all of that text, most of the packages described are (to varying degrees) out-of-date. Faster, simpler, packages with fewer dependencies have been created for many of the file types described in that document. rio aims to unify data I/O (importing and exporting) into two simple functions: import() and export() so that beginners (and experienced R users) never have to think twice (or even once) about the best way to read and write R data.

The core advantage of rio is that it makes assumptions that the user is probably willing to make. Specifically, rio uses the file extension of a file name to determine what kind of file it is. This is the same logic used by Windows OS, for example, in determining what application is associated with a given file type. By taking away the need to manually match a file type (which a beginner may not recognize) to a particular import or export function, rio allows almost all common data formats to be read with the same function.

By making import and export easy, it’s an obvious next step to also use R as a simple data conversion utility. Transferring data files between various proprietary formats is always a pain and often expensive. The convert function therefore combines import and export to easily convert between file formats (thus providing a FOSS replacement for programs like Stat/Transfer or Sledgehammer).

Supported file formats

rio supports a variety of different file formats for import and export. To keep the package slim, all non-essential formats are supported via “Suggests” packages, which are not installed (or loaded) by default. To ensure rio is fully functional, install these packages the first time you use rio via:

install_formats()

The full list of supported formats is below:

Format Typical Extension Import Package Export Package Installed by Default
Comma-separated data .csv data.table data.table Yes
Pipe-separated data .psv data.table data.table Yes
Tab-separated data .tsv data.table data.table Yes
SAS .sas7bdat haven haven Yes
SPSS .sav haven haven Yes
Stata .dta haven haven Yes
SAS XPORT .xpt haven Yes
SPSS Portable .por haven Yes
Excel .xls readxl Yes
Excel .xlsx readxl openxlsx Yes
R syntax .R base base Yes
Saved R objects .RData, .rda base base Yes
Serialized R objects .rds base base Yes
Epiinfo .rec foreign Yes
Minitab .mtp foreign Yes
Systat .syd foreign Yes
“XBASE” database files .dbf foreign foreign Yes
Weka Attribute-Relation File Format .arff foreign foreign Yes
Data Interchange Format .dif utils Yes
Fortran data no recognized extension utils Yes
Fixed-width format data .fwf utils utils Yes
gzip comma-separated data .csv.gz utils utils Yes
CSVY (CSV + YAML metadata header) .csvy csvy csvy No
Feather R/Python interchange format .feather feather feather No
Fast Storage .fst fst fst No
JSON .json jsonlite jsonlite No
Matlab .mat rmatio rmatio No
OpenDocument Spreadsheet .ods readODS readODS No
HTML Tables .html xml2 xml2 No
Shallow XML documents .xml xml2 xml2 No
YAML .yml yaml yaml No
Clipboard default is tsv clipr clipr No
Google Sheets as Comma-separated data

Additionally, any format that is not supported by rio but that has a known R implementation will produce an informative error message pointing to a package and import or export function. Unrecognized formats will yield a simple “Unrecognized file format” error.

Data Import

rio allows you to import files in almost any format using one, typically single-argument, function. import() infers the file format from the file’s extension and calls the appropriate data import function for you, returning a simple data.frame. This works for any for the formats listed above.

library("rio")

x <- import("mtcars.csv")
y <- import("mtcars.rds")
z <- import("mtcars.dta")

# confirm identical
all.equal(x, y, check.attributes = FALSE)
## [1] TRUE
all.equal(x, z, check.attributes = FALSE)
## [1] TRUE

If for some reason a file does not have an extension, or has a file extension that does not match its actual type, you can manually specify a file format to override the format inference step. For example, we can read in a CSV file that does not have a file extension by specifying csv:

head(import("mtcars_noext", format = "csv"))
##    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Importing Data Lists

Sometimes you may have multiple data files that you want to import. import() only ever returns a single data frame, but import_list() can be used to import a vector of file names into R. This works even if the files are different formats:

str(import_list(dir()), 1)

Similarly, some single-file formats (e.g. Excel Workbooks, Zip directories, HTML files, etc.) can contain multiple data sets. Because import() is type safe, always returning a data frame, importing from these formats requires specifying a which argument to import() to dictate which data set (worksheet, file, table, etc.) to import (the default being which = 1). But import_list() can be used to import all (or only a specified subset, again via which) of data objects from these types of files.

Data Export

The export capabilities of rio are somewhat more limited than the import capabilities, given the availability of different functions in various R packages and because import functions are often written to make use of data from other applications and it never seems to be a development priority to have functions to export to the formats used by other applications. That said, rio currently supports the following formats:

library("rio")

export(mtcars, "mtcars.csv")
export(mtcars, "mtcars.rds")
export(mtcars, "mtcars.dta")

It is also easy to use exp