dwapi quickstart

Configuration

Configure the library at the beginning of every new R session. To do so, call dwapi::configure(), passing the data.world authentication token obtained at https://data.world/settings/advanced.

DO NOT SHARE YOUR AUTHENTICATION TOKEN

For your security, do not include your API authentication token in code that is intended to be shared with others.

Whenever possible, call this function from the console.

If you must call it in code, do not include the actual API token. Instead, pass the token via an environment variable defined in .Renviron, and do not share your .Renviron file. For example:

dwapi::configure(auth_token = Sys.getenv("DW_AUTH_TOKEN"))
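If the variable is missing, Sys.getenv() returns an empty string, which would be passed along silently as the token. A minimal sketch of a guard in base R follows; read_dw_token() is a hypothetical helper, not part of dwapi, and DW_AUTH_TOKEN is the variable name used above.

```r
# Hypothetical helper (not part of dwapi): read the token from the
# environment and stop with a clear message if it is missing or empty.
read_dw_token <- function(var = "DW_AUTH_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop(
      sprintf("Environment variable %s is not set; add it to .Renviron.", var),
      call. = FALSE
    )
  }
  token
}

# Usage, assuming dwapi is installed:
# dwapi::configure(auth_token = read_dw_token())
```

Failing early this way keeps the token out of the code while making a misconfigured session obvious.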

Creating and updating datasets

Use dwapi::create_dataset() to create a new dataset. The library includes a number of constructor functions that facilitate the preparation of complex requests like this one. The constructor used here is dwapi::dataset_create_request().

create_cars_dataset <- dwapi::dataset_create_request(
  title = sprintf("My cars dataset %s", runif(1)),
  visibility = "PRIVATE",
  license_string = "Other"
)

cars_dataset <- dwapi::create_dataset(Sys.getenv("DW_USER"), create_cars_dataset)
cars_dataset

Additional information can be added over time, with dataset updates.

update_cars_dataset <- dwapi::dataset_update_request(
  description = "This is a dataset created from R's cars dataset."
)

dwapi::update_dataset(cars_dataset$uri, update_cars_dataset)

Uploading files

Files can be added via URL, from the local file system, or directly as a data frame.

upload_response <- dwapi::upload_data_frame(cars_dataset$uri, cars, "cars.csv")
Sys.sleep(10) # Files are processed asynchronously.
upload_response
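Because processing is asynchronous, a fixed pause may be too short or longer than needed. One alternative is to poll until a condition holds. The sketch below uses base R only; wait_until() is a hypothetical helper, not part of dwapi, and the condition in the usage comment (at least one table listed) is an assumption about what "processed" means here.

```r
# Hypothetical helper (not part of dwapi): call `check` repeatedly until it
# returns TRUE or the attempts run out. Errors raised by `check` count as
# "not ready yet". Returns TRUE on success and FALSE otherwise.
wait_until <- function(check, attempts = 10, delay = 2) {
  for (i in seq_len(attempts)) {
    ok <- tryCatch(isTRUE(check()), error = function(e) FALSE)
    if (ok) {
      return(TRUE)
    }
    # Pause between attempts, but not after the last one.
    if (i < attempts) {
      Sys.sleep(delay)
    }
  }
  FALSE
}

# Usage, assuming processing is done once at least one table is listed:
# wait_until(function() length(dwapi::list_tables(cars_dataset$uri)) > 0)
```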

Tables and schemas (data dictionary)

data.world extracts tabular data from files in various formats. Tables are a logical representation of the tabular data that has been extracted and normalized.

tables <- dwapi::list_tables(cars_dataset$uri)
tables

At this point, it is possible to review the schema of dataset tables.

dwapi::get_table_schema(cars_dataset$uri, tables[[1]])

It is also possible to annotate fields with textual descriptions that make datasets easier to understand and work with.

update_cars_schema <- dwapi::table_schema_update_request(
  fields = list(dwapi::table_schema_field_update_request(
    name = "speed", description = "Top speed"))
)
dwapi::update_table_schema(cars_dataset$uri, tables[[1]], update_cars_schema)
dwapi::get_table_schema(cars_dataset$uri, tables[[1]])

Queries

Datasets can be queried using SQL and SPARQL. Once again, it’s important to keep the concept of tables and their names in mind.

sql_query <- "SELECT * FROM cars"
dwapi::sql(cars_dataset$uri, sql_query)
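Since queries refer to tables by name, the name returned by dwapi::list_tables() can be used to build the query text rather than typing it by hand. A minimal sketch in base R follows; build_select() is a hypothetical helper, not part of dwapi, and it assumes the table name is a plain identifier that needs no quoting.

```r
# Hypothetical helper (not part of dwapi): build a SELECT statement for a
# table name, with an optional row limit.
# Assumes `table` is a plain identifier that needs no quoting or escaping.
build_select <- function(table, limit = NULL) {
  query <- sprintf("SELECT * FROM %s", table)
  if (!is.null(limit)) {
    query <- sprintf("%s LIMIT %d", query, as.integer(limit))
  }
  query
}

# Usage, assuming `tables` holds the result of dwapi::list_tables():
# dwapi::sql(cars_dataset$uri, build_select(tables[[1]], limit = 10))
```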