R package ropenaq

M. Salmon

2019-01-31

Introduction

This R package is aimed at accessing the openaq API. OpenAQ is a community of scientists, software developers, and lovers of open environmental data who are building an open, real-time database that provides programmatic and historical access to air quality data. See their website at https://openaq.org/ and see the API documentation at https://docs.openaq.org/. The package contains 5 functions that correspond to the 5 different types of query offered by the openaq API: cities, countries, latest, locations and measurements. The package uses the dplyr package: all output tables are data.frame (dplyr “tbl_df”) objects, that can be further processed and analysed.

What data can you get?

Via the API since November 2017 the API only provides access to the latest 90 days of OpenAQ data. The whole OpenAQ data can be accessed via Amazon S3. See this announcement. You can interact with Amazon S3 using the aws.s3 package and the maintainer of ropenaq plans to write tutorials about how to access OpenAQ data and will also keep the documentation of ropenaq up-to-date regarding data access changes.

Finding measurements availability

Three functions of the package allow to get lists of available information. Measurements are obtained from locations that are in cities that are in countries.

The aq_countries function

The aq_countries function allows to see for which countries information is available within the platform. It is the easiest function because it does not have any argument. The code for each country is its ISO 3166-1 alpha-2 code.

The aq_cities function

Using the aq_cities functions one can get all cities for which information is available within the platform. For each city, one gets the number of locations and the count of measures for the city, the URL encoded string, and the country it is in.

The optional country argument allows to do this for a given country instead of the whole world.

If one inputs a country that is not in the platform (or misspells a code), then an error message is thrown.

The aq_locations function

The aq_locations function has far more arguments than the first two functions. On can filter locations in a given country, city, location, for a given parameter (valid values are “pm25”, “pm10”, “so2”, “no2”, “o3”, “co” and “bc”), from a given date and/or up to a given date, for values between a minimum and a maximum, for a given circle outside a central point by the use of the latitude, longitude and radius arguments. In the output table one also gets URL encoded strings for the city and the location. Below are several examples.

Here we only look for locations with PM2.5 information in Chennai, India.

Getting measurements

Two functions allow to get data: aq_measurement and aq_latest. In both of them the arguments city and location needs to be given as URL encoded strings.

The aq_measurements function

The aq_measurements function has many arguments for getting a query specific to, say, a given parameter in a given location or for a given circle outside a central point by the use of the latitude, longitude and radius arguments. Below we get the PM2.5 measures for Delhi in India.

One could also get all possible parameters in the same table.

The aq_latest function

This function gives a table with all newest measures for the locations that are chosen by the arguments. If all arguments are NULL, it gives all the newest measures for all locations. Below are the latest values for Hyderabad at the time this vignette was compiled.

Paging and limit

For all endpoints/functions, there a a limit and a page arguments, which indicate, respectively, how many results per page should be shown and which page should be queried. If you don’t enter the parameters by default all results for the query will be retrieved with async requests, but it might take a while nonetheless depending on the total number of results.

aq_measurements(city = "Delhi", parameter = "pm25")

Rate limiting

In October 2017 the API introduced a rate limit of 2,000 requests every 5 minutes. Please keep this in mind. In the case when the request receives a response status of 429 (too many requests), the package will wait 5 minutes.

Other packages of interest for getting air quality data