cbsodataR, all data of Statistics Netherlands (CBS)

Edwin de Jonge

2018-02-02

Statistics Netherlands (CBS) is the office that produces all official statistics of the Netherlands.

For a long time CBS has published its data on the web in its online database StatLine. Since 2014 this database has had an open data web API based on the OData protocol. The cbsodataR package allows this data to be retrieved directly into R.

Table information

A list of tables can be retrieved using the get_table_list function.

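library(cbsodataR) # provides get_table_list, get_meta and get_data
library(dplyr)     # provides %>% and select
tables <- get_table_list(Language="en") # retrieve only English tables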
## [1] "https://opendata.cbs.nl/ODataCatalog/Tables?$format=json&$filter=(Language%20eq%20'en')"
tables %>% 
  select(Identifier, ShortTitle) %>% 
  head 
##   Identifier                              ShortTitle
## 1   80783eng  Agriculture; general farm type, region
## 2   80784eng       Agriculture; labour force, region
## 3    7100eng                Arable crops; production
## 4   70671ENG      Fruit culture; area fruit orchards
## 5   37738ENG Vegetables; yield per kind of vegetable
## 6   71509ENG                  Yield apples and pears
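Because the table list is an ordinary data frame, it can be searched like any other. The sketch below rebuilds the six rows printed above by hand, so that it runs without network access, and filters them by a keyword in ShortTitle. The helper name find_tables is hypothetical and not part of the package.

```r
# Hand-copied subset of the table list shown above, for illustration only.
tables_demo <- data.frame(
  Identifier = c("80783eng", "80784eng", "7100eng",
                 "70671ENG", "37738ENG", "71509ENG"),
  ShortTitle = c("Agriculture; general farm type, region",
                 "Agriculture; labour force, region",
                 "Arable crops; production",
                 "Fruit culture; area fruit orchards",
                 "Vegetables; yield per kind of vegetable",
                 "Yield apples and pears"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows whose ShortTitle matches a pattern,
# ignoring case.
find_tables <- function(tables, pattern) {
  tables[grepl(pattern, tables$ShortTitle, ignore.case = TRUE), , drop = FALSE]
}

fruit_tables <- find_tables(tables_demo, "fruit|apples")
print(fruit_tables$Identifier)
```

The same helper should work on the full result of get_table_list, assuming it still carries a ShortTitle column as in the output above.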

Using an “Identifier” from get_table_list, information on the table can be retrieved with get_meta:

m <- get_meta('71509ENG')
## [1] "https://opendata.cbs.nl/ODataApi/odata/71509ENG"
## [1] "http://opendata.cbs.nl/ODataApi/odata/71509ENG/TableInfos"
## [1] "http://opendata.cbs.nl/ODataApi/odata/71509ENG/DataProperties"
## [1] "http://opendata.cbs.nl/ODataApi/odata/71509ENG/CategoryGroups"
## [1] "http://opendata.cbs.nl/ODataApi/odata/71509ENG/FruitFarmingRegions"
## [1] "http://opendata.cbs.nl/ODataApi/odata/71509ENG/Periods"
m
## 71509ENG: 'Yield apples and pears', 2016
##   FruitFarmingRegions: 'Fruit farming regions'
##   Periods: 'Periods'

The meta object contains all metadata properties exposed by the CBS OData API (see the original documentation) in the form of data.frames. Each data.frame describes one aspect of the CBS table.

names(m)
## [1] "TableInfos"          "DataProperties"      "CategoryGroups"     
## [4] "FruitFarmingRegions" "Periods"

Data download

Data can be retrieved with get_data. By default all data for the table is downloaded to a temporary directory.

get_data('71509ENG') %>% 
  select(2:5) %>%  # select columns 2 to 5 (for demonstration purposes)
  head
## [1] "https://opendata.cbs.nl/ODataApi/odata/71509ENG"
## [1] "http://opendata.cbs.nl/ODataApi/odata/71509ENG/TableInfos"
## [1] "http://opendata.cbs.nl/ODataApi/odata/71509ENG/DataProperties"
## [1] "http://opendata.cbs.nl/ODataApi/odata/71509ENG/CategoryGroups"
## [1] "http://opendata.cbs.nl/ODataApi/odata/71509ENG/FruitFarmingRegions"
## [1] "http://opendata.cbs.nl/ODataApi/odata/71509ENG/Periods"
## [1] "https://opendata.cbs.nl/ODataFeed/odata/71509ENG/UntypedDataSet?$format=json"
## # A tibble: 6 x 4
##   FruitFarmingRegions Periods TotalAppleVarieties_1 CoxSOrangePippin_2
##                <fctr>  <fctr>                 <chr>              <chr>
## 1   Total Netherlands    1997                   420                 43
## 2   Total Netherlands    1998                   518                 40
## 3   Total Netherlands    1999                   568                 39
## 4   Total Netherlands    2000                   461                 27
## 5   Total Netherlands    2001                   408                 30
## 6   Total Netherlands    2002                   354                 17
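In the output above the measure columns are of type character, so they need converting before any arithmetic. The following is a minimal sketch in base R, using three rows copied from that output rather than a live download. It assumes that measure columns can be recognised by a trailing "_<number>" in their names, which holds for the columns shown here but is not guaranteed for every table.

```r
# Rows copied from the output above; the measure columns arrive as character.
apples <- data.frame(
  Periods = c("1997", "1998", "1999"),
  TotalAppleVarieties_1 = c("420", "518", "568"),
  CoxSOrangePippin_2 = c("43", "40", "39"),
  stringsAsFactors = FALSE
)

# Assumption: a name ending in "_<number>" marks a measure column.
measure_cols <- grep("_[0-9]+$", names(apples), value = TRUE)

# Convert only those columns; Periods is left untouched.
apples[measure_cols] <- lapply(apples[measure_cols], as.numeric)

str(apples)
```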

The data is automatically recoded with the titles of the categories. If needed, the original category codes can be retained with recode=FALSE.

get_data('71509ENG', recode = FALSE) %>% 
  select(2:5) %>% 
  head
## [1] "https://opendata.cbs.nl/ODataFeed/odata/71509ENG/UntypedDataSet?$format=json"
## # A tibble: 6 x 4
##   FruitFarmingRegions  Periods TotalAppleVarieties_1 CoxSOrangePippin_2
##                 <chr>    <chr>                 <chr>              <chr>
## 1                   1 1997JJ00                   420                 43
## 2                   1 1998JJ00                   518                 40
## 3                   1 1999JJ00                   568                 39
## 4                   1 2000JJ00                   461                 27
## 5                   1 2001JJ00                   408                 30
## 6                   1 2002JJ00                   354                 17
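The unrecoded Periods values such as 1997JJ00 can be split into their parts when the raw codes are kept. The sketch below is an illustration only: parse_period is a hypothetical helper, and it assumes every code has the shape seen above (four digits, two letters, two digits). Comparing the recoded and unrecoded output suggests that JJ marks a yearly figure, but that reading is an inference from this one table.

```r
# Period codes copied from the unrecoded output above.
period_codes <- c("1997JJ00", "1998JJ00", "1999JJ00")

# Hypothetical helper: split a code into year, frequency marker and index.
# Assumption: codes are 4 digits, 2 capital letters, 2 digits.
parse_period <- function(code) {
  stopifnot(all(grepl("^[0-9]{4}[A-Z]{2}[0-9]{2}$", code)))
  data.frame(
    year  = as.integer(substr(code, 1, 4)),
    freq  = substr(code, 5, 6),
    index = as.integer(substr(code, 7, 8)),
    stringsAsFactors = FALSE
  )
}

periods <- parse_period(period_codes)
print(periods)
```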

Select and filter

It is possible to restrict the download using filter statements. This may shorten the download time considerably.

get_data('71509ENG', Periods='2000JJ00') %>% 
  select(2:5) %>% 
  head
## [1] "https://opendata.cbs.nl/ODataFeed/odata/71509ENG/UntypedDataSet?$format=json&$filter=(Periods%20eq%20'2000JJ00')"
## # A tibble: 5 x 4
##   FruitFarmingRegions Periods TotalAppleVarieties_1 CoxSOrangePippin_2
##                <fctr>  <fctr>                 <chr>              <chr>
## 1   Total Netherlands    2000                   461                 27
## 2        Region North    2000                    87                  5
## 3         Region West    2000                   105                 10
## 4      Region Central    2000                   215                 10
## 5        Region South    2000                    53                  2
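The URL printed above shows what the filter argument becomes on the wire: an OData $filter clause appended to the data set URL. The sketch below reproduces that URL for a single "column equals value" filter. It is not the package's internal code; build_filter_url is a hypothetical helper, and the base URL is simply copied from the output above.

```r
# Hypothetical helper: build the request URL for one equality filter,
# mirroring the URL printed in the output above.
build_filter_url <- function(id, column, value,
                             base = "https://opendata.cbs.nl/ODataFeed/odata") {
  # OData equality clause, e.g. (Periods eq '2000JJ00')
  clause <- paste0("(", column, " eq '", value, "')")
  # In the URL shown above only the spaces are percent-encoded.
  clause <- gsub(" ", "%20", clause, fixed = TRUE)
  paste0(base, "/", id, "/UntypedDataSet?$format=json&$filter=", clause)
}

url <- build_filter_url("71509ENG", "Periods", "2000JJ00")
print(url)
```

Because the filter is part of the request, only the matching rows are transferred, which is why the filtered call can be faster than downloading the whole table and filtering afterwards.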