Getting started with censusapi

censusapi is a wrapper for the United States Census Bureau’s APIs. As of 2017, over 200 Census API endpoints are available, including Decennial Census, American Community Survey, Poverty Statistics, and Population Estimates APIs. This package is designed to let you get data from all of those APIs using the same main function, getCensus, and the same syntax for each dataset.

censusapi generally uses the APIs’ original parameter names so that users can easily transition between Census’s documentation and examples and this package. It also includes metadata functions to return data frames of available APIs, variables, and geographies.

API key setup

To use the Census APIs, sign up for an API key. Then, if you’re on a non-shared computer, add your Census API key to your .Renviron profile and call it CENSUS_KEY. censusapi will use it by default without any extra work on your part. Within R, run:

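# Set the key for the current R session (replace YOURKEYHERE with your key)
Sys.setenv(CENSUS_KEY="YOURKEYHERE")
# To keep the key across sessions, add the line CENSUS_KEY=YOURKEYHERE
# to ~/.Renviron, then reload that file
readRenviron("~/.Renviron")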
# Check to see that the expected key is output in your R console
Sys.getenv("CENSUS_KEY")

In some instances you might not want to put your key in your .Renviron, for example if you’re on a shared school computer. You can always pass your key to getCensus directly through its key argument instead.
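As a minimal sketch of supplying the key explicitly, the code below wraps the SAHIE call used later in this guide and forwards a key to getCensus. The names has_census_key and get_sahie_us are hypothetical helpers for illustration and are not part of censusapi.

```r
# Hypothetical helper: TRUE when CENSUS_KEY is set to a non-empty value.
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_KEY"))
}

# Hypothetical wrapper: passes a key directly to getCensus instead of
# relying on .Renviron. Nothing is requested until the function is called.
get_sahie_us <- function(my_key) {
  censusapi::getCensus(
    name = "timeseries/healthins/sahie",
    vars = c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"),
    region = "us:*", time = 2015,
    key = my_key
  )
}
```

Calling has_census_key() first is one way to decide whether a key needs to be passed by hand on a given machine.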

Finding your API

To get started, load the censusapi library.

library(censusapi)

The Census APIs have over 200 endpoints, covering dozens of different datasets.

To see a current table of every available endpoint, run listCensusApis:

apis <- listCensusApis()
View(apis)

This returns useful information about each endpoint, including name, which you’ll need to make your API call.
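With over 200 endpoints, it can help to narrow the table by name. The sketch below assumes only that the result has a name column, as described above. find_apis is a hypothetical helper, and apis_demo is a small stand-in table so the example runs without calling the API.

```r
# Hypothetical helper: keep only the endpoints whose name matches a pattern.
find_apis <- function(apis, pattern) {
  apis[grepl(pattern, apis$name, ignore.case = TRUE), , drop = FALSE]
}

# Stand-in for the listCensusApis() result so the sketch runs offline.
# Only the name column is modeled; the real table has more columns.
apis_demo <- data.frame(
  name = c("timeseries/healthins/sahie", "sf1", "sf3"),
  stringsAsFactors = FALSE
)

sahie_apis <- find_apis(apis_demo, "sahie")
sahie_apis$name
```

With the real table, the same call would be find_apis(apis, "sahie").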

Using getCensus

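The main function in censusapi is getCensus, which makes an API call to a given Census API and returns a data frame of results. Each API has slightly different parameters, but a few arguments appear in nearly every call: name (the name of the API, such as "timeseries/healthins/sahie"), vars (the variable names to get), and region (the geography level to return, such as state or county). Non-timeseries APIs, like the Decennial Census examples below, also take vintage, the dataset year.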

Some APIs have additional required or optional arguments, like time, monthly, or period. Check the specific documentation for your API to see what options are allowed.

Let’s walk through an example getting uninsured rates by income group using the Small Area Health Insurance Estimates API, which provides detailed annual state-level and county-level estimates of health insurance rates.

Choosing variables

censusapi includes a metadata function called listCensusMetadata to get information about an API’s variable options and geography options. Let’s see what variables are available in the SAHIE API:

sahie_vars <- listCensusMetadata(name="timeseries/healthins/sahie", type = "variables")
head(sahie_vars)
##       name                                                           label
## 1 AGE_DESC                                        Age Category Description
## 2 NUI_LB90       Number Uninsured, Lower Bound for 90% Confidence Interval
## 3    STATE                                                 State FIPS Code
## 4  NIC_MOE                                 Number Insured, Margin of Error
## 5  NIPR_PT Number in Demographic Group for Selected Income Range, Estimate
## 6  RACECAT                                                   Race Category
##               concept predicateType group limit          required
## 1      Demographic ID           int   N/A     0              <NA>
## 2 Uncertainty Measure           int   N/A     0              <NA>
## 3       Geographic ID           int   N/A     0              <NA>
## 4 Uncertainty Measure           int   N/A     0              <NA>
## 5            Estimate           int   N/A     0              <NA>
## 6      Demographic ID           int   N/A     0 default displayed

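We’ll use a few of these variables to get uninsured rates by income group: NAME (the name of the geography returned), IPRCAT (the income category code), IPR_DESC (the description of that income category), and PCTUI_PT (the estimated percent uninsured).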
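Once the metadata is in a data frame, the labels of specific variables can be looked up with ordinary subsetting. The following is a minimal sketch: describe_vars is a hypothetical helper, and vars_demo copies three rows from the head(sahie_vars) output above so the example runs offline.

```r
# Hypothetical helper: return the name and label of the requested variables.
describe_vars <- function(meta, wanted) {
  meta[meta$name %in% wanted, c("name", "label"), drop = FALSE]
}

# Three rows copied from the head(sahie_vars) output above, so the
# sketch runs without calling the API.
vars_demo <- data.frame(
  name  = c("AGE_DESC", "STATE", "RACECAT"),
  label = c("Age Category Description", "State FIPS Code", "Race Category"),
  stringsAsFactors = FALSE
)

picked <- describe_vars(vars_demo, c("STATE", "RACECAT"))
picked
```

Against the full metadata, describe_vars(sahie_vars, c("IPRCAT", "PCTUI_PT")) would work the same way.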

Choosing regions

We can also use listCensusMetadata to see which geographic levels we can get data for using the SAHIE API.

listCensusMetadata(name="timeseries/healthins/sahie", type = "geography")
##     name geoLevelId requires wildcard optionalWithWCFor
## 1     us        010     NULL     NULL              <NA>
## 2 county        050    state    state             state
## 3  state        040     NULL     NULL              <NA>

This API has three geographic levels: us, county within states, and state.

First, using getCensus, let’s get uninsured rate by income group at the national level for 2015.

getCensus(name="timeseries/healthins/sahie",
    vars=c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"), 
    region="us:*", time=2015)
##            NAME IPRCAT                IPR_DESC PCTUI_PT time us
## 1 United States      0             All Incomes     10.9 2015  1
## 2 United States      1      <= 200% of Poverty     18.6 2015  1
## 3 United States      2      <= 250% of Poverty     17.8 2015  1
## 4 United States      3      <= 138% of Poverty     19.1 2015  1
## 5 United States      4      <= 400% of Poverty     15.1 2015  1
## 6 United States      5 138% to 400% of Poverty     12.8 2015  1

We can also get this data at the state level for every state by changing region to "state:*":

sahie_states <- getCensus(name="timeseries/healthins/sahie",
    vars=c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"), 
    region="state:*", time=2015)
head(sahie_states)
##         NAME IPRCAT    IPR_DESC PCTUI_PT time state
## 1    Alabama      0 All Incomes     11.9 2015    01
## 2     Alaska      0 All Incomes     16.3 2015    02
## 3    Arizona      0 All Incomes     12.8 2015    04
## 4   Arkansas      0 All Incomes     11.1 2015    05
## 5 California      0 All Incomes      9.7 2015    06
## 6   Colorado      0 All Incomes      9.2 2015    08
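Because the result is a regular data frame, it can be ranked with base R. The sketch below rebuilds the six rows shown above by hand (states_demo is an illustrative stand-in, not a fresh API result) and sorts them by uninsured rate. The as.numeric step is a defensive assumption in case the column arrives as text.

```r
# The six rows shown above, rebuilt by hand so the sketch runs offline.
states_demo <- data.frame(
  NAME = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  PCTUI_PT = c(11.9, 16.3, 12.8, 11.1, 9.7, 9.2),
  stringsAsFactors = FALSE
)

# Convert defensively in case the column arrives as text, then sort
# from highest to lowest uninsured rate.
states_demo$PCTUI_PT <- as.numeric(states_demo$PCTUI_PT)
ranked <- states_demo[order(states_demo$PCTUI_PT, decreasing = TRUE), ]
ranked$NAME
```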

Finally, we can get county-level data. The geography metadata showed that we can choose to get county-level data within states. We’ll use region to specify county-level results and regionin to request data for Alabama and Alaska.

sahie_counties <- getCensus(name="timeseries/healthins/sahie",
    vars=c("NAME", "IPRCAT", "IPR_DESC", "PCTUI_PT"), 
    region="county:*", regionin="state:1,2", time=2015)
head(sahie_counties, n=12L)
##                  NAME IPRCAT                IPR_DESC PCTUI_PT time state
## 1  Autauga County, AL      0             All Incomes      9.4 2015    01
## 2  Autauga County, AL      1      <= 200% of Poverty     16.8 2015    01
## 3  Autauga County, AL      2      <= 250% of Poverty     15.5 2015    01
## 4  Autauga County, AL      3      <= 138% of Poverty     18.6 2015    01
## 5  Autauga County, AL      4      <= 400% of Poverty     12.4 2015    01
## 6  Autauga County, AL      5 138% to 400% of Poverty      9.6 2015    01
## 7  Baldwin County, AL      0             All Incomes     11.5 2015    01
## 8  Baldwin County, AL      1      <= 200% of Poverty     21.1 2015    01
## 9  Baldwin County, AL      2      <= 250% of Poverty     19.5 2015    01
## 10 Baldwin County, AL      3      <= 138% of Poverty     22.5 2015    01
## 11 Baldwin County, AL      4      <= 400% of Poverty     15.7 2015    01
## 12 Baldwin County, AL      5 138% to 400% of Poverty     12.2 2015    01
##    county
## 1     001
## 2     001
## 3     001
## 4     001
## 5     001
## 6     001
## 7     003
## 8     003
## 9     003
## 10    003
## 11    003
## 12    003
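When the list of states is held in a vector, the regionin value can be assembled rather than typed. The sketch below is a hypothetical helper, state_regionin, that produces the zero-padded two-digit form seen in the state column of the output. The call above passes unpadded codes ("state:1,2"), so treat the padding as an assumption about what the API accepts.

```r
# Hypothetical helper: build a regionin value such as "state:01,02"
# from a vector of numeric state FIPS codes.
state_regionin <- function(codes) {
  padded <- sprintf("%02d", as.integer(codes))
  paste0("state:", paste(padded, collapse = ","))
}

state_regionin(c(1, 2))
```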

Advanced geographies

Some geographies, particularly Census tracts and blocks, need to be specified within larger geographies like states and counties. This varies by API endpoint, so make sure to read the documentation for your specific API and run listCensusMetadata to see the available geographies.

You may want to get data for many geographies that require a parent geography. For example, tract-level data from the 1990 Decennial Census can only be requested from one state at a time.

In this example, we use the built-in fips list of state FIPS codes to request tract-level data from each state and combine the results into a single data frame.

fips
##  [1]  1  2  4  5  6  8  9 10 11 12 13 15 16 17 18 19 20 21 22 23 24 25 26
## [24] 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 44 45 46 47 48 49 50
## [47] 51 53 54 55 56
tracts <- NULL
for (f in fips) {
    stateget <- paste("state:", f, sep="")
    temp <- getCensus(name="sf3", vintage=1990,
        vars=c("P0070001", "P0070002", "P114A001"), region="tract:*",
        regionin=stateget)
    tracts <- rbind(tracts, temp)
}
head(tracts)
##   state county  tract P0070001 P0070002 P114A001
## 1    01    001 020100      944      917    11663
## 2    01    001 020200      917     1060     8555
## 3    01    001 020300     1451     1518    11782
## 4    01    001 020400     2166     2223    15323
## 5    01    001 020500     1604     1582    14522
## 6    01    001 020600     1784     1661    10630
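The loop above grows tracts with rbind on every pass. An alternative sketch collects the per-state pieces with lapply and binds them once at the end. collect_by_state and fake_fetch are hypothetical names. The stub stands in for the getCensus call so the example runs offline.

```r
# Hypothetical helper: call fetch() once per state and stack the results.
# fetch is any function that takes a regionin string and returns a data frame.
collect_by_state <- function(fips, fetch) {
  pieces <- lapply(fips, function(f) fetch(paste0("state:", f)))
  do.call(rbind, pieces)
}

# Stub standing in for a getCensus call so the sketch runs offline.
fake_fetch <- function(regionin) {
  data.frame(regionin = regionin, stringsAsFactors = FALSE)
}

tracts_demo <- collect_by_state(c(1, 2, 4), fake_fetch)
tracts_demo$regionin
```

To use it with the request above, fetch would be a function that passes its argument to the regionin argument of the same getCensus call.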

The regionin argument of getCensus can also be used with a string of nested geographies, as shown below.

The 2010 Decennial Census summary file 1 requires you to specify a state and county to retrieve block-level data. Use region to request block level data, and regionin to specify the desired state and county.

data2010 <- getCensus(name="sf1", vintage=2010,
    vars=c("P0010001", "P0030001"), 
    region="block:*", regionin="state:36+county:027")
head(data2010)
##   state county  tract block P0010001 P0030001
## 1    36    027 010000  1020       73       73
## 2    36    027 010000  1023        0        0
## 3    36    027 010000  1030       68       68
## 4    36    027 010000  1031        0        0
## 5    36    027 010000  1032        0        0
## 6    36    027 010000  1033        0        0

For the 2000 Decennial Census summary file 1, tract is also required to retrieve block-level data. This example requests data for all blocks within Census tract 010000 in county 027 of state 36.

data2000 <- getCensus(name="sf1", vintage=2000,
    vars=c("P001001", "P003001"), 
    region="block:*", regionin="state:36+county:027+tract:010000")
head(data2000)
##   state county  tract block P001001 P003001
## 1    36    027 010000  1000      18      18
## 2    36    027 010000  1001      26      26
## 3    36    027 010000  1002      59      59
## 4    36    027 010000  1003      67      67
## 5    36    027 010000  1004      52      52
## 6    36    027 010000  1005     116     116
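The nested regionin strings in the two examples above follow one pattern: level:code pairs joined with "+". A small sketch of building them from parts follows. nested_regionin is a hypothetical helper, not a censusapi function, and it expects a single code per level.

```r
# Hypothetical helper: join parent geographies into a regionin string.
# Arguments left as NULL are skipped.
nested_regionin <- function(state, county = NULL, tract = NULL) {
  parts <- c(state = state, county = county, tract = tract)
  paste(paste0(names(parts), ":", parts), collapse = "+")
}

nested_regionin("36", "027")
nested_regionin("36", "027", "010000")
```

Passing the codes as strings keeps the leading zeros that county and tract codes rely on.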

Additional resources

Disclaimer

This product uses the Census Bureau Data API but is not endorsed or certified by the Census Bureau.