ipumsr Example - CPS

Minnesota Population Center

2017-12-15

IPUMS - CPS Extraction and Analysis

Exercise 1

OBJECTIVE: Gain an understanding of how the IPUMS dataset is structured and how it can be leveraged to explore your research interests. This exercise will use the IPUMS dataset to explore associations between health and work status and to create basic frequencies of food stamp usage.

This vignette is adapted from the CPS Data Training Exercise available here: https://pop.umn.edu/sites/pop.umn.edu/files/final_review_-_cps_spss_exercise_1_0.pdf

Research Questions

What is the frequency of food stamp recipiency in the US? Are health and work statuses related?

Objectives

  • Create and download an IPUMS data extract
  • Decompress data file and read data into R
  • Analyze the data using sample code
  • Validate data analysis work using answer key

IPUMS Variables

  • PERNUM: Person number in sample unit
  • FOODSTMP: Food stamp receipt
  • AGE: Age
  • EMPSTAT: Employment status
  • AHRSWORKT: Hours worked last week
  • HEALTH: Health status

Download Extract from IPUMS Website

  1. Register with IPUMS - Go to http://cps.ipums.org, click on CPS Registration, and apply for access. On the login screen, enter your email address and password and submit.

  2. Make an Extract
  3. Request the Data
  4. Download the Data

Getting the data into R

You will need to change the file paths noted below to the location where you saved your extract.

The code for this step could not be run when this vignette was built because the CPS extract was not found. If you downloaded the data following the instructions above, make sure that the file names are correct (DDI: cps_00001.xml, data: cps_00001.dat) and, if you are using a relative path, that you are in the correct directory.

The data is also available on GitHub. You can install it using the following commands:

  if (!require(devtools)) install.packages('devtools')
  devtools::install_github('mnpopcenter/ipumsr/ipumsexamples')

After installation, the data should be available for this vignette.

Note that the data_file argument is optional if you didn’t change the data file name and have it saved in your working directory; read_ipums_micro can use information from the DDI file to locate the corresponding data file.
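As a rough sketch of the loading step, the code below reads the DDI and the data file with ipumsr. The file names are the defaults mentioned above and are assumptions about where your extract lives. The guard around the calls is only there so the snippet does nothing harmful when the files or the package are missing.

```r
# Assumed paths: replace with the location of your own extract.
cps_ddi_file  <- "cps_00001.xml"
cps_data_file <- "cps_00001.dat"

have_files <- file.exists(cps_ddi_file) && file.exists(cps_data_file)

if (have_files && requireNamespace("ipumsr", quietly = TRUE)) {
  # The DDI holds the metadata (variable labels, value labels, file layout).
  cps_ddi <- ipumsr::read_ipums_ddi(cps_ddi_file)

  # data_file is optional when the data file kept its original name.
  cps_data <- ipumsr::read_ipums_micro(cps_ddi_file, data_file = cps_data_file)
} else {
  message("Extract or ipumsr not found; edit the paths above to point at your files.")
}
```

Keeping the DDI as a separate object is a convenience, since the metadata functions can then be called on it later.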

Exercises

These exercises include example code written in the “tidyverse” style, meaning that they use the dplyr package. This package provides easy-to-use functions for data analysis, including mutate(), select(), arrange(), slice() and the pipe (%>%). There are numerous other ways you could answer these questions, including base R, the data.table package, and others.
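For readers new to this style, here is a minimal illustration of the verbs named above chained with the pipe. The data frame `toy` and its values are invented for the illustration and are not CPS data.

```r
library(dplyr, warn.conflicts = FALSE)

# Invented toy data, not taken from any IPUMS extract.
toy <- tibble(
  PERNUM = c(1, 2, 1, 1, 2),
  AGE    = c(34, 8, 61, 45, 43)
)

oldest_three <- toy %>%
  mutate(is_adult = AGE >= 18) %>%   # add a derived column
  select(AGE, is_adult) %>%          # keep only the columns of interest
  arrange(desc(AGE)) %>%             # sort from oldest to youngest
  slice(1:3)                         # keep the first three rows
```

Each step takes the result of the previous one as its first argument, which is what the pipe does.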

Analyze the Sample – Part I Frequencies of FOODSTMP

  1. On the website, find the codes page for the FOODSTMP variable and write down the code value, and what category each code represents.
  2. What is the universe for FOODSTMP in 2011 (under the Universe tab on the website)?
  3. How many people received food stamps in 2011?
  4. What proportion of the population received food stamps in 2011?
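One possible way to approach the counting questions is a weighted frequency table. The sketch below makes several assumptions that you should check against the website: that the person-level weight in your extract is named WTSUPP, that the extract contains a YEAR variable, and that FOODSTMP is coded 1 for "No" and 2 for "Yes". The rows in `toy` are invented so the snippet runs on its own; replace `toy` with your extract to get real answers.

```r
library(dplyr, warn.conflicts = FALSE)

# Invented rows standing in for an extract; the weights are not real.
toy <- tibble(
  YEAR     = 2011,
  FOODSTMP = c(1, 2, 1, 2, 1),               # assumed codes: 1 = No, 2 = Yes
  WTSUPP   = c(1000, 1500, 2000, 500, 1000)  # assumed person weight name
)

foodstmp_summary <- toy %>%
  filter(YEAR == 2011) %>%
  group_by(FOODSTMP) %>%
  summarize(n_people = sum(WTSUPP)) %>%       # weighted count of people
  mutate(pct = n_people / sum(n_people))      # share of the weighted total
```

Summing the weights, rather than counting rows, is what turns sample records into population estimates.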

Using household weights (HWTSUPP)

Suppose you were interested not in the number of people living in homes that received food stamps, but in the number of households that were food stamp participants. To get this statistic you would need to use the household weight.

To use the household weight, be careful to select only one person from each household to represent that household’s characteristics, and then apply the household weight (HWTSUPP).

  1. How many households received food stamps in 2011?
  2. What proportion of households received food stamps in 2011?
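The household version can be sketched the same way. Keeping only the record with PERNUM equal to 1 is one way to select a single person per household, and it is an assumption of this sketch rather than the only valid choice. The SERIAL column (a household identifier) and all values in `toy` are invented for illustration, and the FOODSTMP codes (1 = No, 2 = Yes) are again assumed.

```r
library(dplyr, warn.conflicts = FALSE)

# Invented rows: three households, two of them with two members.
toy <- tibble(
  SERIAL   = c(1, 1, 2, 3, 3),               # hypothetical household identifier
  PERNUM   = c(1, 2, 1, 1, 2),
  FOODSTMP = c(2, 2, 1, 1, 1),               # assumed codes: 1 = No, 2 = Yes
  HWTSUPP  = c(1200, 1200, 800, 2000, 2000)  # same weight for all members
)

hh_summary <- toy %>%
  filter(PERNUM == 1) %>%                      # one record per household
  group_by(FOODSTMP) %>%
  summarize(n_households = sum(HWTSUPP)) %>%   # weighted count of households
  mutate(pct = n_households / sum(n_households))
```

Without the filter, each household's weight would be counted once per member, which would overstate larger households.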

Analyze the Sample – Part II Relationships in the Data

  1. What is the universe for EMPSTAT in 2011?
  2. What are the possible responses and codes for the self-reported HEALTH variable?
  3. What percent of people with ‘poor’ self-reported health are at work?
  4. What percent of people with ‘very good’ self-reported health are at work?
  5. In the EMPSTAT universe, what percent of people:
     a. self-report ‘poor’ health and are at work?
     b. self-report ‘very good’ health and are at work?
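A sketch of the percent-at-work calculation follows, once over everyone and once restricted to the EMPSTAT universe. The codes used here are assumptions to verify on the codes pages: EMPSTAT 10 for "At work", EMPSTAT 0 for not in universe, HEALTH 2 for "Very good" and HEALTH 5 for "Poor". The helper `share_at_work()`, the weight name WTSUPP, and the rows in `toy` are all invented for the illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Invented rows; weights and codes are for illustration only.
toy <- tibble(
  HEALTH  = c(5, 5, 5, 2, 2, 2, 2),        # assumed: 2 = Very good, 5 = Poor
  EMPSTAT = c(10, 32, 0, 10, 10, 21, 0),   # assumed: 10 = At work, 0 = NIU
  WTSUPP  = c(100, 200, 100, 300, 300, 200, 200)
)

# Hypothetical helper: weighted percent at work within each HEALTH group.
share_at_work <- function(data) {
  data %>%
    group_by(HEALTH) %>%
    summarize(pct_at_work = 100 * sum(WTSUPP[EMPSTAT == 10]) / sum(WTSUPP))
}

all_people  <- share_at_work(toy)
in_universe <- share_at_work(filter(toy, EMPSTAT != 0))
```

The two results differ only in the denominator: restricting to the universe drops people who were never asked the employment question.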

Analyze the Sample – Part III Relationships in the Data

  1. What is the universe for AHRSWORKT?
  2. What are the average hours of work for each self-reported health category?
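Average hours by health category can be sketched with a weighted mean. This assumes that AHRSWORKT uses 999 for not in universe, so those records are dropped before averaging, and that the person weight is named WTSUPP. Check both against the website. The rows in `toy` are invented.

```r
library(dplyr, warn.conflicts = FALSE)

# Invented rows; 999 is the assumed not-in-universe code for AHRSWORKT.
toy <- tibble(
  HEALTH    = c(1, 1, 3, 3, 3),
  AHRSWORKT = c(40, 20, 999, 30, 50),
  WTSUPP    = c(100, 300, 500, 200, 200)
)

avg_hours <- toy %>%
  filter(AHRSWORKT != 999) %>%              # drop not-in-universe records
  group_by(HEALTH) %>%
  summarize(mean_hours = weighted.mean(AHRSWORKT, WTSUPP))
```

Leaving the 999 values in would pull every average sharply upward, so the filter matters more than it might appear.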

Bonus

  1. Use the ipumsr package metadata functions (like ipums_var_label() and ipums_file_info()) and ggplot2 to make a graph of the relationship between HEALTH and percent employed (from Part II above).
  2. Are there any variables that might be confounding this relationship? How might you explore this relationship?
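A starting point for the graph might look like the sketch below. The percentages in `plot_data` are placeholders chosen only so the code runs; they are not CPS results, and the HEALTH category names are assumed. With a real extract you would substitute your Part II summary, and the axis label could come from the metadata instead of being typed by hand, as hinted in the comment.

```r
library(ggplot2)

# Placeholder values only: replace with the percentages you computed.
health_levels <- c("Excellent", "Very good", "Good", "Fair", "Poor")
plot_data <- data.frame(
  health      = factor(health_levels, levels = health_levels),
  pct_at_work = c(50, 40, 30, 20, 10)
)

p <- ggplot(plot_data, aes(x = health, y = pct_at_work)) +
  geom_col() +
  labs(
    # With a DDI loaded, the label could instead be taken from the metadata,
    # e.g. ipums_var_label(cps_ddi, HEALTH), assuming an object named cps_ddi.
    x = "Self-reported health",
    y = "Percent at work"
  )
```

Storing the categories as an ordered factor keeps the bars in the questionnaire's order rather than alphabetical order.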