Purpose

This document describes how to interact with U.S. Census data. This allows researchers to collect data relevant to their research questions.

Prerequisites

  • You must have a Census API key.

Visit https://api.census.gov/data/key_signup.html to apply for an API key. The key will be delivered to you via email.

Installation

Package dependencies

We’ll use the tidycensus package. Most of the steps in this brief how_to come from https://walkerke.github.io/tidycensus/articles/basic-usage.html.

if (!require(tidycensus)){
  install.packages("tidycensus")
}
## Loading required package: tidycensus
# Load tidyverse package for pipe `%>%` operator
library(tidyverse)
## ── Attaching packages ────────────────
## ✓ ggplot2 3.3.0     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.5
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ─────────────────────────
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Store credentials

Run the following command at the R console, inserting the API key you received from the Census.

tidycensus::census_api_key("YOUR API KEY GOES HERE")

To install for future sessions, run the following command:

tidycensus::census_api_key("YOUR API KEY GOES HERE", install = TRUE)

This saves the Census API key in your .Renviron variable.

Testing the package

The tutorial suggests the following command to test the API:

med_rent_1990 <- tidycensus::get_decennial(geography = "state", 
                                         variables = "H043A001", year = 1990)
## Getting data from the 1990 decennial Census
head(med_rent_1990)
## # A tibble: 6 x 4
##   GEOID NAME       variable value
##   <chr> <chr>      <chr>    <dbl>
## 1 01    Alabama    H043A001   325
## 2 02    Alaska     H043A001   559
## 3 04    Arizona    H043A001   438
## 4 05    Arkansas   H043A001   328
## 5 06    California H043A001   620
## 6 08    Colorado   H043A001   418

This downloads data about median rents by state from the 1990 Census.

med_rent_1990 %>%
  ggplot(aes(x = value, y = reorder(NAME, value))) + 
  geom_point()

Here’s another test of the package. Let’s look at household income by county in Pennsylvania from the 2018 American Community Survey (ACS).

pa <- tidycensus::get_acs(geography = "county", 
              variables = c(medincome = "B19013_001"), 
              state = "PA", 
              year = 2018)
## Getting data from the 2014-2018 5-year ACS
pa
## # A tibble: 67 x 5
##    GEOID NAME                           variable  estimate   moe
##    <chr> <chr>                          <chr>        <dbl> <dbl>
##  1 42001 Adams County, Pennsylvania     medincome    64507  1226
##  2 42003 Allegheny County, Pennsylvania medincome    58383   611
##  3 42005 Armstrong County, Pennsylvania medincome    49032  2051
##  4 42007 Beaver County, Pennsylvania    medincome    55828  1201
##  5 42009 Bedford County, Pennsylvania   medincome    49146  1524
##  6 42011 Berks County, Pennsylvania     medincome    61522   826
##  7 42013 Blair County, Pennsylvania     medincome    47969  1449
##  8 42015 Bradford County, Pennsylvania  medincome    51457  1337
##  9 42017 Bucks County, Pennsylvania     medincome    86055  1084
## 10 42019 Butler County, Pennsylvania    medincome    68472  1115
## # … with 57 more rows

With a bit of cleaning, we can make a nice plot.

pa %>%
  mutate(NAME = gsub(" County, Pennsylvania", "", NAME)) %>%
  ggplot(aes(x = estimate, y = reorder(NAME, estimate))) +
  geom_errorbarh(aes(xmin = estimate - moe, xmax = estimate + moe)) +
  geom_point(color = "red", size =1) +
  labs(title = "Household income by county in Pennsylvania",
       subtitle = "2013-2017 American Community Survey",
       y = "",
       x = "ACS estimate (bars represent margin of error)")