You may have heard that Republicans in the Senate have been putting together a “repeal and replace” bill to “fix” Obamacare called the Better Care Reconciliation Act (BCRA). I won’t get into the political debate or the many demerits of the bill on this blog. Others have much more insightful analysis than I. Suffice it to say, the evidence-wielding health policy crowd has little positive to say about BCRA. Most relevant to this post, the current Senate bill removes important provisions for those currently on Medicaid, a program that helps the poor in many ways. Last week, the Congressional Budget Office (CBO) released their score for BCRA, and projected that 22 million people would lose insurance coverage by the year 2026 under BCRA. It gets worse by 2036, according to a report the CBO released a few days later to project longer-term trends.
In addition to these projections, the Center for American Progress (CAP) created state-level projections of increases in the percent of uninsured as well as state-level losses in the number of insured individuals, including those losing Medicaid coverage. It is these projections upon which I focus today. I’ll warn readers: the rest of the post will be statistical software how-to and less about the law itself, so no hard feelings if you turn back now.
I’ve been toying around in the statistical software R for a year or so, trying to teach myself some data management, visualization, and statistical analyses. A cool visualization package, geofacet, was released recently that allows you to build a geographic “grid” with different regions onto which you can map regional data. I took this opportunity to come up with a timely hands-on example of using geofacet that I’d like to share. So here’s how I put this map together.
CAP makes available spreadsheets of their analysis results, so I took that spreadsheet as a starting point. That dataset shows numbers of people losing coverage by state and eligibility type: non-elderly adults, children, disabled, and elderly. They also include a projection for number of people losing Medicaid coverage via rollbacks of Medicaid expansion made possible under Obamacare. Since not all states expanded Medicaid, I didn’t include those projections in this graphic.
Preview of the CAP Dataset
Before we get into the R code, we need to do a few manipulations to the dataset in its spreadsheet form. In the future, I may update this post with how to do all of these steps in R, but for this example I did matching with the VLOOKUP function and percentage conversions in Google Sheets.
For geofacet to really shine, you need the range of your x-axis to be the same across all states (or whatever units you’re using). If you allow them to be flexible, you risk misrepresenting your results and states are not directly comparable to one another. However, if you make them the same and there are massive ranges across states, you will get a chart with almost invisible bars for some states. This happened to me initially because the CAP numbers are raw counts of people, and obviously more people in California stand to lose coverage than in Wyoming, simply by population. So we have to convert them into percentages of state totals of Medicaid enrollees. This is likely a more meaningful statistic as well, to see the relative losses in each state.
To get percentages, we need another dataset that the CAP team used from the Medicaid and CHIP Payment and Access Commission (MACPAC). This spreadsheet shows the 2013 numbers of Medicaid enrolled individuals across the different coverage types that CAP analyzed. It’s important to recognize that Medicaid inherently has a lot of churn – as in people fluctuate in and out of Medicaid eligibility – so we use this dataset that is the “full year equivalent” enrollment data. Once we match the enrollment numbers from MACPAC to the loss numbers from CAP, we can divide the losses by the number of enrolled for each coverage type and arrive at a percentage loss for each state and eligibility group. An important limitation that has to be recognized here is that we are taking coverage losses accruing from 2018-2026 as a percentage of 2013 enrollments. This is not ideal; we’d much prefer contemporaneous enrollments and losses over time. However, the purpose of the example, we will settle for the 2013 denominator. Matching the MACPAC data and calculating percentages gives us the spreadsheet below (truncated). Now it’s time to use R (annotated R script here).
CAP Dataset with added baseline enrollments from MACPAC.
Load the packages we will need for 1) pulling the data in from Google Sheets, 2) setting it up for display, and 3) displaying it on the map grid.
install.packages("devtools") # install geofacet dev version from github devtools::install_github("hafen/geofacet") install.packages("ggplot2","googlesheets","reshape2","plyr","geofacet") library(ggplot2) library(googlesheets) library(reshape2) library(plyr) library(geofacet)
I use the “googlesheets” package to pull in data from Google Sheets into R for analysis. I do this so that all of the data can be in my Drive and I don’t have to import an actual document. Instead I can just have R connect to my Google Sheets (once authenticated that it’s me) and scrape the data I want. See this post for more on using googlesheets.
We connect to Google Sheets and list out our most recently edited sheets document with the gs_ls() command:
Running this command in R will generate a list that looks like this:
Header of googlesheets list of recently updated sheets.
To pull in the data we want from the “CAP_BCRA” sheet (which you can access & add to your Drive here), we need to run another couple of commands. The first stores a bunch of “demographic” information from the document so that we can reference it in future code. The second is using some of that information to tell R that we want it to look at the “CAP_BCRA” document and bring in data from the “for_r” sheet.
CAP_BCRA <- gs_title("CAP_BCRA") cap_bcra_state_losses <- gs_read(CAP_BCRA, ws = "for_r", stringsAsFactors=FALSE)
We now have our data in R and can get it ready for use in the geofacet mapping package. We will create a new data set that only has the state identifiers and percentage columns that we care about mapping.
cap_bcra_state_losses_pct <- cap_bcra_state_losses[,c(1,2,5,8,11,14,17)]
Now, since the columns are not intuitively named, we will rename them for better visualization and readability.
cap_bcra_state_losses_pct <- rename(cap_bcra_state_losses_pct, c("Total Pct Lost" = "Total Non-Elderly", "Adult Pct Lost" = "Adult", "Children Pct Lost" = "Children", "Disabled Pct Lost" = "Disabled", "Elderly Pct Lost" = "Elderly"))
We then need to convert the percentages into whole number percents (e.g. 98 instead of 0.98).
cap_bcra_state_losses_pct$`Total Non-Elderly` <- cap_bcra_state_losses_pct$`Total Non-Elderly`*100 cap_bcra_state_losses_pct$Adult <- cap_bcra_state_losses_pct$Adult*100 cap_bcra_state_losses_pct$Children <- cap_bcra_state_losses_pct$Children*100 cap_bcra_state_losses_pct$Disabled <- cap_bcra_state_losses_pct$Disabled*100 cap_bcra_state_losses_pct$Elderly <- cap_bcra_state_losses_pct$Elderly*100
The last step of setup is to take our dataset from a “wide” one to a “long” one, which geofacet needs in order to accurately map the data. The data currently looks like this:
And this command:
cap_bcra_state_losses_pct <- melt(cap_bcra_state_losses_pct, id = c("name","code"))
Makes it look like this:
At this point, our data is ready to be mapped on to the United States grid. We will tell the plotting command to use our percent dataset above, and fill the columns by state with the “value” associated with each “variable.” We’ll make our y-axis only go to 50% since none of the values are above 50. The rest is simply adding text in the form of a title, caption, and axis labels. Finally, the facet_geo argument tells R to use the default US state map grid and match states based on “code” which in this case is the two-letter abbreviation of the state.
ggplot(cap_bcra_state_losses_pct, aes(variable, value, fill = variable)) + geom_col() + scale_y_continuous(limits = c(0,50), breaks = c(10, 20, 30, 40, 50)) + coord_flip() + theme_bw() + labs(title = "Percent Loss in Medicaid Coverage for 2026, by State and Eligibility Type", caption = "Data Source: Center for American Progress & MACPAC. ID, LA, RI inestimable due to unreliable MACPAC data on total enrollees. Visualization by @NateApathy", x = "Eligibility Type", y = "Percent of Medcaid Enrollees Losing Coverage") + facet_geo(~ code, label = "name")
After running this, a window should open and populate your graphic!
Final map showing percentage losses in Medicaid coverage under BCRA by state and eligibility type.