How old is the Senate?

The current tax reform debate, and congressional procedure in general, has had me watching a lot of C-SPAN lately. As I followed the debate, I got curious about something that became more and more apparent: I knew Congress was old… but exactly how old? This seemed like a fun, easy data visualization task, and here we are. I found a website that maintains a nice, simple HTML table of names, current ages, term lengths, party affiliations, and some other information about the Senate, and I started by scraping that data into R. If you want to run this yourself, the R code is below the graphic. Note: you may need to install some packages first, using the install.packages() lines commented out at the top.

I’m not going to walk through the scraping and data cleaning in detail, but be aware that it takes a little wrangling to get the table into the shape we need for the visualization. There are some nasty extra characters in the table that, for whatever reason, the readHTMLTable() function doesn’t handle well. I’m open to suggestions for other HTML scraping packages or functions that return cleaner data. The upside of using this website in particular is that the table appears to be updated daily, so the ages will be current if you re-run the code, say, a year from now.

Right now, the graphic shown here is flat and not interactive, which is less than ideal. I’ve got an embedded interactive one in the works. In the meantime, if you run the code below, it will generate an interactive plotly version of the graphic, with tooltips showing the age, term length, party, and the senator’s name.
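To try the tooltip behavior on its own, here is a minimal sketch that does not touch the scraped table. The data frame `demo` and its three rows are made up purely for illustration (they are not real senators), and the choice of which fields to show is an assumption rather than anything the full script sets. It relies on ggplotly()'s `tooltip` argument, which takes the names of the aesthetics to display on hover.

```r
library(ggplot2)
library(plotly)

# Made-up rows for illustration only; these are not real senators
demo <- data.frame(
  lastname   = c("Alpha", "Bravo", "Charlie"),
  Age        = c(52, 67, 81),
  TermLength = c(3, 12, 30),
  Party      = c("Democratic", "Republican", "Independent")
)

p <- ggplot(demo, aes(x = Age, y = TermLength, color = Party)) +
  # `text` is not a ggplot2 aesthetic, so ggplot2 warns about it;
  # plotly picks it up and uses it in the hover tooltip
  geom_point(aes(text = paste("Senator:", lastname)), size = 4)

# Restrict the tooltip to the custom text plus the x and y values
widget <- ggplotly(p, tooltip = c("text", "x", "y"))
```

Printing `widget` in an interactive R session opens the plot in the viewer; hovering over a point should then show the name, age, and term length.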

[Figure: sen_ages, "Age Distribution and Term Length of Current Senate"]

R Code

# get our packages loaded... you may have to install some of these
# install.packages("XML")
# install.packages("RCurl")
# install.packages("rlist")
# install.packages("ggplot2")
# install.packages("stringr")
# install.packages("plotly")

library(XML)
library(RCurl)
library(rlist)
library(ggplot2)
library(stringr)
library(plotly)

theurl <- getURL("https://infogalactic.com/info/List_of_current_United_States_Senators_by_age",.opts = list(ssl.verifypeer = FALSE) )
sen_ages <- readHTMLTable(theurl)
sen_ages <- sen_ages$`NULL`

# the last row is erroneously included, trim it off
sen_ages <- sen_ages[1:100,]

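# convert the current age field to character for parsing
# (the result has to be assigned back, otherwise the conversion is discarded)
sen_ages$`Current age` <- as.character(sen_ages$`Current age`)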

# grab the age in years out of that string
sen_ages$Age <- as.numeric(substr(sen_ages$`Current age`, 21,22))

# char conversion for party
sen_ages$Party <- as.character(sen_ages$Party)
# independent has some weird [1] after it, remove that
sen_ages$Party <- str_replace_all(sen_ages$Party,regex("^Indep.*$"),"Independent")
# make it a factor to use in the colorization
sen_ages$Party <- as.factor(sen_ages$Party)

# clean up the senator names
sen_ages$lastname <- as.character(sen_ages$Senator)
sen_ages$lastname <- gsub(",.*$", "", sen_ages$lastname)

# grab the term length (in years) out of the seventh column, as text for now
sen_ages$TermLength <- substr(as.character(sen_ages[,7]), 21,22)
# some are single digits, so just trim off that white space
sen_ages$TermLength <- trimws(sen_ages$TermLength, which = "right")
# make sure they are numeric
sen_ages$TermLength <- as.numeric(sen_ages$TermLength)

# define our colors for the parties
plotcols <- c('Democratic'='blue','Republican'='red','Independent'='green')

# plot it
plot <- ggplot(sen_ages, aes(x=Age)) + geom_histogram(aes(Age), alpha=0.3, bins=10) +
 scale_fill_manual(values=plotcols) +
 scale_color_manual(values = plotcols) +
 geom_vline(xintercept = 81) +
 geom_vline(xintercept = 65) +
 geom_point(aes(y=TermLength, color=Party, text=c(paste("Senator:",lastname))), alpha=0.6, size=4) +
 labs(x="Current Age in Years",
 y="Years in Office",
 title = "Age Distribution and Term Length of Current Senate") +
 annotate("text", x = 50, y = 19, label = "Histogram shows overall distribution of ages in Senate") +
 annotate("text", x= 70, y = 0.5, label="Vertical lines at 65y (retirement age) and 81y (female life expectancy)") +
 theme(legend.position="none")

plot

ggplotly(plot)
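The substr(..., 21, 22) calls above depend on the junk characters in front of the age always being the same length. One possible alternative, sketched here in base R, is to pull out the number that sits directly before the word "years". This assumes the cell text contains something like "84 years, 159 days" after the junk, which has not been checked against the live table; the helper name `extract_years` and the sample strings are hypothetical.

```r
# Hypothetical helper: return the number that precedes the word "years",
# or NA when the pattern is not found
extract_years <- function(x) {
  x <- as.character(x)  # also handles factor columns
  hits <- regmatches(x, regexec("([0-9]+)[[:space:]]+years", x))
  # each element is c(full match, captured number), or character(0) on no match
  years <- vapply(
    hits,
    function(hit) if (length(hit) >= 2) hit[2] else NA_character_,
    character(1)
  )
  as.numeric(years)
}

# Made-up sample strings with junk of different lengths ahead of the age
samples <- c("7004309040000000000\u266084 years, 159 days",
             "x9 years, 3 days",
             "no age here")
ages <- extract_years(samples)
```

Under that assumption, the same helper would cover both the age and the term length columns, and single-digit values would no longer need the trimws() step.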
