Below, you will find some of the graphs (and accompanying code) I made as part of the R4DS weekly visualization challenges. All of the entries I made for #TidyTuesday can be found on my Github page.
On this week’s TidyTuesday challenge, we worked with data on vaccination rates across the USA. I decided to make a graph of vaccination rates by county. See below for the final product. Start by loading the necessary packages and the measles data.
# packages
library(tidyverse)
library(pacman)
library(tidyverse)
# data
measles <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-25/measles.csv')
Let’s explore the data.
summary(measles$mmr) # distribution of vaccination rates
summary(measles$overall) # distribution of vaccination rates
weird <- measles %>%
filter(overall == -1 | mmr == -1)# who are the "-1" ? Apparently they're missing data.
table(weird$state) # some have missing data for "mmr", some have missing data for "overall" -- we will use this later
length(unique(measles$county)) # how many unique counties?
table(measles$year, useNA = "always") # which years?
Then, we need to do some data clean-up.
measles_plot_viz <- measles %>%
distinct(state, name, county, overall, mmr, lat, lng, enroll) %>% # remove duplicates
mutate(drop = ifelse(mmr >= 0 & overall >= 0, "drop", "keep")) %>%
filter(lng < 0) %>% # remove schools outside of the continent
gather(type, value, overall:mmr) %>%
filter(value >= 0) %>%
filter(!(drop == "drop" & type == "overall"))
Finally, let’s draw the map! I will be using the maps
package to do so. I had help from Kieran Healy’s book during this step, as
well as Timo Grossenbacher’s blog.
# load packages
library(maps)
library(mapproj)
# 1. load the county data
us_counties <- map_data("county") %>%
mutate(state_county = str_c(region, subregion, sep = "_"))
head(us_counties)
# 2. check the distribution of vacc. rate
summary(measles_plot_viz$value)
# 3. draw a plain county map to see if it's working
p <- ggplot(mapping = aes(x = long, y = lat, group = group), data = us_counties)
p1 <- p + geom_polygon(color = "white", size = 0.1) +
coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
guides(fill = FALSE) +
theme_minimal()
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
p1
Now that we know the map is working at the vaccination data makes sense, we can plot vaccination rates across the U.S. But first, for the map’s caption, we need to check which type of vaccination rate is plotted in which state:
table(measles_plot_viz$type, measles_plot_viz$state)
##
## Arizona Arkansas California Colorado Connecticut Florida Idaho
## mmr 1171 567 14068 1501 586 0 0
## overall 0 0 0 0 0 2667 467
##
## Illinois Iowa Maine Massachusetts Michigan Minnesota Missouri
## mmr 7115 0 357 946 0 1570 645
## overall 0 1163 0 0 2351 0 0
##
## Montana New Jersey New York North Carolina North Dakota Ohio
## mmr 520 0 4137 0 360 2919
## overall 0 1993 0 2083 0 0
##
## Oklahoma Oregon Pennsylvania Rhode Island South Dakota Tennessee
## mmr 0 806 1930 0 373 0
## overall 1052 0 0 215 0 1152
##
## Texas Utah Vermont Washington Wisconsin
## mmr 810 603 255 1977 0
## overall 0 0 0 0 2497
Now we can plot:
# 4. draw map with vacc. data
p <- ggplot()
p1 <- p +
geom_polygon(mapping = aes(x = long, y = lat, group = group),
data = us_counties,
fill = "#f5f5f2",
color = "black",
size = 0.1) + # add county lines
geom_point(mapping = aes(x = lng, y = lat, color = value),
data = subset(measles_plot_viz, value > 50),
size = 0.6) + # add school dots
coord_map(projection = "albers", lat0 = 39, lat1 = 45) + # make the map more "curvy"
theme_void() + # remove grid and axes
theme(legend.position = "bottom",
plot.title = element_text(hjust = 0.5,
vjust = 0.5,
size = 15),
plot.background = element_rect(fill = "#f5f5f2",
color = NA),
plot.caption = element_text(hjust = 0.5,
size = 10),
plot.margin = margin(t = 1, r = 1, b = 1, l = 1,
unit = "cm"),
legend.margin = margin(t = 2, r = 2, b = 5, l = 2,
unit = "mm")) + # modify plot title/caption/background/legend
scale_color_viridis_c(option = "plasma",
direction = -1,
name = "Vaccination rate",
guide = guide_legend(direction = "horizontal",
title.position = "top",
title.hjust = 0.5)) + # change legend title
labs(title = "Schools' Vaccination Rate in 31 U.S. States",
caption = "Measles, Mumps, and Rubella (MMR) vaccination\nrate shown in FL, ID, IA, MI, NJ, NC, OK, RI, TN, WI.\nElsewhere, the overall vaccination rate is shown.") # deifne title and subtitle
p1
# save plot using ggsave(filename = "name.png", plot = last_plot(), width = 15, height = 15)
For this week,s TidyTuesday, I create a graph that reports the names and achievements of African-Americans in the Arts and Entertainment. I chose a bar chart, where each bar corresponds to a decade of African-American accomplishments. Start by loading the necessary packages and the data on African-American firsts.
library(tidyverse)
library(extrafont)
firsts <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-06-09/firsts.csv')
# Let's explore some variables
table(firsts$category)
Now, we will recode the data. We will focus on the
person
variable, which contains the information on who the
achiever is, and what they’ve accomplished. Once we’ve cleaned up the
person
variable, we will create a categorical variable for
the decade of achievement, and collapse everyone (and their
accomplishment) from the same decade together into one observation. The
last thing we will do is insert a new line every 20 characters. If we do
not insert the new line, the text will run too wide in the graph.
dat <- firsts %>%
mutate(person_sh = str_remove_all(person, "\\(.*\\)"), # remove everything in parenthesis
person_sh = str_remove_all(person, "\\(.*"), # remove everything in parenthesis
person_sh = str_remove_all(person_sh, "\\[.*\\]"), # remove brackets and their content
person_sh = str_trim(person_sh, "right"), # remove white space at the end of names
person_complete = str_c(person_sh, ", ", accomplishment), # create complete name + accomplishment title
word_count = str_count(person_complete), # count number of words in complete names
count = 1,
person_complete = str_replace_all(person_complete, "African-American", "Af-Am"), # shorten for visualization purposes
twenty_yr = ifelse(year <= 1780, '1760s-70s',
ifelse(year >= 1781 & year <= 1800, "1780s-90s",
ifelse(year >= 1801 & year <= 1820, "1800s-10s",
ifelse(year >= 1821 & year <= 1840, "1820s-30s",
ifelse(year >= 1841 & year <= 1860, "1840s-50s",
ifelse(year >= 1861 & year <= 1880, "1860s-70s",
ifelse(year >= 1881 & year <= 1900, "1880s-90s",
ifelse(year >= 1901 & year <= 1920, "1900s-10s",
ifelse(year >= 1921 & year <= 1940, "1920s-30s",
ifelse(year >= 1941 & year <= 1960, "1940s-50s",
ifelse(year >= 1961 & year <= 1980, "1960s-70s",
ifelse(year >= 1981 & year <= 2000, "1980s-90s", "2000s-10s"))))))))))))) %>% # create categorical variable for twenty years
group_by(twenty_yr, category) %>%
mutate(persons_20 = paste0(person_complete, collapse = "; ")) %>% # collapse strings for each twenty years and category
ungroup() %>%
group_by(category) %>%
mutate(count_cat = cumsum(count)) %>%
ungroup() %>%
distinct(twenty_yr, count_cat, category, persons_20) %>%
group_by(twenty_yr, category) %>%
arrange(count_cat) %>%
filter(row_number() == n())
dat$persons_20 <- gsub("(.{15,}?)\\s", "\\1\n", dat$persons_20) # insert new line every 15 characters
Let’s make the graph! I was inspired by G. Karamanis’s work to do this graph. You can check out his work here. I decided to plot accomplishments in the Arts and Entertainment only, but you could do the same with science, education, sports, etc.
# plot, Arts only
viz <- dat %>% filter(category == "Arts & Entertainment")
p <- ggplot(viz, aes(x = twenty_yr, y = count_cat))
p1 <- p +
scale_y_continuous(name = "",
limits = c(-700,700)) +
scale_x_discrete(name = "",
expand = c(0.2, 0)) +
annotate("text", x = viz$twenty_yr, # add names
y = viz$count_cat,
label = viz$persons_20,
size = 2.5,
color = "white",
family = "Arial Narrow") +
annotate("text", x = "1760s-70s", # add title
y = 400,
label = "First\nAfrican Americans\nin Arts & Entertainment",
size = 15,
hjust = 0,
color = "white",
family = "Andale Mono") +
theme_void() +
coord_cartesian(clip = "off") +
theme(plot.background = element_rect(fill = 'grey11', colour = 'grey11'),
plot.margin = margin(200, 40, 100, 30), # play with margins
axis.text.x = element_text(colour = "gray69", # modify x axis
size = 15,
family = "Andale Mono"))
p1