As I learn more about exploratory data analysis, I’ve started working with R’s visualization libraries and learning topics like faceting, mapping, and basic interactive tables.
As I work with ggplot more, I find myself tweaking the graphics I’ve built quite a bit. When I built patchwork graphics by hand, this produced a lot of repetitive code. Much of it was related to quirks of the data and was partly unavoidable given my approach, but a lot of the duplicated code existed only to manage the theme elements of the plots.
Elements of a ggplot theme
ggplot theming is easier to show than to describe, so let’s look at a facet grid I wrote previously, this time stripped of its theme updates so that only the default `theme_gray()` is used. Then we will modify the major elements, each of which lets us change many individual minor elements at once.
If you are unfamiliar with how ggplot code structures a graphic, we start by piping our data (creatively called `data`) with the pipe operator `%>%` into an empty plot (`ggplot()`) and then add elements with the `+` operator, in this case a boxplot with `geom_boxplot()`.
```r
library(tidyverse)  # loads ggplot2 and the %>% pipe

# `data` is the reservoir release dataset used throughout this post
data %>%
  ggplot() +
  geom_boxplot(
    mapping = aes(
      x = Reservoir,
      y = Release,
      fill = Reservoir)
  ) +
  facet_wrap(
    Site ~ .,
    scales = "free_y") +
  labs(
    title = "Water Release from Upstate Reservoirs (millions of gallons)",
    caption = "Source: NYC Department of Environmental Protection"
  )
```
I find the default theme very ugly, but it’s sufficient to demonstrate each major element of a ggplot theme: lines, rectangles, text and title, and aspect ratio. When these major elements are modified, many minor elements of the theme will inherit the changes. Minor element categories include the axes, legend, panel, plot, and strips.
We’ll use `?theme` to read the docs and identify specific elements to modify.
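Besides reading `?theme`, you can ask ggplot2 directly which major element a minor element inherits from by walking its element tree. This is a minimal sketch, assuming ggplot2 3.3.0 or later, where `get_element_tree()` is exported; `parents_of()` is a hypothetical helper written only for illustration.

```r
library(ggplot2)

# Hypothetical helper: follow the `inherit` field of ggplot2's element tree
# from a theme element up to its root element (such as line, rect, or text).
parents_of <- function(element, tree = get_element_tree()) {
  chain <- character(0)
  parent <- tree[[element]]$inherit
  while (length(parent) > 0) {
    # Some elements list more than one parent; this sketch follows the first
    chain <- c(chain, parent[1])
    parent <- tree[[parent[1]]]$inherit
  }
  chain
}

parents_of("axis.text.x")
parents_of("panel.background")
parents_of("panel.grid.major.y")
```

Each chain ends at one of the major elements, which is why changing `text`, `line`, or `rect` can ripple through so many minor elements at once.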
```r
data %>%
  ggplot() +
  geom_boxplot(
    mapping = aes(
      x = Reservoir,
      y = Release,
      fill = Reservoir)
  ) +
  facet_wrap(
    Site ~ .,
    scales = "free_y") +
  labs(
    title = "Water Release from Upstate Reservoirs (millions of gallons)",
    caption = "Source: NYC Department of Environmental Protection"
  ) +
  theme(
    rect = element_rect(fill = "#55aaff"),
    line = element_line(linewidth = 1),  # `size` was renamed `linewidth` in ggplot2 3.4.0
    text = element_text(family = "IBM Plex Sans"),
    aspect.ratio = 1/2.76  # height / width, so this gives a wide 2.76:1 panel
  )
```
With this, we can see each of the major elements modified. The rectangle fill is now light blue (`#55aaff`), but many rectangles did not inherit the change. That is because elements such as `panel.background` and `strip.background` set their own fill in `theme_gray()`, and a child element only inherits the properties it leaves unspecified. That is probably for the best, since globally changing every rectangular fill is not something most people want to do.
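The lines now have a width of 1, so the grid and tick lines are much thicker. Notably, the boxplots themselves are not modified, since a theme only controls the non-data parts of a plot. The text is now set to IBM Plex Sans, which I use throughout my blog. The aspect ratio is set to a cinematic 2.76:1, the ultra-widescreen ratio of Ultra Panavision 70 films. Keep in mind that `aspect.ratio` is the panel’s height divided by its width, so a wide 2.76:1 panel needs `aspect.ratio = 1/2.76`; a value of 2.76 produces tall, narrow panels instead.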
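To check which elements picked up the new settings without rendering anything, `calc_element()` resolves an element’s final properties after inheritance. This is a minimal sketch, assuming only that ggplot2 is installed; it uses the generic `"serif"` family as a stand-in for IBM Plex Sans so it does not depend on installed fonts.

```r
library(ggplot2)

demo_theme <- theme_gray() +
  theme(
    rect = element_rect(fill = "#55aaff"),
    text = element_text(family = "serif")  # stand-in for IBM Plex Sans
  )

# axis.text.x sets no family of its own, so it inherits "serif" from text
calc_element("axis.text.x", demo_theme)$family

# plot.background sets no fill in theme_gray(), so it should pick up the blue
calc_element("plot.background", demo_theme)$fill

# panel.background sets its own fill in theme_gray(), so it keeps that fill
calc_element("panel.background", demo_theme)$fill
```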
Modifying minor elements
Now let’s demonstrate modifying some minor elements. We’ll keep the global text element change, but the other changes were not practical settings and were really only useful for demonstration.
```r
data %>%
  ggplot() +
  geom_boxplot(
    mapping = aes(
      x = Reservoir,
      y = Release,
      fill = Reservoir)
  ) +
  facet_wrap(
    Site ~ .,
    scales = "free_y") +
  labs(
    title = "Water Release from Upstate Reservoirs (millions of gallons)",
    caption = "Source: NYC Department of Environmental Protection"
  ) +
  theme(
    text = element_text(family = "IBM Plex Sans"),
    axis.ticks.x = element_blank(),
    axis.ticks.y = element_blank(),
    axis.text.x = element_text(angle = 90),
    # centre the legend title; replaces legend.title.align, deprecated in ggplot2 3.5.0
    legend.title = element_text(hjust = 0.5),
    legend.key = element_blank(),
    legend.background = element_rect(color = "#EEEEEE"),
    panel.background = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_line(color = "#EEEEEE"),
    plot.caption.position = "plot",
    strip.background = element_blank()
  )
```
This now looks much more presentable. I won’t explain every individual change, but I modified at least one item from each minor element category. If this seems like a lot of lines for a handful of changes, keep in mind that a complete theme defines on the order of a hundred elements, even if some of them are blank.
I would print out a theme object here, but I don’t have a good way to do that without a 395-line block of output. If you’re interested, run `theme_gray()` to see the details of the standard theme. While patchworking, I used `theme_test()` as a minimal base to reduce the number of lines I had to write, since it already has many blank elements.
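Rather than printing a whole theme, a few base R calls can summarize one. This is a minimal sketch; `count_blank()` is a hypothetical helper, and its class check accepts both `element_blank` and the namespaced `ggplot2::element_blank` to stay safe across ggplot2 versions.

```r
library(ggplot2)

full <- theme_gray()

length(full)           # how many elements the theme defines
head(names(full), 10)  # the first few element names
full$panel.background  # print a single element instead of the whole theme

# Hypothetical helper: count the elements a theme sets to element_blank()
count_blank <- function(th) {
  is_blank <- function(el) {
    inherits(el, "element_blank") || inherits(el, "ggplot2::element_blank")
  }
  sum(vapply(th, is_blank, logical(1)))
}

count_blank(theme_gray())
count_blank(theme_test())
```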
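Speaking of blank elements, elements that a theme leaves undefined are `NULL`, both in the default themes and in those that come with packages like ggthemes, and a `NULL` element simply inherits its settings from its parent. The special function I used above, `element_blank()`, is different: it tells ggplot to draw nothing for that element and assign it no space, and it stops the element from inheriting anything from its parent. Finally, adding `theme()` to a plot only modifies that plot, not future ones.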
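The difference between an unset (`NULL`) element and `element_blank()`, and the per-plot scope of `theme()`, can be checked with `calc_element()` and `theme_get()`. This is a minimal sketch, assuming a fresh R session where the default theme is active; `is_element()` is a hypothetical helper that accepts both plain and namespaced class names.

```r
library(ggplot2)

# Hypothetical helper: class check that tolerates namespaced class names
is_element <- function(el, type) {
  inherits(el, paste0("element_", type)) ||
    inherits(el, paste0("ggplot2::element_", type))
}

base <- theme_gray()

# theme_gray() leaves axis.ticks.x unset (NULL) ...
is.null(base$axis.ticks.x)
# ... so it resolves to the line settings it inherits from axis.ticks
is_element(calc_element("axis.ticks.x", base), "line")

# element_blank() is an actual value: it stops inheritance and draws nothing
blanked <- base + theme(axis.ticks.x = element_blank())
is_element(calc_element("axis.ticks.x", blanked), "blank")

# Adding theme() to one plot leaves the session's current theme untouched
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  theme(axis.ticks.x = element_blank())
is.null(theme_get()$axis.ticks.x)
```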
Updating and building themes
Instead of the tedious, repetitive, and error-prone task of modifying each plot with `theme()`, it’s much more efficient and practical to set a theme once. I plan to use R heavily, so a personal theme makes sense for me. Some ready-made themes are close to what I want, particularly `theme_test()`, but I’d like more control.
To get an idea of how others use ggplot theming, I surveyed, for inspiration, a number of style guides written by organizations that use R, including the BBC’s R Cookbook. This TDS article was helpful in thinking about developing themes, and the actual code for the default `theme_grey()` was also insightful.
After this survey, it seems that for my immediate purposes and RMarkdown blog workflow, the most efficient route for now is to set the changes I want once at the beginning of a post and then reuse that code in each post. Eventually, I’d like to create a package with a full theme, which is the approach the BBC took.
Their package, bbplot, has two functions: `bbc_style()`, which applies the BBC’s theme, and `finalise_plot()`, which adds branding and proper spacing and saves the finished chart. A typical plot adds `bbc_style()` to the ggplot chain, as in `data %>% ggplot() + bbc_style()` with the relevant geometries and `theme()` tweaks added in between, and then passes the result to `finalise_plot()`.
Instead of adding the theme to each plot, I’d add these lines to my setup code chunk:
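```r
library(tidyverse)
# install.packages('devtools')
# devtools::install_github('aemacleod/theme_aem')
library(theme_aem)      # hypothetical package; note a real R package name cannot contain "_"
theme_set(theme_aem())  # theme_aem() must be called so it returns a theme object
```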
That would make the theme apply by default, so adding `+ theme_aem()` to each plot wouldn’t be necessary, only per-plot modifications with `theme()`. For now, I will add the changes I want to my initial setup code chunk and then reuse that code in subsequent RMarkdown post documents.
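Until such a package exists, the same idea works as a plain function that returns a theme. This is a minimal sketch: `theme_aem_sketch()` is a hypothetical stand-in for `theme_aem()`, built on `theme_test()` with a few of the tweaks from this post, and the font is left as a `base_family` argument so the example does not depend on IBM Plex Sans being installed.

```r
library(ggplot2)

# Hypothetical stand-in for theme_aem(): theme_test() plus personal tweaks
theme_aem_sketch <- function(base_size = 11, base_family = "") {
  theme_test(base_size = base_size, base_family = base_family) +
    theme(
      axis.ticks = element_blank(),
      legend.key = element_blank(),
      legend.background = element_rect(colour = "#EEEEEE"),
      panel.background = element_blank(),
      panel.grid.major.x = element_blank(),
      panel.grid.major.y = element_line(colour = "#EEEEEE"),
      plot.caption.position = "plot",
      strip.background = element_blank()
    )
}

# Per plot, in the style of bbc_style():
p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot() +
  theme_aem_sketch()

# Or once per document, so every later plot picks it up;
# theme_set() returns the previous theme, so theme_set(old) undoes this
old <- theme_set(theme_aem_sketch())
```

Layering the tweaks with `+` merges them into the base theme’s elements, so any property a tweak leaves out keeps its base value.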
Creating a personal theme
The ggplot2 documentation describes some of the basics of working with themes: `theme_get()` returns the current theme, which is `theme_gray()` by default; `theme_set(theme_test())` changes the theme to `theme_test()` for future plots; `theme_update()` modifies the elements you name while keeping any of their properties you leave unspecified; and `theme_replace()` replaces the named elements entirely, so their unspecified properties become `NULL`.
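The difference between `theme_update()` and `theme_replace()` shows up clearly on a single element. This is a minimal sketch, assuming the default `theme_gray()`, whose `axis.text` element sets a relative `size` as well as a colour; the previous theme is restored at the end.

```r
library(ggplot2)

old <- theme_set(theme_gray())  # start from the default and remember the previous theme

# theme_update() merges: the colour changes, but axis.text keeps its size
theme_update(axis.text = element_text(colour = "red"))
updated <- theme_get()$axis.text

theme_set(theme_gray())  # reset before trying the other function

# theme_replace() swaps in the new element wholesale, so its size is now unset
theme_replace(axis.text = element_text(colour = "red"))
replaced <- theme_get()$axis.text

updated$size   # still set, kept from the theme_gray() element
replaced$size  # NULL

theme_set(old)  # restore the previous theme
```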
To demonstrate the basics of this, I have taken the theme modifications I used above and put them in `theme_update()` instead of `theme()`. They should now become the default settings for subsequent plots.
```r
theme_update(
  text = element_text(family = "IBM Plex Sans"),
  axis.ticks.x = element_blank(),
  axis.ticks.y = element_blank(),
  axis.text.x = element_text(angle = 90),
  # centre the legend title; replaces legend.title.align, deprecated in ggplot2 3.5.0
  legend.title = element_text(hjust = 0.5),
  legend.key = element_blank(),
  legend.background = element_rect(color = "#EEEEEE"),
  panel.background = element_blank(),
  panel.grid.major.x = element_blank(),
  panel.grid.major.y = element_line(color = "#EEEEEE"),
  plot.caption.position = "plot",
  strip.background = element_blank()
)
```
Now I will repeat the same code block I used at the top of this post to confirm that the changes to the working theme were saved:
```r
data %>%
  ggplot() +
  geom_boxplot(
    mapping = aes(
      x = Reservoir,
      y = Release,
      fill = Reservoir)
  ) +
  facet_wrap(
    Site ~ .,
    scales = "free_y") +
  labs(
    title = "Water Release from Upstate Reservoirs (millions of gallons)",
    caption = "Source: NYC Department of Environmental Protection"
  )
```
These are not all the changes I would make to the theme, but they are a good representative sample. The full theme I drafted to address all theme elements has 366 lines of code, which is more than I want to put directly into any document. I saved it in a separate file, `theme_aem.R`, and will need to import it in order to use it.
The R Markdown Cookbook includes instructions for importing scripts from external sources. First, I’ll set the theme back to the default with `theme_set()`, then import my theme file and plot the data again. I added the x-axis text rotation manually with `theme()` because that’s a non-standard modification that I did not include in my theme.
```r
theme_set(theme_gray())  # reset to the default theme

# Run the theme script in knitr's global environment so it applies to later chunks
source("~/Documents/theme_aem.R", local = knitr::knit_global())

data %>%
  ggplot() +
  geom_boxplot(
    mapping = aes(
      x = Reservoir,
      y = Release,
      fill = Reservoir)
  ) +
  facet_wrap(
    Site ~ .,
    scales = "free_y") +
  labs(
    title = "Water Release from Upstate Reservoirs (millions of gallons)",
    caption = "Source: NYC Department of Environmental Protection"
  ) +
  theme(
    axis.text.x = element_text(angle = 90)  # per-plot tweak not included in the saved theme
  )
```
These theme settings work for this particular plot, which has many common elements and was good for demonstration, but I will need to battle-test them over time with other plots to make sure all the individual settings make sense. I will also experiment to see whether there are small variations I like better.
It’s a decent start, and it will let me save a lot of time and effort manually adjusting plots.