R graphics: varying geoms between facets in ggplot2

Hadley Wickham’s ggplot2 package is a very powerful and (once you’ve got used to it) intuitive R graphics framework, based on the Grammar of Graphics, that most R users will come across at some point. One of its most useful features is facetting: splitting data up between multiple plots, in the same window (or device), based on some aspect of the data – usually a factor.

Facetting is great but, as with many aspects of ggplot2, it lacks some flexibility. The reasons for this are often sensible, but the limits can still get in the way. We can use facets to split data, but there’s no built-in option to vary the geoms used to plot data between facets: all facets must share the same geom, e.g. bars or lines. This is frustrating when some aspects of your data would be better represented using a different geom to the one you’ve already chosen.
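To make the limitation concrete, here is a minimal sketch using made-up numbers (the data frame `toy` and its columns are purely illustrative, not the blog data used later in this post). A layer added to a faceted plot in the usual way is drawn in every panel:

```r
library(ggplot2)

# Made-up data: two series in long format (illustrative values only)
toy = data.frame(
  x = rep(1:5, times = 2),
  value = c(3, 1, 4, 1, 5, 3, 4, 8, 9, 14),
  variable = rep(c("Monthly", "Cumulative"), each = 5)
)

# A single geom_line layer is drawn in both facets
p = ggplot(toy, aes(x, value)) +
  geom_line() +
  facet_wrap(~ variable, nrow = 2, scales = "free_y")
print(p)

# The built plot shows which panels the layer has data in
unique(ggplot_build(p)$data[[1]]$PANEL)
```

Both panels get a line, even though bars might suit one of the two series better.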

Yesterday, however, I came across a work-around for this via a Stat Bandit blog post. The work-around subsets the data within each geom to control which facet each geom is plotted on. I’ve included an example below, which plots monthly counts of blog views alongside a cumulative count. My code is based on the code in that post, so do check it out too.

[Figure: Blog views]

# Load libraries
library(ggplot2)
library(reshape)

# Load data
BLOGVIEWS = read.table("blogviews.txt",
						header = TRUE,
						sep = "\t")

# We have time series data, with one observation per month
# Convert into Date class, specifying "1" as the day of the month
BLOGVIEWS$DATE = as.Date(paste("1", BLOGVIEWS$MON, BLOGVIEWS$YEAR),
						format = "%d %b %Y")

# Check that our data look as we expect						
str(BLOGVIEWS)

# We want to replace NAs (representing zero views) with 0
BLOGVIEWS$VIEWS[is.na(BLOGVIEWS$VIEWS)] = 0
# Next we calculate cumulative site views by month
BLOGVIEWS$CVIEWS = cumsum(BLOGVIEWS$VIEWS)
# Check the results
BLOGVIEWS

# To plot the data using facets, we need to reshape the
# data into 'long' format using melt
(BVIEWS.MELT = melt(BLOGVIEWS, id.vars = c("DATE", "MON", "YEAR")))

# Change the levels of the 'variable' factor so that our
# facets have sensible names
levels(BVIEWS.MELT$variable) = c("Monthly views", "Cumulative views")

# The first plot sets up the axes and facets, but we
# use geom_blank to draw a blank plot, which we'll add
# geoms to next
g1 = ggplot(BVIEWS.MELT, aes(DATE, value)) +
		geom_blank() +
		facet_wrap(~ variable,
					nrow = 2,
					scales = "free_y") +
		labs(x = "Year", y = "Number of views")

# Update the first plot, adding bars to display monthly counts
# Passing a subset of the data to the geom ensures that we only
# add to the facet corresponding to 'Monthly views'
g2 = g1 + geom_bar(data = subset(BVIEWS.MELT, variable == "Monthly views"),
					stat = "identity")
# Do the same for the 'Cumulative views' facet. It makes
# more sense to display these data using geom_line
g3 = g2 + geom_line(data = subset(BVIEWS.MELT, variable == "Cumulative views"),
					colour = "blue",
					size = 1)
# Finally, print the plot and save it to a .png image file					
print(g3)
ggsave(filename = "g3.png", plot = g3)
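
The file blogviews.txt isn't included with this post, so here is a self-contained sketch of the same idea that runs on made-up monthly counts. The data frame `FAKE`, its values and the helper `only()` are all hypothetical. The sketch also assumes a version of ggplot2 in which a layer's `data` argument may be a function of the plot data, which avoids repeating the data frame's name in each geom.

```r
library(ggplot2)

# Hypothetical monthly view counts (made-up values, not real blog data)
FAKE = data.frame(
  DATE = seq(as.Date("2012-01-01"), by = "month", length.out = 6),
  VIEWS = c(0, 12, 30, 25, 41, 38)
)
FAKE$CVIEWS = cumsum(FAKE$VIEWS)

# Reshape to long format with base R: one row per date and series
LONG = data.frame(
  DATE = rep(FAKE$DATE, times = 2),
  variable = factor(rep(c("Monthly views", "Cumulative views"),
                        each = nrow(FAKE)),
                    levels = c("Monthly views", "Cumulative views")),
  value = c(FAKE$VIEWS, FAKE$CVIEWS)
)

# Hypothetical helper: returns a function that keeps only the rows
# belonging to one facet, for use as a layer's 'data' argument
only = function(facet) {
  force(facet)
  function(d) d[d$variable == facet, ]
}

# Bars in the 'Monthly views' facet, a line in 'Cumulative views'
p = ggplot(LONG, aes(DATE, value)) +
  facet_wrap(~ variable, nrow = 2, scales = "free_y") +
  geom_col(data = only("Monthly views")) +
  geom_line(data = only("Cumulative views"), colour = "blue") +
  labs(x = "Month", y = "Number of views")
print(p)
```

Because each layer only receives the rows for its own facet, the other facet simply has nothing to draw for that geom.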