R: facet_grid plot of differences between groups using ggplot2
I'm trying to create a series of plots showing the differences between groups of a measured variable, and I'm searching for an efficient way to do this using the facet_grid feature of ggplot2 in R.
Here is an illustrative example:
    library(ggplot2)

    # sample input data
    df <- data.frame(year = rep(c(2011:2015), 2),
                     value = c(0:4, 1:5),
                     scenario = rep(c("a", "b"), each = 5))

    # make sample plot
    p <- ggplot(df, aes(x = year, y = value)) +
      geom_point() +
      geom_line() +
      facet_grid(scenario ~ scenario)
This produces the following sample plot, in which value is plotted against year separately for each scenario combination:
(I assume the second row is not plotted because it is identical to the first.)
However, I'm looking for a plot where, in each facet, (value in the scenario on top) - (value in the scenario on the right) is plotted against year. Specifically:
- The upper left plot would be (value a) - (value a) = 0 for all years.
- The upper right plot would be (value b) - (value a) = 1 for all years.
- The lower left plot would be (value a) - (value b) = -1 for all years.
- The lower right plot would be (value b) - (value b) = 0 for all years.
I have not been able to find a built-in or automated difference command for facet_grid. My initial thought was to pass a function to the y argument of ggplot, but given that the data frame has a single value column I got stumped. I'm guessing there might be a solution using a combination of dplyr and reshape2, but I cannot wrap my head around how to implement it.
Here is an option using functions from tidyr: first spread the data to allow the contrasts to be calculated, then gather it again to allow plotting:
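    library(dplyr)
    library(tidyr)

    forplotting <-
      df %>%
      # one column per scenario, so the contrasts can be computed row-wise
      spread(scenario, value) %>%
      mutate(`a - b` = a - b,
             `b - a` = b - a,
             `a - a` = 0,
             `b - b` = 0) %>%
      # back to long format: one row per year and comparison
      gather(comparison, difference, -(year:b)) %>%
      separate(comparison, c("first val", "second val"), " - ")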
That returns a data.frame (just the head here):
      year a b first val second val difference
    1 2011 0 1         a          b         -1
    2 2012 1 2         a          b         -1
    3 2013 2 3         a          b         -1
    4 2014 3 4         a          b         -1
    5 2015 4 5         a          b         -1
    6 2011 0 1         b          a          1
And you can plot it like so:
    ggplot(forplotting, aes(x = year, y = difference)) +
      geom_point() +
      geom_line() +
      facet_grid(`first val` ~ `second val`)
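For readers on a recent tidyr, the same spread/gather steps can be sketched with `pivot_wider()` and `pivot_longer()`, which the tidyr documentation recommends over `spread()` and `gather()`. This is only an illustration of the approach above, assuming tidyr 1.0.0 or later and the two-scenario sample data from the question; it is not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same sample data as in the question
df <- data.frame(
  year = rep(2011:2015, 2),
  value = c(0:4, 1:5),
  scenario = rep(c("a", "b"), each = 5)
)

forplotting <- df %>%
  # One column per scenario (the role spread() plays above)
  pivot_wider(names_from = scenario, values_from = value) %>%
  mutate(
    `a - b` = a - b,
    `b - a` = b - a,
    `a - a` = 0,
    `b - b` = 0
  ) %>%
  # Back to long format, one row per year and comparison (the role of gather())
  pivot_longer(
    cols = -c(year, a, b),
    names_to = "comparison",
    values_to = "difference"
  ) %>%
  separate(comparison, c("first val", "second val"), sep = " - ")

# Build the faceted plot; print(p) to display it
p <- ggplot(forplotting, aes(x = year, y = difference)) +
  geom_point() +
  geom_line() +
  facet_grid(`first val` ~ `second val`)
```

With this facet formula the rows carry the first value of each difference and the columns the second; writing the formula as `second val` ~ `first val` instead should put the first value on top, which is the layout described in the question.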
The bigger question is why you would want to do this. I assume you know that plotting the 2 sets as different color lines is easier for visualization:
    ggplot(df, aes(x = year, y = value, col = scenario)) +
      geom_point() +
      geom_line()
So, I am assuming that you have more complicated data, specifically lots more columns to compare. Here is an approach to automate (and simplify) many of the above steps for multiple columns. The approach is the same, but it uses mutate_ to allow you to pass in a vector of the columns you are trying to create.
    df <- data.frame(
      year = 2011:2015,
      a = 0:4,
      b = 1:5,
      c = 2:6,
      d = 3:7
    )

    # every pairwise "x - y" expression, as a named list of strings
    allcontrasts <-
      outer(colnames(df)[-1], colnames(df)[-1], paste, sep = " - ") %>%
      as.character() %>%
      setNames(., .) %>%
      as.list()

    forplotting <-
      df %>%
      # mutate_() evaluates each string as an expression
      # (deprecated since dplyr 0.7.0, but still available)
      mutate_(.dots = allcontrasts) %>%
      select(-(a:d)) %>%
      gather(comparison, difference, -year) %>%
      separate(comparison, c("first val", "second val"), " - ") %>%
      # drop the trivial self-comparisons
      filter(`first val` != `second val`)

    ggplot(forplotting, aes(x = year, y = difference)) +
      geom_point() +
      geom_line() +
      facet_grid(`first val` ~ `second val`) +
      theme(axis.text.x = element_text(angle = 90))
This gives:
Why can I not leave this alone? I am just playing with standard evaluation too much. If you have non-parsing column names (e.g., things with spaces), the above will fail. So, here is an example with such column names, showing the addition of backticks to ensure the columns parse correctly.
    df <-
      data.frame(
        year = 2011:2015,
        value = c(0:4, 1:5, 2:6, 3:7),
        scenario = rep(c("unit 1", "exam 2", "homework", "final exam"),
                       each = 5)
      ) %>%
      spread(scenario, value)

    # wrap each column name in backticks so the contrast strings parse
    allcontrasts <-
      outer(paste0("`", colnames(df)[-1], "`"),
            paste0("`", colnames(df)[-1], "`"),
            paste, sep = " - ") %>%
      as.character() %>%
      setNames(., .) %>%
      as.list()

    forplotting <-
      df %>%
      mutate_(.dots = allcontrasts) %>%
      select_(.dots = paste0("-`", colnames(df)[-1], "`")) %>%
      gather(comparison, difference, -year) %>%
      separate(comparison, c("first val", "second val"), " - ") %>%
      filter(`first val` != `second val`) %>%
      # strip the backticks again so the facet labels are clean
      mutate_each(funs(gsub("`", "", .)), `first val`, `second val`)

    ggplot(forplotting, aes(x = year, y = difference)) +
      geom_point() +
      geom_line() +
      facet_grid(`first val` ~ `second val`) +
      theme(axis.text.x = element_text(angle = 90))
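The backtick handling can be avoided altogether if the contrasts are computed on the long-format data. The sketch below is a different technique from the mutate_ approach above, not the answer's own method: it joins the data to itself on year using base R's `merge()`, so scenario names are only ever treated as data and never parsed as code. The names `df_long`, `pairwise`, and the `_first` / `_second` suffixes are chosen here purely for illustration.

```r
library(ggplot2)

# Long-format data whose scenario names are not syntactically valid R names
df_long <- data.frame(
  year = rep(2011:2015, 4),
  value = c(0:4, 1:5, 2:6, 3:7),
  scenario = rep(c("unit 1", "exam 2", "homework", "final exam"), each = 5)
)

# Self-join on year: one row per year and (first, second) scenario pair.
# The colliding columns get the suffixes, e.g. value_first / value_second.
pairwise <- merge(df_long, df_long, by = "year",
                  suffixes = c("_first", "_second"))
pairwise$difference <- pairwise$value_first - pairwise$value_second

# Drop the trivial self-comparisons, as the filter() step above does
pairwise <- pairwise[pairwise$scenario_first != pairwise$scenario_second, ]

# Build the faceted plot; print(p) to display it
p <- ggplot(pairwise, aes(x = year, y = difference)) +
  geom_point() +
  geom_line() +
  facet_grid(scenario_first ~ scenario_second) +
  theme(axis.text.x = element_text(angle = 90))
```

Because no expression strings are built, this version should behave the same for any scenario labels, at the cost of a joined table that grows with the square of the number of scenarios.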