Scatterplot comparing two variables with ggplot and tidy data
With untidy data, drawing a scatterplot that compares two variables is trivial in either base R or ggplot2. For example, here is a sample scatterplot from R for Data Science:
library(ggplot2)

ggplot(data = faithful) +
  geom_point(mapping = aes(x = eruptions, y = waiting))
With a tidy (long) version of the data, though, it's unclear how to plot eruption length against wait time except as a numeric variable against a categorical variable (e.g., as in a boxplot, or the point-and-line plot below).
Obviously, one could work with the original "wide" version or, in other cases, use the spread command, but I wonder whether I'm missing a straightforward way to assign x and y values by sub-group in ggplot. Or, alternatively, is this a good example of one limitation of tidy data and a use case for wide data?
library(dplyr)

tidy_faithful <- faithful %>%
  mutate(pair = row_number() %>%  # create pair number
           as.factor()) %>%       # make categorical
  tidyr::gather(
    eruptions:waiting,
    key = "event",
    value = "time") %>%
  arrange(pair)
> head(tidy_faithful, 4)
  pair     event   time
1    1 eruptions  3.600
2    1   waiting 79.000
3    2 eruptions  1.800
4    2   waiting 54.000
tidy_faithful %>%
  slice(1:50) %>%  # simplify data
  ggplot() +
  aes(x = factor(event),
      y = time,
      group = pair) +
  geom_point() +
  geom_line()
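For comparison, here is a minimal sketch of the spread route mentioned in the question. It is not necessarily the only or best approach. The long data is reshaped back to one row per pair just before plotting, so that eruptions and waiting become separate columns that map directly to x and y. The name wide_faithful is introduced here purely for illustration; the rest is rebuilt from the question's code so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Rebuild the long ("tidy") data from the question
tidy_faithful <- faithful %>%
  mutate(pair = as.factor(row_number())) %>%  # one id per observation pair
  gather(key = "event", value = "time", eruptions:waiting) %>%
  arrange(pair)

# Reshape back to wide: one row per pair, one column per event
# (wide_faithful is a hypothetical name, not from the original post)
wide_faithful <- tidy_faithful %>%
  spread(key = event, value = time)

# With separate columns, the scatterplot mapping is direct again
ggplot(wide_faithful) +
  aes(x = eruptions, y = waiting) +
  geom_point()
```

In current tidyr releases, pivot_wider(names_from = event, values_from = time) is the newer equivalent of spread and could be swapped in here.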
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow