Scatterplot comparing two variables with ggplot and tidy data

With untidy data, drawing a scatterplot comparing two variables is trivial in either base R or ggplot2. For example, here is a sample scatterplot from R for Data Science:

library(ggplot2)

ggplot(data = faithful) +
  geom_point(mapping = aes(x = eruptions, y = waiting))

Wait times vs eruptions for Old Faithful

With a tidy version of the data, though, it's unclear how to plot eruption length against wait time except as a numeric variable against a categorical variable (e.g., as in a boxplot, or the point-and-line plot below).

Obviously, one could work with the original "wide" version or, in other cases, use the spread command, but I wonder whether I'm missing a straightforward way to assign x and y values by sub-group in ggplot. Alternatively, is this a good example of one limitation of tidy data and a use case for wide data?

library(dplyr)  # mutate, row_number, arrange, slice, %>%

tidy_faithful <- faithful %>%
  mutate(pair = row_number() %>%   # create pair number
                  as.factor()) %>% # make categorical
  tidyr::gather(
    eruptions:waiting,             # columns to stack into key/value pairs
    key   = "event",
    value = "time") %>%
  arrange(pair)

> head(tidy_faithful, 4)
  pair     event   time
1    1 eruptions  3.600
2    1   waiting 79.000
3    2 eruptions  1.800
4    2   waiting 54.000


tidy_faithful %>%
  slice(1:50) %>% # simplify data
  ggplot() +
    aes(x     = factor(event),
        y     = time,
        group = pair) + 
    geom_point() +
    geom_line()

paired point and line plot with tidy Old Faithful data
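
For reference, the spread route mentioned above can be sketched as a single pipeline that reshapes the long data back to one row per pair just before plotting. This is only a minimal sketch of that workaround, not a confirmed answer to the question. The name wide_again is made up for illustration, and the long data is rebuilt here so the block runs on its own.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Rebuild the long ("tidy") data from the question so this block is self-contained.
tidy_faithful <- faithful %>%
  mutate(pair = as.factor(row_number())) %>%
  gather(eruptions:waiting, key = "event", value = "time") %>%
  arrange(pair)

# Spread the key/value pair back into one column per event,
# giving one row per pair again (wide_again is a hypothetical name).
wide_again <- tidy_faithful %>%
  spread(key = event, value = time)

# With one column per variable, x and y map directly to columns.
p <- ggplot(wide_again) +
  geom_point(aes(x = eruptions, y = waiting))
print(p)
```

In tidyr 1.0.0 and later, pivot_wider(names_from = event, values_from = time) is the newer equivalent of the spread() call. Either way, ggplot still receives one column per axis, so this reshapes the data rather than assigning x and y by sub-group.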



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
