How to connect grouped points in ggplot within groups?

I have a dataset with two groups, Experimental and Control. Each participant contributes two responses, which represent different learning styles. These are shown in the box plots with jitter below. I would like to connect each participant's two responses with a line using ggplot (so each red point in the Control group would be joined to the corresponding turquoise point in the Control group), but I can't figure out how to do this within the conditions. Can someone please help? I am new to R and really need guidance.

Then, I need to change the color of the lines within the conditions to black if Increase = TRUE and red if Increase = FALSE.

Ideally, I need it to look like Jon's example here, but with black or red lines based on True or False: Connecting grouped points with lines in ggplot

The data and ggplot code look like this:

d <- data.frame (
  Subject = c("1", "2", "3", "4"),
  Group  = c("Exp", "Exp", "Control", "Control"),
  Tr = c("14", "11", "4", "23"),
  Sr = c("56", "78", "12", "10"),
  Increase = c("TRUE", "TRUE", "TRUE", "FALSE")
)

# put the data in long format
d <- d %>%
  gather(key = "Strategy", value = "raw", Tr, Sr)

d %>%
  ggplot(aes(x = Group, y = raw, color = Strategy)) +
  geom_boxplot(width = 0.5, lwd = 0.5) +
  geom_jitter(width = 0.15) +
  geom_line(aes(group = raw),
            color = "grey",
            arrow = arrow(type = "closed",
                          length = unit(0.075, "inches"))) 


Solution 1:[1]

Inspired by @Jon's answer, which you linked to.

There are a few key things to understand about the solution:

  1. The points and the lines must share exactly the same random jitter, otherwise the lines will not meet the points. The simplest way to guarantee this is to jitter the data before it goes into the plot, which is what I did.
  2. The variable being jittered, Group, is not numeric. It helps to know that ggplot treats the character vector Group as a factor, with the levels placed at the positions 1, 2, 3, ... on the x axis. Hence we create a numeric vector group_jit with values around 1 and 2, offset slightly left or right according to the colouring variable Strategy.
  3. There are two independent colour scales here, so it is best to map Strategy to fill and the lines (Increase) to colour. This avoids a single legend with 4 entries.
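
To make point 2 concrete, here is a minimal base R sketch of the numeric positions behind group_jit, with the random jitter left out so that the numbers are deterministic. It assumes that factor(x, levels = unique(x)) is an acceptable stand-in for forcats' as_factor() on a character vector (both keep the levels in order of first appearance). The variable names are illustrative only.

```r
# Long-format columns as they appear after gather() on the example data
group    <- c("Exp", "Exp", "Control", "Control", "Exp", "Exp", "Control", "Control")
strategy <- c("Tr", "Tr", "Tr", "Tr", "Sr", "Sr", "Sr", "Sr")

# Levels in order of first appearance (assumed stand-in for forcats::as_factor)
group_f    <- factor(group, levels = unique(group))        # Exp = 1, Control = 2
strategy_f <- factor(strategy, levels = unique(strategy))  # Tr = 1, Sr = 2

width_jitter <- 0.2

# Centre the two strategies on -0.5 / +0.5, then scale to -0.2 / +0.2
offset <- (as.numeric(strategy_f) - 1.5) * width_jitter * 2

# Position without noise: Tr sits left of the group centre, Sr to the right
x_pos <- as.numeric(group_f) + offset
print(x_pos)
```

In the answer's code, jitter() is applied to the -0.5 / +0.5 offset before scaling, which adds the random spread around these centres.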

Here's the code:

library(tidyverse)

# Load data
d <- data.frame (
  Subject = c("1", "2", "3", "4"),
  Group  = c("Exp", "Exp", "Control", "Control"),
  Tr = c("14", "11", "4", "23"),
  Sr = c("56", "78", "12", "10"),
  Increase = c("TRUE", "TRUE", "TRUE", "FALSE")
)

width_jitter <- 0.2 # 1 means full width between points

# put the data in long format
d_jit <- d %>%
  gather(key = "Strategy", value = "raw", Tr, Sr) %>% 
  
  # type conversions
  mutate(across(c(Group, Strategy), as_factor)) %>% # convert to factors
  mutate(raw = as.numeric(raw)) %>% # make raw as numbers
  
  # x position = numeric Group level (1, 2) plus a jittered left/right offset per Strategy.
  # as.numeric(Strategy) - 1.5 gives -0.5 / +0.5; adjust width_jitter to taste.
  mutate(group_jit = as.numeric(Group) + jitter(as.numeric(Strategy) - 1.5) * width_jitter * 2)

# plotting
d_jit %>%
  ggplot(aes(x = Group, y = raw, fill = Strategy)) +
  geom_boxplot(width = 0.5, lwd = 0.5, alpha = 0.05, show.legend = FALSE) +
  geom_point(aes(x = group_jit), size = 3, shape = 21) +
  
  geom_line(aes(x = group_jit,
                group = Subject,
                colour = Increase),
            alpha = 0.5,
            arrow = arrow(type = "closed",
                          length = unit(0.075, "inches"))
            ) + 
  # name the values so the colours do not depend on the order of the levels
  scale_colour_manual(values = c("FALSE" = "red", "TRUE" = "black"))

Created on 2022-05-14 by the reprex package (v2.0.1)

For completeness' sake, a different and more elegant way to do the jitter is to pass a position function that adds the random jitter as the position argument of both geom_point and geom_line, like this (source: @erocoar's answer):

position = ggplot2::position_jitterdodge(dodge.width = 0.75, jitter.width = 0.3, seed = 1)

This way the data itself is not changed and the plotting code takes care of the jittering details:

  • position_jitterdodge applies both the dodge (the sideways shift of each Strategy within an x axis category) and the jitter (small random noise on the coloured points)
  • The seed argument is key, since it ensures that the same random values are generated for the point and the line layers, which each call the position function independently
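
The role of the seed can be illustrated outside ggplot2 with base R's jitter(). This is only a sketch of the underlying principle (the same seed with the same input, in the same order, gives the same noise) and not a description of how position_jitterdodge is implemented. The vector x is made up for illustration.

```r
x <- c(1, 1, 2, 2)  # made-up x positions

# Two independent calls with the same seed draw the same noise
set.seed(1)
first <- jitter(x)
set.seed(1)
second <- jitter(x)
same_seed_matches <- identical(first, second)

# Without resetting the seed, a further call draws different noise
third <- jitter(x)
no_seed_matches <- identical(first, third)

print(c(same_seed = same_seed_matches, no_seed = no_seed_matches))
```

If the points and the lines drew their noise without a shared seed, they would behave like the third call and the line ends would no longer sit on the points.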

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1