Plotting the "Average" curve of a set of curves in ggplot2

My question is exactly this one:

Plotting the "Average " curve of set of curves

but I would like to implement the accepted answer (below) in ggplot. Is it possible?

First I create some data. Here I create a list of 5 data frames, each with different x values:

 ll <- lapply(1:5,function(i)
  data.frame(x=seq(i,length.out=10,by=i),y=rnorm(10)))

Then to apply approx, I create a big data.frame containing all the data:

big.df <- do.call(rbind,ll)

Then I plot the linear approximation and all my series:

plot(approx(big.df$x,big.df$y),type='l')
lapply(seq_along(ll), 
       function(i) points(ll[[i]]$x,ll[[i]]$y,col=i))

EDIT

Structure of my data (an example; the actual data frame contains 183000 rows):

structure(list(timeseries = c(1, 7, 59, 0, 0, 5, 0, 0, 1, 0), 
t = c(1, 3, 7, 1, 3, 7, 1, 3, 7, 1)), .Names = c("timeseries", 
"t"), row.names = c(NA, 10L), class = "data.frame")



Solution 1:[1]

In the code below, we start with the list you created (depending on what your actual data looks like, there are probably better approaches, but I've left it as is for now). We use bind_rows to combine the list into a single data frame and mutate to add a column of interpolated values. Then we feed the result to ggplot on the fly. geom_line plots the interpolated values.

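The interpolated points are the exact average of all y values at each x value in the data, because approx by default collapses duplicated x values by taking the mean of their y values (ties = mean). For comparison, I've also added geom_smooth, which uses locally weighted regression (loess, the default method for a data set of this size) to plot a smooth curve through the data. The span argument in geom_smooth controls the amount of smoothing: smaller values follow the data more closely.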

library(tidyverse)
theme_set(theme_classic())

# Fake data
set.seed(2)
ll <- lapply(1:5,function(i)
  data.frame(x=seq(i,length.out=10,by=i),y=rnorm(10)))

# Combine into single data frame and add interpolation column
bind_rows(ll, .id="source") %>% 
  mutate(avg = approx(x,y,xout=x)$y) %>% 
  ggplot(aes(x, y)) +
    geom_point(aes(colour=source)) +
    geom_line(aes(y=avg)) +
    geom_smooth(se=FALSE, colour="red", span=0.3, linetype="11")


Now let's go through the individual data processing steps:

  1. Generate a single data frame from the list:

    dat = bind_rows(ll, .id="source")
    

    Here are selected rows from that data frame:

    dat[c(1:3, 15:17, 25:27), ]
    
       source  x            y
    1       1  1 -0.896914547
    2       1  2  0.184849185
    3       1  3  1.587845331
    15      2 10  1.782228960
    16      2 12 -2.311069085
    17      2 14  0.878604581
    25      3 15  0.004937777
    26      3 18 -2.451706388
    27      3 21  0.477237303
    
  2. We can get interpolated values as follows:

     with(dat, approx(x, y, xout=x))
    

    To get just the y values, which is all we wanted above, we would do:

     with(dat, approx(x, y, xout=x))$y
    

    To add the y-values to the data frame:

     dat$avg = with(dat, approx(x, y, xout=x))$y
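
The averaging that approx performs on repeated x values can be seen on a toy example. This is a minimal sketch with made-up values (not taken from the data above). It passes ties = mean explicitly, which matches the default behaviour but avoids the warning that approx otherwise gives about collapsing duplicated x values:

```r
# Toy data: x = 2 appears twice, with y values 1 and 3
x <- c(1, 2, 2, 3)
y <- c(0, 1, 3, 4)

# approx() collapses the tied x values using the ties function (here: mean),
# then interpolates linearly and evaluates at each original x
res <- approx(x, y, xout = x, ties = mean)

# Both rows with x = 2 get the same averaged y value
print(res$y)
```

Because every xout value here is one of the original x values, no real interpolation takes place; the result is simply the per-x mean repeated for each row.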
    

To create the plot, we performed the data processing steps using functions from the dplyr package, which is part of the tidyverse suite of packages that we loaded at the start of the code. It includes the pipe (%>%) operator, which lets us chain functions one after the other and feed the data directly into ggplot without assigning the intermediate data frame to an object. We can of course create the intermediate data frame first if we wish. For example:

dat = bind_rows(ll, .id="source") %>% 
  mutate(avg = approx(x,y,xout=x)$y)

ggplot(dat, aes(x, y)) +
  geom_point(aes(colour=source)) +
  geom_line(aes(y=avg)) +
  geom_smooth(se=FALSE, colour="red", span=0.3, linetype="11")
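
For data that is already in long format, like the sample in the question's EDIT, the per-x average can also be computed by ggplot2 itself. The following is a sketch under assumptions, not part of the original answer: it assumes t is the x variable and timeseries the y variable, swaps approx for stat_summary, and the name df is made up for illustration. The fun argument assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Sample data copied from the question's EDIT (df is an illustrative name)
df <- data.frame(
  timeseries = c(1, 7, 59, 0, 0, 5, 0, 0, 1, 0),
  t = c(1, 3, 7, 1, 3, 7, 1, 3, 7, 1)
)

# Mean of timeseries at each distinct t, computed explicitly with base R
# so the values drawn by stat_summary() can be inspected
avg <- aggregate(timeseries ~ t, data = df, FUN = mean)
print(avg)

p <- ggplot(df, aes(t, timeseries)) +
  geom_point(alpha = 0.5) +
  # stat_summary() computes the mean of y at each x and joins the means
  stat_summary(fun = mean, geom = "line", colour = "red")
print(p)
```

Like the approx version, this gives the average only at x values that occur in the data. It does not interpolate each series onto a common grid before averaging.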

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1