'Plotting the “Average ” curve of set of curves in ggplot2
My Question is exactly this one:
Plotting the "Average " curve of set of curves
but I am looking to implement the accepted answer (below) in ggplot. is it possible?
First I create some data. Here I am creating a list , with 5 data.frame, with differents xs:
ll <- lapply(1:5,function(i)
data.frame(x=seq(i,length.out=10,by=i),y=rnorm(10)))
Then to apply approx, I create a big data.frame containing all the data:
big.df <- do.call(rbind,ll)
Then , I plot the linear approximation and all my series :
plot(approx(big.df$x,big.df$y),type='l')
lapply(seq_along(ll),
function(i) points(ll[[i]]$x,ll[[i]]$y,col=i))
EDIT
structure of my data (example. the actual DF contain 183000 rows)
structure(list(timeseries = c(1, 7, 59, 0, 0, 5, 0, 0, 1, 0),
t = c(1, 3, 7, 1, 3, 7, 1, 3, 7, 1)), .Names = c("timeseries",
"t"), row.names = c(NA, 10L), class = "data.frame")
Solution 1:[1]
In the code below, we start with the list you created (depending on what your actual data looks like, there are probably better approaches, but I've left it as is for now). Then we use bind_rows
to convert it to a single data frame and mutate
to add the interpolated values. The we feed it to ggplot on the fly. geom_line
plots the interpolated values.
The interpolated points are the exact average of all y values at each x value in the data. For comparison, I've also added geom_smooth
, which uses locally weighted regression to plot a smooth curve through the data. The span
argument in geom_smooth
can be used to determine the amount of smoothing.
library(tidyverse)
theme_set(theme_classic())
# Fake data
set.seed(2)
ll <- lapply(1:5,function(i)
data.frame(x=seq(i,length.out=10,by=i),y=rnorm(10)))
# Combine into single data frame and add interpolation column
bind_rows(ll, .id="source") %>%
mutate(avg = approx(x,y,xout=x)$y) %>%
ggplot(aes(x, y)) +
geom_point(aes(colour=source)) +
geom_line(aes(y=avg)) +
geom_smooth(se=FALSE, colour="red", span=0.3, linetype="11")
Now let's go through the individual data processing steps:
Generate a single data frame from the list:
dat = bind_rows(ll, .id="source")
Here are selected rows from that data frame:
dat[c(1:3, 15:17, 25:27), ] source x y 1 1 1 -0.896914547 2 1 2 0.184849185 3 1 3 1.587845331 15 2 10 1.782228960 16 2 12 -2.311069085 17 2 14 0.878604581 25 3 15 0.004937777 26 3 18 -2.451706388 27 3 21 0.477237303
We can get interpolated values as follows:
with(dat, approx(x, y, xout=x))
To get just the y values, which is all we wanted above, we would do:
with(dat, approx(x, y, xout=x))$y
To add the y-values to the data frame:
dat$avg = with(dat, approx(x, y, xout=x))
To create the plot, we performed the data processing steps using functions from the dplyr
package, which is part of the tidyverse
suite of packages the we loaded at the start of the code. It includes the pipe (%>%
) operator, which allows us to chain functions one after the other and feed the data directly into ggplot
without having to assign the intermediate data frame to an object (although we can of course create the intermediate data frame first if we wish). For example:
dat = bind_rows(ll, .id="source") %>%
mutate(avg = approx(x,y,xout=x)$y)
ggplot(dat, aes(x, y)) +
geom_point(aes(colour=source)) +
geom_line(aes(y=avg)) +
geom_smooth(se=FALSE, colour="red", span=0.3, linetype="11")
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |