'Creating predicted vs observed confidence interval graph
Hello and thank you for you time and consideration,
I'd like to recreate this graph with ggplot.
The top blue dots are the predicted values from my fitted model na_lmod
and the lower red values are the observed values from one city's log_trip
over the years.
Can you please help me combine these three functions ggplot(smooth/point), predict, and some sort of dplyr filter or something?
This code got me the log_trip
and year
for the desired city of interest but I'm struggling to even get that to graph.
filter(transit, msaid == "Denver")[,c("log_trip", "year")]
Desired Output:
Solution 1:[1]
Here's an example. First, I'll generate some data and estimate the model.
library(tidyverse)
set.seed(123)
dat <- expand.grid(city = LETTERS[1:10],
year = 2006:2018)
dat$log_trip <- log(abs(dat$year * .05 + rnorm(nrow(dat), 0, 100)))
dat$year <- as.factor(dat$year)
mod <- lm(log_trip ~ year, data=dat)
Next, we need to make some data that will be used to make predictions. For this model, since year
is the only variable in it, this hypothetical data only has year
in it. If you had other variables in the model, you would need to hold them constant at some (presumably central) value, like the mean.
pred_dat <- data.frame(year = factor(2006:2018))
We can then generate predictions with both confidence and prediction intervals:
preds <- predict(mod, newdata=pred_dat, interval="confidence")
preds2 <- predict(mod, newdata=pred_dat, interval="prediction")
preds <- as.data.frame(preds)
Next, we put the prediction intervals into the press
data frame.
preds$lwr_pred <- preds2[,2]
preds$upr_pred <- preds2[,3]
pred_dat <- bind_cols(pred_dat, preds)
Next, we join the prediction data and the observed data from one city ("A" in this case).
pred_dat <- left_join(pred_dat, dat %>% filter(city == "A"))
For the plot, we need to turn year
from a factor into numeric, then we can make the plot:
pred_dat %>%
mutate(year = as.numeric(as.character(year))) %>%
ggplot() +
geom_ribbon(aes(x=year, ymin = lwr_pred, ymax=upr_pred, fill="Prediction"), alpha=.25) +
geom_ribbon(aes(x=year, ymin = lwr, ymax=upr, fill="Confidence"), alpha=.25) +
scale_fill_manual(values=c("blue", "gray65")) +
geom_point(aes(x=year, y=fit, color="Predicted")) +
geom_point(aes(x=year, y=log_trip, colour="Observed (City A)")) +
scale_colour_manual(values=c("red", "blue")) +
scale_x_continuous(breaks=2006:2018) +
labs(colour="Points", fill = "Intervals", x="Year", y="Predicted Values") +
theme_classic()
Created on 2022-05-12 by the reprex package (v2.0.1)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | DaveArmstrong |