Fill white gaps in geom_area
I have a data frame with daily data for several factor levels (the "fill" variable). I would like to draw an area plot with ggplot2::geom_area, but some "fill" values can be missing on the first day or on the last day.
df <- data.frame(x = do.call(c, mapply(rep, seq(Sys.Date() - 2, Sys.Date(), by = 1), c(2, 3, 2))),
y = 1,
fill = c("A", "B", "A", "B", "C", "A", "C"))
x y fill
1 2015-07-06 1 A
2 2015-07-06 1 B
3 2015-07-07 1 A
4 2015-07-07 1 B
5 2015-07-07 1 C
6 2015-07-08 1 A
7 2015-07-08 1 C
If you try to plot the area:
library(plyr)
library(dplyr)
library(ggplot2)
df %>%
group_by(x) %>%
mutate(freq = y / sum(y)) %>%
ggplot(aes(x, freq, fill = fill)) +
geom_area()
you get this:
The area for a fill level X starts on the day X first occurs and ends on the day X last occurs. So if some levels are absent on the first day or on the last day, the plot shows white gaps.
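To see which (fill, day) combinations are actually absent, a quick cross-tabulation can help. This is only a minimal sketch in base R: it rebuilds the example data with the fixed dates from the printed output (an assumption, since the original uses Sys.Date()), and the names counts and missing are purely illustrative.

```r
# Example data with fixed dates (assumption: taken from the printed output above)
df <- data.frame(
  x = as.Date(c("2015-07-06", "2015-07-06", "2015-07-07", "2015-07-07",
                "2015-07-07", "2015-07-08", "2015-07-08")),
  y = 1,
  fill = c("A", "B", "A", "B", "C", "A", "C")
)

# Cross-tabulate fill level by day; a 0 marks a combination with no row
counts <- table(df$fill, as.character(df$x))
print(counts)

# Keep only the combinations that never occur
missing <- subset(as.data.frame(counts, stringsAsFactors = FALSE), Freq == 0)
names(missing) <- c("fill", "x", "n")
print(missing)
```

With this data, the combinations without a row are C on the first day and B on the last day, which matches where the gaps appear.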
I thought I could force every area to start on the first day and end on the last day by adding the missing levels (with y = 0) to those days, but this only seems to work for levels missing on the first day:
df <- arrange(df, x)
li <- split(df, df$x)
li[[1]] <-
ldply(li, function(x)
anti_join(x, li[[1]], by = "fill"), .id = NULL) %>%
mutate(x = as.Date(names(li[1])),
y = 0) %>%
distinct %>%
bind_rows(li[[1]], .)
li[[length(li)]] <-
ldply(li, function(x)
anti_join(x, last(li), by = "fill"), .id = NULL) %>%
mutate(x = as.Date(last(names(li))),
y = 0) %>%
distinct %>%
bind_rows(last(li), .)
df.m <- bind_rows(li)
df.m %>%
group_by(x) %>%
mutate(freq = y / sum(y)) %>%
ggplot(aes(x, freq, fill = fill)) +
geom_area()
Do you have any ideas for filling the gaps, or any other suggestions? Thanks.
(You may think an area plot is a poor visualisation when many values are missing, but my real data has few missing values and covers a longer period, so I only get a small gap at the start or the end. It is still visible, though, and I would like to hide it.)
Please tell me if my question is unclear; I tried my best.
Solution 1:[1]
Simply add position = "identity" to geom_area(), e.g. with your code above (note that the areas then overlap instead of being stacked):
ggplot(aes(x, freq, fill = fill)) +
geom_area(position = "identity")
Solution 2:[2]
You could use complete() to fill in the missing date/fill combinations with zeros:
library(tidyverse)
df <- tribble(
~x, ~y, ~fill,
"2015-07-06", 1, "A",
"2015-07-06", 1, "B",
"2015-07-07", 1, "A",
"2015-07-07", 1, "B",
"2015-07-07", 1, "C",
"2015-07-08", 1, "A",
"2015-07-08", 1, "C"
) |>
complete(fill, nesting(x), fill = list(y = 0)) |>
mutate(x = as.Date(x))
df %>%
group_by(x) %>%
mutate(freq = y / sum(y)) %>%
ggplot(aes(x, freq, fill = fill)) +
geom_area()
Created on 2022-04-22 by the reprex package (v2.0.1)
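If tidyr is not available, the same idea can be sketched in base R: build every day/fill combination with expand.grid(), left-join the observed rows with merge(), and set the missing y values to 0. This is an illustrative alternative rather than part of the answer above; the names grid and full are made up for the example, and the dates are fixed to match the question's printed output.

```r
# Example data with fixed dates (assumption: taken from the question's output)
df <- data.frame(
  x = as.Date(c("2015-07-06", "2015-07-06", "2015-07-07", "2015-07-07",
                "2015-07-07", "2015-07-08", "2015-07-08")),
  y = 1,
  fill = c("A", "B", "A", "B", "C", "A", "C")
)

# Every (day, fill) combination that should be present
grid <- expand.grid(x = unique(df$x), fill = unique(df$fill),
                    stringsAsFactors = FALSE)

# Left join: combinations with no observed row get NA in y, then 0
full <- merge(grid, df, by = c("x", "fill"), all.x = TRUE)
full$y[is.na(full$y)] <- 0

# Per-day share, as computed with group_by() and mutate() in the question
full$freq <- full$y / ave(full$y, full$x, FUN = sum)
full <- full[order(full$x, full$fill), ]
print(full)
```

The resulting full data frame has one row per day and fill level, so it can be passed to ggplot(aes(x, freq, fill = fill)) + geom_area() in the same way as in the question.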
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Nick |
Solution 2 | Carl |