Fill white gaps in geom_area

I have a data frame with daily data for several factor levels (the "fill" variable). I would like to draw an area plot with ggplot2::geom_area, but some "fill" values can be missing on day 0 or on the last day.

df <- data.frame(x = do.call(c, mapply(rep, seq(Sys.Date() - 2, Sys.Date(), by = 1), c(2, 3, 2))), 
                 y = 1, 
                 fill = c("A", "B", "A", "B", "C", "A", "C"))

           x y fill
1 2015-07-06 1    A
2 2015-07-06 1    B
3 2015-07-07 1    A
4 2015-07-07 1    B
5 2015-07-07 1    C
6 2015-07-08 1    A
7 2015-07-08 1    C  

If you plot the area:

library(plyr)
library(dplyr)
library(ggplot2)

df %>% 
  group_by(x) %>% 
  mutate(freq = y / sum(y)) %>% 
  ggplot(aes(x, freq, fill = fill)) +
  geom_area()

you get a plot with white gaps at the start and at the end.

The area for a fill level X starts on the day X first occurs and ends on the day it last occurs. So if some of the levels are absent on day 0 or on the last day, the plot shows white gaps.
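One way to check which levels cause the gaps is to list the first and last day on which each level appears and compare them with the overall date range. This is only a sketch: the names `day_range`, `first_day`, `last_day`, and `spans_all` are made up for illustration, and the data frame is rebuilt with fixed dates so that the result does not depend on Sys.Date().

```r
library(dplyr)

# Same data as in the question, with fixed dates instead of Sys.Date()
df <- data.frame(
  x = as.Date(c("2015-07-06", "2015-07-06",
                "2015-07-07", "2015-07-07", "2015-07-07",
                "2015-07-08", "2015-07-08")),
  y = 1,
  fill = c("A", "B", "A", "B", "C", "A", "C")
)

# Overall date range of the data
day_range <- range(df$x)

# First and last day per fill level, and whether the level covers the full range
spans <- df %>%
  group_by(fill) %>%
  summarise(first_day = min(x), last_day = max(x)) %>%
  mutate(spans_all = first_day == day_range[1] & last_day == day_range[2])

print(spans)
```

Levels where `spans_all` is FALSE (here B, which ends a day early, and C, which starts a day late) are the ones whose areas stop short of the plot edges.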

I thought I could force every area to start on day 0 and end on the last day by adding the missing levels (with y = 0) to those days, but this seems to work only for levels missing on the first day:

df <- arrange(df, x)

li <- split(df, df$x)

li[[1]] <- 
      ldply(li, function(x) 
        anti_join(x, li[[1]], by = "fill"), .id = NULL) %>% 
      mutate(x = as.Date(names(li[1])),
             y = 0) %>% 
      distinct %>% 
      bind_rows(li[[1]], .)

li[[length(li)]] <- 
  ldply(li, function(x) 
    anti_join(x, last(li), by = "fill"), .id = NULL) %>% 
  mutate(x = as.Date(last(names(li))),
         y = 0) %>% 
  distinct %>% 
  bind_rows(last(li), .)

df.m <- bind_rows(li)

df.m %>% 
  group_by(x) %>% 
  mutate(freq = y / sum(y)) %>% 
  ggplot(aes(x, freq, fill = fill)) +
  geom_area()


Do you have any ideas for filling the gaps, or any other suggestions? Thanks.

(You may think that an area plot is a poor visualisation when there are many missing values, but my real data have few missing values and cover a longer period, so I only get a small gap at the start or the end. It is still visible, though, and I would like to hide it.)

Please tell me if my question is unclear; I tried my best.



Solution 1:[1]

Simply add position = "identity" to geom_area() in your code above (the areas are then drawn from zero and overlap instead of being stacked):

geom_area(position = "identity")
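For context, here is a minimal sketch of the question's pipeline with the position adjustment set on the layer, assuming the argument is meant for geom_area(), where position is a parameter. The fixed dates, the `df_freq` and `p` names, and the alpha value are assumptions added for illustration, not part of the original answer.

```r
library(dplyr)
library(ggplot2)

# Same data as in the question, with fixed dates instead of Sys.Date()
df <- data.frame(
  x = as.Date(c("2015-07-06", "2015-07-06",
                "2015-07-07", "2015-07-07", "2015-07-07",
                "2015-07-08", "2015-07-08")),
  y = 1,
  fill = c("A", "B", "A", "B", "C", "A", "C")
)

# Daily share of each fill level
df_freq <- df %>%
  group_by(x) %>%
  mutate(freq = y / sum(y)) %>%
  ungroup()

p <- ggplot(df_freq, aes(x, freq, fill = fill)) +
  # position = "identity" draws each area from zero instead of stacking;
  # alpha is an added assumption so that overlapping areas stay visible
  geom_area(position = "identity", alpha = 0.5)

p
```

Because the areas are no longer stacked, levels with equal shares on a given day cover each other, so this changes what the chart shows rather than only closing the gaps.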

Solution 2:[2]

You could complete the missing date/fill combinations with zeros:

library(tidyverse)

df <- tribble(
  ~x, ~y, ~fill,
  "2015-07-06", 1, "A",
  "2015-07-06", 1, "B",
  "2015-07-07", 1, "A",
  "2015-07-07", 1, "B",
  "2015-07-07", 1, "C",
  "2015-07-08", 1, "A",
  "2015-07-08", 1, "C"
) |> 
  complete(fill, nesting(x), fill = list(y = 0)) |> 
  mutate(x = as.Date(x))

df %>% 
  group_by(x) %>% 
  mutate(freq = y / sum(y)) %>% 
  ggplot(aes(x, freq, fill = fill)) +
  geom_area()
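To see what the completion step adds, the sketch below rebuilds the question's data and lists the rows that were not in the original. It assumes dplyr and tidyr are installed; the names `completed` and `added` are illustrative only.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, with fixed dates instead of Sys.Date()
df <- data.frame(
  x = as.Date(c("2015-07-06", "2015-07-06",
                "2015-07-07", "2015-07-07", "2015-07-07",
                "2015-07-08", "2015-07-08")),
  y = 1,
  fill = c("A", "B", "A", "B", "C", "A", "C")
)

# Add every missing date/fill combination with y = 0, then compute daily shares
completed <- df %>%
  complete(x, fill, fill = list(y = 0)) %>%
  group_by(x) %>%
  mutate(freq = y / sum(y)) %>%
  ungroup()

# Rows introduced by complete(): combinations absent from the original data
added <- anti_join(completed, df, by = c("x", "fill"))

print(added)
```

With this data, every level now has a row on every day (3 days by 3 levels), and the added rows carry a share of zero, so each area can run across the full date range.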

Created on 2022-04-22 by the reprex package (v2.0.1)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Nick
Solution 2 Carl