ggplot holes in stacked area chart

Here is a link to my data.

I use the following code:

library(dplyr)    # arrange()
library(ggplot2)  # ggplot(), geom_area()

# read in data
data = read.csv("ggplot_data.csv")

# order by group then year
data = arrange(data, group, year)

# generate ggplot stacked area chart
plot = ggplot(data, aes(x = year, y = value, fill = group)) +
  geom_area()
plot

That produces the following chart (image not included here).

As you can see, there are odd holes in three different parts of this chart.

I previously had this issue and asked about it, and the answer provided then was that I needed to sort my data by group and then year. At the time, that answer fixed my holes. However, it doesn't seem to eliminate all the holes this time. Any help?



Solution 1:[1]

The gaps appear because some time series start later than others. When the first non-zero value of a series appears, its area starts with a discontinuous jump. The area just above it, however, is connected to the next point by linear interpolation. This results in the gap.

For example, look at the left-most gap. The olive region starts just after the gap with a vertical jump in 1982. The green area, however, increases linearly from the value in 1981 (where the olive area is zero) to the value in 1982 (where the olive area suddenly contributes).
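
To make the mechanism concrete, here is a minimal sketch in base R with made-up numbers. The toy values and the names `toy`, `upper`, `lower`, and `upper_bottom` are hypothetical and not taken from the question's data, and the sketch assumes that "lower" is stacked beneath "upper". It does not call ggplot2; it only mimics by hand what per-year stacking followed by linear interpolation does.

```r
# Toy data (hypothetical): "upper" has values in both years,
# "lower" only starts in 1982.
toy <- data.frame(
  group = c("upper", "upper", "lower"),
  year  = c(1981, 1982, 1982),
  value = c(5, 5, 3)
)
lower <- toy[toy$group == "lower", ]
upper <- toy[toy$group == "upper", ]

# Stacking happens per year: the bottom edge of "upper" sits on top of
# whatever "lower" contributes in that year (nothing in 1981, 3 in 1982).
idx <- match(upper$year, lower$year)   # NA where "lower" has no row
upper_bottom <- ifelse(is.na(idx), 0, lower$value[idx])

# Halfway between the two years, the bottom edge of "upper" is interpolated.
mid_bottom <- approx(upper$year, upper_bottom, xout = 1981.5)$y

# "lower" has no point before 1982, so nothing fills the space between
# 0 and mid_bottom at 1981.5: that empty triangle is the gap.

# Adding a zero one year earlier gives "lower" an edge that rises in step.
lower_fixed <- rbind(
  data.frame(group = "lower", year = 1981, value = 0),
  lower
)
mid_top_lower <- approx(lower_fixed$year, lower_fixed$value, xout = 1981.5)$y
```

With the extra zero row, the top edge of "lower" and the bottom edge of "upper" coincide between the two years, so no space is left unfilled.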

What you could do is, for instance, add a value of zero at the beginning of each time series that starts after 1975. I use dplyr functionality to create a data frame of these additional first years:

first_years <- group_by(data, group, group_id) %>%
               summarise(year = min(year) - 1) %>%
               filter(year > 1974) %>%
               mutate(value = 0, value_pct = 0)
first_years
## Source: local data frame [3 x 5] 
## Groups: group [3]
## 
##    group group_id  year value value_pct
##   (fctr)    (int) (dbl) (dbl)     (dbl)
## 1      c    10006  1981     0         0
## 2      e    10022  2010     0         0
## 3      i    24060  2002     0         0

As you can see, these three new rows correspond exactly to the three gaps in your plot. Now you can combine this new data frame with your data and sort the result:

data_complete <- bind_rows(data, first_years) %>%
                 arrange(year, group)

And the plot then has no gaps:

ggplot(data_complete, aes(x=year,y=value, fill=group)) +
  geom_area()

Solution 2:[2]

@Stibu's answer is probably best, but for those of us who are not very R-savvy and don't know how to go through a dataset with R to find missing rows and fill them with zeros, I solved this issue with a bit of a different approach.

In my case, I created a dummy dataset with zeroes for all years and all groups, then appended it to my original dataset. This way I added rows for years where before there were simply no rows of data. After aggregating by year and group, my aggregated dataset contained rows with zero, as opposed to no rows existing at all. This removed all those weird gaps for me.
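
The zero-filling idea can be sketched in a few lines of base R. This is only an illustration under assumptions: the toy data frame and the names `toy`, `grid`, and `filled` are hypothetical, the column names `group`, `year`, and `value` are borrowed from the question, and it uses a join onto a full grid rather than the append-then-aggregate route described above.

```r
# Toy data (hypothetical): group "b" has no row for the year 2000.
toy <- data.frame(
  group = c("a", "a", "a", "b", "b"),
  year  = c(2000, 2001, 2002, 2001, 2002),
  value = c(4, 5, 6, 2, 3)
)

# Every combination of group and year that appears in the data.
grid <- expand.grid(
  group = unique(toy$group),
  year  = unique(toy$year),
  stringsAsFactors = FALSE
)

# Left join: combinations without a matching row get NA for value.
filled <- merge(grid, toy, by = c("group", "year"), all.x = TRUE)

# Replace the missing values with explicit zeros.
filled$value[is.na(filled$value)] <- 0

# Sort by group, then year, as in the question.
filled <- filled[order(filled$group, filled$year), ]
```

`filled` can then be passed to `ggplot()` with `geom_area()` in place of the original data. If tidyr is available, `complete(toy, group, year, fill = list(value = 0))` should give an equivalent result.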

Solution 3:[3]

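One option is to simply add position = "identity" to geom_area(), e.g. with your code above. Note that this draws each area from zero, overlapping the others, instead of stacking them:

ggplot(data, aes(x=year, y=value, fill=group)) +
  geom_area(position = "identity")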

Solution 4:[4]

I found it simpler to save my table to CSV and use Python's matplotlib function stackplot (demo), which does not seem to have issues with negative numbers.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 spops
Solution 3 Nick
Solution 4 Valentas