'Counting observations by 30-days window

As I explained in previous posts I'm trying to count observations over 30 days windows grouping by id.

The data:

df<-structure(list(id=c(1,1,1,2),date=c("2001-07-29","2001-08-01","2001-07-20","2001-07-21")))


df<-as.data.frame(df)

df<-df%>%
  mutate(date=ymd(date))

The code:

df<-df%>%
  group_by(id)%>%
  mutate(n=sapply(seq(length(date)),
                  function(x) sum(between(date[1:x],date[x]-days(30),date[x]))))

The output is wrong because the expected result is:


id  date    n
1   29/07/2001  2
1   01/08/2001  3
1   20/07/2001  1
2   21/07/2001  1



Solution 1:[1]

Something like this?

library(dplyr)
library(lubridate)

df<-structure(list(id=c(1,1,1,2),date=c("2001-07-29","2001-08-01","2001-07-20","2001-07-21")))
df <- as.data.frame(df)

df %>%
  mutate(date = ymd(date)) %>% 
  group_by(id, fl = floor_date(date - days(19), "month") + days(19)) %>% 
  arrange(date, .by_group = T) %>% 
  mutate(n = row_number())

# A tibble: 4 × 4
# Groups:   id, fl [2]
     id date       fl             n
  <dbl> <date>     <date>     <int>
1     1 2001-07-20 2001-07-20     1
2     1 2001-07-29 2001-07-20     2
3     1 2001-08-01 2001-07-20     3
4     2 2001-07-21 2001-07-20     1

Solution 2:[2]

If I understand you correctly, you have to arrange your date first and you can use the following code:

library(lubridate)
library(dplyr)
df %>%
  group_by(id) %>% 
  arrange(date) %>%
  mutate(n = sapply(seq(length(date)), 
                         function(x) sum(between(date[1:x], date[x] - days(30), date[x]))))

Output:

# A tibble: 4 × 3
# Groups:   id [2]
     id date           n
  <dbl> <date>     <int>
1     1 2001-07-20     1
2     2 2001-07-21     1
3     1 2001-07-29     2
4     1 2001-08-01     3

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Maël
Solution 2 Quinten