R: slicing over dates in a data frame using a custom time window

I have a data frame of player rankings over many years (2000-2020).

Now, I wish to use group_by() and summarise() to calculate statistics for different time slices. One way is to use custom start and end dates to subset the data, like:

dataSubset = filter(data, rankingDate >="2000-01-01" & rankingDate <="2002-01-01")
dataSubset %>% 
  group_by(player) %>% 
  summarise(
    avg_pts = mean(points)
  )

This gives the average ranking points for each player in the 2-year period between 2000-01-01 and 2002-01-01.

Now, this is fine for a single slice of the data. But what I want is multiple slices over the entire dataset, defined by a startDate, an endDate, and a period value, something like this pseudocode:

dataSubsets = filter[data, rankingDate, startDate:endDate:period]

so that I could divide up the full 20-year period into, say, ten 2-year periods, and then calculate statistics for each 2-year period.

I don't want to copy-paste. What is the solution?


EDIT: An example of the data I am using can be found here: https://github.com/JeffSackmann/tennis_atp/blob/master/atp_rankings_00s.csv

To load it and parse the dates:

library(readr)      # read_csv()
library(dplyr)      # mutate(), select(), %>%
library(lubridate)  # ymd()

data <- read_csv("data/atp/atp_rankings_00s.csv") %>%
  mutate(rankingDate = ymd(ranking_date)) %>%
  select(-ranking_date)


Solution 1:[1]

Using lubridate's intervals, we can integer-divide the time elapsed since the earliest ranking date by the period of interest (here 2 years) to get a window index, and then group by that index:

data %>%
  # consider each player separately
  group_by(player) %>% 
  # number the consecutive 2-year windows, counting from the earliest ranking date
  group_by(
    year_id = lubridate::interval(
      start = min(rankingDate),
      end = rankingDate
    ) %/% (2 * lubridate::dyears()),
    .add = TRUE
  ) %>% 
  summarise(
    n_obs = n(),
    avg_points_pr_player = mean(points)
  ) %>% 
  ungroup()

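This yields one row per player and 2-year window, with the number of observations (n_obs) and that player's average points in the window. To use a different window length, change the multiplier in front of dyears().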
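
If the windows should instead be pinned to an explicit startDate, endDate, and period, as the question describes, one possible approach is to build the window boundaries with seq() and assign each date to a window with findInterval(). The following is only a sketch under stated assumptions: the toy data are made up for illustration, the column names player, rankingDate, and points are taken from the question, and startDate, endDate, breaks, window_id, and window_start are hypothetical names. Windows are treated as half-open, [start, start + 2 years).

```r
library(dplyr)

# Toy data standing in for the rankings table (values are made up).
data <- tibble(
  player      = c(1, 1, 1, 2, 2, 2),
  rankingDate = as.Date(c("2000-01-10", "2001-06-04", "2002-03-11",
                          "2000-02-07", "2003-05-05", "2003-11-03")),
  points      = c(100, 200, 300, 50, 150, 250)
)

# Hypothetical window parameters: overall start, overall end, period length.
startDate <- as.Date("2000-01-01")
endDate   <- as.Date("2004-01-01")
breaks    <- seq(startDate, endDate, by = "2 years")  # window boundaries

result <- data %>%
  # keep only dates inside [startDate, endDate)
  filter(rankingDate >= startDate, rankingDate < endDate) %>%
  mutate(
    # index of the window [breaks[i], breaks[i + 1]) that each date falls into
    window_id    = findInterval(as.numeric(rankingDate), as.numeric(breaks)),
    # label each window by its start date
    window_start = breaks[window_id]
  ) %>%
  group_by(player, window_start) %>%
  summarise(n_obs = n(), avg_pts = mean(points), .groups = "drop")

print(result)
```

Changing by = "2 years" to another step that seq() accepts for dates (for example "6 months") changes the window length without touching the rest of the pipeline.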

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1