R: slicing over dates in a data frame using a custom time window
I have a data frame of player rankings over many years (2000-2020).
Now, I wish to use `group_by()` and `summarise()` to calculate statistics for different time slices. One way is to use custom start and end dates to subset the data, like:
```r
library(dplyr)

# the character dates are coerced to Date when compared with rankingDate
dataSubset = filter(data, rankingDate >= "2000-01-01" & rankingDate <= "2002-01-01")
dataSubset %>%
  group_by(player) %>%
  summarise(
    avg_pts = mean(points)
  )
```
to get the average ranking points for each player in the 2-year period between 1 January 2000 and 1 January 2002.
Now, this is fine for a single slice of the data. But what I want are multiple slices over the entire dataset, given a `startDate`, an `endDate` and a `period` value, like this (pseudocode):

```
dataSubsets = filter[data, rankingDate, startDate:endDate:period]
```
so that I could divide up the full 20-year period into, say, ten 2-year periods, and then calculate statistics for each 2-year period.
I don't want to copy and paste the same filter for every period. What is the solution?
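The start/end/period slicing described above can be sketched in plain dplyr with `seq()` over `Date` values and `lapply()`. This is a minimal sketch rather than an existing function: `slice_by_period()` is a hypothetical helper, and it assumes `rankingDate` is a `Date` column and that `data` has `player` and `points` columns. Each window is half-open (`>= start`, `< next start`), so a ranking dated exactly on a boundary is counted only once, and any remainder shorter than one period at the end is dropped.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): summarise `data` per player in
# consecutive windows of length `period` (any `by` value accepted by
# seq() for Dates, e.g. "2 years") between `start_date` and `end_date`.
slice_by_period <- function(data, start_date, end_date, period) {
  # window boundaries, e.g. 2000-01-01, 2002-01-01, ..., 2020-01-01
  breaks <- seq(as.Date(start_date), as.Date(end_date), by = period)
  # one summary per consecutive pair of boundaries, half-open [start, end)
  per_window <- lapply(seq_len(length(breaks) - 1), function(i) {
    data %>%
      filter(rankingDate >= breaks[i], rankingDate < breaks[i + 1]) %>%
      group_by(player) %>%
      summarise(avg_pts = mean(points), .groups = "drop") %>%
      mutate(window_start = breaks[i])
  })
  # stack the per-window summaries into one data frame
  bind_rows(per_window)
}
```

With the question's dates, `slice_by_period(data, "2000-01-01", "2020-01-01", "2 years")` would summarise ten 2-year windows, with each row tagged by its `window_start`.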
EDIT: An example of the data I am using can be found here: https://github.com/JeffSackmann/tennis_atp/blob/master/atp_rankings_00s.csv
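To load it:

```r
library(readr)
library(dplyr)
library(lubridate)

data <- read_csv("data/atp/atp_rankings_00s.csv")
data = data %>%
  # convert yyyymmdd values to a proper Date column
  mutate(rankingDate = ymd(ranking_date)) %>%
  select(-ranking_date)
```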
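The `ymd()` step assumes `ranking_date` holds yyyymmdd values such as `20000110`, which `read_csv()` will typically read as numbers. As a small sketch of why that is fine, `ymd()` parses both the numeric and the string form into the same `Date`:

```r
library(lubridate)

# ymd() accepts yyyymmdd values given as numbers or as strings
from_number <- ymd(20000110)
from_string <- ymd("20000110")
print(from_number)  # [1] "2000-01-10"
```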
Solution 1:[1]
Using lubridate's interval functionality, we can integer-divide the time elapsed since a start date by the period of interest, like so:
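```r
library(dplyr)
library(lubridate)

data %>%
  # consider each player separately
  group_by(player) %>%
  # treat every 2-year window as one group; the origin is fixed rather than
  # min(rankingDate), which would be evaluated per player here and so
  # misalign the windows across players; years(2) keeps calendar alignment
  group_by(
    year_id = interval(
      start = ymd("2000-01-01"),
      end = rankingDate
    ) %/% years(2),
    .add = TRUE
  ) %>%
  summarise(
    n_obs = n(),
    avg_points_pr_player = mean(points)
  ) %>%
  ungroup()
```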
This groups the data into 2-year periods and averages each player's points within each period; changing the divisor changes the period length.
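Another way to build calendar-aligned bins, swapped in here rather than taken from the answer, is base R's `cut()` method for `Date` vectors, which accepts `breaks = "2 years"` and starts the first bin on 1 January of the earliest year in the data. A minimal sketch, assuming the same `player`, `rankingDate` and `points` columns; `summarise_by_window()` is a hypothetical name:

```r
library(dplyr)

# Hypothetical helper: bin dates with cut() on Dates, then summarise per
# player and bin. cut() returns a factor labelled with each bin's start
# date ("2000-01-01", "2002-01-01", ...), converted back to a Date here.
summarise_by_window <- function(data, period = "2 years") {
  data %>%
    mutate(window_start = as.Date(as.character(cut(rankingDate, breaks = period)))) %>%
    group_by(player, window_start) %>%
    summarise(n_obs = n(), avg_pts = mean(points), .groups = "drop")
}
```

Unlike a fixed origin, the bins here follow the earliest year actually present, so data starting in 2001 would give 2001-2002, 2003-2004 and so on.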
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |