'How can I find the nearest date after another date in a different column in grouping by ID using R?
I am looking for a method that will look at each date in "Date A" and find the next nearest date after that value in "Date B" by ID (group_by). I then want to calculate the difference in days. Below is the table that I would like.
ID | Date A | Date B | Difference|
11111 | 09/01/21 | 09/03/21 | 2 |
22222 | 09/06/21 | 09/20/21 | 11 |
11111 | 09/08/21 | 09/18/21 | 10 |
44444 | 09/04/21 | NA | 11 |
44444 | 09/10/21 | 09/15/21 | 5 |
22222 | NA | 09/17/21 | NA |
77777 | NA | 10/16/21 | NA |
77777 | 09/04/21 | 10/17/21 | 24 |
77777 | 09/01/21 | 09/28/21 | 27 |
If you could please help me out with this, I would greatly appreciate it!
Cheers
Solution 1:[1]
A dplyr
solution via group_by
solution is not obvious to me here, but here is a relatively straightforward sqldf
solution. Presumably this could be translated into a dplyr
solution if you really wanted.
First mock up the data within R
df <- dplyr::tribble(
~'ID', ~'Date A', ~'Date B',
11111, '09/01/21', '09/03/21',
22222, '09/06/21', '09/20/21',
11111, '09/08/21', '09/18/21',
44444, '09/04/21', NA ,
44444, '09/10/21', '09/15/21',
22222, NA , '09/17/21',
77777, NA , '10/16/21',
77777, '09/04/21', '10/17/21',
77777, '09/01/21', '09/28/21'
)
df$`Date A` <- lubridate::mdy(df$`Date A`)
df$`Date B` <- lubridate::mdy(df$`Date B`)
df
Which looks like
# A tibble: 9 x 3
ID `Date A` `Date B`
<dbl> <date> <date>
1 11111 2021-09-01 2021-09-03
2 22222 2021-09-06 2021-09-20
3 11111 2021-09-08 2021-09-18
4 44444 2021-09-04 NA
5 44444 2021-09-10 2021-09-15
6 22222 NA 2021-09-17
7 77777 NA 2021-10-16
8 77777 2021-09-04 2021-10-17
9 77777 2021-09-01 2021-09-28
Then do an inequality join combined with a group by. The column I is added to allow for nuances of the data such as multiple of the same Date A within each ID
df$I <- 1:nrow(df)
df <- sqldf::sqldf('
SELECT a.I, a.ID, a."Date A", a."Date B",
MIN(b."Date B") AS NextB
FROM df a
LEFT JOIN df b
ON a.ID = b.ID
AND a."Date A" < b."Date B"
GROUP BY a.I, a.ID, a."Date A", a."Date B"
ORDER BY a.I
')
df$Difference = df$NextB - as.integer(df$`Date A`)
df$I <- NULL
df$NextB <- NULL
df
Which matches your example data (and should generalize well for edge cases not in your example data). Unclear how well it might scale up to non-trivial data.
ID Date A Date B Difference
1 11111 2021-09-01 2021-09-03 2
2 22222 2021-09-06 2021-09-20 11
3 11111 2021-09-08 2021-09-18 10
4 44444 2021-09-04 <NA> 11
5 44444 2021-09-10 2021-09-15 5
6 22222 <NA> 2021-09-17 NA
7 77777 <NA> 2021-10-16 NA
8 77777 2021-09-04 2021-10-17 24
9 77777 2021-09-01 2021-09-28 27
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | jmuhlenkamp |