R, Python or Excel. How can I count time range occurrences happening during other time ranges?
For example, for each day I want to count how many people are in a room while others are also in it. When a person leaves, I want an updated count of how many are still in that room. This feels like it should be the simplest question in history, yet I am completely stumped. See the table below for the data I am working with.
The room is meant to be checked every 15 minutes, so each stay is recorded as its full duration in 15-minute increments.
I have access to R, Python, Excel and Power BI and am fairly familiar with each, so help using any of them would be greatly appreciated.
Cheers!
ID | Date | Start Time | End Time |
11111 | 09/01/21 | 0900 | 1700 |
22222 | 09/01/21 | 1000 | 1300 |
33333 | 09/01/21 | 0900 | 1200 |
44444 | 09/02/21 | 0900 | 1700 |
55555 | 09/02/21 | 1200 | 1500 |
66666 | 09/02/21 | 0945 | 1400 |
77777 | 09/02/21 | 1000 | 1230 |
88888 | 09/02/21 | 0900 | 1445 |
99999 | 09/02/21 | 1300 | 1700 |
Edit: Both langtang's and Maël's solutions worked well for me, but I ultimately ended up using something very similar to langtang's solution for my project. Here is the code langtang provided:
============================================================================
library(data.table)
# set to data.table, and create datetime columns sdate and edate
setDT(dt)[
,c("sdate","edate"):=lapply(.SD, \(x)
lubridate::mdy_hm(paste(Date,x))),
.SDcols = c("Start Time", "End Time")
]
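# for each person's end time, count the others on the same Date still in the room, then drop the helper columns
dt[, in_room:=sapply(edate, \(x) sum(sdate<x & edate>x)), by=Date][,`:=`(sdate=NULL, edate=NULL)]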
============================================================================
Thank you all!
Solution 1:[1]
Here's a way with the ivs and tidyverse libraries in R:
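library(ivs)
library(tidyverse)
library(lubridate)  # attached by tidyverse >= 2.0.0; loaded explicitly for older versions
# build datetime columns and one interval per stay
df <- df %>%
  mutate(Start.Time = mdy_hm(paste(Date, Start.Time)),
         End.Time = mdy_hm(paste(Date, End.Time)),
         ivs = iv(Start.Time, End.Time))
# overall bounds of the data; subtract 1 second so the final end point is not sampled
bounds <- range(df$ivs)
lower <- iv_start(bounds[[1]])
upper <- iv_end(bounds[[2]]) - 1L
# count the stays covering each 15-minute tick, then collapse runs of equal counts
tibble(minutes = seq(lower, upper, by = 15 * 60),
       count = iv_count_between(minutes, df$ivs)) %>%
  group_by(gp = data.table::rleid(count)) %>%
  summarise(StartTime = min(minutes),
            EndTime = max(minutes),
            count = unique(count))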
# A tibble: 14 × 4
gp StartTime EndTime count
<int> <dttm> <dttm> <int>
1 1 2021-09-01 09:00:00 2021-09-01 09:45:00 2
2 2 2021-09-01 10:00:00 2021-09-01 11:45:00 3
3 3 2021-09-01 12:00:00 2021-09-01 12:45:00 2
4 4 2021-09-01 13:00:00 2021-09-01 16:45:00 1
5 5 2021-09-01 17:00:00 2021-09-02 08:45:00 0
6 6 2021-09-02 09:00:00 2021-09-02 09:30:00 2
7 7 2021-09-02 09:45:00 2021-09-02 09:45:00 3
8 8 2021-09-02 10:00:00 2021-09-02 11:45:00 4
9 9 2021-09-02 12:00:00 2021-09-02 12:15:00 5
10 10 2021-09-02 12:30:00 2021-09-02 12:45:00 4
11 11 2021-09-02 13:00:00 2021-09-02 13:45:00 5
12 12 2021-09-02 14:00:00 2021-09-02 14:30:00 4
13 13 2021-09-02 14:45:00 2021-09-02 14:45:00 3
14 14 2021-09-02 15:00:00 2021-09-02 16:45:00 2
Data:
df <- tibble::tribble(
~ID, ~Date, ~Start.Time, ~End.Time,
11111L, "09/01/21", "0900", 1700L,
22222L, "09/01/21", "1000", 1300L,
33333L, "09/01/21", "0900", 1200L,
44444L, "09/02/21", "0900", 1700L,
55555L, "09/02/21", "1200", 1500L,
66666L, "09/02/21", "0945", 1400L,
77777L, "09/02/21", "1000", 1230L,
88888L, "09/02/21", "0900", 1445L,
99999L, "09/02/21", "1300", 1700L
)
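Since the question also mentions Python, here is a minimal sketch of the same 15-minute grid idea using only the standard library. It is not a port of ivs: the names `rows`, `parse`, `stays`, `ticks` and `runs` are illustrative, and it assumes each stay includes its start time but not its end time, which is what reproduces the counts shown above.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows from the question: (ID, Date, Start Time, End Time)
rows = [
    (11111, "09/01/21", "0900", "1700"),
    (22222, "09/01/21", "1000", "1300"),
    (33333, "09/01/21", "0900", "1200"),
    (44444, "09/02/21", "0900", "1700"),
    (55555, "09/02/21", "1200", "1500"),
    (66666, "09/02/21", "0945", "1400"),
    (77777, "09/02/21", "1000", "1230"),
    (88888, "09/02/21", "0900", "1445"),
    (99999, "09/02/21", "1300", "1700"),
]

def parse(date, hhmm):
    """Combine the date and time strings into one datetime."""
    return datetime.strptime(f"{date} {hhmm}", "%m/%d/%y %H%M")

stays = [(parse(d, s), parse(d, e)) for _, d, s, e in rows]

step = timedelta(minutes=15)
lower = min(start for start, _ in stays)
upper = max(end for _, end in stays)

# Count the stays covering each 15-minute tick (assumption: start <= tick < end)
ticks = []
t = lower
while t < upper:
    ticks.append((t, sum(start <= t < end for start, end in stays)))
    t += step

# Collapse consecutive ticks with the same count into (first_tick, last_tick, count)
runs = []
for count, group in groupby(ticks, key=lambda tick: tick[1]):
    group = list(group)
    runs.append((group[0][0], group[-1][0], count))

for first_tick, last_tick, count in runs:
    print(first_tick, last_tick, count)
```

Scanning every stay at every tick is fine for data of this size; for much larger inputs a sweep over sorted start and end events would likely scale better.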
Solution 2:[2]
You can use the following approach, which leverages data.table.
library(data.table)
# set to data.table, and create datetime columns sdate and edate
setDT(dt)[
,c("sdate","edate"):=lapply(.SD, \(x) lubridate::mdy_hm(paste(Date,x))),
.SDcols = c("Start Time", "End Time")
]
dt[, in_room:=sapply(edate, \(x) sum(sdate<x & edate>x)), by=Date][,`:=`(sdate=NULL, edate=NULL)]
Output:
ID Date Start Time End Time in_room
<int> <char> <char> <char> <int>
1: 11111 09/01/21 09:00 17:00 0
2: 22222 09/01/21 10:00 13:00 1
3: 33333 09/01/21 09:00 12:00 2
4: 44444 09/02/21 09:00 17:00 0
5: 55555 09/02/21 12:00 15:00 2
6: 66666 09/02/21 09:45 14:00 4
7: 77777 09/02/21 10:00 12:30 4
8: 88888 09/02/21 09:00 14:45 3
9: 99999 09/02/21 13:00 17:00 0
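To make the logic behind in_room explicit: for each person's end time x, it counts the stays on the same Date with a start before x and an end after x, so the person leaving is not counted. A rough Python equivalent using only the standard library might look like the sketch below; the names `rows`, `parse`, `stays` and `in_room` are illustrative and not part of the data.table code.

```python
from datetime import datetime

# Rows copied from the Input of this answer: (ID, Date, Start Time, End Time)
rows = [
    (11111, "09/01/21", "09:00", "17:00"),
    (22222, "09/01/21", "10:00", "13:00"),
    (33333, "09/01/21", "09:00", "12:00"),
    (44444, "09/02/21", "09:00", "17:00"),
    (55555, "09/02/21", "12:00", "15:00"),
    (66666, "09/02/21", "09:45", "14:00"),
    (77777, "09/02/21", "10:00", "12:30"),
    (88888, "09/02/21", "09:00", "14:45"),
    (99999, "09/02/21", "13:00", "17:00"),
]

def parse(date, hhmm):
    """Combine the date and time strings into one datetime."""
    return datetime.strptime(f"{date} {hhmm}", "%m/%d/%y %H:%M")

stays = [(pid, date, parse(date, s), parse(date, e)) for pid, date, s, e in rows]

# For each person's leaving time, count the stays on the same Date with
# start < leaving time < end (strict on both sides, so the leaver is excluded)
in_room = {
    pid: sum(d2 == date and s2 < end < e2 for _, d2, s2, e2 in stays)
    for pid, date, _, end in stays
}

for pid, count in in_room.items():
    print(pid, count)
```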
Update
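The OP has clarified that they are interested in, for each person/Date, the total number of people who were in the room at any point during that person's stay. The count includes the person themselves, and stays that only touch at an end point count as overlapping. This is also quite simple, provided the sdate and edate columns still exist (recreate them first if the previous step already dropped them):
# count the rows whose interval is not entirely after e and not entirely before s
ct_p <- function(s,e) dt[!(sdate>e | edate<s), .N]
dt[, overlapped:=ct_p(sdate,edate), by=1:nrow(dt)][,`:=`(sdate=NULL, edate=NULL)]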
Output:
ID Date Start Time End Time overlapped
<int> <char> <char> <char> <int>
1: 11111 09/01/21 09:00 17:00 3
2: 22222 09/01/21 10:00 13:00 3
3: 33333 09/01/21 09:00 12:00 3
4: 44444 09/02/21 09:00 17:00 6
5: 55555 09/02/21 12:00 15:00 6
6: 66666 09/02/21 09:45 14:00 6
7: 77777 09/02/21 10:00 12:30 5
8: 88888 09/02/21 09:00 14:45 6
9: 99999 09/02/21 13:00 17:00 5
Input:
structure(list(ID = c(11111L, 22222L, 33333L, 44444L, 55555L,
66666L, 77777L, 88888L, 99999L), Date = c("09/01/21", "09/01/21",
"09/01/21", "09/02/21", "09/02/21", "09/02/21", "09/02/21", "09/02/21",
"09/02/21"), `Start Time` = c("09:00", "10:00", "09:00", "09:00",
"12:00", "09:45", "10:00", "09:00", "13:00"), `End Time` = c("17:00",
"13:00", "12:00", "17:00", "15:00", "14:00", "12:30", "14:45",
"17:00")), row.names = c(NA, -9L), class = "data.frame")
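For comparison, the overlap count from the Update can be sketched in plain Python as well. This is a minimal illustration under the same rule as ct_p: two stays overlap unless one starts after the other ends, so each person counts themselves and stays that only touch at an end point are included. The names `rows`, `parse`, `stays` and `overlapped` are illustrative.

```python
from datetime import datetime

# Rows copied from the Input above: (ID, Date, Start Time, End Time)
rows = [
    (11111, "09/01/21", "09:00", "17:00"),
    (22222, "09/01/21", "10:00", "13:00"),
    (33333, "09/01/21", "09:00", "12:00"),
    (44444, "09/02/21", "09:00", "17:00"),
    (55555, "09/02/21", "12:00", "15:00"),
    (66666, "09/02/21", "09:45", "14:00"),
    (77777, "09/02/21", "10:00", "12:30"),
    (88888, "09/02/21", "09:00", "14:45"),
    (99999, "09/02/21", "13:00", "17:00"),
]

def parse(date, hhmm):
    """Combine the date and time strings into one datetime."""
    return datetime.strptime(f"{date} {hhmm}", "%m/%d/%y %H:%M")

stays = [(pid, parse(date, s), parse(date, e)) for pid, date, s, e in rows]

# A stay overlaps another unless it starts after the other ends or ends
# before the other starts; the date is part of each datetime, so stays on
# different days never overlap
overlapped = {
    pid: sum(not (s2 > end or e2 < start) for _, s2, e2 in stays)
    for pid, start, end in stays
}

for pid, count in overlapped.items():
    print(pid, count)
```

Subtract 1 from each value if the person themselves should not be counted.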
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |