Using R to calculate the time since binary output = 1

I have binary data in a data frame with a time column, and I want to produce a data frame like the one below, with a new column "duration since =1". I found a Python equivalent of this, but I am looking for a way to do it in R.

Binary Output   Time (secs)   duration since =1
0               0             0
0               0.000983      0.000983
0               0.001966      0.001966
1               0.002949      0
0               0.003932      0.000983  # (0.003932-0.002949)
0               0.005000      0.002051  # (0.005000-0.002949)

Solution 1:[1]

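We can use cumsum() to decide, row by row, whether to subtract the Time at which Binary_Output == 1. While cumsum(Binary_Output) == 0, no 1 has occurred yet, so duration is simply Time. From the first 1 onwards, duration is Time minus the Time of the row where Binary_Output == 1. Note that Time[Binary_Output == 1] assumes a single row with Binary_Output == 1, as in the sample data.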


library(dplyr)

df <- read.table(header = TRUE, text = "Binary_Output   Time
0               0
0               0.000983
0               0.001966
1               0.002949
0               0.003932
0               0.005000")

df %>% 
  mutate(duration = ifelse(cumsum(Binary_Output) == 0, Time, Time - Time[Binary_Output == 1]))

#>   Binary_Output     Time duration
#> 1             0 0.000000 0.000000
#> 2             0 0.000983 0.000983
#> 3             0 0.001966 0.001966
#> 4             1 0.002949 0.000000
#> 5             0 0.003932 0.000983
#> 6             0 0.005000 0.002051

Created on 2022-05-05 by the reprex package (v2.0.1)
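
If the data contains more than one row with Binary_Output == 1, the same idea can be sketched in base R with cummax(). This is only a sketch, and it assumes that Time is sorted in ascending order and is never negative; df2 and last_one are hypothetical names used for illustration.

```r
# Hypothetical input with two rows where Binary_Output == 1.
df2 <- data.frame(
  Binary_Output = c(0, 0, 1, 0, 0, 1, 0),
  Time          = c(0, 1, 2, 3, 4, 5, 6)
)

# Time * Binary_Output keeps Time on the rows with a 1 and is 0 elsewhere.
# Because Time is ascending and non-negative, the running maximum is the
# Time of the most recent 1, or 0 while no 1 has occurred yet.
last_one <- cummax(df2$Time * df2$Binary_Output)

# Before the first 1 this is Time - 0, i.e. Time itself, as in the answer above.
df2$duration <- df2$Time - last_one

print(df2)
```

Each 1 resets the duration to 0, and the rows before the first 1 keep their original Time.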

Solution 2:[2]

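With data.table: keep Time only on the rows where Binary_Output == 1, carry that value forward with nafill(type = "locf"), and subtract it from Time. Rows before the first 1 have nothing to carry forward, so their result is NA: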


library(data.table)

setDT(df)[, DurationSince1 := Time - nafill(fifelse(Binary_Output == 1, Time, NA_real_), type = "locf")][]

   Binary_Output     Time DurationSince1
           <int>    <num>          <num>
1:             0 0.000000             NA
2:             0 0.000983             NA
3:             0 0.001966             NA
4:             1 0.002949       0.000000
5:             0 0.003932       0.000983
6:             0 0.005000       0.002051
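
The NA values in the first three rows differ from the expected output in the question, which shows Time itself there. One possible way to reconcile the two, assuming that "no 1 seen yet" should fall back to Time, is data.table's fcoalesce(). The names dt and last_one are hypothetical and used only for this sketch.

```r
library(data.table)

# Sample data from the question, rebuilt here so the sketch is self-contained.
dt <- data.table(
  Binary_Output = c(0L, 0L, 0L, 1L, 0L, 0L),
  Time          = c(0, 0.000983, 0.001966, 0.002949, 0.003932, 0.005000)
)

# Time of the most recent 1, carried forward; NA before the first 1.
dt[, last_one := nafill(fifelse(Binary_Output == 1L, Time, NA_real_), type = "locf")]

# Where the difference is NA (no 1 has occurred yet), fall back to Time.
dt[, DurationSince1 := fcoalesce(Time - last_one, Time)]

print(dt)
```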

Solution 3:[3]

Alternatively, this can be solved by grouping on cumsum(Binary_Output), which has the benefit of reproducing OP's expected result for the first group, i.e., the first 3 rows:

library(data.table)

setDT(df)[, duration_since_1 := Time - first(Time), by = cumsum(Binary_Output)][]

   Binary_Output     Time duration_since_1
1:             0 0.000000         0.000000
2:             0 0.000983         0.000983
3:             0 0.001966         0.001966
4:             1 0.002949         0.000000
5:             0 0.003932         0.000983
6:             0 0.005000         0.002051


Data:

df <- fread("Binary_Output   Time   duration_since_1
0               0             0
0               0.000983      0.000983
0               0.001966      0.001966
1               0.002949      0
0               0.003932      0.000983
0               0.005000      0.002051")
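
For readers without data.table, grouping on cumsum(Binary_Output) can also be sketched in base R with ave(). The input below is hypothetical and contains two rows with Binary_Output == 1, to show what the group labels look like. Note that for the rows before the first 1 this subtracts the first Time in the data, which equals Time itself only when the series starts at 0.

```r
# Hypothetical input with two rows where Binary_Output == 1.
df3 <- data.frame(
  Binary_Output = c(0, 0, 1, 0, 0, 1, 0),
  Time          = c(0, 1, 2, 3, 4, 5, 6)
)

# cumsum() counts the 1s seen so far, so every row gets the label of the
# most recent 1 (label 0 means no 1 has occurred yet).
grp <- cumsum(df3$Binary_Output)

# Within each group, subtract the group's first Time. For groups 1, 2, ...
# the first row is the row with the 1; for group 0 it is the first row overall.
df3$duration_since_1 <- ave(df3$Time, grp, FUN = function(x) x - x[1])

print(df3)
```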


This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 benson23
Solution 2 Waldi
Solution 3 Uwe