'How do you populate missing dates for lag?
Say that I have a dataset.
date <- c("2004-02-01", "2004-03-05", "2004-08-09", "2004-08-13", "2004-10-20", "2004-11-02", "2008-01-05", "2008-02-03", "2008-08-09", "2008-11-04", "2012-01-05", "2012-02-03", "2012-08-09", "2012-10-04", "2012-10-04", "2012-10-31", "2012-11-04")
date <- ymd(date)
name <- c("Joe", "Joe", "Joe", "Joe", "Joe", "Joe",
"Larry", "Larry", "Larry", "Larry",
"Jeff", "Jeff", "Jeff", "Jeff", "Jeff", "Jeff", "Jeff")
hits <- c(5, 4, 10, 9, 15, 1,
13, 22, 9, 11,
15, 17, 10, 3, 4, 2, 33)
df <- data.frame(date, name, hits)
I want to do 7-day time lags for each observation. In order to do this, I will have to restructure the dataset a bit.
I want to add seven days after each date for each name, but the hits will be 0. I hope to end up with a dataset like the following (for Joe):
date name hits
2004-02-01 Joe 5
2004-02-02 Joe 0
2004-02-03 Joe 0
2004-02-04 Joe 0
2004-02-05 Joe 0
2004-02-06 Joe 0
2004-02-07 Joe 0
2004-02-08 Joe 0
2004-03-05 Joe 4
2004-03-06 Joe 0
2004-03-07 Joe 0
2004-03-08 Joe 0
2004-03-09 Joe 0
2004-03-10 Joe 0
2004-03-11 Joe 0
2004-03-12 Joe 0
2004-08-09 Joe 10
2004-08-10 Joe 0
2004-08-11 Joe 0
2004-08-12 Joe 0
2004-08-13 Joe 9
2004-08-14 Joe 0
2004-08-15 Joe 0
2004-08-16 Joe 0
2004-08-17 Joe 0
2004-08-18 Joe 0
2004-08-19 Joe 0
2004-08-20 Joe 0
2004-10-20 Joe 15
2004-10-21 Joe 0
2004-10-22 Joe 0
2004-10-23 Joe 0
2004-10-24 Joe 0
2004-10-25 Joe 0
2004-10-26 Joe 0
2004-10-27 Joe 0
2004-11-02 Joe 1
2004-11-03 Joe 0
2004-11-04 Joe 0
2004-11-05 Joe 0
2004-11-06 Joe 0
2004-11-07 Joe 0
2004-11-08 Joe 0
2004-11-09 Joe 0
Is there a fast way to do this using dplyr
?
Solution 1:[1]
The dplyr::summarise
function can be used to add rows for the next 7 days for each combination of name
and date
:
library(tidyverse)
ndays=7
df.filled = df %>%
mutate(date = as.Date(date)) %>%
arrange(name, date) %>%
group_by(name, date, hits) %>%
summarise(date = date + 0:ndays,
hits = c(hits, rep(0, ndays))) %>%
ungroup()
df.filled %>% filter(name=="Joe") %>% print(n=Inf)
#> # A tibble: 48 × 3
#> name date hits
#> <chr> <date> <dbl>
#> 1 Joe 2004-02-01 5
#> 2 Joe 2004-02-02 0
#> 3 Joe 2004-02-03 0
#> 4 Joe 2004-02-04 0
#> 5 Joe 2004-02-05 0
#> 6 Joe 2004-02-06 0
#> 7 Joe 2004-02-07 0
#> 8 Joe 2004-02-08 0
#> 9 Joe 2004-03-05 4
#> 10 Joe 2004-03-06 0
#> 11 Joe 2004-03-07 0
#> 12 Joe 2004-03-08 0
#> 13 Joe 2004-03-09 0
#> 14 Joe 2004-03-10 0
#> 15 Joe 2004-03-11 0
#> 16 Joe 2004-03-12 0
#> 17 Joe 2004-08-09 10
#> 18 Joe 2004-08-10 0
#> 19 Joe 2004-08-11 0
#> 20 Joe 2004-08-12 0
#> 21 Joe 2004-08-13 0
#> 22 Joe 2004-08-14 0
#> 23 Joe 2004-08-15 0
#> 24 Joe 2004-08-16 0
#> 25 Joe 2004-08-13 9
#> 26 Joe 2004-08-14 0
#> 27 Joe 2004-08-15 0
#> 28 Joe 2004-08-16 0
#> 29 Joe 2004-08-17 0
#> 30 Joe 2004-08-18 0
#> 31 Joe 2004-08-19 0
#> 32 Joe 2004-08-20 0
#> 33 Joe 2004-10-20 15
#> 34 Joe 2004-10-21 0
#> 35 Joe 2004-10-22 0
#> 36 Joe 2004-10-23 0
#> 37 Joe 2004-10-24 0
#> 38 Joe 2004-10-25 0
#> 39 Joe 2004-10-26 0
#> 40 Joe 2004-10-27 0
#> 41 Joe 2004-11-02 1
#> 42 Joe 2004-11-03 0
#> 43 Joe 2004-11-04 0
#> 44 Joe 2004-11-05 0
#> 45 Joe 2004-11-06 0
#> 46 Joe 2004-11-07 0
#> 47 Joe 2004-11-08 0
#> 48 Joe 2004-11-09 0
Note, however, that with the code above you could end up with repeated dates if a given name
has two dates that are less than 7 days apart. Thus, it's probably safer to do the following: In the code below, we fill in every date from the first to the last + 7 days for each name
. Then we join that back to the original data to populate the dates that have non-zero hits
.
df$date = as.Date(df$date)
df.filled2 = df %>%
group_by(name) %>%
summarise(date = seq(min(date), max(date)+7,"1 day")) %>%
left_join(df) %>%
mutate(hits=replace_na(hits, 0))
df.filled2 %>% filter(name=="Joe") %>% print(n=Inf)
#> # A tibble: 283 × 3
#> # Groups: name [1]
#> name date hits
#> <chr> <date> <dbl>
#> 1 Joe 2004-02-01 5
#> 2 Joe 2004-02-02 0
#> 3 Joe 2004-02-03 0
#> 4 Joe 2004-02-04 0
#> 5 Joe 2004-02-05 0
#> 6 Joe 2004-02-06 0
#> 7 Joe 2004-02-07 0
#> 8 Joe 2004-02-08 0
#> 9 Joe 2004-02-09 0
#> 10 Joe 2004-02-10 0
#> 11 Joe 2004-02-11 0
#> 12 Joe 2004-02-12 0
#> 13 Joe 2004-02-13 0
#> 14 Joe 2004-02-14 0
#> 15 Joe 2004-02-15 0
#> 16 Joe 2004-02-16 0
#> 17 Joe 2004-02-17 0
#> 18 Joe 2004-02-18 0
#> 19 Joe 2004-02-19 0
#> 20 Joe 2004-02-20 0
#> 21 Joe 2004-02-21 0
#> 22 Joe 2004-02-22 0
#> 23 Joe 2004-02-23 0
#> 24 Joe 2004-02-24 0
#> 25 Joe 2004-02-25 0
#> 26 Joe 2004-02-26 0
#> 27 Joe 2004-02-27 0
#> 28 Joe 2004-02-28 0
#> 29 Joe 2004-02-29 0
#> 30 Joe 2004-03-01 0
#> 31 Joe 2004-03-02 0
#> 32 Joe 2004-03-03 0
#> 33 Joe 2004-03-04 0
#> 34 Joe 2004-03-05 4
#> 35 Joe 2004-03-06 0
#> 36 Joe 2004-03-07 0
#> 37 Joe 2004-03-08 0
#> 38 Joe 2004-03-09 0
#> 39 Joe 2004-03-10 0
#> 40 Joe 2004-03-11 0
#> 41 Joe 2004-03-12 0
#> 42 Joe 2004-03-13 0
#> 43 Joe 2004-03-14 0
#> 44 Joe 2004-03-15 0
#> 45 Joe 2004-03-16 0
#> 46 Joe 2004-03-17 0
#> 47 Joe 2004-03-18 0
#> 48 Joe 2004-03-19 0
#> 49 Joe 2004-03-20 0
#> 50 Joe 2004-03-21 0
#> 51 Joe 2004-03-22 0
#> 52 Joe 2004-03-23 0
#> 53 Joe 2004-03-24 0
#> 54 Joe 2004-03-25 0
#> 55 Joe 2004-03-26 0
#> 56 Joe 2004-03-27 0
#> 57 Joe 2004-03-28 0
#> 58 Joe 2004-03-29 0
#> 59 Joe 2004-03-30 0
#> 60 Joe 2004-03-31 0
#> 61 Joe 2004-04-01 0
#> 62 Joe 2004-04-02 0
#> 63 Joe 2004-04-03 0
#> 64 Joe 2004-04-04 0
#> 65 Joe 2004-04-05 0
#> 66 Joe 2004-04-06 0
#> 67 Joe 2004-04-07 0
#> 68 Joe 2004-04-08 0
#> 69 Joe 2004-04-09 0
#> 70 Joe 2004-04-10 0
#> 71 Joe 2004-04-11 0
#> 72 Joe 2004-04-12 0
#> 73 Joe 2004-04-13 0
#> 74 Joe 2004-04-14 0
#> 75 Joe 2004-04-15 0
#> 76 Joe 2004-04-16 0
#> 77 Joe 2004-04-17 0
#> 78 Joe 2004-04-18 0
#> 79 Joe 2004-04-19 0
#> 80 Joe 2004-04-20 0
#> 81 Joe 2004-04-21 0
#> 82 Joe 2004-04-22 0
#> 83 Joe 2004-04-23 0
#> 84 Joe 2004-04-24 0
#> 85 Joe 2004-04-25 0
#> 86 Joe 2004-04-26 0
#> 87 Joe 2004-04-27 0
#> 88 Joe 2004-04-28 0
#> 89 Joe 2004-04-29 0
#> 90 Joe 2004-04-30 0
#> 91 Joe 2004-05-01 0
#> 92 Joe 2004-05-02 0
#> 93 Joe 2004-05-03 0
#> 94 Joe 2004-05-04 0
#> 95 Joe 2004-05-05 0
#> 96 Joe 2004-05-06 0
#> 97 Joe 2004-05-07 0
#> 98 Joe 2004-05-08 0
#> 99 Joe 2004-05-09 0
#> 100 Joe 2004-05-10 0
#> 101 Joe 2004-05-11 0
#> 102 Joe 2004-05-12 0
#> 103 Joe 2004-05-13 0
#> 104 Joe 2004-05-14 0
#> 105 Joe 2004-05-15 0
#> 106 Joe 2004-05-16 0
#> 107 Joe 2004-05-17 0
#> 108 Joe 2004-05-18 0
#> 109 Joe 2004-05-19 0
#> 110 Joe 2004-05-20 0
#> 111 Joe 2004-05-21 0
#> 112 Joe 2004-05-22 0
#> 113 Joe 2004-05-23 0
#> 114 Joe 2004-05-24 0
#> 115 Joe 2004-05-25 0
#> 116 Joe 2004-05-26 0
#> 117 Joe 2004-05-27 0
#> 118 Joe 2004-05-28 0
#> 119 Joe 2004-05-29 0
#> 120 Joe 2004-05-30 0
#> 121 Joe 2004-05-31 0
#> 122 Joe 2004-06-01 0
#> 123 Joe 2004-06-02 0
#> 124 Joe 2004-06-03 0
#> 125 Joe 2004-06-04 0
#> 126 Joe 2004-06-05 0
#> 127 Joe 2004-06-06 0
#> 128 Joe 2004-06-07 0
#> 129 Joe 2004-06-08 0
#> 130 Joe 2004-06-09 0
#> 131 Joe 2004-06-10 0
#> 132 Joe 2004-06-11 0
#> 133 Joe 2004-06-12 0
#> 134 Joe 2004-06-13 0
#> 135 Joe 2004-06-14 0
#> 136 Joe 2004-06-15 0
#> 137 Joe 2004-06-16 0
#> 138 Joe 2004-06-17 0
#> 139 Joe 2004-06-18 0
#> 140 Joe 2004-06-19 0
#> 141 Joe 2004-06-20 0
#> 142 Joe 2004-06-21 0
#> 143 Joe 2004-06-22 0
#> 144 Joe 2004-06-23 0
#> 145 Joe 2004-06-24 0
#> 146 Joe 2004-06-25 0
#> 147 Joe 2004-06-26 0
#> 148 Joe 2004-06-27 0
#> 149 Joe 2004-06-28 0
#> 150 Joe 2004-06-29 0
#> 151 Joe 2004-06-30 0
#> 152 Joe 2004-07-01 0
#> 153 Joe 2004-07-02 0
#> 154 Joe 2004-07-03 0
#> 155 Joe 2004-07-04 0
#> 156 Joe 2004-07-05 0
#> 157 Joe 2004-07-06 0
#> 158 Joe 2004-07-07 0
#> 159 Joe 2004-07-08 0
#> 160 Joe 2004-07-09 0
#> 161 Joe 2004-07-10 0
#> 162 Joe 2004-07-11 0
#> 163 Joe 2004-07-12 0
#> 164 Joe 2004-07-13 0
#> 165 Joe 2004-07-14 0
#> 166 Joe 2004-07-15 0
#> 167 Joe 2004-07-16 0
#> 168 Joe 2004-07-17 0
#> 169 Joe 2004-07-18 0
#> 170 Joe 2004-07-19 0
#> 171 Joe 2004-07-20 0
#> 172 Joe 2004-07-21 0
#> 173 Joe 2004-07-22 0
#> 174 Joe 2004-07-23 0
#> 175 Joe 2004-07-24 0
#> 176 Joe 2004-07-25 0
#> 177 Joe 2004-07-26 0
#> 178 Joe 2004-07-27 0
#> 179 Joe 2004-07-28 0
#> 180 Joe 2004-07-29 0
#> 181 Joe 2004-07-30 0
#> 182 Joe 2004-07-31 0
#> 183 Joe 2004-08-01 0
#> 184 Joe 2004-08-02 0
#> 185 Joe 2004-08-03 0
#> 186 Joe 2004-08-04 0
#> 187 Joe 2004-08-05 0
#> 188 Joe 2004-08-06 0
#> 189 Joe 2004-08-07 0
#> 190 Joe 2004-08-08 0
#> 191 Joe 2004-08-09 10
#> 192 Joe 2004-08-10 0
#> 193 Joe 2004-08-11 0
#> 194 Joe 2004-08-12 0
#> 195 Joe 2004-08-13 9
#> 196 Joe 2004-08-14 0
#> 197 Joe 2004-08-15 0
#> 198 Joe 2004-08-16 0
#> 199 Joe 2004-08-17 0
#> 200 Joe 2004-08-18 0
#> 201 Joe 2004-08-19 0
#> 202 Joe 2004-08-20 0
#> 203 Joe 2004-08-21 0
#> 204 Joe 2004-08-22 0
#> 205 Joe 2004-08-23 0
#> 206 Joe 2004-08-24 0
#> 207 Joe 2004-08-25 0
#> 208 Joe 2004-08-26 0
#> 209 Joe 2004-08-27 0
#> 210 Joe 2004-08-28 0
#> 211 Joe 2004-08-29 0
#> 212 Joe 2004-08-30 0
#> 213 Joe 2004-08-31 0
#> 214 Joe 2004-09-01 0
#> 215 Joe 2004-09-02 0
#> 216 Joe 2004-09-03 0
#> 217 Joe 2004-09-04 0
#> 218 Joe 2004-09-05 0
#> 219 Joe 2004-09-06 0
#> 220 Joe 2004-09-07 0
#> 221 Joe 2004-09-08 0
#> 222 Joe 2004-09-09 0
#> 223 Joe 2004-09-10 0
#> 224 Joe 2004-09-11 0
#> 225 Joe 2004-09-12 0
#> 226 Joe 2004-09-13 0
#> 227 Joe 2004-09-14 0
#> 228 Joe 2004-09-15 0
#> 229 Joe 2004-09-16 0
#> 230 Joe 2004-09-17 0
#> 231 Joe 2004-09-18 0
#> 232 Joe 2004-09-19 0
#> 233 Joe 2004-09-20 0
#> 234 Joe 2004-09-21 0
#> 235 Joe 2004-09-22 0
#> 236 Joe 2004-09-23 0
#> 237 Joe 2004-09-24 0
#> 238 Joe 2004-09-25 0
#> 239 Joe 2004-09-26 0
#> 240 Joe 2004-09-27 0
#> 241 Joe 2004-09-28 0
#> 242 Joe 2004-09-29 0
#> 243 Joe 2004-09-30 0
#> 244 Joe 2004-10-01 0
#> 245 Joe 2004-10-02 0
#> 246 Joe 2004-10-03 0
#> 247 Joe 2004-10-04 0
#> 248 Joe 2004-10-05 0
#> 249 Joe 2004-10-06 0
#> 250 Joe 2004-10-07 0
#> 251 Joe 2004-10-08 0
#> 252 Joe 2004-10-09 0
#> 253 Joe 2004-10-10 0
#> 254 Joe 2004-10-11 0
#> 255 Joe 2004-10-12 0
#> 256 Joe 2004-10-13 0
#> 257 Joe 2004-10-14 0
#> 258 Joe 2004-10-15 0
#> 259 Joe 2004-10-16 0
#> 260 Joe 2004-10-17 0
#> 261 Joe 2004-10-18 0
#> 262 Joe 2004-10-19 0
#> 263 Joe 2004-10-20 15
#> 264 Joe 2004-10-21 0
#> 265 Joe 2004-10-22 0
#> 266 Joe 2004-10-23 0
#> 267 Joe 2004-10-24 0
#> 268 Joe 2004-10-25 0
#> 269 Joe 2004-10-26 0
#> 270 Joe 2004-10-27 0
#> 271 Joe 2004-10-28 0
#> 272 Joe 2004-10-29 0
#> 273 Joe 2004-10-30 0
#> 274 Joe 2004-10-31 0
#> 275 Joe 2004-11-01 0
#> 276 Joe 2004-11-02 1
#> 277 Joe 2004-11-03 0
#> 278 Joe 2004-11-04 0
#> 279 Joe 2004-11-05 0
#> 280 Joe 2004-11-06 0
#> 281 Joe 2004-11-07 0
#> 282 Joe 2004-11-08 0
#> 283 Joe 2004-11-09 0
The second approach will in general result in many more rows of data. If you want to keep a maximum of 7 rows after any date with non-zero hits
, you can do the following:
df.filled2 = df.filled2 %>%
group_by(name) %>%
mutate(test=cumsum(hits > 0)) %>%
group_by(name, test) %>%
slice(1:8) %>%
ungroup %>%
select(-test)
Solution 2:[2]
Old topic but this is the way I pad by time. The timetk package has some great time series functionality. I always find it best to fill with NA and then do a quick mutate to replace those as sometimes the pad by time function doesn't work if you are filling with another value.
library(timetk)
df %>%
mutate(date = lubridate::ymd(date))
group_by(name) %>%
timetk::pad_by_time(
.date_var = date,
.by = "auto", ### insert date level to summarize at (month, day)
.pad_value = NA,
.fill_na_direction = "updown",
.start_date = NULL,
.end_date = NULL
)%>%
mutate(hits = ifelse(hits == "NA", 0, hits))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | KBE11416 |