'Get data from variable z where variable x = variable y
library(dplyr)
df = data.frame(group_id = c(rep("a",5), rep("b",5)),
prod_id = 1:10,
prod_type = rep(c("a","a", "b", "c","d"),2),
start = lubridate::dmy(c("01/01/2001", "02/02/2002", "03/03/2003", "04/04/2004", "05/05/2005")),
fin = lubridate::dmy(c(NA,NA,NA,NA,NA)))%>%
group_by(group_id) %>%
mutate(next_acd = if_else(lead(prod_type) %in% c("a","c","d"), lead(prod_id), prod_id[NA])) %>%
tidyr::fill(next_acd, .direction = "up")
df
group_id prod_id prod_type start fin next_acd
<chr> <int> <chr> <date> <date> <int>
1 a 1 a 2001-01-01 NA 2
2 a 2 a 2002-02-02 NA 4
3 a 3 b 2003-03-03 NA 4
4 a 4 c 2004-04-04 NA 5
5 a 5 d 2005-05-05 NA NA
6 b 6 a 2001-01-01 NA 7
7 b 7 a 2002-02-02 NA 9
8 b 8 b 2003-03-03 NA 9
9 b 9 c 2004-04-04 NA 10
10 b 10 d 2005-05-05 NA NA
In the above next_acd
is the next prod_id
that is prod_type
"a","c" or "d".
I now want to mutate
fin
to be the start
date of the prod_id
that corresponds with next_acd
. Result should be:
group_id prod_id prod_type start fin next_acd
<chr> <int> <chr> <date> <date> <int>
1 a 1 a 2001-01-01 2002-02-02 2
2 a 2 a 2002-02-02 2004-04-04 4
3 a 3 b 2003-03-03 2004-04-04 4
4 a 4 c 2004-04-04 2005-05-05 5
5 a 5 d 2005-05-05 NA NA
6 b 6 a 2001-01-01 2002-02-02 7
7 b 7 a 2002-02-02 2004-04-04 9
8 b 8 b 2003-03-03 2004-04-04 9
9 b 9 c 2004-04-04 2005-05-05 10
10 b 10 d 2005-05-05 NA NA
EDIT
The original answer works with the example data, where the prod_id
are in numerical order without any missing. It doesn't work in my real data which is more like:
df = data.frame(group_id = c("G1","G1", rep("G2",8)),
prod_id = c(1,2,5,6,7,8,9,10,11,12),
prod_type = rep(c("a","a", "b", "c","d"),2),
start = lubridate::dmy(c("01/01/2001", "02/02/2002",
"05/05/2005","06/06/2006", "07/07/2007", "08/08/2008", "09/09/2009","01/01/2010", "02/02/2011", "03/03/2012" )),
fin = lubridate::dmy(NA))%>%
group_by(group_id) %>%
mutate(next_acd = if_else(lead(prod_type) %in% c("a","c","d"), lead(prod_id), prod_id[NA])) %>%
tidyr::fill(next_acd, .direction = "up")%>%
ungroup()
The result I'm expecting is:
group_id prod_id prod_type start fin next_acd
<chr> <dbl> <chr> <date> <date> <dbl>
1 G1 1 a 2001-01-01 2002-02-02 2
2 G1 2 a 2002-02-02 NA NA
3 G2 5 b 2005-05-05 2006-06-06 6
4 G2 6 c 2006-06-06 2007-07-07 7
5 G2 7 d 2007-07-07 2008-08-08 8
6 G2 8 a 2008-08-08 2009-09-09 9
7 G2 9 a 2009-09-09 2011-02-02 11
8 G2 10 b 2010-01-01 2011-02-02 11
9 G2 11 c 2011-02-02 2012-03-03 12
10 G2 12 d 2012-03-03 NA NA
But trying df$fin <- df$start[df$prod_id[df$next_acd]]
, I get:
df
# A tibble: 10 x 6
group_id prod_id prod_type start fin next_acd
<chr> <dbl> <chr> <date> <date> <dbl>
1 G1 1 a 2001-01-01 2002-02-02 2
2 G1 2 a 2002-02-02 NA NA
3 G2 5 b 2005-05-05 2010-01-01 6
4 G2 6 c 2006-06-06 2011-02-02 7
5 G2 7 d 2007-07-07 2012-03-03 8
6 G2 8 a 2008-08-08 NA 9
7 G2 9 a 2009-09-09 NA 11
8 G2 10 b 2010-01-01 NA 11
9 G2 11 c 2011-02-02 NA 12
10 G2 12 d 2012-03-03 NA NA
I don't understand what is happening here or how to fix it.
Solution 1:[1]
A possible solution, in base R
:
df$fin <- df$start[match(df$next_acd, df$prod_id)]
#> # A tibble: 10 × 6
#> group_id prod_id prod_type start fin next_acd
#> <chr> <dbl> <chr> <date> <date> <dbl>
#> 1 G1 1 a 2001-01-01 2002-02-02 2
#> 2 G1 2 a 2002-02-02 NA NA
#> 3 G2 5 b 2005-05-05 2006-06-06 6
#> 4 G2 6 c 2006-06-06 2007-07-07 7
#> 5 G2 7 d 2007-07-07 2008-08-08 8
#> 6 G2 8 a 2008-08-08 2009-09-09 9
#> 7 G2 9 a 2009-09-09 2011-02-02 11
#> 8 G2 10 b 2010-01-01 2011-02-02 11
#> 9 G2 11 c 2011-02-02 2012-03-03 12
#> 10 G2 12 d 2012-03-03 NA NA
Or using dplyr
:
library(tidyverse)
df %>%
mutate(fin = start[match(next_acd, prod_id)])
#> # A tibble: 10 × 6
#> group_id prod_id prod_type start fin next_acd
#> <chr> <dbl> <chr> <date> <date> <dbl>
#> 1 G1 1 a 2001-01-01 2002-02-02 2
#> 2 G1 2 a 2002-02-02 NA NA
#> 3 G2 5 b 2005-05-05 2006-06-06 6
#> 4 G2 6 c 2006-06-06 2007-07-07 7
#> 5 G2 7 d 2007-07-07 2008-08-08 8
#> 6 G2 8 a 2008-08-08 2009-09-09 9
#> 7 G2 9 a 2009-09-09 2011-02-02 11
#> 8 G2 10 b 2010-01-01 2011-02-02 11
#> 9 G2 11 c 2011-02-02 2012-03-03 12
#> 10 G2 12 d 2012-03-03 NA NA
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |