'replacing missing values in r

I need help in replacing missing values in the following dummy file. The following rule need to be followed when replacing a missing value.

If the value is the same on both sides of the column where the cell has a missing value, the missing value should be replaced with the value on either side.
If the value is the same on both sides of a column where two adjacent cells have a missing value, the missing value should be replaced with the value on either side.
If the value same on both sides of the column where 3, 4 or more adjacent cells have missing value, the missing value should be replaced with the value on either side of it
If the value in 2007 Colum missing, then it should be replaced with the value of 2008 and 2009 if they are the same
If a value in the 2017 Column is missing, it should be replaced with the values from 2016 and 2015, if they are the same.
If the value is not the same on both sides of the column containing the missing value, the missing value should be replaced with the most frequently occurring value between 2007 and 2017 columns.
If 2007 and 2008 missing, replace both missing value with 2009 if 2009==2010==2011
If 2007, 2008 and 2009 missing, replace all three-missing value with 2010 if 2010==2011==2012
If 2007, 2008, 2009 and 2010 missing, replace all four-missing value with 2011 if 2011==2012==2013
If 2017 and 2016 missing, replace both missing value with 2015 if 2015==2014==2013
If 2017, 2016 and 2015 missing, replace all three-missing value with 2014 if 2014==2013==2012
If 2017, 2016, 2015 and 2014 missing, replace all four-missing value with 2013 if 2013==2012==2011
create new variable of count of unique value during 2007 and 2017 for every case

dummy data is below

dput(gb)
structure(list(ID = 1:20, X2007 = c("a1", "v1", "", "e1", "d1", 
"g1", "t1", "b2w", "p1", "q1", "sd1", "fr4", "fr6", "gt7", "", 
"ju8", "ki9", "lo9", "", "i88"), X2008 = c("a1", "v1", "c1", 
"e1", "d1", "", "t1", "b2w", "", "", "", "", "", "", "", "", 
"", "", "", ""), X2009 = c("a1", "", "c1", "", "", "d1", "t1", 
"", "p1", "", "sd1", "", "fr6", "", "hj7", "ju8", "ki9", "lo9", 
"k99", "i88"), X2010 = c("a1", "", "", "e1", "", "d1", "", "", 
"p1", "", "sd1", "", "fr6", "gt7", "hj7", "", "ki9", "", "k99", 
""), X2011 = c("", "v1", "", "", "", "d1", "", "b2w", "p1", "q1", 
"sd1", "", "fr6", "gt7", "hj7", "", "ki9", "", "k99", ""), X2012 = c("a1", 
"v1", "c1", "e1", "", "", "", "b2w", "p1", "q1", "sd1", "", "fr6", 
"gt7", "hj7", "ju8", "ki9", "lo9", "k99", ""), X2013 = c("b1 ", 
"", "c1", "e1", "d1", "", "t1", "", "p1", "q1", "sd1", "fr4", 
"fr6", "gt7", "hj7", "ju8", "ki9", "lo9", "k99", ""), X2014 = c("", 
"v1", "", "", "d1", "g1", "t1", "", "", "q1", "", "fr4", "", 
"gt7", "", "ju8", "", "lo9", "", "i88"), X2015 = c("b3", "b6", 
"", "", "d1", "g1", "t1", "", "", "q1", "", "fr4", "", "", "", 
"ju8", "", "lo9", "", "i88"), X2016 = c("b4", "b6", "", "", "d1", 
"g1", "t1", "b2w", "", "", "", "fr4", "", "", "", "", "", "lo9", 
"", "i88"), X2017 = c("b5", "b6", "c1", "e1", "d1", "g1", "", 
"", "", "", "", "fr4", "", "", "", "", "", "lo9", "", "i88")), class = "data.frame", row.names = c(NA, 
-20L))

r excel statistics missing-data

Solution 1:^[1]

Here is a possible approach:

Pivot longer
change "" to NA, and trim white space
group by ID
Within each group:

use zoo::locf, get the "adjacent" values for any sequence of consecutive missing
get the most common value (I create a little helper function for this)
apply the rules using case_when()

Pivot back to wide

most_common <- function(v) {
  tv = table(v[!is.na(v)])
  names(tv)[which.max(tv)]
}
gb %>% 
  pivot_longer(-ID,names_prefix = "X",names_transform = as.integer, names_to="year") %>% 
  mutate(value=if_else(value=="", as.character(NA), trimws(value))) %>% 
  group_by(ID) %>% 
  mutate(
    prevv=zoo::na.locf(value, na.rm=F),
    nextv=zoo::na.locf(value, na.rm=F, fromLast=T),
    mostf = most_common(value),
    # Now replace the NA
    nvalue = case_when(
      !is.na(value)~value, 
      prevv==nextv~prevv,
      is.na(value) & year==2007 & lead(value,1) == lead(value,2)~lead(value,1),
      is.na(value) & year==2017 & lag(value,1) == lag(value,2)~lag(value,1),
      TRUE~mostf
    )) %>% 
  pivot_wider(ID, names_from=year,names_prefix = "X", values_from=nvalue)

Output:

      ID X2007 X2008 X2009 X2010 X2011 X2012 X2013 X2014 X2015 X2016 X2017
   <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1     1 a1    a1    a1    a1    a1    a1    b1    a1    b3    b4    b5   
 2     2 v1    v1    v1    v1    v1    v1    v1    v1    b6    b6    b6   
 3     3 c1    c1    c1    c1    c1    c1    c1    c1    c1    c1    c1   
 4     4 e1    e1    e1    e1    e1    e1    e1    e1    e1    e1    e1   
 5     5 d1    d1    d1    d1    d1    d1    d1    d1    d1    d1    d1   
 6     6 g1    g1    d1    d1    d1    g1    g1    g1    g1    g1    g1   
 7     7 t1    t1    t1    t1    t1    t1    t1    t1    t1    t1    t1   
 8     8 b2w   b2w   b2w   b2w   b2w   b2w   b2w   b2w   b2w   b2w   b2w  
 9     9 p1    p1    p1    p1    p1    p1    p1    p1    p1    p1    p1   
10    10 q1    q1    q1    q1    q1    q1    q1    q1    q1    q1    q1   
11    11 sd1   sd1   sd1   sd1   sd1   sd1   sd1   sd1   sd1   sd1   sd1  
12    12 fr4   fr4   fr4   fr4   fr4   fr4   fr4   fr4   fr4   fr4   fr4  
13    13 fr6   fr6   fr6   fr6   fr6   fr6   fr6   fr6   fr6   fr6   fr6  
14    14 gt7   gt7   gt7   gt7   gt7   gt7   gt7   gt7   gt7   gt7   gt7  
15    15 hj7   hj7   hj7   hj7   hj7   hj7   hj7   hj7   hj7   hj7   hj7  
16    16 ju8   ju8   ju8   ju8   ju8   ju8   ju8   ju8   ju8   ju8   ju8  
17    17 ki9   ki9   ki9   ki9   ki9   ki9   ki9   ki9   ki9   ki9   ki9  
18    18 lo9   lo9   lo9   lo9   lo9   lo9   lo9   lo9   lo9   lo9   lo9  
19    19 k99   k99   k99   k99   k99   k99   k99   k99   k99   k99   k99  
20    20 i88   i88   i88   i88   i88   i88   i88   i88   i88   i88   i88

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	langtang

'replacing missing values in r

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]