Custom imputation function for MICE stopped working

Last year I used info from this question to build a custom function to impute missing data under a simple logical constraint using MICE in R. The following code still works on mice version 2.30, but it will not work as of 2.46 (and mice is now on 3.3). Here is what I'm trying to do, complete with MWE and a full explanation of what I've tried so far.

Given the data below, the combination (N, Y) is impossible (structural zero). All other combos are fine. So the missing data in rows 7, 9 and 10 can be imputed as Y or N, but the missing value for y4 in row 8 is constrained to be "N".

df <- data.frame(U = runif(10), 
                y4 = factor(c("N", "N", "Y", "Y", "Y", "Y",  NA,  NA, "Y",  "Y")),
                y5 = factor(c("N", "N", "Y", "Y", "N", "N", "N", "Y", NA,   NA)))

> df
            U   y4   y5
1  0.49717835    N    N
2  0.37466084    N    N
3  0.14765796    Y    Y
4  0.98469334    Y    Y
5  0.33477385    Y    N
6  0.96072250    Y    N
7  0.47953952 <NA>    N
8  0.08374912 <NA>    Y
9  0.27682921    Y <NA>
10 0.13180437    Y <NA>
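
A quick cross-tabulation of the fully observed rows makes the structural zero visible. This is only a minimal sketch: it rebuilds the two factor columns from the data above and leaves out U, which plays no role in the constraint.

```r
# Rebuild the two factor columns from the example data.
y4 <- factor(c("N", "N", "Y", "Y", "Y", "Y", NA, NA, "Y", "Y"))
y5 <- factor(c("N", "N", "Y", "Y", "N", "N", "N", "Y", NA, NA))

# Cross-tabulate the observed pairs; rows with an NA are dropped by default.
tab <- table(y4 = y4, y5 = y5)
print(tab)

# Number of fully observed rows with y4 == "N" and y5 == "Y".
tab["N", "Y"]
```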

Since y4 and y5 are factors, we use polytomous regression models to generate imputations. The code below does this, with no errors.

library(mice)
mice(df, m=2, method=c("", "polyreg", "polyreg"), maxit = 5)

The "polyreg" method dispatches to the mice.impute.polyreg function. At each iteration, the data passed to mice.impute.polyreg were already imputed in the prior iteration, so the predictors contain no missing values. For the MWE, I drop the rows with missing y5 and call the function directly.

df.nm5 <- df[1:8,]

> mice.impute.polyreg(y=df.nm5$y4, ry = !is.na(df.nm5$y4), x=df.nm5[,c(1,3)])
[1] "Y" "Y"

Now, this call to mice.impute.polyreg does throw some na.rm-type warnings, but I think they are related to the size/structure of the data, not to the imputation function itself.

Warning messages:
1: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(newX[, i], ...) :
  argument is not numeric or logical: returning NA
3: In FUN(newX[, i], ...) : NAs introduced by coercion

The output of mice.impute.polyreg is a vector of values to be imputed. I want to hijack that output and make a deterministic edit so that all imputed values follow the constraint. So I write my own function, mice.impute.polyreg_y4_adv:

mice.impute.polyreg_y4_adv <- function(y, ry, x){

  vals <- mice.impute.polyreg(y, ry, x) # generate imputed values using polyreg

  logic_5 <- x[!ry, "y5"]               # extracts y5 from the data x where y4 is missing

  vals[logic_5=="Y"] <- "N"             # if y5 is "Y", then change the imputed value for y4 to be "N"

  return(vals)
}

If I pass this new function to mice, I get an error about wy and type being unused arguments.

> mice(df.nm5, m=2, method=c("", "polyreg_y4_adv", "polyreg"), maxit = 5)

 iter imp variable
  1   1  y4Error in mice.impute.polyreg_y4_adv(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),  : 
  unused arguments (wy = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE), 
                    type = c(1, 1))

If I modify mice.impute.polyreg_y4_adv to provide a wy=NULL argument,

mice.impute.polyreg_y4_adv <- function(y, ry, x, wy=NULL){
  vals <- mice.impute.polyreg(y, ry, x)
  logic_5 <- x[!ry, "y5"]
  vals[logic_5=="Y"] <- "N" 
  return(vals)
}

iter imp variable
  1   1  y4Error in mice.impute.polyreg_y4_adv(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),  : 
  unused argument (type = c(1, 1))

I still get an error about the unused argument type (which I can't find in the help file for mice.impute.polyreg). If I set that argument to NULL as well:

mice.impute.polyreg_y4_adv <- function(y, ry, x, wy=NULL, type=NULL){
  vals <- mice.impute.polyreg(y, ry, x)
  logic_5 <- x[!ry, "y5"]
  vals[logic_5=="Y"] <- "N" 
  return(vals)
}

mice(df.nm5, m=2, method=c("", "polyreg_y4_adv", "polyreg"), maxit = 5)

Now I've got a subscript out of bounds error.

 iter imp variable
  1   1  y4Error in x[!ry, "y5"] : subscript out of bounds

If I pass the arguments manually, as in mice.impute.polyreg(y, ry, x) above, I don't get the out-of-bounds error from x[!ry, "y5"].

The changelog does not provide any substantial information.



Solution 1:[1]

I think this is a factor-conversion issue.

Try setting a breakpoint on your custom function and then running it:

debugonce(mice.impute.polyreg_y4_adv)
mice(df.nm5, m=2, method=c("", "polyreg_y4_adv", "polyreg"), maxit = 5)


Browse[2]> str(x)
num [1:10, 1:2] 0.158 0.169 0.243 0.534 0.815 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:10] "1" "2" "3" "4" ...
..$ : chr [1:2] "U" "y5Y"

It looks like the column is now named y5Y. I'll keep looking for what changed this; it might be in a dependency and not in mice itself.

EDIT:

In sampler.R, we can see it transforming x through model.matrix: https://github.com/cran/mice/blob/53f69107bb81f03e98dcdd19e90186043864c670/R/sampler.R#L183-L193
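
The renaming can be reproduced without mice. The sketch below assumes the sampler builds x roughly the way model.matrix does with the default treatment contrasts and then drops the intercept column; the toy data frame is made up for illustration and is not the data from the question.

```r
# Toy data (values are illustrative, not taken from the question).
toy <- data.frame(U  = c(0.2, 0.4, 0.6, 0.8),
                  y5 = factor(c("N", "Y", "N", "Y")))

# Expand the predictors with model.matrix(), then drop the intercept column.
x <- model.matrix(~ U + y5, data = toy)[, -1, drop = FALSE]

# The factor y5 has become a 0/1 dummy column named after its non-reference level.
colnames(x)

# Indexing the matrix by the original name now fails; capture the message.
res <- tryCatch(x[, "y5"], error = function(e) conditionMessage(e))
res

# The dummy column can be mapped back to the factor level.
y5_is_Y <- x[, "y5Y"] == 1
```

So any custom imputation function that looks up a factor predictor by its original column name has to use the dummy-column name instead.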

That file was introduced in 2.46 - presumably something else was used before then, but I can't easily find it in the history.
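
Putting the pieces together, one possible rewrite of the custom function is sketched below. It assumes mice 2.46 or later, where the sampler passes wy and type to the imputation function and x arrives as a numeric matrix with the dummy column y5Y (as in the str(x) output above). It also assumes that mice.impute.polyreg accepts wy and further arguments through `...` in those versions. The helper name constrain_y4 is hypothetical, and the rule itself (force "N" wherever y5 is "Y") is copied unchanged from the question's code.

```r
# Hypothetical helper: apply the deterministic edit to the imputed values.
# `x` is the numeric design matrix, `wy` flags the rows being imputed.
constrain_y4 <- function(vals, x, wy) {
  y5_is_Y <- x[wy, "y5Y"] == 1   # dummy column: 1 where y5 == "Y"
  vals[y5_is_Y] <- "N"           # same rule as in the question's function
  vals
}

# Custom method: `...` absorbs extra arguments such as `type`.
mice.impute.polyreg_y4_adv <- function(y, ry, x, wy = NULL, ...) {
  if (is.null(wy)) wy <- !ry     # default: impute exactly the missing entries
  vals <- mice::mice.impute.polyreg(y, ry, x, wy = wy, ...)
  constrain_y4(vals, x, wy)
}

# The helper can be checked on its own, without running mice:
x  <- cbind(U = c(0.1, 0.2, 0.3, 0.4), y5Y = c(0, 1, 0, 1))
wy <- c(FALSE, FALSE, TRUE, TRUE)
constrain_y4(c("Y", "Y"), x, wy)
```

With this definition in the global environment, the original call mice(df.nm5, m=2, method=c("", "polyreg_y4_adv", "polyreg"), maxit = 5) should dispatch to the new function. Using `...` rather than naming type explicitly keeps the signature tolerant of any other arguments the sampler may pass.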

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Neal Fultz