'Automating conditional logic for database data checks in R

I am trying to do a large data check for a database. Some fields in the database are hidden, so when I am doing the datacheck, I need to ignore all hidden fields. Fields are hidden based on conditional logic stored in the database. I have exported this conditional logic and have stored it in a dataframe in R. Now I need to automate the data check by somehow using the text string of a conditional argument to automate the script writing itself, which I do not think is possible, or finding a way around this problem.

Below is example code that I need to solve:

id <- c(1001, 1002, 1003, 1004, 1005, 1001, 1002, 1003, 1004, 1005)
target_var <- c("race","race","race","race","race", "race_other", 
         "race_other", "race_other", "race_other", "race_other")
value <- c(1, NA, 1, 1, 6, NA, NA, NA, NA, "Asian")
branching_logic <- c(NA, NA, NA, NA, NA, 
                     "race == 6", "race == 6", "race == 6", 
                     "race == 6", "race == 6")
race <- c(NA, NA, NA,NA, NA, 1, 1, 1, 6, 6)

data <- data.frame(id, var, value, branching_logic, race) %>%
  mutate(data_check_result = case_when(
    !is.na(value) ~ "No Missing Data", 
    is.na(value) & is.na(branching_logic) ~ "Missing Data 1",
    is.na(value) & race == 6 ~ "Missing Data 2", 
    is.na(value) & race != 6 ~ "Hidden field",
  ))

It would be great if I could replace (race==6) with a variable or somehow directing the script to the conditional expression already saved as a string, but I know that R can't do that.

The above problem has four categories which the data could fall into:

  • No Missing Data: only if value is non-na
  • Missing Data 1: if the value is NA, and there is no branching logic that hid the variable.
  • Missing Data 2: if the value is NA and the branching logic is met to show the field
  • Hidden Field: if the value is NA and the branching logic is NOT net to show the field

I have thousands of fields to check with accompanying branching logic, so I need a way to use the branching logic saved in the "branching_logic" column within the script.

IMPORTANT NOTE: The case here is the simplest case. Many target_var variables and value variables have branching logic that looks at multiple other variables to determine whether to hide the field (Ex. race==6 & race==1)

This is only my second time posting, and I usually do not see such in depth problems here, but it would be great if someone has an idea!



Solution 1:[1]

You can store the expression you want to evaluate as a string if you pass it into parse() first as explained in this answer.

Here's a simple example of how you can store the expression in a column and then feed it to dplyr::case_when().

library(tidyverse)

set.seed(1)
d <- tibble(
  a = sample(10),
  b = sample(10),
  c = "a >  b"
)

d %>% 
  mutate(a_bigger = case_when(
    eval(parse(text = c)) ~ "Y",
    TRUE ~ "N"
  ))
#> # A tibble: 10 x 4
#>        a     b c      a_bigger
#>    <int> <int> <chr>  <chr>   
#>  1     9     3 a >  b Y       
#>  2     4     1 a >  b Y       
#>  3     7     5 a >  b Y       
#>  4     1     8 a >  b N       
#>  5     2     2 a >  b N       
#>  6     5     6 a >  b N       
#>  7     3    10 a >  b N       
#>  8    10     9 a >  b Y       
#>  9     6     4 a >  b Y       
#> 10     8     7 a >  b Y

Created on 2022-03-07 by the reprex package (v2.0.1)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1