'R incongruity when copying a column in R with ifelse

After loading lots of xlsx sheets of multiple workbooks, I want to create a double check of the tidiness and cleanliness of the data source. I created a data frame reading the worksheets as.numeric (with the risk of NAs introduced by coercion) and an other data frame reading the data as it is (with the risk of not having number value) . This is to confront them and to find if there could be some import problems and where the correction to do in the xlsx source workbooks.

When asking the sum, in the "non numerical" data frame I want to turn around the error (if the data are dirty..(not numerical))

df <- data.frame(cbind(c(1:10),c(1,2,"a","b",5,6,7,8,9,10)))
colnames(df) <- c("ColA","ColB")
df <- df[order(df$ColB),]

# if the values are not numerical,  
# I want to create a column with the formula to use in excel 
# to do the sum in xlsx (it should start with row+1 
# as there are the headers in xlsx). So I want just to put a string.

# Doing the below I always have the same unique string 
# for column NonNumeric1  "=SUM(A2:B2)", it is not what I want.

df$Total.NonNumeric1 <-   ifelse(sum(as.numeric(lapply(select(df, ColA:ColB),is.character)))>0, 
                                 paste("=SUM(A",1 + seq.int(nrow(df)),":B",1 + seq.int(nrow(df)),")",sep = ""), 
                                 select(df, ColA:ColB) %>% rowSums(na.rm = TRUE))

df
   ColA ColB Total.NonNumeric1
1     1    1       =SUM(A2:B2)
10   10   10       =SUM(A2:B2)
5     5    2       =SUM(A2:B2)
2     2    5       =SUM(A2:B2)
9     9    6       =SUM(A2:B2)
7     7    7       =SUM(A2:B2)
8     8    8       =SUM(A2:B2)
6     6    9       =SUM(A2:B2)
3     3    a       =SUM(A2:B2)
4     4    b       =SUM(A2:B2)


# I tried to turn around and create the column with the string 
# for each row, this worked without the ifelse.

df<- within(df,row_num <- paste("=SUM(A",1 + seq.int(nrow(df)),":B",1 + seq.int(nrow(df)),")",sep = ""))

# as you can see row_num is sequential
df
   ColA ColB Total.NonNumeric1       row_num
1     1    1       =SUM(A2:B2)   =SUM(A2:B2)
10   10   10       =SUM(A2:B2)   =SUM(A3:B3)
5     5    2       =SUM(A2:B2)   =SUM(A4:B4)
2     2    5       =SUM(A2:B2)   =SUM(A5:B5)
9     9    6       =SUM(A2:B2)   =SUM(A6:B6)
7     7    7       =SUM(A2:B2)   =SUM(A7:B7)
8     8    8       =SUM(A2:B2)   =SUM(A8:B8)
6     6    9       =SUM(A2:B2)   =SUM(A9:B9)
3     3    a       =SUM(A2:B2) =SUM(A10:B10)
4     4    b       =SUM(A2:B2) =SUM(A11:B11)

# but if I try to put it to replace the column Total.NonNumeric2 
#(that needs to be replace 
# with the string only if the data are not numerical, 
# it goes back to the same "=SUM(A2:B2)" 
# and does not keep the sequence or the original string. 

df$Total.NonNumeric2 <-   ifelse(sum(as.numeric(lapply(select(df, ColA:ColB),is.character)))>0,
                          df$row_num, 
                          select(df, ColA:ColB) %>% rowSums(na.rm = TRUE))

df
   ColA ColB Total.NonNumeric1       row_num Total.NonNumeric2
1     1    1       =SUM(A2:B2)   =SUM(A2:B2)       =SUM(A2:B2)
10   10   10       =SUM(A2:B2)   =SUM(A3:B3)       =SUM(A2:B2)
5     5    2       =SUM(A2:B2)   =SUM(A4:B4)       =SUM(A2:B2)
2     2    5       =SUM(A2:B2)   =SUM(A5:B5)       =SUM(A2:B2)
9     9    6       =SUM(A2:B2)   =SUM(A6:B6)       =SUM(A2:B2)
7     7    7       =SUM(A2:B2)   =SUM(A7:B7)       =SUM(A2:B2)
8     8    8       =SUM(A2:B2)   =SUM(A8:B8)       =SUM(A2:B2)
6     6    9       =SUM(A2:B2)   =SUM(A9:B9)       =SUM(A2:B2)
3     3    a       =SUM(A2:B2) =SUM(A10:B10)       =SUM(A2:B2)
4     4    b       =SUM(A2:B2) =SUM(A11:B11)       =SUM(A2:B2)

I would need that a column with the formula string is inserted only if the sum is not possible because characters, otherwise it should be the sum of the defined columns. But especially I would like to understand where is my error. Thank you very much



Solution 1:[1]

As the comments say, you are starting out will all character columns, but even if not, any column with mixed numbers and strings would be character anyway. In that case, it might be better to do the ifelse element-wise instead of on the columns as a whole:

# temp var y to avoid adding twice
df$Total.NonNumeric2 <- with(df, {
  y <- as.numeric(ColA) + as.numeric(ColB)
  # y <- as.character(y)

  ifelse(is.na(y), row_num, y)
})
df["Total.NonNumeric2"]
   Total.NonNumeric2
1                  2
10                20
2                  4
5                 10
6                 12
7                 14
8                 16
9                 18
3      =SUM(A10:B10)
4      =SUM(A11:B11)

I don't know how fragile this will be with the data in your spreadsheet, but should be ok with just numbers and letters. Also, if all your values are numbers, then the result will be a numerical column. Not sure what the rest of your code will do, but for robustness, you can coerce y to character.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1