Why do factors get coerced to a number when subsetting a data frame?
I was trying to get the diagonal of the iris data set and wrote the following for loop:
diagonal_list <- list()
for (j in seq_len(ncol(iris))) {
  diagonal_list[j] <- iris[[j,j]]
}
diagonal_list
My output is:
[[1]]
[1] 5.1
[[2]]
[1] 3
[[3]]
[1] 1.3
[[4]]
[1] 0.2
[[5]]
[1] 1
But I want
[[1]]
[1] 5.1
[[2]]
[1] 3
[[3]]
[1] 1.3
[[4]]
[1] 0.2
[[5]]
[1] setosa
Levels: setosa versicolor virginica
This should return a list containing the diagonal, where the fifth column of the iris data frame holds the species. However, in my list output the species is not a factor but simply 1 (a number). How can I make sure that my list contains the factor?
Solution 1:[1]
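The assignment in the for loop should use double brackets [[ on both sides. Single-bracket assignment into a list strips the factor's attributes, so only its integer code is stored.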
diagonal_list <- list()
for (j in seq_len(ncol(iris))) {
  # [[ on the left stores the value as-is, so the factor keeps its levels
  diagonal_list[[j]] <- iris[[j, j]]
}
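A minimal sketch of the difference in isolation, using a hypothetical one-element factor `f` in place of iris[[5, 5]] (the names `f`, `single` and `double` are only for illustration): single-bracket assignment keeps just the integer code, while double-bracket assignment stores the factor unchanged.

```r
# Hypothetical one-element factor standing in for iris[[5, 5]]
f <- factor("setosa", levels = c("setosa", "versicolor", "virginica"))

single <- list()
single[1] <- f      # `[<-` turns f into a plain list element, dropping class and levels

double <- list()
double[[1]] <- f    # `[[<-` stores the object as-is

class(single[[1]])  # "integer"
class(double[[1]])  # "factor"
```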
Another solution to extract the diagonal without a loop:
lapply(seq_along(iris), \(x) iris[x, x])
Output
[[1]]
[1] 5.1
[[2]]
[1] 3
[[3]]
[1] 1.3
[[4]]
[1] 0.2
[[5]]
[1] setosa
Levels: setosa versicolor virginica
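The `\(x)` shorthand was added in R 4.1.0. On older versions the same call can be written with `function(x)`; the sketch below also checks that the fifth element is still a factor (the variable name `diagonal` is an assumption, not from the answer).

```r
# Same idea with the long-form anonymous function, for R versions before 4.1
diagonal <- lapply(seq_along(iris), function(x) iris[x, x])

# The fifth element keeps its class and levels
is.factor(diagonal[[5]])
levels(diagonal[[5]])
```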
Solution 2:[2]
You have to wrap iris[[j,j]] in a list:
diagonal_list <- list()
for (j in seq_len(ncol(iris))) {
  # wrapping in list() lets single-bracket assignment keep the factor intact
  diagonal_list[j] <- list(iris[[j, j]])
}
str(diagonal_list)
List of 5
$ : num 5.1
$ : num 3
$ : num 1.3
$ : num 0.2
$ : Factor w/ 3 levels "setosa","versicolor",..: 1
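As a quick cross-check, the sketch below builds the list both ways (wrapping in `list()` with single brackets, and assigning with double brackets) and compares the results; `by_wrapping` and `by_double` are illustrative names.

```r
by_wrapping <- list()
by_double <- list()
for (j in seq_len(ncol(iris))) {
  by_wrapping[j] <- list(iris[[j, j]])  # single brackets, list on the right
  by_double[[j]] <- iris[[j, j]]        # double brackets, bare value
}

# Compare the two lists element by element, including classes and levels
identical(by_wrapping, by_double)
```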
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Merijn van Tilborg |