Using column numbers not names in lm()
Instead of something like lm(bp~height+age, data=mydata), I would like to specify the columns by number, not name.
I tried lm(mydata[[1]]~mydata[[2]]+mydata[[3]]), but the problem with this is that, in the fitted model, the coefficients are named mydata[[2]], mydata[[3]], etc., whereas I would like them to have the real column names.
Perhaps this is a case of not being able to have your cake and eat it, but if the experts could advise whether this is possible, I would be grateful.
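For reference, the behaviour described in the question can be reproduced with a small made-up data frame. The values below are hypothetical; only the column names bp, height and age come from the question.

```r
# Hypothetical data; only the column names come from the question
set.seed(1)
mydata <- data.frame(
  bp     = rnorm(20, mean = 120, sd = 10),
  height = rnorm(20, mean = 170, sd = 8),
  age    = rnorm(20, mean = 40,  sd = 12)
)

# Indexing by position works, but each term is labelled by the expression used
fit <- lm(mydata[[1]] ~ mydata[[2]] + mydata[[3]])
names(coef(fit))
# "(Intercept)" "mydata[[2]]" "mydata[[3]]"
```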
Solution 1:[1]
lm(
as.formula(paste(colnames(mydata)[1], "~",
paste(colnames(mydata)[c(2, 3)], collapse = "+"),
sep = ""
)),
data=mydata
)
Instead of c(2, 3) you can use as many indices as you want (no need for a for loop).
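As a possible alternative to pasting the formula together by hand, stats::reformulate() builds the same formula from a character vector of term labels and a response. This is a sketch, not part of the original answer, and the data frame is hypothetical apart from its column names.

```r
# Hypothetical data; only the column names come from the question
set.seed(1)
mydata <- data.frame(
  bp     = rnorm(20, mean = 120, sd = 10),
  height = rnorm(20, mean = 170, sd = 8),
  age    = rnorm(20, mean = 40,  sd = 12)
)

# Build bp ~ height + age from column positions
f <- reformulate(colnames(mydata)[c(2, 3)], response = colnames(mydata)[1])

fit <- lm(f, data = mydata)
names(coef(fit))
# "(Intercept)" "height" "age"
```

Because the formula contains the actual column names, the coefficients are labelled with them, which is what the question asks for.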
Solution 2:[2]
lm(mydata[,1] ~ ., mydata[-1])
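The trick, which I found in a course on R, is to remove the response column from the data argument; otherwise summary() gives the warning "essentially perfect fit: summary may be unreliable". The reason is that . expands to every column of the data that is not otherwise named in the formula. Since mydata[,1] is an expression rather than a column name, the response column would still be included among the predictors. Normally, when the response is given by name, we keep the response column in.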
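The effect of keeping versus dropping the response column can be checked directly. The sketch below uses a hypothetical data frame with the column names from the question.

```r
# Hypothetical data; only the column names come from the question
set.seed(1)
mydata <- data.frame(
  bp     = rnorm(20, mean = 120, sd = 10),
  height = rnorm(20, mean = 170, sd = 8),
  age    = rnorm(20, mean = 40,  sd = 12)
)

# Response column left in the data: "." also expands to bp,
# so bp ends up on both sides of the formula
fit_keep <- lm(mydata[, 1] ~ ., data = mydata)
names(coef(fit_keep))
# "(Intercept)" "bp" "height" "age"

# Response column dropped from the data: only the real predictors remain
fit_drop <- lm(mydata[, 1] ~ ., data = mydata[-1])
names(coef(fit_drop))
# "(Intercept)" "height" "age"
```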
And a simplified version of the earlier answer by Tomas (Solution 1):
lm(
as.formula(paste(colnames(mydata)[1], "~ .")),
data=mydata
)
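As a quick sanity check, this form should give the same fit as writing the formula out by name, since the response is named in the formula and is therefore excluded from the expansion of the dot. The data frame below is hypothetical.

```r
# Hypothetical data; only the column names come from the question
set.seed(1)
mydata <- data.frame(
  bp     = rnorm(20, mean = 120, sd = 10),
  height = rnorm(20, mean = 170, sd = 8),
  age    = rnorm(20, mean = 40,  sd = 12)
)

# Response chosen by position, all remaining columns used as predictors
f <- as.formula(paste(colnames(mydata)[1], "~ ."))
fit_dot <- lm(f, data = mydata)

# The same model written out by name, for comparison
fit_named <- lm(bp ~ height + age, data = mydata)

all.equal(coef(fit_dot), coef(fit_named))
```

Note that this uses every other column as a predictor, so it only matches Solution 1 when the data frame contains nothing but the response and the intended predictors.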
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |