Difference between pull and select in dplyr?
It seems like dplyr::pull() and dplyr::select() do the same thing. Is there a difference besides that dplyr::pull() only selects 1 variable?
Solution 1:[1]
You could see select as an analogue of [ or magrittr::extract, and pull as an analogue of [[ (or $) or magrittr::extract2 for data frames (an analogue of [[ for lists would be purrr::pluck).
library(dplyr)
library(magrittr)  # provides extract() and extract2()
df <- iris %>% head()
All of these give the same output:
df %>% pull(Sepal.Length)
df %>% pull("Sepal.Length")
a <- "Sepal.Length"; df %>% pull(!!quo(a))
df %>% extract2("Sepal.Length")
df %>% `[[`("Sepal.Length")
df[["Sepal.Length"]]
# all of them:
# [1] 5.1 4.9 4.7 4.6 5.0 5.4
And all of these give the same output:
df %>% select(Sepal.Length)
a <- "Sepal.Length"; df %>% select(!!quo(a))
df %>% select("Sepal.Length")
df %>% extract("Sepal.Length")
df %>% `[`("Sepal.Length")
df["Sepal.Length"]
# all of them:
#   Sepal.Length
# 1          5.1
# 2          4.9
# 3          4.7
# 4          4.6
# 5          5.0
# 6          5.4
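As a quick check of the analogy, the sketch below compares the return types directly instead of comparing printed output. The variable names vec and sub are only illustrative.

```r
library(dplyr)

df <- head(iris)

# pull() returns the column itself, as [[ does
vec <- pull(df, Sepal.Length)
stopifnot(is.numeric(vec))
stopifnot(identical(vec, df[["Sepal.Length"]]))

# select() returns a one-column data frame, as [ does
sub <- select(df, Sepal.Length)
stopifnot(is.data.frame(sub))
stopifnot(identical(names(sub), names(df["Sepal.Length"])))
```

If every stopifnot() call passes silently, pull() behaved like [[ and select() behaved like [ on this data.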
pull and select can take bare (unquoted) column names, character strings, or numeric indices, while the others take character or numeric indices only.
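The examples above cover bare names and strings, so here is a small sketch of the numeric form as well. It assumes Sepal.Length is the first column, which is true for iris; the variable names are illustrative.

```r
library(dplyr)

df <- head(iris)

# pull(): bare name, string, and position all return the same vector
p_name     <- pull(df, Sepal.Length)
p_string   <- pull(df, "Sepal.Length")
p_position <- pull(df, 1)
stopifnot(identical(p_name, p_string), identical(p_name, p_position))

# select(): bare name, string, and position all return the same data frame
s_name     <- select(df, Sepal.Length)
s_string   <- select(df, "Sepal.Length")
s_position <- select(df, 1)
stopifnot(identical(s_name, s_string), identical(s_name, s_position))
```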
One important difference is how they handle negative indices. For select, negative indices mean columns to drop. For pull, they mean counting from the last column.
df %>% pull(-Sepal.Length)
df %>% pull(-1)
# [1] setosa setosa setosa setosa setosa setosa
# Levels: setosa versicolor virginica
The result looks strange, but Sepal.Length is converted to its position, 1, so -Sepal.Length becomes -1, and column -1 is Species (the last column).
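To make the counting rule concrete, this sketch checks a few negative positions against the columns they should resolve to. It relies on pull() defaulting to the last column when no variable is given, as described in the dplyr documentation.

```r
library(dplyr)

df <- head(iris)

# -1 is the last column, -2 the second to last, and so on
stopifnot(identical(pull(df, -1), df$Species))
stopifnot(identical(pull(df, -2), df$Petal.Width))

# With no column given, pull() falls back to the last column
stopifnot(identical(pull(df), df$Species))

# A negated bare name is turned into a negative position first,
# so this is the last column, not "everything except Sepal.Length"
stopifnot(identical(pull(df, -Sepal.Length), df$Species))
```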
This feature is not supported by [[ and extract2:
df %>% `[[`(-1)
df %>% extract2(-1)
df[[-1]]
# Error in .subset2(x, i, exact = exact) :
# attempt to select more than one element in get1index <real>
Negative indices to drop columns are supported by [ and extract, though.
df %>% select(-Sepal.Length)
df %>% select(-1)
df %>% `[`(-1)
df[-1]
#   Sepal.Width Petal.Length Petal.Width Species
# 1         3.5          1.4         0.2  setosa
# 2         3.0          1.4         0.2  setosa
# 3         3.2          1.3         0.2  setosa
# 4         3.1          1.5         0.2  setosa
# 5         3.6          1.4         0.2  setosa
# 6         3.9          1.7         0.4  setosa
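The same drop semantics can be checked on column names alone, which is easier to read than the full printout. This is a minimal sketch; dropping two columns at once with -c(...) is an extra illustration that is not part of the original answer.

```r
library(dplyr)

df <- head(iris)

# Dropping by name and by position removes the same column
dropped_by_name     <- select(df, -Sepal.Length)
dropped_by_position <- select(df, -1)
stopifnot(identical(names(dropped_by_name), names(df)[-1]))
stopifnot(identical(names(dropped_by_position), names(df)[-1]))

# Several columns can be dropped together
dropped_two <- select(df, -c(Sepal.Length, Species))
stopifnot(identical(
  names(dropped_two),
  c("Sepal.Width", "Petal.Length", "Petal.Width")
))
```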
Solution 2:[2]
First, it makes sense to see what class each function creates.
library(dplyr)
mtcars %>% pull(cyl) %>% class()
#> [1] "numeric"
mtcars %>% select(cyl) %>% class()
#> [1] "data.frame"
So pull() creates a vector -- which, in this case, is numeric -- whereas select() creates a data frame.
Basically, pull() is the equivalent of writing mtcars$cyl or mtcars[, "cyl"], whereas select() removes all of the columns except for cyl but maintains the data frame structure.
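The equivalences above can be verified with a short sketch. One caveat worth labelling: mtcars[, "cyl"] drops to a vector only because mtcars is a plain data frame; on a tibble the same indexing keeps the data frame structure. The names cyl_df, base_df and cyl_tbl are illustrative.

```r
library(dplyr)

# pull() matches $ and [, "cyl"] on a plain data frame
stopifnot(identical(pull(mtcars, cyl), mtcars$cyl))
stopifnot(identical(pull(mtcars, cyl), mtcars[, "cyl"]))

# select() keeps the data frame structure, like [ with drop = FALSE
cyl_df  <- select(mtcars, cyl)
base_df <- mtcars[, "cyl", drop = FALSE]
stopifnot(is.data.frame(cyl_df), is.data.frame(base_df))
stopifnot(identical(dim(cyl_df), dim(base_df)))

# Caveat: on a tibble, [, "cyl"] does not drop to a vector
cyl_tbl <- as_tibble(mtcars)[, "cyl"]
stopifnot(is.data.frame(cyl_tbl))
```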
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |