'Difference between pull and select in dplyr?

It seems like dplyr::pull() and dplyr::select() do the same thing. Is there a difference besides that dplyr::pull() only selects 1 variable?



Solution 1:[1]

You could see select as an analogue of [ or magrittr::extract and pull as an analogue of [[ (or $) or magrittr::extract2 for data frames (an analogue of [[ for lists would be purr::pluck).

df <- iris %>% head

All of these give the same output:

df %>% pull(Sepal.Length)
df %>% pull("Sepal.Length")
a <- "Sepal.Length"; df %>% pull(!!quo(a))
df %>% extract2("Sepal.Length")
df %>% `[[`("Sepal.Length")
df[["Sepal.Length"]]

# all of them:
# [1] 5.1 4.9 4.7 4.6 5.0 5.4

And all of these give the same output:

df %>% select(Sepal.Length)
a <- "Sepal.Length"; df %>% select(!!quo(a))
df %>% select("Sepal.Length")
df %>% extract("Sepal.Length")
df %>% `[`("Sepal.Length")
df["Sepal.Length"]
# all of them:
#   Sepal.Length
# 1          5.1
# 2          4.9
# 3          4.7
# 4          4.6
# 5          5.0
# 6          5.4

pull and select can take literal, character, or numeric indices, while the others take character or numeric only

One important thing is they differ on how they handle negative indices.

For select negative indices mean columns to drop.

For pull they mean count from last column.

df %>% pull(-Sepal.Length)
df %>% pull(-1)
# [1] setosa setosa setosa setosa setosa setosa
# Levels: setosa versicolor virginica

Strange result but Sepal.Length is converted to 1, and column -1 is Species (last column)

This feature is not supported by [[ and extract2 :

df %>% `[[`(-1)
df %>% extract2(-1)
df[[-1]]
# Error in .subset2(x, i, exact = exact) : 
#   attempt to select more than one element in get1index <real>

Negative indices to drop columns are supported by [ and extract though.

df %>% select(-Sepal.Length)
df %>% select(-1)
df %>% `[`(-1)
df[-1]

#   Sepal.Width Petal.Length Petal.Width Species
# 1         3.5          1.4         0.2  setosa
# 2         3.0          1.4         0.2  setosa
# 3         3.2          1.3         0.2  setosa
# 4         3.1          1.5         0.2  setosa
# 5         3.6          1.4         0.2  setosa
# 6         3.9          1.7         0.4  setosa

Solution 2:[2]

First, it makes sense to see what class each function creates.

library(dplyr)

mtcars %>% pull(cyl) %>% class()
#> 'numeric'

mtcars %>% select(cyl) %>% class()
#> 'data.frame'

So pull() creates a vector -- which, in this case, is numeric -- whereas select() creates a data frame.

Basically, pull() is the equivalent to writing mtcars$cyl or mtcars[, "cyl"], whereas select() removes all of the columns except for cyl but maintains the data frame structure

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2