'Name columns within aggregate in R
I know I can *re*name columns after I aggregate the data:
blubb <- aggregate(dat$two ~ dat$one, ...)
colnames(blubb) <- c("One", "Two")
Nothing wrong with that. But is there a way to aggregate and name the columns in one go? Sort of like:
blubb <- aggregate( ... , cols = c("One", "Two"))
It would be escpecially nice (and typo-proof) to somehow catch the original column names and do like:
blubb <- aggregate( ... , cols = c(name_of_dat$one, name_of_dat$two."_Mean"))
Solution 1:[1]
You can use setNames
as in:
blubb <- setNames(aggregate(dat$two ~ dat$one, ...), c("One", "Two"))
Alternatively, you can bypass the slick formula method, and use syntax like:
blubb <- aggregate(list(One = dat$one), list(Two = dat$two), ...)
Update
This update is to just help get you started on deriving a solution on your own.
If you inspect the code for stats:::aggregate.formula
, you'll see the following lines towards the end:
if (is.matrix(mf[[1L]])) {
lhs <- as.data.frame(mf[[1L]])
names(lhs) <- as.character(m[[2L]][[2L]])[-1L]
aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...)
}
else aggregate.data.frame(mf[1L], mf[-1L], FUN = FUN, ...)
If all that you want to do is append the function name to the variable that was aggregated, perhaps you can change that to something like:
if (is.matrix(mf[[1L]])) {
lhs <- as.data.frame(mf[[1L]])
names(lhs) <- as.character(m[[2L]][[2L]])[-1L]
myOut <- aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...)
colnames(myOut) <- c(names(mf[-1L]),
paste(names(lhs), deparse(substitute(FUN)), sep = "."))
}
else {
myOut <- aggregate.data.frame(mf[1L], mf[-1L], FUN = FUN, ...)
colnames(myOut) <- c(names(mf[-1L]),
paste(strsplit(gsub("cbind\\(|\\)|\\s", "",
names(mf[1L])), ",")[[1]],
deparse(substitute(FUN)), sep = "."))
}
myOut
This basically captures the value entered for FUN
by using deparse(substitute(FUN))
, so you can probably modify the function to accept a custom suffix, or perhaps even a vector of suffixes. This can probably be improved a bit with some work, but I'm not going to do it!
Here is a Gist with this concept applied, creating a function named "myAgg".
Here is some sample output of just the resulting column names:
> names(myAgg(weight ~ feed, data = chickwts, mean))
[1] "feed" "weight.mean"
> names(myAgg(breaks ~ wool + tension, data = warpbreaks, sum))
[1] "wool" "tension" "breaks.sum"
> names(myAgg(weight ~ feed, data = chickwts, FUN = function(x) mean(x^2)))
[1] "feed" "weight.function(x) mean(x^2)"
Notice that only the aggregated variable name changes. But notice also that if you use a custom function, you'll end up with a really strange column name!
Solution 2:[2]
The answer to your first question is yes. You can certainly include the column names in the aggregate function. Using the names from your example above:
blubb <- aggregate(dat,list(One=dat$One,Two=dat$Two),sum)
I like the part about possibly pulling in the original column names automatically. If I figure it out I'll post it.
Solution 3:[3]
In case you prefer writing aggregates as formula
the documentation shows the usage of cbind
. And cbind
allows you to name its arguments, which are used by aggregate
.
aggregate(cbind(SLength = Sepal.Length) ~ cbind(Type = Species),
data = iris, mean)
# Type SLength
#1 1 5.006
#2 2 5.936
#3 3 6.588
But cbind
replaces factors
by their internal codes and when used on the right side of ~
it keeps the cbind
call in the result. To avoid this the new names can already be given in the input data set using transform
or within
aggregate(SLength ~ Type, transform(iris, SLength = Sepal.Length,
Type = Species), mean)
# Type SLength
#1 setosa 5.006
#2 versicolor 5.936
#3 virginica 6.588
or
aggregate(cbind(SLength = Sepal.Length) ~ Type, transform(iris, Type = Species),
mean)
# Type SLength
#1 setosa 5.006
#2 versicolor 5.936
#3 virginica 6.588
The advantage of using cbind
or data.frame
compared to list
is that not all columns need to be given a (new) name. Aggregation of more than one column by more than one grouping factor could be done like:
aggregate(cbind("Miles/gallon" = mpg, Weight = wt, hp) ~ Cylinders + Carburetors
+ gear, transform(mtcars, Cylinders = cyl, Carburetors = carb), mean)
# Cylinders Carburetors gear Miles/gallon Weight hp
#1 4 1 3 21.50 2.46500 97.0
#2 6 1 3 19.75 3.33750 107.5
#...
and if you want to use more than one function:
aggregate(cbind(cases=ncases, ncontrols) ~ alc + tobgp, transform(esoph,
alc=alcgp), FUN = function(x) c("mean" = mean(x), "median" = median(x)))
# alc tobgp cases.mean cases.median ncontrols.mean ncontrols.median
#1 0-39g/day 0-9g/day 1.5000000 1.0000000 42.000000 44.000000
#2 40-79 0-9g/day 5.6666667 4.0000000 24.166667 29.000000
#...
which adds to the colname the used aggregate-function.
Some examples to aggregate mpg
renamed to Miles/gallon
and hp by cyl
renamed to Cylinders
and gear
using mtcars
resulting in:
# Cylinders gear Miles/gallon hp
#1 4 3 21.500 97.0000
#2 6 3 19.750 107.5000
#3 8 3 15.050 194.1667
#4 4 4 26.925 76.0000
#5 6 4 19.750 116.5000
#6 4 5 28.200 102.0000
#7 6 5 19.700 175.0000
#8 8 5 15.400 299.5000
aggregate(cbind("Miles/gallon" = mpg, hp) ~ Cylinders + gear,
transform(mtcars, Cylinders = cyl, Carburetors = carb), mean)
with(mtcars, aggregate(cbind("Miles/gallon" = mpg, hp),
data.frame(Cylinders = cyl, gear), mean))
with(mtcars, aggregate(data.frame("Miles/gallon" = mpg, hp),
data.frame(Cylinders = cyl, gear), mean))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | orville jackson |
Solution 3 |