How can I best use dplyr to subset data and create relative frequency tables?
I'm using the iris data set to learn how to use dplyr, and am trying to create a relative frequency table that looks like this:
Species (rows) / Petal.Width (columns) | .1 | .2 | .3 | .4 | .5 | .6 | 1 | 1.1 | 1.2 | 1.3 | 1.4 | 1.5 | 1.6 | 1.7 | 1.8 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
setosa | 0.10 | 0.58 | 0.14 | 0.14 | 0.02 | 0.02 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
versicolor | 0 | 0 | 0 | 0 | 0 | 0 | 0.14 | 0.06 | 0.10 | 0.26 | 0.14 | 0.02 | 0.20 | 0.04 | 0.06 |
I'm struggling to group the observations by species and then produce relative frequencies on a species-by-species basis.
I'm guessing it'll have to be something using group_by, mutate, and count, but the closest thing I could find online was this:
my_data %>%
  group_by(Petal.Width, Species) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  mutate(total = sum(n), rel.freq = n / total)
This is still not quite what I'm looking for, because it divides each count by the total number of observations rather than by the number of observations per species.
Any help is appreciated greatly!
Solution 1:[1]
Something like this?
Not sure about the "wide" format though; I'd be inclined to keep it as long (omit the pivot_wider step).
library(dplyr)
library(tidyr)

iris %>%
  count(Species, Petal.Width) %>%   # n = observations per Species / Petal.Width pair
  group_by(Species) %>%
  mutate(p = n / sum(n)) %>%        # divide by the per-species total
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = "Petal.Width", values_from = "p")
Result:
Species `0.1` `0.2` `0.3` `0.4` `0.5` `0.6` `1` `1.1` `1.2` `1.3` `1.4` `1.5` `1.6` `1.7` `1.8` `1.9` `2` `2.1` `2.2` `2.3` `2.4` `2.5`
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 setosa 0.1 0.58 0.14 0.14 0.02 0.02 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2 versicolor NA NA NA NA NA NA 0.14 0.06 0.1 0.26 0.14 0.2 0.06 0.02 0.02 NA NA NA NA NA NA NA
3 virginica NA NA NA NA NA NA NA NA NA NA 0.02 0.04 0.02 0.02 0.22 0.1 0.12 0.12 0.06 0.16 0.06 0.06
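The table in the question shows 0 where a species has no observations at a given width, while the result above shows NA. One possible way to close that gap, sketched here as a variation and not as part of the original answer, is the `values_fill` argument of `pivot_wider`. This assumes tidyr 1.1.0 or later, where a scalar fill value is accepted; the names `rel_freq_long` and `rel_freq_wide` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Long format: one row per observed Species / Petal.Width pair
rel_freq_long <- iris %>%
  count(Species, Petal.Width) %>%
  group_by(Species) %>%
  mutate(p = n / sum(n)) %>%   # sum(n) is the per-species total
  ungroup() %>%
  select(-n)

# Wide format, with 0 instead of NA for widths a species never takes
# (scalar values_fill assumes tidyr >= 1.1.0)
rel_freq_wide <- rel_freq_long %>%
  pivot_wider(names_from = Petal.Width, values_from = p, values_fill = 0)

# Sanity check: the proportions within each species should sum to 1
rel_freq_long %>%
  group_by(Species) %>%
  summarise(total = sum(p))
```

Keeping `rel_freq_long` as its own object also gives the long version mentioned above, which is usually easier to plot or join than the wide one.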
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | neilfws |