How can I best use dplyr to subset data and create relative frequency tables?

I'm using the iris data set to learn how to use dplyr, and am trying to create a relative frequency table that looks like this:

Petal.Width    .1   .2   .3   .4   .5   .6    1  1.1  1.2  1.3  1.4  1.5  1.6  1.7  1.8
Species
setosa       0.10 0.58 0.14 0.14 0.02 0.02    0    0    0    0    0    0    0    0    0
versicolor      0    0    0    0    0    0 0.14 0.06 0.10 0.26 0.14 0.02 0.20 0.04 0.06

I'm struggling to group the observations by species and then compute relative frequencies within each species.

I'm guessing it'll have to be something using group_by, mutate, and count, but the closest thing I could find online was this:

my_data %>% 
    group_by(Petal.Width,Species) %>% 
    summarise(n = n()) %>%
    ungroup %>% 
    mutate(total = sum(n), rel.freq = n / total)

This is still not quite what I'm looking for, because it divides each count by the total number of observations rather than by the number of observations in that species.
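A quick check makes the problem visible. This is only a sketch: it assumes my_data is the built-in iris data set, and the name check is just illustrative. If the frequencies were computed per species, each species' shares would add up to 1; here they add up to 50/150 instead.

```r
library(dplyr)

# Assumption: my_data stands for the built-in iris data set
my_data <- iris

# The attempt from above: counts divided by the overall total
overall <- my_data %>%
  group_by(Petal.Width, Species) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  mutate(total = sum(n), rel.freq = n / total)

# Sum the relative frequencies within each species
check <- overall %>%
  group_by(Species) %>%
  summarise(share = sum(rel.freq))

check
```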

Any help is appreciated greatly!



Solution 1:[1]

Something like this?

I'm not sure the "wide" format is the best choice, though; I'd be inclined to keep the result in long format (omit the pivot_wider step).

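library(dplyr)
library(tidyr)

iris %>% 
  # count observations for each Species / Petal.Width combination
  count(Species, Petal.Width) %>% 
  # divide each count by the total for its species
  group_by(Species) %>% 
  mutate(p = n / sum(n)) %>% 
  ungroup() %>% 
  # drop the raw counts, then spread the Petal.Width values into columns
  select(-n) %>% 
  pivot_wider(names_from = "Petal.Width", values_from = "p")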

Result:

  Species    `0.1` `0.2` `0.3` `0.4` `0.5` `0.6`   `1` `1.1` `1.2` `1.3` `1.4` `1.5` `1.6` `1.7` `1.8` `1.9`   `2` `2.1` `2.2` `2.3` `2.4` `2.5`
  <fct>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 setosa       0.1  0.58  0.14  0.14  0.02  0.02 NA    NA     NA   NA    NA    NA    NA    NA    NA     NA   NA    NA    NA    NA    NA    NA   
2 versicolor  NA   NA    NA    NA    NA    NA     0.14  0.06   0.1  0.26  0.14  0.2   0.06  0.02  0.02  NA   NA    NA    NA    NA    NA    NA   
3 virginica   NA   NA    NA    NA    NA    NA    NA    NA     NA   NA     0.02  0.04  0.02  0.02  0.22   0.1  0.12  0.12  0.06  0.16  0.06  0.06
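The table in the question shows 0 rather than NA for combinations that never occur. One possible way to get that, sketched here as a variation on the pipeline above rather than as part of the original answer, is the values_fill argument of pivot_wider. The name rel_freq_wide is only illustrative.

```r
library(dplyr)
library(tidyr)

rel_freq_wide <- iris %>%
  count(Species, Petal.Width) %>%
  group_by(Species) %>%
  mutate(p = n / sum(n)) %>%
  ungroup() %>%
  select(-n) %>%
  # values_fill supplies 0 for Species / Petal.Width combinations with no rows
  pivot_wider(
    names_from = "Petal.Width",
    values_from = "p",
    values_fill = list(p = 0)
  )

# Sanity check: each species' row should sum to 1
rowSums(select(rel_freq_wide, -Species))
```

Filling with 0 seems reasonable here because a missing combination really does mean a relative frequency of zero, but keeping NA may be preferable if "not observed" should stay distinguishable.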

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    neilfws