How to order frequencies from highest to lowest for a character variable

Suppose my dataframe (df) only includes this single character variable:

race.ethnicity<-c("W", "C", "F", "F", "J")

I want to create a frequency table for the top 2 categories, like the table below (although that one includes the top 15 categories).

[Image: example frequency table showing the top 15 categories]

I am using gtsummary for my frequency table.

Here is the code:

library(gtsummary)

# summarize the subdata
table1 <- tbl_summary(df, missing = "always",
                      missing_text = "(Missing)",
                      percent = "cell",
                      type = all_dichotomous() ~ "categorical"
) %>%
  bold_labels()
# export to LaTeX (PDF output is not available in the package)
as_kable_extra(table1, format = "latex")

With my current code, the output is not sorted by frequency. Any suggestions would be welcome.

If there are other ways to create a table like the one above besides gtsummary, please share them as well. I just need R to produce the LaTeX code too.



Solution 1:[1]

Use xtabs to make a frequency count, convert it to a data frame, sort it by descending frequency, and take the first two rows. No packages are needed for this step.

dat <- as.data.frame(xtabs(~ race.ethnicity))
dat2 <- head(dat[order(-dat$Freq), ], 2)
dat2

giving:

  race.ethnicity Freq
2              F    2
1              C    1

To get LaTeX output:

library(kableExtra)
kable(dat2, "latex")

giving:

\begin{tabular}{l|l|r}
\hline
  & race.ethnicity & Freq\\
\hline
2 & F & 2\\
\hline
1 & C & 1\\
\hline
\end{tabular}

or write it as the following pipeline:

 library(dplyr)
 library(kableExtra)

 xtabs(~ race.ethnicity) %>%
   as.data.frame %>%
   arrange(desc(Freq)) %>%
   slice(1:2) %>%
   kable("latex")

or

 library(kableExtra)

 xtabs(~ race.ethnicity) %>%
   { .[order(- .)] } %>%
   head(2) %>%
   kable("latex")
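The LaTeX output shown earlier in this answer carries the data frame's row names (2 and 1) as an unlabeled first column. A minimal sketch of suppressing them, assuming the vector from the question and calling knitr::kable directly with its row.names argument:

```r
library(knitr)

# data from the question
race.ethnicity <- c("W", "C", "F", "F", "J")

# frequency count as a data frame, sorted by descending count, top two rows
dat <- as.data.frame(xtabs(~ race.ethnicity))
dat2 <- head(dat[order(-dat$Freq), ], 2)

# row.names = FALSE drops the leftover row indices from the LaTeX table
latex <- kable(dat2, format = "latex", row.names = FALSE)
latex
```

Resetting the row names with rownames(dat2) <- NULL before calling kable should have a similar effect.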

Solution 2:[2]

We can use table (no packages are needed):

tbl1 <- table(race.ethnicity)
stack(head(tbl1[order(-tbl1)], 2))

Solution 3:[3]

The easiest way to do this is a combination of the forcats and gtsummary packages. First, we'll use forcats::fct_infreq() to re-order the variable, putting the most frequent levels first. Then we'll use forcats::fct_lump_n() to keep the two most frequent levels; all others will be lumped together in an Other category. Lastly, we pass the result to tbl_summary().

library(gtsummary)
library(forcats)

gt::pizzaplace %>%
  select(name) %>%
  mutate(
    name =
      # re-order with most frequent first
      fct_infreq(name) %>% 
      # keep the top two groups; lump all others into an Other category
      fct_lump_n(n = 2)) %>%
  tbl_summary() 


You can convert the table to LaTeX with as_kable_extra(x, format = "latex") or as_hux_table(x) %>% huxtable::to_latex().
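To make the reorder-then-lump idea concrete on the question's data, here is a rough base R sketch of the same two steps. It is an approximation written for illustration, not the forcats implementation: the names counts, top and lumped are made up here, and ties among equally frequent levels are broken by alphabetical level order, which may differ from how forcats treats ties.

```r
# data from the question
race.ethnicity <- c("W", "C", "F", "F", "J")

# count each level, then take the names of the two most frequent ones
counts <- table(race.ethnicity)
top <- names(counts)[order(-counts)][1:2]

# replace every value outside the top two with "Other"
lumped <- ifelse(race.ethnicity %in% top, race.ethnicity, "Other")

# order the levels: most frequent first, Other last
lumped <- factor(lumped, levels = c(top, "Other"))

table(lumped)
```

The resulting factor can then be placed in a data frame and summarized in the same way as the lumped column in the answer above.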

Solution 4:[4]

A gtsummary solution:

library(dplyr)
library(gtsummary)

race.ethnicity %>% 
  tbl_summary(
    statistic = list(all_categorical() ~ "{n} / {N} ({p}%)")
  ) 

data:

race.ethnicity <- tibble(variable = c("W", "C", "F", "F", "J"))
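The call above does not request any particular ordering of the levels. As a hedged sketch, tbl_summary() also takes a sort argument, and setting it to "frequency" for categorical variables should list the levels from most to least frequent. Note that this only orders the rows; it does not restrict the table to the top two categories, so it would need to be combined with a lumping or filtering step for that.

```r
library(dplyr)
library(gtsummary)

# same data as in this answer
race.ethnicity <- tibble(variable = c("W", "C", "F", "F", "J"))

tbl <- race.ethnicity %>%
  tbl_summary(
    # list categorical levels in descending order of frequency
    sort = all_categorical() ~ "frequency",
    statistic = list(all_categorical() ~ "{n} / {N} ({p}%)")
  )

# LaTeX export, as requested in the question (needs kableExtra installed)
as_kable_extra(tbl, format = "latex")
```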


Solution 5:[5]

Create a table, sort it decreasingly, get the head and make a 'data.frame'.

table(race.ethnicity) |> sort(TRUE) |> head(2) |> data.frame()  #Using pipes (since 4.1.0)
#data.frame(head(sort(table(race.ethnicity), TRUE), 2))         #Traditional
#  race.ethnicity Freq
#1              F    2
#2              C    1
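The example table in the question presumably reports percentages as well as counts. A minimal sketch of adding a percentage column to the result above, assuming percentages are taken over all non-missing observations; the names tab, top2 and Percent are chosen here for illustration:

```r
# data from the question
race.ethnicity <- c("W", "C", "F", "F", "J")

# full frequency table, needed for the denominator
tab <- table(race.ethnicity)

# two most frequent categories as a data frame
top2 <- data.frame(head(sort(tab, decreasing = TRUE), 2))

# share of all non-missing observations, in percent
top2$Percent <- round(100 * top2$Freq / sum(tab), 1)
top2
```

The denominator is the full table total rather than the total of the two rows kept, so the percentages describe each category's share of the whole variable.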

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 Daniel D. Sjoberg
Solution 4 TarJae
Solution 5 GKi