How to order frequencies from highest to lowest for a character variable
Suppose my data frame (df) contains only this single character variable:
race.ethnicity <- c("W", "C", "F", "F", "J")
df <- data.frame(race.ethnicity)
I want to create a frequency table for only the top 2 categories, similar to my example table (although that one shows the top 15 categories).
I am using gtsummary for my frequency table.
Here is my code:
library(gtsummary)

# summarize the subdata
table1 <- tbl_summary(df, missing = "always",
                      missing_text = "(Missing)",
                      percent = "cell",
                      type = all_dichotomous() ~ "categorical"
) %>%
  bold_labels()

# export to LaTeX (PDF output is not available in the package)
as_kable_extra(table1, format = "latex")
With my current code, the categories are not sorted by frequency, so any suggestions would be welcome.
If there are other ways to create a table like this besides gtsummary, please share them as well. I just want R to output the LaTeX code.
Solution 1:[1]
Use xtabs to make a frequency count, convert it to a data frame, sort it, and take the first two rows. No packages are needed.
dat <- as.data.frame(xtabs(~ race.ethnicity))
dat2 <- head(dat[order(-dat$Freq), ], 2)
dat2
giving:
  race.ethnicity Freq
2              F    2
1              C    1
To get LaTeX:
library(kableExtra)
kable(dat2, "latex")
giving:
\begin{tabular}{l|l|r}
\hline
& race.ethnicity & Freq\\
\hline
2 & F & 2\\
\hline
1 & C & 1\\
\hline
\end{tabular}
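In the LaTeX above, the first column holds the data frame's row names (2 and 1). A minimal sketch that drops them and switches to booktabs rules, assuming knitr is installed (kable() itself comes from knitr); the column labels and caption are illustrative choices, not part of the original answer. The resulting LaTeX needs \usepackage{booktabs} in the document preamble.

```r
library(knitr)

race.ethnicity <- c("W", "C", "F", "F", "J")
dat <- as.data.frame(xtabs(~ race.ethnicity))
dat2 <- head(dat[order(-dat$Freq), ], 2)

latex_tab <- kable(
  dat2,
  format = "latex",
  row.names = FALSE,                     # drop the "2" / "1" row-name column
  col.names = c("Race/ethnicity", "n"),  # illustrative column labels
  booktabs = TRUE,                       # \toprule/\midrule/\bottomrule instead of \hline
  caption = "Top 2 categories"           # illustrative caption
)
latex_tab
```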
Or write it as a pipeline:
library(dplyr)
library(kableExtra)
xtabs(~ race.ethnicity) %>%
as.data.frame %>%
arrange(desc(Freq)) %>%
slice(1:2) %>%
kable("latex")
Or, without dplyr:
library(kableExtra)
xtabs(~ race.ethnicity) %>%
{ .[order(- .)] } %>%
head(2) %>%
kable("latex")
Solution 2:[2]
We can use table (no packages needed):
tbl1 <- table(race.ethnicity)
stack(head(tbl1[order(-tbl1)], 2))
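Note that order(-tbl1) (like sort() in the other answers) keeps tied counts in their existing alphabetical order, so C is reported second even though J and W also occur once. If every category tied with the second-place count should be kept, one base-R sketch is below; cutoff, with_ties and top_with_ties are illustrative names.

```r
race.ethnicity <- c("W", "C", "F", "F", "J")
tbl1 <- table(race.ethnicity)

# Count of the 2nd most frequent category (1 for this data)
cutoff <- sort(as.vector(tbl1), decreasing = TRUE)[2]

# Keep every category at least as frequent as the cutoff,
# then order from highest to lowest frequency
with_ties <- tbl1[tbl1 >= cutoff]
top_with_ties <- stack(with_ties[order(-with_ties)])
top_with_ties
```

With this data all four categories survive, because C, J and W tie for second place.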
Solution 3:[3]
The easiest way to do this is a combination of the forcats and gtsummary packages. First, we'll use forcats::fct_infreq()
to re-order the variable, putting the most frequent levels first. Then we'll use forcats::fct_lump_n()
to keep the two most frequent levels; all others will be lumped together into an "Other" category. Lastly, we'll summarize the result with tbl_summary().
library(dplyr)      # select() and mutate()
library(gtsummary)
library(forcats)

# gt::pizzaplace requires the gt package to be installed
gt::pizzaplace %>%
  select(name) %>%
  mutate(
    name =
      # re-order with most frequent first
      fct_infreq(name) %>%
      # keep top two groups; all others go into an "Other" category
      fct_lump_n(n = 2)
  ) %>%
  tbl_summary()
You can convert the result to LaTeX with as_kable_extra(x, format = "latex")
or as_hux_table(x) %>% huxtable::to_latex().
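Applied to the question's five-value vector, the same approach needs one extra argument: fct_lump_n() ranks tied counts with ties.method = "min" by default, so C, J and W all share rank 2 and nothing gets lumped. A sketch assuming forcats >= 0.5.0 (where fct_lump_n() and its ties.method argument exist), plus gtsummary and kableExtra for the table and LaTeX steps, that breaks ties by level order instead:

```r
library(forcats)
library(gtsummary)

df <- data.frame(race.ethnicity = c("W", "C", "F", "F", "J"))

# fct_infreq(): levels become F, C, J, W (tied counts keep alphabetical order)
# fct_lump_n(..., ties.method = "first"): tied counts are ranked by level order,
#   so F and C are kept and J, W are lumped into "Other"
df$race.ethnicity <- fct_lump_n(fct_infreq(df$race.ethnicity),
                                n = 2, ties.method = "first")

levels(df$race.ethnicity)

top2_tbl <- tbl_summary(df)
as_kable_extra(top2_tbl, format = "latex")
```

The table still has an "Other" row (J and W), matching the design of the answer above.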
Solution 4:[4]
A gtsummary solution:
library(dplyr)
library(gtsummary)
race.ethnicity %>%
tbl_summary(
statistic = list(all_categorical() ~ "{n} / {N} ({p}%)")
)
Data (create this before running the code above):
race.ethnicity <- dplyr::tibble(variable = c("W", "C", "F", "F", "J"))
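As written, this lists the character categories in alphabetical order (C, F, J, W) rather than by frequency. tbl_summary() reports a factor's levels in level order, so one way to sort without forcats is to set the levels from a decreasing table() first. A sketch that keeps the column name variable from the data above; freq_order is an illustrative name:

```r
library(dplyr)
library(gtsummary)

race.ethnicity <- tibble(variable = c("W", "C", "F", "F", "J"))

# Level names sorted by decreasing count: F first, then the ties C, J, W
freq_order <- names(sort(table(race.ethnicity$variable), decreasing = TRUE))

race.ethnicity <- race.ethnicity %>%
  mutate(variable = factor(variable, levels = freq_order))

race.ethnicity %>%
  tbl_summary(
    statistic = list(all_categorical() ~ "{n} / {N} ({p}%)")
  )
```

To show only the top two rows, one option is to filter to head(freq_order, 2) before building the factor; note that the denominator {N} then shrinks to the kept rows.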
Solution 5:[5]
Create a table, sort it in decreasing order, take the head, and convert it to a data.frame.
table(race.ethnicity) |> sort(TRUE) |> head(2) |> data.frame() # native pipe, requires R >= 4.1.0
#data.frame(head(sort(table(race.ethnicity), TRUE), 2)) # traditional nested calls
#   race.ethnicity Freq
# 1              F    2
# 2              C    1
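If the table should also show percentages like the gtsummary output, compute them from the full table before calling head(), so each percentage is relative to all observations rather than only the top two. A minimal base-R sketch (R >= 4.1 for |>); counts, pct and the Percent column name are illustrative choices:

```r
race.ethnicity <- c("W", "C", "F", "F", "J")

counts <- sort(table(race.ethnicity), decreasing = TRUE)  # F, C, J, W
pct <- round(100 * prop.table(counts), 1)                  # share of all 5 values

top2 <- data.frame(counts, Percent = as.vector(pct)) |> head(2)
top2
```

The result can be passed to knitr::kable(top2, format = "latex") as in the first answer.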
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |
Solution 3 | Daniel D. Sjoberg |
Solution 4 | TarJae |
Solution 5 | GKi |