'How to do sum and filter in rbind function

I am using an R package which extracts data from tables in a database based on the flag for each table. If the flag is 1, extract data from that table. If the flag is 0, don't extract data.

The extracted data is stored in the person_list_df list as shown below

list(structure(list(person_id = 21:25, count = 2:6), class = "data.frame", row.names = c(NA, 
    -5L)), structure(list(person_id = 24:28, count = 3:7), class = "data.frame", row.names = c(NA, 
    -5L)))  

Code is below

 person_list_df = list()
 casesANDcontrols_df = list()
 list1 <- data.frame("person_id" = 21:25, "count" = 2:6)
 list2 <- data.frame("person_id" = 24:28, "count" = 3:7)
 person_list_df <- list(list1, list2)
 

What I would like to do is

a) Combine/Merge all this and get the sum of counts (for each person)

b) Filter and store only persons who have count > 4

I tried the below but it doesn't work

casesANDcontrols_df[[1]] <- do.call(rbind, persons_list_df) # how to sum and apply filter here?

I would like to combine/merge all of them and sum their counts and finally select persons who have count > 4

I expect my output to be like as shown below

casesANDcontrols_df[[1]]

        person_id
    1   24        #COUNT IS 8
    2   25        #COUNT IS 10
    3   26        #COUNT IS 5
    4   27        #COUNT IS 6
    5   28        #COUNT IS 7  

The structure of caseANDcontrols_df should be as shown below

enter image description here



Solution 1:[1]

In base R, you can combine the list into one dataframe with do.call +rbind, take sum for each person_id and keep rows where count is greater than 4.

subset(aggregate(count ~ person_id, do.call(rbind, person_list_df), sum), count > 4)

Solution 2:[2]

You can use the rbindlist funciton from data.table, which output a data.table, so you can chain:

bindeddf <- rbindlist(persons_list_df)[,.(count = sum(count)),by = person_id]
bindeddf[count>4]

Solution 3:[3]

Using tidyverse

library(dplyr)
bind_rows(persons_list_df) %>%
     group_by(person_id) %>%
     summarise(count = sum(count)) %>%
     filter(count > 4)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 rwb
Solution 2
Solution 3 akrun