DataFrame append to DataFrame row by row and reset if condition is matched

I have a DataFrame that I want to slice into several DataFrames by adding rows one at a time until the sum of the Score column in the current slice is greater than 50,000. Once that condition is met, a new slice should begin.

Here is an example of what this might look like:



Solution 1:[1]

Take the cumulative sum of Score, floor divide it by 50,000, and shift the result down one row, so that the row that pushes the running total past a multiple of 50,000 stays in the slice it completes (you want each group to be > 50,000, not < 50,000).

import pandas as pd
import numpy as np

# Generate a DataFrame with random data in a Score column
df = pd.DataFrame({'Score': np.random.randint(1, 60000, 15)})

# New column: cumulative sum of Score, with each value
# floor divided by 50000
df['groups'] = df['Score'].cumsum() // 50000

# Shift the labels down one row so that each threshold-crossing row is
# grouped with the rows before it; the first row has no predecessor,
# so it is filled with label 0
df['groups'] = df['groups'].shift(1, fill_value=0)

Then, as per this answer, you can use pandas.DataFrame.groupby in a list comprehension to return a list of the split DataFrames.

df_list = [df_slice for _, df_slice in df.groupby('groups')]
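
One thing to keep in mind: the cumulative sum runs over the whole column and never resets, so the labels follow multiples of 50,000 in the overall running total. A slice's own sum can therefore fall short of 50,000 when an earlier slice overshot the threshold. If the sum really has to restart with each slice, as the question title suggests, an explicit loop is one option. The following is only a minimal sketch: the function name split_by_running_sum and its column and threshold parameters are made up for illustration, and the sample data is arbitrary.

```python
import pandas as pd


def split_by_running_sum(df, column="Score", threshold=50_000):
    """Hypothetical helper: cut df into consecutive slices, closing a slice
    at the first row that pushes that slice's own sum above threshold."""
    slices = []
    start = 0      # position of the first row of the current slice
    running = 0    # sum of `column` within the current slice only
    for pos, value in enumerate(df[column].to_numpy()):
        running += value
        if running > threshold:
            # The crossing row is included in the slice it completes
            slices.append(df.iloc[start:pos + 1])
            start = pos + 1
            running = 0  # reset for the next slice
    if start < len(df):
        # Leftover rows that never reached the threshold
        slices.append(df.iloc[start:])
    return slices


# Arbitrary sample data, chosen so that the first slice overshoots
df = pd.DataFrame({"Score": [49_999, 49_999, 2, 49_999, 10_000]})
slices = split_by_running_sum(df)
sums = [int(s["Score"].sum()) for s in slices]
print(sums)
```

With this sample data the slice sums come out as 99,998, 50,001 and 10,000; the last slice holds the leftover rows that never reached the threshold. The loop is slower than the vectorized version on large frames, which is the trade-off for restarting the sum at each slice.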

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1