DataFrame append to DataFrame row by row and reset if condition is matched
I have a DataFrame that I want to slice into many DataFrames by adding rows one by one until the sum of the Score column
of the current slice is greater than 50,000. Once that condition is met, I want a new slice to begin.
Here is an example of what this might look like:
Solution 1:[1]
Sum Score cumulatively, floor divide it by 50,000, and shift it up one cell (since you want each group to be > 50,000 and not < 50,000).
import pandas as pd
import numpy as np
# Generate a DataFrame with a random Score column
df = pd.DataFrame({'Score': np.random.randint(1, 60000, 15)})
# Create a new column holding the cumulative sum of Score,
# floor divided by 50000
df['groups'] = df['Score'].cumsum() // 50000
# Shift the values up one row and fill the gap at the bottom with the
# maximum value so that the last rows are included in the last DataFrame slice
df['groups'] = df['groups'].shift(-1, fill_value=df['groups'].max())
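To make the intermediate values easier to follow, here is a small sketch of the same three steps on fixed numbers instead of random ones. The six scores and the names demo, raw_labels and shifted_labels are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Illustrative scores, chosen by hand (not from the original question)
demo = pd.DataFrame({"Score": [20000, 40000, 10000, 45000, 30000, 25000]})

# Running total: 20000, 60000, 70000, 115000, 145000, 170000
running_total = demo["Score"].cumsum()

# Floor division by the threshold: 0, 1, 1, 2, 2, 3
raw_labels = running_total // 50000

# Shift up one row, filling the last row with the maximum label: 1, 1, 2, 2, 3, 3
shifted_labels = raw_labels.shift(-1, fill_value=raw_labels.max())

# Sum of Score inside each labelled slice
slice_sums = demo["Score"].groupby(shifted_labels).sum()
```

For these particular numbers, the three slices sum to 60,000, 55,000 and 55,000.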
Then you can use pandas.DataFrame.groupby
in a list comprehension to return a list of split DataFrames.
df_list = [df_slice for _, df_slice in df.groupby('groups')]
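One caveat: the cumulative sum above is never reset, so the slice boundaries follow multiples of 50,000 in the overall total rather than each slice's own sum. For scores of 49000, 2000, 49000, 49000, for example, the shifted labels are 1, 2, 2, 2, and the first slice sums to only 49,000. If the restart has to be exact, a plain loop that resets the running total is one option. The following is a minimal sketch, assuming the column is called Score. The helper name split_by_running_total and its parameters are hypothetical, not a pandas API.

```python
import numpy as np
import pandas as pd


def split_by_running_total(frame, column="Score", threshold=50_000):
    """Hypothetical helper: start a new slice once the current slice's
    own running total exceeds the threshold."""
    labels = []
    group_id = 0
    running_total = 0
    for value in frame[column]:
        # The row that pushes the total over the threshold stays in the
        # current slice; the next row opens a new one.
        labels.append(group_id)
        running_total += value
        if running_total > threshold:
            group_id += 1
            running_total = 0
    # Group by an array of labels with one entry per row
    return [piece for _, piece in frame.groupby(np.asarray(labels))]


# Illustrative data, chosen by hand
demo = pd.DataFrame({"Score": [49000, 2000, 49000, 49000]})
slices = split_by_running_total(demo)
slice_sums = [int(piece["Score"].sum()) for piece in slices]
```

With this approach the last slice can still end below the threshold if the data run out before the total is reached. The loop is slower than the vectorised version on large frames, so it is a trade of speed for exact boundaries.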
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |