'Oracle Cube Replica in Python Pandas
Oracle has a cube function for groupby clause which takes 2 or more columns and groups the results set in all possible combinations of columns passed to the cube function.
SELECT Time, Region, Department,
sum(Profit) AS Profit FROM sales
GROUP BY CUBE (Time, Region, Dept)
https://docs.oracle.com/cd/F49540_01/DOC/server.815/a68003/rollup_c.htm#32311
Is there a similar function in python pandas or any other python framework which takes in list of pandas columns and produce a new data frame which has results of all possible group by.
I am coding my own version of cube in python. Good to know if already exists one.
Solution 1:[1]
In Pandas you should automatize a function that concatenates horizontally the results of the following groupby
pd.concat([
df.groupby("Time")['Profit].sum().reset_index(),
df.groupby("Profit")['Profit].sum().reset_index(),
df.groupby("Region")['Profit].sum().reset_index(),
df.groupby(["Time", "Region"])['Profit].sum().reset_index(),
df.groupby(["Time", "Dept"])['Profit].sum().reset_index(),
df.groupby(["Region", "Dept"])['Profit].sum().reset_index(),
df.groupby(["Time", "Region", "Dept"])['Profit].sum().reset_index(),
],
axis=0
)
To do this I suggest using itertools.combinations
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | linello |