'Add a column to pandas dataframe containing the proportions for a particular column, based on grouping column

I have some data for which I want to do the following:

  • group by a set of columns G
  • for each grouping find the proportion of a particular column within the group
  • return the full data with the additional proportion column

I'm not sure what a decent approach to this is though, this is something that I tried:

data = pd.DataFrame(
    {
        "x": [1, 2, 3, 4] + [4, 5, 6, 7],
        "y": ["a"] * 4 + ["b"] * 4,
    }
)

gives

   x  y
0  1  a
1  2  a
2  3  a
3  4  a
4  4  b
5  5  b
6  6  b
7  7  b

then

pd.concat(
    [
        data,
        data.groupby("y")
        .apply(lambda df: df["x"].div(df["x"].sum()))
        .reset_index()
        .rename(columns={"x": "proportion"})
        .drop(["y", "level_1"], axis=1),
    ],
    axis=1,
)

gives

   x  y  proportion
0  1  a    0.100000
1  2  a    0.200000
2  3  a    0.300000
3  4  a    0.400000
4  4  b    0.181818
5  5  b    0.227273
6  6  b    0.272727
7  7  b    0.318182


Solution 1:[1]

I think you can do it more easily with:

data["proportion"] = data["x"] / data.groupby("y")["x"].transform("sum")
print(data.to_markdown())

Prints:

x y proportion
0 1 a 0.1
1 2 a 0.2
2 3 a 0.3
3 4 a 0.4
4 4 b 0.181818
5 5 b 0.227273
6 6 b 0.272727
7 7 b 0.318182

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Andrej Kesely