'Add a column to pandas dataframe containing the proportions for a particular column, based on grouping column

I have some data for which I want to do the following:

group by a set of columns G
for each grouping find the proportion of a particular column within the group
return the full data with the additional proportion column

I'm not sure what a decent approach to this is though, this is something that I tried:

data = pd.DataFrame(
    {
        "x": [1, 2, 3, 4] + [4, 5, 6, 7],
        "y": ["a"] * 4 + ["b"] * 4,
    }
)

gives

then

pd.concat(
    [
        data,
        data.groupby("y")
        .apply(lambda df: df["x"].div(df["x"].sum()))
        .reset_index()
        .rename(columns={"x": "proportion"})
        .drop(["y", "level_1"], axis=1),
    ],
    axis=1,
)

gives

   x  y  proportion
0  1  a    0.100000
1  2  a    0.200000
2  3  a    0.300000
3  4  a    0.400000
4  4  b    0.181818
5  5  b    0.227273
6  6  b    0.272727
7  7  b    0.318182

Solution 1:^[1]

I think you can do it more easily with:

data["proportion"] = data["x"] / data.groupby("y")["x"].transform("sum")
print(data.to_markdown())

Prints:

	x	y	proportion
0	1	a	0.1
1	2	a	0.2
2	3	a	0.3
3	4	a	0.4
4	4	b	0.181818
5	5	b	0.227273
6	6	b	0.272727
7	7	b	0.318182

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Andrej Kesely

'Add a column to pandas dataframe containing the proportions for a particular column, based on grouping column

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]