Sort within groups on entire table

If I have a single column, I can sort that column within groups using the over method. For example,

import polars as pl

df = pl.DataFrame({'group': [2,2,1,1,2,2], 'value': [3,4,3,1,1,3]})
 
df.with_columns(pl.col('value').sort().over('group'))
# shape: (6, 2)
# ┌───────┬───────┐
# │ group ┆ value │
# │ ---   ┆ ---   │
# │ i64   ┆ i64   │
# ╞═══════╪═══════╡
# │ 2     ┆ 1     │
# ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ 2     ┆ 3     │
# ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ 1     ┆ 1     │
# ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ 1     ┆ 3     │
# ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ 2     ┆ 3     │
# ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ 2     ┆ 4     │
# └───────┴───────┘

What is nice about this operation is that it maintains the positions of the groups (e.g. group=1 still occupies rows 3 and 4; group=2 still occupies rows 1, 2, 5, and 6).
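To make the "positions are maintained" behavior concrete, here is a minimal plain-Python sketch of the same idea. It is only an illustration of the semantics, not how Polars implements `over`; the helper name `sort_within_groups` is made up for this example.

```python
from collections import defaultdict


def sort_within_groups(groups, values):
    """Sort `values` inside each group while every group keeps its row positions."""
    # Record which row positions belong to each group.
    positions = defaultdict(list)
    for i, g in enumerate(groups):
        positions[g].append(i)

    out = list(values)
    for idxs in positions.values():
        # Sort this group's values, then write them back into the group's own slots.
        for i, v in zip(idxs, sorted(values[j] for j in idxs)):
            out[i] = v
    return out


groups = [2, 2, 1, 1, 2, 2]
values = [3, 4, 3, 1, 1, 3]
print(sort_within_groups(groups, values))  # [1, 3, 1, 3, 3, 4]
```

The printed list matches the `value` column in the output above: group 2 still sits in rows 1, 2, 5, and 6, and group 1 in rows 3 and 4.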

But this only works to sort a single column. How do I sort an entire table like this? I tried the things below, but none of them worked:

import polars as pl

df = pl.DataFrame({'group': [2,2,1,1,2,2], 'value': [3,4,3,1,1,3], 'value2': [5,4,3,2,1,0]})

df.groupby('group').sort(['value', 'value2'])
# errors

df.sort([pl.col('value').over('group'), pl.col('value2').over('group')])
# does not sort within groups

# Looking for this:
# shape: (6, 3)
# ┌───────┬───────┬────────┐
# │ group ┆ value ┆ value2 │
# │ ---   ┆ ---   ┆ ---    │
# │ i64   ┆ i64   ┆ i64    │
# ╞═══════╪═══════╪════════╡
# │ 2     ┆ 1     ┆ 1      │
# ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
# │ 2     ┆ 3     ┆ 0      │
# ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
# │ 1     ┆ 1     ┆ 2      │
# ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
# │ 1     ┆ 3     ┆ 3      │
# ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
# │ 2     ┆ 3     ┆ 5      │
# ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
# │ 2     ┆ 4     ┆ 4      │
# └───────┴───────┴────────┘


Solution 1:[1]

To sort an entire table within groups, use pl.all().sort_by(sort_columns).over(group_columns).

import polars as pl

df = pl.DataFrame({
  'group': [2,2,1,1,2,2],
  'value': [3,4,3,1,1,3],
  'value2': [5,4,3,2,1,0],
})

df.select(pl.all().sort_by(['value','value2']).over('group'))
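As a cross-check on what this expression is expected to produce, the following plain-Python sketch reorders whole rows inside each group while leaving each group's row positions untouched. It does not use Polars and says nothing about Polars internals; `sort_rows_within_groups` is a hypothetical helper written only for this illustration.

```python
from collections import defaultdict


def sort_rows_within_groups(rows, group_key, sort_keys):
    """Reorder whole rows (dicts) inside each group; groups keep their row positions."""
    # Record which row positions belong to each group.
    positions = defaultdict(list)
    for i, row in enumerate(rows):
        positions[row[group_key]].append(i)

    out = list(rows)
    for idxs in positions.values():
        # Sort this group's rows by the sort keys, in order of priority.
        ordered = sorted(
            (rows[i] for i in idxs),
            key=lambda r: tuple(r[k] for k in sort_keys),
        )
        # Write the sorted rows back into the positions the group already occupied.
        for i, row in zip(idxs, ordered):
            out[i] = row
    return out


rows = [
    {'group': g, 'value': v, 'value2': w}
    for g, v, w in zip([2, 2, 1, 1, 2, 2], [3, 4, 3, 1, 1, 3], [5, 4, 3, 2, 1, 0])
]
result = sort_rows_within_groups(rows, 'group', ['value', 'value2'])
for r in result:
    print(r['group'], r['value'], r['value2'])
```

The printed rows are (2, 1, 1), (2, 3, 0), (1, 1, 2), (1, 3, 3), (2, 3, 5), (2, 4, 4), which is the table the question asks for.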

Solution 2:[2]

df.select(
    pl.all().sort_by(['value','value2']).over('group').sort_by(['group'])
)

This variant, which additionally sorts the result by the group column, may be helpful.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 drhagen
Solution 2 lemmingxuan