Polars: Setting categorical column to a specific value while keeping categorical type
Can somebody help me with the preferred way to set a categorical value for some rows of a polars data frame (based on a condition)?
So far I have come up with a solution that works by splitting the original data frame into two parts (condition==True and condition==False). I set the categorical value on the first part and then concatenate the two parts again.
┌────────┬──────┐
│ column ┆ more │
│ --- ┆ --- │
│ cat ┆ i32 │
╞════════╪══════╡
│ a ┆ 1 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ b ┆ 5 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ e ┆ 9 │ <- I want to set column to 'b' for all rows where it is 'e'
└────────┴──────┘
import polars as pl
df = pl.DataFrame(data={'column': ['a', 'b', 'e'], 'more': [1, 5, 9]}, columns=[('column', pl.Categorical), ('more', pl.Int32)])
print(df)
b_cat_value = df.filter(pl.col('column')=='b')['column'].unique()
df_e_replaced_with_b = df.filter(pl.col('column')=='e').with_column(b_cat_value.alias('column'))
df_no_e = df.filter(pl.col('column')!='e')
print(pl.concat([df_no_e, df_e_replaced_with_b]))
Output is as expected:
┌────────┬──────┐
│ column ┆ more │
│ --- ┆ --- │
│ cat ┆ i32 │
╞════════╪══════╡
│ a ┆ 1 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ b ┆ 5 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ b ┆ 9 │ <- column has been set to 'b'
└────────┴──────┘
Is there something more straightforward/canonical to get b_cat_value, like something similar to df['column'].dtype['b']?
And how would I use this in a conditional expression without splitting the data frame apart as in the above example? Something along the lines of...
df.with_column(
pl.when(pl.col('column') == 'e').then(b_cat_value).otherwise(pl.col('column'))
)
Solution 1:[1]
As of polars>=0.13.33, you can simply set a categorical value with a string, and the Categorical dtype will be maintained.
So in this case:
df.with_column(
pl.when(pl.col("column") == "e").then("b").otherwise(pl.col("column"))
)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | ritchie46 |