'Color pandas DataFrame value if larger than 1.5*median(column)

Let's say I have a DataFrame that looks like this:

df= pd.DataFrame({'A': [1,-2,0,-1,17],
                  'B': [11,-23,1,-3,132],
                  'C': [121,2029,-243,17,-45]}
                )

I use a jupyter notebook and want to colour with df.style the values in each column only if they exceed a value X, where X=1.5*median(column). So, I would like to have something like this:

enter image description here

Preferably, I would like to have some gradient (df.style.background_gradient) to the colouring of the values, e.g. in column A the entry 17 to be darker than 1, because 17 is further away from the median of the column. But the gradient is optional.

How can I do this?



Solution 1:[1]

This answer uses pandas 1.4.2, the Styler can function differently depending on version.

The simple case is fairly straightforward. Create a function which accepts a Series as input and then use np.where to conditionally build styles:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, -2, 0, -1, 17],
    'B': [11, -23, 1, -3, 132],
    'C': [121, 2029, -243, 17, -45]
})


def simple_median_style(
        s: pd.Series, true_css: str, false_css: str = ''
) -> np.ndarray:
    return np.where(s > 1.5 * s.median(), true_css, false_css)


df.style.apply(simple_median_style, true_css='background-color: green')

Simply styled table with green background for values above 1.5 * median

Simple support for separate styles for values > 1.5 * median and less than by configuring the true_color and false_color values. Naturally, more functionality can be added depending on specific need.


The gradient piece is a bit more involved. We can use get_cmap to get a Colormap from its name. Then we can create a CenteredNorm to form a gradient around a specific value. In this case the median value (1.5 * median) for each column. Using these two together we can create a gradient over the entire column.

Here I've used a simple list comprehension to conditionally apply the gradient or some false style ('' no styles).

from typing import List

import pandas as pd
from matplotlib.cm import get_cmap
from matplotlib.colors import Colormap, CenteredNorm, rgb2hex

df = pd.DataFrame({
    'A': [1, -2, 0, -1, 17],
    'B': [11, -23, 1, -3, 132],
    'C': [121, 2029, -243, 17, -45]
})


def centered_gradient(
        s: pd.Series, cmap: Colormap, false_css: str = ''
) -> List[str]:
    # Find center point
    center = 1.5 * s.median()
    # Create normaliser centered on median
    norm = CenteredNorm(vcenter=center)
    # s = s.where(s > center, center)
    return [
        # Conditionally apply gradient to values above center only
        f'background-color: {rgb2hex(rgba)}' if row > center else false_css
        for row, rgba in zip(s, cmap(norm(s)))
    ]


df.style.apply(centered_gradient, cmap=get_cmap('Greens'))

Styled table with gradient centered on median value, but styles conditionally applied

Note: this approach considers all values when normalising so the gradient will be affected by all values in the column.


In case the more general case is needed, an unconditional gradient centered on the median could be built with (the rest is the same as the complete example above):

def centered_gradient(
        s: pd.Series, cmap: Colormap, false_css: str = ''
) -> List[str]:
    # Find center point
    center = 1.5 * s.median()
    # Create normaliser centered on median
    norm = CenteredNorm(vcenter=center)
    # Convert rgba value arrays to hex
    return [
        f'background-color: {rgb2hex(rgba)}' for rgba in cmap(norm(s))
    ]

Styled table with gradient centered on median value

Solution 2:[2]

Whilst @HenryEcker solution is well explained and detailed, a very simple approach would be to directly tackle your problem with style chaining, something like:

styler = df.style
for col in df.columns:
    mask = (df[col] > df[col].median() * 1.5)
    styler.background_gradient(subset=(mask, col), cmap="Blues", vmin=-100)
styler

enter image description here

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Attack68