How to plot the distribution of missing values in a dataframe

I have a data frame with hundreds of columns and would like to investigate the proportion of missing values by plotting a graph.

I'm able to get the proportions using the code below:

Code:

missing_data_in_df = pd.DataFrame(
    {
        'NaN_Counts': df.isna().sum(),
        'NaN_Proportions(%)': (df.isna().sum() / df.shape[0]) * 100,
    }
).sort_values(by='NaN_Counts', ascending=False)
missing_data_in_df.head()

Output:

        NaN_Counts  NaN_Proportions(%)
Col1    889061      99.757636
Col2    685843      76.955435
Col3    584612      65.596749
Col4    476524      53.468668
Col5    392318      44.020282
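
For reference, here is a minimal self-contained sketch of the same summary. It uses a hypothetical toy frame (`toy`, with made-up columns `a`, `b`, `c`) in place of the real `df`, and it takes the proportion from `isna().mean()`, which should give the same result as dividing the counts by the number of rows:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataframe
toy = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan, np.nan],
        "b": [1.0, 2.0, np.nan, 4.0],
        "c": [1.0, 2.0, 3.0, 4.0],
    }
)

# Boolean mask of missing cells, computed once and reused
na_mask = toy.isna()

summary = pd.DataFrame(
    {
        "NaN_Counts": na_mask.sum(),                 # missing cells per column
        "NaN_Proportions(%)": na_mask.mean() * 100,  # share of rows missing, in percent
    }
).sort_values(by="NaN_Counts", ascending=False)

print(summary)
```

The column names of the original frame end up in the index of `summary`, which matters for the plotting step.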

Now, when I try to visualize this with a histogram:

Code:

missing_data_in_df.hist()

I get the following output:

[Image: output of missing_data_in_df.hist()]

Is there any way to get the feature names of the dataframe on the x-axis?



Solution 1:[1]

With your dataframe, rebuilt here with the feature names as a regular column:

import pandas as pd

df = pd.DataFrame(
    {
        "features": ["Col1", "Col2", "Col3", "Col4", "Col5"],
        "NaN_Counts": [889061, 685843, 584612, 476524, 392318],
        "NaN_Proportions(%)": [99.757636, 76.955435, 65.596749, 53.468668, 44.020282],
    }
)
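
If the summary from the question is already in memory, retyping the values should not be necessary. A minimal sketch, assuming the feature names sit in the index as in the question's output (`plot_df` is a name introduced here for illustration only):

```python
import pandas as pd

# Summary as produced in the question: feature names live in the index
missing_data_in_df = pd.DataFrame(
    {
        "NaN_Counts": [889061, 685843, 584612, 476524, 392318],
        "NaN_Proportions(%)": [99.757636, 76.955435, 65.596749, 53.468668, 44.020282],
    },
    index=["Col1", "Col2", "Col3", "Col4", "Col5"],
)

# Name the index "features", then move it into a regular column
plot_df = missing_data_in_df.rename_axis("features").reset_index()

print(plot_df.columns.tolist())
```

The resulting frame has the same three columns as the one built by hand above.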

Here is one way to do it (subplots=True draws one bar chart per numeric column):

df.plot.bar(x="features", subplots=True)

Output:

[Image: output of df.plot.bar(x="features", subplots=True)]
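
As a variation, a bar plot can also be drawn straight from the question's summary, because pandas uses the index for the x-axis labels when no `x` is given. The sketch below is one possible approach, not the answer's method. `top_n` is a hypothetical cutoff added because hundreds of bars would likely be unreadable, and the `Agg` backend is chosen only so the example runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Summary as produced in the question: feature names live in the index
missing_data_in_df = pd.DataFrame(
    {
        "NaN_Counts": [889061, 685843, 584612, 476524, 392318],
        "NaN_Proportions(%)": [99.757636, 76.955435, 65.596749, 53.468668, 44.020282],
    },
    index=["Col1", "Col2", "Col3", "Col4", "Col5"],
)

top_n = 3  # hypothetical cutoff: plot only the columns with the most missing values
top = missing_data_in_df.sort_values(by="NaN_Counts", ascending=False).head(top_n)

# Plotting a single column uses the index (feature names) for the x tick labels
ax = top["NaN_Proportions(%)"].plot.bar()
ax.set_xlabel("features")
ax.set_ylabel("NaN proportion (%)")
plt.tight_layout()
```

Call `plt.show()` or `ax.get_figure().savefig(...)` afterwards to view or save the chart.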

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Laurent