How to plot distribution of missing values in a dataframe
I have a dataframe with hundreds of columns and would like to investigate the proportion of missing values in each column by plotting a graph.
I'm able to get the proportions using the code below:
Code :
missing_data_in_df = pd.DataFrame(
    {
        "NaN_Counts": df.isna().sum(),
        "NaN_Proportions(%)": (df.isna().sum() / df.shape[0]) * 100,
    }
).sort_values(by="NaN_Counts", ascending=False)
missing_data_in_df.head()
Output :
NaN_Counts NaN_Proportions(%)
Col1 889061 99.757636
Col2 685843 76.955435
Col3 584612 65.596749
Col4 476524 53.468668
Col5 392318 44.020282
Now, when I try to visualize this with a histogram:
Code :
missing_data_in_df.hist()
I get histograms of the binned values rather than one bar per feature.
Is there any way to get the dataframe's feature names on the x-axis?
Solution 1:[1]
With your dataframe:
import pandas as pd
df = pd.DataFrame(
{
"features": ["Col1", "Col2", "Col3", "Col4", "Col5"],
"NaN_Counts": [889061, 685843, 584612, 476524, 392318],
"NaN_Proportions(%)": [99.757636, 76.955435, 65.596749, 53.468668, 44.020282],
}
)
Here is one way to do it:
df.plot.bar(x="features", subplots=True)
Output:
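In the question, the feature names sit in the index of missing_data_in_df rather than in a "features" column, so a bar plot can also be drawn straight from the index. The sketch below is one possible way to do that for a wide dataframe. The helper name plot_missing, the top_n cutoff, and the small demo frame are assumptions made for illustration and are not part of the original question or answer. It requires matplotlib to be installed, since pandas uses it for plotting.

```python
import numpy as np
import pandas as pd


def plot_missing(df, top_n=20):
    """Horizontal bar chart of the top_n columns with the most missing values.

    Hypothetical helper for illustration; top_n is an arbitrary default.
    """
    # isna().mean() is the fraction of missing rows per column,
    # equivalent to isna().sum() / df.shape[0].
    proportions = (
        df.isna().mean().mul(100).sort_values(ascending=False).head(top_n)
    )
    # barh draws the first row at the bottom, so re-sort ascending
    # to put the column with the most missing values at the top.
    ax = proportions.sort_values().plot.barh(
        figsize=(8, max(2, 0.3 * len(proportions)))
    )
    ax.set_xlabel("NaN proportion (%)")
    ax.set_ylabel("feature")
    return ax


# Small made-up frame, only to show the call.
demo = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan, np.nan],
        "b": [1.0, 2.0, np.nan, 4.0],
        "c": [1.0, 2.0, 3.0, 4.0],
    }
)
ax = plot_missing(demo)
```

With hundreds of columns, horizontal bars limited to the worst top_n features tend to keep the labels readable. In a script, calling matplotlib.pyplot.show() afterwards displays the figure.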
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Laurent |