Why is the output of sklearn.feature_selection.chi2 NaN? Can a feature with no variation not be compared to a feature with variation?

I want to build a heat map that shows, for each pair of columns, how the presence of a feature in one column relates to its presence in the other.

I have this:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_selection import chi2


df = pd.DataFrame([[0,0,0],[0,1,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,1,0],[0,0,0],
           [0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,1,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],
           [0,0,0],[0,0,0],[0,1,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0],[0,1,0],[0,0,0],[0,0,0],[0,0,0],[0,1,0],[0,1,0]],columns=['feature1','feature2','feature3'])


# The resultant DataFrame uses the same labels for its columns and its index,
# like the correlation matrix returned by df.corr().
# Initialize it with float zeros so p-values can be stored without a dtype change.
resultant = pd.DataFrame(0.0, index=df.columns, columns=df.columns)

# Find the p-value for every pair of distinct columns and store it in the resultant matrix
for i in df.columns:
    for j in df.columns:
        if i != j:
            # chi2 returns one statistic and one p-value per column of X,
            # so take the single element for the one-column input
            chi2_val, p_val = chi2(df[[i]], df[j])
            resultant.loc[i, j] = p_val[0]
print(resultant)


fig = plt.figure(figsize=(6,6))
sns.heatmap(resultant, annot=True, cmap='Blues')
plt.title('Chi-Square Test Results')
plt.show()

It generates a heat map:

[Image: heat map titled 'Chi-Square Test Results']

However, the actual p-values look like this:

          feature1  feature2  feature3
feature1  0.000000  0.867632       NaN
feature2  0.862684  0.000000       NaN
feature3       NaN       NaN       0.0

This is a realistic representation of my real data, where there are only a few missing data points in each column and I want to check their relation to all the other columns. Is it simply not feasible to do this? For example, in this case feature2 is a mix of 1s and 0s but feature3 is all 0s, so is it just not possible to calculate the chi-squared statistic between feature2 and feature3?
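To make the suspicion above concrete, here is a minimal sketch of the textbook Pearson chi-squared statistic on a 2x2 contingency table, written in plain Python. It is not sklearn's implementation (sklearn's chi2 treats the feature values as counts per class), and the helper name chi2_stat and the short example lists are made up for illustration. It only shows that a constant column produces expected counts of zero, so the statistic becomes a 0/0 division and is undefined.

```python
import math


def chi2_stat(x, y):
    """Pearson chi-squared statistic for two equal-length 0/1 sequences.

    Hypothetical helper for illustration. Returns NaN when any expected
    cell count is zero, which happens whenever x or y is constant.
    """
    n = len(x)
    # observed[a][b] counts the rows where x == a and y == b
    observed = [[0, 0], [0, 0]]
    for a, b in zip(x, y):
        observed[a][b] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][b] + observed[1][b] for b in (0, 1)]

    stat = 0.0
    for a in (0, 1):
        for b in (0, 1):
            expected = row_totals[a] * col_totals[b] / n
            if expected == 0:
                # (observed - expected)**2 / expected would be 0/0 here
                return math.nan
            stat += (observed[a][b] - expected) ** 2 / expected
    return stat


# Made-up data: feature2 varies, feature3 is constant
feature2 = [0, 1, 0, 0, 1, 0, 0, 0]
feature3 = [0, 0, 0, 0, 0, 0, 0, 0]

print(chi2_stat(feature2, feature3))          # nan: feature3 never takes the value 1
print(chi2_stat([0, 1, 0, 1], [0, 1, 0, 1]))  # 4.0: both columns vary
```

If this reasoning carries over, one option would be to skip or mask any pair in which either column is constant before filling the matrix, rather than treating the NaN as a result.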



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
