How to do point biserial correlation for multiple columns in one iteration

I am trying to calculate a point biserial correlation for a set of columns in my dataset. I can do it for an individual variable, but when I try to calculate it for all the columns in one iteration I get an error.

Below is the code:

df = pd.DataFrame({'A':[1, 0, 1, 0, 1], 'B':[6, 7, 8, 9, 10],'C':[9, 4, 6,9,10],'D':[8,9,5,7,10]})

from scipy import stats
corr_list = {}
y = df['A'].astype(float)
for column in df:
    x = df[['B','C','D']].astype(float)
    corr = stats.pointbiserialr(x, y)
    corr_list[['B','C','D']] = corr 
print(corr_list)

TypeError: No loop matching the specified signature and casting was found for ufunc add


Solution 1:[1]

x must be a single column (a 1-D array), not a DataFrame. If you pass one column at a time instead of the whole DataFrame, it will work. You can try this:

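import pandas as pd
from scipy import stats

df = pd.DataFrame({'A':[1, 0, 1, 0, 1], 'B':[6, 7, 8, 9, 10],'C':[9, 4, 6,9,10],'D':[8,9,5,7,10]})
print(df)

corr_list = []
y = df['A'].astype(float)  # binary target column

# Correlate each remaining column with the target, one column at a time.
# 'A' itself is skipped: its correlation with itself is trivially 1.
for column in df.columns.drop('A'):
    x = df[column]
    corr = stats.pointbiserialr(x, y)  # returns (correlation, p-value)
    corr_list.append(corr[0])          # keep only the correlation
print(corr_list)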

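By the way, you can also use print(df.corr()), which gives you the correlation matrix of the DataFrame. Its default method is Pearson, and the point biserial coefficient is the Pearson coefficient computed with a binary variable, so the 'A' row of that matrix holds the same values.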
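As a quick check of the df.corr() remark, here is a minimal sketch that compares the 'A' column of the Pearson matrix with pointbiserialr computed column by column. It assumes the same example df as above; the names pearson and pbis are only illustrative.

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({'A': [1, 0, 1, 0, 1], 'B': [6, 7, 8, 9, 10],
                   'C': [9, 4, 6, 9, 10], 'D': [8, 9, 5, 7, 10]})

# Pearson correlation of every other column with the binary column 'A'
pearson = df.corr()['A'].drop('A')

# Point biserial correlation, computed one column at a time
pbis = {}
for col in ['B', 'C', 'D']:
    r, _p = stats.pointbiserialr(df['A'], df[col])
    pbis[col] = r
pbis = pd.Series(pbis)

# The two Series should agree up to floating-point noise
print(pd.DataFrame({'pearson': pearson, 'point_biserial': pbis}))
```

Note that df.corr() only gives the coefficients; if you also need p-values you still have to call pointbiserialr.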

Solution 2:[2]

You can use the pd.DataFrame.corrwith() function:

df[['B', 'C', 'D']].corrwith(df['A'].astype('float'), method=stats.pointbiserialr)

The output is a DataFrame with one column per input column: row 0 holds the correlation and row 1 the p-value of that column against the target Series:

              B         C         D
0  4.547937e-18  0.400066 -0.094916
1  1.000000e+00  0.504554  0.879331
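If you would rather not depend on how corrwith handles a callable that returns two values (this may vary between pandas versions; I have not checked every release), the same table can be built with an explicit loop. This is only a sketch on the example data; the row labels 'correlation' and 'p_value' and the name results are my own choices, not something pandas or scipy produce.

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({'A': [1, 0, 1, 0, 1], 'B': [6, 7, 8, 9, 10],
                   'C': [9, 4, 6, 9, 10], 'D': [8, 9, 5, 7, 10]})

target = df['A']  # binary column

# Collect correlation and p-value for each column against the target
rows = {}
for col in ['B', 'C', 'D']:
    r, p = stats.pointbiserialr(target, df[col])
    rows[col] = {'correlation': r, 'p_value': p}

# Outer keys become columns, inner keys become the row labels
results = pd.DataFrame(rows)
print(results)
```

Labelled rows make it harder to mix up the coefficient and the p-value than positional rows 0 and 1.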

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Youssef_boughanmi
Solution 2