How to perform a Levene's test using scipy
I've been trying to use scipy.stats.levene with no success.
I have a numpy matrix with shape (2128, 45100). Each row is a sample and belongs to one of 3 clusters.
I want to test if there is homoscedasticity between clusters.
I've tried filtering my matrix by cluster and sending the params like so:
import numpy as np
from scipy.stats import levene
levene(matrixAudioData[np.ix_((cutTree == 0).ravel()),:][0],
matrixAudioData[np.ix_((cutTree == 1).ravel()),:][0],
matrixAudioData[np.ix_((cutTree == 2).ravel()),:][0])
ValueError: setting an array element with a sequence.
or even
levene(matrixAudioData)
ValueError: Must enter at least two input sample vectors.
This works:
levene([1,2,3],[2,3,4])
But what if each sample is not just one number?
Please note that each matrixAudioData[np.ix_((cutTree == 0).ravel()),:][0]
that I'm using as a parameter has shape (1048, 45100), so it should be fine.
Can you point me in the right direction?
Thanks !
Solution 1:[1]
As you have noticed, levene([1,2,3],[2,3,4]) works because you are passing one-dimensional array_like objects to the function. Passing matrixAudioData[np.ix_((cutTree == 0).ravel()),:][0] does not work, because the function requires each sample to be a 1-D array and that expression is a 2-D array of shape (1048, 45100).
For example, consider three samples:
col1, col2, col3 = list(range(1, 100)), list(range(50, 78)), list(range(115, 139))
Notice that the lists have different lengths; the test can be performed on samples of different sizes. To call the levene function, pass each one-dimensional array_like object as a separate argument:
from scipy.stats import levene
statistic, p_value = levene(col1, col2, col3, center="mean")
In this case, p_value=1.3326317740560537e-14. If the p-value of Levene's test is greater than 0.05, we fail to reject the null hypothesis of equal variances, so homogeneity of variance (HOV) can be assumed. Otherwise, the variances differ significantly between the groups.
So, in this case we reject the null hypothesis that the variance is the same across col1, col2 and col3.
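For a 2-D matrix like the one in the question, one way to satisfy the 1-D requirement is sketched below. The arrays data and labels are small synthetic stand-ins for matrixAudioData and cutTree (assumptions, not the asker's data). Two options are shown: one Levene test per feature column, or pooling all values of a cluster into a single 1-D sample. Which one is appropriate depends on the question being asked.

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(0)

# Hypothetical stand-ins for matrixAudioData and cutTree:
# 60 samples (rows), 4 features (columns), 3 cluster labels.
data = rng.normal(size=(60, 4))
labels = np.repeat([0, 1, 2], 20)
# Inflate the spread of feature 0 in cluster 2 so one test has something to detect.
data[labels == 2, 0] *= 5.0

# Split rows by cluster with a boolean mask; each block is still 2-D.
groups = [data[labels == c] for c in np.unique(labels)]

# Option A: one Levene test per feature, each call receiving 1-D columns.
per_feature = [
    levene(*(g[:, j] for g in groups), center="median")
    for j in range(data.shape[1])
]
p_values = np.array([res.pvalue for res in per_feature])

# Option B: flatten each cluster into one 1-D sample of all its values.
pooled = levene(*(g.ravel() for g in groups), center="median")

alpha = 0.05
reject = p_values < alpha  # True where equal variances are rejected
```

With 45100 features, Option A produces 45100 p-values, so some correction for multiple testing is probably needed before interpreting them.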
Solution 2:[2]
Based on the Box's M Test formula, here is a Python program for conducting a Box's M Test on two equal-sized groups X0 and X1 (i.e. each has the same number of rows and columns), stored as NumPy arrays. The function computes each group's covariance matrix with np.cov(), so variables are expected in rows and observations in columns. This has been tested against SPSS output.
NumPy is a dependency, abbreviated to np.
import numpy as np

def box_m(X0, X1):
    global Xp  # pooled covariance matrix, left accessible after the call

    m = 2                    # number of groups
    k = len(np.cov(X0))      # number of variables
    n_1 = len(X0[0])         # observations in group 0
    n_2 = len(X1[0])         # observations in group 1
    n = len(X0[0]) + len(X1[0])

    # Pooled covariance matrix
    Xp = (((n_1-1)*np.cov(X0)) + ((n_2-1)*np.cov(X1))) / (n-m)

    # Box's M statistic
    M = ((n-m)*np.log(np.linalg.det(Xp))) \
        - (n_1-1)*(np.log(np.linalg.det(np.cov(X0)))) - (n_2-1)*(np.log(np.linalg.det(np.cov(X1))))

    c = ((2*(k**2) + (3*k) - 1) / ((6*(k+1)*(m-1)))) \
        * ((1/(n_1-1)) + (1/(n_2-1)) - (1/(n-m)))

    df = (k*(k+1)*(m-1))/2

    c2 = (((k-1)*(k+2)) / (6*(m-1))) \
        * ((1/((n_1-1)**2)) + (1/((n_2-1)**2)) - (1/((n-m)**2)))

    df2 = (df+2) / (np.abs(c2-c**2))

    # F approximation; the branch depends on the sign of c2 - c**2
    if (c2 > c**2):
        a_plus = df / (1-c-(df/df2))
        F = M / a_plus
    else:
        a_minus = df2 / (1-c+(2/df2))
        F = (df2*M) / (df*(a_minus-M))

    print('M = {}'.format(M))
    print('c = {}'.format(c))
    print('c2 = {}'.format(c2))
    print('-------------------')
    print('df = {}'.format(df))
    print('df2 = {}'.format(df2))
    print('-------------------')
    print('F = {}'.format(F))
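Because the function reads the number of observations from len(X0[0]) and relies on the np.cov() default of rowvar=True, the input layout matters. The sketch below uses synthetic data (X0, X1 and their sizes are assumptions for illustration) to show the expected shapes and the pooled covariance step on its own.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: k = 3 variables in rows, 40 observations in columns.
X0 = rng.normal(size=(3, 40))
X1 = rng.normal(size=(3, 40))

S0, S1 = np.cov(X0), np.cov(X1)      # each covariance matrix is k x k
n_1, n_2 = X0.shape[1], X1.shape[1]  # observations per group
m = 2                                # number of groups

# Pooled covariance, weighting each group by its degrees of freedom.
S_pooled = ((n_1 - 1) * S0 + (n_2 - 1) * S1) / (n_1 + n_2 - m)

# Data stored samples-by-features gives the same matrix with rowvar=False.
S0_from_rows = np.cov(X0.T, rowvar=False)
```

If the data are stored with one sample per row, as in the question, transposing before the call (or adapting the function to use rowvar=False) keeps k equal to the number of variables rather than the number of samples.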
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | EMT |
Solution 2 | Andy Banks |