Cosine similarity calculation between two matrices
I have some code to calculate the cosine similarity between two matrices:
import numpy as np
import scipy.spatial as sp

def cos_cdist_1(matrix, vector):
    # Cosine distance from every row of matrix to a single vector.
    v = vector.reshape(1, -1)
    return sp.distance.cdist(matrix, v, 'cosine').reshape(-1)

def cos_cdist_2(matrix1, matrix2):
    # reshape(-1) flattens the 2-D distance matrix into a 1-D array.
    return sp.distance.cdist(matrix1, matrix2, 'cosine').reshape(-1)

list1 = [[1,1,1],[1,2,1]]
list2 = [[1,1,1],[1,2,1]]
matrix1 = np.asarray(list1)
matrix2 = np.asarray(list2)

# Matrix-vs-vector version: one list of similarities per row of matrix2.
results = []
for vector in matrix2:
    distance = cos_cdist_1(matrix1,vector)
    distance = np.asarray(distance)
    similarity = (1-distance).tolist()
    results.append(similarity)

# Matrix-vs-matrix version: iterates over the flattened distances.
dist_all = cos_cdist_2(matrix1, matrix2)
results2 = []
for item in dist_all:
    distance_result = np.asarray(item)
    similarity_result = (1-distance_result).tolist()
    results2.append(similarity_result)
results is
[[1.0000000000000002, 0.9428090415820635],
[0.9428090415820635, 1.0000000000000002]]
However, results2 is
[1.0000000000000002, 0.9428090415820635, 0.9428090415820635, 1.0000000000000002]
The output I want is results, a list of lists of similarity values. However, I would like to keep the calculation between two matrices rather than between a matrix and a vector. Any good ideas?
Solution 1:[1]
In [75]: import scipy.spatial as sp
In [76]: 1 - sp.distance.cdist(matrix1, matrix2, 'cosine')
Out[76]:
array([[ 1. , 0.94280904],
[ 0.94280904, 1. ]])
Therefore, you could eliminate the for-loops and replace them all with
results2 = 1 - sp.distance.cdist(matrix1, matrix2, 'cosine')
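As a self-contained sketch of this approach, using the sample data from the question, the snippet below keeps the 2-D shape that cdist returns instead of flattening it. The final .tolist() call is an addition, assuming the nested-list layout of results is what is wanted; it can be dropped if a NumPy array is fine.

```python
import numpy as np
import scipy.spatial as sp

matrix1 = np.asarray([[1, 1, 1], [1, 2, 1]])
matrix2 = np.asarray([[1, 1, 1], [1, 2, 1]])

# cdist returns an array of shape (rows of matrix1, rows of matrix2)
# holding cosine distances; 1 - distance gives the similarity.
similarity = 1 - sp.distance.cdist(matrix1, matrix2, 'cosine')

# Convert to nested lists to match the layout of `results` in the question.
results2 = similarity.tolist()
print(results2)
```

Entry [i][j] is the similarity between row i of matrix1 and row j of matrix2, so no reshape(-1) is needed.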
Solution 2:[2]
You can have a look at scikit-learn's API for calculating cosine similarity: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html
Cosine similarity, or the cosine kernel, computes similarity as the normalized dot product of X and Y:
K(X, Y) = <X, Y> / (||X||*||Y||)
X: ndarray or sparse array, shape: (n_samples_X, n_features)
Y: ndarray or sparse array, shape: (n_samples_Y, n_features). If None, the output will be the pairwise similarities between all samples in X.
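To illustrate the formula K(X, Y) itself, here is a minimal NumPy sketch of the normalized dot product. The helper name cosine_similarity_matrix is hypothetical and this is not scikit-learn's implementation; it assumes dense input with no all-zero rows.

```python
import numpy as np

def cosine_similarity_matrix(X, Y):
    """Hypothetical helper: pairwise cosine similarity between rows of X and Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Scale every row to unit length (assumes no row is all zeros).
    X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    Y_unit = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # Entry (i, j) is <X[i], Y[j]> / (||X[i]|| * ||Y[j]||).
    return X_unit @ Y_unit.T

K = cosine_similarity_matrix([[1, 1, 1], [1, 2, 1]], [[1, 1, 1], [1, 2, 1]])
print(K)
```

The result has shape (n_samples_X, n_samples_Y), which is the matrix layout asked for in the question.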
Solution 3:[3]
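If A and B are NumPy ndarrays of the same shape, then the following should give you the row-wise cosine similarity, also as a NumPy array. Note that this compares row i of A only with row i of B; it does not produce the full pairwise matrix.
def cos(A, B):
    return (A*B).sum(axis=1) / (A*A).sum(axis=1) ** .5 / (B*B).sum(axis=1) ** .5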
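A short usage sketch of the function above, with made-up sample arrays (A and B here are illustrative values, not taken from the question). It shows that the output has one value per row pair rather than one per combination of rows.

```python
import numpy as np

def cos(A, B):
    # Dot product of matching rows, divided by both row norms.
    return (A * B).sum(axis=1) / (A * A).sum(axis=1) ** .5 / (B * B).sum(axis=1) ** .5

# Illustrative inputs; both arrays must have the same shape.
A = np.array([[1.0, 1.0, 1.0], [1.0, 2.0, 1.0]])
B = np.array([[1.0, 2.0, 1.0], [1.0, 2.0, 1.0]])

row_wise = cos(A, B)
print(row_wise.shape)
print(row_wise)
```

For two (n, d) inputs the result has shape (n,), so this fits when the rows are already paired up; for every row against every other row, a pairwise method such as cdist is the better match.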
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | unutbu |
Solution 2 | Minstein |
Solution 3 | axolotl |