'Python CUDA parallize multiple SVD's of small matrices

I've seen a similar post on stackoverflow which tackles the problem in C++: Parallel implementation for multiple SVDs using CUDA I want to do exactly the same in python, is that possible? I have multiple matrices (approximately 8000 with size 15x3) and each of them I want to decompose using the SVD. This takes years on a CPU. Is it possible to do that in python? My computer has an NVIDIA GPU installed. I already had a look at several libraries such as numba, pycuda, scikit-cuda, cupy but didnt found a way to implement my plan with that libraries. I would be very glad for some help.



Solution 1:[1]

cuPy gives access to cuSolver, including a batched SVD:

https://docs.cupy.dev/en/stable/reference/generated/cupy.linalg.svd.html

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 talonmies