Specifying Exact CPU Instruction Set with Cythonized Python Wheels
I have a Python package with a native extension compiled by Cython. For performance reasons, the extension is compiled with the -march=native and -mtune=native flags, which allow the compiler to use any ISA extension available on the build machine.
Additionally, we keep a non-Cythonized, pure-Python version of this package, intended for environments that are less performance-sensitive.
Hence, in total we publish two versions:
- A Cythonized wheel built for one very specific platform
- A pure-Python wheel.
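For illustration, the two published artifacts might look something like this (hypothetical names; the exact tags depend on the build):
- mypackage-1.2.3-cp37-cp37m-manylinux2014_x86_64.whl (the Cythonized wheel)
- mypackage-1.2.3-py3-none-any.whl (the pure-Python wheel)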
Some other packages depend on this package, and some of the machines they run on are a bit different from the one the package was compiled on. Because we used -march=native, we get SIGILL on servers whose CPUs lack one of the ISA extensions the wheel relies on.
So, in essence, I'd like to somehow make pip disregard the native wheel if the host CPU is not compatible with it. The native wheel does carry the cp37 and platform tags, but I don't see a way to express more granular ISA requirements there. I can always use the --implementation flag for pip, but I wonder if there's a better way for pip to differentiate among ISAs.
Thanks,
Solution 1:[1]
The pip infrastructure doesn't support that level of granularity.
I think a better approach would be to compile two versions of the Cython extension, with -march=native and without, to install both, and to decide at run time which one should be loaded.
Here is a proof of concept.
The first hoop to jump through: how to check at run time which instructions the CPU/OS combination supports. For simplicity we will check for AVX (this SO-post has more details), and I offer only a gcc-specific solution (see also this), called impl_picker.pyx:
# impl_picker.pyx
cdef extern from *:
    """
    /* gcc builtin: returns non-zero if the running CPU supports AVX */
    int cpu_supports_avx(void){
        return __builtin_cpu_supports("avx");
    }
    """
    int cpu_supports_avx()

def cpu_has_avx_support():
    return cpu_supports_avx() != 0
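Once impl_picker.pyx is compiled (see the cythonize invocation further down), a quick sanity check could look like this:

# prints whether the CPU the interpreter is running on reports AVX support
from impl_picker import cpu_has_avx_support
print("AVX supported at run time:", cpu_has_avx_support())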
The second problem: the pyx-file and the module must have the same name. To avoid code duplication, the actual code is in a pxi-file:
# worker.pxi
cdef extern from *:
    """
    /* __AVX__ is defined by the compiler when AVX code generation is enabled,
       e.g. via -march=native on an AVX-capable build machine */
    int compiled_with_avx(void){
    #ifdef __AVX__
        return 1;
    #else
        return 0;
    #endif
    }
    """
    int compiled_with_avx()

def compiled_with_avx_support():
    return compiled_with_avx() != 0
As one can see, the function compiled_with_avx_support will yield different results depending on whether it was compiled with -march=native or not.
And now we can define two versions of the module just by including the actual code from the *.pxi-file. One module, called worker_native.pyx:
# distutils: extra_compile_args=["-march=native"]
include "worker.pxi"
and worker_fallback.pyx:
include "worker.pxi"
After building everything, e.g. via cythonize -i -3 *.pyx, it can be used as follows:
from impl_picker import cpu_has_avx_support

# overhead once, when imported:
if cpu_has_avx_support():
    import worker_native as worker
else:
    print("using fallback worker")
    import worker_fallback as worker

print("compiled_with_avx_support:", worker.compiled_with_avx_support())
On my machine the above leads to compiled_with_avx_support: True; on older machines the "slower" worker_fallback will be used and the result will be compiled_with_avx_support: False.
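As a possible refinement (a sketch, assuming the three modules are shipped as submodules of a package whose name, mypackage, is made up here), the selection can be hidden in the package's __init__.py so that depending packages always import the same name:

# mypackage/__init__.py : sketch; assumes impl_picker, worker_native and
# worker_fallback are built as submodules of mypackage
from .impl_picker import cpu_has_avx_support

if cpu_has_avx_support():
    from . import worker_native as worker
else:
    from . import worker_fallback as worker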
The goal of this post is not to give a working setup.py, but just to outline the idea of how one could pick the correct version at run time. Obviously, the setup.py could be quite a bit more complicated: e.g. one would need to compile multiple C files with different compiler settings (see this SO-post for how this could be achieved).
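For illustration, a minimal setup.py along these lines might look like the following sketch (the package name is made up; cythonize() picks up the # distutils: extra_compile_args directive inside worker_native.pyx, so -march=native is applied only to that module):

# setup.py : minimal sketch with a hypothetical package name
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="mypackage",
    ext_modules=cythonize(
        ["impl_picker.pyx", "worker_native.pyx", "worker_fallback.pyx"],
        language_level=3,
    ),
)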
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |