Does C++ support a run-time query for the natural width of a core's SIMD units?
In C++, is there a way to query the number of lanes of the SIMD units, like this:
// desired result, in 32-bit float lanes:
// 4 for Bulldozer,
// 8 for Skylake,
// 16 for Cascade Lake
int width = std::this_thread::SIMD_WIDTH; // hypothetical syntax
Or does it have to be a non-portable code path? I would like to test some optimizations on my code, but I suspect that keeping the tile length (4/8/16/32) of a vectorized loop fixed is not a good idea.
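Standard C++ has no run-time query of this kind, so one option is a compiler-specific path. The sketch below assumes GCC or Clang on x86, where the `__builtin_cpu_supports` builtin reports at run time which instruction-set extensions the CPU supports; the helper name `supported_float_lanes` is hypothetical, and other targets simply fall back to 1 (scalar).

```cpp
#include <cassert>

// Hypothetical helper: widest SIMD width, in 32-bit float lanes, that the
// running CPU reports as supported. GCC/Clang on x86 only; elsewhere it
// falls back to 1 (scalar).
int supported_float_lanes() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx512f")) return 16;  // 512-bit registers
    if (__builtin_cpu_supports("avx"))     return 8;   // 256-bit float ops
    if (__builtin_cpu_supports("sse2"))    return 4;   // 128-bit registers
#endif
    return 1;  // unknown target: assume scalar
}
```

On Bulldozer this returns 8, because the CPU supports AVX, even though 4 lanes may be faster there: the result describes what the CPU can execute, not what it executes best.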
By "natural" width, I mean the width that gives the best performance. For example, on Bulldozer the two cores of a module share one FPU, and 256-bit AVX operations are split across its 128-bit units, so some operations run better as SSE4 there. Likewise, Cascade Lake can run SSE efficiently, but (just a guess) not as efficiently as AVX-512.
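Since "natural" here means fastest rather than widest, another option is to measure. A minimal sketch, assuming a simple reduction stands in for the real loop: the hypothetical `tiled_sum<kTile>` keeps `kTile` independent partial sums so the compiler can map one tile onto one SIMD register, and the hypothetical `calibrate_tile` times the 4/8/16/32 instantiations with portable `<chrono>` clocks and returns the fastest.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical kernel: sums v using kTile independent accumulators, so one
// tile can be kept in one SIMD register if the compiler vectorizes it.
template <std::size_t kTile>
float tiled_sum(const std::vector<float>& v) {
    float acc[kTile] = {};
    std::size_t i = 0;
    for (; i + kTile <= v.size(); i += kTile)
        for (std::size_t l = 0; l < kTile; ++l) acc[l] += v[i + l];
    float total = 0.0f;
    for (std::size_t l = 0; l < kTile; ++l) total += acc[l];
    for (; i < v.size(); ++i) total += v[i];  // leftover elements
    return total;
}

// Best-of-`reps` wall time in nanoseconds for one tile length.
template <std::size_t kTile>
long long time_tile(const std::vector<float>& v, int reps, float& sink) {
    long long best = -1;
    for (int r = 0; r < reps; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        sink += tiled_sum<kTile>(v);
        auto t1 = std::chrono::steady_clock::now();
        long long ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        if (best < 0 || ns < best) best = ns;
    }
    return best;
}

// Hypothetical calibration: returns whichever of 4/8/16/32 ran fastest here.
std::size_t calibrate_tile(std::size_t n = 1 << 16, int reps = 20) {
    std::vector<float> v(n, 1.0f);
    float sink = 0.0f;  // accumulates results so the timed calls are not dead code
    const std::size_t tiles[] = {4, 8, 16, 32};
    const long long times[] = {time_tile<4>(v, reps, sink), time_tile<8>(v, reps, sink),
                               time_tile<16>(v, reps, sink), time_tile<32>(v, reps, sink)};
    std::size_t best = 0;
    for (std::size_t k = 1; k < 4; ++k)
        if (times[k] < times[best]) best = k;
    volatile float keep = sink;  // force sink to be computed
    (void)keep;
    return tiles[best];
}
```

The tile length stays a compile-time template parameter, so each instantiation can still be vectorized; only the choice among them happens at run time. A single short timing run is noisy, so a real calibration would use the actual kernel and realistic data sizes.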
On a more advanced point, does this work for CPUs with different kinds of cores in the same package? For example, Intel's newest architecture combines efficiency cores and performance cores. What if I query the width as 256 bits (8 lanes of 32-bit floats) and the OS then schedules the thread on a different core that has wider SIMD?
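One way to make the query and the work refer to the same core is to pin the thread before querying. A Linux/glibc-only sketch: `current_cpu` and `pin_to_current_cpu` are hypothetical helpers built on `sched_getcpu` and `sched_setaffinity`; other operating systems have their own affinity APIs. Pinning gives up the scheduler's freedom to move the thread, and each worker thread would have to do it for itself.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // glibc exposes sched_getcpu and the CPU_* macros under this
#endif
#include <cassert>
#if defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper: index of the CPU the calling thread is running on, or -1.
int current_cpu() {
#if defined(__linux__)
    return sched_getcpu();
#else
    return -1;
#endif
}

// Hypothetical helper: restrict the calling thread to the CPU it is on right now,
// so later queries and the vectorized work run on the same core.
// Returns that CPU index, or -1 if pinning failed or is unsupported.
int pin_to_current_cpu() {
#if defined(__linux__)
    int cpu = sched_getcpu();
    if (cpu < 0) return -1;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // pid 0 means "the calling thread" for sched_setaffinity on Linux.
    if (sched_setaffinity(0, sizeof(set), &set) != 0) return -1;
    return cpu;
#else
    return -1;
#endif
}
```

After pinning, a timing measurement or a direct CPUID query reflects the pinned core. GCC's `__builtin_cpu_supports`, by contrast, reads feature bits detected once at program start-up, so calling it after pinning does not re-examine the current core.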
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow