'How undefined are __builtin_ctz(0) or __builtin_clz(0)?
Background
For a long time, gcc has been providing a number of builtin bit-twiddling functions, in particular the number of trailing and leading 0-bits (also for long unsigned
and long long unsigned
, which have suffixes l
and ll
):
— Built-in Function:
int __builtin_clz (unsigned int x)
Returns the number of leading 0-bits in
x
, starting at the most significant bit position. Ifx
is 0, the result is undefined.— Built-in Function:
int __builtin_ctz (unsigned int x)
Returns the number of trailing 0-bits in
x
, starting at the least significant bit position. Ifx
is 0, the result is undefined.
On every online (disclaimer: only x64) compiler I tested, however, the result has been that both clz(0)
and ctz(0)
return the number of bits of the underlying builtin type, e.g.
#include <iostream>
#include <limits>
int main()
{
// prints 32 32 32 on most systems
std::cout << std::numeric_limits<unsigned>::digits << " " << __builtin_ctz(0) << " " << __builtin_clz(0);
}
Attempted workaround
The latest Clang SVN trunk in std=c++1y
mode has made all these functions relaxed C++14 constexpr
, which makes them candidates to use in a SFINAE expression for a wrapper function template around the 3 ctz
/ clz
builtins for unsigned
, unsigned long
, and unsigned long long
template<class T> // wrapper class specialized for u, ul, ull (not shown)
constexpr int ctznz(T x) { return wrapper_class_around_builtin_ctz<T>()(x); }
// overload for platforms where ctznz returns size of underlying type
template<class T>
constexpr auto ctz(T x)
-> typename std::enable_if<ctznz(0) == std::numeric_limits<T>::digits, int>::type
{ return ctznz(x); }
// overload for platforms where ctznz does something else
template<class T>
constexpr auto ctz(T x)
-> typename std::enable_if<ctznz(0) != std::numeric_limits<T>::digits, int>::type
{ return x ? ctznz(x) : std::numeric_limits<T>::digits; }
The gain from this hack is that platforms that give the required result for ctz(0)
can omit an extra conditional to test for x==0
(which might seem a micro-optimization, but when you are already down to the level of builtin bit-twiddling functions, it can make a big difference)
Questions
How undefined is the family of builtin functions clz(0)
and ctz(0)
?
- can they throw an
std::invalid_argument
exception? - for x64, will they for the current gcc distro return the size of the underyling type?
- are the ARM/x86 platforms any different (I have no access to that to test those)?
- is the above SFINAE trick a well-defined way to separate such platforms?
Solution 1:[1]
Unfortunately, even x86-64 implementations can differ - from Intel's instruction set reference,BSF
and BSR
, with a source operand value of (0)
, leaves the destination undefined, and sets the ZF
(zero flag). So the behaviour may not be consistent between micro-architectures or, say, AMD and Intel. (I believe AMD leaves the destination unmodified.)
The newer LZCNT
and TZCNT
instructions are not ubiquitous. Both are present only as of the Haswell architecture (for Intel).
Solution 2:[2]
The reason the value is undefined is that it allows the compiler to use processor instructions for which the result is undefined, when those instructions are the fastest way to get an answer.
But it's important to understand that not only are the results undefined; they are undeterministic. It's valid, given Intel's instruction reference, for the instruction to return the low 7 bits of the current time, for example.
And here's where it gets interesting/dangerous: the compiler writer can take advantage of this situation, to produce smaller code. Consider this non-template-specialization version of your code:
using std::numeric_limits;
template<class T>
constexpr auto ctz(T x) {
return ctznz(0) == numeric_limits<T>::digits || x != 0
? ctznz(x) : numeric_limits<T>::digits;
}
This works well on a processor/compiler that have decided to return #bits for ctznz(0). But un a processor/compiler that decide to return pseudo-random values, the compiler may decide "I'm allowed to return whatever I want for ctznz(0), and the code is smaller if I return #bits, so I will". Then the code ends up calling ctznz all the time, even though it produces the wrong answer.
To put it another way: the compiler's undefined results are not guaranteed to be undefined in the same way that the running program's undefined results are.
There really is no way around this. If you must use __builtin_clz, with a source operand that might be zero, you have to add the check, all the time.
Solution 3:[3]
C++20 update: countl_zero, countr_zero, countl_one, and countr_one are now standard, and in . They will generally call the same asm as the intrinsic.
So, once you are on C++20, don't use the intrinsics.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | jorgbrown |
Solution 3 | Glenn Teitelbaum |