Native isnan check in C++

I stumbled upon this code to check for NaN:
/**
 * isnan(val) returns true if val is nan.
 * We cannot rely on std::isnan or x!=x, because GCC may wrongly optimize it
 * away when compiling with -ffast-math (default in RASR).
 * This function basically does 3 things:
 * - ignore the sign (first bit is dropped with <<1)
 * - interpret val as an unsigned integer (union)
 * - compare val to the nan-bitmask (ones in the exponent, non-zero significand)
 */
template<typename T>
inline bool isnan(T val) {
    if (sizeof(val) == 4) {
        union { f32 f; u32 x; } u = { (f32)val };
        return (u.x << 1) > 0xff000000u;
    } else if (sizeof(val) == 8) {
        union { f64 f; u64 x; } u = { (f64)val };
        return (u.x << 1) > 0xffe0000000000000u; // 11 exponent bits, shifted left by one
    } else {
        std::cerr << "isnan is not implemented for sizeof(datatype)=="
                  << sizeof(val) << std::endl;
        return false;
    }
}
This looks architecture dependent, right? However, I'm not sure about endianness, because whether the machine is little or big endian, the float and the int are probably stored in the same byte order.
Also, I wonder whether something like
volatile T x = val;
return std::isnan(x);
would have worked.
This was used with GCC 4.6 in the past.

Also, I wonder whether something like std::isnan((volatile)x) would have worked.
isnan takes its argument by value so the volatile qualifier would have been discarded. In other words, no, this doesn’t work.
The code you’ve posted relies on a specific floating point representation (IEEE). It also exhibits undefined behaviour since it relies on the union hack to retrieve the underlying float representation.
On a note about code review, the function is badly written even if we ignore the potential problems of the previous paragraph (which are justifiable): why does the function use runtime checks rather than compile-time checks and compile time error handling? It would have been better and easier just to offer two overloads.
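Following that suggestion, here is a minimal sketch of the two-overload version. It assumes IEEE-754 binary32/binary64 (checked with static_assert) and copies the bits with std::memcpy rather than reading an inactive union member, which is the undefined behaviour mentioned above. The name bits_isnan is hypothetical. This avoids relying on x != x; it is not a guarantee about what every compiler does under -ffast-math.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Compile-time precondition checks instead of a runtime sizeof() branch.
static_assert(std::numeric_limits<float>::is_iec559, "float must be IEEE-754 binary32");
static_assert(std::numeric_limits<double>::is_iec559, "double must be IEEE-754 binary64");

inline bool bits_isnan(float val) {
    static_assert(sizeof(std::uint32_t) == sizeof(float), "size mismatch");
    std::uint32_t x;
    std::memcpy(&x, &val, sizeof x);          // well-defined bit copy
    const std::uint32_t no_sign = x << 1;     // drop the sign bit
    return no_sign > 0xff000000u;             // exponent all ones and significand != 0
}

inline bool bits_isnan(double val) {
    static_assert(sizeof(std::uint64_t) == sizeof(double), "size mismatch");
    std::uint64_t x;
    std::memcpy(&x, &val, sizeof x);
    const std::uint64_t no_sign = x << 1;
    return no_sign > 0xffe0000000000000u;     // 11 exponent bits, shifted left by one
}
```

Passing any other type, such as int or long double, is ambiguous between the two overloads and is therefore rejected at compile time instead of printing a message at run time.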


Optimizations are killing my integer overflow checks in clang 6

I have a fixed-point implementation for some financial application. It's basically an integer wrapped in a class template that is parameterized on the number of decimals N and treated as a decimal number. The class is paranoid and checks for overflows, but when I ran my tests in release mode they failed, and finally I created this minimal example that demonstrates the problem:
#include <cstdint>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <type_traits>
template <typename T, typename U>
typename std::enable_if<std::is_convertible<U, std::string>::value, T>::type
FromString(U&& str)
{
    std::stringstream ss;
    ss << str;
    T ret;
    ss >> ret;
    return ret;
}
int main()
{
    int NewAccu=32;
    int N=10;
    using T = int64_t;
    T l = 10;
    T r = FromString<T>("1" + std::string(NewAccu - N, '0'));
    if (l == 0 || r == 0) {
        return 0;
    }
    T res = l * r;
    std::cout << l << std::endl;
    std::cout << r << std::endl;
    std::cout << res << std::endl;
    std::cout << (res / l) << std::endl;
    std::cout << std::endl;
    if ((res / l) != r) {
        throw std::runtime_error(
                    "FixedPoint Multiplication Overflow while upscaling [:" + std::to_string(l) + ", " + std::to_string(r) + "]");
    }
    return 0;
}
This happens with Clang 6, my version is:
$ clang++ --version
clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
It's funny because it's an impressive optimization, but this ruins my application and prevents me from detecting overflows. I was able to reproduce this problem in g++ here. It doesn't throw an exception there.
Notice that the exception is thrown in debug mode, but it's not in release mode.
As @Basile already stated, signed integer overflow is undefined behavior, so the compiler can handle it in any way, even optimizing it away to gain a performance advantage. So detecting integer overflow after its occurrence is way too late. Instead, you should predict integer overflow just before it occurs.
Here is my implementation of overflow prediction of integer multiplication:
#include <limits>
template <typename T>
bool predict_mul_overflow(T x, T y)
{
    static_assert(std::numeric_limits<T>::is_integer, "predict_mul_overflow expects integral types");
    if constexpr (std::numeric_limits<T>::is_bounded)
    {
        return ((x != T{0}) && ((std::numeric_limits<T>::max() / x) < y));
    }
    else
    {
        return false;
    }
}
The function returns true if the integer multiplication x * y is predicted to overflow.
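Note that while unsigned overflow is well-defined in terms of modular arithmetic, signed overflow is undefined behavior. The presented function works for unsigned T, and for signed T as long as both operands are non-negative. With a negative operand the max() / x < y test gives wrong answers (for example, x = -2, y = 5 is reported as overflowing), so negative signed operands need separate handling against min().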
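A sketch that also covers negative signed operands by splitting on the operand signs, comparing against max() when the product is positive and against min() when it is negative. It divides instead of multiplying, so the check itself cannot overflow. mul_would_overflow is a hypothetical name, and like the code above it needs C++17 for if constexpr.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Returns true if x * y would overflow T (hypothetical helper, C++17).
template <typename T>
constexpr bool mul_would_overflow(T x, T y) {
    static_assert(std::is_integral<T>::value, "integral types only");
    using L = std::numeric_limits<T>;
    if (x == 0 || y == 0) return false;
    if constexpr (std::is_unsigned<T>::value) {
        return x > L::max() / y;
    } else {
        if (x > 0) {
            if (y > 0) return x > L::max() / y;   // (+) * (+): compare with max
            return y < L::min() / x;              // (+) * (-): compare with min
        }
        if (y > 0) return x < L::min() / y;       // (-) * (+): compare with min
        return x < L::max() / y;                  // (-) * (-): positive result, compare with max
    }
}
```

In the question's example, mul_would_overflow(l, r) would be tested before l * r is ever evaluated, so there is no undefined behavior for the optimizer to exploit.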
If you want to detect (signed) integer overflows (on scalar types like int64_t or long), you should use appropriate builtins, often compiler specific.
For GCC, see integer overflow builtins.
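For example, recent GCC and Clang versions provide __builtin_mul_overflow(a, b, &result), which computes the product as if with infinite precision, stores it converted to the type of result, and returns true if it did not fit. A sketch of a checked multiply for the question's int64_t values; checked_mul is a hypothetical name, and the builtin is a compiler extension rather than standard C++.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical checked multiply built on the GCC/Clang overflow builtin.
std::int64_t checked_mul(std::int64_t a, std::int64_t b) {
    std::int64_t result;
    if (__builtin_mul_overflow(a, b, &result))   // true if a * b does not fit in int64_t
        throw std::overflow_error("int64 multiplication overflow");
    return result;
}
```

Because the test happens inside the builtin rather than after a signed overflow has already occurred, the optimizer has no license to delete it the way it deleted the (res / l) != r check.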
Integer overflow (on plain int or long or other signed integral type) is an instance of undefined behavior, so the compiler can optimize against it as it pleases. Be scared. If you depend on UB, you are no longer coding in standard C++, and your program is tied to a particular compiler and system, so it is not portable at all (even to other compilers, other compiler versions, other compilation flags, other computers and OSes). So Clang (or GCC) is allowed to optimize against integer overflow, and sometimes does.
Or consider using some bignum package (then of course you don't deal with just predefined C++ integral scalar types). Perhaps GMPlib.
You could consider using GCC's __int128 if your numbers fit into 128 bits.
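A minimal sketch of that idea, assuming GCC or Clang on a 64-bit target where __int128 is available (it is an extension, not standard C++): widen both operands, multiply, and range-check before narrowing. The product of two 64-bit values always fits in 128 bits, so the wide multiplication cannot overflow. mul_fits_int64 and wide_int are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

__extension__ typedef __int128 wide_int;   // __extension__ silences -pedantic warnings

// Stores a * b in *out and returns true if it fits in int64_t; returns false otherwise.
bool mul_fits_int64(std::int64_t a, std::int64_t b, std::int64_t* out) {
    const wide_int wide = static_cast<wide_int>(a) * b;   // |a * b| <= 2^126: no overflow
    if (wide > std::numeric_limits<std::int64_t>::max() ||
        wide < std::numeric_limits<std::int64_t>::min())
        return false;
    *out = static_cast<std::int64_t>(wide);
    return true;
}
```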
I believe you cannot reliably detect integer overflows when they happen (unless you use the integer overflow builtins). You should avoid them (or use some bignum library, or some library using these builtins, etc.).

How does this float square root approximation work?

I found a rather strange but working square root approximation for floats; I really don't get it. Can someone explain to me why this code works?
float sqrt(float f)
{
    const int result = 0x1fbb4000 + (*(int*)&f >> 1);
    return *(float*)&result;
}
I've tested it a bit, and its output differs from std::sqrt() by about 1 to 3%. I know of Quake III's fast inverse square root, and I guess it's something similar here (without the Newton iteration), but I'd really appreciate an explanation of how it works.
(Note: I've tagged it both c and c++ since it's both valid-ish (see comments) C and C++ code.)
(*(int*)&f >> 1) right-shifts the bitwise representation of f. This almost divides the exponent by two, which is approximately equivalent to taking the square root.[1]
Why almost? In IEEE-754, the actual exponent is e - 127.[2] To divide this by two, we'd need (e - 127)/2 = e/2 - 63.5, roughly e/2 - 64, but the above approximation only gives us e/2 - 127. So we need to add 63 to the resulting exponent. This is contributed by bits 30-23 of that magic constant (0x1fbb4000).
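To make the exponent arithmetic concrete, here is a sketch that traces one input, f = 4.0F, through each step. It uses std::memcpy instead of the pointer cast and assumes an IEEE-754 binary32 float that shares byte order with std::uint32_t; SqrtTrace and trace_sqrt_approx are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "needs a 32-bit float");

// Intermediate values of the magic-constant square root for one input.
struct SqrtTrace {
    std::uint32_t input_bits;     // raw bits of f
    std::uint32_t shifted_bits;   // input_bits >> 1: exponent roughly halved
    std::uint32_t result_bits;    // shifted_bits + 0x1fbb4000: exponent re-biased
    unsigned result_exponent;     // biased exponent field (bits 30-23) of the result
    float result;
};

SqrtTrace trace_sqrt_approx(float f) {
    SqrtTrace t{};
    std::memcpy(&t.input_bits, &f, sizeof f);
    t.shifted_bits = t.input_bits >> 1;
    t.result_bits = t.shifted_bits + 0x1fbb4000u;
    t.result_exponent = (t.result_bits >> 23) & 0xffu;
    std::memcpy(&t.result, &t.result_bits, sizeof t.result);
    return t;
}
```

For 4.0F (bits 0x40800000, biased exponent 129), the shift gives 0x20400000 with biased exponent 64; adding the constant gives 0x3ffb4000, whose biased exponent is 127 (that is, 2^0) and whose value is 1.962890625, about 1.9% below the true square root of 2.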
I'd imagine the remaining bits of the magic constant have been chosen to minimise the maximum error across the mantissa range, or something like that. However, it's unclear whether it was determined analytically, iteratively, or heuristically.
It's worth pointing out that this approach is somewhat non-portable. It makes (at least) the following assumptions:
The platform uses single-precision IEEE-754 for float.
The endianness of float representation.
That you will be unaffected by undefined behaviour due to the fact this approach violates C/C++'s strict-aliasing rules.
Thus it should be avoided unless you're certain that it gives predictable behaviour on your platform (and indeed, that it provides a useful speedup vs. sqrtf!).
1. sqrt(a^b) = (a^b)^0.5 = a^(b/2)
2. See e.g.
See Oliver Charlesworth’s explanation of why this almost works. I’m addressing an issue raised in the comments.
Since several people have pointed out the non-portability of this, here are some ways you can make it more portable, or at least make the compiler tell you if it won’t work.
First, C++ allows you to check std::numeric_limits<float>::is_iec559 at compile time, such as in a static_assert. You can also check that sizeof(int) == sizeof(float), which will not be true if int is 64-bits, but what you really want to do is use uint32_t, which if it exists will always be exactly 32 bits wide, will have well-defined behavior with shifts and overflow, and will cause a compilation error if your weird architecture has no such integral type. Either way, you should also static_assert() that the types have the same size. Static assertions have no run-time cost and you should always check your preconditions this way if possible.
Unfortunately, the test of whether converting the bits in a float to a uint32_t and shifting is big-endian, little-endian or neither cannot be computed as a compile-time constant expression. Here, I put the run-time check in the part of the code that depends on it, but you might want to put it in the initialization and do it once. In practice, both gcc and clang can optimize this test away at compile time.
You do not want to use the unsafe pointer cast, and there are some systems I’ve worked on in the real world where that could crash the program with a bus error. The maximally-portable way to convert object representations is with memcpy(). In my example below, I type-pun with a union, which works on any actually-existing implementation. (Language lawyers object to it, but no successful compiler will ever break that much legacy code silently.) If you must do a pointer conversion (see below) there is alignas(). But however you do it, the result will be implementation-defined, which is why we check the result of converting and shifting a test value.
Anyway, not that you’re likely to use it on a modern CPU, here’s a gussied-up C++14 version that checks those non-portable assumptions:
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <limits>
#include <vector>
using std::cout;
using std::endl;
using std::size_t;
using std::sqrt;
using std::uint32_t;
template <typename T, typename U>
  inline T reinterpret(const U x)
/* Reinterprets the bits of x as a T. Cannot be constexpr
 * in C++14 because it reads an inactive union member.
 */
{
  static_assert( sizeof(T)==sizeof(U), "" );
  union tu_pun {
    U u = U();
    T t;
  };
  const tu_pun pun{x};
  return pun.t;
}
constexpr float source = -0.1F;
constexpr uint32_t target = 0x5ee66666UL;
const uint32_t after_rshift = reinterpret<uint32_t,float>(source) >> 1U;
const bool is_little_endian = after_rshift == target;
float est_sqrt(const float x)
/* A fast approximation of sqrt(x) that works less well for subnormal numbers.
 */
{
  static_assert( std::numeric_limits<float>::is_iec559, "" );
  assert(is_little_endian); // Could provide alternative big-endian code.
 /* The algorithm relies on the bit representation of normal IEEE floats, so
  * a subnormal number as input might be considered a domain error as well?
  */
  if ( std::isless(x, 0.0F) || !std::isfinite(x) )
    return std::numeric_limits<float>::signaling_NaN();
  constexpr uint32_t magic_number = 0x1fbb4000UL;
  const uint32_t raw_bits = reinterpret<uint32_t,float>(x);
  const uint32_t rejiggered_bits = (raw_bits >> 1U) + magic_number;
  return reinterpret<float,uint32_t>(rejiggered_bits);
}
int main(void)
{
  static const std::vector<float> test_values{
    4.0F, 0.01F, 0.0F, 5e20F, 5e-20F, 1.262738e-38F };
  for ( const float& x : test_values ) {
    const double gold_standard = sqrt((double)x);
    const double estimate = est_sqrt(x);
    const double error = estimate - gold_standard;
    cout << "The error for (" << estimate << " - " << gold_standard << ") is "
         << error;
    if ( gold_standard != 0.0 && std::isfinite(gold_standard) ) {
      const double error_pct = error/gold_standard * 100.0;
      cout << " (" << error_pct << "%).";
    } else
      cout << '.';
    cout << endl;
  }
}
Here is an alternative definition of reinterpret<T,U>() that avoids type-punning. You could also implement the type-pun in modern C, where it’s allowed by standard, and call the function as extern "C". I think type-punning is more elegant, type-safe and consistent with the quasi-functional style of this program than memcpy(). I also don’t think you gain much, because you still could have undefined behavior from a hypothetical trap representation. Also, clang++ 3.9.1 -O -S is able to statically analyze the type-punning version, optimize the variable is_little_endian to the constant 0x1, and eliminate the run-time test, but it can only optimize this version down to a single-instruction stub.
But more importantly, this code isn't guaranteed to work portably on every compiler. For example, some old computers can't even address exactly 32 bits of memory. But in those cases, it should fail to compile and tell you why. No compiler is just suddenly going to break a huge amount of legacy code for no reason. Although the standard technically gives permission to do that and still say it conforms to C++14, it will only happen on an architecture very different from what we expect. And if our assumptions are so invalid that some compiler is going to turn a type-pun between a float and a 32-bit unsigned integer into a dangerous bug, I really doubt the logic behind this code will hold up if we just use memcpy() instead. We want that code to fail at compile time, and to tell us why.
#include <cassert>
#include <cstdint>
#include <cstring>
using std::memcpy;
using std::uint32_t;
template <typename T, typename U> inline T reinterpret(const U &x)
/* Reinterprets the bits of x as a T. Cannot be constexpr
 * in C++14 because it modifies a variable.
 */
{
  static_assert( sizeof(T)==sizeof(U), "" );
  T temp;
  memcpy( &temp, &x, sizeof(T) );
  return temp;
}
constexpr float source = -0.1F;
constexpr uint32_t target = 0x5ee66666UL;
const uint32_t after_rshift = reinterpret<uint32_t,float>(source) >> 1U;
extern const bool is_little_endian = after_rshift == target;
However, Stroustrup et al., in the C++ Core Guidelines, recommend a reinterpret_cast instead:
#include <cassert>
template <typename T, typename U> inline T reinterpret(const U x)
/* Reinterprets the bits of x as a T. Cannot be constexpr
 * in C++14 because it uses reinterpret_cast.
 */
{
  static_assert( sizeof(T)==sizeof(U), "" );
  const U temp alignas(T) alignas(U) = x;
  return *reinterpret_cast<const T*>(&temp);
}
The compilers I tested can also optimize this away to a folded constant. Stroustrup’s reasoning is [sic]:
Accessing the result of an reinterpret_cast to a different type from the objects declared type is still undefined behavior, but at least we can see that something tricky is going on.
From the comments: C++20 introduces std::bit_cast, which converts an object representation to a different type with unspecified, not undefined, behavior. This doesn’t guarantee that your implementation will use the same format of float and int that this code expects, but it doesn’t give the compiler carte blanche to break your program arbitrarily because there’s technically undefined behavior in one line of it. It can also give you a constexpr conversion.
Let $y = \sqrt{x}$.
It follows from the properties of logarithms that
$$\log_2 y = \tfrac{1}{2}\log_2 x \qquad (1)$$
Interpreting a normal float as an integer gives
$$I_x = \mathrm{INT}(x) \approx L\,(\log_2 x + B - \sigma) \qquad (2)$$
where $L = 2^N$, $N$ is the number of bits of the significand, $B$ is the exponent bias, and $\sigma$ is a free factor to tune the approximation.
Combining (1) and (2) gives
$$I_y = \tfrac{1}{2}\left(I_x + L\,(B - \sigma)\right)$$
which is written in the code as (*(int*)&x >> 1) + 0x1fbb4000;
Find the σ so that the constant equals 0x1fbb4000 and determine whether it's optimal.
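Working that exercise for single precision, where N = 23 (so L = 2^23) and B = 127, the constant term 0.5 * L * (B - σ) must equal 0x1fbb4000 = 532365312:

```latex
\tfrac{1}{2} L (B - \sigma) = 2^{22}\,(127 - \sigma) = 532365312
\quad\Longrightarrow\quad 127 - \sigma = \frac{532365312}{4194304} = 126.92578125
\quad\Longrightarrow\quad \sigma = 0.07421875 = \tfrac{19}{256}
```

This only recovers the σ implied by the constant; whether that value is optimal still has to be established separately, for example by exhaustive testing over all floats.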
Adding a wiki test harness that tests all floats.
The approximation is within 4% for many floats, but very poor for subnormal numbers. YMMV.
Worst:1.401298e-45 211749.20%
Worst:1.262738e-38 3.52%
Note that with an argument of +/-0.0, the result is not zero.
printf("% e % e\n", sqrtf(+0.0), sqrt_apx(0.0)); // 0.000000e+00 7.930346e-20
printf("% e % e\n", sqrtf(-0.0), sqrt_apx(-0.0)); // -0.000000e+00 -2.698557e+19
Test code
#include <float.h>
#include <limits.h>
#include <math.h>
#include <stddef.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
float sqrt_apx(float f) {
  const int result = 0x1fbb4000 + (*(int*) &f >> 1);
  return *(float*) &result;
}
double error_value = 0.0;
double error_worst = 0.0;
double error_sum = 0.0;
unsigned long error_count = 0;
void sqrt_test(float f) {
  if (f == 0) return;
  volatile float y0 = sqrtf(f);
  volatile float y1 = sqrt_apx(f);
  double error = (1.0 * y1 - y0) / y0;
  error = fabs(error);
  if (error > error_worst) {
    error_worst = error;
    error_value = f;
  }
  error_sum += error;
  error_count++;
}
void sqrt_tests(float f0, float f1) {
  error_value = error_worst = error_sum = 0.0;
  error_count = 0;
  for (;;) {
    sqrt_test(f0);
    if (f0 == f1) break;
    f0 = nextafterf(f0, f1);
  }
  printf("Worst:%e %.2f%%\n", error_value, error_worst*100.0);
  printf("Average:%.2f%%\n", error_sum / error_count * 100.0);
}
int main() {
  sqrt_tests(FLT_TRUE_MIN, FLT_MIN);
  sqrt_tests(FLT_MIN, FLT_MAX);
  return 0;
}

How to check a double's bit pattern is 0x0 in a C++11 constexpr?

I want to check that a given double/float variable has the actual bit pattern 0x0. Don't ask why, it's used in a function in Qt (qIsNull()) that I'd like to be constexpr.
The original code used a union:
union { double d; int64_t i; } u;
u.d = d;
return u.i == 0;
This doesn't work as a constexpr of course.
The next try was with reinterpret_cast:
return *reinterpret_cast<int64_t*>(&d) == 0;
But while that works as a constexpr in GCC 4.7, it fails (rightfully, b/c of pointer manipulation) in Clang 3.1.
The final idea was to go Alexandrescuesque and do this:
template <typename T1, typename T2>
union Converter {
    T1 t1;
    T2 t2;
    explicit constexpr Converter( T1 t1 ) : t1(t1) {}
    constexpr operator T2() const { return t2; }
};
// in qIsNull():
return Converter<double,int64_t>(d);
But that's not clever enough for Clang, either:
note: read of member 't2' of union with active member 't1' is not allowed in a constant expression
constexpr operator T2() const { return t2; }
Does anyone else have a good idea?
I want to check that a given double/float variable has the actual bit pattern 0x0
But if it's constexpr then it's not checking any variable, it's checking the value that this variable is statically determined to hold. That's why you aren't supposed to pull pointer and union tricks, "officially" there isn't any memory to point at.
If you can persuade your implementation to do non-trapping IEEE division-by-zero, then you could do something like:
return (d == 0) && (1 / d > 0)
Only +/-0 are equal to 0. 1/-0 is -Inf, which isn't greater than 0. 1/+0 is +Inf, which is. But I don't know how to make that non-trapping arithmetic happen.
It seems both clang++ 3.0 and g++ 4.7 (but not 4.6) treat std::signbit as constexpr.
return x == 0 && std::signbit(x) == 0;
It is not possible to look at the underlying bit pattern of a double from within a constant expression. There was a defect in the C++11 standard which allowed such inspection by casting via a void*, but that was addressed by C++ core issue 1312.
As "proof", clang's constexpr implementation (which is considered to be complete) has no mechanism for extracting the representation of a constant double value (other than via non-standard vector operations, and even then there is currently no way to inspect the result).
As others have suggested, if you know you will be targeting a platform which uses IEEE-754 floating point, 0x0 corresponds to the value positive zero. I believe the only way to detect this, that works inside a constant expression in both clang and g++, is to use __builtin_copysign:
constexpr bool isPosZero(double d) {
    return d == 0.0 && __builtin_copysign(1.0, d) == 1.0;
}

Is there a standard sign function (signum, sgn) in C/C++?

I want a function that returns -1 for negative numbers and +1 for positive numbers.
It's easy enough to write my own, but it seems like something that ought to be in a standard library somewhere.
Edit: Specifically, I was looking for a function working on floats.
The type-safe C++ version:
template <typename T> int sgn(T val) {
return (T(0) < val) - (val < T(0));
Actually implements signum (-1, 0, or 1). Implementations here using copysign only return -1 or 1, which is not signum. Also, some implementations here are returning a float (or T) rather than an int, which seems wasteful.
Works for ints, floats, doubles, unsigned shorts, or any custom types constructible from integer 0 and orderable.
Fast! copysign is slow, especially if you need to promote and then narrow again. This is branchless and optimizes excellently
Standards-compliant! The bitshift hack is neat, but only works for some bit representations, and doesn't work when you have an unsigned type. It could be provided as a manual specialization when appropriate.
Accurate! Simple comparisons with zero can maintain the machine's internal high-precision representation (e.g. 80 bit on x87), and avoid a premature round to zero.
It's a template so it might take longer to compile in some circumstances.
Apparently some people think use of a new, somewhat esoteric, and very slow standard library function that doesn't even really implement signum is more understandable.
The < 0 part of the check triggers GCC's -Wtype-limits warning when instantiated for an unsigned type. You can avoid this by using some overloads:
template <typename T> inline constexpr
int signum(T x, std::false_type is_signed) {
    return T(0) < x;
}
template <typename T> inline constexpr
int signum(T x, std::true_type is_signed) {
    return (T(0) < x) - (x < T(0));
}
template <typename T> inline constexpr
int signum(T x) {
    return signum(x, std::is_signed<T>());
}
(Which is a good example of the first caveat.)
I don't know of a standard function for it. Here's an interesting way to write it though:
(x > 0) - (x < 0)
Here's a more readable way to do it:
if (x > 0) return 1;
if (x < 0) return -1;
return 0;
If you like the ternary operator you can do this:
(x > 0) ? 1 : ((x < 0) ? -1 : 0)
There is a C99 math library function called copysign(), which takes the sign from one argument and the absolute value from the other:
result = copysign(1.0, value) // double
result = copysignf(1.0, value) // float
result = copysignl(1.0, value) // long double
will give you a result of +/- 1.0, depending on the sign of value. Note that floating point zeroes are signed: (+0) will yield +1, and (-0) will yield -1.
It seems that most of the answers missed the original question.
Is there a standard sign function (signum, sgn) in C/C++?
Not in the standard library, however there is copysign which can be used almost the same way via copysign(1.0, arg) and there is a true sign function in boost, which might as well be part of the standard.
#include <boost/math/special_functions/sign.hpp>
//Returns 1 if x > 0, -1 if x < 0, and 0 if x is zero.
template <class T>
inline int sign (const T& z);
Apparently, the answer to the original poster's question is no. There is no standard C++ sgn function.
Is there a standard sign function (signum, sgn) in C/C++?
Yes, depending on definition.
C99 and later has the signbit() macro in <math.h>
int signbit(real-floating x);
The signbit macro returns a nonzero value if and only if the sign of its argument value is negative. C11 §
Yet OP wants something a little different.
I want a function that returns -1 for negative numbers and +1 for positive numbers. ... a function working on floats.
#define signbit_p1_or_n1(x) (signbit(x) ? -1 : 1)
OP's question is not specific in the following cases: x = 0.0, -0.0, +NaN, -NaN.
A classic signum() returns +1 on x>0, -1 on x<0 and 0 on x==0.
Many answers have already covered that, but do not address x = -0.0, +NaN, -NaN. Many are geared for an integer point-of-view that usually lacks Not-a-Numbers (NaN) and -0.0.
Typical answers function like signnum_typical(). On -0.0, +NaN, -NaN, they return 0, 0, 0.
int signnum_typical(double x) {
  if (x > 0.0) return 1;
  if (x < 0.0) return -1;
  return 0;
}
Instead, I propose this functionality: On -0.0, +NaN, -NaN, it returns -0.0, +NaN, -NaN.
double signnum_c(double x) {
  if (x > 0.0) return 1.0;
  if (x < 0.0) return -1.0;
  return x;
}
Faster than the above solutions, including the highest rated one:
(x < 0) ? -1 : (x > 0)
There's a way to do it without branching, but it's not very pretty. It yields -1 for negative v and 0 otherwise:
sign = -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));
Lots of other interesting, overly-clever stuff on that page, too...
If all you want is to test the sign, use signbit (returns true if its argument has a negative sign).
Not sure why you would particularly want -1 or +1 returned; copysign is more convenient
for that, but it sounds like it will return +1 for negative zero on some platforms with
only partial support for negative zero, where signbit presumably would return true.
In general, there is no standard signum function in C/C++, and the lack of such a fundamental function tells you a lot about these languages.
Apart from that, I believe both majority viewpoints about the right approach to define such a function are in a way correct, and the "controversy" about it is actually a non-argument once you take into account two important caveats:
A signum function should always return the type of its operand, similarly to an abs() function, because signum is usually used for multiplication with an absolute value after the latter has been processed somehow. Therefore, the major use case of signum is not comparisons but arithmetic, and the latter shouldn't involve any expensive integer-to/from-floating-point conversions.
Floating point types do not feature a single exact zero value: +0.0 can be interpreted as "infinitesimally above zero", and -0.0 as "infinitesimally below zero". That's the reason why comparisons involving zero must internally check against both values, and an expression like x == 0.0 can be dangerous.
Regarding C, I think the best way forward with integral types is indeed to use the (x > 0) - (x < 0) expression, as it should be translated in a branch-free fashion and requires only three basic operations. It is best to define inline functions that enforce a return type matching the argument type, and to add a C11 _Generic macro that maps these functions to a common name.
With floating point values, I think inline functions based on C11 copysignf(1.0f, x), copysign(1.0, x), and copysignl(1.0l, x) are the way to go, simply because they're also highly likely to be branch-free, and additionally do not require casting the result from integer back into a floating point value. You should probably comment prominently that your floating point implementations of signum will not return zero because of the peculiarities of floating point zero values, processing time considerations, and also because it is often very useful in floating point arithmetic to receive the correct -1/+1 sign, even for zero values.
My copy of C in a Nutshell reveals the existence of a standard function called copysign which might be useful. It looks as if copysign(1.0, -2.0) would return -1.0 and copysign(1.0, 2.0) would return +1.0.
Pretty close huh?
The question is old but there is now this kind of desired function. I added a wrapper with not, left shift and dec.
You can use a wrapper function based on signbit from C99 in order to get the exact desired behavior (see code further below).
Returns whether the sign of x is negative.
This can also be applied to infinities, NaNs and zeroes (if zero is unsigned, it is considered positive).
#include <math.h>
int signValue(float a) {
    return ((!signbit(a)) << 1) - 1;
}
NB: I use the not operator ("!") because the return value of signbit is not specified to be 1 (even though the examples suggest it always is) but only true for a negative number:
Return value
A non-zero value (true) if the sign of x is negative; and zero (false) otherwise.
Then I multiply by two with left shift (" << 1") which will give us 2 for a positive number and 0 for a negative one and finally decrement by 1 to obtain 1 and -1 for respectively positive and negative numbers as requested by OP.
The accepted answer with the overload below does indeed not trigger -Wtype-limits. But it does trigger unused-parameter warnings (on the is_signed variable). To avoid these, leave the second parameter unnamed, like so:
template <typename T> inline constexpr
int signum(T x, std::false_type) {
    return T(0) < x;
}
template <typename T> inline constexpr
int signum(T x, std::true_type) {
    return (T(0) < x) - (x < T(0));
}
template <typename T> inline constexpr
int signum(T x) {
    return signum(x, std::is_signed<T>());
}
For C++11 and higher, an alternative could be:
template <typename T>
typename std::enable_if<std::is_unsigned<T>::value, int>::type
inline constexpr signum(T const x) {
    return T(0) < x;
}
template <typename T>
typename std::enable_if<std::is_signed<T>::value, int>::type
inline constexpr signum(T const x) {
    return (T(0) < x) - (x < T(0));
}
For me it does not trigger any warnings on GCC 5.3.1.
No, unlike in MATLAB, it doesn't exist in C++. I use a macro in my programs for this.
#define sign(a) ( ( (a) < 0 ) ? -1 : ( (a) > 0 ) )
Bit off-topic, but I use this:
template<typename T>
constexpr int sgn(const T &a, const T &b) noexcept{
    return (a > b) - (a < b);
}
template<typename T>
constexpr int sgn(const T &a) noexcept{
    return sgn(a, T(0));
}
I found the first function, the one with two arguments, to be much more useful than the "standard" sgn(), because it is most often used in code like this:
// instead of this:
int comp(unsigned a, unsigned b){
    return sgn( int(a) - int(b) );
}
// you can write this:
int comp(unsigned a, unsigned b){
    return sgn(a, b);
}
There is no cast for unsigned types and no additional minus.
In fact, I have this piece of code using sgn():
template <class T>
int comp(const T &a, const T &b){
    if (a < b)
        return -1;
    if (a > b)
        return +1;
    return 0;
}
inline int comp(int const a, int const b){
    return a - b; // note: overflows (UB) when a and b are far apart, e.g. INT_MAX and -1
}
inline int comp(long int const a, long int const b){
    return sgn(a, b);
}
You can use boost::math::sign() method from boost/math/special_functions/sign.hpp if boost is available.
Here's a branching-friendly implementation:
#include <cstdint>
inline int signum(const double x) {
    if(x == 0) return 0;
    return (1 - (static_cast<int>((*reinterpret_cast<const uint64_t*>(&x)) >> 63) << 1));
}
Unless your data has zeros as half of the numbers, here the branch predictor will choose one of the branches as the most common. Both branches only involve simple operations.
Alternatively, on some compilers and CPU architectures a completely branchless version may be faster:
inline int signum(const double x) {
    return (x != 0) *
        (1 - (static_cast<int>((*reinterpret_cast<const uint64_t*>(&x)) >> 63) << 1));
}
This works for the IEEE 754 double-precision binary floating-point format: binary64.
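Dereferencing the reinterpret_cast pointer above violates the strict-aliasing rule. A sketch of the same branchless idea that reads the bits with std::memcpy instead, under the same binary64 assumption (and assuming double and std::uint64_t share byte order); signum_bits is a hypothetical name. A NaN compares unequal to zero, so it yields +1 or -1 depending on its sign bit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Branchless signum for double that avoids strict-aliasing UB.
inline int signum_bits(const double x) {
    static_assert(std::numeric_limits<double>::is_iec559, "needs IEEE-754 double");
    static_assert(sizeof(double) == sizeof(std::uint64_t), "needs 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    const int sign_bit = static_cast<int>(bits >> 63);   // 1 if the sign bit is set
    return (x != 0) * (1 - (sign_bit << 1));             // 0 for +0.0 and -0.0
}
```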
While the integer solution in the accepted answer is quite elegant it bothered me that it wouldn't be able to return NAN for double types, so I modified it slightly.
template <typename T> double sgn(T val) {
    return double((T(0) < val) - (val < T(0)))/(val == val);
}
Note that returning a floating point NAN as opposed to a hard coded NAN causes the sign bit to be set in some implementations, so the outputs for val = -NAN and val = NAN are going to be identical no matter what (if you prefer a "nan" output over a -nan you can put an abs(val) before the return...)
int sign(float n)
{
    union { float f; std::uint32_t i; } u { n };
    return 1 - ((u.i >> 31) << 1);
}
This function assumes:
binary32 representation of floating point numbers
a compiler that makes an exception to the strict aliasing rule when using a named union
double signof(double a) { return (a == 0) ? 0 : (a<0 ? -1 : 1); }
Why use ternary operators and if-else when you can simply do this:
#define sgn(x) ((x) == 0 ? 0 : (x) / abs(x))

Templatized branchless int max/min function

I'm trying to write a branchless function to return the MAX or MIN of two integers without resorting to if (or ?:). Using the usual technique I can do this easily enough for a given word size:
inline int32 imax( int32 a, int32 b )
{
    // signed for arithmetic shift
    int32 mask = a - b;
    // mask < 0 means MSB is 1.
    return a + ( ( b - a ) & ( mask >> 31 ) );
}
Now, assuming arguendo that I really am writing the kind of application on the kind of in-order processor where this is necessary, my question is whether there is a way to use C++ templates to generalize this to all sizes of int.
The >>31 step only works for int32s, of course, and while I could copy out overloads on the function for int8, int16, and int64, it seems like I should use a template function instead. But how do I get the size of a template argument in bits?
Is there a better way to do it than this? Can I force the mask T to be signed? If T is unsigned the mask-shift step won't work (because it'll be a logical rather than arithmetic shift).
template< typename T >
inline T imax( T a, T b )
{
    // how can I force this T to be signed?
    T mask = a - b;
    // I hope the compiler turns the math below into an immediate constant!
    mask = mask >> ( (sizeof(T) * 8) - 1 );
    return a + ( ( b - a ) & mask );
}
And, having done the above, can I prevent it from being used for anything but an integer type (eg, no floats or classes)?
EDIT: This answer predates C++11. Since C++11, the standard library has offered std::make_signed<T> (in <type_traits>) and much more.
Generally, it looks good, but for 100% portability, replace that 8 with CHAR_BIT (or std::numeric_limits<unsigned char>::digits), since chars aren't guaranteed to be 8 bits.
Any good compiler will be smart enough to merge all of the math constants at compile time.
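Putting these suggestions together for C++11 and later, a sketch of the question's template that uses std::make_signed for the mask and CHAR_BIT for the shift width. It assumes two's complement, an arithmetic right shift for negative signed values (implementation-defined before C++20, guaranteed since), and, like the original, that a - b fits in the signed counterpart of T.

```cpp
#include <climits>
#include <type_traits>

// The question's shift trick for any non-bool integer type T.
template <typename T>
inline T imax(T a, T b) {
    static_assert(std::is_integral<T>::value && !std::is_same<T, bool>::value,
                  "imax needs a non-bool integer type");
    using S = typename std::make_signed<T>::type;
    // Negative exactly when a < b (given the no-overflow precondition).
    const S diff = static_cast<S>(a - b);
    // Shift by (bit width - 1): all ones if diff < 0, zero otherwise.
    const S mask = static_cast<S>(diff >> (sizeof(T) * CHAR_BIT - 1));
    // Adds (b - a) when a < b, nothing otherwise.
    return static_cast<T>(a + ((b - a) & static_cast<T>(mask)));
}
```

The static_assert also answers the "no floats or classes" part: any other type fails to compile.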
You can force it to be signed by using a type traits library, which would usually look something like this (assuming your traits template is called numeric_traits):
typename numeric_traits<T>::signed_type x;
An example of a manually rolled numeric_traits header could look like this (there is plenty of room for additions, but you get the idea).
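A minimal hypothetical version of such a header (not taken from the answer) might map each integer type to its signed counterpart and leave the primary template undefined, so that numeric_traits<float> or a class type fails to compile:

```cpp
// Hypothetical hand-rolled traits: one specialization per integer type.
template <typename T> struct numeric_traits;  // primary template left undefined on purpose

template <> struct numeric_traits<char>               { typedef signed char signed_type; };
template <> struct numeric_traits<signed char>        { typedef signed char signed_type; };
template <> struct numeric_traits<unsigned char>      { typedef signed char signed_type; };
template <> struct numeric_traits<short>              { typedef short signed_type; };
template <> struct numeric_traits<unsigned short>     { typedef short signed_type; };
template <> struct numeric_traits<int>                { typedef int signed_type; };
template <> struct numeric_traits<unsigned int>       { typedef int signed_type; };
template <> struct numeric_traits<long>               { typedef long signed_type; };
template <> struct numeric_traits<unsigned long>      { typedef long signed_type; };
template <> struct numeric_traits<long long>          { typedef long long signed_type; };
template <> struct numeric_traits<unsigned long long> { typedef long long signed_type; };
```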
Or, better yet, use Boost:
typename boost::make_signed<T>::type x;
EDIT: IIRC, signed right shifts don't have to be arithmetic. It is common, and certainly the case with every compiler I've used. But I believe the standard leaves it up to the compiler whether right shifts on signed types are arithmetic or not. In my copy of the draft standard, the following is written:
"The value of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 divided by the quantity 2 raised to the power E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined."
But as I said, it will work on every compiler I've seen :-p. (Since C++20, the standard itself defines right shifts of negative signed values as arithmetic.)
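To avoid depending on how negative values shift at all, the mask can come from a comparison instead: -(a < b) is all ones when a < b and zero otherwise. This is the XOR formulation from Sean Anderson's "Bit Twiddling Hacks", swapped in here as an alternative. It also sidesteps the overflow in a - b and works for unsigned types, though whether the compiler emits it without a branch is not guaranteed. The name imax_xor is hypothetical.

```cpp
#include <type_traits>

// max(a, b) = a ^ ((a ^ b) & mask), where mask is all ones if a < b, else 0.
template <typename T>
inline T imax_xor(T a, T b) {
    static_assert(std::is_integral<T>::value, "integer types only");
    const T mask = -static_cast<T>(a < b);  // 0 or all ones, for signed and unsigned T
    // a ^ (a ^ b) == b when the mask is all ones; a ^ 0 == a otherwise.
    return static_cast<T>(a ^ ((a ^ b) & mask));
}
```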
Here's another approach for branchless max and min. What's nice about it is that it doesn't use any bit tricks and you don't have to know anything about the type.
template <typename T>
inline T imax (T a, T b)
{
    return (a > b) * a + (a <= b) * b;
}
template <typename T>
inline T imin (T a, T b)
{
    return (a > b) * b + (a <= b) * a;
}
To achieve your goals, you're best off just writing this:
template<typename T> T max(T a, T b) { return (a > b) ? a : b; }
Long version
I implemented both the "naive" max() and your branchless implementation. Neither was templated (I used int32 to keep things simple), and as far as I can tell, Visual Studio 2017 not only made the naive implementation branchless, it also produced fewer instructions.
Here is the relevant Godbolt (and please, check the implementation to make sure I did it right). Note that I'm compiling with /O2 optimizations.
Admittedly, my assembly-fu isn't all that great, so while NaiveMax() had 5 fewer instructions and no apparent branching (and with inlining, I'm honestly not sure what's happening), I wanted to run a test case to show definitively whether the naive implementation was faster.
So I built a test. Here's the code I ran, built with Visual Studio 2017 (15.8.7) using the "default" Release compiler options.
#include <chrono>
#include <cstdlib>
#include <iostream>
using int32 = long;           // long is 32 bits on Windows/MSVC
using uint32 = unsigned long;
constexpr int32 NaiveMax(int32 a, int32 b)
{
    return (a > b) ? a : b;
}
constexpr int32 FastMax(int32 a, int32 b)
{
    int32 mask = a - b;
    mask = mask >> ((sizeof(int32) * 8) - 1);
    return a + ((b - a) & mask);
}
int main()
{
    int32 resInts[1000] = {};
    int32 lotsOfInts[1'000];
    for (uint32 i = 0; i < 1000; i++)
        lotsOfInts[i] = std::rand();
    // Each lambda is invoked immediately, so naiveTime/fastTime hold elapsed nanoseconds.
    auto naiveTime = [&]() -> auto
    {
        auto start = std::chrono::high_resolution_clock::now();
        for (uint32 i = 1; i < 1'000'000; i++)
        {
            const auto index = i % 1000;
            const auto lastIndex = (i - 1) % 1000;
            resInts[lastIndex] = NaiveMax(lotsOfInts[lastIndex], lotsOfInts[index]);
        }
        auto finish = std::chrono::high_resolution_clock::now();
        return std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start).count();
    }();
    auto fastTime = [&]() -> auto
    {
        auto start = std::chrono::high_resolution_clock::now();
        for (uint32 i = 1; i < 1'000'000; i++)
        {
            const auto index = i % 1000;
            const auto lastIndex = (i - 1) % 1000;
            resInts[lastIndex] = FastMax(lotsOfInts[lastIndex], lotsOfInts[index]);
        }
        auto finish = std::chrono::high_resolution_clock::now();
        return std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start).count();
    }();
    std::cout << "Naive Time: " << naiveTime << std::endl;
    std::cout << "Fast Time: " << fastTime << std::endl;
    return 0;
}
And here's the output I get on my machine:
Naive Time: 2330174
Fast Time: 2492246
I've run it several times and gotten similar results. Just to be safe, I also swapped the order of the two tests, in case a core ramping up in speed was skewing the results. In all cases, I get results similar to the above.
Of course, depending on your compiler or platform, these numbers may all be different. It's worth testing yourself.
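If you do test it yourself, one thing worth guarding against is the optimizer discarding work whose results are never read (in the program above, resInts is written but never read). A minimal sketch of a timing helper that folds every result into a checksum handed back to the caller; the names time_max and checksum are hypothetical, and this is only one way to keep the work observable.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Times maxFn over adjacent pairs in data and stores the sum of all results
// in checksum, so the calls cannot be dropped as dead code.
template <typename MaxFn>
long long time_max(const std::vector<std::int32_t>& data, MaxFn maxFn,
                   std::int64_t& checksum) {
    const auto start = std::chrono::steady_clock::now();
    std::int64_t sum = 0;
    for (std::size_t i = 1; i < data.size(); ++i)
        sum += maxFn(data[i - 1], data[i]);
    const auto finish = std::chrono::steady_clock::now();
    checksum = sum;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start).count();
}
```

Printing (or otherwise using) the checksum after each run keeps the comparison honest.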
The Answer
In brief, it would seem that the best way to write a branchless templated max() function is probably to keep it simple:
template<typename T> T max(T a, T b) { return (a > b) ? a : b; }
There are additional upsides to the naive method:
It works for unsigned types.
It even works for floating-point types.
It expresses exactly what you intend, rather than needing to comment up your code describing what the bit-twiddling is doing.
It is a well-known, recognizable pattern, so most compilers will know exactly how to optimize it, making it more portable. (This is a gut hunch of mine, backed only by personal experience of compilers surprising me a lot. I'm willing to admit I'm wrong here.)
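The unsigned point is easy to check. In this sketch, shift_max is a hypothetical name for the question's shift trick instantiated generically: for unsigned types the right shift is logical, so the mask ends up 0 or 1 rather than all ones, and shift_max(1u, 3u) returns 1. The naive version handles unsigned and floating-point arguments alike.

```cpp
#include <climits>

// The answer's naive max, renamed to avoid clashing with std::max.
template <typename T> T naive_max(T a, T b) { return (a > b) ? a : b; }

// The question's shift trick, templated (hypothetical name).
// For unsigned T, >> is a logical shift, so mask is 0 or 1, not 0 or all ones.
template <typename T> T shift_max(T a, T b) {
    T mask = a - b;
    mask = mask >> (sizeof(T) * CHAR_BIT - 1);
    return a + ((b - a) & mask);
}
```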
You may want to look at the Boost.TypeTraits library. For detecting whether a type is signed, you can use the is_signed trait. You can also look into enable_if/disable_if to remove overloads for certain types.
I don't know the exact conditions under which this bit-mask trick works, but you can do something like this:
template<typename T, typename = std::enable_if_t<std::is_integral<T>{}> >
inline T imax( T a, T b )
Other useful candidates are std::is_[un]signed, std::is_fundamental, etc.
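A complete sketch of the enable_if approach, with a simple body standing in for the bit-mask version. The detection helper has_imax is hypothetical and only demonstrates that imax(double, double) drops out of overload resolution instead of compiling.

```cpp
#include <type_traits>
#include <utility>

// Only integral types take part in overload resolution.
template <typename T, typename = std::enable_if_t<std::is_integral<T>{}>>
inline T imax(T a, T b) {
    return (a > b) ? a : b;  // substitute the bit-mask body if desired
}

// Hypothetical detection helper: true if imax(T, T) is a valid call.
template <typename T, typename = void>
struct has_imax : std::false_type {};

template <typename T>
struct has_imax<T, decltype(void(imax(std::declval<T>(), std::declval<T>())))>
    : std::true_type {};
```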
In addition to the "tl;dr" of tloch14's answer, one can also index into an array. This avoids the unwieldy bit-shuffling of the "branchless min/max", and it generalizes to all types.
template<typename T> constexpr T OtherFastMax(const T &a, const T &b)
{
    const T (&p)[2] = {a, b};
    return p[a < b]; // a < b selects p[1] == b; otherwise p[0] == a
}
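The same indexing idea gives a minimum by changing which comparison picks the index. A sketch with OtherFastMin as a hypothetical companion name; indexing memory with a comparison result is not guaranteed to beat a conditional move, so it's worth measuring.

```cpp
// Index 0 or 1 is chosen by a comparison, so no explicit branch is written.
template <typename T> constexpr T OtherFastMin(const T& a, const T& b) {
    const T p[2] = {a, b};
    return p[b < a];  // b < a selects p[1] == b; otherwise p[0] == a
}
```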