How to write a portable constexpr std::copysign()? - c++

In particular, it must work with NaNs as std::copysign does. Similarly, I need a constexpr std::signbit.
constexpr double copysign(double mag, double sgn)
{
// how?
}
constexpr bool signbit(double arg)
{
// how?
}
// produce the two types of NaNs
constexpr double nan_pos = copysign(std::numeric_limits<double>::quiet_NaN(), +1);
constexpr double nan_neg = copysign(std::numeric_limits<double>::quiet_NaN(), -1);
// must pass the checks
static_assert(signbit(nan_pos) == false);
static_assert(signbit(nan_neg) == true);
The story behind is that I need two types of NaNs at compile time, as well as ways to distinguish between them. The most straightforward way I can think of is to manipulate the sign bit of the NaNs. It does work at run time; now I just want to move some computations to compile time, and this is the last hurdle.
Notes: at the moment, I'm relying on GCC, as it has built-in versions of these functions and they are indeed constexpr, which is nice. But I want my codebase to compile on Clang and perhaps other compilers too.

P0533: constexpr for <cmath> and <cstdlib> is now accepted.
Starting in C++23 you can just use std::copysign and std::syncbit.

Use of __builtin... is not really portable, but works in compilers that mentioned as target. __builtin_copysign is contexpr, but __builtin_signbit is apparently not on clang, so doing signbit with __builtin_copysign:
#include <limits>
constexpr double copysign(double mag, double sgn)
{
return __builtin_copysign(mag, sgn);
}
constexpr bool signbit(double arg)
{
return __builtin_copysign(1, arg) < 0;
}
// produce the two types of NaNs
constexpr double nan_pos = copysign(std::numeric_limits<double>::quiet_NaN(), +1);
constexpr double nan_neg = copysign(std::numeric_limits<double>::quiet_NaN(), -1);
// must pass the checks
static_assert(signbit(nan_pos) == false);
static_assert(signbit(nan_neg) == true);
int main() {}
https://godbolt.org/z/8Wafaj4a4

If you can use std::bit_cast, you can manipulate floating point types cast to integer types. The portability is limited to the representation of double, but if you can assume the IEEE 754 double-precision binary floating-point format, cast to uint64_t and using sign bit should work.

Related

Efficient division of an int by intmax

I have an integer of type uint32_t and would like to divide it by a maximum value of uint32_t and obtain the result as a float (in range 0..1).
Naturally, I can do the following:
float result = static_cast<float>(static_cast<double>(value) / static_cast<double>(std::numeric_limits<uint32_t>::max()))
This is however quite a lot of conversions on the way, and a the division itself may be expensive.
Is there a way to achieve the above operation faster, without division and excess type conversions? Or maybe I shouldn't worry because modern compilers are able to generate an efficient code already?
Edit: division by MAX+1, effectively giving me a float in range [0..1) would be fine too.
A bit more context:
I use the above transformation in a time-critical loop, with uint32_t being produced from a relatively fast random-number generator (such as pcg). I expect that the conversions/divisions from the above transformation may have some noticable, albeit not overwhelming, negative impact on the performance of my code.
This sounds like a job for:
std::uniform_real_distribution<float> dist(0.f, 1.f);
I would trust that to give you an unbiased conversion to float in the range [0, 1) as efficiently as possible. If you want the range to be [0, 1] you could use this:
std::uniform_real_distribution<float> dist(0.f, std::nextafter(1.f, 2.f))
Here's an example with two instances of a not-so-random number generator that generates min and max for uint32_t:
#include <iostream>
#include <limits>
#include <random>
struct ui32gen {
constexpr ui32gen(uint32_t x) : value(x) {}
uint32_t operator()() { return value; }
static constexpr uint32_t min() { return 0; }
static constexpr uint32_t max() { return std::numeric_limits<uint32_t>::max(); }
uint32_t value;
};
int main() {
ui32gen min(ui32gen::min());
ui32gen max(ui32gen::max());
std::uniform_real_distribution<float> dist(0.f, 1.f);
std::cout << dist(min) << "\n";
std::cout << dist(max) << "\n";
}
Output:
0
1
Is there a way to achieve the operation faster, without division
and excess type conversions?
If you want to manually do something similar to what uniform_real_distribution does (but much faster, and slightly biased towards lower values), you can define a function like this:
// [0, 1) the common range
inline float zero_to_one_exclusive(uint32_t value) {
static const float f_mul =
std::nextafter(1.f / float(std::numeric_limits<uint32_t>::max()), 0.f);
return float(value) * f_mul;
}
It uses multiplication instead of division since that often is a bit faster (than your original suggestion) and only has one type conversion. Here's a comparison of division vs. multiplication.
If you really want the range to be [0, 1], you can do like below, which will also be slightly biased towards lower values compared to what std::uniform_real_distribution<float> dist(0.f, std::nextafter(1.f, 2.f)) would produce:
// [0, 1] the not so common range
inline float zero_to_one_inclusive(uint32_t value) {
static const float f_mul = 1.f/float(std::numeric_limits<uint32_t>::max());
return float(value) * f_mul;
}
Here's a benchmark comparing uniform_real_distribution to zero_to_one_exclusive and zero_to_one_inclusive.
Two of the casts are superfluous. You dont need to cast to float when anyhow you assign to a float. Also it is sufficient to cast one of the operands to avoid integer arithmetics. So we are left with
float result = static_cast<double>(value) / std::numeric_limits<int>::max();
This last cast you cannot avoid (otherwise you would get integer arithmetics).
Or maybe I shouldn't worry because modern compilers are able to
generate an efficient code already?
Definitely a yes and no! Yes, trust the compiler that it knows best to optimize code and write for readability first. And no, dont blindy trust. Look at the output of the compiler. Compare different versions and measure them.
Is there a way to achieve the above operation faster, without division
[...] ?
Probably yes. Dividing by std::numeric_limits<int>::max() is so special, that I wouldn't be too surprised if the compiler comes with some tricks. My first approach would again be to look at the output of the compiler and maybe compare different compilers. Only if the compilers output turns out to be suboptimal I'd bother to enter some manual bit-fiddling.
For further reading this might be of interest: How expensive is it to convert between int and double? . TL;DR: it actually depends on the hardware.
If performance were a real concern I think I'd be inclined to represent this 'integer that is really a fraction' in its own class and perform any conversion only where necessary.
For example:
#include <iostream>
#include <cstdint>
#include <limits>
struct fraction
{
using value_type = std::uint32_t;
constexpr explicit fraction(value_type num = 0) : numerator_(num) {}
static constexpr auto denominator() -> value_type { return std::numeric_limits<value_type>::max(); }
constexpr auto numerator() const -> value_type { return numerator_; }
constexpr auto as_double() const -> double {
return double(numerator()) / denominator();
}
constexpr auto as_float() const -> float {
return float(as_double());
}
private:
value_type numerator_;
};
auto generate() -> std::uint32_t;
int main()
{
auto frac = fraction(generate());
// use/manipulate/display frac here ...
// ... and finally convert to double/float if necessary
std::cout << frac.as_double() << std::endl;
}
However if you look at code gen on godbolt you'll see that the CPU's floating point instructions take care of the conversion. I'd be inclined to measure performance before you run the risk of wasting time on early optimisation.

How does this float square root approximation work?

I found a rather strange but working square root approximation for floats; I really don't get it. Can someone explain me why this code works?
float sqrt(float f)
{
const int result = 0x1fbb4000 + (*(int*)&f >> 1);
return *(float*)&result;
}
I've test it a bit and it outputs values off of std::sqrt() by about 1 to 3%. I know of the Quake III's fast inverse square root and I guess it's something similar here (without the newton iteration) but I'd really appreciate an explanation of how it works.
(nota: I've tagged it both c and c++ since it's both valid-ish (see comments) C and C++ code)
(*(int*)&f >> 1) right-shifts the bitwise representation of f. This almost divides the exponent by two, which is approximately equivalent to taking the square root.1
Why almost? In IEEE-754, the actual exponent is e - 127.2 To divide this by two, we'd need e/2 - 64, but the above approximation only gives us e/2 - 127. So we need to add on 63 to the resulting exponent. This is contributed by bits 30-23 of that magic constant (0x1fbb4000).
I'd imagine the remaining bits of the magic constant have been chosen to minimise the maximum error across the mantissa range, or something like that. However, it's unclear whether it was determined analytically, iteratively, or heuristically.
It's worth pointing out that this approach is somewhat non-portable. It makes (at least) the following assumptions:
The platform uses single-precision IEEE-754 for float.
The endianness of float representation.
That you will be unaffected by undefined behaviour due to the fact this approach violates C/C++'s strict-aliasing rules.
Thus it should be avoided unless you're certain that it gives predictable behaviour on your platform (and indeed, that it provides a useful speedup vs. sqrtf!).
1. sqrt(a^b) = (a^b)^0.5 = a^(b/2)
2. See e.g. https://en.wikipedia.org/wiki/Single-precision_floating-point_format#Exponent_encoding
See Oliver Charlesworth’s explanation of why this almost works. I’m addressing an issue raised in the comments.
Since several people have pointed out the non-portability of this, here are some ways you can make it more portable, or at least make the compiler tell you if it won’t work.
First, C++ allows you to check std::numeric_limits<float>::is_iec559 at compile time, such as in a static_assert. You can also check that sizeof(int) == sizeof(float), which will not be true if int is 64-bits, but what you really want to do is use uint32_t, which if it exists will always be exactly 32 bits wide, will have well-defined behavior with shifts and overflow, and will cause a compilation error if your weird architecture has no such integral type. Either way, you should also static_assert() that the types have the same size. Static assertions have no run-time cost and you should always check your preconditions this way if possible.
Unfortunately, the test of whether converting the bits in a float to a uint32_t and shifting is big-endian, little-endian or neither cannot be computed as a compile-time constant expression. Here, I put the run-time check in the part of the code that depends on it, but you might want to put it in the initialization and do it once. In practice, both gcc and clang can optimize this test away at compile time.
You do not want to use the unsafe pointer cast, and there are some systems I’ve worked on in the real world where that could crash the program with a bus error. The maximally-portable way to convert object representations is with memcpy(). In my example below, I type-pun with a union, which works on any actually-existing implementation. (Language lawyers object to it, but no successful compiler will ever break that much legacy code silently.) If you must do a pointer conversion (see below) there is alignas(). But however you do it, the result will be implementation-defined, which is why we check the result of converting and shifting a test value.
Anyway, not that you’re likely to use it on a modern CPU, here’s a gussied-up C++14 version that checks those non-portable assumptions:
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <limits>
#include <vector>
using std::cout;
using std::endl;
using std::size_t;
using std::sqrt;
using std::uint32_t;
template <typename T, typename U>
inline T reinterpret(const U x)
/* Reinterprets the bits of x as a T. Cannot be constexpr
* in C++14 because it reads an inactive union member.
*/
{
static_assert( sizeof(T)==sizeof(U), "" );
union tu_pun {
U u = U();
T t;
};
const tu_pun pun{x};
return pun.t;
}
constexpr float source = -0.1F;
constexpr uint32_t target = 0x5ee66666UL;
const uint32_t after_rshift = reinterpret<uint32_t,float>(source) >> 1U;
const bool is_little_endian = after_rshift == target;
float est_sqrt(const float x)
/* A fast approximation of sqrt(x) that works less well for subnormal numbers.
*/
{
static_assert( std::numeric_limits<float>::is_iec559, "" );
assert(is_little_endian); // Could provide alternative big-endian code.
/* The algorithm relies on the bit representation of normal IEEE floats, so
* a subnormal number as input might be considered a domain error as well?
*/
if ( std::isless(x, 0.0F) || !std::isfinite(x) )
return std::numeric_limits<float>::signaling_NaN();
constexpr uint32_t magic_number = 0x1fbb4000UL;
const uint32_t raw_bits = reinterpret<uint32_t,float>(x);
const uint32_t rejiggered_bits = (raw_bits >> 1U) + magic_number;
return reinterpret<float,uint32_t>(rejiggered_bits);
}
int main(void)
{
static const std::vector<float> test_values{
4.0F, 0.01F, 0.0F, 5e20F, 5e-20F, 1.262738e-38F };
for ( const float& x : test_values ) {
const double gold_standard = sqrt((double)x);
const double estimate = est_sqrt(x);
const double error = estimate - gold_standard;
cout << "The error for (" << estimate << " - " << gold_standard << ") is "
<< error;
if ( gold_standard != 0.0 && std::isfinite(gold_standard) ) {
const double error_pct = error/gold_standard * 100.0;
cout << " (" << error_pct << "%).";
} else
cout << '.';
cout << endl;
}
return EXIT_SUCCESS;
}
Update
Here is an alternative definition of reinterpret<T,U>() that avoids type-punning. You could also implement the type-pun in modern C, where it’s allowed by standard, and call the function as extern "C". I think type-punning is more elegant, type-safe and consistent with the quasi-functional style of this program than memcpy(). I also don’t think you gain much, because you still could have undefined behavior from a hypothetical trap representation. Also, clang++ 3.9.1 -O -S is able to statically analyze the type-punning version, optimize the variable is_little_endian to the constant 0x1, and eliminate the run-time test, but it can only optimize this version down to a single-instruction stub.
But more importantly, this code isn’t guaranteed to work portably on every compiler. For example, some old computers can’t even address exactly 32 bits of memory. But in those cases, it should fail to compile and tell you why. No compiler is just suddenly going to break a huge amount of legacy code for no reason. Although the standard technically gives permission to do that and still say it conforms to C++14, it will only happen on an architecture very different from we expect. And if our assumptions are so invalid that some compiler is going to turn a type-pun between a float and a 32-bit unsigned integer into a dangerous bug, I really doubt the logic behind this code will hold up if we just use memcpy() instead. We want that code to fail at compile time, and to tell us why.
#include <cassert>
#include <cstdint>
#include <cstring>
using std::memcpy;
using std::uint32_t;
template <typename T, typename U> inline T reinterpret(const U &x)
/* Reinterprets the bits of x as a T. Cannot be constexpr
* in C++14 because it modifies a variable.
*/
{
static_assert( sizeof(T)==sizeof(U), "" );
T temp;
memcpy( &temp, &x, sizeof(T) );
return temp;
}
constexpr float source = -0.1F;
constexpr uint32_t target = 0x5ee66666UL;
const uint32_t after_rshift = reinterpret<uint32_t,float>(source) >> 1U;
extern const bool is_little_endian = after_rshift == target;
However, Stroustrup et al., in the C++ Core Guidelines, recommend a reinterpret_cast instead:
#include <cassert>
template <typename T, typename U> inline T reinterpret(const U x)
/* Reinterprets the bits of x as a T. Cannot be constexpr
* in C++14 because it uses reinterpret_cast.
*/
{
static_assert( sizeof(T)==sizeof(U), "" );
const U temp alignas(T) alignas(U) = x;
return *reinterpret_cast<const T*>(&temp);
}
The compilers I tested can also optimize this away to a folded constant. Stroustrup’s reasoning is [sic]:
Accessing the result of an reinterpret_cast to a different type from the objects declared type is still undefined behavior, but at least we can see that something tricky is going on.
Update
From the comments: C++20 introduces std::bit_cast, which converts an object representation to a different type with unspecified, not undefined, behavior. This doesn’t guarantee that your implementation will use the same format of float and int that this code expects, but it doesn’t give the compiler carte blanche to break your program arbitrarily because there’s technically undefined behavior in one line of it. It can also give you a constexpr conversion.
Let y = sqrt(x),
it follows from the properties of logarithms that log(y) = 0.5 * log(x) (1)
Interpreting a normal float as an integer gives INT(x) = Ix = L * (log(x) + B - σ) (2)
where L = 2^N, N the number of bits of the significand, B is the exponent bias, and σ is a free factor to tune the approximation.
Combining (1) and (2) gives: Iy = 0.5 * (Ix + (L * (B - σ)))
Which is written in the code as (*(int*)&x >> 1) + 0x1fbb4000;
Find the σ so that the constant equals 0x1fbb4000 and determine whether it's optimal.
Adding a wiki test harness to test all float.
The approximation is within 4% for many float, but very poor for sub-normal numbers. YMMV
Worst:1.401298e-45 211749.20%
Average:0.63%
Worst:1.262738e-38 3.52%
Average:0.02%
Note that with argument of +/-0.0, the result is not zero.
printf("% e % e\n", sqrtf(+0.0), sqrt_apx(0.0)); // 0.000000e+00 7.930346e-20
printf("% e % e\n", sqrtf(-0.0), sqrt_apx(-0.0)); // -0.000000e+00 -2.698557e+19
Test code
#include <float.h>
#include <limits.h>
#include <math.h>
#include <stddef.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
float sqrt_apx(float f) {
const int result = 0x1fbb4000 + (*(int*) &f >> 1);
return *(float*) &result;
}
double error_value = 0.0;
double error_worst = 0.0;
double error_sum = 0.0;
unsigned long error_count = 0;
void sqrt_test(float f) {
if (f == 0) return;
volatile float y0 = sqrtf(f);
volatile float y1 = sqrt_apx(f);
double error = (1.0 * y1 - y0) / y0;
error = fabs(error);
if (error > error_worst) {
error_worst = error;
error_value = f;
}
error_sum += error;
error_count++;
}
void sqrt_tests(float f0, float f1) {
error_value = error_worst = error_sum = 0.0;
error_count = 0;
for (;;) {
sqrt_test(f0);
if (f0 == f1) break;
f0 = nextafterf(f0, f1);
}
printf("Worst:%e %.2f%%\n", error_value, error_worst*100.0);
printf("Average:%.2f%%\n", error_sum / error_count);
fflush(stdout);
}
int main() {
sqrt_tests(FLT_TRUE_MIN, FLT_MIN);
sqrt_tests(FLT_MIN, FLT_MAX);
return 0;
}

Normalized Integer to/from Float Conversion

I need to convert normalized integer values to and from real floating-point values. For instance, for int16_t, a value of 1.0 is represented by 32767 and -1.0 is represented by -32768. Although it's a bit tedious to do this for each integer type, both signed and unsigned, it's still easy enough to write by hand.
However, I want to use standard methods whenever possible rather than going off and reinventing the wheel, so what I'm looking for is something like a standard C or C++ header, a Boost library, or some other small, portable, easily-incorporated source that already performs these conversions.
Here's a templated solution using std::numeric_limits:
#include <cstdint>
#include <limits>
template <typename T>
constexpr double normalize (T value) {
return value < 0
? -static_cast<double>(value) / std::numeric_limits<T>::min()
: static_cast<double>(value) / std::numeric_limits<T>::max()
;
}
int main () {
// Test cases evaluated at compile time.
static_assert(normalize(int16_t(32767)) == 1, "");
static_assert(normalize(int16_t(0)) == 0, "");
static_assert(normalize(int16_t(-32768)) == -1, "");
static_assert(normalize(int16_t(-16384)) == -0.5, "");
static_assert(normalize(uint16_t(65535)) == 1, "");
static_assert(normalize(uint16_t(0)) == 0, "");
}
This handles both signed and unsigned integers, and 0 does normalize to 0.
View Successful Compilation Result
I'd question whether your intent is correct here (or indeed that of most of the answers).
Since you're likely just dealing with something like an integer representation of a "real" value such as that produced by an ADC - I'd argue that in fact a floating point fraction of +32767/32768 (not +1.0) is represented by the integer +32767, as a value of +1.0 can't actually be expressed in this form due to the 2's complement arithmetic used.
Although it's a bit tedious to do this for each integer type, both
signed and unsigned, it's still easy enough to write by hand.
You certainly don't need to do this for each integer type! Use <limits> instead.
template<class T> double AsDouble(const T x) {
const double valMin = std::numeric_limits<T>::min();
const double valMax = std::numeric_limits<T>::max();
return 2 * (x - valMin) / (valMax - valMin) - 1; // note: 0 does not become 0.
}

Range analysis of floating point values?

I have an image processing program which uses floating point calculations. However, I need to port it to a processor which does not have floating point support in it. So, I have to change the program to use fixed point calculations. For that I need proper scaling of those floating point numbers, for which I need to know the range of all values, including intermediate values of the floating point calculations.
Is there a method where I just run the program and it automatically give me the range of all the floating point calculations in the program? Trying to figure out the ranges manually would be too cumbersome, so if there is some tool for doing it, that would be awesome!
You could use some "measuring" replacement for your floating type, along these lines (live example):
template<typename T>
class foo
{
T val;
using lim = std::numeric_limits<int>;
static int& min_val() { static int e = lim::max(); return e; }
static int& max_val() { static int e = lim::min(); return e; }
static void sync_min(T e) { if (e < min_val()) min_val() = int(e); }
static void sync_max(T e) { if (e > max_val()) max_val() = int(e); }
static void sync(T v)
{
v = std::abs(v);
T e = v == 0 ? T(1) : std::log10(v);
sync_min(std::floor(e)); sync_max(std::ceil(e));
}
public:
foo(T v = T()) : val(v) { sync(v); }
foo& operator=(T v) { val = v; sync(v); return *this; }
template<typename U> foo(U v) : foo(T(v)) {}
template<typename U> foo& operator=(U v) { return *this = T(v); }
operator T&() { return val; }
operator const T&() const { return val; }
static int min() { return min_val(); }
static int max() { return max_val(); }
};
to be used like
int main ()
{
using F = foo<float>;
F x;
for (F e = -10.2; e <= 30.4; e += .2)
x = std::pow(10, e);
std::cout << F::min() << " " << F::max() << std::endl; // -11 31
}
This means you need to define an alias (say, Float) for your floating type (float or double) and use it consistently throughout your program. This may be inconvenient but it may prove beneficial eventually (because then your program is more generic). If your code is already templated on the floating type, even better.
After this parametrization, you can switch your program to "measuring" or "release" mode by defining Float to be either foo<T> or T, where T is your float or double.
The good thing is that you don't need external tools, your own code carries out the measurements. The bad thing is that, as currently designed, it won't catch all intermediate results. You would have to define all (e.g. arithmetic) operators on foo for this. This can be done but needs some more work.
It is not true that you cannot use floating point code on hardware that does not support floating point - the compiler will provide software routines to perform floating point operations - they just may be rather slow - but if it is fast enough for your application , that is the path of least resistance.
It is probably simplest to implement a fixed point data type class and have its member functions detect over/underflow as a debug option (because the checking will otherwise slow your code).
I suggest you look at Anthony Williams' fixed-Point math C++ library. It is in C++ and defines a fixed class with extensive function and operator overloading, so it can largely be used simply by replacing float or double in your existing code with fixed. It uses int64_t as the underlying integer data type, with 34 integer bits and 28 fractional bits (34Q28), so is good for about 8 decimal places and a wider range than int32_t.
It does not have the under/overflow checking I suggested, but it is a good starting point for you to add your own.
On 32bit ARM this library performs about 5 times faster than software-floating point and is comparable in performance to ARM's VFP unit for C code.
Note that the sqrt() function in this library has poor precision performance for very small values as it looses lower-order bits in intermediate calculations that can be preserved. It can be improved by replacing it with the code the version I presented in this question.
For self-contained C programs, you can use Frama-C's value analysis to obtain ranges for the floating-point variables, for instance h below:
And variable g computed from h:
There is a specification language to describe the ranges of the inputs (information without which it is difficult to say anything informative). In the example above, I used that language to specify what function float_interval was expected to do:
/*# ensures \is_finite(\result) && l <= \result <= u ; */
float float_interval(float l, float u);
Frama-C is easiest to install on Linux, with Debian and Ubuntu binary packages for a recent (but usually not the latest) version available from within the distribution.
If you could post your code, it would help telling if if this approach is realistic. If your code is C++, for instance (your question does not say, it is tagged with several language tags), then the current version of Frama-C will be no help as it only accepts C programs.

Is there a standard sign function (signum, sgn) in C/C++?

I want a function that returns -1 for negative numbers and +1 for positive numbers.
http://en.wikipedia.org/wiki/Sign_function
It's easy enough to write my own, but it seems like something that ought to be in a standard library somewhere.
Edit: Specifically, I was looking for a function working on floats.
The type-safe C++ version:
template <typename T> int sgn(T val) {
return (T(0) < val) - (val < T(0));
}
Benefits:
Actually implements signum (-1, 0, or 1). Implementations here using copysign only return -1 or 1, which is not signum. Also, some implementations here are returning a float (or T) rather than an int, which seems wasteful.
Works for ints, floats, doubles, unsigned shorts, or any custom types constructible from integer 0 and orderable.
Fast! copysign is slow, especially if you need to promote and then narrow again. This is branchless and optimizes excellently
Standards-compliant! The bitshift hack is neat, but only works for some bit representations, and doesn't work when you have an unsigned type. It could be provided as a manual specialization when appropriate.
Accurate! Simple comparisons with zero can maintain the machine's internal high-precision representation (e.g. 80 bit on x87), and avoid a premature round to zero.
Caveats:
It's a template so it might take longer to compile in some circumstances.
Apparently some people think use of a new, somewhat esoteric, and very slow standard library function that doesn't even really implement signum is more understandable.
The < 0 part of the check triggers GCC's -Wtype-limits warning when instantiated for an unsigned type. You can avoid this by using some overloads:
template <typename T> inline constexpr
int signum(T x, std::false_type is_signed) {
return T(0) < x;
}
template <typename T> inline constexpr
int signum(T x, std::true_type is_signed) {
return (T(0) < x) - (x < T(0));
}
template <typename T> inline constexpr
int signum(T x) {
return signum(x, std::is_signed<T>());
}
(Which is a good example of the first caveat.)
I don't know of a standard function for it. Here's an interesting way to write it though:
(x > 0) - (x < 0)
Here's a more readable way to do it:
if (x > 0) return 1;
if (x < 0) return -1;
return 0;
If you like the ternary operator you can do this:
(x > 0) ? 1 : ((x < 0) ? -1 : 0)
There is a C99 math library function called copysign(), which takes the sign from one argument and the absolute value from the other:
result = copysign(1.0, value) // double
result = copysignf(1.0, value) // float
result = copysignl(1.0, value) // long double
will give you a result of +/- 1.0, depending on the sign of value. Note that floating point zeroes are signed: (+0) will yield +1, and (-0) will yield -1.
It seems that most of the answers missed the original question.
Is there a standard sign function (signum, sgn) in C/C++?
Not in the standard library, however there is copysign which can be used almost the same way via copysign(1.0, arg) and there is a true sign function in boost, which might as well be part of the standard.
#include <boost/math/special_functions/sign.hpp>
//Returns 1 if x > 0, -1 if x < 0, and 0 if x is zero.
template <class T>
inline int sign (const T& z);
Apparently, the answer to the original poster's question is no. There is no standard C++ sgn function.
Is there a standard sign function (signum, sgn) in C/C++?
Yes, depending on definition.
C99 and later has the signbit() macro in <math.h>
int signbit(real-floating x);
The signbit macro returns a nonzero value if and only if the sign of its argument value is negative. C11 §7.12.3.6
Yet OP wants something a little different.
I want a function that returns -1 for negative numbers and +1 for positive numbers. ... a function working on floats.
#define signbit_p1_or_n1(x) ((signbit(x) ? -1 : 1)
Deeper:
OP's question is not specific in the following cases: x = 0.0, -0.0, +NaN, -NaN.
A classic signum() returns +1 on x>0, -1 on x<0 and 0 on x==0.
Many answers have already covered that, but do not address x = -0.0, +NaN, -NaN. Many are geared for an integer point-of-view that usually lacks Not-a-Numbers (NaN) and -0.0.
Typical answers function like signnum_typical() On -0.0, +NaN, -NaN, they return 0.0, 0.0, 0.0.
int signnum_typical(double x) {
if (x > 0.0) return 1;
if (x < 0.0) return -1;
return 0;
}
Instead, I propose this functionality: On -0.0, +NaN, -NaN, it returns -0.0, +NaN, -NaN.
double signnum_c(double x) {
if (x > 0.0) return 1.0;
if (x < 0.0) return -1.0;
return x;
}
Faster than the above solutions, including the highest rated one:
(x < 0) ? -1 : (x > 0)
There's a way to do it without branching, but it's not very pretty.
sign = -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));
http://graphics.stanford.edu/~seander/bithacks.html
Lots of other interesting, overly-clever stuff on that page, too...
If all you want is to test the sign, use signbit (returns true if its argument has a negative sign).
Not sure why you would particularly want -1 or +1 returned; copysign is more convenient
for that, but it sounds like it will return +1 for negative zero on some platforms with
only partial support for negative zero, where signbit presumably would return true.
In general, there is no standard signum function in C/C++, and the lack of such a fundamental function tells you a lot about these languages.
Apart from that, I believe both majority viewpoints about the right approach to define such a function are in a way correct, and the "controversy" about it is actually a non-argument once you take into account two important caveats:
A signum function should always return the type of its operand, similarly to an abs() function, because signum is usually used for multiplication with an absolute value after the latter has been processed somehow. Therefore, the major use case of signum is not comparisons but arithmetic, and the latter shouldn't involve any expensive integer-to/from-floating-point conversions.
Floating point types do not feature a single exact zero value: +0.0 can be interpreted as "infinitesimally above zero", and -0.0 as "infinitesimally below zero". That's the reason why comparisons involving zero must internally check against both values, and an expression like x == 0.0 can be dangerous.
Regarding C, I think the best way forward with integral types is indeed to use the (x > 0) - (x < 0) expression, as it should be translated in a branch-free fashion, and requires only three basic operations. Best define inline functions that enforce a return type matching the argument type, and add a C11 define _Generic to map these functions to a common name.
With floating point values, I think inline functions based on C11 copysignf(1.0f, x), copysign(1.0, x), and copysignl(1.0l, x) are the way to go, simply because they're also highly likely to be branch-free, and additionally do not require casting the result from integer back into a floating point value. You should probably comment prominently that your floating point implementations of signum will not return zero because of the peculiarities of floating point zero values, processing time considerations, and also because it is often very useful in floating point arithmetic to receive the correct -1/+1 sign, even for zero values.
My copy of C in a Nutshell reveals the existence of a standard function called copysign which might be useful. It looks as if copysign(1.0, -2.0) would return -1.0 and copysign(1.0, 2.0) would return +1.0.
Pretty close huh?
The question is old but there is now this kind of desired function. I added a wrapper with not, left shift and dec.
You can use a wrapper function based on signbit from C99 in order to get the exact desired behavior (see code further below).
Returns whether the sign of x is negative.
This can be also applied to infinites, NaNs and zeroes (if zero is unsigned, it is considered positive
#include <math.h>
int signValue(float a) {
return ((!signbit(a)) << 1) - 1;
}
NB: I use operand not ("!") because the return value of signbit is not specified to be 1 (even though the examples let us think it would always be this way) but true for a negative number:
Return value
A non-zero value (true) if the sign of x is negative; and zero (false) otherwise.
Then I multiply by two with left shift (" << 1") which will give us 2 for a positive number and 0 for a negative one and finally decrement by 1 to obtain 1 and -1 for respectively positive and negative numbers as requested by OP.
The accepted answer with the overload below does indeed not trigger -Wtype-limits. But it does trigger unused argument warnings (on the is_signed variable). To avoid these the second argument should not be named like so:
template <typename T> inline constexpr
int signum(T x, std::false_type) {
return T(0) < x;
}
template <typename T> inline constexpr
int signum(T x, std::true_type) {
return (T(0) < x) - (x < T(0));
}
template <typename T> inline constexpr
int signum(T x) {
return signum(x, std::is_signed<T>());
}
For C++11 and higher an alternative could be.
template <typename T>
typename std::enable_if<std::is_unsigned<T>::value, int>::type
inline constexpr signum(T const x) {
return T(0) < x;
}
template <typename T>
typename std::enable_if<std::is_signed<T>::value, int>::type
inline constexpr signum(T const x) {
return (T(0) < x) - (x < T(0));
}
For me it does not trigger any warnings on GCC 5.3.1.
No, it doesn't exist in c++, like in matlab. I use a macro in my programs for this.
#define sign(a) ( ( (a) < 0 ) ? -1 : ( (a) > 0 ) )
Bit off-topic, but I use this:
template<typename T>
constexpr int sgn(const T &a, const T &b) noexcept{
return (a > b) - (a < b);
}
template<typename T>
constexpr int sgn(const T &a) noexcept{
return sgn(a, T(0));
}
and I found first function - the one with two arguments, to be much more useful from "standard" sgn(), because it is most often used in code like this:
int comp(unsigned a, unsigned b){
return sgn( int(a) - int(b) );
}
vs.
int comp(unsigned a, unsigned b){
return sgn(a, b);
}
there is no cast for unsigned types and no additional minus.
in fact i have this piece of code using sgn()
template <class T>
int comp(const T &a, const T &b){
log__("all");
if (a < b)
return -1;
if (a > b)
return +1;
return 0;
}
inline int comp(int const a, int const b){
log__("int");
return a - b;
}
inline int comp(long int const a, long int const b){
log__("long");
return sgn(a, b);
}
You can use boost::math::sign() method from boost/math/special_functions/sign.hpp if boost is available.
Here's a branching-friendly implementation:
inline int signum(const double x) {
if(x == 0) return 0;
return (1 - (static_cast<int>((*reinterpret_cast<const uint64_t*>(&x)) >> 63) << 1));
}
Unless your data has zeros as half of the numbers, here the branch predictor will choose one of the branches as the most common. Both branches only involve simple operations.
Alternatively, on some compilers and CPU architectures a completely branchless version may be faster:
inline int signum(const double x) {
return (x != 0) *
(1 - (static_cast<int>((*reinterpret_cast<const uint64_t*>(&x)) >> 63) << 1));
}
This works for IEEE 754 double-precision binary floating-point format: binary64 .
While the integer solution in the accepted answer is quite elegant it bothered me that it wouldn't be able to return NAN for double types, so I modified it slightly.
template <typename T> double sgn(T val) {
return double((T(0) < val) - (val < T(0)))/(val == val);
}
Note that returning a floating point NAN as opposed to a hard coded NAN causes the sign bit to be set in some implementations, so the output for val = -NAN and val = NAN are going to be identical no matter what (if you prefer a "nan" output over a -nan you can put an abs(val) before the return...)
int sign(float n)
{
union { float f; std::uint32_t i; } u { n };
return 1 - ((u.i >> 31) << 1);
}
This function assumes:
binary32 representation of floating point numbers
a compiler that make an exception about the strict aliasing rule when using a named union
double signof(double a) { return (a == 0) ? 0 : (a<0 ? -1 : 1); }
Why use ternary operators and if-else when you can simply do this
#define sgn(x) x==0 ? 0 : x/abs(x)