I have a fixed-point implementation for a financial application. It's essentially an integer wrapped in a class that is treated as a decimal number with a given number of decimals N. The class is paranoid and checks for overflows, but when I ran my tests in release mode they failed, and I eventually reduced the problem to this minimal example:
#include <cstdint>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <type_traits>
template <typename T, typename U>
typename std::enable_if<std::is_convertible<U, std::string>::value, T>::type
FromString(U&& str)
{
    std::stringstream ss;
    ss << str;
    T ret;
    ss >> ret;
    return ret;
}
int main()
{
    int NewAccu = 32;
    int N = 10;

    using T = int64_t;
    T l = 10;
    T r = FromString<T>("1" + std::string(NewAccu - N, '0'));
    if (l == 0 || r == 0) {
        return 0;
    }

    T res = l * r;
    std::cout << l << std::endl;
    std::cout << r << std::endl;
    std::cout << res << std::endl;
    std::cout << (res / l) << std::endl;
    std::cout << std::endl;

    if ((res / l) != r) {
        throw std::runtime_error(
            "FixedPoint Multiplication Overflow while upscaling [:" + std::to_string(l) + ", " + std::to_string(r) + "]");
    }
    return 0;
}
This happens with Clang 6, my version is:
$ clang++ --version
clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
It's funny, because it's an impressive optimization, but it ruins my application and prevents me from detecting overflows. I was able to reproduce the problem with g++ as well; it doesn't throw an exception there either.
Notice that the exception is thrown in debug mode, but not in release mode.
As #Basile already stated, signed integer overflow is undefined behavior, so the compiler may handle it in any way it likes, including optimizing it away for a performance advantage. Detecting integer overflow after it has occurred is therefore too late. Instead, you should predict integer overflow just before it would occur.
Here is my implementation of overflow prediction of integer multiplication:
#include <limits>

template <typename T>
bool predict_mul_overflow(T x, T y)
{
    static_assert(std::numeric_limits<T>::is_integer, "predict_mul_overflow expects integral types");

    if constexpr (std::numeric_limits<T>::is_bounded)
    {
        return ((x != T{0}) && ((std::numeric_limits<T>::max() / x) < y));
    }
    else
    {
        return false;
    }
}
The function returns true if the integer multiplication x * y is predicted to overflow.
Note that while unsigned overflow is well-defined in terms of modular arithmetic, signed overflow is undefined behavior. The presented function works for unsigned types and for non-negative signed operands (as in the question); a fully general signed version would also need branches that compare against std::numeric_limits<T>::min() when an operand can be negative.
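For example, the check from the question could then run before the multiplication instead of after it. A minimal sketch (the helper name checked_upscale is mine; it reuses predict_mul_overflow from above, whose operands are non-negative here):

#include <cstdint>
#include <stdexcept>
#include <string>

// Sketch: predict the overflow up front, then multiply only if safe.
std::int64_t checked_upscale(std::int64_t l, std::int64_t r)
{
    if (predict_mul_overflow(l, r)) {
        throw std::runtime_error(
            "FixedPoint Multiplication Overflow while upscaling [:"
            + std::to_string(l) + ", " + std::to_string(r) + "]");
    }
    return l * r; // cannot overflow: it was ruled out above
}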
If you want to detect (signed) integer overflows (on scalar types like int64_t or long), you should use appropriate builtins, often compiler specific.
For GCC, see integer overflow builtins.
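For instance, __builtin_mul_overflow (available in GCC and Clang) performs the multiplication and reports whether it wrapped, without invoking undefined behavior. A minimal sketch, not the question's actual code:

#include <cstdint>
#include <stdexcept>
#include <string>

// Sketch: res receives the (wrapped) product; the builtin's return
// value says whether the mathematical result overflowed int64_t.
std::int64_t checked_mul(std::int64_t l, std::int64_t r)
{
    std::int64_t res;
    if (__builtin_mul_overflow(l, r, &res)) {
        throw std::overflow_error(
            "multiplication overflow [" + std::to_string(l) + ", " + std::to_string(r) + "]");
    }
    return res;
}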
Integer overflow (on plain int or long or any other signed integral type) is an instance of undefined behavior, so the compiler can optimize as it pleases around it. Be scared. If you depend on UB, you are no longer coding in standard C++, and your program is tied to a particular compiler and system, so it is not portable at all (even to other compilers, other compiler versions, other compilation flags, other computers and OSes). So Clang (or GCC) is allowed to optimize against integer overflow, and sometimes does.
Or consider using some bignum package (then of course you no longer deal with just the predefined C++ integral scalar types). Perhaps GMPlib.
You could consider using GCC's __int128 if your numbers fit into 128 bits.
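A minimal sketch of that idea (__int128 is a GCC/Clang extension, not standard C++): do the multiplication in 128 bits, then check that the product still fits in 64.

#include <cstdint>
#include <limits>

// Sketch: a 64x64 multiplication cannot overflow __int128,
// so the wide product can be range-checked safely.
bool mul_would_overflow(std::int64_t a, std::int64_t b)
{
    __int128 wide = static_cast<__int128>(a) * b;
    return wide > std::numeric_limits<std::int64_t>::max()
        || wide < std::numeric_limits<std::int64_t>::min();
}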
I believe you cannot reliably detect integer overflows when they happen (unless you use the integer overflow builtins). You should avoid them (or use some bignum library, or some library using these builtins, etc.).
EDIT: In the actual example it appears that negative overflow can also happen; I've added an example demonstrating that error as well.
I'm using C++20 and trying to convert a library that relies on signed integer overflow in Java and C# into C++ code. I'm also trying to generate the tables it uses at compile time, and to make those available at compile time.
In my code I get errors referring to code that looks like this (a minimal example that reproduces the error; the solution to this will solve my problem as well):
#include <cstdint>
#include <iostream>

constexpr auto foo() {
    std::int64_t a = 2;
    std::int64_t very_large_constant = 0x598CD327003817B5L;
    std::int64_t x = a * very_large_constant;
    return x;
}

int main() {
    std::cout << foo() << std::endl;
    return 0;
}
https://godbolt.org/z/TvM45vd8d
Negative overflow version
#include <cstdint>
#include <iostream>

constexpr auto foo() {
    std::int64_t a = -2;
    std::int64_t very_large_constant = 0x598CD327003817B5L;
    std::int64_t x = a * very_large_constant;
    return x;
}

int main() {
    std::cout << foo() << std::endl;
    return 0;
}
https://godbolt.org/z/7zoE9r18E
I get "12905529061151879018 is outside the range of representable values of type 'long long'" and the same message for -12905529061151879018, respectively.
I understand that undefined behavior is not allowed in constant expressions. I also recognize that GCC and MSVC do not error here, and that you can pass a flag to make Clang compile this anyway. But what am I supposed to do to actually solve this issue without switching compilers or applying the flag that ignores invalid constexpr?
Is there some way I can define the behavior I expect and want to happen here?
Signed integers have two's complement layout in any implementation you could name, and since C++20 the standard actually guarantees two's complement layout.
This means that you can perform your math on unsigned integers and get well-defined overflow behavior that matches what you want your signed integers to do.
#include <cstdint>
#include <iostream>

constexpr auto foo() {
    std::uint64_t a = 2;
    std::uint64_t very_large_constant = 0x598CD327003817B5L;
    std::uint64_t x = a * very_large_constant;
    return static_cast<std::int64_t>(x);
}
You cannot do this with signed integers. However, there are some things you can rely on in C++20:
Unsigned integer overflow is well-defined.
Signed integers are required to be represented as 2's complement.
Conversions between corresponding signed and unsigned integer types preserve the bit pattern.
So you can do all of your overflow-based math using explicitly unsigned types and literals, then cast them to signed values when you need to. This conversion is required to leave the bits unchanged.
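As a minimal sketch of that pattern (the helper name wrapping_mul is mine, not from any library):

#include <cstdint>

// Sketch: multiply on unsigned types, where wrap-around is
// well-defined, then convert back. Since C++20 the conversion to
// the signed type is defined as value modulo 2^64, i.e. it keeps
// the bit pattern.
constexpr std::int64_t wrapping_mul(std::int64_t a, std::int64_t b)
{
    return static_cast<std::int64_t>(
        static_cast<std::uint64_t>(a) * static_cast<std::uint64_t>(b));
}

static_assert(wrapping_mul(-2, 0x598CD327003817B5LL) ==
              wrapping_mul(2, -0x598CD327003817B5LL));

Because every step here is well-defined, this also works in constant expressions, which is what the question needs.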
I have a question regarding conversion of integers:
#include <iostream>
#include <cstdint>

using namespace std;

int main()
{
    int N, R, W, H, D;
    uint64_t sum = 0;
    uint64_t sum_2 = 0;

    cin >> W >> H >> D;
    sum += static_cast<uint64_t>(W) * H * D * 100;
    sum_2 += W * H * D * 100;
    cout << sum << endl;
    cout << sum_2 << endl;
    return 0;
}
I thought that sum should be equal to sum_2, because the uint64_t type is bigger than the int type, and during arithmetic operations the compiler chooses the bigger type (which is uint64_t). So by my understanding, the arithmetic for sum_2 should be done in uint64_t, but it is done in int.
Can you explain why the computation for sum_2 was done in int? Why didn't it happen in uint64_t?
Undefined behavior signed-integer overflow/underflow, and well-defined behavior unsigned-integer overflow/underflow, in C and C++
If I enter 200, 300, and 358 for W, H, and D, I get the following output, which makes perfect sense for my gcc compiler on a 64-bit Linux machine:
2148000000
18446744071562584320
Why does this make perfect sense?
Well, the default type is int, which is int32_t for the gcc compiler on a 64-bit Linux machine; its max value is 2^32/2 - 1 = 2147483647, and its min value is -2147483648. The line sum_2 += W * H * D * 100; does int arithmetic, since that's the type of every operand there, 100 included, and no explicit cast is used. Only after the int arithmetic is done is the result implicitly converted to uint64_t as it is stored into the uint64_t sum_2 variable. The mathematical result of W * H * D * 100, however, is 2148000000, which does not fit in an int: the computation overflows past the max int value, wrapping back down to the min int value and up again, which is undefined behavior (signed integer overflow).
Even though, according to the C and C++ standards, signed integer overflow or underflow is undefined behavior, in the gcc compiler I know that signed integer overflow happens to roll over to negative values if it is not optimized out. This is, by default, still "undefined behavior" and a bug, however, and must not be relied upon by default. See the notes below for details and for how to make this well-defined behavior via a gcc extension.

Anyway, 2148000000 - 2147483647 = 516353 up-counts, the first of which causes the roll-over. The first count up rolls over to the min int32_t value of -2147483648, and the remaining 516353 - 1 = 516352 counts go up to -2147483648 + 516352 = -2146967296. So the result of W * H * D * 100 for the inputs above is -2146967296, based on undefined behavior.

Next, that int (int32_t here) value is implicitly converted to uint64_t in order to store it into the uint64_t sum_2 variable, resulting in well-defined unsigned wrap-around (modular arithmetic). You start with -2146967296. The first down-count from zero wraps around to the uint64_t max of 2^64-1 = 18446744073709551615. Now subtract the remaining 2146967296 - 1 = 2146967295 counts from that and you get 18446744073709551615 - 2146967295 = 18446744071562584320, just as shown above!
Voila! With a little compiler and hardware architecture understanding, and some expected but undefined behavior, the result is perfectly explainable and makes sense!
To easily see the negative value, add this to your code:
int sum_3 = W*H*D*100;
cout << sum_3 << endl; // output: -2146967296
Notes
Never intentionally leave undefined behavior in your code. That is known as a bug. You do not have to write ISO C++, however! If you can find compiler documentation indicating a certain behavior is well-defined, that's ok, so long as you know you are writing in the g++ language and not the C++ language, and don't expect your code to work the same across compilers. Here is an example where I do that: Using Unions for "type punning" is fine in C, and fine in gcc's C++ as well (as a gcc [g++] extension). I'm generally okay with relying on compiler extensions like this. Just be aware of what you're doing is all.
#user17732522 makes a great point in the comments here:
"in the gcc compiler, I know that signed integer overflow happens to roll over to negative values.": That is not correct by-default. By-default GCC assumes that signed overflow does not happen and applies optimizations based on that. There is the -fwrapv and/or -fno-strict-overflow flag to enforce wrapping behavior. See https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Code-Gen-Options.html#Code-Gen-Options.
Take a look at that link above (or even better, this one, which always points to the latest gcc documentation instead of the documentation for just one version: https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#Code-Gen-Options). Even though signed-integer overflow and underflow is undefined behavior (a bug!) according to the C and C++ standards, gcc allows you, by extension, to make it well-defined behavior (not a bug!) so long as you use the proper gcc build flags. Using -fwrapv makes signed-integer overflow/underflow well-defined behavior as a gcc extension. Additionally, -fwrapv-pointer allows pointers to safely overflow and underflow when used in pointer arithmetic, and -fno-strict-overflow applies both -fwrapv and -fwrapv-pointer. The relevant documentation follows (emphasis added):
These machine-independent options control the interface conventions used in code generation.
Most of them have both positive and negative forms; the negative form of -ffoo is -fno-foo.
...
-fwrapv
This option instructs the compiler to assume that signed arithmetic overflow of addition, subtraction and multiplication wraps around using twos-complement representation. This flag enables some optimizations and disables others. The options -ftrapv and -fwrapv override each other, so using -ftrapv -fwrapv on the command-line results in -fwrapv being effective. Note that only active options override, so using -ftrapv -fwrapv -fno-wrapv on the command-line results in -ftrapv being effective.
-fwrapv-pointer
This option instructs the compiler to assume that pointer arithmetic overflow on addition and subtraction wraps around using twos-complement representation. This flag disables some optimizations which assume pointer overflow is invalid.
-fstrict-overflow
This option implies -fno-wrapv -fno-wrapv-pointer and when negated [as -fno-strict-overflow] implies -fwrapv -fwrapv-pointer.
So, relying on signed-integer overflow or underflow withOUT using the proper gcc extension flags above is undefined behavior, and therefore a bug, and can not be safely relied upon! It may be optimized out by the compiler and not work reliably as intended without the gcc extension flags above.
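For instance, a toy program like this (hypothetical file name demo.cpp) only has a guaranteed result when built with -fwrapv, e.g. g++ -O2 -fwrapv demo.cpp; without that flag the increment is undefined behavior:

#include <climits>
#include <iostream>

int main()
{
    int x = INT_MAX;
    x += 1; // wraps to INT_MIN by definition under -fwrapv; UB otherwise
    std::cout << x << "\n";
    return 0;
}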
My test code
Here is the full code I used for some quick checks while writing this answer. I ran it with the gcc/g++ compiler on a 64-bit Linux machine. I did not use the -fwrapv or -fno-strict-overflow flags, so all signed integer overflow or underflow demonstrated below is undefined behavior, a bug, and cannot be relied upon safely without those gcc extension flags. The fact that it works is circumstantial; the compiler could, by default, choose to optimize out the overflows in unexpected ways.
If you run this on an 8-bit microcontroller such as an Arduino Uno, you'd get different results since an int is a 2-byte int16_t by default, instead! But, now that you understand the principles, you could figure out the expected result. (Also, I think 64-bit values don't exist on that architecture, so they become 32-bit values).
#include <iostream>
#include <cstdint>

using namespace std;

int main()
{
    int N, R, W, H, D;
    uint64_t sum = 0;
    uint64_t sum_2 = 0;

    // cin >> W >> H >> D;
    W = 200;
    H = 300;
    D = 358;

    sum += static_cast<uint64_t>(W) * H * D * 100;
    sum_2 += W * H * D * 100;
    cout << sum << endl;
    cout << sum_2 << endl;

    int sum_3 = W*H*D*100;
    cout << sum_3 << endl;

    sum_2 = -1; // underflow to uint64_t max
    cout << sum_2 << endl;

    sum_2 = 18446744073709551615ULL - 2146967295;
    cout << sum_2 << endl;

    return 0;
}
Just a short version of #Gabriel Staples' good answer.
"and during arithmetic operations compiler chooses bigger type(which is uint64_t)"
There is no uint64_t in W * H * D * 100, just four ints. After this multiplication, the int product (which overflowed, which is UB) is assigned to a uint64_t.
Instead, use 100LLU * W * H * D to perform a wider unsigned multiplication.
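Applied to the question's code, that single change makes the whole product evaluate in 64-bit unsigned arithmetic (a minimal sketch with the question's inputs hard-coded):

#include <cstdint>
#include <iostream>

int main()
{
    int W = 200, H = 300, D = 358;
    std::uint64_t sum_2 = 0;

    // 100LLU is unsigned long long, so each of W, H and D is converted
    // before it is multiplied: every step is 64-bit unsigned math and
    // no int overflow occurs.
    sum_2 += 100LLU * W * H * D;
    std::cout << sum_2 << "\n"; // prints 2148000000

    return 0;
}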
I found a rather strange but working square root approximation for floats; I really don't get it. Can someone explain to me why this code works?
float sqrt(float f)
{
    const int result = 0x1fbb4000 + (*(int*)&f >> 1);
    return *(float*)&result;
}
I've tested it a bit and its output is off from std::sqrt() by about 1 to 3%. I know of Quake III's fast inverse square root and I guess it's something similar here (without the Newton iteration), but I'd really appreciate an explanation of how it works.
(Note: I've tagged it both c and c++ since it's both valid-ish (see comments) C and C++ code.)
(*(int*)&f >> 1) right-shifts the bitwise representation of f. This almost divides the exponent by two, which is approximately equivalent to taking the square root. [1]
Why almost? In IEEE-754, the actual exponent is e - 127. [2] To divide this by two, we'd need e/2 - 64, but the above approximation only gives us e/2 - 127. So we need to add on 63 to the resulting exponent. This is contributed by bits 30-23 of that magic constant (0x1fbb4000).
I'd imagine the remaining bits of the magic constant have been chosen to minimise the maximum error across the mantissa range, or something like that. However, it's unclear whether it was determined analytically, iteratively, or heuristically.
It's worth pointing out that this approach is somewhat non-portable. It makes (at least) the following assumptions:
The platform uses single-precision IEEE-754 for float.
The endianness of float representation.
That you will be unaffected by undefined behaviour due to the fact this approach violates C/C++'s strict-aliasing rules.
Thus it should be avoided unless you're certain that it gives predictable behaviour on your platform (and indeed, that it provides a useful speedup vs. sqrtf!).
[1] sqrt(a^b) = (a^b)^0.5 = a^(b/2)
[2] See e.g. https://en.wikipedia.org/wiki/Single-precision_floating-point_format#Exponent_encoding
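A quick way to see the exponent halving in action (a sketch; memcpy is used instead of the pointer cast to sidestep the aliasing problem, and the values assume IEEE-754 single precision):

#include <cstdint>
#include <cstring>
#include <iostream>

// Sketch: 4.0f has stored exponent 129 (actual exponent 2). After
// the shift plus the magic constant, the result is about 1.963,
// close to the true square root of 4.0, which is 2.
int main()
{
    float f = 4.0f;
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // bits == 0x40800000

    std::uint32_t approx = 0x1fbb4000u + (bits >> 1);
    float root;
    std::memcpy(&root, &approx, sizeof root);
    std::cout << root << "\n";            // prints roughly 1.96289

    return 0;
}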
See Oliver Charlesworth’s explanation of why this almost works. I’m addressing an issue raised in the comments.
Since several people have pointed out the non-portability of this, here are some ways you can make it more portable, or at least make the compiler tell you if it won’t work.
First, C++ allows you to check std::numeric_limits<float>::is_iec559 at compile time, such as in a static_assert. You can also check that sizeof(int) == sizeof(float), which will not be true if int is 64-bits, but what you really want to do is use uint32_t, which if it exists will always be exactly 32 bits wide, will have well-defined behavior with shifts and overflow, and will cause a compilation error if your weird architecture has no such integral type. Either way, you should also static_assert() that the types have the same size. Static assertions have no run-time cost and you should always check your preconditions this way if possible.
Unfortunately, the test of whether converting the bits in a float to a uint32_t and shifting is big-endian, little-endian or neither cannot be computed as a compile-time constant expression. Here, I put the run-time check in the part of the code that depends on it, but you might want to put it in the initialization and do it once. In practice, both gcc and clang can optimize this test away at compile time.
You do not want to use the unsafe pointer cast, and there are some systems I’ve worked on in the real world where that could crash the program with a bus error. The maximally-portable way to convert object representations is with memcpy(). In my example below, I type-pun with a union, which works on any actually-existing implementation. (Language lawyers object to it, but no successful compiler will ever break that much legacy code silently.) If you must do a pointer conversion (see below) there is alignas(). But however you do it, the result will be implementation-defined, which is why we check the result of converting and shifting a test value.
Anyway, not that you’re likely to use it on a modern CPU, here’s a gussied-up C++14 version that checks those non-portable assumptions:
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <limits>
#include <vector>

using std::cout;
using std::endl;
using std::size_t;
using std::sqrt;
using std::uint32_t;

template <typename T, typename U>
inline T reinterpret(const U x)
/* Reinterprets the bits of x as a T. Cannot be constexpr
 * in C++14 because it reads an inactive union member.
 */
{
    static_assert( sizeof(T) == sizeof(U), "" );
    union tu_pun {
        U u = U();
        T t;
    };
    const tu_pun pun{x};
    return pun.t;
}

constexpr float source = -0.1F;
constexpr uint32_t target = 0x5ee66666UL;

const uint32_t after_rshift = reinterpret<uint32_t,float>(source) >> 1U;
const bool is_little_endian = after_rshift == target;

float est_sqrt(const float x)
/* A fast approximation of sqrt(x) that works less well for subnormal numbers.
 */
{
    static_assert( std::numeric_limits<float>::is_iec559, "" );
    assert(is_little_endian); // Could provide alternative big-endian code.

    /* The algorithm relies on the bit representation of normal IEEE floats, so
     * a subnormal number as input might be considered a domain error as well?
     */
    if ( std::isless(x, 0.0F) || !std::isfinite(x) )
        return std::numeric_limits<float>::signaling_NaN();

    constexpr uint32_t magic_number = 0x1fbb4000UL;
    const uint32_t raw_bits = reinterpret<uint32_t,float>(x);
    const uint32_t rejiggered_bits = (raw_bits >> 1U) + magic_number;
    return reinterpret<float,uint32_t>(rejiggered_bits);
}

int main(void)
{
    static const std::vector<float> test_values{
        4.0F, 0.01F, 0.0F, 5e20F, 5e-20F, 1.262738e-38F };

    for ( const float& x : test_values ) {
        const double gold_standard = sqrt((double)x);
        const double estimate = est_sqrt(x);
        const double error = estimate - gold_standard;

        cout << "The error for (" << estimate << " - " << gold_standard << ") is "
             << error;

        if ( gold_standard != 0.0 && std::isfinite(gold_standard) ) {
            const double error_pct = error/gold_standard * 100.0;
            cout << " (" << error_pct << "%).";
        } else
            cout << '.';
        cout << endl;
    }

    return EXIT_SUCCESS;
}
Update
Here is an alternative definition of reinterpret<T,U>() that avoids type-punning. You could also implement the type-pun in modern C, where it’s allowed by standard, and call the function as extern "C". I think type-punning is more elegant, type-safe and consistent with the quasi-functional style of this program than memcpy(). I also don’t think you gain much, because you still could have undefined behavior from a hypothetical trap representation. Also, clang++ 3.9.1 -O -S is able to statically analyze the type-punning version, optimize the variable is_little_endian to the constant 0x1, and eliminate the run-time test, but it can only optimize this version down to a single-instruction stub.
But more importantly, this code isn't guaranteed to work portably on every compiler. For example, some old computers can't even address exactly 32 bits of memory. But in those cases, it should fail to compile and tell you why. No compiler is just suddenly going to break a huge amount of legacy code for no reason. Although the standard technically gives permission to do that and still say it conforms to C++14, it will only happen on an architecture very different from what we expect. And if our assumptions are so invalid that some compiler is going to turn a type-pun between a float and a 32-bit unsigned integer into a dangerous bug, I really doubt the logic behind this code will hold up if we just use memcpy() instead. We want that code to fail at compile time, and to tell us why.
#include <cassert>
#include <cstdint>
#include <cstring>

using std::memcpy;
using std::uint32_t;

template <typename T, typename U> inline T reinterpret(const U &x)
/* Reinterprets the bits of x as a T. Cannot be constexpr
 * in C++14 because it modifies a variable.
 */
{
    static_assert( sizeof(T) == sizeof(U), "" );
    T temp;
    memcpy( &temp, &x, sizeof(T) );
    return temp;
}

constexpr float source = -0.1F;
constexpr uint32_t target = 0x5ee66666UL;

const uint32_t after_rshift = reinterpret<uint32_t,float>(source) >> 1U;
extern const bool is_little_endian = after_rshift == target;
However, Stroustrup et al., in the C++ Core Guidelines, recommend a reinterpret_cast instead:
#include <cassert>

template <typename T, typename U> inline T reinterpret(const U x)
/* Reinterprets the bits of x as a T. Cannot be constexpr
 * in C++14 because it uses reinterpret_cast.
 */
{
    static_assert( sizeof(T) == sizeof(U), "" );
    const U temp alignas(T) alignas(U) = x;
    return *reinterpret_cast<const T*>(&temp);
}
The compilers I tested can also optimize this away to a folded constant. Stroustrup’s reasoning is [sic]:
Accessing the result of an reinterpret_cast to a different type from the objects declared type is still undefined behavior, but at least we can see that something tricky is going on.
Update
From the comments: C++20 introduces std::bit_cast, which converts an object representation to a different type with unspecified, not undefined, behavior. This doesn’t guarantee that your implementation will use the same format of float and int that this code expects, but it doesn’t give the compiler carte blanche to break your program arbitrarily because there’s technically undefined behavior in one line of it. It can also give you a constexpr conversion.
Let y = sqrt(x);
it follows from the properties of logarithms that log2(y) = 0.5 * log2(x)   (1)
Interpreting a normal float as an integer gives INT(x) = Ix = L * (log2(x) + B - σ)   (2)
where L = 2^N, N is the number of bits of the significand, B is the exponent bias, and σ is a free factor to tune the approximation.
Combining (1) and (2) gives: Iy = 0.5 * (Ix + (L * (B - σ)))
Which is written in the code as (*(int*)&x >> 1) + 0x1fbb4000;
Find the σ so that the constant equals 0x1fbb4000 and determine whether it's optimal.
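For the first part, σ follows directly from setting the constant 0.5 * L * (B - σ) equal to 0x1fbb4000 (a quick sketch; whether it's optimal is a separate question):

#include <iostream>

int main()
{
    // 0.5 * L * (B - sigma) = 0x1fbb4000, with L = 2^23 and B = 127
    constexpr double L = 8388608.0; // 2^23
    constexpr double B = 127.0;
    constexpr double sigma = B - 2.0 * 0x1fbb4000 / L;
    std::cout << sigma << "\n";     // sigma = 0.1171875
    return 0;
}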
Adding a wiki test harness to test all floats.
The approximation is within 4% for many floats, but very poor for sub-normal numbers. YMMV. The first Worst/Average pair below is for the sub-normal range [FLT_TRUE_MIN, FLT_MIN], the second for the normal range [FLT_MIN, FLT_MAX]:
Worst:1.401298e-45 211749.20%
Average:0.63%
Worst:1.262738e-38 3.52%
Average:0.02%
Note that with an argument of +/-0.0, the result is not zero.
printf("% e % e\n", sqrtf(+0.0), sqrt_apx(0.0)); // 0.000000e+00 7.930346e-20
printf("% e % e\n", sqrtf(-0.0), sqrt_apx(-0.0)); // -0.000000e+00 -2.698557e+19
Test code
#include <float.h>
#include <limits.h>
#include <math.h>
#include <stddef.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

float sqrt_apx(float f) {
    const int result = 0x1fbb4000 + (*(int*) &f >> 1);
    return *(float*) &result;
}

double error_value = 0.0;
double error_worst = 0.0;
double error_sum = 0.0;
unsigned long error_count = 0;

void sqrt_test(float f) {
    if (f == 0) return;
    volatile float y0 = sqrtf(f);
    volatile float y1 = sqrt_apx(f);
    double error = (1.0 * y1 - y0) / y0;
    error = fabs(error);
    if (error > error_worst) {
        error_worst = error;
        error_value = f;
    }
    error_sum += error;
    error_count++;
}

void sqrt_tests(float f0, float f1) {
    error_value = error_worst = error_sum = 0.0;
    error_count = 0;
    for (;;) {
        sqrt_test(f0);
        if (f0 == f1) break;
        f0 = nextafterf(f0, f1);
    }
    printf("Worst:%e %.2f%%\n", error_value, error_worst*100.0);
    printf("Average:%.2f%%\n", error_sum / error_count);
    fflush(stdout);
}

int main() {
    sqrt_tests(FLT_TRUE_MIN, FLT_MIN);
    sqrt_tests(FLT_MIN, FLT_MAX);
    return 0;
}
I stumbled upon this code to check for NaN:
#include <cstdint>
#include <iostream>

// f32/u32/f64/u64 are the project's fixed-width typedefs:
typedef float f32;
typedef double f64;
typedef std::uint32_t u32;
typedef std::uint64_t u64;

/**
 * isnan(val) returns true if val is nan.
 * We cannot rely on std::isnan or x!=x, because GCC may wrongly optimize it
 * away when compiling with -ffast-math (default in RASR).
 * This function basically does 3 things:
 * - ignore the sign (first bit is dropped with <<1)
 * - interpret val as an unsigned integer (union)
 * - compares val to the nan-bitmask (ones in the exponent, non-zero significand)
 **/
template<typename T>
inline bool isnan(T val) {
    if (sizeof(val) == 4) {
        union { f32 f; u32 x; } u = { (f32)val };
        return (u.x << 1) > 0xff000000u;
    } else if (sizeof(val) == 8) {
        union { f64 f; u64 x; } u = { (f64)val };
        // infinity<<1; the mask must be 0xffe..., not 0x7ff..., or the
        // comparison also flags ordinary values >= 2.0 as NaN
        return (u.x << 1) > 0xffe0000000000000u;
    } else {
        std::cerr << "isnan is not implemented for sizeof(datatype)=="
                  << sizeof(val) << std::endl;
        return false; // was missing: falling off the end would be UB
    }
}
This looks arch-dependent, right? However, I'm not sure about endianness, because no matter whether little or big endian, the float and the int are probably stored in the same byte order.
Also, I wonder whether something like
volatile T x = val;
return std::isnan(x);
would have worked.
This was used with GCC 4.6 in the past.
Also, I wonder whether something like std::isnan((volatile)x) would have worked.
isnan takes its argument by value so the volatile qualifier would have been discarded. In other words, no, this doesn’t work.
The code you’ve posted relies on a specific floating point representation (IEEE). It also exhibits undefined behaviour since it relies on the union hack to retrieve the underlying float representation.
On a note about code review, the function is badly written even if we ignore the potential problems of the previous paragraph (which are justifiable): why does the function use runtime checks rather than compile-time checks and compile time error handling? It would have been better and easier just to offer two overloads.
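A sketch of what the overload-based version might look like (the representation concerns above still apply; memcpy replaces the union to avoid the aliasing question, and the name isnan_bits is mine):

#include <cstdint>
#include <cstring>

// Sketch: one overload per supported width, chosen at compile time.
// NaN = exponent all ones and non-zero significand; <<1 drops the sign.
inline bool isnan_bits(float val) {
    std::uint32_t x;
    std::memcpy(&x, &val, sizeof x);
    return (x << 1) > 0xff000000u;          // infinity<<1 for float
}

inline bool isnan_bits(double val) {
    std::uint64_t x;
    std::memcpy(&x, &val, sizeof x);
    return (x << 1) > 0xffe0000000000000u;  // infinity<<1 for double
}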
I have an 8-character string representing a hexadecimal number and I need to convert it to an int. This conversion has to preserve the bit pattern for strings "80000000" and higher, i.e., those numbers should come out negative. Unfortunately, the naive solution:
#include <sstream>
#include <string>

using namespace std;

int hex_str_to_int(const string hexStr)
{
    stringstream strm;
    strm << hex << hexStr;
    unsigned int val = 0;
    strm >> val;
    return static_cast<int>(val);
}
doesn't work for my compiler if val > INT_MAX (the returned value is 0). Changing the type of val to int also results in a 0 for the larger numbers. I've tried several different solutions from various answers here on SO and haven't been successful yet.
Here's what I do know:
I'm using HP's C++ compiler on OpenVMS (using, I believe, an Itanium processor).
sizeof(int) will be at least 4 on every architecture my code will run on.
Casting from a number > INT_MAX to int is implementation-defined. On my machine, it usually results in a 0 but interestingly casting from long to int results in INT_MAX when the value is too big.
This is surprisingly difficult to do correctly, or at least it has been for me. Does anyone know of a portable solution to this?
Update:
Changing static_cast to reinterpret_cast results in a compiler error. A comment prompted me to try a C-style cast: return (int)val in the code above, and it worked. On this machine. Will that still be safe on other architectures?
Quoting the C++03 standard, §4.7/3 (Integral Conversions):
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
Because the result is implementation-defined, by definition it is impossible for there to be a truly portable solution.
While there are ways to do this using casts and conversions, most rely on undefined behavior that happens to have well-defined results on some machines / with some compilers. Instead of relying on undefined behavior, copy the data:
int signed_val;
std::memcpy (&signed_val, &val, sizeof(int));
return signed_val;
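Putting that together with the question's function, a minimal sketch:

#include <cstring>
#include <sstream>
#include <string>

// Sketch: same parsing as in the question, but the final
// unsigned-to-signed step copies the bits instead of casting.
int hex_str_to_int(const std::string& hexStr)
{
    std::stringstream strm;
    strm << std::hex << hexStr;
    unsigned int val = 0;
    strm >> val;

    int signed_val;
    std::memcpy(&signed_val, &val, sizeof signed_val);
    return signed_val;
}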
You can negate a two's-complement number held in an unsigned variable by taking the bitwise complement and adding one. So let's do that for the negatives:
if (val < 0x80000000) // positive values need no conversion
    return val;
if (val == 0x80000000) // Complement-and-addition will overflow, so special case this
    return -0x80000000; // aka INT_MIN
else
    return -(int)(~val + 1);
This assumes that your ints use a 32-bit two's-complement representation (or have a similar range). It does not rely on any undefined behavior related to signed integer overflow (note that the behavior of unsigned integer overflow is well-defined, although that should not happen here either!).
Note that if your ints are not 32-bit, things get more complex. You may need to use something like ~(~0U >> 1) instead of 0x80000000. Further, if your ints are not two's-complement, you may have overflow issues on certain values (for example, on a ones'-complement machine, -0x80000000 cannot be represented in a 32-bit signed integer). However, non-two's-complement machines are very rare today, so this is unlikely to be a problem.
Here's another solution that worked for me:
if (val <= INT_MAX) {
    return static_cast<int>(val);
}
else {
    int ret = static_cast<int>(val & ~INT_MIN);
    return ret | INT_MIN;
}
If I mask off the high bit, I avoid overflow when casting. I can then OR it back safely.
C++20 will have std::bit_cast that copies bits verbatim:
#include <bit>
#include <cassert>
#include <iostream>

int main()
{
    int i = -42;
    auto u = std::bit_cast<unsigned>(i);
    // Prints 4294967254 on two's complement platforms where int is 32 bits
    std::cout << u << "\n";

    auto roundtripped = std::bit_cast<int>(u);
    assert(roundtripped == i);
    std::cout << roundtripped << "\n"; // Prints -42

    return 0;
}
cppreference shows an example of how one can implement their own bit_cast in terms of memcpy (under Notes).
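That fallback might look roughly like this (a sketch in the spirit of the cppreference note, not a drop-in replacement for std::bit_cast; the name bit_cast_fallback is mine):

#include <cstring>
#include <type_traits>

// Sketch: a memcpy-based stand-in for std::bit_cast on pre-C++20
// compilers. Unlike the real thing, it cannot be constexpr.
template <class To, class From>
To bit_cast_fallback(const From& src)
{
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<From>::value &&
                  std::is_trivially_copyable<To>::value,
                  "types must be trivially copyable");
    To dst;
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
}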
While OpenVMS is not likely to gain C++20 support anytime soon, I hope this answer helps someone arriving at the same question via internet search.
unsigned int u = ~0U;
int s = *reinterpret_cast<int*>(&u); // -1
Contrariwise:
int s = -1;
unsigned int u = *reinterpret_cast<unsigned int*>(&s); // all ones