EDIT In the actual example, it appears possible that negative overflow can happen, I've also added an example to demonstrate the error there
I'm using C++20 and trying to convert a library which relies on signed integer overflow in Java and C# into C++ code. I'm also trying to generate the tables it uses at compile time, and allow those to be available at compile time.
In my code I get errors in reference to code that looks like this (Minimal example to reproduce the error, the solution to this will solve my problem as well):
#include <iostream>
constexpr auto foo(){
std::int64_t a = 2;
std::int64_t very_large_constant = 0x598CD327003817B5L;
std::int64_t x = a * very_large_constant;
return x;
}
int main(){
std::cout << foo() << std::endl;
return 0;
}
https://godbolt.org/z/TvM45vd8d
Negative overflow version
#include <iostream>
constexpr auto foo(){
std::int64_t a = -2;
std::int64_t very_large_constant = 0x598CD327003817B5L;
std::int64_t x = a * very_large_constant;
return x;
}
int main(){
std::cout << foo() << std::endl;
return 0;
}
https://godbolt.org/z/7zoE9r18E
I get 12905529061151879018 is out side of range representable by long long and -12905529061151879018 respectively.
I understand that undefined behavior here is not allowed, I also recognize that GCC and MSVC do not error here, and you can put a flag to make clang compile this anyway. But what am I supposed to do to actually solve this issue with out switching compilers or applying the flag to ignore invalid constexpr?
Is there some way I can define the behavior I expect and want to happen here?
Signed integers have two's complement layout in any implementation that you could name. It's also guaranteed to use two'
s complement layout since C++20.
This means that you can perform your math on unsigned integers and get well-defined overflow behavior that matches what you want your signed integers to do.
#include <iostream>
#include <bit>
constexpr auto foo(){
std::uint64_t a = 2;
std::uint64_t very_large_constant = 0x598CD327003817B5L;
std::uint64_t x = a * very_large_constant;
return static_cast<std::int64_t>(x);
}
You cannot do this with signed integers. However, there are some things you can rely on in C++20:
Unsigned integer overflow is well-defined.
Signed integers are required to be represented as 2's complement.
Conversions between corresponding sized and unsigned integers preserve the bitpattern.
So you can do all of your overflow-based math using explicitly unsigned types and literals, then cast them to signed values when you need to. This conversion is required to leave the bits unchanged.
Related
Consider the following program:
#include <iostream>
int main()
{
unsigned int a = 3;
unsigned int b = 7;
std::cout << (a - b) << std::endl; // underflow here!
return 0;
}
In the line starting with std::cout an underflow is happening because a is lesser than b so a-b is less than 0, but since a and b are unsigend so is a-b.
Is there a compiler flag (for G++) that gives me a warning when I try to calculate the difference of two unsigend integers?
Now, one could argue that an overflow/underflow can happen in any calculation using any operator. But I think it is more dangerous to apply operator - to unsigend ints because with unsigned integers this error may happen with quite low (to me: "more common") numbers.
A (static analysis) tool that finds such things would also be great but I much prefer a compiler flag and warning.
GCC does not (afaict) support it, but Clang's UBSanitizer has the following option [emphasis mine]:
-fsanitize=unsigned-integer-overflow: Unsigned integer overflow, where the result of an unsigned integer computation cannot be represented in its type. Unlike signed integer overflow, this is not undefined behavior, but it is often unintentional. This sanitizer does not check for lossy implicit conversions performed before such a computation
Had been going through this code:
#include<cstdio>
#define TOTAL_ELEMENTS (sizeof(array) / sizeof(array[0]))
int array[] = {1,2,3,4,5,6,7};
int main()
{
signed int d;
printf("Total Elements in the array are => %d\n",TOTAL_ELEMENTS);
for(d=-1;d <= (TOTAL_ELEMENTS-2);d++)
printf("%d\n",array[d+1]);
return 0;
}
Now obviously it does not get into the for loop.
Whats the reason?
The reason is that in C++ you're getting an implicit promotion. Even though d is declared as signed, when you compare it to (TOTAL_ELEMENTS-2) (which is unsigned due to sizeof), d gets promoted to unsigned. C++ has very specific rules which basically state that the unsigned value of d will then be the congruent unsigned value mod numeric_limits<unsigned>::max(). In this case, that comes out to the largest possible unsigned number which is clearly larger than the size of the array on the other side of the comparison.
Note that some compilers like g++ (with -Wall) can be told to warn about such comparisons so you can make sure that the code looks correct at compile time.
The program looks like it should throw a compile error. You're using "array" even before its definition. Switch the first two lines and it should be okay.
I was trying to determine the largest possible value in a bit field, what I did is:
using namespace std;
struct A{
unsigned int a:1;
unsigned int b:3;
};
int main()
{
A aa;
aa.b = ~0U;
return 0;
}
MSVC is fine but GCC 4.9.2 gave me a warning:
warning: large integer implicitly truncated to unsigned type [-Woverflow]
Wondering how I can get rid of it (Assuming I don't know the bit width of the field, and I want to know what's the largest possible value in it).
You can try working around this as follows
aa.b = 1;
aa.b = -aa.b;
Note that value-representation aspects of bit-fields, including their range, are currently underspecified in the language standard, which is considered a defect in C++ standard. The is strange, especially considering that other parts of the document (e.g. specification of enum types) attempt to rely on the range of representable values of bit-fields for their own purposes. This is supposed to be taken care of in the future.
I have an 8-character string representing a hexadecimal number and I need to convert it to an int. This conversion has to preserve the bit pattern for strings "80000000" and higher, i.e., those numbers should come out negative. Unfortunately, the naive solution:
int hex_str_to_int(const string hexStr)
{
stringstream strm;
strm << hex << hexStr;
unsigned int val = 0;
strm >> val;
return static_cast<int>(val);
}
doesn't work for my compiler if val > MAX_INT (the returned value is 0). Changing the type of val to int also results in a 0 for the larger numbers. I've tried several different solutions from various answers here on SO and haven't been successful yet.
Here's what I do know:
I'm using HP's C++ compiler on OpenVMS (using, I believe, an Itanium processor).
sizeof(int) will be at least 4 on every architecture my code will run on.
Casting from a number > INT_MAX to int is implementation-defined. On my machine, it usually results in a 0 but interestingly casting from long to int results in INT_MAX when the value is too big.
This is surprisingly difficult to do correctly, or at least it has been for me. Does anyone know of a portable solution to this?
Update:
Changing static_cast to reinterpret_cast results in a compiler error. A comment prompted me to try a C-style cast: return (int)val in the code above, and it worked. On this machine. Will that still be safe on other architectures?
Quoting the C++03 standard, §4.7/3 (Integral Conversions):
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
Because the result is implementation-defined, by definition it is impossible for there to be a truly portable solution.
While there are ways to do this using casts and conversions, most rely on undefined behavior that happen to have well-defined behaviors on some machines / with some compilers. Instead of relying on undefined behavior, copy the data:
int signed_val;
std::memcpy (&signed_val, &val, sizeof(int));
return signed_val;
You can negate an unsigned twos-complement number by taking the complement and adding one. So let's do that for negatives:
if (val < 0x80000000) // positive values need no conversion
return val;
if (val == 0x80000000) // Complement-and-addition will overflow, so special case this
return -0x80000000; // aka INT_MIN
else
return -(int)(~val + 1);
This assumes that your ints are represented with 32-bit twos-complement representation (or have similar range). It does not rely on any undefined behavior related to signed integer overflow (note that the behavior of unsigned integer overflow is well-defined - although that should not happen here either!).
Note that if your ints are not 32-bit, things get more complex. You may need to use something like ~(~0U >> 1) instead of 0x80000000. Further, if your ints are no twos-complement, you may have overflow issues on certain values (for example, on a ones-complement machine, -0x80000000 cannot be represented in a 32-bit signed integer). However, non-twos-complement machines are very rare today, so this is unlikely to be a problem.
Here's another solution that worked for me:
if (val <= INT_MAX) {
return static_cast<int>(val);
}
else {
int ret = static_cast<int>(val & ~INT_MIN);
return ret | INT_MIN;
}
If I mask off the high bit, I avoid overflow when casting. I can then OR it back safely.
C++20 will have std::bit_cast that copies bits verbatim:
#include <bit>
#include <cassert>
#include <iostream>
int main()
{
int i = -42;
auto u = std::bit_cast<unsigned>(i);
// Prints 4294967254 on two's compliment platforms where int is 32 bits
std::cout << u << "\n";
auto roundtripped = std::bit_cast<int>(u);
assert(roundtripped == i);
std::cout << roundtripped << "\n"; // Prints -42
return 0;
}
cppreference shows an example of how one can implement their own bit_cast in terms of memcpy (under Notes).
While OpenVMS is not likely to gain C++20 support anytime soon, I hope this answer helps someone arriving at the same question via internet search.
unsigned int u = ~0U;
int s = *reinterpret_cast<int*>(&u); // -1
Сontrariwise:
int s = -1;
unsigned int u = *reinterpret_cast<unsigned int*>(&s); // all ones
Here are the goals I'm trying to achieve:
I need to pack 32 bit IEEE floats into 30 bits.
I want to do this by decreasing the size of mantissa by 2 bits.
The operation itself should be as fast as possible.
I'm aware that some precision will be lost, and this is acceptable.
It would be an advantage, if this operation would not ruin special cases like SNaN, QNaN, infinities, etc. But I'm ready to sacrifice this over speed.
I guess this questions consists of two parts:
1) Can I just simply clear the least significant bits of mantissa? I've tried this, and so far it works, but maybe I'm asking for trouble... Something like:
float f;
int packed = (*(int*)&f) & ~3;
// later
f = *(float*)&packed;
2) If there are cases where 1) will fail, then what would be the fastest way to achieve this?
Thanks in advance
You actually violate the strict aliasing rules (section 3.10 of the C++ standard) with these reinterpret casts. This will probably blow up in your face when you turn on the compiler optimizations.
C++ standard, section 3.10 paragraph 15 says:
If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
Specifically, 3.10/15 doesn't allow us to access a float object via an lvalue of type unsigned int. I actually got bitten myself by this. The program I wrote stopped working after turning on optimizations. Apparently, GCC didn't expect an lvalue of type float to alias an lvalue of type int which is a fair assumption by 3.10/15. The instructions got shuffled around by the optimizer under the as-if rule exploiting 3.10/15 and it stopped working.
Under the following assumptions
float really corresponds to a 32bit IEEE-float,
sizeof(float)==sizeof(int)
unsigned int has no padding bits or trap representations
you should be able to do it like this:
/// returns a 30 bit number
unsigned int pack_float(float x) {
unsigned r;
std::memcpy(&r,&x,sizeof r);
return r >> 2;
}
float unpack_float(unsigned int x) {
x <<= 2;
float r;
std::memcpy(&r,&x,sizeof r);
return r;
}
This doesn't suffer from the "3.10-violation" and is typically very fast. At least GCC treats memcpy as an intrinsic function. In case you don't need the functions to work with NaNs, infinities or numbers with extremely high magnitude you can even improve accuracy by replacing "r >> 2" with "(r+1) >> 2":
unsigned int pack_float(float x) {
unsigned r;
std::memcpy(&r,&x,sizeof r);
return (r+1) >> 2;
}
This works even if it changes the exponent due to a mantissa overflow because the IEEE-754 coding maps consecutive floating point values to consecutive integers (ignoring +/- zero). This mapping actually approximates a logarithm quite well.
Blindly dropping the 2 LSBs of the float may fail for small number of unusual NaN encodings.
A NaN is encoded as exponent=255, mantissa!=0, but IEEE-754 doesn't say anything about which mantiassa values should be used. If the mantissa value is <= 3, you could turn a NaN into an infinity!
You should encapsulate it in a struct, so that you don't accidentally mix the usage of the tagged float with regular "unsigned int":
#include <iostream>
using namespace std;
struct TypedFloat {
private:
union {
unsigned int raw : 32;
struct {
unsigned int num : 30;
unsigned int type : 2;
};
};
public:
TypedFloat(unsigned int type=0) : num(0), type(type) {}
operator float() const {
unsigned int tmp = num << 2;
return reinterpret_cast<float&>(tmp);
}
void operator=(float newnum) {
num = reinterpret_cast<int&>(newnum) >> 2;
}
unsigned int getType() const {
return type;
}
void setType(unsigned int type) {
this->type = type;
}
};
int main() {
const unsigned int TYPE_A = 1;
TypedFloat a(TYPE_A);
a = 3.4;
cout << a + 5.4 << endl;
float b = a;
cout << a << endl;
cout << b << endl;
cout << a.getType() << endl;
return 0;
}
I can't guarantee its portability though.
How much precision do you need? If 16-bit float is enough (sufficient for some types of graphics), then ILM's 16-bit float ("half"), part of OpenEXR is great, obeys all kinds of rules (http://www.openexr.com/), and you'll have plenty of space left over after you pack it into a struct.
On the other hand, if you know the approximate range of values they're going to take, you should consider fixed point. They're more useful than most people realize.
I can't select any of the answers as the definite one, because most of them have valid information, but not quite what I was looking for. So I'll just summarize my conclusions.
The method for conversion I've posted in my question's part 1) is clearly wrong by C++ standard, so other methods to extract float's bits should be used.
And most important... as far as I understand from reading the responses and other sources about IEEE754 floats, it's ok to drop the least significant bits from mantissa. It will mostly affect only precision, with one exception: sNaN. Since sNaN is represented by exponent set to 255, and mantissa != 0, there can be situation where mantissa would be <= 3, and dropping last two bits would convert sNaN to +/-Infinity. But since sNaN are not generated during floating point operations on CPU, its safe under controlled environment.