As far as I know, reinterpret_cast must not lead to data loss. So it should not be possible to compile code like this on x86_64, because int is smaller than a pointer:
#include <cstdio>
int main() {
    int a = 123;
    int res = reinterpret_cast<int>(reinterpret_cast<void*>(a));
    printf("%d", a == res);
}
The question is: why can I compile code like this with GCC and Clang?
#include <cstdio>
int main() {
    __uint128_t a = 4000000000000000000;
    a *= 100;
    __uint128_t res = reinterpret_cast<__uint128_t>(reinterpret_cast<void*>(a));
    printf("%d", a == res);
}
And the result I get is "0", which means there is data loss.
Edit
I think there are three possibilities: a compiler bug, an abuse of the spec, or a consequence of the spec. Which one is this?
It's explained here https://en.cppreference.com/w/cpp/language/reinterpret_cast
A pointer can be converted to any integral type large enough to hold all values of its type (e.g. to std::uintptr_t)
That's why you get an error in the first case.
A value of any integral or enumeration type can be converted to a pointer type...
That's why you don't get an error in the second case; instead, the value is truncated when it is converted to the pointer type, so the comparison prints 0. The rule implicitly assumes that a pointer type has the largest range of any integral type, which is not the case with 128-bit integers.
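For contrast, a minimal sketch of the sanctioned round trip through std::uintptr_t (note the standard only guarantees the pointer-to-integer-to-pointer direction; the integer-to-pointer-to-integer direction shown here works in practice when the value fits in a pointer):
#include <cstdio>
#include <cstdint>

int main() {
    std::uintptr_t a = 123;
    // uintptr_t is guaranteed large enough to hold a converted pointer,
    // and 123 easily fits into a pointer, so nothing is lost here.
    void* p = reinterpret_cast<void*>(a);
    std::uintptr_t res = reinterpret_cast<std::uintptr_t>(p);
    std::printf("%d", a == res);  // prints 1
}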
Note that a 128-bit integer is, generally speaking, not an integral type, but GCC at least defines it as one in its GNU extension modes:
from https://quuxplusone.github.io/blog/2019/02/28/is-int128-integral/
libstdc++ (in standard, non-gnu++XX mode) leaves is_integral_v<__int128> as false. This makes a certain amount of sense from the library implementor’s point of view, because __int128 is not one of the standard integral types, and furthermore, if you call it integral, then you have to face the consequence that intmax_t (which is 64 bits on every ABI that matters) is kind of lying about being the “max.”
but
In -std=gnu++XX mode, libstdc++ makes is_integral_v<__int128> come out to true
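You can observe this directly; a small sketch (the outcome depends on the -std= dialect flag, per the quotes above):
#include <type_traits>

// Compiles with g++ -std=gnu++17; with -std=c++17 the assertion
// fails, because libstdc++ only reports __int128 as integral in
// the GNU dialect modes.
static_assert(std::is_integral_v<__int128>, "__int128 not integral in this mode");

int main() {}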
Related
How does one convert from one integer type to another safely and without setting off alarm bells in compilers and static analysis tools?
Different compilers will warn for something like:
int i = get_int();
size_t s = i;
for loss of signedness or
size_t s = get_size();
int i = s;
for narrowing.
Casting can remove the warnings but doesn't solve the safety issue.
Is there a proper way of doing this?
You can try boost::numeric_cast<>.
Boost's numeric_cast returns the result of converting a value of type Source to a value of type Target. If an out-of-range value is detected, an exception is thrown (see bad_numeric_cast, negative_overflow and positive_overflow).
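A minimal usage sketch (assuming Boost is available; the header and exception names are Boost.NumericConversion's own):
#include <boost/numeric/conversion/cast.hpp>
#include <cstdio>

int main() {
    long long big = 3000000000LL;  // does not fit in a 32-bit int
    try {
        int i = boost::numeric_cast<int>(big);  // throws on out-of-range
        std::printf("%d\n", i);
    } catch (const boost::numeric::bad_numeric_cast& e) {
        std::printf("conversion failed: %s\n", e.what());
    }
}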
How does one convert from one integer type to another safely and without setting off alarm bells in compilers and static analysis tools?
Control when conversion is needed. Where possible, only convert when there is no value change. Sometimes one must step back and code at a higher level. In other words: was a lossy conversion really needed, or can the code be reworked to avoid the conversion loss?
It is not hard to add an if(). The test just needs to be carefully formed.
Example where a size_t n and an int len need a compare. Note that the positive range of int may exceed that of size_t, or vice versa, or they may be the same. Note that in this case the conversion of int to unsigned only happens with non-negative values, thus no value change.
int len = snprintf(buf, n, ...);
if (len < 0 || (unsigned)len >= n) {
    // Handle_error();
}
An unsigned-to-int example, for when it is known that the unsigned value at this point in the code is at most INT_MAX:
unsigned n = ...
int i = n & INT_MAX;
Good analysis tools see that n & INT_MAX always converts into int without loss.
There is no built-in safe narrowing conversion between integer types in C++ or the standard library. You could implement it yourself, using Microsoft GSL (gsl::narrow) as an example.
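For illustration, a minimal self-contained sketch in the spirit of gsl::narrow: convert, convert back, and verify nothing was lost (checked_narrow is an illustrative name, not a real GSL API):
#include <cstddef>
#include <cstdio>
#include <stdexcept>

template <typename To, typename From>
To checked_narrow(From value) {
    To result = static_cast<To>(value);
    // Round-trip check plus a sign-flip check, so e.g. SIZE_MAX -> -1
    // is caught even though it round-trips bit-for-bit.
    if (static_cast<From>(result) != value ||
        (result < To{}) != (value < From{}))
        throw std::runtime_error("narrowing lost information");
    return result;
}

int main() {
    std::size_t ok = 42;
    std::printf("%d\n", checked_narrow<int>(ok));  // prints 42
    try {
        std::size_t big = static_cast<std::size_t>(-1);
        checked_narrow<int>(big);                  // throws
    } catch (const std::runtime_error& e) {
        std::printf("%s\n", e.what());
    }
}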
Theoretically, if you want perfect safety, you shouldn't be mixing types like this at all. (And you definitely shouldn't be using explicit casts to silence warnings, as you know.) If you've got values of type size_t, it's best to always carry them around in variables of type size_t.
There is one case where I do sometimes decide I can accept less than 100.000% perfect type safety, and that is when I assign sizeof's return value, which is a size_t, to an int. For any machine I am ever going to use, the only time this conversion might lose information is when sizeof returns a value greater than 2147483647. But I am content to assume that no single object in any of my programs will ever be that big. (In particular, I will unhesitatingly write things like printf("sizeof(int) = %d\n", (int)sizeof(int)), explicit cast and all. There is no possible way that the size of a type like int will not fit in an int!)
[Footnote: Yes, it's true, on a 16-bit machine the assumption is the rather less satisfying threshold that sizeof won't return a value greater than 32767. It's more likely that a single object might have a size like that, but probably not in a program that's running on a 16-bitter.]
Consider this example:
#include <iostream>
int main()
{
    char c = 256;
    std::cout << static_cast<int>(c);
}
This raises a warning:
warning: overflow in conversion from 'int' to 'char' changes value from '256' to ''\000'' [-Woverflow]
But this:
#include <iostream>
int main()
{
    char c = 255;
    std::cout << static_cast<int>(c);
}
doesn't. Yet std::cout prints neither 256 nor 255 in the two cases, which shows that char can hold neither 256 nor 255. So why is the warning raised only when c is 256?
You can toy around with it here
It is important to specify whether char is signed in your example or not, but going by your link it is signed.
If you write
char c = 256;
256 has type int, so to store the value in c a conversion to char has to happen.
For signed target types, such integer conversions produce the same value if it is representable in the target type. With the typical bit size and representation of a signed char, that range is -128 to 127.
What happens if the source value is not representable in the target type depends on a few factors.
First, since C++20 two's complement is guaranteed, meaning that the resulting value is guaranteed to be the unique representable value that is congruent to the source value modulo 2^n, with n the bit width of the target type (typically 8 for char).
Before C++20 it was implementation-defined what happens in such a case, but it is very likely that the implementation would just have specified behavior equivalent to the C++20 one.
So, since C++20 the warning is not there to prevent undefined or even implementation-defined behavior; it is just meant to notify the user about likely mistakes.
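A small sketch of that modulo rule in action, assuming the usual signed 8-bit char:
#include <iostream>

int main()
{
    // GCC's -Woverflow warns on the first two initializers but not
    // on the third, as discussed below.
    char x = 256;  // congruent to 0 modulo 256
    char y = 257;  // congruent to 1 modulo 256
    char z = 255;  // congruent to -1 modulo 256
    std::cout << static_cast<int>(x) << ' '
              << static_cast<int>(y) << ' '
              << static_cast<int>(z) << '\n';  // prints: 0 1 -1
}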
I can't be sure why GCC chooses to warn only for values larger than 255, but my guess is that it is done because a statement like
char c = 255;
makes sense if you interpret the right-hand side as an unsigned char. The conversion rule explained above would not change the character represented before and after the conversion.
However
char c = 256;
doesn't even make sense if the right-hand side is interpreted as unsigned char. So the likelihood that this is a mistake seems higher.
But maybe I am also guessing in the wrong direction. There is an open bug report for GCC concerning this behavior of the -Woverflow warning. From reading the comments there it is not clear to me what the reasoning for the behavior was originally.
Clang for example consistently warns about all out-of-range values. So there seem to be different thoughts put into the warnings.
Consider the following program:
#include <iostream>
int main()
{
    unsigned int a = 3;
    unsigned int b = 7;
    std::cout << (a - b) << std::endl; // underflow here!
    return 0;
}
In the line starting with std::cout an underflow happens: a is less than b, so a - b is mathematically less than 0, but since a and b are unsigned, so is a - b, and the value wraps around.
Is there a compiler flag (for g++) that gives me a warning when I try to calculate the difference of two unsigned integers?
Now, one could argue that an overflow/underflow can happen in any calculation using any operator. But I think applying operator- to unsigned ints is more dangerous, because with unsigned integers this error can happen with quite low (to me: "more common") numbers.
A (static analysis) tool that finds such things would also be great but I much prefer a compiler flag and warning.
GCC does not (afaict) support it, but Clang's UBSanitizer has the following option [emphasis mine]:
-fsanitize=unsigned-integer-overflow: Unsigned integer overflow, where the result of an unsigned integer computation cannot be represented in its type. Unlike signed integer overflow, this is not undefined behavior, but it is often unintentional. This sanitizer does not check for lossy implicit conversions performed before such a computation
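Roughly how that would be used on the program above; the exact diagnostic wording may differ between Clang versions:
// underflow.cpp -- compile and run with:
//   clang++ -fsanitize=unsigned-integer-overflow underflow.cpp && ./a.out
#include <iostream>

int main()
{
    unsigned int a = 3;
    unsigned int b = 7;
    // At runtime the sanitizer reports something like:
    //   runtime error: unsigned integer overflow:
    //   3 - 7 cannot be represented in type 'unsigned int'
    std::cout << (a - b) << std::endl;
    return 0;
}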
I was investigating the structure of floating-point numbers, and I've found that most compilers use the IEEE 754 standard to store floating-point numbers.
And when I tried to do:
float a = 0x3f520000; // should be 0.8203125 in IEEE 754
printf("value of 'a' is: %X [%x] %f\n", (int)a, (int)a, a);
it produces the result:
value of 'a' is: 3F520000 [3f520000] 1062338560.000000
but if I try:
int b=0x3f520000;
float* c = (float*)&b;
printf("value of 'c' is: %X [%x] %f\r\n", *(int*)c, *(int*)c, c[0]);
it gives:
value of 'c' is: 3F520000 [3f520000] 0.820313
The second try gave me the right answer. What is wrong with the first try? And why does the result differ from the one I get when I cast int to float via a pointer?
The difference is that the first converts the value (0x3f520000 is the integer 1062338560), and is equivalent to this:
float a = 1062338560;
printf("value of 'a' is: %X [%x] %f\n",(int)a,(int)a, a);
The second reinterprets the representation of the int - 00111111010100100000000000000000 - as being the representation of a float instead.
(It's also undefined, so you shouldn't expect it to do anything in particular.)
[Note: This answer assumes C, not C++, which have different rules]
With
float a=0x3f520000;
you take the integer value 1062338560 and the compiler will convert it to 1062338560.0f.
If you want a hexadecimal floating-point constant you must use the exponent format with the letter p, as in 0x1.a4p-1 (which is hexadecimal notation for 0.8203125).
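For illustration (hexadecimal floating-point literals are C99; C++ only gained them in C++17):
#include <stdio.h>

int main(void) {
    float a = 0x1.a4p-1f;  // 1.640625 * 2^-1 == 0.8203125
    printf("%.7f\n", a);   // prints 0.8203125
}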
What happens with
int b=0x3f520000;
float* c = (float*)&b;
is that you break strict aliasing and tell the compiler that c is pointing to a floating-point value (the strict aliasing violation is that b isn't a floating-point object). Dereferencing c then reinterprets the bits of b as a float value.
0x3f520000 is an integer constant. When assigned to a float, the integer is converted.
A more proper example of how to do the conversion in the second case:
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main() {
    uint32_t fi = 0x3f520000;
    float f;
    memcpy(&f, &fi, sizeof(f));
    printf("%.7g\n", f);
}
it prints:
0.8203125
so that is what you expected.
The approach I used is memcpy, which is the safest for all compilers and the best choice for modern compilers (GCC since approx. 4.6, Clang since 3.x), which interpret memcpy as a "bit cast" in such cases and optimize it in an efficient and safe way (at least in "hosted" mode). It's still safe for older compilers, just not necessarily as efficient; some may prefer a cast through a union or even through a different pointer type. On the dangers of those approaches, see here, or generally search for "type punning and strict aliasing".
(Also, there could be some weird platforms that suffer from endianness issues, where integer endianness differs from float endianness; platforms whose bytes are not 8 bits; and so on. I don't consider them here.)
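As a side note, C++20 added std::bit_cast, which does the same job as the memcpy above with a compile-time size check; a minimal sketch:
#include <bit>
#include <cstdint>
#include <cstdio>

int main() {
    std::uint32_t fi = 0x3f520000;
    // Both types must have the same size; the bit pattern is copied,
    // not the numeric value.
    float f = std::bit_cast<float>(fi);
    std::printf("%.7g\n", f);  // prints 0.8203125
}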
UPDATE: I started answering the initial version of the question. Yes, bit casting and value conversion give fundamentally different results. That's how floating-point numbers work.
I was trying to determine the largest possible value in a bit field. What I did is:
using namespace std;

struct A {
    unsigned int a : 1;
    unsigned int b : 3;
};

int main()
{
    A aa;
    aa.b = ~0U;
    return 0;
}
MSVC is fine but GCC 4.9.2 gave me a warning:
warning: large integer implicitly truncated to unsigned type [-Woverflow]
I'm wondering how I can get rid of it (assuming I don't know the bit width of the field, and I want to know the largest possible value it can hold).
You can try working around this as follows:
aa.b = 1;
aa.b = -aa.b;
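Put together as a complete sketch (relying on the usual conversion rule that -1 stored into an unsigned bit-field is reduced modulo 2^width):
#include <cstdio>

struct A {
    unsigned int a : 1;
    unsigned int b : 3;
};

int main() {
    A aa;
    aa.b = 1;
    aa.b = -aa.b;  // -1 reduced modulo 2^3 gives 7, the maximum
    std::printf("%u\n", static_cast<unsigned>(aa.b));  // prints 7
}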
Note that value-representation aspects of bit-fields, including their range, are currently underspecified in the language standard, which is considered a defect. This is strange, especially considering that other parts of the document (e.g. the specification of enum types) attempt to rely on the range of representable values of bit-fields for their own purposes. This is supposed to be taken care of in the future.