Somebody told me that C type-casting conversions only change how the system interprets the information (for example, casting the char 'A' to int returns 65 when printed with cout, since in memory it stays 01000001).
However, I noticed that when casting floating point numbers to integers of the same width, the value is preserved rather than reinterpreted, which would not be the case if only the interpretation changed.
For example, let X be a double precision floating point number:
double X = 3.14159;
As far as I know, when inspecting the bytes at &X we will find (according to a decimal-to-binary converter):
01000000 00001001 00100001 11111001 11110000 00011011 10000110 01101110
But, as some of you would already know, when doing:
long long Y = (long long)X;
Y will be 3, the truncated version of X, instead of 4614256650576692846, the value we would get by reinterpreting the bytes at &X as a long long.
So it seems clear they were wrong, but then how does casting work at a low level? Is there some detection of whether the value would change? And how would you code it to get Y = 4614256650576692846 instead of Y = 3?
Casting tries to preserve the value as precisely as possible.
You can use memcpy() to copy bit patterns.
#include <iostream>
#include <cstring>
int main() {
    double X = 3.14159;
    long long Y;
    std::memcpy(&Y, &X, sizeof(Y));  // copies the bit pattern, not the value
    std::cout << Y << '\n';
    return 0;
}
Casting lets the compiler decide how to transform the data so that it stays as meaningful as possible while still respecting the requested datatype.
The int to char conversion just changes the interpretation from, let us say, 65 to 'A'.
However, when we have a value we may want to conserve, the compiler will use special instructions for its conversion.
For example, when casting from double to long long, the processor will use the CVTTSD2SI instruction, which converts a floating-point register's value, with truncation, into a general-purpose register:
double a = 3.14159;
long long b = (long long)a;
will have a disassembly of (I got rid of the stack pointers for ease of understanding):
movsd xmm0, QWORD PTR [a]
cvttsd2si rax, xmm0
mov QWORD PTR [b], rax
So, the ways to get at the original bits are as mentioned in the selected answer: reinterpreting the pointer to the double as a pointer to long long and dereferencing it or, as others stated, using memcpy().
If you want to get Y = 4614256650576692846, you can use:
double X = 3.14159;
long long Y = *( (long long*)(&X) );
This casts a pointer-to-double to a pointer-to-long long, so the compiler treats the address (long long*)(&X) as the location of a long long.
But I don't advise doing this: it violates the strict aliasing rule (formally undefined behaviour), the result depends on how double is stored on your machine, and it is not guaranteed to be 4614256650576692846.
Related
For some legacy reasons I have code that casts a double to an unsigned byte, and we are seeing a significant difference between the two platforms. Short of the advice "don't try to stuff a signed value into an unsigned value; don't try to stuff a double value into an integer", is there something else that can be done?
unsigned char newR1 = -40.16;
The value of newR1 is 216 on Windows (as we have expected for a long time), but on ARM64 it is 0.
Disassembly on Win:
00007FF75E388818 cvttsd2si eax,mmword ptr [R]
00007FF75E38881D mov byte ptr [newR1],al
On ARM64
00007FF6F9E800DC ldr d16,[sp,#0x38]
00007FF6F9E800E0 fcvtzu w8,d16
00007FF6F9E800E4 uxtb w8,w8
00007FF6F9E800E8 strb w8,[sp,#0x43]
Will try these as well, but just wanted some other opinions
unsigned char newR1 = -40.16;
unsigned char newR2 = (int)-40.16;
unsigned char newR3 = (unsigned char)-40.16;
unsigned char newR4 = static_cast<int>(-40.16);
or may be
int i = -40.16;
unsigned char c = i;
What the C standard says (and there's similar text in the C++ one):
When a finite value of real floating type is converted to an integer
type other than _Bool, the fractional part is discarded (i.e., the
value is truncated toward zero). If the value of the integral part
cannot be represented by the integer type, the behavior is undefined.
So, getting 216 out of -40.16 with a single cast from double to unsigned char is already UB. In fact, getting any result in this case is UB. Which is why the compiler is free to produce anything and not 216 that you desire.
You may want to do two casts:
(unsigned char)(int)-40.16
Again, the first cast (to int) is still subject to the above restriction I quoted.
I've searched the docs but I can't find anything to explain this.
I have a D program:
import std.stdio;
void main () {
    writeln(int.max);
    int a = 2;
    int b = 180;
    writeln(a^^b);
}
It writes:
2147483647
0
I overflowed int, but instead of getting junk or wrapping, I get 0.
If I use real or double, obviously the output will be correct.
I wrote a program to experiment with this in C (not that C is D, but D compiles to native code and C is portable assembler, so they should be comparable):
#include <stdio.h>
#include <math.h>
#include <limits.h>
int main(void) {
    double i = pow(2, 128);
    int j = (int) i;
    printf("int max: %d\n", INT_MAX);
    printf("print dbl as int: %d\n", i);
    printf("cast dbl -> int: %d\n", j);
    printf("double: %f\n", i);
    return 0;
}
It gives:
int max: 2147483647
print dbl as int: -1254064128
cast dbl -> int: -2147483648
double: 340282366920938463463374607431768211456.000000
The second and third lines will rarely be the same twice, as I believe it's undefined behaviour, which is the point.
I know that D wants to be a better C, and a way to do this is to eliminate undefined behaviour.
But, if D is a systems programming language (it even has inline asm), why does D refuse to wrap on overflow?
It DID wrap on overflow. You just happened to try the family of cases that happen to wrap to zero. Try 3^^180. I got -949019631. Just because a number happens to look pretty on screen doesn't mean it isn't garbage!
Consider that 2^^n == 1 << n. What happens when you shift a single bit left over and over and over again? Eventually, all the bits on the right become zero! Then when you truncate that to fit in a 32 bit int, you are left with all zeros.
But let me go into some detail anyway. First, a critique of your C:
// snip. note that i is type double
printf("print dbl as int: %d\n", i);
This line is wrong on two levels: it passes a 64 bit double where printf is expecting a 32 bit int, and it is reinterpret casting those bits to int, which is entirely different than doing a conversion to int.
If you wanted to do this in D, you'd want to explicitly reinterpret the bits using a union or cast through an intermediate pointer. You could even slice off the other 32 bits if you wanted to!
The next line, which uses a proper explicit cast, is written correctly, but still undefined behavior because casting a double to int when it is too large to fit is something neither C nor D (nor the underlying hardware) makes any promises about.
And back to D. The ^^ operator in D simply rewrites the expression into std.math.pow(a, b). std.math.pow has different implementations for different types. Since both arguments are integral here, it does no floating point calculations at all - it has a pure int/long implementation that works just like multiplication.
So your C comparison isn't quite right because in C, you used double and tried to convert, whereas in D, it never touched floating point at all. Integer multiplication is defined to work via two's complement and truncation, and that's exactly what happened here. It overflowed, leaving all zeros in the bits left behind.
I have a function to convert floating point array to unsigned char array. This uses asm code to do that. The code was written many years ago. Now I am trying to build the solution in x64 bit. I understand that _asm is not supported on X64.
What is the best way to remove asm dependency?
Will the latest MS VC compiler optimize this if I write plain C code? Does anyone know if there is anything in Boost or the intrinsic functions to accomplish this?
I solved it with the following code, and it is faster than the asm:
inline static void floatTOuchar(float * pInbuf, unsigned char * pOutbuf, long len)
{
    std::copy(pInbuf, pInbuf + len, pOutbuf);
}
With SSE2, you can use intrinsics to pack from float down to unsigned char, with saturation to unsigned the 0..255 range.
Convert four vectors of floats to vectors of ints with CVTPS2DQ (_mm_cvtps_epi32) to round to nearest, or convert with truncation (_mm_cvttps_epi32) if you want C's default truncate-toward-zero behaviour.
Then pack those vectors together, first to two vectors of signed 16bit int with two PACKSSDW (_mm_packs_epi32), then to one vector of unsigned 8bit int with PACKUSWB (_mm_packus_epi16). Note that PACKUSWB takes signed input, so using SSE4.1 PACKUSDW as the first step just makes things more difficult (extra masking step). int16_t can represent all possible values of uint8_t, so there's no problem.
Store the resulting vector of uint8_t and repeat for the next four vectors of floats.
Without manual vectorization, normal compiler output is good for code like this:
int ftoi_truncate(float f) { return f; }
cvttss2si eax, xmm0
ret
int dtoi(double d) { return nearbyint(d); }
cvtsd2si eax, xmm0 # only with -ffast-math, though. Without, you get a function call :(
ret
You can try the following and let me know:
inline int float2int( double d )
{
    union Cast
    {
        double d;
        long l;
    };
    volatile Cast c;
    c.d = d + 6755399441055744.0;
    return c.l;
}

// Same thing but it's not always optimizer safe
inline int float2int( double d )
{
    d += 6755399441055744.0;
    return reinterpret_cast<int&>(d);
}

for(int i = 0; i < HUGE_NUMBER; i++)
    int_array[i] = float2int(float_array[i]);
So the trick is the magic double constant. With the constant shown, the function rounds to the nearest whole number. If you want truncation, use 6755399441055743.5 (0.5 less).
A very informative article is available at: http://stereopsis.com/sree/fpu2006.html
Let me start by saying that I don't know much about Assembly, but this is something that I'm wondering about.
Let's say that I have a code in C++ such as the following:
float f = 34.2;
int i;
i = f;
Now obviously what will happen when this code gets executed is that the value of f (34.2) will be converted to an integer value (34) and assigned to i.
My question is how this conversion happens. Does it happen at runtime, i.e. is there code embedded in the executable that effectively says: f is being assigned to i; f is a float and i is an integer, so convert the bits in f to an integer representation and assign them to i?
Or, at compile time, is i = f directly replaced by code that converts a float to an integer?
Your code is
float f = 34.2;
int i;
i = f;
Just debug it and have a look at the Disassembly window. In a debug build (so constant propagation doesn't happen, and the variables aren't optimized away entirely):
float f = 34.2;
01175498 movss xmm0,dword ptr ds:[117DF70h]
011754A0 movss dword ptr [f],xmm0
int i;
i = f;
011754A5 cvttss2si eax,dword ptr [f]
011754AA mov dword ptr [i],eax
You can see the instruction cvttss2si (Convert with Truncation Scalar Single-Precision Floating-Point Value to Integer) is being executed.
This is what Mats Petersson said in his comment. This instruction will convert the float to its integer representation, with rounding towards 0 regardless of the current rounding mode.
The input operand is 32 bits wide, and is interpreted as IEEE single-precision because x86 uses that format for float.
(C++ compilers targeting x86 without SSE1/SSE2 had to change the x87 rounding mode to truncation and then back to the previous value; that's why SSE1 and SSE2 included convert-with-truncation instructions but not other rounding-mode overrides, until SSE4.1 roundss/roundsd to implement floor/ceil/trunc/nearbyint with a floating-point result. C++ requires FP->integer conversions to truncate towards zero, separately from the default rounding mode when producing an FP result.)
Most other modern ISAs have a single instruction FP->int conversion with truncation instruction, although non-CISC ones can only operate between registers and would need separate load and store in a debug build.
I have a question: how can I see the value of the number at memory address X in C++?
I want to make something like the:
mov bx, 1024d
mov ax, [bx]
from assembly, where ax will be my result.
P.S. I just started working with pointers and memory addresses
In C++, the value at that address is *reinterpret_cast<uint16_t*>(1024).
In C/C++ an address is stored in a pointer, so BX from your code becomes:
unsigned short* _mybx = (unsigned short*)1024U; // note the cast to unsigned short*
To get the value stored at that address, dereference the pointer:
unsigned short _myax = *_mybx; // note the asterisk here
Instead of the C-style cast, you can use reinterpret_cast:
unsigned short* _bx = reinterpret_cast<unsigned short*>(1024);
which is the more C++ way.