Union in C stores one member at a time - C++

If the size of a union in memory equals the size of its biggest member, could anyone tell me how the compiler stores and fetches two values, double d and int i (8 + 4 = 12 bytes in total; a double on my machine is 8 bytes)?
#include<stdio.h>
union test {
int i;
double d;
};
int main()
{
union test obj;
obj.d=15.5;
obj.i=200;
printf("\nValue stored in d is %f",obj.d);
printf("\nValue stored in i is %d",obj.i);
printf("\n size of obj is %zu ",sizeof(obj)); /* %zu is the printf format for size_t */
}
Output is:
Value stored in d is 15.500000
Value stored in i is 200
size of obj is 8

The way it can store both is "pure luck". I'm going to assume your computer uses IEEE 754 floating point numbers and is little-endian (like x86), and try to explain what you're seeing. Your union really does use only eight bytes, and 15.5 looks like this in hex: 402F000000000000. As you can see, the lower four bytes are completely zero. On a little-endian machine obj.i occupies exactly those four bytes, so let's set them to the integer 200 (hex C8) and see what happens to the eight-byte value. That gives us 402F0000000000C8. If you now read all eight bytes back as an IEEE 754 double, you get 15.500000000000355, which %f rounds to 15.500000. That makes it appear that the union can store both a double and an int.
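All that said, accessing both members of the union like that is undefined behavior in C++, both in C++11 and in later standards. It still behaves in the logical way on all platforms I'm aware of, so this is simply one possible explanation for the behavior you observe. In C it is legal: reading a member other than the one last written reinterprets the stored bytes as the new type (the C standard calls this "type punning"). The resulting value may still be meaningless, or even a trap representation.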

The reason it seems like you can store both numbers in overlapping memory is the way the representation of the small values you chose is arranged.
If you try values whose representation uses more of the low-order bits, for example:
obj.d=100000000000000;
obj.i=0xffffffff;
You'll see a difference in the output of printing the double value:
Value stored in d is 100000059097087.984375

I think it's not really working the way you think it is. If you add a bit more to the test:
#include<stdio.h>
union test {
int i;
double d;
};
int main()
{
union test obj;
obj.d=15.5;
obj.i=200;
printf("\nValue stored in d is %f",obj.d);
printf("\nValue stored in i is %d",obj.i);
printf("\n size of obj is %zu ",sizeof(obj)); /* %zu is the printf format for size_t */
obj.d=17.5;
printf("\nValue stored in d is %f",obj.d);
printf("\nValue stored in i is %d",obj.i);
printf("\n size of obj is %zu ",sizeof(obj));
obj.i=300;
printf("\nValue stored in d is %f",obj.d);
printf("\nValue stored in i is %d",obj.i);
printf("\n size of obj is %zu ",sizeof(obj));
}
Output is:
$ ./main
Value stored in d is 15.500000
Value stored in i is 200
size of obj is 8
Value stored in d is 17.500000
Value stored in i is 0
size of obj is 8
Value stored in d is 17.500000
Value stored in i is 300
size of obj is 8
Note that the value of i is 0 in the middle there! That's because writing 17.5 to d overwrote all eight bytes, and on a little-endian IEEE 754 machine the four bytes that i reads are all zero for 17.5.
If I understand correctly, it's undefined behavior there, and exactly what value you see will depend on the architecture you compile for, the alignment the compiler uses for this union, etc.
Edit:
I think I understand the original question now: it's about why it is possible to recover the first stored value at all, right? Why don't I see gibberish instead of 200?
My guess is that it might depend on some implementation details of the floating point format. Maybe the double happens not to overwrite the bits of the "int" if it's a "simple" double like 17.5. Not completely sure. Edit: See "imreal"'s answer.

You can store either i or d in your union, not both at the same time. The statement
obj.i=200;
overwrites the value stored in the memory allocated for the union. Now accessing obj.d with the %f specifier will invoke undefined behavior, because the value stored in obj is an int.

After writing to obj.i, reading from obj.d is undefined behavior, and as far as the C standard is concerned anything can happen.
In this particular case you probably don't see anything happen because of how floating point numbers are laid out in memory. You are changing some of the least significant bits of the mantissa, which creates a change that doesn't show up in the 6 digits after the decimal point that %f prints. Add more digits:
printf("\nValue stored in d is %.17f",obj.d);
and you'll get:
Value stored in d is 15.50000000000035527

Related

Pointer with same memory address with different value

I cast the address of a double to an int pointer.
Even though both pointers hold the same address, why are the values different?
#include<iostream>
using namespace std;
int main()
{
double d = 2.5;
auto p = (int*)&d;
auto q = &d;
cout<<p<<endl; // prints memory address 0x7fff5fbff660
cout<<q<<endl; // print memory address 0x7fff5fbff660
cout<<*p<<endl; //prints 0
cout<<*q<<endl; // prints 2.5
return 0;
}
But why are the values different?
0x7fff5fbff660
0x7fff5fbff660
0
2.5
Program ended with exit code: 0
Suppose you have "11" written on a piece of paper. That is eleven if it's decimal digits. That is two if there's one mark for each value. That's three if it's binary. How you interpret stored information affects the value you understand it to be storing.
double d = 2.5;
auto p = (int*)&d;
auto q = &d;
p and q are created pointing to the same memory location. The memory holds a double (usually 8 bytes).
When you write
auto p = (int*)&d;
you are telling the compiler (the equivalent of reinterpret_cast<int*>(&d)) that the value in d is an integer.
So the values of the pointers are the same, but the types are not.
When you print out
cout<<*q<<endl; // prints 2.5
You are displaying the correct value, because it is read back through the same type (double) that it was stored as.
When you print out
cout<<*p<<endl; //prints 0
You are looking at 4 (typically) bytes of the 8-byte memory and interpreting them as an integer.
These happen to be 0x00, 0x00, 0x00, 0x00: the low half of 2.5's representation, 0x4004000000000000, on a little-endian machine.
It's because you've violated the strict aliasing rule, giving you undefined behavior. You cannot access an object of type A through a pointer of type B and just pretend it works.
TL;DR:
If you have an int* pointing to some memory containing an int, and then you point a float* to that memory and use it as a float, you break the rule. If your code does not respect this, then the compiler's optimizer will most likely break your code.
The memory addresses are the same, and they both point to a double-precision floating point number in memory. However, you've asked the compiler to treat one as an integer and another as a double. (A pointer might just be a memory address, but at compile-time the compiler has information about the type as well.) It just so happens that the in-memory representation of this particular double-precision number looks like a 0 when treated as an integer.
Because you have cast them to different types yourself.
When you do auto p = (int*)&d; you are asking the compiler to read the memory that holds a double value as if it held an integer. An integer and a double are represented in different formats in a computer's memory: a double is stored using a floating point representation, while an int is not. This is a classic example of undefined behaviour.

Can anyone please explain what this C++ code is doing?

char b = 'a';
int *a = (int*)&b;
std::cout << *a;
What could be the content of *a? It is showing a garbage value. Can anyone please explain why?
Suppose char takes one byte in memory and int takes two bytes (the exact number of bytes depends on the platform, but usually they are not the same for char and int). You set a to point to the same memory location as b. When b is dereferenced, only one byte is considered, because it's of type char. When a is dereferenced, two bytes are accessed, and the integer stored at those locations is printed. That's why you get garbage: the first byte is 'a' and the second is a random byte, and together they give you a random integer value.
Either the first or the last byte should be hex 61, depending on byte order. The other three bytes are garbage. Best to change the int to an unsigned int and print it with std::hex.
I don't know why anyone would want to do this.
You initialize a variable with the datatype char.
A char in C++ is 1 byte, and an int is at least 2 bytes (typically 4). Your a points to the address of the b variable. Addresses are conventionally shown as hexadecimal numbers, and every time you run this program the address may be different, because the operating system can place b at a different address each time the program starts (for example, through address space layout randomization).
Think of it as byte blocks. A char has one byte block (8 bits). When you convert with (int*) and dereference, you read sizeof(int) byte blocks starting at the char's address: the char's own byte plus the next 3 with a typical 4-byte int. Those 3 extra byte blocks are effectively random, which means you get a random integer. That's why you get a garbage value.
The code invokes undefined behavior. Garbage output is one form of undefined behavior, but your program could also cause a system violation and crash, with worse consequences.
int *a = (int*)&b; initializes a pointer to int with the address of a char. Dereferencing this pointer will attempt to read an int from that address:
If the address is misaligned and the processor does not support misaligned accesses, you may get a system specific signal or exception.
If the address is close enough to the end of a segment that accessing beyond the first byte causes a segment violation, that's what you can get.
If the processor can read the sizeof(int) bytes at the address, only one of those will be 'a' (0x61 in ASCII), and the others have undetermined values (aka garbage). As a matter of fact, on some architectures reading uninitialized memory may cause problems; under valgrind, for example, this will cause a warning to be displayed to the user.
All the above are speculations, undefined behavior means anything can happen.

Float to int number conversion in c++

The following C++ code:
union float2bin{
float f;
int i;
};
float2bin obj;
obj.f=2.243;
cout<<obj.i;
gives some garbage value as output.
But
union float2bin{
float f;
float i;
};
float2bin obj;
obj.f=2.243;
cout<<obj.i;
gives the same output as the value of f, i.e. 2.243.
With GCC, int and float have the same size, i.e. 4 bytes, so what's the reason behind this behaviour?
The reason is that it is undefined behavior. In practice, you'll get away with reading an int from something that was stored as a float on most machines, but you'll read garbage values unless you know what to expect. Doing it in the other direction will likely cause the program to crash for certain values of int.
Under the hood, of course, integral values and floating point values have different representations, at least on most machines. (On some Unisys mainframes, your code would do what you expect. But they're not the most common systems around, and you probably don't have one on your desktop.) Basically, regardless of the type, you have a sequence of bits, which will be interpreted by the hardware in some way. C++ requires integers to use a pure binary representation, which constrains the representation somewhat. It also requires a very large range for floating point values, and more or less requires some form of exponential notation, with some bits representing the exponent and others the mantissa, and with different encodings for each.
The reason is because floating point values are stored in a more complicated way, partitioning the 32 bits into a sign, an exponent and a fraction. If these bits are read as an integer straight off, it will look like a very different value.
The important point here is that if you create a union, you are saying that it is one contiguous block of memory that can be interpreted in two different ways. Nowhere does this mechanism perform a value conversion between float and int, which is the kind of conversion in which rounding (truncation) occurs.
Update: What you might want is
float f = 10.25f;
int i = (int)f;
// Will give you i = 10
However, the union approach is closer to this:
float f = 10.25f;
int i = *((int *)&f);
// Will give you some seemingly arbitrary value

Confused by the output of the following program

float b = 1.0f;
int i = b;
int& j = (int&)i;
cout<<j<<endl;
Output: 1
But for the following scenario
float b = 1.0f;
int i = b;
int& j = (int&)b;
cout<<j<<endl;
Output: 1065353216
Since both have the same value, shouldn't they show the same result? Can anyone please tell me what's really happening when I change line 3?
In the first one, you are doing everything fine. The compiler is able to convert float b to int i, losing precision, but it's fine. Now, take a look at my debugger window during the execution of your second example:
Sorry for my Russian IDE interface, the first column is variable name, the second is value, and the third is type.
As you can see, now the float is simply interpreted as an int: the float's bits are read as if they were the integer's bits, which leads to the result you are getting. So basically, you take the float's binary representation (usually a sign bit, an exponent and a mantissa) and try to interpret it as an int.
In the first case you're initializing j correctly and the cast is superfluous. In the second case you're doing it wrong (i.e. to an object of a different type) but the cast shuts the compiler up.
In this second case, what you get is probably the internal representation of 1.0 interpreted as an integer.
Integer 1 and floating-point 1.0f may be mathematically the same value, but in C++ they have different types, with different representations.
Casting an lvalue to a reference is equivalent to reinterpret_cast; it says "look at whatever is in this memory location, and interpret those bytes as an int".
In the first case, the memory contains an int, so interpreting those bytes as an int gives expected value.
In the second case, the memory contains a float, so you see the bytes (or perhaps just some of them, or perhaps some extra ones too, if sizeof(int) != sizeof(float)) that represent the floating-point number, reinterpreted as an integer.
Your computer probably uses 32-bit int and 32-bit IEEE float representations. The float value 1.0f has a sign bit of zero, an exponent of zero (represented by the 8-bit value 127, or 01111111 in binary), and a mantissa of 1 (represented by the 23-bit value zero), so the 32-bit pattern would look like:
00111111 10000000 00000000 00000000
When reinterpreted as an integer, this gives the hex value 0x3f800000, which is 1065353216 in decimal.
A reference doesn't allocate any memory; it just places an entry into a table of local names and their addresses. In the first case the name 'j' refers to memory previously allocated for an int (variable 'i'), while in the second case 'j' refers to memory allocated for a float (variable 'b'). When you use 'j', the compiler interprets the data at that address as an int, but in fact a float is stored there. That's why you get a "strange" number instead of 1.
In the first one, b is converted to an int when it is assigned to i, and j refers to i. This is the "proper" way, as the compiler will properly convert the value.
The second one does no value conversion; it reinterprets b's bits as an integer. If you read up on the floating point format, you can see exactly why you're getting the value you're getting.
Under the covers, all your variables are just collections of bits. How you interpret those bits changes the perceived value they represent. In the first one, you're rearranging the bit pattern to preserve the "perceived" value (of 1). In the second one, you're not rearranging the bit pattern, and so the perceived value is not properly converted.

C++ union to represent data memory vs C scalar variable type

Today I've a weird question.
The Code (C++)
#include <iostream>
union name
{
int num;
float num2;
}oblong;
int main(void)
{
oblong.num2 = 27.881;
std::cout << oblong.num << std::endl;
return 0;
}
The Code (C)
#include <stdio.h>
int main(void)
{
float num = 27.881;
printf("%d\n" , num);
return 0;
}
The Question
As we know, C++ unions can hold more than one type of data element, but only one at a time. So basically the name oblong will only reserve one portion of memory, which is 32 bits (because the biggest types in the union, int and float, are 32 bits), and this portion can hold either an integer or a float.
So I just assign a value of 27.881 into oblong.num2 (as you can see on the above code). But out of curiosity, I access the memory using oblong.num which is pointing to the same memory location.
As expected, it gave me a value which is not 27, because floats and integers are represented differently in memory. That's why, when I use oblong.num to access that portion of memory, it treats the value as an integer and interprets it using the integer representation.
I know this phenomenon also happens in C, which is why I initialized a float variable with a value and later read it using %d. I tried it with the same value, 27.881, as you can see above. But when I ran it, something weird happened: the value I get in C is different from the one I get in C++.
Why does this happen? As far as I know, the two values I get from the two programs are not garbage values, so why are they different? I also used sizeof to verify that int and float are both 32 bits in both C and C++, so memory size isn't the cause. What causes this difference in values?
First of all, having the wrong printf() format string is undefined behavior. Now that said, here is what is actually happening in your case:
In vararg functions such as printf(), integers smaller than int are promoted to int and floats smaller than double are promoted to double.
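The result is that your 27.881 is converted to an 8-byte double as it is passed into printf(). Therefore, its binary representation is no longer the same as that of a float.
Format string %d expects a 4-byte integer. So in effect, you will be printing the lower 4 bytes of the double-precision representation of 27.881. This assumes a little-endian machine and a calling convention that passes both kinds of argument the same way, as 32-bit x86 does on the stack. On x86-64 Linux, for example, floating-point arguments are passed in separate registers, so %d would read something unrelated.
*Actually (assuming strict FP), you are seeing the bottom 4 bytes of 27.881 after it is converted to float and then promoted to double.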
In both cases you are encountering undefined behaviour. Your implementation just happens to do something strange.