Undefined bhaviour with usage of union in cpp?

Undefined bhaviour with usage of union in cpp? - c++

I'm working on union using c++,
following is code snippet:
#include<iostream>
using namespace std;
typedef union myunion
{
double PI;
int B;
}MYUNION;
int main()
{
MYUNION numbers;
numbers.PI = 3;
numbers.B = 50;
cout <<" numbers.PI :" << numbers.PI << endl;
if(numbers.PI == 3.0)
{
cout <<"True";
if(numbers.B == 50)
{
cout <<" numbers.PI :" << numbers.PI << endl;
cout <<" numbers.B :" << numbers.B << endl;
}
}
return 0;
}
Output is:
numbers.PI :3
Even value of numbers.PI is set to 3 already, first "if" condition yields to false.
what is the reason of this behavior?

The reason is that there's no reason.
Your code invokes undefined behavior because you are setting the B member of the union:
numbers.B = 50;
but immediately after setting it, you read out the other member, PI:
cout <<" numbers.PI :" << numbers.PI << endl;
Maybe you are confusing unions and structures - unless the floating-point number 3 and the integer 50 have the very same bit representation on your architecture (which is very unlikely), the behavior you expect from your program would be reasonable only if you used a struct instead.
(union members reside at the same place in memory - setting one overwrites the other too. This is not true for a struct, which has each of its members stored at a different memory location.)

Remember that all members of a union shares the same memory. When you assign to B you change the value of PI as well.
To be safe, you should only "read" from the last field you "write" to.
It seems to me that what you want is a structure.

You're getting undefined behavior, but here's what is happening behind the scenes:
You're using a little-endian machine with sizeof(int) < sizeof(double), such as an x86. Almost certainly, the machine uses IEEE 754 format for floats/double (pretty much all machines do these days).
When you write into the B field, it overwrites the low order bits of the double in PI. So
when you initially store 3.0 in PI, that sets it to something like 0x4008000000000000. Then when you store 50 in B that changes PI to 0x4008000000000032, which happens to be 3.00000000000002220446049250313080847263336181640625. So that's not equal to 3.0, but when you print it with default precision, it rounds it to 3.0

Related

Does converting a pointer to int actually return the location of it in byte or something? [duplicate]

This question already has answers here:
What does "dereferencing" a pointer mean?
(6 answers)
Closed 2 years ago.
I am new to C++, but I am curious enough to dig into these strange things.
I was wondering what happens when I convert a pointer to an int and realized could they indicate something. So I wrote this program to test my ideas as pointers in the same array are close enough in terms of memory location to be compared.
This is the code that will explain my question clearly.
#include <iostream>
using namespace std;
int main() {
cout << "--------------------[ Pointers ]--------------------" << endl;
const unsigned int NSTRINGS = 9;
string strArray[NSTRINGS] = { "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine" };
string *pStartArray = &strArray[0]; // Setting pStartArray pointer location to the first block of the array.
string *pEndArray = &strArray[NSTRINGS - 1]; // Setting pEndArray pointer location to the last block of the array.
cout << "---[ pStartArray value : " << *pStartArray << endl; // Showing the value of the pStartArray pointer (Just for safety check).
cout << "---[ pEndArray value : " << *pEndArray << endl; // Showing the value of the pEndArray pointer (Just for safety check).
short int blockDifferential = pEndArray - pStartArray; // Calculating the block differential of those two pointers.
cout << "---[ Differential of the block locations that pointers are pointing to in array (pEndArray - pStartArray) : " << blockDifferential << endl;
long long pStartIntLocation = (long long)pStartArray; // Converting the memory location (Hexadecimal) of pStartArray pointer to int (Maybe it's byte, regardless of being positive or negative). What's your opinion on this?
cout << "---[ (long long) pStartArray current memory location to int : \"" << pStartIntLocation << "\"" << endl;
long long pEndIntLocation = (long long)pEndArray;
cout << "---[ (long long) pEndArray current memory location converted to int : \"" << pEndIntLocation << "\"" << endl; // Converting the memory location (Hexadecimal) of pEndArray pointer to int (Maybe it's byte, regardless of being positive or negative). What's your opinion on this?
short int locationDifferential = pEndIntLocation - pStartIntLocation; // And subtracting the integer convetred location of pEndArray from pStartArray.
cout << "---[ Differential of the memory locations converted to int ((long long)pStartArray - (long long)pEndArray) : " << locationDifferential << " (Bytes?)" << endl; // Seems like even after running the program multiple times, this number does not change. Something's fishy. Doesn't it seem like it's a random thing. It must be investigated.
cout << "---[ Size of variable <string> (According to the computer that it's running) : " << sizeof(string) << " (Bytes)" << endl; // To know how much memory does a string consume. For example mine was 40.
// Here it goes interesting. I can get the block differential of the pointers using <locationDifferential>.
cout << "---[ Differential of the cell location (AGAIN) using the <locationDifferential> that I have calculated : " << locationDifferential/sizeof(string) << endl; // So definately <locationDifferential> was in bytes. Because I got 8 again. I just wonder is it a new discovery. LOL.
/*
I might look really crazy, because I can't tell it another way. It just can't happen.
This is the last try to make it as clear as I can.
pStrArray ]--\ pEndArray ]--\
\ - 8 cell difference - \
Array = | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|-------- Differential ---------|
Cell difference : 8 string cells
String size that I considered : 32 Bytes
data (location difference) : 8 * 32 = 256
So if you see this, all this might make sense.
I am excited to see what opinions you professional programmers have come up with.
- D3F4U1T
*/
cout << "----------------------------------------------------" << endl;
return 0;
}
How does this exactly work?
pStartIntLocation, pEndIntLocation are all in bytes?
If so, why sometimes they return negative value?
This is strange.
Also correct me if I am wrong about any information I provided.
- Best regards.
- D3F4U1T.
Edit 2:
Does the value that results from the conversion from pointer to a long long mean anything? Like the memory address but with the difference that this one is in bytes?
Edit 3: Seems like this is related to virtual address space. Correct me if I am wrong. Does the OS have any mechanism to number the memory as bytes. For example: Byte 1 , Byte 2 , ....

A "pointer" is an integer quantity of some length whose contents are understood to represent a memory address. (By convention, zero means NULL ... no address.)
If you typecast it into an integer, you are simply declaring to the compiler that *"no, these however-many bits should not be treated as an address ... treat them as an integer." The content of the location does not change, only the compiler's interpretation of it.
Typecasting does not change the bits – only the momentary interpretation of what they are and what they mean.
FYI: unions are another way to do a similar thing: every element of a union overlaps the others and describes various interpretations of the same area of storage. (In the Fortran language, this was called EQUIVALENCE.)

Usual arithmetic conversions and Integer promotion

I'm trying to understand what's going on under the hood of c conversions, and different types promotions and comparing stuff and all of this.
union myUnion{
int intVal;
float floatVal;};
if (m.floatVal == m.intVal)
{
cout << "BINGO!";
}
if (*ptrInt == *ptrInt2)
{
cout << "BINGO!" << endl << *ptrInt << endl << *ptrInt2;
}
The first if statement is evaluated to false and the second if statement to true.
How c compiler interprets this values m.floatVal, m.intVal. I'm mean what's going on down there, into assembly, because that's going to be run on the CPU.
Moreover m.floatVal, m.intVal gets evaluated different values depending on which variable I initialised first.
m.floatVal = 3; first gets something
m.intVal = 3; first gets something else.
In the end there is the same value there!?!?!?!?!?!?
Second example:
char minstogo = 0x98;
if (minstogo <= 7) {
cout << "BEAST!";
} beast is printed
char minstogo = 0x98;
if ((unsigned char)minstogo <= 7) {
cout << "BEAST!";
} nothing is printed
char minstogo = 0x98;
if (minstogo <= (unsigned char)7) {
cout << "BEAST!";
} beast is printed
How the compiler interprets this mess and what is going on down the assembly?
Third example:
How is a float converted to an int? Who the bits are all remapped?
Thank you so much guys! Thank you.

First example:
union myUnion{
int intVal;
float floatVal;};
if (m.floatVal == m.intVal)
{
cout << "BINGO!";
}
This is undefined behaviour in c++. Having written to intVal, reading floatVal is undefined behaviour. Having written to floatVal, reading intVal is undefined behaviour.

C++11 Why does cout print large integers from a boolean array?

#include <iostream>
using namespace std;
int main() {
bool *a = new bool[10];
cout << sizeof(bool) << endl;
cout << sizeof(a[0]) << endl;
for (int i = 0; i < 10; i++) {
cout << a[i] << " ";
}
delete[] a;
}
The above code outputs:
1
1
112 104 151 0 0 0 0 0 88 1
The last line should contain garbage values, but why are they not all 0 or 1? The same thing happens for a stack-allocated array.
Solved: I forgot that sizeof counts bytes, not bits as I thought.

You have an array of default-initialized bools. Default-initialization for primitive types entail no initialization, thus they all have indeterminate values.
You can zero-initialize them by providing a pair of parentheses:
bool *a = new bool[10]();
Booleans are 1-byte integral types so the reason you're seeing this output is probably because that is the data on the stack at that moment that can be viewed with a single byte. Notice how they are values under 255 (the largest number that can be produced from an unsigned 1-byte integer).
OTOH, printing out an indeterminate value is Undefined Behavior, so there really is no logic to consider in this program.

sizeof(bool) on your machine returns 1.
That's 1 byte, not 1 bit, so the values you show can certainly be present.

What you are seeing is uninitialized values, different compilers generate different code. On GCC I see everything as 0 on windows i see junk values.
generally char is the smallest byte addressable- even though bool has 1/0 value- memory access wise it will be a char. Thus you will never see junk value greater than 255
Following initialization (memset fixes the things for you)
#include <iostream>
using namespace std;
int main() {
bool* a = new bool[10];
memset(a, 0, 10*sizeof(bool));
cout << sizeof(bool) << endl;
cout << sizeof(a[0]) << endl;
for (int i = 0; i < 10; ++i)
{
bool b = a[i];
cout << b << " ";
}
return 0;
}

Formally speaking, as pointed out in this answer, reading any uninitialized variable is undefined behaviour, which basically means everything is possible.
More practically, the memory used by those bools is filled with what you called garbage. ostreams operator<< inserts booleans via std::num_put::put(), which, if boolalpha is not set, converts the value present to an int and outputs the result.

I do not know why you put a * sign before variable a .
Is it a pointer to point a top element address of the array?

Why do I need to initialize an int variable to 0?

I just made this program which asks to enter number between 5 and 10 and then it counts the sum of the numbers which are entered here is the code
#include <iostream>
#include <cstdlib>
using namespace std;
int main()
{
int a,i,c;
cout << "Enter the number between 5 and 10" << endl;
cin >> a;
if (a < 5 || a > 10)
{
cout << "Wrong number" << endl;
system("PAUSE");
return 0;
}
for(i=1; i<=a; i++)
{
c=c+i;
}
cout << "The sum of the first " << a << " numbers are " << c << endl;
system("PAUSE");
return 0;
}
if i enter number 5 it should display
The sum of the first 5 numbers are 15
but it displays
The sum of the first 5 numbers are 2293687
but when i set c to 0
it works corectly
So what is the difference ?

Because C++ doesn't automatically set it zero for you. So, you should initialize it yourself:
int c = 0;
An uninitialized variable has a random number such as 2293687, -21, 99999, ... (If it doesn't invoke undefined behavior when reading it)
Also, static variables will be set to their default value. In this case 0.

If you don't set c to 0, it can take any value (technically, an indeterminate value). If you then do this
c = c + i;
then you are adding the value of i to something that could be anything. Technically, this is undefined behaviour. What happens in practice is that you cannot rely on the result of that calculation.
In C++, non-static or global built-in types have no initialization performed when "default initialized". In order to zero-initialize an int, you need to be explicit:
int i = 0;
or you can use value initialization:
int i{};
int j = int();

Non-static variables are, by definition, uninitialized - their initial values are undefined.
On another compiler, you might get the right answer, another wrong answer, or a different answer each time.
C/C++ don't do extra work (initialization to zero involves at least an instruction or two) that you didn't ask them to do.

The sum of the first 5 numbers are 2293687
This is because without initializing c you are getting value previous stored at that location (garbage value). This will make yor program's behavior undefined. You must have to initialize c before using it in your program.
int c= 0;

Because when you do:
int a,i,c;
thus instantiating and initializing c, you haven't said what you want it initialized to. The rules here are somewhat complex, but what it boils down to is two things:
For integral types, if you don't specify an initializer, the variable's value is indeterminate
When you try to read an uninitialized variable, you evoke Undefined Behavior

How are the values of uninitialized variables determined?

Given a program of :
int main()
{
short myVariableName1; // stores from -32768 to +32767
short int myVariableName2; // stores from -32768 to +32767
signed short myVariableName3; // stores from -32768 to +32767
signed short int myVariableName4; // stores from -32768 to +32767
unsigned short myVariableName5; // stores from 0 to +65535
unsigned short int myVariableName6; // stores from 0 to +65535
int myVariableName7; // stores from -32768 to +32767
signed int myVariableName8; // stores from -32768 to +32767
unsigned int myVariableName9; // stores from 0 to +65535
long myVariableName10; // stores from -2147483648 to +2147483647
long int myVariableName11; // stores from -2147483648 to +2147483647
signed long myVariableName12; // stores from -2147483648 to +2147483647
signed long int myVariableName13; // stores from -2147483648 to +2147483647
unsigned long myVariableName14; // stores from 0 to +4294967295
unsigned long int myVariableName15; // stores from 0 to +4294967295
cout << "Hello World!" << endl;
cout << myVariableName1 << endl;
cout << myVariableName2 << endl;
cout << myVariableName3 << endl;
cout << myVariableName4 << endl;
cout << myVariableName5 << endl;
cout << myVariableName6 << endl;
cout << myVariableName7 << endl;
cout << myVariableName8 << endl;
cout << myVariableName9 << endl;
cout << myVariableName10 << endl;
cout << myVariableName11 << endl;
cout << myVariableName12 << endl;
cout << myVariableName13 << endl;
cout << myVariableName14 << endl;
cout << myVariableName15 << endl;
cin.get();
return 0;
}
Printing out the unassigned variables will print whatever was stored in that memory location previously. What I've noticed is that across multiple consecutive executions the printed values are not changing - which tells me that the locations in memory are the same each time they execute.
I'm just curious as to how this is determined, why this is so.

Those variables live on the stack. The execution of your program looks to be deterministic, so every time you run it the same things happen. It's not choosing the same address necessarily (many runtimes these days make use of Address Space Randomization techniques to ensure that the stack addresses are not the same between runs), but the relative addresses on the stack contain the same data every time.

They can be anything don't rely on them to be anything specific.
Technically, the values are Indeterminate.
Note that using an Indeterminate value results in Undefined Behavior.

They're all stack based. Probably the startup code has already used those locations, so you're getting whatever it left in them.

The behaviour is not defined as "whatever was stored in that memory location previously", but it is rather not defined at all. There is no guarantee to what will happen.
My best guess is that your operation system is using virtual memory (as most modern OS do), thus giving you the same memory addresses every time.

Unlike humans, computers are deterministic.
Usually a good idea to use the compiler options to pick up when reading a variable that has not been given a value.
So the OS picks up the code and does exactly the same thing ever time. Hence the same result.
But if you start fiddling with the code the executable will be different. As you have not been specific you get a different result the next time around.
So in summary just use all the features of the compiler to help you spot this error and get into the habit of giving the variables values before using that variable.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Undefined bhaviour with usage of union in cpp? - c++

Remember that all members of a union shares the same memory. When you assign to B you change the value of PI as well. To be safe, you should only "read" from the last field you "write" to. It seems to me that what you want is a structure.

Related

Does converting a pointer to int actually return the location of it in byte or something? [duplicate]

Usual arithmetic conversions and Integer promotion

C++11 Why does cout print large integers from a boolean array?

Why do I need to initialize an int variable to 0?

How are the values of uninitialized variables determined?

Categories

Resources