confusion in union concept

confusion in union concept - c++

#include<stdio.h>
union node {
int i;
char c[2];
};
main() {
union node n;
n.c[0] = 0;
n.c[1] = 2;
printf("%d\n", n.i);
return 0;
}
I think it gives 512 output becouse c[0] value stores in first byte and c[1] value stores in second byte, but gives 1965097472. Why ?.
I compiled this program in codeblocks in windows.

Your union allocates four bytes, starting off as:
[????] [????] [????] [????]
You set the least two significant bytes:
[????] [????] [0x02] [0x00]
You then print out all four bytes as an integer. You're not going to get 512, necessarily, because anything can be in those most significant two bytes. In this case, you had:
[0x75] [0x21] [0x02] [0x00]

Because undefined behavior. Accessing an union member that wasn't set does that, simple as that. It can do anything, print anything, and even crash.

Undefined behavior is, well... undefined.
We can try to answer why a specific result was given (and the other answers do that by guessing compiler implementation details), but we cannot say why another result was not given. For all that we know, the compiler could have printed 0, formatted your hard drive, set your house on fire or transferred 100,000,000 USD to your bank account.

The intis compiled as a 32 bit number, little endian. By setting the two lower bytes to 2 and 0 respectively and then reading the int you get 1965097472. If you look at the hexadecimal representation 7521 0200 you see your bytes again, and besides it is undefined behaviour and part of it depends on the memory architecture of the platform the program is running on.

Note that your int is likely to be at least 4 bytes (not 2, like it was in the good ol' days). To let the sizes match, change the type of i to uint16_t.
Even after this, the standard does not really permit setting one union member, and then accessing a different one in an attempt to reinterpret the bytes. However, you could get the same effect with a reinterpret_cast.
union node {
uint16_t i;
uint8_t c[2];
};
int main() {
union node n;
n.c[0] = 0;
n.c[1] = 2;
std::cout << *reinterpret_cast<uint16_t *>(&n) << std::endl;
return 0;
}

Related

How to iterate over every bit of a type in C++

I wanted to write the Digital Search Tree in C++ using templates. To do that given a type T and data of type T I have to iterate over bits of this data. Doing this on integers is easy, one can just shift the number to the right an appropriate number of positions and "&" the number with 1, like it was described for example here How to get nth bit values . The problem starts when one tries to do get i'th bit from the templated data. I wrote something like this
#include <iostream>
template<typename T>
bool getIthBit (T data, unsigned int bit) {
return ((*(((char*)&data)+(bit>>3)))>>(bit&7))&1;
}
int main() {
uint32_t a = 16;
for (int i = 0; i < 32; i++) {
std::cout << getIthBit (a, i);
}
std::cout << std::endl;
}
Which works, but I am not exactly sure if it is not undefined behavior. The problem with this is that to iterate over all bits of the data, one has to know how many of them are, which is hard for struct data types because of padding. For example here
#include <iostream>
struct s {
uint32_t i;
char c;
};
int main() {
std::cout << sizeof (s) << std::endl;
}
The actual data has 5 bytes, but the output of the program says it has 8. I don't know how to get the actual size of the data, or if it is at all possible. A question about this was asked here How to check the size of struct w/o padding? , but the answers are just "don't".

It's easy to know know how many bits there are in a type. There's exactly CHAR_BIT * sizeof(T). sizeof(T) is the actual size of the type in bytes. But indeed, there isn't a general way within standard C++ to know which of those bits - that are part of the type - are padding.
I recommend not attempting to support types that have padding as keys of your DST.
Following trick might work for finding padding bits of trivially copyable classes:
Use std::memset to set all bits of the object to 0.
For each sub object with no sub objects of their own, set all bits to 1 using std::memset.
For each sub object with their own sub objects, perform the previous and this step recursively.
Check which bits stayed 0.
I'm not sure if there are any technical guarantees that the padding actually stays 0, so whether this works may be unspecified. Furthermore, there can be non-classes that have padding, and the described trick won't detect those. long double is typical example; I don't know if there are others. This probably won't detect unused bits of integers that underlie bitfields.
So, there are a lot of caveats, but it should work in your example case:
s sobj;
std::memset(&sobj, 0, sizeof sobj);
std::memset(&sobj.i, -1, sizeof sobj.i);
std::memset(&sobj.c, -1, sizeof sobj.c);
std::cout << "non-padding bits:\n";
unsigned long long ull;
std::memcpy(&ull, &sobj, sizeof sobj);
std::cout << std::bitset<sizeof sobj * CHAR_BIT>(ull) << std::endl;

There's a Standard way to know if a type has unique representation or not. It is std::has_unique_object_representations, available since C++17.
So if an object has unique representations, it is safe to assume that every bit is significant.
There's no standard way to know if non-unique representation caused by padding bytes/bits like in struct { long long a; char b; }, or by equivalent representations¹. And no standard way to know padding bits/bytes offsets.
Note that "actual size" concept may be misleading, as padding can be in the middle, like in struct { char a; long long b; }
Internally compiler has to distinguish padding bits from value bits to implement C++20 atomic<T>::compare_exchange_*. MSVC does this by zeroing padding bits with __builtin_zero_non_value_bits. Other compiler may use other name, another approach, or not expose atomic<T>::compare_exchange_* internals to this level.
¹ like multiple NaN floating point values

C++/Address Space: 2 Bytes per adress?

I was just trying something and i was wondering how this could be. I have the following Code:
int var1 = 132;
int var2 = 200;
int *secondvariable = &var2;
cout << *(secondvariable+2) << endl << sizeof(int) << endl;
I get the Output
132
4
So how is it possible that the second int is only 2 addresses higher? I mean shouldn't it be 4 addresses? I'm currently under WIN10 x64.
Regards

With cout << *(secondvariable+2) you don't print a pointer, you print the value at secondvariable[2], which is an invalid indexing and lead to undefined behavior.
If you want to print a pointer then drop the dereference and print secondvariable+2.

While you already are far in the field of undefined behaviour (see Some programmer dude's answer) due to indexing an array out of bounds (a single variable is considered an array of length 1 for such matters), some technical background:
Alignment! Compilers are allowed to place variables at addresses such that they can be accessed most efficiently. As you seem to have gotten valid output by adding 2*sizeof(int) to the second variable's address, you apparently have reached the first one by accident. Apparently, the compiler decided to leave a gap in between the two variables so that both can be aligned to addresses dividable by 8.
Be aware, though, that you don't have any guarantee for such alignment, different compilers might decide differently (or same compiler on another system), and alignment even might be changed via compiler flags.
On the other hand, arrays are guaranteed to occupy contiguous memory, so you would have gotten the expected result in the following example:
int array[2];
int* a0 = &array[0];
int* a1 = &array[1];
uintptr_t diff = static_cast<uintptr_t>(a1) - static_cast<uintptr_t>(a0);
std::cout << diff;
The cast to uintptr_t (or alternatively to char*) assures that you get address difference in bytes, not sizes of int...

This is not how C++ works.
You can't "navigate" your scope like this.
Such pointer antics have completely undefined behaviour and shall not be relied upon.
You are not punching holes in tape now, you are writing a description of a program's semantics, that gets converted by your compiler into something executable by a machine.
Code to these abstractions and everything will be fine.

short pointer to a float

i run this code in c++:
#include <iostream>
using namespace std;
int main()
{
float f = 7.0;
short s = *(short *)&f;
cout << sizeof(float) << endl
<< sizeof(short) << endl
<< s << endl;
return 0;
}
i get the following out pot:
4
2
0
but, in a lecture given in Stanford university, Professor Jerry Cain says he is sure the out pot well not be 0.
the lecture is can be fond here. he says that around the 48 minute.
is he wrong, or that some standard change since? or is there a difference between platforms?
I'm using g++ to compile my code.
EDIT: in the next lecture he does mention "big endian" and "small endian" and says that they well affect the result.

static void bitPrint(float f)
{
assert(sizeof(int) == sizeof(float));
int *data = reinterpret_cast<int*>(&f);
for (int i = 0; i < sizeof(int) * 8; ++i)
{
int bit = (1 << i) & *data;
if (bit) bit = 1;
cout << bit;
}
cout << endl;
}
int main()
{
float f = 7.0;
bitPrint(f);
return 0;
}
This program prints 00000000000000000000011100000010
Since the sizeof(short) == 2 on your platform you get the first 2 bytes which are both zeros
Note that since size of types and possibly float implementation (not sure about this) are implementation defined different output can be seen on different platforms.

Well, let's see. First you write a float into the memory. It occupies 4 bytes, and it's value is 7. A float in the memory looks something like "sign bit -> exponent bits -> mantissa bits". I'm not sure how many bits are there for each part exactly, probably that depends on your platform.
Since the float's value is 7, it only occupies some of the least-significant bits on the right (I assume big-endian).
Your short pointer points to the beginning of the float, which means to the most significant bit. Since the value is greater than 0, the sign bit is zero. Since the float value is far on the right, we can say that those two most significant bytes are filled with zeros.
Now, provided that a size of short is 2, which means we will only take two bytes out of float's 4 bytes, we get our 0.
I believe though, that this result is rather UB and can differ on different platforms, compilers, etc.

Accessing data through a pointer to a different type than it was stored as gives (except in a few special cases) undefined behavour.
Firstly it's platform dependent how the data it stored so different systems may well give different values, and secondly the compiler might well generate code that doesn't even see the value you'd expect as it's allowed to do anything it likes when you do this (It's undefined behavour due to the strict aliases rules).
Having said that there are probably reasons why the number you are seeing is valid, but you can't rely on it unless you specifically know your platform will do what you expect, it's not guarenteed by the standard.

He's "pretty" sure it's not zero, he says that explicitly.
However, given that the representation of a short can be big-endian or little-endian, I wouldn't be so certain. In any case, this is a throwaway line at the end of a fifty-minute lecture so we can forgive him a little. It may be he came back in the next lecture with a clarification.
You would need to examine the underlying bits at (at least) a byte-by-byte level to understand what's going on.

Casting a character array into an integer

EDIT: The wrong type of num2 has been corrected.
Hello,
I have some character arrays of known size which contains raw integer data read from a binary file.
The size of all these arrays have the size of a integer.
I would like to ask whether the following operation is safe and accurate in ALL normal situation, assuming that the endianness of the raw data and the computer running this code agrees.
char arr1[4] = { ... };
char arr2[2] = { ... };
uint32_t num1 = *static_cast<uint32_t*>(arr1); /* OR num1 = *(uint32_t*)arr1 in C */
uint16_t num2 = *static_cast<uint16_t*>(arr2); /* OR num2 = *(uint32_t*)arr2 in C */
Thank you!

You should use a union.
union charint32 {
char arr1[4];
uint32_t num;
};
This will simplify storage and casting for you.

It is technically safe, but there are a few things I would consider:
Add compile-time asserts to verify the sizes. Are you SURE that your char array equals sizeof(your_int_type)? Your num2 is a great example of why this is important - your typo would cause undefined behavior.
Consider the alignment. Are you sure that your char array is on a 4-byte boundary (assuming your int is 4 bytes)? PowerPC for example will crash if you try to read an int from an unaligned pointer.

This should be safe:
char arr1[4] = { ... };
uint32_t num1;
memcpy(&num1, arr1, sizeof num1);
But why is arr2 only 2 bytes big? Is that a typo?

A safer approach would be to use a macro (e.g. MAKEDWORD) to put the bytes in their proper order.

If you are sure the arrays are properly aligned, then there shouldn't be a problem (given the endianness).
In the code, however, I don't know what you're doing with arr2, since it is 16 bits, and you are reading a 32 bit quantity from it.

Yes, that should work fine (under your assumption of endianness), since the representation of these bytes in memory is the same regardless of whether it's interpreted as an array of bytes or an integer.
Really all you're doing is changing the type, not the data.

Explicit Address Manipulation in C++

Please check out the following func and its output
void main()
{
Distance d1;
d1.setFeet(256);
d1.setInches(2.2);
char *p=(char *)&d1;
*p=1;
cout<< d1.getFeet()<< " "<< d1.getInches()<< endl;
}
The class Distance gets its values thru setFeet and setInches, passing int and float arguments respectively. It displays the values through through the getFeet and getInches methods.
However, the output of this function is 257 2.2. Why am I getting these values?

This is a really bad idea:
char *p=(char *)&d1;
*p=1;
Your code should never make assumptions about the internal structure of the class. If your class had any virtual functions, for example, that code would cause a crash when you called them.
I can only conclude that your Distance class looks like this:
class Distance {
short feet;
float inches;
public:
void setFeet(...
};
When you setFeet(256), it sets the high byte (MSB) to 1 (256 = 1 * 2^8) and the low byte (LSB) to 0. When you assign the value 1 to the char at the address of the Distance object, you're forcing the first byte of the short representing feet to 1. On a little-endian machine, the low byte is at the lower address, so you end up with a short with both bytes set to 1, which is 1 * 2^8 + 1 = 257.
On a big-endian machine, you would still have the value 256, but it would be purely coincidental because you happen to be forcing a value of 1 on a byte that would already be 1.
However, because you're using undefined behavior, depending on the compiler and the compile options, you might end up with literally anything. A famous expression from comp.lang.c is that such undefined behavior could "cause demons to fly out of your nose".

You are illegally munging memory via the 'p' pointer.
The output of the program is undefined; as you are directly manipulating memory that is owned by an object through a pointer of another type without regard to the underlying types.
Your code is somewhat like this:
struct Dist
{
int x;
float y;
};
union Plop
{
Dist s; // Your class
char p; // The type you are pretending to use via 'p'
};
int main()
{
Plop p;
p.s.x = 5; // Set up the Dist structure.
p.s.y = 2.3;
p.p = 1; // The value of s is now undefined.
// As you have scribbled over the memory used by s.
}

The behaviour based on the code given is going to be very unpredictable. Setting the first byte of d1's data could potentially clobber a vptr, compiler-specific memory, the sign/exponent of a floating point value, or LSB or MSB of an integer, all depending on the definition of Distance.

I assume you think doing *p = 1 will set one of the internal data members (presumably 'feet') in the Distance object. It may work, but (afaik) you've got no guarantees that the feet member is at the first address of the object, is of the correct size (unless its type is also char) or that it's aligned correctly.
If you want to do that why not make the 'feet' member public and do:
d1.feet = 1;

Another thing, to comment on the program: don't use void main(). It isn't standard, and it offers you no benefits. It will make people not take you as seriously when asking C or C++ questions, and could cause programs to not compile, or not work properly.
The C++ Standard, in 3.6.1 paragraph 2, says that main() always returns int, although the implementation may offer variations with different arguments.
This would be a good time to break the habit. If you're learning from a book that uses void main(), the book is unreliable. See about getting another book, if only for reference.

It looks like you are new to programming and could use some help with basic concepts.
It's good that you are looking for that, but SO may not be the right place to get it.
Good luck.

The Definition of class is
class Distance{
int feet;
float inches;
public:
//...functions
};
now the int feet would be 00000001 00000000 (2 bytes) where the zeros would occupy lower address in Little Endian so the char *p will be 00000000.. when u make *p=1, the lower byte becomes 00000001 so the int variable now is 00000001 00000001 which is exactly 257!

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

confusion in union concept - c++

Because undefined behavior. Accessing an union member that wasn't set does that, simple as that. It can do anything, print anything, and even crash.

Related

How to iterate over every bit of a type in C++

C++/Address Space: 2 Bytes per adress?

short pointer to a float

Casting a character array into an integer

Explicit Address Manipulation in C++

Categories

Resources