Arrays + unions + structs containing bit fields (C++)

I was just playing around with bit fields and came across something that I can't quite figure out how to get around.
(A note about the platform: the size of an int is 2 bytes, a long is 4 bytes, and a long long is 8 bytes. I thought it worth mentioning as I know these can vary. Also, the 'byte' type is defined as an 'unsigned char'.)
I would like to be able to make an array of two 36-bit variables and put them into a union with an array of 9 bytes. This is what I came up with:
typedef union {
    byte bytes[9];
    struct {
        unsigned long long data:36;
    } integers[2];
} Colour;
I was working on the theory that the compiler would realise there were supposed to be two bit fields as part of the anonymous struct and pack them together into the space of 9 bytes. However, it turns out that each array element gets aligned on a byte boundary, so the union occupies 10 bytes, not 9, which makes perfect sense.
The question, then, is: is there a way to create an array of two bit fields like this? I considered the 'packed' attribute, but the compiler just ignores it.
While this works as expected (sizeof() returns 9):
typedef union {
    byte bytes[9];
    struct {
        unsigned long long data0:36;
        unsigned long long data1:36;
    } integers;
} Colour;
It would be preferable to have it accessible as an array.
Edit:
Thanks to cdhowie for his explanation of why this won't work.
Fortunately I thought of a way to achieve what I want:
typedef union {
    byte bytes[9];
    struct {
        unsigned long long data0:36;
        unsigned long long data1:36;
        unsigned long long data(byte which){
            return (which ? data1 : data0);
        }
        void data(byte which, unsigned long long _data){
            if(which){
                data1 = _data;
            } else {
                data0 = _data;
            }
        }
    } integers;
} Colour;
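For illustration, usage of this workaround might look like the following sketch (hypothetical code, assuming byte is the unsigned char typedef mentioned above):
Colour c = {};                              // zero-initialize all 9 bytes via the bytes member
c.integers.data(0, 0xABCDE1234ULL);         // set the first 36-bit value
unsigned long long v = c.integers.data(0);  // read it back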

You can't do this directly using arrays, if you want each bit field to be exactly 36 bits wide.
Pointers must be aligned to byte boundaries; that's just the way pointers are. Since arrays function like pointers in most cases (with exceptions), this is just not possible with bit fields that contain a number of bits not evenly divisible by 8. (What would you expect &(((Colour *) 0)->integers[1]) to return if the bit fields were packed? What value would make sense?)
In your second example, the bitfields can be tightly-packed because there is no pointer math going on under the hood. For things to be addressable by pointer, they must fall on a byte boundary, since bytes are the units used to "measure" pointers.
You will note that if you try to take the address of (((Colour *) 0)->integers.data0) or data1 in the second example, the compiler will issue an error, for exactly this reason.
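A quick illustration of that last point (a hypothetical snippet, not from the question):
struct Packed {
    unsigned long long data0:36;
    unsigned long long data1:36;
};
Packed p;
// unsigned long long *ptr = &p.data0; // error: cannot take the address of a bit field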

Related

Using Unions to convert an IP address into one 32-bit, two 16-bit, and four 8-bit values

I'm working on a school assignment. I am writing a program that utilizes unions to convert a given IP address in "192.168.1.10" format into its 32-bit single value, two 16-bit values, and four 8-bit values.
I'm having trouble with implementing my structs and unions appropriately, and am looking for insight on the subject. To my understanding, unions point to the same location as the referenced struct, but can look at specified pieces.
Any examples showing how a struct with four 8-bit values and a union can be used together would help. Also, any articles or books that might help me would be appreciated.
Below is the assignment outline:
Create a program that manages an IP address. Allow the user to enter the IP address as four 8-bit unsigned integer values (just use 4 sequential CIN statements). The program should output the IP address upon the user's request as any of the following: as a single 32-bit unsigned integer value; as four 8-bit unsigned integer values; as 32 individual bit values, which can be requested as a single bit by the user (by entering an integer 0 to 31); or as all 32 bits assigned into 2 variable-sized groups (host group and network group) and output as 2 unsigned integer values from 1 bit to 31 bits each.
I was going to cin to int pt1, pt2, pt3, pt4 and assign them to IP_Adress.pt1, etc.
struct IP_Adress {
    unsigned int pt1 : 8;
    unsigned int pt2 : 8;
    unsigned int pt3 : 8;
    unsigned int pt4 : 8;
};
I have not gotten anything to work appropriately yet. I think I am lacking a true understanding of the implementation of unions.
A union is not a good fit for this assignment. In fact, nothing in the text you quoted even says to use a union at all. And, a union will not help you with the parts of the assignment that deal with "32 individual bit values" or with "32 bits assigned into 2 variable sized groups". Those parts of the assignment will require bit shifting instead. Bit shifting is the better way to solve the other parts of the assignment, as well.
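A minimal sketch of the bit-shifting approach recommended here (all names are illustrative, not part of the assignment):
#include <stdio.h>
#include <stdint.h>

uint8_t octet(uint32_t ip, int n) { return (ip >> (24 - 8*n)) & 0xFF; }               /* n = 0..3, most significant first */
int bit_at(uint32_t ip, int n)    { return (ip >> n) & 1; }                           /* n = 0..31 */
uint32_t host_part(uint32_t ip, int hostBits) { return ip & ((1u << hostBits) - 1); } /* hostBits = 1..31 */
uint32_t net_part(uint32_t ip, int hostBits)  { return ip >> hostBits; }

int main(void){
    uint32_t ip = (192u << 24) | (168u << 16) | (1u << 8) | 10u; /* 192.168.1.10 */
    printf("%u.%u.%u.%u\n", octet(ip,0), octet(ip,1), octet(ip,2), octet(ip,3));
    printf("bit 0 = %d, host(/24) = %u\n", bit_at(ip,0), host_part(ip,8));
    return 0;
}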
That being said, if you absolutely must use a union, you are probably looking for something more like this instead:
#include <cstdint>

union IP_Adress {
    uint8_t u8[4];   // four 8-bit values
    uint16_t u16[2]; // two 16-bit values
    uint32_t u32;    // one 32-bit value
};
Except that C++ does not allow you to write to one union field and then read from another. C allows that kind of type punning, but it is undefined behavior in C++.
Why is type punning considered UB?
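For completeness, the well-defined way to reinterpret the bytes in C++ is to copy them with memcpy rather than pun through a union; a minimal sketch (not from the answer above):
#include <cstdint>
#include <cstring>

int main() {
    uint32_t ip = 0xC0A8010A; // 192.168.1.10 as one 32-bit value
    uint8_t bytes[4];
    std::memcpy(bytes, &ip, sizeof ip); // defined behavior; byte order follows the machine's endianness
    return 0;
}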
The asker already knows that doing this can blow up in their face a number of different ways, but here's a simple example covering one 4-byte value, four 1-byte values, and 32 single bits.
#include <cstdint>
#include <cstddef>
#include <stdexcept>

union bad_idea
{
    uint32_t ints;                   // one 32-bit unsigned integer
    uint8_t bytes[sizeof(uint32_t)]; // four 8-bit unsigned integers
};
and then
uint32_t get_int(const bad_idea & in)
{
    return in.ints;
}

uint8_t get_byte(const bad_idea & in,
                 size_t offset)
{
    if (offset >= sizeof(uint32_t)) // trap typos and idiots
    {
        throw std::runtime_error("invalid offset");
    }
    return in.bytes[offset];
}

bool get_bit(const bad_idea & in,
             size_t offset)
{
    if (offset >= sizeof(uint32_t)*8)
    {
        throw std::runtime_error("invalid offset");
    }
    return (in.ints >> offset) & 1; // shift the required bit to the end (in.ints >> offset)
                                    // then mask off all of the other bits (& 1)
}
Things get a bit ugly getting input because you can't simply
std::cin >> bad.bytes[0];
because it reads a single character. Type in 127 for the first octet and you'll wind up filling bad.bytes[0] through bad.bytes[2] with '1', '2', and '7'.
You need to involve a temporary variable
int temp;
std::cin >> temp;
// tests for valid range in temp
bad.bytes[0] = temp;
or risk some explosive batsmurf like
std::cin >> *(int*)&bad.bytes[0];
// tests for valid value in bad.bytes[0] impossible because aliasing has been broken
// and everything is already <expletive deleted>ed
pardon my C. The more respectable
std::cin >> *reinterpret_cast<int*>(&bad.bytes[0]);
isn't any better. As ugly as it is, use the temporary variable and bundle it up in a function to eliminate the duplication. Frankly this is a time when I'd probably fall back into C and pull out good ol' scanf.
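For what it's worth, a hedged sketch of that scanf route: the %hhu conversion (C99, and C++11's stdio) reads directly into an unsigned char, so no temporary is needed:
unsigned char octet;
if (scanf("%hhu", &octet) == 1) { // %hhu: unsigned char conversion
    bad.bytes[0] = octet;
}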
The assignment doesn't say C++; you can just use typecasting instead of a union. I like to print the 32-bit address out in hex as well, since it makes it easier to check that you have the right 32-bit value.
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h> /* for inet_pton */

#define word8 uint8_t
#define word16 uint16_t
#define word32 uint32_t

char *sIP = "192.168.0.11";

int main(){
    word32 ip, *pIP;
    pIP = &ip;
    inet_pton(AF_INET, sIP, pIP);
    printf("32bit:%u %x\n", *pIP, *pIP);
    printf("16bit:%u %u\n", *(word16*)pIP, *(((word16*)pIP)+1));
    printf("8bit:%u %u %u %u\n", *(word8*)pIP, *(((word8*)pIP)+1), *(((word8*)pIP)+2), *(((word8*)pIP)+3));
    return 0;
}
Output:
32bit:184592576 b00a8c0
16bit:43200 2816
8bit:192 168 0 11
You could also store the IP as a 4-byte string and do math to get the 16-bit and 32-bit answers. It's a pretty dumb assignment IMO; I would never use a union to do it.
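A sketch of that arithmetic, assuming the four octets are stored least significant byte first, matching the output above:
#include <stdio.h>
#include <stdint.h>

int main(void){
    uint8_t b[4] = {192, 168, 0, 11};           /* stored least significant byte first */
    uint16_t lo = b[0] + 256u * b[1];           /* 43200 */
    uint16_t hi = b[2] + 256u * b[3];           /* 2816  */
    uint32_t all = lo + 65536u * (uint32_t)hi;  /* 184592576 */
    printf("%u %u %u\n", (unsigned)lo, (unsigned)hi, (unsigned)all);
    return 0;
}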

C bitfields: strange behaviour with long int in struct

I am observing strange behaviour when I run the following code.
I create a bitfield by using a struct, where I want to use 52 bits, so I use long int.
The size of long int is 64 bits on my system; I check it inside the code.
Somehow, when I try to set one bit, it always sets two bits: one of them is the one I wanted to set, and the second one is the index of the first one plus 32.
Can anybody tell me why that is?
#include <stdio.h>

typedef struct foo {
    long int x:52;
    long int :12;
};
int main(){
    struct foo test;
    int index = 0;
    printf("%ld\n", sizeof(test));
    while(index < 64){
        if(test.x & (1 << index))
            printf("%i\n", index);
        index++;
    }
    test.x = 1;
    index = 0;
    while(index < 64){
        if(test.x & (1 << index))
            printf("%i\n", index);
        index++;
    }
    return 0;
}
Sorry, I forgot to post the output, so my question was basically not understandable...
The output it gives me is the following:
8
0
32
index is of type int, which is probably 32 bits on your system. Shifting a value by an amount greater than or equal to the number of bits in its type has undefined behavior.
Note that changing the type of index won't help; what matters is the type of the shift's left operand. Change 1<<index to 1L << index, or even 1LL << index (and prefer unsigned types, since bit-shifting signed types is ill-advised).
As others have pointed out, test is uninitialized. You can initialize it to all zeros like this:
struct foo test = { 0 };
The correct printf format for size_t is %zu, not %ld.
And it wouldn't be a bad idea to modify your code so it doesn't depend on the non-portable assumption that long is 64 bits. It can be as narrow as 32 bits. Consider using the uintN_t types defined in <stdint.h>.
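Putting those fixes together, a corrected sketch of the test loop (still subject to the caveat about bit-field types below):
#include <stdio.h>
#include <stdint.h>

struct foo {
    int64_t x : 52;
    int64_t   : 12;
};

int main(void){
    struct foo test = { 0 };          /* initialized, so no garbage bits */
    test.x = 1;
    for (unsigned index = 0; index < 64; index++)
        if (test.x & (1ULL << index)) /* 64-bit shift, no overflow at index >= 32 */
            printf("%u\n", index);    /* prints only "0" */
    return 0;
}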
I should also mention that bit fields of types other than int, unsigned int, signed int, and _Bool (or bool) are implementation-defined.
You have undefined behavior in your code, as you check the bits in test.x without initializing the structure. Because you don't initialize the variable, it will contain random data.

Bit setting question

In C or C++, it's apparently possible to restrict the number of bits a variable has, so for example:
unsigned char A:1;
unsigned char B:3;
I am unfamiliar however with how it works specifically, so a number of questions:
If I have a class with the following variables:
unsigned char A:1;
unsigned char B:3;
unsigned char C:1;
unsigned char D:3;
What is the above technique actually called?
Is above class four bytes in size, or one byte in size?
Are the variables treated as 1 (or 3) bits as shown, or as per the 'unsigned char', treated as a byte each?
Is there someway of combining the bits to a centralised byte? So for example:
unsigned char MainByte;
unsigned char A:1; //Can this be made to point at the first bit in MainByte?
unsigned char B:3; //Etc etc
unsigned char C:1;
unsigned char D:3;
Is there an article that covers this topic in more depth?
If 'A:1' is treated like an entire byte, what is the point/purpose of it?
Feel free to mention any other considerations (like compiler restrictions or other limitations).
Thank you.
What is the above technique actually called?
Bitfields. And you're only supposed to use int (signed, unsigned or otherwise) as the "type", not char.
Is above class four bytes in size, or one byte in size?
Neither. It is probably sizeof(int) because the compiler generates a word-sized object. The actual bitfields will be stored within a byte, however. It'll just waste some space.
Are the variables treated as 1 (or 3) bits as shown, or as per the 'unsigned char', treated as a byte each?
They represent only the bits specified, and will be packed as tightly as possible.
Is there someway of combining the bits to a centralised byte? So for example:
Use a union:
struct bits {
    unsigned A:1;
    unsigned B:3;
    unsigned C:1;
    unsigned D:3;
};

union SplitByte {
    struct bits Bits;
    unsigned char Byte[sizeof(struct bits)];
    /* the array is a trick so the two fields
       are guaranteed to be the same size and
       thus be aligned on the same boundary */
} SplitByteObj;

// access the byte
SplitByteObj.Byte[0]
// access a bitfield
SplitByteObj.Bits.B
Note that there are problems with bitfields, for example when using threads. Individual bitfields cannot be accessed separately at the machine level, so you may get errors if you try to use a mutex to guard each of them. Also, the order in which the fields are laid out is not clearly specified by the standard. Many people prefer to use bitwise operators to implement bitfields manually for that reason.
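A sketch of that manual approach; the layout here (A in bit 0, B in bits 1 through 3, and so on) is an assumption, which is exactly the point: with shifts and masks you choose the layout rather than the compiler:
#include <stdint.h>

#define B_SHIFT 1
#define B_MASK  0x7u /* three bits wide */

static uint8_t get_B(uint8_t byte)            { return (byte >> B_SHIFT) & B_MASK; }
static uint8_t set_B(uint8_t byte, uint8_t v) { return (byte & ~(B_MASK << B_SHIFT)) | ((v & B_MASK) << B_SHIFT); }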
Is there an article that covers this topic in more depth?
Not many. The first few you'll get when you Google it are about all you'll find. They're not a widely used construct. You'll be best off nitpicking the standard to figure out exactly how they work so that you don't get bitten by a weird edge case. I couldn't tell you exactly where in the standard they're specified.
If 'A:1' is treated like an entire byte, what is the point/purpose of it?
It's not, but I've addressed this already.
These are bit-fields.
The details of how these fields are arranged in memory are largely implementation-defined. Typically, you will find that the compiler packs them in some way. But it may take various alignment issues into account.

Forcing unaligned bitfield packing in MSVC

I've a struct of bitfields that add up to 48 bits. On GCC this correctly results in a 6 byte structure, but in MSVC the structure comes out 8 bytes. I need to find some way to force MSVC to pack the struct properly, both for interoperability and because it's being used in a memory-critical environment.
The struct seen below consists of three 15-bit numbers, one 2-bit number, and a 1-bit sign. 15+15+15+2+1 = 48, so in theory it should fit into six bytes, right?
struct S
{
    unsigned short a:15;
    unsigned short b:15;
    unsigned short c:15;
    unsigned short d:2;
    unsigned short e:1;
};
However, compiling this on both GCC and MSVC results in sizeof(S) == 8. Thinking that this might have to do with alignment, I tried using #pragma pack(1) before the struct declaration, telling the compiler to pack to byte, not int, boundaries. On GCC, this worked, resulting in sizeof(S) == 6.
However, on MSVC 2005, the sizeof still came out to 8, even with pack(1) set! After reading this other SO answer, I tried replacing unsigned short d with unsigned char and unsigned short e with bool. The result is sizeof(S) == 7!
I found that if I split d into two one-bit fields and wedged them in between the other members, the struct finally packed properly.
struct S
{
    unsigned short a:15;
    unsigned short dHi : 1;
    unsigned short b:15;
    unsigned short dLo : 1;
    unsigned short c:15;
    unsigned short e:1;
};

printf( "%d\n", sizeof(S) ); // "6"
But having d split like that is cumbersome and causes trouble for me later on when I have to work on the struct. Is there some way I can force MSVC to pack this struct into 6 bytes, exactly as GCC does?
It is implementation defined how fields will be placed in the structure. Visual Studio will fit consecutive bitfields into an underlying type, if it can, and waste the leftover space. (C++ Bit Fields in VS)
If you use the type "unsigned __int64" to declare all elements of the structure, you'll get an object with sizeof(S)=8, but the last two bytes will be unused and the first six will contain the data in the format you want.
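A sketch of that suggestion (unsigned __int64 is an MSVC-specific type):
struct S2
{
    unsigned __int64 a : 15;
    unsigned __int64 b : 15;
    unsigned __int64 c : 15;
    unsigned __int64 d : 2;
    unsigned __int64 e : 1;
}; // all 48 data bits share one 64-bit unit: sizeof(S2) == 8, last two bytes unused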
Alternatively, if you can accept some structure reordering, this will work
#pragma pack(1)
struct S3
{
    unsigned int a:15;
    unsigned int b:15;
    unsigned int d:2;
    unsigned short c:15;
    unsigned short e:1;
};
I don't think so, and I think it's MSVC's behavior that is actually correct, and GCC that deviates from the standard.
AFAIK, the standard does not permit bitfields to cross word boundaries of the underlying type.

Custom byte size?

So, you know how the primitive type char has a size of 1 byte? How would I make a primitive with a custom size? So, for example, instead of an int with a size of 4 bytes, I make one with a size of, let's say, 16.
Is there a way to do this? Is there a way around it?
It depends on why you are doing this. Usually, you can't use types of less than 8 bits, because that is the addressable unit for the architecture. You can use structs, however, to define different lengths:
struct s {
    unsigned int a : 4;  // a is 4 bits
    unsigned int b : 4;  // b is 4 bits
    unsigned int c : 16; // c is 16 bits
};
However, there is no guarantee that the struct will be 24 bits long. Also, this can cause endian issues. Where you can, it's best to use system-independent types, such as uint16_t, etc. You can also use bitwise operators and bit shifts to twiddle things very specifically.
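For example, the same 4+4+16 layout can be packed by hand into a fixed-width type, which sidesteps the padding and endianness questions entirely (a sketch, with an illustrative function name):
#include <stdint.h>

static uint32_t pack(uint8_t a, uint8_t b, uint16_t c) /* a and b use 4 bits each */
{
    return (uint32_t)(a & 0xF) | ((uint32_t)(b & 0xF) << 4) | ((uint32_t)c << 8);
}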
Normally you'd just make a struct that represents the data in which you're interested. If it's 16 bytes of data, either it's an aggregate of a number of smaller types or you're working on a processor that has a native 16-byte integral type.
If you're trying to represent extremely large numbers, you may need to find a special library that handles arbitrarily-sized numbers.
In C++11, there is an excellent solution for this: std::aligned_storage.
#include <iostream>
#include <type_traits>

int main()
{
    typedef std::aligned_storage<sizeof(int), alignof(int)>::type memory_type;
    memory_type i;
    reinterpret_cast<int&>(i) = 5;
    std::cout << reinterpret_cast<int&>(i) << std::endl;
    return 0;
}
It allows you to declare a block of uninitialized storage on the stack.
If you want to make a new type, typedef it. If you want it to be 16 bytes in size, typedef a struct that has 16 bytes of member data within it. Just beware that quite often compilers will pad things to match your system's alignment needs. A 1-byte struct rarely remains 1 byte without care.
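A sketch of that approach, with a static_assert (C++11) to catch padding surprises; the type name is hypothetical:
#include <cstdint>

struct u128 {        // a 16-byte type built from two 8-byte members
    std::uint64_t lo;
    std::uint64_t hi;
};
static_assert(sizeof(u128) == 16, "unexpected padding");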
You could just static cast to and from std::string. I don't know enough C++ to give an example, but I think this would be pretty intuitive.