Packing bools with bit field (C++) - c++

I'm trying to interface with Ada code using C++, so I'm defining a struct using bit fields, so that all the data is in the same place in both languages. The following is not precisely what I'm doing, but outlines the problem. The following is also a console application in VS2008, but that's not super relevant.
using namespace System;
int main() {
int array1[2] = {0, 0};
int *array2 = new int[2]();
array2[0] = 0;
array2[1] = 0;
#pragma pack(1)
struct testStruct {
// Word 0 (desired)
unsigned a : 8;
unsigned b : 1;
bool c : 1;
unsigned d : 21;
bool e : 1;
// Word 1 (desired)
int f : 32;
// Words 2-3 (desired)
int g[2]; //Cannot assign bit field but takes 64 bits in my compiler
};
testStruct test;
Console::WriteLine("size of char: {0:D}", sizeof(char) * 8);
Console::WriteLine("size of short: {0:D}", sizeof(short) * 8);
Console::WriteLine("size of int: {0:D}", sizeof(int) * 8);
Console::WriteLine("size of unsigned: {0:D}", sizeof(unsigned) * 8);
Console::WriteLine("size of long: {0:D}", sizeof(long) * 8);
Console::WriteLine("size of long long: {0:D}", sizeof(long long) * 8);
Console::WriteLine("size of bool: {0:D}", sizeof(bool) * 8);
Console::WriteLine("size of int[2]: {0:D}", sizeof(array1) * 8);
Console::WriteLine("size of int*: {0:D}", sizeof(array2) * 8);
Console::WriteLine("size of testStruct: {0:D}", sizeof(testStruct) * 8);
Console::WriteLine("size of test: {0:D}", sizeof(test) * 8);
Console::ReadKey(true);
delete[] array2;
return 0;
}
(If it wasn't clear, in the real program, the basic idea is that the program gets a void* from something communicating with the Ada code and casts it to a testStruct* to access the data.)
With #pragma pack(1) commented out, the output is:
size of char: 8
size of short: 16
size of int: 32
size of unsigned: 32
size of long: 32
size of long long: 64
size of bool: 8
size of int[2]: 64
size of int*: 32
size of testStruct: 224
size of test: 224
Obviously, 4 words (indexed 0-3) should be 4 * 32 = 128 bits, not 224. The other output lines are there to confirm the sizes of types under the VS2008 compiler.
With #pragma pack(1) uncommented, that number (on the last two lines of output) is reduced to 176, which is still greater than 128. It seems that the bools aren't being packed together with the unsigned ints in "Word 0".
Note: if a&b, c, d, e, and f were each packed into separate words, that would be 5 words, plus 2 for the array = 7 words, times 32 bits = 224, the number we get with #pragma pack(1) commented out. If c and e (the bools) instead take up 8 bits each, as opposed to 32, we get 176, which is the number we get with #pragma pack(1) uncommented. It seems #pragma pack(1) only allows the bools to be packed into single bytes by themselves instead of words, but does not pack the bools together with the unsigned ints at all.
So my question, in one sentence: Is there a way to force the compiler to pack a through e into one word? A related question is C++ bitfield packing with bools, but it doesn't answer my question; it only points out the behavior I'm trying to make go away.
If there is literally no way to do this, does anyone have any ideas for workarounds? I'm at a loss, because:
I was asked to avoid changing the struct format that I'm copying (no re-ordering).
I don't want to change the bools to unsigned ints because it may cause problems down the road with constantly having to re-cast it to bool and maybe accidentally using the wrong version of an overloaded function, not to mention making the code more obscure for others who read it later.
I don't want to declare them as private unsigned ints and then add public accessors, because all other members of all other structs in the project are accessed directly without () afterward. It would seem hacky and obtuse, and one would almost NEED IntelliSense or trial and error to remember which members need () and which don't.
I would like to avoid creating another struct type just for the data conversion (e.g. making a constructor for testStruct that takes a single testStructImport object), because the actual struct is very long, with lots of bit-field-specified members.

I recommend that you create a "normal" structure without any bit packing. Use default POD types for the members.
Create interface functions for loading the "normal" fields from a buffer of uint8_t, and for storing them back to a buffer.
This will allow you to use the data members sanely in your program. The bit packing and unpacking will be handled by the interface functions. The bit twiddling should use the bitwise AND and bitwise OR operators rather than relying on bit-field notation in a structure. This lets you adjust the bit twiddling and makes the code more portable among compilers.
This is how I designed my protocol classes, and I don't have to worry about bit-field positioning, endianness, or things of that sort.
Also, I can use block I/O for reading and writing the buffer.
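A minimal sketch of this approach for the struct in the question. The names TestData, load_le32, store_le32, unpack, pack, and kWireSize are hypothetical, and two details are assumptions you would have to check against the Ada representation clause: that a sits in the least significant bits of word 0 (followed by b, c, d, e), and that each 32-bit word is stored little-endian in the buffer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical "normal" structure: plain members, no bit fields, no packing.
struct TestData {
    std::uint8_t  a;    // 8 bits on the wire
    unsigned      b;    // 1 bit on the wire
    bool          c;    // 1 bit on the wire
    std::uint32_t d;    // 21 bits on the wire
    bool          e;    // 1 bit on the wire
    std::int32_t  f;
    std::int32_t  g[2];
};

constexpr std::size_t kWireSize = 16; // buffers must hold four 32-bit words

// Assumption: every 32-bit word is stored little-endian in the buffer.
inline std::uint32_t load_le32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0])
         | static_cast<std::uint32_t>(p[1]) << 8
         | static_cast<std::uint32_t>(p[2]) << 16
         | static_cast<std::uint32_t>(p[3]) << 24;
}

inline void store_le32(std::uint8_t* p, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        p[i] = static_cast<std::uint8_t>(v >> (8 * i));
    }
}

// Assumption: a occupies bits 0-7 of word 0, b bit 8, c bit 9, d bits 10-30, e bit 31.
inline TestData unpack(const std::uint8_t* buf) {
    TestData t{};
    const std::uint32_t w0 = load_le32(buf);
    t.a = static_cast<std::uint8_t>(w0 & 0xFFu);
    t.b = (w0 >> 8) & 0x1u;
    t.c = ((w0 >> 9) & 0x1u) != 0;
    t.d = (w0 >> 10) & 0x1FFFFFu; // 21-bit mask
    t.e = ((w0 >> 31) & 0x1u) != 0;
    // uint32_t -> int32_t is modular in C++20 (two's complement on all mainstream compilers).
    t.f = static_cast<std::int32_t>(load_le32(buf + 4));
    t.g[0] = static_cast<std::int32_t>(load_le32(buf + 8));
    t.g[1] = static_cast<std::int32_t>(load_le32(buf + 12));
    return t;
}

inline void pack(const TestData& t, std::uint8_t* buf) {
    const std::uint32_t w0 = static_cast<std::uint32_t>(t.a)
                           | (static_cast<std::uint32_t>(t.b) & 0x1u) << 8
                           | static_cast<std::uint32_t>(t.c) << 9
                           | (t.d & 0x1FFFFFu) << 10
                           | static_cast<std::uint32_t>(t.e) << 31;
    store_le32(buf, w0);
    store_le32(buf + 4, static_cast<std::uint32_t>(t.f));
    store_le32(buf + 8, static_cast<std::uint32_t>(t.g[0]));
    store_le32(buf + 12, static_cast<std::uint32_t>(t.g[1]));
}
```

The rest of the program only ever touches TestData, so c and e stay real bools, and if the wire layout turns out to differ, only unpack and pack need to change.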

Try packing in this way:
#pragma pack( push, 1 )
struct testStruct {
// Word 0 (desired)
unsigned a : 8;
unsigned b : 1;
unsigned c : 1;
unsigned d : 21;
unsigned e : 1;
// Word 1 (desired)
unsigned f : 32;
// Words 2-3 (desired)
unsigned g[2]; //Cannot assign bit field but takes 64 bits in my compiler
};
#pragma pack(pop)
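Because bit-field allocation is implementation-defined, it may be worth letting the compiler confirm the layout this answer relies on. A sketch, assuming a 32-bit unsigned; the expected size of 16 bytes is what MSVC and GCC produce on x86/x64, not something the standard guarantees:

```cpp
#include <cstddef>

#pragma pack(push, 1)
struct testStruct {
    unsigned a : 8;   // 8 + 1 + 1 + 21 + 1 = 32 bits for word 0
    unsigned b : 1;
    unsigned c : 1;
    unsigned d : 21;
    unsigned e : 1;
    unsigned f : 32;  // word 1
    unsigned g[2];    // words 2-3
};
#pragma pack(pop)

// Fail the build instead of silently mismatching the Ada side.
static_assert(sizeof(unsigned) == 4, "layout assumes a 32-bit unsigned");
static_assert(sizeof(testStruct) == 16, "a..e did not pack into one word");
static_assert(offsetof(testStruct, g) == 8, "g is not at word 2");
```

Assigning true to a one-bit unsigned field such as c stores 1, and reading it in a condition behaves like a bool.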

There is no easy, elegant method without using accessors or an interface layer. Unfortunately, there is nothing like a #pragma to fix this. I ended up just converting the bools to unsigned int and renaming the variables (e.g. c to c_flag or c_bool) to encourage correct usage and make it clear what they contain. It's lower-effort than Thomas's solution but obviously not as robust, and it still avoids some of the main drawbacks of the other easy methods.

Years after I posted this question, user @WaltK added this comment to the linked, related question:
"If you want to have more control over the layout of bit field
structures in memory, consider using this bit field facility,
implemented as a library header file."

Related

Bitfield using 1 byte instead of 1 bit

I am working on a networking application where I will receive 2 bytes and certain bits have specific significance. I am trying to implement that packet as a structure. The intent is to do a binary copy to object address and the fields of the packet are ready to be accessed. Here is a simple example representing my problem. When we try to inspect the size of the bitfield and structure they are not coming as expected.
#include <bitset>
#include <cstdint>
#include <iostream>
struct a
{
std::bitset<8> b;
uint8_t c;
};
int main()
{
std::cout<<sizeof(a);
}
Output: 8
Expected Output: 2
Is this something specific to Bitset's implementation?
Generally, each element occupies only one bit (which, on most systems, is eight times less than the smallest elemental type: char).
(ref- cplusplus.com/reference/bitset/bitset/ )
compiled on Microsoft Visual Studio 2019 16.10.2
The source code of Microsoft Visual Studio's STL was recently open-sourced; you can check the implementation of bitset here. It confirms that the underlying storage is an array; a sketch:
template <size_t _Bits>
class bitset { // store fixed-length sequence of Boolean elements
public:
using _Ty = conditional_t<_Bits <= sizeof(unsigned long) * CHAR_BIT, unsigned long, unsigned long long>;
static constexpr ptrdiff_t _Bitsperword = CHAR_BIT * sizeof(_Ty);
static constexpr ptrdiff_t _Words = _Bits == 0 ? 0 : (_Bits - 1) / _Bitsperword; // NB: number of words - 1
_Ty _Array[_Words + 1];
};
So for std::bitset<8>, the underlying array is unsigned long _Array[1], which gives a size of 4 bytes (unsigned long is 32 bits on Windows). sizeof(a) is 8 because of alignment: b occupies offsets 0-3, c sits at offset 4, and the struct is then padded to a multiple of its 4-byte alignment. Please see this answer for reference about alignment.
As you are dealing with network applications, it's not suitable to use bitset here. To decode a network protocol, it's better to use bit fields. There is a good example in seastar's TCP decoder; you may take it as an example.
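If you still want bitset's convenient bit access, one option is to store the raw bytes and build a std::bitset<8> view only when needed. A sketch, assuming the packet is exactly two raw bytes; Packet and flag_bits are hypothetical names:

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical 2-byte packet: raw bytes as members, no std::bitset inside.
struct Packet {
    std::uint8_t flags; // flag bits exactly as received
    std::uint8_t c;
};

// Two 1-byte members with 1-byte alignment need no padding on mainstream compilers.
static_assert(sizeof(Packet) == 2, "unexpected padding in Packet");

// Construct a bitset view on demand for named bit access.
inline std::bitset<8> flag_bits(const Packet& p) {
    return std::bitset<8>(p.flags);
}
```

For example, flag_bits(p).test(2) checks bit 2 of the first byte without changing the struct's size.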

size of a hex pattern in cpp

I have a hex pattern stored in a variable. How do I know what the size of the hex pattern is?
E.g. --
#define MY_PATTERN 0xFFFF
Now I want to know the size of MY_PATTERN to use somewhere in my code.
sizeof (MY_PATTERN)
this is giving me warning -- "integer conversion resulted in truncation".
How can I fix this ? What is the way I should write it ?
The pattern can increase or decrease in size so I can't hard code it.
Don't do it.
There's no such thing in C++ as a "hex pattern". What you actually use is an integer literal. See paragraph "The type of the literal". Thus, sizeof (0xffff) is equal to sizeof(int). And the bad thing is: the exact size may vary.
From the design point of view, I can't really think of a situation where such a solution is acceptable. You're not even deriving a type from a literal value, which would be suspicious as well, but at least a type-safe solution. Sizes of values are mostly used in operations that work with memory buffers directly, like memcpy() or fwrite(). Sizes defined in such indirect ways lead to a very brittle binary interface and maintenance difficulties. What if you compile a program on both x86 and Motorola 68000 machines and want them to interoperate via a network protocol, or want to write some files on the first machine and read them on the second? sizeof(int) is 4 on the first and can be 2 on the second, depending on the compiler. It will break.
Instead, explicitly use the exactly sized types, like int8_t, uint32_t, etc. They're defined in the <cstdint> header.
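A sketch of that suggestion: give the pattern an explicit fixed-width type instead of a bare macro, so its size is fixed by the type you chose. The names kMyPattern and kWidePattern are hypothetical.

```cpp
#include <cstdint>

// The type, not the literal, now decides the size.
constexpr std::uint16_t kMyPattern   = 0xFFFF;     // exactly 2 bytes
constexpr std::uint32_t kWidePattern = 0xFFFFFFFF; // exactly 4 bytes

static_assert(sizeof(kMyPattern) == 2, "uint16_t is exactly 16 bits");
static_assert(sizeof(kWidePattern) == 4, "uint32_t is exactly 32 bits");
```

When the pattern grows, you change its declared type deliberately, and every sizeof(kMyPattern) in the code follows along.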
This will solve your problem:
#include <cstddef>
#define MY_PATTERN 0xFFFF
struct TypeInfo
{
template<typename T>
static std::size_t SizeOfType(T) { return sizeof(T); }
};
int main()
{
std::size_t size_of_type = TypeInfo::SizeOfType(MY_PATTERN); // sizeof(int): 0xFFFF is an int literal
}
As pointed out by Nighthawk441, you can just do:
sizeof(MY_PATTERN);
Just make sure to use a size_t wherever you are getting a warning, and that should solve your problem.
You could explicitly typedef various types to hold hex numbers with restricted sizes such that:
typedef unsigned char one_byte_hex;
typedef unsigned short two_byte_hex;
typedef unsigned int four_byte_hex;
one_byte_hex pattern = 0xFF;
two_byte_hex bigger_pattern = 0xFFFF;
four_byte_hex big_pattern = 0xFFFFFFFF;
//sizeof(pattern) == 1 (unsigned char is always 1 byte)
//sizeof(bigger_pattern) == 2 (typically)
//sizeof(big_pattern) == 4 (typically)
four_byte_hex new_pattern = static_cast<four_byte_hex>(pattern);
//sizeof(new_pattern) == 4 (typically)
It would be easier to just treat all hex numbers as unsigned ints regardless of pattern used though.
Alternatively, you could put together a function which checks how many times it can shift the bits of the pattern until it's 0.
#include <cstddef>
size_t sizeof_pattern(unsigned int pattern)
{
size_t bits = 0;
size_t bytes = 0;
unsigned int tmp = pattern;
while(tmp >> 1 != 0){
bits++;
tmp = tmp >> 1;
}
bytes = (bits + 1) / 8; //add 1 to bits to shift range from 0-31 to 1-32 so we can divide properly. 8 bits per byte.
if((bits + 1) % 8 != 0){
bytes++; //requires one more byte to store value since we have remaining bits.
}
return bytes;
}
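A variant of the same idea, swapped to shift a whole byte at a time and marked constexpr so it can be evaluated at compile time, e.g. bytes_needed(MY_PATTERN). This is a sketch that requires C++14 (a loop inside a constexpr function); bytes_needed is a hypothetical name. Like sizeof_pattern above, it reports 1 for a value of 0.

```cpp
#include <cstddef>

// Number of bytes needed to hold the significant bits of v (at least 1).
constexpr std::size_t bytes_needed(unsigned long long v) {
    std::size_t bytes = 1;
    while (v > 0xFFu) { // more than one byte of significant bits left
        v >>= 8;
        ++bytes;
    }
    return bytes;
}

static_assert(bytes_needed(0xFF) == 1, "");
static_assert(bytes_needed(0xFFFF) == 2, "");
static_assert(bytes_needed(0x10000) == 3, "");
```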

size of a structure containing bit fields [duplicate]

Closed 10 years ago.
Possible Duplicate:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
I was trying to understand the concept of bit fields.
But I am not able to find why the size of the following structure in CASE III is coming out as 8 bytes.
CASE I:
struct B
{
unsigned char c; // +8 bits
} b;
sizeof(b); // Output: 1 (because unsigned char takes 1 byte on my system)
CASE II:
struct B
{
unsigned b: 1;
} b;
sizeof(b); // Output: 4 (because unsigned takes 4 bytes on my system)
CASE III:
struct B
{
unsigned char c; // +8 bits
unsigned b: 1; // +1 bit
} b;
sizeof(b); // Output: 8
I don't understand why the output for case III comes as 8. I was expecting 1(char) + 4(unsigned) = 5.
You can check the layout of the struct by using offsetof, but it will be something along the lines of:
struct B
{
unsigned char c; // +8 bits
unsigned char pad[3]; //padding
unsigned int bint; //your b:1 occupies one bit of this one
} b;
Now, it is obvious that (in a 32-bit arch.) the sizeof(b) will be 8, isn't it?
The question is, why 3 bytes of padding, and not more or less?
The answer is that a field's offset within a struct has the same alignment requirement as the field's type. On your architecture, integers are 4-byte aligned, so offsetof(B, bint) must be a multiple of 4. It cannot be 0, because c comes first, so it will be 4. If bint starts at offset 4 and is 4 bytes long, the size of the struct is 8.
Another way to look at it is that the alignment requirement of a struct is the largest of any of its fields, so this B will be 4-byte aligned (because of the unsigned storage unit of your bit field). The size of a type must be a multiple of its alignment: the contents need at least 5 bytes, and the next multiple of 4 is 8.
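offsetof cannot name a bit field, but you can check the layout described above with an ordinary unsigned int standing in for the storage unit of b:1. A sketch; Layout is a hypothetical name, and the concrete numbers (offset 4, size 8) assume a 4-byte, 4-byte-aligned unsigned int. Note that some compilers (GCC, for example) may instead let a b:1 bit field share the first 4-byte unit with c, giving a size of 4.

```cpp
#include <cstddef>

// Stand-in for the layout described above.
struct Layout {
    unsigned char c;    // offset 0
    unsigned int  bint; // offset rounded up to a multiple of alignof(unsigned int)
};

// These two rules hold on every conforming implementation:
static_assert(offsetof(Layout, bint) % alignof(unsigned int) == 0, "member misaligned");
static_assert(sizeof(Layout) % alignof(Layout) == 0, "size not a multiple of alignment");
// On a typical platform: offsetof(Layout, bint) == 4 and sizeof(Layout) == 8.
```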
I think you're seeing an alignment effect here.
Many architectures require integers to be stored at addresses in memory that are multiples of the word size.
This is why the char in your third struct is being padded with three more bytes, so that the following unsigned integer starts at an address that is a multiple of the word size.
A char is by definition one byte, and an int is typically 4 bytes on a 32-bit system, so the struct ends up as 1 + 3 (padding) + 4 = 8 bytes.
See http://en.wikipedia.org/wiki/Data_structure_alignment#Typical_alignment_of_C_structs_on_x86 for some explanation of padding
To keep memory accesses aligned, the compiler adds padding; if you pack the structure, it will not add the padding.
I took another look at this and here's what I found.
From the C book, "Almost everything about fields is implementation-dependant."
On my machine:
struct B {
unsigned c: 8;
unsigned b: 1;
}b;
printf("%zu\n", sizeof(b));
prints 4, i.e. both fields share a single unsigned int.
You were mixing bit fields with regular struct elements.
BTW, a bit field is defined as "a set of adjacent bits within a single implementation-defined storage unit", so I'm not even sure that the ':8' does what you want. That would seem not to be in the spirit of bit fields (as it's not a bit any more).
The alignment and total size of the struct are platform- and compiler-specific. You cannot expect straightforward and predictable answers here; a compiler can always have its own ideas. For example:
struct B
{
unsigned b0: 1; // +1 bit
unsigned char c; // +8 bits
unsigned b1: 1; // +1 bit
};
The compiler may or may not merge fields b0 and b1 into one integer; it is up to the compiler. Some compilers have command-line options that control this, and some do not. Another example:
struct B
{
unsigned short c, d, e;
};
It is up to the compiler whether to pack the fields of this struct (assuming a 32-bit platform). The layout of the struct can differ between DEBUG and RELEASE builds.
I would recommend using only the following pattern:
struct B
{
unsigned b0: 1;
unsigned b1: 7;
unsigned b2: 2;
};
When you have a sequence of bit fields that share the same type, the compiler will put them into one int (as long as they fit). Otherwise, various other aspects can kick in. Also take into account that in a big project, you write a piece of code and somebody else will write and rewrite the makefile or move your code from one DLL into another. At that point compiler flags will be set and changed, and there is a 99% chance that those people will have no idea of the alignment requirements of your struct. They will never even open your file.
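One cheap defense against the flag changes described above is a compile-time check next to the struct. A sketch using the pattern from this answer; the expectation that B fits in one unsigned is what common compilers do with default settings, not a language guarantee.

```cpp
// The recommended pattern: a run of bit fields sharing one declared type.
struct B {
    unsigned b0 : 1;
    unsigned b1 : 7;
    unsigned b2 : 2;
};

// 1 + 7 + 2 = 10 bits. If someone's flags or packing pragmas change the layout,
// the build fails here instead of silently producing a different binary format.
static_assert(sizeof(B) == sizeof(unsigned), "B no longer occupies exactly one unsigned");
```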

How can I have exactly 2 bits in memory?

I need to store a value in a data structure that can range from 0 to 3, so I need 2 bits. This data structure will have 2^16 locations, so I want 2^16 * 2 bits in total. In C++, how can I use exactly 2 bits of memory per value?
You need two bits per unit (not three), so you can pack four units into one byte, or 16 units into one 32-bit integer.
So you will need a std::array<uint32_t, 4096> to accommodate 2^16 units of 2-bit values (2^16 / 16 = 4096).
You access the nth value as follows:
unsigned int get(std::size_t n, std::array<uint32_t, 4096> const & arr)
{
const uint32_t u = arr[n / 16];
return (u >> (2 * (n % 16))) & 0x3;
}
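A matching setter only has to clear the two bits of slot n and OR in the new value. A sketch; the names set and Packed2 are hypothetical, and the bit positions follow the get() above (slot n lives at bit 2 * (n % 16) of word n / 16).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Packed2 = std::array<std::uint32_t, 4096>; // 4096 * 16 = 2^16 two-bit slots

// Store value (0..3) into slot n.
void set(std::size_t n, unsigned int value, Packed2& arr) {
    const unsigned shift = 2 * (n % 16);
    std::uint32_t& word = arr[n / 16];
    word = (word & ~(std::uint32_t{0x3} << shift))              // clear the old 2 bits
         | ((static_cast<std::uint32_t>(value) & 0x3u) << shift); // insert the new ones
}
```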
Alternatively, you could go with a bitfield:
struct BF32 {
uint32_t u0 : 2;
uint32_t u1 : 2;
//...
uint32_t uF : 2;
};
And then make an std::array<BF32, 4096>.
You cannot allocate a single object that is less than 1 byte (because 1 byte is the smallest addressable unit in the system).
You can, however, have portions of a structure that are smaller than a byte using bitfields. You could create one of these to hold 8 of your values, the size of this is exactly 3 bytes:
#pragma pack(1) // MSVC requires this
struct three_by_eight {
unsigned value1 : 3;
unsigned value2 : 3;
unsigned value3 : 3;
unsigned value4 : 3;
unsigned value5 : 3;
unsigned value6 : 3;
unsigned value7 : 3;
unsigned value8 : 3;
}
__attribute__ ((packed)) // GCC requires this
;
These can be clumsy to work with since they can't be accessed using []. Your best bet would be to create your own class that works like a bitset but operates on 3 bits instead of 1.
If you are not working on an embedded system and resources are sufficient, you can have a look at std::bitset<> which will make your job as a programmer easier.
But if you are working on an embedded system, the bitset is probably not good for you (your compiler probably doesn't even support templates). There are a number of techniques for manipulating bits, each with its own quirks; here's an article that might help you:
http://www.atmel.com/dyn/resources/prod_documents/avr_3_04.pdf
0 to 3 has 4 possible values. Since log2(4) == 2, or because 2^2 == 4, you need two bits, not three.
You might want to use bit fields
There was a discussion on the size allocated to bit-field structs last night. A struct cannot be smaller than a byte, and with most machines and compilers will be either 2 or 4, depending on the compiler and word size. So, no, you can't get a 3-bit struct (2-bit as you actually need). You can, however, pack bits yourself into an array of, say, uint64_ts. Or you could make a struct with 16 2-bit members and see if gcc makes that a 4-byte struct, then use an array of those.
There is a very old trick to sneak a couple of bits around if you already have some data structures. It is quite nasty, and unless you have extremely good reasons, it is most likely not a good idea at all. I'm just pointing it out in case you really, really need to save a couple of bits.
Due to alignment, pointers on x86 or x64 are often multiples of 4, so the two least significant bits of such pointers (e.g. pointers to int) are always 0. You can exploit this and sneak your two bits in there, but you have to make sure to strip them before using those pointers (the details depend on the architecture; I'm not sure here).
Again, this is nasty, dangerous, and close to undefined behavior, but perhaps it is worth it in your case.
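A sketch of the trick, assuming alignof(int) >= 4 and that converting an int* to std::uintptr_t and back preserves the address (implementation-defined, but true on mainstream x86/x64 compilers). Keeping the tagged value as an integer rather than as an int* avoids ever holding a misaligned pointer. The helper names are hypothetical.

```cpp
#include <cstdint>

static_assert(alignof(int) >= 4, "need two always-zero low bits in int*");

constexpr std::uintptr_t kTagMask = 0x3;

// Combine a pointer and a 2-bit tag into one integer.
std::uintptr_t tag_pointer(int* p, unsigned tag) {
    return reinterpret_cast<std::uintptr_t>(p) | (tag & kTagMask);
}

unsigned tag_of(std::uintptr_t tagged) {
    return static_cast<unsigned>(tagged & kTagMask);
}

// Always strip the tag before turning the value back into a usable pointer.
int* untag_pointer(std::uintptr_t tagged) {
    return reinterpret_cast<int*>(tagged & ~kTagMask);
}
```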
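Since 3^5 = 243, which is less than 256, five entries with three possible values each fit in 8 bits.
That is 1.6 bits per entry instead of 2, so you use about 20% less space when storing a lot of data this way. All you need is a lookup table for lookups and manipulations in both directions.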
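A sketch of this base-3 packing, assuming every entry is in the range 0..2. It swaps the lookup tables mentioned above for plain multiplication and division; a 243-entry table would trade memory for speed. pack_trits and unpack_trits are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Encode five values in 0..2 as a base-3 number; t[0] is the least significant digit.
// The maximum is 2 * (81 + 27 + 9 + 3 + 1) = 242, so it always fits in one byte.
std::uint8_t pack_trits(const std::array<unsigned, 5>& t) {
    unsigned v = 0;
    for (std::size_t i = 5; i-- > 0;) {
        v = v * 3 + t[i];
    }
    return static_cast<std::uint8_t>(v);
}

// Decode by repeatedly taking the remainder and quotient by 3.
std::array<unsigned, 5> unpack_trits(std::uint8_t byte) {
    std::array<unsigned, 5> t{};
    unsigned v = byte;
    for (std::size_t i = 0; i < 5; ++i) {
        t[i] = v % 3;
        v /= 3;
    }
    return t;
}
```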

How to use an int as an array of ints/bools?

I noticed while making a program that a lot of my int variables never went above ten. Since an int is at least 2 bytes (1 if you count char), I figure I should be able to store 4 unsigned values with a max value of 15 in a short int, and I know I can access each one individually using >> and <<:
short unsigned int SLWD = 11434;
S is (SLWD >> 12), L is ((SLWD << 4) >> 12),
W is ((SLWD << 8) >> 12), and D is ((SLWD << 12) >> 12)
However, I have no idea how to encapsulate this in a function or class, since any GetVal() function would have to return an int, which defeats the purpose of isolating the bits in the first place.
First, remember the Rules of Optimization. But this is possible in C or C++ using bitfields:
struct mystruct {
unsigned int smallint1 : 3; /* 3 bits wide, values 0 -- 7 */
signed int smallint2 : 4; /* 4 bits wide, values -8 -- 7 */
unsigned int boolean : 1; /* 1 bit wide, values 0 -- 1 */
};
It's worth noting that while you gain by not requiring so much storage, you lose because it becomes more costly to access everything, since each read or write now has a bunch of bit twiddling mechanics associated with it. Given that storage is cheap, it's probably not worth it.
Edit: You can also use vector<bool> to store 1-bit bools, but beware of it because it doesn't act like a normal vector! In particular, its operator[] and iterators yield proxy objects rather than bool&, so you cannot take a bool& or bool* to an element. It's sufficiently different that it's fair to say a vector<bool> is not actually a vector. Scott Meyers wrote very clearly on this topic in 'Effective STL'.
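A small illustration of the proxy behavior. flip_first is a hypothetical helper; the commented-out line shows what a normal vector would allow but vector<bool> rejects.

```cpp
#include <vector>

// std::vector<bool> stores packed bits, so v[0] is a std::vector<bool>::reference
// proxy, not a bool&.
bool flip_first(std::vector<bool>& v) {
    auto r = v[0]; // r is a proxy object, not a copy of a bool
    r = !r;        // assigning through the proxy writes the packed bit in v
    // bool* p = &v[0]; // would not compile: there is no bool object to point at
    return v[0];
}
```

Note the surprise with auto: with a std::vector<int>, auto r = v[0] would be an independent copy, but here it still refers into the vector.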
In C, and for the sole purpose of saving space, you can reinterpret the unsigned short as a structure with bitfields (or use such structure without messing with reinterpretations):
#include <stdio.h>
typedef struct bf_
{
unsigned x : 4;
unsigned y : 4;
unsigned z : 4;
unsigned w : 4;
} bf;
int main(void)
{
unsigned short i = 5;
bf *bitfields = (bf *) &i;
bitfields->w = 12;
printf("%d\n", bitfields->x);
// etc..
return 0;
}
That's a very common technique. You usually allocate an array of the larger primitive type (e.g., ints or longs) and have some abstraction to deal with the mapping. If you're using an OO language, it's usually a good idea to actually define some sort of BitArray or SmartArray or something like that, and implement a getVal() that takes an index. The important thing is to make sure you hide the details of the internal representation (e.g., for when you move between platforms).
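A minimal sketch of such a class for 4-bit values (0..15), two per byte. NibbleArray, getVal, and setVal are hypothetical names. The return type is an ordinary unsigned: the savings come from the packed storage, not from the type each call returns. It uses masks rather than pairs of << and >>, because a short is promoted to int before shifting, so a left shift does not discard its upper bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packed array of 4-bit unsigned values.
class NibbleArray {
public:
    explicit NibbleArray(std::size_t count) : bytes_((count + 1) / 2, 0) {}

    // Even indices use the low nibble, odd indices the high nibble.
    unsigned getVal(std::size_t i) const {
        const std::uint8_t b = bytes_[i / 2];
        return (i % 2 == 0) ? (b & 0x0Fu) : (b >> 4);
    }

    void setVal(std::size_t i, unsigned v) {
        std::uint8_t& b = bytes_[i / 2];
        v &= 0x0Fu; // keep only the low 4 bits
        b = (i % 2 == 0) ? static_cast<std::uint8_t>((b & 0xF0u) | v)
                         : static_cast<std::uint8_t>((b & 0x0Fu) | (v << 4));
    }

private:
    std::vector<std::uint8_t> bytes_;
};
```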
That being said, most mainstream languages already have this functionality available.
If you just want bits, Wikipedia has a good list.
If you want more than bits, you can still find something, or implement it yourself with a similar interface. Take a look at the Java BitSet for reference