Disclaimer: Please correct me in the event that I make any false claims in this post.
Consider a struct that contains eight bool member variables.
/*
* Struct uses one byte for each flag.
*/
struct WithBools
{
bool f0 = true;
bool f1 = true;
bool f2 = true;
bool f3 = true;
bool f4 = true;
bool f5 = true;
bool f6 = true;
bool f7 = true;
};
The space allocated to each variable is a byte in length, which seems like a waste if the variables are used solely as flags. One solution to reduce this wasted space, as far as the variables are concerned, is to encapsulate the eight flags into a single member variable of unsigned char.
/*
* Struct uses a single byte for eight flags; retrieval and
* manipulation of data is achieved through accessor functions.
*/
struct WithoutBools
{
unsigned char getFlag(unsigned index)
{
return flags & (1 << (index % 8));
}
void toggleFlag(unsigned index)
{
flags ^= (1 << (index % 8));
}
private:
unsigned char flags = 0xFF;
};
The flags are retrieved and manipulated via bitwise operators, and the struct provides an interface for the user to retrieve and manipulate the flags. While flag sizes have been reduced, we now have the two additional methods that add to the size of the struct. I do not know how to benchmark this difference, so I cannot be certain how the two structs above compare.
My questions are:
1) Would the difference in space between these two structs be negligible?
2) Generally, is this approach of "optimising" a collection of bools by compacting them into a single byte a good idea? Either in an embedded systems context or otherwise.
3) Would a C++ compiler make such an optimisation that compacts a collection of bools wherever possible and appropriate.
we now have the two additional methods that add to the size of the
struct
Non-virtual methods are code and do not increase the size of the struct. Only data members contribute to the size of an object.
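One way to check this, as a rough sketch: the two structs from the question can be compared with sizeof at compile time. The getter below returns bool rather than unsigned char, which is my own small change, and the exact numbers assume a mainstream implementation where sizeof(bool) == 1 and no padding is inserted between the members; the standard does not guarantee either.

```cpp
#include <cassert>

struct WithBools {
    bool f0 = true, f1 = true, f2 = true, f3 = true;
    bool f4 = true, f5 = true, f6 = true, f7 = true;
};

struct WithoutBools {
    // Member functions are code; they take no space inside each object.
    bool getFlag(unsigned index) const {
        return (flags & (1u << (index % 8))) != 0;
    }
    void toggleFlag(unsigned index) {
        flags ^= static_cast<unsigned char>(1u << (index % 8));
    }
private:
    unsigned char flags = 0xFF;
};

// Assumption: typical ABI with 1-byte bool and no padding between members.
static_assert(sizeof(WithBools) == 8 * sizeof(bool), "one bool per flag");
static_assert(sizeof(WithoutBools) == sizeof(unsigned char), "methods add nothing");
```

If either static_assert fails on your toolchain, that is a sign the layout assumptions above do not hold there.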
3) Would a C++ compiler make such an optimisation that compacts a
collection of bools wherever possible and appropriate.
That is a resounding no. The compiler is not allowed to change data types.
1) Would the difference in space between these two structs be
negligible?
No, there definitely is a size difference between the two approaches.
2) Generally, is this approach of "optimising" a collection of bools
by compacting them into a single byte a good idea? Either in an
embedded systems context or otherwise.
Generally yes, the idiomatic way to model flags is with bit-wise manipulation inside an unsigned integer. Depending on the number of flags needed you can use std::uint8_t, std::uint16_t and so on.
However the most common way to model this is not via index as you've done, but via masks.
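A minimal sketch of the mask-based style, assuming three made-up flag names (FlagVisible, FlagEnabled, FlagDirty) and a hypothetical Flags wrapper that are not part of the question:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag names; each mask selects exactly one bit.
enum Flag : std::uint8_t {
    FlagVisible = 1u << 0,
    FlagEnabled = 1u << 1,
    FlagDirty   = 1u << 2
};

struct Flags {
    bool test(std::uint8_t mask) const { return (bits & mask) != 0; }
    void set(std::uint8_t mask)    { bits = static_cast<std::uint8_t>(bits | mask); }
    void clear(std::uint8_t mask)  { bits = static_cast<std::uint8_t>(bits & ~mask); }
    void toggle(std::uint8_t mask) { bits = static_cast<std::uint8_t>(bits ^ mask); }
private:
    std::uint8_t bits = 0;
};
```

One practical difference from the index-based interface is that masks combine, so a call such as set(FlagVisible | FlagDirty) updates several flags in one operation.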
Would the difference in space between these two structs be negligible?
That depends on how many values you are storing and how much space you have to store them in. The size ratio is 1 to 8.
Generally, is this approach of "optimising" a collection of bools by compacting them into a single byte a good idea? Either in an embedded systems context or otherwise.
Again, it depends on how many values and how much space. Also note that dealing with bits instead of bytes increases code size and execution time.
Many embedded systems have relatively little RAM and plenty of Flash. Code is stored in Flash, so the increased code size can be ignored, and the saved memory could be important on small RAM systems.
Would a C++ compiler make such an optimisation that compacts a collection of bools wherever possible and appropriate.
Hypothetically it could. I would consider that an aggressive space optimization, at the expense of execution time.
STL has a specialization for vector<bool> that I frequently avoid for performance reasons - vector<char> is much faster.
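Part of the background, sketched below: vector<bool> does not hand out bool& but a proxy object that reads and writes a single bit, whereas vector<char> hands out a plain char&. The countSet helper is a hypothetical name of mine; the snippet only illustrates the interface difference and says nothing about the speed of either container.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// vector<char> exposes a real reference to each element...
static_assert(std::is_same<std::vector<char>::reference, char&>::value,
              "plain element reference");
// ...while vector<bool> exposes a proxy class instead of bool&.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "proxy, not bool&");

// Works for both containers, because reading an element yields
// something convertible to bool in either case.
template <typename Container>
std::size_t countSet(const Container& flags) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < flags.size(); ++i)
        if (flags[i]) ++n;
    return n;
}
```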
I often see structures in the code, at the end of which there is a memory reserve.
struct STAT_10K4
{
int32_t npos; // position number
...
float Plts;
Pxts;
float Plto [NUM];
uint32_t reserv [(NUM * 3)% 2 + 1];
};
Why do they do this?
Why are some of the reserve values dependent on constants?
What can happen if you do not make such reserves? Or make a mistake in their size?
This is a form of manual padding of a class to make its size a multiple of some number. In your case:
uint32_t reserv [(NUM * 3)% 2 + 1];
(NUM * 3) % 2 is needlessly complicated, as it is equivalent to NUM % 2 (not considering overflow). So if NUM is odd, reserv gets one extra uint32_t on top of the one from + 1, for two elements in total; if NUM is even, it gets just one. The apparent intent is that STAT_10K4's size is always a multiple of 8 bytes.
You will have to consult the documentation of your software to see why exactly this is done. Perhaps padding this struct with up to 8 bytes makes some algorithm easier to implement. Or maybe it has some perceived performance benefit. But this is pure speculation.
Typically, the compiler will pad your structs to 64-bit boundaries if you use any 64-bit types, so you don't need to do this manually.
Note: This answer is specific to mainstream compilers and x86. Obviously this does not apply to compiling for TI-calculators with 20-bit char & co.
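To make the arithmetic concrete, here is a sketch with a cut-down stand-in for the struct. Stat is a hypothetical template of mine that keeps only the members visible in the question (the elided ones are left out), and it assumes 4-byte float, int32_t and uint32_t with no compiler-inserted padding, as on mainstream x86 toolchains.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in: only the members shown in the question.
template <std::size_t NUM>
struct Stat {
    std::int32_t  npos;                        // position number
    float         Plts;
    float         Pxts;
    float         Plto[NUM];
    std::uint32_t reserv[(NUM * 3) % 2 + 1];   // 1 element if NUM is even, 2 if odd
};

// Under the stated assumptions, both parities end up a multiple of 8 bytes.
static_assert(sizeof(Stat<4>) % 8 == 0, "even NUM");
static_assert(sizeof(Stat<5>) % 8 == 0, "odd NUM");
```

With the real struct, the result depends on the members hidden behind the "..." in the question, so the same static_assert on the actual type is the more reliable check.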
This would typically be to support variable-length records. A couple of ways this could be used are:
1. If the maximum number of records is known, then a simple structure definition can accommodate all cases.
2. In many protocols there is a "header-data" idiom. The header will be a fixed size but the data variable. The data will be received as a "blob". Thus the structure of the header can be declared and accessed by a pointer to the blob, and the data will follow on from that. For example:
typedef struct
{
uint32_t messageId;
uint32_t dataType;
uint32_t dataLenBytes;
uint8_t data[MAX_PAYLOAD];
}
tsMessageFormat;
The data is received in a blob, so a void* ptr, size_t len.
The buffer pointer is then cast so the message can be read as follows:
tsMessageFormat* pMessage = (tsMessageFormat*) ptr;
for (uint32_t i = 0; i < pMessage->dataLenBytes; i++)
{
//do something with pMessage->data[i];
}
In some languages the "data" could be specified as being an empty record, but C++ does not allow this. Sometimes you will see the "data" omitted and you have to perform pointer arithmetic to access the data.
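As a sketch of the variant where "data" is omitted from the struct: the header can be copied out of the blob with std::memcpy, and the payload located by pointer arithmetic. MessageHeader and parseMessage are hypothetical names, and the code assumes the sender used the same byte order and struct layout as the receiver.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fixed-size part of the message; the payload follows it directly in the blob.
struct MessageHeader {
    std::uint32_t messageId;
    std::uint32_t dataType;
    std::uint32_t dataLenBytes;
};

// Copies the header out of the blob and points `payload` at the bytes after it.
// Returns false if the blob is too short for the header or the declared payload.
inline bool parseMessage(const void* ptr, std::size_t len,
                         MessageHeader& header, const std::uint8_t*& payload) {
    if (len < sizeof(MessageHeader)) return false;
    // memcpy sidesteps alignment and aliasing problems of a direct pointer cast.
    std::memcpy(&header, ptr, sizeof(MessageHeader));
    if (header.dataLenBytes > len - sizeof(MessageHeader)) return false;
    payload = static_cast<const std::uint8_t*>(ptr) + sizeof(MessageHeader);
    return true;
}
```

The length check against dataLenBytes is worth keeping in any version of this idiom, since the field comes from the received data and cannot be trusted.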
The alternative to this would be to use a builder pattern and/or streams.
Windows uses this pattern a lot; many structures have a cbSize field which allows additional data to be conveyed beyond the structure. The structure accommodates most cases, but having cbSize allows additional data to be provided if necessary.
The code base I work in is quite old. While we compile nearly everything with C++11, much of the code was written in C many years ago. When developing new classes in old areas I always find myself in a situation where I have to choose between matching old methodologies, or going with a more modern approach.
In most cases, I prefer sticking with more modern techniques when possible. However, one common old practice I often see, which I have a hard time arguing against, is bitfields. We pass a lot of messages here, and many of them are full of single-bit values. Take the example below:
class NewStructure
{
public:
const bool getValue1() const
{
return value1;
}
void setValue1(const bool input)
{
value1 = input;
}
private:
bool value1;
bool value2;
bool value3;
bool value4;
bool value5;
bool value6;
bool value7;
bool value8;
};
struct OldStructure
{
const bool getValue1() const
{
return value1;
}
void setValue1(const bool input)
{
value1 = input;
}
unsigned char value1 : 1;
unsigned char value2 : 1;
unsigned char value3 : 1;
unsigned char value4 : 1;
unsigned char value5 : 1;
unsigned char value6 : 1;
unsigned char value7 : 1;
unsigned char value8 : 1;
};
In this case the sizes are 8 bytes for the New Structure and 1 for the old.
I added a "getter" and "setter" to illustrate the point that from a user perspective, they can be identical. I realize that perhaps you could make the case of readability for the next developer, but other than that, is there a reason to avoid bit fields? I know that packed fields take a performance hit, but because these are all characters, padding rules are still in place.
There are several things to consider when using bitfields. They are listed below (the order of importance depends on the situation).
Performance
Bitfield operations incur a performance penalty when set or read (compared to direct types). A simple codegen example shows the extra instructions emitted: https://gcc.godbolt.org/z/DpcErN However, bitfields make the data more compact and therefore more cache-friendly, and that could completely outweigh the cost of the additional operations. The only way to understand the real performance impact is to benchmark the actual application in its real use case.
ABI Interoperability
The endianness of bit fields is implementation-defined, so the layout of the same struct produced by two compilers can differ.
Usability
You cannot bind a non-const reference to a bitfield, nor can you take its address. This can affect code and make it less clear.
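A small sketch of what that usability limit looks like in practice. Packed, setTrue and raiseError are hypothetical names; the two commented-out lines are the ones a conforming compiler rejects.

```cpp
#include <cassert>

struct Packed {
    unsigned char ready : 1;
    unsigned char error : 1;
};

// A plain bool can be handed to code that takes a non-const reference...
inline void setTrue(bool& flag) { flag = true; }

// ...but a bit-field has to be copied out, modified, and copied back.
inline void raiseError(Packed& p) {
    // unsigned char* addr = &p.error;  // ill-formed: address of a bit-field
    // setTrue(p.error);                // ill-formed: non-const reference to a bit-field
    bool tmp = p.error;
    setTrue(tmp);
    p.error = tmp;
}
```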
For you, as the programmer, there's not much difference. But the machine code to access a whole byte is much simpler/shorter than to access an individual bit, so using bitfields bulks up the generated code.
In a pseudo-assembly-language, your setter might turn into something like:
ldb input1,b ; get the new value into accumulator b
movb b,value1 ; put it into the variable
rts ; return from subroutine
But it's not so easy for bitfields:
ldb input1,b ; get the new value into accumulator b
movb bitfields,a ; get current bitfield values into accumulator a
cmpb b,#0 ; See what to do.
brz clearvalue1: ; If it's zero, go to clearing the bit
orb #$80,a ; set the bit representing value1.
bra resume: ; skip the clearing code.
clearvalue1:
andb #$7f,a ; clear the bit representing value1
resume:
movb a,bitfields ; put the value back
rts ; return
And it has to do that for each of your 8 members' setters, and something similar for getters. It adds up. Additionally, even today's dumbest compilers would probably inline the full-byte setter code, rather than actually make a subroutine call. For the bitfield setter, it may depend whether you're compiling optimizing for speed vs. space.
And you've only asked about booleans. If they were integer bitfields, then the compiler would have to deal with loading, masking out prior values, shifting the value to align it with its field, masking out unused bits, OR-ing the value into place, and then writing it back to memory.
So why would you use one vs. the other?
Bitfields are slower, but pack data more efficiently.
Non-bitfields are faster, and require less machine code to access.
As the developer, it's your judgement call. If you will be keeping many instances of Structure in memory at once, the memory savings may be worth it. If you aren't going to have many instances of that structure in memory at once, the compiled code bloat offsets memory savings and you're sacrificing speed.
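#include <bitset>
#include <cstddef>
#include <type_traits>
template<typename enum_type,std::size_t n_bits>
class bit_flags{
static_assert(std::is_enum<enum_type>::value,"enum_type must be an enum");
std::bitset<n_bits> bits;
//scoped enums do not convert to integers implicitly, so cast to an index
static constexpr std::size_t idx(enum_type bit){return static_cast<std::size_t>(bit);}
public:
auto operator[](enum_type bit){return bits[idx(bit)];}
auto& set(enum_type bit){bits.set(idx(bit));return *this;}
auto& reset(enum_type bit){bits.reset(idx(bit));return *this;}
//go on with flip et al...
};
enum class v_flags{v1,v2,/*...*/vN};
bit_flags<v_flags,static_cast<std::size_t>(v_flags::vN)+1> my_flags;
//inside a function:
my_flags.set(v_flags::v1);
my_flags[v_flags::v2]=true;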
std::bitset is as efficient as bool bit fields. You can wrap it in a class to force every bit to be accessed by a name defined in an enum. Now you have a small but scalable utility to use for multiple different sets of bool flags. C++17 makes it even more convenient:
template<auto last_flag, typename enum_type=decltype(last_flag)>
class bit_flags{
std::bitset<static_cast<std::size_t>(last_flag)+1> bits;
//...
};
bit_flags<v_flags::vN> my_flags;
I want to use and store "Handles" to data in an object buffer to reduce allocation overhead. The handle is simply an index into an array with the object. However I need to detect use-after-reallocations, as this could slip in quite easily. The common approach seems to be using bit fields. However this leads to 2 problems:
Bit fields are implementation defined
Bit shifting is not portable across big/little endian machines.
What I need:
Store handle to file (file handler can manage either integer types (byte swapping) or byte arrays)
Store 2 values in the handle with minimum space
What I got:
template<class T_HandleDef, typename T_Storage = uint32_t>
struct Handle
{
typedef T_HandleDef HandleDef;
typedef T_Storage Storage;
Handle(): handle_(0){}
private:
const T_Storage handle_;
};
template<unsigned T_numIndexBits = 16, typename T_Tag = void>
struct HandleDef{
static const unsigned numIndexBits = T_numIndexBits;
};
template<class T_Handle>
struct HandleAccessor{
typedef typename T_Handle::Storage Storage;
typedef typename T_Handle::HandleDef HandleDef;
static const unsigned numIndexBits = HandleDef::numIndexBits;
static const unsigned numMagicBits = sizeof(Storage) * 8 - numIndexBits;
/// "Magic" struct that splits the handle into values
union HandleData{
struct
{
Storage index : numIndexBits;
Storage magic : numMagicBits;
};
T_Handle handle;
};
};
A usage would be for example:
typedef Handle<HandleDef<24> > FooHandle;
FooHandle Create(unsigned idx, unsigned m){
HandleAccessor<FooHandle>::HandleData data;
data.index = idx;
data.magic = m;
return data.handle;
}
My goal was to keep the handle as opaque as possible, add a bool check but nothing else. Users of the handle should not be able to do anything with it except pass it around.
So problems I run into:
Union is UB -> Replace its T_Handle by Storage and add a ctor to Handle from Storage
How does the compiler lay out the bit field? I fill the whole union/type, so there should be no padding. So probably the only thing that can differ is which member comes first, depending on endianness, correct?
How can I store handle_ to a file and load it on a machine with possibly different endianness and still have index and magic be correct? I think I can store the containing Storage 'endian-correct' and get correct values IF both members occupy exactly half the space (2 shorts in a uint). But I always want more space for the index than for the magic value.
Note: There are already questions about bitfields and unions. Summary:
Bitfields may have unexpected padding (impossible here as whole type occupied)
Order of "members" depends on the compiler (only 2 possible ways here; it should be safe to assume the order depends entirely on endianness, so this may or may not actually help here)
Specific binary layout of bits can be achieved by manual shifting (or e.g. wrappers http://blog.codef00.com/2014/12/06/portable-bitfields-using-c11/) -> Is not an answer here. I also need a specific layout of the values IN the bitfield. So I'm not sure what I get if I e.g. create a handle as handle = (magic << numIndexBits) | index and save/load this as binary (no endianness conversion). I lack a big-endian machine for testing.
Note: No C++11, but boost is allowed.
The answer is pretty simple (based on another question I forgot the link to and comments by @Jeremy Friesner):
As "numbers" are already an abstraction in C++, one can be sure to always have the same bit representation when the variable is in a CPU register (when it is used for any kind of calculation). Bit shifts in C++ are also defined in terms of values, independent of endianness. This means x << 1 is always equal to x * 2 (as if the value were written most significant byte first, i.e. big-endian).
The only time one gets endianness problems is when saving to a file, sending/receiving over a network, or accessing the value in memory differently (e.g. via pointers...).
One cannot use C++ bitfields here, as one cannot be 100% sure about the order of the "entries". Bitfield containers might be OK if they allow access to the data as a "number".
The safest option is (still) using bit shifts, which are very simple in this case (only 2 values). During storing/serialization the number must then be stored in an endian-agnostic way.
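A sketch of that approach, assuming a 32-bit handle split into 24 index bits and 8 magic bits. All names here (makeHandle, handleIndex, handleMagic, storeLE, loadLE) are made up for illustration, and the least-significant-byte-first file order is an arbitrary choice; any fixed order works as long as the writer and the reader agree.

```cpp
#include <cassert>
#include <cstdint>

// Assumed split: low 24 bits hold the index, high 8 bits hold the magic value.
const unsigned kIndexBits = 24;
const std::uint32_t kIndexMask = (std::uint32_t(1) << kIndexBits) - 1;

inline std::uint32_t makeHandle(std::uint32_t index, std::uint32_t magic) {
    return (magic << kIndexBits) | (index & kIndexMask);
}
inline std::uint32_t handleIndex(std::uint32_t h) { return h & kIndexMask; }
inline std::uint32_t handleMagic(std::uint32_t h) { return h >> kIndexBits; }

// Write the least significant byte first, whatever the host byte order is.
inline void storeLE(std::uint32_t h, unsigned char out[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<unsigned char>((h >> (8 * i)) & 0xFFu);
}
// Rebuild the value from bytes written by storeLE.
inline std::uint32_t loadLE(const unsigned char in[4]) {
    std::uint32_t h = 0;
    for (int i = 0; i < 4; ++i)
        h |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    return h;
}
```

Because the shifts operate on values rather than on memory, the same four bytes come out on little-endian and big-endian hosts, and index and magic are recovered unchanged after loadLE.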
Suppose, for example, there is a class that requires a pointer and a bool. For simplicity an int pointer will be used in the examples, but the pointer type is irrelevant as long as it points to something whose sizeof is more than 1.
Defining the class with { bool, int * } data members will result in the class having a size that is double the size of the pointer, with a lot of wasted space.
If the pointer does not point to a char (or other data of size 1), then presumably the low bit will always be zero. The class could be defined with { int * } or, for convenience, union { int *, uintptr_t }.
The bool is implemented by setting/clearing the low bit of the pointer according to the logical bool value, and clearing the bit whenever you need to use the pointer.
The defined way:
struct myData
{
int * ptr;
bool flag;
};
myData x;
// initialize
x.ptr = new int;
x.flag = false;
// set flag true
x.flag = true;
// set flag false
x.flag = false;
// use ptr
*(x.ptr)=7;
// change ptr
x.ptr = y; // y is another int *
And the proposed way:
union tiny
{
int * ptr;
uintptr_t flag;
};
tiny x;
// initialize
x.ptr = new int;
// set flag true
x.flag |= 1;
// set flag false
x.flag &= ~1;
// use ptr
tiny clean=x; // note that clean will likely be optimized out
clean.flag &= ~1; // back to original value as assigned to ptr
*(clean.ptr)=7;
// change ptr
bool flag = x.flag & 1; // keep only the low bit; the whole value is nonzero for any non-null pointer
x.ptr = y; // y is another int *
x.flag |= flag;
This seems to be undefined behavior, but how portable is this?
As long as you restore the pointer's low-order bit before trying to use it as a pointer, it's likely to be "reasonably" portable, as long as your system, your C++ implementation, and your code meet certain assumptions.
I can't necessarily give you a complete list of assumptions, but off the top of my head:
It assumes you're not pointing to anything whose size is 1 byte. This excludes char, unsigned char, signed char, int8_t, and uint8_t. (And that assumes CHAR_BIT == 8; on exotic systems with, say, 16-bit or 32-bit bytes, other types might be excluded.)
It assumes objects whose size is at least 2 bytes are always aligned at an even address. Note that x86 doesn't require this; you can access a 4-byte int at an odd address, but it will be slightly slower. But compilers typically arrange for objects to be stored at even addresses. Other architectures may have different requirements.
It assumes a pointer to an even address has its low-order bit set to 0.
For that last assumption, I actually have a concrete counterexample. On Cray vector systems (J90, T90, and SV1 are the ones I've used myself) a machine address points to a 64-bit word, but the C compiler under Unicos sets CHAR_BIT == 8. Byte pointers are implemented in software, with the 3-bit byte offset within a word stored in the otherwise unused high-order 3 bits of the 64-bit pointer. So a pointer to an 8-byte aligned object could easily have its low-order bit set to 1.
There have been Lisp implementations that use the low-order 2 bits of pointers to store a type tag. I vaguely recall this causing serious problems during porting.
Bottom line: You can probably get away with it for most systems. Future architectures are largely unpredictable, and I can easily imagine your scheme breaking on the next Big New Thing.
Some things to consider:
Can you store the boolean values in a bit vector outside your class? (Maintaining the association between your pointer and the corresponding bit in the bit vector is left as an exercise).
Consider adding code to all pointer operations that fails with an error message if it ever sees a pointer with its low-order bit set to 1. Use #ifdef to remove the checking code in your production version. If you start running into problems on some platform, build a version of your code with the checks enabled and see what happens.
I suspect that, as your application grows (they seldom shrink), you'll want to store more than just a bool along with your pointer. If that happens, the space issue goes away, because you're already using that extra space anyway.
In "theory": it's undefined behavior as far as I know.
In "reality": it'll work on everyday x86/x64 machines, and probably ARM too?
I can't really make a statement beyond that.
It's very portable, and furthermore, you can assert when you accept the raw pointer to make sure it meets the alignment requirement. This will insure against the unfathomable future compiler that somehow messes you up.
Only reasons not to do it are the readability cost and general maintenance associated with "hacky" stuff like that. I'd shy away from it unless there's a clear gain to be made. But it is sometimes totally worth it.
Conform to those rules and it should be very portable.
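A sketch of how the checked version might look. TaggedIntPtr is a hypothetical class; it uses reinterpret_cast to uintptr_t rather than the union from the question, so no inactive union member is ever read. The mapping between pointers and integers is still implementation-defined, so the assumptions listed in the answers above continue to apply.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of a pointer plus a bool packed into one word.
class TaggedIntPtr {
public:
    explicit TaggedIntPtr(int* p = nullptr, bool flag = false) { set(p, flag); }

    void set(int* p, bool flag) {
        std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(p);
        // Fail loudly if the platform or the pointer breaks the alignment assumption.
        assert((raw & 1u) == 0 && "pointer must have a clear low-order bit");
        bits_ = raw | (flag ? 1u : 0u);
    }
    void setPtr(int* p)     { set(p, flag()); }   // keeps the flag when the pointer changes
    void setFlag(bool flag) { bits_ = (bits_ & ~std::uintptr_t(1)) | (flag ? 1u : 0u); }

    bool flag() const { return (bits_ & 1u) != 0; }
    // Clears the tag bit before the value is used as a pointer again.
    int* ptr() const  { return reinterpret_cast<int*>(bits_ & ~std::uintptr_t(1)); }

private:
    std::uintptr_t bits_;
};
```

Keeping the masking inside the accessors means no caller can dereference the tagged value by accident, which addresses part of the maintenance concern raised above.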
I have a series of classes that are going to require many boolean fields, somewhere between 4-10. I'd like to not have to use a byte for each boolean. I've been looking into bit field structs, something like:
struct BooleanBitFields
{
bool b1:1;
bool b2:1;
bool b3:1;
bool b4:1;
bool b5:1;
bool b6:1;
};
But after doing some research I see a lot of people saying that this can cause inefficient memory access and may not be worth the memory savings. I'm wondering what the best method for this situation is. Should I use bit fields, or use a char with bit masking (ANDs and ORs) to store 8 bits? If the second solution is better, is it preferable to bit shift or use logic?
If anyone could comment as to what method they would use and why it would really help me decide which route I should go down.
Thanks in advance!
With the large address spaces on desktop boxes, an array of 32/64-bit booleans may seem wasteful, and indeed it is, but most developers don't care, (me included). On RAM-restricted embedded controllers, or when accessing hardware in drivers, then sure, use bitfields, otherwise..
One other issue, apart from R/W ease/speed, is that a 32- or 64-bit boolean is thread-safer than one bit in the middle that has to be manipulated by multiple logical operations.
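If packed flags do have to be shared between threads, one possible approach (a sketch, not something the answer above proposes) is to keep them in a std::atomic byte and update them with atomic read-modify-write operations. SharedFlags, kRunning and kPaused are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical flag masks packed into one atomic byte.
const std::uint8_t kRunning = 1u << 0;
const std::uint8_t kPaused  = 1u << 1;

struct SharedFlags {
    // Each call is a single atomic read-modify-write, so concurrent updates
    // to different bits of the same byte cannot overwrite one another.
    void set(std::uint8_t mask)   { bits.fetch_or(mask); }
    void clear(std::uint8_t mask) { bits.fetch_and(static_cast<std::uint8_t>(~mask)); }
    bool test(std::uint8_t mask) const { return (bits.load() & mask) != 0; }
private:
    std::atomic<std::uint8_t> bits{0};
};
```

Plain bit-fields offer no equivalent, since adjacent bit-fields share a memory location and a write to one is a non-atomic read-modify-write of the whole unit.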
Bit fields are only a recommendation to the compiler. The compiler is free to implement them as it likes. On embedded systems there are compilers that guarantee a one-to-one mapping of bit fields to bits. Other compilers don't.
I would go with a regular struct, like yours but no bit fields. Make them unsigned chars - the shortest data type. The struct will make it easier to access them while editing, if your IDE supports auto completion.
Use an int bit array (leaves you lots of space to expand, and there is no advantage to a single char) and test with mask constants:
#define BOOL_A (1 << 0)
#define BOOL_B (1 << 1)
#define BOOL_C (1 << 2)
#define BOOL_D (1 << 3)
/* Alternately: use const ints for encapsulation */
// declare and set
int bitray = 0 | BOOL_B | BOOL_D;
// test
if (bitray & BOOL_B) std::cout << "Set!\n";
I want to write an answer to make sure once again and formalize the thought: "What does the transition from working with bytes to working with bits entail?" And also because the answer "I don't care" seems to me to be unreasonable.
Exploring char vs bitfield
Agreed, it's very tempting. Especially when it's supposed to be used like this:
#define FLAG_1 1
#define FLAG_2 (1 << 1)
#define FLAG_3 (1 << 2)
#define FLAG_4 (1 << 3)
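struct S1 {
unsigned char flag_1: 1;
unsigned char flag_2: 1;
unsigned char flag_3: 1;
unsigned char flag_4: 1;
}; //sizeof == 1
void MyFunction(struct S1 *obj, char flags) {
// normalize each masked value to 0 or 1; storing e.g. 2 in a 1-bit field would truncate to 0
obj->flag_1 = (flags & FLAG_1) != 0;
obj->flag_2 = (flags & FLAG_2) != 0;
obj->flag_3 = (flags & FLAG_3) != 0;
obj->flag_4 = (flags & FLAG_4) != 0;
// we desire it to be as *obj = flags;
}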
int main(int argc, char **argv)
{
struct S1 obj;
MyFunction(&obj, FLAG_1 | FLAG_2 | FLAG_3 | FLAG_4);
return 0;
}
But let's cover all aspects of such optimization. Let's decompose the operation into simpler C-commands, roughly corresponding to the assembler commands:
Initialization of all flags.
char flags = FLAG_1 | FLAG_3;
//obj->flag_1 = flags & FLAG_1;
//obj->flag_2 = flags & FLAG_2;
//obj->flag_3 = flags & FLAG_3;
//obj->flag_4 = flags & FLAG_4;
*obj = flags;
Writing one flag as a constant
//obj.flag_3 = 1;
char a = *obj;
a &= ~FLAG_3;
a |= FLAG_3;
*obj = a;
Write a single flag using a variable
char b = 3;
//obj.flag_3 = b;
char a = *obj;
a &= ~FLAG_3;
char c = b;
c <<= 2; //Shift to FLAG_3 position
c &= FLAG_3; //Fixing b > 1
a |= c;
*obj = a;
Reading one flag into variable
//char f = obj.flag_3;
char f = *obj;
f >>= 2; //FLAG_3 is bit 2
f &= 0x01;
Write one flag to another
//obj.flag_2 = obj.flag_4;
char a = *obj;
char b = a;
a &= FLAG_4;
a >>= 2; //Shift to FLAG_2 position
b |= a; //assumes flag_2 was clear; otherwise clear it first with b &= ~FLAG_2
*obj = b;
Summary
Command                        Cost, bitfield   Cost, variable
1. Init                        1                4 or less
2. obj.flag_3 = 1;             3                1
3. obj.flag_3 = b;             7                1 or 3 *
4. char f = obj.flag_3;        2                1
5. obj.flag_2 = obj.flag_4;    6                1
* if we guarantee the flag is no more than 1
All operations except initialization take several lines of code. It looks like it would be better to leave bit fields alone after initialization. However, flags are rarely left alone: they change their state all the time, without warning and at random.
We are essentially trying to make the rare value initialization operation cheaper by sacrificing frequent value change operations.
There are systems in which bitwise comparison, bit set and reset, bit copying, and even bit swapping and bit branching take one cycle. There are even systems in which mutex locking operations are implemented by a single assembler instruction. In such systems (PIC microcontrollers, for example), bit fields may be restricted to a dedicated part of memory; in any case, it is not the general-purpose memory area.
Perhaps in such systems, the bool type could point to a component of the bitfield.
If your desire to save on insignificant bits of a byte has not yet disappeared, try to think about implementing addressability, atomicity of operations, arithmetic with bytes, and the resulting overhead for calls, data memory, code memory, stack if algorithms are placed in functions.
Reflections on the choice of bool or char
If your target platform represents the bool type as 2, 4, or more bytes, then most likely bit operations will not be optimized on it. Rather, it is a platform for high-volume computing. This means that bit operations are not much in demand on it, and neither are operations on single bytes and words.
In the same way that operations on bits hurt performance, operations on a single byte can also greatly increase the number of cycles to access a variable.
No system can be equally optimal for everything at once. Instead of obsessing over memory savings in systems that are clearly built with a lot of memory surplus, pay attention to the strengths of those systems.
Conclusion
Use char or bool if:
You need to store the mutable state or behavior of the algorithm (and change and return flags individually).
Your flag does not accurately describe the system and could evolve into a number.
You need to be able to access the flag by address.
Your code claims to be platform independent and there is no guarantee that bit operations will be optimized on the target platform.
Use bitfields if:
You need to store a huge number of flags without having to constantly read and rewrite them.
You have unusually tight memory requirements, or memory is low.
In other deeply justified cases, with calculations and confirming experiments.
Perhaps a short rule might be:
Independent flags are stored in a bool.
P.S.: If you've read this far and still want to save 7 bits out of 8, then consider why there is no desire to use 7 bit bit fields for variables that take a value up to 100 maximum.
References
Raymond Chen: The cost-benefit analysis of bitfields for a collection of booleans