Are boolean variables typically implemented as single bits? [duplicate] - c++

Closed 10 years ago.
Possible Duplicate:
One-byte bool. Why?
I want to add a boolean variable to a class. However, this class is pretty size-sensitive, and as a result I'm loath to add another field. However, it is composed of a pile of members that are at least a char wide, and a single other bool.
If I were hand-writing this code, I would implement those boolean fields as bits in the last byte or so of the object. Since accesses have to be byte-aligned, this would cause no spatial overhead.
Now, do compilers typically do this trick? The only reason I can think of for them not to is that it would involve an additional mask to get that bit out of there.

No, compilers can't do this trick because the address of each member has to be distinct. If you want to pack a fixed number of bits, use std::bitset. If you need a variable number of bits use boost::dynamic_bitset.

No, I don't know of any compilers which optimize a bool down to a bit.
You can force this behavior via:
unsigned int m_firstBit : 1;
unsigned int m_secondBit : 1;
unsigned int m_thirdBit : 1;
As for reasons why not, it would likely violate some language guarantees. For instance, you couldn't pass &myBool to a function which takes a bool* if it doesn't have its own reserved byte.
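To make the address restriction concrete, here is a minimal sketch. The struct name PackedFlags and the helper set_true are made up for illustration; the member names only loosely follow the snippet above. It shows that a bit-field can be read and written like any member, but has to be copied into a real bool before it can be passed through a bool*.

```cpp
#include <cassert>

// Hypothetical class for illustration: three flags stored as 1-bit bit-fields.
struct PackedFlags {
    unsigned int first  : 1;
    unsigned int second : 1;
    unsigned int third  : 1;
};

// Hypothetical function that needs a real, addressable bool.
inline void set_true(bool* flag) { *flag = true; }

inline void bitfield_demo() {
    PackedFlags f{};          // all three bits start at 0
    f.second = 1;

    // set_true(&f.third);    // would not compile: a bit-field has no address
    bool tmp = f.third;       // copy out to an addressable bool instead
    set_true(&tmp);
    f.third = tmp;            // and copy the result back

    assert(f.first == 0 && f.second == 1 && f.third == 1);
}
```

How the bits are laid out inside the struct, and how large the struct ends up, is left to the implementation, so this is only useful for saving space inside the program, not for matching an external format.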

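Compilers typically do not do that. You could use std::bitset<2> to keep two bools in one object, but note that on common implementations sizeof(std::bitset<2>) is a whole machine word (4 or 8 bytes), so it does not actually get you down to one byte. Bit-fields or manual masking on an unsigned char do.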

Related

C++: Is bool a 1-bit variable?

I was wondering if bools in C++ are actually 1-bit variables.
I am working on a PMM for my kernel, and using (maybe multidimensional) bool arrays would be quite nice. But I don't want to waste space if a bool in C++ is 8 bits long...
EDIT: Is a bool[8] then 1 byte long? Or 8 bytes? Could I maybe declare something like bool bByte[8] __attribute__((packed)); when using gcc?
And as I said: I am coding a kernel, so I can't include the standard libraries.
No, there's no such thing as a 1-bit variable.
The smallest unit that can be addressed in C++ is an unsigned char.
Is a bool[8] then 1 Byte long?
No.
Or 8 Bytes?
Not necessarily. It depends on sizeof(bool), which is implementation-defined, and on the number of bits the target machine uses for an unsigned char.
But I don't want to waste space if a bool in C++ is 8 bits long...
You can avoid wasting space when dealing with bits using std::bitset, or boost::dynamic_bitset if you need a dynamic sizing.
As pointed out by @zett42 in their comment, you can also address single bits with a bit-field struct (but because the struct takes the size and alignment of its underlying type, this will probably use even more space when you only have a few flags):
struct S {
// will usually occupy 4 bytes:
unsigned b1 : 1,
b2 : 1,
b3 : 1;
};
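Since the question rules out the standard library, a hand-rolled bitmap is the usual alternative to std::bitset. The following is only a sketch under stated assumptions: the names FrameBitmap, kFrameCount and kBitsPerWord are invented here, the frame count is arbitrary, and it relies on <stdint.h> and <stddef.h>, which freestanding toolchains normally provide.

```cpp
#include <stddef.h>  // size_t (normally available in freestanding builds)
#include <stdint.h>  // uint32_t (normally available in freestanding builds)
#include <cassert>   // only for the hosted self-check below; drop it in a kernel build

// Hypothetical frame count, chosen only for illustration.
constexpr size_t kFrameCount  = 1024;
constexpr size_t kBitsPerWord = 32;

// One bit per frame: 1024 frames fit in 32 words of 32 bits each.
struct FrameBitmap {
    uint32_t words[(kFrameCount + kBitsPerWord - 1) / kBitsPerWord] = {};

    // Mark frame i as used.
    void set(size_t i)   { words[i / kBitsPerWord] |=  (uint32_t{1} << (i % kBitsPerWord)); }
    // Mark frame i as free.
    void clear(size_t i) { words[i / kBitsPerWord] &= ~(uint32_t{1} << (i % kBitsPerWord)); }
    // Query frame i.
    bool test(size_t i) const { return ((words[i / kBitsPerWord] >> (i % kBitsPerWord)) & 1u) != 0; }
};

inline void frame_bitmap_check() {
    FrameBitmap bm;
    bm.set(37);
    assert(bm.test(37) && !bm.test(36));
    bm.clear(37);
    assert(!bm.test(37));
}
```

There is no bounds checking here; a real allocator would validate the index and probably scan whole words at a time when looking for a free frame.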
A bool uses at least one (and maybe more) byte of storage, so yes, at least 8 bits.
A vector<bool>, however, normally stores a bool in only one bit, with some cleverness in the form of proxy iterators and such to (mostly) imitate access to actual bool objects, even though that's not what they store. The original C++ standard required this. More recent ones have relaxed the requirements to allow a vector<bool> to actually be what you'd normally expect (i.e., just a bunch of bool objects). Despite the relaxed requirements, however, a fair number of implementations continue to store them in packed form in a vector<bool>.
Note, however, that the same is not true of other container types--for example, a list<bool> or deque<bool> cannot use a bit-packed representation.
Also note that due to the requirement for a proxy iterator (and such) a vector<bool> that uses a bit-packed representation for storage can't meet the requirements imposed on normal containers, so you need to be careful in what you expect from them.
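A short sketch of what the proxy means in practice. The function name proxy_demo is made up; everything else is standard library. It shows that vector<bool>::reference is not bool&, that auto picks up the proxy, and that naming bool explicitly gives an independent copy.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// vector<int>::reference is a real int&, but vector<bool>::reference is a proxy class.
static_assert(std::is_same<std::vector<int>::reference, int&>::value,
              "ordinary element reference");
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "proxy object, not bool&");

inline bool proxy_demo() {
    std::vector<bool> flags(8, false);

    auto ref = flags[3];    // 'auto' deduces the proxy type, not bool
    ref = true;             // so this writes through to the vector

    bool copy = flags[5];   // naming bool explicitly yields an independent copy
    copy = true;            // the vector is unchanged
    (void)copy;

    // bool* p = &flags[0]; // would not compile: there is no bool object to point at

    return flags[3] && !flags[5];
}

inline void proxy_check() { assert(proxy_demo()); }
```

This is the main reason generic code written against vector<T> can misbehave or fail to compile when T happens to be bool.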
The smallest unit of addressable memory is a char. A bool[N] or std::array<bool, N> will use as much space as a char[N] or std::array<char, N>.
It is permitted by the standard (although not required) that implementations of std::vector<bool> may be specialized to pack bits together.
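The size claims above can be checked at compile time. This is a small sketch, and the helper name bits_used_by_bool_array is invented for illustration; the only facts it relies on are that array elements are never bit-packed and that a byte has CHAR_BIT (at least 8) bits.

```cpp
#include <array>
#include <cassert>
#include <climits>   // CHAR_BIT
#include <cstddef>

// A byte has at least 8 bits; the exact count is CHAR_BIT.
static_assert(CHAR_BIT >= 8, "a byte is at least 8 bits wide");

// An array of 8 bools is 8 separate bool objects, never 8 bits.
static_assert(sizeof(bool[8]) == 8 * sizeof(bool), "no bit packing in arrays");

// Total number of bits occupied by a bool[n] on this implementation.
constexpr std::size_t bits_used_by_bool_array(std::size_t n) {
    return n * sizeof(bool) * CHAR_BIT;
}

inline void array_size_check() {
    assert(bits_used_by_bool_array(8) >= 64);
    assert(sizeof(std::array<bool, 8>) >= sizeof(bool[8]));
}
```

On the usual desktop platforms sizeof(bool) is 1 and CHAR_BIT is 8, so a bool[8] takes 64 bits to store 8 bits of information.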

Creating a simple portable bitmask and using it

This is my first time trying to create a bitmask, and although it is seemingly simple, I am having trouble visualizing everything.
Keep in mind I cannot use std::bitset.
First, I have read that accessing raw bits is undefined behavior (so using a union of a char would be bad because the bits might be reversed for a different compiler).
Most code I've looked at uses a struct to define each bit, and this way of structuring data should be compiler independent because the first bit will always be the LSB. (I assume) Here is an example:
struct foo
{
unsigned char a : 1;
unsigned char b : 1;
unsigned char unused : 6;
};
Now the question is... could you use more than one bit for a variable in the struct AND have it still be compiler independent? It seems like the answer is yes, but I have had some weird answers and want to be sure. Something like:
struct foo
{
unsigned char ab : 2;
unsigned char unused : 6;
};
It seems like regardless if the raw structure is reversed, the first bit accessed from the struct is always the LSB, so how many bits you use should not matter.
The C standard does not specify the ordering of fields within a unit -- there's no guarantee that a, in your example, is in the LSB. If you want fully portable behavior, you need to do the bit manipulation yourself, using unsigned integral types, and (if using unsigned integral types bigger than a byte) you need to worry about the endianness when reading/writing them from external sources.
The behaviour does not depend on the bit order. What you have written corresponds to the language standard and therefore behaves the same on all platforms.
Bitfields cannot be portably used to access specific bits in an external block of data (like a hardware register or data serialized in a stream of bytes). So bitfields aren't useful in this context - at least for portable code.
But if you're talking about using the bitfield within the program and not trying to have it model some external bit representation, then it's 100% portable. Not super useful, but portable.
I've spent a career twiddling bits in C/C++, and maybe because of this issue, I never see it done this way. We always use unsigned variables and apply bit masks to them:
#define BITMASK_A 0x01
#define BITMASK_B 0x02
unsigned char bitfield;
Then when you want to access a, you use (bitfield & BITMASK_A)
But to answer your question, there should be no logical difference between your two examples: if the compiler places ab at the low end, then the first example should also place a at the LSb.
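The mask approach can be sketched a little further. The constant and function names below (kMaskA, kMaskB, set_bits, clear_bits, any_set) are invented for illustration; the mask values match BITMASK_A and BITMASK_B above. Because the bit positions are spelled out in the constants, the result does not depend on how any compiler lays out bit-fields.

```cpp
#include <cassert>

// Same mask values as the macros above, written as typed constants.
constexpr unsigned char kMaskA = 0x01;  // bit 0 (least significant)
constexpr unsigned char kMaskB = 0x02;  // bit 1

// Return v with every bit in m switched on.
inline unsigned char set_bits(unsigned char v, unsigned char m) {
    return static_cast<unsigned char>(v | m);
}

// Return v with every bit in m switched off.
inline unsigned char clear_bits(unsigned char v, unsigned char m) {
    return static_cast<unsigned char>(v & ~m);
}

// True if at least one bit in m is set in v.
inline bool any_set(unsigned char v, unsigned char m) {
    return (v & m) != 0;
}

inline void mask_check() {
    unsigned char flags = 0;
    flags = set_bits(flags, kMaskA);
    assert(any_set(flags, kMaskA) && !any_set(flags, kMaskB));
}
```

A two-bit value such as ab from the question works the same way with a mask of 0x03, plus a shift if it does not start at bit 0.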

How many options can I include in a bit mask?

In this question it was pointed out that:
Using int [for bit mask] is asking for trouble
I have been using an unsigned char to store bitmask flags, but it occurs to me that I will hit a low limit, since a char is only a byte, thus 8 bits, thus only 8 options in my mask?
enum options {
k1 = 1 << 0,
k2 = 1 << 1,
// ... through to k8
};
unsigned char myOption = k2;
Do I simply need to make myOption an int or some other type for example if I wish it to store more than 8 possible options (and combinations of options, of course, hence why I am using the bit mask in the first place)? What's the best type?
If you need an unknown number of 'bits' you could use something like the std::vector<bool> class, see here:
http://www.cplusplus.com/reference/vector/vector-bool/
This is a specialization of the vector class which can pack the bool values using bits, so it is more space efficient than an array of bools (whether you need that extra efficiency is up to you).
Of course I don't know what your application is, there are many valid reasons for using bitfields. If you are simply storing a bunch of true and false values though, something like an array of bools or this vector of bools might be more easily maintained (it has downsides though of course, you can't test to see if say 3 bits are all set in one operation as you can with masking and bitfields, so it is application specific).
vector<bool> is somewhat controversial though, I think. See: http://howardhinnant.github.io/onvectorbool.html
#include <stdint.h>
This defines types with fixed sizes that are not compiler specific.
int16_t = 16 bits signed
uint16_t = 16 bits unsigned
int32_t = 32 bits signed
If you need more than 64 flags you should consider the ::std::vector<> as Wayne Uroda suggested.
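As a sketch of the fixed-width approach for more than 8 options: the enumerator names (kOpt1, kOpt9, ...) and the helper has_all are hypothetical, and the code assumes C++11 (for the fixed underlying type) and a platform that provides std::uint32_t.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical option names; each one occupies a single bit of a 32-bit mask.
enum Option : std::uint32_t {
    kOptNone = 0,
    kOpt1    = std::uint32_t{1} << 0,
    kOpt2    = std::uint32_t{1} << 1,
    kOpt9    = std::uint32_t{1} << 8,   // would not fit in an 8-bit unsigned char
    kOpt32   = std::uint32_t{1} << 31   // highest bit available in a 32-bit mask
};

using OptionMask = std::uint32_t;

// True only if every bit in 'wanted' is set in 'mask'.
inline bool has_all(OptionMask mask, OptionMask wanted) {
    return (mask & wanted) == wanted;
}

inline void option_check() {
    OptionMask m = kOpt2 | kOpt9;
    assert(has_all(m, kOpt2 | kOpt9));
    assert(!has_all(m, kOpt1));
}
```

Switching OptionMask to std::uint64_t (and the shifts to std::uint64_t{1}) raises the limit to 64 options; beyond that a container of bits is the simpler choice.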

C++ BOOL (typedef int) vs bool for performance

I read somewhere that using BOOL (typedef int) is better than using the standard c++ type bool because the size of BOOL is 4 bytes (i.e. a multiple of 4) and it saves alignment operations of variables into registers or something along those lines...
Is there any truth to this? I imagine that the compiler would pad the stack frames in order to keep alignments of multiple of 4s even if you use bool (1 byte)?
I'm by no means an expert on the underlying workings of alignments, registers, etc so I apologize in advance if I've got this completely wrong. I hope to be corrected. :)
Cheers!
First of all, sizeof(bool) is not necessarily 1. It is implementation-defined, giving the compiler writer freedom to choose a size that's suitable for the target platform.
Also, sizeof(int) is not necessarily 4.
There are multiple issues that could affect performance:
alignment;
memory bandwidth;
CPU's ability to efficiently load values that are narrower than the machine word.
What -- if any -- difference that makes to a particular piece of code can only be established by profiling that piece of code.
The only guaranteed size you can get in C++ is with char, unsigned char, and signed char (see note 2), which are always exactly one byte and defined for every platform (see notes 0 and 1).
0) Though a byte does not have a defined size. sizeof(char) is always 1 byte, but might be 40 binary bits in fact
1) Yes, there is uint32_t and friends, but no, their definition is optional for actual C++ implementations. Use them, but you may get compile time errors if they are not available (compile time errors are always good)
2) char, unsigned char, and signed char are distinct types and it is not defined whether char is signed or not. Keep this in mind when overloading functions and writing templates.
There are three commonly accepted performance-driven practices in regards to booleans:
In if-statements, the order in which the expressions are checked matters, so they should be ordered with care.
If a check of a boolean expression causes a lot of branch mispredictions, then it should (if possible) be replaced with a bit-twiddling hack.
Since a boolean is one of the smallest data types, boolean variables should be declared last in structures and classes, so that padding does not add noticeable holes in the structure's memory layout.
I've never heard about any performance gain from substituting a boolean with (unsigned?) integer however.
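The member-ordering advice can be illustrated with two hypothetical structs that hold the same members in a different order. The exact sizes are implementation-defined; on common x86-64 ABIs the first is 24 bytes and the second 16, but the sketch only checks that putting the bools last is never worse.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical structs, used only to compare layouts.
struct Interleaved {   // bools placed between wider members
    bool   a;
    double x;
    bool   b;
    int    n;
};

struct BoolsLast {     // same members, widest first, bools at the end
    double x;
    int    n;
    bool   a;
    bool   b;
};

// Sizes as seen by this compiler; the difference is padding.
constexpr std::size_t kInterleavedSize = sizeof(Interleaved);
constexpr std::size_t kBoolsLastSize   = sizeof(BoolsLast);

inline void layout_check() {
    assert(kBoolsLastSize <= kInterleavedSize);
}
```

Printing the two constants on a given target shows how much padding the reordering removes there.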

In C++ what is the proper term for splitting an int into bits

I see in some C++ code things like:
// Header
struct SomeStruct {
uint32_t nibble1:4, bitField1:1, bitField2:1, bitField3:1, bitField4:1,
padding:11, field5Bits:5, byteField:8;
};
What is this called? I typically like to google before asking here, but I have no idea what to even type in. I'm hoping to understand this when it comes to endianness - is bit order something to consider or just byte order? Also, what is the type of each field - bitFieldX should be a bool, while field5Bits should be a uint8_t. At least that's what I would think.
Thanks.
They are called bitfields; both the MSVC and GCC documentation describe them.
Endianness usually refers to the order of bytes. However, bit order can be important; see the compiler documentation mentioned above.
They behave as an unsigned int (uint32_t) in your case.
In general, the term for selecting several bits out of a larger binary integer representation is masking.
What you posted is a packed structure. The elements within the structure are known as bitfields, as others have posted. These are often used to represent communication protocol structures, where the protocol specifies fields that are less than one byte wide, or that do not follow the byte, half-word or word alignment that would normally take place.
Since there is only one type listed, each member of the structure is the same type, uint32_t.
Endianness does matter for anything that is part of a data type that is larger than 1 byte.
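When the layout has to match a protocol exactly, masking and shifting a plain uint32_t avoids depending on how a compiler orders bit-fields. The sketch below assumes, purely for illustration, that the fields of SomeStruct are allocated starting from the least significant bit (nibble1 in bits 0 to 3, the four single-bit flags in bits 4 to 7, field5Bits in bits 19 to 23, byteField in bits 24 to 31) and that the word arrives in little-endian byte order. The function names extract and load_le32 are invented here.

```cpp
#include <cassert>
#include <cstdint>

// Extract 'width' bits starting at bit 'shift' (bit 0 = least significant).
// Requires width < 32.
constexpr std::uint32_t extract(std::uint32_t word, unsigned shift, unsigned width) {
    return (word >> shift) & ((std::uint32_t{1} << width) - 1u);
}

// Assemble a 32-bit word from 4 bytes in little-endian order,
// independent of the byte order of the machine running the code.
inline std::uint32_t load_le32(const unsigned char* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

inline void extract_check() {
    const unsigned char bytes[4] = {0xD3, 0x00, 0xA8, 0xAB};
    const std::uint32_t word = load_le32(bytes);
    assert(extract(word, 0, 4) == 0x3u);    // nibble1 under the assumed layout
    assert(extract(word, 24, 8) == 0xABu);  // byteField under the assumed layout
}
```

Byte order is handled once in load_le32, and bit positions are explicit in each extract call, so both concerns from the question are visible in the code rather than left to the compiler.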