Memory Conservation with Manual Bit Fields vs. std::bitset - c++

I'm learning about bit flags and creating bit fields manually using bit-wise operators. I then came across bitsets, seemingly an easier and cleaner way of storing a field of bits. I understand the value of using bit fields as far as minimizing memory usage. After testing the sizeof(bitset), though, I'm having a hard time understanding how this is a better approach.
Consider:
#include <bitset>
#include <iostream>

int main()
{
    // Using std::bitset: sizeof reports 8 bytes
    const unsigned int i1 = 0;
    const unsigned int i2 = 1;
    std::bitset<8> mySet(0);
    mySet.set(i1);
    mySet.set(i2);
    std::cout << sizeof(mySet) << std::endl;

    // Manually managing bit flags: sizeof reports 1 byte
    const unsigned char t1 = 1 << 0;
    const unsigned char t2 = 1 << 1;
    unsigned char bitField = 0;
    bitField |= t1 | t2;
    std::cout << sizeof(bitField) << std::endl;

    return 0;
}
The output is:
8 for mySet and 1 for bitField.
Should I not use std::bitset if minimal memory usage is desired?

For the lowest possible memory footprint you shouldn't use std::bitset. It will likely require more memory than a plain built-in type like char or int of equivalent effective size, so it carries some memory overhead; how much depends on the implementation.
One major advantage of std::bitset is that it frees you from hardware-dependent implementations of various types. In theory, the hardware can use any representation for any type, as long as it fulfills some requirements in the C++ standard. Thus when you rely on unsigned char t1 = 1 to be 00000001 in memory, this is not actually guaranteed. But if you create a bitset and initialize it properly, it won't give you any nasty surprises.
A side note on bit fiddling: considering the pitfalls of fiddling with bits this way, can you really justify this error-prone method instead of using std::bitset or even plain types like int and bool? Unless you're extremely resource-constrained (e.g. MCU / DSP programming), I don't think you can.
Those who play with bits will be bitten, and those who play with bytes will be bytten.
By the way, the unsigned char bitField you declare and manipulate using bitwise operators is a bit field of sorts, but it's not the C++ language notion of a bit field, which looks like this:
struct BitField {
    unsigned char flag1 : 1, flag2 : 1, flag3 : 1;
};
Loosely speaking, it is a data structure whose data member(s) are subdivided into separate variables. In this case the unsigned char of (presumably) 8 bits is used to create three 1-bit variables (flag1, flag2 and flag3). It is explicitly subdivided, but at the end of the day this is just compiler-/language-assisted bit fiddling similar to what you did above. You can read more about bit fields here.
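To make this concrete, here is a minimal sketch of using such a struct (the particular flags set are arbitrary; the sizeof result is implementation-defined but typically 1):

#include <iostream>

struct BitField {
    unsigned char flag1 : 1, flag2 : 1, flag3 : 1;
};

int main()
{
    BitField bf = {};  // all flags start zeroed
    bf.flag1 = 1;      // each flag is addressed by name, no masks needed
    bf.flag3 = 1;
    std::cout << sizeof(BitField) << std::endl;  // typically prints 1
    return 0;
}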

Related

How to iterate over every bit of a type in C++

I wanted to write a Digital Search Tree in C++ using templates. To do that, given a type T and data of type T, I have to iterate over the bits of this data. Doing this with integers is easy: one can just shift the number right by the appropriate number of positions and AND the number with 1, as described for example in How to get nth bit values. The problem starts when one tries to get the i'th bit from templated data. I wrote something like this:
#include <cstdint>
#include <iostream>

template<typename T>
bool getIthBit(T data, unsigned int bit) {
    // Select the byte (bit >> 3 == bit / 8), then the bit within it (bit & 7 == bit % 8).
    return ((*(((char*)&data) + (bit >> 3))) >> (bit & 7)) & 1;
}

int main() {
    uint32_t a = 16;
    for (int i = 0; i < 32; i++) {
        std::cout << getIthBit(a, i);
    }
    std::cout << std::endl;
}
This works, but I am not exactly sure whether it is undefined behavior. The problem is that to iterate over all the bits of the data, one has to know how many of them there are, which is hard for struct types because of padding. For example, here:
#include <cstdint>
#include <iostream>

struct s {
    uint32_t i;
    char c;
};

int main() {
    std::cout << sizeof(s) << std::endl;
}
The actual data occupies 5 bytes, but the output of the program says it takes 8. I don't know how to get the actual size of the data, or whether it is possible at all. A question about this was asked in How to check the size of struct w/o padding?, but the answers are just "don't".
It's easy to know how many bits there are in a type: there are exactly CHAR_BIT * sizeof(T), and sizeof(T) is the actual size of the type in bytes. But indeed, there isn't a general way within standard C++ to know which of those bits (that are part of the type) are padding.
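As a minimal sketch (the function name is mine), the OP's approach generalizes to a loop bounded by exactly CHAR_BIT * sizeof(T); like the original, it visits padding bits too:

#include <climits>
#include <cstddef>
#include <cstdint>
#include <iostream>

template <typename T>
void printAllBits(const T &data) {
    // Walk the object's bytes; within each byte, emit bits least significant first.
    const unsigned char *bytes = reinterpret_cast<const unsigned char *>(&data);
    for (std::size_t bit = 0; bit < sizeof(T) * CHAR_BIT; ++bit)
        std::cout << ((bytes[bit / CHAR_BIT] >> (bit % CHAR_BIT)) & 1);
    std::cout << '\n';
}

int main() {
    uint32_t a = 16;
    printAllBits(a);   // 32 bits; uint32_t has no padding
}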
I recommend not attempting to support types that have padding as keys of your DST.
The following trick might work for finding the padding bits of trivially copyable classes:
1. Use std::memset to set all bits of the object to 0.
2. For each sub-object with no sub-objects of its own, set all of its bits to 1 using std::memset.
3. For each sub-object that has its own sub-objects, perform the previous and this step recursively.
4. Check which bits stayed 0.
I'm not sure whether there are any technical guarantees that the padding actually stays 0, so whether this works may be unspecified. Furthermore, there can be non-class types that have padding, and the described trick won't detect those; long double is a typical example, and I don't know if there are others. The trick probably also won't detect unused bits of integers that underlie bit fields.
So, there are a lot of caveats, but it should work in your example case:
#include <bitset>
#include <climits>
#include <cstdint>
#include <cstring>
#include <iostream>
struct s { uint32_t i; char c; };
int main() {
    s sobj;
    std::memset(&sobj, 0, sizeof sobj);       // step 1: everything 0, padding included
    std::memset(&sobj.i, -1, sizeof sobj.i);  // step 2: all bits of each member to 1
    std::memset(&sobj.c, -1, sizeof sobj.c);
    std::cout << "non-padding bits:\n";
    unsigned long long ull = 0;
    std::memcpy(&ull, &sobj, sizeof sobj);
    std::cout << std::bitset<sizeof sobj * CHAR_BIT>(ull) << std::endl;
}
There's a standard way to know whether a type has a unique object representation: std::has_unique_object_representations<T>, available since C++17.
So if a type has unique object representations, it is safe to assume that every bit is significant.
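For example (a sketch; the struct names are mine, and the Padded assertion assumes a typical ABI where the trailing char is followed by padding):

#include <cstdint>
#include <type_traits>

struct Packed { uint32_t i; uint32_t j; };   // no padding expected
struct Padded { uint32_t i; char c; };       // trailing padding on typical ABIs

static_assert(std::has_unique_object_representations_v<Packed>,
              "every bit participates in the value");
static_assert(!std::has_unique_object_representations_v<Padded>,
              "padding bits make representations non-unique");

int main() {}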
There's no standard way to know whether a non-unique representation is caused by padding bytes/bits, as in struct { long long a; char b; }, or by equivalent representations¹. And there's no standard way to learn the offsets of the padding bits/bytes.
Note that the "actual size" concept may be misleading, as padding can be in the middle, as in struct { char a; long long b; }.
Internally, the compiler has to distinguish padding bits from value bits to implement C++20 atomic<T>::compare_exchange_*. MSVC does this by zeroing padding bits with __builtin_zero_non_value_bits. Other compilers may use another name or another approach, or may not expose atomic<T>::compare_exchange_* internals at this level.
¹ like multiple NaN floating point values

Serializing/deserializing a bitfield structure in c++

I have a
typedef struct {
    uint32_t Thread : HTHREAD_BUS_WIDTH;
    uint32_t Member : 3;
    uint32_t Proxy  : 3;
    // Other members, fill out 32 bits
} MyStruct;
that I must transfer from one system to another as an item of
a buffer comprising 32-bit words.
What is the best way to serialize the struct, and on the other side,
to deserialize it? "best" means here safe casting, and no unneeded copying.
For one direction of the cast, I have found (as a member function):
int &ToInt() {
    return *reinterpret_cast<int *>(this);
}
Is there a similar valid cast in the other direction, i.e. from integer to MyStruct, ideally also as a member function?
How can I define which bit means which field? (It may even be the case that the deserialization happens in another program, in another language, or on little/big-endian systems.)
How can I define which bit means which field?
You cannot. You have no control over the layout of bitfields.
"best" means here safe casting, and no unneeded copying.
There is no portable safe cast that could avoid copying.
A portable way to serialise bitfields is to manually shift into an integer, in the desired order. For example:
MyStruct value = something;
uint32_t out = 0;
out |= value.Thread;
out <<= 3;             // make room for Member (note <<=, not <<)
out |= value.Member;
out <<= 3;             // make room for Proxy
out |= value.Proxy;
In the shown example, the least significant bits contain the field Proxy while the other fields are adjacent in more significant bits.
Of course, in order to serialise this generated integer correctly, just like serialising any integer, you must take endianness into consideration. Serialisation of an integer can be portably implemented by repeatedly shifting the integer, and copying the bytes in order of significance into an array.
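Here is a sketch of that idea (the function names are mine): writing bytes from most to least significant yields big-endian output regardless of the host's byte order.

#include <array>
#include <cstdint>

std::array<unsigned char, 4> serialize_be(uint32_t v)
{
    // Most significant byte first: host endianness is irrelevant.
    return {{ static_cast<unsigned char>(v >> 24),
              static_cast<unsigned char>(v >> 16),
              static_cast<unsigned char>(v >> 8),
              static_cast<unsigned char>(v) }};
}

uint32_t deserialize_be(const std::array<unsigned char, 4> &b)
{
    return (uint32_t(b[0]) << 24) | (uint32_t(b[1]) << 16)
         | (uint32_t(b[2]) << 8)  |  uint32_t(b[3]);
}

The deserializer rebuilds the value the same way, so both ends agree on the wire format without knowing each other's endianness.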
If you need to read from another system which might have a different endianness, you cannot rely on a portable bitfield. A solution is to "expand" your structure so that each field is serialized as a 32-bit value in the "transport" buffer. A safe implementation could be something like:
typedef struct {
    uint32_t Thread : HTHREAD_BUS_WIDTH;
    uint32_t Member : 3;
    uint32_t Proxy  : 3;
    // Other members, fill out 32 bits
    std::vector<uint32_t> to_buffer() const;
} MyStruct;
Implementation of to_buffer():
std::vector<uint32_t> MyStruct::to_buffer() const
{
    std::vector<uint32_t> buffer;
    buffer.push_back(static_cast<uint32_t>(Thread));
    buffer.push_back(static_cast<uint32_t>(Member));
    buffer.push_back(static_cast<uint32_t>(Proxy));
    // push other members
    return buffer;
}
Then, on the receiving side, you can convert the buffer back into the struct.
If you do not want to expand the fields that do not use 32 bits, you can always implement your own packing function by shifting and masking bits, e.g.:
uint32_t member_and_proxy = (Member << 3) | Proxy;  // and so on for the other members
This is much more error-prone.
From my own experience, if communication bandwidth is not an issue, relying on "text-like" content is a better choice (no endianness issues, and it is very easy to debug).

Bit field vs Bitset

I want to store bits in an array-like structure. I can follow either of the following two approaches:
Approach number 1 (AN 1)
struct BIT
{
    int data : 1;
};

int main()
{
    BIT a[100];
    return 0;
}
Approach number 2 (AN 2)
#include <bitset>

int main()
{
    std::bitset<100> BITS;
    return 0;
}
Why would someone prefer AN 2 over AN 1?
Because approach nr. 2 actually uses 100 bits of storage, plus some very minor (constant) overhead, while nr. 1 typically uses four bytes of storage per Bit structure. In general, a struct is at least one byte large per the C++ standard.
#include <bitset>
#include <iostream>

struct Bit { int data : 1; };

int main()
{
    Bit a[100];
    std::bitset<100> b;
    std::cout << sizeof(a) << "\n";
    std::cout << sizeof(b) << "\n";
}
prints
400
16
Apart from this, bitset wraps your bit array in a nice object representation with many useful operations.
A good choice depends on how you're going to use the bits.
std::bitset<N> is of fixed size. Visual C++ 10.0 is non-conforming with respect to its constructors; in general you have to provide a workaround. This was, ironically, due to what Microsoft thought was a bug fix: they introduced a constructor taking an int argument, as I recall.
std::vector<bool> is optimized in much the same way as std::bitset. Cost: indexing doesn't directly provide a reference (there are no references to individual bits in C++), but instead returns a proxy object -- which isn't something you notice until you try to use it as a reference. Advantage: minimal storage, and the vector can be resized as required.
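A small illustration of that proxy behavior (a sketch; variable names are arbitrary):

#include <vector>

int main()
{
    std::vector<bool> v{true, false, true};
    // bool &r = v[0];       // ill-formed: operator[] returns a proxy, not bool&
    auto p = v[0];           // p is std::vector<bool>::reference, a proxy object
    p = false;               // writes through the proxy into the underlying bit
    bool copy = v[1];        // converting to bool takes a snapshot instead
    (void)copy;
    return 0;
}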
Simply using e.g. unsigned is also an option, if you're going to deal with a small number of bits (in practice, 32 or less, although the formal guarantee is just 16 bits).
Finally, ALL UPPERCASE identifiers are by convention (except Microsoft) reserved for macros, in order to reduce the probability of name collisions. It's therefore a good idea to not use ALL UPPERCASE identifiers for anything else than macros. And to always use ALL UPPERCASE identifiers for macros (this also makes it easier to recognize them).
Cheers & hth.,
bitset has more operations
Approach number 1 will most likely be compiled as an array of 4-byte integers, and one bit of each will be used to store your data. Theoretically a smart compiler could optimize this, but I wouldn't count on it.
Is there a reason you don't want to use std::bitset?
To quote cplusplus.com's page on bitset, "The class is very similar to a regular array, but optimizing for space allocation". If your ints are 4 bytes, a bitset uses 32 times less space.
Even doing bool bits[100], as sbi suggested, is still worse than bitset, because every implementation uses at least one byte per bool.
If, for reasons of intellectual curiosity only, you wanted to implement your own bitset, you could do so using bit masks:
typedef struct {
    unsigned char bytes[100];
} MyBitset;

bool getBit(MyBitset *bitset, int index)
{
    int whichByte = index / 8;
    return bitset->bytes[whichByte] & (1 << (index % 8));  // bitwise &, not logical &&
}

void setBit(MyBitset *bitset, int index, bool newVal)
{
    int whichByte = index / 8;
    if (newVal)
    {
        bitset->bytes[whichByte] |= (1 << (index % 8));
    }
    else
    {
        bitset->bytes[whichByte] &= ~(1 << (index % 8));
    }
}
(Sorry for using a struct instead of a class by the way. I'm thinking in straight C because I'm in the middle of a low-level assignment for school. Obviously two huge benefits of using a class are operator overloading and the ability to have a variable-sized array.)

Custom byte size?

So, you know how the primitive type char has a size of 1 byte? How would I make a primitive with a custom size? So, for instance, instead of an int with a size of 4 bytes, I'd make one with a size of, let's say, 16.
Is there a way to do this? Is there a way around it?
It depends on why you are doing this. Usually, you can't use types of less than 8 bits, because that is the addressable unit for the architecture. You can use structs, however, to define different lengths:
struct s {
    unsigned int a : 4;  // a is 4 bits
    unsigned int b : 4;  // b is 4 bits
    unsigned int c : 16; // c is 16 bits
};
However, there is no guarantee that the struct will be 24 bits long. Also, this can cause endian issues. Where you can, it's best to use system independent types, such as uint16_t, etc. You can also use bitwise operators and bit shifts to twiddle things very specifically.
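For example, a minimal sketch of that shifting-and-masking approach (the function names are mine; the widths mirror the struct above):

#include <cstdint>

// Pack two 4-bit fields and one 16-bit field into a single uint32_t.
uint32_t pack(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & 0xFu) | ((b & 0xFu) << 4) | ((c & 0xFFFFu) << 8);
}

// Recover the 16-bit field.
uint32_t get_c(uint32_t packed)
{
    return (packed >> 8) & 0xFFFFu;
}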
Normally you'd just make a struct that represents the data in which you're interested. If it's 16 bytes of data, either it's an aggregate of a number of smaller types or you're working on a processor that has a native 16-byte integral type.
If you're trying to represent extremely large numbers, you may need to find a special library that handles arbitrarily-sized numbers.
In C++11, there is an excellent solution for this: std::aligned_storage.
#include <iostream>
#include <type_traits>

int main()
{
    // aligned_storage lives in <type_traits>; the stray "typename" is not
    // needed (and is ill-formed outside a template before C++20).
    typedef std::aligned_storage<sizeof(int), alignof(int)>::type memory_type;
    memory_type i;
    reinterpret_cast<int&>(i) = 5;
    std::cout << reinterpret_cast<int&>(i) << std::endl;
    return 0;
}
It allows you to declare a block of uninitialized storage on the stack.
If you want to make a new type, typedef it. If you want it to be 16 bytes in size, typedef a struct that has 16 bytes of member data within it. Just beware that quite often compilers will pad things to match your system's alignment needs; a 1-byte struct rarely remains 1 byte without care.
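A minimal sketch of such a typedef (the type name is mine; the static_assert documents the expected size, which holds on common ABIs):

#include <cstdint>

// Two 8-byte members give 16 bytes of data, with no padding expected.
typedef struct {
    uint64_t lo;
    uint64_t hi;
} my16byte;

static_assert(sizeof(my16byte) == 16, "expected 16 bytes");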
You could just static cast to and from std::string. I don't know enough C++ to give an example, but I think this would be pretty intuitive.

How to use an int as an array of ints/bools?

I noticed while making a program that a lot of my int variables never went above ten. I figure that because an int is 2 bytes at the shortest (1 if you count char), I should be able to store 4 unsigned ints with a max value of 15 in a short int, and I know I can access each one individually using >> and <<:
short unsigned int SLWD = 11434;
S is (SLWD >> 12), L is ((SLWD << 4) >> 12),
W is ((SLWD << 8) >> 12), and D is ((SLWD << 12) >> 12)
However, I have no idea how to encompass this in a function or class, since any kind of GetVal() function would have to be of type int, which defeats the purpose of isolating the bits in the first place.
First, remember the Rules of Optimization. But this is possible in C or C++ using bitfields:
struct mystruct {
    unsigned int smallint1 : 3; /* 3 bits wide, values 0 -- 7 */
    signed int   smallint2 : 4; /* 4 bits wide, values -8 -- 7 */
    unsigned int boolean   : 1; /* 1 bit wide, values 0 -- 1 */
};
It's worth noting that while you gain by not requiring so much storage, you lose because it becomes more costly to access everything, since each read or write now has a bunch of bit twiddling mechanics associated with it. Given that storage is cheap, it's probably not worth it.
Edit: You can also use vector<bool> to store 1-bit bools; but beware of it, because it doesn't act like a normal vector! In particular, its operator[] and iterators return proxy objects rather than bool&. It's sufficiently different that it's fair to say a vector<bool> is not actually a vector. Scott Meyers wrote very clearly on this topic in 'Effective STL'.
In C, and for the sole purpose of saving space, you can reinterpret the unsigned short as a structure with bitfields (or use such a structure without messing with reinterpretations):
#include <stdio.h>

typedef struct bf_
{
    unsigned x : 4;
    unsigned y : 4;
    unsigned z : 4;
    unsigned w : 4;
} bf;

int main(void)
{
    unsigned short i = 5;
    /* Caution: bitfield layout is implementation-defined, and sizeof(bf)
       may exceed sizeof(i), since the allocation unit is unsigned int. */
    bf *bitfields = (bf *) &i;
    bitfields->w = 12;
    printf("%d\n", bitfields->x);
    /* etc.. */
    return 0;
}
That's a very common technique. You usually allocate an array of the larger primitive type (e.g., ints or longs) and have some abstraction to deal with the mapping. If you're using an OO language, it's usually a good idea to actually define some sort of BitArray or SmartArray class and implement a getVal() that takes an index. The important thing is to make sure you hide the details of the internal representation (e.g., for when you move between platforms).
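As a hedged sketch of that abstraction (the class and member names are mine, sized for the 4-bit values in the question):

#include <cstddef>
#include <cstdint>
#include <vector>

class SmartArray {
    static const unsigned kBits = 4;   // each stored value is 4 bits (0..15)
    std::vector<uint32_t> words_;      // backing array of the larger type
public:
    explicit SmartArray(std::size_t n) : words_((n * kBits + 31) / 32, 0) {}
    unsigned getVal(std::size_t i) const {
        return (words_[(i * kBits) / 32] >> ((i * kBits) % 32)) & 0xF;
    }
    void setVal(std::size_t i, unsigned v) {
        uint32_t &w = words_[(i * kBits) / 32];
        const unsigned shift = (i * kBits) % 32;
        w = (w & ~(uint32_t(0xF) << shift)) | (uint32_t(v & 0xF) << shift);
    }
};

For example, after SmartArray a(8); a.setVal(3, 11); a call to a.getVal(3) yields 11, while the caller never sees which word the value lives in.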
That being said, most mainstream languages already have this functionality available.
If you just want bits, Wikipedia has a good list.
If you want more than bits, you can still find something, or implement it yourself with a similar interface. Take a look at the Java BitSet for reference