When to use STL bitsets instead of separate variables? - c++

In what situation would it be more appropriate for me to use a bitset (STL container) to manage a set of flags rather than having them declared as a number of separate (bool) variables?
Will I get a significant performance gain if I used a bitset for 50 flags rather than using 50 separate bool variables?

Well, 50 bools as a bitset will take 7 bytes, while 50 bools as bools will take 50 bytes. These days that's not really a big deal, so using bools is probably fine.
However, one place a bitset might be useful is if you need to pass those bools around a lot, especially if you need to return the set from a function. Using a bitset you have less data that has to be moved around on the stack for returns. Then again, you could just use refs instead and have even less data to pass around. :)

std::bitset will give you extra points when you need to serialize / deserialize it. You can just write it to a stream or read from a stream with it. But certainly, the separate bools are going to be faster. They are optimized for this kind of use after all, while a bitset is optimized for space, and has still function calls involved. It will never be faster than separate bools.
Bitset
Very space efficient
Less efficient due to bit fiddling
Provides serialize / de-serialize with op<< and op>>
All bits packed together: You will have the flags at one place.
Separate bools
Very fast
Bools are not packed together. They will be members somewhere.
Decide on the facts. I, personally, would use std::bitset for some not-performance critical, and would use bools if I either have only a few bools (and thus it's quite overview-able), or if I need the extra performance.

It depends what you mean by 'performance gain'. If you only need 50 of them, and you're not low on memory then separate bools is pretty much always a better choice than a bitset. They will take more memory, but the bools will be much faster. A bitset is usually implemented as an array of ints (the bools are packed into those ints). So the first 32 bools (bits) in your bitset will only take up a single 32bit int, but to read each value you have to do some bitwise operations first to mask out all the values you don't want. E.g. to read the 2nd bit of a bitset, you need to:
Find the int that contains the bit you want (in this case, it's the first int)
Bitwise And that int with '2' (i.e. value & 0x02) to find out if that bit is set
However, if memory is a bottleneck and you have a lot of bools using a bitset could make sense (e.g. if you're target platform is a mobile phone, or it's some state in a very busy web service)
NOTE: A std::vector of bool usually has a specialisation to use the equivalent of a bitset, thus making it much smaller and also slower for the same reasons. So if speed is an issue, you'll be better off using a vector of char (or even int), or even just use an old school bool array.

RE #Wilka:
Actually, bitsets are supported by C/C++ in a way that doesn't require you to do your own masking. I don't remember the exact syntax, but it's something like this:
struct MyBitset {
bool firstOption:1;
bool secondOption:1;
bool thirdOption:1;
int fourBitNumber:4;
};
You can reference any value in that struct by just using dot notation, and the right things will happen:
MyBitset bits;
bits.firstOption = true;
bits.fourBitNumber = 2;
if(bits.thirdOption) {
// Whatever!
}
You can use arbitrary bit sizes for things. The resulting struct can be up to 7 bits larger than the data you define (its size is always the minimum number of bytes needed to store the data you defined).

Related

should I use a bit set or a vector? C++

I need to be able to store an array of binary numbers in c++ which will be passed through different methods and eventually outputted to file and terminal,
What are the key differences between vectors and bit sets and which would be easiest and/or more efficient to use?
(I do not know how many bits I need to store)
std::bitset size should be known at compile time, so your choice is obvious - use std::vector<bool>.
Since it's implemented not the same as std::vector<char> (since a single element takes a bit, not a full char), it should be a good solution in terms of memory use.
It all depends on what you want to do binary.
You could also use boost.dynamic_bitset which is like std::bitset but not with fixed bits.
The main drawback is dependency on boost if you don't already use it.
You could also store your input in std::vector<char> and use a bitset per char to convert binary notation.
As others already told: std::bitset uses a fixed number of bits.
std::vector<bool> is not always advised because it has its quirks, as it is not a real container (gotw).
If you don't know how many bits you need to store at compilation time, you cannot use bitset, as it's size is fixed. Because of this, you should you vector<bool>, as it can be resized dynamically. If you want an array of bits saved like this, you can then use vector< vector<bool> >.
If you don't have any specific upper bound for the size of the numbers you need to store, then you need to store your data in two different dimensions:
The first dimension will be the number, whose size varies.
The second dimension will be your array of numbers.
For the latter, using std:vector is fine if you require your values to be contiguous in memory. For the numbers themselves you don't need any data structure at all: just allocate memory using new and an unsigned primitive type such as unsigned char, uint8_t or any other if you have alignment constraints.
If, on the other hand, you know that your numbers won't be larger than let's say 64 bits, then use a data type that you know will hold this amount of data such as uint64_t.
PS: remember that what you store are numbers. The computer will store them in binary whether you use them as such or with any other representation.
I think that in your case you should use std::vector with the value type of std::bitset. Using such an approach you can consider your "binary numbers" either like strings or like objects of some integer type and at the same time it is easy to do binary operations like setting or resetting bits.

C++ Variable Width Bit Field

I'm writing a program that is supposed to manipulate very long strings of boolean values. I was originally storing them as a dynamic array of unsigned long long int variables and running C-style bitwise operations on them.
However, I don't want the overhead that comes with having to iterate over an array even if the processor is doing it at the machine code level - i.e. it is my belief that the compiler is probably more efficient than I am.
So, I'm wondering if there's a way to store them as a bitfield. The only problem with that is that I heard you needed to declare a constant at runtime for that to work and I don't particularly care to do that as I don't know how many bits I need when the program starts. Is there a way to do this?
As per the comments, std::bitset or std::vector<bool> are probably what you need. bitset is fixed-length, vector<bool> is dynamic.
vector<bool> is a specialization of vector that only uses one bit per value, rather than sizeof(bool), like you might expect... While good for memory use, this exception is actually disliked by the standards body these days, because (among other things) vector<bool> doesn't fulfil the same contract that vector<T> does - it returns proxy objects instead of references, which wreaks havoc in generic code.

How to use bit values instead of chars in c++ program?

I got some code that I'd like to improve. It's a simple app for one of the variations of 2DBPP and you can take a look at the source at https://gist.github.com/892951
Here's an outline of things that I use chars for (I'd like to switch to binary values instead.) Initialize a block of memory with '0's ():
...
char* bin;
bin = new (nothrow) char[area];
memset(bin, '\0', area);
sometimes I check particular values:
if (!bin[j*height+k]) {...}
or blocks:
if (memchr(bin+i*height+pos.y, '\1', pos.height)) {...}
or set values to '1's:
memset(bin+i*height+best.y,'\1',best.height);
I don't know of any standart types or methods to work with binary values. How do I get to use the bits instead of bytes?
There's a related question that you might be interested in -
C++ performance: checking a block of memory for having specific values in specific cells
Thank you!
Edit: There's still a bigger question - would it be an improvement? I'm only concerned with time.
For starters, you can refer to this post:
How do you set, clear, and toggle a single bit?
Also, try looking into the C++ Std Bitset, or bit field.
I recommend reading up on boost.dynamic_bitset, which is a runtime-sized version of std::bitset.
Alternatively, if you don't want to use boost for some reason, consider using a std::vector<bool>. Quoting cppreference.com:
Note that a boolean vector (std::vector<bool>) is a specialization of the vector template that is designed to use less memory. A normal boolean variable usually uses 1-4 bytes of memory, but a boolean vector uses only one bit per boolean value.
Unless memory space is an issue, I would stay away from bit twiddling. You may save some memory space but extend performance time. Packing and unpacking bits takes time, and extra code.
Get the code more robust and correct before attempting bit twiddling. Play with different (high level) designs that can improve performance and memory usage.
If you are going to the bit level, study up on boolean arithmetic and logic. Redesign your data to be easier to manipulate at the bit level.

C++ 2.5 bytes (20-bit) integer

I know it's ridiculous, but I need it for storage optimization. Is there any good way to implement it in C++?
It has to be flexible enough so that I can use it as a normal data type e.g Vector< int20 >, operator overloading, etc..
If storage is your main concern, I suspect you need quite a few 20-bit variables. How about storing them in pairs? You could create a class representing two such variables and store them in 2.5+2.5 = 5 bytes.
To access the variables conveniently you could override the []-operator so you could write:
int fst = pair[0];
int snd = pair[1];
Since you may want to allow for manipulations such as
pair[1] += 5;
you would not want to return a copy of the backing bytes, but a reference. However, you can't return a direct reference to the backing bytes (since it would mess up it's neighboring value), so you'd actually need to return a proxy for the backing bytes (which in turn has a reference to the backing bytes) and let the proxy overload the relevant operators.
As a metter of fact, as #Tony suggest, you could generalize this to have a general container holding N such 20-bit variables.
(I've done this myself in a specialization of a vector for efficient storage of booleans (as single bits).)
No... you can't do that as a single value-semantic type... any class data must be a multiple of the 8-bit character size (inviting all the usual quips about CHAR_BITS etc).
That said, let's clutch at straws...
Unfortunately, you're obviously handling very many data items. If this is more than 64k, any proxy object into a custom container of packed values will probably need a >16 bit index/handle too, but still one of the few possibilities I can see worth further consideration. It might be suitable if you're only actively working with and needing value semantic behaviour for a small subset of the values at one point in time.
struct Proxy
{
Int20_Container& container_; // might not need if a singleton
Int20_Container::size_type index_;
...
};
So, the proxy might be 32, 64 or more bits - the potential benefit is only if you can create them on the fly from indices into the container, have them write directly back into the container, and keep them short-lived with few concurrently. (One simple way - not necessarily the fastest - to implement this model is to use an STL bitset or vector as the Int20_Container, and either store 20 times the logical index in index_, or multiply on the fly.)
It's also vaguely possible that although your values range over a 20-bit space, you've less than say 64k distinct values in actual use. If you have some such insight into your data set, you can create a lookup table where 16-bit array indices map to 20-bit values.
Use a class. As long as you respect the copy/assign/clone/etc... STL semantics, you won't have any problem.
But it will not optimize the memory space on your computer. Especially if you put in in a flat array, the 20bit will likely be aligned on a 32bit boundary, so the benefit of a 20bit type there is useless.
In that case, you will need to define your own optimized array type, that could be compatible with the STL. But don't expect it to be fast. It won't be.
Use a bitfield. (I'm really surprised nobody has suggested this.)
struct int20_and_something_else {
int less_than_a_million : 20;
int less_than_four_thousand : 12; // total 32 bits
};
This only works as a mutual optimization of elements in a structure, where you can spackle the gaps with some other data. But it works very well!
If you truly need to optimize a gigantic array of 20-bit numbers and nothing else, there is:
struct int20_x3 {
int one : 20;
int two : 20;
int three : 20; // 60 bits is almost 64
void set( int index, int value );
int get( int index );
};
You can add getter/setter functions to make it prettier if you like, but you can't take the address of a bitfield, and they can't participate in an array. (Of course, you can have an array of the struct.)
Use as:
int20_x3 *big_array = new int20_x3[ array_size / 3 + 1 ];
big_array[ index / 3 ].set( index % 3, value );
You can use C++ std::bitset. Store everything in a bitset and access your data using the correct index (x20).
Your not going to be able to get exactly 20 bits as a type(even with a bit packed struct), as it will always be aligned (at smallest grainularity) to a byte. Imo the only way to go, if you must have 20 bits, is to create a bitstream to handle the data(which you can overload to accept indexing etc)
You can use the union keyword to create a bit field. I've used it way back when bit fields were a necessity. Otherwise, you can create a class that holds 3 bytes, but through bitwise operations exposes just the most significant 20.
As far as I know that isn't possible.
The easiest option would be to define a custom type, that uses an int32_t as the backing storage, and implements appropriate maths as override operators.
For better storage density, you could store 3 int20 in a single int64_t value.
Just an idea: use optimized storage (5 bytes for two instances), and for operations, convert it into 32-bit int and then back.
While its possible to do this a number of ways.
One possibilty would be to use bit twidling to store them as the left and right parts of a 5 byte array with a class to store/retrieve which converts yoiur desired array entry to an array entry in byte5[] array and extracts the left ot right half as appropriate.
However on most hardware requires integers to be word aligned so as well as the bit twiddling to extract the integer you would need some bit shifiting to align it properly.
I think it would be more efficient to increase your swap space and let virtual memory take care of your large array (after all 20 vs 32 is not much of a saving!) always assuming you have a 64 bit OS.

How can I manage bits/binary in c++?

What I need to do is open a text file with 0s and 1s to find patterns between the columns in the file.
So my first thought was to parse each column into a big array of bools, and then do the logic between the columns (now in arrays). Until I found that the size of bools is actually a byte not a bit, so i would be wasting 1/8 of memory, assigning each value to a bool.
Is it even relevant in a grid of 800x800 values? What would be the best way to handle this?
I would appreciate a code snippet in case its a complicated answer
You could use std::bitset or Boosts dynamic_bitset which provide different methods which will help you manage your bits.
They for example support constructors which create bitsets from other default types like int or char. You can also export the bitset into an ulong or into a string (which then could be turned into a bitset again etc)
I once asked about concatenating those, which wasn't performantly possible to do. But perhaps you could use the info in that question too.
you can use std::vector<bool> which is a specialization of vector that uses a compact store for booleans....1 bit not 8 bits.
I think it was Knuth who said "premature optimization is the root of all evil." Let's find out a little bit more about the problem. Your array is 800**2 == 640,000 bytes, which is no big deal on anything more powerful than a digital watch.
While storing it as bytes may seem wasteful -- as you say, 7/8ths of the memory is redundant -- but on the other hand, most machines don't do bit operations as efficiently as bytes; by saving the memory, you might waste so much effort masking and testing that you would have been better off with the bytes model.
On the other hand, if what you want to do with it is look for larger patterns, you might want to use a bitwise representation because you can do things with 8 bits at a time.
The real point here is that there are several possibilities, but no one can tell you the "right" representation without knowing what the problem is.
For that size grid your array of bools would be about 640KB. Depends how much memory you have if that will be a problem. It would probably be the simplest for the logic analysis code.
By grouping the bits and storing in an array of int you could drop the memory requirement to 80KB, but the logic code would be more complicated as you'd be always isolating the bits you wanted to check.