How do you use bitwise flags in C++?

As per this website, I wish to represent a maze with a 2-dimensional array of 16-bit integers.
Each 16-bit integer needs to hold the following information:
Here's one way to do it (this is by no means the only way): a 12x16 maze grid can be represented as an array m[16][12] of 16-bit integers. Each array element would contain all the information for a single corresponding cell in the grid, with the integer bits mapped like this:
(bit-layout diagram omitted; source: mazeworks.com)
To knock down a wall, set a border, or create a particular path, all we need to do is flip bits in one or two array elements.
How do I use bitwise flags on 16-bit integers so I can set each one of those bits and check whether they are set?
I'd like to do it in an easily readable way (i.e., Border.W, Border.E, Walls.N, etc.).
How is this generally done in C++? Do I use hexadecimal to represent each one (i.e., Walls.N = 0x02, Walls.E = 0x04, etc.)? Should I use an enum?
See also How do you set, clear, and toggle a single bit?.

If you want to use bitfields then this is an easy way:
struct MAZENODE
{
    bool backtrack_north : 1;
    bool backtrack_south : 1;
    bool backtrack_east : 1;
    bool backtrack_west : 1;
    bool solution_north : 1;
    bool solution_south : 1;
    bool solution_east : 1;
    bool solution_west : 1;
    bool maze_north : 1;
    bool maze_south : 1;
    bool maze_east : 1;
    bool maze_west : 1;
    bool walls_north : 1;
    bool walls_south : 1;
    bool walls_east : 1;
    bool walls_west : 1;
};
Then your code can just test each one for true or false.
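For example, a quick usage sketch against the struct above (the 16x12 grid size follows the question; the marking logic is just illustrative):

MAZENODE m[16][12] = {};   // all flags start out false

m[3][5].walls_north = true;                         // put up a wall
if (m[3][5].walls_north && !m[3][5].solution_north)
    m[3][5].backtrack_north = true;                 // mark a backtrack direction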

Use std::bitset

Use hex constants/enums and bitwise operations if you care about which particular bits mean what.
Otherwise, use C++ bitfields (but be aware that the ordering of bits in the integer will be compiler-dependent).
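For instance, a sketch of the hex-constant/enum approach from the first suggestion, with made-up names standing in for the question's wall and border bits:

#include <cstdint>

// Hypothetical bit assignments; the real layout comes from the
// mazeworks.com diagram, which is not reproduced here.
enum WallFlags : std::uint16_t {
    WALL_N   = 0x0001,
    WALL_S   = 0x0002,
    WALL_E   = 0x0004,
    WALL_W   = 0x0008,
    BORDER_N = 0x0010,
    BORDER_S = 0x0020,
    BORDER_E = 0x0040,
    BORDER_W = 0x0080
};

std::uint16_t cell = 0;
cell |= WALL_N | BORDER_W;            // set two bits at once
cell &= ~WALL_N;                      // knock the north wall back down
bool blocked = (cell & WALL_E) != 0;  // test a bit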

Learn your bitwise operators: & (AND), | (OR), ^ (XOR), and ~ (NOT).
At the top of a lot of C/C++ files I have seen flags defined in hex to mask each bit.
#define ONE 0x0001
To see if a bit is turned on, you AND the value with the mask. To turn it on, you OR it with the mask. To toggle it like a switch, you XOR it with the mask.

To manipulate sets of bits, you can also use std::bitset<N>:
#include <bitset>
#include <cassert>

std::bitset<4*4> bits;     // 16 bits, all initially zero
bits[10] = false;          // element access
bits.set(10);              // turn bit 10 on
bits.flip();               // invert every bit
assert(!bits.test(10));    // bit 10 is off again

You can do it with hexadecimal flags or enums as you suggested, but the most readable/self-documenting is probably to use what are called "bitfields" (for details, Google for C++ bitfields).

Yes, a good way is to use hexadecimal to represent the bit patterns. Then you use the bitwise operators to manipulate your 16-bit ints.
For example:
if(x & 0x01){} // tests if bit 0 is set using bitwise AND
x ^= 0x02; // toggles bit 1 (0 based) using bitwise XOR
x |= 0x10; // sets bit 4 (0 based) using bitwise OR

I'm not a huge fan of std::bitset. It's just more typing, in my opinion, and it doesn't hide what you are doing anyway: you still have to AND and OR bits, unless you are picking out just one bit. That may work for small groups of flags. Not that we need to hide what we are doing, but the intention of a class is usually to make something easier for its users, and I don't think this class accomplishes that.
Say, for instance, you have a flag system with 64 flags, and you want to test, I don't know, 39 of them in one if statement to see whether they are all on. Using bitfields for that is a huge pain: you have to type them all out (assuming you stick to the bitfield functionality and don't mix and match methods). The same goes for bitset: unless I am missing something with the class, which is quite possible since I rarely use it, I don't see a way to test all 39 flags without typing out the whole thing or resorting to "standard methods" (an enum flag list or some defined value covering the 39 bits, combined with bitset's operator&). This can get messy depending on your approach.
And I know, 64 flags sounds like a lot. It is, depending on what you are doing. Personally speaking, most of the projects I'm involved with depend on flag systems, so 64 is not that unheard of, though 16-32 is far more common in my experience. I'm actually helping out on a project right now where one flag system has 640 bits. It's basically a privilege system, so it makes some sense to arrange them all together. Admittedly, I would like to break that up a bit, but, eh, I'm helping, not creating.
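For contrast, with plain masks the "are all of these set" test is a single expression, however many flags are involved. A sketch with invented flag names:

#include <cstdint>

enum : std::uint64_t {
    CAN_READ  = 1ull << 0,
    CAN_WRITE = 1ull << 1,
    CAN_EXEC  = 1ull << 2
    // ... up to 64 flags in a std::uint64_t
};

std::uint64_t flags    = CAN_READ | CAN_EXEC;
std::uint64_t required = CAN_READ | CAN_WRITE | CAN_EXEC;
if ((flags & required) == required) {
    // every required flag is set -- one test, no matter how many bits
}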

Related

Best way to convert 8 booleans to one byte?

I want to save 8 booleans into one byte and then save it to a file (this must be done for a very large amount of data). I've used the following code, but I'm not sure it is the best one (in terms of speed and space):
int bits[] = {1, 0, 0, 0, 0, 1, 1, 1};
char a = '\0';
for (int i = 0; i < 8; i++) {
    a = a << 1;
    a += bits[i];
}
// and then save "a"
Can anyone give me better (faster) code?
If you don't mind using SSE intrinsics, then _mm_movemask_epi8 is an excellent fit. It operates on 16 bytes, but you can just set the extra ones to zero.
For example (not tested):
#include <tmmintrin.h>  // SSSE3, for _mm_shuffle_epi8 (_mm_movemask_epi8 is SSE2)

// array points to 8 chars, each holding 0 or 1
__m128i values = _mm_loadl_epi64((__m128i*)array);
__m128i order = _mm_set_epi8(0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
                             0, 1, 2, 3, 4, 5, 6, 7);
// Reverse the 8 loaded bytes so array[0] ends up highest; 0xFF lanes become zero.
values = _mm_shuffle_epi8(values, order);
// Shift each 0/1 up to bit 7 of its byte, then gather all 16 MSBs into an int.
int result = _mm_movemask_epi8(_mm_slli_epi32(values, 7));
This assumes the array is an array of chars. If you can't make that happen, it takes some more loads and packs and it becomes a bit annoying.
Regarding
"can anyone give me a better code (more speed)?"
you should measure. Most of the impact on the speed of serializing to a file is I/O speed. What you do with the bits will likely have an unmeasurably small impact, but if it has any, it is likely mostly influenced by your original representation of the sequence of booleans.
Now regarding the given code:
int bits[] = {1, 0, 0, 0, 0, 1, 1, 1};
char a = '\0';
for (int i = 0; i < 8; i++) {
    a = a << 1;
    a += bits[i];
}
// and then save "a"
Use unsigned char as byte type, just on principle.
Use bitlevel OR, the | operator, again just on principle.
Use prefix ++, yes, also that just on principle.
The “on principle” for the first point is because in practice your code will not run on any machine with sign-and-magnitude or one's complement representation of signed integers, where char is signed. But I think it's generally a good idea to express in the code exactly what one intends doing, instead of rewriting it as something slightly different. And the intention here is to deal with bits, an unsigned byte.
The “on principle” for the bitlevel OR is because for this particular case there's no practical difference between bitlevel OR and addition. But in general it's a good idea to write in code what one means to express. And then it's no good to write a bitlevel OR as an addition: it might even trip you up, bite you in the a**, in some other context.
The “on principle” for the prefix ++ is because in practice the compiler will optimize prefix and postfix ++ for a basic type, when the expression result isn't used, to the very same machine code. But again it's generally better to write what one intends to express. Asking for an original value (the postfix ++) is just misleading a reader of the code when you're not ever using that original value – and as with the bitlevel OR expressed as addition, the pure increment expressed as postfix ++ might trip you up, bite you in the a**, in some other context, e.g. with iterators.
The general approach of explicitly coding up shifting and ORing appears to me to be fine, because std::bitset does not support initialization from a sequence of booleans (only initialization from a text string), so it doesn't save you any work. But generally it's a good idea to check whether the standard library supports whatever one wants to do. It might even happen that someone else chimes in here with some standard-library-based approach that I didn't think of! ;-)
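Putting those three points together, a sketch of the loop rewritten along these lines (same semantics as the original):

unsigned char const bits[] = {1, 0, 0, 0, 0, 1, 1, 1};
unsigned char packed = 0;
for (int i = 0; i < 8; ++i) {
    packed = (packed << 1) | bits[i];  // bitlevel OR, prefix increment
}
// packed is now 0x87: bits[0] ended up in the most significant bit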
Replace the += operator by |=, which is the bit-wise operation (and actually what you want to do here).
Use unsigned char for your truth values, if possible.
Unless you want to hand-unroll your loops and/or use SIMD intrinsics, that would be the most compiler-optimizable solution, I guess.
There's another trick: structs can have bit-fields, and you can put them in a union to (mis)use them as ints.
By the way, your code is buggy: you shift first, then write; and you use addition on a signed char, which will definitely go wrong for the 7th and 8th bits (given that you shift too early; if you did that properly, only the 8th bit would be hazardous).

Can I use data types like bool to compress data while improving readability?

My official question will be: "Is there a clean way to use data types to encode and compress data, rather than using messy bit masking?" My hope is to save space by compressing, and I would like to use native data types, structures, and arrays in order to improve readability over bit masking. I am proficient in bit masking from my assembly background, but I am learning C++ and OOP. We can store so much information in a 32-bit register by using individual bits, and I feel that I am trying to get back to that low-level environment while keeping the readability of C++ code.
I am attempting to save some space because I am working with huge resource requirements. I am still learning about how C++ treats the bool data type. I realize that memory is stored in byte chunks, not individual bits. I believe that a bool usually uses one byte and is masked somehow. In my head I could fit 8 bool values in one byte.
If I allocate an array of 2 bool elements in C++, does it allocate two bytes or just one?
Example: We will use DNA, since it can be encoded in two bits to represent A, C, G, and T. If I make a struct called DNA_Base with an array of two bools, then I make an array of 7 of those.
struct DNA_Base { bool Bit_1; bool Bit_2; };
DNA_Base DNA_Sequence[7] = {};
std::cout << sizeof(DNA_Base) << sizeof(DNA_Sequence) << std::endl;
// Yields a 2 and a 14.
// I would like this to say 1 and 2.
In my example I would also show the case where the DNA sequence can be 20 bases long, which would require 40 bits to encode. GATTACA could then take up a maximum of 2 bytes? I suppose an alternative question would have been "How do I make C++ do the bit masking for me in a more readable way?", or should I just make my own data type and classes and implement the bit masking using operator overloading?
Not fully what you want, but you can use a bitfield:
struct DNA_Base
{
    unsigned char Bit_1 : 1;
    unsigned char Bit_2 : 1;
};
DNA_Base DNA_Sequence[7];
So sizeof(DNA_Base) == 1 and sizeof(DNA_Sequence) == 7
So you have to pack the DNA_Base to avoid losing space to padding, something like:
struct DNA_Base_4
{
    unsigned char base1 : 2;  // may have value 0, 1, 2 or 3
    unsigned char base2 : 2;
    unsigned char base3 : 2;
    unsigned char base4 : 2;
};
So sizeof(DNA_Base_4) == 1
std::bitset is another alternative, but you have to do the interpretation job yourself.
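For illustration, a usage sketch for the DNA_Base_4 struct above, assuming the values 0, 1, 2, 3 stand for A, C, G, T (that mapping is my assumption, not part of the answer):

DNA_Base_4 chunk = {};  // all four bases start as 0, i.e. A
chunk.base1 = 2;        // G
chunk.base2 = 0;        // A
chunk.base3 = 3;        // T
chunk.base4 = 3;        // T -- "GATT" now occupies a single byte
unsigned int third = chunk.base3;  // reading back: 3, i.e. T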
An array of bools will be N-elements x sizeof(bool).
If your goal is to save space in registers, don't bother: it is usually more efficient to use the processor's native word size than a single byte, and inside a struct/class the bool will often end up padded out toward the native word size anyway.
Now, if you would like to save room on disk or in RAM because you need to store LOTS of bools, go ahead, but it isn't going to save room in all cases unless you actually pack the structure, and on some architectures packing can also have a performance impact because the CPU will have to perform unaligned or byte-by-byte accesses.
A bitmask (or bitfield), on the other hand, is performant and efficient and as dense as possible, and uses a single bitwise operation. I would look at one of the abstract data types that provide bit fields.
The standard library has bitset http://www.cplusplus.com/reference/bitset/bitset/ which can be as long as you want.
Boost also has something, I'm sure (boost::dynamic_bitset, for one).
Unless you are on a 4 bit machine, the final result will be using bit arithmetic. Whether you do it explicitly, have the compiler do it via bit fields, or use a bit container, there will be bit manipulation.
I suggest the following:
Use existing compression libraries.
Use the method that is most readable or understood by people other than yourself.
Use the method that is most productive (in terms of development time).
Use the method with which you will inject the fewest defects.
Edit 1:
Write each method up as a separate function.
Tell the compiler to generate the assembly language for each function.
Compare the assembly language of each function to each other.
My belief is that they will be very similar, similar enough that spending more time debating them is not worthwhile.
You can't operate on bits directly, but you can treat the smallest unit available to you as a store for multiple values, and define
#include <cstdint>

enum class DNAx4 : std::uint8_t {
    AAAA = 0x00, AAAC = 0x01, AAAG = 0x02, AAAT = 0x03,
    // ... and the rest of them
    TTTA = 0xFC, TTTC = 0xFD, TTTG = 0xFE, TTTT = 0xFF
};
I'd actually go further, and create a structure DNAx16 or DNAx32 to efficiently use the native word size on your machine.
You can then define functions on the data type, which will have to use the underlying bit representation, but at least it allows you to encapsulate this and build higher level operations from these primitives.
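For example, one such primitive might extract a single base from a packed byte. The function name and the layout (first base in the two most significant bits, matching the enum values above) are assumptions:

#include <cstdint>

// Return base i (0..3) from a byte holding four 2-bit bases.
std::uint8_t base_at(std::uint8_t packed, int i) {
    return (packed >> (6 - 2 * i)) & 0x03;
}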

Explain Bit Test macro in C++

I'm trying to figure out how this code works, but I can't manage to get a single answer.
#define testbit(x, y) ( ( ((const char*) & (x))[(y)>>3] & 0x80 >> ((y)&0x07)) >> (7-((y)&0x07) ) )
I'm new at pointers, so if you can figure out a way to explain this in simplified english, I would really appreciate it.
It belongs to a segment of code for an X-Plane plug-in found at https://code.google.com/p/xplugins/source/browse/trunk/Xsaitekpanels/SwitchPanel.cpp?r=38, line 19.
The macro tests the value of the y-th bit in x. You can't directly address bits, so the code starts by treating x as an array of bytes (the const char* cast).
It then looks up the byte where the bit lives. There are 8 bits in a byte, so it divides by 8. Chasing performance, instead of simply dividing by 8, the code uses the binary trick of shifting right 3 places. In general, for unsigned x and y, x >> y = x/2^y, and x << y = x*2^y.
At this point you need to test the bit within the byte, so you get the remainder of y/8. Yet another bit trick, using y & 7 instead of the clearer y % 8.
With this information you can make a mask, a single on bit, 0x80 and shift it into position to test the y%8-th bit. The mask is ANDed against the byte and a non-zero result here means the bit was set to 1, otherwise 0.
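To make that concrete, here is the macro unpacked into an equivalent function (note it takes the address explicitly, where the macro takes the variable itself and applies & internally):

// Same logic as testbit(x, y), spelled out step by step.
int testbit_fn(const void* x, int y) {
    const char* bytes = static_cast<const char*>(x);  // alias x as raw bytes
    int byte   = bytes[y >> 3];            // y/8 selects the byte holding bit y
    int offset = y & 0x07;                 // y%8 is the bit's position in that byte
    int masked = byte & (0x80 >> offset);  // isolate the target bit
    return masked >> (7 - offset);         // shift it down to exactly 1 or 0
}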
Completing #RhythmicFistman's answer
#RhythmicFistman's answer is missing one small part to it and that is the last step in the shifts.
The >> (7-((y)&0x07)) step ensures that you only ever get a result of 1 or 0. With this code it is safe to do comparisons like:
if (testbit(variable, 6) == 1) {
    // do something
}
Where without that step testbit would return a bit mask in which the 6th bit would be set to 1 or 0 and all the other bits are always set to 0. That is the intent but it is not implemented in what is considered a portable way, see Warning 3 below.
Possible issues with using this code
Now to add something to the other answers. The other answers have not pointed out some keywords that should be mentioned here and they are strict aliasing and shift arithmetic right. My elaboration will come in the form of warnings below.
Warning 1: Endianness
This code assumes that you are using a big endian architecture or only wish to get the correct bit from an array of chars.
The reason is that if you convert an int into an array of chars (bytes) you will get different results on a big endian machine vs a little endian machine.
Warning 2: Strict Aliasing
The macro makes use of a cast (const char*) &(x) which is designed to change the type, a.k.a. alias, of (x) so that it is easier to get to the correct bits.
This is dangerous and the reason why is explained beautifully in this SO answer. The short version is that if you compile this code with optimisations strange things can happen.
The wikipedia pages on Aliasing and Pointer Aliasing are also useful and should be read.
Warning 3: Shift Arithmetic Right
In addition to this there could be a potential issue with the way this code uses the right shift operator >>. This operator has two different behaviors depending on whether the variable it is operating on is signed or unsigned. So long as you never use negative numbers you will be safe but this code will not protect you against that mistake. I suspect though, that you're less likely to make such a mistake anyway so it should be ok to use it.
Also worth mentioning, you are using signed char and are shifting it right. Though this works I would prefer unsigned char which would improve portability because it will not risk generating an arithmetic shift right when char and int are the same width (which is almost never the case in practice, granted). This works because char is promoted to int for the shift, see this SO answer for an explanation.
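A small sketch of the difference Warning 3 is about, on a typical two's-complement implementation:

#include <cstdint>

std::int8_t  s = -128;  // bit pattern 0x80, signed
std::uint8_t u = 0x80;  // same bit pattern, unsigned

int a = s >> 7;  // s promotes to int -128; an arithmetic shift yields -1
int b = u >> 7;  // u promotes to int 128; the result is 1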
What you see is a macro that does the following, in order:
Shifts y right by 3 bits (i.e. divides it by 8).
Takes the address of x and picks the character at that position (indexing into x as if it were a string of bytes).
ANDs the selected char with 0x80 shifted right by (y & 0x07).
Shifts that result right by 7 - (y & 0x07).
Does this help you? I don't think so, because this macro is clearly improper and kind of tricky: bit masks, bitwise operations, bit shifts...
If you can explain more precisely what you want to understand in it, maybe I can be helpful.

Creating a simple portable bitmask and using it

This is my first time trying to create a bitmask, and although it is seemingly simple, I am having trouble visualizing everything.
Keep in mind I cannot use std::bitset.
First, I have read that accessing raw bits is undefined behavior (so using a union over a char would be bad, because the bits might be reversed for a different compiler).
Most code I've looked at uses a struct to define each bit, and this way of structuring data should be compiler-independent because the first bit will always be the LSB (I assume). Here is an example:
struct foo
{
    unsigned char a : 1;
    unsigned char b : 1;
    unsigned char unused : 6;
};
Now the question is: could you use more than one bit for a variable in the struct AND have it still be compiler-independent? It seems like the answer is yes, but I have had some weird answers and want to be sure. Something like:
struct foo
{
    unsigned char ab : 2;
    unsigned char unused : 6;
};
It seems like, regardless of whether the raw structure is reversed, the first bit accessed from the struct is always the LSB, so how many bits you use should not matter.
The C standard does not specify the ordering of fields within a unit -- there's no guarantee that a, in your example, is in the LSB. If you want fully portable behavior, you need to do the bit manipulation yourself, using unsigned integral types, and (if using unsigned integral types bigger than a byte) you need to worry about the endianness when reading/writing them from external sources.
The behaviour does not depend on the bit order. What you have written corresponds to the language standard and therefore behaves the same on all platforms.
Bitfields cannot be portably used to access specific bits in an external block of data (like a hardware register or data serialized in a stream of bytes). So bitfields aren't useful in this context - at least for portable code.
But if you're talking about using the bitfield within the program and not trying to have it model some external bit representation, then it's 100% portable. Not super useful, but portable.
I've spent a career twiddling bits in C/C++, and maybe because of this issue, I never see it done this way. We always use unsigned variables and apply bit masks to them:
#define BITMASK_A 0x01
#define BITMASK_B 0x02
unsigned char bitfield;
Then when you want to access a, you use (bitfield & BITMASK_A).
But to answer your question, there should be no logical difference between your two examples: if the compiler places ab at the low end, then the first example should also place a at the LSB.
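The matching set/clear/toggle operations, using the masks defined above:

bitfield |= BITMASK_A;    // set a
bitfield &= ~BITMASK_B;   // clear b
bitfield ^= BITMASK_A;    // toggle a
if (bitfield & BITMASK_A) { /* a is set */ }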

How many options can I include in a bit mask?

In this question it was pointed out that:
Using int [for bit mask] is asking for trouble
I have been using an unsigned char to store bitmask flags, but it occurs to me that I will hit a low limit, since a char is only one byte, thus 8 bits, thus only 8 options in my mask?
enum options {
    k1 = 1 << 0,
    k2 = 1 << 1,
    // ... through to k8
};
unsigned char myOption=k2;
Do I simply need to make myOption an int or some other type if I wish it to store more than 8 possible options (and combinations of options, of course, which is why I am using a bit mask in the first place)? What's the best type?
If you need an unknown number of 'bits' you could use something like the std::vector<bool> class, see here:
http://www.cplusplus.com/reference/vector/vector-bool/
This is a specialization of the vector class which can pack the bool values using bits, so it is more space efficient than an array of bools (whether you need that extra efficiency is up to you).
Of course I don't know what your application is; there are many valid reasons for using bitfields. If you are simply storing a bunch of true and false values, though, something like an array of bools or this vector of bools might be more easily maintained. It has downsides too, of course: you can't test whether, say, 3 bits are all set in one operation, as you can with masking and bitfields, so it is application-specific.
vector<bool> is somewhat controversial though, I think. See: http://howardhinnant.github.io/onvectorbool.html
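A quick usage sketch:

#include <vector>

std::vector<bool> flags(100, false);  // 100 packed bits
flags[42] = true;                     // set one
bool on = flags[42];                  // read it back
flags.push_back(true);                // grows at runtime, unlike std::bitset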
#include <stdint.h>
This defines types with fixed sizes that are not compiler specific.
int16_t = 16 bits
uint16_t = 16 bits unsigned
int32_t = 32 bits
If you need more than 64 flags you should consider ::std::vector<bool>, as Wayne Uroda suggested.
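As a sketch, the question's enum widened to 32 options with a fixed-width type (assuming C++11 for the typed enum):

#include <cstdint>

enum options : std::uint32_t {
    k1 = 1u << 0,
    k2 = 1u << 1,
    // ... through to
    k32 = 1u << 31
};
std::uint32_t myOption = k2 | k32;  // room for 32 independent flags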