Explain Bit Test macro in C++ - c++

I'm trying to figure out how this code works, but I can't manage to get a single answer.
#define testbit(x, y) ( ( ((const char*) & (x))[(y)>>3] & 0x80 >> ((y)&0x07)) >> (7-((y)&0x07) ) )
I'm new at pointers, so if you can figure out a way to explain this in simplified english, I would really appreciate it.
It belongs to a segment of code for an X-Plane Plug-in found at https://code.google.com/p/xplugins/source/browse/trunk/Xsaitekpanels/SwitchPanel.cpp?r=38 line=19

The macro tests the value of the y-th bit in x. You can't directly address bits, so the code starts by treating x as an array of bytes (the const char* cast).
It then looks up the byte where the bit lives. There are 8 bits in a byte, so it divides by 8. Chasing performance, instead of simply dividing by 8, the code uses the binary trick of shifting right 3 places. In general, for unsigned x and y, x >> y = x/2^y, and x << y = x*2^y.
At this point you need to test the bit within the byte, so you get the remainder of y/8. Yet another bit trick, using y & 7 instead of the clearer y % 8.
With this information you can make a mask, a single on bit, 0x80 and shift it into position to test the y%8-th bit. The mask is ANDed against the byte and a non-zero result here means the bit was set to 1, otherwise 0.
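To make the steps above concrete, here is a minimal sketch of the same logic written as a function with one step per line. The names test_bit and check_test_bit are made up for illustration and are not part of the plug-in; bit 0 is the most significant bit of the first byte, as in the macro.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the plug-in): the macro's steps, one per line.
inline int test_bit(const void* object, std::size_t y) {
    // Treat the object as a sequence of bytes.
    const unsigned char* bytes = static_cast<const unsigned char*>(object);
    const std::size_t byte_index = y >> 3;                        // y / 8: which byte
    const unsigned bit_offset = static_cast<unsigned>(y & 0x07);  // y % 8: which bit, from the MSB
    const unsigned mask = 0x80u >> bit_offset;                    // single 1 bit in position
    return (bytes[byte_index] & mask) != 0 ? 1 : 0;               // non-zero means the bit is set
}

inline void check_test_bit() {
    const unsigned char buf[2] = {0x80, 0x01};
    assert(test_bit(buf, 0) == 1);   // MSB of the first byte
    assert(test_bit(buf, 1) == 0);
    assert(test_bit(buf, 15) == 1);  // LSB of the second byte
}
```

Called on a byte array this behaves the same on every platform; called on a wider object such as an int, the result depends on how that object's bytes are laid out in memory.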

Completing @RhythmicFistman's answer
@RhythmicFistman's answer is missing one small part, and that is the last step in the shifts.
The >> (7-((y)&0x07) step ensures that you only ever get a result of 1 or 0. With this code it is safe to do comparisons like:
if (testbit(variable, 6) == 1) {
// do something
}
Where without that step testbit would return a bit mask in which the 6th bit would be set to 1 or 0 and all the other bits are always set to 0. That is the intent but it is not implemented in what is considered a portable way, see Warning 3 below.
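A small sketch of the difference, restricted to a single byte. The helper names are hypothetical; masked_bit stops after the AND, normalized_bit adds the final shift.

```cpp
#include <cassert>

// Hypothetical single-byte helpers; bit 0 is the most significant bit,
// matching the macro's numbering.
inline unsigned masked_bit(unsigned char byte, unsigned y) {
    return byte & (0x80u >> (y & 0x07u));              // 0, or the mask itself
}

inline unsigned normalized_bit(unsigned char byte, unsigned y) {
    return masked_bit(byte, y) >> (7u - (y & 0x07u));  // always 0 or 1
}

inline void check_normalized_bit() {
    assert(masked_bit(0x02, 6) == 0x02u);   // bit is set, but the value is not 1
    assert(normalized_bit(0x02, 6) == 1u);  // safe to compare against 1
    assert(normalized_bit(0x02, 5) == 0u);
}
```

Without the final shift, a comparison such as == 1 would only succeed for y % 8 == 7, where the mask happens to be 0x01.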
Possible issues with using this code
Now to add something to the other answers. The other answers have not pointed out some keywords that should be mentioned here and they are strict aliasing and shift arithmetic right. My elaboration will come in the form of warnings below.
Warning 1: Endianness
This code assumes that you are using a big endian architecture or only wish to get the correct bit from an array of chars.
The reason is that if you convert an int into an array of chars (bytes) you will get different results on a big endian machine vs a little endian machine.
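A sketch of what that means in practice, using hypothetical helpers. bit_in_u32 extracts the bit arithmetically and so does not depend on byte order; bit_in_u32_memory looks at the in-memory bytes the way the macro does.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: bit y of a byte buffer, MSB first within each byte.
inline int bit_in_bytes(const unsigned char* bytes, unsigned y) {
    return (bytes[y >> 3] >> (7u - (y & 0x07u))) & 1;
}

// Bit y of a 32-bit value counted from its most significant bit,
// computed arithmetically, so byte order does not matter.
inline int bit_in_u32(std::uint32_t value, unsigned y) {
    return static_cast<int>((value >> (31u - y)) & 1u);
}

// What the macro effectively does to an integer: inspect its stored bytes.
inline int bit_in_u32_memory(std::uint32_t value, unsigned y) {
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);  // copy out the object representation
    return bit_in_bytes(bytes, y);
}

inline void check_endianness_demo() {
    const std::uint32_t v = 0x80000000u;
    assert(bit_in_u32(v, 0) == 1);  // holds on any machine
    // The stored top bit shows up at y == 0 on big endian, y == 24 on little endian.
    assert(bit_in_u32_memory(v, 0) + bit_in_u32_memory(v, 24) == 1);
}
```

If the bit numbering is meant to refer to the numeric value rather than the memory layout, the arithmetic form is the safer choice.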
Warning 2: Strict Aliasing
The macro makes use of a cast (const char*) &(x) which is designed to change the type, a.k.a. alias, of (x) so that it is easier to get to the correct bits.
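This is dangerous in general, and the reason why is explained beautifully in this SO answer. The short version is that if you compile code that accesses an object through a pointer to an unrelated type with optimisations, strange things can happen. Access through a char pointer, as done here, is the one form of aliasing the language rules explicitly allow, so the real risk lies in adapting the macro to cast to some wider type.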
The wikipedia pages on Aliasing and Pointer Aliasing are also useful and should be read.
Warning 3: Shift Arithmetic Right
In addition to this there could be a potential issue with the way this code uses the right shift operator >>. This operator has two different behaviors depending on whether the variable it is operating on is signed or unsigned. So long as you never use negative numbers you will be safe but this code will not protect you against that mistake. I suspect though, that you're less likely to make such a mistake anyway so it should be ok to use it.
Also worth mentioning: the macro reads the bytes as plain char, which may be signed, and shifts the result right. Though this works, I would prefer unsigned char, which would improve portability because it does not risk generating an arithmetic shift right when char and int are the same width (which is almost never the case in practice, granted). This works because char is promoted to int for the shift; see this SO answer for an explanation.
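One way to follow that advice, as a sketch only: a variant of the macro that reads the bytes as unsigned char and shifts before masking, so no negative value is ever shifted. TESTBIT_U is a made-up name, not something from the plug-in.

```cpp
#include <cassert>

// Hypothetical variant of the macro: bytes are read as unsigned char,
// the wanted bit is shifted down to position 0, then everything else is masked off.
#define TESTBIT_U(x, y) \
    ((((const unsigned char*)&(x))[(y) >> 3] >> (7 - ((y) & 0x07))) & 1)

inline void check_testbit_u() {
    unsigned char buf[2] = {0x01, 0x80};
    assert(TESTBIT_U(buf, 7) == 1);  // LSB of the first byte
    assert(TESTBIT_U(buf, 8) == 1);  // MSB of the second byte
    assert(TESTBIT_U(buf, 0) == 0);
}
```

The endianness caveat from Warning 1 still applies when x is wider than one byte.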

What you see is a macro that does the following job,
in order:
Shift y right by 3 (that is, divide it by 8).
Take the address of x, treat it as an array of chars, and pick the char at that position.
Shift 0x80 right by the result of y AND 0x07.
AND the selected char with that shifted mask.
Shift the previous result right by 7 - (the result of y AND 0x07).
Well, does this help you? I don't think so!
Because this macro is clearly improper, and kind of tricky.
Bit masks, binary operations, binary shifts...
If you can explain more precisely what you want to understand in this, maybe I can be helpful.

Related

Best way to convert 8 boolean to one byte?

I want to save 8 boolean to one byte and then save it to a file(this work must be done for a very large data), I've used the following code but I'm not sure it is the best one(in terms of speed and space):
int bits[]={1,0,0,0,0,1,1,1};
char a='\0';
for (int i=0;i<8;i++){
a=a<<1;
a+=bits[i];
}
//and then save "a"
can anyone give me a better code(more speed) ?
If you don't mind using SSE intrinsics, then _mm_movemask_epi8 is an excellent fit. It uses 16 bytes, but you can just set the others to zero.
For example (not tested)
__m128i values = _mm_loadl_epi64((__m128i*)array);
__m128i order = _mm_set_epi8(0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
0, 1, 2, 3, 4, 5, 6, 7);
values = _mm_shuffle_epi8(values, order);
int result = _mm_movemask_epi8(_mm_slli_epi32(values, 7));
This assumes the array is an array of chars. If you can't make that happen, it takes some more loads and packs and it becomes a bit annoying.
Regarding
"can anyone give me a better code(more speed)"
you should measure. Most of the impact on the speed of serializing to file is i/o speed. What you do with the bits will likely have an unmeasurably small impact, but if it has any impact then that is likely mostly influenced by your original representation of the sequence of booleans.
Now regarding the given code
int bits[]={1,0,0,0,0,1,1,1};
char a='\0';
for (int i=0;i<8;i++){
a=a<<1;
a+=bits[i];
}
//and then save "a"
Use unsigned char as byte type, just on principle.
Use bitlevel OR, the | operator, again just on principle.
Use prefix ++, yes, also that just on principle.
The “on principle” for the first point is because in practice your code will not run on any machine with sign-and-magnitude or one's complement representation of signed integers, where char is signed. But I think it's generally a good idea to express in the code exactly what one intends doing, instead of rewriting it as something slightly different. And the intention here is to deal with bits, an unsigned byte.
The “on principle” for the bitlevel OR is because for this particular case there's no practical difference between bitlevel OR and addition. But in general it's a good idea to write in code what one means to express. And then it's no good to write a bitlevel OR as an addition: it might even trip you up, bite you in the a**, in some other context.
The “on principle” for the prefix ++ is because in practice the compiler will optimize prefix and postfix ++ for a basic type, when the expression result isn't used, to the very same machine code. But again it's generally better to write what one intends to express. Asking for an original value (the postfix ++) is just misleading a reader of the code when you're not ever using that original value – and as with the bitlevel OR expressed as addition, the pure increment expressed as postfix ++ might trip you up, bite you in the a**, in some other context, e.g. with iterators.
The general approach of explicitly coding up shifting and ORing appears to me to be fine because std::bitset does not support initialization from a sequence of booleans (only initialization from a text string), so it doesn't save you any work. But generally it's a good idea to check the standard library, whether it supports whatever one wants to do. It might even happen that someone else chimes in here with some standard library based approach that I didn't think of! ;-)
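Putting the three points together, a minimal sketch might look like this. The function name pack_bits is made up; it keeps the question's convention that the first array element ends up in the most significant bit.

```cpp
#include <cassert>
#include <cstddef>

// Packs eight 0/1 values into one byte, first element in the most significant bit.
inline unsigned char pack_bits(const int (&bits)[8]) {
    unsigned char a = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        // Shift what we have so far, then OR in the next bit.
        a = static_cast<unsigned char>((a << 1) | (bits[i] & 1));
    }
    return a;
}

inline void check_pack_bits() {
    const int bits[8] = {1, 0, 0, 0, 0, 1, 1, 1};
    assert(pack_bits(bits) == 0x87);  // binary 10000111
}
```

The cast is there because a << 1 is computed as an int after promotion; it makes the narrowing back to a byte explicit.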
Replace the += operator by |=, which is the bit-wise operation (and actually what you want to do here).
Use unsigned char for your truth values, if possible.
Unless you want to hand-unroll your loops and/or use SIMD intrinsics, that would be the most compiler-optimizable solution, I guess.
There's another trick: structs can have bit-fields, and you can put them in a union to misuse them as ints.
By the way: your code is buggy. You shift first, then write; you use addition, but a signed char, which will definitely go wrong for the 7th and 8th bits (given you erroneously shift too early; if you did that properly, only the 8th bit will cause hazard).

Verilog access specific bits

I have a problem accessing the 32 most significant and 32 least significant bits in Verilog. I have written the following code but I get the error "Illegal part-select expression". The point here is that I don't have access to a 64 bit register. Could you please help?
`MLT: begin
if (multState==0) begin
{C,Res}<={A*B}[31:0];
multState=1;
end
else
begin
{C,Res}<={A*B}[63:32];
multState=2;
end
Unfortunately the bit-select and part-select features of Verilog are part of expression operands. They are not Verilog operators (see Sec. 5.2.1 of the Verilog 2005 Std. Document, IEEE Std 1364-2005) and can therefore not be applied to arbitrary expressions but only directly to registers or wires.
There are various ways to do what you want but I would recommend using a temporary 64 bit variable:
wire [31:0] A, B;
reg [63:0] tmp;
reg [31:0] ab_lsb, ab_msb;
always @(posedge clk) begin
tmp = A*B;
ab_lsb <= tmp[31:0];
ab_msb <= tmp[63:32];
end
(The assignments to ab_lsb and ab_msb could be conditional. Otherwise a simple "{ab_msb, ab_lsb} <= A*B;" would do the trick as well of course.)
Note that I'm using a blocking assignment to assign 'tmp' as I need the value in the following two lines. This also means that it is unsafe to access 'tmp' from outside this always block.
Also note that the concatenation hack {A*B} is not needed here, as A*B is assigned to a 64 bit register. This also fits the recommendation in Sec 5.4.1 of IEEE Std 1364-2005:
Multiplication may be performed without losing any overflow bits by assigning the result
to something wide enough to hold it.
However, you said: "The point here is that I don't have access to a 64 bit register".
So I will describe a solution that does not use any Verilog 64 bit registers. This will however not have any impact on the resulting hardware. It will only look different in the Verilog code.
The idea is to access the MSB bits by shifting the result of A*B. The following naive version of this will not work:
ab_msb <= (A*B) >> 32; // Don't do this -- it won't work!
The reason why this does not work is that the width of A*B is determined by the left hand side of the assignment, which is 32 bits. Therefore the result of A*B will only contain the lower 32 bits of the results.
One way of making the bit width of an operation self-determined is by using the concatenation operator:
ab_msb <= {A*B} >> 32; // Don't do this -- it still won't work!
Now the result width of the multiplication is determined using the max. width of its operands. Unfortunately both operands are 32 bit and therefore we still have a 32 bit multiplication. So we need to extend one operand to be 64 bit, e.g. by prepending zeros (I assume unsigned operands):
ab_msb <= {{32'd0, A}*B} >> 32;
Accessing the lsb bits is easy as this is the default behavior anyways:
ab_lsb <= A*B;
So we end up with the following alternative code:
wire [31:0] A, B;
reg [31:0] ab_lsb, ab_msb;
always @(posedge clk) begin
ab_lsb <= A*B;
ab_msb <= {{32'd0, A}*B} >> 32;
end
Xilinx XST 14.2 generates the same RTL netlist for both versions. I strongly recommend the first version as it is much easier to read and understand. If only 'ab_lsb' or 'ab_msb' is used, the synthesis tool will automatically discard the unused bits of 'tmp'. So there is really no difference.
If this is not the information you were looking for, you should probably clarify why and how you "don't have access to 64 bit registers". After all, you try to access the bits [63:32] of a 64 bit value in your code as well. As you can't calculate the upper 32 bits of the product A*B without also performing almost all calculations required for the lower 32 bits, you might be asking for something that is not possible.
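For comparison only, and in C++ rather than Verilog: the same widen-one-operand idea can be sketched as below. This is an analogy with hypothetical function names, not a description of what a synthesis tool does.

```cpp
#include <cassert>
#include <cstdint>

// Full 64-bit product of two 32-bit operands: one operand is widened first,
// otherwise the multiplication would be carried out in 32 bits.
inline std::uint64_t full_product(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint64_t>(a) * b;
}

inline std::uint32_t product_lsb(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(full_product(a, b));        // bits [31:0]
}

inline std::uint32_t product_msb(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(full_product(a, b) >> 32);  // bits [63:32]
}

inline void check_product_halves() {
    assert(product_lsb(0xFFFFFFFFu, 2u) == 0xFFFFFFFEu);
    assert(product_msb(0xFFFFFFFFu, 2u) == 1u);
}
```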
You are mixing blocking and non-blocking assignments here:
{C,Res}<={A*B}[63:32]; //< non-blocking
multState=2; //< blocking
this is considered bad practice.
Not sure that a concatenation operation which is just {A*B} is valid. At best it does nothing.
The way you have encoded it looks like you will end up with 2 hardware multipliers. What makes you say you do not have a 64 bit reg, available? reg does not have to imply flip-flops. If you have 2 32bit regs then you could have 1 64 bit one. I would personally do the multiply on 1 line then split the result up and output as 2 32 bit sections.
However :
x <= (a*b)[31:0] is unfortunately not allowed. If x is 32 bits it will take the LSBs, so all you need is :
x <= (a*b)
To take the MSBs you could try:
reg [31:0] throw_away;
{x, throw_away} <= (a*b) ;

How to use bit operations to replace modulo and division operators?

I have this line of code:
base_num = (arr[j]/base)%256;
This line runs in a loop and the operations "/" and "%" take a lot of resources and time to perform. I would like to change this line and apply bit operations in order to maximize the program performance. How can I do that?
Thanks.
If base is the nth power of two, you can replace division by it with a bitshift of n to the right. Then, since taking the mod 256 of a non-negative integer is equivalent to taking its last 8 bits, you can AND it with 0xFF. Alternatively, you can reverse the operations if you AND it with 255*base (that is, 0xFF << n) and then bitshift n to the right.
base_num = arr[j] >> n;
base_num &= 0xFF;
Of course, any half-decent compiler should be able to do this for you.
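A quick sketch to check the equivalence, under two assumptions that the question does not state: the values are unsigned, and base equals 1 << n with n below 32. For negative signed values, division rounds toward zero while a right shift typically rounds down, so the two forms can differ. The function names are made up.

```cpp
#include <cassert>
#include <cstdint>

// Reference form from the question, with base == 1u << n.
inline std::uint32_t digit_divmod(std::uint32_t value, unsigned n) {
    return (value / (std::uint32_t{1} << n)) % 256u;
}

// Shift first, then keep the low 8 bits.
inline std::uint32_t digit_shift_mask(std::uint32_t value, unsigned n) {
    return (value >> n) & 0xFFu;
}

// Mask first (the mask is 255 * base), then shift.
inline std::uint32_t digit_mask_shift(std::uint32_t value, unsigned n) {
    return (value & (std::uint32_t{0xFF} << n)) >> n;
}

inline void check_digit_forms() {
    const std::uint32_t v = 0x12345678u;
    assert(digit_divmod(v, 8) == 0x56u);
    assert(digit_shift_mask(v, 8) == 0x56u);
    assert(digit_mask_shift(v, 8) == 0x56u);
}
```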
Add -O1 or greater to your compiler options and the compiler will do it for you.
In gcc, -O1 turns on -ftree-slsr which is, according to the docs,
Perform straight-line strength reduction on trees. This recognizes related expressions involving multiplications and replaces them by less expensive calculations when possible.
This will replace the modulo, and the division by base if it is constant. However, if you know that the base will be some non-constant power of two, you can refactor the surrounding code to give you the log2 of that number, and >> by that amount.
You could also just declare base_num as an 8 bit integer:
#include <stdint.h>
uint8_t base_num;
uint16_t crap;
crap = 0xFF00;
base_num = crap;
If your compiler is standards compliant, it will put the value of byte(0xFF00) (0x00) into base_num.
I have yet to meet a compiler that does saturated arithmetic in plain C (or in C++ or C#), but if yours does, it will compute sat_byte(0xFF00), which is greater than 0xFF, and so put 0xFF into base_num.
Keep in mind your compiler will warn you of a loss of precision in this instance. Your compiler may error out in this case (Visual Studio does with Treat Warnings as Errors On). If that happens, you can just do:
base_num = (uint8_t)crap;
but this seems like what you are trying to avoid.
What you are trying to do it seems is to remove the modulus operator as that requires a division and division is the most costly basic arithmetic operation. I generally would not think of this as a bottleneck in any way as any "intelligent" compiler (even in debug mode) would "optimize" it to:
base_num = crap & 0xFF;
on a supported platform (every mainstream processor I've heard of - x86, AMD64, ARM, MIPS), which should be any. I would be dumbfounded to hear of a processor that has no basic AND and OR arithmetic instructions.
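A sketch showing that the narrowing conversion and the explicit mask give the same byte for unsigned values. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Keep the low byte by converting to an 8-bit unsigned type (value modulo 256).
inline std::uint8_t low_byte_by_conversion(std::uint16_t v) {
    return static_cast<std::uint8_t>(v);
}

// Keep the low byte with an explicit AND.
inline std::uint8_t low_byte_by_mask(std::uint16_t v) {
    return static_cast<std::uint8_t>(v & 0xFFu);
}

inline void check_low_byte() {
    assert(low_byte_by_conversion(0xFF00) == 0x00);
    assert(low_byte_by_mask(0xFF00) == 0x00);
    assert(low_byte_by_conversion(0x1234) == low_byte_by_mask(0x1234));
}
```

The explicit cast also silences the loss-of-precision warning mentioned above.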

Is there any advantage to using '<< 1' instead of '* 2'?

I've seen this a couple of times, but it seems to me that using the bitwise shift left hinders readability. Why is it used? Is it faster than just multiplying by 2?
You should use * when you are multiplying, and << when you are bit shifting. They are mathematically equivalent, but have different semantic meanings. If you are building a flag field, for example, use bit shifting. If you are calculating a total, use multiplication.
It is faster on old compilers that don't optimize the * 2 calls by emitting a left shift instruction. That optimization is really easy to detect and any decent compiler already does.
If it affects readability, then don't use it. Always write your code in the most clear and concise fashion first, then if you have speed problems go back and profile and do hand optimizations.
It's used when you're concerned with the individual bits of the data you're working with. For example, if you want to set the upper byte of a word to 0x9A, you would not write
n |= 0x9A * 256
You'd write:
n |= 0x9A << 8
This makes it clearer that you're working with bits, rather than the data they represent.
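As a sketch, assuming a 16-bit word: the helper below (a made-up name) also clears the old upper byte first, which the one-liner above does not do, since |= can only turn bits on.

```cpp
#include <cassert>
#include <cstdint>

// Replaces the upper byte of a 16-bit word and leaves the lower byte alone.
inline std::uint16_t set_upper_byte(std::uint16_t n, std::uint8_t value) {
    const unsigned cleared = n & 0x00FFu;                       // drop the old upper byte
    const unsigned shifted = static_cast<unsigned>(value) << 8; // move the new byte into place
    return static_cast<std::uint16_t>(cleared | shifted);
}

inline void check_set_upper_byte() {
    assert(set_upper_byte(0x0034, 0x9A) == 0x9A34);
    assert(set_upper_byte(0x1234, 0x9A) == 0x9A34);
    assert((0x9A << 8) == 0x9A * 256);  // same number, different intent
}
```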
For some architectures, bit shifting is faster than multiplying. However, any compiler worth its salt will optimize *2 (or any multiplication by a power of 2) to a left bit shift (when a bit shift would be faster).
For readability of values used as bitfields:
enum Flags { UP = (1<<0),
DOWN = (1<<1),
STRANGE = (1<<2),
CHARM = (1<<3),
...
which I think is preferable to either '=1,...,=2,...=4' or '=1,...=2, =2*2,...=2*3' especially if you have 8+ flags.
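A sketch of how such flags are typically combined and tested. It uses only the four enumerators named above; the elided ones are left out, and the helper functions are hypothetical.

```cpp
#include <cassert>

enum Flags : unsigned {
    UP      = 1u << 0,
    DOWN    = 1u << 1,
    STRANGE = 1u << 2,
    CHARM   = 1u << 3
};

// Hypothetical helpers operating on a plain unsigned bit set.
inline unsigned set_flag(unsigned state, Flags f)   { return state | f; }
inline unsigned clear_flag(unsigned state, Flags f) { return state & ~static_cast<unsigned>(f); }
inline bool has_flag(unsigned state, Flags f)       { return (state & f) != 0; }

inline void check_flags() {
    const unsigned state = set_flag(set_flag(0u, UP), CHARM);
    assert(state == 9u);  // bits 0 and 3
    assert(has_flag(state, CHARM));
    assert(!has_flag(state, DOWN));
    assert(clear_flag(state, UP) == 8u);
}
```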
If you are using an old C compiler, it is preferable to use the bitwise shift. For readability you can comment your code though.

What is the fastest way to get the 4 least significant bits in a byte (C++)?

I'm talking about this:
If we have the letter 'M' which is 77 in decimal and 4D in Hex.
I am looking for the fastest way to get D.
I thought about two ways:
Given x is a byte.
x << 4; x >> 4
x %= 16
Any other ways? Which one is faster?
Brevity is nice - explanations are better :)
x &= 0x0f
is, of course, the right answer. It exactly expresses the intent of what you're trying to achieve, and on any sane architecture will always compile down to the minimum number of instructions (i.e. 1). Do use hex rather than decimal whenever you put constants in a bit-wise operator.
x <<= 4; x >>= 4
will only work if your 'byte' is a proper unsigned type. If it was actually a signed char then the second operation might cause sign extension (i.e. your original bit 3 would then appear in bits 4-7 too).
Without optimization this will of course take 2 instructions, but with GCC on OSX, even -O1 will reduce this to the first answer.
x %= 16
Even without the optimizer enabled your compiler will almost certainly do the right thing here and turn that expensive div/mod operation into the first answer. However it can only do that for powers of two, and this paradigm doesn't make it quite so obvious what you're trying to achieve.
I always use x &= 0x0f
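A sketch comparing the three forms on an unsigned byte. The function names are made up, and the shift version casts back to a byte after the left shift because the shift itself is computed as an int.

```cpp
#include <cassert>
#include <cstdint>

inline std::uint8_t low_nibble_and(std::uint8_t x) {
    return static_cast<std::uint8_t>(x & 0x0F);
}

inline std::uint8_t low_nibble_shift(std::uint8_t x) {
    const std::uint8_t up = static_cast<std::uint8_t>(x << 4);  // high nibble falls off here
    return static_cast<std::uint8_t>(up >> 4);
}

inline std::uint8_t low_nibble_mod(std::uint8_t x) {
    return static_cast<std::uint8_t>(x % 16);
}

inline void check_low_nibble() {
    assert(low_nibble_and(0x4D) == 0x0D);
    assert(low_nibble_shift(0x4D) == 0x0D);
    assert(low_nibble_mod(0x4D) == 0x0D);
}
```

All three agree for unsigned input; only the AND form stays correct without extra care if the byte type happens to be signed.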
There are many good answers and some of them are technically the right ones.
In a broader scale, one should understand that C/C++ is not an assembler. Programmer's job is to try to tell to the compiler the intention what you want to achieve. The compiler will pick the best way to do it depending on the architecture and various optimization flags.
x &= 0x0F; is the most clear way to tell the compiler what you want to achieve. If shifting up and down is faster on some architecture, it is the compiler's job to know it and do the right thing.
Single AND operation can do it.
x = (x & 0x0F);
It will depend on the architecture to some extent - shifting up and back down on an ARM is probably the fastest way - however the compiler should do that for you. In fact, all of the suggested methods will probably be optimized to the same code by the compiler.
x = x & 15