Comparing the Most Significant Bit of two numbers: ==, <, <= - bit-manipulation

Is there a quick bit operation to implement msb_equal: a function to check if two numbers have the same most significant bit?
For example, 0b000100 and 0b000111 both have 4 as their most significant bit value, so they are msb_equal. In contrast, 0b001111 has 8 as its MSB value and 0b010000 has 16 as its MSB value, so the pair are not msb_equal.
Similarly, are there fast ways to compute <, and <=?
Examples:
msb_equal(0, 0) => true
msb_equal(2, 3) => true
msb_equal(0, 1) => false
msb_equal(1, 2) => false
msb_equal(3, 4) => false
msb_equal(128, 255) => true
A comment asks why 0 and 1 are not msb_equal. My view on this is that if I write out two numbers in binary, they are msb_equal when the most significant 1 bit in each is the same bit.
Writing out 2 & 3:
2 == b0010
3 == b0011
In this case, the topmost 1 is the same bit in each number.
Writing out 1 & 0:
1 == b0001
0 == b0000
Here, the topmost 1s are not the same bit.
It could be said that as 0 has no top most set bit, msb_equal(0,0) is ambiguous. I'm defining it as true: I feel this is helpful and consistent.

Yes, there are fast bit-based operations to compute MSB equality and the inequalities.
Note on syntax
I'll provide implementations using C-language syntax for the bitwise and logical operators:
| – bitwise OR. || – logical OR.
& – bitwise AND. && – logical AND.
^ – bitwise XOR.
==
bool msb_equal(unsigned l, unsigned r)
{
    return (l ^ r) <= (l & r);
}
<
This is taken from the Wikipedia page on the Z Order Curve (which is awesome):
bool msb_less_than(unsigned l, unsigned r)
{
    return (l < r) && (l < (l ^ r));
}
<=
bool msb_less_than_equal(unsigned l, unsigned r)
{
    return (l < r) || ((l ^ r) <= (l & r));
}

If you know which number is the smallest/biggest one, there is a very fast way to check whether the MSBs are equal. The following code is written in C:
#include <assert.h>
#include <stdbool.h>

bool msb_equal(unsigned small, unsigned big) {
    assert(small <= big);
    return (small ^ big) <= small;
}
This can be useful in cases like when you add numbers to a variable and you want to know when you reached a new power of 2.
Explanation
The trick here is that if the two numbers have the same most significant bit, it will disappear since 1 xor 1 is 0; that makes the xor result smaller than both numbers. If they have different most significant bits, the biggest number's MSB will remain because the smallest number has a 0 in that place and therefore the xor result will be bigger than the smallest number.
When both input numbers are 0, the xor result will be 0 and the function will return true. If you want 0 and 0 to count as having different MSBs then you can replace <= with <.
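For instance, here is a hypothetical usage sketch of the power-of-2-crossing idea mentioned above (the function and variable names are mine, and it assumes the addition does not wrap around, so the small <= big precondition holds):

void add_and_watch(unsigned *counter, unsigned added) {
    unsigned before = *counter;
    *counter += added;                      // before <= *counter, as required
    if (!msb_equal(before, *counter)) {
        // the counter just reached or passed a new power of 2
    }
}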


What does 0b1 mean in C++?

I came across a part of code that I cannot understand.
for (unsigned int i = (x & 0b1); i < x; i+= 2)
{
// body
}
Here, x is from 0 to 5.
What is meant by 0b1? And what would be the answers for e.g. (0 & 0b1), (4 & 0b1), etc.?
0b... is a binary number, just like 0x... is hex and 0... is octal.
Thus 0b1 is the same as 1.
1b0 is illegal; the first character of such a literal must always be 0.
As previous answers said, it is the binary representation of the integer number 1, but they don't seem to have fully answered your question. This has a lot of layers so I'll briefly explain each.
In this context, the ampersand is working as a bitwise AND operator. i & 0b1 is (sometimes) a faster way of checking an integer's parity than i % 2 == 0: the result is 1 when i is odd and 0 when i is even.
Say you have int x = 5 and you'd like to check if it's even using bitwise AND.
In binary, 5 would be represented as 0101. That final 1 actually represents the number 1, and in binary integers it's only present in odd numbers. Let's apply the bitwise AND operator to 5 and 1;
0101
0001
&----
0001
The operator is checking each column, and if both rows are 1, that column of the result will be 1 – otherwise, it will be 0. So, the result (converted back to base10) is 1. Now let's try with an even number. 4 = 0100.
0100
0001
&----
0000
The result is now equal to 0. These rules apply to every single integer no matter its size.
The higher-level layer here is that in C (before C99 added _Bool), there is no boolean datatype, so booleans are represented as integers: 0 is false and any other value is true. This allows for some tricky shorthand: the conditional if (x & 0b1) will only run if x is odd, because odd & 0b1 always equals 1 (true), while even & 0b1 always equals 0 (false).
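As a small illustration of that last point, here is a sketch (assuming a compiler that accepts 0b literals – standard since C++14 and C23, a GCC/Clang extension before that):

#include <stdio.h>

int main(void) {
    for (unsigned int x = 0; x <= 5; ++x) {
        printf("%u is %s\n", x, (x & 0b1) ? "odd" : "even");
        // the loop from the question: i takes every value below x
        // with the same parity as x
        for (unsigned int i = (x & 0b1); i < x; i += 2)
            printf("  i = %u\n", i);
    }
    return 0;
}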

Logical operators and bit manipulation in C

I am trying to do some exercises, but I'm stuck at this point, where I can't understand what's happening and can't find anything related to this particular matter (I found other things about logical operators, but still not enough).
EDIT: Why the downvote? I was pretty explicit. There is no information regarding the type of x, but I assume it is int; the size is not described either. I thought I would discover that by doing the exercise.
a) At least one bit of x is '1';
b) At least one bit of x is '0';
c) At least one bit at the Least Significant Byte of x , is '1';
d) At least one bit at the Least Significant Byte of x , is '0';
I have the solutions, but it would be great to understand them:
a) !!x // What happens here? The '!' is usually NOT in C
b) !!~x // Again the '!' appears... The bitwise NOT operator is '~', and it gives the int's one's complement; no further progress made, unfortunately
c) !!(x & 0xFF) // I've read that this is a bit mask; I think they assume 4 bytes in x, and this applies a mask to the least significant byte?
d) !!(~x & 0xFF) // Well, at this point I'm lost...
I would love not to have to skip classes at college, but I work full time in order to pay the fees :(
You can add brackets around the separate operations and apply them in order, e.g.
!(!(~x))
i.e. !! is two NOTs.
What happens to some value if you perform one NOT is:
If x == 0 then !x == 1, otherwise !x == 0
So, if you perform another NOT, you invert the truth value again, i.e.
If x == 0 then !!x == 0, otherwise !!x == 1
You could see it as getting your value between 0 and 1 in which 0 means: "no bit of x is '1'", and 1 means: "at least one bit of x is '1'".
Also, x & 0xFF takes the least significant byte of your variable. More thoroughly explained here:
What does least significant byte mean?
Assuming x is some unsigned int/short/long/... and you want conditions (if, while...):
a) You'll have to know that a plain value/variable used as a condition (without a == b or similar) is false if it is 0 and true if it is not 0. So, if x is not 0 (true), one ! will turn it into 0, and the other ! into something non-zero again (not necessarily the old value, just not 0). If x was 0, the two !s give 0 again (first not-0, then again 0).
The whole value of x is not 0 if at least one bit is 1...
What you're doing is transforming either 0 to 0, or a value with 1-bits to some value with 1-bits. Not wrong, but... you can just write if(x) instead of if(!!x).
b) ~ switches every 0-bit to 1 and every 1-bit to 0. Now you can again search for a 1, because you originally wanted a 0. Then the same !! thing again...
c and d:
& 0xFF sets all bits except the lowest 8 (the lowest byte) to 0.
The result of A & B is a value where each bit is 1 only if the bits of A and B at the same position are both 1. 0xFF (decimal 255) is the number with exactly the lowest 8 bits set to 1...
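To tie the pieces together, here is a minimal sketch exercising all four expressions (the example value is my own):

#include <stdio.h>

int main(void) {
    int x = 0x100;                    // only bit 8 set: the low byte is all zeros
    printf("a) %d\n", !!x);           // 1: at least one bit of x is 1
    printf("b) %d\n", !!~x);          // 1: at least one bit of x is 0
    printf("c) %d\n", !!(x & 0xFF));  // 0: no bit of the low byte is 1
    printf("d) %d\n", !!(~x & 0xFF)); // 1: at least one bit of the low byte is 0
    return 0;
}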

What does this float formula mean?

OK, I was given a formula to determine a float value between 0.0 and 1.0 using i and j (coordinates of a value in a 2D array). I simply want to know exactly what this formula does. It makes no sense to me. I have implemented it in its own function where the int values of i and j are passed as parameters. Can someone provide an explanation? I don't HAVE to understand it, as he gave it to us just to use as is, but I really want to know.
float col = float (((i & 0x08) == 0) ^ ((j & 0x08) == 0));
What exactly is going on here?
The result, if plotted with i,j as the x,y coordinates, will be a checkerboard with squares of 8x8 pixels.
The i & 0x08 and j & 0x08 are just testing a single bit of each axis. That bit will change state every 8 pixels.
The == 0 will convert each result to a boolean, which evaluates to either a 0 or 1. It also inverts the result, but I don't think that's relevant in the overall formula.
The ^ exclusive-or operator will return 0 if the two are the same, or 1 if they're different. That's how you get the checkerboard - the result alternates each time either i or j crosses a boundary.
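If you want to see the checkerboard for yourself, a quick sketch like this prints it for a small grid (the grid size is my choice):

#include <stdio.h>

int main(void) {
    for (int j = 0; j < 32; ++j) {
        for (int i = 0; i < 32; ++i) {
            float col = (float)(((i & 0x08) == 0) ^ ((j & 0x08) == 0));
            putchar(col > 0.5f ? '#' : '.');   // 8x8 blocks alternate
        }
        putchar('\n');
    }
    return 0;
}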
There are a lot of Boolean and bitwise Boolean operators here. Let me try to answer in parts.
Let's first split it into pieces:
A: (i & 0x08)
Performs a bitwise AND on i – AND is applied to each bit of i and 0x08 (1000 in binary).
B: A == 0
Checks whether the bitwise AND is 0 in every bit position.
Basically checks whether the 4th bit from the right is 0.
C: B ^ B', where B' is the same expression computed for j
XOR – returns 1 if one of them, and not both, is true.
D: float(C)
Easy one, casts C to float.
End result - No idea..
float col = float (((i & 0x08) == 0) ^ ((j & 0x08) == 0));
& 0x08 does a bitwise AND with 8, which means it extracts the 4th least significant bit (1 is the least, then 2, 4, 8) from the numbers i and j. The ^ is an exclusive OR operation: if both bits are the same the result is 0, if they differ the result is 1. That's promoted to a float by the outer float(...), so col becomes 0.0 if the two bits are the same, but 1.0 if they differ.
Why might it be useful? That depends on what i and j are. Presumably, the 4th bit encodes some specific condition or flag (a boolean), for example: whether a person is male or female. The & operation extracts that, then the ^ says "do they differ?". Why might you want a boolean expression converted to a float? Not many good reasons to be honest - you can always let the conversion be done implicitly at the place it's used as in (assuming male/female):
bool hetero = i & 0x08 ^ j & 0x08;
float estimated_children_from_coupling = 1.3 * hetero; // same as hetero ? 1.3 : 0;
In short, "col = 1.0" if exactly one of "i" and "j" has the 0x08 bit (decimal 8) set.
(i & 0x08): not zero if bit 3 of "i" is set, zero otherwise
(i & 0x08) == 0: 1/true if bit 3 of "i" is clear, 0/false if it is set
same for "j"
Hence the "exclusive or" (^ operator) of the 2 sides will be true if either of them is true, but not both, which happens when exactly one of i and j has that bit set.
Finally, it casts the result to float, for whatever reason.

Fast divisibility tests (by 2,3,4,5,.., 16)?

What are the fastest divisibility tests? Say, given a little-endian architecture and a 32-bit signed integer: how can one calculate very quickly whether a number is divisible by 2, 3, 4, 5, ... up to 16?
WARNING: the given code is an EXAMPLE only. Every line is independent! The obvious solution using the modulo operation is slow on many processors that lack DIV hardware (like many ARMs). Some compilers also cannot make such optimizations (say, if the divisor is a function argument or depends on something).
Divisible_by_1 = do();
Divisible_by_2 = if (!(number & 1)) do();
Divisible_by_3 = ?
Divisible_by_4 = ?
Divisible_by_5 = ?
Divisible_by_6 = ?
Divisible_by_7 = ?
Divisible_by_8 = ?
Divisible_by_9 = ?
Divisible_by_10 = ?
Divisible_by_11 = ?
Divisible_by_12 = ?
Divisible_by_13 = ?
Divisible_by_14 = ?
Divisible_by_15 = ?
Divisible_by_16 = if (!(number & 0x0000000F)) do();
and special cases:
Divisible_by_2k = if (!(number & (tk-1))) do(); //tk=2**k=(2*2*2*...) k times
In every case (including divisible by 2):
if (number % n == 0) do();
Anding with a mask of low order bits is just obfuscation, and with a modern compiler will not be any faster than writing the code in a readable fashion.
If you have to test all of the cases, you might improve performance by nesting some of the cases inside the if for another: there's no point in testing for divisibility by 4 if divisibility by 2 has already failed, for example.
It is not a bad idea AT ALL to figure out alternatives to division instructions (which includes modulo on x86/x64) because they are very slow. Slower (or even much slower) than most people realize. Those suggesting "% n" where n is a variable are giving foolish advice because it will invariably lead to the use of the division instruction. On the other hand "% c" (where c is a constant) will allow the compiler to determine the best algorithm available in its repertoire. Sometimes it will be the division instruction but a lot of the time it won't.
In this document Torbjörn Granlund shows that the ratio of clock cycles required for unsigned 32-bit mults:divs is 4:26 (6.5x) on Sandybridge and 3:45 (15x) on K10. for 64-bit the respective ratios are 4:92 (23x) and 5:77 (14.4x).
The "L" columns denote latency. "T" columns denote throughput. This has to do with the processor's ability to handle multiple instructions in parallell. Sandybridge can issue one 32-bit multiplication every other cycle or one 64-bit every cycle. For K10 the corresponding throughput is reversed. For divisions the K10 needs to complete the entire sequence before it may begin another. I suspect it is the same for Sandybridge.
Using the K10 as an example it means that during the cycles required for a 32-bit division (45) the same number (45) of multiplications can be issued and the next-to-last and last one of these will complete one and two clock cycles after the division has completed. A LOT of work can be performed in 45 multiplications.
It is also interesting to note that divs have become less efficient with the evolution from K8-K9 to K10: from 39 to 45 and 71 to 77 clock cycles for 32- and 64-bit.
Granlund's page at gmplib.org and at the Royal Institute of Technology in Stockholm contain more goodies, some of which have been incorporated into the gcc compiler.
As #James mentioned, let the compiler simplify it for you. If n is a constant, any decent compiler is able to recognize the pattern and change it to a more efficient equivalent.
For example, the code
#include <stdio.h>
int main() {
    size_t x;
    scanf("%zu", &x);
    __asm__ volatile ("nop;nop;nop;nop;nop;");
    const char* volatile foo = (x%3 == 0) ? "yes" : "no";
    __asm__ volatile ("nop;nop;nop;nop;nop;");
    printf("%s\n", foo);
    return 0;
}
compiled with g++-4.5 -O3, the relevant part of x%3 == 0 will become
mov rcx,QWORD PTR [rbp-0x8] # rbp-0x8 = &x
mov rdx,0xaaaaaaaaaaaaaaab
mov rax,rcx
mul rdx
lea rax,"yes"
shr rdx,1
lea rdx,[rdx+rdx*2]
cmp rcx,rdx
lea rdx,"no"
cmovne rax,rdx
mov QWORD PTR [rbp-0x10],rax
which, translated back to C code, means
(hi64bit(x * 0xaaaaaaaaaaaaaaab) / 2) * 3 == x ? "yes" : "no"
// equivalent to: x % 3 == 0 ? "yes" : "no"
no division involved here. (Note that 0xaaaaaaaaaaaaaaab == 0x20000000000000001L/3)
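The same check can be written directly in C; a sketch, assuming a compiler that provides unsigned __int128 (GCC/Clang) for the high half of the product:

#include <stdint.h>
#include <stdbool.h>

static bool divisible_by_3(uint64_t x) {
    // high 64 bits of x * 0xaaaaaaaaaaaaaaab
    uint64_t hi = (uint64_t)(((unsigned __int128)x * UINT64_C(0xaaaaaaaaaaaaaaab)) >> 64);
    return (hi >> 1) * 3 == x;   // hi >> 1 is x / 3 rounded down
}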
Edit:
The magic constant 0xaaaaaaaaaaaaaaab can be computed at http://www.hackersdelight.org/magic.htm
For divisors of the form 2n - 1, check http://graphics.stanford.edu/~seander/bithacks.html#ModulusDivision
A bit tongue in cheek, but assuming you get the rest of the answers:
Divisible_by_6 = Divisible_by_3 && Divisible_by_2;
Divisible_by_10 = Divisible_by_5 && Divisible_by_2;
Divisible_by_12 = Divisible_by_4 && Divisible_by_3;
Divisible_by_14 = Divisible_by_7 && Divisible_by_2;
Divisible_by_15 = Divisible_by_5 && Divisible_by_3;
Assume number is unsigned (32-bits). Then the following are very fast ways to compute divisibility up to 16. (I haven't measured but the assembly code indicates so.)
bool divisible_by_2 = number % 2 == 0;
bool divisible_by_3 = number * 2863311531u <= 1431655765u;
bool divisible_by_4 = number % 4 == 0;
bool divisible_by_5 = number * 3435973837u <= 858993459u;
bool divisible_by_6 = divisible_by_2 && divisible_by_3;
bool divisible_by_7 = number * 3067833783u <= 613566756u;
bool divisible_by_8 = number % 8 == 0;
bool divisible_by_9 = number * 954437177u <= 477218588u;
bool divisible_by_10 = divisible_by_2 && divisible_by_5;
bool divisible_by_11 = number * 3123612579u <= 390451572u;
bool divisible_by_12 = divisible_by_3 && divisible_by_4;
bool divisible_by_13 = number * 3303820997u <= 330382099u;
bool divisible_by_14 = divisible_by_2 && divisible_by_7;
bool divisible_by_15 = number * 4008636143u <= 286331153u;
bool divisible_by_16 = number % 16 == 0;
Regarding divisibility by d the following rules hold:
When d is a power of 2:
As pointed out by James Kanze, you can use is_divisible_by_d = (number % d == 0). Compilers are clever enough to implement this as (number & (d - 1)) == 0 which is very efficient but obfuscated.
However, when d is not a power of 2 it looks like the obfuscations shown above are more efficient than what current compilers do. (More on that later).
When d is odd:
The technique takes the form is_divisible_by_d = number * a <= b, where a and b are cleverly obtained constants. Notice that all we need is 1 multiplication and 1 comparison.
When d is even but not a power of 2:
Then write d = p * q, where p is a power of 2 and q is odd, and use the "tongue in cheek" suggestion by unpythonic, that is, is_divisible_by_d = is_divisible_by_p && is_divisible_by_q. Again, only 1 multiplication (in the calculation of is_divisible_by_q) is performed.
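A sketch of how such constants a and b can be derived for any odd d (the helper names below are mine, not from the articles): a is the multiplicative inverse of d modulo 2^32, obtained here by Newton iteration, and b is UINT32_MAX / d.

#include <stdint.h>
#include <stdbool.h>

static uint32_t inverse_mod_2_32(uint32_t d) {   // d must be odd
    uint32_t a = d;                  // correct to 3 bits, since (d*d) mod 8 == 1
    for (int i = 0; i < 5; ++i)      // each step doubles the number of correct bits
        a *= 2u - d * a;
    return a;
}

static bool divisible_by_odd(uint32_t n, uint32_t d) {
    return n * inverse_mod_2_32(d) <= UINT32_MAX / d;
}

For d = 3 this reproduces the constants above: a = 2863311531 and b = 1431655765.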
Many compilers (I've tested clang 5.0.0, gcc 7.3, icc 18 and msvc 19 using godbolt) replace number % d == 0 by (number / d) * d == number. They use a clever technique (see references in Olof Forshell's answer) to replace the division by a multiplication and a bit shift. They end up doing 2 multiplications. In contrast the techniques above perform only 1 multiplication.
Update 01-Oct-2018
Looks like the algorithm above is coming to GCC soon (already in trunk):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853
GCC's implementation seems even more efficient. Indeed, the implementation above has three parts: 1) divisibility by the divisor's even part; 2) divisibility by the divisor's odd part; 3) && to connect the results of the two previous steps. By using an assembler instruction which is not efficiently available in standard C++ (ror), GCC wraps up the three parts into a single one which is very similar to the check for divisibility by the odd part. Great stuff! Having this implementation available, it's better (for both clarity and performance) to fall back to % at all times.
Update 05-May-2020
My articles on the subject have been published:
Quick Modular Calculations (Part 1), Overload Journal 154, December 2019, pages 11-15.
Quick Modular Calculations (Part 2), Overload Journal 155, February 2020, pages 14-17.
Quick Modular Calculations (Part 3), Overload Journal 156, April 2020, pages 10-13.
First of all, I remind you that a number in the form bn...b2b1b0 in binary has value:
number = bn*2^n+...+b2*4+b1*2+b0
Now, when you say number%3, you have:
number%3 =3= bn*(2^n % 3)+...+b2*1+b1*2+b0
(I used =3= to indicate congruence modulo 3). Note also that b1*2 =3= -b1*1
Now I will write all 16 divisions using + and - and possibly multiplication (note that a multiplication could be written as shifts or sums of the same value shifted to different locations; for example, 5*x means x+(x<<2), in which you compute x only once).
Let's call the number n and let's say Divisible_by_i is a boolean value. As an intermediate value, imagine Congruence_by_i is a value congruent to n modulo i.
Also, lets say n0 means bit zero of n, n1 means bit 1 etc, that is
ni = (n >> i) & 1;
Congruence_by_1 = 0
Congruence_by_2 = n&0x1
Congruence_by_3 = n0-n1+n2-n3+n4-n5+n6-n7+n8-n9+n10-n11+n12-n13+n14-n15+n16-n17+n18-n19+n20-n21+n22-n23+n24-n25+n26-n27+n28-n29+n30-n31
Congruence_by_4 = n&0x3
Congruence_by_5 = n0+2*n1-n2-2*n3+n4+2*n5-n6-2*n7+n8+2*n9-n10-2*n11+n12+2*n13-n14-2*n15+n16+2*n17-n18-2*n19+n20+2*n21-n22-2*n23+n24+2*n25-n26-2*n27+n28+2*n29-n30-2*n31
Congruence_by_7 = n0+2*n1+4*n2+n3+2*n4+4*n5+n6+2*n7+4*n8+n9+2*n10+4*n11+n12+2*n13+4*n14+n15+2*n16+4*n17+n18+2*n19+4*n20+n21+2*n22+4*n23+n24+2*n25+4*n26+n27+2*n28+4*n29+n30+2*n31
Congruence_by_8 = n&0x7
Congruence_by_9 = n0+2*n1+4*n2-n3-2*n4-4*n5+n6+2*n7+4*n8-n9-2*n10-4*n11+n12+2*n13+4*n14-n15-2*n16-4*n17+n18+2*n19+4*n20-n21-2*n22-4*n23+n24+2*n25+4*n26-n27-2*n28-4*n29+n30+2*n31
Congruence_by_11 = n0+2*n1+4*n2+8*n3+5*n4-n5-2*n6-4*n7-8*n8-5*n9+n10+2*n11+4*n12+8*n13+5*n14-n15-2*n16-4*n17-8*n18-5*n19+n20+2*n21+4*n22+8*n23+5*n24-n25-2*n26-4*n27-8*n28-5*n29+n30+2*n31
Congruence_by_13 = n0+2*n1+4*n2+8*n3+3*n4+6*n5-n6-2*n7-4*n8-8*n9-3*n10-6*n11+n12+2*n13+4*n14+8*n15+3*n16+6*n17-n18-2*n19-4*n20-8*n21-3*n22-6*n23+n24+2*n25+4*n26+8*n27+3*n28+6*n29-n30-2*n31
Congruence_by_16 = n&0xF
Or when factorized:
Congruence_by_1 = 0
Congruence_by_2 = n&0x1
Congruence_by_3 = (n0+n2+n4+n6+n8+n10+n12+n14+n16+n18+n20+n22+n24+n26+n28+n30)-(n1+n3+n5+n7+n9+n11+n13+n15+n17+n19+n21+n23+n25+n27+n29+n31)
Congruence_by_4 = n&0x3
Congruence_by_5 = n0+n4+n8+n12+n16+n20+n24+n28-(n2+n6+n10+n14+n18+n22+n26+n30)+2*(n1+n5+n9+n13+n17+n21+n25+n29-(n3+n7+n11+n15+n19+n23+n27+n31))
Congruence_by_7 = n0+n3+n6+n9+n12+n15+n18+n21+n24+n27+n30+2*(n1+n4+n7+n10+n13+n16+n19+n22+n25+n28+n31)+4*(n2+n5+n8+n11+n14+n17+n20+n23+n26+n29)
Congruence_by_8 = n&0x7
Congruence_by_9 = n0+n6+n12+n18+n24+n30-(n3+n9+n15+n21+n27)+2*(n1+n7+n13+n19+n25+n31-(n4+n10+n16+n22+n28))+4*(n2+n8+n14+n20+n26-(n5+n11+n17+n23+n29))
// and so on
If these values end up negative, add i to them until they become positive.
Now what you should do is recursively feed these values through the same process we just did, until Congruence_by_i becomes less than i (and obviously >= 0). This is similar to what we do when we want to find the remainder of a number modulo 3 or 9, remember? Sum up the digits; if the result has more than one digit, sum up the digits of the result again, until you get only one digit.
Now for i = 1, 2, 3, 4, 5, 7, 8, 9, 11, 13, 16:
Divisible_by_i = (Congruence_by_i == 0);
And for the rest:
Divisible_by_6 = Divisible_by_3 && Divisible_by_2;
Divisible_by_10 = Divisible_by_5 && Divisible_by_2;
Divisible_by_12 = Divisible_by_4 && Divisible_by_3;
Divisible_by_14 = Divisible_by_7 && Divisible_by_2;
Divisible_by_15 = Divisible_by_5 && Divisible_by_3;
Edit: Note that some of the additions could be avoided from the very beginning. For example, n0+2*n1+4*n2 is the same as n&0x7; similarly, n3+2*n4+4*n5 is (n>>3)&0x7, and thus with each formula you don't have to extract each bit individually. I wrote it like that for the sake of clarity and similarity of operation. To optimize each of the formulas, you should work on it yourself: group operands and factorize.
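For example, the factorized Congruence_by_3 above collapses to two population counts; a sketch using GCC/Clang's __builtin_popcount:

#include <stdint.h>

int congruence_by_3(uint32_t n) {
    int even_bits = __builtin_popcount(n & 0x55555555u); // n0+n2+...+n30
    int odd_bits  = __builtin_popcount(n & 0xAAAAAAAAu); // n1+n3+...+n31
    return even_bits - odd_bits;  // congruent to n mod 3; may be negative
}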
The LCM of these numbers seems to be 720720. It's quite small, so you can perform a single modulus operation and use the remainder as the index into a precomputed LUT.
You should just use (i % N) == 0 as your test.
My compiler (a fairly old version of gcc) generated good code for all the cases I tried.
Where bit tests were appropriate it did that. Where N was a constant it didn't generate the obvious "divide" for any case, it always used some "trick".
Just let the compiler generate the code for you, it will almost certainly know more about the architecture of the machine than you do :) And these are easy optimisations where you are unlikely to think up something better than the compiler does.
It's an interesting question, though. I can't list the tricks used by the compiler for each constant, as I have to compile on a different computer... but I'll update this reply later on if nobody beats me to it :)
This probably won't help you in code, but there's a neat trick which can help do this in your head in some cases:
For divide by 3: For a number represented in decimal, you can sum all the digits, and check if the sum is divisible by 3.
Example: 12345 => 1+2+3+4+5 = 15 => 1+5 = 6, which is divisible by 3 (3 x 4115 = 12345).
More interestingly, the same technique works for all factors of X-1, where X is the base in which the number is represented. So for decimal numbers, you can check divisibility by 3 or 9. For hex, you can check divisibility by 3, 5, or 15. And for octal numbers, you can check divisibility by 7.
In a previous question, I showed a fast algorithm to check in base N for divisors that are factors of N-1. Base transformations between different powers of 2 are trivial; that's just bit grouping.
Therefore, checking for 3 is easy in base 4; checking for 5 is easy in base 16, and checking for 7 (and 9) is easy in base 64.
Non-prime divisors are trivial, so only 11 and 13 are hard cases. For 11, you could use base 1024, but at that point it's not really efficient for small integers.
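As a concrete sketch of "checking for 3 is easy in base 4" (the function name is mine): bit pairs are the base-4 digits, and since 4 mod 3 == 1 the digit sum preserves the residue.

#include <stdint.h>
#include <stdbool.h>

static bool divisible_by_3_base4(uint32_t x) {
    while (x > 3) {                 // reduce until a single base-4 digit remains
        uint32_t s = 0;
        for (uint32_t t = x; t; t >>= 2)
            s += t & 3;             // add the next base-4 digit
        x = s;
    }
    return x == 0 || x == 3;        // 3 is congruent to 0 (mod 3)
}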
A method that can help modulo reduction of all integer values uses bit-slicing and popcount.
mod3 = pop(x & 0x55555555) + (pop(x & 0xaaaaaaaa) << 1); // <- one term is shared!
mod5 = pop(x & 0x99999999) + (pop(x & 0xaaaaaaaa) << 1) + (pop(x & 0x44444444) << 2);
mod7 = pop(x & 0x49249249) + (pop(x & 0x92492492) << 1) + (pop(x & 0x24924924) << 2);
modB = pop(x & 0x5d1745d1) + (pop(x & 0xba2e8ba2) << 1) +
       (pop(x & 0x294a5294) << 2) + (pop(x & 0x0681a068) << 3);
modD = pop(x & 0x91b91b91) + (pop(x & 0xb2cb2cb2) << 1) +
       (pop(x & 0x64a64a64) << 2) + (pop(x & 0xc85c85c8) << 3);
(The parentheses matter: in C, + binds tighter than <<.)
The maximum values for these variables are 48, 80, 73, 168 and 203, which all fit into 8-bit variables. The second round can be carried out in parallel (or some LUT method can be applied):
mod3 mod3 mod5 mod5 mod5 mod7 mod7 mod7 modB modB modB modB modD modD modD modD
mask 0x55 0xaa 0x99 0xaa 0x44 0x49 0x92 0x24 0xd1 0xa2 0x94 0x68 0x91 0xb2 0x64 0xc8
shift *1 *2 *1 *2 *4 *1 *2 *4 *1 *2 *4 *8 *1 *2 *4 *8
sum <-------> <------------> <-----------> <-----------------> <----------------->
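Here is a runnable sketch of the mod-3 column, with the popcount done by GCC/Clang's __builtin_popcount and the follow-up rounds performed by the same bit-slicing on the now-small value (in place of a LUT):

#include <stdint.h>
#include <stdbool.h>

static bool divisible_by_3_slice(uint32_t x) {
    unsigned r = __builtin_popcount(x & 0x55555555u)
               + (__builtin_popcount(x & 0xaaaaaaaau) << 1);                  // r <= 48
    r = __builtin_popcount(r & 0x15u) + (__builtin_popcount(r & 0x2au) << 1); // r <= 9
    r = __builtin_popcount(r & 0x05u) + (__builtin_popcount(r & 0x0au) << 1); // r <= 4
    return r == 0 || r == 3;   // r is congruent to x mod 3
}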
You can replace division by a non-power-of-two constant by a multiplication, essentially multiplying by the reciprocal of your divisor. The details to get the exact result by this method are complicated.
Hacker's Delight discusses this at length in chapter 10 (unfortunately not available online).
From the quotient you can get the modulus by another multiplication and a subtraction.
One thing to consider: since you only care about divisibility up to 16, you really only need to check divisibility by the primes up to 16. These are 2, 3, 5, 7, 11, and 13.
Divide your number by each of the primes, keeping track with a boolean (such as div2 = true). The numbers two and three are special cases. If div3 is true, try dividing by 3 again, setting div9. Two and its powers are very simple (note: '&' is one of the fastest things a processor can do):
if (n & 1) == 0:
    div2 = true
if (n & 3) == 0:
    div4 = true
if (n & 7) == 0:
    div8 = true
if (n & 15) == 0:
    div16 = true
You now have the booleans div2, div3, div4, div5, div7, div8, div9, div11, div13, and div16. All other numbers are combinations; for instance, div6 is the same as (div2 && div3).
So, you only need to do either 5 or 6 actual divisions (6 only if your number is divisible by 3).
For myself, I would probably use bits in a single register for my booleans; for instance, bit_0 means div2. I can then use masks:
if (flags & (div2+div3)) == (div2 + div3): do_6()
Note that div2+div3 can be a precomputed constant. If div2 is bit 0 and div3 is bit 1, then div2+div3 == 3. This makes the above 'if' optimize to:
if (flags & 3) == 3: do_6()
So now... mod without a divide:
def mod(n, m):
    i = 0
    while m <= n:      # note: <=, so that exact multiples reduce to 0
        m <<= 1
        i += 1
    while i > 0:
        m >>= 1
        if m <= n: n -= m
        i -= 1
    return n
div3 = mod(n,3) == 0
...
btw: the worst case for the above code is 31 times through either loop for a 32-bit number
FYI: Just looked at Msalter's post, above. His technique can be used instead of mod(...) for some of the primes.
Fast tests for divisibility depend heavily on the base in which the number is represented. When the base is 2, I think you can only do "fast tests" for divisibility by powers of 2. A binary number is divisible by 2^n iff the last n binary digits of that number are 0. For other tests, I don't think you can generally find anything faster than %.
A bit of evil, obfuscated bit-twiddling can get you divisibility by 15.
For a 32-bit unsigned number:
unsigned mod_15ish(unsigned int x) {
    // returns a number between 0 and 21 that is either x % 15
    // or 15 + (x % 15), and returns 0 only for x == 0
    x = (x & 0xF0F0F0F) + ((x >> 4) & 0xF0F0F0F);
    x = (x & 0xFF00FF) + ((x >> 8) & 0xFF00FF);
    x = (x & 0xFFFF) + ((x >> 16) & 0xFFFF);
    // *1
    x = (x & 0xF) + ((x >> 4) & 0xF);
    return x;
}

bool Divisible_by_15(unsigned int x) {
    return (x == 0) || (mod_15ish(x) == 15);
}
You can build similar divisibility routines for 3 and 5 based on mod_15ish.
If you have 64-bit unsigned ints to deal with, extend each constant above the *1 line in the obvious way, and add a line above the *1 line to do a right shift by 32 bits with a mask of 0xFFFFFFFF. (The last two lines can stay the same.) mod_15ish then obeys the same basic contract, but the return value is now between 0 and 31. (So what's maintained is that x % 15 == mod_15ish(x) % 15.)
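Written out, the 64-bit variant described above would look like this sketch (the names are mine):

#include <stdint.h>
#include <stdbool.h>

unsigned mod_15ish_64(uint64_t x) {
    x = (x & 0x0F0F0F0F0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F0F0F0F0F);
    x = (x & 0x00FF00FF00FF00FF) + ((x >> 8) & 0x00FF00FF00FF00FF);
    x = (x & 0x0000FFFF0000FFFF) + ((x >> 16) & 0x0000FFFF0000FFFF);
    x = (x & 0xFFFFFFFF) + ((x >> 32) & 0xFFFFFFFF);  // the extra line
    // *1
    x = (x & 0xF) + ((x >> 4) & 0xF);
    return (unsigned)x;
}

bool Divisible_by_15_64(uint64_t x) {
    unsigned r = mod_15ish_64(x);
    return x == 0 || r == 15 || r == 30;   // r % 15 == 0 for a nonzero multiple
}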
Here are some tips I haven't seen anyone else suggest yet:
One idea is to use a switch statement or to precompute a small table. Then, any decent optimizer can simply index each case directly. Note, though, that the switch index must actually determine the answer: n % 8 can only decide divisibility by 2, 4, and 8. For the full set 2..7 you would need the remainder modulo their LCM (n % 420), or the LUT-over-720720 idea from another answer. For example:
// tests for (2,4,8)
switch (n % 8)
{
    case 0: do(2); do(4); do(8); break;
    case 2:
    case 6: do(2); break;
    case 4: do(2); do(4); break;
    default: break;
}
Your application is a bit ambiguous, but you may only need to check the primes up to 16: 2, 3, 5, 7, 11, and 13. Every other number up to 16 is a product of these primes (prime powers such as 4, 8, 9, and 16 just need the prime's test applied more than once). Just a thought.

How can I test whether a number is a power of 2?

I need a function like this:
// return true if 'n' is a power of 2, e.g.
// is_power_of_2(16) => true
// is_power_of_2(3) => false
bool is_power_of_2(int n);
Can anyone suggest how I could write this?
(n & (n - 1)) == 0 is best. However, note that it will incorrectly return true for n=0, so if that is possible, you will want to check for it explicitly.
http://www.graphics.stanford.edu/~seander/bithacks.html has a large collection of clever bit-twiddling algorithms, including this one.
A power of two will have just one bit set (for unsigned numbers). Something like
bool powerOfTwo = !(x == 0) && !(x & (x - 1));
Will work fine; one less than a power of two is all 1s in the less significant bits, so must AND to 0 bitwise.
As I was assuming unsigned numbers, the == 0 test (that I originally forgot, sorry) is adequate. You may want a > 0 test if you're using signed integers.
Powers of two in binary look like this:
1: 0001
2: 0010
4: 0100
8: 1000
Note that there is always exactly 1 bit set. The only exception is with a signed integer. e.g. An 8-bit signed integer with a value of -128 looks like:
10000000
So after checking that the number is greater than zero, we can use a clever little bit hack to test that one and only one bit is set.
bool is_power_of_2(int x) {
    return x > 0 && !(x & (x - 1));
}
For more bit twiddling see here.
Approach #1:
Divide the number by 2 recursively to check it.
Time complexity: O(log2 n).
Approach #2:
Bitwise AND of the number with its immediate predecessor (n - 1) should be equal to ZERO.
Example: Number = 8
Binary of 8: 1 0 0 0
Binary of 7: 0 1 1 1 and the bitwise AND of both the numbers is 0 0 0 0 = 0.
Time complexity : O(1).
Approach #3:
Bitwise XOR of the number with its immediate predecessor (n - 1) should be the sum of both numbers.
Example: Number = 8
Binary of 8: 1 0 0 0
Binary of 7: 0 1 1 1 and the bitwise XOR of both the numbers is 1 1 1 1 = 15.
Time complexity : O(1).
http://javaexplorer03.blogspot.in/2016/01/how-to-check-number-is-power-of-two.html
In C++20 there is std::has_single_bit which you can use for exactly this purpose if you don't need to implement it yourself:
#include <bit>
static_assert(std::has_single_bit(16));
static_assert(!std::has_single_bit(15));
Note that this requires the argument to be an unsigned integer type.
bool is_power_of_2(int i) {
    if (i <= 0) {
        return 0;
    }
    return !(i & (i - 1));
}
for any power of 2, the following also holds.
(n & -n) == n
(The parentheses matter: == binds tighter than & in C.)
NOTE: The condition is also true for n = 0, though 0 is not a power of 2.
Reason why this works is:
-n is the two's complement of n. Compared to n, -n has every bit to the left of the rightmost set bit of n flipped. For powers of 2 there is only one set bit.
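A sketch with the n > 0 guard from the note added:

#include <stdbool.h>

bool is_power_of_2(int n) {
    // e.g. n = 8: -8 = ...11111000, 8 & -8 == 8  -> power of two
    //      n = 6: -6 = ...11111010, 6 & -6 == 2  -> not a power of two
    return n > 0 && (n & -n) == n;
}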
This is probably the fastest, if you are using GCC. It uses only a POPCNT CPU instruction and one comparison. The binary representation of any power-of-2 number has only one bit set; the other bits are always zero. So we count the number of set bits with POPCNT, and if the count equals 1, the number is a power of 2. I don't think there are any faster methods. And it's very simple once you have understood it:
if(1==__builtin_popcount(n))
The following would be faster than the most up-voted answer, due to Boolean short-circuiting and the fact that comparison is slow:
int isPowerOfTwo(unsigned int x)
{
    return x && !(x & (x - 1));
}
If you know that x cannot be 0, then
int isPowerOfTwo(unsigned int x)
{
    return !(x & (x - 1));
}
return n > 0 && 0 == (1 << 30) % n;
This isn't the fastest or shortest way, but I think it is very readable. So I would do something like this:
bool is_power_of_2(int n) {
    int bitCounter = 0;
    while (n > 0) {          // n > 0: a negative n would arithmetic-shift forever
        if ((n & 1) == 1) {
            ++bitCounter;
        }
        n >>= 1;
    }
    return (bitCounter == 1);
}
This works since binary is based on powers of two. Any number with only one bit set must be a power of two.
If you have a modern Intel processor with the Bit Manipulation Instructions, then you can perform the following. It omits the straight C/C++ code because others have already answered it, but you need it if BMI is not available or enabled.
bool IsPowerOf2_32(uint32_t x)
{
#if __BMI__ || ((_MSC_VER >= 1900) && defined(__AVX2__))
    return (x > 0) && (_blsr_u32(x) == 0);
#endif
    // Fallback to C/C++ code
}

bool IsPowerOf2_64(uint64_t x)
{
#if __BMI__ || ((_MSC_VER >= 1900) && defined(__AVX2__))
    return (x > 0) && (_blsr_u64(x) == 0);
#endif
    // Fallback to C/C++ code
}
(BLSR clears the lowest set bit, so the result is 0 exactly when a positive x has a single bit set.)
GCC, ICC, and Clang signal BMI support with __BMI__. It's available in Microsoft compilers in Visual Studio 2015 and above when AVX2 is available and enabled. For the headers you need, see Header files for SIMD intrinsics.
I usually guard the _blsr_u64 with __LP64__ in case of compiling on i686. Clang needs a little workaround because it uses slightly different intrinsic symbol names:
#if defined(__GNUC__) && defined(__BMI__)
# if defined(__clang__)
# ifndef _tzcnt_u32
# define _tzcnt_u32(x) __tzcnt_u32(x)
# endif
# ifndef _blsr_u32
# define _blsr_u32(x) __blsr_u32(x)
# endif
# ifdef __x86_64__
# ifndef _tzcnt_u64
# define _tzcnt_u64(x) __tzcnt_u64(x)
# endif
# ifndef _blsr_u64
# define _blsr_u64(x) __blsr_u64(x)
# endif
# endif // x86_64
# endif // Clang
#endif // GNUC and BMI
Can you tell me a good web site where this sort of algorithm can be found?
This website is often cited: Bit Twiddling Hacks.
Here is another method, in this case using | instead of & :
bool is_power_of_2(int x) {
    return x > 0 && (x << 1 == (x | (x - 1)) + 1);
}
It is possible through C++:
#include <cmath>

int IsPowOf2(int z) {
    double x = log2(z);
    int y = x;
    if (x == (double)y)
        return 1;
    else
        return 0;
}
Another way to go (maybe not fastest) is to determine if ln(x) / ln(2) is a whole number.
This is the bit-shift method in T-SQL (SQL Server):
SELECT CASE WHEN @X > 0 AND (@X) & (@X - 1) = 0 THEN 1 ELSE 0 END AS IsPowerOfTwo
It is a lot faster than doing a logarithm four times (one set of operations to get the decimal result, a second set to get the integer value and compare).