XOR conditionally without branching if the lowest bit is set - c++

I have three unsigned 32-bit integers, say a, b, and c. If the lowest bit of b is 1, I want to XOR c with a, and store the result into c. We can do this in the following way:
#include <cassert>
int main()
{
    // Some values for a and c
    unsigned a = 16;
    unsigned c = 25;
    unsigned b = 5; // 101_2
    if (b & 1)
    {
        c ^= a;
    }
    assert(c == 9);
}
Can I do this conditionally without branching, that is, with no if-statements?

There are lots of ways to do this.
Here's another one, with no multiplication and only 4 operations (since b is unsigned, -(b & 1) is either 0 or all bits set):
c ^= a & (-(b & 1));

This should work, since b & 1 is either 0 or 1:
c ^= a * (b & 1);

This avoids the if statement; whether the compiled code is also free of branches is something you have to check in your compiler's assembly output:
c ^= ~((b & 1) - 1) & a;
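All three one-liners work by turning the lowest bit of b into either 0 or a before the XOR. A small sketch that puts them next to the branching version from the question so they can be compared (the function names are hypothetical, and 32-bit unsigned values are assumed):

```cpp
#include <cassert>
#include <cstdint>

// Reference version from the question: XOR only when the lowest bit of b is set.
uint32_t xor_branch(uint32_t a, uint32_t b, uint32_t c)
{
    if (b & 1u) c ^= a;
    return c;
}

// 0u - (b & 1) is 0 or 0xFFFFFFFF (unsigned wrap-around), so the AND yields 0 or a.
uint32_t xor_negate(uint32_t a, uint32_t b, uint32_t c)
{
    return c ^ (a & (0u - (b & 1u)));
}

// (b & 1) is 0 or 1, so the product is 0 or a.
uint32_t xor_multiply(uint32_t a, uint32_t b, uint32_t c)
{
    return c ^ (a * (b & 1u));
}

// (b & 1) - 1 is 0xFFFFFFFF or 0; its complement is 0 or 0xFFFFFFFF.
uint32_t xor_complement(uint32_t a, uint32_t b, uint32_t c)
{
    return c ^ (~((b & 1u) - 1u) & a);
}
```

With the values from the question (a = 16, b = 5, c = 25) every variant returns 9. Compilers often turn the branching version into branch-free code as well, so reading the generated assembly or measuring is the only reliable way to pick one.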


Looping over bits c++

I came across a question about reversing the bits of an unsigned integer and tried a different approach. However, I'm not very familiar with how bitwise operators work. Can someone please point out what is fundamentally wrong here?
unsigned int reverse(unsigned int A)
{
    unsigned int c = 0;
    int a = 0;
    while (a < 32)
    {
        c = c << 1;
        c = c | ( A & (1 << a) );
        a++;
    }
    return c;
}
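You shift to the left in both places (c << 1 and 1 << a). A & (1 << a) leaves the extracted bit at position a, so OR-ing it into c puts it back where it came from instead of at the low end of c. Shift in the other direction instead: take bit a with (A >> a) & 1, which always lands in bit 0, i.e. c = (c << 1) | ((A >> a) & 1);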
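A minimal corrected sketch of the loop, assuming a 32-bit unsigned type (reverse_bits is a hypothetical name). Each step moves bit a of the input down to bit 0 before it is shifted up into the result, so bit 0 ends up in bit 31 after the remaining 31 shifts:

```cpp
#include <cassert>
#include <cstdint>

// Reverses the bit order of a 32-bit value: bit 0 becomes bit 31, and so on.
uint32_t reverse_bits(uint32_t A)
{
    uint32_t c = 0;
    for (int a = 0; a < 32; ++a)
    {
        c <<= 1;             // make room for the next bit
        c |= (A >> a) & 1u;  // bit a of A, moved down to bit 0
    }
    return c;
}
```

Shifting A instead of the constant 1 also avoids 1 << 31 on a signed int, which is undefined behaviour before C++20.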

How can I calculate (A*B)%C for A,B,C <= 10^18, in C++?

For example, A=10^17, B=10^17, C=10^18.
The product A*B exceeds the limit of long long int.
Also, writing ((A%C)*(B%C))%C doesn't help.
Assuming you want to stay within 64-bit integer operations, you can use binary long multiplication (double-and-add), which boils down to a series of additions and doublings modulo C. This means you also need overflow-proof versions of those operations, but they are relatively simple.
Here is some Java code that assumes a and b are already non-negative and less than m (m plays the role of C). If not, it's easy to reduce them beforehand.
// assumes a and b are already less than m
public static long addMod(long a, long b, long m) {
    if (a + b < 0)
        return (a - m) + b;  // a + b overflowed; subtract m first to avoid it
    else if (a + b >= m)
        return a + b - m;
    else
        return a + b;
}

// assumes a and b are already less than m
public static long multiplyMod(long a, long b, long m) {
    if (b == 0 || a <= Long.MAX_VALUE / b)
        return a * b % m;  // a * b fits in a long iff a <= Long.MAX_VALUE / b
    // a * b would overflow; use binary long multiplication (double-and-add):
    long result = 0;
    if (a > b) {  // loop over the bits of the smaller operand
        long c = b;
        b = a;
        a = c;
    }
    while (a > 0) {
        if ((a & 1) != 0) {
            result = addMod(result, b, m);
        }
        a >>= 1;
        // compute (b << 1) % m without overflow
        b -= m - b;  // equivalent to b = 2 * b - m
        if (b < 0)
            b += m;
    }
    return result;
}
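Since the question asks for C++, here is a sketch of the same double-and-add idea with unsigned 64-bit integers (mulmod is a hypothetical name). It assumes 0 < m < 2^63, which covers C <= 10^18. Because both operands are reduced below m, the sum of two of them stays below 2^64, so the overflow checks from addMod are not needed:

```cpp
#include <cassert>
#include <cstdint>

// Computes (a * b) % m without overflow, for 0 < m < 2^63.
uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    assert(m > 0 && m < (UINT64_C(1) << 63));
    a %= m;
    b %= m;
    uint64_t result = 0;
    while (b > 0)
    {
        if (b & 1)
        {
            result += a;              // result, a < m < 2^63: no overflow
            if (result >= m) result -= m;
        }
        a += a;                       // double a modulo m
        if (a >= m) a -= m;
        b >>= 1;                      // move on to the next bit of b
    }
    return result;
}
```

For the example in the question, mulmod(10^17, 10^17, 10^18) returns 0. On GCC and Clang, the non-standard unsigned __int128 type is a shorter alternative: (unsigned __int128)a * b % m.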
You can use an arbitrary-precision library such as The GNU Multiple Precision Arithmetic Library (https://gmplib.org/) or the C++ Big Integer Library (https://mattmccutchen.net/bigint/).
If you only work with powers of 10, you could create a simple class with two members, a base and an exponent of 10, so A = 10^17 would be {1, 17}. Implementing addition, subtraction, multiplication and division is then straightforward (multiplication, for example, just multiplies the bases and adds the exponents), and so is printing.
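A minimal sketch of that representation, covering multiplication and printing only (Pow10Number, multiply and pow10_to_string are hypothetical names; the exponent is assumed non-negative and the product of the bases is assumed to fit in 64 bits):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Represents base * 10^exp, e.g. {1, 17} for 10^17.
struct Pow10Number
{
    unsigned long long base;
    int exp;
};

// (b1 * 10^e1) * (b2 * 10^e2) = (b1 * b2) * 10^(e1 + e2)
Pow10Number multiply(Pow10Number x, Pow10Number y)
{
    return {x.base * y.base, x.exp + y.exp};
}

// Prints the number in decimal: the base followed by exp zeros.
std::string pow10_to_string(Pow10Number x)
{
    return std::to_string(x.base) + std::string(static_cast<std::size_t>(x.exp), '0');
}
```

For instance, multiply({1, 17}, {1, 17}) gives {1, 34}, i.e. 10^34, without any overflow.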

How to write and read bytes from unsigned variable

Here is what I'm trying to do:
I have two integers
int a = 0; // can be 0 or 1
int b = 3; // can be 0, 1, 2 or 3
I also want an
unsigned short c
to store these variables inside it.
For example, if I store a inside c, it should look like this:
00000000
^ here is a
Then I need to store b inside c, and it should look like the following:
011000000
^^ here is b.
I would also like to read those numbers back after writing them.
How can I do this?
Thanks for your suggestions.
Assuming those are binary representations of the numbers, and that you really meant to have five zeros to the right of b:
01100000
^^ here is b
(as written in the question, a and b overlap).
Then this is how to do it:
// write a to c
c &= ~(1 << 7);
c |= a << 7;
// write b to c
c &= ~(3 << 5);
c |= b << 5;
// read a from c
a = (c >> 7)&1;
// read b from c
b = (c >> 5)&3;
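The same masks can be wrapped in small helper functions (write_a, write_b, read_a and read_b are hypothetical names). This sketch keeps the layout from the answer: a in bit 7 and b in bits 6 and 5 of c.

```cpp
#include <cassert>

// a occupies bit 7 of c: clear that bit, then OR in the new value.
void write_a(unsigned short& c, unsigned a)
{
    c = static_cast<unsigned short>((c & ~(1u << 7)) | ((a & 1u) << 7));
}

// b occupies bits 6..5 of c.
void write_b(unsigned short& c, unsigned b)
{
    c = static_cast<unsigned short>((c & ~(3u << 5)) | ((b & 3u) << 5));
}

unsigned read_a(unsigned short c) { return (c >> 7) & 1u; }
unsigned read_b(unsigned short c) { return (c >> 5) & 3u; }
```

Masking a and b before shifting keeps an out-of-range argument from spilling into the neighbouring field. Writing a = 0 and b = 3 into a zeroed c gives 0x60, i.e. the 01100000 pattern above.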
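You can accomplish this with C++ bit-fields:
struct MyBitfield
{
    unsigned short a : 1;
    unsigned short b : 2;
};
MyBitfield c;
c.a = 1; // 0 or 1
c.b = 3; // 0, 1, 2 or 3
Note that the order of bit-fields within the underlying unsigned short is implementation-defined, so the exact bit positions may not match the diagram in the question.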

Find out (in C++) if binary number is prefix of another

I need a function with a header like this:
bool is_prefix(int a, int b, int* c) {
// ...
}
If a is, read as a binary number string, a prefix of b, then set *c to be the rest of b (i.e. "what b has more than a") and return true. Otherwise, return false. Assume that binary strings always start with "1".
Of course, it is easy to do by comparing bit by bit (right-shift b until b == a). But is there a more efficient solution that doesn't iterate over the bits?
Example: a=100 (4), b=1001 (9). Now set *c to 1.
You can use your favorite "fast" method to find the highest set bit. Let's call the function msb().
bool is_prefix (int a, int b, int *c) {
    if (a == 0 || b == 0 || c == 0) return false;
    int d = msb(b) - msb(a);
    if (d < 0) return false;
    if ((b >> d) == a) {
        *c = b ^ (a << d);
        return true;
    }
    return false;
}
Shift b right by the difference in bit lengths, so that its high-order bit lines up with the high-order bit of a, and compare the result with a. If they are equal, then a is a "prefix" of b, and XOR-ing out the shifted copy of a leaves the remaining bits in *c.
This algorithm's performance depends on the performance of msb(). If msb() runs in constant time, so does this algorithm. If msb() is expensive, then the "easy approach" may be the fastest one.
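A self-contained sketch of this answer, assuming 32-bit unsigned values (the answer uses int; unsigned avoids shifting into the sign bit). The msb() here is one possible implementation: it finds the highest set bit with a fixed sequence of five tests, so is_prefix does not loop over the bits. In C++20, std::bit_width(x) - 1 from <bit> computes the same value.

```cpp
#include <cassert>
#include <cstdint>

// Index of the highest set bit of x (x must be non-zero), by binary search.
int msb(uint32_t x)
{
    assert(x != 0);
    int r = 0;
    if (x >> 16) { x >>= 16; r += 16; }
    if (x >> 8)  { x >>= 8;  r += 8; }
    if (x >> 4)  { x >>= 4;  r += 4; }
    if (x >> 2)  { x >>= 2;  r += 2; }
    if (x >> 1)  { r += 1; }
    return r;
}

bool is_prefix(uint32_t a, uint32_t b, uint32_t* c)
{
    if (a == 0 || b == 0 || c == nullptr) return false;
    int d = msb(b) - msb(a);        // how many extra bits b has
    if (d < 0) return false;        // a is longer than b
    if ((b >> d) == a)              // the top bits of b equal a
    {
        *c = b ^ (a << d);          // clear the prefix, keep the rest
        return true;
    }
    return false;
}
```

With the example from the question, is_prefix(4, 9, &rest) returns true and sets rest to 1.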
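I'm not too sure, but would something like the following work:
bool
is_prefix( unsigned a, unsigned b, unsigned* c )
{
    unsigned mask = ~0u;          // all bits set
    // slide a to the left until it equals the high-order bits of b
    while ( a != (b & mask) ) {
        if ( a > b / 2 )          // another shift would make a exceed b (or overflow)
            return false;
        a <<= 1;
        mask <<= 1;
    }
    *c = b & ~mask;               // the bits of b below the prefix
    return true;
}
(Just off the top of my head, so there could be errors.)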

if/else statement in SSE intrinsics

I am trying to optimize a small piece of code with SSE intrinsics (I am a complete beginner on the topic), but I am a little stuck on the use of conditionals.
My original code is:
unsigned long c;
unsigned long constant = 0x12345678;
unsigned long table[256];
int n, k;
for( n = 0; n < 256; n++ )
{
    c = n;
    for( k = 0; k < 8; k++ )
    {
        if( c & 1 ) c = constant ^ (c >> 1);
        else c >>= 1;
    }
    table[n] = c;
}
The goal of this code is to compute a CRC table (the constant can be any polynomial; its value doesn't play a role here).
I suppose my optimized code would be something like:
__m128i x;
__m128i y;
__m128i table[64];                    // 64 vectors x 4 lanes = 256 entries
x = _mm_set_epi32(3, 2, 1, 0);
//offset for incrementation
__m128i offset = _mm_set1_epi32(4);
for( n = 0; n < 64; n++ )
{
    y = x;
    for( k = 0; k < 8; k++ )
    {
        //if do something with y
        //else do something with y
    }
    table[n] = y;
    x = _mm_add_epi32 (x, offset);
}
I have no idea how to handle the if/else statement, but I suspect there is a clever trick. Does anybody have an idea how to do that?
(Aside from this, my optimization is probably quite poor; any advice or corrections are very welcome.)
You can get rid of the if/else entirely. Back in the days when I produced MMX assembly code, that was a common programming activity. Let me start with a series of transformations on the "false" statement:
c >>= 1;
c = c >> 1;
c = 0 ^ (c >> 1);
Why did I introduce the exclusive-or? Because exclusive-or is also found in the "true" statement:
c = constant ^ (c >> 1);
Note the similarity? In the "true" part, we xor with a constant, and in the false part, we xor with zero.
Now I'm going to show you a series of transformations on the entire if/else statement:
if (c & 1)
    c = constant ^ (c >> 1);          // same as before
else
    c = 0 ^ (c >> 1);                 // just different layout

if (c & 1)
    c = constant ^ (c >> 1);
else
    c = (constant & 0) ^ (c >> 1);    // 0 == x & 0

if (c & 1)
    c = (constant & -1) ^ (c >> 1);   // x == x & -1
else
    c = (constant & 0) ^ (c >> 1);
Now the two branches only differ in the second argument to the binary-and, which can be calculated trivially from the condition itself, thus enabling us to get rid of the if/else:
c = (constant & -(c & 1)) ^ (c >> 1);
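Disclaimer: this relies on -(c & 1) having all bits set when the lowest bit of c is set. Because c is unsigned, that is guaranteed by unsigned (modular) arithmetic; for a signed type it would additionally require a two's complement representation.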
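Applied to the question's loop, the branch-free statement gives the table generator below, shown next to the original if/else version so the two can be compared. It assumes 32-bit entries (uint32_t rather than unsigned long) and, only to have a concrete value to check against, the reflected CRC-32 polynomial 0xEDB88320 as the constant; any other polynomial works the same way. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Original version with the if/else, kept for comparison.
void make_table_branch(uint32_t table[256], uint32_t constant)
{
    for (uint32_t n = 0; n < 256; n++)
    {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
        {
            if (c & 1) c = constant ^ (c >> 1);
            else       c >>= 1;
        }
        table[n] = c;
    }
}

// Branch-free version: 0u - (c & 1) is all ones when the low bit is set, else 0.
void make_table_branchless(uint32_t table[256], uint32_t constant)
{
    for (uint32_t n = 0; n < 256; n++)
    {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (constant & (0u - (c & 1))) ^ (c >> 1);
        table[n] = c;
    }
}
```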
The idea in SSE is to compute both results and then blend them together.
For example:
__m128i one      = _mm_set1_epi32( 1 );
__m128i constant = _mm_set1_epi32( 0x12345678 );
// mask lane = 0xFFFFFFFF where (c & 1) == 1, else 0
__m128i mask   = _mm_cmpeq_epi32( _mm_and_si128( c, one ), one );
__m128i tmp_c2 = _mm_srli_epi32( c, 1 );              // "false" result: c >> 1
__m128i tmp_c  = _mm_xor_si128( tmp_c2, constant );   // "true" result: constant ^ (c >> 1)
tmp_c  = _mm_and_si128( tmp_c, mask );                // keep the "true" lanes
tmp_c2 = _mm_andnot_si128( mask, tmp_c2 );            // keep the "false" lanes
c = _mm_or_si128( tmp_c, tmp_c2 );
// or, with SSE4.1, instead of the and/andnot/or sequence
// (takes the second argument where mask is set):
// c = _mm_blendv_epi8( tmp_c2, tmp_c, mask );
Note that this is not complete code; it only demonstrates the principle.
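For reference, a complete SSE2 sketch of the question's loop. It computes four table entries per outer iteration and, instead of a separate blend, uses the scalar trick from the previous answer: XOR with (constant & mask), where the mask comes from _mm_cmpeq_epi32. It assumes an x86/x86-64 target, 32-bit table entries, and, as an illustrative choice, the reflected CRC-32 polynomial 0xEDB88320 as the constant; POLY and make_table_sse2 are names introduced here.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics; x86/x86-64 only
#include <cassert>
#include <cstdint>

// Illustrative choice of constant: the reflected CRC-32 polynomial.
constexpr uint32_t POLY = 0xEDB88320u;

// Fills table[0..255], computing four entries per outer iteration.
void make_table_sse2(uint32_t table[256])
{
    const __m128i one    = _mm_set1_epi32(1);
    const __m128i poly   = _mm_set1_epi32(static_cast<int>(POLY));
    const __m128i offset = _mm_set1_epi32(4);
    __m128i x = _mm_set_epi32(3, 2, 1, 0);  // lanes (low to high): n, n+1, n+2, n+3

    for (int n = 0; n < 256; n += 4)
    {
        __m128i c = x;
        for (int k = 0; k < 8; k++)
        {
            // All ones in lanes whose lowest bit is set, zero elsewhere.
            __m128i mask = _mm_cmpeq_epi32(_mm_and_si128(c, one), one);
            // constant ^ (c >> 1) in "true" lanes, 0 ^ (c >> 1) in "false" lanes.
            c = _mm_xor_si128(_mm_srli_epi32(c, 1), _mm_and_si128(poly, mask));
        }
        _mm_storeu_si128(reinterpret_cast<__m128i*>(table + n), c);
        x = _mm_add_epi32(x, offset);
    }
}
```

Whether this beats the scalar branch-free loop is worth measuring; the whole table is only computed once, so the byte-wise use of the table usually matters more.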
The first step in efficiently computing CRC is using a wider basic unit than the bit. See here for an example of how to do this byte per byte.
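As an illustration of the byte-wise approach, here is a sketch of the common table-driven CRC-32, which uses a 256-entry table like the question's to process eight bits per lookup. It is not necessarily the example the answer referred to. It assumes the reflected CRC-32 parameters (polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF); crc_table, init_crc_table and crc32_bytewise are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same kind of table as in the question, with the reflected CRC-32 polynomial.
static uint32_t crc_table[256];

void init_crc_table()
{
    for (uint32_t n = 0; n < 256; n++)
    {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (0xEDB88320u & (0u - (c & 1))) ^ (c >> 1);
        crc_table[n] = c;
    }
}

// Processes one byte per table lookup instead of one bit per step.
uint32_t crc32_bytewise(const unsigned char* data, std::size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; i++)
        crc = crc_table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}
```

In this sketch, init_crc_table() must be called once before the first crc32_bytewise() call. For the standard check string "123456789", CRC-32 is 0xCBF43926.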