Can someone explain how this works?
#define BX_(x) ((x) - (((x)>>1)&0x77777777) \
- (((x)>>2)&0x33333333) \
- (((x)>>3)&0x11111111))
#define BITCOUNT(x) (((BX_(x)+(BX_(x)>>4)) & 0x0F0F0F0F) % 255)
Clarification:
Ideally, the answer will start something along the lines of:
The macro: "BX_" subtracts three values from the passed in number.
These three values represent:
XXXXX
YYYYY
ZZZZZ
This allows the BITCOUNT() to work as follows...
Cheers,
David
The output of BX_(x) is the number of set bits in each hex digit (nibble). So
BX_(0x0123457F) = 0x01121234
The following:
((BX_(x)+(BX_(x)>>4)) & 0x0F0F0F0F)
shuffles the counts into bytes:
((BX_(0x0123457F)+(BX_(0x0123457F)>>4)) & 0x0F0F0F0F) = 0x01030307
Taking this result modulo 255 adds up the individual bytes to arrive at the correct answer 14. To see that this works, consider just a two-byte integer, 256*X + Y. This is just 255*X + X + Y, and 255*X % 255 is always zero, so
(256*X + Y) % 255 = (X + Y) % 255.
This extends to four-byte integers:
256^3*V + 256^2*W + 256*X + Y
Just replace each 256 with (255+1) to see that
(256^3*V + 256^2*W + 256*X + Y) % 255 = (V + W + X + Y) % 255.
The final observation (which I swept under the rug with the two-byte example) is that V + W + X + Y is always less than 255 (each byte holds the set-bit count of one byte of x, so the sum is at most 32), so
(V + W + X + Y) % 255 = V + W + X + Y.
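If you want to verify the intermediate values yourself, here is a minimal test program (the macros are copied verbatim from the question; the expected outputs are the values worked out above):

#include <stdio.h>

#define BX_(x) ((x) - (((x)>>1)&0x77777777) \
                    - (((x)>>2)&0x33333333) \
                    - (((x)>>3)&0x11111111))
#define BITCOUNT(x) (((BX_(x)+(BX_(x)>>4)) & 0x0F0F0F0F) % 255)

int main(void) {
    unsigned int x = 0x0123457FU;
    printf("BX_(x)      = 0x%08X\n", BX_(x));   /* 0x01121234: set bits per nibble */
    printf("BITCOUNT(x) = %u\n", BITCOUNT(x));  /* 14 */
    return 0;
}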
As quoted by Johannes from that splendid Bit Twiddling Hacks page, there's an excellent and detailed description of this algorithm in AMD's Software Optimization Guide for AMD Athlon™ 64 and Opteron™ Processors, on pages 179 and 180 (pages 195 and 196 of the PDF).
Also describing the same idea and some alternative solutions and their relative performance: this page.
Related
Problem: To check whether a non-negative integer is of the form 2^j - 2^k, where j >= k >= 0, i.e. a difference of two powers of 2.
My solution: Such a number n can be represented as a contiguous sequence of 1's, e.g. 00011110. I turn off the (rightmost) contiguous sequence of 1's and then do a zero check on n.
The steps of my solution:
00011110
00011111 (turn on trailing 0's)
00000000 (then turn off trailing 1's).
Using this formula (x | (x - 1)) & ((x | (x - 1)) + 1).
But a more efficient formula (maybe because it uses fewer operations), which does not use literals, is ((x & -x) + x) & x, followed by a zero check. It's written that it does the same thing, but I just can't derive that formula from my result. Can someone explain this to me?
EDIT : 32-bit word, 2's complement
Given that -x is ~x + 1, if a number is of the form 2^j - 2^k then:
-x = 2^k plus all 1s at and above 2^j, as the carry will ripple up until it hits the 2^k bit, then stop;
hence x & -x = 2^k;
hence (x & -x) + x = 2^k + (2^j - 2^k) = 2^j; and
hence ((x & -x) + x) & x = 2^j & x = 0, since all the bits of x lie below 2^j.
And you can work backwards along that logic:
((x & -x) + x) & x = 0 => no common bits between ((x & -x) + x) and x;
no common bits between x and ((x & -x) + x) implies that for consecutive group of 1s in x, (x & -x) must have the lowest of those bits set and none of the others;
... and the only way to achieve that given the way that carry ripples is if there is only one consecutive group of 1s.
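A brute-force check (my own sketch, not from the original posts; is_diff_of_pow2_slow is a made-up reference implementation) confirms that both formulas agree with the naive definition:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Reference: 1 iff x == 2^j - 2^k for some j >= k >= 0. */
static int is_diff_of_pow2_slow(uint32_t x) {
    for (int j = 0; j <= 32; j++)
        for (int k = 0; k <= j; k++)
            if ((uint32_t)(((uint64_t)1 << j) - ((uint64_t)1 << k)) == x)
                return 1;
    return 0;
}

int main(void) {
    for (uint32_t x = 0; x < (1u << 16); x++) {
        int a = ((x | (x - 1)) & ((x | (x - 1)) + 1)) == 0; /* asker's formula */
        int b = (((x & -x) + x) & x) == 0;                  /* shorter formula */
        assert(a == b && b == is_diff_of_pow2_slow(x));
    }
    puts("both formulas agree with the naive definition");
    return 0;
}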
You asked for an algebraic proof connecting the two expressions, so here is one, though some of the steps are not simple:
((x | (x - 1)) + 1) & (x | (x - 1))
// rename x | (x - 1) into blsfill(x)
(blsfill(x) + 1) & blsfill(x)
// the trailing zeroes that get filled on the right side of the & don't matter,
// they end up being reset by the & anyway
(blsfill(x) + 1) & x
// filling the trailing zeroes and adding 1,
// is the same thing as skipping the trailing zeroes and adding the least-set-bit
(x + blsi(x)) & x
// rewrite blsi into elementary operations
(x + (x & -x)) & x
Probably a very easy question, yet I came up with this implementation that looks far too complicated...
unsigned int x;
unsigned int z;
unsigned int makeXMultipleOfZ(const unsigned x, const unsigned z) {
return x + (z - x % z) % z;
//or
//return x + (z - (x - 1) % z - 1); //This generates shorter assembly,
//6 against 8 instructions
}
I would like to avoid if-statements.
If it helps, we can safely say that z will be a power of 2.
In my case z = 4 (I know I could replace the modulo operation with the & bit operator), and I was wondering if I could come up with an implementation that involves fewer steps.
If z is a power of two, the modulo operation can be reduced to this bitwise operation:
return (x + z - 1) & ~(z - 1);
This logic is very common for data structure boundary alignment, for example. More info here: https://en.wikipedia.org/wiki/Data_structure_alignment
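A sketch of how that looks in practice (roundUpToMultiple is a name I made up; it behaves like the question's makeXMultipleOfZ for power-of-two z):

#include <assert.h>

/* Round x up to the next multiple of z; z must be a power of two. */
static unsigned roundUpToMultiple(unsigned x, unsigned z) {
    return (x + z - 1) & ~(z - 1);
}

int main(void) {
    assert(roundUpToMultiple(0, 4) == 0);
    assert(roundUpToMultiple(5, 4) == 8);
    assert(roundUpToMultiple(8, 4) == 8);   /* already a multiple: unchanged */
    assert(roundUpToMultiple(9, 8) == 16);
    return 0;
}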
If z is a power of two and the integers are unsigned, the following will work:
x + (z - 1) & ~(z - 1)
I cannot think of a solution using bit-twiddling if z is an arbitrary number.
int x, y, p; // x is a non-negative integer
p = 0;
while (x > 0)
{
if ( x % 2 == 1 )
p = p + y;
y = y*2;
x = x/2;
}
// p == a*b here
I understand that this loop finds the product of a and b (the initial values of x and y) using the identity:
a * b = (1/2)a * 2b
but I don't understand the code:
if ( x % 2 == 1 )
p = p + y;
I was hoping someone could explain why 'p' is assigned 'p + y' on odd values of x.
while (x > 0) {
if (x % 2 == 1)
p = p + y;
y = y*2;
x = x/2;
}
imagine x = 4, y = 5
iterations:
x is even, y = 10, x = 2 (i.e. x can be divided, y should be doubled)
x is even, y = 20, x = 1
x is odd, p = 20, y = 40, x = 0 (i.e. x can not be divided anymore, y should be added to p)
x > 0 is false, loop ends
p = 4 * y (for the original y = 5), i.e. p = 20
now imagine x is odd at the beginning, let's say x = 5, y = 2:
x is odd, p = 2, y = 4, x = 2
(5/2 = 2.5, new value of x will be rounded down, y should be added BEFORE it is doubled)
x is even, y = 8, x = 1
x is odd, p = 10, y = 16, x = 0
p = y + 4*y (for the original y = 2), i.e. p = 10
that first y is the reason: adding it to the result before it is doubled (1 * y) is in this case equivalent to 0.5 * (2 * y)
Because these are integers, x / 2 will be an integer. If x is odd, that integer has been rounded down, and you're missing one-half y in the next iteration of the loop, i.e. one whole y in the current iteration (since y is doubled each time).
If x is odd, x = x/2 will set x to 0.5 less than the exact quotient (because integer division rounds down). p needs to be adjusted to allow for that.
Think of multiplication as repeated addition, x*y is adding y together x times. It is also the same as adding 2*y together x/2 times. Conceptually it is somewhat unclear what it means if x is odd. For example, if x=5 and y=3, what does it mean to add 2.5 times? The code notices when x is odd, adds y in, then does the y=y*2 and x=x/2. When x is odd, this throws away the .5 part. So in this example, you add y one time, then x becomes 2 (not 2.5) because integer division throws away the fraction.
At the end of each loop, you will see that the product of the original x and y is equal to p + x*y for the current values of p, x, and y. The loop iterates until x is 0, and the result is entirely in p.
It also helps to see what is going on if you make a table and update it each time through the loop. These are the values at the start of each iteration:
x | y | p
----------
5 | 3 | 0
2 | 6 | 3
1 | 12 | 3
0 | 24 | 15
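To make the invariant concrete, here is a sketch (mine, not from the original post) that asserts p + x*y stays equal to the original product on every iteration:

#include <assert.h>

static unsigned multiply(unsigned x, unsigned y) {
    unsigned expected = x * y;   /* used only to check the invariant */
    unsigned p = 0;
    while (x > 0) {
        assert(p + x * y == expected);  /* loop invariant */
        if (x % 2 == 1)
            p = p + y;
        y = y * 2;
        x = x / 2;
    }
    assert(p == expected);
    return p;
}

int main(void) {
    assert(multiply(5, 3) == 15);
    assert(multiply(4, 5) == 20);
    return 0;
}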
This works by observing that (for example) y * 10 = y * 8 + y * 2.
It's pretty much like doing multiplication on paper in school. For example, to multiply 14 x 21, we multiply one digit at a time (and shift left a place where needed) so we add 1x14 + 2 x 14 (shifted left one digit).
14
x 21
----
14
280
Here, we're doing pretty much the same thing, but working in binary instead of decimal. The right shifting has nothing to do with the numbers being odd, and everything to do with simply finding which bits in the number are set.
As we shift one operand right to find whether a bit is set, we also shift the other operand left, just like we add zeros to shift numbers left when doing arithmetic on paper in decimal.
So, viewing things in binary, we end up with something like:
101101
x 11010
--------
1011010
+ 101101000
+ 1011010000
If we wanted to, instead of shifting the operand right, we could just shift the mask left, so instead of repeatedly ANDing with 1, we'd AND with 1, then with 2, then with 4, and so on (in fact, it would probably make a lot more sense that way). For better or worse, however, in assembly language (where this sort of thing is normally done) it's usually a little easier to shift the operand and use a constant for the mask than to load the mask into a register and shift it when needed.
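For what it's worth, here is a sketch (my own, not from the original answer) of that left-shifting-mask variant in C:

#include <assert.h>

static unsigned multiplyWithMask(unsigned x, unsigned y) {
    unsigned p = 0;
    for (unsigned mask = 1; mask != 0 && mask <= x; mask <<= 1) {
        if (x & mask)   /* this bit of x is set: add the shifted y */
            p += y;
        y <<= 1;        /* keep y aligned with the next bit of x */
    }
    return p;
}

int main(void) {
    assert(multiplyWithMask(45, 26) == 45 * 26);  /* the 101101 x 11010 example */
    assert(multiplyWithMask(14, 21) == 294);
    return 0;
}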
You should rewrite x as 2*b+1 (assuming x is odd). Then
x*y = (2*b+1)*y = (2*b)*y + y = b*(2*y) + y = (x/2)*(2*y) + y
where (x/2) is meant to be the integer division. With the operation rewritten this way, you see the x/2, the 2y and the +y appear.
I got an assignment today at my faculty (Mathematics Faculty of Belgrade, Serbia) which says:
1) Write a program that for two given integers x and y, inverts in integer x those bits that match the corresponding bits in y, while the rest of the bits remain the same.
For example:
x = 1001110110101
y = 1100010100011
x' = 0011101011100
I managed to write a program that does that, but I am a little insecure about the quality of my solution. Please, if you have time, check out the code and tell me how I could improve it.
int x, y, bitnum;
int z = 0;
unsigned int mask;
bitnum = sizeof(int) * 8;
mask = 1u << (bitnum - 1);
printf("Enter x and y: ");
scanf("%d%d", &x, &y);
while (mask > 0) {
    z <<= 1;
    if ( (((x & mask) == 0) && ((y & mask) == 0)) ||
         ((x & mask) && ((y & mask) == 0)) )  /* i.e. the bit of y is 0 */
        z += 1;
    mask >>= 1;
} /* <-- THAT'S HOW STUPID PEOPLE SOLVE PROBLEMS... WITH HAMMER! */
z = ~y; /* <-- THAT'S HOW SMART PEOPLE SOLVE PROBLEMS... WITH ONE LINE */
Everything works correctly; for x = 423 and y = 324, for example, I get z = -325, which is correct (it equals ~324). Also, the printed bit patterns match. I would just like to know if there is a better way to do this.
Thanks.
If you take a look at your x/y/x' example, it must strike you that x' is the complement of y. And indeed it is:
x y x'
--------
1 1 0
0 0 1
1 0 1
0 1 0
Spoiler:
For bits that match, you invert the bit in x, but since it is the same as the bit in y, this is the same as inverting the bit in y. Where they do not match, you keep the bit from x, which on its own is already the inversion of the bit in y. I hope you can now see the one-line solution yourself: x' = ~y;
// Try the following code:
unsigned int mask1, mask2, mask3, answ;
mask1 = x & y;          // bits where both are 1
mask2 = ~x & ~y;        // bits where both are 0
mask3 = mask1 | mask2;  // all bits that match
answ = x ^ mask3;       // flip exactly the matching bits
// Note that mask3 == ~(x ^ y), so answ == x ^ ~(x ^ y) == ~y,
// consistent with the one-line solution above.
I have to compute the sum S = (a*x + b*y + c) % N. It looks like a quadratic equation, but it is not, because x and y have certain properties and have to be calculated using recurrence relations. Because the sum exceeds even the limits of unsigned long long, I want to know how I could compute it using the properties of the modulo operation, properties that allow the sum to be rewritten as something like (a*x)%N + (b*y)%N + c%N (I say "something like" because I do not remember exactly what those properties are), thus avoiding exceeding the limits of unsigned long long.
Thanks in advance for your concern! :)
a % N = x means that there is an integer m such that a = m * N + x, with 0 <= x < N.
You can then simply deduce that if a % N = x and b % N = y, then
(a + b) % N =
= (m * N + x + l * N + y) % N =
= ((m + l) * N + x + y) % N =
= (x + y) % N =
= (a % N + b % N) % N.
We know that 0 <= x + y < 2N, which is why you need to keep the remainder calculation. This shows that it is okay to split the summation, calculate the remainders separately and then add them, but don't forget to take the remainder of the sum.
For multiplication:
(a * b) % N =
= ((m * N + x) * (l * N + y)) % N =
= ((m * l * N + x * l + m * y) * N + x * y) % N =
= (x * y) % N =
= ((a % N) * (b % N)) % N.
Thus you can also do the same with products.
These properties can be derived in a more general setting using some abstract algebra (the remainders form a factor ring, Z/NZ).
You can take the idea even further, if needed:
S = ( (a%N)*(x%N)+(b%N)*(y%N)+c%N )%N
You can apply the modulus to each term of the sum as you've suggested; but even so after summing them you must apply the modulus again to get your final result.
How about this:
int x = (7 + 7 + 7) % 10;
int y = (7 % 10 + 7 % 10 + 7 % 10) % 10;
You remember right. The equation you gave, where you apply %N to every summand, is correct, and that is exactly what I would use. You should also apply %N to every partial sum (and to the total) again, as the addition results can still be greater than N. BUT be careful: this works only if your type's limit is at least twice as big as your N. If that is not the case, it can get really nasty.
Btw, for those subsequent %N operations on the partial sums, you don't have to perform a complete division; checking whether the sum is >= N and, if so, subtracting N once is enough.
Not only can you reduce all variables mod n before starting the calculation, you can also write your own mod-mul to compute a*x mod n by using a shift-and-add method, reducing the result mod n at each step. That way your intermediate calculations will only require one more bit than n. Once these products are computed, you can add them pairwise and reduce mod n after each addition, which will also not require more than 1 bit beyond the range of n.
There is a python implementation of modular multiplication in my answer to this question. Conversion to C should be trivial.
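Since the linked answer is in Python, here is what the shift-and-add mod-mul might look like in C (a sketch under the assumption that n fits in an unsigned long long with one bit to spare, i.e. n <= ULLONG_MAX / 2, so no intermediate value overflows):

/* (a * b) % n by shift-and-add, reducing mod n at every step. */
unsigned long long mulmod(unsigned long long a, unsigned long long b,
                          unsigned long long n) {
    unsigned long long result = 0;
    a %= n;
    while (b > 0) {
        if (b & 1) {                      /* this bit of b contributes a */
            result += a;
            if (result >= n) result -= n; /* single subtraction, no division */
        }
        a <<= 1;                          /* double a ... */
        if (a >= n) a -= n;               /* ... and reduce mod n */
        b >>= 1;
    }
    return result;
}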