Why does b = (b - x) & x result in getting the next subset? - bit-manipulation

The Competitive Programmer's Handbook on page 99 suggests the following way of going through all subsets of a set x (the set bits represent the numbers in the set):
int b = 0;
do {
    // Process subset b
} while (b = (b - x) & x);
I understand all the background about bit representation and bitwise operators.
What I am not understanding is why b = (b - x) & x results in getting the next subset.
This post gives an example, but does not provide any insight.
So, why does this work?

Things become clearer when we remember two's complement. The negative of a number is just 1 plus the bitwise NOT of that number. Thus,
(b - x) = (b + ~x + 1)
Let's work through an example of one iteration of the algorithm. Then I'll explain the logic.
Suppose
x =  .  1  1  .  .  1  .
b =  . [.][.] .  . [1] .
           ^
where . denotes zero.
Let's define "important" bits to be the bits that are in the same position as a 1 in x. I've surrounded the important bits with [], and I've marked the right-most important zero in b with ^.
~x               = 1 [.][.] 1 1 [.] 1
~x + b           = 1 [.][.] 1 1 [1] 1
~x + b + 1       = 1 [.][1] . . [.] .
(~x + b + 1) & x = . [.][1] . . [.] .
Notice that ~x + b always has a string of ones to the right of the right-most important zero of b. When we add 1, all those ones become zeros, and the right-most important zero becomes a 1.
If we look only at the important bits, we see that b transformed from [.][.][1] into [.][1][.]. Here is what the important bits will be if we continue:
[.][1][.]
[.][1][1]
[1][.][.]
[1][.][1]
[1][1][.]
[1][1][1]
If we write the important bits side-by-side like this, as if they were a binary number, then the operation effectively increments that number by 1. The operation is counting.
Once all the important bits are ones, (b - x) & x simply becomes (x - x) & x, which is 0, causing the loop to terminate.
By that point, we've encountered all 2^n possible values of the n important bits. Those values are the subsets of x.
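If you want to watch the counting happen, here is a minimal, self-contained sketch (the mask 0b0110010 is just the example pattern from the diagrams above, written as a literal):

#include <cstdio>

int main() {
    const int x = 0b0110010; // the example set: bits 1, 4, and 5
    int b = 0;
    do {
        std::printf("%d ", b);        // process subset b (here: just print it)
    } while ((b = (b - x) & x) != 0); // step to the "next" subset
    std::printf("\n");                // prints: 0 2 16 18 32 34 48 50
    return 0;
}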

Absolute value abs(x) using bitwise operators and Boolean logic

How does this work?
The idea is to make abs(x) use bitwise operators for integers (assuming 32-bit words):
y = x >> 31
(x + y) ^ y // This gives abs(x) (^ is XOR?)
Assuming 32-bit words, as stated in the question:
For negative x, x >> 31 is implementation-defined in the C and C++ standards. The author of the code expects two’s complement integers and an arithmetic right-shift, in which x >> 31 produces all zero bits if the sign bit of x is zero and all one bits if the sign bit is one.
Thus, if x is positive or zero, y is zero, and x + y is x, so (x + y) ^ y is x, which is the absolute value of x.
If x is negative, y is all ones, which represents −1 in two's complement, so x + y is x - 1. XORing with all ones then inverts all the bits, and in two's complement, inverting all the bits of q gives -q - 1 (negation is inversion plus one). So x - 1 XORed with all ones produces -(x - 1) - 1 = -x + 1 - 1 = -x, which is the absolute value of x, except when x is the minimum possible value for the format (−2,147,483,648 for 32-bit two's complement); in that case the absolute value (2,147,483,648) is too large to represent, and the resulting bit pattern is just the original x.
This approach relies on several pieces of implementation-specific behavior:
It assumes that x is 32 bits wide, though you could fix this with x >> (sizeof(x) * CHAR_BIT - 1).
It assumes that the machine uses two's complement representation.
It assumes that the right-shift operator copies the sign bit from left to right (an arithmetic shift).
Example with 3 bits:
101 -> x = -3
111 -> y = x >> 2
101 + 111 = 100 -> x + y (the carry out of the register is discarded)
100 XOR 111 = 011 -> 3
This is not portable.
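To make those caveats concrete, here is a small sketch of the trick with the width pinned via int32_t (my choice for the demo); it still assumes an arithmetic right shift, and it still misbehaves at INT32_MIN:

#include <cstdint>
#include <cstdio>

int32_t abs_bits(int32_t x) {
    int32_t y = x >> 31;  // 0 if x >= 0; all ones (-1) on arithmetic-shift machines
    return (x + y) ^ y;   // x if y == 0; ~(x - 1) == -x if y == -1
}

int main() {
    std::printf("%d %d %d\n", abs_bits(5), abs_bits(-5), abs_bits(0)); // prints: 5 5 0
    return 0;
}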
This isn't portable, but I'll explain why it works anyway.
The first operation exploits a trait of 2's complement negative numbers: the topmost (sign) bit is 1 if the number is negative, and 0 if it is positive. This is because of the range the numbers cover, shown below.
The example below is for 8 bits, but it can be extrapolated to any number of bits. In your case it's 32 bits (8 bits just displays the ranges more easily):
10000000 (smallest negative number)
10000001 (next to smallest)
...
11111111 (negative one)
00000000 (zero)
00000001 (one)
...
01111110 (next to largest)
01111111 (largest)
Reasons for using 2's complement encoding of numbers come about by the property that adding any negative number to its positive counterpart yields zero.
Now, to create the negative of a 2's complement number, you would need to:
Take the inverse (bitwise NOT) of the input number.
Add one to it.
The reason the 1 is added is to force the addition to zero out the register. You see, if it were just x + ~x, you would get a register of all 1's. By adding one to it, you get a cascading carry which yields a register of zeros (with a 1 in the carry out of the register).
This understanding is important to know "why" the algorithm you provided (mostly) works.
y = x >> 31 // this line acts like an "if" statement.
            // Depending on whether x is a 32-bit signed or unsigned type,
            // when x is negative this will fill y with 0xFFFFFFFF or 1.
            // The rest of the algorithm doesn't care, because it
            // accommodates both inputs. When x is positive, the result is zero.
Let's explore the case where x is positive first:
(x + y) ^ y // for positive x, first we substitute the y = 0
(x + 0) ^ 0 // reduce the addition
(x) ^ 0 // remove the parenthesis
x ^ 0 // which, by definition of xor, can only yield x
x
Now let's explore the case where x is negative and y is 0xFFFFFFFF (y was signed):
(x + y) ^ y // first substitute the y
(x + 0xFFFFFFFF) ^ 0xFFFFFFFF // note that 0xFFFFFFFF is 2's complement -1
(x - 1) ^ 0xFFFFFFFF // introduce a new variable Z to hold the result
(x - 1) ^ 0xFFFFFFFF = Z // take the ^ 0xFFFFFFFF of both sides
(x - 1) ^ 0xFFFFFFFF ^ 0xFFFFFFFF = Z ^ 0xFFFFFFFF // reduce the left side
(x - 1) = Z ^ 0xFFFFFFFF // note that NOT is equivalent to ^ 0xFFFFFFFF
(x - 1) = ~Z // add one to both sides
x - 1 + 1 = ~Z + 1 // reduce
x = ~Z + 1 // by definition, Z is negative x (for 2's complement numbers)
Now let's explore the case where x is negative and y is 0x00000001 (y was unsigned):
(x + y) ^ y // first substitute the y
(x + 1) ^ 0x00000001 // note that x is a 2's complement negative, but is
                     // being treated as unsigned, so to make the unsigned
                     // context of x traceable, I'll add a -(x) around the x
(-(x) + 1) ^ 0x00000001 // which simplifies to
(-(x - 1)) ^ 0x00000001 // negative of a negative is positive
(-(x - 1)) ^ -(-(0x00000001)) // substituting 1 for bits of -1
(-(x - 1)) ^ -(0xFFFFFFFF) // pulling out the negative sign
-((x - 1) ^ 0xFFFFFFFF) // recalling that while we added signs and negations to
                        // make the math sensible, there's actually no place to
                        // store them in an unsigned storage system, so dropping
                        // them is acceptable
(x - 1) ^ 0xFFFFFFFF = Z // introduce a new variable Z, take the ^ 0xFFFFFFFF of both sides
(x - 1) ^ 0xFFFFFFFF ^ 0xFFFFFFFF = Z ^ 0xFFFFFFFF // reduce the left side
(x - 1) = Z ^ 0xFFFFFFFF // note that NOT is equivalent to ^ 0xFFFFFFFF
(x - 1) = ~Z // add one to both sides
x - 1 + 1 = ~Z + 1 // reduce
x = ~Z + 1 // by definition, Z is negative x (for 2's complement numbers,
           // even though we used only non-2's-complement types)
Note that while the above proofs are passable as a general explanation, they don't cover important edge cases, like x = 0x80000000, which represents a negative number greater in absolute value than any positive x that can be stored in the same number of bits.
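Here is a sketch of that edge case, carried out in unsigned arithmetic so the wraparound is well-defined (the bit patterns are the ones a two's complement machine would produce):

#include <cstdint>
#include <cstdio>

int main() {
    uint32_t x = 0x80000000u; // bit pattern of the most negative int32_t
    uint32_t y = 0xFFFFFFFFu; // what x >> 31 yields with an arithmetic shift
    std::printf("%#x\n", (x + y) ^ y); // prints 0x80000000: the value comes back unchanged
    return 0;
}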
I use this code. First, the calculation of the two's complement (the guard just ensures, with a compile-time check, that the template parameter is an integer):
/**
 * Zweierkomplement - Two's Complement
 */
template<typename T> constexpr auto ZQ(T const& _x) noexcept -> T {
    Compile::Guards::IsInteger<T>();
    return ((~(_x)) + 1);
}
and in a second step this is used to calculate the integer abs()
/**
 * if the number is negative, get the same number with positive sign
 */
template<typename T> auto INTABS(T const _x) -> typename std::make_unsigned<T>::type {
    Compile::Guards::IsInteger<T>();
    return static_cast<typename std::make_unsigned<T>::type>((_x < 0) ? (ZQ<T>(_x)) : (_x));
}
Why I use this kind of code:
* compile-time checks
* works with all integer sizes
* portable from small µC to modern cores
* it's clear that we need to consider two's complement, so you need an unsigned return value; e.g., for 8 bits, abs(-128) = 128 cannot be expressed in a signed integer
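Since Compile::Guards isn't shown in the answer, here is a self-contained sketch of the same idea with static_assert standing in for the guard (my substitution, not the original code):

#include <cstdint>
#include <cstdio>
#include <type_traits>

template<typename T> constexpr auto ZQ(T const& _x) noexcept -> T {
    static_assert(std::is_integral<T>::value, "integer types only");
    return static_cast<T>(~_x + 1); // two's complement negation
}

template<typename T> auto INTABS(T const _x) -> typename std::make_unsigned<T>::type {
    static_assert(std::is_integral<T>::value, "integer types only");
    using U = typename std::make_unsigned<T>::type;
    return (_x < 0) ? static_cast<U>(ZQ<T>(_x)) : static_cast<U>(_x);
}

int main() {
    // prints 128, a value a signed int8_t could not hold
    std::printf("%u\n", (unsigned)INTABS<int8_t>(-128));
    return 0;
}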

Using 1's complement to generate a mask that shows the first non-zero bit

I found an interesting property about 1's complement when reading an interview preparation book.
The property says given a number X, we can generate a mask that shows the first set bit (from right to left) using the 1's complement as follows:
X & ~(X - 1) where ~ stands for 1's complement.
For example, if X = 0b0011 then
0b0011 & 0b1101 = 0b0001
I understood that the author is doing the X-1 to flip the first non-zero bit from the right. But I'm curious how he came up with the idea that taking the 1's complement of X-1 and ANDing it with X would result in a bit-mask that shows the first non-zero bit in X.
It's my first time posting at StackOverflow, so my apologies if this question doesn't belong here.
First, notice that for any X, X & (~X) = 0 and X & X = X.
Let X = b_n b_(n-1) ... b_k ... b_1, where b_k is the first set bit.
Thus, X, X-1, ~(X-1), and X & ~(X-1) look like this (the last k positions are shown explicitly; the 1 at position k is b_k):
X          =  b_n  b_(n-1) ...  b_(k+1)  1  0 0 ... 0
X-1        =  b_n  b_(n-1) ...  b_(k+1)  0  1 1 ... 1
~(X-1)     = ~b_n ~b_(n-1) ... ~b_(k+1)  1  0 0 ... 0
X & ~(X-1) =    0    0     ...     0     1  0 0 ... 0
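Sanity-checking this against the question's own example, X = 0b0011:

#include <cstdio>

int main() {
    unsigned X = 0b0011;
    std::printf("%u\n", X & ~(X - 1)); // 0b0011 & 0b...1101 = 0b0001, prints 1
    return 0;
}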
This can actually be proved with some math. Let x be a positive integer. Every such x has a binary representation, and so does x - 1, and the bit in the 1s place of x always differs from that of x - 1. Let us define ~ as the ones' complement operator: for any binary number b, ~b turns all of the 0s in b into 1s, and all of the 1s in b into 0s. We can then say that ~(x - 1) must have the same bit in the 1s place as x.

This is simple for odd numbers, as every odd number has a 1 in the 1s bit, and so must ~(x - 1), and we can stop there. For even numbers it gets a bit trickier. For every even number e, the 1s bit must be empty; but since we required x (and hence e) to be greater than 0, some bit of e must be 1. We can also say that the 1s bit of e - 1 must be 1, as e - 1 must be odd. Additionally, the first bit with a value of 1 in e will be 0 in e - 1, so taking the ones' complement of e - 1 turns that bit back into a 1. Using the & operator, that will be the common 1 bit between e and ~(e - 1).
This trick is probably better known written as
X & -X
which is by definition (of -) equivalent, and using the following interpretation of - it becomes very simple to understand:
In string notation, for a number that isn't zero: -(a 1 0^k) = (~a) 1 0^k
If you're unfamiliar with the notation, a 1 0^k just means "some string of bits a, followed by a 1, followed by k zeroes".
This interpretation just says that negation keeps all the trailing zeroes and the lowest 1, but inverts all higher bits. You can see that from the definition of negation as well: in ~X + 1, the +1 cancels out the inversion for the trailing zeroes (which became ones, so the +1 carries through them) and for the lowest set bit (which became 0 and captures the carry coming through the trailing zeroes).
Anyway, using that interpretation of negation, obviously the top part is removed, the lowest set bit is kept, and the trailing zeroes are just going to stay.
In general, the string notation is very helpful when coming up with these tricks. For example if you know that negation looks like that in string notation, this trick is really quite obvious, and so are some related tricks which you can then also find:
x & x - 1 resets the lowest set bit
x | -x keeps the trailing zeroes but sets all higher bits
x ^ -x keeps the trailing zeroes, resets the lowest set bit, but sets all higher bits
.. and more variants.
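Here is a quick sketch of these variants on one value; x = 0b101101000 plays the role of a 1 0^k with a = 1011 and k = 3 (unsigned, so -x is well-defined modular negation; the hex results in the comments assume 32-bit unsigned):

#include <cstdio>

int main() {
    unsigned x = 0b101101000;            // a = 1011, then a 1, then k = 3 zeroes
    std::printf("%#x\n", x & -x);        // 0b000001000: lowest set bit kept
    std::printf("%#x\n", x & ~(x - 1));  // same result: the form from the question
    std::printf("%#x\n", x & (x - 1));   // 0b101100000: lowest set bit cleared
    std::printf("%#x\n", x | -x);        // 0xfffffff8: trailing zeroes kept, all higher bits set
    std::printf("%#x\n", x ^ -x);        // 0xfffffff0: lowest set bit cleared too
    return 0;
}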

Is (n & m) <= m always true?

Given n and m unsigned integral types, will the expression
(n & m) <= m
always be true?
Yes, it is true.
It should be readily apparent that a necessary condition for y > x is that at least one bit position is set to 1 in y but 0 in x. As & cannot set a bit to 1 if the corresponding operand bits were not already 1, the result cannot be larger than the operands.
Yes, it is always true for unsigned integral data types.
Depending on the value of the mask n, some 1-bits in m may become 0-bits; all bits that are 0 in m will remain 0 in the result. From the point of view of keeping the result as large as possible, the best that can happen is that all 1-bits remain in place, in which case the result equals m. In all other cases, the result will be less than m.
Let m be m1m2m3m4m5m6m7m8 where mi is a bit.
Now let n be n1n2n3n4n5n6n7n8.
What would be m & n? All bits in m that were originally 0 will stay 0, because 0 & anything is 0. All bits that were 1 will stay 1 only if the corresponding bit in n is 1.
Meaning that in the "best" case the number stays the same, but it can never get bigger, since no 1s can be created from 0 & anything.
Let's work an example in order to get better intuition:
Let m be 11101011.
Which numbers are bigger than m? For example: 11111111 (trivially), 11111011, 11101111, 11111010, 11111110.
n1 n2 n3 n4 n5 n6 n7 n8
 ↓  ↓  ↓  ↓  ↓  ↓  ↓  ↓  &
 1  1  1  0  1  0  1  1
------------------------
There is no way to get any of the above combinations from doing this.
Yes. To supplement the reasons described in the other answers, a few binary examples make it pretty clear that it will not be possible to make the result greater than either of the args:
0
& 1
------
0

1
& 1
------
1

111011
& 11111
------
11011

111011
& 111011
------
111011
Given the two arguments, the highest result we can achieve is when both arguments are the same value; the result then equals that value (last example above).
It is impossible to make the result larger no matter what we set the arguments to. If you are able to, then we are in serious trouble. ;)
n & m has all the bits that are set in m and in n.
~n & m has all the bits that are set in m but not in n.
Adding both quantities will give all the bits that are set in m. That is, m:
m = (n & m) + (~n & m)
m ≥ (n & m)
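For small widths you can also just check the claim exhaustively; this sketch verifies both the inequality and the m = (n & m) + (~n & m) identity for all 8-bit pairs:

#include <cassert>
#include <cstdio>

int main() {
    for (unsigned n = 0; n < 256; ++n)
        for (unsigned m = 0; m < 256; ++m) {
            assert((n & m) <= m);            // the inequality in question
            assert((n & m) + (~n & m) == m); // the identity used above
        }
    std::puts("all 65536 pairs pass");
    return 0;
}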

Bitwise operation for add

Could you please help me figure out why the following expression is true:
x + y = x ^ y + (x & y) << 1
I am looking for some rules from the bitwise logic to explain this mathematical equivalent.
It's like solving an ordinary base 10 addition problem 955 + 445, by first adding all the columns individually and throwing away carried 1s:
955
445
-----
390
Then finding all the columns where there should be a carried 1:
955
445
-----
101
Shifting this and adding it to the original result:
390
+ 1010
------
1400
So basically you're doing addition but ignoring all the carried 1s, and then adding in the carried ones after, as a separate step.
In base 2, XOR (^) performs addition correctly whenever at least one of the two bits is 0. When both bits are 1, it produces 0 and discards the carry: addition without carry, just like we did in the first step above.
x ^ y correctly adds all the bits where x and y are not both 1:
1110111011
^ 0110111101
-------------
1000000110 (x ^ y)
x & y gives us a 1 in all the columns where both bits are a 1. These are exactly the columns where we missed a carry:
1110111011
& 0110111101
-------------
0110111001 (x & y)
Of course, when you carry a 1 during addition, you shift it left one place, just like when you add in base 10.
   1000000110    (x ^ y)
+ 01101110010    ((x & y) << 1)
-------------
  10101111000
x + y is not equivalent to x ^ y + (x & y) << 1 as written: in C, + binds tighter than <<, and << binds tighter than ^, so the right-hand side parses as x ^ ((y + (x & y)) << 1).
Note also that your expression will evaluate to true for most values, since = means assignment and non-zero values mean true; == is what tests for equality.
EDIT
(x ^ y) + ((x & y) << 1) is correct with parentheses. The AND finds where a carry would happen and the shift carries it; the XOR finds where an addition would happen with no carry. Adding the two together unifies the result.
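Applying the identity repeatedly yields a complete adder; here is a sketch using unsigned so overflow wraps instead of being undefined:

#include <cstdio>

unsigned add_bitwise(unsigned x, unsigned y) {
    while (y != 0) {
        unsigned carry = (x & y) << 1; // columns that generate a carry, moved left
        x = x ^ y;                     // per-column sum with carries dropped
        y = carry;                     // add the carries in the next round
    }
    return x;
}

int main() {
    std::printf("%u\n", add_bitwise(955u, 445u)); // prints 1400, as in the example
    return 0;
}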

Operations on bits, getting the bigger value

I'm not familiar with bitwise operations. I have this sequence:
1 0 0 0 0 : 16
---------------
0 1 1 1 1 : 15
---------------
0 1 1 1 0 : 14
---------------
.
.
.
---------------
0 0 0 1 1 : 3
---------------
0 0 0 1 0 : 2
---------------
0 0 0 0 1 : 1
---------------
I want to check first if there is more than one "1". If that's the case, I want to remove the 1 with the biggest place value, and then get the biggest remaining 1. For example, take 15: there are four 1s, I remove the biggest one, the 1 at the 8's place, and I get 0 0 1 1 1 : 7, where the biggest 1 is now at the 4's place. How can I do this?
Here's the code that does what you want:
unsigned chk_bits(unsigned int x) {
    unsigned i;
    if (x != 0 && (x & (x - 1)) != 0) {
        /* More than one '1' bit */
        for (i = ~(~0U >> 1); (x & i) == 0; i >>= 1)
            ; /* Intentionally left blank */
        return x & ~i;
    }
    return x;
}
Note that I assume you're dealing with unsigned numbers. This is usually safer, because right-shifting is implementation-defined on signed integers, due to sign extension.
The if statement checks if there's more than one bit set in x. x & (x-1) is a known way to get a number that is the same as x with the least significant '1' bit turned off (for example, if x is 101100100, then x & (x-1) is 101100000). Thus, the if says:
If x is not zero, and if turning off the lowest bit set to 1 results in something that is not 0,
then...
Which is equivalent to saying that there's more than 1 bit set in x.
Then, we loop through every bit in x, stopping at the most significant bit that is set. i is initialized to 1000...0 (only the most significant bit set), and the loop keeps right-shifting it until x & i evaluates to something that is not zero, at which point we have found the most significant bit of x that is 1. At that point, taking i's complement yields the mask to turn off this bit in x, since ~i is a number with every bit set to 1 except the one bit that was a 1 (which corresponds to the highest-order bit of x). Thus, ANDing this with x gives you what you want.
The code is portable: it does not assume any particular representation, nor does it rely on the fact that unsigned is 32 or 64 bits.
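A quick usage check (the function from above is repeated so the snippet compiles on its own):

#include <cstdio>

unsigned chk_bits(unsigned int x) {
    unsigned i;
    if (x != 0 && (x & (x - 1)) != 0) {
        for (i = ~(~0U >> 1); (x & i) == 0; i >>= 1)
            ; /* intentionally left blank */
        return x & ~i;
    }
    return x;
}

int main() {
    std::printf("%u\n", chk_bits(15)); // 01111 -> 00111: prints 7, as in the question
    std::printf("%u\n", chk_bits(8));  // single bit set: returned unchanged, prints 8
    return 0;
}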
UPDATE: I'm adding a more detailed explanation after reading your comment.
1st step - understanding what x & (x-1) does:
We have to consider two possibilities here:
x ends with a 1 (.......0011001)
x ends with a 0 (.......0011000)
In the first case, it is easy to see that x-1 is just x with the rightmost bit set to 0. For example, 0011001 - 1 = 0011000, so, effectively, x & (x-1) will just be x-1.
In the second case, it might be slightly harder to understand, but if the rightmost bit of x is a 0, then x-1 will be x with every 0 bit switched to a 1 bit, starting on the least significant bits, until a 1 is found, which is turned into a 0.
Let me give you an example, because this can be tricky for someone new to this:
1101011000 - 1 = 1101010111
Why is that? Because the previous number of a binary number ending with a 0 is a binary number filled with one or more 1 bits in the rightmost positions. When we increment it, like 10101111101111 + 1, we have to increment the next "free" position, i.e., the next 0 position, to turn it into a 1, and then all of the 1-bits to the right of that position are turned into 0. This is the way ANY base-n counting works, the only difference is that for base-2 you only have 0's and 1's.
Think about how base-10 counting works. When we run out of digits, the value wraps around and we add a new digit on the left side. What comes after 999? Well, the counting resets again, with a new digit on the left, and the 9's wrap around to 0, and the result is 1000. The same thing happens with binary arithmetic.
Think about the process of counting in binary; we just have 2 bits, 0 and 1:
0 (decimal 0)
1 (decimal 1 - now, we ran out of bits. For the next number, this 1 will be turned into a 0, and we need to add a new bit to the left)
10 (decimal 2)
11 (decimal 3 - the process is going to repeat again - we ran out of bits, so now those 2 bits will be turned into 0 and a new bit to the left must be added)
100 (decimal 4)
101 (decimal 5)
110 (the same process repeats again)
111
...
See how the pattern is exactly as I described?
Remember we are considering the 2nd case, where x ends with a 0. While comparing x-1 with x, rightmost 0's on x are now 1's in x-1, and the rightmost 1 in x is now 0 in x-1. Thus, the only part of x that remains the same is that on the left of the 1 that was turned into a 0.
So, x & (x-1) will be the same as x until the position where the first rightmost 1 bit was. So now we can see that in both cases, x & (x-1) will in fact delete the rightmost 1 bit of x.
2nd step: What exactly is ~0U >> 1?
The letter U stands for unsigned. In C, integer constants are of type int unless you specify it. Appending a U to an integer constant makes it unsigned. I used this because, as I mentioned earlier, it is implementation defined whether right shifting makes sign extension. The unary operator ~ is the complement operator, it grabs a number, and takes its complement: every 0 bit is turned into 1 and every 1 bit is turned into 0. So, ~0 is a number filled with 1's: 11111111111.... Then I shift it right one position, so now we have: 01111111...., and the expression for this is ~0U >> 1. Finally, I take the complement of that, to get 100000...., which in code is ~(~0U >> 1). This is just a portable way to get a number with the leftmost bit set to 1 and every other set to 0.
You can give a look at K&R chapter 2, more specifically, section 2.9. Starting on page 48, bitwise operators are presented. Exercise 2-9 challenges the reader to explain why x & (x-1) works. In case you don't know, K&R is a book describing the C programming language written by Kernighan and Ritchie, the creators of C. The book title is "The C Programming Language", I recommend you to get a copy of the second edition. Every good C programmer learned C from this book.
I want to check first if there is more than one "1".
If a number has a single 1 in its binary representation then it is a number that can be represented in the form 2^x. For example,
 4   00000100   2^2
32   00010000   2^5
So to check for single one, you can just check for this property.
If log2(x) is a whole number then x has a single 1 in its binary representation.
You can calculate log2(x) as
log2(x) = log_y(x) / log_y(2)
where y can be anything, which for standard log functions is either 10 or e.
Here is a solution
#include <math.h>

double logBase2 = log(num) / log(2);
if (logBase2 != (int)logBase2) {
    /* more than one bit set: clear the highest 1
       (the loop assumes num fits in 8 bits, since i starts at 7) */
    int i = 7;
    for (; i > 0; i--) {
        if (num & (1 << i)) {
            num &= ~(1 << i);
            break;
        }
    }
}
}