Determining cell value using bitwise operation with adjacent cell values - c++

An r*c grid contains only 0's and 1's. In each iteration, if any adjacent cell (up, down, left, right) has the same value as the current cell, the value of the current cell is flipped. How can I come up with a bitwise formula for this? It can be done with a simple if condition, but I want to know the bitwise operation so that the whole update can be done once per row.
I am talking about this problem. I saw a solution using this concept here, but I couldn't understand how these XOR operations determine the cell value.
ans[i] ^= ((l ^ r) | (r ^ u) | (u ^ d)) | (~s[i] ^ l);
ans[i] &= prefix;
Any help would be appreciated :D

For the start, consider s[i], l, r, u, and d to be single bits, that is, boolean variables.
s[i] (abbreviated as s in this answer) is the old color of the cell to be updated.
l, r, u, and d are the colors of the adjacent cells left, right, above (up), and below (down) of the cell to be updated.
ans[i] (abbreviated as ans in this answer) is the new color of the cell after the update.
We initialize ans = s and update it only if needed.
Recall the rules from the game for a single cell C:
If all cells adjacent to C have the opposite color of C, then C retains its color.
Otherwise (if a cell adjacent to C has the same color as C), C changes its color.
Are there various adjacent colors?
For the first condition you can use a fail-fast approach. No matter the color of C, if the adjacent cells have various colors (some are 0 and some are 1) then C changes its color. To check whether the adjacent cells l, r, u, and d have various colors you only need three checks ✱:
various_adjacent_colors = (l != r) || (r != u) || (u != d)
In bit-wise notation this is
various_adjacent_colors = (l ^ r) | (r ^ u) | (u ^ d)
✱ The "missing" checks like r != d are not necessary. Think about it the other way: If all three checks fail, then we know (l == r) && (r == u) && (u == d). In that case, from transitivity of == follows that (l == u), and (l == d), and (r == d). Therefore, all colors are the same.
Fail-Fast for various adjacent colors
If we find various adjacent colors, then we change s:
if (various_adjacent_colors)
ans = !s
In bit-wise notation this is
ans ^= various_adjacent_colors
Are all colors equal?
If we did not fail-fast, we know that all adjacent colors are equal to each other but not if they are equal to s. If s == all_adjacent_colors then we change s and if s != all_adjacent_colors then we retain s.
if (!various_adjacent_colors && s == l) // l can be replaced by either r, u, or d
ans = !s
In bit-wise notation this is
ans ^= ~various_adjacent_colors & ~(s ^ l) or
ans ^= ~various_adjacent_colors & (~s ^ l)
Putting everything together
Now let's inline (and slightly simplify) all the bit-wise notations:
vari = (l ^ r) | (r ^ u) | (u ^ d); ans ^= vari; ans ^= ~vari & (~s ^ l) is the same as
vari = (l ^ r) | (r ^ u) | (u ^ d); ans ^= vari | (~s ^ l) is the same as
ans ^= ((l ^ r) | (r ^ u) | (u ^ d)) | (~s ^ l)
Seems familiar, right? :)
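To convince yourself that the combined formula matches the plain if-rule, both can be compared on all 32 combinations of single-bit inputs. The following is only a small sketch; the function names next_naive, next_bitwise and formulas_agree are made up for this illustration and do not come from the linked solution.

```cpp
#include <cassert>

// Reference rule from the question: flip s if at least one neighbour equals s.
bool next_naive(bool s, bool l, bool r, bool u, bool d) {
    const bool any_same = (l == s) || (r == s) || (u == s) || (d == s);
    return any_same ? !s : s;
}

// The bitwise formula, applied to values that are each 0 or 1.
bool next_bitwise(unsigned s, unsigned l, unsigned r, unsigned u, unsigned d) {
    assert((s | l | r | u | d) <= 1u);  // single-bit inputs only
    unsigned ans = s;
    ans ^= ((l ^ r) | (r ^ u) | (u ^ d)) | (~s ^ l);
    return (ans & 1u) != 0;  // ~s also sets the upper bits, so keep bit 0 only
}

// Hypothetical helper: compares both versions on all 2^5 inputs.
bool formulas_agree() {
    for (unsigned bits = 0; bits < 32; ++bits) {
        const unsigned s = bits & 1u, l = (bits >> 1) & 1u, r = (bits >> 2) & 1u,
                       u = (bits >> 3) & 1u, d = (bits >> 4) & 1u;
        if (next_naive(s, l, r, u, d) != next_bitwise(s, l, r, u, d)) return false;
    }
    return true;
}
```

The final & 1u plays the same role for one bit that ans[i] &= prefix plays for a whole row.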
From single bits to bit-vectors
So far, we only considered single bits. The linked solution uses bit-vectors instead to simultaneously update all bits/cells in a row of the 2D game board. This approach only fails at the borders of the game board:
From r = s[i] << 1 the game board might end up bigger than it should be. ans[i] &= prefix fixes the size by masking overhanging bits.
At the top and bottom row the update does not work because u = s[i-1] and d = s[i+1] do not exist. The author updates these rows "manually" in a for loop.
The update for the leftmost and rightmost cell in each row might be wrong since r = s[i] << 1 and l = s[i] >> 1 shift in "adjacent" cells of color 0 which are not actually in the game. The author updates these cells "manually" in another for loop.
By the way: A (better?) alternative to the mentioned "manual" border updates is to slightly enlarge the game board with an additional virtual row/column at each border. Before each game iteration, the virtual rows/columns are initialized such that they don't affect the update. Then the update of the actual game board is done as usual. The virtual rows/columns don't have to be stored, instead use ...
// define once for the game
bitset<N> maskMsb, maskLsb;
maskMsb[m-1] = 1;
maskLsb[0] = 1;
// define for each row when updating the game board
bitset<N> l = (s[i] >> 1) | (~s[i] & maskMsb);
bitset<N> r = (s[i] << 1) | (~s[i] & maskLsb);
bitset<N> u = i+1 <= n-1 ? s[i+1] : ~s[n-1];
bitset<N> d = i-1 >= 0 ? s[i-1] : ~s[0];
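Put together, one full iteration with virtual borders could look roughly like the sketch below. It assumes that the board has n rows and m <= N columns, that bit 0 is the rightmost cell of a row, and that all bits at positions >= m are zero on input. The names Row and step are invented for this example.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t N = 8;  // capacity of one row (assumption for this sketch)
using Row = std::bitset<N>;

// One game iteration on a board with s.size() rows and m columns.
std::vector<Row> step(const std::vector<Row>& s, std::size_t m) {
    assert(!s.empty() && m >= 1 && m <= N);
    const std::size_t n = s.size();

    Row maskMsb, maskLsb, prefix;
    maskMsb[m - 1] = 1;  // leftmost real cell
    maskLsb[0] = 1;      // rightmost real cell
    for (std::size_t j = 0; j < m; ++j) prefix[j] = 1;  // bits that belong to the board

    std::vector<Row> ans(s);
    for (std::size_t i = 0; i < n; ++i) {
        // Virtual neighbours get the opposite colour, so they never trigger a flip.
        const Row l = (s[i] >> 1) | (~s[i] & maskMsb);
        const Row r = (s[i] << 1) | (~s[i] & maskLsb);
        const Row u = (i + 1 < n) ? s[i + 1] : ~s[i];
        const Row d = (i > 0) ? s[i - 1] : ~s[i];
        ans[i] ^= ((l ^ r) | (r ^ u) | (u ^ d)) | (~s[i] ^ l);
        ans[i] &= prefix;  // drop the overhanging bits
    }
    return ans;
}
```

On a 2x2 board of zeros every cell has an equal neighbour, so all four cells flip. A 2x2 checkerboard stays unchanged.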

Related

My segment tree update function doesn't work properly

The problem:
In this task, you need to write a regular segment tree for the sum.
Input The first line contains two integers n and m (1≤n,m≤100000), the
size of the array and the number of operations. The next line contains
n numbers a_i, the initial state of the array (0≤a_i≤10^9). The following
lines contain the description of the operations. The description of
each operation is as follows:
1 i v: set the element with index i to v (0≤i<n, 0≤v≤10^9).
2 l r:
calculate the sum of elements with indices from l to r−1 (0≤l<r≤n).
Output
For each operation of the second type print the corresponding
sum.
I'm trying to implement a segment tree and all my functions work properly except for the update function:
void update(int i, int delta, int v = 0, int tl = 0, int tr = n - 1)
{
    if (tl == i && tr == i)
        t[v] += delta;
    else if (tl <= i && i <= tr)
    {
        t[v] += delta;
        int m = (tl + tr) / 2;
        int left = 2 * v + 1;
        int right = left + 1;
        update(i, delta, left, tl, m);
        update(i, delta, right, m + 1, tr);
    }
}
I got WA on the segment tree problem with it, whereas with this update function I got accepted:
void update(int i, int new_value, int v = 0, int tl = 0, int tr = n - 1)
{
    if (tl == i && tr == i)
        t[v] = new_value;
    else if (tl <= i && i <= tr)
    {
        int m = (tl + tr) / 2;
        int left = 2 * v + 1;
        int right = left + 1;
        update(i, new_value, left, tl, m);
        update(i, new_value, right, m + 1, tr);
        t[v] = t[left] + t[right];
    }
}
I really don't understand why my first version is not working. I thought I might have an overflow problem and changed everything to long long, but it didn't help, so the problem must be in the update algorithm itself. But it looks fine to me: for every segment that includes i, I add some delta to the sum of that segment (delta can be negative; for example, if I had the number 5 and changed it to 3, then delta is -2). So what's the problem? I really don't see it :(
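There are two problems with your first solution:
The task expects a point assignment ("set the element with index i to v"). The condition (tl == i && tr == i) checks whether you have reached the leaf node of the tree.
At the leaf node you have to replace the value instead of adding to it, which is what you did in the second solution. Adding only gives the same result if delta is exactly v minus the value currently stored at index i.
Secondly, once the leaf is assigned, a non-leaf node can only be recomputed after its child nodes have been updated, as in t[v] = t[left] + t[right]. Updating t[v] before making the recursive calls gives a wrong answer whenever the value passed in is not that exact difference.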
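For reference, here is a compact sketch of a sum segment tree with point assignment, using the same 2 * v + 1 child numbering as the question. The struct name SegTree and its member names are made up for this example, and the array is assumed to be non-empty.

```cpp
#include <cassert>
#include <vector>

struct SegTree {
    int n;
    std::vector<long long> t;

    explicit SegTree(const std::vector<long long>& a)
        : n(static_cast<int>(a.size())), t(4 * a.size(), 0) {
        assert(n > 0);
        build(a, 0, 0, n - 1);
    }

    // Set element i to val.
    void assign(int i, long long val) { assign(i, val, 0, 0, n - 1); }

    // Sum of the elements with indices l .. r-1, as in the task statement.
    long long sum(int l, int r) const { return sum(0, 0, n - 1, l, r - 1); }

private:
    void build(const std::vector<long long>& a, int v, int tl, int tr) {
        if (tl == tr) { t[v] = a[tl]; return; }
        const int m = (tl + tr) / 2;
        build(a, 2 * v + 1, tl, m);
        build(a, 2 * v + 2, m + 1, tr);
        t[v] = t[2 * v + 1] + t[2 * v + 2];
    }

    void assign(int i, long long val, int v, int tl, int tr) {
        if (tl == tr) { t[v] = val; return; }     // leaf: replace, do not add
        const int m = (tl + tr) / 2;
        if (i <= m) assign(i, val, 2 * v + 1, tl, m);
        else        assign(i, val, 2 * v + 2, m + 1, tr);
        t[v] = t[2 * v + 1] + t[2 * v + 2];       // recompute after the child changed
    }

    long long sum(int v, int tl, int tr, int l, int r) const {
        if (l > tr || r < tl) return 0;           // no overlap
        if (l <= tl && tr <= r) return t[v];      // fully covered
        const int m = (tl + tr) / 2;
        return sum(2 * v + 1, tl, m, l, r) + sum(2 * v + 2, m + 1, tr, l, r);
    }
};
```

Unlike the version in the question, assign descends into only one child per level, so it visits O(log n) nodes.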

How to simplify this "clear multiple bits at once" function?

I finally figured out through trial and error how to clear multiple bits on an integer:
const getNumberOfBitsInUint8 = function(i8) {
  let i = 0
  while (i8) {
    i++
    i8 >>= 1
  }
  return i
}
const write = function(n, i, x) {
  let o = 0xff // 0b11111111
  let c = getNumberOfBitsInUint8(x)
  let j = 8 - i // right side start
  let k = j - c // right side remaining
  let h = c + i
  let a = x << k // set bits
  let b = a ^ o // set bits flip
  let d = o >> h // mask right
  let q = d ^ b //
  let m = o >> j // mask left
  let s = m << j
  let t = s ^ q // clear bits!
  let w = n | a // set the set bits
  let z = w & ~t // perform some magic https://stackoverflow.com/q/8965521/169992
  return z
}
The write function takes an integer n, the index i to write bits into, and the bits value x.
Is there any way to simplify this function down and remove some steps? (Without just combining multiple operations on a single line)?
One possibility is to first clear the relevant part and then copy the bits into it:
return (n & ~((0xff << (8 - c)) >> i)) | (x << (8 - c - i))
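assuming the left shift is restricted to 8 bits so the top bits disappear (JavaScript shifts work on 32-bit integers, so there the shifted value has to be masked with & 0xff first). Another is to use XOR to find the bits that need to change: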
return n ^ ((((n >> (8 - c - i)) ^ x) << (8 - c)) >> i)
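Both one-liners can be written out with explicit 8-bit masking. The sketch below uses C++ instead of JavaScript and passes the field width c as a parameter rather than deriving it from x. That parameter and the function names write_bits and write_bits_xor are assumptions of this example.

```cpp
#include <cassert>
#include <cstdint>

// Overwrite c bits of n, starting i bits from the most significant bit,
// with the low c bits of x (clear the field, then copy the bits in).
std::uint8_t write_bits(std::uint8_t n, unsigned i, unsigned c, std::uint8_t x) {
    assert(c >= 1 && i + c <= 8);
    const unsigned mask = ((0xFFu << (8 - c)) & 0xFFu) >> i;  // c ones, i bits from the top
    const unsigned value = (static_cast<unsigned>(x) << (8 - c - i)) & mask;
    return static_cast<std::uint8_t>((n & ~mask) | value);
}

// Same result via XOR: compute which bits of the field differ and flip them.
std::uint8_t write_bits_xor(std::uint8_t n, unsigned i, unsigned c, std::uint8_t x) {
    assert(c >= 1 && i + c <= 8);
    const unsigned field = static_cast<unsigned>(n) >> (8 - c - i);
    const unsigned diff = ((field ^ x) << (8 - c)) & 0xFFu;   // keep 8 bits only
    return static_cast<std::uint8_t>(n ^ (diff >> i));
}
```

For example, writing the three bits 010 at index 2 into 11111111 gives 11010111 with either function.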

Range Update - Range Query using Fenwick Tree

http://ayazdzulfikar.blogspot.in/2014/12/penggunaan-fenwick-tree-bit.html?showComment=1434865697025#c5391178275473818224
For example, suppose the function value f(i) at index i is i ^ k (i to the power k) for some fixed k >= 0. We are given queries of the following kinds:
Add v to array[i] for all a <= i <= b.
Determine the total of array[i] * f(i) over all a <= i <= b (with f as
defined above).
To solve this, the prefix query can be written as Query(x) = m * g(x) - c,
where g(x) is f(1) + f(2) + ... + f(x).
To accomplish this, we
need to know the values of m and c. For that, we need 2 separate
BITs. Consider an update of the form a b v. Computing
the value of m is virtually identical to Range Update -
Point Query. For each value of i there are the following
cases:
i <a, m = 0
a <= i <= b, m = v
b <i, m = 0
With this observation, it is clear that Range Update - Point Query can be used on one of the BITs. To calculate the value of c, we again look at the cases for each value of i:
i <a, then c = 0
a <= i <= b, then c = v * g (a - 1)
b <i, then c = -v * (g (b) - g (a - 1)), so that Query (x) = m * g (x) - c = v * (g (b) - g (a - 1))
Again, we need Range Update - Point Query, but on a different BIT.
As a small aid, here are the values of g (x) for k <= 3 :p
k = 0 -> x
k = 1 -> x * (x + 1) / 2
k = 2 -> x * (x + 1) * (2x + 1) / 6
k = 3 -> (x * (x + 1) / 2) ^ 2
Now, an example problem: SPOJ - Horrible Queries. This problem is
the one described above with k = 0. Note also that
some problems are more extreme: the function
is not a single power k but a whole polynomial,
e.g. LA - Alien Abduction Again. To solve such a problem,
keep a separate BIT for the m counter of each power. The c counters
can share one combined BIT.
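For the simplest case k = 0, where g(x) = x, the two-BIT scheme can be sketched as follows. Indices are 1-based, and the names RangeBit, bm and bc are invented for this illustration. With Query(x) = m * g(x) - c, the c tree receives +v * g(a - 1) at position a and -v * g(b) at position b + 1.

```cpp
#include <cassert>
#include <vector>

struct RangeBit {
    int n;
    std::vector<long long> bm, bc;  // point-query BITs for m and c

    explicit RangeBit(int size) : n(size), bm(size + 1, 0), bc(size + 1, 0) {
        assert(size >= 1);
    }

    // Add v to array[i] for all a <= i <= b.
    void update(int a, int b, long long v) {
        add(bm, a, v);
        add(bm, b + 1, -v);
        add(bc, a, v * g(a - 1));
        add(bc, b + 1, -v * g(b));
    }

    // Sum of array[i] for all a <= i <= b.
    long long query(int a, int b) const { return prefix(b) - prefix(a - 1); }

private:
    static long long g(long long x) { return x; }  // k = 0: f(i) = 1

    void add(std::vector<long long>& t, int i, long long v) {
        for (; i <= n; i += i & -i) t[i] += v;     // positions past n are ignored
    }
    static long long get(const std::vector<long long>& t, int i) {
        long long s = 0;
        for (; i > 0; i -= i & -i) s += t[i];
        return s;
    }
    // Query(x) = m * g(x) - c
    long long prefix(int x) const { return get(bm, x) * g(x) - get(bc, x); }
};
```

For another power k only g changes. For a polynomial one would, as the text above suggests, keep one m tree per power.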
How can we use this concept if:
Given an array of integers A1,A2,…AN.
Given x,y: Add 1×2 to Ax, add 2×3 to Ax+1, add 3×4 to Ax+2, add 4×5 to
Ax+3, and so on until Ay.
Then return Sum of the range [Ax,Ay].

Shift masked bits to the lsb

When you AND some data with a mask, you get a result of the same size as the data/mask.
What I want to do is take the masked bits in the result (where there was a 1 in the mask) and shift them to the right so that they are next to each other and I can perform a CTZ (Count Trailing Zeroes) on them.
I didn't know what such a procedure is called, so Google has failed me. The operation should preferably not use a loop; it has to be as fast as possible.
And here is an incredible image made in MS Paint.
This operation is known as compress right. It is implemented as part of BMI2 as the PEXT instruction, in Intel processors as of Haswell.
Unfortunately, without hardware support it is quite an annoying operation. Of course there is an obvious solution, just moving the bits one by one in a loop. Here is the one given by Hacker's Delight:
unsigned compress(unsigned x, unsigned m) {
    unsigned r, s, b; // Result, shift, mask bit.
    r = 0;
    s = 0;
    do {
        b = m & 1;
        r = r | ((x & b) << s);
        s = s + b;
        x = x >> 1;
        m = m >> 1;
    } while (m != 0);
    return r;
}
But there is another way, also given by Hacker's Delight, which does less looping (the number of iterations is logarithmic in the number of bits) but more work per iteration:
unsigned compress(unsigned x, unsigned m) {
    unsigned mk, mp, mv, t;
    int i;
    x = x & m; // Clear irrelevant bits.
    mk = ~m << 1; // We will count 0's to right.
    for (i = 0; i < 5; i++) {
        mp = mk ^ (mk << 1); // Parallel prefix.
        mp = mp ^ (mp << 2);
        mp = mp ^ (mp << 4);
        mp = mp ^ (mp << 8);
        mp = mp ^ (mp << 16);
        mv = mp & m; // Bits to move.
        m = (m ^ mv) | (mv >> (1 << i)); // Compress m.
        t = x & mv;
        x = (x ^ t) | (t >> (1 << i)); // Compress x.
        mk = mk & ~mp;
    }
    return x;
}
Notice that a lot of the values there depend only on m. Since you only have 512 different masks, you could precompute those and simplify the code to something like this (not tested)
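// masks is assumed to be a precomputed table: masks[k][0] is mask k itself,
// masks[k][i + 1] is the mv value of round i for that mask.
unsigned compress(unsigned x, int maskindex) {
    unsigned t;
    int i;
    x = x & masks[maskindex][0];
    for (i = 0; i < 5; i++) {
        t = x & masks[maskindex][i + 1];
        x = (x ^ t) | (t >> (1 << i));
    }
    return x;
}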
Of course all of these can be turned into "not a loop" by unrolling; the second and third ways are probably more suitable for that. That's a bit of a cheat, however.
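The precomputation itself is just the mask-dependent half of the logarithmic routine. A possible sketch for a single mask is shown below. The names MaskTable, make_table and compress_table are invented here, and a real table for all masks would simply be an array of such entries.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Entry 0 is the mask itself, entries 1..5 are the "bits to move" of each round.
using MaskTable = std::array<std::uint32_t, 6>;

MaskTable make_table(std::uint32_t m) {
    MaskTable t{};
    t[0] = m;
    std::uint32_t mk = ~m << 1;                 // count 0's to the right
    for (int i = 0; i < 5; i++) {
        std::uint32_t mp = mk ^ (mk << 1);      // parallel prefix
        mp ^= mp << 2;
        mp ^= mp << 4;
        mp ^= mp << 8;
        mp ^= mp << 16;
        const std::uint32_t mv = mp & m;        // bits to move in this round
        t[i + 1] = mv;
        m = (m ^ mv) | (mv >> (1 << i));        // compress m
        mk &= ~mp;
    }
    return t;
}

// The data-dependent half: five rounds that only use the stored values.
std::uint32_t compress_table(std::uint32_t x, const MaskTable& t) {
    x &= t[0];
    for (int i = 0; i < 5; i++) {
        const std::uint32_t moved = x & t[i + 1];
        x = (x ^ moved) | (moved >> (1 << i));
    }
    return x;
}
```

With the mask 0x0F0F0F0F this gathers the low nibble of every byte, so 0x12345678 becomes 0x2468.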
You can use the pack-by-multiplication technique similar to the one described here. This way you don't need any loop and can mix the bits in any order.
For example, with the mask 0b10101001 == 0xA9 like above and 8-bit data abcdefgh (where a-h are the 8 bits) you can use the expression below to get 0000aceh
uint8_t compress_maskA9(uint8_t x)
{
    const uint8_t mask1 = 0xA9 & 0xF0;
    const uint8_t mask2 = 0xA9 & 0x0F;
    // unsigned constants: the products deliberately wrap modulo 2^32
    return (((x & mask1)*0x03000000u >> 28) & 0x0C) | ((x & mask2)*0x50000000u >> 30);
}
In this specific case there are some overlaps of the 4 bits while adding (which incur unexpected carry) during the multiplication step, so I've split them into 2 parts, the first one extracts bit a and c, then e and h will be extracted in the latter part. There are other ways to split the bits as well, like a & h then c & e. You can see the results compared to Harold's function live on ideone
An alternate way with only one multiplication
const uint32_t X = (x << 8) | x;
return (X & 0x8821)*0x12050000 >> 28;
I got this by duplicating the bits so that they're spaced out farther, leaving enough space to avoid the carry. This is often better than splitting into 2 multiplications
If you want the result's bits reversed (i.e. heca0000) you can easily change the magic numbers accordingly
// result: he00 | 00ca;
return (((x & 0x09)*0x88000000u >> 28) & 0x0C) | (((x & 0xA0)*0x04800000u) >> 30);
or you can also extract the 3 bits e, c and a at the same time, leaving h separately (as I mentioned above, there are often multiple solutions) and you need only one multiplication
return ((x & 0xA8)*0x12400000u >> 29) | (x & 0x01) << 3; // result: 0eca | h000
But there might be a better alternative like the above second snippet
const uint32_t X = (x << 8) | x;
return (X & 0x2881)*0x80290000 >> 28;
Correctness check: http://ideone.com/PYUkty
For a larger number of masks you can precompute the magic numbers corresponding to those masks and store them in an array so that you can look them up immediately. I calculated the magic numbers by hand, but you can do that automatically.
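A check of this kind can also be reproduced locally by comparing the multiplication variants with the bit-by-bit loop on all 256 inputs. The sketch below uses uint32_t operands so that every product wraps modulo 2^32. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Reference: move the selected bits one by one.
unsigned compress_ref(unsigned x, unsigned m) {
    unsigned r = 0, s = 0;
    do {
        const unsigned b = m & 1u;
        r |= (x & b) << s;
        s += b;
        x >>= 1;
        m >>= 1;
    } while (m != 0);
    return r;
}

// Two multiplications: a and c from the high nibble, e and h from the low one.
std::uint8_t compress_a9_two_mul(std::uint8_t x) {
    const std::uint32_t hi = x & 0xA0u;
    const std::uint32_t lo = x & 0x09u;
    return static_cast<std::uint8_t>(((hi * 0x03000000u >> 28) & 0x0Cu) |
                                     (lo * 0x50000000u >> 30));
}

// One multiplication on the duplicated byte.
std::uint8_t compress_a9_one_mul(std::uint8_t x) {
    const std::uint32_t X = (static_cast<std::uint32_t>(x) << 8) | x;
    return static_cast<std::uint8_t>((X & 0x8821u) * 0x12050000u >> 28);
}

// Hypothetical helper: true if both variants agree with the loop everywhere.
bool all_inputs_match() {
    for (unsigned x = 0; x < 256; ++x) {
        const unsigned expected = compress_ref(x, 0xA9u);
        if (compress_a9_two_mul(static_cast<std::uint8_t>(x)) != expected) return false;
        if (compress_a9_one_mul(static_cast<std::uint8_t>(x)) != expected) return false;
    }
    return true;
}
```

Keeping the operands unsigned matters here: with signed int the overflowing product would be undefined behaviour, and a right shift of a negative result could smear the sign bit into the answer.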
Explanation
We have abcdefgh & mask1 = a0c00000. Multiply it with magic1
     ........................a0c00000
   × 00000011000000000000000000000000 (magic1 = 0x03000000)
     ────────────────────────────────
     a0c00000........................
  + a0c00000......................... (the leading "a" bit is outside int's range
     ──────────────────────────────── so it'll be truncated)
r1 = acc.............................
  => (r1 >> 28) & 0x0C = 0000ac00
Similarly we multiply abcdefgh & mask2 = 0000e00h with magic2
     ........................0000e00h
   × 01010000000000000000000000000000 (magic2 = 0x50000000)
     ────────────────────────────────
     e00h............................
   + 0h..............................
     ────────────────────────────────
r2 = eh..............................
  => (r2 >> 30) = 000000eh
Combine them together we have the expected result
((r1 >> 28) & 0x0C) | (r2 >> 30) = 0000aceh
And here's the demo for the second snippet
                  abcdefghabcdefgh
&                 1000100000100001 (0x8821)
  ────────────────────────────────
                  a000e00000c0000h
× 00010010000001010000000000000000 (0x12050000)
  ────────────────────────────────
  000h
  00e00000c0000h
+ 0c0000h
  a000e00000c0000h
  ────────────────────────────────
= acehe0h0c0c00h0h
& 11110000000000000000000000000000
  ────────────────────────────────
= aceh
For the reversed order case:
                  abcdefghabcdefgh
&                 0010100010000001 (0x2881)
  ────────────────────────────────
                  00c0e000a000000h
× 10000000001010010000000000000000 (0x80290000)
  ────────────────────────────────
  000a000000h
  00c0e000a000000h
+ 0e000a000000h
  h
  ────────────────────────────────
= hecaea00a0h0h00h
& 11110000000000000000000000000000
  ────────────────────────────────
= heca
Related:
How to create a byte out of 8 bool values (and vice versa)?
Redistribute least significant bits from a 4-byte array to a nibble

Test if fixed set is equal without branching

I have a set of integers (x, y, z) and a function that takes 3 integers (u, v, w). How can I test if (x,y,z) == (u,v,w)? The naive way is:
bool match = (x == u || x == v || x == w) && (y == u || y == v || y == w) && (z == u || z == v || z == w);
Does anyone know of some smart bit operations/arithmetic to do the same thing?
Edit: I can assume that neither (x, y, z) or (u, v, w) contain duplicates.
In this case, you can replace the logical operations by bitwise operations to eliminate the branching:
bool match = (x == u | x == v | x == w)
& (y == u | y == v | y == w)
& (z == u | z == v | z == w);
However, you would have to measure the performance effect to see if this is faster or slower.
You can eliminate a bunch of unequal vectors up front by converting to unsigned and comparing the sums before doing the real test.
If a and b are the same then a^b is zero, so !(a^b) is non-zero only when a and b are the same. Supposing your platform can do logical 'not' without a branch, you can therefore test whether x is a member of (u, v, w) with a single branch using:
if(!(x^u) | !(x^v) | !(x^w))
And hence whether all of (x, y, z) are members of (u, v, w) using:
if(
(!(x^u) | !(x^v) | !(x^w)) &
(!(y^u) | !(y^v) | !(y^w)) &
(!(z^u) | !(z^v) | !(z^w)))
i.e. just doing a bitwise and on the various results, and again only a single branch.
If your platform needs a branch to perform !, e.g. if it's performed essentially as a ? 0 : 1, then that's ten conditionals and no better than the naive solution.
In C there is no way to do this without branching.
If you are willing to inline-assembly you can do this with some CMPXCHG instructions.
As pointed out in the comments, your 'naive' way matches whenever all the elements in (x,y,z) are contained in the set (u,v,w). If you really want to test if the sets are equivalent, you probably want
(x==u && ((y==v && z==w) || (y==w && z==v))) ||
(y==u && ((z==v && x==w) || (z==w && x==v))) ||
(z==u && ((x==v && y==w) || (x==w && y==v)));
You can quickly filter out many mismatches with
bad = (x+y+z) - (u+v+w);
Some processors have a non-branching 'min' and 'max' instructions, which would allow you to do
a = min(x,y)
b = max(x,y)
c = min(b,z)
x = min(a,c)
y = max(a,c)
z = max(b,z)
//repeat sorting sequence for u,v,w
match = (x==u)&(y==v)&(z==w);
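In C++ the sort-then-compare idea can be sketched with std::min and std::max, next to an explicit check of all six pairings for comparison. Whether the compiler really emits branch-free code for either version depends on the target and has to be checked in the generated assembly. The function names are made up for this example.

```cpp
#include <algorithm>
#include <cassert>

// Explicit test: (x, y, z) is some permutation of (u, v, w).
bool same_set_exact(int x, int y, int z, int u, int v, int w) {
    return (x == u && ((y == v && z == w) || (y == w && z == v))) ||
           (y == u && ((z == v && x == w) || (z == w && x == v))) ||
           (z == u && ((x == v && y == w) || (x == w && y == v)));
}

// Sort both triples with a three-element min/max network, then compare by position.
bool same_set_sorted(int x, int y, int z, int u, int v, int w) {
    auto sort3 = [](int& p, int& q, int& r) {
        const int a = std::min(p, q);
        const int b = std::max(p, q);
        const int c = std::min(b, r);
        const int hi = std::max(b, r);
        p = std::min(a, c);
        q = std::max(a, c);
        r = hi;
    };
    sort3(x, y, z);
    sort3(u, v, w);
    return (x == u) & (y == v) & (z == w);  // bitwise & avoids short-circuit jumps
}
```

Both functions test true set equality, unlike the 'naive' expression from the question, which only tests that every element of (x, y, z) occurs in (u, v, w).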