Bit operation used in a for loop - c++

I found this loop in the source code of an algorithm. I don't think the details of the problem are relevant here, because this is a really small part of the solution.
void update(int i, int value, int array[], int n) {
    for (; i < n; i += ~i & (i + 1)) {
        array[i] += value;
    }
}
I don't really understand what happens in that for loop, is it some sort of trick? I found something similar named Fenwick trees, but they look a bit different than what I have here.
Any ideas what this loop means?
Also, I found this:
"Bit Hack #9. Isolate the rightmost 0-bit.
y = ~x & (x+1)
"

You are correct: the bit-hack ~i & (i + 1) should evaluate to an integer which is all binary 0's, except the one corresponding to the rightmost zero-bit of i, which is set to binary 1.
So at the end of each pass of the for loop, it adds this value to i. Since the corresponding bit in i is zero, this has the effect of setting it, without affecting any other bits in i. This will strictly increase the value of i on each pass, until i overflows (or becomes -1, if you started with i < 0). In context, you can probably expect that it is called with i >= 0, and that i < n is there to terminate the loop before your index walks off the end of the array.
The overall function should have the effect of iterating through the zero-bits of the original value of i from least- to most-significant, setting them one by one, and incrementing the corresponding elements of the array.
Fenwick trees are a clever way to accumulate and query statistics efficiently; as you say, their update loop looks a bit like this, and typically uses a comparable bit-hack. There are bound to be multiple ways to accomplish this kind of bit-fiddling, so it is certainly possible that your source code is updating a Fenwick tree, or something comparable.

Assume that from the right to the left, you have some number of 1 bits, a 0 bit, and then more bits in x.
If you add x + 1, then all the 1's at the right are changed to 0, the 0 is changed to 1, the rest is unchanged. For example xxxx011 + 1 = xxxx100.
In ~x, you have the same number of 0 bits, a 1 bit, and the inverses of the other bits. The bitwise and produces the 0 bits, one 1 bit, and since the remaining bits are and'ed with their negation, those bits are 0.
So the result of ~x & (x + 1) is a number with one 1 bit where x had its rightmost zero bit.
If you add this to x, you change the rightmost 0 to a 1. So if you do this repeatedly, you change the 0 bits in x to 1, from the right to the left.
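For example (my own numbers), take x = 1011:

x          = 1011
x + 1      = 1100
~x         = ...0100
~x & (x+1) =    0100    the rightmost 0-bit of x, isolated
x + 0100   = 1111       that 0-bit is now set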

The update function iterates over the 0-bits of i from the rightmost zero to the leftmost, setting them one by one, and adds value to the i-th element of array at each step.
The for loop checks that i is less than n; if so, ~i & (i + 1) is an integer that is all binary 0's except for a single 1 at the position of the rightmost 0-bit of i. array[i] += value then adds value to the current element, and i += ~i & (i + 1) sets that 0-bit.
Setting i to 8 and working through the iterations by hand may make this clearer.
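For example (my own trace, with n = 32), starting from i = 8:

i = 8  (1000):   ~i & (i+1) = 1,  array[8]  += value, then i becomes 9  (1001)
i = 9  (1001):   ~i & (i+1) = 2,  array[9]  += value, then i becomes 11 (1011)
i = 11 (1011):   ~i & (i+1) = 4,  array[11] += value, then i becomes 15 (1111)
i = 15 (1111):   ~i & (i+1) = 16, array[15] += value, then i becomes 31 (11111)
i = 31 (11111):  ~i & (i+1) = 32, array[31] += value, then i becomes 63, which is >= n, so the loop stops

Each step sets the rightmost 0-bit of i, exactly as described above.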


Specific binary permutation generating function

So I'm writing a program where I need to produce strings of binary numbers that are not only a specific length, but also have a specific number of 1's and 0's. In addition, these strings are compared to a higher and a lower value to see if they are in that specific range. The issue I'm having is that I'm dealing with 64-bit unsigned integers. So sometimes, very large numbers that require all 64 bits produce a lot of permutations of binary strings for values which are not in the range at all, and it's taking a ton of time.
I'm curious if it is possible for an algorithm to take in two bound values, a number of ones, and only produce binary strings in between the bound values with that specific number of ones.
This is what I have so far, but it's producing way too many numbers.
void generatePermutations(int no_ones, int length, uint64_t smaller, uint64_t larger, uint64_t& accum) {
    char charArray[length + 1];
    for (int i = length - 1; i > -1; i--) {
        if (no_ones > 0) {
            charArray[i] = '1';
            no_ones--;
        } else {
            charArray[i] = '0';
        }
    }
    charArray[length] = '\0';
    do {
        std::string val(charArray);
        uint64_t num = convertToNum(val);
        if (num >= smaller && num <= larger) {
            accum++;
        }
    } while (std::next_permutation(charArray, (charArray + length)));
}
(Note: The number of 1-bits in a binary value is generally called the population count -- popcount, for short -- or Hamming weight.)
There is a well-known bit-hack to cycle through all binary words with the same population count, which basically does the following:
Find the longest suffix of the word consisting of a 0, a non-empty sequence of 1s, and finally a possibly empty sequence of 0s.
Change the first 0 to a 1; the following 1 to a 0, and then shift all the other 1s (if any) to the end of the word.
Example:
00010010111100
       ^          beginning of the suffix
00010011000111
       ^          the 0 becomes 1
        ^         the following 1 becomes 0
           ^^^    remaining 1s, shifted to the end of the word
That can be done quite rapidly by using the fact that the lowest-order set bit in x is x & -x (where - represents the 2s-complement negative of x). To find the beginning of the suffix, it suffices to add the lowest-order set bit to the number, and then find the new lowest-order set bit. (Try this with a few numbers and you should see how it works.)
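To make that concrete, here is my own trace of those two steps on the example word above (binary literals, so it is easy to compare):
unsigned x  = 0b00010010111100;  // the example word
unsigned lo = x & -x;            // 0b00000000000100 : lowest-order set bit
unsigned nx = x + lo;            // 0b00010011000000 : the run of 1s collapses, carrying upward
unsigned hi = nx & -nx;          // 0b00000001000000 : new lowest-order set bit = start of the suffix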
The biggest problem is performing the right shift, since we don't actually know the bit count. The traditional solution is to do the right-shift with a division (by the original low-order 1 bit), but it turns out that divide on modern hardware is really slow, relative to other operations. Looping a one-bit shift is generally faster than dividing, but in the code below I use gcc's __builtin_ffsll, which normally compiles into an appropriate opcode if one exists on the target hardware. (See man ffs for details; I use the builtin to avoid feature-test macros, but it's a bit ugly and limits the range of compilers you can use. OTOH, ffsll is also an extension.)
I've included the division-based solution as well for portability; however, it takes almost three times as long on my i5 laptop.
template<typename UInt>
static inline UInt last_one(UInt ui) { return ui & -ui; }

// next_with_same_popcount(ui) finds the next larger integer with the same
// number of 1-bits as ui. If there isn't one (within the range
// of the unsigned type), it returns 0.
template<typename UInt>
UInt next_with_same_popcount(UInt ui) {
    UInt lo = last_one(ui);
    UInt next = ui + lo;
    UInt hi = last_one(next);
    if (next) next += (hi >> __builtin_ffsll(lo)) - 1;
    return next;
}

/*
template<typename UInt>
UInt next_with_same_popcount(UInt ui) {
    UInt lo = last_one(ui);
    UInt next = ui + lo;
    UInt hi = last_one(next) >> 1;
    if (next) next += hi/lo - 1;
    return next;
}
*/
The only remaining problem is to find the first number with the correct popcount inside of the given range. To help with this, the following simple algorithm can be used:
Start with the first value in the range.
As long as the popcount of the value is too high, eliminate the last run of 1s by adding the low-order 1 bit to the number (using exactly the same x&-x trick as above). Since this works right-to-left, it cannot loop more than 64 times, once per bit.
While the popcount is too small, add the smallest possible bit by changing the low-order 0 bit to a 1. Since this adds a single 1-bit on each loop, it also cannot loop more than k times (where k is the target popcount), and it is not necessary to recompute the population count on each loop, unlike the first step.
In the following implementation, I again use a GCC builtin, __builtin_popcountll. This one doesn't have a corresponding Posix function. See the Wikipedia page for alternative implementations and a list of hardware which does support the operation. Note that it is possible that the value found will exceed the end of the range; also, the function might return a value less than the supplied argument, indicating that there is no appropriate value. So you need to check that the result is inside the desired range before using it.
// next_with_popcount_k returns the smallest integer >= ui whose popcnt
// is exactly k. If ui has exactly k bits set, it is returned. If there
// is no such value, returns the smallest integer with exactly k bits.
template<typename UInt>
UInt next_with_popcount_k(UInt ui, int k) {
    int count;
    while ((count = __builtin_popcountll(ui)) > k)
        ui += last_one(ui);
    for (int i = count; i < k; ++i)
        ui += last_one(~ui);
    return ui;
}
It's possible to make this slightly more efficient by changing the first loop to:
while ((count = __builtin_popcountll(ui)) > k) {
    UInt lo = last_one(ui);
    ui += last_one(ui - lo) - lo;
}
That shaved about 10% off of the execution time, but I doubt whether the function will be called often enough to make that worthwhile. Depending on how efficiently your CPU implements the POPCOUNT opcode, it might be faster to do the first loop with a single bit sweep in order to be able to track the popcount instead of recomputing it. That will almost certainly be the case on hardware without a POPCOUNT opcode.
Once you have those two functions, iterating over all k-bit values in a range becomes trivial:
void all_k_bits(uint64_t lo, uint64_t hi, int k) {
    uint64_t i = next_with_popcount_k(lo, k);
    if (i >= lo) {
        for (; i > 0 && i < hi; i = next_with_same_popcount(i)) {
            // Do what needs to be done
        }
    }
}
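For instance, if all you need is a count of the values in range (like the accum counter in the question), the loop body can just increment a counter. A sketch under that assumption, reusing the two functions above and the same half-open bounds as all_k_bits:
// Hypothetical usage: count the 64-bit values in [lo, hi) with exactly k bits set.
uint64_t count_k_bits(uint64_t lo, uint64_t hi, int k) {
    uint64_t accum = 0;
    uint64_t i = next_with_popcount_k(lo, k);
    if (i >= lo) {
        for (; i > 0 && i < hi; i = next_with_same_popcount(i))
            ++accum;            // i is a value in [lo, hi) with exactly k bits set
    }
    return accum;
}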

Sieve of Eratosthenes: bitwise optimized

After searching the net I came to know that the bitwise version of the sieve of Eratosthenes is pretty efficient.
The problem is I am unable to understand the math/method it is using.
The version that I have been busy with looks like this:
#define MAX 100000000
#define LIM 10000

unsigned flag[MAX>>6] = {0};

#define ifc(n) (flag[n>>6]&(1<<((n>>1)&31)))   //LINE 1
#define isc(n) (flag[n>>6]|=(1<<((n>>1)&31)))  //LINE 2

void sieve() {
    unsigned i, j, k;
    for (i = 3; i < LIM; i += 2)
        if (!ifc(i))
            for (j = i*i, k = i << 1; j < LIM*LIM; j += k)
                isc(j);
}
Points that I understood (Please correct me if I am wrong):
Statement in line 1 checks if the number is composite.
Statement in line 2 marks the number 'n' as composite.
The program is storing the value 0 or 1 at a bit of an int. This tends to reduce the memory usage to x/32. (x is the size that would have been used had an int been used to store the 0 or 1 instead of a bit like in my solution above)
Points that are going over my head as of now:
How is the function in LINE 1 working? How does it determine whether the number is composite or not?
How is the function in LINE 2 setting the bit?
I also came to know that the bitwise sieve is time-efficient as well. Is that only because of the use of bitwise operators, or is something else contributing to it as well?
Any ideas or suggestions?
Technically, there is a bug in the code as well:
unsigned flag[MAX>>6]={0};
divides MAX by 64, but if MAX is not an exact multiple of 64, the array is one element short.
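A minimal fix (my suggestion, not from the original code) is to round the array size up so the last partial word is included:
unsigned flag[(MAX + 63) >> 6] = {0};   // ceil(MAX / 64) words instead of floor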
Line 1: Let's pick it apart:
(flag[n>>6]&(1<<((n>>1)&31)))
The flag[n>>6] (n >> 6 = n / 64) gives the 32-bit integer that holds the bit value for n / 2.
Since only "Odd" numbers are possible primes, divide n by two: (n>>1).
The 1<<((n>>1)&31) gives us the bit corresponding to n/2 within that 32-bit word (the & 31 makes sure the bit position stays in the range 0..31).
Finally, use & to combine the value on the left with the value on the right.
So, the result is true (nonzero) if the element for n has bit number (n/2) modulo 32 set - that is, if n has already been marked as composite.
The second line is essentially the same concept, just that it uses |= (or equal) to set the bit corresponding to the multiple.
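As a concrete worked example (my own numbers), take n = 97, which is odd:
n >> 6 = 1, so the bit for 97 lives in flag[1].
(n >> 1) & 31 = 48 & 31 = 16, so it is bit 16 of that word.
ifc(97) therefore tests flag[1] & (1 << 16), and isc(97) does flag[1] |= (1 << 16).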

add and remove last bit

I am trying to determine the next and previous even number with bitwise operations.
So for example for the next function:
x nextEven(x)
1 2
2 2
3 4
4 4
and for the previous:
x previousEven(x)
1 0
2 2
3 2
4 4
I had the idea for the nextEven function something like: value = ((value+1)>>1)<<1;
And for the previousEven function something like: value = ((value)>>1)<<1
Is there a better approach, without comparing and checking whether the values are even or odd?
Thank you.
Doing a right shift followed by a left shift to clear the LSB isn't very efficient.
I'd use something like:
previous: value &= ~1;
next: value = (value +1) & ~1;
The ~1 can (and normally will) be pre-computed at compile time, so the previous will end up as a single bit-wise operation at run-time. The next will probably end up as two operations (increment, and), but should still be quite fast.
About the best you can hope for from the shifts is that the compiler will recognize that you're just clearing the LSB, and optimize them to about what you'd expect this to produce anyway.
you could do something like this
for previous even
unsigned prevev(unsigned x)
{
    return x - (x % 2);   // bitwise counterpart: x - (x & 1)
}
for next even
unsigned nxtev(unsigned x)
{
    return (x % 2) + x;   // bitwise counterpart: x + (x & 1)
}
Say you're using unsigned ints, previous even (matching your values - we could argue about whether previous even of 2 should be 0 etc) is simply x & ~1u. Next even is previous even of x + 1.
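Written out as code, that answer amounts to (function names are mine):
unsigned prev_even(unsigned x) { return x & ~1u; }
unsigned next_even(unsigned x) { return (x + 1) & ~1u; }   // previous even of x + 1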
Tricks like Duff's Device, or swapping two variables with XOR, or working out next and previous even number with bitwise operations seem clever, but they rarely are.
The best thing you can do as a developer is to optimise for readability first and only tackle performance once you've identified a specific bottleneck that is causing real problems.
The best code for getting the previous even number (by your definition where the previous even number of 2 is 2) is simply writing something like:
if ((num % 2) == 1) num--; // num++ for next.
or (slightly more advanced):
num -= num % 2; // += for next.
and letting the insane optimising compilers figure out the best underlying code.
Unless you need to do these operations billions of times per second, readability should always be your prime concern.
Previous even number:
For previous even number I prefer Jerry Coffin's answer
// Get previous even number
unsigned prevEven(unsigned no)
{
    return (no & ~1);
}
Next even number:
I tried to use only bitwise operators, but I still use one unary minus (-) operator to get the next number.
// Get next even number
unsigned nextEven(unsigned no)
{
    return (no & 1) ? (-(~no)) : no;
}
How the nextEven() method works:
Get the LSB of the number => number & 1 (if the number is even, its LSB is 0, otherwise 1).
If the number is even, return the same number.
If the number is odd, return the number + 1; adding 1 to the number is done as -(~number).
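That last step relies on the two's-complement identity -(~x) == x + 1 for unsigned x. A quick standalone check (my own snippet):
#include <cassert>

int main()
{
    unsigned no = 5;
    assert(-(~no) == no + 1);   // ~5 is 0xFFFFFFFA; negating it modulo 2^32 gives 6
    return 0;
}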
unsigned int previous(unsigned int x)
{
    return x & 0xfffffffe;
}

unsigned int next(unsigned int x)
{
    return previous(x + 1);   // x + 1 (not x + 2), so an even x maps to itself, as in the table above
}

Isolating a string of 1's in a character

I need to come up with a function which takes a char and index of a set bit in it and isolates a string of 1's containing that bit.
i.e.
char isolate(unsigned char arg, int i);
For example:
isolate(221,2) would return 28 (11011101 >>> 00011100)
isolate(221,6) would return 192 (11011101 >>> 11000000)
A lookup table seems a clumsy solution as it would require ~256*8=2048 entries.
I am thinking of examining each individual bit to the left and right of the index:
char isolate(char arg, int i)
{
    char result = 0;
    char mask = 1 << i;
    for (char mask = 1 << i; arg & mask != 0; mask >>= 1)
        result |= mask;
    for (char mask = 1 << i; arg & mask != 0; mask <<= 1)
        result |= mask;
    return result;
}
But it also seems a bit ugly. How can I do any better than this?
That's a funny operation. The code you've written expresses it fairly well, so would you mind elaborating on how it's ugly?
The details I can see: Given that i expresses a bit number in arg, there's absolutely no point in i being a wider type. There's never a point in writing != 0 in a condition. You probably don't want to be redeclaring mask everywhere you use it, nor initializing it twice in a row.
As for the actual spreading bit mask, I can't think of a way that's more expressive, cleaner or efficient right now.
Warning: none of this was tested or even relevant*, but it may be interesting.
Isolating the rightmost run of 1s is easy, like this: x ^ (x & ((x|(x-1))+1)) (explanation below), so let's work with that.
First, x|(x-1) smears the rightmost 1 to the right; adding 1 turns all those bits to 0, including the rightmost run of 1's; ANDing x with that removes the rightmost run of 1's; and finally, XORing that with x leaves just the rightmost run of 1s.
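Worked through on a small value (my own example), say x = 11011100:
x           = 11011100
x - 1       = 11011011
x | (x-1)   = 11011111   (trailing zeros filled in)
+ 1         = 11100000   (the rightmost run of 1s carries away)
x & that    = 11000000   (x with its rightmost run removed)
x ^ that    = 00011100   (just the rightmost run of 1s)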
Then we just need to make sure that the range we're looking for is the rightmost one. That's less amenable to simple bitmath, but if there's Count Leading Zeros (clz), it's not too hard:
int shift = 32 - clz(~x & ((1 << i) - 1)); //replace 32 with word size
x = (x >> shift) << shift;
((1 << i) - 1) makes a mask of the part where the right-end of the run we're looking for could be in (it could also just miss the end, but that's ok), then clz looks for the first zero to the right of i in x, then the shifts remove the bits that we don't want to look at.
Apply the first formula, for isolating the rightmost run of 1s, to the result of that to get the run of 1s that i is in. i had better be in some run, or things go sideways (more accurately, it would return the first run of 1s that starts at an index higher than i).
*: For this question, none of this really matters. A 2KB table is not a clumsy solution unless you only have a tiny amount of memory available, and even if that's the case, the input is so short that the loops aren't all that bad.
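Putting the two pieces together, and with the same untested caveat, a sketch of the whole function might look like this (it uses GCC's __builtin_clz for the clz step, and assumes bit i really is inside a run of 1s):
unsigned char isolate(unsigned char arg, int i)
{
    unsigned x = arg;                           // work in a full-width unsigned
    unsigned below = ~x & ((1u << i) - 1);      // the 0-bits of x below position i
    if (below) {                                // if there are none, the run already reaches bit 0
        int shift = 32 - __builtin_clz(below);  // position just past the highest such 0
        x = (x >> shift) << shift;              // drop every run to the right of i's run
    }
    return x ^ (x & ((x | (x - 1)) + 1));       // isolate the (now rightmost) run of 1s
}
With the question's examples this gives isolate(221, 2) == 28 and isolate(221, 6) == 192.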

Analysis of the usage of prime numbers in hash functions

I was studying hash-based sorting, and I found that using prime numbers in a hash function is considered a good idea, because multiplying each character of the key by a prime number and adding the results up would produce a unique value (because primes are unique), and because a prime number like 31 would produce a better distribution of keys.
key(s) = s[0]*31^(len-1) + s[1]*31^(len-2) + ... + s[len-1]
Sample code:
public int hashCode()
{
    int h = hash;
    if (h == 0)
    {
        for (int i = 0; i < chars.length; i++)
        {
            h = MULT*h + chars[i];
        }
        hash = h;
    }
    return h;
}
I would like to understand why the use of even numbers for multiplying each character is a bad idea in the context of this explanation below (found on another forum; it sounds like a good explanation, but I'm failing to grasp it). If the reasoning below is not valid, I would appreciate a simpler explanation.
Suppose MULT were 26, and consider hashing a hundred-character string. How much influence does the string's first character have on the final value of 'h'? The first character's value will have been multiplied by MULT 99 times, so if the arithmetic were done in infinite precision the value would consist of some jumble of bits followed by 99 low-order zero bits -- each time you multiply by MULT you introduce another low-order zero, right? The computer's finite arithmetic just chops away all the excess high-order bits, so the first character's actual contribution to 'h' is ... precisely zero! The 'h' value depends only on the rightmost 32 string characters (assuming a 32-bit int), and even then things are not wonderful: the first of those final 32 bytes influences only the leftmost bit of 'h' and has no effect on the remaining 31. Clearly, an even-valued MULT is a poor idea.
I think it's easier to see if you use 2 instead of 26. They both have the same effect on the lowest-order bit of h. Consider a 33 character string of some character c followed by 32 zero bytes (for illustrative purposes). Since the string isn't wholly null you'd hope the hash would be nonzero.
For the first character, your computed hash h is equal to c[0]. For the second character, you take h * 2 + c[1]. So now h is 2*c[0]. For the third character h is now h*2 + c[2] which works out to 4*c[0]. Repeat this 30 more times, and you can see that the multiplier uses more bits than are available in your destination, meaning effectively c[0] had no impact on the final hash at all.
The end math works out exactly the same with a different multiplier like 26, except that the intermediate hashes will modulo 2^32 every so often during the process. Since 26 is even it still adds one 0 bit to the low end each iteration.
This hash can be described like this (here ^ is exponentiation, not xor).
hash(string) = sum_over_i(s[i] * MULT^(strlen(s) - i - 1)) % (2^32).
Look at the contribution of the first character. It's
(s[0] * MULT^(strlen(s) - 1)) % (2^32).
If the string is long enough (strlen(s) > 32) then this is zero.
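To see that concretely, here is a small self-contained check of my own (MULT = 26 and a 33-character string): the first character's factor is 26^32, which is 0 modulo 2^32, so changing it never changes the 32-bit hash.
#include <cstdint>
#include <iostream>
#include <string>

int main()
{
    const uint32_t MULT = 26;                        // an even multiplier
    auto hash = [&](const std::string& s) {
        uint32_t h = 0;
        for (unsigned char c : s) h = MULT * h + c;  // same scheme as hashCode() above
        return h;
    };
    std::string tail(32, 'x');                       // 32 trailing characters
    std::cout << hash("a" + tail) << '\n';           // 33 characters starting with 'a'
    std::cout << hash("b" + tail) << '\n';           // prints the same value
    return 0;
}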
Other people have posted the answer -- if you use an even multiple, then only the last characters in the string matter for computing the hash, as the early character's influence will have shifted out of the register.
Now let's consider what happens when you use a multiplier like 31. Well, 31 is 32-1, or 2^5 - 1. So when you use that, your final hash value will be:
\sum_i{c_i 2^{5(len-i)}} - \sum_i{c_i}
Unfortunately Stack Overflow doesn't understand TeX math notation, so the above is hard to read, but it's two summations over the characters in the string, where the first one shifts each character left by 5 bits for each subsequent character in the string. So on a 32-bit machine, that will shift off the top for all except the last seven characters of the string.
The upshot of this is that using a multiplier of 31 means that while characters other than the last seven have an effect on the hash, it's completely independent of their order. If you take two strings that have the same last 7 characters, and whose other characters are the same but in a different order, you'll get the same hash for both. You'll also get the same hash for things like "az" and "by" other than in the last 7 chars.
So using a prime multiplier, while much better than an even multiplier, is still not very good. Better is to use a rotate instruction, which shifts the bits back into the bottom when they shift out the top. Something like:
public unsigned hashCode(string chars)
{
    unsigned h = 0;
    for (int i = 0; i < chars.length; i++) {
        h = (h<<5) + (h>>27); // ROL by 5, assuming 32 bits here
        h += chars[i];
    }
    return h;
}
Of course, this depends on your compiler being smart enough to recognize the idiom for a rotate instruction and turn it into a single instruction for maximum efficiency.
This also still has the problem that swapping 32-character blocks in the string will give the same hash value, so it's far from strong, but it is probably adequate for most non-cryptographic purposes.
would produce a unique value
Stop right there. Hashes are not unique. A good hash algorithm will minimize collisions, but the pigeonhole principle assures us that perfectly avoiding collisions is not possible (for any datatype with non-trivial information content).