What does this function do? - c++

I am reading a program which contains the following function, which is
int f(int n) {
    int c;
    for (c = 0; n != 0; ++c)
        n = n & (n - 1);
    return c;
}
I don't quite understand what this function is intended to do.

It counts the number of 1's in the binary representation of n.

The function is INTENDED to return the number of set bits in the representation of n. What the other answers miss is that the function invokes undefined behaviour for arguments n < 0. This is because the function peels the number away one bit at a time, starting from the lowest bit and moving to the highest. For a negative number this means that the last value of n before the loop terminates (for 32-bit integers in two's complement) is 0x80000000. This number is INT_MIN, and it is now used in the loop for the last time:
n = n & (n - 1)
Unfortunately, INT_MIN - 1 is an overflow, and overflows invoke undefined behavior. A conforming implementation is not required to "wrap around" integers; it may, for example, issue an overflow trap instead or leave all kinds of weird results.
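If you need to count set bits for possibly negative inputs, a minimal sketch of a well-defined variant (my suggestion, not the original author's code) is to run the loop on the unsigned conversion:

int f_safe(int n) {
    unsigned int u = n;   // conversion to unsigned is well-defined (value mod 2^N)
    int c;
    for (c = 0; u != 0; ++c)
        u = u & (u - 1);  // unsigned wrap-around is defined, so no UB for INT_MIN
    return c;
}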

It is a (now obsolete) workaround for the lack of the POPCNT instruction in non-military CPUs.

This counts the number of iterations it takes to reduce n to 0 by using a bitwise AND.

The expression n = n & (n - 1) is a bitwise operation which clears the rightmost '1' bit in n.
For example, take the integer 5 (0101). Then n & (n - 1) → (0101) & (0100) → 0100 (the first '1' bit from the right side is removed).
So the above code returns the number of 1's in the binary form of the given integer.
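A short trace (my illustration, not from the original post) makes the bit-clearing visible:

#include <iostream>

int main() {
    int n = 13;  // binary 1101: three 1-bits
    int count = 0;
    while (n != 0) {
        n = n & (n - 1);  // 1101 -> 1100 -> 1000 -> 0000
        ++count;
    }
    std::cout << count << '\n';  // prints 3
}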

It shows a way how not to program (for the x86 instruction set): using an intrinsic/inline assembly instruction is faster and easier to read for something this simple. (But this is only true for the x86 architecture as far as I know; I don't know how it is on ARM or SPARC or something else.)
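For reference, the intrinsic route looks like this (std::popcount is standard C++20; __builtin_popcount is a GCC/Clang extension, so which one you can use depends on your toolchain):

#include <bit>  // C++20

int count_bits(unsigned n) {
    return std::popcount(n);  // compiles to a single POPCNT where the target supports it
    // pre-C++20, GCC/Clang: return __builtin_popcount(n);
}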

Could it be that it tries to return the number of significant bits in n? (I haven't thought it through completely...)

Related

How to check if an int variable is even or odd by "looking" at the last digit of its binary value?

How is it possible to check if an integer number stored in a variable is even or odd by looking at its last digit?
(I've already seen many answers determining it by the remainder of a modulo operation,
but, I mean, since a binary number's last digit can be 0 or 1,
and assuming that if it is 0 the number is even, else the number is odd, wouldn't it be much faster to simply look at its last digit?)
Assuming two's complement (or unsigned) numbers, which are guaranteed by C++20, and were nearly ubiquitous before:
bool even = !(x & 1);
Use
bool even = !(x % 2);
to test evenness for any integral type x. You could use flashy XOR tricks, but then you are doing the compiler's job, not your own. The compiler will make the appropriate optimisation. Yes, XOR is normally an extremely fast machine-level instruction, to the point that in generated assembly you see REG XOR REG used to set REG to 0, rather than a move of 0 into the register.
The correct way using XOR is to write
bool even = (x & 1) ^ ((-1 & 1) | ((x < 0) ? 0 : 1));
assuming x is a signed type. Any other way and you are assuming something about the complementing scheme of x. You can't assume two's complement until C++20.
wouldn't it be much faster to simply look at its last digit?
Checking the last bit will be faster, not slower, than calculating x % 2 and comparing it to 0.
However, your compiler will most certainly do that optimization. So write your code in a manner that is easy to understand and maintain.

why do I get different results with the same conditions in a for loop?

I was stuck when I was trying to use a for loop to solve a problem.
Here's my simplified code:
#include <iostream>
#include <vector>

int main(int argc, const char * argv[])
{
    std::vector<int> a;
    a.push_back(2333);
    int n = 10, m = 10;
    for (int i = 0; i < -1; i++)
        m--;
    std::cout << m << std::endl;
    for (int j = 0; j < a.size() - 2; j++)
        n--;
    std::cout << n << std::endl;
    return 0;
}
Apparently, a.size() = 1, so these two end conditions should be the same. However, when I ran my code on Xcode 9.4.1 I got unexpected results: it turned out that m = 10
and n = 11. And I found that it took much longer to get the value of n than the value of m.
Why would I get such a result? Any help will be appreciated.
The value returned by size() is of type std::size_t, which is an unsigned integral type. This means that it can only represent non-negative numbers, and if you do an operation that results in a negative number, it will wrap around to the largest possible value, as in modular arithmetic.
Here, 1 - 2 is -1, which wraps to 2^32 - 1 on a 32-bit system. When you try to subtract 2^32 - 1 from 10, you cause a signed integer underflow, since the minimum value of a 32-bit integer is -2^31. Signed integer overflow/underflow is undefined behavior, so anything can happen.
In this case, it seems like the underflow wrapped around to the maximum value. So the result would be 10 - (2^32 - 1) + 2^32, which is 11. We add 2^32 to simulate the underflow wrapping around. In other words, after the 2^31 + 10th iteration of the loop, n is the minimum possible value in a 32-bit integer. The next iteration causes the wrap around, so n is now 2^31 - 1. Then, the remaining 2^31 - 12 iterations decrease n to 11.
Again, signed integer overflow/underflow is undefined behavior, so don't be surprised when something weird happens because of that, especially with modern compiler optimizations. For example, your entire program can be "optimized" to do absolutely nothing since it will always invoke UB. You're not even guaranteed to see the output from std::cout<<m<<endl;, even though the UB is invoked after that line executes.
The value returned by a.size() is of type size_t, which is an unsigned integer, because there wouldn't be any reason for a size to be negative. If you do 1 - 2 with unsigned numbers, it will roll over and become a value near the maximum value of an unsigned int, and the loop will take quite a while to run, or might not even stop, since a signed integer can't be larger than the top half of unsigned values. This depends on the rules for comparing signed and unsigned values, which I don't remember for sure off the top of my head.
Using a debugger and making sure the types are correct (your compiler should warn about the signed/unsigned mismatch here) helps to pin down these cases.
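A sketch of a repaired loop condition (the cast is my suggestion, not from the original post), keeping the subtraction in signed arithmetic:

for (int j = 0; j < static_cast<int>(a.size()) - 2; j++)
    n--;
// or, with C++20: for (int j = 0; j < std::ssize(a) - 2; j++)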

Do most compilers transform % 2 into bit comparison? Is it really faster?

In programming, one often needs to check if a number is odd or even. For that, we usually use:
n % 2 == 0
However, my understanding is that the '%' operator actually performs a division and returns its remainder; therefore, for the case above, it would be faster to simply check the last bit instead. Let's say n = 5;
5 = 00000101
In order to check if the number is odd or even, we just need to check the last bit. If it's 1, the number is odd; otherwise, it is even. In programming, it would be expressed like this:
(n & 1) == 0
In my understanding this would be faster than % 2 as no division is performed. A mere bit comparison is needed.
I have 2 questions then:
1) Is the second way really faster than the first (in all cases)?
2) If the answer for 1 is yes, are compilers (in all languages) smart enough to convert % 2 into a simple bit comparison? Or do we have to explicitly use the second way if we want the best performance?
Yes, a bit-test is much faster than integer division, by about a factor of 10 to 20, or even 100 for 128-bit / 64-bit = 64-bit idiv on Intel. Especially since x86 at least has a test instruction that sets condition flags based on the result of a bitwise AND, so you don't have to divide and then compare; the bitwise AND is the compare.
I decided to actually check the compiler output on Godbolt, and got a surprise:
It turns out that using n % 2 as a signed integer value (e.g. return n % 2 from a function that returns signed int), instead of just testing it for non-zero (if (n % 2)), sometimes produces slower code than return n & 1. This is because (-1 % 2) == -1 while (-1 & 1) == 1, so the compiler can't use a plain bitwise AND. Compilers still avoid integer division, though, and use some clever shift/and/add/sub sequence instead, because that's still cheaper than an integer division. (gcc and clang use different sequences.)
So if you want to return a truth value based on n % 2, your best bet is to do it with an unsigned type. This lets the compiler always optimize it to a single AND instruction. (On godbolt, you can flip to other architectures, like ARM and PowerPC, and see that the unsigned even (%) function and the int even_bit (bitwise &) function have the same asm code.)
Using a bool (which must be 0 or 1, not just any non-zero value) is another option, but the compiler will have to do extra work to return (bool) (n % 4) (or any test other than n%2). The bitwise-and version of that will be 0, 1, 2, or 3, so the compiler has to turn any non-zero value into a 1. (x86 has an efficient setcc instruction that sets a register to 0 or 1, depending on the flags, so it's still only 2 instructions instead of 1. clang/gcc use this, see aligned4_bool in the godbolt asm output.)
With any optimization level higher than -O0, gcc and clang optimize if (n%2) to what we expect. The other huge surprise is that icc 13 doesn't. I don't understand WTF icc thinks it's doing with all those branches.
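For concreteness, the kinds of functions being compared look roughly like this (a sketch; the names are illustrative, not the exact ones from the Godbolt session):

bool even_unsigned(unsigned n) { return n % 2 == 0; }  // a single AND/TEST
bool even_signed(int n)        { return n % 2 == 0; }  // also fine: comparing to 0
                                                       // lets the compiler drop the sign fixup
int  even_value(int n)         { return n % 2; }       // may need an extra shift/sub sequence:
                                                       // the result must be -1 for odd negatives
int  even_bit(int n)           { return n & 1; }       // always a single AND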
The speed is equivalent.
The modulo version is generally guaranteed to work whether the integer is positive, negative or zero, regardless of the implementing language. The bitwise version is not.
Use what you feel is most readable.

Acting like unsigned int overflow. What is causing it?

I have this function which generates a specified number of so-called 'triangle numbers'. If I print out the deque afterwards, the numbers increase, jump down, then increase again. Triangle numbers should never get lower as i rises, so there must be some kind of overflow happening. I tried to fix it by adding the line if(toPush > INT_MAX) return i - 1; to try to stop the function from generating more numbers (and return the number it generated) if the result is overflowing. That is not working, however; the output continues to be incorrect (it increases for a while, jumps down to a lower number, then increases again). The line I added doesn't actually seem to be doing anything at all; the return is never reached. Does anyone know what's going on here?
#include <iostream>
#include <deque>
#include <climits>

int generateTriangleNumbers(std::deque<unsigned int> &triangleNumbers, unsigned int generateCount) {
    for (unsigned int i = 1; i <= generateCount; i++) {
        unsigned int toPush = (i * (i + 1)) / 2;
        if (toPush > INT_MAX) return i - 1;
        triangleNumbers.push_back(toPush);
    }
    return generateCount;
}
INT_MAX is the maximum value of a signed int. It's about half the maximum value of an unsigned int (UINT_MAX). Your calculation of toPush may well exceed UINT_MAX, because it effectively squares i (if i is large, i * (i + 1) is far larger than the UINT_MAX that toPush can hold). In this case toPush wraps around and results in a smaller value than the previous one.
First of all, your comparison with INT_MAX is flawed, since your type is unsigned int, not signed int. Secondly, even a comparison with UINT_MAX would be incorrect, since it implies that toPush (the left operand of the comparison) can hold a value above its own maximum, and that's not possible. The correct way is to compare each generated number with the previous one. If it's lower, you know you have had an overflow and you should stop.
Additionally, you may want to use types that can hold a larger range of values (such as unsigned long long).
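A sketch of that comparison-with-previous check inside the original loop (the variable name previous is mine):

unsigned int previous = 0;
for (unsigned int i = 1; i <= generateCount; i++) {
    unsigned int toPush = (i * (i + 1)) / 2;
    if (toPush < previous) return i - 1;  // wrapped around: stop here
    previous = toPush;
    triangleNumbers.push_back(toPush);
}
return generateCount;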
The 92682nd triangle number is already greater than UINT32_MAX. But the culprit here is much earlier, in the computation of i * (i + 1). There, the calculation overflows at the 65536th triangular number. If we ask Python, with its native bignum support:
>>> 2**16 * (2**16+1) > 0xffffffff
True
Oops. Then if you inspect your stored numbers, you will see your sequence dropping back to low values. To attempt to emulate what the Standard says about the behaviour of this case, in Python:
>>> ((2**16 * (2**16 + 1)) % 2**32) >> 1
32768
and that is the value you will see for the 65536th triangular number, which is incorrect.
One way to detect overflow here is to ensure that the sequence of numbers you generate is monotonic; that is, check that the Nth triangle number generated is strictly greater than the (N-1)th.
To avoid overflow, you can use 64-bit variables to both generate and store them, or use a big-number library if you need a large number of triangle numbers.
In Visual C++ int (and of course unsigned int) is 32 bits even on 64-bit computers.
Either use unsigned long long or uint64_t to use a 64-bit value.
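A minimal 64-bit sketch of the computation (the function name is illustrative):

#include <cstdint>

std::uint64_t triangle(std::uint64_t i) {
    return i * (i + 1) / 2;  // exact for every i below 2^32, since then i * (i + 1) < 2^64
}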

big integer addition without carry flag

In assembly languages, there is usually an instruction that adds two operands and a carry. If you want to implement big integer additions, you simply add the lowest integers without a carry and the next integers with a carry. How would I do that efficiently in C or C++ where I don't have access to the carry flag? It should work on several compilers and architectures, so I cannot simply use inline assembly or such.
You can use "nails" (a term from GMP): rather than using all 64 bits of a uint64_t when representing a number, use only 63 of them, with the top bit zero. That way you can detect overflow with a simple bit-shift. You may even want fewer than 63.
Or, you can do half-word arithmetic. If you can do 64-bit arithmetic, represent your number as an array of uint32_ts (or equivalently, split 64-bit words into upper and lower 32-bit chunks). Then, when doing arithmetic operations on these 32-bit integers, you can first promote to 64 bits, do the arithmetic there, then convert back. This lets you detect carry, and it's also good for multiplication if you don't have a "multiply hi" instruction.
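A sketch of a single half-word addition step under that scheme (the function name is mine):

#include <cstdint>

// Adds two 32-bit digits plus an incoming carry; the 64-bit
// intermediate cannot overflow, so the carry is simply the upper part.
std::uint32_t add_digit(std::uint32_t a, std::uint32_t b, std::uint32_t &carry) {
    std::uint64_t t = std::uint64_t(a) + b + carry;
    carry = static_cast<std::uint32_t>(t >> 32);  // 0 or 1
    return static_cast<std::uint32_t>(t);         // low 32 bits
}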
As the other answer indicates, you can detect overflow in an unsigned addition by:
uint64_t sum = a + b;
uint64_t carry = sum < a;
As an aside, while in practice this will also work in signed arithmetic, you have two issues:
It's more complex
Technically, overflowing a signed integer is undefined behavior
so you're usually better off sticking to unsigned numbers.
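Putting the pieces together, a minimal sketch of a full multi-word addition with 64-bit limbs stored least significant first (all names are mine, not from an existing library; a and b are assumed to have the same length):

#include <cstdint>
#include <vector>

// c = a + b; returns the final carry out of the top limb.
std::uint64_t add_bignum(const std::vector<std::uint64_t> &a,
                         const std::vector<std::uint64_t> &b,
                         std::vector<std::uint64_t> &c) {
    std::uint64_t carry = 0;
    c.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::uint64_t sum = a[i] + b[i];    // may wrap
        std::uint64_t carry1 = sum < a[i];  // carry out of a + b
        std::uint64_t sum2 = sum + carry;   // add incoming carry
        std::uint64_t carry2 = sum2 < sum;  // carry out of that add
        c[i] = sum2;
        carry = carry1 | carry2;            // at most one of the two can be set
    }
    return carry;
}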
You can figure out the carry from the fact that, if an addition of two numbers overflows, the result will always be less than either of the two operands.
In other words, if a + b is less than a, it overflowed. That's for positive values of a and b of course but that's what you'd almost certainly be using for a bignum library.
Unfortunately, a carry introduces an extra complication in that adding the largest possible value plus a carry of one will give you the same value you started with. Hence, you have to handle that as a special case.
Something like:
carry = 0
for i = 7 to 0:
    if a[i] > b[i]:
        small = b[i], large = a[i]
    else:
        small = a[i], large = b[i]
    if carry is 1 and large is maxvalue:
        c[i] = small
        carry = 1
    else:
        c[i] = large + small + carry
        if c[i] < large:
            carry = 1
        else:
            carry = 0
In reality, you may also want to consider not using all the bits in your array elements.
I've implemented libraries in the past where the maximum "digit" is at most the square root of the highest value the element can hold. So for 8-bit (octet) digits, you store values from 0 through 15; that way, multiplying two digits and adding the maximum carry will always fit within an octet, making overflow detection moot, though at the cost of some storage.
Similarly, 16-bit digits would have the range 0 through 255, so the result won't reach 65536.
In fact, I've sometimes limited it even more, ensuring the artificial wrap value is a power of ten (so an octet would hold 0 through 9, 16-bit digits 0 through 99, 32-bit digits 0 through 9999, and so on).
That's a bit more wasteful on space but makes conversion to and from text (such as printing your numbers) incredibly easy.
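A sketch of why the power-of-ten wrap makes printing easy, assuming 32-bit digits holding 0 through 9999, most significant digit first (the function name is mine):

#include <cstdio>
#include <vector>

// Every non-leading digit prints as exactly four decimal characters,
// leading zeros included; assumes at least one digit is present.
void print_base10000(const std::vector<unsigned> &digits) {
    std::printf("%u", digits[0]);  // no padding on the leading digit
    for (std::size_t i = 1; i < digits.size(); ++i)
        std::printf("%04u", digits[i]);
    std::printf("\n");
}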
You can check for carry with unsigned types by checking whether the result is less than an operand (either operand will do).
Just start the whole thing with a carry of 0.
If I understand you correctly, you want to write your own addition for your own big-integer type.
You can do this with a simple function. No need to worry about the CPU carry flag at all. Just go from right to left, adding digit by digit plus the carry (kept internally in that function), starting with a carry of 0, and set the result digit to (a + b + carry) % 10 and the carry to (a + b + carry) / 10.
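A minimal sketch of that function, assuming the decimal digits are stored least significant first (names are mine):

#include <vector>

// result = a + b, with digits 0..9 stored least significant first.
std::vector<int> add_decimal(const std::vector<int> &a, const std::vector<int> &b) {
    std::vector<int> result;
    int carry = 0;
    for (std::size_t i = 0; i < a.size() || i < b.size() || carry; ++i) {
        int da = i < a.size() ? a[i] : 0;
        int db = i < b.size() ? b[i] : 0;
        int s = da + db + carry;
        result.push_back(s % 10);  // current digit
        carry = s / 10;            // 0 or 1
    }
    return result;
}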
This SO question could be relevant:
how to implement big int in c