C/C++ Bit Twiddling - c++

in the spirit of graphics.stanford.edu/~seander/bithacks.html I need to solve the following problem:
int x;
int pow2; // always a positive power of 2
int sgn; // always either 0 or 1
// ...
// ...
if(sgn == 0)
x -= pow2;
else
x += pow2;
Of course I need to avoid the conditional. So far the best I came up with is
x -= (1|(~sgn+1))*pow2
but that involves a multiplication which I also would like to avoid. Thanks in advance.
EDIT: Thanks all,
x -= (pow2^-sgn) + sgn
seems to do the trick!

I would try
x -= (pow2 ^ (~sgn+1)) + sgn
or, as suggested by lijie in the comments
x -= (pow2 ^ -sgn) + sgn
If sgn is 0, ~sgn+1 is also 0, so pow2 ^ (~sgn+1) == pow2. If sgn is 1, (~sgn+1) is 0xFFFFFFFF, and (pow2 ^ (~sgn+1)) + sgn == -pow2.

mask = sgn - 1; // generate mask: sgn == 0 => mask = -1, sgn == 1 => mask = 0
x = x + (mask & (-pow2)) + (~mask & (pow2)); // use mask to select +/- pow2 for addition

Off the top of my head:
int subMask = sgn - 1;
x -= pow2 & subMask;
int addMask = -sgn;
x += pow2 & addMask;
No guarantees on whether it works or whether this is smart, this is just a random idea that popped into my head.
EDIT: let's make this a bit less readable (aka more compact):
x += (pow2 & -sgn) - (pow2 & (sgn-1));

I would change the interface and replace the multiplication by left shift. (Use exponent instead of pow2)

You can do something like (from the link)
x += ((pow2 ^ -sgn) + sgn)

Related

How do I avoid getting -0 when dividing in c++

I have a script in which I want to find the chunk my player is in.
Simplified version:
float x = -5
float y = -15
int chunkSize = 16
int player_chunk_x = int(x / chunkSize)
int player_chunk_y = int(y / chunkSize)
This gives the chunk the player is in, but when x or y is negative but not less than the chunkSize (-16), player_chunk_x or player_chunk_y is still 0 or '-0' when I need -1
Of course I can just do this:
if (x < 0) x--
if (y < 0) y--
But I was wondering if there is a better solution to my problem.
Thanks in advance.
Since C++20 it's impossible to get an integral type signed negative zero, and was only possible in a rare (but by no means extinct) situation where your platform had 1's complement int. It's still possible in C (although rare), and adding 0 to the result will remove it.
It's possible though to have a floating point signed negative zero. For that, adding 0.0 will remove it.
Note that for an integral -0, subtracting 1 will yield -1.
Your issue is that you are casting a floating point value to an integer value.
This rounds to zero by default.
If you want consistent round down, you first have to floor your value:
int player_chunk_x = int(std::floor(x / chunkSize);
If you don't like negative numbers then don't use them:
int player_chunk_x = (x - min_x) / chunkSize;
int player_chunk_y = (y - min_y) / chunkSize;
If you want integer, in this case -1 on ( -5%16 or anything like it ) then this is possible using a math function:
Possible Ways :
using floor ->
float x = -5;
float y = -15;
int chunkSize = 16;
int player_chunk_x = floor(x / chunkSize)
// will give -1 for (-5 % 16);
// 0 for (5%16)
// 1 for any value between 1 & 2 and so on
int player_chunk_y = floor(y / chunkSize);

A bitwise shortcut for calculating the signed result of `(x - y) / z`, given unsigned operands

I'm looking for a neat way (most likely, a "bitwise shortcut") for calculating the signed value of the expression (x - y) / z, given unsigned operands x, y and z.
Here is a "kinda real kinda pseudo" code illustrating what I am currently doing (please don't mind the actual syntax being "100% perfect C or C++"):
int64 func(uint64 x, uint64 y, uint64 z)
{
if (x >= y) {
uint64 result = (x - y) / z;
if (int64(result) >= 0)
return int64(result);
}
else {
uint64 result = (y - x) / z;
if (int64(result) >= 0)
return -int64(result);
}
throwSomeError();
}
Please assume that I don't have a larger type at hand.
I'd be happy to read any idea of how to make this simpler/shorter/neater.
There is a shortcut, by using a bitwise trick for conditional-negation twice (once for the absolute difference, and then again to restore the sign).
I'll use some similar non-perfect C-ish syntax I guess, to match the question.
First get a mask that has all bits set iff x < y:
uint64 m = -uint64(x < y);
(x - y) and -(y - x) are actually the same, even in unsigned arithmetic, and conditional negation can be done by using the definition of two's complement: -a = ~(a - 1) = (a + (-1) ^ -1). (a + 0) ^ 0 is of course equal to a again, so when m is -1, (a + m) ^ m = -a and when m is zero, it is a. So it's a conditional negation.
uint64 absdiff = (x - y + m) ^ m;
Then divide as usual, and restore the sign by doing another conditional negation:
return int64((absdiff / z + m) ^ m);

Shift masked bits to the lsb

When you and some data with a mask you get some result which is of the same size as the data/mask.
What I want to do, is to take the masked bits in the result (where there was 1 in the mask) and shift them to the right so they are next to each other and I can perform a CTZ (Count Trailing Zeroes) on them.
I didn't know how to name such a procedure so Google has failed me. The operation should preferably not be a loop solution, this has to be as fast operation as possible.
And here is an incredible image made in MS Paint.
This operation is known as compress right. It is implemented as part of BMI2 as the PEXT instruction, in Intel processors as of Haswell.
Unfortunately, without hardware support is it a quite annoying operation. Of course there is an obvious solution, just moving the bits one by one in a loop, here is the one given by Hackers Delight:
unsigned compress(unsigned x, unsigned m) {
unsigned r, s, b; // Result, shift, mask bit.
r = 0;
s = 0;
do {
b = m & 1;
r = r | ((x & b) << s);
s = s + b;
x = x >> 1;
m = m >> 1;
} while (m != 0);
return r;
}
But there is an other way, also given by Hackers Delight, which does less looping (number of iteration logarithmic in the number of bits) but more per iteration:
unsigned compress(unsigned x, unsigned m) {
unsigned mk, mp, mv, t;
int i;
x = x & m; // Clear irrelevant bits.
mk = ~m << 1; // We will count 0's to right.
for (i = 0; i < 5; i++) {
mp = mk ^ (mk << 1); // Parallel prefix.
mp = mp ^ (mp << 2);
mp = mp ^ (mp << 4);
mp = mp ^ (mp << 8);
mp = mp ^ (mp << 16);
mv = mp & m; // Bits to move.
m = m ^ mv | (mv >> (1 << i)); // Compress m.
t = x & mv;
x = x ^ t | (t >> (1 << i)); // Compress x.
mk = mk & ~mp;
}
return x;
}
Notice that a lot of the values there depend only on m. Since you only have 512 different masks, you could precompute those and simplify the code to something like this (not tested)
unsigned compress(unsigned x, int maskindex) {
unsigned t;
int i;
x = x & masks[maskindex][0];
for (i = 0; i < 5; i++) {
t = x & masks[maskindex][i + 1];
x = x ^ t | (t >> (1 << i));
}
return x;
}
Of course all of these can be turned into "not a loop" by unrolling, the second and third ways are probably more suitable for that. That's a bit of cheat however.
You can use the pack-by-multiplication technique similar to the one described here. This way you don't need any loop and can mix the bits in any order.
For example with the mask 0b10101001 == 0xA9 like above and 8-bit data abcdefgh (with a-h is the 8 bits) you can use the below expression to get 0000aceh
uint8_t compress_maskA9(uint8_t x)
{
const uint8_t mask1 = 0xA9 & 0xF0;
const uint8_t mask2 = 0xA9 & 0x0F;
return (((x & mask1)*0x03000000 >> 28) & 0x0C) | ((x & mask2)*0x50000000 >> 30);
}
In this specific case there are some overlaps of the 4 bits while adding (which incur unexpected carry) during the multiplication step, so I've split them into 2 parts, the first one extracts bit a and c, then e and h will be extracted in the latter part. There are other ways to split the bits as well, like a & h then c & e. You can see the results compared to Harold's function live on ideone
An alternate way with only one multiplication
const uint32_t X = (x << 8) | x;
return (X & 0x8821)*0x12050000 >> 28;
I got this by duplicating the bits so that they're spaced out farther, leaving enough space to avoid the carry. This is often better than splitting into 2 multiplications
If you want the result's bits reversed (i.e. heca0000) you can easily change the magic numbers accordingly
// result: he00 | 00ca;
return (((x & 0x09)*0x88000000 >> 28) & 0x0C) | (((x & 0xA0)*0x04800000) >> 30);
or you can also extract the 3 bits e, c and a at the same time, leaving h separately (as I mentioned above, there are often multiple solutions) and you need only one multiplication
return ((x & 0xA8)*0x12400000 >> 29) | (x & 0x01) << 3; // result: 0eca | h000
But there might be a better alternative like the above second snippet
const uint32_t X = (x << 8) | x;
return (X & 0x2881)*0x80290000 >> 28
Correctness check: http://ideone.com/PYUkty
For a larger number of masks you can precompute the magic numbers correspond to those masks and store them in an array so that you can look them up immediately for use. I calculated those mask by hand but you can do that automatically
Explanation
We have abcdefgh & mask1 = a0c00000. Multiply it with magic1
........................a0c00000
× 00000011000000000000000000000000 (magic1 = 0x03000000)
────────────────────────────────
a0c00000........................
+ a0c00000......................... (the leading "a" bit is outside int's range
──────────────────────────────── so it'll be truncated)
r1 = acc.............................
=> (r1 >> 28) & 0x0C = 0000ac00
Similarly we multiply abcdefgh & mask2 = 0000e00h with magic2
........................0000e00h
× 01010000000000000000000000000000 (magic2 = 0x50000000)
────────────────────────────────
e00h............................
+ 0h..............................
────────────────────────────────
r2 = eh..............................
=> (r2 >> 30) = 000000eh
Combine them together we have the expected result
((r1 >> 28) & 0x0C) | (r2 >> 30) = 0000aceh
And here's the demo for the second snippet
abcdefghabcdefgh
& 1000100000100001 (0x8821)
────────────────────────────────
a000e00000c0000h
× 00010010000001010000000000000000 (0x12050000)
────────────────────────────────
000h
00e00000c0000h
+ 0c0000h
a000e00000c0000h
────────────────────────────────
= acehe0h0c0c00h0h
& 11110000000000000000000000000000
────────────────────────────────
= aceh
For the reversed order case:
abcdefghabcdefgh
& 0010100010000001 (0x2881)
────────────────────────────────
00c0e000a000000h
x 10000000001010010000000000000000 (0x80290000)
────────────────────────────────
000a000000h
00c0e000a000000h
+ 0e000a000000h
h
────────────────────────────────
hecaea00a0h0h00h
& 11110000000000000000000000000000
────────────────────────────────
= heca
Related:
How to create a byte out of 8 bool values (and vice versa)?
Redistribute least significant bits from a 4-byte array to a nibble

Bitwise operations based on two numbers

I got an assignment today at my faculty (Mathematics Faculty of Belgrade, Serbia) which says:
1) Write a program that for two given integers x and y, inverts in integer x those bits that match the corresponding bits in y, while the rest of the bits remain the same.
For example:
x = 1001110110101
y = 1100010100011
x' = 0011101011100
I managed to write a program that does that, but I am a little insecure about the quality of my solution. Please, if you have time, check out the code and tell me how I could improve it.
int x, y, bitnum;
int z = 0;
unsigned int mask;
bitnum = sizeof(int) * 8;
mask = 1 << bitnum - 1;
printf("Unesi x i y: ");
scanf("%d%d", &x, &y);
while (mask > 1) {
if ( (((x & mask) == 0) && ((y & mask) == 0)) ||
((x & mask) && ((y & mask) == 0)) )
z += 1;
z <<= 1;
mask >>= 1;
} /* <-- THAT'S HOW STUPID PEOPLE SOLVE PROBLEMS... WITH HAMMER! */
z = y~; /* <-- THAT'S HOW SMART PEOPLE SOLVE PROBLEMS... WITH ONE LINE */
Everything works correctly, for x = 423 and y = 324 for example, I get z = -344, which is correct. Also, bit prints match I would just like to know if there is a better way to do this.
Thanks.
If you take a look at your x/y/x' example, it must strike you that x' is a complement to y. And indeed it's like that.
x y x'
--------
1 1 0
0 0 1
1 0 1
0 1 0
Spoiler (hover your mouse over block below, if you want to see a solution):
For bits that match, you invert bit in x, but as it is the same as bit in y, it's the same as inverting bit in y. When they do not match, you keep the bit from x, what is already inversion of bit in y on its own. I hope you can see the one-line solution already yourself: x' = ~y;
//Try with the next code:
unsigned int mask1, mask2, mask3, answ;
mask1 = x & y; // identify bits with 1 that match
mask2 = ~x & ~y; // identify bits with 0 that match
mask3 = mask1 | mask2; // identify bits with 0 or 1 that match
answ = x ^ m3; // Change identified bits

Fast ceiling of an integer division in C / C++

Given integer values x and y, C and C++ both return as the quotient q = x/y the floor of the floating point equivalent. I'm interested in a method of returning the ceiling instead. For example, ceil(10/5)=2 and ceil(11/5)=3.
The obvious approach involves something like:
q = x / y;
if (q * y < x) ++q;
This requires an extra comparison and multiplication; and other methods I've seen (used in fact) involve casting as a float or double. Is there a more direct method that avoids the additional multiplication (or a second division) and branch, and that also avoids casting as a floating point number?
For positive numbers where you want to find the ceiling (q) of x when divided by y.
unsigned int x, y, q;
To round up ...
q = (x + y - 1) / y;
or (avoiding overflow in x+y)
q = 1 + ((x - 1) / y); // if x != 0
For positive numbers:
q = x/y + (x % y != 0);
Sparky's answer is one standard way to solve this problem, but as I also wrote in my comment, you run the risk of overflows. This can be solved by using a wider type, but what if you want to divide long longs?
Nathan Ernst's answer provides one solution, but it involves a function call, a variable declaration and a conditional, which makes it no shorter than the OPs code and probably even slower, because it is harder to optimize.
My solution is this:
q = (x % y) ? x / y + 1 : x / y;
It will be slightly faster than the OPs code, because the modulo and the division is performed using the same instruction on the processor, because the compiler can see that they are equivalent. At least gcc 4.4.1 performs this optimization with -O2 flag on x86.
In theory the compiler might inline the function call in Nathan Ernst's code and emit the same thing, but gcc didn't do that when I tested it. This might be because it would tie the compiled code to a single version of the standard library.
As a final note, none of this matters on a modern machine, except if you are in an extremely tight loop and all your data is in registers or the L1-cache. Otherwise all of these solutions will be equally fast, except for possibly Nathan Ernst's, which might be significantly slower if the function has to be fetched from main memory.
You could use the div function in cstdlib to get the quotient & remainder in a single call and then handle the ceiling separately, like in the below
#include <cstdlib>
#include <iostream>
int div_ceil(int numerator, int denominator)
{
std::div_t res = std::div(numerator, denominator);
return res.rem ? (res.quot + 1) : res.quot;
}
int main(int, const char**)
{
std::cout << "10 / 5 = " << div_ceil(10, 5) << std::endl;
std::cout << "11 / 5 = " << div_ceil(11, 5) << std::endl;
return 0;
}
There's a solution for both positive and negative x but only for positive y with just 1 division and without branches:
int div_ceil(int x, int y) {
return x / y + (x % y > 0);
}
Note, if x is positive then division is towards zero, and we should add 1 if reminder is not zero.
If x is negative then division is towards zero, that's what we need, and we will not add anything because x % y is not positive
How about this? (requires y non-negative, so don't use this in the rare case where y is a variable with no non-negativity guarantee)
q = (x > 0)? 1 + (x - 1)/y: (x / y);
I reduced y/y to one, eliminating the term x + y - 1 and with it any chance of overflow.
I avoid x - 1 wrapping around when x is an unsigned type and contains zero.
For signed x, negative and zero still combine into a single case.
Probably not a huge benefit on a modern general-purpose CPU, but this would be far faster in an embedded system than any of the other correct answers.
I would have rather commented but I don't have a high enough rep.
As far as I am aware, for positive arguments and a divisor which is a power of 2, this is the fastest way (tested in CUDA):
//example y=8
q = (x >> 3) + !!(x & 7);
For generic positive arguments only, I tend to do it like so:
q = x/y + !!(x % y);
This works for positive or negative numbers:
q = x / y + ((x % y != 0) ? !((x > 0) ^ (y > 0)) : 0);
If there is a remainder, checks to see if x and y are of the same sign and adds 1 accordingly.
simplified generic form,
int div_up(int n, int d) {
return n / d + (((n < 0) ^ (d > 0)) && (n % d));
} //i.e. +1 iff (not exact int && positive result)
For a more generic answer, C++ functions for integer division with well defined rounding strategy
For signed or unsigned integers.
q = x / y + !(((x < 0) != (y < 0)) || !(x % y));
For signed dividends and unsigned divisors.
q = x / y + !((x < 0) || !(x % y));
For unsigned dividends and signed divisors.
q = x / y + !((y < 0) || !(x % y));
For unsigned integers.
q = x / y + !!(x % y);
Zero divisor fails (as with a native operation). Cannot cause overflow.
Corresponding floored and modulo constexpr implementations here, along with templates to select the necessary overloads (as full optimization and to prevent mismatched sign comparison warnings):
https://github.com/libbitcoin/libbitcoin-system/wiki/Integer-Division-Unraveled
Compile with O3, The compiler performs optimization well.
q = x / y;
if (x % y) ++q;