compute midpoint in floating point - bit-manipulation

Given two floating point numbers (IEEE single or double precision), I would like to find the number that lies half-way between them, not in the sense of (x+y)/2, but with respect to the numbers that are actually representable.
If both x and y are non-negative, the following works:
float ieeeMidpoint(float x, float y)
{
    assert(x >= 0 && y >= 0);
    int xi = *(int*)&x;
    int yi = *(int*)&y;
    int zi = (xi+yi)/2;
    return *(float*)&zi;
}
The reason this works is that positive IEEE floating point numbers (including subnormals and infinity) keep their order when doing a reinterpreting cast. (This is not true for the 80-bit extended format, but I don't need that anyway.)
Now I am looking for an elegant way to do the same that includes the case when one or both of the numbers are negative. Of course it is easy to do with a bunch of ifs, but I was wondering if there is some nice bit-magic, preferably without any branching.

Figured it out myself. The order of negative numbers is reversed when doing the reinterpreting cast, so that is the only thing one needs to fix. This version is longer than I hoped it would be, but it's only some bit-shuffling, so it should be fast.
float ieeeMidpoint(float x, float y)
{
    // check for NaNs (note that subnormals and infinity work fine)
    assert(x == x && y == y);
    // reinterpreting cast
    int xi = *(int*)&x;
    int yi = *(int*)&y;
    // reverse negative numbers: flip every bit except the sign bit
    // (would look cleaner with an 'if', but I like not branching)
    xi = xi ^ ((xi >> 31) & 0x7FFFFFFF);
    yi = yi ^ ((yi >> 31) & 0x7FFFFFFF);
    // compute average of xi, yi (overflow-safe)
    int zi = (xi >> 1) + (yi >> 1) + (xi & 1);
    // reverse negative numbers back
    zi = zi ^ ((zi >> 31) & 0x7FFFFFFF);
    // reinterpret back to float
    return *(float*)&zi;
}
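The pointer casts above technically break the strict aliasing rules. Below is a minimal sketch of the same idea that uses std::memcpy for the reinterpretation instead. The names toOrdered, fromOrdered and ieeeMidpointSafe are made up for illustration. The sketch assumes a 32-bit IEEE 754 float and an arithmetic right shift on negative integers (guaranteed only since C++20, but the norm in practice). It also swaps in (xi & yi & 1) as the carry term, so the average is an exact floor in the ordered integer space; that is a variation, not the code from the answer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::int32_t), "assumes 32-bit float");

// Map a float to an integer whose ordering matches the float ordering.
// Negative floats have every bit except the sign bit flipped.
static std::int32_t toOrdered(float f) {
    std::int32_t i;
    std::memcpy(&i, &f, sizeof i);
    return i ^ ((i >> 31) & 0x7FFFFFFF);
}

// Inverse of toOrdered (the mapping is its own inverse).
static float fromOrdered(std::int32_t i) {
    i ^= (i >> 31) & 0x7FFFFFFF;
    float f;
    std::memcpy(&f, &i, sizeof f);
    return f;
}

float ieeeMidpointSafe(float x, float y) {
    assert(x == x && y == y);  // NaN is not supported
    const std::int32_t xi = toOrdered(x);
    const std::int32_t yi = toOrdered(y);
    // floor((xi + yi) / 2) without overflowing the intermediate sum
    const std::int32_t zi = (xi >> 1) + (yi >> 1) + (xi & yi & 1);
    return fromOrdered(zi);
}
```

For example, ieeeMidpointSafe(1.0f, 4.0f) returns 2.0f, because the bit patterns of 1.0f and 4.0f are equally far from that of 2.0f.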

Related

How do I avoid getting -0 when dividing in c++

I have a script in which I want to find the chunk my player is in.
Simplified version:
float x = -5;
float y = -15;
int chunkSize = 16;
int player_chunk_x = int(x / chunkSize);
int player_chunk_y = int(y / chunkSize);
This gives the chunk the player is in, but when x or y is negative and greater than -chunkSize (-16), player_chunk_x or player_chunk_y comes out as 0 (or '-0') when I need -1.
Of course I can just do this:
if (x < 0) x--;
if (y < 0) y--;
But I was wondering if there is a better solution to my problem.
Thanks in advance.
Since C++20 an integral type cannot hold a signed negative zero. Before that it was only possible in the rare (but by no means extinct) situation where your platform had a 1's complement int. It's still possible in C (although rare), and adding 0 to the result will remove it.
It is possible, though, to have a floating point signed negative zero. For that, adding 0.0 will remove it.
Note that for an integral -0, subtracting 1 will yield -1.
Your issue is that you are casting a floating point value to an integer value.
This rounds toward zero by default.
If you want to round down consistently, you first have to floor your value:
int player_chunk_x = int(std::floor(x / chunkSize));
If you don't like negative numbers then don't use them:
int player_chunk_x = (x - min_x) / chunkSize;
int player_chunk_y = (y - min_y) / chunkSize;
If you want an integer result, in this case -1 for x = -5 with chunkSize = 16, you can get it with a math function.
One possible way, using floor:
float x = -5;
float y = -15;
int chunkSize = 16;
int player_chunk_x = floor(x / chunkSize);
// gives -1 for x = -5
// 0 for x = 5
// 1 for any x from 16 up to (but not including) 32, and so on
int player_chunk_y = floor(y / chunkSize);
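The floor-based approach from the answers above can be wrapped in a small helper. This is only a sketch; the function name chunkIndex is hypothetical and not taken from the question.

```cpp
#include <cmath>

// Returns the index of the chunk that contains the coordinate pos.
// std::floor rounds toward negative infinity, so coordinates in
// [-chunkSize, 0) land in chunk -1 instead of chunk 0.
int chunkIndex(float pos, int chunkSize) {
    return static_cast<int>(std::floor(pos / static_cast<float>(chunkSize)));
}
```

With chunkSize = 16, both chunkIndex(-5.0f, 16) and chunkIndex(-15.0f, 16) give -1, while chunkIndex(5.0f, 16) gives 0.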

A bitwise shortcut for calculating the signed result of `(x - y) / z`, given unsigned operands

I'm looking for a neat way (most likely, a "bitwise shortcut") for calculating the signed value of the expression (x - y) / z, given unsigned operands x, y and z.
Here is some "kinda real, kinda pseudo" code illustrating what I am currently doing (please don't mind the syntax not being 100% perfect C or C++):
int64 func(uint64 x, uint64 y, uint64 z)
{
if (x >= y) {
uint64 result = (x - y) / z;
if (int64(result) >= 0)
return int64(result);
}
else {
uint64 result = (y - x) / z;
if (int64(result) >= 0)
return -int64(result);
}
throwSomeError();
}
Please assume that I don't have a larger type at hand.
I'd be happy to read any idea of how to make this simpler/shorter/neater.
There is a shortcut, by using a bitwise trick for conditional-negation twice (once for the absolute difference, and then again to restore the sign).
I'll use some similar non-perfect C-ish syntax I guess, to match the question.
First get a mask that has all bits set iff x < y:
uint64 m = -uint64(x < y);
(x - y) and -(y - x) are actually the same, even in unsigned arithmetic, and conditional negation can be done by using the definition of two's complement: -a = ~(a - 1) = (a + (-1)) ^ (-1). (a + 0) ^ 0 is of course equal to a again, so when m is -1 (all bits set), (a + m) ^ m = -a, and when m is zero, it is a. So it's a conditional negation.
uint64 absdiff = (x - y + m) ^ m;
Then divide as usual, and restore the sign by doing another conditional negation:
return int64((absdiff / z + m) ^ m);
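Put together, the steps above might look like the following sketch. The function name signedDiffQuot is made up for illustration. Unlike the code in the question, it does not report an error when the magnitude of the result does not fit in a signed 64-bit value, and it relies on the unsigned-to-signed conversion wrapping (guaranteed since C++20, the usual behaviour before that).

```cpp
#include <cstdint>

// Computes the signed value of (x - y) / z for unsigned operands,
// using conditional negation instead of branching. z must not be zero.
std::int64_t signedDiffQuot(std::uint64_t x, std::uint64_t y, std::uint64_t z) {
    // m is all ones iff x < y, otherwise zero
    const std::uint64_t m = std::uint64_t{0} - static_cast<std::uint64_t>(x < y);
    // (a + m) ^ m negates a when m is all ones and leaves it alone otherwise
    const std::uint64_t absdiff = (x - y + m) ^ m;
    // divide the magnitude, then restore the sign the same way
    return static_cast<std::int64_t>((absdiff / z + m) ^ m);
}
```

For example, signedDiffQuot(4, 10, 3) yields -2 and signedDiffQuot(10, 4, 3) yields 2.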

Calculator with specific methods only. Normal and recursive

So at the moment I'm stuck with my calculator. It is only allowed to use the following methods:
int succ(int x){
return ++x;
}
int neg(int x){
return -x;
}
What I already have is +, - and *, both iterative and recursive (so I can also use them if needed).
Now I'm stuck on the divide method, because I don't know how to deal with the decimal places and the logic behind it. To give an idea of what working with succ() and neg() looks like, here is an example of subtraction, iterative and recursive:
int sub(int x, int y){
if (y > 0){
y = neg(y);
x = add(x, y);
return x;
}
else if (y < 0){
y = neg(y);
x = add(x, y);
return x;
}
else if (y == 0) {
return x;
}
}
int sub_recc(int x, int y){
if (y < 0){
y = neg(y);
x = add_recc(x, y);
return x;
} else if (y > 0){
x = sub_recc(x, y - 1);
x = x - 1;
return x;
}else if( y == 0) {
return x;
}
}
If you can subtract and add, then you can handle integer division. In pseudo code it is just:
division y/x is:
First handle the signs, because we will only divide positive integers:
    set sign = 0
    if y < 0 then y = neg(y), sign = 1 - sign
    if x < 0 then x = neg(x), sign = 1 - sign
If sign is 0 there is nothing to do; if sign is 1, we will negate the result.
Now the quotient is just the number of times you can subtract the divisor:
    set quotient = 0
    while y >= x do
        y = y - x
        quotient = quotient + 1
Ok, we have the absolute value of the quotient, now for the sign:
    if sign == 1, then quotient = neg(quotient)
The correct translation into C++ as well as the recursive part are left as an exercise...
Hint for recursion: y/x == 1 + (y-x)/x while y >= x
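One possible C++ translation of the pseudo code is sketched below. The asker's own add is not shown in the question, so the add and sub helpers here are assumptions written only in terms of succ and neg; divide and divide_rec_pos are likewise hypothetical names. Division by zero and INT_MIN are not handled.

```cpp
int succ(int x) { return ++x; }
int neg(int x) { return -x; }

// Hypothetical add built only from succ and neg.
int add(int x, int y) {
    if (y < 0) return neg(add(neg(x), neg(y)));  // x + y == -((-x) + (-y))
    for (int i = 0; i < y; i = succ(i)) x = succ(x);
    return x;
}

int sub(int x, int y) { return add(x, neg(y)); }

// Integer division y / x by repeated subtraction, truncating toward zero.
int divide(int y, int x) {
    bool negative = false;
    if (y < 0) { y = neg(y); negative = !negative; }
    if (x < 0) { x = neg(x); negative = !negative; }
    int quotient = 0;
    while (y >= x) {          // subtract the divisor as often as it fits
        y = sub(y, x);
        quotient = succ(quotient);
    }
    return negative ? neg(quotient) : quotient;
}

// Recursive variant for y >= 0 and x > 0, following the hint:
// y/x == 1 + (y-x)/x while y >= x.
int divide_rec_pos(int y, int x) {
    return (y < x) ? 0 : succ(divide_rec_pos(sub(y, x), x));
}
```

For instance, divide(7, 2) is 3 and divide(-7, 2) is -3; the sign handling gives the same truncating behaviour as the built-in / operator.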
Above was the integer part. Integers are nice and easy because they give exact operations. A floating point representation in a given base is always something close to mantissa * base^exp, where the mantissa is either an integer with a maximum number of digits or a number between 0 and 1 (the so-called normalized representation). You can pass from one representation to the other by changing the exponent by the number of digits of the mantissa: 2.5 is 25 * 10^-1 (integer mantissa) or .25 * 10^1 (0 <= mantissa < 1).
So if you want to operate on base 10 floating point numbers you should:
convert an integer to a floating point (mantissa + exponent) representation
for addition and subtraction, the result exponent is a priori the greater of the two exponents. Both mantissas shall be scaled to that exponent and added/subtracted. Then the final exponent must be adjusted, because the operation may have added an additional digit (7 + 9 = 16) or caused the highest order digits to vanish (101 - 98 = 3)
for the product, you add the exponents and multiply the mantissas, and then normalize (adjust the exponent of) the result
for division, you scale the mantissa by the maximum number of digits, do the division with the integer division algorithm, and again normalize. For example 1/3 with a precision of 6 digits is obtained with:
1/3 = (1 * 10^6 / 3) * 10^-6 = (1000000 / 3) * 10^-6
which gives 333333 * 10^-6, so .333333 in normalized form
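The division step can be sketched as follows. The Decimal struct and the divideDecimal function are hypothetical names chosen for this illustration, and for brevity the sketch uses the built-in / operator where the calculator would call its own integer division. Normalization and overflow of the scaled numerator are not handled.

```cpp
#include <cstdint>

// A value represented as mantissa * 10^exponent (integer mantissa).
struct Decimal {
    std::int64_t mantissa;
    int exponent;
};

// Divides a by b, keeping `digits` decimal digits after the point:
// the numerator is scaled by 10^digits before the integer division.
Decimal divideDecimal(std::int64_t a, std::int64_t b, int digits) {
    std::int64_t scale = 1;
    for (int i = 0; i < digits; ++i) scale *= 10;
    return Decimal{(a * scale) / b, -digits};
}
```

Calling divideDecimal(1, 3, 6) yields a mantissa of 333333 with exponent -6, matching the worked example.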
Ok, it will be a lot of boilerplate code, but nothing really hard.
Long story short: just remember how you learned to do it with paper and pencil...

Find smallest integer greater or equal than x (positive integer) multiple of z (positive integer, probably power of 2)

Probably a very easy question, yet I came up with this implementation that looks far too complicated...
unsigned int x;
unsigned int z;
unsigned int makeXMultipleOfZ(const unsigned x, const unsigned z) {
    return x + (z - x % z) % z;
    //or
    //return x + (z - (x - 1) % z - 1); //This generates shorter assembly,
    //6 against 8 instructions
}
I would like to avoid if-statements.
If it helps, we can safely say that z will be a power of 2.
In my case z=4 (I know I could replace the modulo operation with the & bit operator), and I was wondering if I could come up with an implementation that involves fewer steps.
If z is a power of two, the modulo operation can be reduced to this bitwise operation:
return (x + z - 1) & ~(z - 1);
This logic is very common for data structure boundary alignment, for example. More info here: https://en.wikipedia.org/wiki/Data_structure_alignment
If z is a power of two and the integers are unsigned, the following will work:
(x + (z - 1)) & ~(z - 1)
I cannot think of a solution using bit-twiddling if z is an arbitrary number.
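As a self-contained sketch of the power-of-two case, the function below rounds x up to the next multiple of z. The name roundUpToMultiple is hypothetical, and the result is only meaningful when z is a power of two and x + z - 1 does not overflow.

```cpp
// Rounds x up to the nearest multiple of z.
// Assumes z is a power of two, so ~(z - 1) clears exactly the low bits
// that make up the remainder x % z.
unsigned int roundUpToMultiple(unsigned int x, unsigned int z) {
    return (x + z - 1u) & ~(z - 1u);
}
```

With z = 4, the inputs 0, 1, 4 and 5 map to 0, 4, 4 and 8 respectively.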

Fast ceiling of an integer division in C / C++

Given integer values x and y, C and C++ both return as the quotient q = x/y the floor of the floating point equivalent. I'm interested in a method of returning the ceiling instead. For example, ceil(10/5)=2 and ceil(11/5)=3.
The obvious approach involves something like:
q = x / y;
if (q * y < x) ++q;
This requires an extra comparison and multiplication; and other methods I've seen (used in fact) involve casting as a float or double. Is there a more direct method that avoids the additional multiplication (or a second division) and branch, and that also avoids casting as a floating point number?
For positive numbers, to find the ceiling q of x divided by y:
unsigned int x, y, q;
To round up ...
q = (x + y - 1) / y;
or (avoiding overflow in x+y)
q = 1 + ((x - 1) / y); // if x != 0
For positive numbers:
q = x/y + (x % y != 0);
Sparky's answer is one standard way to solve this problem, but as I also wrote in my comment, you run the risk of overflows. This can be solved by using a wider type, but what if you want to divide long longs?
Nathan Ernst's answer provides one solution, but it involves a function call, a variable declaration and a conditional, which makes it no shorter than the OP's code and probably even slower, because it is harder to optimize.
My solution is this:
q = (x % y) ? x / y + 1 : x / y;
It will be slightly faster than the OP's code, because the modulo and the division are performed using the same instruction on the processor, since the compiler can see that they are equivalent. At least gcc 4.4.1 performs this optimization with the -O2 flag on x86.
In theory the compiler might inline the function call in Nathan Ernst's code and emit the same thing, but gcc didn't do that when I tested it. This might be because it would tie the compiled code to a single version of the standard library.
As a final note, none of this matters on a modern machine, except if you are in an extremely tight loop and all your data is in registers or the L1-cache. Otherwise all of these solutions will be equally fast, except for possibly Nathan Ernst's, which might be significantly slower if the function has to be fetched from main memory.
You could use the div function in cstdlib to get the quotient and remainder in a single call and then handle the ceiling separately, as shown below:
#include <cstdlib>
#include <iostream>
int div_ceil(int numerator, int denominator)
{
std::div_t res = std::div(numerator, denominator);
return res.rem ? (res.quot + 1) : res.quot;
}
int main(int, const char**)
{
std::cout << "10 / 5 = " << div_ceil(10, 5) << std::endl;
std::cout << "11 / 5 = " << div_ceil(11, 5) << std::endl;
return 0;
}
There's a solution for both positive and negative x but only for positive y with just 1 division and without branches:
int div_ceil(int x, int y) {
return x / y + (x % y > 0);
}
Note that if x is positive, division rounds toward zero, so we should add 1 if the remainder is not zero.
If x is negative, division also rounds toward zero, which is what we need, and we will not add anything because x % y is not positive.
How about this? (It requires y to be positive, so don't use it in the rare case where y is a variable with no such guarantee.)
q = (x > 0)? 1 + (x - 1)/y: (x / y);
I reduced y/y to one, eliminating the term x + y - 1 and with it any chance of overflow.
I avoid x - 1 wrapping around when x is an unsigned type and contains zero.
For signed x, negative and zero still combine into a single case.
Probably not a huge benefit on a modern general-purpose CPU, but this would be far faster in an embedded system than any of the other correct answers.
I would have rather commented but I don't have a high enough rep.
As far as I am aware, for positive arguments and a divisor which is a power of 2, this is the fastest way (tested in CUDA):
//example y=8
q = (x >> 3) + !!(x & 7);
For generic positive arguments only, I tend to do it like so:
q = x/y + !!(x % y);
This works for positive or negative numbers:
q = x / y + ((x % y != 0) ? !((x > 0) ^ (y > 0)) : 0);
If there is a remainder, checks to see if x and y are of the same sign and adds 1 accordingly.
A simplified generic form:
int div_up(int n, int d) {
return n / d + (((n < 0) ^ (d > 0)) && (n % d));
} //i.e. +1 iff (not exact int && positive result)
For a more generic answer, see "C++ functions for integer division with well defined rounding strategy".
For signed or unsigned integers.
q = x / y + !(((x < 0) != (y < 0)) || !(x % y));
For signed dividends and unsigned divisors.
q = x / y + !((x < 0) || !(x % y));
For unsigned dividends and signed divisors.
q = x / y + !((y < 0) || !(x % y));
For unsigned integers.
q = x / y + !!(x % y);
Zero divisor fails (as with a native operation). Cannot cause overflow.
Corresponding floored and modulo constexpr implementations here, along with templates to select the necessary overloads (as full optimization and to prevent mismatched sign comparison warnings):
https://github.com/libbitcoin/libbitcoin-system/wiki/Integer-Division-Unraveled
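As a quick way to check the signed expression above against all four sign combinations, it can be wrapped in a function. This is a sketch only; the name ceilDiv is hypothetical, and like the native operator it fails for y == 0 and for INT_MIN / -1.

```cpp
// Ceiling of x / y for signed integers.
// The native quotient truncates toward zero, so 1 is added only when
// there is a remainder and the operands have the same sign
// (i.e. the exact result is positive and was rounded down).
int ceilDiv(int x, int y) {
    return x / y + !(((x < 0) != (y < 0)) || !(x % y));
}
```

For example, ceilDiv(11, 5) is 3, ceilDiv(-11, 5) is -2 and ceilDiv(-11, -5) is 3.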
Compile with -O3; the compiler optimizes this well.
q = x / y;
if (x % y) ++q;