Signed Char ATAN2 and ATAN approximations - c++

Basically, I've been trying to make two approximation functions. In both cases I input the "x" and the "y" components (to deal with those nasty n/0 and 0/0 conditions), and need to get a Signed Char output. In ATAN2's case, it should provide a range of +/-PI, and in ATAN's case, the range should be +/- PI/2.
I spent all of yesterday trying to wrap my head around it. After playing around in Excel to find a good overall algorithm based on the approximation:
X * (PI/4 + 0.273 * (1 - |X|)) * 128/PI // Scale factor at end to switch to char format
I came up with the following code:
signed char nabsSC(signed char x)
{
if(x > 0)
return -x;
return x;
}
signed char signSC(signed char input, signed char ifZero = 0, signed char scaleFactor = 1)
{
if(input > 0)
{return scaleFactor;}
else if(input < 0)
{return -scaleFactor;}
else
{return ifZero;}
}
signed char divisionSC(signed char numerator, signed char denominator)
{
if(denominator == 0) // Error Condition
{return 0;}
else
{return numerator/denominator;}
}
//#######################################################################################
signed char atan2SC(signed char y, signed char x)
{
// #todo make clearer : the code was deduced through trial and error in excel with brute force... not the best reasoning in the world but hey ho
if((x == y) && (x == 0)) // Error Condition
{return 0;}
// Prepare for algorithm Choice
const signed char X = abs(x);
signed char Y = abs(y);
if(Y > 2)
{Y = (Y << 1) + 4;}
const signed char alpha1 = 43;
const signed char alpha2 = 11;
// Make Choice
if(X <= Y) // x/y Path
{
const signed char beta = 64;
const signed char a = divisionSC(x,y); // x/y
const signed char A = nabsSC(a); // -|x/y|
const signed char temp = a * (alpha1 + alpha2 * A); // (x/y) * (32 + ((0.273 * 128) / PI) * (1 - |x/y|)))
// Small angle approximation of ARCTAN(X)
if(y < 0) // Determine Quadrant
{return -(temp + beta);}
else
{return -(temp - beta);}
}
else // y/x Path
{
const signed char a = divisionSC(y,x); // y/x
const signed char A = nabsSC(a); // -|y/x|
const signed char temp = a * (alpha1 + alpha2 * A); // (y/x) * (32 + ((0.273 * 128) / PI) * (1 - |y/x|)))
// Small angle approximation of ARCTAN(X)
if(x < 0) // Determine Quadrant
{
Y = signSC(y, -127, 127); // Sign(y)*127, if undefined: use -127
return temp + Y;
}
else
{return temp;}
}
}
Much to my despair, the implementation has errors as large as 180 degrees, and pretty much everywhere in between as well. (I compared it to the ATAN2F from the library after converting to signed char format.)
I got the general gist from this website: http://geekshavefeelings.com/posts/fixed-point-atan2
Can anybody tell me where I'm going wrong? And how I should approach the ATAN variant (which should be more precise as it's looking over half the range) without all this craziness.
I'm currently using Qt Creator 4.8.1 on Windows. The end platform for this specific bit of code will eventually be a microcontroller without an FPU, and the ATAN functions will be among the most heavily used functions. As such, efficiency with reasonable error is the aim of the game: +/-2 degrees for ATAN2 and +/-1 degree for ATAN. These are guesstimates for now, so I might widen the tolerance, but 90 degrees is definitely not acceptable!
Thanks in advance for any and all help!
EDIT:
Just to clarify, ATAN2 and ATAN both return a signed char value, but the two results cover different angular ranges.
ATAN2 shall have a range from -128 (-PI) to 127 (+PI - PI/128).
ATAN will have a range from -128 (-PI/2) to 127 (+PI/2 - PI/256).
As such the output values from the two can be considered to be two different data types.
Sorry for any confusion.
EDIT2: Converted implicit int numbers explicitly into signed char constants.

An outline follows. Below is additional information.
The result angle (a Binary Angle Measure, BAM) divides the unit circle exactly into 8 octants. Assuming a signed char of -128 to 127, each octant of atan2SC() covers 33 integer results: 0 to 32 plus an offset. (0 to 32, rather than 0 to 31, due to rounding.) For atanSC(), the octant result is 0 to 64. So focus on calculating the result of one primary octant, with x,y inputs and a 0 to 64 result. atan2SC() and atanSC() can both use this helper function at2(). For atan2SC(), to find the intermediate angle a, use a = at2(x,y)/2. For atanSC(), use a = at2(-128, y).
Finding the integer quotient with a = divisionSC(x,y) and then computing a * (43 + 11 * A) loses too much information in the division. The atan2 approximation needs an equation that uses x and y directly, maybe of the form at2 = (a*y*y + b*y)/(c*x*x + d*x).
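To see how much the plain integer quotient throws away, the sketch below counts the distinct values the quotient can take on the "x/y path" (|x| <= y) and compares that with a quotient that is scaled before dividing. The function name countDistinctQuotients and the scale factor 64 are illustrative choices, not part of the original code.

```cpp
#include <set>

// Counts the distinct values of (x * scale) / y over all inputs with
// 1 <= y <= 127 and -y <= x <= y (the "x/y path" of the question).
// Arithmetic is done in int, so x * scale cannot overflow for scale <= 64.
int countDistinctQuotients(int scale) {
    std::set<int> seen;
    for (int y = 1; y <= 127; ++y) {
        for (int x = -y; x <= y; ++x) {
            seen.insert((x * scale) / y);
        }
    }
    return static_cast<int>(seen.size());
}
```

With scale 1 only -1, 0 and 1 ever appear, so a * (43 + 11 * A) can produce just three results per path. With scale 64 the quotient covers every integer from -64 to 64.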
It is good to use the negative absolute value, as with nabsSC(). The negative range of integers meets or exceeds the positive range, e.g. -128 to -1 vs. 1 to 127. Use negative numbers and 0 when calling at2().
[Edit]
Below is code with a simplified octant selection algorithm. It is carefully constructed to ensure that any negation of x,y stays within the SCHAR_MIN to SCHAR_MAX range, assuming 2's complement. All octants call iat2(), and this is where changes can be made to improve precision. Note: division by x == 0 in iat2() is prevented, as x is not 0 at this point. The details of iat2() depend on the rounding mode and on whether this helper function is shared with atanSC(). I suggest a 2-piece piecewise-linear table if wide integer math is not available, else a linear (ay+b)/(cx+d). I may play with this more.
The balance of precision vs. performance is crucial for OP's code, but it was not conveyed well enough for me to derive an optimal answer. So I've posted a test driver below that assesses the precision of whatever version of iat2() OP comes up with.
3 pitfalls exist. 1) When the answer is to be +180 degrees, OP appears to want -128 BAM. But atan2(0.0, -1) comes up with +pi. This sign reversal may be an issue. Note: atan2(-0.0, -1) --> -pi. Ref. 2) When an answer is just slightly less than +180 degrees, depending on iat2() details, the integer BAM result is +128, which tends to wrap to -128. The atan2() result is just less than +pi or +128 BAM. This edge condition needs review in OP's final code. 3) The (x=0,y=0) case needs special handling. The octant selection code finds it.
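The first two pitfalls can be made concrete with a small reference conversion from radians to BAM. This is only a sketch for a desktop test harness (it uses floating point); the names wrapBam and radiansToBam are hypothetical, and the signed-zero behaviour of atan2 assumes an IEEE 754 style math library.

```cpp
#include <cmath>

// Maps any integer BAM value into [-128, 127] using modular arithmetic,
// so +128 (i.e. +180 degrees) becomes -128.
int wrapBam(int v) {
    return ((v + 128) % 256 + 256) % 256 - 128;
}

// Converts radians to 8-bit BAM (256 units per turn), rounding to nearest.
int radiansToBam(double radians) {
    const double kPi = 3.14159265358979323846;
    return wrapBam(static_cast<int>(std::lround(radians * 128.0 / kPi)));
}
```

With this wrapping, the +pi returned for a positive-zero y and the -pi returned for a negative-zero y both land on -128, so the reference value no longer depends on the sign of zero.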
Code for a signed char atanSC(signed char x), if it needs to be fast, could use a few if()s and a 64-byte look-up table (assuming an 8-bit signed char). This same table could be used in iat2().
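A rough sketch of the look-up table idea for one octant follows. It is an assumption-laden illustration, not the answer's code: it uses 65 entries rather than 64 so that a ratio of exactly 1 has its own slot, the names atanTable and atanOctantBam are made up, and the table is filled with std::atan at startup purely for demonstration. On an FPU-less target the values would be precomputed constants.

```cpp
#include <array>
#include <cmath>

// 65 entries: atan(i / 64) expressed in BAM (256 units per turn), i = 0..64.
// Built with floating point here for illustration only; on the target the
// values would be precomputed and stored as constants.
static const std::array<signed char, 65>& atanTable() {
    static const std::array<signed char, 65> table = [] {
        std::array<signed char, 65> t{};
        const double kPi = 3.14159265358979323846;
        for (int i = 0; i <= 64; ++i) {
            t[i] = static_cast<signed char>(
                std::lround(std::atan(i / 64.0) * 128.0 / kPi));
        }
        return t;
    }();
    return table;
}

// First-octant angle for 0 <= num <= den, den > 0. Result is 0..32 BAM.
int atanOctantBam(int num, int den) {
    const int index = (num * 64 + den / 2) / den;  // rounded ratio in 1/64 steps
    return atanTable()[index];
}
```

For example, atanOctantBam(1, 1) gives 32, which is 45 degrees in 256-per-turn BAM. The multiply by 64 needs more than 8 bits, so this assumes int arithmetic is available.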
#include <stdio.h>
#include <stdlib.h>
// -x > -y >= 0, so divide by 0 not possible
static signed char iat2(signed char y, signed char x) {
// printf("x=%4d y=%4d\n", x, y); fflush(stdout);
return ((y*32+(x/2))/x)*2; // 3.39 mxdiff
// return ((y*64+(x/2))/x); // 3.65 mxdiff
// return (y*64)/x; // 3.88 mxdiff
}
signed char iatan2sc(signed char y, signed char x) {
// determine octant
if (y >= 0) { // oct 0,1,2,3
if (x >= 0) { // oct 0,1
if (x > y) {
return iat2(-y, -x)/2 + 0*32;
} else {
if (y == 0) return 0; // (x=0,y=0)
return -iat2(-x, -y)/2 + 2*32;
}
} else { // oct 2,3
// if (-x <= y) {
if (x >= -y) {
return iat2(x, -y)/2 + 2*32;
} else {
return -iat2(-y, x)/2 + 4*32;
}
}
} else { // oct 4,5,6,7
if (x < 0) { // oct 4,5
// if (-x > -y) {
if (x < y) {
return iat2(y, x)/2 + -4*32;
} else {
return -iat2(x, y)/2 + -2*32;
}
} else { // oct 6,7
// if (x <= -y) {
if (-x >= y) {
return iat2(-x, y)/2 + -2*32;
} else {
return -iat2(y, -x)/2 + -0*32;
}
}
}
}
#include <limits.h> // INT_MAX, INT_MIN
#include <math.h>
static void test_iatan2sc(signed char y, signed char x) {
static int mn=INT_MAX;
static int mx=INT_MIN;
static double mxdiff = 0;
signed char i = iatan2sc(y,x);
static const double Pi = 3.1415926535897932384626433832795;
double a = atan2(y ? y : -0.0, x) * 256/(2*Pi);
if (i < mn) {
mn = i;
printf ("x=%4d,y=%4d --> %4d %f, mn %d mx %d mxdiff %f\n",
x,y,i,a,mn,mx,mxdiff);
}
if (i > mx) {
mx = i;
printf ("x=%4d,y=%4d --> %4d %f, mn %d mx %d mxdiff %f\n",
x,y,i,a,mn,mx,mxdiff);
}
double diff = fabs(i - a);
if (diff > 128) diff = fabs(diff - 256);
if (diff > mxdiff) {
mxdiff = diff;
printf ("x=%4d,y=%4d --> %4d %f, mn %d mx %d mxdiff %f\n",
x,y,i,a,mn,mx,mxdiff);
}
}
int main(void) {
int x,y;
int n = 127;
for (y = -n-1; y <= n; y++) {
for (x = -n-1; x <= n; x++) {
test_iatan2sc(y,x);
}
}
puts("Done");
return 0;
}
BTW: a fun problem.

Related

How do I avoid getting -0 when dividing in c++

I have a script in which I want to find the chunk my player is in.
Simplified version:
float x = -5;
float y = -15;
int chunkSize = 16;
int player_chunk_x = int(x / chunkSize);
int player_chunk_y = int(y / chunkSize);
This gives the chunk the player is in, but when x or y is negative but not less than the chunkSize (-16), player_chunk_x or player_chunk_y is still 0 or '-0' when I need -1
Of course I can just do this:
if (x < 0) x--
if (y < 0) y--
But I was wondering if there is a better solution to my problem.
Thanks in advance.
Since C++20 it's impossible to get an integral type signed negative zero, and was only possible in a rare (but by no means extinct) situation where your platform had 1's complement int. It's still possible in C (although rare), and adding 0 to the result will remove it.
It's possible though to have a floating point signed negative zero. For that, adding 0.0 will remove it.
Note that for an integral -0, subtracting 1 will yield -1.
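As a small illustration of the floating-point case, the sketch below adds 0.0 and inspects the sign bit. The helper name dropNegativeZero is hypothetical, and the behaviour assumes IEEE 754 arithmetic in the default rounding mode with no value-changing optimizations such as -ffast-math.

```cpp
#include <cmath>

// Adding +0.0 turns -0.0 into +0.0 under the default round-to-nearest mode;
// every other finite value is unchanged.
double dropNegativeZero(double v) {
    return v + 0.0;
}
```

std::signbit(-0.0) is true, while std::signbit(dropNegativeZero(-0.0)) is false.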
Your issue is that you are casting a floating point value to an integer value.
This rounds to zero by default.
If you want consistent round down, you first have to floor your value:
int player_chunk_x = int(std::floor(x / chunkSize));
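A minimal sketch of the flooring approach as reusable helpers, assuming chunkSize is positive. The function names chunkIndex and chunkIndexInt are illustrative; the second one is an integer-only variant for positions that are already stored as int.

```cpp
#include <cmath>

// Chunk index for a floating-point position: rounds toward negative infinity.
int chunkIndex(float position, int chunkSize) {
    return static_cast<int>(std::floor(position / chunkSize));
}

// Integer-only variant for positions already stored as int; chunkSize > 0.
int chunkIndexInt(int position, int chunkSize) {
    int q = position / chunkSize;       // truncates toward zero
    if (position % chunkSize < 0) --q;  // step down for negative remainders
    return q;
}
```

Both map positions -16 through -1 to chunk -1 and positions 0 through 15 to chunk 0 when chunkSize is 16.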
If you don't like negative numbers then don't use them:
int player_chunk_x = (x - min_x) / chunkSize;
int player_chunk_y = (y - min_y) / chunkSize;
If you want an integer result, in this case -1 for -5 / 16 (or any similar negative position), a math function can do it.
One possible way, using floor:
float x = -5;
float y = -15;
int chunkSize = 16;
int player_chunk_x = floor(x / chunkSize);
// gives -1 for x = -5 (x / chunkSize is between -1 and 0)
// 0 for x = 5 (quotient between 0 and 1)
// 1 for any quotient between 1 and 2, and so on
int player_chunk_y = floor(y / chunkSize);

how to wrap radians between -pi and pi with mod? [duplicate]

I'm looking for some nice C code that will accomplish effectively:
while (deltaPhase >= M_PI) deltaPhase -= M_TWOPI;
while (deltaPhase < -M_PI) deltaPhase += M_TWOPI;
What are my options?
Edit Apr 19, 2013:
Modulo function updated to handle boundary cases as noted by aka.nice and arr_sea:
static const double _PI= 3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348;
static const double _TWO_PI= 6.2831853071795864769252867665590057683943387987502116419498891846156328125724179972560696;
// Floating-point modulo
// The result (the remainder) has same sign as the divisor.
// Similar to matlab's mod(); Not similar to fmod() - Mod(-3,4)= 1 fmod(-3,4)= -3
template<typename T>
T Mod(T x, T y)
{
static_assert(!std::numeric_limits<T>::is_exact , "Mod: floating-point type expected");
if (0. == y)
return x;
double m= x - y * floor(x/y);
// handle boundary cases resulted from floating-point cut off:
if (y > 0) // modulo range: [0..y)
{
if (m>=y) // Mod(-1e-16 , 360. ): m= 360.
return 0;
if (m<0 )
{
if (y+m == y)
return 0 ; // just in case...
else
return y+m; // Mod(106.81415022205296 , _TWO_PI ): m= -1.421e-14
}
}
else // modulo range: (y..0]
{
if (m<=y) // Mod(1e-16 , -360. ): m= -360.
return 0;
if (m>0 )
{
if (y+m == y)
return 0 ; // just in case...
else
return y+m; // Mod(-106.81415022205296, -_TWO_PI): m= 1.421e-14
}
}
return m;
}
// wrap [rad] angle to [-PI..PI)
inline double WrapPosNegPI(double fAng)
{
return Mod(fAng + _PI, _TWO_PI) - _PI;
}
// wrap [rad] angle to [0..TWO_PI)
inline double WrapTwoPI(double fAng)
{
return Mod(fAng, _TWO_PI);
}
// wrap [deg] angle to [-180..180)
inline double WrapPosNeg180(double fAng)
{
return Mod(fAng + 180., 360.) - 180.;
}
// wrap [deg] angle to [0..360)
inline double Wrap360(double fAng)
{
return Mod(fAng ,360.);
}
One-liner constant-time solution:
Okay, it's a two-liner if you count the second function for [min,max) form, but close enough — you could merge them together anyways.
/* change to `float/fmodf` or `long double/fmodl` or `int/%` as appropriate */
/* wrap x -> [0,max) */
double wrapMax(double x, double max)
{
/* integer math: `(max + x % max) % max` */
return fmod(max + fmod(x, max), max);
}
/* wrap x -> [min,max) */
double wrapMinMax(double x, double min, double max)
{
return min + wrapMax(x - min, max - min);
}
Then you can simply use deltaPhase = wrapMinMax(deltaPhase, -M_PI, +M_PI).
The solution is constant-time, meaning that the time it takes does not depend on how far your value is from [-PI,+PI), for better or for worse.
Verification:
Now, I don't expect you to take my word for it, so here are some examples, including boundary conditions. I'm using integers for clarity, but it works much the same with fmod() and floats:
Positive x:
wrapMax(3, 5) == 3: (5 + 3 % 5) % 5 == (5 + 3) % 5 == 8 % 5 == 3
wrapMax(6, 5) == 1: (5 + 6 % 5) % 5 == (5 + 1) % 5 == 6 % 5 == 1
Negative x:
Note: These assume that integer modulo copies left-hand sign; if not, you get the above ("Positive") case.
wrapMax(-3, 5) == 2: (5 + (-3) % 5) % 5 == (5 - 3) % 5 == 2 % 5 == 2
wrapMax(-6, 5) == 4: (5 + (-6) % 5) % 5 == (5 - 1) % 5 == 4 % 5 == 4
Boundaries:
wrapMax(0, 5) == 0: (5 + 0 % 5) % 5 == (5 + 0) % 5 == 5 % 5 == 0
wrapMax(5, 5) == 0: (5 + 5 % 5) % 5 == (5 + 0) % 5== 5 % 5 == 0
wrapMax(-5, 5) == 0: (5 + (-5) % 5) % 5 == (5 + 0) % 5 == 5 % 5 == 0
Note: Possibly -0 instead of +0 for floating-point.
The wrapMinMax function works much the same: wrapping x to [min,max) is the same as wrapping x - min to [0,max-min), and then (re-)adding min to the result.
I don't know what would happen with a negative max, but feel free to check that yourself!
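The integer examples from the verification can be checked mechanically with an integer version of the two functions. This is a sketch under the assumption that max > 0 and min < max; the names wrapMaxInt and wrapMinMaxInt are made up for the illustration.

```cpp
// Integer form of wrapMax: wraps x into [0, max) for max > 0.
// Relies on % truncating toward zero (guaranteed since C++11).
int wrapMaxInt(int x, int max) {
    return (max + x % max) % max;
}

// Integer form of wrapMinMax: wraps x into [min, max) for min < max.
int wrapMinMaxInt(int x, int min, int max) {
    return min + wrapMaxInt(x - min, max - min);
}
```

For instance, wrapMinMaxInt(190, -180, 180) gives -170 and wrapMinMaxInt(180, -180, 180) gives -180, matching the half-open interval.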
If ever your input angle can reach arbitrarily high values, and if continuity matters, you can also try
atan2(sin(x),cos(x))
This will preserve continuity of sin(x) and cos(x) better than modulo for high values of x, especially in single precision (float).
Indeed, exact_value_of_pi - double_precision_approximation ~= 1.22e-16
On the other hand, most library/hardware use a high precision approximation of PI for applying the modulo when evaluating trigonometric functions (though x86 family is known to use a rather poor one).
Result might be in [-pi,pi], you'll have to check the exact bounds.
Personally, I would prevent any angle from reaching several revolutions by wrapping systematically, and stick to an fmod solution like the one in Boost.
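For reference, the atan2(sin(x), cos(x)) idea wrapped in a function looks like the sketch below. The name wrapViaAtan2 is illustrative, and as noted the result interval is closed at both ends, so an input that is an odd multiple of pi may come back as either +pi or -pi depending on rounding.

```cpp
#include <cmath>

// Wraps an angle in radians via atan2(sin, cos). Result lies in [-pi, pi].
double wrapViaAtan2(double x) {
    return std::atan2(std::sin(x), std::cos(x));
}
```

It costs three transcendental calls per wrap, which is the price paid for the continuity argument above.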
There is also the fmod function in math.h, but the sign causes trouble, so a subsequent operation is needed to make the result fit in the proper range (like you already do with the while loops). For big values of deltaPhase this is probably faster than subtracting/adding M_TWOPI hundreds of times.
deltaPhase = fmod(deltaPhase, M_TWOPI);
EDIT:
I didn't try it intensively but I think you can use fmod this way by handling positive and negative values differently:
if (deltaPhase>0)
deltaPhase = fmod(deltaPhase+M_PI, 2.0*M_PI)-M_PI;
else
deltaPhase = fmod(deltaPhase-M_PI, 2.0*M_PI)+M_PI;
The computational time is constant (unlike the while solution which gets slower as the absolute value of deltaPhase increases)
I would do this:
double wrap(double x) {
return x-2*M_PI*floor(x/(2*M_PI)+0.5);
}
There will be significant numerical errors. The best solution to the numerical errors is to store your phase scaled by 1/PI or by 1/(2*PI) and depending on what you are doing store them as fixed point.
Instead of working in radians, use angles scaled by 1/(2π) and use modf, floor etc. Convert back to radians to use library functions.
This also has the effect that rotating ten thousand and a half revolutions is the same as rotating half then ten thousand revolutions, which is not guaranteed if your angles are in radians, as you have an exact representation in the floating point value rather than summing approximate representations:
#include <iostream>
#include <cmath>
float wrap_rads ( float r )
{
while ( r > M_PI ) {
r -= 2 * M_PI;
}
while ( r <= -M_PI ) {
r += 2 * M_PI;
}
return r;
}
float wrap_grads ( float r )
{
float i;
r = modff ( r, &i );
if ( r > 0.5 ) r -= 1;
if ( r <= -0.5 ) r += 1;
return r;
}
int main ()
{
for (int rotations = 1; rotations < 100000; rotations *= 10 ) {
{
float pi = ( float ) M_PI;
float two_pi = 2 * pi;
float a = pi;
a += rotations * two_pi;
std::cout << rotations << " and a half rotations in radians " << a << " => " << wrap_rads ( a ) / two_pi << '\n' ;
}
{
float pi = ( float ) 0.5;
float two_pi = 2 * pi;
float a = pi;
a += rotations * two_pi;
std::cout << rotations << " and a half rotations in grads " << a << " => " << wrap_grads ( a ) / two_pi << '\n' ;
}
std::cout << '\n';
}}
Here is a version for other people finding this question that can use C++ with Boost:
#include <boost/math/constants/constants.hpp>
#include <boost/math/special_functions/sign.hpp>
template<typename T>
inline T normalizeRadiansPiToMinusPi(T rad)
{
// copy the sign of the value in radians to the value of pi
T signedPI = boost::math::copysign(boost::math::constants::pi<T>(),rad);
// set the value of rad to the appropriate signed value between pi and -pi
rad = fmod(rad+signedPI,(2*boost::math::constants::pi<T>())) - signedPI;
return rad;
}
C++11 version, no Boost dependency:
#include <cmath>
// Bring the 'difference' between two angles into [-pi; pi].
template <typename T>
T normalizeRadiansPiToMinusPi(T rad) {
// Copy the sign of the value in radians to the value of pi.
T signed_pi = std::copysign(M_PI,rad);
// Set the value of difference to the appropriate signed value between pi and -pi.
rad = std::fmod(rad + signed_pi,(2 * M_PI)) - signed_pi;
return rad;
}
I encountered this question when searching for how to wrap a floating point value (or a double) between two arbitrary numbers. It didn't answer specifically for my case, so I worked out my own solution which can be seen here. This will take a given value and wrap it between lowerBound and upperBound where upperBound perfectly meets lowerBound such that they are equivalent (ie: 360 degrees == 0 degrees so 360 would wrap to 0)
Hopefully this answer is helpful to others stumbling across this question looking for a more generic bounding solution.
double boundBetween(double val, double lowerBound, double upperBound){
if(lowerBound > upperBound){std::swap(lowerBound, upperBound);}
val-=lowerBound; //adjust to 0
double rangeSize = upperBound - lowerBound;
if(rangeSize == 0){return upperBound;} //avoid dividing by 0
return val - (rangeSize * std::floor(val/rangeSize)) + lowerBound;
}
A related question for integers is available here:
Clean, efficient algorithm for wrapping integers in C++
A two-liner, non-iterative, tested solution for normalizing arbitrary angles to [-π, π):
double normalizeAngle(double angle)
{
double a = fmod(angle + M_PI, 2 * M_PI);
return a >= 0 ? (a - M_PI) : (a + M_PI);
}
Similarly, for [0, 2π):
double normalizeAngle(double angle)
{
double a = fmod(angle, 2 * M_PI);
return a >= 0 ? a : (a + 2 * M_PI);
}
In the case where fmod() is implemented through truncated division and has the same sign as the dividend, it can be taken advantage of to solve the general problem thusly:
For the case of (-PI, PI]:
if (x > 0) x = x - 2PI * ceil(x/2PI) #Shift to the negative regime
return fmod(x - PI, 2PI) + PI
And for the case of [-PI, PI):
if (x < 0) x = x - 2PI * floor(x/2PI) #Shift to the positive regime
return fmod(x + PI, 2PI) - PI
[Note that this is pseudocode; my original was written in Tcl, and I didn't want to torture everyone with that. I needed the first case, so had to figure this out.]
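A possible C++ rendering of the pseudocode is sketched below. The names wrapOpenLow and wrapOpenHigh and the constants kPi and kTwoPi are introduced here for illustration, and exact behaviour at the interval boundaries is subject to floating-point rounding.

```cpp
#include <cmath>

const double kPi = 3.14159265358979323846;
const double kTwoPi = 2.0 * kPi;

// Wraps to (-pi, pi]: shift positive inputs into the non-positive range first,
// so fmod (which keeps the sign of the dividend) only sees x - pi <= -pi.
double wrapOpenLow(double x) {
    if (x > 0) x -= kTwoPi * std::ceil(x / kTwoPi);
    return std::fmod(x - kPi, kTwoPi) + kPi;
}

// Wraps to [-pi, pi): shift negative inputs into the non-negative range first.
double wrapOpenHigh(double x) {
    if (x < 0) x -= kTwoPi * std::floor(x / kTwoPi);
    return std::fmod(x + kPi, kTwoPi) - kPi;
}
```

Both functions map 3*pi/2 to approximately -pi/2 and -3*pi/2 to approximately +pi/2; they differ only in which end of the interval is included.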
deltaPhase -= floor(deltaPhase/M_TWOPI)*M_TWOPI;
The way you suggested is best. It is fastest for small deflections. If angles in your program are constantly being wrapped back into the proper range, then you should only rarely run into big out-of-range values. Therefore paying the cost of complicated modular arithmetic code every round seems wasteful. Comparisons are cheap compared to modular arithmetic (http://embeddedgurus.com/stack-overflow/2011/02/efficient-c-tip-13-use-the-modulus-operator-with-caution/).
In C99:
float unwindRadians( float radians )
{
const bool radiansNeedUnwinding = radians < -M_PI || M_PI <= radians;
if ( radiansNeedUnwinding )
{
if ( signbit( radians ) )
{
radians = -fmodf( -radians + M_PI, 2.f * M_PI ) + M_PI;
}
else
{
radians = fmodf( radians + M_PI, 2.f * M_PI ) - M_PI;
}
}
return radians;
}
If linking against glibc's libm (including newlib's implementation) you can access
__ieee754_rem_pio2f() and __ieee754_rem_pio2() private functions:
extern __int32_t __ieee754_rem_pio2f (float,float*);
float wrapToPI(float xf){
const float p[4]={0,M_PI_2,M_PI,-M_PI_2};
float yf[2];
int q;
int qmod4;
q=__ieee754_rem_pio2f(xf,yf);
/* xf = q * M_PI_2 + yf[0] + yf[1] /
* yf[1] << y[0], not sure if it could be ignored */
qmod4= q % 4;
if (qmod4==2)
/* (yf[0] > 0) defines interval (-pi,pi]*/
return ( (yf[0] > 0) ? -p[2] : p[2] ) + yf[0] + yf[1];
else
return p[qmod4] + yf[0] + yf[1];
}
Edit: Just realised that you need to link to libm.a, I couldn't find the symbols declared in libm.so
I have used (in python):
def WrapAngle(Wrapped, UnWrapped ):
TWOPI = math.pi * 2
TWOPIINV = 1.0 / TWOPI
return UnWrapped + round((Wrapped - UnWrapped) * TWOPIINV) * TWOPI
c-code equivalent:
#define TWOPI 6.28318531
double WrapAngle(const double dWrapped, const double dUnWrapped )
{
const double TWOPIINV = 1.0/ TWOPI;
return dUnWrapped + round((dWrapped - dUnWrapped) * TWOPIINV) * TWOPI;
}
notice that this brings it in the wrapped domain +/- 2pi so for +/- pi domain you need to handle that afterward like:
if( angle > pi):
angle -= 2*math.pi

Exact value of a floating-point number as a rational

I'm looking for a method to convert the exact value of a floating-point number to a rational quotient of two integers, i.e. a / b, where b is not larger than a specified maximum denominator b_max. If satisfying the condition b <= b_max is impossible, then the result falls back to the best approximation which still satisfies the condition.
Hold on. There are a lot of questions/answers here about the best rational approximation of a truncated real number which is represented as a floating-point number. However I'm interested in the exact value of a floating-point number, which is itself a rational number with a different representation. More specifically, the mathematical set of floating-point numbers is a subset of rational numbers. In case of IEEE 754 binary floating-point standard it is a subset of dyadic rationals. Anyway, any floating-point number can be converted to a rational quotient of two finite precision integers as a / b.
So, for example assuming IEEE 754 single-precision binary floating-point format, the rational equivalent of float f = 1.0f / 3.0f is not 1 / 3, but 11184811 / 33554432. This is the exact value of f, which is a number from the mathematical set of IEEE 754 single-precision binary floating-point numbers.
Based on my experience, traversing (by binary search of) the Stern-Brocot tree is not useful here, since that is more suitable for approximating the value of a floating-point number, when it is interpreted as a truncated real instead of an exact rational.
Possibly, continued fractions are the way to go.
Another problem here is integer overflow. Suppose we want to represent the rational as the quotient of two int32_t, where the maximum denominator b_max = INT32_MAX. We cannot rely on a stopping criterion like b > b_max. So the algorithm must never overflow, or it must detect overflow.
What I found so far is an algorithm from Rosetta Code, which is based on continued fractions, but its source mentions it is "still not quite complete". Some basic tests gave good results, but I cannot confirm its overall correctness and I think it can easily overflow.
// https://rosettacode.org/wiki/Convert_decimal_number_to_rational#C
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <stdint.h>
/* f : number to convert.
* num, denom: returned parts of the rational.
* md: max denominator value. Note that machine floating point number
* has a finite resolution (10e-16 ish for 64 bit double), so specifying
* a "best match with minimal error" is often wrong, because one can
* always just retrieve the significand and return that divided by
* 2**52, which is in a sense accurate, but generally not very useful:
* 1.0/7.0 would be "2573485501354569/18014398509481984", for example.
*/
void rat_approx(double f, int64_t md, int64_t *num, int64_t *denom)
{
/* a: continued fraction coefficients. */
int64_t a, h[3] = { 0, 1, 0 }, k[3] = { 1, 0, 0 };
int64_t x, d, n = 1;
int i, neg = 0;
if (md <= 1) { *denom = 1; *num = (int64_t) f; return; }
if (f < 0) { neg = 1; f = -f; }
while (f != floor(f)) { n <<= 1; f *= 2; }
d = f;
/* continued fraction and check denominator each step */
for (i = 0; i < 64; i++) {
a = n ? d / n : 0;
if (i && !a) break;
x = d; d = n; n = x % n;
x = a;
if (k[1] * a + k[0] >= md) {
x = (md - k[0]) / k[1];
if (x * 2 >= a || k[1] >= md)
i = 65;
else
break;
}
h[2] = x * h[1] + h[0]; h[0] = h[1]; h[1] = h[2];
k[2] = x * k[1] + k[0]; k[0] = k[1]; k[1] = k[2];
}
*denom = k[1];
*num = neg ? -h[1] : h[1];
}
All finite double values are rational numbers, as OP well stated.
Use frexp() to break the number into its fraction and exponent. The end result still needs to use double to represent whole-number values due to range requirements. Some numbers are too small (x smaller than 1.0/pow(2.0, DBL_MAX_EXP)), and infinity and not-a-number are issues.
The frexp functions break a floating-point number into a normalized fraction and an integral power of 2. ... interval [1/2, 1) or zero ...
C11 §7.12.6.4 2/3
#include <math.h>
#include <float.h>
_Static_assert(FLT_RADIX == 2, "TBD code for non-binary FP");
// Return error flag
int split(double x, double *numerator, double *denominator) {
if (!isfinite(x)) {
*numerator = *denominator = 0.0;
if (x > 0.0) *numerator = 1.0;
if (x < 0.0) *numerator = -1.0;
return 1;
}
int bdigits = DBL_MANT_DIG;
int expo;
*denominator = 1.0;
*numerator = frexp(x, &expo) * pow(2.0, bdigits);
expo -= bdigits;
if (expo > 0) {
*numerator *= pow(2.0, expo);
}
else if (expo < 0) {
expo = -expo;
if (expo >= DBL_MAX_EXP-1) {
*numerator /= pow(2.0, expo - (DBL_MAX_EXP-1));
*denominator *= pow(2.0, DBL_MAX_EXP-1);
return fabs(*numerator) < 1.0;
} else {
*denominator *= pow(2.0, expo);
}
}
while (*numerator && fmod(*numerator,2) == 0 && fmod(*denominator,2) == 0) {
*numerator /= 2.0;
*denominator /= 2.0;
}
return 0;
}
void split_test(double x) {
double numerator, denominator;
int err = split(x, &numerator, &denominator);
printf("e:%d x:%24.17g n:%24.17g d:%24.17g q:%24.17g\n",
err, x, numerator, denominator, numerator/ denominator);
}
int main(void) {
volatile float third = 1.0f/3.0f;
split_test(third);
split_test(0.0);
split_test(0.5);
split_test(1.0);
split_test(2.0);
split_test(1.0/7);
split_test(DBL_TRUE_MIN);
split_test(DBL_MIN);
split_test(DBL_MAX);
return 0;
}
Output
e:0 x: 0.3333333432674408 n: 11184811 d: 33554432 q: 0.3333333432674408
e:0 x: 0 n: 0 d: 9007199254740992 q: 0
e:0 x: 1 n: 1 d: 1 q: 1
e:0 x: 0.5 n: 1 d: 2 q: 0.5
e:0 x: 1 n: 1 d: 1 q: 1
e:0 x: 2 n: 2 d: 1 q: 2
e:0 x: 0.14285714285714285 n: 2573485501354569 d: 18014398509481984 q: 0.14285714285714285
e:1 x: 4.9406564584124654e-324 n: 4.4408920985006262e-16 d: 8.9884656743115795e+307 q: 4.9406564584124654e-324
e:0 x: 2.2250738585072014e-308 n: 2 d: 8.9884656743115795e+307 q: 2.2250738585072014e-308
e:0 x: 1.7976931348623157e+308 n: 1.7976931348623157e+308 d: 1 q: 1.7976931348623157e+308
Leave the b_max consideration for later.
More expedient code is possible by replacing pow(2.0, expo) with ldexp(1, expo) (@gammatester) or exp2(expo) (@Bob__).
while (*numerator && fmod(*numerator,2) == 0 && fmod(*denominator,2) == 0) could also use some performance improvements. But first, let us get the functionality as needed.
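For the single-precision example from the question, the same frexp() idea fits in 64-bit integers. The sketch below is an illustration under stated assumptions, not a replacement for split(): it assumes a binary float with a 24-bit significand, the Rational struct and floatToRational name are hypothetical, and it reports failure instead of approximating when the exact value does not fit.

```cpp
#include <cmath>
#include <cstdint>

struct Rational {
    std::int64_t num;
    std::int64_t den;
};

// Writes the exact value of x as num/den in lowest terms. Returns false for
// infinities, NaN, or values whose exact form does not fit in 64 bits.
bool floatToRational(float x, Rational& out) {
    if (!std::isfinite(x)) return false;
    int expo = 0;
    const float frac = std::frexp(x, &expo);  // x = frac * 2^expo
    // Scaling by 2^24 makes the significand an exact integer.
    std::int64_t num = static_cast<std::int64_t>(std::ldexp(frac, 24));
    expo -= 24;                               // now x = num * 2^expo
    if (num == 0) { out = {0, 1}; return true; }
    while (num % 2 == 0) { num /= 2; ++expo; }  // strip factors of two
    if (expo > 0) {
        if (expo > 38) return false;          // num * 2^expo would not fit
        num *= (std::int64_t(1) << expo);
        expo = 0;
    }
    if (-expo > 62) return false;             // denominator would not fit
    out = {num, std::int64_t(1) << -expo};
    return true;
}
```

For 1.0f / 3.0f this yields 11184811 / 33554432, the value quoted in the question. The b_max consideration is still left out.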

Fast ceiling of an integer division in C / C++

Given integer values x and y, C and C++ both return as the quotient q = x/y the floor of the floating point equivalent. I'm interested in a method of returning the ceiling instead. For example, ceil(10/5)=2 and ceil(11/5)=3.
The obvious approach involves something like:
q = x / y;
if (q * y < x) ++q;
This requires an extra comparison and multiplication; and other methods I've seen (used in fact) involve casting as a float or double. Is there a more direct method that avoids the additional multiplication (or a second division) and branch, and that also avoids casting as a floating point number?
For positive numbers where you want to find the ceiling (q) of x when divided by y.
unsigned int x, y, q;
To round up ...
q = (x + y - 1) / y;
or (avoiding overflow in x+y)
q = 1 + ((x - 1) / y); // if x != 0
For positive numbers:
q = x/y + (x % y != 0);
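The overflow concern with (x + y - 1) / y can be shown directly with 32-bit unsigned values. This is a small sketch; the function names ceilDivWrapping and ceilDivSafe are made up, and y is assumed to be non-zero.

```cpp
#include <cstdint>

// (x + y - 1) / y: the sum wraps around when x is near the type's maximum.
std::uint32_t ceilDivWrapping(std::uint32_t x, std::uint32_t y) {
    const std::uint32_t sum = x + y - 1u;  // reduced modulo 2^32
    return sum / y;
}

// x / y + (x % y != 0): no intermediate sum, so nothing can overflow.
std::uint32_t ceilDivSafe(std::uint32_t x, std::uint32_t y) {
    return x / y + (x % y != 0);
}
```

With x = 0xFFFFFFFF and y = 2, the first form returns 0 because the sum wraps, while the second returns 0x80000000. For small inputs such as 11 and 5 both return 3.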
Sparky's answer is one standard way to solve this problem, but as I also wrote in my comment, you run the risk of overflows. This can be solved by using a wider type, but what if you want to divide long longs?
Nathan Ernst's answer provides one solution, but it involves a function call, a variable declaration and a conditional, which makes it no shorter than the OPs code and probably even slower, because it is harder to optimize.
My solution is this:
q = (x % y) ? x / y + 1 : x / y;
It will be slightly faster than the OP's code, because the modulo and the division are performed by the same processor instruction once the compiler sees that they are equivalent. At least gcc 4.4.1 performs this optimization with the -O2 flag on x86.
In theory the compiler might inline the function call in Nathan Ernst's code and emit the same thing, but gcc didn't do that when I tested it. This might be because it would tie the compiled code to a single version of the standard library.
As a final note, none of this matters on a modern machine, except if you are in an extremely tight loop and all your data is in registers or the L1-cache. Otherwise all of these solutions will be equally fast, except for possibly Nathan Ernst's, which might be significantly slower if the function has to be fetched from main memory.
You could use the div function in cstdlib to get the quotient & remainder in a single call and then handle the ceiling separately, like in the below
#include <cstdlib>
#include <iostream>
int div_ceil(int numerator, int denominator)
{
std::div_t res = std::div(numerator, denominator);
return res.rem ? (res.quot + 1) : res.quot;
}
int main(int, const char**)
{
std::cout << "10 / 5 = " << div_ceil(10, 5) << std::endl;
std::cout << "11 / 5 = " << div_ceil(11, 5) << std::endl;
return 0;
}
There's a solution for both positive and negative x but only for positive y with just 1 division and without branches:
int div_ceil(int x, int y) {
return x / y + (x % y > 0);
}
Note: if x is positive, then division is towards zero, and we should add 1 if the remainder is not zero.
If x is negative, then division is towards zero, which is what we need, and we will not add anything because x % y is not positive.
How about this? (requires y non-negative, so don't use this in the rare case where y is a variable with no non-negativity guarantee)
q = (x > 0)? 1 + (x - 1)/y: (x / y);
I reduced y/y to one, eliminating the term x + y - 1 and with it any chance of overflow.
I avoid x - 1 wrapping around when x is an unsigned type and contains zero.
For signed x, negative and zero still combine into a single case.
Probably not a huge benefit on a modern general-purpose CPU, but this would be far faster in an embedded system than any of the other correct answers.
I would have rather commented but I don't have a high enough rep.
As far as I am aware, for positive arguments and a divisor which is a power of 2, this is the fastest way (tested in CUDA):
//example y=8
q = (x >> 3) + !!(x & 7);
For generic positive arguments only, I tend to do it like so:
q = x/y + !!(x % y);
This works for positive or negative numbers:
q = x / y + ((x % y != 0) ? !((x > 0) ^ (y > 0)) : 0);
If there is a remainder, checks to see if x and y are of the same sign and adds 1 accordingly.
simplified generic form,
int div_up(int n, int d) {
return n / d + (((n < 0) ^ (d > 0)) && (n % d));
} //i.e. +1 iff (not exact int && positive result)
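For anyone who wants to convince themselves of the sign logic, here is the same function with one case per sign combination listed in comments. The assert on a non-zero divisor is my addition, not part of the answer above.

```cpp
#include <cassert>

// Ceiling division for any combination of signs (d must be non-zero).
// The XOR is true when n and d share a sign, which is when truncation toward
// zero rounded down; n == 0 never has a remainder, so its case does not matter.
int div_up(int n, int d) {
    assert(d != 0);
    return n / d + (((n < 0) ^ (d > 0)) && (n % d));
}

// One case per sign combination:
//   div_up( 7,  2) ==  4   (3.5 rounds up)
//   div_up(-7, -2) ==  4   (3.5 rounds up)
//   div_up(-7,  2) == -3   (-3.5: truncation already gives the ceiling)
//   div_up( 7, -2) == -3   (-3.5: truncation already gives the ceiling)
```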
For a more generic answer, C++ functions for integer division with well defined rounding strategy
For signed or unsigned integers.
q = x / y + !(((x < 0) != (y < 0)) || !(x % y));
For signed dividends and unsigned divisors.
q = x / y + !((x < 0) || !(x % y));
For unsigned dividends and signed divisors.
q = x / y + !((y < 0) || !(x % y));
For unsigned integers.
q = x / y + !!(x % y);
Zero divisor fails (as with a native operation). Cannot cause overflow.
Corresponding floored and modulo constexpr implementations here, along with templates to select the necessary overloads (as full optimization and to prevent mismatched sign comparison warnings):
https://github.com/libbitcoin/libbitcoin-system/wiki/Integer-Division-Unraveled
Compile with -O3; the compiler optimizes this well.
q = x / y;
if (x % y) ++q;

Clean, efficient algorithm for wrapping integers in C++

/**
* Returns a number between kLowerBound and kUpperBound
* e.g.: Wrap(-1, 0, 4); // Returns 4
* e.g.: Wrap(5, 0, 4); // Returns 0
*/
int Wrap(int const kX, int const kLowerBound, int const kUpperBound)
{
// Suggest an implementation?
}
Before C++11, the sign of a % b was only defined if a and b were both non-negative; since C++11 the result takes the sign of a.
int Wrap(int kX, int const kLowerBound, int const kUpperBound)
{
int range_size = kUpperBound - kLowerBound + 1;
if (kX < kLowerBound)
kX += range_size * ((kLowerBound - kX) / range_size + 1);
return kLowerBound + (kX - kLowerBound) % range_size;
}
The following should work independently of the implementation of the mod operator:
int range = kUpperBound - kLowerBound + 1;
int offset = (kX - kLowerBound) % range; // may be negative when kX < kLowerBound
if (offset < 0)
return kUpperBound + 1 + offset;
else
return kLowerBound + offset;
An advantage over other solutions is that it uses only a single % (i.e. division), which makes it pretty efficient.
Note (Off Topic):
It's a good example of why it is sometimes wise to define intervals with the upper bound being the first element not in the range (as for STL iterators). In this case, both "+1" terms would vanish.
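To illustrate the off-topic note, here is a sketch of the same single-% approach with an exclusive upper bound. WrapHalfOpen and kUpperBoundExclusive are hypothetical names, not from the question.

```cpp
#include <cassert>

// Wrap kX into the half-open range [kLowerBound, kUpperBoundExclusive).
// kUpperBoundExclusive is the first value NOT in the range.
int WrapHalfOpen(int kX, int kLowerBound, int kUpperBoundExclusive) {
    assert(kLowerBound < kUpperBoundExclusive);
    const int range = kUpperBoundExclusive - kLowerBound;  // no "+ 1" here
    const int offset = (kX - kLowerBound) % range;         // the single %
    return (offset < 0) ? kUpperBoundExclusive + offset    // no "+ 1" here either
                        : kLowerBound + offset;
}
```

WrapHalfOpen(-1, 0, 5) returns 4 and WrapHalfOpen(5, 0, 5) returns 0, matching Wrap(-1, 0, 4) and Wrap(5, 0, 4) from the question.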
Fastest solution, least flexible: take advantage of native data types that wrap in hardware.
The absolute fastest way to wrap integers is to scale your data to int8/int16/int32 or whatever native data type fits. Then, when your data needs to wrap, the native type does it in hardware. This is painless and much faster than any software wrapping implementation shown here. (In C and C++ only unsigned arithmetic is guaranteed to wrap; signed overflow is undefined behavior, so prefer the unsigned types for this.)
As an example case study:
I have found this very useful when I need a fast sin/cos implemented with a look-up table. Basically, you scale your data so that INT16_MAX is pi and INT16_MIN is -pi. Then you are set to go.
As a side note, scaling your data adds a fixed up-front computation cost that usually looks something like:
int fixedPoint = (int)( floatingPoint * SCALING_FACTOR + 0.5 );
Feel free to exchange int for something else you want like int8_t / int16_t / int32_t.
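A rough sketch of the angle case study, under my own assumptions: I use uint16_t instead of int16_t so that the wrap is well defined by the language, and one full turn maps to 65536 units. The names kPi, kUnitsPerRadian, ToBinaryAngle and AddAngles are invented for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Binary angle: one full turn is 65536 units, so 2*pi lands on the wrap point
// of uint16_t. (Assumed scaling; the post above uses the signed equivalent.)
const double kPi = 3.14159265358979323846;
const double kUnitsPerRadian = 65536.0 / (2.0 * kPi);

// Convert radians to binary-angle units. Rounding goes through a 64-bit signed
// value first; the narrowing to uint16_t is modular, which performs the wrap.
std::uint16_t ToBinaryAngle(double radians) {
    const std::int64_t units = std::llround(radians * kUnitsPerRadian);
    return static_cast<std::uint16_t>(units);
}

// Addition wraps for free: the operands are promoted to int, and converting
// the sum back to uint16_t keeps only the low 16 bits.
std::uint16_t AddAngles(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>(a + b);
}
```

Here three quarters of a turn plus half a turn, AddAngles(49152, 32768), comes out as 16384, a quarter turn, without any explicit mod.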
Next fastest solution, more flexible: the mod operation is slow, so if possible use bit masks instead!
Most of the solutions I skimmed are functionally correct... but they are dependent on the mod operation.
The mod operation is very slow because it is essentially a hardware division. A layman's explanation of why mod and division are slow is to compare division to this pseudo-code: for(quotient = 0; inputNum > 0; inputNum -= divisor) { quotient++; } (see the definitions of quotient and divisor). As the loop suggests, division can be fast when the dividend is small relative to the divisor, but horribly slow when it is much greater than the divisor.
If you can scale your data to a power of two then you can use a bit mask which will execute in one cycle ( on 99% of all platforms ) and your speed improvement will be approximately one order of magnitude ( at the very least 2 or 3 times faster ).
C code to implement wrapping:
#define BIT_MASK (0xFFFF)
int wrappedAddition(int a, int b) {
return ( a + b ) & BIT_MASK;
}
int wrappedSubtraction(int a, int b) {
return ( a - b ) & BIT_MASK;
}
Feel free to replace the #define with a run-time value, and adjust the bit mask to whatever power of two you need (the mask is the power of two minus one, such as 0xFFFFFFFF for 2^32).
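As a sketch of the run-time variant mentioned above, this takes the range size as a parameter and checks that it really is a power of two before masking. IsPowerOfTwo and WrapPow2 are hypothetical names, and unsigned 32-bit values are assumed.

```cpp
#include <cassert>
#include <cstdint>

// True when n has exactly one bit set, i.e. n is a power of two.
constexpr bool IsPowerOfTwo(std::uint32_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Wrap value into [0, size); size must be a power of two so that size - 1
// is a mask of contiguous low bits.
std::uint32_t WrapPow2(std::uint32_t value, std::uint32_t size) {
    assert(IsPowerOfTwo(size));
    return value & (size - 1);
}
```

Because the inputs are unsigned, a "negative" step such as 0u - 1u wraps to the top of the range: WrapPow2(0u - 1u, 16) is 15.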
p.s. I strongly suggest reading about fixed point processing when messing with wrapping/overflow conditions. I suggest reading:
Fixed-Point Arithmetic: An Introduction by Randy Yates August 23, 2007
Please do not overlook this post. :)
Is this any good?
int Wrap(int N, int L, int H){
H=H-L+1; return (N-L+(N<L)*H)%H+L;
}
This works for negative inputs, and all arguments can be negative so long as L is less than H.
Background... (Note that H here is the reused variable, set to original H-L+1).
I had been using (N-L)%H+L when incrementing, but unlike in Lua, which I used before starting to learn C a few months back, this would NOT work if I used inputs below the lower bound, never mind negative inputs. (Lua is built in C, but I don't know what it's doing, and it likely wouldn't be fast...)
I decided to add +(N<L)*H to make (N-L+(N<L)*H)%H+L, as C seems to be defined such that true=1 and false=0. It works well enough for me, and seems to answer the original question neatly. If anyone knows how to do it without the MOD operator % to make it dazzlingly fast, please do it. I don't need speed right now, but some time I will, no doubt.
EDIT:
That function fails if N is lower than L by more than H-L+1 but this doesn't:
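int Wrap(int N, int L, int H){
H=H-L+1; // H now holds the range size; modifying H inside the return expression would be undefined behavior
return (N-L+(N<L)*((L-N)/H+1)*H)%H+L;
}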
I think it would break at the negative extreme of the integer range in any system, but should work for most practical situations. It adds an extra multiplication and a division, but is still fairly compact.
(This edit is just for completion, because I came up with a much better way, in a newer post in this thread.)
Crow.
Personally I've found solutions to these types of functions to be cleaner if range is exclusive and divisor is restricted to positive values.
int ifloordiv(int x, int y)
{
if (x > 0)
return x / y;
if (x < 0)
return (x + 1) / y - 1;
return 0;
}
int iwrap(int x, int y)
{
return x - y * ifloordiv(x, y);
}
Integrated.
int iwrap(int x, int y)
{
if (x > 0)
return x % y;
if (x < 0)
return (x + 1) % y + y - 1;
return 0;
}
Same family. Why not?
int ireflect(int x, int y)
{
int z = iwrap(x, y*2);
if (z < y)
return z;
return y*2-1 - z;
}
int ibandy(int x, int y)
{
if (y != 1)
return ireflect(abs(x + x / (y - 1)), y);
return 0;
}
Ranged functionality can be implemented for all functions with,
// output is in the range [min, max).
int func2(int x, int min, int max)
{
// increment max for inclusive behavior.
assert(min < max);
return func(x - min, max - min) + min;
}
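To make the generic func/func2 pair concrete, here is a sketch that plugs the integrated iwrap into the ranged wrapper. The name iwrap_range is mine, and the output range is exclusive of max, as in the comment above.

```cpp
#include <cassert>

// Wrap x into [0, y) for y > 0, same logic as the integrated iwrap above.
int iwrap(int x, int y) {
    assert(y > 0);
    if (x > 0) return x % y;
    if (x < 0) return (x + 1) % y + y - 1;
    return 0;
}

// Ranged form: shift into [0, max - min), wrap, then shift back.
// Output is in [min, max); pass max + 1 for inclusive behavior.
int iwrap_range(int x, int min, int max) {
    assert(min < max);
    return iwrap(x - min, max - min) + min;
}
```

For example, iwrap_range(10, 3, 8) gives 5 and iwrap_range(2, 3, 8) gives 7.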
Actually, since -1 % 4 returns -1 on every system I've ever been on, the simple mod solution doesn't work. I would try:
int range = kUpperBound - kLowerBound +1;
kx = ((kx - kLowerBound) % range) + range;
return (kx % range) + kLowerBound;
If kx - kLowerBound is non-negative, you mod, add range, and mod again, which undoes the add. If it is negative, the first mod leaves a negative value, adding range makes it positive, and the second mod then changes nothing.
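Wrapped up as a complete function, the double-mod idea might look like the sketch below. WrapTwoMods is a made-up name, and the bounds are inclusive as in the question.

```cpp
#include <cassert>

// Wrap kX into the inclusive range [kLowerBound, kUpperBound] with two % operations.
// % truncates toward zero, so the first remainder lies in (-range, range);
// adding range makes it positive and the second % brings it back below range.
int WrapTwoMods(int kX, int kLowerBound, int kUpperBound) {
    assert(kLowerBound <= kUpperBound);
    const int range = kUpperBound - kLowerBound + 1;
    const int shifted = (kX - kLowerBound) % range + range;
    return shifted % range + kLowerBound;
}
```

This reproduces the examples from the question: WrapTwoMods(-1, 0, 4) is 4 and WrapTwoMods(5, 0, 4) is 0.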
My other post got nasty, all that 'corrective' multiplication and division got out of hand. After looking at Martin Stettner's post, and at my own starting conditions of (N-L)%H+L, I came up with this:
int Wrap(int N, int L, int H){
H=H-L+1; N=(N-L)%H+L; if(N<L)N+=H; return N;
}
At the extreme negative end of the integer range it breaks as my other one would, but it will be faster, and is a lot easier to read, and avoids the other nastiness that crept in to it.
Crow.
I would suggest this solution:
int Wrap(int const kX, int const kLowerBound, int const kUpperBound)
{
int d = kUpperBound - kLowerBound + 1;
int offset = kX - kLowerBound; // position relative to the lower bound
return kLowerBound + (offset >= 0 ? offset % d : -offset % d ? d - (-offset % d) : 0);
}
The if-then-else logic of the ?: operator makes sure that both operands of % are nonnegative.
I would give an entry point for the most common case, lowerBound=0 and upperBound=N-1, and call that function in the general case. No mod computation is done when i is already in range. It assumes upper>=lower, or n>0.
int wrapN(int i,int n)
{
if (i<0) return (n-1)-(-1-i)%n; // -1-i is >=0
if (i>=n) return i%n;
return i; // In range, no mod
}
int wrapLU(int i,int lower,int upper)
{
return lower+wrapN(i-lower,1+upper-lower);
}
An answer that has some symmetry and also makes it obvious that when kX is in range, it is returned unmodified.
int Wrap(int const kX, int const kLowerBound, int const kUpperBound)
{
int range_size = kUpperBound - kLowerBound + 1;
if (kX < kLowerBound)
return kX + range_size * ((kLowerBound - kX - 1) / range_size + 1); // the -1 keeps exact multiples of range_size in range
if (kX > kUpperBound)
return kX - range_size * ((kX - kUpperBound - 1) / range_size + 1);
return kX;
}
I've faced this problem as well. This is my solution.
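#include <cmath> // std::fmod
// Primary template: floating-point remainder.
template <class T> T mod(const T &x, const T &y) {
return std::fmod((T)x, (T)y);
}
// Specialization for int; it must be declared after the primary template.
template <> int mod<int>(const int &x, const int &y) {
return x % y;
}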
template <class T> T wrap(const T &x, const T &max, const T &min = 0) {
if(max < min)
return x;
if(x > max)
return min + mod(x - min, max - min + 1);
if(x < min)
return max - mod(min - x, max - min + 1);
return x;
}
I don't know if it's good, but I thought I'd share, since I was directed here by a Google search on this problem and found the above solutions lacking for my needs. =)
In the special case where the lower bound is zero, this code avoids division, modulus and multiplication. The upper bound does not have to be a power of two. This code is overly verbose and looks bloated, but compiles into 3 instructions: subtract, shift (by constant), and 'and'.
#include <climits> // CHAR_BIT
#include <type_traits> // std::is_signed, std::make_signed, std::make_unsigned
// -------------------------------------------------------------- allBits
// sign extend a signed integer into an unsigned mask:
// return all zero bits (+0) if arg is positive,
// or all one bits (-0) for negative arg
template <typename SNum>
static inline auto allBits (SNum arg) {
static constexpr auto argBits = CHAR_BIT * sizeof( arg);
static_assert( argBits < 256, "allBits() sign extension may fail");
static_assert( std::is_signed< SNum>::value, "SNum must be signed");
typedef typename std::make_unsigned< SNum>::type UNum;
// signed shift required, but need unsigned result
const UNum mask = UNum( arg >> (argBits - 1));
return mask;
}
// -------------------------------------------------------------- boolWrap
// wrap reset a counter without conditionals:
// return arg >= limit? 0 : arg
template <typename UNum>
static inline auto boolWrap (const UNum arg, const UNum limit) {
static_assert( ! std::is_signed< UNum>::value, "UNum assumed unsigned");
typedef typename std::make_signed< UNum>::type SNum;
const SNum negX = SNum( arg) - SNum( limit);
const auto signX = allBits( negX); // +0 or -0
return arg & signX;
}
// example usage (both arguments must be unsigned to satisfy the static_assert):
for (unsigned j = 0; j < 15; ++j) {
std::cout << j << ' ' << boolWrap(j, 11u) << '\n';
}
For negative kX, you can add:
int temp = kUpperBound - kLowerBound + 1;
while (kX < 0) kX += temp;
return kX%temp + kLowerBound;
Why not use extension methods? (This one is C#.)
public static class IntExtensions
{
public static int Wrap(this int kX, int kLowerBound, int kUpperBound)
{
int range_size = kUpperBound - kLowerBound + 1;
if (kX < kLowerBound)
kX += range_size * ((kLowerBound - kX) / range_size + 1);
return kLowerBound + (kX - kLowerBound) % range_size;
}
}
Usage: currentInt = (++currentInt).Wrap(0, 2);