Clean, efficient algorithm for wrapping integers in C++

/**
* Returns a number between kLowerBound and kUpperBound
* e.g.: Wrap(-1, 0, 4); // Returns 4
* e.g.: Wrap(5, 0, 4); // Returns 0
*/
int Wrap(int const kX, int const kLowerBound, int const kUpperBound)
{
// Suggest an implementation?
}

The sign of a % b is only well defined when a and b are both non-negative: before C++11 the sign with a negative operand is implementation-defined, and since C++11 the result takes the sign of the dividend, so it can be negative.
int Wrap(int kX, int const kLowerBound, int const kUpperBound)
{
int range_size = kUpperBound - kLowerBound + 1;
if (kX < kLowerBound)
kX += range_size * ((kLowerBound - kX) / range_size + 1);
return kLowerBound + (kX - kLowerBound) % range_size;
}

The following should work independently of the implementation of the mod operator:
int range = kUpperBound - kLowerBound + 1;
kx = ((kx-kLowerBound) % range);
if (kx<0)
return kUpperBound + 1 + kx;
else
return kLowerBound + kx;
An advantage over other solutions is that it uses only a single % (i.e. a single division), which makes it pretty efficient.
Note (Off Topic):
It's a good example of why it is sometimes wise to define intervals with the upper bound being the first element not in the range (such as for STL iterators...). In that case, both "+1" terms would vanish.
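A minimal sketch of that half-open variant (illustrative only; the range is [lower, upper), so upper is the first value outside it, and lower < upper is assumed):
int WrapHalfOpen(int x, int lower, int upper) // result in [lower, upper)
{
    int range = upper - lower;  // no "+1" needed with a half-open interval
    int r = (x - lower) % range;
    if (r < 0)
        r += range;             // pull a negative remainder back into the range
    return lower + r;
}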

Fastest solution, least flexible: Take advantage of native datatypes that will do wrapping in the hardware.
The absolute fastest method for wrapping integers is to make sure your data is scaled to int8/int16/int32 or whatever native datatype. Then when you need your data to wrap, the wrapping is done in hardware! Very painless and orders of magnitude faster than any software wrapping implementation seen here.
As an example case study:
I have found this to be very useful when I need a fast sin/cos implemented using a look-up table. Basically you scale your data such that INT16_MAX is pi and INT16_MIN is -pi. Then you are set to go (see the sketch below).
As a side note, scaling your data will add some up-front computation cost that usually looks something like:
int fixedPoint = (int)( floatingPoint * SCALING_FACTOR + 0.5 );
Feel free to exchange int for something else you want like int8_t / int16_t / int32_t.
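A minimal sketch of the wrap-in-hardware idea (illustrative names; it uses an unsigned 16-bit accumulator, since unsigned wrap-around is well defined in C/C++):
#include <cstdint>

uint16_t phase = 0; // one full period (-pi..pi) maps to 0..65535

uint16_t advance_phase(uint16_t step)
{
    phase = (uint16_t)(phase + step); // wraps modulo 2^16 for free in hardware
    return phase;                     // use as an index into the sin/cos table
}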
Next fastest solution, more flexible: The mod operation is slow; if at all possible, try to use bit masks instead!
Most of the solutions I skimmed are functionally correct... but they are dependent on the mod operation.
The mod operation is very slow because it is essentially doing a hardware division. The layman's explanation of why mod and division are slow is to equate the division operation to the pseudo-code for(quotient = 0; inputNum > 0; inputNum -= divisor) { quotient++; } (see the definitions of quotient and divisor). As you can see, hardware division can be fast if the dividend is small relative to the divisor... but it can also be horribly slow if it is much greater than the divisor.
If you can scale your data to a power of two then you can use a bit mask which will execute in one cycle ( on 99% of all platforms ) and your speed improvement will be approximately one order of magnitude ( at the very least 2 or 3 times faster ).
C code to implement wrapping:
#define BIT_MASK (0xFFFF)
int wrappedAddition(int a, int b) {
return ( a + b ) & BIT_MASK;
}
int wrappedSubtraction(int a, int b) {
return ( a - b ) & BIT_MASK;
}
Feel free to make the #define something that is set at run time, and feel free to adjust the bit mask to whatever power of two (minus one) you need, like 0xFFFFFFFF or whichever power of two you decide on implementing.
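A sketch of the run-time variant (the mask must still be one less than a power of two):
int wrappedAdditionRT(int a, int b, unsigned int mask) {
    /* mask = rangeSize - 1, e.g. 0xFFFF for a range of 65536 */
    return ( a + b ) & (int)mask;
}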
p.s. I strongly suggest reading about fixed point processing when messing with wrapping/overflow conditions. I suggest reading:
Fixed-Point Arithmetic: An Introduction by Randy Yates August 23, 2007

Please do not overlook this post. :)
Is this any good?
int Wrap(int N, int L, int H){
H=H-L+1; return (N-L+(N<L)*H)%H+L;
}
This works for negative inputs, and all arguments can be negative so long as L is less than H.
Background... (Note that H here is the reused variable, set to original H-L+1).
I had been using (N-L)%H+L when incrementing, but unlike in Lua, which I used before starting to learn C a few months back, this would NOT work if I used inputs below the lower bound, never mind negative inputs. (Lua is built in C, but I don't know what it's doing, and it likely wouldn't be fast...)
I decided to add +(N<L)*H to make (N-L+(N<L)*H)%H+L, as C seems to be defined such that true=1 and false=0. It works well enough for me, and seems to answer the original question neatly. If anyone knows how to do it without the MOD operator % to make it dazzlingly fast, please do it. I don't need speed right now, but some time I will, no doubt.
EDIT:
That function fails if N is lower than L by more than H-L+1 but this doesn't:
int Wrap(int N, int L, int H){
H=H-L+1; return (N-L+(N<L)*((L-N)/H+1)*H)%H+L;
}
I think it would break at the negative extreme of the integer range in any system, but should work for most practical situations. It adds an extra multiplication and a division, but is still fairly compact.
(This edit is just for completion, because I came up with a much better way, in a newer post in this thread.)
Crow.

Personally I've found solutions to these types of functions to be cleaner if range is exclusive and divisor is restricted to positive values.
int ifloordiv(int x, int y)
{
if (x > 0)
return x / y;
if (x < 0)
return (x + 1) / y - 1;
return 0;
}
int iwrap(int x, int y)
{
return x - y * ifloordiv(x, y);
}
Integrated.
int iwrap(int x, int y)
{
if (x > 0)
return x % y;
if (x < 0)
return (x + 1) % y + y - 1;
return 0;
}
Same family. Why not?
int ireflect(int x, int y)
{
int z = iwrap(x, y*2);
if (z < y)
return z;
return y*2-1 - z;
}
int ibandy(int x, int y)
{
if (y != 1)
return ireflect(abs(x + x / (y - 1)), y);
return 0;
}
Ranged functionality can be implemented for all of these functions with:
// output is in the range [min, max).
int func2(int x, int min, int max)
{
// increment max for inclusive behavior.
assert(min < max);
return func(x - min, max - min) + min;
}
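For example, a concrete instance of that pattern using iwrap from above (a sketch; assert needs <cassert>):
// output is in the range [min, max).
int iwrap_range(int x, int min, int max)
{
    assert(min < max);
    return iwrap(x - min, max - min) + min;
}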

Actually, since -1 % 4 returns -1 on every system I've ever been on, the simple mod solution doesn't work. I would try:
int range = kUpperBound - kLowerBound +1;
kx = ((kx - kLowerBound) % range) + range;
return (kx % range) + kLowerBound;
If kx is positive, you take the mod, add range, and mod again, which undoes the addition. If kx is negative, you take the mod, add range (which makes it positive), then mod again, which leaves the value unchanged except for reducing an exact range back to zero.
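Packaged as a complete function (a sketch mirroring the question's signature):
int Wrap(int kX, int const kLowerBound, int const kUpperBound)
{
    int range = kUpperBound - kLowerBound + 1;
    kX = ((kX - kLowerBound) % range) + range; // the first % may be negative; adding range makes it positive
    return (kX % range) + kLowerBound;         // the second % strips the extra range when it was not needed
}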

My other post got nasty, all that 'corrective' multiplication and division got out of hand. After looking at Martin Stettner's post, and at my own starting conditions of (N-L)%H+L, I came up with this:
int Wrap(int N, int L, int H){
H=H-L+1; N=(N-L)%H+L; if(N<L)N+=H; return N;
}
At the extreme negative end of the integer range it breaks as my other one would, but it will be faster, is a lot easier to read, and avoids the nastiness that crept into the other one.
Crow.

I would suggest this solution:
int Wrap(int const kX, int const kLowerBound, int const kUpperBound)
{
int d = kUpperBound - kLowerBound + 1;
int x = kX - kLowerBound; // work relative to the lower bound
return kLowerBound + (x >= 0 ? x % d : -x % d ? d - (-x % d) : 0);
}
The if-then-else logic of the ?: operator makes sure that both operands of % are nonnegative.

I would give an entry point for the most common case, lowerBound=0, upperBound=N-1, and call that function from the general case. No mod computation is done when i is already in range. It assumes upper >= lower, i.e. n > 0.
int wrapN(int i,int n)
{
if (i<0) return (n-1)-(-1-i)%n; // -1-i is >=0
if (i>=n) return i%n;
return i; // In range, no mod
}
int wrapLU(int i,int lower,int upper)
{
return lower+wrapN(i-lower,1+upper-lower);
}

An answer that has some symmetry and also makes it obvious that when kX is in range, it is returned unmodified.
int Wrap(int const kX, int const kLowerBound, int const kUpperBound)
{
int range_size = kUpperBound - kLowerBound + 1;
if (kX < kLowerBound)
return kX + range_size * ((kLowerBound - kX) / range_size + 1);
if (kX > kUpperBound)
return kX - range_size * ((kX - kUpperBound) / range_size + 1);
return kX;
}

I've faced this problem as well. This is my solution.
template <class T> T mod(const T &x, const T &y) {
return ::fmod((T)x, (T)y); // needs <cmath>; the primary template must precede its specialization
}
template <> int mod(const int &x, const int &y) {
return x % y;
}
template <class T> T wrap(const T &x, const T &max, const T &min = 0) {
if(max < min)
return x;
if(x > max)
return min + mod(x - min, max - min + 1);
if(x < min)
return max - mod(min - x, max - min + 1);
return x;
}
I don't know if it's good, but I'd thought I'd share since I got directed here when doing a Google search on this problem and found the above solutions lacking to my needs. =)

In the special case where the lower bound is zero, this code avoids division, modulus and multiplication. The upper bound does not have to be a power of two. This code is overly verbose and looks bloated, but compiles into 3 instructions: subtract, shift (by constant), and 'and'.
#include <climits> // CHAR_BIT
#include <type_traits> // std::is_signed, std::make_unsigned, std::make_signed
// -------------------------------------------------------------- allBits
// sign extend a signed integer into an unsigned mask:
// return all zero bits (+0) if arg is positive,
// or all one bits (-0) for negative arg
template <typename SNum>
static inline auto allBits (SNum arg) {
static constexpr auto argBits = CHAR_BIT * sizeof( arg);
static_assert( argBits < 256, "allBits() sign extension may fail");
static_assert( std::is_signed< SNum>::value, "SNum must be signed");
typedef typename std::make_unsigned< SNum>::type UNum;
// signed shift required, but need unsigned result
const UNum mask = UNum( arg >> (argBits - 1));
return mask;
}
// -------------------------------------------------------------- boolWrap
// wrap reset a counter without conditionals:
// return arg >= limit? 0 : arg
template <typename UNum>
static inline auto boolWrap (const UNum arg, const UNum limit) {
static_assert( ! std::is_signed< UNum>::value, "UNum assumed unsigned");
typedef typename std::make_signed< UNum>::type SNum;
const SNum negX = SNum( arg) - SNum( limit);
const auto signX = allBits( negX); // +0 or -0
return arg & signX;
}
// example usage (needs <iostream>; boolWrap() deduces an unsigned type, so pass unsigned arguments):
for (unsigned j = 0; j < 15; ++j) {
std::cout << j << ' ' << boolWrap( j, 11u) << '\n';
}

For negative kX, you can add:
int temp = kUpperBound - kLowerBound + 1;
while (kX < 0) kX += temp;
return kX%temp + kLowerBound;

Why not use extension methods?
public static class IntExtensions
{
public static int Wrap(this int kX, int kLowerBound, int kUpperBound)
{
int range_size = kUpperBound - kLowerBound + 1;
if (kX < kLowerBound)
kX += range_size * ((kLowerBound - kX) / range_size + 1);
return kLowerBound + (kX - kLowerBound) % range_size;
}
}
Usage: currentInt = (++currentInt).Wrap(0, 2);

Related

Division with negative dividend, but rounded towards negative infinity?

Consider the following code (in C++11):
int a = -11, b = 3;
int c = a / b;
// now c == -3
C++11 specification says that division with a negative dividend is rounded toward zero.
It is quite useful for there to be an operator or function that does division with rounding toward negative infinity (e.g. for consistency with positive dividends when iterating over a range), so is there a function or operator in the standard library that does what I want? Or perhaps a compiler-defined function/intrinsic that does it in modern compilers?
I could write my own, such as the following (works only for positive divisors):
int div_neg(int dividend, int divisor){
if(dividend >= 0) return dividend / divisor;
else return (dividend - divisor + 1) / divisor;
}
But it would not be as descriptive of my intent, and would possibly not be as optimized as a standard library function or compiler intrinsic (if one exists).
I'm not aware of any intrinsics for it. I would simply apply a correction to standard division retrospectively.
int div_floor(int a, int b)
{
int res = a / b;
int rem = a % b;
// Correct division result downwards if up-rounding happened,
// (for non-zero remainder of sign different than the divisor).
int corr = (rem != 0 && ((rem < 0) != (b < 0)));
return res - corr;
}
Note that it also works for pre-C99 and pre-C++11, i.e. before the standardization of rounding division towards zero.
Here's another possible variant, valid for positive divisors and arbitrary dividends.
int div_floor(int n, int d) {
return n >= 0 ? n / d : -1 - (-1 - n) / d;
}
Explanation: in the case of negative n, write q for (-1 - n) / d, then -1 - n = qd + r for some r satisfying 0 <= r < d. Rearranging gives n = (-1 - q)d + (d - 1 - r). It's clear that 0 <= d - 1 - r < d, so d - 1 - r is the remainder of the floor division operation, and -1 - q is the quotient.
Note that the arithmetic operations here are all safe from overflow, regardless of the internal representation of signed integers (two's complement, ones' complement, sign-magnitude).
Assuming two's complement representation for signed integers, a good compiler should optimise the two -1-* operations to bitwise negation operations. On my x86-64 machine, the second branch of the conditional gets compiled to the following sequence:
notl %edi
movl %edi, %eax
cltd
idivl %esi
notl %eax
The standard library has only one function that can be used to do what you want: floor. The division you're after can be expressed as floor((double) n / d). However, this assumes that double has enough precision to represent both n and d exactly. If not, then this may introduce rounding errors.
Personally, I'd go with a custom implementation. But you can use the floating point version too, if that's easier to read and you've verified that the results are correct for the ranges you're calling it for.
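A sketch of that floating-point fallback (only valid while double represents n and d exactly):
#include <cmath>

int div_floor_fp(int n, int d) {
    return (int)std::floor((double)n / d); // beware precision for very large operands
}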
C++11 has std::div(a, b), which returns both a % b and a / b in a struct with rem and quot members (so both the remainder and the quotient primitives), and for which modern processors have a single instruction. C++11 does truncated division.
To do floored division for both the remainder and the quotient, you can write:
// http://stackoverflow.com/a/4609795/819272
auto signum(int n) noexcept
{
return static_cast<int>(0 < n) - static_cast<int>(n < 0);
}
auto floored_div(int D, int d) // Throws: Nothing.
{
assert(d != 0);
auto const divT = std::div(D, d);
auto const I = signum(divT.rem) == -signum(d) ? 1 : 0;
auto const qF = divT.quot - I;
auto const rF = divT.rem + I * d;
assert(D == d * qF + rF);
assert(abs(rF) < abs(d));
assert(signum(rF) == signum(d));
return std::div_t{qF, rF};
}
Finally, it's handy to also have Euclidean division (for which the remainder is always non-negative) in your own library:
auto euclidean_div(int D, int d) // Throws: Nothing.
{
assert(d != 0);
auto const divT = std::div(D, d);
auto const I = divT.rem >= 0 ? 0 : (d > 0 ? 1 : -1);
auto const qE = divT.quot - I;
auto const rE = divT.rem + I * d;
assert(D == d * qE + rE);
assert(abs(rE) < abs(d));
assert(signum(rE) != -1);
return std::div_t{qE, rE};
}
There is a Microsoft research paper discussing the pros and cons of the 3 versions.
When the operands are both positive, the / operator does floored division.
When the operands are both negative, the / operator does floored division.
When exactly one of the operands is negative, the / operator does ceiling division.
For the last case, the quotient can be adjusted when exactly one operand is negative and there is no remainder (with no remainder, floored division and ceiling division work the same).
int floored_div(int numer, int denom) {
int div = numer / denom;
int n_negatives = (numer < 0) + (denom < 0);
div -= (n_negatives == 1) && (numer % denom != 0);
return div;
}

Optimizing Fixed-Point Sqrt

I made what I think is a good fixed-point square root algorithm:
template<int64_t M, int64_t P>
typename enable_if<M + P == 32, FixedPoint<M, P>>::type sqrt(FixedPoint<M, P> f)
{
if (f.num == 0)
return 0;
//Reduce it to the 1/2 to 2 range (based around FixedPoint<2, 30> to avoid left/right shift branching)
int64_t num{ f.num }, faux_half{ 1 << 29 };
ptrdiff_t mag{ 0 };
while (num < (faux_half)) {
num <<= 2;
++mag;
}
int64_t res = (M % 2 == 0 ? SQRT_32_EVEN_LOOKUP : SQRT_32_ODD_LOOKUP)[(num >> (30 - 4)) - (1LL << 3)];
res >>= M / 2 + mag - 1; //Finish making an excellent guess
for (int i = 0; i < 2; ++i)
// \ | /
// \ | /
// _| V L
res = (res + (int64_t(f.num) << P) / res) >> 1; //Use Newton's method to improve greatly on guess
// 7 A r
// / | \
// / | \
// The Infamous Time Eater
return FixedPoint<M, P>(res, true);
}
However, after profiling (in release mode) I found out that the division here takes up 83% of the time this algorithm spends. I can speed it up 6x by replacing the division with multiplication, but that's just wrong. I found out that integer division is much slower than multiplication, unfortunately. Is there any way to optimize this?
In case this table is necessary.
const array<int32_t, 24> SQRT_32_EVEN_LOOKUP = {
0x2d413ccd, //magic numbers calculated by taking sqrt(0.5 + 0.5 * i / 8) for i = 0 to 23, multiplying by 2^30, and converting to hex
0x30000000,
0x3298b076,
0x3510e528,
0x376cf5d1,
0x39b05689,
0x3bddd423,
0x3df7bd63,
0x40000000,
0x41f83d9b,
0x43e1db33,
0x45be0cd2,
0x478dde6e,
0x49523ae4,
0x4b0bf165,
0x4cbbb9d6,
0x4e623850,
0x50000000,
0x5195957c,
0x532370b9,
0x54a9fea7,
0x5629a293,
0x57a2b749,
0x59159016
};
SQRT_32_ODD_LOOKUP is just SQRT_32_EVEN_LOOKUP divided by sqrt(2).
Reinventing the wheel, really, and not in a good way. The correct solution is to calculate 1/sqrt(x) using Newton-Raphson (NR), and then multiply once to get x * (1/sqrt(x)) = sqrt(x); just check for x == 0 up front.
The reason this is so much better is that the NR step for y = 1/sqrt(x) is just y = (3 - x*y*y)*y/2. That's all straightforward multiplication.
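A minimal floating-point sketch of that iteration (a fixed-point port would replace these with scaled integer multiplies, and the initial guess would come from a small table):
#include <cstdio>

// sqrt via the reciprocal-square-root Newton-Raphson step: no division in the loop
double sqrt_via_rsqrt(double x, double y /* initial guess for 1/sqrt(x) */, int iters)
{
    if (x == 0.0) return 0.0;            // guard the x == 0 case up front
    for (int i = 0; i < iters; ++i)
        y = (3.0 - x * y * y) * y * 0.5; // NR step: multiplies only
    return x * y;                        // sqrt(x) = x * (1/sqrt(x))
}

int main()
{
    // a crude guess of 0.1 for 1/sqrt(90) converges in a few iterations
    std::printf("%f\n", sqrt_via_rsqrt(90.0, 0.1, 4)); // ~9.486833
    return 0;
}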

Recursive algorithm for cos taylor series expansion c++

I recently wrote a Computer Science exam where they asked us to give a recursive definition for the cos Taylor series expansion. This is the series
cos(x) = 1 - x^2/2! + x^4/4! - x^6/6! ...
and the function signature looks as follows
float cos(int n , float x)
where n represents the number of terms in the series the user would like to calculate up to, and x represents the value of x in the cos function.
I obviously did not get that question correct, and I have been trying to figure it out for the past few days, but I have hit a brick wall.
Would anyone be able to help get me started?
All answers so far recompute the factorial every time. I surely wouldn't do that. Instead you can write:
float cos(int n, float x)
{
if (n > MAX)
return 1;
return 1 - x*x / ((2 * n - 1) * (2 * n)) * cos(n + 1, x);
}
Consider that cos(n, x) returns the following nested product:
cos(n, x) = 1 - x^2/((2n-1)(2n)) * (1 - x^2/((2n+1)(2n+2)) * (1 - ...))
You can see that this is true for n > MAX, n = MAX, and so on. The alternating sign and the powers of x are easy to see.
Finally, at n=1 you get 0! = 1, so calling cos(1, x) gets you the first MAX terms of the Taylor expansion of cos.
By expanding (easier to see when there are only a few terms), you can see the first formula is equivalent to the following recurrence:
cos(n-1, x) = 1 - x^2/((2n-3)(2n-2)) * cos(n, x)
That is, for n > 0, cos(n-1, x) takes the previous result, divides it by (2n-3)(2n-2), multiplies by x², and subtracts that from 1. You can see that cos(n, x) is 1 when n = MAX+1, is 1 - x²/((2MAX-1)·2MAX) when n = MAX, and so on.
If you allow yourself helper functions, then you should change the signature of the above to float cos_helper(int n, float x, int MAX) and call it like so :
float cos(int n, float x) { return cos_helper(1, x, n); }
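Spelled out, that helper would look something like this (a sketch, with MAX passed explicitly instead of being a constant):
float cos_helper(int n, float x, int MAX)
{
    if (n > MAX)
        return 1;
    return 1 - x*x / ((2 * n - 1) * (2 * n)) * cos_helper(n + 1, x, MAX);
}

float cos(int n, float x) { return cos_helper(1, x, n); }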
Edit : To reverse the meaning of n from degree of the evaluated term (as in this answer so far) to number of terms (as in the question, and below), but still not recompute the total factorial every time, I would suggest using a two-term relation.
Let us define trivially cos(0,x) = 0 and cos(1,x) = 1, and aim for cos(n,x) to be, in general, the sum of the first n terms of the Taylor series.
Then for each n > 0, we can write cos(n,x) from cos(n-1,x):
cos(n,x) = cos(n-1,x) + (-1)^(n-1) * x^(2n-2) / (2n-2)!
Now, for n > 1, we try to make the last term of cos(n-1,x) appear (because it is the closest term to the one we want to add); the new term is the previous one multiplied by -x² / ((2n-2)(2n-3)):
cos(n,x) = cos(n-1,x) - x² / ((2n-2)(2n-3)) * ( (-1)^(n-2) * x^(2n-4) / (2n-4)! )
The last term of cos(n-1,x) is exactly cos(n-1,x) - cos(n-2,x), so by combining this with the previous formula:
cos(n,x) = cos(n-1,x) - x² / ((2n-2)(2n-3)) * ( cos(n-1,x) - cos(n-2,x) )
We now have a purely recursive definition of cos(n,x), without helper function, without recomputing the factorial, and with n the number of terms in the sum of the Taylor decomposition.
However, I must stress that the following code will perform terribly:
performance-wise, unless some optimization allows it not to re-evaluate a cos(n-1,x) that was already evaluated at the previous step as cos((n-1)-1, x); without such memoization the double recursion is exponential in n
precision-wise, because of cancellation effects: the precision with which we get x^(2n-2) / (2n-2)! is very bad
Now this disclaimer is in place, here comes the code :
float cos(int n, float x)
{
if (n < 2)
return n;
float c = x * x / ((2 * n - 2) * (2 * n - 3));
return (1-c) * cos(n-1, x) + c * cos(n-2, x);
}
cos(x) = 1 - x^2/2! + x^4/4! - x^6/6! + x^8/8! - ...
= 1 - x^2/(1*2) * (1 - x^2/(3*4) + x^4/(3*4*5*6) - x^6/(3*4*5*6*7*8))
= 1 - x^2/(1*2) * (1 - x^2/(3*4) * (1 - x^2/(5*6) + x^4/(5*6*7*8)))
= 1 - x^2/(1*2) * (1 - x^2/(3*4) * (1 - x^2/(5*6) * (1 - x^2/(7*8))))
double cos_series_recursion(double x, int n, double r=1){
if(n>0){
r=1-((x*x*r)/(n*(n-1)));
return cos_series_recursion(x,n-2,r);
}else return r;
}
A simple approach that makes use of static variables (note that p and f persist between top-level calls, so this only works correctly for a single call):
double cos(double x, int n) {
static double p = 1, f = 1;
double r;
if(n == 0)
return 1;
r = cos(x, n-1);
p = (p*x)*x;
f = f*(2*n-1)*2*n;
if(n%2==0) {
return r+p/f;
} else {
return r-p/f;
}
}
Notice that I'm multiplying 2*n in the operation to get the next factorial.
Having n align to the factorial we need makes this easy to do in 2 operations: f = f * (n - 1) then f = f * n.
when n = 1, we need 2!
when n = 2, we need 4!
when n = 3, we need 6!
So we can safely double n and work from there. We could write:
n = 2*n;
f = f*(n-1);
f = f*n;
If we did this, we would need to update our even/odd check to if((n/2)%2==0) since we're doubling the value of n.
This can instead be written as f = f*(2*n-1)*2*n; and now we don't have to divide n when checking if it's even/odd, since n is not being altered.
You can use a loop or recursion, but I would recommend a loop. Anyway, if you must use recursion you could use something like the code below
#include <iostream>
#include <cmath> // pow
using namespace std;
int fact(int n) {
if (n <= 1) return 1;
else return n*fact(n-1);
}
float Cos(int n, float x) {
if (n == 0) return 1;
return Cos(n-1, x) + (n%2 ? -1 : 1) * pow (x, 2*n) / (fact(2*n));
}
int main()
{
cout << Cos(6, 3.14/6);
}
Just do it like the sum.
The parameter n in float cos(int n , float x) plays the role of the summation index, and now just do it...
Some pseudocode:
float cos(int n , float x)
{
//the sum-part
float sum = pow(-1, n) * (pow(x, 2*n))/factorial(2*n);
if(n <= /*Some predefined maximum*/)
return sum + cos(n + 1, x);
return sum;
}
The usual technique when you want to recurse but the function arguments don't carry the information that you need, is to introduce a helper function to do the recursion.
I have the impression that in the Lisp world the convention is to name such a function something-aux (short for auxiliary), but that may have been just a limited group in the old days.
Anyway, the main problem here is that n represents the natural ending point for the recursion, the base case, and you then also need some index that works its way up to n. So that's one good candidate for an extra argument to the auxiliary function. Another candidate stems from considering how one term of the series relates to the previous one.
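A minimal sketch of that idea (cos_aux and its parameter names are illustrative; k is the index working up to n, and each term is derived from the previous one):
float cos_aux(int k, int n, float x, float term, float acc)
{
    if (k >= n)
        return acc;
    // term k relates to term k-1 by a factor of -x^2 / ((2k-1)(2k))
    float next = -term * x * x / ((2 * k - 1) * (2 * k));
    return cos_aux(k + 1, n, x, next, acc + next);
}

float cos(int n, float x)
{
    if (n <= 0) return 0;
    return cos_aux(1, n, x, 1.0f, 1.0f); // one term (the constant 1) accumulated so far
}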

Signed Char ATAN2 and ATAN approximations

Basically, I've been trying to make two approximation functions. In both cases I input the "x" and the "y" components (to deal with those nasty n/0 and 0/0 conditions), and need to get a Signed Char output. In ATAN2's case, it should provide a range of +/-PI, and in ATAN's case, the range should be +/- PI/2.
I spent the entire of yesterday trying to wrap my head around it. After playing around in excel to find an overall good algorithm based on the approximation:
X * (PI/4 + 0.273 * (1 - |X|)) * 128/PI // Scale factor at end to switch to char format
I came up with the following code:
signed char nabsSC(signed char x)
{
if(x > 0)
return -x;
return x;
}
signed char signSC(signed char input, signed char ifZero = 0, signed char scaleFactor = 1)
{
if(input > 0)
{return scaleFactor;}
else if(input < 0)
{return -scaleFactor;}
else
{return ifZero;}
}
signed char divisionSC(signed char numerator, signed char denominator)
{
if(denominator == 0) // Error Condition
{return 0;}
else
{return numerator/denominator;}
}
//#######################################################################################
signed char atan2SC(signed char y, signed char x)
{
// #todo make clearer : the code was deduced through trial and error in excel with brute force... not the best reasoning in the world but hey ho
if((x == y) && (x == 0)) // Error Condition
{return 0;}
// Prepare for algorithm Choice
const signed char X = abs(x);
signed char Y = abs(y);
if(Y > 2)
{Y = (Y << 1) + 4;}
const signed char alpha1 = 43;
const signed char alpha2 = 11;
// Make Choice
if(X <= Y) // x/y Path
{
const signed char beta = 64;
const signed char a = divisionSC(x,y); // x/y
const signed char A = nabsSC(a); // -|x/y|
const signed char temp = a * (alpha1 + alpha2 * A); // (x/y) * (32 + ((0.273 * 128) / PI) * (1 - |x/y|)))
// Small angle approximation of ARCTAN(X)
if(y < 0) // Determine Quadrant
{return -(temp + beta);}
else
{return -(temp - beta);}
}
else // y/x Path
{
const signed char a = divisionSC(y,x); // y/x
const signed char A = nabsSC(a); // -|y/x|
const signed char temp = a * (alpha1 + alpha2 * A); // (y/x) * (32 + ((0.273 * 128) / PI) * (1 - |y/x|)))
// Small angle approximation of ARCTAN(X)
if(x < 0) // Determine Quadrant
{
Y = signSC(y, -127, 127); // Sign(y)*127, if undefined: use -127
return temp + Y;
}
else
{return temp;}
}
}
Much to my despair, the implementation has errors as large as 180 degrees, and pretty much everywhere in between as well. (I compared it to the ATAN2F from the library after converting to signed char format.)
I got the general gist from this website: http://geekshavefeelings.com/posts/fixed-point-atan2
Can anybody tell me where I'm going wrong? And how I should approach the ATAN variant (which should be more precise as it's looking over half the range) without all this craziness.
I'm currently using QT creator 4.8.1 on windows. The end platform for this specific bit of code will eventually be a micro-controller without an FPU, and the ATAN functions will be one of the primary functions used. As such, efficiency with reasonable error (+/-2 degrees for ATAN2 and +/-1 degree for ATAN. These are guesstimates for now, so I might increase the range, however, 90 degrees is definitely not acceptable!) is the aim of the game.
Thanks in advance for any and all help!
EDIT:
Just to clarify, the outputs of ATAN2 and ATAN output to a signed char value, but the ranges of the two types are different ranges.
ATAN2 shall have a range from -128 (-PI) to 127 (+PI - PI/128).
ATAN will have a range from -128 (-PI/2) to 127 (+PI/2 - PI/256).
As such the output values from the two can be considered to be two different data types.
Sorry for any confusion.
EDIT2: Converted implicit int numbers explicitly into signed char constants.
An outline follows. Below is additional information.
The result angle (a Binary Angle Measure) exactly divides the unit circle into 8 wedges. Assuming a -128 to 127 char: for atan2SC() the result within each octant spans 33 integers, 0 to 32 plus an offset (0 to 32, rather than 0 to 31, due to rounding); for atanSC(), the result is 0 to 64. So just focus on calculating the result of 1 primary octant with x,y inputs and a 0 to 64 result. atan2SC() and atanSC() can both use this helper function at2(). For atan2SC(), to find the intermediate angle a, use a = at2(x,y)/2. For atanSC(), use a = at2(-128, y).
Finding the integer quotient with a = divisionSC(x,y) and then a * (43 + 11 * A) loses too much information in the division. Need to find the atan2 approximation with an equation that uses x,y maybe in the form at2 = (a*y*y + b*y)/(c*x*x + d*x).
It is good to use the negative absolute value, as with nabsSC(): the negative range of integers meets or exceeds the positive range, e.g. -128 to -1 vs 1 to 127. Use negative numbers and 0 when calling at2().
[Edit]
Below is code with a simplified octant selection algorithm. It is carefully constructed to ensure that any negation of x,y stays within the SCHAR_MIN..SCHAR_MAX range - assuming 2's complement. All octants call iat2(), and that is where improvements can be made to improve precision. Note: iat2()'s division by x == 0 is prevented, as x is never 0 at that point. Depending on the rounding mode, and on whether this helper function is shared with atanSC(), its details will vary. I suggest a 2-piece-wise linear table if wide integer math is not available, else a linear (a*y + b)/(c*x + d). I may play with this more.
The weight of precision vs. performance is a crucial one for OP's code, but it was not passed along well enough for me to derive an optimal answer. So I've posted a test driver below that assesses the precision of whatever detail of iat2() OP comes up with.
3 pitfalls exist. 1) When the answer is to be +180 degrees, OP appears to want -128 BAM. But atan2(-1, 0.0) comes up with +pi. This sign reversal may be an issue. Note: atan2(-1, -0.0) --> -pi. 2) When an answer is just slightly less than +180 degrees, depending on iat2() details, the integer BAM result is +128, which tends to wrap to -128. The atan2() result is just less than +pi, or +128 BAM. This edge condition needs review in OP's final code. 3) The (x=0, y=0) case needs special handling. The octant selection code finds it.
Code for a signed char atanSC(signed char x), if it needs to be fast, could use a few if()s and a 64-byte look-up table (assuming an 8-bit signed char). This same table could be used in iat2().
#include <stdio.h>
#include <stdlib.h>
#include <limits.h> // INT_MAX, INT_MIN used in the test driver
// -x > -y >= 0, so divide by 0 not possible
static signed char iat2(signed char y, signed char x) {
// printf("x=%4d y=%4d\n", x, y); fflush(stdout);
return ((y*32+(x/2))/x)*2; // 3.39 mxdiff
// return ((y*64+(x/2))/x); // 3.65 mxdiff
// return (y*64)/x; // 3.88 mxdiff
}
signed char iatan2sc(signed char y, signed char x) {
// determine octant
if (y >= 0) { // oct 0,1,2,3
if (x >= 0) { // oct 0,1
if (x > y) {
return iat2(-y, -x)/2 + 0*32;
} else {
if (y == 0) return 0; // (x=0,y=0)
return -iat2(-x, -y)/2 + 2*32;
}
} else { // oct 2,3
// if (-x <= y) {
if (x >= -y) {
return iat2(x, -y)/2 + 2*32;
} else {
return -iat2(-y, x)/2 + 4*32;
}
}
} else { // oct 4,5,6,7
if (x < 0) { // oct 4,5
// if (-x > -y) {
if (x < y) {
return iat2(y, x)/2 + -4*32;
} else {
return -iat2(x, y)/2 + -2*32;
}
} else { // oct 6,7
// if (x <= -y) {
if (-x >= y) {
return iat2(-x, y)/2 + -2*32;
} else {
return -iat2(y, -x)/2 + -0*32;
}
}
}
}
#include <math.h>
static void test_iatan2sc(signed char y, signed char x) {
static int mn=INT_MAX;
static int mx=INT_MIN;
static double mxdiff = 0;
signed char i = iatan2sc(y,x);
static const double Pi = 3.1415926535897932384626433832795;
double a = atan2(y ? y : -0.0, x) * 256/(2*Pi);
if (i < mn) {
mn = i;
printf ("x=%4d,y=%4d --> %4d %f, mn %d mx %d mxdiff %f\n",
x,y,i,a,mn,mx,mxdiff);
}
if (i > mx) {
mx = i;
printf ("x=%4d,y=%4d --> %4d %f, mn %d mx %d mxdiff %f\n",
x,y,i,a,mn,mx,mxdiff);
}
double diff = fabs(i - a);
if (diff > 128) diff = fabs(diff - 256);
if (diff > mxdiff) {
mxdiff = diff;
printf ("x=%4d,y=%4d --> %4d %f, mn %d mx %d mxdiff %f\n",
x,y,i,a,mn,mx,mxdiff);
}
}
int main(void) {
int x,y;
int n = 127;
for (y = -n-1; y <= n; y++) {
for (x = -n-1; x <= n; x++) {
test_iatan2sc(y,x);
}
}
puts("Done");
return 0;
}
BTW: a fun problem.

Fast ceiling of an integer division in C / C++

Given integer values x and y, C and C++ both return as the quotient q = x/y the result truncated toward zero (which equals the floor of the floating point equivalent for non-negative operands). I'm interested in a method of returning the ceiling instead. For example, ceil(10/5)=2 and ceil(11/5)=3.
The obvious approach involves something like:
q = x / y;
if (q * y < x) ++q;
This requires an extra comparison and multiplication; and other methods I've seen (used in fact) involve casting as a float or double. Is there a more direct method that avoids the additional multiplication (or a second division) and branch, and that also avoids casting as a floating point number?
For positive numbers where you want to find the ceiling (q) of x when divided by y.
unsigned int x, y, q;
To round up ...
q = (x + y - 1) / y;
or (avoiding overflow in x+y)
q = 1 + ((x - 1) / y); // if x != 0
For positive numbers:
q = x/y + (x % y != 0);
Sparky's answer is one standard way to solve this problem, but as I also wrote in my comment, you run the risk of overflows. This can be solved by using a wider type, but what if you want to divide long longs?
Nathan Ernst's answer provides one solution, but it involves a function call, a variable declaration and a conditional, which makes it no shorter than the OP's code and probably even slower, because it is harder to optimize.
My solution is this:
q = (x % y) ? x / y + 1 : x / y;
It will be slightly faster than the OP's code, because the modulo and the division are performed using the same instruction on the processor, since the compiler can see that they are equivalent. At least gcc 4.4.1 performs this optimization with the -O2 flag on x86.
In theory the compiler might inline the function call in Nathan Ernst's code and emit the same thing, but gcc didn't do that when I tested it. This might be because it would tie the compiled code to a single version of the standard library.
As a final note, none of this matters on a modern machine, except if you are in an extremely tight loop and all your data is in registers or the L1-cache. Otherwise all of these solutions will be equally fast, except for possibly Nathan Ernst's, which might be significantly slower if the function has to be fetched from main memory.
You could use the div function in cstdlib to get the quotient & remainder in a single call and then handle the ceiling separately, like in the below
#include <cstdlib>
#include <iostream>
int div_ceil(int numerator, int denominator)
{
std::div_t res = std::div(numerator, denominator);
return res.rem ? (res.quot + 1) : res.quot;
}
int main(int, const char**)
{
std::cout << "10 / 5 = " << div_ceil(10, 5) << std::endl;
std::cout << "11 / 5 = " << div_ceil(11, 5) << std::endl;
return 0;
}
There's a solution for both positive and negative x but only for positive y with just 1 division and without branches:
int div_ceil(int x, int y) {
return x / y + (x % y > 0);
}
Note: if x is positive then division is towards zero, and we should add 1 if the remainder is not zero.
If x is negative then division is towards zero, which is what we need, and we will not add anything because x % y is not positive.
How about this? (requires y non-negative, so don't use this in the rare case where y is a variable with no non-negativity guarantee)
q = (x > 0)? 1 + (x - 1)/y: (x / y);
I reduced y/y to one, eliminating the term x + y - 1 and with it any chance of overflow.
I avoid x - 1 wrapping around when x is an unsigned type and contains zero.
For signed x, negative and zero still combine into a single case.
Probably not a huge benefit on a modern general-purpose CPU, but this would be far faster in an embedded system than any of the other correct answers.
I would have rather commented but I don't have a high enough rep.
As far as I am aware, for positive arguments and a divisor which is a power of 2, this is the fastest way (tested in CUDA):
//example y=8
q = (x >> 3) + !!(x & 7);
For generic positive arguments only, I tend to do it like so:
q = x/y + !!(x % y);
This works for positive or negative numbers:
q = x / y + ((x % y != 0) ? !((x > 0) ^ (y > 0)) : 0);
If there is a remainder, checks to see if x and y are of the same sign and adds 1 accordingly.
A simplified generic form:
int div_up(int n, int d) {
return n / d + (((n < 0) ^ (d > 0)) && (n % d));
} //i.e. +1 iff (not exact int && positive result)
For a more generic answer, C++ functions for integer division with well defined rounding strategy
For signed or unsigned integers.
q = x / y + !(((x < 0) != (y < 0)) || !(x % y));
For signed dividends and unsigned divisors.
q = x / y + !((x < 0) || !(x % y));
For unsigned dividends and signed divisors.
q = x / y + !((y < 0) || !(x % y));
For unsigned integers.
q = x / y + !!(x % y);
Zero divisor fails (as with a native operation). Cannot cause overflow.
Corresponding floored and modulo constexpr implementations here, along with templates to select the necessary overloads (as full optimization and to prevent mismatched sign comparison warnings):
https://github.com/libbitcoin/libbitcoin-system/wiki/Integer-Division-Unraveled
Compile with -O3; the compiler performs this optimization well.
q = x / y;
if (x % y) ++q;