Neighbor index computation for diagonally flattened matrix - C++

I have a 2D matrix stored in a flat buffer along diagonals. For example a 4x4 matrix would have its indexes scattered like so:
0 2 5 9
1 4 8 12
3 7 11 14
6 10 13 15
With this representation, what is the most efficient way to calculate the index of a neighboring element given the original index and an X/Y offset? For example:
// return the index of a neighbor given an offset
int getNGonalNeighbor(const size_t index,
                      const int x_offset,
                      const int y_offset){
    //...
}
// for the array above:
getNGonalNeighbor(15,-1,-1); // should return 11
getNGonalNeighbor(15, 0,-1); // should return 14
getNGonalNeighbor(15,-1, 0); // should return 13
getNGonalNeighbor(11,-2,-1); // should return 1
We assume here that overflow never occurs and there is no wrap-around.
I have a solution involving a lot of triangular number and triangular root calculations. It also contains a lot of branches, which I would prefer to replace with algebra if possible (this will run on GPUs where diverging control flow is expensive). My solution is working but very lengthy. I feel like there must be a much simpler and less compute intensive way of doing it.
Maybe it would help me if someone can put a name on this particular problem/representation.
I can post my full solution if anyone is interested, but as I said it is very long and relatively complicated for such a simple task. In a nutshell, my solution does:
translate the original index into a larger triangular matrix to avoid dealing with 2 triangles (for example 13 would become 17)
For the 4x4 matrix this would be:
0 2 5 9 14 20 27
1 4 8 13 19 26
3 7 12 18 25
6 11 17 24
10 16 23
15 22
21
calculate the index of the diagonal of the neighbor in this representation using the Manhattan distance of the offset and the triangular root of the index.
calculate the position of the neighbor in this diagonal using the offset
translate back to the original representation by removing the padding.
For some reason this is the simplest solution I could come up with.
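For reference, the two building blocks that description relies on can be written compactly. This is a minimal sketch (the helper names are mine; the solution code further down uses equivalent TRIG/TRIROOT macros):

#include <cmath>

// n-th triangular number: T(n) = n*(n+1)/2
static inline int trig(int n)    { return (n * (n + 1)) / 2; }

// integer "triangular root": largest n such that T(n) <= x
static inline int triroot(int x) { return (int)((std::sqrt(8.0 * x + 1.0) - 1.0) / 2.0); }

// e.g. trig(4) == 10, triroot(9) == 3, triroot(10) == 4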
Edit:
Having a loop to accumulate the offset:
I realize that, given the properties of the triangular numbers, it would be easier to split the matrix into two triangles (let's call 0 to 9 the 'upper triangle' and 10 to 15 the 'lower triangle') and have a loop with a test inside to accumulate the offset, adding one while in the upper triangle and subtracting one in the lower (if that makes sense). But for my solution loops must be avoided at all costs, especially loops with unbalanced trip counts (again, very bad for GPUs).
So I am looking more for an algebraic solution rather than an algorithmic one.
Building a lookup table:
Again, because of the GPU, it is preferable to avoid building a lookup table and having random accesses into it (very expensive). An algebraic solution is preferable.
Properties of the matrix:
The size of the matrix is known.
For now I only consider square matrices, but a solution for rectangular ones would be nice as well.
As the name of the function in my example suggests, extending the solution to N-dimensional volumes (hence the N-gonal flattening) would be a big plus too.

Table lookup
#include <stdio.h>

#define SIZE 16
#define SIDE 4 // sqrt(SIZE)

int table[SIZE];  // index -> x*10 + y
int rtable[100];  // x*10 + y -> index (x, y <= 9)

void setup(){
    int i, x, y, xy; // xy = x + y
    x = y = xy = 0;
    for(i = 0; i < SIZE; ++i){
        table[i] = x*10 + y;
        rtable[x*10 + y] = i;
        x = x + 1; y = y - 1; // move right-up along the diagonal
        if(y < 0 || x >= SIDE){ // start the next diagonal
            ++xy;
            x = 0;
            y = xy;
            while(y >= SIDE){
                ++x;
                --y;
            }
        }
    }
}

int getNGonalNeighbor(int index, int offsetX, int offsetY){
    int x, y;
    x = table[index] / 10 + offsetX;
    y = table[index] % 10 + offsetY;
    if(x < 0 || x >= SIDE || y < 0 || y >= SIDE) return -1; // ERROR
    return rtable[x*10 + y];
}

int main() {
    setup();
    printf("%d\n", getNGonalNeighbor(15,-1,-1));
    printf("%d\n", getNGonalNeighbor(15, 0,-1));
    printf("%d\n", getNGonalNeighbor(15,-1, 0));
    printf("%d\n", getNGonalNeighbor(11,-2,-1));
    printf("%d\n", getNGonalNeighbor(0, -1,-1));
    return 0;
}
Version without a table lookup:
#include <stdio.h>

#define SIZE 16
#define SIDE 4

void num2xy(int index, int *offsetX, int *offsetY){
    int i, x, y, xy; // xy = x + y
    x = y = xy = 0;
    for(i = 0; i < SIZE; ++i){
        if(i == index){
            *offsetX = x;
            *offsetY = y;
            return;
        }
        x = x + 1; y = y - 1; // move right-up along the diagonal
        if(y < 0 || x >= SIDE){ // start the next diagonal
            ++xy;
            x = 0;
            y = xy;
            while(y >= SIDE){
                ++x;
                --y;
            }
        }
    }
}

int xy2num(int offsetX, int offsetY){
    int i, x, y, xy; // xy = x + y
    x = y = xy = 0;
    for(i = 0; i < SIZE; ++i){
        if(offsetX == x && offsetY == y) return i;
        x = x + 1; y = y - 1; // move right-up along the diagonal
        if(y < 0 || x >= SIDE){ // start the next diagonal
            ++xy;
            x = 0;
            y = xy;
            while(y >= SIDE){
                ++x;
                --y;
            }
        }
    }
    return -1;
}

int getNGonalNeighbor(int index, int offsetX, int offsetY){
    int x, y;
    num2xy(index, &x, &y);
    return xy2num(x + offsetX, y + offsetY);
}

int main() {
    printf("%d\n", getNGonalNeighbor(15,-1,-1));
    printf("%d\n", getNGonalNeighbor(15, 0,-1));
    printf("%d\n", getNGonalNeighbor(15,-1, 0));
    printf("%d\n", getNGonalNeighbor(11,-2,-1));
    printf("%d\n", getNGonalNeighbor(0, -1,-1));
    return 0;
}

I actually already had the elements to solve it somewhere else in my code. As BLUEPIXY's solution hinted, I am using scatter/gather operations, which I had already implemented for layout transformation.
This solution basically rebuilds the original (x,y) index of the given element in the matrix, applies the index offset and translates the result back to the transformed layout. It splits the square into 2 triangles and adjusts the computation depending on which triangle the element belongs to.
It is an almost entirely algebraic transformation: it uses no loop and no table lookup, has a small memory footprint and little branching. The code can probably be optimized further.
Here is the draft of the code:
#include <stdio.h>
#include <math.h>

// size of the matrix
#define SIZE 4

// triangle number of X
#define TRIG(X) (((X) * ((X) + 1)) >> 1)
// triangle root of X
#define TRIROOT(X) ((int)(sqrt(8*(X)+1)-1)>>1)

// return the index of a neighbor given an offset
int getNGonalNeighbor(const size_t index,
                      const int x_offset,
                      const int y_offset){
    // compute largest upper triangle index
    const size_t upper_triangle = TRIG(SIZE);

    // position of the actual element of index
    unsigned int x = 0, y = 0;

    // adjust the index depending on upper/lower triangle.
    const size_t adjusted_index = index < upper_triangle ?
        index :
        SIZE * SIZE - index - 1;

    // compute triangular root
    const size_t triroot = TRIROOT(adjusted_index);
    const size_t trig = TRIG(triroot);
    const size_t offset = adjusted_index - trig;

    // upper triangle
    if(index < upper_triangle){
        x = offset;
        y = triroot - offset;
    }
    // lower triangle
    else {
        x = SIZE - offset - 1;
        y = SIZE - (trig + triroot + 1 - adjusted_index);
    }

    // adjust the offset
    x += x_offset;
    y += y_offset;

    // Manhattan distance
    const size_t man_dist = x + y;

    // calculate index using triangular number
    return TRIG(man_dist) +
        (man_dist >= SIZE ? x - (man_dist - SIZE + 1) : x) -
        (man_dist > SIZE ? 2 * TRIG(man_dist - SIZE) : 0);
}

int main(){
    printf("%d\n", getNGonalNeighbor(15,-1,-1)); // should return 11
    printf("%d\n", getNGonalNeighbor(15, 0,-1)); // should return 14
    printf("%d\n", getNGonalNeighbor(15,-1, 0)); // should return 13
    printf("%d\n", getNGonalNeighbor(11,-2,-1)); // should return 1
}
And the output is indeed:
11
14
13
1
If you think this solution looks over-complicated and inefficient, I remind you that the target here is the GPU, where computation costs virtually nothing compared to memory accesses, and all index computations are performed at the same time on a massively parallel architecture.

Related

Fast way of finding all numbers X that sum with themselves with one digit removed to get N?

I'm working on an assignment that gives an integer N and tasks us to find all possible combinations of X, Y such that X + Y = N and Y = X with one digit removed. For example, 302 would have the following solutions:
251 + 51 = 302
275 + 27 = 302
276 + 26 = 302
281 + 21 = 302
301 + 01 = 302
My code to accomplish this can find all of the correct answers, but it runs too slowly for very large numbers (it takes roughly 8 seconds for the largest possible number, 10^9, whereas I would like the entire run of up to 100 of these cases to complete in under 3 seconds).
Here's some code describing my current solution:
//Only need to consider cases where x > y.
for(int x = n * 0.5; x <= n; x++)
{
    //Only considers cases where y's rightmost digit could align with x.
    int y = n - x,
        y_rightmost = y % 10;
    if(y_rightmost == x % 10 || y_rightmost == (x % 100) / 10)
    {
        //Determines the number of digits in x and y without division. places[] = {1, 10, 100, 1000, ... 1000000000}
        int x_numDigits = 0,
            y_numDigits = 0;
        while(x >= places[x_numDigits])
        {
            if(y >= places[x_numDigits])
                y_numDigits++;
            x_numDigits++;
        }
        //y must have less digits than x to be a possible solution.
        if(y_numDigits < x_numDigits)
        {
            if(func(x, y))
            {
                //x and y are a solution.
            }
        }
    }
}
Where func is a function that determines whether x and y differ by only one digit. Here's my current method for calculating that:
bool func(int x, int y)
{
    int diff = 0;
    while(y > 0)
    {
        if(x % 10 != y % 10)
        {
            //If the rightmost digits do not match, move x to the left once and check again.
            x /= 10;
            diff++;
            if(diff > 1)
                return false;
        }
        else
        {
            //If they matched, both move to the next digit.
            x /= 10;
            y /= 10;
        }
    }
    //If the last digit in x is the only difference or x is composed of 0's led by 1 number, then x, y is a solution.
    if((x < 10 && diff == 0) || (x % 10 == 0))
        return true;
    else
        return false;
}
This is the fastest solution that I've been able to find so far (other methods I tried included converting X and Y into strings and using a custom subsequence function, along with dividing X into a prefix and suffix without each digit from the right to the left and seeing if any of these summed to Y, but neither worked as quickly). However, it still doesn't scale as well as I need it to with larger numbers, and I'm struggling to think of any other ways to optimize the code or underlying mathematical reasoning. Any advice would be greatly appreciated.
Consider solving a simpler problem first:
Finding X and Y such that X + Y = N
In pseudo-code your steps should look like this:
loop through the array and with every given item do the next:
add this number to a Set and check whether the Set contains N - item
This works with O(n) complexity for an array of unique values.
So improve it to work with duplicated numbers by looping through the array first and keeping a counter of duplicates for every number. Use some kind of dictionary for C++ or extend the Set. Then, every time you find the necessary number, check the counter.
After doing that you will just have to write this "digit check" function and apply it when finding the value in the Set.
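A minimal C++ sketch of that idea, assuming the "array" is simply the range of candidate values for X; the digit check here (isOneDigitRemoved) is my own stand-in for the OP's func:

#include <cstdio>
#include <unordered_set>

// true if removing exactly one decimal digit from x yields y (stand-in for func)
static bool isOneDigitRemoved(int x, int y) {
    for (long long p = 1; p <= x; p *= 10) {                 // p marks the digit being removed
        long long candidate = (x / (p * 10)) * p + (x % p);  // x with that digit cut out
        if (candidate == y) return true;
    }
    return false;
}

static void findPairs(int n) {
    std::unordered_set<int> seen;            // the "Set" from the answer
    for (int x = 1; x < n; ++x) {
        int y = n - x;                       // required partner so that x + y == n
        if (seen.count(y) &&                 // partner already encountered...
            (isOneDigitRemoved(x, y) || isOneDigitRemoved(y, x)))  // ...and it passes the digit check
            std::printf("%d + %d = %d\n", x, y, n);
        seen.insert(x);
    }
}

int main() { findPairs(302); }  // finds the pairs listed in the question (01 prints as 1)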

Diagonally Sorting a Two Dimensional Array in C++ [duplicate]

I'm building a heatmap-like rectangular array interface and I want the 'hot' location to be at the top left of the array, and the 'cold' location to be at the bottom right. Therefore, I need an array to be filled diagonally like this:
0 1 2 3
|----|----|----|----|
0 | 0 | 2 | 5 | 8 |
|----|----|----|----|
1 | 1 | 4 | 7 | 10 |
|----|----|----|----|
2 | 3 | 6 | 9 | 11 |
|----|----|----|----|
So actually, I need a function f(x,y) such that
f(0,0) = 0
f(2,1) = 7
f(1,2) = 6
f(3,2) = 11
(or, of course, a similar function f(n) where f(7) = 10, f(9) = 6, etc.).
Finally, yes, I know this question is similar to the ones asked here, here and here, but the solutions described there only traverse and don't fill a matrix.
Interesting problem if you are limited to going through the array row by row.
I divided the rectangle into three regions: the top left triangle, the bottom right triangle and the rhomboid in the middle.
For the top left triangle the values in the first column (x=0) can be calculated using the common arithmetic series 1 + 2 + 3 + .. + n = n*(n+1)/2. Fields in that triangle with the same x+y value are on the same diagonal, and their value is that sum from the first column + x.
The same approach works for the bottom right triangle. But instead of x and y, w-x and h-y are used, where w is the width and h the height of the rectangle. That value has to be subtracted from the highest value w*h-1 in the array.
There are two cases for the rhomboid in the middle. If the width of the rectangle is greater than (or equal to) the height, then the bottom left field of the rectangle is the field with the lowest value in the rhomboid and can be calculated with that sum from before for h-1. From there on you can imagine that the rhomboid is a rectangle with an x-value of x+y and a y-value of y from the original rectangle. So the calculations of the remaining values in that new rectangle are easy.
In the other case, when the height is greater than the width, the field at x=w-1 and y=0 can be calculated using that arithmetic sum, and the rhomboid can be imagined as a rectangle with x-value x and y-value y-(w-x-1).
The code can be optimised, for example by precalculating values. I think there is also a single formula for all of these cases. Maybe I will think about it later.
inline static int diagonalvalue(int x, int y, int w, int h) {
    if (h > x+y+1 && w > x+y+1) {
        // top/left triangle
        return ((x+y)*(x+y+1)/2) + x;
    } else if (y+x >= h && y+x >= w) {
        // bottom/right triangle
        return w*h - (((w-x-1)+(h-y-1))*((w-x-1)+(h-y-1)+1)/2) - (w-x-1) - 1;
    }
    // rhomboid in the middle
    if (w >= h) {
        return (h*(h+1)/2) + ((x+y+1)-h)*h - y - 1;
    }
    return (w*(w+1)/2) + ((x+y)-w)*w + x;
}

for (y=0; y<h; y++) {
    for (x=0; x<w; x++) {
        array[x][y] = diagonalvalue(x,y,w,h);
    }
}
Of course, if there is no such limitation, something like this should be way faster:
n = w*h;
x = 0;
y = 0;
for (i=0; i<n; i++) {
    array[x][y] = i;
    if (y <= 0 || x+1 >= w) {
        y = x+y+1;
        if (y >= h) {
            x = (y-h)+1;
            y -= x;
        } else {
            x = 0;
        }
    } else {
        x++;
        y--;
    }
}
What about this (having an NxN matrix):
count = 1;
for( int k = 0; k < 2*N-1; ++k ) {
    int max_i = std::min(k,N-1);
    int min_i = std::max(0,k-N+1);
    for( int i = max_i, j = min_i; i >= min_i; --i, ++j ) {
        M.at(i).at(j) = count++;
    }
}
Follow the steps in the 3rd example -- this gives the indexes (in order to print out the slices) -- and just set the value with an incrementing counter:
int x[3][3];
int n = 3;
int pos = 1;

for (int slice = 0; slice < 2 * n - 1; ++slice) {
    int z = slice < n ? 0 : slice - n + 1;
    for (int j = z; j <= slice - z; ++j)
        x[j][slice - j] = pos++;
}
In an M*N matrix, the values, when traversing like in your stated example, seem to increase by N, except for border cases, so
f(0,0)=0
f(1,0)=f(0,0)+2
f(2,0)=f(1,0)+3
...and so on up to f(N,0). Then
f(0,1)=1
f(0,2)=3
and then
f(m,n)=f(m-1,n)+N, where m,n are index variables
and
f(M,N)=f(M-1,N)+2, where M,N are the last indexes of the matrix
This is not conclusive, but it should give you something to work with. Note that you only need the value of the preceding element in each row and a few starting values to begin.
If you want a simple function, you could use a recursive definition.
H = height
def get_point(x,y)
    if x == 0
        if y == 0
            return 0
        else
            return get_point(y-1,0)+1
        end
    else
        return get_point(x-1,y) + H
    end
end
This takes advantage of the fact that any value is H plus the value of the item to its left. If the item is already in the leftmost column, then you find the cell on its far upper-right diagonal, move left from there, and add 1.
This is a good chance to use dynamic programming, and "cache" or memoize the results you've already computed.
If you want something "strictly" done by f(n), you could use the relationship:
n = ( n % W , n / H ) [integer division, with no remainder/decimal]
And work your function from there.
Alternatively, if you want a purely array-populating-by-rows method, with no recursion, you could follow these rules:
If you are on the first cell of the row, "remember" the item in the cell (R-1) (where R is your current row) of the first row, and add 1 to it.
Otherwise, simply add H to the cell you last computed (ie, the cell to your left).
Pseudo-code: (Assuming array is indexed by arr[row,column])
arr[0,0] = 0
for R from 0 to H
    if R > 0
        arr[R,0] = arr[0,R-1] + 1
    end
    for C from 1 to W
        arr[R,C] = arr[R,C-1] + H
    end
end

Trying to compute my own Histogram without opencv calcHist()

What I'm trying to do is write a function that calculates a histogram of a greyscale image, with a given number of bins (anzBin) that the histogram's range is divided into. I then run through the image pixels, comparing their value to the different bins, and whenever a value fits a bin, I increase that bin's value by 1.
vector<int> calcuHisto(const IplImage *src_pic, int anzBin)
{
    CvSize size = cvGetSize(src_pic);
    int binSize = (size.width / 256)*anzBin;
    vector<int> histogram(anzBin,0);

    for (int y = 0; y<size.height; y++)
    {
        const uchar *src_pic_point =
            (uchar *)(src_pic->imageData + y*src_pic->widthStep);

        for (int x = 0; x<size.width; x++)
        {
            for (int z = 0; z < anzBin; z++)
            {
                if (src_pic_point[x] <= z*binSize)
                {
                    histogram[src_pic_point[x]]++;
                }
            }
        }
    }
    return histogram;
}
But unfortunately it's not working...
What is wrong here?
Please help
There are a few issues I can see:
Your binSize calculation is wrong
Your binning algorithm is one-sided, and should be two-sided
You aren't incrementing the proper bin when you find a match
1. binSize calculation
bin size = your range / number of bins
2. Two-sided binning
if (src_pic_point[x] <= z*binSize)
You need a two-sided range of values, not a one-sided inequality. Imagine you have 4 bins and values from 0 to 255. Your bins should have the following ranges:
bin    low       high
0      0         63.75
1      63.75     127.5
2      127.5     191.25
3      191.25    255
For example: a value of 57 should go in bin 0. Your code says the value goes in all the bins, because it's always <= z*binSize. You need something with a lower and an upper bound.
3. Incrementing the appropriate bin
You are using z to loop over each bin, so when you find a match you should increment bin z; you don't use the actual pixel value except when determining which bin it belongs to.
This would likely be a buffer overrun. Imagine again you have 4 bins, and the current pixel has a value of 57. This code says increment bin 57, but you only have 4 bins (0-3):
histogram[src_pic_point[x]]++;
You want to increment only the bin the pixel value falls into:
histogram[z]++;
CODE
With that in mind, here is revised code (it is untested, but should work):
vector<int> calcuHisto(const IplImage *src_pic, int anzBin)
{
    CvSize size = cvGetSize(src_pic);
    double binSize = 256.0 / anzBin; // new definition
    vector<int> histogram(anzBin,0); // I don't know if this works, so I will leave it

    // goes through all rows
    for (int y = 0; y<size.height; y++)
    {
        // grabs an entire row of the imageData
        const uchar *src_pic_point = (uchar *)(src_pic->imageData + y*src_pic->widthStep);

        // goes through each column
        for (int x = 0; x<size.width; x++)
        {
            // for each bin
            for (int z = 0; z < anzBin; z++)
            {
                // check both upper and lower limits
                if (src_pic_point[x] >= z*binSize && src_pic_point[x] < (z+1)*binSize)
                {
                    // increment the index that contains the point
                    histogram[z]++;
                }
            }
        }
    }
    return histogram;
}
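As an aside, not part of the answer above: once binSize is known, the inner loop over bins can be avoided entirely by computing the bin index directly from the pixel value. A minimal sketch of that variant (the name calcuHistoDirect is mine):

vector<int> calcuHistoDirect(const IplImage *src_pic, int anzBin)
{
    CvSize size = cvGetSize(src_pic);
    double binSize = 256.0 / anzBin;   // same bin width as above
    vector<int> histogram(anzBin, 0);

    for (int y = 0; y < size.height; y++)
    {
        const uchar *src_pic_point = (uchar *)(src_pic->imageData + y*src_pic->widthStep);
        for (int x = 0; x < size.width; x++)
        {
            int bin = (int)(src_pic_point[x] / binSize);  // which bin this value falls into
            if (bin >= anzBin) bin = anzBin - 1;          // guard against the top edge (value 255)
            histogram[bin]++;
        }
    }
    return histogram;
}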

Signed Char ATAN2 and ATAN approximations

Basically, I've been trying to make two approximation functions. In both cases I input the "x" and the "y" components (to deal with those nasty n/0 and 0/0 conditions), and need to get a Signed Char output. In ATAN2's case, it should provide a range of +/-PI, and in ATAN's case, the range should be +/- PI/2.
I spent all of yesterday trying to wrap my head around it. After playing around in Excel to find an overall good algorithm based on the approximation:
X * (PI/4 + 0.273 * (1 - |X|)) * 128/PI // Scale factor at end to switch to char format
I came up with the following code:
signed char nabsSC(signed char x)
{
    if(x > 0)
        return -x;
    return x;
}

signed char signSC(signed char input, signed char ifZero = 0, signed char scaleFactor = 1)
{
    if(input > 0)
        {return scaleFactor;}
    else if(input < 0)
        {return -scaleFactor;}
    else
        {return ifZero;}
}

signed char divisionSC(signed char numerator, signed char denominator)
{
    if(denominator == 0) // Error Condition
        {return 0;}
    else
        {return numerator/denominator;}
}

//#######################################################################################
signed char atan2SC(signed char y, signed char x)
{
    // #todo make clearer : the code was deduced through trial and error in excel with brute force... not the best reasoning in the world but hey ho
    if((x == y) && (x == 0)) // Error Condition
        {return 0;}

    // Prepare for algorithm Choice
    const signed char X = abs(x);
    signed char Y = abs(y);
    if(Y > 2)
        {Y = (Y << 1) + 4;}

    const signed char alpha1 = 43;
    const signed char alpha2 = 11;

    // Make Choice
    if(X <= Y) // x/y Path
    {
        const signed char beta = 64;

        const signed char a = divisionSC(x,y); // x/y
        const signed char A = nabsSC(a);       // -|x/y|

        const signed char temp = a * (alpha1 + alpha2 * A); // (x/y) * (32 + ((0.273 * 128) / PI) * (1 - |x/y|)))
        // Small angle approximation of ARCTAN(X)

        if(y < 0) // Determine Quadrant
            {return -(temp + beta);}
        else
            {return -(temp - beta);}
    }
    else // y/x Path
    {
        const signed char a = divisionSC(y,x); // y/x
        const signed char A = nabsSC(a);       // -|y/x|

        const signed char temp = a * (alpha1 + alpha2 * A); // (y/x) * (32 + ((0.273 * 128) / PI) * (1 - |y/x|)))
        // Small angle approximation of ARCTAN(X)

        if(x < 0) // Determine Quadrant
        {
            Y = signSC(y, -127, 127); // Sign(y)*127, if undefined: use -127
            return temp + Y;
        }
        else
            {return temp;}
    }
}
Much to my despair, the implementation has errors as large as 180 degrees, and pretty much everywhere in between as well. (I compared it to the ATAN2F from the library after converting to signed char format.)
I got the general gist from this website: http://geekshavefeelings.com/posts/fixed-point-atan2
Can anybody tell me where I'm going wrong? And how I should approach the ATAN variant (which should be more precise as it's looking over half the range) without all this craziness.
I'm currently using QT creator 4.8.1 on windows. The end platform for this specific bit of code will eventually be a micro-controller without an FPU, and the ATAN functions will be one of the primary functions used. As such, efficiency with reasonable error (+/-2 degrees for ATAN2 and +/-1 degree for ATAN. These are guesstimates for now, so I might increase the range, however, 90 degrees is definitely not acceptable!) is the aim of the game.
Thanks in advance for any and all help!
EDIT:
Just to clarify, the outputs of ATAN2 and ATAN output to a signed char value, but the ranges of the two types are different ranges.
ATAN2 shall have a range from -128 (-PI) to 127 (+PI - PI/128).
ATAN will have a range from -128 (-PI/2) to 127 (+PI/2 - PI/256).
As such the output values from the two can be considered to be two different data types.
Sorry for any confusion.
EDIT2: Converted implicit int numbers explicitly into signed char constants.
An outline follows. Below is additional information.
The result angle (a Binary Angle Measure) exactly mathematically divides the unit circle into 8 wedges. Assuming a -128 to 127 char, for atan2SC() the result of each octant is 33 integers: 0 to 32 + an offset. (0 to 32, rather than 0 to 31, due to rounding.) For atanSC(), the result is 0 to 64. So just focus on calculating the result of 1 primary octant with x,y inputs and a 0 to 64 result. atan2SC() and atanSC() can both use this helper function at2(). For atan2SC(), to find the intermediate angle a, use a = at2(x,y)/2. For atanSC(), use a = at2(-128, y).
Finding the integer quotient with a = divisionSC(x,y) and then a * (43 + 11 * A) loses too much information in the division. Need to find the atan2 approximation with an equation that uses x,y maybe in the form at2 = (a*y*y + b*y)/(c*x*x + d*x).
Good to use the negative absolute value as with nabsSC(). The negative range of integers meets or exceeds the positive range, e.g. -128 to -1 vs. 1 to 127. Use negative numbers and 0 when calling at2().
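To make that concrete, here is a tiny illustration (mine, not part of the answer's code) of why the negative absolute value is the safe direction for an 8-bit signed char:

#include <limits.h>

// Negative absolute value: safe for the entire signed char range.
// The magnitude of SCHAR_MIN (+128) does not fit in a signed char,
// so a positive abs() cannot represent that input, but nabs() always can:
// nabs(127) == -127, nabs(-128) == -128.
static signed char nabs(signed char x) {
    return (signed char)(x < 0 ? x : -x);
}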
[Edit]
Below is code with a simplified octant selection algorithm. It is carefully constructed to ensure any negation of x,y stays in the SCHAR_MIN..SCHAR_MAX range, assuming 2's complement. All octants call iat2(), and that is where improvements can be made to improve precision. Note: division by x==0 in iat2() is prevented, as x is not 0 at this point. The rounding mode, and whether this helper function is shared with atanSC(), will dictate its details. I suggest a 2-piece piecewise-linear table if wide integer math is not available, else a linear (a*y+b)/(c*x+d). I may play with this more.
The weighting of precision vs. performance is crucial for OP's code, but not passed along well enough for me to derive an optimal answer. So I've posted a test driver below that assesses the precision of whatever detail of iat2() OP comes up with.
3 pitfalls exist. 1) When the answer is to be +180 degrees, OP appears to want -128 BAM. But atan2(-1, 0.0) comes up with +pi. This sign reversal may be an issue. Note: atan2(-1, -0.0) --> -pi. Ref. 2) When an answer is just slightly less than +180 degrees, depending on iat2() details, the integer BAM result is +128, which tends to wrap to -128. The atan2() result is just less than +pi, or +128 BAM. This edge condition needs review in OP's final code. 3) The (x=0,y=0) case needs special handling. The octant selection code finds it.
Code for a signed char atanSC(signed char x), if it needs to be fast, could use a few if()s and a 64-byte look-up table (assuming an 8-bit signed char). This same table could be used in iat2().
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

// -x > -y >= 0, so divide by 0 not possible
static signed char iat2(signed char y, signed char x) {
    // printf("x=%4d y=%4d\n", x, y); fflush(stdout);
    return ((y*32+(x/2))/x)*2; // 3.39 mxdiff
    // return ((y*64+(x/2))/x); // 3.65 mxdiff
    // return (y*64)/x;         // 3.88 mxdiff
}

signed char iatan2sc(signed char y, signed char x) {
    // determine octant
    if (y >= 0) { // oct 0,1,2,3
        if (x >= 0) { // oct 0,1
            if (x > y) {
                return iat2(-y, -x)/2 + 0*32;
            } else {
                if (y == 0) return 0; // (x=0,y=0)
                return -iat2(-x, -y)/2 + 2*32;
            }
        } else { // oct 2,3
            // if (-x <= y) {
            if (x >= -y) {
                return iat2(x, -y)/2 + 2*32;
            } else {
                return -iat2(-y, x)/2 + 4*32;
            }
        }
    } else { // oct 4,5,6,7
        if (x < 0) { // oct 4,5
            // if (-x > -y) {
            if (x < y) {
                return iat2(y, x)/2 + -4*32;
            } else {
                return -iat2(x, y)/2 + -2*32;
            }
        } else { // oct 6,7
            // if (x <= -y) {
            if (-x >= y) {
                return iat2(-x, y)/2 + -2*32;
            } else {
                return -iat2(y, -x)/2 + -0*32;
            }
        }
    }
}

#include <math.h>

static void test_iatan2sc(signed char y, signed char x) {
    static int mn = INT_MAX;
    static int mx = INT_MIN;
    static double mxdiff = 0;

    signed char i = iatan2sc(y,x);
    static const double Pi = 3.1415926535897932384626433832795;
    double a = atan2(y ? y : -0.0, x) * 256/(2*Pi);

    if (i < mn) {
        mn = i;
        printf("x=%4d,y=%4d --> %4d %f, mn %d mx %d mxdiff %f\n",
               x, y, i, a, mn, mx, mxdiff);
    }
    if (i > mx) {
        mx = i;
        printf("x=%4d,y=%4d --> %4d %f, mn %d mx %d mxdiff %f\n",
               x, y, i, a, mn, mx, mxdiff);
    }
    double diff = fabs(i - a);
    if (diff > 128) diff = fabs(diff - 256);
    if (diff > mxdiff) {
        mxdiff = diff;
        printf("x=%4d,y=%4d --> %4d %f, mn %d mx %d mxdiff %f\n",
               x, y, i, a, mn, mx, mxdiff);
    }
}

int main(void) {
    int x, y;
    int n = 127;
    for (y = -n-1; y <= n; y++) {
        for (x = -n-1; x <= n; x++) {
            test_iatan2sc(y, x);
        }
    }
    puts("Done");
    return 0;
}
BTW: a fun problem.

Better way than if else if else... for linear interpolation

The question is easy.
Let's say you have a function
double interpolate (double x);
and you have a table that holds a map of known x -> y values,
for example
5 15
7 18
10 22
note: real tables are bigger of course, this is just an example.
so for 8 you would return 18+((8-7)/(10-7))*(22-18)=19.3333333
One cool way I found is
http://www.bnikolic.co.uk/blog/cpp-map-interp.html
(long story short it uses std::map, key= x, value = y for x->y data pairs).
If somebody asks what the if-else-if-else way in the title is, it is basically:
if ((x>=5) && (x<=7))
{
    //interpolate
}
else if ((x>=7) && (x<=10))
{
    //interpolate
}
So is there a more clever way to do it, or is the map way the state of the art? :)
Btw I prefer solutions in C++, but obviously a solution in any language that has a 1:1 mapping to C++ is nice.
Well, the easiest way I can think of would be using a binary search to find the segment where your point lies. Try to avoid maps if you can, as they are very slow in practice.
This is a simple way:
#include <cstdio>
#include <vector>
#include <algorithm>
using namespace std;

const double INF = 1.e100;
vector<pair<double, double> > table;

double interpolate(double x) {
    // Assumes that "table" is sorted by .first
    // Check if x is out of bound
    if (x > table.back().first) return INF;
    if (x < table[0].first) return -INF;

    vector<pair<double, double> >::iterator it, it2;
    // INFINITY is defined in math.h in the glibc implementation
    it = lower_bound(table.begin(), table.end(), make_pair(x, -INF));
    // Corner case
    if (it == table.begin()) return it->second;
    it2 = it;
    --it2;
    return it2->second + (it->second - it2->second)*(x - it2->first)/(it->first - it2->first);
}

int main() {
    table.push_back(make_pair(5., 15.));
    table.push_back(make_pair(7., 18.));
    table.push_back(make_pair(10., 22.));
    // If you are not sure if table is sorted:
    sort(table.begin(), table.end());

    printf("%f\n", interpolate(8.));
    printf("%f\n", interpolate(10.));
    printf("%f\n", interpolate(10.1));
}
You can use a binary search tree to store the interpolation data. This is beneficial when you have a large set of N interpolation points, as interpolation can then be performed in O(log N) time. However, in your example, this does not seem to be the case, and the linear search suggested by RedX is more appropriate.
#include <stdio.h>
#include <assert.h>
#include <map>

static double interpolate (double x, const std::map<double, double> &table)
{
    assert(table.size() > 0);

    std::map<double, double>::const_iterator it = table.lower_bound(x);
    if (it == table.end()) {
        return table.rbegin()->second;
    } else {
        if (it == table.begin()) {
            return it->second;
        } else {
            double x2 = it->first;
            double y2 = it->second;
            --it;
            double x1 = it->first;
            double y1 = it->second;
            double p = (x - x1) / (x2 - x1);
            return (1 - p) * y1 + p * y2;
        }
    }
}

int main ()
{
    std::map<double, double> table;
    table.insert(std::pair<double, double>(5, 6));
    table.insert(std::pair<double, double>(8, 4));
    table.insert(std::pair<double, double>(9, 5));

    double y = interpolate(5.1, table);
    printf("%f\n", y);
}
Store your points sorted:
index    X     Y
1        1  -> 3
2        3  -> 7
3        10 -> 8
Then loop from max to min, and as soon as you get below a number you know it is the one you want.
Say you want 6, so:
// pseudo
for i = 3 to 1
    if x[i] <= 6
        // you found your range!
        // interpolate between x[i] and x[i - 1]
        break; // Do not look any further
    end
end
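A minimal C++ rendering of that reverse scan, my own sketch using the same three sample points as the question:

#include <cstdio>

// table of known (x, y) pairs, sorted by x ascending
const double xs[] = {5, 7, 10};
const double ys[] = {15, 18, 22};
const int    n    = 3;

// reverse linear scan: walk from the largest x down until we drop below xi,
// then interpolate between that point and the one after it
double interpolate(double xi)
{
    for (int i = n - 1; i > 0; --i) {
        if (xs[i - 1] <= xi) {
            double t = (xi - xs[i - 1]) / (xs[i] - xs[i - 1]);
            return ys[i - 1] + t * (ys[i] - ys[i - 1]);
        }
    }
    return ys[0]; // xi below the table: clamp to the first value
}

int main() { std::printf("%f\n", interpolate(8.0)); } // prints 19.333333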
Yes, I guess that you should think of a map between those intervals and the natural numbers. I mean, just label the intervals and use a switch:
switch(I) {
    case Int1: //whatever
        break;
    ...
    default:
}
I don't know, it's the first thing that I thought of.
EDIT: A switch is more efficient than if-else if your numbers are within a relatively small interval (that's something to take into account when doing the mapping).
If your x-coordinates must be irregularly spaced, then store the x-coordinates in sorted order, and use a binary search to find the nearest coordinate, for example using Daniel Fleischman's answer.
However, if your problem permits it, consider pre-interpolating to regularly spaced data. So
5 15
7 18
10 22
becomes
5 15
6 16.5
7 18
8 19.3333333
9 20.6666667
10 22
Then at run-time you can interpolate with O(1) using something like this:
double interp1( double x0, double dx, double* y, int n, double xi )
{
    double f = ( xi - x0 ) / dx;
    if (f < 0)      return y[0];
    if (f >= (n-1)) return y[n-1];
    int i = (int) f;
    double w = f - (double)i;
    return y[i]*(1.0-w) + y[i+1]*w;
}
using
double y[6] = {15, 16.5, 18, 19.3333333, 20.6666667, 22};
double yi = interp1( 5.0, 1.0, y, 6, xi );
This isn't necessarily suitable for every problem -- you could end up losing accuracy (if there's no nice grid that contains all your x-samples), and it could have a bad cache penalty if it would make your table much much bigger. But it's a good option for cases where you have some control over the x-coordinates to begin with.
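The pre-interpolation step itself isn't shown above; here is a minimal sketch of one way to build the regularly spaced y[] from the irregular table (the helper name preinterpolate and its signature are my own, not from the answer):

#include <cstdio>

// Pre-interpolate an irregular table (xs, ys) of m points onto a regular grid
// starting at x0 with spacing dx. Done once up front; interp1() then runs in O(1).
void preinterpolate(const double* xs, const double* ys, int m,
                    double x0, double dx, double* yout, int n)
{
    int k = 0;                                     // current segment [xs[k], xs[k+1]]
    for (int i = 0; i < n; ++i) {
        double xi = x0 + i * dx;
        while (k < m - 2 && xi > xs[k + 1]) ++k;   // advance to the segment containing xi
        double t = (xi - xs[k]) / (xs[k + 1] - xs[k]);
        yout[i] = ys[k] + t * (ys[k + 1] - ys[k]); // linear interpolation within the segment
    }
}

int main()
{
    const double xs[] = {5, 7, 10};
    const double ys[] = {15, 18, 22};
    double y[6];
    preinterpolate(xs, ys, 3, 5.0, 1.0, y, 6);
    for (int i = 0; i < 6; ++i) std::printf("%g ", y[i]); // 15 16.5 18 19.3333 20.6667 22
    std::printf("\n");
}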
How you've already got it is fairly readable and understandable, and there's a lot to be said for that over a "clever" solution. You can however do away with the lower bounds check and clumsy && because the sequence is ordered:
if (x < 5)
    return 0;
else if (x <= 7)
    // interpolate
else if (x <= 10)
    // interpolate
...