I can't seem to find an answer to this. I realize that the integer value used in an array must be known at compile time, and what I have here seems to meet that criteria. If I use:
int L = 50; // number of interior points in x and y
int pts = L + 2; // interior points + boundary points
double u[pts][pts], // potential to be found
u_new[pts][pts]; // new potential after each step
Then I get the array bound error, even though the value for pts is known at compile time. The code is accepted, however, when I use:
int L = 50; // number of interior points in x and y
int pts = L + 2; // interior points + boundary points
double u[52][52], // potential to be found
u_new[52][52]; // new potential after each step
Am I missing something here? If not, what can I do to have it accept pts?
When you use
int L = 50;
int pts = L + 2;
L and pts cannot be used as dimensions of arrays since they are not compile time constants. Use constexpr qualifiers to let the compiler know that they can be computed at compile time, and hence can be used as dimensions of arrays.
constexpr int L = 50;
constexpr int pts = L + 2;
Related
The task is to increase the resolution of a geometry shape like this:
So that the shape becomes this by adding data points:
The shape is define by data points which has x and y coordinates and a index. The index represent the order to connect them.
What type of algorithm should I use to achieve this?
You can use linear interpolation between segment ends.
It is not clear yet - how many new points you want to insert in every segment. Does it depend on segment length or other reasons? Seems your picture shows mixed approach: n is at least nmin, part length is at most lmax like this
n = max(nmin, int(seglength / lmax)) ;
For XStart, XEnd referring to starting and ending coordinates of segment, we insert n-1 points, dividing segment into n equal parts:
for (int i = 1; i < n; i++) {
X[i] = XStart + (XEnd - XStart) * i / n;
Y[i] = YStart + (YEnd - YStart) * i / n;
}
This question already has an answer here:
What's algorithm used to solve Linear Diophantine equation: ax + by = c
(1 answer)
Closed 5 years ago.
I am trying to write a code which can input 3 long int variables, a, b, c.
The code should find all integer (x,y) so that ax+by = c, but the input values can be up to 2*10^9. I'm not sure how to do this efficiently. My algorithm is O(n^2), which is really bad for such large inputs. How can I do it better? Here's my code-
typedef long int lint;
struct point
{
lint x, y;
};
int main()
{
lint a, b, c;
vector <point> points;
cin >> c >> a >> b;
for(lint x = 0; x < c; x++)
for(lint y = 0; y < c; y++)
{
point candidate;
if(a*x + b*y == c)
{
candidate.x = x;
candidate.y = y;
points.push_back(candidate);
break;
}
}
}
Seems like you can apply a tiny bit of really trivial math to solve for y for any given value of x. Starting from ax + by = c:
ax + by = c
by = c - ax
Assuming non-zero b1, we then get:
y = (c - ax) / b
With that in hand, we can generate our values of x in the loop, plug it into the equation above, and compute the matching value of y and check whether it's an integer. If so, add that (x, y) pair, and go on to the next value of x.
You could, of course, make the next step and figure out which values of x would result in the required y being an integer, but even without doing that we've moved from O(N2) to O(N), which is likely to be plenty to get the task done in a much more reasonable time frame.
Of course, if b is 0, then the by term is zero, so we have ax = c, which we can then turn into x = c/a, so we then just need to check that x is an integer, and if so all pairs of that x with any candidate value of y will yield the correct c.
I am writing a program in C++ to reconstruct a 3D object from a set of projected 2D images, the most computation-intensive part of which involves magnifying and shifting each image via bilinear interpolation. I currently have a pair of functions for this task; "blnSetup" defines a handful of parameters outside the loop, then "bilinear" applies the interpolation point-by-point within the loop:
(NOTE: 'I' is a 1D array containing ordered rows of image data)
//Pre-definition structure (in header)
struct blnData{
float* X;
float* Y;
int* I;
float X0;
float Y0;
float delX;
float delY;
};
//Pre-definition function (outside the FOR loop)
extern inline blnData blnSetup(float* X, float* Y, int* I)
{
blnData bln;
//Create pointers to X, Y, and I vectors
bln.X = X;
bln.Y = Y;
bln.I = I;
//Store offset and step values for X and Y
bln.X0 = X[0];
bln.delX = X[1] - X[0];
bln.Y0 = Y[0];
bln.delY = Y[1] - Y[0];
return bln;
}
//Main interpolation function (inside the FOR loop)
extern inline float bilinear(float x, float y, blnData bln)
{
float Ixy;
//Return -1 if the target point is outside the image matrix
if (x < bln.X[0] || x > bln.X[-1] || y < bln.Y[0] || y > bln.Y[-1])
Ixy = 0;
//Otherwise, apply bilinear interpolation
else
{
//Define known image width
int W = 200;
//Find nearest indices for interpolation
int i = floor((x - bln.X0) / bln.delX);
int j = floor((y - bln.Y0) / bln.delY);
//Interpolate I at (xi, yj)
Ixy = 1 / ((bln.X[i + 1] - bln.X[i])*(bln.Y[j + 1] - bln.Y[j])) *
(
bln.I[W*j + i] * (bln.X[i + 1] - x) * (bln.Y[j + 1] - y) +
bln.I[W*j + i + 1] * (x - bln.X[i]) * (bln.Y[j + 1] - y) +
bln.I[W*(j + 1) + i] * (bln.X[i + 1] - x) * (y - bln.Y[j]) +
bln.I[W*(j + 1) + i + 1] * (x - bln.X[i]) * (y - bln.Y[j])
);
}
return Ixy;
}
EDIT: The function calls are below. 'flat.imgdata' is a std::vector containing the input image data and 'proj.imgdata' is a std::vector containing the transformed image.
int Xs = flat.dim[0];
int Ys = flat.dim[1];
int* Iarr = flat.imgdata.data();
float II, x, y;
bln = blnSetup(X, Y, Iarr);
for (int j = 0; j < flat.imgdata.size(); j++)
{
x = 1.2*X[j % Xs];
y = 1.2*Y[j / Xs];
II = bilinear(x, y, bln);
proj.imgdata[j] = (int)II;
}
Since I started optimizing, I have been able to reduce computation time by ~50x (!) by switching from std::vectors to C arrays within the interpolation function, and another 2x or so by cleaning up redundant computations/typecasting/etc, but assuming O(n) with n being the total number of processed pixels, the full reconstruction (~7e10 pixels) should still take 40min or so--about an order of magnitude longer than my goal of <5min.
According to Visual Studio's performance profiler, the interpolation function call ("II = bilinear(x, y, bln);") is unsurprisingly still the majority of my computation load. I haven't been able to find any linear algebraic methods for fast multiple interpolation, so my question is: is this basically as fast as my code will get, short of applying more or faster CPUs to the task? Or is there a different approach that might speed things up?
P.S. I've also only been coding in C++ for about a month now, so feel free to point out any beginner mistakes I might be making.
I wrote up a long answer suggesting looking at OpenCV (opencv.org), or using Halide (http://halide-lang.org/), and getting into how image warping is optimized, but I think a shorter answer might serve better. If you are really just scaling and translating entire images, OpenCV has code to do that and we have an example for resizing in Halide as well (https://github.com/halide/Halide/blob/master/apps/resize/resize.cpp).
If you really have an algorithm that needs to index an image using floating-point coordinates which result from a computation that cannot be turned into a moderately simple function on integer coordinates, then you really want to be using filtered texture sampling on a GPU. Most techniques for optimizing on the CPU rely on exploiting some regular pattern of access in the algorithm and removing float to integer conversion from the addressing. (For resizing, one uses two integer variables, one which indexes the pixel coordinate of the image and the other which is the fractional part of the coordinate and it indexes a kernel of weights.) If this is not possible, the speedups are somewhat limited on CPUs. OpenCV does provide fairly general remapping support, but it likely isn't all that fast.
Two optimizations that may be applicable here are trying to move the boundary condition out the loop and using a two pass approach in which the horizontal and vertical dimensions are processed separately. The latter may or may not win and will require tiling the data to fit in cache if the images are very large. Tiling in general is pretty important for large images, but it isn't clear it is the first order performance problem here and depending on the values in the inputs, the cache behavior may not be regular enough anyway.
"vector 50x slower than array". That's a dead giveaway you're in debug mode, where vector::operator[] is not inlined. You will probably get the necessary speedup, and a lot more, simply by switching to release mode.
As a bonus, vector has a .back() method, so you have a proper replacement for that [-1]. Pointers to the begin of an array don't contain the array size, so you can't find the back of an array that way.
I need to calculate the value of a high dimensional integral in C++. I have found numerous libraries capable of solving this task for fixed limit integrals,
\int_{0}^{L} \int_{0}^{L} dx dy f(x,y) .
However the integrals which I am looking at have variable limits,
\int_{0}^{L} \int_{x}^{L} dx dy f(x,y) .
To clarify what i mean, here is a naive 2D Riemann sum implementation in 2D, which returns the desired result,
int steps = 100;
double integral = 0;
double dl = L/((double) steps);
double x[2] = {0};
for(int i = 0; i < steps; i ++){
x[0] = dl*i;
for(int j = i; j < steps; j ++){
x[1] = dl*j;
double val = f(x);
integral += val*val*dl*dl;
}
}
where f is some arbitrary function and L the common upper integration limit. While this implementation works, it's slow and thus impractical for higher dimensions.
Effective algorithms for higher dimensions exist, but to my knowledge, library implementations (e.g. Cuba) take a fixed value vector as the limit argument which renders them useless for my problem.
Is there any reason for this and/or is there any trick to circumvent the problem?
Your integration order is wrong, should be dy dx.
You are integrating over the triangle
0 <= x <= y <= L
inside the square [0,L]x[0,L]. This can be simulated by integrating over the full square where the integrand f is defined as 0 outside of the triangle. In many cases, when f is defined on the full square, this can be accomplished by taking the product of f with the indicator function of the triangle as new integrand.
When integrating over a triangular region such as 0<=x<=y<=L one can take advantage of symmetry: integrate f(min(x,y),max(x,y)) over the square 0<=x,y<=L and divide the result by 2. This has an advantage over extending f by zero (the method mentioned by LutzL) in that the extended function is continuous, which improves the performance of the integration routine.
I compared these on the example of the integral of 2x+y over 0<=x<=y<=1. The true value of the integral is 2/3. Let's compare the performance; for demonstration purpose I use Matlab routine, but this is not specific to language or library used.
Extending by zero
f = #(x,y) (2*x+y).*(x<=y);
result = integral2(f, 0, 1, 0, 1);
fprintf('%.9f\n',result);
Output:
Warning: Reached the maximum number of function evaluations
(10000). The result fails the global error test.
0.666727294
Extending by symmetry
g = #(x,y) (2*min(x,y)+max(x,y));
result2 = integral2(g, 0, 1, 0, 1)/2;
fprintf('%.9f\n',result2);
Output:
0.666666776
The second result is 500 times more accurate than the first.
Unfortunately, this symmetry trick is not available for general domains; but integration over a triangle comes up often enough so it's useful to keep it in mind.
I was a bit confused by your integral definition but from your code i see it like this:
just did some testing so here is your code:
//---------------------------------------------------------------------------
double f(double *x) { return (x[0]+x[1]); }
void integral0()
{
double L=10.0;
int steps = 10000;
double integral = 0;
double dl = L/((double) steps);
double x[2] = {0};
for(int i = 0; i < steps; i ++){
x[0] = dl*i;
for(int j = i; j < steps; j ++){
x[1] = dl*j;
double val = f(x);
integral += val*val*dl*dl;
}
}
}
//---------------------------------------------------------------------------
Here is optimized code:
//---------------------------------------------------------------------------
void integral1()
{
double L=10.0;
int i0,i1,steps = 10000;
double x[2]={0.0,0.0};
double integral,val,dl=L/((double)steps);
#define f(x) (x[0]+x[1])
integral=0.0;
for(x[0]= 0.0,i0= 0;i0<steps;i0++,x[0]+=dl)
for(x[1]=x[0],i1=i0;i1<steps;i1++,x[1]+=dl)
{
val=f(x);
integral+=val*val;
}
integral*=dl*dl;
#undef f
}
//---------------------------------------------------------------------------
results:
[ 452.639 ms] integral0
[ 336.268 ms] integral1
so the increase in speed is ~ 1.3 times (on 32bit app on WOW64 AMD 3.2GHz)
for higher dimensions it will multiply
but still I think this approach is slow
The only thing to reduce complexity I can think of is algebraically simplify things
either by integration tables or by Laplace or Z transforms
but for that the f(*x) must be know ...
constant time reduction can of course be done
by the use of multi-threading
and or GPU ussage
this can give you N times speed increase
because this is all directly parallelisable
I have the following problem. Suppose you have a big array of Manhattan polygons on the plane (their sides are parallel to x or y axis). I need to find a polygons, placed closer than some value delta. The question - is how to make this in most effective way, because the number of this polygons is very large. I will be glad if you will give me a reference to implemented solution, which will be easy to adapt for my case.
The first thing that comes to mind is the sweep and prune algorithm (also known as sort and sweep).
Basically, you first find out the 'bounds' of each shape along each axis. For the x axis, these would be leftmost and rightmost points on a shape. For the y axis, the topmost and bottommost.
Lets say you have a bound structure that looks something like this:
struct Bound
{
float value; // The value of the bound, ie, the x or y coordinate.
bool isLower; // True for a lower bound (leftmost point or bottommost point).
int shapeIndex; // The index (into your array of shapes) of the shape this bound is on.
};
Create two arrays of these Bounds, one for the x axis and one for the y.
Bound xBounds* = new Bound[2 * numberOfShapes];
Bound yBounds* = new Bound[2 * numberOfShapes];
You will also need two more arrays. An array that tracks on how many axes each pair of shapes is close to one another, and an array of candidate pairs.
int closeAxes* = new int[numberOfShapes * numberOfShapes];
for (int i = 0; i < numberOfShapes * numberOfShapes; i++)
CloseAxes[i] = 0;
struct Pair
{
int shapeIndexA;
int shapeIndexB;
};
Pair candidatePairs* = new Pair[numberOfShapes * numberOfShape];
int numberOfPairs = 0;
Iterate through your list of shapes and fill the arrays appropriately, with one caveat:
Since you're checking for closeness rather than intersection, add delta to each upper bound.
Then sort each array by value, using whichever algorithm you like.
Next, do the following (and repeat for the Y axis):
for (int i = 0; i + 1 < 2 * numberOfShapes; i++)
{
if (xBounds[i].isLower && xBounds[i + 1].isLower)
{
unsigned int L = xBounds[i].shapeIndex;
unsigned int R = xBounds[i + 1].shapeIndex;
closeAxes[L + R * numberOfShapes]++;
closeAxes[R + L * numberOfShapes]++;
if (closeAxes[L + R * numberOfShapes] == 2 ||
closeAxes[R + L * numberOfShapes] == 2)
{
candidatePairs[numberOfPairs].shapeIndexA = L;
candidatePairs[numberOfPairs].shapeIndexB = R;
numberOfPairs++;
}
}
}
All the candidate pairs are less than delta apart on each axis. Now simply check each candidate pair to make sure they're actually less than delta apart. I won't go into exactly how to do that at the moment because, well, I haven't actually thought about it, but hopefully my answer will at least get you started. I suppose you could just check each pair of line segments and find the shortest x or y distance, but I'm sure there's a more efficient way to go about the 'narrow phase' step.
Obviously, the actual implementation of this algorithm can be a lot more sophisticated. My goal was to make the explanation clear and brief rather than elegant. Depending on the layout of your shapes and the sorting algorithm you use, a single run of this is approximately between O(n) and O(n log n) in terms of efficiency, as opposed to O(n^2) to check every pair of shapes.