Absolute value in objective function of linear optimization - C++

I'm trying to find the solution for the following expression
Objective function:
minimize(| x - c0 | + | y - c1 |)
Constraint:
0 < x < A
0 < y < B
where c0, c1, A, B are positive constants
Following the conversion given in
http://lpsolve.sourceforge.net/5.1/absolute.htm
I reworded the expression to
Constraints:
(x - c0) <= xbar
-1 *(x - c0) <= xbar
(y - c1) <= ybar
-1 *(y - c1) <= ybar
0 < x < A
0 < y < B
Objective function:
minimize(xbar + ybar)
However, I'm not able to implement this.
I tried the following snippet
#include "ortools/linear_solver/linear_solver.h"
#include "ortools/linear_solver/linear_expr.h"
MPSolver solver("distanceFinder", MPSolver::GLOP_LINEAR_PROGRAMMING);
MPVariable* x = solver.MakeNumVar(0, A, "x");
MPVariable* y = solver.MakeNumVar(0, B, "y");
const LinearExpr e = x;
const LinearExpr f = y;
LinearExpr X;
LinearExpr Y;
LinearRange Z = slope * e + offset == f; // Where 'slope' & 'offset' are real numbers.
solver.MakeRowConstraint(Z);
const LinearRange r = -1 * (e - c0) <= X;
const LinearRange s = (e - c0) <= X;
const LinearRange m = -1 * (f - c1) <= Y;
const LinearRange k = (f - c1) <= Y ;
solver.MakeRowConstraint(r);
solver.MakeRowConstraint(s);
solver.MakeRowConstraint(m);
solver.MakeRowConstraint(k);
MPObjective* const objective = solver.MutableObjective();
objective->MinimizeLinearExpr(X+Y);
I'm getting the error,
E0206 16:41:08.889048 80935 linear_solver.cc:1577] No solution exists. MPSolverInterface::result_status_ = MPSOLVER_INFEASIBLE
My use cases always produce feasible solutions (I'm trying to find the least Manhattan distance between a point and a line).
I'm very new to Google OR-Tools. Please suggest any simpler solution I might have overlooked.
Any help will be appreciated
Thanks,
Ram

Here is a working example. You mixed up variables in your code
const double A = 10.0;
const double B = 8.0;
const double c0 = 6.0;
const double c1 = 3.5;
MPSolver solver("distanceFinder", MPSolver::GLOP_LINEAR_PROGRAMMING);
MPVariable* x = solver.MakeNumVar(0, A, "x");
MPVariable* y = solver.MakeNumVar(0, B, "y");
MPVariable* xbar = solver.MakeNumVar(0, A, "xbar");
MPVariable* ybar = solver.MakeNumVar(0, B, "ybar");
LinearExpr X(x);
LinearExpr Y(y);
const LinearRange r = -1 * (X - c0) <= xbar;
const LinearRange s = (X - c0) <= xbar;
const LinearRange m = -1 * (Y - c1) <= ybar;
const LinearRange k = (Y - c1) <= ybar;
solver.MakeRowConstraint(r);
solver.MakeRowConstraint(s);
solver.MakeRowConstraint(m);
solver.MakeRowConstraint(k);
MPObjective *const objective = solver.MutableObjective();
objective->MinimizeLinearExpr(LinearExpr(xbar) + LinearExpr(ybar));
It computes
x = 6
y = 3.5
xbar = 0
ybar = -0


Matrix out of bounds but can't put condition to check bounds

I have a square matrix, 40 x 40, and a draw circle function that uses this formula.
I have another function that reads input from a file, the point itself (x0, y0) and the type of circle (0 or 1) and the radius.
void cerc(int x0, int y0, int r, int** matriceHarta, int tip, int n, int m)
{
    if (r == 0)
        return;
    int x, y, xx, rr;
    for (rr = r * r, x = -r; x <= r; x++)
        for (xx = x * x, y = -r; y <= r; y++)
            if (xx + (y * y) <= rr && matriceHarta[x0 + x][y0 + y] == 0)
            {
                if (tip == 0)
                    matriceHarta[x0 + x][y0 + y] = -5;
                else if (tip == 1)
                    matriceHarta[x0 + x][y0 + y] = -6;
            }
}
N and M are the rows and columns, but right now they are both equal.
The matrix is allocated dynamically and is transmitted via the int** matriceHarta parameter.
If I put the point at (39, 39) and give it radius 5, the program returns a negative exit code, which I found out is an out-of-bounds related error. I looked over the for loops and it makes sense that that would be the error, so I tried to add the condition if ((x0 + x) < n && (y0 + y) < m) to check the bounds, but it still gives the error.
Question is, what am I doing wrong? For contrast, point(37, 4) with radius = 2 is OK, but point(38, 4) with radius = 2 is not OK
This is the attempted fix:
for (rr = r * r, x = -r; x <= r; x++)
    for (xx = x * x, y = -r; y <= r; y++)
        if (xx + (y * y) <= rr && matriceHarta[x0 + x][y0 + y] == 0
            && (((x0+x) < n) && ((y0+y) < m)))
            // ^^^^^ this is the condition I was talking about
        {
            if (tip == 0)
                matriceHarta[x0 + x][y0 + y] = -5;
            else if (tip == 1)
                matriceHarta[x0 + x][y0 + y] = -6;
        }
The issue is that you are testing for the out-of-bounds condition after you have already accessed potential out-of-bounds elements.
Let's break it down into separate lines:
if (xx + (y * y) <= rr && matriceHarta[x0 + x][y0 + y] == 0
&& // <-- This binds the conditions
(((x0+x) < n) && ((y0+y) < m)))
The line above the && marked with <-- is evaluated before the line below the <--.
In summary, the logical && is always evaluated from left-to-right, where the right side will not be evaluated if the left side evaluates to false (short-circuit boolean evaluation).
Thus the fix is to test the bounds condition first (swap the lines in the code above).
However, to make this a little more clear, you could break up the statement into two if statements:
if (x0+x < n && y0+y < m)
{
    if (xx + (y * y) <= rr && matriceHarta[x0 + x][y0 + y] == 0)
    {
        ...
    }
}

Compare roots of quadratic functions

I need a function to quickly compare a root of a quadratic function with a given value, and a function to quickly compare the roots of two quadratic functions.
I wrote the first function:
bool isRootLessThanValue (bool sqrtDeltaSign, int a, int b, int c, int value) {
    bool ret;
    if (sqrtDeltaSign) {
        if (a < 0) {
            ret = (2*a*value + b < 0) || (a*value*value + b*value + c > 0);
        } else {
            ret = (2*a*value + b > 0) && (a*value*value + b*value + c > 0);
        }
    } else {
        if (a < 0) {
            ret = (2*a*value + b < 0) && (a*value*value + b*value + c < 0);
        } else {
            ret = (2*a*value + b > 0) || (a*value*value + b*value + c < 0);
        }
    }
    return ret;
};
When I try to write this for the second function it grows very big and complicated...
bool isRoot1LessThanRoot2 (bool sqrtDeltaSign1, int a1, int b1, int c1, bool sqrtDeltaSign2, int a2, int b2, int c2) {
//...
}
Do you have any suggestions how I can simplify this function?
If you think this is a stupid idea for an optimization, please tell me why :)
Here is a simplified version of the first part of your code, comparing the greater root of the quadratic function with a given value:
#include <iostream>
#include <cmath> // for main testing
bool isRootLessThanValue (int a, int b, int c, int value)
{
    if (a < 0) { b *= -1; c *= -1; a *= -1; }  // normalise so that a > 0
    int xt, delta;
    xt = 2 * a * value + b;
    if (xt < 0) return false; // value is left of the vertex
    delta = b*b - 4*a*c;
    // compare the squared distance between value and the root
    return (xt * xt) > delta;
}
In the test main() program, the roots are first calculated, for clarity:
int main()
{
    int a, b, c, v;
    a = -2;
    b = 4;
    c = 3;
    double r1, r2, r, dt;
    dt = std::sqrt(b*b - 4.0*a*c);
    r1 = (-b + dt) / (2.0*a);
    r2 = (-b - dt) / (2.0*a);
    r = (r1 > r2) ? r1 : r2;
    while (1)
    {
        std::cout << "Input the try value = ";
        std::cin >> v;
        if (isRootLessThanValue(a,b,c,v)) std::cout << v << " > " << r << std::endl;
        else std::cout << v << " < " << r << std::endl;
    }
    return 0;
}
A test run
The following assumes that both quadratics have real, mutually distinct roots, and a1 = a2 = 1. This keeps the notations simpler, though similar logic can be used in the general case.
Suppose f(x) = x^2 + b1 x + c1 has the real roots u1 < u2, and g(x) = x^2 + b2 x + c2 has the real roots v1 < v2. Then there are 6 possible sort orders.
(1)   u1 < u2 < v1 < v2
(2)   u1 < v1 < u2 < v2
(3)   u1 < v1 < v2 < u2
(4)   v1 < u1 < u2 < v2
(5)   v1 < u1 < v2 < u2
(6)   v1 < v2 < u1 < u2
Let v be a root of g so that g(v) = v^2 + b2 v + c2 = 0 then v^2 = -b2 v - c2 and therefore f(v) = (b1 - b2) v + c1 - c2 = b12 v + c12 where b12 = b1 - b2 and c12 = c1 - c2.
It follows that Sf = f(v1) + f(v2) = b12(v1 + v2) + 2 c12 and Pf = f(v1) f(v2) = b12^2 v1 v2 + b12 c12 (v1 + v2) + c12^2. Using Vieta's relations v1 v2 = c2 and v1 + v2 = -b2, in the end Sf = f(v1) + f(v2) = -b12 b2 + 2 c12 and Pf = f(v1) f(v2) = b12^2 c2 - b12 c12 b2 + c12^2. Similar expressions can be calculated for Sg = g(u1) + g(u2) and Pg = g(u1) g(u2).
(It should be noted that Sf, Pf, Sg, Pg above are arithmetic expressions in the coefficients, not involving square roots. There is, however, the potential for integer overflow. If that is an actual concern, then the calculations would have to be done in floating point instead of integers.)
If Pf = f(v1) f(v2) < 0 then exactly one root of f is between the roots v1, v2 of g.
If the axis of f is to the left of the axis of g, meaning -b1 < -b2, then it is the larger root u2 of f that lies between v1, v2, i.e. case (2).
Otherwise, if -b1 > -b2, it is the smaller root u1, i.e. case (5).
If Pf = f(v1) f(v2) > 0 then either both or none of the roots of f are between the roots of g. In this case f(v1) and f(v2) must have the same sign, and they will either be both negative if Sf = f(v1) + f(v2) < 0 or both positive if Sf > 0.
If f(v1) < 0 and f(v2) < 0 then both roots v1, v2 of g are between the roots of f i.e. case (3).
By symmetry, if Pg > 0 and Sg < 0 then g(u1) < 0 and g(u2) < 0, so both roots u1, u2 of f are between the roots of g i.e. case (4).
Otherwise the last combination left is f(v1), f(v2) > 0 and g(u1), g(u2) > 0 where the intervals (u1, u2) and (v1, v2) do not overlap. If -b1 < -b2 the axis of f is to the left of the g one i.e. case (1) else it's case (6).
Once the sort order between all roots is determined, comparing any particular pair of roots follows.
We are definitely talking micro-optimization here, but consider making calculations before performing the comparison:
bool isRootLessThanValue (bool sqrtDeltaSign, int a, int b, int c, int value)
{
    const int a_value = a * value;
    const int two_a_b_value = 2 * a_value + b;
    const int a_squared_b = a_value * value + b * value + c;
    const bool two_ab_less_zero = (two_a_b_value < 0);
    bool ret = false;
    if (sqrtDeltaSign)
    {
        const bool a_squared_b_greater_zero = (a_squared_b > 0);
        if (a < 0)
        {
            ret = two_ab_less_zero || a_squared_b_greater_zero;
        }
        else
        {
            ret = !two_ab_less_zero && a_squared_b_greater_zero; // (edited)
        }
    }
    else
    {
        const bool a_squared_b_less_zero = (a_squared_b < 0);
        if (a < 0)
        {
            ret = two_ab_less_zero && a_squared_b_less_zero;
        }
        else
        {
            ret = !two_ab_less_zero || a_squared_b_less_zero; // (edited)
        }
    }
    return ret;
};
Another note: each boolean expression is calculated and stored in a variable, so it can compile down to a plain data-processing instruction (depending on the compiler and processor).
Compare the assembly language of this function to yours. Also benchmark. As I said, I'm not expecting much time savings here, but I don't know how many times this function is called in your code.
I'm reorganising my code and have found some simplifications :)
When calculating a, b and c I can keep a structure that guarantees a > 0 :)
and I know in advance whether I want the small or the big root :)
so the function to compare a root to a value reduces to the form below
bool isRootMinLessThanValue (int a, int b, int c, int value) {
    const int a_value = a * value;
    const int u = 2*a_value + b;
    const int v = a_value*value + b*value + c;
    return u > 0 || v < 0;
};
bool isRootMaxLessThanValue (int a, int b, int c, int value) {
    const int a_value = a*value;
    const int u = 2*a_value + b;
    const int v = a_value*value + b*value + c;
    return u > 0 && v > 0;
}
When I benchmark it, it is faster than calculating the roots traditionally (because of the assumptions I cannot say by how much).
Below is code for the fast (and the slow, traditional) comparison of a root to a value, without the assumptions:
bool isRootLessThanValue (bool sqrtDeltaSign, int a, int b, int c, int value) {
    const int a_value = a*value;
    const int u = 2*a_value + b;
    const int v = a_value*value + b*value + c;
    const bool s = sqrtDeltaSign;
    return ( a < 0  &&  s && u < 0         ) ||
           ( a < 0  &&  s && v > 0         ) ||
           ( a < 0  && !s && u < 0 && v < 0) ||
           (!(a < 0) && !s && u > 0        ) ||
           (!(a < 0) && !s && v < 0        ) ||
           (!(a < 0) &&  s && u > 0 && v > 0);
};
bool isRootLessThanValueTraditional (bool sqrtDeltaSign, int a, int b, int c, int value) {
    double delta = b*b - 4.0*a*c;
    double calculatedRoot = sqrtDeltaSign ? (-b + sqrt(delta))/(2.0*a) : (-b - sqrt(delta))/(2.0*a);
    return calculatedRoot < value;
};
benchmark results below:
isRootLessThanValue (optimized): 10000000000 compares in 152.922s
isRootLessThanValueTraditional : 10000000000 compares in 196.168s
Any suggestions how I can simplify the isRootLessThanValue function even more? :)
I will try to prepare the function to compare two roots of different equations.
Edited 2020-11-30:
bool isRootLessThanValue (bool sqrtDeltaSign, int a, int b, int c, int value) {
    const int a_value = a*value;
    const int u = 2*a_value + b;
    const int v = a_value*value + b*value + c;
    return sqrtDeltaSign ?
        (( a < 0 && (u < 0 || v > 0) ) || (u > 0 && v > 0)) :
        (( a > 0 && (u > 0 || v < 0) ) || (u < 0 && v < 0));
};

How to extract variables from an equation?

I'm studying some code and I would like help with some math. I'm trying to find the equation of the tangent line to a circle at a given point of tangency.
//(x1 - p)(x - p) +(y1 - q)(y - q) = r^2 I understand this formula
//variables
//x1 = point.x
//y1 = point.y
//p = center.x
//q = center.y
//r = radius
Edit: here is the whole function, maybe it will help. My teacher gave it to me to study, but maybe he is trolling me :D
const std::pair<double, double> Arc::tangentEquation(const glm::vec3& center, const glm::vec3& pointA, float radius) const {
    if (radius <= 0.0f)
        throw std::domain_error("Radius can't be negative or 0");
    // Equation of the tangent at point T
    // (x1 - p)(x - p) + (y1 - q)(y - q) = r^2
    glm::vec3 point = pointA + center;
    double px = -1 * (center.x * point.x);
    double qy = -1 * (center.y * point.y);
    double x = point.x - center.x;
    double y = point.y - center.y;
    double k = 0.0;
    double l = (pow(radius, 2) - (px + pow(center.x, 2) + qy + pow(center.y, 2)));
    if (y == 0) { // parallel with the x axis
        k = l / x;
        l = 0;
    } else if (x == 0) { // parallel with the y axis
        l = l / y;
        k = 0;
    } else {
        k = -x / y;
        l = l / y;
    }
    return std::pair<double, double>(k, l);
}
The code does not implement the formula on the first line, so I don't think it is strange that you don't understand :-)
(x1 - p)(x - p) + (y1 - q)(y - q)
If we write out all the terms in the parenthesis multiplication, we get:
x1*x - p*x - p*x1 + p^2 + y1*y - q*y - q*y1 + q^2
(https://www.youtube.com/watch?v=3s_lroR5_1U for a very pedagogical explanation)
But your code loses half of these terms...?

Point within a triangle: barycentric co-ordinates

I'm solving a classic problem of determining whether a point is within a triangle, and I'm using the barycentric co-ordinates method.
For some reason (I think it's the logic, not the precision) it doesn't pass all the tests.
What could be wrong?
The code is this:
#include <iostream>
using namespace std;
struct point
{
    int x;
    int y;
};
bool Place(point &A, point &B, point &C, point &P)
{
    double det = (B.y - C.y)*(A.x - C.x) + (C.x - B.x)*(A.y - C.y);
    double factor_alpha = (B.y - C.y)*(P.x - C.x) + (C.x - B.x)*(P.y - C.y);
    double factor_beta = (C.y - A.y)*(P.x - C.x) + (A.x - C.x)*(P.y - C.y);
    double alpha = factor_alpha / det;
    double beta = factor_beta / det;
    double gamma = 1.0 - alpha - beta;
    bool In = false;
    if (((A.x == P.x) & (A.y == P.y)) | ((B.x == P.x) & (B.y == P.y)) | ((C.x == P.x) & (C.y == P.y)))
        In = true; // the sneaky guys are trying to see if a vertex of the triangle belongs to it;
                   // the problem statement says it does
    if ((alpha == 0) | (beta == 0) | (gamma == 0))
        In = true; // the point is on an edge of the triangle
    if (((0 < alpha) & (alpha < 1)) & ((0 < beta) & (beta < 1)) & ((0 < gamma) & (gamma < 1)))
        In = true; // in this case P is actually within the triangle area
    return In;
}
int main()
{
    point A, B, C, P;
    cin >> A.x >> A.y >> B.x >> B.y >> C.x >> C.y >> P.x >> P.y;
    Place(A, B, C, P) ? cout << "In\n" : cout << "Out\n";
    return 0;
}
Your logic says that the point is on the edge if at least one of alpha, beta, or gamma is 0.
That's necessary but not sufficient; the other ones must also be in the interval [0, 1].
Since you're not interested in the "edge" case specifically, you could write
if (0 <= alpha && alpha <= 1 && 0 <= beta && beta <= 1 && 0 <= gamma && gamma <= 1)
In = true;
(I removed some brackets and replaced the bitwise & with the logical &&.)
Readability suggestion:
Introducing a couple of functions makes the code look more like a mathematical definition:
bool operator ==(const point& a, const point& b)
{
    return a.x == b.x && a.y == b.y;
}
bool within(double x)
{
    return 0 <= x && x <= 1;
}
bool Place(const point &A, const point &B, const point &C, const point &P)
{
    double det = (B.y - C.y)*(A.x - C.x) + (C.x - B.x)*(A.y - C.y);
    double factor_alpha = (B.y - C.y)*(P.x - C.x) + (C.x - B.x)*(P.y - C.y);
    double factor_beta = (C.y - A.y)*(P.x - C.x) + (A.x - C.x)*(P.y - C.y);
    double alpha = factor_alpha / det;
    double beta = factor_beta / det;
    double gamma = 1.0 - alpha - beta;
    return P == A || P == B || P == C || (within(alpha) && within(beta) && within(gamma));
}
Try this function (assumes that you have a Point class template where T is the stored type, and also that you have overloaded operator* for computing the dot product):
template <typename T>
bool is_point_in_triangle(const Point<3,T>& p,
                          const Point<3,T>& a,
                          const Point<3,T>& b,
                          const Point<3,T>& c) {
    typedef Point<3,T> point_type;
    point_type v0 = b-a, v1 = c-a, v2 = p-a;
    T d00 = v0*v0;
    T d01 = v0*v1;
    T d11 = v1*v1;
    T d20 = v2*v0;
    T d21 = v2*v1;
    T denom = d00*d11 - d01*d01;
    // compute parametric (barycentric) coordinates
    T v = (d11 * d20 - d01 * d21) / denom;
    T w = (d00 * d21 - d01 * d20) / denom;
    return v >= 0. && w >= 0. && v + w <= 1.;
}
As a side note, you're using int to store the coordinates, so your tests may be failing because of the truncation?
Good luck!

Optimizing C++ code for performance

Can you think of some way to optimize this piece of code? It's meant to execute on an ARMv7 processor (iPhone 3GS):
4.0% inline float BoxIntegral(IplImage *img, int row, int col, int rows, int cols)
{
0.7% float *data = (float *) img->imageData;
1.4% int step = img->widthStep/sizeof(float);
// The subtraction by one for row/col is because row/col is inclusive.
1.1% int r1 = std::min(row, img->height) - 1;
1.0% int c1 = std::min(col, img->width) - 1;
2.7% int r2 = std::min(row + rows, img->height) - 1;
3.7% int c2 = std::min(col + cols, img->width) - 1;
float A(0.0f), B(0.0f), C(0.0f), D(0.0f);
8.5% if (r1 >= 0 && c1 >= 0) A = data[r1 * step + c1];
11.7% if (r1 >= 0 && c2 >= 0) B = data[r1 * step + c2];
7.6% if (r2 >= 0 && c1 >= 0) C = data[r2 * step + c1];
9.2% if (r2 >= 0 && c2 >= 0) D = data[r2 * step + c2];
21.9% return std::max(0.f, A - B - C + D);
3.8% }
All this code is taken from the OpenSURF library. Here's the context of the function (some people were asking for the context):
//! Calculate DoH responses for supplied layer
void FastHessian::buildResponseLayer(ResponseLayer *rl)
{
    float *responses = rl->responses;         // response storage
    unsigned char *laplacian = rl->laplacian; // laplacian sign storage
    int step = rl->step;                      // step size for this filter
    int b = (rl->filter - 1) * 0.5 + 1;       // border for this filter
    int l = rl->filter / 3;                   // lobe for this filter (filter size / 3)
    int w = rl->filter;                       // filter size
    float inverse_area = 1.f/(w*w);           // normalisation factor
    float Dxx, Dyy, Dxy;
    for (int r, c, ar = 0, index = 0; ar < rl->height; ++ar)
    {
        for (int ac = 0; ac < rl->width; ++ac, index++)
        {
            // get the image coordinates
            r = ar * step;
            c = ac * step;
            // Compute response components
            Dxx = BoxIntegral(img, r - l + 1, c - b, 2*l - 1, w)
                - BoxIntegral(img, r - l + 1, c - l * 0.5, 2*l - 1, l)*3;
            Dyy = BoxIntegral(img, r - b, c - l + 1, w, 2*l - 1)
                - BoxIntegral(img, r - l * 0.5, c - l + 1, l, 2*l - 1)*3;
            Dxy = + BoxIntegral(img, r - l, c + 1, l, l)
                  + BoxIntegral(img, r + 1, c - l, l, l)
                  - BoxIntegral(img, r - l, c - l, l, l)
                  - BoxIntegral(img, r + 1, c + 1, l, l);
            // Normalise the filter responses with respect to their size
            Dxx *= inverse_area;
            Dyy *= inverse_area;
            Dxy *= inverse_area;
            // Get the determinant of hessian response & laplacian sign
            responses[index] = (Dxx * Dyy - 0.81f * Dxy * Dxy);
            laplacian[index] = (Dxx + Dyy >= 0 ? 1 : 0);
#ifdef RL_DEBUG
            // create list of the image coords for each response
            rl->coords.push_back(std::make_pair<int,int>(r,c));
#endif
        }
    }
}
Some questions:
Is it a good idea that the function is inline?
Would using inline assembly provide a significant speedup?
Specialize for the edges so that you don't need to check for them in every row and column. I assume that this call is in a nested loop and is called a lot. This function would become:
inline float BoxIntegralNonEdge(IplImage *img, int row, int col, int rows, int cols)
{
    float *data = (float *) img->imageData;
    int step = img->widthStep/sizeof(float);
    // The subtraction by one for row/col is because row/col is inclusive.
    int r1 = row - 1;
    int c1 = col - 1;
    int r2 = row + rows - 1;
    int c2 = col + cols - 1;
    float A(data[r1 * step + c1]), B(data[r1 * step + c2]),
          C(data[r2 * step + c1]), D(data[r2 * step + c2]);
    return std::max(0.f, A - B - C + D);
}
You get rid of a conditional and branch for each min and two conditionals and a branch for each if. You can only call this function if you already meet the conditions -- check that in the caller for the whole row once instead of each pixel.
I wrote up some tips for optimizing image processing when you have to do work on each pixel:
http://www.atalasoft.com/cs/blogs/loufranco/archive/2006/04/28/9985.aspx
Other things from the blog:
1. You are recalculating a position in the image data with 2 multiplies (indexing is multiplication) -- you should be incrementing a pointer.
2. Instead of passing in img, row, rows, col and cols, pass in pointers to the exact pixels to process -- which you get from incrementing pointers, not indexing.
3. If you don't do the above, step is the same for all pixels; calculate it in the caller and pass it in. If you do 1 and 2, you won't need step at all.
There are a few places to reuse temporary variables, but whether it would improve performance would have to be measured as dirkgently stated:
Change
if (r1 >= 0 && c1 >= 0) A = data[r1 * step + c1];
if (r1 >= 0 && c2 >= 0) B = data[r1 * step + c2];
if (r2 >= 0 && c1 >= 0) C = data[r2 * step + c1];
if (r2 >= 0 && c2 >= 0) D = data[r2 * step + c2];
to
if (r1 >= 0) {
    int r1Step = r1 * step;
    if (c1 >= 0) A = data[r1Step + c1];
    if (c2 >= 0) B = data[r1Step + c2];
}
if (r2 >= 0) {
    int r2Step = r2 * step;
    if (c1 >= 0) C = data[r2Step + c1];
    if (c2 >= 0) D = data[r2Step + c2];
}
You may actually end up doing the temporary multiplications too often if your if statements rarely evaluate to true.
You aren't interested in four variables A, B, C, D, but only the combination A - B - C + D.
Try
float result(0.0f);
if (r1 >= 0 && c1 >= 0) result += data[r1 * step + c1];
if (r1 >= 0 && c2 >= 0) result -= data[r1 * step + c2];
if (r2 >= 0 && c1 >= 0) result -= data[r2 * step + c1];
if (r2 >= 0 && c2 >= 0) result += data[r2 * step + c2];
if (result > 0.0f) return result;
return 0.0f;
The compiler probably handles inlining automatically where appropriate.
Without any knowledge about the context: is the if (r1 >= 0 && c1 >= 0) check necessary?
Isn't it required that the row and col parameters are > 0?
float BoxIntegral(IplImage *img, int row, int col, int rows, int cols)
{
    assert(row > 0 && col > 0);
    float *data = (float*)img->imageData; // Don't use C-style casts
    int step = img->widthStep/sizeof(float);
    // Is the min check really necessary?
    int r1 = std::min(row, img->height) - 1;
    int c1 = std::min(col, img->width) - 1;
    int r2 = std::min(row + rows, img->height) - 1;
    int c2 = std::min(col + cols, img->width) - 1;
    int r1_step = r1 * step;
    int r2_step = r2 * step;
    float A = data[r1_step + c1];
    float B = data[r1_step + c2];
    float C = data[r2_step + c1];
    float D = data[r2_step + c2];
    return std::max(0.0f, A - B - C + D);
}
Some of the examples say to initialize A, B, C and D directly and skip the initialization with 0, but this is functionally different from your original code in some ways. I would do this however:
inline float BoxIntegral(IplImage *img, int row, int col, int rows, int cols) {
    const float *data = (float *) img->imageData;
    const int step = img->widthStep/sizeof(float);
    // The subtraction by one for row/col is because row/col is inclusive.
    const int r1 = std::min(row, img->height) - 1;
    const int r2 = std::min(row + rows, img->height) - 1;
    const int c1 = std::min(col, img->width) - 1;
    const int c2 = std::min(col + cols, img->width) - 1;
    const float A = (r1 >= 0 && c1 >= 0) ? data[r1 * step + c1] : 0.0f;
    const float B = (r1 >= 0 && c2 >= 0) ? data[r1 * step + c2] : 0.0f;
    const float C = (r2 >= 0 && c1 >= 0) ? data[r2 * step + c1] : 0.0f;
    const float D = (r2 >= 0 && c2 >= 0) ? data[r2 * step + c2] : 0.0f;
    return std::max(0.f, A - B - C + D);
}
Like your original code, this will make A, B, C and D take their value either from data[] if the condition is true or 0.0f if it is false. Also, I would (as I have shown) use const wherever it is appropriate. Many compilers aren't able to improve code much based on const-ness, but it certainly can't hurt to give the compiler more information about the data it is operating on. Finally, I have reordered the r1/r2/c1/c2 variables to encourage reuse of the fetched width and height.
Obviously you would need to profile to determine if any of this is actually an improvement.
I am not sure if your problem lends itself to SIMD but this could potentially allow you to perform multiple operations on your image at once and give you a good performance improvement. I am assuming you are inlining and optimizing because you are performing the operation multiple times. Take a look at:
http://blogs.arm.com/software-enablement/coding-for-neon-part-1-load-and-stores/
http://blogs.arm.com/software-enablement/coding-for-neon-part-2-dealing-with-leftovers/
http://blogs.arm.com/software-enablement/coding-for-neon-part-3-matrix-multiplication/
http://blogs.arm.com/software-enablement/coding-for-neon-part-4-shifting-left-and-right/
Compilers do have some support for NEON if the correct flags are enabled, but you will probably need to roll your own.
Edit
To get compiler support for NEON you will need to use the compiler flag -mfpu=neon