Performance issues with simple calculations

EDIT 2: 16% decrease in program computation time! See bottom for calculation
I have written an N-body simulator that implements the Barnes-Hut algorithm. It has an innocent-looking function called CheckNode. It's simple and doesn't take long to compute, but it gets called millions of times, so it takes up most of the calculation time between frames.
I profiled the code, and this function is responsible for 84.58% of the total calculation time. That is with only 10K particles; with up to 10x as many, the function takes up an even greater percentage.
Here is the function, with the percentage of time spent shown on the right in red.
There are some alarming things here, like a simple if statement taking 9.17% and another if statement accounting for over 20% of computation time! Is there any optimisation, even the slightest, that can be done here and that, multiplied over millions of function calls, would allow my program to run faster?
EDIT:
Here is the CalculateForceNode function:
void CalculateForceNode(Body* bi, Node* bj) //bi is being attracted to bj. 15 flops of calculation
{
//vector from the body to the center of mass
double vectorx = bj->CenterOfMassx - bi->posX;
double vectory = bj->CenterOfMassy - bi->posY;
//c^2 = a^2 + b^2 + softener^2
double distSqr = vectorx * vectorx + vectory * vectory + Softener * Softener;
// invDistCube = 1/distSqr^(3/2)
double distSixth = distSqr * distSqr * distSqr;
double invDistCube = 1.0 / (sqrt(distSixth));
double Accel = (bj->TotalMass * invDistCube * _GRAV_CONST);
bi->AccelX += vectorx * Accel;
bi->AccelY += vectory * Accel;
}
EDIT 2:
Results of optimisations
The CheckNode function now takes up 82.03% of the total computation time (measured over a 1 min 37 sec sample), whereas previously it took up 84.58%.
Logic tells us that the remaining ~15% of the first program's calculation time is the same absolute time as the remaining ~18% of the second program's, since it is the same code. Letting the time to complete this other code be x, the first program took x/0.15 = 6.666x and the second took x/0.18 = 5.555x. 5.555x is a fraction of about 0.83 of 6.666x (1 - 0.833 ≈ 0.167), so there was a decrease of roughly 16 to 17% in program computation time!
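The arithmetic above can be checked with a small helper. This is only a sketch: the name relativeRuntimeDecrease is made up for illustration, and it relies on the same assumption as the post, namely that the code outside CheckNode takes the same absolute time in both runs. Fed the unrounded profiler shares (0.8458 and 0.8203) instead of 15% and 18%, it gives a decrease closer to 14%.

```cpp
#include <cassert>

// Fraction by which total runtime shrank, given the share of time spent in
// the hot function before and after optimisation. Assumes the rest of the
// program costs the same absolute time in both runs.
double relativeRuntimeDecrease(double hotShareBefore, double hotShareAfter)
{
    assert(hotShareBefore >= 0.0 && hotShareBefore < 1.0);
    assert(hotShareAfter >= 0.0 && hotShareAfter < 1.0);
    const double otherBefore = 1.0 - hotShareBefore;
    const double otherAfter = 1.0 - hotShareAfter;
    // total = otherTime / otherShare, so totalAfter / totalBefore
    // equals otherBefore / otherAfter.
    return 1.0 - otherBefore / otherAfter;
}
```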

First thing I would try is to reverse the elements in one of your conditions, replace:
if(withSqr / distanceSqr < nodeThresholdSqr || pNode->HasChildren == false)
with:
if(pNode->HasChildren == false || (withSqr / distanceSqr < nodeThresholdSqr))
If the first part of the condition, pNode->HasChildren == false, is true, then the second part (withSqr / distanceSqr < nodeThresholdSqr) will never be executed (read: evaluated). Checking a simple condition is much faster than floating-point operations (a division in your case). You can even take it to the next level: do you need to compute distanceSqr AT ALL when pNode->HasChildren == false?
EDIT: even better:
if(pNode->HasChildren == false)
{
CalculateForceNode(pBody,pNode);
}
else
{
double distanceSqr = ((diffX * diffX) + (diffY * diffY));
double withSqr = pNode->width * pNode->width;
if(withSqr / distanceSqr < nodeThresholdSqr)
{
CalculateForceNode(pBody,pNode);
}
else
{//if not, repeat function with child
if(pNode->Child[0]->Bodies.size() > 0)
CheckNode(pNode->Child[0],pBody);
//..... - all the rest of your code
}
}
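To make the short-circuit behaviour of || visible, here is a minimal sketch that counts how often the expensive comparison actually runs. The OpeningTest struct and its member names are hypothetical and exist only for this illustration; they are not part of the original simulator.

```cpp
#include <cassert>

// Counts how often the "expensive" test runs, to make the short-circuit visible.
struct OpeningTest {
    int ratioChecks = 0;

    // Stand-in for the width^2 / distance^2 comparison from the question.
    bool farEnough(double widthSqr, double distanceSqr, double thresholdSqr)
    {
        assert(distanceSqr > 0.0);
        ++ratioChecks;
        return widthSqr / distanceSqr < thresholdSqr;
    }

    // Cheap flag first: for a leaf, || never evaluates its right-hand side.
    bool useNodeDirectly(bool hasChildren, double widthSqr,
                         double distanceSqr, double thresholdSqr)
    {
        return !hasChildren || farEnough(widthSqr, distanceSqr, thresholdSqr);
    }
};
```

For a leaf (hasChildren == false) the counter stays at zero, so neither the division nor the comparison is executed.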

Profiling based on time spent is not enough; you need to know what that time was spent on. In other words, use a more advanced profiler.
You also don't give any information about the compiler or platform you are using.
For the if statement that uses 9% of the time, I don't think the time is spent in the comparison; it is spent fetching data. You have multiple levels of indirection (accessing data through a pointer that takes you to another pointer, and so on). This is bad for caching and branch prediction, and I guess you are spending the time fetching data from memory or doing useless calculations because of branch misprediction, not doing the actual comparison.
Another thing I noticed: if (pNode->HasChildren == false), you don't need any of the calculations you made to find widthSqr. I think you should restructure your logic to check for this first; if the condition is false, you can then calculate widthSqr and continue with your logic.

Since the function is called so many times, you should get rid of the overhead of calling CalculateForceNode(...) by manually inlining the code. Once you do this you will notice other tricks to apply:
void CheckNode(Node* pNode, Body* pBody)
{
double diffX = (pNode->CenterOfMassx - pBody->posX);
double diffY = (pNode->CenterOfMassy - pBody->posY);
double distanceSqr = ((diffX * diffX) + (diffY * diffY));
double widthSqr = pNode->width * pNode->width;
if (widthSqr / distanceSqr < NodeThresholdSqr || pNode->hasChildren == false)
{
//vector from the body to the center of mass
double vectorx = pNode->CenterOfMassx - pBody->posX;
double vectory = pNode->CenterOfMassy - pBody->posY;
//c^2 = a^2 + b^2 + softener^2
double distSqr = vectorx * vectorx + vectory * vectory + Softener * Softener;
// ivnDistCube = 1/distSqr^(3/2)
double distSixth = distSqr * distSqr * distSqr;
double invDistCube = 1.0f / (sqrt(distSixth));
double Accel = (pNode->TotalMass * invDistCube * _GRAV_CONST);
pBody->AccelX += vectorx * Accel;
pBody->AccelY += vectory * Accel;
}
else
{
CheckChildren(pNode, pBody);
}
}
Now you can see that diffX = vectorx, diffY = vectory and distSqr = distanceSqr + Softener*Softener. Reusing some of the calculations already made and precomputing whatever is possible should save you some cycles:
void CheckNode(Node* pNode, Body* pBody)
{
double diffX = (pNode->CenterOfMassx - pBody->posX);
double diffY = (pNode->CenterOfMassy - pBody->posY);
double distanceSqr = ((diffX * diffX) + (diffY * diffY));
double widthSqr = pNode->width * pNode->width;
double SoftnerSq = Softener * Softener; //precompute this value
if (widthSqr / distanceSqr < NodeThresholdSqr || pNode->hasChildren == false)
{
//c^2 = a^2 + b^2 + softener^2
double distSqr = distanceSqr + SoftnerSq;
// ivnDistCube = 1/distSqr^(3/2)
double distSixth = distSqr * distSqr * distSqr;
double invDistCube = 1.0f / (sqrt(distSixth));
double Accel = (pNode->TotalMass * invDistCube * _GRAV_CONST);
pBody->AccelX += diffX * Accel;
pBody->AccelY += diffY * Accel;
}
else
{
CheckChildren(pNode, pBody);
}
}
Hope this works for you.

First, you should inline the Bodies.size() call or access the size directly, so there is no function call overhead (it takes time to push all the needed information onto the stack and pop it off again).
I can't see all the code, but it looks like you can precalculate widthSqr. It can be calculated when the width is assigned rather than inside this function.
You are using a lot of pointers here and it looks like your structures are scattered all over memory. This will generate a lot of CPU cache misses. Make sure that all the data needed for the computation is in one long, contiguous and compact memory area.
In CalculateForceNode, check whether Softener*Softener can be precalculated. The sqrt function is very time consuming. The sqrt algorithm is iterative, so you can sacrifice accuracy for speed by doing fewer iterations, or you can use lookup tables.
You are doing the same calculations twice in CalculateForceNode.
void CalculateForceNode(Body* bi, Node* bj)
{
//vector from the body to the center of mass
double vectorx = bj->CenterOfMassx - bi->posX;
double vectory = bj->CenterOfMassy - bi->posY;
//c^2 = a^2 + b^2 + softener^2
double distSqr = vectorx * vectorx + vectory * vectory...
vectorx, vectory and distSqr (apart from the softener term) were already calculated in CheckNode as diffX, diffY and distanceSqr. Manually inline the whole CalculateForceNode function.
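A rough sketch of two of these suggestions follows: caching widthSqr when the width is assigned, and computing 1/distSqr^(3/2) with one multiplication and one sqrt instead of cubing first. The NodeSketch struct and the function names are hypothetical and cover only the fields needed here. The division-free comparison is an extra rearrangement that is not in the original code: it multiplies both sides by distanceSqr, which is never negative.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical, trimmed-down node: only the fields this sketch needs.
struct NodeSketch {
    double width = 0.0;
    double widthSqr = 0.0;   // cached whenever width changes

    void setWidth(double w)
    {
        width = w;
        widthSqr = w * w;
    }
};

// Same decision as widthSqr / distanceSqr < thresholdSqr, but multiplied
// through by distanceSqr, so no division is needed.
bool isFarEnough(const NodeSketch& node, double distanceSqr, double thresholdSqr)
{
    return node.widthSqr < thresholdSqr * distanceSqr;
}

// 1 / distSqr^(3/2), written as 1 / (distSqr * sqrt(distSqr)).
double invDistCube(double distSqr)
{
    assert(distSqr > 0.0);
    return 1.0 / (distSqr * std::sqrt(distSqr));
}
```

Whether any of this is measurably faster depends on the compiler and the memory layout, so it is worth profiling again after each change.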

Swap your if statement around so that the distance calculations are only done when the node has children (the pNode->hasChildren != false branch):
void CheckChildren(Node* pNode, Body* pBody)
{
if (pNode->Child[0]->Bodies.size() > 0)
CheckNode(...
}
void CheckNode(Node* pNode, Body* pBody)
{
if (pNode->hasChildren != false)
{
double diffX = (pNode->CenterOfMassx - pBody->posX);
double diffY = (pNode->CenterOfMassy - pBody->posY);
double distanceSqr = ((diffX * diffX) + (diffY * diffY));
double widthSqr = pNode->width * pNode->width;
if (widthSqr / distanceSqr < NodeThresholdSqr)
{
CalculateForceNode(pBody, pNode);
}
else
{
CheckChildren(pNode, pBody);
}
}
else
{
CalculateForceNode(pBody, pNode); // leaf node: no children to descend into
}
}

Related

Batch gradient descent algorithm does not converge

I'm trying to implement the batch gradient descent algorithm for my machine learning homework. I have a training set whose x values are around 10^3 and whose y values are around 10^6. I'm trying to find the values of [theta0, theta1] for which y = theta0 + theta1 * x fits the data. I set the learning rate to 0.0001 and the maximum number of iterations to 10. Here's my code in Qt.
QVector<double> gradient_descent_batch(QVector<double> x, QVector<double>y)
{
QVector<double> theta(0);
theta.resize(2);
int size = x.size();
theta[1] = 0.1;
theta[0] = 0.1;
for (int j=0;j<MAX_ITERATION;j++)
{
double dJ0 = 0.0;
double dJ1 = 0.0;
for (int i=0;i<size;i++)
{
dJ0 += (theta[0] + theta[1] * x[i] - y[i]);
dJ1 += (theta[0] + theta[1] * x[i] - y[i]) * x[i];
}
double theta0 = theta[0];
double theta1 = theta[1];
theta[0] = theta0 - LRATE * dJ0;
theta[1] = theta1 - LRATE * dJ1;
if (qAbs(theta0 - theta[0]) < THRESHOLD && qAbs(theta1 - theta[1]) < THRESHOLD)
return theta;
}
return theta;
}
I print the value of theta on every iteration, and here's the result.
QVector(921495, 2.29367e+09)
QVector(-8.14503e+12, -1.99708e+16)
QVector(7.09179e+19, 1.73884e+23)
QVector(-6.17475e+26, -1.51399e+30)
QVector(5.3763e+33, 1.31821e+37)
QVector(-4.68109e+40, -1.14775e+44)
QVector(4.07577e+47, 9.99338e+50)
QVector(-3.54873e+54, -8.70114e+57)
QVector(3.08985e+61, 7.57599e+64)
QVector(-2.6903e+68, -6.59634e+71)
It seems that theta will never converge.
I followed the solution here and set the learning rate to 0.00000000000001 and the maximum number of iterations to 20, but it still does not seem to converge. Here's the result.
QVector(0.100092, 0.329367)
QVector(0.100184, 0.558535)
QVector(0.100276, 0.787503)
QVector(0.100368, 1.01627)
QVector(0.10046, 1.24484)
QVector(0.100552, 1.47321)
QVector(0.100643, 1.70138)
QVector(0.100735, 1.92936)
QVector(0.100826, 2.15713)
QVector(0.100918, 2.38471)
QVector(0.101009, 2.61209)
QVector(0.1011, 2.83927)
QVector(0.101192, 3.06625)
QVector(0.101283, 3.29303)
QVector(0.101374, 3.51962)
QVector(0.101465, 3.74601)
QVector(0.101556, 3.9722)
QVector(0.101646, 4.1982)
QVector(0.101737, 4.424)
QVector(0.101828, 4.6496)
What's wrong?

Firstly, your algorithm seems fine, except that you should divide the gradient sums by size:
theta[0] = theta0 - LRATE * dJ0 / size;
theta[1] = theta1 - LRATE * dJ1 / size;
I would also suggest calculating the cost function and monitoring it:
J(theta0, theta1) = (1 / (2 * size)) * sum over i of (theta0 + theta1 * x[i] - y[i])^2
Your cost should decrease on every iteration. If it bounces back and forth, your learning rate is too large. I would suggest using 0.01 and doing 400 iterations.
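As a minimal sketch of monitoring the cost, the two functions below compute the usual squared-error cost and perform one averaged batch update. They use std::vector rather than QVector so the snippet is self-contained, and the names cost and gradientStep are made up for this example. Note that with x values around 10^3, a rate such as 0.01 is likely still too large unless x is rescaled first; printing the cost each iteration is the way to find out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Squared-error cost: J = (1 / (2 * m)) * sum((theta0 + theta1 * x[i] - y[i])^2)
double cost(const std::vector<double>& x, const std::vector<double>& y,
            double theta0, double theta1)
{
    assert(!x.empty() && x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double err = theta0 + theta1 * x[i] - y[i];
        sum += err * err;
    }
    return sum / (2.0 * static_cast<double>(x.size()));
}

// One batch update, with the gradient averaged over the training set.
void gradientStep(const std::vector<double>& x, const std::vector<double>& y,
                  double rate, double& theta0, double& theta1)
{
    assert(!x.empty() && x.size() == y.size());
    double dJ0 = 0.0;
    double dJ1 = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double err = theta0 + theta1 * x[i] - y[i];
        dJ0 += err;
        dJ1 += err * x[i];
    }
    const double m = static_cast<double>(x.size());
    theta0 -= rate * dJ0 / m;
    theta1 -= rate * dJ1 / m;
}
```

Calling cost after every gradientStep and checking that the value keeps falling is a quick way to tell a learning rate that is too large from one that is merely slow.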

How can i input this formula into my c++ program?

Wind Chill = 35.74 + 0.6215T - 35.75(V^0.16) + 0.4275T(V^0.16)
I need the correct way to input the above formula into my program. I currently have the following and it's giving me a crazy number:
WindChill = ((35.74 + (0.6215 * temperature))
- (35.75 * pow(windSpeed, 0.16))
+ (0.4275 * temperature * pow(windSpeed, 0.16)));
I am a beginner programmer, C++ is my first language I am learning so I would appreciate any and all help. Thank you.

You can simplify by removing parentheses.
double wind_chill = 35.74 + 0.6215 * T - 35.75 * pow(V, 0.16) + 0.4275 * T * pow(V, 0.16);
But in this case you calculate the power twice. A better way is:
double pow_v = pow(V, 0.16);
double wind_chill = 35.74 + 0.6215 * T - 35.75 * pow_v + 0.4275 * T * pow_v;
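Wrapped into a function, the same calculation might look like the sketch below. The function and parameter names are illustrative. It assumes the formula is the US National Weather Service wind chill formula, which takes the temperature in degrees Fahrenheit and the wind speed in miles per hour; feeding it Celsius or km/h values would be one way to end up with a "crazy number".

```cpp
#include <cassert>
#include <cmath>

// Wind chill from temperature (assumed degrees Fahrenheit) and wind speed
// (assumed miles per hour), computing the power term only once.
double windChill(double temperatureF, double windSpeedMph)
{
    assert(windSpeedMph >= 0.0);
    const double windFactor = std::pow(windSpeedMph, 0.16);
    return 35.74 + 0.6215 * temperatureF - 35.75 * windFactor
           + 0.4275 * temperatureF * windFactor;
}
```

For example, windChill(30.0, 10.0) comes out at roughly 21.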

Try this. If you are using your own power function, then rather than calling it again and again you can store the result in a variable. That is good for efficiency as well as readability.
double windPower = pow(windSpeed, 0.16);
WindChill = (35.74 + (0.6215 * temperature) - (35.75 * windPower) + (0.4275 * temperature * windPower));
And your power function (if you want to define it) goes like this:
int power(int x, unsigned int y)
{
if (y == 0)
return 1;
int half = power(x, y / 2); // compute the half power only once
if (y % 2 == 0)
return half * half;
else
return x * half * half;
}
This is for integers only (as that was what I could test quickly), so it cannot handle a fractional exponent such as 0.16.

If you compute by hand, using the same data, does it give you the same crazy number, or the correct answer? If you get the same crazy number, maybe you need to convert some of your inputs to the units the formula expects, e.g. this form of the wind chill formula takes the temperature in degrees Fahrenheit and the wind speed in miles per hour.
(This should have been a comment, but I don't have enough rep yet...)

Collision detection using the Pythagorean theorem is being unreliable?

Objects can often pass through each other. Additionally, when calculating momentum, the sprites will occasionally form blobs upon collision, moving together instead of bouncing off.
The code does work for most collisions, but it often fails. Any ideas?
xV = X velocity. yV = Y velocity. Every frame these velocity values are added to the X and Y positions of the quad.
bool Quad::IsTouching(Quad &q)
{
float distance = 0;
float combinedRadius = (size/2) + (q.GetSize()/2);
distance = sqrt(pow(q.GetX() - GetX(), 2) + pow(q.GetY() - GetY(), 2));
if(distance < combinedRadius)
{
return true;
}
return false;
}
void Quad::Collide(Quad &q)
{
float mX, mY, mX2, mY2, mXTmp, mYTmp;
mX = mass * xV;
mY = mass * yV;
mXTmp = mX;
mYTmp = mY;
mX2 = q.GetMass() * q.GetxV();
mY2 = q.GetMass() * q.GetyV();
mX = mX2;
mY = mY2;
mX2 = mXTmp;
mY2 = mYTmp;
xV = mX/mass;
yV = mY/mass;
q.SetxV(mX2/q.GetMass());
q.SetyV(mY2/q.GetMass());
}

I had the same issue and here is a quick video I made to demonstrate the problem.
The way to solve this is to calculate the exact time of the collision, so that the particles move for the remaining part of the iteration/time-step with their new velocities. To do this you have to check whether there will be a collision before updating the positions, i.e. whether sqrt((x1 - x2 + dt * (vx1 - vx2))^2 + (y1 - y2 + dt * (vy1 - vy2))^2) <= distance.
You might also be able to get away with a simpler solution, in which you move both objects slightly so that they are no longer colliding. This is less accurate but needs fewer calculations:
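float dx = x1 - x2;
float dy = y1 - y2;
float d = sqrt(dx * dx + dy * dy); // current distance between the centers
float D = r1 + r2; // distance at which the objects just touch
if(d < D && d > 0)
{
float s = (D - d) / (2 * d); // positive, so each object moves out by half the overlap
x1 = x1 + s * dx;
x2 = x2 - s * dx;
y1 = y1 + s * dy;
y2 = y2 - s * dy;
}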
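The simpler approach can be sketched as a self-contained function. The Circle struct is a hypothetical stand-in for the Quad class, and the scale factor is written as (D - d) / (2 * d) so that it is positive and the centers move apart by half of the overlap each.

```cpp
#include <cassert>
#include <cmath>

struct Circle { double x, y, r; };  // hypothetical stand-in for the Quad class

// Pushes two overlapping circles apart along the line between their centers,
// each by half of the overlap. Returns true if anything was moved.
bool separate(Circle& a, Circle& b)
{
    assert(a.r >= 0.0 && b.r >= 0.0);
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    const double d = std::sqrt(dx * dx + dy * dy);
    const double D = a.r + b.r;
    if (d >= D || d == 0.0)   // not overlapping, or no defined direction
        return false;
    const double s = (D - d) / (2.0 * d);
    a.x += s * dx;
    a.y += s * dy;
    b.x -= s * dx;
    b.y -= s * dy;
    return true;
}
```

After a successful call the centers are exactly r1 + r2 apart, so the next IsTouching-style check with a strict < comparison no longer reports an overlap.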

What type of collisions are you referring to? Elastic or inelastic? For an elastic collision, the code would fail, and you would have to create an additional property to prevent the two objects from sticking together on contact. You would also have to ensure, with a loop or if statement, that if one object is crossing the position of another object at the same time as the other object, that the two will separate with an angle proportional to the collision speed. Use the appropriate physics formulae.

One potential issue, which I can only deduce because no values for the velocities, sizes, etc. were supplied: you are not accounting for the case where the quads are exactly touching, that is, distance == combinedRadius. When this is true the check fails, and the objects continue moving on the next tick... right through each other.
Change your check to distance <= combinedRadius. In addition, you may simply be getting a tunneling effect, because the objects are small enough and moving fast enough that they pass through each other within a single tick. There are multiple ways to fix this, some of which are: impose a maximum velocity and a minimum size; increase your frame rate for physics checks; use continuous collision checks instead of discrete checks (see the Wikipedia article on the subject).
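A continuous check for two circles moving at constant velocity during one tick can be sketched as below. This is an illustration under stated assumptions, not the code from any particular engine: the function name and parameters are made up, positions and velocities are passed relative to the second object, and the first contact time is found by solving the quadratic |p + v*t|^2 = R^2.

```cpp
#include <cassert>
#include <cmath>

// Returns the time in [0, dt] at which two circles first touch, or -1.0 if
// they do not touch within this step. Inputs are relative quantities:
// (px, py) = center1 - center2, (vx, vy) = velocity1 - velocity2.
double timeOfImpact(double px, double py, double vx, double vy,
                    double combinedRadius, double dt)
{
    assert(combinedRadius >= 0.0 && dt >= 0.0);
    const double c = px * px + py * py - combinedRadius * combinedRadius;
    if (c <= 0.0)
        return 0.0;                       // already touching or overlapping
    const double a = vx * vx + vy * vy;
    const double b = px * vx + py * vy;   // negative when approaching
    if (a == 0.0 || b >= 0.0)
        return -1.0;                      // no relative motion, or moving apart
    const double disc = b * b - a * c;
    if (disc < 0.0)
        return -1.0;                      // the paths miss each other
    const double t = (-b - std::sqrt(disc)) / a;   // earlier of the two roots
    return (t <= dt) ? t : -1.0;
}
```

With objects 10 units apart, a combined radius of 2, a closing speed of 200 and dt = 0.1, the end-of-tick positions are 10 units apart again on the other side, so a discrete check sees nothing, while this function reports contact at t = 0.04.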

Box2D & Cocos2d-x finding future location of a body

I'm in the process of making a mobile game and am having trouble predicting the future location of a body/sprite pair. The method I am using to do this is below.
1 --futureX is passed in as the cocos2d sprite's location, found using CCSprite.getPosition().x
2 -- I am using b2Body values for acceleration and velocity, so I must correct the futureX coordinate by dividing it by PTM_RATIO (defined elsewhere)
3 -- the function solves for the time it will take for the b2Body to reach the futureX position (based on its x-direction acceleration and velocity) and then uses that time to determine where the futureY position for the body should be. I multiply by PTM_RATIO at the end because the coordinate is meant to be used for creating another sprite.
4 -- when solving for time I have two cases: one with x acceleration != 0 and one for x acceleration == 0.
5 -- I am using the quadratic formula and kinematic equations to solve my equations.
Unfortunately, the sprite I'm creating is not where I expect it to be. It ends up in the correct x location, however, the Y location is always too large. Any idea why this could be? Please let me know what other information is helpful here, or if there is an easier way to solve this!
float Sprite::getFutureY(float futureX)
{
b2Vec2 vel = this->p_body->GetLinearVelocity();
b2Vec2 accel = p_world->GetGravity();
//we need to solve a quadratic equation:
// --> 0 = 1/2(accel.x)*(time^2) + vel.x(time) - deltaX
float a = accel.x/2;
float b = vel.x;
float c = this->p_body->GetPosition().x - futureX/PTM_RATIO;
float t1;
float t2;
//if Acceleration.x is not 0, solve quadratically
if(a != 0 ){
t1 = (-b + sqrt( b * b - 4 * a * c )) / (2 * a);
t2 = (-b - sqrt( b * b - 4 * a * c )) / (2 * a);
//otherwise, solve linearly
}else{
t2 = -1;
t1 = (c/b)*(-1);
}
//now that we know how long it takes to get to posX, we can tell the y location on the sprites path
float time;
if(t1 >= 0){
time = t1;
}else{
time = t2;
}
float posY = this->p_body->GetPosition().y;
float futureY = (posY + (vel.y)*time + (1/2)*accel.y*(time*time))*PTM_RATIO;
return futureY;
}

SOLVED:
The issue was in this line:
float futureY = (posY + (vel.y)*time + (1/2)*accel.y*(time*time))*PTM_RATIO;
I needed to explicitly make the 1/2 a float, because (1/2) with two integer operands is integer division. I fixed it with this:
float futureY = (posY + (vel.y)*time + (float).5*accel.y*(time*time))*PTM_RATIO;
Note that otherwise the term with the acceleration evaluated to zero.
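The effect is easy to reproduce in isolation. In this sketch (function names are illustrative) the first version uses the floating-point literal 0.5f, while the second keeps (1 / 2), where both operands are ints and the result is therefore 0.

```cpp
#include <cassert>

// y(t) = y0 + v*t + 0.5*a*t^2, written with a floating-point literal.
float positionAfter(float y0, float velocity, float accel, float time)
{
    assert(time >= 0.0f);
    return y0 + velocity * time + 0.5f * accel * (time * time);
}

// The same formula with (1 / 2): integer division yields 0, so the
// acceleration term vanishes.
float positionAfterIntDivision(float y0, float velocity, float accel, float time)
{
    return y0 + velocity * time + (1 / 2) * accel * (time * time);
}
```

With y0 = 0, velocity = 0, accel = -10 and time = 2, the first function returns -20 and the second returns 0.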

Sinusoid and derivative function

I have an object which moves along a sinusoid. I have to animate it each time it reaches the top (or the bottom) of the "wave". I want to do this with the derivative: as we know, it changes sign (from positive to negative or vice versa) at those points. So the code is:
// Start value
int functionValue = +1;
// Function
float y = k1 * sinf(k2 * Deg2Rad(x)) + y_base;
// Derivative function
float tempValue = -cosf(y);
// Check whether value is changed
if (tempValue * functionValue < 0)
{
animation = true;
}
functionValue = tempValue;
If I output tempValue, it shows strange numbers:
0.851513
0.997643
0.0242145
0.690432
0.326303
-0.614262
0.892036
0.1348
0.709843
0.968676
0.0454846
0.920602
-0.423125
0.692132
-0.960107
0.0799654
-0.747722
-0.635241
0.148477
-0.98611
0.900912
-0.877801
0.811632
-0.362743
-0.233856
0.35512
-0.994107
0.885184
-0.468005
0.982489
0.675337
0.661048
0.870765
0.0312914
-0.319066
-0.602956
-0.996169
-0.95627
And the animation is strange too: it triggers not only at the top of the wave. What's wrong here?

You have
y = a * sin(b * x) + c
derivative of that is
y' = a * b * cos(b * x)
not
y' = -cos(y)

You're doing your math wrong. The derivative of sin(x) is cos(x), not cos(sin(x)).
It should be
float tempValue = cosf(k2 * Deg2Rad(x));
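Putting the corrected derivative together with the sign check, a sketch could look like this. The class name, degToRad (standing in for the Deg2Rad helper from the question) and the member names are assumptions made for the example. The previous sign is kept as its own value rather than assigning the float derivative to an int, since that assignment truncates every value between -1 and 1 to 0 and loses the sign.

```cpp
#include <cassert>
#include <cmath>

// Degrees to radians; stands in for the Deg2Rad helper from the question.
double degToRad(double degrees)
{
    return degrees * 3.14159265358979323846 / 180.0;
}

// Tracks the sign of the derivative and reports when it flips, which is when
// the object has just passed a top or a bottom of the wave.
class ExtremumDetector {
public:
    explicit ExtremumDetector(double k2) : k2_(k2) { assert(k2 != 0.0); }

    // Call once per frame with the current x (in degrees).
    bool update(double xDegrees)
    {
        // Only the sign matters, so the constant factor k1 * k2 is left out.
        const double slope = std::cos(k2_ * degToRad(xDegrees));
        if (slope == 0.0)
            return false;              // exactly on the extremum: decide next frame
        const int sign = slope > 0.0 ? 1 : -1;
        const bool flipped = (lastSign_ != 0 && sign != lastSign_);
        lastSign_ = sign;
        return flipped;
    }

private:
    double k2_;
    int lastSign_ = 0;   // 0 until the first non-zero slope has been seen
};
```

Stepping x from 0 to 720 degrees in steps of one degree with k2 = 1, update reports a flip four times, once for each top or bottom at 90, 270, 450 and 630 degrees.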
