I often use references to simplify the appearance of code:
vec3f& vertex = _vertices[index];
// Calculate the vertex position
vertex[0] = startx + col * colWidth;
vertex[1] = starty + row * rowWidth;
vertex[2] = 0.0f;
Will compilers recognize and optimize this so it is essentially the following?
_vertices[index][0] = startx + col * colWidth;
_vertices[index][1] = starty + row * rowWidth;
_vertices[index][2] = 0.0f;
Yes. This is a basic optimization that any modern (and even ancient) compiler will make.
In fact, I don't think it's really accurate to call what you've written an optimisation, since the most straightforward way to translate that to assembly involves a store to the _vertices address, plus index, plus {0,1,2} (multiplied by the appropriate sizes for things, of course).
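As a minimal sketch of the two forms being compared, assuming vec3f is just an array of three floats and the vertices live in a std::vector (both are assumptions, the question does not show these types; set_via_reference and set_via_index are hypothetical names):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: vec3f is modelled as a plain array of three floats.
using vec3f = std::array<float, 3>;

// Form 1: bind a reference once, then write through it.
void set_via_reference(std::vector<vec3f>& vertices, std::size_t index,
                       float x, float y) {
    assert(index < vertices.size());
    vec3f& vertex = vertices[index];
    vertex[0] = x;
    vertex[1] = y;
    vertex[2] = 0.0f;
}

// Form 2: index the container again on every line.
void set_via_index(std::vector<vec3f>& vertices, std::size_t index,
                   float x, float y) {
    assert(index < vertices.size());
    vertices[index][0] = x;
    vertices[index][1] = y;
    vertices[index][2] = 0.0f;
}
```

Both functions leave the container in the same state. For simple containers like these, an optimizing compiler would be expected to emit equivalent stores for either form, though that is worth confirming in the generated assembly for your own compiler and flags.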
In general though, modern compilers are amazing. Most optimizations you can think of are already implemented. You should always write your code in a way that emphasizes readability unless you know that one way has significant performance benefits for your code.
As a simple example, code like this:
int func() {
int x;
int y;
int z;
int a;
x = 5*5;
y = x;
z = y;
a = 100 * 100 * 100* 100;
return z;
}
Will be optimized to this:
int func() {
return 25;
}
Additionally, when the definition is visible at the call site, the compiler will typically inline the function so that no call is actually made. Instead, everywhere 'func()' appears will just be replaced with '25'.
This is just a simple example. There are many more complex optimizations a modern compiler implements.
Compilers will even do more clever stuff than this. Maybe they'll do
float * vertex = &_vertices[index][0];
*vertex++ = startx + col * colWidth;
*vertex++ = starty + row * rowWidth;
*vertex++ = 0.0f;
Or even other variations …
Depending on the types of your variables, what you've described is a pessimization.
If _vertices is a class type then your original form makes a single call to operator[] and reuses the returned reference. Your second form makes three separate calls. It can't necessarily be inferred that the returned reference will refer to the same object each time.
The cost of a reference is probably not material in comparison to repeated lookups in the original _vertices object.
Except in limited cases, the compiler cannot optimize out (or pessimize in) extra function calls, unless the change introduced is not detectable by a conforming program. Often this requires visibility of an inline definition.
This is known as the "as if" rule. So long as the code behaves as if the language rules have been followed exactly, the implementation may make any optimizations it sees fit.
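To make the point about repeated operator[] calls concrete, here is a small sketch. CountingVertices is a hypothetical container invented for illustration, and vec3f is assumed to be an array of three floats. Its operator[] has an observable side effect (a counter), so under the "as if" rule the compiler has to preserve the number of calls:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: vec3f is modelled as a plain array of three floats.
using vec3f = std::array<float, 3>;

// Hypothetical container whose operator[] counts how often it is called.
class CountingVertices {
public:
    explicit CountingVertices(std::size_t n) : data_(n) {}

    vec3f& operator[](std::size_t i) {
        assert(i < data_.size());
        ++lookups_;  // observable side effect of every lookup
        return data_[i];
    }

    int lookups() const { return lookups_; }

private:
    std::vector<vec3f> data_;
    int lookups_ = 0;
};

// Reference form: one lookup, then three writes through the reference.
int lookups_with_reference(CountingVertices& v, std::size_t index) {
    const int before = v.lookups();
    vec3f& vertex = v[index];
    vertex[0] = 1.0f;
    vertex[1] = 2.0f;
    vertex[2] = 0.0f;
    return v.lookups() - before;
}

// Indexed form: three separate lookups.
int lookups_with_indexing(CountingVertices& v, std::size_t index) {
    const int before = v.lookups();
    v[index][0] = 1.0f;
    v[index][1] = 2.0f;
    v[index][2] = 0.0f;
    return v.lookups() - before;
}
```

Here lookups_with_reference performs one lookup and lookups_with_indexing performs three. With a container whose operator[] is trivial and visible inline, the difference would likely disappear after optimization.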
Related
I wrote the following code to calculate the intersection points of two circles. The code is simple and fast enough. Not that I need more optimization, but I can think of ways to optimize this code more aggressively. For example, h/d and 1.0/d are each calculated twice (let's forget about compiler optimizations).
const std::array<point,2> intersect(const circle& a,
const circle& b) {
std::array<point,2> intersect_points;
const float d2 = squared_distance(a.center, b.center);
const float d = std::sqrt(d2);
const float r12 = std::pow(a.radious, 2);
const float r22 = std::pow(b.radious, 2);
const float l = (r12 - r22 + d2) / (2*d);
const float h = std::sqrt(r12 - std::pow(l,2));
const float termx1 = (1.0/d) * (b.center.x - a.center.x) + a.center.x;
const float termx2 = (h/d)*(b.center.y - a.center.y);
const float termy1 = (1.0/d) * (b.center.y - a.center.y) + a.center.y;
const float termy2 = (h/d)*(b.center.x - a.center.x);
intersect_points[0].x = termx1 + termx2;
intersect_points[0].y = termy1 - termy2;
intersect_points[1].x = termx1 - termx2;
intersect_points[1].y = termy1 + termy2;
return intersect_points;
}
My question is: how much can we trust C++ compilers (g++ here) to understand the code and optimize the final binary? Can g++ avoid doing 1.0/d twice? More precisely, I want to know where the line is. When should we leave fine tuning to the compiler, and when should we optimize by hand?
Popular compilers are pretty good at optimization nowadays.
It is very likely that the optimizer detects common expressions like 1.0/d, so don't worry about that one.
It is less certain that the optimizer replaces std::pow(x, 2) with x * x.
This depends on the exact function you use, the compiler version you are using and the optimization command line switches. So in this case, you're better off writing x * x.
It's hard to say how far an optimizer can go and when you as a human must take over; this depends on how "smart" the optimizer is. But as a rule of thumb, the compiler can only optimize things it can deduce from the lines of code.
Example:
The compiler will know that this term is always false: 1 == 2
But it can't know that this is always false as well: 1 == nextPrimeAfter(1), because for that it would need to know what the function nextPrimeAfter() does.
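A rough sketch of the "write x * x yourself" advice, together with computing a reciprocal once by hand. The names square, Scaled and scale_by_inverse are hypothetical helpers made up for this illustration and are not part of the question's code:

```cpp
#include <cassert>

// Hypothetical helper: squares a value with a plain multiplication
// instead of going through std::pow(x, 2).
inline float square(float x) { return x * x; }

// Hypothetical result type for the example below.
struct Scaled {
    float a;
    float b;
};

// Computes 1/d once and reuses it for both terms, so the code does not
// depend on the optimizer spotting the repeated division.
Scaled scale_by_inverse(float d, float p, float q) {
    assert(d != 0.0f);
    const float inv_d = 1.0f / d;
    return {p * inv_d, q * inv_d};
}
```

Note that multiplying by a precomputed reciprocal can round differently from dividing directly, so this is a deliberate choice by the programmer rather than something a compiler will necessarily do on its own under default floating-point settings.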
I have an example code like this, in which the literal 1 repeats several times.
foo(x - 1);
y = z + 1;
bar[1] = y;
Should I define a constant ONE, and replace the literals with it?
constexpr int ONE = 1;
foo(x - ONE);
y = z + ONE;
bar[ONE] = y;
Would this replacement bring any performance improvement and/or reduce machine code size, at the cost of reduced code readability? Would the number of times the literal repeats change the answer?
It will not bring you any performance/memory improvements. However, you should try to keep your code free of magic numbers. So, if a constant is repeated in several places in your code, and in all those places it is the same constant from a logical point of view, it would be better to make it a named constant.
Example:
const int numberOfParticles = 10; //This is just an example, it's better not to use global variables.
void processParticlesPair(int i, int j) {
for (int iteration = 0; iteration < 10; ++iteration) {
//Note that I didn't replace "10" in the line above, because it is not numberOfParticles
//but a number of iterations, so it is a different constant from a logical point of view.
//Do stuff
}
}
void displayParticles() {
for (int i = 0; i < numberOfParticles; ++i) {
for (int j = 0; j < numberOfParticles; ++j) {
if (i != j) {
processParticlesPair(i, j);
}
}
}
}
Depends. If you just have 1s in your code and you ask if you should replace them: DON'T. Keep your code clean. You will not gain any performance or memory advantages - even worse, you might increase build time.
If the 1, however, is a build-time parameter: Yes, please introduce a constant! But choose a better name than ONE!
Should I define a constant ONE, and replace the literals with it?
No, absolutely not. If you have a name that indicates the meaning of the number (e.g. NumberOfDummyFoos), if its value can change and you want to prevent having to update it in a dozen locations, then you can use a constant for that, but a constant ONE adds absolutely no value over a literal 1.
Would this replacement make any performance improvement and/or reduce machine code size in the favor of reducing code readability?
In any realistic implementation, it does not.
Replacing literals with named constants only makes sense if the meaning of the constant is special. Replacing 1 with ONE is just overhead in most cases and does not add any useful information for the reader, especially if it is used in different roles (an index, part of a calculation, etc.). If entry 1 of an array is somehow special, using a constant THE_SPECIAL_INDEX=1 would make sense.
For the compiler it usually does not make any difference.
In assembly, one constant value generally takes the same amount of memory as any other. Setting a constant value in your source code is more of a convenience for humans than an optimization.
In this case, using ONE in such a way is neither a performance nor a readability enhancement. That's why you've probably never seen it in source code before ;)
When I'm trying to optimize my code, I often run into a dilemma:
I have an expression like this:
int x = 5 + y * y;
int z = sqrt(12) + y * y;
Is it worth making a new integer variable to store y * y for the two instances, or should I just leave them alone?
int s = y* y;
int x = 5 + s;
int z = sqrt(12) + s;
If not, how many instances are needed to make it worthwhile?
Trying to optimize your code most often means giving the compiler permission (through flags) to do its own optimization. Trying to do it yourself will, more often than not, either be a waste of time (no improvement over the compiler) or worse.
In your specific example, I seriously doubt there is anything you can do to change the performance.
One of the older compiler optimisations is "common subexpression elimination" - in this case y * y is such a common subexpression.
It may still make sense to show a reader of the code that the expression only needs calculating once, but any compiler produced in the last ten years will calculate this perfectly fine without repeating the multiplication.
Trying to "beat the compiler at its own game" is often futile, and certainly needs measuring to ensure you get a better result than the compiler. Adding extra variables MAY cause the compiler to produce worse code, because it gets "confused", so it may not help at all.
And ALWAYS, when it comes to performance (or code size) results from varying optimizations, measure, measure again, and measure a third time to make sure you get the results you expect. It's not very easy to predict from looking at code which is faster and which is slower. But I'd definitely be surprised if y * y is calculated twice even with a low level of optimisation in your compiler.
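For reference, the two forms from the question can be written as a pair of functions that return the same values. This is only a sketch: compute_repeated and compute_hoisted are hypothetical names, and the bounds check assumes a 32-bit int:

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Form 1: y * y written out twice, as in the question.
std::pair<int, int> compute_repeated(int y) {
    // Assumption: 32-bit int; keep y small enough that y * y + 5 cannot overflow.
    assert(y >= -46340 && y <= 46340);
    int x = 5 + y * y;
    int z = static_cast<int>(std::sqrt(12.0) + y * y);
    return {x, z};
}

// Form 2: the common subexpression stored in a named variable by hand.
std::pair<int, int> compute_hoisted(int y) {
    assert(y >= -46340 && y <= 46340);
    const int s = y * y;
    int x = 5 + s;
    int z = static_cast<int>(std::sqrt(12.0) + s);
    return {x, z};
}
```

Both functions return identical results for any valid input. Whether they also compile to identical machine code is something to confirm by inspecting the output of your own compiler, but common subexpression elimination would be expected to make them equivalent with optimization enabled.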
You don't need a temporary variable:
int z = y * y;
int x = z + 5;
z = z + sqrt(12);
but the only way to be sure if this is (a) faster and (b) truly where you should focus your attention, is to use a profiler and benchmark your entire application.
If you have a method such as this:
float method(myClass foo)
{
return foo.PrivateVar() + foo.PrivateVar();
}
is it always faster/better to do this instead?:
float method(myClass foo)
{
float temp = foo.PrivateVar();
return temp + temp;
}
I know you're not supposed to put a call like foo.PrivateVar() in a for loop, for example, because it gets evaluated many times when you actually only need the value once (in some cases).
for (int i = 0; i < foo.PrivateInt(); i++)
{
//iterate through stuff with 'i'
}
From this I assumed I should change code like the first example into the second, but then people told me not to try to be smarter than the compiler, and that it could very well inline the calls.
I don't want to profile anything, I just want a few simple rules for good practice on this. I'm writing a demo for a job application and I don't want anyone to look at the code and see some rookie mistake.
That completely depends on what PrivateVar() is doing and where it's defined etc. If the compiler has access to the code in PrivateVar() and can guarantee that there are no side effects by calling the function it can do CSE which is basically what you've done in your second code example.
Exactly the same is true for your for loop. So if you want to be sure it's only evaluated once because it's a hugely expensive function (which also means that guaranteeing no side effects gets tricky even if there aren't any), write it explicitly.
If PrivateVar() is just a getter, write the clearer code - even if the compiler may not do CSE the performance difference won't matter in 99.9999% of all cases.
Edit: CSE stands for Common Subexpression Elimination and does exactly what the name says ;) The wiki page shows an example for a simple multiplication, but we can do this for larger code constructs just as well, for example a function call.
In all cases we have to guarantee that only evaluating the code once doesn't change the semantics, i.e. doing CSE for this code:
a = b++ * c + g;
d = b++ * c * d;
and changing it to:
tmp = b++ * c;
a = tmp + g;
d = tmp * d;
would obviously be illegal (for function calls this is obviously a bit more complex, but it's the same principle).
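The same principle for function calls can be sketched with a small hypothetical type. Counter, sum_two_calls, sum_cached and check_counter_example are all made up for this illustration. Because next() modifies state, calling it twice and calling it once are observably different, so a compiler may not merge the two calls:

```cpp
#include <cassert>

// Hypothetical type whose member function has a side effect.
struct Counter {
    int n = 0;
    int next() { return ++n; }  // modifies state on every call
};

// Two real calls: returns 1 + 2 on a fresh Counter (in either order).
int sum_two_calls(Counter& c) {
    return c.next() + c.next();
}

// Hand-written "CSE": one call, value reused. Returns 1 + 1 on a fresh Counter.
int sum_cached(Counter& c) {
    const int t = c.next();
    return t + t;
}

// Small self-check documenting the expected results.
void check_counter_example() {
    Counter a;
    assert(sum_two_calls(a) == 3);
    assert(a.n == 2);

    Counter b;
    assert(sum_cached(b) == 2);
    assert(b.n == 1);
}
```

If next() were a plain getter with a visible definition, the two versions would be interchangeable and the compiler would be free to pick either.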
I have code that looks like the following in VC++:
for (int i = (a - 1) * b; i < a * b && i < someObject->someFunction(); i++)
{
// ...
}
As far as I know compilers optimize all these arithmetic operations and they won't be executed on each loop, but I'm not sure if they can tell that the function above also returns the same value each time and it doesn't need to be called each time.
Is it better practice to save all calculations into variables, or to rely on compiler optimizations and keep the code more readable?
int start = (a - 1) * b;
int expra = a * b;
int exprb = someObject->someFunction();
for (int i = start; i < expra && i < exprb; i++)
{
// ...
}
Short answer: it depends. If the compiler can deduce that running someObject->someFunction() every time and caching the result once both produce the same effects, it is allowed (but not guaranteed) to do so. Whether this static analysis is possible depends on your program: specifically, what the static type of someObject is and what its dynamic type is expected to be, as well as what someFunction() actually does, whether it's virtual, and so on.
In general, if it only needs to be done once, write your code in such a way that it can only be done once, bypassing the need to worry about what the compiler might be doing:
int start = (a - 1) * b;
int expra = a * b;
int exprb = someObject->someFunction();
for (int i = start; i < expra && i < exprb; i++)
// ...
Or, if you're into being concise:
for (int i = (a - 1) * b, expra = a * b, exprb = someObject->someFunction();
i < expra && i < exprb; i++)
// ...
In my experience, the VC++ compiler won't optimize the function call out unless it can see the function implementation at the point of compiling the calling code. So moving the call outside the loop is a good idea.
If a function resides within the same compilation unit as its caller, the compiler can often deduce some facts about it - e.g. that its output might not change for subsequent calls. In general, however, that is not the case.
In your example, assigning variables for these simple arithmetic expressions does not really change anything with regards to the produced object code and, in my opinion, makes the code less readable. Unless you have a bunch of long expressions that cannot reasonably be put within a line or two, you should avoid using temporary variables - if for no other reason, then just to reduce namespace pollution.
Using temporary variables implies a significant management overhead for the programmer, in order to keep them separate and avoid unintended side-effects. It also makes reusing code snippets harder.
On the other hand, assigning the result of the function to a variable can help the compiler optimise your code better by explicitly avoiding multiple function calls.
Personally, I would go with this:
int expr = someObject->someFunction();
for (int i = (a - 1) * b; i < a * b && i < expr; i++)
{
// ...
}
The compiler cannot make any assumption about whether your function will return the same value each time. Imagine that your object is a socket: how could the compiler possibly know what its output will be?
Also, the optimization that a compiler can make in such loops strongly depends on whether a and b are declared as const or not, and whether or not they are local. With advanced optimization schemes, it may be able to infer that a and b are modified neither in the loop nor in your function (again, you might imagine that your object holds some reference to them).
Well, in short: go for the second version of your code!
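A small sketch of the difference, using a hypothetical Source type whose someFunction() counts how often it is called. Source and both loop functions are invented for this illustration:

```cpp
#include <cassert>

// Hypothetical object: someFunction() records every call it receives.
struct Source {
    explicit Source(int l) : limit(l) {}
    int limit;
    int calls = 0;
    int someFunction() { ++calls; return limit; }
};

// The loop as written in the question: the call sits in the loop condition.
int loop_calling_each_time(Source& obj, int a, int b) {
    int iterations = 0;
    for (int i = (a - 1) * b; i < a * b && i < obj.someFunction(); i++) {
        ++iterations;
    }
    return iterations;
}

// The second version: everything is evaluated once, before the loop.
int loop_hoisted(Source& obj, int a, int b) {
    int iterations = 0;
    const int start = (a - 1) * b;
    const int expra = a * b;
    const int exprb = obj.someFunction();
    for (int i = start; i < expra && i < exprb; i++) {
        ++iterations;
    }
    return iterations;
}

// Small self-check documenting the expected call counts.
void check_loop_example() {
    Source each(10);
    assert(loop_calling_each_time(each, 2, 3) == 3);
    assert(each.calls == 3);  // once per iteration; skipped when i < a * b fails

    Source once(10);
    assert(loop_hoisted(once, 2, 3) == 3);
    assert(once.calls == 1);
}
```

Because the call counter is an observable effect, the compiler has to keep every call in the first version. With a side-effect-free function whose body it can see, it might hoist the call itself, but the second version does not depend on that.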
It is very likely that the compiler will call the function each time.
If you are concerned with the readability of code, what about using:
int maxindex = std::min(expra, exprb);
for (int i = start; i < maxindex; i++)
IMHO, long lines do not improve readability.
Writing short lines and taking multiple steps to get a result does not impact performance; this is exactly why we use compilers.
Effectively what you might be asking is whether the compiler will inline the function someFunction() and whether it will see that someObject is the same instance in each loop, and if it does both it will potentially "cache" the return value and not keep re-evaluating it.
Much of this may depend on what optimisation settings you use, with VC++ as well as any other compiler, although I am not sure VC++ gives you quite as many flags as gnu.
I often find it incredible that programmers rely on compilers to optimise things they can easily optimise themselves. If you know the expression will evaluate the same each time, just move it to the first section of the for-loop and don't rely on the compiler:
for (int i = (a - 1) * b, iMax = someObject->someFunction();
i < a * b && i < iMax; ++i)
{
// body
}