Cache locality vs function calls - C++

I have a function which does a task; let's call this function F(). Now, I need to do this task n times, where n is sufficiently small. I can think of doing two things:
// Code here...
Code-for-function-F()
Code-for-function-F()
.
.
.
Code-for-function-F()
// following code

or

// Code here...
for (int i = 0; i < n; ++i)
    F();
// following code
In the first case, I avoid function call overheads. But since the code is repeated n times, the code can be rather large, which would lead to worse cache locality/performance. In the second case, the cache would be better utilized, but it results in overheads due to function calls. I was wondering if someone has done an analysis on which of the two is the better approach.
PS: I understand that the actual answer might depend on what code profiling tells me, but is there a theoretically better approach between the two? I am using C++ on Linux.

There is no one-size-fits-all answer when the question is which code is faster. You have to measure it.
However, the optimizations you have in mind, loop unrolling and function inlining, are techniques that the compiler is really good at. It is rare that applying them explicitly in your code helps the compiler perform better optimizations. I would rather worry about preventing such compiler optimizations by writing unnecessarily clever code.
If you have a concrete example, I suggest you take a look at godbolt. It is a nice tool that helps you see the effect of variations in the code on the compiler's output.
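For instance, here is a minimal sketch of the two variants from the question that you could paste into godbolt. accumulate_step is just a hypothetical stand-in for F(); all names are illustrative.

// Hypothetical small task standing in for F(); whatever the real F() does,
// the point is only to compare the two call patterns.
static int accumulate_step(int x) {
    return x * 3 + 1;
}

// Variant 2 from the question: a plain loop around the call.
int with_loop(int seed) {
    int v = seed;
    for (int i = 0; i < 8; ++i)
        v = accumulate_step(v);
    return v;
}

// Variant 1 from the question: the call repeated by hand.
int manually_repeated(int seed) {
    int v = seed;
    v = accumulate_step(v);
    v = accumulate_step(v);
    v = accumulate_step(v);
    v = accumulate_step(v);
    v = accumulate_step(v);
    v = accumulate_step(v);
    v = accumulate_step(v);
    v = accumulate_step(v);
    return v;
}

At -O2, recent gcc and clang will typically inline accumulate_step in both functions and often unroll the loop as well, so the two frequently compile to very similar code; that, rather than any general rule, is the kind of thing the compiler output will show you.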
Also, don't forget the famous quote from D. Knuth:
Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
It is often cited incompletely, although the last part is as important as the rest: "Yet we should not pass up our opportunities in that critical 3%." To know where those 3% are, you have to profile your code.
TL;DR: Don't do premature optimizations. Measure and profile first; only then do you know where it is worth improving and whether you can get an improvement at all.

Related

How much faster is C++ code "supposed" to be with optimizations turned on?

I have a program that runs in around 1 minute when compiling with g++ without any options.
Compiling with -O3 however makes it run in around 1-2 seconds.
My question is whether it is normal to have this much of a speed up. Or is my code perhaps so bad that optimization can take away that much time? Obviously I know my code isn't perfect, but because of this huge speedup I'm beginning to think it's worse than I thought. Please tell me what the "normal" amount of speed up is (if that's a thing), and whether too much speed up can mean bad code that could (and should) easily be optimized by hand instead of relying on the compiler.
How much faster is C++ code “supposed” to be with optimizations turned on?
In theory: There doesn't necessarily need to be any speed difference. Nor does there exist any upper limit to the speed difference. The C++ language simply doesn't specify a difference between optimisation and lack thereof.
In practice: It depends. Some programs have more to gain from optimisation than others. Some behaviours are easier to prove than others. Some optimisations can even make the program slower, because the compiler cannot know about everything that may happen at runtime.
... 1 minute ... [optimisation] makes it run in around 1-2 seconds.
My question is whether it is normal to have this much of a speed up?
It is entirely normal. You cannot assume that you'll always get as much improvement, but this is not out of the ordinary.
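As an illustration (a toy example, not taken from your program), a loop like this is one where the gap between -O0 and -O3 can be enormous:

// At -O0 every iteration is executed; at -O2/-O3, gcc and clang can usually
// replace the whole loop with a closed-form expression, or delete it entirely
// if the result is never used.
long sum_to(long n) {
    long total = 0;
    for (long i = 0; i < n; ++i)
        total += i;
    return total;
}

A program dominated by code the optimizer can simplify that aggressively will see speed-ups far beyond the usual few-fold difference, without the source being "bad".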
Or is my code perhaps so bad, that optimization can take away that much time.
If the program is fast with optimisation, then it is a fast program. If the program is slow without optimisation, we don't care because we can enable optimisation. Usually, only the optimised speed is relevant.
Faster is better than slower, although that is not the only important metric of a program. Readability, maintainability and especially correctness are more important.
Please tell me ... whether ... code ... could ... be ... optimized by hand instead of relying on the compiler.
Everything could be optimized by hand, at least if you write the program in assembly.
... or should ...
No. There is no reason to waste time doing what the compiler has already done for you.
There are sometimes reasons to optimise by hand something that is already well optimised by the compiler. Relative speedup is not one of those reasons. An example of a valid reason is that the non-optimised build may be too slow to execute for debugging purposes when there are real-time requirements (whether hard or soft) involved.

Is it really better to have an unnecessary function call instead of using else?

So I had a discussion with a colleague today. He strongly suggested that I change code from
if (condition) {
    function->setValue(true);
}
else {
    function->setValue(false);
}
to
function->setValue(false);
if (condition) {
    function->setValue(true);
}
in order to avoid the 'else'. I disagreed because, while it might improve readability to some degree, in the case of the condition being true we have one absolutely unnecessary function call.
What do you guys think?
Meh.
To do this just to avoid the else is silly (at least there should be a deeper rationale). There's typically no extra branching cost to it, especially after the optimizer goes through it.
Code compactness can sometimes be a desirable aesthetic, especially if more time is spent skimming and searching through code than reading it line by line. There can be legitimate reasons to favor terser code sometimes, but it's always pros and cons. Even then, code compactness should not be about cramming logic into fewer lines of code so much as keeping the logic straightforward.
Correctness here might be easier to achieve with one or the other. The point was made in a comment that you might not know the side effects associated with calling setValue(false), though I would suggest that's kind of moot. Functions should have minimal side effects; those side effects should all be documented at the interface/usage level if they aren't totally obvious; and if we don't know exactly what they are, we should be spending more time looking up the documentation before calling the function (and its side effects should not change once firm dependencies on it are established).
Given that, it may sometimes be easier to achieve correctness, and maintain it, with a solution that starts out by initializing state to some default value and then opts in to overwrite it in specific branches of the code. From that standpoint, what your colleague suggested may be valid as a way to avoid tripping over that code in the future. Then again, for a simple if/else pair of branches, it's hardly a big deal.
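For completeness: if setValue(bool) really just records the flag and condition has no side effects worth isolating, both versions can collapse to a single call. A sketch reusing the question's names; whether this is clearer is a matter of taste.

// Assumes setValue(bool) simply stores the flag; names are the question's.
function->setValue(condition);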
Don't worry about the cost of the extra, most-likely-constant-time function call either way in this kind of knee-deep, micro-level implementation case, especially with no super-tight performance-critical loop around this code (and even then, prefer to worry about it in hindsight, after profiling).
I think there are far better things to think about than this kind of coding style, like testing procedure. Reliable code tends to need less revisiting, and has the freedom to be written in a wider variety of ways without causing disputes. Testing is what establishes reliability. The biggest disputes about coding style tend to follow teams where there's more toe-stepping and more debugging of the same bodies of code over and over and over from disparate people due to lack of reliability, modularity, excessive coupling, etc. It's a symptom of a problem but not necessarily the root cause.

Looking for opinions on when to unwind loops with if statements

I'm wondering when (if ever) I should pull if statements out of substantial loops in order to help optimize speed.
for (i = 0; i < File.NumBits; i++)
    if (File.Format.Equals(FileFormats.A))
        ProcessFormatA(File[i]);
    else
        ProcessFormatB(File[i]);
into
if (File.Format.Equals(FileFormats.A))
    for (i = 0; i < File.NumBits; i++)
        ProcessFormatA(File[i]);
else
    for (i = 0; i < File.NumBits; i++)
        ProcessFormatB(File[i]);
I'm not sure if the compiler will do this type of optimization for me, or if this is considered good coding practice, because I would imagine it would make the code much harder to read/maintain if the loops were more complex.
Thanks for any input / suggestions.
When you have finished the code and the profiler tells you that the for loops are a bottleneck. No sooner.
If you're actually processing a file (i.e. reading and/or writing) within your functions, optimizing the if is going to be pointless: the file operations will take so much longer, comparatively, than the if that you won't even notice the speed improvement.
I would expect that a decent compiler might be able to optimize the code; however, it would need to be certain that File.Format can't change in the loop, which could be a big ask.
As I always like to say, write first, optimize later!
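If profiling later shows this branch does matter, one low-effort middle ground is to hoist the check into a local so it is visibly loop-invariant, to readers and, with luck, to the optimizer. A sketch reusing the question's illustrative names (so not compilable on its own):

// The format cannot change inside the loop once it is captured here.
const bool isFormatA = File.Format.Equals(FileFormats.A);
for (int i = 0; i < File.NumBits; i++) {
    if (isFormatA)
        ProcessFormatA(File[i]);
    else
        ProcessFormatB(File[i]);
}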
Definitely code for maintainability and correctness first. In this case I would be inclined to suggest neither:
for (...)
    ProcessFormat(File, Format);
It's much harder to mess up if all the checks are in one place. You may do a better job of confusing your optimizer this way, but generally you would rather have correct code run slowly than incorrect code run quickly. You can always optimize later if you want.
The two perform the same and read much the same, so I'd pick the one with less code. In fact, the example you give hints that polymorphism might be a good fit here, making the code simpler and shorter still.
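A hedged sketch of that polymorphism idea, with all names invented for illustration: the caller picks the concrete processor once, and the loop body stays a single line.

#include <cstddef>

struct Processor {
    virtual ~Processor() = default;
    virtual void process(std::size_t bit) const = 0;
};

struct FormatAProcessor : Processor {
    void process(std::size_t) const override { /* format-A handling */ }
};

struct FormatBProcessor : Processor {
    void process(std::size_t) const override { /* format-B handling */ }
};

// The format decision is made once by whoever constructs the processor;
// the loop itself no longer branches on it.
void processAll(const Processor& p, std::size_t numBits) {
    for (std::size_t i = 0; i < numBits; ++i)
        p.process(i);
}

Whether the virtual call ends up cheaper than a well-predicted if is, once again, something only a profiler can settle, but the structure is easier to extend when a third format shows up.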

Performance penalty for "if error then fail fast" in C++?

Is there any performance difference (in C++) between the two styles of writing if-else, as shown below (logically equivalent code) for the likely1 == likely2 == true path (likely1 and likely2 are meant here as placeholders for some more elaborate conditions)?
// Case (1):
if (likely1) {
    Foo();
    if (likely2) {
        Bar();
    } else {
        Alarm(2);
    }
} else {
    Alarm(1);
}
vs.
// Case (2):
if (!likely1) {
    Alarm(1);
    return;
}
Foo();
if (!likely2) {
    Alarm(2);
    return;
}
Bar();
I'd be very grateful for information on as many compilers and platforms as possible (but with gcc/x86 highlighted).
Please note I'm not interested in readability opinions on those two styles, neither in any "premature optimisation" claims.
EDIT: In other words, I'd like to ask if the two styles are at some point considered fully-totally-100% equivalent/transparent by a compiler (e.g. bit-by-bit equivalent AST at some point in a particular compiler), and if not, then what are the differences? For any (with a preference towards "modern" and gcc) compiler you know.
And, to make it clearer: I, too, don't suppose that it's going to give me much of a performance improvement, and I agree it would usually be premature optimization, but I am interested in whether and by how much it can improve/degrade anything.
It depends greatly on the compiler and the optimization settings. If the difference is crucial, implement both, and either analyze the assembly or run benchmarks.
I have no answers for specific platforms, but I can make a few general points:
The traditional answer on older processors without branch prediction is that the first is likely to be more efficient, since in the common case it takes fewer branches. But you seem interested in modern compilers and processors.
On modern processors, generally speaking, short forward branches are not expensive, whereas mispredicted branches may be expensive. By "expensive", of course, I mean a few cycles.
Quite aside from this, the compiler is entitled to order basic blocks however it likes provided it doesn't change the logic. So when you write if (blah) {foo();} else {bar();}, the compiler is entitled to emit code like:
    evaluate condition blah
    jump_if_true foo_label
    bar()
    jump endif_label
foo_label:
    foo()
endif_label:
On the whole, gcc tends to emit things in roughly the order you write them, all else being equal. There are various things which make all else not equal, for example if you have the logical equivalent of bar(); return in two different places in your function, gcc might well coalesce those blocks, emit only one call to bar() followed by return, and jump or fall through to that from two different places.
There are two kinds of branch prediction - static and dynamic. Static means that the CPU instructions for the branch specify whether the condition is "likely", so that the CPU can optimize for the common case. Compilers might emit static branch predictions on some platforms, and if you're optimizing for that platform you might write code to take account of that. You can take account of it either by knowing how your compiler treats the various control structures, or by using compiler extensions. Personally I don't think it's consistent enough to generalize about what compilers will do. Look at the disassembly.
Dynamic branch prediction means that in hot code, the CPU will keep statistics for itself on how likely branches are to be taken, and optimize for the common case. Modern processors use various dynamic branch prediction techniques: http://en.wikipedia.org/wiki/Branch_predictor. Performance-critical code pretty much is hot code, and as long as the dynamic branch prediction strategy works, it very rapidly optimizes hot code. There might be certain pathological cases that confuse particular strategies, but in general you can say that anything in a tight loop where there's a bias towards taken/not taken will be correctly predicted most of the time.
Sometimes it doesn't even matter whether the branch is correctly predicted or not, since some CPUs in some cases will include both possibilities in the instruction pipeline while it's waiting for the condition to be evaluated, and ditch the unnecessary option. Modern CPUs get complicated. Even much simpler CPU designs have ways of avoiding the cost of branching, though, such as conditional instructions on ARM.
Calls out of line to other functions will upset all such guesswork anyway. So in your example there may be small differences, and those differences may depend on the actual code in Foo, Bar and Alarm. Unfortunately it's not possible to distinguish between significant and insignificant differences, or to account for details of those functions, without getting into the "premature optimization" accusations you're not interested in.
It's almost always premature to micro-optimize code that isn't written yet. It's very hard to predict the performance of functions named Foo and Bar. Presumably the purpose of the question is to discern whether there's any common gotcha that should inform coding style. To which the answer is that, thanks to dynamic branch prediction, there is not. In hot code it makes very little difference how your conditions are arranged, and where it does make a difference that difference isn't as easily predictable as "it's faster to take / not take the branch in an if condition".
If this question was intended to apply to just one single program with this code proven to be hot, then of course it can be tested, there's no need to generalize.
It is compiler dependent. Check out the gcc documentation on __builtin_expect. Your compiler may have something similar. Note that you really should be concerned about premature optimization.
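For reference, a hedged sketch of what that looks like with gcc/clang; Foo, Bar and Alarm are the question's placeholders, declared here only so the fragment compiles. C++20 also offers the [[likely]]/[[unlikely]] attributes for the same purpose. Whether any of this changes the emitted code, let alone the runtime, is compiler- and target-dependent, so check the assembly.

void Foo();
void Bar();
void Alarm(int);

// Hint that the error paths are the cold ones.
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

void handle(bool likely1, bool likely2) {
    if (UNLIKELY(!likely1)) {
        Alarm(1);
        return;
    }
    Foo();
    if (UNLIKELY(!likely2)) {
        Alarm(2);
        return;
    }
    Bar();
}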
The answer depends a lot on the type of "likely". If it is an integer constant expression, the compiler can optimize it and both cases will be equivalent. Otherwise, it will be evaluated at runtime and can't be optimized much.
Thus, case 2 is generally more efficient than case 1.
As input from real-time embedded systems, which I work with: your "case 2" is often the norm for code that is safety- and/or performance-critical. Style guides for safety-critical embedded systems often allow this syntax so that a function can quit quickly upon errors.
Generally, style guides will frown upon the "case 2" syntax, but make an exception to allow several returns in one function either if
1) the function needs to quit quickly and handle the error, or
2) a single return at the end of the function leads to less readable code, which is often the case for various protocol and data parsers.
If you are this concerned about performance, I assume you are using profile guided optimization.
If you are using profile guided optimization, the two variants you have proposed are exactly the same.
In any event, the performance of what you are asking about is completely overshadowed by the performance characteristics of things not evident in your code samples, so we really cannot answer this. You have to test the performance of both.
Though I'm with everyone else here insofar as optimizing a branch makes no sense without having profiled and actually having found a bottleneck... if anything, it makes sense to optimize for the likely case.
Both likely1 and likely2 are likely, as their names suggest. Thus, ruling out the (also likely) combination of both being true first would likely be fastest:
if (likely1 && likely2)
{
    ... // happens most of the time
}
else
{
    if (likely1)
        ...
    if (likely2)
        ...
    else if (!likely1 && !likely2) // happens almost never
        ...
}
Note that the second else's condition is probably not necessary; a decent compiler will figure out that the last if clause cannot possibly be true if the previous one was, even if you don't explicitly tell it.

Should I use a function in a situation where it would be called an extreme number of times?

I have a section of my program that contains a large amount of math with some rather long equations. It's long and unsightly, and I wish to replace it with a function. However, this chunk of code is used an extreme number of times in my code and also requires a lot of variables to be initialized.
If I'm worried about speed, is the cost of calling the function and initializing the variables negligible here, or should I stick to coding it directly in each time?
Thanks,
-Faken
Most compilers are smart about inlining reasonably small functions to avoid the overhead of a function call. For functions big enough that the compiler won't inline them, the overhead for the call is probably a very small fraction of the total execution time.
Check your compiler documentation to understand its specific approach. Some older compilers required, or could benefit from, hints that a function is a candidate for inlining.
Either way, stick with functions and keep your code clean.
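As a hedged sketch of that advice (evaluate_term is a hypothetical stand-in for the long equation, and the math is a placeholder):

// A modern compiler at -O2 will usually inline a function this small, so the
// named function costs nothing in the common case and keeps call sites readable.
inline double evaluate_term(double a, double b, double c) {
    return a * a + 2.0 * a * b - c / (b + 1.0);   // placeholder math
}

double accumulate(const double* a, const double* b, const double* c, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i)
        total += evaluate_term(a[i], b[i], c[i]);  // the "extreme number of times" call
    return total;
}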
Are you asking if you should optimize prematurely?
Code it in a maintainable manner first; if you then find that this section is a bottleneck in the overall program, worry about tuning it at that point.
You don't know where your bottlenecks are until you profile your code. Anything you assume about your code's hot spots is likely to be wrong. I remember once I wanted to optimize some computational code. I ran a profiler, and it turned out that 70% of the running time was spent zeroing arrays. Nobody would have guessed it by looking at the code.
So: first code clean, then run a profiler, then optimize the rough spots. Not earlier. If it's still slow, change the algorithm.
Modern C++ compilers generally inline small functions to avoid function call overhead. As far as the cost of variable initialization, one of the benefits of inlining is that it allows the compiler to perform additional optimizations at the call site. After performing inlining, if the compiler can prove that you don't need those extra variables, the copying will likely be eliminated. (I assume we're talking about primitives, not things with copy constructors.)
The only way to answer that is to test it. Without knowing more about the proposed function, nobody can really say whether the compiler can/will inline that code or not. This may/will also depend on the compiler and compiler flags you use. Depending on the compiler, if you find that it's really a problem, you may be able to use different flags, a pragma, etc., to force it to be generated inline even if it wouldn't be otherwise.
Without knowing how big the function would be, and/or how long it'll take to execute, it's impossible to guess how much effect on speed it'll have if it isn't generated inline.
With both of those being unknown, none of us can really guess at how much effect moving the code into a function will have. There might be none, or little or huge.
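For completeness, a hedged example of the compiler-specific "force it inline" escape hatch mentioned above: gcc/clang accept __attribute__((always_inline)) and MSVC has __forceinline. Reach for these only if profiling shows the compiler's own inlining decision was actually the problem.

#if defined(_MSC_VER)
  #define FORCE_INLINE __forceinline
#else
  #define FORCE_INLINE inline __attribute__((always_inline))
#endif

// hot_math is a hypothetical stand-in for the function in question.
FORCE_INLINE double hot_math(double x) {
    return x * x * 0.5 + x;   // placeholder for the real equation
}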