Looking for opinions on when to unwind loops with if statements - c++

I'm wondering when (if ever) I should be pulling an if statement out of substantial loops in order to help optimize for speed? For example, turning this:
for (i = 0; i < File.NumBits; i++)
    if (File.Format.Equals(FileFormats.A))
        ProcessFormatA(File[i]);
    else
        ProcessFormatB(File[i]);
into
if (File.Format.Equals(FileFormats.A))
    for (i = 0; i < File.NumBits; i++)
        ProcessFormatA(File[i]);
else
    for (i = 0; i < File.NumBits; i++)
        ProcessFormatB(File[i]);
I'm not sure whether the compiler will do this type of optimization for me, or whether it's even considered good coding practice, because I would imagine it would make the code much harder to read and maintain if the loops were more complex.
Thanks for any input / suggestions.

When you have finished the code and the profiler tells you that the for loops are a bottleneck. No sooner.

If you're actually processing a file (i.e. reading and/or writing) within your functions, optimizing the if is going to be pointless: the file operations will take so much longer, comparatively, that you won't even notice the speed improvement.
I would expect that a decent compiler might be able to optimize the code - however, it would need to be certain that File.Format can't change in the loop, which could be a big ask.
As I always like to say, write first, optimize later!

Definitely code for maintainability and correctness first. In this case I would be inclined to suggest neither:
for (i = 0; i < File.NumBits; i++)
    ProcessFormat(File[i], File.Format);
Much harder to mess up if all the checks are in one place. You do a better job of confusing your optimizer this way, but generally you'd rather have correct code that runs slowly than incorrect code that runs quickly. You can always optimize later if you want.

The two perform the same and read about the same, so I'd pick the one with less code. In fact, the example you give hints that polymorphism might be a good fit here, to make the code simpler and shorter still.
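For what it's worth, here is a minimal sketch of that polymorphic idea; the type and function names (Bit, FormatProcessor, processBit, processFile) are hypothetical stand-ins, not taken from the original code:
#include <cstdio>
#include <vector>

using Bit = unsigned char;                    // stand-in for whatever File[i] actually is

struct FormatProcessor {                      // one implementation per file format
    virtual ~FormatProcessor() = default;
    virtual void processBit(Bit b) const = 0;
};

struct FormatAProcessor : FormatProcessor {
    void processBit(Bit b) const override { std::printf("A: %d\n", static_cast<int>(b)); }  // format-A handling
};

struct FormatBProcessor : FormatProcessor {
    void processBit(Bit b) const override { std::printf("B: %d\n", static_cast<int>(b)); }  // format-B handling
};

// The format decision is made once, before the loop; the loop body has no if.
void processFile(const std::vector<Bit>& bits, const FormatProcessor& proc)
{
    for (Bit b : bits)
        proc.processBit(b);
}
The virtual call still costs an indirect jump per element, so this is mainly a readability win; the hoisted loops in the question are roughly what an optimizing compiler produces anyway (the transformation is called loop unswitching).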

Cache locality vs Function Calls

I have a function which does a task; let's call this function F(). Now, I need to do this task n times, where n is sufficiently small. I can think of doing two things:
// Option 1: repeat the body of F() inline, n times
// ...code here...
Code-for-function-F()
Code-for-function-F()
.
.
.
Code-for-function-F()
// following code

// Option 2: call F() in a loop
// ...code here...
for (int i = 0; i < n; ++i)
    F();
// following code
In the first case, I avoid function call overhead. But since the code is repeated n times, the code can be rather large, which would lead to worse cache locality/performance. In the second case, the cache would be better utilized, but there is overhead due to the function calls. I was wondering if someone has done an analysis of which of the two is the better approach.
PS: I understand that the actual answer might depend on what profiling tells me, but is there a theoretically better approach between the two? I am using C++ on Linux.
There is no one-size-fits-all answer when the question is which code is faster. You have to measure it.
However, the optimizations you have in mind, loop-unrolling and function inlining, are techniques that the compiler is really good at. It is rare that applying them explicitly in your code helps the compiler to perform better optimizations. I would rather worry about preventing such compiler optimizations by writing unnecessarily clever code.
If you have a concrete example, I suggest you take a look at Godbolt (the Compiler Explorer). It is a nice tool that can help you see the effect of variations in the code on the compiler's output.
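For example, a tiny test case like the one below (F here is just a made-up placeholder body) is enough to paste into Compiler Explorer and see that, at -O2, a typical compiler inlines the call and will often unroll or even vectorize the loop:
#include <cstddef>

static int F(int x)                       // placeholder for the real work
{
    return x * 2 + 1;
}

// Plain loop over n calls; compare the generated assembly with and without
// optimization flags to see inlining and unrolling happen (or not).
int run(const int* data, std::size_t n)
{
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += F(data[i]);
    return sum;
}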
Also, don't forget the famous quote from D. Knuth:
Programmers waste enormous amounts of time thinking about, or worrying
about, the speed of noncritical parts of their programs, and these
attempts at efficiency actually have a strong negative impact when
debugging and maintenance are considered. We should forget about small
efficiencies, say about 97% of the time: premature optimization is the
root of all evil. Yet we should not pass up our opportunities in that
critical 3%.
It is often quoted incompletely, while the last part is as important as the rest: "Yet we should not pass up our opportunities in that critical 3%." To know where those 3% are, you have to profile your code.
TL;DR: Don't optimize prematurely. Measure and profile first; only then will you know where it is worth improving, and whether you can get an improvement at all.

Is it really better to have an unnecessary function call instead of using else?

So I had a discussion with a colleague today. He strongly suggested that I change some code from
if (condition) {
    function->setValue(true);
}
else {
    function->setValue(false);
}
to
function->setValue(false);
if (condition) {
    function->setValue(true);
}
in order to avoid the 'else'. I disagreed because, while it might improve readability to some degree, in the case of the if-condition being true we get one absolutely unnecessary function call.
What do you guys think?
Meh.
To do this just to avoid the else is silly (at least there should be a deeper rationale). There's typically no extra branching cost to it, especially after the optimizer goes through it.
Code compactness can sometimes be a desirable aesthetic, especially if more time is spent skimming and searching through code than reading it line by line. There can be legitimate reasons to favor terser code sometimes, but it's always pros and cons. And even code compactness should not be about cramming logic into fewer lines of code so much as keeping the logic straightforward.
Correctness here might be easier to achieve with one or the other. The point was made in a comment that you might not know the side effects associated with calling setValue(false), though I would suggest that's kind of moot. Functions should have minimal side effects, they should all be documented at the interface/usage level if they aren't totally obvious, and if we don't know exactly what they are, we should be spending more time looking up their documentation prior to calling them (and their side effects should not be changing once firm dependencies are established to them).
Given that, it may sometimes be easier to achieve correctness and maintain it with a solution that starts out initializing states to some default value, and using a form of code that opts in to overwrite it in specific branches of code. From that standpoint, what your colleague suggested may be valid as a way to avoid tripping over that code in the future. Then again, for a simple if/else pair of branches, it's hardly a big deal.
Don't worry about the cost of the extra most-likely-constant-time function call either way in this kind of knee-deep micro-level implementation case, especially with no super tight performance-critical loop around this code (and even then, still prefer to worry about that at least a little bit in hindsight after profiling).
I think there are far better things to think about than this kind of coding style, like testing procedure. Reliable code tends to need less revisiting, and has the freedom to be written in a wider variety of ways without causing disputes. Testing is what establishes reliability. The biggest disputes about coding style tend to follow teams where there's more toe-stepping and more debugging of the same bodies of code over and over and over from disparate people due to lack of reliability, modularity, excessive coupling, etc. It's a symptom of a problem but not necessarily the root cause.

Simple profiling of single C++ function in Windows

There are times, particularly when I'm writing a new function, where I would like to profile a single piece of code but a full profile run is not really necessary, and possibly too slow.
I'm using VS 2008 and have used the AMD profiler on C++ with good results, but I'm looking for something a little more lightweight.
What tools do you use to profile single functions? Perhaps something like a macro which gets compiled out when you're not in DEBUG mode. I could write my own, but I wanted to know if there is anything built in that I'm missing. I was thinking of something like:
void FunctionToTest()
{
    PROFILE_ENTER("FunctionToTest")
    // Do some stuff
    PROFILE_EXIT()
}
This would simply print, in the debug output window, how long the function took.
If I want to get maximum speed from a particular function, I wrap it in a nice long-running loop and use this technique. I really don't care too much about the time it takes; that's just the result. What I really need to know is what must I do to make it take less time. See the difference? After finding and fixing the speed bugs, when the outer loop is removed, it flies.
Also, I don't follow the orthodoxy of only tuning optimized code, because that assumes the code is already nearly as tight as possible. In fact, in programs of any significant size, there is usually stupid stuff going on, like calling a subfunction over and over with the same arguments, or new-ing objects repeatedly when prior copies could be re-used. The compiler's optimizer might be able to clean up some such problems, but I need to clean up every one, because the ones left behind will dominate. What the optimizer can do is make them harder to find, by scrambling the code. When I've gotten all the stupid stuff out (making it much faster), then I turn on the optimizer.
You might think "Well, I would never put stupid stuff in my code." Right. And you'd never put in bugs either. None of us try to make mistakes, but we all do, if we're working.
This code by Jeff Preshing should do the trick:
http://preshing.com/20111203/a-c-profiling-module-for-multithreaded-apis
Measure the time - which the code in the link does too - using either clock() or one of the OS-provided high-resolution timers. With C++11 you can use the timers from the <chrono> header.
Please note that you should always measure a Release build, not a Debug build, to get proper timings.
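For reference, here is a minimal sketch of the kind of macro the question asks for, assuming C++11 <chrono> and a Visual Studio style _DEBUG define; the ScopeProfiler/PROFILE_SCOPE names are made up, and it prints to stdout (on Windows you could route the output through OutputDebugString instead):
#include <chrono>
#include <cstdio>

#ifdef _DEBUG                                      // only measure in debug builds
class ScopeProfiler {
public:
    explicit ScopeProfiler(const char* name)
        : name_(name), start_(std::chrono::steady_clock::now()) {}
    ~ScopeProfiler() {                             // prints on scope exit
        auto end = std::chrono::steady_clock::now();
        auto us  = std::chrono::duration_cast<std::chrono::microseconds>(end - start_).count();
        std::printf("%s took %lld us\n", name_, static_cast<long long>(us));
    }
private:
    const char* name_;
    std::chrono::steady_clock::time_point start_;
};
// One PROFILE_SCOPE per scope; the single RAII object replaces the ENTER/EXIT pair.
#define PROFILE_SCOPE(name) ScopeProfiler profileScope_(name)
#else
#define PROFILE_SCOPE(name) ((void)0)              // compiles away otherwise
#endif

void FunctionToTest()
{
    PROFILE_SCOPE("FunctionToTest");
    // Do some stuff
}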

Any way to handle "predictable branches" faster?

I have some code in which there are two or three branches where you don't know ahead of time which way they will go, but after the first time they are hit it is either 100% certain, or close to it, that the same path will be taken again. I have noticed that using the likely hint (__builtin_expect) doesn't do much in terms of avoiding branch misses. And even though branch prediction does a good job when my function is called repeatedly in a short time span, as soon as there is other stuff going on between calls to my function, performance degrades substantially. What are some ways around this, or some techniques I can look into? Is there any way to somehow "tag" these branches for when they are reached again after some vagrancy?
You could use templates to generate a different version of the function for each code path, then use a function pointer to select one at runtime when you find out which way the condition goes.
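A rough sketch of that suggestion, with made-up names (processImpl, UseFastPath) purely for illustration: inside each instantiation the condition is a compile-time constant, so the data-dependent branch disappears and the runtime decision is made only once, when the pointer is picked.
#include <cstdio>

template <bool UseFastPath>
void processImpl(int x)
{
    if (UseFastPath)                      // constant within each instantiation, folded away
        std::printf("fast path: %d\n", x);
    else
        std::printf("slow path: %d\n", x);
}

using ProcessFn = void (*)(int);

int main()
{
    bool conditionLearnedAtRuntime = true;            // discovered on the first call

    // Choose the specialization once; later calls go through the pointer
    // and contain no branch on the condition at all.
    ProcessFn process = conditionLearnedAtRuntime ? &processImpl<true>
                                                  : &processImpl<false>;
    for (int i = 0; i < 3; ++i)
        process(i);
}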
The branch predictor and compiler intrinsics are all you've got. At best, you can look at the assembly and try to hand-roll some optimization yourself, but you won't find much.

Should I use a function in a situation where it would be called an extreme number of times?

I have a section of my program that contains a large amount of math with some rather long equations. It's long and unsightly and I wish to replace it with a function. However, this chunk of code is used an extreme number of times in my code and also requires a lot of variables to be initialized.
If I'm worried about speed, is the cost of calling the function and initializing the variables negligible here, or should I stick to coding it directly in place each time?
Thanks,
-Faken
Most compilers are smart about inlining reasonably small functions to avoid the overhead of a function call. For functions big enough that the compiler won't inline them, the overhead for the call is probably a very small fraction of the total execution time.
Check your compiler documentation to understand its specific approach. Some older compilers required, or could benefit from, hints that a function is a candidate for inlining.
Either way, stick with functions and keep your code clean.
Are you asking if you should optimize prematurely?
Code it in a maintainable manner first; if you then find that this section is a bottleneck in the overall program, worry about tuning it at that point.
You don't know where your bottlenecks are until you profile your code. Anything you assume about your code's hot spots is likely to be wrong. I remember once I wanted to optimize some computational code. I ran a profiler and it turned out that 70% of the running time was spent zeroing arrays. Nobody would have guessed it by looking at the code.
So, first code clean, then run a profiler, then optimize the rough spots. Not earlier. If it's still slow, change the algorithm.
Modern C++ compilers generally inline small functions to avoid function call overhead. As far as the cost of variable initialization, one of the benefits of inlining is that it allows the compiler to perform additional optimizations at the call site. After performing inlining, if the compiler can prove that you don't need those extra variables, the copying will likely be eliminated. (I assume we're talking about primitives, not things with copy constructors.)
The only way to answer that is to test it. Without knowing more about the proposed function, nobody can really say whether the compiler can/will inline that code or not. This may/will also depend on the compiler and compiler flags you use. Depending on the compiler, if you find that it's really a problem, you may be able to use different flags, a pragma, etc., to force it to be generated inline even if it wouldn't be otherwise.
Without knowing how big the function would be, and/or how long it'll take to execute, it's impossible to guess how much effect it will have on speed if it isn't generated inline.
With both of those being unknown, none of us can really guess at how much effect moving the code into a function will have. There might be none, or little or huge.
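As a concrete example of the flags and pragmas mentioned above (the wrapper macro and the lengthSquared function are just illustrative), the common compilers each offer a non-standard way to request inlining more strongly than the plain inline keyword:
// Compiler-specific "inline this aggressively" hints behind one macro.
#if defined(_MSC_VER)
    #define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
    #define FORCE_INLINE inline __attribute__((always_inline))
#else
    #define FORCE_INLINE inline
#endif

// A small math helper of the sort the question describes; with the hint,
// the call overhead should vanish, though the compiler still has the final say.
FORCE_INLINE double lengthSquared(double x, double y, double z)
{
    return x * x + y * y + z * z;
}
Even so, as the answers above say, write it as a plain function first and only reach for hints like these if profiling shows the call overhead actually matters.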