Branching when mixing template parameters and variables in C++ - c++

I'm trying to carry out some loop optimization as described here: Optimizing a Loop vs Code Duplication
I have the additional complication that some code inside the loop only needs to be executed depending on a combination of run-time-known variables external to the loop (which can be replaced with template parameters for optimization, as discussed in the link above) and a run-time-known variable that is only known inside the loop.
Here is the completely un-optimized version of the code:
for (int i = 0; i < 100000; i++){
    if (external_condition_1 || (external_condition_2 && internal_condition[i])){
        run_some_code;
    }
    else{
        run_some_other_code;
    }
    run_lots_of_other_code;
}
This is my attempt at wrapping the loop in a templated function as suggested in the question linked above to optimize performance and avoid code duplication by writing multiple versions of the loop:
template<bool external_condition_1, bool external_condition_2>
void myloop(){
    for (int i = 0; i < 100000; i++){
        if (external_condition_1 || (external_condition_2 && internal_condition[i])){
            run_some_code;
        }
        else{
            run_some_other_code;
        }
        run_lots_of_other_code;
    }
}
My question is: how can the code be written to avoid branching and code duplication?
Note that the code is sufficiently complex that the function probably can't be inlined, and compiler optimization also likely wouldn't sort this out in general.

My question is: how can the code be written to avoid branching and code duplication?
Well, you already wrote your template to avoid code duplication, right? So let's look at what branching is left. To do this, we should look at each function that is generated from your template (there are four of them). We should also apply the expected compiler optimizations based upon the template parameters.
First up, set condition 1 to true. This should produce two functions that are essentially (using a bit of pseudo-syntax) the following:
myloop<true, bool external_condition_2>() {
    for (int i = 0; i < 100000; i++){
        // if ( true || whatever ) <-- optimized out
        run_some_code;
        run_lots_of_other_code;
    }
}
No branching there. Good. Moving on to the first condition being false and the second condition being true.
myloop<false, true>(){
    for (int i = 0; i < 100000; i++){
        if ( internal_condition[i] ){ // simplified from (false || (true && i_c[i]))
            run_some_code;
        }
        else{
            run_some_other_code;
        }
        run_lots_of_other_code;
    }
}
OK, there is some branching going on here. However, each i needs to be analyzed to see which code should execute. I think there is nothing more that can be done here without more information about internal_condition. I'll give some thoughts on that later, but let's move on to the fourth function for now.
myloop<false, false>() {
    for (int i = 0; i < 100000; i++){
        // if ( false || (false && whatever) ) <-- optimized out
        run_some_other_code;
        run_lots_of_other_code;
    }
}
No branching here. You already have done a good job avoiding branching and code duplication.
OK, let's go back to myloop<false,true>, where there is branching. The branching is largely unavoidable simply because of how your situation is set up. You are going to iterate many times. Some iterations you want to do one thing while other iterations should do another. To get around this, you would need to re-envision your setup so that you can do the same thing each iteration. (The optimization you are working from is based upon doing the same thing each iteration, even though it might be a different thing the next time the loop starts.)
The simplest, yet unlikely, scenario would be where internal_condition[i] is equivalent to something like i < 5000. It would also be convenient if you could do all of the "some code" before any of the "lots of other code". Then you could loop from 0 to 4999, running "some code" each iteration. Then loop from 5000 to 99999, running "other code". Then a third loop to run "lots of other code".
Any solution I can think of would involve adapting your situation to make it more like the unlikely simple scenario. Can you calculate how many times internal_condition[i] is true? Can you iterate that many times and map your (new) loop control variable to the appropriate value of i (the old loop control variable)? (Or maybe the exact value of i is not important?) Then do a second loop to cover the remaining cases? In some scenarios, this might be trivial. In others, far from it.
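To make the "count, then remap the loop variable" idea concrete, here is a minimal sketch. It assumes the order in which the values of i are visited does not matter, and the Tally struct and the arithmetic inside the loops are hypothetical stand-ins for run_some_code, run_some_other_code and run_lots_of_other_code, whose real bodies are not shown. The first pass still inspects every element, so whether this beats the plain branch is something only profiling can tell you.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record of what the three placeholder blocks did, so the
// result can be inspected. The real bodies are not shown in the question.
struct Tally {
    std::size_t sum_some = 0;   // sum of i over iterations that ran "some code"
    std::size_t sum_other = 0;  // sum of i over iterations that ran "some other code"
    std::size_t lots_runs = 0;  // how many times "lots of other code" ran
};

Tally run_partitioned(const std::vector<char>& internal_condition) {
    // One cheap pass that sorts the indices into two groups.
    std::vector<std::size_t> when_true;
    std::vector<std::size_t> when_false;
    for (std::size_t i = 0; i < internal_condition.size(); ++i) {
        (internal_condition[i] ? when_true : when_false).push_back(i);
    }

    Tally t;
    // Each loop below does the same thing on every iteration.
    for (std::size_t i : when_true) {
        t.sum_some += i;   // stand-in for run_some_code
        ++t.lots_runs;     // stand-in for run_lots_of_other_code
    }
    for (std::size_t i : when_false) {
        t.sum_other += i;  // stand-in for run_some_other_code
        ++t.lots_runs;
    }
    return t;
}
```

The two index vectors cost extra memory and an extra pass, which is why this only pays off when the per-iteration work is heavy compared with the bookkeeping.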
There might be other tricks that could be done, but they depend on more details about what you are doing, what you need to do, and what you think you need to do but don't really. (It's possible that the required level of detail would overwhelm StackOverflow.) Is the order important? Is the exact value of i important?
In the end, I would opt for profiling the code. Profile the code without code duplication but with branching. Profile the code with minimal branching but with code duplication. Is there a measurable change? If so, think about how you can re-arrange your internal condition so that i can cover large ranges without changing the value of the internal condition. Then divide your loop into smaller pieces.

In C++17, to guarantee that no extra branches are evaluated, you might do:
template <bool external_condition_1, bool external_condition_2>
void myloop()
{
    for (int i = 0; i < 100000; i++){
        if constexpr (external_condition_1) {
            run_some_code;
        } else if constexpr (external_condition_2){
            if (internal_condition[i]) {
                run_some_code;
            } else {
                run_some_other_code;
            }
        } else {
            run_some_other_code;
        }
        run_lots_of_other_code;
    }
}
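Since the external conditions are only known at run time, something still has to pick the right instantiation once, outside the loop. Below is a rough sketch of that dispatch. The function run_myloop, the internal_condition parameter and the counter that replaces the placeholder bodies are all assumptions made for illustration, not part of the original code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical version of myloop that counts how often the "some code" path
// runs, so the four instantiations can be told apart.
template <bool external_condition_1, bool external_condition_2>
std::size_t myloop(const std::vector<char>& internal_condition) {
    std::size_t some_code_runs = 0;
    for (std::size_t i = 0; i < internal_condition.size(); ++i) {
        if constexpr (external_condition_1) {
            ++some_code_runs;                            // always "some code"
        } else if constexpr (external_condition_2) {
            if (internal_condition[i]) ++some_code_runs; // the only run-time branch
        }
        // otherwise: always "some other code", nothing to count
    }
    return some_code_runs;
}

// Maps the two run-time flags to one of the four instantiations.
// The branching happens once per call instead of once per iteration.
std::size_t run_myloop(bool c1, bool c2, const std::vector<char>& internal_condition) {
    if (c1) {
        return c2 ? myloop<true, true>(internal_condition)
                  : myloop<true, false>(internal_condition);
    }
    return c2 ? myloop<false, true>(internal_condition)
              : myloop<false, false>(internal_condition);
}
```

With more than a handful of flags the number of instantiations doubles per flag, so this approach is probably best kept to two or three conditions.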

Related

Trouble understanding some unexpected behaviour in code with OpenMP

Question regarding OpenMP parallelization. I have included a stripped down version of my function below. The problem is that the body of the for loop is occasionally not executed for every value of uiIndex.
I use the buffer vec_succ_status to check whether all values of uiIndex are getting evaluated. It turns out that they are not.
My code does not crash, it just exits from the function compute_Lagr_shortest_paths_from_source, without encountering any of the exit(-1) statements in the function definition below.
I am using g++ version 7.4.0 on Ubuntu 14, and every time it has failed, there is exactly one value of uiIndex that was skipped. There is no consistency in which uiIndex the function fails to evaluate.
For the programs I have been testing, the size of vec_group is always 1, so only the first if statement inside the for loop will evaluate.
In my main function, I included the line omp_set_num_threads(4). Apart from that, I did not set any other settings (such as scheduler type) for OpenMP.
Also, I can assure you that no 2 values of uiIndex lead to the same uiRobot value, so no 2 threads will ever have to access the same vec_cf_graphs[uiRobot] element during the lifetime of the function.
I wonder if I am making some wrong assumptions about OpenMP. I require all objects such as vec_cf_graphs and vec_succ_status to be shared across all threads. I am wondering if I need to explicitly mark them as shared, as that is usually the recommended approach. Anyway, I thought the way I have implemented it also suffices. However, it seems rather strange to me that certain uiIndex values can get skipped altogether. I must point out that I call the function shown repeatedly, but only sometimes are certain uiIndex values skipped. If someone can point me to potential issues with my approach, that would be great. I am happy to provide additional information. Thanks.
bool compute_Lagr_shortest_paths_from_source(std::vector<Robot_CF_Graph>& vec_cf_graphs, const std::vector<std::vector<size_t>>& vec_robot_groups)
{
size_t uiIndex;
std::vector<bool> vec_succ_status(vec_robot_groups.size(), false);
#pragma omp parallel for default(shared) private(uiIndex)
for(uiIndex = 0; uiIndex < vec_robot_groups.size(); uiIndex++)
{
vec_succ_status[uiIndex] = false;
const auto& vec_group = vec_robot_groups[uiIndex];
if(1 == vec_group.size())
{
size_t uiRobot = vec_group[0];
vec_cf_graphs[uiRobot].compute_shortest_path("ABC");
vec_succ_status[uiIndex] = true;
}
else
{
std::cout<< "Tag: Code should not have entered this block"<<std::endl;
exit(-1);
}
if(false == vec_succ_status[uiIndex])
{
std::cout<< "It is not possible for this to happen \n";
exit(-1);
}
}
return true;
}
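You concurrently write to a vector<bool>, which is not a 'normal' vector. It is a space-optimized specialization that typically packs its elements into individual bits, so writes to different elements can modify the same underlying byte. Doing that from several threads without synchronization is a data race, which is undefined behaviour.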
See detailed reasoning here:
Write concurrently vector<bool>
How vector<bool> is different from other vectors can be found here:
https://en.cppreference.com/w/cpp/container/vector_bool
Just using a vector<char> with 0 or 1 representing true or false is the easiest way to solve this. Other options are discussed here, if you want to have more elegant code:
Alternative to vector<bool>
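As a minimal sketch of the vector<char> suggestion, the status buffer could look like the code below. The function names mark_all and count_marked are made up for this example and the loop body is reduced to the status write. It assumes GCC or Clang with -fopenmp; without that flag the pragma is ignored and the loop simply runs serially.

```cpp
#include <cstddef>
#include <vector>

// Marks every index as processed from inside a parallel loop.
// std::vector<char> stores one byte per element, so threads that write
// different indices touch different memory locations.
std::vector<char> mark_all(std::size_t n) {
    std::vector<char> status(n, 0);  // 0 = not processed, 1 = processed
    #pragma omp parallel for default(shared)
    for (std::size_t i = 0; i < n; ++i) {
        status[i] = 1;
    }
    return status;
}

// Counts how many entries were marked, to check that no index was skipped.
std::size_t count_marked(const std::vector<char>& status) {
    std::size_t marked = 0;
    for (char s : status) {
        if (s != 0) ++marked;
    }
    return marked;
}
```

The loop variable is declared inside the for statement here, which makes it private to each thread without an explicit private clause.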

Loop checking convention

If I was going through a loop, say iterating a vector, and I don't want to do an action on some item in the vector, I can do it in two ways:
This is the one I prefer to use:
vector<int> vec;
void loopFunction(int toIgnore) {
for (size_t index = 0; index < vec.size(); index++) {
if (vec[index] == toIgnore) continue;
// do stuff
}
}
This is the one I see most people use:
vector<int> vec;
void loopFunction(int toIgnore) {
for (size_t index = 0; index < vec.size(); index++) {
if (vec[index] != toIgnore) {
// do stuff
}
}
}
I know that in the final result there is absolutely no difference. However, is there any difference under the hood, since the second way opens a new scope to execute? Is either of the two preferred over the other?
Thanks
As stated in my comment, on a personal level, I prefer the first implementation using continue in order to prevent unnecessary code nesting and scope creation.
The only performance overhead from each, in addition to the normal code that will be implemented, is the evaluation of the expression in the if-statement. Since they both contain an expression to be evaluated, they're the same performance wise.
If you think about how this is compiled, for C/C++ it goes straight to assembly code. At that level, no matter how you nest the code, it compiles into simple jmp and cmp instructions. Therefore, regardless of the implementation, you'll end up with roughly the same assembly code after compilation.
Either way you look at it, this is a micro-micro-micro optimization, if at all! Do what you prefer for code formatting and styling.
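For a concrete side-by-side, here is a small sketch of the two styles with an actual body filled in. The summing is a hypothetical stand-in for "do stuff" and the function names are made up for the example. Both functions compute the same value, and you can compare the generated assembly yourself (for example with g++ -O2 -S) rather than taking the claim on faith.

```cpp
#include <cstddef>
#include <vector>

// Early-continue style: skip the unwanted value, keep the work un-nested.
long sum_with_continue(const std::vector<int>& vec, int toIgnore) {
    long sum = 0;
    for (std::size_t index = 0; index < vec.size(); index++) {
        if (vec[index] == toIgnore) continue;
        sum += vec[index];  // stand-in for "do stuff"
    }
    return sum;
}

// Nested-if style: the same work, one scope deeper.
long sum_with_nested_if(const std::vector<int>& vec, int toIgnore) {
    long sum = 0;
    for (std::size_t index = 0; index < vec.size(); index++) {
        if (vec[index] != toIgnore) {
            sum += vec[index];  // stand-in for "do stuff"
        }
    }
    return sum;
}
```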

Are while loops more efficient than for loops

I was told that a while loop was more efficient than a for loop. (c/c++)
This seemed reasonable but I wanted to find a way to prove or disprove it.
I have tried three tests using analogous snippets of code, each containing nothing but a for or while loop with the same output:
Compile time - roughly the same
Run time - Same
Compiled to intel assembly code and compared - Same number of lines and virtually the same code
Should I have tried anything else, or can anyone confirm one way or the other?
All loops follow the same template:
{
    // Initialize
LOOP:
    if(!(/* Condition */) ) {
        goto END;
    }
    // Loop body
    // Loop increment/decrement
    goto LOOP;
}
END:
Therefore the following loops are equivalent:
// A
for(int i=0; i<10; i++) {
// Do stuff
}
// B
int i=0;
while(i < 10) {
// Do stuff
i++;
}
// Or even
int i=0;
while(true) {
if(!(i < 10) ) {
break;
}
// Do stuff
i++;
}
Both are converted to something similar to:
{
    int i=0;
LOOP:
    if(!(i < 10) ) {
        goto END;
    }
    // Do stuff
    i++;
    goto LOOP;
}
END:
Unused/unreachable code will be removed from the final executable/library.
Do-while loops skip the first conditional check and are left as an exercise for the reader. :)
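For what it's worth, the do-while exercise might be sketched like this. The function names and the iteration counter are invented for the example. The only structural change from the template above is that the condition check moves to the bottom, so the body always runs at least once.

```cpp
// do-while form: run the body first, test the condition afterwards.
int count_do_while(int start, int limit) {
    int i = start;
    int iterations = 0;
    do {
        ++iterations;  // loop body
        ++i;           // loop increment
    } while (i < limit);
    return iterations;
}

// The same loop written in the goto style used above.
int count_goto(int start, int limit) {
    int i = start;
    int iterations = 0;
LOOP:
    ++iterations;      // loop body
    ++i;               // loop increment
    if (i < limit) {
        goto LOOP;     // condition is checked at the bottom
    }
    return iterations;
}
```

Calling either function with start equal to limit still executes the body once, which is exactly the difference from for and while.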
Certainly LLVM will convert ALL types of loops to a consistent form (to the extent possible, of course). So as long as you have the same functionality, it doesn't really matter if you use for, while, do-while or goto to form the loop, if it's got the same initialization, exit condition, and update statement and body, it will produce the exact same machine code.
This is not terribly hard to do in a compiler if it's done early enough during the optimisation (so the compiler still understands what is actually being written). The purpose of such "make all loops equal" is that you then only need one way to optimise loops, rather than having one for while-loops, one for for-loops, one for do-while loops and one for "any other loops".
It's not guaranteed for ALL compilers, but I know that gcc/g++ will also generate nearly identical code whatever loop construct you use, and from what I've seen Microsoft also does the same.
C and C++ compilers convert high-level C or C++ code to assembly code, and in assembly there are no while or for loops. We can only check a condition and jump to another location.
So the performance of a for or while loop depends heavily on how well the compiler optimizes the code.
This is a good survey of compiler optimizations:
http://www.linux-kongress.org/2009/slides/compiler_survey_felix_von_leitner.pdf

in C++, is it more efficient to have "if" outside of a for loop or have the "if" in the for loop

I am writing a program in C++ and I am debating whether to put the "if"s inside the loop. I would imagine doing one check and then looping would be overall more efficient rather than the constant loop and check, but I am not quite sure. Or does any of this not matter because the compiler will optimize it anyway?
Is this more efficient?
for(int i = 0; i < SOME_BOUND; i++){
if(SOME_CONDITION){
//Some actions
}
else {
//Some actions
}
}
or is this more efficient?
if(SOME_CONDITION){
for(int i = 0; i < SOME_BOUND; i++){
//Some Actions
}
}
else {
for(int i = 0; i < SOME_BOUND; i++){
//Some Actions
}
}
One check is definitely better; however, there are features in both hardware (branch prediction) and the compiler (hoisting expressions and conditionals out of the loop) which make it hard to predict whether there will actually be a runtime difference between the two pieces of source code.
Generally you should focus on correctness and maintainability, and only go duplicating loops for performance reasons if profiling shows that the optimizer is missing out on performance.
The compiler can probably figure this out for you so that it executes it in the most efficient way. You should do what is more intuitive for your problem.
At a high level, doing the if only once is faster than doing it on every iteration of a loop, but again you should do what is more clear.

When to use the for loop over the while loop?

We can use a for loop and a while loop for the same purpose.
In what way does it affect our code if I use for instead of while? The same question arises between if-else and switch-case. How do I decide which to use?
For example, which one would you prefer?
This code:
int main()
{
int n = 10;
for(int i=0;i<n;i++)
{
do_something();
}
return 0;
}
Or this code:
int main()
{
int n=10,i=0;
while(i<n)
{
do_something();
i++;
}
return 0;
}
If using a for or a while loop does not affect the code in any way, then why was there a need for two solutions to the same problem?
Use whichever one makes the intention of your code clearest.
If you know the number of iterations the loop should run beforehand, I would recommend the for construct. While loops are good for when the loop's terminating condition happens at some yet-to-be determined time.
I try to prefer the for loop. Why? Because when I see a for loop, I can expect all of the loop bookkeeping to be kept in a single statement. I can insert break or continue statements without worrying about breaking how the loop operates. And most importantly, the body of the loop focuses on what you actually want the loop to be doing, rather than maintaining the loop itself. If I see a while, then I have to look at and understand the entire loop body before I can understand what iteration pattern the loop uses.
The only place I end up using while is for those few cases where the control of the loop is provided by some outside routine (e.g. FindFirstFileW).
It's all a matter of personal opinion though. Lots of people don't like what I end up doing with for loops because the loop statement often ends up spanning multiple lines.
There are some very subtle differences:
Scope of the loop variable(s): with a for loop, i has local scope; with a while loop it has to be defined before the loop, which means it is still available afterwards (of course you can do that with for as well).
continue: with a for loop, the variable will still be incremented/decremented; with a while loop, you have to insert that operation before the continue.
Frankly, if you need to increment/decrement, a for loop makes sense. If you don't know the bounds and there is no real increment/decrement, a while loop makes more sense, e.g.
while(some_stream >> input)
{
// do stuff...
}
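The continue difference mentioned above is the one that tends to bite in practice, so here is a small sketch of it. The functions are made up for illustration and simply count the odd numbers below n. In the while version, forgetting the increment before continue would leave i stuck on an even value and the loop would never end.

```cpp
// for loop: continue jumps to ++i, so the counter always advances.
int count_odd_for(int n) {
    int count = 0;
    for (int i = 0; i < n; ++i) {
        if (i % 2 == 0) continue;
        ++count;
    }
    return count;
}

// while loop: the increment has to be repeated by hand before continue.
int count_odd_while(int n) {
    int count = 0;
    int i = 0;
    while (i < n) {
        if (i % 2 == 0) {
            ++i;       // without this line the loop would spin forever
            continue;
        }
        ++count;
        ++i;
    }
    return count;
}
```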
In general, a for loop might be preferable for simple loops, since the logic of the loop is contained in a single line:
for (int i = 0; i < 10; ++i) {...}
However, sometimes we need more complex logic or flow control. A while loop allows us to implement more complicated loops. For example, suppose we only want to increment the counter variable under certain conditions:
int i = 0;
while (i < 10)
{
if (some_condition) ++i;
else if (some_other_condition) { ... }
else break;
}
Just use the one that makes the code readable and logical.
In some cases the compiler (gcc at least) will be able to optimize a while loop very slightly better than a for loop doing the same thing. If I remember correctly, that optimization is only worth a few clock cycles, so it will probably never have any noticeable effect on performance.
You cannot write while(int i=0, i < n); that is, you have to define i before the while loop, which means i exists inside as well as outside the loop.
However, in the case of a for loop, you can define i right in the for statement itself, so i doesn't exist outside the loop. That is one difference. Just because of this difference, I like for more than while, and I use while rarely, only when for makes things more cumbersome!
They in no way affect how your program works! It is a matter of which one is easier to understand.
switch(i) // Once finding your case, you can easily know where the switch ends
// and thus the next statement of execution
{
case 1: break ;
case 2: break ;
// .....
case 10: break ;
default:break ;
}
if( i==1 ) // Here you have the pain of finding where the last else if ends !
{}
else if( i==2)
{}
// ...
else if( i==10)
{}
However, it is a matter of taste. I prefer switch.