Let's say I have a simple function that checks a condition and returns true if the condition is true and false if the condition is false.
Is it better to use this type of code:
bool myfunction( /*parameters*/ ) {
if ( /*conditional statement*/ ) {
return true;
}
return false;
}
Or this type:
bool myfunction( /*parameters*/ ) {
if ( /*conditional statement*/ ) {
return true;
}
else return false;
}
Or does it just really not make a difference? Also, what considerations should I bear in mind when deciding whether to "if...else if" vs. "if...else" vs. "switch"?
You can also write this without any conditional at all:
bool myfunction( /*parameters*/ ) {
return /*conditional statement*/;
}
This way you avoid the conditional entirely.
Of course, if you are dealing with a different function where you need the conditional, it shouldn't make a difference. Modern compilers work well either way.
As for switch vs. if-else: with many cases, a switch can be more efficient because the compiler may turn it into a jump table and jump straight to the matching case instead of testing each one in turn. At a low (hardware/compiler) level, a switch can then be a single check and jump, whereas a long chain of if statements may need many checks and jumps. Whether that actually happens depends on the compiler and on how dense the case values are.
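To make the switch case concrete, here is a minimal sketch. The function name digit_name and its cases are made up for illustration, and whether a jump table is actually generated is up to the compiler.

```cpp
#include <string>

// Hypothetical example: map a small integer to a name.
// With dense case labels like these, a compiler may emit a jump table
// instead of a chain of comparisons, but that is not guaranteed.
std::string digit_name(int d) {
    switch (d) {
        case 0: return "zero";
        case 1: return "one";
        case 2: return "two";
        case 3: return "three";
        default: return "other";  // covers every remaining value
    }
}
```

Calling digit_name(2) yields "two", and any value outside 0..3 falls into the default case.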
It is the same. Remember that whenever you say
return boolean;
the function ends and returns to the calling line.
Therefore putting it inside an else or simply writing it on its own is the same.
Say we want to check whether a number is prime:
#include <cmath>
bool isPrime (int n){
if (n < 2)
return false; // 0, 1 and negative numbers are not prime
for (int i = 2; i <= sqrt(n); i++){
if (n % i == 0)
return false; // found a divisor, so n is not prime
}
return true; // no divisor found
}
If you look at the function closely, you will see that if the number is evenly divisible by any value from 2 up to sqrt(n), it returns false because the number is not prime.
If no divisor is found, the loop ends without returning and the function reports the number as prime. Hence the function works properly.
Since neither of the two given answers hits the nail on the head, I will give you another one.
From the code's (or compiler's) point of view, assuming a recent compiler, both versions are identical. The compiler will optimise the if version into the return version just fine. The difference is in debugging: the debugger you're using might not let you set a breakpoint on the return value (for example, if you want to break only when true is returned), whereas the if version gives you two return statements on different lines, and any sane debugger will set a breakpoint on a line just fine.
Both functions are identical, regardless of any optimizations applied by the compiler, because the "else" in the second function hasn't any effect. If you leave the function as soon as the condition is met, you'll never enter the other branch in this case, so the "else" is implicit in the first version.
Hence I'd prefer the first version, because the "else" in the other one is misleading.
However, I agree with others here that this kind of function (both variants) doesn't make sense anyway, because you can simply use the plain boolean condition instead of this function, which is just a needless wrapper.
In terms of compilation the specific form you choose for if-else syntax won't make a big difference. The optimization path will usually erase any differences. Your decision should be made based on visual form instead.
As others have pointed out already, if you have a simple condition like this it's best to just return the calculation directly and avoid the if statement.
Returning directly only works if you have a boolean calculation. You might instead need to return a different type:
int foo(/*args*/) {
if(/*condition*/) {
return bar();
}
return 0;
}
Alternatively you could use the ternary operator ?:, but if the expressions are long, it may not be as clear.
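For comparison, the ternary form of the function above might look like the following sketch. Here bar is a hypothetical stand-in that returns a fixed value, and the condition is passed in as a parameter purely so the example is self-contained.

```cpp
// Hypothetical stand-in for the bar() used above.
int bar() { return 42; }

// Same logic as the if-version of foo, written with ?: .
int foo(bool condition) {
    return condition ? bar() : 0;
}
```

With short operands like these it reads fine; with long expressions on both sides of the colon, the if form is usually easier to follow.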
By using short returns (evaluation doesn't reach the end of the function) you can also sequence several conditions and evaluations.
int foo(/*args*/) {
if(/*condition1*/) {
return 0;
}
if(/*condition2*/) {
return 3;
}
int common = bar(/*args*/);
if(/*condition3*/) {
return 1-common;
}
return common;
}
Pick the form based on what makes the most logical sense, just ignore how this might compile. Then consider massaging the form to have the least visual complexity (avoids too much indentation or deep branching).
I am always in the habit of using if, else-if statement instead of multiple if statements.
Example:
int val = -1;
if (a == b1) {
return c1;
} else if (a == b2) {
return c2;
} ...
...
} else {
return c11;
}
How does it compare to example 2:
if (a == b1) {
return c1;
}
if (a == b2) {
return c2;
}
....
if (a == b11) {
return c11;
}
I know functionality wise they are the same. But is it best practice to do if else-if, or not? It's raised by one of my friends when I pointed out he could structure the code base differently to make it cleaner. It's already a habit for me for long but I have never asked why.
An if-elseif-else chain stops doing comparisons as soon as it finds one that's true. if-if-if does every comparison. The first is more efficient.
Edit: It's been pointed out in comments that you do a return within each if block. In these cases, or in cases where control will leave the method (exceptions), there is no difference between doing multiple if statements and doing if-elseif-else statements.
However, it's best practice to use if-elseif-else anyhow. Suppose you change your code such that you don't do a return in every if block. Then, to remain efficient, you'd also have to change to an if-elseif-else idiom. Having it be if-elseif-else from the beginning saves you edits in the future, and is clearer to people reading your code (witness the misinterpretation I just gave you by doing a skim-over of your code!).
What about the case where b1 == b2? (And if a == b1 and a == b2?)
When that happens, generally speaking, the following two chunks of code will very likely have different behavior:
if (a == b1) {
/* do stuff here, and break out of the test */
}
else if (a == b2) {
/* this block is never reached */
}
and:
if (a == b1) {
/* do stuff here */
}
if (a == b2) {
/* do this stuff, as well */
}
If you want to clearly delineate functionality for the different cases, use if-else or switch-case to make one test.
If you want different functionality for multiple cases, then use multiple if blocks as separate tests.
It's not a question of "best practices" so much as defining whether you have one test or multiple tests.
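A small sketch of the "one test vs. multiple tests" distinction, assuming plain int comparisons. The function names chained and separate are made up; each returns the list of blocks that ran.

```cpp
#include <string>
#include <vector>

// One test: at most one block runs.
std::vector<std::string> chained(int a, int b1, int b2) {
    std::vector<std::string> hits;
    if (a == b1) {
        hits.push_back("first");
    } else if (a == b2) {
        hits.push_back("second");  // skipped whenever the first test matched
    }
    return hits;
}

// Multiple tests: every matching block runs.
std::vector<std::string> separate(int a, int b1, int b2) {
    std::vector<std::string> hits;
    if (a == b1) {
        hits.push_back("first");
    }
    if (a == b2) {
        hits.push_back("second");  // also runs when b1 == b2 == a
    }
    return hits;
}
```

With a == b1 == b2, chained enters only the first block while separate enters both. When b1 and b2 differ, the two behave the same.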
They are NOT functionally equivalent.
The only way they would be functionally equivalent is if you wrote an "if" statement for every single possible value of a (i.e. every possible int value, as defined in limits.h in C using INT_MIN and INT_MAX, or the equivalent in Java).
The else statement allows you to cover every possible remaining value without having to write millions of "if" statements.
Also, it's better coding practice to use if...else if...else, just as in a switch/case statement, where some compilers can warn you if you don't provide a "default" case (GCC's -Wswitch-default, for example). This prevents you from overlooking invalid values in your program. E.g.:
#include <math.h>
#include <stdio.h>
double square_root(double x) {
if(x > 0.0) {
return sqrt(x);
} else if(x == 0.0) {
return x;
} else {
/* negative values (and NaN) end up here */
printf("INVALID VALUE: x must not be negative\n");
return 0.0;
}
}
Do you want to type millions of if statements for each possible value of x in this case? Doubt it :)
Cheers!
This totally depends on the condition you're testing. In your example it will make no difference in the end, but as a best practice, if you want only ONE of the branches to be executed, you'd better use if-else:
if (x > 1) {
System.out.println("Hello!");
}else if (x < 1) {
System.out.println("Bye!");
}
Also note that if the first condition is TRUE the second will NOT be checked at all but if you use
if (x > 1) {
System.out.println("Hello!");
}
if (x < 1) {
System.out.println("Bye!");
}
The second condition will be checked even if the first condition is TRUE. The optimizer might resolve this eventually, but as far as I know that is how it behaves. Also, the first form is the one that expresses the intent, so it is always the best choice for me unless the logic requires otherwise.
An if followed by else if is different from two consecutive if statements. In the first, when the CPU takes the first if branch, the else if won't be checked. With two consecutive if statements, even if the first if is checked and taken, the next if will also be checked, and taken if its condition is true.
I tend to think that using else if is more robust in the face of code changes. If someone adjusts the control flow of the function and replaces a return with a side effect, or wraps a function call in a try-catch, the else-if would fail hard if all conditions are truly exclusive. It really depends too much on the exact code you are working with to make a general judgment, and you need to consider the possible trade-offs with brevity.
With return statements in each if branch.
In your code, you have return statements in each of the if conditions. When you have a situation like this, there are two ways to write this. The first is how you've written it in Example 1:
if (a == b1) {
return c1;
} else if (a == b2) {
return c2;
} else {
return c11;
}
The other is as follows:
if (a == b1) {
return c1;
}
if (a == b2) {
return c2;
}
return c11; // no if or else around this return statement
These two ways of writing your code are identical.
The way you wrote your code in example 2 wouldn't compile in Java, and in C or C++ it would typically draw a compiler warning: the compiler doesn't know that you've covered all possible values of a, so it sees a code path that can reach the end of the function without returning a value. Actually taking that path is undefined behavior in C++ (and in C if the caller uses the result).
if (a == b1) {
return c1;
}
if (a == b2) {
return c2;
}
...
if (a == b11) {
return c11;
}
// what if a has some other value, say b12?
Without return statements in each if branch.
Without return statements in each if branch, your code would be functionally identical only if the following statements are true:
You don't mutate the value of a in any of the if branches.
== is an equivalence relation (in the mathematical sense) and none of the b1 thru b11 are in the same equivalence class.
== doesn't have any side effects.
To clarify further about point #2 (and also point #3):
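For the built-in integer types in C or Java, == is an equivalence relation, and the comparison itself never has side effects. (Floating-point NaN is the exception: NaN == NaN is false.)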
In languages that let you override the == operator, such as C++, Ruby, or Scala, the overridden == operator may not be an equivalence relation, and it may have side effects. We certainly hope that whoever overrides the == operator was sane enough to write an equivalence relation that doesn't have side effects, but there's no guarantee.
In JavaScript and certain other programming languages with loose type conversion rules, there are cases built into the language where == is not transitive, or not symmetric. (In JavaScript, === is an equivalence relation, apart from NaN, which is not equal to itself.)
In terms of performance, example #1 is guaranteed not to perform any comparisons after the one that matches. It may be possible for the compiler to optimize #2 to skip the extra comparisons, but it's unlikely. In the following example, it probably can't, and if the strings are long, the extra comparisons aren't cheap.
if (strcmp(str, "b1") == 0) {
...
}
if (strcmp(str, "b2") == 0) {
...
}
if (strcmp(str, "b3") == 0) {
...
}
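To see the difference in the number of comparisons, here is a rough sketch that counts them. The counter compare_count and the helper same are hypothetical additions for the demonstration; the strings match the ones above.

```cpp
#include <cstring>

// Hypothetical counter so the number of strcmp calls is visible.
int compare_count = 0;

bool same(const char* a, const char* b) {
    ++compare_count;
    return std::strcmp(a, b) == 0;
}

// if / else-if: stops comparing at the first match.
int chained_lookup(const char* str) {
    int result = 0;
    if (same(str, "b1")) {
        result = 1;
    } else if (same(str, "b2")) {
        result = 2;
    } else if (same(str, "b3")) {
        result = 3;
    }
    return result;
}

// Separate ifs without returns: every comparison runs.
int separate_lookup(const char* str) {
    int result = 0;
    if (same(str, "b1")) { result = 1; }
    if (same(str, "b2")) { result = 2; }
    if (same(str, "b3")) { result = 3; }
    return result;
}
```

For the input "b1", chained_lookup performs one comparison and separate_lookup performs three, although both return 1.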
I prefer if/else structures, because it's much easier to evaluate all possible states of your problem in every variation, together with switches. I find it more robust and quicker to debug, especially when you do multiple Boolean evaluations in a weakly typed environment such as PHP. An example of why elseif is bad (exaggerated for demonstration):
if(a && (c == d))
{
} elseif ( b && (!d || a))
{
} elseif ( d == a && ( b^2 > c))
{
} else {
}
This problem has more than 2^4=16 boolean states, which simply demonstrates the weak-typing effects that make things even worse. It isn't hard to imagine a problem with three variables of three states each, written in an "if ab elseif bc" kind of way.
Leave optimization to the compiler.
In most cases, using if-elseif-else and switch statements over if-if-if statements is more efficient (since it makes it easier for the compiler to create jump/lookup tables) and better practice since it makes your code more readable; some compilers can also warn about a switch that lacks a default case. This answer, along with this table comparing the three different statements, was synthesized using other answer posts on this page as well as those of a similar SO question.
I think these code snippets are equivalent for the simple reason that you have many return statements. If you had a single return statement, you would be using else constructs that are unnecessary here.
Your comparison relies on the fact that the body of the if statements return control from the method. Otherwise, the functionality would be different.
In this case, they perform the same functionality. The latter is much easier to read and understand in my opinion and would be my choice as which to use.
They potentially do different things.
If a is equal to b1 and b2, you enter two if blocks. In the first example, you only ever enter one. I imagine the first example is faster as the compiler probably does have to check each condition sequentially as certain comparison rules may apply to the object. It may be able to optimise them out... but if you only want one to be entered, the first approach is more obvious, less likely to lead to developer mistake or inefficient code, so I'd definitely recommend that.
CanSpice's answer is correct. An additional consideration for performance is to find out which conditional occurs most often. For example, if a==b1 only occurs 1% of the time, then you get better performance by checking the other case first.
Gir Loves Tacos' answer is also good. Best practice is to ensure you have all cases covered.
Question regarding OpenMP parallelization. I have included a stripped down version of my function below. The problem is that, the contents of the for loop are not getting evaluated for all values of uiIndex, although not always.
I use the buffer vec_succ_status to check if all values of uiIndex are getting evaluated. It turns out that it is not.
My code does not crash, it just exits from the function compute_Lagr_shortest_paths_from_source, without encountering any of the exit(-1) statements in the function definition below.
I am using g++ 7.4.0 on Ubuntu 14, and every time it has failed, there is exactly one value of uiIndex that was skipped. There is no consistency to the uiIndex for which the function fails to evaluate.
For the programs I have been testing, the size of vec_group is always 1, so only the first if statement inside the for loop will evaluate.
In my main function, I included the line omp_set_num_threads(4). Apart from that, I did not set any other settings (such as scheduler type) for OpenMP.
Also, I can assure you that no two values of uiIndex lead to the same uiRobot value, so no two threads will ever have to access the same vec_cf_graphs[uiRobot] array during the lifetime of the function.
I wonder if I am making some wrong assumptions about OpenMP. I require all objects such as vec_cf_graphs and vec_succ_status to be shared across all threads. I am wondering if I need to explicitly mark them as shared, as that is usually the recommended approach. Anyway, I thought the way I have implemented it also suffices. However, it seems rather strange to me that certain uiIndex values can get skipped altogether. I must point out that I call the function shown repeatedly, but only sometimes are certain uiIndex values skipped. If someone can point me to potential issues with my approach, that would be great. I am happy to provide additional information. Thanks.
bool compute_Lagr_shortest_paths_from_source(std::vector<Robot_CF_Graph>& vec_cf_graphs, const std::vector<std::vector<size_t>>& vec_robot_groups)
{
size_t uiIndex;
std::vector<bool> vec_succ_status(vec_robot_groups.size(), false);
#pragma omp parallel for default(shared) private(uiIndex)
for(uiIndex = 0; uiIndex < vec_robot_groups.size(); uiIndex++)
{
vec_succ_status[uiIndex] = false;
const auto& vec_group = vec_robot_groups[uiIndex];
if(1 == vec_group.size())
{
size_t uiRobot = vec_group[0];
vec_cf_graphs[uiRobot].compute_shortest_path("ABC");
vec_succ_status[uiIndex] = true;
}
else
{
std::cout<< "Tag: Code should not have entered this block"<<std::endl;
exit(-1);
}
if(false == vec_succ_status[uiIndex])
{
std::cout<< "It is not possible for this to happen \n";
exit(-1);
}
}
return true;
}
You concurrently write to a vector<bool> which is not a 'normal' vector. It has an internal memory optimization. This is undefined behaviour.
See detailed reasoning here:
Write concurrently vector<bool>
How vector<bool> is different from other vectors can be found here:
https://en.cppreference.com/w/cpp/container/vector_bool
Just using a vector<char> with 0 or 1 representing true or false is the easiest way to solve this. Other options are discussed here, if you want to have more elegant code:
Alternative to vector<bool>
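A minimal sketch of the vector<char> workaround, under the assumption that each iteration writes only its own element. The function names mark_all and all_set are made up for illustration, and the real per-index work is replaced by a single assignment.

```cpp
#include <cstddef>
#include <vector>

// Each iteration writes only its own element of a vector<char>.
// Distinct char elements are distinct memory locations, so these writes
// do not race, unlike the packed bits of vector<bool>.
// The pragma needs -fopenmp (or equivalent); without it the loop runs serially.
std::vector<char> mark_all(std::size_t n) {
    std::vector<char> status(n, 0);  // 0 means "false"
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(n); ++i) {
        status[static_cast<std::size_t>(i)] = 1;  // 1 means "true"
    }
    return status;
}

// Checks afterwards, on one thread, that no index was skipped.
bool all_set(const std::vector<char>& status) {
    for (char s : status) {
        if (s == 0) return false;
    }
    return true;
}
```

After the parallel loop finishes, all_set(mark_all(n)) should hold for any n, because no two threads ever touch the same byte.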
I'm trying to carry out some loop optimization as described here: Optimizing a Loop vs Code Duplication
I have the additional complication that some code inside the loop only needs to be executed depending on a combination of run-time-known variables external to the loop (which can be replaced with template parameters for optimization, as discussed in the link above) and a run-time-known variable that is only known inside the loop.
Here is the completely un-optimized version of the code:
for (int i = 0; i < 100000; i++){
if (external_condition_1 || (external_condition_2 && internal_condition[i])){
run_some_code;
}
else{
run_some_other_code;
}
run_lots_of_other_code;
}
This is my attempt at wrapping the loop in a templated function as suggested in the question linked above to optimize performance and avoid code duplication by writing multiple versions of the loop:
template<bool external_condition_1, bool external_condition_2>
void myloop(){
for (int i = 0; i < 100000; i++){
if (external_condition_1 || (external_condition_2 && internal_condition[i])){
run_some_code;
}
else{
run_some_other_code;
}
run_lots_of_other_code;
}
}
My question is: how can the code be written to avoid branching and code duplication?
Note that the code is sufficiently complex that the function probably can't be inlined, and compiler optimization also likely wouldn't sort this out in general.
My question is: how can the code be written to avoid branching and code duplication?
Well, you already wrote your template to avoid code duplication, right? So let's look at what branching is left. To do this, we should look at each function that is generated from your template (there are four of them). We should also apply the expected compiler optimizations based upon the template parameters.
First up, set condition 1 to true. This should produce two functions that are essentially (using a bit of pseudo-syntax) the following:
myloop<true, bool external_condition_2>() {
for (int i = 0; i < 100000; i++){
// if ( true || whatever ) <-- optimized out
run_some_code;
run_lots_of_other_code;
}
}
No branching there. Good. Moving on to the first condition being false and the second condition being true.
myloop<false, true>(){
for (int i = 0; i < 100000; i++){
if ( internal_condition[i] ){ // simplified from (false || (true && i_c[i]))
run_some_code;
}
else{
run_some_other_code;
}
run_lots_of_other_code;
}
}
OK, there is some branching going on here. However, each i needs to be analyzed to see which code should execute. I think there is nothing more that can be done here without more information about internal_condition. I'll give some thoughts on that later, but let's move on to the fourth function for now.
myloop<false, false>() {
for (int i = 0; i < 100000; i++){
// if ( false || (false && whatever) ) <-- optimized out
run_some_other_code;
run_lots_of_other_code;
}
}
No branching here. You already have done a good job avoiding branching and code duplication.
OK, let's go back to myloop<false,true>, where there is branching. The branching is largely unavoidable simply because of how your situation is set up. You are going to iterate many times. Some iterations you want to do one thing while other iterations should do another. To get around this, you would need to re-envision your setup so that you can do the same thing each iteration. (The optimization you are working from is based upon doing the same thing each iteration, even though it might be a different thing the next time the loop starts.)
The simplest, yet unlikely, scenario would be where internal_condition[i] is equivalent to something like i < 5000. It would also be convenient if you could do all of the "some code" before any of the "lots of other code". Then you could loop from 0 to 4999, running "some code" each iteration. Then loop from 5000 to 99999, running "other code". Then a third loop to run "lots of other code".
Any solution I can think of would involve adapting your situation to make it more like the unlikely simple scenario. Can you calculate how many times internal_condition[i] is true? Can you iterate that many times and map your (new) loop control variable to the appropriate value of i (the old loop control variable)? (Or maybe the exact value of i is not important?) Then do a second loop to cover the remaining cases? In some scenarios, this might be trivial. In others, far from it.
There might be other tricks that could be done, but they depend on more details about what you are doing, what you need to do, and what you think you need to do but don't really. (It's possible that the required level of detail would overwhelm StackOverflow.) Is the order important? Is the exact value of i important?
In the end, I would opt for profiling the code. Profile the code without code duplication but with branching. Profile the code with minimal branching but with code duplication. Is there a measurable change? If so, think about how you can re-arrange your internal condition so that i can cover large ranges without changing the value of the internal condition. Then divide your loop into smaller pieces.
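One way to sketch the "divide your loop into smaller pieces" idea is to sort the indices into two groups first and then run each group in its own loop. This assumes the order of iterations does not matter and that the "lots of other code" can run in a separate pass; the Tally struct and run_partitioned are hypothetical stand-ins that only count how often each body would run.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the loop bodies; they just tally work done.
struct Tally {
    int some = 0;
    int other = 0;
    int common = 0;
};

Tally run_partitioned(const std::vector<bool>& internal_condition) {
    // Pass 1: the only data-dependent branch, used to sort indices into groups.
    std::vector<std::size_t> yes, no;
    for (std::size_t i = 0; i < internal_condition.size(); ++i) {
        (internal_condition[i] ? yes : no).push_back(i);
    }

    Tally t;
    // Pass 2: each loop does the same thing on every iteration.
    for (std::size_t i : yes) { (void)i; ++t.some; }    // run_some_code for i
    for (std::size_t i : no)  { (void)i; ++t.other; }   // run_some_other_code for i
    for (std::size_t i = 0; i < internal_condition.size(); ++i) {
        ++t.common;                                     // run_lots_of_other_code
    }
    return t;
}
```

Note that this moves the branch into the partitioning pass rather than removing it, and it costs extra memory, so only profiling can tell whether it is a win for a given workload.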
In C++17, to guarantee that no extra branches are evaluated, you might do:
template <bool external_condition_1, bool external_condition_2>
void myloop()
{
for (int i = 0; i < 100000; i++){
if constexpr (external_condition_1) {
run_some_code;
} else if constexpr (external_condition_2){
if (internal_condition[i]) {
run_some_code;
} else {
run_some_other_code;
}
} else {
run_some_other_code;
}
run_lots_of_other_code;
}
}
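Since the external conditions are only known at run time, something has to pick the right instantiation. A possible sketch of that dispatch follows, assuming C++17; chosen_path and dispatch are made-up names, and the loop body is reduced to returning 1 for "some code" and 2 for "some other code".

```cpp
// Hypothetical body that just reports which path would run.
template <bool external_condition_1, bool external_condition_2>
int chosen_path(bool internal_condition) {
    if constexpr (external_condition_1) {
        return 1;                           // run_some_code
    } else if constexpr (external_condition_2) {
        return internal_condition ? 1 : 2;  // the only run-time branch
    } else {
        return 2;                           // run_some_other_code
    }
}

// Translate the run-time flags into the matching instantiation.
// In real code this would happen once, outside the hot loop.
int dispatch(bool c1, bool c2, bool internal_condition) {
    if (c1) {
        return c2 ? chosen_path<true, true>(internal_condition)
                  : chosen_path<true, false>(internal_condition);
    }
    return c2 ? chosen_path<false, true>(internal_condition)
              : chosen_path<false, false>(internal_condition);
}
```

The two run-time checks in dispatch are paid once per call instead of once per iteration, which is the point of moving the conditions into template parameters.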
I have a Heaviside step function centered on unity for any data type, which I've encoded using:
template <typename T>
int h1(const T& t){
if (t < 1){
return 0;
} else if (t >= 1){
return 1;
}
}
In code review, my reviewer told me that there is not an explicit return on all control paths. And the compiler does not warn me either. But I don't agree; the conditions are mutually exclusive. How do I deal with this?
It depends on how the template is used. For an int, you're fine.
But if t is an IEEE754 floating point double with a value set to NaN, neither t < 1 nor t >= 1 is true, and so program control reaches the end of the if block! The function then returns without an explicit value, which is undefined behaviour.
(In a more general case, where T overloads the < and >= operators in such a way as to not cover all possibilities, program control will reach the end of the if block with no explicit return.)
The moral of the story here is to decide on which branch should be the default, and make that one the else case.
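A short sketch of that advice, assuming you decide that the "return 1" branch is the default. The helper neither_branch_matches is a hypothetical addition that only shows why the original version falls through for NaN.

```cpp
#include <limits>

// The default branch is a plain else, so every input returns a value.
template <typename T>
int h1(const T& t) {
    if (t < 1) {
        return 0;
    } else {
        return 1;  // also taken for NaN, since NaN < 1 is false
    }
}

// True when both of the original conditions are false for t.
bool neither_branch_matches(double t) {
    return !(t < 1) && !(t >= 1);
}
```

For a quiet NaN, neither_branch_matches returns true, which is exactly the case the original if / else if left without a return. Whether NaN should map to 1, to 0, or to an error is a design decision this sketch does not settle.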
Just because code is correct, that doesn't mean it can't be better. Correct execution is the first step in quality, not the last.
if (t < 1) {
return 0;
} else if (t >= 1){
return 1;
}
The above is "correct" for any datatype of t that has sane behavior for < and >=. But this:
if (t < 1) {
return 0;
}
return 1;
Is easier to see by inspection that every case is covered, and avoids the second unneeded comparison altogether (that some compilers might not have optimized out). Code is not only read by compilers, but by humans, including you 10 years from now. Give the humans a break and write more simply for their understanding as well.
As noted, some special values (such as NaN) are neither < nor >=, so your reviewer is simply right.
The question is: what made you want to code it like this in the first place? Why do you even consider making life so hard for yourself and others (the people that need to maintain your code)? Just the fact that you are smart enough to deduce that < and >= should cover all cases doesn't mean that you have to make the code more complex than necessary. What goes for physics goes for code too: make things as simple as possible, but not simpler (I believe Einstein said this).
Think about it. What are you trying to achieve? Must be something like this: 'Return 0 if the input is less than 1, return 1 otherwise.' What you've done is add intelligence by saying ... oh but that means that I return 1 if t is greater or equal 1. This sort of needless 'x implies y' is requiring extra think work on behalf of the maintainer. If you think that is a good thing, I would advise to do a couple of years of code maintenance yourself.
If it were my review, I'd make another remark. If you use an 'if' statement, then you can basically do anything you want in all branches. But in this case, you do not do 'anything'. All you want to do is return 0 or 1 depending on whether t<1 or not. In those cases, I think the '?:' statement is much better and more readable than the if statement. Thus:
return t<1 ? 0 : 1;
I know the ?: operator is forbidden in some companies, and I find that a horrible thing to do. ?: usually matches much better with specifications, and it can make code so much easier to read (if used with care) ...
For example
int f(int a) {
...
return a > 10;
}
is that considered acceptable (not legal, I mean is it "good code"), or should it always be in a conditional, like this
int f(int a) {
...
if (a > 10)
return 1;
else
return 0;
}
It would be acceptable - if your return type was bool.
This is absolutely acceptable! In fact, Joel mentioned this on the latest stackoverflow podcast. He said it was the one thing he's had to show almost every programmer that starts at Fog Creek.
return a > 10 ? 1 : 0;
... makes more sense because you're returning an int, not a bool.
The first case is perfectly good, far better than the second, IMHO. As a matter of readability, I personally would do
return (a > 10);
but that is a minor nit, and not one everyone would agree on.
I don't see anything wrong with it. If anything it's more concise and I think most developers with moderate experience would prefer it.
The first is much preferable to me, since it is more concise. (And it avoids multiple returns:)
I'd rather write bool f(int); and the first form as bool is the boolean type in C++. If I really need to return an int, I'd write something like
int f(int i) {
...
const int res = (i>42) ? 1 : 0;
return res;
}
I'd never understood why people write
if (expr == true)
mybool = true ;
else
mybool = false;
instead of the plain
mybool = expr;
Boolean algebra is a tool that any developer should be able to handle instinctively
Moreover, I'd rather define a named temporary as some debuggers don't handle function return values very well.
I think it's perfectly acceptable, provided that you make an extra effort to maintain readability. For instance, I would make sure that the method name is very unambiguous and that you use good variable names.
The second alternative that you provided I think is almost worse because it involves a branch statement and multiple return statements and these things increase the complexity of the method while themselves reducing its readability.
Not only is that syntax 100% acceptable, you should also feel free to use boolean expressions outside of if statements, i.e. int x = i && ( j || k ); (or returning values like that).
I think part of it has to do with the style and culture of the language. The first example you have written is what would be expected from an experienced C programmer. They would much rather strangle themselves than put in an unnecessary block of statements.
I think it is perfectly acceptable when the language allows it and the usage is part of the paradigm of that language
I just tried three different variants with GCC:
int one(int x) { return (x > 42) ? 1 : 0; }
int two(int x) { return x > 42; }
int thr(int x) { if (x > 42) return 1; else return 0; }
As soon as you enable some optimization, the generated code for all of them is the same. So you should use the variant that is easiest to read.
I'll typically do the former over the latter.