In recursive DP, break up recursion call by storing variables: inefficient? - c++

Suppose I am solving a dynamic programming problem recursively (top down). For example, a recursive solution to the longest common subsequence problem:
int LCS(const std::string& S, int n, const std::string& T, int m)
{
    if (n == 0 || m == 0) return 0;
    int result;
    if (S[n-1] == T[m-1]) result = 1 + LCS(S, n-1, T, m-1);
    else result = std::max(LCS(S, n-1, T, m), LCS(S, n, T, m-1));
    return result;
}
Often in such a DP problem at some point we have to take the max of some expressions, representing returns to different choices we can make. In the above case we have the max of two simple expressions, but in worse cases it can be the max of three or four quite complicated expressions involving long function calls. In such situations, I am often tempted to give these complicated expressions their own variable names, to make the code more readable. In the above case that would mean I would write
int LCS(const std::string& S, int n, const std::string& T, int m)
{
    if (n == 0 || m == 0) return 0;
    int result;
    if (S[n-1] == T[m-1]) {
        result = 1 + LCS(S, n-1, T, m-1);
    } else {                           // braces are needed here: without them
        int a = LCS(S, n-1, T, m);     // only the first statement would be
        int b = LCS(S, n, T, m-1);     // part of the else branch
        result = std::max(a, b);
    }
    return result;
}
(In this simplified case a and b are not complicated, but in other cases they are, and there may be even more arguments to the max function, so this could really help it be more understandable.)
My Question: Is this a terrible idea? As I understand it, I'm adding a variable to each layer of the call stack, and I'm thinking that could be wasteful. But on the other hand, at each layer the results of the recursive calls have to be evaluated into temporaries anyway (I'm thinking in terms of C++, say), so as far as I know there might not be much difference in cost between the two ways.
If this is a terrible idea, is there a more efficient way to break up a complicated recursive function call to make it more readable?

C++ has the "As-If" rule, which states that a compiler can do whatever it wants so long as the observable effects are indistinguishable from what is defined by the standard to happen. In this case, it's trivial to prove both fragments have the same meaning, and a compiler will likely emit identical instructions for both.
Note: You aren't doing dynamic programming here, as you don't memoise parameter / result pairs.
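To make that note concrete, here is a minimal sketch of what memoisation could look like for this function (my own illustration, not part of the original answer; it assumes S and T are std::strings indexed from 0):
#include <algorithm>
#include <map>
#include <string>

// The cache maps (n, m) pairs to already-computed results, turning the
// exponential recursion into O(n * m) distinct subproblems.
int LCS(const std::string& S, int n, const std::string& T, int m,
        std::map<std::pair<int, int>, int>& cache)
{
    if (n == 0 || m == 0) return 0;
    auto it = cache.find({n, m});
    if (it != cache.end()) return it->second;   // reuse a stored result

    int result;
    if (S[n-1] == T[m-1]) result = 1 + LCS(S, n-1, T, m-1, cache);
    else result = std::max(LCS(S, n-1, T, m, cache),
                           LCS(S, n, T, m-1, cache));
    return cache[{n, m}] = result;
}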

Related

Else keyword in non void function in C++ [duplicate]

I am always in the habit of using if/else-if statements instead of multiple if statements.
Example:
int val = -1;
if (a == b1) {
return c1;
} else if (a == b2) {
return c2;
} ...
...
} else {
return c11;
}
How does it compare to example 2:
if (a == b1) {
return c1;
}
if (a == b2) {
return c2;
}
....
if (a == b11) {
return c11;
}
I know functionality-wise they are the same. But is it best practice to do if/else-if, or not? This was raised by one of my friends when I pointed out that he could structure the code differently to make it cleaner. It has long been a habit for me, but I have never asked why.
if-elseif-else statements stop doing comparisons as soon as it finds one that's true. if-if-if does every comparison. The first is more efficient.
Edit: It's been pointed out in comments that you do a return within each if block. In these cases, or in cases where control will leave the method (exceptions), there is no difference between doing multiple if statements and doing if-elseif-else statements.
However, it's best practice to use if-elseif-else anyhow. Suppose you change your code such that you don't do a return in every if block. Then, to remain efficient, you'd also have to change to an if-elseif-else idiom. Having it be if-elseif-else from the beginning saves you edits in the future, and is clearer to people reading your code (witness the misinterpretation I just gave you by doing a skim-over of your code!).
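For instance (a sketch of my own, not from the original answer, reusing the question's variable names): once the returns become assignments, the else-if structure is what keeps only one branch executing:
int pick(int a, int b1, int b2, int b11, int c1, int c2, int c11) {
    int val = -1;
    if (a == b1) {
        val = c1;
    } else if (a == b2) {      // skipped entirely once a == b1 matched
        val = c2;
    } else if (a == b11) {
        val = c11;
    }
    return val;
}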
What about the case where b1 == b2? (And if a == b1 and a == b2?)
When that happens, generally speaking, the following two chunks of code will very likely have different behavior:
if (a == b1) {
/* do stuff here, and break out of the test */
}
else if (a == b2) {
/* this block is never reached */
}
and:
if (a == b1) {
/* do stuff here */
}
if (a == b2) {
/* do this stuff, as well */
}
If you want to clearly delineate functionality for the different cases, use if-else or switch-case to make one test.
If you want different functionality for multiple cases, then use multiple if blocks as separate tests.
It's not a question of "best practices" so much as defining whether you have one test or multiple tests.
They are NOT functionally equivalent.
The only way it would be functionally equivalent is if you did an "if" statement for every single possible value of a (i.e. every possible int value, as defined in limits.h in C, using INT_MIN and INT_MAX, or the equivalent in Java).
The else statement allows you to cover every possible remaining value without having to write millions of "if" statements.
Also, it's better coding practice to use if...else if...else, just like how in a switch/case statement, your compiler will nag you with a warning if you don't provide a "default" case statement. This prevents you from overlooking invalid values in your program. eg:
#include <stdio.h>
#include <math.h>

double square_root(double x) {
    if (x > 0.0) {
        return sqrt(x);
    } else if (x == 0.0) {
        return x;
    } else {
        printf("INVALID VALUE: x must not be negative\n");
        return 0.0;
    }
}
Do you want to type millions of if statements for each possible value of x in this case? Doubt it :)
Cheers!
This totally depends on the conditions you're testing. In your example it will make no difference in the end, but as a best practice, if you want exactly ONE of the conditions to be executed, you should use if-else:
if (x > 1) {
System.out.println("Hello!");
}else if (x < 1) {
System.out.println("Bye!");
}
Also note that if the first condition is TRUE the second will NOT be checked at all but if you use
if (x > 1) {
System.out.println("Hello!");
}
if (x < 1) {
System.out.println("Bye!");
}
The second condition will be checked even if the first condition is TRUE. The optimizer might eventually eliminate the redundant check, but as far as I know the code behaves that way. The first form is also the one that expresses the intent, so it is always the best choice for me unless the logic requires otherwise.
if and else if is different from two consecutive if statements. With the first, when the CPU takes the first if branch, the else if won't be checked. With two consecutive if statements, even if the first if is checked and taken, the next if will also be checked and taken if its condition is true.
I tend to think that using else if is easier to read and more robust in the face of code changes. If someone were to adjust the control flow of the function and replace a return with a side effect, or a function call with a try-catch, the else-if version would still execute at most one branch as long as the conditions are truly exclusive, while separate ifs could fail hard. It really depends too much on the exact code you are working with to make a general judgment, and you need to consider the possible trade-offs with brevity.
With return statements in each if branch.
In your code, you have return statements in each of the if conditions. When you have a situation like this, there are two ways to write this. The first is how you've written it in Example 1:
if (a == b1) {
return c1;
} else if (a == b2) {
return c2;
} else {
return c11;
}
The other is as follows:
if (a == b1) {
return c1;
}
if (a == b2) {
return c2;
}
return c11; // no if or else around this return statement
These two ways of writing your code are identical.
The way you wrote your code in example 2 wouldn't compile in Java, and in C and C++ falling off the end of a value-returning function this way is undefined behavior (in C, strictly speaking, only if the caller uses the result), because the compiler doesn't know that you've covered all possible values of a, so it thinks there's a code path through the function that reaches the end without returning a value.
if (a == b1) {
return c1;
}
if (a == b2) {
return c2;
}
...
if (a == b11) {
return c11;
}
// what if you set a to some value c12?
Without return statements in each if branch.
Without return statements in each if branch, your code would be functionally identical only if the following statements are true:
You don't mutate the value of a in any of the if branches.
== is an equivalence relation (in the mathematical sense) and none of the b1 thru b11 are in the same equivalence class.
== doesn't have any side effects.
To clarify further about point #2 (and also point #3):
For built-in types in C and Java, == never has side effects, and on integer types it is an equivalence relation (floating-point is the classic exception: NaN is not equal to itself, so reflexivity fails).
In languages that let you override the == operator, such as C++, Ruby, or Scala, the overridden == operator may not be an equivalence relation, and it may have side effects. We certainly hope that whoever overrides the == operator was sane enough to write an equivalence relation that doesn't have side effects, but there's no guarantee.
In JavaScript and certain other programming languages with loose type conversion rules, there are cases built into the language where == is not transitive, or not symmetric. (In Javascript, === is an equivalence relation.)
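To make the C++ caveat concrete, here is a deliberately pathological operator== (my own sketch, not from the answer): it compiles fine, yet it has a side effect and is not reflexive when v is NaN.
#include <cstdio>

struct Flaky {
    double v;
    bool operator==(const Flaky& o) const {
        std::printf("comparing...\n");  // side effect on every comparison
        return v == o.v;                // NaN != NaN, so reflexivity fails
    }
};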
In terms of performance, example #1 is guaranteed not to perform any comparisons after the one that matches. It may be possible for the compiler to optimize #2 to skip the extra comparisons, but it's unlikely. In the following example, it probably can't, and if the strings are long, the extra comparisons aren't cheap.
if (strcmp(str, "b1") == 0) {
...
}
if (strcmp(str, "b2") == 0) {
...
}
if (strcmp(str, "b3") == 0) {
...
}
I prefer if/else structures because, together with switches, they make it much easier to evaluate all possible states of your problem in every variation. I find it more robust and quicker to debug, especially when you do multiple Boolean evaluations in a weakly-typed environment such as PHP. Here is an example of why elseif is bad (exaggerated for demonstration):
if(a && (c == d))
{
} elseif ( b && (!d || a))
{
} elseif ( d == a && ( b^2 > c))
{
} else {
}
This problem has more than 2^4 = 16 boolean states, which simply demonstrates the weak-typing effects that make things even worse. It isn't so hard to imagine a three-state-variable, three-variable problem chained in an if ab elseif bc kind of way.
Leave optimization to the compiler.
In most cases, using if-elseif-else and switch statements over if-if-if statements is more efficient (since it makes it easier for the compiler to create jump/lookup tables) and better practice since it makes your code more readable, plus the compiler makes sure you include a default case in the switch. This answer, along with this table comparing the three different statements was synthesized using other answer posts on this page as well as those of a similar SO question.
I think these code snippets are equivalent for the simple reason that you have many return statements. If you had a single return statement, you would be using else constructs that here are unnecessary.
Your comparison relies on the fact that the body of the if statements return control from the method. Otherwise, the functionality would be different.
In this case, they perform the same functionality. The latter is much easier to read and understand in my opinion and would be my choice as which to use.
They potentially do different things.
If a is equal to both b1 and b2, you enter two if blocks, whereas in the first example you only ever enter one. I imagine the first example is also faster, as the compiler probably has to check each condition sequentially (certain comparison rules may apply to the object). It may be able to optimise them out, but if you only want one branch to be entered, the first approach is more obvious, less likely to lead to developer mistakes or inefficient code, so I'd definitely recommend it.
CanSpice's answer is correct. An additional consideration for performance is to find out which conditional occurs most often. For example, if a==b1 only occurs 1% of the time, then you get better performance by checking the other case first.
Gir Loves Tacos' answer is also good. Best practice is to ensure you have all cases covered.

Does this function have explicit return values on all control paths?

I have a Heaviside step function centered on unity for any data type, which I've encoded using:
template <typename T>
int h1(const T& t){
if (t < 1){
return 0;
} else if (t >= 1){
return 1;
}
}
In code review, my reviewer told me that there is not an explicit return on all control paths. And the compiler does not warn me either. But I don't agree; the conditions are mutually exclusive. How do I deal with this?
It depends on how the template is used. For an int, you're fine.
But, if t is an IEEE754 floating point double type with a value set to NaN, neither t < 1 nor t >= 1 are true and so program control reaches the end of the if block! This causes the function to return without an explicit value; the behaviour of which is undefined.
(In a more general case, where T overloads the < and >= operators in such a way as to not cover all possibilities, program control will reach the end of the if block with no explicit return.)
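A tiny demonstration of the problem (my own sketch, not from the answer): for NaN, both tests are false, so control falls off the end of the function.
#include <cmath>
#include <cstdio>

int main() {
    double t = std::nan("");
    std::printf("%d %d\n", t < 1.0, t >= 1.0);   // prints: 0 0
}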
The moral of the story here is to decide on which branch should be the default, and make that one the else case.
Just because code is correct, that doesn't mean it can't be better. Correct execution is the first step in quality, not the last.
if (t < 1) {
return 0;
} else if (t >= 1){
return 1;
}
The above is "correct" for any datatype of t that has sane behavior for < and >=. But this:
if (t < 1) {
return 0;
}
return 1;
makes it easier to see by inspection that every case is covered, and avoids the second unneeded comparison altogether (which some compilers might not optimize out). Code is not only read by compilers, but by humans, including you 10 years from now. Give the humans a break and write more simply for their understanding as well.
As noted, some special numbers can be both < and >=, so your reviewer is simply right.
The question is: what made you want to code it like this in the first place? Why do you even consider making life so hard for yourself and others (the people that need to maintain your code)? Just the fact that you are smart enough to deduce that < and >= should cover all cases doesn't mean that you have to make the code more complex than necessary. What goes for physics goes for code too: make things as simple as possible, but not simpler (I believe Einstein said this).
Think about it. What are you trying to achieve? Must be something like this: 'Return 0 if the input is less than 1, return 1 otherwise.' What you've done is add intelligence by saying ... oh, but that means I return 1 if t is greater than or equal to 1. This sort of needless 'x implies y' requires extra thinking on behalf of the maintainer. If you think that is a good thing, I would advise doing a couple of years of code maintenance yourself.
If it were my review, I'd make another remark. If you use an 'if' statement, then you can basically do anything you want in all branches. But in this case, you do not do 'anything': all you want is to return 0 or 1 depending on whether t < 1 or not. In those cases, I think the '?:' operator is much better and more readable than the if statement. Thus:
return t<1 ? 0 : 1;
I know the ?: operator is forbidden in some companies, and I find that a horrible thing to do. ?: usually matches much better with specifications, and it can make code so much easier to read (if used with care) ...

Is there a technical reason to use > (<) instead of != when incrementing by 1 in a 'for' loop?

I almost never see a for loop like this:
for (int i = 0; 5 != i; ++i)
{}
Is there a technical reason to use > or < instead of != when incrementing by 1 in a for loop? Or this is more of a convention?
while (time != 6:30pm) {
Work();
}
It is 6:31pm... Damn, now my next chance to go home is tomorrow! :)
This is to show that the stronger restriction mitigates risks and is probably more intuitive to understand.
There is no technical reason. But there is mitigation of risk, maintainability and better understanding of code.
< or > are stronger restrictions than != and fulfill the exact same purpose in most cases (I'd even say in all practical cases).
There is a duplicate question here, with one interesting answer.
Yes there is a reason. If you write a (plain old index based) for loop like this
for (int i = a; i < b; ++i){}
then it works as expected for any values of a and b (i.e. zero iterations when a > b, instead of an infinite loop if you had used i != b).
On the other hand, for iterators you'd write
for (auto it = begin; it != end; ++it)
because any iterator must implement an operator!=, but not every iterator can provide an operator<.
Also range-based for loops
for (auto e : v)
are not just fancy sugar, but measurably reduce the chances of writing wrong code.
You can have something like
for(int i = 0; i<5; ++i){
...
if(...) i++;
...
}
If your loop variable is written by the inner code, the i != 5 test might never stop that loop, so it is safer to check with < than with !=.
Edit about readability.
The < form is far more frequently used. Therefore it is very fast to read, as there is nothing special to understand (brain load is reduced because the task is common), so it's good for readers when code makes use of these habits.
And last but not least, this is called defensive programming: always take the stronger condition, to keep current and future errors from influencing the program.
The only case where defensive programming is not needed is where states have been proven by pre- and post-conditions (but then, proving this is the most defensive of all programming).
I would argue that an expression like
for ( int i = 0 ; i < 100 ; ++i )
{
...
}
is more expressive of intent than is
for ( int i = 0 ; i != 100 ; ++i )
{
...
}
The former clearly calls out that the condition is a test for an exclusive upper bound on a range; the latter is a binary test of an exit condition. And if the body of the loop is non-trivial, it may not be apparent that the index is only modified in the for statement itself.
Iterators are an important case when you most often use the != notation:
for(auto it = vector.begin(); it != vector.end(); ++it) {
// do stuff
}
Granted: in practice I would write the same relying on a range-for:
for(auto & item : vector) {
// do stuff
}
but the point remains: one normally compares iterators using == or !=.
The loop condition is an enforced loop invariant.
Suppose you don't look at the body of the loop:
for (int i = 0; i != 5; ++i)
{
// ?
}
in this case, you know at the start of the loop iteration that i does not equal 5.
for (int i = 0; i < 5; ++i)
{
// ?
}
in this case, you know at the start of the loop iteration that i is less than 5.
The second is much, much more information than the first, no? Now, the programmer intent is (almost certainly) the same, but if you are looking for bugs, having confidence from reading a line of code is a good thing. And the second enforces that invariant, which means some bugs that would bite you in the first case just cannot happen (or don't cause memory corruption, say) in the second case.
You know more about the state of the program, from reading less code, with < than with !=. And on modern CPUs, the two comparisons take the same amount of time, so there is no performance difference.
If your i was not manipulated in the loop body, and it was always increased by 1, and it started less than 5, there would be no difference. But in order to know if it was manipulated, you'd have to confirm each of these facts.
Some of these facts are relatively easy to confirm, but you can get them wrong. Checking the entire body of the loop, however, is a pain.
In C++ you can write an indexes type such that:
for( const int i : indexes(0, 5) )
{
// ?
}
does the same thing as either of the two above for loops, even down to the compiler optimizing it to the same code. Here, however, you know that i cannot be manipulated in the body of the loop, as it is declared const, unless the code corrupts memory.
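One possible sketch of such an indexes type (my own illustration; the answer only assumes something like it can be written):
struct index_iterator {
    int value;
    int operator*() const { return value; }
    index_iterator& operator++() { ++value; return *this; }
    bool operator!=(const index_iterator& o) const { return value != o.value; }
};

struct index_range {
    int first, last;
    index_iterator begin() const { return {first}; }
    index_iterator end() const { return {last}; }
};

index_range indexes(int first, int last) { return {first, last}; }
// usage: for (const int i : indexes(0, 5)) { /* i cannot be reassigned */ }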
The more information you can get out of a line of code without having to understand the context, the easier it is to track down what is going wrong. < in the case of integer loops gives you more information about the state of the code at that line than != does.
As already said by Ian Newson, you can't reliably loop over a floating-point variable and exit with !=. For instance,
for (double x=0; x!=1; x+=0.1) {}
will actually loop forever, because 0.1 can't exactly be represented in floating point, hence the counter narrowly misses 1. With < it terminates.
(Note however that without carefully analysing the rounding it isn't obvious whether you get 0.9999... as the last accepted number – which kind of violates the less-than assumption – or already exit at 1.0000000000000001.)
Yes; OpenMP doesn't parallelize loops with the != condition.
It may happen that the variable i is set to some large value and if you just use the != operator you will end up in an endless loop.
As you can see from the other numerous answers, there are reasons to use < instead of != which will help in edge cases, initial conditions, unintended loop counter modification, etc...
Honestly though, I don't think you can stress the importance of convention enough. For this example it will be easy enough for other programmers to see what you are trying to do, but it will cause a double-take. One of the jobs while programming is making it as readable and familiar to everyone as possible, so inevitably when someone has to update/change your code, it doesn't take a lot of effort to figure out what you were doing in different code blocks. If I saw someone use !=, I'd assume there was a reason they used it instead of < and if it was a large loop I'd look through the whole thing trying to figure out what you did that made that necessary... and that's wasted time.
I take the adjective "technical" to mean language behavior/quirks and compiler side effects such as performance of generated code.
To this end, the answer is: no(*). The (*) is "please consult your processor manual". If you are working with some edge-case RISC or FPGA system, you may need to check what instructions are generated and what they cost. But if you're using pretty much any conventional modern architecture, then there is no significant processor level difference in cost between lt, eq, ne and gt.
If you are using an edge case you could find that != requires three operations (cmp, not, beq) vs two (cmp, blt). Again, RTM in that case.
For the most part, the reasons are defensive/hardening, especially when working with pointers or complex loops. Consider
// highly contrived example
size_t count_chars(char c, const char* str, size_t len) {
    size_t count = 0;
    bool quoted = false;
    const char* p = str;
    while (p != str + len) {
        if (*p == '"') {
            quoted = !quoted;
            ++p;
        }
        // note: p is incremented again below, so if the final character is
        // a '"', p can step right past str + len and the != test never fires
        if (*(p++) == c && !quoted)
            ++count;
    }
    return count;
}
A less contrived example would be where you are using return values to perform increments, accepting data from a user:
#include <iostream>
int main() {
size_t len = 5, step;
for (size_t i = 0; i != len; ) {
std::cout << "i = " << i << ", step? " << std::flush;
std::cin >> step;
i += step; // here for emphasis, it could go in the for(;;)
}
}
Try this and input the values 1, 2, 10, 999.
You could prevent this:
#include <iostream>
int main() {
size_t len = 5, step;
for (size_t i = 0; i != len; ) {
std::cout << "i = " << i << ", step? " << std::flush;
std::cin >> step;
if (step + i > len)
std::cout << "too much.\n";
else
i += step;
}
}
But what you probably wanted was
#include <iostream>
int main() {
size_t len = 5, step;
for (size_t i = 0; i < len; ) {
std::cout << "i = " << i << ", step? " << std::flush;
std::cin >> step;
i += step;
}
}
There is also something of a convention bias towards <, because ordering in standard containers often relies on operator<; for instance, the ordered STL containers (std::map, std::set) determine equivalence of keys by saying
if (lhs < rhs)       // T.operator<
    lessthan
else if (rhs < lhs)  // T.operator< again
    greaterthan
else
    equal
If lhs and rhs are of a user-defined class, writing this code as
if (lhs < rhs)       // requires T.operator<
    lessthan
else if (lhs > rhs)  // requires T.operator>
    greaterthan
else
    equal
The implementor has to provide two comparison functions. So < has become the favored operator.
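In other words (a sketch of my own, not from the answer), a type only has to supply operator< for an ordered container to derive equivalence:
struct Key {
    int v;
    bool operator<(const Key& o) const { return v < o.v; }
};

// how std::map / std::set decide two keys are "the same":
bool equivalent(const Key& a, const Key& b) {
    return !(a < b) && !(b < a);   // no operator== or operator> needed
}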
There are several ways to write any kind of code (usually), there just happens to be two ways in this case (three if you count <= and >=).
In this case, people prefer > and < to make sure that even if something unexpected happens in the loop (like a bug), it won't loop infinitely (BAD). Consider the following code, for example.
for (int i = 1; i != 3; i++) {
//More Code
i = 5; //OOPS! MISTAKE!
//More Code
}
If we used (i < 3), we would be safe from an infinite loop because it placed a bigger restriction.
It's really your choice whether you want a mistake in your program to shut the whole thing down or keep functioning with the bug there.
Hope this helped!
The most common reason to use < is convention. More programmers think of loops like this as "while the index is in range" rather than "until the index reaches the end." There's value in sticking to convention when you can.
On the other hand, many answers here are claiming that using the < form helps avoid bugs. I'd argue that in many cases this just helps hide bugs. If the loop index is supposed to reach the end value, and, instead, it actually goes beyond it, then there's something happening you didn't expect which may cause a malfunction (or be a side effect of another bug). The < will likely delay discovery of the bug. The != is more likely to lead to a stall, hang, or even a crash, which will help you spot the bug sooner. The sooner a bug is found, the cheaper it is to fix.
Note that this convention is peculiar to array and vector indexing. When traversing nearly any other type of data structure, you'd use an iterator (or pointer) and check directly for an end value. In those cases you have to be sure the iterator will reach and not overshoot the actual end value.
For example, if you're stepping through a plain C string, it's generally more common to write:
for (char *p = foo; *p != '\0'; ++p) {
// do something with *p
}
than
int length = strlen(foo);
for (int i = 0; i < length; ++i) {
// do something with foo[i]
}
For one thing, if the string is very long, the second form will be slower because the strlen is another pass through the string.
With a C++ std::string, you'd use a range-based for loop, a standard algorithm, or iterators, even though the length is readily available. If you're using iterators, the convention is to use != rather than <, as in:
for (auto it = foo.begin(); it != foo.end(); ++it) { ... }
Similarly, iterating a tree or a list or a deque usually involves watching for a null pointer or other sentinel rather than checking if an index remains within a range.
One reason not to use this construct is floating-point numbers. != is a very dangerous comparison to use with floats, as two values will rarely compare equal even when they look the same; < or > removes this risk.
There are two related reasons for following this practice that both have to do with the fact that a programming language is, after all, a language that will be read by humans (among others).
(1) A bit of redundancy. In natural language we usually provide more information than is strictly necessary, much like an error correcting code. Here the extra information is that the loop variable i (see how I used redundancy here? If you didn't know what 'loop variable' means, or if you forgot the name of the variable, after reading "loop variable i" you have the full information) is less than 5 during the loop, not just different from 5. Redundancy enhances readability.
(2) Convention. Languages have specific standard ways of expressing certain situations. If you don't follow the established way of saying something, you will still be understood, but the effort for the recipient of your message is greater because certain optimisations won't work. Example:
Don't talk around the hot mash. Just illuminate the difficulty!
The first sentence is a literal translation of a German idiom. The second is a common English idiom with the main words replaced by synonyms. The result is comprehensible but takes a lot longer to understand than this:
Don't beat around the bush. Just explain the problem!
This is true even in case the synonyms used in the first version happen to fit the situation better than the conventional words in the English idiom. Similar forces are in effect when programmers read code. This is also why 5 != i and 5 > i are weird ways of putting it unless you are working in an environment in which it is standard to swap the more normal i != 5 and i < 5 in this way. Such dialect communities do exist, probably because consistency makes it easier to remember to write 5 == i instead of the natural but error prone i == 5.
Using relational comparisons in such cases is more of a popular habit than anything else. It gained its popularity back in the times when such conceptual considerations as iterator categories and their comparability were not considered high priority.
I'd say that one should prefer to use equality comparisons instead of relational comparisons whenever possible, since equality comparisons impose fewer requirements on the values being compared. Being EqualityComparable is a lesser requirement than being LessThanComparable.
Another example that demonstrates the wider applicability of equality comparison in such contexts is the popular conundrum with implementing unsigned iteration down to 0. It can be done as
for (unsigned i = 42; i != -1; --i)
...
Note that the above is equally applicable to both signed and unsigned iteration, while the relational version breaks down with unsigned types.
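For contrast (my own illustration), the relational version of that countdown can never terminate for an unsigned counter:
for (unsigned i = 42; i >= 0; --i) { }   // i >= 0 is always true: infinite loop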
Besides the examples where the loop variable will (unintentionally) change inside the body, there are other reasons to use the less-than or greater-than operators:
Negations make code harder to understand
< or > is only one char, but != two
In addition to the various people who have mentioned that it mitigates risk, it also reduces the number of function overloads necessary to interact with various standard library components. As an example, if you want your type to be storable in a std::set, or used as a key for std::map, or used with some of the searching and sorting algorithms, the standard library usually uses std::less to compare objects as most algorithms only need a strict weak ordering. Thus it becomes a good habit to use the < comparisons instead of != comparisons (where it makes sense, of course).
There is no problem from a syntax perspective, but the logic behind the expression 5 != i is not sound.
In my opinion, using != to set the bounds of a for loop is not logically sound because a for loop either increments or decrements the iteration index, so setting the loop to iterate until the index becomes out of bounds (!= to something) is not a proper implementation.
It will work, but it is prone to misbehavior, since the boundary handling is lost when using != for an incremental problem (you know from the start whether it increments or decrements), which is why <, >, <=, and >= are used instead of !=.

c switch and jump tables

It is my understanding that a switch statement in C/C++ will sometimes compile to a jump table.
My question is, are there any rules of thumb to ensure that?
In my case I'm doing something like this:
enum myenum{
    MY_CASE0 = 0,
    MY_CASE1 = 1,
    .
    .
    .
};
switch(foo)
{
case MY_CASE0:
//do stuff
break;
case MY_CASE1:
//do stuff
break;
.
.
.
}
I cover all the cases from 1 to n in order. Is it safe to assume it will compile to a jump table?
The original code was a long and messy if else statement, so at the very least I gain some readability.
A good compiler can and will choose between a jump table, a chained if/else, or a combination. A poorly designed compiler may not make such a choice - and may even produce very bad code for switch blocks - but any decent compiler should produce efficient code for them.
The major decision factor here is that the compiler may choose if/else when the numbers are far apart [and not trivially (e.g. by dividing by 2, 4, 8, 16, 256, etc.) changed to a closer value], e.g.
switch(x)
{
case 1:
...
case 4912:
...
case 11211:
...
case 19102:
...
}
would require a jump table of at least 19102 * 2 bytes.
On the other hand, if the numbers are close together, the compiler will typically use a jumptable.
Even if it's an if/else type of design, it will typically do a "binary search" - if we take the above example:
if (x <= 4912)
{
    if (x == 1)
    {
        ....
    }
    else if (x == 4912)
    {
        ....
    }
}
else
{
    if (x == 11211)
    {
        ....
    }
    else if (x == 19102)
    {
        ...
    }
}
If we have LOTS of cases, this approach will nest quite deep, and humans will probably get lost after three or four levels of depth (bearing in mind that each if starts at some point in the MIDDLE of the range), but it reduces the number of tests to about log2(n), where n is the number of choices. It is certainly a lot more efficient than the naive approach of
if (x == first value) ...
else if (x == second value) ...
else if (x == third value) ...
..
else if (x == nth value) ...
else ...
This can be slightly better if certain values are put at the beginning of the if-else chain, but that assumes you can determine what is the most common before running the code.
If performance is CRITICAL to your case, then you need to benchmark the two alternatives. But my guess is that just writing the code as a switch will make the code much clearer, and at the same time run at least as fast, if not faster.
Compilers can certainly convert any C/C++ switch into a jump table, but a compiler would do this for efficiency. Ask yourself: what would I do if I were writing a compiler and I had just built a parse tree for a switch/case statement? I have studied compiler design and construction, and here are some of the decisions:
How to help a compiler decide to implement a jump table:
case values are small integers (0,1,2,3,...)
case values are in a compact range (few holes, remember default is an option)
there are enough cases to make the optimization worthwhile (> N, examine your compiler source to find the constant)
clever compilers may subtract/add a constant to a jumptable index if the range is compact (example: 1000, 1001, 1002, 1003, 1004, 1005, etc)
avoid fallthrough and transfer of control (goto, continue)
only one break at end of each case
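Putting those heuristics together, a compact zero-based switch like this sketch (my own illustration) is the classic jump-table candidate:
// dense, zero-based cases with no fallthrough: a compiler is very likely
// to emit a bounds check plus a single indexed jump for this
int dispatch(int op) {
    switch (op) {
        case 0: return 10;
        case 1: return 20;
        case 2: return 30;
        case 3: return 40;
        case 4: return 50;
        default: return -1;
    }
}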
Though the mechanics may differ between compilers, the compiler is essentially creating unnamed blocks of code (well, maybe not true functions, because it may jump into and out of the code block, or may be clever and use jsr and return).
The one certain way to get a jump table is to write it yourself: an array of pointers to functions, indexed by the value you want.
How?
Define a typedef for your function pointer (see: Understanding typedefs for function pointers in C):
typedef void (*FunkPtr)(double a1, double a2);
FunkPtr JumpTable[] = {
function_name_0,
function_name_1,
function_name_2,
...
function_name_n
};
Of course, you have already defined function_name_{0..n}, so the compiler can find the address of each function to invoke.
I will leave invocation of the function pointer and boundary checking as an exercise for the reader.
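That said, a minimal sketch of the exercise might look like this (my own illustration, reusing the JumpTable above):
// hypothetical bounds-checked dispatch through the table
void call_jump_table(unsigned index, double a1, double a2) {
    const unsigned n = sizeof(JumpTable) / sizeof(JumpTable[0]);
    if (index < n)
        JumpTable[index](a1, a2);   // one indexed call, no chain of compares
    // else: out-of-range index; handle the error as appropriate
}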

If-else-if versus map

Suppose I have such an if/else-if chain:
if( x.GetId() == 1 )
{
}
else if( x.GetId() == 2 )
{
}
// ... 50 more else if statements
What I wonder is, if I keep a map, will it be any better in terms of performance? (assuming keys are integers)
Maps (usually) are implemented using red-black trees which gives O(log N) lookups as the tree is constantly kept in balance. Your linear list of if statements will be O(N) worst case. So, yes a map would be significantly faster for lookup.
Many people are recommending using a switch statement, which may or may not be faster for you, depending on your actual if statements. A compiler can sometimes optimize a switch by using a jump table, which would be O(1), but this is only possible for values that meet certain criteria (roughly, a reasonably dense range of integers), so whether it happens is compiler-dependent. There is a great article with a few tips on optimizing switch statements here: Optimizing C and C++ Code.
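For illustration, if each branch's body is pulled out into a function, the map version of the original chain might look like this (my own sketch; the handler names are hypothetical):
#include <cstdio>
#include <functional>
#include <map>

void handle1() { std::printf("id 1\n"); }
void handle2() { std::printf("id 2\n"); }
// ... one handler per ID

void dispatch(int id) {
    // built once; each call is an O(log N) lookup instead of a linear scan
    static const std::map<int, std::function<void()>> handlers = {
        {1, handle1},
        {2, handle2},
        // ... 48 more entries
    };
    auto it = handlers.find(id);
    if (it != handlers.end()) it->second();
}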
You technically could even formulate a balanced tree manually. This works best for static data; I happened to just recently create a function to quickly find which bit is set in a byte. (It was used in an embedded application on an I/O pin interrupt and had to be fast, since 99% of the time only 1 bit would be set in the byte):
unsigned char single_bit_index(unsigned char bit) {
    // Hard-coded balanced tree lookup
    if (bit > 0x08)
        if (bit > 0x20)
            if (bit == 0x40)
                return 6;
            else
                return 7;
        else
            if (bit == 0x10)
                return 4;
            else
                return 5;
    else
        if (bit > 0x02)
            if (bit == 0x04)
                return 2;
            else
                return 3;
        else
            if (bit == 0x01)
                return 0;
            else
                return 1;
}
This gives a constant lookup in 3 steps for any of the 8 values, which gives me very deterministic performance; a linear search, given random data, would average 4.5 steps, with a best case of 1 and a worst case of 8.
This is a good example of a range that a compiler would probably not optimize to a jump table since the 8 values I am searching for are so far apart: 1, 2, 4, 8, 16, 32, 64, and 128. It would have to create a very sparse 128 position table with only 8 elements containing a target, which on a PC with a ton of RAM might not be a big deal, but on a microcontroller it'd be killer.
Why don't you use a switch?
switch(x.GetId())
{
case 1: /* do work */ break; // From the most used case
case 2: /* do work */ break;
case ...: // To the less used case
}
EDIT:
Put the most frequently used case at the top of the switch. (Ordering by case number, as shown, can itself be a performance issue if x.GetId() is generally equal to 50.)
switch is the best thing I think
The better solution would be a switch statement. This will allow you to check the value of x.GetId() just once, rather than (on average) 25 times as your code is doing now.
If you want to get fancy, you can use a data structure containing pointers to functions that handle whatever it is that's in the braces. If your ID values are consecutive (i.e. numbers between 1 and 50) then an array of function pointers would be best. If they are spread out, then a map would be more appropriate.
The answer, as with most performance related questions, is maybe.
If the IDs are in a fortunate range, a switch might become a jump-table, providing constant time lookups to all IDs. You won't get much better than this, short of redesigning. Alternatively, if the IDs are consecutive but you don't get a jump-table out of the compiler, you can force the issue by filling an array with function pointers.
[from here on out, switch refers to a generic if/else chain]
A map provides worst-case logarithmic lookup for any given ID, while a switch can only guarantee linear. However, if the IDs are not random, sorting the switch cases by usage might ensure the worst-case scenario is sufficiently rare that this doesn't matter.
A map will incur some initial overhead when loading the IDs and associating them with the functions, and then incur the overhead of calling a function pointer every time you access an ID. A switch incurs additional overhead when writing the routine, and possibly significant overhead when debugging it.
Redesigning might allow you to avoid the question all together. No matter how you implement it, this smells like trouble. I can't help but think there's a better way to handle this.
If I really had a potential switch of fifty possibilities, I'd definitely think about a vector of pointers to functions.
#include <cstdio>
#include <cstdlib>
#include <ctime>
const unsigned int Max = 4;
void f1();
void f2();
void f3();
void f4();
void (*vp[Max])();
int main()
{
vp[ 0 ] = f1;
vp[ 1 ] = f2;
vp[ 2 ] = f3;
vp[ 3 ] = f4;
srand( std::time( NULL ) );
vp[( rand() % Max )]();
}
void f1()
{
std::printf( "Hello from f1!\n" );
}
void f2()
{
std::printf( "Hello from f2!\n" );
}
void f3()
{
std::printf( "Hello from f3!\n" );
}
void f4()
{
std::printf( "Hello from f4!\n" );
}
There are a lot of suggestions involving switch-case. In terms of efficiency, this might be better, might be the same. Won't be worse.
But if you're just setting/returning a value or name based on the ID, then YES. A map is exactly what you need. STL containers are optimised, and if you think you can optimise better, then you are either incredibly smart or staggeringly dumb.
e.g. a single call using a std::map called mymap,
thisvar = mymap[x.getID()];
is much better than 50 of these
if(x.getID() == ...){thisvar = ...;}
because it's more efficient as the number of IDs increases. If you're interested in why, search for a good primer on data structures.
But what I'd really look at here is maintenance/fixing time. If you need to change the name of the variable, or change from using getID() or getName(), or make any kind of minor change, you've got to do it FIFTY TIMES in your example. And you need a new line every time you add an ID.
The map reduces that to one code change NO MATTER HOW MANY IDs YOU HAVE.
That said, if you're actually carrying out different actions for each ID, a switch-case might be better. With switch-case rather than if statements, you can improve performance and readability. See here: Advantage of switch over if-else statement
I'd avoid pointers to functions unless you're very clear on how they'd improve your code, because if you're not 100% certain what you're doing, the syntax can be messed up, and it's overkill for anything you'd feasibly use a map for.
Basically, I'd be interested in the problem you're trying to solve. You might be better off with a map or a switch-case, but if you think you can use a map, that is ABSOLUTELY what you should be using instead.