c switch and jump tables

It is my understanding that a switch statement in C/C++ will sometimes compile to a jump table.
My question is: are there any rules of thumb to ensure that?
In my case I'm doing something like this:
enum myenum{
MY_CASE0= 0,
MY_CASE1= 1,
.
.
.
};
switch(foo)
{
case MY_CASE0:
//do stuff
break;
case MY_CASE1:
//do stuff
break;
.
.
.
}
I cover all the cases from 0 to n in order. Is it safe to assume it will compile to a jump table?
The original code was a long and messy if else statement, so at the very least I gain some readability.

A good compiler can and will choose between a jump table, a chained if/else, or a combination. A poorly designed compiler may not make such a choice, and may even produce very bad code for switch blocks. But any decent compiler should produce efficient code for switch blocks.
The major decision factor here is that the compiler may choose if/else when the numbers are far apart [and cannot trivially be mapped to closer values (e.g. by dividing by 2, 4, 8, 16, 256 etc.)], e.g.
switch(x)
{
case 1:
...
case 4912:
...
case 11211:
...
case 19102:
...
}
would require a jump table of at least 19102 * 2 bytes.
On the other hand, if the numbers are close together, the compiler will typically use a jump table.
Even if it uses an if/else type of design, it will typically do a "binary search". If we take the above example:
if (x <= 4912)
{
if (x == 1)
{
....
}
else if (x == 4912)
{
....
}
} else {
if (x == 11211)
{
....
}
else if (x == 19102)
{
...
}
}
If we have LOTS of cases, this approach will nest quite deep, and humans will probably get lost after three or four levels of depth (bearing in mind that each if starts at some point in the MIDDLE of the range), but it reduces the number of tests to about log2(n), where n is the number of choices. It is certainly a lot more efficient than the naive approach of
if (x == first value) ...
else if (x == second value) ...
else if (x == third value) ...
..
else if (x == nth value) ...
else ...
This can be slightly better if certain values are put at the beginning of the if-else chain, but that assumes you can determine what is the most common before running the code.
If performance is CRITICAL to your case, then you need to benchmark the two alternatives. But my guess is that just writing the code as a switch will make the code much clearer, and at the same time run at least as fast, if not faster.

Compilers can certainly convert any C/C++ switch into a jump table, but a compiler would only do this for efficiency. Ask yourself: what would I do if I were writing a compiler and had just built a parse tree for a switch/case statement? I have studied compiler design and construction, and here are some of the factors that go into that decision.
How to help a compiler decide to implement a jump table:
case values are small integers (0,1,2,3,...)
case values are in a compact range (few holes, remember default is an option)
there are enough cases to make the optimization worthwhile (> N, examine your compiler source to find the constant)
clever compilers may subtract/add a constant to a jumptable index if the range is compact (example: 1000, 1001, 1002, 1003, 1004, 1005, etc)
avoid fallthrough and transfer of control (goto, continue)
only one break at end of each case
Though the mechanics may differ between compilers, the compiler is essentially creating unnamed functions (well, maybe not functions, because the compiler may jump into the code block and jump out of the code block, or may be clever and use jsr and return).
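As a rough illustration of the "subtract a constant" item in the list above, here is a hand-written equivalent of what a compiler may do for a compact range of case values. The function names and the values 1000..1005 are made up for this sketch, and what a given compiler actually emits depends on the compiler and its optimization settings.

```cpp
#include <cstddef>

// Hypothetical dense switch: the case values 1000..1005 form a compact range.
int describe_switch(int code)
{
    switch (code) {
    case 1000: return 10;
    case 1001: return 11;
    case 1002: return 12;
    case 1003: return 13;
    case 1004: return 14;
    case 1005: return 15;
    default:   return -1;
    }
}

// Hand-written equivalent: subtract the base, bounds-check, then index a table.
int describe_table(int code)
{
    static const int table[] = {10, 11, 12, 13, 14, 15};
    const std::size_t count = sizeof(table) / sizeof(table[0]);

    // Unsigned arithmetic: codes below 1000 wrap around to very large values,
    // so a single comparison rejects both ends of the range.
    const unsigned index = static_cast<unsigned>(code) - 1000u;
    return index < count ? table[index] : -1;
}
```

Both functions return the same result for every input; the second one only makes the rebasing and the range check explicit.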
The certain way to get a jump table is to write it. It is an array of pointers to functions, indexed by the value you want.
How?
Define a typedef for your function pointer (see "Understanding typedefs for function pointers in C"):
typedef void (*FunkPtr)(double a1, double a2);
FunkPtr JumpTable[] = {
function_name_0,
function_name_1,
function_name_2,
...
function_name_n
};
Of course, you have already defined function_name_{0..n}, so the compiler can find the address of the function to invoke.
I will leave invocation of the function pointer and boundary checking as an exercise for the reader.
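For readers who want a starting point for that exercise, here is one possible sketch of the invocation and the boundary check. The handler names, the BinaryOp typedef, and the dispatch function are hypothetical, and unlike the typedef above the handlers return a value so that the result is visible.

```cpp
#include <cstddef>

// Assumed signature for this sketch: two doubles in, one double out.
typedef double (*BinaryOp)(double a1, double a2);

// Hypothetical handlers standing in for function_name_0..n.
double op_add(double a1, double a2) { return a1 + a2; }
double op_sub(double a1, double a2) { return a1 - a2; }
double op_mul(double a1, double a2) { return a1 * a2; }

static const BinaryOp JumpTable[] = { op_add, op_sub, op_mul };
static const std::size_t JumpTableSize = sizeof(JumpTable) / sizeof(JumpTable[0]);

// Calls the handler at 'index'. Returns false and leaves 'result'
// untouched when the index is outside the table.
bool dispatch(std::size_t index, double a1, double a2, double& result)
{
    if (index >= JumpTableSize) {
        return false;
    }
    result = JumpTable[index](a1, a2);
    return true;
}
```

Computing the size from the array itself keeps the bounds check correct when handlers are added or removed.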

Related

Nested if statements vs extra else if?

I come across a lot of logic work where I'm not sure which design pattern is better for if statements. In these situations, I can usually either add an extra else if or use a nested if statement. These two cases are shown below. What are the pros and cons of each, and is there a standard I should follow?
if (val > 0 && is_on)
{
// (1)
}
else if (val > 0)
{
// (2)
}
else
{
// (3)
}
if (val > 0)
{
if(is_on)
{
// same as (1)
}
else
{
// same as (2)
}
}
else
{
// same as (3)
}

I don't think there are any specific pros or cons to either approach. It all depends on how you want to design your code and what you think is more readable to someone looking at your code for the first time.
To me, the first approach looks better, as it's more readable and contains fewer lines of code.

The first approach is more readable. But as the logic expressions (e.g. "val > 0 && is_on") get longer, it starts to make more sense to merge towards the second approach. The second one is easier to debug, so you could start there and then merge back. I'd match the style of the surrounding code/"code-policy" ultimately.

While the other answers are absolutely right that your primary focus should be readability, I want to address another difference: execution performance.
In the first example, there are 2 conditions that need to be evaluated before the else branch can run. As we scale up the number of conditions baked into this else/if ladder, the amount of evaluation needed to reach the else branch grows linearly. Now, we aren't expecting to have ten thousand conditions or anything, but it is something to take note of nonetheless.
Now, in your second example, we check the common condition between the first two branches, and if that fails, we quick-fail to the else branch, with no extra tests. In the extreme case, this can somewhat resemble a binary search for the correct code block: branching left and right until it finds its match, as opposed to a linear scan that checks each condition in order, one by one.
Now, does this mean you should use the latter? Not necessarily. Readability is more important, and if you're writing in a compiled language, the compiler will likely optimize all of that away anyway. And even if you're in an interpreted language, the performance hit is probably going to be negligible compared to everything else, unless this is the hot section of a hot loop.
However, if you are bothered by the "wastefulness" of the repetition in the first example, but would rather avoid huge amounts of nesting, often languages will provide an assignment expression syntax, giving you a 3rd option, where you compute the result once and store it to a variable inline, for reuse in subsequent code.
For example:
if (expensive_func1() > 0 && is_on)
{
// (1)
}
else if (expensive_func1() > 0 && expensive_func2() > 0)
{
// (2)
}
else
{
// (3)
}
Becomes:
if ((is_alive = expensive_func1() > 0) && is_on)
{
// (1)
}
else if (is_alive && expensive_func2() > 0)
{
// (2)
}
else
{
// (3)
}
This saves us from recomputing the common subexpressions between our conditionals, in languages where we can't rely on a compiler to do that for us. Sure, we could just assign these to variables explicitly before the if statements, but then we bite the bullet of evaluating all shared expressions up front, rather than lazily evaluating them as needed (imagine we compute expensive_func2() > 0 for reuse in a third if/else, only to find out we didn't need it because we're taking the first branch).
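As a small C++ rendering of the same idea, the sketch below counts how often each function is called. The counters, the stub bodies of expensive_func1 and expensive_func2, and the pick_branch wrapper are assumptions added only to make the laziness observable.

```cpp
// Call counters, present only so the effect can be observed.
int calls_to_func1 = 0;
int calls_to_func2 = 0;

// Stand-ins for the expensive functions; both always return 1 here.
int expensive_func1() { ++calls_to_func1; return 1; }
int expensive_func2() { ++calls_to_func2; return 1; }

// Returns the number of the branch that was taken: 1, 2 or 3.
int pick_branch(bool is_on)
{
    bool is_alive = false;

    // The assignment runs before && is_on is tested, so is_alive is
    // already set by the time the else-if reads it.
    if ((is_alive = expensive_func1() > 0) && is_on) {
        return 1;
    } else if (is_alive && expensive_func2() > 0) {
        // expensive_func2() is only reached when the first test failed.
        return 2;
    } else {
        return 3;
    }
}
```

With is_on set to true, expensive_func1 runs once and expensive_func2 is never called; with is_on set to false, each runs exactly once.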

In recursive DP, break up recursion call by storing variables: inefficient?

Suppose I am solving a dynamic programming problem recursively (top down). For example, a recursive solution to the longest common subsequence problem:
LCS(S,n,T,m)
{
if (n==0 || m==0) return 0;
if (S[n] == T[m]) result = 1 + LCS(S,n-1,T,m-1);
else result = max( LCS(S,n-1,T,m), LCS(S,n,T,m-1) );
return result;
}
Often in such a DP problem at some point we have to take the max of some expressions, representing returns to different choices we can make. In the above case we have the max of two simple expressions, but in worse cases it can be the max of three or four quite complicated expressions involving long function calls. In such situations, I am often tempted to give these complicated expressions their own variable names, to make the code more readable. In the above case that would mean I would write
LCS(S,n,T,m)
{
if (n==0 || m==0) return 0;
if (S[n] == T[m]) result = 1 + LCS(S,n-1,T,m-1);
else {
a = LCS(S,n-1,T,m);
b = LCS(S,n,T,m-1);
result = max(a, b);
}
return result;
}
(In this simplified case a and b are not complicated, but in other cases they are, and there may be even more arguments to the max function, so this could really help it be more understandable.)
My Question: Is this a terrible idea? As I understand it, I'm adding a variable to each layer of the call stack, and I'm thinking that could be wasteful. But on the other hand, at each layer it has to calculate the temporary variable LCS(S,n,T,m) anyway (I'm thinking in terms of C++, say), and as far as I know, there might be not much difference in cost between the two ways.
If this is a terrible idea, is there a more efficient way to break up a complicated recursive function call to make it more readable?

C++ has the "As-If" rule, which states that a compiler can do whatever it wants so long as the observable effects are indistinguishable from what is defined by the standard to happen. In this case, it's trivial to prove both fragments have the same meaning, and a compiler will likely emit identical instructions for both.
Note: You aren't doing dynamic programming here, as you don't memoise parameter / result pairs.
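To make the note about memoisation concrete, here is a minimal sketch of a memoised top-down LCS that also keeps the named temporaries a and b from the question. It assumes 0-based std::string indexing (so S[n] in the pseudocode becomes s[n - 1]); the names lcs_memo and lcs_length are made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// memo[n][m] caches the LCS length of the first n characters of s
// and the first m characters of t; -1 means "not computed yet".
static int lcs_memo(const std::string& s, std::size_t n,
                    const std::string& t, std::size_t m,
                    std::vector<std::vector<int> >& memo)
{
    if (n == 0 || m == 0) return 0;
    if (memo[n][m] != -1) return memo[n][m];

    int result;
    if (s[n - 1] == t[m - 1]) {
        result = 1 + lcs_memo(s, n - 1, t, m - 1, memo);
    } else {
        // Named temporaries, as in the question; they do not change the result.
        const int a = lcs_memo(s, n - 1, t, m, memo);
        const int b = lcs_memo(s, n, t, m - 1, memo);
        result = std::max(a, b);
    }
    memo[n][m] = result;
    return result;
}

int lcs_length(const std::string& s, const std::string& t)
{
    std::vector<std::vector<int> > memo(s.size() + 1,
                                        std::vector<int>(t.size() + 1, -1));
    return lcs_memo(s, s.size(), t, t.size(), memo);
}
```

Each (n, m) pair is computed at most once, so the running time is proportional to the product of the two string lengths instead of growing exponentially.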

Should I use ELSE IF? for better performance?

Newbie here and I just want to know should I use ELSE IF for something like below:
(function)
IF x==1
IF x==2
IF x==3
That is the way I am doing it, because x will not be anything else. However, I think that if x is equal to 1, the program is still going to run through the following checks (which turn out to be FALSE FALSE FALSE ...). Should I use ELSE IF so it won't have to run the rest? Will that help performance?
Why don't I want to use ELSE IF? Because I'd like each code block (IF x==n) to be similar, not like this:
IF x==1
ELSE IF x==2
ELSE IF x==3
(each ELSE IF block is part of the block above it)
But the program will repeatedly call this function so I am worried about the performance or delay.

Short answer: If you do not need to handle a case where multiple conditions might be true at the same time, always use
if (condition) {
//do something
}
else if (other_condition) {
//do something else
}
else { //in all other conditions
//default behaviour
}
Long answer:
As others have already stated, performance is not really a big concern (unless you are writing production code for enterprise software targeted at colossal businesses). In case performance is indeed crucial though, you should go for the above format anyway. So that might be a good practice/habit to get used to (especially if you are now starting your code journey)
Switch could be an alternative, but since you haven't specified the language I would avoid suggesting it since, in some languages, it defaults to fall-through (which might get you where you started in the first place and confuse you even more)
Performance might not be a concern. But keep in mind that logic errors are a huge enemy in programming, and your solution is prone to them if you don't actually need it to be able to match more than one case. Consider the following case.
if (x == 1) {
x = x + 1
}
if (x == 2) {
x = x + 2
}
if (x >= 3) {
print("Error: x should only be 1 or 2!")
}
In this case, you would expect the error message only when x >= 3 on entry, since you only had in mind handling the values 1 or 2. In fact, even if the value of x is 1 or 2 (which you have considered to be valid), the same error message is printed! That's because every condition is checked in turn, and an earlier block can change x so that a later condition also becomes true. Note that this is an oversimplified example. At times, this can be a great pain, especially if you collaborate with others, share the code, and are aiming for extensible and maintainable code.
To conclude, do not use a simpler solution if you haven't thought it through. Go for the complete one instead and keep in mind all possible outcomes (usually the worst-case scenarios and even future features and code).
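The difference can be checked with a small C++ sketch. Both functions below are hypothetical; they return whether the error branch was reached instead of printing.

```cpp
// Independent ifs: every condition is tested, so an earlier block can
// change x and make a later condition true as well.
bool reports_error_independent(int x)
{
    bool error = false;
    if (x == 1) { x = x + 1; }
    if (x == 2) { x = x + 2; }
    if (x >= 3) { error = true; }
    return error;
}

// Chained else-if: at most one block runs, so valid inputs never
// reach the error branch.
bool reports_error_chained(int x)
{
    bool error = false;
    if (x == 1)      { x = x + 1; }
    else if (x == 2) { x = x + 2; }
    else if (x >= 3) { error = true; }
    return error;
}
```

For x equal to 1 or 2 the first version reports an error and the second does not; for x equal to 5 both report an error.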

If the value being tested is expected to be able to match multiple conditions in a single call, then test each (IF, IF, ...).
If the value is expected to only match one, then check for it and stop if you find it (IF, ELSE IF, ELSE IF...).
If the values are expected to be one of a known set, then go right to it (switch).

I'm assuming this is JavaScript, but it should be about the same for anything else.
The code inside the if statement will only be run if the condition you provide it is true. For example, if you declare x = 1, we could have something like:
function something() {
if(x == 1) {
//do this
}
if(x == 2) {
//do that
}
if(x == 3) {
//do this and that
}
}
Only the first block's body would run. The other two conditions are still evaluated, but they are false, so their bodies are skipped. With else-if, a condition is only evaluated when the preceding if condition was false.
function something() {
if(x == 1) {
//do this
}
else if(x == 2) {
//do that
}
}
So if x == 1 was false, the next statement would be evaluated.
As for performance, the difference is way too little for you to care about. If you have many conditions you need to test, you may want to look into a switch statement.

Does this function have explicit return values on all control paths?

I have a Heaviside step function centered on unity for any data type, which I've encoded using:
template <typename T>
int h1(const T& t){
if (t < 1){
return 0;
} else if (t >= 1){
return 1;
}
}
In code review, my reviewer told me that there is not an explicit return on all control paths, yet the compiler does not warn me about it. I don't agree with the reviewer; the conditions are mutually exclusive. How do I deal with this?

It depends on how the template is used. For an int, you're fine.
But if t is an IEEE 754 floating-point double with a value set to NaN, neither t < 1 nor t >= 1 is true, and so program control reaches the end of the if block! This causes the function to return without an explicit value, the behaviour of which is undefined.
(In a more general case, where T overloads the < and >= operators in such a way as to not cover all possibilities, program control will reach the end of the if block with no explicit return.)
The moral of the story here is to decide on which branch should be the default, and make that one the else case.
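A minimal sketch of that advice, assuming the "greater or equal" branch is the one chosen as the default. The falls_through helper and the kNaN constant are additions for illustration; they only show that a NaN fails both of the original comparisons.

```cpp
#include <limits>

// Sample NaN value for experimenting with the functions below.
const double kNaN = std::numeric_limits<double>::quiet_NaN();

// True when neither branch of the original if / else-if would be taken.
template <typename T>
bool falls_through(const T& t)
{
    return !(t < 1) && !(t >= 1);
}

// Version with a plain else: every control path returns a value.
template <typename T>
int h1(const T& t)
{
    if (t < 1) {
        return 0;
    } else {
        // Default branch: also taken when no comparison holds (e.g. NaN).
        return 1;
    }
}
```

Whether 1 is the right answer for NaN is a design decision; the point is only that the function now returns something on every path.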

Just because code is correct, that doesn't mean it can't be better. Correct execution is the first step in quality, not the last.
if (t < 1) {
return 0;
} else if (t >= 1){
return 1;
}
The above is "correct" for any datatype of t that has sane behavior for < and >=. But this:
if (t < 1) {
return 0;
}
return 1;
Is easier to see by inspection that every case is covered, and avoids the second unneeded comparison altogether (that some compilers might not have optimized out). Code is not only read by compilers, but by humans, including you 10 years from now. Give the humans a break and write more simply for their understanding as well.

As noted, some special values (such as NaN) are neither < 1 nor >= 1, so your reviewer is simply right.
The question is: what made you want to code it like this in the first place? Why do you even consider making life so hard for yourself and others (the people that need to maintain your code)? Just the fact that you are smart enough to deduce that < and >= should cover all cases doesn't mean that you have to make the code more complex than necessary. What goes for physics goes for code too: make things as simple as possible, but not simpler (I believe Einstein said this).
Think about it. What are you trying to achieve? Must be something like this: 'Return 0 if the input is less than 1, return 1 otherwise.' What you've done is add intelligence by saying ... oh but that means that I return 1 if t is greater or equal 1. This sort of needless 'x implies y' is requiring extra think work on behalf of the maintainer. If you think that is a good thing, I would advise to do a couple of years of code maintenance yourself.
If it were my review, I'd make another remark. If you use an 'if' statement, then you can basically do anything you want in all branches. But in this case, you do not do 'anything'. All you want to do is return 0 or 1 depending on whether t<1 or not. In those cases, I think the '?:' operator is much better and more readable than the if statement. Thus:
return t<1 ? 0 : 1;
I know the ?: operator is forbidden in some companies, and I find that a horrible thing to do. ?: usually matches much better with specifications, and it can make code so much easier to read (if used with care) ...

If-else-if versus map

Suppose I have such an if/else-if chain:
if( x.GetId() == 1 )
{
}
else if( x.GetId() == 2 )
{
}
// ... 50 more else if statements
What I wonder is, if I keep a map, will it be any better in terms of performance? (assuming keys are integers)

Maps (usually) are implemented using red-black trees which gives O(log N) lookups as the tree is constantly kept in balance. Your linear list of if statements will be O(N) worst case. So, yes a map would be significantly faster for lookup.
Many people are recommending a switch statement, which may not be faster for you, depending on your actual if statements. A compiler can sometimes optimize a switch by using a jump table, which would be O(1), but this is only possible for values that meet criteria the language does not define; hence this behavior can be somewhat nondeterministic. There is, though, a great article with a few tips on optimizing switch statements: Optimizing C and C++ Code.
You technically could even formulate a balanced tree manually, this works best for static data and I happened to just recently create a function to quickly find which bit was set in a byte (This was used in an embedded application on an I/O pin interrupt and had to be quick when 99% of the time only 1 bit would be set in the byte):
unsigned char single_bit_index(unsigned char bit) {
    // Hard-coded balanced tree lookup
    if(bit > 0x08)
        if(bit > 0x20)
            if(bit == 0x40)
                return 6;
            else
                return 7;
        else
            if(bit == 0x10)
                return 4;
            else
                return 5;
    else
        if(bit > 0x02)
            if(bit == 0x04)
                return 2;
            else
                return 3;
        else
            if(bit == 0x01)
                return 0;
            else
                return 1;
}
This gives a constant lookup in 3 steps for any of the 8 values which gives me very deterministic performance, a linear search -- given random data -- would average 4 step lookups, with a best-case of 1 and worst-case of 8 steps.
This is a good example of a range that a compiler would probably not optimize to a jump table since the 8 values I am searching for are so far apart: 1, 2, 4, 8, 16, 32, 64, and 128. It would have to create a very sparse 128 position table with only 8 elements containing a target, which on a PC with a ton of RAM might not be a big deal, but on a microcontroller it'd be killer.

Why don't you use a switch?
switch(x.GetId())
{
case 1: /* do work */ break; // From the most used case
case 2: /* do work */ break;
case ...: // To the less used case
}
EDIT:
Put the most frequently used case at the top of the switch (otherwise this can hurt performance if x.GetId() is usually equal to 50).

switch is the best thing I think

The better solution would be a switch statement. This will allow you to check the value of x.GetId() just once, rather than (on average) 25 times as your code is doing now.
If you want to get fancy, you can use a data structure containing pointers to functions that handle whatever it is that's in the braces. If your ID values are consecutive (i.e. numbers between 1 and 50) then an array of function pointers would be best. If they are spread out, then a map would be more appropriate.
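For the spread-out case, a map from ID to handler could look roughly like the sketch below. The IDs, the handler functions, and dispatch_by_id are all hypothetical, and -1 is an arbitrary choice for "no handler registered".

```cpp
#include <map>

typedef int (*Handler)();

// Hypothetical handlers; each returns a distinct value for illustration.
int handle_login()  { return 100; }
int handle_logout() { return 200; }
int handle_ping()   { return 300; }

// Sparse, made-up IDs mapped to their handlers. Built once on first use.
const std::map<int, Handler>& handler_table()
{
    static const std::map<int, Handler> table = {
        {7, handle_login}, {4912, handle_logout}, {19102, handle_ping}
    };
    return table;
}

// Looks up the handler for 'id' and calls it; returns -1 for unknown IDs.
int dispatch_by_id(int id)
{
    const std::map<int, Handler>& table = handler_table();
    const std::map<int, Handler>::const_iterator it = table.find(id);
    return it != table.end() ? it->second() : -1;
}
```

Using find rather than operator[] avoids inserting an empty entry for an unknown ID and lets the table stay const.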

The answer, as with most performance related questions, is maybe.
If the IDs are in a fortunate range, a switch might become a jump-table, providing constant time lookups to all IDs. You won't get much better than this, short of redesigning. Alternatively, if the IDs are consecutive but you don't get a jump-table out of the compiler, you can force the issue by filling an array with function pointers.
[from here on out, switch refers to a generic if/else chain]
A map provides worst-case logarithmic lookup for any given ID, while a switch can only guarantee linear. However, if the IDs are not random, sorting the switch cases by usage might ensure the worst-case scenario is sufficiently rare that this doesn't matter.
A map will incur some initial overhead when loading the IDs and associating them with the functions, and then incur the overhead of calling a function pointer every time you access an ID. A switch incurs additional overhead when writing the routine, and possibly significant overhead when debugging it.
Redesigning might allow you to avoid the question altogether. No matter how you implement it, this smells like trouble. I can't help but think there's a better way to handle this.

If I really had a potential switch of fifty possibilities, I'd definitely think about a vector of pointers to functions.
#include <cstdio>
#include <cstdlib>
#include <ctime>
const unsigned int Max = 4;
void f1();
void f2();
void f3();
void f4();
void (*vp[Max])();
int main()
{
vp[ 0 ] = f1;
vp[ 1 ] = f2;
vp[ 2 ] = f3;
vp[ 3 ] = f4;
std::srand( static_cast<unsigned int>( std::time( NULL ) ) );
vp[( std::rand() % Max )]();
}
void f1()
{
std::printf( "Hello from f1!\n" );
}
void f2()
{
std::printf( "Hello from f2!\n" );
}
void f3()
{
std::printf( "Hello from f3!\n" );
}
void f4()
{
std::printf( "Hello from f4!\n" );
}

There are a lot of suggestions involving switch-case. In terms of efficiency, this might be better, might be the same. Won't be worse.
But if you're just setting/returning a value or name based on the ID, then YES. A map is exactly what you need. STL containers are optimised, and if you think you can optimise better, then you are either incredibly smart or staggeringly dumb.
E.g. a single lookup using a std::map called mymap,
thisvar = mymap[x.getID()];
is much better than 50 of these
if(x.getID() == ...){thisvar = ...;}
because it's more efficient as the number of IDs increases. If you're interested in why, search for a good primer on data structures.
But what I'd really look at here is maintenance/fixing time. If you need to change the name of the variable, or change from using getID() to getName(), or make any kind of minor change, you've got to do it FIFTY TIMES in your example. And you need a new line every time you add an ID.
The map reduces that to one code change NO MATTER HOW MANY IDs YOU HAVE.
That said, if you're actually carrying out different actions for each ID, a switch-case might be better. With switch-case rather than if statements, you can improve performance and readability. See here: Advantage of switch over if-else statement
I'd avoid pointers to functions unless you're very clear on how they'd improve your code, because if you're not 100% certain what you're doing, the syntax can be messed up, and it's overkill for anything you'd feasibly use a map for.
Basically, I'd be interested in the problem you're trying to solve. You might be better off with a map or a switch-case, but if you think you can use a map, that is ABSOLUTELY what you should be using instead.
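As a sketch of the value-lookup case described above, the function below maps an ID to a name with a single map lookup. The IDs, the names, and the "unknown" fallback are invented for the example.

```cpp
#include <map>
#include <string>

// Returns the name registered for 'id', or "unknown" if there is none.
std::string name_for_id(int id)
{
    // Made-up contents; in real code this would hold the fifty IDs.
    static const std::map<int, std::string> names = {
        {1, "alpha"}, {2, "beta"}, {3, "gamma"}
    };

    const std::map<int, std::string>::const_iterator it = names.find(id);
    return it != names.end() ? it->second : std::string("unknown");
}
```

This uses find instead of mymap[...] because operator[] inserts a default-constructed value for a missing key and cannot be called on a const map. Adding an ID is then a one-line change to the table.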
