Avoiding Branch Operations in Switch Statements [duplicate]

Avoiding Branch Operations in Switch Statements [duplicate] - c++

This question already has answers here:
How does switch compile in Visual C++ and how optimized and fast is it?
(2 answers)
Closed 7 years ago.
Instead of a switch statement running iterating through branch after branch, is there away to make assembly look up a list in an array for a goto statement? Or is this usually optimized in the compiler?
Such an optimization would help immensely for large switch statements with constant values.
Ex:
switch(test) {
case 1:
// Do something
break;
case 2:
// Do something
break;
}
"Optimized":
action_link[] = {action_1, action_2};
goto action_link[test];
action_1:
// Do Something
action_2:
// Do Something

The compiler will make this decision for you, based on your optimisation settings and its heuristics about what might make a good jump table.
In many simple situations, the compiler might decide that a naive test-and-jump chain would be faster or smaller than the equivalent jump table.

That is indeed what you should expect from a decent compiler. In fact, the limitations of the switch statement were based on making it easy to translate to a "jump table" instead of multiple branches. Back at the dawn of time, FORTRAN had the "computed GOTO" for the same reason.

Related

What is the rationale for C++ switch only working for integral types? [duplicate]

This question already has answers here:
Why can't the switch statement be applied to strings?
(22 answers)
Closed 8 years ago.
I feel that C++ should allow switch() over any type that can be compared, not just integral types. It seems odd that:
switch(myEnum)
{
case myEnumValue1:
... break;
case myEnumValue1:
... break;
}
Is semantically the same as:
if(myEnum == myEnumValue1)
...
else if(myEnum == myEnumValue2)
...
But this only works on integral types. Why? What is the purpose of such a restriction?
I understand that the compiler-generated code for switch may only be compatible with integral or register-sized types, but these kinds of things are normally black-boxed from the developers' decisions. These kinds of things are normally abstracted from us. If a jump table is better, the compiler should generate one. If a typical if/else algorithm is needed, so be it.

The switch statement was inherited from C, itself designed in the holy days when efficiency was still a concern. Yes, this is most probably because it allows "computed gotos", i.e. jump tables.
A sequence of compares can indeed be an alternative, but for long case lists, hashing could be preferred. Not in the scope of this early language.

Switch statement with huge number of cases

What happens if the switch has more than 5000 case. What are the drawbacks and how we can replace it with something faster?
Note: I am not expecting to use array to store cases as it's the same.

There's no specific reason to think you'd want anything other than a switch/case statement (and indeed I'd actively expect it to be unhelpful). The compiler should create efficient dispatching code, which might involve some combination of static [sparse] table(s) and direct indexing, binary branching etc.; it's got insights into the static values of the cases and should do an excellent job (retuning it on the fly each time you change the cases, whereas new values that don't fit well with a hand-crafted approach - such as wildly differing values when you'd had a pretty packed array lookup - could require reworking of code or silently cause memory bloat or a performance drop).
People really cared about this kind of thing back when C was trying to win over hard-core assembly programmers... the compilers were held accountable for generating good code. Put another way - if it's not (measurably) broken, don't fix it.
More generally, it's great to be curious about this kind of thing and get people's ideas on alternatives and their performance implications, but if you really care and the performance difference could make a useful difference to your program (especially if profiling suggests it) then always benchmark with your program doing real work.

As food for thought... in case one might be stuck with an old/buggy/inefficient compiler or just love hacking.
Inner work of switch statement consist of two parts. Finding address to jump, and well jumping there. For the first part you need to use a table to find the address. If the number of cases increases, table gets bigger - searching address to jump takes time. This is the point compilers tries to optimize, combining several techniques but one easy approach is to use table directly which depends on case value space.
In a back of the napkin example;
switch (n) {
case 1: foo(); break;
case 2: bar(); break;
case 3: baz(); break;
}
with such piece of code compiler can create an array of jump_addresses and directly get the address by array[n]. Now search just took O(1). But if you had a switch like below:
switch (n) {
case 10: foo(); break;
case 17: bar(); break;
case 23: baz(); break;
// and a lot other
}
compiler needs to generate a table containing case_id, jump_address pairs and code to search through that structure which with worst implementation can take O(n). (Decent compilers optimize the hell out of such scenario when they are fully unleashed by enabling their optimization flags to a degree that when you need to debug such optimized code your brain starts to fry.)
Then question is can you do this all yourself at C level to beat the compiler? and funny thing is while creating tables and searching through them seems easy, jumping to a variable point using goto is not possible in standard C. So there is a chance that if you are not going to use function pointers due to overhead or code structure, you are stuck... well if you are not using GCC. GCC has a non-standard feature called Labels as Values which helps you to get pointers to labels.
To complete the example you can write the second switch statement with "labels as values" feature like this:
const void *cases[] = {&&case_foo, &&case_bar, &&case_baz, ....};
goto *labels[n];
case_foo:
foo();
goto switch_end;
case_bar:
bar();
goto switch_end;
case_baz:
baz();
goto switch_end;
// and a lot other
switch_end:
Of course if you are talking about 5000 cases, it is much better if you write a piece of code to create this code for you - and it is probably only way to maintain such software.
As closing notes; will this improve your daily work? No. Will this improve your skills? Yes and talking from experience, I once found myself improved a security algorithm in a smart card just by optimizing case values. It is a strange world.

Try to use Dictionary class with Delegate values. At least it makes code a little bit more readable.

Big switch statement, generally auto-generated one, may take long time to compile. But I like the idea that compiler optimizes the switch statement.
One way to break apart the switch statement is to use bucketing,
int getIt(int input)
{
int bucket = input%16;
switch(bucket)
{
case 1:
return getItBucket1(input);
case 2:
return getItBucket2(input);
...
...
}
return -1;
}
So in the code above, we broke apart our switch statement into 16 parts. It is easy to change the number of buckets in auto-generated code.
This code has added run-time cost of one layer of indirection or function-call. . But considering the buckets defined in different files, it is faster to compile them in parallel.

Optimal virtual machine/byte-code interpreter loop

My project has a VM that executes a byte-code compiled from a domain-specific-language. I'm looking at ways that I can improve the execution time of the byte-code. As a first step I'd like to see if there is a way to simply improve the byte-code interpreter before I venture into machine code compilation.
The main loop of the interpreter looks like this:
while(true)
{
uint8_t cmd = *code++;
switch( cmd )
{
case op_1: ...; break;
...
}
}
QUESTION: Is there a faster way to implement this loop without resorting to assembler?
The one option I see is GCC specific to use dynamic goto with label addresses. Rather than a break at the end of each case I could jump directly to the next instruction. I had hoped the optimizer would do this for me, but looking at the disassembly it apparently doesn't: there is a repeated constant jump at the end of most op_codes.
If relevant the VM is a simple register based machine with floating point and integer registers (8 of each). There is no stack, only a global heap (that language is not that complicated).

One very easy optimisation is that instead of
switch /case/case/case/case/case,
just define an array with function pointers (where each function would process a specified command, or a couple of commands in which case you could set several entries in the array to the same function, and the function itself could check the exact code), and instead of
switch(cmd)
just do a
array[cmd]()
This is given that you dont have too many commands. Also, do some checking if you will not define all the possible commands (maybe you only have 300 commands, but you have to use 2 bytes for representing them, so instead of definining an array with 65536 items, just check if the command is less than 301 and if its not, dont do the lookup)
If you won't do that, at least sort the commands that the most used ones are in the beginning of the switch statement.
Otherwise it would be to look into hashtables, but I assume you don't have that many commands, and in that case overhead of doing a hash function would probably cost you more than not having to do a switch. (Or have a VERY simple hash function)

What's the architecture? You may get a speed-up with word-aligned opcodes, but it'll blow out your code size, which means you'll have to balance it against the cost of a cache miss.

Few obvious optimization I see are,
If you don't use cmd anywhere than switch() then, directly use the pointer indirection, switch( *code++ ). For longer while(true) loop, this can be little helpful.
In switch(), you can use continue instead of break. Because when continue is used inside if/else or switch, compiler knows that execution has to jump to the outer loop; the same is not true for break (with respect to switch).
Hope this helps.

How does switch compile in Visual C++ and how optimized and fast is it?

As I found out that I can use only numerical values in C++'s switch statements, I thought that there then must be some deeper difference between it and a bunch of if-else's.
Therefore I asked myself:
(How) does switch differ from if-elseif-elseif in terms of runtime speed, compile time optimization and general compilation? I'm mainly speaking of MSVC here.

A switch is often compiled to a jump-table (one comparison to find out which code to run), or if that is not possible, the compiler may still reorder the comparisons, so as to perform a binary search among the values (log N comparisons). An if-else chain is a linear search (although, I suppose, if all the relevant values are compile-time integral constants, the compiler could in principle perform similar optimizations).

Switch statements are often a common source of compiler optimization. That is, how they are treated depends on the optimization settings you use on your compiler.
The most basic (un-optimized) way of compiling a switch statement is to treat it as a chain of if ... else if ... statements. The common way that compilers optimize a switch is to convert it to a jump table which can look something like:
if (condition1) goto label1;
if (condition2) goto label2;
if (condition3) goto label3;
else goto default;
label1:
<<<code from first `case statement`>>>
goto end;
label2:
<<<code from first `case statement`>>>
goto end;
label3:
<<<code from first `case statement`>>>
goto end;
default:
<<<code from `default` case>>>
goto end;
end:
One reason this method is faster is because the code inside the conditionals is smaller (so there's a smaller instruction cache penalty if the conditional is mis-predicted). Also, the "fall-through" case becomes more trivial to implement (the compiler leaves off the goto end statement).
Compilers can further optimize the jump table by creating an array of pointers (to the locations marked by the labels) and use the value you are switching on as an index into that array. This would eliminate nearly all of the conditionals from the code (except for whatever was needed to validate whether the value you are switching on matches one of your cases or not).
A word of caution: nested jump tables are difficult to generate and some compilers refuse to even try to create one. For that reason, avoid nesting a switch inside another switch if maximally-optimized code is important to you (I'm not 100% sure how MSVC in particular handles nested switches, but the compiler manual should tell you).

Why the switch statement and not if-else?

I've been wondering this for some time now. I'm by far not a hardcore programmer, mainly small Python scripts and I've written a couple molecular dynamics simulations. For the real question: What is the point of the switch statement? Why can't you just use the if-else statement?
Thanks for your answer and if this has been asked before please point me to the link.
EDIT
S.Lott has pointed out that this may be a duplicate of questions If/Else vs. Switch. If you want to close then do so. I'll leave it open for further discussion.

A switch construct is more easily translated into a jump (or branch) table. This can make switch statements much more efficient than if-else when the case labels are close together. The idea is to place a bunch of jump instructions sequentially in memory and then add the value to the program counter. This replaces a sequence of comparison instructions with an add operation.
Below are some extremely simplified psuedo-assembly examples. First, the if-else version:
// C version
if (1 == value)
function1();
else if (2 == value)
function2();
else if (3 == value)
function3();
// assembly version
compare value, 1
jump if zero label1
compare value, 2
jump if zero label2
compare value, 3
jump if zero label3
label1:
call function1
label2:
call function2
label3:
call function3
Next is the switch version:
// C version
switch (value) {
case 1: function1(); break;
case 2: function2(); break;
case 3: function3(); break;
}
// assembly version
add program_counter, value
call function1
call function2
call function3
You can see that the resulting assembly code is much more compact. Note that the value would need to be transformed in some way to handle other values than 1, 2 and 3. However, this should illustrate the concept.

Switch can be optimized by compiler - you will get faster code.
Also I find it to be more elegant when dealing with enumerable types.
To sum up switch statement gives you performance + code elegance :)
Here are some useful links:
speed comparison of switch vs if/else in C#
Feedback-Guided Switch Statement
Optimization (pdf describing switch statement optimization)

I'm ignoring this type of low level optimization as usually unimportant, and probably different from compiler to compiler.
I'd say the main difference is readability. if/else is very flexible, but when you see a switch you know right away that all of the tests are against the same expression.

For expressiveness, the switch/case statement allows you to group multiple cases together, for example:
case 1,2,3: do(this); break;
case 4,5,6: do(that); break;
For performance, compilers can sometimes optimize switch statements into jump tables.

Besides the other mentioned Code readability and optimisation in .NET you also get the ability to switch on enums etc
enum Color { Red, Green, Blue };
Color c = Color.Red;
switch (c) // Switch on the enum
{
// no casting and no need to understand what int value it is
case Color.Red: break;
case Color.Green: break;
case Color.Blue: break;
}

The ability to fall through several cases (intentionally leaving out the break statement) can be useful, and as a few people have already said it's faster as well. Perhaps the most important and least important consideration though, is that it just makes for prettier code than if/else. :)

Switch can be optimized "Better" by some compilers. There are pitfalls with using the switch statement in certain languages. In Java, the switch cannot handle strings and in VB2005 the switch statement will not work with radio buttons.
Switch can be faster and easier to read, If-Then is more generic and will work in more places.

The only time switches can be faster are when your case values are constants, not dynamic or otherwise derived, and when the number of cases is significantly larger than the time to calculate a hash into a lookup table.
Case in point for Javascript, which compiles to assembly for execution on most engines, including Chrome's V8 engine, is that switch statements are 30%-60% slower to execute in the common case: http://jsperf.com/switch-if-else/20

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Avoiding Branch Operations in Switch Statements [duplicate] - c++

The compiler will make this decision for you, based on your optimisation settings and its heuristics about what might make a good jump table. In many simple situations, the compiler might decide that a naive test-and-jump chain would be faster or smaller than the equivalent jump table.

That is indeed what you should expect from a decent compiler. In fact, the limitations of the switch statement were based on making it easy to translate to a "jump table" instead of multiple branches. Back at the dawn of time, FORTRAN had the "computed GOTO" for the same reason.

Related

What is the rationale for C++ switch only working for integral types? [duplicate]

Switch statement with huge number of cases

Optimal virtual machine/byte-code interpreter loop

How does switch compile in Visual C++ and how optimized and fast is it?

Why the switch statement and not if-else?

Categories

Resources