Default in switch case - c++

The below is the code which I need to optimize and planned that it would be good to move to the switch construct. But I can compare in case. So I planned to make the comparison (len > 3) as the default case.
If I make the comparison part (len > 3) as the default case and add the default as the first in the switch, will it be faster?
Or how can I make the below code as a switch statement?
if ( len > 3 ) {
// Which will happen more often;
}
else if ( len == 3 ) {
// Next case which may occur often;
} else if ( len == 2 ) {
// The next priority case;
} else {
// And this case occurs rarely;
}

Probably not. Both if...else and switch...case are high-level constructs. What slows you down is the branch prediction. The better your prediction is the faster your code will run. You should put your most occurring case in first if, second in the else if and so on, just like you wrote. For switch the result depends on the internal compiler implementation which can reorder the cases despite your own order. The default should be actually reserved for the less occurring situations because rest of the conditions must be checked before falling back to default.
To conclude all this, performance-wise usage of if...else is optimal as long as you set your conditions in the correct order. Regarding switch...case it's compiler specific and depends on the applied optimizations.
Also note that switch...case is more limited than if...else since it supports only simple comparison of values.

Although you've accepted what is probably the best answer, I wanted to provide an alternative.
Note that the standard caveat applies - optimisation isn't optimisation unless you've profiled your code.
However if you are encountering poor performance relating to branches, you can reduce or eliminate them. That your code has one or more inequality comparisons is not an obstacle - you can reduce your cases down to a set of direct equalities, and if necessary use that to index a table, rather than branch at all.
void doSomething(int len)
{
static const char* str[] =
{ "%2d > 3\n",
"%2d < 2\n",
"%2d = 2\n",
"%2d = 3\n"
};
int m1 = (len-2)>>31;
int m2 = (len-4)>>31;
int r = (len & m2 & ~m1) + !!m1;
printf(str[r],len);
}
Note that this codes makes several assumptions which may not hold in practice, but as we're making the wild assumption that this even needs optimising in the first place...
Also, note that better optimisations may be possible with more knowledge about the actual range and type of the input parameter, and indeed what the actual actions taken need to be.

You can't move comparisons into a switch statement... it uses single checks for its selections.. i.e.,
switch (len) {
case 1:
// Do case 1 stuff here
break;
case 2:
// Do case 2 stuff here
break;
case 3:
// Do case 3 stuff here
break;
}
Use breaks to prevent the case statements from running into each other. Read more here.
Your code is as 'optimized' as it will get in its current state...

The only way you're going to know is to benchmark it with your
compiler. If performance is an issue, you should use the option
to provide the compiler with profiler output, and let it decide;
it will generally find the best solution. (Note that even on
a specific architecture, like Intel, the best solution in terms
of machine instructions may vary from one processor to the
next.)
In your case, the switch would probably look like:
switch ( len ) {
case 2:
// ...
break;
case 3:
// ...
break;
default:
if ( len > 3 ) {
// ...
} else {
// ...
}
}
With only two effective cases, the compiler doesn't have much to
work with. A typical implementation (without extreme
optimization) would do bounds checking, then a table lookup for
the two explicit cases. Any decent compiler will then pick up
that the comparison in your default case corresponds to one of
the bounds checks it has already done, and not duplicate it.
But with only two cases, the jump table will probably not make
a significant difference compared to the two comparisons,
especially as you'll be out of bounds in the most frequent case.
Until you have actual profiler information that this is
a bottleneck in your code, I wouldn't worry about it. Once you
have that information, you can profile different variants to see
which is faster, but I suspect that if you use maximum
optimization and feed profiling information back into the
compiler, there will be no difference.

If you are worried about speed, the truth is that your if...else or switch...case statements won't have a real impact of your application speed unless you have hundreds of them. The places where you lose speed is in your iterations or loops. To answer your question specifically, you cannot convert your if...else statement to a switch...case statement with the default appearing first; but with that said, if you did convert to a switch...case then you will that they run at the same speed (difference is too minute to be picked up by conventional benchmarking tools).

You can use a range in a case:
switch (len) {
case 3 ... INT_MAX:
// ...
break;
case 2:
// ...
break;
default:
// ...
break;
}
But that is an extension provided by GCC...

Related

Can I replace an if-statement with AND?

My prof once said, that if-statements are rather slow and should be avoided as much as possible. I'm making a game in OpenGL, where I need a lot of them.
In my tests replacing an if-statement with AND via short-circuiting worked, but is it faster?
bool doSomething();
int main()
{
int randomNumber = std::rand() % 10;
randomNumber == 5 && doSomething();
return 0;
}
bool doSomething()
{
std::cout << "function executed" << std::endl;
return true;
}
My intention is to use this inside the draw function of my renderer. My models are supposed to have flags, if a flag is true, a certain function should execute.
if-statements are rather slow and should be avoided as much as possible.
This is wrong and/or misleading. Most simplified statements about slowness of a program are wrong. There's probably something wrong with this answer too.
C++ statements don't have a speed that can be attributed to them. It's the speed of the compiled program that matters. And that consists of assembly language instructions; not of C++ statements.
What would probably be more correct is to say that branch instructions can be relatively slow (on modern, superscalar CPU architectures) (when the branch cannot be predicted well) (depending on what you are comparing to; there are many things that are much more expensive).
randomNumber == 5 && doSomething();
An if-statement is often compiled into a program that uses a branch instruction. A short-circuiting logical-and operation is also often compiled into a program that uses a branch instruction. Replacing if-statement with a logical-and operator is not a magic bullet that makes the program faster.
If you were to compare the program produced by the logical-and and the corresponding program where it is replaced with if (randomNumber == 5), you would find that the optimiser sees through your trick and produces the same assembly in both cases.
My models are supposed to have flags, if a flag is true, a certain function should execute.
In order to avoid the branch, you must change the premise. Instead of iterating through a sequence of all models, checking flag, and conditionally calling a function, you could create a sequence of all models for which the function should be called, iterate that, and call the function unconditionally -> no branching. Is this alternative faster? There is certainly some overhead of maintaining the data structure and the branch predictor may have made this unnecessary. Only way to know for sure is to measure the program.
I agree with the comments above that in almost all practical cases, it's OK to use ifs as much as you need without hesitation.
I also agree that it is not an issue important for a beginner to waste energy on optimizing, and that using logical operators will likely to emit code similar to ifs.
However - there is a valid issue here related to branching in general, so those who are interested are welcome to read on.
Modern CPUs use what we call Instruction pipelining.
Without getting too deap into the technical details:
Within each CPU core there is a level of parallelism.
Each assembly instruction is composed of several stages, and while the current instruction is executed, the next instructions are prepared to a certain degree.
This is called instruction pipelining.
This concept is broken with any kind of branching in general, and conditionals (ifs) in particular.
It's true that there is a mechanism of branch prediction, but it works only to some extent.
So although in most cases ifs are totally OK, there are cases it should be taken into account.
As always when it comes to optimizations, one should carefully profile.
Take the following piece of code as an example (similar things are common in image processing and other implementations):
unsigned char * pData = ...; // get data from somewhere
int dataSize = 100000000; // something big
bool cond = ...; // initialize some condition for relevant for all data
for (int i = 0; i < dataSize; ++i, ++pData)
{
if (cond)
{
*pData = 2; // imagine some small calculation
}
else
{
*pData = 3; // imagine some other small calculation
}
}
It might be better to do it like this (even though it contains duplication which is evil from software engineering point of view):
if (cond)
{
for (int i = 0; i < dataSize; ++i, ++pData)
{
*pData = 2; // imagine some small calculation
}
}
else
{
for (int i = 0; i < dataSize; ++i, ++pData)
{
*pData = 3; // imagine some other small calculation
}
}
We still have an if but it's causing to branch potentially only once.
In certain [rare] cases (requires profiling as mentioned above) it will be more efficient to do even something like this:
for (int i = 0; i < dataSize; ++i, ++pData)
{
*pData = (2 * cond + 3 * (!cond));
}
I know it's not common , but I encountered specific HW some years ago on which the cost of 2 multiplications and 1 addition with negation was less than the cost of branching (due to reset of instruction pipeline). Also this "trick" supports using different condition values for different parts of the data.
Bottom line: ifs are usually OK, but it's good to be aware that sometimes there is a cost.

How to speed up dynamic dispatch by 20% using computed gotos in standard C++

Before you down-vote or start saying that gotoing is evil and obsolete, please read the justification of why it is viable in this case. Before you mark it as duplicate, please read the full question.
I was reading about virtual machine interpreters, when I stumbled across computed gotos . Apparently they allow significant performance improvement of certain pieces of code. The most known example is the main VM interpreter loop.
Consider a (very) simple VM like this:
#include <iostream>
enum class Opcode
{
HALT,
INC,
DEC,
BIT_LEFT,
BIT_RIGHT,
RET
};
int main()
{
Opcode program[] = { // an example program that returns 10
Opcode::INC,
Opcode::BIT_LEFT,
Opcode::BIT_LEFT,
Opcode::BIT_LEFT,
Opcode::INC,
Opcode::INC,
Opcode::RET
};
int result = 0;
for (Opcode instruction : program)
{
switch (instruction)
{
case Opcode::HALT:
break;
case Opcode::INC:
++result;
break;
case Opcode::DEC:
--result;
break;
case Opcode::BIT_LEFT:
result <<= 1;
break;
case Opcode::BIT_RIGHT:
result >>= 1;
break;
case Opcode::RET:
std::cout << result;
return 0;
}
}
}
All this VM can do is a few simple operations on one number of type int and print it. In spite of its doubtable usefullness, it illustrates the subject nonetheless.
The critical part of the VM is obviously the switch statement in the for loop. Its performance is determined by many factors, of which the most inportant ones are most certainly branch prediction and the action of jumping to the appropriate point of execution (the case labels).
There is room for optimization here. In order to speed up the execution of this loop, one might use, so called, computed gotos.
Computed Gotos
Computed gotos are a construct well known to Fortran programmers and those using a certain (non-standard) GCC extension. I do not endorse the use of any non-standard, implementation-defined, and (obviously) undefined behavior. However to illustrate the concept in question, I will use the syntax of the mentioned GCC extension.
In standard C++ we are allowed to define labels that can later be jumped to by a goto statement:
goto some_label;
some_label:
do_something();
Doing this isn't considered good code (and for a good reason!). Although there are good arguments against using goto (of which most are related to code maintainability) there is an application for this abominated feature. It is the improvement of performance.
Using a goto statement can be faster than a function invocation. This is because the amount of "paperwork", like setting up the stack and returning a value, that has to be done when invoking a function. Meanwhile a goto can sometimes be converted into a single jmp assembly instruction.
To exploit the full potential of goto an extension to the GCC compiler was made that allows goto to be more dynamic. That is, the label to jump to can be determined at run-time.
This extension allows one to obtain a label pointer, similar to a function pointer and gotoing to it:
void* label_ptr = &&some_label;
goto (*label_ptr);
some_label:
do_something();
This is an interesting concept that allows us to further enhance our simple VM. Instead of using a switch statement we will use an array of label pointers (a so called jump table) and than goto to the appropriate one (the opcode will be used to index the array):
// [Courtesy of Eli Bendersky][4]
// This code is licensed with the [Unlicense][5]
int interp_cgoto(unsigned char* code, int initval) {
/* The indices of labels in the dispatch_table are the relevant opcodes
*/
static void* dispatch_table[] = {
&&do_halt, &&do_inc, &&do_dec, &&do_mul2,
&&do_div2, &&do_add7, &&do_neg};
#define DISPATCH() goto *dispatch_table[code[pc++]]
int pc = 0;
int val = initval;
DISPATCH();
while (1) {
do_halt:
return val;
do_inc:
val++;
DISPATCH();
do_dec:
val--;
DISPATCH();
do_mul2:
val *= 2;
DISPATCH();
do_div2:
val /= 2;
DISPATCH();
do_add7:
val += 7;
DISPATCH();
do_neg:
val = -val;
DISPATCH();
}
}
This version is about 25% faster than the one that uses a switch (the one on the linked blog post, not the one above). This is because there is only one jump performed after each operation, instead of two.
Control flow with switch:
For example, if we wanted to execute Opcode::FOO and then Opcode::SOMETHING, it would look like this:
As you can see, there are two jumps being performed after an instruction is executed. The first one is back to the switch code and the second is to the actual instruction.
In contrary, if we would go with an array of label pointers (as a reminder, they are non-standard), we would have only one jump:
It is worthwhile to note that in addition to saving cycles by doing less operations, we also enhance the quality of branch prediction by eliminating the additional jump.
Now, we know that by using an array of label pointers instead of a switch we can improve the performance of our VM significantly (by about 20%). I figured that maybe this could have some other applications too.
I came to the conclusion that this technique could be used in any program that has a loop in which it sequentially indirectly dispatches some logic. A simple example of this (apart from the VM) could be invoking a virtual method on every element of a container of polymorphic objects:
std::vector<Base*> objects;
objects = get_objects();
for (auto object : objects)
{
object->foo();
}
Now, this has much more applications.
There is one problem though: There is nothing such as label pointers in standard C++. As such, the question is: Is there a way to simulate the behaviour of computed gotos in standard C++ that can match them in performance?.
Edit 1:
There is yet another down side to using the switch. I was reminded of it by user1937198. It is bound checking. In short, it checks if the value of the variable inside of the switch matches any of the cases. It adds redundant branching (this check is mandated by the standard).
Edit 2:
In response to cmaster, I will clarify what is my idea on reducing overhead of virtual function calls. A dirty approach to this would be to have an id in each derived instance representing its type, that would be used to index the jump table (label pointer array). The problem is that:
There are no jump tables is standard C++
It would require as to modify all jump tables when a new derived class is added.
I would be thankful, if someone came up with some type of template magic (or a macro as a last resort), that would allow to write it to be more clean, extensible and automated, like this:
On a recent versions of MSVC, the key is to give the optimizer the hints it needs so that it can tell that just indexing into the jump table is a safe transform. There are two constraints on the original code that prevent this, and thus make optimising to the code generated by the computed label code an invalid transform.
Firstly in the original code, if the program counter overflows the program, then the loop exits. In the computed label code, undefined behavior (dereferencing an out of range index) is invoked. Thus the compiler has to insert a check for this, causing it to generate a basic block for the loop header rather than inlining that in each switch block.
Secondly in the original code, the default case is not handled. Whilst the switch covers all enum values, and thus it is undefined behavior for no branches to match, the msvc optimiser is not intelligent enough to exploit this, so generates a default case that does nothing. Checking this default case requires a conditional as it handles a large range of values. The computed goto code invokes undefined behavior in this case as well.
The solution to the first issue is simple. Don't use a c++ range for loop, use a while loop or a for loop with no condition. The solution for the second unfortunatly requires platform specific code to tell the optimizer the default is undefined behavior in the form of _assume(0), but something analogous is present in most compilers (__builtin_unreachable() in clang and gcc), and can be conditionally compiled to nothing when no equivalent is present without any correctness issues.
So the result of this is:
#include <iostream>
enum class Opcode
{
HALT,
INC,
DEC,
BIT_LEFT,
BIT_RIGHT,
RET
};
int run(Opcode* program) {
int result = 0;
for (int i = 0; true;i++)
{
auto instruction = program[i];
switch (instruction)
{
case Opcode::HALT:
break;
case Opcode::INC:
++result;
break;
case Opcode::DEC:
--result;
break;
case Opcode::BIT_LEFT:
result <<= 1;
break;
case Opcode::BIT_RIGHT:
result >>= 1;
break;
case Opcode::RET:
std::cout << result;
return 0;
default:
__assume(0);
}
}
}
The generated assembly can be verified on godbolt

Performance function call vs multiplication by 1

Look at this function:
float process(float in) {
float out = in;
for (int i = 0; i < 31; ++i) {
if (biquads_[i]) {
out = biquads_[i]->filter(out);
}
}
return out;
}
biquads_ is a std::optional<Biquad>[31].
in this case i check for every optional to check if its not empty, and then call the filter function of biquad, if instead I unconditionally call filter function, changing it to multiply by 1 or simply return the input value, would be more efficient?
Most likely it won't make a shread of difference (guessing somewhat though since your question is not entirely clear). For two reasons: 1) unless the code is going to be used in a very hot path, it won't matter even if one way is a few nanoseconds faster than the other. 2) most likely your compilers optimizer will be clever enough to generate code that performs close-to (if not identical to) the same in both cases. Did you test it? Did you benchmark/profile it? If not; do so - with optimization enabled.
Strive to write clear, readable, maintainable code. Worry about micro-optimization later when you actually have a problem and your profiler points to your function as a hot-spot.

Switch or else if statements - what is more clear in C++

As in title. There are a lot similar questions but I will give different example: I have 2 enums
enum A
{
A_ONE,
A_TWO
};
enum B
{
B_ONE,
B_TWO
};
What is more clear switch by enum A and then in all cases switch by enum B?
A type1;
B type2;
switch(type1)
{
case A_ONE:
switch(type2)
{
case B_ONE:
//statement1
break;
case B_TWO:
//statement2
break;
}
break;
case A_TWO:
switch(type2)
{
case B_ONE:
//statement3
break;
case B_TWO:
//statement4
break;
}
break;
}
or using else if
if(type1 == A_ONE && type2 == B_ONE)
//statement1
else if(type1 == A_ONE && type2 == B_TWO)
//statement2
else if(type1 == A_TWO && type2 == B_ONE)
//statement3
else if(type1 == A_TWO && type2 == B_TWO)
//statement4
Which is better practice? What do you preffer
It's more of a style issue than anything else. If you are only checking for the presence of two conditions over limited sets of data, the switch() approach is easier to follow, and less prone to issues (forgetting the final else to go with if, else if; using the assignment operator = instead of the equivalence operator ==; accidentally using the binary bitwise AND & operator instead of the binary logical AND && operator, etc).
The only potential downside of the switch() approach is to forget to put a break statement under each case, but you can use CppCheck or enable -Wswitch-fallthrough to cause compiler warnings or failures in such an event.
Edit
Forgot to mention to always have a default case in switches. I always assume it's a given.
So, use:
-Wswitch-default: force default cases in switch statements.
I stand corrected, -Wswitch-fallthrough is not yet implemented. Too bad, since clang has had it for a while. Use CppCheck as part of your build/QA process to avoid getting bit by this oversight.
I suggest you this:
Use 'else-if' statements if the amount of data you´re evaluating is low.
Use 'switch' when there is a considerable amount of data to test.
In your example there are only 4 posibble combinations, so it´s ok to do it that way, but think for example: what if your enum A would have 10 elements and enum B would have 6, yo would need 60 'else-if' statements to cover all posibilities, which later when you try to read it may become difficult, instead if you would use 'switch' you also need 60 declarations, but later you will find it more easy for read or modification
It depends.
I think it is a matter of taste, personal preferences or company standards.
In the case of two enumeration literals if-else statement may be equivalent to the switch statement. But as the number of literals grows the if-else statements may become unmaintainable or difficult to read, while the switch statement may help to systematically cover all cases.
Additionally I would suggest to consider putting default case to cover abnormal circumstances.

Why are C++ switch statements limited to constant expressions?

I wanted to use macro functions in switch statements before I realized the statements need to be constant. Example (does not compile):
#define BAND_FIELD1(B) (10 * B + 1)
...
#define BAND_FIELD7(B) (10 * B + 7)
int B = myField % 10;
switch (myField) {
case BAND_FIELD1(B):
variable1[B] = 123;
break;
case BAND_FIELD7(B):
variable7[B] = 321;
break;
...
}
I rather had to use if .. else:
if (myField == BAND_FIELD1(B)
variable1[B] = 123;
else if (myField == BAND_FIELD7(B)
variable7[B] = 321;
Why are the C++ switch statements limited to constant expressions?
One of the strengths of C++ is its static checking. The switch statement is a static control flow construct, whose power lies in the ability to check (statically) whether all cases have been considered, and in being able to group cases sensibly (e.g. fall through common parts).
If you want to check conditions dynamically, you can already do so with a variety of techniques (if statements, conditional operators, associative arrays, virtual functions, to name a few).
The compiler can generate the fastest possible code for a switch when presented with constants--e.g. jump tables or binary search trees.
When given non-constant values, it can't generate code that's any faster than chained if-else statements. Which you already have at your disposal anyway!
Why are the c++ switch statements limited to constant expressions?
Because the check performed by the switch statements are static. This means that the expressions need to be known at compile time.
In C++11 you can use constexpr (if the expressions are derivated by other constant expressions) in your favor. For example consider this function (that replaces your #define):
inline constexpr int BAND_FIELD1(int B) {
return 10 * B + 1;
}
used in the following simplified version of your code:
constexpr int myField = 0;
constexpr int B = myField % 10;
int variable1 = 0;
switch (myField) {
case BAND_FIELD1(B):
variable1 = 123;
break;
// ...
default: break;
}
As you can see, the above code will easily compile.
My answer would be that the C++ switch is a leftover from the C switch, which is a leftover from antique languages like PL/M.
This case uniqueness is just a chance byproduct of a construct that dates back from the 70's, in my opinion.
It does not guarantee in any way that all cases have been covered, especially given the weak typing of C++ enums.
Considering the piles of assembly code C++ routinely generates behind the scene, arguing that C++ switch is limited to constants for performance reasons seems a bit rich to me.
Many other languages support variables and/or non-numeric expressions in switch statements, and I haven't seen many programmers complain about the possible duplicate case values.
To help answer the question, consider the following (IF it was legal):
int a=15;
int b=16;
int c=15;
int value = a;
switch (value)
{
case a:
// Value is A
break;
case b:
// value is B
break;
case c:
// value is C
break;
default:
// value is unknown
}
We would never execute the code for C, because A would be executed instead. The purpose of switch is that check a value which has unique, testable values. Its a clean if..else if..else..if..else.
Compilers analyze the constant values at compile time to generate optimized lookup tables and decision trees to select a case quickly.
For example, if a table can’t be used, the compiler may generate efficient trinary decision trees using sets of 3 cpu instructions: compare to a value to set cpu flags, branch lower to other cases, branch higher to other cases, or fall through to the equal case.
Remember that C/C++ is concerned with performance.
Maybe a new syntax will be added or compilers will be expanded in the future, but standards advancements try to be incremental and not break code that was already written, or invalidate existing compilers.