What's the best practice for using a switch statement vs using an if statement for 30 unsigned enumerations where about 10 have an expected action (that presently is the same action). Performance and space need to be considered but are not critical. I've abstracted the snippet so don't hate me for the naming conventions.
switch statement:
// numError is an error enumeration type, with 0 being the non-error case
// fire_special_event() is a stub method for the shared processing
switch (numError)
{
case ERROR_01 : // intentional fall-through
case ERROR_07 : // intentional fall-through
case ERROR_0A : // intentional fall-through
case ERROR_10 : // intentional fall-through
case ERROR_15 : // intentional fall-through
case ERROR_16 : // intentional fall-through
case ERROR_20 :
{
fire_special_event();
}
break;
default:
{
// error codes that require no additional action
}
break;
}
if statement:
if ((ERROR_01 == numError) ||
(ERROR_07 == numError) ||
(ERROR_0A == numError) ||
(ERROR_10 == numError) ||
(ERROR_15 == numError) ||
(ERROR_16 == numError) ||
(ERROR_20 == numError))
{
fire_special_event();
}
Use switch.
In the worst case the compiler will generate the same code as a if-else chain, so you don't lose anything. If in doubt put the most common cases first into the switch statement.
In the best case the optimizer may find a better way to generate the code. Common things a compiler does is to build a binary decision tree (saves compares and jumps in the average case) or simply build a jump-table (works without compares at all).
For the special case that you've provided in your example, the clearest code is probably:
if (RequiresSpecialEvent(numError))
fire_special_event();
Obviously this just moves the problem to a different area of the code, but now you have the opportunity to reuse this test. You also have more options for how to solve it. You could use std::set, for example:
bool RequiresSpecialEvent(int numError)
{
return specialSet.find(numError) != specialSet.end();
}
I'm not suggesting that this is the best implementation of RequiresSpecialEvent, just that it's an option. You can still use a switch or if-else chain, or a lookup table, or some bit-manipulation on the value, whatever. The more obscure your decision process becomes, the more value you'll derive from having it in an isolated function.
The switch is faster.
Just try if/else-ing 30 different values inside a loop, and compare it to the same code using switch to see how much faster the switch is.
Now, the switch has one real problem : The switch must know at compile time the values inside each case. This means that the following code:
// WON'T COMPILE
extern const int MY_VALUE ;
void doSomething(const int p_iValue)
{
switch(p_iValue)
{
case MY_VALUE : /* do something */ ; break ;
default : /* do something else */ ; break ;
}
}
won't compile.
Most people will then use defines (Aargh!), and others will declare and define constant variables in the same compilation unit. For example:
// WILL COMPILE
const int MY_VALUE = 25 ;
void doSomething(const int p_iValue)
{
switch(p_iValue)
{
case MY_VALUE : /* do something */ ; break ;
default : /* do something else */ ; break ;
}
}
So, in the end, the developper must choose between "speed + clarity" vs. "code coupling".
(Not that a switch can't be written to be confusing as hell... Most the switch I currently see are of this "confusing" category"... But this is another story...)
Edit 2008-09-21:
bk1e added the following comment: "Defining constants as enums in a header file is another way to handle this".
Of course it is.
The point of an extern type was to decouple the value from the source. Defining this value as a macro, as a simple const int declaration, or even as an enum has the side-effect of inlining the value. Thus, should the define, the enum value, or the const int value change, a recompilation would be needed. The extern declaration means the there is no need to recompile in case of value change, but in the other hand, makes it impossible to use switch. The conclusion being Using switch will increase coupling between the switch code and the variables used as cases. When it is Ok, then use switch. When it isn't, then, no surprise.
.
Edit 2013-01-15:
Vlad Lazarenko commented on my answer, giving a link to his in-depth study of the assembly code generated by a switch. Very enlightning: http://lazarenko.me/switch/
Compiler will optimise it anyway - go for the switch as it's the most readable.
The Switch, if only for readability. Giant if statements are harder to maintain and harder to read in my opinion.
ERROR_01 : // intentional fall-through
or
(ERROR_01 == numError) ||
The later is more error prone and requires more typing and formatting than the first.
Code for readability. If you want to know what performs better, use a profiler, as optimizations and compilers vary, and performance issues are rarely where people think they are.
Compilers are really good at optimizing switch. Recent gcc is also good at optimizing a bunch of conditions in an if.
I made some test cases on godbolt.
When the case values are grouped close together, gcc, clang, and icc are all smart enough to use a bitmap to check if a value is one of the special ones.
e.g. gcc 5.2 -O3 compiles the switch to (and the if something very similar):
errhandler_switch(errtype): # gcc 5.2 -O3
cmpl $32, %edi
ja .L5
movabsq $4301325442, %rax # highest set bit is bit 32 (the 33rd bit)
btq %rdi, %rax
jc .L10
.L5:
rep ret
.L10:
jmp fire_special_event()
Notice that the bitmap is immediate data, so there's no potential data-cache miss accessing it, or a jump table.
gcc 4.9.2 -O3 compiles the switch to a bitmap, but does the 1U<<errNumber with mov/shift. It compiles the if version to series of branches.
errhandler_switch(errtype): # gcc 4.9.2 -O3
leal -1(%rdi), %ecx
cmpl $31, %ecx # cmpl $32, %edi wouldn't have to wait an extra cycle for lea's output.
# However, register read ports are limited on pre-SnB Intel
ja .L5
movl $1, %eax
salq %cl, %rax # with -march=haswell, it will use BMI's shlx to avoid moving the shift count into ecx
testl $2150662721, %eax
jne .L10
.L5:
rep ret
.L10:
jmp fire_special_event()
Note how it subtracts 1 from errNumber (with lea to combine that operation with a move). That lets it fit the bitmap into a 32bit immediate, avoiding the 64bit-immediate movabsq which takes more instruction bytes.
A shorter (in machine code) sequence would be:
cmpl $32, %edi
ja .L5
mov $2150662721, %eax
dec %edi # movabsq and btq is fewer instructions / fewer Intel uops, but this saves several bytes
bt %edi, %eax
jc fire_special_event
.L5:
ret
(The failure to use jc fire_special_event is omnipresent, and is a compiler bug.)
rep ret is used in branch targets, and following conditional branches, for the benefit of old AMD K8 and K10 (pre-Bulldozer): What does `rep ret` mean?. Without it, branch prediction doesn't work as well on those obsolete CPUs.
bt (bit test) with a register arg is fast. It combines the work of left-shifting a 1 by errNumber bits and doing a test, but is still 1 cycle latency and only a single Intel uop. It's slow with a memory arg because of its way-too-CISC semantics: with a memory operand for the "bit string", the address of the byte to be tested is computed based on the other arg (divided by 8), and isn't limited to the 1, 2, 4, or 8byte chunk pointed to by the memory operand.
From Agner Fog's instruction tables, a variable-count shift instruction is slower than a bt on recent Intel (2 uops instead of 1, and shift doesn't do everything else that's needed).
Use switch, it is what it's for and what programmers expect.
I would put the redundant case labels in though - just to make people feel comfortable, I was trying to remember when / what the rules are for leaving them out.
You don't want the next programmer working on it to have to do any unnecessary thinking about language details (it might be you in a few months time!)
Sorry to disagree with the current accepted answer. This is the year 2021. Modern compilers and their optimizers shouldn't differentiate between switch and an equivalent if-chain anymore. If they still do, and create poorly optimized code for either variant, then write to the compiler vendor (or make it public here, which has a higher change of being respected), but don't let micro-optimizations influence your coding style.
So, if you use:
switch (numError) { case ERROR_A: case ERROR_B: ... }
or:
if(numError == ERROR_A || numError == ERROR_B || ...) { ... }
or:
template<typename C, typename EL>
bool has(const C& cont, const EL& el) {
return std::find(cont.begin(), cont.end(), el) != cont.end();
}
constexpr std::array errList = { ERROR_A, ERROR_B, ... };
if(has(errList, rnd)) { ... }
shouldn't make a difference with respect to execution speed. But depending on what project you are working on, they might make a big difference in coding clarity and code maintainability. For example, if you have to check for a certain error list in many places of the code, the templated has() might be much easier to maintain, as the errList needs to be updated only in one place.
Talking about current compilers, I have compiled the test code quoted below with both clang++ -O3 -std=c++1z (version 10 and 11) and g++ -O3 -std=c++1z. Both clang versions gave similiar compiled code and execution times. So I am talking only about version 11 from now on. Most notably, functionA() (which uses if) and functionB() (which uses switch) produce exactly the same assembler output with clang! And functionC() uses a jump table, even though many other posters deemed jump tables to be an exclusive feature of switch. However, despite many people considering jump tables to be optimal, that was actually the slowest solution on clang: functionC() needs around 20 percent more execution time than functionA() or functionB().
The hand-optimized version functionH() was by far the fastest on clang. It even unrolled the loop partially, doing two iterations on each loop.
Actually, clang calculated the bitfield, which is explicitely supplied in functionH(), also in functionA() and functionB(). However, it used conditional branches in functionA() and functionB(), which made these slow, because branch prediction fails regularly, while it used the much more efficient adc ("add with carry") in functionH(). While it failed to apply this obvious optimization also in the other variants, is unknown to me.
The code produced by g++ looks much more complicated than that of clang - but actually runs a bit faster for functionA() and quite a lot faster for functionC(). Of the non-hand-optimized functions, functionC() is the fastest on g++ and faster than any of the functions on clang. On the contrary, functionH() requires twice the execution time when compiled with g++ instead of with clang, mostly because g++ doesn't do the loop unrolling.
Here are the detailed results:
clang:
functionA: 109877 3627
functionB: 109877 3626
functionC: 109877 4192
functionH: 109877 524
g++:
functionA: 109877 3337
functionB: 109877 4668
functionC: 109877 2890
functionH: 109877 982
The Performance changes drastically, if the constant 32 is changed to 63 in the whole code:
clang:
functionA: 106943 1435
functionB: 106943 1436
functionC: 106943 4191
functionH: 106943 524
g++:
functionA: 106943 1265
functionB: 106943 4481
functionC: 106943 2804
functionH: 106943 1038
The reason for the speedup is, that in case, that the highest tested value is 63, the compilers remove some unnecessary bound checks, because the value of rnd is bound to 63, anyways. Note that with that bound check removed, the non-optimized functionA() using simple if() on g++ performs almost as fast as the hand-optimized functionH(), and it also produces rather similiar assembler output.
What is the conclusion? If you hand-optimize and test compilers a lot, you will get the fastest solution. Any assumption whether switch or if is better, is void - they are the same on clang. And the easy to code solution to check against an array of values is actually the fastest case on g++ (if leaving out hand-optimization and by-incident matching last values of the list).
Future compiler versions will optimize your code better and better and get closer to your hand optimization. So don't waste your time on it, unless cycles are REALLY crucial in your case.
Here the test code:
#include <iostream>
#include <chrono>
#include <limits>
#include <array>
#include <algorithm>
unsigned long long functionA() {
unsigned long long cnt = 0;
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
if(rnd == 1 || rnd == 7 || rnd == 10 || rnd == 16 ||
rnd == 21 || rnd == 22 || rnd == 63)
{
cnt += 1;
}
}
return cnt;
}
unsigned long long functionB() {
unsigned long long cnt = 0;
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
switch(rnd) {
case 1:
case 7:
case 10:
case 16:
case 21:
case 22:
case 63:
cnt++;
break;
}
}
return cnt;
}
template<typename C, typename EL>
bool has(const C& cont, const EL& el) {
return std::find(cont.begin(), cont.end(), el) != cont.end();
}
unsigned long long functionC() {
unsigned long long cnt = 0;
constexpr std::array errList { 1, 7, 10, 16, 21, 22, 63 };
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
cnt += has(errList, rnd);
}
return cnt;
}
// Hand optimized version (manually created bitfield):
unsigned long long functionH() {
unsigned long long cnt = 0;
const unsigned long long bitfield =
(1ULL << 1) +
(1ULL << 7) +
(1ULL << 10) +
(1ULL << 16) +
(1ULL << 21) +
(1ULL << 22) +
(1ULL << 63);
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
if(bitfield & (1ULL << rnd)) {
cnt += 1;
}
}
return cnt;
}
void timeit(unsigned long long (*function)(), const char* message)
{
unsigned long long mintime = std::numeric_limits<unsigned long long>::max();
unsigned long long fres = 0;
for(int i = 0; i < 100; i++) {
auto t1 = std::chrono::high_resolution_clock::now();
fres = function();
auto t2 = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
if(duration < mintime) {
mintime = duration;
}
}
std::cout << message << fres << " " << mintime << std::endl;
}
int main(int argc, char* argv[]) {
timeit(functionA, "functionA: ");
timeit(functionB, "functionB: ");
timeit(functionC, "functionC: ");
timeit(functionH, "functionH: ");
timeit(functionA, "functionA: ");
timeit(functionB, "functionB: ");
timeit(functionC, "functionC: ");
timeit(functionH, "functionH: ");
timeit(functionA, "functionA: ");
timeit(functionB, "functionB: ");
timeit(functionC, "functionC: ");
timeit(functionH, "functionH: ");
return 0;
}
IMO this is a perfect example of what switch fall-through was made for.
They work equally well. Performance is about the same given a modern compiler.
I prefer if statements over case statements because they are more readable, and more flexible -- you can add other conditions not based on numeric equality, like " || max < min ". But for the simple case you posted here, it doesn't really matter, just do what's most readable to you.
If your cases are likely to remain grouped in the future--if more than one case corresponds to one result--the switch may prove to be easier to read and maintain.
switch is definitely preferred. It's easier to look at a switch's list of cases & know for sure what it is doing than to read the long if condition.
The duplication in the if condition is hard on the eyes. Suppose one of the == was written !=; would you notice? Or if one instance of 'numError' was written 'nmuError', which just happened to compile?
I'd generally prefer to use polymorphism instead of the switch, but without more details of the context, it's hard to say.
As for performance, your best bet is to use a profiler to measure the performance of your application in conditions that are similar to what you expect in the wild. Otherwise, you're probably optimizing in the wrong place and in the wrong way.
I agree with the compacity of the switch solution but IMO you're hijacking the switch here.
The purpose of the switch is to have different handling depending on the value.
If you had to explain your algo in pseudo-code, you'd use an if because, semantically, that's what it is: if whatever_error do this...
So unless you intend someday to change your code to have specific code for each error, I would use if.
I'm not sure about best-practise, but I'd use switch - and then trap intentional fall-through via 'default'
Aesthetically I tend to favor this approach.
unsigned int special_events[] = {
ERROR_01,
ERROR_07,
ERROR_0A,
ERROR_10,
ERROR_15,
ERROR_16,
ERROR_20
};
int special_events_length = sizeof (special_events) / sizeof (unsigned int);
void process_event(unsigned int numError) {
for (int i = 0; i < special_events_length; i++) {
if (numError == special_events[i]) {
fire_special_event();
break;
}
}
}
Make the data a little smarter so we can make the logic a little dumber.
I realize it looks weird. Here's the inspiration (from how I'd do it in Python):
special_events = [
ERROR_01,
ERROR_07,
ERROR_0A,
ERROR_10,
ERROR_15,
ERROR_16,
ERROR_20,
]
def process_event(numError):
if numError in special_events:
fire_special_event()
while (true) != while (loop)
Probably the first one is optimised by the compiler, that would explain why the second loop is slower when increasing loop count.
I would pick the if statement for the sake of clarity and convention, although I'm sure that some would disagree. After all, you are wanting to do something if some condition is true! Having a switch with one action seems a little... unneccesary.
Im not the person to tell you about speed and memory usage, but looking at a switch statment is a hell of a lot easier to understand then a large if statement (especially 2-3 months down the line)
I would say use SWITCH. This way you only have to implement differing outcomes. Your ten identical cases can use the default. Should one change all you need to is explicitly implement the change, no need to edit the default. It's also far easier to add or remove cases from a SWITCH than to edit IF and ELSEIF.
switch(numerror){
ERROR_20 : { fire_special_event(); } break;
default : { null; } break;
}
Maybe even test your condition (in this case numerror) against a list of possibilities, an array perhaps so your SWITCH isn't even used unless there definately will be an outcome.
Seeing as you only have 30 error codes, code up your own jump table, then you make all optimisation choices yourself (jump will always be quickest), rather than hope the compiler will do the right thing. It also makes the code very small (apart from the static declaration of the jump table). It also has the side benefit that with a debugger you can modify the behaviour at runtime should you so need, just by poking the table data directly.
I know its old but
public class SwitchTest {
static final int max = 100000;
public static void main(String[] args) {
int counter1 = 0;
long start1 = 0l;
long total1 = 0l;
int counter2 = 0;
long start2 = 0l;
long total2 = 0l;
boolean loop = true;
start1 = System.currentTimeMillis();
while (true) {
if (counter1 == max) {
break;
} else {
counter1++;
}
}
total1 = System.currentTimeMillis() - start1;
start2 = System.currentTimeMillis();
while (loop) {
switch (counter2) {
case max:
loop = false;
break;
default:
counter2++;
}
}
total2 = System.currentTimeMillis() - start2;
System.out.println("While if/else: " + total1 + "ms");
System.out.println("Switch: " + total2 + "ms");
System.out.println("Max Loops: " + max);
System.exit(0);
}
}
Varying the loop count changes a lot:
While if/else: 5ms
Switch: 1ms
Max Loops: 100000
While if/else: 5ms
Switch: 3ms
Max Loops: 1000000
While if/else: 5ms
Switch: 14ms
Max Loops: 10000000
While if/else: 5ms
Switch: 149ms
Max Loops: 100000000
(add more statements if you want)
When it comes to compiling the program, I don't know if there is any difference. But as for the program itself and keeping the code as simple as possible, I personally think it depends on what you want to do. if else if else statements have their advantages, which I think are:
allow you to test a variable against specific ranges
you can use functions (Standard Library or Personal) as conditionals.
(example:
`int a;
cout<<"enter value:\n";
cin>>a;
if( a > 0 && a < 5)
{
cout<<"a is between 0, 5\n";
}else if(a > 5 && a < 10)
cout<<"a is between 5,10\n";
}else{
"a is not an integer, or is not in range 0,10\n";
However, If else if else statements can get complicated and messy (despite your best attempts) in a hurry. Switch statements tend to be clearer, cleaner, and easier to read; but can only be used to test against specific values (example:
`int a;
cout<<"enter value:\n";
cin>>a;
switch(a)
{
case 0:
case 1:
case 2:
case 3:
case 4:
case 5:
cout<<"a is between 0,5 and equals: "<<a<<"\n";
break;
//other case statements
default:
cout<<"a is not between the range or is not a good value\n"
break;
I prefer if - else if - else statements, but it really is up to you. If you want to use functions as the conditions, or you want to test something against a range, array, or vector and/or you don't mind dealing with the complicated nesting, I would recommend using If else if else blocks. If you want to test against single values or you want a clean and easy to read block, I would recommend you use switch() case blocks.
Related
Im programming some ring buffers and this question came to me several times.
Suppouse we have a counter and we need to reset after certain count.
Ive seen several examples of ring buffers (mostly audio, wraping around r/w pointers), that do this:
x++;
if (x == SOME_NUMBER ){ // Reseting counter
x -= x;
}
is there any difference/preference in doing this:
x++;
if (x == SOME_NUMBER ){ // Reseting counter
x = 0;
}
?
This question applies to almost all kind of variable resets. In my case, besides ring buiffers, im also reseting a counter that do an average, so after i made all my measures, i reset that counter.
Besides the fact that the result may be the same (x reseting to zero), there may be some difference between one approach and the other. Is there any preference?
Consider those slightly modified versions of your snippets
void f(int n)
{
int x = 0;
for (;;)
{
++x;
if (x == n ) { // Reseting counter
x -= x;
}
// Ending condition to avoid UB
if ( x == 42 )
return;
}
}
void g(int n)
{
int x = 0;
for (;;)
{
++x;
if (x == n ) {
x = 0;
}
if ( x == 42 )
return;
}
}
If you look at the generated assembly (e.g. using Compiler Explorer) you'll notice how modern optimizing compilers can take advantage of the as-if rule.
Clang (with -O2) generates the same machine code for both functions. It uses
xor eax, eax
To load a zero into a register and then
cmove ecx, eax
to "reset" the other register when needed.
Gcc just creates f() and then g() becomes
jmp f(int)
That said
Is there any preference?
A common guideline is to write the more readable and maintainable code and to explore possible optimizations only after having profiled it.
In most cases I'd use the x = 0; version, because it conveys the intent better, IMHO. I can only think of a couple of reasons to adopt the x -= x; one:
It does not rely on "magic numbers". However, that would be the case for the 42 literal in my snippet, 0 is an exceptional case.
It doesn't need any implicit conversion. Consider any case where x is not an int.
There may be some architectures/toolchains where it actually delivers faster code. I can't think of any, but that's immaterial.
The difference is in the number of operations: x -= x is subtraction and assignment, whereas x = 0 is just an assignment. Other than the number of CPU cycles, this affects behavior if x is accessible from other threads.
A simple assignment x = 0 is much clearer as well IMO.
I am writing a C++ algorithm that takes two strings and returns true if you can mutate from string a to string b by changing a single character to another.
The two strings must equal in size, and can only have one difference.
I also need to have access to the index that changed, and the character of strA that was altered.
I found a working algorithm, but it iterates through every single pair of words and is running way too slow on any large amount of input.
bool canChange(std::string const& strA, std::string const& strB, char& letter)
{
int dif = 0;
int position = 0;
int currentSize = (int)strA.size();
if(currentSize != (int)strB.size())
{
return false;
}
for(int i = 0; i < currentSize; ++i)
{
if(strA[i] != strB[i])
{
dif++;
position = i;
if(dif > 1)
{
return false;
}
}
}
if(dif == 1)
{
letter = strA[position];
return true;
}
else return false;
}
Any advice on optimization?
It's a bit hard to get away from examining all the characters in the strings, unless you can accept the occasional incorrect result.
I suggest using features of the standard library, and not trying to count the number of mismatches. For example;
#include <string>
#include <algorithm>
bool canChange(std::string const& strA, std::string const& strB, char& letter, std::size_t &index)
{
bool single_mismatch = false;
if (strA.size() == strB.size())
{
typedef std::string::const_iterator ci;
typedef std::pair<ci, ci> mismatch_result;
ci begA(strA.begin()), endA(strA.end());
mismatch_result result = std::mismatch(begA, endA, strB.begin());
if (result.first != endA) // found a mismatch
{
letter = *(result.first);
index = std::distance(begA, result.first);
// now look for a second mismatch
std::advance(result.first, 1);
std::advance(result.second, 1);
single_mismatch = (std::mismatch(result.first, endA, result.second).first == endA);
}
}
return single_mismatch;
}
This works for all versions. It can be simplified a little in C++11.
If the above returns true, then a single mismatch was found.
If the return value is false, then either the strings are different sizes, or the number of mismatches is not equal to 1 (either the strings are equal, or have more than one mismatch).
letter and index are unchanged if the strings are of different lengths or are exactly equal, but otherwise identify the first mismatch (value of the character in strA, and index).
If you want to optimize for mostly-identical strings, you could use x86 SSE/AVX vector instructions. Your basic idea looks fine: break as soon as you detect a second difference.
To find and count character differences, a sequence like PCMPEQB / PMOVMSKB / test-and-branch is probably good. (Use C/C++ intrinsic functions to get those vector instructions). When your vector loop detects non-zero differences in the current block, POPCNT the bitmask to see if you just found the first difference, or if you found two differences in the same block.
I threw together an untested and not-fully-fleshed out AVX2 version of what I'm describing. This code assumes string lengths are a multiple of 32. Stopping early and handling the last chunk with a cleanup epilogue is left as an exercise for the reader.
#include <immintrin.h>
#include <string>
// not tested, and doesn't avoid reading past the end of the string.
// TODO: epilogue to handle the last up-to-31 left-over bytes separately.
bool canChange_avx2_bmi(std::string const& strA, std::string const& strB, char& letter) {
size_t size = strA.size();
if (size != strB.size())
return false;
int diffs = 0;
size_t diffpos = 0;
size_t pos = 0;
do {
uint32_t diffmask = 0;
while( pos < size ) {
__m256i vecA = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(& strA[pos]));
__m256i vecB = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(& strB[pos]));
__m256i vdiff = _mm256_cmpeq_epi8(vecA, vecB);
diffmask = _mm256_movemask_epi8(vdiff);
pos += 32;
if (diffmask) break; // gcc makes worse code if you include && !diffmask in the while condition, instead of this break
}
if (diffmask) {
diffpos = pos + _tzcnt_u32(diffmask); // position of the lowest set bit. Could safely use BSF rather than TZCNT here, since we only run when diffmask is non-zero.
diffs += _mm_popcnt_u32(diffmask);
}
} while(pos < size && diffs <= 1);
if (diffs == 1) {
letter = strA[diffpos];
return true;
}
return false;
}
The ugly break instead of including that in the while condition apparently helps gcc generate better code. The do{}while() also matches up with how I want the asm to come out. I didn't try using a for or while loop to see what gcc would do.
The inner loop is really tight this way:
.L14:
cmp rcx, r8
jnb .L10 # the while(pos<size) condition
.L6: # entry point for first iteration, because gcc duplicates the pos<size test ahead of the loop
vmovdqu ymm0, YMMWORD PTR [r9+rcx] # tmp118,* pos
vpcmpeqb ymm0, ymm0, YMMWORD PTR [r10+rcx] # tmp123, tmp118,* pos
add rcx, 32 # pos,
vpmovmskb eax, ymm0 # tmp121, tmp123
test eax, eax # tmp121
je .L14 #,
In theory, this should run at one iteration per 2 clocks (Intel Haswell). There are 7 fused-domain uops in the loop. (Would be 6, but 2-reg addressing modes apparently can't micro-fuse on SnB-family CPUs.) Since two of the uops are loads, not ALU, this throughput might be achievable on SnB/IvB as well.
This should be exceptionally good for flying over regions where the two strings are identical. The overhead of correctly handling arbitrary string lengths will make this potentially slower than a simple scalar function if strings are short, and/or have multiple differences early on.
How big is your input?
I'd think that the strA[i], strB[i] has function call overhead unless it's inlined. So make sure you do your performance test with inlining turned on and compiled with release. Otherwise, try getting the bytes as a char* with strA.c_str().
If all that fails and it's still not fast enough, try breaking you string into chunks and using memcmp or strncmp on the chunks. If no difference, move to the next chunk until you reach the end or find a difference. If a difference is found, do your trivial byte by byte compare until you find the difference. I suggest this route because memcmp is often faster than your trivial implementations as they can make use of the processor SSE extensions and so forth to do very fast compares.
Also, there is a problem with your code. You're assuming strA is longer than strB and only checking the length of A for the array accessors.
What's the reason behind applying two explicit type casts as below?
if (unlikely(val != (long)(char)val)) {
Code taken from lxml.etree.c source file from lxml's source package.
That's a cheap way to check to see if there's any junk in the high bits. The char cast chops of the upper 8, 24 or 56 bits (depending on sizeof(val)) and then promotes it back. If char is signed, it will sign extend as well.
A better test might be:
if (unlikely(val & ~0xff)) {
or
if (unlikely(val & ~0x7f)) {
depending on whether this test cares about bit 7.
Just for grins and completeness, I wrote the following test code:
void RegularTest(long val)
{
if (val != ((int)(char)val)) {
printf("Regular = not equal.");
}
else {
printf("Regular = equal.");
}
}
void MaskTest(long val)
{
if (val & ~0xff) {
printf("Mask = not equal.");
}
else {
printf("Mask = equal.");
}
}
And here's what the cast code turns into in debug in visual studio 2010:
movsx eax, BYTE PTR _val$[ebp]
cmp DWORD PTR _val$[ebp], eax
je SHORT $LN2#RegularTes
this is the mask code:
mov eax, DWORD PTR _val$[ebp]
and eax, -256 ; ffffff00H
je SHORT $LN2#MaskTest
In release, I get this for the cast code:
movsx ecx, al
cmp eax, ecx
je SHORT $LN2#RegularTes
In release, I get this for the mask code:
test DWORD PTR _val$[ebp], -256 ; ffffff00H
je SHORT $LN2#MaskTest
So what's going on? In the cast case it's doing a byte mov with sign extension (ha! bug - the code is not the same because chars are signed) and then a compare and to be totally sneaky, the compiler/linker has also made this function use register passing for the argument. In the mask code in release, it has folded everything up into a single test instruction.
Which is faster? Beats me - and frankly unless you're running this kind of test on a VERY slow CPU or are running it several billion times, it won't matter. Not in the least.
So the answer in this case, is to write code that is clear about its intent. I would expect a C/C++ jockey to look at the mask code and understand its intent, but if you don't like that, you should opt for something like this instead:
#define BitsAbove8AreSet(x) ((x) & ~0xff)
#define BitsAbove7AreSet(x) ((x) & ~0x7f)
or:
inline bool BitsAbove8AreSet(long t) { return (t & ~0xff) != 0; } // make it a bool to be nice
inline bool BitsAbove7AreSet(long t) { return (t & ~0x7f) != 0; }
And use the predicates instead of the actual code.
In general, I think "is it cheap?" is not a particularly good question to ask about this unless you're working in some very specific problem domains. For example, I work in image processing and when I have some kind of operation going from one image to another, I often have code that looks like this:
BYTE *srcPixel = PixelOffset(src, x, y, srcrowstride, srcdepth);
int srcAdvance = PixelAdvance(srcrowstride, right, srcdepth);
BYTE *dstPixel = PixelOffset(dst, x, y, dstrowstride, dstdepth);
int dstAdvance = PixelAdvance(dstrowstride, right, dstdepth);
for (y = top; y < bottom; y++) {
for (x=left; x < right; x++) {
ProcessOnePixel(srcPixel, srcdepth, dstPixel, dstdepth);
srcPixel += srcdepth;
dstPixel += dstdepth;
}
srcPixel += srcAdvance;
dstPixel += dstAdvance;
}
And in this case, assume that ProcessOnePixel() is actually a chunk of inline code that will be executed billions and billions of times. In this case, I care a whole lot about not doing function calls, not doing redundant work, not rechecking values, ensuring that the computational flow will translate into something that will use registers wisely, etc. But my actual primary concern is that the code can be read by the next poor schmuck (probably me) who has to look at it.
And in our current coding world, it is FAR FAR CHEAPER for nearly every problem domain to spend a little time up front ensuring that your code is easy to read and maintain than it is to worry about performance out of the gate.
Speculations:
cast to char: to mask the 8 low bits,
cast to long: to bring the value back to signed (if char is unsigned).
If val is a long then the (char) will strip off all but the bottom 8 bits. The (long) casts it back for the comparison.
double d[10];
int length = 10;
memset(d, length * sizeof(double), 0);
//or
for (int i = length; i--;)
d[i] = 0.0;
If you really care you should try and measure. However the most portable way is using std::fill():
std::fill( array, array + numberOfElements, 0.0 );
Note that for memset you have to pass the number of bytes, not the number of elements because this is an old C function:
memset(d, 0, sizeof(double)*length);
memset can be faster since it is written in assembler, whereas std::fill is a template function which simply does a loop internally.
But for type safety and more readable code I would recommend std::fill() - it is the c++ way of doing things, and consider memset if a performance optimization is needed at this place in the code.
Try this, if only to be cool xD
{
double *to = d;
int n=(length+7)/8;
switch(length%8){
case 0: do{ *to++ = 0.0;
case 7: *to++ = 0.0;
case 6: *to++ = 0.0;
case 5: *to++ = 0.0;
case 4: *to++ = 0.0;
case 3: *to++ = 0.0;
case 2: *to++ = 0.0;
case 1: *to++ = 0.0;
}while(--n>0);
}
}
Assuming the loop length is an integral constant expression, the most probable outcome it that a good optimizer will recognize both the for-loop and the memset(0). The result would be that the assembly generated is essentially equal. Perhaps the choice of registers could differ, or the setup. But the marginal costs per double should really be the same.
In addition to the several bugs and omissions in your code, using memset is not portable. You can't assume that a double with all zero bits is equal to 0.0. First make your code correct, then worry about optimizing.
memset(d,0,10*sizeof(*d));
is likely to be faster. Like they say you can also
std::fill_n(d,10,0.);
but it is most likely a prettier way to do the loop.
calloc(length, sizeof(double))
According to IEEE-754, the bit representation of a positive zero is all zero bits, and there's nothing wrong with requiring IEEE-754 compliance. (If you need to zero out the array to reuse it, then pick one of the above solutions).
According to this Wikipedia article on IEEE 754-1975 64-bit floating point a bit pattern of all 0s will indeed properly initialize a double to 0.0. Unfortunately your memset code doesn't do that.
Here is the code you ought to be using:
memset(d, 0, length * sizeof(double));
As part of a more complete package...
{
double *d;
int length = 10;
d = malloc(sizeof(d[0]) * length);
memset(d, 0, length * sizeof(d[0]));
}
Of course, that's dropping the error checking you should be doing on the return value of malloc. sizeof(d[0]) is slightly better than sizeof(double) because it's robust against changes in the type of d.
Also, if you use calloc(length, sizeof(d[0])) it will clear the memory for you and the subsequent memset will no longer be necessary. I didn't use it in the example because then it seems like your question wouldn't be answered.
Memset will always be faster, if debug mode or a low level of optimization is used. At higher levels of optimization, it will still be equivalent to std::fill or std::fill_n.
For example, for the following code under Google Benchmark:
(Test setup: xubuntu 18, GCC 7.3, Clang 6.0)
#include <cstring>
#include <algorithm>
#include <benchmark/benchmark.h>
double total = 0;
static void memory_memset(benchmark::State& state)
{
int ints[50000];
for (auto _ : state)
{
std::memset(ints, 0, sizeof(int) * 50000);
}
for (int counter = 0; counter != 50000; ++counter)
{
total += ints[counter];
}
}
static void memory_filln(benchmark::State& state)
{
int ints[50000];
for (auto _ : state)
{
std::fill_n(ints, 50000, 0);
}
for (int counter = 0; counter != 50000; ++counter)
{
total += ints[counter];
}
}
static void memory_fill(benchmark::State& state)
{
int ints[50000];
for (auto _ : state)
{
std::fill(std::begin(ints), std::end(ints), 0);
}
for (int counter = 0; counter != 50000; ++counter)
{
total += ints[counter];
}
}
// Register the function as a benchmark
BENCHMARK(memory_filln);
BENCHMARK(memory_fill);
BENCHMARK(memory_memset);
int main (int argc, char ** argv)
{
benchmark::Initialize (&argc, argv);
benchmark::RunSpecifiedBenchmarks ();
printf("Total = %f\n", total);
getchar();
return 0;
}
Gives the following results in release mode for GCC (-O2;-march=native):
-----------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------
memory_filln 16488 ns 16477 ns 42460
memory_fill 16493 ns 16493 ns 42440
memory_memset 8414 ns 8408 ns 83022
And the following results in debug mode (-O0):
-----------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------
memory_filln 87209 ns 87139 ns 8029
memory_fill 94593 ns 94533 ns 7411
memory_memset 8441 ns 8434 ns 82833
While at -O3 or with clang at -O2, the following is obtained:
-----------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------
memory_filln 8437 ns 8437 ns 82799
memory_fill 8437 ns 8437 ns 82756
memory_memset 8436 ns 8436 ns 82754
TLDR: use memset unless told you absolutely have to use std::fill or a for-loop, at least for POD types which are not non-IEEE-754 floating-points. There are no strong reasons not to.
(note: the for loops counting the array contents are necessary for clang not to optimize away the google benchmark loops entirely (it will detect they're not used otherwise))
The example will not work because you have to allocate memory for your array. You can do this on the stack or on the heap.
This is an example to do it on the stack:
double d[50] = {0.0};
No memset is needed after that.
Don't forget to compare a properly optimized for loop if you really care about performance.
Some variant of Duff's device if the array is sufficiently long, and prefix --i not suffix i-- (although most compilers will probably correct that automatically.).
Although I'd question if this is the most valuable thing to be optimising. Is this genuinely a bottleneck for the system?
memset(d, 10, 0) is wrong as it only nulls 10 bytes.
prefer std::fill as the intent is clearest.
In general the memset is going to be much faster, make sure you get your length right, obviously your example has not (m)allocated or defined the array of doubles. Now if it truly is going to end up with only a handful of doubles then the loop may turn out to be faster. But as get to the point where the fill loop shadows the handful of setup instructions memset will typically use larger and sometimes aligned chunks to maximize speed.
As usual, test and measure. (although in this case you end up in the cache and the measurement may turn out to be bogus).
One way of answering this question is to quickly run the code through Compiler Explorer: If you check this link, you'll see assembly for the following code:
void do_memset(std::array<char, 1024>& a) {
memset(&a, 'q', a.size());
}
void do_fill(std::array<char, 1024>& a) {
std::fill(a.begin(), a.end(), 'q');
}
void do_loop(std::array<char, 1024>& a) {
for (int i = 0; i < a.size(); ++i) {
a[i] = 'q';
}
}
The answer (at least for clang) is that with optimization levels -O0 and -O1, the assembly is different and std::fill will be slower because the use of the iterators is not optimized out. For -O2 and higher, do_memset and do_fill produce the same assembly. The loop ends up calling memset on every item in the array even with -O3.
Assuming release builds tend to run -O2 or higher, there are no performance considerations and I'd recommend using std::fill when it's available, and memset for C.
If you're required to not use STL...
double aValues [10];
ZeroMemory (aValues, sizeof(aValues));
ZeroMemory at least makes the intent clear.
As an alternative to all stuff proposed, I can suggest you NOT to set array to all zeros at startup. Instead, set up value to zero only when you first access the value in a particular cell. This will stave your question off and may be faster.
I think you mean
memset(d, 0, length * sizeof(d[0]))
and
for (int i = length; --i >= 0; ) d[i] = 0;
Personally, I do either one, but I suppose std::fill() is probably better.
What's the best practice for using a switch statement vs using an if statement for 30 unsigned enumerations where about 10 have an expected action (that presently is the same action). Performance and space need to be considered but are not critical. I've abstracted the snippet so don't hate me for the naming conventions.
switch statement:
// numError is an error enumeration type, with 0 being the non-error case
// fire_special_event() is a stub method for the shared processing
switch (numError)
{
case ERROR_01 : // intentional fall-through
case ERROR_07 : // intentional fall-through
case ERROR_0A : // intentional fall-through
case ERROR_10 : // intentional fall-through
case ERROR_15 : // intentional fall-through
case ERROR_16 : // intentional fall-through
case ERROR_20 :
{
fire_special_event();
}
break;
default:
{
// error codes that require no additional action
}
break;
}
if statement:
if ((ERROR_01 == numError) ||
(ERROR_07 == numError) ||
(ERROR_0A == numError) ||
(ERROR_10 == numError) ||
(ERROR_15 == numError) ||
(ERROR_16 == numError) ||
(ERROR_20 == numError))
{
fire_special_event();
}
Use switch.
In the worst case the compiler will generate the same code as a if-else chain, so you don't lose anything. If in doubt put the most common cases first into the switch statement.
In the best case the optimizer may find a better way to generate the code. Common things a compiler does is to build a binary decision tree (saves compares and jumps in the average case) or simply build a jump-table (works without compares at all).
For the special case that you've provided in your example, the clearest code is probably:
if (RequiresSpecialEvent(numError))
fire_special_event();
Obviously this just moves the problem to a different area of the code, but now you have the opportunity to reuse this test. You also have more options for how to solve it. You could use std::set, for example:
bool RequiresSpecialEvent(int numError)
{
return specialSet.find(numError) != specialSet.end();
}
I'm not suggesting that this is the best implementation of RequiresSpecialEvent, just that it's an option. You can still use a switch or if-else chain, or a lookup table, or some bit-manipulation on the value, whatever. The more obscure your decision process becomes, the more value you'll derive from having it in an isolated function.
The switch is faster.
Just try if/else-ing 30 different values inside a loop, and compare it to the same code using switch to see how much faster the switch is.
Now, the switch has one real problem : The switch must know at compile time the values inside each case. This means that the following code:
// WON'T COMPILE
extern const int MY_VALUE ;
void doSomething(const int p_iValue)
{
switch(p_iValue)
{
case MY_VALUE : /* do something */ ; break ;
default : /* do something else */ ; break ;
}
}
won't compile.
Most people will then use defines (Aargh!), and others will declare and define constant variables in the same compilation unit. For example:
// WILL COMPILE
const int MY_VALUE = 25 ;
void doSomething(const int p_iValue)
{
switch(p_iValue)
{
case MY_VALUE : /* do something */ ; break ;
default : /* do something else */ ; break ;
}
}
So, in the end, the developper must choose between "speed + clarity" vs. "code coupling".
(Not that a switch can't be written to be confusing as hell... Most the switch I currently see are of this "confusing" category"... But this is another story...)
Edit 2008-09-21:
bk1e added the following comment: "Defining constants as enums in a header file is another way to handle this".
Of course it is.
The point of an extern type was to decouple the value from the source. Defining this value as a macro, as a simple const int declaration, or even as an enum has the side-effect of inlining the value. Thus, should the define, the enum value, or the const int value change, a recompilation would be needed. The extern declaration means the there is no need to recompile in case of value change, but in the other hand, makes it impossible to use switch. The conclusion being Using switch will increase coupling between the switch code and the variables used as cases. When it is Ok, then use switch. When it isn't, then, no surprise.
.
Edit 2013-01-15:
Vlad Lazarenko commented on my answer, giving a link to his in-depth study of the assembly code generated by a switch. Very enlightning: http://lazarenko.me/switch/
Compiler will optimise it anyway - go for the switch as it's the most readable.
The Switch, if only for readability. Giant if statements are harder to maintain and harder to read in my opinion.
ERROR_01 : // intentional fall-through
or
(ERROR_01 == numError) ||
The later is more error prone and requires more typing and formatting than the first.
Code for readability. If you want to know what performs better, use a profiler, as optimizations and compilers vary, and performance issues are rarely where people think they are.
Compilers are really good at optimizing switch. Recent gcc is also good at optimizing a bunch of conditions in an if.
I made some test cases on godbolt.
When the case values are grouped close together, gcc, clang, and icc are all smart enough to use a bitmap to check if a value is one of the special ones.
e.g. gcc 5.2 -O3 compiles the switch to (and the if something very similar):
errhandler_switch(errtype): # gcc 5.2 -O3
cmpl $32, %edi
ja .L5
movabsq $4301325442, %rax # highest set bit is bit 32 (the 33rd bit)
btq %rdi, %rax
jc .L10
.L5:
rep ret
.L10:
jmp fire_special_event()
Notice that the bitmap is immediate data, so there's no potential data-cache miss accessing it, or a jump table.
gcc 4.9.2 -O3 compiles the switch to a bitmap, but does the 1U<<errNumber with mov/shift. It compiles the if version to series of branches.
errhandler_switch(errtype): # gcc 4.9.2 -O3
leal -1(%rdi), %ecx
cmpl $31, %ecx # cmpl $32, %edi wouldn't have to wait an extra cycle for lea's output.
# However, register read ports are limited on pre-SnB Intel
ja .L5
movl $1, %eax
salq %cl, %rax # with -march=haswell, it will use BMI's shlx to avoid moving the shift count into ecx
testl $2150662721, %eax
jne .L10
.L5:
rep ret
.L10:
jmp fire_special_event()
Note how it subtracts 1 from errNumber (with lea to combine that operation with a move). That lets it fit the bitmap into a 32bit immediate, avoiding the 64bit-immediate movabsq which takes more instruction bytes.
A shorter (in machine code) sequence would be:
cmpl $32, %edi
ja .L5
mov $2150662721, %eax
dec %edi # movabsq and btq is fewer instructions / fewer Intel uops, but this saves several bytes
bt %edi, %eax
jc fire_special_event
.L5:
ret
(The failure to use jc fire_special_event is omnipresent, and is a compiler bug.)
rep ret is used in branch targets, and following conditional branches, for the benefit of old AMD K8 and K10 (pre-Bulldozer): What does `rep ret` mean?. Without it, branch prediction doesn't work as well on those obsolete CPUs.
bt (bit test) with a register arg is fast. It combines the work of left-shifting a 1 by errNumber bits and doing a test, but is still 1 cycle latency and only a single Intel uop. It's slow with a memory arg because of its way-too-CISC semantics: with a memory operand for the "bit string", the address of the byte to be tested is computed based on the other arg (divided by 8), and isn't limited to the 1, 2, 4, or 8byte chunk pointed to by the memory operand.
From Agner Fog's instruction tables, a variable-count shift instruction is slower than a bt on recent Intel (2 uops instead of 1, and shift doesn't do everything else that's needed).
Use switch, it is what it's for and what programmers expect.
I would put the redundant case labels in though - just to make people feel comfortable, I was trying to remember when / what the rules are for leaving them out.
You don't want the next programmer working on it to have to do any unnecessary thinking about language details (it might be you in a few months time!)
Sorry to disagree with the current accepted answer. This is the year 2021. Modern compilers and their optimizers shouldn't differentiate between switch and an equivalent if-chain anymore. If they still do, and create poorly optimized code for either variant, then write to the compiler vendor (or make it public here, which has a higher change of being respected), but don't let micro-optimizations influence your coding style.
So, if you use:
switch (numError) { case ERROR_A: case ERROR_B: ... }
or:
if(numError == ERROR_A || numError == ERROR_B || ...) { ... }
or:
template<typename C, typename EL>
bool has(const C& cont, const EL& el) {
return std::find(cont.begin(), cont.end(), el) != cont.end();
}
constexpr std::array errList = { ERROR_A, ERROR_B, ... };
if(has(errList, rnd)) { ... }
shouldn't make a difference with respect to execution speed. But depending on what project you are working on, they might make a big difference in coding clarity and code maintainability. For example, if you have to check for a certain error list in many places of the code, the templated has() might be much easier to maintain, as the errList needs to be updated only in one place.
Talking about current compilers, I have compiled the test code quoted below with both clang++ -O3 -std=c++1z (version 10 and 11) and g++ -O3 -std=c++1z. Both clang versions gave similiar compiled code and execution times. So I am talking only about version 11 from now on. Most notably, functionA() (which uses if) and functionB() (which uses switch) produce exactly the same assembler output with clang! And functionC() uses a jump table, even though many other posters deemed jump tables to be an exclusive feature of switch. However, despite many people considering jump tables to be optimal, that was actually the slowest solution on clang: functionC() needs around 20 percent more execution time than functionA() or functionB().
The hand-optimized version functionH() was by far the fastest on clang. It even unrolled the loop partially, doing two iterations on each loop.
Actually, clang calculated the bitfield, which is explicitely supplied in functionH(), also in functionA() and functionB(). However, it used conditional branches in functionA() and functionB(), which made these slow, because branch prediction fails regularly, while it used the much more efficient adc ("add with carry") in functionH(). While it failed to apply this obvious optimization also in the other variants, is unknown to me.
The code produced by g++ looks much more complicated than that of clang - but actually runs a bit faster for functionA() and quite a lot faster for functionC(). Of the non-hand-optimized functions, functionC() is the fastest on g++ and faster than any of the functions on clang. On the contrary, functionH() requires twice the execution time when compiled with g++ instead of with clang, mostly because g++ doesn't do the loop unrolling.
Here are the detailed results:
clang:
functionA: 109877 3627
functionB: 109877 3626
functionC: 109877 4192
functionH: 109877 524
g++:
functionA: 109877 3337
functionB: 109877 4668
functionC: 109877 2890
functionH: 109877 982
The Performance changes drastically, if the constant 32 is changed to 63 in the whole code:
clang:
functionA: 106943 1435
functionB: 106943 1436
functionC: 106943 4191
functionH: 106943 524
g++:
functionA: 106943 1265
functionB: 106943 4481
functionC: 106943 2804
functionH: 106943 1038
The reason for the speedup is, that in case, that the highest tested value is 63, the compilers remove some unnecessary bound checks, because the value of rnd is bound to 63, anyways. Note that with that bound check removed, the non-optimized functionA() using simple if() on g++ performs almost as fast as the hand-optimized functionH(), and it also produces rather similiar assembler output.
What is the conclusion? If you hand-optimize and test compilers a lot, you will get the fastest solution. Any assumption whether switch or if is better, is void - they are the same on clang. And the easy to code solution to check against an array of values is actually the fastest case on g++ (if leaving out hand-optimization and by-incident matching last values of the list).
Future compiler versions will optimize your code better and better and get closer to your hand optimization. So don't waste your time on it, unless cycles are REALLY crucial in your case.
Here the test code:
#include <iostream>
#include <chrono>
#include <limits>
#include <array>
#include <algorithm>
unsigned long long functionA() {
unsigned long long cnt = 0;
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
if(rnd == 1 || rnd == 7 || rnd == 10 || rnd == 16 ||
rnd == 21 || rnd == 22 || rnd == 63)
{
cnt += 1;
}
}
return cnt;
}
unsigned long long functionB() {
unsigned long long cnt = 0;
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
switch(rnd) {
case 1:
case 7:
case 10:
case 16:
case 21:
case 22:
case 63:
cnt++;
break;
}
}
return cnt;
}
template<typename C, typename EL>
bool has(const C& cont, const EL& el) {
return std::find(cont.begin(), cont.end(), el) != cont.end();
}
unsigned long long functionC() {
unsigned long long cnt = 0;
constexpr std::array errList { 1, 7, 10, 16, 21, 22, 63 };
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
cnt += has(errList, rnd);
}
return cnt;
}
// Hand optimized version (manually created bitfield):
unsigned long long functionH() {
unsigned long long cnt = 0;
const unsigned long long bitfield =
(1ULL << 1) +
(1ULL << 7) +
(1ULL << 10) +
(1ULL << 16) +
(1ULL << 21) +
(1ULL << 22) +
(1ULL << 63);
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
if(bitfield & (1ULL << rnd)) {
cnt += 1;
}
}
return cnt;
}
void timeit(unsigned long long (*function)(), const char* message)
{
unsigned long long mintime = std::numeric_limits<unsigned long long>::max();
unsigned long long fres = 0;
for(int i = 0; i < 100; i++) {
auto t1 = std::chrono::high_resolution_clock::now();
fres = function();
auto t2 = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
if(duration < mintime) {
mintime = duration;
}
}
std::cout << message << fres << " " << mintime << std::endl;
}
int main(int argc, char* argv[]) {
timeit(functionA, "functionA: ");
timeit(functionB, "functionB: ");
timeit(functionC, "functionC: ");
timeit(functionH, "functionH: ");
timeit(functionA, "functionA: ");
timeit(functionB, "functionB: ");
timeit(functionC, "functionC: ");
timeit(functionH, "functionH: ");
timeit(functionA, "functionA: ");
timeit(functionB, "functionB: ");
timeit(functionC, "functionC: ");
timeit(functionH, "functionH: ");
return 0;
}
IMO this is a perfect example of what switch fall-through was made for.
They work equally well. Performance is about the same given a modern compiler.
I prefer if statements over case statements because they are more readable, and more flexible -- you can add other conditions not based on numeric equality, like " || max < min ". But for the simple case you posted here, it doesn't really matter, just do what's most readable to you.
If your cases are likely to remain grouped in the future--if more than one case corresponds to one result--the switch may prove to be easier to read and maintain.
switch is definitely preferred. It's easier to look at a switch's list of cases & know for sure what it is doing than to read the long if condition.
The duplication in the if condition is hard on the eyes. Suppose one of the == was written !=; would you notice? Or if one instance of 'numError' was written 'nmuError', which just happened to compile?
I'd generally prefer to use polymorphism instead of the switch, but without more details of the context, it's hard to say.
As for performance, your best bet is to use a profiler to measure the performance of your application in conditions that are similar to what you expect in the wild. Otherwise, you're probably optimizing in the wrong place and in the wrong way.
I agree with the compacity of the switch solution but IMO you're hijacking the switch here.
The purpose of the switch is to have different handling depending on the value.
If you had to explain your algo in pseudo-code, you'd use an if because, semantically, that's what it is: if whatever_error do this...
So unless you intend someday to change your code to have specific code for each error, I would use if.
I'm not sure about best-practise, but I'd use switch - and then trap intentional fall-through via 'default'
Aesthetically I tend to favor this approach.
unsigned int special_events[] = {
ERROR_01,
ERROR_07,
ERROR_0A,
ERROR_10,
ERROR_15,
ERROR_16,
ERROR_20
};
int special_events_length = sizeof (special_events) / sizeof (unsigned int);
void process_event(unsigned int numError) {
for (int i = 0; i < special_events_length; i++) {
if (numError == special_events[i]) {
fire_special_event();
break;
}
}
}
Make the data a little smarter so we can make the logic a little dumber.
I realize it looks weird. Here's the inspiration (from how I'd do it in Python):
special_events = [
ERROR_01,
ERROR_07,
ERROR_0A,
ERROR_10,
ERROR_15,
ERROR_16,
ERROR_20,
]
def process_event(numError):
if numError in special_events:
fire_special_event()
while (true) != while (loop)
Probably the first one is optimised by the compiler, that would explain why the second loop is slower when increasing loop count.
I would pick the if statement for the sake of clarity and convention, although I'm sure that some would disagree. After all, you are wanting to do something if some condition is true! Having a switch with one action seems a little... unneccesary.
Im not the person to tell you about speed and memory usage, but looking at a switch statment is a hell of a lot easier to understand then a large if statement (especially 2-3 months down the line)
I would say use SWITCH. This way you only have to implement differing outcomes. Your ten identical cases can use the default. Should one change all you need to is explicitly implement the change, no need to edit the default. It's also far easier to add or remove cases from a SWITCH than to edit IF and ELSEIF.
switch(numerror){
ERROR_20 : { fire_special_event(); } break;
default : { null; } break;
}
Maybe even test your condition (in this case numerror) against a list of possibilities, an array perhaps so your SWITCH isn't even used unless there definately will be an outcome.
Seeing as you only have 30 error codes, code up your own jump table, then you make all optimisation choices yourself (jump will always be quickest), rather than hope the compiler will do the right thing. It also makes the code very small (apart from the static declaration of the jump table). It also has the side benefit that with a debugger you can modify the behaviour at runtime should you so need, just by poking the table data directly.
I know its old but
public class SwitchTest {
static final int max = 100000;
public static void main(String[] args) {
int counter1 = 0;
long start1 = 0l;
long total1 = 0l;
int counter2 = 0;
long start2 = 0l;
long total2 = 0l;
boolean loop = true;
start1 = System.currentTimeMillis();
while (true) {
if (counter1 == max) {
break;
} else {
counter1++;
}
}
total1 = System.currentTimeMillis() - start1;
start2 = System.currentTimeMillis();
while (loop) {
switch (counter2) {
case max:
loop = false;
break;
default:
counter2++;
}
}
total2 = System.currentTimeMillis() - start2;
System.out.println("While if/else: " + total1 + "ms");
System.out.println("Switch: " + total2 + "ms");
System.out.println("Max Loops: " + max);
System.exit(0);
}
}
Varying the loop count changes a lot:
While if/else: 5ms
Switch: 1ms
Max Loops: 100000
While if/else: 5ms
Switch: 3ms
Max Loops: 1000000
While if/else: 5ms
Switch: 14ms
Max Loops: 10000000
While if/else: 5ms
Switch: 149ms
Max Loops: 100000000
(add more statements if you want)
When it comes to compiling the program, I don't know if there is any difference. But as for the program itself and keeping the code as simple as possible, I personally think it depends on what you want to do. if else if else statements have their advantages, which I think are:
allow you to test a variable against specific ranges
you can use functions (Standard Library or Personal) as conditionals.
(example:
`int a;
cout<<"enter value:\n";
cin>>a;
if( a > 0 && a < 5)
{
cout<<"a is between 0, 5\n";
}else if(a > 5 && a < 10)
cout<<"a is between 5,10\n";
}else{
"a is not an integer, or is not in range 0,10\n";
However, If else if else statements can get complicated and messy (despite your best attempts) in a hurry. Switch statements tend to be clearer, cleaner, and easier to read; but can only be used to test against specific values (example:
`int a;
cout<<"enter value:\n";
cin>>a;
switch(a)
{
case 0:
case 1:
case 2:
case 3:
case 4:
case 5:
cout<<"a is between 0,5 and equals: "<<a<<"\n";
break;
//other case statements
default:
cout<<"a is not between the range or is not a good value\n"
break;
I prefer if - else if - else statements, but it really is up to you. If you want to use functions as the conditions, or you want to test something against a range, array, or vector and/or you don't mind dealing with the complicated nesting, I would recommend using If else if else blocks. If you want to test against single values or you want a clean and easy to read block, I would recommend you use switch() case blocks.