I have implemented an isPermutation function which, given two strings, returns true if they are permutations of each other and false otherwise.
I wrote it two ways: one uses the C++ sort algorithm twice, while the other uses an array of ints to keep a count of characters.
I ran the code several times and every time the sorting method is faster. Is my array implementation wrong?
Here is the output:
1
0
1
Time: 0.088 ms
1
0
1
Time: 0.014 ms
And the code:
#include <iostream> // cout
#include <string> // string
#include <cstring> // memset
#include <algorithm> // sort
#include <ctime> // clock_t
using namespace std;
#define MAX_CHAR 255
void PrintTimeDiff(clock_t start, clock_t end) {
    std::cout << "Time: " << (end - start) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;
}

// using an array to keep a count of used chars
bool isPermutation(string inputa, string inputb) {
    int allChars[MAX_CHAR];
    memset(allChars, 0, sizeof(int) * MAX_CHAR);
    for(int i=0; i < inputa.size(); i++) {
        allChars[(int)inputa[i]]++;
    }
    for (int i=0; i < inputb.size(); i++) {
        allChars[(int)inputb[i]]--;
        if(allChars[(int)inputb[i]] < 0) {
            return false;
        }
    }
    return true;
}

// using sorting and comparing
bool isPermutation_sort(string inputa, string inputb) {
    std::sort(inputa.begin(), inputa.end());
    std::sort(inputb.begin(), inputb.end());
    if(inputa == inputb) return true;
    return false;
}

int main(int argc, char* argv[]) {
    clock_t start = clock();
    cout << isPermutation("god", "dog") << endl;
    cout << isPermutation("thisisaratherlongerinput","thisisarathershorterinput") << endl;
    cout << isPermutation("armen", "ramen") << endl;
    PrintTimeDiff(start, clock());

    start = clock();
    cout << isPermutation_sort("god", "dog") << endl;
    cout << isPermutation_sort("thisisaratherlongerinput","thisisarathershorterinput") << endl;
    cout << isPermutation_sort("armen", "ramen") << endl;
    PrintTimeDiff(start, clock());
    return 0;
}
To benchmark this you have to eliminate all the noise you can.
The easiest way to do this is to wrap it in a loop that repeats the call to each function 1000 times or so, then only spit out the value every 10 iterations. This way they each have a similar caching profile. Throw away values that are bogus (e.g. blowouts due to context switches by the OS).
Doing this, I got your method marginally faster. An excerpt:
method 1 array Time: 0.768 us
method 2 sort Time: 0.840333 us
method 1 array Time: 0.621333 us
method 2 sort Time: 0.774 us
method 1 array Time: 0.769 us
method 2 sort Time: 0.856333 us
method 1 array Time: 0.766 us
method 2 sort Time: 0.850333 us
method 1 array Time: 0.802667 us
method 2 sort Time: 0.89 us
method 1 array Time: 0.778 us
method 2 sort Time: 0.841333 us
I used rdtsc, which works better for me on this system. Assuming 3000 cycles per microsecond is close enough for this, but please do make the conversion more accurate if you care about the precision of the readings.
#if defined(__x86_64__)
static uint64_t rdtsc()
{
uint64_t hi, lo;
__asm__ __volatile__ (
"xor %%eax, %%eax\n"
"cpuid\n"
"rdtsc\n"
: "=a"(lo), "=d"(hi)
:: "ebx", "ecx");
return (hi << 32)|lo;
}
#else
#error wrong architecture - implement me
#endif
void PrintTimeDiff(uint64_t start, uint64_t end) {
std::cout << "Time: " << (end - start)/double(3000) << " us" << std::endl;
}
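The full harness that produced the excerpt above isn't shown; a minimal sketch of the scheme described (repeat each call many times, report only every 10th outer iteration, and eyeball and discard the obvious outliers) could look like the following, reusing rdtsc() and PrintTimeDiff() from above. kReps, the outer loop count and the test inputs are my own choices, not the original code.
#include <cstdint>
#include <iostream>
#include <string>

// rdtsc(), PrintTimeDiff(uint64_t, uint64_t), isPermutation() and
// isPermutation_sort() as defined above.

int main() {
    const int kReps = 1000;      // repeat each call to smooth out noise
    volatile bool sink = false;  // keeps the calls from being optimised away

    for (int outer = 0; outer < 60; ++outer) {
        uint64_t start = rdtsc();
        for (int i = 0; i < kReps; ++i)
            sink = isPermutation("thisisaratherlongerinput",
                                 "thisisarathershorterinput");
        uint64_t end = rdtsc();
        if (outer % 10 == 0) {
            std::cout << "method 1 array ";
            PrintTimeDiff(0, (end - start) / kReps);   // per-call time
        }

        start = rdtsc();
        for (int i = 0; i < kReps; ++i)
            sink = isPermutation_sort("thisisaratherlongerinput",
                                      "thisisarathershorterinput");
        end = rdtsc();
        if (outer % 10 == 0) {
            std::cout << "method 2 sort ";
            PrintTimeDiff(0, (end - start) / kReps);
        }
    }
    (void)sink;
    return 0;
}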
You cannot check performance differences between the implementations while putting calls to std::cout into the mix. isPermutation and isPermutation_sort are orders of magnitude faster than a call to std::cout (and, anyway, prefer '\n' over std::endl).
Moreover, for testing you have to enable compiler optimizations. Doing so, the compiler may apply loop-invariant code motion and you'll probably get the same results for both implementations.
A more effective way of testing is:
int main()
{
const std::vector<std::string> bag
{
"god", "dog", "thisisaratherlongerinput", "thisisarathershorterinput",
"armen", "ramen"
};
static std::mt19937 engine;
std::uniform_int_distribution<std::size_t> rand(0, bag.size() - 1);
const unsigned stop = 1000000;
unsigned counter = 0;
std::clock_t start = std::clock();
for (unsigned i(0); i < stop; ++i)
counter += isPermutation(bag[rand(engine)], bag[rand(engine)]);
std::cout << counter << '\n';
PrintTimeDiff(start, clock());
counter = 0;
start = std::clock();
for (unsigned i(0); i < stop; ++i)
counter += isPermutation_sort(bag[rand(engine)], bag[rand(engine)]);
std::cout << counter << '\n';
PrintTimeDiff(start, clock());
return 0;
}
I get 2.4s for isPermutation_sort vs 2s for isPermutation (somewhat similar to Hal's results). Same with g++ and clang++.
Printing the value of counter has the double benefit of:
triggering the as-if rule (the compiler cannot remove the for loops);
allowing a first sanity check of your implementations (the two values should not differ by much).
There are some things you have to change in your implementation of isPermutation:
pass arguments as const references
bool isPermutation(const std::string &inputa, const std::string &inputb)
just this change brings the time down to 0.8s (of course you cannot do the same with isPermutation_sort).
you can use std::array and std::fill instead of memset (this is C++ :-)
avoid premature pessimization and prefer preincrement. Only use postincrement if you're going to use the original value
do not mix signed and unsigned values in the for loops (inputa.size() and i); i should be declared as std::size_t
even better, use the range based for loop.
So something like:
bool isPermutation(const std::string &inputa, const std::string &inputb)
{
std::array<int, MAX_CHAR> allChars;
allChars.fill(0);
for (auto c : inputa)
++allChars[(unsigned char)c];
for (auto c : inputb)
{
--allChars[(unsigned char)c];
if (allChars[(unsigned char)c] < 0)
return false;
}
return true;
}
Anyway both isPermutation and isPermutation_sort should have this preliminary check:
if (inputa.length() != inputb.length())
return false;
Now we are at 0.55s for isPermutation vs 1.1s for isPermutation_sort.
Last but not least consider std::is_permutation:
for (unsigned i(0); i < stop; ++i)
{
const std::string &s1(bag[rand(engine)]), &s2(bag[rand(engine)]);
counter += std::is_permutation(s1.begin(), s1.end(), s2.begin());
}
(0.6s)
EDIT
As observed in BeyelerStudios' comment, the Mersenne Twister is overkill in this case.
You can change the engine to a simpler one:
static std::linear_congruential_engine<std::uint_fast32_t, 48271, 0, 2147483647> engine;
This further lowers the timings. Luckily the relative speeds remain the same.
Just to be sure I've also checked with a non random access scheme obtaining the same relative results.
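For reference, such a non-random access scheme can be as simple as walking every ordered pair in the bag; the repeat count below is arbitrary, and isPermutation() and PrintTimeDiff() are the ones defined earlier.
#include <ctime>
#include <iostream>
#include <string>
#include <vector>

// isPermutation() and PrintTimeDiff(clock_t, clock_t) as defined earlier.

int main() {
    const std::vector<std::string> bag {
        "god", "dog", "thisisaratherlongerinput", "thisisarathershorterinput",
        "armen", "ramen"
    };
    unsigned counter = 0;
    std::clock_t start = std::clock();
    for (unsigned round = 0; round < 30000; ++round)   // arbitrary repeat count
        for (const auto &a : bag)
            for (const auto &b : bag)
                counter += isPermutation(a, b);
    std::cout << counter << '\n';   // also keeps the loops from being removed
    PrintTimeDiff(start, std::clock());
    return 0;
}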
Your idea amounts to using a Counting Sort on both strings, but with the comparison happening on the count array, rather than after writing out sorted strings.
It works well because a byte can only have one of 255 non-zero values. Zeroing 256B of memory, or even 4*256B, is pretty cheap, so it works well even for fairly short strings, where most of the count array isn't touched.
It should be fairly good for very long strings, at least in some cases. It's pretty heavily dependent on a good, heavily pipelined L1 cache, because scattered increments to the count array produce scattered read-modify-writes. Repeated occurrences create a dependency chain with a store-load round-trip in it. This is a big glass jaw for this algorithm on CPUs where many loads and stores can be in flight at once (with their latencies happening in parallel). Modern x86 CPUs should run it pretty well, since they can sustain a load + store every clock cycle.
The initial count of inputa compiles to a very tight loop:
.L15:
movsx rdx, BYTE PTR [rax]
add rax, 1
add DWORD PTR [rsp-120+rdx*4], 1
cmp rax, rcx
jne .L15
This brings us to the first major bug in your code: char can be signed or unsigned. In the x86-64 ABI, char is signed, so allChars[(int)inputa[i]]++; sign-extends it for use as an array index. (movsx instead of movzx). Your code will write outside the array bounds on non-ASCII characters that have their high bit set. So you should have written allChars[(unsigned char)inputa[i]]++;. Note that casting to (unsigned) doesn't give the result we want (see comments).
Note that clang makes much worse code (v3.7.1 and v3.8, both with -O3), with a function call to std::basic_string<...>::_M_leak_hard() inside the inner loop. (Leak as in leak a reference, I think.) #manlio's version doesn't have this problem, so I guess for (auto c : inputa) syntax helps clang figure out what's happening.
Also, using std::string when your callers have char[] forces them to construct a std::string. That's kind of silly, but it is helpful to be able to compare string lengths.
GNU libstdc++'s std::is_permutation uses a very different strategy:
First, it skips any common prefix that's identical without permutation in both strings.
Then, for each element in inputa:
count the occurrences of that element in inputb. Check that it matches the count in inputa.
There are a couple optimizations:
Only compare counts the first time an element is seen: find duplicates by searching from the beginning of inputa, and if the match position isn't the current position, we've already checked this element.
check that the match count in inputb is != 0 before counting matches in the rest of inputa.
This doesn't need any temporary storage, so it can work when the elements are large. (e.g. an array of int64_t, or an array of structs).
If there is a mismatch, this is likely to find it early, before doing as much work. There are probably a few cases of inputs where the counting version would take less time, but probably for most inputs the library algorithm is best.
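The description above is easier to follow next to code. Here is a rough sketch of the same strategy in plain C++; it is an illustration of those steps, not the actual library source.
#include <algorithm>
#include <string>

// Sketch of the strategy described above: skip the common prefix, then count
// each distinct remaining element once in both strings.
bool is_permutation_sketch(const std::string &a, const std::string &b)
{
    if (a.size() != b.size())
        return false;

    // 1. Skip any common prefix that already matches without permutation.
    std::size_t start = 0;
    while (start < a.size() && a[start] == b[start])
        ++start;

    // 2. For each remaining element of a...
    for (std::size_t j = start; j < a.size(); ++j) {
        // ...only do the counting the first time this value is seen: if it
        // already occurred earlier in a[start..j), it has been checked.
        if (std::find(a.begin() + start, a.begin() + j, a[j]) != a.begin() + j)
            continue;

        // Cheap early-out: the value must appear in b at all.
        const auto count_b = std::count(b.begin() + start, b.end(), a[j]);
        if (count_b == 0)
            return false;

        // Full check: occurrences in the rest of a must match those in b.
        const auto count_a = std::count(a.begin() + j, a.end(), a[j]);
        if (count_a != count_b)
            return false;
    }
    return true;
}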
std::is_permutation uses std::count, which should be implemented very well with SSE / AVX vectors. Unfortunately, it's auto-vectorized in a really stupid way by both gcc and clang. It unpacks bytes to 64bit integers before accumulating them in vector elements, to avoid overflow. So it spends most of its instructions shuffling data around, and is probably slower than a scalar implementation (which you'd get from compiling with -O2, or with -O3 -fno-tree-vectorize).
It could and should only do this every few iterations, so the inner loop of count can just be something like pcmpeqb / psubb, with a psadbw every 255 iterations. Or pcmpeqb / pmovmskb / popcnt / add, but that's slower.
Template specializations in the library could help a lot for std::count for 8, 16, and 32bit types whose equality can be checked with bitwise equality (integer ==).
What's the best practice for using a switch statement vs using an if statement for 30 unsigned enumerations where about 10 have an expected action (that presently is the same action). Performance and space need to be considered but are not critical. I've abstracted the snippet so don't hate me for the naming conventions.
switch statement:
// numError is an error enumeration type, with 0 being the non-error case
// fire_special_event() is a stub method for the shared processing
switch (numError)
{
case ERROR_01 : // intentional fall-through
case ERROR_07 : // intentional fall-through
case ERROR_0A : // intentional fall-through
case ERROR_10 : // intentional fall-through
case ERROR_15 : // intentional fall-through
case ERROR_16 : // intentional fall-through
case ERROR_20 :
{
fire_special_event();
}
break;
default:
{
// error codes that require no additional action
}
break;
}
if statement:
if ((ERROR_01 == numError) ||
(ERROR_07 == numError) ||
(ERROR_0A == numError) ||
(ERROR_10 == numError) ||
(ERROR_15 == numError) ||
(ERROR_16 == numError) ||
(ERROR_20 == numError))
{
fire_special_event();
}
Use switch.
In the worst case the compiler will generate the same code as an if-else chain, so you don't lose anything. If in doubt, put the most common cases first in the switch statement.
In the best case the optimizer may find a better way to generate the code. Common things a compiler does are to build a binary decision tree (which saves compares and jumps in the average case) or simply to build a jump table (which works without compares at all).
For the special case that you've provided in your example, the clearest code is probably:
if (RequiresSpecialEvent(numError))
fire_special_event();
Obviously this just moves the problem to a different area of the code, but now you have the opportunity to reuse this test. You also have more options for how to solve it. You could use std::set, for example:
bool RequiresSpecialEvent(int numError)
{
return specialSet.find(numError) != specialSet.end();
}
I'm not suggesting that this is the best implementation of RequiresSpecialEvent, just that it's an option. You can still use a switch or if-else chain, or a lookup table, or some bit-manipulation on the value, whatever. The more obscure your decision process becomes, the more value you'll derive from having it in an isolated function.
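For instance, the "bit-manipulation on the value" option could be a single 64-bit mask. This sketch assumes the ERROR_xx enumerators from the question all have values below 64; if they don't, a std::set or lookup table is the safer choice.
#include <cstdint>

// ERROR_01 ... ERROR_20 are the enumerators from the question; they are
// assumed here to be small (< 64) so each one maps to a bit in the mask.
bool RequiresSpecialEvent(int numError)
{
    static const std::uint64_t special =
        (1ULL << ERROR_01) | (1ULL << ERROR_07) | (1ULL << ERROR_0A) |
        (1ULL << ERROR_10) | (1ULL << ERROR_15) | (1ULL << ERROR_16) |
        (1ULL << ERROR_20);
    return numError >= 0 && numError < 64 && ((special >> numError) & 1);
}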
The switch is faster.
Just try if/else-ing 30 different values inside a loop, and compare it to the same code using switch to see how much faster the switch is.
Now, the switch has one real problem : The switch must know at compile time the values inside each case. This means that the following code:
// WON'T COMPILE
extern const int MY_VALUE ;
void doSomething(const int p_iValue)
{
switch(p_iValue)
{
case MY_VALUE : /* do something */ ; break ;
default : /* do something else */ ; break ;
}
}
won't compile.
Most people will then use defines (Aargh!), and others will declare and define constant variables in the same compilation unit. For example:
// WILL COMPILE
const int MY_VALUE = 25 ;
void doSomething(const int p_iValue)
{
switch(p_iValue)
{
case MY_VALUE : /* do something */ ; break ;
default : /* do something else */ ; break ;
}
}
So, in the end, the developer must choose between "speed + clarity" and "code coupling".
(Not that a switch can't be written to be confusing as hell... Most of the switches I currently see are of this "confusing" category... But this is another story...)
Edit 2008-09-21:
bk1e added the following comment: "Defining constants as enums in a header file is another way to handle this".
Of course it is.
The point of an extern type was to decouple the value from the source. Defining this value as a macro, as a simple const int declaration, or even as an enum has the side effect of inlining the value. Thus, should the define, the enum value, or the const int value change, a recompilation would be needed. The extern declaration means there is no need to recompile if the value changes, but on the other hand it makes it impossible to use switch. The conclusion: using switch will increase coupling between the switch code and the variables used as cases. When that is OK, use switch. When it isn't, then, no surprise.
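A minimal sketch of the enum-in-a-header approach bk1e mentions; the names and values here are purely illustrative.
// errors.h : the values are compile-time constants, so they are usable as
// case labels, at the cost of recompiling every user of this header whenever
// a value changes (the coupling discussed above).
enum ErrorCode
{
    ERROR_NONE = 0,
    ERROR_01   = 1,   // illustrative values
    ERROR_07   = 7
    // ...
};

// handler.cpp
void doSomething(ErrorCode e)
{
    switch (e)
    {
        case ERROR_01: /* do something */    break;
        case ERROR_07: /* something else */  break;
        default:       /* nothing special */ break;
    }
}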
Edit 2013-01-15:
Vlad Lazarenko commented on my answer, giving a link to his in-depth study of the assembly code generated by a switch. Very enlightening: http://lazarenko.me/switch/
Compiler will optimise it anyway - go for the switch as it's the most readable.
The Switch, if only for readability. Giant if statements are harder to maintain and harder to read in my opinion.
ERROR_01 : // intentional fall-through
or
(ERROR_01 == numError) ||
The latter is more error-prone and requires more typing and formatting than the former.
Code for readability. If you want to know what performs better, use a profiler, as optimizations and compilers vary, and performance issues are rarely where people think they are.
Compilers are really good at optimizing switch. Recent gcc is also good at optimizing a bunch of conditions in an if.
I made some test cases on godbolt.
When the case values are grouped close together, gcc, clang, and icc are all smart enough to use a bitmap to check if a value is one of the special ones.
e.g. gcc 5.2 -O3 compiles the switch to (and the if something very similar):
errhandler_switch(errtype): # gcc 5.2 -O3
cmpl $32, %edi
ja .L5
movabsq $4301325442, %rax # highest set bit is bit 32 (the 33rd bit)
btq %rdi, %rax
jc .L10
.L5:
rep ret
.L10:
jmp fire_special_event()
Notice that the bitmap is immediate data, so there's no potential data-cache miss accessing it, or a jump table.
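Expressed back in C++, the check gcc 5.2 generates is roughly the following. 0x100610482 is just the hex form of the 4301325442 immediate above, so this is an observation about the generated code rather than something you need to write yourself.
void fire_special_event();   // stub from the question

// Roughly what the gcc 5.2 output above does: a range check, then a test of
// one bit in a constant bitmap (4301325442 == 0x100610482).
void errhandler_bitmap(unsigned err)
{
    if (err <= 32 && ((0x100610482ULL >> err) & 1))
        fire_special_event();
}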
gcc 4.9.2 -O3 compiles the switch to a bitmap, but does the 1U<<errNumber with mov/shift. It compiles the if version to a series of branches.
errhandler_switch(errtype): # gcc 4.9.2 -O3
leal -1(%rdi), %ecx
cmpl $31, %ecx # cmpl $32, %edi wouldn't have to wait an extra cycle for lea's output.
# However, register read ports are limited on pre-SnB Intel
ja .L5
movl $1, %eax
salq %cl, %rax # with -march=haswell, it will use BMI's shlx to avoid moving the shift count into ecx
testl $2150662721, %eax
jne .L10
.L5:
rep ret
.L10:
jmp fire_special_event()
Note how it subtracts 1 from errNumber (with lea to combine that operation with a move). That lets it fit the bitmap into a 32bit immediate, avoiding the 64bit-immediate movabsq which takes more instruction bytes.
A shorter (in machine code) sequence would be:
cmpl $32, %edi
ja .L5
mov $2150662721, %eax
dec %edi # movabsq and btq is fewer instructions / fewer Intel uops, but this saves several bytes
bt %edi, %eax
jc fire_special_event
.L5:
ret
(The failure to use jc fire_special_event is omnipresent, and is a compiler bug.)
rep ret is used in branch targets, and following conditional branches, for the benefit of old AMD K8 and K10 (pre-Bulldozer): What does `rep ret` mean?. Without it, branch prediction doesn't work as well on those obsolete CPUs.
bt (bit test) with a register arg is fast. It combines the work of left-shifting a 1 by errNumber bits and doing a test, but is still 1 cycle latency and only a single Intel uop. It's slow with a memory arg because of its way-too-CISC semantics: with a memory operand for the "bit string", the address of the byte to be tested is computed based on the other arg (divided by 8), and isn't limited to the 1, 2, 4, or 8byte chunk pointed to by the memory operand.
From Agner Fog's instruction tables, a variable-count shift instruction is slower than a bt on recent Intel (2 uops instead of 1, and shift doesn't do everything else that's needed).
Use switch, it is what it's for and what programmers expect.
I would put the redundant case labels in though - just to make people feel comfortable, I was trying to remember when / what the rules are for leaving them out.
You don't want the next programmer working on it to have to do any unnecessary thinking about language details (it might be you in a few months time!)
Sorry to disagree with the current accepted answer. This is the year 2021. Modern compilers and their optimizers shouldn't differentiate between switch and an equivalent if-chain anymore. If they still do, and create poorly optimized code for either variant, then write to the compiler vendor (or make it public here, which has a higher chance of being respected), but don't let micro-optimizations influence your coding style.
So, if you use:
switch (numError) { case ERROR_A: case ERROR_B: ... }
or:
if(numError == ERROR_A || numError == ERROR_B || ...) { ... }
or:
template<typename C, typename EL>
bool has(const C& cont, const EL& el) {
return std::find(cont.begin(), cont.end(), el) != cont.end();
}
constexpr std::array errList = { ERROR_A, ERROR_B, ... };
if(has(errList, rnd)) { ... }
shouldn't make a difference with respect to execution speed. But depending on what project you are working on, they might make a big difference in coding clarity and code maintainability. For example, if you have to check for a certain error list in many places of the code, the templated has() might be much easier to maintain, as the errList needs to be updated only in one place.
Talking about current compilers, I have compiled the test code quoted below with both clang++ -O3 -std=c++1z (versions 10 and 11) and g++ -O3 -std=c++1z. Both clang versions produced similar code and execution times, so I am talking only about version 11 from now on. Most notably, functionA() (which uses if) and functionB() (which uses switch) produce exactly the same assembler output with clang! And functionC() uses a jump table, even though many other posters deemed jump tables to be an exclusive feature of switch. However, despite many people considering jump tables to be optimal, that was actually the slowest solution on clang: functionC() needs around 20 percent more execution time than functionA() or functionB().
The hand-optimized version functionH() was by far the fastest on clang. It even unrolled the loop partially, doing two iterations on each loop.
Actually, clang calculated the bitfield, which is explicitly supplied in functionH(), in functionA() and functionB() as well. However, it used conditional branches in functionA() and functionB(), which made these slow because branch prediction fails regularly, while it used the much more efficient adc ("add with carry") in functionH(). Why it failed to apply this obvious optimization in the other variants is unknown to me.
The code produced by g++ looks much more complicated than that of clang, but actually runs a bit faster for functionA() and quite a lot faster for functionC(). Of the non-hand-optimized functions, functionC() is the fastest on g++ and faster than any of the functions on clang. In contrast, functionH() requires twice the execution time when compiled with g++ instead of clang, mostly because g++ doesn't do the loop unrolling.
Here are the detailed results:
clang:
functionA: 109877 3627
functionB: 109877 3626
functionC: 109877 4192
functionH: 109877 524
g++:
functionA: 109877 3337
functionB: 109877 4668
functionC: 109877 2890
functionH: 109877 982
The performance changes drastically if the constant 32 is changed to 63 throughout the code (the listing below already shows the 63 variant):
clang:
functionA: 106943 1435
functionB: 106943 1436
functionC: 106943 4191
functionH: 106943 524
g++:
functionA: 106943 1265
functionB: 106943 4481
functionC: 106943 2804
functionH: 106943 1038
The reason for the speedup is that, when the highest tested value is 63, the compilers can remove some unnecessary bounds checks, because the value of rnd is bounded to 63 anyway. Note that with that bounds check removed, the non-optimized functionA() using a simple if() on g++ performs almost as fast as the hand-optimized functionH(), and it also produces rather similar assembler output.
What is the conclusion? If you hand-optimize and test compilers a lot, you will get the fastest solution. Any assumption about whether switch or if is better is void; they are the same on clang. And the easy-to-code solution of checking against an array of values is actually the fastest on g++ (leaving aside hand-optimization and the coincidence of matching the last value of the list).
Future compiler versions will optimize your code better and better and get closer to your hand optimization. So don't waste your time on it, unless cycles are REALLY crucial in your case.
Here is the test code:
#include <iostream>
#include <chrono>
#include <limits>
#include <array>
#include <algorithm>
unsigned long long functionA() {
unsigned long long cnt = 0;
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
if(rnd == 1 || rnd == 7 || rnd == 10 || rnd == 16 ||
rnd == 21 || rnd == 22 || rnd == 63)
{
cnt += 1;
}
}
return cnt;
}
unsigned long long functionB() {
unsigned long long cnt = 0;
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
switch(rnd) {
case 1:
case 7:
case 10:
case 16:
case 21:
case 22:
case 63:
cnt++;
break;
}
}
return cnt;
}
template<typename C, typename EL>
bool has(const C& cont, const EL& el) {
return std::find(cont.begin(), cont.end(), el) != cont.end();
}
unsigned long long functionC() {
unsigned long long cnt = 0;
constexpr std::array errList { 1, 7, 10, 16, 21, 22, 63 };
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
cnt += has(errList, rnd);
}
return cnt;
}
// Hand optimized version (manually created bitfield):
unsigned long long functionH() {
unsigned long long cnt = 0;
const unsigned long long bitfield =
(1ULL << 1) +
(1ULL << 7) +
(1ULL << 10) +
(1ULL << 16) +
(1ULL << 21) +
(1ULL << 22) +
(1ULL << 63);
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
if(bitfield & (1ULL << rnd)) {
cnt += 1;
}
}
return cnt;
}
void timeit(unsigned long long (*function)(), const char* message)
{
unsigned long long mintime = std::numeric_limits<unsigned long long>::max();
unsigned long long fres = 0;
for(int i = 0; i < 100; i++) {
auto t1 = std::chrono::high_resolution_clock::now();
fres = function();
auto t2 = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
if(duration < mintime) {
mintime = duration;
}
}
std::cout << message << fres << " " << mintime << std::endl;
}
int main(int argc, char* argv[]) {
timeit(functionA, "functionA: ");
timeit(functionB, "functionB: ");
timeit(functionC, "functionC: ");
timeit(functionH, "functionH: ");
timeit(functionA, "functionA: ");
timeit(functionB, "functionB: ");
timeit(functionC, "functionC: ");
timeit(functionH, "functionH: ");
timeit(functionA, "functionA: ");
timeit(functionB, "functionB: ");
timeit(functionC, "functionC: ");
timeit(functionH, "functionH: ");
return 0;
}
IMO this is a perfect example of what switch fall-through was made for.
They work equally well. Performance is about the same given a modern compiler.
I prefer if statements over case statements because they are more readable, and more flexible -- you can add other conditions not based on numeric equality, like " || max < min ". But for the simple case you posted here, it doesn't really matter, just do what's most readable to you.
If your cases are likely to remain grouped in the future--if more than one case corresponds to one result--the switch may prove to be easier to read and maintain.
switch is definitely preferred. It's easier to look at a switch's list of cases & know for sure what it is doing than to read the long if condition.
The duplication in the if condition is hard on the eyes. Suppose one of the == was written !=; would you notice? Or if one instance of 'numError' was written 'nmuError', which just happened to compile?
I'd generally prefer to use polymorphism instead of the switch, but without more details of the context, it's hard to say.
As for performance, your best bet is to use a profiler to measure the performance of your application in conditions that are similar to what you expect in the wild. Otherwise, you're probably optimizing in the wrong place and in the wrong way.
I agree with the compactness of the switch solution, but IMO you're hijacking the switch here.
The purpose of the switch is to have different handling depending on the value.
If you had to explain your algo in pseudo-code, you'd use an if because, semantically, that's what it is: if whatever_error do this...
So unless you intend someday to change your code to have specific code for each error, I would use if.
I'm not sure about best-practise, but I'd use switch - and then trap intentional fall-through via 'default'
Aesthetically I tend to favor this approach.
unsigned int special_events[] = {
ERROR_01,
ERROR_07,
ERROR_0A,
ERROR_10,
ERROR_15,
ERROR_16,
ERROR_20
};
int special_events_length = sizeof (special_events) / sizeof (unsigned int);
void process_event(unsigned int numError) {
for (int i = 0; i < special_events_length; i++) {
if (numError == special_events[i]) {
fire_special_event();
break;
}
}
}
Make the data a little smarter so we can make the logic a little dumber.
I realize it looks weird. Here's the inspiration (from how I'd do it in Python):
special_events = [
ERROR_01,
ERROR_07,
ERROR_0A,
ERROR_10,
ERROR_15,
ERROR_16,
ERROR_20,
]
def process_event(numError):
if numError in special_events:
fire_special_event()
while (true) != while (loop)
Probably the first one is optimised by the compiler; that would explain why the second loop gets slower as the loop count increases.
I would pick the if statement for the sake of clarity and convention, although I'm sure that some would disagree. After all, you are wanting to do something if some condition is true! Having a switch with one action seems a little... unnecessary.
I'm not the person to tell you about speed and memory usage, but looking at a switch statement is a hell of a lot easier to understand than a large if statement (especially 2-3 months down the line).
I would say use SWITCH. This way you only have to implement the differing outcomes. Your ten identical cases can use the default. Should one change, all you need to do is explicitly implement the change; there is no need to edit the default. It's also far easier to add or remove cases from a SWITCH than to edit IF and ELSEIF.
switch (numError) {
    case ERROR_20: { fire_special_event(); } break;
    default: break; // nothing to do
}
Maybe even test your condition (in this case numError) against a list of possibilities (an array, perhaps) so your SWITCH isn't even used unless there definitely will be an outcome.
Seeing as you only have 30 error codes, code up your own jump table, then you make all optimisation choices yourself (jump will always be quickest), rather than hope the compiler will do the right thing. It also makes the code very small (apart from the static declaration of the jump table). It also has the side benefit that with a debugger you can modify the behaviour at runtime should you so need, just by poking the table data directly.
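A sketch of such a hand-rolled jump table, shortened to eight slots here; the real table would have one entry per error code, with fire_special_event in the roughly ten special slots (which codes those are is an assumption in this sketch).
typedef void (*ErrorHandler)();

void fire_special_event();      // stub from the question
inline void no_action() {}      // handler for codes that need nothing

// One slot per error code. Shortened to codes 0..7 for the example; extend to
// all 30 codes, putting fire_special_event in the special slots.
static const ErrorHandler handlers[] = {
    no_action,            // 0: non-error case
    fire_special_event,   // 1: a special code (assumed)
    no_action,            // 2
    no_action,            // 3
    no_action,            // 4
    no_action,            // 5
    no_action,            // 6
    fire_special_event    // 7: another special code (assumed)
};
static const unsigned kNumHandlers = sizeof handlers / sizeof handlers[0];

void process_event(unsigned numError) {
    if (numError < kNumHandlers)
        handlers[numError]();   // one bounds check, then a single indirect call
}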
I know it's old, but:
public class SwitchTest {
    static final int max = 100000;

    public static void main(String[] args) {
        int counter1 = 0;
        long start1 = 0l;
        long total1 = 0l;
        int counter2 = 0;
        long start2 = 0l;
        long total2 = 0l;
        boolean loop = true;

        start1 = System.currentTimeMillis();
        while (true) {
            if (counter1 == max) {
                break;
            } else {
                counter1++;
            }
        }
        total1 = System.currentTimeMillis() - start1;

        start2 = System.currentTimeMillis();
        while (loop) {
            switch (counter2) {
                case max:
                    loop = false;
                    break;
                default:
                    counter2++;
            }
        }
        total2 = System.currentTimeMillis() - start2;

        System.out.println("While if/else: " + total1 + "ms");
        System.out.println("Switch: " + total2 + "ms");
        System.out.println("Max Loops: " + max);
        System.exit(0);
    }
}
Varying the loop count changes a lot:
While if/else: 5ms
Switch: 1ms
Max Loops: 100000
While if/else: 5ms
Switch: 3ms
Max Loops: 1000000
While if/else: 5ms
Switch: 14ms
Max Loops: 10000000
While if/else: 5ms
Switch: 149ms
Max Loops: 100000000
(add more statements if you want)
When it comes to compiling the program, I don't know if there is any difference. But as for the program itself and keeping the code as simple as possible, I personally think it depends on what you want to do. if else if else statements have their advantages, which I think are:
allow you to test a variable against specific ranges
you can use functions (Standard Library or Personal) as conditionals.
For example:
int a;
cout << "enter value:\n";
cin >> a;
if (a > 0 && a < 5)
{
    cout << "a is between 0 and 5\n";
}
else if (a >= 5 && a < 10)
{
    cout << "a is between 5 and 10\n";
}
else
{
    cout << "a is not in the range 0 to 10\n";
}
However, if / else if / else statements can get complicated and messy (despite your best attempts) in a hurry. Switch statements tend to be clearer, cleaner, and easier to read, but can only be used to test against specific values. For example:
int a;
cout << "enter value:\n";
cin >> a;
switch (a)
{
    case 0:
    case 1:
    case 2:
    case 3:
    case 4:
    case 5:
        cout << "a is between 0 and 5 and equals: " << a << "\n";
        break;
    // other case statements
    default:
        cout << "a is not in the range or is not a good value\n";
        break;
}
I prefer if / else if / else statements, but it really is up to you. If you want to use functions as the conditions, or you want to test something against a range, array, or vector, and/or you don't mind dealing with the more complicated nesting, I would recommend using if / else if / else blocks. If you want to test against single values or you want a clean and easy-to-read block, I would recommend switch() case blocks.
I'm not sure if the following code can cause redundant calculations, or is it compiler-specific?
for (int i = 0; i < strlen(ss); ++i)
{
// blabla
}
Will strlen() be calculated every time when i increases?
Yes, strlen() will be evaluated on each iteration. It's possible that, under ideal circumstances, the optimiser might be able to deduce that the value won't change, but I personally wouldn't rely on that.
I'd do something like
for (int i = 0, n = strlen(ss); i < n; ++i)
or possibly
for (int i = 0; ss[i]; ++i)
as long as the string isn't going to change length during the iteration. If it might, then you'll need to either call strlen() each time, or handle it through more complicated logic.
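If the string really can change inside the loop, the "more complicated logic" can be as simple as refreshing the cached length only at the points where you modify the string. A sketch (should_truncate is a made-up placeholder predicate):
#include <cstring>

bool should_truncate(const char *s, std::size_t i);   // hypothetical predicate

void trim(char *ss)
{
    std::size_t n = std::strlen(ss);    // computed once up front
    for (std::size_t i = 0; i < n; ++i) {
        if (should_truncate(ss, i)) {
            ss[i] = '\0';               // the string just got shorter...
            n = i;                      // ...so update the cached length
        }
    }
}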
Yes: every time the loop condition is evaluated, it recalculates the length of the string.
So use it like this:
char str[30];
for ( int i = 0; str[i] != '\0'; i++)
{
//Something;
}
In the above code, str[i] only checks one particular character of the string, at location i, each time the loop starts a cycle, so it does much less work per iteration and is more efficient.
In the code below, every time the loop runs strlen counts the length of the whole string, which is less efficient and takes more time.
char str[30];
for ( int i = 0; i < strlen(str); i++)
{
    //Something;
}
A good compiler may avoid recalculating it every time, but you can't be sure that every compiler does.
In addition, the compiler has to know that strlen(ss) does not change. This is only true if ss is not changed in the for loop.
For example, if you use a read-only function on ss in the loop but don't declare the ss parameter as const, the compiler cannot even know that ss is not changed in the loop and has to calculate strlen(ss) on every iteration.
If ss is of type const char * and you're not casting away the constness within the loop the compiler might only call strlen once, if optimizations are turned on. But this is certainly not behavior that can be counted upon.
You should save the strlen result in a variable and use this variable in the loop. If you don't want to create an additional variable, depending on what you're doing you may be able to get away with reversing the loop to iterate backwards.
for( auto i = strlen(s); i > 0; --i ) {
// do whatever
// remember value of s[strlen(s)] is the terminating NULL character
}
Formally yes, strlen() is expected to be called for every iteration.
Anyway, I do not want to rule out the possibility of some clever compiler optimisation that will optimise away every successive call to strlen() after the first one.
The predicate code in its entirety will be executed on every iteration of the for loop. In order to memoize the result of the strlen(ss) call, the compiler would need to know that at least
The function strlen was side effect free
The memory pointed to by ss doesn't change for the duration of the loop
The compiler doesn't know either of these things and hence can't safely memoize the result of the first call
Yes, strlen will be recalculated every time i increases.
If you don't change ss within the loop it won't affect the logic; otherwise it will.
It is safer to use the following code:
int length = strlen(ss);
for ( int i = 0; i < length ; ++ i )
{
// blabla
}
Yes, strlen(ss) will calculate the length at each iteration. If you are growing ss in some way while also increasing i, you could end up with an infinite loop.
Yes, the strlen() function is called every time the loop is evaluated.
If you want to improve the efficiency, then always remember to save such things in local variables... It takes a little time to write, but it's very useful.
You can use code like the below:
const char *str = "ss";
int l = strlen(str);
for ( int i = 0; i < l ; i++ )
{
    // blablabla
}
Yes, strlen(ss) will be calculated every time the code runs.
Not common nowadays, but 20 years ago on 16-bit platforms I'd have recommended this:
for ( char* p = str; *p; p++ ) { /* ... */ }
Even if your compiler isn't very smart about optimization, the above code can still result in good assembly.
Yes. The test doesn't know that ss doesn't get changed inside the loop. If you know that it won't change then I would write:
int stringLength = strlen (ss);
for ( int i = 0; i < stringLength; ++ i )
{
// blabla
}
Arrgh, it will, even under ideal circumstances, dammit!
As of today (January 2018), and gcc 7.3 and clang 5.0, if you compile:
#include <string.h>
void bar(char c);
void foo(const char* __restrict__ ss)
{
for (int i = 0; i < strlen(ss); ++i)
{
bar(*ss);
}
}
So, we have:
ss is a constant pointer.
ss is marked __restrict__
The loop body cannot in any way touch the memory pointed to by ss (well, unless it violates the __restrict__).
and still, both compilers execute strlen() every single iteration of that loop. Amazing.
This also means the allusions/wishful thinking of #Praetorian and #JaredPar don't pan out.
YES, in simple words.
There is a small "no" for the rare case where the compiler, as an optimization step, decides not to recompute it because it can prove that ss is never changed at all. But to be safe you should assume YES. There are situations, such as multithreaded or event-driven programs, where assuming NO can introduce bugs.
Play it safe, as it is not going to increase the program's complexity much.
Yes.
strlen() is calculated on every iteration as i increases, and it is not optimized away.
The code below shows why the compiler should not optimize strlen() away:
for ( int i = 0; i < strlen(ss); ++i )
{
// Change ss string.
ss[i] = 'a'; // Compiler should not optimize strlen().
}
We can easily test it:
char nums[] = "0123456789";
size_t end;
int i;
for( i=0, end=strlen(nums); i<strlen(nums); i++ ) {
    putchar( nums[i] );
    nums[--end] = 0;
}
The loop condition is re-evaluated after each repetition, before the next pass through the loop, so strlen runs on a string that is one character shorter each time.
Also be careful about the type you use to handle string lengths: it should be size_t, which is an unsigned integer type (defined in <stddef.h> and pulled in by <string.h>). Comparing it with, or casting it to, int might cause some serious vulnerability issues.
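The same test with both of those points applied (size_t for the index, and the string shrinking from the end each time the condition is re-evaluated) prints "01234":
#include <stdio.h>
#include <string.h>

int main(void)
{
    char nums[] = "0123456789";
    /* size_t for anything that holds a length or index derived from strlen */
    for (size_t i = 0; i < strlen(nums); i++) {
        putchar(nums[i]);
        nums[strlen(nums) - 1] = '\0';  /* shrink from the end each iteration */
    }
    putchar('\n');                      /* prints 01234, not 0123456789 */
    return 0;
}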
Well, I noticed that someone claimed it is optimized by default by any "clever" modern compiler. By the way, look at the results without optimization. I tried this minimal C code:
#include <stdio.h>
#include <string.h>
int main()
{
char *s="aaaa";
for (int i=0; i<strlen(s);i++)
printf ("a");
return 0;
}
My compiler: g++ (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Command for generation of assembly code: g++ -S -masm=intel test.cpp
Assembly code obtained at the output (excerpt):
...
.L3:
    mov DWORD PTR [esp], 97
    call putchar
    add DWORD PTR [esp+40], 1
.L2:
    # THIS LOOP IS HERE: the strlen (ending in repnz scasb) is redone on every iteration
    mov ebx, DWORD PTR [esp+40]
    mov eax, DWORD PTR [esp+44]
    mov DWORD PTR [esp+28], -1
    mov edx, eax
    mov eax, 0
    mov ecx, DWORD PTR [esp+28]
    mov edi, edx
    repnz scasb
    # AS YOU CAN SEE, it's done every time
    mov eax, ecx
    not eax
    sub eax, 1
    cmp ebx, eax
    setb al
    test al, al
    jne .L3
    mov eax, 0
...
Elaborating on Prætorian's answer, I recommend the following:
for( auto i = strlen(s); i > 0; --i ) { foo(s[i-1]); }
auto because you don't want to care about which type strlen returns. A C++11 compiler (e.g. gcc -std=c++0x, not completely C++11, but auto types work) will do that for you.
i = strlen(s) because you want to compare against 0 (see below).
i > 0 because comparison to 0 is (slightly) faster than comparison to any other number.
The disadvantage is that you have to use i-1 in order to access the string characters.
What's the reason behind applying two explicit type casts as below?
if (unlikely(val != (long)(char)val)) {
Code taken from lxml.etree.c source file from lxml's source package.
That's a cheap way to check whether there's any junk in the high bits. The char cast chops off the upper 8, 24 or 56 bits (depending on sizeof(val)) and then promotes the result back. If char is signed, it will sign-extend as well.
A better test might be:
if (unlikely(val & ~0xff)) {
or
if (unlikely(val & ~0x7f)) {
depending on whether this test cares about bit 7.
Just for grins and completeness, I wrote the following test code:
void RegularTest(long val)
{
if (val != ((int)(char)val)) {
printf("Regular = not equal.");
}
else {
printf("Regular = equal.");
}
}
void MaskTest(long val)
{
if (val & ~0xff) {
printf("Mask = not equal.");
}
else {
printf("Mask = equal.");
}
}
And here's what the cast code turns into in debug in visual studio 2010:
movsx eax, BYTE PTR _val$[ebp]
cmp DWORD PTR _val$[ebp], eax
je SHORT $LN2#RegularTes
this is the mask code:
mov eax, DWORD PTR _val$[ebp]
and eax, -256 ; ffffff00H
je SHORT $LN2#MaskTest
In release, I get this for the cast code:
movsx ecx, al
cmp eax, ecx
je SHORT $LN2#RegularTes
In release, I get this for the mask code:
test DWORD PTR _val$[ebp], -256 ; ffffff00H
je SHORT $LN2#MaskTest
So what's going on? In the cast case it's doing a byte mov with sign extension (ha! bug - the code is not the same because chars are signed) and then a compare and to be totally sneaky, the compiler/linker has also made this function use register passing for the argument. In the mask code in release, it has folded everything up into a single test instruction.
Which is faster? Beats me - and frankly unless you're running this kind of test on a VERY slow CPU or are running it several billion times, it won't matter. Not in the least.
So the answer in this case, is to write code that is clear about its intent. I would expect a C/C++ jockey to look at the mask code and understand its intent, but if you don't like that, you should opt for something like this instead:
#define BitsAbove8AreSet(x) ((x) & ~0xff)
#define BitsAbove7AreSet(x) ((x) & ~0x7f)
or:
inline bool BitsAbove8AreSet(long t) { return (t & ~0xff) != 0; } // make it a bool to be nice
inline bool BitsAbove7AreSet(long t) { return (t & ~0x7f) != 0; }
And use the predicates instead of the actual code.
In general, I think "is it cheap?" is not a particularly good question to ask about this unless you're working in some very specific problem domains. For example, I work in image processing and when I have some kind of operation going from one image to another, I often have code that looks like this:
BYTE *srcPixel = PixelOffset(src, x, y, srcrowstride, srcdepth);
int srcAdvance = PixelAdvance(srcrowstride, right, srcdepth);
BYTE *dstPixel = PixelOffset(dst, x, y, dstrowstride, dstdepth);
int dstAdvance = PixelAdvance(dstrowstride, right, dstdepth);
for (y = top; y < bottom; y++) {
for (x=left; x < right; x++) {
ProcessOnePixel(srcPixel, srcdepth, dstPixel, dstdepth);
srcPixel += srcdepth;
dstPixel += dstdepth;
}
srcPixel += srcAdvance;
dstPixel += dstAdvance;
}
And in this case, assume that ProcessOnePixel() is actually a chunk of inline code that will be executed billions and billions of times. In this case, I care a whole lot about not doing function calls, not doing redundant work, not rechecking values, ensuring that the computational flow will translate into something that will use registers wisely, etc. But my actual primary concern is that the code can be read by the next poor schmuck (probably me) who has to look at it.
And in our current coding world, it is FAR FAR CHEAPER for nearly every problem domain to spend a little time up front ensuring that your code is easy to read and maintain than it is to worry about performance out of the gate.
Speculations:
cast to char: to mask the 8 low bits,
cast to long: to bring the value back to signed (if char is unsigned).
If val is a long then the (char) will strip off all but the bottom 8 bits. The (long) casts it back for the comparison.
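A quick illustration of what that check accepts and rejects, assuming an 8-bit char that is signed (as it is on x86); the helper name is my own.
#include <iostream>

// True when the round trip through char changes the value, i.e. when val has
// "junk" outside what a (signed, on this platform) char can hold.
static bool has_junk_in_high_bits(long val)
{
    return val != (long)(char)val;
}

int main()
{
    std::cout << has_junk_in_high_bits(0x41)  << '\n';  // 0: fits in a char
    std::cout << has_junk_in_high_bits(0x141) << '\n';  // 1: bit 8 is junk
    std::cout << has_junk_in_high_bits(0xA3)  << '\n';  // 1: with signed char,
                                                        //    (char)0xA3 is -93
    std::cout << has_junk_in_high_bits(-1)    << '\n';  // 0: -1 round-trips
    return 0;
}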
What's the best practice for using a switch statement vs using an if statement for 30 unsigned enumerations where about 10 have an expected action (that presently is the same action). Performance and space need to be considered but are not critical. I've abstracted the snippet so don't hate me for the naming conventions.
switch statement:
// numError is an error enumeration type, with 0 being the non-error case
// fire_special_event() is a stub method for the shared processing
switch (numError)
{
case ERROR_01 : // intentional fall-through
case ERROR_07 : // intentional fall-through
case ERROR_0A : // intentional fall-through
case ERROR_10 : // intentional fall-through
case ERROR_15 : // intentional fall-through
case ERROR_16 : // intentional fall-through
case ERROR_20 :
{
fire_special_event();
}
break;
default:
{
// error codes that require no additional action
}
break;
}
if statement:
if ((ERROR_01 == numError) ||
(ERROR_07 == numError) ||
(ERROR_0A == numError) ||
(ERROR_10 == numError) ||
(ERROR_15 == numError) ||
(ERROR_16 == numError) ||
(ERROR_20 == numError))
{
fire_special_event();
}
Use switch.
In the worst case the compiler will generate the same code as a if-else chain, so you don't lose anything. If in doubt put the most common cases first into the switch statement.
In the best case the optimizer may find a better way to generate the code. Common things a compiler does is to build a binary decision tree (saves compares and jumps in the average case) or simply build a jump-table (works without compares at all).
For the special case that you've provided in your example, the clearest code is probably:
if (RequiresSpecialEvent(numError))
fire_special_event();
Obviously this just moves the problem to a different area of the code, but now you have the opportunity to reuse this test. You also have more options for how to solve it. You could use std::set, for example:
bool RequiresSpecialEvent(int numError)
{
return specialSet.find(numError) != specialSet.end();
}
I'm not suggesting that this is the best implementation of RequiresSpecialEvent, just that it's an option. You can still use a switch or if-else chain, or a lookup table, or some bit-manipulation on the value, whatever. The more obscure your decision process becomes, the more value you'll derive from having it in an isolated function.
The switch is faster.
Just try if/else-ing 30 different values inside a loop, and compare it to the same code using switch to see how much faster the switch is.
Now, the switch has one real problem : The switch must know at compile time the values inside each case. This means that the following code:
// WON'T COMPILE
extern const int MY_VALUE ;
void doSomething(const int p_iValue)
{
switch(p_iValue)
{
case MY_VALUE : /* do something */ ; break ;
default : /* do something else */ ; break ;
}
}
won't compile.
Most people will then use defines (Aargh!), and others will declare and define constant variables in the same compilation unit. For example:
// WILL COMPILE
const int MY_VALUE = 25 ;
void doSomething(const int p_iValue)
{
switch(p_iValue)
{
case MY_VALUE : /* do something */ ; break ;
default : /* do something else */ ; break ;
}
}
So, in the end, the developper must choose between "speed + clarity" vs. "code coupling".
(Not that a switch can't be written to be confusing as hell... Most the switch I currently see are of this "confusing" category"... But this is another story...)
Edit 2008-09-21:
bk1e added the following comment: "Defining constants as enums in a header file is another way to handle this".
Of course it is.
The point of an extern type was to decouple the value from the source. Defining this value as a macro, as a simple const int declaration, or even as an enum has the side-effect of inlining the value. Thus, should the define, the enum value, or the const int value change, a recompilation would be needed. The extern declaration means the there is no need to recompile in case of value change, but in the other hand, makes it impossible to use switch. The conclusion being Using switch will increase coupling between the switch code and the variables used as cases. When it is Ok, then use switch. When it isn't, then, no surprise.
.
Edit 2013-01-15:
Vlad Lazarenko commented on my answer, giving a link to his in-depth study of the assembly code generated by a switch. Very enlightning: http://lazarenko.me/switch/
Compiler will optimise it anyway - go for the switch as it's the most readable.
The Switch, if only for readability. Giant if statements are harder to maintain and harder to read in my opinion.
ERROR_01 : // intentional fall-through
or
(ERROR_01 == numError) ||
The later is more error prone and requires more typing and formatting than the first.
Code for readability. If you want to know what performs better, use a profiler, as optimizations and compilers vary, and performance issues are rarely where people think they are.
Compilers are really good at optimizing switch. Recent gcc is also good at optimizing a bunch of conditions in an if.
I made some test cases on godbolt.
When the case values are grouped close together, gcc, clang, and icc are all smart enough to use a bitmap to check if a value is one of the special ones.
e.g. gcc 5.2 -O3 compiles the switch to (and the if something very similar):
errhandler_switch(errtype): # gcc 5.2 -O3
cmpl $32, %edi
ja .L5
movabsq $4301325442, %rax # highest set bit is bit 32 (the 33rd bit)
btq %rdi, %rax
jc .L10
.L5:
rep ret
.L10:
jmp fire_special_event()
Notice that the bitmap is immediate data, so there's no potential data-cache miss accessing it, or a jump table.
gcc 4.9.2 -O3 compiles the switch to a bitmap, but does the 1U<<errNumber with mov/shift. It compiles the if version to series of branches.
errhandler_switch(errtype): # gcc 4.9.2 -O3
leal -1(%rdi), %ecx
cmpl $31, %ecx # cmpl $32, %edi wouldn't have to wait an extra cycle for lea's output.
# However, register read ports are limited on pre-SnB Intel
ja .L5
movl $1, %eax
salq %cl, %rax # with -march=haswell, it will use BMI's shlx to avoid moving the shift count into ecx
testl $2150662721, %eax
jne .L10
.L5:
rep ret
.L10:
jmp fire_special_event()
Note how it subtracts 1 from errNumber (with lea to combine that operation with a move). That lets it fit the bitmap into a 32bit immediate, avoiding the 64bit-immediate movabsq which takes more instruction bytes.
A shorter (in machine code) sequence would be:
cmpl $32, %edi
ja .L5
mov $2150662721, %eax
dec %edi # movabsq and btq is fewer instructions / fewer Intel uops, but this saves several bytes
bt %edi, %eax
jc fire_special_event
.L5:
ret
(The failure to use jc fire_special_event is omnipresent, and is a compiler bug.)
rep ret is used in branch targets, and following conditional branches, for the benefit of old AMD K8 and K10 (pre-Bulldozer): What does `rep ret` mean?. Without it, branch prediction doesn't work as well on those obsolete CPUs.
bt (bit test) with a register arg is fast. It combines the work of left-shifting a 1 by errNumber bits and doing a test, but is still 1 cycle latency and only a single Intel uop. It's slow with a memory arg because of its way-too-CISC semantics: with a memory operand for the "bit string", the address of the byte to be tested is computed based on the other arg (divided by 8), and isn't limited to the 1, 2, 4, or 8byte chunk pointed to by the memory operand.
From Agner Fog's instruction tables, a variable-count shift instruction is slower than a bt on recent Intel (2 uops instead of 1, and shift doesn't do everything else that's needed).
Use switch, it is what it's for and what programmers expect.
I would put the redundant case labels in though - just to make people feel comfortable, I was trying to remember when / what the rules are for leaving them out.
You don't want the next programmer working on it to have to do any unnecessary thinking about language details (it might be you in a few months time!)
Sorry to disagree with the current accepted answer. This is the year 2021. Modern compilers and their optimizers shouldn't differentiate between switch and an equivalent if-chain anymore. If they still do, and create poorly optimized code for either variant, then write to the compiler vendor (or make it public here, which has a higher change of being respected), but don't let micro-optimizations influence your coding style.
So, if you use:
switch (numError) { case ERROR_A: case ERROR_B: ... }
or:
if(numError == ERROR_A || numError == ERROR_B || ...) { ... }
or:
template<typename C, typename EL>
bool has(const C& cont, const EL& el) {
return std::find(cont.begin(), cont.end(), el) != cont.end();
}
constexpr std::array errList = { ERROR_A, ERROR_B, ... };
if(has(errList, rnd)) { ... }
shouldn't make a difference with respect to execution speed. But depending on what project you are working on, they might make a big difference in coding clarity and code maintainability. For example, if you have to check for a certain error list in many places of the code, the templated has() might be much easier to maintain, as the errList needs to be updated only in one place.
Talking about current compilers, I have compiled the test code quoted below with both clang++ -O3 -std=c++1z (version 10 and 11) and g++ -O3 -std=c++1z. Both clang versions gave similiar compiled code and execution times. So I am talking only about version 11 from now on. Most notably, functionA() (which uses if) and functionB() (which uses switch) produce exactly the same assembler output with clang! And functionC() uses a jump table, even though many other posters deemed jump tables to be an exclusive feature of switch. However, despite many people considering jump tables to be optimal, that was actually the slowest solution on clang: functionC() needs around 20 percent more execution time than functionA() or functionB().
The hand-optimized version functionH() was by far the fastest on clang. It even unrolled the loop partially, doing two iterations on each loop.
Actually, clang calculated the bitfield, which is explicitely supplied in functionH(), also in functionA() and functionB(). However, it used conditional branches in functionA() and functionB(), which made these slow, because branch prediction fails regularly, while it used the much more efficient adc ("add with carry") in functionH(). While it failed to apply this obvious optimization also in the other variants, is unknown to me.
The code produced by g++ looks much more complicated than that of clang - but actually runs a bit faster for functionA() and quite a lot faster for functionC(). Of the non-hand-optimized functions, functionC() is the fastest on g++ and faster than any of the functions on clang. On the contrary, functionH() requires twice the execution time when compiled with g++ instead of with clang, mostly because g++ doesn't do the loop unrolling.
Here are the detailed results:
clang:
functionA: 109877 3627
functionB: 109877 3626
functionC: 109877 4192
functionH: 109877 524
g++:
functionA: 109877 3337
functionB: 109877 4668
functionC: 109877 2890
functionH: 109877 982
The Performance changes drastically, if the constant 32 is changed to 63 in the whole code:
clang:
functionA: 106943 1435
functionB: 106943 1436
functionC: 106943 4191
functionH: 106943 524
g++:
functionA: 106943 1265
functionB: 106943 4481
functionC: 106943 2804
functionH: 106943 1038
The reason for the speedup is, that in case, that the highest tested value is 63, the compilers remove some unnecessary bound checks, because the value of rnd is bound to 63, anyways. Note that with that bound check removed, the non-optimized functionA() using simple if() on g++ performs almost as fast as the hand-optimized functionH(), and it also produces rather similiar assembler output.
What is the conclusion? If you hand-optimize and test compilers a lot, you will get the fastest solution. Any assumption whether switch or if is better, is void - they are the same on clang. And the easy to code solution to check against an array of values is actually the fastest case on g++ (if leaving out hand-optimization and by-incident matching last values of the list).
Future compiler versions will optimize your code better and better and get closer to your hand optimization. So don't waste your time on it, unless cycles are REALLY crucial in your case.
Here the test code:
#include <iostream>
#include <chrono>
#include <limits>
#include <array>
#include <algorithm>
unsigned long long functionA() {
unsigned long long cnt = 0;
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
if(rnd == 1 || rnd == 7 || rnd == 10 || rnd == 16 ||
rnd == 21 || rnd == 22 || rnd == 63)
{
cnt += 1;
}
}
return cnt;
}
unsigned long long functionB() {
unsigned long long cnt = 0;
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
switch(rnd) {
case 1:
case 7:
case 10:
case 16:
case 21:
case 22:
case 63:
cnt++;
break;
}
}
return cnt;
}
template<typename C, typename EL>
bool has(const C& cont, const EL& el) {
return std::find(cont.begin(), cont.end(), el) != cont.end();
}
unsigned long long functionC() {
unsigned long long cnt = 0;
constexpr std::array errList { 1, 7, 10, 16, 21, 22, 63 };
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
cnt += has(errList, rnd);
}
return cnt;
}
// Hand optimized version (manually created bitfield):
unsigned long long functionH() {
unsigned long long cnt = 0;
const unsigned long long bitfield =
(1ULL << 1) +
(1ULL << 7) +
(1ULL << 10) +
(1ULL << 16) +
(1ULL << 21) +
(1ULL << 22) +
(1ULL << 63);
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
if(bitfield & (1ULL << rnd)) {
cnt += 1;
}
}
return cnt;
}
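// timeit: runs the given function 100 times and prints its result and the minimal measured duration in microseconds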
void timeit(unsigned long long (*function)(), const char* message)
{
unsigned long long mintime = std::numeric_limits<unsigned long long>::max();
unsigned long long fres = 0;
for(int i = 0; i < 100; i++) {
auto t1 = std::chrono::high_resolution_clock::now();
fres = function();
auto t2 = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
if(duration < mintime) {
mintime = duration;
}
}
std::cout << message << fres << " " << mintime << std::endl;
}
int main(int argc, char* argv[]) {
timeit(functionA, "functionA: ");
timeit(functionB, "functionB: ");
timeit(functionC, "functionC: ");
timeit(functionH, "functionH: ");
timeit(functionA, "functionA: ");
timeit(functionB, "functionB: ");
timeit(functionC, "functionC: ");
timeit(functionH, "functionH: ");
timeit(functionA, "functionA: ");
timeit(functionB, "functionB: ");
timeit(functionC, "functionC: ");
timeit(functionH, "functionH: ");
return 0;
}
IMO this is a perfect example of what switch fall-through was made for.
They work equally well. Performance is about the same given a modern compiler.
I prefer if statements over case statements because they are more readable, and more flexible -- you can add other conditions not based on numeric equality, like " || max < min ". But for the simple case you posted here, it doesn't really matter, just do what's most readable to you.
If your cases are likely to remain grouped in the future--if more than one case corresponds to one result--the switch may prove to be easier to read and maintain.
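For instance, with the grouped fall-through form (a sketch using the ERROR_xx names quoted in other answers, not necessarily the exact code from the question), adding another special code later is a one-line edit:
void fire_special_event();   // assumed, as in the question
void process_event(unsigned int numError) {
    switch (numError) {
        // All special codes grouped onto one handler.
        case ERROR_01:
        case ERROR_07:
        case ERROR_0A:
        case ERROR_10:
        case ERROR_15:
        case ERROR_16:
        case ERROR_20:
            fire_special_event();
            break;
        default:
            break;   // every other code is ignored
    }
}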
switch is definitely preferred. It's easier to look at a switch's list of cases & know for sure what it is doing than to read the long if condition.
The duplication in the if condition is hard on the eyes. Suppose one of the == was written !=; would you notice? Or if one instance of 'numError' was written 'nmuError', which just happened to compile?
I'd generally prefer to use polymorphism instead of the switch, but without more details of the context, it's hard to say.
As for performance, your best bet is to use a profiler to measure the performance of your application in conditions that are similar to what you expect in the wild. Otherwise, you're probably optimizing in the wrong place and in the wrong way.
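To make the polymorphism suggestion above a bit more concrete, here is a minimal sketch; the class names and the map-based registry are invented for illustration and not taken from the question:
#include <memory>
#include <unordered_map>

void fire_special_event();   // assumed to exist elsewhere, as in the question

// Hypothetical handler hierarchy: each interesting error code owns a handler object.
struct EventHandler {
    virtual ~EventHandler() = default;
    virtual void handle() = 0;
};

struct SpecialEventHandler : EventHandler {
    void handle() override { fire_special_event(); }
};

// Registry mapping error codes to handler objects; codes without an entry are ignored.
using HandlerMap = std::unordered_map<unsigned int, std::unique_ptr<EventHandler>>;

void process_event(const HandlerMap& handlers, unsigned int numError) {
    auto it = handlers.find(numError);
    if (it != handlers.end()) {
        it->second->handle();
    }
}
The registry would be filled once at startup (e.g. handlers[ERROR_01] = std::make_unique<SpecialEventHandler>();), so the dispatch itself contains no if or switch at all.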
I agree with the compactness of the switch solution, but IMO you're hijacking the switch here.
The purpose of the switch is to have different handling depending on the value.
If you had to explain your algo in pseudo-code, you'd use an if because, semantically, that's what it is: if whatever_error do this...
So unless you intend someday to change your code to have specific code for each error, I would use if.
I'm not sure about best practice, but I'd use switch - and then trap intentional fall-through via 'default'.
Aesthetically I tend to favor this approach.
unsigned int special_events[] = {
ERROR_01,
ERROR_07,
ERROR_0A,
ERROR_10,
ERROR_15,
ERROR_16,
ERROR_20
};
int special_events_length = sizeof (special_events) / sizeof (unsigned int);
void process_event(unsigned int numError) {
for (int i = 0; i < special_events_length; i++) {
if (numError == special_events[i]) {
fire_special_event();
break;
}
}
}
Make the data a little smarter so we can make the logic a little dumber.
I realize it looks weird. Here's the inspiration (from how I'd do it in Python):
special_events = [
ERROR_01,
ERROR_07,
ERROR_0A,
ERROR_10,
ERROR_15,
ERROR_16,
ERROR_20,
]
def process_event(numError):
    if numError in special_events:
        fire_special_event()
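The same idea translates directly to C++ with the standard algorithms; a rough sketch, assuming the ERROR_xx constants and fire_special_event() are defined elsewhere as in the question:
#include <algorithm>
#include <iterator>

void fire_special_event();   // assumed to be declared elsewhere, as in the question

void process_event(unsigned int numError) {
    // Same idea as above: the data carries the special codes, the logic is a membership test.
    static const unsigned int special_events[] = {
        ERROR_01, ERROR_07, ERROR_0A, ERROR_10, ERROR_15, ERROR_16, ERROR_20
    };
    if (std::find(std::begin(special_events), std::end(special_events), numError)
            != std::end(special_events)) {
        fire_special_event();
    }
}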
while (true) != while (loop)
Probably the first loop is optimized by the compiler, which would explain why the second loop gets slower as the loop count increases.
I would pick the if statement for the sake of clarity and convention, although I'm sure that some would disagree. After all, you want to do something if some condition is true! Having a switch with one action seems a little... unnecessary.
I'm not the person to tell you about speed and memory usage, but a switch statement is a hell of a lot easier to understand than a large if statement (especially 2-3 months down the line).
I would say use SWITCH. This way you only have to implement the differing outcomes. Your ten identical cases can use the default. Should one change, all you need to do is implement that change explicitly; there is no need to edit the default. It's also far easier to add or remove cases from a SWITCH than to edit an IF/ELSEIF chain.
switch (numError) {
    case ERROR_20:
        fire_special_event();
        break;
    default:
        break;   // nothing to do for the other codes
}
Maybe even test your condition (in this case numError) against a list of possibilities, an array perhaps, so your SWITCH isn't even used unless there definitely will be an outcome.
Seeing as you only have 30 error codes, code up your own jump table, then you make all optimisation choices yourself (jump will always be quickest), rather than hope the compiler will do the right thing. It also makes the code very small (apart from the static declaration of the jump table). It also has the side benefit that with a debugger you can modify the behaviour at runtime should you so need, just by poking the table data directly.
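A sketch of what such a hand-rolled jump table could look like, assuming the ERROR_xx names stand for the small hex values their suffixes suggest (so everything fits into a table of 0x21 entries) and that fire_special_event() exists as in the question:
#include <array>
#include <cstddef>

void fire_special_event();     // assumed to exist elsewhere
inline void ignore_event() {}  // filler handler for uninteresting codes

using Handler = void (*)();

// Assumed: every error code lies in [0x00, 0x21), i.e. ERROR_20 is the largest.
constexpr std::size_t kTableSize = 0x21;

std::array<Handler, kTableSize> build_jump_table() {
    std::array<Handler, kTableSize> table{};
    table.fill(ignore_event);
    for (unsigned int code : {0x01u, 0x07u, 0x0Au, 0x10u, 0x15u, 0x16u, 0x20u}) {
        table[code] = fire_special_event;   // the "special" codes
    }
    return table;
}

// Not const on purpose: the table can be patched at runtime (e.g. from a debugger),
// which is the side benefit mentioned above.
static std::array<Handler, kTableSize> jump_table = build_jump_table();

void process_event(unsigned int numError) {
    if (numError < kTableSize) {   // bounds check before the indirect jump
        jump_table[numError]();
    }
}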
I know it's old, but:
public class SwitchTest {
static final int max = 100000;
public static void main(String[] args) {
int counter1 = 0;
long start1 = 0l;
long total1 = 0l;
int counter2 = 0;
long start2 = 0l;
long total2 = 0l;
boolean loop = true;
start1 = System.currentTimeMillis();
while (true) {
if (counter1 == max) {
break;
} else {
counter1++;
}
}
total1 = System.currentTimeMillis() - start1;
start2 = System.currentTimeMillis();
while (loop) {
switch (counter2) {
case max:
loop = false;
break;
default:
counter2++;
}
}
total2 = System.currentTimeMillis() - start2;
System.out.println("While if/else: " + total1 + "ms");
System.out.println("Switch: " + total2 + "ms");
System.out.println("Max Loops: " + max);
System.exit(0);
}
}
Varying the loop count changes a lot:
While if/else: 5ms
Switch: 1ms
Max Loops: 100000
While if/else: 5ms
Switch: 3ms
Max Loops: 1000000
While if/else: 5ms
Switch: 14ms
Max Loops: 10000000
While if/else: 5ms
Switch: 149ms
Max Loops: 100000000
(add more statements if you want)
When it comes to compiling the program, I don't know if there is any difference. But as for the program itself and keeping the code as simple as possible, I personally think it depends on what you want to do. if/else if/else statements have their advantages, which I think are:
allow you to test a variable against specific ranges
you can use functions (standard library or your own) as conditions.
For example:
int a;
cout << "enter value:\n";
cin >> a;
if (a > 0 && a < 5) {
    cout << "a is between 0 and 5\n";
} else if (a >= 5 && a < 10) {
    cout << "a is between 5 and 10\n";
} else {
    cout << "a is not in range 0,10\n";
}
However, if/else if/else statements can get complicated and messy in a hurry (despite your best attempts). Switch statements tend to be clearer, cleaner, and easier to read, but can only test against specific constant values. For example:
int a;
cout << "enter value:\n";
cin >> a;
switch (a) {
    case 0:
    case 1:
    case 2:
    case 3:
    case 4:
    case 5:
        cout << "a is between 0 and 5 and equals: " << a << "\n";
        break;
    // other case statements
    default:
        cout << "a is not in the range or is not a good value\n";
        break;
}
I prefer if/else if/else statements, but it really is up to you. If you want to use functions as the conditions, or you want to test something against a range, array, or vector, and/or you don't mind dealing with the complicated nesting, I would recommend if/else if/else blocks. If you want to test against single values, or you want a clean and easy-to-read block, I would recommend switch/case blocks.