asm volatile issue in C/C++ MUD code - c++

I'm working on an old MUD codebase with a friend just as a hobby project, but I'm having issues getting the code to compile in any OS other than debian (x386 specifically). The issue is (mostly) because of a few asm lines that I honestly don't understand enough to modify. The error I receive when trying to compile in VS is "error c2059: syntax error '' line 29. Any idea on how to get this to compile on an x64 OS?
void Execute(int nArgs, ...)
{
if(MAX_FUNCTION_ARGS < nArgs)
throw "Error: CFuncPtr has too many args";
int i;
void *fptrs[MAX_FUNCTION_ARGS], *ptrs[MAX_FUNCTION_ARGS];
va_list ap;
va_start(ap, nArgs);
for(i = 0; i < nArgs; i++)
fptrs[i] = va_arg(ap, void *);
for(i = 0; i < nArgs; i++)
{
ptrs[i] = fptrs[nArgs - i - 1];
// ============== This is the part with the issue
asm volatile("" \ // This is line 29.
"movl %0, %%eax\n\t" \
"pushl %%eax\n\t" \
:
: "r"(ptrs[i])
: "%eax");
// ==============
}
(*funcptr) ();
va_end(ap);
}

This is far from trivial, since x86-64 uses register passing for arguments [and it's a rather ugly solution in the first place, since it's basically assuming no arguments are passed in registers, and that the callee function takes all arguments on the stack].
I would probably avoid the assembly version altogether, and instead of the second for-loop (with the assembly code) write something like:
switch(nArgs)
{
case 0:
(*funcptr)();
break;
case 1:
(*funcptr)(fptrs[0]);
break;
case 2:
(*funcptr)(fptrs[0], fptrs[1]);
break;
... Repeat until MAX_FUNCTION_ARGS is covered.
}
It is unlikely to generate terribly bad code unless MAX_FUNCTION_ARGS is VERY large [in which case you probably want to change the calling convention of funcptr in the first place].

Related

Member function [c++] definition not found by VS2015 due to #define

Have a #define which is called within a member function. By this #define the member-function definition isn't found by the VS2015 environment at the declaration level. The project compiles and runs just fine so no problem there.
This does however, break the functionality of VS2015 to jump between the declaration and definition.
It can be solved by writing the #define within the source, but can this be solved without removing the #define?
File: cFoo.h
class cFoo
{
public:
int Bits;
void Member();
}
File: cFoo.cpp
#include "cFoo.h"
#define SWITCH( x ) for ( int bit = 1; x >= bit; bit *= 2) if (x & bit) switch (bit)
void cFoo::Member()
{
SWITCH( Bits )
{
case 1: break;
case 2: break;
default: break;
}
}
I would disadvice to use such constructs. It is rather counterintuitive and difficult to understand.
Possible problems/difficulties:
switch case with breaks suggests that only one case is executed, but your macro logic hides the loop.
the default case is executed multiple times depending on the highest populated bit.
using signed int as bit set -> use unsigned - its less prone to implementation defined behavior
it is possibly slow because of loop (I do not know if the compiler is able to unroll and optimize it)
its called SWITCH_BITS which suggests that bit numbers are expected, but cases have to be powers of two.
Your whole statement is not more compact than a simple if sequence.
if(bits & 1 ){
}
if(bits & 1024){
}
but you maybe want to test to bit numbers:
inline bool isBitSet(u32 i_bitset, u32 i_bit){ return i_bitset & (1 << i_bit);}
if(isBitSet(bits, 0){
}
if(isBitSet(bits, 10){
}
You can try doing this by completing the switch statement within the #define - macro:
Foo.h
#ifndef FOO_H
#define FOO_H
#ifndef SWITCH
#define SWITCH(x) \
do { \
for ( int bit = 1; (x) >= bit; bit *=2 ) { \
if ( (x) & bit ) { \
switch( bit ) { \
case 0: break; \
case 1: break; \
default: break; \
} \
} \
} \
} while(0)
#endif
class Foo {
public:
int Bits;
void Member();
};
#endif // !FOO_H
Foo.cpp
#include "Foo.h"
void Foo::Member() {
SWITCH( this->Bits );
}
When I was trying the #define or macro as you had it, Foo::Member()
was not being defined by intellisense...
However even with it not being defined I was able to build, compile and run it like this:
#ifndef SWITCH_A
#define SWITCH_A(x) \
for ( int bit = 1; (x) >= bit; bit *= 2 ) \
if ( (x) & bit ) \
switch (bit)
#endif
For some reason; MS Visual Studio was being very picky with it; meaning that I was having a hard time getting it to compile; it was even complaining about the "spacing" giving me compiler errors of missing { or ; need before... etc. Once I got all the spacing down properly it did eventually compile & run, but intellisense was not able to see that Foo::Member() was defined. When I tried it in the original manner that I've shown above; intellisense had no problem seeing that Foo::Member() was defined. Don't know if it's a pre-compiler bug, MS Visual Studio bug etc.; Just hope that this helps you.

Getting debug error in multithreaded VS code and unable to find issue

I keep getting a debug error thrown, telling me that abort() has been called and then when I go to debug in Visual Studio it takes me to the following code (the last line is where it throws):
void __cdecl _NMSG_WRITE (
int rterrnum
)
{
const wchar_t * const error_text = _GET_RTERRMSG(rterrnum);
if (error_text)
{
int msgshown = 0;
#ifdef _DEBUG
/*
* Report error.
*
* If _CRT_ERROR has _CRTDBG_REPORT_WNDW on, and user chooses
* "Retry", call the debugger.
*
* Otherwise, continue execution.
*
*/
if (rterrnum != _RT_CRNL && rterrnum != _RT_BANNER && rterrnum != _RT_CRT_NOTINIT)
{
switch (_CrtDbgReportW(_CRT_ERROR, NULL, 0, NULL, L"%s", error_text))
{
case 1: _CrtDbgBreak(); msgshown = 1; break;
It appears when I step over whatever-is-the-last-line in the below function:
X::my_func(){
//a,b,c are 2x ints and a unordered_map
std::thread t1(&X::multi_thread_func, this, a, b, c);
int c = 0;
//The debug error message doesn't appear until I step over this line. If I were
//to add further code to this function then the error only appears after stepping
//over the last line in the function.
int c1 = 0;
}
I appreciate there's not a lot to go on- but could people give me tips how I could continue my investigation from within Visual Studio 2012?
EDIT: If I remove the multithreaded call I don't get the error
A std::thread instance must either be joined or detached, before it can be destructed (what happens, as soon as t1 goes out of scope). Otherwise, the destructor of std::thread will call std::terminate(). This might be the cause of that abort.
X::my_func(){
//a,b,c are 2x ints and a unordered_map
std::thread t1(&X::multi_thread_func, this, a, b, c);
t1.join();// <--- ... however, using a thread like this makes little to no sense.
int c = 0;
//The debug error message doesn't appear until I step over this line. If I were
//to add further code to this function then the error only appears after stepping
//over the last line in the function.
int c1 = 0;
}

In CPP, can for replace while and if replace switch? [duplicate]

What's the best practice for using a switch statement vs using an if statement for 30 unsigned enumerations where about 10 have an expected action (that presently is the same action). Performance and space need to be considered but are not critical. I've abstracted the snippet so don't hate me for the naming conventions.
switch statement:
// numError is an error enumeration type, with 0 being the non-error case
// fire_special_event() is a stub method for the shared processing
switch (numError)
{
case ERROR_01 : // intentional fall-through
case ERROR_07 : // intentional fall-through
case ERROR_0A : // intentional fall-through
case ERROR_10 : // intentional fall-through
case ERROR_15 : // intentional fall-through
case ERROR_16 : // intentional fall-through
case ERROR_20 :
{
fire_special_event();
}
break;
default:
{
// error codes that require no additional action
}
break;
}
if statement:
if ((ERROR_01 == numError) ||
(ERROR_07 == numError) ||
(ERROR_0A == numError) ||
(ERROR_10 == numError) ||
(ERROR_15 == numError) ||
(ERROR_16 == numError) ||
(ERROR_20 == numError))
{
fire_special_event();
}
Use switch.
In the worst case the compiler will generate the same code as a if-else chain, so you don't lose anything. If in doubt put the most common cases first into the switch statement.
In the best case the optimizer may find a better way to generate the code. Common things a compiler does is to build a binary decision tree (saves compares and jumps in the average case) or simply build a jump-table (works without compares at all).
For the special case that you've provided in your example, the clearest code is probably:
if (RequiresSpecialEvent(numError))
fire_special_event();
Obviously this just moves the problem to a different area of the code, but now you have the opportunity to reuse this test. You also have more options for how to solve it. You could use std::set, for example:
bool RequiresSpecialEvent(int numError)
{
return specialSet.find(numError) != specialSet.end();
}
I'm not suggesting that this is the best implementation of RequiresSpecialEvent, just that it's an option. You can still use a switch or if-else chain, or a lookup table, or some bit-manipulation on the value, whatever. The more obscure your decision process becomes, the more value you'll derive from having it in an isolated function.
The switch is faster.
Just try if/else-ing 30 different values inside a loop, and compare it to the same code using switch to see how much faster the switch is.
Now, the switch has one real problem : The switch must know at compile time the values inside each case. This means that the following code:
// WON'T COMPILE
extern const int MY_VALUE ;
void doSomething(const int p_iValue)
{
switch(p_iValue)
{
case MY_VALUE : /* do something */ ; break ;
default : /* do something else */ ; break ;
}
}
won't compile.
Most people will then use defines (Aargh!), and others will declare and define constant variables in the same compilation unit. For example:
// WILL COMPILE
const int MY_VALUE = 25 ;
void doSomething(const int p_iValue)
{
switch(p_iValue)
{
case MY_VALUE : /* do something */ ; break ;
default : /* do something else */ ; break ;
}
}
So, in the end, the developper must choose between "speed + clarity" vs. "code coupling".
(Not that a switch can't be written to be confusing as hell... Most the switch I currently see are of this "confusing" category"... But this is another story...)
Edit 2008-09-21:
bk1e added the following comment: "Defining constants as enums in a header file is another way to handle this".
Of course it is.
The point of an extern type was to decouple the value from the source. Defining this value as a macro, as a simple const int declaration, or even as an enum has the side-effect of inlining the value. Thus, should the define, the enum value, or the const int value change, a recompilation would be needed. The extern declaration means the there is no need to recompile in case of value change, but in the other hand, makes it impossible to use switch. The conclusion being Using switch will increase coupling between the switch code and the variables used as cases. When it is Ok, then use switch. When it isn't, then, no surprise.
.
Edit 2013-01-15:
Vlad Lazarenko commented on my answer, giving a link to his in-depth study of the assembly code generated by a switch. Very enlightning: http://lazarenko.me/switch/
Compiler will optimise it anyway - go for the switch as it's the most readable.
The Switch, if only for readability. Giant if statements are harder to maintain and harder to read in my opinion.
ERROR_01 : // intentional fall-through
or
(ERROR_01 == numError) ||
The later is more error prone and requires more typing and formatting than the first.
Code for readability. If you want to know what performs better, use a profiler, as optimizations and compilers vary, and performance issues are rarely where people think they are.
Compilers are really good at optimizing switch. Recent gcc is also good at optimizing a bunch of conditions in an if.
I made some test cases on godbolt.
When the case values are grouped close together, gcc, clang, and icc are all smart enough to use a bitmap to check if a value is one of the special ones.
e.g. gcc 5.2 -O3 compiles the switch to (and the if something very similar):
errhandler_switch(errtype): # gcc 5.2 -O3
cmpl $32, %edi
ja .L5
movabsq $4301325442, %rax # highest set bit is bit 32 (the 33rd bit)
btq %rdi, %rax
jc .L10
.L5:
rep ret
.L10:
jmp fire_special_event()
Notice that the bitmap is immediate data, so there's no potential data-cache miss accessing it, or a jump table.
gcc 4.9.2 -O3 compiles the switch to a bitmap, but does the 1U<<errNumber with mov/shift. It compiles the if version to series of branches.
errhandler_switch(errtype): # gcc 4.9.2 -O3
leal -1(%rdi), %ecx
cmpl $31, %ecx # cmpl $32, %edi wouldn't have to wait an extra cycle for lea's output.
# However, register read ports are limited on pre-SnB Intel
ja .L5
movl $1, %eax
salq %cl, %rax # with -march=haswell, it will use BMI's shlx to avoid moving the shift count into ecx
testl $2150662721, %eax
jne .L10
.L5:
rep ret
.L10:
jmp fire_special_event()
Note how it subtracts 1 from errNumber (with lea to combine that operation with a move). That lets it fit the bitmap into a 32bit immediate, avoiding the 64bit-immediate movabsq which takes more instruction bytes.
A shorter (in machine code) sequence would be:
cmpl $32, %edi
ja .L5
mov $2150662721, %eax
dec %edi # movabsq and btq is fewer instructions / fewer Intel uops, but this saves several bytes
bt %edi, %eax
jc fire_special_event
.L5:
ret
(The failure to use jc fire_special_event is omnipresent, and is a compiler bug.)
rep ret is used in branch targets, and following conditional branches, for the benefit of old AMD K8 and K10 (pre-Bulldozer): What does `rep ret` mean?. Without it, branch prediction doesn't work as well on those obsolete CPUs.
bt (bit test) with a register arg is fast. It combines the work of left-shifting a 1 by errNumber bits and doing a test, but is still 1 cycle latency and only a single Intel uop. It's slow with a memory arg because of its way-too-CISC semantics: with a memory operand for the "bit string", the address of the byte to be tested is computed based on the other arg (divided by 8), and isn't limited to the 1, 2, 4, or 8byte chunk pointed to by the memory operand.
From Agner Fog's instruction tables, a variable-count shift instruction is slower than a bt on recent Intel (2 uops instead of 1, and shift doesn't do everything else that's needed).
Use switch, it is what it's for and what programmers expect.
I would put the redundant case labels in though - just to make people feel comfortable, I was trying to remember when / what the rules are for leaving them out.
You don't want the next programmer working on it to have to do any unnecessary thinking about language details (it might be you in a few months time!)
Sorry to disagree with the current accepted answer. This is the year 2021. Modern compilers and their optimizers shouldn't differentiate between switch and an equivalent if-chain anymore. If they still do, and create poorly optimized code for either variant, then write to the compiler vendor (or make it public here, which has a higher change of being respected), but don't let micro-optimizations influence your coding style.
So, if you use:
switch (numError) { case ERROR_A: case ERROR_B: ... }
or:
if(numError == ERROR_A || numError == ERROR_B || ...) { ... }
or:
template<typename C, typename EL>
bool has(const C& cont, const EL& el) {
return std::find(cont.begin(), cont.end(), el) != cont.end();
}
constexpr std::array errList = { ERROR_A, ERROR_B, ... };
if(has(errList, rnd)) { ... }
shouldn't make a difference with respect to execution speed. But depending on what project you are working on, they might make a big difference in coding clarity and code maintainability. For example, if you have to check for a certain error list in many places of the code, the templated has() might be much easier to maintain, as the errList needs to be updated only in one place.
Talking about current compilers, I have compiled the test code quoted below with both clang++ -O3 -std=c++1z (version 10 and 11) and g++ -O3 -std=c++1z. Both clang versions gave similiar compiled code and execution times. So I am talking only about version 11 from now on. Most notably, functionA() (which uses if) and functionB() (which uses switch) produce exactly the same assembler output with clang! And functionC() uses a jump table, even though many other posters deemed jump tables to be an exclusive feature of switch. However, despite many people considering jump tables to be optimal, that was actually the slowest solution on clang: functionC() needs around 20 percent more execution time than functionA() or functionB().
The hand-optimized version functionH() was by far the fastest on clang. It even unrolled the loop partially, doing two iterations on each loop.
Actually, clang calculated the bitfield, which is explicitely supplied in functionH(), also in functionA() and functionB(). However, it used conditional branches in functionA() and functionB(), which made these slow, because branch prediction fails regularly, while it used the much more efficient adc ("add with carry") in functionH(). While it failed to apply this obvious optimization also in the other variants, is unknown to me.
The code produced by g++ looks much more complicated than that of clang - but actually runs a bit faster for functionA() and quite a lot faster for functionC(). Of the non-hand-optimized functions, functionC() is the fastest on g++ and faster than any of the functions on clang. On the contrary, functionH() requires twice the execution time when compiled with g++ instead of with clang, mostly because g++ doesn't do the loop unrolling.
Here are the detailed results:
clang:
functionA: 109877 3627
functionB: 109877 3626
functionC: 109877 4192
functionH: 109877 524
g++:
functionA: 109877 3337
functionB: 109877 4668
functionC: 109877 2890
functionH: 109877 982
The Performance changes drastically, if the constant 32 is changed to 63 in the whole code:
clang:
functionA: 106943 1435
functionB: 106943 1436
functionC: 106943 4191
functionH: 106943 524
g++:
functionA: 106943 1265
functionB: 106943 4481
functionC: 106943 2804
functionH: 106943 1038
The reason for the speedup is, that in case, that the highest tested value is 63, the compilers remove some unnecessary bound checks, because the value of rnd is bound to 63, anyways. Note that with that bound check removed, the non-optimized functionA() using simple if() on g++ performs almost as fast as the hand-optimized functionH(), and it also produces rather similiar assembler output.
What is the conclusion? If you hand-optimize and test compilers a lot, you will get the fastest solution. Any assumption whether switch or if is better, is void - they are the same on clang. And the easy to code solution to check against an array of values is actually the fastest case on g++ (if leaving out hand-optimization and by-incident matching last values of the list).
Future compiler versions will optimize your code better and better and get closer to your hand optimization. So don't waste your time on it, unless cycles are REALLY crucial in your case.
Here the test code:
#include <iostream>
#include <chrono>
#include <limits>
#include <array>
#include <algorithm>
unsigned long long functionA() {
unsigned long long cnt = 0;
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
if(rnd == 1 || rnd == 7 || rnd == 10 || rnd == 16 ||
rnd == 21 || rnd == 22 || rnd == 63)
{
cnt += 1;
}
}
return cnt;
}
unsigned long long functionB() {
unsigned long long cnt = 0;
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
switch(rnd) {
case 1:
case 7:
case 10:
case 16:
case 21:
case 22:
case 63:
cnt++;
break;
}
}
return cnt;
}
template<typename C, typename EL>
bool has(const C& cont, const EL& el) {
return std::find(cont.begin(), cont.end(), el) != cont.end();
}
unsigned long long functionC() {
unsigned long long cnt = 0;
constexpr std::array errList { 1, 7, 10, 16, 21, 22, 63 };
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
cnt += has(errList, rnd);
}
return cnt;
}
// Hand optimized version (manually created bitfield):
unsigned long long functionH() {
unsigned long long cnt = 0;
const unsigned long long bitfield =
(1ULL << 1) +
(1ULL << 7) +
(1ULL << 10) +
(1ULL << 16) +
(1ULL << 21) +
(1ULL << 22) +
(1ULL << 63);
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
if(bitfield & (1ULL << rnd)) {
cnt += 1;
}
}
return cnt;
}
void timeit(unsigned long long (*function)(), const char* message)
{
unsigned long long mintime = std::numeric_limits<unsigned long long>::max();
unsigned long long fres = 0;
for(int i = 0; i < 100; i++) {
auto t1 = std::chrono::high_resolution_clock::now();
fres = function();
auto t2 = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
if(duration < mintime) {
mintime = duration;
}
}
std::cout << message << fres << " " << mintime << std::endl;
}
int main(int argc, char* argv[]) {
timeit(functionA, "functionA: ");
timeit(functionB, "functionB: ");
timeit(functionC, "functionC: ");
timeit(functionH, "functionH: ");
timeit(functionA, "functionA: ");
timeit(functionB, "functionB: ");
timeit(functionC, "functionC: ");
timeit(functionH, "functionH: ");
timeit(functionA, "functionA: ");
timeit(functionB, "functionB: ");
timeit(functionC, "functionC: ");
timeit(functionH, "functionH: ");
return 0;
}
IMO this is a perfect example of what switch fall-through was made for.
They work equally well. Performance is about the same given a modern compiler.
I prefer if statements over case statements because they are more readable, and more flexible -- you can add other conditions not based on numeric equality, like " || max < min ". But for the simple case you posted here, it doesn't really matter, just do what's most readable to you.
If your cases are likely to remain grouped in the future--if more than one case corresponds to one result--the switch may prove to be easier to read and maintain.
switch is definitely preferred. It's easier to look at a switch's list of cases & know for sure what it is doing than to read the long if condition.
The duplication in the if condition is hard on the eyes. Suppose one of the == was written !=; would you notice? Or if one instance of 'numError' was written 'nmuError', which just happened to compile?
I'd generally prefer to use polymorphism instead of the switch, but without more details of the context, it's hard to say.
As for performance, your best bet is to use a profiler to measure the performance of your application in conditions that are similar to what you expect in the wild. Otherwise, you're probably optimizing in the wrong place and in the wrong way.
I agree with the compacity of the switch solution but IMO you're hijacking the switch here.
The purpose of the switch is to have different handling depending on the value.
If you had to explain your algo in pseudo-code, you'd use an if because, semantically, that's what it is: if whatever_error do this...
So unless you intend someday to change your code to have specific code for each error, I would use if.
I'm not sure about best-practise, but I'd use switch - and then trap intentional fall-through via 'default'
Aesthetically I tend to favor this approach.
unsigned int special_events[] = {
ERROR_01,
ERROR_07,
ERROR_0A,
ERROR_10,
ERROR_15,
ERROR_16,
ERROR_20
};
int special_events_length = sizeof (special_events) / sizeof (unsigned int);
void process_event(unsigned int numError) {
for (int i = 0; i < special_events_length; i++) {
if (numError == special_events[i]) {
fire_special_event();
break;
}
}
}
Make the data a little smarter so we can make the logic a little dumber.
I realize it looks weird. Here's the inspiration (from how I'd do it in Python):
special_events = [
ERROR_01,
ERROR_07,
ERROR_0A,
ERROR_10,
ERROR_15,
ERROR_16,
ERROR_20,
]
def process_event(numError):
if numError in special_events:
fire_special_event()
while (true) != while (loop)
Probably the first one is optimised by the compiler, that would explain why the second loop is slower when increasing loop count.
I would pick the if statement for the sake of clarity and convention, although I'm sure that some would disagree. After all, you are wanting to do something if some condition is true! Having a switch with one action seems a little... unneccesary.
Im not the person to tell you about speed and memory usage, but looking at a switch statment is a hell of a lot easier to understand then a large if statement (especially 2-3 months down the line)
I would say use SWITCH. This way you only have to implement differing outcomes. Your ten identical cases can use the default. Should one change all you need to is explicitly implement the change, no need to edit the default. It's also far easier to add or remove cases from a SWITCH than to edit IF and ELSEIF.
switch(numerror){
ERROR_20 : { fire_special_event(); } break;
default : { null; } break;
}
Maybe even test your condition (in this case numerror) against a list of possibilities, an array perhaps so your SWITCH isn't even used unless there definately will be an outcome.
Seeing as you only have 30 error codes, code up your own jump table, then you make all optimisation choices yourself (jump will always be quickest), rather than hope the compiler will do the right thing. It also makes the code very small (apart from the static declaration of the jump table). It also has the side benefit that with a debugger you can modify the behaviour at runtime should you so need, just by poking the table data directly.
I know its old but
public class SwitchTest {
static final int max = 100000;
public static void main(String[] args) {
int counter1 = 0;
long start1 = 0l;
long total1 = 0l;
int counter2 = 0;
long start2 = 0l;
long total2 = 0l;
boolean loop = true;
start1 = System.currentTimeMillis();
while (true) {
if (counter1 == max) {
break;
} else {
counter1++;
}
}
total1 = System.currentTimeMillis() - start1;
start2 = System.currentTimeMillis();
while (loop) {
switch (counter2) {
case max:
loop = false;
break;
default:
counter2++;
}
}
total2 = System.currentTimeMillis() - start2;
System.out.println("While if/else: " + total1 + "ms");
System.out.println("Switch: " + total2 + "ms");
System.out.println("Max Loops: " + max);
System.exit(0);
}
}
Varying the loop count changes a lot:
While if/else: 5ms
Switch: 1ms
Max Loops: 100000
While if/else: 5ms
Switch: 3ms
Max Loops: 1000000
While if/else: 5ms
Switch: 14ms
Max Loops: 10000000
While if/else: 5ms
Switch: 149ms
Max Loops: 100000000
(add more statements if you want)
When it comes to compiling the program, I don't know if there is any difference. But as for the program itself and keeping the code as simple as possible, I personally think it depends on what you want to do. if else if else statements have their advantages, which I think are:
allow you to test a variable against specific ranges
you can use functions (Standard Library or Personal) as conditionals.
(example:
`int a;
cout<<"enter value:\n";
cin>>a;
if( a > 0 && a < 5)
{
cout<<"a is between 0, 5\n";
}else if(a > 5 && a < 10)
cout<<"a is between 5,10\n";
}else{
"a is not an integer, or is not in range 0,10\n";
However, If else if else statements can get complicated and messy (despite your best attempts) in a hurry. Switch statements tend to be clearer, cleaner, and easier to read; but can only be used to test against specific values (example:
`int a;
cout<<"enter value:\n";
cin>>a;
switch(a)
{
case 0:
case 1:
case 2:
case 3:
case 4:
case 5:
cout<<"a is between 0,5 and equals: "<<a<<"\n";
break;
//other case statements
default:
cout<<"a is not between the range or is not a good value\n"
break;
I prefer if - else if - else statements, but it really is up to you. If you want to use functions as the conditions, or you want to test something against a range, array, or vector and/or you don't mind dealing with the complicated nesting, I would recommend using If else if else blocks. If you want to test against single values or you want a clean and easy to read block, I would recommend you use switch() case blocks.

how do i write assembly in C++ for adding (time test)

I want to time how much slower it is if i were to do a simple operation like 1+1 and plus(int l,int r) which does l+r and throws an exception on overflow heres some example code with _C and _V being carry and overflow. The exception code can be written differently if you like.
How do i write it so i can quickly test for carry/overflow and throw an exception if it is true?
I never did jumps (or even some of the basics) in assembly so i a bit clueless even after googling.
This should work in x32 comps. Currently i am running on a Intel Core Duo which has the x86, x86-64 set
unsigned int plus(unsigned int l, unsigned int r){
unsigned int v = l+r;
if (!_C) return v;
throw 1;
}
int plus(int l, int r){
int v = l+r;
if (!_V) return v;
throw 1;
}
Do you want C/C++ code to perform these operations, or do you want to know how to do it in x86 assembly language?
In C/C++, determining the carry is easy:
int _C = (v < l); // "v < r" works too
The overflow is a bit more complicated. Normally overflow is flagged when the two operands have the same sign yet the result has a different sign. On two's complement architectures such as x86, this can be written as:
int _V = ((l ^ r) >= 0) && ((l ^ v) < 0);
The MSB (sign bit) of l ^ r will be 0 if and only if the signs agree, and similarly l ^ v will have a nonzero sign bit (=value less than zero) if and only if l and v have opposite signs.
If you want to write it in assembly, you just do the add and use a jc or jo respectively to jump to the carry/overflow handler. However, you can't easily throw C++ exceptions from assembly code. The easiest way to do this is probably to write a simple one-line function in C++ that throws the exception and call that from your assembly code. The final asm code will look something like this:
; Assuming eax=l, ebx=r
add eax, ebx
jc .have_carry
; Continue computation here...
.have_carry:
call throw_overflow_exception
with the following C++ helper function defined somewhere:
extern "C" void throw_overflow_exception()
{
throw 1; // or some other exception
}
You need the extern C to disable C++ name mangling for this function. There are other conventions to (e.g. some compilers add an underscore before or after C function names) - this depends on the compiler used and the architecture though.
Shift bit both inputs right by one. Add them both then test the left-most bit.
l>>1;
r>>1;
int result = l+r;
if (result>>((sizeof(int)*8)-1)) { /* handle overflow */ }
Here is the test code i end up using
-edit- I tested on VS 2010 outside of the IDE and in GCC. gcc is a lot faster as VC optimizes the variables poorly. (VS c++ moves the register back into V when assembly is used. It doesnt need to do that.) gcc shows little between the 3 test. The results varied from each run so often that they all looked like it was the same test. I couldnt tell. VS showed checking C flag without asm about 3x slower while with asm 6x slower.
#include <stdio.h>
#include <intrin.h>
int main(int argc, char*argv[])
{
try{
for(int n=0; n<10; n++){
volatile unsigned int v, r;
sscanf("0", "%d", &v);
sscanf("0", "%d", &r);
__int64 time = 0xFFFFFFFF;
__int64 start = __rdtsc();
for(int i=0; i<100000000; i++)
{
v=v+v;
#if 1
__asm jc e
continue;
e:
throw 1;
#endif
}
__int64 end = __rdtsc();
time = end - start;
printf("time: %I64d\n", time/10000000);
}
}
catch(int v)
{
printf("exception");
}
return 0;
}

Advantage of switch over if-else statement

What's the best practice for using a switch statement vs using an if statement for 30 unsigned enumerations where about 10 have an expected action (that presently is the same action). Performance and space need to be considered but are not critical. I've abstracted the snippet so don't hate me for the naming conventions.
switch statement:
// numError is an error enumeration type, with 0 being the non-error case
// fire_special_event() is a stub method for the shared processing
switch (numError)
{
case ERROR_01 : // intentional fall-through
case ERROR_07 : // intentional fall-through
case ERROR_0A : // intentional fall-through
case ERROR_10 : // intentional fall-through
case ERROR_15 : // intentional fall-through
case ERROR_16 : // intentional fall-through
case ERROR_20 :
{
fire_special_event();
}
break;
default:
{
// error codes that require no additional action
}
break;
}
if statement:
if ((ERROR_01 == numError) ||
(ERROR_07 == numError) ||
(ERROR_0A == numError) ||
(ERROR_10 == numError) ||
(ERROR_15 == numError) ||
(ERROR_16 == numError) ||
(ERROR_20 == numError))
{
fire_special_event();
}
Use switch.
In the worst case the compiler will generate the same code as a if-else chain, so you don't lose anything. If in doubt put the most common cases first into the switch statement.
In the best case the optimizer may find a better way to generate the code. Common things a compiler does is to build a binary decision tree (saves compares and jumps in the average case) or simply build a jump-table (works without compares at all).
For the special case that you've provided in your example, the clearest code is probably:
if (RequiresSpecialEvent(numError))
fire_special_event();
Obviously this just moves the problem to a different area of the code, but now you have the opportunity to reuse this test. You also have more options for how to solve it. You could use std::set, for example:
bool RequiresSpecialEvent(int numError)
{
return specialSet.find(numError) != specialSet.end();
}
I'm not suggesting that this is the best implementation of RequiresSpecialEvent, just that it's an option. You can still use a switch or if-else chain, or a lookup table, or some bit-manipulation on the value, whatever. The more obscure your decision process becomes, the more value you'll derive from having it in an isolated function.
The switch is faster.
Just try if/else-ing 30 different values inside a loop, and compare it to the same code using switch to see how much faster the switch is.
Now, the switch has one real problem : The switch must know at compile time the values inside each case. This means that the following code:
// WON'T COMPILE
extern const int MY_VALUE ;
void doSomething(const int p_iValue)
{
switch(p_iValue)
{
case MY_VALUE : /* do something */ ; break ;
default : /* do something else */ ; break ;
}
}
won't compile.
Most people will then use defines (Aargh!), and others will declare and define constant variables in the same compilation unit. For example:
// WILL COMPILE
const int MY_VALUE = 25 ;
void doSomething(const int p_iValue)
{
switch(p_iValue)
{
case MY_VALUE : /* do something */ ; break ;
default : /* do something else */ ; break ;
}
}
So, in the end, the developper must choose between "speed + clarity" vs. "code coupling".
(Not that a switch can't be written to be confusing as hell... Most the switch I currently see are of this "confusing" category"... But this is another story...)
Edit 2008-09-21:
bk1e added the following comment: "Defining constants as enums in a header file is another way to handle this".
Of course it is.
The point of an extern type was to decouple the value from the source. Defining this value as a macro, as a simple const int declaration, or even as an enum has the side-effect of inlining the value. Thus, should the define, the enum value, or the const int value change, a recompilation would be needed. The extern declaration means the there is no need to recompile in case of value change, but in the other hand, makes it impossible to use switch. The conclusion being Using switch will increase coupling between the switch code and the variables used as cases. When it is Ok, then use switch. When it isn't, then, no surprise.
.
Edit 2013-01-15:
Vlad Lazarenko commented on my answer, giving a link to his in-depth study of the assembly code generated by a switch. Very enlightning: http://lazarenko.me/switch/
Compiler will optimise it anyway - go for the switch as it's the most readable.
The Switch, if only for readability. Giant if statements are harder to maintain and harder to read in my opinion.
ERROR_01 : // intentional fall-through
or
(ERROR_01 == numError) ||
The later is more error prone and requires more typing and formatting than the first.
Code for readability. If you want to know what performs better, use a profiler, as optimizations and compilers vary, and performance issues are rarely where people think they are.
Compilers are really good at optimizing switch. Recent gcc is also good at optimizing a bunch of conditions in an if.
I made some test cases on godbolt.
When the case values are grouped close together, gcc, clang, and icc are all smart enough to use a bitmap to check if a value is one of the special ones.
e.g. gcc 5.2 -O3 compiles the switch to (and the if something very similar):
errhandler_switch(errtype): # gcc 5.2 -O3
cmpl $32, %edi
ja .L5
movabsq $4301325442, %rax # highest set bit is bit 32 (the 33rd bit)
btq %rdi, %rax
jc .L10
.L5:
rep ret
.L10:
jmp fire_special_event()
Notice that the bitmap is immediate data, so there's no potential data-cache miss accessing it, or a jump table.
gcc 4.9.2 -O3 compiles the switch to a bitmap, but does the 1U<<errNumber with mov/shift. It compiles the if version to series of branches.
errhandler_switch(errtype): # gcc 4.9.2 -O3
leal -1(%rdi), %ecx
cmpl $31, %ecx # cmpl $32, %edi wouldn't have to wait an extra cycle for lea's output.
# However, register read ports are limited on pre-SnB Intel
ja .L5
movl $1, %eax
salq %cl, %rax # with -march=haswell, it will use BMI's shlx to avoid moving the shift count into ecx
testl $2150662721, %eax
jne .L10
.L5:
rep ret
.L10:
jmp fire_special_event()
Note how it subtracts 1 from errNumber (with lea to combine that operation with a move). That lets it fit the bitmap into a 32bit immediate, avoiding the 64bit-immediate movabsq which takes more instruction bytes.
A shorter (in machine code) sequence would be:
cmpl $32, %edi
ja .L5
mov $2150662721, %eax
dec %edi # movabsq and btq is fewer instructions / fewer Intel uops, but this saves several bytes
bt %edi, %eax
jc fire_special_event
.L5:
ret
(The failure to use jc fire_special_event is omnipresent, and is a compiler bug.)
rep ret is used in branch targets, and following conditional branches, for the benefit of old AMD K8 and K10 (pre-Bulldozer): What does `rep ret` mean?. Without it, branch prediction doesn't work as well on those obsolete CPUs.
bt (bit test) with a register arg is fast. It combines the work of left-shifting a 1 by errNumber bits and doing a test, but is still 1 cycle latency and only a single Intel uop. It's slow with a memory arg because of its way-too-CISC semantics: with a memory operand for the "bit string", the address of the byte to be tested is computed based on the other arg (divided by 8), and isn't limited to the 1, 2, 4, or 8byte chunk pointed to by the memory operand.
From Agner Fog's instruction tables, a variable-count shift instruction is slower than a bt on recent Intel (2 uops instead of 1, and shift doesn't do everything else that's needed).
Use switch, it is what it's for and what programmers expect.
I would put the redundant case labels in though - just to make people feel comfortable, I was trying to remember when / what the rules are for leaving them out.
You don't want the next programmer working on it to have to do any unnecessary thinking about language details (it might be you in a few months time!)
Sorry to disagree with the current accepted answer. This is the year 2021. Modern compilers and their optimizers shouldn't differentiate between switch and an equivalent if-chain anymore. If they still do, and create poorly optimized code for either variant, then write to the compiler vendor (or make it public here, which has a higher change of being respected), but don't let micro-optimizations influence your coding style.
So, if you use:
switch (numError) { case ERROR_A: case ERROR_B: ... }
or:
if(numError == ERROR_A || numError == ERROR_B || ...) { ... }
or:
template<typename C, typename EL>
bool has(const C& cont, const EL& el) {
return std::find(cont.begin(), cont.end(), el) != cont.end();
}
constexpr std::array errList = { ERROR_A, ERROR_B, ... };
if(has(errList, rnd)) { ... }
shouldn't make a difference with respect to execution speed. But depending on what project you are working on, they might make a big difference in coding clarity and code maintainability. For example, if you have to check for a certain error list in many places of the code, the templated has() might be much easier to maintain, as the errList needs to be updated only in one place.
Talking about current compilers, I have compiled the test code quoted below with both clang++ -O3 -std=c++1z (version 10 and 11) and g++ -O3 -std=c++1z. Both clang versions gave similiar compiled code and execution times. So I am talking only about version 11 from now on. Most notably, functionA() (which uses if) and functionB() (which uses switch) produce exactly the same assembler output with clang! And functionC() uses a jump table, even though many other posters deemed jump tables to be an exclusive feature of switch. However, despite many people considering jump tables to be optimal, that was actually the slowest solution on clang: functionC() needs around 20 percent more execution time than functionA() or functionB().
The hand-optimized version functionH() was by far the fastest on clang. It even unrolled the loop partially, doing two iterations on each loop.
Actually, clang calculated the bitfield, which is explicitely supplied in functionH(), also in functionA() and functionB(). However, it used conditional branches in functionA() and functionB(), which made these slow, because branch prediction fails regularly, while it used the much more efficient adc ("add with carry") in functionH(). While it failed to apply this obvious optimization also in the other variants, is unknown to me.
The code produced by g++ looks much more complicated than that of clang - but actually runs a bit faster for functionA() and quite a lot faster for functionC(). Of the non-hand-optimized functions, functionC() is the fastest on g++ and faster than any of the functions on clang. On the contrary, functionH() requires twice the execution time when compiled with g++ instead of with clang, mostly because g++ doesn't do the loop unrolling.
Here are the detailed results:
clang:
functionA: 109877 3627
functionB: 109877 3626
functionC: 109877 4192
functionH: 109877 524
g++:
functionA: 109877 3337
functionB: 109877 4668
functionC: 109877 2890
functionH: 109877 982
The Performance changes drastically, if the constant 32 is changed to 63 in the whole code:
clang:
functionA: 106943 1435
functionB: 106943 1436
functionC: 106943 4191
functionH: 106943 524
g++:
functionA: 106943 1265
functionB: 106943 4481
functionC: 106943 2804
functionH: 106943 1038
The reason for the speedup is, that in case, that the highest tested value is 63, the compilers remove some unnecessary bound checks, because the value of rnd is bound to 63, anyways. Note that with that bound check removed, the non-optimized functionA() using simple if() on g++ performs almost as fast as the hand-optimized functionH(), and it also produces rather similiar assembler output.
What is the conclusion? If you hand-optimize and test compilers a lot, you will get the fastest solution. Any assumption whether switch or if is better, is void - they are the same on clang. And the easy to code solution to check against an array of values is actually the fastest case on g++ (if leaving out hand-optimization and by-incident matching last values of the list).
Future compiler versions will optimize your code better and better and get closer to your hand optimization. So don't waste your time on it, unless cycles are REALLY crucial in your case.
Here the test code:
#include <iostream>
#include <chrono>
#include <limits>
#include <array>
#include <algorithm>
unsigned long long functionA() {
unsigned long long cnt = 0;
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
if(rnd == 1 || rnd == 7 || rnd == 10 || rnd == 16 ||
rnd == 21 || rnd == 22 || rnd == 63)
{
cnt += 1;
}
}
return cnt;
}
unsigned long long functionB() {
unsigned long long cnt = 0;
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
switch(rnd) {
case 1:
case 7:
case 10:
case 16:
case 21:
case 22:
case 63:
cnt++;
break;
}
}
return cnt;
}
template<typename C, typename EL>
bool has(const C& cont, const EL& el) {
return std::find(cont.begin(), cont.end(), el) != cont.end();
}
unsigned long long functionC() {
unsigned long long cnt = 0;
constexpr std::array errList { 1, 7, 10, 16, 21, 22, 63 };
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
cnt += has(errList, rnd);
}
return cnt;
}
// Hand optimized version (manually created bitfield):
unsigned long long functionH() {
unsigned long long cnt = 0;
const unsigned long long bitfield =
(1ULL << 1) +
(1ULL << 7) +
(1ULL << 10) +
(1ULL << 16) +
(1ULL << 21) +
(1ULL << 22) +
(1ULL << 63);
for(unsigned long long i = 0; i < 1000000; i++) {
unsigned char rnd = (((i * (i >> 3)) >> 8) ^ i) & 63;
if(bitfield & (1ULL << rnd)) {
cnt += 1;
}
}
return cnt;
}
void timeit(unsigned long long (*function)(), const char* message)
{
unsigned long long mintime = std::numeric_limits<unsigned long long>::max();
unsigned long long fres = 0;
for(int i = 0; i < 100; i++) {
auto t1 = std::chrono::high_resolution_clock::now();
fres = function();
auto t2 = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
if(duration < mintime) {
mintime = duration;
}
}
std::cout << message << fres << " " << mintime << std::endl;
}
int main(int argc, char* argv[]) {
timeit(functionA, "functionA: ");
timeit(functionB, "functionB: ");
timeit(functionC, "functionC: ");
timeit(functionH, "functionH: ");
timeit(functionA, "functionA: ");
timeit(functionB, "functionB: ");
timeit(functionC, "functionC: ");
timeit(functionH, "functionH: ");
timeit(functionA, "functionA: ");
timeit(functionB, "functionB: ");
timeit(functionC, "functionC: ");
timeit(functionH, "functionH: ");
return 0;
}
IMO this is a perfect example of what switch fall-through was made for.
They work equally well. Performance is about the same given a modern compiler.
I prefer if statements over case statements because they are more readable, and more flexible -- you can add other conditions not based on numeric equality, like " || max < min ". But for the simple case you posted here, it doesn't really matter, just do what's most readable to you.
If your cases are likely to remain grouped in the future--if more than one case corresponds to one result--the switch may prove to be easier to read and maintain.
switch is definitely preferred. It's easier to look at a switch's list of cases & know for sure what it is doing than to read the long if condition.
The duplication in the if condition is hard on the eyes. Suppose one of the == was written !=; would you notice? Or if one instance of 'numError' was written 'nmuError', which just happened to compile?
I'd generally prefer to use polymorphism instead of the switch, but without more details of the context, it's hard to say.
As for performance, your best bet is to use a profiler to measure the performance of your application in conditions that are similar to what you expect in the wild. Otherwise, you're probably optimizing in the wrong place and in the wrong way.
I agree with the compacity of the switch solution but IMO you're hijacking the switch here.
The purpose of the switch is to have different handling depending on the value.
If you had to explain your algo in pseudo-code, you'd use an if because, semantically, that's what it is: if whatever_error do this...
So unless you intend someday to change your code to have specific code for each error, I would use if.
I'm not sure about best-practise, but I'd use switch - and then trap intentional fall-through via 'default'
Aesthetically I tend to favor this approach.
unsigned int special_events[] = {
ERROR_01,
ERROR_07,
ERROR_0A,
ERROR_10,
ERROR_15,
ERROR_16,
ERROR_20
};
int special_events_length = sizeof (special_events) / sizeof (unsigned int);
void process_event(unsigned int numError) {
for (int i = 0; i < special_events_length; i++) {
if (numError == special_events[i]) {
fire_special_event();
break;
}
}
}
Make the data a little smarter so we can make the logic a little dumber.
I realize it looks weird. Here's the inspiration (from how I'd do it in Python):
special_events = [
ERROR_01,
ERROR_07,
ERROR_0A,
ERROR_10,
ERROR_15,
ERROR_16,
ERROR_20,
]
def process_event(numError):
if numError in special_events:
fire_special_event()
while (true) != while (loop)
Probably the first one is optimised by the compiler, that would explain why the second loop is slower when increasing loop count.
I would pick the if statement for the sake of clarity and convention, although I'm sure that some would disagree. After all, you are wanting to do something if some condition is true! Having a switch with one action seems a little... unneccesary.
Im not the person to tell you about speed and memory usage, but looking at a switch statment is a hell of a lot easier to understand then a large if statement (especially 2-3 months down the line)
I would say use SWITCH. This way you only have to implement differing outcomes. Your ten identical cases can use the default. Should one change all you need to is explicitly implement the change, no need to edit the default. It's also far easier to add or remove cases from a SWITCH than to edit IF and ELSEIF.
switch(numerror){
ERROR_20 : { fire_special_event(); } break;
default : { null; } break;
}
Maybe even test your condition (in this case numerror) against a list of possibilities, an array perhaps so your SWITCH isn't even used unless there definately will be an outcome.
Seeing as you only have 30 error codes, code up your own jump table, then you make all optimisation choices yourself (jump will always be quickest), rather than hope the compiler will do the right thing. It also makes the code very small (apart from the static declaration of the jump table). It also has the side benefit that with a debugger you can modify the behaviour at runtime should you so need, just by poking the table data directly.
I know its old but
public class SwitchTest {
static final int max = 100000;
public static void main(String[] args) {
int counter1 = 0;
long start1 = 0l;
long total1 = 0l;
int counter2 = 0;
long start2 = 0l;
long total2 = 0l;
boolean loop = true;
start1 = System.currentTimeMillis();
while (true) {
if (counter1 == max) {
break;
} else {
counter1++;
}
}
total1 = System.currentTimeMillis() - start1;
start2 = System.currentTimeMillis();
while (loop) {
switch (counter2) {
case max:
loop = false;
break;
default:
counter2++;
}
}
total2 = System.currentTimeMillis() - start2;
System.out.println("While if/else: " + total1 + "ms");
System.out.println("Switch: " + total2 + "ms");
System.out.println("Max Loops: " + max);
System.exit(0);
}
}
Varying the loop count changes a lot:
While if/else: 5ms
Switch: 1ms
Max Loops: 100000
While if/else: 5ms
Switch: 3ms
Max Loops: 1000000
While if/else: 5ms
Switch: 14ms
Max Loops: 10000000
While if/else: 5ms
Switch: 149ms
Max Loops: 100000000
(add more statements if you want)
When it comes to compiling the program, I don't know if there is any difference. But as for the program itself and keeping the code as simple as possible, I personally think it depends on what you want to do. if else if else statements have their advantages, which I think are:
allow you to test a variable against specific ranges
you can use functions (Standard Library or Personal) as conditionals.
(example:
`int a;
cout<<"enter value:\n";
cin>>a;
if( a > 0 && a < 5)
{
cout<<"a is between 0, 5\n";
}else if(a > 5 && a < 10)
cout<<"a is between 5,10\n";
}else{
"a is not an integer, or is not in range 0,10\n";
However, If else if else statements can get complicated and messy (despite your best attempts) in a hurry. Switch statements tend to be clearer, cleaner, and easier to read; but can only be used to test against specific values (example:
`int a;
cout<<"enter value:\n";
cin>>a;
switch(a)
{
case 0:
case 1:
case 2:
case 3:
case 4:
case 5:
cout<<"a is between 0,5 and equals: "<<a<<"\n";
break;
//other case statements
default:
cout<<"a is not between the range or is not a good value\n"
break;
I prefer if - else if - else statements, but it really is up to you. If you want to use functions as the conditions, or you want to test something against a range, array, or vector and/or you don't mind dealing with the complicated nesting, I would recommend using If else if else blocks. If you want to test against single values or you want a clean and easy to read block, I would recommend you use switch() case blocks.