Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I want to loop over a sequence, but I want to dynamically choose where to start the loop within the sequence. I designed this flow pattern.
switch(offset){
start:
    currentObject = objects[index++]; //a different object is chosen to be manipulated by the sequence of code
case 0:
    sub_sequence(currentObject); // a sequence that is repeated within the larger sequence of the entire switch
    if(enough_actions) break;
case 1:
    sub_sequence(currentObject);
    if(enough_actions) break;
case 2:
    sub_sequence(currentObject);
    if(enough_actions) break;
    goto start;
}
It seems to fit my needs well but I've never seen this design before. Is there anything wrong with this design? Should I be inclined to use an alternative?
What you have constructed there is a variant of Duff's device. It avoids duplicate source code, but it is hard for humans to understand and just as hard for the compiler to optimize.
switch(offset)
{
case 0:
    sub_sequence(currentObject); // a sequence that is repeated within the larger sequence of the entire switch
    if(enough_actions) break;
case 1:
    sub_sequence(currentObject);
    if(enough_actions) break;
case 2:
    sub_sequence(currentObject);
    if(enough_actions) break;
    //a different object is chosen to be manipulated by the sequence of code
    currentObject = objects[index++];
    while(true) {
        sub_sequence(currentObject);
        if(enough_actions) break;
        sub_sequence(currentObject);
        if(enough_actions) break;
        sub_sequence(currentObject);
        if(enough_actions) break;
        currentObject = objects[index++];
    }
}
By separating the loop from the variable entry point, you are giving the compiler much more freedom to perform optimizations.
In the original code, the loop body is split by the start: label and three case labels. These force the compiler to treat each code section between two labels individually.
Without these labels, the compiler may now apply switch-specific optimizations to the switch block. It may also apply additional loop unrolling or other strategies to the while loop.
In the end, going for the more readable variant may yield machine code which is both more compact and faster.
This is arguably one of the few cases where "duplicating" code is acceptable. The switch and the while block only look similar; they behave entirely differently.
EDIT1: Moved loop to the end of the switch statement in order to handle enough_actions correctly. The loop could have been placed outside the switch block if there had been no condition for an early exit.
BONUS: Switch-free implementation:
for(;!enough_actions;offset = 0,currentObject = objects[index++]) {
    for(int i = offset; i < 3 && !enough_actions; i++) {
        sub_sequence(currentObject);
    }
}
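For illustration, here is a self-contained C++ sketch of the switch-free version. The names Runner, objects, log and budget, and the function form of enough_actions(), are hypothetical stand-ins for the question's variables. Objects are cycled with a modulo so the sketch cannot index past the end. One difference is worth noting: unlike the switch versions, this loop also tests enough_actions before the very first sub_sequence call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins: "objects", "sub_sequence" and "enough_actions" from
// the question are modelled as a vector, a logging call and an action budget.
struct Runner {
    std::vector<int> objects;  // objects the sequence is applied to, in order
    std::vector<int> log;      // records which object each sub_sequence call used
    int budget;                // number of sub_sequence calls that counts as "enough"

    void sub_sequence(int obj) { log.push_back(obj); }
    bool enough_actions() const { return static_cast<int>(log.size()) >= budget; }

    // Enter the 3-step sequence at step `offset` (0..2) on objects[index],
    // then run full sequences on the following objects until enough actions.
    void run(int offset, std::size_t index) {
        assert(!objects.empty() && offset >= 0 && offset < 3);
        int current = objects[index++ % objects.size()];
        for (; !enough_actions(); offset = 0, current = objects[index++ % objects.size()]) {
            for (int i = offset; i < 3 && !enough_actions(); i++) {
                sub_sequence(current);
            }
        }
    }
};
```

Take objects {10, 20, 30}, a budget of 5 and offset 2. The log ends up as 10, 20, 20, 20, 30: one step on the first object, a full sequence on the second, and the budget runs out partway through the third.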
You could also do:
switch(offset)
{
    do
    {
        currentObject = objects[index++]; //a different object is chosen to be manipulated by the sequence of code
    case 0:
        sub_sequence(currentObject); // a sequence that is repeated within the larger sequence of the entire switch
        if(enough_actions) break;
    case 1:
        sub_sequence(currentObject);
        if(enough_actions) break;
    case 2:
        sub_sequence(currentObject);
        if(enough_actions) break;
    }
    while (1);
}
So you avoid the goto ;)
(As stated in a comment, there is technically no reason to avoid goto if this behavior IS needed.)
But yes, you are right: both should fit your needs.
I've examined the assembly code produced by Microsoft compilers for the following Fibonacci function. The compiler was still able to slightly modify the unfolded loop sequence (I assume to optimize register dependencies).
unsigned int fib(unsigned int n)
{
    unsigned int f0, f1;
    f0 = n & 1;         /* if n even, f0=0=fib(0), f1=1=fib(-1) */
    f1 = 1 - f0;        /* else f1=0=fib(0), f0=1=fib(-1) */
    switch(n%8){
        do{
            f1 += f0;
        case 7:
            f0 += f1;
        case 6:
            f1 += f0;
        case 5:
            f0 += f1;
        case 4:
            f1 += f0;
        case 3:
            f0 += f1;
        case 2:
            f1 += f0;
        case 1:
            f0 += f1;
        case 0:
            continue;
        }while(0 <= (int)(n -= 8));
    }
    return f0;
}
Produced assembly code:
_fib PROC ; _n$ = eax
    push esi
    mov esi, eax
    and eax, 1
    mov edx, esi
    mov ecx, 1
    and edx, 7
    sub ecx, eax
    cmp edx, 7
    ja SHORT $LN9@fib
    jmp DWORD PTR $LN17@fib[edx*4]
$LN10@fib:
    sub esi, 8
    js SHORT $LN9@fib
    add ecx, eax
$LN8@fib:
    add eax, ecx
$LN7@fib:
    add ecx, eax
$LN6@fib:
    add eax, ecx
$LN5@fib:
    add ecx, eax
$LN4@fib:
    add eax, ecx
$LN3@fib:
    add ecx, eax
$LN2@fib:
    add eax, ecx
    jmp SHORT $LN10@fib
$LN9@fib:
    pop esi
    ret 0
    npad 1
$LN17@fib: ;jump table
    DD $LN10@fib
    DD $LN2@fib
    DD $LN3@fib
    DD $LN4@fib
    DD $LN5@fib
    DD $LN6@fib
    DD $LN7@fib
    DD $LN8@fib
_fib ENDP
Perhaps this is more applicable to situations like a linear feedback shift register, where the loop is unfolded to avoid shifting data between variables. For example:
while(...){
e = f(a,b,c,d);
a = b;
b = c;
c = d;
d = e;
}
is unfolded into
do{
    a = f(a,b,c,d);
case 3:
    b = f(b,c,d,a);
case 2:
    c = f(c,d,a,b);
case 1:
    d = f(d,a,b,c);
case 0: ;   // a label must be followed by a statement
}while(...);
If the number of iterations isn't a multiple of 4, Duff's device is used to jump into the middle of the unfolded loop.
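To make the idea concrete, here is a C++ sketch that checks a Duff's-device entry into the unfolded loop against the plain shifting loop. The feedback function f is made up for the test; any function of four 32-bit words would do. The names step_shifting and step_unrolled are likewise hypothetical. The snippet above leaves one detail implicit: when entering at case 3, 2 or 1, the starting state has to be loaded into the variables in rotated order, so that the first statement executed sees the values oldest-first.

```cpp
#include <cassert>
#include <cstdint>

// Made-up feedback function standing in for f(a, b, c, d) above.
static std::uint32_t f(std::uint32_t a, std::uint32_t b,
                       std::uint32_t c, std::uint32_t d) {
    return (a ^ (b >> 1)) + c * 3u + d;
}

// Reference: shift the state through the variables on every step.
// Returns the newest value after n steps.
std::uint32_t step_shifting(std::uint32_t a, std::uint32_t b,
                            std::uint32_t c, std::uint32_t d, unsigned n) {
    while (n-- > 0) {
        const std::uint32_t e = f(a, b, c, d);
        a = b; b = c; c = d; d = e;
    }
    return d;
}

// Unfolded by 4 with rotating roles, entered through Duff's device.
// (s0, s1, s2, s3) is the starting state, oldest value first.
std::uint32_t step_unrolled(std::uint32_t s0, std::uint32_t s1,
                            std::uint32_t s2, std::uint32_t s3, unsigned n) {
    std::uint32_t a, b, c, d;
    unsigned rounds = n / 4;
    // Load the state rotated so the first statement executed after the jump
    // sees it oldest-first.
    switch (n % 4) {
        case 0:  a = s0; b = s1; c = s2; d = s3; break;
        case 3:  b = s0; c = s1; d = s2; a = s3; break;
        case 2:  c = s0; d = s1; a = s2; b = s3; break;
        default: d = s0; a = s1; b = s2; c = s3; break;  // n % 4 == 1
    }
    switch (n % 4) {
        do {
                    a = f(a, b, c, d);
            case 3: b = f(b, c, d, a);
            case 2: c = f(c, d, a, b);
            case 1: d = f(d, a, b, c);
            case 0: ;
        } while (rounds-- > 0);
    }
    return d;  // d always holds the newest value at this point
}
```

The first switch only sets up the rotation. The second switch is the unfolded loop, with the same case layout as the snippet above.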
Related
How do I implement multiple if-else statements or a C/C++ switch statement in assembly?
Something like this in C:
if ( number == 2 )
printf("TWO");
else if ( number == 3 )
printf("THREE");
else if ( number == 4 )
printf("FOUR");
Or using switch:
switch (i)
{
case 2:
printf("TWO"); break;
case 3:
printf("THREE"); break;
case 4:
printf("FOUR"); break;
}
Thanks.
The specifics depend on the architecture, but here's some pseudocode that does what you want.
... # your code
jmp SWITCH
OPTION1:
... # do option 1
jmp DONE
OPTION2:
... # do option 2
jmp DONE
OPTION3:
... # do option 3
jmp DONE
SWITCH:
if opt1:
jmp OPTION1
if opt2:
jmp OPTION2
if opt3:
jmp OPTION3
DONE:
... #continue your program
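The same layout can be written in C++ with goto. That makes it easy to compile and test the control flow before translating it to a real instruction set. The compared values 2, 3 and 4 and the strings are borrowed from the question's example. The function name dispatch is made up, and it returns a string instead of calling printf only so the sketch is testable.

```cpp
#include <cassert>
#include <string>

// The pseudo-assembly above, transcribed label for label into C++ goto.
std::string dispatch(int number) {
    std::string out;               // stays empty if no option matches
    goto SWITCH;                   // jmp SWITCH
OPTION1:
    out = "TWO";                   // do option 1
    goto DONE;                     // jmp DONE
OPTION2:
    out = "THREE";                 // do option 2
    goto DONE;
OPTION3:
    out = "FOUR";                  // do option 3
    goto DONE;
SWITCH:
    if (number == 2) goto OPTION1;
    if (number == 3) goto OPTION2;
    if (number == 4) goto OPTION3;
DONE:
    return out;                    // continue your program
}
```

If no option matches, control simply falls from the last test into DONE, just as in the pseudo-assembly.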
A detailed answer depends on the particular instruction set you are writing assembly for. Basically, you write assembly code that performs the same series of tests (the if statements) and branches as the C code.
In pseudo-assembly it might look like this:
load r1, number // load the value of number into register 1
cmpi r1, 2 // compare register 1 to the immediate value 2
bne test_for_3 // branch to label "test_for_3" if the compare results is not equal
call printf // I am ignoring the parameter passing here
... // but this is where the code goes to handle
... // the case where number == 2
branch the_end // branch to the label "the_end"
test_for_3: // labels the instruction location (program counter)
// such that branch instructions can reference it
cmpi r1, 3 // compare register 1 to immediate value 3
bne test_for_4 // branch if not equal to label "test_for_4"
... // perform printf "THREE"
branch the_end // branch to the label "the_end"
test_for_4: // labels the instruction location for above branch
cmpi r1, 4 // compare register 1 to immediate value 4
bne the_end // branch if not equal to label "the_end"
... // perform printf "FOUR"
the_end: // labels the instruction location following your 3 tests for the value of number
How do you write the if-else statement below in assembly language?
C Code:
if ( input < WaterLevel)
{
MC = 1;
}
else if ( input == WaterLevel)
{
MC = 0;
}
Pseudocode
If input < Water Level
Send 1 to microcontroller
Turn Motor On
Else if input == Water Level
Send 0 to microcontroller
Turn Motor Off
Incomplete assembly (MC = microcontroller):
CMP Input, WaterLevel
MOV word[MC], 1
MOV word[MC], 2
If we want to do something in C like:
if (ax < bx)
{
X = -1;
}
else
{
X = 1;
}
it would look in Assembly like this:
cmp ax, bx
jl Less
mov word [X], 1
jmp Both
Less:
mov word [X], -1
Both:
Not knowing the particular assembly language you are using, I'll write this out in pseudocode:
compare input to waterlevel
if less, jump to A
if equal, jump to B
jump to C
A:
send 1 to microcontroller
turn motor on
jump to C
B:
send 0 to microcontroller
turn motor off
C:
...
For the first three lines: most assembly languages have conditional branch instructions that test the zero or sign flag. They jump or not depending on whether the flag is set.
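On x86 specifically, CMP performs a subtraction, discards the result and keeps only the flags, and the conditional jumps read those flags. One detail is worth knowing: JE tests the zero flag, but the signed JL jumps when the sign flag differs from the overflow flag, not on the sign bit alone. The C++ sketch below models that for 16-bit operands; the names Flags, cmp16, je, jl and water_mc are made up for illustration. It then applies the model to the water-level logic from the question, with the third path leaving MC unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of x86 CMP on 16-bit operands: subtract b from a,
// throw the result away, keep only the flags the branches below need.
struct Flags {
    bool zf;  // zero flag: result was zero
    bool sf;  // sign flag: top bit of the result
    bool of;  // overflow flag: the signed subtraction overflowed
};

Flags cmp16(std::int16_t a, std::int16_t b) {
    const std::uint16_t ua = static_cast<std::uint16_t>(a);
    const std::uint16_t ub = static_cast<std::uint16_t>(b);
    const std::uint16_t r = static_cast<std::uint16_t>(ua - ub);  // wraps like the CPU
    Flags f;
    f.zf = (r == 0);
    f.sf = (r & 0x8000u) != 0;
    // Signed overflow on a - b: a and b have different signs and the result's
    // sign differs from a's sign.
    f.of = ((ua ^ ub) & (ua ^ r) & 0x8000u) != 0;
    return f;
}

bool je(Flags f) { return f.zf; }          // jump if equal
bool jl(Flags f) { return f.sf != f.of; }  // jump if less (signed)

// The question's logic: MC = 1 if input < waterLevel, MC = 0 if equal,
// otherwise MC is left as it was (the "jump to C" path).
int water_mc(std::int16_t input, std::int16_t waterLevel, int mc) {
    const Flags f = cmp16(input, waterLevel);
    if (jl(f)) return 1;  // A: send 1, motor on
    if (je(f)) return 0;  // B: send 0, motor off
    return mc;            // C: nothing to do
}
```

Comparing -32768 with 1 shows why the overflow flag matters. The subtraction wraps to 0x7FFF, so the sign bit is clear, yet JL still correctly reports "less".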
Assume we have the following code:
switch (currentChar) {
case 'G':
case 'T':
case 'M':
case ';':
case '\r':
case '\n':
doSomething();
break;
}
If the first condition is met (currentChar == 'G'), are the following cases also compared, or does the program jump straight to doSomething()?
What would be faster to execute: the switch-case, or an if with || operator?
Clarification:
I want doSomething to be executed if any of the conditions is met. I also know that the 'G' case will occur in some 99% of all cases. Can I assume that it will be compared first if I put it at the top of the list?
If the first condition is met (currentChar == 'G'), are the following cases also evaluated, or does the program jump straight to doSomething()?
It will immediately jump to execute doSomething()
What would be faster to execute: the switch-case, or an if with || operator?
I don't think it would make any difference with any decent modern C++ compiler; the emitted code should be much the same.
What would be faster to execute: the switch-case, or an if with || operator?
Go for switch(). If you have an enum or an integer with a small set of values, the compiler will often turn a switch into a jump table.
Once currentChar has matched 'G', execution jumps straight to doSomething(). You cannot rely on the order of your cases to "optimize" the switch.
Note that the comparisons are not necessarily sequential.
A switch may be implemented as a jump table, for example:
void bar0(); void bar1(); void bar2(); void bar3();  // defined elsewhere
void foo_switch(char c)
{
    switch (c) {
        case '0': bar0(); break;
        case '1': bar1(); break;
        case '2': bar2(); break;
        case '3': bar3(); break;
    }
}
void foo_if(char c)
{
    if (c == '0') {
        bar0();
    } else if (c == '1') {
        bar1();
    } else if (c == '2') {
        bar2();
    } else if (c == '3') {
        bar3();
    }
}
void foo_table(char c)
{
    if ('0' <= c && c <= '3') {
        using voidFPtr = void(*)();
        static const voidFPtr funcs[] = {&bar0, &bar1, &bar2, &bar3};  // one entry per case
        funcs[c - '0']();
    }
}
Questions about the performance outcome of a particular style of code are almost always a waste of time.
Here's how gcc 5.3 deals with this code after an optimisation pass:
test(char):
cmpb $59, %dil
je .L3
jle .L6
cmpb $77, %dil
je .L3
cmpb $84, %dil
je .L3
cmpb $71, %dil
je .L3
.L1:
rep ret
.L6:
cmpb $10, %dil
je .L3
cmpb $13, %dil
jne .L1
.L3:
jmp doSomething()
I really don't think you could write anything faster without creating a 256-entry jump table, which would have its own consequences in terms of cache locality and exhaustion.
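If you want to try the table idea anyway, a related variant is a 256-entry table of flags (rather than jump targets), built once and indexed by the character. This is a sketch, not a recommendation. The table costs 256 bytes of cache, the function-local static may add an initialisation check, and whether it beats the comparison chain above has to be measured. The names make_trigger_table and is_trigger are made up. The cast to unsigned char matters because plain char may be signed.

```cpp
#include <array>
#include <cassert>

// Build the table once: true for every character that should trigger doSomething().
static std::array<bool, 256> make_trigger_table() {
    std::array<bool, 256> table{};  // value-initialised: all false
    const char triggers[] = {'G', 'T', 'M', ';', '\r', '\n'};
    for (char c : triggers) {
        table[static_cast<unsigned char>(c)] = true;
    }
    return table;
}

// One table load instead of a chain of comparisons. Casting to unsigned char
// keeps the index in 0..255 even where plain char is signed.
bool is_trigger(char c) {
    static const std::array<bool, 256> table = make_trigger_table();
    return table[static_cast<unsigned char>(c)];
}
```

The switch from the question would then become if (is_trigger(currentChar)) doSomething();.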
If the first condition is met (currentChar == 'G'), are the following cases also evaluated, or does the program jump straight to doSomething()?
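Execution jumps to the matching label; the other labels are not compared. It then falls through the following statements until it finds a break or hits the end of the switch.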
What would be faster to execute: the switch-case, or an if with || operator?
You should worry about code readability and maintainability, so use whatever is more readable for you. Then, if you have an issue with program speed, work on optimization.
For readability (of course that's subjective), switch gives you less verbose code, as you do not have to repeat the variable name multiple times:
if( currentChar == 'G' || currentChar == 'B' || currentChar == 'C' )
so I would prefer switch in this situation.
switch (currentChar) {
case 'G':
case 'T':
case 'M':
case ';':
case '\r':
case '\n':
doSomething();
break;
}
This calls doSomething() if currentChar is G, T, M, ;, \r or \n. A switch can be faster than a plain if chain because switch statements are often compiled into jump tables. This is also why the case labels must be integral constant expressions.
There is no guarantee about the order in which a switch checks its cases. The || operator is specified to evaluate left to right with short-circuiting. However, if the operands have no side effects, the compiler is free to reorder the comparisons.
Basically, if the only observable difference is timing, C++ guarantees nothing about the order of evaluation, on the basis of the as-if rule.
If the first condition is met (currentChar == 'G'), are the following cases also evaluated, or does the program jump straight to doSomething()?
In your example, it will jump straight to doSomething(). If you don't want this behavior, you need to insert break statements, as shown for one case below:
switch (currentChar) {
case 'G': /*things to be done */ break /* This break will take it out of switch*/;
case 'T':
case 'M':
case ';':
case '\r':
case '\n':
doSomething();
break;
}
Also note that in your example the final break is not needed, as it is the last statement in the switch. Please refer to this link for a working example of a switch statement.
What would be faster to execute: the switch-case, or an if with || operator?
Assuming that you are using a decent compiler, the difference is small enough to be ignored. Please refer to this SO link if you need more specifics.
Edit for your clarification:
I want doSomething() to be executed if any of the conditions is met.
Yes, as per your code, doSomething() would be executed even if only one of the conditions is met.
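I also know that the 'G' case will occur in some 99% of all cases. Can I assume that it will be compared first if I put it at the top of the list?
Once 'G' matches, the remaining cases won't be checked. As other answers note, though, the compiler is not obliged to test 'G' first just because it is listed first.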
I have the following while-loop
uint32_t x = 0;
while(x*x < STOP_CONDITION) {
if(CHECK_CONDITION) x++;
// Do other stuff that modifies CHECK_CONDITION
}
The STOP_CONDITION is constant at run time, but not at compile time. Is there a more efficient way to maintain x*x, or do I really need to recompute it every time?
Note: According to the benchmark below, this code runs about 1-2% slower than the square-root option from Tamas Ionut's answer. Please read the disclaimer included at the bottom!
In addition to Tamas Ionut's answer, if you want to maintain STOP_CONDITION as the actual stop condition and avoid the square root calculation, you could update the square using the mathematical identity
(x + 1)² = x² + 2x + 1
whenever you change x:
uint32_t x = 0;
uint32_t xSquare = 0;
while(xSquare < STOP_CONDITION) {
if(CHECK_CONDITION) {
xSquare += 2 * x + 1;
x++;
}
// Do other stuff that modifies CHECK_CONDITION
}
Since the 2*x + 1 is just a bit shift and an increment, the compiler should be able to optimize this fairly well.
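Here is a small C++ sketch of the same loop that can run on its own. CHECK_CONDITION is replaced by a made-up rule (x may advance on every other iteration), and the function name run_incremental is hypothetical. The assert checks the invariant x_square == x*x after each update. As an extra safety margin that the original loop does not have, x_square is kept in 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Runs the question's loop with x*x maintained incrementally via
// (x + 1)^2 = x^2 + 2x + 1. CHECK_CONDITION is replaced by a made-up rule
// (x may advance on every other iteration); `iterations` reports how many
// times the loop body ran. Returns the final x.
std::uint32_t run_incremental(std::uint32_t stop, std::uint32_t& iterations) {
    std::uint32_t x = 0;
    std::uint64_t x_square = 0;  // 64-bit so the running square cannot wrap
    iterations = 0;
    while (x_square < stop) {
        const bool check_condition = (iterations % 2 == 0);  // hypothetical
        if (check_condition) {
            x_square += 2ull * x + 1;  // now equals (x + 1)^2
            ++x;
            assert(x_square == static_cast<std::uint64_t>(x) * x);
        }
        ++iterations;  // stands in for "other stuff that modifies CHECK_CONDITION"
    }
    return x;
}
```

With STOP_CONDITION = 1000000, the loop ends with x = 1000, the first value whose square is no longer below the limit.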
Disclaimer: Since you asked "how can I optimize this code", I answered with one particular way to possibly make it faster. Whether the doubling plus increment is actually faster than a single integer multiplication should be tested in practice. Whether you should optimize the code is a different question. I assume you have already benchmarked the loop and found it to be a bottleneck, or that you have a theoretical interest in the question. If you are writing production code that you wish to optimize, first measure the performance and then optimize where needed (which is probably not the x*x in this loop).
What about:
uint32_t x = 0;
double bound= sqrt(STOP_CONDITION);
while(x < bound) {
if(CHECK_CONDITION) x++;
// Do other stuff that modifies CHECK_CONDITION
}
This way, you get rid of the extra multiplication on every iteration.
I ran a small benchmark of Tamas Ionut's and CompuChip's answers; here are the results:
Tamas Ionut: 19.7068
The code of this method:
uint32_t x = 0;
double bound= sqrt(STOP_CONDITION);
while(x < bound) {
if(CHECK_CONDITION) x++;
// Do other stuff that modifies CHECK_CONDITION
}
CompuChip: 20.2056
The code of this method:
uint32_t x = 0;
uint32_t xSquare = 0;
while(xSquare < STOP_CONDITION) {
if(CHECK_CONDITION) {
xSquare += 2 * x + 1;
x++;
}
// Do other stuff that modifies CHECK_CONDITION
}
with STOP_CONDITION = 1000000 and repeating the process 1000000 times
Environment:
Compiler : MSVC 2013
OS : Windows 8.1 - X64
Processor: Core i7-4510U @ 2.00 GHz
Release Mode - Maximize Speed (/O2)
I would say that optimizing for readability is better than optimizing for performance in your case, since we are talking about a very small performance gain.
The compiler can optimize a lot for you regarding performance, but readability is the programmer's responsibility.
I believe Tamas Ionut's solution is better than CompuChip's because there is only an x++ inside the loop. However, comparing a uint32_t with a double means converting x to double on every iteration, which can eat up the benefit. It would be more efficient to use a uint32_t bound instead of a double. This approach also has fewer problems with numerical overflow: x must stay below 2^16 = 65536 anyway if x^2 is to fit in a uint32_t.
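Here is one way to get such an integer bound, sketched in C++ (the helper name ceil_sqrt_bound is made up). Compute, once, the smallest b with b*b >= STOP_CONDITION. For every x, x < b holds exactly when x*x < STOP_CONDITION, so the loop compares integers only and avoids any floating-point rounding in sqrt. A binary search keeps the result exact, and 64-bit intermediates keep mid*mid from overflowing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest b with b*b >= stop, found by binary search.
// For any x, (x < b) holds exactly when (x*x < stop), so the loop can test
// `x < b` with integers only. 64-bit intermediates keep mid*mid exact.
std::uint32_t ceil_sqrt_bound(std::uint32_t stop) {
    std::uint64_t lo = 0;
    std::uint64_t hi = 65536;  // 65536 * 65536 = 2^32, larger than any uint32_t
    while (lo < hi) {
        const std::uint64_t mid = lo + (hi - lo) / 2;
        if (mid * mid >= stop) {
            hi = mid;        // mid works; look for a smaller one
        } else {
            lo = mid + 1;    // mid*mid is still below stop
        }
    }
    assert(lo * lo >= stop);
    return static_cast<std::uint32_t>(lo);
}
```

The loop then becomes while (x < bound), with const uint32_t bound = ceil_sqrt_bound(STOP_CONDITION); computed once before it.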
If we also do heavy work in the loop, the results of both approaches should be very similar. However, Tamas Ionut's approach is simpler and easier to read.
Below is my code and the corresponding assembly code, obtained using clang 3.8.0 with the -O3 flag. The assembly suggests that the first approach is more efficient.
using T = size_t;
void test1(const T stopCondition, bool checkCondition) {
T x = 0;
while (x < stopCondition) {
if (checkCondition) {
x++;
}
// Do something heavy here
}
}
void test2(const T stopCondition, bool checkCondition) {
T x = 0;
T xSquare = 0;
const T threshold = stopCondition * stopCondition;
while (xSquare < threshold) {
if (checkCondition) {
xSquare += 2 * x + 1;
x++;
}
// Do something heavy here
}
}
(gdb) disassemble test1
Dump of assembler code for function _Z5test1mb:
0x0000000000400be0 <+0>: movzbl %sil,%eax
0x0000000000400be4 <+4>: mov %rax,%rcx
0x0000000000400be7 <+7>: neg %rcx
0x0000000000400bea <+10>: nopw 0x0(%rax,%rax,1)
0x0000000000400bf0 <+16>: add %rax,%rcx
0x0000000000400bf3 <+19>: cmp %rdi,%rcx
0x0000000000400bf6 <+22>: jb 0x400bf0 <_Z5test1mb+16>
0x0000000000400bf8 <+24>: retq
End of assembler dump.
(gdb) disassemble test2
Dump of assembler code for function _Z5test2mb:
0x0000000000400c00 <+0>: imul %rdi,%rdi
0x0000000000400c04 <+4>: test %sil,%sil
0x0000000000400c07 <+7>: je 0x400c2e <_Z5test2mb+46>
0x0000000000400c09 <+9>: xor %eax,%eax
0x0000000000400c0b <+11>: mov $0x1,%ecx
0x0000000000400c10 <+16>: test %rdi,%rdi
0x0000000000400c13 <+19>: je 0x400c42 <_Z5test2mb+66>
0x0000000000400c15 <+21>: data32 nopw %cs:0x0(%rax,%rax,1)
0x0000000000400c20 <+32>: add %rcx,%rax
0x0000000000400c23 <+35>: add $0x2,%rcx
0x0000000000400c27 <+39>: cmp %rdi,%rax
0x0000000000400c2a <+42>: jb 0x400c20 <_Z5test2mb+32>
0x0000000000400c2c <+44>: jmp 0x400c42 <_Z5test2mb+66>
0x0000000000400c2e <+46>: test %rdi,%rdi
0x0000000000400c31 <+49>: je 0x400c42 <_Z5test2mb+66>
0x0000000000400c33 <+51>: data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
0x0000000000400c40 <+64>: jmp 0x400c40 <_Z5test2mb+64>
0x0000000000400c42 <+66>: retq
End of assembler dump.
This is the Fibonacci code:
unsigned int fib(unsigned int n)
{
if (n==1 || n ==2)
return 1;
else
return fib(n-2) + fib(n-1);
}
but for my code I have to change the formula to a new one:
f(n) = f(n-2)/2 + f(n-1) * 2 (with integer division), so the sequence is 1, 2, 4, 9, 20, 44, 98, 218
I need to write a recursive function called Mobonacci in assembly to calculate the nth number in the sequence. I also need a main function in C++ that reads a positive number n, calls the Mobonacci assembly function with parameter n, and then prints the result.
So I'm kind of confused: do I write the function in assembly like I did below and then write a C++ function to call it? And how would you change my code from Fibonacci to the new formula? Here is my code. What do I need to change, and do I need to create a new part that lets the code read input? Also, is my code too short? Do I need to add anything else?
.code
main PROC
mov ecx,0
push 4 ; calculate the nth fib
call Fib ; calculate fib (eax)
call WriteDec
call Crlf
exit
main ENDP
Fib PROC
add ecx,1
push ebp
mov ebp,esp
mov eax,[ebp+8] ; get n
cmp eax,2 ; n == 2?
je exception2
cmp eax,1 ; n == 1?
je exception2
dec eax
push eax ; Fib(n-1)
call fib
add eax,
jmp Quit
Exception2:
dec eax
Quit:
pop ebp ; return EAX
ret 4 ; clean up stack
Fib ENDP
END main
It depends on where you are trying to insert asm code into your C++ code.
For GCC on Linux, you can do something like:
//Simple example:
void *frame; /* Frame pointer */
__asm__ ("mov %%ebp,%0":"=r"(frame));
//Complicated example:
int foo(void) {
int joe=1234, fred;
__asm__(
" mov %1,%%eax\n"
" add $2,%%eax\n"
" mov %%eax,%0\n"
:"=r" (fred) /* %0: Out */
:"r" (joe) /* %1: In */
:"%eax" /* Overwrite */
);
return fred;
}
The important thing is to understand how to use your asm function in cpp.
You can find some useful things about this subject here : https://www.cs.uaf.edu/2011/fall/cs301/lecture/10_12_asm_c.html
About the second part of your question.
To multiply, you can use the "mul" instruction, and to divide, "div".
So if you want to compute f(n-1) * 2,
you have to take the value in %eax after the "call fib" and use mul on it.
Just have a look here:
http://www.tutorialspoint.com/assembly_programming/assembly_arithmetic_instructions.htm
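Before writing the assembly, it may help to have a plain C++ reference of the new recurrence to compare the assembly results against. The sketch below assumes what the sequence 1, 2, 4, 9, 20, ... implies: f(1) = 1, f(2) = 2, and /2 is integer division. The function name mobonacci is taken from the question. For unsigned values, /2 and *2 are also just a logical shift right and left by one (shr/shl), which may be simpler than div/mul in the assembly.

```cpp
#include <cassert>

// Reference version of the question's recurrence, for checking the assembly:
// f(1) = 1, f(2) = 2, f(n) = f(n-2)/2 + f(n-1)*2, with integer division.
// Deliberately recursive, like the required assembly function.
unsigned mobonacci(unsigned n) {
    assert(n >= 1);  // the sequence is defined from n = 1
    if (n == 1) return 1;
    if (n == 2) return 2;
    return mobonacci(n - 2) / 2 + mobonacci(n - 1) * 2;
}
```

For example, f(3) = 1/2 + 2*2 = 0 + 4 = 4, which matches the question's sequence only because 1/2 truncates to 0.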