Is there any difference between the following two code snippets? Which one is better to use? Is one of them faster?
case 1:
int f(int x)
{
int a;
if(x)
a = 42;
else
a = 0;
return a;
}
case 2:
int f(int x)
{
int a;
if(x)
a = 42;
return a;
}
Actually, those two snippets can return totally different results, so there is no "better"...
In case 2 you can return an uninitialized variable a, which may result in a garbage value other than zero...
if you mean this:
int f(int x)
{
int a = 0;
if(x)
a = 42;
return a;
}
then I would say that one is better, since it is more compact (but you are only saving an else, so not much computation is wasted anyway)
The question is not "which one is better". The question is "will both work?"
And the answer is no, they will not both work. One is correct, the other is out of the question. So, performance is not even an issue.
The following results in a having either an "indeterminate value" or an "unspecified value" as defined in the C99 standard, sections 3.17.2 and 3.17.3 (probably the latter, though it is not clear to me):
int a;
if(x)
a = 42;
return a;
This in turn means that the function will return an unspecified value. This means that there are absolutely no guarantees as to what value you will get.
If you are unlucky, you might get zero, and thus proceed to use the above terrible piece of code without knowing that you are bound to have lots of trouble with it later.
If you are lucky, you will get something like 0x719Ab32d right away, so you will immediately know that you messed up.
Any decent C compiler will give you a warning if you try to compile this, so the fact that you are asking this question means that you do not have a sufficient number of warnings enabled. Do not try to write C code (or any code) without the maximum possible number of warnings enabled; it never leads to any good. Find out how to enable warnings on your C compiler, and enable as many of them as you can.
Note: I assume the uninitialized a in your second snippet is a typo and it is meant to be int a = 0.
We can use gdb to check the difference:
(gdb) list f1
19 {
20 int a;
21 if (x)
22 a = 42;
23 else
24 a = 0;
25 return a;
26 }
(gdb) list f2
28 int f2(int x)
29 {
30 int a = 0;
31 if (x)
32 a = 42;
33 return a;
34 }
Now let's look at the assembler code with -O3:
(gdb) disassemble f1
Dump of assembler code for function f1:
0x00000000004007a0 <+0>: cmp $0x1,%edi
0x00000000004007a3 <+3>: sbb %eax,%eax
0x00000000004007a5 <+5>: not %eax
0x00000000004007a7 <+7>: and $0x2a,%eax
0x00000000004007aa <+10>: retq
End of assembler dump.
(gdb) disassemble f2
Dump of assembler code for function f2:
0x00000000004007b0 <+0>: cmp $0x1,%edi
0x00000000004007b3 <+3>: sbb %eax,%eax
0x00000000004007b5 <+5>: not %eax
0x00000000004007b7 <+7>: and $0x2a,%eax
0x00000000004007ba <+10>: retq
End of assembler dump.
As you can see, there is no difference. Let us disable the optimizations with -O0:
(gdb) disassemble f1
Dump of assembler code for function f1:
0x00000000004006cd <+0>: push %rbp
0x00000000004006ce <+1>: mov %rsp,%rbp
0x00000000004006d1 <+4>: mov %edi,-0x14(%rbp)
0x00000000004006d4 <+7>: cmpl $0x0,-0x14(%rbp)
0x00000000004006d8 <+11>: je 0x4006e3 <f1+22>
0x00000000004006da <+13>: movl $0x2a,-0x4(%rbp)
0x00000000004006e1 <+20>: jmp 0x4006ea <f1+29>
0x00000000004006e3 <+22>: movl $0x0,-0x4(%rbp)
0x00000000004006ea <+29>: mov -0x4(%rbp),%eax
0x00000000004006ed <+32>: pop %rbp
0x00000000004006ee <+33>: retq
End of assembler dump.
(gdb) disassemble f2
Dump of assembler code for function f2:
0x00000000004006ef <+0>: push %rbp
0x00000000004006f0 <+1>: mov %rsp,%rbp
0x00000000004006f3 <+4>: mov %edi,-0x14(%rbp)
0x00000000004006f6 <+7>: movl $0x0,-0x4(%rbp)
0x00000000004006fd <+14>: cmpl $0x0,-0x14(%rbp)
0x0000000000400701 <+18>: je 0x40070a <f2+27>
0x0000000000400703 <+20>: movl $0x2a,-0x4(%rbp)
0x000000000040070a <+27>: mov -0x4(%rbp),%eax
0x000000000040070d <+30>: pop %rbp
0x000000000040070e <+31>: retq
End of assembler dump.
Now there is a difference, and for random arguments x the first version will on average be faster, as it has one mov less than the second one.
In case your second code is
int f(int x)
{
int a=0;
if(x)
a = 42;
return a;
}
and not
int f(int x)
{
int a;
if(x)
a = 42;
return a;
}
it doesn't matter. The compiler will convert them to the same optimized code.
I would prefer this (your second snippet):
int f(int x) {
int a = 0;
if (x) {
a = 42;
}
return a;
}
Everything should always have braces. Even if I only have one line in the if block now, I may add more later.
I don't put the braces on their own lines because it's pointless waste of space.
I rarely put the block on the same line as the conditional for readability.
You don't need extra space for a in either case - you can do something like this -
int f(int x)
{
if(x)
return 42;
else
return 0;
}
BTW in your second function you have not initialised a.
There are some existing questions about GCC ordering of variables on the stack. However, those usually involve intermixed variables and arrays, and this is not that. I'm working with the GCC 9.2.0 64-bit release, with no special flags on. If I do this:
#include <iostream>
int main() {
int a = 15, b = 30, c = 45, d = 60;
// std::cout << &a << std::endl;
return 0;
}
Then the memory layout is seen as in the disassembly here:
0x000000000040156d <+13>: mov DWORD PTR [rbp-0x4],0xf
0x0000000000401574 <+20>: mov DWORD PTR [rbp-0x8],0x1e
0x000000000040157b <+27>: mov DWORD PTR [rbp-0xc],0x2d
0x0000000000401582 <+34>: mov DWORD PTR [rbp-0x10],0x3c
So: The four variables are in order at offsets 0x04, 0x08, 0x0C, 0x10 from the RBP; that is, sequenced in the same order they were declared. This is consistent and deterministic; I can re-compile, add other lines of code (random printing statements, other later variables, etc.) and the layout remains the same.
However, as soon as I include a line that touches an address or pointer, then the layout changes. For example, this:
#include <iostream>
int main() {
int a = 15, b = 30, c = 45, d = 60;
std::cout << &a << std::endl;
return 0;
}
Produces this:
0x000000000040156d <+13>: mov DWORD PTR [rbp-0x10],0xf
0x0000000000401574 <+20>: mov DWORD PTR [rbp-0x4],0x1e
0x000000000040157b <+27>: mov DWORD PTR [rbp-0x8],0x2d
0x0000000000401582 <+34>: mov DWORD PTR [rbp-0xc],0x3c
So: A scrambled-up layout with the variables at offsets now respectively at 0x10, 0x04, 0x08, 0x0C. Again, this is consistent with any re-compiles, most random code I think to add, etc.
However, if I just touch a different address like so:
#include <iostream>
int main() {
int a = 15, b = 30, c = 45, d = 60;
std::cout << &b << std::endl;
return 0;
}
Then the variables get ordered like this:
0x000000000040156d <+13>: mov DWORD PTR [rbp-0x4],0xf
0x0000000000401574 <+20>: mov DWORD PTR [rbp-0x10],0x1e
0x000000000040157b <+27>: mov DWORD PTR [rbp-0x8],0x2d
0x0000000000401582 <+34>: mov DWORD PTR [rbp-0xc],0x3c
That is, a different sequence at offsets 0x04, 0x10, 0x08, 0x0C. Once again, this is consistent as far as I can tell with recompilations and code changes, excepting if I refer to some other address in the code.
If I didn't know any better, it would seem like the integer variables are placed in declaration order, unless the code does any manipulation with addressing, at which point it starts scrambling them up in some deterministic way.
Some responses that will not satisfy this question are as follows:
"The behavior is undefined in the C++ standard" -- I'm not asking about the C++ standard, I'm asking specifically about how this GCC compiler makes its decision on layout.
"The compiler can do whatever it wants" -- Does not answer how the compiler decides on what it "wants" in this specific, consistent case.
Why does the GCC compiler layout integer variables in this way?
What explains the consistent re-ordering seen here?
Edit: On closer inspection, the variable whose address I take is always placed at [rbp-0x10], and then the other ones are put in declaration order after that. Why would that be beneficial? Note that printing the values of any of these variables doesn't seem to trigger the same re-ordering, from what I can tell.
You should compile your daniel.cc C++ code with g++ -O -fverbose-asm -S daniel.cc -o daniel.s and look into the generated assembler file daniel.s.
For your first example, a lot of constants and call-frame slots have disappeared, since the code is optimized:
.text
.globl main
.type main, @function
main:
.LFB1644:
.cfi_startproc
endbr64
subq $24, %rsp #,
.cfi_def_cfa_offset 32
# daniel.cc:2: int main() {
movq %fs:40, %rax # MEM[(<address-space-1> long unsigned int *)40B], tmp89
movq %rax, 8(%rsp) # tmp89, D.41631
xorl %eax, %eax # tmp89
# daniel.cc:3: int a = 15, b = 30, c = 45, d = 60;
movl $15, 4(%rsp) #, a
# /usr/include/c++/10/ostream:246: { return _M_insert(__p); }
leaq 4(%rsp), %rsi #, tmp85
leaq _ZSt4cout(%rip), %rdi #,
call _ZNSo9_M_insertIPKvEERSoT_@PLT #
movq %rax, %rdi # tmp88, _4
# /usr/include/c++/10/ostream:113: return __pf(*this);
call _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_@PLT #
# daniel.cc:6: }
movq 8(%rsp), %rax # D.41631, tmp90
subq %fs:40, %rax # MEM[(<address-space-1> long unsigned int *)40B], tmp90
jne .L4 #,
movl $0, %eax #,
addq $24, %rsp #,
.cfi_remember_state
.cfi_def_cfa_offset 8
ret
.L4:
.cfi_restore_state
call __stack_chk_fail@PLT #
.cfi_endproc
.LFE1644:
.size main, .-main
.type _GLOBAL__sub_I_main, @function
If for whatever reason you really require your call frame to contain slots in a known order, you need to use a struct as an automatic variable (and that approach is portable to other C++ compilers).
If you need to understand why GCC has compiled your code the way it did, download the source code of GCC, read the documentation of GCC internals, study it (it is free software).
You should be interested in the GCC developer options; they dump a lot of information about the internal state of the compiler.
Once you understand a bit of what GCC actually does, subscribe to some GCC mailing list (e.g. gcc@gcc.gnu.org) and ask questions there. Alternatively, code your own GCC plugin to improve its behavior, change the organization of the call frame, or add dumping routines.
If you need to understand or improve GCC, budget several months of full-time work, and read the Dragon book first.
They say the tail recursion optimization works only when the call is the last thing before the return from the function. So they show this code as an example of what shouldn't be optimized by C compilers:
long long f(long long n) {
return n > 0 ? f(n - 1) * n : 1;
}
because the recursive call is multiplied by n, which means the last operation is the multiplication, not the recursive call. However, it is optimized into a loop even at the -O1 level:
recursion`f:
0x100000930 <+0>: pushq %rbp
0x100000931 <+1>: movq %rsp, %rbp
0x100000934 <+4>: movl $0x1, %eax
0x100000939 <+9>: testq %rdi, %rdi
0x10000093c <+12>: jle 0x10000094e
0x10000093e <+14>: nop
0x100000940 <+16>: imulq %rdi, %rax
0x100000944 <+20>: cmpq $0x1, %rdi
0x100000948 <+24>: leaq -0x1(%rdi), %rdi
0x10000094c <+28>: jg 0x100000940
0x10000094e <+30>: popq %rbp
0x10000094f <+31>: retq
They say that:
Your final rules are therefore sufficiently correct. However, return n * fact(n - 1) does have an operation in the tail position! This is the multiplication *, which will be the last thing the function does before it returns. In some languages, this might actually be implemented as a function call which could then be tail-call optimized.
However, as we can see from the ASM listing, the multiplication is still a single ASM instruction, not a separate function call. So I really struggle to see the difference from the accumulator approach:
int fac_times (int n, int acc) {
return (n == 0) ? acc : fac_times(n - 1, acc * n);
}
int factorial (int n) {
return fac_times(n, 1);
}
This produces
recursion`fac_times:
0x1000008e0 <+0>: pushq %rbp
0x1000008e1 <+1>: movq %rsp, %rbp
0x1000008e4 <+4>: testl %edi, %edi
0x1000008e6 <+6>: je 0x1000008f7
0x1000008e8 <+8>: nopl (%rax,%rax)
0x1000008f0 <+16>: imull %edi, %esi
0x1000008f3 <+19>: decl %edi
0x1000008f5 <+21>: jne 0x1000008f0
0x1000008f7 <+23>: movl %esi, %eax
0x1000008f9 <+25>: popq %rbp
0x1000008fa <+26>: retq
Am I missing something? Or is it just that compilers have become smarter?
As you see in the assembly code, the compiler is smart enough to turn your code into a loop that is basically equivalent to (disregarding the different data types):
int fac(int n)
{
int result = n;
while (--n)
result *= n;
return result;
}
GCC is smart enough to know that the state needed by each call to your original f can be kept in two variables (n and result) through the whole recursive call sequence, so that no stack is necessary. It can transform f to fac_times, and both to fac, so to say. This is most likely not only a result of tail call optimization in the strictest sense, but one of the loads of other heuristics that GCC uses for optimization.
(I can't go more into detail regarding the specific heuristics that are used here since I don't know enough about them.)
The non-accumulator f isn't tail-recursive. The compiler's options include turning it into a loop by transforming it, or call / some insns / ret, but they don't include jmp f without other transformations.
tail-call optimization applies in cases like this:
int ext(int a);
int foo(int x) { return ext(x); }
asm output from godbolt:
foo: # #foo
jmp ext # TAILCALL
Tail-call optimization means leaving a function (or recursing) with a jmp instead of a ret. Anything else is not tailcall optimization. Tail-recursion that's optimized with a jmp really is a loop, though.
A good compiler will do further transformations to put the conditional branch at the bottom of the loop when possible, removing the unconditional branch. (In asm, the do{}while() style of looping is the most natural).
I was experimenting with GCC, trying to convince it to assume that certain portions of code are unreachable so that it could take the opportunity to optimize. One of my experiments gave me somewhat strange code. Here's the source:
#include <iostream>
#define UNREACHABLE {char* null=0; *null=0; return {};}
double test(double x)
{
if(x==-1) return -1;
else if(x==1) return 1;
UNREACHABLE;
}
int main()
{
std::cout << "Enter a number. Only +/- 1 is supported, otherwise I dunno what'll happen: ";
double x;
std::cin >> x;
std::cout << "Here's what I got: " << test(x) << "\n";
}
Here's how I compiled it:
g++ -std=c++11 test.cpp -O3 -march=native -S -masm=intel -Wall -Wextra
And the code of test function looks like this:
_Z4testd:
.LFB1397:
.cfi_startproc
fld QWORD PTR [esp+4]
fld1
fchs
fld st(0)
fxch st(2)
fucomi st, st(2)
fstp st(2)
jp .L10
je .L11
fstp st(0)
jmp .L7
.L10:
fstp st(0)
.p2align 4,,10
.p2align 3
.L7:
fld1
fld st(0)
fxch st(2)
fucomip st, st(2)
fstp st(1)
jp .L12
je .L6
fstp st(0)
jmp .L8
.L12:
fstp st(0)
.p2align 4,,10
.p2align 3
.L8:
mov BYTE PTR ds:0, 0
ud2 // This is redundant, isn't it?..
.p2align 4,,10
.p2align 3
.L11:
fstp st(1)
.L6:
rep; ret
What makes me wonder here is the code at .L8. Namely, it already writes to the zero address, which guarantees a segmentation fault unless ds has some non-default selector. So why the additional ud2? Isn't writing to the zero address already a guaranteed crash? Or does GCC not trust that ds has the default selector and try to make a sure-fire crash?
So, your code is writing to address zero (NULL), which in itself is undefined behaviour. Undefined behaviour covers anything - including, importantly for this case, "doing what you may imagine it would do" (in other words, writing to address zero rather than crashing). The compiler then decides to TELL you that by adding a UD2 instruction. It's also possible that it is there to protect against continuing from a signal handler into further undefined behaviour.
Yes, most machines, under most circumstances, will crash on a NULL access. But it's not 100% guaranteed, and as I said above, one can catch the segfault in a signal handler and then try to continue - it's really not a good idea to actually continue after trying to write to NULL, so the compiler adds UD2 to ensure you don't go on... It uses 2 more bytes of memory; beyond that I don't see what harm it does [after all, what happens is undefined - if the compiler wished to do so, it could email random pictures from your filesystem to the Queen of England... I think UD2 is a better choice...]
It is interesting to spot that LLVM does this by itself - I have no special detection of NIL pointer accesses, but my Pascal compiler compiles this:
program p;
var
ptr : ^integer;
begin
ptr := NIL;
ptr^ := 42;
end.
into:
0000000000400880 <__PascalMain>:
400880: 55 push %rbp
400881: 48 89 e5 mov %rsp,%rbp
400884: 48 c7 05 19 18 20 00 movq $0x0,0x201819(%rip) # 6020a8 <ptr>
40088b: 00 00 00 00
40088f: 0f 0b ud2
I'm still trying to figure out where in LLVM this happens and try to understand the purpose of the UD2 instruction itself.
I think the answer is here, in llvm/lib/Transforms/Utils/Local.cpp
void llvm::changeToUnreachable(Instruction *I, bool UseLLVMTrap) {
BasicBlock *BB = I->getParent();
// Loop over all of the successors, removing BB's entry from any PHI
// nodes.
for (succ_iterator SI = succ_begin(BB), SE = succ_end(BB); SI != SE; ++SI)
(*SI)->removePredecessor(BB);
// Insert a call to llvm.trap right before this. This turns the undefined
// behavior into a hard fail instead of falling through into random code.
if (UseLLVMTrap) {
Function *TrapFn =
Intrinsic::getDeclaration(BB->getParent()->getParent(), Intrinsic::trap);
CallInst *CallTrap = CallInst::Create(TrapFn, "", I);
CallTrap->setDebugLoc(I->getDebugLoc());
}
new UnreachableInst(I->getContext(), I);
// All instructions after this are dead.
BasicBlock::iterator BBI = I->getIterator(), BBE = BB->end();
while (BBI != BBE) {
if (!BBI->use_empty())
BBI->replaceAllUsesWith(UndefValue::get(BBI->getType()));
BB->getInstList().erase(BBI++);
}
}
In particular, note the comment in the middle, where it says "instead of falling through into random code". In your code there is no code following the NULL access, but imagine this:
void func()
{
if (answer == 42)
{
#if DEBUG
// Intentionally crash to avoid formatting hard disk for now
char *ptr = NULL;
*ptr = 0;
#endif
// Format hard disk.
... some code to format hard disk ...
}
printf("We haven't found the answer yet\n");
...
}
So, this SHOULD crash, but if it doesn't the compiler will ensure that you do not continue after it... It makes UB crashes a little more obvious (and in this case prevents the hard disk from being formatted...)
I was trying to find out when this was introduced, but the function itself originates in 2007, but it's not used for exactly this purpose at the time, which makes it really hard to figure out why it is used this way.
So recently I was thinking about strcpy, and went back to K&R, where they show the implementation as
while (*dst++ = *src++) ;
However I mistakenly transcribed it as:
while (*dst = *src)
{
src++; //technically could be ++src on these lines
dst++;
}
In any case, that got me thinking about whether the compiler would actually produce different code for these two. My initial thought was that they should be near identical: since src and dst are incremented but never used afterwards, I figured the compiler would know not to try to actually preserve them as "variables" in the produced machine code.
Using Windows 7 with VS 2010 C++ SP1, building in 32-bit Release mode (/O2), I got the disassembly for both of the above incarnations. To prevent the function from referencing the input directly and being inlined, I made a DLL with each of the functions. I have omitted the prologue and epilogue of the produced ASM.
while (*dst++ = *src++)
6EBB1003 8B 55 08 mov edx,dword ptr [src]
6EBB1006 8B 45 0C mov eax,dword ptr [dst]
6EBB1009 2B D0 sub edx,eax //prepare edx so that edx + eax always points to src
6EBB100B EB 03 jmp docopy+10h (6EBB1010h)
6EBB100D 8D 49 00 lea ecx,[ecx] //looks like align padding, never hit this line
6EBB1010 8A 0C 02 mov cl,byte ptr [edx+eax] //ptr [edx+ eax] points to char in src :loop begin
6EBB1013 88 08 mov byte ptr [eax],cl //copy char to dst
6EBB1015 40 inc eax //inc dst ptr (src is read via edx+eax, so this advances the src index too)
6EBB1016 84 C9 test cl,cl // check for 0 (null terminator)
6EBB1018 75 F6 jne docopy+10h (6EBB1010h) //if not goto :loop begin
;
Above I have annotated the code: essentially a single loop, with only one check for null and one memory copy.
Now lets look at my mistake version:
while (*dst = *src)
6EBB1003 8B 55 08 mov edx,dword ptr [src]
6EBB1006 8A 0A mov cl,byte ptr [edx]
6EBB1008 8B 45 0C mov eax,dword ptr [dst]
6EBB100B 88 08 mov byte ptr [eax],cl //copy 0th char to dst
6EBB100D 84 C9 test cl,cl //check for 0
6EBB100F 74 0D je docopy+1Eh (6EBB101Eh) // return if we encounter null terminator
6EBB1011 2B D0 sub edx,eax
6EBB1013 8A 4C 02 01 mov cl,byte ptr [edx+eax+1] //get +1th char :loop begin
{
src++;
dst++;
6EBB1017 40 inc eax
6EBB1018 88 08 mov byte ptr [eax],cl //copy above char to dst
6EBB101A 84 C9 test cl,cl //check for 0
6EBB101C 75 F5 jne docopy+13h (6EBB1013h) // if not goto :loop begin
}
In my version, I see that it first copies the 0th char to the destination, then checks for null, and then finally enters the loop, where it checks for null again. So the loop remains largely the same, but now it handles the 0th character before the loop. This, of course, is going to be sub-optimal compared with the first case.
I am wondering if anyone knows why the compiler is prevented from producing the same (or nearly the same) code as in the first example. Is this an MS-compiler-specific issue, or possibly an issue with my compiler/linker settings?
Here is the full code, in 2 files (one function replaces the other).
// in first dll project
__declspec(dllexport) void docopy(const char* src, char* dst)
{
while (*dst++ = *src++);
}
__declspec(dllexport) void docopy(const char* src, char* dst)
{
while (*dst = *src)
{
++src;
++dst;
}
}
// separate main.cpp file calls docopy
void docopy(const char* src, char* dst);
char* source ="source";
char destination[100];
int main()
{
docopy(source, destination);
}
Because in the first example the post-increment always happens, even if src starts out pointing to a null character. In the same starting situation, the second example would not increment the pointers.
Of course the compiler has other options. The "copy first byte then enter the loop if not 0" is what gcc-4.5.1 produces with -O1. With -O2 and -O3, it produces
.LFB0:
.cfi_startproc
jmp .L6 // jump to copy
.p2align 4,,10
.p2align 3
.L4:
addq $1, %rdi // increment pointers
addq $1, %rsi
.L6: // copy
movzbl (%rdi), %eax // get source byte
testb %al, %al // check for 0
movb %al, (%rsi) // move to dest
jne .L4 // loop if nonzero
rep
ret
.cfi_endproc
which is quite similar to what it produces for the K&R loop. Whether that's actually better I can't say, but it looks nicer.
Apart from the jump into the loop, the instructions for the K&R loop are exactly the same, just ordered differently:
.LFB0:
.cfi_startproc
.p2align 4,,10
.p2align 3
.L2:
movzbl (%rdi), %eax // get source byte
addq $1, %rdi // increment source pointer
movb %al, (%rsi) // move byte to dest
addq $1, %rsi // increment dest pointer
testb %al, %al // check for 0
jne .L2 // loop if nonzero
rep
ret
.cfi_endproc
Your second code doesn't "check for null again". In your second version the cycle body works with the characters at address edx+eax+1 (note the +1 part), which are characters number 1, 2, 3 and so on. The prologue code works with character number 0. That means the code never checks the same character twice, as you seem to believe. There's no "again" there.
The second code is a bit more convoluted (the first iteration of the cycle is effectively pulled out of it) since, as has already been explained, its functionality is different. The final values of the pointers differ between your first and your second version.
I was always interested in assembler, but so far I never had a real chance to tackle it properly. Now that I have some time, I have begun coding some small programs using assembler in C++, but only small ones, i.e. define x, store it somewhere, and so on. I wanted to implement a for loop in assembler, but I couldn't make it work, so I would like to ask if anyone here has done it; it would be nice to share. An example of such a function would be
for(i=0;i<10;i++) { std::cout<< "A"; }
Does anyone have an idea how to implement this in assembler?
edit2: ISA x86
Here's the unoptimized output[1] of GCC for this code:
void some_function(void);
int main()
{
for (int i = 0; i < 137; ++i) { some_function(); }
}
movl $0, 12(%esp) // i = 0; i is stored at %esp + 12
jmp .L2
.L3:
call some_function // some_function()
addl $1, 12(%esp) // ++i
.L2:
cmpl $136, 12(%esp) // compare i to 136 ...
jle .L3 // ... and repeat loop less-or-equal
movl $0, %eax // return 0
leave // --"--
With optimization -O3, the addition + comparison is turned into a count-down subtraction:
pushl %ebx // save %ebx
movl $137, %ebx // set %ebx to 137
// some unrelated parts
.L2:
call some_function // some_function()
subl $1, %ebx // subtract 1 from %ebx
jne .L2 // if not equal to 0, repeat loop
[1] The generated assembly can be examined by invoking GCC with the -S flag.
Try to rewrite the for loop in C++ using a goto and an if statement and you will have the basics for the assembly version.
You could try the reverse - write the program in C++ or C and look at the disassembled code:
for ( int i = 0 ; i < 10 ; i++ )
00E714EE mov dword ptr [i],0
00E714F5 jmp wmain+30h (0E71500h)
00E714F7 mov eax,dword ptr [i]
00E714FA add eax,1
00E714FD mov dword ptr [i],eax
00E71500 cmp dword ptr [i],0Ah
00E71504 jge wmain+4Bh (0E7151Bh)
cout << "A";
00E71506 push offset string "A" (0E76800h)
00E7150B mov eax,dword ptr [__imp_std::cout (0E792ECh)]
00E71510 push eax
00E71511 call std::operator<<<std::char_traits<char> > (0E71159h)
00E71516 add esp,8
00E71519 jmp wmain+27h (0E714F7h)
then try to make sense of it.