SPARC Assembly if/else if not working

I'm translating a C method with two if statements to assembly, and I'm having trouble getting a branch to work. I need two branches to be part of the same comparison, and a third branch to be on its own. The two grouped branches work, but the lone third one doesn't; even if I repeat the same comparison, the third branch is never taken.
ifStatements:
cmp %l3, %l0
ble useThisA ! works
nop
bg useThisB ! works
nop
cmp %l3, %l0
bg useThisC ! doesn't work, even if it's cmp %l3, %l0 again
nop
Why doesn't this work, conceptually?

If ble isn't taken, then %l3 > %l0, so bg will be taken, and vice versa. One of the first two branches always fires, so execution never falls through to the second cmp, and branch C is never reached.
You didn't show your branch targets, but it's likely that one of them can be the fall-through case, e.g.
ble somewhere
! code for the greater-than case
...
Also, I don't know SPARC specifically, but what are those nop instructions for? Is that a branch-delay slot that's executed whether or not the branch is taken?
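On that point: yes, SPARC branches have a delay slot, and a nop there executes whether or not the branch is taken (unless the branch is annulled). To make the fall-through idea concrete, here is a minimal SPARC sketch of a three-way test off a single cmp (hypothetical labels):
cmp %l3, %l0 ! sets the integer condition codes once
bl useThisA ! taken if %l3 < %l0
nop ! delay slot
be useThisB ! taken if %l3 == %l0
nop ! delay slot
! falls through to here when %l3 > %l0: put the useThisC code inline,
! or branch to it with "ba useThisC" plus a delay-slot nop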

Related

How do if statements literally work?

I was wondering how an if statement or conditional statement works behind the scenes when executed.
Consider an example like this:
if (10 > 6) {
// Some code
}
How does the compiler or interpreter know that the number 10 is greater than 6, or that 6 is less than 10?
At some point near the end of the compilation, the compiler will convert the above in to assembly language similar to:
start: # This is a label that you can reference
mov ax, 0ah # Store 10 in the ax register
mov bx, 06h # Store 6 in the bx register
cmp ax, bx # Compare ax to bx
jg inside_the_brackets # 10 > 6? Goto `inside_the_brackets`
jmp after_the_brackets # Otherwise skip ahead a little
inside_the_brackets:
# Some code - stuff inside the {} goes here
after_the_brackets:
# The rest of your program. You end up here no matter what.
I haven't written in assembler in years so I know that's a jumble of different varieties, but the above is the gist of it. Now, that's an inefficient way to structure the code, so a smart compiler might write it more like:
start: # This is a label that you can reference
mov ax, 0ah # Store 10 in the ax register
mov bx, 06h # Store 6 in the bx register
cmp ax, bx # Compare ax to bx
jle after_the_brackets # 10 <= 6? Goto `after_the_brackets`
inside_the_brackets:
# Some code - stuff inside the {} goes here
after_the_brackets:
# The rest of your program. You end up here no matter what.
See how that reversed the comparison, so instead of if (10 > 6) it's more like if (10 <= 6)? That removes a jmp instruction. The logic is identical, even if it's no longer exactly what you originally wrote. There -- now you've seen an "optimizing compiler" at work.
Every compiler you're likely to have heard of has a million tricks to convert code you write into assembly language that acts the same, but that the CPU can execute more efficiently. Sometimes the end result is barely recognizable. Some of the optimizations are as simple as what I just did, but others are fiendishly clever and people have earned PhDs in this stuff.
Kirk Strauser's answer is correct. However, you ask:
How does the compiler or interpreter know that the number 10 is greater than 6 or 6 is less than 10?
Some optimizing compilers can see that 10 > 6 is a constant expression equivalent to true, and not emit any check or jump at all. If you are asking how they do that, well…
I'll explain the process in steps that hopefully are easy to understand. I'm covering no advanced topics.
The build process will start by parsing your code document according to the syntax of the language.
The syntax of the language will define how to interpret the text of the document (think a string with your code) as a series of symbols or tokens (e.g. keywords, literals, identifiers, operators…). In this case we have:
an if symbol.
a ( symbol.
a 10 symbol.
a > symbol.
a 6 symbol.
a ) symbol.
a { symbol.
and a } symbol.
I'm assuming comments, newlines and white-space do not generate symbols in this language.
From the series of symbols, it will build a tree-like memory structure (see AST) according to the rules of the language.
The tree will say that your code is:
An "if statement", that has two children:
A conditional (a boolean expression), which is a greater than comparison that has two children:
A constant literal integer 10
A constant literal integer 6
A body (a set of statements), in this case empty.
Then the compiler can look at that tree and figure out how to optimize it, and emit code in the target language (let us say machine code).
The optimization process will see that the conditional does not have variables, it is composed entirely of constants that are known at compile time. Thus it can compute the equivalent value and use that. Which leaves us with this:
An "if statement", that has two children:
A conditional (a boolean expression), which is a literal true.
A body (a set of statements), in this case empty.
Then it will see that we have a conditional that is always true, and thus we don't need it. So it replaces the if statement with the set of statements in its body, which are none: we have optimized the code away to nothing.
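To make that concrete, a sketch of the end result (assuming the snippet is wrapped in a hypothetical function f and compiled with optimizations enabled):
f: # void f(void) { if (10 > 6) { } }
ret # nothing left: the always-true test and its empty body were folded away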
You can imagine how the process would then go over the tree, figuring out what is the equivalent in the target language (again, let us say, machine code), and emitting that code until it has gone over the whole tree.
I want to mention that intermediate languages and JIT (just-in-time) compilation have become very common. See Understanding the differences: traditional interpreter, JIT compiler, JIT interpreter and AOT compiler.
My description of how the build process works is a toy textbook example. I would like to encourage you to learn more about the topic. I'll suggest, in this order:
Computerphile's Compilers with Professor Brailsford video series.
The good old Dragon Book [pdf], and other books such as "How To Create Pragmatic, Lightweight Languages" and "Parsing with Perl 6 Regexes and Grammars".
Finally, CS 6120: Advanced Compilers: The Self-Guided Online Course, which is not about parsing, because it presumes you already know that.
The ability to actually check that is implemented in hardware. To be more specific, it will subtract 10-6 (which is one of the basic instructions that processors can do), and if the result is less than or equal to 0 then it will jump to the end of the block (comparing numbers to zero and jumping based on the result are also basic instructions). If you want to learn more, the keywords to look for are "instruction set architecture", and "assembly".
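In the style of the earlier examples, a sketch of that hardware-level sequence (hypothetical label; jle is the "jump if the result was <= 0" instruction):
mov ax, 0ah # ax = 10
mov bx, 06h # bx = 6
sub ax, bx # ax = 10 - 6 = 4; the flags record whether the result was zero or negative
jle after_the_block # taken only if the result was <= 0 (not taken here)
# the code inside the if runs when execution gets here
after_the_block:
A cmp instruction does the same subtraction but throws the numeric result away, keeping only the flags.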

When debugging with gdb, what is the meaning of the debugging information in the front of the assembly code?

When debugging c code with gdb, the displayed assembly code is
0x000000000040116c main+0 push %rbp
0x000000000040116d main+1 mov %rsp,%rbp
0x0000000000401170 main+4 movl $0x0,-0x4(%rbp)
0x0000000000401177 main+11 jmp 0x40118d <main+33>
0x0000000000401179 main+13 mov -0x4(%rbp),%eax
0x000000000040117c main+16 mov %eax,%edx
0x000000000040117e main+18 mov -0x4(%rbp),%eax
Is the 0x000000000040116c in front of the first assembly instruction the virtual address of this function? Is main+1 the offset of the second instruction from the start of main? The one after it is main+4. Does that mean mov %rsp,%rbp is three bytes? If so, why is movl $0x0,-0x4(%rbp) 7 bytes?
I am using a server. The version is: Linux version 4.15.0-122-generic (buildd@lcy01-amd64-010) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)) #124~16.04.1-Ubuntu SMP.
Pretty much yes. It's quite apparent that, for example, adding 4 to the first address gives you the address shown for main+4, and adding another 7 on top of that gives you the corresponding address for main+11.
As far as the two mov instructions go: they are very different and do completely different things. They are two very different kinds of moves, and that's how many bytes each one requires in x86 machine language, so it's not surprising that one takes many more bytes than the other. As for precisely why, well, in general that opens a very long, broad, and winding discussion about the underlying reasons and the original design goals of the x86 machine-language instruction set. Much of it no longer applies (and you would probably find it quite boring), since the modern x86 CPU is something radically different from its original generation. But it has to remain binary compatible. Hence, little oddities like that.
Just to give you a basic understanding: the first mov is between two CPU registers. It doesn't take a long novel to specify from where and to where. The second mov has to specify a 32-bit value (0, to be precise), a CPU register, and a memory offset. All of those little details have to be encoded somewhere, and that takes bytes.
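If you want to see this directly, gdb can print the raw machine-code bytes next to each instruction with disassemble /r. A sketch of what that would look like here (the byte values shown are the standard x86-64 encodings; verify against your own binary):
(gdb) disassemble /r main
0x000000000040116c <+0>: 55 push %rbp
0x000000000040116d <+1>: 48 89 e5 mov %rsp,%rbp
0x0000000000401170 <+4>: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
The 7 bytes of the movl break down as one opcode byte (c7), one ModRM byte (45), a one-byte displacement (fc, i.e. -0x4), and the four-byte immediate (00 00 00 00).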

Understanding the assembly language for if-else in following code [duplicate]

What happens if I say call instead of jmp? Since there is no return statement written, does control just pass to the next line below, or does it still return to the line after the call?
start:
mov $0, %eax
jmp two
one:
mov $1, %eax
two:
cmp $1, %eax
call one
mov $10, %eax
The CPU always executes the next instruction in memory, unless a branch instruction sends execution somewhere else.
Labels don't have a width, or any effect on execution. They just allow you to make reference to this address from other places. Execution simply falls through labels, even off the end of your code if you don't avoid that.
If you're familiar with C or other languages that have goto (example), the labels you use to mark places you can goto work exactly the same as asm labels, and jmp / jcc work exactly like goto or if(EFLAGS_condition) goto. But asm doesn't have special syntax for functions; you have to implement that high-level concept yourself.
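For instance, a jcc really is just a conditional goto over the flags (a sketch with a hypothetical label):
cmp $1, %eax # if (eax != 1) goto skip;
jne skip
mov $42, %ebx # the "body" runs only when eax == 1
skip: # execution falls through this label either way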
If you leave out the ret at the end of a block of code, execution keeps going and decodes whatever comes next as instructions. (Maybe What would happen if a system executes a part of the file that is zero-padded? if that was the last function in an asm source file, or maybe execution falls into some CRT startup function that eventually returns.)
(In which case you could say that the block you're talking about isn't a function, just part of one, unless it's a bug and a ret or jmp was intended.)
You can (and maybe should) try this yourself in a debugger. Single-step through that code and watch RSP and RIP change. The nice thing about asm is that the total state of the CPU (excluding memory contents) is not very big, so it's possible to watch the entire architectural state in a debugger window. (Well, at least the interesting part that's relevant for user-space integer code, so excluding model-specific registers that only the OS can tweak, and excluding the FPU and vector registers.)
call and ret aren't "special" (i.e. the CPU doesn't "remember" that it's inside a "function").
They just do exactly what the manual says they do, and it's up to you to use them correctly to implement function calls and returns. (e.g. make sure the stack pointer is pointing at a return address when ret runs.) It's also up to you to get the calling convention correct, and all that stuff. (See the x86 tag wiki.)
There's also nothing special about a label that you jmp to vs. a label that you call. An assembler just assembles bytes into the output file, and remembers where you put label markers. It doesn't truly "know" about functions the way a C compiler does. You can put labels wherever you want, and it doesn't affect the machine code bytes.
Using the .globl one directive would tell the assembler to put an entry in the symbol table so the linker could see it. That would let you define a label that's usable from other files, or even callable from C. But that's just meta-data in the object file and still doesn't put anything between instructions.
Labels are just part of the machinery that you can use in asm to implement the high-level concept of a "function", aka procedure or subroutine: A label for callers to call to, and code that will eventually jump back to a return address the caller passed, one way or another. But not every label is the start of a function. Some are just the tops of loops, or other targets of conditional branches within a function.
Your code would run exactly the same way if you emulated call with an equivalent push of the return address and then a jmp.
one:
mov $1, %eax
# missing ret so we fall through
two:
cmp $1, %eax
# call one # emulate it instead with push+jmp
pushl $.Lreturn_address
jmp one
.Lreturn_address:
mov $10, %eax
# fall off into whatever comes next, if it ever reaches here.
Note that this sequence only works in non-PIC code, because the absolute return address is encoded into the push imm32 instruction. In 64-bit code with a spare register available, you can use a RIP-relative lea to get the return address into a register and push that before jumping.
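A sketch of that position-independent variant (assuming %rcx is free to clobber):
lea .Lreturn_address(%rip), %rcx # materialize the return address PC-relatively
push %rcx # put it where ret will look for it
jmp one
.Lreturn_address: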
Also note that while architecturally the CPU doesn't "remember" past CALL instructions, real implementations run faster by assuming that call/ret pairs will be matched, and use a return-address predictor to avoid mispredicts on the ret.
Why is RET hard to predict? Because it's an indirect jump to an address stored in memory! It's equivalent to pop %internal_tmp / jmp *%internal_tmp, so you can emulate it that way if you have a spare register to clobber (e.g. rcx is not call-preserved in most calling conventions, and not used for return values). Or if you have a red-zone so values below the stack-pointer are still safe from being asynchronously clobbered (by signal handlers or whatever), you could add $8, %rsp / jmp *-8(%rsp).
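Spelled out, those two ret emulations look like this (sketches; the first clobbers %rcx, the second relies on the red zone):
# emulation 1: pop the return address into a scratch register, then jump to it
pop %rcx
jmp *%rcx
# emulation 2: no scratch register; the word below %rsp must survive until the jmp
add $8, %rsp
jmp *-8(%rsp)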
Obviously for real use you should just use ret, because it's the most efficient way to do that. I just wanted to point out what it does using multiple simpler instructions. Nothing more, nothing less.
Note that functions can end with a tail-call instead of a ret:
(see this on Godbolt)
int ext_func(int a); // something that the optimizer can't inline
int foo(int a) {
return ext_func(a+a);
}
# asm output from clang:
foo:
add edi, edi
jmp ext_func # TAILCALL
The ret at the end of ext_func will return to foo's caller. foo can use this optimization because it doesn't need to make any modifications to the return value or do any other cleanup.
In the SystemV x86-64 calling convention, the first integer arg is in edi. So this function replaces that with a+a, then jumps to the start of ext_func. On entry to ext_func, everything is in the correct state just like it would be if something had run call ext_func. The stack pointer is pointing to the return address, and the args are where they're supposed to be.
Tail-call optimizations can be done more often in a register-args calling convention than in a 32-bit calling convention that passes args on the stack. You often run into a problem because the function you want to tail-call takes more args than the current function, so there isn't room to rewrite your own args into args for that function. (And compilers don't tend to create code that modifies its own args, even though the ABI is very clear that functions own the stack space holding their args and can clobber it if they want.)
In a calling convention where the callee cleans the stack (with ret 8 or something to pop another 8 bytes after the return address), you can only tail-call a function that takes exactly the same number of arg bytes.
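For instance, a sketch of a callee-pops function in 32-bit code (stdcall-style, two 4-byte stack args; hypothetical label):
callee:
mov 4(%esp), %eax # first arg, just above the return address
add 8(%esp), %eax # plus the second arg
ret $8 # return, then pop the 8 bytes of args as well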
Your intuition is correct: control just passes to the next line below after the function returns.
In your case, after call one, execution will jump to mov $1, %eax, then continue down through cmp $1, %eax and end up in an infinite loop as it calls one again.
Beyond just an infinite loop, the code will eventually exceed its memory constraints, since each call pushes a return address (the address of the instruction after the call) onto the stack. Eventually, you'll overflow the stack.

Binary bomb statements clarification

I am trying to use the gdb debugger to solve the binary bomb and I am stuck on these two statements.
I just want to know what these statements mean.
cmp $0x1,%eax
cmpl $0x1f5,0x1c(%esp)
The first one compares the value in register EAX with the constant 1. The second compares a 32-bit value stored in memory (on the stack, at offset 0x1c from ESP) with the constant 0x1f5.
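Note that cmp by itself only performs a subtraction to set the flags and discards the numeric result; it's the conditional branch that follows (in the surrounding bomb code) that acts on the comparison. A sketch with hypothetical labels:
cmp $0x1,%eax # compute %eax - 1, keep only the flags
jne wrong_guess # taken if %eax != 1 (hypothetical target)
cmpl $0x1f5,0x1c(%esp) # compare the value at %esp+0x1c with 0x1f5
jne wrong_guess # taken if it isn't 0x1f5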
Given your question, you should likely read one of the many available assembly programming tutorials before you attempt to make further progress on the original problem.

Interchange 2 variables in C++ with asm code

I have a huge function that sorts a very large amount of int data. The code works fine except for the fact that it's slower than it should be. My first step in solving this is to place some asm code inside the C++. How can I interchange 2 variables using asm? I've tried this:
_asm{ push a[x]; push a[y]; pop a[x]; pop a[y];}
and this:
_asm{mov eax, a[x]; mov ebx, a[y]; mov a[x], ebx; mov a[y], eax;}
but both crash. How can I save some time on these interchanges? I use VS 2010.
In general, it is very difficult to do better than your compiler with simple code like this.
A compiler, when faced with a swap operation on integers, will typically issue code like this:
mov eax, [x]
mov ebx, [y]
mov [x], ebx
mov [y], eax
Before you try to override it, first check what the compiler is actually generating. If it's something like this, don't bother going any further; you won't be able to do better. Moreover, if you leave it to the compiler, it may, if these variables are used immediately thereafter, choose to reuse one of these registers to save on variable loads/stores as well. This is impossible with hand-coded assembly; the compiler must reload the variables after the black box that is hand-coded asm.
Note that the push/push/pop/pop sequence is likely to be much slower; not only does it add four additional memory operations through the stack, it also introduces dependencies on the stack pointer, eliminating any possibility of pipelining. With the simple mov sequence, it is at least possible to run the pair of reads and the pair of writes in parallel if they are on different memory banks, or if one is in cache, etc. It also avoids stalls on the stack pointer in later code.
As such, you should not try to micro-optimize the cost of an interchange; instead, reduce the number of interchanges performed. There are many sorting algorithms available, each with slightly different characteristics. You may find that some cause fewer swaps on your dataset than others.
What makes you think you can produce faster assembly than an optimizing compiler?
Even if you get it to work properly, all you're likely to achieve is confusing the optimizer into producing even slower code.
When you use inline assembly, you can change things so that assumptions the compiler has made about register contents are no longer true. Often EAX is used to pass a parameter or return a value, so trashing EAX might not have much effect, but you clobbered EBX and didn't put it back, and that could cause problems. Try pushing EBX before you use it, then popping it when you are done.
You can use variable names, function names, and labels in assembly code as symbols, but note that an expression like a[x] is not a valid symbol.
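Putting the last two answers together, a minimal sketch of the preserve-and-swap idea (assuming plain int locals x and y rather than array expressions, since MSVC's _asm can refer to simple variables by name):
_asm {
push ebx ; save the caller's EBX before clobbering it
mov eax, x ; x and y are hypothetical int locals
mov ebx, y
mov x, ebx
mov y, eax
pop ebx ; restore EBX
}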
Writing more efficient code takes skill and knowledge; using asm does not necessarily help you there.
You can compare the assembly code your compiler produces for the function with and without the inline assembler to see where you broke it.