How do if statements literally work? - if-statement

I was wondering how an if statement or other conditional statement works behind the scenes when executed.
Consider an example like this:
if (10 > 6) {
// Some code
}
How does the compiler or interpreter know that the number 10 is greater than 6, or that 6 is less than 10?

At some point near the end of the compilation, the compiler will convert the above into assembly language similar to:
start:                        # This is a label that you can reference
    mov ax, 0ah               # Store 10 in the ax register
    mov bx, 06h               # Store 6 in the bx register
    cmp ax, bx                # Compare ax to bx
    jg inside_the_brackets    # 10 > 6? Goto `inside_the_brackets`
    jmp after_the_brackets    # Otherwise skip ahead a little
inside_the_brackets:
    # Some code - stuff inside the {} goes here
after_the_brackets:
    # The rest of your program. You end up here no matter what.
I haven't written assembler in years, so I know that's a jumble of different dialects, but the above is the gist of it. Now, that's an inefficient way to structure the code, so a smart compiler might write it more like:
start:                        # This is a label that you can reference
    mov ax, 0ah               # Store 10 in the ax register
    mov bx, 06h               # Store 6 in the bx register
    cmp ax, bx                # Compare ax to bx
    jle after_the_brackets    # 10 <= 6? Goto `after_the_brackets`
inside_the_brackets:
    # Some code - stuff inside the {} goes here
after_the_brackets:
    # The rest of your program. You end up here no matter what.
See how that reversed the comparison, so instead of if (10 > 6) it's more like if (10 <= 6)? That removes a jmp instruction. The logic is identical, even if it's no longer exactly what you originally wrote. There -- now you've seen an "optimizing compiler" at work.
Every compiler you're likely to have heard of has a million tricks to convert code you write into assembly language that acts the same, but that the CPU can execute more efficiently. Sometimes the end result is barely recognizable. Some of the optimizations are as simple as what I just did, but others are fiendishly clever and people have earned PhDs in this stuff.
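If you want to watch this happen, here is a minimal experiment you can run yourself (the file name and flags are just one possible setup, assuming gcc): put the snippet below in a file and compile it with gcc -O2 -S, then look at the generated .s file.

#include <stdio.h>

int main(void) {
    if (10 > 6) {
        puts("inside the brackets");  /* the condition is a compile-time constant */
    }
    return 0;
}

With optimization enabled you will typically find a bare call to puts and no cmp/jg pair at all: the compiler proved 10 > 6 at compile time and removed the branch entirely.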

Kirk Strauser's answer is correct. However, you ask:
How does the compiler or interpreter know that the number 10 is greater than 6, or that 6 is less than 10?
Some optimizing compilers can see that 10 > 6 is a constant expression equivalent to true and not emit any check or jump at all. If you are asking how they do that, well…
I'll explain the process in steps that are hopefully easy to understand; I'm not covering any advanced topics.
The build process will start by parsing your code document according to the syntax of the language.
The syntax of the language will define how to interpret the text of the document (think a string with your code) as a series of symbols or tokens (e.g. keywords, literals, identifiers, operators…). In this case we have:
an if symbol.
a ( symbol.
a 10 symbol.
a > symbol.
a 6 symbol.
a ) symbol.
a { symbol.
and a } symbol.
I'm assuming comments, newlines and white-space do not generate symbols in this language.
From the series of symbols, it will build a tree-like memory structure (see AST) according to the rules of the language.
The tree will say that your code is:
An "if statement", that has two children:
A conditional (a boolean expression), which is a greater than comparison that has two children:
A constant literal integer 10
A constant literal integer 6
A body (a set of statements), in this case empty.
Then the compiler can look at that tree and figure out how to optimize it, and emit code in the target language (let us say machine code).
The optimization process will see that the conditional has no variables; it is composed entirely of constants that are known at compile time. Thus it can compute the equivalent value and use that, which leaves us with this:
An "if statement", that has two children:
A conditional (a boolean expression), which is a literal true.
A body (a set of statements), in this case empty.
Then it will see that we have a conditional that is always true, so we don't need it; thus it replaces the if statement with the set of statements in its body. There are none, so we have optimized the code away to nothing.
You can imagine how the process would then go over the tree, figuring out what is the equivalent in the target language (again, let us say, machine code), and emitting that code until it has gone over the whole tree.
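To make the constant-folding step concrete, here is a minimal toy sketch in C (the struct and function names are mine, not taken from any real compiler): it represents the condition 10 > 6 as a tiny tree and folds it into a constant.

#include <stdio.h>

/* A toy AST node: either an integer constant or a ">" comparison. */
enum kind { CONST_INT, GREATER_THAN };

struct node {
    enum kind kind;
    int value;           /* used when kind == CONST_INT */
    struct node *left;   /* used when kind == GREATER_THAN */
    struct node *right;
};

/* Fold constant subtrees: if both children of ">" are constants,
   replace the comparison node with the constant 0 or 1. */
struct node *fold(struct node *n) {
    if (n->kind == GREATER_THAN) {
        n->left = fold(n->left);
        n->right = fold(n->right);
        if (n->left->kind == CONST_INT && n->right->kind == CONST_INT) {
            n->kind = CONST_INT;
            n->value = n->left->value > n->right->value;
        }
    }
    return n;
}

int main(void) {
    struct node ten  = { CONST_INT, 10, NULL, NULL };
    struct node six  = { CONST_INT, 6, NULL, NULL };
    struct node cond = { GREATER_THAN, 0, &ten, &six };

    fold(&cond);
    printf("folded to constant %d\n", cond.value);  /* prints 1, i.e. true */
    return 0;
}

A real compiler then sees an if whose condition is the constant true and, as described above, drops the branch entirely.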
I want to mention that intermediate languages and JIT (just in time) compiling have become very common. See Understanding the differences: traditional interpreter, JIT compiler, JIT interpreter and AOT compiler.
My description of how the build process works is a toy textbook example. I would like to encourage you to learn more about the topic. I'll suggest, in this order:
The Computerphile Compilers with Professor Brailsford video series.
The good old Dragon Book [pdf], and other books such as "How To Create Pragmatic, Lightweight Languages" and "Parsing with Perl 6 Regexes and Grammars".
Finally, CS 6120: Advanced Compilers: The Self-Guided Online Course, which is not about parsing, because it presumes you already know that.

The ability to actually check that is implemented in hardware. To be more specific, the CPU will subtract 10 - 6 (subtraction is one of the basic instructions that processors can perform), and if the result is less than or equal to 0 then it will jump to the end of the block (comparing numbers to zero and jumping based on the result are also basic instructions). If you want to learn more, the keywords to look for are "instruction set architecture" and "assembly".
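If it helps, here is a toy C model of that subtract-and-check idea (it is only an illustration of the flag logic; real hardware derives zero, sign, carry and overflow flags from a single instruction, and this sketch ignores overflow):

#include <stdio.h>

int main(void) {
    int a = 10, b = 6;
    int result = a - b;               /* the subtraction instruction */
    int zero_flag = (result == 0);    /* flags the CPU derives from the result */
    int sign_flag = (result < 0);

    if (!zero_flag && !sign_flag) {   /* "jump if greater" looks at these flags */
        puts("condition holds: run the code inside the brackets");
    }
    return 0;
}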

Where are expressions and constants stored if not in memory?

From The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie:
The & operator only applies to objects in memory: variables and array elements. It cannot be applied to expressions, constants or register variables.
Where are expressions and constants stored if not in memory?
What does that quote mean?
E.g:
&(2 + 3)
Why can't we take its address? Where is it stored?
Will the answer be the same for C++, since C is its parent?
This linked question explains that such expressions are rvalues, and rvalues do not have addresses.
My question is where are these expressions stored such that their addresses can't be retrieved?
Consider the following function:
unsigned sum_evens (unsigned number) {
    number &= ~1; // ~1 = 0xfffffffe (32-bit CPU)
    unsigned result = 0;
    while (number) {
        result += number;
        number -= 2;
    }
    return result;
}
Now, let's play the compiler game and try to compile this by hand. I'm going to assume you're using x86 because that's what most desktop computers use. (x86 is the instruction set for Intel compatible CPUs.)
Let's go through a simple (unoptimized) version of what this routine could look like when compiled:
sum_evens:
    and edi, 0xfffffffe ;edi is where the first argument goes
    xor eax, eax        ;set register eax to 0
    cmp edi, 0          ;compare number to 0
    jz .done            ;if edi = 0, jump to .done
.loop:
    add eax, edi        ;eax = eax + edi
    sub edi, 2          ;edi = edi - 2
    jnz .loop           ;if edi != 0, go back to .loop
.done:
    ret                 ;return (value in eax is returned to caller)
Now, as you can see, the constants in the code (0, 2, 1) actually show up as part of the CPU instructions! In fact, 1 doesn't show up at all; the compiler (in this case, just me) already calculates ~1 and uses the result in the code.
While you can take the address of a CPU instruction, it often makes no sense to take the address of a part of it (in x86 you sometimes can, but in many other CPUs you simply cannot do this at all), and code addresses are fundamentally different from data addresses (which is why you cannot treat a function pointer (a code address) as a regular pointer (a data address)). In some CPU architectures, code addresses and data addresses are completely incompatible (although this is not the case of x86 in the way most modern OSes use it).
Do notice that while (number) is equivalent to while (number != 0). Inside the loop, that 0 doesn't show up in the compiled code at all: it's implied by the jnz instruction (jump if not zero). This is another reason why you cannot take the address of that 0: it doesn't have one, it's literally nowhere.
I hope this makes it clearer for you.
where are these expressions stored such that their addresses can't be retrieved?
Your question is not well-formed.
Conceptually
It's like asking why people can discuss ownership of nouns but not verbs. Nouns refer to things that may (potentially) be owned, and verbs refer to actions that are performed. You can't own an action or perform a thing.
In terms of language specification
Expressions are not stored in the first place, they are evaluated.
They may be evaluated by the compiler, at compile time, or they may be evaluated by the processor, at run time.
In terms of language implementation
Consider the statement
int a = 0;
This does two things: first, it declares an integer variable a. This is defined to be something whose address you can take. It's up to the compiler to do whatever makes sense on a given platform, to allow you to take the address of a.
Secondly, it sets that variable's value to zero. This does not mean an integer with value zero exists somewhere in your compiled program. It might commonly be implemented as
xor eax,eax
which is to say, XOR (exclusive-or) the eax register with itself. This always results in zero, whatever was there before. However, there is no fixed object of value 0 in the compiled code to match the integer literal 0 you wrote in the source.
As an aside, when I say that a above is something whose address you can take - it's worth pointing out that it may not really have an address unless you take it. For example, the eax register used in that example doesn't have an address. If the compiler can prove the program is still correct, a can live its whole life in that register and never exist in main memory. Conversely, if you use the expression &a somewhere, the compiler will take care to create some addressable space to store a's value in.
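To illustrate that last point, here is a small sketch (use() is a hypothetical external function, only there so the compiler cannot optimize everything away):

void use(int *p);   /* hypothetical function defined in another translation unit */

int f(int x) {
    int a = x + 1;  /* no address taken: a may live its whole life in a register */
    return a * 2;
}

int g(int x) {
    int a = x + 1;
    use(&a);        /* &a appears, so the compiler must give a an addressable home */
    return a * 2;
}

Compiling this with something like gcc -O2 -S and comparing the two functions typically shows that only g reserves stack space for a.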
Note for comparison that I can easily choose a different language where I can take the address of an expression.
It'll probably be interpreted, because compilation usually discards these structures once the machine-executable output replaces them. For example Python has runtime introspection and code objects.
Or I can start from LISP and extend it to provide some kind of addressof operation on S-expressions.
The key thing they both have in common is that they are not C, which as a matter of design and definition does not provide those mechanisms.
Such expressions end up as part of the machine code. An expression 2 + 3 likely gets translated to the machine code instruction "load 5 into register A". CPU registers don't have addresses.
It does not really make sense to take the address of an expression. The closest thing you can do is a function pointer. Expressions are not stored in the same sense as variables and objects.
Expressions are stored in the actual machine code. Of course you could find the address where the expression is evaluated, but it just doesn't make sense to do so.
Read a bit about assembly. Expressions are stored in the text segment, while variables are stored in other segments, such as data or stack.
https://en.wikipedia.org/wiki/Data_segment
Another way to explain it is that expressions are cpu instructions, while variables are pure data.
One more thing to consider: The compiler often optimizes away things. Consider this code:
int x = 0;
while (x < 10)
    x += 1;
This code will probably be optimized to:
int x = 10;
So what would the address of (x += 1) mean in this case? It is not even present in the machine code, so it has - by definition - no address at all.
Where are expressions and constants stored if not in memory
In some (actually many) cases, a constant expression is not stored at all. In particular, think about optimizing compilers, and see CppCon 2017: Matt Godbolt's talk “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”
In your particular case of some C code having 2 + 3, most optimizing compilers would have constant folded that into 5, and that 5 constant might be just inside some machine code instruction (as some bitfield) of your code segment and not even have a well defined memory location. If that constant 5 was a loop limit, some compilers could have done loop unrolling, and that constant won't appear anymore in the binary code.
See also this answer, etc...
Be aware that C11 is a specification written in English. Read its n1570 standard. Read also the much bigger specification of C++11 (or later).
Taking the address of a constant is forbidden by the semantics of C (and of C++).
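As a concrete illustration (the variable names are mine), the following shows what a compiler accepts and rejects; the commented-out line is the one the language forbids:

int main(void) {
    int a = 2 + 3;            /* the value 5 is stored in the object a */
    int *p = &a;              /* fine: a is an object in memory (an lvalue) */

    /* int *q = &(2 + 3); */  /* rejected: you cannot take the address of an rvalue */

    const int five = 2 + 3;
    const int *r = &five;     /* fine: a named constant object does have an address */

    return *p + *r;
}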

Different conditional checking ways

To check whether an int is within the range [1, ∞), I can use the following ways (I use #1 and #2 a lot):
if (a>=1)
if (a>0)
if (a>1 || a==1)
if (a==1 || a>1)
Is there any difference that I should pay attention to among the four versions?
Functionally there is no difference between the four ways you listed. This is mainly an issue of style. I would venture that #1 and #2 are the most common forms, though; if I saw #3 or #4 in a code review I would suggest a change.
Perf-wise, I suppose it is possible that some compiler out there optimizes one better than the other, but I really doubt it. At best it would be a micro-optimization, and nothing I would ever base my coding style on without direct profiler input.
I don't really see why you would use 3 or 4. Apart from being longer to type, they will generate more code. Since in an or condition the second check is skipped if the first is true, there shouldn't be a performance hit except for version 4 when the value is often not 1 (of course, hardware with branch prediction will mostly negate that).
1. if (a>=1)
2. if (a>0)
3. if (a>1 || a==1)
4. if (a==1 || a>1)
On x86, options 1 and 2 produce a cmp instruction, which sets various flags in the status register. The cmp is then followed by a conditional branch/jump based on those flags. For the first it emits jge (jump if greater or equal), for the second jg (jump if greater).
Options 3 and 4 - in theory - require two cmps and two branches, but chances are the compiler will simply optimize them to be the same as 1.
You should generally choose whichever (a) follows the conventions in the code you are working on and (b) most clearly expresses the algorithm you are implementing.
There are times when you explicitly mean "if a is equal to one, or it has a value greater than 1", and in those times you should write if (a == 1 || a > 1). But if you are just checking that a has a positive, non-zero, integer value, you should write if (a > 0), since that is what that says.
If you find that such a case is a part of a performance bottleneck, you should inspect the assembly instructions and adjust accordingly - e.g. if you find you have two cmps and branches, then write the code to use one compare and one branch.
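If you want to check this yourself, a minimal test file (the function names are placeholders) could look like the following; compiling it with optimization, e.g. gcc -O2 -S, typically produces identical code for all four:

int check1(int a) { return a >= 1; }
int check2(int a) { return a > 0; }
int check3(int a) { return a > 1 || a == 1; }
int check4(int a) { return a == 1 || a > 1; }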
Nope! They are all the same for an int. However, I would prefer to use if (a > 0).

Encode asm instructions to opcodes

I need to encode a few instructions like
mov eax, edx
inc edx
to the corresponding x86_64 opcodes. Is there any library (not an entire asm compiler) to accomplish that easily?
You could take open source FASM or NASM and use their parser.
In case you have already compiled it into a binary (from your asm or C with embedded asm):
objdump -S your_binary will list each instruction with its binary encoding.
Assuming you are just after translating simple instructions, writing a simple assembler wouldn't be THAT much work. I've done it before - and you probably have most of the logic and tables for your disassembler component (such as a table of opcodes to instruction names and register numbers to names - just use that in reverse). I don't necessarily mean that you can use the tables directly in reverse, but their content, re-arranged in a suitable way, should do most of the hard work for you.
What gets difficult is symbols and relocation and such things. But since you probably don't really need that for "find this sequence of code", I guess you could do without those parts. You also don't need to generate object files to some specification - you just need a set of bytes.
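As a (heavily simplified) illustration of the table-driven idea, here is a toy sketch that only knows the two instructions from the question; the table entries and function names are mine, and a real assembler would parse operands rather than match whole strings:

#include <stdio.h>
#include <string.h>

/* Toy lookup table: whole instruction text -> x86-64 encoding. */
struct entry {
    const char *text;
    unsigned char bytes[4];
    int length;
};

static const struct entry table[] = {
    { "mov eax, edx", { 0x89, 0xD0 }, 2 },  /* MOV r/m32, r32: opcode 89, ModRM D0 */
    { "inc edx",      { 0xFF, 0xC2 }, 2 },  /* INC r/m32: opcode FF /0, ModRM C2 */
};

/* Look an instruction up and print its opcode bytes. */
static int encode(const char *instruction) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].text, instruction) == 0) {
            for (int b = 0; b < table[i].length; b++)
                printf("%02X ", table[i].bytes[b]);
            printf("\t%s\n", instruction);
            return 0;
        }
    }
    return -1;  /* unknown instruction */
}

int main(void) {
    encode("mov eax, edx");  /* prints: 89 D0 */
    encode("inc edx");       /* prints: FF C2 */
    return 0;
}

The point is only that a mnemonic-to-bytes table plus some operand logic covers the simple cases; the suggestion above about reusing FASM's or NASM's parsers covers everything else.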
Now, it would get a little bit more tricky if you wanted to find:
here:
    inc eax
    jnz here
    jmp someplace_else
    ....
    ...
someplace_else:
    ....
since you'd have to encode the jumps to their relative locations - at the very least, it would require a two-pass approach: first figure out the length of the instructions, then do the actual filling in of the jump targets. If "someplace_else" is far from the jump itself, it may also be an absolute jump, in which case your "search" would have to understand how that relates to the location it's searching at - since that sequence would be different for every single address.
I've written both assemblers and disassemblers, and it's not TERRIBLY hard if you don't have to deal with relocatable addresses and file formats with weird definitions that you don't know [until you've studied the 200 page definition of the format].

Which is faster (mask >> i & 1) or (mask & 1 << i)?

In my code I must choose one of these two expressions (where mask and i are non-constant integers with -1 < i < (sizeof(int) << 3) + 1). I don't think this will make the performance of my program better or worse, but it is very interesting to me. Do you know which is better, and why?
First of all, whenever you find yourself asking "which is faster", your first reaction should be to profile, measure and find out for yourself.
Second of all, this is such a tiny calculation, that it almost certainly has no bearing on the performance of your application.
Third, the two are most likely identical in performance.
C expressions cannot be "faster" or "slower", because the CPU cannot evaluate them directly.
Which one is "faster" depends on the machine code your compiler will be able to generate for these two expressions. If your compiler is smart enough to realize that in your context both do the same thing (e.g. you simply compare the result with zero), it will probably generate the same code for both variants, meaning that they will be equally fast. In such case it is quite possible that the generated machine code will not even remotely resemble the sequence of operations in the original expression (i.e. no shift and/or no bitwise-and). If what you are trying to do here is just test the value of one bit, then there are other ways to do it besides the shift-and-bitwise-and combination. And many of those "other ways" are not expressible in C. You can't use them in C, while the compiler can use them in machine code.
For example, the x86 CPU has a dedicated bit-test instruction BT that extracts the value of a specific bit by its number. So a smart compiler might simply generate something like
MOV eax, i
BT mask, eax
...
for both of your expressions (assuming it is more efficient, of which I'm not sure).
Use either one and let your compiler optimize it however it likes.
If "i" is a compile-time constant, then the second would execute fewer instructions -- the 1 << i would be computed at compile time. Otherwise I'd imagine they'd be the same.
Depends entirely on where the values mask and i come from, and the architecture on which the program is running. There's also nothing to stop the compiler from transforming one into the other in situations where they are actually equivalent.
In short, not worth worrying about unless you have a trace showing that this is an appreciable fraction of total execution time.
It is unlikely that either will be faster. If you are really curious, compile a simple program that does both, disassemble, and see what instructions are generated.
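For example, the contents of main.c could be as small as this (the function names are placeholders; the volatile qualifiers keep the compiler from folding everything into constants):

unsigned test_a(unsigned mask, unsigned i) { return mask >> i & 1; }   /* yields 0 or 1 */
unsigned test_b(unsigned mask, unsigned i) { return mask & 1u << i; }  /* yields 0 or 1 << i */

int main(void) {
    volatile unsigned mask = 0xF0u, i = 5;
    return test_a(mask, i) + (test_b(mask, i) != 0);
}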
Here is how to do that:
gcc -O0 -g main.c -o main
objdump -d main | less
You could examine the assembly output and then look-up how many clock cycles each instruction takes.
But in 99.9999999 percent of programs, it won't make a lick of difference.
The two expressions are not logically equivalent, so performance should not be your concern here!
If performance were your concern, write a loop that does 10 million of each and measure.
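A rough sketch of such a measurement (using clock() from the standard library; the iteration count and the volatile accumulator are arbitrary choices to keep the compiler from deleting the loops):

#include <stdio.h>
#include <time.h>

int main(void) {
    volatile unsigned sink = 0;
    unsigned mask = 0xDEADBEEFu;

    clock_t start = clock();
    for (unsigned n = 0; n < 10000000u; n++) {
        unsigned i = n & 31u;
        sink += mask >> i & 1;          /* first expression */
    }
    double t1 = (double)(clock() - start) / CLOCKS_PER_SEC;

    start = clock();
    for (unsigned n = 0; n < 10000000u; n++) {
        unsigned i = n & 31u;
        sink += mask & 1u << i;         /* second expression */
    }
    double t2 = (double)(clock() - start) / CLOCKS_PER_SEC;

    printf("mask >> i & 1: %f s, mask & 1 << i: %f s (sink=%u)\n", t1, t2, sink);
    return 0;
}

Micro-benchmarks like this are noisy, so treat the numbers with suspicion; the disassembly comparison suggested above is usually more informative.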
EDIT: You edited the question after my response ... so please ignore my answer as the constraints change things.

long lines of integer arithmetic

Two parts to my question. Which is more efficient/faster:
int a,b,c,d,e,f;
int a1,b1,c1,d1,e1,f1;
int SumValue=0; // oops forgot zero
// ... define all values
SumValue = a*a1 + b*b1 + c*c1 + d*d1 + e*e1 + f*f1;
or
SumValue += a*a1 + b*b1 + c*c1;
SumValue += d*d1 + e*e1 + f*f1;
I'm guessing the first one is. My second question is why.
I guess a third question is, at any point would it be necessary to break up an addition operation (besides compiler limitations on number of line continuations etc...).
Edit
Is the only time I would see a slowdown when the entire arithmetic operation could not fit in the cache? I think this is impossible - the compiler probably gets mad about too many line continuations before this could happen. Maybe I'll have to play tomorrow and see.
Did you measure that? The optimized machine code for both approaches will probably be very similar, if not the same.
EDIT: I just tested this; the results are what I expected:
$ gcc -O2 -S math1.c # your first approach
$ gcc -O2 -S math2.c # your second approach
$ diff -u math1.s math2.s
--- math1.s 2010-10-26 19:35:06.487021094 +0200
+++ math2.s 2010-10-26 19:35:08.918020954 +0200
@@ -1,4 +1,4 @@
- .file "math1.c"
+ .file "math2.c"
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "%d\n"
That's it. Identical machine code.
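The test files themselves aren't shown; something along these lines (my guess, with arbitrary initial values) reproduces the experiment, with math2.c using the two += statements shown in the comment instead:

#include <stdio.h>

int main(void) {
    int a = 1, b = 2, c = 3, d = 4, e = 5, f = 6;
    int a1 = 6, b1 = 5, c1 = 4, d1 = 3, e1 = 2, f1 = 1;
    int SumValue = 0;

    /* math1.c: one long expression */
    SumValue = a*a1 + b*b1 + c*c1 + d*d1 + e*e1 + f*f1;

    /* math2.c: the same work split across two statements
    SumValue += a*a1 + b*b1 + c*c1;
    SumValue += d*d1 + e*e1 + f*f1;
    */

    printf("%d\n", SumValue);
    return 0;
}

Note that with constant initializers and -O2 the whole computation is folded at compile time anyway, which is one more reason the two files end up with identical assembly.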
There is no arbitrary limit to the number of operations you can combine on one line... practically, the compiler will accept any number you care to throw at it. The compiler's consideration of the operations happens long after the newlines are stripped - by then it is dealing with lexical symbols and grammar rules, and then an abstract syntax tree. Unless your compiler is very badly written, both statements will perform equally well for int data.
Note that in result = a*b + c*d + e*f etc., the compiler has no sequence points and knows the precedence, so it has complete freedom to evaluate and combine the subexpressions in parallel (given capable hardware). With a result += a*b; result += c*d; approach, you are inserting sequence points, so the compiler is asked to complete one expression before the other; but it is free to - and should - realise that result is not used elsewhere in between the increments, so it is free to optimise as in the first case.
More generally: the best advice I can give for such performance queries is 1) don't worry about it being a practical problem unless your program is running too slowly, then profile to find out where; 2) if you're curious, or profiling indicates a problem, try both/all the approaches you can think of and measure real performance.
Aside: += can be more efficient sometimes, e.g. for concatenating to an existing string, as + on such objects can involve creating temporaries and more memory allocation - expression templates work around this problem, but are rarely used as they're very complex to implement and slower to compile.
This is why it helps to be familiar with assembly language. In both cases, assembly instructions will be generated that load operand pairs into registers and perform addition/multiplication, and store the result in a register. Instructions to store the final result in the memory address represented by SumValue may also be generated, depending on how you use SumValue.
In short, both constructs are likely to perform the same, especially with optimization flags. And even if they don't perform the same on some platform, there's nothing intrinsic to either approach that would really help to explain why at the C++ level. At best, you'd be able to understand the reason why one performs better than the other by looking at how your compiler translates C++ constructs into assembly instructions.
I guess a third question is, at any point would it be necessary to break up an addition operation (besides compiler limitations on number of line continuations etc...).
It's not really necessary to break up an addition operation. But it might help for readability.
They're most likely going to be converted into the same amount of machine instructions, so they'd take the same length of time.