GCC inline assembly: constraints (C++)

I'm having difficulty understanding the role constraints play in GCC inline assembly (x86). I've read the manual, which explains exactly what each constraint does. The problem is that even though I understand what each constraint does, I have very little understanding of why you would use one constraint over another, or what the implications might be.
I realize this is a very broad topic, so a small example should help narrow the focus. The following is a simple asm routine which just adds two numbers. If an integer overflow occurs, it writes a value of 1 to an output C variable.
int32_t a = 10, b = 5;
int32_t c = 0; // overflow flag
__asm__
(
    "addl %2, %3;"         // Do a + b (the result goes into b)
    "jno 0f;"              // Jump ahead if no overflow occurred
    "movl $1, %1;"         // Copy 1 into c
    "0:"                   // We're done.
    : "=r"(b), "=m"(c)     // Output list
    : "r"(a), "0"(b)       // Input list
);
Now this works fine, except I had to arbitrarily fiddle with the constraints until I got it to work correctly. Originally, I used the following constraints:
:"=r"(b), "=m"(c) // Output list
:"r"(a), "m"(b) // Input list
Note that instead of a "0", I use an "m" constraint for b. This had a weird side effect where if I compiled with optimization flags and called the function twice, for some reason the result of the addition operation would also get stored in c. I eventually read about "matching constraints", which allows you to specify that a variable is to be used as both an input and output operand. When I changed "m"(b) to "0"(b) it worked.
But I don't really understand why you would use one constraint over another. I mean yeah, I understand that "r" means the variable should be in a register and "m" means it should be in memory - but I don't really understand what the implications of choosing one over another are, or why the addition operation doesn't work correctly if I choose a certain combination of constraints.
Questions: 1) In the above example code, why did the "m" constraint on b cause c to get written to? 2) Is there any tutorial or online resource which goes into more detail about constraints?

Here's an example to better illustrate why you should choose constraints carefully (same function as yours, but perhaps written a little more succinctly):
bool add_and_check_overflow(int32_t& a, int32_t b)
{
    bool result;
    __asm__("addl %2, %1; seto %b0"
            : "=q" (result), "+g" (a)
            : "r" (b));
    return result;
}
So, the constraints used were: q, r, and g.
q means only eax, ecx, edx, or ebx could be selected. This is because the set* instructions must write to an 8-bit-addressable register (al, ah, ...). The use of b in %b0 means: use the lowest 8-bit portion (al, cl, ...).
For most two-operand instructions, at least one of the operands must be a register. So don't use m or g for both; use r for at least one of the operands.
For the final operand, it doesn't matter whether it's register or memory, so use g (general).
In the example above, I chose to use g (rather than r) for a because references are usually implemented as memory pointers, so using an r constraint would have required copying the referent to a register first, and then copying back. Using g, the referent could be updated directly.
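To see the routine in action, here is a quick usage sketch (my addition, not part of the original answer; x86/x86-64 only, and it assumes the function above is in scope):

    #include <cstdint>
    #include <cstdio>

    // add_and_check_overflow as defined above

    int main() {
        int32_t x = INT32_MAX;
        bool o1 = add_and_check_overflow(x, 1);   // wraps around; seto reports OF = 1
        int32_t y = 10;
        bool o2 = add_and_check_overflow(y, 5);   // y becomes 15, no overflow
        std::printf("%d %d %d %d\n", x, o1, y, o2);
        return 0;
    }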
As to why your original version overwrote your c with the addition's value, that's because you specified =m in the output slot, rather than (say) +m; that means the compiler is allowed to reuse the same memory location for input and output.
In your case, that means two outcomes (since the same memory location was used for b and c):
The addition didn't overflow: then, c got overwritten with the value of b (the result of the addition).
The addition did overflow: then, c became 1 (and b might become 1 also, depending on how the code was generated).
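For completeness, here is one way the original snippet could be spelled with read-write ("+") constraints, so the compiler knows b is updated in place and that c is read as well as (only conditionally) written. This is a sketch, not the only correct form:

    #include <cstdint>
    #include <cstdio>

    int main() {
        int32_t a = 10, b = 5;
        int32_t c = 0;                   // overflow flag, preset to 0
        __asm__(
            "addl %2, %0;"               // b += a
            "jno 0f;"                    // skip the store if no overflow
            "movl $1, %1;"               // c = 1
            "0:"
            : "+r"(b), "+m"(c)           // read-write: the initial values matter
            : "r"(a)
            : "cc");                     // addl clobbers the flags
        std::printf("b = %d, c = %d\n", b, c);
        return 0;
    }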

Where are expressions and constants stored if not in memory?

From The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie:
The & operator only applies to objects in memory: variables and array
elements. It cannot be applied to expressions, constants or register
variables.
Where are expressions and constants stored if not in memory?
What does that quote mean?
E.g.:
&(2 + 3)
Why can't we take its address? Where is it stored?
Will the answer be the same for C++ too, since C is its parent?
This linked question explains that such expressions are rvalues, and rvalues do not have addresses.
My question is where are these expressions stored such that their addresses can't be retrieved?
Consider the following function:
unsigned sum_evens (unsigned number) {
    number &= ~1; // ~1 = 0xfffffffe (32-bit CPU)
    unsigned result = 0;
    while (number) {
        result += number;
        number -= 2;
    }
    return result;
}
Now, let's play the compiler game and try to compile this by hand. I'm going to assume you're using x86, because that's what most desktop computers use. (x86 is the instruction set for Intel-compatible CPUs.)
Let's go through a simple (unoptimized) version of what this routine could look like when compiled:
sum_evens:
    and edi, 0xfffffffe ; edi is where the first argument goes
    xor eax, eax        ; set register eax to 0
    cmp edi, 0          ; compare number to 0
    jz .done            ; if edi = 0, jump to .done
.loop:
    add eax, edi        ; eax = eax + edi
    sub edi, 2          ; edi = edi - 2
    jnz .loop           ; if edi != 0, go back to .loop
.done:
    ret                 ; return (value in eax is returned to caller)
Now, as you can see, the constants in the code (0, 2, 1) actually show up as part of the CPU instructions! In fact, 1 doesn't show up at all; the compiler (in this case, just me) already calculates ~1 and uses the result in the code.
While you can take the address of a CPU instruction, it often makes no sense to take the address of a part of it (in x86 you sometimes can, but in many other CPUs you simply cannot do this at all), and code addresses are fundamentally different from data addresses (which is why you cannot treat a function pointer (a code address) as a regular pointer (a data address)). In some CPU architectures, code addresses and data addresses are completely incompatible (although this is not the case of x86 in the way most modern OSes use it).
Do notice that while (number) is equivalent to while (number != 0), yet inside the loop that 0 doesn't show up at all! It's implied by the jnz instruction (jump if not zero), which reuses the flags set by the sub. This is another reason why you cannot take the address of that 0 - it doesn't have one, it's literally nowhere.
I hope this makes it clearer for you.
where are these expressions stored such that their addresses can't be retrieved?
Your question is not well-formed.
Conceptually
It's like asking why people can discuss ownership of nouns but not verbs. Nouns refer to things that may (potentially) be owned, and verbs refer to actions that are performed. You can't own an action or perform a thing.
In terms of language specification
Expressions are not stored in the first place, they are evaluated.
They may be evaluated by the compiler, at compile time, or they may be evaluated by the processor, at run time.
In terms of language implementation
Consider the statement
int a = 0;
This does two things: first, it declares an integer variable a. This is defined to be something whose address you can take. It's up to the compiler to do whatever makes sense on a given platform, to allow you to take the address of a.
Secondly, it sets that variable's value to zero. This does not mean an integer with value zero exists somewhere in your compiled program. It might commonly be implemented as
xor eax,eax
which is to say, XOR (exclusive-or) the eax register with itself. This always results in zero, whatever was there before. However, there is no fixed object of value 0 in the compiled code to match the integer literal 0 you wrote in the source.
As an aside, when I say that a above is something whose address you can take - it's worth pointing out that it may not really have an address unless you take it. For example, the eax register used in that example doesn't have an address. If the compiler can prove the program is still correct, a can live its whole life in that register and never exist in main memory. Conversely, if you use the expression &a somewhere, the compiler will take care to create some addressable space to store a's value in.
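A tiny sketch of that point (the names, and the external function observe, are illustrative inventions of mine): in the first function x can live its whole life in a register; in the second, letting &x escape forces the compiler to give x an addressable stack slot.

    void observe(int *p);     // hypothetical external function, defined elsewhere

    int no_address_needed(int n) {
        int x = n * 2;        // x may never exist outside a register
        return x + 1;
    }

    int address_taken(int n) {
        int x = n * 2;
        observe(&x);          // &x escapes, so x must get a real stack address
        return x + 1;
    }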
Note for comparison that I can easily choose a different language where I can take the address of an expression.
It'll probably be interpreted, because compilation usually discards these structures once the machine-executable output replaces them. For example Python has runtime introspection and code objects.
Or I can start from LISP and extend it to provide some kind of addressof operation on S-expressions.
The key thing they both have in common is that they are not C, which as a matter of design and definition does not provide those mechanisms.
Such expressions end up part of the machine code. An expression 2 + 3 likely gets translated to the machine code instruction "load 5 into register A". CPU registers don't have addresses.
It does not really make sense to take the address of an expression. The closest thing you can do is take a function pointer. Expressions are not stored in the same sense as variables and objects.
Expressions are stored in the actual machine code. Of course you could find the address of the instructions where the expression is evaluated, but it just doesn't make sense to do it.
Read a bit about assembly. Expressions are stored in the text segment, while variables are stored in other segments, such as data or stack.
https://en.wikipedia.org/wiki/Data_segment
Another way to explain it is that expressions are CPU instructions, while variables are pure data.
One more thing to consider: The compiler often optimizes away things. Consider this code:
int x = 0;
while (x < 10)
    x += 1;
This code will probably be optimized to:
int x = 10;
So what would the address of (x += 1) mean in this case? It is not even present in the machine code, so it has - by definition - no address at all.
Where are expressions and constants stored if not in memory
In some (actually many) cases, a constant expression is not stored at all. In particular, think about optimizing compilers, and see CppCon 2017: Matt Godbolt's talk “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”
In your particular case of some C code having 2 + 3, most optimizing compilers would have constant folded that into 5, and that 5 constant might be just inside some machine code instruction (as some bitfield) of your code segment and not even have a well defined memory location. If that constant 5 was a loop limit, some compilers could have done loop unrolling, and that constant won't appear anymore in the binary code.
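As a concrete illustration that is easy to check on Compiler Explorer: in the function below, any optimizing compiler folds 2 + 3 and the 5 survives only as an immediate inside an instruction (roughly mov eax, 5 on x86).

    int five(void) {
        return 2 + 3;   // typically emitted as: mov eax, 5 ; ret - no stored "5" object anywhere
    }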
See also this answer, etc...
Be aware that C11 is a specification written in English. Read its n1570 standard. Read also the much bigger specification of C++11 (or later).
Taking the address of a constant is forbidden by the semantics of C (and of C++).
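To see that prohibition in practice, here is a small sketch (the commented-out line is rejected at compile time; the exact diagnostic wording varies by compiler):

    #include <cstdio>

    int main() {
        int a = 2 + 3;          // a is an object with storage
        int *p = &a;            // fine: a has an address
        // int *q = &(2 + 3);   // error: lvalue required as unary '&' operand
        std::printf("%d\n", *p);
        return 0;
    }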

A unique type of data conversion

In the following code:
tt = 5;
for (i = 0; i < tt; i++)
{
    int c, d, l;
    scanf("%lld%lld%lld", &c, &d, &l);
    printf("%d %d %d %d", c, d, l, tt);
}
In the first iteration, the value of tt changes to 0 automatically.
I know that I have declared c, d, l as int while reading them as long long, so that makes c and d 0. But I'm still not able to understand how tt is becoming 0.
Small, but obligatory announcement. As was said in the comments, you are facing undefined behavior, so
don't be surprised by tt being assigned zero
don't be surprised by tt not being assigned zero after insignificant code changes (e.g. reordering the initialization from "int i,tt;" to "int tt, i;" or vice versa)
don't be surprised by tt not being assigned zero after compiling with different flags, with a different compiler version, for a different platform, or when testing with different input
don't be surprised by anything. Any behavior is possible.
You can't expect this code to work one way or another, so don't ever use it in a real program.
However, you seem to be OK with that, and the question is "what is actually happening with tt". IMHO this question is really great: it reveals a passion to understand programming more deeply, and it helps in digging into the lower layers. So let's get started.
Possible explanation
I failed to reproduce the behavior on VS2015, but the situation is quite clear. The actual data alignment, variable sizes, endianness, stack growth direction and other details may differ on your PC, but the general idea should be the same.
The variables i, tt, c, d, l are local, so they are stored on the stack. Let's assume sizeof(int) is 4 and sizeof(long long) is 8, which is quite common. Then one possible layout places tt in the 4 bytes directly after c (the original answer illustrated this with a diagram of the stack, not reproduced here; addresses grow from left to right, each cell representing one byte).
When doing the scanf, you pass the address of c for filling with data. But the data written is 8 bytes, so the bytes of both c and tt are overwritten. With a little-endian representation, the high half of the value lands on tt, so zeroes are written to tt unless a really big number is entered by the user, while c actually gets valid data for small numbers.
However, the valid data in c will be overwritten the same way while filling d, and the same will happen to d while filling l. So only l gets a nonzero value in the described case. An easy test: enter large numbers for c, d and l and check whether tt is still zero.
How to get a precise answer
You can get all the answers from the assembly code. Enable a disassembly listing (exact steps depend on the toolchain: gcc has the -S option, Visual Studio has a "Go To Disassembly" item in the context menu while on a breakpoint) and analyze the listing. It's really helpful to see the exact instructions your CPU is going to execute. Some debuggers allow executing instructions one by one. So you need to find out how the variables are aligned on the stack and when exactly they are overwritten. Analyzing scanf is hard for beginners, so you can start with a simplified version of your program: replace scanf with the following (can't test, but it should work):
*((long long *)(&c)) = 1; //or any other user specified value
*((long long *)(&d)) = 2;
*((long long *)(&l)) = 3;
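For comparison, the undefined behaviour disappears once the format specifiers match the variable types; a fixed sketch of the original loop:

    #include <cstdio>

    int main() {
        int i, tt = 5;
        for (i = 0; i < tt; i++) {
            long long c, d, l;                        // now matches %lld
            if (std::scanf("%lld%lld%lld", &c, &d, &l) != 3)
                break;                                // stop on bad input
            std::printf("%lld %lld %lld %d\n", c, d, l, tt);
        }
        return 0;
    }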

Is it bad practice to operate on a structure and assign the result to the same structure? Why?

I don't recall seeing examples of code like this hypothetical snippet:
cpu->dev.bus->uevent = (cpu->dev.bus->uevent) >> 16; //or the equivalent using a macro
in which a member in a large structure gets dereferenced using pointers, operated on, and the result assigned back to the same field of the structure.
The kernel seems to be a place where such large structures are frequent, but I haven't seen examples of this pattern there, and I became curious as to why.
Is there a performance reason for this, maybe related to the time required to follow the pointers? Is it simply not good style and if so, what is the preferred way?
There's nothing wrong with the statement syntactically, but it's easier to code it like this:
cpu->dev.bus->uevent >>= 16;
It's much more a matter of history: the kernel is mostly written in C (not C++), and in the original development intention (the K&R era) C was thought of as a "high-level assembler", whose statements and expressions should have a literal correspondence in ASM. In that environment, ++i, i += 1 and i = i + 1 were completely different things that translated into completely different CPU instructions.
Compiler optimizations, at that time, were not so advanced and popular, so following the pointer chain twice was often avoided by first storing the resulting destination address in a local temporary variable (most likely a register) and then doing the assignment
(like int* p = &cpu->dev.bus->uevent; *p = *p >> 16;)
or by using a compound assignment like cpu->dev.bus->uevent >>= 16;.
With today's computers (multicore processors, multilevel caches and pipelining), execution inside registers can be ten times faster than memory access, so following three pointers is faster than storing an address in memory and reloading it, thus reversing the old priorities.
Compiler optimization, then, can freely change the produced code to tune it for size or speed, depending on which is considered more important and on what kind of processor you are working with.
So, nowadays, it doesn't really matter whether you write ++i or i += 1 or i = i + 1: the compiler will most likely produce the same code, attempting to access i only once. And following the pointer chain twice will most likely be rewritten as equivalent to cpu->dev.bus->uevent >>= 16, since >>= corresponds to a single machine instruction on x86-derivative processors.
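You can verify that claim yourself: compile the three variants below with, say, gcc -O2 -S (or paste them into Compiler Explorer) and diff the output; mainstream compilers emit identical code for all three.

    int pre(int i)    { ++i;       return i; }
    int plus(int i)   { i += 1;    return i; }
    int assign(int i) { i = i + 1; return i; }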
That said ("it doesn't really matter"), it is also true that code style tend to reflect stiles and fashions of the age it was first written (since further developers tend to maintain consistency).
You code is not "bad" by itself, it just looks "odd" in the place it is usually written.
Just to give you an idea of what piping and prediction is. consider the comparison of two vectors:
bool equal(size_t n, int* a, int* b)
{
    for (size_t i = 0; i < n; ++i)
        if (a[i] != b[i]) return false;
    return true;
}
Here, as soon as we find something different we shortcut, saying the two are different.
Now consider this:
bool equal(size_t n, int* a, int* b)
{
    register size_t c = 0;
    for (register size_t i = 0; i < n; ++i)
        c += (a[i] == b[i]);
    return c == n;
}
There is no shortcut: even if we find a difference, we continue to loop and count.
But having removed the if from inside the loop, if n isn't that big (let's say less than 20) this can be 4 or 5 times faster!
An optimizing compiler can even recognize this situation and, provided there are no differing side effects, rework the first version into the second!
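If you want to check that speed claim on your own machine, a rough timing harness might look like this (a sketch of my own; results depend heavily on n, the data, compiler flags and CPU):

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    bool equal_branchy(std::size_t n, const int* a, const int* b) {
        for (std::size_t i = 0; i < n; ++i)
            if (a[i] != b[i]) return false;
        return true;
    }

    bool equal_branchless(std::size_t n, const int* a, const int* b) {
        std::size_t c = 0;
        for (std::size_t i = 0; i < n; ++i)
            c += (a[i] == b[i]);
        return c == n;
    }

    int main() {
        const std::size_t n = 16;               // "n isn't that big"
        std::vector<int> a(n, 1), b(n, 1);      // equal vectors, so no early exit
        volatile bool sink = false;             // keeps the calls from being optimized away
        auto t0 = std::chrono::steady_clock::now();
        for (int k = 0; k < 1000000; ++k) sink = equal_branchy(n, a.data(), b.data());
        auto t1 = std::chrono::steady_clock::now();
        for (int k = 0; k < 1000000; ++k) sink = equal_branchless(n, a.data(), b.data());
        auto t2 = std::chrono::steady_clock::now();
        using us = std::chrono::microseconds;
        std::printf("branchy:    %lld us\n", (long long)std::chrono::duration_cast<us>(t1 - t0).count());
        std::printf("branchless: %lld us\n", (long long)std::chrono::duration_cast<us>(t2 - t1).count());
        return 0;
    }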
I see nothing wrong with something like that, it appears as innocuous as:
i = i + 42;
If you're accessing the data items a lot, you could consider something like:
tSomething *cdb = cpu->dev.bus;
cdb->uevent = cdb->uevent >> 16;
// and many more accesses to cdb here
but, even then, I'd tend to leave it to the optimiser, which tends to do a better job than most humans anyway :-)
There's nothing inherently wrong by doing
cpu->dev.bus->uevent = (cpu->dev.bus->uevent) >> 16;
but depending on the type of uevent, you need to be careful when shifting right like that, so you don't accidentally shift unexpected bits into your value. For instance, if it's a 64-bit value
uint64_t uevent = 0xDEADBEEF00000000;
uevent = uevent >> 16; // now uevent is 0x0000DEADBEEF0000;
if you thought you were shifting a 32-bit value and then pass the new uevent to a function taking a 64-bit parameter, you're not passing 0xBEEF0000, as you might have expected. Since the sizes match (a 64-bit value passed as a 64-bit parameter), you won't get any compiler warnings here (which you would get if you passed a 64-bit value as a 32-bit parameter).
Also interesting to note is that the above operation, while similar to
i = ++i;
which is undefined behavior (see http://josephmansfield.uk/articles/c++-sequenced-before-graphs.html for details), is still well defined, since there are no side effects in the right-hand side expression.

C++ using boolean evaluations for array positions (jump table)

I have a C++ if statement which looks like this (pseudo code - all variables are ints):
if (x < y) {
    c += d;
}
else {
    c += f;
}
and I am thinking of trying to remove the if statement and instead load the values d and f into a two-element array:
array[0] = d;
array[1] = f;
and then I would like to be able to refer to array element 0 or 1 based upon the underlying value of the boolean (at least in C, 0 or 1). Is there any way to do this? So my code would change to be something like:
c += array[(x < y)]; if this is true, c increments by f, otherwise, if it's false, c increments by d.
Can I do this, using the boolean result to look up the array index?
Of course you can do it. However, chances are that you are only going to make it worse. If you think that you are removing a branch in this case, you are mistaken. Assuming a production-quality compiler and the x86_64 architecture, your first version will result in a nice conditional move (i.e. cmovge). The second version, however, will result in an extra level of indirection and a read from memory (i.e. mov eax, DWORD PTR [rax*4+0x4005d0]).
If you accept suggestions, I have a very bad feeling that you are on a very, very wrong path right now. When you are optimizing your program, you have to first measure/profile to determine a bottleneck. Only when you know where the bottlenecks are can you start optimizing them. When optimizing, you have to measure/profile again to see whether there is an improvement. What you seem to be doing is not trusting your compiler, guessing, and doing false optimization. I recommend you stop right there, or else it will go downhill from here, trust me.
You could replace the if statement with the following if you want more compact code.
c += (x < y) ? d : f;
Yes, that will work, although it will make your code harder to understand, and modern compilers will eliminate the if statement anyway (when translating to assembly).
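For completeness, a self-contained sketch of both spellings (note that the element order has to match the original if: index 0 is the else/false case, index 1 the true case):

    #include <cstdio>

    int main() {
        int x = 3, y = 7, d = 10, f = 20;

        // table variant: (x < y) converts to exactly 0 or 1, a valid index
        int c = 100;
        int table[2] = { f, d };        // [0] = false case (else), [1] = true case
        c += table[x < y];

        // ternary variant, usually clearer; compilers emit a cmov for either
        int c2 = 100;
        c2 += (x < y) ? d : f;

        std::printf("%d %d\n", c, c2);  // both print 110
        return 0;
    }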

Post increment operator behavior [duplicate]

Possible Duplicate:
Pre & post increment operator behavior in C, C++, Java, & C#
Here is a test case:
void foo(int i, int j)
{
    printf("%d %d", i, j);
}
...
test = 0;
foo(test++, test);
I would expect to get a "0 1" output, but I get "0 0"
What gives??
This is an example of unspecified behavior. The standard does not say what order arguments should be evaluated in. This is a compiler implementation decision. The compiler is free to evaluate the arguments to the function in any order.
In this case, it looks like it actually processes the arguments right to left instead of the expected left to right.
In general, doing side-effects in arguments is bad programming practice.
Instead of foo(test++, test); you should write foo(test, test+1); test++;
It would be semantically equivalent to what you are trying to accomplish.
Edit:
As Anthony correctly points out, it is undefined to both read and modify a single variable without an intervening sequence point. So in this case, the behavior is indeed undefined. So the compiler is free to generate whatever code it wants.
This is not just unspecified behaviour, it is actually undefined behaviour.
Yes, the order of argument evaluation is unspecified, but it is undefined to both read and modify a single variable without an intervening sequence point unless the read is solely for the purpose of computing the new value. There is no sequence point between the evaluations of function arguments, so f(test,test++) is undefined behaviour: test is being read for one argument and modified for the other. If you move the modification into a function then you're fine:
int preincrement(int* p)
{
    return ++(*p);
}

int test;
printf("%d %d\n", preincrement(&test), test);
This is because there is a sequence point on entry and exit to preincrement, so the call must be evaluated either before or after the simple read. Now the order is just unspecified.
Note also that the comma operator provides a sequence point, so
int dummy;
dummy = (test++, test);
is fine --- the increment happens before the read, so dummy is set to the new value. (Note the parentheses: without them, dummy = test++, test; parses as (dummy = test++), test; because assignment binds tighter than the comma operator, and dummy would get the old value.)
Everything I said originally is WRONG! The point in time at which the side-effect is calculated is unspecified. Visual C++ will perform the increment after the call to foo() if test is a local variable, but if test is declared static or global it will be incremented before the call to foo() and produce different results, although the final value of test will be correct.
The increment should really be done in a separate statement after the call to foo(). Even if the behaviour were specified in the C/C++ standard, it would be confusing. You would think that C++ compilers would flag this as a potential error.
Here is a good description of sequence points and unspecified behaviour.
<----START OF WRONG WRONG WRONG---->
The "++" bit of "test++" gets executed after the call to foo. So you pass in (0,0) to foo, not (1,0)
Here is the assembler output from Visual Studio 2002:
mov ecx, DWORD PTR _i$[ebp]
push ecx
mov edx, DWORD PTR tv66[ebp]
push edx
call _foo
add esp, 8
mov eax, DWORD PTR _i$[ebp]
add eax, 1
mov DWORD PTR _i$[ebp], eax
The increment is done AFTER the call to foo(). While this behavior is by design, it is certainly confusing to the casual reader and should probably be avoided. The increment should really be done in a separate statement after the call to foo()
<----END OF WRONG WRONG WRONG ---->
It's "unspecified behavior", but in practice with the way the C call stack is specified it almost always guarantees that you will see it as 0, 0 and never 1, 0.
As someone noted, the assembler output by VC pushes the rightmost parameter onto the stack first. This is how C function calls are implemented in assembler, to accommodate C's "endless parameter list" feature. By pushing parameters in right-to-left order, the first parameter is guaranteed to be the top item on the stack.
Take printf's signature:
int printf(const char *format, ...);
Those ellipses denote an unknown number of parameters. If parameters were pushed left-to-right, the format would be at the bottom of a stack of which we don't know the size.
Knowing that in C (and C++) parameters are written left-to-right, we can determine the simplest way of parsing and interpreting a function call: get to the end of the parameter list and start pushing, evaluating any complex expressions as you go.
However, even this can't save you, as most C compilers have an option to compile functions "Pascal style", which simply means that the function parameters are pushed on the stack in left-to-right order. If, for instance, printf were compiled with the Pascal option, then the output would most likely be 1, 0 (however, since printf uses the ellipsis, I don't think it can be compiled Pascal style).
C doesn't guarantee the order of evaluation of parameters in a function call, so with this you might get the results "0 1" or "0 0". The order can change from compiler to compiler, and the same compiler could choose different orders based on optimization parameters.
It's safer to write foo(test, test + 1) and then do ++test in the next line. Anyway, the compiler should optimize it if possible.
The order of evaluation for arguments to a function call is unspecified. In this case it appears the compiler evaluated them right-to-left.
(Modifying and reading a variable without an intervening sequence point basically allows a compiler to do anything it wants.)
Um, now that the OP has been edited for consistency, it is out of sync with the answers. The fundamental answer about order of evaluation is correct. However the specific possible values are different for the foo(++test, test); case.
With ++test, the increment happens before the value is passed, so the first argument will always be 1. The second argument will be 0 or 1, depending on evaluation order.
According to the C standard, it is undefined behaviour to have more than one reference to a variable without an intervening sequence point (think of the end of a full statement, or the point of a function call) when one or more of those references includes a pre/post modification.
So:
foo(f++,f) <--undefined as to when f increments.
And likewise (I see this all the time in user code):
*p = p++ + p;
Typically a compiler will not change its behaviour for this type of thing (except for major revisions).
Avoid it by turning on warnings and paying attention to them.
To repeat what others have said, this is not just unspecified behavior, but rather undefined. This program can legally output anything or nothing, leave test at any value, or send insulting email to your boss.
As a matter of practice, compiler writers will usually just do what's easiest for them to write, which generally means that the program will fetch test once or twice, call the function, and do the increment at some point. This, like any other conceivable behavior, is just fine according to the standard. There is no reason to expect the same behavior between compilers, or versions, or with different compiler options. There is no reason why two different but similar-looking examples in the same program have to be compiled consistently, although that's the way I'd bet.
In short, don't do this. Test it under different circumstances if you're curious, but don't pretend that there is a single correct or even predictable result.
The compiler might not be evaluating the arguments in the order you'd expect.