Using AT&T inline assembler for GCC - c++

I'm writing a simple but somewhat unusual program:
Purpose: calculate a number from its factorial
Requirements: all calculations must be done in GCC inline asm (AT&T syntax)
Source code:
#include <iostream>
int main()
{
unsigned n = 0, f = 0;
std::cin >> n;
asm
(
"mov %0, %%eax \n"
"mov %%eax, %%ecx \n"
"mov 1, %%ebx \n"
"mov 1, %%eax \n"
"jmp cycle_start\n"
"cycle:\n"
"inc %%ebx\n"
"mul %%ebx\n"
"cycle_start:\n"
"cmp %%ecx, %%eax\n"
"jnz cycle\n"
"mov %%ebx, %1 \n":
"=r" (n):
"r" (f)
);
std::cout << f;
return 0;
}
This code causes a SIGSEGV.
An identical program in Intel asm syntax (http://pastebin.com/2EqJmGAV) works fine. Why does my AT&T program fail, and how can I fix it?
#include <iostream>
int main()
{
unsigned n = 0, f = 0;
std::cin >> n;
__asm
{
mov eax, n
mov ecx, eax
mov eax, 1
mov ebx, 1
jmp cycle_start
cycle:
inc ebx
mul ebx
cycle_start:
cmp eax, ecx
jnz cycle
mov f, ebx
};
std::cout << f;
return 0;
}
UPD: Pushing the used registers onto the stack and restoring them afterwards gives the same result: SIGSEGV.

You have your input and output the wrong way around.
So, start by altering
"=r" (n):
"r" (f)
to:
"=r" (f) :
"r" (n)
Then I suspect you'll want to tell the compiler about clobbers (registers you are using that aren't inputs or outputs):
So add:
: "eax", "ebx", "ecx"
after the two lines above.
I personally would make some other changes:
Use local labels (1:, 2:, etc.), which allows the code to be duplicated without "duplicate label" errors.
Use %1 instead of %%ebx - that way, you are not using an extra register.
Move %0 directly to %%ecx. You are loading 1 into %%eax two instructions later, so what purpose does it serve in %%eax?
[Now I've written too much, and someone else has answered first...]
Edit: And, as Anton points out, you need $1 to load the constant 1. A plain 1 means "read from address 1", which doesn't work well and is most likely the cause of your problems.
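Putting the suggestions in this answer together, a corrected version might look like the sketch below. It is not the asker's code verbatim: inverse_factorial is a hypothetical wrapper, and it assumes n is an exact factorial that fits in 32 bits (1! through 12!). For other inputs the loop may never terminate.

```cpp
#include <cassert>

// $1 for immediates, output/input in the right order, clobbers declared,
// and local numeric labels (1:, 2:) so the asm can safely be inlined twice.
static unsigned inverse_factorial(unsigned n) {
    assert(n >= 1);
    unsigned k;
    asm("movl $1, %%eax\n\t"   // running product = 1
        "movl $1, %0\n\t"      // k = 1
        "jmp 2f\n"
        "1:\n\t"
        "incl %0\n\t"          // k = k + 1
        "mull %0\n\t"          // EDX:EAX = EAX * k (so EDX is clobbered)
        "2:\n\t"
        "cmpl %1, %%eax\n\t"   // product == n ?
        "jne 1b"
        : "=&r"(k)             // early clobber: k is written while n is still read
        : "r"(n)
        : "eax", "edx", "cc"); // registers and flags the template modifies
    return k;
}
```

For example, inverse_factorial(120) yields 5. The "=&r" early-clobber marker matters here because the loop keeps comparing against n after k has been written, so the two must not share a register.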

Hopefully there is no requirement to use nothing but GCC inline asm to figure it out. You can assemble your Intel-syntax example with nasm, then disassemble it with objdump (which prints AT&T syntax by default) and see what the right syntax is.
I seem to recall that mov 1,%eax should be mov $1,%eax if you mean a literal constant and not a memory reference.
The answer by Mats Petersson is very useful regarding how your inline assembly interacts with the compiler (clobbered/input/output registers). I've focused on why you get SIGSEGV, and reading from address 1 answers that question.

Related

What constraint is not correct with this inline assembly?

I have x86 (32-bit) inline assembly in C++ code being compiled for 32-bit Linux by gcc 9.3.0. At the end of the asm call, I get the error 'asm' has impossible constraints. It doesn't tell me which constraint(s) are "impossible" or why.
I understand the syntax of the __asm__(...) statement as documented here, and I spent some time trying to prove why each constraint is correct, but I can't figure out which constraints to add or remove, or the offending assembly that is violating a constraint.
Here's the code:
extern int view_pitch;
static int scale;
static int rle_remainder;
static int height;
static short *color_table;
static PIXEL *RLE_palette;
static int repeat;
static PIXEL *dest;
static UCHAR *src;
void RLE_blit(void)
{
__asm__ __volatile__ (
" mov %2, %%edi\n"
" push %%ebp\n"
" mov %3, %%edx\n"
"mloop:\n"
" xor %%ecx, %%ecx\n"
" mov (%%edx), %%cl\n"
" add $0x2, %%edx\n"
" imul %4, %%ecx\n"
" add %0, %%ecx\n"
" xor %%eax, %%eax\n"
" mov %%ecx, %%ebx\n"
" and $0xffff, %%ecx\n"
" shr $0x10, %%ebx\n"
" mov %%ecx, %0\n"
" cmp $0x0, %%ebx\n"
" jbe mloop\n"
// color region code
" mov -0x1(%%edx), %%al\n"
" mov %%eax, %%ecx\n" // save orginal pixel in eax
" and $0x1f, %%eax\n" // now modulo original pixel by 32 to get offset
" mov %5, %%esi\n" // pointer to color table
" jz bypass\n"
" shr $0x5, %%ecx\n" // divide pixel by 32 to get region
" mov (%%esi,%%ecx,2), %%cx\n" // get region number from region array into cx
" shl $0x5, %%ecx\n" // multiply by 32 to get start of region
" mov %6, %%esi\n"
" add %%ecx, %%eax\n" // add offset
" mov (%%esi,%%eax,2), %%ax\n"
" cmp %1, %%ebx\n" // is run length <= than remaining column height
" jbe no_run_len_adjust\n" // yes: don't adjust
" mov %1, %%ebx\n" // no : set run length to height
"no_run_len_adjust:\n"
" sub %%ebx, %1\n"
" mov %7, %%ebp\n"
" mov %8, %%esi\n"
"run_len_loop:\n" // BEGIN outer run length loop
" mov %%ebp, %%ecx\n"
"col_loop:\n" // BEGIN inner column repeat loop
" dec %%ecx\n"
" mov %%ax, (%%edi,%%ecx,2)\n"
" jnz col_loop\n" // END column repeat loop
" add %%esi, %%edi\n"
" dec %%ebx\n"
" jnz run_len_loop\n" // END run length loop
" cmp $0x0, %1\n"
" jg mloop\n"
" jmp exit1\n"
"bypass:\n"
" mov %%ebx, %%eax\n"
" imul %8, %%eax\n"
" add %%eax, %%edi\n"
" sub %%ebx, %1\n"
" cmp $0x0, %1\n"
" jg mloop\n"
"exit1:\n"
" pop %%ebp\n"
:"+m"(rle_remainder), "+m"(height)
:"m"(dest), "m"(src), "m"(scale), "m"(color_table), "m"(RLE_palette), "m"(repeat), "m"(view_pitch)
:"esi", "edi", "eax", "ebx", "ecx", "edx", "memory"
);
}
Here's the rundown of why I think what I have is right:
Outputs: rle_remainder and height are both read and written-to, so the +m is appropriate.
Inputs: dest, src, scale, color_table, RLE_palette, repeat and view_pitch appear only to be read from; they aren't in the "destination" position of any instruction. So the m means they're being (only) read from.
The list of clobbered registers ("esi", "edi", "eax", "ebx", "ecx", "edx") contains all registers that are modified. Since some of the modified registers are 8 or 16-bit "parts" of these registers, like ax, these parts of the registers don't need to be explicitly specified, because the whole 32-bit register they're a part of is already being listed as clobbered.
"memory" is listed as clobbered to provide a read/write barrier for memory addresses being written to, not only the memory locations of the outputs themselves. For example, mov %%ax, (%%edi,%%ecx,2) clobbers memory.
I didn't use the goto keyword, because no C labels are being used -- the only labels being jumped to here are labels within this block of inline assembly.
The frame pointer ebp gets modified during the course of the code, but restored at the end, so I didn't list it as clobbered. If I do list it, gcc throws an error, because I believe you're no longer allowed to tell gcc that you clobbered the frame pointer. Either way, the code fails to compile with or without ebp listed as clobbered.
Speculation: Could it be complaining that ebp gets modified at all? Even though we restore the value of it back to the original at the end of the code -- guaranteed -- unless the program crashes? The docs weren't clear on whether the rule is "you can't modify ebp at all, even temporarily" or "if you modify ebp, the value at the end of your inline asm must be restored to the original". If it's the former, then my code is wrong, because ebp gets modified during the execution of this code.
Clearly I am missing something about gcc's expectations of either the constraints or the assembly, but I can't spot what it is.
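The question's reading of "+m" (read and written in memory) and "r"/"m" (read only) can be checked in isolation. A minimal sketch, where remaining and consume are hypothetical names standing in for a static like rle_remainder:

```cpp
#include <cassert>

static int remaining = 10;  // hypothetical stand-in for a static like rle_remainder

static void consume(int amount) {
    __asm__ __volatile__("subl %1, %0"
                         : "+m"(remaining)  // read and written in memory
                         : "r"(amount)      // read-only input in a register
                         : "cc");           // SUBL updates EFLAGS
}
```

This only illustrates the constraint semantics; it does not explain the "impossible constraints" error, which depends on how many registers are left for GCC to address the memory operands.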

How to get an argument from stack in x64 assembly?

I'm trying to write a procedure in x64 assembly.
I'm calling it from a main program written in C++, and I'm passing several parameters. I know that the first 4 will be in specific registers and the rest (should be) on the stack. What's more, I read that before taking the 5th argument from the stack, I should subtract 40 from RSP. At the beginning it worked. Later I needed to check the address of something, so I did it with cout and &. But after that, taking the 5th argument from the stack no longer worked, and I have no idea what I should do.
fragment of C++ code:
std::cout << xOld << '\t' << &xOld << std::endl;
std::cout << xOld[0] << '\t' << &xOld[0] << std::endl;
SthInAsm(A, B, alfa, beta, n, xOld, xNew, lowerBound, upperBound, condition, isReady, precision, maxIterations);
fragment of Asm code:
.data
Aaddr DQ 0
Baddr DQ 0
alfa DQ 0
beta DQ 0
n DQ 0
xOld DQ 0
.
.
.
.code
SthInAsm PROC
MOV Aaddr, RCX
MOV Baddr, RDX
MOV alfa, R8
MOV beta, R9
SUB RSP, 40
XOR RAX, RAX
POP n
MOV RAX, n
.
.
.
After 'MOV RAX, n', RAX doesn't contain the value of n. When I didn't print the address with cout before calling this function, it worked.
Does anyone know what is the problem here?
Thanks to Jester, I know what is wrong in my code. I must have misunderstood something when I read about x64 assembly: I shouldn't subtract from RSP.
Instead of that, getting arguments from stack works when I write:
MOV RAX, QWORD PTR [RSP+40]
MOV RAX, QWORD PTR [RSP+48]
etc.
Thank you Jester again!
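The offsets that work follow from the Microsoft x64 calling convention's stack layout at procedure entry. A small C++ sketch of the arithmetic; stack_arg_offset is a hypothetical helper, and it assumes every argument occupies one 8-byte slot, as integers and pointers do:

```cpp
#include <cassert>
#include <cstddef>

// Offset from RSP of the k-th (1-based, k >= 5) argument under the Microsoft x64
// convention, measured at procedure entry:
//   [RSP]        return address pushed by CALL                 (8 bytes)
//   [RSP+8..39]  shadow space the caller reserves for RCX, RDX, R8, R9 (32 bytes)
//   [RSP+40]     5th argument, then one 8-byte slot per further argument
// rsp_adjust: bytes the callee has already moved RSP down (SUB RSP, n or pushes).
constexpr std::size_t stack_arg_offset(std::size_t k, std::size_t rsp_adjust = 0) {
    return rsp_adjust + 8 + 32 + 8 * (k - 5);
}

static_assert(stack_arg_offset(5) == 40, "5th argument: [RSP+40]");
static_assert(stack_arg_offset(6) == 48, "6th argument: [RSP+48]");
```

It also shows why SUB RSP, 40 followed by POP n went wrong: after the subtraction the 5th argument sits at [RSP+80], while POP reads whatever happens to be at [RSP].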

Trying to understand simple disassembled code from g++

I am still struggling with g++ inline assembler and trying to understand how to use it.
I've adapted a piece of code from here: http://asm.sourceforge.net/articles/linasm.html (Quoted from the "Assembler Instructions with C Expressions Operands" section in gcc info files)
static inline uint32_t sum0() {
uint32_t foo = 1, bar=2;
uint32_t ret;
__asm__ __volatile__ (
"add %%ebx,%%eax"
: "=eax"(ret) // ouput
: "eax"(foo), "ebx"(bar) // input
: "eax" // modify
);
return ret;
}
I've compiled with optimisations disabled:
g++ -Og -O0 inline1.cpp -o test
The disassembled code puzzles me:
(gdb) disassemble sum0
Dump of assembler code for function sum0():
0x00000000000009de <+0>: push %rbp ;prologue...
0x00000000000009df <+1>: mov %rsp,%rbp ;prologue...
0x00000000000009e2 <+4>: movl $0x1,-0xc(%rbp) ;initialize foo
0x00000000000009e9 <+11>: movl $0x2,-0x8(%rbp) ;initialize bar
0x00000000000009f0 <+18>: mov -0xc(%rbp),%edx ;
0x00000000000009f3 <+21>: mov -0x8(%rbp),%ecx ;
0x00000000000009f6 <+24>: mov %edx,-0x14(%rbp) ; This is unexpected
0x00000000000009f9 <+27>: movd -0x14(%rbp),%xmm1 ; why moving variables
0x00000000000009fe <+32>: mov %ecx,-0x14(%rbp) ; to extended registers?
0x0000000000000a01 <+35>: movd -0x14(%rbp),%xmm2 ;
0x0000000000000a06 <+40>: add %ebx,%eax ; add (as expected)
0x0000000000000a08 <+42>: movd %xmm0,%edx ; copying the wrong result to ret
0x0000000000000a0c <+46>: mov %edx,-0x4(%rbp) ; " " " " " "
0x0000000000000a0f <+49>: mov -0x4(%rbp),%eax ; " " " " " "
0x0000000000000a12 <+52>: pop %rbp ;
0x0000000000000a13 <+53>: retq
End of assembler dump.
As expected, the sum0() function returns the wrong value.
Any thoughts? What is going on? How to get it right?
-- EDIT --
Based on Marc Glisse's comment, I tried:
static inline uint32_t sum0() {
uint32_t foo = 1, bar=2;
uint32_t ret;
__asm__ __volatile__ (
"add %%ebx,%%eax"
: "=a"(ret) // ouput
: "a"(foo), "b"(bar) // input
: "eax" // modify
);
return ret;
}
It seems that the tutorial I've been following is misleading. "eax" in the output/input field does not name the register itself; it is read as the three single-letter constraints e, a and x from the constraint table.
Anyway, I still do not get it right. The code above results in a compilation error: 'asm' operand has impossible constraints.
I don't see why.
The Extended inline assembly constraints for x86 are listed in the official documentation.
The complete documentation is also worth reading.
As you can see, the constraints are all single letters.
The constraint "eax" for foo specifies three constraints:
a
The a register.
x
Any SSE register.
e
32-bit signed integer constant, or ...
Since you are telling GCC that eax is clobbered it cannot put the input operand there and it picks xmm0.
When the compiler selects the registers to use to represent the input operands, it does not use any of the clobbered registers
The proper constraint is simply "a".
You need to remove eax (by the way it should be rax due to zeroing of the upper bits) from the clobbers (and add "cc").
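Applied to the question's function, the fix looks like the sketch below: single-letter register constraints, no "eax" clobber (EAX is already an operand), and "cc" because ADD changes the flags.

```cpp
#include <cassert>
#include <cstdint>

static inline std::uint32_t sum0() {
    std::uint32_t foo = 1, bar = 2;
    std::uint32_t ret;
    __asm__ __volatile__(
        "addl %%ebx, %%eax"
        : "=a"(ret)            // output: EAX
        : "a"(foo), "b"(bar)   // inputs: EAX, EBX
        : "cc");               // ADDL modifies EFLAGS
    return ret;
}
```

Input and output may both use "a" because GCC assumes inputs are consumed before outputs are written; it is only the clobber list that must not name a register used by an operand.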

Code blocks Ver.16.01 crashing during run cycle of programme

I have a program that has been shown to run on an older version of Code::Blocks (13.12) but does not seem to work on the newer version (16.01). The purpose of the program is to read two integers, which are then added, multiplied, etc. It uses asm code, which I am new to. Why does Windows say the program has stopped responding after I type two integers and press Enter?
Here is the code:
//Program 16
#include <stdio.h>
#include <iostream>
using namespace std;
int main() {
int arg1, arg2, add, sub, mul, quo, rem ;
cout << "Enter two integer numbers : " ;
cin >> arg1 >> arg2 ;
cout << endl;
asm ( "addl %%ebx, %%eax;" : "=a" (add) : "a" (arg1) , "b" (arg2) );
asm ( "subl %%ebx, %%eax;" : "=a" (sub) : "a" (arg1) , "b" (arg2) );
asm ( "imull %%ebx, %%eax;" : "=a" (mul) : "a" (arg1) , "b" (arg2) );
asm ( "movl $0x0, %%edx;"
"movl %2, %%eax;"
"movl %3, %%ebx;"
"idivl %%ebx;" : "=a" (quo), "=d" (rem) : "g" (arg1), "g" (arg2) );
cout<< arg1 << "+" << arg2 << " = " << add << endl;
cout<< arg1 << "-" << arg2 << " = " << sub << endl;
cout<< arg1 << "x" << arg2 << " = " << mul << endl;
cout<< arg1 << "/" << arg2 << " = " << quo << " ";
cout<< "remainder " << rem << endl;
return 0;
}
As Michael has said, your problem probably comes from your 4th asm statement being written incorrectly.
The first thing you need to understand when writing inline asm is what registers are and how they are used. Registers are a fundamental concept in x86 assembler programming, so if you don't know what they are, it's time for you to find an x86 assembly language primer.
Once you've got that, you need to understand that when the compiler runs, it is using those registers in the code it generates. For example, if you write for (int x=0; x<10; x++), x is (probably) going to end up in a register. So what happens if gcc decides to use ebx to hold the value of 'x', and then your asm statement stomps on ebx, putting some other value in it? gcc doesn't 'parse' your asm to figure out what you are doing. The only clues it has about what your asm does are the constraints listed after the asm instructions.
That's what Michael means when he says "the 4th ASM block doesn't list "EBX" in the clobber list (but its contents are destroyed)". If we look at your asm:
asm ("movl $0x0, %%edx;"
"movl %2, %%eax;"
"movl %3, %%ebx;"
"idivl %%ebx;"
: "=a" (quo), "=d" (rem)
: "g" (arg1), "g" (arg2));
You see that the 3rd line is moving a value into ebx, but there's nothing in the constraints that follow to say that it is going to be changed. The fact that your program is crashing is probably due to gcc using that register for something else. The simplest fix might be to "list EBX in the clobber list":
asm ("movl $0x0, %%edx;"
"movl %2, %%eax;"
"movl %3, %%ebx;"
"idivl %%ebx;"
: "=a" (quo), "=d" (rem)
: "g" (arg1), "g" (arg2)
: "ebx");
This tells gcc that ebx may be changed by the asm (aka it 'clobbers' it), and that it doesn't need to have any particular value when the asm statement begins, and won't have any particular value in it when the asm exits.
However, while that may be 'simplest,' it isn't necessarily the best. For example instead of using the "g" constraint for arg2, we can use the "b" constraint:
asm ("movl $0x0, %%edx;"
"movl %2, %%eax;"
"idivl %%ebx;"
: "=a" (quo), "=d" (rem)
: "g" (arg1), "b" (arg2));
This lets us get rid of the movl %3, %%ebx statement, since gcc will ensure the value is in ebx before calling the asm, and we don't need to clobber it anymore.
But why use ebx? idiv doesn't require any particular register there, and maybe gcc is already using ebx for something else. How about letting gcc just pick some register it isn't using? We do this using the "r" constraint:
asm ("movl $0x0, %%edx;"
"movl %2, %%eax;"
"idivl %3;"
: "=a" (quo), "=d" (rem)
: "g" (arg1), "r" (arg2));
Notice that the idiv now uses %3, which means "use the thing that is in the (zero-based) parameter #3." In this case, that's the register that contains arg2.
However, we can still do better. As you have already seen in your previous asm statements, you can use the "a" constraint to tell gcc to put a particular variable into the eax register. Which means we can do this:
asm ("movl $0x0, %%edx;"
"idivl %3;"
: "=a" (quo), "=d" (rem)
: "a" (arg1), "r" (arg2));
Again, 1 fewer instruction since we don't need to move the value into eax anymore. So how about that movl $0x0, %%edx thing? Well, we can get rid of that too:
asm ("idivl %3"
: "=a" (quo), "=d" (rem)
: "a" (arg1), "r" (arg2), "d" (0));
This uses the "d" constraint to put 0 into edx before executing the asm. That brings us to my final version:
asm ("idivl %3"
: "=a" (quo), "=d" (rem)
: "a" (arg1), "r" (arg2), "d" (0)
: "cc");
This says:
On input, put arg1 into eax, arg2 into some register (that we'll refer to using %3), and 0 into edx.
On output, eax will contain the quotient, edx will contain the remainder. This is how the idiv instruction works.
The "cc" clobber tells gcc that your asm modifies the flags registers (eflags), which idiv does as a side effect.
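The final statement can be wrapped in a function to try it out. A sketch, where divmod is a hypothetical name, with one deliberate change: EDX is seeded with the sign of arg1 (what cltd would produce) instead of 0. idivl divides the signed 64-bit value EDX:EAX, so "d" (0) is only correct for non-negative dividends.

```cpp
#include <cassert>

static void divmod(int arg1, int arg2, int& quo, int& rem) {
    assert(arg2 != 0);  // idivl raises a divide error (SIGFPE) on a zero divisor
    asm("idivl %3"
        : "=a"(quo), "=d"(rem)                         // quotient in EAX, remainder in EDX
        : "a"(arg1), "r"(arg2), "d"(arg1 < 0 ? -1 : 0) // EDX:EAX = sign-extended arg1
        : "cc");
}
```

INT_MIN divided by -1 also overflows the quotient and traps, so a real caller would need to rule that case out as well.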
Now, despite having described all this, I usually think using inline asm is a bad idea. It's cool, it's powerful, it gives interesting insight into how the gcc compiler works. But look at all the weird things you "just have to know" in order to work with this. And as you have noticed, if you get any of them wrong, weird things can happen.
It's true all these things are documented in gcc's docs. The simple constraints (like "r" and "g") are documented here. The specific register constraints for the x86 are in the 'x86 family' here. And the detailed description of all the asm features is here. So if you must use this stuff (for example if you are supporting some existing code that uses this), the information is out there.
But there's a much shorter read here that gives you a whole list of reasons not to use inline asm. That's the read I'd recommend. Stick with C, and let the compiler handle all that register junk for you.
PS While I'm at this:
asm ( "addl %2, %0" : "=r" (add) : "0" (arg1) , "r" (arg2) : "cc");
asm ( "subl %2, %0" : "=r" (sub) : "0" (arg1) , "r" (arg2) : "cc");
asm ( "imull %2, %0" : "=r" (mul) : "0" (arg1) , "r" (arg2) : "cc");
Check out the gcc docs to see what it means to use a digit in an input operand.
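A digit such as "0" in an input constraint ties that input to the output operand with that number, so GCC loads arg1 into the same register it picked for the result. The sketch below (hypothetical function names) puts that form next to the "+r" read-write form, which expresses the same relationship with a single operand:

```cpp
#include <cassert>

static int add_matching(int arg1, int arg2) {
    int add;
    asm("addl %2, %0" : "=r"(add) : "0"(arg1), "r"(arg2) : "cc");
    return add;
}

static int add_plus(int arg1, int arg2) {
    int acc = arg1;  // holds the input, then the result
    asm("addl %1, %0" : "+r"(acc) : "r"(arg2) : "cc");
    return acc;
}

static int mul_matching(int arg1, int arg2) {
    int mul;
    asm("imull %2, %0" : "=r"(mul) : "0"(arg1), "r"(arg2) : "cc");
    return mul;
}
```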
David Wohlferd has given a very good answer on how to better work with GCC extended assembly templates to do the work of your existing code.
A question may arise as to why the code presented fails with Codeblocks 16.01 w/GCC, whereas it may have worked previously. As it stands the code looks pretty simple, so what could possibly have gone wrong?
The best thing I recommend is learning to use the debugger and set break points in Codeblocks. It is very simple (but beyond the scope of this answer). You can learn more about debugging in the Codeblocks documentation.
If you used the debugger with Codeblocks 16.01, with a stock C++ console project you may have discovered that the program is giving you an Arithmetic Exception on the IDIV instruction in the assembly template. This is what appears in my console output:
Program received signal SIGFPE, Arithmetic exception.
These lines of code do as you would expect:
asm ( "addl %%ebx, %%eax;" : "=a" (add) : "a" (arg1) , "b" (arg2) );
asm ( "subl %%ebx, %%eax;" : "=a" (sub) : "a" (arg1) , "b" (arg2) );
asm ( "imull %%ebx, %%eax;" : "=a" (mul) : "a" (arg1) , "b" (arg2) );
This is where we have issues:
asm ( "movl $0x0, %%edx;"
"movl %2, %%eax;"
"movl %3, %%ebx;"
"idivl %%ebx;" : "=a" (quo), "=d" (rem) : "g" (arg1), "g" (arg2) );
One thing Codeblocks can do for you is show you the assembly code it generated. Pull down the Debug menu, select Debugging Windows > and Disassembly. The Watches and CPU Registers windows I highly recommend as well.
If you review the generated code with CodeBlocks 16.01 w/GCC you might discover it produced this:
/* Automatically produced by the assembly template for input constraints */
mov -0x20(%ebp),%eax /* EAX = value of arg1 */
mov -0x24(%ebp),%edx /* EDX = value of arg2 */
/* Our assembly template instructions */
mov $0x0,%edx /* EDX = 0 - we just clobbered the previous EDX! */
mov %eax,%eax /* EAX remains the same */
mov %edx,%ebx /* EBX = EDX = 0. */
idiv %ebx /* EBX is 0 so this is division by zero!! */
/* Automatically produced by the assembly template for output constraints */
mov %eax,-0x18(%ebp) /* Value at quo = EAX */
mov %edx,-0x1c(%ebp) /* Value at rem = EDX */
I have commented the code and it should be obvious why this code won't work. We effectively ended up placing zero in EBX and then attempted to use that as a divisor with IDIV and that produced an arithmetic exception (division by zero in this case).
This happened because GCC will (by default) assume that all the input operands are used (consumed) BEFORE the output operands are written to. We never told GCC that it couldn't use the same registers for the input operands as for the output operands. GCC considers this situation an Early Clobber. It provides a mechanism to mark an output constraint as early clobber using the & (ampersand) modifier:
`&'
Means (in a particular alternative) that this operand is an earlyclobber operand, which is modified before the instruction is finished using the input operands. Therefore, this operand may not lie in a register that is used as an input operand or as part of any memory address.
To deal with the early clobbers, we can place & on both output constraints like this:
"idivl %%ebx;" : "=&a" (quo), "=&d" (rem) : "g" (arg1), "g" (arg2) );
In this case arg1 and arg2 will not be passed in through any of the operands marked with &. This means this code will avoid using EAX and EDX for the input operands arg1 and arg2.
The other issue is that EBX is modified by your code but you don't tell GCC. You could simply add EBX to the clobber list in the assembly template like this:
"idivl %%ebx;" : "=&a" (quo), "=&d" (rem) : "g" (arg1), "g" (arg2) : "ebx");
So this code should work, but is not efficient:
asm ( "movl $0x0, %%edx;"
"movl %2, %%eax;"
"movl %3, %%ebx;"
"idivl %%ebx;" : "=&a" (quo), "=&d" (rem) : "g" (arg1), "g" (arg2) : "ebx");
The generated code will now look something like:
/* Automatically produced by the assembler template for input constraints */
mov -0x30(%ebp),%ecx /* ECX = value of arg1 */
mov -0x34(%ebp),%esi /* ESI = value of arg2 */
/* Our assembly template instructions */
mov $0x0,%edx /* EDX = 0 */
mov %ecx,%eax /* EAX = ECX = arg1 */
mov %esi,%ebx /* EBX = ESI = arg2 */
idiv %ebx
/* Automatically produced by the assembler template for output constraints */
mov %eax,-0x28(%ebp) /* Value at quo = EAX */
mov %edx,-0x2c(%ebp) /* Value at rem = EDX */
This time the input operands for arg1 and arg2 didn't share the same registers that would conflict with the MOV instructions inside our inline assembly template.
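The early-clobber version can be wrapped in a function (divmod_early_clobber is a hypothetical name) to check that it behaves. Two additions to the code above, which are my assumptions rather than part of the original: "cc" is listed because idivl changes the flags, and cltd replaces movl $0x0, %edx so that negative dividends are sign-extended into EDX correctly.

```cpp
#include <cassert>

static void divmod_early_clobber(int arg1, int arg2, int& quo, int& rem) {
    assert(arg2 != 0);
    asm("movl %2, %%eax\n\t"   // EAX = arg1 (arg1/arg2 are never in EAX/EDX due to &)
        "cltd\n\t"             // EDX = sign extension of EAX
        "movl %3, %%ebx\n\t"   // EBX = arg2 (EBX is declared clobbered)
        "idivl %%ebx"
        : "=&a"(quo), "=&d"(rem)
        : "g"(arg1), "g"(arg2)
        : "ebx", "cc");
}
```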
Why do other (including older) versions of GCC work?
If GCC had generated instructions using registers other than EAX, EDX, and EBX for the arg1 and arg2 operands, it would have worked. But the fact that it may have worked was just luck. To see what happened with older Codeblocks and the GCC that came with it, I'd recommend reviewing the code generated in that environment the same way I have discussed above.
Early clobbering, and register clobbering in general, are reasons that extended assembler templates can be tricky, and reasons they should be used sparingly, especially if you don't have a solid understanding of them.
You can create code that appears to work, but is coded incorrectly. A different version of GCC or even different optimization levels may alter the behaviour of the code. Sometimes these bugs can be so subtle that as a program grows the bug manifests itself in other ways that may be hard to trace.
Another rule of thumb is that not all code you find on the internet is bug free, and the subtle complexities of extended inline assembly are often overlooked in tutorials. I discovered the code you used seems to be based on this Code Project. Unfortunately the author didn't have a thorough understanding of the intricacies involved. The code may have worked at the time, but not necessarily now.

Generated code not matching expectations with Extended ASM

I have a CpuFeatures class. The requirements for the class are simple: (1) preserve EBX or RBX, and (2) record the values returned from CPUID in EAX/EBX/ECX/EDX. I'm not sure the code being generated is the code I intended.
The CpuFeatures class code uses GCC Extended ASM. Here's the relevant code:
struct CPUIDinfo
{
word32 EAX;
word32 EBX;
word32 ECX;
word32 EDX;
};
bool CpuId(word32 func, word32 subfunc, CPUIDinfo& info)
{
uintptr_t scratch;
__asm__ __volatile__ (
".att_syntax \n"
#if defined(__x86_64__)
"\t xchgq %%rbx, %q1 \n"
#else
"\t xchgl %%ebx, %k1 \n"
#endif
"\t cpuid \n"
#if defined(__x86_64__)
"\t xchgq %%rbx, %q1 \n"
#else
"\t xchgl %%ebx, %k1 \n"
#endif
: "=a"(info.EAX), "=&r"(scratch), "=c"(info.ECX), "=d"(info.EDX)
: "a"(func), "c"(subfunc)
);
if(func == 0)
return !!info.EAX;
return true;
}
The code below was compiled with -g3 -Og on Cygwin i386. When I examine it under a debugger, I don't like what I see.
Dump of assembler code for function CpuFeatures::DoDetectX86Features():
...
0x0048f355 <+1>: sub $0x48,%esp
=> 0x0048f358 <+4>: mov $0x0,%ecx
0x0048f35d <+9>: mov %ecx,%eax
0x0048f35f <+11>: xchg %ebx,%ebx
0x0048f361 <+13>: cpuid
0x0048f363 <+15>: xchg %ebx,%ebx
0x0048f365 <+17>: mov %eax,0x10(%esp)
0x0048f369 <+21>: mov %ecx,0x18(%esp)
0x0048f36d <+25>: mov %edx,0x1c(%esp)
0x0048f371 <+29>: mov %ebx,0x14(%esp)
0x0048f375 <+33>: test %eax,%eax
...
I don't like what I am seeing because it appears EBX/RBX is not being preserved (xchg %ebx,%ebx at +11). Additionally, it looks like the preserved EBX/RBX is being saved as the result of CPUID, and not the actual value of EBX returned by CPUID (xchg %ebx,%ebx at +15, before the mov %ebx,0x14(%esp) at +29).
If I change the operand to use a memory op with "=&m"(scratch), then the generated code is:
0x0048f35e <+10>: xchg %ebx,0x40(%esp)
0x0048f362 <+14>: cpuid
0x0048f364 <+16>: xchg %ebx,0x40(%esp)
A related question is What ensures reads/writes of operands occurs at desired times with extended ASM?
What am I doing wrong (besides wasting countless hours on something that should have taken 5 or 15 minutes)?
The code below is a complete example I used to compile your code above, including the modification to exchange (swap) directly with the info.EBX variable.
#include <inttypes.h>
#define word32 uint32_t
struct CPUIDinfo
{
word32 EAX;
word32 EBX;
word32 ECX;
word32 EDX;
};
bool CpuId(word32 func, word32 subfunc, CPUIDinfo& info)
{
__asm__ __volatile__ (
".att_syntax \n"
#if defined(__x86_64__)
"\t xchgq %%rbx, %q1 \n"
#else
"\t xchgl %%ebx, %k1 \n"
#endif
"\t cpuid \n"
#if defined(__x86_64__)
"\t xchgq %%rbx, %q1 \n"
#else
"\t xchgl %%ebx, %k1 \n"
#endif
: "=a"(info.EAX), "=&m"(info.EBX), "=c"(info.ECX), "=d"(info.EDX)
: "a"(func), "c"(subfunc)
);
if(func == 0)
return !!info.EAX;
return true;
}
int main()
{
CPUIDinfo cpuInfo;
CpuId(1, 0, cpuInfo);
}
The first thing to note is that I chose to swap directly with the info.EBX memory location. This eliminates the need for another temporary variable or register.
I assembled as 32-bit code with -g3 -Og -S -m32 and got these instructions of interest:
xchgl %ebx, 4(%edi)
cpuid
xchgl %ebx, 4(%edi)
movl %eax, (%edi)
movl %ecx, 8(%edi)
movl %edx, 12(%edi)
%edi happens to contain the address of the info structure. 4(%edi) happens to be the address of info.EBX. We swap %ebx and 4(%edi) after cpuid. With that instruction ebx is restored to what it was before cpuid and 4(%edi) now has what ebx was right after cpuid was executed. The remaining movl lines place eax, ecx, edx registers into the rest of the info structure via the %edi register.
The generated code above is what I would expect it to be.
Your code with the scratch variable (using the constraint "=&m"(scratch)) never uses scratch after the assembler template, so 0x40(%esp) ends up holding the value you want, but it is never moved anywhere useful. You'd have to copy the scratch variable into info.EBX (i.e. info.EBX = scratch;) and look at all of the resulting instructions that get generated. At some point the data would be copied from the scratch memory location to info.EBX among the generated assembly instructions.
Update - Cygwin and MinGW
I wasn't entirely satisfied that the Cygwin code output was correct. In the middle of the night I had an Aha! moment. Windows already handles position independence itself: when the dynamic loader loads an image (DLL, etc.), it modifies the image via re-basing. There is no need for the additional PIC processing done for 32-bit Linux shared libraries, so there is no issue with ebx/rbx. This is why Cygwin and MinGW present warnings like this when compiling with -fPIC:
warning: -fPIC ignored for target (all code is position independent)
This is because under Windows all 32-bit code can be re-based when it is loaded by the Windows dynamic loader. More about re-basing can be found in this Dr. Dobbs article. Information on the Windows Portable Executable (PE) format can be found in this Wiki article. Cygwin and MinGW don't need to worry about preserving ebx/rbx when targeting 32-bit code because on their platforms PIC is already handled by the OS, other re-basing tools, and the linker.
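With current compilers the manual xchg may not be needed at all. A minimal sketch, assuming GCC 5 or later (which, to my understanding, no longer reserves EBX as a fixed PIC register on 32-bit x86; on x86-64 RBX was never reserved) and using std::uint32_t in place of the question's word32. The "=b" constraint lets the compiler save and restore EBX/RBX itself if it needs to.

```cpp
#include <cassert>
#include <cstdint>

struct CPUIDinfo {
    std::uint32_t EAX, EBX, ECX, EDX;
};

bool CpuId(std::uint32_t func, std::uint32_t subfunc, CPUIDinfo& info) {
    __asm__ __volatile__("cpuid"
                         : "=a"(info.EAX), "=b"(info.EBX), "=c"(info.ECX), "=d"(info.EDX)
                         : "a"(func), "c"(subfunc));
    if (func == 0)
        return !!info.EAX;  // leaf 0: EAX = highest supported standard leaf
    return true;
}
```

On older GCC versions targeting 32-bit PIC, this form is expected to be rejected, and the xchg approach from the answer above remains the fallback.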