Trying to understand simple disassembled code from g++ - c++

I am still struggling with g++ inline assembler and trying to understand how to use it.
I've adapted a piece of code from here: http://asm.sourceforge.net/articles/linasm.html (Quoted from the "Assembler Instructions with C Expressions Operands" section in gcc info files)
static inline uint32_t sum0() {
uint32_t foo = 1, bar=2;
uint32_t ret;
__asm__ __volatile__ (
"add %%ebx,%%eax"
: "=eax"(ret) // ouput
: "eax"(foo), "ebx"(bar) // input
: "eax" // modify
);
return ret;
}
I've compiled disabling optimisations:
g++ -Og -O0 inline1.cpp -o test
The disassembled code puzzles me:
(gdb) disassemble sum0
Dump of assembler code for function sum0():
0x00000000000009de <+0>: push %rbp ;prologue...
0x00000000000009df <+1>: mov %rsp,%rbp ;prologue...
0x00000000000009e2 <+4>: movl $0x1,-0xc(%rbp) ;initialize foo
0x00000000000009e9 <+11>: movl $0x2,-0x8(%rbp) ;initialize bar
0x00000000000009f0 <+18>: mov -0xc(%rbp),%edx ;
0x00000000000009f3 <+21>: mov -0x8(%rbp),%ecx ;
0x00000000000009f6 <+24>: mov %edx,-0x14(%rbp) ; This is unexpected
0x00000000000009f9 <+27>: movd -0x14(%rbp),%xmm1 ; why moving variables
0x00000000000009fe <+32>: mov %ecx,-0x14(%rbp) ; to extended registers?
0x0000000000000a01 <+35>: movd -0x14(%rbp),%xmm2 ;
0x0000000000000a06 <+40>: add %ebx,%eax ; add (as expected)
0x0000000000000a08 <+42>: movd %xmm0,%edx ; copying the wrong result to ret
0x0000000000000a0c <+46>: mov %edx,-0x4(%rbp) ; " " " " " "
0x0000000000000a0f <+49>: mov -0x4(%rbp),%eax ; " " " " " "
0x0000000000000a12 <+52>: pop %rbp ;
0x0000000000000a13 <+53>: retq
End of assembler dump.
As expected, the sum0() function returns the wrong value.
Any thoughts? What is going on? How to get it right?
-- EDIT --
Based on #MarcGlisse comment, I tried:
static inline uint32_t sum0() {
uint32_t foo = 1, bar=2;
uint32_t ret;
__asm__ __volatile__ (
"add %%ebx,%%eax"
: "=a"(ret) // ouput
: "a"(foo), "b"(bar) // input
: "eax" // modify
);
return ret;
}
It seems that the tutorial I've been following is misleading. "eax" in the output/input field does not mean the register itself, but e,a,x abbreviations on the abbrev table.
Anyway, I still do not get it right. The code above results in a compilation error: 'asm' operand has impossible constraints.
I don't see why.

The Extended inline assembly constraints for x86 are listed in the official documentation.
The complete documentation is also worth reading.
As you can see, the constraints are all single letters.
The constraint "eax" fo foo specifies three constraints:
a
The a register.
x
Any SSE register.
e
32-bit signed integer constant, or ...
Since you are telling GCC that eax is clobbered it cannot put the input operand there and it picks xmm0.
When the compiler selects the registers to use to represent the input operands, it does not use any of the clobbered registers
The proper constraint is simply "a".
You need to remove eax (by the way it should be rax due to zeroing of the upper bits) from the clobbers (and add "cc").

Related

Why does my program not check the value of a bitfield member even though there is an "if" statement?

I wrote this program as a test case for the behavior of bit field member comparisons in C++ (I suppose the same behavior would be exhibited in C as well):
#include <cstdint>
#include <cstdio>
union Foo
{
int8_t bar;
struct
{
#if __BYTE_ORDER == __LITTLE_ENDIAN
int8_t baz : 1;
int8_t quux : 7;
#elif __BYTE_ORDER == __BIG_ENDIAN
int8_t quux : 7;
int8_t baz : 1;
#endif
};
};
int main()
{
Foo foo;
scanf("%d", &foo.bar);
if (foo.baz == 1)
printf("foo.baz == 1\n");
else
printf("foo.baz != 1\n");
}
After I compile and run it with 1 as its input, I get the following output:
foo.baz != 1
*** stack smashing detected ***: terminated
fish: “./a.out” terminated by signal SIGABRT (Abort)
One would expect that the foo.baz == 1 check would be evaluated as true since baz is always the least significant bit in the anonymous bit field. However, the opposite seems to happen, as can be seen from the program output (which is, somewhat comfortingly, consistently the same across each program invocation).
Even more weird to me is the fact that the generated AMD64 assembly code for the program (using the GCC 10.2 compiler) does not contain even a single comparison or jump instruction!
.LC0:
.string "%d"
.LC1:
.string "foo.baz != 1"
main:
push rbp
mov rbp, rsp
sub rsp, 16
lea rax, [rbp-1]
mov rsi, rax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call scanf
mov edi, OFFSET FLAT:.LC1
call puts
mov eax, 0
leave
ret
It seems that the C++ code for the if statement somehow gets optimized out (or something like that), even though I compiled the program with the default settings (i.e. I did not turn on any level of optimization or anything like that).
Interestingly enough, Clang 10.0.1 (when run without optimizations) seems to generate code with a cmp instruction (as well as a jne and a jmp one):
main: # #main
push rbp
mov rbp, rsp
sub rsp, 16
mov dword ptr [rbp - 4], 0
lea rax, [rbp - 8]
movabs rdi, offset .L.str
mov rsi, rax
mov al, 0
call scanf
mov cl, byte ptr [rbp - 8]
shl cl, 7
sar cl, 7
movsx edx, cl
cmp edx, 1
jne .LBB0_2
movabs rdi, offset .L.str.1
mov al, 0
call printf
jmp .LBB0_3
.LBB0_2:
movabs rdi, offset .L.str.2
mov al, 0
call printf
.LBB0_3:
mov eax, dword ptr [rbp - 4]
add rsp, 16
pop rbp
ret
.L.str:
.asciz "%d"
.L.str.1:
.asciz "foo.baz == 1\n"
.L.str.2:
.asciz "foo.baz != 1\n"
Both of the printf strings also seem to be present in the data segment (unlike in the GCC case when only the second one is present). I cannot tell for sure (because I'm not very proficient in assembly) but this seems to be properly generated code (unlike the one which GCC generates).
However, as soon as I try compile with any kind of optimizations (even -O1) using Clang, the comparisons/jumps are gone (as well as the foo.baz == 1 string), and the generated code seems to be very similar to the one which GCC generates:
(with -O1)
main: # #main
push rax
mov rsi, rsp
mov edi, offset .L.str
xor eax, eax
call scanf
mov edi, offset .Lstr
call puts
xor eax, eax
pop rcx
ret
.L.str:
.asciz "%d"
.Lstr:
.asciz "foo.baz != 1"
(You may want to check the generated assembly code by different compiler versions yourself using Compiler Explorer.)
I'm totally perplexed by this kind of unintuitive behavior. The only thing which comes to mind as an explanation is the interaction of some weird undefined behavior of bitfields containing signed integral types and unions. What makes me think so is that after I replace the signed integer types with their unsigned counterparts, the output of the program becomes exactly as one would expect (with 1 as input):
foo.baz == 1
*** stack smashing detected ***: terminated
fish: “./a.out” terminated by signal SIGABRT (Abort)
Naturally, the program crashing because of a stack smashing (just like before) is something which is not supposed to happen, which leads to my second question: why does this occur?
Here's the modified program:
#include <cstdint>
#include <cstdio>
union Foo
{
uint8_t bar;
struct
{
#if __BYTE_ORDER == __LITTLE_ENDIAN
uint8_t baz : 1;
uint8_t quux : 7;
#elif __BYTE_ORDER == __BIG_ENDIAN
uint8_t quux : 7;
uint8_t baz : 1;
#endif
};
};
int main()
{
Foo foo;
scanf("%d", &foo.bar);
if (foo.baz == 1)
printf("foo.baz == 1\n");
else
printf("foo.baz != 1\n");
}
... and the generated assembly code by GCC:
.LC0:
.string "%d"
.LC1:
.string "foo.baz == 1"
.LC2:
.string "foo.baz != 1"
main:
push rbp
mov rbp, rsp
sub rsp, 16
lea rax, [rbp-1]
mov rsi, rax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call scanf
movzx eax, BYTE PTR [rbp-1]
and eax, 1
test al, al
je .L2
mov edi, OFFSET FLAT:.LC1
call puts
jmp .L3
.L2:
mov edi, OFFSET FLAT:.LC2
call puts
.L3:
mov eax, 0
leave
ret
The stack smashing has nothing to do with member access.
scanf("%d", &foo.bar);
The %d format conversion specifier is for an int. Which is, typically, 4 bytes. But your bar is:
int8_t bar;
just one byte.
So, scanf ends up writing a 4 bytes worth of an int value into a one byte bar, and clobbering three additional bytes in the immediate vicinity.
There's your stack smash.
The answer is trivial.
your baz struct member is 1 bit long and it is signed. So it will never be 1. The only possibe values are 0 and -1.
Compiler knows that so the condition foo.baz == 1 will never be the truth. No conditional code has to be generated.
So I afraid it is not the compiler bug, only the programmer bug :)
So if we change the code to:
int main()
{
union Foo foo;
int x;
scanf("%d", &x);
foo.bar = x;
if (foo.baz == -1)
printf("foo.baz == -1\n");
else
printf("foo.baz != -1\n");
}
Compiler starts to generate the conditional instructions.
https://godbolt.org/z/fzKMo5
BTW your endianess check does not make any sense here as endianess defines the byte order not the bit order
Not related to the code generation problem is use of the wrong scanf conversion specifier.

Error in simple g++ inline assembler

I'm trying to write a "hello world" program to test inline assembler in g++.
(still leaning AT&T syntax)
The code is:
#include <stdlib.h>
#include <stdio.h>
# include <iostream>
using namespace std;
int main() {
int c,d;
__asm__ __volatile__ (
"mov %eax,1; \n\t"
"cpuid; \n\t"
"mov %edx, $d; \n\t"
"mov %ecx, $c; \n\t"
);
cout << c << " " << d << "\n";
return 0;
}
I'm getting the following error:
inline1.cpp: Assembler messages:
inline1.cpp:18: Error: unsupported instruction `mov'
inline1.cpp:19: Error: unsupported instruction `mov'
Can you help me to get it done?
Tks
Your assembly code is not valid. Please carefully read on Extended Asm. Here's another good overview.
Here is a CPUID example code from here:
static inline void cpuid(int code, uint32_t* a, uint32_t* d)
{
asm volatile ( "cpuid" : "=a"(*a), "=d"(*d) : "0"(code) : "ebx", "ecx" );
}
Note the format:
first : followed by output operands: : "=a"(*a), "=d"(*d); "=a" is eax and "=b is ebx
second : followed by input operands: : "0"(code); "0" means that code should occupy the same location as output operand 0 (eax in this case)
third : followed by clobbered registers list: : "ebx", "ecx"
I kept #AMA answer as accepted one because it was complete enough. But I've put some thought on it and I concluded that it is not 100% correct.
The code I was trying to implement in GCC is the one below (Microsoft Visual Studio version).
int c,d;
_asm
{
mov eax, 1;
cpuid;
mov d, edx;
mov c, ecx;
}
When cpuid executes with eax set to 1, feature information is returned in ecx and edx.
The suggested code returns the values from eax ("=a") and edx (="d").
This can be easily seen at gdb:
(gdb) disassemble cpuid
Dump of assembler code for function cpuid(int, uint32_t*, uint32_t*):
0x0000000000000a2a <+0>: push %rbp
0x0000000000000a2b <+1>: mov %rsp,%rbp
0x0000000000000a2e <+4>: push %rbx
0x0000000000000a2f <+5>: mov %edi,-0xc(%rbp)
0x0000000000000a32 <+8>: mov %rsi,-0x18(%rbp)
0x0000000000000a36 <+12>: mov %rdx,-0x20(%rbp)
0x0000000000000a3a <+16>: mov -0xc(%rbp),%eax
0x0000000000000a3d <+19>: cpuid
0x0000000000000a3f <+21>: mov -0x18(%rbp),%rcx
0x0000000000000a43 <+25>: mov %eax,(%rcx) <== HERE
0x0000000000000a45 <+27>: mov -0x20(%rbp),%rax
0x0000000000000a49 <+31>: mov %edx,(%rax) <== HERE
0x0000000000000a4b <+33>: nop
0x0000000000000a4c <+34>: pop %rbx
0x0000000000000a4d <+35>: pop %rbp
0x0000000000000a4e <+36>: retq
End of assembler dump.
The code that generates something closer to what I want is (EDITED based on feedbacks on the comments):
static inline void cpuid2(uint32_t* d, uint32_t* c)
{
int a = 1;
asm volatile ( "cpuid" : "=d"(*d), "=c"(*c), "+a"(a) :: "ebx" );
}
The result is:
(gdb) disassemble cpuid2
Dump of assembler code for function cpuid2(uint32_t*, uint32_t*):
0x00000000000009b0 <+0>: push %rbp
0x00000000000009b1 <+1>: mov %rsp,%rbp
0x00000000000009b4 <+4>: push %rbx
0x00000000000009b5 <+5>: mov %rdi,-0x20(%rbp)
0x00000000000009b9 <+9>: mov %rsi,-0x28(%rbp)
0x00000000000009bd <+13>: movl $0x1,-0xc(%rbp)
0x00000000000009c4 <+20>: mov -0xc(%rbp),%eax
0x00000000000009c7 <+23>: cpuid
0x00000000000009c9 <+25>: mov %edx,%esi
0x00000000000009cb <+27>: mov -0x20(%rbp),%rdx
0x00000000000009cf <+31>: mov %esi,(%rdx)
0x00000000000009d1 <+33>: mov -0x28(%rbp),%rdx
0x00000000000009d5 <+37>: mov %ecx,(%rdx)
0x00000000000009d7 <+39>: mov %eax,-0xc(%rbp)
0x00000000000009da <+42>: nop
0x00000000000009db <+43>: pop %rbx
0x00000000000009dc <+44>: pop %rbp
0x00000000000009dd <+45>: retq
End of assembler dump.
Just to be clear... I know that there are better ways of doing it. But the purpose here is purely educational. Just want to understand how it works ;-)
-- edited (removed personal opinion) ---

What's the point of adding ud2 after writing to [ds:0x0]?

I was experimenting with GCC, trying to convince it to assume that certain portions of code are unreachable so as to take opportunity to optimize. One of my experiments gave me somewhat strange code. Here's the source:
#include <iostream>
#define UNREACHABLE {char* null=0; *null=0; return {};}
double test(double x)
{
if(x==-1) return -1;
else if(x==1) return 1;
UNREACHABLE;
}
int main()
{
std::cout << "Enter a number. Only +/- 1 is supported, otherwise I dunno what'll happen: ";
double x;
std::cin >> x;
std::cout << "Here's what I got: " << test(x) << "\n";
}
Here's how I compiled it:
g++ -std=c++11 test.cpp -O3 -march=native -S -masm=intel -Wall -Wextra
And the code of test function looks like this:
_Z4testd:
.LFB1397:
.cfi_startproc
fld QWORD PTR [esp+4]
fld1
fchs
fld st(0)
fxch st(2)
fucomi st, st(2)
fstp st(2)
jp .L10
je .L11
fstp st(0)
jmp .L7
.L10:
fstp st(0)
.p2align 4,,10
.p2align 3
.L7:
fld1
fld st(0)
fxch st(2)
fucomip st, st(2)
fstp st(1)
jp .L12
je .L6
fstp st(0)
jmp .L8
.L12:
fstp st(0)
.p2align 4,,10
.p2align 3
.L8:
mov BYTE PTR ds:0, 0
ud2 // This is redundant, isn't it?..
.p2align 4,,10
.p2align 3
.L11:
fstp st(1)
.L6:
rep; ret
What makes me wonder here is the code at .L8. Namely, it already writes to zero address, which guarantees segmentation fault unless ds has some non-default selector. So why the additional ud2? Isn't writing to zero address already guaranteed crash? Or does GCC not believe that ds has default selector and tries to make a sure-fire crash?
So, your code is writing to address zero (NULL) which in itself is defined to be "undefined behaviour". Since undefined behaviour covers anything, and most importantly for this case, "that it does what you may imagine that it would do" (in other words, writes to address zero rather than crashing). The compiler then decides to TELL you that by adding an UD2 instruction. It's also possible that it is to protect against continuing from a signal handler with further undefined behaviour.
Yes, most machines, under most circumstances, will crash for NULL accesses. But it's not 100% guaranteed, and as I said above, one can catch segfault in a signal handler, and then try to continue - it's really not a good idea to actually continue after trying to write to NULL, so the compiler adds UD2 to ensure you don't go on... It uses 2 bytes more of memory, beyond that I don't see what harm it does [after all, it's undefined what happens - if the compiler wished to do so, it could email random pictures from your filesystem to the Queen of England... I think UD2 is a better choice...]
It is interesting to spot that LLVM does this by itself - I have no special detection of NIL pointer access, but my pascal compiler compiles this:
program p;
var
ptr : ^integer;
begin
ptr := NIL;
ptr^ := 42;
end.
into:
0000000000400880 <__PascalMain>:
400880: 55 push %rbp
400881: 48 89 e5 mov %rsp,%rbp
400884: 48 c7 05 19 18 20 00 movq $0x0,0x201819(%rip) # 6020a8 <ptr>
40088b: 00 00 00 00
40088f: 0f 0b ud2
I'm still trying to figure out where in LLVM this happens and try to understand the purpose of the UD2 instruction itself.
I think the answer is here, in llvm/lib/Transforms/Utils/Local.cpp
void llvm::changeToUnreachable(Instruction *I, bool UseLLVMTrap) {
BasicBlock *BB = I->getParent();
// Loop over all of the successors, removing BB's entry from any PHI
// nodes.
for (succ_iterator SI = succ_begin(BB), SE = succ_end(BB); SI != SE; ++SI)
(*SI)->removePredecessor(BB);
// Insert a call to llvm.trap right before this. This turns the undefined
// behavior into a hard fail instead of falling through into random code.
if (UseLLVMTrap) {
Function *TrapFn =
Intrinsic::getDeclaration(BB->getParent()->getParent(), Intrinsic::trap);
CallInst *CallTrap = CallInst::Create(TrapFn, "", I);
CallTrap->setDebugLoc(I->getDebugLoc());
}
new UnreachableInst(I->getContext(), I);
// All instructions after this are dead.
BasicBlock::iterator BBI = I->getIterator(), BBE = BB->end();
while (BBI != BBE) {
if (!BBI->use_empty())
BBI->replaceAllUsesWith(UndefValue::get(BBI->getType()));
BB->getInstList().erase(BBI++);
}
}
In particular the comment in the middle, where it says "instead of falling through to the into random code". In your code there is no code following the NULL access, but imagine this:
void func()
{
if (answer == 42)
{
#if DEBUG
// Intentionally crash to avoid formatting hard disk for now
char *ptr = NULL;
ptr = 0;
#endif
// Format hard disk.
... some code to format hard disk ...
}
printf("We haven't found the answer yet\n");
...
}
So, this SHOULD crash, but if it doesn't the compiler will ensure that you do not continue after it... It makes UB crashes a little more obvious (and in this case prevents the hard disk from being formatted...)
I was trying to find out when this was introduced, but the function itself originates in 2007, but it's not used for exactly this purpose at the time, which makes it really hard to figure out why it is used this way.

Generated code not matching expectations with Extended ASM

I have a CpuFeatures class. The requirements for the class are simple: (1) preserve EBX or RBX, and (2) record the values returned from CPUID in EAX/EBX/ECX/EDX. I'm not sure the code being generated is the code I intended.
The CpuFeatures class code uses GCC Extended ASM. Here's the relevant code:
struct CPUIDinfo
{
word32 EAX;
word32 EBX;
word32 ECX;
word32 EDX;
};
bool CpuId(word32 func, word32 subfunc, CPUIDinfo& info)
{
uintptr_t scratch;
__asm__ __volatile__ (
".att_syntax \n"
#if defined(__x86_64__)
"\t xchgq %%rbx, %q1 \n"
#else
"\t xchgl %%ebx, %k1 \n"
#endif
"\t cpuid \n"
#if defined(__x86_64__)
"\t xchgq %%rbx, %q1 \n"
#else
"\t xchgl %%ebx, %k1 \n"
#endif
: "=a"(info.EAX), "=&r"(scratch), "=c"(info.ECX), "=d"(info.EDX)
: "a"(func), "c"(subfunc)
);
if(func == 0)
return !!info.EAX;
return true;
}
The code below was compiled with -g3 -Og on Cygwin i386. When I examine it under a debugger, I'm don't like what I am seeing.
Dump of assembler code for function CpuFeatures::DoDetectX86Features():
...
0x0048f355 <+1>: sub $0x48,%esp
=> 0x0048f358 <+4>: mov $0x0,%ecx
0x0048f35d <+9>: mov %ecx,%eax
0x0048f35f <+11>: xchg %ebx,%ebx
0x0048f361 <+13>: cpuid
0x0048f363 <+15>: xchg %ebx,%ebx
0x0048f365 <+17>: mov %eax,0x10(%esp)
0x0048f369 <+21>: mov %ecx,0x18(%esp)
0x0048f36d <+25>: mov %edx,0x1c(%esp)
0x0048f371 <+29>: mov %ebx,0x14(%esp)
0x0048f375 <+33>: test %eax,%eax
...
I don't like what I am seeing because it appears EBX/RBX is not being preserved (xchg %ebx,%ebx at +11). Additionally, it looks like the preserved EBX/RBX is being saved as the result of CPUID, and not the actual value of EBX returned by CPUID (xchg %ebx,%ebx at +15, before the mov %ebx,0x14(%esp) at +29).
If I change the operand to use a memory op with "=&m"(scratch), then the generated code is:
0x0048f35e <+10>: xchg %ebx,0x40(%esp)
0x0048f362 <+14>: cpuid
0x0048f364 <+16>: xchg %ebx,0x40(%esp)
A related question is What ensures reads/writes of operands occurs at desired times with extended ASM?
What am I doing wrong (besides wasting countless hours on something that should have taken 5 or 15 minutes)?
The code below is a complete example that I used to compile your example code above including the modification to exchange(swap) directly to the info.EBX variable.
#include <inttypes.h>
#define word32 uint32_t
struct CPUIDinfo
{
word32 EAX;
word32 EBX;
word32 ECX;
word32 EDX;
};
bool CpuId(word32 func, word32 subfunc, CPUIDinfo& info)
{
__asm__ __volatile__ (
".att_syntax \n"
#if defined(__x86_64__)
"\t xchgq %%rbx, %q1 \n"
#else
"\t xchgl %%ebx, %k1 \n"
#endif
"\t cpuid \n"
#if defined(__x86_64__)
"\t xchgq %%rbx, %q1 \n"
#else
"\t xchgl %%ebx, %k1 \n"
#endif
: "=a"(info.EAX), "=&m"(info.EBX), "=c"(info.ECX), "=d"(info.EDX)
: "a"(func), "c"(subfunc)
);
if(func == 0)
return !!info.EAX;
return true;
}
int main()
{
CPUIDinfo cpuInfo;
CpuId(1, 0, cpuInfo);
}
The first observation that you should make is that I chose to use the info.EBX memory location to do the actual swap to. This eliminates needing a another temporary variable or register.
I assembled as 32-bit code with -g3 -Og -S -m32 and got these instructions of interest:
xchgl %ebx, 4(%edi)
cpuid
xchgl %ebx, 4(%edi)
movl %eax, (%edi)
movl %ecx, 8(%edi)
movl %edx, 12(%edi)
%edi happens to contain the address of the info structure. 4(%edi) happens to be the address of info.EBX. We swap %ebx and 4(%edi) after cpuid. With that instruction ebx is restored to what it was before cpuid and 4(%edi) now has what ebx was right after cpuid was executed. The remaining movl lines place eax, ecx, edx registers into the rest of the info structure via the %edi register.
The generated code above is what I would expect it to be.
Your code with the scratch variable (and using the constraint "=&m"(scratch)) never gets used after the assembler template so %ebx,0x40(%esp) has the value you want but it never gets moved anywhere useful. You'd have to copy the scratch variable into info.EBX (ie. info.EBX = scratch;)and look at all of the resulting instructions that get generated. At some point the data would be copied from the scratch memory location to info.EBX among the generated assembly instructions.
Update - Cygwin and MinGW
I wasn't entirely satisfied that the Cygwin code output was correct. In the middle of the night I had an Aha! moment. Windows already does its own position independent code when the dynamic link loader loads an image (DLL etc) and modifies the image via re-basing. There is no need for additional PIC processing like it is done in Linux 32 bit shared libraries so there is no issue with ebx/rbx. This is why Cygwin and MinGW will present warnings like this when compiling with -fPIC
warning: -fPIC ignored for target (all code is position independent)
This is because under Windows all 32bit code can be re-based when it is loaded by the Windows dynamic loader. More about re-basing can be found in this Dr. Dobbs article. Information on the windows Portable Executable format (PE) can be found in this Wiki article. Cygwin and MinGW don't need to worry about preserving ebx/rbx when targeting 32bit code because on their platforms PIC is already handled by the OS, other re-basing tools, and the linker.

Using AT&T inline assembler for GCC

I'm writing a simple but a little specific program:
Purpose: calculate number from it's factorial
Requirements: all calculations must be done on gcc inline asm (at&t syntax)
Source code:
#include <iostream>
int main()
{
unsigned n = 0, f = 0;
std::cin >> n;
asm
(
"mov %0, %%eax \n"
"mov %%eax, %%ecx \n"
"mov 1, %%ebx \n"
"mov 1, %%eax \n"
"jmp cycle_start\n"
"cycle:\n"
"inc %%ebx\n"
"mul %%ebx\n"
"cycle_start:\n"
"cmp %%ecx, %%eax\n"
"jnz cycle\n"
"mov %%ebx, %1 \n":
"=r" (n):
"r" (f)
);
std::cout << f;
return 0;
}
This code causes SIGSEV.
Identic program on intel asm syntax (http://pastebin.com/2EqJmGAV) works fine. Why my "AT&T program" fails and how can i fix it?
#include <iostream>
int main()
{
unsigned n = 0, f = 0;
std::cin >> n;
__asm
{
mov eax, n
mov ecx, eax
mov eax, 1
mov ebx, 1
jmp cycle_start
cycle:
inc ebx
mul ebx
cycle_start:
cmp eax, ecx
jnz cycle
mov f, ebx
};
std::cout << f;
return 0;
}
UPD: Pushing to stack and restoring back used registers gives the same result: SIGSEV
You have your input and output the wrong way around.
So, start by altering
"=r" (n):
"r" (f)
to:
"=r" (f) :
"r" (n)
Then I suspect you'll want to tell the compiler about clobbers (registers you are using that aren't inputs or outputs):
So add:
: "eax", "ebx", "ecx"
after the two lines above.
I personally would make some other changes:
Use local labels (1: and 2: etc), which allows the code to be duplicated without "duplicate label".
Use %1 instead of %%ebx - that way, you are not using an extra register.
Move %0 directly to %%ecx. You are loading 1 into %%eax two instructions later, so what purpose has it got to do in %%eax?
[Now, I'ver written too much, and someone else has answered first... ]
Edit: And, as Anton points out, you need $1 to load the constant 1, 1 means read from address 1, which doesn't work well, and most likely is the cause of your problems
Hopefully there are no requirements to use nothing but gcc inline asm to figure it out. You can translate your AT&T example with nasm, then disassemble with objdump and see what's the right syntax.
I seem to recall that mov 1,%eax should be mov $1,%eax if you mean literal constant and not a memory reference.
An answer by #MatsPetersson is very useful regarding the interaction of your inline assembly with the compiler (clobbered/input/output registers). I've focused on the reason why you get SIGSEGV, and reading the address 1 does answer the question.