What constraint is not correct with this inline assembly? - c++

I have x86 (32-bit) inline assembly in C++ code being compiled for 32-bit Linux by gcc 9.3.0. At the end of the asm call, I get the error 'asm' has impossible constraints. It doesn't tell me which constraint(s) are "impossible" or why.
I understand the syntax of the __asm__(...) statement as documented here, and I spent some time trying to prove why each constraint is correct, but I can't figure out which constraints to add or remove, or the offending assembly that is violating a constraint.
Here's the code:
extern int view_pitch;
static int scale;
static int rle_remainder;
static int height;
static short *color_table;
static PIXEL *RLE_palette;
static int repeat;
static PIXEL *dest;
static UCHAR *src;
void RLE_blit(void)
{
__asm__ __volatile__ (
" mov %2, %%edi\n"
" push %%ebp\n"
" mov %3, %%edx\n"
" xor %%ecx, %%ecx\n"
" mov (%%edx), %%cl\n"
" add $0x2, %%edx\n"
" imul %4, %%ecx\n"
" add %0, %%ecx\n"
" xor %%eax, %%eax\n"
" mov %%ecx, %%ebx\n"
" and $0xffff, %%ecx\n"
" shr $0x10, %%ebx\n"
" mov %%ecx, %0\n"
" cmp $0x0, %%ebx\n"
" jbe mloop\n"
// color region code
" mov -0x1(%%edx), %%al\n"
" mov %%eax, %%ecx\n" // save original pixel in ecx
" and $0x1f, %%eax\n" // now modulo original pixel by 32 to get offset
" mov %5, %%esi\n" // pointer to color table
" jz bypass\n"
" shr $0x5, %%ecx\n" // divide pixel by 32 to get region
" mov (%%esi,%%ecx,2), %%cx\n" // get region number from region array into cx
" shl $0x5, %%ecx\n" // multiply by 32 to get start of region
" mov %6, %%esi\n"
" add %%ecx, %%eax\n" // add offset
" mov (%%esi,%%eax,2), %%ax\n"
" cmp %1, %%ebx\n" // is run length <= remaining column height?
" jbe no_run_len_adjust\n" // yes: don't adjust
" mov %1, %%ebx\n" // no : set run length to height
" sub %%ebx, %1\n"
" mov %7, %%ebp\n"
" mov %8, %%esi\n"
"run_len_loop:\n" // BEGIN outer run length loop
" mov %%ebp, %%ecx\n"
"col_loop:\n" // BEGIN inner column repeat loop
" dec %%ecx\n"
" mov %%ax, (%%edi,%%ecx,2)\n"
" jnz col_loop\n" // END column repeat loop
" add %%esi, %%edi\n"
" dec %%ebx\n"
" jnz run_len_loop\n" // END run length loop
" cmp $0x0, %1\n"
" jg mloop\n"
" jmp exit1\n"
" mov %%ebx, %%eax\n"
" imul %8, %%eax\n"
" add %%eax, %%edi\n"
" sub %%ebx, %1\n"
" cmp $0x0, %1\n"
" jg mloop\n"
" pop %%ebp\n"
:"+m"(rle_remainder), "+m"(height)
:"m"(dest), "m"(src), "m"(scale), "m"(color_table), "m"(RLE_palette), "m"(repeat), "m"(view_pitch)
:"esi", "edi", "eax", "ebx", "ecx", "edx", "memory"
);
}
Here's the rundown of why I think what I have is right:
Outputs: rle_remainder and height are both read and written-to, so the +m is appropriate.
Inputs: dest, src, scale, color_table, RLE_palette, repeat and view_pitch appear only to be read from; they aren't in the "destination" position of any instruction. So the m means they're being (only) read from.
The list of clobbered registers ("esi", "edi", "eax", "ebx", "ecx", "edx") contains all registers that are modified. Since some of the modified registers are 8 or 16-bit "parts" of these registers, like ax, these parts of the registers don't need to be explicitly specified, because the whole 32-bit register they're a part of is already being listed as clobbered.
"memory" is listed as clobbered to provide a read/write barrier for memory addresses being written to, not only the memory locations of the outputs themselves. For example, mov %%ax, (%%edi,%%ecx,2) clobbers memory.
I didn't use the goto keyword, because no C labels are being used -- the only labels being jumped to here are labels within this block of inline assembly.
The frame pointer ebp gets modified during the course of the code, but restored at the end, so I didn't list it as clobbered. If I do list it, gcc throws an error, because you're not allowed to tell gcc that you clobbered the frame pointer anymore, I don't think. Either way, the code fails to compile with or without ebp called out as clobbered.
Speculation: Could it be complaining that ebp gets modified at all? Even though we restore the value of it back to the original at the end of the code -- guaranteed -- unless the program crashes? The docs weren't clear on whether the rule is "you can't modify ebp at all, even temporarily" or "if you modify ebp, the value at the end of your inline asm must be restored to the original". If it's the former, then my code is wrong, because ebp gets modified during the execution of this code.
Clearly I am missing something about gcc's expectations of either the constraints or the assembly, but I can't spot what it is.
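As a smaller illustration of the same operand classes (not a diagnosis of the error above), here is a minimal sketch that uses one "+m" read-write operand, one "m" input, a register clobber and a flags clobber. The name sub_in_memory and its parameters are hypothetical, and the plain C++ branch exists only so the sketch still builds on non-x86 targets.

```cpp
// Hypothetical helper: computes value - step with both operands in memory.
int sub_in_memory(int value, int step) {
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(
        "movl %1, %%eax\n\t"   // load the "m" input into a scratch register
        "subl %%eax, %0\n\t"   // read-modify-write the "+m" operand
        : "+m"(value)          // read and written, kept in memory
        : "m"(step)            // only read, kept in memory
        : "eax", "cc");        // scratch register and flags are modified
#else
    value -= step;             // fallback for non-x86 targets (assumption)
#endif
    return value;
}
```

Note that an "m" operand can still need a register to form its address (for example in position-independent code), so a long clobber list leaves the compiler fewer registers to work with. Whether that is what triggers the error here is not something the question settles.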


How to get an argument from stack in x64 assembly?

I'm trying to write a procedure in x64 assembly.
I'm calling it from a main program written in C++ and passing several parameters. I know that the first 4 will be in specific registers and the rest of them should be on the stack. What's more, I read that before taking the 5th argument from the stack, I should subtract 40 from RSP. At the beginning it worked. Later I needed to check the address of something, so I did it with cout and &. But then taking the 5th argument from the stack didn't work, and I have no idea what I should do.
fragment of C++ code:
std::cout << xOld << '\t' << &xOld << std::endl;
std::cout << xOld[0] << '\t' << &xOld[0] << std::endl;
SthInAsm(A, B, alfa, beta, n, xOld, xNew, lowerBound, upperBound, condition, isReady, precision, maxIterations);
fragment of Asm code:
Aaddr DQ 0
Baddr DQ 0
alfa DQ 0
beta DQ 0
n DQ 0
xOld DQ 0
MOV Aaddr, RCX
MOV Baddr, RDX
MOV alfa, R8
MOV beta, R9
After 'MOV RAX, n', RAX doesn't contain the value of n. When I didn't print the address with cout before calling this function, it worked.
Does anyone know what is the problem here?
Thanks to Jester I know what is wrong in my code. I must have misunderstood something when I read about x64 assembly. I shouldn't subtract from RSP.
Instead of that, getting arguments from stack works when I write:
Thank you Jester again!
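For readers who want the arithmetic spelled out: the sketch below assumes the Microsoft x64 calling convention (the one that passes the first four arguments in RCX, RDX, R8 and R9) and computes where a stack-passed argument sits relative to RSP at function entry, before any push or sub. stack_arg_offset is a hypothetical helper for illustration only, not the code from this question.

```cpp
#include <cstddef>

// Hypothetical helper, assuming the Microsoft x64 calling convention:
//   [RSP+0]      return address (8 bytes)
//   [RSP+8..39]  shadow space for the four register arguments (32 bytes)
//   [RSP+40]     fifth argument, then one 8-byte slot per further argument
// Offsets are relative to RSP at function entry; arg_number is 1-based, >= 5.
constexpr std::size_t stack_arg_offset(std::size_t arg_number) {
    return 8 + 32 + 8 * (arg_number - 5);
}

static_assert(stack_arg_offset(5) == 40, "n, the fifth argument");
static_assert(stack_arg_offset(6) == 48, "xOld, the sixth argument");
```

Under these assumptions, n would be read from [RSP+40] and xOld from [RSP+48], with RSP left untouched.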

Trying to understand simple disassembled code from g++

I am still struggling with g++ inline assembler and trying to understand how to use it.
I've adapted a piece of code from here: http://asm.sourceforge.net/articles/linasm.html (Quoted from the "Assembler Instructions with C Expressions Operands" section in gcc info files)
static inline uint32_t sum0() {
uint32_t foo = 1, bar=2;
uint32_t ret;
__asm__ __volatile__ (
"add %%ebx,%%eax"
: "=eax"(ret) // output
: "eax"(foo), "ebx"(bar) // input
: "eax" // modify
);
return ret;
}
I've compiled disabling optimisations:
g++ -Og -O0 inline1.cpp -o test
The disassembled code puzzles me:
(gdb) disassemble sum0
Dump of assembler code for function sum0():
0x00000000000009de <+0>: push %rbp ;prologue...
0x00000000000009df <+1>: mov %rsp,%rbp ;prologue...
0x00000000000009e2 <+4>: movl $0x1,-0xc(%rbp) ;initialize foo
0x00000000000009e9 <+11>: movl $0x2,-0x8(%rbp) ;initialize bar
0x00000000000009f0 <+18>: mov -0xc(%rbp),%edx ;
0x00000000000009f3 <+21>: mov -0x8(%rbp),%ecx ;
0x00000000000009f6 <+24>: mov %edx,-0x14(%rbp) ; This is unexpected
0x00000000000009f9 <+27>: movd -0x14(%rbp),%xmm1 ; why moving variables
0x00000000000009fe <+32>: mov %ecx,-0x14(%rbp) ; to extended registers?
0x0000000000000a01 <+35>: movd -0x14(%rbp),%xmm2 ;
0x0000000000000a06 <+40>: add %ebx,%eax ; add (as expected)
0x0000000000000a08 <+42>: movd %xmm0,%edx ; copying the wrong result to ret
0x0000000000000a0c <+46>: mov %edx,-0x4(%rbp) ; " " " " " "
0x0000000000000a0f <+49>: mov -0x4(%rbp),%eax ; " " " " " "
0x0000000000000a12 <+52>: pop %rbp ;
0x0000000000000a13 <+53>: retq
End of assembler dump.
As expected, the sum0() function returns the wrong value.
Any thoughts? What is going on? How to get it right?
-- EDIT --
Based on @MarcGlisse's comment, I tried:
static inline uint32_t sum0() {
uint32_t foo = 1, bar=2;
uint32_t ret;
__asm__ __volatile__ (
"add %%ebx,%%eax"
: "=a"(ret) // output
: "a"(foo), "b"(bar) // input
: "eax" // modify
);
return ret;
}
It seems that the tutorial I've been following is misleading. "eax" in the output/input field does not mean the register itself, but e,a,x abbreviations on the abbrev table.
Anyway, I still do not get it right. The code above results in a compilation error: 'asm' operand has impossible constraints.
I don't see why.
The Extended inline assembly constraints for x86 are listed in the official documentation.
The complete documentation is also worth reading.
As you can see, the constraints are all single letters.
The constraint "eax" for foo specifies three constraints:
The a register.
Any SSE register.
32-bit signed integer constant, or ...
Since you are telling GCC that eax is clobbered it cannot put the input operand there and it picks xmm0.
When the compiler selects the registers to use to represent the input operands, it does not use any of the clobbered registers
The proper constraint is simply "a".
You need to remove eax (by the way it should be rax due to zeroing of the upper bits) from the clobbers (and add "cc").
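Putting those points together, a minimal sketch of the corrected function might look like the following: single-letter constraints, no "eax" in the clobber list, and "cc" added because add modifies the flags. The non-x86 branch is an assumption added only so the sketch builds elsewhere.

```cpp
#include <stdint.h>

static inline uint32_t sum0() {
    uint32_t foo = 1, bar = 2;
    uint32_t ret;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__(
        "add %%ebx, %%eax"
        : "=a"(ret)            // output: the sum is left in eax
        : "a"(foo), "b"(bar)   // inputs: foo in eax, bar in ebx
        : "cc");               // add modifies the flags
#else
    ret = foo + bar;           // fallback for non-x86 targets (assumption)
#endif
    return ret;
}
```

Because eax is already named as an output, the compiler knows it changes, so listing it again as a clobber is what produced the "impossible constraints" error.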

Reverse external array in assembly using swapping method - x86 MASM

I am working on a project where we need to pass an array of type char as a parameter and reverse the array. I feel like I am very close to getting it done, but I am stuck on the actual swapping process.
For my swapping function in my .asm, I used the same method I would in c++ (use an unused register as a temp, then swap the front and the back.) What I am not understanding is how would I go about changing the actual content at that address. I assumed performing the following would "change" the content at the destination address:
mov eax,[edx]
However, this did not work as planned. After I ran a for loop to iterate through the array again, everything stayed the same.
If anyone can point me in the right direction, it would be great. I have provided the code below with as many comments as I could.
Also, I am doing all this in a single .asm file; however, my professor wants me to have 3 separate .asm files, one for each of the following functions: swap, reverse, and getLength. I tried to include the other 2 .asm files in reverse.asm, but it kept giving me an error.
Assembly Code Starts:
.model flat
_reverse PROC
push ebp
mov ebp,esp ;Have ebp point to esp
mov ebx,[ebp+8] ;Point to beginning of array
mov eax,ebx
mov edx,1
mov ecx,0
mov edi,0
jmp getLength
getLength:
cmp ebp, 0 ;Counter to iterate until needed to stop
je setup
add ecx,1
mov ebp,[ebx+edx]
add edx,1
jmp getLength
setup: ;This is to set up the numbers correctly and get array length divided by 2
mov esi,ecx
mov edx,0
mov eax,ecx
mov ecx,2
div ecx
mov ecx,eax
add ecx,edx ;Set up ecx(Length of string) correctly by adding modulo if odd length string
mov eax,ebx
dec esi
jmp reverse
reverse: ;I started the reverse function by using a counter to iterate through length / 2
cmp edi, ecx
je allDone
mov ebx,eax ;Set ebx to the beginning of array
mov edx,eax ;Set edx to the beginning of array
add ebx,edi ;Move ebx to correct index to perform swap
add edx,esi ;Move edx to the back at the correct index
jmp swap ;Invoke swap function
swap:
mov ebp,ebx ;Move value to temp
mov ebx,[edx] ;Swap the back end value to the front
mov edx,[edx] ;Move temp to back
inc edi ;Increment to move up one index to set up next swap
dec esi ;Decrement to move back one index to set up for next swap
jmp reverse ;Jump back to reverse to setup next index swapping
allDone:
pop ebp
_reverse ENDP
C++ Code starts:
#include <iostream>
#include <string>
using namespace std;
extern "C" char reverse(char*);
int main()
{
const int SIZE = 20;
char str1[SIZE] = { NULL };
cout << "Please enter a string: ";
cin >> str1;
cout << "Your string is: ";
for (int i = 0; str1[i] != NULL; i++)
cout << str1[i];
cout << "." << endl;
cout << "Your string in reverse is: ";
for (int i = 0; str1[i] != NULL; i++)
cout << str1[i];
cout << "." << endl;
return 0;
}
So after many more hours of tinkering and looking around, I was finally able to figure out how to properly copy over a byte. I will post my .asm code below with comments if anybody needs it for future reference.
I was actually moving the content of the current address into a 32-bit register. After I changed it from mov ebx,[eax] to mov bl,[eax], it copied the value correctly.
I will only post the code that I was having difficulty with so I do not give away the entire project for other students.
ASM Code Below:
mov bl,[edx] ;Uses bl since we are trying to copy a 1 byte char value
mov bh,[eax] ;Uses bh since we are trying to copy a 1 byte char value
mov [edx],bh ;Passing the value to the end of the array
mov [eax],bl ;Passing the value to the beginning of the array
inc eax ;Moving the array one index forward
dec edx ;Moving the array one index backwards
dec ecx ;Decreasing the counter by one to continue loop as needed
jmp reverse ;Jump back to reverse to check if additional swap is needed
Thanks to everyone who helped.
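For comparison, the byte-sized swap in the assembly above corresponds roughly to the following C++ sketch. reverse_in_place is a hypothetical name, and the sketch assumes a NUL-terminated string; it is not the project's required interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical C++ counterpart of the byte swap: reverse a
// NUL-terminated string in place, one char (one byte) at a time.
void reverse_in_place(char* str) {
    assert(str != nullptr);
    std::size_t len = std::strlen(str);
    if (len < 2) return;            // nothing to swap
    char* front = str;              // plays the role of eax
    char* back = str + len - 1;     // plays the role of edx
    while (front < back) {
        char tmp = *front;          // load one byte, like mov bh,[eax]
        *front = *back;             // store the back byte at the front
        *back = tmp;                // store the saved byte at the back
        ++front;                    // inc eax
        --back;                     // dec edx
    }
}
```

Using char-sized loads and stores is the C++ analogue of switching from ebx to bl/bh: only one byte moves per access, so neighbouring characters are left alone.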
mov eax,[edx] (assuming Intel syntax) places the 32 bits found in memory at address edx into eax. That is, this code retrieves data from a memory location. If you'd like to write to a memory location, you need to reverse the operands, i.e. mov [edx], eax
After playing with some 16 bit code overnight for sorting, I've the following two functions that may be of use. Obviously, you can't copy/paste them - you'll have to study it. However, you'll notice that it is able to swap items of arbitrary size. Perfect for swapping elements that are structures of some type.
; copies cx bytes from ds:si to es:di
shr cx, 1
jnc .swapCopy1Loop
shr cx, 1
jnc .swapCopy2Loop
rep movsd
; bp+0 bp+2 bp+4
;void swap(void *ptr1, void *ptr2, int dataSizeBytes)
push bp
mov bp, sp
add bp, 4
push di
push si
push es
mov ax, ds
mov es, ax
sub sp, [bp+4] ; allocate dataSizeBytes on the stack, starting at bp-6 - dataSizeBytes
mov di, sp
mov si, [bp+0]
mov cx, [bp+4]
call copyBytes
mov si, [bp+2]
mov di, [bp+0]
mov cx, [bp+4]
call copyBytes
mov si, sp
mov di, [bp+2]
mov cx, [bp+4]
call copyBytes
add sp, [bp+4]
pop es
pop si
pop di
pop bp
ret 2 * 3
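The three copies performed by the swap routine above (first item to a temporary, second item over the first, temporary over the second) can be sketched in C++ as follows. swap_bytes is a hypothetical name; the sketch uses a heap buffer where the assembly uses stack space, and it assumes the two regions do not partially overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical C++ analogue of swap(ptr1, ptr2, dataSizeBytes).
void swap_bytes(void* ptr1, void* ptr2, std::size_t size) {
    assert(ptr1 != nullptr && ptr2 != nullptr);
    if (size == 0 || ptr1 == ptr2) return;
    std::vector<unsigned char> tmp(size);   // temporary storage for one item
    std::memcpy(tmp.data(), ptr1, size);    // tmp   <- *ptr1
    std::memcpy(ptr1, ptr2, size);          // *ptr1 <- *ptr2
    std::memcpy(ptr2, tmp.data(), size);    // *ptr2 <- tmp
}
```

As with the assembly version, the element size is a run-time parameter, so the same routine can swap structures of any type.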

Generated code not matching expectations with Extended ASM

I have a CpuFeatures class. The requirements for the class are simple: (1) preserve EBX or RBX, and (2) record the values returned from CPUID in EAX/EBX/ECX/EDX. I'm not sure the code being generated is the code I intended.
The CpuFeatures class code uses GCC Extended ASM. Here's the relevant code:
struct CPUIDinfo
{
word32 EAX;
word32 EBX;
word32 ECX;
word32 EDX;
};
bool CpuId(word32 func, word32 subfunc, CPUIDinfo& info)
{
uintptr_t scratch;
__asm__ __volatile__ (
".att_syntax \n"
#if defined(__x86_64__)
"\t xchgq %%rbx, %q1 \n"
#else
"\t xchgl %%ebx, %k1 \n"
#endif
"\t cpuid \n"
#if defined(__x86_64__)
"\t xchgq %%rbx, %q1 \n"
#else
"\t xchgl %%ebx, %k1 \n"
#endif
: "=a"(info.EAX), "=&r"(scratch), "=c"(info.ECX), "=d"(info.EDX)
: "a"(func), "c"(subfunc)
);
if(func == 0)
return !!info.EAX;
return true;
}
The code below was compiled with -g3 -Og on Cygwin i386. When I examine it under a debugger, I don't like what I am seeing.
Dump of assembler code for function CpuFeatures::DoDetectX86Features():
0x0048f355 <+1>: sub $0x48,%esp
=> 0x0048f358 <+4>: mov $0x0,%ecx
0x0048f35d <+9>: mov %ecx,%eax
0x0048f35f <+11>: xchg %ebx,%ebx
0x0048f361 <+13>: cpuid
0x0048f363 <+15>: xchg %ebx,%ebx
0x0048f365 <+17>: mov %eax,0x10(%esp)
0x0048f369 <+21>: mov %ecx,0x18(%esp)
0x0048f36d <+25>: mov %edx,0x1c(%esp)
0x0048f371 <+29>: mov %ebx,0x14(%esp)
0x0048f375 <+33>: test %eax,%eax
I don't like what I am seeing because it appears EBX/RBX is not being preserved (xchg %ebx,%ebx at +11). Additionally, it looks like the preserved EBX/RBX is being saved as the result of CPUID, and not the actual value of EBX returned by CPUID (xchg %ebx,%ebx at +15, before the mov %ebx,0x14(%esp) at +29).
If I change the operand to use a memory op with "=&m"(scratch), then the generated code is:
0x0048f35e <+10>: xchg %ebx,0x40(%esp)
0x0048f362 <+14>: cpuid
0x0048f364 <+16>: xchg %ebx,0x40(%esp)
A related question is What ensures reads/writes of operands occurs at desired times with extended ASM?
What am I doing wrong (besides wasting countless hours on something that should have taken 5 or 15 minutes)?
The code below is a complete example that I used to compile your example code above including the modification to exchange(swap) directly to the info.EBX variable.
#include <inttypes.h>
#define word32 uint32_t
struct CPUIDinfo
{
word32 EAX;
word32 EBX;
word32 ECX;
word32 EDX;
};
bool CpuId(word32 func, word32 subfunc, CPUIDinfo& info)
{
__asm__ __volatile__ (
".att_syntax \n"
#if defined(__x86_64__)
"\t xchgq %%rbx, %q1 \n"
#else
"\t xchgl %%ebx, %k1 \n"
#endif
"\t cpuid \n"
#if defined(__x86_64__)
"\t xchgq %%rbx, %q1 \n"
#else
"\t xchgl %%ebx, %k1 \n"
#endif
: "=a"(info.EAX), "=&m"(info.EBX), "=c"(info.ECX), "=d"(info.EDX)
: "a"(func), "c"(subfunc)
);
if(func == 0)
return !!info.EAX;
return true;
}
int main()
{
CPUIDinfo cpuInfo;
CpuId(1, 0, cpuInfo);
}
The first observation that you should make is that I chose to use the info.EBX memory location to do the actual swap to. This eliminates the need for another temporary variable or register.
I assembled as 32-bit code with -g3 -Og -S -m32 and got these instructions of interest:
xchgl %ebx, 4(%edi)
cpuid
xchgl %ebx, 4(%edi)
movl %eax, (%edi)
movl %ecx, 8(%edi)
movl %edx, 12(%edi)
%edi happens to contain the address of the info structure. 4(%edi) happens to be the address of info.EBX. We swap %ebx and 4(%edi) after cpuid. With that instruction ebx is restored to what it was before cpuid and 4(%edi) now has what ebx was right after cpuid was executed. The remaining movl lines place eax, ecx, edx registers into the rest of the info structure via the %edi register.
The generated code above is what I would expect it to be.
In your code, the scratch variable (with the constraint "=&m"(scratch)) is never used after the assembler template, so 0x40(%esp) holds the value you want but it never gets moved anywhere useful. You'd have to copy the scratch variable into info.EBX (i.e. info.EBX = scratch;) and look at all of the resulting instructions that get generated. At some point the data would be copied from the scratch memory location to info.EBX among the generated assembly instructions.
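A minimal sketch of that variant, keeping the register scratch operand from the question and adding the copy into info.EBX, could look like this. It uses uint32_t in place of word32, and the non-x86 branch that simply returns false is an assumption added so the sketch builds on other targets.

```cpp
#include <stdint.h>

struct CPUIDinfo {
    uint32_t EAX, EBX, ECX, EDX;
};

bool CpuId(uint32_t func, uint32_t subfunc, CPUIDinfo& info) {
#if defined(__i386__) || defined(__x86_64__)
    uintptr_t scratch;  // ends up holding the EBX value produced by CPUID
    __asm__ __volatile__(
#if defined(__x86_64__)
        "xchgq %%rbx, %q1\n\t"   // park the caller's rbx in scratch
#else
        "xchgl %%ebx, %k1\n\t"   // park the caller's ebx in scratch
#endif
        "cpuid\n\t"
#if defined(__x86_64__)
        "xchgq %%rbx, %q1\n\t"   // restore rbx, scratch = CPUID's ebx
#else
        "xchgl %%ebx, %k1\n\t"   // restore ebx, scratch = CPUID's ebx
#endif
        : "=a"(info.EAX), "=&r"(scratch), "=c"(info.ECX), "=d"(info.EDX)
        : "a"(func), "c"(subfunc));
    info.EBX = static_cast<uint32_t>(scratch);  // make the result observable
    return func != 0 || info.EAX != 0;
#else
    (void)func; (void)subfunc; (void)info;
    return false;  // not an x86 target (assumption: nothing to query)
#endif
}
```

With the copy in place, scratch is a live output, so the generated code has to carry CPUID's EBX value into info.EBX whichever register the compiler picks for the operand.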
Update - Cygwin and MinGW
I wasn't entirely satisfied that the Cygwin code output was correct. In the middle of the night I had an Aha! moment. Windows already does its own position independent code when the dynamic link loader loads an image (DLL etc) and modifies the image via re-basing. There is no need for additional PIC processing like it is done in Linux 32 bit shared libraries so there is no issue with ebx/rbx. This is why Cygwin and MinGW will present warnings like this when compiling with -fPIC
warning: -fPIC ignored for target (all code is position independent)
This is because under Windows all 32bit code can be re-based when it is loaded by the Windows dynamic loader. More about re-basing can be found in this Dr. Dobbs article. Information on the windows Portable Executable format (PE) can be found in this Wiki article. Cygwin and MinGW don't need to worry about preserving ebx/rbx when targeting 32bit code because on their platforms PIC is already handled by the OS, other re-basing tools, and the linker.

Using AT&T inline assembler for GCC

I'm writing a simple but a little specific program:
Purpose: calculate a number from its factorial
Requirements: all calculations must be done on gcc inline asm (at&t syntax)
Source code:
#include <iostream>
int main()
{
unsigned n = 0, f = 0;
std::cin >> n;
asm (
"mov %0, %%eax \n"
"mov %%eax, %%ecx \n"
"mov 1, %%ebx \n"
"mov 1, %%eax \n"
"jmp cycle_start\n"
"cycle:\n"
"inc %%ebx\n"
"mul %%ebx\n"
"cycle_start:\n"
"cmp %%ecx, %%eax\n"
"jnz cycle\n"
"mov %%ebx, %1 \n":
"=r" (n):
"r" (f)
);
std::cout << f;
return 0;
}
This code causes SIGSEGV.
An identical program in Intel asm syntax (http://pastebin.com/2EqJmGAV) works fine. Why does my "AT&T program" fail, and how can I fix it?
#include <iostream>
int main()
unsigned n = 0, f = 0;
std::cin >> n;
mov eax, n
mov ecx, eax
mov eax, 1
mov ebx, 1
jmp cycle_start
inc ebx
mul ebx
cmp eax, ecx
jnz cycle
mov f, ebx
std::cout << f;
return 0;
UPD: Pushing the used registers to the stack and restoring them afterwards gives the same result: SIGSEGV
You have your input and output the wrong way around.
So, start by altering
"=r" (n):
"r" (f)
to
"=r" (f) :
"r" (n)
Then I suspect you'll want to tell the compiler about clobbers (registers you are using that aren't inputs or outputs):
So add:
: "eax", "ebx", "ecx"
after the two lines above.
I personally would make some other changes:
Use local labels (1: and 2: etc), which allows the code to be duplicated without "duplicate label".
Use %1 instead of %%ebx - that way, you are not using an extra register.
Move %0 directly to %%ecx. You are loading 1 into %%eax two instructions later, so what purpose has it got to do in %%eax?
[Now, I've written too much, and someone else has answered first... ]
Edit: And, as Anton points out, you need $1 to load the constant 1. A bare 1 means "read from address 1", which doesn't work well and is most likely the cause of your problems.
Hopefully there are no requirements to use nothing but gcc inline asm to figure it out. You can translate your AT&T example with nasm, then disassemble with objdump and see what's the right syntax.
I seem to recall that mov 1,%eax should be mov $1,%eax if you mean literal constant and not a memory reference.
The answer by @MatsPetersson is very useful regarding the interaction of your inline assembly with the compiler (clobbered/input/output registers). I've focused on the reason why you get SIGSEGV, and reading the address 1 does answer the question.
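Combining the suggestions from both answers ($1 for the constant, f as the output and n as the input, local numeric labels, a clobber list), one possible sketch is shown below. inverse_factorial is a hypothetical name, the sketch assumes n is an exact factorial, and it also lists edx and "cc" as clobbered because mul writes edx:eax and the flags. The plain C++ branch is only a fallback for non-x86 targets.

```cpp
// Hypothetical helper: returns f such that f! == n.
// Assumes n is an exact factorial (1, 2, 6, 24, ...); for other inputs
// the loop is not guaranteed to terminate.
unsigned inverse_factorial(unsigned n) {
    unsigned f;
#if defined(__i386__) || defined(__x86_64__)
    __asm__(
        "movl $1, %%eax\n\t"   // running product; $1 is the constant 1
        "movl $1, %0\n\t"      // candidate f
        "jmp 2f\n"
        "1:\n\t"
        "incl %0\n\t"          // next candidate
        "mull %0\n"            // edx:eax = eax * f
        "2:\n\t"
        "cmpl %1, %%eax\n\t"   // product == n ?
        "jne 1b\n\t"
        : "=&r"(f)             // early clobber: written before n is last read
        : "r"(n)
        : "eax", "edx", "cc");
#else
    unsigned product = 1;      // fallback for non-x86 targets (assumption)
    for (f = 1; product != n; ) product *= ++f;
#endif
    return f;
}
```

The numeric labels 1: and 2: (referenced as 1b and 2f) let the compiler inline or duplicate the asm without "duplicate label" errors.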