x86 logical address syntax error - c++

Compiler: gcc 4.7.1, 32bit, ubuntu
Here's an example:
int main(void)
{
unsigned int mem = 0;
__asm volatile
(
"mov ebx, esp\n\t"
"mov %0, [ds : ebx]\n\t"
: "=m"(mem)
);
printf("mem = 0x%08x\n", mem);
return 0;
}
gcc -masm=intel -o app main.c
Assembler messages: invalid use of register!
As I know, ds and ss point to the same segment. I don't know why I can't use [ds : ebx] logical address for addressing.

Your code has two problems:
One: the indirect memory reference should be:
mov %0, ds : [ebx]
That is, with the ds out of the brackets.
Two: A single instruction cannot have both origin and destination in memory, you have to use a register. The easiest way would be to indicate =g that basically means whatever, but in your case it is not possible because esp cannot be moved directly to memory. You have to use =r.
Three: (?) You are clobbering the ebx register, so you should declare it as such, or else do not use it that way. That will not prevent compilation, but will make your code to behave erratically.
In short:
unsigned int mem = 0;
__asm volatile
(
"mov ebx, esp\n\t"
"mov %0, ds : [ebx]\n\t"
: "=r"(mem) :: "ebx"
);
Or better not to force to use ebx, let instead the compiler decide:
unsigned int mem = 0, temp;
__asm volatile
(
"mov %1, esp\n\t"
"mov %0, ds : [%1]\n\t"
: "=r"(mem) : "r"(temp)
);
BTW, you don't need the volatile keyword in this code. That is used to avoid the assembler to be optimized away even if the output is not needed. If you write the code for the side-effect, add volatile, but if you write the code to get an output, do not add volatile. That way, if the optimizing compiler determines that the output is not needed, it will remove the whole block.

Related

Error : Invalid Character '(' in mnemonic

Hi I am trying to compile the below assembly code on Linux using gcc 7.5 version but somehow getting the error
Error : Invalid Character '(' in mnemonic
bool InterlockedCompareAndStore128(int *dest,int *newVal,int *oldVal)
{
asm(
"push %rbx\n"
"push %rdi\n"
"mov %rcx, %rdi\n" // ptr to dest -> RDI
"mov 8(%rdx), %rcx\n" // newVal -> RCX:RBX
"mov (%rdx), %rbx\n"
"mov 8(%r8), %rdx\n" // oldVal -> RDX:RAX
"mov (%r8), %rax\n"
"lock (%rdi), cmpxchg16b\n"
"mov $0, %rax\n"
"jnz exit\n"
"inc1 %rax\n"
"exit:;\n"
"pop %rdi\n"
"pop %rbx\n"
);
}
Can anyone suggest how to resolve this . Checked many online links and tutorials for Assembly code but could not relate the exact issue.
Thanks for the help in advance.
In Windows I could see the implementation of the above function as:
function InterlockedCompareExchange128;
asm
.PUSHNV RBX
MOV R10,RCX
MOV RBX,R8
MOV RCX,RDX
MOV RDX,[R9+8]
MOV RAX,[R9]
LOCK CMPXCHG16B [R10]
MOV [R9+8],RDX
MOV [R9],RAX
SETZ AL
MOVZX EAX, AL
end;
For PUSHNV , I could not found anything related to this on Linux. So , basically I am trying to implement same functionality in c++ on Linux.
The question here was about Invalid Character '(' in mnemonic which the other answer addresses.
However, OP's code has a number of issues beyond that problem. Here's (what I think are) two better approaches to this problem. Note that I've changed the order of the parameters and turned them const.
This one continues to use inline asm, but uses Extended asm instead of Basic. While I'm of the don't use inline asm school of thought, this might be useful or at least educational.
bool InterlockedCompareAndStore128B(__int64 *dest, const __int64 *oldVal, const __int64 *newVal)
{
bool result;
__int64 ovl = oldVal[0];
__int64 ovh = oldVal[1];
asm volatile ("lock cmpxchg16b %[ptr]"
: "=#ccz" (result), [ptr] "+m" (*dest),
"+d" (ovh), "+a" (ovl)
: "c" (newVal[1]), "b" (newVal[0])
: "cc", "memory");
// cmpxchg16b changes rdx:rax to the current value in dest. Useful if you need
// to loop until you succeed, but OP's code doesn't save the values, so I'm
// just following that spec.
//oldVal[0] = ovl;
//oldVal[1] = ovh;
return result;
}
In addition to solving the problems with the original code, it's also inlineable and shorter. The constraints likely make it harder to read, but the fact that there's only 1 line of asm might help offset that. If you want to understand what the constraints mean, check out this page (scroll down to x86 family) and the description of flag output constraints (again, scroll down for x86 family).
As an alternative, this code uses a gcc builtin and allows the compiler to generate the appropriate asm instructions. Note that this must be built with -mcx16 for best results.
bool InterlockedCompareAndStore128C(__int128 *dest, const __int128 *oldVal, const __int128 *newVal)
{
// While a sensible person would use __atomic_compare_exchange_n and let gcc generate
// cmpxchg16b, gcc decided they needed to turn this into a big hairy function call:
// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878
// In short, if someone wants to compare/exchange against readonly memory, you can't just
// use cmpxchg16b cuz it would crash. Why would anyone try to exchange memory that can't
// be written to? Apparently because it's expected to *not* crash if the compare fails
// and nothing gets written. So no one gets to use that 1 line instruction and everyone
// gets an entire routine (that uses MUTEX instead of lockfree) to support this absurd
// border case. Sounds dumb to me, but that's where things stand as of 2021-05-07.
// Use the legacy function instead.
bool b = __sync_bool_compare_and_swap(dest, *oldVal, *newVal);
return b;
}
For the kibizters in the crowd, here's the code generated by -m64 -O3 -mcx16 for that last one:
InterlockedCompareAndStore128C(__int128*, __int128 const*, __int128 const*):
mov rcx, rdx
push rbx
mov rax, QWORD PTR [rsi]
mov rbx, QWORD PTR [rcx]
mov rdx, QWORD PTR [rsi+8]
mov rcx, QWORD PTR [rcx+8]
lock cmpxchg16b XMMWORD PTR [rdi]
pop rbx
sete al
ret
If someone wants to fiddle, here's the godbolt link.
There are a number of problems with this code, and I'm not convinced I'm doing you any favors by telling you how to fix the specific problem.
But the short answer is that
"lock (%rdi), cmpxchg16b\n"
should be
"lock cmpxchg16b (%rdi)\n"
Tada, now it compiles. Well, it would if inc1 was a real instruction.
But I can't help but notice that the pointers here are int *, which is 4 bytes, not 16. And that this function is not declared as naked. And using Extended asm would save you from having to push all these registers around by hand, making this code a lot slower than it needs to be.
But most of all, you should really use the builtins, like __atomic_compare_exchange because inline asm is error prone, not portable, and really hard to maintain.

Asm inserion in naked function

I have ubuntu 16.04, x86_64 arch, 4.15.0-39-generic kernel version.
GCC 8.1.0
I tried to rewrite this functions(from first post https://groups.google.com/forum/#!topic/comp.lang.c++.moderated/qHDCU73cEFc) from Intel dialect to AT&T. And I did not succeed.
namespace atomic {
__declspec(naked)
static void*
ldptr_acq(void* volatile*) {
_asm {
MOV EAX, [ESP + 4]
MOV EAX, [EAX]
RET
}
}
__declspec(naked)
static void*
stptr_rel(void* volatile*, void* const) {
_asm {
MOV ECX, [ESP + 4]
MOV EAX, [ESP + 8]
MOV [ECX], EAX
RET
}
}
}
Then I wrote a simple program, to get the same pointer, which I pass inside. I installed GCC version 8.1 with supported naked attributes(https://gcc.gnu.org/gcc-8/changes.html "The x86 port now supports the naked function attribute") for fuctions.
As far as I remember, this attribute tells the compiler not to create the prologue and epilog of the function, and I can take the parameters from the stack myself and return them.
Code:(don't work with segfault)
#include <cstdio>
#include <cstdlib>
__attribute__ ((naked))
int *get_num(int*) {
__asm__ (
"movl 4(%esp), %eax\n\t"
"movl (%eax), %eax\n\t"
"ret"
);
}
int main() {
int *i =(int*) malloc(sizeof(int));
*i = 5;
int *j = get_num(i);
printf("%d\n", *j);
free(i);
return 0;
}
then I tried using 64bit registers:(don't work with segfault)
__asm__ (
"movq 4(%rsp), %rax\n\t"
"movq (%rax), %rax\n\t"
"ret"
);
And only after I took the value out of rdi register - it all worked.
__asm__ (
"movq %rdi, %rax\n\t"
"ret"
);
Why did I fail to make the transfer through the stack register? I probably made a mistake. Please tell me where is my fail?
Because the x86-64 System V calling convention passes args in registers, not on the stack, unlike the old inefficient i386 System V calling convention.
You always have to write asm that matches the calling convention, if you're writing the whole function in asm, like with a naked function or a stand-along .S file.
GNU C extended asm allows you to use operands to specify the inputs to an asm statement, and the compiler will generate instructions to make that happen. (I wouldn't recommend using it until you understand asm and how compilers turn C into asm with optimization enabled, though.)
Also note that movq %rdi, %rax implements long *foo(long*p){return p;} not return *p. Perhaps you meant mov (%rdi), %rax to dereference the pointer arg?
And BTW, you definitely don't need and shouldn't use inline asm for this. https://gcc.gnu.org/wiki/DontUseInlineAsm, and see https://stackoverflow.com/tags/inline-assembly/info
In GNU C, you can cast a pointer to volatile uint64_t*. Or you can use __atomic_load_n (ptr, __ATOMIC_ACQUIRE) to get basically everything you were getting from that asm, without the overhead of a function call or any of the cost for the optimizer at the call-site of having all the call-clobbered registers be clobbered.
You can use them on any object: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html Unlike C++11 where you can only do atomic ops on a std::atomic<T>.

c++ gcc inline assembly does not seem to work

I am trying to figure out gcc inline assembly on c++. The following code works on visual c++ without % and other operands but i could not make it work with gcc
void function(const char* text) {
DWORD addr = (DWORD)text;
DWORD fncAddr = 0x004169E0;
asm(
"push %0" "\n"
"call %1" "\n"
"add esp, 04" "\n"
: "=r" (addr) : "d" (fncAddr)
);
}
I am injecting a dll to a process on runtime and fncAddr is an address of a function. It never changes. As I said it works with Visual C++
VC++ equivalent of that function:
void function(const char* text) {
DWORD addr = (DWORD)text;
DWORD fncAddr = 0x004169E0;
__asm {
push addr
call fncAddr
add esp, 04
}
}
Edit:
I changed my function to this: now it crashes
void sendPacket(const char* msg) {
DWORD addr = (DWORD)msg;
DWORD fncAddr = 0x004169E0;
asm(
".intel_syntax noprefix" "\n"
"pusha" "\n"
"push %0" "\n"
"call %1" "\n"
"add esp, 04" "\n"
"popa" "\n"
:
: "r" (addr) , "d"(fncAddr) : "memory"
);
}
Edit:
004169E0 /$ 8B0D B4D38100 MOV ECX,DWORD PTR DS:[81D3B4]
004169E6 |. 85C9 TEST ECX,ECX
004169E8 |. 74 0A JE SHORT client_6.004169F4
004169EA |. 8B4424 04 MOV EAX,DWORD PTR SS:[ESP+4]
004169EE |. 50 PUSH EAX
004169EF |. E8 7C3F0000 CALL client_6.0041A970
004169F4 \> C3 RETN
the function im calling is above. I changed it to function pointer cast
char_func_t func = (char_func_t)0x004169E0;
func(text);
like this and it crashed too but surprisingly somethimes it works. I attacted a debugger and it gave access violation at some address it does not exist
on callstack the last call is this:
004169EF |. E8 7C3F0000 CALL client_6.0041A970
LAST EDIT:
I gave up inline assembly, instead i wrote instructions i wanted byte by byte and it works like a charm
void function(const char* text) {
DWORD fncAddr = 0x004169E0;
char *buff = new char[50]; //extra bytes for no reason
memset((void*)buff, 0x90, 50);
*((BYTE*)buff) = 0x68; // push
*((DWORD*)(buff + 1)) = ((DWORD)text);
*((BYTE*)buff+5) = 0xE8; //call
*((DWORD*)(buff + 6)) = ((DWORD)fncAddr) - ((DWORD)&(buff[5]) + 5);
*((BYTE*)(buff + 10)) = 0x83; // add esp, 04
*((BYTE*)(buff + 11)) = 0xC4;
*((BYTE*)(buff + 12)) = 0x04;
*((BYTE*)(buff + 13)) = 0xC3; // ret
typedef void(*char_func_t)(void);
char_func_t func = (char_func_t)buff;
func();
delete[] buff;
}
Thank you all
Your current version with pusha / popa looks correct (slow but safe), unless your calling convention depends on maintaing 16-byte stack alignment.
If it's crashing, your real problem is somewhere else, so you should use a debugger and find out where it crashes.
Declaring clobbers on eax / ecx / edx, or asking for the pointers in two of those registers and clobbering the third, would let you avoid pusha / popa. (Or whatever the call-clobbered regs are for the calling convention you're using.)
You should remove the .intel_syntax noprefix. You already depend on compiling with -masm=intel, because you don't restore the previous mode in case it was AT&T. (I don't think there is a way to save/restore the old mode, unfortunately, but there is a dialect-alternatves mechanism for using different templates for different syntax modes.)
You don't need and shouldn't use inline asm for this
compilers know how to make function calls already, when you're using a standard calling convention (in this case: stack args in 32-bit mode which is normally the default).
It's valid C++ to cast an integer to a function pointer, and it's not even undefined behaviour if there really is a function there at that address.
void function(const char* text) {
typedef void (*char_func_t)(const char *);
char_func_t func = (char_func_t)0x004169E0;
func(text);
}
As a bonus, this compiles more efficiently with MSVC than your asm version, too.
You can use GCC function attributes on function pointers to specify the calling convention explicitly, in case you compile with a different default. For example __attribute__((cdecl)) to explicitly specify stack args and caller-pops for calls using that function pointer. The MSVC equivalent is just __cdecl.
#ifdef __GNUC__
#define CDECL __attribute__((cdecl))
#define STDCALL __attribute__((stdcall))
#elif defined(_MSC_VER)
#define CDECL __cdecl
#define STDCALL __stdcall
#else
#define CDECL /*empty*/
#define STDCALL /*empty*/
#endif
// With STDCALL instead of CDECL, this function has to translate from one calling convention to another
// so it can't compile to just a jmp tailcall
void function(const char* text) {
typedef void (CDECL *char_func_t)(const char *);
char_func_t func = (char_func_t)0x004169E0;
func(text);
}
To see the compiler's asm output, I put this on the Godbolt compiler explorer. I used the "intel-syntax" option, so gcc output comes from gcc -S -masm=intel
# gcc8.1 -O3 -m32 (the 32-bit Linux calling convention is close enough to Windows)
# except it requires maintaing 16-byte stack alignment.
function(char const*):
mov eax, 4286944
jmp eax # tail-call with the args still where we got them
This test caller makes the compiler set up args and not just a tail-call, but function can inline into it.
int caller() {
function("hello world");
return 0;
}
.LC0:
.string "hello world"
caller():
sub esp, 24 # reserve way more stack than it needs to reach 16-byte alignment, IDK why.
mov eax, 4286944 # your function pointer
push OFFSET FLAT:.LC0 # addr becomes an immediate
call eax
xor eax, eax # return 0
add esp, 28 # add esp, 4 folded into this
ret
MSVC's -Ox output for caller is essentially the same:
caller PROC
push OFFSET $SG2661
mov eax, 4286944 ; 004169e0H
call eax
add esp, 4
xor eax, eax
ret 0
But a version using your inline asm is much worse:
;; MSVC -Ox on a caller() that uses your asm implementation of function()
caller_asm PROC
push ebp
mov ebp, esp
sub esp, 8
; store inline asm inputs to the stack
mov DWORD PTR _addr$2[ebp], OFFSET $SG2671
mov DWORD PTR _fncAddr$1[ebp], 4286944 ; 004169e0H
push DWORD PTR _addr$2[ebp] ; then reload as memory operands
call DWORD PTR _fncAddr$1[ebp]
add esp, 4
xor eax, eax
mov esp, ebp ; makes the add esp,4 redundant in this case
pop ebp
ret 0
MSVC inline asm syntax basically sucks, because unlike GNU C asm syntax the inputs always have to be in memory, not registers or immediates. So you could do better with GNU C, but not as good as you can do by avoiding inline asm altogether. https://gcc.gnu.org/wiki/DontUseInlineAsm.
Making function calls from inline asm is generally to be avoided; it's much safer and more efficient when the compiler knows what's happening.
Here's an example of inline assembly with gcc.
Routine "vazio" hosts assembly code for routine "rotina" (vazio and rotina are simply labels). Note the use of Intel syntax by means of a directive; gcc defaults to AT&T .
I recovered this code from an old sub-directory; variables in assembly code were prefixed with "_" , as "_str" - that's standard C convention. I confess that, here and now, I have no idea as why the compiler is accepting "str" instead... Anyway:
compiled correctly with gcc/g++ versions 5 and 7! Hope this helps. Simply call "gcc main.c", or "gcc -S main.c" if you want to see the asm result, and "gcc -S masm=intel main.c" for Intel output.
#include <stdio.h>
char str[] = "abcdefg";
// C routine, acts as a container for "rotina"
void vazio (void) {
asm(".intel_syntax noprefix");
asm("rotina:");
asm("inc eax");
// EBX = address of str
asm("lea ebx, str");
// ++str[0]
asm("inc byte ptr [ebx]");
asm("ret");
asm(".att_syntax noprefix");
}
// global variables make things simpler
int a;
int main(void) {
a = -7;
puts ("antes");
puts (str);
printf("a = %d\n\n", a);
asm(".intel_syntax noprefix");
asm("mov eax, 0");
asm("call rotina");
// modify variable a
asm("mov a, eax");
asm(".att_syntax noprefix");
printf("depois: \n a = %d\n", a);
puts (str);
return 0;
}

__asm__ gcc call to a memory address

I have a code that allocates memory, copies some buffer to that allocated memory and then it jumps to that memory address.
the problem is that I cant jump to the memory address. Im using gcc and __asm__ but I cant call that memory address.
I want to do something like:
address=VirtualAlloc(NULL,len+1, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
dest=strncpy(address, buf, len);
And then I want to do this in ASM:
MOV EAX, dest
CALL EAX.
I've tried something like:
__asm__("movl %eax, dest\n\t"
"call %eax\n\t");
But it does not work.
How can I do it?
There is usually no need to use asm for this, you can simply go through a function pointer and let the compiler take care of the details.
You do need to use __builtin___clear_cache(buf, buf+len) after copy machine code to a buffer before you dereference a function-pointer to it, otherwise it can be optimized away as a dead store.. x86 has coherent instruction caches so it doesn't compile to any extra instructions, but you still need it so the optimizer knows what's going on.
static inline
int func(char *dest, int len) {
__builtin___clear_cache(dest, dest+len); // no instructions on x86 but still needed
int ret = ((int (*)(void))dest)(); // cast to function pointer and deref
return ret;
}
compiles with GCC9.1 -O2 -m32 to
func(char*, int):
jmp [DWORD PTR [esp+4]] # tailcall
Also, you don't actually need to copy a string, you can just mprotect or VirtualProtect the page it's in to make it executable. But if you want to make sure it does stop at the first 0 byte to test your shellcode, then sure copy it.
If you nevertheless insist on inline asm, you should know that gcc inline asm is a complex thing. Also, if you expect the function to return, you should really make sure it follows the calling convention, in particular it preserves the registers it should.
AT&T syntax is op src, dst so your mov was actually a store to the global symbol dest.
That said, here is the answer to the question as worded:
int ret;
__asm__ __volatile__ ("call *%0" : "=a" (ret) : "0" (dest) : "ecx", "edx", "memory");
Explanation: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
call *%0 = the %0 refers to the first substitued argument, the * is standard gas syntax for indirect call
"=a" (ret) = output argument in eax register should be assigned to variable ret after the block
"0" (dest) = input argument in the same place as output argument 0 (which is eax) should be loaded from dest before the block
"ecx", "edx" = tell the compiler these registers may be altered by the asm block, as per normal calling convention.
"memory" = tell the compiler the asm block might make unspecified modifications to memory, so don't cache anything
Note that in x86-64 System V (Linux / OS X), it's not safe to make a function call from inline asm like this. There's no way to declare a clobber on the red zone below RSP.

Using function-scoped labels in inline assembly

I'm playing around with inline assembly in C++ using gcc-4.7 on 64-bit little endian Ubuntu 12.04 LTS with Eclipse CDT and gdb. The general direction of what I'm trying to do is to make some sort of bytecode interpreter for some esoteric stack-based programming language.
In this example, I process the instructions 4-bits at a time (in practice this will depend on the instruction), and when there are no more non-zero instructions (as 0 will be nop) I read the next 64-bit words.
I would like to ask though, how do I use a function-scoped label in inline assembly?
It seems labels in assembly are global, which is unfavourable, and I can't find a way to jump to a C++ function-scoped label from an assembly statement.
The following code is an example of what I'm trying to do (Note the comment):
...
register long ip asm("r8");
register long buf asm("r9");
register long op asm("r10");
...
fetch:
asm("mov (%r8), %r9");
asm("add $8, %r8");
control:
asm("test %r9, %r9");
asm("jz fetch"); // undefined reference to `fetch'
asm("shr $4, %r9");
asm("mov %r9, %r10");
asm("and $0xf, %r10");
switch (op) {
...
}
goto control;
Note the following comment from the gcc inline asm documentation:
Speaking of labels, jumps from one `asm' to another are not supported.
The compiler's optimizers do not know about these jumps, and therefore
they cannot take account of them when deciding how to optimize.
You also can't rely on the flags set in one asm being available in the next, as the compiler might insert something between them
With gcc 4.5 and later, you can use asm goto to do what you want:
fetch:
asm("mov (%r8), %r9");
asm("add $8, %r8");
control:
asm goto("test %r9, %r9\n\t"
"jz %l[fetch]" : : : : fetch);
Note that all the rest of your asm is completely unsafe as it uses registers directly without declaring them in its read/write/clobbered lists, so the compiler may decide to put something else in them (despite the vars with the asm declarations on them -- it may decide that those are dead as they are never used). So if you expect this to actually work with -O1 or higher, you need to write it as:
...
long ip;
long buf;
long op;
...
fetch:
asm("mov (%1), %0" : "=r"(buf) : "r"(ip));
asm("add $8, %0" : "=r"(ip) : "0"(ip));
control:
asm goto("test %0, %0\n\t"
"jz %l[fetch]" : : "r"(buf) : : fetch);
asm("shr $4, %0" : "=r"(buf) : "0"(buf));
asm("mov %1, %0" : "=r"(op) : "r"(buf));
asm("and $0xf, %0" : "=r"(op) : "r"(op));
At which point, its much easier to just write it as C code:
long *ip, buf, op;
fetch:
do {
buf = *op++;
control:
} while (!buf);
op = (buf >>= 4) & 0xf;
switch(op) {
:
}
goto control;
You should be able to do this:
fetch:
asm("afetch: mov(%r8), %r9");
...
asm("jz afetch");
Alternatively, putting the label in a separate asm("afetch:"); should work as well. Note the different name to avoid conflicts - I'm not entirely sure that's necessary, but I suspect it is.