In my C++ / C project I want to set the stack pointer equal to the base pointer... Intuitively I would use something like this:
asm volatile(
"movl %%ebp %%esp"
);
However, when I compile this, I get this error message:
Error: bad register name `%%ebp %%esp'
I use gcc / g++ version 4.9.1 compiler.
I don't know whether I need to set a specific g++ or gcc flag, though. There should be a way to manipulate the esp and ebp registers, but I just don't know the right way to do it.
Does anybody know how to manipulate these two registers in C++? Maybe I should do it with hex opcodes?
You're using GNU C Basic Asm syntax (no input/output/clobber constraints), so % is not special and should not be escaped.
It's only in Extended Asm (with constraints) that % needs to be escaped to end up with a single % in front of hard-coded register names in the compiler's asm output (as required in AT&T syntax).
You also have to separate the operands with a comma:
asm volatile(
"movl %ebp, %esp"
);
asm statements with no output operands are implicitly volatile, but it doesn't hurt to write an explicit volatile.
Note, however, that putting this statement inside a function will likely interfere with the way the compiler handles the stack frame.
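For comparison, here is a minimal sketch of the Extended Asm form, where the % in front of a hard-coded register name does have to be doubled. The helper name read_stack_pointer is made up for illustration, it only reads the register (which is far safer than writing it), and the __builtin_frame_address(0) fallback is just a rough stand-in for non-x86 targets:

```cpp
#include <cstdint>

// Hypothetical helper: returns the current stack pointer value.
static inline std::uintptr_t read_stack_pointer() {
    std::uintptr_t sp = 0;
#if defined(__x86_64__)
    // Extended Asm: "%%rsp" is emitted as "%rsp"; "%0" is the output operand.
    asm volatile("mov %%rsp, %0" : "=r"(sp));
#elif defined(__i386__)
    asm volatile("mov %%esp, %0" : "=r"(sp));
#else
    // Assumption: GCC-compatible compiler; the frame address is only an approximation.
    sp = reinterpret_cast<std::uintptr_t>(__builtin_frame_address(0));
#endif
    return sp;
}
```

In Basic Asm (no colons at all) the same register would be written with a single %, as in the corrected statement above.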
Related
I would like to learn some inline assembly programming, but my first code snippet does not work. I have a string and I would like to assign the value of the string to the rsi register.
Here is my code:
string s = "Hello world";
const char *ystr = s.c_str();
asm("mov %1,%%rsi"
:"S"(ystr)
:"%rsi" //clobbered register
);
return 0;
It gives me the error: Expected ')' before token. Any help is appreciated.
You left out a : to delimit the empty outputs section. So "S"(ystr) is an input operand in the outputs section, and "%rsi" is in the inputs section, not clobbers.
But as an input it's missing the (var_name) part of the "constraint"(var_name) syntax. So that's a syntax error, as well as a semantic error. That's the immediate source of the error <source>:9:5: error: expected '(' before ')' token. https://godbolt.org/z/97aTdjE8K
As Nate pointed out, you have several other errors, like trying to force the input to pick RSI with "S".
char *output; // could be a dummy var if you don't actually need it.
asm("mov %1, %0"
: "=r" (output) /// compiler picks a reg for you to write to.
:"S"(ystr) // force RSI input
: // no clobbers
);
Note that this does not tell the compiler that you read or write the pointed-to memory, so it's only safe for something like this, which copies the address around but doesn't expect to read or write the pointed-to data.
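If the asm does dereference the pointer, one option is a "memory" clobber. A minimal sketch, assuming a GCC-compatible compiler; the function name first_byte is hypothetical and the non-x86 branch is only there so the snippet builds elsewhere:

```cpp
// Hypothetical helper: loads the first byte of a string inside the asm statement.
static inline unsigned char first_byte(const char* p) {
    unsigned char out;
#if defined(__x86_64__) || defined(__i386__)
    // The "memory" clobber tells the compiler that the asm accesses memory
    // that is not named by any operand (here: the byte p points to).
    asm("movb (%1), %0"
        : "=q"(out)   // byte-addressable register chosen by the compiler
        : "r"(p)      // pointer in any general-purpose register
        : "memory");
#else
    out = static_cast<unsigned char>(*p);  // plain C++ fallback
#endif
    return out;
}
```

A narrower alternative is an "m" input operand for the pointed-to object, which avoids making the statement a barrier for all memory.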
Also related:
How can I indicate that the memory *pointed* to by an inline ASM argument may be used?
Can I modify input operands in gcc inline assembly
How to mark as clobbered input operands (C register variables) in extended GCC inline assembly?
In general, when using gcc inline asm on x86, you want to avoid ever using mov instructions, and want to avoid explicit registers in the asm code -- just use the register constraints to get things in the appropriate registers. So for your example, getting a string pointer into the rsi register, you want just:
asm volatile("# ystr will be in %%rsi here"
: // no output constraints
: "S"(ystr) // input constraint
: // no clobber needed
);
Note that there's no actual asm code output here -- just a comment. The input constraint is sufficient to get the operand into the needed register prior to the point where this appears. Yes, rsi might well be used for something else afterwards, but that is as expected -- the register constraints just cover the inputs and outputs of the asm text.
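To see the constraint doing the work, here is a small sketch that copies whatever is in RSI into a compiler-chosen output register. The name value_seen_in_rsi is made up for illustration, and the non-x86-64 path simply returns the input:

```cpp
// Hypothetical helper: returns the value the asm statement finds in RSI.
static inline const char* value_seen_in_rsi(const char* ystr) {
    const char* seen = ystr;
#if defined(__x86_64__)
    // The "S" input constraint makes the compiler place ystr in RSI before
    // the template runs; the template itself only reads that register.
    asm("mov %%rsi, %0"
        : "=r"(seen)
        : "S"(ystr));
#endif
    return seen;
}
```

No clobber is listed because RSI is only read, never modified.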
In my case, C++ code was compiled with -std=c++17 and the compiler also reported expected ')' before ':' token
I changed the keyword asm to __asm__ and in my case, this helped.
This modification was inspired by "When writing code that can be compiled with -ansi and the various -std options, use __asm__ instead of asm" from https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html, however as commented below, this may not be completely precise.
Assume we are using GCC (or a GCC-compatible compiler) on an x86-64 architecture, and that eax, ebx, ecx, edx and level are variables (unsigned int or unsigned int*) for input and output of the instruction (like here).
asm("CPUID":::);
asm volatile("CPUID":::);
asm volatile("CPUID":::"memory");
asm volatile("CPUID":"=a"(eax),"=b"(ebx),"=c"(ecx),"=d"(edx)::"memory");
asm volatile("CPUID":"=a"(eax):"0"(level):"memory");
asm volatile("CPUID"::"a"(level):"memory"); // Not sure of this syntax
asm volatile("CPUID":"=a"(eax),"=b"(ebx),"=c"(ecx),"=d"(edx):"0"(level):"memory");
asm("CPUID":"=a"(eax),"=b"(ebx),"=c"(ecx),"=d"(edx):"0"(level):"memory");
asm volatile("CPUID":"=a"(eax),"=b"(ebx),"=c"(ecx),"=d"(edx):"0"(level));
I am not used to the inline assembly syntax, and I am wondering what would be the difference between all these calls, in a context where I just want to use CPUID as a serializing instruction (e.g. nothing will be done with the output of the instruction).
Can some of these calls lead to errors?
Which one(s) of these calls would be the most suitable (given that I want as little overhead as possible, but at the same time the "strongest" serialization possible)?
First of all, lfence may be strongly serializing enough for your use-case, e.g. for rdtsc. If you care about performance, check and see if you can find evidence that lfence is strong enough (at least for your use-case). Possibly even using both mfence; lfence might be better than cpuid, if you want to e.g. drain the store buffer before an rdtsc.
But neither lfence nor mfence is serializing on the whole pipeline in the official technical-terminology meaning, which could matter for cross-modifying code - discarding instructions that might have been fetched before some stores from another core became visible.
2. Yes, all the ones that don't tell the compiler that the asm statement writes E[A-D]X are dangerous and will likely cause hard-to-debug weirdness. (i.e. you need to use (dummy) output operands or clobbers).
You need volatile, because you want the asm code to be executed for the side-effect of serialization, not to produce the outputs.
If you don't want to use the CPUID result for anything (e.g. do double duty by serializing and querying something), you should simply list the registers as clobbers, not outputs, so you don't need any C variables to hold the results.
// volatile is already implied because there are no output operands
// but it doesn't hurt to be explicit.
// Serialize and block compile-time reordering of loads/stores across this
asm volatile("CPUID"::: "eax","ebx","ecx","edx", "memory");
// the "eax" clobber covers RAX in x86-64 code, you don't need an #ifdef __i386__
I am wondering what would be the difference between all these calls
First of all, none of these are "calls". They're asm statements, and inline into the function where you use them. CPUID itself is not a "call" either, although I guess you could look at it as calling a microcode function built-in to the CPU. But by that logic, every instruction is a "call", e.g. mul rcx takes inputs in RAX and RCX, and returns in RDX:RAX.
The first three (and the later one with no outputs, just a level input) destroy RAX through RDX without telling the compiler. It will assume that those registers still hold whatever it was keeping in them. They're obviously unusable.
asm("CPUID":"=a"(eax),"=b"(ebx),"=c"(ecx),"=d"(edx):"0"(level):"memory"); (the one without volatile) will optimize away if you don't use any of the outputs. And if you do use them, it can still be hoisted out of loops. A non-volatile asm statement is treated by the optimizer as a pure function with no side effects. https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#index-asm-volatile
It has a memory clobber, but (I think) that doesn't stop it from optimizing away, it just means that if / when / where it does run, any variables it could possibly read / write are synced to memory, so memory contents match what the C abstract machine would have at that point. This may exclude locals that haven't had their address taken, though.
asm("" ::: "memory") is very similar to std::atomic_thread_fence(std::memory_order_seq_cst), but note that that asm statement has no outputs, and thus is implicitly volatile. That's why it isn't optimized away, not because of the "memory" clobber itself. A (volatile) asm statement with a memory clobber is a compiler barrier against reordering loads or stores across it.
The optimizer doesn't care at all what's inside the first string literal, only the constraints / clobbers, so asm volatile("anything" ::: register clobbers, "memory") is also a compile-time-only memory barrier. I assume this is what you want, to serialize some memory operations.
"0"(level) is a matching constraint for the first operand (the "=a"). You could equally have written "a"(level), because in this case the compiler doesn't have a choice of which register to select; the output constraint can only be satisfied by eax. You could also have used "+a"(eax) as the output operand, but then you'd have to set eax=level before the asm statement. Matching constraints instead of read-write operands are sometimes necessary for x87 stack stuff; I think that came up once in an SO question. But other than weird stuff like that, the advantage is being able to use different C variables for input and output, or not using a variable at all for the input. (e.g. a literal constant, or an lvalue (expression)).
Anyway, telling the compiler to provide an input will probably result in an extra instruction, e.g. level=0 would result in an xor-zeroing of eax. This would be a waste of an instruction if it didn't already need a zeroed register for anything earlier. Normally xor-zeroing an input would break a dependency on the previous value, but the whole point of CPUID here is that it's serializing, so it has to wait for all previous instructions to finish executing anyway. Making sure eax is ready early is pointless; if you don't care about the outputs, don't even tell the compiler your asm statement takes an input. Compilers make it difficult or impossible to use an undefined / uninitialized value with no overhead; sometimes leaving a C variable uninitialized will result in loading garbage from the stack, or zeroing a register, instead of just using a register without writing it first.
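For the other case, where you do want the results, a sketch of a wrapper using "a"(level) as the input (equivalent to the "0"(level) matching constraint discussed above). The names cpuid_regs and cpuid_leaf are hypothetical, and the function just reports false on non-x86 targets:

```cpp
// Hypothetical result type and wrapper, for illustration only.
struct cpuid_regs { unsigned eax, ebx, ecx, edx; };

static inline bool cpuid_leaf(unsigned level, cpuid_regs& out) {
#if defined(__x86_64__) || defined(__i386__)
    // All four registers CPUID writes are declared as outputs, so the
    // compiler knows their old contents are gone.
    asm volatile("cpuid"
                 : "=a"(out.eax), "=b"(out.ebx), "=c"(out.ecx), "=d"(out.edx)
                 : "a"(level)
                 : "memory");
    return true;
#else
    out = cpuid_regs{0, 0, 0, 0};
    return false;  // CPUID does not exist on this target
#endif
}
```

Leaf 0 returns the highest supported basic leaf in EAX. Leaves that take a sub-leaf would also need an ECX input, which this sketch leaves out.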
I want to create a function for adding two 16-bit integers with overflow detection. I have a generic variant written in portable C, but it is not optimal for an x86 target, because the CPU already computes the overflow flag internally when it executes ADD/SUB/etc. Of course, there is __builtin_add_overflow(), but in my case it generates some boilerplate.
So I wrote the following code:
#include <cstdint>
struct result_t
{
uint16_t src;
uint16_t dst;
uint8_t of;
};
static void add_u16_with_overflow(result_t& r)
{
char of, cf;
asm (
" addw %[dst], %[src] "
: [dst] "+mr"(r.dst)//, "=#cco"(of), "=#ccc"(cf)
: [src] "imr" (r.src)
: "cc"
);
asm (" seto %0 " : "=rm" (r.of) );
}
uint16_t test_add(uint16_t a, uint16_t b)
{
result_t r;
r.src = a;
r.dst = b;
add_u16_with_overflow(r);
add_u16_with_overflow(r);
return (r.dst + r.of); // use r.dst and r.of for prevent discarding
}
I've played with https://godbolt.org/g/2mLF55 (gcc 7.2 -O2 -std=c++11) and it results in:
test_add(unsigned short, unsigned short):
seto %al
movzbl %al, %eax
addw %si, %di
addw %si, %di
addl %esi, %eax
ret
So, seto %0 is reordered. It seems gcc thinks there is no dependency between two consecutive asm() statements, and the "cc" clobber doesn't have any effect on flag dependencies.
I can't use volatile because seto %0 or whole function can be (and have to) optimized out if result (or some part of result) is not used.
I can add a dependency on r.dst: asm (" seto %0 " : "=rm" (r.of) : "rm"(r.dst) );, and the reordering will not happen. But it is not the "right thing", and the compiler can still insert code that changes the flags (but not r.dst) between the add and seto statements.
Is there a way to say "this asm() statement changes some cpu flags" and "this asm() uses some cpu flags", to create a dependency between the statements and prevent reordering?
I haven't looked at gcc's output for __builtin_add_overflow, but how bad is it? @David's suggestion to use it, and https://gcc.gnu.org/wiki/DontUseInlineAsm, is usually good advice, especially if you're worried about how this will optimize. asm defeats constant propagation and some other things.
Also, if you are going to use asm, note that AT&T syntax uses the add %[src], %[dst] operand order. See the tag wiki for details, unless you're always going to build your code with -masm=intel.
Is there a way to say "this asm() statement changes some cpu flags" and "this asm() uses some cpu flags", to create a dependency between the statements and prevent reordering?
No. Put the flag-consuming instruction (seto) inside the same asm block as the flag-producing instruction. An asm statement can have as many input and output operands as you like, limited only by register-allocation difficulty (but multiple memory outputs can use the same base register with different offsets). Anyway, an extra write-only output on the statement containing the add isn't going to cause any inefficiency.
I was going to suggest that if you want multiple flag outputs from one instruction, use LAHF to Load AH from FLAGS. But that doesn't include OF, only the other condition codes. This is often inconvenient and seems like a bad design choice because there are some unused reserved bits in the low 8 of EFLAGS/RFLAGS, so OF could have been in the low 8 along with CF, SF, ZF, PF, and AF. But since that isn't the case, setc + seto are probably better than pushf / reload, but that is worth considering.
Even if there was syntax for flag-input (like there is for flag-output), there would be very little to gain from letting gcc insert some of its own non-flag-modifying instructions (like lea or mov) between your two separate asm statements.
You don't want them reordered or anything, so putting them in the same asm statement makes by far the most sense. Even on an in-order CPU, add is low latency so it's not a big bottleneck to put a dependent instruction right after it.
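A minimal sketch of that single-statement version, reusing result_t from the question. Two choices here are mine rather than anything from the question's code: dst is forced into a register so that add never sees two memory operands, and the non-x86 branch falls back to __builtin_add_overflow on int16_t, since OF reports signed overflow:

```cpp
#include <cstdint>

struct result_t {
    std::uint16_t src;
    std::uint16_t dst;
    std::uint8_t  of;
};

static inline void add_u16_with_overflow(result_t& r) {
#if defined(__x86_64__) || defined(__i386__)
    // add and seto live in one statement, so nothing can be scheduled
    // between the flag producer and the flag consumer.
    asm("addw %[src], %[dst]\n\t"   // AT&T order: source first, destination last
        "seto %[of]"
        : [dst] "+r"(r.dst), [of] "=qm"(r.of)
        : [src] "irm"(r.src)
        : "cc");
#else
    // Assumption: GCC-compatible compiler. Wraps like the hardware add does.
    std::int16_t sum;
    r.of = __builtin_add_overflow(static_cast<std::int16_t>(r.dst),
                                  static_cast<std::int16_t>(r.src), &sum);
    r.dst = static_cast<std::uint16_t>(sum);
#endif
}
```

The statement is not volatile, so it can still be dropped when neither r.dst nor r.of is used afterwards.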
And BTW, a jcc might be more efficient if overflow is an error condition that doesn't happen normally. But unfortunately GNU C asm goto doesn't support output operands. You could take a pointer input and modify dst in memory (and use a "memory" clobber), but forcing a store/reload sucks more than using setc or seto to produce an input for a compiler-generated test/jnz.
If you didn't also need an output, you could put C labels on a return true and a return false statement, which (after inlining) would turn your code into a jcc to wherever the compiler wanted to lay out the branches of an if(). e.g. see how Linux does it (with extra complicating factors in these two examples I found): setting up to patch the code after checking a CPU feature once at boot, or something with a section for a jump table in arch_static_branch.
I have two similar issues when handling arrays when defined in the asm and when passed from c++ to asm. The code works fine inline but I need to separate them from the cpp into an asm file. The compiler may not throw an error or warning but the end result is random each run and should be constant like it was when inline.
The below code works when used in MMX (movq mm6,twosMask_W) but I need the equivalent for SSE2. I thought that this would work but I appear to be incorrect.
.data
align 16
twosMask_W qword 2 dup(0002000200020002h)
.code
...
movdqa xmm6,oword ptr twosMask_W
...
The second issue is when I pass my thresh128 array from C++ to asm (again for SSE2):
//C++
uint64_t thresh128[2];
thresh128[0] = ((thresh-1)<<8)+(thresh-1);
thresh128[0] += (thresh128[0]<<48)+(thresh128[0]<<32)+(thresh128[0]<<16);
thresh128[1] = thresh128[0];
sendToASM(thresh128)
//ASM
;There are more parameters that utilize the registers but not listed.
receivedFromCPP proc thresh:qword
public receivedFromCPP
...
movdqu xmm4,oword ptr thresh
...
I've tried having thresh as an oword parameter in the procedure but it yielded no results. I'm sure I've got some syntax or parameter type wrong. Any help would be greatly appreciated.
Note: Compiled using MASM in VS2013 for x86.
Well, I tested the first part and it seems to work - so I cannot say anything related to this particular issue.
Concerning the second problem: you seem to pass a 64-bit qword on the stack in 32-bit mode (where there is no direct opcode for 64-bit PUSHes, so it would take 2 PUSHes):
receivedFromCPP proc thresh:qword
but are expecting a pointer to a 128 bit value on the stack:
movdqu xmm4,oword ptr thresh
Also keep in mind the little-endianness of x86: depending on how the compiler chooses to PUSH the 2×64-bit array, the layout may differ from a little-endian 128-bit value, resulting in seemingly random values.
EDIT: Because the stack grows downward, a 128-bit value has to be PUSHed in reverse order for referencing it by EBP.
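One thing worth checking on the C++ side: an array argument decays to a pointer, so a call like sendToASM(thresh128) hands over the address of the first element, not the 16 bytes themselves. The sketch below models that in plain C++. The names build_thresh128 and receive_thresh are hypothetical stand-ins (the second one for the asm procedure), and the memcpy stands in for a 16-byte load through the received pointer:

```cpp
#include <cstdint>
#include <cstring>

struct vec128 { std::uint8_t bytes[16]; };

// Same arithmetic as in the question, wrapped in a hypothetical helper.
static void build_thresh128(std::uint64_t thresh, std::uint64_t out[2]) {
    out[0] = ((thresh - 1) << 8) + (thresh - 1);
    out[0] += (out[0] << 48) + (out[0] << 32) + (out[0] << 16);
    out[1] = out[0];
}

// Stand-in for the asm procedure: the parameter is a pointer, so the
// 16 bytes have to be loaded through it.
static vec128 receive_thresh(const std::uint64_t* thresh) {
    vec128 v;
    std::memcpy(v.bytes, thresh, sizeof v.bytes);
    return v;
}
```

If that reading applies to your code, the asm side would need to load the pointer into a register first and then do the 128-bit load from the address it holds.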
This is my first attempt at using assembly and I'm just trying to use the Intel Architecture instruction FABS. (Referencing this document on page 399).
This is simply supposed to clear the sign bit.
The little I know about assembly involves sources and destinations but I'm not seeing any reference to the notation for this instruction.
Below is one of my attempts at using it (using Visual studio 2012, C++):
double myabs(double x){
__asm(fabs(x));
return x;
}
This particular attempt gives the error C2400: inline assembler syntax error in 'opcode'; found '('
Please note that I want to use the assembly instruction and am not interested in other / "better" options that are available.
Several pointers: First - you're using gcc style inline assembly, in MS style you could use -
__asm{ ... }
Second - instructions aren't functions, so the parentheses there are also wrong.
Last but most important - fabs doesn't take an argument, it just works on the top of the FP stack. You need to explicitly load your variable there first. Try this:
__asm {
fld x
fabs
fstp x
}
Anyway, using old x87 instructions is probably not a good thing, it's probably quite inefficient - you should consider switching to an SSE solution, see - How to absolute 2 double or 4 floats using SSE instruction set? (Up to SSE4)
With VC++, you don't enclose the assembly language in parentheses. Correct syntax would be more like:
__asm fabs
or:
__asm {
fabs
// possibly more instructions here
}
In your specific case, you'd probably want something like:
__asm {
fld x // load x onto F.P. stack
fabs // take absolute value
fstp x // store back to x and pop from F.P. stack.
}
As far as source and destination go, floating point on an x86 uses a stack. Unless you specify otherwise, most instructions (other than load/store) take operands from the top of the stack and deposit results on the top of the stack as well. For example, with no operand given, fabs will take the absolute value of the operand at the top of the floating point stack and deposit the result back in the same place.
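The __asm { } blocks above are MSVC syntax. For readers on GCC or Clang, a rough equivalent is sketched below, assuming an x86 target: the "t" constraint means "top of the x87 stack", so the compiler does the load and the store-and-pop itself. The std::fabs fallback is only there so the snippet builds on other targets:

```cpp
#include <cmath>

static inline double myabs(double x) {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    // "+t": x is read from and written back to st(0); fabs has no
    // explicit operands and works on the top of the FP stack.
    asm("fabs" : "+t"(x));
#else
    x = std::fabs(x);  // portable fallback, not the x87 instruction
#endif
    return x;
}
```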