I'm writing an RPC library for AVR and need to pass a function address to some inline assembler code and call the function from within the assembler code. However the assembler complains when I try to call the function directly.
This minimal example test.cpp illustrates the issue (in the actual case I'm passing args and the function is an instantiation of a static member of templated class):
void bar () {
return;
}
void foo() {
asm volatile (
"call %0" "\n"
:
: "p" (bar)
);
}
Compiling with avr-gcc -S test.cpp -o test.S -mmcu=atmega328p works fine but when I try to assemble with avr-gcc -c test.S -o test.o -mmcu=atmega328p avr-as complains:
test.c: Assembler messages:
test.c:38: Error: garbage at end of line
I have no idea why it writes "test.c", the file it is referring to is test.S, which contains this on line 38:
call gs(_Z3barv)
I have tried all even remotely sensible constraints on the parameter to the inline assembler that I could find here, but none of those I tried worked.
I imagine if the gs() part was removed, everything should work, but all constraints seem to add it. I have no idea what it does.
The odd thing is that doing an indirect call like this assembles just fine:
void bar () {
return;
}
void foo() {
asm volatile (
"ldi r30, lo8(%0)" "\n"
"ldi r31, hi8(%0)" "\n"
"icall" "\n"
:
: "p" (bar)
);
}
The assembler produced looks like this:
ldi r30, lo8(gs(_Z3barv))
ldi r31, hi8(gs(_Z3barv))
icall
And avr-as doesn't complain about any garbage.
There are several issues with the code:
Issue 1: Wrong Constraint
The correct constraint for a call target is "i", i.e. an immediate operand whose value is known at link time.
Issue 2: Wrong % print-modifier
In order to print an address suitable for a call, use %x, which will print a plain symbol without gs(). Generating a linker stub at this place by means of gs() is not valid syntax, hence "garbage at end of line". Apart from that, as you are calling bar directly, there is no need for a linker stub (at least not for this kind of symbol usage).
Issue 3: call instruction might not be available
To factor out whether a device supports call or just rcall, there is %~ which prints a single r if just rcall is available, and nothing if call is available.
Issue 4: The Call might clobber Registers or have other Side-Effects
It's unlikely that the call has no effects on registers or on memory whatsoever. If your description of the inline asm does not match the side-effects of the code, it's likely that you will get wrong code sooner or later.
Taking it all together
Let's assume you have a function bar written in assembly that takes two 16-bit operands in R22 and R26, and computes a result in R22. This function does not obey the avr-gcc C/C++ calling convention, so inline assembly is one way to interface to such a function. For bar we cannot write a correct prototype anyway, so we just provide a prototype so that we can use the symbol bar. Register X has the constraint "x", but R22 has no register constraint of its own, and therefore we have to use a local asm register:
extern "C" void bar (...);
int call_bar (int x, int y)
{
register int r22 __asm ("r22") = x;
__asm ("%~call %x2"
: "+r" (r22)
: "x" (y), "i" (bar));
return r22;
}
Generated code for ATmega32 + optimization:
_Z8call_barii:
movw r26,r22
movw r22,r24
call bar
movw r24,r22
ret
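For the more common case where the callee is an ordinary C function that does follow the avr-gcc ABI, the four points above might be combined as in the following sketch. The names bump, counter and call_bump are made up for illustration, and the clobber list assumes the usual avr-gcc call-clobbered set (R18...R27, R30, R31). The non-AVR branch is only there so that the snippet also builds with a host compiler.

```c
#include <stdint.h>

volatile uint8_t counter;          /* hypothetical state touched by the callee */

void bump(void)                    /* hypothetical callee, normal C ABI */
{
    counter++;
}

void call_bump(void)
{
#if defined(__AVR__)
    /* %~ prints "r" on devices that only have RCALL, %x0 prints the bare
       symbol without gs().  Assumption: bump obeys the avr-gcc ABI, so it
       may change R18...R27, R30, R31 and memory. */
    __asm__ volatile ("%~call %x0"
                      :
                      : "i" (bump)
                      : "r18", "r19", "r20", "r21", "r22", "r23",
                        "r24", "r25", "r26", "r27", "r30", "r31",
                        "memory");
#else
    bump();                        /* host fallback: plain C call */
#endif
}
```

If the callee is hand-written assembly, the clobber list should instead name exactly what that routine changes, as in call_bar above.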
So what's that "generate stub" gs() thing?
Suppose the C/C++ code is taking the address of a function. The only sensible thing to do with it is to call that function, which will be an indirect call in general. Now an indirect call can target 64KiW = 128KiB at most, so that on devices with > 128KiB of code memory, special means must be taken to indirectly call a function beyond the 128KiB boundary. The AVR hardware features an SFR named EIND for that purpose, but problems using it are obvious. You'd have to set it prior to a call and then reset it somehow somewhere; all evil things would be necessary.
avr-gcc takes a different approach: For each such address taken, the compiler generates gs(func). This will just resolve to func if the address is in the 128KiB range. If not, gs() resolves to an address in section .trampolines which is located close to the beginning of flash, i.e. in the lower 128KiB. .trampolines contains a list of direct JMPs to targets beyond the 128KiB range.
Take for example the following C code:
extern int far_func (void);
int main (void)
{
int (*pfunc)(void) = far_func;
__asm ("" : "+r" (pfunc)); /* Forget content of pfunc. */
return pfunc();
}
The __asm is used to keep the compiler from optimizing the indirect call to a direct one. Then run
> avr-gcc main.c -o main.elf -mmcu=atmega2560 -save-temps -Os -Wl,--defsym,far_func=0x24680
> avr-objdump -d main.elf > main.lst
For the sake of brevity, we just define the symbol far_func on the command line.
The assembly dump in main.s shows that far_func might require a linker stub:
main:
ldi r30,lo8(gs(far_func))
ldi r31,hi8(gs(far_func))
eijmp
The final executable listing in main.lst then shows that the stub is actually generated and used:
main.elf: file format elf32-avr
Disassembly of section .text:
...
000000e4 <__trampolines_start>:
e4: 0d 94 40 23 jmp 0x24680 ; 0x24680 <far_func>
...
00000104 <main>:
104: e2 e7 ldi r30, 0x72 ; 114
106: f0 e0 ldi r31, 0x00 ; 0
108: 19 94 eijmp
main loads Z=0x0072 which is a word address for byte address 0x00e4, i.e. the code is indirectly jumping to 0x00e4, and from there it jumps directly to 0x24680.
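The address arithmetic in this walkthrough can be checked with a few lines of C. This is only a simplified model of what the linker decides (a 16-bit word address reaches 64KiW = 128KiB), and both function names are invented for the example.

```c
#include <stdint.h>

/* AVR code addresses held in Z are word addresses: 1 word = 2 bytes. */
uint32_t avr_word_to_byte_addr(uint32_t word_addr)
{
    return word_addr << 1;
}

/* Simplified model: a 16-bit word address cannot reach byte addresses at
   or above 0x20000 (128KiB), so such targets need a trampoline. */
int avr_needs_trampoline(uint32_t byte_addr)
{
    return byte_addr >= 0x20000u;
}
```

With the numbers from the listing, Z=0x0072 maps to byte address 0x00e4 (the trampoline), and far_func at 0x24680 lies beyond the 128KiB boundary.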
Note that call requires a constant, known-at-link-time value. The "p" constraint does not include that semantics; it would also allow a pointer from a variable (e.g. char* x), which call cannot handle. (I seem to remember that sometimes gcc is clever enough to optimize in such a way that "p" does work here - but that's basically undocumented behavior and non-deterministic, so better not count on it.)
If the function you're calling actually is compile-time constant you can use "i" (bar). If it's not, then you have no other choice than using icall as you already figured out.
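For the run-time case, the icall variant can let the compiler load the Z register itself by means of the AVR-specific "z" constraint. This is a minimal sketch under a few assumptions: the device has at most 128KiB of flash (larger devices need EIND and eicall), the callee follows the avr-gcc ABI, and the names handler_t, on_event, hits and dispatch are hypothetical. The non-AVR branch only exists so the snippet builds on a host.

```c
typedef void (*handler_t)(void);   /* hypothetical callback type */

int hits;                          /* hypothetical state for the demo */

void on_event(void)                /* hypothetical callee */
{
    hits++;
}

void dispatch(handler_t fp)
{
#if defined(__AVR__)
    /* "+z" puts fp into Z (R30:R31) and tells the compiler that Z does not
       survive the call.  The remaining call-clobbered registers of the
       avr-gcc ABI and memory are listed as clobbers. */
    __asm__ volatile ("icall"
                      : "+z" (fp)
                      :
                      : "r18", "r19", "r20", "r21", "r22", "r23",
                        "r24", "r25", "r26", "r27", "memory");
#else
    fp();                          /* host fallback: ordinary indirect call */
#endif
}
```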
Btw, the AVR section of https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints documents some more, AVR-specific constraints.
I've tried various ways of passing a C function name to inline ASM code without success. However, I did find a workaround, which seems to provide the desired result.
Answer to the question:
As explained on https://www.nongnu.org/avr-libc/user-manual/inline_asm.html you can assign an ASM name to a C function in a prototype declaration:
void bar (void) asm ("ASM_BAR"); // any name possible here
void bar (void)
{
return;
}
Then you can call the function easily from your ASM code:
asm volatile("call ASM_BAR");
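To see that the label really becomes the linker-level symbol name, without writing any target-specific call instruction, one can declare a second C name that refers to the same symbol. This is just a sketch: labeled_bar, bar_by_label and call_by_label are hypothetical names, and it relies on the GNU asm-label extension described above.

```c
/* Define a function whose symbol name is ASM_BAR instead of the C name. */
int labeled_bar(void) __asm__("ASM_BAR");

int labeled_bar(void)
{
    return 42;
}

/* A second, purely illustrative C name bound to the same symbol. */
int bar_by_label(void) __asm__("ASM_BAR");

int call_by_label(void)
{
    /* Resolves to ASM_BAR, i.e. to labeled_bar above. */
    return bar_by_label();
}
```

An inline asm "call ASM_BAR" refers to that same symbol, which is why the workaround works.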
Use with library functions:
This approach does not work with library functions, because they have their own prototype declarations. To call a function like system_tick() of the time.h library more efficiently from an ISR, you can declare a helper function. Unfortunately GCC does not apply the inline setting to calls from ASM code.
inline void asm_system_tick(void) asm ("ASM_SYSTEM_TICK") __attribute__((always_inline));
void asm_system_tick(void)
{
system_tick();
}
In the following example GCC only generates push/pop instructions for the surrounding code, not for the function call! Note that system_tick() is specifically designed for ISR_NAKED and does all required stack operations on its own.
volatile uint8_t tick = 0;
ISR(TIMER2_OVF_vect)
{
tick++;
if (tick > 127)
{
tick = 0;
asm volatile ("call ASM_SYSTEM_TICK");
}
}
Because the inline attribute does not work, each function call takes 8 additional CPU cycles. Compared to the 5632 CPU cycles required for push/pop operations with a normal function call (44 CPU cycles for each run of the ISR), it is still a very impressive improvement.
TL;DR; I am looking for a standard way to basically tell the compiler to pass whatever happened to be in a given register to the next function.
Basically I have a function int bar(int a, int b, int c). In some cases c is unused and I would like to be able to call bar in the cases where c is unused without modifying rdx in any way.
For example if I have
int foo(int a, int b) {
int no_init;
return bar(a, b, no_init);
}
I would like the assembly to just be:
For a tailcall
jmp bar
or for a normal call
call bar
Note: clang generally produces what I am looking for. But I am unsure if this will always be the case in more complex functions and I am hoping to not have to check the assembly each time I build.
GCC produces:
For a tailcall
xorl %edx, %edx
jmp bar
or for a normal call
xorl %edx, %edx
call bar
I can get the results I want using inline assembly i.e changing foo (for tail calls) to
int foo(int a, int b) {
asm volatile("jmp bar" : : :);
__builtin_unreachable();
}
which compiles to just
jmp bar
I understand that the performance implications of an xorl %edx, %edx is about as close to 0 as possible but
I am wondering if there is a standard way to achieve this.
I.e I can probably find a hack for it for any given case. But that will require me verifying the assembly each time. I am looking for a method that you can basically tell the compiler "pass whatever happened to be in register".
See for examples: https://godbolt.org/z/eh1vK8
Edit: This is happening with -O3 set.
I am wondering if there is a standard way to achieve this.
I.e I can probably find a hack for it for any given case. But that
will require me verifying the assembly each time. I am looking for a
method that you can basically tell the compiler "pass whatever
happened to be in register".
No, there is no standard way to achieve it in either C or C++. Neither of these languages speaks to any lower-level function call semantics, nor even acknowledges the existence of CPU registers,* and both languages require every function call to provide arguments corresponding to all non-optional parameters (which is simply "all declared parameters" in C).
For example if I have
int foo(int a, int b) {
int no_init;
return bar(a, b, no_init);
}
... then you reap undefined behavior as a result of using the value of no_init while it is indeterminate. Whatever any particular C or C++ implementation that accepts that at all does with it is non-standard by definition.
If you want to call bar(), but you don't care what value is passed as the third argument, then why not just choose a convenient value to pass? Zero, for example:
return bar(a, b, 0);
*Even the register keyword does not do this as far as either language standard is concerned.
Note that if the called function does read its 3rd arg, leaving it unwritten risks creating a false dependency on whatever last used EDX. For example it might be the result of a cache-miss load, or a long chain of calculations.
GCC is careful to xor-zero to break false dependencies in a lot of cases, e.g. before cvtsi2ss (bad ISA design) or popcnt (Sandybridge-family quirk).
Usually the xor edx,edx is basically a wasted 2-byte NOP, but it does prevent possible coupling of otherwise-independent dependency chains (critical paths).
If you're sure you want to defeat the compiler's attempt to protect you from that, then Nate's asm("" :"=r"(var)); is a good way to do an integer version of _mm_undefined_ps() that actually leaves a register uninitialized. (Note that _mm_undefined_ps doesn't guarantee leaving an XMM reg unwritten; some compilers will xor-zero one for you instead of fully implementing the false-dependency recklessness that intrinsic was designed to allow for Intel's compiler.)
One approach that should work for gcc/clang on most platforms is to do
int no_init;
asm("" : "=r" (no_init));
return bar(a, b, no_init);
This way you don't have to lie to the compiler about the prototype of bar (which could break some calling conventions), and you fool the compiler into thinking no_init is really initialized.
I would wonder about an architecture like Itanium with its "trap bit" that causes a fault when an uninitialized register is accessed. This code would probably not be safe there.
There is no portable way to get this behavior that I know of, but you could ifdef it:
#ifdef __GNUC__
#define UNUSED_INT ({ int x; asm("" : "=r" (x)); x; })
#else
#define UNUSED_INT 0
#endif
// ...
bar(a, b, UNUSED_INT);
Then you can fall back to the (infinitesimally) less efficient but correct code when necessary.
It results in a bare jmp on gcc/x86-64, see https://godbolt.org/z/d3ordK. On x86-32 it is not quite optimal as it pushes an uninitialized register, instead of just adjusting an existing subtraction from esp. Note that a bare jmp/call is not safe on x86-32 because that third stack slot may contain something important, and the callee is allowed to overwrite it (even if the variable is unused on the path you have in mind, the compiler could be using it as scratch space).
One portable alternative would be to rewrite bar to be variadic. However, then it would need to use va_arg to retrieve the third argument when it is present, and that tends to be less efficient.
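A variadic version might look like the following sketch. The function name bar_variadic and the convention that the first parameter says whether a third argument was passed are assumptions made for this example, since a variadic callee needs some way to know whether the argument is there.

```c
#include <stdarg.h>

/* Hypothetical variadic variant of bar: has_c tells whether a third
   int argument follows b. */
int bar_variadic(int has_c, int b, ...)
{
    int c = 0;

    if (has_c) {
        va_list ap;
        va_start(ap, b);           /* b is the last named parameter */
        c = va_arg(ap, int);       /* fetch the optional third argument */
        va_end(ap);
    }
    return b + c;                  /* placeholder for the real work */
}
```

Callers that do not need the third argument write bar_variadic(0, b) and pass nothing extra, at the price of the va_arg machinery in the callee.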
Cast the function to have the smaller signature (i.e. fewer parameters):
extern int bar(int, int, int);
int foo(int a, int b) {
return ((int (*)(int,int))bar)(a, b);
}
Maybe make a macro for 2 parameter bar, and even get rid of foo:
extern int bar3(int, int, int);
#define bar2(a,b) ((int (*)(int,int))bar3)(a,b)
int userOfBar(int a, int b) { return bar2 (a,b); }
https://godbolt.org/z/Gn4a69
Oddly, given the above gcc doesn't touch %edx, but clang does... oh, well.
(Still can't guarantee the compiler won't touch some registers, though, that's its domain. Otherwise, you can write these functions directly in assembly and avoid the middleperson.)
Given this code:
#include <stdio.h>
int main(int argc, char **argv)
{
int x = 1;
printf("Hello x = %d\n", x);
}
I'd like to access and manipulate the variable x in inline assembly. Ideally, I want to change its value using inline assembly. GNU assembler, and using the AT&T syntax.
In GNU C inline asm, with x86 AT&T syntax:
(But https://gcc.gnu.org/wiki/DontUseInlineAsm if you can avoid it).
// this example doesn't really need volatile: the result is the same every time
asm volatile("movl $0, %[some]"
: [some] "=r" (x)
);
after this, x contains 0.
Note that you should generally avoid mov as the first or last instruction of an asm statement. Don't copy from %[some] to a hard-coded register like %%eax, just use %[some] as a register, letting the compiler do register allocation.
See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html and https://stackoverflow.com/tags/inline-assembly/info for more docs and guides.
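As a sketch of the "let the compiler pick the register" advice, here is a read-modify-write of a C variable with a "+r" operand. The function name add_one is made up, the asm branch assumes x86 with the default AT&T syntax, and other targets fall back to plain C so the snippet still builds.

```c
int add_one(int x)
{
#if defined(__x86_64__) || defined(__i386__)
    /* "+r": x is both read and written, in whatever register the compiler
       chooses.  addl changes the flags, hence the "cc" clobber. */
    __asm__ ("addl $1, %[v]"
             : [v] "+r" (x)
             :
             : "cc");
#else
    x += 1;                        /* non-x86 fallback */
#endif
    return x;
}
```

No mov into or out of a fixed register is needed; the operand itself is the register.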
Not all compilers support GNU syntax.
For example, for MSVC you do this:
__asm mov x, 0 and x will have the value of 0 after this statement.
Please specify the compiler you would want to use.
Also note, doing this will restrict your program to compile with only a specific compiler-assembler combination, and will be targeted only towards a particular architecture.
In most cases, you'll get as good or better results from using pure C and intrinsics, not inline asm.
asm("mov $0, %0" : "=r" (x)); -- this may get you on the right track. Specify register use as much as possible for performance and efficiency. However, as Aniket points out, this is highly architecture dependent and requires gcc.
The following program compiles perfectly with no errors or warnings (even with -Wall) in g++, but crashes immediately.
#include <cstdio>
int stuff(void)
{
puts("hello there.");
return 0;
}
int (*main)(void) = stuff;
This is an (obviously horribly misguided) attempt at running a C++ program without explicitly declaring main as a function. It was my intention for the program to execute stuff by binding it to the symbol main. I was very surprised that this compiled, but why exactly does it fail, having compiled? I've looked at the generated assembly but I don't know enough to understand it at all.
I'm fully aware that there are plenty of restrictions on how main can be defined/used, but I'm unclear on how my program breaks any of them. I haven't overloaded main or called it within my program... so exactly what rule am I breaking by defining main this way?
Note: this was not something I was trying to do in actual code. It was actually the beginnings of an attempt to write Haskell in C++.
In the code that runs before main, there is something like:
extern "C" int main(int argc, char **argv);
The problem with your code is that if you have a function pointer called main, it is not the same as a function (as opposed to Haskell, where a function and a function pointer are pretty much interchangeable - at least with my 0.1% knowledge of Haskell).
Whilst the compiler will happily accept:
int (*func)() = ...;
int x = func();
as a valid call to the function pointer func. However, when the compiler generates code to call func, it actually does this in a different way [although the standard doesn't say how this should be done, and it varies on different processor architectures, in practice it loads the value in the pointer variable, and then calls this content].
When you have:
int func() { ... }
int x = func();
the call to func just refers to the address of func itself, and calls that.
So, assuming your code actually does compile, the startup code before main will call the address of your variable main rather than indirectly reading the value in main and then calling that. In modern systems, this will cause a segfault because main lives in the data segment, which is not executable, but in older OS's it would most likely crash because main does not contain real code (but it may execute a few instructions before it falls over in this case - in the dim and distant past, I've accidentally run all sorts of "rubbish" with rather difficult to discover causes...)
But since main is a "special" function, it's also possible that the compiler says "No, you can't do this".
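The difference between the two kinds of call can be made visible in plain C. In this sketch (all names are made up) the pointer variable is a data object at its own address, and only the value stored in it is the address of the code.

```c
#include <stdint.h>

int stuff_fn(void)                      /* real code */
{
    return 7;
}

int (*stuff_ptr)(void) = stuff_fn;      /* data object holding an address */

int call_direct(void)
{
    return stuff_fn();                  /* calls the address of stuff_fn */
}

int call_through_pointer(void)
{
    return stuff_ptr();                 /* loads the pointer, then calls it */
}

/* The pointer object lives somewhere else than the code it points to. */
int pointer_object_is_not_the_code(void)
{
    return (uintptr_t)&stuff_ptr != (uintptr_t)stuff_fn;
}
```

Startup code that expects a function named main does the equivalent of call_direct on the address of the variable, which is why it ends up executing the pointer's bytes.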
It used to work, many years ago to do this:
char main[] = { 0xXX, 0xYY, 0xZZ ... };
but again, this doesn't work in a modern OS, because main ends up in the data section, and it's not executable in that section.
Edit: After actually testing the posted code, at least on my 64-bit Linux, the code actually compiles, but crashes, unsurprisingly, when it tries to execute main.
Running in GDB gives this:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000600950 in main ()
(gdb) bt
#0 0x0000000000600950 in main ()
(gdb) disass
Dump of assembler code for function main:
=> 0x0000000000600950 <+0>: and %al,0x40(%rip) # 0x600996
0x0000000000600956 <+6>: add %al,(%rax)
End of assembler dump.
(gdb) disass stuff
Dump of assembler code for function stuff():
0x0000000000400520 <+0>: push %rbp
0x0000000000400521 <+1>: mov %rsp,%rbp
0x0000000000400524 <+4>: sub $0x10,%rsp
0x0000000000400528 <+8>: lea 0x400648,%rdi
0x0000000000400530 <+16>: callq 0x400410 <puts@plt>
0x0000000000400535 <+21>: mov $0x0,%ecx
0x000000000040053a <+26>: mov %eax,-0x4(%rbp)
0x000000000040053d <+29>: mov %ecx,%eax
0x000000000040053f <+31>: add $0x10,%rsp
0x0000000000400543 <+35>: pop %rbp
0x0000000000400544 <+36>: retq
End of assembler dump.
(gdb) x main
0x400520 <stuff()>: 0xe5894855
(gdb) p main
$1 = (int (*)(void)) 0x400520 <stuff()>
(gdb)
So, we can see that main is not really a function, it's a variable which contains a pointer to stuff. The startup code calls main as if it was a function, but it fails to execute the instructions there (because it's data, and data has the "no execute" bit set - not that you can see that here, but I know it works that way).
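The odd-looking disassembly of main is simply the pointer value 0x400520 read as machine code. On little-endian x86-64 it is stored as the bytes 20 05 40 00 00 00 00 00; the first six decode as and %al,0x40(%rip) and the last two as add %al,(%rax). A small helper (le_byte is a name I made up) shows the byte order:

```c
#include <stdint.h>

/* Return byte number `index` (0 = lowest address) of a 64-bit value as it
   is laid out in little-endian memory.  Assumes index < 8. */
uint8_t le_byte(uint64_t value, unsigned index)
{
    return (uint8_t)(value >> (8u * index));
}
```

For 0x400520 this gives 0x20, 0x05, 0x40 and then zeros, matching the instruction bytes GDB decoded at 0x600950.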
Edit2:
Inspecting dmesg shows:
a.out[7035]: segfault at 600950 ip 0000000000600950 sp 00007fff4e7cb928 error 15 in a.out[600000+1000]
In other words, the segmentation fault happens immediately with the execution of main - because it's not executable.
Edit3:
Ok, so it's slightly more convoluted than that (at least in my C runtime library), as the code that calls main is a function that takes the pointer to main as an argument, and calls it through a pointer. This however doesn't change the fact that when the compiler builds the code, it produces a level of indirection less than it needs, and tries to execute the variable called main rather than the function that the variable is pointing at.
Listing __libc_start_main in GDB:
87 STATIC int
88 LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
89 int argc, char *__unbounded *__unbounded ubp_av,
90 #ifdef LIBC_START_MAIN_AUXVEC_ARG
91 ElfW(auxv_t) *__unbounded auxvec,
92 #endif
At this point, printing main gives us a function pointer that points at 0x600950, which is the variable called main (the same as what I disassembled above).
(gdb) p main
$1 = (int (*)(int, char **, char **)) 0x600950 <main>
Note that this is a different variable main than the one called main in the source posted in the question.
There's nothing special here about it being main(). The same will happen if you do this for any function. Consider this example:
file1.cpp:
#include <cstdio>
void stuff(void)
{
puts("hello there.");
}
void (*func)(void) = stuff;
file2.cpp:
extern "C" {void func(void);}
int main(int argc, char**argv)
{
func();
}
This will also compile, and then segfault. It is essentially doing the same thing for the function func, but because the coding is explicit it now looks more obviously wrong. main() is a plain C type function with no name mangling, and just appears as a name in the symbol table. If you make it something other than a function, you get a segfault when it executes a pointer.
I guess the interesting part is that the compiler will allow you to define a symbol called main when it is already implicitly declared with a different type.
I'm working with a proprietary MCU that has a built-in library in metal (mask ROM). The compiler I'm using is clang, which uses GCC-like inline ASM. The issue I'm running into, is calling the library since the library does not have a consistent calling convention. While I found a solution, I've found that in some cases the compiler will make optimizations that clobber registers immediately before the call, I think there is just something wrong with how I'm doing things. Here is the code I'm using:
int EchoByte()
{
register int asmHex __asm__ ("R1") = Hex;
asm volatile("//Assert Input to R1 for MASKROM_EchoByte"
:
:"r"(asmHex)
:"%R1");
((volatile void (*)(void))(MASKROM_EchoByte))(); //MASKROM_EchoByte is a 16-bit integer with the memory location of the function
}
Now this has the obvious problem that while the variable "asmHex" is asserted to register R1, the actual call does not use it and therefore the compiler "doesn't know" that R1 is reserved at the time of the call. I used the following code to eliminate this case:
int EchoByte()
{
register int asmHex __asm__ ("R1") = Hex;
asm volatile("//Assert Input to R1 for MASKROM_EchoByte"
:
:"r"(asmHex)
:"%R1");
((volatile void (*)(void))(MASKROM_EchoByte))();
asm volatile("//Assert Input to R1 for MASKROM_EchoByte"
:
:"r"(asmHex)
:"%R1");
}
This seems really ugly to me, and like there should be a better way. Also I'm worried that the compiler may do some nonsense in between, since the call itself has no indication that it needs the asmHex variable. Unfortunately, ((volatile void (*)(int))(MASKROM_EchoByte))(asmHex) does not work as it will follow the C-convention, which puts arguments into R2+ (R1 is reserved for scratching)
Note that changing the Mask ROM library is unfortunately impossible, and there are too many frequently used routines to recreate them all in C/C++.
Cheers, and thanks.
EDIT: I should note that while I could call the function in the ASM block, the compiler has an optimization for functions that are call-less, and by calling in assembly it looks like there's no call. I could go this route if there is some way of indicating that the inline ASM contains a function call, but otherwise the return address will likely get clobbered. I haven't been able to find a way to do this in any case.
Per the comments above:
The most conventional answer is that you should implement a stub function in assembly (in a .s file) that simply performs the wacky call for you. In ARM, this would look something like
// void EchoByte(int hex);
_EchoByte:
push {lr}
mov r1, r0 // move our first parameter into r1
bl _MASKROM_EchoByte
pop {pc}
Implement one of these stubs per mask-ROM routine, and you're done.
What's that? You have 500 mask-ROM routines and don't want to cut-and-paste so much code? Then add a level of indirection:
// typedef void MASKROM_Routine(int r1, ...);
// void GeneralPurposeStub(MASKROM_Routine *f, int arg, ...);
_GeneralPurposeStub:
bx r0
Call this stub by using the syntax GeneralPurposeStub(&MASKROM_EchoByte, hex). It'll work for any mask-ROM entry point that expects a parameter in r1. Any really wacky entry points will still need their own hand-coded assembly stubs.
But if you really, really, really must do this via inline assembly in a C function, then (as @JasonD pointed out) all you need to do is add the link register lr to the clobber list.
void EchoByte(int hex)
{
register int r1 asm("r1") = hex;
asm volatile(
"bl _MASKROM_EchoByte"
:
: "r"(r1)
: "r1", "lr" // Compare the codegen with and without this "lr"!
);
}
If you want to call a C/C++ function from inline assembly, you can do something like this:
void callee() {}
void caller()
{
asm("call *%0" : : "r"(callee));
}
GCC will then emit code which looks like this:
movl $callee, %eax
call *%eax
This can be problematic since the indirect call will destroy the pipeline on older CPUs.
Since the address of callee is eventually a constant, one can imagine that it would be possible to use the i constraint. Quoting from the GCC online docs:
`i'
An immediate integer operand (one with constant value) is allowed. This includes symbolic constants whose values will be known only at assembly time or later.
If I try to use it like this:
asm("call %0" : : "i"(callee));
I get the following error from the assembler:
Error: suffix or operands invalid for `call'
This is because GCC emits the code
call $callee
Instead of
call callee
So my question is whether it is possible to make GCC output the correct call.
I got the answer from GCC's mailing list:
asm("call %P0" : : "i"(callee)); // FIXME: missing clobbers
Now I just need to find out what %P0 actually means because it seems to be an undocumented feature...
Edit: After looking at the GCC source code, it's not exactly clear what the modifier P in front of an operand number means. But, among other things, it prevents GCC from putting a $ in front of constant values, which is exactly what I need in this case.
For this to be safe, you need to tell the compiler about all registers that the function call might modify, e.g. : "eax", "ecx", "edx", "xmm0", "xmm1", ..., "st(0)", "st(1)", ....
See Calling printf in extended inline ASM for a full x86-64 example of correctly and safely making a function call from inline asm.
Maybe I am missing something here, but
extern "C" void callee(void)
{
}
void caller(void)
{
asm("call callee\n");
}
should work fine. You need extern "C" so that the name won't be decorated according to the C++ name mangling rules.
If you're generating 32-bit code (e.g. -m32 gcc option), the following asm inline emits a direct call:
asm ("call %0" :: "m" (callee));
The trick is string literal concatenation. Before GCC starts trying to get any real meaning from your code it will concatenate adjacent string literals, so even though assembly strings aren't the same as other strings you use in your program they should be concatenated if you do:
#define ASM_CALL(X) asm("\t call " X "\n")
int main(void) {
ASM_CALL( "my_function" );
return 0;
}
Since you are using GCC you could also do
#define ASM_CALL(X) asm("\t call " #X "\n")
int main(void) {
ASM_CALL(my_function);
return 0;
}
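What the preprocessor produces from the stringizing variant can be checked without assembling anything. In this sketch the macro ASM_CALL_TEXT and the function asm_call_text_matches are hypothetical; the macro yields only the string that ASM_CALL would hand to asm().

```c
#include <string.h>

/* Same construction as ASM_CALL above, minus the asm(): #X turns the
   argument into a string literal, and adjacent literals are concatenated. */
#define ASM_CALL_TEXT(X) "\t call " #X "\n"

int asm_call_text_matches(void)
{
    return strcmp(ASM_CALL_TEXT(my_function),
                  "\t call my_function\n") == 0;
}
```

Note that #X stringizes the C identifier as written, so the result only names the right symbol when the C name and the symbol name are the same.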
If you don't already know you should be aware that calling things from inline assembly is very tricky. When the compiler generates its own calls to other functions it includes code to set up and restore things before and after the call. It doesn't know that it should be doing any of this for your call, though. You will have to either include that yourself (very tricky to get right and may break with a compiler upgrade or compilation flags) or ensure that your function is written in such a way that it does not appear to have changed any registers or condition of the stack (or variable on it).
Edit: this will only work for C function names, not C++ ones, as those are mangled.