__asm__ gcc call to a memory address - c++

I have code that allocates memory, copies a buffer into that allocated memory, and then jumps to that memory address.
The problem is that I can't jump to the memory address. I'm using gcc and __asm__, but I can't call that memory address.
I want to do something like:
address=VirtualAlloc(NULL,len+1, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
dest=strncpy(address, buf, len);
And then I want to do this in ASM:
MOV EAX, dest
CALL EAX
I've tried something like:
__asm__("movl %eax, dest\n\t"
"call %eax\n\t");
But it does not work.
How can I do it?

There is usually no need to use asm for this; you can simply go through a function pointer and let the compiler take care of the details.
You do need to use __builtin___clear_cache(buf, buf+len) after copying machine code to a buffer, before you dereference a function pointer to it; otherwise the copy can be optimized away as a dead store. x86 has coherent instruction caches, so it doesn't compile to any extra instructions, but you still need it so the optimizer knows what's going on.
static inline
int func(char *dest, int len) {
    __builtin___clear_cache(dest, dest + len); // no instructions on x86 but still needed
    int ret = ((int (*)(void))dest)();         // cast to a function pointer and call it
    return ret;
}
compiles with GCC9.1 -O2 -m32 to
func(char*, int):
jmp [DWORD PTR [esp+4]] # tailcall
Also, you don't actually need to copy a string, you can just mprotect or VirtualProtect the page it's in to make it executable. But if you want to make sure it does stop at the first 0 byte to test your shellcode, then sure copy it.
If you nevertheless insist on inline asm, you should know that gcc inline asm is a complex thing. Also, if you expect the function to return, you should really make sure it follows the calling convention, in particular it preserves the registers it should.
AT&T syntax is op src, dst so your mov was actually a store to the global symbol dest.
That said, here is the answer to the question as worded:
int ret;
__asm__ __volatile__ ("call *%0" : "=a" (ret) : "0" (dest) : "ecx", "edx", "memory");
Explanation: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
call *%0 = the %0 refers to the first substituted argument; the * is standard GAS syntax for an indirect call
"=a" (ret) = output argument in eax register should be assigned to variable ret after the block
"0" (dest) = input argument in the same place as output argument 0 (which is eax) should be loaded from dest before the block
"ecx", "edx" = tell the compiler these registers may be altered by the asm block, as per normal calling convention.
"memory" = tell the compiler the asm block might make unspecified modifications to memory, so don't cache anything
Note that in x86-64 System V (Linux / OS X), it's not safe to make a function call from inline asm like this. There's no way to declare a clobber on the red zone below RSP.

Related

Why doesn't this MSVC asm block have a ret, or the non-void function have a return?

I'm learning about using inline assembly inside the C++ code.
Here is the very simple example:
// Power2_inline_asm.c
// compile with: /EHsc
// processor: x86
#include <stdio.h>

int power2(int num, int power);

int main(void)
{
    printf_s("3 times 2 to the power of 5 is %d\n",
             power2(3, 5));
}

int power2(int num, int power)
{
    __asm
    {
        mov eax, num   ; Get first argument
        mov ecx, power ; Get second argument
        shl eax, cl    ; EAX = EAX * ( 2 to the power of CL )
    }
    // Return with result in EAX
}
Since the power2 function returns the result, WHY isn't there a ret instruction at the end of the asm code?
Or a C++ return keyword outside the asm block, before the end of the function?
EAX is implied to contain the return value, and the ret is generated by the compiler (some code is generated by the compiler when __declspec(naked) is not specified). Since there's no C++ return statement, from the C++ point of view the behavior is undefined; the manifestation of that undefined behavior here is to return whatever EAX contains, which is the result.
It seems you're unclear about the relationship between the ret instruction and return values. There is none.
The operand to the ret instruction is not the return value, it's the number of bytes to remove from the stack for calling conventions where the callee handles argument cleanup.
The return value is passed in some other way, controlled by the calling convention, and must be stored before reaching the ret instruction.
Not having a ret instruction in the asm is totally normal; you want to let the compiler generate function prologues / epilogues, including a ret instruction, like they normally do for paths of execution that reach the } in a function. (Or you'd use __declspec(naked) to write the whole function in asm, including handling the calling convention, which would let you use fastcall to take args in registers instead of needing to load them from stack memory).
The more interesting thing is falling off the end of a non-void function without a return. (I edited your question to ask about that, too).
In ISO C++, that's undefined behaviour. (So compilers like clang -fasm-blocks can assume that path of execution is never reached, and not even emit any instructions for it, not even a ret.) But MSVC does at least de-facto define the behaviour of doing that.
MSVC specifically does support falling off the end of a non-void function after an asm statement, treating EAX or ST0 as the return value. Or at least that's how MSVC de-facto works, whether it's intentional support or not, but it does even support inlining such functions, so it's not just a calling-convention abuse of UB. (clang -fasm-blocks does not work that way; IDK about clang-cl. But it does not define the behaviour of falling off the end of a non-void function, fully omitting the ret because that path of execution must not be reachable.)
Not using ret in the asm{} block
ESP isn't pointing at the return address when the asm{} block executes; I think MSVC always forces functions using asm{} to set up EBP as a frame pointer.
You definitely can't just ret out of the middle of a function without giving the compiler a chance to restore call-preserved registers and clean up the stack in the function epilogue.
Also, what if the compiler had inlined power2 into a caller?
Then you'd be returning from that caller (if you did leave / ret in an asm block).
Look at the compiler-generated asm.

Getting the caller's Return Address

I am trying to figure out how to grab the return address of a caller in MSVC. I can use _ReturnAddress() to get the return address of my function, but I can't seem to find a way to get the caller's.
I've tried using CaptureStackBackTrace, but for some reason, it crashes after many, many calls. I would also prefer a solution via inline assembly.
void my_function(){
cout << "return address of caller_function: " << [GET CALLER'S RETURN ADDRESS];
} // imaginary return address: 0x15AF7C0
void caller_function(){
my_function();
}// imaginary return address: 0x15AFA70
Output:
return address of caller_function: 0x15AFA70
In Windows, you can use RtlCaptureStackBackTrace or RtlWalkFrameChain to do this safely without relying on debug-mode code-gen. See RbMn's answer in comments
In GNU C / C++ (docs), the equivalent is
void * __builtin_return_address (unsigned int level). So __builtin_return_address(0) to get your own, __builtin_return_address(1) to get your parent's. The manual warns that it's only 100% safe with an arg of 0 and might crash with higher values, but many platforms do have stack-unwind metadata that it can use.
MSVC 32-bit debug/unoptimized builds only
If there is a preserved call stack (i.e., in debug builds, or when optimizations are disabled) and considering MSVC x86 as the target, you could do something like:
void *__cdecl get_own_retaddr_debugmode()
{
    // you can put this inline asm snippet inside the function
    // whose return address you want
    __asm
    {
        MOV EAX, DWORD PTR SS:[EBP + 4]
    }
    // fall off the end of a non-void function after the asm writes EAX:
    // supported by MSVC but not clang's -fasm-blocks option
}
On debug builds, when optimizations are disabled on the compiler (MSVC compiler argument: /Od) and when the frame pointer is not omitted (MSVC compiler argument: /Oy-), calls to cdecl functions always save the return address at offset +4 of the callee's stack frame. The register EBP holds the base of the running function's stack frame, so the code above returns its own return address, i.e. the address in its caller that it will return to.
With optimization enabled, even this breaks: it can inline into the caller, and MSVC doesn't even set up EBP as a frame pointer for this function (Godbolt compiler explorer) because the asm doesn't reference any C local variables. A naked function that used mov eax, [esp] ; ret would work reliably.
Reading your question again, I think you might want the return address of the caller of the caller. You can do this by accessing the immediate caller's stack frame and then reading its return address. Something like this:
// only works if *the caller* was compiled in debug mode,
// as well as this function
void *__cdecl get_caller_retaddr_unsafe_debug_mode_only()
{
    __asm
    {
        MOV ECX, DWORD PTR SS:[EBP + 0] // [EBP+0] holds the caller's saved EBP (its frame pointer)
        MOV EAX, DWORD PTR SS:[ECX + 4] // get the return address of the caller of the caller
    }
}
It is important to note that this requires the caller to have set up EBP as a frame pointer with the traditional stack-frame layout. This isn't part of the calling convention or ABI in modern OSes; stack unwinding for exceptions uses different metadata. But it will be the case if optimization is disabled for the caller.
As noted by Michael Petch, MSVC doesn't allow the inline asm construct in x86-64 C/C++ code. To compensate, the compiler provides a whole set of intrinsic functions for such tasks.
In the example given above, the __cdecl calling-convention keyword is not in the right order. This is how it should be for current MSVC++ compilers:
// getting our own return address is easy, and should always work;
// using inline asm at all forces MSVC to set up EBP as a frame pointer
// even with optimization enabled, but this function might still inline
// into its caller
void __cdecl *get_own_retaddr()
{
    // you can put this inline asm snippet inside the function
    // whose return address you want
    __asm
    {
        MOV EAX, DWORD PTR SS:[EBP + 4]
    }
    // fall off the end of a non-void function after asm writes EAX:
    // supported by MSVC but not clang's -fasm-blocks option
}
Same applies to other examples provided above.

replace inline assembly tailcall function epilogue with Intrinsics for x86/x64 msvc

I took over an inactive project and have already fixed a lot in it, but I can't get an intrinsics-based replacement to work correctly for the inline assembly it uses, which is no longer supported in the x64 MSVC compiler.
#define XCALL(uAddr) \
__asm { mov esp, ebp } \
__asm { pop ebp } \
__asm { mov eax, uAddr } \
__asm { jmp eax }
Use cases:
static oCMOB * CreateNewInstance() {
XCALL(0x00718590);
}
int Copy(class zSTRING const &, enum zTSTR_KIND const &) {
XCALL(0x0046C2D0);
}
void TrimLeft(char) {
XCALL(0x0046C630);
}
This snippet goes at the bottom of a function (which can't inline, and must be compiled with ebp as a frame pointer, and no other registers that need restoring). It looks quite brittle, or else it's only useful in cases where you didn't need inline asm at all.
Instead of returning, it jumps to uAddr, which is equivalent to making a tailcall.
There aren't intrinsics for arbitrary jumps or manipulation of the stack. If you need that, you're out of luck. It doesn't make sense to ask about this snippet by itself, only with enough context to see how it's being used. i.e. is it important which return address is on the stack, or is it ok for it to compile to call/ret instead of jmp to that address? (See the first version of this answer for a simple example of using it as a function pointer.)
From your update, your use-cases are just a very clunky way to make wrappers for absolute function pointers.
We can instead define static const function pointers of the right types, so no wrapper is needed and the compiler can call directly from wherever you use these. static const is how we let the compiler know it can fully inline the function pointers, and doesn't need to store them anywhere as data if it doesn't want to, just like a normal static const int xyz = 2;
struct oCMOB;
class zSTRING;
enum zTSTR_KIND { a, b, c };   // enum forward declarations are illegal

// C syntax
//static oCMOB* (*const CreateNewInstance)() = (oCMOB *(*const)())0x00718590;

// C++11
static const auto CreateNewInstance = reinterpret_cast<oCMOB *(*)()>(0x00718590);
// passing an enum by const-reference is dumb; by value is more efficient for integer types
static const auto Copy = reinterpret_cast<int (*)(class zSTRING const &, enum zTSTR_KIND const &)>(0x0046C2D0);
static const auto TrimLeft = reinterpret_cast<void (*)(char)>(0x0046C630);

void foo() {
    oCMOB *inst = CreateNewInstance();
    (void)inst;                 // silence unused warning
    zSTRING *dummy = nullptr;   // work around instantiating an incomplete type
    int result = Copy(*dummy, c);
    (void)result;
    TrimLeft('a');
}
It also compiles just fine with x86-64 and 32-bit x86 MSVC, and gcc/clang 32 and 64-bit on the Godbolt compiler explorer. (And also non-x86 architectures). This is the 32-bit asm output from MSVC, so you could compare with what you get for your nasty wrapper functions. You can see that it's basically inlined the useful part (mov eax, uAddr / jmp or call) into the caller.
;; x86 MSVC -O3
$T1 = -4                          ; size = 4
?foo@@YAXXZ PROC                  ; foo
    push ecx
    mov  eax, 7439760             ; 00718590H
    call eax
    lea  eax, DWORD PTR $T1[esp+4]
    mov  DWORD PTR $T1[esp+4], 2  ; the by-reference enum
    push eax
    push 0                        ; the dummy nullptr
    mov  eax, 4637392             ; 0046c2d0H
    call eax
    push 97                       ; 00000061H
    mov  eax, 4638256             ; 0046c630H
    call eax
    add  esp, 16                  ; 00000010H
    ret  0
?foo@@YAXXZ ENDP
For repeated calls to the same function, the compiler would keep the function pointer in a call-preserved register.
For some reason, even with 32-bit position-dependent code, we don't get a direct call rel32. The linker could calculate the relative offset from the call site to the absolute target at link time, so there's no reason for the compiler to use a register-indirect call.
When we haven't asked the compiler for position-independent code, addressing absolute targets relative to the code (direct rel32 calls/jumps) is a useful optimization here.
In 32-bit code, every possible destination address is in range of every possible source address, but in 64-bit code it's harder. In 32-bit mode, clang does spot this optimization! But even in 32-bit mode, MSVC and gcc miss it.
I played around with some stuff with gcc/clang:
// don't use
oCMOB * CreateNewInstance(void) asm("0x00718590");
Kind of works, but only as a total hack. GCC just uses that string as if it were a symbol, so it feeds call 0x00718590 to the assembler, which handles it correctly (generating an absolute relocation which links just fine in a non-PIE executable). But with -fPIE, it emits 0x00718590@GOTPCREL as a symbol name, so we're screwed.
Of course, in 64-bit mode a PIE executable or library will be out of range of that absolute address so only non-PIE makes sense anyway.
Another idea was to define the symbol in asm with an absolute address, and provide a prototype that would get gcc to only use it directly, without #PLT or going through the GOT. (I maybe could have done that for the func() asm("0x..."); hack, too, using hidden visibility.)
I only realized after hacking this up with the "hidden" attribute that this is useless in position-independent code, so you can't use this in a shared library or PIE executable anyway.
extern "C" is not necessary, but means I didn't have to mess with name mangling in the inline asm.
#ifdef __GNUC__
extern "C" {
    // hidden visibility means that even in a PIE executable, or shared lib,
    // calls will go *directly* to that address, not via the PLT or GOT.
    oCMOB * CNI(void) __attribute__((__visibility__("hidden")));
}

//asm("CNI = 0x718590");     // set the address of a symbol, like `org 0x71... / CNI:`
asm(".set CNI, 0x718590");   // alternate syntax for the same thing

void *test() {
    CNI();             // works
    return (void*)CNI; // gcc: RIP+0x718590 instead of the relative displacement needed to reach it?
                       // clang appears to work
}
#endif
disassembly of compiled+linked gcc output for test, from Godbolt, using the binary output to see how it assembled+linked:
# gcc -O3 (non-PIE). Clang makes pretty much the same code, with a direct call and mov imm.
sub rsp,0x8
call 718590 <CNI>
mov eax,0x718590
add rsp,0x8
ret
With -fPIE, gcc+gas emits lea rax,[rip+0x718590] # b18ab0 <CNI+0x400520>, i.e. it uses the absolute address as an offset from RIP, instead of subtracting. I guess that's because gcc literally emits lea CNI(%rip),%rax, and we've defined CNI as an assemble-time symbol with that numeric value. Oops. So it's not quite like a label with that address like you'd get with .org 0x718590; CNI:.
But since we can only use rel32 call in non-PIE executables, this is ok unless you compile with -no-pie but forget -fno-pie, in which case you're screwed. :/
Providing a separate object file with the symbol definition might have worked.
Clang appears to do exactly what we want, though, even with -fPIE, with its built-in assembler. This machine code could only have linked with -fno-pie (the default on Godbolt, not the default on many distros.)
# disassembly of clang -fPIE machine-code output for test()
push rax
call 718590 <CNI>
lea rax,[rip+0x3180b3] # 718590 <CNI>
pop rcx
ret
So this is actually safe (but sub-optimal because lea rel32 is worse than mov imm32.) With -m32 -fPIE, it doesn't even assemble.

compiler memory optimization - reusing existing blocks

Say I allocate two memory blocks.
I use the first memory block to store something and then use that stored data.
Then I use the second memory block to do something similar.
{
    int a[10];
    int b[10];
    setup_0(a);
    use_0(a);
    setup_1(b);
    use_1(b);
}
    ||  does the compiler optimize this to:
    \/
{
    int a[10];
    setup_0(a);
    use_0(a);
    setup_1(a);
    use_1(a);
}
// the setup functions overwrite all 10 words
The question is now: do compilers optimize this, so that they reuse the existing memory block instead of allocating a second one, if the compiler knows that the first block will not be referenced again?
If this is true:
Does this also work with dynamic memory allocation?
Is this also possible if the memory persists outside the scope, but is used in the same way as given in the example?
I assume this only works if the setup and use functions are implemented in the same C file (exist in the same object as the calling code)?
Do compilers optimize this
This question can only be answered if you ask about a particular compiler. And the answer can be found by inspecting the generated code.
so that they reuse the existing memory blocks, instead of allocating a second one, if the compiler knows that the first block will not be referenced again?
Such optimization would not change the behaviour of the program, so it would be allowed. Another matter is: Is it possible to prove that the memory will not be referenced? If it is possible, then is it easy enough to prove in reasonable time? I feel very safe in saying that it is not possible to prove in general, but it is provable in some cases.
I assume this only works if the setup and use functions are implemented in the same C file (exist in the same object as the calling code)?
That would usually be required to prove the untouchability of the memory. Link time optimization might lift this requirement, in theory.
Does this also work with dynamic memory allocation?
In theory, since it doesn't change the behaviour of the program. However, the dynamic memory allocation is typically performed by a library and thus the compiler may not be able to prove the lack of side-effects and therefore wouldn't be able to prove that removing an allocation wouldn't change behaviour.
Is this also possible if the memory persists outside the scope, but is used in the same way as given in the example?
If the compiler is able to prove that the memory is leaked, then perhaps.
Even though the optimization may be possible, it is not very significant. Saving a bit of stack space probably has very little effect on run time. It could be useful to prevent stack overflows if the arrays are large.
https://godbolt.org/g/5nDqoC
#include <cstdlib>

extern int a;
extern int b;

int main()
{
    {
        int tab[1];
        tab[0] = 42;
        a = tab[0];
    }
    {
        int tab[1];
        tab[0] = 42;
        b = tab[0];
    }
    return 0;
}
Compiled with gcc 7 with -O3 compilation flag:
main:
    mov DWORD PTR a[rip], 42
    mov DWORD PTR b[rip], 42
    xor eax, eax
    ret
If you follow the link you should see the code compiled by gcc and clang at the -O3 optimisation level. The resulting asm code is pretty straightforward: as the value stored in the array is known at compilation time, the compiler can skip everything and simply set the variables a and b directly. Your buffer is not needed.
Following a code similar to the one provided in your example:
https://godbolt.org/g/bZHSE4
#include <cstdlib>

int func1(const int (&tab)[10]);
int func2(const int (&tab)[10]);

int main()
{
    int a[10];
    int b[10];
    func1(a);
    func2(b);
    return 0;
}
Compiled with gcc 7 with -O3 compilation flag:
main:
    sub  rsp, 104
    mov  rdi, rsp        ; first address is rsp
    call func1(int const (&) [10])
    lea  rdi, [rsp+48]   ; second address is [rsp+48]
    call func2(int const (&) [10])
    xor  eax, eax
    add  rsp, 104
    ret
You can see the pointer sent to the function func1 and func2 is different as the first pointer used is rsp in the call to func1, and [rsp+48] in the call to func2.
So either the compiler completely ignores your code when the result is predictable, or, at least for gcc 7 and clang 3.9.1, the two buffers are kept separate: the reuse is not performed.
https://godbolt.org/g/TnV62V
#include <cstdlib>

extern int * a;
extern int * b;

inline int do_stuff(int ** to)
{
    *to = (int *) malloc(sizeof(int));
    (**to) = 42;
    return **to;
}

int main()
{
    do_stuff(&a);
    free(a);
    do_stuff(&b);
    free(b);
    return 0;
}
Compiled with gcc 7 with -O3 compilation flag:
main:
    sub  rsp, 8
    mov  edi, 4
    call malloc
    mov  rdi, rax
    mov  QWORD PTR a[rip], rax
    call free
    mov  edi, 4
    call malloc
    mov  rdi, rax
    mov  QWORD PTR b[rip], rax
    call free
    xor  eax, eax
    add  rsp, 8
    ret
Even without being fluent at reading this, it is pretty easy to tell that in this example malloc and free are not optimized away by either gcc or clang (if you want to try more compilers, suit yourself, but don't forget to set the optimization flag).
You can clearly see a call to malloc followed by a call to free, twice.
Optimizing stack space is quite unlikely to really have an effect on the speed of your program, unless you manipulate large amounts of data.
Optimizing dynamically allocated memory is more relevant. AFAIK you will have to use a third-party library or run your own system if you plan to do that and this is not a trivial task.
EDIT: Forgot to mention the obvious, this is very compiler dependent.
As the compiler sees that a is used as a parameter for a function, it will not optimize b away. It can't, because it doesn't know what happens in the function that uses a and b. Same for a: the compiler doesn't know that a isn't used anymore.
As far as the compiler is concerned, the address of a could e.g. have been stored by setup_0 in a global variable and will be used by setup_1 when it is called with b.
The short answer is: No! The compiler cannot optimize this code to what you suggested, because it is not semantically equivalent.
Long explanation: The lifetime of a and b is, with some simplification, the complete block.
So now let's assume that one of setup_0 or use_0 stores a pointer to a in some global variable. Then setup_1 and use_1 are allowed to use a via this global variable in combination with b (they can, for example, add the array elements of a and b). If the transformation you suggested were done, this would result in undefined behaviour. If you really want to make a statement about the lifetime, you have to write the code in the following way:
{
    { // Lifetime block for a
        char a[100];
        setup_0(a);
        use_0(a);
    } // Lifetime of a ends here, so none of the functions called
      // below is allowed to access it. If one does so by accident,
      // it is undefined behaviour.
    char b[100];
    setup_1(b); // Not allowed to access a
    use_1(b);   // Not allowed to access a
}
Please also note that gcc 12.x and clang 15 both do the optimization. If you comment out the curly brackets, the optimization is (correctly!) not done.
Yes, theoretically, a compiler could optimize the code as you describe, assuming that it could prove that these functions did not modify the arrays passed in as parameters.
But in practice, no, that does not happen. You can write a simple test case to verify this. I've avoided defining the helper functions so the compiler can't inline them, but passed the arrays by const-reference to ensure that the compiler knows the functions don't modify them:
void setup_0(const int (&p)[10]);
void use_0  (const int (&p)[10]);
void setup_1(const int (&p)[10]);
void use_1  (const int (&p)[10]);

void TestFxn()
{
    int a[10];
    int b[10];
    setup_0(a);
    use_0(a);
    setup_1(b);
    use_1(b);
}
As you can see here on Godbolt's Compiler Explorer, no compilers (GCC, Clang, ICC, nor MSVC) will optimize this to use a single stack-allocated array of 10 elements. Of course, each compiler varies in how much space it allocates on the stack. Some of that is due to different calling conventions, which may or may not require a red zone. Otherwise, it's due to the optimizer's alignment preferences.
Taking GCC's output as an example, you can immediately tell that it is not reusing the array a. The following is the disassembly, with my annotations:
; Allocate 104 bytes on the stack
; by subtracting from the stack pointer, RSP.
; (The stack always grows downward on x86.)
    sub rsp, 104

; Place the address of the top of the stack in RDI,
; which is how the array is passed to setup_0().
    mov rdi, rsp
    call setup_0(int const (&) [10])

; Since setup_0() may have clobbered the value in RDI,
; "refresh" it with the address at the top of the stack,
; and call use_0().
    mov rdi, rsp
    call use_0(int const (&) [10])

; We are now finished with array 'a', so add 48 bytes
; to the top of the stack (RSP), and place the result
; in the RDI register.
    lea rdi, [rsp+48]

; Now, RDI contains what is effectively the address of
; array 'b', so call setup_1().
; The parameter is passed in RDI, just like before.
    call setup_1(int const (&) [10])

; Second verse, same as the first: "refresh" the address
; of array 'b' in RDI, since it might have been clobbered,
; and pass it to use_1().
    lea rdi, [rsp+48]
    call use_1(int const (&) [10])

; Clean up the stack by adding 104 bytes to compensate for the
; same 104 bytes that we subtracted at the top of the function.
    add rsp, 104
    ret
So, what gives? Are compilers just massively missing the boat here when it comes to an important optimization? No. Allocating space on the stack is extremely fast and cheap. There would be very little benefit in allocating ~50 bytes, as opposed to ~100 bytes. Might as well just play it safe and allocate enough space for both arrays separately.
There might be more of a benefit in reusing the stack space for the second array if both arrays were extremely large, but empirically, compilers don't do this, either.
Does this work with dynamic memory allocation? No. Emphatically no. I've never seen a compiler that optimizes around dynamic memory allocation like this, and I don't expect to see one. It just doesn't make sense. If you wanted to re-use the block of memory, you would have written the code to re-use it instead of allocating a separate block.
I suppose you are thinking that if you had something like the following C code:
void TestFxn()
{
    int* a = malloc(sizeof(int) * 10);
    setup_0(a);
    use_0(a);
    free(a);

    int* b = malloc(sizeof(int) * 10);
    setup_1(b);
    use_1(b);
    free(b);
}
that the optimizer could see that you were freeing a, and then immediately re-allocating a block of the same size as b? Well, the optimizer won't recognize this and elide the back-to-back calls to free and malloc, but the run-time library (and/or operating system) very likely will. free is a very cheap operation, and since a block of the appropriate size was just released, allocation will also be very cheap. (Most run-time libraries maintain a private heap for the application and won't even return the memory to the operating system, so depending on the memory-allocation strategy, it's even possible that you get the exact same block back.)

GCC inline assembly error: Cannot take the address of 'this', which is an rvalue expression

I'm still fighting with GCC - compiling the following inline assembly code (with -fasm-blocks, which enables Intel style assembly syntax) nets me a strange error Cannot take the address of 'this', which is an rvalue expression...
void MyClass::MyFunction()
{
    _asm
    {
        //...
        mov ebx, this // error: Cannot take the address of 'this', which is an rvalue expression
        //...
        mov eax, this // error: Cannot take the address of 'this', which is an rvalue expression
        //...
    };
}
Why can I store pointers to different objects in registers, but can't use pointer to MyClass instance?
That's because the compiler might decide on its own to store this in a register (generally ECX) instead of a memory cell, for optimization purposes, or because the calling convention explicitly specifies it should do that.
In that case, you cannot take its address, because registers are not addressable memory.
You can use something like this:
#include <stdio.h>

class A {
public:
    void* work() {
        void* result;
        asm( "mov %%eax, %%eax"   // effectively a no-op; the constraints do the work
             : "=a" (result)      /* put contents of EAX into result */
             : "a" (this)         /* put this into EAX */
        );
        return result;
    }
};

int main() {
    A a;
    printf("%p - %p\n", (void*)&a, a.work());
}
See more details on operands passing to inline asm here
Practically speaking, each implementation defines its own rules with regard to asm. In the case of g++, it looks like when you write mov ebx, something, g++ needs the address of something in order to generate the instruction. (Not too surprising, really, given the way assemblers work.) this doesn't have an address. (That's what being an rvalue means.) The implementation could treat this as a special case in the inline assembler and replace it with whatever is appropriate at that spot in the code; g++ doesn't do this, probably because it has another, more general mechanism (elder_george's solution) which handles the problem.