Why does using the wrong calling convention sometimes work? - c++

I used the "StartServiceCtrlDispatcher" function to register a callback function (called ServiceMain) in Windows, but the callback function I declared was compiled with the wrong calling convention.
The thing is that on some computers the application crashed when it returned from the callback function, but on other computers it did not crash.
Now, once I found the bug everything worked, but I just don't understand why it worked correctly on some computers without crashing.
Thanks! :-)

This is all very Windows-specific, we're not talking standard C++ here.
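According to the documentation, StartServiceCtrlDispatcher has only one argument and is declared as WINAPI, which in turn means the __stdcall calling convention. The ServiceMain callback type it expects (LPSERVICE_MAIN_FUNCTION) is likewise declared WINAPI, i.e. __stdcall.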
For freestanding functions, __stdcall is one of two main calling conventions. The other one is __cdecl. The machine code level difference is simply who restores the stack pointer: with __stdcall it is the function itself, while with __cdecl it is the calling code.
When the function actually is __stdcall but is invoked as if it were __cdecl, there are two attempts to restore the stack pointer: one at the exit from the function, and one in the calling code. The one in the function will succeed. Depending on how the attempt in the calling code is done, it can mess things up thoroughly (e.g. if it just adds the required offset, treating the stack pointer as relative), or it may have no harmful effect (e.g. if the caller addresses its locals through a frame pointer such as EBP and restores the stack pointer from it in its own epilogue). But it's very likely to create a mess, since the assumption about the stack pointer value on return from the function is incorrect.
When the function actually is __cdecl it will not itself restore the stack pointer, since that is the calling code's responsibility. And if the calling code is treating it as __stdcall then the calling code won't restore it either, since from the calling code's view the function is doing that. The result, if you don't get an early crash (because of broken assumptions), should then be that repeated calls, say in a loop, will eat stack space.
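To make the two mismatch cases concrete, here is a toy model in portable C++ that tracks nothing but the stack pointer. It is a sketch, not real machine code, and all names are hypothetical. The question's bug corresponds to the caller assuming __stdcall while the callee is really __cdecl; the argument size of 8 bytes used below is an assumption (two 4-byte arguments on 32-bit x86, as for ServiceMain).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of who pops the arguments after a call.
enum class Conv { Cdecl, Stdcall };

// Simulates one call and returns the change in the stack pointer.
// The stack grows downward, so a negative result means stack space was
// leaked and a positive result means too much was popped.
// The return-address push/pop always balances, so it is omitted.
std::ptrdiff_t simulate_call(Conv callerAssumes, Conv calleeActual, std::ptrdiff_t argBytes) {
    std::ptrdiff_t sp = 0;
    sp -= argBytes;                                        // caller pushes the arguments
    if (calleeActual == Conv::Stdcall) sp += argBytes;     // callee pops them ("ret N")
    if (callerAssumes == Conv::Cdecl)  sp += argBytes;     // caller pops them ("add esp, N")
    return sp;
}

// Accumulated stack-pointer drift after `calls` repeated calls, e.g. in a loop.
std::ptrdiff_t drift_after(Conv callerAssumes, Conv calleeActual, int calls, std::ptrdiff_t argBytes) {
    std::ptrdiff_t total = 0;
    for (int i = 0; i < calls; ++i)
        total += simulate_call(callerAssumes, calleeActual, argBytes);
    return total;
}
```

With matching conventions the drift is always zero. With the caller assuming __stdcall and the callee being __cdecl, every call leaves its arguments behind (three 8-byte calls drift by -24), which is the "eats stack space" case; the opposite mismatch pops 8 bytes too many per call.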
It's all very much Undefined Behavior.
And one property of Undefined Behavior is that it can do anything, including apparently working…
Cheers & hth.,

Calling conventions differ in the details, like which registers are preserved. If you happened to not store anything you still needed in those registers, then it didn't matter that they were erased when they didn't have to be. Similarly, if your calling convention differs about how it deals with return values, if you don't return anything, then it doesn't matter.
Fortunately, x64 Windows has essentially only one calling convention (the compiler accepts __stdcall and __cdecl there but ignores them), so this whole mess will be in the past.

The computers where the application crashed might have been using .NET Framework version 4.
Have a look at the following article:
http://msdn.microsoft.com/en-us/library/ee941656.aspx
It states the following under Interoperability - Platform Invoke:
"To improve performance in interoperability with unmanaged code, incorrect calling conventions in a platform invoke now cause the application to fail. In previous versions, the marshaling layer resolved these errors up the stack."

This all comes down to what happens to be in memory at the time. Let's assume you have two functions like this:
void __stdcall f1(int a, int b) { /* ... */ }
void __cdecl f2(int a, int b) { /* ... */ }
stdcall is the calling convention used by the Windows API, while cdecl is the default used by most C and C++ compilers. The difference between them is who is responsible for cleaning up the stack after the call: in stdcall the callee (f1) does it, while in cdecl the caller (the code that calls f2) does.
The stack, after all, is filled with unknown values. Therefore, when it gets cleaned up (wrongly), the next value that you access in the stack is undetermined. It could very well be an acceptable value, or it could be a very bad one. This is, in principle, how stack overflow (the bug, not the site) works.

Related

Why/When would/should you use __attribute__((noreturn))?

As I understand, this attribute tells the compiler that a function does not return. What is the benefit of specifying this when a function has a void return?
How are these two functions different?
void foo(){}
__attribute__((noreturn)) void foo(){}
It doesn't mean that the function doesn't return a value - it means that the function doesn't return at all. Calling the function gives a one way trip.
There's some limited use of this in hardware-related programming, where you don't want any function call overhead to be generated on the caller-side stack, since you aren't going to return there anyway.
For example, when calling void main (void) from the CRT start-up code inside a microcontroller embedded system: these are systems which keep running until you unplug the power, and they never return from main(). So if the compiler for such a system supports _Noreturn void main (void), that saves stack space. Otherwise the CRT will just push x bytes on the stack which will remain there forever as dead space.
It might also be useful for diagnostic purposes, such as when examining the behavior of compilers. As was done here, when demonstrating some major bugs in the clang compiler.
There is no relation between the return type and the "no-return" behavior of a function.
The void return type merely says that the function returns no value (if it returns at all).
Some functions never return to the caller, for example some infinite message-processing loops, the exit() function, or the functions that replace the current process image (such as the exec* family). __attribute__((noreturn)) hints to the compiler that it may apply optimizations that are valid only for functions that do not return, such as reducing the overhead of calling such a function (saving the caller's context, the return address and so on).
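A small sketch of the practical difference, using the standard C++11 spelling [[noreturn]] (the portable equivalent of GCC's __attribute__((noreturn))). The function names are made up for illustration. A noreturn function may still leave by throwing; it just never returns normally, and the compiler can use that knowledge, for instance to not warn about a missing return statement after the call.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Never returns normally: it always throws. Returning from a [[noreturn]]
// function would be undefined behavior.
[[noreturn]] void fail(const std::string& msg) {
    throw std::runtime_error(msg);
}

// Because fail() is [[noreturn]], the compiler knows control cannot fall off
// the end of this non-void function, so no "missing return" warning is issued.
int digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    fail(std::string("not a digit: ") + c);
}
```

Without the attribute, compilers such as GCC typically warn (-Wreturn-type) that digit_value can reach its end without returning a value.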

What may cause EnumProcesses() to fail?

The documentation states:
If the function fails, the return value is zero. To get extended error information, call GetLastError.
But it doesn't give any example of how the function could possibly fail.
For unit testing I need to reliably create a situation that makes EnumProcesses() fail.
Like most functions, it can fail if you pass it invalid parameters. In this case that means a smaller PID array than the size you tell it or a NULL pointer for the received count. It is a bit risky to do this on purpose because you don't know if the function uses SEH to protect against this or if it will just crash.
Internally the function has to allocate some memory before calling into NTDLL to get the process information and this can cause the function to fail if there is not enough memory available.
You should call EnumProcesses in a helper function to abstract away the memory/retry details anyway and that would be a good place to simulate failures when needed.
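A minimal sketch of such a helper, assuming a hypothetical injectable "enumerator" with the same shape as EnumProcesses(DWORD* lpidProcess, DWORD cb, LPDWORD lpcbNeeded). DWORD is modelled as std::uint32_t so the helper builds and can be unit-tested on any platform. Since EnumProcesses does not report that the buffer was too small, the helper uses the common idiom of growing the buffer and retrying whenever the returned byte count fills it completely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical seam mirroring EnumProcesses: fill `pids` (capacity `cb` bytes),
// report the bytes written in *cbNeeded, return false on failure.
using EnumFn = std::function<bool(std::uint32_t* pids, std::uint32_t cb, std::uint32_t* cbNeeded)>;

// Returns all process IDs, or std::nullopt if the enumerator fails.
std::optional<std::vector<std::uint32_t>> list_process_ids(const EnumFn& enumerate,
                                                           std::size_t initialCount = 256) {
    std::vector<std::uint32_t> pids(initialCount ? initialCount : 1);
    for (;;) {
        const auto cb = static_cast<std::uint32_t>(pids.size() * sizeof(std::uint32_t));
        std::uint32_t cbNeeded = 0;
        if (!enumerate(pids.data(), cb, &cbNeeded))
            return std::nullopt;                    // the failure path tests can force
        if (cbNeeded < cb) {                        // buffer had room to spare: complete list
            pids.resize(cbNeeded / sizeof(std::uint32_t));
            return pids;
        }
        pids.resize(pids.size() * 2);               // possibly truncated: retry with more room
    }
}
```

In production the enumerator would be a lambda forwarding to ::EnumProcesses (DWORD and std::uint32_t are both 32-bit on Windows); in unit tests it can be a fake that simply returns false.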
If you absolutely need the function itself to fail you could hook it with something like Microsoft Detours or IAT hooking...

How does the compiler know where control should return to after a function call?

Consider the following functions:
void func1();   // forward declarations: main calls func1 before its definition
void func2();
int main()
{
    //statement(s);
    func1();
    //statement(s);
}
void func1()
{
    //statement(s);
    func2();
    //statement(s);
}
void func2()
{
    //statement(s);
}
How does the compiler know where to return to after func2 has performed all its operations? I know that control transfers back to func1 (and to exactly which statement), but how does the compiler know it? What tells the compiler where to return to?
This is typically implemented using a call stack:
When control is transferred to a function, the address to return to is pushed onto the stack.
When the function finishes, the address is popped off the stack and used to transfer control back to the caller.
The details are typically mandated by the hardware architecture for which the code is being compiled.
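The push/pop mechanism above can be sketched with a toy machine. This is a hypothetical illustration in portable C++, not how any real CPU is programmed: a CALL pushes the address of the next instruction and jumps, and a RET pops that address back into the instruction pointer. The program mirrors the main/func1/func2 example from the question.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One instruction of the toy machine.
struct Instr {
    enum Kind { Stmt, Call, Ret, Halt } kind;
    int target = 0;          // jump address, used by Call
    std::string label;       // name recorded when a Stmt executes
};

// Runs `program` from address 0 and returns the statement labels in execution order.
std::vector<std::string> run(const std::vector<Instr>& program) {
    std::vector<std::string> trace;
    std::vector<int> callStack;    // holds return addresses
    int ip = 0;                    // instruction pointer
    for (;;) {
        const Instr& in = program[ip];
        switch (in.kind) {
        case Instr::Stmt: trace.push_back(in.label); ++ip; break;
        case Instr::Call: callStack.push_back(ip + 1); ip = in.target; break; // push return address
        case Instr::Ret:  ip = callStack.back(); callStack.pop_back(); break;  // pop it into ip
        case Instr::Halt: return trace;
        }
    }
}
```

Laid out as main at addresses 0-3, func1 at 4-7 and func2 at 8-9, the trace comes out as main:before, func1:before, func2, func1:after, main:after: each RET lands exactly after the CALL that reached it, because the most recent return address is always on top of the stack.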
Strictly speaking, the compiler doesn't run the code; the machine does. When it calls a function, it stores on the stack the address of the instruction that follows the call, so that when the function returns it can pop that address back into the Instruction Pointer (IP) and resume from there.
I've simplified things a bit for the sake of explanation.
When a function is called, the return address in the calling function is stored in a place reserved precisely for that purpose: usually the stack, though the standard does not mandate that.
It is the compiler's duty to ensure that its calling conventions are such that unless something goes wrong (for example, a stack overflow), then the called function knows how to return to the calling function.
The runtime uses something called a 'call stack', which holds the address of the next instruction to execute after the called function returns. When a function call is made, before control jumps to the new instruction address, the address of the next instruction in the calling function is pushed onto the stack, and this is repeated for every nested call. Why a stack? Because execution has to get back to the most recent point it left off, which is 'last in, first out' behavior, and a stack is the data structure that provides it. You can actually look at this call stack when debugging a program in Visual Studio: there is a separate 'Call Stack' window which shows the entries placed on the call stack.

C++ Function Hook (memory address only)

I have a memory address: it's the address of a function in another program (in one of its DLLs). I am already loaded into the program via DLL injection. I already have the base address, and the actual location of the function each time the program loads, so this is not an issue.
I want to just simply hook that location, and grab the variables. I know the function's pseudocode. So this is not an issue. OR another approach that would be great is doing a break point at that memory location and grab the debug registers.
I can not find any clear-cut examples of this. I also do not have the "name" of the function, I just have the memory address. Is there any way to work with just a memory address? Most, if not all the examples have you use the name of the function, which I do not have.
If anyone could point me into the right direction so I can accomplish this task, I would greatly appreciate it. It also might help a lot of other people who may have the same question.
Edit: I should also mention that I'd rather not overload my program with someone else's code. I really just want the bare bones, much like a basic car with roll-up windows. No luxury packages for me, please.
You missed the most important part, is this for 32 or 64 bit code? In any case, the code project has a good run-down and lib here that covers both.
If you want to do this "old-school", then it can be done quite simply:
First, you need to find the virtual address of the function you want to hook (due to ASLR, you should never rely on it being in the same place). This is generally done with RVA + module base load address for functions that are not exported; for exported functions, you can use GetProcAddress.
From there, the type hook depends on what you want to accomplish, in your case, there are two methods:
patch a jump/call out to your function in the target function's prologue
patch all call sites to the function you want to hook, redirecting to your function
The first is simpler, but messy, as it generally involves some inline assembly (unless you are hooking a /HOTPATCH binary or you just want to stub it); the second is much cleaner, but requires a bit of work with a debugger.
The function you'll jump out to should have the same parameters and calling convention (ABI) as the function you are hooking, this function is where you can capture the passed parameters, manipulate them, filter calls or whatever you are after.
For both, you need a way to write the patch bytes. Under Windows, WriteProcessMemory is your first port of call (note: you require RWX permissions to do this, hence the calls to VirtualProtect). This is a little utility function that creates a 32-bit relative call or jump, depending on the opcode passed as eType (0xE8 for call, 0xE9 for jmp):
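#include <windows.h>

// Layout of a 5-byte x86 relative CALL/JMP instruction.
#pragma pack(1)
struct patch_t
{
    BYTE nPatchType;   // opcode: 0xE8 = call rel32, 0xE9 = jmp rel32
    DWORD dwAddress;   // rel32 displacement, relative to the next instruction
};
#pragma pack()

// Writes a relative call/jump at dwAddress that transfers control to pTarget (32-bit code only).
BOOL ApplyPatch(BYTE eType, DWORD dwAddress, const void* pTarget)
{
    DWORD dwOldValue, dwTemp;
    patch_t pWrite =
    {
        eType,
        // displacement = target - address of the instruction after the 5-byte patch
        (DWORD)pTarget - (dwAddress + sizeof(DWORD) + sizeof(BYTE))
    };
    // Unprotect all 5 bytes being written (not just sizeof(DWORD)), in case they straddle a page.
    if (!VirtualProtect((LPVOID)dwAddress, sizeof(pWrite), PAGE_EXECUTE_READWRITE, &dwOldValue))
        return FALSE;
    BOOL bSuccess = WriteProcessMemory(GetCurrentProcess(), (LPVOID)dwAddress, &pWrite, sizeof(pWrite), NULL);
    VirtualProtect((LPVOID)dwAddress, sizeof(pWrite), dwOldValue, &dwTemp);
    // Make sure the CPU does not execute stale instructions.
    FlushInstructionCache(GetCurrentProcess(), (LPCVOID)dwAddress, sizeof(pWrite));
    return bSuccess;
}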
This function works great for method 2, but for method 1 you'll need to jump to an intermediary assembly trampoline that restores any code the patch overwrote before returning to the original function. This gets very tedious, which is why it's better to just use an existing and tested library.
From the sounds of it, using method 1 and patching a jump over the prologue of your target function will do what you need, as it seems you don't care about executing the function you patched.
(there is a third method using HW breakpoints, but this is very brittle, and can become problematic, as you are limited to 4 HW breakpoints).
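The displacement arithmetic inside ApplyPatch is the part that is easiest to get wrong, so here is a portable sketch that only computes the five bytes without touching any process memory. The function name is hypothetical; the encoding (opcode followed by a little-endian rel32 measured from the end of the 5-byte instruction) is the same one ApplyPatch writes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Builds the bytes of an x86 relative CALL (0xE8) or JMP (0xE9) placed at
// `site` that transfers control to `target`.
std::array<std::uint8_t, 5> make_rel32_patch(std::uint8_t opcode,
                                             std::uint32_t site,
                                             std::uint32_t target) {
    // rel32 is relative to the next instruction (site + 5). Unsigned wrap-around
    // yields the correct two's-complement value for backward jumps.
    const std::uint32_t rel = target - (site + 5u);
    return {opcode,
            static_cast<std::uint8_t>(rel & 0xFF),          // little-endian byte order
            static_cast<std::uint8_t>((rel >> 8) & 0xFF),
            static_cast<std::uint8_t>((rel >> 16) & 0xFF),
            static_cast<std::uint8_t>((rel >> 24) & 0xFF)};
}
```

For example, a jmp at 0x401000 to 0x402000 encodes as E9 FB 0F 00 00 (0x402000 - 0x401005 = 0xFFB), and a call at 0x402000 back to 0x401000 encodes as E8 FB EF FF FF (a displacement of -0x1005).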
Your "sample" is here:
http://www.codeproject.com/Articles/4610/Three-Ways-to-Inject-Your-Code-into-Another-Proces#section_1
Normally when you "hook" into the DLL, you actually put your function in front of the one in the DLL that gets called, so your function gets called instead. You then capture whatever you want, call the other function, capture its return values and whatever else, then return to the original caller.
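The flow described above (your function runs instead, captures what it wants, calls the original, captures the result and returns it to the original caller) can be modelled in portable C++ with a function-pointer slot standing in for an import-table entry. This is a hypothetical sketch of the idea only: no code is patched, and every name is made up.

```cpp
#include <cassert>
#include <vector>

using AddFn = int (*)(int, int);

int real_add(int a, int b) { return a + b; }

AddFn g_addSlot = &real_add;     // stand-in for the import-table slot callers go through
AddFn g_originalAdd = nullptr;   // saved original so the hook can forward to it
std::vector<int> g_seen;         // what the hook captured

// The hook must have the same signature and calling convention as the hooked function.
int hooked_add(int a, int b) {
    g_seen.push_back(a);                 // capture the arguments
    g_seen.push_back(b);
    int result = g_originalAdd(a, b);    // call the original function
    g_seen.push_back(result);            // capture its return value
    return result;                       // return to the original caller unchanged
}

void install_hook() {
    g_originalAdd = g_addSlot;
    g_addSlot = &hooked_add;
}

// Existing code that never knows it is being hooked.
int caller() { return g_addSlot(2, 3); }
```

Before install_hook() the caller reaches real_add directly; afterwards it still gets 5, but the hook has recorded 2, 3 and 5 on the way through.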

How do you call a function in another address-space in C++

I'm aware of the threading issues etc. that this could cause and of its dangers, but I need to know how to do this for a security project I am doing at school. I need to know how to call a function in a remote address space of a given calling convention with arguments, preferably recovering the data the remote function has returned, though it's really not required that I do.
If I can get specifics from the remote function's function prototype at compile time, I will be able to make this method work. I need to know how big the arguments are and if the arguments are explicitly declared as pointers or not (void*, char*, int*, etc...)
I.e if I define a function prototype like:
typedef void (__cdecl *testFunc_t)(int* pData);
I would need to, at compile time, get the size of the arguments at least and, if I could, which ones are pointers. Here we are assuming the remote function is either a __stdcall or a __cdecl function.
The IDE I am using is Microsoft Visual Studio 2007 in case the solution is specific to a particular product.
Here is my plan:
Create a thread in the remote process using CreateRemoteThread at the start of the function I want to call, though I would do so in a suspended state.
I would set up the stack such that the return address was that of a stub of code allocated inside the process that would call ExitThread(eax), as this would exit the thread with the function's return value. I would then recover this by using GetExitCodeThread.
I would also copy the arguments for the function call from my local stack to that of the newly created thread - this is where I need to know if function arguments are pointers and the size of the arguments.
Resume the thread and wait for it to exit, at which point I will return to the caller with the threads exit code.
I know that this should be doable at compile time but whether the compiler has some method I can use to do it, I'm not sure. I'm also aware all this data can be easily recovered from a PDB file created after compiling the code and that the size of arguments might change if the compiler performs optimizations. I don't need to be told how dangerous this is, as I am fully aware of it, but this is not a commercial product but a small project I must do for school.
The question:
If I have a function prototype such as
typedef void (__cdecl *testFunc_t)(int* pData);
Is there any way I can get the size of this prototype's arguments at compile time (i.e. in the above example, the arguments would sum to a total size of sizeof(int*))? If, for example, I have a function like:
template<typename T> unsigned long getPrototypeArgLength()
{
    //would return size of arguments described in the prototype T
}
//when called as
getPrototypeArgLength<testFunc_t>()
This seems like quite a school project...
For step 3 you can use ReadProcessMemory / WriteProcessMemory (one of them). For example, when the new thread is created it could receive the start and end addresses of the parameters in the calling process; it could then read that region of the caller's memory and copy it to its own stack.
Did you consider using COM for this whole thing? You could probably get things done much more easily with a mechanism that was designed especially for that.
Alright, I figured out that I can use the Boost library to get a lot of type information at compile time. Specifically, I am using boost::function_traits; if you look around the Boost library, you will find that you can recover quite a bit of information. Here's a bit of code I wrote to demonstrate how to get the number of arguments of a function prototype.
(Actually, I haven't tested the code below; it's just something I'm throwing together from another function I've made and tested.)
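#include <boost/type_traits/function_traits.hpp>
#include <boost/type_traits/remove_pointer.hpp>
#include <boost/typeof/typeof.hpp>
// Returns the arity of function-pointer type T ('typename' is needed: the type is dependent).
template<typename T>
unsigned long getArgCount()
{
    return boost::function_traits<typename boost::remove_pointer<T>::type>::arity;
}
void (*pFunc)(int, int);
unsigned long n = getArgCount<BOOST_TYPEOF(pFunc)>(); // n == 2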
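For comparison, a C++17 sketch without Boost that answers the original question more directly: partial specialization on a function-pointer type recovers the parameter pack, and fold expressions give the argument count, the sum of the argument sizes and how many parameters are pointers. The trait name proto_info is hypothetical. Two assumptions to keep in mind: the byte sum is not necessarily the number of stack bytes (real ABIs may pad each argument to a slot size), and on 32-bit MSVC the calling convention is part of the function type, so __stdcall pointers would likely need their own specialization. The __cdecl keyword is omitted here to keep the sketch portable.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Primary template left undefined: only function-pointer types are supported.
template <typename T> struct proto_info;

template <typename R, typename... Args>
struct proto_info<R (*)(Args...)> {
    static constexpr std::size_t arity = sizeof...(Args);
    // Sum of sizeof() over the parameters (not padded to stack slots).
    static constexpr std::size_t arg_bytes = (std::size_t{0} + ... + sizeof(Args));
    // Number of parameters declared as pointers.
    static constexpr std::size_t pointer_args =
        (std::size_t{0} + ... + (std::is_pointer_v<Args> ? 1u : 0u));
};

using testFunc_t = void (*)(int* pData);

// Everything is available at compile time.
static_assert(proto_info<testFunc_t>::arity == 1);
static_assert(proto_info<testFunc_t>::arg_bytes == sizeof(int*));
```

Since the values are constexpr, they can drive code that copies arguments, such as the remote-call plan in the question.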