Building a 'shadow stack' across threads when profiling the CLR


I am building a profiler, and am trying to replicate the call stack of the application being profiled.
This article on profiling the CLR recommends building a 'shadow' stack using the Enter/Leave method callbacks (SetEnterLeaveFunctionHooks), rather than using the snapshot method.
Is there a way to associate these callbacks when they are part of the same call stack, but could potentially occur on different threads? (for example, when the application being profiled uses Task.Run or async/await).
Ideally, I would like to show the user a call stack that 'follows' async methods, so associating them by ThreadId is unreliable.
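For reference, a minimal sketch of the per-thread variant of such a shadow stack is shown here. It is not the CLR profiling API: FunctionId, OnEnter, OnLeave and Snapshot are hypothetical names for this illustration, and the real Enter/Leave hooks would have to forward to them. Because it uses thread-local storage, it has exactly the limitation described above: a continuation that resumes on another thread starts on that thread's stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the profiler's function identifier (an assumption of this sketch).
using FunctionId = std::uintptr_t;

// One shadow stack per OS thread. No locking is needed because each thread
// only ever touches its own instance.
inline std::vector<FunctionId>& ShadowStack() {
    thread_local std::vector<FunctionId> stack;
    return stack;
}

// Hypothetical handler to be called from the Enter hook.
inline void OnEnter(FunctionId fn) {
    ShadowStack().push_back(fn);
}

// Hypothetical handler to be called from the Leave hook.
// Searches from the top for the matching frame and drops it together with
// everything above it, so a missed Leave does not corrupt the stack.
// If the function is not on the stack at all, nothing is changed.
inline void OnLeave(FunctionId fn) {
    std::vector<FunctionId>& s = ShadowStack();
    for (std::size_t i = s.size(); i-- > 0;) {
        if (s[i] == fn) {
            s.resize(i);
            return;
        }
    }
}

// Returns a copy of the calling thread's current shadow stack (outermost first).
inline std::vector<FunctionId> Snapshot() {
    return ShadowStack();
}
```

Following a logical call chain across threads would need some other key than the thread (for example an identifier carried along with the task), which is what the question is asking about.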

Related

Can the size of a COM queue cause a stack overflow?

I guess this can only happen in an STA setting due to the fact that all COM calls are handled by a single thread.
I have a test scenario with a .NET application making a COM call to a C++ component, which in turn calls a remote COM component. (I have the code of all components.)
This .NET application starts a few hundred Tasks. Each Task makes between 3 and 5 COM calls, and each of those makes at least one more COM call to the remote component.
So we have a lot of COM calls.
What happens is that after a while this program runs into a stack overflow error although there is no recursion happening on any level.
In the debugger I can see that the thread causing the exception has all the COM calls waiting to be processed queued up. I can see that quite clearly because it starts at the COM call to the native component and stops at the COM call to the remote component.
Does this even make sense?
Can too many COM calls cause a stack overflow?
Is there a way to resolve this?

c++ watchdog for 3rd party lib calls

I have a problem with long running boost::regex_match(...) invocation in a threaded process environment. But it could be another lib (API call) having the same problem.
Is there a generic way to set up a watchdog for such?
For non-threaded process alarm() can be used to detect timeout.
However, signals don't play nicely with threads. I can avoid direct use of alarm() in the thread, delegate timer management to a dedicated separate thread, and let that one use pthread_kill(...) to address the correct thread (this is just an idea; I have not verified that part yet).
However, this too only interrupts and detects the situation; it cannot gracefully stop boost::regex_match(...).
I played around with throwing an exception from within a signal handler, using sigsetjmp() and siglongjmp(), for the thread running boost::regex_match(...).
But it causes memory leaks in boost::regex_match(...) because siglongjmp() bypasses destructors.
How can I gracefully stop a 3rd party API call, presuming that it is implemented in an exception-safe way?
Or does it have to be supported by some "stoppable" feature actively implemented in the 3rd party API? (is there some for the boost library?)
Maybe some strange idea, but:
Code can be implemented to be "thread-safe" and/or "exception-safe".
Would it be an option to define "longjmp-safe"? This could be done by passing an additional token to a lib to let it associate all resource allocations with that token. After longjmp() the client software could ask the API separately to release those resources.
Simpler, maybe, would be some central init()/release() or register()/unregister() API call by which the API could clean up after itself.

In a case where you have to:
monitor for exceeded execution time
stop the processing
you should simply think in terms of tasks (that is, separate processes) instead of threads.
Using threads sounds like "state of the art", but in practice tasks are very often the better way to implement this, especially for controlling memory leaks when execution ends in an "undefined" way, confining unwanted memory growth, controlling stack overruns, and so on.
In the case you have mentioned, I would tend to implement that as tasks. IPC works well on all known platforms but is not portable. If portability is not a problem, changing to a task-based solution is not a big deal.
A hanging task can be killed by an OS call, and all locks, memory and other resources like IPC/shared memory/pipes etc. will be removed automatically. So this fits your problem much better, and it does not depend on your external and maybe unchangeable third-party components.
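A minimal POSIX sketch of that idea might look like the following. RunResult and RunWithTimeout are hypothetical names for this illustration, and the 5 ms polling interval is an arbitrary choice. The work runs in a forked child, the parent polls with waitpid() until a deadline, and then kills the child with SIGKILL.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <functional>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

enum class RunResult { Finished, TimedOut, Failed };

// Runs `work` in a forked child process and kills the child if it has not
// exited by the deadline. The OS reclaims the child's memory and locks.
inline RunResult RunWithTimeout(const std::function<void()>& work,
                                std::chrono::milliseconds timeout) {
    const pid_t pid = fork();
    if (pid < 0) return RunResult::Failed;
    if (pid == 0) {
        work();    // e.g. the long-running third-party call
        _exit(0);  // leave without running the parent's atexit handlers
    }

    const auto deadline = std::chrono::steady_clock::now() + timeout;
    timespec poll_interval{};
    poll_interval.tv_nsec = 5 * 1000 * 1000;  // 5 ms between polls

    for (;;) {
        int status = 0;
        const pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) {
            const bool ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
            return ok ? RunResult::Finished : RunResult::Failed;
        }
        if (r < 0 && errno != EINTR) return RunResult::Failed;
        if (std::chrono::steady_clock::now() >= deadline) {
            kill(pid, SIGKILL);        // SIGKILL cannot be caught or ignored
            waitpid(pid, &status, 0);  // reap the child to avoid a zombie
            return RunResult::TimedOut;
        }
        nanosleep(&poll_interval, nullptr);
    }
}
```

The sketch leaves out how the child would send its result back (a pipe or shared memory would be needed). In a parent that is already multithreaded, forking and then exec'ing a small dedicated worker program is usually the safer variant, because only the calling thread survives in the forked child.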

Is CString::LoadString() thread-safe?

I am implementing a multithreaded application that invokes modules from a legacy application written using MFC.
My code runs perfectly when I run it using only one thread, but if I run it using more than one thread, I always get an assertion when CString::LoadString() invokes AfxGetResourceHandle(). The string that is invoking LoadString() is a local string, so it is not being shared at all.
If I add a mutex around CString::LoadString() everything works, but because of the size of the legacy app and how commonly this method is used, this solution would be hard to implement and would slow down the app.
I looked into the MS documentation, and it says nothing about thread safety for this method.
Do you know something about LoadString() and multithreaded environments? All the DLLs in my app have the same character set, they all are in DEBUG mode and they all use MFC shared DLL.

Generally you can only access MFC objects from threads created with CWinThread. You didn't provide the exact assertion you got on the secondary thread, but I'm guessing your 'other' threads are created some other way. See MSDN for details on MFC vs. multithreading.

Prevent DLL injection from a DLL in C++

I have some questions about preventing DLL injection in C++.
I have a C++-based game, and I am having problems with hackers using DLL injection.
So I need to prevent it.
I found notification hooks described here:
MSDN - Notification Hooks
But I have no idea how to use them.
Is it possible to use a notification hook to prevent DLL injection?
How would that work? (An example would be appreciated.)
Can it be done from a DLL? (Again, an example would be appreciated.)
Thanks for reading this post.
PS: Sorry for my English.

Forget it, unless you do very sophisticated things, it's not going to work. By sophisticated I mean something like the code obfuscation, anti-debugging technology used in Skype. Just look at this talk.
You can spend a ton of time trying to prevent DLL injection; in the end somebody will spend less time than you did and circumvent your protection. I think the time would be better invested in an architecture that is more secure and tamper-proof (e.g. calculating scores on the server).
It's a cat and mouse game you can't win.

This question is old, but I will briefly give a fuller answer for anyone who happens to stumble upon it looking for a proper response.
You cannot fully prevent code injection from within your own process, but you can try to do some tricks without interception of other processes. It is not recommended because you need to have experience and knowledge with lower-level tasks, especially to get it working properly and not prevent functionality of your own software, however...
Asynchronous Procedure Calls (APCs) are a mechanism implemented by the Windows kernel. They are often used for code injection into other running processes, and Windows uses them a lot itself for a variety of things, such as notifications being sent to specific processes. When a user-mode process calls QueueUserAPC (KERNEL32), NtQueueApcThread (NTDLL) is invoked. NtQueueApcThread (NTDLL) performs a system call, which causes NtQueueApcThread (NTOSKRNL) to be invoked; that routine is not exported by NTOSKRNL. For anyone wondering, NTOSKRNL is the Windows kernel, and a system call is nothing more than a transition from user mode to kernel mode: the Native API system routines exist in kernel-mode memory, and the NTDLL routines for the NTAPI are system call stubs that pass control to the Windows kernel. When NtQueueApcThread (NTOSKRNL) is called, it uses KeInitializeApc and KeInsertQueueApc (both of which do happen to be exported by NTOSKRNL).
When the APC is actually delivered to the targeted process, KiUserApcDispatcher (NTDLL) is called locally within that process, unless the APC is performed in a more extensive manner to bypass this step (99% of the time it will not be bypassed). This means that you have an opportunity to intercept this behavior and prevent APC injection into your own process with one single local hook in your own process, by byte-patching (also known as "inline hooking") KiUserApcDispatcher, which is exported by NTDLL.
The only problem you will face is that this is undocumented and not officially supported by Microsoft; you will need to figure out how the parameters work and how to keep the callback routine from blocking genuine requests that your own software needs in order to function. This will, however, cover kernel-mode APC injection as well, not just user-mode attacks.
There are many ways to inject code into a process, and APC is simply one of them. Another common method is remote thread creation. When a user-mode process attacks another process via remote thread creation, it will typically call CreateRemoteThread (KERNEL32). This leads down to RtlCreateUserThread (NTDLL), and RtlCreateUserThread calls NtCreateThreadEx (NTDLL). NTDLL performs a system call, and then NtCreateThreadEx (a non-exported routine of the Windows kernel) is invoked in kernel-mode memory. In the end, the targeted process will have LdrInitializeThunk invoked locally, and RtlUserThreadStart will also be invoked locally. Both of these routines are exported by NTDLL. This is the same scenario as with APC: you can patch LdrInitializeThunk locally, but you must do it properly to avoid breaking genuine functionality within your own software.
These two techniques are not foolproof; there is no "foolproof" solution. There are many ways to inject code into a process, and there are very sophisticated methods to bypass the solutions I have described. Anti-virus software has been battling anti-RCE/self-protection for as long as I can remember, as have anti-cheat systems. You should look into kernel-mode device driver development as well; it will allow you to register kernel-mode callbacks which can help you out.
The first callback you should look into is ObRegisterCallbacks. It allows you to receive a pre-operation callback notification whenever NtOpenProcess is called in the Windows kernel. This means that user-mode processes will also trigger it, since NtOpenProcess ends up being called in kernel mode after NTDLL makes the system call. I cannot remember specifically whether the callbacks are triggered in the NtOpenProcess stub itself or deeper in the kernel-mode-only Ob* routines, but you can check easily with WinDbg and remote kernel debugging, or with Interactive Disassembler (target ntoskrnl.exe and use the symbols provided by Microsoft). ObRegisterCallbacks supports notifications for both handle creation and duplication for the process and the process's threads, and you can strip access rights you do not want granted on the requested handle.
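The access-stripping step itself is just bit masking, and can be sketched in portable C++ as shown here. This is not driver code: AccessMask, the k-prefixed constant names and StripInjectionRights are hypothetical names for this illustration, while the numeric values are the documented PROCESS_* access-right values. In a real pre-operation callback the same masking would be applied to the desired-access field of the handle creation or duplication parameters.

```cpp
#include <cassert>
#include <cstdint>

using AccessMask = std::uint32_t;

// Numeric values of the documented process access rights.
constexpr AccessMask kProcessTerminate               = 0x0001;
constexpr AccessMask kProcessCreateThread            = 0x0002;
constexpr AccessMask kProcessVmOperation             = 0x0008;
constexpr AccessMask kProcessVmRead                  = 0x0010;
constexpr AccessMask kProcessVmWrite                 = 0x0020;
constexpr AccessMask kProcessQueryLimitedInformation = 0x1000;

// Rights that classic remote-thread injection relies on: allocating and
// writing memory in the target, then starting a thread there.
// Which rights to strip is a policy choice; this set is an assumption.
constexpr AccessMask kInjectionRights =
    kProcessCreateThread | kProcessVmOperation | kProcessVmWrite;

// Returns the requested access with the injection-related rights removed.
// All other requested rights are left untouched.
constexpr AccessMask StripInjectionRights(AccessMask desired) {
    return desired & ~kInjectionRights;
}
```

For example, a request for kProcessCreateThread | kProcessVmWrite | kProcessQueryLimitedInformation would be reduced to kProcessQueryLimitedInformation, so the caller still gets a handle but cannot use it to write memory or create threads.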
The second callback you should look into would be PsSetCreateThreadNotifyRoutineEx. This callback routine will allow you to receive a notification whenever a new thread creation occurs on the system; you can filter it out for your own process and if a rogue thread is created, terminate the thread.
The third callback you should look into is PsSetLoadImageNotifyRoutineEx. This callback provides a notification whenever a new module is loaded into a process; once again, you can filter for your own process. If you detect a rogue module, you can attempt to have your process call LdrUnloadDll (NTDLL) targeting the base address of the newly loaded image; however, the reference count for the module needs to be 0 for it to be unloaded. Failing that, you can try "hacky" methods like calling NtUnmapViewOfSection/NtFreeVirtualMemory. Bear in mind that if you remove the rogue module after it has byte-patched memory to redirect execution flow to its own routines, your process will crash when those patches are hit unless you restore the original bytes.
These are some ideas, and the ones most commonly used. Kernel-mode callbacks are very popular among security software and anti-cheat software. As for thread creation, you will want to mitigate it as much as possible: if you only look for rogue DLL loads, you will miss reflective DLL loading. Also keep in mind the other code injection methods, like thread hijacking, shared window memory exploitation with ROP chains, DLL patching on disk, etc.

Whether a new process is forked from a JNI call

I am trying to call a C API from my Java program using JNI. Could somebody tell me whether the call to the C API would fork a new process internally? I need to know because I will have a very large number of concurrent transactions, so if a new process were forked for each transaction there would be far too many processes.

The advantage of using JNI is that both the calling program and the called program run in the same process (job) while the other methods start a new process (job). This makes JNI calls faster at startup time and less resource-intensive. However, because Java applications run in the Technology Independent Machine Interface (TIMI) and user native methods require a user address space to run, some overhead is required initially to create a user environment that uses 16-byte address pointers instead of the 8-byte pointers used below TIMI. It simply means that your reasons for using JNI should be based on more than performance.
