I ran into a problem with thread synchronization and critical sections on Windows 10.
The application crashes in this case:
The application has two threads.
Thread 1 calls EnterCriticalSection with object m_CS
Thread 2 then attempts to enter the same critical section
Thread 1 terminates Thread 2 using TerminateThread
Thread 1 calls LeaveCriticalSection
On the earlier Windows versions I was able to test (7, 8, 8.1), this works properly: Thread 2 terminates, and Thread 1 leaves the critical section without an exception.
On Windows 10, when Thread 1 leaves the critical section, the application crashes with an access violation. It only happens when another thread was terminated while waiting in EnterCriticalSection.
The stack trace looks like this (latest frame at the top):
RtlpWakeByAddress
RtlpUnWaitCriticalSection
RtlLeaveCriticalSection
I have spent a lot of time debugging this issue. In my case m_CS is perfectly valid when LeaveCriticalSection is called. I debugged and analyzed the disassembled code of the ntdll.dll functions. It seems the object gets corrupted somewhere during RtlpUnWaitCriticalSection and is then passed to RtlpWakeByAddress, where the crash occurs. Basically, ntdll.dll was still able to modify the CRITICAL_SECTION object's fields, such as the lock count, in RtlLeaveCriticalSection.
I couldn't find any answer on the web, or any statement of what changed in Windows 10; only a thread on reddit and ~1800 crash reports for Mozilla Firefox with the same call stack in the last month. I contacted the author of the reddit post, and he has not been able to fix it so far.
Has anybody dealt with this issue, and does anyone have a fix or advice? Right now the only solution I see is to rethink our use of the WinAPI TerminateThread and avoid it as much as possible. Another option is probably to refactor the code and rethink the application's architecture.
Any response appreciated.
Thanks in advance
The implementation of CRITICAL_SECTION changes a lot from version to version. In recent Windows versions, when a thread begins waiting on a CRITICAL_SECTION, it calls the WaitOnAddress function. Strictly speaking it uses the internal ntdll implementation, RtlpWaitOnAddress, but that does not change the gist. This function internally calls RtlpAddWaitBlockToWaitList, and here is the key point: the WaitBlock is allocated on the thread's stack, and a pointer to this wait block is added to a list. When the owner of the CRITICAL_SECTION leaves it, it calls WakeByAddressSingle (internally RtlpWakeByAddress). This function pops the first WaitBlock from the list, extracts the thread ID from it, and calls NtAlertThreadByThreadId (a new API since Windows 8.1) to wake a thread waiting in EnterCriticalSection. But when you terminate a thread that is waiting in EnterCriticalSection, its stack is deallocated, so the address of its WaitBlock becomes invalid. The thread that calls RtlpWakeByAddress (as part of LeaveCriticalSection) then gets an access violation when it tries to read the thread ID from the WaitBlock on the dead thread's stack.
Conclusion: once you call TerminateThread, the process is already in an unstable state, and a bug can appear at any time and at any point. So do not call this function, especially on threads of your own process.
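To make the failure mode concrete, here is a simplified, portable C++ model of the mechanism described above. It is an illustration under stated assumptions, not ntdll's actual code: WaitList, WaitBlock and ScopedWait are hypothetical names, and a std::mutex stands in for whatever ntdll uses to guard its list. A waiter links a node that lives on its own stack into a shared list and unlinks it before returning; the waker pops the first node and reads the thread ID from it.

```cpp
#include <cstddef>
#include <list>
#include <mutex>
#include <thread>

// Simplified model (NOT ntdll's real code): the node a waiter registers lives on its own stack.
struct WaitBlock {
    std::thread::id waiter;
};

class WaitList {
public:
    void add(WaitBlock* wb) {
        std::lock_guard<std::mutex> g(m_);
        blocks_.push_back(wb);
    }
    void remove(WaitBlock* wb) {
        std::lock_guard<std::mutex> g(m_);
        blocks_.remove(wb);
    }
    // Waker side: pops the first block and dereferences it. If the waiter's stack
    // has been freed, this pointer dangles and the read is the access violation.
    bool pop_front(std::thread::id& out) {
        std::lock_guard<std::mutex> g(m_);
        if (blocks_.empty()) return false;
        out = blocks_.front()->waiter;
        blocks_.pop_front();
        return true;
    }
    std::size_t size() {
        std::lock_guard<std::mutex> g(m_);
        return blocks_.size();
    }
private:
    std::mutex m_;
    std::list<WaitBlock*> blocks_;
};

// A well-behaved waiter unlinks its stack node before it returns (RAII).
// TerminateThread skips exactly this step and frees the stack anyway.
class ScopedWait {
public:
    explicit ScopedWait(WaitList& list) : list_(list), wb_{std::this_thread::get_id()} {
        list_.add(&wb_);
    }
    ~ScopedWait() { list_.remove(&wb_); }
    ScopedWait(const ScopedWait&) = delete;
    ScopedWait& operator=(const ScopedWait&) = delete;
private:
    WaitList& list_;
    WaitBlock wb_;  // lives on the waiting thread's stack
};
```

A waiter that returns normally runs ~ScopedWait and removes its node. A terminated waiter never does, so the next pop_front reads through a pointer into a freed stack, which matches the crash seen in RtlpWakeByAddress.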
Thread 1 terminates Thread 2 using TerminateThread
Don't do that. It may look like it works on other windows versions, but there's no way for you to know for sure what side-effects are occurring and hiding from you.
From https://msdn.microsoft.com/en-us/library/windows/desktop/ms686717(v=vs.85).aspx
TerminateThread is a dangerous function that should only be used in the most extreme cases. You should call TerminateThread only if you know exactly what the target thread is doing, and you control all of the code that the target thread could possibly be running at the time of the termination. For example, TerminateThread can result in the following problems:
If the target thread owns a critical section, the critical section will not be released.
If the target thread is allocating memory from the heap, the heap lock will not be released.
If the target thread is executing certain kernel32 calls when it is terminated, the kernel32 state for the thread's process could be inconsistent.
If the target thread is manipulating the global state of a shared DLL, the state of the DLL could be destroyed, affecting other users of the DLL.
What you should do is communicate with thread 2 and let thread 2 shut itself down correctly and safely.
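A minimal sketch of that cooperative shutdown in portable C++, assuming std::thread and std::atomic are acceptable in place of CreateThread and a Win32 event (worker and run_then_stop are hypothetical names): Thread 1 sets a flag and joins; Thread 2 checks the flag between small units of work and returns on its own.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical Thread 2 body: does one small unit of work at a time and checks the
// stop flag between units, so it only ever exits at a point where it holds no locks.
void worker(const std::atomic<bool>& stop, std::atomic<int>& units_done) {
    do {
        // ... one unit of work, e.g. EnterCriticalSection / update / LeaveCriticalSection ...
        units_done.fetch_add(1, std::memory_order_relaxed);
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    } while (!stop.load(std::memory_order_acquire));
    // Ordinary return: destructors run and every lock the thread took has been released.
}

// Hypothetical Thread 1 side: ask Thread 2 to stop, then wait for it instead of killing it.
int run_then_stop(std::chrono::milliseconds run_for) {
    std::atomic<bool> stop{false};
    std::atomic<int> units_done{0};
    std::thread t2(worker, std::cref(stop), std::ref(units_done));
    std::this_thread::sleep_for(run_for);
    stop.store(true, std::memory_order_release);
    t2.join();  // returns only after Thread 2 has left its loop on its own
    return units_done.load();
}
```

Because Thread 2 only tests the flag between units of work, it never exits while holding the critical section, which is exactly what TerminateThread cannot guarantee. The trade-off is shutdown latency: the thread stops only after finishing its current unit.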
I would change the code of thread 2 to use TryEnterCriticalSection:
if (!TryEnterCriticalSection(&m_CS)) {
    return 0; // could not enter: return from the thread function instead of blocking
}
// ... work that needs m_CS ...
LeaveCriticalSection(&m_CS);
This has the advantage that thread 2 is not waiting on the critical section, and it can terminate itself properly. It is generally not advisable to use TerminateThread, as already mentioned by others in the comments.
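The same pattern in portable C++, as a sketch: std::mutex with std::try_to_lock stands in for the CRITICAL_SECTION and TryEnterCriticalSection (a substitution, not the answer's Win32 code), and try_do_work and run_thread2_once are hypothetical helpers.

```cpp
#include <mutex>
#include <thread>

// Returns true if the work ran, false if the lock was busy and the caller should return.
template <typename Work>
bool try_do_work(std::mutex& m, Work&& work) {
    std::unique_lock<std::mutex> lock(m, std::try_to_lock);  // like TryEnterCriticalSection
    if (!lock.owns_lock()) {
        return false;  // lock busy: leave the thread function instead of blocking
    }
    work();
    return true;  // the lock is released by unique_lock's destructor
}

// Runs "Thread 2" once on a new thread and reports whether it did its work.
bool run_thread2_once(std::mutex& m, int& counter) {
    bool ran = false;
    std::thread t2([&] { ran = try_do_work(m, [&counter] { ++counter; }); });
    t2.join();
    return ran;
}
```

Note that std::mutex::try_lock is allowed to fail spuriously, so a false result means "don't wait now", not proof that another thread holds the lock.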
Yes, I can confirm this behavior. I spent more than 3 days looking for a memory bug in our code that was destroying my CRITICAL_SECTION. The problem was an old call to TerminateThread. The program used to work fine, but on Windows 10 we got seemingly random access violations in EnterCriticalSection or LeaveCriticalSection.
Thank you so much, this made my day.
Related
I've written a threadpool that gives the user an option to terminate threads. As described in the documentation of the TerminateThread() API:
If the target thread is executing certain kernel32 calls when it is terminated, the kernel32 state for the thread's process could be inconsistent.
I could verify this problem myself: Terminating a thread in that condition caused memory allocation problems (amongst others), but fixing that condition fixed the problems at the same time.
Questions
1. So I want to check this internal state every time TerminateThread() has been used. If TerminateThread() has corrupted the process's internal state in kernel32.dll, I want to raise an exception and terminate the process after informing the user (unless the internal state can still be fixed).
Is this feasible? Maybe by finding the address of the relevant state variable (or something like that, by matching the kernel32 pdb file or some other way)? The situation is bad for me: if I cannot solve it, I either have to drop the terminate option from the threadpool or just leave the thread running on its own. Any hints would be appreciated!
2. Is there any other Win32 function that causes similar problems?
3. a. Is it safe to leave a thread running on its own when it has called a blocking kernel32 function that will definitely never return?
b. What happens if the Win32 function returns and the lambda function has been destroyed?
Why am I asking this? (Supplementary information)
I have a custom threadpool in my project where I call some Win32 APIs that may sometimes block forever. Hence, I call them with a timeout. When that timeout is reached, I call TerminateThread() and have my threadpool return an "unsuccessful call" state.
Sometimes my current app reaches a deadlock. I found out that this deadlock happens in the threadpool, so I'm looking for alternatives to TerminateThread() (such as leaving the thread running, as described in the questions), or for a way to fix the internal state, or at least to verify whether TerminateThread() is the root cause of my deadlock.
I'd like to reuse this threadpool in other projects, too, so I should make it safe.
Update: Problem fixed.
I found the bug in my app:
It was actually a call to TerminateThread() made when the timeout in my threadpool was set low (about 200 ms).
The thread was killed in a moment when it wasn't blocking (i.e., if a longer timeout period had remained, it would have worked and returned correctly).
From the kernel stack trace I found out that in kernel mode, a mutex was being locked while the thread was terminated, and while the thread was exiting, other threads were already waiting for that mutex.
The problem first appeared to go away by increasing the minimum timeout to 1000 ms, but I wasn't content with that:
My solution was to create the lambda on the heap when the timeout was reached, to leave the lambda and the thread running without terminating them, and to add the thread to a list, _ToTerminateThreads.
The threads in that list get terminated once every 10 minutes (it waits 10 minutes, copies the list, waits another minute, and then terminates the threads and deletes the lambdas).
Still, after testing and hours of debugging I was getting heap corruption.
Finally I found out that the threads I had left for deletion wrote to memory that had been used by the user function (the one passed to the threadpool), and that memory had already been freed because the threadpool had returned.
This caused the remaining problems, so the final solution was to increase the timeout to a safe value.
I recommend that anyone who needs such a feature run the work in a child process and terminate that process instead of a thread.
I'm keeping this question open because the main 4 questions haven't been answered yet. For my problem I don't need the answers anymore, but they may be interesting for other members of Stack Overflow.
My issue is resolved, although the fix has nothing to do with the 3 questions in the post.
I'll try to answer them in reverse order:
ad 3.b.) If an external function returns after your local lambda has been destroyed, the thread simply continues executing the lambda's code, which still exists in the module. What has been destroyed is the lambda object, i.e. its captured variables, so any capture the rest of the lambda touches (especially references to the caller's locals) now dangles, and reading or writing it is undefined behavior. This surely will mess you up, so never do that!
ad 3.a.) Yes, it is safe to leave the thread, but only if you're 100% sure the external function will never return. Otherwise it will mess up your app when it does return:
if the lambda has been destroyed, you get the problem explained in b.;
if you haven't destroyed the lambda or it's a global function, it will run the rest of the function, which may write to dynamically allocated (heap, not stack) memory that has already been deallocated and cause heap corruption, or simply modify some global variables.
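If a thread must be abandoned rather than killed, one way to avoid the heap corruption described in the update is to make the thread the owner of everything it touches. The following is a sketch under assumptions (run_with_timeout is a hypothetical helper; the callable must capture by value and return a non-void type; C++17): the task and its result slot are kept alive by a std::shared_ptr held by the thread itself, so nothing it writes to is freed when the caller returns.

```cpp
#include <chrono>
#include <future>
#include <memory>
#include <optional>
#include <thread>
#include <type_traits>
#include <utility>

// Runs fn on a detached thread and waits up to `timeout` for its result.
// On timeout the thread is abandoned, not killed: it keeps running, but it only
// touches the packaged_task and shared state it co-owns, never the caller's memory.
template <typename F>
auto run_with_timeout(F fn, std::chrono::milliseconds timeout)
    -> std::optional<std::invoke_result_t<F&>> {
    using R = std::invoke_result_t<F&>;
    auto task = std::make_shared<std::packaged_task<R()>>(std::move(fn));
    std::future<R> result = task->get_future();
    std::thread([task] { (*task)(); }).detach();  // the thread holds its own reference
    if (result.wait_for(timeout) == std::future_status::ready) {
        return result.get();
    }
    return std::nullopt;  // timed out: safe to return, the thread finishes on its own
}
```

This does not stop the thread: any kernel or heap resources it holds stay held until the call finally returns, and a call that never returns keeps its thread forever. That is why running such calls in a child process, as recommended in the update, remains the sturdier option.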
ad 2.) I googled for dangerous winapi functions and didn't find any result other than TerminateThread().
If you know about one, please add a comment or write another answer.
ad 1.) I don't have any solution for checking/fixing the inner kernel32 state for the process that Microsoft refers to.
I think someone at Microsoft who has read the kernel32.dll source code should answer this.
Apart from this kernel32 state, TerminateThread() will cause lots of other problems (like resource/heap allocations, mutex locks, leaks and so on), so never use it unless you are 100% sure what you are doing.
Read the article @RichardCritten linked in the comments: TerminateThread()
What was the bug in my code?
I was calling TerminateThread() with a low timeout (300 ms).
Occasionally, when a machine was low on resources, the function was still running (i.e. in the non-blocking part of the call!).
I terminated the thread while it was inside that function, and thereby left a kernel mutex locked.
This locked mutex made all other threads wait - and not exit when they returned.
Remarks
I answered my own question from what I found after I didn't receive any answer. Hence, it may contain some wrong information. Please correct me if anything is wrong in this.
VS2013, C++
I've just released a DLL. One of the DLL's functions starts a thread with _beginthread.
In the normal flow I use a mutex to control the threads. Before the DLL is unloaded from the main application, I wait for the thread to terminate and close the handles.
However, there is one case where the main application can close without releasing resources correctly, i.e. without waiting for the child thread to terminate and without closing the handles.
Is there any risk if the main application is forced to exit? Is there any risk if I run the application and threads again after such an exit?
Is there any risk for the OS? Are all threads terminated after main exits?
I know that it is a "dirty" solution, but for some reason I can't change that.
Thank you in advance for any advice.
According to Raymond Chen, on Windows, if the main thread terminates, your application hangs until all your threads end. This means that, no, your solution will not work: your thread will keep your application stuck in the closing state. Also, even if your thread were forcefully terminated on exit, it would not be cleaned up, and, since we are talking about MFC threads here, it would cause your application to leak resources, so please don't do that!
Is there any risk if main application force exit?
Yes! The thread may be in the middle of consistency-sensitive operations.
Is there any risk if I run application and threads again after exit?
Yes! The previous shutdown may have corrupted your data structures, and now you cannot even load the data correctly.
Is there any risk for OS?
It depends on what your software does. What if it is disk-optimization software and it is moving clusters during the emergency shutdown?
Are all threads terminating after main exit?
Yes! You need to provide dedicated "join" code that waits for the threads to finish.
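A minimal sketch of such join code, assuming the threads are std::thread objects rather than raw _beginthread handles (JoinAll is a hypothetical name): an RAII guard that joins every thread it owns when it goes out of scope, so main cannot return, and start global and C runtime teardown, while workers are still running.

```cpp
#include <thread>
#include <utility>
#include <vector>

// Hypothetical RAII guard: joins every owned thread in its destructor.
class JoinAll {
public:
    JoinAll() = default;
    JoinAll(const JoinAll&) = delete;
    JoinAll& operator=(const JoinAll&) = delete;

    ~JoinAll() {
        for (auto& t : threads_) {
            if (t.joinable()) t.join();  // wait for each worker to finish
        }
    }

    // Starts a thread running f(args...) and takes ownership of it.
    template <typename F, typename... Args>
    void spawn(F&& f, Args&&... args) {
        threads_.emplace_back(std::forward<F>(f), std::forward<Args>(args)...);
    }

private:
    std::vector<std::thread> threads_;
};
```

The guard only waits; the workers still need a way to be told to stop (a flag or an event), otherwise the destructor blocks forever, which is the hang described in the first answer.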
I would say, the behavior is undefined. Too many things may happen, when the application is terminated without having the chance to clean up.
This SO question may give some ideas.
This MS article describes the TerminateThread function and also lists some implications of unexpectedly terminating threads (which is probably what happens when exit is called):
If the target thread owns a critical section, the critical section will not be released.
If the target thread is allocating memory from the heap, the heap lock will not be released.
If the target thread is executing certain kernel32 calls when it is terminated, the kernel32 state for the thread's process could be inconsistent.
If the target thread is manipulating the global state of a shared DLL, the state of the DLL could be destroyed, affecting other users of the DLL.
So it looks like there is a risk even for the OS:
kernel32 state for the thread's process could be inconsistent
Will the thread terminate even if it is in suspended state when TerminateThread is called?
The TerminateThread function destroys the thread regardless of its state or the likely side effects. The linked MSDN page covers this in some detail.
TerminateThread is used to cause a thread to exit. When this occurs, the target thread has no chance to execute any user-mode code. DLLs attached to the thread are not notified that the thread is terminating. The system frees the thread's initial stack.
Windows Server 2003 and Windows XP: The target thread's initial stack is not freed, causing a resource leak.
TerminateThread is a dangerous function that should only be used in the most extreme cases. You should call TerminateThread only if you know exactly what the target thread is doing, and you control all of the code that the target thread could possibly be running at the time of the termination. For example, TerminateThread can result in the following problems:
I've got to ask why you would want to call this, as it's definitely a last resort for shutting down a thread. Your application will leak memory and other resources unless you are very lucky or careful.
In my application a thread runs a while(1){} loop, so the thread only terminates when the user closes my app.
Is it safe to do it like this? I am using while(1){} because my app continuously monitors devices on the system.
After some time I am getting "(R6016) not enough space for thread data" on ffmpeg.
I read this but did not find a solution to my problem:
http://support.microsoft.com/kb/126709
Thread description:
The thread runs ffmpeg and the handle utility (http://technet.microsoft.com/en-us/sysinternals/bb896655.aspx) within the while(1){} loop.
ffmpeg and handle are run through QProcess, which I delete after the process ends.
The while(1){} loop waits for 5 seconds using msleep(5000).
This is not safe.
Change while (1) to while (!stopCondition) and have stopCondition change to TRUE when exiting. The main thread should wait for all other threads to finish before exiting.
Note: stopCondition is defined as volatile int stopCondition.
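A sketch of that change for the device-monitoring loop, with two assumptions beyond the answer: the 5-second msleep is replaced by a condition-variable wait, so a stop request takes effect immediately instead of after up to 5 seconds, and the flag is guarded by a mutex rather than declared volatile (in standard C++, volatile does not make a variable safe to share between threads; MSVC only adds ordering guarantees to it under /volatile:ms). StopToken and monitor are hypothetical names.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stop flag whose "sleep" wakes up as soon as a stop is requested.
class StopToken {
public:
    void request_stop() {
        {
            std::lock_guard<std::mutex> g(m_);
            stop_ = true;
        }
        cv_.notify_all();
    }
    // Waits up to `period`; returns true if a stop was requested (checked first).
    bool sleep_or_stop(std::chrono::milliseconds period) {
        std::unique_lock<std::mutex> lk(m_);
        return cv_.wait_for(lk, period, [this] { return stop_; });
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;
};

// Replacement for `while (1) { poll(); msleep(5000); }`; returns the number of polls.
template <typename Poll>
int monitor(StopToken& token, Poll poll, std::chrono::milliseconds period) {
    int polls = 0;
    do {
        poll();  // e.g. run ffmpeg/handle via QProcess and delete it afterwards
        ++polls;
    } while (!token.sleep_or_stop(period));
    return polls;
}
```

On exit, the main thread would call request_stop() and then join the monitoring thread, so the loop finishes its current poll and the QProcess objects are deleted before the C runtime starts shutting down.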
When the main thread exits, a cleanup process starts:
- global destructors are called (C++).
- C runtime library starts to shut down, releasing all memory allocated with malloc, unloading dynamic libraries and other resources.
A thread that depends on the C runtime being functional, or that runs code from a shared/dynamic library, will crash. If that thread was doing something important like writing to a file, the file will be corrupt. Maybe in your case things are not so bad, but an application crash doesn't look good, to say the least.
This is not the full story, but I think it makes my point.
The deal is:
I want to create a thread that works similarly to executing a new .exe in Windows, so that if that program (the new thread) crashes or goes into an infinite loop, it is killed gracefully (after the time limit is exceeded or when it crashes) and all resources are freed properly.
And when that thread has succeeded, I would like to be able to modify some global variable which could hold some data, such as a list of files. That is why I can't just run an external executable from Windows: I couldn't access the variables inside the function that was executed in the new thread.
Edit: Clarified the problem a lot more.
The thread will already run after calling CreateThread.
WaitForSingleObject is not necessary (unless you really want to wait for the thread to finish); but it will not "force-quit" the thread; in fact, force-quitting - even if it might be possible - is never such a good idea; you might e.g. leave resources opened or otherwise leave your application in a state which is no good.
A thread is not some sort of magical object that can be made to do things. It is a separate path of execution through your code. Your code cannot be made to jump arbitrarily around its codebase unless you specifically program it to do so. And even then, it can only be done within the rules of C++ (i.e., calling functions).
You cannot kill a thread because killing a thread would utterly wreck some of the most fundamental assumptions a programmer makes. You would now have to take into account the possibility that the next line doesn't execute for reasons that you can neither predict nor prevent.
This isn't like exception handling, where C++ specifically requires destructors to be called, and you have the ability to catch exceptions and do special cleanup. You're talking about executing one piece of code, then suddenly ending the execution of that entire call-stack. That's not going to work.
The reason that web browsers moved from a "thread-per-tab" to "process-per-tab" model is exactly this: because processes can be terminated without leaving the other processes in an unknown state. What you need is to use processes instead of threads.
When the process finishes and sets its data, you need to use some inter-process communication system to read that data (I like Boost.Interprocess myself). It won't look like a regular C++ global variable, but you shouldn't have a problem reading it. This way, you can effectively kill the process if it's taking too long, and your program will remain in a reasonable state.
Well, that's what WaitForSingleObject does. It blocks until the object does something (in case of a thread it waits until the thread exits or the timeout elapses). What you need is
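HANDLE thread = CreateThread(NULL, 0, do_stuff, NULL, 0, NULL); // do_stuff: DWORD WINAPI do_stuff(LPVOID)
// rest of the code, which runs in parallel with the new thread
DWORD rc = WaitForSingleObject(thread, 4000); // wait up to 4 seconds for the other thread to exit
// rc == WAIT_TIMEOUT means the thread is still running; it has NOT been stopped
CloseHandle(thread); // closing the handle does not end the thread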
If you want your worker thread to shut down after a period of time has elapsed, the best way to do that is to have the thread itself monitor the elapsed time in some way and then exit when the time is up.
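A sketch of that first approach, a worker that enforces its own time limit (work_until is a hypothetical name, and each unit of work is assumed to be short and to report whether there is more to do):

```cpp
#include <chrono>

// Runs units of work until the time budget is used up or there is nothing left to do,
// then returns normally. Returns the number of units completed.
template <typename Unit>
int work_until(std::chrono::steady_clock::duration budget, Unit do_one_unit) {
    const auto deadline = std::chrono::steady_clock::now() + budget;  // monotonic clock
    int done = 0;
    while (std::chrono::steady_clock::now() < deadline) {
        if (!do_one_unit()) break;  // the unit reports that there is no more work
        ++done;
    }
    return done;
}
```

The limit is only as precise as the longest unit of work; a unit that blocks indefinitely defeats it, so the blocking call inside a unit should itself take a timeout where the API offers one.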
Another way to do this is to monitor the elapsed time in the main thread or even in a third, monitor-type thread. When the time has elapsed, set an event. Your worker thread could wait for this event in its main loop, and then exit when it has been raised. These kinds of events, which are used to signal the thread to kill itself, are sometimes called "death events." (Or at least, I call them that.)
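A portable sketch of a death event, swapping the Win32 event object for std::promise<void> and std::shared_future<void> (my substitution; with Win32 you would use CreateEvent, SetEvent and WaitForSingleObject instead). death_raised, worker_loop and run_with_death_event are hypothetical names.

```cpp
#include <chrono>
#include <future>
#include <thread>

// True once the death event has been raised (zero-timeout poll).
bool death_raised(const std::shared_future<void>& death) {
    return death.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// Worker main loop: does work until the event is raised, then returns cleanly.
int worker_loop(std::shared_future<void> death) {
    int units = 0;
    while (!death_raised(death)) {
        ++units;  // one unit of work
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return units;
}

// Monitor side: let the worker run for `limit`, then raise the death event and wait.
int run_with_death_event(std::chrono::milliseconds limit) {
    std::promise<void> death;
    std::shared_future<void> event = death.get_future().share();
    std::future<int> result = std::async(std::launch::async, worker_loop, event);
    std::this_thread::sleep_for(limit);
    death.set_value();      // "raise" the event
    return result.get();    // wait for the worker to exit on its own
}
```

A shared_future can be copied to any number of workers, so a single set_value() behaves like a manual-reset event that tells all of them to exit.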
Yet another way to do this is to queue a user-mode APC (QueueUserAPC) to the worker thread, which needs to be in an alertable wait state (for example in SleepEx or WaitForSingleObjectEx with bAlertable set to TRUE). The APC can then set some internal state variable which will trigger the death sequence in the thread when it resumes.
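Standard C++ has no equivalent of QueueUserAPC, so this sketch swaps in a different mechanism with the same shape: the worker blocks waiting on its own job queue (playing the role of the alertable wait), runs whatever jobs are posted on its own thread, and a posted stop job sets the internal flag that ends the loop. JobQueueWorker is a hypothetical class, not a Win32 API.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

class JobQueueWorker {
public:
    // Any thread may post a job; it will run on the worker thread.
    void post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> g(m_);
            jobs_.push_back(std::move(job));
        }
        cv_.notify_one();
    }
    // The "APC" that triggers the death sequence.
    void post_stop() { post([this] { stopping_ = true; }); }

    // Worker thread body; returns how many jobs it ran (including the stop job).
    int run() {
        int ran = 0;
        while (!stopping_) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return !jobs_.empty(); });  // the "alertable wait"
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            job();  // runs on the worker thread, like an APC
            ++ran;
        }
        return ran;  // normal return: the death sequence is just the end of run()
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> jobs_;
    bool stopping_ = false;  // only read and written on the worker thread
};
```

As with an APC, the stop job runs on the worker thread itself, so the flag needs no extra synchronization; jobs posted after the stop job are simply never run.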
There is another method which I hesitate even mentioning, because it should only be used in extremely dire circumstances. You can kill the thread. This is a very dangerous method akin to turning off your sink by detonating an atomic bomb. You get the sink turned off, but there could be other unintended consequences as well. Please don't do this unless you know exactly what you're doing and why.
Remove the call to WaitForSingleObject. That causes your parent thread to wait.
Remove the WaitForSingleObject call?