How to delete singleton instance from separate module before system killed running threads - c++

I have a C++ Windows program (compiled with Visual Studio 2019) that uses shared libraries. One shared library uses a singleton of a class that creates a thread. The class destructor stops the thread cleanly, so there should be no memory leak. However, I see that the destructor is invoked after the system has already killed all running threads on exit. By then it is too late: the thread does not exit cleanly, which introduces a memory leak (and possibly other problems, depending on the code being run by the thread).
Here is an MCVE:
#include <atomic>
#include <chrono>
#include <thread>

class Single
{
public:
    static Single& GetInstance()
    {
        static Single single;
        return single;
    }
    int doSomething()
    {
        while ( !started )
            std::this_thread::sleep_for( std::chrono::milliseconds(100) );
        return 0;
    }
private:
    Single() :
        started( false ),
        continueThread( true )
    {
        thread = new std::thread( &Single::threadFunc, this );
    }
    ~Single()
    {
        continueThread = false;
        thread->join(); // wait for threadFunc to return before deleting
        delete thread;
    }
    void threadFunc()
    {
        started = true;
        while ( continueThread )
            std::this_thread::sleep_for( std::chrono::milliseconds(1) );
    }
    std::atomic_bool started;
    std::atomic_bool continueThread;
    std::thread* thread;
};

int main( int argc, char* argv[] )
{
    return Single::GetInstance().doSomething();
}
If this is copied to a single main.cpp file and executed, everything works fine. When ~Single is executed, I can see in the debugger that the threadFunc thread is still running, and it gets stopped cleanly.
Now, if the definition and implementation of Single are moved to a separate DLL, the behavior changes. When ~Single is executed, I can see in the debugger that the threadFunc thread is no longer running (the system has already stopped it), so the code can't stop it cleanly. Visual Leak Detector then reports a memory leak.
Is there any flag (in code or at compiler level) that could be set to guarantee that threads are not destroyed before the singleton gets deleted?
I know I could call a deinit function manually from the main function, but at some point main may not even know that there is a singleton running a thread in the shared library it uses. The shared library itself should be able to exit cleanly.

Multithreading, automatic cleanup, and DLL unloading are basically a huge mess on Windows once they interact.
The solution is to not have singletons, or any static lifetime variables (globals, local statics, class statics) with non-trivial destruction semantics. Make an instance of your thing in main()/WinMain(). Pass a reference to whoever needs it. Let the destructor clean it up before main exits and thus before everything gets unloaded.
Or simply ignore the memory leak. The process is exiting anyway.
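The "make an instance in main() and pass a reference" suggestion could look roughly like the sketch below. The names Worker, waitForFirstTick and runApplication are hypothetical stand-ins and do not come from the question; the point is only that the object's destructor runs before main returns, while its thread is still alive.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical worker that owns a background thread (stands in for Single).
class Worker {
public:
    Worker() : thread_([this] { run(); }) {}

    // Stops and joins the thread. This runs while the thread is still alive,
    // provided the Worker is destroyed before main() returns.
    ~Worker() {
        keepRunning_ = false;
        thread_.join();
    }

    Worker(const Worker&) = delete;
    Worker& operator=(const Worker&) = delete;

    int ticks() const { return ticks_.load(); }

private:
    void run() {
        while (keepRunning_) {
            ++ticks_;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::atomic<bool> keepRunning_{true};
    std::atomic<int> ticks_{0};
    std::thread thread_;  // declared last so the flags exist before the thread starts
};

// Code that needs the worker takes a reference instead of calling GetInstance().
int waitForFirstTick(Worker& worker) {
    while (worker.ticks() == 0) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return worker.ticks();
}

// Body of a hypothetical main(): the worker lives on the stack.
int runApplication() {
    Worker worker;
    int seen = waitForFirstTick(worker);
    return seen > 0 ? 0 : 1;
}  // ~Worker() joins the thread here, before any module is unloaded
```

A real program would simply call runApplication() from main() and return its result.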

This is a common case of SUOF (Static Uninitialization Order Fiasco), caused by giving up control over the object's lifetime by using a static local variable. The solution is to take back control over the lifetime by adding a pair of initialization / uninitialization routines (probably wrapped with RAII) that ensure the object is created before its first use and destroyed after its last use, but before the DLL is unloaded or the main function returns.
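A minimal sketch of such initialization / uninitialization routines wrapped in an RAII guard is shown below. The namespace mylib and the names Service, Init, Uninit, IsRunning and Session are all assumptions made for illustration; the reference count is one possible way to let several users share the same instance.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

namespace mylib {  // hypothetical library name

// Hypothetical service that owns a background thread, like Single above.
class Service {
public:
    Service() : thread_([this] { run(); }) {}
    ~Service() {
        keepRunning_ = false;
        thread_.join();
    }
private:
    void run() {
        while (keepRunning_)
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    std::atomic<bool> keepRunning_{true};
    std::thread thread_;
};

std::mutex g_lifetimeMutex;
int g_userCount = 0;
Service* g_service = nullptr;  // raw pointer: nothing is destroyed at static teardown

// Creates the service on the first call.
void Init() {
    std::lock_guard<std::mutex> lock(g_lifetimeMutex);
    if (g_userCount++ == 0) g_service = new Service();
}

// Destroys the service when the last user is gone.
void Uninit() {
    std::lock_guard<std::mutex> lock(g_lifetimeMutex);
    if (g_userCount > 0 && --g_userCount == 0) {
        delete g_service;  // joins the thread while it is still running
        g_service = nullptr;
    }
}

bool IsRunning() {
    std::lock_guard<std::mutex> lock(g_lifetimeMutex);
    return g_service != nullptr;
}

// RAII wrapper: the service exists for as long as at least one Session does.
class Session {
public:
    Session() { Init(); }
    ~Session() { Uninit(); }
    Session(const Session&) = delete;
    Session& operator=(const Session&) = delete;
};

}  // namespace mylib
```

The application would create a mylib::Session near the top of main(), so that Uninit() runs before main returns.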


Can the thread object be deleted after std::thread::detach?

I have a question about std::thread::detach(). The documentation says 'After a call to this function, the thread object becomes non-joinable and can be destroyed safely', by which it seems to mean that the destructor ~thread() may be called safely.
My question is, does this mean that it is ok & safe to delete a thread object immediately after calling detach(), as in the following sample code? Will the function my_function continue safely, and safely use its thread_local variables and variables that are global to the program?
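#include <thread>
#include <unistd.h>
void my_function(int t)
{
    // (function body not shown in the source)
}

int main()
{
    std::thread *X = new std::thread(my_function, 10);
    X->detach(); // detach before deleting, as described above
    delete X;
    return 0;
}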
The code 'runs' ok, I just want to know if this is safe from the point of view of memory ownership. My motivation here is to have a program that runs 'forever', and spawns a few child threads from time to time (e.g. every 30 seconds.) Each child thread then does something, and dies: I do not want to have to somehow keep track of the children in the parent thread, call join() and then delete.
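The fire-and-forget pattern described above could be sketched as follows. After detach() the std::thread object no longer refers to the running thread, so destroying it does not stop the thread. The counter g_finished and the helpers spawn_detached and wait_for_children are hypothetical additions, present only so that completion can be observed; the body of my_function is likewise invented for the example.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<int> g_finished{0};  // hypothetical counter of completed children

void my_function(int t) {
    thread_local int scratch = 0;  // thread_local storage belongs to the running thread
    scratch += t;
    std::this_thread::sleep_for(std::chrono::milliseconds(t));
    if (scratch == t) ++g_finished;
}

// Starts a child thread and forgets about it.
void spawn_detached(int t) {
    std::thread child(my_function, t);
    child.detach();  // child is no longer joinable
}                    // ~thread() runs here; the detached thread keeps going

// Polls until n children have finished or the timeout expires.
bool wait_for_children(int n, std::chrono::milliseconds timeout) {
    auto deadline = std::chrono::steady_clock::now() + timeout;
    while (g_finished.load() < n) {
        if (std::chrono::steady_clock::now() > deadline) return false;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return true;
}
```

One caveat: a detached thread that is still running when the process exits does not get a chance to finish, so a long-lived parent, as in the scenario described, is what makes this pattern workable.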

Prevent destruction of self after main?

I'm writing some asynchronous I/O stuff in C++, and I need to prevent an object from being destructed until its handler for the asynchronous I/O is called. I'm trying to use shared_ptr and create my object with a static constructor so I can be sure that it is using reference counting. Then I save that in a weak_ptr until I start the asynchronous I/O, when I store it into another shared_ptr to be sure it doesn't become invalid during that time. Finally, I reset it when the callback completes. Here's an example:
#pragma once

#include <memory>
#include <functional>

using namespace std;

class SomeIO {
    std::weak_ptr<SomeIO> self;
    std::shared_ptr<SomeIO> savingSelf;

    void myCallback() {
        // do my callback stuff here
        savingSelf.reset(); // release the self-reference once the callback completes
    }

    // private rather than deleted, so that create() can still construct the object
    SomeIO() = default;

public:
    ~SomeIO() {}

    static shared_ptr<SomeIO> create() {
        auto self = shared_ptr<SomeIO>(new SomeIO());
        self->self = self;
        return self;
    }

    void start() {
        savingSelf = self.lock();
        //startSomeAsyncIO(bind(self, SomeIO::myCallback));
    }
};

int main() {
    auto myIO = SomeIO::create();
    return 0;
}
My question is, what is going to happen after main returns? Will it stay alive until the final reference is released, or is this going to cause a memory leak? If this does cause a memory leak, how do I handle this situation so the asynchronous I/O can be canceled and the program can end without a memory leak? I would think that shared_ptr protects me from memory leaks, but I'm not so sure about this situation.
In C++ (as opposed to Java), the program ends when main returns, and all other threads are terminated. Memory leaks are not a problem here, since the program ends anyway and all of its memory is reclaimed.
You can use std::thread with std::thread::join to prevent your program from exiting too early:
int main (void){
    std::thread myAsyncIOThread ([]{
        auto myIO = SomeIO::create();
    });
    //other things your program needs to do
    myAsyncIOThread.join();
    return 0;
}
You might also be interested in using a thread pool in your program.
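The keep-alive idea from the question is often written with std::enable_shared_from_this instead of a hand-rolled weak_ptr plus shared_ptr pair. The sketch below swaps in that technique. startSomeAsyncIO and completePendingIO are hypothetical stand-ins for a real asynchronous I/O layer: they only store the handler and invoke it later.

```cpp
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for an async I/O layer: it only stores the completion
// handler so that it can be invoked later.
std::function<void()> g_pendingHandler;

void startSomeAsyncIO(std::function<void()> handler) {
    g_pendingHandler = std::move(handler);
}

// Simulates the I/O completing: runs the stored handler, then drops it.
void completePendingIO() {
    std::function<void()> handler = std::move(g_pendingHandler);
    g_pendingHandler = nullptr;
    if (handler) handler();
}

class SomeIO : public std::enable_shared_from_this<SomeIO> {
public:
    static std::shared_ptr<SomeIO> create() {
        return std::shared_ptr<SomeIO>(new SomeIO());
    }

    void start() {
        // The lambda holds its own shared_ptr, so the object stays alive until
        // the handler has run and been destroyed.
        std::shared_ptr<SomeIO> self = shared_from_this();
        startSomeAsyncIO([self] { self->myCallback(); });
    }

private:
    SomeIO() = default;  // objects can only be created through create()
    void myCallback() { /* handle the completed I/O here */ }
};
```

With this arrangement the object is released once the handler has been invoked and dropped. If the process exits while the handler is still pending, the object is never destroyed by the program, which matches the behaviour described in the answer above.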

Waiting for main() to return?

So I have a multithreaded C++ console application in which I want to handle the console close event in order to perform cleanup.
I have something to this effect:
bool running = true;
ServerSocket* server;
std::mutex mutex;

BOOL WINAPI HandlerRoutine(DWORD)
{
    running = false;
    server->shutdown();
    std::lock_guard<std::mutex> guard(mutex);
    return TRUE;
}

int main()
{
    std::lock_guard<std::mutex> guard(mutex);
    SetConsoleCtrlHandler(&HandlerRoutine, TRUE);
    try {
        ServerSocket server(27015);
        ::server = &server;
        while (running)
        {
            TCPSocket* client = server.accept(true);
        }
    }
    catch (const ServerSocket::ServerShutdownException&)
    {
    }
    return 0;
}
If I return from HandlerRoutine my program gets terminated unceremoniously, so I have to wait for main() to end.
However, after main ends I get an exception telling me a mutex was destroyed while busy, thrown from dynamic atexit destructor for 'mutex'(). This leads me to believe that static and global variables are destroyed as soon as main returns, leaving my handler function hanging around with invalid globals.
Is this the standard specified behaviour, and if so, any idea about how I can achieve my desired effect?
In this scenario I would simply leak the mutex object. You don't want the destructor called prior to termination of the last thread, and there's no point in calling it during termination of the last thread.
std::mutex& mutex = *new std::mutex; // freed by OS at process exit
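One way to package the intentionally leaked mutex is behind an accessor function, as in this sketch. The names handlerMutex, g_counter and incrementGuarded are hypothetical; the mutex is created on first use and its destructor is never run.

```cpp
#include <mutex>

// Returns a mutex that is constructed on first use and never destroyed, so
// no destructor can run while another thread may still be holding it.
std::mutex& handlerMutex() {
    static std::mutex* const leaked = new std::mutex;  // intentionally leaked
    return *leaked;
}

int g_counter = 0;  // hypothetical shared state protected by the mutex

int incrementGuarded() {
    std::lock_guard<std::mutex> guard(handlerMutex());
    return ++g_counter;
}
```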
You can try boost::application.
See the example wait_for_termination_request.cpp.
Yes, your deduction is correct. Seems like the best option would be to unregister your handler and then wait for it to finish before returning from main(). But if that's not an option for whatever reason, something else you could do is to wrap all your globals in a struct:
struct Globals
{
    bool running;
    ServerSocket* server;
    std::mutex mutex;
};
Have a single, global shared_ptr to an instance of that struct:
std::shared_ptr<Globals> globals = std::make_shared<Globals>();
Make a copy of the shared_ptr in your handler:
std::shared_ptr<Globals> myGlobals = globals;
And rely exclusively on myGlobals within the handler (there is no guarantee that the globals pointer itself will remain valid for the entire lifetime of the thread). That way everything is kept alive until everyone is done with it.
This assumes, of course, that globals is still valid when HandlerRoutine begins. If that's not the case (i.e. if the system can call the handler after main returns but before the process ends), then I'll delete this answer.
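A portable sketch of this idea follows, with an ordinary thread standing in for the console handler. The promise/future pair (copied, proceed) and the function demoHandlerOutlivesGlobal are hypothetical scaffolding that lets the demo release the global pointer while the handler is still running; Globals is reduced to the members the demo needs.

```cpp
#include <atomic>
#include <functional>
#include <future>
#include <memory>
#include <mutex>

struct Globals {
    std::atomic<bool> running{true};
    std::mutex mutex;
};

std::shared_ptr<Globals> globals = std::make_shared<Globals>();

// Stand-in for HandlerRoutine. It takes its own owning copy first and uses
// only that copy afterwards.
bool handlerRoutine(std::promise<void>& copied, std::future<void>& proceed) {
    std::shared_ptr<Globals> myGlobals = globals;  // private owning copy
    copied.set_value();  // tell the demo that the copy has been taken
    proceed.wait();      // wait until the global pointer has been released
    std::lock_guard<std::mutex> guard(myGlobals->mutex);
    myGlobals->running = false;
    return !myGlobals->running;
}

// Releases the global pointer while the handler is mid-flight and checks that
// the handler can still finish its work.
bool demoHandlerOutlivesGlobal() {
    std::promise<void> copied;
    std::promise<void> proceed;
    std::future<void> proceedFuture = proceed.get_future();
    std::future<bool> result = std::async(std::launch::async, handlerRoutine,
                                          std::ref(copied), std::ref(proceedFuture));
    copied.get_future().wait();
    globals.reset();  // simulates the global being destroyed after main()
    proceed.set_value();
    return result.get();
}
```

As the answer notes, this only helps if the global pointer is still valid at the moment the handler takes its copy.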
I'd be tempted to play ping pong with mutexes. Have not one, but two mutexes.
The first is held by mymain (basically a copy of your main). main does nothing but call mymain.
The second is held by HandlerRoutine, and acquired by main after returning from mymain.
If you shut down without HandlerRoutine being called, you simply fall off the end of main.
If you shut down after HandlerRoutine is called, your main blocks until it finishes.
Simply planning to leak the mutex is insufficient: if HandlerRoutine is called during the period when main was already planning to shut down, its server->shutdown could be accessing invalid memory.
Some work on the second mutex (the one HandlerRoutine accesses) needs to be done to deal with race conditions (what if it is called, or reaches the lock, after main has already exited and the process is cleaning up global variables?). One option is storing the HandlerRoutine mutex in a pointer and using lock-free techniques to access it extremely carefully, possibly involving spin locks.
To expand on the comments mentioning that the mutex is unnecessary, this is one alternative:
running = false;
return TRUE; // just to stop the compiler complaining

std::thread::join() hangs if called after main() exits when using VS2012 RC

The following example runs successfully (i.e. doesn't hang) if compiled using Clang 3.2 or GCC 4.7 on Ubuntu 12.04, but hangs if I compile using VS11 Beta or VS2012 RC.
#include <chrono>
#include <iostream>
#include <string>
#include <thread>
#include <typeinfo>
#include "boost/thread/thread.hpp"

void SleepFor(int ms) {
  std::this_thread::sleep_for(std::chrono::milliseconds(ms));
}

template<typename T>
class ThreadTest {
 public:
  ThreadTest() : thread_([] { SleepFor(10); }) {}
  ~ThreadTest() {
    std::cout << "About to join\t" << id() << '\n';
    thread_.join();
    std::cout << "Joined\t\t" << id() << '\n';
  }
 private:
  std::string id() const { return typeid(decltype(thread_)).name(); }
  T thread_;
};

int main() {
  static ThreadTest<std::thread> std_test;
  static ThreadTest<boost::thread> boost_test;
  // SleepFor(100);
}
The issue appears to be that std::thread::join() never returns if it is invoked after main has exited. It is blocked at WaitForSingleObject in _Thrd_join defined in cthread.c.
Uncommenting SleepFor(100); at the end of main allows the program to exit properly, as does making std_test non-static. Using boost::thread also avoids the issue.
So I'd like to know if I'm invoking undefined behaviour here (seems unlikely to me), or if I should be filing a bug against VS2012?
Tracing through Fraser's sample code from his Connect bug report with VS2012 RTM seems to show a fairly straightforward case of deadlocking. This likely isn't specific to std::thread; _beginthreadex likely suffers the same fate.
What I see in the debugger is the following:
On the main thread, the main() function has completed, the process cleanup code has acquired a critical section called _EXIT_LOCK1, called the destructor of ThreadTest, and is waiting (indefinitely) on the second thread to exit (via the call to join()).
The second thread's anonymous function completed and is in the thread cleanup code waiting to acquire the _EXIT_LOCK1 critical section. Unfortunately, due to the timing of things (whereby the second thread's anonymous function's lifetime exceeds that of the main() function) the main thread already owns that critical section.
Anything that extends the lifetime of main() such that the second thread can acquire _EXIT_LOCK1 before the main thread avoids the deadlock. That's why uncommenting the sleep in main() results in a clean shutdown.
Alternatively if you remove the static keyword from the ThreadTest local variable, the destructor call is moved up to the end of the main() function (instead of in the process cleanup code) which then blocks until the second thread has exited - avoiding the deadlock situation.
Or you could add a function to ThreadTest that calls join() and call that function at the end of main() - again avoiding the deadlock situation.
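The last suggestion, an explicit join function called at the end of main(), might be sketched like this. The non-template ThreadTest, the Join and Joinable members, and the instance() accessor are simplifications assumed for the example.

```cpp
#include <chrono>
#include <thread>

class ThreadTest {
public:
    ThreadTest()
        : thread_([] { std::this_thread::sleep_for(std::chrono::milliseconds(10)); }) {}

    // Meant to be called explicitly at the end of main(), before process
    // cleanup code starts running.
    void Join() {
        if (thread_.joinable()) thread_.join();
    }

    bool Joinable() const { return thread_.joinable(); }

    // If Join() has already been called, there is nothing left to wait for here.
    ~ThreadTest() { Join(); }

private:
    std::thread thread_;
};

// Static-lifetime instance, as in the question.
ThreadTest& instance() {
    static ThreadTest test;
    return test;
}
```

main() would end with instance().Join(); so that the destructor, which runs later during static destruction, no longer has to block on the thread.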
I realize this is an old question regarding VS2012, but the bug is still present in VS2013. For those who are stuck on VS2013, perhaps due to Microsoft's refusal to provide upgrade pricing for VS2015, I offer the following analysis and workaround.
The problem is that the mutex (at_thread_exit_mutex) used by _Cnd_do_broadcast_at_thread_exit() is either not yet initialized, or has already been destroyed, depending on the exact circumstances. In the former case, _Cnd_do_broadcast_at_thread_exit() tries to initialize the mutex during shutdown, causing a deadlock. In the latter case, where the mutex has already been destroyed via the atexit stack, the program will crash on the way out.
The solution I found is to explicitly call _Cnd_do_broadcast_at_thread_exit() (which thankfully is declared publicly) early during program startup. This has the effect of creating the mutex before anyone else tries to access it, as well as ensuring that the mutex continues to exist until the last possible moment.
So, to fix the problem, insert the following code at the bottom of a source module, for instance somewhere below main().
#pragma warning(disable:4073) // initializers put in library initialization area
#pragma init_seg(lib)

#if _MSC_VER < 1900
struct VS2013_threading_fix
{
    VS2013_threading_fix() { _Cnd_do_broadcast_at_thread_exit(); }
} threading_fix;
#endif
I believe your threads have already been terminated and their resources freed following the termination of your main function and before static destruction. This is the behavior of the VC runtimes dating back to at least VC6.
Do child threads exit when the parent thread terminates
boost thread and process cleanup on windows
My answer is very late, but I hope it will help someone.
I was stuck on this bug, and I found a trick that solves the problem. It worked in my code.
int main()
{
    ThreadTest<std::thread> trick_obj; //trick... You can put this line of code anywhere
    static ThreadTest<std::thread> std_test;
    return 1;
}
I have been battling this bug for a day, and found the following workaround, which turned out to be the least dirty trick:
Instead of returning, one can use the standard Windows API function ExitThread() to terminate the thread. This method may of course mess up the internal state of the std::thread object and the associated library, but since the program is going to terminate anyway, so be it.
#include <windows.h>

template<typename T>
class ThreadTest {
 public:
  ThreadTest() : thread_([] { SleepFor(10); ExitThread(NULL); }) {}
  ~ThreadTest() {
    std::cout << "About to join\t" << id() << '\n';
    thread_.join();
    std::cout << "Joined\t\t" << id() << '\n';
  }
 private:
  std::string id() const { return typeid(decltype(thread_)).name(); }
  T thread_;
};
The join() call apparently works correctly. However, I chose to use a safer method in our solution. One can get the thread HANDLE via std::thread::native_handle(). With this handle we can call the Windows API directly to join the thread:
WaitForSingleObject(thread_.native_handle(), INFINITE);
Thereafter, the std::thread object must not be destroyed, as the destructor would try to join the thread a second time. So we just leave the std::thread object dangling at program exit.

wxwidgets - exit the thread the right way

I am running an OpenCL/OpenGL program that uses wxWidgets as its GUI environment.
Inside an object of a class derived from wxThread, I perform some complicated calculations and build many OpenCL programs.
I want to delete the thread, but it is not deleted immediately: it continues building programs and only exits after it has finished all the compilations.
I know that I can use wxThread::Kill() to exit the thread, but it causes memory problems, so it's not really an option.
I have a myFrame class with a pCanvas pointer, which points to an object derived from wxCanvas.
The *pCanvas object includes myThread (which runs the complicated calculation).
void myFrame::onExit(wxCommandEvent& WXUNUSED(event))
{
    if(_pCanvas != NULL )
    {
        wxCriticalSectionLocker enter(_smokeThreadCS);
        // smoke thread still exists
        if (_pCanvas->getThread() != NULL)
        {
            //_pCanvas->getSmokeThread()->Delete(); <-waits until thread ends and after it application terminates
            _pCanvas->getSmokeThread()->Kill(); // <- immediately makes the application not responding
        }
    }
    // exit from the critical section to give the thread
    // the possibility to enter its destructor
    // (which is guarded with m_pThreadCS critical section!)
    while (true)
    {
        { // was the ~MyThread() function executed?
            wxCriticalSectionLocker enter(_smokeThreadCS);
            if (!_pCanvas->getSmokeThread()) break;
        }
        // wait for thread completion
    }
    // Close the main frame, this ends the application run:
}
Killing a thread like that is indeed very bad. It's best to give the thread a chance to clean up.
Graceful thread termination is usually done by periodically checking a flag that tells it to exit:
volatile bool continue_processing = true;
thread thread;

void compile_thread() {
    while (continue_processing) {
        // compile one OpenCL program.
    }
}

void terminate() {
    continue_processing = false;
    thread.join(); // wait for thread to exit itself.
}
Depending on your CPU and compiler, simply marking continue_processing as volatile might not be enough to make the change happen immediately and visible to the other thread, so barriers are used.
You'll have to consult your compiler's documentation to see how to create a barrier... they're different in each one. VC++ uses _ReadWriteBarrier() and _WriteBarrier().
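With a C++11 or later compiler, std::atomic<bool> is a portable alternative to volatile plus compiler-specific barriers: a store to an atomic flag is guaranteed to become visible to a thread that loads it. The sketch below assumes std::thread rather than wxThread, and the class ProgramBuilder with its compiled() counter is hypothetical.

```cpp
#include <atomic>
#include <thread>

// Hypothetical owner of the compile thread.
class ProgramBuilder {
public:
    ~ProgramBuilder() { terminate(); }

    void start() {
        worker_ = std::thread([this] { compile_thread(); });
    }

    // Asks the thread to stop, then waits for it to leave its loop.
    void terminate() {
        continue_processing_.store(false);
        if (worker_.joinable()) worker_.join();
    }

    int compiled() const { return compiled_.load(); }

private:
    void compile_thread() {
        // The flag is checked between units of work, so the thread stops
        // after finishing the program it is currently compiling.
        while (continue_processing_.load()) {
            ++compiled_;  // stands in for compiling one OpenCL program
            std::this_thread::yield();
        }
    }

    std::atomic<bool> continue_processing_{true};
    std::atomic<int> compiled_{0};
    std::thread worker_;
};
```

The thread still only reacts between units of work, so the shutdown delay is bounded by the time needed to compile a single program.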
If it is a non-joinable (detached) thread, it will terminate by itself and clean up.
I found this link which I think will help a lot!