Sorry, the title is clickbait... It's not as easy to solve as you think; this one is a real challenge.
I am having a very weird issue where a thread that is joinable() fails to join().
The error I get is "No such process".
This is not a typical beginner's mistake of joining threads twice...
It is a complex issue and may even be caused by memory corruption... but I am hoping that I am simply missing something and just need a fresh external view. I have been working on this issue for two days.
I am compiling for both Linux and Windows.
On Linux (using gcc 9.1.0) it works flawlessly every time.
On Windows (using x86_64-w64-mingw32-g++ 9.2.0 from my linux machine and running the program on my windows machine) I always get the error.
Here's what I can confirm WITH 100% CERTAINTY:
The thread was NOT already joined. There is only one call to join() for that thread, and that call fails.
The thread is NOT default-constructed (it is a raw pointer assigned with new).
Threads work in general (join() works fine on other threads).
Calling detach() instead of join() causes the same error.
Not calling that join() (and sleeping for a second instead) "fixes" the issue.
The parent thread (the one creating the problematic thread) is the same as the one calling join().
Whether we compile in Debug (-ggdb -g -O0) or Release (-O3) does not change the outcome (Linux always works, Windows always fails).
The failing thread is created through a lambda function which is perfectly forwarded from another lambda function.
That very last point may very well be the source of the issue, though I really don't see how.
I also know that the object containing the thread pointer is not destroyed before the join().
The only place where I delete this pointer is right after the join() if successful.
The parent object is wrapped in a shared_ptr.
The pointer to that thread is also never used/shared elsewhere.
The code is very difficult to simplify and share here since it is part of a complete networking system and all aspects of it may be the source of the issue.
Oh, and the actual thread is correctly executed and all resulting network communications work as they should even though the thread cannot be joined.
Here's a very simplified version of the important parts, with comments explaining what happens:
// We instantiate a new ListeningServer then call Start(),
// then we connect a client to it, we transfer some data,
// then we call Stop() on the ListeningServer and we get the error, but everything worked flawlessly still
typedef std::function<void(std::shared_ptr<ListeningSocket>)> Func;
class ListeningServer {
ListeningSocket listeningSocket; // The class' Constructor initializes it correctly
void Start(uint16_t port) {
listeningSocket.Bind(port);
listeningSocket.StartListeningThread([this](std::shared_ptr<ListeningSocket> socket) {
HandleNewConnection(socket);
});
}
void HandleNewConnection(std::shared_ptr<ListeningSocket> socket) {
// Whatever we are doing here works flawlessly and does not change the outcome of the error
}
void Stop() {
listeningSocket.Disconnect();
}
};
class ListeningSocket {
SOCKET socket = INVALID_SOCKET; // Native winsock fd handle for windows or typedefed to int on linux
std::thread* listeningThread = nullptr;
std::atomic<bool> listening = false;
void StartListeningThread(Func&& newSocketCallback) {
listening = (::listen(socket, SOMAXCONN) >= 0);
if (!listening) return; // That does not happen, we're still good
listeningThread = new std::thread([this](Func newSocketCallback){ // receives the forwarded callback below, so the parameter type must be Func
while (IsListening()) {
// Here I have omitted a ::poll call with a 10 ms timeout, used as an interval so that the thread does not block; the issue happens with or without it
memset(&incomingAddr, 0, sizeof(incomingAddr));
SOCKET clientSocket = ::accept(socket, (struct sockaddr*)&incomingAddr, &addrLen);
if (IsListening() && IsValid(clientSocket)) {
newSocketCallback(std::make_shared<ClientSocket>(clientSocket, incomingAddr)); // ClientSocket is a wrapper to native SOCKET with addr info and stuff...
}
}
LOG("ListeningThread Finished") // This is correctly logged just before the error
}, std::forward<Func>(newSocketCallback));
LOG("Listening with Thread " << listeningThread->get_id()) // This is correctly logged to the same thread id that we want to join() after
}
INLINE void Disconnect() {
listening = false; // will make IsListening() return false
if (listeningThread) {
if (listeningThread->joinable()) {
LOG("*** Socket Before join thread " << listeningThread->get_id()) // Logs the correct thread id
try {
listeningThread->join();
delete listeningThread;
listeningThread = nullptr;
LOG("*** Socket After join thread") // NEVER LOGGED
} catch(...) {
LOG("JOIN ERROR") // it ALWAYS goes here with "No Such Process"
SLEEP(100ms) // We need to make sure the thread still finishes in time
// The thread finishes in time and all resulting actions work flawlessly
}
}
}
#ifdef _WINDOWS
::closesocket(socket);
#else
::close(socket);
#endif
socket = INVALID_SOCKET;
}
};
Another important thing to note is that elsewhere in the program I directly instantiate a ListeningSocket and call StartListeningThread() with a lambda, and that one does not fail to join the thread after calling Disconnect() directly.
Also, part of this code is compiled in a shared library that is linked dynamically.
Issue solved!
It would seem that, on Windows only, one cannot create a thread from code compiled into a shared library and then join it from code compiled into the main application.
Basically, joinable() returns true, but join() or detach() fails.
All I had to do was make sure the thread is created and joined by code compiled into the same file.
That was the kind of hint I was looking for when I asked the question: I knew the problem was more complicated than it looked, and that a simplified minimal example would not reproduce it.
This constraint on threads in Windows is not documented anywhere that I know of (and I SEARCHED).
So it is quite plausible that it is not supposed to be a constraint and is actually a bug in the compiler I am using.
Related
I'm creating a logging object which performs the real file writing work on a separate std::thread, and offers an interface to a log command buffer, syncing the caller threads and the one worker thread. Access to the buffer is protected by a mutex, there's an atomic bool for the worker thread exit condition, and I'm using Windows native Events as a signal to wake up the worker thread when new commands arrive. The object's constructor spawns the worker thread so it is immediately available. The worker thread is simply a while loop checking the exit condition, with a blocking wait for the signal inside the loop. The object's destructor just sets the exit condition, signals the thread to wake up and joins it to ensure it's down before the object is fully destroyed.
Seems simple enough, and when such an object is used somewhere in a function it works nicely. However, when such an object is declared as a global variable so that everyone can use it, it stops working. I'm on Windows, using Visual Studio 2017 with the 2015 toolchain. My project is a DLL plugin for another application.
The things I tried so far:
Start the thread in the constructor of the global object. This however makes the main thread hang immediately when my DLL is loaded. Pausing the app in the debugger reveals we're in the std lib, at a point where the main thread should have launched the worker thread and is now stuck waiting for a condition variable, presumably one that is signaled by the worker thread once it is launched?
Delay-construct the thread on demand when we first use the global object from somewhere else. This way constructing it goes nicely without a hang. However, when signalling the worker thread to exit from the destructor, the signal is sent, but the join on the worker thread now hangs. Pausing the app in the debugger reveals our main thread is the only one still alive, and the worker thread is already gone? A breakpoint placed in the worker thread function right before the close brace reveals it is never hit; the thread must be getting killed?
I also tried to start the thread via a std::future, starting it up async, and that one launches perfectly fine from the constructor in global objects. However, when the future tries to join the thread in the destructor, it hangs as well; here again no worker thread to be detected anymore while no breakpoint gets hit in it.
What could be going on? I can't imagine it's because the thread construction and destruction takes place outside main() so to speak; these std primitives should really be available at such moments, right? Or is this Windows specific and is the code running in the context of DllMain's DLL_PROCESS_ATTACH / DLL_THREAD_ATTACH events, where starting up threads might wreak havoc due to thread local storage not yet being up and running or such? (would it?)
EDIT -- added code sample
The following is an abbreviation/simplification of my code; it probably doesn't even compile but it gets the point across I hope :)
class LogWriter {
public:
LogWriter() :
m_mayLive(true) {
m_writerThread = std::thread(&LogWriter::HandleLogWrites, this); // or in initializer list above, same result
};
~LogWriter() {
m_mayLive = false;
m_doSomething.signal();
if (m_writerThread.joinable()) {
m_writerThread.join();
}
};
void AddToLog(const std::string& line) { // multithreaded client facing interface
{
C_Locker locker; // C_Locker = own RAII locker class
Lock(locker); // using a mutex here behind the scenes
m_outstandingLines.push_back(line);
}
m_doSomething.signal();
}
private:
std::list<std::string> m_outstandingLines; // buffer between worker thread and the rest of the world
std::atomic<bool> m_mayLive; // worker thread exit signal
juce::WaitableEvent m_doSomething; // signal to wake up worker thread; no std -- we're using other libs as well
std::thread m_writerThread;
int HandleLogWrites() {
do {
m_doSomething.wait(); // wait for input; no busy loop please
C_Locker locker; // access our line buffer; auto-released at end of loop iteration
Lock(locker);
while (!m_outstandingLines.empty()) {
WriteLineToLog(m_outstandingLines.front());
m_outstandingLines.pop_front();
if (!m_outstandingLines.empty()) {
locker.Unlock(); // don't hog; give caller threads some room to add lines to the buffer in between
std::this_thread::sleep_for(std::chrono::milliseconds(10));
Lock(locker);
}
};
} while (m_mayLive); // atomic bool; no need to mutex it
WriteLineToLog("LogWriter shut down"); // doesn't show in the logs; breakpoints here also aren't being hit
return 0;
}
void WriteLineToLog(const std::string& line) {
... fopen, fprintf the line, flush, close ...
}
void Lock(C_Locker& locker) {
static LocalLock lock; // LocalLock is similar to std::mutex, though we're using other libs here
locker.Lock(&lock);
}
};
class Logger {
public:
Logger();
~Logger();
void operator() (const char* text, ...) { // behave like printf
std::string newLine;
... vsnprintf -> std::string ...
m_writer.AddToLog(newLine);
}
private:
LogWriter m_writer;
};
extern Logger g_logger; // so everyone can use g_logger("x = %d\n", x);
// no need to make it a Meyers singleton; we have no other global objects interfering
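For comparison, the worker-thread design described in the question can be written with standard primitives only. The following is a minimal sketch, not the poster's code. `StdLogWriter` is a hypothetical name, `std::condition_variable` stands in for `juce::WaitableEvent`, and writing a line is simulated by appending to a caller-supplied vector. This sketch does not remove the DllMain problem described in the answer below: constructing it as a global inside a DLL would still create a thread while the loader lock is held.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical std-only stand-in for LogWriter: std::mutex and
// std::condition_variable replace the Locker classes and the waitable event,
// and "writing a line" appends to a caller-supplied vector (the sink).
class StdLogWriter {
public:
    explicit StdLogWriter(std::vector<std::string>& sink)
        : m_sink(sink), m_worker(&StdLogWriter::HandleLogWrites, this) {}

    ~StdLogWriter() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_mayLive = false;                 // set under the lock so the wake-up cannot be missed
        }
        m_doSomething.notify_one();
        if (m_worker.joinable()) m_worker.join();
    }

    void AddToLog(const std::string& line) {   // callable from any thread
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_outstandingLines.push_back(line);
        }
        m_doSomething.notify_one();
    }

private:
    void HandleLogWrites() {
        std::unique_lock<std::mutex> lock(m_mutex);
        for (;;) {
            // Sleep until there is work or we are asked to stop (no busy loop).
            m_doSomething.wait(lock, [this] { return !m_outstandingLines.empty() || !m_mayLive; });
            while (!m_outstandingLines.empty()) {
                std::string line = std::move(m_outstandingLines.front());
                m_outstandingLines.pop_front();
                lock.unlock();                 // don't hold the lock while doing "I/O"
                m_sink.push_back(line);        // stand-in for WriteLineToLog(line)
                lock.lock();
            }
            if (!m_mayLive) return;            // buffer drained, shut down
        }
    }

    std::vector<std::string>& m_sink;          // only touched by the worker thread
    std::mutex m_mutex;
    std::condition_variable m_doSomething;
    std::deque<std::string> m_outstandingLines;
    bool m_mayLive = true;                     // protected by m_mutex
    std::thread m_worker;                      // declared last: starts after the members above exist
};
```

The exit flag is set while holding the same mutex the worker waits on. That ordering guarantees the final notification cannot slip in between the worker's check and its wait.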
Since you're writing a DLL in C++, you have to understand how "globals" in DLLs work. The compiler puts their initialization in DllMain, before anything else you would do there. But there are strict rules about what you can do in DllMain, because it runs under the loader lock. The short summary is that you can't call anything in another DLL, because that DLL cannot be loaded while your DllMain is running. Calling CreateThread is definitely not allowed, not even when it is wrapped inside the std::thread constructor.
The problem with the destructor is quite possibly because your DLL has exited (can't tell without code). The DLL unloads before the EXE, and their respective globals are also cleaned up in that order. Any attempt to log from a destructor in an EXE will fail for obvious reasons.
There is no simple solution here. Andrei Alexandrescu's "Modern C++ Design" has a reasonable solution for logging in the non-DLL case, but you'll need to harden it for use in a DLL. An alternative is to check in your logging functions whether your logger still exists. You can use a named mutex for that: if your log function fails in OpenMutex, then either the logger does not exist yet or it no longer exists.
I think I've encountered that destruction issue with DLLs used from Unity.
The only solution I found back then was to essentially give up true global variables that would need cleanup.
Instead, I put them in a separate class that is instantiated exactly once, into a global pointer, by a custom launch function. My DLL then got a "quit()" function that the user of the DLL also calls; the quit function correctly destroys the instance holding the global variables.
It's probably not the smoothest solution, and you pay a pointer indirection on every access to the global variables, but it turned out to be convenient for serializing the state of the global variables as well.
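A minimal sketch of that launch/quit pattern follows. The names `PluginState`, `PluginInit`, `PluginQuit` and `State` are hypothetical, and a real Windows DLL would also mark the two entry points with `__declspec(dllexport)`. The only true global is a raw pointer. Its constant initialization to null means no constructor or destructor runs for it while the DLL is loaded or unloaded, so heavy cleanup such as joining threads happens in `PluginQuit` on the caller's thread.

```cpp
#include <string>
#include <vector>

// Hypothetical holder for what used to be separate global variables.
struct PluginState {
    std::vector<std::string> log;
    int counter = 0;
};

// The only real global: a raw pointer, constant-initialized to null, so
// nothing non-trivial runs for it during DLL load or unload.
static PluginState* g_state = nullptr;

// Hypothetical launch function, called explicitly by the DLL's user.
extern "C" bool PluginInit() {
    if (g_state != nullptr) return false;   // already initialized
    g_state = new PluginState();
    return true;
}

// Hypothetical quit function: all cleanup (joining threads, closing files)
// runs here instead of in DllMain or during static destruction.
extern "C" void PluginQuit() {
    delete g_state;
    g_state = nullptr;
}

// Accessor used throughout the DLL; this is the extra pointer indirection.
PluginState& State() { return *g_state; }
```

If the user forgets to call `PluginQuit`, the state simply leaks at process exit. The answer's approach trades that risk for never running destructors under the loader lock.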
I heard that "a modern operating system will clean up all threads created by the process on closing it", but when I return from main(), I get these errors:
1) This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
2) terminate called without an active exception
My implementation looks like this (written just for this example, so sorry for the rough implementation):
#include <chrono>
#include <thread>
void process(int id)
{
while(true) { std::this_thread::sleep_for(std::chrono::milliseconds(1)); }
}
int main()
{
std::thread thr1(process, 0);
std::thread thr2(process, 1);
//thr1.detach();
//thr2.detach();
return 0;
}
If I uncomment the detach() calls, there is no problem, but my processing threads will be socket readers/writers and they will run forever (until main returns). So how should I deal with this? What's wrong?
EDIT: Namely, I can't detach() every thread one by one, because they will not terminate normally (they run until the end). Also, if I close my program with the console (DOS) window's X button, my simple solution doesn't work: the detach() calls are skipped because the program is force-terminated, and I get the error again :)
What happens in an application is not related to what the OS may do.
If a std::thread object is destroyed while it is still joinable, the application calls std::terminate, and that's what is showing up: http://en.cppreference.com/w/cpp/thread/thread/~thread
With C++11 threads, either you detach them if you don't care when they complete, or you do care and must join them before the thread object is destroyed.
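As a minimal sketch of the "join" option for the program in the question: the infinite loops check a shared `std::atomic<bool>` flag. `main`'s work is moved into a hypothetical `run_and_shut_down()` function, which clears the flag and joins both threads before they go out of scope. The names `g_running` and `g_finished` are illustrative. The sketch assumes each worker can poll the flag regularly. A thread blocked indefinitely in `recv()` or `accept()` would not see it, which is why the first question in this collection uses `::poll` with a 10 ms timeout.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> g_running{true};  // shared stop flag polled by the workers
std::atomic<int> g_finished{0};     // only used to show that both workers returned

void process(int id) {
    (void)id;
    while (g_running) {             // was while(true)
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    ++g_finished;
}

void run_and_shut_down() {
    std::thread thr1(process, 0);
    std::thread thr2(process, 1);
    std::this_thread::sleep_for(std::chrono::milliseconds(20));  // the "real work"
    g_running = false;  // ask both workers to leave their loops
    thr1.join();        // join before thr1/thr2 are destroyed, so
    thr2.join();        // ~thread() never sees a joinable thread
}
```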
I recently wrote a simple tcp server using winsock using an online guide. I then tried to multithread it without the help of a guide. After a little struggle I ended up succeeding, but only by detaching the threads.
I have an infinite loop in which, whenever accept() returns a SOCKET, I create a Handler and call handle() with the SOCKET returned by accept().
This is the handle() function which takes the socket from the accept() call and creates the thread which calls processData:
void Handler::handle(SOCKET socket)
{
std::thread handlerThread([socket]{
processData(socket);
});
}
Here is the actual processData function, which is a static function in Handler:
void Handler::processData(SOCKET socket)
{
try
{
const int buffLength = 512;
char recvBuff[buffLength];
int recvResult = recv(socket, recvBuff, buffLength, 0);
if(recvResult > 0)
{
std::cout.write(recvBuff, recvResult) << std::endl; // recv() does not null-terminate, so print exactly recvResult bytes
}
closesocket(socket);
}
catch(std::exception& e)
{
std::cerr << e.what() << std::endl;
}
}
This code would abort() on the recv() call with the code R6010 somehow escaping the try-catch. It wasn't until I changed the handle function to this:
void Handler::handle(SOCKET socket)
{
std::thread handlerThread([socket]{
processData(socket);
});
handlerThread.detach();
}
that it was able to get past the recv() call.
If anyone could explain why detaching the thread had an effect on recv() and knows if there is a more desired design pattern where you do not have to detach the worker threads, I would be very thankful if you would share it with me.
If that is too specific, maybe give your opinion on when it is OK to detach a thread.
From the Spec:
30.3.1.3 thread destructor [thread.thread.destr] ~thread();
If joinable(), calls std::terminate(). Otherwise, has no effects.
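In your first case, the destructor is called when handle() returns, because your thread object is on the stack. Since the underlying thread is still running (blocked in recv()), std::terminate() gets called, which eventually leads to the abort() call. This is also why your try/catch doesn't help: std::terminate() is not an exception, and it is called from the thread running handle(), not from inside processData().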
When you detach the thread you can destroy the std::thread object because it is no longer joinable.
I personally try to avoid detached threads, so in your case I would prefer either a thread pool or keeping track of your thread objects, e.g. by storing them in a vector.
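Here is a minimal sketch of the "store them in a vector" option. It is a hypothetical variant of the question's `Handler`, not a drop-in replacement. `int` stands in for `SOCKET`, and `processData` only counts calls instead of calling `recv()`/`closesocket()`. The destructor joins every thread, so no `std::thread` is ever destroyed while joinable.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical Handler that owns its worker threads instead of detaching them.
class Handler {
public:
    ~Handler() { JoinAll(); }   // every thread is joined before the object goes away

    void handle(int socket) {   // int stands in for SOCKET in this sketch
        std::lock_guard<std::mutex> lock(m_mutex);
        m_threads.emplace_back([this, socket] { processData(socket); });
    }

    void JoinAll() {
        std::vector<std::thread> threads;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            threads.swap(m_threads);  // take ownership, then join without holding the lock
        }
        for (std::thread& t : threads) t.join();
    }

    int Processed() const { return m_processed.load(); }

private:
    void processData(int socket) {
        (void)socket;  // the real code would recv() and closesocket() here
        ++m_processed;
    }

    std::mutex m_mutex;
    std::vector<std::thread> m_threads;
    std::atomic<int> m_processed{0};
};
```

Finished threads stay in the vector until `JoinAll()` runs. A long-running server would therefore need to reap them periodically, which is one reason the answer also suggests a thread pool.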
As @mkaes already answered, as long as the thread is not detached it is tied to its std::thread object, so destroying that object while the thread is still running calls std::terminate() and takes down the whole program.
As far as threads are concerned, I either work with temporary threads or with static threads. Typically, on servers, static threads should not be stopped, so they can be detached. For temporary threads, you either want some result, so you have to join them at some point or you don't, so they can be detached.
The problem with threads in general is ending your application gracefully. This is even harder with detached threads, because even if you signal them to stop, you need an extra mechanism to find out whether all threads have ended (and even then you're never 100% sure), and that seems to be the main reason to avoid detached threads.
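One possible form of that "extra mechanism" is sketched below, using a hypothetical `DetachedTracker` class: a counter of running detached threads plus a condition variable that shutdown code can wait on. It illustrates the caveat in the answer. A successful wait only means each thread's work has finished. The OS thread may still be unwinding, and the tracker itself must outlive every launched thread, for example as a long-lived object.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical bookkeeping for detached threads.
class DetachedTracker {
public:
    template <class F>
    void Launch(F work) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            ++m_active;                        // count the thread before it starts
        }
        std::thread([this, work]() mutable {
            work();
            std::lock_guard<std::mutex> lock(m_mutex);
            if (--m_active == 0) m_allDone.notify_all();  // notify while still holding the lock
        }).detach();
    }

    // True if every launched thread reported completion within the timeout.
    // "True" means the work finished; the OS thread may still be exiting.
    bool WaitAll(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_mutex);
        return m_allDone.wait_for(lock, timeout, [this] { return m_active == 0; });
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_allDone;
    int m_active = 0;
};
```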
I have a program that uses Boost threads. The program has start and stop functionality. When the program is started, I create a boost thread that does some processing. When the program is stopped, I call join on this thread and delete the thread's pointer. My program starts and stops correctly the first time; however, when I try to start it a second time, an assertion inside Boost fails (when newing the processing thread) and the following is printed on my screen:
/root/src/boost.cmake/libs/thread/src/pthread/once.cpp:46: unsigned long &boost::detail::get_once_per_thread_epoch(): Assertion `!pthread_setspecific(epoch_tss_key,data)' failed.
I know that my join is working correctly because when the processing thread exits I output a message to my console. Does anyone know why this might happen?
An extra note... I have played around with my code a little bit and the methodology that I am using to clean up my boost threads appears to work in other parts of my program (for example, if I create the boost::thread in the parent class). However, it fails every time in the child class (which is an abstract class).
My start and stop methods look like this:
void ThreadMethod()
{
while(_runningThread)
{
}
}
void Start()
{
_runningThread = true;
_thread = boost::make_shared<boost::thread>(&TestChildVirtualClass::ThreadMethod, this);
};
void Stop()
{
_runningThread = false;
if( _thread ) // check the pointer before dereferencing it
{
_thread->join();
_thread.reset();
}
};
However, I am having trouble recreating this issue in a test program (although it occurs every time in my actual program).
The error could be a bug in Boost.Thread, as there are some holes in the call_once implementation (#5752 boost::call_once() is unreliable on some platforms, see https://svn.boost.org/trac/boost/ticket/5752). This of course depends on which platform you run your program on.
Of course, I may be wrong.
You should also protect the access to _runningThread.
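Below is a minimal sketch of protecting the flag. It is written with `std::thread` and `std::atomic<bool>` instead of Boost, so it is a swapped-in equivalent rather than the poster's classes, and `Worker` is a hypothetical name. It keeps the question's `Start()`/`Stop()` shape, checks the thread before joining, and can be restarted. It does not address the Boost assertion itself.

```cpp
#include <atomic>
#include <thread>

// Hypothetical std::thread version of the question's Start()/Stop() pair.
class Worker {
public:
    ~Worker() { Stop(); }

    void Start() {
        if (_thread.joinable()) return;          // already running
        _runningThread = true;
        _thread = std::thread(&Worker::ThreadMethod, this);
    }

    void Stop() {
        _runningThread = false;                  // atomic store, visible to the worker
        if (_thread.joinable()) _thread.join();  // check before joining
    }

    long Iterations() const { return _iterations.load(); }

private:
    void ThreadMethod() {
        while (_runningThread) {                 // atomic load on every iteration
            ++_iterations;
            std::this_thread::yield();
        }
    }

    std::atomic<bool> _runningThread{false};
    std::atomic<long> _iterations{0};
    std::thread _thread;                         // declared last
};
```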
For thread-local cleanup purposes, I need to create an assertion that checks whether the current thread was created via boost::thread. How can I check if this was the case? That is, how can I check whether the current thread is managed by boost::thread?
I simply need this to do a cleanup of thread local storage when the thread exits. Boost's thread_local_ptr appears to only work if the thread itself is a boost thread.
Note that I'm not doing the check at cleanup time, but sometime during the life of the thread. Some function calls one of our API/callbacks (indirectly) causing me to allocate thread-local storage. Only boost threads are allowed to do this, so I need to detect at that moment if the thread is not a boost thread.
Refer to Destruction of static class members in Thread local storage for the problem of not having a generic cleanup handler. I answered that and realized pthread_cleanup_push won't actually work: it isn't called on a clean exit from the thread.
While I don't have an answer for detecting a boost thread, the chosen answer does solve the root of my problem: Boost thread_specific_ptr's will call their cleanup in any pthread. Something else must have been preventing it from working for me, as an isolated test shows that it does work.
The premise for your question is mistaken :) boost::thread_specific_ptr works even if the thread is not a boost thread. Think about it -- how would thread specific storage for the main thread work, seeing as it's impossible for it to be created by boost? I have used boost::thread_specific_ptr from the main thread fine, and although I haven't examined boost::thread_specific_ptr's implementation, the most obvious way of implementing it would work even for non-boost threads. Most operating systems let you get a unique ID number for the current thread, which you can then use as an index into a map/array/hashtable.
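Below is a POSIX-only sketch of the mechanism the answer alludes to. A per-thread destructor registered with `pthread_key_create` runs for any thread that stored a value under the key, including threads created with plain `pthread_create`. This does not show how Boost implements `thread_specific_ptr`. It only shows that the underlying OS facility is not tied to a particular thread library. The names `g_key`, `on_thread_exit`, `raw_thread` and `run_raw_threads` are illustrative.

```cpp
#include <pthread.h>
#include <atomic>

static pthread_key_t g_key;
static std::atomic<int> g_cleanups{0};

// TSD destructor: runs when a thread that stored a non-null value under
// g_key exits, whichever library created that thread.
static void on_thread_exit(void* value) {
    delete static_cast<int*>(value);
    ++g_cleanups;
}

// Start routine for a plain pthread (no Boost involved).
static void* raw_thread(void*) {
    pthread_setspecific(g_key, new int(42));
    return nullptr;  // returning from the start routine is an implicit pthread_exit()
}

// Starts n plain pthreads one after another and returns how many TSD
// destructor calls were observed, or -1 on error.
int run_raw_threads(int n) {
    g_cleanups = 0;
    if (pthread_key_create(&g_key, on_thread_exit) != 0) return -1;
    for (int i = 0; i < n; ++i) {
        pthread_t t;
        if (pthread_create(&t, nullptr, raw_thread, nullptr) != 0) return -1;
        pthread_join(t, nullptr);  // the thread, including its TSD destructors, has finished
    }
    pthread_key_delete(g_key);
    return g_cleanups.load();
}
```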
More likely you have a different bug that prevents the behavior you're expecting to see from happening. You should open a separate question with a small compilable code sample illustrating the unexpected behavior.
You can't do this with a static assertion: That would mean you could detect it at compile time, and that's impossible.
Assuming you mean a runtime check though:
If you don't mix boost::thread with other methods, then the problem just goes away. Any libraries that are creating threads should already be dealing with their own threads automatically (or per a shutdown function the API documents that you must call).
Otherwise you can keep, for example, a container of all pthread_ts you create not using boost::thread and check if the thread is in the container when shutting down. If it's not in the container then it was created using boost::thread.
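Below is a minimal sketch of such a container, with hypothetical class names. It swaps in `std::thread::id` for the raw `pthread_t` the answer mentions, because ids compare with `==` and fit directly in standard containers. Each non-Boost thread registers itself through an RAII guard and unregisters on exit, since ids can be reused by later threads. The inverse check ("not in the container, so it is a Boost thread") only holds if every non-Boost thread registers, including the main thread.

```cpp
#include <algorithm>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical registry of threads that were NOT started through boost::thread.
class ForeignThreadRegistry {
public:
    void Add(std::thread::id id) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_ids.push_back(id);
    }
    void Remove(std::thread::id id) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_ids.erase(std::remove(m_ids.begin(), m_ids.end(), id), m_ids.end());
    }
    bool Contains(std::thread::id id) const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return std::find(m_ids.begin(), m_ids.end(), id) != m_ids.end();
    }
private:
    mutable std::mutex m_mutex;
    std::vector<std::thread::id> m_ids;
};

// Hypothetical RAII helper placed at the top of every non-Boost thread function.
class ScopedForeignThread {
public:
    explicit ScopedForeignThread(ForeignThreadRegistry& registry) : m_registry(registry) {
        m_registry.Add(std::this_thread::get_id());
    }
    ~ScopedForeignThread() { m_registry.Remove(std::this_thread::get_id()); }
    ScopedForeignThread(const ScopedForeignThread&) = delete;
    ScopedForeignThread& operator=(const ScopedForeignThread&) = delete;
private:
    ForeignThreadRegistry& m_registry;
};
```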
EDIT: Instead of trying to detect if it was created with boost::thread, have you considered setting up your application so that the API callback can only occur in threads created with boost::thread? This way you prevent the problem up front and eliminate the need for a check that, if it even exists, would be painful to implement.
Each time a boost thread ends, all of its thread-specific data gets cleaned up. Each TSD slot holds a pointer and calls delete p on it at destruction/reset.
Optionally, instead of delete p, a cleanup handler can be called for each item. That handler is passed to the thread_specific_ptr constructor, and you can use the cleanup function to do the one-time cleaning.
#include <iostream>
#include <boost/thread/thread.hpp>
#include <boost/thread/tss.hpp>
void cleanup(int* _ignored) {
std::cout << "TLS cleanup" << std::endl;
}
void thread_func() {
boost::thread_specific_ptr<int> x(cleanup);
x.reset((int*)1); // Force cleanup to be called on this thread
std::cout << "Thread begin" << std::endl;
}
int main(int argc, char** argv) {
boost::thread t(thread_func); // boost::thread::thread would name the constructor, not the type
t.join();
return 0;
}