I have a program that uses boost threads. The program has start and stop functionality. When the program is started I create a boost thread that does some processing. When the program is stopped I call join on this thread and delete the thread's pointer. My program starts and stops correctly the first time; however, when I try to start my program a second time I fail an assertion inside of boost (when newing the processing thread) and the following is output on my screen
/root/src/boost.cmake/libs/thread/src/pthread/once.cpp:46: unsigned long &boost::detail::get_once_per_thread_epoch(): Assertion`!pthread_setspecific(epoch_tss_key,data)' failed.
I know that my join is working correctly because when the processing thread exits I output a message to my console. Does anyone know why this might happen?
An extra note... I have played around with my code a little bit and the methodology that I am using to clean up my boost threads appears to work in other parts of my program (for example, if I create the boost::thread in the parent class). However, it fails every time in the child class (which is an abstract class).
My start and stop methods look like this...
void ThreadMethod()
{
    while(_runningThread)
    {
    }
}
void Start()
{
    _runningThread = true;
    _thread = boost::make_shared<boost::thread>(&TestChildVirtualClass::ThreadMethod, this);
}
void Stop()
{
    _runningThread = false;
    if( _thread )
    {
        _thread->join(); // join only when a thread was actually started
        _thread.reset();
    }
}
However, I am having trouble recreating this issue in a test program (although it occurs every time in my actual program).
The error could be a bug in Boost.Thread, as there are some holes in the call_once implementation (#5752 boost::call_once() is unreliable on some platforms - see https://svn.boost.org/trac/boost/ticket/5752). This of course depends on which platform you are running your program on.
Of course, I may be wrong.
You should also protect the access to _runningThread.
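As a minimal sketch of what protecting _runningThread could look like: the flag below is a std::atomic<bool>, so the write in Stop() is guaranteed to become visible to the worker loop. Note the assumptions: std::thread stands in for boost::thread, and the class name RestartableWorker and the iterations() counter are made up for illustration. This does not address the call_once assertion itself.

```cpp
#include <atomic>
#include <chrono>
#include <memory>
#include <thread>

// Hypothetical stand-in for the class in the question (std::thread instead of boost::thread).
class RestartableWorker {
public:
    ~RestartableWorker() { Stop(); }

    void Start() {
        if (_thread) return;                  // already running
        _runningThread.store(true);
        _thread = std::make_unique<std::thread>(&RestartableWorker::ThreadMethod, this);
    }

    void Stop() {
        _runningThread.store(false);          // atomic write, seen by the worker loop
        if (_thread) {
            if (_thread->joinable()) _thread->join();
            _thread.reset();
        }
    }

    // Illustration only: lets a caller observe that the worker really ran.
    long iterations() const { return _iterations.load(); }

private:
    void ThreadMethod() {
        while (_runningThread.load()) {
            ++_iterations;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::atomic<bool> _runningThread{false};
    std::atomic<long> _iterations{0};
    std::unique_ptr<std::thread> _thread;
};
```

Calling Start() and Stop() repeatedly on the same object is the scenario from the question; with the atomic flag the second Start() sees a clean state.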
When a class is responsible for managing a thread, it is a common pattern (see for example here) to join this thread in the destructor after you have made sure that the thread will finish in time. However, as outlined in the linked thread, this is not always trivial, and getting it wrong leads to a program that never terminates. Given below is an example that reproduces such a situation:
#include <iostream>
#include <thread>
#include <chrono>
using namespace std::chrono_literals;
class Foo {
public:
    Foo() {
        mythread = std::thread([&](){
            int i = 0;
            while(running) {
                std::cout << "hi" << std::endl;
                if (i++ >= 2) {
                    // placeholder for e.g. a blocking condition variable
                    std::this_thread::sleep_for(1000h);
                }
                std::this_thread::sleep_for(500ms);
            }
        });
    }
    ~Foo() {
        running = false;
        mythread.join();
    }
private:
    std::thread mythread;
    bool running{true};
};
int main() {
    Foo bar;
    std::this_thread::sleep_for(1s);
    // enabling this line will block the termination
    //std::this_thread::sleep_for(2s);
    std::cout << "ending" << std::endl;
}
What I am searching for is a solution that forcefully terminates the program if this situation occurs. Of course, one should always strive to finish the thread properly, but such a feature would be good as a last resort for peace of mind, especially for unobserved embedded systems, where a crashing program can be restored and debugged more easily than a blocking one.
A rough solution draft would be to start a thread at the end of the main that sleeps for a few seconds and if the program has not ended after that time, std::terminate is called (and ideally a corresponding error is reported). However, we have a chicken-or-egg problem because this new thread will of course keep the program from ending in time. I would highly appreciate any ideas.
EDIT: The solution should not require modification of the Foo class itself, so that it also covers such bugs in unmodified code, e.g. in external libraries. Ideally, it would even cover threads that no class feels responsible for ending before main ends (objects with static storage duration, or even no longer referenced objects with dynamic storage duration), but that might not be possible at all without in-depth OS hacking or an external process monitor.
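One possible way around the chicken-or-egg problem, as a rough sketch only: a detached thread does not keep main from returning, so the watchdog can be detached. The helper below (ExitWatchdog is a made-up name) arms itself in its destructor, so it has to be declared after the objects whose destructors might block; locals are destroyed in reverse order, which means the watchdog starts counting just before those destructors run. The timeout handler is a parameter here only so the sketch can be exercised without killing the process; by default it calls std::terminate.

```cpp
#include <chrono>
#include <exception>
#include <functional>
#include <iostream>
#include <thread>
#include <utility>

// Hypothetical helper: declare it AFTER the objects whose destructors might block.
class ExitWatchdog {
public:
    explicit ExitWatchdog(std::chrono::milliseconds grace,
                          std::function<void()> on_timeout = [] {
                              std::cerr << "shutdown timed out, terminating" << std::endl;
                              std::terminate();
                          })
        : grace_(grace), on_timeout_(std::move(on_timeout)) {}

    ~ExitWatchdog() {
        // Copy what the thread needs: *this no longer exists when the thread wakes up.
        std::thread([grace = grace_, handler = on_timeout_] {
            std::this_thread::sleep_for(grace);
            handler();   // only reached if the process is still alive after the grace period
        }).detach();     // detached, so it cannot keep the program from ending
    }

private:
    std::chrono::milliseconds grace_;
    std::function<void()> on_timeout_;
};
```

In the example above this would mean writing something like ExitWatchdog guard(5s); on the line after Foo bar; in main. If ~Foo() returns in time, main returns and the process ends before the watchdog fires; otherwise the handler runs after the grace period. Foo itself stays unmodified.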
There are several solutions:
Investigate and fix the root problem (this is the best and correct solution)
Workarounds:
The thread can signal that it is exiting through a condition variable, and you join only after that signal has arrived. If the condition variable's wait_for returns with a timeout, kill the thread (a bad solution that brings other problems).
You can create a watchdog thread that checks a time counter. The application must reset the counter from time to time. If the watchdog thread sees that the counter has grown too large, it restarts the whole application.
Move the suspicious code out of your application into a separate process and communicate with it via IPC. In case of problems, restart that process (the best of the workarounds).
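The first workaround (signal through a condition variable, then join) could look roughly like this. It is only a sketch: TimedJoinThread and join_for are hypothetical names, and what to do after a timeout (kill, restart, report) is left to the caller. The destructor detaches instead of joining, which is an assumption made here so that the wrapper itself can never block.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical wrapper: the worker announces its exit through a condition variable,
// and the owner joins only once that announcement has arrived.
class TimedJoinThread {
public:
    explicit TimedJoinThread(std::function<void()> work)
        : state_(std::make_shared<State>()) {
        auto state = state_;  // shared, so the flag outlives this object if we detach
        thread_ = std::thread([state, work = std::move(work)] {
            work();
            {
                std::lock_guard<std::mutex> lock(state->m);
                state->done = true;
            }
            state->cv.notify_all();
        });
    }

    // Returns true if the worker finished within `timeout` and was joined.
    bool join_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(state_->m);
        if (!state_->cv.wait_for(lock, timeout, [this] { return state_->done; }))
            return false;                          // timed out: caller decides how to escalate
        lock.unlock();
        if (thread_.joinable()) thread_.join();    // quick: the worker is at its very end
        return true;
    }

    ~TimedJoinThread() {
        if (thread_.joinable()) thread_.detach();  // never block in the destructor
    }

private:
    struct State {
        std::mutex m;
        std::condition_variable cv;
        bool done = false;
    };
    std::shared_ptr<State> state_;
    std::thread thread_;
};
```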
I'm creating a logging object which performs the real file writing work on a separate std::thread, and offers an interface to a log command buffer, syncing the caller threads and the one worker thread. Access to the buffer is protected by a mutex, there's an atomic bool for the worker thread exit condition, and I'm using Windows native Events as a signal to wake up the worker thread when new commands arrive. The object's constructor spawns the worker thread so it is immediately available. The worker thread is simply a while loop checking the exit condition, with in the loop a blocking wait for the signal. The object's destructor finally just sets the exit condition, signals the thread to wake up and joins it to ensure it's down before the object is fully destroyed.
Seems simple enough, and when using such an object somewhere in a function it works nicely. However, when declaring such an object as a global variable to have it usable for everyone it stops working. I'm on Windows, using Visual Studio 2017 with the 2015 tool chain. My project is a DLL plugin for another application.
The things I tried so far:
Start the thread in the constructor of the global object. This however makes the main thread hang immediately when my DLL is loaded. Pausing the app in the debugger reveals we're in the std lib, at a point where the main thread should have launched the worker thread and is now stuck waiting for a condition variable, presumably one that is signaled by the worker thread once it is launched?
Delay-construct the thread on demand when we first use the global object from somewhere else. This way constructing it goes nicely without a hang. However, when signalling the worker thread to exit from the destructor, the signal is sent, but the join on the worker thread now hangs. Pausing the app in the debugger reveals our main thread is the only one still alive, and the worker thread is already gone? A breakpoint placed in the worker thread function right before the close brace reveals it is never hit; the thread must be getting killed?
I also tried to start the thread via a std::future, starting it up async, and that one launches perfectly fine from the constructor in global objects. However, when the future tries to join the thread in the destructor, it hangs as well; here again no worker thread to be detected anymore while no breakpoint gets hit in it.
What could be going on? I can't imagine it's because the thread construction and destruction takes place outside main() so to speak; these std primitives should really be available at such moments, right? Or is this Windows specific and is the code running in the context of DllMain's DLL_PROCESS_ATTACH / DLL_THREAD_ATTACH events, where starting up threads might wreak havoc due to thread local storage not yet being up and running or such? (would it?)
EDIT -- added code sample
The following is an abbreviation/simplification of my code; it probably doesn't even compile but it gets the point across I hope :)
class LogWriter {
public:
    LogWriter() :
        m_mayLive(true) {
        m_writerThread = std::thread(&LogWriter::HandleLogWrites, this); // or in initializer list above, same result
    };
    ~LogWriter() {
        m_mayLive = false;
        m_doSomething.signal();
        if (m_writerThread.joinable()) {
            m_writerThread.join();
        }
    };
    void AddToLog(const std::string& line) { // multithreaded client facing interface
        {
            C_Locker locker; // C_Locker = own RAII locker class
            Lock(locker); // using a mutex here behind the scenes
            m_outstandingLines.push_back(line);
        }
        m_doSomething.signal();
    }
private:
    std::list<std::string> m_outstandingLines; // buffer between worker thread and the rest of the world
    std::atomic<bool> m_mayLive; // worker thread exit signal
    juce::WaitableEvent m_doSomething; // signal to wake up worker thread; no std -- we're using other libs as well
    std::thread m_writerThread;
    int HandleLogWrites() {
        do {
            m_doSomething.wait(); // wait for input; no busy loop please
            C_Locker locker; // access our line buffer; auto-released at end of loop iteration
            Lock(locker);
            while (!m_outstandingLines.empty()) {
                WriteLineToLog(m_outstandingLines.front());
                m_outstandingLines.pop_front();
                if (!m_outstandingLines.empty()) {
                    locker.Unlock(); // don't hog; give caller threads some room to add lines to the buffer in between
                    std::this_thread::sleep_for(std::chrono::milliseconds(10));
                    Lock(locker);
                }
            };
        } while (m_mayLive); // atomic bool; no need to mutex it
        WriteLineToLog("LogWriter shut down"); // doesn't show in the logs; breakpoints here also aren't being hit
        return 0;
    }
    void WriteLineToLog(const std::string& line) {
        ... fopen, fprintf the line, flush, close ...
    }
    void Lock(C_Locker& locker) {
        static LocalLock lock; // LocalLock is similar to std::mutex, though we're using other libs here
        locker.Lock(&lock);
    }
};
class Logger {
public:
    Logger();
    ~Logger();
    void operator() (const char* text, ...) { // behave like printf
        std::string newLine;
        ... vsnprintf -> std::string ...
        m_writer.AddToLog(newLine);
    }
private:
    LogWriter m_writer;
};
extern Logger g_logger; // so everyone can use g_logger("x = %d\n", x);
// no need to make it a Meyer Singleton; we have no other global objects interfering
Since you're writing a DLL in C++, you have to understand how "globals" in DLLs work. The compiler sticks their initialization in DllMain, before anything else that you would do there. But there are strict rules about what you can do in DllMain, as it runs under the loader lock. The short summary is that you can't rely on anything in another DLL, because that DLL cannot be loaded while your DllMain is running. Creating a thread there is not safe either, not even when it is wrapped inside a std::thread constructor: the new thread cannot start running until DllMain returns and the loader lock is released, so a constructor that waits for the thread to start will hang, which matches what you describe.
The problem with the destructor is quite possibly because your DLL has exited (can't tell without code). The DLL unloads before the EXE, and their respective globals are also cleaned up in that order. Any attempt to log from a destructor in an EXE will fail for obvious reasons.
There is no simple solution here. Andrei Alexandrescu's "Modern C++ Design" has a reasonable solution for logging in the non-DLL case, but you'll need to harden that for use in a DLL. An alternative is to check in your logging functions if your logger still exists. You can use a named mutex for that. If your log function fails in OpenMutex, then either the logger does not exist yet or it no longer exists.
I think I've run into that destruction issue with DLLs built for use with Unity.
The only solution I found back then was to essentially give up true global variables that would need cleanup.
Instead I put them in a separate class which is instantiated only a single time into a global pointer by some custom launch function. Then my DLL got a "quit()" function also called by the user of the DLL. The quit function correctly destroys the instance carrying the global variables.
Probably not the smoothest solution and you have a pointer-indirection on every access to the global variables, but it turned out to be comfortable for serializing the state of the global variables as well.
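Roughly, the launch/quit approach described above could look like the sketch below. All names (PluginState, plugin_launch, plugin_quit, plugin_state) are invented for illustration, and the export declarations a real DLL would need are left out. The point is that the global is a plain pointer, so nothing is constructed or destroyed while the DLL is being loaded or unloaded.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for everything that used to be a true global.
struct PluginState {
    std::vector<std::string> log_lines;
};

namespace {
PluginState* g_state = nullptr;  // plain pointer: no constructor or destructor at load/unload
}

// Called by the user of the DLL after it has been loaded.
void plugin_launch() {
    if (!g_state) g_state = new PluginState();
}

// Called by the user of the DLL before it is unloaded.
void plugin_quit() {
    delete g_state;   // member destructors run here, at a moment the caller controls
    g_state = nullptr;
}

// The one pointer indirection on every access; throws when used outside launch/quit.
PluginState& plugin_state() {
    if (!g_state) throw std::logic_error("plugin_launch() has not been called");
    return *g_state;
}
```

An object that owns a worker thread would then live inside PluginState, so its thread is started in plugin_launch() and joined in plugin_quit() rather than during global construction or destruction.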
Sorry, the title is a click bait... It's not as easy to solve as you think... that one is a real challenge
I am having a very weird issue where a thread that is joinable() fails to join().
The error I get is No such process.
This is not a typical beginner's mistake of joining threads twice...
It is a complex issue and probably even caused by memory corruption... But I am hoping that I am simply missing something and I need a fresh external view... I have been working on this issue for two days.
I am compiling for both Linux and Windows.
On Linux (using gcc 9.1.0) it works flawlessly every time.
On Windows (using x86_64-w64-mingw32-g++ 9.2.0 from my linux machine and running the program on my windows machine) I always get the error.
Here's what I can confirm WITH 100% CERTAINTY :
Thread was NOT joined already.. Only one call to join() for that thread, and it crashes.
Thread is NOT default-constructed (and it is a raw pointer assigned with new)
Threads are working (Other Threads join() are working fine)
Calling a detach() instead of join() causes the same error
Not calling that join() (and sleep for a second instead) "fixes" the issue
The parent thread (the one creating the problematic thread) is the same as the one calling join()
Whether we are compiling in Debug (-ggdb -g -O0) or Release (-O3) does not change the outcome (Linux always works, windows always fails)
Erroneous thread is created through a lambda function which is perfectly-forwarded from another lambda function
That very last point may very well be the source of the issue, though I really don't see how.
I also know that the object containing the thread pointer is not destroyed before the join().
The only place where I delete this pointer is right after the join() if successful.
The parent object is wrapped in a shared_ptr.
The pointer to that thread is also never used/shared elsewhere.
The code is very difficult to simplify and share here since it is part of a complete networking system and all aspects of it may be the source of the issue.
Oh, and the actual thread is correctly executed and all resulting network communications work as they should even though the thread cannot be joined.
Here's a very simplified version of the important parts with comments explaining what happens :
// We instantiate a new ListeningServer then call Start(),
// then we connect a client to it, we transfer some data,
// then we call Stop() on the ListeningServer and we get the error, but everything worked flawlessly still
typedef std::function<void(std::shared_ptr<ListeningSocket>)> Func;
class ListeningServer {
    ListeningSocket listeningSocket; // The class' Constructor initializes it correctly
    void Start(uint16_t port) {
        listeningSocket.Bind(port);
        listeningSocket.StartListeningThread([this](std::shared_ptr<ListeningSocket> socket) {
            HandleNewConnection(socket);
        });
    }
    void HandleNewConnection(std::shared_ptr<ListeningSocket> socket) {
        // Whatever we are doing here works flawlessly and does not change the outcome of the error
    }
    void Stop() {
        listeningSocket.Disconnect();
    }
};
class ListeningSocket {
    SOCKET socket = INVALID_SOCKET; // Native winsock fd handle for windows or typedefed to int on linux
    std::thread* listeningThread = nullptr;
    std::atomic<bool> listening = false;
    void StartListeningThread(Func&& newSocketCallback) {
        listening = (::listen(socket, SOMAXCONN) >= 0);
        if (!listening) return; // That does not happen, we're still good
        listeningThread = new std::thread([this](Func&& newSocketCallback){
            while (IsListening()) {
                // Here I have omitted a ::poll call with a 10ms timeout as interval so that the thread does not block, the issue is happening with or without it
                memset(&incomingAddr, 0, sizeof(incomingAddr));
                SOCKET clientSocket = ::accept(socket, (struct sockaddr*)&incomingAddr, &addrLen);
                if (IsListening() && IsValid(clientSocket)) {
                    newSocketCallback(std::make_shared<ClientSocket>(clientSocket, incomingAddr)); // ClientSocket is a wrapper to native SOCKET with addr info and stuff...
                }
            }
            LOG("ListeningThread Finished") // This is correctly logged just before the error
        }, std::forward<Func>(newSocketCallback));
        LOG("Listening with Thread " << listeningThread->get_id()) // This is correctly logged to the same thread id that we want to join() after
    }
    INLINE void Disconnect() {
        listening = false; // will make IsListening() return false
        if (listeningThread) {
            if (listeningThread->joinable()) {
                LOG("*** Socket Before join thread " << listeningThread->get_id()) // Logs the correct thread id
                try {
                    listeningThread->join();
                    delete listeningThread;
                    listeningThread = nullptr;
                    LOG("*** Socket After join thread") // NEVER LOGGED
                } catch(...) {
                    LOG("JOIN ERROR") // it ALWAYS goes here with "No Such Process"
                    SLEEP(100ms) // We need to make sure the thread still finishes in time
                    // The thread finishes in time and all resulting actions work flawlessly
                }
            }
        }
        #ifdef _WINDOWS
        ::closesocket(socket);
        #else
        ::close(socket);
        #endif
        socket = INVALID_SOCKET;
    }
};
Another important thing to note is that elsewhere in the program I am directly instantiating a ListeningSocket and calling StartListeningThread() with a lambda, and that one does not fail to join the thread after calling Disconnect() directly.
Also, part of this code is compiled in a shared library that is linked dynamically.
Issue solved !
It would seem that, on Windows only, one cannot create a thread from code compiled in a shared library and try to join it from code compiled in the main application.
Basically, the joinable() will return true, but the .join() or .detach() will fail.
All I had to do is to make sure the thread is created and joined from code originally compiled in the same file.
It was the kind of hint that I was looking for when I asked the question, because I knew that it was more complicated than that and a simplified minimal code would not be able to reproduce the issue.
This constraint on threads in Windows is not documented anywhere (as far as I know, and I SEARCHED).
So it is very plausible that it's not supposed to be a constraint and is actually a bug in the compiler I am using.
I heard that "a modern operating system will clean up all threads created by the process on closing it" but when I return main(), I'm getting these errors:
1) This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
2) terminate called without an active exception
My implementation looks like this (I wrote this just now as an example, so sorry for the rough implementation):
#include <chrono>
#include <thread>
void process(int id)
{
    while(true) { std::this_thread::sleep_for(std::chrono::milliseconds(1)); }
}
int main()
{
    std::thread thr1(process, 0);
    std::thread thr2(process, 1);
    //thr1.detach();
    //thr2.detach();
    return 0;
}
If I uncomment detach();s, there is no problem but my processing threads will be socket readers/writers and they will run infinitely (until main returns). So how to deal with it? What's wrong?
EDIT: Namely, I can't detach() every thread one by one, because then they will not be terminated normally (they run until the end). Oh, and again: if I close my program with the X button of the DOS window (my simple solution does not work in this case), my detach() calls are skipped because the program is force-terminated, and the error shows up again :)
What happens in an application is not related to what the OS may do.
If a std::thread object is destroyed while it still holds a joinable thread, the application calls std::terminate, and that is what is showing up: http://en.cppreference.com/w/cpp/thread/thread/~thread
With C++11 threads, either you detach if you do not care about their completion time, or you do care and need to join before the thread object is destroyed.
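For threads that should run until main returns, the "join before the thread object is destroyed" option needs a way to ask them to stop first. A minimal sketch, assuming the loop in process() can be changed to check a flag (WorkerGroup and its members are made-up names; the loops() counter exists only to make the effect observable):

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical owner for long-running workers: it asks them to stop and joins
// them before the std::thread objects are destroyed.
class WorkerGroup {
public:
    void spawn(int id) {
        threads_.emplace_back([this, id] { process(id); });
    }

    void stop_and_join() {
        stop_.store(true);
        for (auto& t : threads_)
            if (t.joinable()) t.join();
        threads_.clear();
    }

    ~WorkerGroup() { stop_and_join(); }   // no joinable std::thread is ever destroyed

    int loops() const { return loops_.load(); }

private:
    void process(int /*id*/) {
        while (!stop_.load()) {            // instead of while(true)
            ++loops_;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::atomic<bool> stop_{false};
    std::atomic<int> loops_{0};
    std::vector<std::thread> threads_;
};
```

A WorkerGroup declared as a local in main is destroyed when main returns, so the workers are stopped and joined on a normal exit. A worker blocked in a socket call would additionally need that call to time out or be interrupted so it can reach the flag check.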
I am new to multi-threading. I am using c++ on unix.
In the code below, runSearch() takes a long time and I want to be able to kill the search as soon as "cancel == true". The function cancelSearch is called by another thread.
What is the best way to solve this problem?
Thank you.
------------------This is the existing code-------------------------
struct SearchTask : public Runnable
{
    bool cancel = false;
    void cancelSearch()
    {
        cancel = true;
    }
    void run()
    {
        cancel = false;
        runSearch();
        if (cancel == true)
        {
            return;
        }
        //...more steps.
    }
};
EDIT: To make it more clear, say runSearch() takes 10 mins to run. After 1 min, cancel==true, then I want to exit out of run() immediately rather than waiting another 9 more mins for runSearch() to complete.
You'll need to keep checking the flag throughout the search operation. Something like this:
void run()
{
    cancel = false;
    while (!cancel)
    {
        runSearch();
        //do your thread stuff...
    }
}
You have mentioned that you cannot modify runSearch(). With pthreads there's a pthread_setcancelstate() function, however I don't believe this is safe, especially with C++ code that expects RAII semantics.
Safe thread cancellation must be cooperative. The code that gets canceled must be aware of the cancellation and be able to clean up after itself. If the code is not designed to do this and is simply terminated then your program will probably exhibit undefined behavior.
For this reason C++'s std::thread does not offer any method of thread cancellation and instead the code must be written with explicit cancellation checks as other answers have shown.
Create a generic method that accepts an action / delegate. Have each step be something REALLY small and specific. Send the generic method a delegate / action for what you consider a "step". In the generic method, detect if cancel is true and return if it is. Because the steps are small, it shouldn't take long for the thread to die once it is cancelled.
That is the best advice I can give without any code of what the steps do.
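In C++ the "action / delegate" would most naturally be a std::function. A small sketch of such a generic method, with run_steps as a made-up name and the cancel flag assumed to be a std::atomic<bool>:

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <vector>

// Runs small steps in order and checks the cancel flag before each one.
// Returns how many steps were actually executed.
std::size_t run_steps(const std::vector<std::function<void()>>& steps,
                      const std::atomic<bool>& cancel) {
    std::size_t executed = 0;
    for (const auto& step : steps) {
        if (cancel.load()) break;   // stop as soon as cancellation is requested
        step();
        ++executed;
    }
    return executed;
}
```

The delay between a cancel request and the thread stopping is bounded by the longest single step, which is why the steps need to be small.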
Also note :
void run()
{
    cancel = false;
    runSearch();
    while (!cancel)
    {
        //do your thread stuff...
    }
}
Won't work because if what you are doing is not an iteration it will run the entire thread before checking for !cancel. Like I said, if you can add more details on what the steps do it would be easier to give you advice. When working with threads that you want to halt or kill, your best bet is to split your code into very small steps.
Basically you have to poll the cancel flag everywhere. There are other tricks you could use, but they are more platform-specific, like thread cancellation, or are not general enough like interrupts.
And cancel needs to be an atomic variable (as in std::atomic, or just protect it with a mutex), otherwise the compiler might just cache the value in a register and not see the update coming from another thread.
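Put together, polling an atomic flag inside the search could look like the sketch below. It assumes the search can be modified and split into chunks, which may not hold for the runSearch() in the question; CancellableSearch, chunks_done and the sleep that stands in for real work are all invented for illustration.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical variant of SearchTask: the flag is atomic and is polled inside the search loop.
struct CancellableSearch {
    std::atomic<bool> cancel{false};
    std::atomic<int> chunks_done{0};

    // Called from another thread.
    void cancelSearch() { cancel.store(true); }

    // Returns true if the search ran to completion, false if it was cancelled.
    bool runSearch(int total_chunks) {
        for (int i = 0; i < total_chunks; ++i) {
            if (cancel.load()) return false;   // poll between small units of work
            std::this_thread::sleep_for(std::chrono::milliseconds(1));  // stand-in for real work
            ++chunks_done;
        }
        return true;
    }
};
```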
The other responses are right: just because you've called a blocking function in a thread doesn't mean it magically turns into a non-blocking call. The thread may not hold up the rest of the program, but it still has to wait for the runSearch call to complete.
OK, so there are ways round this, but they're not necessarily safe to use.
You can kill a thread explicitly. On Windows you can use TerminateThread(), which kills the thread's execution. Sounds good, right? Well, except that it is very dangerous to use: unless you know exactly which resources and calls are in play in the killed thread, you may find yourself with an app that refuses to work correctly next time round. If runSearch opens a DB connection, for example, the TerminateThread call will not close it. The same applies to memory, loaded DLLs, and everything they use. It's designed for killing totally unresponsive threads so you can close a program and restart it.
Given the above, and the very strong recommendation that you not use it, the next step is to call runSearch externally: if you run your blocking call in a separate process, then the process can be killed with a lot more certainty that you won't bugger everything else up. The process dies and clears up its memory, its heap, any loaded DLLs, everything. So inside your thread, call CreateProcess and wait on the handle. You'll need some form of IPC (probably best not to use shared memory, as it can be a nuisance to reset when you kill the process) to transfer the results back to your main app. If you need to stop this process, have it call ExitProcess (or exit on Linux).
Note that these exit calls must be made from inside the process, so you'll need to run a thread inside the process alongside your blocking call. You can terminate a process externally, but again, it's dangerous: not nearly as dangerous as killing a thread, but you can still trip up occasionally (use TerminateProcess or kill for this).
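Since the question is about unix, here is a rough POSIX sketch of the separate-process idea using fork, waitpid and kill. The function name run_in_child_with_timeout is made up, the external kill corresponds to the "terminate externally" variant above, and the IPC needed to get search results back to the parent is left out entirely. Forking from a multi-threaded program has its own pitfalls, so treat this as an illustration rather than a drop-in solution.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `blocking_call` in a child process and kills the child if it has not
// finished after roughly `timeout_ms`. Returns true if the child exited on its own.
bool run_in_child_with_timeout(void (*blocking_call)(), int timeout_ms) {
    pid_t pid = fork();
    if (pid < 0) return false;             // fork failed
    if (pid == 0) {                        // child: do the work, then leave
        blocking_call();
        _exit(0);
    }
    for (int waited = 0; waited < timeout_ms; waited += 10) {
        int status = 0;
        if (waitpid(pid, &status, WNOHANG) == pid)
            return WIFEXITED(status);      // the child finished by itself
        usleep(10 * 1000);                 // poll every 10 ms
    }
    kill(pid, SIGKILL);                    // external kill; the OS reclaims the child's resources
    waitpid(pid, nullptr, 0);              // reap the child so it does not linger as a zombie
    return false;
}
```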