Why does my code that interrupts a thread leak? - c++

This is a very simplified example, so please bear with me for a moment....
#include <iostream>
#include <boost/thread/thread.hpp>
struct foo {
boost::thread t;
void do_something() { std::cout << "foo\n"; }
void thread_fun(){
try {
boost::this_thread::sleep(boost::posix_time::seconds(2));
{
boost::this_thread::disable_interruption di;
do_something();
}
} catch (boost::thread_interrupted& e) {}
}
void interrupt_and_restart() {
t.interrupt();
//if (t.joinable()) t.join(); // X
t = boost::thread(&foo::thread_fun,this);
}
};
int main(){
foo f;
for (int i=0;i<1000;i++){
f.interrupt_and_restart();
boost::this_thread::sleep(boost::posix_time::seconds(3));
}
}
When I run this code on Linux and watch the memory consumption with top, I see a constant increase in the virtual memory used (and my actual code crashes at some point). The memory usage stays constant only if I join the thread after interrupting it. Why is that?

You are not joining the thread, so some of the resources needed to keep track of it stay allocated.
A thread that has not been joined still uses some system resources even after it has terminated (e.g. its thread ID is still valid).
Also, the system may impose a limit on the number of threads that exist at the same time, and non-joined threads count toward that limit.
On my Linux VM, cat /proc/sys/kernel/threads-max reports 23207 threads.
Recent versions of Boost should actually crash when you destroy a joinable thread object, while older versions happily comply with the destruction request.
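A minimal sketch of the "join before restarting" fix, written with std::thread instead of boost::thread so that it has no external dependencies. std::thread has no interrupt(), so a stop flag stands in for it here; restartable_worker, stop() and runs_completed() are hypothetical names chosen for illustration.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper: owns one worker thread and always joins the old
// thread before starting a new one, so no finished thread stays unjoined.
class restartable_worker {
public:
    ~restartable_worker() { stop(); }

    void restart() {
        stop();                          // plays the role of line X in the question
        stop_requested_ = false;
        t_ = std::thread([this] { run(); });
    }

    void stop() {
        stop_requested_ = true;          // stands in for t.interrupt()
        if (t_.joinable()) t_.join();    // lets the system reclaim the thread
    }

    int runs_completed() const { return runs_completed_.load(); }

private:
    void run() {
        // Sleep in short slices so a stop request is noticed quickly.
        for (int i = 0; i < 5 && !stop_requested_; ++i)
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        ++runs_completed_;
    }

    std::atomic<bool> stop_requested_{false};
    std::atomic<int> runs_completed_{0};
    std::thread t_;
};

// Restarts the worker n times, then stops it and reports the finished runs.
int restart_n_times(int n) {
    restartable_worker w;
    for (int i = 0; i < n; ++i) w.restart();
    w.stop();
    return w.runs_completed();
}
```

Because every thread is joined before its std::thread object is reused, the number of live thread handles never exceeds one, no matter how often restart() is called.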

Related

thread ownership

Can thread B be created in thread A?
After waiting for thread B to end, can thread A continue to run?
Short answer
Yes
Yes
There is very little conceptual difference between thread A and the main thread. Note that you could even join thread B in the main thread even though it was created from thread A.
Sample: (replace <thread> with <boost/thread.hpp> if you don't have a c++11 compiler yet)
#include <thread>
#include <iostream>
void threadB() {
std::cout << "Hello world\n";
}
void threadA() {
std::thread B(threadB);
B.join();
std::cout << "Continued to run\n";
}
int main() {
std::thread A(threadA);
A.join(); // no difference really
}
Prints
Hello world
Continued to run
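The remark that B can be joined from the main thread even though A created it can be sketched as follows. This is only an illustration under assumed names (record, thread_a, thread_b and run_handoff are made up): A starts B, hands the std::thread object back to its caller, and main does the join.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

std::mutex log_mutex;
std::vector<std::string> log_lines;

// Appends one line to the shared log under the mutex.
void record(const std::string& line) {
    std::lock_guard<std::mutex> guard(log_mutex);
    log_lines.push_back(line);
}

void thread_b() { record("B ran"); }

// Thread A starts B but hands the std::thread object to its caller
// instead of joining B itself.
void thread_a(std::thread& b_out) {
    b_out = std::thread(thread_b);
    record("A started B");
}

std::vector<std::string> run_handoff() {
    std::thread b;                          // empty, not joinable yet
    std::thread a(thread_a, std::ref(b));
    a.join();                               // after this, b owns the thread A started
    b.join();                               // joined from main, not from A
    record("main joined B");
    std::lock_guard<std::mutex> guard(log_mutex);
    return log_lines;
}
```

The order of the first two log lines depends on scheduling, but "main joined B" is always last, since it is recorded only after both joins have returned.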
If B is a child thread of A?
There are ways to synchronize threads so that they take turns. Whether they can run in parallel depends on whether they are kernel threads or user threads. Pure user threads (many of them mapped onto one kernel thread) are not aware of different processors, so they cannot run truly in parallel. If you want the threads to take turns, you can use a mutex, semaphore, or lock to synchronize them. If you want them to run in true parallel, they need to be kernel threads (which is what boost::thread and std::thread give you on mainstream platforms) or separate processes; B does not have to be a child process of A.
You can also end the child thread/process in which case the parent will be scheduled. It's often not possible to guarantee scheduling without some sort of synchronization.
void FuncA()
{
if(ScanResultsMonitorThread == NULL) {
/* start thread A */
}
}
void FunAThread()
{
while(1) {
FuncB();
}
}
void FuncB()
{
try {
boost::this_thread::sleep(boost::posix_time::seconds(25));
}
catch(const boost::thread_interrupted&) {
}
if(needRestart){
/* create thread B */
boost::thread Restart(&FuncBThread,this);
boost::this_thread::sleep(boost::posix_time::seconds(10));
/* execution never reaches this point and thread A ends. Why? */
}
else {
}
}

Actor calculation model using boost::thread

I'm trying to implement the Actor computation model on top of threads in C++ using boost::thread.
But the program throws a weird exception during execution. The exception is intermittent, and sometimes the program works correctly.
Here is my code:
actor.hpp
class Actor {
public:
typedef boost::function<int()> Job;
private:
std::queue<Job> d_jobQueue;
boost::mutex d_jobQueueMutex;
boost::condition_variable d_hasJob;
boost::atomic<bool> d_keepWorkerRunning;
boost::thread d_worker;
void workerThread();
public:
Actor();
virtual ~Actor();
void execJobAsync(const Job& job);
int execJobSync(const Job& job);
};
actor.cpp
namespace {
int executeJobSync(std::string *error,
boost::promise<int> *promise,
const Actor::Job *job)
{
int rc = (*job)();
promise->set_value(rc);
return 0;
}
}
void Actor::workerThread()
{
while (d_keepWorkerRunning) try {
Job job;
{
boost::unique_lock<boost::mutex> g(d_jobQueueMutex);
while (d_jobQueue.empty()) {
d_hasJob.wait(g);
}
job = d_jobQueue.front();
d_jobQueue.pop();
}
job();
}
catch (...) {
// Log error
}
}
void Actor::execJobAsync(const Job& job)
{
boost::mutex::scoped_lock g(d_jobQueueMutex);
d_jobQueue.push(job);
d_hasJob.notify_one();
}
int Actor::execJobSync(const Job& job)
{
std::string error;
boost::promise<int> promise;
boost::unique_future<int> future = promise.get_future();
{
boost::mutex::scoped_lock g(d_jobQueueMutex);
d_jobQueue.push(boost::bind(executeJobSync, &error, &promise, &job));
d_hasJob.notify_one();
}
int rc = future.get();
if (rc) {
ErrorUtil::setLastError(rc, error.c_str());
}
return rc;
}
Actor::Actor()
: d_keepWorkerRunning(true)
, d_worker(&Actor::workerThread, this)
{
}
Actor::~Actor()
{
d_keepWorkerRunning = false;
{
boost::mutex::scoped_lock g(d_jobQueueMutex);
d_hasJob.notify_one();
}
d_worker.join();
}
The exception that is actually thrown is boost::thread_interrupted, at the line int rc = future.get();. But from the Boost docs I can't work out the reason for this exception. The docs say:
Throws: - boost::thread_interrupted if the result associated with *this is not ready at the point of the call, and the current thread is interrupted.
But my worker thread can't be in an interrupted state.
When I run gdb with "catch throw", the backtrace looks like this:
throw thread_interrupted
boost::detail::interruption_checker::check_for_interruption
boost::detail::interruption_checker::interruption_checker
boost::condition_variable::wait
boost::detail::future_object_base::wait_internal
boost::detail::future_object_base::wait
boost::detail::future_object::get
boost::unique_future::get
I looked into the Boost sources but couldn't figure out why interruption_checker decided that the worker thread was interrupted.
So, C++ gurus, please help me. What do I need to do to get correct code?
I'm using:
boost 1_53
Linux version 2.6.18-194.32.1.el5 Red Hat 4.1.2-48
gcc 4.7
EDIT
Fixed it! Thanks to Evgeny Panasyuk and Lazin. The problem was in TLS management. boost::thread and boost::thread_specific_ptr use the same TLS storage for their own purposes. In my case there was a problem when both tried to modify this storage on creation (unfortunately I didn't work out in detail why this happens), so the TLS became corrupted.
I replaced boost::thread_specific_ptr in my code with a variable declared with the __thread specifier.
Off-topic: during debugging I found memory corruption in an external library and fixed it =)
EDIT 2
I got the exact problem... It is a bug in GCC =)
The _GLIBCXX_DEBUG compilation flag breaks ABI.
You can see discussion on boost bugtracker:
https://svn.boost.org/trac/boost/ticket/7666
I have found several bugs:
Actor::workerThread function unlocks d_jobQueueMutex twice. The first unlock is the manual d_jobQueueMutex.unlock();, the second is in the destructor of boost::unique_lock<boost::mutex>.
You should prevent one of the unlocks, for example by releasing the association between the unique_lock and the mutex:
g.release(); // <------------ PATCH
d_jobQueueMutex.unlock();
Or add an additional code block plus a default-constructed Job.
It is possible that workerThread will never leave the following loop:
while (d_jobQueue.empty()) {
d_hasJob.wait(g);
}
Imagine the following case: d_jobQueue is empty and Actor::~Actor() is called; it sets the flag and notifies the worker thread:
d_keepWorkerRunning = false;
d_hasJob.notify_one();
workerThread wakes up in the while loop, sees that the queue is empty, and goes back to sleep.
It is common practice to send a special final job to stop the worker thread:
~Actor()
{
execJobSync([this]()->int
{
d_keepWorkerRunning = false;
return 0;
});
d_worker.join();
}
In this case, d_keepWorkerRunning is not required to be atomic.
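An alternative to the final-job approach, sketched here with the standard library rather than Boost: check the stop flag in the same wait predicate as the queue, under the same mutex, so a shutdown notification cannot be slept through. job_worker, post and run_jobs are hypothetical names, and error handling for throwing jobs is left out.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical stand-in for Actor with a shutdown-aware worker loop.
class job_worker {
public:
    typedef std::function<void()> job;

    job_worker() : stopping_(false), worker_(&job_worker::run, this) {}

    ~job_worker() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            stopping_ = true;            // written under the same mutex the worker waits on
        }
        has_work_.notify_one();
        worker_.join();
    }

    void post(job j) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            jobs_.push(std::move(j));
        }
        has_work_.notify_one();
    }

private:
    void run() {
        for (;;) {
            job next;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                // Wake up for new work OR for shutdown.
                has_work_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;   // stopping and nothing left to do
                next = std::move(jobs_.front());
                jobs_.pop();
            }
            next();                          // run the job without holding the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable has_work_;
    std::queue<job> jobs_;
    bool stopping_;
    std::thread worker_;   // declared last so the other members exist before it starts
};

// Posts n jobs and returns how many had run once the worker was destroyed.
int run_jobs(int n) {
    int executed = 0;
    {
        job_worker w;
        for (int i = 0; i < n; ++i) w.post([&executed] { ++executed; });
    }   // the destructor lets the worker drain the queue, then joins it
    return executed;
}
```

In this sketch the destructor drains any queued jobs before the worker exits; dropping them instead would only require returning as soon as stopping_ is set.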
EDIT:
I have added the event queue code to your example.
You have a concurrent queue in both EventQueueImpl and Actor, but for different types. It is possible to extract the common part into a separate entity, concurrent_queue<T>, which works for any type. It would be much easier to debug and test the queue in one place than to chase bugs scattered over different classes.
So, you can try to use this concurrent_queue<T> (on Coliru).
This is just a guess. I think that some code may actually be calling boost::thread::interrupt(). You can set a breakpoint on this function and see which code is responsible. You can also test for interruption in execJobSync:
int Actor::execJobSync(const Job& job)
{
if (boost::this_thread::interruption_requested())
std::cout << "Interruption requested!" << std::endl;
std::string error;
boost::promise<int> promise;
boost::unique_future<int> future = promise.get_future();
The most suspicious code in this case is any code that holds a reference to the thread object.
It is good practice to make your boost::thread code interruption-aware anyway. It is also possible to disable interruption for some scope.
If this is not the case, you need to check code that works with thread-local storage, because the thread interruption flag is stored in TLS. Maybe some of your code overwrites it. You can check for interruption before and after such a code fragment.
Another possibility, if no code is calling boost::thread::interrupt() and you don't work with TLS, is that your memory is corrupted. This is the hardest case; try a dynamic analysis tool such as valgrind or clang's memory sanitizer.
Off-topic:
You probably need to use some concurrent queue. A std::queue behind a single mutex can become slow under heavy contention, and you may end up with poor cache performance. A good concurrent queue allows your code to enqueue and dequeue elements in parallel.
Also, an actor is not something that is supposed to execute arbitrary code. An actor's queue must receive simple messages, not functions! You're writing a job queue :) You need to take a look at an actor system like Akka or libcppa.

multithreaded program producer/consumer [boost]

I'm playing with the Boost library and C++. I want to create a multithreaded program that contains a producer, a consumer, and a stack. The producer fills the stack, and the consumer removes items (int) from the stack. Everything works (pop, push, mutex), but when I call pop/push within a thread, I don't get any effect.
I wrote this simple code:
#include "stdafx.h"
#include <stack>
#include <iostream>
#include <algorithm>
#include <boost/shared_ptr.hpp>
#include <boost/thread.hpp>
#include <boost/date_time.hpp>
#include <boost/signals2/mutex.hpp>
#include <ctime>
using namespace std;
/*
* This class represents a stack which is protected by a mutex.
* Pop and push are executed by one thread at a time.
*/
class ProtectedStack{
private :
stack<int> m_Stack;
boost::signals2::mutex m;
public :
ProtectedStack(){
}
ProtectedStack(const ProtectedStack & p){
}
void push(int x){
m.lock();
m_Stack.push(x);
m.unlock();
}
void pop(){
m.lock();
//return m_Stack.top();
if(!m_Stack.empty())
m_Stack.pop();
m.unlock();
}
int size(){
return m_Stack.size();
}
bool isEmpty(){
return m_Stack.empty();
}
int top(){
return m_Stack.top();
}
};
/*
*The producer is the class that fills the stack. It encapsulate the thread object
*/
class Producer{
public:
Producer(int number ){
//create thread here but don't start here
m_Number=number;
}
void fillStack (ProtectedStack& s ) {
int object = 3; //random value
s.push(object);
//cout<<"push object\n";
}
void produce (ProtectedStack & s){
//call fill within a thread
m_Thread = boost::thread(&Producer::fillStack,this, s);
}
private :
int m_Number;
boost::thread m_Thread;
};
/* The consumer will consume the products produced by the producer */
class Consumer {
private :
int m_Number;
boost::thread m_Thread;
public:
Consumer(int n){
m_Number = n;
}
void remove(ProtectedStack &s ) {
if(s.isEmpty()){ // if the stack is empty sleep and wait for the producer to fill the stack
//cout<<"stack is empty\n";
boost::posix_time::seconds workTime(1);
boost::this_thread::sleep(workTime);
}
else{
s.pop(); //pop it
//cout<<"pop object\n";
}
}
void consume (ProtectedStack & s){
//call remove within a thread
m_Thread = boost::thread(&Consumer::remove, this, s);
}
};
int main(int argc, char* argv[])
{
ProtectedStack s;
Producer p(0);
p.produce(s);
Producer p2(1);
p2.produce(s);
cout<<"size after production "<<s.size()<<endl;
Consumer c(0);
c.consume(s);
Consumer c2(1);
c2.consume(s);
cout<<"size after consumption "<<s.size()<<endl;
getchar();
return 0;
}
After I ran it with VC++ 2010 on Windows 7, I got:
0
0
Could you please help me understand why calling fillStack from main has an effect, but calling it from a thread does nothing?
Thank you
Your example code suffers from a couple synchronization issues as noted by others:
Missing locks on calls to some of the members of ProtectedStack.
Main thread could exit without allowing worker threads to join.
The producer and consumer do not loop as you would expect. Producers should always (when they can) be producing, and consumers should keep consuming as new elements are pushed onto the stack.
cout's on the main thread may very well be performed before the producers or consumers have had a chance to work yet.
I would recommend looking at using a condition variable for synchronization between your producers and consumers. Take a look at the producer/consumer example here: http://en.cppreference.com/w/cpp/thread/condition_variable
It is a rather new feature in the standard library as of C++11 and supported as of VS2012. Before VS2012, you would either need boost or to use Win32 calls.
Using a condition variable to tackle a producer/consumer problem is nice because it almost enforces the use of a mutex to lock shared data, and it provides a signaling mechanism that lets consumers know something is ready to be consumed, so they don't have to spin (polling is always a trade-off between the responsiveness of the consumer and the CPU time spent polling the queue). Waiting also releases the mutex and blocks as one atomic step, which prevents threads from missing a signal that there is something to consume, as explained here: https://en.wikipedia.org/wiki/Sleeping_barber_problem
To give a brief run-down of how a condition variable takes care of this...
A producer does all time-consuming work on its own thread without owning the mutex.
The producer locks the mutex, adds the item it produced to a global data structure (probably a queue of some sort), lets go of the mutex and signals a single consumer to go -- in that order.
A consumer that is waiting on the condition variable re-acquires the mutex automatically, removes the item from the queue and does some processing on it. During this time, the producer is already working on producing a new item, but it has to wait until the consumer has released the mutex before it can queue the item up.
This would have the following impact on your code:
No more need for ProtectedStack, a normal stack/queue data structure will do.
No need for boost if you are using a new enough compiler - removing build dependencies is always a nice thing.
I get the feeling that threading is rather new to you so I can only offer the advice to look at how others have solved synchronization issues as it is very difficult to wrap your mind around. Confusion about what is going on in an environment with multiple threads and shared data typically leads to issues like deadlocks down the road.
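The run-down above can be sketched roughly like this with C++11 facilities. It is a minimal illustration, not the cppreference example itself: produce_and_consume and the done flag are assumptions made for the sketch, and the "processing" is just summing the values.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Runs one producer and one consumer over a plain std::queue and returns
// the sum of everything the consumer received.
int produce_and_consume(int count) {
    std::mutex m;
    std::condition_variable ready;
    std::queue<int> items;
    bool done = false;
    int sum = 0;

    std::thread consumer([&] {
        for (;;) {
            std::unique_lock<std::mutex> lock(m);
            // wait() releases the mutex while blocked and re-acquires it on wake-up.
            ready.wait(lock, [&] { return done || !items.empty(); });
            if (items.empty()) return;      // producer finished and queue drained
            int value = items.front();
            items.pop();
            lock.unlock();                  // process outside the lock
            sum += value;
        }
    });

    std::thread producer([&] {
        for (int i = 1; i <= count; ++i) {
            {
                std::lock_guard<std::mutex> guard(m);
                items.push(i);
            }                               // release the mutex, then signal
            ready.notify_one();
        }
        {
            std::lock_guard<std::mutex> guard(m);
            done = true;
        }
        ready.notify_one();
    });

    producer.join();
    consumer.join();
    return sum;
}
```

Unlocking before the processing step keeps the producer from being blocked while the consumer works on an item.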
The major problem with your code is that your threads are not synchronized.
Remember that by default thread execution is neither ordered nor sequenced, so consumer threads can actually finish (and in your particular case do finish) before any producer thread has produced any data.
To make sure the consumers run after the producers have finished their work, call thread::join() on the producer threads; it blocks the main thread until the producers exit:
// Start producers
...
p.m_Thread.join(); // Wait p to complete
p2.m_Thread.join(); // Wait p2 to complete
// Start consumers
...
This will do the trick, but it is probably not suitable for a typical producer-consumer use case.
To make it more useful, you need to fix the consumer function.
Your consumer function doesn't actually wait for produced data: if the stack is empty, it sleeps once and exits, so it never consumes anything if no data has been produced yet.
It should look like this:
void remove(ProtectedStack &s)
{
// Place your actual exit condition here,
// e.g. count of consumed elements or some event
// raised by producers meaning no more data available etc.
// For testing/educational purpose it can be just while(true)
while(!_some_exit_condition_)
{
if(s.isEmpty())
{
// Sleeping for a whole second is too long; use milliseconds instead
boost::posix_time::milliseconds workTime(1);
boost::this_thread::sleep(workTime);
}
else
{
s.pop();
}
}
}
Another problem is wrong thread constructor usage:
m_Thread = boost::thread(&Producer::fillStack, this, s);
Quote from Boost.Thread documentation:
Thread Constructor with arguments
template <class F,class A1,class A2,...>
thread(F f,A1 a1,A2 a2,...);
Preconditions:
F and each An must be copyable or movable.
Effects:
As if thread(boost::bind(f,a1,a2,...)). Consequently, f and each an are copied into
internal storage for access by the new thread.
This means that each of your threads receives its own copy of s, and all modifications are applied to those thread-local copies rather than to s itself. It is the same as passing an object to a function by value. You need to pass s by reference instead, using boost::ref:
void produce(ProtectedStack& s)
{
m_Thread = boost::thread(&Producer::fillStack, this, boost::ref(s));
}
void consume(ProtectedStack& s)
{
m_Thread = boost::thread(&Consumer::remove, this, boost::ref(s));
}
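For readers on C++11, the same idea with std::thread and std::ref looks roughly like this (counter, bump and bump_in_thread are made-up names for the sketch). One difference from the Boost code in the question: std::thread refuses to compile the call without std::ref instead of silently working on a copy.

```cpp
#include <functional>
#include <thread>

struct counter { int value = 0; };

void bump(counter& c) { ++c.value; }

// Passes c to the thread by reference. Without std::ref this would not
// compile: std::thread hands its stored copy to bump as an rvalue, which
// cannot bind to counter&.
int bump_in_thread() {
    counter c;
    std::thread t(bump, std::ref(c));
    t.join();            // the increment is visible in c after the join
    return c.value;
}
```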
Another issue is your mutex usage, which is not the best possible.
Why do you use the mutex from the Signals2 library? Just use boost::mutex from Boost.Thread and remove the unneeded dependency on Signals2.
Use the RAII wrapper boost::lock_guard instead of direct lock/unlock calls.
As other people mentioned, you should protect all members of ProtectedStack with the lock.
Sample:
boost::mutex m;
void push(int x)
{
boost::lock_guard<boost::mutex> lock(m);
m_Stack.push(x);
}
void pop()
{
boost::lock_guard<boost::mutex> lock(m);
if(!m_Stack.empty()) m_Stack.pop();
}
int size()
{
boost::lock_guard<boost::mutex> lock(m);
return m_Stack.size();
}
bool isEmpty()
{
boost::lock_guard<boost::mutex> lock(m);
return m_Stack.empty();
}
int top()
{
boost::lock_guard<boost::mutex> lock(m);
return m_Stack.top();
}
You're not checking that the producing thread has executed before you try to consume. You're also not locking around size/empty/top... that's not safe if the container's being updated.
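Locking each member separately still leaves a gap between isEmpty(), top() and pop(): another thread can change the stack between those calls. One common way around that, sketched here with std::mutex under the hypothetical names locked_stack and try_pop, is to make "check, read, remove" a single operation under one lock.

```cpp
#include <mutex>
#include <stack>

// Hypothetical replacement for ProtectedStack with a combined pop.
class locked_stack {
public:
    void push(int x) {
        std::lock_guard<std::mutex> guard(m_);
        items_.push(x);
    }

    // Returns false if the stack was empty; otherwise stores the top
    // element in out and removes it, all under one lock.
    bool try_pop(int& out) {
        std::lock_guard<std::mutex> guard(m_);
        if (items_.empty()) return false;
        out = items_.top();
        items_.pop();
        return true;
    }

private:
    std::mutex m_;
    std::stack<int> items_;
};
```

A consumer loop can then call try_pop and only sleep (or wait on a condition variable) when it returns false.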

Is it possible to get a thread-locking mechanism in C++ with a std::atomic_flag?

Using MS Visual C++2012
A class has a member of type std::atomic_flag
class A {
public:
...
std::atomic_flag lockFlag;
A () { std::atomic_flag_clear (&lockFlag); }
};
There is an object of type A
A object;
which can be accessed by two (Boost) threads
void thr1(A* objPtr) { ... }
void thr2(A* objPtr) { ... }
The idea is to make a thread wait while the object is being accessed by the other thread.
The question is: is it possible to construct such a mechanism with an atomic_flag object? Needless to say, for the moment I want something more lightweight than a boost::mutex.
By the way, the process involved in one of the threads is a very long query to a database that returns many rows, and I only need to suspend it in a certain zone of code where the collision occurs (when processing each row); I can't wait for the entire thread to finish with join().
I've tried something like this in each thread:
void thr1 (A* objPtr) {
...
while (std::atomic_flag_test_and_set_explicit (&objPtr->lockFlag, std::memory_order_acquire)) {
boost::this_thread::sleep(boost::posix_time::millisec(100));
}
... /* Zone to protect */
std::atomic_flag_clear_explicit (&objPtr->lockFlag, std::memory_order_release);
... /* the process continues */
}
But with no success, because the second thread hangs. In fact, I don't completely understand the mechanism behind the atomic_flag_test_and_set_explicit function, nor whether it returns immediately or can block until the flag can be locked.
It is also a mystery to me how to build a lock mechanism from a function that always sets the value and returns the previous value, with no option to only read the current setting.
Any suggestions are welcome.
By the way, the process involved in one of the threads is a very long query to a database that returns many rows, and I only need to suspend it in a certain zone of code where the collision occurs (when processing each row); I can't wait for the entire thread to finish with join().
Such a zone is known as the critical section. The simplest way to work with a critical section is to lock by mutual exclusion.
The mutex solution suggested is indeed the way to go, unless you can prove that this is a hotspot and the lock contention is a performance problem. Lock-free programming using just atomic and intrinsics is enormously complex and cannot be recommended at this level.
Here's a simple example showing how you could do this (live on http://liveworkspace.org/code/6af945eda5132a5221db823fa6bde49a):
#include <iostream>
#include <thread>
#include <mutex>
struct A
{
std::mutex mux;
int x;
A() : x(0) {}
};
void threadf(A* data)
{
for(int i=0; i<10; ++i)
{
std::lock_guard<std::mutex> lock(data->mux);
data->x++;
}
}
int main(int argc, const char *argv[])
{
A instance;
auto t1 = std::thread(threadf, &instance);
auto t2 = std::thread(threadf, &instance);
t1.join();
t2.join();
std::cout << instance.x << std::endl;
return 0;
}
It looks like you're trying to write a spinlock. Yes, you can do that with std::atomic_flag, but you are better off using std::mutex instead. Don't use atomics unless you really know what you're doing.
To actually answer the question asked: Yes, you can use std::atomic_flag to create a thread locking object called a spinlock.
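#include <atomic>
class atomic_lock
{
public:
atomic_lock()
{
// ATOMIC_FLAG_INIT is only specified for "std::atomic_flag f = ATOMIC_FLAG_INIT;",
// so clear the flag explicitly instead of using it in the initializer list.
lock_.clear();
}
void lock()
{
// test_and_set returns the previous value: true means the lock is already held.
while ( lock_.test_and_set(std::memory_order_acquire) ) { } // Spin until the lock is acquired.
}
void unlock()
{
lock_.clear(std::memory_order_release);
}
private:
std::atomic_flag lock_;
};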
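A sketch of how such a spinlock might be used, assuming a C++11 compiler. test_and_set never blocks: it always returns immediately with the previous value, so "locking" means looping until it returns false. Because the class has lock() and unlock(), it works with std::lock_guard. spin_lock and count_with_spin_lock are names made up for this example, and the yield() call is an optional courtesy to the scheduler, not part of the technique.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

class spin_lock {
public:
    spin_lock() { flag_.clear(); }   // start in the unlocked state

    void lock() {
        // true means another thread already holds the lock, so try again.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }

    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_;
};

// Two threads increment a shared counter, each inside the critical section.
int count_with_spin_lock(int per_thread) {
    spin_lock lock;
    int counter = 0;
    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            std::lock_guard<spin_lock> guard(lock);   // unlocks on scope exit
            ++counter;
        }
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

Only the short critical section is guarded, so a long-running thread does not need to be joined or suspended as a whole.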

Why might this thread management pattern result in a deadlock?

I'm using a common base class has_threads to manage any type that should be allowed to instantiate a boost::thread.
Instances of has_threads each own a set of threads (to support waitAll and interruptAll functions, which I do not include below), and should automatically invoke removeThread when a thread terminates to maintain this set's integrity.
In my program, I have just one of these. Threads are created on an interval every 10s, and each performs a database lookup. When the lookup is complete, the thread runs to completion and removeThread should be invoked; with a mutex set, the thread object is removed from internal tracking. I can see this working properly with the output ABC.
Once in a while, though, the mechanisms collide. removeThread is executed perhaps twice concurrently. What I can't figure out is why this results in a deadlock. All thread invocations from this point never output anything other than A. [It's worth noting that I'm using thread-safe stdlib, and that the issue remains when IOStreams are not used.] Stack traces indicate that the mutex is locking these threads, but why would the lock not be eventually released by the first thread for the second, then the second for the third, and so on?
Am I missing something fundamental about how scoped_lock works? Is there anything obvious here that I've missed that could lead to a deadlock, despite (or even due to?) the use of a mutex lock?
Sorry for the poor question, but as I'm sure you're aware it's nigh-on impossible to present real testcases for bugs like this.
class has_threads {
protected:
template <typename Callable>
void createThread(Callable f, bool allowSignals)
{
boost::mutex::scoped_lock l(threads_lock);
// Create and run thread
boost::shared_ptr<boost::thread> t(new boost::thread());
// Track thread
threads.insert(t);
// Run thread (do this after inserting the thread for tracking so that we're ready for the on-exit handler)
*t = boost::thread(&has_threads::runThread<Callable>, this, f, allowSignals);
}
private:
/**
* Entrypoint function for a thread.
* Sets up the on-end handler then invokes the user-provided worker function.
*/
template <typename Callable>
void runThread(Callable f, bool allowSignals)
{
boost::this_thread::at_thread_exit(
boost::bind(
&has_threads::releaseThread,
this,
boost::this_thread::get_id()
)
);
if (!allowSignals)
blockSignalsInThisThread();
try {
f();
}
catch (boost::thread_interrupted& e) {
// Yes, we should catch this exception!
// Letting it bubble over is _potentially_ dangerous:
// http://stackoverflow.com/questions/6375121
std::cout << "Thread " << boost::this_thread::get_id() << " interrupted (and ended)." << std::endl;
}
catch (std::exception& e) {
std::cout << "Exception caught from thread " << boost::this_thread::get_id() << ": " << e.what() << std::endl;
}
catch (...) {
std::cout << "Unknown exception caught from thread " << boost::this_thread::get_id() << std::endl;
}
}
void has_threads::releaseThread(boost::thread::id thread_id)
{
std::cout << "A";
boost::mutex::scoped_lock l(threads_lock);
std::cout << "B";
for (threads_t::iterator it = threads.begin(), end = threads.end(); it != end; ++it) {
if ((*it)->get_id() != thread_id)
continue;
threads.erase(it);
break;
}
std::cout << "C";
}
void blockSignalsInThisThread()
{
sigset_t signal_set;
sigemptyset(&signal_set);
sigaddset(&signal_set, SIGINT);
sigaddset(&signal_set, SIGTERM);
sigaddset(&signal_set, SIGHUP);
sigaddset(&signal_set, SIGPIPE); // http://www.unixguide.net/network/socketfaq/2.19.shtml
pthread_sigmask(SIG_BLOCK, &signal_set, NULL);
}
typedef std::set<boost::shared_ptr<boost::thread> > threads_t;
threads_t threads;
boost::mutex threads_lock;
};
struct some_component : has_threads {
some_component() {
// set a scheduler to invoke createThread(bind(&some_work, this)) every 10s
}
void some_work() {
// usually pretty quick, but I guess sometimes it could take >= 10s
}
};
Well, a deadlock might occur if a thread locks a mutex it has already locked (unless you use a recursive mutex).
If the release part is called a second time by the same thread, as seems to happen with your code, you have a deadlock.
I have not studied your code in detail, but you probably have to redesign (simplify?) it to be sure that a lock cannot be acquired twice by the same thread. You can probably use a safeguard that checks for ownership of the lock ...
EDIT:
As said in my comment and in IronMensan's answer, one possible case is that the thread stops during creation, so the at_exit handler is called before the mutex locked in the creation part of your code has been released.
EDIT2:
Well, with a mutex and a scoped lock, I can only imagine a recursive lock, or a lock that is never released. The latter can happen if a loop runs forever, due to memory corruption for instance.
I suggest adding more logging with the thread id to check whether there is a recursive lock or something strange. Then I would check that the loop is correct. I would also check that the at_exit handler is only called once per thread ...
One more thing: check the effect of erasing a thread (and thus calling its destructor) from within the at_exit function...
my 2 cents
You may need to do something like this:
void createThread(Callable f, bool allowSignals)
{
// Create and run thread
boost::shared_ptr<boost::thread> t(new boost::thread());
{
boost::mutex::scoped_lock l(threads_lock);
// Track thread
threads.insert(t);
}
//Do not hold threads_lock while starting the new thread in case
//it completes immediately
// Run thread (do this after inserting the thread for tracking so that we're ready for the on-exit handler)
*t = boost::thread(&has_threads::runThread<Callable>, this, f, allowSignals);
}
In other words, use threads_lock exclusively to protect threads.
Update:
To expand on something in the comments with speculation about how boost::thread works, the lock patterns could look something like this:
createThread:
(createThread) obtain threads_lock
(boost::thread::operator =) obtain a boost::thread internal lock
(boost::thread::operator =) release a boost::thread internal lock
(createThread) release threads_lock
thread end handler:
(at_thread_exit) obtain a boost::thread internal lock
(releaseThread) obtain threads_lock
(releaseThread) release threads_lock
(at_thread_exit) release a boost::thread internal lock
If those two boost::thread locks are the same lock, the potential for deadlock is clear. But this is speculation because much of the boost code scares me and I try not to look at it.
createThread could/should be reworked to move step 4 up between steps one and two and eliminate the potential deadlock.
It is possible that the created thread finishes before or while the assignment operator in createThread completes. Using an event queue or some other structure might be necessary, though a simpler, if hack-ish, solution might work as well. Don't change createThread, since you have to use threads_lock to protect threads itself and the thread objects it points to. Instead, change runThread to this:
template <typename Callable>
void runThread(Callable f, bool allowSignals)
{
//SNIP setup
try {
f();
}
//SNIP catch blocks
//ensure that createThread is complete before this thread terminates
boost::mutex::scoped_lock l(threads_lock);
}