Detached pthreads and memory leak - c++

Can somebody please explain to me why this simple code leaks memory?
I believe that since the pthreads are created in the detached state, their resources should be released immediately after they terminate, but that's not the case.
My environment is Qt5.2.
#include <QCoreApplication>
#include <windows.h>
#include <pthread.h>
#include <stdio.h>
void *threadFunc( void *arg )
{
    printf("#");
    pthread_exit(NULL);
}
int main()
{
    pthread_t thread;
    pthread_attr_t attr;
    while(1)
    {
        printf("\nStarting threads...\n");
        for(int idx=0;idx<100;idx++)
        {
            pthread_attr_init(&attr);
            pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
            pthread_create( &thread, &attr, &threadFunc, NULL);
            pthread_attr_destroy ( &attr );
        }
        printf("\nSleeping 10 seconds...\n");
        Sleep(10000);
    }
}
UPDATE:
I discovered that if I add a slight delay of 5 milliseconds inside the for loop, the leak is WAY slower:
for(int idx=0;idx<100;idx++)
{
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_create( &thread, &attr, &threadFunc, NULL);
    pthread_attr_destroy ( &attr );
    Sleep(5); /// <--- 5 MILLISECONDS DELAY ///
}
This is freaking me out; could somebody please tell me what is happening? How can this slight delay produce such a significant change (or alter the behavior in any way)?
Any advice would be greatly appreciated.
Thanks.
UPDATE2:
This leak was observed on Windows platforms (W7 and XP); no leak was observed on Linux platforms (thanks to @MichaelGoren).

I checked the program, with slight modifications, on Windows using Cygwin, and memory consumption was steady. So it must be a Qt issue; the pthread library in Cygwin works fine without leaking.
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
void *threadFunc( void *arg )
{
    printf("#");
    pthread_exit(NULL);
}
int main()
{
    pthread_t thread;
    pthread_attr_t attr;
    int idx;
    while(1)
    {
        printf("\nStarting threads...\n");
        for(idx=0;idx<100;idx++)
        {
            pthread_attr_init(&attr);
            pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
            pthread_create( &thread, &attr, &threadFunc, NULL);
            pthread_attr_destroy ( &attr );
        }
        printf("\nSleeping 10 seconds...\n");
        //Sleep(10000);
        sleep(10);
    }
}

Compiler optimizations can decide to unroll the loop, since your for loop has a constant bound (100 here). Since there is no explicit synchronization to prevent it, a newly created, detached thread can die and have its thread ID reassigned to another new thread before its creator returns from pthread_create() due to this unrolling. The next iteration has already started before the thread was actually destroyed.
This also explains why your added delay causes fewer problems: each iteration takes longer, so the thread functions actually get to finish in more cases and the threads are really terminated most of the time.
A possible fix would be to disable compiler optimizations, or to add synchronization: at the end of the code, check whether the thread still exists, and if it does, wait for its function to finish.
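A trickier way would be to use a mutex: let the thread claim it at creation and exit while still holding it, so that the creator can lock (or try_lock) it to test whether the thread has actually finished. Note that an ordinary mutex is not released when its owner exits, so this needs a robust mutex (PTHREAD_MUTEX_ROBUST, POSIX.1-2008): once the owner has terminated, the next pthread_mutex_lock() returns EOWNERDEAD. The creator must also wait until the new thread really owns the mutex, otherwise it may lock it first. I haven't tested this approach, and it requires a pthread library that supports robust mutexes.
Concept (additionally needs <errno.h> and <semaphore.h>):
pthread_mutex_t mutex;   // robust mutex, initialised once below
sem_t owned;             // posted once the new thread holds the mutex
void *threadFunc( void *arg )
{
    pthread_mutex_lock(&mutex);   // claim the mutex for the thread's whole lifetime
    sem_post(&owned);             // tell the creator we own it now
    printf("#");
    pthread_exit(NULL);           // exit while still holding it: the mutex is marked "owner dead"
}
// one-time setup, before the loop
pthread_mutexattr_t mattr;
pthread_mutexattr_init(&mattr);
pthread_mutexattr_setrobust(&mattr, PTHREAD_MUTEX_ROBUST);
pthread_mutex_init(&mutex, &mattr);
pthread_mutexattr_destroy(&mattr);
sem_init(&owned, 0, 0);
for(int idx=0;idx<100;idx++)
{
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_create( &thread, &attr, &threadFunc, NULL);
    pthread_attr_destroy ( &attr );
    sem_wait(&owned);                              // wait until the thread owns the mutex
    if (pthread_mutex_lock(&mutex) == EOWNERDEAD)  // blocks until the owning thread has exited
        pthread_mutex_consistent(&mutex);
    pthread_mutex_unlock(&mutex);
}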

The delay can induce a large change in behavior because it gives the thread time to exit! Of course how your pthread library is implemented is also a factor here. I suspect it is using a 'free list' optimization.
If you create 1000 threads all at once, then the library allocates memory for them all before any significant number of those threads can exit.
If, as in your second code sample, you let the previous thread run and probably exit before you start a new one, your thread library can reuse that thread's allocated memory or data structures. It now knows they are no longer needed and is probably holding them in a free list, just in case someone creates a thread again, so that it can recycle the memory efficiently.
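To make the free-list idea concrete, here is a minimal sketch of how a thread library might recycle per-thread bookkeeping. It is purely illustrative: tcb_t, tcb_alloc() and tcb_release() are hypothetical names, not the actual internals of pthreads-win32 or any other library.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread control block; a real library keeps much more here. */
typedef struct tcb {
    struct tcb *next;       /* used only while the block is on the free list */
    char        data[256];  /* stand-in for the real bookkeeping */
} tcb_t;

static tcb_t *free_list = NULL;
static pthread_mutex_t free_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Reuse a block left behind by a thread that already exited, else allocate. */
tcb_t *tcb_alloc(void)
{
    pthread_mutex_lock(&free_list_lock);
    tcb_t *t = free_list;
    if (t != NULL)
        free_list = t->next;
    pthread_mutex_unlock(&free_list_lock);
    return t != NULL ? t : malloc(sizeof(tcb_t));
}

/* Called once a thread has really terminated: keep the block for reuse. */
void tcb_release(tcb_t *t)
{
    assert(t != NULL);
    pthread_mutex_lock(&free_list_lock);
    t->next = free_list;
    free_list = t;
    pthread_mutex_unlock(&free_list_lock);
}
```

With 100 threads alive at once, 100 blocks have to exist at the same time; with a short delay between creations, the same block tends to be handed out again and again, which would be consistent with the effect of the Sleep(5) in the question.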

It has nothing to do with compiler optimisations; the code is fine. The problem could be:
a) Windows itself.
b) Qt's implementation of pthread_create() with detached attributes.
Checking for (a): try to create many detached threads in quick succession using the Windows _beginthreadex directly, and see whether you get the same picture. Note: call CloseHandle(thread_handle) as soon as _beginthreadex returns, to make the thread detached.
Checking for (b): trace which function Qt uses to create threads. If it is _beginthread, then there is your answer. If it is _beginthreadex, then Qt is doing the right thing, and you need to check whether Qt closes the thread handle immediately. If it does not, that is the cause.
cheers
UPDATE 2
Qt5.2.0 does not provide a pthreads API and is unlikely to be responsible for the observed leak.
I wrapped the native Windows API to see how the code runs without a pthread library. You can include this fragment right after the includes:
#include <process.h>
#include <errno.h>
#include <stdlib.h>
#define PTHREAD_CREATE_JOINABLE 0
#define PTHREAD_CREATE_DETACHED 1
typedef struct { int detachstate; } pthread_attr_t;
typedef HANDLE pthread_t;   // HANDLE comes from <windows.h>, included earlier
__declspec(noreturn) void pthread_exit(void *retval)
{
    static_assert(sizeof(unsigned) == sizeof(void*), "Modify code");
    _endthreadex((unsigned)retval);
}
int pthread_attr_setdetachstate(pthread_attr_t *attr, int detachstate)
{
    attr->detachstate = detachstate;
    return 0;
}
int pthread_attr_init(pthread_attr_t *attr)
{
    attr->detachstate = PTHREAD_CREATE_JOINABLE;
    return 0;
}
int pthread_attr_destroy(pthread_attr_t *attr)
{
    (void)attr;
    return 0;
}
// Heap-allocated so it outlives pthread_create(); freed by the new thread.
typedef struct {
    void *(*start_routine)(void *arg);
    void *arg;
} winapi_caller_args;
// Adapts the pthread start routine to the signature _beginthreadex expects.
unsigned __stdcall winapi_caller(void *arglist)
{
    winapi_caller_args *list = (winapi_caller_args *)arglist;
    void *(*start_routine)(void *arg) = list->start_routine;
    void *arg = list->arg;
    free(list);
    static_assert(sizeof(unsigned) == sizeof(void*), "Modify code");
    return (unsigned)start_routine(arg);
}
int pthread_create( pthread_t *thread, pthread_attr_t *attr,
                    void *(*start_routine)(void *), void *arg)
{
    winapi_caller_args *list;
    list = (winapi_caller_args *)malloc(sizeof *list);
    if (list == NULL)
        return EAGAIN;
    list->start_routine = start_routine;
    list->arg = arg;
    *thread = (HANDLE)_beginthreadex(NULL, 0, winapi_caller, list, 0, NULL);
    if (*thread == 0) {
        free(list);
        return errno;
    }
    if (attr != NULL && attr->detachstate == PTHREAD_CREATE_DETACHED)
        CloseHandle(*thread);   // "detached": nobody will ever wait on this handle
    return 0;
}
With the Sleep() line commented out, it works fine without leaks (run time approx. 1 hour).
If the same code, with the Sleep line commented out, calls the Pthreads-win32 2.9.1 library (prebuilt for MSVC) instead, the program stops spawning new threads and stops responding after 5 to 10 minutes.
Test environment: XP Home, MSVC 2010 Express, Qt5.2.0 qmake etc.

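You forgot to join your threads (even if they have already finished). Note that pthread_join() only works on joinable threads, so they must not be created with PTHREAD_CREATE_DETACHED.
Correct code should be:
pthread_t arr[100];
for(int idx=0;idx<100;idx++)
{
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE); // joinable (the default), so it can be joined below
    pthread_create( &arr[idx], &attr, &threadFunc, NULL);
    pthread_attr_destroy ( &attr );
}
for(int idx=0;idx<100;idx++)
{
    pthread_join(arr[idx], NULL); // waits for the thread to finish and releases its resources
}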
Note from the man page:
Failure to join with a thread that is joinable (i.e., one that is not detached), produces a "zombie thread". Avoid doing this, since each zombie thread consumes some system resources, and when enough zombie threads have accumulated, it will no longer be possible to create new threads (or processes).

Related

Return code from pthread creation in C++ is 11

I have a thread creation problem using Pthreads. My code is as follows; I show only part of it due to space constraints.
Main.c creates a Detectdirection instance and passes it to the function.
d = new Detectdirection();
while(run)
{
    int ret = d->run_parallel(d);
    if(ret == -1)
        run = false;
}
My Detectdirection class has two functions to run in parallel:
class Detectdirection{
public:
    int run_parallel(void*p);
    void *Tracking(void *p);
    static void *Tracking_helper(void * p);
    void *ReadImage(void *p );
    static void *ReadImage_helper(void *p );
private:
    pthread_t thread[2];
};
void *Detectdirection::ReadImage(void *p){
    Detectdirection *app = (Detectdirection*)p;
    while(run){
    }
    pthread_exit(NULL);
}
void *Detectdirection::Tracking(void *p){
    Detectdirection *app = (Detectdirection*)p;
    while(run){
    }
    pthread_exit(NULL);
}
void *Detectdirection::Tracking_helper(void *p){
    Detectdirection *app = (Detectdirection*)p;
    return ((Detectdirection*)p)->Tracking(app);
}
void *Detectdirection::ReadImage_helper(void *p ){
    Detectdirection *app = (Detectdirection*)p;
    return ((Detectdirection*)p)->ReadImage(app);
}
int Detectdirection::run_parallel(void* p){
    Detectdirection *app = (Detectdirection*)p;
    int rc = pthread_create(&thread[0], NULL, app->ReadImage_helper, app);
    if (rc) {
        printf("ERROR; return code from pthread_create() is %d\n", rc);
        return -1;
    }
    rc = pthread_create(&thread[1], NULL, app->Tracking_helper, app);
    if (rc) {
        printf("ERROR; return code from pthread_create() is %d\n", rc);
        return -1;
    }
    return 0;
}
Compilation is OK, but when I run it I get a thread creation error. That return code 11 normally happens only when many threads are created, but here I create only two threads and still get the error. What could be wrong?
I believe you are getting EAGAIN (based on the error code 11). That (obviously) means your system doesn't have enough resources to create any more threads.
POSIX documentation says:
[EAGAIN] The system lacked the necessary resources to create another thread, or the system-imposed limit on the total number of threads in a process {PTHREAD_THREADS_MAX} would be exceeded.
I am not quite sure the following is true.
But now I create only two thread and I have that error. What could be wrong?
Here,
while(run)
{
    int ret = d->run_parallel(d);
    if(ret == -1)
        run = false;
}
You call d->run_parallel() in a loop, and each call creates two threads. So you are potentially creating an unbounded number of threads, as the loop only breaks when pthread_create() fails. You may want to look carefully at whether this loop really does what you want.
You don't seem to join the threads you create, so you could detach them instead; then thread-specific resources are released as soon as each thread exits.
You can do:
pthread_detach(pthread_self());
in both ReadImage_helper() and Tracking_helper() functions to detach them. This could potentially solve your resource issue.
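A minimal sketch of that suggestion in C: worker_helper() is a hypothetical stand-in for ReadImage_helper()/Tracking_helper() that detaches itself first thing, and start_detached_worker() (also hypothetical) prints the pthread_create() error with strerror(), so an EAGAIN shows up by name rather than as a bare 11.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for ReadImage_helper()/Tracking_helper(): detach first, then work. */
static void *worker_helper(void *p)
{
    pthread_detach(pthread_self());  /* resources are reclaimed when this returns */
    (void)p;                         /* the real code would call app->ReadImage(app) */
    return NULL;
}

/* Starts one self-detaching worker; returns 0 or the pthread_create() error. */
int start_detached_worker(void *arg)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker_helper, arg);
    if (rc != 0)  /* e.g. EAGAIN (11 on Linux): out of thread resources */
        fprintf(stderr, "pthread_create failed: %s (%d)\n", strerror(rc), rc);
    return rc;
}
```

Creating the threads with PTHREAD_CREATE_DETACHED set in a pthread_attr_t has the same effect without the extra call. Note that detaching only helps for threads that actually exit; threads that keep looping while(run) still occupy resources.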
If the problem persists, you have to look at ways to limit the number of threads running simultaneously on your system. One possible option is a thread pool: create a fixed number of threads and assign them new tasks as they complete their current ones.
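A fixed-size pool can be sketched with one mutex, one condition variable and a FIFO of tasks. This is a minimal illustration under simplifying assumptions (a fixed POOL_THREADS, no cleanup if a worker fails to start); pool_t, pool_submit() and the other names are hypothetical, not from any library.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define POOL_THREADS 4

typedef struct task {
    void (*fn)(void *);
    void *arg;
    struct task *next;
} task_t;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  has_work;
    task_t *head, *tail;          /* FIFO of pending tasks */
    int shutting_down;
    pthread_t workers[POOL_THREADS];
} pool_t;

/* Each worker: take the oldest task, run it outside the lock, repeat. */
static void *pool_worker(void *p)
{
    pool_t *pool = p;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (pool->head == NULL && !pool->shutting_down)
            pthread_cond_wait(&pool->has_work, &pool->lock);
        task_t *t = pool->head;
        if (t == NULL) {          /* shutting down and queue drained */
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        pool->head = t->next;
        if (pool->head == NULL)
            pool->tail = NULL;
        pthread_mutex_unlock(&pool->lock);
        t->fn(t->arg);
        free(t);
    }
}

/* Starts the fixed set of workers; 0 on success, -1 on failure. */
int pool_init(pool_t *pool)
{
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->has_work, NULL);
    pool->head = pool->tail = NULL;
    pool->shutting_down = 0;
    for (int i = 0; i < POOL_THREADS; i++)
        if (pthread_create(&pool->workers[i], NULL, pool_worker, pool) != 0)
            return -1;
    return 0;
}

/* Queues a task and wakes one idle worker to run it. */
int pool_submit(pool_t *pool, void (*fn)(void *), void *arg)
{
    task_t *t = malloc(sizeof *t);
    if (t == NULL)
        return -1;
    t->fn = fn;
    t->arg = arg;
    t->next = NULL;
    pthread_mutex_lock(&pool->lock);
    if (pool->tail != NULL)
        pool->tail->next = t;
    else
        pool->head = t;
    pool->tail = t;
    pthread_cond_signal(&pool->has_work);
    pthread_mutex_unlock(&pool->lock);
    return 0;
}

/* Lets the workers drain the queue, then joins them. */
void pool_shutdown(pool_t *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->shutting_down = 1;
    pthread_cond_broadcast(&pool->has_work);
    pthread_mutex_unlock(&pool->lock);
    for (int i = 0; i < POOL_THREADS; i++)
        pthread_join(pool->workers[i], NULL);
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->has_work);
}

/* Example task for the usage check: bump a shared counter. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
void increment_task(void *arg)
{
    pthread_mutex_lock(&counter_lock);
    ++*(int *)arg;
    pthread_mutex_unlock(&counter_lock);
}
```

The number of threads stays at POOL_THREADS no matter how many tasks are submitted, which is what keeps pthread_create() from running out of resources.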

POSIX Threads - synchronize DETACHED threads using conditional variable MEMORY LEAK

Hello, I'm trying to synchronize detached threads using a condition variable, but I found a bug that sometimes causes a memory leak (depending on the scheduler's mood). I think the code is self-explanatory. I would appreciate any advice.
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <unistd.h>
#include <pthread.h>
using namespace std;
struct TThrArg
{
    pthread_t m_ID;
    bool m_IsRunning;
};
TThrArg g_Threads[64];
int g_Counter;
pthread_mutex_t g_Mtx;
pthread_cond_t g_Cond;
void * thrFunc ( void * arg )
{
    TThrArg * data = (TThrArg *) arg;
    // do some stuff
    // -----------------------------------
    // for ( int i = 0; i < 5000; ++i )
    //     for ( int j = 0; j < 5000; ++j )
    //         int x = 0;
    // printf("Thread: %lu running...\n", data->m_ID);
    // -----------------------------------
    pthread_mutex_lock(&g_Mtx);
    memset(data, 0, sizeof(TThrArg));
    --g_Counter;
    pthread_cond_signal(&g_Cond);
    pthread_mutex_unlock(&g_Mtx);
    sleep(1); // --> while the thread sleeps here, main() may already have returned, so "return NULL" is never reached and the resources are not freed
    return NULL;
}
void createThread ( void )
{
    pthread_mutex_lock(&g_Mtx);
    for ( int i = 0; i < 64; ++i )
    {
        if ( g_Threads[i].m_IsRunning == 0 )
        {
            g_Threads[i].m_IsRunning = 1;
            ++g_Counter;
            pthread_attr_t attr;
            pthread_attr_init(&attr);
            pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
            pthread_create(&g_Threads[i].m_ID, &attr, thrFunc, &g_Threads[i]);
            pthread_attr_destroy(&attr);
            break;
        }
    }
    pthread_mutex_unlock(&g_Mtx);
}
int main ( int argc, char * argv[] )
{
    pthread_mutex_init(&g_Mtx, NULL);
    pthread_cond_init(&g_Cond, NULL);
    g_Counter = 0;
    for ( int i = 0; i < 64; ++i )
        createThread();
    pthread_mutex_lock(&g_Mtx);
    while ( g_Counter != 0 )
    {
        pthread_cond_wait(&g_Cond, &g_Mtx);
    }
    pthread_mutex_unlock(&g_Mtx);
    pthread_mutex_destroy(&g_Mtx);
    pthread_cond_destroy(&g_Cond);
    return 0;
}
The leak you see is because the terminating thread decrements the mutex-protected thread counter, and pauses for a second before the thread actually terminates.
The main execution thread will immediately see that the thread counter reached 0, and terminate before the actual detached threads have exited. Each running thread, even a detached thread, consumes and allocates a little bit of internal memory, which does not get released until the thread actually terminates. This is the leak you see, from execution threads that did not terminate before the main execution thread stopped.
This is not the kind of leak you need to worry about. It is rather annoying, true, and it makes debugging harder.
I once took a different approach in a framework class library I wrote. I did not use detached threads at all; all threads were joinable. The framework started one singleton background thread whose only job was to join() the terminated threads. Each thread started by the framework queued up its own thread ID for the singleton background thread just before terminating.
The net effect was equivalent to detached threads. I could start each thread and not worry about joining to it. It's going to be the background thread's job. The framework would signal the background thread to terminate itself, and join it, before exiting. So, if all goes well, there will not be any reported memory leaks that can be accounted to thread support.
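The framework itself is not shown, so here is a minimal sketch of the same idea under my own assumptions: every managed thread is joinable, queues its own ID just before returning, and one reaper thread joins it. managed_spawn(), reaper_start(), reaper_stop_and_join() and example_work() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* One record per managed thread; it doubles as the "exited" queue node. */
typedef struct managed {
    void *(*fn)(void *);
    void *arg;
    pthread_t tid;
    struct managed *next;
} managed_t;

static pthread_mutex_t reaper_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  reaper_cond = PTHREAD_COND_INITIALIZER;
static managed_t *exited = NULL;   /* threads that are about to terminate */
static int live_threads = 0;       /* started but not yet joined */
static int stop_requested = 0;
static pthread_t reaper_tid;

/* The singleton reaper: joins every thread that has queued itself. */
static void *reaper_main(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&reaper_lock);
    for (;;) {
        while (exited == NULL && !(stop_requested && live_threads == 0))
            pthread_cond_wait(&reaper_cond, &reaper_lock);
        managed_t *m = exited;
        if (m == NULL)                /* stop requested and everything joined */
            break;
        exited = m->next;
        pthread_mutex_unlock(&reaper_lock);
        pthread_join(m->tid, NULL);   /* returns once the thread has really ended */
        free(m);
        pthread_mutex_lock(&reaper_lock);
        --live_threads;
    }
    pthread_mutex_unlock(&reaper_lock);
    return NULL;
}

/* Runs the user function, then queues its own ID just before terminating. */
static void *managed_entry(void *p)
{
    managed_t *m = p;
    void *ret = m->fn(m->arg);
    m->tid = pthread_self();
    pthread_mutex_lock(&reaper_lock);
    m->next = exited;
    exited = m;
    pthread_cond_signal(&reaper_cond);
    pthread_mutex_unlock(&reaper_lock);   /* m may be freed from here on */
    return ret;
}

int reaper_start(void)
{
    return pthread_create(&reaper_tid, NULL, reaper_main, NULL);
}

/* Starts a joinable thread that the reaper will join; 0 or an error code. */
int managed_spawn(void *(*fn)(void *), void *arg)
{
    managed_t *m = malloc(sizeof *m);
    pthread_t tid;
    if (m == NULL)
        return -1;
    m->fn = fn;
    m->arg = arg;
    pthread_mutex_lock(&reaper_lock);
    ++live_threads;                       /* counted before it can finish */
    pthread_mutex_unlock(&reaper_lock);
    int rc = pthread_create(&tid, NULL, managed_entry, m);
    if (rc != 0) {
        free(m);
        pthread_mutex_lock(&reaper_lock);
        --live_threads;
        pthread_cond_signal(&reaper_cond);
        pthread_mutex_unlock(&reaper_lock);
    }
    return rc;
}

/* Waits until all managed threads are joined, then joins the reaper. */
void reaper_stop_and_join(void)
{
    pthread_mutex_lock(&reaper_lock);
    stop_requested = 1;
    pthread_cond_signal(&reaper_cond);
    pthread_mutex_unlock(&reaper_lock);
    pthread_join(reaper_tid, NULL);
}

/* Example thread body used in the usage check. */
static atomic_int done_count = 0;
void *example_work(void *arg)
{
    (void)arg;
    atomic_fetch_add(&done_count, 1);
    return NULL;
}
```

Because reaper_stop_and_join() only returns once every managed thread has been joined, main() can no longer exit while a thread is still between its last unlock and its actual termination, which is the window described in the answer above.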

pthread_attr_setstacksize and pthread_exit

I have a question about C concurrency programming in an embedded system with about 64 MB of RAM.
Specifically, I want to reduce the default memory used by a thread, so I have defined:
pthread_attr_t attr_test;
size_t stacksize = 0x186A0; // 100Kbyte
pthread_attr_init(&attr_test);
pthread_attr_setdetachstate(&attr_test, PTHREAD_CREATE_DETACHED);
pthread_attr_setstacksize(&attr_test, stacksize);
So when the thread starts, it uses only 100 KB of virtual memory.
BUT when the thread ends and calls pthread_exit, the virtual memory used by the process increases rapidly!
Why? What can I do?
Thanks!
UPDATE:
Thread ->
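void *thread_test(void *arg1) {
    int *param;
    param = (int*)arg1;
    printf("Thread %d start\n", *param);
    pthread_mutex_lock(&mutex[*param]);   // pthread_cond_wait() must be called with the mutex locked
    pthread_cond_wait(&condition[*param], &mutex[*param]);
    pthread_mutex_unlock(&mutex[*param]);
    printf("Thread %d stop\n",*param);
    pthread_exit(0);
}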
Main ->
int main(void) {
    pthread_t IDthread[MAX_THREADS];
    int param[MAX_THREADS];
    int pointer;
    int i, keyb;
    void *stkaddr;
    size_t stacksize;
    puts("!!! THREAD TEST !!!");
    printf("Process ID %d\n\n", getpid());
    for(i=0; i<MAX_THREADS; i++)
    {
        pthread_cond_init(&condition[i], NULL);
        pthread_mutex_init(&mutex[i], NULL);
        IDthread[i] = 0;
        param[i] = i;
    }
    stacksize = 0x186A0; // 100Kbyte
    pthread_attr_init(&attr_test);
    pthread_attr_setdetachstate(&attr_test, PTHREAD_CREATE_DETACHED);
    /* setting the size of the stack also */
    pthread_attr_setstacksize(&attr_test, stacksize);
    pointer = 0;
    do {
        keyb = getchar();
        if (keyb == '1')
        {
            if (pointer < MAX_THREADS)
            {
                pthread_create(&IDthread[pointer], &attr_test, thread_test, &param[pointer]);
                sleep(1);
                pointer++;
            }
            else
                puts("MAX Threads Number");
        }
        if (keyb == '2')
        {
            if (pointer != 0)
            {
                pointer--;
                pthread_cond_signal(&condition[pointer]);
                sleep(1);
            }
            else
                puts("0 Thread is running");
        }
    } while (keyb != '0');
    printf("FINE\n");
    return EXIT_SUCCESS;
}
There is a well-known difference between joinable and detached threads; quoting from the manual:
Only when a terminated joinable thread has been joined are the last of its resources released back to the system. When a detached thread terminates, its resources are automatically released back to the system.
You can make the thread detached with:
pthread_attr_setdetachstate(3)
There are some problems with your test.
First, pthread_attr_setstacksize has the following documentation:
The stack size attribute determines the minimum size (in bytes) that will be allocated for threads created using the thread attributes object attr.
So each thread could use more than what you have set. Moreover, threads may allocate memory from the OS to use as stack, and this also applies to the main thread.
Therefore I don't think you can check what you want by looking at the output of the top command, since this information is only visible from within the thread itself.
Also note that the virtual memory used by the process is not related to the amount of RAM used by the process.
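As an aside: assuming glibc or musl, a thread can ask for the stack it actually received with pthread_getattr_np(), a non-portable GNU extension, followed by the standard pthread_attr_getstack(). The sketch starts one thread with a requested stack size and returns what that thread reports; actual_stack_size() is a hypothetical helper name, and how the request is rounded is implementation-specific.

```c
#define _GNU_SOURCE            /* for pthread_getattr_np() */
#include <assert.h>
#include <limits.h>
#include <pthread.h>

/* Runs in the new thread: stores the stack size the thread really has. */
static void *report_stack(void *out)
{
    pthread_attr_t attr;
    void  *addr = NULL;
    size_t size = 0;
    if (pthread_getattr_np(pthread_self(), &attr) == 0) {  /* GNU extension */
        pthread_attr_getstack(&attr, &addr, &size);        /* standard POSIX */
        pthread_attr_destroy(&attr);
    }
    *(size_t *)out = size;
    return NULL;
}

/* Starts one thread with the requested stack size and returns the size that
   thread reports (0 if something failed). */
size_t actual_stack_size(size_t requested)
{
    pthread_attr_t attr;
    pthread_t tid;
    size_t got = 0;
    if (requested < PTHREAD_STACK_MIN)
        requested = PTHREAD_STACK_MIN;
    pthread_attr_init(&attr);
    if (pthread_attr_setstacksize(&attr, requested) == 0 &&
        pthread_create(&tid, &attr, report_stack, &got) == 0)
        pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return got;
}
```

Calling actual_stack_size(0x186A0) on the target would show what the 100 KB request from the question turns into there. It only covers the stack mapping, not other per-thread allocations.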
Here is something you can try to check the total stack of a thread.

Serial code execution in a multi-threaded program in C++

The question: Is it possible to guarantee code execution can only occur in one thread at a time in a multi-threaded program? (Or something which approximates this)
Specifically: I have a controller M (which is a thread) and threads A, B, C. I would like M to be able to decide who should be allowed to run. When a thread has finished (either finally or temporarily), control transfers back to M.
Why: Ideally I want A, B and C to execute their code in their own thread while the others are not running. This would enable each thread to keep their instruction pointer and stack while they pause, starting back where they left off when the controller gives them the control back.
What I'm doing now: I've written some code which can actually do this - but I don't like it.
In pseudo-C:
//Controller M
//do some stuff
UnlockMutex(mutex);
do{}while(lockval==0);
LockMutex(mutex);
//continue with other stuff
//Thread A
//The controller currently has the mutex - will release it at UnlockMutex
LockMutex(mutex);
lockval=1;
//do stuff
UnlockMutex(mutex);
The reason why
do{}while(lockval==0);
is required is that when the mutex is unlocked, both A and M will continue. This hack ensures that A won't unlock the mutex before M can lock it again allowing A to retake the lock a second time and run again (it should only run once).
The do-while seems like overkill, but does the job. So my question is, is there a better way?
Assuming you're running on Windows, you might try looking at fibers (see e.g. http://developer.amd.com/Pages/1031200677.aspx, or just google "windows fibers").
I suspect you're really looking for coroutines.
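If coroutines are what you want, they do not need extra threads at all. Below is a minimal sketch using the ucontext functions (getcontext, makecontext, swapcontext): a controller M and two tasks A and B run on one thread, each task keeps its own stack and resume point, and control returns to M whenever a task yields. Note that ucontext was removed from POSIX.1-2008 but is still provided by glibc; the round-robin policy and all names are illustrative assumptions.

```c
#include <assert.h>
#include <string.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t ctx_main, ctx_task[2];
static char stacks[2][STACK_SIZE];
static char trace[64];          /* who ran, in order, e.g. "A0B0..." */
static size_t trace_len = 0;

/* Task body: one step per turn, then yield back to the controller M. */
static void task_body(int id)
{
    for (int step = 0; step < 3; step++) {
        trace[trace_len++] = (char)(id == 0 ? 'A' : 'B');
        trace[trace_len++] = (char)('0' + step);
        trace[trace_len] = '\0';
        swapcontext(&ctx_task[id], &ctx_main);  /* save our place, resume M */
    }
}

/* Controller M: decides who runs; here simply A, B, A, B, ... for 3 rounds. */
const char *run_controller(void)
{
    for (int id = 0; id < 2; id++) {
        getcontext(&ctx_task[id]);
        ctx_task[id].uc_stack.ss_sp = stacks[id];
        ctx_task[id].uc_stack.ss_size = sizeof stacks[id];
        ctx_task[id].uc_link = &ctx_main;       /* where to go if the task returns */
        makecontext(&ctx_task[id], (void (*)(void))task_body, 1, id);
    }
    for (int round = 0; round < 3; round++)
        for (int id = 0; id < 2; id++)
            swapcontext(&ctx_main, &ctx_task[id]);  /* run task id until it yields */
    return trace;
}
```

Because everything runs on one thread, there is nothing to lock: "only one runs at a time" holds by construction, and M chooses the order simply by picking which context to swap to.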
Look for "CriticalSection" in Win32.
C++11 uses another term: "lock_guard".
How do I make a critical section with Boost?
http://en.cppreference.com/w/cpp/thread/lock_guard
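Your code
do{}while(lockval==0);
will eat up your CPU time by busy-waiting. Also, unless lockval is atomic, this loop is a data race, and the compiler may assume lockval never changes inside it and turn it into an infinite loop.
I presume you are coding C++ under Linux and using the pthread API.
Here is the code; it is not very robust, but it is a good point to start. I hope it is useful to you.
Use "g++ test_controller_thread.cpp -pthread -o test_controller_thread" to build the binary.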
// 3 threads, one for controller, the other two for worker1 and worker2.
// Only one thread can proceed at any time.
// We use one pthread_mutex_t and two pthread_cond_t to guarantee this.
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>
static pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t g_controller_cond = PTHREAD_COND_INITIALIZER;
static pthread_cond_t g_worker_cond = PTHREAD_COND_INITIALIZER;
void* controller_func(void *arg) {
    (void)arg;
    printf("entering the controller thread. \n");
    // limit the max time the controller can run
    int max_run_time = 5;
    int run_time = 0;
    pthread_mutex_lock(&g_mutex);
    while (run_time++ < max_run_time) {
        printf("controller is waiting.\n");
        pthread_cond_wait(&g_controller_cond, &g_mutex);
        printf("controller is woken up.\n");
        pthread_cond_signal(&g_worker_cond);
        printf("signal worker to wake up.\n");
    }
    pthread_mutex_unlock(&g_mutex);
    return NULL;
}
void* worker_func(void *arg) {
    int work_id = *(int*)arg;
    printf("worker %d start.\n", work_id);
    pthread_mutex_lock(&g_mutex);
    while (1) {
        printf("worker %d is waiting for controller.\n", work_id);
        pthread_cond_wait(&g_worker_cond, &g_mutex);
        printf("worker %d is working.\n", work_id);
        pthread_cond_signal(&g_controller_cond);
        printf("worker %d signal the controller.\n", work_id);
    }
    pthread_mutex_unlock(&g_mutex); // never reached: workers are cancelled from main()
    return NULL;
}
int main() {
    pthread_t controller_thread, worker_thread_1, worker_thread_2;
    int worker_id_1 = 1;
    int worker_id_2 = 2;
    pthread_create(&controller_thread, NULL, controller_func, NULL);
    pthread_create(&worker_thread_1, NULL, worker_func, &worker_id_1);
    pthread_create(&worker_thread_2, NULL, worker_func, &worker_id_2);
    // crude: give the controller time to reach pthread_cond_wait(); a signal sent earlier would be lost
    sleep(1);
    printf("\nsignal the controller to start all the process.\n\n");
    pthread_cond_signal(&g_controller_cond);
    pthread_join(controller_thread, NULL);
    pthread_cancel(worker_thread_1);
    pthread_cancel(worker_thread_2);
    return 0;
}
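A variant of the same scheme that avoids lost and spurious wake-ups keeps an explicit "whose turn is it" variable and uses it as the predicate for every pthread_cond_wait(). This is a sketch under my own assumptions: two workers, a schedule array chosen by the controller, and hypothetical names such as run_schedule(). Because the predicate is re-checked under the mutex, no sleep() is needed to make sure the controller is waiting first.

```c
#include <assert.h>
#include <pthread.h>

#define NWORKERS   2
#define MAX_STEPS  16
#define CONTROLLER (-1)   /* turn value: M is running */
#define DONE       (-2)   /* turn value: workers should exit */

static pthread_mutex_t turn_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  turn_changed = PTHREAD_COND_INITIALIZER;
static int turn = CONTROLLER;    /* CONTROLLER, a worker id, or DONE */
static int order[MAX_STEPS];     /* which worker ran at each step */
static int nrun = 0;

static void *worker(void *arg)
{
    int id = (int)(long)arg;
    pthread_mutex_lock(&turn_lock);
    for (;;) {
        while (turn != id && turn != DONE)   /* predicate re-checked after every wake-up */
            pthread_cond_wait(&turn_changed, &turn_lock);
        if (turn == DONE)
            break;
        pthread_mutex_unlock(&turn_lock);
        order[nrun++] = id;                  /* "do stuff": nobody else works while turn == id */
        pthread_mutex_lock(&turn_lock);
        turn = CONTROLLER;                   /* hand control back to M */
        pthread_cond_broadcast(&turn_changed);
    }
    pthread_mutex_unlock(&turn_lock);
    return NULL;
}

/* Controller M: runs workers one at a time in the order given by schedule
   (entries must be worker ids 0..NWORKERS-1). */
void run_schedule(const int *schedule, int len)
{
    pthread_t tids[NWORKERS];
    if (len > MAX_STEPS)
        len = MAX_STEPS;
    for (long i = 0; i < NWORKERS; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, (void *)i);
        assert(rc == 0);                     /* sketch: no recovery on failure */
        (void)rc;
    }
    pthread_mutex_lock(&turn_lock);
    for (int k = 0; k < len; k++) {
        turn = schedule[k];                  /* let this worker run ... */
        pthread_cond_broadcast(&turn_changed);
        while (turn != CONTROLLER)           /* ... and wait until it yields */
            pthread_cond_wait(&turn_changed, &turn_lock);
    }
    turn = DONE;
    pthread_cond_broadcast(&turn_changed);
    pthread_mutex_unlock(&turn_lock);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tids[i], NULL);
}
```

pthread_cond_broadcast() is used instead of pthread_cond_signal() because threads waiting on the same condition variable wait for different predicates; waking all of them lets each one re-check whether it is its turn.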

Thread - synchronizing and sleeping thread refuses to wake up (LINUX)

I'm developing an application for openSUSE 12.1.
This application has a main thread and two other threads running instances of the same function. I'm trying to use pthread_barrier to synchronize all the threads, but I'm having some problems:
When I put the derived threads to sleep, they never wake up for some reason.
When I remove the sleep from the other threads (which sends CPU usage sky-high), at some point all the threads reach pthread_barrier_wait(), but none of them continues execution after that.
Here's some pseudo code trying to illustrate what I'm doing.
pthread_barrier_t barrier;
int main(void)
{
    pthread_barrier_init(&barrier, NULL , 3);
    pthread_create(&thread_id1, NULL,&thread_func, (void*) &params1);
    pthread_create(&thread_id2, NULL,&thread_func, (void*) &params2);
    while(1)
    {
        doSomeWork();
        nanosleep(&t1, &t2);
        pthread_barrier_wait(&barrier);
        doSomeMoreWork();
    }
}
void *thread_func(void *params)
{
    init_thread(params);
    while(1)
    {
        nanosleep(&t1, &t2);
        doAnotherWork();
        pthread_barrier_wait(&barrier);
    }
}
I don't think it has to do with the barrier as you've presented it in the pseudocode. I'm assuming your glibc is roughly the same as the one on my machine. I compiled a rough version of your pseudo-code and it runs as I expect: the threads do some work, the main thread does some work, they all reach the barrier and then loop.
Can you comment more about any other synchronization methods or what the work functions are?
This is the example program I'm using:
#include <pthread.h>
#include <stdio.h>
#include <time.h>
struct timespec req = {1,0}; //{.tv_sec = 1, .tv_nsec = 0};
struct timespec rem = {0,0}; //{.tv_sec = 0, .tv_nsec = 0};
pthread_barrier_t barrier;
void *thread_func(void *params) {
    long int name;
    name = (long int)params;
    while(1) {
        printf("This is thread %ld\n", name);
        nanosleep(&req, &rem);
        pthread_barrier_wait(&barrier);
        printf("More work from %ld\n", name);
    }
}
int main(void)
{
    pthread_t th1, th2;
    pthread_barrier_init(&barrier, NULL , 3);
    pthread_create(&th1, NULL, &thread_func, (void*)1);
    pthread_create(&th2, NULL, &thread_func, (void*)2);
    while(1) {
        nanosleep(&req, &rem);
        printf("This is the parent\n\n");
        pthread_barrier_wait(&barrier);
    }
    return 0;
}
I would suggest using condition variables to synchronize the threads.
Here is a website about how to do it; I hope it helps.
http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html
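Following that suggestion, a reusable barrier can be built from a mutex and a condition variable, which also makes it easy to add logging while debugging a hang like the one in the question. This is a minimal sketch; cv_barrier_t, its functions and demo_rounds() are hypothetical names, and the generation counter is what makes reusing the barrier inside a loop safe.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  opened;
    unsigned waiting;     /* threads still expected in the current round */
    unsigned total;       /* threads that must arrive before it opens */
    unsigned generation;  /* incremented every time the barrier opens */
} cv_barrier_t;

void cv_barrier_init(cv_barrier_t *b, unsigned total)
{
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->opened, NULL);
    b->waiting = b->total = total;
    b->generation = 0;
}

void cv_barrier_destroy(cv_barrier_t *b)
{
    pthread_mutex_destroy(&b->lock);
    pthread_cond_destroy(&b->opened);
}

/* Blocks until `total` threads have called it in this round. */
void cv_barrier_wait(cv_barrier_t *b)
{
    pthread_mutex_lock(&b->lock);
    unsigned gen = b->generation;
    if (--b->waiting == 0) {               /* last arrival opens the barrier */
        b->generation++;
        b->waiting = b->total;             /* reset for the next round */
        pthread_cond_broadcast(&b->opened);
    } else {
        while (gen == b->generation)       /* ignore spurious wake-ups */
            pthread_cond_wait(&b->opened, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

/* Usage mirroring the question: main plus two threads meet once per round. */
#define ROUNDS 5
static cv_barrier_t demo_barrier;
static pthread_mutex_t progress_lock = PTHREAD_MUTEX_INITIALIZER;
static int progress[2];                    /* rounds of work finished per thread */

static void *demo_thread(void *arg)
{
    int id = (int)(long)arg;
    for (int r = 0; r < ROUNDS; r++) {
        pthread_mutex_lock(&progress_lock);
        progress[id] = r + 1;              /* "doAnotherWork()" for round r */
        pthread_mutex_unlock(&progress_lock);
        cv_barrier_wait(&demo_barrier);
    }
    return NULL;
}

/* Returns 1 if, each time main passed the barrier, both threads had finished that round. */
int demo_rounds(void)
{
    pthread_t t[2];
    int ok = 1;
    cv_barrier_init(&demo_barrier, 3);
    for (long i = 0; i < 2; i++)
        if (pthread_create(&t[i], NULL, demo_thread, (void *)i) != 0)
            return 0;
    for (int r = 0; r < ROUNDS; r++) {
        cv_barrier_wait(&demo_barrier);
        pthread_mutex_lock(&progress_lock);
        if (progress[0] < r + 1 || progress[1] < r + 1)
            ok = 0;
        pthread_mutex_unlock(&progress_lock);
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    cv_barrier_destroy(&demo_barrier);
    return ok;
}
```

Without the generation check, a thread that was woken for round r could loop around and find the counter already reset, so the "while (gen == b->generation)" loop is the part that makes the barrier safe to reuse.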