This question seems to be asked a lot. I had some legacy production code that was seemingly fine, until it started getting many more connections per day. Each connection kicked off a new thread. Eventually, it would exhaust memory and crash.
I'm going back over pthread (and C sockets) which I've not dealt with in years. The tutorial I had was informative, but I'm seeing the same thing when I use top. All the threads exit, but there's still some virtual memory taken up. Valgrind tells me there is a possible memory loss when calling pthread_create(). The very basic sample code is below.
The scariest part is that pthread_exit( NULL ) seems to leave about 100m in VIRT unaccounted for when all the threads exit. If I comment out this line, it's much more livable, but there is still some left. On my system it starts with about 14k and ends with 47k.
If I up the thread count to 10,000, VIRT goes up to 70+ gigs, but finishes somewhere around 50k, assuming I comment out pthread_exit( NULL ). If I use pthread_exit( NULL ) it finishes with about 113m still in VIRT. Are these acceptable? Is top not telling me everything?
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

void* run_thread( void* id )
{
    int thread_id = *(int*)id;  /* note: every thread reads the same counter owned by main */
    int count = 0;
    while ( count < 10 ) {
        sleep( 1 );
        printf( "Thread %d at count %d\n", thread_id, count++ );
    }
    pthread_exit( NULL );
    return 0;
}

int main( int argc, char* argv[] )
{
    sleep( 5 );
    int thread_count = 0;
    while ( thread_count < 10 ) {
        pthread_t my_thread;
        /* pthread_create returns an error number, not -1, on failure */
        if ( pthread_create( &my_thread, NULL, run_thread, (void*)&thread_count ) != 0 ) {
            perror( "Error making thread...\n" );
            return 1;
        }
        pthread_detach( my_thread );
        thread_count++;
        sleep( 1 );
    }
    pthread_exit( 0 ); // added as per request
    return 0;
}
I know this is a rather old question, but I hope others will benefit.
This is indeed a memory leak. The thread is created with default attributes, and by default a thread is joinable. A joinable thread keeps its underlying bookkeeping until it has finished... and been joined.
If a thread is never joined, create it with the detached attribute set. All of its resources are then freed as soon as the thread terminates.
Here's an example:
pthread_attr_t attr;
pthread_t thread;

pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create(&thread, &attr, &threadfunction, NULL);
pthread_attr_destroy(&attr);
Prior to your edit of adding pthread_exit(0) to the end of main(), your program would finish executing before all the threads had finished running. valgrind thus reported the resources that were still being held by the threads that were still active at the time the program terminated, making it look like your program had a memory leak.
The call to pthread_exit(0) at the end of main() terminates only the main thread; the process keeps running until every other spawned thread has exited. This lets valgrind observe a clean run in terms of memory utilization.
(I am assuming Linux is your operating system below, though from your comments you may be running some other variety of UNIX.)
The extra virtual memory you see is just Linux keeping pages mapped for your process because it was recently a big memory user. As long as your resident memory utilization is low and constant once you reach the idle state, and the virtual utilization stays relatively constant, you can assume your system is well behaved.
By default, each thread on Linux typically gets 8MB of stack space with glibc (the older LinuxThreads implementation used 2MB). If each thread's stack does not need that much space, you can adjust it by initializing a pthread_attr_t and setting a smaller stack size with pthread_attr_setstacksize(). What stack size is appropriate depends on how deep your function call stack grows and how much space the local variables of those functions take, but it must be at least PTHREAD_STACK_MIN.
#include <limits.h>  /* PTHREAD_STACK_MIN */

#define SMALL_STACK (24*1024)  /* must be >= PTHREAD_STACK_MIN */

pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setstacksize(&attr, SMALL_STACK);
/* ... */
pthread_create(&my_thread, &attr, run_thread, (void *)thread_count);
/* ... */
pthread_attr_destroy(&attr);
I'm currently using std::async to launch several tasks (4) simultaneously; after launching them, I wait for them to finish using std::future objects. The problem is that the system monitor shows that more than 13 threads have been created and they do not terminate.
Here is the piece of code:
System system;
std::vector<Compressor> m_compressorContainer(4);
std::vector<future<void> > m_futures(4);

while ( system.isRunning() )
{
    int index = 0;
    // launch one thread per compressor
    for ( auto &compressor : m_compressorContainer )
    {
        m_futures[index++] = std::async( std::launch::any, &Compressor::process, compressor );
    }
    // wait for results
    std::for_each( m_futures.begin(), m_futures.end(),
                   []( std::future<void> &future ){ future.get(); } );
}
Since I wait for each task to finish, I was expecting the number of threads to stay at 4, not 13. Any ideas?
Threads, like memory, may be kept alive by the runtime library for reuse in the future. delete p; isn't guaranteed to return memory to the operating system either.
I would like to encrypt a file with multiple threads in order to reduce the time taken. I'm running on an Intel i5 processor with 4 GB of memory, using Visual C++ 2008. The problem is that when I run the code below in debug mode (Visual C++ 2008), the time taken is longer: for example, encrypting a 3 MB file with one thread takes 5 seconds, but with two threads it takes 10 seconds. The time should be shorter with 2 threads. In release mode there is no problem: the time taken is short when using multiple threads.
Is it possible to run the code in debug mode in less time? Is there a setting to change in Visual C++ 2008?
void load()
{
    ifstream readF( "3mb.txt" );
    string output, out;
    if ( readF.is_open() )
    {
        // Test getline() itself; looping on !readF.eof() is error-prone
        // and can process the last line twice.
        while ( getline( readF, out ) )
        {
            output = output + '\n' + out;
        }
        readF.close();
        //cout<<output<<endl;
        //cout<<output.size()<<endl;
        text[0] = output;
    }
    else
        cout << "couldn't open file!" << endl;
}
unsigned Counter;

unsigned __stdcall SecondThreadFunc( void* pArguments )
{
    cout << "encrypting..." << endl;
    Enc(text[0]);
    _endthreadex( 0 );
    return 0;
}

unsigned __stdcall SecondThreadFunc2( void* pArguments )
{
    cout << "encrypting..." << endl;
    //Enc(text[0]);
    _endthreadex( 0 );
    return 0;
}

int main()
{
    load();
    HANDLE hThread[10];
    unsigned threadID;
    time_t start, end;

    start = time(0);
    hThread[0] = (HANDLE)_beginthreadex( NULL, 0, &SecondThreadFunc, NULL, 0, &threadID );
    hThread[1] = (HANDLE)_beginthreadex( NULL, 0, &SecondThreadFunc2, NULL, 0, &threadID );
    WaitForSingleObject( hThread[0], INFINITE );
    WaitForSingleObject( hThread[1], INFINITE );
    CloseHandle( hThread[0] );
    CloseHandle( hThread[1] );  // close both handles, not just the first
    end = time(0);
    cout << "Time taken : " << difftime(end, start) << " second(s)" << endl;
    system("pause");
    return 0;
}
A potential reason it may be slower is that multiple threads need to load data from memory into the CPU cache. In debug mode there may be extra padding around data structures, intended to catch buffer overflows. That can mean that when the CPU switches from one thread to the other, it has to flush the cache and reload data from RAM. In release mode, where there is no padding, enough data for both threads fits into the cache, so it runs quicker.
You will find that even in release mode, adding more threads eventually gives diminishing returns and then actually becomes slower than fewer threads.
edit:
I made the wrong assumption that the threads start running on pthread_join, when they actually start running on pthread_create.
I'm learning to use Posix threads, and I've read that:
pthread_join() - wait for thread termination
So, in the code sample, main's exit(0) is not reached until both started threads end.
But after the first call to pthread_join(), main continues executing, because the second call to pthread_join() actually runs, and the message in between is printed.
So how does this work? Does main continue executing while both threads are still running, or doesn't it?
I know this isn't a reliable way of testing, but the second test message always gets printed after both threads are finished, no matter how long the loop is. (at least on my machine when I tried it)
void *print_message_function( void *ptr )
{
    char *message = (char *) ptr;
    for( int a = 0; a < 1000; ++a )
        printf( "%s - %i\n", message, a );
    return NULL;
}

int main( int argc, char *argv[] )
{
    pthread_t thread1, thread2;
    char message1[] = "Thread 1";
    char message2[] = "Thread 2";
    int iret1, iret2;

    iret1 = pthread_create( &thread1, NULL, print_message_function, (void*) message1 );
    iret2 = pthread_create( &thread2, NULL, print_message_function, (void*) message2 );

    pthread_join( thread1, NULL );
    printf( "Let's see when is this printed...\n" );
    pthread_join( thread2, NULL );
    printf( "And this one?...\n" );

    printf( "Thread 1 returns: %d\n", iret1 );
    printf( "Thread 2 returns: %d\n", iret2 );
    exit( 0 );
}
The function pthread_join waits for the thread to finish or returns immediately if the thread is already done.
So in your case
pthread_join( thread1, NULL); /* Start waiting for thread1. */
printf( "Let's see when is this printed...\n" ); /* Done waiting for thread1. */
pthread_join( thread2, NULL); /* Start waiting for thread2. */
printf( "And this one?...\n" ); /* Done waiting for thread2. */
But after the first call to pthread_join(), main continues executing,
because the second call to pthread_join() actually runs, and the
message in between is printed.
False. pthread_join waits unless thread1 is already done.
pthread_join() does not return (blocking the calling thread) until the thread being joined has terminated. If the thread has already terminated, then it returns straight away.
In your test, both threads do exit, and so of course you'll see all the messages printed from the main thread. When the first message is printed, you know that thread1 is complete; when the second is printed you know that thread2 is also complete. This will probably happen quite quickly after the first, since both threads were doing the same amount of work at roughly the same time.
pthread_join( thread1, NULL);
The main thread waits here on this join call till thread1 completes its job. Once thread1 completes execution main thread will proceed ahead and execute the next statement printf.
printf( "Let's see when is this printed...\n" );
Again, the main thread will wait here till thread2 completes its job.
pthread_join( thread2, NULL);
Once thread2 completes its job the main thread moves ahead and the next statement which is the printf is executed.
printf( "And this one?...\n" );
The sequence works in the way described above; it probably all happens so quickly that the traces you see are confusing.
Also, using printf to observe the behavior of multithreaded programs can be misleading: the order of the output does not always indicate the actual control flow, because it depends on timing, and the flushing of buffers to stdout may not happen in the same order as the printf calls across threads.
If the first pthread_join returns immediately, that would suggest that the first thread has already finished executing. What does the output look like? Do you see any "Thread 1 - n" output after "Let's see when this is printed"?
In the following code I create a number of threads, and each thread sleeps for some seconds.
However, my main program doesn't wait for the threads to finish; I was under the assumption that threads would continue to run until they finished by themselves.
Is there some way of making the threads continue to run even though the calling thread finishes?
#include <pthread.h>
#include <unistd.h>
#include <iostream>
#include <cstdio>
#include <cstdlib>

int sample( int min, int max )
{
    int r = rand();
    return ( r % max + min );
}

void *worker( void *p )
{
    long i = (long) p;
    int s = sample( 1, 10 );
    fprintf( stdout, "\tid:%ld will sleep: %d\n", i, s );
    sleep( s );
    fprintf( stdout, "\tid:%ld done sleeping\n", i );
    return NULL;
}

pthread_t thread1;

int main()
{
    int nThreads = sample( 1, 10 );
    for ( long i = 0; i < nThreads; i++ ) {
        fprintf( stderr, "\t-> Creating: %ld of %d\n", i, nThreads );
        int iret1 = pthread_create( &thread1, NULL, worker, (void*) i );
        pthread_detach( thread1 );
    }
    // sleep(10); // works if this is not commented out.
    return 0;
}
Thanks
Edit:
Sorry for not clarifying: is it possible without explicitly keeping track of my running threads and joining each one?
Each program has a main thread: the thread in which your main() function executes. When that thread finishes, the program finishes along with all its threads. If you want your main thread to wait for the other threads, you must use the pthread_join function.
You need to keep track of the threads. You are not doing that, because you reuse the same thread1 variable for every thread you create.
You track threads by creating a list (or array) of pthread_t values that you pass to pthread_create(). Then you pthread_join() each thread in the list.
edit:
Well, it's really lazy not to keep track of your running threads. But you can accomplish what you want by having a global variable (protected by a mutex) that each thread increments just before it finishes. Then in your main thread you can wait until that variable reaches the value you want, say nThreads in your sample code.
You need to join each thread you create:
int main()
{
    int nThreads = sample(1, 10);
    std::vector<pthread_t> threads(nThreads);

    for (long i = 0; i < nThreads; i++)
    {
        pthread_create( &threads[i], NULL, worker, (void*) i );
    }

    /* Wait on the other threads */
    for (int i = 0; i < nThreads; i++)
    {
        void* status;
        pthread_join(threads[i], &status);
    }
    return 0;
}
You learned your assumption was wrong. Main is special. Exiting main will kill your threads. So there are two options:
Use pthread_exit to exit main. This function will allow you to exit main but keep other threads running.
Do something to keep main alive. This can be anything from a loop (stupid and inefficient) to any blocking call. pthread_join is common since it will block but also give you the return status of the threads, if you are interested, and clean up the dead thread resources. But for the purposes of keeping main from terminating any blocking call will do e.g. select, read a pipe, block on a semaphore, etc.
Since Martin showed join(), here's pthread_exit():
int main()
{
    int nThreads = sample(1, 10);
    for (long i = 0; i < nThreads; i++) {
        fprintf(stderr, "\t-> Creating: %ld of %d\n", i, nThreads);
        int iret1 = pthread_create( &thread1, NULL, worker, (void*) i );
        pthread_detach(thread1);
    }
    pthread_exit(NULL);
}
I have a program that spawns 3 worker threads that do some number crunching, and waits for them to finish like so:
#define THREAD_COUNT 3
volatile LONG waitCount;
HANDLE pSemaphore;
int main(int argc, char **argv)
{
// ...
HANDLE threads[THREAD_COUNT];
pSemaphore = CreateSemaphore(NULL, THREAD_COUNT, THREAD_COUNT, NULL);
waitCount = 0;
for (int j=0; j<THREAD_COUNT; ++j)
{
threads[j] = CreateThread(NULL, 0, Iteration, p+j, 0, NULL);
}
WaitForMultipleObjects(THREAD_COUNT, threads, TRUE, INFINITE);
// ...
}
The worker threads use a custom Barrier function at certain points in the code to wait until all other threads reach the Barrier:
void Barrier(volatile LONG* counter, HANDLE semaphore, int thread_count = THREAD_COUNT)
{
LONG wait_count = InterlockedIncrement(counter);
if ( wait_count == thread_count )
{
*counter = 0;
ReleaseSemaphore(semaphore, thread_count - 1, NULL);
}
else
{
WaitForSingleObject(semaphore, INFINITE);
}
}
(Implementation based on this answer)
The program occasionally deadlocks. If at that point I use VS2008 to break execution and dig around in the internals, there is only 1 worker thread waiting on the Wait... line in Barrier(). The value of waitCount is always 2.
To make things even more awkward, the faster the threads work, the more likely they are to deadlock. If I run in Release mode, the deadlock comes about 8 out of 10 times. If I run in Debug mode and put some prints in the thread function to see where they hang, they almost never hang.
So it seems that some of my worker threads are killed early, leaving the rest stuck on the Barrier. However, the threads do literally nothing except read and write memory (and call Barrier()), and I'm quite positive that no segfaults occur. It is also possible that I'm jumping to the wrong conclusions, since (as mentioned in the question linked above) I'm new to Win32 threads.
What could be going on here, and how can I debug this sort of weird behavior with VS?
How do I debug weird thread behaviour?
Not quite what you said, but the answer is almost always: understand the code really well, understand all the possible outcomes and work out which one is happening. A debugger becomes less useful here, because you can either follow one thread and miss out on what is causing other threads to fail, or follow from the parent, in which case execution is no longer sequential and you end up all over the place.
Now, onto the problem.
pSemaphore = CreateSemaphore(NULL, THREAD_COUNT, THREAD_COUNT, NULL);
From the MSDN documentation:
lInitialCount [in]: The initial count for the semaphore object. This value must be greater than or equal to zero and less than or equal to lMaximumCount. The state of a semaphore is signaled when its count is greater than zero and nonsignaled when it is zero. The count is decreased by one whenever a wait function releases a thread that was waiting for the semaphore. The count is increased by a specified amount by calling the ReleaseSemaphore function.
And here:
Before a thread attempts to perform the task, it uses the WaitForSingleObject function to determine whether the semaphore's current count permits it to do so. The wait function's time-out parameter is set to zero, so the function returns immediately if the semaphore is in the nonsignaled state. WaitForSingleObject decrements the semaphore's count by one.
So what we're saying here, is that a semaphore's count parameter tells you how many threads are allowed to perform a given task at once. When you set your count initially to THREAD_COUNT you are allowing all your threads access to the "resource" which in this case is to continue onwards.
The answer you link uses this creation method for the semaphore:
CreateSemaphore(0, 0, 1024, 0)
Which basically says none of the threads are permitted to use the resource. In your implementation, the semaphore is signaled (>0), so everything carries on merrily until one of the threads manages to decrease the count to zero, at which point some other thread waits for the semaphore to become signaled again, which probably isn't happening in sync with your counters. Remember when WaitForSingleObject returns it decreases the counter on the semaphore.
In the example you've posted, setting:
::ReleaseSemaphore(sync.Semaphore, sync.ThreadsCount - 1, 0);
This works because each of the WaitForSingleObject calls decreases the semaphore's value by 1, and there are threadcount - 1 of them to make. Once all threadcount - 1 waits have returned, the semaphore is back to 0 and therefore unsignaled again, so on the next pass everybody waits, because nobody is allowed to access the resource at first.
So in short, set your initial value to zero and see if that fixes it.
Edit: a little explanation. To think of it a different way, a semaphore is like an n-ticket gate. What you usually do is this:
// Set the number of tickets:
HANDLE Semaphore = CreateSemaphore(0, 20, 200, 0);
// Later on in a thread somewhere...
// Get a ticket in the queue
WaitForSingleObject(Semaphore, INFINITE);
// Only 20 threads can access this area
// at once. When one thread has entered
// this area the available tickets decrease
// by one. When there are 20 threads here
// all other threads must wait.
// do stuff
ReleaseSemaphore(Semaphore, 1, 0);
// gives back one ticket.
So the use we're putting semaphores to here isn't quite the one for which they were designed.
It's a bit hard to guess exactly what you might be running into. Parallel programming is one of those places that (IMO) it pays to follow the philosophy of "keep it so simple it's obviously correct", and unfortunately I can't say that your Barrier code seems to qualify. Personally, I think I'd have something like this:
// define and initialize the array of events use for the barrier:
HANDLE barrier_[thread_count];
for (int i=0; i<thread_count; i++)
barrier_[i] = CreateEvent(NULL, true, false, NULL);
// ...
Barrier(size_t thread_num) {
    // Signal that this thread has reached the barrier:
    SetEvent(barrier_[thread_num]);
    // Then wait for all the threads to reach the barrier:
    WaitForMultipleObjects(thread_count, barrier_, true, INFINITE);
}
Edit:
Okay, now that the intent has been clarified (need to handle multiple iterations), I'd modify the answer, but only slightly. Instead of one array of Events, have two: one for the odd iterations and one for the even iterations:
// define and initialize the array of events use for the barrier:
HANDLE barrier_[2][thread_count];
for (int i=0; i<thread_count; i++) {
barrier_[0][i] = CreateEvent(NULL, true, false, NULL);
barrier_[1][i] = CreateEvent(NULL, true, false, NULL);
}
// ...
Barrier(size_t thread_num, int iteration) {
    // Signal that this thread has reached the barrier:
    SetEvent(barrier_[iteration & 1][thread_num]);
    // Then wait for all the threads to reach the barrier:
    WaitForMultipleObjects(thread_count, barrier_[iteration & 1], true, INFINITE);
    ResetEvent(barrier_[iteration & 1][thread_num]);
}
In your barrier, what prevents this line:
*counter = 0;
from being executed while this other one is being executed by another thread?
LONG wait_count = InterlockedIncrement(counter);