I need one clarification related to Windows threads (WEC7). Consider the following sample code; I would like to know whether there is any memory leak in it.
In the snippet, MyThread1 allocates memory on the heap and passes it to MyThread2, where the allocated memory is freed.
DWORD WINAPI MyThread2(LPVOID lpVoid)
{
int* b = (int*)lpVoid;
int c = *b;
free(lpVoid);
return 0;
}
DWORD WINAPI MyThread1(LPVOID lpVoid)
{
int count =100;
while(count)
{
int* a = NULL;
a= (int*)malloc(sizeof(int));
*a = count;
CloseHandle(CreateThread(NULL, 0, MyThread2,(LPVOID)a, 0, NULL));
count --;
}
return 0;
}
int main()
{
CreateThread(NULL, 0, MyThread1,NULL, 0, NULL);
// wait here until MyThread1 exits.
return 0;
}
The code you have shown will leak if CreateThread() in MyThread1() fails to create a new thread. You are not checking for that condition, so MyThread1() never gets the chance to free the memory it allocated when the handoff to MyThread2() fails. Other than that, since you are allocating memory in one thread and freeing it in another thread, make sure you are using the multi-threaded version of your compiler's RTL.
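For example, here is a sketch of MyThread1() with that check added (it keeps the structure and names of the question's code; all other error handling is still elided):
DWORD WINAPI MyThread1(LPVOID lpVoid)
{
    int count = 100;
    while(count)
    {
        int* a = (int*)malloc(sizeof(int));
        if(a == NULL)
            break; // allocation failed, nothing to hand off
        *a = count;
        HANDLE hThread = CreateThread(NULL, 0, MyThread2, (LPVOID)a, 0, NULL);
        if(hThread == NULL)
            free(a); // thread creation failed, ownership was never transferred
        else
            CloseHandle(hThread);
        count--;
    }
    return 0;
}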
There's no leak of memory. The calls to malloc are matched by calls to free.
It is possible that some thread 2 instances have not started to run by the time thread 1 finishes and the program closes. But at that point the system reclaims the memory anyway.
You don't call CloseHandle for thread 1, but then I guess this isn't real code. In real code you would certainly have to capture the handle for thread 1 so you can wait on it:
HANDLE hThread1 = CreateThread(NULL, 0, MyThread1,NULL, 0, NULL);
WaitForSingleObject(hThread1, INFINITE);
CloseHandle(hThread1);
Note that I've adopted your policy of eliding error checking for simplicity of exposition. Of course in real code you'd add error handling for all API calls.
My goal is to lock the virtual memory allocated for my process heaps (to prevent the possibility of it being swapped out to disk).
I use the following code:
//pseudo-code, error checks are omitted for brevity
struct MEM_PAGE_TO_LOCK{
const BYTE* pBaseAddr; //Base address of the page
size_t szcbBlockSz; //Size of the block in bytes
MEM_PAGE_TO_LOCK()
: pBaseAddr(NULL)
, szcbBlockSz(0)
{
}
};
void WorkerThread(LPVOID pVoid)
{
//Called repeatedly from a worker thread
HANDLE hHeaps[256] = {0}; //Assume large array for the sake of this example
UINT nNumberHeaps = ::GetProcessHeaps(256, hHeaps);
if(nNumberHeaps > 256)
nNumberHeaps = 256;
std::vector<MEM_PAGE_TO_LOCK> arrPages;
for(UINT i = 0; i < nNumberHeaps; i++)
{
lockUnlockHeapAndWalkIt(hHeaps[i], arrPages);
}
//Now lock collected virtual memory
for(size_t p = 0; p < arrPages.size(); p++)
{
::VirtualLock((void*)arrPages[p].pBaseAddr, arrPages[p].szcbBlockSz);
}
}
void lockUnlockHeapAndWalkIt(HANDLE hHeap, std::vector<MEM_PAGE_TO_LOCK>& arrPages)
{
if(::HeapLock(hHeap))
{
__try
{
walkHeapAndCollectVMPages(hHeap, arrPages);
}
__finally
{
::HeapUnlock(hHeap);
}
}
}
void walkHeapAndCollectVMPages(HANDLE hHeap, std::vector<MEM_PAGE_TO_LOCK>& arrPages)
{
PROCESS_HEAP_ENTRY phe = {0};
MEM_PAGE_TO_LOCK mptl;
SYSTEM_INFO si = {0};
::GetSystemInfo(&si);
for(;;)
{
//Get next heap block
if(!::HeapWalk(hHeap, &phe))
{
if(::GetLastError() != ERROR_NO_MORE_ITEMS)
{
//Some other error
ASSERT(NULL);
}
break;
}
//We need to skip heap regions & uncommitted areas
//We're interested only in allocated blocks
if((phe.wFlags & (PROCESS_HEAP_REGION |
PROCESS_HEAP_UNCOMMITTED_RANGE | PROCESS_HEAP_ENTRY_BUSY)) == PROCESS_HEAP_ENTRY_BUSY)
{
if(phe.cbData &&
phe.lpData)
{
//Get address aligned at the page size boundary
size_t nRmndr = (size_t)phe.lpData % si.dwPageSize;
BYTE* pBegin = (BYTE*)((size_t)phe.lpData - nRmndr);
//Get segment size, also page aligned (round it up though)
BYTE* pLast = (BYTE*)phe.lpData + phe.cbData;
nRmndr = (size_t)pLast % si.dwPageSize;
if(nRmndr)
pLast += si.dwPageSize - nRmndr;
size_t szcbSz = pLast - pBegin;
//Do we have such a block already, or an adjacent one?
std::vector<MEM_PAGE_TO_LOCK>::iterator itr = arrPages.begin();
for(; itr != arrPages.end(); ++itr)
{
const BYTE* pLPtr = itr->pBaseAddr + itr->szcbBlockSz;
//See if they intersect or are adjacent
if(pLPtr >= pBegin &&
itr->pBaseAddr <= pLast)
{
//Intersected with another memory block
//Get the larger of the two
if(pBegin < itr->pBaseAddr)
itr->pBaseAddr = pBegin;
itr->szcbBlockSz = pLPtr > pLast ? pLPtr - itr->pBaseAddr : pLast - itr->pBaseAddr;
break;
}
}
if(itr == arrPages.end())
{
//Add new page
mptl.pBaseAddr = pBegin;
mptl.szcbBlockSz = szcbSz;
arrPages.push_back(mptl);
}
}
}
}
}
This method works, except that rarely the following happens: the app hangs, UI and all, and even if I run it under the Visual Studio debugger and try Break All, it shows an error message that no user-mode threads are running:
The process appears to be deadlocked (or is not running any user-mode
code). All threads have been stopped.
I tried it several times. The second time the app hung, I used the Task Manager to create a dump file, then loaded the .dmp file into Visual Studio and analyzed it. The debugger showed that the deadlock happened somewhere in the kernel, and the call stack points to this location in my code:
CString str;
str.Format(L"Some formatting value=%d, %s", value, etc);
Experimenting further with it, if I remove the HeapLock and HeapUnlock calls from the code above, it doesn't seem to hang anymore. But then HeapWalk may occasionally raise an unhandled access-violation exception.
So any suggestions how to resolve this?
The problem is that you're using the C runtime's memory management, and more specifically the CRT's debug heap, while holding the operating system's heap lock.
The call stack you've posted includes _free_dbg, which always claims the CRT debug heap lock before taking any other action, so we know the thread holds the CRT debug heap lock. We can also see that the CRT was inside an operating system call made by _CrtIsValidHeapPointer when the deadlock occurred; the only such call is to HeapValidate and HEAP_NO_SERIALIZE is not specified.
So the thread whose call stack has been posted is holding the CRT debug heap lock and attempting to claim the operating system's heap lock.
The worker thread, on the other hand, holds the operating system's heap lock and makes calls that attempt to claim the CRT debug heap lock.
QED. Classic deadlock situation.
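To make the pattern concrete, here is a schematic sketch with two generic critical sections (this illustrates the lock-ordering problem only; it is not the actual CRT or heap code):
CRITICAL_SECTION g_lockA; // stands in for the CRT debug heap lock
CRITICAL_SECTION g_lockB; // stands in for the OS heap lock
// (both initialized elsewhere with InitializeCriticalSection)
DWORD WINAPI ThreadOne(LPVOID)
{
    ::EnterCriticalSection(&g_lockA);  // holds A...
    ::EnterCriticalSection(&g_lockB);  // ...and now wants B
    ::LeaveCriticalSection(&g_lockB);
    ::LeaveCriticalSection(&g_lockA);
    return 0;
}
DWORD WINAPI ThreadTwo(LPVOID)
{
    ::EnterCriticalSection(&g_lockB);  // holds B...
    ::EnterCriticalSection(&g_lockA);  // ...and now wants A
    ::LeaveCriticalSection(&g_lockA);
    ::LeaveCriticalSection(&g_lockB);
    return 0;
}
// If each thread acquires its first lock before the other reaches its second,
// both wait forever for the lock the other holds.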
In a debug build, you will need to refrain from using any C or C++ library functions that might allocate or free memory while you are holding the corresponding operating system heap lock.
Even in a release build, you would still need to avoid any library functions that might allocate or release memory while holding a lock, which might be a problem if, for example, a hypothetical future implementation of std::vector was changed to make it thread-safe.
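One way to reduce that risk in the posted code is to make sure the vector never has to grow while a heap is locked, for example by reserving capacity up front. This is only a sketch: the 4096-entry bound is an assumption, and it addresses only the vector's own allocations, not other library calls.
std::vector<MEM_PAGE_TO_LOCK> arrPages;
// Reserve before any heap is locked, so that push_back inside
// walkHeapAndCollectVMPages does not allocate while the lock is held.
arrPages.reserve(4096); // assumed upper bound on the number of entries
for(UINT i = 0; i < nNumberHeaps; i++)
{
    lockUnlockHeapAndWalkIt(hHeaps[i], arrPages);
}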
I recommend that you avoid the issue entirely, which is probably best done by creating a dedicated heap for your worker thread and satisfying all of the allocations it needs from that heap. It would probably also be best to exclude this dedicated heap from processing; the documentation for HeapWalk does not explicitly say that you should not modify a heap while enumerating it, but it seems risky.
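Here is a minimal sketch of that approach; the handle name g_hWorkerHeap, the fixed 4096-entry buffer, and the decision to simply skip the private heap during enumeration are assumptions on my part, not part of your code:
HANDLE g_hWorkerHeap = ::HeapCreate(0, 0, 0); // created once, e.g. at start-up
void WorkerThread(LPVOID /*pVoid*/)
{
    HANDLE hHeaps[256] = {0};
    UINT nNumberHeaps = ::GetProcessHeaps(256, hHeaps);
    if(nNumberHeaps > 256)
        nNumberHeaps = 256;
    // Bookkeeping storage comes from the private heap, so nothing we allocate
    // while walking the other heaps touches the CRT heap or the default heap.
    const size_t nMaxPages = 4096; // assumed upper bound
    MEM_PAGE_TO_LOCK* pPages = (MEM_PAGE_TO_LOCK*)::HeapAlloc(
        g_hWorkerHeap, HEAP_ZERO_MEMORY, nMaxPages * sizeof(MEM_PAGE_TO_LOCK));
    size_t nPages = 0;
    for(UINT i = 0; i < nNumberHeaps; i++)
    {
        if(hHeaps[i] == g_hWorkerHeap)
            continue; // never lock or walk our own private heap
        // ... lock, walk and collect into pPages/nPages much as before ...
    }
    for(size_t p = 0; p < nPages; p++)
        ::VirtualLock((void*)pPages[p].pBaseAddr, pPages[p].szcbBlockSz);
    ::HeapFree(g_hWorkerHeap, 0, pPages);
}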
I am learning multi-threading on the Linux platform. I wrote this small program to get comfortable with the concepts. When I run the executable, I see no error, but it does not print Hi either. So I added a sleep in the thread function after the print, yet I still cannot see the output on the console.
I also want to know which thread does the printing at run time. Can anyone help me?
#include <iostream>
#include <unistd.h>
#include <pthread.h>
using std::cout;
using std::endl;
void* print (void* data)
{
cout << "Hi" << endl;
sleep(10000000);
}
int main (int argc, char* argv[])
{
int t1 = 1, t2 =2, t3 = 3;
pthread_t thread1, thread2, thread3;
int thread_id_1, thread_id_2, thread_id_3;
thread_id_1 = pthread_create(&thread1, NULL, print, 0);
thread_id_2 = pthread_create(&thread2, NULL, print, 0);
thread_id_3 = pthread_create(&thread3, NULL, print, 0);
return 0;
}
Your main thread probably exits, and thus the entire process dies, so the threads don't get a chance to run. It's also possible (quite unlikely, but possible) that you'd see the output from the threads even with your code as-is, if the threads complete before the main thread exits. But you can't rely on that.
Call pthread_join() on each thread after the pthread_create() calls in main(); it suspends the calling thread until the thread with the given thread ID returns:
pthread_join(thread1, NULL);
pthread_join(thread2, NULL);
pthread_join(thread3, NULL);
You can also use an array of pthread_t, which lets you run the pthread_create() and pthread_join() calls in for loops.
Alternatively, end main() with pthread_exit(0), which exits only the calling thread; the threads you created will continue to run.
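A minimal sketch combining those suggestions (illustrative code with error checks elided, not taken from the question):
#include <pthread.h>
void* print(void* data); // same thread function as above
int main()
{
    const int kThreads = 3;
    pthread_t threads[kThreads];
    for (int i = 0; i < kThreads; ++i)
        pthread_create(&threads[i], NULL, print, NULL);
    for (int i = 0; i < kThreads; ++i)
        pthread_join(threads[i], NULL); // wait for each thread to finish
    // Alternatively, drop the join loop and end only the main thread,
    // letting the created threads keep running:
    // pthread_exit(0);
    return 0;
}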
Note that your thread function should return a pointer or NULL:
void* print (void* data)
{
cout << "Hi" << endl;
return NULL;
}
I'm also not sure about the very long sleep() right before the thread function returns; it is unnecessary and only keeps the threads from exiting. Probably not something you wanted.
I'm new to C++ and experimenting a bit. I created a thread using the CreateThread method and passed in a pointer to a struct, which contains a test value. So far so good. Now I set the value to 0 BEFORE starting the thread. When I run the program, everything seems to work as I expected. But when I also change the value AFTER the CreateThread call (to 1), the thread always outputs 1. I expected an output of 0, because I thought that changes to the variable would not affect the thread, or at least an output of 0 first and then 1.
Now I tested something:
If I let the main thread wait, for example 5000 ms, then the thread's output is again always 0. But without the wait it is 1 again.
I hope you can help me understand this behavior.
Here is some code:
struct TESTSTRUCT{
int test;
};
DWORD WINAPI testFunc(void* lpParam){
TESTSTRUCT& params=*((TESTSTRUCT*)lpParam);
int a=params.test;
while(true){
cout << a;
Sleep(5000);
}
return 0;
}
int main(int argc, char *argv[]) {
TESTSTRUCT teststr;
teststr.test=0;
HANDLE myhandle;
myhandle = CreateThread(0, 0, testFunc, &teststr, 0, 0);
Sleep(5000); // If left out, thread outputs always 1!
teststr.test=1;
cout << "main" << teststr.test;
CloseHandle(myhandle);
MSG msg;
while(GetMessage(&msg, NULL, 0, 0))
{
TranslateMessage(&msg);
DispatchMessage(&msg);
}
}
See this:
myhandle = CreateThread(0, 0, testFunc, &teststr, 0, 0);
You are passing your teststr by reference to your testFunc. Therefore your main method and your testFunc are operating on the same data. Changing one will affect the other.
To be clear, the way it works is that you are getting the address of teststr, casting it to void*, and then within testFunc you are casting it back to TESTSTRUCT&. This is the same thing as if you just passed teststr to testFunc by reference.
I suppose I should mention that without locking you have a race condition on the output of the thread, no matter what the sleep period is. Make main sleep long enough and in general you'll see your thread print the 'old' value, but there's no guarantee.
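If the intention was for the thread to see a snapshot of the value as it was when the thread started, one option (just a sketch, not the only fix) is to give the thread its own heap-allocated copy of the struct and let the thread delete it, so later changes in main() cannot reach it:
DWORD WINAPI testFunc(void* lpParam){
    TESTSTRUCT* params = (TESTSTRUCT*)lpParam;
    int a = params->test; // snapshot taken from the thread's private copy
    delete params;        // the thread owns its copy
    while(true){
        cout << a;
        Sleep(5000);
    }
    return 0;
}
// in main():
TESTSTRUCT* copy = new TESTSTRUCT(teststr); // per-thread copy
HANDLE myhandle = CreateThread(0, 0, testFunc, copy, 0, 0);
teststr.test = 1; // no longer visible to the thread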
This question seems to be asked a lot. I had some legacy production code that was seemingly fine, until it started getting many more connections per day. Each connection kicked off a new thread. Eventually, it would exhaust memory and crash.
I'm going back over pthread (and C sockets) which I've not dealt with in years. The tutorial I had was informative, but I'm seeing the same thing when I use top. All the threads exit, but there's still some virtual memory taken up. Valgrind tells me there is a possible memory loss when calling pthread_create(). The very basic sample code is below.
The scariest part is that pthread_exit( NULL ) seems to leave about 100m in VIRT unaccounted for when all the threads exit. If I comment out this line, it's much more liveable, but there is still some left. On my system it starts with about 14k, and ends with 47k.
If I up the thread count to 10,000, VIRT goes up to 70+ gigs, but finishes somewhere around 50k, assuming I comment out pthread_exit( NULL ). If I use pthread_exit( NULL ) it finishes with about 113m still in VIRT. Are these acceptable? Is top not telling me everything?
void* run_thread( void* id )
{
int thread_id = *(int*)id;
int count = 0;
while ( count < 10 ) {
sleep( 1 );
printf( "Thread %d at count %d\n", thread_id, count++ );
}
pthread_exit( NULL );
return 0;
}
int main( int argc, char* argv[] )
{
sleep( 5 );
int thread_count = 0;
while( thread_count < 10 ) {
pthread_t my_thread;
if ( pthread_create( &my_thread, NULL, run_thread, (void*)&thread_count ) < 0 ) {
perror( "Error making thread...\n" );
return 1;
}
pthread_detach( my_thread );
thread_count++;
sleep( 1 );
}
pthread_exit( 0 ); // added as per request
return 0;
}
I know this is a rather old question, but I hope others will benefit.
This is indeed a memory leak. The thread is created with default attributes, and by default a thread is joinable. A joinable thread keeps its underlying bookkeeping until it has finished... and been joined.
If a thread is never joined, create it with the detached attribute set; all (thread) resources will then be freed once the thread terminates.
Here's an example:
pthread_attr_t attr;
pthread_t thread;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create(&thread, &attr, &threadfunction, NULL);
pthread_attr_destroy(&attr);
Prior to your edit of adding pthread_exit(0) to the end of main(), your program would finish executing before all the threads had finished running. valgrind thus reported the resources that were still being held by the threads that were still active at the time the program terminated, making it look like your program had a memory leak.
The call to pthread_exit(0) in main() ends only the main thread, not the process, so the process keeps running until all the other spawned threads have exited. This lets valgrind observe a clean run in terms of memory utilization.
(I am assuming linux is your operating system below, but it seems you are running some variety of UNIX from your comments.)
The extra virtual memory you see is just Linux assigning some pages to your program since it was a big memory user. As long as your resident memory utilization is low and constant when you reach the idle state, and the virtual utilization is relatively constant, you can assume your system is well behaved.
By default, each thread gets a large stack on Linux (typically 8 MB, taken from the stack size resource limit). If each thread's stack does not need that much space, you can adjust it by initializing a pthread_attr_t and setting a smaller stack size with pthread_attr_setstacksize(). What stack size is appropriate depends on how deep your function call stack grows and how much space the local variables of those functions take.
#include <limits.h> /* for PTHREAD_STACK_MIN */
#define SMALLEST_STACKSZ PTHREAD_STACK_MIN /* absolute minimum the system allows */
#define SMALL_STACK (24*1024)
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setstacksize(&attr, SMALL_STACK);
/* ... */
pthread_create(&my_thread, &attr, run_thread, (void *)thread_count);
/* ... */
pthread_attr_destroy(&attr);
Why does the code sample below cause one thread to execute far more often than the other when using a critical section, while using a mutex instead does not?
#include <windows.h>
#include <conio.h>
#include <process.h>
#include <iostream>
using namespace std;
typedef struct _THREAD_INFO_ {
COORD coord; // a structure containing x and y coordinates
INT threadNumber; // each thread has its own number
INT count;
}THREAD_INFO, * PTHREAD_INFO;
void gotoxy(int x, int y);
BOOL g_bRun;
CRITICAL_SECTION g_cs;
unsigned __stdcall ThreadFunc( void* pArguments )
{
PTHREAD_INFO info = (PTHREAD_INFO)pArguments;
while(g_bRun)
{
EnterCriticalSection(&g_cs);
//if(TryEnterCriticalSection(&g_cs))
//{
gotoxy(info->coord.X, info->coord.Y);
cout << "T" << info->threadNumber << ": " << info->count;
info->count++;
LeaveCriticalSection(&g_cs);
//}
}
ExitThread(0);
return 0;
}
int main(void)
{
// OR unsigned int
unsigned int id0, id1; // a place to store the thread ID returned from CreateThread
HANDLE h0, h1; // handles to theads
THREAD_INFO tInfo[2]; // only one of these - not optimal!
g_bRun = TRUE;
ZeroMemory(&tInfo, sizeof(tInfo)); // win32 function - memset(&buffer, 0, sizeof(buffer))
InitializeCriticalSection(&g_cs);
// setup data for the first thread
tInfo[0].threadNumber = 1;
tInfo[0].coord.X = 0;
tInfo[0].coord.Y = 0;
h0 = (HANDLE)_beginthreadex(
NULL, // no security attributes
0, // default stack size
&ThreadFunc, // pointer to function
&tInfo[0], // each thread gets its own data to output
0, // 0 for running or CREATE_SUSPENDED
&id0 ); // return thread id - reused here
// setup data for the second thread
tInfo[1].threadNumber = 2;
tInfo[1].coord.X = 15;
tInfo[1].coord.Y = 0;
h1 = (HANDLE)_beginthreadex(
NULL, // no security attributes
0, // default stack size
&ThreadFunc, // pointer to function
&tInfo[1], // each thread gets its own data to output
0, // 0 for running or CREATE_SUSPENDED
&id1 ); // return thread id - reused here
_getch();
g_bRun = FALSE;
return 0;
}
void gotoxy(int x, int y) // x=column position and y=row position
{
HANDLE hdl;
COORD coords;
hdl = GetStdHandle(STD_OUTPUT_HANDLE);
coords.X = x;
coords.Y = y;
SetConsoleCursorPosition(hdl, coords);
}
That may not answer your question but the behavior of critical sections changed on Windows Server 2003 SP1 and later.
If you have bugs related to critical sections on Windows 7 that you can't reproduce on an XP machine you may be affected by that change.
My understanding is that on Windows XP critical sections used a FIFO-based strategy that was fair to all threads, while later versions use a new strategy aimed at reducing context switching between threads.
There's a short note about this on the MSDN page about critical sections
You may also want to check this forum post
Critical sections, like mutexes, are designed to protect a shared resource against conflicting access (such as concurrent modification). Critical sections are not meant to replace thread priority.
You have artificially introduced a shared resource (the screen) and made it into a bottleneck. As a result, the critical section is highly contended. Both threads have equal priority, so priority gives Windows no reason to prefer one thread over the other. Reduction of context switches, however, is a reason to pick one thread over another, and as a result of that reduction the utilization of the shared resource goes up. That is a good thing; it means that one thread will be finished a lot earlier and the other thread will finish a bit earlier.
To see the effect graphically, compare
A B A B A B A B A B
to
AAAAA BBBBB
The second sequence is shorter because there's only one switch from A to B.
In hand-wavy terms:
A critical section is the thread saying it wants exclusive control to do a few things together.
A mutex is a marker that says 'busy' so others can wait, plus a notification of completion so somebody else can start. Somebody already waiting for the mutex will grab it before you can loop around and take it back.
So what you are getting with the critical section is a failure to yield between loop iterations. You might see a difference if you added Sleep(0); after LeaveCriticalSection.
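For example, a sketch of the loop body with that one change (whether it actually evens things out on a given Windows version is not guaranteed):
while(g_bRun)
{
    EnterCriticalSection(&g_cs);
    gotoxy(info->coord.X, info->coord.Y);
    cout << "T" << info->threadNumber << ": " << info->count;
    info->count++;
    LeaveCriticalSection(&g_cs);
    Sleep(0); // give up the rest of the time slice so the other thread can take the lock
}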
I can't say why you're observing this particular behavior, but it's probably to do with the specifics of the implementation of each mechanism. What I CAN say is that unlocking then immediately locking a mutex is a bad thing. You will observe odd behavior eventually.
From some MSDN docs (http://msdn.microsoft.com/en-us/library/ms682530.aspx):
Starting with Windows Server 2003 with Service Pack 1 (SP1), threads waiting on a critical section do not acquire the critical section on a first-come, first-serve basis. This change increases performance significantly for most code