I have the following function that is executed on a thread created with _beginthreadex or CreateThread:
static volatile LONG g_executedThreads = 0;
void executeThread(int v) {
    //1. leaks: time_t tt = _time64(NULL);
    //2. leaks: FILETIME ft; GetSystemTimeAsFileTime(&ft);
    //3. no leak: SYSTEMTIME stm; GetSystemTime(&stm);
    InterlockedAdd(&g_executedThreads, 1); // count the number of executions
}
When I uncomment line 1 (a CRT call) or line 2 (a Win32 API call), the thread leaks, and subsequent calls to _beginthreadex fail (GetLastError returns error 8: "Not enough storage is available to process this command").
Memory reported by Process Explorer when _beginthreadex starts to fail:
Private 130 MB, Virtual 150 MB.
But if I uncomment only line 3 (another Win32 API call), no leak happens and there is no failure even after 1 million threads. Here the memory reported is Private 1.4 MB, Virtual 25 MB. This version also ran very fast (20 seconds for 1 million threads, versus 60 seconds for 30,000 in the first case).
I've tested (see here the test code) with Visual Studio 2013, compiled x86 (debug and release), and ran on Win 8.1 x64. After creating about 30,000 threads, _beginthreadex starts failing (most of the calls). I want to mention that there are fewer than 100 simultaneously running threads.
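For reference, the failing scenario can be reproduced with a fire-and-forget driver along these lines (a hypothetical sketch, not the linked test code; the counter names mirror the console output below):

#include <windows.h>
#include <process.h>
#include <errno.h>
#include <stdio.h>

static volatile LONG g_scheduledThreads = 0;

// Hypothetical driver: create threads without waiting on them and count
// how many were scheduled vs. how many actually ran.
static unsigned int __stdcall wrapper(void*) {
    executeThread(1); // the function under test, shown above
    return 0;
}

int main() {
    for (int step = 0; ; step++) {
        unsigned int id;
        HANDLE h = (HANDLE)_beginthreadex(NULL, 0, wrapper, NULL, 0, &id);
        if (h == NULL) {
            printf("_beginthreadex failed. err(%lu); errno(%d). exiting ...\n",
                   GetLastError(), errno);
            break;
        }
        CloseHandle(h); // fire and forget; the thread keeps running
        InterlockedAdd(&g_scheduledThreads, 1);
        if (step % 5000 == 0)
            printf("step:%d, scheduled:%ld, completed:%ld\n",
                   step, g_scheduledThreads, g_executedThreads);
    }
    return 0;
}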
Update 2:
My assumption of at most 100 threads was based on the console output (scheduled is approximately equal to completed), and Process Explorer's Threads tab did not report more than 10 threads.
Here is the console output (no WaitForSingleObject, original code):
step:0, scheduled:1, completed:1
step:5000, scheduled:5001, completed:5000
...
step:25000, scheduled:25001, completed:24999
step:30000, scheduled:30001, completed:30001
_beginthreadex failed. err(8); errno(12). exiting ...
step:31701, scheduled:31712, completed:31710
rerun loop:
step:0, scheduled:31713, completed:31711
_beginthreadex failed. err(8); errno(12). exiting ...
step:6, scheduled:31719, completed:31716
Based on @SHR's & @HarryJohnston's suggestions I've scheduled 64 threads at once and waited for all of them to complete (see the updated code here), but the behaviour is the same. Note that I've also tried one thread at a time, but the failure still happens sporadically. Also, the reserved stack size is 64 KB!
Here is the new schedule function:
static unsigned int __stdcall _beginthreadex_wrapper(void *arg) {
    executeThread(1);
    return 0;
}

const int maxThreadsCount = MAXIMUM_WAIT_OBJECTS;

bool _beginthreadex_factory(int& step) {
    DWORD lastError = 0;
    HANDLE threads[maxThreadsCount];
    int threadsCount = 0;
    while (threadsCount < maxThreadsCount) {
        unsigned int id;
        threads[threadsCount] = (HANDLE)_beginthreadex(NULL,
            64 * 1024, _beginthreadex_wrapper, NULL, STACK_SIZE_PARAM_IS_A_RESERVATION, &id);
        if (threads[threadsCount] == NULL) {
            lastError = GetLastError();
            break;
        }
        else threadsCount++;
    }
    if (threadsCount > 0) {
        WaitForMultipleObjects(threadsCount, threads, TRUE, INFINITE);
        for (int i = 0; i < threadsCount; i++) CloseHandle(threads[i]);
    }
    step += threadsCount;
    g_scheduledThreads += threadsCount;
    if (threadsCount < maxThreadsCount) {
        printf(" %03d sec: step:%d, _beginthreadex failed. err(%d); errno(%d). exiting ...\n",
               getLogTime(), step, lastError, errno);
        return false;
    }
    else return true;
}
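A driver loop consistent with the output below might look like this (hypothetical; getLogTime() and the completed counter are assumptions inferred from the console output):

int main() {
    int step = 0;
    // Keep scheduling batches of 64 until _beginthreadex_factory reports a failure.
    while (_beginthreadex_factory(step)) {
        if (step % 6400 == 0)
            printf(" %03d sec: step:%d, scheduled:%ld, completed:%ld\n",
                   getLogTime(), step, g_scheduledThreads, g_executedThreads);
    }
    return 0;
}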
Here is what is printed on the console:
000 sec: step:6400, scheduled:6400, completed:6400
003 sec: step:12800, scheduled:12800, completed:12800
007 sec: step:19200, scheduled:19200, completed:19200
014 sec: step:25600, scheduled:25600, completed:25600
022 sec: step:32000, scheduled:32000, completed:32000
023 sec: step:32358, _beginthreadex failed. err(8); errno(12). exiting ...
sleep 5 seconds
028 sec: step:32358, scheduled:32358, completed:32358
try to create 2 more times
028 sec: step:32361, _beginthreadex failed. err(8); errno(12). exiting ...
032 sec: step:32361, scheduled:32361, completed:32361
rerun loop: 1
036 sec: step:3, _beginthreadex failed. err(8); errno(12). exiting ...
sleep 5 seconds
041 sec: step:3, scheduled:32364, completed:32364
try to create 2 more times
041 sec: step:5, _beginthreadex failed. err(8); errno(12). exiting ...
045 sec: step:5, scheduled:32366, completed:32366
rerun loop: 2
056 sec: step:2, _beginthreadex failed. err(8); errno(12). exiting ...
sleep 5 seconds
061 sec: step:2, scheduled:32368, completed:32368
try to create 2 more times
061 sec: step:4, _beginthreadex failed. err(8); errno(12). exiting ...
065 sec: step:4, scheduled:32370, completed:32370
Any suggestion/info is welcome.
Thanks.
I guess you've got it wrong.
Take a look at this code:
#include <windows.h>

DWORD WINAPI thread_func(LPVOID p)
{
    Sleep(1000);
    return 0;
}

int main()
{
    for (int i = 0; i < 1000000; i++)
    {
        DWORD id;
        HANDLE h = CreateThread(NULL, 0, thread_func, NULL, 0, &id);
        WaitForSingleObject(h, INFINITE);
        CloseHandle(h); // close the handle, or the handles themselves accumulate
    }
    return 0;
}
A leaking thread will leak just because you're calling it, so the wait doesn't change a thing; but when you watch this in Performance Monitor, you'll see that all the lines stay almost constant.
Now ask yourself: what happens when I remove the WaitForSingleObject?
The creation of threads runs much faster than the threads themselves, so you hit the thread limit per process, or the memory limit per process.
Note that if you are compiling for x86, memory is limited to 4 GB, but only 2 GB of that is user-mode address space; the other 2 GB is reserved for the kernel. If you use the default stack size (1 MB) per thread, and the rest of the program uses no memory at all (which never happens, since you have code...), then you are limited to roughly 2000 threads. Once the 2 GB is exhausted, you can't create more threads until previous threads have finished.
So my conclusion is that you are creating threads without waiting for them, and after some period no memory is left for more threads.
You can check whether this is the case with Performance Monitor, by watching the maximum number of threads in your process.
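If that is the cause, bounding the number of outstanding threads should make the failures disappear. A minimal sketch, assuming a fixed concurrency cap enforced with a semaphore (not the asker's code):

#include <windows.h>
#include <process.h>

static HANDLE g_slots; // each slot represents one outstanding thread

static unsigned int __stdcall worker(void*) {
    Sleep(1000);                        // simulate work
    ReleaseSemaphore(g_slots, 1, NULL); // free our slot on the way out
    return 0;
}

int main() {
    const LONG maxOutstanding = 64;
    g_slots = CreateSemaphoreA(NULL, maxOutstanding, maxOutstanding, NULL);
    for (int i = 0; i < 1000000; i++) {
        WaitForSingleObject(g_slots, INFINITE); // block until a slot frees up
        HANDLE h = (HANDLE)_beginthreadex(NULL, 0, worker, NULL, 0, NULL);
        if (h == NULL) break;                   // creation failed; stop
        CloseHandle(h); // drop the handle; the semaphore tracks completion
    }
    CloseHandle(g_slots);
    return 0;
}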
After uninstalling the antivirus, the failure could no longer be reproduced (and the code now runs as fast as in scenario 3).
Related
I have written a C++ program for thread synchronization. The program works on my computer, whose operating system is Ubuntu 16.04. When running the program on a remote machine with Ubuntu 18.04, a different behaviour arises. I am simply using the sem_timedwait() function on Linux (man page) to cause a timeout if some signals do not arrive on time. Here tfoundry() is called by POSIX threads and shall wait for the signals sem1 and sem2 to continue its execution. After 5 seconds, even if one of the signals has not been received, the thread must stop running.
#include <semaphore.h>
#include <sys/time.h>
#include <time.h>
#include <cstdlib>

struct Foundry {
    sem_t sem1;
    sem_t sem2;
    const struct timespec* abs_timeout;
    struct timeval currentTime;
    struct timespec ts;
};

void* tfoundry(void* f) {
    Foundry* foundry = (Foundry*) f;
    sem_init(&(foundry->sem1), 0, 0);
    sem_init(&(foundry->sem2), 0, 0);
    foundry->abs_timeout = &(foundry->ts);
    gettimeofday(&(foundry->currentTime), NULL);
    // 5 sec for timeout
    foundry->ts.tv_sec = foundry->currentTime.tv_sec + 5;
    if (sem_timedwait(&(foundry->sem1), foundry->abs_timeout) != 0)
        exit(-1);
    if (sem_timedwait(&(foundry->sem2), foundry->abs_timeout) != 0)
        exit(-1);
    return NULL;
}
On my computer, this thread waits for 5 seconds and, if the two signals have not arrived, it simply exits. But on the remote machine it exits immediately, without waiting 5 seconds. I will appreciate any help.
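For comparison, the conventional sem_timedwait pattern builds the absolute timeout with clock_gettime(CLOCK_REALTIME, ...) and initializes both fields of the timespec; an uninitialized tv_nsec (as can happen when the struct is filled from gettimeofday alone) can make sem_timedwait fail immediately with EINVAL on some systems. A minimal sketch, not the asker's program:

#include <semaphore.h>
#include <time.h>
#include <errno.h>
#include <stdio.h>

// Sketch: wait on a semaphore with an absolute timeout a few seconds out.
int wait_with_timeout(sem_t* sem, int seconds) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts); // sem_timedwait expects CLOCK_REALTIME
    ts.tv_sec += seconds;               // ts.tv_nsec is already a valid value
    if (sem_timedwait(sem, &ts) != 0) {
        if (errno == ETIMEDOUT)
            printf("timed out\n");
        return -1;
    }
    return 0;
}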
I am developing C++ code using android-ndk-r15c and trying to run a thread on a specific core of a processor that has 10 ARM cores (not all cores are the same; big.LITTLE architecture). However, not all cores are active all the time. If I try to call sched_setaffinity with a CPU that is inactive, the call returns an error. Here is the sample code.
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <errno.h>
#include <sched.h>
#include <sys/syscall.h>

void getCpus() {
    cpu_set_t my_set;
    int syscallres = sched_getaffinity(0, sizeof(cpu_set_t), &my_set);
    if (syscallres) {
        int err = errno;
        printf("Error in the syscall getaffinity: err=%d\n", err);
    }
    for (unsigned cpu = 0; cpu < 10; cpu++) {
        if (CPU_ISSET(cpu, &my_set)) {
            printf("cpu %d available!!\n", cpu);
        }
    }
}

void setCpu(int cpu) {
    cpu_set_t my_set;
    CPU_ZERO(&my_set);
    CPU_SET(cpu, &my_set);
    int syscallres = sched_setaffinity(0, sizeof(cpu_set_t), &my_set);
    if (syscallres) {
        int err = errno;
        printf("Error in the syscall setaffinity: cpu=%d err=%d\n", cpu, err);
    }
}

int main() {
    getCpus();
    setCpu(3);
}
Sample outputs:
cpu 0 available!!
cpu 1 available!!
Error in the syscall setaffinity: cpu=3 err=22
Another output, from when cpu 3 happened to be active (not due to my code; Android may activate some cores depending on load):
cpu 0 available!!
cpu 1 available!!
cpu 2 available!!
cpu 3 available!!
cpu 4 available!!
How can I activate a specific core via NDK system calls?
I don't think individual cores can be activated, but it looks like...
Android's PowerManager allows you to check whether a sustained performance mode is available; it can be set on a Window by calling setSustainedPerformanceMode.
This should wake up the CPUs for your usage. WakeLocks also look like they warn Android that you want access to more than "idle" resources.
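On the NDK side, a defensive variant of setCpu can at least detect the offline case up front by intersecting the request with the currently permitted set (a sketch under the asker's setup; it does not activate the core):

#include <sched.h>
#include <errno.h>
#include <stdio.h>

// Sketch: pin the calling thread to `cpu` only if that CPU is currently
// in our affinity mask; otherwise report it and leave the mask unchanged.
bool setCpuIfAvailable(int cpu) {
    cpu_set_t avail;
    if (sched_getaffinity(0, sizeof(avail), &avail) != 0)
        return false;
    if (!CPU_ISSET(cpu, &avail)) {
        printf("cpu %d is currently offline; not pinning (setaffinity would fail with EINVAL)\n", cpu);
        return false;
    }
    cpu_set_t want;
    CPU_ZERO(&want);
    CPU_SET(cpu, &want);
    return sched_setaffinity(0, sizeof(want), &want) == 0;
}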
The following simple code doesn't behave as I expected. It creates a thread (suspended), starts it, waits 1 millisecond for it to run, and loops, waiting until the thread dies or the wait fails.
I expected the output to be something along the lines of:
Start
Callback running
Callback running
Callback running
WaitForSingleObject looping
Callback running
Callback running
WaitForSingleObject looping
Callback running
Callback running
WaitForSingleObject looping
Callback running
Callback running
... repeating for 10000 times
End
Thread end
But the output is:
Start
Callback running
Callback running
Callback running
Callback running
Callback running
... repeating for 10000 times
Callback running
End
WaitForSingleObject looping
Thread end
I thought the wait in WaitForSingleObject would time out at some point and interrupt the thread? But the thread seems to block rather than run asynchronously?
DWORD WINAPI callback(LPVOID param)
{
printf("Start\n");
for (int i=10000; i>0; i--)
printf("Callback running\n");
printf("End\n");
return 1;
}
int main()
{
HANDLE hThread = CreateThread(NULL, 0, callback, 0, CREATE_SUSPENDED, 0);
if (!hThread) {
printf("Failed to create thread\n");
return 0;
}
ResumeThread(hThread);
while (WaitForSingleObject(hThread, 1) == WAIT_TIMEOUT) {
printf("WaitForSingleObject looping\n");
}
CloseHandle(hThread);
printf("Thread end\n");
system("PAUSE");
return 0;
}
The dwMilliseconds parameter in WaitForSingleObject cannot be relied upon for accurate timing. The only contract is that after that much time has elapsed, the thread will eventually wake up and return the TIMEOUT value. The thread may not wake up until its next scheduled quanta, which can be as high as 60 milliseconds (or even higher on Windows Server). This is more than enough time for the second thread to complete. Try increasing the iteration count such that the worker thread takes at least one second to run - that should be plenty of time for the primary thread to be scheduled and run at least one more iteration of the TIMEOUT loop.
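To observe the slack directly, you can time how long a 1 ms wait actually blocks (a small sketch; the numbers vary with the machine and timer resolution):

#include <windows.h>
#include <stdio.h>

int main() {
    HANDLE e = CreateEventA(NULL, TRUE, FALSE, NULL); // never signaled
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    for (int i = 0; i < 5; i++) {
        QueryPerformanceCounter(&t0);
        WaitForSingleObject(e, 1); // request a 1 ms timeout
        QueryPerformanceCounter(&t1);
        printf("requested 1 ms, actually waited %.2f ms\n",
               (t1.QuadPart - t0.QuadPart) * 1000.0 / freq.QuadPart);
    }
    CloseHandle(e);
    return 0;
}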
I am attempting to add handle-leak detection to the unit test framework in my code (Windows 7, x64, VS2010).
I basically call GetProcessHandleCount() before and after each unit test.
This works fine except when threads are created/destroyed as part of the test.
It seems that Windows occasionally creates 1-3 events on thread shutdown. Running the same test in a loop does not increase the event creation count (e.g. running the test 5000 times in a loop only results in 1-3 extra events being created).
I do not create events manually in my own code.
It seems that this is similar to this problem:
boost::thread causing small event handle leak?
but I am doing manual thread creation/shutdown.
I followed this code:
http://blogs.technet.com/b/yongrhee/archive/2011/12/19/how-to-troubleshoot-a-handle-leak.aspx
And got this callstack from WinDbg:
Outstanding handles opened since the previous snapshot:
--------------------------------------
Handle = 0x0000000000000108 - OPEN
Thread ID = 0x00000000000030dc, Process ID = 0x0000000000000c90
0x000000007715173a: ntdll!NtCreateEvent+0x000000000000000a
0x0000000077133f26: ntdll!RtlpCreateCriticalSectionSem+0x0000000000000026
0x0000000077133ee3: ntdll!RtlpWaitOnCriticalSection+0x000000000000014e
0x000000007714e40b: ntdll!RtlEnterCriticalSection+0x00000000000000d1
0x0000000077146ad2: ntdll!LdrShutdownThread+0x0000000000000072
0x0000000077146978: ntdll!RtlExitUserThread+0x0000000000000038
0x0000000076ef59f5: kernel32!BaseThreadInitThunk+0x0000000000000015
0x000000007712c541: ntdll!RtlUserThreadStart+0x000000000000001d
--------------------------------------
As you can see, this is an event created during thread shutdown.
Is there a better way of doing this handle leak detection in unit tests? My only current options are:
Forget trying to do this handle leak detection
Spin up some dummy tasks to attempt to create these spurious events.
Allow some small tolerance for leaks and run each test hundreds of times (so actual leaks will show up as a large number)
Get the handle count excluding events (difficult amount of code)
I have also tried switching to std::thread in VS2013, but it seems to create a lot of background threads and handles when used (making the count difference much worse).
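A cheap variant of the second option is to warm the process up before taking the baseline, so that any events the runtime lazily allocates on thread shutdown already exist (a sketch; the thread body is a deliberate no-op):

#include <windows.h>
#include <process.h>

static unsigned int __stdcall noop(void*) { return 0; }

// Sketch: create and join a few throwaway threads so lazily created
// critical-section events exist before the baseline handle count is taken.
void WarmUpHandleBaseline() {
    for (int i = 0; i < 8; i++) {
        HANDLE h = (HANDLE)_beginthreadex(NULL, 0, noop, NULL, 0, NULL);
        if (h != NULL) {
            WaitForSingleObject(h, INFINITE);
            CloseHandle(h);
        }
    }
}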
Here is a self-contained example where, 99+% of the time on my computer, an event is created behind the scenes (the handle count differs). Putting the startup/shutdown code in a loop indicates that it does not leak directly, but accumulates the occasional event:
#include "stdio.h"
#include <Windows.h>
#include <process.h>
#define THREADCOUNT 3
static HANDLE s_semCommand, s_semRender;
static unsigned __stdcall ExecutiveThread(void *)
{
WaitForSingleObject(s_semCommand, INFINITE);
ReleaseSemaphore(s_semRender, THREADCOUNT - 1, NULL);
return 0;
}
static unsigned __stdcall WorkerThread(void *)
{
WaitForSingleObject(s_semRender, INFINITE);
return 0;
}
int main(int argc, char* argv[])
{
DWORD oldHandleCount = 0;
GetProcessHandleCount(GetCurrentProcess(), &oldHandleCount);
s_semCommand = CreateSemaphoreA(NULL, 0, 0xFFFF, NULL);
s_semRender = CreateSemaphoreA(NULL, 0, 0xFFFF, NULL);
// Spool threads up
HANDLE threads[THREADCOUNT];
for (int i = 0; i < THREADCOUNT; i++)
{
threads[i] = (HANDLE)_beginthreadex(NULL, 4096, (i==0) ? ExecutiveThread : WorkerThread, NULL, 0, NULL);
}
// Signal shutdown - Wait for threads and close semaphores
ReleaseSemaphore(s_semCommand, 1, NULL);
for (int i = 0; i < THREADCOUNT; i++)
{
WaitForSingleObject(threads[i], INFINITE);
CloseHandle(threads[i]);
}
CloseHandle(s_semCommand);
CloseHandle(s_semRender);
DWORD newHandleCount = 0;
GetProcessHandleCount(GetCurrentProcess(), &newHandleCount);
printf("Handle %d -> %d", oldHandleCount, newHandleCount);
return 0;
}
I have a DLL that includes a function called ReadPort, which reads data from a serial COM port, written in C/C++. This function is called on a worker thread, created from another WINAPI function with _beginthreadex. When the COM port has data to be read, the worker thread returns the data and ends normally, the calling thread closes the worker thread's handle, and the DLL works fine.
However, if ReadPort is called without data pending on the COM port, when the timeout occurs WaitForSingleObject returns WAIT_TIMEOUT but the worker thread never ends. As a result, virtual memory grows by about 1 MB each time, physical memory grows by some KB, and the application that calls the DLL becomes unstable. I also tried using TerminateThread(), but I got the same results.
I have to admit that although I have enough development experience, I am not familiar with C/C++. I did a lot of research before posting, but unfortunately I didn't manage to solve my problem.
Does anyone have a clue how I could solve this problem? I really want to stick to this kind of solution. Also, I want to mention that I think I can't use global variables to implement some kind of extra events, because each of the DLL's functions may be called many times, for every COM port.
I post some parts of my code below:
The Worker Thread:
unsigned int __stdcall ReadPort(void* readstr) {
    DWORD dwError;
    int rres;
    DWORD dwCommModemStatus, dwBytesTransferred;
    int ret;
    char szBuff[64] = "";
    ReadParams* params = (ReadParams*)readstr;

    ret = SetCommMask(params->param2, EV_RXCHAR | EV_CTS | EV_DSR | EV_RLSD | EV_RING);
    if (ret == 0) {
        _endthreadex(0);
        return -1;
    }
    ret = WaitCommEvent(params->param2, &dwCommModemStatus, 0);
    if (ret == 0) {
        _endthreadex(0);
        return -2;
    }
    ret = SetCommMask(params->param2, EV_RXCHAR | EV_CTS | EV_DSR | EV_RLSD | EV_RING);
    if (ret == 0) {
        _endthreadex(0);
        return -3;
    }
    if (dwCommModemStatus & EV_RXCHAR || dwCommModemStatus & EV_RLSD) {
        rres = ReadFile(params->param2, szBuff, 64, &dwBytesTransferred, NULL);
        if (rres == 0) {
            switch (dwError = GetLastError()) {
            case ERROR_HANDLE_EOF:
                _endthreadex(0);
                return -4;
            }
            _endthreadex(0);
            return -5;
        }
        else {
            strcpy(params->param1, szBuff);
            _endthreadex(0);
            return 0;
        }
    }
    else {
        _endthreadex(0);
        return 0;
    }
}
The Calling Thread:
int WINAPI StartReadThread(HANDLE porthandle, HWND windowhandle) {
    HANDLE hThread;
    unsigned threadID;
    ReadParams readstr;
    DWORD ret, ret2;

    readstr.param2 = porthandle;
    hThread = (HANDLE)_beginthreadex(NULL, 0, ReadPort, &readstr, 0, &threadID);

    ret = WaitForSingleObject(hThread, 500);
    if (ret == WAIT_OBJECT_0) {
        CloseHandle(hThread);
        if (readstr.param1 != NULL)
            // Send message to GUI
            return 0;
    }
    else if (ret == WAIT_TIMEOUT) {
        ret2 = CloseHandle(hThread);
        return -1;
    }
    else {
        ret2 = CloseHandle(hThread);
        if (ret2 == 0)
            return -2;
    }
}
Thank you in advance,
Sna.
Don't use WaitCommEvent. You can call ReadFile even when there is no data waiting.
Use SetCommTimeouts to make ReadFile itself time out, instead of building a timeout on top of the inter-thread communication.
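For illustration, the timeout could be moved onto the port handle itself, something like the following sketch (the 500 ms budget mirrors the wait in StartReadThread; the values are assumptions):

#include <windows.h>

// Sketch: make ReadFile itself return within ~500 ms with whatever has
// arrived, instead of timing out the worker thread from the outside.
BOOL ConfigureReadTimeout(HANDLE hPort) {
    COMMTIMEOUTS timeouts = { 0 };
    timeouts.ReadIntervalTimeout        = 50;  // max gap between bytes (ms)
    timeouts.ReadTotalTimeoutMultiplier = 0;   // no per-byte component
    timeouts.ReadTotalTimeoutConstant   = 500; // overall budget per read (ms)
    return SetCommTimeouts(hPort, &timeouts);
}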
Change the delay in the WaitForSingleObject call to 5000 or 10000 and I bet your problem frequency goes way down.
Edwin's answer is also valid. The spawned thread does not die merely because you closed the thread handle.
There is no guarantee that the ReadPort thread has even started by the time you time out. Windows takes a LONG time to start a thread.
Here are some suggestions:
You never check the return value of _beginthreadex. How do you know the thread started?
Use whatever synchronization method you are comfortable with to sync the ReadPort thread's startup with StartReadThread. It could be as simple as an integer flag that ReadPort sets to 1 when it is ready to work; the main thread can then begin its true wait at that point. Otherwise, short of using a debugger, you'll never know what's happening between the two threads. Do not let the call to WaitForSingleObject in StartReadThread time out until your sync method indicates that ReadPort is working.
You should not use strcpy to copy the bytes received from the serial port with ReadFile. ReadFile tells you how many bytes it read; use that value and memcpy to fill the buffer.
Look here and here for info on how to have ReadFile time out, so your reads are not indefinite. Blocking forever on Windows is a recipe for disaster, as it can cause zombie processes you cannot kill, among other problems.
You communicate no status to StartReadThread about what happened in the ReadPort thread. How do you know how many bytes ReadPort placed into szBuff? To get the thread's exit code, use GetExitCodeThread (documented here). Note that you cannot use GetExitCodeThread once you've closed the thread handle.
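Tying the last two points together: after a successful wait, the exit code can be read before the handle is closed (a sketch; note that the posted ReadPort calls _endthreadex(0) before every return, which ends the thread with exit code 0 and makes those return statements unreachable, so they would need to be removed for this to report anything useful):

// Sketch: inside StartReadThread, read the worker's result before closing.
DWORD waitResult = WaitForSingleObject(hThread, 500);
if (waitResult == WAIT_OBJECT_0) {
    DWORD exitCode = 0;
    if (GetExitCodeThread(hThread, &exitCode)) {
        // exitCode is the value ReadPort returned (0, -1, -2, ...)
    }
    CloseHandle(hThread); // only close after the exit code has been read
}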
In your calling thread, after a timeout you close the thread handle. This only stops you from using the handle; the worker thread is still running. You should use a loop that waits again.
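In code, that might look like the following sketch: keep the handle, keep waiting, and only close it once the thread has actually finished.

// Sketch: instead of abandoning the worker after one timeout, keep waiting.
DWORD ret;
do {
    ret = WaitForSingleObject(hThread, 500);
    if (ret == WAIT_TIMEOUT) {
        // still running: report progress, signal the worker to stop, etc.
    }
} while (ret == WAIT_TIMEOUT);
CloseHandle(hThread); // the thread has exited (or the wait failed)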