We have a standalone VC++ application to which we added logging using log4cxx 0.10.0.
The application starts a thread for a time-consuming operation, and if the operation takes longer than a threshold, the main thread kills that thread with TerminateThread. The child thread function also contains some logging calls. log4cxx is configured with a rolling file appender with a 1 MB file size and 5 backup copies. Logging works fine in most scenarios, but in some cases the main thread's logging call hangs after the child thread has been killed, and the entire application hangs as a result.
Subsequent instances of the application also hang. We took full crash dumps of the application and analyzed them with WinDbg.
Here is the call stack of the application:
00 ntdll!NtWaitForSingleObject+0xa
01 ntdll!RtlpWaitOnCriticalSection+0xe8
02 ntdll!RtlEnterCriticalSection+0xd1
03 log4cxx!log4cxx::filter::DenyAllFilter::decide+0x194
04 log4cxx!log4cxx::helpers::synchronized::synchronized+0x31
05 log4cxx!log4cxx::Logger::callAppenders+0x81
06 log4cxx!log4cxx::Logger::forcedLog+0xe5
07 Test!CXX_LOG(int LOG_TYPE = 0n2, char * format = 0x00000001`3f2a2ad8 "Main thread pint...")+0x463 [d:\test\saf\test.cpp # 2360]
08 test!TestFunction(int argc = 0n3, char ** argv = 0x00000001`3f2ae880, int level = 0n1)+0x586 [d:\test\saf\test.cpp # 1634]
09 test!main(int argc = 0n4, char ** argv = 0x00000000`00282920)+0x1820 [d:\test\saf\test.cpp # 2309]
0a test!__tmainCRTStartup(void)+0x13b [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\crt0.c # 278]
0b kernel32!BaseThreadInitThunk+0xd
0c ntdll!RtlUserThreadStart+0x1d
Subsequent instances of the application hang while locking the log file. The call stack of such an instance is as follows:
ntdll!ZwLockFile+0xa
KERNELBASE!LockFileEx+0xb2
kernel32!LockFileEx+0x1b
log4cxx!log4cxx::filter::DenyAllFilter::decide+0x2a89
log4cxx!log4cxx::helpers::DatagramPacket::setData+0x559c
log4cxx!log4cxx::helpers::FileOutputStream::write+0x82
log4cxx!log4cxx::rolling::RollingFileAppenderSkeleton::getTriggeringPolicy+0x1ca
log4cxx!log4cxx::helpers::OutputStreamWriter::write+0xbe
log4cxx!log4cxx::WriterAppender::subAppend+0x7c
log4cxx!log4cxx::rolling::RollingFileAppenderSkeleton::subAppend+0xd0
log4cxx!log4cxx::WriterAppender::append+0x31
log4cxx!log4cxx::AppenderSkeleton::doAppend+0x293
log4cxx!log4cxx::helpers::AppenderAttachableImpl::appendLoopOnAppenders+0x40
log4cxx!log4cxx::Logger::callAppenders+0xa3
log4cxx!log4cxx::Logger::forcedLog+0xe5
test!CXX_LOG(int LOG_TYPE = 0n2, char * format = 0x00000001`3f2a3868 "Starting the application")+0x463
test!main(int argc = 0n4, char ** argv = 0x00000000`00162920)+0x1806
test!__tmainCRTStartup(void)+0x13b
kernel32!BaseThreadInitThunk+0xd
ntdll!RtlUserThreadStart+0x21
We checked the decide function and it has nothing to do with locking; it just returns a constant value. I have read that log4cxx is thread-safe. The issue does not occur frequently, so we do not have steps to reproduce it consistently.
Is there anything that needs to be taken care of when killing the child thread?
Redesign your application. TerminateThread is inherently unsafe because resources in use by the thread are not released. You managed to terminate the thread while it was holding a lock, and now your main thread is trying to acquire that same lock. Find a different way to stop the thread.
Here is the lock in that stack trace: https://apache.googlesource.com/log4cxx/+/e3db59080a3506f0ed23e98cbcb2be58f0b15a20/src/main/cpp/logger.cpp#93
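One possible way to avoid TerminateThread, sketched in portable C++11 rather than with the Win32 thread API used in the question: the worker checks a stop flag between units of work, and the main thread sets the flag and joins once the threshold expires. The class name CancellableWorker and its members are hypothetical, and the short sleep only stands in for the real time-consuming operation. Because the worker can only exit at the flag check, it never disappears in the middle of a logging call while holding the logger's lock.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical worker: does its job in small steps and checks a stop flag
// between steps, so it never has to be killed from the outside.
class CancellableWorker {
public:
    CancellableWorker() : stop_(false), steps_(0) {}
    ~CancellableWorker() { requestStopAndJoin(); }

    void start() {
        if (thread_.joinable()) return;  // already running
        thread_ = std::thread([this] {
            while (!stop_.load()) {
                // One unit of work. Any logging done here has returned
                // (and released its lock) before the flag is checked again.
                ++steps_;
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
        });
    }

    // Ask the worker to stop, then wait until it has actually returned.
    void requestStopAndJoin() {
        stop_.store(true);
        if (thread_.joinable()) thread_.join();
    }

    bool running() const { return thread_.joinable(); }
    int steps() const { return steps_.load(); }

private:
    std::atomic<bool> stop_;
    std::atomic<int> steps_;
    std::thread thread_;
};
```

This only works if the long operation can be split into steps short enough to check the flag; a single blocking call would need its own timeout or cancellation mechanism.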
My attempt
I created a minimal, CRT-free executable with no additional dependencies in Microsoft Visual Studio by specifying the /GS- compiler flag and the /NODEFAULTLIB linker flag, and by naming the main function mainCRTStartup. The application does not create additional threads and returns from mainCRTStartup in under 5 seconds, but the process takes 30 seconds in total to terminate.
Problem description
In my experience, if an application running on Windows 10 depends only on the dynamic libraries that are loaded by default into every Windows process, namely ntdll.dll, KernelBase.dll and kernel32.dll, the process exits normally when the main thread returns from the mainCRTStartup function.
If other libraries are loaded, statically or dynamically (e.g. by calling LoadLibraryW), returning from the main function leaves the process alive: for 30 seconds when run normally and indefinitely when run under a debugger.
Context
On process creation, the Windows 10 process loader creates additional threads to load dynamic libraries faster, see:
Why does Windows 10 start extra threads in my program?
Why there are three unexpected worker threads when a Win32 console application starts up?
Cylance mentions in Windows 10 Parallel Loading Breakdown:
The worker thread idle timeout is set to 30 seconds. Programs which execute in less than 30 seconds will appear to hang due to ntdll!TppWorkerThread waiting for the idle timeout before the process terminates.
Microsoft mentions in Terminating a Process: How Processes are Terminated:
Note that some implementation of the C run-time library (CRT) call ExitProcess if the primary thread of the process returns.
On the other hand, Microsoft mentions in ExitProcess:
Note that returning from the main function of an application results in a call to ExitProcess.
Test code
This is the minimal test code I worked with. I used kernel32!CloseHandle and user32!CloseWindow as examples; the calls to them do not actually do anything:
#include <cstdint>
namespace windows {
    typedef const intptr_t Handle;
    typedef const void * Module;
    constexpr Handle InvalidHandleValue = -1;
    namespace kernel32 {
        extern "C" uint32_t __stdcall CloseHandle(Handle);
        extern "C" uint32_t __stdcall FreeLibrary(Module);
        extern "C" Module __stdcall LoadLibraryW(const wchar_t *);
    }
    namespace user32 {
        extern "C" uint32_t __stdcall CloseWindow(Handle);
    }
}
int mainCRTStartup() {
    // 0 seconds
    // windows::kernel32::CloseHandle(windows::InvalidHandleValue);
    // 30 seconds
    // windows::user32::CloseWindow(windows::InvalidHandleValue);
    // 0 seconds
    // windows::kernel32::FreeLibrary(windows::kernel32::LoadLibraryW(L"kernel32.dll"));
    // 30 seconds
    // windows::kernel32::FreeLibrary(windows::kernel32::LoadLibraryW(L"user32.dll"));
    // 0 seconds
    // windows::kernel32::FreeLibrary(windows::kernel32::LoadLibraryW(L""));
    return 0;
}
Debugging
Uncommenting one of the WinAPI calls in the mainCRTStartup function results in the execution time noted in the comment above that call.
This is the execution flow of the program traced in a debugger in pseudo C++:
ntdll.RtlUserThreadStart() {
    kernel32.BaseThreadInitThunk() {
        const auto return_code = test.mainCRTStartup();
        ntdll.RtlExitUserThread(return_code) {
            if (ntdll.NtQueryInformationThread(CURRENT_THREAD, ThreadAmILastThread) != STATUS_SUCCESS || !AmILastThread) {
                // Bad path - for `30 seconds`.
                ntdll.LdrShutdownThread();
                ntdll.TpCheckTerminateWorker(0);
                ntdll.NtTerminateThread(0, return_code);
                // The thread execution does not return from `NtTerminateThread`, but the process still runs.
            } else {
                // Good path - for `0 seconds`.
                ntdll.RtlExitUserProcess(return_code) {
                    ntdll.EtwpShutdownPrivateLoggers();
                    ntdll.LdrpDrainWorkQueue(0);
                    ntdll.LdrpAcquireLoaderLock();
                    ntdll.RtlEnterCriticalSection(ntdll.FastPebLock);
                    ntdll.RtlLockHeap(peb.ProcessHeap);
                    ntdll.NtTerminateProcess(0, return_code);
                    ntdll.RtlUnlockProcessHeapOnProcessTerminate();
                    ntdll.RtlLeaveCriticalSection(ntdll.FastPebLock);
                    ntdll.RtlReportSilentProcessExit(CURRENT_PROCESS, return_code);
                    ntdll.LdrShutdownProcess();
                    ntdll.NtTerminateProcess(CURRENT_PROCESS, return_code);
                    // The thread execution does not return from `NtTerminateProcess` and the process is terminated.
                }
            }
        }
    }
}
Expected results
I expected the process to terminate if it does not create additional threads and returns from the main function.
Calling ExitProcess at the end of the main function terminates the process, even when a WinAPI call that previously caused the 30-second delay is made. Calling this API is not always possible, though, because the problematic application might not be mine but a third-party application (from Microsoft), as here: Why would a process hang within RtlExitUserProcess/LdrpDrainWorkQueue?
It seems to me that the Windows 10 process loader is broken, if even Microsoft processes behave incorrectly.
Is there a clean solution to this problem?
What are those loader threads needed for once the last user-created thread has exited? AFAIK it is impossible to load any other libraries at that point.
I expected the process to terminate if it does not create additional
threads and returns from the main function.
A process can create additional threads implicitly; the loader does so, for example. You also need to understand what
returns from the main function
means here. It refers to the function that the standard CRT entry point, mainCRTStartup, calls; after that function returns, mainCRTStartup calls ExitProcess. So it does not refer to the real entry point of the executable, but to a sub-function called from the entry point, and it is the entry point that then calls ExitProcess.
If you do not use the CRT, you need to call ExitProcess yourself. If you simply return from the entry point, RtlExitUserThread is called, which does not call ExitProcess unless this is the last thread in the process (AmILastThread). There can also be a race here if two or more threads call ExitThread in parallel.
I'm developing a multi-threaded C++ application using GCC 4.4.5 and GDB 7.2.
At the moment, I have four threads. Each one interacts with a CAN bus in one form or another, either reading, writing, polling or handling messages.
In order to determine which thread is doing what, I have decided to add the thread IDs to log messages.
In my logging functions, I have the following code:
// This is for outputting debug messages.
// A default-constructed thread::id represents "no thread", so it serves as the "not given" default.
void logDebug(const string& msg, thread::id threadId = thread::id()) {
#ifdef _DEBUG
    threadState.outputLock->lock();
    if (threadId != thread::id())
        cout << "[Thread #" << threadId << "] ";
    // The rest of the output
    threadState.outputLock->unlock();
#endif
}
This is the (debug) output from the application:
[Thread #3085296768] [DEBUG] [Mon Jun 17 10:18:45 2019] CAN frame was empty or no message on bus...
----------
And this is what GDB is telling me:
Thread #3 7575 [core: 0] (Suspended: Breakpoint)
----
Why is the debugger giving me different information from the application (the thread IDs/numbers) and is there a way to output the same information in the application, as the debugger is telling me?
The expected behaviour is that the thread IDs are identical.
EDIT:
I forgot to add some possibly important information.
I'm cross-compiling to an embedded device powered by a POWERPC chip, running a derivative of Debian Wheezy.
You can get the kernel thread ID from within your application with the following system call: syscall(SYS_gettid)
From there you can set the thread name by either:
- writing the name directly to /proc/PID/task/TID/comm
- using the pthread function int pthread_setname_np(pthread_t thread, const char *name)
Then in GDB you can easily match the thread name, its Linux TID, and the GDB thread ID using the info threads command.
Hope this helps.
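A minimal sketch of the two calls mentioned above, assuming Linux with glibc (where g++ defines _GNU_SOURCE by default, which pthread_setname_np needs). The helper names currentTid, nameCurrentThread, and threadPrefix are hypothetical and not part of the asker's logging code. The value returned by syscall(SYS_gettid) is the kernel's thread ID, which is the number GDB reports as the LWP.

```cpp
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <sstream>
#include <string>

// Kernel thread ID of the calling thread (the LWP number shown by GDB).
inline long currentTid() {
    return static_cast<long>(syscall(SYS_gettid));
}

// Give the calling thread a name. Linux limits thread names to
// 15 characters plus the terminating NUL, so longer names are truncated.
inline bool nameCurrentThread(const std::string& name) {
    return pthread_setname_np(pthread_self(), name.substr(0, 15).c_str()) == 0;
}

// Hypothetical log prefix that uses the kernel thread ID
// instead of the thread library's own id type.
inline std::string threadPrefix() {
    std::ostringstream out;
    out << "[Thread LWP " << currentTid() << "] ";
    return out.str();
}
```

Each thread would call nameCurrentThread once at startup and put threadPrefix() in front of its log lines, so the number in the log can be compared directly with the LWP column in the debugger.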
To debug a locked-file problem, we're calling Sysinternals' Handle64.exe 4.11 from a .NET process (via Process.Start with asynchronous output redirection). The calling process hangs in Process.WaitForExit because the Handle64 process doesn't exit, even after more than two hours.
We took a dump of the corresponding Handle64 process and checked it in the Visual Studio 2017 debugger. It shows two threads ("Main Thread" and "ntdll.dll!TppWorkerThread").
Main thread's call stack:
ntdll.dll!NtWaitForSingleObject () Unknown
ntdll.dll!LdrpDrainWorkQueue() Unknown
ntdll.dll!RtlExitUserProcess() Unknown
kernel32.dll!ExitProcessImplementation () Unknown
handle64.exe!000000014000664c() Unknown
handle64.exe!00000001400082a5() Unknown
kernel32.dll!BaseThreadInitThunk () Unknown
ntdll.dll!RtlUserThreadStart () Unknown
Worker thread's call stack:
ntdll.dll!NtWaitForSingleObject() Unknown
ntdll.dll!LdrpDrainWorkQueue() Unknown
ntdll.dll!LdrpInitializeThread() Unknown
ntdll.dll!_LdrpInitialize() Unknown
ntdll.dll!LdrInitializeThunk() Unknown
My question is: Why would a process hang in LdrpDrainWorkQueue? From https://stackoverflow.com/a/42789684/62838, I gather that this is the Windows 10 parallel loader at work, but why would it get stuck while exiting the process? Can this be caused by how we invoke Handle64 from another process? I.e., are we doing something wrong or is this rather a bug in Handle64?
How long did you wait?
According to this analysis,
The worker thread idle timeout is set to 30 seconds. Programs which
execute in less than 30 seconds will appear to hang due to
ntdll!TppWorkerThread waiting for the idle timeout before the process
terminates.
I would recommend setting the registry key specified in that article to disable the parallel loader and seeing whether this resolves the issue.
Parent Key: HKLM\Software\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\handle64.exe
Value Name: MaxLoaderThreads
Type: DWORD
Value: 1 to disable
Take a look at this amazing call stack:
1 FNFolderTreeDir::refreshSubDir fnfoldertreedir.cpp 42 0x13faa3287
2 FNFolderTreeDriveRootHive::processDirLoaded fnfoldertreedriveroothive.cpp 25 0x13faa2175
3 FNFolderTreeDriveRootHive::qt_static_metacall moc_fnfoldertreedriveroothive.cpp 74 0x13faa5df5
4 QMetaObject::activate qobject.cpp 3742 0x51744922
... lots of Qt internals here ...
33 _purecall purevirt.c 58 0x7feedc2d0cc
34 FNFolderTreeNode::purge fnfoldertreenode.cpp 28 0x13fa9cf53
35 FNFolderTreeNode::~FNFolderTreeNode fnfoldertreenode.cpp 23 0x13fa9ceeb
FNFolderTreeDriveRootHive::processDirLoaded is a slot connected to the QFileSystemModel::directoryLoaded signal. So, what happened here is that my destructor was happily doing some internal cleanup when Qt interrupted it at a seemingly random point to call a slot to refresh the very object that has been reduced to the base class already. Of course, it crashed.
A couple of questions, if I may:
How is this even possible? I suspect that Qt used either QAbstractItemModel::beginRemoveColumns or endRemoveRows to dispatch the callback - I call them both in FNFolderTreeNode::purge. Should it not be doing this in main loop only?
How can I prevent this behaviour? I could try disconnecting the slot first thing in the destructor, but what guarantees that the disconnect itself will not be interrupted too?
I am writing a wrapper DLL to interface a communication DLL for a Yokogawa WT1600 power meter to a PC-based automation package. I got the communication part to work, but I need to run it on a thread so that the automation package's 50 ms scan time can be maintained. (The Extended Function Block (EFB) call blocks the scan until it returns.)
These are the steps I need to do:
1. Call EFB
2. EFB creates a thread to perform communication setup (takes about 200ms to do)
3. EFB returns EFB_BUSY while the thread is doing the work
3a. (automation program continues scanning until it comes back to the EFB call)
4. Call EFB passing in that it returned busy on the last call
5. EFB checks if the thread has returned
6. If the thread returned Then the EFB returns success, Else return EFB_BUSY
7. repeat 3a-6 until efb returns success
So my problem is, how do I create a thread that exists past the life of the function that called it? And how do I get that thread return value when I call back into the DLL?
EDIT #1
HeavyFunction::HeavyFunction^ hf; //HeavyFunction is a class that has a time consuming function in it
ThreadStart^ efbThreadDelegate;
Thread^ efbThread;
if( pEfbData->nBlockingRecall != DOEFB_BUSY ) {
    hf = gcnew HeavyFunction::HeavyFunction;
    hf->iiStart = (int)(pEfbData->uParams[0].dw);
    hf->iiEnd = (int)(pEfbData->uParams[1].dw);
    efbThreadDelegate = gcnew ThreadStart( hf, &HeavyFunction::HeavyFunction::iGetPrime );
    efbThread = gcnew Thread( efbThreadDelegate );
    efbThread->Start();
    return DOEFB_BUSY;
}else if ( efbThread->IsAlive ) {
    return DOEFB_BUSY;
}else {
    uRetValue->dw = hf->iReturn;
    return 0;
}
Will efbThread still have the same thread handle upon a subsequent call?
EDIT #2
I got it to work by creating global HANDLEs for a mutex and a thread, initializing the mutex in the init entry point (called when the DLL is loaded), and creating the thread in the main function when a call is actually made to the DLL.
I used the sample code from MSDN: Creating Threads as my model.
Any thread created (whether in a DLL or elsewhere) will not stop spontaneously. In particular, the function that created the thread may return. The new thread would still run even if the creator thread exited. That is, assuming it didn't hit the end of its entry function.
Windows threads return a DWORD when they finish. To peek, call WaitForSingleObject on the thread handle with a timeout of 0, and if it returns WAIT_OBJECT_0 (the thread has finished), call GetExitCodeThread.
I don't understand your whole "EFB" thing, neither what it is nor what it does, though. If it is doing funny things to normal Windows threads, all bets are off.
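The start-then-poll pattern described above can also be sketched in standard C++, using std::async and std::future in place of the Win32 calls: wait_for with a zero timeout plays the role of WaitForSingleObject with a timeout of 0, and get() plays the role of GetExitCodeThread. Everything named here (kBusy, slowSetup, EfbState, callEfb) is hypothetical and only stands in for the real EFB interface, which is not shown in the question.

```cpp
#include <chrono>
#include <future>
#include <thread>

const int kBusy = -1;  // hypothetical stand-in for EFB_BUSY

// Hypothetical slow job standing in for the ~200 ms communication setup.
inline int slowSetup() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return 0;  // 0 means success in this sketch
}

// State that must outlive a single call, e.g. a global inside the DLL.
struct EfbState {
    std::future<int> pending;
};

// One scan-cycle call: start the job if none is running, otherwise poll it.
inline int callEfb(EfbState& state) {
    if (!state.pending.valid()) {
        state.pending = std::async(std::launch::async, slowSetup);
        return kBusy;
    }
    // Zero timeout: report the current status without blocking the scan.
    if (state.pending.wait_for(std::chrono::seconds(0)) !=
        std::future_status::ready) {
        return kBusy;
    }
    return state.pending.get();  // leaves `pending` empty for the next job
}
```

The key point is that the state lives outside the function, so a later call can still reach the job that an earlier call started.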