EDIT: I have now trimmed my code down to give a rough idea of "all" the code. Maybe this helps to identify the problem ;)
I have integrated the following simple code fragment, which cancels the timer if data is read from the TCP socket, or otherwise cancels the pending read on the socket:
// file tcp.cpp
void CheckTCPSocket()
{
    TRequestStatus iStatus;
    TSockXfrLength len;
    int timeout = 1000;    // note: RTimer::After() interprets this as microseconds
    RTimer timer;
    TRequestStatus timerstatus;
    TPtr8 buff;

    iSocket.RecvOneOrMore( buff, 0, iStatus, len );
    timer.CreateLocal();
    timer.After(timerstatus, timeout);

    // Wait for two requests - if the timer completes first, we have a timeout.
    User::WaitForRequest(iStatus, timerstatus);
    if( timerstatus.Int() != KRequestPending )
    {
        iSocket.CancelRead();
    }
    else
    {
        timer.Cancel();
    }
    timer.Close();
}
// file main.cpp
void TestActiveObject::RunL()
{
    TUint Data;
    MQueue.ReceiveBlocking(Data);
    CheckTCPSocket();
    SetActive();
}
This part is executed within an active object, and since integrating the code piece above I always get the panic:
E32User-CBase 46: This panic is raised by an active scheduler, a CActiveScheduler. It is caused by a stray signal.
I never had any problem with my code until this piece of code was added. The code itself executes fine: data is read from the socket, and then the timer is cancelled and closed. I do not understand how the timer object has any influence on the AO here.
Would be great if someone could point me in the right direction.
Thanks
This could be a problem with another active object completing (not one of these two), or SetActive() not being called. See Forum Nokia. Hard to say without seeing all your code!
BTW User::WaitForRequest() is nearly always a bad idea. See why here.
Never mix active objects and User::WaitForRequest().
(Well, almost never. When you know exactly what you are doing it can be ok, but the code you posted suggests you still have some learning to do.)
You get the stray signal panic when the thread's request semaphore is signalled with RThread::RequestComplete() by the asynchronous service provider, and the active scheduler, which was waiting on that semaphore with User::WaitForAnyRequest(), looks for a completed active object whose RunL() it could call, but cannot find any in its list of active objects.
In this case you have two outstanding requests, neither of which is controlled by the active scheduler (for example, neither uses CActive::iStatus as its TRequestStatus; issuing SetActive() on an object whose CActive::iStatus is not involved in an async request is another error in your code, though not the reason for the stray signal). You wait for one of them to complete with WaitForRequest(), but you never wait for the other one at all. That request's completion signal goes to the active scheduler's WaitForAnyRequest() instead, resulting in the stray signal. Note that even if you cancel a request, you still need to wait on the thread request semaphore, because cancellation itself completes the request.
The best solution is to make the timeout timer an active object as well. Have a look at the CTimer class.
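A minimal sketch of what that could look like (the class and member names here are illustrative, not from your code):

// Sketch: the read-timeout as a proper active object (illustrative names).
class CReadTimeout : public CTimer
{
public:
    static CReadTimeout* NewL(RSocket& aSocket)
    {
        CReadTimeout* self = new (ELeave) CReadTimeout(aSocket);
        CleanupStack::PushL(self);
        self->ConstructL();          // CTimer::ConstructL() creates the timer
        CleanupStack::Pop(self);
        return self;
    }

private:
    CReadTimeout(RSocket& aSocket)
        : CTimer(EPriorityStandard), iSocket(aSocket)
    {
        CActiveScheduler::Add(this);
    }

    void RunL()                      // timer fired: abandon the pending read
    {
        iSocket.CancelRead();
    }

    RSocket& iSocket;
};

Issue the socket read from another active object, then start the timeout with e.g. iTimeout->After(1000000) (microseconds). Both completions are then routed through the scheduler to a RunL(), and no request signal is ever lost.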
Another solution is simply to add another WaitForRequest on the request that has not yet completed.
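Applied to the posted CheckTCPSocket(), that would look something like this:

User::WaitForRequest(iStatus, timerstatus);
if( timerstatus.Int() != KRequestPending )
{
    // Timer fired first: cancel the read, then consume its
    // completion signal so it cannot reach the active scheduler.
    iSocket.CancelRead();
    User::WaitForRequest(iStatus);
}
else
{
    // Read completed first: cancel the timer and consume its signal.
    timer.Cancel();
    User::WaitForRequest(timerstatus);
}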
You are calling TestActiveObject::SetActive() but there is no call to any method that sets TestActiveObject::iStatus to KRequestPending. This will create the stray signal panic.
The only iStatus variable in your code is local to the CheckTCPSocket() method.
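For SetActive() to be legal, the request issued just before it has to complete into the active object's own iStatus, along these lines (a sketch; iBuf and iLen are assumed member variables):

// Sketch: the request must complete into CActive::iStatus.
void TestActiveObject::IssueRead()
{
    iSocket.RecvOneOrMore(iBuf, 0, iStatus, iLen);  // iStatus is CActive::iStatus
    SetActive();  // the scheduler now routes the completion to RunL()
}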
I have a C++17 project using the non-Boost version of ASIO because I need to connect, read, and write to a TCP socket. The application has a read thread and a write thread that run periodically and share a mutex, so my reading thread has a time slot of 20 milliseconds in which it needs to read as much as it can and exit.
My problem is that I can't figure out how to get ASIO to read and then stop reading gracefully until another read is requested. There are no read-with-timeout functions, and I could not find any examples of such behaviour.
The closest thing I've found seems to sort of work, but not exactly, and I have no idea why. My current code is something like this:
ErrorCode Read(uint8_t* buf, unsigned int maxAmountOfBytesToRead, unsigned int& nRead)
{
    std::lock_guard<std::mutex> tcpSocketLock(m_TCPSocketMutex);
    asio::error_code asioError;
    std::size_t amountOfBytesInBuffer = 0;

    m_TCPConnectionSocket.async_read_some(asio::buffer(buf, maxAmountOfBytesToRead),
        [&](const asio::error_code& errorCode, std::size_t result_n)
        {
            asioError = errorCode;
            amountOfBytesInBuffer = result_n;
        });

    RunIOContextWithTimeOut(std::chrono::milliseconds(20));
    nRead = amountOfBytesInBuffer;
    // finish up and exit.
}
void RunIOContextWithTimeOut(std::chrono::steady_clock::duration timeout)
{
    // Restart the io_context, as it may have been left in the "stopped" state
    // by a previous operation.
    m_TCPioContext.restart();

    // Block until the asynchronous operation has completed, or timed out. If
    // the pending asynchronous operation is a composed operation, the deadline
    // applies to the entire operation, rather than individual operations on
    // the socket.
    m_TCPioContext.run_for(timeout);

    // If the asynchronous operation completed successfully then the io_context
    // would have been stopped due to running out of work. If it was not
    // stopped, then the io_context::run_for call must have timed out.
    if (!m_TCPioContext.stopped())
    {
        m_TCPioContext.stop();

        // Run the io_context again until the operation completes.
        m_TCPioContext.run();
    }
}
But when running this code, I notice that the incoming data is not exactly correct and that chunks of it are missing. After adding logs and debugging, I see that when run_for pops out because of a timeout, it never finishes the async read callback handler, which makes me suspect that when run_for doesn't finish on its own and is asked to stop, it abandons whatever data it has read and exits.
But I thought that was what the subsequent run() call was for: to make the thread go back in and finish the read before exiting. But apparently not? I don't understand how to make it just read and, when it's time to stop, copy over everything that has been read and stop gracefully. All the other examples have you closing sockets and cancelling everything, but I want to keep the socket open and the connection established, and just stop reading.
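One thing worth checking here: per the ASIO documentation, after io_context::stop() a call to run() returns immediately until restart() is called, so the run() in the code above may never actually execute the abandoned completion handler. A sketch of an alternative that cancels only the read and then lets the handler finish (assuming m_TCPConnectionSocket is reachable from this method):

void RunIOContextWithTimeOut(std::chrono::steady_clock::duration timeout)
{
    m_TCPioContext.restart();
    m_TCPioContext.run_for(timeout);
    if (!m_TCPioContext.stopped())
    {
        // Timed out: cancel the pending operation instead of stopping the
        // context. The read handler still runs (with operation_aborted),
        // so the captured locals are written before Read() returns.
        m_TCPConnectionSocket.cancel();
        m_TCPioContext.run();
    }
}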
I can't let it read for as long as it wants, because there is a write thread waiting for the read to finish so that it can execute. I would also prefer not to use a solution with an additional thread that reads continuously, because this design will be scaled up, which would mean an extra 40 threads on a system with limited resources; we want to be as efficient as possible with our CPU.
I am using Linux epoll in edge-triggered mode.
Each time a new connection comes in, I add the file descriptor to epoll with the EPOLLIN|EPOLLOUT|EPOLLET flags. My first question is: what is the right way to check which kind of event(s) occurred for each ready file descriptor after epoll_wait returns? I have seen example code, e.g. from https://github.com/yedf/handy/blob/master/raw-examples/epoll-et.cc line 124, that does it like this:
for (int i = 0; i < n; i++) {
    //...
    if (events & (EPOLLIN | EPOLLERR)) {
        if (fd == lfd) {
            handleAccept(efd, fd);
        } else {
            handleRead(efd, fd);
        }
    } else if (events & EPOLLOUT) {
        if (output_log)
            printf("handling epollout\n");
        handleWrite(efd, fd);
    } else {
        exit_if(1, "unknown event");
    }
}
What caught my attention is that it uses if / else if / else to check which event occurred, which means that if it calls handleRead, it cannot call handleWrite at the same time. I think this may lose events in the following situation: both the read and write sides have hit EAGAIN, and then the remote end both reads and sends some data, so epoll_wait may set both EPOLLIN and EPOLLOUT; but the code can only handleRead, and the data remaining in the output buffer cannot be sent, since handleWrite is never called.
So is the above usage wrong?
According to the Q&A in man 7 epoll:
If more than one event occurs between epoll_wait(2) calls, are they combined or reported separately?
They will be combined.
If I got it right, several events can occur on a single file descriptor between epoll_wait calls. So I think I should use multiple independent ifs to check one by one whether readable/writable/error events occurred, instead of if / else if. I went to see how the nginx epoll module does it; from https://github.com/nginx/nginx/blob/953f53921505a884f3912f2d8db5217a71c0479a/src/event/modules/ngx_epoll_module.c#L867 I see the following code:
if (revents & (EPOLLERR|EPOLLHUP)) {
    //...
}
if ((revents & EPOLLIN) && rev->active) {
    //....
    rev->handler(rev);
}
if ((revents & EPOLLOUT) && wev->active) {
    //....
    wev->handler(wev);
}
It seems to follow my idea of checking the EPOLLERR, EPOLLIN, and EPOLLOUT events one after another.
So I did the same thing as nginx in my application. But what I realized through experiment is: if I add the file descriptor to epoll with the EPOLLIN|EPOLLOUT|EPOLLET flags and never fill up the output buffer, I still always get the EPOLLOUT flag set after epoll_wait returns whenever some data arrives and the fd becomes readable; as a result, a redundant write_handler is called, which is not what I expect.
I did some searching and found that this situation does indeed exist and is not caused by any bug in my application. According to the top-voted answer at "epoll with edge triggered event":
On a somewhat related note: if you register for EPOLLIN and EPOLLOUT events and assuming you never fill up the send buffer, you still get the EPOLLOUT flag set in the event returned by epoll_wait each time EPOLLIN is triggered - see https://lkml.org/lkml/2011/11/17/234 for a more detailed explanation.
And the link in this answer says:
It doesn't mean there's an EPOLLOUT "event"; it just means a message is triggered (by the socket becoming readable) so you get a status update. In theory the program doesn't need to be told about EPOLLOUT here (it should be assuming the socket is writable already), but it doesn't do any harm.
So far, what I understand about epoll edge-triggered mode is:
- epoll_wait returns when the state of any monitored fd has changed, e.g. from nothing-to-read to readable, or from buffer-full to writable.
- epoll_wait may return one or several events (flags) for each fd in the ready list.
- The flags in the struct epoll_event.events field indicate the current state of the fd. Even if we never fill up the output buffer, the EPOLLOUT flag will be set when epoll_wait returns due to readability, simply because the current state of the fd is writable.
Please correct me if I am wrong.
Then my question is: should I maintain a flag for each connection indicating whether EAGAIN occurred when writing to the output buffer, and, if it is not set, skip calling write_handler/handleWrite in the "if (events & EPOLLOUT)" branch, so that my upper layers are not told about EPOLLOUT here?
What a great question (since I had pretty much the same one)! I'll summarize what I think I know now, with respect to your informative question/description and your helpful links, and hopefully smarter folk will correct any mistakes.
Yes, the if/else handling of event flags is definitely bogus. At least two events can certainly arrive at effectively the same time. E.g., both the read and write sides might have become unblocked since you last called epoll_wait(). And, of course, as soon as you accept() the connection, both reading and writing suddenly become possible, so you get an "event" of EPOLLIN|EPOLLOUT.
I really hadn't grokked that epoll_wait() always delivers the entire current state, rather than only the parts of the state that changed -- thanks for clearing that up. To be perhaps clearer: epoll_wait() won't return an fd unless something changed on that socket, but if something did change, it returns all the flags representing the current state. So I found myself staring at a stream of EPOLLIN|EPOLLOUT events, wondering why it claimed there was an "output" event even though I hadn't written anything yet. Your answer is correct: it's just telling me the output side is still writeable.
"Should I maintain a flag..." Yes, but I would imagine that in all but the most trivial situations you were probably going to end up maintaining at least one bit of "am I currently blocked" state for your readers/writers anyway. For example, if you ever want to process data in an order different than how it arrives (e.g., prioritize responses over requests to make your server more resistant to overload) you instantly have to give up the simplicity of just having the arrival of I/O drive everything. In the particular case of writing, epoll simply doesn't have enough information to notify you at the "right" time. As soon as you accept a connection, there's an event that says "you can write now"--but you probably have nothing to write if you're a server who couldn't possibly have already gotten a request from the client. epoll just can't know whether you have something to write or not, so you were always going to have to either suffer essentially "extraneous" events, or maintain your own state.
In all but the simplest cases, the socket file descriptor ends up being insufficient information for handling I/O events, so you invariably have to associate some data structure with it, or object if you prefer. So, my C++ looks something like:
nAwake = epoll_wait(epollFd, events, 100, milliseconds);
if (nAwake < 0)
{
    perror("epoll_wait failed");
    assert(false);
}
for (int iSocket = 0; iSocket < nAwake; ++iSocket)
{
    auto This = static_cast<Eventable*>(events[iSocket].data.ptr);
    auto eventFlags = events[iSocket].events;
    fprintf(stderr, "%s event on socket [%d] -> %s\n",
            This->ClassName(), This->fd, DumpEvent(eventFlags));
    This->Event(eventFlags);
}
Where Eventable is a C++ class (or derivative thereof) that has all the state needed to decide how to handle the flags epoll delivers. (Of course, this is letting the kernel store a pointer to a C++ object, requiring a design that is very clear about pointer ownership/lifetimes.)
And since you're writing low-level code on Linux, you may also care about EPOLLRDHUP. This not-highly-portable flag lets you save one call to read(). If the client (curl seems pretty good at evoking this behavior) closes its write side of the connection (sends a FIN), you normally discover that when epoll tells you EPOLLIN but read() returns zero bytes. However, Linux maintains an extra bit to indicate that your client's write side (your read side) has been closed. So if you tell epoll you want the EPOLLRDHUP event, you can use it to avoid a read() whose sole purpose would be telling you the writer closed their side.
Note that EPOLLIN will still be turned on whenever EPOLLRDHUP is, AFAIK. Even after you do a shutdown(fd, SHUT_RD). Another example of how you will usually be driven to maintain your own idea of the state of the connection. You care more about clients who are kind enough to do half-shutdowns if you are implementing HTTP.
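Registering for it is just one more flag at EPOLL_CTL_ADD time (a sketch; connState stands in for your per-connection object):

// Sketch: ask for EPOLLRDHUP along with the usual edge-triggered flags.
struct epoll_event ev = {};
ev.events   = EPOLLIN | EPOLLOUT | EPOLLRDHUP | EPOLLET;
ev.data.ptr = connState;   // your per-connection object (e.g. an Eventable*)
epoll_ctl(epollFd, EPOLL_CTL_ADD, fd, &ev);

// Later, in the dispatch loop:
//   if (revents & EPOLLRDHUP) { /* peer sent FIN; skip the zero-byte read() */ }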
From the epoll man page:
When used as an edge-triggered interface, for performance reasons, it is possible to add the file descriptor inside the epoll interface (EPOLL_CTL_ADD) once by specifying (EPOLLIN|EPOLLOUT). This allows you to avoid continuously switching between EPOLLIN and EPOLLOUT calling epoll_ctl(2) with EPOLL_CTL_MOD.
How should QLocalSocket/QDataStream be read?
I have a program that communicates with another via named pipes using QLocalSocket and QDataStream. The receiveMessage() slot below is connected to the QLocalSocket's readyRead() signal.
void MySceneClient::receiveMessage()
{
    qint32 msglength;
    (*m_stream) >> msglength;

    char* msgdata = new char[msglength];
    int read = 0;
    while (read < msglength) {
        read += m_stream->readRawData(&msgdata[read], msglength - read);
    }
    ...
}
I find that the application sometimes hangs on readRawData(). That is, it successfully reads the 4-byte header, but then never returns from readRawData().
If I add...

if (m_socket->bytesAvailable() < 5)
    return;

...to the start of this function, the application works fine (with the short test message).
I am guessing then (the documentation is very sparse) that there is some sort of deadlock occurring, and that I must use the bytesAvailable() signal to gradually build up the buffer rather than blocking.
Why is this? And what is the correct approach to reading from QLocalSocket?
Your loop blocks the event loop, so you will never receive more data if it did not all arrive before the first read; I think that is what causes your problem.
The correct approach is to use signals and slots, the readyRead signal here: just read the available data in your slot, and if there's not enough, buffer it and return, then read more when you get the next signal.
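A minimal sketch of such a slot, assuming the qint32 length prefix from the question, and parsing from a QByteArray member (m_buffer, added here for buffering) rather than reading the stream directly:

void MySceneClient::receiveMessage()
{
    m_buffer.append(m_socket->readAll());          // take whatever has arrived

    while (true) {
        if (m_buffer.size() < int(sizeof(qint32)))
            return;                                // header not complete yet

        QDataStream header(m_buffer);              // read-only view of the buffer
        qint32 msglength;
        header >> msglength;

        if (m_buffer.size() < int(sizeof(qint32)) + msglength)
            return;                                // payload not complete yet

        QByteArray msg = m_buffer.mid(sizeof(qint32), msglength);
        m_buffer.remove(0, int(sizeof(qint32)) + msglength);
        // ... process msg ...
    }
}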
Be careful with this alternative approach: if you are absolutely sure all the data you expect is going to arrive promptly (perhaps not unreasonable with a local socket where you control both client and server), or if the whole thing is in a thread that does nothing else, then it may be OK to use the waitForReadyRead method. But the event loop will remain blocked until data arrives, freezing the GUI, for example (if in the GUI thread), and it is generally troublesome.
I'm experiencing an intermittent delay when reading from a POSIX socket (RHEL6, x86_64, C++, icpc). My code is designed so that a user can provide an absolute timespec deadline (vs. a relative timeout) to be used across multiple calls to recv. I call pselect to make sure that data is available for reading before attempting to call recv.
This typically works as expected (it waits for data without exceeding the deadline, and introduces no noticeable delay when data is available to recv). However, I have a user who can periodically (~50% of the time) get his application into a state where the select blocks for ~400-500 ms even though data is available on the socket. If I watch /proc/net/tcp, I can see data waiting in the RX queue and the application slowly reading it off the queue. If I skip the call to pselect and just call recv, the behavior is similar (but with less delay overall, indicating that recv is also blocking unnecessarily). Once the application gets into this state, it stays that way (a consistent delay with each pselect/recv).
I spent several hours poking around here and on other sites. This is the closest similar issue I could find, but there was no resolution...
http://developerweb.net/viewtopic.php?id=7458
Has anyone run into this sort of behavior before? I'm at a loss for what to do. I've instrumented the code to validate that this is where the delay is happening. (Edit: we have since validated that the entire method below was slow, not any particular system call.) It seems like a kernel/OS issue, but I'm not sure where to look. Here's the code...
// protected
bool
Message::wait(int socket, const timespec & deadline) {
    // Bail if deadline not provided
    if (deadline.tv_sec == 0 && deadline.tv_nsec == 0) {
        return true;
    }

    // Make sure we haven't already exceeded deadline
    timespec currentTime;
    clock_gettime(CLOCK_REALTIME, &currentTime);
    if (VirtualClock::cmptime(currentTime, deadline) >= 0) {
        LOG_WARNING("Timed out waiting to receive data");
        m_timedOut = true;
        return false;
    }

    // Calculate receive timeout
    timespec timeout;
    memset(&timeout, 0, sizeof(timeout));
    timeout.tv_nsec = VirtualClock::nsecs(currentTime, deadline);
    VirtualClock::fixtime(timeout);

    // Wait for data
    fd_set descSet;
    FD_ZERO(&descSet);
    FD_SET(socket, &descSet);
    int result = pselect(socket + 1, &descSet, NULL, NULL, &timeout, NULL);
    if (result == -1) {
        m_error = errno;
        LOG_ERROR("Failed to wait for data: %d, %s",
                  m_error, strerror(m_error));
        return false;
    } else if (result == 0 || !FD_ISSET(socket, &descSet)) {
        LOG_WARNING("Timed out waiting to receive data");
        m_timedOut = true;
        return false;
    }
    return true;
}
VirtualClock is a time-related utility class used here only to compare and fix up timespecs (i.e., it does not introduce any delays). I'd appreciate any insight into this behavior.
This was in fact not a problem with any system call. We used strace to diagnose it and were seeing tons of calls to clock_gettime. Another (third) review of the calling code revealed a programming error that left the called code holding a reference to corrupt stack data. This was facilitated by a flawed API design on my part, and it resulted in corruption of the deadline.
I was allowing the user to pass in a reference to a ServerConfig class containing configuration (including data related to the deadline). My Server class was saving the reference instead of copying the object. The user created an instance of my Server class on the heap and passed in a reference to a ServerConfig on the stack (in a method), resulting in non-deterministic garbage in the configuration when the method exited and the ServerConfig went out of scope. This is older code, and I've since prevented this sort of thing in other places after being burned, but this one slipped through.
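The bug pattern, reduced to a sketch (names illustrative):

#include <ctime>

struct ServerConfig { timespec deadline; /* ... */ };

class Server {
public:
    explicit Server(const ServerConfig& config) : m_config(config) {}
private:
    const ServerConfig& m_config;   // stores a reference, not a copy
};

Server* MakeServer() {
    ServerConfig config;            // stack object, dies with this method
    return new Server(config);      // heap Server now holds a dangling reference
}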
So the lessons learned for me are: be careful with APIs that hang on to user-provided references; rethink premature optimization (the whole reason I was holding a reference instead of just copying); and look for stack corruption when you see non-deterministic behavior like this (something I check for when I suspect builds are broken, but didn't suspect this time). Also, strace is a great tool: I've seen others use it, but now I'm comfortable using it myself.
Thanks for the comments and sorry for the false alarm.
I have what appears to be a deadlock situation in a multi-threaded logging application.
A little background:
My main application has 4-6 threads running: the main thread, responsible for monitoring the health of various things I'm doing, updating GUIs, etc., plus a transmit thread and a receive thread. The transmit and receive threads talk to physical hardware. I sometimes need to debug the data that the transmit and receive threads are seeing, i.e. print to a console, without interrupting them, given the time-critical nature of the data. The data, by the way, is on a USB bus.
Due to the threaded nature of the application, I want to create a debug console that I can send messages to from my other threads. The debug console runs as a low-priority thread and implements a ring buffer, so that when you print to the debug console the message is quickly stored in the ring buffer and an event is set. The debug console's thread sits waiting on event objects signalled by the inbound messages. When an event is detected, the console thread updates a GUI display with the message. Simple, eh? The printing calls and the console thread use a critical section to control access.
NOTE: I can adjust the ring buffer size if I see that I am dropping messages (at least that's the idea).
In a test application, the console works very well if I call its Print method slowly, via mouse clicks. I have a button that I can press to send messages to the console, and it works. However, under any sort of load (many calls to the Print method), everything deadlocks. When I trace the deadlock, my IDE's debugger traces into EnterCriticalSection and sits there.
NOTE: If I remove the Lock/UnLock wrappers and call Enter/LeaveCriticalSection directly (see the code), it sometimes works, but I still end up in a deadlock. I made that change to rule out deadlocks caused by the extra stack push/pops, but it did not solve my issue... What's going on here?
Here is one Print overload, which lets me pass a simple int to the display console.
void TGDB::Print(int I)
{
    //Lock();
    EnterCriticalSection(&CS);
    if( !SuppressOutput )
    {
        //swprintf( MsgRec->Msg, L"%d", I);
        sprintf( MsgRec->Msg, "%d", I);
        MBuffer->PutMsg(MsgRec, 1);
    }
    SetEvent( m_hEvent );
    LeaveCriticalSection(&CS);
    //UnLock();
}
// My Lock/UnLock methods
void TGDB::Lock(void)
{
    EnterCriticalSection(&CS);
}

bool TGDB::TryLock(void)
{
    return( TryEnterCriticalSection(&CS) );
}

void TGDB::UnLock(void)
{
    LeaveCriticalSection(&CS);
}
// This is how I implemented Console's thread routines
DWORD WINAPI TGDB::ConsoleThread(PVOID pA)
{
    TGDB *g = (TGDB *)pA;
    return( g->ProcessMessages() );
}

DWORD TGDB::ProcessMessages()
{
    DWORD rVal;
    bool brVal;
    int MsgCnt;

    do
    {
        rVal = WaitForMultipleObjects(1, &m_hEvent, true, iWaitTime);
        switch(rVal)
        {
        case WAIT_OBJECT_0:
            EnterCriticalSection(&CS);
            //Lock();
            if( KeepRunning )
            {
                Info->Caption = "Rx";
                Info->Refresh();
                MsgCnt = MBuffer->GetMsgCount();
                for(int i=0; i<MsgCnt; i++)
                {
                    MBuffer->GetMsg( MsgRec, 1);
                    Log->Lines->Add(MsgRec->Msg);
                }
            }
            brVal = KeepRunning;
            ResetEvent( m_hEvent );
            LeaveCriticalSection(&CS);
            //UnLock();
            break;

        case WAIT_TIMEOUT:
            EnterCriticalSection(&CS);
            //Lock();
            Info->Caption = "Idle";
            Info->Refresh();
            brVal = KeepRunning;
            ResetEvent( m_hEvent );
            LeaveCriticalSection(&CS);
            //UnLock();
            break;

        case WAIT_FAILED:
            EnterCriticalSection(&CS);
            //Lock();
            brVal = false;
            Info->Caption = "ERROR";
            Info->Refresh();
            aLine.sprintf("Console error: [%d]", GetLastError() );
            Log->Lines->Add(aLine);
            aLine = "";
            LeaveCriticalSection(&CS);
            //UnLock();
            break;
        }
    } while( brVal );

    return( rVal );
}
MyTest1 and MyTest2 are just two test functions that I call in response to a button press. MyTest1 never causes a problem, no matter how fast I click the button. MyTest2 deadlocks nearly every time.
// No deadlock
void TTest::MyTest1()
{
    if(gdb)
    {
        // elsewhere: gdb = new TGDB;
        gdb->Print(++I);
    }
}

// Causes a deadlock
void TTest::MyTest2()
{
    if(gdb)
    {
        // elsewhere: gdb = new TGDB;
        gdb->Print(++I);
        gdb->Print(++I);
        gdb->Print(++I);
        gdb->Print(++I);
        gdb->Print(++I);
        gdb->Print(++I);
        gdb->Print(++I);
        gdb->Print(++I);
    }
}
UPDATE:
I found a bug in my ring buffer implementation: under heavy load, when the buffer wrapped, I didn't detect a full buffer properly, so the buffer was not returning correctly. I'm pretty sure that issue is now resolved, and once I fixed it, performance got much better. However, if I decrease iWaitTime, my deadlock (or freeze-up) returns.
After further tests with a much heavier load, it appears my deadlock is not gone. Under super-heavy load I still deadlock, or at least my app freezes, though nowhere near as often as before I fixed the ring buffer problem. If I double the number of Print calls in MyTest2, I can easily lock up every time.
Also, my updated code is reflected above. I now make sure my SetEvent and ResetEvent calls are inside the critical sections.
With those options closed up, I would ask questions about this "Info" object. Is it a window, which window is it parented to, and which thread was it created on?
If Info, or its parent window, was created on the other thread, then the following situation might occur:
1. The console thread is inside the critical section, processing a message.
2. The main thread calls Print() and blocks on the critical section, waiting for the console thread to release the lock.
3. The console thread calls a function on Info (setting Caption), which results in the system sending a message (WM_SETTEXT) to the window. SendMessage blocks because the target thread is not in a message-alertable state (it isn't blocked on a call to GetMessage/WaitMessage/MsgWaitForMultipleObjects).
Now you have a deadlock.
This kind of #$(%^ can happen whenever you mix blocking routines with anything that interacts with windows. The only appropriate blocking function to use on a GUI thread is MsgWaitForMultipleObjects; otherwise, SendMessage calls to windows hosted on the thread can easily deadlock.
Avoiding this involves two possible approaches (a sketch of the first follows this list):
1. Never do any GUI interaction in worker threads. Only use PostMessage to dispatch non-blocking UI update commands to the UI thread, OR
2. Use kernel event objects plus MsgWaitForMultipleObjects (on the GUI thread) to ensure that even while you are blocked on a resource, you are still dispatching messages.
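A sketch of the PostMessage approach (the message id, window handle, and helper name are illustrative):

#include <windows.h>
#include <string.h>

// Sketch: worker threads never touch the GUI; they post the text instead.
#define WM_APP_LOGLINE (WM_APP + 1)

void PostLogLine(HWND hConsole, const char* text)
{
    char* copy = _strdup(text);   // heap copy; the GUI thread frees it
    PostMessage(hConsole, WM_APP_LOGLINE, 0, (LPARAM)copy);
}

// In the console window's message handler, on the GUI thread:
//   case WM_APP_LOGLINE:
//       Log->Lines->Add((char*)lParam);
//       free((void*)lParam);
//       break;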
Without knowing where it is deadlocking, this code is hard to figure out. Two comments, though:
Given that this is C++, you should be using a scoped RAII object to perform the lock and unlock, just in case it ever becomes non-catastrophic for Log to throw an exception.
You are resetting the event in response to WAIT_TIMEOUT. This leaves a small window of opportunity for a second Print() call to set the event after the worker thread has returned from WaitForMultipleObjects but before it has entered the critical section, which will result in the event being reset while there is actually data pending.
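One way to sidestep that window entirely is to create the event as auto-reset, so no explicit ResetEvent() call is needed:

// Auto-reset (bManualReset = FALSE): the event clears itself as soon as one
// waiting thread is released, so a Print() that fires between the wait
// returning and the critical section being entered is not lost.
m_hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);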
But you do need to debug it and reveal where it deadlocks. If one thread IS stuck on EnterCriticalSection, then we can find out why. If neither thread is, then the incomplete printing is just the result of an event getting lost.
I would strongly recommend a lockfree implementation.
Not only will this avoid potential deadlock, but debug instrumentation is one place where you absolutely do not want to take a lock. The impact of formatting debug messages on the timing of a multi-threaded application is bad enough; having locks synchronize your parallel code just because you instrumented it makes debugging futile.
What I suggest is an SList-based design. (The Win32 API provides an SList implementation, but you can build a thread-safe template easily enough using InterlockedCompareExchange and InterlockedExchange.) Each thread has a pool of buffers, and each buffer tracks the thread it came from; after processing a buffer, the log manager posts it back to the source thread's SList for reuse. Threads wishing to write a message post a buffer to the logger thread's list. This also prevents any thread from starving the others of buffers. An event to wake the logger thread when a buffer is placed in the queue completes the design.
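A rough sketch of that hand-off using the documented Win32 SList calls (the LogMsg layout and all names here are illustrative):

#include <windows.h>

// Sketch: lock-free hand-off of log buffers to the logger thread.
// Note: structs containing SLIST_ENTRY must be allocated with
// MEMORY_ALLOCATION_ALIGNMENT (e.g. via _aligned_malloc) on x64.
struct LogMsg
{
    SLIST_ENTRY entry;        // must be the first member
    DWORD       sourceThread; // whose pool to return this buffer to
    char        text[256];
};

SLIST_HEADER g_logQueue;      // InitializeSListHead(&g_logQueue) at startup
HANDLE       g_logEvent;      // auto-reset event that wakes the logger

void PostLog(LogMsg* msg)
{
    InterlockedPushEntrySList(&g_logQueue, &msg->entry);
    SetEvent(g_logEvent);
}

// Logger thread: after waking, grab the entire pending list in one atomic
// call with InterlockedFlushSList(&g_logQueue). It returns entries in LIFO
// order, so reverse them before display to keep messages chronological.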