GetQueuedCompletionStatus stops reading a serial port - c++

I have a device that generates messages over a serial port. When I reboot the device, the IO Completion Port stops reading bytes.
The code is calls GetQueuedCompletionStatus():
BOOL bRet = GetQueuedCompletionStatus(
m_hCompletionPort,
&dwBytesTransferred,
&dwCompletionKey,
&pOverlapped,
INFINITE);
PortMon looks like:
...
IRP_MJ_WRITE Serial1 SUCCESS LENGTH: 7 REBOOT.
IRP_MJ_READ Serial1 CANCELLED LENGTH: 1
Logging shows the following result:
bRet=true, dwBytesTransferred=7, pOverlapped=0x0202B028, GetLastError()=997
(sleep forever)
Is there any way to detect this failure and reestablish communications?
I can monitor a heat beat and close/reopen the serial port, but it doesn't seem right that the windows API allows serial communications to silently drop like this.

If you do WaitForSingleObject on the handle for the serial port that you opened to start reading data, does the handle become signalled when the device is rebooted? Maybe this is a way to tell when you need to open the port again?

IO Completion Ports can certainly handle this case without problem. You don't need to close and reopen the device.
The most likely problem in this case is that you have an error on the line (caused by the device reset) that you have not cleared using ClearCommError().
You need to use SetCommState() and SetCommTimeouts() appropriately for your device up front. In the DCB you pass to SetCommState(), you need to set fAbortOnError. If you do dequeue an error you need to call ClearCommError() before you requeue another read.

RE: janm (I can't seem to add a comment to your answer sorry)
I did try setting various flags, including the DCB's fAbortOnError, but GetQueuedCompletionStatus() would still wait infinitely. I also tried periodically timing out the call, and checking the serial port for errors. The serial port always looked fine, yet the disconnection would still permanently break the IO Completion Port. The device rebooting probably creates a transient error state... I say probably, because I've never been able to detect it!
A fellow developer also had a crack at this problem, and they too failed. So we just rewrote the code to use overlapped serial port reads, and now it works fine.
There is probably something, somewhere that we missed... in the end we wasted more time trying to solve the mystery than it took to rewrite the code.

Related

How to find out if my COM port has recieved XOFF?

I am testing code for serial port communication. I am using synchronous IO for now to keep things simple. I have observed that when the PC recieves XOFF the program pauses, no more printing in the console window. When the program recieves XON it starts running again.
What method exists to find out if my program's serial port has earlier recieved XOFF (regardless of synchronous and asynchronous IO) since the COM port driver shall prevent the port from sending any characters out in this case and I better check if XOFF was recieved before calling writefile().
I think that one way is to use the ClearCommError() with COMSTAT struct. Is this the only way? I get this impression by reading https://msdn.microsoft.com/en-us/library/ff802693.aspx
If that is so, it certainly appears a strange way of doing that.

Win API serial port need to wait after initialization

I have a following problem. I have a serial port device that is supposed to communicate with a computer. In fact it is Arduino Due board but i don't think it is related.
I use CreateFile to open the port, and then set the parameters using GetCommState()&SetCommState() and GetCommTimeouts()&SetCommTimeouts().
The port is opened correctly - no problem there. But at this point I want to check whether the device is connected. So I send a specific message. The device is supposed to respond in a certain way so that I know it is connected.
Now to the problem: It only works if put Sleep(1000) after Creating the port (before sending the handshake request). It looks as if the WinAPI needs some time before it can begin to use the port. Because the Sleep solution is not generally usable I need to find some alternative...
By it doesn't work I mean ReadFile times out. It times out even if the timeout is set to something like 5 seconds - note that the Sleep interval is only one second. So it looks like the handshake request is not even sent. If I set timeout to 1 second and Sleep interval to one second, it works. If I set timeout to 5 seconds but there's no Sleep it doesn't work. See the problem?
I am going to try some NetworkMonitor, but I'm kinda sure the problem is not with the device...
OK, I might have searched a little more before posting this question.
The thing is that Arduino restarts itself when you open a connection from your PC.
When you use a terminal you connect first and write a few seconds later so that the Arduino board has enough time to boot up and you won't notice the thing. Which is what confused me enough to write the question.
There are 3 solutions to this, only 2 of which it makes sense to mention at all:
1) the solution I used without knowing all this (you wait about a second for the board to boot up again...)
2) you disable auto-reset by modifying your Arduino board
Both of them are stupid if you ask me, there should be a switch or a flash variable to do this...

WriteFile returning error code 995

I have searched stackoverflow and googled thoroughly for this problem but not been able to find a clue to why this problem is happening.
I am writing a program in C++ which communicates with a measurement device connected through USB. The program is multithreaded and several threads will communicate with the device. A mutex is used to guarantee that no two threads try to read or write from the device at the same time.
Commands are sent to the device using WriteFile and the responses and measured values are read using ReadFile - both operations are done synchronously.
Sometimes while reading the measured value from the device will the measurement fail with a timeout (GetLastError() returns error code 121), due to a synchronization error inside the measurement device itself - which is ok and expected.
When I try to continue the measurement, by sending a new command will WriteFile sometimes (roughly 50% of the time) fail and GetLastError() returns the error code 995 which is described in MSDN as:
ERROR_OPERATION_ABORTED
995 (0x3E3)
The I/O operation has been aborted because of either a thread exit or an application request.
There is no thread exit after the timeout occurs and there is no cancel of any read or write operation. I am able to resume the communication only by closing and re-opening the communication with the device using CloseHandle and CreateFile. However, this will take some time from the measurement and is not an ideal solution.
My question is, why does WriteFile return the error code 995 in this case, and what can I do to avoid having to close and re-open the communication with the device?
Contact the OEM of your USB-serial device -- we can't help you further with this, since we don't have access to the driver's plumbing. If the device OEM can't help you -- contact the USB-serial chipset's manufacturer; if they refuse to help, throw the USB-to-serial adapter in the garbage can and buy one with a chipset that actually has manufacturer support (such as the FTDI or Silicon Labs USB-serial chips), not some cheap clone garbage.

ctb::SerialPort - time-out in Write()

I'm writing program that should control a piece of scientific hardware over COM-port. The program itself is written in wxWidgets and uses ctb library. To test, it before I connect it to 300k€ equipment, I use com0com (Null-modem emulator) to forward COM2 port. To emulate my hardware I use wxTerminal (COM3). Altogether it works nice. One can debug not only in VS or DB but also see the whole data transfer in wxTerminal.
Now to my problem. I use to send data to COM-port ctb::SerialPort::Write() function.
device->Write( (char*)line.c_str(), line.size() );
However, if I disconnect the connection on the side of wxTerminal (i.e. COM2->NULL) than program hangs in this function.
It's obvious that I should add some function to test if my equipment is still there, but to do it I need to send data-packet to it and expect some answer. So I'm back to the Write().
"Just in case" I've also tried ctb::IOBase::Writev (char ∗ buf, size_t len, unsigned int timeout_in_ms) with timeout set to 100ms and I've still got program hanging in the same line. It's actually expected behavior as in this case timeout means only that the connection line is blocked till whole buffer is transferred or timeout is reached.
Connecting of wxTerminal to COM3 leads to un-freezing of debugger or stand-alone program. The Sun is shining, the birds are singing.
Can somebody give me a hint how to overcome my problem? I'd appreciate if comments would be restrained to wxWidgets-world - I really do not want to re-write whole program with other toolkit.
If you COM port library does not provide effective timeouts on write block, (presumably because of hardware flow-control), you could implement your own by threading off the write. You could use a couple of events/semaphores/condvar/whatever. One to signal to the thread that there is something in a buffer to send and another that you can wait on with a timeout that is signaled by the thread after it has sent the buffer. If the 'ack' wait times out, your COM port is stuck and you can pop up some 'Check cable' messageBox. I don't know what other calls your port lib supports, so I don't know how you could implement flushes/retries.

Ensuring data is being read with async_read

I am currently testing my network application in very low bandwidth environments. I currently have code that attempts to ensure that the connection is good by making sure I am still receiving information.
Traditionally I have done this by recording the timestamp in my ReadHandler function so that each time it gets called I know I have received data on the socket. With very low bandwidths this isn't sufficient because my ReadHandler is not getting called frequently enough.
I was toying around with the idea of writing my own completion condition function (right now I am using tranfer_at_least(1)) thinking it would get called more frequently and I could record my timestamp there, but I was wondering if there wasn't some other more standard way to go about this.
We had a similar issue in production: some of our connections may be idle for days, but we must detect if the remote is dead ASAP.
We solved it by enabling the TCP_KEEPALIVE option:
boost::asio::socket_base::keep_alive option(true);
mSocketTCP.set_option(option);
which had to be accompanied by new startup script that writes sensible values to /proc/sys/net/ipv4/tcp_keepalive_* which have very long timeouts by default (on LInux)
You can use the read_some method to get partial reads, and deal with the book keeping. This is more efficient than transfer_at_least(1), but you still have to keep track of what is going on.
However, a cleaner approach is just to use a concurrent deadline_timer. If the timer goes off before you are finished, then is taking too long and cancel whatever is going on. If not, just stop the timer and continue. Something like:
boost::asio::deadline_timer t;
t.expires_from_now(boost::posix_time::seconds(20));
t.async_wait(bind(&Class::timed_out, this, _1));
// Do stuff.
if (!t.cancel()) {
// Timer went off, abort
}
// And the timeout method
void Class::timed_out(error_code const& error)
{
if (error == boost::asio::error::operation_aborted) return;
// Deal with the timeout, close the socket, etc.
}
I don't know how to handle low latency of network from within application. Can you be sure if it's network latency, or if peer server or peer application busy and react slowly. Does it matter if it network/server/application quilt?
Even if you can discover network latency and find it's big, what are you going to do?
You can not improve the situation.
Consider other critical case which is a subset of what you're trying to handle - network is down (e.g. you disconnect cable from your machine). Since it a subset of your problem you want to handle it too.
Let's examine the network down effect on active TCP connection.How can you discover your active TCP connection is still alive? Calling send() will success, but it merely says that the message queued in TCP outgoing queue in kernel. TCP stack will try to send it, but since TCP ACK won't be sent back, TCP stack on your side will try to resend it again and again. You can see your message in netstat output (Send-Q column).
I'm aware of the following ways to deal with it:
One standard way is TCP keep alive proposed #Cubby.
Another way is to implement Keep Alive mechanism. Send Keep Alive req message and peer is obligated to send back Keep Alive ack message.
If you don't receive ack message after predefined timeout, try to send Keep Alive req N more times (e.g. N=2). If still no success, close the socket and open it again. If peer server is not available you'll not be abable to open connection, since TCP 3 way handshake requires peer to respond.