I have a C++ application that accepts TCP connections from client applications.
After a seemingly random time of running fine (days), it stops receiving followup messages from the clients and only sees the first message on each TCP connection. After a re-start all is fine again.
The trouble is, this only happens on the production server, where I have to restart it as soon as it gets stuck, and I have been unable to reproduce it on a lab machine. None of the socket operations seems to return an error that I would see in my logfile, and the application is huge, so I can't just post the relevant part here.
First messages keep coming through all the time; only subsequent messages stop being received after a while. Even when my application stops receiving the follow-up messages, I can see them coming in with Wireshark.
Any ideas how I might find out what is happening ? What should I be looking for ?
Any config settings used here? In the past I have put a condition on a server accept to ignore messages after 50,000 have been processed. This was to prevent run-away situations in development. This code went live on one occasion without changing the config setting to 'allow infinite messages'. The result was exactly what you describe, ok for 2-3 days, then messages sent ok, but just ignored with no errors anywhere.
This may not be the case here, but I mention it as an example of where you may have to look.
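As a rough illustration of the kind of setting I mean, here is a minimal Python sketch (not the original code, which was not Python; the class name MessageCap and the logger name are made up for this example). The point is that a cap like this should at least log once when it starts dropping messages, so the failure is not silent.

```python
import logging

log = logging.getLogger("server")  # hypothetical logger name


class MessageCap:
    """Hypothetical guard that stops accepting messages after a configured count."""

    def __init__(self, max_messages=None):
        self.max_messages = max_messages  # None means "allow infinite messages"
        self.processed = 0
        self.warned = False

    def allow(self):
        """Return True if the next message should be processed."""
        if self.max_messages is not None and self.processed >= self.max_messages:
            if not self.warned:
                # Log once, so the drop shows up in the logfile instead of being silent.
                log.warning("message cap of %d reached; ignoring further messages",
                            self.max_messages)
                self.warned = True
            return False
        self.processed += 1
        return True
```

If something like this exists in your code base, searching for counters compared against a configured limit is a quick first check.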
Multiple clients are connected to a single ZMQ_PUSH socket. When a client is powered off unexpectedly, the server does not get an alert and keeps sending messages to it. Despite using ZMQ_NOBLOCK and setting ZMQ_HWM to 5 (queue at most 5 messages), my server doesn't get an error until the client is reconnected, at which point all the messages in the queue are received at once.
I recently ran into a similar problem when using ZMQ. We would cut power to interconnected systems, and the subscriber would be unable to reconnect automatically. It turns out that a heartbeat mechanism has recently (within the past year or so) been implemented over ZMTP, the underlying protocol used by ZMQ sockets.
If you are using ZMQ version 4.2.0 or greater, look into setting the ZMQ_HEARTBEAT_IVL and ZMQ_HEARTBEAT_TIMEOUT socket options (http://api.zeromq.org/4-2:zmq-setsockopt). These will set the interval between heartbeats (ZMQ_HEARTBEAT_IVL) and how long to wait for the reply until closing the connection (ZMQ_HEARTBEAT_TIMEOUT).
EDIT: You must set these socket options before connecting.
There is nothing in zmq explicitly to detect the unexpected termination of a program at the other end of a socket, or the gratuitous and unexpected failure of a network connection.
There has been historical talk of adding some kind of underlying ping-pong are-you-still-alive internal messaging to zmq, but last time I looked (quite some time ago) it had been decided not to do this.
This does mean that crashes, network failures, etc aren't necessarily handled very cleanly, and your application will not necessarily know what is going on or whether messages have been successfully sent. It is Actor model after all. As you're finding your program may eventually determine something had previously gone wrong. Timeouts in zmtp will spot the failure, and eventually the consequences bubble back up to your program.
To do anything better you'd have to layer something like a ping-pong on top yourself (eg have a separate socket just for that so that you can track the reachability of clients) but that then starts making it very hard to use the nice parts of ZMQ such as push / pull. Which is probably why the (excellent) zmq authors decided not to put it in themselves.
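To make the ping-pong idea a bit more concrete, here is a minimal sketch of the bookkeeping side only, in plain Python with no ZMQ calls. It assumes you already have some separate channel on which clients answer pings; the names HeartbeatTracker, record_pong and unreachable are hypothetical.

```python
import time


class HeartbeatTracker:
    """Tracks when each client last answered a ping (hypothetical helper)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock        # injectable so the logic can be tested without waiting
        self.last_seen = {}       # client id -> time of the last pong

    def record_pong(self, client_id):
        """Call this whenever a pong arrives from client_id."""
        self.last_seen[client_id] = self.clock()

    def unreachable(self):
        """Return the clients that have been silent for longer than timeout_s."""
        now = self.clock()
        return sorted(client for client, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)
```

The server would call unreachable() periodically and stop queueing work for those clients, which is exactly the part that gets awkward to combine with push / pull.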
When faced with a similar problem I ended up writing my own transport library. I couldn't find one off the shelf that gave nice behaviour in the face of network failures, crashes, etc. It implemented CSP, not actor model, wasn't terribly fast (an inevitability), didn't do patterns in the zmq sense, but did mean that programs knew exactly where messages were at all times, and knew that clients were alive or unreachable at all times. The CSPness also meant message transfers were an execution rendezvous, so programs know what each other is doing too.
I have a following problem. I have a serial port device that is supposed to communicate with a computer. In fact it is Arduino Due board but i don't think it is related.
I use CreateFile to open the port, and then set the parameters using GetCommState()&SetCommState() and GetCommTimeouts()&SetCommTimeouts().
The port is opened correctly - no problem there. But at this point I want to check whether the device is connected. So I send a specific message. The device is supposed to respond in a certain way so that I know it is connected.
Now to the problem: it only works if I put Sleep(1000) after creating the port (before sending the handshake request). It looks as if the WinAPI needs some time before it can begin to use the port. Because the Sleep solution is not generally usable, I need to find an alternative...
By "it doesn't work" I mean ReadFile times out. It times out even if the timeout is set to something like 5 seconds; note that the Sleep interval is only one second. So it looks like the handshake request is not even sent. If I set the timeout to 1 second and the Sleep interval to one second, it works. If I set the timeout to 5 seconds but there's no Sleep, it doesn't work. See the problem?
I am going to try some NetworkMonitor, but I'm kinda sure the problem is not with the device...
OK, I might have searched a little more before posting this question.
The thing is that Arduino restarts itself when you open a connection from your PC.
When you use a terminal you connect first and write a few seconds later so that the Arduino board has enough time to boot up and you won't notice the thing. Which is what confused me enough to write the question.
There are 3 solutions to this, only 2 of which it makes sense to mention at all:
1) the solution I used without knowing all this (you wait about a second for the board to boot up again...)
2) you disable auto-reset by modifying your Arduino board
Both of them are stupid if you ask me, there should be a switch or a flash variable to do this...
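A variation on solution 1 that avoids a fixed Sleep is to resend the handshake request until the board answers or an overall deadline passes. Below is a minimal sketch of that loop in Python rather than WinAPI code. The send and receive callables are hypothetical stand-ins for WriteFile and a ReadFile with a short timeout, and the default timing values are assumptions.

```python
import time


def wait_for_device(send, receive, request, expected,
                    deadline_s=5.0, retry_s=0.2,
                    clock=time.monotonic, sleep=time.sleep):
    """Resend the handshake until the expected reply arrives or the deadline passes.

    send(request) writes to the port; receive() returns the bytes read, or b""
    if the read timed out. Both are supplied by the caller.
    """
    give_up_at = clock() + deadline_s
    while clock() < give_up_at:
        send(request)              # requests sent while the board is rebooting are lost
        if receive() == expected:
            return True            # the board has booted and answered
        sleep(retry_s)             # short pause before trying again
    return False
```

This returns as soon as the board is up, so it costs only as much time as the reboot actually takes.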
I am creating a network client application that sends requests to a server using a QTcpSocket and expects responses in return. No higher protocol involved (HTTP, etc.), they just exchange somewhat simple custom strings.
In order to test, I have created a TCP server in Python that listens on a socket and logs the strings it receives and those it sends back.
I can send the first request OK and get the expected response. However, when I send the second request, it does not seem to get written to the network.
I have attached debug slots to the QTcpSocket's notification signals, such as bytesWritten(...), connected(), error(), stateChanged(...), etc. and I see the connection being established, the first request sent, the first response processed, the number of bytes written - it all adds up...
Only the second request never seems to get sent :-(
After attempting to send it, the socket sends an error(RemoteHostClosedError) signal followed by ClosingState and UnconnectedState state change signals.
Before I go any deeper into this, a couple of (probably really basic) questions:
do I need to "clear" the underlying socket in any way after reading ?
is it possible / probable that not reading all the data the server has sent me prevents me from writing ?
why does the server close the connection ? Does it always do that so quickly or could that be a sign that something is not right ? I tried setting LowDelay and KeepAlive socket options, but that didn't change anything. I've also checked the socket's state() and isValid() and they're good - although the latter also returns true when unconnected...
In an earlier version of the application, I closed and re-opened the connection before sending a request. This worked OK. I would prefer keeping the connection open, though. Is that not a reasonable approach? What is the 'canonical' way to implement TCP network communication? Just read/write or re-open every time?
Does the way I read from the socket have any impact on how I can write to it ? Most sample code uses readAll(...) to get all available data; I read piece by piece as I need it and << to a QTextStream when writing...
Could this possibly be a bug in the Qt event loop ? I have observed that the output in the Qt Creator console created with QDebug() << ... almost always gets cut short, i.e. just stops. Sometimes some more output is printed when I shut down the application.
This is with the latest Qt 5.4.1 on Mac OS X 10.8, but the issue also occurs on Windows 7.
Update after the first answer and comments:
The test server is dead simple and was taken from the official Python SocketServer.TCPServer Example:
import SocketServer

class MyTCPHandler(SocketServer.StreamRequestHandler):
    def handle(self):
        request = self.rfile.readline().strip()
        print "RX [%s]: %s" % (self.client_address[0], request)
        response = self.processRequest(request)
        print "TX [%s]: %s" % (self.client_address[0], response)
        self.wfile.write(response)

    def processRequest(self, message):
        if message == 'request type 01':
            return 'response type 01'
        elif message == 'request type 02':
            return 'response type 02'

if __name__ == "__main__":
    server = SocketServer.TCPServer(('localhost', 12345), MyTCPHandler)
    server.serve_forever()
The output I get is
RX [127.0.0.1]: request type 01
TX [127.0.0.1]: response type 01
Also, nothing happens when I re-send any message after this - which is not surprising as the socket was closed. Guess I'll have to figure out why it is closed...
Next update:
I've captured the network traffic using Wireshark and while all the network stuff doesn't really tell me a lot, I do see the first request and the response. Right after the client [ACK]nowledges the response, the server sends a Connection finish (FIN). I don't see the second request anywhere.
Last update:
I have posted a follow-up question at Python: SocketServer closes TCP connection unexpectedly.
Only the second request never seems to get sent :-(
I highly recommend running a program like WireShark and seeing what packets are actually getting sent and received across the network. (As it is, you can't know for sure whether the bug is on the client side or in the server, and that is the first thing you need to figure out)
do I need to "clear" the underlying socket in any way after reading ?
No.
is it possible / probable that not reading all the data the server has
sent me prevents me from writing ?
No.
why does the server close the connection ?
It's impossible to say without looking at the server's code.
Does it always do that so quickly or could that be a sign that
something is not right ?
Again, this would depend on how the server was written.
This worked ok. I would prefer keeping the connection open though. Is
that not a reasonable approach ?
Keeping the connection open is definitely a reasonable approach.
What is the 'canonical' way to implement TCP network communication
? Just read/write or re-open every time ?
Neither way is canonical; it depends on what you are attempting to accomplish.
Does the way I read from the socket have any impact on how I can write
to it ?
No.
Could this possibly be a bug in the Qt event loop ?
That's extremely unlikely. The Qt code has been used for years by tens of thousands of programs, so any bug that serious would almost certainly have been found and fixed long ago. It's much more likely that either there is a bug in your client, or a bug in your server, or a mismatch between how you expect some API call to behave and how it actually behaves.
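For what it's worth, the test server shown in the question's update reads a single line in handle() and then returns, and the socketserver framework closes the connection once handle() returns. That would match the FIN seen in Wireshark right after the first response. Below is a minimal sketch, in Python 3 rather than the Python 2 of the question, of a handler that keeps the connection open for several requests. The names LoopingHandler and exchange, and the newline-terminated framing, are assumptions made for this example.

```python
import socket
import socketserver

RESPONSES = {
    "request type 01": "response type 01",
    "request type 02": "response type 02",
}


class LoopingHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Serve requests until the client closes the connection; returning
        # from handle() is what makes the framework close the socket.
        for raw in self.rfile:
            request = raw.decode("utf-8").strip()
            response = RESPONSES.get(request, "unknown request")
            self.wfile.write((response + "\n").encode("utf-8"))


def exchange(address, requests):
    """Send several newline-terminated requests over one connection."""
    replies = []
    with socket.create_connection(address) as conn, \
            conn.makefile("rw", encoding="utf-8", newline="\n") as stream:
        for request in requests:
            stream.write(request + "\n")
            stream.flush()
            replies.append(stream.readline().strip())
    return replies
```

With a handler like this, the client can keep a single connection open and send the second request on it.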
I know little about pipes but have used one to connect two processes in my Visual C++ code. The pipe is working well, but I need to add error handling to it, so I wanted to know what happens to a pipe if the server that created it crashes, and how I can recognize that from the client process.
Also, what will happen if the client process tries to access the same pipe after the server crash, if no error handling is in place?
Edit:
What impact will there be on memory if I keep creating new pipes (say, using the system time as the pipe name) after the previous one was broken by a server crash? Will these broken pipes be removed from memory?
IIRC the ReadFile or WriteFile function will return FALSE and GetLastError() will return an error code such as ERROR_BROKEN_PIPE or ERROR_PIPE_NOT_CONNECTED (the Win32 counterpart of the NT status STATUS_PIPE_DISCONNECTED).
I guess this kind of handling is implemented in your code; if not, you had better add it ;-)
I just want to throw this out there.
If you want a survivable method for transferring data between two applications, you might consider using MSMQ or even bringing in BizTalk or another message platform.
There are several things to consider:
1) What happens if the server is rebooted or loses power?
2) What happens if the server application becomes unresponsive?
3) What happens if the server application is killed or goes away completely?
4) What is the appropriate response of a client application in each of the above?
Each of those contexts represents a potential loss of data. If the data loss is unacceptable then named pipes are not the mechanism you should be using. Instead you need to persist the messages somehow.
MSMQ, storing to a database, or even leveraging Biztalk can take care of the survivability of the message itself.
If 1 or 3 happens, then the named pipe goes away and must be recreated by a new instance of your server application. If #2 happens, then the pipe won't go away until someone either reboots the server or kills the server app and starts it again.
Regardless, the client application needs to handle the above issues. They boil down to connection failed problems. Depending on what the client does you might have it move into a wait state and let it ping the server every so often to see if it has come back again.
Without knowing the nature of the data and communication processes involved, it's hard to recommend a proper approach.
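As a rough sketch of the wait state mentioned above, here is a small polling loop in Python (not pipe-specific code). The probe callable is a hypothetical stand-in for whatever "is the server back?" check fits your setup, for example an attempt to open the pipe, and the interval and attempt count are assumptions.

```python
import time


def wait_until_available(probe, interval_s=1.0, max_attempts=10, sleep=time.sleep):
    """Poll probe() until it succeeds.

    Returns the attempt number that succeeded, or None if the server never
    came back within max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if probe():
                return attempt
        except OSError:
            pass                   # treat a failed connection as "not back yet"
        if attempt < max_attempts:
            sleep(interval_s)      # wait before pinging the server again
    return None
```

What the client does while waiting (queue the messages, drop them, or alert someone) is the part that depends on how much data loss you can tolerate.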
I have built a simple web service that simply uses HttpListener to receive and send requests. Occasionally, the service fails with "Specified network name is no longer available". It appears to be thrown when I write to the output buffer of the HttpListenerResponse.
Here is the error:
ListenerCallback() Error: The specified network name is no longer available at System.Net.HttpResponseStream.Write(Byte[] buffer, Int32 offset, Int32 size)
and here is the guilty portion of the code. responseString is the data being sent back to the client:
buffer = System.Text.Encoding.UTF8.GetBytes(responseString);
response.ContentLength64 = buffer.Length;
output = response.OutputStream;
output.Write(buffer, 0, buffer.Length);
It doesn't always seem to be a huge buffer: two examples are 3,816 bytes and 142,619 bytes, and these errors were thrown about 30 seconds apart. I would not think that my single client application would be overloading HttpListener; the client does occasionally send/receive data in bursts, with several exchanges happening one after another.
Most Google searches show that this is a common IT problem where this error appears when there are network problems; most of the help is directed toward sysadmins diagnosing a problem with an app rather than developers tracking down a bug. My app has been tested on different machines, networks, etc., and I don't think it's simply a network configuration problem.
What may be the cause of this problem?
I'm getting this too, when a ContentLength64 is specified and KeepAlive is false. It seems as though the client is inspecting the Content-Length header (which, by all possible accounts, is set correctly, since I get an exception with any other value) and then saying "Whelp I'm done KTHXBYE" and closing the connection a little bit before the underlying HttpListenerResponse stream was expecting it to. For now, I'm just catching the exception and moving on.
I've only gotten this particular exception once so far when using HttpListener.
It occurred when I resumed execution after my application had been standing on a breakpoint for a while.
Perhaps there is some sort of internal timeout involved? Your application sends data in bursts, which means it's probably completely inactive a lot of the time. Did the exception occur immediately after a period of inactivity?
Same problem here, but other threads suggest ignoring the Exception.
C# problem with HttpListener
Maybe that's not the right thing to do.
For me, whenever the client closes the web page before it has fully loaded, I get that exception. What I do is just add a try/catch block and print something when the exception happens. In other words, I just ignore the exception.
The problem occurs when you're trying to respond to an invalid request. Take a look at this. I found out that the only way to solve this problem is:
listener = new HttpListener();
listener.IgnoreWriteExceptions = true;
Just set IgnoreWriteExceptions to true after instantiating your listener and the errors are gone.
Update:
For a deeper explanation: HTTP is built on TCP, which works with streams to which each peer writes data. Either peer of a TCP connection can close it. When the client sends a request to your HttpListener there is a TCP handshake; then the server processes the data and responds to the client by writing to the connection's stream. If you try to write to a stream that the remote peer has already closed, the exception "The specified network name is no longer available" occurs.
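The same effect can be reproduced outside of HttpListener. The sketch below is Python, not C#, and only an analogy: it writes to a stream socket whose peer may already have closed and treats the resulting error as "the client went away". The function name safe_send is hypothetical.

```python
import socket


def safe_send(sock, data):
    """Try to send data; return False if the remote peer already closed."""
    try:
        sock.sendall(data)
        return True
    except (BrokenPipeError, ConnectionResetError, ConnectionAbortedError):
        # The peer closed its end before we finished writing. There is nobody
        # left to answer, so report it to the caller instead of crashing.
        return False
```

Catching the error per response like this is roughly what the try/catch answers above do by hand, and what IgnoreWriteExceptions appears to do for you inside the listener.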