TCP/IP client reconnect after server shutdown - c++

I have a situation where the server that the client connects to may be shut down repeatedly while the client is still running.
In the current implementation, when the client fails a read it calls close(sockFd) to close the socket. Then it loops, trying to recreate that socket.
Is that best practice? Or is it possible to keep the socket and attempt to reconnect on it?
Edit: Platform is Linux

When you get any error other than EINTR or EAGAIN/EWOULDBLOCK on a socket, it is almost certainly dead and must be closed. @abarnert gives some others in the useful comment below.
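As a rough illustration of the advice above, here is a minimal sketch for Linux/POSIX. The names is_transient_error and connect_with_retry are hypothetical helpers, not part of any library, and the retry policy (fixed delay, fixed attempt count) is only an assumption. POSIX leaves the state of a socket unspecified after a failed connect(), so the sketch closes it and creates a fresh one for every attempt.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// True for errors that leave the socket usable: retry the call (EINTR) or
// wait for readiness (EAGAIN/EWOULDBLOCK on a non-blocking socket).
bool is_transient_error(int err) {
    return err == EINTR || err == EAGAIN || err == EWOULDBLOCK;
}

// Hypothetical helper: makes up to max_attempts connection attempts to an
// IPv4 address, using a brand-new socket each time.
// Returns a connected descriptor, or -1 if every attempt failed.
int connect_with_retry(const char* ip, std::uint16_t port,
                       int max_attempts, unsigned delay_seconds) {
    assert(max_attempts > 0);

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) return -1;

    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) return -1;
        if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) == 0)
            return fd;
        close(fd);  // never reuse a socket whose connect() failed
        if (attempt + 1 < max_attempts) sleep(delay_seconds);
    }
    return -1;
}
```

In the read loop, a failed read whose errno does not satisfy is_transient_error would be followed by close(sockFd) and then a call such as connect_with_retry("192.0.2.10", 5000, 10, 2), where the address, port, and limits are placeholders.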

Related

boost asio notify server of disconnect

I was wondering if there is any way to notify a server if a client side application was closed. Normally, if I Ctrl+C my client side terminal an EOF-signal is sent to the server side. The server side async_read function has a handler which receives a boost::system::error_code ec argument. The handler is called when the server side receives the EOF-signal, which I can happily process and tell the server to start listening again.
However, if I try to cleanly close my client application using socket.shutdown() and socket.close() nothing happens and the server side socket remains open.
I was wondering, is there a way to somehow send an error signal to the server-side socket so I could then process it using the error code?
The approaches described in the comments cover 99% of cases. They do not work when the client machine is turned off ungracefully or when there are network problems.
To get reliable notification of a disconnected client you need to implement a "ping" feature: send ping packets regularly and check that you receive pong packets.
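The bookkeeping behind such a ping feature can be sketched independently of the transport. The class below is a hypothetical helper (HeartbeatMonitor is not a Boost.Asio type), and the interval and timeout values are assumptions the caller picks. With Boost.Asio the checks would typically be driven from a timer handler; here the current time is passed in explicitly so the logic stays easy to follow.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper that tracks ping/pong timing for one connection.
class HeartbeatMonitor {
public:
    using Clock = std::chrono::steady_clock;

    HeartbeatMonitor(Clock::duration ping_interval,
                     Clock::duration pong_timeout,
                     Clock::time_point now)
        : interval_(ping_interval), timeout_(pong_timeout),
          last_ping_(now), last_pong_(now) {
        // The timeout must leave room for at least one ping round trip.
        assert(pong_timeout > ping_interval);
    }

    // True when it is time to send the next ping.
    bool ping_due(Clock::time_point now) const {
        return now - last_ping_ >= interval_;
    }

    void on_ping_sent(Clock::time_point now) { last_ping_ = now; }
    void on_pong_received(Clock::time_point now) { last_pong_ = now; }

    // True when no pong has arrived within the timeout; the caller should
    // then treat the peer as disconnected and close the socket.
    bool peer_lost(Clock::time_point now) const {
        return now - last_pong_ > timeout_;
    }

private:
    Clock::duration interval_;
    Clock::duration timeout_;
    Clock::time_point last_ping_;
    Clock::time_point last_pong_;
};
```

A periodic timer would call ping_due() and peer_lost() with Clock::now(), send a ping when the first returns true, and close the connection when the second does.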

C++ Socket Recv() and Network Interface going down

I have written a client using plain sockets in C that connects to a remote machine and maintains a persistent connection in order to receive push messages. Everything works great. To make it persistent, I have set keepalive and wait on recv() in a loop.
The problem is that when the network interface goes down, recv() does not return. As I understand from the socket documentation, the peer has to disconnect for recv() to return. A network interface going down is not the same as the peer disconnecting.
The need here is that if the network interface goes down, I need to schedule a reconnect so that the channel gets established.
Any thoughts on this please?
Use whatever mechanism you wish to force the receive operation to timeout. Depending on the specifics of your use case, you may wish to disconnect if a timeout occurs or you may wish to send something to check the status of the connection.
Whatever protocol you are using on top of TCP should be documented and the documentation should specify how disconnects are detected. You must send to detect a connection loss, so every protocol designed to operate on top of TCP should be designed with this in mind.
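One possible way to force the receive to time out, sketched here for POSIX systems, is to wait with poll() before calling recv(). The function name recv_with_timeout and the constant kRecvTimedOut are hypothetical, and setting SO_RCVTIMEO on the socket would be an alternative mechanism. What to do on a timeout (reconnect, or send a probe first) is left to the caller, as described above.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical result code for this sketch: no data arrived in time.
constexpr ssize_t kRecvTimedOut = -2;

// Waits up to timeout_ms for the socket to become readable, then reads.
// Returns the number of bytes read (> 0), 0 if the peer closed the
// connection, kRecvTimedOut on timeout, or -1 on error (errno is set).
ssize_t recv_with_timeout(int fd, void* buf, std::size_t len, int timeout_ms) {
    assert(buf != nullptr);

    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;

    int ready;
    do {
        // Note: after EINTR this simple loop restarts the full timeout.
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    if (ready == 0) return kRecvTimedOut;
    if (ready < 0) return -1;
    return recv(fd, buf, len, 0);
}
```

In the receive loop, a kRecvTimedOut result is the point at which the client can either schedule a reconnect or send something to test whether the connection is still alive.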

Recvfrom() is hanging -- how to deal this when Server is OFF

I have client code which gets a response from the server using UDP and recvfrom(). This is working fine when the server is ON, but once I stop the server my client program is hanging; I suspect recvfrom() is waiting for the response from the server.
If the server and client are both installed on the same system, then I get an error from recvfrom() when the server is OFF. But when the server and client are on different systems, the client hangs at recvfrom() because there is no response from the server while it is OFF.
Can someone give me an idea of how to deal with this situation? Maybe a timer signal interruption could solve the issue. Can anyone shed some light on this?
I am using Visual Studio 2005.
Your call is blocking, because there is no data for this socket right now. When the server was on, it was fast enough to send data so the recvfrom call got it and returned quickly. When the server is off, nobody's sending data and recvfrom waits forever. It does not matter whether server is on or off, recvfrom is doing the same thing in both cases; you just don't notice the delay in the first case.
You need to use non-blocking sockets. In non-blocking mode, recvfrom will return an error when there is no data, instead of waiting. You can then use the select call to sleep until either a timeout happens or data arrives.
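A minimal sketch of that approach follows. It is written against the POSIX socket API as an assumption; the question targets Visual Studio, where Winsock uses ioctlsocket() with FIONBIO instead of fcntl() and reports WSAEWOULDBLOCK through WSAGetLastError(). The names set_nonblocking, recvfrom_with_timeout, and kNoDatagram are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

// Hypothetical result code for this sketch: nothing arrived in time.
constexpr ssize_t kNoDatagram = -2;

// Puts the descriptor into non-blocking mode; returns false on failure.
bool set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    return flags >= 0 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

// Sleeps in select() until a datagram arrives or timeout_ms elapses.
// Returns the datagram size (>= 0), kNoDatagram on timeout, -1 on error.
// fd must be smaller than FD_SETSIZE for select() to be usable.
ssize_t recvfrom_with_timeout(int fd, void* buf, std::size_t len,
                              int timeout_ms) {
    assert(buf != nullptr);

    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(fd, &readable);

    timeval tv{};
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int ready = select(fd + 1, &readable, nullptr, nullptr, &tv);
    if (ready == 0) return kNoDatagram;
    if (ready < 0) return -1;

    // The sender address is not needed here, so both pointers are null.
    ssize_t n = recvfrom(fd, buf, len, 0, nullptr, nullptr);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) return kNoDatagram;
    return n;
}
```

When kNoDatagram comes back, the client knows the server did not answer within the timeout and can retry the request or report that the server is unreachable instead of hanging.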

Address already in use with boost asio acceptor

I wrote a server that is listening for incoming TCP connections and clients connecting to it. When I shut down the server and restart it on the same port, I sometimes get the error message EADDRINUSE when calling bind(...) (error code: 98 on Linux). This happens even though I am setting the option to reuse the socket.
The error does not happen all the time, but it seems that it happens more often when clients are connected to the server and sending data while it shuts down. I guess the problem is that there are still pending connections while the server is shut down (related topic: https://stackoverflow.com/questions/41602/how-to-forcibly-close-a-socket-in-time-wait).
On the server side, I am using boost::asio::ip::tcp::acceptor. I initialize it with the option "reuse_address" (see http://beta.boost.org/doc/libs/1_38_0/doc/html/boost_asio/reference/basic_socket_acceptor.html). Here is the code snippet:
using boost::asio::ip::tcp;
tcp::acceptor acceptor(io_service);
tcp::endpoint ep(tcp::v4(), port);
acceptor.open(ep.protocol());
// reuse_address must be set after open() and before bind()
acceptor.set_option(tcp::acceptor::reuse_address(true));
acceptor.bind(ep);
acceptor.listen();
The acceptor is closed with:
acceptor.close();
I also tried using acceptor.cancel() before that, but it had the same effect. When this error occurs, I cannot restart the server on the same port for quite some time. Restarting the network helps, but is not a permanent solution.
What am I missing?
Any help would be greatly appreciated! :)
These were originally a comment to the question.
Does your server fork child processes? Also, are you sure the socket is in TIME_WAIT state? You might want to grab the netstat -ap output when this happens.
When you solve these problems "by force", you are asking for trouble, aren't you?
There is a reason the default behavior requires you to wait: otherwise the network could, for example, mistake an ACK from the previous connection for an ACK belonging to the new connection.
I would not allow this "solution" to be included in release builds in my team.
Remember, when the probability of error is very low, testing is extremely difficult!

Socket re-connection failure

System Background:
It's basically a client/server application. The server is an embedded device and the client is a Windows app developed in C++.
Issue: After a runtime of about a week, communication breaks between client and server. Because of this, the server is not able to connect back to the client and needs a restart to recover. It looks like the system is experiencing a socket reconnection problem. The network also sometimes experiences intermittent failures.
Abrupt Termination at remote end
Port locking
I would like some suggestions on how to clean up the socket or shut down cleanly so that reconnection happens properly. Are there other alternative solutions?
Thanks,
Hussain
It does not sound like you are in a position to easily write a stress test app to reproduce this more quickly out of band, which is what I would normally suggest. A pragmatic solution might be to periodically restart the server and client at a time when you think the system is least busy, or when problems arise. This sounds like cheating but many production systems I have been involved with take this approach to maximize system uptime.
My preferred solution here would be to abstract the server and client socket code (hopefully your design allows this to be done without too much work) and use it to implement client and server test apps that can be used to stress test only the socket code by simulating a lot of normal socket traffic in a short space of time - this helps identify timing windows and edge cases that could cause problems over time, and might speed up the process of obtaining a debuggable repro - you can simulate network error in your test code by dropping the socket on the client or server periodically.
A further step to take on the strategic front would be to ensure that you have good diagnostics in your socket handlers on both the client and server side. Track socket open and close, with special focus on your socket error and reconnect paths, given that you know the network is unreliable. Make sure the logs are written sequentially with timestamps. Something as simple as this might quickly show you what errors or conditions trigger your problems. You can quickly make sure the logs are correct and complete using the test apps I mentioned above.
One thing you might want to check is that you are not being hit by an inability to reuse addresses. Sometimes when a socket gets closed, it cannot be immediately reused for a reconnect attempt because there is still residual activity on one end or the other. You may be able to get around this (based on my Windows/Winsock experience) by experimenting with SO_REUSEADDR and SO_LINGER on your sockets. However, my first focus in your case would be on ensuring the socket code on client and server handles all errors and mainline cases correctly, before worrying about this.
A common issue is that when a connection is dropped, the OS keeps it open in the TIME_WAIT state. If you want to restart the server socket, it will not be able to reopen the same port right away because the OS still considers the port in use.
To avoid that, you need to set the parameter SO_REUSEADDR so that the OS allows you to reuse the port if it is in TIME_WAIT state for a server socket.
Example:
int optval=1;
// set SO_REUSEADDR on a socket to true (1):
setsockopt(s1, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof optval);
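For context, here is a slightly fuller sketch of where that setsockopt call sits when creating a server socket on a POSIX system. The helper name make_listener is hypothetical, and binding to INADDR_ANY is only an assumption for the example. The point it illustrates is the ordering: the option has to be set before bind() for it to affect that bind.

```cpp
#include <cassert>
#include <cstdint>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: returns a listening TCP socket bound to the given
// port on all local IPv4 addresses, or -1 on failure.
int make_listener(std::uint16_t port, int backlog) {
    assert(backlog > 0);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    // Set SO_REUSEADDR before bind(); setting it afterwards does not help
    // the bind that has already been attempted.
    int optval = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof optval) != 0) {
        close(fd);
        return -1;
    }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) != 0 ||
        listen(fd, backlog) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A server would call something like make_listener(5000, 16) at startup, where the port and backlog are placeholders, and check for -1 before accepting connections.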
I'm experiencing something similar with encrypted connections. I believe in my case it is because the client dropped the connection and reconnected in less than the 4 minute FIN_WAIT period. The initial connection is recycled (by the OS) and the server doesn't see the dropout. The SSL authentication is lost when the client loses the connection, so the client tries to re-authenticate. This happens during what the server considers the middle of a conversation. The server then hangs up on the client. I think the server SSL code considers this a man-in-the-middle attack, or just gets confused, and closes the connection.