listening socket dies unexpectedly - c++

I'm having a problem where a TCP socket is listening on a port, and has been working perfectly for a very long time - it's handled multiple connections, and seems to work flawlessly. However, occasionally when calling accept() to create a new connection the accept() call fails, and I get the following error string from the system:
10022: An invalid argument was supplied.
Apparently this can happen when you call accept() on a socket that is no longer listening, but I have not closed the socket myself, and have not been notified of any errors on that socket.
Can anyone think of any reasons why a listening socket would stop listening, or how the error mentioned above might be generated?

Some possibilities:
Some other part of your code overwrote the handle value. Check to see if it has changed (keep a copy somewhere else and compare, print it out, breakpoint on write in the debugger, whatever).
Something closed the handle.
Interactions with a buggy Winsock LSP.
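The first suggestion, keeping a copy of the handle and comparing, could look roughly like the sketch below. GuardedHandle, makeGuarded, and intact are hypothetical names, and the handle is modeled as std::uintptr_t on the assumption that it is a Winsock SOCKET (an unsigned pointer-sized integer). The second copy is stored as the bitwise complement so that zeroing both fields at once still shows up as a mismatch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a handle plus a redundant copy used to detect stray writes.
struct GuardedHandle {
    std::uintptr_t value;   // the handle the code actually passes to accept()
    std::uintptr_t shadow;  // bitwise complement of the handle, stored alongside it
};

// Record the handle together with its complement.
inline GuardedHandle makeGuarded(std::uintptr_t handle) {
    return GuardedHandle{handle, static_cast<std::uintptr_t>(~handle)};
}

// True while both copies still agree, i.e. nothing has overwritten either one.
inline bool intact(const GuardedHandle& g) {
    return g.value == static_cast<std::uintptr_t>(~g.shadow);
}
```

Checking intact() just before each accept() call, and logging when it fails, would at least separate "the handle value was overwritten" from "something closed the handle".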

One thing that comes to mind is system standby or hibernation. I'm not sure how the Winsock library handles these events; it may be that the network interface is (partially) shut down.

It might make sense to debug the socket's thread (either with an IDE or through a disassembler) and watch its execution for anything that might be causing it to stop listening.

Related

TCP server hangs, not responding to SYN

I have a strange issue with a TCP server that sometimes hangs. The weird issue is that when it hangs it does not receive any new connection, i.e. doesn't respond to the initial TCP SYN packet. I was pretty sure that since TCP handshakes are handled by the kernel, even when a program hangs clients should still at the very least receive the initial SYN,ACK. If anyone knows a situation where a program can hang in a way that prevents the OS from even completing the TCP handshake (and without it ever closing the listening socket) please let me know.
P.S.
The program is written in C++ and the OS is Windows Server 2016.
Most likely, the listen queue is full. Not responding to the initial SYN causes the other side to try another SYN a bit later. With luck, the listen queue won't be full at that time. The program is probably not calling accept (or some similar function) often enough.
It's also possible that the program is using the conditional accept functionality (see the lpfnCondition parameter to WSAAccept) to choose not to respond to this connection attempt.
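As a rough sketch of "calling accept often enough": each time the listener is reported readable, accept everything that is waiting rather than a single connection. The example uses POSIX calls for brevity and assumes the listening socket has already been put into non-blocking mode. acceptAllPending is a made-up name, and a Winsock version would differ in types and error codes.

```cpp
#include <cassert>
#include <cerrno>
#include <vector>
#include <sys/socket.h>

// Accept every connection currently waiting in the listen queue.
// Assumption: listenFd is a listening socket in non-blocking mode, so accept()
// fails with EAGAIN/EWOULDBLOCK instead of blocking once the queue is empty.
std::vector<int> acceptAllPending(int listenFd) {
    assert(listenFd >= 0);
    std::vector<int> accepted;
    for (;;) {
        int fd = accept(listenFd, nullptr, nullptr);
        if (fd >= 0) {
            accepted.push_back(fd);   // caller owns and must eventually close these
            continue;
        }
        if (errno == EINTR) {
            continue;                 // interrupted by a signal: retry
        }
        break;                        // queue drained (EAGAIN) or a real error: stop for now
    }
    return accepted;
}
```

Draining the queue this way keeps it from filling up while the program is busy with a single connection.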

Close a socket without FD_CLR

Should applications use FD_CLR() before close() of a socket descriptor?
Does shutdown take care of FD_CLR()? Sometimes I observe that close() works even without FD_CLR() but sometimes the socket still shows up in the netstat entries.
Why is this erratic?
Should applications use FD_CLR() before close() of a socket descriptor?
You should certainly use FD_CLR if you're going to keep using that fd_set after closing the socket.
Does shutdown take care of FD_CLR()?
No.
Sometimes I observe that close() works even without FD_CLR() but sometimes the socket still shows up in the netstat entries.
Shows up how? And FD_CLR doesn't have anything to do with ports remaining in TIME-WAIT or other states in netstat.
You seem to be asking three questions at once.
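To illustrate the FD_CLR point, here is a minimal POSIX sketch that removes a descriptor from the set used with select() and then closes it. forgetAndClose is a hypothetical helper name. The reason for clearing the bit is that descriptor numbers are reused: a stale entry either makes select() fail with EBADF or ends up referring to whatever is opened next.

```cpp
#include <cassert>
#include <sys/select.h>
#include <unistd.h>

// Drop fd from the fd_set that is handed to select(), then close it.
// Returns true if close() succeeded.
bool forgetAndClose(int fd, fd_set& watched) {
    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_CLR(fd, &watched);      // the set no longer mentions this descriptor
    return close(fd) == 0;     // the number may now be reused by the next socket()/open()
}
```

Neither step affects what netstat shows: a closed TCP connection can still sit in TIME_WAIT regardless of the fd_set.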

Reconnect a socket with Boost asio in Windows

I'm having trouble when connecting a socket to an endpoint after being connected to another.
This is the situation:
a) The boost::asio::ip::tcp::socket is connected to a remote host (say pop.remote1.com).
b) The transmission ends, and the socket is closed:
socket_.shutdown(boost::asio::ip::tcp::socket::shutdown_both, error);
socket_.close(error);
Then, when trying to connect to another host (say pop.remote2.com) using the same process as in a), the call returns without error, but the socket remains closed.
Note that when using pop.remote2.com as the first connection, things run OK, and the same problem arises if I try to connect to pop.remote1.com after closing.
In both situations there are no pending processes in the attached io_service.
The questions are:
Is that reconnection admissible?
Is that the supposed correct process?
Thanks in advance.
P.S.:
I tried opening the socket before reconnecting, but the result is the same. That is, after closing the previous connection with
socket_.shutdown(...);
socket_.close(...);
it makes no difference whether I use
socket_.open(...);
socket_.async_connect( ... );
or just
socket_.async_connect( ... );
A final thought:
After spending some time on the problem and debugging with MS Visual Studio, I think that this is simply not possible, at least in Asio v. 1.45.0 on 32-bit Windows with VC++.
Perhaps the point is that here, in the Boost libraries, everyone thinks in terms of objects, and if a reconnect is ever needed, they simply delete the appropriate object and make a new connection by creating a new object!
That is the solution I used in my application, with good results, although with some extra code.
HTH someone else.
Is that reconnection admissible?
yes
Is that the supposed correct process?
yes and no. If you aren't opening the socket for subsequent connections after you close it for the previous one, you'll need to do that. Ex:
socket_.open();
socket_.async_connect( ... );

Socket still listening after application crash

I'm having a problem with one of my C++ applications on Windows 2008x64 (same app runs just fine on Windows 2003x64).
After a crash, or sometimes even after a regular shutdown/restart cycle, it has trouble using the socket on port 82 that it needs to receive commands.
Looking at netstat I see the socket is still in listening state more than 10 minutes after the application stopped (the process is definitely not running anymore).
TCP 0.0.0.0:82 LISTENING
I tried setting the socket option to REUSEADDR but as far as I know that only affects re-connecting to a port that's in TIME_WAIT state. Either way this change didn't seem to make any difference.
int doReuse = 1;
setsockopt(listenFd, SOL_SOCKET, SO_REUSEADDR,
(const char *)&doReuse, sizeof(doReuse));
Any ideas what I can do to solve or at least avoid this problem?
EDIT:
Did netstat -an but this is all I am getting:
TCP 0.0.0.0:82 0.0.0.0:0 LISTENING
For netstat -anb I get:
TCP 0.0.0.0:82 0.0.0.0:0 LISTENING
[System]
I'm aware of shutting down gracefully, but even if the app crashes for some reason I still need to be able to restart it. The application in question uses an in-house library that internally uses Windows Sockets API.
EDIT:
Apparently there is no solution for this problem, so for development I will go with a proxy / tool to work around it. Thanks for all the suggestions, much appreciated.
If this is only hurting you at debug time, use TCPView from the Sysinternals folks to force the socket closed. I am assuming it works on your platform, but I am not sure.
If you're doing blocking operations on any sockets, do not use an indefinite timeout. In my experience this can cause weird behavior on a multiprocessor machine. I'm not sure which Windows server OS it was, but it was one or two versions before 2003 Server.
Instead of an indefinite timeout, use a 30 to 60 second timeout and then just repeat the wait. This goes for overlapped IO and IOCompletion ports as well, if you're using them.
If this is an app you're shipping for others to use, good luck. Windows can be a pure bastard when using sockets...
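The finite-timeout advice above could be sketched as follows. This uses POSIX poll() and read() purely for illustration; readWithTimeout and its parameters are hypothetical, and on Windows the same shape would be built on Winsock calls such as WSAPoll or select. The point is only that each wait is bounded and then repeated.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <unistd.h>

// Read up to len bytes, waiting in bounded slices instead of blocking forever.
// Returns the read() result once the descriptor has something to report, or -1
// with errno = ETIMEDOUT if maxSlices slices of sliceMs pass with nothing to read.
ssize_t readWithTimeout(int fd, void* buf, std::size_t len, int sliceMs, int maxSlices) {
    assert(sliceMs >= 0 && maxSlices > 0);
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;
    for (int i = 0; i < maxSlices; ++i) {
        int rc = poll(&pfd, 1, sliceMs);
        if (rc > 0) {
            return read(fd, buf, len);   // data, EOF, or an error reported by read()
        }
        if (rc < 0 && errno != EINTR) {
            return -1;                   // poll() itself failed
        }
        // rc == 0 (or EINTR): this slice is over; loop and wait again.
        // A stop flag or a log line could be added here.
    }
    errno = ETIMEDOUT;
    return -1;
}
```

With the 30 to 60 second figure from the answer, sliceMs would be 30000 to 60000, and maxSlices decides how long the caller is prepared to keep repeating the wait.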
I tried setting the socket option to REUSEADDR but as far as I know that only affects re-connecting to a port that's in TIME_WAIT state.
That's not quite correct. It will let you re-use a port in TIME_WAIT state for any purpose, i.e. listen or connect. But I agree it won't help with this. I'm surprised by the comment about the OS taking 10 minutes to detect the crashed listener. It should clean up all resources as soon as the process ends, other than ports in the TIME_WAIT state.
The first thing to check is that it really is your application listening on that port. Use:
netstat -anb
to figure out which process is listening on that port.
The second thing to check is that you are closing the socket gracefully when your application shuts down. If you're using a high-level socket API that shouldn't be too much of an issue (you are using a socket API, right?).
Finally, how is your application structured? Is it threaded? Does it launch other processes? How do you know that your application is really shut down?
Run
netstat -ano
This will give you the PID of the process that has the port open. Check that process in the task manager. Make sure "list processes from all users" is checked.
http://hea-www.harvard.edu/~fine/Tech/addrinuse.html is a great resource for "Bind: Address Already in Use" errors.
Some extracts:
TIME_WAIT is the state that typically ties up the port for several minutes after the process has completed. The length of the associated timeout varies on different operating systems, and may be dynamic on some operating systems, however typical values are in the range of one to four minutes.
Strategies for Avoidance
SO_REUSEADDR
This is both the simplest and the most effective option for reducing the "address already in use" error.
Client Closes First
TIME_WAIT can be avoided if the remote end initiates the closure. So the server can avoid problems by letting the client close first.
Reduce Timeout
If (for whatever reason) neither of these options works for you, it may also be possible to shorten the timeout associated with TIME_WAIT.
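For the SO_REUSEADDR strategy in the extract above, a minimal POSIX sketch is shown below. makeReusableListener is a hypothetical name. The option is set before bind(), which is what lets the bind succeed while old connections on the port are still in TIME_WAIT. Note that this reflects the POSIX behaviour the linked page describes; on Windows, SO_REUSEADDR has different semantics (it can allow two sockets to bind the same port), so treat this as an illustration of the idea rather than a fix for the Windows problem in the question.

```cpp
#include <cassert>
#include <cstdint>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Create a TCP listener on all interfaces with SO_REUSEADDR set before bind().
// Returns the listening descriptor, or -1 on failure.
int makeReusableListener(std::uint16_t port, int backlog) {
    assert(backlog > 0);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    int reuse = 1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);   // port 0 lets the kernel pick a free port

    // Order matters: the option has to be in place when bind() runs.
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof reuse) != 0 ||
        bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) != 0 ||
        listen(fd, backlog) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```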
After seeing https://superuser.com/a/453827/56937 I discovered that there was a WerFault process that was suspended.
It must have inherited the sockets from the non-existent process because killing it freed up my listening ports.

recv() with errno=107 (transport endpoint is not connected)

Well, I use a typical epoll + multithreading model to handle a large number of sockets; that is, I have a thread called epollWorkThread that uses epoll_wait to handle socket I/O. When there is an EPOLLIN event, recv() does the work, and I use non-blocking mode so that it returns immediately. recv() is indeed called in a while(true) loop.
Everything is fine at first (maybe for a couple of hours, maybe minutes, or days if I'm lucky), and I can receive the information. But some time later, recv() keeps returning -1 with errno = 107 (ENOTCONN). The peer on the other end is written in AS3, which makes sure that the socket is connected. So I'm confused by recv()'s behaviour. Thank you in advance; any comment is appreciated!
Errno 107 means that the socket is NOT connected (any more).
There are several reasons why this could happen. Assuming you're right and both sides of the connection claim that the socket is still open, an intermediate router/switch may have dropped the connection due to a timeout. The safest way to prevent this is to periodically send a 'health' or 'keep-alive' message, so that the intermediate router/switch treats the connection as alive.
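The answer describes an application-level keep-alive message. A related kernel-level option, shown here as a swapped-in alternative rather than what the answer literally proposes, is TCP keep-alive. The sketch below is Linux-specific (TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT are Linux option names), enableKeepAlive is a hypothetical helper, and whether empty keep-alive probes are enough to keep a particular router or NAT entry alive is an assumption that would need checking on the real network.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Turn on TCP keep-alive probes for a socket (Linux option names).
// idleSec:     seconds of silence before the first probe is sent
// intervalSec: seconds between unanswered probes
// probes:      unanswered probes after which the connection is reported dead
bool enableKeepAlive(int fd, int idleSec, int intervalSec, int probes) {
    assert(idleSec > 0 && intervalSec > 0 && probes > 0);
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idleSec, sizeof idleSec) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intervalSec, sizeof intervalSec) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) == 0;
}
```

Something like enableKeepAlive(fd, 60, 10, 3) right after accept() or connect() would probe after a minute of silence. An application-level ping has the extra benefit that the AS3 peer also notices when the link has gone away.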