Close a socket without FD_CLR - c++

Should applications use FD_CLR() before close() of a socket descriptor?
Does shutdown take care of FD_CLR()? Sometimes I observe that close() works even without FD_CLR() but sometimes the socket still shows up in the netstat entries.
Why is this erratic?

Should applications use FD_CLR() before close() of a socket descriptor?
You should certainly use FD_CLR() if you're going to keep using that fd_set after closing the socket; a short sketch follows these answers.
Does shutdown take care of FD_CLR()?
No.
Sometimes I observe that close() works even without FD_CLR() but sometimes the socket still shows up in the netstat entries.
Shows up how? FD_CLR() has nothing to do with connections lingering in TIME-WAIT or other states that netstat reports: those are kernel-side TCP states that persist after close(), independent of your fd_set bookkeeping, which is why what you observe looks erratic.
You seem to be asking three questions at once.
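To make the first answer concrete, here is a minimal sketch of clearing a descriptor from a long-lived fd_set while tearing a connection down (the helper name drop_client and the master_set variable are assumptions, not anything from the question):

    #include <sys/select.h>
    #include <unistd.h>

    /* Assumed setup: master_set is the long-lived fd_set that is copied
     * into select() on every loop iteration; fd is a connected socket. */
    void drop_client(int fd, fd_set *master_set)
    {
        FD_CLR(fd, master_set); /* forget the fd before the kernel reuses it */
        close(fd);              /* shutdown() alone would NOT clear the set */
    }

If you skip the FD_CLR() and the kernel later hands out the same descriptor number to a new connection, the stale bit in the set silently refers to the wrong socket, which produces exactly the kind of erratic behaviour described above.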

Related

Problem with blocking unix domain sockets

I'm writing an application that is split into two parts for Mac OS X - a daemon and an agent. I'm using a standard unix socket to communicate between the daemon and the agents. That is, the socket is created with PF_UNIX and SOCK_STREAM.
When an agent is created (whenever a user logs in), one of the first things it does is connect to the socket. This seems to work perfectly for the first agent. However, when a second agent connects, the daemon experiences the following issue:
I'm using select() to check for data that can be read. The select() call succeeds and indicates that there is data to be read. However, when I call recv() it returns -1, with errno set to 35, "Resource temporarily unavailable".
Now, I would expect this for a non-blocking socket, but I have triple-checked - I never set the socket to be non-blocking.
As far as I can tell, this only happens when a second agent connects to the same unix socket. If I limit myself to one daemon and one agent then everything seems to work perfectly. What could be causing this odd behaviour?
It sounds like you're trying to read from the wrong client fd; it's hard to tell for certain without seeing your code, but your description points that way.
So just in case, here's how it works. Your server ends up with three file descriptors: the socket it originally listens on, plus one descriptor for each connected client. When there's something to read on the original listening socket, that means a new client is connecting; it sounds like you have this part right. Each connected client then gives you its own independent fd to read from and write to. select() returns when any of these is ready; you then have to test each fd in the readfds set with FD_ISSET() to see which ones actually have data to read.
You can see a basic example of this type of code here.
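Here is a minimal sketch of that pattern for a PF_UNIX/SOCK_STREAM server like the one described; the socket path is an assumption and error handling is abbreviated:

    #include <string.h>
    #include <unistd.h>
    #include <sys/select.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/un.h>

    int main(void)
    {
        /* One listening fd... */
        int listener = socket(PF_UNIX, SOCK_STREAM, 0);
        struct sockaddr_un addr;
        memset(&addr, 0, sizeof addr);
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, "/tmp/example.sock", sizeof addr.sun_path - 1);
        unlink(addr.sun_path);
        bind(listener, (struct sockaddr *)&addr, sizeof addr);
        listen(listener, 8);

        fd_set master;
        FD_ZERO(&master);
        FD_SET(listener, &master);
        int maxfd = listener;

        for (;;) {
            fd_set readfds = master;   /* select() mutates its argument */
            if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0)
                break;

            for (int fd = 0; fd <= maxfd; fd++) {
                if (!FD_ISSET(fd, &readfds))
                    continue;
                if (fd == listener) {  /* readable listener = new client */
                    int client = accept(listener, NULL, NULL);
                    if (client >= 0) {
                        FD_SET(client, &master);
                        if (client > maxfd)
                            maxfd = client;
                    }
                } else {               /* ...plus one independent fd per client */
                    char buf[512];
                    ssize_t n = recv(fd, buf, sizeof buf, 0);
                    if (n <= 0) {      /* disconnect or error */
                        FD_CLR(fd, &master);
                        close(fd);
                    }
                    /* else: handle n bytes from THIS client's fd */
                }
            }
        }
        return 0;
    }

The crucial point for the bug described above: after accept(), read each client from its own returned descriptor, never from the listening socket.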

Reading data from a socket

I am having issues reading data from a socket. Supposedly, there is a server socket that is waiting for clients to connect. When I write a client to connect() to the server socket/port, it appears that I am connected. But when I try to read() data that the server is supposedly writing on the socket, the read() function hangs until the server app is stopped.
Why would a read() call ever hang if the socket is connected? I believe that I am never really connected to the socket/port, but I can't prove it, because the connect() call did not return an error. The read() call is not returning an error either; it is just never returning at all.
read() blocks until it receives some data (or an error).
As John & Whirl mentioned, the problem is almost certainly that the server hasn't sent any data for your read() call to return. Another easy thing to overlook when you're starting out with network programming is that the data transferred by a server's write() call is not always symmetrical with a client's read() call: where the server does write("hello world"), your read() could easily return "hello world", "hello wo", "hel", or even just "h".
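A common way to cope with that asymmetry is to loop until the expected number of bytes has arrived. A minimal sketch, assuming a protocol with fixed-length messages; read_exact is a hypothetical helper, not a standard call:

    #include <errno.h>
    #include <unistd.h>

    /* Keep calling read() until len bytes arrive, because one write() on
     * the server can surface as several short read()s on the client. */
    ssize_t read_exact(int fd, void *buf, size_t len)
    {
        size_t got = 0;
        while (got < len) {
            ssize_t n = read(fd, (char *)buf + got, len - got);
            if (n == 0)             /* peer closed the connection */
                return (ssize_t)got;
            if (n < 0) {
                if (errno == EINTR) /* interrupted by a signal: retry */
                    continue;
                return -1;          /* real error */
            }
            got += (size_t)n;
        }
        return (ssize_t)got;
    }
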
Unless you explicitly changed your reader's socket to non-blocking mode, a call to read will do exactly what you say until there is data available: It will block forever until some data is actually read.
You can also use netstat (I use it with -f inet) to figure out connections that have been made and see the status of your socket connection.
Your server is probably not writing data to the socket, so your reader just blocks waiting for data to appear on the socket.

listening socket dies unexpectedly

I'm having a problem where a TCP socket is listening on a port, and has been working perfectly for a very long time - it's handled multiple connections, and seems to work flawlessly. However, occasionally when calling accept() to create a new connection the accept() call fails, and I get the following error string from the system:
10022: An invalid argument was supplied.
Apparently this can happen when you call accept() on a socket that is no longer listening, but I have not closed the socket myself, and have not been notified of any errors on that socket.
Can anyone think of any reasons why a listening socket would stop listening, or how the error mentioned above might be generated?
Some possibilities:
Some other part of your code overwrote the handle value. Check whether it has changed (keep a copy somewhere else and compare, print it out, breakpoint on write in the debugger, whatever); a small guard sketch follows this list.
Something closed the handle.
Interactions with a buggy Winsock LSP.
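For the first possibility, one cheap diagnostic is to keep a shadow copy of the listening handle and compare it before every accept(); the names safe_accept and listener_saved are hypothetical:

    #include <assert.h>
    #include <winsock2.h>

    SOCKET listener;       /* the real listening socket */
    SOCKET listener_saved; /* shadow copy taken right after listen() */

    SOCKET safe_accept(void)
    {
        assert(listener == listener_saved); /* catches stray overwrites */
        SOCKET client = accept(listener, NULL, NULL);
        if (client == INVALID_SOCKET && WSAGetLastError() == WSAEINVAL) {
            /* 10022: the socket is no longer listening; log and investigate */
        }
        return client;
    }
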
One thing that comes to mind is system standby or hibernation mode. I'm not sure how these events are handled by the Winsock library; it might be that the network interface is (partially) shut down.
It might make sense to debug the socket's thread (either with an IDE or through a disassembler) and watch its execution for anything that might be causing it to stop listening.

recv() with errno=107 (Transport endpoint is not connected)

I use a typical epoll + multithread model to handle massive numbers of sockets: a thread called epollWorkThread uses epoll_wait() to handle socket I/O. On an EPOLLIN event, recv() does the work; the sockets are in non-blocking mode so recv() returns immediately, and it is indeed called in a while(true) loop.
Everything is fine initially (for a couple of hours, or minutes, or if I'm lucky days): I can receive the information. But some time later recv() insists on returning -1 with errno = 107 (ENOTCONN). The peer at the other end of the connection is written in AS3 and reports that the socket is still connected, so I'm confused by the recv() behaviour. Thank you in advance; any comment is appreciated!
Errno 107 means that the socket is NOT connected (any more).
There are several reasons why this could happen. Assuming you're right and both sides of the connection claim that the socket is still open, an intermediate router/switch may have dropped the connection due to a timeout. The safest way to prevent this is to periodically send a 'health' or 'keep-alive' message, so the intermediate router/switch keeps treating the connection as live.
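One way to get such probes without touching the application protocol is TCP's built-in keep-alive. A sketch; the timing values are arbitrary assumptions, and the TCP_KEEP* tuning knobs are Linux-specific:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Enable keep-alive so an idle connection generates periodic probes,
     * which keeps stateful middleboxes from timing the session out. */
    int enable_keepalive(int fd)
    {
        int on = 1;
        if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
            return -1;
    #ifdef TCP_KEEPIDLE
        int idle = 60, intvl = 10, cnt = 3;   /* arbitrary example values */
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof idle);
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof intvl);
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt);
    #endif
        return 0;
    }

An application-level heartbeat message, as suggested above, has the added advantage of detecting a wedged peer process rather than just a dead transport.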

select(), recv() and EWOULDBLOCK on non-blocking sockets

I would like to know if the following scenario is real:
select() (RD) on a non-blocking TCP socket says that the socket is ready
a following recv() returns EWOULDBLOCK despite the call to select()
For recv() you would get EAGAIN rather than EWOULDBLOCK, and yes, it is possible. Since you have just checked with select(), one of two things happened (a defensive sketch follows the list):
Something else (another thread) has drained the input buffer between select() and recv().
A receive timeout was set on the socket and it expired without data being received.
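Whatever the cause, the defensive pattern is to treat EAGAIN/EWOULDBLOCK from a non-blocking recv() as "nothing to read right now" and go back to select(). A minimal sketch; drain_socket is a hypothetical helper and the fd is assumed to already be non-blocking:

    #include <errno.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Read whatever is currently available on a non-blocking fd.
     * Returns 0 on "would block" (go back to select()), -1 on EOF/error. */
    int drain_socket(int fd)
    {
        char buf[4096];
        for (;;) {
            ssize_t n = recv(fd, buf, sizeof buf, 0);
            if (n > 0)
                continue;                /* ...consume n bytes here... */
            if (n == 0)
                return -1;               /* peer closed */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return 0;                /* spurious wakeup: re-select */
            if (errno == EINTR)
                continue;                /* interrupted: retry */
            return -1;                   /* real error */
        }
    }
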
It's possible, but only in a situation where you have multiple threads/processes trying to read from the same socket.
On Linux it's even documented that this can happen, as I read it.
See this question:
Spurious readiness notification for Select System call
I am aware of a bug in a popular desktop operating system where O_NONBLOCK TCP sockets, particularly those running over the loopback interface, can sometimes return EAGAIN from recv() after select() reports the socket as ready for reading. In my case, this happens after the other side half-closes the sending stream.
For more details, see the source code for t_nx.ml in the NX library of my OCaml Network Application Environment distribution. (link)
Though my application is single-threaded, I noticed that the described behavior is not uncommon on RHEL5, with both TCP and UDP sockets that were set to O_NONBLOCK (the only socket option that is set): select() reports that the socket is ready, but the following recv() returns EAGAIN.
Yes, it's real. Here's one way it can happen:
A future modification to the TCP protocol adds the ability for one side to "revoke" information it sent provided it hasn't been received yet by the other side's application layer. This feature is negotiated on the connection. The other side sends you some data, you get a select hit. Before you can call recv, the other side "revokes" the data using this new extension. Your read gets a "would block" error because no data is available to be read.
The select function is a status-reporting function that does not come with future guarantees. Assuming that a hit on select now assures that a subsequent operation won't block is as invalid as using any other status-reporting function this way. It's as bad as using access to try to ensure a subsequent operation won't fail due to incorrect permissions or using statfs to try to ensure a subsequent write won't fail due to a full disk.
It is possible in a multithreaded environment where two threads are reading from the socket. Is this a multithreaded application?
If you do not call any other syscall between select() and recv() on this socket, then recv() will never return EAGAIN or EWOULDBLOCK.
I don't know what they mean by the receive timeout; the POSIX standard does not mention one here, so you can be safe calling recv().