I'm writing an application that is split into two parts for Mac OS X - a daemon and an agent. I'm using a standard unix socket to communicate between the daemon and the agents. That is, the socket is created with PF_UNIX and SOCK_STREAM.
When an agent is created (whenever a user logs in), one of the first things it does is connect to the socket. This works perfectly for the first agent. However, when the second agent connects, the daemon experiences the following issue:
I'm using select() to check for data that can be read. The select() call succeeds and indicates that there is data to be read. However, when I call recv() it returns -1, and errno is set to 35 (EAGAIN, "Resource temporarily unavailable").
Now, I would expect this for a non-blocking socket, but I have triple-checked - I never set the socket to be non-blocking.
As far as I can tell, this only happens when a second agent connects to the same unix socket. If I limit myself to one daemon and one agent then everything seems to work perfectly. What could be causing this odd behaviour?
It sounds a bit like you're trying to read from the wrong client fd; it's hard to tell for sure without seeing your code, but your description points that way.
So just in case, here's how it works. Your server ends up with three file descriptors: the socket it originally listens on, plus one file descriptor for each connected client. When there's something to read on the original socket, that means there's a new client to accept(); it sounds like you have this part right. Each connected client then gives you its own independent fd to read from and write to. select() returns when any of these is ready to read; you then have to check each fd in the readfds set with FD_ISSET() to see whether it actually has data to read.
You can see a basic example of this type of code here.
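For reference, here's a minimal sketch of that pattern in C. Assume listen_fd is a bound, listening PF_UNIX/SOCK_STREAM socket; MAX_CLIENTS and the buffer size are arbitrary, and most error handling is trimmed:

    #include <sys/select.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define MAX_CLIENTS 64

    void serve(int listen_fd)
    {
        int clients[MAX_CLIENTS];
        int nclients = 0;

        for (;;) {
            fd_set readfds;
            FD_ZERO(&readfds);
            FD_SET(listen_fd, &readfds);
            int maxfd = listen_fd;

            for (int i = 0; i < nclients; i++) {
                FD_SET(clients[i], &readfds);
                if (clients[i] > maxfd)
                    maxfd = clients[i];
            }

            if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0)
                continue;                     /* e.g. EINTR; real code should inspect errno */

            /* The listening socket being readable means a new client is waiting. */
            if (FD_ISSET(listen_fd, &readfds) && nclients < MAX_CLIENTS) {
                int fd = accept(listen_fd, NULL, NULL);
                if (fd >= 0)
                    clients[nclients++] = fd;
            }

            /* Only read from the client fds select() actually marked as ready. */
            for (int i = 0; i < nclients; i++) {
                if (FD_ISSET(clients[i], &readfds)) {
                    char buf[512];
                    ssize_t n = recv(clients[i], buf, sizeof buf, 0);
                    if (n <= 0) {             /* 0 = client closed, -1 = error */
                        close(clients[i]);
                        clients[i] = clients[nclients - 1];  /* swap-remove */
                        nclients--;
                        i--;                  /* re-examine the fd moved into this slot */
                    } else {
                        /* handle the n bytes in buf */
                    }
                }
            }
        }
    }

The key point is the per-fd FD_ISSET() check: select() only tells you that at least one descriptor is ready, so you must test each one before calling recv() on it.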
I'm working on a simple chatroom based on C++ and UDP, and I'm using this as a base. At the moment the client and server exchange a single "hello" and then both processes exit. I'd like to keep the socket open after that exchange so I can keep sending other messages, but I haven't found a way to do so. How do I do that? I haven't found much information on what I need, so any help is appreciated. Thanks in advance.
You don't need to send a pulse or a heartbeat to keep the socket open. The socket will remain open for as long as the program is running, or until you call close() on it.
You can wrap your send and receive in an infinite loop, but you should note that the example code you linked to is waaaay too simple for a chat client: you will need to handle errors such as the underlying connection going offline (for example, the interface being disconnected or brought down, in which case the send and recv calls will return an error with an associated errno). You should look into the select, poll and epoll system calls to detect errors and deal with them. A sketch of such a loop follows.
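As a hedged illustration of that loop (names are placeholders; sock is assumed to be a UDP socket on which connect() has been called, which for SOCK_DGRAM merely fixes the peer address; error handling is deliberately minimal):

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    void chat_loop(int sock)
    {
        char line[512], reply[512];

        /* Keep exchanging messages until stdin ends or the socket errors out. */
        while (fgets(line, sizeof line, stdin) != NULL) {
            if (send(sock, line, strlen(line), 0) < 0) {
                perror("send");       /* e.g. interface down: errno says why */
                break;
            }
            ssize_t n = recv(sock, reply, sizeof reply - 1, 0);
            if (n < 0) {
                perror("recv");
                break;
            }
            reply[n] = '\0';
            printf("peer: %s", reply);
        }
        close(sock);                  /* the socket only goes away here */
    }

A real chat client would multiplex stdin and the socket with select() or poll() instead of alternating blocking calls, but this shows that nothing special is needed to keep the socket alive between messages.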
I'm writing a cross-platform client application that uses sockets, written in C++. I'm having problems where the server is doing a hard close on the socket when it's done sending me info.
I've been reading other posts on this topic, and I'm not so much interested in the rights and wrongs of this approach, but it seems the server is either explicitly setting SO_LINGER=0, or that's the default behavior on that system (I'm not sure; it's a Linux box).
I can see (in Wireshark) that the data was sent to me, followed within milliseconds by an RST, indicating a hard close by the server. I personally don't agree with this approach, as it should be up to the client to shut down the socket.
The server team says there's nothing wrong with that approach (doing a hard close rather than a shutdown); it's typical on servers to avoid accumulating TIME_WAIT sockets. On Windows my select() returns, indicating there's something to read (while I haven't read any of this "in transit" data yet).
However, because of the quick arrival of the RST, on Windows recv() returns -1 and I'm seeing error code 10054 (WSAECONNRESET, connection reset by peer). This wouldn't be too bad if I could at least get the data that was sent, but it seems that once my client's socket stack sees the RST, any unread bytes are no longer made available to me.
On Linux (client), there's no problem. It seems the TCP stack is behaving slightly differently, in that I can read the outstanding bytes before the RST is honoured. I'm having trouble convincing the server guys they have a bug, given that it works for a Linux client.
First off, am I correct? Is this a server-side issue? I can't see that the client end is doing anything wrong, so it must be right?
It seems the server team is adamant that they want to perform the close, and they don't want to accumulate TIME_WAITs, so I was going to push for them to set SO_LINGER with, say, a 2-second timeout. Does that sound like it will solve my problem? From what I understand, this will stop the server from sending out an RST so soon after sending data, and should give me a chance to read the outstanding bytes.
Found a definitive answer to my own question:
"...Upon reception of RST segment, the receiving side will immediately abort the connection. This statement has more implications than just meaning that you will not be able to receive or send any more data to/from this connection. It also implies that any unread data still in the TCP reception buffer will be lost..." It cites the book "TCP/IP Internetworking Volume II". I don't have that book, so I can only take his word for it. Doesn't seems to discard data on Linux, only Windows...
Olivier Langlois's blog
The side-effect of fiddling with SO_LINGER to force a reset is that all pending data is lost. The fact that you don't receive it is all the proof you need that the server team is wrong to do this.
RFC 793 cited below says 'this command [ABORT] causes all pending SENDs and RECEIVEs to be aborted, ... and a special RESET message to be sent to the TCP on the other side of the connection.' See also W.R. Stevens, TCP/IP Illustrated, Vol. 1, p. 287: 'Aborting a connection provides two features to the application: (1) any queued data is thrown away and the reset is sent immediately, and (2) the receiver of the RST can tell that the other end did an abort instead of a normal close'. There is similar wording, along with an extract from the BSD code that implements it, in Vol. 2.
The TIME_WAIT state only occurs on a socket which sends a FIN before it has received one: see RFC 793. So the server should be waiting for a FIN from the client, with a suitable timeout, rather than resetting. This will also permit the client to do connection pooling.
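As a sketch of what that looks like in code (assuming the application protocol already tells the client when the response is complete, so the client closes first; the timeout value is illustrative):

    #include <sys/select.h>
    #include <unistd.h>

    /* Server side: after sending its data, wait (bounded) for the client's
     * FIN, which shows up as read() returning 0, and only then close().
     * Because the server's FIN is then sent second, the server passes
     * through LAST_ACK instead of accumulating TIME_WAIT sockets. */
    void close_after_client_fin(int fd, int timeout_sec)
    {
        char buf[256];

        for (;;) {
            fd_set rfds;
            FD_ZERO(&rfds);
            FD_SET(fd, &rfds);
            struct timeval tv = { .tv_sec = timeout_sec };

            if (select(fd + 1, &rfds, NULL, NULL, &tv) <= 0)
                break;                /* timeout or error: stop waiting */
            if (read(fd, buf, sizeof buf) <= 0)
                break;                /* 0 = the client's FIN has arrived */
        }
        close(fd);
    }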
Suppose I have a server application; the connection is over TCP, using the UNIX sockets API.
The connection is asynchronous - in other words, clients' and servers' sockets are non-blocking.
Suppose the following situation: in some conditions, the server may decide to send some data to a connected client and immediately close the connection: using shutdown with SHUT_RDWR.
So my question is: is it guaranteed that when the client calls recv, it will receive the data sent by the server?
Or must recv be called before the server's shutdown for the data to be received? If so, what should I do (or, to be more precise, how should I do it) to make sure the data is received by the client?
You can control this behavior with "setsockopt(SO_LINGER)":
man setsockopt
SO_LINGER
    Waits to complete the close function if data is present. When this
    option is enabled and there is unsent data present when the close
    function is called, the calling application is blocked during the
    close function until the data is transmitted or the connection has
    timed out. When this option is disabled, the close function returns
    without blocking the caller. This option has meaning only for stream
    sockets.
See also:
man read
Beej's Guide to Network Programming
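A minimal sketch of what setting this option looks like (fd is assumed to be a connected stream socket; the timeout value is illustrative):

    #include <stdio.h>
    #include <sys/socket.h>

    int enable_linger(int fd, int seconds)
    {
        struct linger lg;
        lg.l_onoff  = 1;        /* linger on close() */
        lg.l_linger = seconds;  /* how long close() may block */

        if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) < 0) {
            perror("setsockopt(SO_LINGER)");
            return -1;
        }
        return 0;
    }

Note that the combination l_onoff = 1, l_linger = 0 is the "hard close" discussed in the previous question: close() then aborts the connection with an RST instead of lingering.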
There's no guarantee you will receive any data, let alone this data, but the data pending when the socket is closed is subject to the same guarantees as all the other data: if it arrives it will arrive in order and undamaged and subject to TCP's best efforts.
NB 'Asynchronous' and 'non-blocking' are two different things, not two terms for the same thing.
Once you have successfully written the data to the socket, it is in the kernel's buffer, where it will stay until it has been sent and acknowledged. Shutdown doesn't cause the buffered data to get lost. Closing the socket doesn't cause the buffered data to get lost. Not even the death of the sending process would cause the buffered data to get lost.
You can observe the size of the buffer with netstat. The Send-Q column shows how much data the kernel still has to transmit.
After the client has acknowledged everything, the connection disappears from the server. This may happen before the client has read the data, in which case it will show up in Recv-Q on the client. Basically you have nothing to worry about: after a successful write to a TCP socket, every component involved is trying as hard as it can to get your data to the destination unharmed, regardless of what happens to the sending socket and/or process.
Well, maybe one thing to worry about: If the client tries to send anything after the server has done its shutdown, it could get a SIGPIPE and die before it has read all the available data from the socket.
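One common defensive measure against exactly that, as a sketch (assuming a plain write()/read() client; names are illustrative): ignore SIGPIPE process-wide so the failed write returns -1 with errno == EPIPE instead of killing the client, leaving it free to drain whatever is still readable.

    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    void client_setup(void)
    {
        signal(SIGPIPE, SIG_IGN);   /* writes to a dead peer now fail with EPIPE */
    }

    void write_then_drain(int fd, const char *data, size_t len)
    {
        char buf[512];
        ssize_t n;

        if (write(fd, data, len) < 0 && errno == EPIPE)
            fprintf(stderr, "peer shut down; draining remaining data\n");

        /* Read whatever the peer sent before shutting down. */
        while ((n = read(fd, buf, sizeof buf)) > 0) {
            /* process the n bytes that were still queued */
        }
    }

(On Linux you can get the same effect per call with send(..., MSG_NOSIGNAL); BSD-derived systems offer the SO_NOSIGPIPE socket option instead.)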
Currently I am implementing a simple client-server program with just the basic functionalities of read/write.
However, I noticed that if, for example, my server calls write() to reply to my client and the client does not have a corresponding read() call, my server program just hangs there.
Currently I am thinking of using a simple timer to define a timeout and disconnect the client after it expires, but I am wondering if there is a more elegant or standard way of handling such errors?
There are two general approaches to prevent server blocking and to handle multiple clients by a single server instance:
use POSIX threads to handle each client's connection. If one thread blocks because of an erroneous client, the other threads will still continue to run. If the remote client has simply disappeared (crashed, network down, etc.), then sooner or later the TCP stack will signal a timeout and the blocked write operation will fail with an error.
use non-blocking I/O together with a polling mechanism, e.g. select(2) or poll(2). It is quite a bit harder to program using polling calls, though. Network sockets are made non-blocking using fcntl(2), and in cases where a normal write(2) or read(2) on the socket would block, an EAGAIN error is returned instead. You can use select(2) or poll(2) to wait, with an adjustable timeout period, for something to happen on the socket. For example, waiting for the socket to become writable means you will be notified when there is enough socket send-buffer space, e.g. previously written data has been flushed to the client machine's TCP stack. A sketch of this approach is shown below.
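Here's a hedged sketch of the second approach (names and the timeout are illustrative):

    #include <errno.h>
    #include <fcntl.h>
    #include <sys/select.h>
    #include <unistd.h>

    int set_nonblocking(int fd)
    {
        int flags = fcntl(fd, F_GETFL, 0);
        return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    }

    /* Returns bytes written, 0 on timeout, -1 on error. */
    ssize_t write_with_timeout(int fd, const void *buf, size_t len, int timeout_sec)
    {
        fd_set wfds;
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        struct timeval tv = { .tv_sec = timeout_sec };

        int r = select(fd + 1, NULL, &wfds, NULL, &tv);
        if (r <= 0)
            return r;                 /* 0 = timeout: a candidate for disconnection */

        ssize_t n = write(fd, buf, len);
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;                 /* send buffer filled up again in the meantime */
        return n;
    }

If write_with_timeout() keeps returning 0 for some client, that client is not draining its socket and can be disconnected, which gives you the timeout behavior you describe without a separate timer.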
If the client side isn't going to read from the socket anymore, it should close down the socket with close. And if you don't want to do that because the client still might want to write to the socket, then you should at least close the read half with shutdown(fd, SHUT_RD).
This will set it up so the server gets an EPIPE on the write call.
If you don't control the clients... if random clients you didn't write can connect, the server should handle clients actively attempting to be malicious. One way for a client to be malicious is to attempt to force your server to hang. You should use a combination of non-blocking sockets and the timeout mechanism you describe to keep this from happening.
In general you should design the protocol the server and client use so that neither side is trying to write to the socket when the other side isn't going to be reading. This doesn't mean you have to synchronize them tightly or anything. But, for example, HTTP is defined in such a way that it's quite clear to either side whether or not the other side is really expecting them to write anything at any given point in the protocol.
I am trying to pass an established Unix domain socket file descriptor from process A to process B through another Unix domain socket connection, with no luck, although a TCP socket fd is passed with no problem.
Is there a reason for this, or am I doing something wrong?
Both are passed through ancillary messages.
Thanks
A socket file descriptor (just like a regular file descriptor) has no meaning outside the process that created it: if you write the fd number itself as ordinary data, the receiver just gets a bunch of bytes, nothing more, nothing less.
There are two ways to move a working fd between processes: fork(), where the child simply inherits it, and sending it as SCM_RIGHTS ancillary data over an AF_UNIX socket, in which case the kernel installs a duplicate of the descriptor in the receiving process. The SCM_RIGHTS mechanism works for Unix domain socket fds just as it does for TCP ones, which is presumably why your TCP case succeeds.
Alternatively, if you just want some process to connect to a particular Unix socket, you can pass the socket's file-system path to that process; the receiver can then create its own socket and connect to it.
Given that the TCP fd goes through but the Unix domain one doesn't, the problem is most likely in how the ancillary message is built or parsed. Perhaps if you post the relevant parts of your code the reason will be revealed.
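For comparison, here is a minimal sketch of the sending side of SCM_RIGHTS descriptor passing (channel is assumed to be an AF_UNIX socket connected to the receiver; error handling is trimmed):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    int send_fd(int channel, int fd_to_pass)
    {
        char byte = 0;                    /* at least one data byte must go along */
        struct iovec iov = { &byte, 1 };
        char cbuf[CMSG_SPACE(sizeof(int))];
        struct msghdr msg;

        memset(&msg, 0, sizeof msg);
        memset(cbuf, 0, sizeof cbuf);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = cbuf;
        msg.msg_controllen = sizeof cbuf;

        struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
        cm->cmsg_level = SOL_SOCKET;
        cm->cmsg_type  = SCM_RIGHTS;      /* "pass these descriptors" */
        cm->cmsg_len   = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cm), &fd_to_pass, sizeof(int));

        return sendmsg(channel, &msg, 0) < 0 ? -1 : 0;
    }

The receiver mirrors this with recvmsg(), walking the control data with CMSG_FIRSTHDR()/CMSG_DATA() to extract the new descriptor; the fd number it sees may differ from the sender's, since the kernel allocates a fresh slot in the receiving process.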