I've written an application that sends information through a socket using a TCP connection. For several reasons I'm using blocking calls, but I've noticed that boost::asio::write() doesn't block when the other machine (the one receiving the data) disconnects. It doesn't raise an error either.
Is this the expected behavior?
A socket write blocks only when there is no room in the send buffer; otherwise it returns as soon as the data has been copied into the buffer, not when the data has been delivered to the recipient. Also, the network stack may not detect immediately that the other side has disconnected, so you may or may not see an error code on write. So yes, it is expected behavior.
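To make that concrete, here is a minimal sketch, assuming a POSIX system. The helper name queued_without_reader is made up for illustration, and an AF_UNIX socketpair stands in for a real TCP connection so that no network setup is needed; the point it shows (send() returning once the bytes are buffered locally) is the same at the API level.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends msg on a connected stream socket whose
 * other end never calls recv(), and returns what send() reported.
 * Assumption: an AF_UNIX socketpair stands in for a TCP connection. */
ssize_t queued_without_reader(const char *msg)
{
    assert(msg != NULL);
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Nobody reads from sv[1], yet this returns right away: the bytes
     * only had to fit in a kernel buffer, not reach an application. */
    ssize_t n = send(sv[0], msg, strlen(msg), 0);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

A successful return value here only says the data was accepted for sending. It says nothing about whether the peer ever read it.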
Related
I read somewhere that every TCP connection has its own 125kB output and input buffer. What happens if this buffer is full, and I still continue sending data on Linux?
According to http://www.kernel.org/doc/man-pages/online/pages/man2/send.2.html the packets are just silently dropped, without notifying me. What can I do to stop this from happening? Is there any way to find out if at least some of my data has been sent correctly, so that I can continue at a later point in time?
Short answer is this. "send" calls on a TCP socket will just block until the TCP sliding window (or internal queue buffers) opens up as a result of the remote endpoint receiving and consuming data. It's not much different than trying to write bytes to a file faster than the disk can save it.
If your socket is configured for non-blocking mode, send will return EWOULDBLOCK or EAGAIN, until data can be sent. Standard poll, select, and epoll calls will work as expected so you know when to "send" again.
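A rough sketch of the non-blocking case, assuming POSIX sockets. The names set_nonblocking and send_all_nonblocking are hypothetical helpers, not part of any library. When send() reports EAGAIN or EWOULDBLOCK, nothing has been lost; the code simply waits with poll() until the socket is writable and continues from the same offset.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Put a descriptor into non-blocking mode. Returns 0 on success. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: send all len bytes on a non-blocking socket,
 * waiting with poll() whenever the kernel buffer is full.
 * Returns 0 on success, -1 on error or if one wait exceeds timeout_ms. */
int send_all_nonblocking(int fd, const char *buf, size_t len, int timeout_ms)
{
    assert(fd >= 0 && buf != NULL);
    size_t off = 0;
    while (off < len) {
        ssize_t n = send(fd, buf + off, len - off, 0);
        if (n > 0) {
            off += (size_t)n;       /* partial sends are normal */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted, just retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* Buffer full: nothing was dropped, wait for room. */
            struct pollfd p = { .fd = fd, .events = POLLOUT };
            if (poll(&p, 1, timeout_ms) > 0)
                continue;
            return -1;              /* timed out or poll() failed */
        }
        return -1;                  /* real error, see errno */
    }
    return 0;
}
```

Because the offset only advances by what send() actually accepted, you always know how much of your data has been handed to the kernel and can resume later from that point.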
I don't know that the "packets are dropped". I think that what is more likely is that the calls that the program makes to write() will either block or return a failure.
Suppose I have a server application - the connection is over TCP, using UNIX sockets.
The connection is asynchronous - in other words, clients' and servers' sockets are non-blocking.
Suppose the following situation: in some conditions, the server may decide to send some data to a connected client and immediately close the connection: using shutdown with SHUT_RDWR.
So, my question is - is it guaranteed, that when the client call recv, it will receive the (sent by the server) data?
Or, to receive the data, recv must be called before the server's shutdown? If so, what should I do (or, to be more precise, how should I do this), to make sure, that the data is received by the client?
You can control this behavior with "setsockopt(SO_LINGER)":
man setsockopt
SO_LINGER
Waits to complete the close function if data is present. When this option is enabled and there is unsent data present when the close
function is called, the calling application is blocked during the
close function until the data is transmitted or the connection has
timed out. The close function returns without blocking the caller.
This option has meaning only for stream sockets.
See also:
man read
Beej's Guide to Network Programming
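For reference, setting the option looks roughly like this on a POSIX system. This is a sketch, and enable_linger is a hypothetical wrapper name; struct linger, SOL_SOCKET and SO_LINGER are the standard names.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper: ask close() to wait up to `seconds` for unsent
 * data to be transmitted. Returns 0 on success, -1 on error. */
int enable_linger(int fd, int seconds)
{
    assert(fd >= 0 && seconds >= 0);
    struct linger lg;
    lg.l_onoff = 1;          /* turn lingering on */
    lg.l_linger = seconds;   /* upper bound on the wait, in seconds */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}
```

Call it once on the connected socket before the final send and close. Note that a lingering close only waits for transmission; it does not tell you whether the peer application has read the data.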
There's no guarantee you will receive any data, let alone this data, but the data pending when the socket is closed is subject to the same guarantees as all the other data: if it arrives it will arrive in order and undamaged and subject to TCP's best efforts.
NB 'Asynchronous' and 'non-blocking' are two different things, not two terms for the same thing.
Once you have successfully written the data to the socket, it is in the kernel's buffer, where it will stay until it has been sent and acknowledged. Shutdown doesn't cause the buffered data to get lost. Closing the socket doesn't cause the buffered data to get lost. Not even the death of the sending process would cause the buffered data to get lost.
You can observe the size of the buffer with netstat. The Send-Q column is how much data the kernel still wants to transmit.
After the client has acknowledged everything, the port disappears from the server. This may happen before the client has read the data, in which case it will be in Recv-Q on the client. Basically you have nothing to worry about. After a successful write to a TCP socket, every component is trying as hard as it can to make sure that your data gets to the destination unharmed regardless of what happens to the sending socket and/or process.
Well, maybe one thing to worry about: If the client tries to send anything after the server has done its shutdown, it could get a SIGPIPE and die before it has read all the available data from the socket.
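Here is a small sketch of the "send, then close right away" scenario, assuming a POSIX system. The function name send_then_close is hypothetical, and an AF_UNIX socketpair is used as a stand-in for a TCP connection so the example runs without a network; the kernel path differs from TCP, but the behavior visible through the API is the one described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: the "server" end sends msg, then shuts down and
 * closes at once; only afterwards does the "client" end start reading.
 * Returns the number of bytes the client received, or -1 on error.
 * Assumption: an AF_UNIX socketpair stands in for a TCP connection. */
ssize_t send_then_close(const char *msg, char *out, size_t outlen)
{
    assert(msg != NULL && out != NULL);
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    size_t len = strlen(msg);
    if (send(sv[0], msg, len, 0) != (ssize_t)len) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    shutdown(sv[0], SHUT_RDWR);   /* server is done with the connection */
    close(sv[0]);

    /* The client reads after the server is gone: buffered data first,
     * then recv() returns 0 to signal end of stream. */
    size_t got = 0;
    ssize_t n = 0;
    while (got < outlen &&
           (n = recv(sv[1], out + got, outlen - got, 0)) > 0)
        got += (size_t)n;
    close(sv[1]);
    return n < 0 ? -1 : (ssize_t)got;
}
```

The client sees all the bytes that were written before the close, followed by a return value of 0 from recv().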
Using Winsock, C++, I send and receive the data with send()/recv(), TCP connection. I want to be sure that the data has been delivered to the other party, and wonder if it is recommended to send back some acknowledgment message after (if) receiving data with recv.
Here are two possibilities, and please advise which way to go:
If send returns the size of passed buffer, assume that the data has been delivered at least to recv function on the other side of wire. When I say "at least", I mean even if the recv fails there (e.g. due to insufficient buffer, etc.), I don't care, I just want to be sure I've done my server part of work properly - I've sent the data completely (i.e. the data reached the other machine).
Use additional acknowledgment: after receiving the data with recv, send back some ID of received packet (part of header of each data sent) signaling the successful receive operation of that packet. If I don't receive such "acknowledgment message" after some interval, return failure code from the sender function.
The second answer looks more safe, but I don't want to complicate the transfer protocol if it is redundant. Also please note that I'm talking about the TCP connection (which is more safe by itself than UDP).
Are there any other mechanisms (maybe some other APIs? maybe WSARecv()/WSASend() work differently?) of ensuring that the data was delivered to the recv function on the other side?
If you recommend the second way, could you please give me some code snippet that allows me to use recv with timeout to receive the acknowledgment? recv is a blocking operation so it will hang forever if the previous send attempt failed (the other party was not notified). Is there any simple way of using recv with timeout (without creating separate thread every time which would probably be the overkill for each and every send operation).
Also, the amount of data I pass to the send function might be quite big (several megabytes), so how do I choose the timeout for the "acknowledgment message"? Maybe I should "split" large buffers and use several send calls? I think it will get quite complicated, please advise!
EDIT: OK, you people are suggesting that TCP/IP stack will handle it (i.e. no manual acknowledgment required), but this is what I found on MSDN page: "The successful completion of a send function does not indicate that the data was successfully delivered and received to the recipient. This function only indicates the data was successfully sent." So even if the TCP mechanism has the ability to ensure data delivery, I can't get that status (success or not) via send() function, or any other Winsock function I know. Do you know any way of getting the status from the TCP layer? Again - return value of send() function seems to be not enough!
========================================================
EDIT 2: OK, I think we agree that even though TCP protocol considers the error handling when something goes wrong, the send() function of Winsock is not capable of reporting the errors (simply because it returns before actual transmitting of data starts by the network driver). So here is a million dollar question: Does the send() function of Winsock at least ensure that no other packets will be delivered to the other party until the current packet will be? In other words, if the sending fails for some network failure (but not reported by send() call), and then the network failure will be fixed before next call of send() function with next chunk of data, will it be ensured that the previous packet (which failed but not reported by send()) will be delivered before the next packet? In other words, is there a chance that the one particular send() function will fail "silently", so that subsequent send() calls will succeed but the first packet will be lost? AGAIN - I'm not talking at the TCP level, I'm talking at the Winsock API level!
Why don't you trust your TCP/IP stack to guarantee delivery? After all, that is the whole point of using TCP instead of UDP.
The existing answers here are mostly correct: if you use TCP you really don't need to worry about reliable delivery of your packets to your peer.
But this is a dangerous view for some systems where data integrity must be taken to the next level: the Common Criteria auditing requirement FAU_STG.4.1 requires the ability to prevent auditable events if the audit log might suffer a loss of audit entries. (For example, the Linux auditd(8) audit logging daemon can be configured to place the computer in single-user mode or halt the system completely when there is no more space left for audit logs.) Audit logs from remote systems should probably be maintained until it is known that they have been successfully written to centralized log servers.
Financial transactions would probably be best handled with a more reliable protocol than simple TCP as well -- crediting or debiting accounts would be best handled with a multi-staged protocol to ensure availability of funds, perform the transaction, then report the result of the transaction to the origination point.
TCP allows nearly a gigabyte of in-flight data between two peers (under extreme conditions); depending upon the requirements of your application, you might need to maintain that data at the sending side until you receive positive confirmation from your peer that the data has been properly handled.
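If you do go the application-level acknowledgment route, waiting for the reply with a timeout does not need a separate thread. Below is a sketch for POSIX sockets using poll(); on Winsock the same idea should work with select() or the SO_RCVTIMEO socket option. The function name wait_for_ack and the one-byte acknowledgment format are assumptions made up for this example, not part of any protocol mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: wait up to timeout_ms for a one-byte application
 * acknowledgment. Returns 1 if the expected byte arrived, 0 on timeout,
 * -1 on error, closed connection, or an unexpected byte. */
int wait_for_ack(int fd, unsigned char expected, int timeout_ms)
{
    assert(fd >= 0);
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int r;
    do {
        r = poll(&p, 1, timeout_ms);
    } while (r < 0 && errno == EINTR);  /* note: restarts the full wait */
    if (r == 0)
        return 0;                       /* no acknowledgment in time */
    if (r < 0)
        return -1;

    unsigned char b;
    ssize_t n = recv(fd, &b, 1, 0);
    if (n != 1)
        return -1;                      /* error or peer closed */
    return b == expected ? 1 : -1;
}
```

The sender keeps its copy of the data until this returns 1, and treats 0 or -1 as "not confirmed". For multi-megabyte payloads the timeout has to cover the whole transfer plus processing on the peer, so it is a per-application choice.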
Thankfully, most applications aren't this critical; losing a megabyte of data here or there down a socket that reports a closed connection at some point "in the future" really isn't horrible -- we just re-try our HTTP request, or re-attempt the SFTP connection.
Update
A socket will only accept enough data to fill its available window. The window size is negotiated between the two peers during the session handshake. So your calls to send() will begin blocking when the socket's window fills. (The OS might keep letting you add data to its internal buffers too, but at some point the writes will block.) If the peer breaks the connection with a RST or ICMP Unreachable message, a future call to send() will return an error value for Connection Reset or Broken Pipe.
Update 2
I'm not talking at the TCP level, I'm talking at the Winsock API level
This might be the source of confusion. send() has no choice but to adhere to the TCP behavior when used with TCP.
TCP guarantees in-order reliable delivery of a stream of bytes, to the extent that packets can be delivered. (See @Hans's comment about a pony and careless people kicking power cords.) The peer program will see bytes in the correct order they were sent. (Well, okay, TCP also has out-of-band urgent packet delivery, but I haven't actually seen any applications that use it. Using OOB packets, you can get some data out-of-line. Forget I mentioned it.)
If the remote program receives a byte sent on a TCP stream, it reliably received all preceding bytes as well. (Well, there are entire classes of replay attacks that splice together legitimate and fake packets for the remote peer, but those are increasingly difficult on systems with randomized initial sequence numbers. If this is within your threat model, you should be using TLS on top of TCP to provide cryptographically strong tamper evident information. But TLS can't provide better per-packet delivery notification.)
If you use UDP and you care about the data actually being received by the other side you NEED to use ACK, but if you don't need the speed of UDP you should use TCP, as it does the ACKing for you.
I think you are overcomplicating this. Trust your TCP/IP software stack and the reliable delivery it offers. TCP sockets operate on streams of data, not packets. Also, one call to send does not guarantee one call to recv.
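Since one send does not map to one recv, the receiver usually loops until it has the number of bytes it expects. A minimal sketch for POSIX sockets follows; recv_exact is a hypothetical helper name, and how the receiver knows the expected length (a fixed-size header, for instance) is left to your protocol.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: read exactly len bytes from a blocking stream
 * socket. Returns 0 on success, -1 on error or if the peer closes
 * the connection before len bytes have arrived. */
int recv_exact(int fd, void *buf, size_t len)
{
    assert(fd >= 0 && buf != NULL);
    char *p = buf;
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(fd, p + got, len - got, 0);
        if (n > 0)
            got += (size_t)n;      /* may be less than requested */
        else if (n == 0)
            return -1;             /* peer closed early */
        else if (errno != EINTR)
            return -1;             /* real error, see errno */
    }
    return 0;
}
```

The same loop works on Winsock with the obvious type and error-code changes.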
Background: I'm using CreateIoCompletionPort, WSASend/Recv, and GetQueuedCompletionStatus to do overlapped socket io on my server. For flow control, when sending to the client, I only allow several WSASend() to be called when all pending OVERLAPs have popped off the IOCP.
Problem: Recently, there have been occasions when the OVERLAPs do not get returned to the IOCP. The thread calling GetQueuedCompletionStatus does not get them and they remain in my local pending queue. I've verified that the client DOES receive the data off the socket and the socket is connected. No errors were returned when the WSASend() calls were made. The OVERLAPs simply "never" come back without an external stimulus like the following:
Disconnecting the socket from the client or server, immediately allows the GetQueuedCompletionStatus thread to retrieve the OVERLAPs
Making additional calls to WSASend(), sometimes several are needed, before all the OVERLAPs suddenly pop off the queue.
Question: Has anyone seen this type of behavior? Any ideas on what is causing this?
Thanks,
Geoffrey
WSASend() can fail to complete in a timely manner if the TCP window is full. In this case the stack can't send any more data so your WSASend() waits and your completion doesn't occur until the TCP stack CAN send more data.
If you happen to have a protocol between your client and server that has no flow control built into the protocol itself AND you aren't doing any flow control yourself based on write completions and are just sending data as fast as your server can send then you may get to a point where either the network or your client can't keep up and TCP flow control kicks in (when the TCP window gets full). If you continue to just fire off data asynchronously with additional calls to WSASend() then eventually you'll chew your way through all of the non-paged memory on the machine and at that point all bets are off (chances are high that a driver may cause the box to bluescreen).
So, in summary, completions from overlapped socket writes can and will sometimes take longer to come back than you may expect. In your example, I expect that the completions that you get when you close the socket are all failures?
I talk about this some more on my blog; here: http://www.lenholgate.com/blog/2008/07/write-completion-flow-control.html and here: http://www.serverframework.com/asynchronousevents/2011/06/tcp-flow-control-and-asynchronous-writes.html
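The bookkeeping side of flow control based on write completions can be sketched independently of the Windows API. Everything below (the write_limiter type and both function names) is hypothetical and not taken from the blog posts; it only tracks how many bytes have been issued without a completion yet, and it is not thread-safe as written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-connection limiter: caps the bytes handed to the
 * stack whose write completions have not come back yet. */
typedef struct {
    size_t in_flight;   /* bytes issued but not yet completed */
    size_t limit;       /* maximum bytes allowed in flight */
} write_limiter;

/* Call before issuing an asynchronous write. Returns false if the
 * caller should queue the data locally instead of issuing it now. */
bool limiter_try_begin(write_limiter *w, size_t bytes)
{
    assert(w != NULL && w->in_flight <= w->limit);
    if (bytes > w->limit - w->in_flight)
        return false;
    w->in_flight += bytes;
    return true;
}

/* Call from the completion handler once a write has completed. */
void limiter_complete(write_limiter *w, size_t bytes)
{
    assert(w != NULL && bytes <= w->in_flight);
    w->in_flight -= bytes;
}
```

With something like this in place, a slow client makes your local queue grow (which you can bound or act on) instead of making the number of outstanding overlapped writes grow.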
I would like to know if the following scenario is real?!
select() (RD) on non-blocking TCP socket says that the socket is ready
following recv() would return EWOULDBLOCK despite the call to select()
For recv() you would typically see EAGAIN (on many systems, including Linux, EWOULDBLOCK is the same value), and yes, it is possible. Since you have just checked with select(), one of two things happened:
Something else (another thread) has drained the input buffer between select() and recv().
A receive timeout was set on the socket and it expired without data being received.
It's possible, but only in a situation where you have multiple threads/processes trying to read from the same socket.
On Linux it's even documented that this can happen, as I read it.
See this question:
Spurious readiness notification for Select System call
I am aware of an error in a popular desktop operating system where O_NONBLOCK TCP sockets, particularly those running over the loopback interface, can sometimes return EAGAIN from recv() after select() reports the socket is ready for reading. In my case, this happens after the other side half-closes the sending stream.
For more details, see the source code for t_nx.ml in the NX library of my OCaml Network Application Environment distribution.
Though my application is a single-threaded one, I noticed that the described behavior is not uncommon in RHEL5. Both with TCP and UDP sockets that were set to O_NONBLOCK (the only socket option that is set). select() reports that the socket is ready but the following recv() returns EAGAIN.
Yes, it's real. Here's one way it can happen:
A future modification to the TCP protocol adds the ability for one side to "revoke" information it sent provided it hasn't been received yet by the other side's application layer. This feature is negotiated on the connection. The other side sends you some data, you get a select hit. Before you can call recv, the other side "revokes" the data using this new extension. Your read gets a "would block" error because no data is available to be read.
The select function is a status-reporting function that does not come with future guarantees. Assuming that a hit on select now assures that a subsequent operation won't block is as invalid as using any other status-reporting function this way. It's as bad as using access to try to ensure a subsequent operation won't fail due to incorrect permissions or using statfs to try to ensure a subsequent write won't fail due to a full disk.
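In code, that means treating "would block" after a select() hit as a normal outcome rather than an impossible one. A sketch for POSIX follows, assuming the socket has already been put into non-blocking mode; read_when_ready and READ_WOULD_BLOCK are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

enum { READ_WOULD_BLOCK = -2 };

/* Hypothetical helper for a non-blocking socket: wait for readiness,
 * then try to read. Returns the bytes read (0 means the peer closed),
 * READ_WOULD_BLOCK if nothing could be read this time, -1 on error. */
ssize_t read_when_ready(int fd, void *buf, size_t len, int timeout_ms)
{
    assert(fd >= 0 && fd < FD_SETSIZE && buf != NULL);
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int r = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (r < 0)
        return errno == EINTR ? READ_WOULD_BLOCK : -1;
    if (r == 0)
        return READ_WOULD_BLOCK;        /* timed out */

    ssize_t n = recv(fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK ||
                  errno == EINTR))
        return READ_WOULD_BLOCK;        /* readiness was only a hint */
    return n;
}
```

The caller just goes back to its event loop on READ_WOULD_BLOCK and tries again on the next readiness notification.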
It is possible in a multithreaded environment where two threads are reading from the socket. Is this a multithreaded application?
If you do not call any other syscall between select() and recv() on this socket, then recv() will never return EAGAIN or EWOULDBLOCK.
I don't know what they mean by recv-timeout; however, the POSIX standard does not mention it here, so you can be safe calling recv().