Receiving data from an already closed socket? - c++

Suppose I have a server application - the connection is over TCP, using UNIX sockets.
The connection is asynchronous - in other words, clients' and servers' sockets are non-blocking.
Suppose the following situation: in some conditions, the server may decide to send some data to a connected client and immediately close the connection: using shutdown with SHUT_RDWR.
So, my question is - is it guaranteed that when the client calls recv, it will receive the data sent by the server?
Or, to receive the data, must recv be called before the server's shutdown? If so, what should I do (or, more precisely, how should I do it) to make sure the data is received by the client?

You can control this behavior with "setsockopt(SO_LINGER)":
man setsockopt
SO_LINGER
Waits to complete the close function if data is present. When this option is enabled and there is unsent data present when the close function is called, the calling application is blocked during the close function until the data is transmitted or the connection has timed out. When this option is disabled, the close function returns without blocking the caller.
This option has meaning only for stream sockets.
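For illustration, a minimal sketch (hypothetical enable_linger helper, POSIX headers; the option and struct linger are named the same under Winsock) of enabling SO_LINGER so that close blocks until pending data is sent or the timeout expires:

    #include <sys/socket.h>
    #include <cstdio>

    // Minimal sketch: enable SO_LINGER on a connected TCP socket so that
    // close() blocks (for up to `seconds`) until unsent data has been
    // transmitted or the timeout expires. `sockfd` is assumed to be valid.
    bool enable_linger(int sockfd, int seconds) {
        struct linger lin;
        lin.l_onoff  = 1;        // turn lingering on
        lin.l_linger = seconds;  // how long close() may block, in seconds
        if (setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &lin, sizeof(lin)) < 0) {
            std::perror("setsockopt(SO_LINGER)");
            return false;
        }
        return true;
    }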
See also:
man read
Beej's Guide to Network Programming

There's no guarantee you will receive any data, let alone this data, but the data pending when the socket is closed is subject to the same guarantees as all the other data: if it arrives at all, it will arrive in order and undamaged, subject to TCP's best-effort delivery.
NB 'Asynchronous' and 'non-blocking' are two different things, not two terms for the same thing.

Once you have successfully written the data to the socket, it is in the kernel's buffer, where it will stay until it has been sent and acknowledged. Shutdown doesn't cause the buffered data to get lost. Closing the socket doesn't cause the buffered data to get lost. Not even the death of the sending process would cause the buffered data to get lost.
You can observe the size of the buffer with netstat. The SendQ column is how much data the kernel still wants to transmit.
After the client has acknowledged everything, the connection disappears from the server's netstat output. This may happen before the client has read the data, in which case it will be in RecvQ on the client. Basically you have nothing to worry about. After a successful write to a TCP socket, every component is trying as hard as it can to make sure that your data gets to the destination unharmed, regardless of what happens to the sending socket and/or process.
Well, maybe one thing to worry about: If the client tries to send anything after the server has done its shutdown, it could get a SIGPIPE and die before it has read all the available data from the socket.
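To illustrate, a rough sketch (hypothetical helpers, blocking sockets and no error handling for brevity): the server writes and immediately shuts down, and the client simply keeps reading until recv returns 0:

    #include <sys/socket.h>
    #include <unistd.h>

    // Server side (sketch, blocking socket for brevity): write everything,
    // then shut down and close. The data stays in the kernel's send buffer
    // and is still delivered to the peer.
    void send_and_close(int client_fd, const char *buf, size_t len) {
        send(client_fd, buf, len, 0);      // real code should check the return value
        shutdown(client_fd, SHUT_RDWR);    // no further sends/receives on this end
        close(client_fd);                  // buffered data is still transmitted
    }

    // Client side (sketch): keep reading until recv() returns 0, which marks
    // EOF, i.e. the peer has shut down and all pending data has been consumed.
    void drain_socket(int fd) {
        char buf[4096];
        ssize_t n;
        while ((n = recv(fd, buf, sizeof(buf), 0)) > 0) {
            // process n bytes of buf here
        }
        // n == 0: orderly shutdown by the peer; n < 0: check errno
    }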

Related

Information overload from a websocket

I am just very thankful that stackoverflow exists; so many questions that would have taken me hours are answered here by experienced people, thanks everyone :).
One question I have: suppose I am connected to a server via a websocket that sends me data every 1 second, and I am processing that data in a function, call it on_feed(const map_t& m).
Suppose each on_feed call takes 2 seconds, what will happen? Is there an internal queue in the OS that will process the input and queue it?
I hope I am clear; if not: what happens if a server sends data faster than I can process it, since my processing takes time? (I don't want to use my own queue :) )
Thanks!!
A websocket is a TCP socket. TCP sockets have an internal buffer of some unspecified size that holds unread data, until it's read from the socket.
When enough unread data accumulates, low-level TCP flow-control messages get communicated to the sender indicating that the peer cannot accept any more data on the socket (and further messages when this is no longer the case and the sending can resume).
The sender also has a buffer, likewise of some unspecified size, that holds data which was written to the socket but not yet transmitted to the peer. The sender's socket buffer will therefore also accumulate some amount of data, and when that buffer is full any ordinary write() to the socket will block until more data can be written.
At that point what happens depends entirely on the application on the other end of the websocket and it is entirely up to the application to figure out what to do next. It may choose to wait forever, or for some indeterminate period of time until it can write more data to the socket; or it may choose to close the socket immediately, it is entirely up to the websocket server.
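As an aside, you can ask the kernel how large those buffers currently are; a minimal sketch (hypothetical helper, POSIX, assuming fd is the TCP socket underlying the websocket):

    #include <sys/socket.h>
    #include <cstdio>

    // Sketch: query the kernel's receive and send buffer sizes for a socket.
    // (Linux reports roughly double the value set, to account for overhead.)
    void print_buffer_sizes(int fd) {
        int rcvbuf = 0, sndbuf = 0;
        socklen_t len = sizeof(rcvbuf);
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len);
        len = sizeof(sndbuf);
        getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len);
        std::printf("receive buffer: %d bytes, send buffer: %d bytes\n", rcvbuf, sndbuf);
    }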
"Is there an internal queue in the OS that will process the input and queue it?"
Most operating system kernels do not have built-in support for the WebSocket protocol. They generally only provide TCP/IP support. You will probably be using a WebSocket library which will be linked to your program. This library will likely not be part of the operating system (in the narrow sense). In order to support WebSocket messages, this library will probably have its own input queue.
"what happens if a server sends data faster than I can process it, since my processing takes time?"
If the input queue of the WebSocket library is full, then the WebSocket library will probably stop accepting new data from the server, until more input has been processed and there is room in the queue to accept new input.
Therefore, it should generally not be a problem if a server attempts to send data faster than the client can process.
If the server software is programmed to send new data sets to the client at a certain rate, but the client is unable to process the data sets at this rate, then the client will probably stop accepting new data after some time, due to its input buffer being full. After that, the server's output buffer will start to fill. If the server software is well-designed, then it should be able to handle this situation well, and it will stop generating data once the output buffer is full, until there is room again in the output buffer.
However, if the server software is not well-designed, then, depending on the situation, it may not be able to cope with this type of problem.
Also, even if the server software is well-designed, it may expect the client to be able to process the WebSocket messages in a timely manner, and the server may decide to abort the connection if the client is taking too long.

Socket data race [duplicate]

This question already has answers here:
Are parallel calls to send/recv on the same socket valid?
Sockets can generally communicate both ways, therefore the same socket can be used to send and recv.
If I wanted to send some data (on another thread) while the socket is getting read, what would the kernel do? This applies to both sides.
Consider this example: the server is sending you a file, and say it will take a long time (low uplink or a very big file). The user gets bored and decides to SIGINT you. You catch it and tell the server to stop sending the file (with some kind of message).
Will you be able to send that message, telling the server to stop, even though you're reading from the socket? And of course, the same applies to the server side.
Hopefully I've been clear enough.
If I wanted to send some data (on another thread) while the socket is getting read, what would the kernel do?
Nothing special... sockets aren't like garden hoses... there's just some meta-data added to each packet sent between the machines, so reading and writing happen independently. (One possible exception: if one side calls recv() on a socket that still has unsent data in the local buffers due to the Nagle algorithm - which bunches data up into sensible-sized packets - the stack may time out immediately and send whatever it can, but any tuning of that is an implementation latency-tuning detail and doesn't change the big picture or the way the client and server call the TCP API.)
Consider this example: the server is sending you a file, and say it will take a long time (low uplink or a very big file). The user gets bored and decides to SIGINT you. You catch it and tell the server to stop sending the file (with some kind of message). Will you be able to send that message, telling the server to stop, even though you're reading from the socket? And of course, the same applies to the server side.
The kernel accepts a limited amount of data to be sent, and a limited amount of data received, after which it forces the sending side to wait until some has been consumed before sending more. So, if you've sent data to a server, then get a local SIGINT and send an "oh, cancel that" in the same way, the server must read all the already-sent data before it can see the "oh, cancel that". If instead of sending it "in the same way" you turn on the Out Of Band (OOB) flag while sending the cancel message, then the server can (if it's written to do so) detect that there's OOB data and read it before it has finished reading/processing the other data. It will still need to read and discard whatever in-band data you've already sent, but the flow control / buffering mentioned above means that should be a manageable amount - far less than your file size might be. Throughout all this, whatever you want to recv or the server sends is independent and unaffected by the large client->server send, any OOB data, etc.
There's a discussion and example code from GNU at http://www.gnu.org/software/libc/manual/html_node/Out_002dof_002dBand-Data.html
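For illustration, a minimal sketch (hypothetical helpers; TCP urgent data effectively carries a single byte out-of-line, so the marker is only a flag) of sending and reading an out-of-band "cancel" marker:

    #include <sys/socket.h>

    // Sketch: send a single out-of-band byte as a "cancel" marker.
    // The real cancel message would still go in-band; the OOB byte just
    // tells the peer to look for it ahead of the queued data.
    void send_cancel_marker(int fd) {
        const char marker = 'C';          // illustrative value
        send(fd, &marker, 1, MSG_OOB);    // real code should check the return value
    }

    // Receiver side (sketch): if poll() reported POLLPRI on the socket,
    // read the urgent byte with MSG_OOB.
    void read_cancel_marker(int fd) {
        char marker;
        recv(fd, &marker, 1, MSG_OOB);    // real code should check the return value
    }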
Thread 1 can safely write to the socket (with send) whilst thread 2 reads from the socket (with recv). What you need to be careful of is that at the point where you close() the socket the threads are synchronised, else the file descriptor may be reused elsewhere, so the other thread (if not synchronized) could end up reading from a file descriptor now used for something else. One way to achieve this would be for your reading thread to shut down the socket (with shutdown()), which should cause the other end to drop the connection and thus make an in-progress send fail with an error.
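A rough sketch of that pattern (hypothetical run_duplex helper, error handling omitted): one thread reads, another writes, and only the final close() is coordinated:

    #include <sys/socket.h>
    #include <unistd.h>
    #include <thread>

    // Sketch: concurrent recv() and send() on the same connected socket.
    // Reading and writing are independent; only close() needs coordination.
    void run_duplex(int fd) {
        std::thread reader([fd] {
            char buf[4096];
            while (recv(fd, buf, sizeof(buf), 0) > 0) {
                // process incoming data
            }
            // EOF or error: tell the peer we're done, but do NOT close() yet,
            // so the descriptor cannot be recycled under the writer's feet.
            shutdown(fd, SHUT_RDWR);
        });

        std::thread writer([fd] {
            const char msg[] = "hello";
            send(fd, msg, sizeof(msg) - 1, 0);   // fails once the socket is shut down
        });

        reader.join();
        writer.join();
        close(fd);   // safe: both threads are finished with the descriptor
    }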

Send buffer empty of Socket in Linux?

Is there a way to check if the send buffer of a TCP connection is completely empty?
I haven't found anything so far, and I just want to make sure a connection is not closed by my server while data is still being transmitted to a certain client.
I'm using poll to check if I'm able to send data on a non-blocking socket. But that doesn't tell me whether EVERYTHING in the buffer has been sent, does it?
In Linux, you can query a socket's send queue with ioctl(sd, SIOCOUTQ, &bytes). See man ioctl for details.
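A minimal sketch of that query (hypothetical helper; SIOCOUTQ is Linux-specific and lives in <linux/sockios.h>):

    #include <sys/ioctl.h>
    #include <linux/sockios.h>   // defines SIOCOUTQ (Linux-specific)
    #include <cstdio>

    // Sketch: report how many bytes are still in the socket's send queue,
    // i.e. accepted from the application but not yet acknowledged by the peer.
    void print_unsent_bytes(int sockfd) {
        int pending = 0;
        if (ioctl(sockfd, SIOCOUTQ, &pending) == 0)
            std::printf("%d bytes not yet acknowledged\n", pending);
        else
            std::perror("ioctl(SIOCOUTQ)");
    }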
The information is not completely reliable, in the sense that the data may already have been received by the remote host while still being counted, since the buffer cannot be emptied until an ACK is received. You probably should not use it to add another level of flow control on top of TCP.
If the remote host actually closes the connection (or half-closes it), then the socket becomes unwritable, regardless of how much data might have been in the buffer. You can detect this condition by writing 0 bytes to the socket.
The more difficult (and often more likely) condition is the remote host becoming unreachable, because of network issues or because it crashes. In that case, data will pile up in the send buffer, but that can also happen because the remote host's receive buffer is full (perhaps because the process reading the buffer doesn't have enough resources to process its input). In the case of network routing issues, you might get a router notification (an ICMP error), which should make the socket unwritable; unfortunately, there are many network errors which just result in black holes.

Handling POSIX socket read() errors

Currently I am implementing a simple client-server program with just the basic functionalities of read/write.
However, I noticed that if, for example, my server calls write() to reply to my client, and my client does not have a corresponding read() call, my server program will just hang there.
Currently I am thinking of using a simple timer to define a timeout count, and then disconnecting the client after a certain count, but I am wondering if there is a more elegant/standard way of handling such errors?
There are two general approaches to prevent server blocking and to handle multiple clients by a single server instance:
use POSIX threads to handle each client's connection. If one thread blocks because of an erroneous client, other threads will still continue to run. If the remote client has just disappeared (crashed, network down, etc.), then sooner or later the TCP stack will signal a timeout and the blocked write operation will fail with an error.
use non-blocking I/O together with a polling mechanism, e.g. select(2) or poll(2). It is quite a bit harder to program with polling calls, though. Network sockets are made non-blocking using fcntl(2), and in cases where a normal write(2) or read(2) on the socket would block, an EAGAIN error is returned instead. You can use select(2) or poll(2) to wait for something to happen on the socket with an adjustable timeout period. For example, waiting for the socket to become writable means that you will be notified when there is enough socket send buffer space, e.g. previously written data has been flushed to the client machine's TCP stack (see the sketch below).
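A minimal sketch of the second approach (hypothetical helpers, POSIX only): make the socket non-blocking with fcntl, then use poll with a timeout before retrying a write that would block:

    #include <errno.h>
    #include <fcntl.h>
    #include <poll.h>
    #include <unistd.h>

    // Sketch: put a socket into non-blocking mode with fcntl(2).
    bool make_nonblocking(int fd) {
        int flags = fcntl(fd, F_GETFL, 0);
        return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
    }

    // Sketch: try to write; if the socket would block, wait up to timeout_ms
    // with poll(2) for it to become writable, then give up (so the caller can
    // decide to disconnect the unresponsive client).
    ssize_t write_with_timeout(int fd, const char *buf, size_t len, int timeout_ms) {
        ssize_t n = write(fd, buf, len);
        if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
            return n;                               // wrote something, or a hard error

        struct pollfd pfd = { fd, POLLOUT, 0 };
        if (poll(&pfd, 1, timeout_ms) <= 0)         // 0 = timed out, <0 = error
            return -1;
        return write(fd, buf, len);                 // send buffer has room again
    }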
If the client side isn't going to read from the socket anymore, it should close down the socket with close. And if you don't want to do that because the client still might want to write to the socket, then you should at least close the read half with shutdown(fd, SHUT_RD).
This will set it up so the server gets an EPIPE on the write call.
If you don't control the clients... if random clients you didn't write can connect, the server should handle clients actively attempting to be malicious. One way for a client to be malicious is to attempt to force your server to hang. You should use a combination of non-blocking sockets and the timeout mechanism you describe to keep this from happening.
In general you should write the protocols for how the server and client communicate so that neither the server nor the client is trying to write to the socket when the other side isn't going to be reading. This doesn't mean you have to synchronize them tightly or anything. But, for example, HTTP is defined in such a way that it's quite clear to either side whether or not the other side is really expecting them to write anything at any given point in the protocol.

Is acknowledgment response necessary when using send()/recv() of Winsock?

Using Winsock, C++, I send and receive data with send()/recv() over a TCP connection. I want to be sure that the data has been delivered to the other party, and wonder if it is recommended to send back some acknowledgment message after (if) receiving data with recv.
Here are two possibilities, and please advise which way to go:
If send returns the size of the passed buffer, assume that the data has been delivered at least to the recv function on the other side of the wire. When I say "at least", I mean even if the recv fails there (e.g. due to insufficient buffer, etc.), I don't care, I just want to be sure I've done my part of the work properly - I've sent the data completely (i.e. the data reached the other machine).
Use additional acknowledgment: after receiving the data with recv, send back some ID of received packet (part of header of each data sent) signaling the successful receive operation of that packet. If I don't receive such "acknowledgment message" after some interval, return failure code from the sender function.
The second approach looks safer, but I don't want to complicate the transfer protocol if it is redundant. Also please note that I'm talking about a TCP connection (which is inherently more reliable than UDP).
Is there any other mechanisms (maybe some other APIs? maybe WSARecv()/WSASend() work differently?) of ensuring that the data was delivered to the recv function on the other side?
If you recommend the second way, could you please give me some code snippet that allows me to use recv with a timeout to receive the acknowledgment? recv is a blocking operation, so it will hang forever if the previous send attempt failed (the other party was not notified). Is there any simple way of using recv with a timeout (without creating a separate thread every time, which would probably be overkill for each and every send operation)?
Also, the amount of data I pass to the send function might be quite big (several megabytes), so how should I choose the timeout for the "acknowledgment message"? Maybe I should "split" large buffers and use several send calls? I think it will get quite complicated, please advise!
EDIT: OK, you people are suggesting that TCP/IP stack will handle it (i.e. no manual acknowledgment required), but this is what I found on MSDN page: "The successful completion of a send function does not indicate that the data was successfully delivered and received to the recipient. This function only indicates the data was successfully sent." So even if the TCP mechanism has the ability to ensure data delivery, I can't get that status (success or not) via send() function, or any other Winsock function I know. Do you know any way of getting the status from the TCP layer? Again - return value of send() function seems to be not enough!
EDIT 2: OK, I think we agree that even though the TCP protocol handles errors when something goes wrong, the send() function of Winsock is not capable of reporting them (simply because it returns before the network driver actually starts transmitting the data). So here is the million dollar question: does the send() function of Winsock at least ensure that no other packets will be delivered to the other party until the current packet has been? In other words, if the sending fails because of some network failure (not reported by the send() call), and the network failure is then fixed before the next call of send() with the next chunk of data, is it ensured that the previous packet (which failed but was not reported by send()) will be delivered before the next packet? In other words, is there a chance that one particular send() call will fail "silently", so that subsequent send() calls succeed but the first packet is lost? AGAIN - I'm not talking at the TCP level, I'm talking at the Winsock API level!
Why don't you trust your TCP/IP stack to guarantee delivery? After all, that is the whole point of using TCP instead of UDP.
The existing answers here are mostly correct: if you use TCP you really don't need to worry about reliable delivery of your packets to your peer.
But this is a dangerous view for some systems where data integrity must be taken to the next level: the common criteria auditing requirement FAU_STG.4.1 requires the ability to prevent auditable events if the audit log might suffer a loss of audit entries. (For example, the Linux auditd(8) audit logging daemon can be configured to place the computer in single-user-mode or halt the system completely when there is no more space left for audit logs.) Audit logs from remote systems should probably be maintained until it is known that they have been successfully written to centralized log servers.
Financial transactions would probably be best handled with a more reliable protocol than simple TCP as well -- crediting or debiting accounts would be best handled with a multi-staged protocol to ensure availability of funds, perform the transaction, then report the result of the transaction to the origination point.
TCP allows nearly a gigabyte of in-flight data between two peers (under extreme conditions); depending upon the requirements of your application, you might need to maintain that data at the sending side until you receive positive confirmation from your peer that the data has been properly handled.
Thankfully, most applications aren't this critical; losing a megabyte of data here or there down a socket that reports a closed connection at some point "in the future" really isn't horrible -- we just re-try our HTTP request, or re-attempt the SFTP connection.
Update
A socket will only accept enough data to fill its available window. The window size is negotiated between the two peers during the session handshake. So your calls to send() will begin blocking when the socket's window fills. (The OS might keep letting you add data to its internal buffers too, but at some point the writes will block.) If the peer breaks the connection with a RST or ICMP Unreachable message, a future call to send() will return an error value for Connection Reset or Broken Pipe.
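For illustration, a minimal Winsock sketch (hypothetical send_all helper; assumes WSAStartup has already been called, the program links against ws2_32, and s is a connected socket) of the difference between "the local stack accepted it" and "the connection is broken":

    #include <winsock2.h>
    #include <cstdio>

    // Sketch: send a buffer on a blocking Winsock socket and report
    // connection-level failures. send() blocks once the send buffer /
    // TCP window is full; if the peer resets the connection, a later
    // call fails with an error such as WSAECONNRESET.
    bool send_all(SOCKET s, const char *buf, int len) {
        int sent = 0;
        while (sent < len) {
            int n = send(s, buf + sent, len - sent, 0);
            if (n == SOCKET_ERROR) {
                std::printf("send failed: %d\n", WSAGetLastError());
                return false;
            }
            sent += n;
        }
        return true;   // accepted by the local stack; delivery is still TCP's job
    }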
Update 2
I'm not talking at the TCP level, I'm talking at the Winsock API level
This might be the source of confusion. send() has no choice but to adhere to the TCP behavior when used with TCP.
TCP guarantees in-order reliable delivery of a stream of bytes, to the extent that packets can be delivered. (See #Hans's comment about a pony and careless people kicking power cords.) The peer program will see bytes in the correct order they were sent. (Well, okay, TCP also has out-of-band urgent packet delivery, but I haven't actually seen any applications that use it. Using OOB packets, you can get some data out-of-line. Forget I mentioned it.)
If the remote program receives a byte sent on a TCP stream, it reliably received all preceding bytes as well. (Well, there are entire classes of replay attacks that splice together legitimate and fake packets for the remote peer, but those are increasingly difficult on systems with randomized initial sequence numbers. If this is within your threat model, you should be using TLS on top of TCP to provide cryptographically strong tamper evident information. But TLS can't provide better per-packet delivery notification.)
If you use UDP and you care about the data actually being received by the other side, you NEED to implement acknowledgments yourself; but if you don't need the speed of UDP, you should use TCP, as it does the ACKing for you.
I think you are overcomplicating this; trust your TCP/IP software stack and the reliable delivery it offers. TCP sockets operate on streams of data, not packets. Also, one call to send does not necessarily correspond to one call to recv.
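To illustrate that last point, a minimal sketch (hypothetical recv_all helper, Winsock flavour) of reading a fixed number of bytes, since one send on the peer may arrive split across several recv calls:

    #include <winsock2.h>

    // Sketch: read exactly `len` bytes from a TCP socket. One send() on the
    // peer may arrive as several recv() results (or several sends as one),
    // so the application must loop until the full message is in.
    bool recv_all(SOCKET s, char *buf, int len) {
        int got = 0;
        while (got < len) {
            int n = recv(s, buf + got, len - got, 0);
            if (n <= 0)               // 0 = peer closed, SOCKET_ERROR = failure
                return false;
            got += n;
        }
        return true;
    }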