Blocking TCP sockets: “send()” return and error handling - c++

According to this answer (Blocking sockets: when, exactly, does "send()" return?), send() on a blocking socket returns as soon as the user buffer has been copied into the kernel buffer. In the case of a delivery failure (i.e., the client doesn't receive the bytes), how does the process that called send() get notified that a failure occurred?

The POSIX/BSD socket APIs do not provide an interrupt-driven, asynchronous interface to TCP connection errors. Since TCP is reliable, the only way the data was not delivered is if the connection itself suffered a failure that prevented complete delivery.
You have to detect the error by performing some kind of synchronous operation on the (perhaps non-blocking) socket. The closest thing to an asynchronous mechanism is select or poll (or an OS-specific alternative), which lets you wait for status changes on more than one socket in a single synchronous call. Errors may appear in the exceptfds set of select, or as an error indication when polling the socket for read or write. In addition, an error will be delivered when you attempt to read from or write to a socket that is no longer connected.
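As a minimal sketch of that detection (POSIX; the function name is made up, and fd is assumed to be an already connected TCP socket), you can wait with poll() and pull the pending error out with getsockopt(SO_ERROR) when POLLERR or POLLHUP is reported:

    // Sketch (POSIX): wait for activity on a connected socket and retrieve any
    // pending error.  Returns false once the connection is known to be broken.
    #include <poll.h>
    #include <sys/socket.h>
    #include <cstdio>
    #include <cstring>

    bool check_socket(int fd, int timeout_ms) {
        struct pollfd pfd;
        pfd.fd = fd;
        pfd.events = POLLIN;            // also wakes up when data (or EOF) arrives
        int n = poll(&pfd, 1, timeout_ms);
        if (n < 0) { perror("poll"); return false; }
        if (n == 0) return true;        // timeout: no status change observed

        if (pfd.revents & (POLLERR | POLLHUP)) {
            int err = 0;
            socklen_t len = sizeof(err);
            getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);
            fprintf(stderr, "socket error: %s\n", strerror(err));
            return false;               // connection is broken
        }
        return true;                    // readable: a subsequent recv() will tell more
    }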

Packets being lost and resent are a normal part of TCP/IP operation (congestion control relies on packets being lost!).
This is not an error, and therefore you shouldn't receive one (indeed, send couldn't possibly return an error, since it has already returned by the time the loss is detected). Instead, the segments are silently retransmitted until they are acknowledged (or until TCP gives up).
You normally use poll (or select) in combination with blocking sockets, so in case the other end closes the connection or someone pulls out the ethernet cable, you will (well, should, and not necessarily immediately) see POLLHUP, POLLRDHUP, or POLLERR.
When that happens, you know that nobody is listening at the other end any more.
Note that events like routers going down and cables being pulled do not necessarily break a TCP connection, at least not immediately. This can only be detected when a send is attempted and the destination isn't reachable. That could, in theory, happen only after minutes or hours (or someone could plug the cable back in in the meantime, and you would never know!).

Short answer: there is no way of knowing whether the data was delivered to the remote end unless such a check is incorporated into the session-layer protocol. It is for this reason that protocols like HTTP require the remote end to send a response even for PUT requests.
Since IP packets traverse multiple networks and may very well take different routes to the destination, there is no way of knowing, when sending a packet, whether it will reach the remote end. At best the host will keep retransmitting a packet until the remote end acknowledges it, or give up after a timeout.
So if you want to be sure whether data is received by the remote end, require a response from the remote end as part of your protocol.
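For illustration, here is a sketch of such an application-level acknowledgment. The one-byte reply is an invented convention, not part of any standard protocol; only when the reply arrives do you know the peer actually received the data:

    // Sketch: require an application-level acknowledgment after sending.
    #include <sys/socket.h>
    #include <sys/types.h>

    bool send_and_confirm(int fd, const char* data, size_t len) {
        // On a blocking socket, send() may still accept fewer bytes than asked;
        // a real implementation would loop until everything is handed over.
        if (send(fd, data, len, 0) != (ssize_t)len)
            return false;

        char ack = 0;                        // wait for the peer's acknowledgment byte
        return recv(fd, &ack, 1, 0) == 1;    // 0 = peer closed, <0 = error
    }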

Related

TCP - What if client call close() before server accept()

In C/C++, suppose the client and server have finished the 3-way handshake and the connection is sitting in the server's backlog (listen queue). Before the server calls accept(), what happens if the client calls close()? Will the connection be removed from the backlog?
The real-world situation is that the server is sometimes too busy to accept every connection immediately, so some connections wait in the backlog. The client has a timeout for the first response from the server; if the timeout fires, it calls close() and then retries (or gives up). At that moment, I am wondering whether the server will remove the connection from its backlog.
Please share your ideas. Appreciate it!
Generally speaking, if a client calls close(), the client's protocol stack will send a FIN to indicate that the client is done sending, and will wait for the server to send a FIN,ACK back to the client (which won't happen before the server accepts the connection, as we shall see), and then the client will ACK that. This would be a normal termination of a TCP connection.
However, since a TCP connection consists of two more or less independent streams, sending a FIN from the client really is only a statement that the client is done sending data (this is often referred to as "half closed"), and is not actually a request at the TCP protocol level to close the connection (although higher level protocols often will interpret it that way, but they can only do so after the connection has been accepted and they have had a read return 0 bytes in order to learn that the client is done writing). The server can still continue to send data, but since the client has called close(), it is no longer possible for this data to be delivered to the client application. If the server sends further data, the protocol stack on the client will respond with a reset, causing an abnormal termination of the TCP connection. If the client actually wished to continue receiving data from the server after declaring that it was done sending data, it should do so by calling shutdown(sock,SHUT_WR) rather than calling close().
So what this means is that the connections that time out and that are normally closed by clients will generally remain active at the server, and the server will be able to accept them, read the request, process the request, and send the reply and only then discover that the application can no longer read the reply when the reset is returned from the client. The reason I say "generally" is that firewalls, proxies, and OS protocol stacks all place limits on how long a TCP connection can remain in a half closed state, generally in violation of the relevant TCP RFCs but for "valid" reasons such as dealing with DDOS.
I think your concern is that a server that is overloaded will be further overloaded by clients timing out and retrying, which in my view is correct based on my preceding explanation. In order to avoid this, a client timing out could set SO_LINGER to 0 prior to calling close() which would cause a reset to be sent to cause an immediate abnormal termination. I would also suggest using an exponential back-off on timeout to further mitigate the impact on an overloaded server.
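A sketch of that SO_LINGER trick (POSIX; the helper name is made up): setting SO_LINGER with a zero timeout before close() makes the stack send a RST instead of going through the normal FIN/ACK exchange.

    // Sketch: abortive close via SO_LINGER with a zero timeout.
    // close() will then send a RST rather than a FIN.
    #include <sys/socket.h>
    #include <unistd.h>

    void abortive_close(int fd) {
        struct linger lg;
        lg.l_onoff  = 1;   // enable linger
        lg.l_linger = 0;   // zero timeout => reset on close
        setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
        close(fd);
    }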
Once the 3way handshake is complete, the connection is in an ESTABLISHED state. On the client side, it can start sending data immediately. On the server side, the connection is placed in a state/queue that accept() can then pull from so the application can use the connection (see How TCP backlog works in Linux).
If the server doesn't accept() the connection, the connection is still ESTABLISHED; its inbound buffer will simply fill up with whatever data the client sends, if any.
If the client disconnects before accept() is called, then the connection still enters the CLOSED state, and will be removed from the queue that accept() pulls from. The application will never see the connection.

C++ Socket Recv() and Network Interface going down

I have written a client using plain sockets in C to connect to a remote machine and maintain a persistent connection so as to receive push messages. Everything works great. To make it persistent, I have set keepalive and wait on the recv() function in a loop.
The problem is that when the network interface goes down, recv() does not return. As I understand from the socket documentation, the peer has to disconnect for recv() to return. The network interface going down is not the same as the peer disconnecting.
The need here is that, if the network interface goes down, I need to schedule a reconnect so that the channel gets re-established.
Any thoughts on this please?
Use whatever mechanism you wish to force the receive operation to timeout. Depending on the specifics of your use case, you may wish to disconnect if a timeout occurs or you may wish to send something to check the status of the connection.
Whatever protocol you are using on top of TCP should be documented and the documentation should specify how disconnects are detected. You must send to detect a connection loss, so every protocol designed to operate on top of TCP should be designed with this in mind.
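One common mechanism for forcing the receive to time out is SO_RCVTIMEO. A sketch (POSIX; the wrapper name is made up) where recv() returns with EAGAIN/EWOULDBLOCK after the timeout so the application can probe the connection or schedule a reconnect:

    // Sketch: make blocking recv() return after a timeout so the application can
    // decide whether to probe the connection or reconnect.
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <cerrno>

    ssize_t recv_with_timeout(int fd, char* buf, size_t len, int seconds) {
        struct timeval tv;
        tv.tv_sec  = seconds;
        tv.tv_usec = 0;
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

        ssize_t n = recv(fd, buf, len, 0);
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            // Timed out: no data and no error.  The connection may or may not
            // still be alive -- send a probe or schedule a reconnect here.
        }
        return n;
    }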

TIME_WAIT with boost asio

I tried the official TCP echo example server and client. With netstat -ano | findstr TIME_WAIT I can see that the client causes a TIME_WAIT every time, while the server disconnects cleanly.
Is there any way to prevent the TIME_WAIT or CLOSE_WAIT, so that both sides disconnect cleanly?
Here are the captured packets; it seems the last ACK is sent correctly, but there is still a TIME_WAIT on the client side.
CLOSE_WAIT is a programming error. The local application has received an incoming close but hasn't closed this end.
TIME_WAIT comes after a clean disconnect by both parties, and it only lasts a few minutes. The way to avoid it is to be the end that receives the first close. Typically you want to avoid it at the server, so you have the client close first.
A long lingering CLOSE_WAIT is really a programming error (the OS performs the connection shutdown, but your application doesn't remember to free the socket in a timely manner -- or at all).
TIME_WAIT, however, is not really an exceptional condition. It is necessary to provide a clean close on a connection that might have lost the very last ACK segment during the normal connection shutdown. Without it, a retransmission of the FIN+ACK segment would be answered with a connection reset, and some sensitive applications might not like that.
The most common way to have a smaller number of sockets in TIME_WAIT state is to shorten its duration globally, by tuning a global OS-level parameter. IIRC, there is also a way to disable it completely on a single socket through setsockopt() (I don't remember what option, however), but then you might occasionally send possibly unwanted RST segments to peers that lose packets during connection shutdown.
As to why you see them only on one of the sides of the connection, it is probably the side that requested to close the connection first. It is the one that sends the first FIN, receives the FIN+ACK, and sends the last ACK. If that last ACK is lost, it will receive the FIN+ACK again, and should resend the ACK, not a RST. The other side, however, knows for sure that the connection is completely finished when the last ACK arrives, and there is then no need to wait for anything else on that socket: if anything arrives at that host with the same pair of address+TCP port endpoints as the just-closed socket, then it should either be a new connection request (in which case a new connection might be opened), or it is some TCP state machine violation (and must be responded to with a RST, or maybe some ICMP prohibited message).
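The per-socket setsockopt() option alluded to above is most likely SO_LINGER with a zero timeout, which turns the close into an abortive one (a RST), so the socket never enters TIME_WAIT -- with the caveat that any unsent data is discarded and the peer sees a reset. A sketch with Boost.Asio, since that is what the question uses (assuming a connected boost::asio::ip::tcp::socket):

    // Sketch (Boost.Asio): an abortive close that skips TIME_WAIT.  Use with
    // care -- pending data is discarded and the peer receives a reset.
    #include <boost/asio.hpp>

    void abortive_close(boost::asio::ip::tcp::socket& socket) {
        boost::system::error_code ec;
        socket.set_option(boost::asio::socket_base::linger(true, 0), ec);
        socket.close(ec);
    }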

Is it normal for WSASend to fail during big file transfers?

I need a little help if someone's got a minute.
I've written a web server using IO completion ports, but I am having some trouble sending out large files. Web pages seem to load fine, but during large file transfers, WSASend() fails after a few minutes with error "The specified network name is no longer available."
Right now, my server just closes the associated connection when any overlapped operation fails. Is this the right thing to do, or should I retry failed overlapped operations a few times before I close the socket? I am using TCP/stream sockets.
(fixed) I am also receiving what seems like random 0 byte packets from WSARecv. I am not sure what to make of this, or if the problem is related.(/fixed)
Thanks for any help
edit: now that the server properly handles connections, and has a much more comprehensive log, it seems like Len is right. The client is closing the connection for some reason.
The log:
Initializing Windows Sockets...
Forwarding port 80...
Starting server...
Waiting for incoming connections...
Socket 1128: Client connected.
Socket 1128: Request received
Socket 1128: Sent response
Socket 1128: Error 64: SendChunk() failed. //WSASend()
Socket 1128: Closing connection - GetQueueCompletionStatus == FALSE
so the question is now, why would the client close the connection? It takes anywhere from 2-5 minutes to happen. I have decreased the buffer size to 4098 bytes per send, and only send the next chunk when the first has completed.
Thanks again for any ideas on this.
p.s. I even just implemented a retry function so that it will retry a failed overlapped IO operation five times before giving up....still no luck =(
A zero-length return from recv indicates that the client on the other end has closed the connection.
That explains why your subsequent send to the client failed.
http://www.opengroup.org/onlinepubs/009695399/functions/recv.html
If no messages are available to be received and the peer has performed an orderly shutdown, recv() shall return 0.
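In a plain blocking read loop, the rule the spec describes looks like this (a POSIX-style sketch; the function name is made up):

    // Sketch: distinguishing an orderly shutdown (recv() == 0) from a real error.
    #include <sys/socket.h>
    #include <unistd.h>

    // Returns false once the connection should be closed.
    bool handle_read(int fd) {
        char buf[4096];
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0)  return true;   // got data; process buf[0..n)
        if (n == 0) return false;  // peer performed an orderly shutdown
        return false;              // n < 0: real error, inspect errno
    }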
Are you doing anything to impose some form of flow control on your data transmission?
If not then you are probably using up resources which is causing the send to fail.
For example, if you are simply issuing LOTS of WSASend() calls one after the other, rather than pacing them based on when they complete, then each one will use system resources (non-paged pool and/or locked pages, which count towards the 'locked pages limit'). You'll then likely eventually fail with ENOBUFS or similar errors.
What you need to do is build a flow control system that works off of the send completions so that you only ever have a known number of sends outstanding at a time.
See these questions for more detail:
Implement a good performing "to-send" queue with TCP
Limiting TCP sends with a "to-be-sent" queue and other design issues
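As a rough sketch of that pacing idea (not taken from the posts above; the class and method names are invented, and real code also needs per-operation OVERLAPPED bookkeeping so the IOCP worker can route completions back to the right object): keep at most one WSASend outstanding per connection and post the next chunk only from the completion handler.

    // Sketch: one outstanding WSASend per connection; the next chunk is posted
    // only from the completion handler, so kernel buffers are never flooded.
    #include <winsock2.h>
    #include <deque>
    #include <vector>

    class SendQueue {
    public:
        explicit SendQueue(SOCKET s) : sock_(s), sending_(false) { ZeroMemory(&ov_, sizeof(ov_)); }

        // Called by the application; data is copied and sent later if a send is in flight.
        void Queue(const char* data, size_t len) {
            pending_.emplace_back(data, data + len);
            if (!sending_) PostNext();
        }

        // Called from the IOCP worker thread when GetQueuedCompletionStatus reports
        // completion of the overlapped send for this connection.
        void OnSendComplete() {
            sending_ = false;
            if (!pending_.empty()) PostNext();
        }

    private:
        void PostNext() {
            current_ = std::move(pending_.front());
            pending_.pop_front();
            WSABUF buf;
            buf.buf = current_.data();
            buf.len = static_cast<ULONG>(current_.size());
            DWORD ignored = 0;
            ZeroMemory(&ov_, sizeof(ov_));
            sending_ = true;
            if (WSASend(sock_, &buf, 1, &ignored, 0, &ov_, nullptr) == SOCKET_ERROR &&
                WSAGetLastError() != WSA_IO_PENDING) {
                sending_ = false;           // immediate failure: drop the connection
                closesocket(sock_);
            }
        }

        SOCKET sock_;
        bool sending_;
        WSAOVERLAPPED ov_;
        std::vector<char> current_;          // buffer must outlive the overlapped send
        std::deque<std::vector<char>> pending_;
    };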
Finally figured it out.
from Rogers Internet Terms of Service:
Without limitation, you may not use (or allow anyone else to use) our Services to:
(xvi) operate a server in connection with the Services, including, without limitation, mail, news, file, gopher, telnet, chat, Web, or host configuration servers, multimedia streamers or multi-user interactive forums;
how lame is that? O_o
good news: server works fine =)
edit- called Rogers. They verified that they are cutting me off, and told me that I need a business account to run a web server.

What is the best way to implement a heartbeat in C++ to check for socket connectivity?

Hey gang. I have just written a client and server in C++ using sys/socket. I need to handle a situation where the client is still active but the server is down. One suggested way to do this is to use a heartbeat to periodically assert connectivity, and if there is none, to try to reconnect every X seconds for some period Y and then time out.
Is this "heartbeat" the best way to check for connectivity?
The socket I am using might have data on it; is there a way to check that there is a connection without disturbing the buffer?
If you're using TCP sockets over an IP network, you can use the TCP protocol's keepalive feature, which will periodically check the socket to make sure the other end is still there. (This also has the advantage of keeping the forwarding record for your socket valid in any NAT routers between your client and your server.)
Here's a TCP keepalive overview which outlines some of the reasons you might want to use TCP keepalive; this Linux-specific HOWTO describes how to configure your socket to use TCP keepalive at runtime.
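For reference, a sketch of enabling it at runtime on Linux (TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific; other systems expose different knobs, and the timer values here are arbitrary examples):

    // Sketch (Linux): enable TCP keepalive and tune its timers on one socket.
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    void enable_keepalive(int fd) {
        int on = 1;
        setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));

        int idle  = 60;     // seconds of idleness before the first probe
        int intvl = 10;     // seconds between probes
        int cnt   = 5;      // unanswered probes before the connection is dropped
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,  sizeof(idle));
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl));
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,   &cnt,   sizeof(cnt));
    }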
It looks like you can enable TCP keepalive in Windows sockets by setting SIO_KEEPALIVE_VALS using the WSAIoctl() function.
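A sketch of the Windows variant (using the tcp_keepalive structure from <mstcpip.h>; times are in milliseconds, and the values chosen are just examples):

    // Sketch (Windows): enable keepalive with custom timers via SIO_KEEPALIVE_VALS.
    #include <winsock2.h>
    #include <mstcpip.h>

    bool enable_keepalive(SOCKET s) {
        tcp_keepalive ka;
        ka.onoff             = 1;
        ka.keepalivetime     = 60 * 1000;   // ms of idleness before the first probe
        ka.keepaliveinterval = 10 * 1000;   // ms between probes
        DWORD bytes = 0;
        return WSAIoctl(s, SIO_KEEPALIVE_VALS, &ka, sizeof(ka),
                        nullptr, 0, &bytes, nullptr, nullptr) == 0;
    }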
If you're using UDP sockets over IP you'll need to build your own heartbeat into your protocol.
Yes, this heartbeat is the best way. You'll have to build it into the protocol the server and client use to communicate.
The simplest solution is to have the client send data periodically and the server close the connection if it hasn't received any data from the client in a particular period of time. This works perfectly for query/response protocols where the client sends queries and the server sends responses.
For example, you can use the following scheme:
The server responds to every query. If the server does not receive a query for two minutes, it closes the connection.
The client sends queries and keeps the connection open after each one.
If the client has not sent a query for one minute, it sends an "are you there" query. The server responds with "yes I am". This resets the server's two-minute timer and confirms to the client that the connection is still available.
It may be simpler to just have the client close the connection if it hasn't needed to send a query for the past minute. Since all operations are initiated by the client, it can always just open a new connection if it needs to perform a new operation. That reduces it to just this:
The server closes the connection if it hasn't received a query in two minutes.
The client closes the connection if it hasn't needed to send a query in one minute.
However, this doesn't assure the client that the server is present and ready to accept a query at all times. If you need this capability, you will have to implement an "are you there" "yes I am" query/response into your protocol.
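A minimal client-side sketch of that scheme (the "are you there" string and the one-minute timer are just the example values above; a real client would also parse the "yes I am" reply and its normal responses):

    // Sketch: client side of the heartbeat scheme described above.
    #include <poll.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <ctime>

    // Returns false once the connection should be considered dead.
    bool keep_alive_loop(int fd) {
        time_t last_sent = time(nullptr);
        for (;;) {
            struct pollfd pfd = { fd, POLLIN, 0 };
            int n = poll(&pfd, 1, 1000);                 // wake up every second
            if (n < 0) return false;

            if (n > 0 && (pfd.revents & POLLIN)) {
                char buf[128];
                ssize_t r = recv(fd, buf, sizeof(buf), 0);
                if (r <= 0) return false;                // peer closed or error
                // ... handle responses here, including "yes I am" ...
            }

            if (time(nullptr) - last_sent >= 60) {       // idle for one minute
                const char probe[] = "are you there\n";
                if (send(fd, probe, sizeof(probe) - 1, 0) < 0) return false;
                last_sent = time(nullptr);
            }
        }
    }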
If the other side has gone away (i.e. the process has died, the machine has gone down, etc.), attempting to receive data from the socket should result in an error. However if the other side is merely hung, the socket will remain open. In this case, having a heartbeat is useful. Make sure that whatever protocol you are using (on top of TCP) supports some kind of "do-nothing" request or packet - each side can use this to keep track of the last time they received something from the other side, and can then close the connection if too much time elapses between packets.
Note that this is assuming you're using TCP/IP. If you're using UDP, then that's a whole other kettle of fish, since it's connectionless.
Ok, I don't know what your program does or anything, so maybe this isn't feasible, but I suggest that you avoid trying to always keep the socket open. It should only be open when you are using it, and should be closed when you are not.
If you are between reads and writes waiting on user input, close the socket. Design your client/server protocol (assuming you're doing this by hand and not using any standard protocols like http and/or SOAP) to handle this.
Sockets will error if the connection is dropped; write your program such that you don't lose any information in the case of such an error during a write to the socket and that you don't gain any information in the case of an error during a read from the socket. Transactionality and atomicity should be rolled into your client/server protocol (again, assuming you're designing it yourself).
maybe this will help you, TCP Keepalive HOWTO
or this SO_SOCKET