How to get a return value of 0 from a UDP recvfrom() call after a shutdown call? - c++

I am trying to get recvfrom() to return 0 when the sender has gracefully closed the socket connection, as specified here. With a non-blocking call, I was getting -1 on the UDP socket before any data was sent from the sender to the receiver. After the data was sent and I closed the connection (I tried both shutting down and closing the socket on the sender side), I was still getting -1 rather than 0 to indicate that the socket had been closed. Can anybody help? Is there any way to get this behaviour?
Thanks.

When a UDP socket is close(2)-ed, nothing is sent on the wire, even if the socket was connect(2)-ed. TCP, on the other hand, initiates a four-way connection tear-down. It looks like you are confusing these two cases.

UNIX man page for shutdown states the following:
Return Value:
On success, zero is returned. On error, -1 is returned,
and errno is set appropriately.
Errors:
EBADF - sockfd is not a valid descriptor.
ENOTCONN - The specified socket is not connected.
ENOTSOCK - sockfd is a file, not a socket.
The Windows documentation says much the same:
Return value
If no error occurs, shutdown returns zero. Otherwise, a value of
SOCKET_ERROR is returned, and a specific error code can be retrieved
by calling WSAGetLastError.
The thing is: UDP is not a connection-oriented protocol, and calling connect() on a UDP socket does not establish any association with the peer; it only records a default destination locally.
So my guess is that you're actually getting an ENOTCONN error (or WSAENOTCONN, if you're on Windows).
Check your errno (or WSAGetLastError() on Windows).
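To make the difference from TCP concrete, here is a minimal sketch of the experiment on loopback, assuming a POSIX system. The struct UdpCloseDemo and the function run_udp_close_demo() are names invented for this illustration. It shows that shutting down and closing a connected UDP sender delivers nothing to the receiver, so a non-blocking recvfrom() keeps returning -1, and that recvfrom() returns 0 only when a zero-length datagram is actually sent. Using an empty datagram as an end-of-stream marker is an application-level convention, not something UDP provides.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical result record: what recvfrom() reported in each situation.
struct UdpCloseDemo {
    ssize_t after_peer_close;   // return value after the sender shut down and closed
    int     errno_after_close;  // errno captured right after that call
    ssize_t after_empty_dgram;  // return value after a zero-length datagram was sent
};

UdpCloseDemo run_udp_close_demo() {
    UdpCloseDemo r{-2, 0, -2};

    // Receiver bound to an ephemeral loopback port.
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    bind(rx, reinterpret_cast<sockaddr*>(&addr), sizeof addr);
    socklen_t len = sizeof addr;
    getsockname(rx, reinterpret_cast<sockaddr*>(&addr), &len);

    // Safety net so the blocking receive below cannot hang forever.
    timeval tv{1, 0};
    setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    char buf[64];

    // Sender 1: connect, then shut down and close without sending anything.
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    connect(tx, reinterpret_cast<sockaddr*>(&addr), sizeof addr);
    shutdown(tx, SHUT_RDWR);
    close(tx);

    // Nothing was put on the wire, so the receiver has nothing to read.
    r.after_peer_close = recvfrom(rx, buf, sizeof buf, MSG_DONTWAIT, nullptr, nullptr);
    r.errno_after_close = errno;

    // Sender 2: send an explicit zero-length datagram.
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    sendto(tx, "", 0, 0, reinterpret_cast<sockaddr*>(&addr), sizeof addr);
    r.after_empty_dgram = recvfrom(rx, buf, sizeof buf, 0, nullptr, nullptr);

    close(tx);
    close(rx);
    return r;
}
```

If the receiver needs to know that the sender is done, the sender has to say so in a datagram of its own, and the receiver must tolerate that datagram being lost.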

Related

What is minimal work needed to reconnect existing socket to new server?

Given the following pseudocode...
int sock = socket( AF_INET, SOCK_STREAM, 0 );
sockaddr_in si;
si.sin_family = AF_INET;
si.sin_port = 0;
si.sin_addr.s_addr = htonl( inet_network( "127.0.0.100" ) );
bind( sock, (sockaddr*)&si, sizeof si );
...
struct sockaddr_in peer_addr;
inet_pton(AF_INET, "127.0.0.200", &peer_addr.sin_addr);
peer_addr.sin_family = AF_INET;
peer_addr.sin_port = htons( 9000 );
connect( sock, (sockaddr*)&peer_addr, sizeof peer_addr );
...assuming the connect() is successful, and is followed by the peer closing its respective sockets used with listen() and returned by accept(), is it possible to reuse sock as the argument to a subsequent connect() with a different peer address?
Experimentally, the answer seems to be "no": although the second connect() returns 0, the second peer to which I try to connect never returns from accept(). Can a knowledgeable answerer explain the nature of what is going wrong here? The 0 return value supposedly indicates success, so why might the peer accept() never unblock?
Is there something I can do to reuse sock to connect to a second peer? Or must that second connect() be done with a socket freshly-created by socket()? (I have verified that doing so works)
What is minimal work needed to reconnect existing socket to new server?
Infinite. It is impossible.
is it possible to reuse sock as the argument to a subsequent connect() with a different peer address?
No. You cannot reconnect a TCP socket once you have called connect(), even if it failed. You have to close it and create a new socket. One reason is that if the socket wasn't bound, connect() binds it, and that binding is chosen based on the IP route to the destination, which may not be the same for the second destination.
although the second connect() returns 0
Hard to believe. Are you sure?
the second peer to which I try to connect never returns from listen().
listen() doesn't block. Do you mean accept()?
Can a knowledgeable answerer explain the nature of what is going wrong here?
Again you must mean accept(), and again it is hard to believe in the second connect() returning zero. connect() should have returned -1 with errno == EISCONN (or WSAGetLastError() == WSAEISCONN on Windows).
EDIT: However, calling connect() a second time on a non-blocking socket is a way to detect whether the first connect() has completed. This technique is in all the old books, but now that we have SO_ERROR the correct technique is to check getsockopt(SO_ERROR) if you got EINPROGRESS (WSAEWOULDBLOCK on Windows) from the first connect(). You do these checks when you get a write notification from select(), or a write or error notification from (e)poll(). So all that happened in your case was that the second connect() confirmed the success of the first connect(), and ignored the different target address/port.
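The SO_ERROR technique can be sketched as follows, assuming POSIX sockets and poll(). The function names connect_nonblocking() and make_loopback_listener() are invented for this illustration, and the listener helper exists only so the sketch can be exercised locally.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Starts a non-blocking connect and waits up to timeout_ms for it to finish.
// Returns the connected descriptor, or -1 with errno describing the failure.
int connect_nonblocking(const sockaddr_in& peer, int timeout_ms) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    if (connect(fd, reinterpret_cast<const sockaddr*>(&peer), sizeof peer) == 0)
        return fd;  // completed immediately
    if (errno != EINPROGRESS) {
        int e = errno; close(fd); errno = e;
        return -1;
    }

    // Wait for the write notification that signals completion.
    pollfd p{fd, POLLOUT, 0};
    int n = poll(&p, 1, timeout_ms);
    if (n <= 0) {
        int e = (n == 0) ? ETIMEDOUT : errno;
        close(fd); errno = e;
        return -1;
    }

    // SO_ERROR tells us whether the pending connect succeeded or failed.
    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0) err = errno;
    if (err != 0) { close(fd); errno = err; return -1; }
    return fd;
}

// Test helper (hypothetical): a listening socket on an ephemeral loopback port.
int make_loopback_listener(sockaddr_in* out) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_in a{};
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = 0;  // let the kernel pick a free port
    socklen_t len = sizeof a;
    if (bind(fd, reinterpret_cast<sockaddr*>(&a), sizeof a) < 0 ||
        listen(fd, 8) < 0 ||
        getsockname(fd, reinterpret_cast<sockaddr*>(&a), &len) < 0) {
        close(fd);
        return -1;
    }
    *out = a;
    return fd;
}
```

No second connect() call is needed here: the outcome of the first one is read from SO_ERROR once the socket becomes writable.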
The 0 return value supposedly indicates success, so why might the peer listen() never unblock?
Whatever the appearance, the second connect() failed, so there was no reason for the server to do anything, let alone return from accept().
Is there something I can do to reuse sock to connect to a second peer?
No.
Or must that second connect() be done with a socket freshly-created by socket()?
Yes.
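A small loopback experiment illustrates both points for blocking sockets, assuming POSIX and Linux-like behaviour. The names ReconnectResult, try_reconnect() and make_loopback_listener() are invented for this sketch. It connects a TCP socket to one listener, tries to connect() the same socket to a second listener, and then repeats the second connect() with a freshly created socket.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical record of what the experiment observed.
struct ReconnectResult {
    int  second_connect_ret;    // return value of connect() on the already connected socket
    int  second_connect_errno;  // errno captured right after that call
    bool fresh_socket_ok;       // whether a new socket could reach the second peer
};

ReconnectResult try_reconnect(const sockaddr_in& first, const sockaddr_in& second) {
    ReconnectResult r{0, 0, false};

    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (connect(sock, reinterpret_cast<const sockaddr*>(&first), sizeof first) != 0) {
        r.second_connect_ret = -1;
        r.second_connect_errno = errno;
        close(sock);
        return r;
    }

    // Attempt to point the connected TCP socket at a different peer.
    r.second_connect_ret =
        connect(sock, reinterpret_cast<const sockaddr*>(&second), sizeof second);
    r.second_connect_errno = errno;
    close(sock);

    // The supported route: close the old socket and create a new one.
    int fresh = socket(AF_INET, SOCK_STREAM, 0);
    r.fresh_socket_ok =
        connect(fresh, reinterpret_cast<const sockaddr*>(&second), sizeof second) == 0;
    close(fresh);
    return r;
}

// Test helper (hypothetical): a listening socket on an ephemeral loopback port.
int make_loopback_listener(sockaddr_in* out) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_in a{};
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = 0;  // let the kernel pick a free port
    socklen_t len = sizeof a;
    if (bind(fd, reinterpret_cast<sockaddr*>(&a), sizeof a) < 0 ||
        listen(fd, 8) < 0 ||
        getsockname(fd, reinterpret_cast<sockaddr*>(&a), &len) < 0) {
        close(fd);
        return -1;
    }
    *out = a;
    return fd;
}
```

On a blocking socket I would expect the second connect() to fail with EISCONN rather than return 0, which matches the answer above.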
is it possible to reuse sock as the argument to a subsequent connect() with a different peer address?
For a TCP socket, no it is not possible (for a UDP socket, it is allowed). Once a TCP socket has been closed, it cannot be reused. You need a separate socket() call for each connect() call.
HOWEVER, on Windows only, a SOCKET (from socket() or WSASocket()) can be reused, but only if it is closed using DisconnectEx() with the dwFlags parameter set to TF_REUSE_SOCKET. Then the SOCKET can be passed to ConnectEx() (or AcceptEx()).

WSASend returns before sending data to device actually

Sorry for the poor description of my question.
What my program does is connect to a server, send some data, and close the connection. I have simplified my code as below:
WSAStartup(MAKEWORD(2, 2), &wsaData);
SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
connect(s, (const sockaddr*)&dstAddr, sizeof(dstAddr));
send(s, (const char*)pBuffer, fileLen, 0);
shutdown(s, SD_SEND);
closesocket(s);
WSACleanup();
The server received only part of the data before an RST shut the communication down.
I wrote a simulated server program to accept the connection and receive data, and the simulator received all of the data. Because I can't access the server's source code, I don't know whether something is wrong in it. Is there a way I can avoid this error by adding some code in the client, or can I prove that there is something wrong in the server program?
Setting the socket's linger option fixes the bug, but I need to pick a magic number for the linger time.
linger l;
l.l_onoff = 1;
l.l_linger = 30;
setsockopt(s, SOL_SOCKET, SO_LINGER, (const char*)&l, sizeof(l));
WSASend returns before sending data to device actually
Correct.
I created a non-blocking socket and tried to send data to server.
WSASocket(AF_INET, SOCK_STREAM, IPPROTO_TCP, NULL, 0, WSA_FLAG_OVERLAPPED)
No you didn't. You created an overlapped I/O socket.
After executed, returnValue was SOCKET_ERROR and WSAGetLastError() returned WSA_IO_PENDING. Then I called WSAWaitForMultipleEvents to wait for event being set. After it returned WSA_WAIT_EVENT_0, I called WSAGetOverlappedResult to get actual sent data length and it is the same value with I sent.
So all the data got transferred into the socket send buffer.
I called WSASocket first, then WSASend/WSAWaitForMultipleEvents/WSAGetOverlappedResult several times to send a bunch of data, and closesocket at the end.
So at the end of that process all the data and the close had been transferred to the socket send buffer.
But server couldn't receive all data, I used Wireshark to view tcp packets and found that client sent RST before all packet were sent out.
That could be for a number of reasons none of which is determinable without seeing some code.
If I slept 1 minute before calling closesocket, then server would receive all data.
Again this would depend on what else had happened in your code.
It seemed like that WSASend/WSAWaitForMultipleEvents/WSAGetOverlappedResult returned before sending data to server actually.
Correct.
The data were saved in buffer and waiting for being sent out.
Correct.
When I called closesocket, communication was shut down.
Correct.
They didn't work as my expectation.
So your expectation was wrong.
What did I go wrong? This problem only occurred in specific PCs, the application run well in others.
Impossible to answer without seeing some code. The usual reason for issuing an RST is that the peer had written data to a connection that you had already closed: in other words, an application protocol error; but there are other possibilities.
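One client-side pattern that avoids both a fixed sleep and a magic linger value is to half-close the connection after the last send and then read until the peer closes its side. The sketch below assumes POSIX sockets on Linux; on Winsock the corresponding calls would be shutdown(s, SD_SEND) and closesocket(s). The function names are invented for this illustration, and it only helps if the server actually closes its end after reading everything.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Sends the whole buffer, then half-closes the sending direction so the
// peer sees end-of-stream after the last byte.
bool send_all_and_half_close(int fd, const char* data, size_t len) {
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, data + sent, len - sent, MSG_NOSIGNAL);
        if (n <= 0) return false;  // error: stop instead of looping forever
        sent += static_cast<size_t>(n);
    }
    return shutdown(fd, SHUT_WR) == 0;
}

// Reads and discards incoming data until the peer closes its side.
// Returns true on a clean end-of-stream (recv() == 0), false on error.
bool drain_until_peer_closes(int fd) {
    char buf[4096];
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n == 0) return true;
        if (n < 0) return false;
    }
}
```

A caller would run send_all_and_half_close(), then drain_until_peer_closes(), and only then close the descriptor. Any data the server writes back is consumed by the drain loop, which also covers the case mentioned above where the peer writes to a connection the client has already closed.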

SSL_shutdown() returns -1 and errno is 0

In my C++ application I use OpenSSL to connect to a server using nonblocking BIO. I am developing for mac OS X and iOS.
The first call to SSL_shutdown() returns 0. Which means I have to call SSL_shutdown() again:
The following return values can occur:
0
The shutdown is not yet finished. Call SSL_shutdown() for a second time, if a bidirectional shutdown shall be performed. The output of SSL_get_error may be misleading, as an erroneous SSL_ERROR_SYSCALL may be flagged even though no error occurred.
<0
The shutdown was not successful because a fatal error occurred either at the protocol level or a connection failure occurred. It can also occur if action is need to continue the operation for non-blocking BIOs. Call SSL_get_error with the return value ret to find out the reason.
https://www.openssl.org/docs/ssl/SSL_shutdown.html
So far so good. The problem occurs on the second call to SSL_shutdown(). It returns -1, which means an error has occurred (see above). If I then check with SSL_get_error(), I get SSL_ERROR_SYSCALL, which in turn is supposed to mean that a system error has occurred. But here is the catch: if I check errno, it is 0, i.e. "unknown error". What I have read so far about the issue is that it could mean the server just "hung up", but to be honest this does not satisfy me.
Here is my implementation of the shutdown:
int result = 0;
int shutdownResult;
while ((shutdownResult = SSL_shutdown(sslHandle)) != 1) { //close connection 1 means everything is shut down ok
    if (shutdownResult == 0) { //we are supposed to call shutdown again
        continue;
    } else if (SSL_get_error(sslHandle, shutdownResult) == SSL_ERROR_WANT_READ) {
        [...] //omitted want read code, in this case the application never reaches this point
    } else if (SSL_get_error(sslHandle, shutdownResult) == SSL_ERROR_WANT_WRITE) {
        [...] //omitted want write code, in this case the application never reaches this point
    } else {
        logError("Error in ssl shutdown, ssl error: " + std::to_string(SSL_get_error(sslHandle, shutdownResult)) + ", system error: " + std::string(strerror(errno))); //something went wrong
        break;
    }
}
When run the application logs:
ERROR:: Error in ssl shutdown, ssl error: 5, system error: Undefined error: 0
So is here just the server shutting down the connection or is there a more critical issue? Am I just missing something really obvious?
A full SSL shutdown consists of two parts:
sending the 'close notify' alert to the peer
receiving the 'close notify' alert from the peer
The first SSL_shutdown returned 0 which means that it did send the 'close notify' to the peer but did not receive anything back yet. The second call of SSL_shutdown fails because the peer did not do a proper SSL shutdown and send a 'close notify' back, but instead just closed the underlying TCP connection.
This behavior is actually very common and you can usually just ignore the error. It does not matter much if the underlying TCP connection is going to be closed anyway. A proper SSL shutdown is usually needed only when you want to continue in plain text on the same TCP connection, as is needed for the CCC command in FTPS connections (but even there various implementations fail to handle this case properly).

TCP socket: detect if peer has shut down before sending? (Linux)

Is there any direct command to detect whether the peer has shut down / closed its socket before sending?
I do this:
int sendResult = send( mySD, bufferPtr, numberToSend, MSG_NOSIGNAL );
send() does happily accept the message and seems to send it (positive return value), only the next time I try sending it returns an error. That means: I get the warning 1 message too late.
Yes, I am using select() beforehand, yet it still returns 1 even when the peer has shut down.
As a workaround, I can perform a 0-byte-read with recv() directly before calling send(), that tells me "Connection OK" (-1) or "Peer shutdown" (0) and does pretty much the job:
int readTest = recv( mySD, NULL, 0, MSG_DONTWAIT | MSG_PEEK );
But from the semantic standpoint, it does "feel" wrong to read when I actually want sending, what I actually want is a mere test. So is there a command such as "socket status" where I can directly figure out what I need? The kind of thing recv() uses internally?
As your program is select-based, I assume you register the socket in both the read and write fd sets. If so, select() will report the socket as readable once the peer has shut down, your recv() will eventually return 0, and you can close the socket at that point.
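The peek-style test from the question can be wrapped in a small helper. This is a sketch assuming Linux, where MSG_DONTWAIT is available; PeerState and probe_peer() are names invented for this illustration. It peeks one byte instead of zero so that pending data is not mistaken for end-of-stream.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum class PeerState { Open, Closed, Error };

// Peeks at the receive queue without blocking and without consuming data.
PeerState probe_peer(int fd) {
    char byte;
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n > 0) return PeerState::Open;     // data is pending and stays queued
    if (n == 0) return PeerState::Closed;  // end-of-stream: the peer shut down
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return PeerState::Open;            // nothing to read, no shutdown seen yet
    return PeerState::Error;
}
```

This only reports what has already arrived locally. The peer can still shut down between the probe and the following send(), so the send() result has to be checked as well, and data that is still queued hides a shutdown that arrived behind it until that data has been read.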
I guess there is a reason why protocols on top of sockets do implement ping-pong mechanisms?
Best, Peter

Why would connect() give EADDRNOTAVAIL?

I have in my application a failure that arose which does not seem to be reproducible. I have a TCP socket connection which failed and the application tried to reconnect it. In the second call to connect() attempting to reconnect, I got an error result with errno == EADDRNOTAVAIL which the man page for connect() says means: "The specified address is not available from the local machine."
Looking at the call to connect(), the second argument appears to be the address to which the error is referring to, but as I understand it, this argument is the TCP socket address of the remote host, so I am confused about the man page referring to the local machine. Is it that this address to the remote TCP socket host is not available from my local machine? If so, why would this be? It had to have succeeded calling connect() the first time before the connection failed and it attempted to reconnect and got this error. The arguments to connect() were the same both times.
Would this error be a transient one which, if I had tried calling connect again might have gone away if I waited long enough? If not, how should I try to recover from this failure?
Check this link
http://www.toptip.ca/2010/02/linux-eaddrnotavail-address-not.html
EDIT: Yes I meant to add more but had to cut it there because of an emergency
Did you close the socket before attempting to reconnect? Closing will tell the system that the socketpair (ip/port) is now free.
Here are additional items to look at:
If the local port is already connected to the given remote IP and port (i.e., there's already an identical socketpair), you'll receive this error (see bug link below).
Binding to a socket address that isn't a local one will produce this error. If the IP addresses of a machine are 127.0.0.1 and 1.2.3.4, and you try to bind to 1.2.3.5, you are going to get this error.
EADDRNOTAVAIL: The specified address is unavailable on the remote machine or the address field of the name structure is all zeroes.
Link with a bug similar to yours (answer is close to the bottom)
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4294599
It seems that your socket is basically stuck in one of the TCP internal states and that adding a delay for reconnection might solve your problem as they seem to have done in that bug report.
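A reconnect loop along those lines might look like the sketch below, assuming POSIX sockets. The function names, the attempt count and the delay are all invented for this illustration; treating EADDRNOTAVAIL as transient is the assumption taken from the bug report above, not a guarantee. Each attempt closes the old descriptor and starts from a fresh socket.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Tries to connect with a new socket per attempt, pausing delay_ms between
// attempts when connect() reports EADDRNOTAVAIL. Other errors are returned
// immediately. Returns the connected descriptor, or -1 with errno set.
int reconnect_with_retry(const sockaddr_in& peer, int max_attempts, int delay_ms) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) return -1;
        if (connect(fd, reinterpret_cast<const sockaddr*>(&peer), sizeof peer) == 0)
            return fd;
        int err = errno;
        close(fd);  // never reuse the failed socket
        if (err != EADDRNOTAVAIL) {
            errno = err;
            return -1;
        }
        poll(nullptr, 0, delay_ms);  // plain sleep before the next attempt
    }
    errno = EADDRNOTAVAIL;
    return -1;
}

// Test helper (hypothetical): a listening socket on an ephemeral loopback port.
int make_loopback_listener(sockaddr_in* out) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_in a{};
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = 0;  // let the kernel pick a free port
    socklen_t len = sizeof a;
    if (bind(fd, reinterpret_cast<sockaddr*>(&a), sizeof a) < 0 ||
        listen(fd, 8) < 0 ||
        getsockname(fd, reinterpret_cast<sockaddr*>(&a), &len) < 0) {
        close(fd);
        return -1;
    }
    *out = a;
    return fd;
}
```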
This can also happen if an invalid port is given, like 0.
If you are unwilling to change the number of temporary ports available (as suggested by David), or you need more connections than the theoretical maximum, there are two other methods to reduce the number of ports in use. However, they are to various degrees violations of the TCP standard, so they should be used with care.
The first is to turn on SO_LINGER with a zero-second timeout, forcing the TCP stack to send a RST packet and flush the connection state. There is one subtlety, however: you should call shutdown on the socket file descriptor before you close, so that you have a chance to send a FIN packet before the RST packet. So the code will look something like:
shutdown(fd, SHUT_RDWR);
struct linger linger;
linger.l_onoff = 1;
linger.l_linger = 0;
// todo: test for error
setsockopt(fd, SOL_SOCKET, SO_LINGER,
           (char *) &linger, sizeof(linger));
close(fd);
The server should only see a premature connection reset if the FIN packet gets reordered with the RST packet.
See TCP option SO_LINGER (zero) - when it's required for more details. (Experimentally, it doesn't seem to matter where you set setsockopt.)
The second is to use SO_REUSEADDR and an explicit bind (even if you're the client), which will allow Linux to reuse temporary ports when you run, before they are done waiting. Note that you must use bind with INADDR_ANY and port 0, otherwise SO_REUSEADDR is not respected. Your code will look something like:
int opts = 1;
// todo: test for error
setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
           (char *) &opts, sizeof(int));
struct sockaddr_in listen_addr;
listen_addr.sin_family = AF_INET;
listen_addr.sin_port = 0;
listen_addr.sin_addr.s_addr = INADDR_ANY;
// todo: test for error
bind(fd, (struct sockaddr *) &listen_addr, sizeof(listen_addr));
// todo: test for error
// saddr is the struct sockaddr_in you're connecting to
connect(fd, (struct sockaddr *) &saddr, sizeof(saddr));
This option is less good because you'll still saturate the internal kernel data structures for TCP connections as per netstat -an | grep -e tcp -e udp | wc -l. However, you won't start reusing ports until this happens.
I ran into this issue and resolved it by enabling TCP timestamps.
Root cause:
After a connection is closed, it stays in the TIME_WAIT state for some time. If a new connection with the same IP and port is attempted during this state and SO_REUSEADDR was not set on the socket, bind() will fail with EADDRINUSE.
Even with SO_REUSEADDR set, connect() may still fail with EADDRNOTAVAIL if TCP timestamps are not enabled on both sides.
Solution:
Enable TCP timestamps on both sides, client and server:
echo 1 > /proc/sys/net/ipv4/tcp_timestamps
Reason to enable tcp_timestamp:
When we enable tcp_tw_reuse, sockets in TIME_WAIT state can be used before they expire, and the kernel will try to make sure that there is no collision regarding TCP sequence numbers. If we enable tcp_timestamps, it will make sure that those collisions cannot happen. However, we need TCP timestamps to be enabled on both ends. See the definition of tcp_twsk_unique for the gory details.
reference:
https://serverfault.com/questions/342741/what-are-the-ramifications-of-setting-tcp-tw-recycle-reuse-to-1
Another thing to check is that the interface is up. I got confused by this one recently while using network namespaces, since it seems creating a new network namespace produces an entirely independent loopback interface but doesn't bring it up (at least, with Debian wheezy's versions of things). This escaped me for a while since one doesn't typically think of loopback as ever being down.