How to close a TCP server socket correctly - C++

Can anyone explain to me what I am doing wrong with my TCP server termination?
In my program (single instance), I start another program which starts a TCP server. The TCP server is only allowed to listen for one connection.
After a connection is established between the client and my server, a few messages are exchanged. As soon as the message protocol has run through, I want to terminate the server socket, reset my internal state and close my sub-program.
After a few seconds, it should be possible to open my sub-program again.
If so, I open the socket again... The same network device, the same IP address and the same port as before are used...
My problem: my sub-program crashes when it runs the 2nd time.
With netstat I analyzed my socket and found out that it stays in the LAST_ACK state.
This can take more than 60 seconds (a timeout?) until the socket is finally closed.
For closing the socket, I use the following code:
// Shut down both directions, then close the descriptor (both return 0 on success).
if (0 == shutdown(socketDescriptor_, SHUT_RDWR)) {
    std::cout << "Read/write of socket deactivated" << std::endl;
}
if (0 == close(socketDescriptor_)) {
    std::cout << "Socket is destroyed" << std::endl;
    socketDescriptor_ = -1;
}
Any ideas? Thanks for your help!
Kind regards,
Matthias
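
A note on the rebinding part of the question: when the previous connection is still draining (LAST_ACK here, TIME_WAIT in the general case), the usual mitigation is to set SO_REUSEADDR on the listening socket before bind(), so that a freshly started instance can bind the same address/port. A minimal sketch of that setup, assuming BSD sockets as in the code above; the function name and the INADDR_ANY/port choice are placeholders, not taken from the question:

#include <arpa/inet.h>
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int make_listener(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    // Allow re-binding the same address/port while connections from a
    // previous run are still draining (LAST_ACK/TIME_WAIT).
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
        listen(fd, 1) != 0) {   // backlog of 1: only a single connection is expected
        std::cerr << "bind/listen failed: " << std::strerror(errno) << std::endl;
        close(fd);
        return -1;
    }
    return fd;
}

Note that SO_REUSEADDR only affects binding; it does not shorten the time the old connection spends in LAST_ACK/TIME_WAIT.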

Related

Always listening UDP Server

Good afternoon all,
I have been making a UDP server for gathering metrics on my Windows server (SNMP isn't accurate on Windows as it doesn't have 64-bit counters). The server runs on the Windows server and the client runs on a Linux monitoring box.
I have set it up running as a service and it is running great, except that every once in a while a UDP packet from the Linux machine is not received. I am using the following bit of code to receive UDP packets:
bytes_received = recvfrom(serverSocket, serverBuf, serverBufLen, 0, (SOCKADDR*)&SenderAddr, &SenderAddrSize);
The socket is set to time out every 15 seconds (so any service control requests like stop can be executed). What I think is happening is either:
The UDP packet is arriving in between the 15-second timeout and when it starts listening again, or
the packet is arriving a fraction of a second after another UDP packet has arrived (for a different metric), and the server has gone on to start up a process to send a packet back, so it isn't at the recvfrom yet.
(I am basing both of those on my assumption that it is only waiting for a packet when it is at recvfrom.)
I could possibly move over to TCP to solve this issue, but since the information is time-sensitive, I would prefer to stay with UDP for its speed.
Is there any way to queue up incoming packets and have them be processed, or would I be best to look at TCP instead?
I ended up going with retransmitting the UDP packet if the first one doesn't get a response after 2 seconds. Works a treat so far.
Edit:
As requested, here is the code:
std::string returnMsg;
returnMsg = "CRITICAL - No packet recieved back.";
int i = 0;
while (returnMsg == "CRITICAL - No packet recieved back.") {
    if (i == 5) {
        std::cout << "CRITICAL - No packet recieved back." << "\n";
        return 2;
    }
    //std::cout << "Try " << i << "\n";
    // Now lets send the message
    send_message(args[2], message.c_str());
    // Now lets wait for response
    returnMsg = recieve_message();
    i++;
}
The recieve_message function returns "CRITICAL - No packet recieved back" when the timeout occurs.
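For reference, a minimal sketch of what a recieve_message along these lines might look like, assuming a plain POSIX UDP socket with a 2-second SO_RCVTIMEO; the socket argument and buffer size are assumptions, not taken from the post (the "recieved" spelling is kept to match the sentinel used above):

#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <sys/time.h>

// Hypothetical helper: wait up to 2 seconds for a single UDP reply on 'sock'.
// Returns the sentinel string on timeout, otherwise the datagram payload.
std::string recieve_message(int sock) {
    timeval tv{};
    tv.tv_sec = 2;   // 2-second receive timeout
    tv.tv_usec = 0;
    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    char buf[1024];
    sockaddr_in from{};
    socklen_t fromLen = sizeof(from);
    ssize_t n = recvfrom(sock, buf, sizeof(buf), 0,
                         reinterpret_cast<sockaddr*>(&from), &fromLen);
    if (n < 0) {
        // EAGAIN/EWOULDBLOCK here means the timeout expired with no reply.
        return "CRITICAL - No packet recieved back.";
    }
    return std::string(buf, static_cast<size_t>(n));
}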

OpenSSL client stuck in endless read

I am using cpp-httplib to retrieve some data from a server using long polling (that is, the client issues a request to the server, and the server keeps the connection open until the required data is available or a timeout is reached).
The program is running on my Raspberry Pi, which sits behind a router that does not have a static outgoing IP address. Every time the IP is reassigned (or, at least, close to that point in time), my program breaks, in that the thread currently performing the poll gets stuck forever in httplib::SSLClient::Get, which is caused by a blocking read() syscall. Both server- and client-side timeouts are unable to do anything, whereas a connection close should make read() immediately return 0, which is what I would have expected in this situation.
Inspecting the program with gdb shows the following:
(gdb) thread 2
(gdb) where
__libc_read (nbytes=5, buf=0x75608edb, fd=3) at ../sysdeps/unix/sysv/linux/read.c:26
__libc_read (fd=3, buf=0x75608edb, nbytes=5) at ../sysdeps/unix/sysv/linux/read.c:24
0x76d1862c in ?? () from /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I am not doing anything (as far as I know) that could accidentally overwrite return addresses.
For comparison, a 'healthy' stack trace during a SSLCLient::Get can be found here.
The actual code is quite a lot, but here's a short version that shows the same behaviour:
#include <iostream>
#include <thread>
#define CPPHTTPLIB_OPENSSL_SUPPORT 1
#include "httplib.h"

void poll(httplib::SSLClient* c, char* path) {
    while (true) {
        auto response = c->Get(path);
        if (response) {   // Get() returns an empty result when the request fails
            std::cout << response->body << std::endl;
        }
    }
}

int main(int argc, char* argv[]) {
    if (argc >= 3) {
        httplib::SSLClient client(argv[1], 443, 20);
        std::thread poll_thread(poll, &client, argv[2]);
        poll_thread.join();
    } else {
        std::cerr << "Usage: ./poll <host> <path>" << std::endl;
        return 1;
    }
}
I can think of some workarounds that might or might not work, but I'd really like to know why and how this is happening in the first place.
Just expanding on the keep_alive option I mentioned in the comment.
In the scenario you described, it seems possible that the underlying TCP socket connection was terminated in an unclean fashion, i.e., you say the IP address was reassigned.
Ideally, when a TCP connection is terminated, you want your code to exit from any blocked read/poll operation. That is what happens for normal socket closures, e.g., when the remote process is killed, or the remote process just decides it is time to close. But if the IP address of your host is changed... I'm not sure there will necessarily be a low-level TCP message that says, in effect, "this connection is now closed". So the consequence for your program is that it can still hold a local socket (the local TCP endpoint) and not realise the connection has dropped.
This is where something like keep_alive comes in. The idea is that the kernel sends keep-alive packets to keep testing whether the connection is still established; if these ever fail, it can close the local socket (and so your blocking read, or blocking select, will return with some sort of end-of-stream error).
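For reference, a minimal sketch of enabling kernel keep-alive on a plain POSIX socket descriptor. Whether and how cpp-httplib exposes the underlying descriptor (or a socket-options hook) in your version is an assumption you would need to verify; the timing values are illustrative only:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Enable TCP keep-alive probes on an already-connected socket 'fd' (Linux).
void enable_keepalive(int fd) {
    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));

    // Start probing after 30s of idle, probe every 10s,
    // and drop the connection after 3 failed probes.
    int idle = 30, interval = 10, count = 3;
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count));
}

Once the probes fail, the blocked read() returns with an error instead of hanging indefinitely.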
Separately from keep_alive, you can also consider application-level heartbeat messages (e.g., WebSocket has ping/pong). In addition to ensuring the TCP connection remains established, this confirms whether the remote application is healthy.

Socket stays open after the process that opened it has finished

After closing the client socket on the server side and exiting the application, the socket stays open for some time.
I can see it via netstat
Every 0.1s: netstat -tuplna | grep 6676
tcp 0 0 127.0.0.1:6676 127.0.0.1:36065 TIME_WAIT -
I use log4cxx logging with the telnet appender. log4cxx uses APR sockets.
The Socket::close() method looks like this:
void Socket::close() {
    if (socket != 0) {
        apr_status_t status = apr_socket_close(socket);
        if (status != APR_SUCCESS) {
            throw SocketException(status);
        }
        socket = 0;
    }
}
It completes successfully, but after the program has finished I can still see the open socket via netstat, and if the program starts again log4cxx is unable to open port 6676 because it is busy.
I tried to modify log4cxx to shut the socket down before closing it:
void Socket::close() {
    if (socket != 0) {
        apr_status_t shutdown_status = apr_socket_shutdown(socket, APR_SHUTDOWN_READWRITE);
        printf("Socket::close shutdown_status %d\n", shutdown_status);
        if (shutdown_status != APR_SUCCESS) {
            printf("Socket::close WTF %d\n", shutdown_status != APR_SUCCESS);
            throw SocketException(shutdown_status);
        }
        apr_status_t close_status = apr_socket_close(socket);
        printf("Socket::close close_status %d\n", close_status);
        if (close_status != APR_SUCCESS) {
            printf("Socket::close WTF %d\n", close_status != APR_SUCCESS);
            throw SocketException(close_status);
        }
        socket = 0;
    }
}
But it didn't help; the bug is still reproducible.
This is not a bug. TIME_WAIT (and CLOSE_WAIT) is by design, for safety purposes. You may, however, adjust the wait time. In any case, from the server's perspective the socket is closed and the descriptor no longer counts against the ulimit; it has little visible impact unless you are doing a stress test.
As noted by Calvin, this isn't a bug, it's a feature. TIME_WAIT is a socket state that says: this socket isn't in use any more but nevertheless can't be reused quite yet.
Imagine you have a socket open and some client is sending data. The data may be backed up in the network or be in flight when the server closes its socket.
Now imagine you start the service again, or start some new service. The packets on the wire aren't aware that it's a new service, and the service can't know the packets were destined for a service that's gone. The new service may try to parse the packets and fail because they're in some odd format, or the client may get an unrelated error back and keep trying to send, maybe because the sequence numbers don't match, and the receiving host will get some odd error. With TIME_WAIT the client will get notified that the socket is closed and the server won't potentially get odd data. A win-win. The time it waits should be sufficient for all in-transit data to be flushed from the system.
Take a look at this post for some additional info: Socket options SO_REUSEADDR and SO_REUSEPORT, how do they differ? Do they mean the same across all major operating systems?
TIME_WAIT is a socket state that allows any in-flight packets remaining from the connection to arrive or die before the connection parameters (source address, source port, destination address, destination port) can be reused again. The kernel simply sets a timer and waits for this time to elapse before allowing you to reuse that socket again. You cannot shorten it (and even if you could, you had better not), because you have no way of knowing whether there are still packets travelling, nor of accelerating or killing them. The only possibility you have is to wait for a socket bound to that port to time out and pass from the TIME_WAIT state to the CLOSED state.
If you were allowed to reuse the connection (I think there's an option for this, or something that can be done in the Linux kernel) and you receive a packet from the old connection, you can get a connection reset due to the received packet. This can lead to more problems in the new connection. These problems are avoided by making you wait for all traffic belonging to the old connection to die out or reach its destination before you use that socket again.
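If the practical problem is only that the restarted process cannot re-bind port 6676 while the old socket sits in TIME_WAIT, the usual approach is to set SO_REUSEADDR on the listening socket before binding. A minimal sketch using the APR calls log4cxx already relies on; where exactly the telnet appender creates its listening socket is an assumption you would have to check in the log4cxx source:

#include <apr_network_io.h>

// Allow a freshly started server to bind the port even while a socket
// from the previous run is still in TIME_WAIT.
apr_status_t bind_with_reuse(apr_socket_t* sock, apr_sockaddr_t* addr) {
    apr_status_t rv = apr_socket_opt_set(sock, APR_SO_REUSEADDR, 1);
    if (rv != APR_SUCCESS) {
        return rv;
    }
    return apr_socket_bind(sock, addr);
}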

UDP bind failures

{Windows 7, MinGW 4.8, boost 1.55}
I'm having some problems with UDP binds. I have a client that broadcasts datagrams to listeners listening on a specific port, and the client binds to a port itself in case the listeners want to communicate something back.
The port on which the client needs to bind is X and the servers are listening on Y.
Problem:
If I simulate a client crash (e.g., by causing a segmentation fault by dereferencing a nullptr) after binding the UDP socket to the port, then once the client application is no longer running (no longer listed in Windows Task Manager), netstat -ano | find "X" still shows that someone is bound to port X at IP address 0.0.0.0 (the client had specified the IP address as "any address"). The PID cannot be found in Windows Task Manager. However, with the TCPView application I can see that a <non-existent> process is still bound to 50000. On subsequently starting the client again (without making it crash this time),
I get two behaviors:
<1> On some machines the client is unable to bind to the socket again (although the reuse_address option is set to true) and the error message is: "An attempt was made to access a socket in a way forbidden by its access permissions."
<2> On other machines the client binds successfully, but the read handler is never called and the client does not receive any datagrams on port X, although the servers are unicasting to the client's port X. In fact, <2> happens even when launching multiple instances of the client on the same machine, even if none of the clients were deliberately made to crash and left as zombie processes. Only the 1st one gets datagrams.
Here is how the client socket is set up:
if (!m_udpSocket.is_open())
{
    m_udpSocket.open(m_localEndpoint.protocol(), errorCode); // m_localEndpoint is address 0.0.0.0 and port X
    if (errorCode)
    {
        std::cerr << "Unable to open socket: " << errorCode.message() << std::endl;
    }
    else
    {
        m_udpSocket.set_option(boost::asio::socket_base::reuse_address(true), errorCode);
        if (errorCode)
        {
            std::cerr << "Reuse address option set failure. " << errorCode.message() << std::endl;
        }
        m_udpSocket.set_option(boost::asio::socket_base::broadcast(true), errorCode);
        if (errorCode)
        {
            std::cerr << "Socket cannot send broadcast. " << errorCode.message() << std::endl;
        }
        else
        {
            m_udpSocket.bind(m_localEndpoint, errorCode);
            if (errorCode)
            {
                std::cerr << "Socket cannot bind...!! " << errorCode.message() << std::endl;
            }
        }
    }
}
Can you explain why I get <1> and <2>, and what can I do to avoid them and make the socket bind even if some other process is bound to that socket? I need to support Windows, Linux and Mac.

C++ sleep() breaks program

I am trying to connect to a computer through a socket in C++. Basically, what this code should do is try to connect, and if it can't connect, wait 3 seconds and try again.
while (true) {
    if (connect(sock, (struct sockaddr *) &echoserver, sizeof(echoserver)) >= 0)
    {
        break;
    }
    cout << "Connection failed!";
    sleep(3);
}
What the code does when it's running is: it will connect if it can, but if it can't, the cout never gets called and sleep never gets called either. When sleep is not there, the program works and continually tries to connect to the socket, but there is no delay, so it wouldn't connect anyway. I really need the delay to work.
Could anyone please help?
Once the connect fails, the socket refers to the failed connection attempt. You can no longer use it to connect to anything. You need to close the existing socket and allocate a new one. This would have been much easier to diagnose if you had reported the error in the cout statement. (See the docs for strerror and errno.)
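A minimal sketch of a retry loop along those lines, assuming the echoserver address structure from the question is already filled in; the helper name is a placeholder. Note the fresh socket per attempt and the error text from strerror(errno):

#include <cerrno>
#include <cstring>
#include <iostream>
#include <sys/socket.h>
#include <unistd.h>

int connect_with_retry(const sockaddr* addr, socklen_t addrLen) {
    while (true) {
        // Allocate a fresh socket for every attempt: after a failed
        // connect() the old descriptor cannot be reused reliably.
        int sock = socket(AF_INET, SOCK_STREAM, 0);
        if (sock < 0) {
            std::cerr << "socket() failed: " << std::strerror(errno) << std::endl;
            return -1;
        }
        if (connect(sock, addr, addrLen) >= 0) {
            return sock;   // connected
        }
        std::cerr << "Connection failed: " << std::strerror(errno) << std::endl;
        close(sock);
        sleep(3);          // wait 3 seconds before the next attempt
    }
}

Using std::cerr (or explicitly flushing std::cout) also makes sure the failure message actually appears right away instead of sitting in the output buffer during the 3-second sleep.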