Unix socket hangs on recv, until I place/remove a breakpoint anywhere - c++

[TL;DR version: the code below hangs indefinitely on the second recv() call both in Release and Debug mode. In Debug, if I place or remove a breakpoint anywhere in the code, it makes the execution continue and everything behaves normally]
I'm coding a simple client-server communication using UNIX sockets. The server is in C++ while the client is in Python. The connection (TCP socket on localhost) is established without problems, but when it comes to receiving data on the server side, it hangs in the recv function. Here is the code where the problem happens:
bool server::readBody(int csock) // csock is the socket filedescriptor
{
int bytecount;
// protobuf-related variables
google::protobuf::uint32 siz;
kinMsg::request message;
// if the code is working, client will send false
// I initialize at true to be sure that the message is actually read
message.set_endconnection(true);
// First, read 4-characters header for extracting data size
char buffer_hdr[5];
if((bytecount = recv(csock, buffer_hdr, 4, MSG_WAITALL))== -1)
::std::cerr << "Error receiving data "<< ::std::endl;
buffer_hdr[4] = '\0';
siz = atoi(buffer_hdr);
// Second, read the data. The code hangs here !!
char buffer [siz];
if((bytecount = recv(csock, (void *)buffer, siz, MSG_WAITALL))== -1)
::std::cerr << "Error receiving data " << errno << ::std::endl;
//Finally, process the protobuf message
google::protobuf::io::ArrayInputStream ais(buffer,siz);
google::protobuf::io::CodedInputStream coded_input(&ais);
google::protobuf::io::CodedInputStream::Limit msgLimit = coded_input.PushLimit(siz);
message.ParseFromCodedStream(&coded_input);
coded_input.PopLimit(msgLimit);
if (message.has_endconnection())
return !message.endconnection();
return false;
}
As can be seen in the code, the protocol is such that the client will first send the number of bytes in the message in a 4-character array, followed by the protobuf message itself. The first recv call works well and does not hang. Then, the code hangs on the second recv call, which should be recovering the body of the message.
Now, for the interesting part. When run in Release mode, the code hangs indefinitely and I have to kill either the client or the server. It does not matter whether I run it from my IDE (qtcreator), or from the CLI after a clean build (using cmake/g++).
When I run the code in Debug mode, it also hangs at the same recv() call. Then, if I place or remove a breakpoint ANYWHERE in the code (before or after that line of code), it starts again and works perfectly: the server receives the data and reads the correct message.endconnection() value before returning from the readBody function. The breakpoint that I place to trigger this behavior does not even have to be hit. Since the readBody() function is called in a loop (my C++ server waits for requests from the Python client), the same behavior happens again at the next iteration, and I again have to place or remove a breakpoint anywhere in the code, which is not necessarily hit, in order to get past that recv() call. The loop looks like this:
bool connection = true;
// server waiting for client connection
if (!waitForConnection(connectionID)) std::cerr << "Error accepting connection" << ::std::endl;
// main loop
while(connection)
{
if((bytecount = recv(connectionID, buffer, 4, MSG_PEEK))== -1)
{
::std::cerr << "Error receiving data "<< ::std::endl;
}
else if (bytecount == 0)
break;
try
{
if(readBody(connectionID))
{
sendResponse(connectionID);
}
// if client is requesting disconnection, break the while(true)
else
{
std::cout << "Disconnection requested by client. Exiting ..." << std::endl;
connection = false;
}
}
catch(...)
{
std::cerr << "Error receiving message from client" << std::endl;
}
}
Finally, as you can see, when the program returns from readBody(), it sends back another message to the client, which processes it and prints it to standard output (the Python code works and is not shown because the question is already long enough). From this last behavior, I conclude that the protocol and client code are OK. I tried adding sleep calls at many points to see whether it was a timing problem, but it did not change anything.
I searched all over Google and SO for a similar problem, but did not find anything. Help would be much appreciated !

The solution is to not use any flags. Call recv with 0 for the flags, or just use read instead of recv.
You are asking the socket for data that is not there. The recv call expects 10 bytes, but the client only sent 6. MSG_WAITALL states clearly that the call should block until all 10 bytes are available in the stream.
If you don't use any flags, the call will succeed with a bytecount of 6, which is exactly the same effect as with MSG_DONTWAIT, without the potential side effects of non-blocking calls.
I tested this on the GitHub project, and it works.

The solution is to replace MSG_WAITALL with MSG_DONTWAIT in the recv() calls. It now works fine. To summarize, this makes the recv() calls non-blocking, which makes the whole code work.
However, this still raises many questions, the first of which is: why did it work with this weird breakpoint toggling?
If the call was blocking in the first place, one could assume that it is because there is no data on the socket. Let's consider both situations here:
There is no data on the socket, which is the reason why the blocking recv() call was not returning. Changing it to a non-blocking recv() call would then, in the same situation, trigger an error. If not, the protobuf deserialization would fail afterwards when trying to deserialize from an empty buffer. But it does not ...
There is data on the socket. Then why on earth would it block in the first place?
Obviously there is something that I don't get about sockets in C, and I'd be very happy if somebody has an explanation for this behavior !

Related

OpenSSL server unable to handle client disconnect with delays in between writes

I am new to OpenSSL programming. I have coded an OpenSSL server and client in C (also tested in C++). When the two connect, they handshake successfully and are able to read from and write to each other. I currently have it set up such that the client only reads from the stream into a buffer, like so:
while((rc = SSL_read(ssl, buffer, sizeof(buffer))) > 0){
// SSL_read does not null-terminate, so print exactly rc bytes
fprintf(stdout,"%.*s\n", rc, buffer);
}
Likewise, my server is setup such that it constantly writes to stream from buffer, like so:
while ((rc = SSL_write(ssl, buffer, sizeof(buffer))) > 0) {
fprintf(stdout, "Sent message.\n");
}
fprintf(stdout, "Done sending.\n");
And this works. If I abruptly end the client with ^C (Ctrl C), the server finishes and prints "Done sending." However, if I put a delay longer than about 10000 nanoseconds between every SSL_write (within the server's writing while loop), I get unexpected behaviour when the client disconnects, either abruptly (using ^C) or normally via a counter and break. To clarify, before the disconnect the server is able to SSL_write and the client is able to SSL_read normally with a delay of any duration (I haven't tried anything past a minute).
This issue means that a client connection can effectively crash the server thread, as demonstrated by "Done sending." not being printed after the client disconnects when using delays longer than 10000 nanoseconds. I do not want the server to be able to crash because of an abrupt disconnect in a session. To be clear, the server crashes inside the call to SSL_write, which never returns.
Things I have tried in order to solve this issue:
Attempting to see changes in these return values: SSL_want(ssl), SSL_get_error(ssl,0), ERR_get_error(), and SSL_get_shutdown(ssl). In some test code, I have printed all these out prior to calling SSL_write and none have changed their values before crashing.
Clearing the error queue prior to every SSL_write using ERR_clear_error()
Seeing if anything is printed with ERR_print_errors_fp(stderr) - nada
I have used the following delay methods:
// Method 1
for(long i = 0; i < (long) 99999999; i++){}
// Method 2
struct timespec tim, tim2;
tim.tv_sec = 0;
tim.tv_nsec = 10000L; // 10000 ns = 10 microseconds
nanosleep(&tim, &tim2);
// Method 3
sleep(1);
I personally think it would be ludicrous to be required to write to socket within a duration of a hundredth of a millisecond just so that the server doesn't crash on client disconnect.
Is this actually expected behavior? Am I doing something wrong or am I forgetting something? What should I do to circumvent this issue?
Any help or advice would be appreciated.

Multithreading in C++, receive message from socket

I have studied Java for 8 months but decided to learn some C++ in my spare time.
I'm currently making a multithreaded server in Qt with MinGW. My problem is that when a client connects, I create an instance of Client (which is a class) and pass the socket to the Client class constructor.
Then I start a thread in the client object (startClient()) which is supposed to wait for messages, but it doesn't. By the way, startClient is the method that I create the thread from. See the code below.
What happens then? When I try to send messages to the server, I get only errors, the server doesn't print that a new client has connected, and for some reason my computer starts working really hard. Qt Creator also gets super slow until I close the server program.
What I am actually trying to achieve is an object that derives from a thread class, but I have heard that this isn't a very good idea in C++.
The listener loop in the server:
for (;;)
{
if ((sock_CONNECTION = accept(sock_LISTEN, (SOCKADDR*)&ADDRESS, &AddressSize)))
{
cout << "\nClient connected" << endl;
Client client(sock_CONNECTION); // new object and pass the socket
std::thread t1(&Client::startClient, client); //create thread of the method
t1.detach();
}
}
the Client class:
Client::Client(SOCKET socket)
{
this->socket = socket;
cout << "hello from clientconstructor ! " << endl;
}
void Client::startClient()
{
cout << "hello from clientmethod ! " << endl;
// WHEN I ADD THE CODE BELOW I DON'T GET ANY OUTPUT ON THE CONSOLE!
// No messages gets received either.
char RecvdData[100] = "";
int ret;
for(;;)
{
try
{
ret = recv(socket,RecvdData,sizeof(RecvdData),0);
cout << RecvdData << endl;
}
catch (int e)
{
cout << "Error sending message to client" << endl;
}
}
}
It looks like your Client object is going out of scope after you detach it.
if (/* ... */)
{
Client client(sock_CONNECTION);
std::thread t1(&Client::startClient, client);
t1.detach();
} // GOING OUT OF SCOPE HERE
You'll need to create a pointer of your client object and manage it, or define it at a higher level where it won't go out of scope.
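One way to manage the object through a pointer is shared ownership. This is a minimal sketch, not the asker's real class: Client here is a stand-in that takes an int, and the seen parameter exists only so the example can observe that the thread ran. As a side note, std::thread copies the arguments it is given, so in the posted code the thread works on a copy of client. Making ownership explicit avoids having to reason about which object the thread is using:

```cpp
#include <atomic>
#include <memory>
#include <thread>

// Hypothetical stand-in for the question's Client class.
class Client {
public:
    explicit Client(int socket) : socket_(socket) {}
    // Placeholder for the real receive loop; just reports the socket value.
    void startClient(std::atomic<int>& seen) { seen = socket_; }
private:
    int socket_;
};

// Creates the Client on the heap and starts its worker thread.
std::thread launchClient(int socket, std::atomic<int>& seen)
{
    auto client = std::make_shared<Client>(socket);
    // The lambda holds its own shared_ptr copy, so the Client stays alive
    // until the thread function returns, even after this scope ends.
    return std::thread([client, &seen] { client->startClient(seen); });
}
```

The returned thread can be joined or detached by the caller. If it is detached, anything captured by reference (seen in this sketch) has to outlive the thread.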
The fact that you never see any output from the Server likely means that your client is unable to connect to your Server in the first place. Check that you are doing your IP addressing correctly in your connect calls. If that looks good, then maybe there is a firewall blocking the connection. Turn that off or open the necessary ports.
Your connecting client is likely getting an error from connect that it is interpreting as success and then trying to send lots of traffic on an invalid socket as fast as it can, which is why your machine seems to be working hard.
You definitely need to check the return values from accept, connect, read and write more carefully. Also, make sure that you aren't running your Server's accept socket in non-blocking mode. I don't think that you are because you aren't seeing any output, but if you did it would infinitely loop on error spawning tons of threads that would also infinitely loop on errors and likely bring your machine to its knees.
If I misunderstood what is happening and you do actually get a client connection and have "Client connected" and "hello from client method ! " output, then it is highly likely that your calls to recv() are failing and you are ignoring the failure. So, you are in a tight infinite loop that is repeatedly outputting "" as fast as possible.
You also probably want to change your catch block to catch (...) rather than int. I doubt either recv() or cout throw an int. Even so, that catch block won't be invoked when recv fails because recv doesn't throw any exceptions AFAIK. It returns its failure indicator through its return value.

Boost asio exits with code 0 for no reason. Setting a breakpoint AFTER the problematic statement solves it

I'm writing a TCP server-client pair with boost asio. It's very simple and synchronous.
The server is supposed to transmit a large amount of binary data through several recursive calls to a function that transmits a packet of data over TCP. The client does the analogous thing, reading and appending the data through a recursive function that reads incoming packets from the socket.
However, in the middle of receiving this data, most times (around 80%) the client just stops recursion suddenly, always before one of the read calls (shown below). It shouldn't be able to do this, given that there are several other statements and function calls after the recursion.
size_t bytes_transferred = m_socket.read_some(boost::asio::buffer(m_fileReadBuffer, m_fileReadBuffer.size()));
m_fileReadBuffer is a boost::array of char, with size 4096 (although I have tried other buffer formats as well with no success).
There is absolutely no way I can conceive of deducing why this is happening.
The program exits immediately, so I can't pass an error code to read_some and read any error messages, since that would need to happen after the read_some statement
No exceptions are thrown
No errors or warnings on compile/runtime
If I put breakpoints inside the recursive function, the problem never happens (transfer completes successfully)
If I put breakpoints after the transfer, or trap the execution in a while loop after the transfer, the problem never happens and there is no sign of anything wrong
Also, it's important to note that the server ALWAYS successfully sends all the data. On top of that, the problem always happens at the very end of transmissions: I can send 8000 bytes and it will exit when around 6000 or 7000 bytes have been transferred, and I can send 8000000 bytes and it will exit when something like 7996000 bytes have been transferred.
I can provide any code necessary, I just have no idea of where the problem could be. Below is the recursive read function on the client:
void TCP_Client::receive_volScan_message()
{
try
{
//If the transfer is complete, exit this loop
if(m_rollingSum >= (std::streamsize)m_fileSize)
{
std::cout << "File transfer complete!\n";
std::cout << m_fileSize << " "<< m_fileData.size() << "\n\n";
return;
}
boost::system::error_code error;
//Transfer isn't complete, so we read some more
size_t bytes_transferred = m_socket.read_some(boost::asio::buffer(m_fileReadBuffer, m_fileReadBuffer.size()));
std::cout << "Received " << (std::streamsize)bytes_transferred << " bytes\n";
//Copy the bytes_transferred to m_fileData vector. Only copies up to m_fileSize bytes into m_fileData
if(bytes_transferred+m_rollingSum > m_fileSize)
{
//memcpy(&m_fileData[m_rollingSum], &m_fileReadBuffer, m_fileSize-m_rollingSum);
m_rollingSum += m_fileSize-m_rollingSum;
}
else
{
// memcpy(&m_fileData[m_rollingSum], &m_fileReadBuffer, bytes_transferred);
m_rollingSum += (std::streamsize)bytes_transferred;
}
std::cout << "rolling sum: " << m_rollingSum << std::endl;
this->receive_volScan_message();
}
catch(...)
{
std::cout << "whoops";
}
}
As a suggestion, I have tried changing the recursive loops to for loops on both the client and server. The problem persists, somehow. The only difference is that now instead of exiting 0 before the previously mentioned read_some call, it exits 0 at the end of one of the for loop blocks, just before it starts executing another for loop pass.
EDIT: As it turns out, the error doesn't occur when I build the client in debug mode in my IDE.
I haven't completely understood the problem, however I have managed to fix it entirely.
The root of the issue was that on the client, the boost::asio::read calls made main exit with code 0 if the server messages hadn't arrived yet. That means that a simple
while(m_socket.available() == 0)
{
;
}
before all read calls completely prevented the problem. Both in debug and release mode.
This is very strange because as I understand these functions should just block until there is something to read, and even if they encountered errors they should return zero.
I think the debug/release discrepancy happened because the m_readBuffer wasn't initialized to anything whenever the read calls took place. This made the read call return some form of silent error. On debug, uninitialized variables get automatically set to NULL, stealthily fixing my problem.
I have no idea why adding a while loop after the transfer prevented the issue, though, nor why it normally happened at the end of transfers, after the m_readBuffer had been set and successfully used several times.
On top of that, I have never seen this type of "crash" before, where the program simply exits with code 0 in a random place, with no errors or exceptions thrown.

Why my else condition is never executing?

I am working on a UDP server, and the code is working fine except for the else condition. Maybe I am wrong, but I have used an else condition in the same way many times to terminate a while loop. I am not sure if it's a UDP problem or something else.
while(1)// execute three times because its getting data only three times from the client
{
int total_bytes = 0;
int bytes_recv=0;
int count = 0;
std::vector<double> m_vector(8000);
// Bytes are also received 3 times correctly then why else condition not executing after receiving 3 times ?
bytes_recv = recvfrom(Socket,(char*)m_vector.data(),64000,0,(SOCKADDR*)&ClientAddr,&i);
count++;
if(bytes_recv > 0 )
{
total_bytes = total_bytes+bytes_recv;
std::cout<<"Server: loop counter is"<<count<<std::endl;
std::cout<<"Server: Received bytes are"<<total_bytes<<std::endl;
}else
{
//why this part never executes ?
std::cout<<"Data Receiving has finished"<<std::endl;
break;
}
}
WSACleanup();
system("pause");
return 0;
}
The comment in the source says that you expect only 3 datagrams from the client. Thus, do count how many datagrams you have received, and if you already have 3 of them, do not continue calling recvfrom.
You already have a variable count, but it is reset to zero every iteration and isn't used as exit condition.
Once you have count == 3, you know that there is nothing more coming, so calling recvfrom is pointless. It will only block, since that is what you're telling it to do. Making the socket non-blocking would "help" to avoid blocking, but then you would be polling, which isn't good either (and useless, since you know there is nothing to be received). It's best to operate correctly.
You could also have the client send an "end of message" datagram, but of course you would have to add a timeout and a strategy for packet loss, or the server could block forever. Not only because of malicious clients, but also simply because the receive buffer was full and a packet was dropped (which is a normal thing to happen!).
Alternatively, since there is a call to WSACleanup in your code, you're using Winsock, which means you could use overlapped WSARecvFrom instead of recvfrom. Fire off one receive, and from its completion handler fire off another two, also with a callback function. After firing off the request, forget about it and let the callback handle the rest; you can now deal with another client (the thread must be in an alertable wait for that to happen, though; alternatively, block on an IOCP or WaitForMultipleObjects or whatever).
If no second or third packet comes in after so and so long, either send a "please resend" message or consider the client dead, close the socket and move on.
recvfrom is by default a blocking call and will only return once a packet has been read. Because of this, when you stop sending packets it just blocks in recvfrom, so the case with 0 bytes never happens.
You could change the flags passed to recvfrom to change this behaviour, but that's likely not what you want, because then any delay between packets would cause the call to return without data and your loop would exit.
I suppose you could track how long you've gone without receiving any packets and then shut down, so in the else case you could use a timer and a running total before exiting.
What are you trying to accomplish?
I have not checked (bad me, I know, but time's short), but if recvfrom follows typical behavior, then it guarantees that:
a return value < 0 means an error
a return value == 0 means that everything was OK but the channel cannot receive anything more
a return value > 0 means something was received
In TCP you get 'received bytes' == 0 only when the connection is closed.
In UDP there's no such thing as a 'connection'. The channel is always ready to receive, until the socket is closed.
Hence, it probably simply waits until something arrives. It cannot detect that there is no one left to receive from. That's a UDP specific.
If you want to catch the case where nothing arrives for a long time, try setting a read timeout.

send and recv on same socket from different threads not working

I read that it should be safe from different threads concurrently, but my program has some weird behaviour and I don't know what's wrong.
I have concurrent threads communicating with a client socket
one doing send to a socket
one doing select and then recv from the same socket
As I'm still sending, the client has already received the data and closed the socket.
At the same time, I'm doing a select and recv on that socket, which returns 0 (since it is closed) so I close this socket. However, the send has not returned yet...and since I call close on this socket the send call fails with EBADF.
I know the client has received the data correctly since I output it after I close the socket and it is right. However, on my end, my send call is still returning an error (EBADF), so I want to fix it so it doesn't fail.
This doesn't always happen. It happens maybe 40% of the time. I don't use sleep anywhere. Am I supposed to have pauses between sends or recvs or anything?
Here's some code:
Sending:
while(true)
{
// keep sending until send returns 0
n = send(_sfd, bytesPtr, sentSize, 0);
if (n == 0)
{
break;
}
else if(n<0)
{
cerr << "ERROR: send returned an error "<<errno<< endl; // this case is triggered
return n;
}
sentSize -= n;
bytesPtr += n;
}
Receiving:
while(true)
{
memset(bufferPointer,0,sizeLeft);
n = recv(_sfd,bufferPointer,sizeLeft, 0);
if (debug) cerr << "Receiving..."<<sizeLeft<<endl;
if(n == 0)
{
cerr << "Connection closed"<<endl; // this case is triggered
return n;
}
else if (n < 0)
{
cerr << "ERROR reading from socket"<<endl;
return n;
}
bufferPointer += n;
sizeLeft -= n;
if(sizeLeft <= 0) break;
}
On the client, I use the same receive code, then I call close() on the socket.
Then on my side, I get 0 from the receive call and also call close() on the socket
Then my send fails. It still hasn't finished?! But my client already got the data!
I must admit I'm surprised you see this problem as often as you do, but it's always a possibility when you're dealing with threads. When you call send() you'll end up going into the kernel to append the data to the socket buffer in there, and it's therefore quite likely that there'll be a context switch, maybe to another process in the system. Meanwhile the kernel has probably buffered and transmitted the packet quite quickly. I'm guessing you're testing on a local network, so the other end receives the data and closes the connection and sends the appropriate FIN back to your end very quickly. This could all happen while the sending machine is still running other threads or processes because the latency on a local ethernet network is so low.
Now the FIN arrives - your receive thread hasn't done a lot lately since it's been waiting for input. Many scheduling systems will therefore raise its priority quite a bit and there's a good chance it'll be run next (you don't specify which OS you're using but this is likely to happen on at least Linux, for example). This thread closes the socket due to its zero read. At some point shortly after this the sending thread will be re-awoken, but presumably the kernel notices that the socket is closed before it returns from the blocked send() and returns EBADF.
Now this is just speculation as to the exact cause - among other things it heavily depends on your platform. But you can see how this could happen.
The easiest solution is probably to use poll() in the sending thread as well, but wait for the socket to become write-ready instead of read-ready. Obviously you also need to wait until there's any buffered data to send - how you do that depends on which thread buffers the data. The poll() call will let you detect when the connection has been closed by flagging it with POLLHUP, which you can detect before you try your send().
As a general rule you shouldn't close a socket until you're certain that the send buffer has been fully flushed - you can only be sure of this once the send() call has returned and indicates that all the remaining data has gone out. I've handled this in the past by checking the send buffer when I get a zero read and if it's not empty I set a "closing" flag. In your case the sending thread would then use this as a hint to do the close once everything is flushed. This matters because if the remote end does a half-close with shutdown() then you'll get a zero read even if it might still be reading. You might not care about half closes, however, in which case your strategy above is OK.
Finally, I personally would avoid the hassle of sending and receiving threads and just have a single thread which does both - that's more or less the point of select() and poll(), to allow a single thread of execution to deal with one or more filehandles without worrying about performing an operation which blocks and starves the other connections.
Found the problem. It's with my loop. Notice that it's an infinite loop. When I don't have anything left to send, my sentSize is 0, but I still loop and try to send more. By this time, the other thread has already closed the socket, so my send call for 0 bytes returns with an error.
I fixed it by changing the loop to stop when sentSize is 0.