Boost.Asio exits with code 0 for no reason; setting a breakpoint AFTER the problematic statement fixes it - C++

I'm writing a TCP server-client pair with boost asio. It's very simple and synchronous.
The server is supposed to transmit a large amount of binary data through several recursive calls to a function that sends one packet of data over TCP. The client does the same in reverse, reading and appending the data through a recursive function that reads incoming packets from the socket.
However, in the middle of receiving this data, most of the time (around 80%) the client simply stops the recursion, always just before one of the read calls (shown below). It shouldn't be able to do that, given that there are several other statements and function calls after the recursive call.
size_t bytes_transferred = m_socket.read_some(boost::asio::buffer(m_fileReadBuffer, m_fileReadBuffer.size()));
m_fileReadBuffer is a boost::array<char, 4096> (although I have tried other buffer formats as well, with no success).
I can see absolutely no way to deduce why this is happening:
- The program exits immediately, so I can't pass an error code to read_some and inspect it, since that would have to happen after the read_some statement.
- No exceptions are thrown.
- No errors or warnings at compile time or runtime.
- If I put breakpoints inside the recursive function, the problem never happens (the transfer completes successfully).
- If I put breakpoints after the transfer, or trap the execution in a while loop after the transfer, the problem never happens and there is no sign of anything wrong.
Also, it's important to note that the server ALWAYS successfully sends all the data. On top of that, the problem always happens at the very end of transmissions: I can send 8000 bytes and it will exit when around 6000 or 7000 bytes have been transferred, and I can send 8000000 bytes and it will exit when something like 7996000 bytes have been transferred.
I can provide any code necessary, I just have no idea of where the problem could be. Below is the recursive read function on the client:
void TCP_Client::receive_volScan_message()
{
    try
    {
        //If the transfer is complete, exit this loop
        if(m_rollingSum >= (std::streamsize)m_fileSize)
        {
            std::cout << "File transfer complete!\n";
            std::cout << m_fileSize << " " << m_fileData.size() << "\n\n";
            return;
        }

        boost::system::error_code error;

        //Transfer isn't complete, so we read some more
        size_t bytes_transferred = m_socket.read_some(boost::asio::buffer(m_fileReadBuffer, m_fileReadBuffer.size()));
        std::cout << "Received " << (std::streamsize)bytes_transferred << " bytes\n";

        //Copy the bytes_transferred to the m_fileData vector. Only copies up to m_fileSize bytes into m_fileData
        if(bytes_transferred + m_rollingSum > m_fileSize)
        {
            //memcpy(&m_fileData[m_rollingSum], &m_fileReadBuffer, m_fileSize - m_rollingSum);
            m_rollingSum += m_fileSize - m_rollingSum;
        }
        else
        {
            //memcpy(&m_fileData[m_rollingSum], &m_fileReadBuffer, bytes_transferred);
            m_rollingSum += (std::streamsize)bytes_transferred;
        }

        std::cout << "rolling sum: " << m_rollingSum << std::endl;
        this->receive_volScan_message();
    }
    catch(...)
    {
        std::cout << "whoops";
    }
}
Following a suggestion, I have tried changing the recursive calls to for loops on both the client and the server. Somehow, the problem persists. The only difference is that now, instead of exiting with code 0 before the previously mentioned read_some call, it exits with code 0 at the end of one of the for-loop passes, just before starting the next one.
EDIT: As it turns out, the error doesn't occur when I build the client in debug mode in my IDE.

I haven't completely understood the problem, but I have managed to fix it entirely.
The root of the issue was that, on the client, the boost::asio::read calls made main exit with code 0 if the server's messages hadn't arrived yet. That means that a simple
while(m_socket.available() == 0)
{
    ;
}
before all read calls completely prevented the problem, both in debug and release mode.
This is very strange, because as I understand it these functions should simply block until there is something to read, and even if they encountered an error they should return zero.
I think the debug/release discrepancy happened because m_readBuffer wasn't initialized to anything when the read calls took place. This made the read call return some form of silent error. In debug builds, uninitialized variables get automatically zeroed out, stealthily hiding my problem.
I have no idea why adding a while loop after the transfer prevented the issue, though, nor why it normally happened at the end of transfers, after m_readBuffer had been set and successfully used several times.
On top of that, I have never seen this type of "crash" before, where the program simply exits with code 0 at a random place, with no errors or exceptions thrown.
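For what it's worth, read_some also has an overload that reports failures through a boost::system::error_code instead of throwing, which would at least have surfaced whatever error was being swallowed. A minimal sketch, reusing the same members as the code above:

boost::system::error_code ec;
size_t bytes_transferred = m_socket.read_some(boost::asio::buffer(m_fileReadBuffer), ec);
if (ec)
{
    // e.g. boost::asio::error::eof if the server closed the connection
    std::cout << "read_some failed: " << ec.message() << "\n";
    return;
}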

Related

Unix socket hangs on recv, until I place/remove a breakpoint anywhere

[TL;DR version: the code below hangs indefinitely on the second recv() call both in Release and Debug mode. In Debug, if I place or remove a breakpoint anywhere in the code, it makes the execution continue and everything behaves normally]
I'm coding a simple client-server communication using UNIX sockets. The server is in C++ while the client is in Python. The connection (a TCP socket on localhost) gets established without problems, but when it comes to receiving data on the server side, it hangs in the recv function. Here is the code where the problem happens:
bool server::readBody(int csock) // csock is the socket file descriptor
{
    int bytecount;

    // protobuf-related variables
    google::protobuf::uint32 siz;
    kinMsg::request message;

    // if the code is working, the client will send false;
    // I initialize to true to be sure that the message is actually read
    message.set_endconnection(true);

    // First, read the 4-character header for extracting the data size
    char buffer_hdr[5];
    if((bytecount = recv(csock, buffer_hdr, 4, MSG_WAITALL)) == -1)
        ::std::cerr << "Error receiving data" << ::std::endl;
    buffer_hdr[4] = '\0';
    siz = atoi(buffer_hdr);

    // Second, read the data. The code hangs here !!
    char buffer[siz];
    if((bytecount = recv(csock, (void *)buffer, siz, MSG_WAITALL)) == -1)
        ::std::cerr << "Error receiving data " << errno << ::std::endl;

    // Finally, process the protobuf message
    google::protobuf::io::ArrayInputStream ais(buffer, siz);
    google::protobuf::io::CodedInputStream coded_input(&ais);
    google::protobuf::io::CodedInputStream::Limit msgLimit = coded_input.PushLimit(siz);
    message.ParseFromCodedStream(&coded_input);
    coded_input.PopLimit(msgLimit);

    if (message.has_endconnection())
        return !message.endconnection();
    return false;
}
As can be seen in the code, the protocol is such that the client first sends the number of bytes in the message in a 4-character array, followed by the protobuf message itself. The first recv call works well and does not hang; the code then hangs on the second recv call, which should be recovering the body of the message.
Now, for the interesting part. When run in Release mode, the code hangs indefinitely and I have to kill either the client or the server. It does not matter whether I run it from my IDE (Qt Creator) or from the CLI after a clean build (using cmake/g++).
When I run the code in Debug mode, it also hangs at the same recv() call. Then, if I place or remove a breakpoint ANYWHERE in the code (before or after that line), execution resumes and everything works perfectly: the server receives the data and reads the correct message.endconnection() value before returning out of readBody. The breakpoint I place to trigger this behavior is not necessarily hit. Since readBody() is called in a loop (my C++ server waits for requests from the Python client), the same behavior happens again on the next iteration, and I have to place or remove a breakpoint anywhere in the code, which again is not necessarily hit, to get past that recv() call. The loop looks like this:
bool connection = true;

// server waiting for client connection
if (!waitForConnection(connectionID))
    std::cerr << "Error accepting connection" << ::std::endl;

// main loop
while(connection)
{
    if((bytecount = recv(connectionID, buffer, 4, MSG_PEEK)) == -1)
    {
        ::std::cerr << "Error receiving data" << ::std::endl;
    }
    else if (bytecount == 0)
        break;

    try
    {
        if(readBody(connectionID))
        {
            sendResponse(connectionID);
        }
        // if the client is requesting disconnection, end the loop
        else
        {
            std::cout << "Disconnection requested by client. Exiting ..." << std::endl;
            connection = false;
        }
    }
    catch(...)
    {
        std::cerr << "Error receiving message from client" << std::endl;
    }
}
Finally, as you can see, when the program returns from readBody(), it sends another message back to the client, which processes it and prints it to standard output (the Python code works and is not shown because the question is already long enough). From this last behavior I conclude that the protocol and the client code are OK. I tried putting sleep instructions at many points to see whether it was a timing problem, but that did not change anything.
I searched all over Google and SO for a similar problem but did not find anything. Help would be much appreciated!
The solution is to not use any flags: call recv with 0 for the flags, or just use read instead of recv.
You are asking the socket for data that is not there. If recv expects 10 bytes but the client has only sent 6, MSG_WAITALL states clearly that the call should block until all 10 bytes are available in the stream.
If you don't use any flags, the call succeeds with a bytecount of 6, which has the same effect as MSG_DONTWAIT, without the potential side effects of non-blocking calls.
I did the test on the GitHub project; it works.
The solution was to replace MSG_WAITALL with MSG_DONTWAIT in the recv() calls. It now works fine. To summarize, it makes the recv() calls non-blocking, which makes the whole code work.
However, this still raises many questions, the first of which is: why did it work with that weird breakpoint trick?
If the socket was blocking in the first place, one could assume that it is because there is no data on the socket. Let's consider both situations:
- There is no data on the socket, which is why the blocking recv() call was not returning. Changing it to a non-blocking recv() call would then, in the same situation, trigger an error; if not, the protobuf deserialization would afterwards fail trying to deserialize from an empty buffer. But it does not ...
- There is data on the socket. Then why on earth would it block in the first place?
Obviously there is something I don't get about sockets in C, and I'd be very happy if somebody has an explanation for this behavior!
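For reference, the usual portable pattern on a blocking stream socket is to loop over flag-less recv() calls until the expected byte count has accumulated; a minimal sketch (readExact is a made-up helper name, not from the code above):

#include <sys/socket.h> // recv
#include <sys/types.h>  // ssize_t

// Read exactly len bytes from a blocking socket.
// Returns len on success, 0 if the peer closed the connection, -1 on error.
ssize_t readExact(int fd, char* buf, size_t len)
{
    size_t total = 0;
    while (total < len)
    {
        ssize_t n = recv(fd, buf + total, len - total, 0);
        if (n <= 0)
            return n; // 0: peer closed; -1: error (check errno)
        total += (size_t)n;
    }
    return (ssize_t)total;
}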

Multithreading in C++, receive message from socket

I have studied Java for 8 months but decided to learn some C++ too in my spare time.
I'm currently making a multithreaded server in Qt with MinGW. My problem is that when a client connects, I create an instance of Client (which is a class) and pass the socket to the Client constructor.
Then I start a thread in the client object (startClient()) which is supposed to wait for messages, but it doesn't. By the way, startClient is the method I create the thread from; see the code below.
What happens then? When I try to send messages to the server I only get errors, the server won't print out that a new client has connected, and for some reason my computer starts working really hard, and Qt Creator gets super slow until I close the server program.
What I'm actually trying to achieve is an object that derives from the thread, but I have heard that this isn't a very good idea in C++.
The listener loop in the server:
for (;;)
{
    if ((sock_CONNECTION = accept(sock_LISTEN, (SOCKADDR*)&ADDRESS, &AddressSize)))
    {
        cout << "\nClient connected" << endl;
        Client client(sock_CONNECTION);               // new object, pass it the socket
        std::thread t1(&Client::startClient, client); // create a thread from the method
        t1.detach();
    }
}
The Client class:
Client::Client(SOCKET socket)
{
    this->socket = socket;
    cout << "hello from clientconstructor ! " << endl;
}

void Client::startClient()
{
    cout << "hello from clientmethod ! " << endl;
    // WHEN I ADD THE CODE BELOW I DON'T GET ANY OUTPUT ON THE CONSOLE!
    // No messages get received either.
    char RecvdData[100] = "";
    int ret;
    for(;;)
    {
        try
        {
            ret = recv(socket, RecvdData, sizeof(RecvdData), 0);
            cout << RecvdData << endl;
        }
        catch (int e)
        {
            cout << "Error sending message to client" << endl;
        }
    }
}
It looks like your Client object is going out of scope after you detach it.
if (/* ... */)
{
    Client client(sock_CONNECTION);
    std::thread t1(&Client::startClient, client);
    t1.detach();
} // GOING OUT OF SCOPE HERE
You'll need to allocate your Client object through a pointer and manage its lifetime, or define it at a higher level where it won't go out of scope.
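One way to do that, sketched below under the assumption that Client has no other lifetime constraints, is to hold the object in a std::shared_ptr and let the thread keep it alive:

#include <memory>
#include <thread>

if ((sock_CONNECTION = accept(sock_LISTEN, (SOCKADDR*)&ADDRESS, &AddressSize)))
{
    auto client = std::make_shared<Client>(sock_CONNECTION);
    // The lambda captures the shared_ptr by value, so the Client
    // stays alive for as long as the detached thread runs.
    std::thread([client] { client->startClient(); }).detach();
}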
The fact that you never see any output from the Server likely means that your client is unable to connect to your Server in the first place. Check that you are doing your IP addressing correctly in your connect calls. If that looks good, then maybe there is a firewall blocking the connection. Turn that off or open the necessary ports.
Your connecting client is likely getting an error from connect that it is interpreting as success and then trying to send lots of traffic on an invalid socket as fast as it can, which is why your machine seems to be working hard.
You definitely need to check the return values from accept, connect, read and write more carefully. Also, make sure that you aren't running your Server's accept socket in non-blocking mode. I don't think that you are because you aren't seeing any output, but if you did it would infinitely loop on error spawning tons of threads that would also infinitely loop on errors and likely bring your machine to its knees.
If I misunderstood what is happening and you do actually get a client connection and see the "Client connected" and "hello from clientmethod ! " output, then it is highly likely that your calls to recv() are failing and you are ignoring the failure. So you are in a tight infinite loop that repeatedly outputs "" as fast as possible.
You also probably want to change your catch block to catch (...) rather than catch (int). I doubt either recv() or cout throws an int. Even so, that catch block won't be invoked when recv fails, because recv doesn't throw any exceptions AFAIK; it returns its failure indicator through its return value.
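Concretely, a receive loop that honours the return value might look like this (a sketch based on the code above, not a drop-in fix):

for (;;)
{
    int ret = recv(socket, RecvdData, sizeof(RecvdData) - 1, 0);
    if (ret == 0)
    {
        cout << "Peer closed the connection" << endl;
        break;
    }
    if (ret < 0)
    {
        cout << "recv failed" << endl; // on Windows, check WSAGetLastError()
        break;
    }
    RecvdData[ret] = '\0'; // NUL-terminate before printing
    cout << RecvdData << endl;
}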

Error in ffmpeg when reading from UDP stream

I'm trying to process frames from a UDP stream using ffmpeg. Everything runs fine for a while, but av_read_frame() always eventually returns either AVERROR_EXIT (Immediate exit requested) or -5 (Error number -5 occurred) while the stream should still be running fine. Right before the error it always prints the following messages to the console:
[mpeg2video @ 0caf6600] ac-tex damaged at 14 10
[mpeg2video @ 0caf6600] Warning MVs not available
[mpeg2video @ 0caf6600] concealing 800 DC, 800 AC, 800 MV errors in I frame
(the numbers in the message vary from run to run)
I have a suspicion that the error is related to calling av_read_frame too quickly. If I let it run as fast as possible, I usually get an error within 10-20 frames, but if I put a sleep before reading, it will run fine for a minute or so and then exit with an error. I realize this is hacky and assume there is a better solution. Bottom line: is there a way to dynamically check whether av_read_frame() is ready to be called, or a way to suppress the error?
Pseudocode of what I'm doing is below. Thanks in advance for the help!
void getFrame()
{
    // wait here?? seems hacky...
    //boost::this_thread::sleep(boost::posix_time::milliseconds(25));

    int av_read_frame_error = av_read_frame(m_input_format_context, &m_input_packet);

    if(av_read_frame_error == 0)
    {
        // DO STUFF - this all works fine when it gets here
    }
    else
    {
        // error
        char errorBuf[AV_ERROR_MAX_STRING_SIZE];
        av_make_error_string(errorBuf, AV_ERROR_MAX_STRING_SIZE, av_read_frame_error);
        cout << "FFMPEG Input Stream Exit Code: " << av_read_frame_error << " Message: " << errorBuf << endl;
    }
}
Incoming frames need to be handled in a callback function, so the mechanism should be such that a callback gets called whenever there is a new frame. That way there is no need to manually fine-tune the delay.
Disclaimer: I have not used the ffmpeg APIs.
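One pattern worth trying (a sketch only; whether av_read_frame() ever reports AVERROR(EAGAIN) here depends on the demuxer and on flags such as AVFMT_FLAG_NONBLOCK) is to distinguish "no packet available yet" from fatal errors instead of sleeping a fixed amount:

int ret = av_read_frame(m_input_format_context, &m_input_packet);
if (ret == 0)
{
    // a packet arrived: process it
}
else if (ret == AVERROR(EAGAIN))
{
    // nothing to read yet: back off briefly and try again
    boost::this_thread::sleep(boost::posix_time::milliseconds(5));
}
else
{
    // fatal: log the error and stop reading
    char errorBuf[AV_ERROR_MAX_STRING_SIZE];
    av_make_error_string(errorBuf, AV_ERROR_MAX_STRING_SIZE, ret);
    cout << "av_read_frame failed: " << errorBuf << endl;
}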

send and recv on same socket from different threads not working

I read that it should be safe to call send and recv on the same socket from different threads concurrently, but my program has some weird behaviour and I don't know what's wrong.
I have two concurrent threads communicating with a client socket:
- one doing send on the socket
- one doing select and then recv on the same socket
While I'm still sending, the client has already received the data and closed the socket. At the same time, I'm doing a select and recv on that socket, which returns 0 (since the connection is closed), so I close the socket. However, the send has not returned yet... and since I call close on the socket, the send call fails with EBADF.
I know the client has received the data correctly, since I output it after I close the socket and it is right. However, on my end the send call is still returning an error (EBADF), so I want to fix it so it doesn't fail.
This doesn't always happen. It happens maybe 40% of the time. I don't use sleep anywhere. Am I supposed to have pauses between sends or recvs or anything?
Here's some code:
Sending:
while(true)
{
    // keep sending until send returns 0
    n = send(_sfd, bytesPtr, sentSize, 0);
    if (n == 0)
    {
        break;
    }
    else if (n < 0)
    {
        cerr << "ERROR: send returned an error " << errno << endl; // this case is triggered
        return n;
    }
    sentSize -= n;
    bytesPtr += n;
}
Receiving:
while(true)
{
    memset(bufferPointer, 0, sizeLeft);
    n = recv(_sfd, bufferPointer, sizeLeft, 0);
    if (debug) cerr << "Receiving..." << sizeLeft << endl;
    if(n == 0)
    {
        cerr << "Connection closed" << endl; // this case is triggered
        return n;
    }
    else if (n < 0)
    {
        cerr << "ERROR reading from socket" << endl;
        return n;
    }
    bufferPointer += n;
    sizeLeft -= n;
    if(sizeLeft <= 0) break;
}
On the client, I use the same receive code, then I call close() on the socket.
Then on my side, I get 0 from the receive call and also call close() on the socket
Then my send fails. It still hasn't finished?! But my client already got the data!
I must admit I'm surprised you see this problem as often as you do, but it's always a possibility when you're dealing with threads. When you call send() you'll end up going into the kernel to append the data to the socket buffer in there, and it's therefore quite likely that there'll be a context switch, maybe to another process in the system. Meanwhile the kernel has probably buffered and transmitted the packet quite quickly. I'm guessing you're testing on a local network, so the other end receives the data and closes the connection and sends the appropriate FIN back to your end very quickly. This could all happen while the sending machine is still running other threads or processes because the latency on a local ethernet network is so low.
Now the FIN arrives - your receive thread hasn't done a lot lately since it's been waiting for input. Many scheduling systems will therefore raise its priority quite a bit and there's a good chance it'll be run next (you don't specify which OS you're using but this is likely to happen on at least Linux, for example). This thread closes the socket due to its zero read. At some point shortly after this the sending thread will be re-awoken, but presumably the kernel notices that the socket is closed before it returns from the blocked send() and returns EBADF.
Now this is just speculation as to the exact cause - among other things it heavily depends on your platform. But you can see how this could happen.
The easiest solution is probably to use poll() in the sending thread as well, but wait for the socket to become write-ready instead of read-ready. Obviously you also need to wait until there's any buffered data to send - how you do that depends on which thread buffers the data. The poll() call will let you detect when the connection has been closed by flagging it with POLLHUP, which you can detect before you try your send().
As a general rule you shouldn't close a socket until you're certain that the send buffer has been fully flushed - you can only be sure of this once the send() call has returned and indicates that all the remaining data has gone out. I've handled this in the past by checking the send buffer when I get a zero read and if it's not empty I set a "closing" flag. In your case the sending thread would then use this as a hint to do the close once everything is flushed. This matters because if the remote end does a half-close with shutdown() then you'll get a zero read even if it might still be reading. You might not care about half closes, however, in which case your strategy above is OK.
Finally, I personally would avoid the hassle of sending and receiving threads and just have a single thread which does both - that's more or less the point of select() and poll(), to allow a single thread of execution to deal with one or more filehandles without worrying about performing an operation which blocks and starves the other connections.
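A sketch of that poll() approach (POLLHUP semantics vary by platform, so treat this as illustrative rather than definitive):

#include <poll.h>

struct pollfd pfd;
pfd.fd = _sfd;
pfd.events = POLLOUT;      // wait until the socket is writable
int r = poll(&pfd, 1, -1); // -1: block until something happens
if (r > 0 && (pfd.revents & (POLLHUP | POLLERR)))
{
    // the peer has closed or the socket errored: don't call send()
}
else if (r > 0 && (pfd.revents & POLLOUT))
{
    // safe to hand the next chunk to send()
}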
Found the problem. It's in my loop. Notice that it's an infinite loop: when I have nothing left to send, sentSize is 0, but I still loop and try to send more. By that time, the other thread has already closed the socket, and so my send call for 0 bytes returns with an error.
I fixed it by changing the loop to stop once sentSize reaches 0, and that solved the problem!
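For completeness, the corrected loop looks something like this (same variables as in the question):

while (sentSize > 0) // stop once every byte has been handed to the kernel
{
    n = send(_sfd, bytesPtr, sentSize, 0);
    if (n < 0)
    {
        cerr << "ERROR: send returned an error " << errno << endl;
        return n;
    }
    sentSize -= n;
    bytesPtr += n;
}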

boost asio "A non-recoverable error occurred during database lookup"

I'm currently stress testing my server.
Sometimes I get "A non-recoverable error occurred during database lookup" from error.message().
The error is passed to my handler function through boost::asio::placeholders::error on the async_read call.
I have no idea what this error means, and I am not able to reproduce it on purpose; it only happens sometimes and seems to be random (of course it is not, but it seems that way).
Has anyone ever gotten this error message, and if so, do you know where it comes from?
EDIT 1
Here's what I found in the Boost sources; the error is:
no_recovery = BOOST_ASIO_NETDB_ERROR(NO_RECOVERY)
But I can't figure out what this means...
EDIT 2
Just so you know everything about my problem, here's the design:
I have only one io_service.
Every time a user connects, an async_read is started, waiting for something to read.
When it reads something, most of the time it does some work on a thread (coming from a pool) and writes something back to the user synchronously (using boost::asio::write).
Even though synchronous writes are claimed to be thread-safe since Boost 1.37, I'm really worried that the problem comes from this.
If the user sends different messages really quickly, it can happen that async_read and write are called simultaneously; can that do any harm?
EDIT 3
Here are some portions of my code, as asked for by Dave S:
void TCPConnection::listenForCMD()
{
    boost::asio::async_read(m_socket,
        boost::asio::buffer(m_inbound_data, 3),
        boost::asio::transfer_at_least(3),
        boost::bind(&TCPConnection::handle_cmd,
                    shared_from_this(),
                    boost::asio::placeholders::error));
}
void TCPConnection::handle_cmd(const boost::system::error_code& error)
{
    if (error)
    {
        std::cout << "ERROR READING : " << error.message() << std::endl;
        return;
    }

    std::string str1(m_inbound_data);
    std::string str = str1.substr(0, 3);
    std::cout << "COMMAND FUNCTION: " << str << std::endl;

    a_fact func = CommandFactory::getInstance()->getFunction(str);
    if (func == NULL)
    {
        std::cout << "command doesn't exist: " << str << std::endl;
        return;
    }

    protocol::in::Command::pointer cmd = func(m_socket, client);
    cmd->setCallback(boost::bind(&TCPConnection::command_is_done,
                                 shared_from_this()));
    cmd->parse();
}
m_inbound_data is a char[3].
Once cmd->parse() is done, it calls the callback command_is_done:
void TCPConnection::command_is_done()
{
    m_inbound_data[0] = '0';
    m_inbound_data[1] = '0';
    m_inbound_data[2] = '0';
    listenForCMD();
}
The error occurs in handle_cmd when checking for the error on the first line.
As I said before, cmd->parse() parses the command it just received, sometimes launching blocking code in a thread coming from a pool. On that thread it sends data back to the client with a synchronous write.
IMPORTANT THING: the callback command_is_done will always be called before the said thread is launched. This means that listenForCMD has already been called by the time the thread sends something back to the client with a synchronous write. Hence my initial worries.
When it reads something, most of the time it does some work on a thread (coming from a pool) and writes something back to the user synchronously (using boost::asio::write). Even though synchronous writes are claimed to be thread-safe since Boost 1.37, I'm really worried that the problem comes from this.
Emphasis added by me: this is incorrect. A single boost::asio::ip::tcp::socket is not thread safe; the documentation is very clear:
Thread Safety
Distinct objects: Safe.
Shared objects: Unsafe.
It is also very odd to mix async_read() with a blocking write().
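One common way to remove that risk (a sketch, not taken from the code above) is to funnel every operation on the socket through a single strand, so reads and writes can never run concurrently. Here m_io_service is the single io_service from the question, and do_write is a made-up handler standing in for whatever performs the write:

// Hypothetical member: serialises all handlers that touch m_socket.
boost::asio::io_service::strand m_strand(m_io_service);

// Reads: wrap the completion handler so it runs through the strand.
boost::asio::async_read(m_socket,
    boost::asio::buffer(m_inbound_data, 3),
    boost::asio::transfer_at_least(3),
    m_strand.wrap(boost::bind(&TCPConnection::handle_cmd,
                              shared_from_this(),
                              boost::asio::placeholders::error)));

// Writes: instead of calling a blocking write() from the pool thread,
// post the work onto the same strand so it can never overlap a read.
m_strand.post(boost::bind(&TCPConnection::do_write, shared_from_this()));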