send and recv on same socket from different threads not working - c++

I read that calling send and recv on the same socket from different threads concurrently should be safe, but my program has some weird behaviour and I don't know what's wrong.
I have two concurrent threads communicating with a client socket:
- one doing send() on the socket
- one doing select() and then recv() on the same socket
As I'm still sending, the client has already received the data and closed the socket.
At the same time, I'm doing a select and recv on that socket, which returns 0 (since it is closed), so I close this socket. However, the send has not returned yet... and since I call close on this socket, the send call fails with EBADF.
I know the client has received the data correctly since I output it after I close the socket and it is right. However, on my end, my send call is still returning an error (EBADF), so I want to fix it so it doesn't fail.
This doesn't always happen. It happens maybe 40% of the time. I don't use sleep anywhere. Am I supposed to have pauses between sends or recvs or anything?
Here's some code:
Sending:
while(true)
{
    // keep sending until send returns 0
    n = send(_sfd, bytesPtr, sentSize, 0);
    if (n == 0)
    {
        break;
    }
    else if (n < 0)
    {
        cerr << "ERROR: send returned an error " << errno << endl; // this case is triggered
        return n;
    }
    sentSize -= n;
    bytesPtr += n;
}
Receiving:
while(true)
{
    memset(bufferPointer, 0, sizeLeft);
    n = recv(_sfd, bufferPointer, sizeLeft, 0);
    if (debug) cerr << "Receiving..." << sizeLeft << endl;
    if (n == 0)
    {
        cerr << "Connection closed" << endl; // this case is triggered
        return n;
    }
    else if (n < 0)
    {
        cerr << "ERROR reading from socket" << endl;
        return n;
    }
    bufferPointer += n;
    sizeLeft -= n;
    if (sizeLeft <= 0) break;
}
On the client, I use the same receive code, then I call close() on the socket.
Then on my side, I get 0 from the receive call and also call close() on the socket.
Then my send fails. It still hasn't finished?! But my client already got the data!

I must admit I'm surprised you see this problem as often as you do, but it's always a possibility when you're dealing with threads. When you call send() you'll end up going into the kernel to append the data to the socket buffer in there, and it's therefore quite likely that there'll be a context switch, maybe to another process in the system. Meanwhile the kernel has probably buffered and transmitted the packet quite quickly. I'm guessing you're testing on a local network, so the other end receives the data and closes the connection and sends the appropriate FIN back to your end very quickly. This could all happen while the sending machine is still running other threads or processes because the latency on a local ethernet network is so low.
Now the FIN arrives - your receive thread hasn't done a lot lately since it's been waiting for input. Many scheduling systems will therefore raise its priority quite a bit and there's a good chance it'll be run next (you don't specify which OS you're using but this is likely to happen on at least Linux, for example). This thread closes the socket due to its zero read. At some point shortly after this the sending thread will be re-awoken, but presumably the kernel notices that the socket is closed before it returns from the blocked send() and returns EBADF.
Now this is just speculation as to the exact cause - among other things it heavily depends on your platform. But you can see how this could happen.
The easiest solution is probably to use poll() in the sending thread as well, but wait for the socket to become write-ready instead of read-ready. Obviously you also need to wait until there's any buffered data to send - how you do that depends on which thread buffers the data. The poll() call will let you detect when the connection has been closed by flagging it with POLLHUP, which you can detect before you try your send().
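A minimal sketch of that approach, assuming a POSIX socket descriptor sfd (the helper name and return convention are my own, not from the question):

#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>

// Returns bytes sent, 0 if the peer hung up, or -1 on a poll error.
ssize_t send_when_writable(int sfd, const char* data, size_t len)
{
    struct pollfd pfd;
    pfd.fd = sfd;
    pfd.events = POLLOUT;               // wait until the socket is write-ready

    if (poll(&pfd, 1, -1) < 0)          // timeout of -1 = block indefinitely
        return -1;
    if (pfd.revents & (POLLHUP | POLLERR))
        return 0;                       // connection gone: skip the send()
    return send(sfd, data, len, 0);
}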
As a general rule you shouldn't close a socket until you're certain that the send buffer has been fully flushed - you can only be sure of this once the send() call has returned and indicates that all the remaining data has gone out. I've handled this in the past by checking the send buffer when I get a zero read and if it's not empty I set a "closing" flag. In your case the sending thread would then use this as a hint to do the close once everything is flushed. This matters because if the remote end does a half-close with shutdown() then you'll get a zero read even if it might still be reading. You might not care about half closes, however, in which case your strategy above is OK.
Finally, I personally would avoid the hassle of sending and receiving threads and just have a single thread which does both - that's more or less the point of select() and poll(), to allow a single thread of execution to deal with one or more filehandles without worrying about performing an operation which blocks and starves the other connections.

Found the problem. It's in my loop. Notice that it's an infinite loop: when I have nothing left to send, my sentSize is 0, but I still loop and try to send more. By that time, the other thread has already closed the socket, and so my send call for 0 bytes returns with an error.
I fixed it by changing the loop to stop when sentSize reaches 0, and that fixed the problem!
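For reference, the corrected loop might look like this (same variables as the snippet above; the only change is the loop condition):

while (sentSize > 0)   // stop once everything has been sent
{
    n = send(_sfd, bytesPtr, sentSize, 0);
    if (n < 0)
    {
        cerr << "ERROR: send returned an error " << errno << endl;
        return n;
    }
    sentSize -= n;
    bytesPtr += n;
}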

Related

Unix socket hangs on recv, until I place/remove a breakpoint anywhere

[TL;DR version: the code below hangs indefinitely on the second recv() call both in Release and Debug mode. In Debug, if I place or remove a breakpoint anywhere in the code, it makes the execution continue and everything behaves normally]
I'm coding a simple client-server communication using UNIX sockets. The server is in C++ while the client is in python. The connection (TCP socket on localhost) gets established no problem, but when it comes to receiving data on the server side, it hangs on the recv function. Here is the code where the problem happens:
bool server::readBody(int csock) // csock is the socket file descriptor
{
    int bytecount;
    // protobuf-related variables
    google::protobuf::uint32 siz;
    kinMsg::request message;
    // if the code is working, the client will send false;
    // initialize to true to be sure that the message is actually read
    message.set_endconnection(true);
    // First, read the 4-character header for extracting the data size
    char buffer_hdr[5];
    if ((bytecount = recv(csock, buffer_hdr, 4, MSG_WAITALL)) == -1)
        ::std::cerr << "Error receiving data " << ::std::endl;
    buffer_hdr[4] = '\0';
    siz = atoi(buffer_hdr);
    // Second, read the data. The code hangs here !!
    char buffer[siz];
    if ((bytecount = recv(csock, (void *)buffer, siz, MSG_WAITALL)) == -1)
        ::std::cerr << "Error receiving data " << errno << ::std::endl;
    // Finally, process the protobuf message
    google::protobuf::io::ArrayInputStream ais(buffer, siz);
    google::protobuf::io::CodedInputStream coded_input(&ais);
    google::protobuf::io::CodedInputStream::Limit msgLimit = coded_input.PushLimit(siz);
    message.ParseFromCodedStream(&coded_input);
    coded_input.PopLimit(msgLimit);
    if (message.has_endconnection())
        return !message.endconnection();
    return false;
}
As can be seen in the code, the protocol is such that the client will first send the number of bytes in the message in a 4-character array, followed by the protobuf message itself. The first recv call works well and does not hang. Then, the code hangs on the second recv call, which should be recovering the body of the message.
Now, for the interesting part. When run in Release mode, the code hangs indefinitely and I have to kill either the client or the server. It does not matter whether I run it from my IDE (qtcreator), or from the CLI after a clean build (using cmake/g++).
When I run the code in Debug mode, it also hangs at the same recv() call. Then, if I place or remove a breakpoint ANYWHERE in the code (before or after that line of code), it starts again and works perfectly: the server receives the data and reads the correct message.endconnection() value before returning out of the readBody function. The breakpoint I have to place to trigger this behavior is not necessarily hit. Since the readBody() function is in a loop (my C++ server waits for requests from the python client), the same behavior happens again at the next iteration, and I have to place or remove a breakpoint anywhere in the code (again, not necessarily one that is hit) in order to get past that recv() call. The loop looks like this:
bool connection = true;
// server waiting for client connection
if (!waitForConnection(connectionID)) std::cerr << "Error accepting connection" << ::std::endl;
// main loop
while(connection)
{
    if ((bytecount = recv(connectionID, buffer, 4, MSG_PEEK)) == -1)
    {
        ::std::cerr << "Error receiving data " << ::std::endl;
    }
    else if (bytecount == 0)
        break;
    try
    {
        if (readBody(connectionID))
        {
            sendResponse(connectionID);
        }
        // if the client is requesting disconnection, break out of while(connection)
        else
        {
            std::cout << "Disconnection requested by client. Exiting ..." << std::endl;
            connection = false;
        }
    }
    catch(...)
    {
        std::cerr << "Error receiving message from client" << std::endl;
    }
}
Finally, as you can see, when the program returns from readBody(), it sends back another message to the client, which processes it and prints in the standard output (python code working, not shown because the question is already long enough). From this last behavior, I can conclude that the protocol and client code are OK. I tried to put sleep instructions at many points to see whether it was a timing problem, but it did not change anything.
I searched all over Google and SO for a similar problem, but did not find anything. Help would be much appreciated !
The solution is to not use any flags. Call recv() with 0 for the flags, or just use read() instead of recv().
You are asking the socket for data that is not there. The recv() expects 10 bytes, but the client only sent 6. MSG_WAITALL states clearly that the call should block until 10 bytes are available in the stream.
If you don't use any flags, the call will succeed with a bytecount of 6, which has exactly the same effect as MSG_DONTWAIT, without the potential side effects of non-blocking calls.
I did the test on the github project; it works.
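A minimal sketch of that flag-free variant, reusing csock and siz from the question's readBody() (the accumulation loop is my addition, to handle short reads; assumes <vector> and <cerrno> are included):

// Read the siz-byte body with flags = 0, looping on short reads.
std::vector<char> buffer(siz);
size_t received = 0;
while (received < siz)
{
    ssize_t n = recv(csock, buffer.data() + received, siz - received, 0);
    if (n == -1)
    {
        ::std::cerr << "Error receiving data " << errno << ::std::endl;
        break;
    }
    if (n == 0)
        break;          // peer closed the connection
    received += n;
}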
The solution is to replace MSG_WAITALL by MSG_DONTWAIT in the recv() calls. It now works fine. To summarize, it makes the recv() calls non blocking, which makes the whole code work fine.
However, this still raises many questions, the first of which being: why was it working with this weird breakpoint changing thing ?
If the socket was blocking in the first place, one could assume that it is because there is no data on the socket. Let's consider both situations here:
- There is no data on the socket, which is the reason why the blocking recv() call was not returning. Changing it to a non-blocking recv() call would then, in the same situation, trigger an error. If not, the protobuf deserialization would afterwards fail trying to deserialize from an empty buffer. But it does not...
- There is data on the socket. Then why on earth would it block in the first place?
Obviously there is something that I don't get about sockets in C, and I'd be very happy if somebody has an explanation for this behavior !

Is it expected for poll() to take 40ms to return even though data will be available sooner?

I created a proxy server to handle CQL orders from website clients. The proxy listens for incoming connections and each connection is given a thread. The thread loops as long as the socket exists and dies on HUP. You may also stop the proxy, which will stop the threads by sending an event (See eventfd()) to each thread.
By itself, this already allows me to save a good 100ms because the proxy is local and connecting to a local service is much faster than a service on a remote computer... (even if the computer is local.)
However, I send orders, and once in a while the proxy sees no incoming data (i.e. it calls read() on the socket, which is set up as non-blocking, and gets -1 in return with errno == EAGAIN). When that happens, I call poll() to wait for additional data, the HUP, or a hit on the eventfd meaning I have to quit (i.e. 2 fds, the socket and the eventfd).
Somehow, more often than not, when I hit the poll() function call, it adds an extra 40ms to the time it takes for a message to go round trip. Although one would think this only happens on larger messages, it happens when I receive an order, which is less than 100 bytes! So the size should not be the culprit. I also changed the code to make sure I send the entire order from the client to the proxy in one write() and to avoid the poll() if at all possible (i.e. I call read() first, and poll() only if nothing is available.)
Note that I have no timeout in this case because there is nothing to check other than the incoming orders and the eventfd. So I would imagine that the timeout won't be a problem.
The code base is really big. But the client/server comes down to something like this (the sizes in original are fully dynamic):
// Client
...
connect(socket);
...
write(socket, order, sizeof(order));
read(socket, result, sizeof(result));
// repeat for other orders, as required by client...

// Server
...
socket = accept(); // happens for each client
...
pthread_create(runner);
...

// Server thread (runner)
...
for(;;)
{
    int r(0);
    for(;;)
    {
        r += read(socket, order, sizeof(order));
        if (r >= sizeof(order))
        {
            break;
        }
        // wait for more data if not enough has been received yet
        poll(..."socket" + "eventfd"...); // <-- this will often take 40ms
        if (eventfd_happened)
        {
            // quit thread
            return;
        }
    }
    ...
    [work on order]
    ...
    write(socket, result, sizeof(result));
}
Note 1: I see the problem when I have a single client. So having multiple clients does not in itself cause the problem either.
Note 2: The client really uses BIO_connect(), BIO_read() and BIO_write() [from OpenSSL], but I doubt that would be a problem. I do not use any kind of encryption.
I don't see why you're using non-blocking I/O given you have a dedicated thread per socket. Just block in read(). Use SO_RCVTIMEO if you need an overall read timeout.
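A minimal sketch of that suggestion, assuming a POSIX socket fd (the helper name is illustrative):

#include <sys/socket.h>
#include <sys/time.h>

// Leave the socket blocking, but bound each read() with SO_RCVTIMEO.
// On timeout, read() returns -1 with errno == EAGAIN/EWOULDBLOCK.
int set_read_timeout(int fd, int seconds)
{
    struct timeval tv;
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}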

UDP real time sending and receiving on Linux on command from control computer

I am currently working on a project written in C++ involving UDP real time connection. I receive UDP packets from a control computer containing commands to start/stop an infinite while loop that reads data from an IMU and sends that data to the control computer.
My problem is the following: at first I implemented an exit condition for the loop using recvfrom() and read(), but the control computer only sends a UDP packet every second, so blocking on those calls delayed the whole loop and made sending the data at the desired 5 ms interval impossible.
I tried to fix this problem by using fcntl(fd, F_SETFL, O_NONBLOCK); and using only read(), which actually works fine, but I am unsure whether this is a wise idea or not, since I am not checking for errors anymore. Is there any elegant way to solve this problem? I thought about using Pthreads or something like that, however I have never worked with threads or parallel programming, so I would have to spend some time learning that.
I appreciate any advice on that problem you could give me.
Here is a code example:
//include
...
int main() {
    RNet cmd;  // RNet: struct that contains all the information of the UDP header and the command
    RNet* pCmd = &cmd;
    ssize_t b;
    int fd2;
    struct sockaddr_in snd; // sender is control computer
    socklen_t length;
    // further declaration of variables, connecting to socket, etc...
    ...
    fcntl(fd2, F_SETFL, O_NONBLOCK);
    while (1)
    {
        // read messages from control computer
        if ((b = read(fd2, pCmd, 19)) > 0) {
            memcpy(&cmd, pCmd, b);
        }
        // transmission
        while (cmd.CLout.MotionCommand == 1) // MotionCommand: 1 - send messages; 0 - do nothing
        {
            if (time_elapsed >= 5) // elapsed time in ms
            {
                // update sensor values
                ...
                // sendto()
                ...
                // update control time, timestamp, etc.
                ...
            }
            if (recvfrom(fd2, pCmd, (int)sizeof(pCmd), 0, (struct sockaddr*)&snd, &length) < 0) {
                perror("error receiving data");
                return 0;
            }
            // checking Control Model Command
            if ((b = read(fd2, pCmd, 19)) > 0) {
                memcpy(&cmd, pCmd, b);
            }
        }
    }
}
I really like the "blocking calls on multiple threads" design. It enables you to have distinct independent tasks, and you don't have to worry about how each task can disturb another. It can have some drawbacks but it is usually a good fit for many needs.
To do that, just use pthread_create to create a new thread for each task (you may keep the main thread for one task). In your case, you should have a thread to receive commands, and another one to send your data. You also need for the receiving thread to notify the sending thread of the commands. To do that, you can use some synchronization tool, like a mutex.
Overall, you should have your receiving thread blocking on recvfrom, and the sending thread waiting for a signal from the mutex (technically, waiting for the mutex to be freed). When the receiving thread receives a start command, it signals the mutex and goes back to recvfrom (optionally you can set a variable to provide more information to the other thread). A sketch of this layout follows.
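A minimal sketch, using a condition variable to do the actual signalling alongside the mutex (all names and the one-byte command protocol are illustrative; error handling is elided):

#include <pthread.h>
#include <sys/socket.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool running = false;

void* receiver(void* arg)
{
    int fd = *(int*)arg;
    char cmd;
    for (;;)
    {
        if (recvfrom(fd, &cmd, 1, 0, NULL, NULL) <= 0)
            continue;                        // (error handling elided)
        pthread_mutex_lock(&lock);
        running = (cmd == 1);                // 1 = start, 0 = stop (assumed protocol)
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

void* sender(void* arg)
{
    for (;;)
    {
        pthread_mutex_lock(&lock);
        while (!running)
            pthread_cond_wait(&cond, &lock); // sleep until a start command arrives
        pthread_mutex_unlock(&lock);
        // ... read the IMU and sendto() every 5 ms while `running` is true ...
    }
    return NULL;
}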
As a comment, remember that UDP is 1-to-many, so your code here will react to any packet sent to you (even from some random or malicious host). You may want to filter on the remote sockaddr after recvfrom, or use connect + recv. It depends on what you want.

Why my else condition is never executing?

I am working on a UDP server, and this UDP server code is working fine except for the else condition. Maybe I am wrong, but I have done a lot of things using an else condition in the same way to terminate a while loop. I am not sure if it's a UDP problem or something else...
while(1) // executes three times because it's getting data only three times from the client
{
    int total_bytes = 0;
    int bytes_recv = 0;
    int count = 0;
    std::vector<double> m_vector(8000);
    // Bytes are also received 3 times correctly, so why is the else condition not executing after receiving 3 times?
    bytes_recv = recvfrom(Socket, (char*)m_vector.data(), 64000, 0, (SOCKADDR*)&ClientAddr, &i);
    count++;
    if (bytes_recv > 0)
    {
        total_bytes = total_bytes + bytes_recv;
        std::cout << "Server: loop counter is" << count << std::endl;
        std::cout << "Server: Received bytes are" << total_bytes << std::endl;
    }
    else
    {
        // why does this part never execute?
        std::cout << "Data Receiving has finished" << std::endl;
        break;
    }
}
WSACleanup();
system("pause");
return 0;
}
The comment in the source says that you expect only 3 datagrams from the client. Thus, do count how many datagrams you have received, and if you already have 3 of them, do not continue calling recvfrom.
You already have a variable count, but it is reset to zero every iteration and isn't used as exit condition.
Once you have count == 3, you know that there is nothing more coming, so calling recvfrom is pointless. It will only block, since that is what you're telling it to do. Making the socket non-blocking would "help" to avoid blocking, but then you would be polling, which isn't good either (and useless, since you know there is nothing to be received). It's best to operate correctly.
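A minimal sketch of that fix, reusing the question's variables (Socket, ClientAddr, i): the counter moves outside the loop and becomes the exit condition.

int count = 0;
int total_bytes = 0;
std::vector<double> m_vector(8000);
while (count < 3)                       // three datagrams expected, then stop
{
    int bytes_recv = recvfrom(Socket, (char*)m_vector.data(), 64000, 0,
                              (SOCKADDR*)&ClientAddr, &i);
    if (bytes_recv <= 0)
        break;                          // error: nothing more to read
    ++count;
    total_bytes += bytes_recv;
    std::cout << "Server: datagram " << count
              << ", total bytes " << total_bytes << std::endl;
}
std::cout << "Data Receiving has finished" << std::endl;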
You could also have the client send an "end of message" datagram, but of course you would have to add a timeout and a strategy for packet loss, or the server could block forever. Not only because of malicious clients, but also simply because the receive buffer was full and a packet was dropped (which is a normal thing to happen!).
Alternatively, since there is a call to WSACleanup in your code, you're using Winsock. Which means you could use overlapped WSARecvFrom instead of recvfrom. Fire off one receive, and from its completion handler fire off another two, also with a callback function. After firing off a request, forget about it and let the callback handle the rest; you can now deal with another client (the thread must be alertable for the callbacks to run... alternatively, block on an IOCP or WaitForMultipleObjectsEx or whatever).
If no second or third packet comes in after so and so long, either send a "please resend" message or consider the client dead, close the socket and move on.
recvfrom is by default a blocking call and will only return once a packet has been read. Because of this, when you stop sending packets it just blocks on recvfrom, so the case with 0 bytes never happens.
You could change the flags to recvfrom to change this behaviour, but it's likely not what you want because then if there's any delay between sending the packets you will get 0 bytes and exit.
I suppose you could see how long you've gone without receiving any packets and then shut down, so in the else case you could use a timer and a running total before exiting.
What are you trying to accomplish?
I have not checked (bad me, I know, but time's short), but if recvfrom follows typical behavior, then it guarantees that:
- a return value < 0 means error
- a return value == 0 means that everything was OK but the channel cannot receive anything more
- a return value > 0 means something was received
In TCP you get 'received bytes' == 0 only when the connection is closed.
In UDP there's no such thing as a 'connection'. The channel is always ready to receive, until the socket is closed.
Hence, it probably simply waits until something arrives. It cannot detect that there is no one left to send to it. That's specific to UDP.
If you want to catch the case when nothing arrives for a long time, try setting a read timeout.
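Since the question's code uses Winsock (note the WSACleanup call), a read timeout there is a DWORD in milliseconds; a minimal sketch, reusing the question's Socket variable (the 2-second value is arbitrary):

// Give up if nothing arrives for 2 seconds; recvfrom() then returns
// SOCKET_ERROR with WSAGetLastError() == WSAETIMEDOUT, so the else branch runs.
DWORD timeout_ms = 2000;
setsockopt(Socket, SOL_SOCKET, SO_RCVTIMEO,
           (const char*)&timeout_ms, sizeof(timeout_ms));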

C++ non blocking socket select send too slow?

I have a program that maintains a list of "streaming" sockets. These sockets are configured to be non-blocking sockets.
Currently, I have used a list to store these streaming sockets. I have some data that I need to send to all of them, hence I use an iterator to loop through the list of streaming sockets and call the send_TCP_NB function below.
The issue is that my own program buffer, which stores the data before it is passed to this send_TCP_NB function, slowly decreases in free size, indicating that the send is slower than the rate at which data is put into the program buffer. Data is put into the program buffer at about 1000 items per second, and each item is quite small, about 100 bytes.
Hence, I am not sure whether my send_TCP_NB function is working efficiently or correctly:
int send_TCP_NB(int cs, char data[], int data_length) {
    bool sent = false;
    FD_ZERO(&write_flags);     // initialize the writer socket set
    FD_SET(cs, &write_flags);  // set the write notification for the socket based on the current state of the buffer
    int status;
    int err;
    struct timeval waitd;      // set the time limit for waiting
    waitd.tv_sec = 0;
    waitd.tv_usec = 1000;
    err = select(cs+1, NULL, &write_flags, NULL, &waitd);
    if (err == 0)
    {
        // time limit expired
        printf("Time limit expired!\n");
        return 0; // send failed
    }
    else
    {
        while(!sent)
        {
            if (FD_ISSET(cs, &write_flags))
            {
                FD_CLR(cs, &write_flags);
                status = send(cs, data, data_length, 0);
                sent = true;
            }
        }
        int nError = WSAGetLastError();
        if (nError != WSAEWOULDBLOCK && nError != 0)
        {
            printf("Error sending non blocking data\n");
            return 0;
        }
        else
        {
            if (nError == WSAEWOULDBLOCK)
            {
                printf("%d\n", nError);
            }
            return 1;
        }
    }
}
One thing that would help is if you thought out exactly what this function is supposed to do. What it actually does is probably not what you wanted, and has some bad features.
The major features of what it does that I've noticed are:
1. Modify some global state
2. Wait (up to 1 millisecond) for the write buffer to have some empty space
3. Abort if the buffer is still full
4. Send 1 or more bytes on the socket (ignoring how much was sent)
5. If there was an error (including the send deciding it would have blocked despite the earlier check), obtain its value. Otherwise, obtain a random error value
6. Possibly print something to screen, depending on the value obtained
7. Return 0 or 1, depending on the error value
Comments on these points:
1. Why is write_flags global?
2. Did you really intend to block in this function?
3. This is probably fine
4. Surely you care how much of the data was sent?
5. I do not see anything in the documentation that suggests that this will be zero if send succeeds
If you cleared up what the actual intent of this function was, it would probably be much easier to ensure that this function actually fulfills that intent.
That said
I have some data that I need to send to all these streaming sockets
What precisely is your need?
If your need is that the data must be sent before proceeding, then using a non-blocking write is inappropriate*, since you're going to have to wait until you can write the data anyways.
If your need is that the data must be sent sometime in the future, then your solution is missing a very critical piece: you need to create a buffer for each socket which holds the data that needs to be sent, and then you periodically need to invoke a function that checks the sockets to try writing whatever it can. If you spawn a new thread for this latter purpose, this is the sort of thing select is very useful for, since you can make that new thread block until it is able to write something. However, if you don't spawn a new thread and just periodically invoke a function from the main thread to check, then you don't need to bother. (just write what you can to everything, even if it's zero bytes)
*: At least, it is a very premature optimization. There are some edge cases where you could get slightly more performance by using the non-blocking writes intelligently, but if you don't understand what those edge cases are and how the non-blocking writes would help, then guessing at it is unlikely to get good results.
EDIT: as another answer implied, this is something the operating system is good at anyways. Rather than try to write your own code to manage this, if you find your socket buffers filling up, then make the system buffers larger. And if they're still filling up, you should really give serious thought to the idea that your program needs to block anyways, so that it stops sending data faster than the other end can handle it. i.e. just use ordinary blocking sends for all of your data.
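For example, enlarging the kernel's send buffer is a one-line setsockopt; a minimal sketch for the question's socket cs (the 1 MiB value is arbitrary, and on Linux the effective value is capped by net.core.wmem_max):

int bufsize = 1 << 20;  // request a 1 MiB kernel send buffer
setsockopt(cs, SOL_SOCKET, SO_SNDBUF, (const char*)&bufsize, sizeof(bufsize));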
Some general advice:
Keep in mind you are multiplying data. So if you get 1 MB/s in, you output N MB/s with N clients. Are you sure your network card can take it? It gets worse with smaller packets, as you get more overall overhead. You may want to consider broadcasting.
You are using non-blocking sockets, but you block while they are not free. If you want to be non-blocking, better to discard the packet immediately if the socket is not ready.
What would be better is to "select" more than one socket at once. Do everything that you are doing but for all the sockets that are available. You'll write to each "ready" socket, then repeat again while there are sockets that are not ready. This way, you'll proceed with the sockets that are available first, and then with some chance, the busy sockets will become themselves available.
The while(!sent) loop is useless and probably buggy. Since you are checking only one socket, FD_ISSET will always be true. It is wrong to check FD_ISSET again after an FD_CLR.
Keep in mind that your OS has some internal buffers for the sockets and that there are way to extend them (not easy on Linux, though, to get large values you need to do some config as root).
There are some socket libraries that will probably work better than what you can implement in a reasonable time (boost::asio and zmq for the ones I know).
If you need to implement it yourself, (i.e. because for instance zmq has its own packet format), consider using a threadpool library.
EDIT:
Sleeping 1 millisecond is probably a bad idea. Your thread will probably get descheduled and it will take much more than that before you get some CPU time again.
This is just a horrible way to do things. The select serves no purpose but to waste time. If the send is non-blocking, it can mangle data on a partial send. If it's blocking, you still waste arbitrarily much time waiting for one receiver.
You need to pick a sensible I/O strategy. Here is one: Set all sockets non-blocking. When you need to send data to a socket, just call write. If all the data writes, lovely. If not, save the portion of data that wasn't sent for later and add the socket to your write set. When you have nothing else to do, call select. If you get a hit on any socket in your write set, write as many bytes as you can from what you saved. If you write all of them, remove that socket from the write set.
(If you need to write to a socket that's already in your write set, just append the new data to the saved data to be sent. You may need to close the connection if too much data gets buffered.)
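A minimal sketch of that strategy, assuming POSIX sockets already set non-blocking (the map of pending buffers and both helper names are mine, for illustration):

#include <map>
#include <string>
#include <sys/socket.h>
#include <cerrno>

std::map<int, std::string> pending;   // socket -> bytes not yet sent

// Try to send now; stash whatever doesn't fit for the select() loop.
void send_or_buffer(int fd, const char* data, size_t len)
{
    ssize_t n = send(fd, data, len, 0);
    if (n < 0)
    {
        if (errno != EWOULDBLOCK && errno != EAGAIN)
            return;                   // real error: caller should close fd
        n = 0;                        // nothing written, buffer it all
    }
    if ((size_t)n < len)
        pending[fd].append(data + n, len - n);
}

// Called when select() reports fd writable: flush what we can.
void on_writable(int fd)
{
    std::string& buf = pending[fd];
    ssize_t n = send(fd, buf.data(), buf.size(), 0);
    if (n > 0)
        buf.erase(0, n);
    if (buf.empty())
        pending.erase(fd);            // drop fd from the write set
}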
A better idea might be to use a library that already does all these things. Boost::asio is a good one.
You are calling select() before calling send(). Do it the other way around: call select() only if send() reports WSAEWOULDBLOCK, e.g.:
int send_TCP_NB(int cs, char data[], int data_length)
{
    int status;
    int err;
    struct timeval waitd;
    char *data_ptr = data;
    while (data_length > 0)
    {
        status = send(cs, data_ptr, data_length, 0);
        if (status > 0)
        {
            data_ptr += status;
            data_length -= status;
            continue;
        }
        err = WSAGetLastError();
        if (err != WSAEWOULDBLOCK)
        {
            printf("Error sending non blocking data\n");
            return 0; // send failed
        }
        FD_ZERO(&write_flags);
        FD_SET(cs, &write_flags); // set the write notification for the socket based on the current state of the buffer
        waitd.tv_sec = 0;
        waitd.tv_usec = 1000;
        status = select(cs+1, NULL, &write_flags, NULL, &waitd);
        if (status > 0)
            continue;
        if (status == 0)
            printf("Time limit expired!\n");
        else
            printf("Error waiting for time limit!\n");
        return 0; // send failed
    }
    return 1;
}