One of my projects on Linux uses blocking sockets. Things happen very serially so non-blocking would just make things more complicated. Anyway, I am finding that often a recv() call is returning -1 with errno set to EAGAIN.
The man page only really mentions this happening for non-blocking sockets, which makes sense. With non-blocking, the socket may or may not be available so you might need to try again.
What would cause it to happen for a blocking socket? Can I do anything to avoid it?
At the moment, my code to deal with it looks something like this (I have it throw an exception on error, but beyond that it is a very simple wrapper around recv()):
int ret;
do {
ret = ::recv(socket, buf, len, flags | MSG_NOSIGNAL);
} while(ret == -1 && errno == EAGAIN);
if(ret == -1) {
throw socket_error(strerror(errno));
}
return ret;
Is this even correct? The EAGAIN condition gets hit pretty often.
EDIT: some things which I've noticed which may be relevant.
I do set a read timeout on the socket using setsockopts(), but it is set to 30 seconds. the EAGAIN's happen way more often than once every 30 secs. CORRECTION my debugging was flawed, EAGAIN's don't happen as often as I thought they did. Perhaps it is the timeout triggering.
For connecting, I want to be able to have connect timeout, so I temporarily set the socket to non-blocking. That code looks like this:
int error = 0;
fd_set rset;
fd_set wset;
int n;
const SOCKET sock = m_Socket;
// set the socket as nonblocking IO
const int flags = fcntl (sock, F_GETFL, 0);
fcntl(sock, F_SETFL, flags | O_NONBLOCK);
errno = 0;
// we connect, but it will return soon
n = ::connect(sock, addr, size_addr);
if(n < 0) {
if (errno != EINPROGRESS) {
return -1;
}
} else if (n == 0) {
goto done;
}
FD_ZERO(&rset);
FD_ZERO(&wset);
FD_SET(sock, &rset);
FD_SET(sock, &wset);
struct timeval tval;
tval.tv_sec = timeout;
tval.tv_usec = 0;
// We "select()" until connect() returns its result or timeout
n = select(sock + 1, &rset, &wset, 0, timeout ? &tval : 0);
if(n == 0) {
errno = ETIMEDOUT;
return -1;
}
if (FD_ISSET(sock, &rset) || FD_ISSET(sock, &wset)) {
socklen_t len = sizeof(error);
if (getsockopt(SOL_SOCKET, SO_ERROR, &error, &len) < 0) {
return -1;
}
} else {
return -1;
}
done:
// We change the socket options back to blocking IO
if (fcntl(sock, F_SETFL, flags) == -1) {
return -1;
}
return 0;
The idea is that I set it to non-blocking, attempt a connect and select on the socket so I can enforce a timeout. Both the set and restore fcntl() calls return successfully, so the socket should end up in blocking mode again when this function completes.
It's possible that you have a nonzero receive timeout set on the socket (via setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO,...)) as that would also cause recv to return EAGAIN
Is it possible that you're using MSG_DONTWAIT is being specified as part of your flags? The man page says EAGAIN will occur if no data is available and this flag is specified.
If you really want to force a block until the recv() is somewhat successful, you may wish to use the MSG_WAITALL flag.
I don't suggest this as a first-attempt fix, but if you're all out of options, you can always select() on the socket with a reasonably long timeout to force it to wait for data.
EAGAIN is generated by the OS almost like an "Oops! I'm sorry for disturbing you.". In case of this error, you may try reading again, This is not a serious or fatal error. I have seen these interrupts occur in Linux and LynxOS anywhere from one a day to 100 times a day.
Related
I'm writing a network game for a university project and while I have messages being sent and received between a client and a server, I'm unsure on how I would go about implementing a writeable fd_set (my lecturer's example code only included a readable fd_set) and what the function is of both fd_sets with select(). Any insight you could give would be great in helping me understand this.
My server code is as such:
bool ServerSocket::Update() {
// Update the connections with the server
fd_set readable;
FD_ZERO(&readable);
// Add server socket, which will be readable if there's a new connection
FD_SET(m_socket, &readable);
// Add connected clients' sockets
if(!AddConnectedClients(&readable)) {
Error("Couldn't add connected clients to fd_set.");
return false;
}
// Set timeout to wait for something to happen (0.5 seconds)
timeval timeout;
timeout.tv_sec = 0;
timeout.tv_usec = 500000;
// Wait for the socket to become readable
int count = select(0, &readable, NULL, NULL, &timeout);
if(count == SOCKET_ERROR) {
Error("Select failed, socket error.");
return false;
}
// Accept new connection to the server socket if readable
if(FD_ISSET(m_socket, &readable)) {
if(!AddNewClient()) {
return false;
}
}
// Check all clients to see if there are messages to be read
if(!CheckClients(&readable)) {
return false;
}
return true;
}
A socket becomes:
readable if there is either data in the socket receive buffer or a pending FIN (recv() is about to return zero)
writable if there is room in the socket receive buffer. Note that this is true nearly all the time, so you should use it only when you've encountered a prior EWOULDBLOCK/EAGAIN on the socket, and stop using it when you don't.
You'd create an fd_set variable called writeable, initialize it the same way (with the same sockets), and pass it as select's third argument:
select(0, &readable, &writeable, NULL, &timeout);
Then after select returns you'd check whether each socket is still in the set writeable. If so, then it's writeable.
Basically, exactly the same way readable works, except that it tells you a different thing about the socket.
select() is terribly outdated and it's interface is arcane. poll (or it's windows counterpart WSAPoll is a modern replacement for it, and should be always preferred.
It would be used in following manner:
WSAPOLLFD pollfd = {m_socket, POLLWRNORM, 0};
int rc = WSAPoll(&pollfd, 1, 100);
if (rc == 1) {
// Socket is ready for writing!
}
I have a C++ and Qt application which part of it implements a C socket client. Some time ago by app crashed because something happened with the server; the only thing I got from that crash was a message in Qt Creator's Application Output stating
recv_from_client: Connection reset by peer
I did some research on the web about this "connection reset by peer" error and while some threads here in SO and other places did managed to explain what is going on, none of them tells how to handle it - that is, how can I "catch" the error and continue my application without a crash (particularly the method where I read from the server is inside a while loop, so I'ld like to stop the while loop and enter in another place of my code that will try to re-establish the connection).
So how can I catch this error to handle it appropriately? Don't forget that my code is actually C++ with Qt - the C part is a library which calls the socket methods.
EDIT
Btw, the probable method from which the crash originated (given the "recv_from_client" part of the error message above) was:
int hal_socket_read_from_client(socket_t *obj, u_int8_t *buffer, int size)
{
struct s_socket_private * const socket_obj = (struct s_socket_private *)obj;
int retval = recv(socket_obj->client_fd, buffer, size, MSG_DONTWAIT); //last = 0
if (retval < 0)
perror("recv_from_client");
return retval;
}
Note: I'm not sure if by the time this error occurred, the recv configuration was with MSG_DONTWAIT or with 0.
Just examine errno when read() returns a negative result.
There is normally no crash involved.
while (...) {
ssize_t amt = read(sock, buf, size);
if (amt > 0) {
// success
} else if (amt == 0) {
// remote shutdown (EOF)
} else {
// error
// Interrupted by signal, try again
if (errno == EINTR)
continue;
// This is fatal... you have to close the socket and reconnect
// handle errno == ECONNRESET here
// If you use non-blocking sockets, you also have to handle
// EWOULDBLOCK / EAGAIN here
return;
}
}
It isn't an exception or a signal. You can't catch it. Instead, you get an error which tells you that the connection has been resetted when trying to work on that socket.
int rc = recv(fd, ..., ..., ..., ...);
if (rc == -1)
{
if (errno == ECONNRESET)
/* handle it; there isn't much to do, though.*/
else
perror("Error while reading");
}
As I've written, there isn't much you can do. If you're using some I/O multiplexer, you may want to remove that file descriptor from further monitoring.
I have written simple C/S applications to test the characteristics of non-blocking sockets, here is some brief information about the server and client:
//On linux The server thread will send
//a file to the client using non-blocking socket
void *SendFileThread(void *param){
CFile* theFile = (CFile*) param;
int sockfd = theFile->GetSocket();
set_non_blocking(sockfd);
set_sock_sndbuf(sockfd, 1024 * 64); //set the send buffer to 64K
//get the total packets count of target file
int PacketCOunt = theFile->GetFilePacketsCount();
int CurrPacket = 0;
while (CurrPacket < PacketCount){
char buffer[512];
int len = 0;
//get packet data by packet no.
GetPacketData(currPacket, buffer, len);
//send_non_blocking_sock_data will loop and send
//data into buffer of sockfd until there is error
int ret = send_non_blocking_sock_data(sockfd, buffer, len);
if (ret < 0 && errno == EAGAIN){
continue;
} else if (ret < 0 || ret == 0 ){
break;
} else {
currPacket++;
}
......
}
}
//On windows, the client thread will do something like below
//to receive the file data sent by the server via block socket
void *RecvFileThread(void *param){
int sockfd = (int) param; //blocking socket
set_sock_rcvbuf(sockfd, 1024 * 256); //set the send buffer to 256
while (1){
struct timeval timeout;
timeout.tv_sec = 1;
timeout.tv_usec = 0;
fd_set rds;
FD_ZERO(&rds);
FD_SET(sockfd, &rds)'
//actually, the first parameter of select() is
//ignored on windows, though on linux this parameter
//should be (maximum socket value + 1)
int ret = select(sockfd + 1, &rds, NULL, NULL, &timeout );
if (ret == 0){
// log that timer expires
CLogger::log("RecvFileThread---Calling select() timeouts\n");
} else if (ret) {
//log the number of data it received
int ret = 0;
char buffer[1024 * 256];
int len = recv(sockfd, buffer, sizeof(buffer), 0);
// handle error
process_tcp_data(buffer, len);
} else {
//handle and break;
break;
}
}
}
What surprised me is that the server thread fails frequently because of socket buffer full, e.g. to send a file of 14M size it reports 50000 failures with errno = EAGAIN. However, via logging I observed there are tens of timeouts during the transfer, the flow is like below:
on the Nth loop, select() succeeds and read 256K's data successfully.
on the (N+1)th loop, select() failed with timeout.
on the (N+2)th loop, select() succeeds and read 256K's data successfully.
Why there would be timeouts interleaved during the receving? Can anyone explain this phenomenon?
[UPDATE]
1. Uploading a file of 14M to the server only takes 8 seconds
2. Using the same file with 1), the server takes nearly 30 seconds to send all data to the client.
3. All sockets used by the client are blocking. All sockets used by the server are non-blocking.
Regarding #2, I think timeouts are the reason why #2 takes much more time then #1, and I wonder why there would be so many timeouts when the client is busy in receiving data.
[UPDATE2]
Thanks for comments from #Duck, #ebrob, #EJP, #ja_mesa , I will do more investigation today
then update this post.
Regarding why I send 512 bytes per loop in the server thread, it is because I found the server thread sends data much faster than the client thread receiving them. I am very confused that why timeout happened to the client thread.
Consider this more of a long comment than an answer but as several people have noted the network is orders of magnitude slower than your processor. The point of non-blocking i/o is that the difference is so great that you can actually use it to do real work rather than blocking. Here you are just pounding on the elevator button hoping that makes a difference.
I'm not sure how much of your code is real and how much is chopped up for posting but in the server you don't account for (ret == 0) i.e. normal shutdown by the peer.
The select in the client is wrong. Again, not sure if that was sloppy editing or not but if not then the number of parameters are wrong but, more concerning, the first parameter - i.e. should be the highest file descriptor for select to look at plus one - is zero. Depending on the implementation of select I wonder if that is in fact just turning select into a fancy sleep statement.
You should be calling recv() first and then call select() only if recv() tells you to do so. Don't call select() first, that is a waste of processing. recv() knows if data is immediately available or if it has to wait for data to arrive:
void *RecvFileThread(void *param){
int sockfd = (int) param; //blocking socket
set_sock_rcvbuf(sockfd, 1024 * 256); //set the send buffer to 256
char buffer[1024 * 256];
while (1){
int ret = 0;
int len = recv(sockfd, buffer, sizeof(buffer), 0);
if (len == -1) {
if (WSAGetLastError() != WSAEWOULDBLOCK) {
//handle error
break;
}
struct timeval timeout;
timeout.tv_sec = 1;
timeout.tv_usec = 0;
fd_set rds;
FD_ZERO(&rds);
FD_SET(sockfd, &rds)'
//actually, the first parameter of select() is
//ignored on windows, though on linux this parameter
//should be (maximum socket value + 1)
int ret = select(sockfd + 1, &rds, NULL, &timeout );
if (ret == -1) {
// handle error
break;
}
if (ret == 0) {
// log that timer expires
break;
}
// socket is readable so try read again
continue;
}
if (len == 0) {
// handle graceful disconnect
break;
}
//log the number of data it received
process_tcp_data(buffer, len);
}
}
Do something similar on the sending side as well. Call send() first, and then call select() waiting for writability only if send() tells you to do so.
I'm having a strange problem while attempting to transform a blocking socket server into a nonblocking one. Though the message was only received once when being sent with blocking sockets, using nonblocking sockets the message seems to be received an infinite number of times.
Here is the code that was changed:
return ::write(client, message, size);
to
// Nonblocking socket code
int total_sent = 0, result = -1;
while( total_sent < size ) {
// Create a temporary set of flags for use with the select function
fd_set working_set;
memcpy(&working_set, &master_set, sizeof(master_set));
// Check if data is available for the socket - wait 1 second for timeout
timeout.tv_sec = 1;
timeout.tv_usec = 0;
result = select(client + 1, NULL, &working_set, NULL, &timeout);
// We are able to write - do so
result = ::write(client, &message[total_sent], (size - total_sent));
if (result == -1) {
std::cerr << "An error has occured while writing to the server."
<< std::endl;
return result;
}
total_sent += result;
}
return 0;
EDIT: The initialization of the master set looks like this:
// Private member variables in header file
fd_set master_set;
int sock;
...
// Creation of socket in class constructor
sock = ::socket(PF_INET, socket_type, 0);
// Makes the socket nonblocking
fcntl(sock,F_GETFL,0);
FD_ZERO(&master_set);
FD_SET(sock, &master_set);
...
// And then when accept is called on the socket
result = ::accept(sock, NULL, NULL);
if (result > 0) {
// A connection was made with a client - change the master file
// descriptor to note that
FD_SET(result, &master_set);
}
I have confirmed that in both cases, the code is only being called once for the offending message. Also, the client side code hasn't changed at all - does anyone have any recommendations?
fcntl(sock,F_GETFL,0);
How does that make the socket non-blocking?
fcntl(sock, F_SETFL, O_NONBLOCK);
Also, you are not checking if you can actually write to the socket non-blocking style with
FD_ISSET(client, &working_set);
I do not believe that this code is really called only once in the "non blocking" version (quotes because it is not really non-blocking yet as Maister pointed out, look here), check again. If the blocking and non blocking versions are consistent, the non blocking version should return total_sent (or size). With return 0 instead caller is likely to believe nothing was sent. Which would cause infinite sending... is it not what's happening ?
Also your "non blocking" code is quite strange. You seem to use select to make it blocking anyway... Ok, with a timeout of 1s, but why don't you make it really non blocking ? ie: remove all the select stuff and test for error case in write() with errno being EWOULDBLOCK. select or poll are for multiplexing.
Also you should check errors for select and use FD_ISSET to check if socket is really ready. What if the 1 s timeout really happen ? Or if select is stopped by some interruption ? And if an error occurs in write, you should also write which error, that is much more useful than your generic message. But I guess this part of code is still far from finished.
As far as I understand your code it should probably look somewhat like that (if the code is running in an unique thread or threaded, or forking when accepting a connection would change details):
// Creation of socket in class constructor
sock = ::socket(PF_INET, socket_type, 0);
fcntl(sock, F_SETFL, O_NONBLOCK);
// And then when accept is called on the socket
result = ::accept(sock, NULL, NULL);
if (result > 0) {
// A connection was made with a client
client = result;
fcntl(client, F_SETFL, O_NONBLOCK);
}
// Nonblocking socket code
result = ::write(client, &message[total_sent], (size - total_sent));
if (result == -1) {
if (errno == EWOULDBLOCK){
return 0;
}
std::cerr << "An error has occured while writing to the server."
<< std::endl;
return result;
}
return size;
Update:
My bad. The error I am getting is ECONNREFUSED and not EINPROGRESS. After checking the error variable I've found that it is greater than 0, I printfed errno instead of error.
Of course errno is EINPROGRESS because it value didn't change since the call to connect().
Question answered. Thanks folks.
I am using the the the same piece of code as in Stevens' UNIX Network Programming non >blocking connect() example:
Setting socket to nonblocking
Initiate nonblocking connect()
Check for immediate completion
Call select() with timeout and wait for read or write readiness
When select() returns with value greater than 0 do getsockopt(socket, SOL_SOCKET, SO_ERROR, &error, &len).
The error I am getting is EINPROGRESS.
The code is executed on rhel5 server.
Any ideas why I am getting this error?
Code snippet:
flags = fcntl(sockfd, F_GETFL, 0);
fcntl(sockfd, F_SETFL, flags | O_NONBLOCK);
if ((retVal = connect(sockfd, saptr, salen)) < 0)
if (errno != EINPROGRESS)
return (-1);
if (retVal == 0)
{
// restore file status flags
fcntl(sockfd, F_SETFL, flags);
return 0;
}
FD_ZERO(&rset);
FD_SET(sockfd, &rset);
wset = rset;
tval.tv_sec = nsec;
tval.tv_usec = 0;
if ((retVal = select(sockfd + 1, &rset, &wset, NULL, &tval)) == 0)
{
// timeout
close(sockfd);
errno = ETIMEDOUT;
return (-1);
}
if (retVal < 0)
{
// select() failed
return (-1);
}
if (FD_ISSET(sockfd, &rset) || FD_ISSET(sockfd, &wset))
{
len = sizeof(error);
error = 0;
if (getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &error, &len) < 0)
return (-1);
if (error > 0) //<<<<< error == EINPROGRESS >>>
{
close(sockfd);
errno = error;
return (-1);
}
}
else
{
return (-1);
}
// restore file status flags
fcntl(sockfd, F_SETFL, flags);
Shucks....I give up. I tried and tried but could not find a problem with your code.
So, all I offer are some suggestions and hypothesis.
Use FD_COPY to copy rset to wset.
Can you do a getsockopt when the first connect fails.
I am suspecting that Select is returning 0 and because of above, somehow your writefd set is set and doing a getsockopt is returning you the stale error from the previous connect. IMO, a subsquent connect should return EALREADY (although it could be platform dependant)
Please do the above (and downvote me if I am wrong about using FD_COPY) and share the results
EINPROGRESS indicates that the non-blocking connect() is still... in progress.
So I would ask - are you checking that after the select() returns, the connect()ing file descriptor is still set in the writefds fd_set? If it isn't, then that means select() has returned for some other reason.
A select value greater than zero doesn't indicate any sort of error... it is the number of sockets that met the criteria of the select statement. select will return -1 in the event of a failure.
It's not clear to me when calling getsocktopt with SO_ERROR is valid; you are probably best of sticking with checking for errors everytime you use the socket ( which I think you are doing anyway ). It the select isn't failing, you are fine.