I made a server that uses select() to check which of the socket descriptors have data available, but apparently select() marks a socket as ready to read even after the client disconnects, and I get garbage values.
I have found this post on stack overflow:
select (with the read mask set) will return with the handle signalled, but when you use ioctl* to check the number of bytes pending to be read, it will be zero.
My question is: what is ioctl*, and how do I use it? An example would be very helpful.
If a call to read() on a socket (file) descriptor returns 0, that simply means the other side of the connection has shut down and closed the connection.
Note: A select() waiting for possible "events" on set(s) of socket (file) descriptors will also return when a connection represented by one of the fd_sets passed to select() has been shut down.
Check the usual errors people make when using select(2):
Always re-initialize fd_sets you give to select(2) on every iteration - these are input-output arguments that system call modifies for you.
Re-calculate fd_max, the first argument, on every iteration.
Check for errors from all system calls, check the value of errno(3).
And, yes, read(2) returns zero when the other side closed TCP connection cleanly, don't use that socket anymore, just close(2) it.
Related
I have been struggling with epoll for the last few days and I'm in the middle of nowhere right now ;)
There's a lot of information on the Internet and of course in the system man pages, but I've probably taken an overdose and am a bit confused.
In my server app(backend to nginx) I'm waiting for data from clients in the ET mode:
event_template.events = EPOLLIN | EPOLLRDHUP | EPOLLET
Things got curious when I noticed that nginx was responding with 502 even though I could see a successful send() on my side. I ran wireshark
to sniff and realised that my server was sending data (trying, and getting RST back) to another machine on the network. So I decided that the socket descriptor was invalid and that this was a sort of "undefined behaviour". Finally, I found out that on a second recv() I was getting zero bytes, which means the connection has been closed and I'm not allowed to send data back anymore. Nevertheless, epoll had given me not just EPOLLIN but EPOLLRDHUP in a row.
Question: Do I have to close the socket just for reading when recv() returns zero, and do shutdown(SHUT_WR) later on during EPOLLRDHUP processing?
Reading from socket in a nutshell:
std::array<char, BatchSize> batch;
ssize_t total_count = 0, count = 0;
do {
    count = recv(_handle, batch.data(), batch.size(), MSG_DONTWAIT);
    if (0 == count && 0 == total_count) {
        /// #??? Do I need to wait zero just on first iteration?
        close();
        return total_count;
    } else if (count < 0) {
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /// #??? Will be back with next EPOLLIN?!
            break;
        }
        _last_error = errno;
        /// #brief just log the error
        return 0;
    }
    if (count > 0) {
        total_count += count;
        /// DATA!
        if (count < static_cast<ssize_t>(batch.size())) {
            /// #??? Received less than requested - no sense to repeat recv, otherwise I need one more turn?!
            return total_count;
        }
    }
} while (count > 0);
Probably my general mistake was the attempt to send data on an invalid socket descriptor, and everything that happened later was just a consequence. But I continued to dig ;) The second part of my question is about writing to a socket with MSG_DONTWAIT as well.
As far as I now know, send() may also return -1 with EAGAIN, which means that I'm supposed to subscribe to EPOLLOUT and wait until the kernel buffer is free enough to accept more data from me. Is this right? But what if the client won't wait that long? Or may I call a blocking send() (anyway, I'm sending on a different thread) and rely on setsockopt(SO_LINGER) to guarantee that everything I hand to the kernel will really be sent to the peer? And a final guess which I ask you to confirm: I'm allowed to read and write simultaneously, but N>1 concurrent writes is a data race, and all I have to deal with it is a mutex.
Thanks to everyone who at least read to the end :)
Question: Do I have to close the socket just for reading when recv() returns zero, and shutdown(SHUT_WR) later on during EPOLLRDHUP processing?
No, there is no particular reason to perform that somewhat convoluted sequence of actions.
Having received a 0 return value from recv(), you know that the connection is at least half-closed at the network layer. You will not receive anything further from it, and I would not expect EPoll operating in edge-triggered mode to further advertise its readiness for reading, but that does not in itself require any particular action. If the write side remains open (from a local perspective) then you may continue to write() or send() on it, though you will be without a mechanism for confirming receipt of what you send.
What you actually should do depends on the application-level protocol or message-exchange pattern you are assuming. If you expect the remote peer to shut down the write side of its endpoint (connected to the read side of the local endpoint) while awaiting data from you, then by all means send the data it anticipates. Otherwise, you should probably just close the whole connection and stop using it when recv() signals end-of-file by returning 0. Note well that close()ing the descriptor will remove it automatically from any epoll interest sets in which it is enrolled, but only if there are no other open file descriptors referring to the same open file description.
In any case, until you do close() the socket, it remains valid, even if you cannot successfully communicate over it. Until then, there is no reason to expect that messages you attempt to send over it will go anywhere other than (possibly) to the original remote endpoint. Attempts to send may succeed, they may appear to succeed even though the data never arrive at the far end, or they may fail with one of several different errors.
/// #??? Do I need to wait zero just on first iteration?
You should take action on a return value of 0 whether any data have already been received or not. Not necessarily identical action, but either way you should arrange one way or another to get it out of the EPoll interest set, quite possibly by closing it.
/// #??? Will be back with next EPOLLIN?!
If recv() fails with EAGAIN or EWOULDBLOCK then epoll might very well signal read-readiness for it on a future call. Not necessarily the very next one, though.
/// #??? Received less than requested - no sense to repeat recv, otherwise I need one more turn?!
Receiving less than you requested is a possibility you should always be prepared for. It does not necessarily mean that another recv() won't return any data, and if you are using edge-triggered mode in EPoll then assuming the contrary is dangerous. In that case, you should continue to recv(), in non-blocking mode or with MSG_DONTWAIT, until the call fails with EAGAIN or EWOULDBLOCK.
As far as I now know, send() may also return -1 with EAGAIN, which means that I'm supposed to subscribe to EPOLLOUT and wait until the kernel buffer is free enough to accept more data from me. Is this right?
send() certainly can fail with EAGAIN or EWOULDBLOCK. It can also succeed, but send fewer bytes than you requested, which you should be prepared for. Either way, it would be reasonable to respond by subscribing to EPOLLOUT events on the file descriptor, so as to resume sending later.
But what if client won't wait so long?
That depends on what the client does in such a situation. If it closes the connection then a future attempt to send() to it would fail with a different error. If you were registered only for EPOLLOUT events on the descriptor then I suspect it would be possible, albeit unlikely, to get stuck in a condition where that attempt never happens because no further event is signaled. That likelihood could be reduced even further by registering for and correctly handling EPOLLRDHUP events, too, even though your main interest is in writing.
If the client gives up without ever closing the connection then EPOLLRDHUP probably would not be useful, and you're more likely to get the stale connection stuck indefinitely in your EPoll. It might be worthwhile to address this possibility with a per-FD timeout.
Or may I call a blocking send() (anyway, I'm sending on a different thread) and rely on setsockopt(SO_LINGER) to guarantee that everything I hand to the kernel will really be sent to the peer?
If you have a separate thread dedicated entirely to sending on that specific file descriptor then you can certainly consider blocking send()s. The main drawback is that you cannot implement a timeout on top of that, but other than that, what would such a thread do differently than block, either on sending data or on waiting for more data to send?
I don't see quite what SO_LINGER has to do with it, though, at least on the local side. By default, the kernel will make every attempt to deliver data that you have already dispatched via a send() call to the remote peer, even if you close() the socket while data are still buffered. What SO_LINGER actually controls is the behaviour of close() when unsent data remain: whether close() blocks until the data have been transmitted (or a timeout expires), or discards them and resets the connection.
None of this can guarantee that the data are successfully delivered to the remote peer, however. Nothing can guarantee that.
And a final guess which I ask you to confirm: I'm allowed to read and write simultaneously, but N>1 concurrent writes is a data race, and all I have to deal with it is a mutex.
Sockets are full-duplex, yes. Moreover, POSIX requires most functions, including send() and recv(), to be thread safe. Nevertheless, multiple threads writing to the same socket is asking for trouble, for the thread safety of individual calls does not guarantee coherency across multiple calls.
I have a blocking SSL BIO object which I want to send data to. The problem is that the connection was closed on the remote side and I cannot find that out until I do a read (BIO_write does NOT return an error). However, I cannot read before I send since I do not want to block. Lastly, the code responsible for sending the data and the code responsible for reading are separate meaning that the failed read cannot trigger another send. How do I fix this?
There are two kinds of "close" states, referred to as "half-close" states. They mostly have to do with whether one side or the other of a socket is going to send any more application data. When your recv call returns 0, it is actually notifying you that there is no more data to be received. However, it is still okay to send data, unless the send call signals some other kind of error, like EPIPE or ECONNRESET (I am not sure what the Windows equivalents of these are for Winsock, but I know they are there). If SSL_write is not returning an error, it is because the other side of the socket is still accepting the data.
The recv call allows a non-blocking check for the "no more data" state, and it can be done like this:
char c;
int r = recv(sock, &c, 1, MSG_DONTWAIT|MSG_PEEK);
If r is 0, the socket has received an indication that there is no more data pending from the other end. Otherwise, the call will return 1 for a byte of data (which is still in the input buffer because of MSG_PEEK), or -1. If errno is EAGAIN (which is possible because of MSG_DONTWAIT) there is no error. Any other errno value should be examined, but it is likely an indication that the socket is in an invalid state and needs to be closed.
Before the socket gets closed, the OpenSSL application is supposed to make sure SSL_shutdown has returned 1. Then, the close on the socket occurs after the SSL object gets destroyed (with SSL_free). What this means is that, unless the application does something abnormal, both sides of the socket using OpenSSL should have seen SSL_shutdown return 1 and then both sides can safely close the connection.
If you want to check for the shutdown state of your SSL context, you can use SSL_get_shutdown, which will report whether or not the other end has started the SSL_shutdown sequence.
I have the following select call for tcp sockets:
ret = select(nfds + 1, &rfds, &rfds2, NULL, &tv);
rfds2 is used when I send large data (non-blocking mode). And rfds is there to detect whether we received something on the socket.
Now, when the send buffer is empty, I detect it with rfds2. But at the same time I get the socket back in rfds, although there is nothing that I received on that socket.
Is that the intended behaviour of the select call? How can I distinguish cleanly between the send and the receive case?
Now, when the send buffer is empty, I detect it with rfds2
That's not correct: select() will detect when the send buffer has room. And it is hardly ever correct to register a socket in both the read set and the write set simultaneously. Write-readiness is almost always signalled, except in the brief intervals when the send buffer is full.
Thanks for your answers. I have found the problem for myself:
The faulty code was after the select call (in how I used FD_ISSET() to determine which action to take).
I think my assumption holds: a socket only shows up in rfds when there is really some data that can be received.
If the socket is non-blocking that seems to be the expected behaviour. The manual page for select has this to say about the readfds argument:
Those listed in readfds will be watched to see if characters become available for reading (more precisely, to see if a read will not block; in particular, a file descriptor is also ready on end-of-file)
Because the socket is non-blocking it is true that a read would not block and hence it is reasonable for that bit to be set.
It shouldn't cause a problem because if you try and read from the socket you will simply get nothing returned and the read won't block.
As a rule of thumb, whenever select returns you should process each socket that it indicates is ready, either reading and processing whatever data is available if it returns as ready-to-read, or writing more data if it returns as ready-to-write. You shouldn't assume that only one event will be signalled each time it returns.
Initial questions here
So I've been reading up on asynchronous sockets, and I have a couple more questions. Mostly concrete.
1: I can use a blocking socket with select() without repercussions, correct?
2: When I use FD_SET() I'm appending the current fd_set* not changing it, correct?
3: When using FD_CLR(), I can simply pass in the socket ID of the socket I wish to remove, right?
4: When I remove a socket, using FD_CLR(), is there a preferred way of resetting the Max File Descriptor (nfds)?
5: Say I have all of my connected sockets in a vector, when select() returns, I can just iterate through that vector and check if (FD_ISSET (theVector[loopNum], &readFileSet)) to see if any data needs to be read, correct? And if this returns true, I can simply use the same receiving function I was using on my synchronous sockets to retrieve that data?
6: What happens if select() attempts to read from a closed socket? I know it returns -1, but does it set errno or is there some other way I can continue to use select()?
7: Why are you so awesome? =D
I appreciate your time, sorry for the headache, and I hope you can help!
Yes
Unclear? FD_SET inserts a socket into the set. If the socket is already there, nothing changes.
FD_CLR removes a socket from the set; if the socket isn't there, nothing changes
You could keep a parallel set<> of sockets and take the highest value from there. Or you could just set a bool meaning "rescan for nfds before the next select". (Note: on Windows, nfds is ignored.)
Correct
If select() fails, the quick fix is to iterate over your sockets and select() on each of them one by one to find the bogus one. Ideally your code should not allow select() on a socket you have closed, though; if the other end closed it, it is perfectly valid to select on.
I need to get you to talk to my wife.
So I've been reading up on asynchronous sockets
Judging by what follows I don't think you have. You appear to have been reading about non-blocking sockets. Not the same thing.
1: I can use a blocking socket with select() without repercussions, correct?
No. Consider the case where a listening socket becomes readable, indicating an impending accept(), but meanwhile the client closes the connection. If you then call accept() you will block until the next incoming connection, preventing you from servicing other sockets.
2: When I use FD_SET() I'm appending the current fd_set* not changing it, correct?
No. You are setting a bit. If it's already set, nothing changes.
3: When using FD_CLR(), I can simply pass in the socket ID of the socket I wish to remove, right?
Correct.
4: When I remove a socket, using FD_CLR(), is there a preferred way of resetting the Max File Descriptor (nfds)?
Not really, just re-scan and re-compute. But you don't really need to reset it actually.
5: Say I have all of my connected sockets in a vector, when select() returns, I can just iterate through that vector and check if (FD_ISSET (theVector[loopNum], &readFileSet)) to see if any data needs to be read, correct?
Correct, but it's more usual just to iterate through the FD set itself.
And if this returns true, I can simply use the same receiving function I was using on my synchronous sockets to retrieve that data?
On your blocking sockets, yes.
6: What happens if select() attempts to read from a closed socket?
select() doesn't attempt to read from a closed socket. It may attempt to select on a closed socket, in which case it will return -1 with errno == EBADF, as stated in the documentation.
I know it returns -1, but does it set errno or is there some other way I can continue to use select()?
See above.
I have created a named pipe with following flags:
PIPE_ACCESS_DUPLEX - both side read/write access
PIPE_TYPE_MESSAGE - Message type read
PIPE_WAIT - blocking read/write
From the server side I am calling ConnectNamedPipe and waiting for the clients to connect.
From the client side I am calling CallNamedPipe to connect to server and write data of length N.
On the server side:
After the client connects, PeekNamedPipe is called to get the length of the buffer to allocate to read the data buffer.
After getting the exact buffer size (N), I am allocating the buffer of length N and calling ReadFile to read the data from Pipe.
Problem:
The issue is that on single-processor machines the PeekNamedPipe API returns the buffer length as 0, so the later ReadFile fails.
After some investigation I found that, due to a race condition, PeekNamedPipe gets called even before the data has been put onto the pipe by the client.
Any idea how to solve this race condition? I need to call PeekNamedPipe to get the buffer size, but PeekNamedPipe cannot be called before the data is available.
I thought of introducing a custom header to carry the buffer length in the message itself, but that sounds like a lot of changes.
Is there a better and more reliable way to get the length of the data to be read from the pipe?
There are a large number of race conditions you can get with named pipes. You have to deal with them in your code. Possibilities:
ConnectNamedPipe() on the server side may return ERROR_PIPE_CONNECTED if the client managed to connect right after the CreateNamedPipe() call. Just treat it as connected.
WaitNamedPipe() on the client side does not set the error code if it times out. Assume a timeout.
CreateFile() on client side may return ERROR_PIPE_BUSY if another client managed to grab the pipe first, even after a successful WaitNamedPipe() call. Go back to WaitNamedPipe state.
FlushFileBuffers() may return ERROR_PIPE_NOT_CONNECTED if the client already saw the message and closed the pipe. Ignore that.
An overlapped ReadFile() call may complete immediately and not return ERROR_IO_PENDING. Consider the read completed.
PeekNamedPipe() may return 0 if the server has not written to the pipe yet. Sleep(1) and repeat.
It sounds like you want asynchronous I/O. Just let Windows notify you when data is available, and peek at that moment.
Having a packet size in the header is a good idea in any case; it makes the protocol less dependent on the transport layer.
Alternatively, if the client sends data and then closes the pipe, you can accumulate into a buffer with ReadFile until EOF.