Many system calls, such as close( fd ), can be interrupted by a signal. In that case the call usually returns -1 and errno is set to EINTR.
The question is what is the right thing to do? Say, I still want this fd to be closed.
What I can come up with is:
while( close( fd ) == -1 )
    if( errno != EINTR ) {
        ReportError();
        break;
    }
Can anybody suggest a better/more elegant/standard way to handle this situation?
UPDATE:
As mux noticed, the SA_RESTART flag can be used when installing the signal handler.
Can somebody tell me which functions are guaranteed to be restartable on all POSIX systems (not only Linux)?
Some system calls are restartable, which means the kernel will restart the call if it is interrupted, provided the SA_RESTART flag was used when installing the signal handler. The signal(7) man page says:
If a blocked call to one of the following interfaces is interrupted by a signal handler, then the call will be automatically restarted after the signal handler returns if the SA_RESTART flag was used; otherwise the call will fail with the error EINTR:
It doesn't mention if close() is restartable, but these are:
read(2), readv(2), write(2), writev(2), ioctl(2)
open(2)
wait(2), wait3(2), wait4(2), waitid(2), and waitpid(2)
accept(2), connect(2), recv(2), recvfrom(2), recvmsg(2), send(2), sendto(2), and sendmsg(2)
flock(2) and fcntl(2)
mq_receive(3), mq_timedreceive(3), mq_send(3), and mq_timedsend(3)
sem_wait(3) and sem_timedwait(3)
futex(2)
Note that those details, specifically the list of non-restartable calls, are Linux-specific.
I posted a related question about which system calls are restartable and whether POSIX specifies this. POSIX does specify it, but support is optional, so check the list of non-restartable calls for your OS; if a call is not on that list, it should be restartable. This is my question:
How to know if a Linux system call is restartable or not?
Update: close() is a special case. It is not restartable and should not be retried on Linux; see this answer for more details:
https://stackoverflow.com/a/14431867/1157444
Assuming you're after shorter code, you can try something like:
while (((rc = close (fd)) == -1) && (errno == EINTR));
if (rc == -1)
    complainBitterly (errno);
Assuming you're after more readable code in addition to shorter, just create a function:
int closeWithRetry (int fd);
and place your readable code in there. Then it doesn't really matter how long it is, it's still a one-liner where you call it, but you can make the function body itself very readable:
int closeWithRetry (int fd) {
    // Initial close attempt.
    int rc = close (fd);
    // As long as you failed with EINTR, keep trying.
    // Possibly with a limit (count or time-based).
    while ((rc == -1) && (errno == EINTR))
        rc = close (fd);
    // Once either success or non-retry failure, return error code.
    return rc;
}
For the record: on essentially every UNIX, close() must not be retried if it returns EINTR. DO NOT put an EINTR retry loop around close() as you would for waitpid() or read(). See http://austingroupbugs.net/view.php?id=529 for more details. On Linux, Solaris, BSD and others, retrying close() is incorrect. HP-UX is the only common(!) system I could find that requires it.
EINTR means something very different for read() and select() and waitpid() and so on than it does for close(). For most calls, you retry on EINTR because you asked for something to be done which blocks, and if you were interrupted that means it didn't happen, so you try again. For close(), the action you requested was for an entry to be removed from the fd table, which is instantaneous, without error, and will always happen no matter what close() returns.[*] The only reason close() blocks is that sometimes, for special semantics (like TCP linger), it can wait until I/O is done before returning. If close returns EINTR, that means that you asked it to wait but it couldn't. However, the fd was still closed; you just lost your chance to wait on it.
Conclusion: unless you know you can't receive signals, using close() for waiting is a very stupid thing to do. Use an application-level ACK (TCP) or an fsync (file I/O) to make sure any writes were completed before closing the fd.
[*] There is a caveat: if another thread of the process is inside a blocking syscall on the same fd, well, ... it depends.
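To make the contrast concrete, here is a minimal sketch of the two policies: retry read() on EINTR, but call close() exactly once. The names read_retry and close_once are invented for this example, and treating EINTR from close() as "already closed" assumes the Linux/BSD/Solaris behavior described above, not HP-UX.

```c
#include <errno.h>
#include <unistd.h>

/* An interrupted read() did nothing, so repeating it is safe. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);
    return n;
}

/* Close exactly once. Assumption: the descriptor is released even when
 * close() reports EINTR, so EINTR is treated as "closed", never retried. */
int close_once(int fd)
{
    if (close(fd) == -1 && errno != EINTR)
        return -1;   /* real failure such as EBADF or EIO; errno is preserved */
    return 0;
}
```

If the data must reach its destination, do the fsync() or application-level ACK before calling close_once(), as the conclusion above suggests.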
I am using Linux epoll in edge-triggered mode.
Each time a new connection comes in, I add the file descriptor to epoll with the EPOLLIN|EPOLLOUT|EPOLLET flags. My first question is: what is the right way to check which event(s) occurred for each ready file descriptor after epoll_wait returns? Some example code, e.g. line 124 of https://github.com/yedf/handy/blob/master/raw-examples/epoll-et.cc, does it like this:
for (int i = 0; i < n; i++) {
    //...
    if (events & (EPOLLIN | EPOLLERR)) {
        if (fd == lfd) {
            handleAccept(efd, fd);
        } else {
            handleRead(efd, fd);
        }
    } else if (events & EPOLLOUT) {
        if (output_log)
            printf("handling epollout\n");
        handleWrite(efd, fd);
    } else {
        exit_if(1, "unknown event");
    }
}
What caught my attention is that it uses an if / else if / else chain to check which event occurred, which means that if it calls handleRead, it cannot call handleWrite in the same pass. I think this can lose an event in the following case: both the read and the write on the socket have hit EAGAIN, and then the remote end both reads and sends some data, so epoll_wait may set both EPOLLIN and EPOLLOUT. Only handleRead runs, and the data remaining in the output buffer cannot be sent because handleWrite is never called.
So is the above usage wrong?
According to the Q&A section of man 7 epoll:
If more than one event occurs between epoll_wait(2) calls, are they combined or reported separately?
They will be combined.
If I got it right, several events can occur on a single file descriptor between epoll_wait calls. So I think I should use a series of independent if statements to check one by one whether readable, writable, or error events occurred, instead of if / else if. I looked at how the nginx epoll module does it; at https://github.com/nginx/nginx/blob/953f53921505a884f3912f2d8db5217a71c0479a/src/event/modules/ngx_epoll_module.c#L867 I see the following code:
if (revents & (EPOLLERR|EPOLLHUP)) {
    //...
}
if ((revents & EPOLLIN) && rev->active) {
    //....
    rev->handler(rev);
}
if ((revents & EPOLLOUT) && wev->active) {
    //....
    wev->handler(wev);
}
This matches my idea of checking the error flags (EPOLLERR, EPOLLHUP), EPOLLIN, and EPOLLOUT one after another.
So I did the same thing as nginx in my application. But what I found by experiment is this: if I add the file descriptor to epoll with the EPOLLIN|EPOLLOUT|EPOLLET flags and never fill up the output buffer, I still get the EPOLLOUT flag set whenever epoll_wait returns because data arrived and the fd became readable. As a result a redundant write_handler call is made, which is not what I expect.
I did some searching and found that this behavior is real and not caused by a bug in my application. The top-voted answer to the question "epoll with edge triggered event" says:
On a somewhat related note: if you register for EPOLLIN and EPOLLOUT events and assuming you never fill up the send buffer, you still get the EPOLLOUT flag set in the event returned by epoll_wait each time EPOLLIN is triggered - see https://lkml.org/lkml/2011/11/17/234 for a more detailed explanation.
And the link in this answer says:
It's doesn't mean there's an EPOLLOUT "event", it just means a message
is triggered (by the socket becoming readable) so you get a status
update. In theory the program doesn't need to be told about EPOLLOUT
here (it should be assuming the socket is writable already), but it
doesn't do any harm.
So far, what I understand about epoll edge-triggered mode is:
epoll_wait returns when the state of any monitored fd has changed, e.g. from nothing to read -> readable, or from buffer full -> buffer writable.
epoll_wait may return one or several event flags for each fd in the ready list.
The flags in the struct epoll_event.events field indicate the current state of the fd. Even if we never fill up the output buffer, the EPOLLOUT flag is set when epoll_wait returns because the fd became readable, since the current state of the fd is simply writable.
Please correct me if I am wrong.
Then my question is: should I maintain a flag in each connection that records whether EAGAIN occurred when writing to the output buffer and, if it is not set, skip write_handler/handleWrite in the "if (events & EPOLLOUT)" branch, so that my upper-layer program is not told about EPOLLOUT here?
What a great question (since I had pretty much the same question)! I'll just summarize what I think I know now with respect to your informative question/description and your helpful links, and hopefully smarter folk will correct any mistakes.
Yes, the if/else handling of event flags is definitely bogus. For sure at least two events can arrive at effectively the same time. E.g., both the read and write sides might have become unblocked since you last called epoll_wait(). And, of course, as soon as you accept() the connection, both reading and writing suddenly become possible, so you get an "event" of EPOLLIN|EPOLLOUT.
I really didn't grok that epoll_wait() is always delivering the entire current state, rather than only the parts of the state that changed -- thanks for clearing that up. To be perhaps clearer, epoll_wait() won't return an fd unless something changed on that socket, but if something did change, it returns all the flags representing the current state. So, I found myself staring at a stream of EPOLLIN|EPOLLOUT events wondering why it was claiming there was an "output" event, even though I hadn't written anything yet. Your answer being correct: it's just telling me the output side is still writeable.
"Should I maintain a flag..." Yes, but I would imagine that in all but the most trivial situations you were probably going to end up maintaining at least one bit of "am I currently blocked" state for your readers/writers anyway. For example, if you ever want to process data in an order different than how it arrives (e.g., prioritize responses over requests to make your server more resistant to overload) you instantly have to give up the simplicity of just having the arrival of I/O drive everything. In the particular case of writing, epoll simply doesn't have enough information to notify you at the "right" time. As soon as you accept a connection, there's an event that says "you can write now"--but you probably have nothing to write if you're a server who couldn't possibly have already gotten a request from the client. epoll just can't know whether you have something to write or not, so you were always going to have to either suffer essentially "extraneous" events, or maintain your own state.
In all but the simplest cases, the socket file descriptor ends up being insufficient information for handling I/O events, so you invariably have to associate some data structure with it, or object if you prefer. So, my C++ looks something like:
nAwake = epoll_wait(epollFd, events, 100, milliseconds);
if(nAwake < 0)
{
    perror("epoll_wait failed");
    assert(false);
}
for(int iSocket=0; iSocket < nAwake; ++iSocket)
{
    auto This = static_cast<Eventable*>(events[iSocket].data.ptr);
    auto eventFlags = events[iSocket].events;
    fprintf(stderr, "%s event on socket [%d] -> %s\n",
            This->ClassName(), This->fd, DumpEvent(eventFlags));
    This->Event(eventFlags);
}
Where Eventable is a C++ class (or derivative thereof) that has all the state needed to decide how to handle the flags epoll delivers. (Of course, this is letting the kernel store a pointer to a C++ object, requiring a design that is very clear about pointer ownership/lifetimes.)
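The per-connection "am I currently blocked" bit discussed above can be quite small. Below is a rough sketch in plain C of how it might gate the EPOLLOUT branch. struct conn, want_write, the DID_* values, and dispatch_events are all invented for this example, and the function returns a bitmask of what it would do instead of calling real handlers.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical per-connection state; not from any library. */
struct conn {
    bool want_write;   /* set when a write hit EAGAIN with data still queued */
};

enum { DID_ERROR = 1, DID_READ = 2, DID_WRITE = 4 };

/* Decide which handlers to run for one event mask from epoll_wait().
 * Each flag is tested independently, never in an else-if chain. */
int dispatch_events(struct conn *c, uint32_t events)
{
    int actions = 0;

    if (events & (EPOLLERR | EPOLLHUP))
        actions |= DID_ERROR;
    if (events & EPOLLIN)
        actions |= DID_READ;
    /* EPOLLOUT only reports current state; act on it only if we were blocked. */
    if ((events & EPOLLOUT) && c->want_write) {
        c->want_write = false;   /* the write path sets it again on EAGAIN */
        actions |= DID_WRITE;
    }
    return actions;
}
```

With this gate, the EPOLLIN|EPOLLOUT status updates that arrive on every readable edge no longer reach the write handler unless a previous write actually stalled.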
And since you're writing low-level code on Linux, you may also care about EPOLLRDHUP. This not-highly-portable flag lets you save one call to read(). If the client (curl seems pretty good at evoking this behavior) closes its write side of the connection (sends a FIN), you normally discover that when epoll tells you EPOLLIN, but read() returns zero bytes. However, Linux maintains an extra bit to indicate your client's write side (your read side) has been closed. So, if you tell epoll you want the EPOLLRDHUP event you can use it to avoid doing a read() whose sole purpose will turn out to be telling you the writer closed their side.
Note that EPOLLIN will still be turned on whenever EPOLLRDHUP is, AFAIK. Even after you do a shutdown(fd, SHUT_RD). Another example of how you will usually be driven to maintain your own idea of the state of the connection. You care more about clients who are kind enough to do half-shutdowns if you are implementing HTTP.
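Here is a small sketch of asking for EPOLLRDHUP and checking it, assuming Linux. The three helper names are made up for this example; epoll_ctl(2), epoll_wait(2), and shutdown(2) are the real calls.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>

/* Register fd for input plus the Linux-specific peer-half-close flag. */
int watch_for_half_close(int epfd, int fd)
{
    struct epoll_event ev = { 0 };
    ev.events = EPOLLIN | EPOLLRDHUP;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* What a client does when it has finished sending: close only its write side. */
int half_close_write(int fd)
{
    return shutdown(fd, SHUT_WR);
}

/* Wait for one event. Returns 1 if the peer half-closed, 0 if not,
 * and -1 on error or timeout. */
int peer_half_closed(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    if (n <= 0)
        return -1;
    return (ev.events & EPOLLRDHUP) != 0;
}
```

One caveat with this sketch: EPOLLRDHUP can arrive while unread data is still buffered, so a real server would drain the socket before treating the stream as finished.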
When used as an edge-triggered interface, for performance reasons, it is possible to add the file descriptor inside the epoll interface (EPOLL_CTL_ADD) once by specifying (EPOLLIN|EPOLLOUT). This allows you to avoid continuously switching between EPOLLIN and EPOLLOUT calling epoll_ctl(2) with EPOLL_CTL_MOD.
I have a C++ application that includes this function:
int
mySelect(const int fdMaxPlus1,
         fd_set *readFDset,
         fd_set *writeFDset,
         struct timeval *timeout)
{
retry:
    const int selectReturn
        = ::select(fdMaxPlus1, readFDset, writeFDset, NULL, timeout);
    if (selectReturn < 0 && EINTR == errno) {
        // Interrupted system call, such as for profiling signal, try again.
        goto retry;
    }
    return selectReturn;
}
Normally this code works just fine. In one instance, however, I saw it get into an infinite loop where select() kept failing with errno EINTR. In this case, the caller had set the timeout to zero seconds and zero microseconds, meaning don't wait and return the select() result immediately. I thought that EINTR only occurs when a signal handler has run, so why would I keep getting signals over and over again (for over 12 hours)? This is CentOS 5. Once I ran this in the debugger to see what was happening, the code returned without EINTR after a couple of iterations. Note that the fd being checked is a socket.
I could add a retry limit to the above code, but I'd like to understand what is going on first.
On Linux, select(2) may modify the timeout argument (passed by address), so you should make a fresh copy before each call and pass the copy.
retry:
    // Copy the caller's timeout; select() may overwrite what it is given.
    struct timeval timeoutcopy = *timeout;
    const int selectReturn
        = ::select(fdMaxPlus1, readFDset, writeFDset, NULL, &timeoutcopy);
(in your code, your timeout is probably zero or very small after a few or even the first iterations)
BTW, I suggest using poll(2) rather than select (since poll is more C10K-problem friendly).
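BTW, EINTR does not require a handler that you installed yourself: on Linux, some blocking calls can fail with EINTR after the process is stopped by a stop signal and resumed with SIGCONT, even with no signal handler registered (see signal(7)).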
You might use strace to understand the overall behavior of your program.
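Putting the copy-the-timeout advice together with the retry limit the question mentions, a wrapper might look roughly like this in plain C. select_retry and max_retries are names invented for this sketch. Note that each retry waits for the full timeout again, which is a simplification; code that needs a hard deadline would recompute the remaining time instead.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* select() wrapper that retries on EINTR, hands the kernel a fresh copy of
 * the timeout on every attempt, and gives up after max_retries interruptions. */
int select_retry(int nfds, fd_set *readfds, fd_set *writefds,
                 const struct timeval *timeout, int max_retries)
{
    for (int attempt = 0; ; ++attempt) {
        struct timeval tv;
        struct timeval *tvp = NULL;   /* NULL means block indefinitely */
        if (timeout != NULL) {
            tv = *timeout;            /* select() may overwrite its argument */
            tvp = &tv;
        }
        int rc = select(nfds, readfds, writefds, NULL, tvp);
        if (rc >= 0 || errno != EINTR || attempt >= max_retries)
            return rc;                /* success, real error, or out of retries */
    }
}
```

Because the caller's struct timeval is taken by const pointer, it is left untouched and can be reused for the next call.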
I have a thread that listens for new connections
new_fd = accept(Listen_fd, (struct sockaddr *) & their_addr, &sin_size);
and another thread that closes Listen_fd when it's time to close the program. After Listen_fd is closed, however, accept() still blocks. When I use GDB to try to debug this, accept() doesn't block. I thought it could be a problem with SO_LINGER, but that shouldn't be on by default and shouldn't change when using GDB. Any idea what's going on, or any other suggestion for closing the listening socket?
Use: sock.shutdown (socket.SHUT_RD)
Then accept will return EINVAL. No ugly cross thread signals required!
From the Python documentation:
"Note close() releases the resource associated with a connection but does not necessarily close the connection immediately. If you want to close the connection in a timely fashion, call shutdown() before close()."
http://docs.python.org/3/library/socket.html#socket.socket.close
I ran into this problem years ago, while programming in C. But I only found the solution today, after running into the same problem in Python, AND pondering using signals (yuck!), AND THEN remembering the note about shutdown!
As for the comments that say you should not close/use sockets across threads... in CPython the global interpreter lock should protect you (assuming you are using file objects rather than raw, integer file descriptors).
Here is example code:
import socket, threading, time
sock = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt (socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind (('', 8000))
sock.listen (5)
def child ():
    print ('child accept ...')
    try: sock.accept ()
    except OSError as exc : print ('child exception %s' % exc)
    print ('child exit')
threading.Thread ( target = child ).start ()
time.sleep (1)
print ('main shutdown')
sock.shutdown (socket.SHUT_RD)
time.sleep (1)
print ('main close')
sock.close ()
time.sleep (1)
print ('main exit')
The behavior of accept when called on something which is not a valid socket FD is undefined. "Not a valid socket FD" includes numbers which were once valid sockets but have since been closed. You might say "but Borealid, it's supposed to return EINVAL!", but that's not guaranteed - for instance, the same FD number might be reassigned to a different socket between your close and accept calls.
So, even if you were to isolate and correct whatever makes your program fail, you could still begin to fail again in the future. Don't do it - correct the error that causes you to attempt to accept a connection on a closed socket.
If you meant that a call which was previously made to accept continues blocking after close, then what you should do is send a signal to the thread which is blocked in accept. This will give it EINTR and it can cleanly disengage - and then close the socket. Don't close it from a thread other than the one using it.
The shutdown() function may be what you are looking for. Calling shutdown(Listen_fd, SHUT_RDWR) will cause any blocked call to accept() to return EINVAL. Coupling a call to shutdown() with the use of an atomic flag can help to determine the reason for the EINVAL.
For example, if you have this flag:
std::atomic<bool> safe_shutdown(false);
Then you can instruct the other thread to stop listening via:
shutdown_handler([&]() {
    safe_shutdown = true;
    shutdown(Listen_fd, SHUT_RDWR);
});
For completeness, here's how your thread could call accept:
while (true) {
    sockaddr_in clientAddr = {0};
    socklen_t clientAddrSize = sizeof(clientAddr);
    int connSd = accept(Listen_fd, (sockaddr *)&clientAddr, &clientAddrSize);
    if (connSd < 0) {
        // If shutdown_handler() was called, then exit gracefully
        if (errno == EINVAL && safe_shutdown)
            break;
        // Otherwise, it's an unrecoverable error
        std::terminate();
    }
    char clientname[1024];
    std::cout << "Connected to "
              << inet_ntop(AF_INET, &clientAddr.sin_addr, clientname,
                           sizeof(clientname))
              << std::endl;
    service_connection(connSd);
}
It's a workaround, but you could select on Listen_fd with a timeout, and when the timeout occurs, check whether the program is about to close. If so, exit the loop; if not, go back and do the next select.
Are you checking the return value of close?
From the Linux man pages (http://www.kernel.org/doc/man-pages/online/pages/man2/close.2.html):
"It is probably unwise to close file descriptors while they may be in use by system calls in other threads in the same process. Since a file descriptor may be reused, there are some obscure race conditions that may cause unintended side effects".
You can use select instead of blocking in accept, wait for some event from the other thread, and then close the socket in the listener thread.
Since it seems that I can't find a solution to my original problem, I tried to do a little workaround. I'm simply trying to set a timeout to the connect() call of my TCP Socket.
I want the connect() to be blocking but not until the usual 75 seconds timeout, I want to define my own.
I have already tried select(), which worked for the timeout, but I couldn't get a connection (that was my initial problem, as described here).
So now I found another way to deal with it: just do a blocking connect() call but interrupt it with an alarm, like this:
signal(SIGALRM, connect_alarm);
int secs = 5;
alarm(secs);
if (connect(m_Socket, (struct sockaddr *)&addr, sizeof(addr)) < 0 )
{
    if ( errno == EINTR )
    {
        debug_printf("Timeout");
        m_connectionStatus = STATUS_CLOSED;
        return ERR_TIMEOUT;
    }
    else
    {
        debug_printf("Other Err");
        m_connectionStatus = STATUS_CLOSED;
        return ERR_NET_SOCKET;
    }
}
with
static void connect_alarm(int signo)
{
    debug_printf("SignalHandler");
    return;
}
This is the solution I found on the Internet, in a thread here on Stack Overflow. With this code the program starts the timer and then enters the connect() call. After 5 seconds the signal handler fires (as seen on the console via the printf()), but the program still stays inside connect() for 75 seconds. Every description says that connect_alarm() should interrupt connect(), but in my case it doesn't. Is there any way to get the desired result?
signal is a massively under-specified interface and should be avoided in new code. On some versions of Linux, I believe it provides "BSD semantics", which means (among other things) that it sets SA_RESTART by default.
Use sigaction instead, do not specify SA_RESTART, and you should be good to go.
...
Well, except for the general fragility and unavoidable race conditions, that is. connect will return EINTR for any signal that interrupts it, not just SIGALRM. More troublesome, if the system happens to be under heavy load, it could take more than 5 seconds between the call to alarm and the call to connect, in which case you will miss the signal and block in connect forever.
Your earlier attempt, using non-blocking sockets with connect and select, was a much better idea. I would suggest debugging that.
While it's relatively easy to set up the alarm(2) (less the pain of signal handling and system call interruptions), the more efficient way of timing out TCP connection attempts is the non-blocking connect, which also allows you to initiate multiple connections and wait on all of them, handling successes and failures one at a time.
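A non-blocking connect with a timeout could be sketched like this. connect_with_timeout is a name invented for the example; it uses poll(2) to wait and SO_ERROR to fetch the outcome, and it leaves the socket in non-blocking mode, which the caller may want to undo.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect with a timeout. Returns 0 on success, -1 on failure with errno set
 * (ETIMEDOUT if timeout_ms elapsed first). */
int connect_with_timeout(int fd, const struct sockaddr *addr, socklen_t len,
                         int timeout_ms)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    if (connect(fd, addr, len) == 0)
        return 0;                        /* connected immediately */
    if (errno != EINPROGRESS)
        return -1;

    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);  /* simplification: EINTR restarts the full wait */
    } while (rc == -1 && errno == EINTR);
    if (rc == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    if (rc < 0)
        return -1;

    /* Writable does not mean connected: ask the socket for the real result. */
    int err = 0;
    socklen_t errlen = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) == -1)
        return -1;
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```

A common mistake with this pattern is to skip the SO_ERROR check: the socket also becomes writable when the connection attempt fails, so readiness alone says nothing about success.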
The man pages for select() do not list EAGAIN as possible error code for the select() function.
Can anyone explain in which cases the select() can produce EAGAIN error?
If I understand the select_tut man page correctly, EAGAIN can be produced by sending a signal to a process that is blocked in select(). Is this correct?
Since I am using select() in blocking mode with timeout, like this:
bool selectRepeat = true;
int res = 0;
timeval selectTimeout( timeout );
while ( true == selectRepeat )
{
    res = ::select( fd.Get() + 1,
                    NULL,
                    &writeFdSet,
                    NULL,
                    &selectTimeout );
    selectRepeat = ( ( -1 == res ) && ( EINTR == errno ) );
}
should I just repeat the loop when the error number is EAGAIN?
select() will not return EAGAIN under any circumstance.
It may, however, return EINTR if interrupted by a signal (This applies to most system calls).
EAGAIN (or EWOULDBLOCK) may be returned from read, write, recv, send, etc.
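To show where EAGAIN actually appears, here is a small sketch of reading from a non-blocking descriptor. set_nonblocking and read_available are invented names, and the -2 return value is just this example's way of saying "nothing available yet".

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Read whatever is available right now. Returns the byte count (0 at end of
 * file), -1 on a real error, or -2 when no data is available yet. */
ssize_t read_available(int fd, void *buf, size_t len)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;                    /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return -2;                   /* not a failure: wait for readiness */
        return -1;
    }
}
```

When read_available() returns -2, the usual next step is to go back to select(), poll(), or epoll_wait() and wait until the descriptor is readable, rather than spinning on read().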
EAGAIN is technically not an error, but an indication that the operation terminated without completing, and you should...er...try it again. You will probably want to write logic to retry, but not infinitely. If that was safe, they would have done it themselves in the API.
If you are thinking that returning a silly non-error error code like that is kinda bad client interface design, you aren't the first. It turns out EAGAIN as an error code has a long, interesting history in Unix. Among other things, it spawned the widely circulated essay on software design The Rise of Worse-is-Better. There are a couple of paragraphs in the middle that explain why Unix needs to return this sometimes. Yes, it does indeed have something to do with receiving interrupts during an I/O. They call it PC loser-ing.
Many credit this essay as one of the inspirations for Agile programming.