pselect blocks even though data is available for read on socket - c++

I'm experiencing an intermittent delay when reading from a POSIX socket (RHEL6 x86_64 C++ icpc). My code is designed such that a user can provide an absolute timespec deadline (vs. a relative timeout) to be used across multiple calls to recv. I call pselect to make sure that data is available for reading before attempting to call recv.
This typically works as expected: it waits for data without exceeding the deadline and adds no noticeable delay when data is already available. However, one user can periodically (~50% of the time) get his application into a state where pselect blocks for ~400-500 ms even though data is available on the socket. If I watch /proc/net/tcp, I can see data sitting in the RX queue and the application slowly reading it off the queue. If I skip the call to pselect and just call recv, the behavior is similar (but with less delay overall, which suggests recv is also blocking unnecessarily). Once the application gets into this state, it stays that way and sees a consistent delay on every pselect/recv.
I spent several hours poking around here and on other sites. This is the closest similar issue I could find, but there was no resolution...
http://developerweb.net/viewtopic.php?id=7458
Has anyone run into this sort of behavior before? I'm at a loss for what to do. I've instrumented the code to validate that this is where the delay is happening. (Edit: We actually just validated that the entire method below was slow, not any particular system call.) It seems like a kernel/OS issue but I'm not sure where to look. Here's the code...
// protected
bool
Message::wait(int socket, const timespec & deadline) {
    // Bail if deadline not provided
    if (deadline.tv_sec == 0 && deadline.tv_nsec == 0) {
        return true;
    }
    // Make sure we haven't already exceeded deadline
    timespec currentTime;
    clock_gettime(CLOCK_REALTIME, &currentTime);
    if (VirtualClock::cmptime(currentTime, deadline) >= 0) {
        LOG_WARNING("Timed out waiting to receive data");
        m_timedOut = true;
        return false;
    }
    // Calculate receive timeout
    timespec timeout;
    memset(&timeout, 0, sizeof(timeout));
    timeout.tv_nsec = VirtualClock::nsecs(currentTime, deadline);
    VirtualClock::fixtime(timeout);
    // Wait for data
    fd_set descSet;
    FD_ZERO(&descSet);
    FD_SET(socket, &descSet);
    int result = pselect(socket + 1, &descSet, NULL, NULL, &timeout, NULL);
    if (result == -1) {
        m_error = errno;
        LOG_ERROR("Failed to wait for data: %d, %s",
                  m_error, strerror(m_error));
        return false;
    } else if (result == 0 || !FD_ISSET(socket, &descSet)) {
        LOG_WARNING("Timed out waiting to receive data");
        m_timedOut = true;
        return false;
    }
    return true;
}
VirtualClock is a time-related utility class just used here to compare/fix-up timespecs (i.e. not introducing any delays). I'd appreciate any insight on this behavior.
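Since VirtualClock is not shown, here is a rough sketch of what the deadline-to-timeout step might look like on its own. The helper name remainingUntil is hypothetical and is not part of the code above; it only illustrates turning an absolute deadline into the relative, normalized timespec that pselect expects.

```cpp
#include <time.h>

// Hypothetical helper (not the VirtualClock class from the post):
// returns the time left until `deadline`, normalized so that
// 0 <= tv_nsec < 1e9, and clamped to zero if the deadline has passed.
timespec remainingUntil(const timespec& now, const timespec& deadline) {
    timespec out{};
    out.tv_sec = deadline.tv_sec - now.tv_sec;
    out.tv_nsec = deadline.tv_nsec - now.tv_nsec;
    if (out.tv_nsec < 0) {          // borrow one second
        out.tv_sec -= 1;
        out.tv_nsec += 1000000000L;
    }
    if (out.tv_sec < 0) {           // deadline already passed
        out.tv_sec = 0;
        out.tv_nsec = 0;
    }
    return out;
}
```

The result could be passed straight to pselect as the timeout argument, assuming both timespecs come from the same clock.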

This was in fact not a problem with any system call. We used strace to diagnose and were seeing tons of calls to clock_gettime. Another (third) review of the calling code revealed a programming error resulting in the called code having a reference to corrupt stack data. This was facilitated by a flawed API design on my part resulting in corruption of the deadline.
I was allowing the user to pass in a reference to a ServerConfig class containing configuration (including data related to the deadline). My Server class was saving the reference instead of copying the object. The user created an instance of my Server class on the heap and passed in a reference to a ServerConfig that lived on the stack (inside a method). When that method exited and the ServerConfig went out of scope, the Server was left with a dangling reference and read non-deterministic garbage as its configuration. This is older code, and I've since prevented this sort of thing from happening in other places after being burned, but this one slipped through.
So lessons learned for me are: be careful with writing APIs that hang on to user-provided references, rethink premature optimization (the whole reason I was hanging onto a reference instead of just doing a copy), and look for stack corruption when you see non-deterministic behavior like this (something that I check for when I suspect builds are jacked up but didn't suspect this time). Also, strace is a great tool...I've seen others use it but now I'm comfortable using it myself.
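For anyone landing here with the same symptom, the ownership fix described above can be sketched roughly as follows. The real ServerConfig and Server are not shown in this post, so the members and the makeServer function below are hypothetical stand-ins that only mirror the failing pattern (stack config, heap server).

```cpp
#include <memory>
#include <time.h>

// Hypothetical stand-in for the real configuration class.
struct ServerConfig {
    timespec deadline{};
};

class Server {
public:
    // Copy the config so the Server never depends on the lifetime
    // of the caller's object.
    explicit Server(const ServerConfig& config) : m_config(config) {}

    const ServerConfig& config() const { return m_config; }

private:
    ServerConfig m_config;  // stored by value, not as `const ServerConfig&`
};

// Mirrors the problematic call site: the config lives on this function's
// stack, while the Server outlives it on the heap.
std::unique_ptr<Server> makeServer(long deadlineSec) {
    ServerConfig config;
    config.deadline.tv_sec = deadlineSec;
    return std::make_unique<Server>(config);
}
```

With a reference member instead of a value member, reading the config after makeServer returns would be undefined behavior, which matches the "non-deterministic garbage" described above.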
Thanks for the comments and sorry for the false alarm.

Related

errorfds vs select return value, and select() returning immediately?

I came to maintain a piece of software that does:
/*... init, setting timeout, etc ... */
FD_ZERO(&set);
FD_SET(socket_, &set);
int selectRes = select(socket_ + 1, &set, NULL, NULL, &timeout);
if (selectRes < 0) {
throw IoException("Select: ", errno);
}
if (selectRes == 0) {
throw TimeoutException();
}
/* ... then handle recvfrom, throw IoException if return < 0 ... */
IoException should cause the program to terminate. Timeout exception resumes operation. Passing without exception loops back. socket_ is a UNIX datagram socket (reading messages from another local process).
This program runs at very high priority (required to react to messages quickly) but it's expected to be idle most of the time, hanging on select()'s timeout waiting for incoming messages. Meanwhile, it seems like it sometimes hogs 100% of CPU time (without receiving enough messages to warrant such behavior). The occurrence is rather erratic, and the program's high priority makes debugging very hard (it runs on a small single-core embedded Linux system, so everything else grinds to a halt).
I'm worried about the NULL in the errorfds position - is testing the return value of select() enough in this case, or may select() return immediately (with 0) if there's an error condition on the socket but errorfds is NULL, and keep repeating doing so every time it loops back to select()?
Or alternatively, what other circumstances, other than an avalanche of messages, could make select() exit immediately (or maybe wait in a spinlock instead of freeing up the CPU time)?

epoll_wait return EPOLLOUT even with EPOLLET flag

I am using linux epoll in edge trigger mode.
Each time a new connection is incoming, I add the file descriptor to epoll with EPOLLIN|EPOLLOUT|EPOLLET flag. My first question is: What's the right way to check which kind of event(s) occur for each ready file descriptor after the epoll_wait returns? I mean, I see some example code e.g from https://github.com/yedf/handy/blob/master/raw-examples/epoll-et.cc line 124 do it like this:
for (int i = 0; i < n; i++) {
//...
if (events & (EPOLLIN | EPOLLERR)) {
if (fd == lfd) {
handleAccept(efd, fd);
} else {
handleRead(efd, fd);
}
} else if (events & EPOLLOUT) {
if (output_log)
printf("handling epollout\n");
handleWrite(efd, fd);
} else {
exit_if(1, "unknown event");
}
}
What caught my attention is that it uses "if / else if / else" to check which event occurred, which means that if it calls handleRead, it can't call handleWrite in the same iteration. I think this may lose an event in the following situation: both the read and the write on a socket have hit EAGAIN, and then the remote end both reads and sends some data. epoll_wait may then set both EPOLLIN and EPOLLOUT, but the code only calls handleRead, and the data remaining in the output buffer can't be sent because handleWrite is never called.
So is the above usage wrong?
According to the Q&A section of man 7 epoll:
If more than one event occurs between epoll_wait(2) calls, are
they combined or reported separately?
They will be combined.
If I got it right, several events can occur on a single file descriptor between epoll_wait calls. So I think I should use a series of independent "if" checks to test one by one whether readable/writable/error events occurred, instead of using "if / else if". I went to see how the nginx epoll module does it. From https://github.com/nginx/nginx/blob/953f53921505a884f3912f2d8db5217a71c0479a/src/event/modules/ngx_epoll_module.c#L867 I see the following code:
if (revents & (EPOLLERR|EPOLLHUP)) {
//...
}
if ((revents & EPOLLIN) && rev->active) {
//....
rev->handler(rev);
}
if ((revents & EPOLLOUT) && wev->active) {
//....
wev->handler(wev);
}
It seems to adhere to my thoughts of checking all EPOLLERR..,EPOLLIN,EPOLLOUT events one after another.
Then I did the same kind of thing as nginx does in my application. But what I realized after experimenting is this: if I add the file descriptor to epoll with the EPOLLIN|EPOLLOUT|EPOLLET flags and I never fill up the output buffer, I always get the EPOLLOUT flag set when epoll_wait returns because some data arrived and the fd became readable. As a result a redundant write_handler call is made, which is not what I expect.
I did some searching and found that this situation really exists and is not caused by a bug in my application. The top-voted answer to "epoll with edge triggered event" says:
On a somewhat related note: if you register for EPOLLIN and EPOLLOUT events and assuming you never fill up the send buffer, you still get the EPOLLOUT flag set in the event returned by epoll_wait each time EPOLLIN is triggered - see https://lkml.org/lkml/2011/11/17/234 for a more detailed explanation.
And the link in this answer says:
It's doesn't mean there's an EPOLLOUT "event", it just means a message
is triggered (by the socket becoming readable) so you get a status
update. In theory the program doesn't need to be told about EPOLLOUT
here (it should be assuming the socket is writable already), but it
doesn't do any harm.
So far, what I understand about epoll edge-triggered mode is:
epoll_wait returns when the state of any monitored fd has changed, e.g. from nothing to read -> readable, or from buffer full -> buffer writable.
epoll_wait may return one or several event flags for each fd in the ready list.
The flags in the struct epoll_event.events field indicate the current state of the fd. Even if we never fill up the output buffer, the EPOLLOUT flag is set when epoll_wait returns because of readability, since the current state of the fd is simply writable.
Please correct me if I am wrong.
Then my question would be: Should I maintain a flag in each connection to indicate whether EAGAIN occurs when write to output buffer, if it is not set, don't call write_handler/handleWrite in "if (events & EPOLLOUT)" branch, so that my upper layer program would not be told about EPOLLOUT here?
What a great question (since I had pretty much the same question)! I'll just summarize what I think I know now with respect to your informative question/description and your helpful links, and hopefully smarter folk will correct any mistakes.
Yes, the if/else handling of event flags is definitely bogus. For sure at least two events can arrive at effectively the same time. E.g., both the read and write sides might have become unblocked since last you called epoll_wait(). And, of course, as soon as you accept() the connection, both reading and writing suddenly become possible, so you get an "event" of EPOLLIN|EPOLLOUT.
I really didn't grok that epoll_wait() is always delivering the entire current state, rather than only the parts of the state that changed -- thanks for clearing that up. To be perhaps clearer, epoll_wait() won't return an fd unless something changed on that socket, but if something did change, it returns all the flags representing the current state. So, I found myself staring at a stream of EPOLLIN|EPOLLOUT events wondering why it was claiming there was an "output" event, even though I hadn't written anything yet. Your answer being correct: it's just telling me the output side is still writeable.
"Should I maintain a flag..." Yes, but I would imagine that in all but the most trivial situations you were probably going to end up maintaining at least one bit of "am I currently blocked" state for your readers/writers anyway. For example, if you ever want to process data in an order different than how it arrives (e.g., prioritize responses over requests to make your server more resistant to overload) you instantly have to give up the simplicity of just having the arrival of I/O drive everything. In the particular case of writing, epoll simply doesn't have enough information to notify you at the "right" time. As soon as you accept a connection, there's an event that says "you can write now"--but you probably have nothing to write if you're a server who couldn't possibly have already gotten a request from the client. epoll just can't know whether you have something to write or not, so you were always going to have to either suffer essentially "extraneous" events, or maintain your own state.
In all but the simplest cases, the socket file descriptor ends up being insufficient information for handling I/O events, so you invariably have to associate some data structure with it, or object if you prefer. So, my C++ looks something like:
nAwake = epoll_wait(epollFd, events, 100, milliseconds);
if(nAwake < 0)
{
perror("epoll_wait failed");
assert(false);
}
for(int iSocket=0; iSocket < nAwake; ++iSocket)
{
auto This = static_cast<Eventable*>(events[iSocket].data.ptr);
auto eventFlags = events[iSocket].events;
fprintf(stderr, "%s event on socket [%d] -> %s\n",
This->ClassName(), This->fd, DumpEvent(eventFlags));
This->Event(eventFlags);
}
Where Eventable is a C++ class (or derivative thereof) that has all the state needed to decide how to handle the flags epoll delivers. (Of course, this is letting the kernel store a pointer to a C++ object, requiring a design that is very clear about pointer ownership/lifetimes.)
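To make the "maintain your own state" point concrete, here is a rough sketch of what an Event()-style handler might look like. It is not my actual Eventable class: the Connection struct, its members, and the placeholder handlers are all hypothetical. It checks each flag independently (as nginx does) and only reacts to EPOLLOUT when a previous write actually hit EAGAIN.

```cpp
#include <sys/epoll.h>
#include <cstdint>

// Hypothetical per-connection state; names are illustrative only.
struct Connection {
    bool writeBlocked = false;  // set by the write path when it hits EAGAIN
    bool closed = false;
    int reads = 0;              // counters stand in for real read/write work
    int writes = 0;

    // Placeholder: real code would read() until EAGAIN here.
    void onReadable() { ++reads; }

    // Placeholder: real code would retry the pending write here and
    // leave writeBlocked set if it hits EAGAIN again.
    void onWritable() { ++writes; writeBlocked = false; }

    void dispatch(uint32_t events) {
        if (events & (EPOLLERR | EPOLLHUP)) {
            closed = true;
            return;
        }
        // Independent checks: both flags can be set in one wakeup.
        if (events & EPOLLIN) {
            onReadable();
        }
        // EPOLLOUT only reports "currently writable"; ignore it unless
        // we are actually waiting to write.
        if ((events & EPOLLOUT) && writeBlocked) {
            onWritable();
        }
    }
};
```

The design choice is simply that epoll reports state, and the connection object decides whether that state is interesting right now.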
And since you're writing low-level code on Linux, you may also care about EPOLLRDHUP. This not-highly-portable flag lets you save one call to read(). If the client (curl seems pretty good at evoking this behavior) closes its write side of the connection (sends a FIN), you normally discover that when epoll tells you EPOLLIN, but read() returns zero bytes. However, Linux maintains an extra bit to indicate your client's write side (your read side) has been closed. So, if you tell epoll you want the EPOLLRDHUP event you can use it to avoid doing a read() whose sole purpose will turn out to be telling you the writer closed their side.
Note that EPOLLIN will still be turned on whenever EPOLLRDHUP is, AFAIK. Even after you do a shutdown(fd, SHUT_RD). Another example of how you will usually be driven to maintain your own idea of the state of the connection. You care more about clients who are kind enough to do half-shutdowns if you are implementing HTTP.
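A small Linux-only sketch of the EPOLLRDHUP idea, under the assumption that a throwaway epoll instance per call is acceptable for illustration (real code would register the fd once and reuse the instance). The helper names pollOnce and peerClosedWriteSide are made up for this example.

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>

// Hypothetical helper: registers `fd` with the given interest mask on a
// temporary epoll instance and returns the reported event mask, or 0 on
// error or timeout.
uint32_t pollOnce(int fd, uint32_t interest, int timeoutMs) {
    int ep = epoll_create1(0);
    if (ep < 0) {
        return 0;
    }
    epoll_event ev{};
    ev.events = interest;
    ev.data.fd = fd;
    uint32_t result = 0;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) == 0) {
        epoll_event out{};
        if (epoll_wait(ep, &out, 1, timeoutMs) == 1) {
            result = out.events;
        }
    }
    close(ep);
    return result;
}

// True when the peer has shut down its write side, so a read() would
// only serve to report end-of-stream.
bool peerClosedWriteSide(uint32_t events) {
    return (events & EPOLLRDHUP) != 0;
}
```

You would ask for EPOLLIN | EPOLLRDHUP when registering the fd and test the returned mask with peerClosedWriteSide before deciding whether a read() is worth issuing.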
When used as an edge-triggered interface, for performance reasons, it is possible to add the file descriptor inside the epoll interface (EPOLL_CTL_ADD) once by specifying (EPOLLIN|EPOLLOUT). This allows you to avoid continuously switching between EPOLLIN and EPOLLOUT calling epoll_ctl(2) with EPOLL_CTL_MOD.

waveOutWrite buffers are never returned to application

I have a problem with Microsoft's WaveOut API:
edit1: Added Link to sample project:
edit2: removed link, it's not representative of the issue
After playing some audio, when I want to terminate a given playback stream, I call the function:
waveOutClose(hWaveOut_);
However, even after waveOutClose() is called, sometimes the library will still access memory previously passed to it by waveOutWrite(), causing an invalid memory access.
I then tried to ensure all the buffers are marked as done before freeing the buffer:
PcmPlayback::~PcmPlayback()
{
if(hWaveOut_ == nullptr)
return;
waveOutReset(hWaveOut_); // infinite-loops, never returns
for(auto it = buffers_.begin(); it != buffers_.end(); ++it)
waveOutUnprepareHeader(hWaveOut_, &it->wavehdr_, sizeof(WAVEHDR));
while( buffers_.empty() == false ) // infinite loops
removeCompletedBuffers();
waveOutClose(hWaveOut_);
//Unhandled exception at 0x75629E80 (msvcrt.dll) in app.exe:
// 0xC0000005: Access violation reading location 0xFEEEFEEE.
}
void PcmPlayback::removeCompletedBuffers()
{
for(auto it = buffers_.begin(); it != buffers_.end();)
{
if( it->wavehdr_.dwFlags & WHDR_DONE )
{
waveOutUnprepareHeader(hWaveOut_, &it->wavehdr_, sizeof(WAVEHDR));
it = buffers_.erase(it);
}
else
++it;
}
}
However, this situation never happens - the buffer never becomes empty. There will be 4-5 blocks remaining with wavehdr_.dwFlags == 18 (I believe this means the blocks are still marked as in playback)
How can I resolve this issue?
@Martin Schlott ("Can you provide the loop where you write the buffer to waveOutWrite?")
It's not quite a loop; instead I have a function that is called whenever I receive an audio packet over the network:
void PcmPlayback::addData(const std::vector<short> &rhs)
{
removeCompletedBuffers();
if(rhs.empty())
return;
// add new data
buffers_.push_back(Buffer());
Buffer & buffer = buffers_.back();
buffer.data_ = rhs;
ZeroMemory(&buffers_.back().wavehdr_, sizeof(WAVEHDR));
buffer.wavehdr_.dwBufferLength = buffer.data_.size() * sizeof(short);
buffer.wavehdr_.lpData = (char *)(buffer.data_.data());
waveOutPrepareHeader(hWaveOut_, &buffer.wavehdr_, sizeof(WAVEHDR)); // prepare block for playback
waveOutWrite(hWaveOut_, &buffer.wavehdr_, sizeof(WAVEHDR));
}
The described behavior can happen if you do not call waveOutUnprepareHeader on every buffer you used before you call waveOutClose.
The dwFlags field seems to indicate that the buffers are still enqueued (WHDR_INQUEUE | WHDR_PREPARED). Try calling waveOutReset before unpreparing the buffers.
After analyzing your code, I found two problems/bugs which are not related to waveOut (funny, you use C++11 but the oldest media interface). You use a vector as the buffer. During some calling operations, the vector is copied! One bug I found is:
typedef std::function<void(std::vector<short>)> CALLBACK_FN;
instead of:
typedef std::function<void(std::vector<short>&)> CALLBACK_FN;
which forces a copy of the vector.
Try to avoid using vectors if you expect to use them mostly as raw buffers. Better to use a std::unique_ptr as the buffer pointer.
Your callback in the recorder is not protected by a mutex, nor does it check if a destructor was already called. The destructing happens during the callback (mostly) which leads to an exception.
For your test program, go back and use raw pointers and static callbacks before blaming waveOut. Your code is not bad, but the first bug already shows that a small bug will lead to unpredictable errors. As you also organize your buffers in a std::array, I would search for bugs there. I guess you make an unintentional copy of your whole buffer array and unprepare the wrong buffers.
I did not have the time to dig deeper, but I guess those are the problems.
I managed to find my problem in the end, it was caused by multiple bugs and a deadlock. I will document what happened here so people can learn from this in the future
I was clued in to what was happening when I fixed the bugs in the sample:
call waveInStop() before waveInClose() in ~Recorder.cpp
wait for all buffers to have the WHDR_DONE flag before calling waveOutClose() in ~PcmPlayback.
After doing this, the sample worked fine and did not display the behavior of the WHDR_DONE flag never being marked.
In my main program, that behavior was caused by a deadlock that occurs in the following situation:
I have a vector of objects representing each peer I am streaming audio with
Each Object owns a Playback class
This vector is protected by a mutex
Recorder callback:
mutex.lock()
send audio packet to each peer.
Remove Peer:
mutex.lock()
~PcmPlayback
wait for WHDR_DONE flags to be marked
A deadlock occurs when I remove a peer, locking the mutex and the recorder callback tries to acquire a lock too.
Note that this will happen often because the playback buffer is usually (~4 * 20ms) while the recorder has a cadence of 20ms.
In ~PcmPlayback, the buffers will never be marked as WHDR_DONE and any calls to the WaveOut API will never return because the WaveOut API is waiting for the Recorder callback to complete, which is in turn waiting on mutex.lock(), causing a deadlock.
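One possible way out of this kind of deadlock, sketched here in plain C++ with no WaveOut calls, is to take the object out of the shared container while holding the mutex and let its slow destructor run only after the mutex is released. This is an illustration under assumptions, not the exact code from my program: Peer, PeerList, and the onDestroy hook (which stands in for ~PcmPlayback waiting on WHDR_DONE) are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical peer; onDestroy stands in for a destructor that has to
// wait for something which itself may need the list mutex.
struct Peer {
    int id = 0;
    std::function<void()> onDestroy;
    ~Peer() { if (onDestroy) onDestroy(); }
};

class PeerList {
public:
    void add(std::unique_ptr<Peer> peer) {
        std::lock_guard<std::mutex> lock(mutex_);
        peers_.push_back(std::move(peer));
    }

    void remove(int id) {
        std::unique_ptr<Peer> doomed;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = std::find_if(peers_.begin(), peers_.end(),
                [id](const std::unique_ptr<Peer>& p) { return p->id == id; });
            if (it == peers_.end()) {
                return;
            }
            doomed = std::move(*it);  // detach while locked
            peers_.erase(it);
        }
        // The mutex is released here; `doomed` is destroyed when it goes
        // out of scope, so the destructor never runs under the lock.
    }

    std::size_t size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return peers_.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::unique_ptr<Peer>> peers_;
};
```

With this shape, a callback that needs the same mutex (like the recorder callback above) can still make progress while the peer is being torn down.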

C++ non blocking socket select send too slow?

I have a program that maintains a list of "streaming" sockets. These sockets are configured to be non-blocking sockets.
Currently, I have used a list to store these streaming sockets. I have some data that I need to send to all these streaming sockets hence I used the iterator to loop through this list of streaming sockets and calling the send_TCP_NB function below:
The issue is that my own program buffer, which stores the data before it is passed to this send_TCP_NB function, slowly runs out of free space, indicating that the send is slower than the rate at which data is put into the program buffer. Data enters the program buffer at about 1000 items per second. Each item is quite small, about 100 bytes.
Hence, I am not sure whether my send_TCP_NB function is working efficiently or correctly.
int send_TCP_NB(int cs, char data[], int data_length) {
bool sent = false;
FD_ZERO(&write_flags); // initialize the writer socket set
FD_SET(cs, &write_flags); // set the write notification for the socket based on the current state of the buffer
int status;
int err;
struct timeval waitd; // set the time limit for waiting
waitd.tv_sec = 0;
waitd.tv_usec = 1000;
err = select(cs+1, NULL, &write_flags, NULL, &waitd);
if(err==0)
{
// time limit expired
printf("Time limit expired!\n");
return 0; // send failed
}
else
{
while(!sent)
{
if(FD_ISSET(cs, &write_flags))
{
FD_CLR(cs, &write_flags);
status = send(cs, data, data_length, 0);
sent = true;
}
}
int nError = WSAGetLastError();
if(nError != WSAEWOULDBLOCK && nError != 0)
{
printf("Error sending non blocking data\n");
return 0;
}
else
{
if(nError == WSAEWOULDBLOCK)
{
printf("%d\n", nError);
}
return 1;
}
}
}
One thing that would help is if you thought out exactly what this function is supposed to do. What it actually does is probably not what you wanted, and has some bad features.
The major features of what it does that I've noticed are:
1. Modify some global state
2. Wait (up to 1 millisecond) for the write buffer to have some empty space
3. Abort if the buffer is still full
4. Send 1 or more bytes on the socket (ignoring how much was sent)
5. If there was an error (including the send decided it would have blocked despite the earlier check), obtain its value. Otherwise, obtain a random error value
6. Possibly print something to screen, depending on the value obtained
7. Return 0 or 1, depending on the error value.
Comments on these points:
1. Why is write_flags global?
2. Did you really intend to block in this function?
3. This is probably fine
4. Surely you care how much of the data was sent?
5. I do not see anything in the documentation that suggests that this will be zero if send succeeds
If you cleared up what the actual intent of this function was, it would probably be much easier to ensure that this function actually fulfills that intent.
That said
I have some data that I need to send to all these streaming sockets
What precisely is your need?
If your need is that the data must be sent before proceeding, then using a non-blocking write is inappropriate*, since you're going to have to wait until you can write the data anyways.
If your need is that the data must be sent sometime in the future, then your solution is missing a very critical piece: you need to create a buffer for each socket which holds the data that needs to be sent, and then you periodically need to invoke a function that checks the sockets to try writing whatever it can. If you spawn a new thread for this latter purpose, this is the sort of thing select is very useful for, since you can make that new thread block until it is able to write something. However, if you don't spawn a new thread and just periodically invoke a function from the main thread to check, then you don't need to bother. (just write what you can to everything, even if it's zero bytes)
*: At least, it is a very premature optimization. There are some edge cases where you could get slightly more performance by using the non-blocking writes intelligently, but if you don't understand what those edge cases are and how the non-blocking writes would help, then guessing at it is unlikely to get good results.
EDIT: as another answer implied, this is something the operating system is good at anyways. Rather than try to write your own code to manage this, if you find your socket buffers filling up, then make the system buffers larger. And if they're still filling up, you should really give serious thought to the idea that your program needs to block anyways, so that it stops sending data faster than the other end can handle it. i.e. just use ordinary blocking sends for all of your data.
Some general advice:
Keep in mind you are multiplying data. So if you get 1 MB/s in, you output N MB/s with N clients. Are you sure your network card can take it? It gets worse with smaller packets, since you get more per-packet overhead. You may want to consider broadcasting.
You are using non blocking sockets, but you block while they are not free. If you want to be non blocking, better discard the packet immediately if the socket is not ready.
What would be better is to "select" more than one socket at once. Do everything that you are doing but for all the sockets that are available. You'll write to each "ready" socket, then repeat again while there are sockets that are not ready. This way, you'll proceed with the sockets that are available first, and then with some chance, the busy sockets will become themselves available.
The while (!sent) loop is useless and probably buggy. Since you are checking only one socket, FD_ISSET will always be true. It is wrong to check FD_ISSET again after an FD_CLR.
Keep in mind that your OS has some internal buffers for the sockets and that there are ways to extend them (not easy on Linux, though; to get large values you need to do some config as root).
There are some socket libraries that will probably work better than what you can implement in a reasonable time (boost::asio and zmq for the ones I know).
If you need to implement it yourself, (i.e. because for instance zmq has its own packet format), consider using a threadpool library.
EDIT:
Sleeping 1 millisecond is probably a bad idea. Your thread will probably get descheduled and it will take much more than that before you get some CPU time again.
This is just a horrible way to do things. The select serves no purpose but to waste time. If the send is non-blocking, it can mangle data on a partial send. If it's blocking, you still waste arbitrarily much time waiting for one receiver.
You need to pick a sensible I/O strategy. Here is one: Set all sockets non-blocking. When you need to send data to a socket, just call write. If all the data writes, lovely. If not, save the portion of data that wasn't sent for later and add the socket to your write set. When you have nothing else to do, call select. If you get a hit on any socket in your write set, write as many bytes as you can from what you saved. If you write all of them, remove that socket from the write set.
(If you need to write to a socket that's already in your write set, just append the new data to the saved data waiting to be sent. You may need to close the connection if too much data gets buffered.)
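The strategy above can be sketched roughly like this. To keep it independent of Winsock vs. POSIX, the actual socket call is hidden behind a hypothetical SendFn callable; in real code it would wrap send() and map WSAEWOULDBLOCK (or EAGAIN) to a return value of 0. The class name, the 1 MiB default cap, and the return conventions are all assumptions made for the example.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical transport hook: returns the number of bytes accepted,
// 0 if the socket would block, or -1 on a hard error.
using SendFn = std::function<long(const char*, std::size_t)>;

class BufferedSender {
public:
    explicit BufferedSender(SendFn send, std::size_t maxPending = 1 << 20)
        : send_(std::move(send)), maxPending_(maxPending) {}

    // Queue data behind anything already pending, then send what we can.
    // Returns false if the connection should be closed (hard error or
    // too much data buffered).
    bool write(const char* data, std::size_t len) {
        if (pending_.size() + len > maxPending_) {
            return false;
        }
        pending_.append(data, len);
        return flush();
    }

    // Call this when select() reports the socket as writable.
    bool flush() {
        while (!pending_.empty()) {
            long n = send_(pending_.data(), pending_.size());
            if (n < 0) {
                return false;   // hard error
            }
            if (n == 0) {
                break;          // would block: keep the rest for later
            }
            pending_.erase(0, static_cast<std::size_t>(n));
        }
        return true;
    }

    // True while the socket belongs in the select() write set.
    bool wantsWrite() const { return !pending_.empty(); }

private:
    SendFn send_;
    std::size_t maxPending_;
    std::string pending_;  // unsent bytes, in order
};
```

The main loop would keep one BufferedSender per socket, put only those with wantsWrite() in the write set, and call flush() on whichever sockets select() reports as writable.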
A better idea might be to use a library that already does all these things. Boost::asio is a good one.
You are calling select() before calling send(). Do it the other way around. Call select() only if send() reports WSAEWOULDBLOCK, eg:
int send_TCP_NB(int cs, char data[], int data_length)
{
    int status;
    int err;
    struct timeval waitd;
    fd_set write_flags; // local set, so no global state is modified
    char *data_ptr = data;
    while (data_length > 0)
    {
        // try to send first; only wait if the socket would block
        status = send(cs, data_ptr, data_length, 0);
        if (status > 0)
        {
            // partial send: advance past the bytes that were accepted
            data_ptr += status;
            data_length -= status;
            continue;
        }
        err = WSAGetLastError();
        if (err != WSAEWOULDBLOCK)
        {
            printf("Error sending non blocking data\n");
            return 0; // send failed
        }
        FD_ZERO(&write_flags);
        FD_SET(cs, &write_flags); // wait for the socket to become writable
        waitd.tv_sec = 0;
        waitd.tv_usec = 1000;
        status = select(cs+1, NULL, &write_flags, NULL, &waitd);
        if (status > 0)
            continue;
        if (status == 0)
            printf("Time limit expired!\n");
        else
            printf("Error waiting for time limit!\n");
        return 0; // send failed
    }
    return 1;
}

WaitForRequest with Timeout crashes

EDIT: I have now edited my code a bit to give a rough idea of "all" the code. Maybe this helps to identify the problem ;)
I have integrated the following simple code fragment, which either cancels the timer if data is read from the TCP socket or otherwise cancels the pending read on the socket.
// file tcp.cpp
void CheckTCPSocket()
{
TRequestStatus iStatus;
TSockXfrLength len;
int timeout = 1000;
RTimer timer;
TRequestStatus timerstatus;
TPtr8 buff;
iSocket.RecvOneOrMore( buff, 0, iStatus, len );
timer.CreateLocal();
timer.After(timerstatus, timeout);
// Wait for two requests – if timer completes first, we have a
// timeout.
User::WaitForRequest(iStatus, timerstatus);
if(timerstatus.Int() != KRequestPending)
{
iSocket.CancelRead();
}
else
{
timer.Cancel();
}
timer.Close();
}
// file main.cpp
void TestActiveObject::RunL()
{
TUint Data;
MQueue.ReceiveBlocking(Data);
CheckTCPSocket();
SetActive();
}
This part is executed within active Object and since integrating the code piece above I always get the kernel panic:
E32User-CBase 46: This panic is raised by an active scheduler, a CActiveScheduler. It is caused by a stray signal.
I never had any problem with my code until this piece of code was added. The code itself executes fine: data is read from the socket and then the timer is canceled and closed. I do not understand how the timer object has any influence on the AO here.
Would be great if someone could point me to the right direction.
Thanks
This could be a problem with another active object completing (not one of these two), or SetActive() not being called. See Forum Nokia. Hard to say without seeing all your code!
BTW User::WaitForRequest() is nearly always a bad idea. See why here.
Never mix active objects and User::WaitForRequest().
(Well, almost never. When you know exactly what you are doing it can be ok, but the code you posted suggests you still have some learning to do.)
You get the stray signal panic when the thread request semaphore is signalled with RThread::RequestComplete() by the asynchronous service provider and the active scheduler that was waiting on the semaphore with User::WaitForAnyRequest() tries to look for an active object that was completed so that its RunL() could be called, but cannot find any in its list of active objects.
In this case you have two ongoing requests, neither of which is controlled by the active scheduler (for example, not using CActive::iStatus as the TRequestStatus; issuing SetActive() on an object where CActive::iStatus is not involved in an async request is another error in your code but not the reason for stray signal). You wait for either one of them to complete with WaitForRequest() but don't wait for the other to complete at all. The other request's completion signal will go to the active scheduler's WaitForAnyRequest(), resulting in stray signal. If you cancel a request, you will still need to wait on the thread request semaphore.
The best solution is to make the timeout timer an active object as well. Have a look at the CTimer class.
Another solution is just to add another WaitForRequest on the request not yet completed.
You are calling TestActiveObject::SetActive() but there is no call to any method that sets TestActiveObject::iStatus to KRequestPending. This will create the stray signal panic.
The only iStatus variable in your code is local to the CheckTCPSocket() method.