How to trace resource deadlocks? - c++

I've wrote a timer using std::thread - here is how it looks like:
TestbedTimer::TestbedTimer(char type, void* contextObject) :
Timer(type, contextObject) {
this->active = false;
}
TestbedTimer::~TestbedTimer(){
if (this->active) {
this->active = false;
if(this->timer->joinable()){
try {
this->timer->join();
} catch (const std::system_error& e) {
std::cout << "Caught system_error with code " << e.code() <<
" meaning " << e.what() << '\n';
}
}
if(timer != nullptr) {
delete timer;
}
}
}
void TestbedTimer::run(unsigned long timeoutInMicroSeconds){
this->active = true;
timer = new std::thread(&TestbedTimer::sleep, this, timeoutInMicroSeconds);
}
void TestbedTimer::sleep(unsigned long timeoutInMicroSeconds){
unsigned long interval = 500000;
if(timeoutInMicroSeconds < interval){
interval = timeoutInMicroSeconds;
}
while((timeoutInMicroSeconds > 0) && (active == true)){
if (active) {
timeoutInMicroSeconds -= interval;
/// set the sleep time
std::chrono::microseconds duration(interval);
/// set thread to sleep
std::this_thread::sleep_for(duration);
}
}
if (active) {
this->notifyAllListeners();
}
}
void TestbedTimer::interrupt(){
this->active = false;
}
I'm not really happy with that kind of implementation since I let the timer sleep for a short interval and check if the active flag has changed (but I don't know a better solution since you can't interrupt a sleep_for call). However, my program core dumps with the following message:
thread is joinable
Caught system_error with code generic:35 meaning Resource deadlock avoided
thread has rejoined main scope
terminate called without an active exception
Aborted (core dumped)
I've looked up this error and as seems that I have a thread which waits for another thread (the reason for the resource deadlock). However, I want to find out where exactly this happens. I'm using a C library (which uses pthreads) in my C++ code which provides among other features an option to run as a daemon and I'm afraid that this interfers with my std::thread code. What's the best way to debug this?
I've tried to use helgrind, but this hasn't helped very much (it doesn't find any error).
TIA
** EDIT: The code above is actually not exemplary code, but I code I've written for a routing daemon. The routing algorithm is a reactive meaning it starts a route discovery only if it has no routes to a desired destination and does not try to build up a routing table for every host in its network. Every time a route discovery is triggered a timer is started. If the timer expires the daemon is notified and the packet is dropped. Basically, it looks like that:
void Client::startNewRouteDiscovery(Packet* packet) {
AddressPtr destination = packet->getDestination();
...
startRouteDiscoveryTimer(packet);
...
}
void Client::startRouteDiscoveryTimer(const Packet* packet) {
RouteDiscoveryInfo* discoveryInfo = new RouteDiscoveryInfo(packet);
/// create a new timer of a certain type
Timer* timer = getNewTimer(TimerType::ROUTE_DISCOVERY_TIMER, discoveryInfo);
/// pass that class as callback object which is notified if the timer expires (class implements a interface for that)
timer->addTimeoutListener(this);
/// start the timer
timer->run(routeDiscoveryTimeoutInMilliSeconds * 1000);
AddressPtr destination = packet->getDestination();
runningRouteDiscoveries[destination] = timer;
}
If the timer has expired the following method is called.
void Client::timerHasExpired(Timer* responsibleTimer) {
char timerType = responsibleTimer->getType();
switch (timerType) {
...
case TimerType::ROUTE_DISCOVERY_TIMER:
handleExpiredRouteDiscoveryTimer(responsibleTimer);
return;
....
default:
// if this happens its a bug in our code
logError("Could not identify expired timer");
delete responsibleTimer;
}
}
I hope that helps to get a better understanding of what I'm doing. However, I did not to intend to bloat the question with that additional code.

Related

gRPC: How can RPC handlers properly detect if `Server` has been `Shutdown()`

Current, I'm using a hackish way – a global variable – to make RPC handlers able to detect that the Server has been (about to be) called Shutdown().
bool g_ServerIsNotDead = true; // Hack!
Status StreamServiceImpl::GetCurrentTemperature(ServerContext *context_,
const UpdateInterval *request_,
ServerWriter<Temperature> *stream_)
{
auto currentTemp = 100.0f;
while(g_ServerIsNotDead) // Hack!!!
{
qDebug() << QThread::currentThreadId() << currentTemp << "farenheit.";
Temperature message;
message.set_temperature(currentTemp);
stream_->Write(message);
QThread::sleep(2);
currentTemp += 1.0f;
}
return Status::OK;
}
void insideSomeFunction() {
// Testing shutdown 5 seconds later
QTimer::singleShot(std::chrono::seconds(5), this, [=]() {
qDebug() << "Shuting down!";
g_ServerIsNotDead = false; // Hack!!
this->server->Shutdown(); // This method actually blocks until all RPC handlers have exited, believe it or not!
emit shutdown();
qDebug() << "All dead.";
});
}
Ref: https://github.com/C0D1UM/grpc-qt-example/blob/master/rpc_server/hellostream_server.cpp
It would be really nice if I could somehow check that Server has been Shutdown() from grpc::ServerContext, but I didn't see any relevant methods to achieve this.
Even better if someone could propose a way to take out the while loop completely (?). I'm using Qt so everything is event-driven.
So, I think it's worth making clear what Shutdown does. There's a detailed comment about this but basically, server Shutdown doesn't fail, cancel, or kill your existing in-progress calls (unless you use the deadline argument and the gRPC C++ async API).
Rather, it stops listening for new connections, stops accepting new calls, fails requested-but-not-yet-accepted calls. If you want to fail or terminate your calls at shutdown, you can do it at application-level code as you've done above.
I would just recommend that instead of using a global variable, you should use a member function of your StreamServiceImpl class so that you can support multiple services running in the same process if you choose.
We can use ServerContext::IsCancelled as a breaking/termination criteria in streaming APIs. I changed GetCurrentTemperature(...) as follows (just replaced g_ServerIsNotDead with !context_->IsCancelled()) and it worked:
Status StreamServiceImpl::GetCurrentTemperature(ServerContext *context_,
const UpdateInterval *request_,
ServerWriter<Temperature> *stream_) {
auto currentTemp = 100.0f;
while(!context_->IsCancelled) {
qDebug() << QThread::currentThreadId() << currentTemp << "farenheit.";
Temperature message;
message.set_temperature(currentTemp);
stream_->Write(message);
QThread::sleep(2);
currentTemp += 1.0f;
}
return Status::OK;
}

Qt timers cannot be stopped from another thread

Hy,
I'm writing my first Qt program and getting now in troubles with:
QObject::killTimer: timers cannot be stopped from another thread
QObject::startTimer: timers cannot be started from another thread
My program will communicate to a CANOpen bus for that I'm using the Canfestival Stack. The Canfestival will work with callback methods. To detects timeout in communication I setup a timer function (somehow like a watchdog). My timer package consist out of a "tmr" module, a "TimerForFWUpgrade" module and a "SingleTimer" module. The "tmr" module was originally C programmed so the static "TimerForFWUpgrade" methods will interface it. The "tmr" module will be part of a C programed Firmware update package.
The timer will work as follows. Before a message is sent I will call TMR_Set method. An then in my idle program loop with TMR_IsElapsed we check for a timer underflow. If TMR_IsElapsed I will do the errorhandling. As you see the TMR_Set method will be called continuously and restart the QTimer again and again.
The above noted errors are appearing if I start my program. Can you tell me if my concept could work? Why does this errors appear? Do I have to use additional threads (QThread) to the main thread?
Thank you
Matt
Run and Idle loop:
void run
{
// start communicate with callbacks where TMR_Set is set continously
...
while(TMR_IsElapsed(TMR_NBR_CFU) != 1);
// if TMR_IsElapsed check for errorhandling
....
}
Module tmr (interface to C program):
extern "C"
{
void TMR_Set(UINT8 tmrnbr, UINT32 time)
{
TimerForFWUpgrade::set(tmrnbr, time);
}
INT8 TMR_IsElapsed(UINT8 tmrnbr)
{
return TimerForFWUpgrade::isElapsed(tmrnbr);
}
}
Module TimerForFWUpgrade:
SingleTimer* TimerForFWUpgrade::singleTimer[NR_OF_TIMERS];
TimerForFWUpgrade::TimerForFWUpgrade(QObject* parent)
{
for(unsigned char i = 0; i < NR_OF_TIMERS; i++)
{
singleTimer[i] = new SingleTimer(parent);
}
}
//static
void TimerForFWUpgrade::set(unsigned char tmrnbr, unsigned int time)
{
if(tmrnbr < NR_OF_TIMERS)
{
time *= TimerForFWUpgrade::timeBase;
singleTimer[tmrnbr]->set(time);
}
}
//static
char TimerForFWUpgrade::isElapsed(unsigned char tmrnbr)
{
if(true == singleTimer[tmrnbr]->isElapsed())
{
return 1;
}
else
{
return 0;
}
}
Module SingleTimer:
SingleTimer::SingleTimer(QObject* parent) : QObject(parent),
pTime(new QTimer(this)),
myElapsed(true)
{
connect(pTime, SIGNAL(timeout()), this, SLOT(slot_setElapsed()));
pTime->setTimerType(Qt::PreciseTimer);
pTime->setSingleShot(true);
}
void SingleTimer::set(unsigned int time)
{
myElapsed = false;
pTime->start(time);
}
bool SingleTimer::isElapsed()
{
QCoreApplication::processEvents();
return myElapsed;
}
void SingleTimer::slot_setElapsed()
{
myElapsed = true;
}
Use QTimer for this purpose and make use of SIGNALS and SLOT for the purpose of starting and stopping the timer/s from different threads. You can emit the signal from any thread and catch it in the thread which created the timer to act on it.
Since you say you are new to Qt, I suggest you go through some tutorials before proceeding so that you will know what Qt has to offer and don't end up trying to reinvent the wheel. :)
VoidRealms is a good starting point.
You have this problem because the timers in the static array is created in Thread X, but started and stopped in Thread Y. This is not allowed, because Qt rely on thread affinity to timeout timers.
You can either create, start stop in the same thread or use signal and slots to trigger start and stop operations for timers. The signal and slot solution is a bit problematic Because you have n QTimer objects (Hint: how do you start the timer at position i?)
What you can do instead is create and initialize the timer at position tmrnbr in
TimerForFWUpgrade::set(unsigned char tmrnbr, unsigned int time)
{
singleTimer[tmrnbr] = new SingleTimer(0);
singleTimer[tmrnbr]->set(time);
}
which is executed by the same thread.
Futhermore, you don't need a SingleTimer class. You are using Qt5, and you already have all you need at your disposal:
SingleTimer::isElapsed is really QTimer::remainingTime() == 0;
SingleTimer::set is really QTimer::setSingleShot(true); QTimer::start(time);
SingleTimer::slot_setElapsed becomes useless
ThusSingleTimer::SingleTimer becomes useless and you dont need a SingleTimer class anymore
I got the errors away after changing my timer concept. I'dont use anymore my SingleTimer module. Before the QTimer I won't let timeout and maybe because of that I run into problems. Now I have a cyclic QTimer that times out every 100ms in slot function I will then count the events. Below my working code:
TimerForFWUpgrade::TimerForFWUpgrade(QObject* parent) : QObject(parent),
pTime(new QTimer(this))
{
connect(pTime, SIGNAL(timeout()), this, SLOT(slot_handleTimer()));
pTime->setTimerType(Qt::PreciseTimer);
pTime->start(100);
}
void TimerForFWUpgrade::set(unsigned char tmrnbr, unsigned int time)
{
if(tmrnbr < NR_OF_TIMERS)
{
if(timeBase != 0)
{
myTimeout[tmrnbr] = time / timeBase;
}
else
{
myTimeout[tmrnbr] = 0;
}
myTimer[tmrnbr] = 0;
myElapsed[tmrnbr] = false;
myActive[tmrnbr] = true;
}
}
char TimerForFWUpgrade::isElapsed(unsigned char tmrnbr)
{
QCoreApplication::processEvents();
if(tmrnbr < NR_OF_TIMERS)
{
if(true == myElapsed[tmrnbr])
{
return 1;
}
else
{
return 0;
}
}
else
{
return 0; // NOK
}
}
void TimerForFWUpgrade::slot_handleTimer()
{
for(UINT8 i = 0; i < NR_OF_TIMERS; i++)
{
if(myActive[i] == true)
{
myTimer[i]++;
if(myTimeout[i] < myTimer[i])
{
myTimer[i] = 0;
myElapsed[i] = true;
myActive[i] = false;
}
}
}
}

Boost scoped lock assert fails

I'm using Boost 1.41 in a linux app that receives data on one thread and sticks it in a queue, another thread pops it off the queue and processes it. To make it thread safe I'm using scoped locks.
My problem is that very infrequently the lock function fails in the read function with the message:
void boost::mutex::lock() Assertion '!pthread_mutext_lock(&m)' failed
It is very infrequent, on last run, it took 36 hours (~425M transactions) before it failed. The read and write functions are listed below, its always in the read function that the Assert arises
Write to queue
void PacketForwarder::Enqueue(const byte_string& newPacket, long sequenceId)
{
try
{
boost::mutex::scoped_lock theScopedLock(pktQueueLock);
queueItem itm(newPacket,sequenceId);
packetQueue.push(itm);
if (IsConnecting() && packetQueue.size() > MaximumQueueSize)
{
// Reached maximum queue size while client unavailable; popping.
packetQueue.pop();
}
}
catch(...)
{
std::cout << name << " Exception was caught:" << std::endl;
}
}
Read from queue
while ( shouldRun )
{
try
{
if (clientSetsHaveChanged)
{
tryConnect();
}
size_t size = packetQueue.size();
if (size > 0)
{
byte_string packet;
boost::mutex::scoped_lock theQLock(pktQueueLock);
queueItem itm = packetQueue.front();
packet = itm.data;
packetQueue.pop();
BytesSent += packet.size();
trySend(packet);
}
else
{
boost::this_thread::sleep(boost::posix_time::milliseconds(50));
}
}
catch (...)
{
cout << name << " Other exception in send packet" << endl;
}
I've googled and found a few problems when destroying scoped_locks but nothing on failing to get a lock. I have also had a search through boost release notes and Trac logs to see if this has been identified as an issue by anyone else. I thought my code was about as simple as it gets but obviously something is up. Any thoughts?
TIA
Paul
There is one thread-safety issue in your program, in this piece of code:
size_t size = packetQueue.size();
if (size > 0)
{
byte_string packet;
boost::mutex::scoped_lock theQLock(pktQueueLock);
queueItem itm = packetQueue.front();
packet = itm.data;
packetQueue.pop();
// ...
}
The issue here is that between the time you checked the queue size and the time you got the lock some other reader thread might take the last item out of the queue, which will cause front() and pop() to fail. Unless you have only one reader thread, you need the size check to be under the lock as well.
I do not know if this is the reason of the assertion failure though. The assertion means the call to pthread_mutex_lock returned a non-zero value signaling an error. Unfortunately, Boost does not show which exactly of possible pthread_mutex_lock errors has happened.

C++ Timed Process

I'm trying to set up some test software for code that is already written (that I cannot change). The issue I'm having is that it is getting hung up on certain calls, so I want to try to implement something that will kill the process if it does not complete in x seconds.
The two methods I've tried to solve this problem were to use fork or pthread, both haven't worked for me so far though. I'm not sure why pthread didn't work, I'm assuming it's because the static call I used to set up the thread had some issues with the memory needed to run the function I was calling (I continually got a segfault while the function I was testing was running). Fork worked initially, but on the second time I would fork a process, it wouldn't be able to check to see if the child had finished or not.
In terms of semi-pseudo code, this is what I've written
test_runner()
{
bool result;
testClass* myTestClass = new testClass();
pid_t pID = fork();
if(pID == 0) //Child
{
myTestClass->test_function(); //function in question being tested
}
else if(pID > 0) //Parent
{
int status;
sleep(5);
if(waitpid(0,&status,WNOHANG) == 0)
{
kill(pID,SIGKILL); //If child hasn't finished, kill process and fail test
result = false;
}
else
result = true;
}
}
This method worked for the initial test, but then when I would go to test a second function, the if(waitpid(0,&status,WNOHANG) == 0) would return that the child had finished, even when it had not.
The pthread method looked along these lines
bool result;
test_runner()
{
long thread = 1;
pthread_t* thread_handle = (pthread_t*) malloc (sizeof(pthread_t));
pthread_create(&thread_handle[thread], NULL, &funcTest, (void *)&thread); //Begin class that tests function in question
sleep(10);
if(pthread_cancel(thread_handle[thread] == 0))
//Child process got stuck, deal with accordingly
else
//Child process did not get stuck, deal with accordingly
}
static void* funcTest(void*)
{
result = false;
testClass* myTestClass = new testClass();
result = myTestClass->test_function();
}
Obviously there is a little more going on than what I've shown, I just wanted to put the general idea down. I guess what I'm looking for is if there is a better way to go about handling a problem like this, or maybe if someone sees any blatant issues with what I'm trying to do (I'm relatively new to C++). Like I mentioned, I'm not allowed to go into the code that I'm setting up the test software for, which prevents me from putting signal handlers in the function I'm testing. I can only call the function, and then deal with it from there.
If c++11 is legit you could utilize future with wait_for for this purpose.
For example (live demo):
std::future<int> future = std::async(std::launch::async, [](){
std::this_thread::sleep_for(std::chrono::seconds(3));
return 8;
});
std::future_status status = future.wait_for(std::chrono::seconds(5));
if (status == std::future_status::timeout) {
std::cout << "Timeout" <<endl ;
} else{
cout << "Success" <<endl ;
} // will print Success
std::future<int> future2 = std::async(std::launch::async, [](){
std::this_thread::sleep_for(std::chrono::seconds(3));
return 8;
});
std::future_status status2 = future2.wait_for(std::chrono::seconds(1));
if (status2 == std::future_status::timeout) {
std::cout << "Timeout" <<endl ;
} else{
cout << "Success" <<endl ;
} // will print Timeout
Another thing:
As per the documentation using waitpid with 0 :
meaning wait for any child process whose process group ID is equal to
that of the calling process.
Avoid using pthread_cancel it's probably not a good idea.

How to wait on a Mutex with OpenMP

I've a for loop that will launch processes in parallel every launched process will return a response back indicating that it is ready. I want to wait for the response and I'll abort if a certain timeout is reached.
Development environment is VS2008
Here is the pseudo code:
void executeCommands(std::vector<Command*> commands)
{
#pragma omp parallel for
for (int i = 0; i < commands.size(); i++)
{
Command* cmd = commands[i];
DWORD pid = ProcessLauncher::launchProcess(cmd->getWorkingDirectory(), cmd->getCommandToExcecute(), cmd->params);
//Should I wait for process to become ready?
if (cmd->getWaitStatusTimeout() > 0)
{
ProcessStatusManager::getInstance().addListener(*this);
//TODO: emit process launching signal
//BEGINNING OF QUESTION
//I don't how to do this part.
//I might use QT's QWaitCondition but if there is another solution in omp
//I'd like to use it
bool timedOut;
SOMEHANDLE handle = Openmp::waitWithTimeout(cmd->getWaitStatusTimeout(), &timedOut);
mWaitConditions[pid]) = handle;
//END OF QUESTION
if (timedOut)
{
ProcessStatusManager::getInstance().removeListener(*this);
//TODO: kill process
//TODO: emit fail signal
}
else
{
//TODO: emit process ready signal
}
}
else
{
//TODO: emit process ready signal
}
}
}
void onProcessReady(DWORD sourceProcessPid)
{
ProcessStatusManager::getInstance().removeListener(*this);
SOMEHANDLE handle = mWaitConditions[sourceProcessPid];
if (mWaitConditions[sourceProcessPid] != 0)
{
Openmp::wakeAll(handle);
}
}
As the comment above pointed out, Michael Suess did present a paper on adding this functionality to OpenMP. He is the last of several people that have proposed adding some type of wait function to OpenMP. The OpenMP language committee has taken the issue up several times. Each time it has been rejected because there are other ways to do this function already. I don't know Qt, but as long as the functions it provides are thread safe, then you should be able to use them.