I have written a program in Qt that uses several threads to do important work in the background. The target for this program is a BeagleBone Black (single-core CPU), but in my tests on my computer (a VM with 4 cores of an i7), the separate threads are already eating up two of the four cores as seen in htop (probably because two of them run a while(condition){} loop). How can I prevent these threads from eating up all my resources, so that I can run this multi-threaded program without speed loss on a single-core ARM CPU? And how can I find out which threads are eating up all my CPU resources?
As you're using Qt, there's a better possibility for making your threads wait: QWaitCondition.
This allows you to make a thread block until a certain condition is met in another thread. That other thread can then notify either all waiting threads or just one of them, depending on your need, and wake them (though with a single QWaitCondition you can't determine or predict which waiting thread will be woken; that depends on the OS).
In case you need a more general resource about this topic (idleness), I invite you to read the article In praise of idleness which covers this topic more thoroughly.
Besides using wait conditions, you can also use the event loop to sequence the code.
What will need to happen is that each function in the form of:
void Worker::foo(){
    //some code
    while(!stop_condition){}
    //other code
}

void Worker::signalCondition(){
    stop_condition = true;
}
has to be translated to:
void Worker::foo(){
    //some code
}

//actual slot called with a queued connection
void Worker::signalCondition(){
    //other code
}
This means Worker needs to be moved to the other thread, or signalCondition will be called on the wrong thread.
Admittedly, the code change for using QWaitCondition is simpler:
void Worker::foo(){
    //some code
    {
        QMutexLocker locker(&mutex);
        while(!stop_condition){
            waitCondition.wait(&mutex);
        }
    }
    //other code
}

void Worker::signalCondition(){
    QMutexLocker locker(&mutex);
    stop_condition = true;
    waitCondition.wakeOne();
}
Related
So I have a main application thread in my OpenGL application that does all the rendering and such. Now I need a really heavy calculation that takes about 2-3 seconds, so I moved it to a separate thread. Here is how I manage it:
std::atomic<bool> working(false);

void do_work(); // forward declaration

void work(){
    if(!working)
    {
        working = true;
        std::thread worker(do_work);
        worker.detach();
    }
    else
    {
        // some updates
    }
}

void do_work()
{
    // the actual work
    working = false; // allow the next frame to dispatch again
}
Now I call work() every frame, and the actual work gets dispatched automatically once the previous one has finished.
My question is: what are some ways to optimize this?
One idea that comes to mind is a thread pool, but I am not sure that is worth the trouble of implementing. Or is there some other way?
You could use std::async (with a std::launch policy) as some people have suggested. Or you could do a Google search for "c++ thread pool library" and probably find something waiting for you.
But the reality is simple: writing a thread pool is close to trivial and is a good exercise. So you could write your own and learn something. As has been suggested, you can dispatch via some sort of message queue.
A work queue is just a mutex-and-condition-variable-managed FIFO, and then you can have multiple readers and multiple writers. The entries in the queue can be any sort of runnable or bound function (such as a lambda).
Fun to write, and you can toss it into your personal library for years to come.
I am creating multiple threads in my program. On pressing Ctrl-C, a signal handler is called. At the end of the signal handler, I call exit(0). The thing is that sometimes the program terminates safely, but other times I get a runtime error stating
abort() has been called
So what would be a possible solution to avoid the error?
The usual way is to set an atomic flag (like std::atomic<bool>) which is checked by all threads (including the main thread). If it is set, the sub-threads exit and the main thread starts joining them. Then you can exit cleanly.
If you use std::thread, that's a possible reason for the crashes you see: you must join a thread before its std::thread object is destroyed.
Others have mentioned having the signal-handler set a std::atomic<bool> and having all the other threads periodically check that value to know when to exit.
That approach works well as long as all of your other threads are periodically waking up anyway, at a reasonable frequency.
It's not entirely satisfactory if one or more of your threads is purely event-driven, however -- in an event-driven program, threads are only supposed to wake up when there is some work for them to do, which means that they might well be asleep for days or weeks at a time. If they are forced to wake up every (so many) milliseconds simply to poll an atomic-boolean-flag, that makes an otherwise extremely CPU-efficient program much less CPU-efficient, since now every thread is waking up at short regular intervals, 24/7/365. This can be particularly problematic if you are trying to conserve battery life, as it can prevent the CPU from going into power-saving mode.
An alternative approach that avoids polling would be this one:
On startup, have your main thread create an fd-pipe or socket-pair (by calling pipe() or socketpair()).
Have your main thread (or possibly some other responsible thread) include the receiving-socket in its read-ready select() fd_set (or take a similar action for poll() or whatever wait-for-IO function that thread blocks in).
When the signal-handler is executed, have it write a byte (any byte, it doesn't matter what) into the sending-socket.
That will cause the main thread's select() call to return immediately, with FD_ISSET(receivingSocket) indicating true because of the received byte.
At that point, your main thread knows it is time for the process to exit, so it can start directing all of its child threads to shut down (via whatever mechanism is convenient; atomic booleans or pipes or something else).
After telling all the child threads to start shutting down, the main thread should then call join() on each child thread, so that it can be guaranteed that all of the child threads are actually gone before main() returns. (This is necessary because otherwise there is a risk of a race condition -- e.g. the post-main() cleanup code might occasionally free a resource while a still-executing child thread was still using it, leading to a crash)
The first thing you must accept is that threading is hard.
A "program using threading" is about as generic as a "program using memory", and your question is similar to "how do I not corrupt memory in a program using memory?"
The way you handle threading problems is to restrict how you use threads and restrict the behavior of those threads.
If your threading system is a bunch of small operations composed into a data flow network, with an implicit guarantee that if an operation is too big it is broken down into smaller operations and/or does checkpoints with the system, then shutting down looks very different than if you have a thread that loads an external DLL that then runs it for somewhere from 1 second to 10 hours to infinite length.
Like most things in C++, solving your problem is going to be about ownership, control and (at a last resort) hacks.
Like data in C++, every thread should be owned. The owner of a thread should have significant control over that thread, and be able to tell it that the application is shutting down. The shut down mechanism should be robust and tested, and ideally connected to other mechanisms (like early-abort of speculative tasks).
The fact you are calling exit(0) is a bad sign. It implies your main thread of execution doesn't have a clean shutdown path. Start there; the interrupt handler should signal the main thread that shutdown should begin, and then your main thread should shut down gracefully. All stack frames should unwind, data should be cleaned up, etc.
Then the same kind of logic that permits that clean and fast shutdown should also be applied to your threaded off code.
Anyone telling you it is as simple as a condition variable/atomic boolean and polling is selling you a bill of goods. That will only work in simple cases if you are lucky, and determining if it works reliably is going to be quite hard.
In addition to Some programmer dude's answer, and related to the discussion in the comment section: you need to make the flag that controls termination of your threads an atomic type.
Consider the following case:
bool done = false;

void pending_thread()
{
    while(!done)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    // do something that depends on the worker thread's results
}

void worker_thread()
{
    // do something for the pending thread
    done = true;
}
Here the worker thread can also be your main thread, and done is the terminating flag of your thread, but the pending thread needs to do something with the data produced by the worker thread before exiting.
This example has a data race, and the undefined behaviour that comes with it, and in the real world it is really hard to find what the actual problem is.
Now the corrected version using std::atomic:
std::atomic<bool> done(false);

void pending_thread()
{
    while(!done.load())
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    // do something that depends on the worker thread's results
}

void worker_thread()
{
    // do something for the pending thread
    done = true;
}
You can now exit the thread without being concerned about a race condition or UB.
Let's say I have a Writer class that generates some data, and a Reader class that consumes it. I want them to run all the time under different threads. How can I do that with OpenMP?
This is what I would like to have:
class Reader
{
public:
    void run();
};

class Writer
{
public:
    void run();
};

int main()
{
    Reader reader;
    Writer writer;
    reader.run(); // starts asynchronously
    writer.run(); // starts asynchronously
    wait_until_finished();
}
I guess the first answers will point to separating each operation into a section, but sections do not guarantee that code blocks will be given to different threads.
Can tasks do it? As far as I understood after reading about tasks, each code block is executed just once, but the assigned thread can change.
Any other solution?
I would like to know this in order to tell whether some code I have inherited, which uses pthreads and explicitly creates several threads, could be rewritten with OpenMP. The issue is that some threads were not smartly written and contain active-waiting loops. In that situation, if two objects with active waiting are assigned to the same OpenMP thread (and hence executed sequentially), they can reach a deadlock. At least, I think that could happen with sections, but I am not sure about tasks.
Serialisation could also happen with tasks. One horrible solution would be to reimplement sections on your own with a guarantee that each section runs in a separate thread:
#pragma omp parallel num_threads(3)
{
    switch (omp_get_thread_num())
    {
        case 0: wait_until_finished(); break;
        case 1: reader.run(); break;
        case 2: writer.run(); break;
    }
}
This code assumes that you would like wait_until_finished() to execute in parallel with reader.run() and writer.run(). This is necessary since in OpenMP the program executes in parallel only within the scope of the parallel construct; there is no way to put things in the background, so to say.
If you're rewriting the code anyway, you might be better off moving to Threading Building Blocks (TBB; http://www.threadingbuildingblocks.org).
TBB has explicit support for pipeline-style operation (or more complicated task graphs), while maintaining cache locality and independence from the underlying number of threads.
Let's say I have one mutex, two threads, one function, and one loop (pseudo-code).
Function:
void Update(){
    Mutex.enter();
    ... // time: 10 ms
    Mutex.leave();
}

Main.cpp:

void main(){
    ... // starting thread
    while(true)
        Update();
}

Thread:

void Thread(void *){
    Mutex.enter();
    ... //
    Mutex.leave();
}
But Update() is called constantly, so the mutex is only free for a short time. How high is the chance that the thread gets to enter the mutex? And if it is low, how can that be resolved?
If you're using boost threads (link), then I'd use yield(). It'll allow any other "waiting" threads to "get a chance" to run.
There's probably a win32 or pthreads way of doing this too.
Edit: and by the way, call yield() outside of the locks. If it's inside the locks, it would obviously be useless.
Edit2: And here are the functions for the different platforms:
Win32: SwitchToThread() msdn link.
Linux/Unix pthreads: pthread_yield() link
If you're not on any of those platforms, read the descriptions at those links, and look for a function that does the same thing in your framework.
From the pseudo-code you showed, it seems there is no cooperation between the threads. If thread2 is lucky enough to grab the mutex before the first call to Update(), then Update() will not run for the whole lifetime of thread2. That looks like a flawed design to me.
If thread2 is doing the work and the main thread is calling Update() to monitor and report the progress of whatever is happening in thread2's routine, it would make much more sense to have thread1 (the main one) wait on an update_required signal, while thread2 (the one making progress) does the work, fills in a struct variable with all the data needed to report the progress, and then signals thread1 to use that data and report the progress. Using a ring buffer of such struct variables could eliminate the need for mutexes altogether.
I am making some multi-threaded video game code. Before I began coding, I looked at an article vaguely describing Valve's solution to multi-threaded game design. A key concept I gleaned from the article is thread synchronization. I don't know if this is how Valve does it, but I imagined multiple threads each executing a game loop. At the end of each iteration, the threads pause and wait for the other threads to finish their current iteration, then synchronize shared data. I figure that, besides the overhead of this management scheme, there would be no difference compared to just letting the threads operate completely asynchronously. The article mentioned a thread used exclusively for syncing, but I am trying to get a different solution to work correctly. This is how I (try to) do it:
// at the end of the loop on each thread...
sig_thread_done();
while (!is_sync_done())
{
    PauseExecution(1);
}
sig_thread_done and is_sync_done are function objects from another class that controls a list of all "threads". These functions look like this:
bool Core::IsFrameDone()
{
    MutexLock lock(manager_mutex);
    if (waiting_components == -1)
    {
        waiting_components = 0;
        return true;
    }
    return false;
}

void Core::SignalFrameDone()
{
    MutexLock lock(manager_mutex);
    if (++waiting_components == (int)components.size()) // components == threads
    {
        // sync shared data...
        waiting_components = -1; // -1 signifies that all threads have completed their iteration
    }
}
The problem is that a fast thread can exit its waiting loop and come back around to it again before the other threads have had a chance to exit theirs. So the other threads miss the exit (is_sync_done returns false for them) once another thread has begun waiting again, and the whole system gets stuck waiting forever.
I can't find an easy way to resolve this issue. I really like this approach because synchronization doesn't get stalled while some independent thread performs the sync.
I appreciate any insight or suggestions anyone has to offer.
Link to article.
I think you are trying to re-invent a Thread barrier.
For something like this you want to sync on a barrier, built with something like a Win32 Event (or an array thereof). This makes sure you cannot get the situation you described (the barrier ensures that everything syncs up on the same frame), while at the same time freeing CPU time, as waiting on an event is done via a kernel signal that sleeps the thread until the signal is received. You'd also want to use wait-free algorithms in there; these work particularly well if you have a job/task-based threading model, where certain things can be decoupled from the system.
Also, here is a better publication on multi-threading the Source engine; it's far more in-depth and technical (they also specifically state that they avoid mutexes for this sort of thing).