Guaranteeing that all threads wake up, each exactly once - c++

I found a bug in my program: the same thread is sometimes woken twice, taking a turn that belonged to another thread and causing unintended behaviour. My program requires that every waiting thread runs exactly once per turn. The bug happens because I use a semaphore to make the threads wait. The semaphore is initialized with a count of 0, every thread calls down on it at the start of its infinite loop, and the main thread calls up in a for loop NThreads (the number of threads) times. Occasionally the same thread takes two of the up calls, and the problem arises.
What is the proper way to deal with this problem? Is using a condition variable and broadcasting one way to do it? Will that guarantee that every thread is woken once and only once? What other good approaches are there?

On Windows, you could use WaitForMultipleObjects to select a ready thread from the threads that have not yet been run in the current round of NThreads iterations.
Each thread should have a "ready" event that it signals when it is ready, and a "wake" event that it waits on after signaling its "ready" event.
At the start of your main thread loop (the first of the NThreads iterations), call WaitForMultipleObjects with an array of your NThreads "ready" events.
Then set the "wake" event of the thread corresponding to the "ready" event returned by WaitForMultipleObjects, and remove that event from the array of "ready" handles. That guarantees that a thread that has already been run won't be returned by WaitForMultipleObjects on the next iteration.
Repeat until the last iteration, where you will call WaitForMultipleObjects with an array of only one "ready" handle (I think this will work as if you had called WaitForSingleObject).
Then repopulate the array with all NThreads "ready" events for the next round of NThreads iterations.

Well, use an array of semaphores, one for each thread. If you want the array of threads to run once only, send one unit to each semaphore. If you want the threads to all run exactly N times, send N units to each semaphore.
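A minimal sketch of the one-semaphore-per-thread idea, assuming a C++11 compiler. The Semaphore class, the runTurns function and the extra done semaphore are hypothetical names introduced for illustration; the answer does not say how the main thread learns that a turn is over, so waiting on done is an assumption.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore built from a mutex and a condition variable.
class Semaphore {
public:
    void up() {
        { std::lock_guard<std::mutex> lock(m_); ++count_; }
        cv_.notify_one();
    }
    void down() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_ = 0;
};

// Runs `rounds` turns with `nThreads` workers and returns how many times
// each worker ran. Every worker waits on its own semaphore, so an up()
// can only ever wake the worker it was meant for.
std::vector<int> runTurns(int nThreads, int rounds) {
    assert(nThreads > 0 && rounds >= 0);
    std::vector<std::unique_ptr<Semaphore>> wake;
    for (int i = 0; i < nThreads; ++i) wake.emplace_back(new Semaphore);
    Semaphore done;                       // workers report the end of their turn here
    std::vector<int> runs(nThreads, 0);
    std::vector<std::thread> workers;

    for (int i = 0; i < nThreads; ++i) {
        workers.emplace_back([&, i] {
            for (int r = 0; r < rounds; ++r) {
                wake[i]->down();          // wait for this worker's own unit
                ++runs[i];                // the per-turn work would go here
                done.up();
            }
        });
    }
    for (int r = 0; r < rounds; ++r) {
        for (int i = 0; i < nThreads; ++i) wake[i]->up();  // one unit per worker
        for (int i = 0; i < nThreads; ++i) done.down();    // wait until the turn is over
    }
    for (auto& t : workers) t.join();
    return runs;
}
```

Calling runTurns(4, 100) should give every worker a run count of 100, because no worker can consume a unit that was posted to another worker's semaphore.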

Related

Run a function at the start of every minute?

I am trying to run a function at the start of every minute mm:00.000
So I want to run a function (performance is very important) every time this condition is true:
(std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now().time_since_epoch()).count()) % 60000 == 0
Any idea how to do this?
Start a separate thread.
The thread checks std::chrono::system_clock and computes the absolute time of the next minute boundary. There are several ways to make the thread sleep until the prescribed time arrives. One way would be for the thread to create a private mutex and condition variable, lock the mutex, and call wait_until() with the absolute time of the next minute boundary. Since nothing else will notify the condition variable, the thread will simply sleep until the prescribed time arrives (apart from spurious wakeups, after which it should wait again), and then your thread can invoke the given function.
A separate thread is not strictly necessary. This could all be done as part of your main execution thread, if your main execution thread has nothing else to do.
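A minimal sketch of computing the next minute boundary and sleeping until it. It uses std::this_thread::sleep_until rather than the private condition variable described above, which is one of the "several ways" to sleep until an absolute time. The names nextMinuteBoundary and runEveryMinute are hypothetical, and the rounding assumes the clock reads a time after its epoch.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::system_clock;

// Returns the first whole-minute boundary strictly after `now`.
// duration_cast truncates toward zero, which rounds down for times after the epoch.
Clock::time_point nextMinuteBoundary(Clock::time_point now) {
    using namespace std::chrono;
    const minutes wholeMinutes = duration_cast<minutes>(now.time_since_epoch());
    return Clock::time_point(duration_cast<Clock::duration>(wholeMinutes + minutes(1)));
}

// Calls `fn` at each of the next `count` minute boundaries.
template <typename Fn>
void runEveryMinute(Fn fn, int count) {
    assert(count >= 0);
    for (int i = 0; i < count; ++i) {
        std::this_thread::sleep_until(nextMinuteBoundary(Clock::now()));
        fn();
    }
}
```

How close to mm:00.000 the call actually happens depends on the operating system's scheduler, so this sketch makes no claim about sub-millisecond accuracy.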

Is this implementation of a general semaphore with binary semaphores correct?

Prove or Disprove the correctness of the following semaphore.
Here are my thoughts on this.
Well, if someone uses it so that wait runs before signal, there will be a deadlock. The program will call wait, decrement count, enter the count < 0 branch and wait at gate. Because it is waiting at gate, it cannot proceed to the signal that comes right after the wait. So in that case, this might imply that the semaphore is incorrect.
However, if we assume that two processes are running, one calling wait first and the other calling signal first, then if the first process calls wait and blocks at wait(gate), the other process can call signal and release the blocked process. Continuing with this scheme would make the algorithm valid and not result in a deadlock.
The given implementation follows these principles:
Binary semaphore S protects the count variable from concurrent access.
If non-negative, count reflects the number of free resources of the general semaphore. Otherwise, the absolute value of count reflects the number of threads that are waiting (p5) or ready to wait (between p4 and p5) on binary semaphore gate.
Every signal() call increments count and, if its previous value was negative, signals binary semaphore gate.
But because of the possibility of the ready-to-wait state, the given implementation is incorrect:
Assume thread#1 calls wait() and is currently in the ready-to-wait state. Assume another thread#2 also calls wait() and is currently in the ready-to-wait state too.
Assume thread#3 calls signal() at this moment. Because count is negative (-2), the thread performs all operations including p10 (signal(gate)). Because no thread is waiting on gate at the moment, gate moves to the free state.
Assume another thread#4 calls signal() at this moment. Because count is still negative (-1), the thread also performs all operations including p10. But now gate is already in the free state, so signal(gate) is a no-op here and a signal event is lost: only one of thread#1 and thread#2 will continue after executing p5 (wait(gate)). The other thread will wait forever.
Without the possibility of the ready-to-wait state (that is, if signal(S) and wait(gate) were executed atomically), the implementation would be OK.

QThread always one thread left behind

This is my thread:
class calThread : public QThread
{
    Q_OBJECT
public:
    calThread(QList<int> number);
    ~calThread();
    void run();
    QList<int> cal(QList<int> input);
signals:
    void calFinished(QList<int> result);
private:
    // Assumed members: run() uses them, but their declarations are not shown.
    QList<int> number;
    QList<int> output;
};

void calThread::run()
{
    output = cal(number);
    emit calFinished(output);
    sleep(1);
}
This is how I call the thread:
calThread* worker3 = new calThread(numberList);
connect(worker3, SIGNAL(calFinished(QList<int>)), this, SLOT(handleResult(QList<int>)));
connect(worker3, SIGNAL(finished()), worker3, SLOT(deleteLater()));
worker3->start();
I have a large list of input. I divide it into four equal-sized lists and give each one to its own thread to calculate. The threads are named worker0 to worker3.
Every time the program runs, the four threads start at about the same time, but one thread always returns much more slowly. For example, the first three threads take about 2 minutes to finish, while the fourth takes maybe 5 minutes.
But all threads should have the same number of items and the same complexity to calculate.
Why is there always a thread left behind?
Debug output:
inputThread0 item numbers: 1736
inputThread1 item numbers: 1736
inputThread2 item numbers: 1736
inputThread3 item numbers: 1737
"20:29:58" Thread 0 Thread ID 0x7f07119df700
"20:29:58" Thread 3 Thread ID 0x7f06fc1d3700
"20:29:58" Thread 1 Thread ID 0x7f06fd1d5700
"20:29:58" Thread 2 Thread ID 0x7f06fc9d4700
"20:29:58" Thread 0 Thread ID 0x7f07119df700
"20:29:58" Thread 1 Thread ID 0x7f06fd1d5700
….............................
// Most of the following lines are from Thread 0, 1 and 2
….............................
"20:29:58" Thread 1 Thread ID 0x7f06fd1d5700
// This is the last line from Thread 0, 1 or 2
// It takes less than one second to finish
"20:29:59" Thread 3 Thread ID 0x7f06fc1d3700
"20:29:59" Thread 3 Thread ID 0x7f06fc1d3700
"20:29:59" Thread 3 Thread ID 0x7f06fc1d3700
"20:29:59" Thread 3 Thread ID 0x7f06fc1d3700
….................................
// Only Thread 3 left
"20:30:17" Thread 3 Thread ID 0x7f06fc1d3700
// This last thread takes 19 seconds to finish
"Why is there always a thread left behind?"
Why not? Thread scheduling is completely at the whim of the OS. There is no guarantee at all that any thread will get any sort of "fair share" of any CPU resources. You need to assign small chunks of work and have them automatically distributed across the worker threads. QtConcurrent::run and the QtConcurrent framework in general offer a trivial way of getting that done. As long as the chunks of work passed to run, mappedReduced, etc. are reasonably sized (say, each takes between 0.1 and 1 s), all of the threads in the pool will be done within a couple of tenths of a second of each other.
A partial explanation for your observed behavior is that a thread that already runs on a given core is more likely to be rescheduled on the same core to utilize the warm caches there. If three out of four cores run your threads almost continuously, the fourth thread often ends up sharing a core with your GUI thread, and will necessarily run slower if the GUI is not idle. If the GUI thread is busy processing the results from the other threads, it is not unexpected that the computation thread would be starved on that core. This is actually the most power- and time-efficient way to schedule the threads, with the least amount of overhead.
As long as you give the threads small chunks of work to do, and distribute them on an as-ready basis, as QtConcurrent does, it will also lead to the smallest wall-clock runtimes. If the scheduler were forcing "fair" reschedules, your long-running threads would all finish at roughly the same time, but would take more time and power to finish the job. Modern schedulers enable you to run your jobs most efficiently, but you must set the jobs up to take advantage of that.
In a way, your scheduler is helping you improve your code to be more resource-efficient. That's a good thing.
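A minimal sketch of the "small chunks, handed out as workers become ready" idea. To stay self-contained it uses plain std::thread and an atomic index instead of QtConcurrent, so it only illustrates the principle and is not how QtConcurrent is implemented. The name processInChunks and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Applies `work` to every element of `items`, handing out chunks of
// `chunkSize` items to whichever worker thread asks next. A worker that is
// starved of CPU time simply claims fewer chunks instead of holding up the rest.
template <typename T, typename Fn>
void processInChunks(std::vector<T>& items, std::size_t chunkSize,
                     unsigned nThreads, Fn work) {
    assert(chunkSize > 0 && nThreads > 0);
    std::atomic<std::size_t> next{0};     // index of the first unclaimed item
    auto worker = [&] {
        for (;;) {
            const std::size_t begin = next.fetch_add(chunkSize);
            if (begin >= items.size()) return;            // nothing left to claim
            const std::size_t end = std::min(begin + chunkSize, items.size());
            for (std::size_t i = begin; i < end; ++i) work(items[i]);
        }
    };
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nThreads; ++t) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
}
```

With the 6945 items from the question and a chunk size such as 64, each thread keeps claiming work until the list is exhausted, so the finishing times should differ by at most about one chunk's worth of work.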

Thread synchronization in qt

I have a program that has 3 threads. All of them take data from Ethernet on different ports. The data may arrive at a different frequency for each of the 3 threads, but all of the incoming data must be processed at the same time.
So if data arrives for one thread, that thread must wait for the other threads' data to arrive. How can I achieve this?
Boost.Thread has a barrier class, whose purpose is to block multiple threads until a specified number have reached the barrier.
You would just create a boost::barrier initialized with 3, meaning that it blocks until three threads are waiting on the barrier. When each of your threads is done waiting for data, you have them call wait() on the barrier. When the third thread calls wait(), all three threads will continue execution.
boost::barrier barrier(3);

void thread_function()
{
    read_data();
    barrier.wait(); // Threads will block here until all three are ready.
    process_data();
}
If you only want one thread to process the data, you can check the return value of wait(); the function will only return true for one of the threads at the barrier.
You need a barrier. A barrier has a preset capacity N and blocks N-1 threads until the N-th arrives. After the N-th arrives, all N threads are released simultaneously.
Unfortunately Qt has no direct support for barriers, but there is a simple implementation using Qt primitives here: https://stackoverflow.com/a/9639624/1854587
Not as simple as boost's barrier as answered by @dauphic, but this can be done with Qt alone, using slots, signals and another class on a 4th thread.
Create a class on a separate thread that coordinates the other 3. The network threads can send a signal to the 'coordinator' class when they receive data. Once this coordinator class has received messages from all 3 network threads, it can then signal the threads to process the data.
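For readers without Boost, the barrier behaviour described in the answers above can be sketched with a mutex and a condition variable from standard C++11. This is an illustrative sketch, not the Boost or Qt implementation; the class name Barrier and the demo function countLeaders are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Reusable barrier: wait() blocks until `count` threads have called it.
class Barrier {
public:
    explicit Barrier(std::size_t count) : threshold_(count), remaining_(count) {
        assert(count > 0);
    }
    // Returns true for exactly one thread per cycle (here: the last to arrive).
    bool wait() {
        std::unique_lock<std::mutex> lock(m_);
        const std::size_t gen = generation_;
        if (--remaining_ == 0) {
            ++generation_;                // start a new cycle
            remaining_ = threshold_;
            cv_.notify_all();             // release everyone waiting in this cycle
            return true;
        }
        cv_.wait(lock, [&] { return gen != generation_; });
        return false;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    const std::size_t threshold_;
    std::size_t remaining_;
    std::size_t generation_ = 0;
};

// Demo: `n` threads meet at the barrier `rounds` times; returns how many
// times wait() returned true in total.
int countLeaders(std::size_t n, int rounds) {
    Barrier barrier(n);
    std::atomic<int> leaders{0};
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < n; ++i) {
        threads.emplace_back([&] {
            for (int r = 0; r < rounds; ++r) {
                // read_data() would go here
                if (barrier.wait()) ++leaders;   // one thread per round gets true
                // process_data() would go here
            }
        });
    }
    for (auto& t : threads) t.join();
    return leaders;
}
```

The generation counter is what makes the barrier safe to reuse: a thread that wakes up late still sees that its own cycle has ended, even if the other threads have already started arriving for the next one.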

Having thread-local queues with counters

I have four threads, each of which has its own private queue and a private 'int count' member. Whenever a task is produced by the program thread, it should be enqueued to the queue of the thread that has the minimum 'int count' among the threads.
Whenever a task is pushed into the queue, the private 'int count' should be increased by 1.
Whenever a task is popped out of the queue, the private 'int count' should be decreased by 1.
So the 'int count' changes dynamically with the push and pop operations, and the program thread dispatches each task to the queue with the lowest (or the first zero found) count.
This is the underlying logic of the program.
I am working in C++ with the Linux multithreading library, implementing a multi-rate synchronous data flow paradigm.
Could you please give some coding ideas for implementing this logic, i.e.:
1. Initialize every private int queue counter to 0.
2. counter++ when tasks are pushed.
3. counter-- when tasks are popped.
4. The task dispatcher sees the private int count of each thread.
5. It dispatches tasks to the queue which has the minimum count.
"I have four threads, each of which has its own private queue and a private 'int count' member. Whenever a task is produced by the program thread, it should be enqueued to the queue of the thread that has the minimum 'int count' among the threads. Whenever a task is pushed into the queue, the private 'int count' should be increased by 1. Whenever a task is popped out of the queue, the private 'int count' should be decreased by 1."
OK, so basically your program thread is the producer and you have 4 consumer threads. By using a queue in each thread you will be minimizing the time spent by the main thread interacting with the consumers. N.B. You need to consider whether your threads are going to be starved or overflow, i.e. whether the single producer will create "work" at a rate that warrants 4 consumers, or whether the 4 consumers will be swamped.
naive approach
So you need to synchronize the queue access and the increment, meaning that you need a mutex to stop the consumer accessing anything while the count and queue are modified. The easiest way to do the synchronization would be to have a method (e.g. enqueue(Item& item)) which locks the mutex within it.
C++11 : Mutex http://en.cppreference.com/w/cpp/thread/mutex
Additionally, if starvation (or overflow) is an issue, you will need to use some signalling to stop the relevant threads' activity (starved: stop the consumers to avoid CPU usage; overflow: stop the producer while the consumers catch up). Usually these signals are implemented using condition variables.
C++11 : Condition variables : http://en.cppreference.com/w/cpp/thread/condition_variable
"So the 'int count' changes dynamically with the push and pop operations, and the program thread dispatches each task to the queue with the lowest (or the first zero found) count."
So the situation is slightly complicated here, in that the threads that you want to populate will be the ones with the least work to do. This requires that you inspect the 4 counts and choose the queue. However, because there is only one producer, you can probably just scan for the queue without locking (in C++ the counter should then be a std::atomic<int>, because an unsynchronized read of a plain int that another thread writes is a data race). The logic here is that the consumers will not be affected by the read, and the choice of thread would not really be incorrect even with the consumers working during that choice.
So I would have an array of thread objects, each of which would have the count, and a mutex for locking.
"1. Initialize every private int queue counter to 0."
Initialize the counts in the constructors - make sure that the producer isn't working during initialization and synchronization won't be an issue.
"2. counter++ when tasks are pushed. 3. counter-- when tasks are popped."
Implement 2 methods on the thread object to do the enqueuing and dequeuing, and in each use a lock_guard to lock the mutex (RAII technique). Then push/pop the item to/from the queue and increment/decrement the count as applicable.
C++11: lock_guard http://en.cppreference.com/w/cpp/thread/lock_guard
"4. The task dispatcher sees the private int count of each thread. 5. It dispatches tasks to the queue which has the minimum count."
As I said above, if there is only one producer you can simply scan through the array of objects and choose (maintain an index to) the thread object where the counter is the lowest (add a getCount() method). It will most likely still be the lowest even with the consumers continuing their work.
If there are multiple threads producing work, then you might need to think about how you want to handle 2 threads enqueuing to the same thread (it might not matter).
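The pieces described in this answer can be sketched roughly as follows, assuming C++11 and a single producer. CountedQueue, tryDequeue and dispatch are hypothetical names, and storing the counter in a std::atomic<int> is an assumption made here so that the dispatcher can read it without taking the mutex.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

// One consumer's private queue together with its task counter.
template <typename Task>
class CountedQueue {
public:
    void enqueue(Task task) {
        std::lock_guard<std::mutex> lock(m_);   // RAII: unlocked on scope exit
        q_.push(std::move(task));
        ++count_;
    }
    // Pops one task into `out`; returns false if the queue is empty.
    bool tryDequeue(Task& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        --count_;
        return true;
    }
    // Read without locking; the value may already be stale when it is used.
    int getCount() const { return count_.load(); }
private:
    std::mutex m_;
    std::queue<Task> q_;
    std::atomic<int> count_{0};
};

// Single-producer dispatcher: enqueues `task` on the queue with the lowest
// count (the first one wins on ties) and returns the index it chose.
template <typename Task>
std::size_t dispatch(std::vector<CountedQueue<Task>>& queues, Task task) {
    assert(!queues.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < queues.size(); ++i)
        if (queues[i].getCount() < queues[best].getCount()) best = i;
    queues[best].enqueue(std::move(task));
    return best;
}
```

A consumer thread would own one CountedQueue and loop on tryDequeue; adding a condition variable so that an idle consumer sleeps instead of polling is left out of this sketch.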