Thread Synchronization C++ - c++

I am having this weird issue with threads. On my Mac with OS X this works fine, but once I move it over to my desktop running Ubuntu, I am facing issues.
Essentially what I am doing is the following:
Function() {
    for (i = 1 to 10) {
        while (array not completely changed) {
            pthread_mutex_lock(&lock);
            // perform actions
            pthread_mutex_unlock(&lock); // unlock the same mutex that was locked
        }
    }
}
And I have two threads running this function. It is supposed to run in an alternating manner, that is:
Thread 1 grabs lock
performs operations on array
Thread 1 releases lock
Thread 2 grabs lock
performs calculations on array
Thread 2 releases lock
and so on, back and forth, until the array has been completely changed. But on Linux, all of Thread 1's calculations complete and only then does Thread 2 start.
So I will get:
Thread 1 grabs lock
performs operations on array
Thread 1 releases lock
Thread 1 grabs lock
performs calculations on array
Thread 1 releases lock
Thread 1 grabs lock
performs calculations on array
Thread 1 releases lock
And so on until the array is completely changed. Once the for loop increments, Thread 2 performs all of its calculations, and the pattern continues.
Can anyone explain what is going on?

You're experiencing "starvation". Add a small nanosleep call occasionally to give the other threads a chance to run. Add the call outside the mutex pair (e.g. after the unlock). Thread 1 is monopolizing things.
You may also want to consider restructuring, splitting the critical (requires locking) work from the non-critical work:
while (more) {
    lock ...
    do critical stuff ...
    unlock ...
    nanosleep ...
    do non-critical stuff
}
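A minimal sketch of that structure with pthreads (more_work_remains, do_critical_work, and do_noncritical_work are hypothetical placeholders, not from the original code):

#include <pthread.h>
#include <time.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

int more_work_remains(void);    // placeholder completion check
void do_critical_work(void);    // placeholder: touches the shared array
void do_noncritical_work(void); // placeholder: needs no lock

void *worker(void *arg)
{
    (void)arg;
    while (more_work_remains()) {
        pthread_mutex_lock(&lock);
        do_critical_work();
        pthread_mutex_unlock(&lock);

        // Sleep briefly outside the critical section so the other
        // thread has a chance to acquire the mutex.
        struct timespec ts = {0, 1000}; // 1000 ns
        nanosleep(&ts, NULL);

        do_noncritical_work();
    }
    return NULL;
}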

Related

Is it possible a lock wouldn't release in a while loop

I have two threads using a common semaphore to do some processing. What I noticed is that Thread 1 appears to hog the semaphore, and Thread 2 is never able to acquire it. My running theory is that, perhaps through compiler optimization or thread priority, the scheduler just keeps giving it to Thread 1.
Thread 1:
while (condition) {
    mySemaphore->aquire();
    // do some stuff
    mySemaphore->release();
}
Thread 2:
mySemaphore->aquire();
// block of code I never reach...
mySemaphore->release();
As soon as I add a delay before Thread 1's next iteration, it allows Thread 2 in, which I think confirms my theory.
Basically, for this to work I might need some sort of ordering-aware lock. Does my reasoning make sense?
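For illustration, one form of ordering-aware lock is a ticket lock; this is a minimal sketch (not from the original post) that serves threads strictly in arrival order, so a fast looper cannot starve a waiter:

#include <condition_variable>
#include <mutex>

class TicketLock {
    std::mutex m_;
    std::condition_variable cv_;
    unsigned long next_ticket_ = 0; // next ticket to hand out
    unsigned long now_serving_ = 0; // ticket currently allowed in
public:
    void acquire() {
        std::unique_lock<std::mutex> lk(m_);
        unsigned long my_ticket = next_ticket_++;
        cv_.wait(lk, [&] { return now_serving_ == my_ticket; });
    }
    void release() {
        std::lock_guard<std::mutex> lk(m_);
        ++now_serving_;
        cv_.notify_all(); // wake all waiters; only the next ticket proceeds
    }
};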

Making a gather/barrier function with System V Semaphores

I'm trying to implement a gather function that waits for N processes to continue.
struct sembuf operations[2];
operations[0].sem_num = 0;
operations[0].sem_op = -1; // wait() or P(): decrement by 1
operations[0].sem_flg = 0;
operations[1].sem_num = 0;
operations[1].sem_op = 0;  // wait until it becomes 0
operations[1].sem_flg = 0;
semop(this->id, operations, 2);
Initially, the value of the semaphore is N.
The problem is that it freezes even when all processes have executed the semop call. I think it is related to the fact that the operations are executed atomically (but I don't know exactly what that means), and I don't understand why it doesn't work.
Is the code supposed to subtract 1 from the semaphore and then block the process if it's not the last one, or is it supposed to act in a different way?
It's hard to see what the code does without the whole function and algorithm.
By the looks of it, you apply two actions in a single atomic step: subtract 1 from the semaphore and wait for it to become 0.
There could be several issues if all processes freeze: the semaphore is not shared between all processes, you got the number of processes wrong when initializing the semaphore, or one process leaves the barrier, increases the semaphore at a later point, and returns to the barrier.
I suggest debugging to verify that all processes actually reach the barrier, and maybe even printing each time you perform any action on the semaphore (preferably to the same console).
As for what an atomic action is: it is a single operation, or a sequence of operations, that is guaranteed not to be interrupted while being executed. This means no other process/thread will interfere with the action.
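Since semop applies the whole set atomically, the decrement never takes effect unless the wait-for-zero part can also complete, so with N > 1 no process ever lowers the count. A common fix, sketched here under that assumption, is to split the two operations into separate calls:

#include <sys/sem.h>

// Hedged sketch: barrier arrival for a semaphore initialized to N.
void barrier_wait(int semid)
{
    struct sembuf dec;
    dec.sem_num = 0;
    dec.sem_op = -1;  // register arrival: value goes N, N-1, ..., 0
    dec.sem_flg = 0;
    semop(semid, &dec, 1);

    struct sembuf wait0;
    wait0.sem_num = 0;
    wait0.sem_op = 0; // block until the value reaches 0, i.e. all N arrived
    wait0.sem_flg = 0;
    semop(semid, &wait0, 1);
}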

Lock two mutex at same time

I'm trying to implement a multi-in multi-out inter-thread channel class. I have three mutexes: full, locked when the buffer is full; empty, locked when the buffer is empty; and th, locked when anyone else is modifying the buffer. My single-IO code looks like:
operator<<(...) {
    full.lock()   // blocks when trying to push to a full buffer
    full.unlock() // whether it was locked or not, unlock it
    th.lock()
    ...
    empty.unlock() // it won't be empty anymore
    if (...) full.lock() // it might be full now
    th.unlock()
}
operator>>(...) {
    // symmetric
}
This works totally fine for single IO. But with multiple IO, when a consumer thread unlocks full, all provider threads wake up, only one obtains th, and the buffer might be full again because of that single thread, while there is no full check anymore. I can add another full.lock(), of course, but this is endless. Is there any way to lock full and th at the same time? I did see a similar question about this, but I don't see how ordering is the problem here.
Yes, use std::lock(full, th); this avoids some deadlocks.
for example:
thread1:
full.lock();
th.lock();
thread2:
th.lock();
full.lock();
this can cause a deadlock, but the following cannot:
thread1:
std::lock(full, th);
thread2:
std::lock(th, full);
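For reference (C++17, beyond what the answer above shows), std::scoped_lock wraps the same deadlock-avoidance algorithm in an RAII guard:

#include <mutex>

std::mutex full, th;

void push()
{
    // Locks both mutexes with the std::lock deadlock-avoidance
    // algorithm and releases both when the scope ends.
    std::scoped_lock guard(full, th);
    // ... critical section ...
}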
No, you can't atomically lock two mutexes.
Additionally, it looks like you are locking a mutex in one thread and then unlocking it in another. That's not allowed.
I suggest switching to condition variables for this problem. Note that it's perfectly fine to have one mutex associated with multiple condition variables.
No, you cannot lock two mutexes at once, but you can use a std::condition_variable for the waiting threads and invoke notify_one when you are done.
See here for further details.
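A minimal sketch of that approach (capacity and buf are illustrative names): one mutex guards the buffer, and two condition variables signal "not full" and "not empty":

#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

std::mutex m;
std::condition_variable not_full, not_empty;
std::deque<int> buf;
const std::size_t capacity = 16;

void push(int v)
{
    std::unique_lock<std::mutex> lk(m);
    not_full.wait(lk, [] { return buf.size() < capacity; });
    buf.push_back(v);
    not_empty.notify_one(); // wake one waiting consumer
}

int pull()
{
    std::unique_lock<std::mutex> lk(m);
    not_empty.wait(lk, [] { return !buf.empty(); });
    int v = buf.front();
    buf.pop_front();
    not_full.notify_one();  // wake one waiting producer
    return v;
}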
The functionality you are trying to achieve would require something similar to System V semaphores, where a group of operations on semaphores can be applied atomically. In your case you would have 3 semaphores:
semaphore 1 - locking, initialized to 0
semaphore 2 - counter of available data, initialized to 0
semaphore 3 - counter of available buffers, initialized to however many buffers you have
then the push operation would apply this group to lock:
check semaphore 1 is 0
increase semaphore 1 by +1
increase semaphore 2 by +1
decrease semaphore 3 by -1
then
decrease semaphore 1 by -1
to unlock. Then, to pull data, the first group would be changed to:
check semaphore 1 is 0
increase semaphore 1 by +1
decrease semaphore 2 by -1
increase semaphore 3 by +1
The unlock is the same as before. Mutexes, which are special-case semaphores, most probably would not solve your problem this way. First of all, they are binary, i.e. they only have two states, but more importantly, the API does not provide group operations on them. So either find a semaphore implementation for your platform, or use a single mutex with condition variable(s) to signal waiting threads that data or a buffer is available.
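A hedged sketch of the push group with System V semop, using the semaphore numbering from the list above (0-based indices; error handling omitted):

#include <sys/sem.h>

// Push lock: atomically { wait sem0 == 0, sem0 += 1, sem1 += 1, sem2 -= 1 }.
// If any sub-operation would block (lock held, or no free buffer),
// none of them is applied and the caller sleeps.
void push_lock(int semid)
{
    struct sembuf ops[4];
    ops[0].sem_num = 0; ops[0].sem_op = 0;  ops[0].sem_flg = 0; // check lock is free
    ops[1].sem_num = 0; ops[1].sem_op = 1;  ops[1].sem_flg = 0; // take the lock
    ops[2].sem_num = 1; ops[2].sem_op = 1;  ops[2].sem_flg = 0; // one more item available
    ops[3].sem_num = 2; ops[3].sem_op = -1; ops[3].sem_flg = 0; // one fewer free buffer
    semop(semid, ops, 4);
}

void unlock(int semid)
{
    struct sembuf op = {0, -1, 0}; // sem0 -= 1: release the lock
    semop(semid, &op, 1);
}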

QThread always one thread left behind

This is my thread:
class calThread : public QThread
{
    Q_OBJECT
public:
    calThread(QList<int> number);
    ~calThread();
    void run();
    QList<int> cal(QList<int> input);
signals:
    void calFinished(QList<int> result);
private:
    QList<int> number; // input list stored by the constructor
    QList<int> output;
};

void calThread::run()
{
    output = cal(number);
    emit calFinished(output);
    sleep(1);
}
This is how I call the thread:
calThread* worker3 = new calThread(numberList);
connect(worker3, SIGNAL(calFinished(QList<int>)), this, SLOT(handleResult(QList<int>)));
connect(worker3, SIGNAL(finished()), worker3, SLOT(deleteLater()));
worker3->start();
I have a large list of input. I divide the list into four equal-sized lists and put them into individual threads to calculate; they are named worker0 to worker3.
Every time the program runs, the four threads start at about the same time, but there is always one thread that returns much, much slower. For example, it takes about 2 minutes for the first 3 threads to finish, while the fourth thread takes maybe 5 minutes to return.
But all threads should have the same number of items and the same complexity to calculate.
Why is there always a thread left behind?
Debug output:
inputThread0 item numbers: 1736
inputThread1 item numbers: 1736
inputThread2 item numbers: 1736
inputThread3 item numbers: 1737
"20:29:58" Thread 0 Thread ID 0x7f07119df700
"20:29:58" Thread 3 Thread ID 0x7f06fc1d3700
"20:29:58" Thread 1 Thread ID 0x7f06fd1d5700
"20:29:58" Thread 2 Thread ID 0x7f06fc9d4700
"20:29:58" Thread 0 Thread ID 0x7f07119df700
"20:29:58" Thread 1 Thread ID 0x7f06fd1d5700
….............................
//Most of them are Thread 0,1,2 afterward
….............................
"20:29:58" Thread 1 Thread ID 0x7f06fd1d5700
// This is last Thread from thread 0,1,or2
// It takes less than one second to finish
"20:29:59" Thread 3 Thread ID 0x7f06fc1d3700
"20:29:59" Thread 3 Thread ID 0x7f06fc1d3700
"20:29:59" Thread 3 Thread ID 0x7f06fc1d3700
"20:29:59" Thread 3 Thread ID 0x7f06fc1d3700
….................................
// Only Thread 3 left
"20:30:17" Thread 3 Thread ID 0x7f06fc1d3700
// This last thread takes 19 second to finish
"Why is there always a thread left behind?"
Why not? Thread scheduling is completely at the whim of the OS. There is no guarantee at all that any thread will get any sort of "fair share" of any CPU resources. You need to assign small chunks of work and have them automatically distributed across the worker threads. QtConcurrent::run and the QtConcurrent framework in general offer a trivial way of getting that done. As long as the chunks of work passed to run, mapReduce, etc. are reasonably sized (say, taking between 0.1 and 1 s), all of the threads in the pool will be done within a couple tenths of a second of each other.
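As a minimal sketch of that approach (calOne and the doubling are placeholders for the real per-item calculation):

#include <QtConcurrent>
#include <QList>

// Hypothetical per-item worker; QtConcurrent slices the list into
// small chunks and balances them across the global thread pool.
int calOne(const int &value)
{
    return value * 2; // stand-in for the real calculation
}

void startCalculation(const QList<int> &numberList)
{
    QFuture<int> future = QtConcurrent::mapped(numberList, calOne);
    future.waitForFinished();              // or watch it with QFutureWatcher
    QList<int> results = future.results(); // results arrive in input order
}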
A partial explanation for your observed behavior is that a thread that already runs on a given core is more likely to be rescheduled on the same core to utilize the warm caches there. If there are three out of four cores that run your threads almost continuously, the fourth thread often ends up sharing the core with your GUI thread, and will necessarily run slower if the GUI is not idle. If the GUI thread is busy processing the results from the other threads, it is not unexpected that the computation thread would be starved on that core. This is actually the most power- and time-efficient way to schedule the threads, with least amount of overhead.
As long as you give the threads small chunks of work and distribute them on an as-ready basis - as QtConcurrent does - you will also get the smallest wall-clock runtimes. If the scheduler were forcing "fair" reschedules, your long-running threads would all finish at roughly the same time, but would take more time and power to finish the job. Modern schedulers let you run your jobs most efficiently, but you must set the jobs up to take advantage of that.
In a way, your scheduler is helping you improve your code to be more resource-efficient. That's a good thing.

Low performance of boost::barrier, wait operation

I have a performance issue with boost::barrier. I measure the time of the wait method call; in the single-thread case, when the call to wait is repeated around 100000 times, it takes around 0.5 sec. Unfortunately, in the two-thread scenario this time expands to 3 seconds, and it gets worse with every thread (I have an 8-core processor).
I implemented a custom method which provides the same functionality and is much faster.
Is it normal for this method to be so slow? Is there a faster way to synchronize threads in boost (so that all threads wait for completion of the current job by all threads and then proceed to the next task; just synchronization, no data transmission is required)?
I have been asked for my current code.
What I want to achieve: in a loop, I run a function that can be divided among many threads, but all threads should finish the current loop run before execution of the next run.
My current solution
volatile int barrierCounter1 = 0; //number of threads which completed the current loop run
volatile bool barrierThread1[NumberOfThreads]; //"go" signal for all threads with id > 0; all false at the beginning
boost::mutex mutexSetBarrierCounter; //mutex for barrierCounter1 modification

void ProcessT(int threadId)
{
    do
    {
        DoWork(); //function which should be executed by every thread
        mutexSetBarrierCounter.lock();
        barrierCounter1++; //every thread notifies that it finished the function
        mutexSetBarrierCounter.unlock();
        if (threadId == 0)
        {
            //main thread (0) waits for completion of all threads
            while (barrierCounter1 != NumberOfThreads)
            {
                //I assume the number of threads is lower than the number of
                //processor cores, so this spin loop should not hurt performance
            }
            //all threads completed; notify the others that they can proceed
            for (int i = 0; i < NumberOfThreads; i++)
            {
                barrierThread1[i] = true;
            }
            //clear the counter; no lock is used because the other threads wait in the else branch
            barrierCounter1 = 0;
        }
        else
        {
            //the other threads wait for the "go" signal
            while (barrierThread1[threadId] == false)
            {
            }
            //once allowed to proceed, the thread clears its own flag;
            //no lock is used because thread 0 won't modify it until all threads complete the run
            barrierThread1[threadId] = false;
        }
    } while (!end);
}
Locking runs counter to concurrency. Lock contention is always the worst behaviour.
IOW: thread synchronization (in itself) never scales.
Solution: only use synchronization primitives in situations where contention will be low (the threads need to synchronize "relatively rarely"[1]), or do not try to employ more than one thread for the job that contends for the shared resource.
Your benchmark seems to magnify the very worst-case behavior by making all threads always wait. If there is a significant workload on all workers between barriers, the overhead will dwindle and could easily become insignificant.
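For scale, a hedged sketch of that pattern with boost::barrier (doWork is a placeholder for the real per-iteration workload): the cost of wait is amortized when each run does substantial work between barriers:

#include <boost/thread/barrier.hpp>

const int NumberOfThreads = 4;
boost::barrier loopBarrier(NumberOfThreads);

void doWork(int threadId, int run); // placeholder: the real workload

void processT(int threadId)
{
    for (int run = 0; run < 1000; ++run)
    {
        doWork(threadId, run); // significant work between barriers
        loopBarrier.wait();    // all threads finish this run before the next
    }
}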
Trust your profiler
Profile only your application code (no silly synthetic benchmarks)
Prefer non-threading to threading (remember: asynchrony != concurrency)
[1] Which is highly relative and subjective