I am learning multithreading and, for the sake of understanding, I have written a small function using multiple threads. It works fine, but I just want to know whether it is thread-safe - did I follow the correct rules?
void CThreadingEx4Dlg::OnBnClickedOk()
{
    // in thread1, 100 elements are copied to myShiftArray (which is a CStringArray)
    thread1 = AfxBeginThread((AFX_THREADPROC)MyThreadFunction1, this);
    WaitForSingleObject(thread1->m_hThread, INFINITE);
    // thread2 waits for thread1 to finish, because thread2 is going to make
    // use of myShiftArray (which thread1 processes first)
    thread2 = AfxBeginThread((AFX_THREADPROC)MyThreadFunction2, this);
    thread3 = AfxBeginThread((AFX_THREADPROC)MyThreadFunction3, this);
}
UINT MyThreadFunction1(LPARAM lparam)
{
    CThreadingEx4Dlg* pthis = (CThreadingEx4Dlg*)lparam;
    pthis->MyFunction(0, 100);
    return 0;
}

UINT MyThreadFunction2(LPARAM lparam)
{
    CThreadingEx4Dlg* pthis = (CThreadingEx4Dlg*)lparam;
    pthis->MyCommonFunction(0, 20);
    return 0;
}

UINT MyThreadFunction3(LPARAM lparam)
{
    CThreadingEx4Dlg* pthis = (CThreadingEx4Dlg*)lparam;
    WaitForSingleObject(pthis->thread3->m_hThread, INFINITE);
    // here thread3 waits for thread2 to finish so that this thread can continue
    pthis->MyCommonFunction(21, 40);
    return 0;
}
void CThreadingEx4Dlg::MyFunction(int minCount, int maxCount)
{
    for (int i = minCount; i < maxCount; i++)
    {
        // assume myArray is a CStringArray and it has 100 elements added to it
        // myShiftArray is a CStringArray - public to the class
        CString temp;
        temp = myArray.GetAt(i);
        myShiftArray.Add(temp);
    }
}
void CThreadingEx4Dlg::MyCommonFunction(int min, int max)
{
    for (int i = min; i < max; i++)
    {
        CSingleLock myLock(&myCS, TRUE);
        CString temp;
        temp = myShiftArray.GetAt(i);
        // threadArray is a CStringArray - public to the class
        threadArray.Add(temp);
    }
    myEvent.PulseEvent();
}
Which function do you intend to be "thread-safe"?
I think that the term should be applied to your CommonFunction. This is a function that you intend to be called by several (two, in this first case) threads.
I think your code has a rule along the lines of:
Thread 2: do some work;
meanwhile, Thread 3: wait until Thread 2 finishes, then do some work.
In fact your code has
WaitForSingleObject(pthis->thread3->m_hThread,INFINITE);
which maybe waits for the wrong thread? (It waits on thread3's own handle, though the comment says it should wait for thread 2.)
But back to thread safety. Where is the policing of the safety? It's in the control logic of your threads. Suppose you had lots of threads, how would you extend what you've written? You have lots of logic of the kind:
if thread a has finished and thread b has finished ...
Really hard to get right and maintain. Instead you need to make CommonFunction truly thread safe, that is it needs to tolerate being called by several threads at the same time.
In this case you might do that by putting some kind of mutex around the critical part of the code, which perhaps in this case is the whole function - it's not clear whether you intend to keep the items you copy together or whether you mind if the values are interleaved.
In the latter case the only question is whether myArray and myShiftArray are thread-safe collections:
temp = myArray.GetAt(i);
myShiftArray.Add(temp);
All your other variables are local, on the stack, and so owned by the current thread - so you just need to consult the documentation for those collections to determine whether they can safely be used from separate threads.
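For reference: the MFC collection classes (CStringArray included) are not internally synchronized, so concurrent writers - or a writer alongside readers - must be serialized by the caller. A minimal sketch of what that would mean if MyFunction ever ran concurrently with MyCommonFunction, reusing the question's own myCS:

void CThreadingEx4Dlg::MyFunction(int minCount, int maxCount)
{
    for (int i = minCount; i < maxCount; i++)
    {
        CSingleLock lock(&myCS, TRUE);      // same myCS used by MyCommonFunction
        myShiftArray.Add(myArray.GetAt(i)); // shared write now serialized
    } // lock releases the critical section at end of scope
}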
As I've pointed out before, what you are doing is entirely pointless: you may as well not use threads, since you fire a thread off and then wait for it to complete before doing anything further.
You give precious little information about your CEvent, but your WaitForSingleObject calls are waiting for the threads to enter a signalled state (i.e. for them to exit).
As MyCommonFunction is where the actual, potentially thread-unsafe work occurs, you have correctly critical-sectioned the area; however, threads 2 and 3 don't run concurrently. Remove the WaitForSingleObject from MyThreadFunction3 and you will have both running concurrently in a thread-safe manner, thanks to the critical section.
That said, it's still a tad pointless, as both threads are going to spend most of their time waiting for the critical section to come free. In general you want to structure threads so that there is precious little they need to hit critical sections for, and then, when they do hit one, to hold it only for a very short time (i.e. not for the vast majority of the function's processing time).
Edit:
A critical section works by saying "I'm holding this critical section; anything else that wants it has to wait." This means that Thread 1 enters the critical section and begins to do what it needs to do. Thread 2 then comes along and says "I want to use the critical section." The kernel tells it "Thread 1 is using the critical section; you have to wait your turn." Thread 3 comes along and is told the same thing. Threads 2 and 3 are now in a wait state, waiting for that critical section to come free. When Thread 1 finishes with the critical section, Threads 2 and 3 race to see who gets to hold it first, and when one obtains it the other has to continue waiting.
Now, in your example above there would be so much waiting on critical sections that Thread 1 could be inside the critical section with Thread 2 waiting and, before Thread 2 has been given the chance to enter it, Thread 1 has looped back round and re-entered it. This means that Thread 1 could end up doing all its work before Thread 2 ever gets a chance to enter the critical section. Therefore, keeping the amount of work done inside the critical section as low as possible, relative to the rest of the loop/function, will help the threads run simultaneously. In your example one thread will ALWAYS be waiting for the other, and hence just doing the work serially may actually be faster, as you have no kernel threading overheads.
i.e. the more you avoid critical sections, the less time is lost to threads waiting for each other. They are necessary, however, as you NEED to make sure that two threads don't try to operate on the same object at the same time. Certain built-in operations are "atomic", which can aid you here, but for non-atomic operations a critical section is a must.
An Event is a different sort of synchronisation object. Basically, an event is an object that can be in one of two states: signalled or not-signalled. If you WaitForSingleObject on a not-signalled event, the thread will be put to sleep until the event enters a signalled state.
This can be useful when you have a thread that MUST wait for another thread to complete something. In general though you want to avoid using such synchronisation objects as much as possible as it destroys the parallel-ness of your code.
Personally, I use them when I have a worker thread waiting until it needs to do something. The thread sits in a wait state most of the time; when some background processing is required, I signal the event, the thread jumps into life, does what it needs to do, then loops back round and re-enters the wait state. You can also use a variable to indicate that the thread needs to exit: set the exit variable to true and then signal the waiting thread; it wakes up, sees "I should exit", and exits. Be warned, though, that you may need a memory barrier to guarantee that the exit variable is set before the event is signalled, otherwise the compiler might re-order the operations. That could leave your thread waking up, finding the exit variable isn't set, doing its thing and going back to sleep - while the thread that originally sent the signal now assumes the worker has exited when it actually hasn't.
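A minimal sketch of that worker pattern, assuming an MFC dialog class (here a hypothetical CMyDlg) with a CEvent member m_workReady and a std::atomic<bool> member m_exit - the atomic also provides the ordering guarantee mentioned above:

UINT WorkerThread(LPVOID param)
{
    CMyDlg* pThis = (CMyDlg*)param;
    for (;;)
    {
        // sleep until the event is signalled (auto-reset CEvent assumed)
        WaitForSingleObject(pThis->m_workReady.m_hObject, INFINITE);
        if (pThis->m_exit)           // set *before* SetEvent() by the shutdown code
            return 0;
        pThis->DoBackgroundWork();   // hypothetical work routine
    }
}
// Shutdown: pThis->m_exit = true; then pThis->m_workReady.SetEvent();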
Whoever said multi-threading was easy eh? ;)
It looks like this is going to work because the threads aren't going to do any work concurrently. If you change the code to make the threads perform work at the same time, you will need to put mutexes (in MFC you can use CCriticalSection for that) around the code that accesses data members which are shared between the threads.
I have a program structured like this: one thread receives tasks and writes them to an input queue, multiple threads process them and write to an output queue, and one thread responds with the results. When a queue is empty, a thread sleeps for several milliseconds. The queue has a mutex inside it; pushing does lock(), and popping does try_lock() and returns if there is nothing in the queue.
This is processing thread for example:
// working - atomic bool
while (working) {
    if (!inputQue_->pop(msg)) {
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
        continue;
    } else {
        string reply = messageHandler_->handle(msg);
        if (!reply.empty()) {
            outputQue_->push(reply);
        }
    }
}
And the thing that I don't like is the latency between receiving a task and responding: as I have measured with high_resolution_clock, it is almost zero when there is no sleeping, but with the sleeping it becomes much bigger.
I don't want CPU resources to be wasted, so I want to do something like this: when the receiving thread gets a task, it notifies one of the processing threads (which is blocked in wait_for), and when the processing of the task is done, that thread notifies the responding thread the same way. As a result I think I will get lower latency and CPU resources will not be wasted. And I have some questions:
Will this work the way that I see it supposed to, with the only difference being waking up on notify?
To do this, do I have to create two condition variables: one shared by the receiving thread and all processing threads, and a second shared by all processing threads and the responding thread? And does the mutex in the processing threads have to be common to all of them, or unique?
Can I place the creation of unique_lock(mutex) and the wait_for() in the if branch, just instead of sleep_for?
If some processing threads are busy, is it possible that notify_one() tries to wake one of them rather than a free thread? Do I need to use notify_all()?
Is it possible that a notify will not wake up any of the threads? If so, is it likely?
Will this work the way that I see it supposed to, with the only difference being waking up on notify?
Yes, assuming you do it correctly.
To do this, do I have to create two condition variables: one shared by the receiving thread and all processing threads, and a second shared by all processing threads and the responding thread? And does the mutex in the processing threads have to be common to all of them, or unique?
You can use a single mutex and a single condition variable, but that makes it a bit more complex. I'd suggest a single mutex, but one condition variable for each condition a thread might want to wait for.
Can I place the creation of unique_lock(mutex) and the wait_for() in the if branch, just instead of sleep_for?
Absolutely not. You need to hold the mutex while you check whether the queue is empty and continue to hold it until you call wait_for. Otherwise, you destroy the entire logic of the condition variable. The mutex associated with the condition variable must protect the condition that the thread is going to wait for, which in this case is the queue being non-empty.
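To illustrate, a minimal sketch of a pop that follows this rule (the names are illustrative, not the asker's actual queue API):

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

std::mutex queueMutex;                 // guards inputQueue
std::condition_variable queueNotEmpty; // signalled by the producer after a push
std::queue<std::string> inputQueue;

std::string blockingPop() {
    // The mutex is held across the emptiness check and the wait, so a push
    // that happens between the two cannot be missed.
    std::unique_lock<std::mutex> lock(queueMutex);
    queueNotEmpty.wait(lock, [] { return !inputQueue.empty(); });
    std::string msg = std::move(inputQueue.front());
    inputQueue.pop();
    return msg;                        // lock released by unique_lock's destructor
}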
If some processing threads are busy, is it possible that notify_one() tries to wake one of them rather than a free thread? Do I need to use notify_all()?
I don't know what you mean by "the free thread". As a general rule, you can use notify_one if it's not possible for a thread that can't handle the condition to be blocked on the condition variable. You should use notify_all if more than one thread might need to be awoken, or if there's a possibility that the "wrong" thread could be woken - that is, that at least one of the blocked threads can't do whatever it is that needs to be done.
Is it possible that a notify will not wake up any of the threads? If so, is it likely?
Sure, it's quite possible - but that would just mean no threads were blocked on the condition variable at that moment. Nothing is lost as a result, because a thread can only block after checking the condition, and it does that check while holding the mutex. Providing this atomic "unlock and wait" semantic is the entire purpose of a condition variable.
The mechanism you have is called polling: the thread repeatedly checks (polls) whether there is data available. As you mentioned, it has the drawback of wasting CPU time (but it is simple). What you would like to use instead is called a blocking mechanism: it deschedules the thread until the moment that work becomes available.
1) Yes (although I don't know exactly what you're imagining).
2) a) Yes, two condition variables is one way to do it. b) A common mutex is best.
3) You would probably place those within pop, and calling pop would then have the potential to block (see the sketch below).
4) No. notify_one will only wake a thread that is currently waiting, having called wait. Also, if multiple threads are waiting, it is not necessarily guaranteed which one will receive the notification (this is OS/library dependent).
5) No. If one or more threads are waiting, notify_one is guaranteed to wake one. BUT if no threads are waiting, the notification is consumed (and has no effect). Note that under certain edge conditions, notify_one may actually wake more than one thread. Also, a thread may wake from wait without anyone having called notify_one (a "spurious wakeup"). The fact that this can happen at all means you always have to re-check the condition after waking.
This is called the producer/consumer problem btw.
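For completeness, here is a sketch of what such a blocking queue might look like, with a stop flag so that shutdown can interrupt the wait (all names illustrative):

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

class BlockingQueue {
public:
    void push(std::string msg) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(msg)); }
        cv_.notify_one();              // wake one waiting consumer
    }
    bool pop(std::string& out) {       // blocks; returns false only on shutdown
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || stopped_; });
        if (q_.empty()) return false;  // woken by stop(), nothing left to do
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
    void stop() {
        { std::lock_guard<std::mutex> lock(m_); stopped_ = true; }
        cv_.notify_all();              // release every blocked consumer
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
    bool stopped_ = false;
};

With this, the processing loop from the question shrinks to a pop that blocks, with no sleep_for and no wasted wakeups.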
In general, your considerations about the condition variable are correct. My proposal is more about the design and reusability of such functionality.
The main idea is to implement the ThreadPool pattern, which has a constructor taking the number of worker threads, and the methods submitTask, shutdown, and join.
Having such a class, you would use two instances of the pool: one multithreaded for processing, and a second (single-threaded, by your choice) for sending the results.
The pool consists of a blocking queue of tasks and an array of worker threads, each performing the same "pop task and run" loop. The blocking queue encapsulates the mutex and condition variable. The task is a common functor.
This also moves your design towards a task-oriented approach, which has a lot of advantages for the future of your application.
You are welcome to ask more questions about implementation details if you like this idea.
Best regards, Daniel
I've got a class named TThreadpool, which holds a member pool of type std::vector<std::thread>, with the following destructor:
~TThreadpool() {
    for (size_t i = 0; i < pool.size(); i++) {
        assert(pool[i].joinable());
        pool[i].join();
    }
}
I'm confident that when the destructor is called, all of the threads are waiting on a single condition variable (spurious wakeups are controlled with an always-false predicate), and joinable returns true.
Reduced example of running thread would be:
void my_thread() {
    std::unique_lock<std::mutex> lg(mutex);
    while (true) {
        my_cond_variable.wait(lg, [] {
            return false;
        });
        // do some work and possibly break, but never gets further than the wait,
        // so this probably should not matter
    }
}
To check what threads are running, I'm launching top -H. At the start of the program, there are pool.size() threads + 1 thread where TThreadpool itself lives. And to my surprise, joining these alive threads does not remove them from list of threads that top is giving. Is this expected behaviour?
(Originally, my program was a bit different - I made a simple UI application using Qt that used a threadpool running in the UI thread plus other threads controlled by the threadpool, and on closing the UI window the joining of threads was invoked. But QtCreator said my application was still running after I closed the window, requiring me to shut it down with a crash. That made me check the state of my threads, and it turned out the problem had nothing to do with Qt. I'm adding this in case I missed some obvious detail with Qt.)
A bit later, I tried printing joinable instead of asserting it, and found out that the loop inside the TThreadpool destructor never moved past the first join - behaviour I did not expect and cannot explain.
join() doesn't do anything to the child thread -- all it does is block until the child thread has exited. It only has an effect on the calling thread (i.e. by blocking its progress). The child thread can keep running for as long as it wants (although typically you'd prefer it to exit quickly, so that the thread calling join() doesn't get blocked for a long time -- but that's up to you to implement).
And to my surprise, joining these alive threads does not remove them from list of threads that top is giving. Is this expected behaviour?
That suggests the thread(s) are still running. Calling join() on a thread doesn't have any impact on that running thread; the calling thread simply waits for the called-on thread to exit.
found out that the loop inside the TThreadpool destructor never moved past the first join
That means the first thread hasn't completed yet. So none of the other threads have been joined yet either (even if they have exited).
However, if the thread function is implemented correctly, the first thread (and all other threads in the pool) should eventually complete and the join() calls should return - assuming the threads in the pool are supposed to exit, which doesn't need to be true in general (depending on the application, you could make the threads run forever, too).
So it appears there's some sort of deadlock or wait on a resource that's holding up one or more threads. Note that in the reduced example shown, the predicate passed to wait always returns false, so the worker can never return from wait - which by itself would make the first join() block forever. You need to run it through a debugger.
Helgrind would be very useful.
You could also try reducing the number of threads (say, to 2) to see if the problem becomes reproducible/obvious, and then increase the count again from there.
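For illustration, here is how the reduced example would have to change for join() to ever return - the predicate must be able to become true, here via a hypothetical stop_ flag (a global, like mutex and my_cond_variable in the example) set under the same mutex; workAvailable() is likewise an illustrative stand-in for real pool state:

void my_thread() {
    std::unique_lock<std::mutex> lg(mutex);
    while (true) {
        my_cond_variable.wait(lg, [] { return stop_ || workAvailable(); });
        if (stop_) break;   // stop_ is set under 'mutex' before notify_all()
        // ... pop a task and run it ...
    }
}
// ~TThreadpool() would set stop_ while holding 'mutex', call
// my_cond_variable.notify_all(), and only then join the threads.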
I am trying to incorporate threads into my project, but have a problem where using merely one worker thread makes it "fall asleep" permanently. Perhaps I have a race condition, but I just can't see it.
My PeriodicThreads object maintains a collection of threads. Once PeriodicThreads::exec_threads() has been invoked, the threads are notified, wake up and perform their task. Afterwards, they fall back to sleep.
Function of such a worker-thread:
void PeriodicThreads::threadWork(size_t threadId){
    // not really used, but needed to declare in order to use condition_variable:
    std::mutex mutex;
    std::unique_lock<std::mutex> lck(mutex);
    while (true){
        // wait until told to start working on a task:
        while (_thread_shouldWork[threadId] == false){
            _threads_startSignal.wait(lck);
        }
        thread_iteration(threadId); // virtual function
        _thread_shouldWork[threadId] = false; // vector of flags
        _thread_doneSignal.notify_all();
    } // end while(true) - run until terminated externally or this whole obj is deleted
}
As you can see, each thread is monitoring its own entry in a vector of flags, and once it sees that its flag is true, it performs the task and then resets its flag.
Here is the function that can awaken all the threads:
std::atomic_bool _threadsWorking = false;

// blocks the current thread until all worker threads have completed:
void PeriodicThreads::exec_threads(){
    if (_threadsWorking){
        throw std::runtime_error("you requested exec_threads(), but threads haven't yet finished executing the previous task!");
    }
    _threadsWorking = true; // NOTICE: doing this after the exception check.
    // tell all threads to unpause by setting their flags to 'true'
    std::fill(_thread_shouldWork.begin(), _thread_shouldWork.end(), true);
    _threads_startSignal.notify_all();
    // wait for threads to complete:
    std::mutex mutex;
    std::unique_lock<std::mutex> lck(mutex); // lock & mutex are not really used.
    auto isContinueWaiting = [&]() -> bool {
        bool threadsWorking = false;
        for (size_t i = 0; i < _thread_shouldWork.size(); ++i){
            threadsWorking |= _thread_shouldWork[i];
        }
        return threadsWorking;
    };
    while (isContinueWaiting()){
        _thread_doneSignal.wait(lck);
    }
    _threadsWorking = false; // set atomic to false
}
Invoking exec_threads() works fine for several hundred, or in rare cases several thousand, consecutive iterations. The invocations occur from the main thread's while loop. The worker thread processes the task, resets its flag and goes back to sleep until the next exec_threads(), and so on.
However, some time after that, the program snaps into a "hibernation": it seems to pause, but doesn't crash.
During such a "hibernation", putting a breakpoint at any while-loop around my condition_variables never actually causes that breakpoint to trigger.
Being sneaky, I've created my own verify-thread (parallel to main) to monitor my PeriodicThreads object. As the object falls into hibernation, my verify-thread keeps printing to the console that no threads are currently running (the _threadsWorking atomic of PeriodicThreads is permanently false). In other tests, however, the atomic remains true once the hibernation issue begins.
The strange thing is that if I force PeriodicThreads::run_thread to sleep for at least 10 microseconds before resetting its flag, things work as normal and no "hibernation" occurs. Otherwise, if the thread is allowed to complete its task very quickly, it can cause this whole issue.
I've wrapped each condition_variable's wait inside a while loop to prevent spurious wakeups from triggering the transition, and to handle the situation where notify_all is called before .wait() is called on it.
Notice that this occurs even when I have only one worker thread.
What could be the cause?
Edit
Abandoning the vector of flags and just testing on a single atomic_bool with one worker thread still shows the same issue.
All shared data should be protected by a mutex. The mutex should have (at least) the same scope as the shared data.
Your _thread_shouldWork container is shared data. You can make a global array of mutexes and each one can protect its own _thread_shouldWork element. (see note below). You should also have at least as many condition variables as you have mutexes. (You can use 1 mutex with several different condition variables, but you should not use several different mutexes with 1 condition variable.)
A condition_variable should protect an actual condition (in this case, the state of an individual element of _thread_shouldWork at any given point) and the mutex is used to protect the variables that encompass that condition.
If you're just using a random local mutex (as you are in your thread code) or just not using a mutex at all (in the main code), then all bets are off. It's undefined behavior. Although I could see it working (by luck) most of the time. What I suspect is happening is that a worker thread is missing the signal from the main thread. It could also be that your main thread is missing the signal from a worker thread. (Thread A reads the state and enters the while loop, then Thread B changes the state and sends the notification, then Thread A goes to sleep... waiting for a notification that was already sent)
Mutexes with local scope are a red flag!
Note: If you're using a vector, you have to watch out because adding or removing items can trigger a resize which will touch elements without grabbing the mutex first (because of course the vector doesn't know about your mutex).
You also have to watch out for false sharing when using arrays.
Edit: here's a video that @Kari found useful for explaining false sharing:
https://www.youtube.com/watch?v=dznxqe1Uk3E
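Putting that together, a sketch of the worker rewritten under those rules - a hypothetical shared member _mutex guards _thread_shouldWork, and the other names are from the question:

void PeriodicThreads::threadWork(size_t threadId){
    std::unique_lock<std::mutex> lck(_mutex);  // shared member mutex, not a local
    while (true){
        _threads_startSignal.wait(lck, [&]{ return _thread_shouldWork[threadId]; });
        lck.unlock();
        thread_iteration(threadId);            // heavy work without the lock held
        lck.lock();
        _thread_shouldWork[threadId] = false;  // flag mutated under the lock
        _thread_doneSignal.notify_all();
    }
}

exec_threads() would likewise lock _mutex while setting the flags, and while re-checking them around _thread_doneSignal.wait().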
Looking at several videos and the documentation example, we unlock the mutex before calling notify_all(). Would it be better to instead call notify_all() first and unlock the mutex after?
The common way:
Inside the Notifier thread:
// prepare data for several worker-threads;
// and now, awaken the threads:
std::unique_lock<std::mutex> lock2(sharedMutex);
_threadsCanAwaken = true;
lock2.unlock();
_conditionVar.notify_all(); // awaken all the worker threads;
// wait until all threads have completed;
// cleanup:
_threadsCanAwaken = false;
// prepare new batches once again, etc, etc
Inside one of the worker threads:
while (true){
    // wait for the next batch:
    std::unique_lock<std::mutex> lock1(sharedMutex);
    _conditionVar.wait(lock1, [](){ return _threadsCanAwaken; });
    lock1.unlock(); // let sibling worker-threads work on their part as well
    // perform the final task
    // signal the notifier that one more thread has completed;
    // loop back and wait until the next task
}
Notice how lock2 is unlocked before we notify the condition variable - should we instead unlock it after the notify_all()?
Edit
From my comment below: my concern is that the worker might spuriously awaken, see that the mutex is unlocked, super-quickly complete the task and loop back to the start of the while. Now the slow-poke notifier finally calls notify_all(), causing the worker to loop an additional time (excessive and undesired).
There are no advantages to unlocking the mutex before signaling the condition variable unless your implementation is unusual. There are two disadvantages to unlocking before signaling:
If you unlock before you signal, the signal may wake a thread that choose to block on the condition variable after you unlocked. This can lead to a deadlock if you use the same condition variable to signal more than one logical condition. This kind of bug is hard to create, hard to diagnose, and hard to understand. It is trivially avoided by always signaling before unlocking. This ensures that the change of shared state and the signal are an atomic operation and that race conditions and deadlocks are impossible.
There is a performance penalty for unlocking before signaling that is avoided by unlocking after signaling. If you signal before you unlock, a good implementation will know that your signal cannot possibly render any thread ready-to-run because the mutex is held by the calling thread and any thread affects by the condition variable necessarily cannot make forward progress without the mutex. This permits a significant optimization (often called "wait morphing") that is not possible if you unlock first.
So signal while holding the lock unless you have some unusual reason to do otherwise.
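In code, the advice comes down to this (a sketch using the names from the question):

{
    std::lock_guard<std::mutex> lock(sharedMutex);
    _threadsCanAwaken = true;    // change the shared state...
    _conditionVar.notify_all();  // ...and signal while still holding the lock
}   // the mutex is released here; woken waiters can now acquire it and proceed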
should we instead unlock it after the notify_all()?
It is correct to do it either way, but you may get different behaviour in different situations. It is quite difficult to predict how it will affect the performance of your program - I've seen both positive and negative effects in different applications. So you had better profile your program and base the decision on your particular situation.
As mentioned on cppreference.com:
The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s); in fact doing so is a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock.
That said, the documentation for wait says:
At the moment of blocking the thread, the function automatically calls lck.unlock(), allowing other locked threads to continue. Once notified (explicitly, by some other thread), the function unblocks and calls lck.lock(), leaving lck in the same state as when the function was called. Then the function returns (notice that this last mutex locking may block the thread again before returning).
So when notified, wait will re-attempt to acquire the lock, and in that process it will be blocked again until the original notifying thread releases the lock.
So I suggest releasing the lock before calling notify, as is done in the example on cppreference.com. And most importantly:
Don't be pessimistic.
David's answer seems to me wrong.
First, assuming the simple case of two threads, one waiting for the other on a condition variable: unlocking first by the notifier will not wake the waiting thread, as the signal has not yet arrived; the notify call that follows will then immediately wake it. You do not need any special optimizations.
On the other hand, signalling first has the potential of waking up a thread and putting it back to sleep immediately, as it cannot take hold of the lock - unless wait morphing is implemented.
Wait morphing does not exist on Linux, at least according to the answer under this StackOverflow question: Which OS / platforms implement wait morphing optimization?
The cppreference example also unlocks first before signalling: https://en.cppreference.com/w/cpp/thread/condition_variable/notify_all
It explicitly says:
The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s). Doing so may be a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock, though some implementations recognize the pattern and do not attempt to wake up the thread that is notified under lock.
should we instead unlock it after the notify_all()?
After reading several related posts, I've formed the opinion that it's purely a performance issue: if the OS supports "wait morphing", unlock after notifying; otherwise, unlock before.
I'm adding an answer here to augment that of @DavidSchwartz. Particularly, I'd like to clarify his point 1:
If you unlock before you signal, the signal may wake a thread that choose to block on the condition variable after you unlocked. This can lead to a deadlock if you use the same condition variable to signal more than one logical condition. This kind of bug is hard to create, hard to diagnose, and hard to understand. It is trivially avoided by always signaling before unlocking. This ensures that the change of shared state and the signal are an atomic operation and that race conditions and deadlocks are impossible.
The first thing I'd say is that, because it's a CV and not a mutex, a better term for the so-called "deadlock" might be "sleep paralysis" - a mistake some programs make is that a thread that's supposed to wake went to sleep because it did not recheck the condition it had been waiting for before waiting again.
The second thing is that, when waking some other thread(s), the default choice should be broadcast/notify_all (broadcast is the POSIX term, equivalent to its C++ counterpart). signal/notify_one is an optimized special case, used when only one other thread is waiting.
Third and finally, David is adamant that it's better to unlock after notify, because it can avoid the "deadlock" I've been referring to as "sleep paralysis".
If it's unlock-then-notify, there's a window where another thread (call it the "wrong" thread) may (i) acquire the mutex, (ii) go into wait, and (iii) wake up. If steps (i), (ii) and (iii) happen quickly enough, they consume the signal, leaving the intended ("correct") thread asleep.
I discussed this extensively with David; he clarified that the "sleep paralysis" only occurs when all three points are violated: (1) the condvar is associated with several separate conditions and/or the condition isn't rechecked before waiting again; (2) only one thread is notified when more than one other thread is using the condvar; (3) unlocking before notifying creates the window for the race condition.
Finally, my recommendation is that points 1 and 2 are essential for the correctness of the program, and fixing issues associated with them should be prioritized over point 3, which should only be an augmentative "last resort".
For reference, the manpages for signal/broadcast and wait contain some information from version 3 of the Single UNIX Specification that explains points 1 and 2, and partly point 3. Although specified for POSIX/Unix/Linux in C, its concepts are applicable to C++.
As of this writing (2023-01-31), the 2018 edition of version 4 of the Single UNIX Specification has been released, and the drafting of version 5 is underway.
I am creating multiple threads in my program. On pressing Ctrl-C, a signal handler is called. Inside the signal handler, I have put exit(0) at the end. The thing is that sometimes the program terminates safely, but other times I get a runtime error stating:
abort() has been called
So what would be the possible solution to avoid the error?
The usual way is to set an atomic flag (like std::atomic<bool>) which is checked by all threads (including the main thread). If set, then the sub-threads exit, and the main thread starts to join the sub-threads. Then you can exit cleanly.
If you use std::thread for the threads, that's a possible reason for the crashes you have. You must join the thread before the std::thread object is destructed.
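A minimal sketch of that shutdown sequence (names illustrative):

#include <atomic>
#include <csignal>
#include <thread>
#include <vector>

std::atomic<bool> quit{false};        // a lock-free atomic is safe to set in a handler

void onSigint(int) { quit = true; }   // no exit(0) here

int main() {
    std::signal(SIGINT, onSigint);
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
        workers.emplace_back([] {
            while (!quit) { /* one unit of work per iteration */ }
        });
    for (auto& t : workers) t.join(); // join before the std::thread objects are destructed
    return 0;                         // clean exit, no abort()
}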
Others have mentioned having the signal-handler set a std::atomic<bool> and having all the other threads periodically check that value to know when to exit.
That approach works well as long as all of your other threads are periodically waking up anyway, at a reasonable frequency.
It's not entirely satisfactory if one or more of your threads is purely event-driven, however -- in an event-driven program, threads are only supposed to wake up when there is some work for them to do, which means that they might well be asleep for days or weeks at a time. If they are forced to wake up every (so many) milliseconds simply to poll an atomic-boolean-flag, that makes an otherwise extremely CPU-efficient program much less CPU-efficient, since now every thread is waking up at short regular intervals, 24/7/365. This can be particularly problematic if you are trying to conserve battery life, as it can prevent the CPU from going into power-saving mode.
An alternative approach that avoids polling would be this one:
On startup, have your main thread create an fd-pipe or socket-pair (by calling pipe() or socketpair())
Have your main thread (or possibly some other responsible thread) include the receiving-socket in its read-ready select() fd_set (or take a similar action for poll() or whatever wait-for-IO function that thread blocks in)
When the signal-handler is executed, have it write a byte (any byte, doesn't matter what) into the sending-socket.
That will cause the main thread's select() call to immediately return, with FD_ISSET(receivingSocket) indicating true because of the received byte
At that point, your main thread knows it is time for the process to exit, so it can start directing all of its child threads to start shutting down (via whatever mechanism is convenient; atomic booleans or pipes or something else)
After telling all the child threads to start shutting down, the main thread should then call join() on each child thread, so that it can be guaranteed that all of the child threads are actually gone before main() returns. (This is necessary because otherwise there is a risk of a race condition -- e.g. the post-main() cleanup code might occasionally free a resource while a still-executing child thread was still using it, leading to a crash)
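A minimal sketch of the pipe half of that idea on POSIX (error handling omitted; wakePipe and onSignal are illustrative names):

#include <csignal>
#include <sys/select.h>
#include <unistd.h>

static int wakePipe[2]; // [0] = read end, [1] = write end

void onSignal(int) {
    char b = 1;
    (void)write(wakePipe[1], &b, 1);  // write() is async-signal-safe
}

int main() {
    pipe(wakePipe);
    std::signal(SIGINT, onSignal);

    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(wakePipe[0], &readSet);    // add your real sockets here as well
    select(wakePipe[0] + 1, &readSet, nullptr, nullptr, nullptr);

    if (FD_ISSET(wakePipe[0], &readSet)) {
        // time to exit: tell the child threads to shut down, then join() each one
    }
    return 0;
}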
The first thing you must accept is that threading is hard.
A "program using threading" is about as generic as a "program using memory", and your question is similar to "how do I not corrupt memory in a program using memory?"
The way you handle threading problem is to restrict how you use threads and the behavior of the threads.
If your threading system is a bunch of small operations composed into a data flow network, with an implicit guarantee that if an operation is too big it is broken down into smaller operations and/or does checkpoints with the system, then shutting down looks very different than if you have a thread that loads an external DLL that then runs it for somewhere from 1 second to 10 hours to infinite length.
Like most things in C++, solving your problem is going to be about ownership, control and (at a last resort) hacks.
Like data in C++, every thread should be owned. The owner of a thread should have significant control over that thread, and be able to tell it that the application is shutting down. The shut down mechanism should be robust and tested, and ideally connected to other mechanisms (like early-abort of speculative tasks).
The fact you are calling exit(0) is a bad sign. It implies your main thread of execution doesn't have a clean shutdown path. Start there; the interrupt handler should signal the main thread that shutdown should begin, and then your main thread should shut down gracefully. All stack frames should unwind, data should be cleaned up, etc.
Then the same kind of logic that permits that clean and fast shutdown should also be applied to your threaded off code.
Anyone telling you it is as simple as a condition variable/atomic boolean and polling is selling you a bill of goods. That will only work in simple cases, if you are lucky, and determining whether it works reliably is going to be quite hard.
In addition to Some programmer dude's answer, and related to the discussion in the comment section: you need to make the flag that controls termination of your threads an atomic type.
Consider the following case:
bool done = false;

void pending_thread()
{
    while (!done)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    // do something that depends on working thread results
}

void worker_thread()
{
    // do something for pending thread
    done = true;
}
Here the worker thread can also be your main thread, and done is the terminating flag of your thread; the pending thread needs to do something with the data produced by the worker thread before exiting.
This example has a race condition and undefined behaviour along with it, and in the real world it's really hard to find the actual problem.
Now the corrected version using std::atomic:
std::atomic<bool> done(false);

void pending_thread()
{
    while (!done.load())
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    // do something that depends on working thread results
}

void worker_thread()
{
    // do something for pending thread
    done = true;
}
You can now exit the thread without being concerned about a race condition or UB.