I am surprised to see from pstack that this code deadlocks! I don't see any reason why it should.
pthread_mutex_t lock;

_Cilk_for (int i = 0; i < N; ++i) {
    int ai = A[i];
    if (ai < pivot) {
        pthread_mutex_lock(&lock);
        A[ia++] = ai;
        pthread_mutex_unlock(&lock);
    }
    else if (ai > pivot) {
        pthread_mutex_lock(&lock);
        A[ib++] = ai;
        pthread_mutex_unlock(&lock);
    }
    else {
        pthread_mutex_lock(&lock);
        A[ic++] = ai;
        pthread_mutex_unlock(&lock);
    }
}
I am just using mutexes to make sure that access to A is atomic and serialized.
What is wrong with this code that makes it deadlock?
Is there a better way to implement this?
If that's code inside a function, then you're not initialising the mutex correctly. You need to set it to PTHREAD_MUTEX_INITIALIZER (for a simple, default mutex) or do a pthread_mutex_init() on it (for more complex requirements). Without proper initialisation, you don't know what state the mutex starts in - it may well be in a locked state simply because whatever happened to be on the stack at that position looked like a locked mutex.
That's why it always needs to be initialised somehow, so that there is no doubt of the initial state.
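For instance (a minimal sketch; either form gives the mutex a known initial state):

#include <pthread.h>

// Option 1: static initialisation, for a simple default mutex.
static pthread_mutex_t lock1 = PTHREAD_MUTEX_INITIALIZER;

void f(void)
{
    // Option 2: runtime initialisation, required for non-default
    // attributes or for a mutex with automatic storage duration.
    pthread_mutex_t lock2;
    pthread_mutex_init(&lock2, NULL);

    // ... use lock2 with pthread_mutex_lock/pthread_mutex_unlock ...

    pthread_mutex_destroy(&lock2);
}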
Another potential problem you may have is this:
int ai = A[i];
You probably should protect that access with the same mutex since otherwise you may read it in a "half-state" (when another thread is only part way through updating the variable).
And, I have to say, I'm not sure that threads are being used wisely here. The use of mutexes is likely to swamp a statement like A[ia++] = ai to the point where the vast majority of time will be spent locking and unlocking the mutex. They're generally more useful where the code being processed during the lock is a little more substantial.
You may find a non-threaded variant will blow this one out of the water (but, of course, don't take my word for it - my primary optimisation mantra is "measure, don't guess").
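For comparison, a purely serial three-way partition (a hypothetical sketch, assuming A, N and pivot as in the question, and using scratch buffers instead of the in-place cursors) needs no locks at all:

#include <stdlib.h>

void partition3(int* A, int N, int pivot)
{
    // Collect elements below / above the pivot in scratch buffers,
    // counting the ones equal to it.
    int* lt = (int*)malloc(N * sizeof(int));
    int* gt = (int*)malloc(N * sizeof(int));
    int nlt = 0, ngt = 0, neq = 0;
    for (int i = 0; i < N; ++i) {
        int ai = A[i];
        if (ai < pivot)      lt[nlt++] = ai;
        else if (ai > pivot) gt[ngt++] = ai;
        else                 ++neq;
    }
    // Write the three regions back: less, equal, greater.
    int k = 0;
    for (int i = 0; i < nlt; ++i) A[k++] = lt[i];
    for (int i = 0; i < neq; ++i) A[k++] = pivot;
    for (int i = 0; i < ngt; ++i) A[k++] = gt[i];
    free(lt);
    free(gt);
}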
Your pthread_mutex_t lock is not properly initialized. Since it is a local variable, it may contain garbage and might be in a strangely locked state. You should call pthread_mutex_init or initialize your lock with PTHREAD_MUTEX_INITIALIZER.
As others complained, you are not wisely using mutexes. The critical sections of your code are much too small.
AFTER you fix or otherwise verify that you are in fact initializing your lock:
pstack may be showing control mechanisms introduced by _Cilk_for that are interfering with what would otherwise be reasonable pthread code.
A quick search shows there are mutex solutions for use with Cilk - intermixing Cilk and pthreads isn't mentioned. It does look like Cilk is a layer on top of pthreads - so if Cilk chose to put a wrapper around mutex, they likely did so for a good reason. I'd suggest staying with the Cilk API.
That aside, there's a more fundamental issue with your algorithm. In your case, the overhead for creating parallel threads and synchronizing them likely dwarfs the cost of executing the code in the body of the for-loop. It's very possible this will run faster without parallelizing it.
I'd like to write a function that is accessible only by a single thread at a time. I don't need busy waits, a brutal 'rejection' is enough if another thread is already running it. This is what I have come up with so far:
std::atomic<bool> busy(false);

bool func()
{
    if (busy.exchange(true) == true)
        return false;
    // ... do stuff ...
    busy.exchange(false);
    return true;
}
Is the logic for the atomic exchange correct?
Is it correct to mark the two atomic operations as std::memory_order_acq_rel? As far as I understand, a relaxed ordering (std::memory_order_relaxed) wouldn't be enough to prevent reordering.
Your atomic swap implementation might work. But trying to do thread-safe programming without a lock is almost always fraught with issues and is often harder to maintain.
Unless you need the extra performance, std::mutex with the try_lock() method is all you need, e.g.:
std::mutex mtx;

bool func()
{
    // making use of std::unique_lock so if the code throws an
    // exception, the std::mutex will still get unlocked correctly...
    std::unique_lock<std::mutex> lck(mtx, std::try_to_lock);
    bool gotLock = lck.owns_lock();
    if (gotLock)
    {
        // do stuff
    }
    return gotLock;
}
Your code looks correct to me, as long as you leave the critical section by falling out, not returning or throwing an exception.
You can unlock with a release store; an RMW (like exchange) is unnecessary. The initial exchange only needs acquire. (But does need to be an atomic RMW like exchange or compare_exchange_strong)
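A minimal sketch of that variant (my adaptation of the question's code; the memory-ordering choices are the point, not the names):

#include <atomic>

std::atomic<bool> busy(false);

bool func()
{
    // Taking the lock must be an atomic RMW, and acquire is enough.
    if (busy.exchange(true, std::memory_order_acquire))
        return false;
    // ... do stuff (the critical section) ...
    // Unlocking needs no RMW: a plain release store suffices.
    busy.store(false, std::memory_order_release);
    return true;
}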
Note that ISO C++ says that taking a std::mutex is an "acquire" operation, and releasing it is a "release" operation, because that's the minimum necessary for keeping the critical section contained between the taking and the releasing.
Your algo is exactly like a spinlock, but without retry if the lock's already taken. (i.e. just a try_lock). All the reasoning about necessary memory-order for locking applies here, too. What you've implemented is logically equivalent to the try_lock / unlock in #selbie's answer, and very likely performance-equivalent, too. If you never use mtx.lock() or whatever, you're never actually blocking i.e. waiting for another thread to do something, so your code is still potentially lock-free in the progress-guarantee sense.
Rolling your own with an atomic<bool> is probably fine; using std::mutex here gains you nothing, since you want try-lock and unlock to do only this and nothing more. That's certainly achievable with a mutex (with some extra function-call overhead), but some implementations might do something beyond it, and you're not using any of the functionality a mutex adds. The one nice thing std::mutex gives you is the comfort of knowing that it safely and correctly implements try_lock and unlock. But if you understand locking and acquire / release, it's easy to get that right yourself.
The usual performance reason not to roll your own locking is that a mutex will be tuned for the OS and typical hardware, with things like exponential backoff, x86 pause instructions while spinning a few times, and then a fallback to a system call, plus efficient wakeup via system calls like Linux futex. All of that only benefits the blocking behaviour; .try_lock leaves it unused, and if no thread ever sleeps, unlock never has any other threads to notify.
There is one advantage to using std::mutex: you can use RAII without having to roll your own wrapper class. std::unique_lock with the std::try_to_lock policy will do this. This will make your function exception-safe, making sure to always unlock before exiting, if it got the lock.
I stumbled across the following Code Review StackExchange and decided to read it for practice. In the code, there is the following:
Note: I am not looking for a code review; this is just a copy-paste of the code from the link so you can focus on the issue at hand without the other code interfering. I am not interested in implementing a 'smart pointer', just in understanding the memory model:
// Copied from the link provided (all inside a class)
unsigned int count;
mutex m_Mutx;

void deref()
{
    m_Mutx.lock();
    count--;
    m_Mutx.unlock();
    if (count == 0)
    {
        delete rawObj;
        count = 0;
    }
}
Seeing this makes me immediately think: what if two threads enter when count == 1 and neither sees the update of the other? Can both end up seeing count as zero and double-delete? And is it possible for two threads to cause count to become -1, so that deletion never happens?
The mutex will make sure only one thread enters the critical section at a time; however, does this guarantee that all threads see the updated value? What does the C++ memory model tell me, so I can say whether this is a race condition or not?
I looked at the Memory model cppreference page and the std::memory_order cppreference page; however, the latter seems to deal with a parameter for atomics. I didn't find the answer I was looking for, or maybe I misread it. Can anyone tell me whether what I said is wrong or right, and whether this code is safe or not?
For correcting the code if it is broken:
Is the correct fix to turn count into an atomic member? Or does this work as-is because, once the lock on the mutex is released, all the threads see the value?
I'm also curious if this would be considered the correct answer:
Note: I am not looking for a code review and trying to see if this kind of solution would solve the issue with respect to the C++ memory model.
#include <atomic>
#include <mutex>

struct ClassNameHere {
    int* rawObj;
    std::atomic<unsigned int> count;
    std::mutex mutex;
    // ...
    void deref()
    {
        std::scoped_lock lock{mutex};
        count--;
        if (count == 0)
            delete rawObj;
    }
};
"what if two threads enter when count == 1" -- if that happens, something else is fishy. The idea behind smart pointers is that the refcount is bound to an object's lifetime (scope). The decrement happens when the object (via stack unrolling) is destroyed. If two threads trigger that, the refcount can not possibly be just 1 unless another bug is present.
However, what could happen is that two threads enter this code when count = 2. In that case, the decrement operation is locked by the mutex, so it can never reach negative values. Again, this assumes non-buggy code elsewhere. Since all this does is to delete the object (and then redundantly set count to zero), nothing bad can happen.
What can happen is a double delete though. If two threads at count = 2 decrement the count, they could both see the count = 0 afterwards. Just determine whether to delete the object inside the mutex as a simple fix. Store that info in a local variable and handle accordingly after releasing the mutex.
Concerning your third question, turning the count into an atomic is not going to fix things magically. Also, the point behind atomics is that you don't need a mutex, because locking a mutex is an expensive operation. With atomics, you can combine operations like decrement and check for zero, which is similar to the fix proposed above. Atomics are typically slower than "normal" integers. They are still faster than a mutex though.
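A sketch of that combined decrement-and-test with an atomic (my illustration, not code from the linked review):

#include <atomic>

struct RefCounted {
    int* rawObj = nullptr;
    std::atomic<unsigned int> count{1};

    void deref()
    {
        // fetch_sub returns the value *before* the decrement, so the
        // decrement and the zero test form one indivisible step:
        // exactly one thread can observe the 1 -> 0 transition.
        if (count.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete rawObj;
    }
};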
In both cases there's a data race. Thread 1 decrements the counter to 1, and just before the if statement a thread switch occurs. Thread 2 decrements the counter to 0 and then deletes the object. Thread 1 resumes, sees that count is 0, and deletes the object again.
Move the unlock() to the end of the function. Or, better, use std::lock_guard to do the locking; its destructor will unlock the mutex even when the delete call throws an exception.
If two threads potentially* enter deref() concurrently, then, regardless of the previous or previously expected value of count, a data race occurs, and your entire program, even the parts that you would expect to be chronologically prior, has undefined behavior as stated in the C++ standard in [intro.multithread/20] (N4659):
Two actions are potentially concurrent if
(20.1) they are performed by different threads, or
(20.2) they are unsequenced, at least one is performed by a signal handler, and they are not both performed by the same signal handler invocation.
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
The potentially concurrent actions in this case, of course, are the read of count outside of the locked section, and the write of count within it.
*) That is, if current inputs allow it.
UPDATE 1: The section you reference, describing atomic memory order, explains how atomic operations synchronize with each other and with other synchronization primitives (such as mutexes and memory barriers). In other words, it describes how atomics can be used for synchronization so that some operations aren't data races. It does not apply here. The standard takes a conservative approach here: Unless other parts of the standard explicitly make clear that two conflicting accesses are not concurrent, you have a data race, and hence UB (where conflicting means same memory location, and at least one of them isn't read-only).
Your lock prevents the operation count-- from getting into a mess when performed concurrently in different threads. It does not, however, guarantee that the values of count are synchronized outside the critical section, so repeated reads outside a single critical section bear the risk of a data race.
You could rewrite it as follows:
void deref()
{
bool isLast;
m_Mutx.lock();
--count;
isLast = (count == 0);
m_Mutx.unlock();
if (isLast) {
delete rawObj;
}
}
The lock thereby makes sure that access to count is synchronized and always in a valid state. That valid state is carried over to the non-critical section through a local variable (without a race condition), so the critical section can be kept rather short.
A simpler version would be to synchronize the complete function body; this can be a disadvantage if you want to do more elaborate things than just delete rawObj:
void deref()
{
    std::lock_guard<std::mutex> lock(m_Mutx);
    if (! --count) {
        delete rawObj;
    }
}
BTW: std::atomic alone will not solve this issue, as it synchronizes only each single access, not the "transaction" as a whole. Therefore your scoped_lock is necessary, and - as it then spans the complete function - the std::atomic becomes superfluous.
I am developing an application in Qt/C++. At some point, there are two threads: one is the UI thread and the other is a background thread. I have to do some operation from the background thread based on the value of an extern variable of type bool. I set this value by clicking a button on the UI.
header.cpp
extern bool globalVar;
mainWindow.cpp
//main ui thread on button click
void setValue(bool val) {
    globalVar = val;
}
backgroundThread.cpp
while (1) {
    if (globalVar) {
        // do some operation
    } else {
        // do some other operation
    }
}
Here, writing to globalVar happens only when the user clicks the button, whereas reading happens continuously.
So my questions are:
In a situation like the one above, is a mutex mandatory?
If a read and a write happen at the same time, can this cause the application to crash?
If a read and a write happen at the same time, is globalVar going to have some value other than true or false?
Finally, does the OS provide any kind of locking mechanism that prevents different threads from accessing the same memory location at the same time?
The loop
while (1) {
    if (globalVar) {
        // do some operation
    } else {
        // do some other operation
    }
}
is busy waiting, which is extremely wasteful. Thus, you're probably better off with some classic synchronization that will wake the background thread (mostly) when there is something to be done. You should consider adapting this example of std::condition_variable.
Say you start with:
#include <thread>
#include <mutex>
#include <condition_variable>
std::mutex m;
std::condition_variable cv;
bool ready = false;
Your worker thread can then be something like this:
void worker_thread()
{
    while (true)
    {
        // Wait until main() sends data
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, []{ return ready; });
        ready = false;
        lk.unlock();
    }
}
The notifying thread should do something like this:
{
    std::lock_guard<std::mutex> lk(m);
    ready = true;
}
cv.notify_one();
Since it is just a single plain bool, I'd say a mutex is overkill; you should just go for an atomic instead. On common platforms an atomic bool is read and written with a single instruction, so no worries there, and it will be lock-free, which is always better if possible.
If it is something more complex, then by all means go for a mutex.
It won't crash from that alone, but you can get data corruption, which may crash the application.
The system will not manage that stuff for you; you do it manually. Just make sure all access to the data goes through the mutex.
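A minimal sketch of the atomic version (assuming the globalVar and loop from the question; the function names are illustrative):

#include <atomic>

std::atomic<bool> globalVar{false};

// UI thread, on button click:
void setValue(bool val) {
    globalVar.store(val);        // atomic write: no torn values
}

// Background thread:
void backgroundLoop() {
    while (true) {
        if (globalVar.load()) {  // atomic read
            // do some operation
        } else {
            // do some other operation
        }
    }
}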
Edit:
Since you specify a number of times that you don't want a complex solution, you may opt for simply using a mutex instead of the bool. There is no need to protect the bool with a mutex, since you can use the mutex as a bool, and yes, you could go with an atomic, but that's what the mutex already does (plus some extra functionality in the case of recursive mutexes).
It also matters what your exact workload is, since your example doesn't make a lot of sense in practice. It would be helpful to know what those operations are.
So in your UI thread you could simply do val ? mutex.lock() : mutex.unlock(), and in your secondary thread use if (mutex.tryLock()) { doStuff; mutex.unlock(); } else { doOtherStuff; }. Now if the operation in the secondary thread takes too long and you happen to be changing the lock in the main thread, the main thread will block until the secondary thread unlocks. You could use tryLock(timeout) in the main thread, depending on what you prefer: lock() will block until success, while tryLock(timeout) will prevent blocking but the lock may fail. Also, take care not to unlock from a thread other than the one you locked with, and not to unlock an already unlocked mutex.
Depending on what you are actually doing, maybe an asynchronous event driven approach would be more appropriate. Do you really need that while(1)? How frequently do you perform those operations?
In a situation like the one above, is a mutex mandatory?
A mutex is one tool that will work. What you actually need are three things:
a means of ensuring an atomic update (in practice a plain bool gives you this on common platforms, since it occupies a single byte, though the standard itself makes no such promise)
a means of ensuring that the effects of a write made by one thread are actually visible in the other thread. This may sound counter-intuitive, but absent synchronization, optimisations (software and hardware) are free to behave as if the code were single-threaded and need not consider cross-thread communication, and...
a means of preventing the compiler (and CPU!!) from re-ordering the reads and writes.
The answer to the implied question is 'yes': you will need something that does all of these things (see below).
If a read and a write happen at the same time, can this cause the application to crash?
Not when it's a bool, but the program won't behave as you expect. In fact, because the program is then exhibiting undefined behaviour, you can no longer reason about its behaviour at all.
If a read and a write happen at the same time, is globalVar going to have some value other than true or false?
Not in this case, because it's an intrinsic single-byte type that is read and written in one access in practice.
Finally, does the OS provide any kind of locking mechanism that prevents different threads from accessing the same memory location at the same time?
Not unless you specify one.
Your options are:
std::atomic<bool>
std::mutex
std::atomic_thread_fence
Realistically speaking, as long as you use an integer type (not bool), make it volatile, and keep it inside its own cache line by properly aligning its storage, you don't need to do anything special at all.
In a situation like the one above, is a mutex mandatory?
Only if you want to keep the value of the variable synchronized with other state.
If a read and a write happen at the same time, can this cause the application to crash?
According to the C++ standard, it's undefined behavior, so anything can happen: e.g. your application might not crash, but its state might be subtly corrupted. In real life, though, compilers often offer sane implementation-defined behavior and you're fine unless your platform is really weird. Anything commonplace, like 32- and 64-bit Intel, PPC and ARM, will be fine.
If a read and a write happen at the same time, is globalVar going to have some value other than true or false?
globalVar can only have these two values, so it makes no sense to speak of any other values unless you're talking about its binary representation. Yes, it could happen that the binary representation is incorrect and not what the compiler would expect. That's why you shouldn't use a bool but a uint8_t instead.
I wouldn't love to see such flag in a code review, but if a uint8_t flag is the simplest solution to whatever problem you're solving, I say go for it. The if (globalVar) test will treat zero as false, and anything else as true, so temporary "gibberish" is OK and won't have any odd effects in practice. According to the standard, you'll be facing undefined behavior, of course.
Finally, does the OS provide any kind of locking mechanism that prevents different threads from accessing the same memory location at the same time?
It's not the OS's job to do that.
Speaking of practice, though: on any reasonable platform, the use of a std::atomic_bool will have no overhead over the use of a naked uint8_t, so just use that and be done.
static char szInfo[256];
static volatile bool bIsLocked = false;

static void ApiFunc() {
    while (bIsLocked) { }
    bIsLocked = true;
    // do something to szInfo
    bIsLocked = false;
}
It has been a while since I have done any threading in C++. Is this safe enough? It seems a much simpler solution to me than using a mutex, but why would I use a Windows mutex instead?
You would use a mutex (or more likely a critical section) because that would work. This code does not synchronise. Multiple threads can enter the critical region.
And of course, real locks don't spin. Well, spin-locks do, but you need a deep understanding of the performance implications of a spin-lock before electing to use one.
It is not thread-safe at all!
Thread #1 gets through the check but has not yet set the boolean; Thread #2 can come in at that moment and enter the critical section as well.
What you've implemented is almost Peterson's Algorithm. As the above posters have said, this is not thread-safe, as there is no mechanism to prevent both threads from entering the critical section at the same time. You could try implementing Peterson's Algorithm properly, but it would be far more effective to use a true mutex.
The major problem with your approach is that a thread can be interrupted after exiting the while loop but before it sets the bool to true. If this happens, your two threads enter the critical section together. And if you have more than two threads, multiple threads can exit the loop at the same time.
It's not safe at all. volatile has no defined semantics with regard to threads. At most, it will prevent the compiler from suppressing the assignments completely (since their net effect is a no-op), but it will not prevent the compiler from reordering accesses to szInfo around accesses to bIsLocked, and it will not prevent the hardware from doing any reordering whatsoever, or even from suppressing the bIsLocked = true entirely.
This code won't work the way you hope it does. There's a race condition between the end of the while loop and the point where you set bIsLocked to true. It also busy-waits for the lock, burning CPU, whereas real synchronization primitives can yield the CPU while waiting.
A better way to solve the problem is just to use local data rather than a global buffer. Then you might not even need to lock at all! If you do in fact need to synchronize threads, use a mutex or critical section because those will actually work.
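If you really do want the spin-until-free behaviour the question attempts, a minimal sketch with std::atomic_flag (C++11; my illustration, and it still busy-waits) gets the semantics right:

#include <atomic>

static char szInfo[256];
static std::atomic_flag isLocked = ATOMIC_FLAG_INIT;

static void ApiFunc() {
    // test_and_set is a single atomic read-modify-write, so the
    // check-then-set gap in the original code cannot occur.
    while (isLocked.test_and_set(std::memory_order_acquire)) {
        // spin (a mutex is usually still the better choice)
    }
    // ... do something to szInfo ...
    isLocked.clear(std::memory_order_release);
}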
Use Locks.
A talk on the perils of threading without locks, from BoostCon 2010:
http://blip.tv/file/4211197/
Slides and notes at:
http://boostcon.boost.org/2010-resources
I'd like to minimize synchronization and write lock-free code when possible in a project of mine. When absolutely necessary, I'd love to substitute lightweight spinlocks built from atomic operations for pthread and Win32 mutex locks. My understanding is that those are system calls underneath and could cause a context switch (which may be unnecessary for very short critical sections, where simply spinning a few times would be preferable).
The atomic operations I'm referring to are well documented here: http://gcc.gnu.org/onlinedocs/gcc-4.4.1/gcc/Atomic-Builtins.html
Here is an example to illustrate what I'm talking about. Imagine an RB-tree with multiple readers and writers possible. RBTree::exists() is read-only and thread-safe; RBTree::insert() would require exclusive access by a single writer (and no readers) to be safe. Some code:
class IntSetTest
{
private:
    unsigned short lock;
    RBTree<int>* myset;

public:
    // ...

    void add_number(int n)
    {
        // Acquire once locked==false (atomic)
        while (__sync_bool_compare_and_swap(&lock, 0, 0xffff) == false);
        // Perform a thread-unsafe operation on the set
        myset->insert(n);
        // Unlock (atomic)
        __sync_bool_compare_and_swap(&lock, 0xffff, 0);
    }

    bool check_number(int n)
    {
        // Increment once the lock is below 0xffff
        u16 savedlock = lock;
        while (savedlock == 0xffff || __sync_bool_compare_and_swap(&lock, savedlock, savedlock+1) == false)
            savedlock = lock;
        // Perform read-only operation
        bool exists = myset->exists(n);
        // Decrement
        savedlock = lock;
        while (__sync_bool_compare_and_swap(&lock, savedlock, savedlock-1) == false)
            savedlock = lock;
        return exists;
    }
};
(lets assume it need not be exception-safe)
Is this code indeed thread-safe? Are there any pros/cons to this idea? Any advice? Is the use of spinlocks like this a bad idea if the threads are not truly concurrent?
Thanks in advance. ;)
You need a volatile qualifier on lock, and I would also make it a sig_atomic_t. Without the volatile qualifier, this code:
u16 savedlock = lock;
while (savedlock == 0xffff || __sync_bool_compare_and_swap(&lock, savedlock, savedlock+1) == false)
    savedlock = lock;
may not re-read lock when updating savedlock in the body of the while-loop. Consider the case that lock is 0xffff. Then, savedlock will be 0xffff prior to checking the loop condition, so the while condition will short-circuit prior to calling __sync_bool_compare_and_swap. Since __sync_bool_compare_and_swap wasn't called, the compiler doesn't encounter a memory barrier, so it might reasonably assume that the value of lock hasn't changed underneath you, and avoid re-loading it in savedlock.
Re: sig_atomic_t, there's a decent discussion here. The same considerations that apply to signal handlers would also apply to threads.
With these changes, I'd guess that your code would be thread-safe. I would still recommend using mutexes, though, since you really don't know how long your RB-tree insert will take in the general case (per my previous comments under the question).
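For concreteness, a sketch of just the changed declaration (assuming the rest of the class stays as posted):

#include <signal.h>  // for sig_atomic_t

class IntSetTest
{
private:
    // volatile forces a fresh load of the flag on every read;
    // sig_atomic_t can be read and written atomically.
    volatile sig_atomic_t lock;
    // ... rest of the class as in the question ...
};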
It may be worth noting that if you're using the Win32 mutexes, from Vista onwards a thread pool is provided for you. Depending on what you use the RB-tree for, you could replace your scheme with that.
Also, remember that atomic operations are not particularly fast. Microsoft has said they cost a couple of hundred cycles each.
Rather than trying to "protect" the function in this way, it would likely be much more efficient to simply synchronize the threads, either by changing to a SIMD/thread-pool approach or by just using a mutex.
But, of course, without seeing your code, I can't really make any more comments. The trouble with multithreading is that you have to see someone's whole model to understand it.