Writing to a mutex'ed shared resourced - c++

I've a C++ list which is being processed by multiple thread.
Each thread creates a pthread_mutex_lock on the list so that other threads cannot "interfere" with the list. As a part of processing, each thread also push_back data on the list.
My question is - is push_back on a mutex-ed list a bad idea? Is the mutex still valid while the thread is pusing more data on the list? Most of the documentation/examples I've seen on pthread_mutex_lock are only doing "reading" so I am curious to know what happens the same thread which acquired lock, writes on the shared resource.

As long as only that particular thread is holding the lock, and no other thread can take this lock, writing should be fine. think of why a problem could happen? it wouldve been a problem if one thread was writing and the other was reading simultaneously. If a ball is yours, you can do anything with it right? things change when they're shared.

The mutex needs to be unique for the entire group of threads (i.e. all threads must use the same mutex). If you create a mutex for each thread, then you are not thread-safe at all, because each thread will wait on its own mutex and not be synchronized with the rest.
And yes an acquired mutex can be used safely to both read and write.

Related

If only one thread modifies an std::vector, does that same thread need to use a lock when it reads from the vector?

As I understand a data race can only occur with a std::vector when one or more threads are modifying a vector. If all threads are simply reading that is thread-safe.
Now assume I have 2 threads. One which can only read from the vector, but the other reads and modifies the size of the vector. I have a lock to remain thread-safe to the vector.
When the read-only thread reads from the vector it must use the lock since it does not know what the other thread is doing at the same time.
However, the read and write thread knows that the read-only thread is read-only and thus when it is reading the other thread can only be reading or not. This means that a data-race is impossible in that case and it does not have to use the lock.
The read and write thread must still use the lock when modifying the vector.
Is that right?
You are correct. If your single writer thread is in read mode, then a lock does not need to happen because the other thread is only a reader. Readers can't conflict with each other. The read only thread will still need to acquire the lock on every read, and the writing thread will need to lock when it writes, but it doesn't need to lock to read.

std::condition_variable memory writes visibility

Does std::condition_variable::notify_one() or std::condition_variable::notify_all() guarantee that non-atomic memory writes in the current thread prior to the call will be visible in notified threads?
Other threads do:
{
std::unique_lock lock(mutex);
cv.wait(lock, []() { return values[threadIndex] != 0; });
// May a thread here see a zero value and therefore start to wait again?
}
Main thread does:
fillData(values); // All values are zero and all threads wait() before calling this.
cv.notify_all(); // Do need some memory fence or lock before this
// to ensure that new non-zero values will be visible
// in other threads immediately after waking up?
Doesn't notify_all() store some atomic value therefore enforcing memory ordering? I did not clarified it.
UPD: according to Superlokkus' answer and an answer here: we have to acquire a lock to ensure memory writes visibility in other threads (memory propagation), otherwise threads in my case may read zero values.
Also I missed this quote here about condition_variable, which specifically answers my question. Even an atomic variable has to be modified under a lock in a case when the modification must become visible immediately.
Even if the shared variable is atomic, it must be modified under the
mutex in order to correctly publish the modification to the waiting
thread.
I guess you are mixing up memory ordering of so called atomic values and the mechanisms of classic lock based synchronization.
When you have a datum which is shared between threads, lets say an int for example, one thread can not simply read it while the other thread might be write to it meanwhile. Otherwise we would have a data race.
To get around this for long time we used classic lock based synchronization:
The threads share at least a mutex and the int. To read or to write any thread has to hold the lock first, meaning they wait on the mutex. Mutexes are build so that they are fine that this can happen concurrently. If a thread wins gettting the mutex it can change or read the int and then should unlock it, so others can read/write too. Using a conditional variable like you used is just to make the pattern "readers wait for a change of a value by a writer" more efficient, they get woken up by the cv instead of periodically waiting on the lock, reading, and unlocking, which would be called busy waiting.
So because you hold the lock in any after waiting on the mutex or in you case, correctly (mutex is still needed) waiting on the conditional variable, you can change the int. And readers will read the new value after the writer was able to wrote it, never the old. UPDATE: However one thing if have to add, which might also be the cause of confusion: Conditional variables are subject for so called spurious wakeups. Meaning even though you write did not have notified any thread, a read thread might still wake up, with the mutex locked. So you have to check if you writer actually waked you up, which is usually done by the writer by changing another datum just to notify this, or if its suitable by using the same datum you already wanted to share. The lambda parameter overload of std::condition_variable::wait was just made to make the checking and going back to sleep code looking a bit prettier. Based on your question now I don't know if you want to use you values for this job.
However at snippet for the "main" thread is incorrect or incomplete:
You are not synchronizing on the mutex in order to change values.
You have to hold the lock for that, but notifying can be done without the lock.
std::unique_lock lock(mutex);
fillData(values);
lock.unlock();
cv.notify_all();
But these mutex based patters have some drawbacks and are slow, only one thread at a time can do something. This is were so called atomics, like std::atomic<int> came into play. They can be written and read at the same time without an mutex by multiple threads concurrently. Memory ordering is only a thing to consider there and an optimization for cases where you uses several of them in a meaningful way or you don't need the "after the write, I never see the old value" guarantee. However with it's default memory ordering memory_order_seq_cst you would also be fine.

Multithreading Clarification

I've been trying to learn how to multithread and came up with the following understanding. I was wondering if I'm correct or far off and, if I'm incorrect in any way, if someone could give me advice.
To create a thread, first you need to utilize a library such as <thread> or any alternative (I'm using boost's multithreading library to get cross-platform capabilities). Afterwards, you can create a thread by declaring it as such (for std::thread)
std::thread thread (foo);
Now, you can use thread.join() or thread.detach(). The former will wait until the thread finishes, and then continue; while, the latter will run the thread alongside whatever you plan to do.
If you want to protect something, say a vector std::vector<double> data, from threads accessing simultaneously, you would use a mutex.
Mutex's would be declared as a global variable so that they may access the thread functions (OR, if you're making a class that will be multithreaded, the mutex can be declared as a private/public variable of the class). Afterwards, you can lock and unlock a thread using a mutex.
Let's take a quick look at this example pseudo code:
std::mutex mtx;
std::vector<double> data;
void threadFunction(){
// Do stuff
// ...
// Want to access a global variable
mtx.lock();
data.push_back(3.23);
mtx.unlock();
// Continue
}
In this code, when the mutex locks down on the thread, it only locks the lines of code between it and mtx.unlock(). Thus, other threads will still continue on their merry way until they try accessing data (Note, we would likely through a mutex in the other threads as well). Then they would stop, wait to use data, lock it, push_back, unlock it and continue. Check here for a good description of mutex's.
That's about it on my understanding of multithreading. So, am I horribly wrong or accurate?
Your comments refer to "locking the whole thread". You can't lock part of a thread.
When you lock a mutex, the current thread takes ownership of the mutex. Conceptually, you can think of it as the thread places its mark on the mutex (stores its threadid in the mutex data structure). If any other thread comes along and attempts to acquire the same mutex instance, it sees that the mutex is already "claimed" by somebody else and it waits until the first thread has released the mutex. When the owning thread later releases the mutex, one of the threads that is waiting for the mutex can wake up, acquire the mutex for themselves, and carry on.
In your code example, there is a potential risk that the mutex might not be released once it is acquired. If the call to data.push_back(xxx) throws an exception (out of memory?), then execution will never reach mtx.unlock() and the mutex will remain locked forever. All subsequent threads that attempt to acquire that mutex will drop into a permanent wait state. They'll never wake up because the thread that owns the mutex is toast.
For this reason, acquiring and releasing critical resources like mutexes should be done in a manner that will guarantee they will be released regardless of how execution leaves the current scope. In other languages, this would mean putting the mtx.unlock() in the finally section of a try..finally block:
mtx.lock();
try
{
// do stuff
}
finally
{
mtx.unlock();
}
C++ doesn't have try..finally statements. Instead, C++ leverages its language rules for automatic disposal of locally defined variables. You construct an object in a local variable, the object acquires a mutex lock in its constructor. When execution leaves the current function scope, C++ will make sure that the object is disposed, and the object releases the lock when it is disposed. That's the RAII others have mentioned. RAII just makes use of the existing implicit try..finally block that wraps every C++ function body.

Conditional variables in pthreads and releasing multiple locks

Sorry if this is a trivial question. But I couldn't find an answer anywhere. I'm writing a program that uses pthreads. One thread acquires a lock (mutex) and then attempts to push data into a synchronized buffer. The buffer has its own mutex which gets acquired once the push() method is invoked.
If the buffer is full and the thread needs to wait on a conditional variable, will the wait call release all acquired locks? Or will it just release the one associated with the conditional variable (which happens to be the last acquired lock)? If it is the latter how can I avoid deadlocks if another thread needs to acquire the first lock?
Edit:
The problem I have is the following. I have two threads, say A and B. Thread A has a for loop which inserts data into a number of buffers. Each iteration inserts an element into one of the buffers. Since this is a thread, it continuously executes this loop within another outer loop for the thread. When B is activated, it manipulates those buffers. But B must not operate on the buffers if A is interrupted by the scheduler in the middle of executing the for loop. Therefore I use a mutex to lock the critical section in A, which is the for loop.
Edit-2:
I have been giving it some thought. And the answer to my problem definitely won't be that the buffers' conditional variable releases the first lock as well. That would mean that the first lock was not even necessary in the first place. If consumer threads (different from Thread B) responsible for removing elements from the buffers are doing their job properly, Thread A would resume at some point and the for loop will complete. Therefore, my problem must lie there. I will take a closer look and update.
pthread_cond_wait() will release only the mutex that you pass to it, which in your case will be the buffer's mutex acquired earlier within the push() call.
It sounds like you will have the possibility of deadlock in your situation, but that's a consequence of your high-level design. If Thread B cannot be allowed to execute while Thread A is running its for() loop, but within that for() loop Thread A might have to wait for Thread B to consume some data to proceeed, then deadlock is inevitable.
You will need to update your design to account for this. One possibility is to add a reserve() function to your synchronised buffers that reserves space for a subsequent push(), but doesn't add any data. Your Thread A can then go through and reserve the space it needs without holding the outer mutex (since it's not adding any data visible to Thread B yet) - if it has to wait for Thread B to consume some data, it'll do so here. One it has reserved all the space it requires, it can then lock the outer mutex and push the data - this is guaranteed not to have to wait for Thread B, because the space has already been reserved.

Use lock-free algorithm for shared memory

I want to use lock-free algorithm for shared memory for avoiding mutex. I have some processes that share data use shared memory. If a process is locking mutex and crashes, all other processes also crash.
I read some papers which implement lock-free algorithm with linked list. But in my shared memory, I can't define data structure for using on this block of memory. I have just a pointer to this block.
So I don't have any ideas for apply lock-free algorithm in my situation. I need some helps from you. Thanks and sorry if my English is very bad.
If a process is locking mutex and crashes, all other processes also crash.
Specifically for this use case there are robust mutexes:
PTHREAD_MUTEX_ROBUST
If the process containing the owning thread of a robust mutex terminates while holding the mutex lock, the next thread that acquires the mutex shall be notified about the termination by the return value [EOWNERDEAD] from the locking function. If the owning thread of a robust mutex terminates while holding the mutex lock, the next thread that acquires the mutex may be notified about the termination by the return value [EOWNERDEAD]. The notified thread can then attempt to mark the state protected by the mutex as consistent again by a call to pthread_mutex_consistent(). After a subsequent successful call to pthread_mutex_unlock(), the mutex lock shall be released and can be used normally by other threads. If the mutex is unlocked without a call to pthread_mutex_consistent(), it shall be in a permanently unusable state and all attempts to lock the mutex shall fail with the error [ENOTRECOVERABLE]. The only permissible operation on such a mutex is pthread_mutex_destroy().