When to use mutex - C++

Here is the thing: there is a float array float bucket[5] and two threads, say thread1 and thread2.
Thread1 is in charge of tanking up the bucket, assigning each element in the bucket a random number. When the bucket is tanked up, thread2 will access it and read its elements.
Here is how I do the job:
float bucket[5];
pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
pthread_t thread1, thread2;
void* thread_1_proc(void*); //thread1's startup routine, tank up the bucket
void* thread_2_proc(void*); //thread2's startup routine, read the bucket
int main()
{
    pthread_create(&thread1, NULL, thread_1_proc, NULL);
    pthread_create(&thread2, NULL, thread_2_proc, NULL);
    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);
}
Below is my implementation for thread_x_proc:
void* thread_1_proc(void*)
{
    while(1) { //make it work forever
        pthread_mutex_lock(&mu); //lock the mutex, right?
        cout << "tanking\n";
        for(int i=0; i<5; i++)
            bucket[i] = rand(); //actually, rand() returns int, doesn't matter
        pthread_mutex_unlock(&mu); //bucket tanked, unlock the mutex, right?
        //sleep(1); /* this line is commented */
    }
}
void* thread_2_proc(void*)
{
    while(1) {
        pthread_mutex_lock(&mu);
        cout << "reading\n";
        for(int i=0; i<5; i++)
            cout << bucket[i] << " "; //read each element in the bucket
        pthread_mutex_unlock(&mu); //reading done, unlock the mutex, right?
        //sleep(1); /* this line is commented */
    }
}
Question
Is my implementation right? Because the output is not what I expected.
...
reading
5.09434e+08 6.58441e+08 1.2288e+08 8.16198e+07 4.66482e+07 7.08736e+08 1.33455e+09
reading
5.09434e+08 6.58441e+08 1.2288e+08 8.16198e+07 4.66482e+07 7.08736e+08 1.33455e+09
reading
5.09434e+08 6.58441e+08 1.2288e+08 8.16198e+07 4.66482e+07 7.08736e+08 1.33455e+09
reading
tanking
tanking
tanking
tanking
...
But if I uncomment the sleep(1); in each thread_x_proc function, the output is as expected: tanking and reading follow each other, like this:
...
tanking
reading
1.80429e+09 8.46931e+08 1.68169e+09 1.71464e+09 1.95775e+09 4.24238e+08 7.19885e+08
tanking
reading
1.64976e+09 5.96517e+08 1.18964e+09 1.0252e+09 1.35049e+09 7.83369e+08 1.10252e+09
tanking
reading
2.0449e+09 1.96751e+09 1.36518e+09 1.54038e+09 3.04089e+08 1.30346e+09 3.50052e+07
...
Why? Should I use sleep() when using a mutex?

Your code is technically correct, but it does not make a lot of sense, and it does not do what you assume.
What your code does is update a section of data atomically, and read from that section atomically. However, you don't know in which order this happens, nor how often the data is written to before being read (or whether it is read at all!).
What you probably wanted is to generate exactly one sequence of numbers in one thread each time, and to read exactly one new sequence each time in the other thread. For this, you would either have to use an additional semaphore or, better, a single-producer-single-consumer queue.
In general, the answer to "when should I use a mutex" is "never, if you can help it". Threads should send messages, not share state. That makes a mutex unnecessary most of the time, and preserves parallelism (which is the main incentive for using threads in the first place).
The mutex makes your threads run in lockstep, so you might as well just run in a single thread.
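To make the one-for-one handoff concrete, here is a minimal sketch that reuses the question's bucket and mu and adds a condition variable plus a full flag (both additions, not part of the original code). Each thread blocks until the other has handed the bucket over, so tanking and reading strictly alternate without any sleep:
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
bool full = false; // true while the bucket holds an unread batch

void* thread_1_proc(void*) // producer
{
    while (true) {
        pthread_mutex_lock(&mu);
        while (full)                 // wait until the last batch was read
            pthread_cond_wait(&cond, &mu);
        for (int i = 0; i < 5; i++)
            bucket[i] = rand();
        full = true;
        pthread_cond_signal(&cond);  // wake the reader
        pthread_mutex_unlock(&mu);
    }
    return NULL;
}

void* thread_2_proc(void*) // consumer
{
    while (true) {
        pthread_mutex_lock(&mu);
        while (!full)                // wait until a fresh batch is ready
            pthread_cond_wait(&cond, &mu);
        for (int i = 0; i < 5; i++)
            cout << bucket[i] << " ";
        cout << "\n";
        full = false;
        pthread_cond_signal(&cond);  // wake the writer
        pthread_mutex_unlock(&mu);
    }
    return NULL;
}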

There is no implied order in which threads will get to run. This means you shall not expect any order. What's more, it is possible for one thread to run over and over without ever letting the other run. This is implementation-specific and should be assumed to be random.
The case you presented is much better suited to a semaphore, which is "posted" with each element added.
However, if it always has to be like:
write 5 elements
read 5 elements
you should have two mutexes:
one that blocks producer until the consumer finished
one that blocks consumer until the producer finished
So the code should look something like that:
Producer:
while(true){
    lock( &write_mutex )
    [insert data]
    unlock( &read_mutex )
}
Consumer:
while(true){
    lock( &read_mutex )
    [read data]
    unlock( &write_mutex )
}
Initially write_mutex should be unlocked and read_mutex locked.
As I said your code seems to be a better case for semaphores or maybe condition variables.
Mutexes are not meant for cases such as this (which doesn't mean you can't use them, it just means there are more handy tools to solve that problem).
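For illustration, here is roughly the same handoff written with POSIX counting semaphores, which, unlike pthread mutexes, may legally be posted from a different thread than the one that last waited. The initial counts (1 for the producer's semaphore, 0 for the consumer's) mirror the "write_mutex unlocked, read_mutex locked" setup above; the names are illustrative:
#include <semaphore.h>

sem_t write_sem; // sem_init(&write_sem, 0, 1); producer goes first
sem_t read_sem;  // sem_init(&read_sem, 0, 0);

void* producer(void*)
{
    while (true) {
        sem_wait(&write_sem);  // wait until the consumer is done
        // write the 5 elements
        sem_post(&read_sem);   // hand the bucket to the consumer
    }
}

void* consumer(void*)
{
    while (true) {
        sem_wait(&read_sem);   // wait until the producer is done
        // read the 5 elements
        sem_post(&write_sem);  // hand the bucket back to the producer
    }
}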

You have no right to assume that just because you want your threads to run in a particular order, the implementation will figure out what you want and actually run them in that order.
Why shouldn't thread2 run before thread1? And why shouldn't each thread complete its loop several times before the other thread gets a chance to run up to the line where it acquires the mutex?
If you want execution to switch between two threads in a predictable way, then you need to use a semaphore, condition variable, or other mechanism for messaging between the two threads. sleep appears to result in the order you want on this occasion, but even with the sleep you haven't done enough to guarantee that they will alternate. And I have no idea why the sleep makes a difference to which thread gets to run first -- is that consistent across several runs?

If you have two functions that should execute sequentially, i.e. F1 should finish before F2 starts, then you shouldn't be using two threads. Run F2 on the same thread as F1, after F1 returns.
Without threads, you won't need the mutex either.

The sleep isn't really the issue here.
The sleep only lets the other thread acquire the mutex by chance (since it is already waiting for the lock, it will probably get it), but there is no way to be sure the first thread won't simply re-lock the mutex and shut the other thread out again.
A mutex is for protecting data, so that two threads don't:
a) write it simultaneously
b) have one writing while another is reading
It is not for making threads work in a certain order (if you want that functionality, ditch the threaded approach, or use a flag to signal that the "tank" is full, for example).

By now it should be clear from the other answers what the mistakes in the original code are. So, let's try to improve it:
/* A flag that indicates whose turn it is. */
char tanked = 0;

void* thread_1_proc(void*)
{
    while(1) { //make it work forever
        pthread_mutex_lock(&mu); //lock the mutex
        if(!tanked) { // is it my turn?
            cout << "tanking\n";
            for(int i=0; i<5; i++)
                bucket[i] = rand(); //actually, rand() returns int, doesn't matter
            tanked = 1;
        }
        pthread_mutex_unlock(&mu); // unlock the mutex
    }
}

void* thread_2_proc(void*)
{
    while(1) {
        pthread_mutex_lock(&mu);
        if(tanked) { // is it my turn?
            cout << "reading\n";
            for(int i=0; i<5; i++)
                cout << bucket[i] << " "; //read each element in the bucket
            tanked = 0;
        }
        pthread_mutex_unlock(&mu); // unlock the mutex
    }
}
The code above should work as expected. However, as others have pointed out, the result would be better accomplished with one of these two other options:
Sequentially. Since the producer and the consumer must alternate, you don't need two threads. One loop that tanks and then reads would be enough. This solution would also avoid the busy waiting that happens in the code above.
Using semaphores. This would be the solution if the producer was able to run several times in a row, accumulating elements in a bucket (not the case in the original code, though).
http://en.wikipedia.org/wiki/Producer-consumer_problem#Using_semaphores

Related

Mutex in c++ is not running properly

I am trying to use a mutex to order the output between two threads: print the message from thread 1, then print the output from thread 2.
But I am getting the messages printed randomly, so it seems like I am not using the mutex correctly.
#include <iostream>
#include <mutex>
#include <string>
#include <thread>
using namespace std;

std::mutex mu;

void share_print(string msg, int id)
{
    mu.lock();
    cout << msg << id << endl;
    mu.unlock();
}

void func1()
{
    for (int i = 0; i > -50; i--)
    {
        share_print(string("From Func 1: "), i);
    }
}

int main()
{
    std::thread t1(func1);
    for (int i = 0; i < 50; i++)
    {
        share_print(string("From Main: "), i);
    }
    t1.join();
    return 0;
}
the output is:
Your usage of mutexes is 100% correct. It's your expectation of mutex behavior, and execution thread behavior, that misses the mark. For example, C++ execution threads give you no guarantees whatsoever that any line in func1 will be executed before main() completely finishes executing its for loop.
As far as mutexes are concerned, your only guarantees that matter here are:
Only one execution thread can lock a given std::mutex at the same time.
When an execution thread attempts to lock a std::mutex, one of two things will happen: a) if the mutex is not locked, the thread will lock it; b) if another thread already has it locked, or manages to lock it first, the attempting thread will block until the mutex is no longer locked, and then it will attempt to lock the mutex again.
It is very important to understand all the implications of these rules. Even if your execution thread has a mutex locked, then proceeds to unlock it, and then lock it again, it may end up re-locking the mutex immediately even if another execution thread is also waiting to lock the mutex. Mutexes do not impose any kind of a queueing, a locking order, or a priority between different execution threads that are trying to lock it. It's a free-for-all.
Even if mutexes worked the way you expected them to work, that still gives you no guarantees whatsoever:
std::thread t1(func1);
Your only guarantee here is that func1 will be called by a new execution thread at some point on or after this std::thread object's construction finishes.
for (int i = 0; i < 50; i++)
{
    share_print(string("From Main: "), i);
}
This entire for loop can finish even before a single line from func1 gets executed. It'll lock and unlock the mutex 50 times and call it a day, before func1 wakes up and does the same.
Or, alternatively, it's possible for func1 to run to completion before main enters the for loop.
You can have no expectations about the order of execution of multiple execution threads, unless explicit synchronization takes place.
In order to achieve your interleaved output, a lot more work is needed. In addition to the mutex there will need to be some kind of condition variable, plus a separate variable that indicates whose "turn" it is. Each execution thread, both main and func1, will not only need to lock the mutex, but also block on the condition variable until the shared variable indicates that its turn is up, then do its printing, set the shared variable to indicate that it's the other thread's turn, signal the condition variable, and only then unlock the mutex (or, alternatively, always keep the mutex locked and spin on the condition variable).
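A minimal sketch of that turn-based scheme (the main_turn flag and the extra parameter are illustrative choices, not the only way to structure it):
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

std::mutex mu;
std::condition_variable cv;
bool main_turn = true; // whose turn it is to print

void share_print(const std::string& msg, int id, bool i_am_main)
{
    std::unique_lock<std::mutex> lock(mu);
    cv.wait(lock, [&] { return main_turn == i_am_main; }); // block until my turn
    std::cout << msg << id << std::endl;
    main_turn = !main_turn; // hand the turn to the other thread
    cv.notify_one();        // only two threads, so notify_one is enough
}

void func1()
{
    for (int i = 0; i > -50; i--)
        share_print("From Func 1: ", i, false);
}

int main()
{
    std::thread t1(func1);
    for (int i = 0; i < 50; i++)
        share_print("From Main: ", i, true);
    t1.join();
}
With this, the two threads strictly alternate, at the cost of running in lockstep (no parallelism is left, as other answers on this page point out).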

is there any way to wakeup multiple threads at the same time in c/c++

Well, actually, I'm not asking that the threads must "line up" to work; I just want to notify multiple threads, so I'm not looking for a barrier.
It's kind of like condition_variable::notify_all(), but I don't want the threads to wake up one by one, which may cause starvation (also a potential problem with multiple semaphore post operations). It's kind of like:
std::atomic_flag flag{ATOMIC_FLAG_INIT};

void example() {
    if (!flag.test_and_set()) {
        // this is the thread to do the job, and notify others
        do_something();
        notify_others(); // this is what I'm looking for
        flag.clear();
    } else {
        // this is the waiting thread
        wait_till_notification();
        do_some_other_thing();
    }
}

void runner() {
    std::vector<std::thread> threads;
    for (int i=0; i<10; ++i) {
        threads.emplace_back([]() {
            while(1) {
                example();
            }
        });
    }
    // ...
}
So how can I do this in C/C++, or maybe with the POSIX API?
Sorry, I didn't make this question clear enough; let me add some more explanation.
It's not the thundering herd problem I'm talking about, and yes, it's the re-acquiring of the lock that bothers me. I tried shared_mutex, and there's still a problem.
Let me split the threads into two parts: one leader thread, which does the writing job, and the other worker threads, which do the reading job.
But actually they're all equal in the program; the leader thread is simply the thread that first got access to the job (you can take it as: the shared buffer has underflowed for this thread). Once the job is done, the other workers just need to be notified that they have access.
If a plain mutex is used here, any thread would block all the others.
To give an example: the main thread's job do_something() here is a read, and it blocks the main thread, thus the whole system is blocked.
Unfortunately, shared_mutex won't solve this problem:
std::shared_mutex lk; // the shared_mutex I tried

void example() {
    if (!flag.test_and_set()) {
        // leader thread:
        lk.lock();
        do_something();
        lk.unlock();
        flag.clear();
    } else {
        // worker thread
        lk.lock_shared();
        do_some_other_thing();
        lk.unlock_shared();
    }
}

// outer loop
void looper() {
    std::vector<std::thread> threads;
    for (int i=0; i<10; ++i) {
        threads.emplace_back([]() {
            while(1) {
                example();
            }
        });
    }
}
In this code, if the leader's job is done and there is not much to do between this unlock and the next lock (remember they're in a loop), it may grab the lock again, leaving the workers with nothing to do, which is why I called it starvation earlier.
And to explain the blocking in do_something(): I don't want this part of the job to take all my CPU time, even when the leader's job is not ready (no data has arrived to read).
And std::call_once may still not be the answer to this, because, as you can see, the workers must wait until the leader's job is finished.
To summarize, this is actually a one-producer-multi-consumer problem.
But I want the consumers to do their job only when the product is ready for them, and any thread can be the producer or a consumer: whichever thread first finds that the product has run out should become the producer, and the others are then automatically consumers.
Unfortunately, I'm not sure whether this idea will work or not.
it's kind of like the condition_variable::notify_all(), but I don't want the threads wakeup one-by-one, which may cause starvation
In principle it's not waking up that is serialized, but re-acquiring the lock.
You can avoid that by using std::condition_variable_any with a std::shared_lock - so long as nobody ever gets an exclusive lock on the std::shared_mutex. Alternatively, you can provide your own Lockable type.
Note however that this won't magically allow you to concurrently run more threads than you have cores, or force the scheduler to start them all running in parallel. They'll just be marked as runnable and scheduled as normal - this only fixes the avoidable serialization in your own code.
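A minimal sketch of that arrangement, with one caveat: to publish the flag without missed wakeups, the writer below briefly takes an exclusive lock, while the waiters hold and re-acquire only shared locks (the flag and all names are illustrative):
#include <condition_variable>
#include <mutex>
#include <shared_mutex>

std::shared_mutex m;
std::condition_variable_any cv;
bool ready = false; // guarded by m

void worker()
{
    std::shared_lock<std::shared_mutex> lk(m); // shared: workers don't exclude each other
    cv.wait(lk, [] { return ready; });         // wakes up re-acquiring only the shared lock
    // do_some_other_thing();                  // all workers can run here concurrently
}

void leader()
{
    // do_something();
    {
        std::unique_lock<std::shared_mutex> lk(m); // brief exclusive lock to set the flag
        ready = true;
    }
    cv.notify_all(); // woken workers re-acquire shared locks, so they proceed together
}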
It sounds like you are looking for std::call_once:
#include <mutex>

void example()
{
    static std::once_flag flag;
    bool i_did_once = false;
    std::call_once(flag, [&i_did_once] {
        i_did_once = true;
        do_something();
    });
    if (!i_did_once)
        do_some_other_thing();
}
I don't see how your problem relates to starvation. Are you perhaps thinking about the thundering herd problem? This may arise if do_some_other_thing has a mutex but in that case you have to describe your problem in more detail.

Synchronize n Threads with only using Semaphore and/or mutex in C++

We're studying for our test next week, and have been given an exercise from our teacher, and we just don't see the solution:
How to synchronize n threads, so that all n threads wait at a specific location and only continue with their "work" together when all n threads have reached that location?
We're allowed to use mutex and semaphore constructs. The solution should be easy, but we just can't find the answer.
Here's a big hint. You need 2 semaphores, both used with N counts. You can solve this with an extra thread. The key is that you can call down() on a semaphore multiple times; e.g., if you call down() on a semaphore 8 times, you need all 8 up()'s to happen before you can continue.
// an additional thread (not one of the N)
void trigger(Semaphore* workersCollect, Semaphore* workersRelease, int n)
{
    while(true)
    {
        for (int i = 0; i < n; ++i)
            workersCollect->down();
        for (int i = 0; i < n; ++i)
            workersRelease->up();
    }
}

// Prototype for the "checkpoint" function (exercise for the reader)
void await(Semaphore* workersCollect, Semaphore* workersRelease);
You can also solve it without the extra thread, by using more complicated state checking.
This design has a drawback. If a worker finishes its work extremely quickly, it can grab more than one task (while another thread ends up not running at all). This is fine if you have a threadpool kind of design, but bad if, say, each thread is supposed to work on its own distinct section of a dataset.
To fix that, you need a semaphore per thread. Something akin to
Semaphore workerRelease[N];
but being careful to avoid false sharing. (You don't want more than 1 semaphore on a cache line.)
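For completeness, one way the checkpoint could be filled in, given the trigger above (a sketch, and a spoiler for the exercise):
void await(Semaphore* workersCollect, Semaphore* workersRelease)
{
    workersCollect->up();    // report arrival at the barrier
    workersRelease->down();  // block until the trigger releases all N workers
}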

Check if a thread is finished to send another param to it

I want to check whether a thread's job has finished, so that I can call it again and send another parameter to it. The code is something like this:
void SendMassage(double Speed)
{
    Sleep(200);
    cout << "Speed:" << Speed << endl;
}

int main() {
    int Speed_1 = 0;
    thread f(SendMassage, Speed_1);
    for (int i = 0; i < 50; i++)
    {
        Sleep(20);
        if (?)
        {
            another call of thread // If last thread done then call it again, otherwise not.
        }
        Speed_1++;
    }
}
How should I do it?
Use, e.g., an atomic flag to indicate that the thread has finished:
std::atomic<bool> finished_flag{false};

void SendMassage(double Speed) {
    Sleep(200);
    cout << "Speed:" << Speed << endl;
    finished_flag = true;
}

int main() {
    int Speed_1 = 0;
    thread f(SendMassage, Speed_1);
    while (Speed_1 < 50) {
        Sleep(20);
        if (finished_flag) {
            f.join();
            finished_flag = false;
            f = std::thread(SendMassage, Speed_1);
        }
        Speed_1++;
    }
    f.join();
}
Working example: https://wandbox.org/permlink/BrEMHFvlInshBy5V
Note that I assumed that, according to your code, you don't want to block when checking whether the thread f has finished. Otherwise, simply call f.join().
If you want to wait until a thread has finished its job without using Sleep, you need to call its join method, like so:
thread t(SendMassage, Speed_1);
t.join();
// Code here will start executing after returning from join
You can read more about it here http://en.cppreference.com/w/cpp/thread/thread/join
About sending another parameter: I think the best way would be to split the work into another function that you call after the thread has been joined. If you need information that is known only inside the threaded function, you could create a class that stores that information in its fields and use it in the function you're threading.
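A small, hypothetical illustration of that idea; the SpeedJob type, its fields, and the doubled result are made up for the example:
#include <chrono>
#include <iostream>
#include <thread>

// hypothetical job object: carries the input in, and the result back out
struct SpeedJob {
    double speed;  // input
    double result; // filled in by the thread

    void run()
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
        std::cout << "Speed:" << speed << std::endl;
        result = speed * 2; // stand-in for "information known only inside"
    }
};

int main()
{
    SpeedJob job{5.0, 0.0};
    std::thread t(&SpeedJob::run, &job);
    t.join(); // after join, job.result is safe to read
    std::cout << "Result:" << job.result << std::endl;
}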
Possibly the simplest way of doing this is just joining the thread. Nothing clever, but...
OK, why would you then want another thread at all, if your main thread spends all its time sleeping anyway? So you are quite surely looking for something cleverer.
I personally like the principle of queues; you could use e.g. a std::deque:
Your producer thread places values in, and your consumer thread takes them out. Of course, you need to protect the queue via a std::mutex (or by other appropriate means) against race conditions.
The consumer would run in an endless loop, processing the queue when entries are available and sleeping when there are none. Have a look at this response for how to do the waiting...
There is the danger, though, that your queue runs full, so you might define some threshold at which you stop, or at least slow down, producing new values if you discover your producer being too fast. The queue has another advantage, though: if your producer is too fast, you might have more than one consumer, all serving the same queue (depending on your needs, putting the results back together might need some extra effort to keep the ordering correct). A sketch of such a queue follows.
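A minimal sketch, assuming int values, a single consumer, and an illustrative done flag for shutdown:
#include <condition_variable>
#include <deque>
#include <mutex>

std::deque<int> queue;
std::mutex mu;
std::condition_variable cv;
bool done = false; // set when no more values will come

void produce(int value)
{
    {
        std::lock_guard<std::mutex> lock(mu);
        queue.push_back(value);
    }
    cv.notify_one(); // wake the consumer if it was sleeping
}

void finish()
{
    {
        std::lock_guard<std::mutex> lock(mu);
        done = true;
    }
    cv.notify_all();
}

void consume_all()
{
    while (true) {
        std::unique_lock<std::mutex> lock(mu);
        cv.wait(lock, [] { return !queue.empty() || done; });
        if (queue.empty())
            break; // done and drained
        int value = queue.front();
        queue.pop_front();
        lock.unlock();
        // process value outside the lock
    }
}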
Admittedly, that's quite some work; it might be worth the effort, or it might be overkill. If a simpler approach already fits your needs, Daniel's answer is fine, too...

Using a mutex to block execution from outside the critical section

I'm not sure I got the terminology right but here goes - I have this function that is used by multiple threads to write data (using pseudo code in comments to illustrate what I want)
// these are initialized in the constructor
int* data;
std::atomic<size_t> size;

void write(int value) {
    // wait here while "read_lock"
    // set "write_lock" to "write_lock" + 1
    auto slot = size.fetch_add(1, std::memory_order_acquire);
    data[slot] = value;
    // set "write_lock" to "write_lock" - 1
}
The order of the writes is not important; all I need here is for each write to go to a unique slot.
Every once in a while, though, I need one thread to read the data using this function:
int* read() {
    // set "read_lock" to true
    // wait here while "write_lock"
    int* ret = data;
    data = new int[capacity];
    size = 0;
    // set "read_lock" to false
    return ret;
}
So it basically swaps out the buffer and returns the old one (I've removed the capacity logic to keep the snippets short).
In theory this should lead to two operating scenarios:
1 - just a bunch of threads writing into the container
2 - when some thread executes the read function, all new writers have to wait; the reader waits until all existing writes are finished, then does the read logic, and scenario 1 can continue.
The question part is that I don't know what kind of a barrier to use for the locks -
A spinlock would be wasteful, since there are many containers like this and they would all burn CPU cycles.
I don't know how to apply std::mutex since I only want the write function to be in a critical section if the read function is triggered. Wrapping the whole write function in a mutex would cause unnecessary slowdown for operating scenario 1.
So what would be the optimal solution here?
If you have C++14 capability then you can use a std::shared_timed_mutex to separate out readers and writers. In this scenario it seems you need to give your writer threads shared access (allowing other writer threads at the same time) and your reader threads unique access (kicking all other threads out).
So something like this may be what you need:
class MyClass
{
public:
    using mutex_type  = std::shared_timed_mutex;
    using shared_lock = std::shared_lock<mutex_type>;
    using unique_lock = std::unique_lock<mutex_type>;

private:
    mutable mutex_type mtx;

public:
    // All updater threads can operate at the same time
    auto lock_for_updates() const
    {
        return shared_lock(mtx);
    }

    // Reader threads need to kick all the updater threads out
    auto lock_for_reading() const
    {
        return unique_lock(mtx);
    }
};

// many threads can call this
void do_writing_work(std::shared_ptr<MyClass> sptr)
{
    auto lock = sptr->lock_for_updates();
    // update the data here
}

// access the data from one thread only
void do_reading_work(std::shared_ptr<MyClass> sptr)
{
    auto lock = sptr->lock_for_reading();
    // read the data here
}
The shared_locks allow other threads to gain a shared_lock at the same time but prevent a unique_lock gaining simultaneous access. When a reader thread tries to gain a unique_lock all shared_locks will be vacated before the unique_lock gets exclusive control.
You can also do this with regular mutexes and condition variables rather than shared. Supposedly shared_mutex has higher overhead, so I'm not sure which will be faster. With Gallik's solution you'd presumably be paying to lock the shared mutex on every write call; I got the impression from your post that write gets called way more than read so maybe this is undesirable.
int* data; // initialized somewhere
std::atomic<size_t> size = 0;
std::atomic<bool> reading = false;
std::atomic<int> num_writers = 0;
std::mutex entering;
std::mutex leaving;
std::condition_variable cv;

void write(int x) {
    ++num_writers;
    if (reading) {
        --num_writers;
        if (num_writers == 0)
        {
            std::lock_guard l(leaving);
            cv.notify_one();
        }
        { std::lock_guard l(entering); } // block until the reader releases "entering"
        ++num_writers;
    }
    auto slot = size.fetch_add(1, std::memory_order_acquire);
    data[slot] = x;
    --num_writers;
    if (reading && num_writers == 0)
    {
        std::lock_guard l(leaving);
        cv.notify_one();
    }
}

int* read() {
    int* other_data = new int[capacity];
    {
        std::unique_lock enter_lock(entering);
        reading = true;
        std::unique_lock leave_lock(leaving);
        cv.wait(leave_lock, [] () { return num_writers == 0; });
        swap(data, other_data);
        size = 0;
        reading = false;
    }
    return other_data;
}
It's a bit complicated and took me some time to work out, but I think this should serve the purpose pretty well.
In the common case, where only writing is happening, reading is always false. So you do the usual work and pay only for two additional atomic increments and two untaken branches. The common path thus does not need to lock any mutexes, unlike the solution involving a shared mutex, which is supposedly expensive: http://permalink.gmane.org/gmane.comp.lib.boost.devel/211180.
Now, suppose read is called. The expensive, slow heap allocation happens first; meanwhile, writing continues uninterrupted. Next, the entering lock is acquired, which has no immediate effect. Now, reading is set to true. Immediately, any new calls to write enter the first branch, and eventually hit the entering lock, which they are unable to acquire (as it's already taken), and those threads then get put to sleep.
Meanwhile, the read thread is now waiting on the condition that the number of writers is 0. If we're lucky, this could actually go through right away. If however there are threads in write in either of the two locations between incrementing and decrementing num_writers, then it will not. Each time a write thread decrements num_writers, it checks if it has reduced that number to zero, and when it does it will signal the condition variable. Because num_writers is atomic which prevents various reordering shenanigans, it is guaranteed that the last thread will see num_writers == 0; it could also be notified more than once but this is ok and cannot result in bad behavior.
Once that condition variable has been signalled, that shows that all writers are either trapped in the first branch or are done modifying the array. So the read thread can now safely swap the data, and then unlock everything, and then return what it needs to.
As mentioned before, in typical operation there are no locks, just increments and untaken branches. Even when a read does occur, the read thread will have one lock and one condition variable wait, whereas a typical write thread will have about one lock/unlock of a mutex and that's all (one, or a small number of write threads, will also perform a condition variable notification).