Usually, when std::atomic types are accessed concurrently by multiple threads, there is no guarantee that a thread will read the "up to date" value, and a thread may get a stale value from cache or any older value. The only way to get the up-to-date value is to use functions such as compare_exchange_XXX. (See questions here and here)
#include <atomic>
#include <mutex>

std::atomic<int> cancel_work{0};
std::mutex mutex;

// Thread 1 executes this function
void thread1_func()
{
    cancel_work.store(1, <some memory order>);
}

// Thread 2 executes this function
void thread2_func()
{
    // No guarantee tmp will be 1, even when thread1_func is executed first
    int tmp = cancel_work.load(<some memory order>);
}
However, my question is: what happens when using a mutex and lock instead? Do we have any guarantee about the freshness of the shared data accessed?
For example, assume thread 1 and thread 2 run concurrently and thread 1 obtains the lock first (executes first). Is thread 2 then guaranteed to see the modified value and not an old one?
Does it matter whether the shared data "cancel_work" is atomic or not in this case?
#include <atomic>
#include <mutex>
#include <thread>

int cancel_work = 0; // any difference if replaced with std::atomic<int> in this case?
std::mutex mutex;

// Thread 1 executes this function
void thread1_func()
{
    // Assuming thread 1 enters the lock FIRST
    std::lock_guard<std::mutex> lock(mutex);
    cancel_work = 1;
}

// Thread 2 executes this function
void thread2_func()
{
    std::lock_guard<std::mutex> lock(mutex);
    int tmp = cancel_work; // Will tmp be 1 or 0?
}

int main()
{
    std::thread t1(thread1_func);
    std::thread t2(thread2_func);
    t1.join(); t2.join();
    return 0;
}
Yes, using the mutex/lock guarantees that thread2_func() will obtain the modified value, provided thread1_func() actually acquires the mutex first: the unlock in thread1_func() synchronizes-with the subsequent lock in thread2_func(), establishing a happens-before relationship.
However, according to the std::atomic specification:
The synchronization is established only between the threads releasing
and acquiring the same atomic variable. Other threads can see
different order of memory accesses than either or both of the
synchronized threads.
So your code will work correctly using acquire/release logic, too.
#include <atomic>

std::atomic<int> cancel_work{0};

void thread1_func()
{
    cancel_work.store(1, std::memory_order_release);
}

void thread2_func()
{
    // tmp will be 1 when thread1_func was executed first and this
    // acquire load observes the release store
    int tmp = cancel_work.load(std::memory_order_acquire);
}
The C++ standard only constrains the observable behavior of the abstract machine in well-formed programs without undefined behavior anywhere during the abstract machine's execution.
It provides no guarantees about the mapping between the physical hardware actions the program executes and its behavior.
In your cases, on the abstract machine there is no ordering between thread1's and thread2's execution. Even if the physical hardware were to schedule and run thread1 before thread2, that places zero constraints (in your simple example) on the output the program generates. The program's output is only constrained by what legal outputs the abstract machine could produce.
A C++ compiler can legally:
1. Eliminate your program completely as equivalent to return 0;
2. Prove that the read of cancel_work in thread2 is unsequenced relative to all modifications of cancel_work away from 0, and change it to a constant read of 0.
3. Actually run thread1 first, then run thread2, but prove it can treat the operations in thread2 as-if they occurred before thread1 ran, so not bother forcing a cache-line refresh in thread2, and read stale data from cancel_work.
What actually happens on the hardware does not affect what the program can legally do. What the program can legally do in threading situations is restricted by the observable behavior of the abstract machine and by the behavior of synchronization primitives and their use in different threads.
For an actual happens-before relationship to occur, you need something like:
std::thread(thread1_func).join();
std::thread(thread2_func).join();
and now we do know that everything in thread1_func happens before thread2_func.
We can still rewrite your program as return 0; and make similar changes. But we now have a guarantee that the code in thread1_func happens before the code in thread2_func.
Note that we can eliminate (1) above via:
std::lock_guard<std::mutex> lock(mutex);
int tmp = cancel_work; //Will tmp be 1 or 0?
std::cout << tmp;
and cause tmp to actually be printed.
The program can then be converted to one that prints 1 or 0 and has no threading at all. It could keep the threading, but change thread2_func to print a constant 0. Etc.
So we rewrite your program to look like this:
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::condition_variable cv;
bool writ = false;
int cancel_work = 0; // any difference if replaced with std::atomic<int> in this case?
std::mutex mutex;

// Thread 1 executes this function
void thread1_func()
{
    {
        std::lock_guard<std::mutex> lock(mutex);
        cancel_work = 1;
    }
    {
        std::lock_guard<std::mutex> lock(mutex);
        writ = true;
        cv.notify_all();
    }
}

// Thread 2 executes this function
void thread2_func()
{
    std::unique_lock<std::mutex> lock(mutex);
    cv.wait(lock, []{ return writ; });
    int tmp = cancel_work;
    std::cout << tmp; // will print 1
}

int main()
{
    std::thread t1(thread1_func);
    std::thread t2(thread2_func);
    t1.join(); t2.join();
    return 0;
}
and now thread2_func happens after thread1_func and all is good. The read is guaranteed to be 1.
Related
I am trying to use a mutex to arrange the output between two threads: print the message from thread 1, then print the output from thread 2.
But I am getting the messages printed randomly, so it seems like I am not using the mutex correctly.
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

using namespace std;

std::mutex mu;

void share_print(string msg, int id)
{
    mu.lock();
    cout << msg << id << endl;
    mu.unlock();
}

void func1()
{
    for (int i = 0; i > -50; i--)
    {
        share_print(string("From Func 1: "), i);
    }
}

int main()
{
    std::thread t1(func1);
    for (int i = 0; i < 50; i++)
    {
        share_print(string("From Main: "), i);
    }
    t1.join();
    return 0;
}
The output shows the two threads' messages interleaved randomly.
Your usage of mutexes is 100% correct. It's your expectations of mutex behavior and of execution-thread behavior that miss the mark. For example, C++ execution threads give you no guarantee whatsoever that any line in func1 will be executed before main() completely finishes executing its for loop.
As far as mutexes are concerned, your only guarantees that matter here are:
Only one execution thread can lock a given std::mutex at the same time.
If a std::mutex is not locked, one of two things will happen when an execution thread attempts to lock it: either a) it will lock it, or b) if another thread already has it locked or manages to lock it first, it will block until the mutex is no longer locked and then attempt to lock it again.
It is very important to understand all the implications of these rules. Even if your execution thread has a mutex locked, then proceeds to unlock it and lock it again, it may end up relocking the mutex immediately even if another execution thread is also waiting to lock it. Mutexes do not impose any kind of queueing, locking order, or priority between the different execution threads trying to lock them. It's a free-for-all.
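To make that concrete, here is a small sketch (new code, not from your program) in which two threads hammer the same mutex. Nothing about std::mutex promises that acquisitions alternate; any split between the two counters, including one thread winning almost every round, is a legal outcome:
#include <iostream>
#include <mutex>
#include <thread>

std::mutex mu;
bool done = false;   // protected by mu
long hog_count = 0;  // protected by mu
long main_count = 0; // protected by mu

void hog() {
    for (int i = 0; i < 100000; ++i) {
        mu.lock();
        ++hog_count;
        mu.unlock(); // nothing says the waiting thread goes next; this
                     // thread may immediately relock on the next iteration
    }
    mu.lock();
    done = true;
    mu.unlock();
}

int main() {
    std::thread t(hog);
    while (true) {
        mu.lock();
        if (done) { mu.unlock(); break; }
        ++main_count;
        mu.unlock();
    }
    t.join();
    // The split is unspecified; long runs by one thread are perfectly legal.
    std::cout << "hog: " << hog_count << ", main: " << main_count << "\n";
}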
Even if mutexes worked the way you expected them to work, that still gives you no guarantees whatsoever:
std::thread t1 (func1 );
Your only guarantee here is that func1 will be called by a new execution thread at some point on or after this std::thread object's construction finishes.
for (int i = 0; i < 50; i++)
{
share_print(string("From Main: "), i);
}
This entire for loop can finish even before a single line from func1 gets executed. It'll lock and unlock the mutex 50 times and call it a day, before func1 wakes up and does the same.
Or, alternatively, it's possible for func1 to run to completion before main enters the for loop.
You can have no expectations about the order of execution of multiple execution threads unless explicit synchronization takes place.
In order to achieve your interleaved output a lot more work is needed. In addition to a mutex there will need to be some kind of condition variable and a separate variable that indicates whose "turn" it is. Each execution thread, both main and func1, will not only need to lock the mutex, but also block on the condition variable until the shared variable indicates its turn is up, then do its printing, set the shared variable to indicate that it's the other thread's turn, signal the condition variable, and only then unlock the mutex (or always keep the mutex locked and spin on the condition variable). A minimal sketch of this scheme follows.
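Here is that sketch (new code, not the original poster's). A "turn" flag guarded by the mutex decides which thread may print, and the condition variable hands the turn back and forth:
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

std::mutex mu;
std::condition_variable cv;
bool mains_turn = true; // guarded by mu: whose turn it is to print

void share_print(const std::string& msg, int id, bool as_main) {
    std::unique_lock<std::mutex> lock(mu);
    cv.wait(lock, [&] { return mains_turn == as_main; }); // block until it's our turn
    std::cout << msg << id << std::endl;
    mains_turn = !as_main; // hand the turn to the other thread
    cv.notify_one();
}

void func1() {
    for (int i = 0; i > -50; i--)
        share_print("From Func 1: ", i, false);
}

int main() {
    std::thread t1(func1);
    for (int i = 0; i < 50; i++)
        share_print("From Main: ", i, true);
    t1.join();
}
Because both loops perform exactly 50 prints, the turn always comes back, and the output alternates strictly between the two threads.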
Please consider this code:
#include <thread>

int myInt = 10;
bool firstTime = true;

void dothings() {
    /* repeatedly check myInt here */
    while (true) {
        if (myInt > 200) { /* send an alert to a socket */ }
    }
}

void launchThread() {
    if (firstTime) {
        std::thread t2(dothings);
        t2.detach();
        firstTime = false;
    } else {
        /* update myInt with some value here */
    }
    return;
}

int main() {
    /* sleep for 4 seconds */
    while (true) {
        std::thread t1(launchThread);
        t1.detach();
    }
}
I have to call launchThread - there is no other way to update the value or to start thread t2 - this is how a third-party SDK is designed.
Note that launchThread exits first. main will keep on looping.
To my understanding, however, dothings() will continue to run.
My question is: can dothings still access the newly updated values of myInt after subsequent calls to launchThread from main?
I can't find a definite answer on Google, but I believe it will - though it is not thread safe and data corruption can happen. Maybe the experts here can correct me. Thank you.
About the lifetime of myInt and firstTime
The lifetime of both myInt and firstTime starts before main() runs and ends after main() returns. Neither launchThread nor dothings manages the lifetime of any variables (except for t2, which is detached anyway, so it shouldn't matter).
Whether a thread was started by the main thread or by any other thread has no relevance. Once a thread starts, and especially when it is detached, it is basically independent: it has no relation to the other threads running in the program.
Thou shalt not access shared memory without synchronization
But yes, you will run into problems. myInt is shared between multiple threads, so you have to synchronize accesses to it. If you don't, you will eventually run into undefined behavior caused by simultaneous access to shared memory. The simplest way to synchronize myInt is to make it an atomic.
I'm assuming only one thread is running launchThread at any given time. Looking at your example, though, that may not be the case. If it is not, you also need to synchronize firstTime.
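For reference, a minimal sketch of that atomic approach (assuming the SDK lets you change the globals' types; the alert and SDK specifics are elided):
#include <atomic>
#include <thread>

std::atomic<int> myInt{10};        // every load/store is now data-race free
std::atomic<bool> firstTime{true}; // exchange() makes check-and-clear one atomic step

void dothings() {
    while (true) {
        if (myInt.load() > 200) { /* send an alert to a socket */ }
    }
}

void launchThread() {
    if (firstTime.exchange(false)) { // only one thread can ever observe true here
        std::thread t2(dothings);
        t2.detach();
    } else {
        myInt.store(250); // a value dothings() will observe, with no data race
    }
}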
Alternatives
However, your myInt looks a lot like a condition variable. Maybe you want dothings to block until your condition (myInt > 200) is fulfilled. A std::condition_variable will help you with that; it avoids a busy wait and saves your processor some cycles. Some kind of event system using message queues could also help, and would even make your program cleaner and easier to maintain.
Following is a small example using condition variables and atomics to synchronize your threads. I've tried to keep it simple, so there are still some improvements to be made; I leave those to your discretion.
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex cv_m; // This mutex will be used both for myInt and cv.
std::condition_variable cv;
int myInt = 10; // myInt is protected by the mutex, so there's no need for it to be an atomic.
std::atomic<bool> firstTime{true}; // firstTime does need to be an atomic, because it may be accessed by multiple threads and is not protected by a mutex.

void dothings() {
    while (true) {
        // std::condition_variable only works with std::unique_lock.
        std::unique_lock<std::mutex> lock(cv_m);
        // This does the same job as your while (myInt > 200).
        // The difference is that it only checks the condition when
        // it is notified that the value has changed.
        cv.wait(lock, []() { return myInt > 200; });
        // Note that the lock is reacquired after waking up from wait(), so it is safe to read and modify myInt here.
        std::cout << "Alert! (" << myInt << ")\n";
        myInt -= 40; // I'm making myInt fall out of the range here. Otherwise we would get an alert on every notification (the condition would be true forever), and it wouldn't be as interesting.
    }
}

void launchThread() {
    // Both the read and the write to firstTime need to be a single atomic operation.
    // Otherwise, two or more threads could read the value as "true" and assume this is the first time entering this function.
    if (firstTime.exchange(false)) {
        std::thread t2(dothings);
        t2.detach();
    } else {
        {
            std::lock_guard<std::mutex> lock(cv_m);
            myInt += 50;
        }
        // The value of myInt has changed. Notify all waiting threads.
        cv.notify_all();
    }
    return;
}

int main() {
    for (int i = 0; i < 6; ++i) { // I'm making this a for loop just so I can be sure the program exits
        std::thread t1(launchThread);
        t1.detach();
    }
    // We sleep only to wait for anything to be printed. Your program already has an infinite loop in main(), so you don't have this problem.
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
I have a vector of strings which is a shared resource.
std::vector<std::string> vecstr;
I have 2 threads which run in parallel:
Thread 1: inserts strings into the shared resource.
Thread 2: calculates the size of the shared resource.
std::mutex mt;

void f1()
{
    mt.lock();
    while (some_condition())
    {
        std::string str = getStringFromSomewhere();
        vecstr.push_back(str);
    }
    mt.unlock();
}

void f2()
{
    mt.lock();
    while (string_sending_hasEnded())
    {
        size_t size = vecstr.size();
    }
    mt.unlock();
}

int main()
{
    std::thread t1(f1);
    std::thread t2(f2);
    t1.join();
    t2.join();
}
My question is: if thread t1 keeps the mutex for the shared resource vecstr locked for the entire duration of its while loop, how will t2 ever get hold of vecstr to calculate its size?
Does execution keep switching between the 2 threads, or does it depend on who gets hold of the mutex first, so that if t1 got hold of the mutex it will release it only after its while loop ends?
If one of the threads hijacks execution by not allowing the other
thread to be switched in between, how do I handle such a scenario with
while/for loops in each thread, when both threads need to execute
continuously? I want both threads to keep switching their execution.
Shall I lock and unlock inside the while loop, so that each iteration
has the mutex locked and unlocked?
You got it. If you want to use mutexes successfully in real life, you should keep a mutex locked only for the smallest amount of time possible, for example just around the push_back() and size() calls.
But really, what you need to do first is figure out what your program is supposed to do, and then use mutexes to achieve that. At the moment I know that you want to run some threads, but that's not yet a description of what you want to achieve.
So if T1 got hold of the mutex, then it will release it only after the while loop ends? Is this true?
Yes, that's true.
Whichever of the threads gets the mt mutex first will keep it locked for the whole time its loop executes, and the other thread will block for that entire duration.
As for your comment
If that's the case how do i handle such a scenario ? Where I want both the threads to keep switching their execution. Shall I lock and unlock inside the while loop, so that each iteration has mutex locked & unlocked
Yes use more fine grained locking, just for the operations that change/access the vector:
std::mutex mt;

void f1() {
    while (some_condition()) {
        std::string str = getStringFromSomewhere();
        {
            std::unique_lock<std::mutex> lock(mt); // locked only for the push_back()
            vecstr.push_back(str);
        }
    }
}

void f2() {
    while (string_sending_hasEnded()) {
        size_t size = 0;
        {
            std::unique_lock<std::mutex> lock(mt); // locked only for the size() call
            size = vecstr.size();
        }
    }
}
I also highly recommend using a scoped lock guard (like the std::unique_lock in my example) instead of calling lock() and unlock() yourself manually. That way the mutex is guaranteed to be unlocked even when, for example, an exception is thrown.
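A small sketch of the difference (mayThrow() is a hypothetical stand-in for any code that might throw):
#include <mutex>

std::mutex m;
void mayThrow(); // hypothetical: any operation that might throw

void manual() {
    m.lock();
    mayThrow(); // if this throws, m stays locked and every later lock() deadlocks
    m.unlock();
}

void raii() {
    std::unique_lock<std::mutex> lock(m);
    mayThrow(); // if this throws, ~unique_lock unlocks m during stack unwinding
}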
I'm not sure I got the terminology right, but here goes: I have this function that is used by multiple threads to write data (using pseudo-code in comments to illustrate what I want):
// these are initialized in the constructor
int* data;
std::atomic<size_t> size;

void write(int value) {
    // wait here while "read_lock"
    // set "write_lock" to "write_lock" + 1
    auto slot = size.fetch_add(1, std::memory_order_acquire);
    data[slot] = value;
    // set "write_lock" to "write_lock" - 1
}
The order of the writes is not important; all I need here is for each write to go to a unique slot.
Every once in a while, though, I need one thread to read the data using this function:
int* read() {
    // set "read_lock" to true
    // wait here while "write_lock"
    int* ret = data;
    data = new int[capacity];
    size = 0;
    // set "read_lock" to false
    return ret;
}
so it basically swaps out the buffer and returns the old one (I've removed capacity logic to make the snippets shorter)
In theory this should lead to 2 operating scenarios:
1 - just a bunch of threads writing into the container
2 - when some thread executes the read function, all new writers have to wait; the reader waits until all existing writes are finished, then does the read logic, and scenario 1 can continue.
The question part is that I don't know what kind of barrier to use for the locks:
A spinlock would be wasteful, since there are many containers like this and they all need CPU cycles.
I don't know how to apply std::mutex, since I only want the write function to be in a critical section if the read function is triggered. Wrapping the whole write function in a mutex would cause unnecessary slowdown for operating scenario 1.
So what would be the optimal solution here?
If you have C++14 capability then you can use a std::shared_timed_mutex to separate out readers and writers. In this scenario it seems you need to give your writer threads shared access (allowing other writer threads at the same time) and your reader threads unique access (kicking all other threads out).
So something like this may be what you need:
#include <memory>
#include <mutex>
#include <shared_mutex>

class MyClass
{
public:
    using mutex_type  = std::shared_timed_mutex;
    using shared_lock = std::shared_lock<mutex_type>;
    using unique_lock = std::unique_lock<mutex_type>;

private:
    mutable mutex_type mtx;

public:
    // All updater threads can operate at the same time
    auto lock_for_updates() const
    {
        return shared_lock(mtx);
    }

    // Reader threads need to kick all the updater threads out
    auto lock_for_reading() const
    {
        return unique_lock(mtx);
    }
};
// many threads can call this
void do_writing_work(std::shared_ptr<MyClass> sptr)
{
    auto lock = sptr->lock_for_updates();
    // update the data here
}

// access the data from one thread only
void do_reading_work(std::shared_ptr<MyClass> sptr)
{
    auto lock = sptr->lock_for_reading();
    // read the data here
}
The shared_locks allow other threads to gain a shared_lock at the same time but prevent a unique_lock from gaining simultaneous access. When a reader thread tries to gain a unique_lock, all shared_locks will be vacated before the unique_lock gets exclusive control.
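For completeness, one possible way to drive those two functions (a sketch; the thread count is arbitrary and the data itself is elided):
#include <memory>
#include <thread>
#include <vector>

int main() {
    auto sptr = std::make_shared<MyClass>();

    std::vector<std::thread> updaters;
    for (int i = 0; i < 4; ++i) // several concurrent updater threads
        updaters.emplace_back(do_writing_work, sptr);

    std::thread reader(do_reading_work, sptr); // one reader kicks them all out

    for (auto& t : updaters) t.join();
    reader.join();
}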
You can also do this with regular mutexes and condition variables rather than a shared mutex. Supposedly shared_mutex has higher overhead, so I'm not sure which will be faster. With Gallik's solution you'd presumably be paying to lock the shared mutex on every write call; I got the impression from your post that write gets called much more often than read, so maybe that is undesirable.
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <utility>

int* data; // initialized somewhere
std::atomic<size_t> size = 0;
std::atomic<bool> reading = false;
std::atomic<int> num_writers = 0;
std::mutex entering;
std::mutex leaving;
std::condition_variable cv;

void write(int x) {
    ++num_writers;
    if (reading) {
        --num_writers;
        if (num_writers == 0)
        {
            std::lock_guard l(leaving);
            cv.notify_one();
        }
        { std::lock_guard l(entering); } // blocks here until the reader is done
        ++num_writers;
    }
    auto slot = size.fetch_add(1, std::memory_order_acquire);
    data[slot] = x;
    --num_writers;
    if (reading && num_writers == 0)
    {
        std::lock_guard l(leaving);
        cv.notify_one();
    }
}

int* read() {
    int* other_data = new int[capacity]; // capacity logic elided, as above
    {
        std::unique_lock enter_lock(entering);
        reading = true;
        std::unique_lock leave_lock(leaving);
        cv.wait(leave_lock, []() { return num_writers == 0; });
        std::swap(data, other_data);
        size = 0;
        reading = false;
    }
    return other_data;
}
It's a bit complicated and took me some time to work out, but I think this should serve the purpose pretty well.
In the common case where only writing is happening, reading is always false. So you do the usual work and pay only for two additional atomic increments and two untaken branches. The common path therefore does not need to lock any mutexes, unlike the solution involving a shared mutex, which is supposedly expensive: http://permalink.gmane.org/gmane.comp.lib.boost.devel/211180.
Now, suppose read is called. The expensive, slow heap allocation happens first, while writing continues uninterrupted. Next, the entering lock is acquired, which has no immediate effect. Now reading is set to true. Immediately, any new calls to write enter the first branch and eventually hit the entering lock, which they are unable to acquire (as it's already taken), and those threads are then put to sleep.
Meanwhile, the read thread is waiting on the condition that the number of writers is 0. If we're lucky, this could go through right away. If, however, there are threads in write in either of the two regions between incrementing and decrementing num_writers, it will not. Each time a write thread decrements num_writers, it checks whether it has reduced the count to zero, and when it does, it signals the condition variable. Because num_writers is atomic, which prevents various reordering shenanigans, it is guaranteed that the last thread will see num_writers == 0; the reader could also be notified more than once, but that is OK and cannot result in bad behavior.
Once the condition variable has been signalled, all writers are either trapped in the first branch or done modifying the array. The read thread can then safely swap the data, unlock everything, and return what it needs to.
As mentioned before, in typical operation there are no locks, just increments and untaken branches. Even when a read does occur, the read thread will have one lock and one condition variable wait, whereas a typical write thread will have about one lock/unlock of a mutex and that's all (one, or a small number of write threads, will also perform a condition variable notification).
The following link (http://www.cplusplus.com/reference/mutex/mutex/try_lock/) states that the sample below can print only values from 1 to 100000. Is it really guaranteed that 0 can't appear in the output?
// mutex::try_lock example
#include <iostream> // std::cout
#include <thread>   // std::thread
#include <mutex>    // std::mutex

volatile int counter (0); // non-atomic counter
std::mutex mtx;           // locks access to counter

void attempt_10k_increases () {
    for (int i=0; i<10000; ++i) {
        if (mtx.try_lock()) { // only increase if currently not locked:
            ++counter;
            mtx.unlock();
        }
    }
}

int main ()
{
    std::thread threads[10];
    // spawn 10 threads:
    for (int i=0; i<10; ++i)
        threads[i] = std::thread(attempt_10k_increases);
    for (auto& th : threads) th.join();
    std::cout << counter << " successful increases of the counter.\n";
    return 0;
}
In any case, it's easy to answer 'How do we get 2?', but it's really not clear how we can get 1 yet never get 0.
try_lock can "fail spuriously when no other thread has a lock on the mutex, but repeated calls in these circumstances shall succeed at some point". If that is true, then the sample can print 0 (and can also print 1 in some cases).
But if the page is right and 0 cannot appear in the output, then maybe the words about "failing spuriously" are not true?
The Standard says the following:
30.4.1.2/14 [thread.mutex.requirements.mutex]
An implementation
may fail to obtain the lock even if it is not held by any other thread. [ Note: This spurious failure is
normally uncommon, but allows interesting implementations based on a simple compare and exchange
(Clause 29). —end note ]
So you can even get 0 if all the calls to try_lock fail.
Also, please do not use cplusplus.com; it has a long history of containing lots of mistakes.
It's safer to use cppreference.com, which is much closer to the Standard.
try_lock can fail if another thread held a lock and just released it, for example. You read that "repeated calls in these circumstances shall succeed at some point". Doing 10,000 calls to try_lock will count as "repeated calls" and one of them will succeed.
You can never get 0:
When the first call to try_lock() happens (it doesn't matter which thread gets there first), the mutex is unlocked. This means 1 of the 10 threads will manage to lock the mutex, so its try_lock() will succeed.
You can get 1:
Let's say thread 0 manages to lock the mutex:
void attempt_10k_increases () {
    for (int i=0; i<10000; ++i) {
        if (mtx.try_lock()) { // threads 1 to 9 are here
            ++counter;        // thread 0 is here
            mtx.unlock();
        }
    }
}
Now suppose the OS scheduler chooses not to run thread 0 for a little while. In the meantime, threads 1 to 9 keep running, calling try_lock() and failing, because thread 0 holds the mutex.
Threads 1 to 9 are now done. They failed to acquire the mutex even once.
Thread 0 gets rescheduled. It unlocks the mutex and finishes.
The counter is now 1.