I've written a simple implementation of Peterson's algorithm in C++ with multithreading. The program modifies a string from two threads, but I'm not getting the expected final result. Where am I going wrong?
using namespace std;
int flag[2]={0,1};
int turn;
void* first(void* data){
while(flag[1] && turn==1){}
string &str=*(static_cast<string*>(data));
void* second(void* data){
while(flag[0] && turn==0){}
string &str=*(static_cast<string*>(data));
int main(){
int rc=0;
string s = "wxyz";
pthread_t t;
while(flag[0] && flag[1]!=0){}
return 0;

Prior to C++11 there was no threading model in C++. Since C++11, your code performs unsynchronized accesses to the same variables from several threads, which is a data race.
Data races result in undefined behavior.
Changing a std::string is not atomic. You cannot do it safely while other threads are reading from or writing to it.
In C++11 the threading primitives of std are a better idea than the above raw pthread code, excluding the very rare features you cannot emulate.
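To illustrate the point about the std primitives, here is a minimal sketch (not the asker's program) in which two threads modify one string under a std::mutex. The helper name append_from_two_threads and the suffixes are made up for the example.

```cpp
#include <mutex>
#include <string>
#include <thread>

// Hypothetical helper: two threads append to one string under a mutex.
std::string append_from_two_threads() {
    std::string text = "wxyz";
    std::mutex text_mutex;  // guards every access to text

    // Each thread takes the lock before touching the string.
    auto append = [&](const char* suffix) {
        std::lock_guard<std::mutex> guard(text_mutex);
        text += suffix;
    };

    std::thread t1(append, "-first");
    std::thread t2(append, "-second");
    t1.join();  // wait for both writers before reading the result
    t2.join();
    return text;
}
```

The order of the two suffixes is not fixed, because either thread may take the lock first; what the mutex guarantees is that the string is never modified by both at once.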

Refactored to use atomics. Note the explicit fences to guarantee correct ordering of reads/writes to the (non-atomic) string across threads.
Maybe someone would like to sanity-check my logic?
#include <iostream>
#include <thread>
#include <atomic>
#include <memory>
using namespace std;
// atomic types require the compiler to issue appropriate
// store-release/load-acquire ordering
std::atomic<int> flag[2]={{0},{1}};
std::atomic<int> turn;
void first(std::string& str){
while(flag[1] && turn==1){}
void second(std::string& str){
while(flag[0] && turn==0){}
int main(){
string s = "wxyz";
auto t1 = std::thread(first, std::ref(s));
auto t2 = std::thread(second, std::ref(s));
for( ; flag[0] && flag[1]; )
cout << s << endl;
return 0;
expected output:
Modern memory architectures are not what they were when this algorithm was invented. Reads and writes to memory don't happen when you expect on a modern chip, and sometimes don't happen at all.
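For what it's worth, Peterson's algorithm can still be expressed with sequentially consistent atomics, which is what std::atomic gives you by default. The following is a minimal sketch, not the asker's code: wants_in, peterson_lock, peterson_unlock and run_peterson_demo are hypothetical names, and the lock guards a plain int counter rather than the string.

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> wants_in[2] = {{false}, {false}};
std::atomic<int> turn{0};
int shared_counter = 0;  // plain int, protected only by the lock below

void peterson_lock(int me) {
    const int other = 1 - me;
    wants_in[me].store(true);  // sequentially consistent by default
    turn.store(other);         // give way to the other thread
    while (wants_in[other].load() && turn.load() == other) {
        std::this_thread::yield();  // spin until it is our turn
    }
}

void peterson_unlock(int me) {
    wants_in[me].store(false);
}

// Runs two threads that each increment the counter `iterations` times.
int run_peterson_demo(int iterations) {
    shared_counter = 0;
    auto worker = [iterations](int me) {
        for (int i = 0; i < iterations; ++i) {
            peterson_lock(me);
            ++shared_counter;  // critical section
            peterson_unlock(me);
        }
    };
    std::thread t0(worker, 0);
    std::thread t1(worker, 1);
    t0.join();
    t1.join();
    return shared_counter;
}
```

Calling run_peterson_demo(20000) should return 40000 if mutual exclusion holds. Weakening the operations to relaxed or acquire/release ordering would break the algorithm, because it relies on the store to a thread's own flag being ordered before its load of the other thread's flag.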
Cancel your appointments for the next 3 hours and watch this fantastic talk on the subject:

I had to make main wait until the threads had finished, so I created separate threads, t for first and u for second, and made main wait with pthread_join until both were done. This removed the need for main to spin.
pthread_t t,u;
//while(flag[0] && flag[1]!=0){}
The atomic fences in the functions remained, as they ensure ordered execution of the instructions.
And although changing the order of the pthread_create calls for first and second always outputs Hello, it defeats the very idea of the first thread waiting for the second thread to complete. So I think the above changes are the answer.


I'm completely new to multithreading and have a little trouble understanding how multithreading actually works.
Let's consider the following example of code. The program simply takes file names as input and counts the number of lowercase letters in them.
#include <iostream>
#include <thread>
#include <mutex>
#include <memory>
#include <vector>
#include <string>
#include <fstream>
#include <ctype.h>
class LowercaseCounter{
LowercaseCounter() :
void count_lowercase_letters(const std::string& filename)
int count = 0;
std::ifstream fin(filename);
char a;
while (fin >> a)
if (islower(a))
std::lock_guard<std::mutex> guard(m);
void print_num() const
std::lock_guard<std::mutex> guard(m);
std::cout << total_count << std::endl;
int total_count;
mutable std::mutex m;
int main(){
std::vector<std::unique_ptr<std::thread>> threads;
LowercaseCounter counter;
std::string line;
while (std::cin >> line)
if (line == "exit")
else if (line == "print")
counter.print_num(); //I think that this should print 0 every time it's called.
threads.emplace_back(new std::thread(&LowercaseCounter::count_lowercase_letters, &counter, line)); // pass a pointer: LowercaseCounter holds a std::mutex and cannot be copied
for (auto& thread : threads)
At first I thought that counter.print_num() would print 0, since the threads have not been 'joined' yet and so would not have executed their functions. However, it turns out that the program works correctly and the output of counter.print_num() is not 0. So I asked myself the following questions.
What actually happens when a thread is constructed?
If the program above works fine, then a thread must start executing when it is created. What, then, does the std::thread::join method do?
If the thread is executed at the time of creation, then what's the point of using multithreading in this example?
Thanks in advance.
You seem to be under the impression that the program can only be running one thread at a time, and that it needs to interrupt whatever it's doing in order to execute the code of the thread. That's not the case.
You can think of a thread as a completely separate program that happens to share memory and resources with the program that created it. The function you pass as an argument is that program's main() for all intents and purposes. In Linux, threads are literally separate processes, but as far as C++ is concerned, that's just an implementation detail.
So, in a modern operating system with preemptive multitasking, much like multiple programs can run at the same time, threads can also run at the same time. Note that I say can, it's up to the compiler and OS to decide when to give CPU time to each thread.
then what does std::thread::join method do?
It just waits until the thread is done.
So what would happen if I didn't call the join() method for each of the threads?
It would crash upon reaching the end of main(): destroying a std::thread object that is still joinable (neither joined nor detached) calls std::terminate.
As you said, in C++ a thread starts executing when it is created. All std::thread::join does is wait for the thread to finish execution.
In your code all the threads will start executing simultaneously in the loop and then the main thread will wait for each thread to finish execution in the next loop.
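The start-then-join pattern described above can be sketched as follows. This is a simplified, hypothetical variant of the question's program: count_lowercase_parallel is a made-up name, and it counts letters in in-memory strings rather than files so that it stays self-contained.

```cpp
#include <cctype>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Counts lowercase letters across all strings, one thread per string.
int count_lowercase_parallel(const std::vector<std::string>& texts) {
    int total = 0;
    std::mutex total_mutex;  // protects total
    std::vector<std::thread> threads;
    threads.reserve(texts.size());

    for (const std::string& text : texts) {
        // The new thread may begin running as soon as it is constructed.
        threads.emplace_back([&total, &total_mutex, &text] {
            int count = 0;
            for (unsigned char c : text) {
                if (std::islower(c)) ++count;
            }
            std::lock_guard<std::mutex> guard(total_mutex);
            total += count;
        });
    }

    // join() does not start anything; it only waits for completion.
    for (std::thread& t : threads) t.join();
    return total;
}
```

Reading total before the join loop would give whatever partial sum the threads had reached so far, which is why print_num() in the question can print a non-zero value before any join.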

Please consider this code:
#include <thread>
int myInt = 10;
bool firsttime = true;
void dothings(){
/*repeatedly check for myInt here*/
while(true) {
if(myInt > 200) { /*send an alert to a socket*/}
void launchThread() {
if (firsttime) {
std::thread t2(dothings);
firsttime = false;
} else {
/* update myInt with some value here*/
int main() {
/* sleep for 4 seconds */
while(true) {
std::thread t1(launchThread);
I have to call launchthread - there is no other way around to update a value or to start the thread t2 - this is how a third party SDK is designed.
Note that launchThread is exiting first. Main will keep on looping.
To my understanding, however, dothings() will continue to run.
My question is - can dothings still access the newly updated values of myInt after subsequent calls of launchThread from main?
I can't find a definite answer on google - but I believe it will - but it is not thread safe and data corruption can happen. But may be experts here can correct me. Thank you.
About the lifetime of myInt and firsttime
The lifetime of both myInt and firsttime will start before main() runs, and end after main() returns. Neither launchThread nor dothings manages the lifetime of any variables (except for t2, which is detached anyway, so it shouldn't matter).
Whether a thread was started by the main thread, or by any other thread, doesn't have any relevance. Once a thread starts, and especially when it is detached, it is basically independent: It has no relation to the other threads running in the program.
Thou shalt not access shared memory without synchronization
But yes, you will run into problems. myInt is shared between multiple threads, so you have to synchronize accesses to it. If you don't, you will eventually run into undefined behavior caused by simultaneous access to shared memory. The simplest way to synchronize myInt is to make it into an atomic.
I'm assuming only one thread is running launchThread at each given time. Looking at your example, though, that may be not the case. If it is not, you also need to synchronize firsttime.
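As a rough sketch of the atomic option, assuming one watcher thread and one updater thread: my_int, watch_until_above and run_atomic_demo are hypothetical names, and the increments of 50 are only there to make the example terminate.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> my_int{10};

// Polls my_int and returns the first value above the threshold it observes.
int watch_until_above(int threshold) {
    int seen = my_int.load();
    while (seen <= threshold) {
        std::this_thread::yield();  // still a busy wait, just a polite one
        seen = my_int.load();
    }
    return seen;
}

int run_atomic_demo() {
    my_int.store(10);
    int observed = 0;
    std::thread watcher([&observed] { observed = watch_until_above(200); });
    std::thread updater([] {
        for (int i = 0; i < 5; ++i) my_int.fetch_add(50);  // 60, 110, ..., 260
    });
    updater.join();
    watcher.join();
    return observed;  // safe to read: the watcher has been joined
}
```

This removes the undefined behavior but keeps the polling loop, so the watcher may miss intermediate values and still burns CPU time while it waits.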
However, your myInt looks a lot like a Condition Variable. Maybe you want to have dothings be blocked until your condition (myInt > 200) is fulfilled. A std::condition_variable will help you with that. This will avoid a busy wait and save your processor some cycles. Some kind of event system using Message Queues can also help you with that, and it will even make your program cleaner and easier to maintain.
Following is a small example on using condition variables and atomics to synchronize your threads. I've tried to keep it simple, so there are still some improvements to be made here. I leave those to your discretion.
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <iostream>
#include <thread>
std::mutex cv_m; // This mutex will be used both for myInt and cv.
std::condition_variable cv;
int myInt = 10; // myInt is already protected by the mutex, so there's no need for it to be an atomic.
std::atomic<bool> firstTime{true}; // firstTime does need to be an atomic, because it may be accessed by multiple threads, and is not protected by a mutex.
void dothings(){
while(true) {
// std::condition_variable only works with std::unique_lock.
std::unique_lock<std::mutex> lock(cv_m);
// This will do the same job of your while(myInt > 200).
// The difference is that it will only check the condition when
// it is notified that the value has changed.
cv.wait(lock, [](){return myInt > 200;});
// Note that the lock is reacquired after waking up from the wait(), so it is safe to read and modify myInt here.
std::cout << "Alert! (" << myInt << ")\n";
myInt -= 40; // I'm making myInt fall out of the range here. Otherwise, we would get multiple alerts after the condition (since it would be now true forever), and it wouldn't be as interesting.
void launchThread() {
// Both the read and the write to firstTime need to be a single atomic operation.
// Otherwise, two or more threads could read the value as "true", and assume this is the first time entering this function.
if (firstTime.exchange(false)) {
std::thread t2(dothings);
} else {
std::lock_guard<std::mutex> lock(cv_m);
myInt += 50;
// Value of myInt has changed. Notify all waiting threads.
int main() {
for (int i = 0; i < 6; ++i) { // I'm making this a for loop just so I can be sure the program exits
std::thread t1(launchThread);
// We sleep only to wait for anything to be printed. Your program has an infinite loop on main() already, so you don't have this problem.

I have code at work that starts multiple threads that perform some operations, and if any of them fails it sets a shared variable to false.
The main thread then joins all the worker threads. A simulation of this looks roughly like this (I commented out a possible fix, which I don't know is needed):
#include <thread>
#include <atomic>
#include <vector>
#include <iostream>
#include <cassert>
using namespace std;
//atomic_bool success = true;
bool success = true;
int main()
vector<thread> v;
for (int i = 0; i < 10; ++i)
if (i == 5 || i == 6)
//, memory_order_release);
success = false;
for (auto& t : v)
//assert(success.load(memory_order_acquire) == false);
assert(success == false);
cout << "Finished" << endl;
return 0;
Is there a possibility that main thread will read the success variable as true even though one of the workers set it to false?
I found that thread::join() is a full memory barrier (source), but does that imply a synchronizes-with relationship with the following read of the success variable from the main thread, so that we're guaranteed to get the newest value?
Is the fix I posted (in the commented code) necessary in this case (or maybe another fix if this one is wrong)?
Is there a possibility that the read of the success variable will be optimized away (since it's not volatile), so that we get an old value regardless of the implicit memory barrier that supposedly exists in thread::join?
The code is supposed to work on multiple architectures (I cannot remember all of them, I don't have the makefile in front of me), but there are at least x86, amd64, itanium, arm7.
Thanks for any help with this.
Edit: I've modified the example, because in the real situation more than one thread can try to write to the success variable.
The code above represents a data race, and the use of join cannot change that fact. If only one thread wrote to the variable, it would be fine. But you have two threads writing to it, with no synchronization between them. That's a data race.
join simply means "all side effects of that thread's operation have completed and are now visible to you." That does not create ordering or synchronization between that thread and any thread other than your own.
If you used an atomic_bool, then it wouldn't be UB; it would be guaranteed to be false. But because there is a data race, you get pure UB. It might be true, false, or nasal demons.
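A minimal sketch of the atomic variant, assuming the same ten workers as in the question (run_workers is a hypothetical name, and the worker bodies are reduced to the failing case):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Starts ten workers; workers 5 and 6 report failure.
bool run_workers() {
    std::atomic<bool> success{true};
    std::vector<std::thread> workers;
    for (int i = 0; i < 10; ++i) {
        workers.emplace_back([i, &success] {
            // Concurrent atomic stores are not a data race.
            if (i == 5 || i == 6) success.store(false);
        });
    }
    // Each join makes that worker's writes visible to this thread.
    for (std::thread& t : workers) t.join();
    return success.load();
}
```

Here the default sequentially consistent ordering is used for simplicity. Since the completion of each thread synchronizes with the return of its join(), the main thread is guaranteed to read false after the loop.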

I am researching mutexes.
I came up with this example that seems to work without any synchronization.
#include <cstdint>
#include <thread>
#include <iostream>
constexpr size_t COUNT = 10000000;
int g_x = 0;
void p1(){
for(size_t i = 0; i < COUNT; ++i){
void p2(){
int a = 0;
for(size_t i = 0; i < COUNT; ++i){
if (a > g_x){
std::cout << "Problem detected" << '\n';
a = g_x;
int main(){
std::thread t1{ p1 };
std::thread t2{ p2 };
std::cout << g_x << '\n';
My assumptions are the following:
Thread 1 changes the value of g_x, but it is the only thread that changes the value, so theoretically this is supposed to be OK.
Thread 2 reads the value of g_x. Reads are supposed to be atomic on x86 and ARM, so there should be no problem there either. I have an example with several reader threads and it works OK too.
In other words, the write is not shared and the reads are atomic.
Are the assumptions correct?
There's certainly a data race here: g_x is not an std::atomic; it is written to by one thread, and read from by another. So the results are undefined.
Note that the CPU memory model is only part of the deal. The compiler might do all sorts of optimizations (using registers, reordering etc.) if you don't declare your shared variables properly.
As for mutexes, you do not need one here. Declaring g_x as atomic should remove the UB and guarantee proper communication between the threads. Btw, the for in p2 can probably be optimized out even if you're using atomics, but I assume this is just a reduced code and not the real thing.
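Here is a rough sketch of what the atomic version could look like. The names g_counter, kCount and run_counter_demo are made up for this example, and the loop count is reduced so it finishes quickly.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

constexpr std::size_t kCount = 100000;
std::atomic<int> g_counter{0};

// Returns true if the reader never saw the counter go backwards
// and the final value equals kCount.
bool run_counter_demo() {
    g_counter.store(0);
    bool went_backwards = false;
    std::thread writer([] {
        for (std::size_t i = 0; i < kCount; ++i) g_counter.fetch_add(1);
    });
    std::thread reader([&went_backwards] {
        int last = 0;
        for (std::size_t i = 0; i < kCount; ++i) {
            int now = g_counter.load();  // atomic read, no data race
            if (now < last) went_backwards = true;
            last = now;
        }
    });
    writer.join();
    reader.join();
    return !went_backwards && g_counter.load() == static_cast<int>(kCount);
}
```

Because every access to g_counter is atomic, the reader is guaranteed to observe the writer's increments in a consistent order, and the compiler may not cache the value in a register across iterations.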

I'm trying to understand memory fences in C++11. I know there are better ways to do this (atomic variables and so on), but I wondered if this usage was correct. I realize that this program doesn't do anything useful; I just wanted to make sure that the usage of the fence functions did what I thought they did.
Basically, that the release fence ensures that any changes made in this thread before the fence are visible to other threads after the fence, and that in the second thread any changes to the variables are visible immediately after the fence?
Is my understanding correct? Or have I missed the point entirely?
#include <iostream>
#include <atomic>
#include <thread>
int a;
void func1()
for(int i = 0; i < 1000000; ++i)
a = i;
// Ensure that changes to a to this point are visible to other threads
void func2()
for(int i = 0; i < 1000000; ++i)
// Ensure that this thread's view of a is up to date
std::cout << a;
int main()
std::thread t1 (func1);
std::thread t2 (func2);
t1.join(); t2.join();
Your usage does not actually ensure the things you mention in your comments. That is, your usage of fences does not ensure that your assignments to a are visible to other threads or that the value you read from a is 'up to date.' This is because, although you seem to have the basic idea of where fences should be used, your code does not actually meet the exact requirements for those fences to "synchronize".
Here's a different example that I think demonstrates correct usage better.
#include <iostream>
#include <atomic>
#include <thread>
std::atomic<bool> flag(false);
int a;
void func1()
a = 100;
atomic_thread_fence(std::memory_order_release);
flag.store(true, std::memory_order_relaxed);
void func2()
std::cout << a << '\n'; // guaranteed to print 100
int main()
std::thread t1 (func1);
std::thread t2 (func2);
t1.join(); t2.join();
The load and store on the atomic flag do not synchronize, because they both use the relaxed memory ordering. Without the fences this code would be a data race, because we're performing conflicting operations on a non-atomic object in different threads, and without the fences and the synchronization they provide there would be no happens-before relationship between the conflicting operations on a.
However, with the fences we do get synchronization, because we've guaranteed that thread 2 will read the flag written by thread 1 (because we loop until we see that value), and since the atomic write happens after the release fence and the atomic read happens before the acquire fence, the fences synchronize. (See § 29.8/2 for the specific requirements.)
This synchronization means anything that happens-before the release fence happens-before anything that happens-after the acquire fence. Therefore the non-atomic write to a happens-before the non-atomic read of a.
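The fence pairing described above can be written out in full as a minimal sketch. The names ready, payload and run_fence_demo are hypothetical and chosen for this example only.

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> ready(false);
int payload = 0;  // non-atomic data published through the fences

// Returns the value of payload as seen by the consumer thread.
int run_fence_demo() {
    ready.store(false, std::memory_order_relaxed);
    payload = 0;
    int seen = -1;

    std::thread producer([] {
        payload = 100;                                        // non-atomic write
        std::atomic_thread_fence(std::memory_order_release);  // fence, then...
        ready.store(true, std::memory_order_relaxed);         // ...relaxed store
    });
    std::thread consumer([&seen] {
        // Relaxed load in a loop until the producer's store is observed.
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        std::atomic_thread_fence(std::memory_order_acquire);  // pairs with release fence
        seen = payload;                                       // non-atomic read
    });

    producer.join();
    consumer.join();
    return seen;
}
```

The order matters on both sides: the release fence comes before the relaxed store, and the acquire fence comes after the relaxed load that observed it. Swapping either pair would leave the accesses to payload unordered.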
Things get trickier when you're writing a variable in a loop, because you might establish a happens-before relation for some particular iteration, but not other iterations, causing a data race.
std::atomic<int> f(0);
int a;
void func1()
for (int i = 0; i<1000000; ++i) {
a = i;
atomic_thread_fence(std::memory_order_release);
f.store(i, std::memory_order_relaxed);
void func2()
int prev_val = 0;
while (prev_val < 1000000) {
while (true) {
int new_val = f.load(std::memory_order_relaxed);
if (prev_val < new_val) {
prev_val = new_val;
std::cout << a << '\n';
This code still causes the fences to synchronize but does not eliminate data races. For example if f.load() happens to return 10 then we know that a=1,a=2, ... a=10 have all happened-before that particular cout<<a, but we don't know that cout<<a happens-before a=11. Those are conflicting operations on different threads with no happens-before relation; a data race.
Your usage is correct, but insufficient to guarantee anything useful.
For example, the compiler is free to internally implement a = i; like this if it wants to:
while(a != i)
So the other thread may see any values at all.
Of course, the compiler would never implement a simple assignment like that. However, there are cases where similarly perplexing behavior is actually an optimization, so it's a very bad idea to rely on ordinary code being implemented internally in any particular way. This is why we have things like atomic operations, and why fences only produce guaranteed results when used with such operations.