Why doesn't it work? Simple multithreading example - C++

Can you help me understand why this code freezes the program?
#include <iostream>
#include <thread>
#include <mutex>
using namespace std;
int i = 0;
mutex mx;
void foo() {
    while(1) {
        lock_guard<mutex> locker(mx);
        i++;
        if(i == 5000) {
            break;
        }
    }
}
void boo() {
    while(1) {
        if(i == 100) {
            lock_guard<mutex> locker(mx);
            i = 5000;
            break;
        }
    }
}
int main(int argc, char *argv[])
{
    thread th1(foo);
    thread th2(boo);
    th1.join();
    th2.join();
    return 0;
}
Why do I have such a result?
How to change the code to make it right? Could you give me your thoughts.

Even if boo starts running first, it will probably never see i==100.
If you only have one CPU, then it's very unlikely that the CPU would be switched from foo to boo while i==100.
If you have multiple CPUs, then i==100 will probably never even make it into boo's cache, because i is not volatile, and the mutex is not locked between reads.
Really the compiler doesn't even have to read i after the first time, because there are no memory barriers. It can assume that the value hasn't changed.
Even if you were to fix this, the distinct possibility would remain that i could be incremented past 100 before boo would notice. It looks like you expect the two threads to "take turns", but that's just not how it works.

The behavior of the program is undefined, so reasoning about what it does is futile. The problem is that boo reads the value of i and foo both reads and writes the value of i, but the read of i in if (i == 100) in boo is not ordered with respect to the writes occurring in foo. That's a data race, and the behavior of the program is undefined. Sure, you can guess at what might happen, but if you want your code to run correctly, you have to ensure that there are no data races. That means using some form of synchronization: either move the lock in boo before the if, or get rid of the mutex and change the type of i to std::atomic<int>.
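As a rough illustration of the std::atomic<int> option, here is a minimal sketch, assuming C++11. The helper name run_atomic_demo and the lambdas are made up for this example and are not from the question. It also tests with >= instead of ==, because nothing guarantees that the reading thread observes the exact value 100.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper (not from the question): runs both threads to
// completion and returns the final value of the shared counter.
int run_atomic_demo() {
    std::atomic<int> i{0};

    // foo: increment until the counter reaches 5000.
    std::thread foo([&i] {
        for (;;) {
            int cur = i.load();
            if (cur >= 5000) break;
            // Increments only if nobody changed i in the meantime, so the
            // counter can never be pushed past 5000.
            i.compare_exchange_weak(cur, cur + 1);
        }
    });

    // boo: wait until the counter has reached at least 100, then jump to 5000.
    std::thread boo([&i] {
        while (i.load() < 100) std::this_thread::yield();
        i.store(5000);
    });

    foo.join();
    boo.join();
    return i.load();
}
```

Every access to i is now an atomic operation, so there is no data race, and both loops are guaranteed to terminate.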

There are a few concurrency issues with your solution:
You have to lock the mutex consistently. All access to i must be protected by the mutex, so also at the if (i == 100) { line. In the absence of synchronization, the compiler is free to optimize the thread as-if it was running in isolation, and assume i to never change.
There is no guarantee that boo will start before foo. If it starts after, i will already be incremented well above 100.
Mutex locking is not guaranteed to be fair. Two threads competing for the same mutex will not run in an interleaved manner. Which means foo might increment i many times before boo gets a chance to run, so the value of i as seen by boo might easily jump from 0 to 1000, skipping the desired 100.
In isolation, foo will "run away", incrementing i well beyond 5000. There should be some exit or a restart condition.
How to change the code to make it right?
Add some synchronization in order to enforce interleaved processing. For example, using condition_variables to signal between threads:
int i = 0;
mutex mx;
condition_variable updated_cond;
bool updated = false;
condition_variable consumed_cond;
bool consumed = true;
void foo() {
    while (1) {
        unique_lock<mutex> locker(mx);
        consumed_cond.wait(locker, [] { return consumed; });
        consumed = false;
        if (i == 5000) {
            break;
        }
        std::cout << "foo: i = " << i << "+1\n";
        i++;
        updated = true;
        updated_cond.notify_one();
    }
    std::cout << "foo exiting\n";
}
void boo() {
    for (bool exit = false; !exit; ) {
        unique_lock<mutex> locker(mx);
        updated_cond.wait(locker, [] { return updated; });
        updated = false;
        std::cout << "boo: i = " << i << "\n";
        if (i == 100) {
            i = 5000;
            exit = true;
        }
        consumed = true;
        consumed_cond.notify_one();
    }
    std::cout << "boo exiting\n";
}


Are I/O streams really thread-safe?

I wrote a program that writes random numbers to one file in the first thread, while another thread reads them from there and writes those that are prime to another file. The third thread is needed to stop/start the work. I read that I/O streams are thread-safe. Since writing to a single shared resource is thread-safe, what could be the problem?
Output: numbers.log is always written correctly; numbers_prime.log sometimes has no records even though there are prime numbers, and sometimes has all of them.
#include <iostream>
#include <fstream>
#include <thread>
#include <mutex>
#include <vector>
#include <condition_variable>
#include <future>
#include <random>
#include <chrono>
#include <string>
#include <atomic> // std::atomic_int, std::atomic_bool
#include <cmath> // sqrt
using namespace std::chrono_literals;
std::atomic_int ITER_NUMBERS = 30;
std::atomic_bool _var = false;
bool ret() { return _var; }
std::atomic_bool _var_log = false;
bool ret_log() { return _var_log; }
std::condition_variable cv;
std::condition_variable cv_log;
std::mutex mtx;
std::mutex mt;
std::atomic<int> count{0};
std::atomic<bool> _FL = 1;
int MIN = 100;
int MAX = 200;
bool is_empty(std::ifstream& pFile) // function that checks if the file is empty
{
    return pFile.peek() == std::ifstream::traits_type::eof();
}
bool isPrime(int n) // function that checks if the number is prime
{
    if (n <= 1)
        return false;
    for (int i = 2; i <= sqrt(n); i++)
        if (n % i == 0)
            return false;
    return true;
}
void Log(int min, int max) { // function that generates random numbers and writes them to a file numbers.log
std::string str;
std::ofstream log;
std::random_device seed;
std::mt19937 gen{seed()};
std::uniform_int_distribution dist{min, max};
log.open("numbers.log", std::ios_base::trunc);
for (int i = 0; i < ITER_NUMBERS; ++i, ++count) {
std::unique_lock<std::mutex> ulm(mtx);
str = std::to_string(dist(gen)) + '\n';
log.write(str.c_str(), str.length());
_var_log = true;
//_var_log = false;
_var_log = true;
_FL = 0;
void printCheck() { // Checking function to start/stop printing
std::cout << "Log to file? [y/n]\n";
while (_FL) {
char input;
std::cin >> input;
if (input == 'y') {
_var = true;
if (input == 'n') {
_var = false;
void primeLog() { // a function that reads files from numbers.log and writes prime numbers to numbers_prime.log
std::unique_lock ul(mt);
int number = 0;
std::ifstream in("numbers.log");
std::ofstream out("numbers_prime.log", std::ios_base::trunc);
if (is_empty(in)) {
cv_log.wait(ul, ret_log);
int oldCount{};
for (int i = 0; i < ITER_NUMBERS; ++i) {
if (oldCount == count && count != ITER_NUMBERS) { // check if primeLog is faster than Log. If it is faster, then we wait to continue
cv_log.wait(ul, ret_log);
_var_log = false;
if (!in.eof()) {
in >> number;
if (isPrime(number)) {
out << number;
out << "\n";
oldCount = count;
int main() {
std::thread t1(printCheck);
std::thread t2(Log, MIN, MAX);
std::thread t3(primeLog);
return 0;
This has nothing to do with the I/O stream thread safety. The shown code's logic is broken.
The shown code seems to follow a design pattern of breaking up a single logical algorithm into multiple pieces, and scattering them far and wide. This makes it more difficult to understand what it's doing. So let's rewrite a little bit of it, to make the logic more clear. In primeLog let's do this instead:
cv_log.wait(ul, []{ return _var_log; });
_var_log = false;
It's now clearer that this waits for _var_log to be set before proceeding on its merry way. Once it is set, it immediately gets reset.
The code that follows reads exactly one number from the file, before looping back here. So, primeLog's main loop will always handle exactly one number, on each iteration of the loop.
The problem now is very easy to see, once we head over to the other side, and do the same clarification:
std::unique_lock<std::mutex> ulm(mtx);
cv.wait(ulm, []{ return _var; });
// Code that generates one number and writes it to the file
_var_log = true;
Once _var is set to true, it remains true. This loop starts running full blast, iterating continuously. On each iteration it blindly sets _var_log to true and signals the other thread's condition variable.
C++ execution threads are completely independent of each other unless they are explicitly synchronized in some way.
Nothing is preventing this loop from running full blast, getting through its entire number range, before the other execution thread wakes up and decides to read the first number from the file. It'll do that, then go back and wait for its condition variable to be signaled again, for the next number. Its hopes and dreams of the 2nd number will be left unsatisfied.
On each iteration of the generating thread's loop the condition variable, for the other execution thread, gets signaled.
Condition variables are not semaphores. If nothing is waiting on a condition variable when it's signaled -- too bad. When some execution thread decides to wait on a condition variable, it may or may not be immediately woken up.
One of these two execution threads relies on receiving a condition variable notification for every iteration of its loop.
The logic in the other execution thread fails to provide this guarantee. This may not be the only flaw; further analysis might reveal others, but this was the most apparent logical flaw.
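One way to avoid depending on a notification per iteration is to keep a count of pending items under the mutex and wait on that count. The following is only a sketch of that idea, not the asker's program: run_handshake, pending and handled are hypothetical names, and the file I/O is left out.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: the producer publishes n items, the consumer handles
// each one exactly once, and the number of handled items is returned.
int run_handshake(int n) {
    std::mutex m;
    std::condition_variable cv;
    int pending = 0;   // items produced but not yet consumed (guarded by m)
    int handled = 0;   // written only by the consumer, read after join

    std::thread consumer([&] {
        for (int k = 0; k < n; ++k) {
            std::unique_lock<std::mutex> lock(m);
            // The predicate is checked before sleeping, so items published
            // while nobody was waiting are still picked up.
            cv.wait(lock, [&] { return pending > 0; });
            --pending;
            ++handled;
        }
    });

    std::thread producer([&] {
        for (int k = 0; k < n; ++k) {
            {
                std::lock_guard<std::mutex> lock(m);
                ++pending;
            }
            cv.notify_one();
        }
    });

    producer.join();
    consumer.join();
    return handled;
}
```

Even if the producer finishes all n iterations before the consumer wakes up, the count remembers how many items are outstanding, which a boolean flag cannot do.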
Thanks to those who wrote about read-behind-write, now I know more. But that was not the problem. The main problem was that when the file was new, calling pFile.peek() in the is_empty function set eofbit on the stream, and nothing ever cleared it. Thus, until the end of the program, in.rdstate() == std::ios_base::eofbit.
Fix: reset the stream state.
if (is_empty(in)) {
cv_log.wait(ul, ret_log);
in.clear(); // reset state
There was also a problem with reading and writing one file from different threads. It was not the cause of my program's error, but it led to another one.
When I run the program again, primeLog() may open std::ifstream in("numbers.log") for reading before log.open("numbers.log", std::ios_base::trunc) runs. In that case, in reads the old data into its buffer before log.open erases it with the std::ios_base::trunc flag. Hence we read the old data and write it to numbers_prime.log.
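The eofbit behaviour described above can be reproduced without files. This is a small sketch that uses std::istringstream as a stand-in for the file stream (an assumption made to keep the example self-contained); read_after_peek is a hypothetical helper name.

```cpp
#include <sstream>

// Hypothetical helper: peeks an empty stream, then tries to read a number
// that "arrives" later. Returns true if the extraction succeeded.
bool read_after_peek(bool clear_first) {
    std::istringstream in("");   // stand-in for a freshly truncated numbers.log
    in.peek();                   // hits end of input, which sets eofbit
    if (clear_first) {
        in.clear();              // reset the state flags, as in the fix above
    }
    in.str("17\n");              // stand-in for the writer adding data
    int number = 0;
    in >> number;                // fails right away if eofbit is still set
    return !in.fail();
}
```

Calling read_after_peek(false) reports a failed read even though data is available, while read_after_peek(true) reads the number.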

If statement passes only when preceded by debug cout line (multi-threading in C)

I created this code to use for solving CPU intensive tasks real-time and potentially as a base for a game engine in the future. For it I created a system where there is an array of ints each thread modifies to signal whether they are done with their current task.
The problem occurs when running it with more than 4 threads. When using 6 threads or more, the "if (threadone_private == threadcount)" stops working UNLESS I add this debug line "cout << threadone_private << endl;" before it.
I cannot comprehend why this debug line makes any difference to whether the if conditional works as expected, nor why it works without it when using 4 threads or fewer.
For this code I'm using:
#include <GL/glew.h>
#include <GLFW/glfw3.h>
#include <iostream>
#include <thread>
#include <atomic>
#include <vector>
#include <string>
#include <fstream>
#include <sstream>
using namespace std;
Right now this code only counts up to 60 trillion, in asynchronous steps of 3 billion, really fast.
Here are the relevant parts of the code:
int thread_done[6] = { 0,0,0,0,0,0 };
atomic<long long int> testvar1 = 0;
atomic<long long int> testvar2 = 0;
atomic<long long int> testvar3 = 0;
atomic<long long int> testvar4 = 0;
atomic<long long int> testvar5 = 0;
atomic<long long int> testvar6 = 0;
void task1(long long int testvar, int thread_number)
int continue_work = 1;
for (; ; ) {
while (continue_work == 1) {
for (int i = 1; i < 3000000001; i++) {
thread_done[thread_number] = 1;
if (thread_number==0) {
testvar1 = testvar;
if (thread_number == 1) {
testvar2 = testvar;
if (thread_number == 2) {
testvar3 = testvar;
if (thread_number == 3) {
testvar4 = testvar;
if (thread_number == 4) {
testvar5 = testvar;
if (thread_number == 5) {
testvar6 = testvar;
continue_work = 0;
if (thread_done[thread_number] == 0) {
continue_work = 1;
And here is the relevant part of the main thread:
int main() {
long long int testvar = 0;
int threadcount = 6;
int threadone_private = 0;
thread thread_1(task1, testvar, 0);
thread thread_2(task1, testvar, 1);
thread thread_3(task1, testvar, 2);
thread thread_4(task1, testvar, 3);
thread thread_5(task1, testvar, 4);
thread thread_6(task1, testvar, 5);
for (; ; ) {
if (threadcount == 0) {
for (int i = 1; i < 3000001; i++) {
cout << testvar << endl;
else {
while (testvar < 60000000000000) {
threadone_private = thread_done[0] + thread_done[1] + thread_done[2] + thread_done[3] + thread_done[4] + thread_done[5];
cout << threadone_private << endl;
if (threadone_private == threadcount) {
testvar = testvar1 + testvar2 + testvar3 + testvar4 + testvar5 + testvar6;
cout << testvar << endl;
thread_done[0] = 0;
thread_done[1] = 0;
thread_done[2] = 0;
thread_done[3] = 0;
thread_done[4] = 0;
thread_done[5] = 0;
I expected that since each worker thread only modifies one int of the array thread_done, and since the main thread only ever reads it until all worker threads are waiting, this if (threadone_private == threadcount) should be bulletproof... Apparently I'm missing something important that goes wrong whenever I change this:
threadone_private = thread_done[0] + thread_done[1] + thread_done[2] + thread_done[3] + thread_done[4] + thread_done[5];
cout << threadone_private << endl;
if (threadone_private == threadcount) {
To this:
threadone_private = thread_done[0] + thread_done[1] + thread_done[2] + thread_done[3] + thread_done[4] + thread_done[5];
//cout << threadone_private << endl;
if (threadone_private == threadcount) {
Disclaimer: Concurrent code is quite complicated and easy to get wrong, so it's generally a good idea to use higher level abstractions. There are a whole lot of details that are easy to get wrong without ever noticing. You should think very carefully about doing such low-level programming if you're not an expert. Sadly C++ lacks good built-in high level concurrent constructs, but there are libraries out there that handle this.
It's unclear to me what the whole code is supposed to do anyhow. As far as I can see, whether the code ever stops relies purely on timing, even if you did the synchronization correctly, and that is completely non-deterministic. Your threads could execute in such a way that the thread_done flags are never all set at the same time.
But apart from that there is at least one correctness issue: You're reading and writing to int thread_done[6] = { 0,0,0,0,0,0 }; without synchronization. This is undefined behavior so the compiler can do what it wants.
What probably happens is that the compiler sees that it can cache the values read from thread_done, since without synchronization it may assume that no other thread changes them. The external call to std::cout means the compiler can't be sure that the values aren't changed behind its back, so it has to re-read them on each iteration (also, std::cout uses locks, which cause synchronization in most implementations, which again limits what the compiler can assume).
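One possible way to remove the undefined behavior is to make the done flags atomic. The sketch below assumes that a reduced version of the program is acceptable: run_workers is a hypothetical helper, the workers run once instead of forever, and the step count is a parameter.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: each worker counts to `steps`, publishes its result
// and raises its flag; the caller polls the flags and then sums the results.
long long run_workers(int thread_count, long long steps) {
    std::vector<std::atomic<int>> done(thread_count);
    std::vector<std::atomic<long long>> result(thread_count);
    for (int t = 0; t < thread_count; ++t) {
        done[t].store(0);
        result[t].store(0);
    }

    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&, t] {
            long long local = 0;
            for (long long k = 0; k < steps; ++k) ++local;
            result[t].store(local);
            done[t].store(1);   // raised only after the result is published
        });
    }

    // Every pass performs real atomic loads, so no debug output is needed
    // to make the main thread notice the flags.
    for (;;) {
        int finished = 0;
        for (auto& d : done) finished += d.load();
        if (finished == thread_count) break;
        std::this_thread::yield();
    }

    long long total = 0;
    for (auto& r : result) total += r.load();
    for (auto& w : workers) w.join();
    return total;
}
```

Polling like this still burns CPU while waiting; it only shows that the comparison becomes reliable once the flags are atomic.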
I cannot see any std::mutex, std::condition_variable or variants of std::lock in your code. Doing multithreading without any of those will never succeed reliably. Because whenever multiple threads modify the same data, you need to make sure only one thread (including your main thread) has access to that data at any given time.
Edit: I noticed you use atomic. I do not have any experience with this, however I know using mutexes works reliably.
Therefore, you need to lock every access (read or write) to that data with a mutex like this:
std::mutex myMutex;
std::condition_variable myCondition;
int workersDone = 0;
/* main thread */
std::unique_lock<std::mutex> lock(myMutex); //waits until mutex is locked.
while(workersDone != 2) {
myCondition.wait(lock); //the mutex is unlocked while waiting
std::cout << "the data is ready now" << std::endl;
} //the lock is destroyed, unlocking the mutex
/* Worker thread */
while(true) {
std::unique_lock<std::mutex> lock(myMutex); //waits until mutex is locked
if(read_or_modify_a_piece_of_shared_data() == DATA_FINISHED) {
break; //lock leaves the scope, unlocks the mutex
prepare_everything_for_the_next_piece_of_shared_data(); //DO NOT access data here
//data is processed
myCondition.notify_one(); //no mutex here. This wakes up the waiting thread
I hope this gives you an idea on how to use mutexes and condition variables to gain thread safety.
Disclaimer: 100% pseudo code ;)
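For reference, the pseudo code above could be turned into something runnable roughly as follows. This is a sketch under assumptions: wait_for_workers is a hypothetical helper, the workers do no real work, and the counter is incremented while holding the mutex so that the waiting thread cannot miss the update.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: starts worker_count workers and blocks until all of
// them have reported completion. Returns the number of completed workers.
int wait_for_workers(int worker_count) {
    std::mutex m;
    std::condition_variable cv;
    int workers_done = 0;   // guarded by m

    std::vector<std::thread> workers;
    for (int t = 0; t < worker_count; ++t) {
        workers.emplace_back([&] {
            // ... the worker's own computation would go here ...
            {
                std::lock_guard<std::mutex> lock(m);
                ++workers_done;   // shared data is only touched under the lock
            }
            cv.notify_one();      // no mutex needed for the notification
        });
    }

    int seen = 0;
    {
        std::unique_lock<std::mutex> lock(m);
        // The mutex is released while waiting and re-acquired on wake-up.
        cv.wait(lock, [&] { return workers_done == worker_count; });
        seen = workers_done;
    }
    for (auto& w : workers) w.join();
    return seen;
}
```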

Using infinite loops in std::thread to increment and display a value

Considering the following, simple code:
using ms = std::chrono::milliseconds;
int val = 0;
for (;;) {
    std::cout << val++ << ' ';
    std::this_thread::sleep_for(ms(200));
}
We see that this prints consecutive numbers forever, one every 0.2 seconds.
Now, I would like to implement the same logic using a helper class and multithreading. My aim is to be able to run something similar to this:
int main()
Foo f;
std::thread t1(&Foo::inc, f);
std::thread t2(&Foo::dis, f);
where Foo::inc() will increment a member variable val of an object f by 1 and Foo::dis() will display the same variable.
Since the original idea consisted of incrementing and printing the value infinitely, I would assume that both of those functions must contain an infinite loop. The problem that could occur is data race - reading and incrementing the very same variable. To prevent that I decided to use std::mutex.
My idea of implementing Foo is as follows:
class Foo {
int val;
Foo() : val{0} {}
void inc()
void dis()
using ms = std::chrono::milliseconds;
std::cout << val << ' ';
Obviously it's missing the mtx object, so the line
std::mutex mtx;
is written just under the #includes, declaring mtx as a global variable.
To my understanding, combining this class' definition with the above main() function should issue two, separate, infinite loops that each will firstly lock the mutex, either increment or display val and unlock the mutex so the other one could perform the second action.
What actually happens is instead of displaying the sequence of 0 1 2 3 4... it simply displays 0 0 0 0 0.... My guess is that I am either using std::mutex::lock and std::mutex::unlock incorrectly, or my fundamental understanding of multithreading is lacking some basic knowledge.
The question is - where is my logic wrong?
How would I approach this problem using a helper class and two std::threads with member functions of the same object?
Is there a guarantee that the incrementation of val and printing of it will each occur one after the other using this kind of logic? i.e. will there never be a situation when val is incremented twice before it being displayed, or vice versa?
You are sleeping with the mutex locked, preventing the other thread from running for most of the time.
void dis()
using ms = std::chrono::milliseconds;
std::cout << val << ' ';
std::this_thread::sleep_for(ms(200)); // this is still blocking the other thread
Try this:
void dis()
using ms = std::chrono::milliseconds;
std::cout << val << ' ';
mtx.unlock(); // unlock to allow the other thread to progress
Also, rather than using a global std::mutex you could add it as a member of your class.
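Putting both suggestions together (sleep outside the lock, mutex as a member), a bounded sketch might look like this. The class name Counter, the stop flag and the sample_counter driver are assumptions added so that the example terminates; the question's loops run forever.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

class Counter {
    int val = 0;
    std::mutex mtx;                  // member instead of a global
    std::atomic<bool> stop{false};   // hypothetical: lets the demo terminate

public:
    void inc() {
        while (!stop) {
            {
                std::lock_guard<std::mutex> lock(mtx);
                ++val;
            }
            std::this_thread::yield();
        }
    }

    // Takes `samples` readings, sleeping between them without holding the lock.
    std::vector<int> sample(int samples) {
        std::vector<int> seen;
        for (int k = 0; k < samples; ++k) {
            {
                std::lock_guard<std::mutex> lock(mtx);
                seen.push_back(val);
            }   // lock released here, before the sleep
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        stop = true;
        return seen;
    }
};

// Hypothetical driver: runs the incrementing thread while sampling.
std::vector<int> sample_counter(int samples) {
    Counter c;
    std::thread t(&Counter::inc, &c);
    std::vector<int> seen = c.sample(samples);
    t.join();
    return seen;
}
```

The readings never go backwards, but they are not consecutive either: the incrementing thread runs many times between two samples.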
If you want to synchronize the threads to produce an even output of numbers incrementing by exactly one each time, then you need something like a std::condition_variable so that each thread can signal the other when it has done its part of the job (thread one incrementing and thread two printing).
Here is an example:
class Foo {
    int val;
    std::mutex mtx;
    std::condition_variable cv;
    bool new_value; // flag when a new value is ready
public:
    Foo() : val{0}, new_value{false} {}
    void inc()
    {
        for (;;) {
            std::unique_lock<std::mutex> lock(mtx);
            // release the lock and wait until new_value has been consumed
            cv.wait(lock, [this]{ return !new_value; }); // wait for change in new_value
            ++val;
            new_value = true; // signal for the other thread there is a new value
            cv.notify_one(); // wake up the other thread
        }
    }
    void dis()
    {
        using ms = std::chrono::milliseconds;
        for (;;) {
            // a nice delay
            std::this_thread::sleep_for(ms(200));
            std::unique_lock<std::mutex> lock(mtx);
            // release the lock and wait until new_value has been produced
            cv.wait(lock, [this]{ return new_value; }); // wait for a new value
            std::cout << val << ' ' << std::flush; // don't forget to flush
            new_value = false; // signal for the other thread that the new value was used
            cv.notify_one(); // wake up the other thread
        }
    }
};
int main(int argc, char** argv)
{
    Foo f;
    std::thread t1(&Foo::inc, &f);
    std::thread t2(&Foo::dis, &f);
    t1.join();
    t2.join();
}
A mutex is not a signal. It is not fair. You can unlock then relock a mutex, and someone waiting for it may never notice.
All it guarantees is that at most one thread has it locked at any time.
Your task, splitting it into two threads, seems utterly pointless. Using sleep_for is also a bad idea, as printing takes an unknown amount of time, making the period between displays drift by an unpredictable amount.
You probably (A) do not want to do this, and failing that (B) use a condition variable. One thread increments the value every X time (based off a fixed start time, not based off delays of X), and then signals the condition variable. It holds no mutex while waiting.
The other thread waits on the condition variable and the counter value changing. When it wakes, it copies the counter, unlocks, prints once, updates the last value seen, then waits on the condition variable (and value changing) again.
A mild benefit to this is that if the io is ridiculously slow or blocking, the counter keeps incrementing, so other consumers can use it.
struct Counting {
    int val = -1; // optionally atomic
    std::mutex mtx;
    std::condition_variable cv;
    void counting() {
        while (true) {
            {
                auto l=std::unique_lock<std::mutex>(mtx);
                ++val; // even if atomic, val must be modified while or before the mtx is held and before the notify.
            }
            // or notify all:
            cv.notify_one(); // no need to hold lock here
            using namespace std::literals;
            std::this_thread::sleep_for(200ms); // ideally wait to an absolute time instead of delay here
        }
    }
    void printing() {
        int old_val=-1;
        while (true) {
            int new_val=[&]{
                auto lock=std::unique_lock<std::mutex>(mtx);
                cv.wait(lock, [&]{ return val!=old_val; }); // only print if we have a new value
                return val;
            }();// release lock, no need to hold it while printing
            std::cout << new_val << std::endl; // endl flushes. Note there are threading issues streaming to cout like this.
            old_val=new_val; // update last printed value
        }
    }
};
If one thread is printing and the other is counting, you'll get basically what you want.
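The remark about waiting for an absolute time instead of a fixed delay can be sketched with std::this_thread::sleep_until. The helper name run_ticks and its parameters are assumptions for illustration; the per-tick work is left as a comment.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper: performs n ticks spaced period_ms apart and returns
// the total elapsed time in milliseconds.
long long run_ticks(int n, int period_ms) {
    using clock = std::chrono::steady_clock;
    const auto period = std::chrono::milliseconds(period_ms);
    const auto start = clock::now();
    for (int k = 1; k <= n; ++k) {
        // ... the per-tick work (increment, notify) would go here ...
        // Deadline k is computed from the fixed start time, so time spent
        // on the work above does not accumulate as drift.
        std::this_thread::sleep_until(start + k * period);
    }
    return std::chrono::duration_cast<std::chrono::milliseconds>(
               clock::now() - start).count();
}
```

With sleep_for(period) each iteration would take the period plus the time of the work, whereas here the k-th tick is scheduled at start + k * period regardless of how long the work took.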
When launching a thread with a member function, you need to pass the address of the object, not the object itself
std::thread t2(&Foo::dis, &f);
Please note that this still won't print 1 2 3 4 .. You'll need to have the increment operation and the print alternate exactly for that.
#include <thread>
#include <mutex>
std::mutex mtx1, mtx2;
class Foo {
int val;
Foo() : val{0} { mtx2.lock(); }
void inc()
void dis()
using ms = std::chrono::milliseconds;
std::cout << val <<std::endl;
int main()
Foo f;
std::thread t1(&Foo::inc, &f);
std::thread t2(&Foo::dis, &f);
Also take a look at http://en.cppreference.com/w/cpp/thread/condition_variable
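The point about passing the object's address can be checked in isolation, because std::thread copies its arguments. A small sketch with a hypothetical Counter struct (not the question's Foo):

```cpp
#include <thread>

struct Counter {
    int val = 0;
    void inc() { ++val; }
};

// The thread receives a copy of c, so the caller's object is unchanged.
int inc_via_copy() {
    Counter c;
    std::thread t(&Counter::inc, c);
    t.join();
    return c.val;
}

// The thread receives a pointer to c, so it modifies the caller's object.
int inc_via_pointer() {
    Counter c;
    std::thread t(&Counter::inc, &c);
    t.join();
    return c.val;
}
```

This is why the original program kept printing 0: each thread worked on its own copy of f.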

How to stop an async evaluating function on timeout?

Say we have a simple async call that we want to kill/terminate/eliminate on timeout:
// future::wait_for
#include <iostream> // std::cout
#include <future> // std::async, std::future
#include <chrono> // std::chrono::milliseconds
// a non-optimized way of checking for prime numbers:
bool is_prime (int x) {
    for (int i=2; i<x; ++i) if (x%i==0) return false;
    return true;
}
int main ()
{
    // call function asynchronously:
    std::future<bool> fut = std::async (is_prime,700020007);
    // do something while waiting for function to set future:
    std::cout << "checking, please wait";
    std::chrono::milliseconds span (100);
    while (fut.wait_for(span)==std::future_status::timeout)
        std::cout << '.';
    bool x = fut.get();
    std::cout << "\n700020007 " << (x?"is":"is not") << " prime.\n";
    return 0;
}
We want to kill it as soon as the first timeout happens. I can't find a method for this in future.
The closest I could find to stopping a running task was the std::packaged_task reset method, yet it does not say whether it can interrupt a running task. So how does one kill a task running asynchronously without using boost thread or other non-STL libraries?
It's not possible to stop a std::async out of the box. However, you can pass a flag to is_prime that tells it to stop, and throw an exception if there is a timeout:
// future::wait_for
#include <atomic> // std::atomic_bool
#include <chrono> // std::chrono::milliseconds
#include <functional> // std::ref
#include <future> // std::async, std::future
#include <iostream> // std::cout
#include <stdexcept> // std::runtime_error
// A non-optimized way of checking for prime numbers:
bool is_prime(int x, std::atomic_bool & run) {
    for (int i = 2; i < x && run; ++i)
        if (x%i == 0) return false;
    if (!run)
        throw std::runtime_error("timed out!");
    return true;
}
int main()
{
    // Call function asynchronously:
    std::atomic_bool run;
    run = true;
    std::future<bool> fut = std::async(is_prime, 700020007, std::ref(run));
    // Do something while waiting for function to set future:
    std::cout << "checking, please wait";
    std::chrono::milliseconds span(100);
    while (fut.wait_for(span) == std::future_status::timeout)
    {
        std::cout << '.';
        run = false; // first timeout: ask is_prime to stop
    }
    try
    {
        bool x = fut.get();
        std::cout << "\n700020007 " << (x ? "is" : "is not") << " prime.\n";
    }
    catch (const std::runtime_error & ex)
    {
        // Handle timeout here
    }
    return 0;
}
Why being able to stop a thread is bad.
Stopping threads at an arbitrary point is dangerous and will lead to resource leaks, where the resources are pointers, handles to files and folders, and other things the program should clean up.
When killing a thread, the thread may or may not be doing work. Whatever it was doing, it won’t get to complete and any variables successfully created will not get their destructors called because there is no thread to run them on.
I have outlined some of the issues here.
I think it's not possible to safely interrupt a running loop from outside the loop itself, so the STL doesn't provide such functionality. Of course, one could try to kill the running thread, but that's not safe, as it may lead to resource leaks.
You can check for timeout inside is_prime function and return from it if timeout happens. Or you can try to pass a reference to std::atomic<bool> to is_prime and check its value each iteration. Then, when timeout happens you change the value of the atomic in the main so is_prime returns.
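The first suggestion, checking for the timeout inside the function itself, could look roughly like this. It is a sketch assuming C++17 (for std::optional); is_prime_until and its deadline parameter are hypothetical names, not part of the question.

```cpp
#include <chrono>
#include <optional>

// Returns true/false when the check finishes, or std::nullopt if the
// deadline passes first. Requires C++17 for std::optional.
std::optional<bool> is_prime_until(int x,
                                   std::chrono::steady_clock::time_point deadline) {
    if (x < 2) return false;
    for (int i = 2; i < x; ++i) {
        // Look at the clock only every 4096 iterations to keep the loop cheap.
        if (((i - 2) & 4095) == 0 &&
            std::chrono::steady_clock::now() >= deadline) {
            return std::nullopt;
        }
        if (x % i == 0) return false;
    }
    return true;
}
```

The caller would pass something like std::chrono::steady_clock::now() + std::chrono::milliseconds(100) as the deadline, and the function gives up on its own, so no thread has to be killed.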

Why does this thread-safe queue create a deadlock?

I've written my own version of a thread-safe queue. However, when I run this program, it hangs/deadlocks.
I'm wondering why it locks/hangs forever.
void concurrentqueue::addtoQueue(const int number)
locker currentlock(lock_for_queue);
int concurrentqueue::getFromQueue()
int number = 0;
locker currentlock(lock_for_queue);
if ( empty() )
number = numberlist.front();
return number;
bool concurrentqueue::empty()
return numberlist.empty();
I've written the locker class as an RAII wrapper.
class locker
locker(pthread_mutex_t& lockee): target(lockee)
pthread_mutex_t target;
My writer/reader thread code is very simple: the writer thread adds to the queue and the reader thread reads from the queue.
void * writeintoqueue(void* myqueue)
void *t = 0;
concurrentqueue *localqueue = (concurrentqueue *) myqueue;
for ( int i = 0; i < 10 ; ++i)
void * readfromqueue(void* myqueue)
void *t = 0;
concurrentqueue *localqueue = (concurrentqueue *) myqueue;
int number = 0;
for ( int i = 0 ; i < 10 ; ++i)
number = localqueue->getFromQueue();
std::cout << "The number from the queue is " << number << std::endl;
This is definitely not safe:
if ( empty() )
If another thread that was not previously waiting calls getFromQueue() after addtoQueue() has signalled the condition variable and exited, but before the waiting thread has acquired the lock, then that thread could take the value, and the waiting thread would wake up and wrongly expect the queue to have values in it. You must recheck that the queue is not empty.
Change the if into a while:
while ( empty() )
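The same recheck-in-a-loop rule applies when the queue is written with the standard library instead of pthreads. A minimal sketch, assuming C++11; SafeQueue and run_queue_demo are hypothetical names, and this is not the asker's concurrentqueue class.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

class SafeQueue {
    std::queue<int> items;
    std::mutex m;
    std::condition_variable available;

public:
    void push(int number) {
        {
            std::lock_guard<std::mutex> lock(m);
            items.push(number);
        }
        available.notify_one();
    }

    int pop() {
        std::unique_lock<std::mutex> lock(m);
        // Equivalent to: while (items.empty()) available.wait(lock);
        available.wait(lock, [this] { return !items.empty(); });
        int number = items.front();
        items.pop();
        return number;
    }
};

// Hypothetical driver: one writer pushes 0..9, one reader sums what it pops.
int run_queue_demo() {
    SafeQueue q;
    int sum = 0;
    std::thread reader([&] { for (int k = 0; k < 10; ++k) sum += q.pop(); });
    std::thread writer([&] { for (int k = 0; k < 10; ++k) q.push(k); });
    writer.join();
    reader.join();
    return sum;
}
```

The predicate form of wait re-evaluates the condition after every wake-up, so spurious wake-ups and values taken by another reader are both handled.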
Reformulating spong's comment as an answer: your locker class should NOT be copying the pthread_mutex_t by value. You should use a reference or a pointer instead, e.g.:
class locker
locker(pthread_mutex_t& lockee): target(lockee)
pthread_mutex_t& target; // <-- this is a reference
The reason for this is that all pthreads data types should be treated as opaque types -- you don't know what's in them and should not copy them. The library does things like looking at a particular memory address to determine if a lock is held, so if there are two copies of a variable that indicates if the lock is held, odd things could happen, such as multiple threads appearing to succeed in locking the same mutex.
I tested your code, and it also deadlocked for me. I then ran it through Valgrind, and although it did not deadlock in that case (due to different timings, or maybe Valgrind only simulates one thread at a time), Valgrind reported numerous errors. After fixing locker to use a reference instead, it ran without deadlocking and without generating any errors in Valgrind.
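For completeness, a reference-holding locker might look like the sketch below. The constructor and destructor bodies are an assumption (the usual lock in the constructor, unlock in the destructor), since the question's class body is not shown here, and locked_increments is a hypothetical test driver.

```cpp
#include <pthread.h>
#include <thread>
#include <vector>

class locker {
public:
    explicit locker(pthread_mutex_t& lockee) : target(lockee) {
        pthread_mutex_lock(&target);     // assumed body: lock on construction
    }
    ~locker() { pthread_mutex_unlock(&target); }
    locker(const locker&) = delete;
    locker& operator=(const locker&) = delete;

private:
    pthread_mutex_t& target;   // refers to the caller's mutex, never a copy
};

// Hypothetical driver: several threads increment one counter under the lock.
int locked_increments(int thread_count, int per_thread) {
    pthread_mutex_t mutex;
    pthread_mutex_init(&mutex, nullptr);
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back([&] {
            for (int k = 0; k < per_thread; ++k) {
                locker guard(mutex);
                ++counter;
            }
        });
    }
    for (auto& th : threads) th.join();
    pthread_mutex_destroy(&mutex);
    return counter;
}
```

Because every locker refers to the same pthread_mutex_t object, all threads contend for one lock and no increment is lost.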
See also Debugging with pthreads.