How to let a writer thread starve - C++

I've written an implementation of the readers-writers problem using the shared_timed_mutex of C++14. In my opinion, the following code should cause the writer to starve, since too many reader threads are working on the database (in this example a simple array) all the time: the writer has no chance to acquire the lock.
mutex cout_mtx; // controls access to standard output
shared_timed_mutex db_mtx; // controls access to data_base
int data_base[] = { 0, 0, 0, 0, 0, 0 };
const static int NR_THREADS_READ = 10;
const static int NR_THREADS_WRITE = 1;
const static int SLEEP_MIN = 10;
const static int SLEEP_MAX = 20;
void read_database(int thread_nr) {
    shared_lock<shared_timed_mutex> lck(db_mtx, defer_lock); // create a lock based on db_mtx but don't try to acquire the mutex yet
    while (true) {
        // generate new random numbers
        std::random_device r;
        std::default_random_engine e(r());
        std::uniform_int_distribution<int> uniform_dist(SLEEP_MIN, SLEEP_MAX);
        std::uniform_int_distribution<int> uniform_dist2(0, 5);
        int sleep_duration = uniform_dist(e); // time to sleep between read requests
        int read_duration = uniform_dist(e);  // duration of reading from data_base
        int cell_number = uniform_dist2(e);   // what data cell will be read from
        int cell_value = 0;
        // wait some time before requesting another access to the database
        this_thread::sleep_for(std::chrono::milliseconds(sleep_duration));
        if (!lck.try_lock()) {
            lck.lock(); // try_lock() failed, block until the lock is acquired
        }
        // read data
        cell_value = data_base[cell_number];
        lck.unlock();
    }
}
void write_database(int thread_nr) {
    unique_lock<shared_timed_mutex> lck(db_mtx, defer_lock); // create a lock based on db_mtx but don't try to acquire the mutex yet
    while (true) {
        // generate new random numbers
        std::random_device r;
        std::default_random_engine e(r());
        std::uniform_int_distribution<int> uniform_dist(SLEEP_MIN, SLEEP_MAX);
        std::uniform_int_distribution<int> uniform_dist2(0, 5);
        int sleep_duration = uniform_dist(e); // time to sleep between write requests
        int write_duration = uniform_dist(e); // duration of writing to data_base
        int cell_number = uniform_dist2(e);   // what data cell will be written to
        // wait some time before requesting another access to the database
        this_thread::sleep_for(std::chrono::milliseconds(sleep_duration));
        // try to get exclusive access
        cout_mtx.lock();
        cout << "Writer <" << thread_nr << "> requesting write access." << endl;
        cout_mtx.unlock();
        if (!lck.try_lock()) {
            lck.lock(); // try_lock() failed, block until the lock is acquired
        }
        // write data
        data_base[cell_number] += 1;
        lck.unlock();
    }
}
I added output to standard output when a thread is reading, writing, or trying to acquire the lock (either blocking or via the try_lock() method), but I deleted it here for the sake of clarity. The threads are started further down in the main method. When I run the program, the writer always gets the chance to write to the array (causing all of the reader threads to block, which is fine), but as I said above, the writer should not be able to get access at all, since there are too many reader threads reading from the array. Even when I don't let the reader threads sleep at all (argument 0), the writer thread somehow finds a way to get the mutex. How do I get the writer to starve, then?

A quality implementation of std::shared_timed_mutex will starve neither readers nor writers. However, as the ratio of readers to writers grows, the probability that a writer gets the lock drops. With your current setup (1 writer to 10 readers) I'm guessing that the writer gets the lock about 9% of the time. As you increase that ratio, the writer will get the lock less often, but it will never be 100% starved out.
If you only let the writer acquire the lock under a try_lock, your chances of starving it 100% of the time greatly increase.
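For illustration, here is a hedged sketch of such a writer, reusing the question's globals; write_database_starving is a hypothetical variant, not part of the original code:

// Sketch: a variant of the question's write_database() where the writer
// only ever polls with try_lock() and never blocks. With enough readers
// holding db_mtx almost continuously, it will almost never get in.
void write_database_starving(int thread_nr) {
    unique_lock<shared_timed_mutex> lck(db_mtx, defer_lock);
    while (true) {
        this_thread::sleep_for(std::chrono::milliseconds(SLEEP_MIN));
        if (lck.try_lock()) { // poll; never join the queue of blocked threads
            data_base[0] += 1;
            lck.unlock();
        }
        // else: skip this round and try again later
    }
}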
The existence of algorithms that allow std::shared_timed_mutex to be implemented without starving either readers or writers is the reason that std::shared_timed_mutex does not have an API that allows you to dictate reader priority or writer priority.
The Algorithm
Imagine that there are two gates within the mutex: gate1 and gate2.
To get past gate1 it (almost) doesn't matter whether you are a reader or a writer. If there is another writer inside of gate1, you can't get in. Readers have to follow an additional rule that in practice never comes into play: if the maximum number of readers is already past gate1, you can't get in.
Once past gate1, a reader owns the shared lock.
Once past gate1, a writer does not own the unique lock. He must further wait outside gate2 until there are no more readers holding the shared lock. Once past gate2, the writer owns the unique lock.
This algorithm is "fair" in that it makes little difference whether you are a reader or a writer trying to get past gate1. If there are a bunch of readers and writers waiting outside gate1, the next thread to get in is decided by the OS, not by this algorithm. So you can think of it as a roll of the dice. If you have the same number of readers as writers competing for gate1, it is a 50/50 chance whether a reader or a writer is the next one past gate1.
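For the curious, here is a minimal, hedged sketch of the two-gate idea in terms of a plain mutex and two condition variables. It omits the reader cap mentioned above and all timed/try_ variants, so it illustrates the algorithm rather than substituting for std::shared_timed_mutex:

#include <mutex>
#include <condition_variable>

class two_gate_shared_mutex {
    std::mutex mut;
    std::condition_variable gate1; // readers and writers queue here
    std::condition_variable gate2; // the one admitted writer waits here
    unsigned readers = 0;          // readers past gate1 (holding the shared lock)
    bool writer_entered = false;   // a writer is past gate1

public:
    void lock_shared() {           // reader
        std::unique_lock<std::mutex> lk(mut);
        gate1.wait(lk, [this] { return !writer_entered; });
        ++readers;                 // past gate1: shared lock owned
    }
    void unlock_shared() {
        std::lock_guard<std::mutex> lk(mut);
        if (--readers == 0 && writer_entered)
            gate2.notify_one();    // last reader lets the writer past gate2
    }
    void lock() {                  // writer
        std::unique_lock<std::mutex> lk(mut);
        gate1.wait(lk, [this] { return !writer_entered; }); // gate1
        writer_entered = true;     // no one else may pass gate1 now
        gate2.wait(lk, [this] { return readers == 0; });    // gate2
    }                              // past gate2: unique lock owned
    void unlock() {
        std::lock_guard<std::mutex> lk(mut);
        writer_entered = false;
        gate1.notify_all();        // everyone queued at gate1 races in again
    }
};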

Related

Reading a variable from reader thread without holding up a writer thread

I've got two threads, a reader thread and a writer thread. The writer thread writes a string and the reader thread reads the string. The writer thread is extremely high speed and I do not want to hold it up. The reader thread is much slower (a factor of a million or more slower) and it is not important if the read string is a couple of cycles behind. The only important thing for the reader thread is that the string it reads is not in an undefined state.
Is there a way to be thread safe for reading the string without holding up the writing thread?
I've also looked at making the variable atomic, but I read that this might be a performance bottleneck for the writing thread as well.
I'm not sure if it works, but here's an idea I came up with:
Assume you have two string buffers, Buffer_0 & Buffer_1, each can hold a single string of multiple characters of a predefined max length.
The writer thread alternates between the two buffers, but first it checks a mutex. The writer doesn't block on the mutex; if the mutex is unavailable, it just writes to the other buffer. This means it stops alternating between the two buffers and writes into the same buffer multiple times while the reader slowly reads the mutex-protected buffer.
The reader's choice of buffer probably doesn't matter much. It can always try to read Buffer_0. It may simply block on the mutex and wait until the writer starts writing to Buffer_1. While it reads from Buffer_0, the writer keeps writing to Buffer_1 over and over as it fails to acquire the mutex.
Of course, checking the availability of, acquiring, and releasing the mutex all introduce some run-time cost. Maybe an atomic variable indicating the buffer index the writer is currently writing into would work faster than a mutex, but I'm not sure.
Update: I realized that in the above scenario Buffer_1 is mostly useless, since the reader only reads from Buffer_0. If it's not acceptable for the reader to block, it can alternate too and read Buffer_1 instead of waiting. Or the writer can just skip the whole write operation (writing to Buffer_1) if it's unable to acquire the mutex.
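A hedged sketch of that idea (illustrative names, following the "reader only reads Buffer_0" variant from the update), in which the writer never blocks:

#include <mutex>
#include <string>

std::mutex buf_mtx;  // guards buffer0 only
std::string buffer0; // the buffer the reader consumes
std::string buffer1; // overflow buffer; only the writer ever touches it

void writer_publish(const std::string& s) {
    if (buf_mtx.try_lock()) { // never blocks the writer
        buffer0 = s;
        buf_mtx.unlock();
    } else {
        buffer1 = s;          // reader is busy: write elsewhere (or skip entirely)
    }
}

std::string reader_read() {
    std::lock_guard<std::mutex> lk(buf_mtx); // the slow reader may block; that's fine
    return buffer0;
}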
Are you OK with the reader reading a recent value rather than necessarily the most recent one? If so, you can use atomics:
#include <thread>
#include <atomic>
#include <string>
#include <iostream>

std::string spots[4];
std::atomic<int> canWrite{0};    // next slot the writer may fill (written only by the reader)
std::atomic<int> readyIndex{-1}; // last slot the writer finished (written only by the writer)

void writer()
{
    while (true) // for demonstration, will be your real writer loop
    {
        if (readyIndex != canWrite)
        {
            spots[canWrite] = "foo";      // write here what the writer wants to write
            readyIndex = canWrite.load(); // marks that spot as ready
        }
    }
}

void reader()
{
    while (true) // for demonstration, will be your real reader loop
    {
        if (readyIndex == canWrite)
        {
            std::cout << spots[readyIndex] << std::endl;
            canWrite = (canWrite + 1) % 4; // allow the writer to start writing at the next location
        }
    }
}

int main()
{
    std::thread t1(writer);
    std::thread r1(reader);
    t1.join();
    r1.join();
    return 0;
}
The reader only writes to canWrite, telling the writer where it can write. The writer only writes to readyIndex, telling the reader where it can read.
If the reader has not yet read the latest string, the writer just skips it and goes on its merry way.

Using a mutex to block execution from outside the critical section

I'm not sure I got the terminology right, but here goes: I have a function that is used by multiple threads to write data (the pseudo code in the comments illustrates what I want).
// these are initialized in the constructor
int* data;
std::atomic<size_t> size;

void write(int value) {
    // wait here while "read_lock"
    // set "write_lock" to "write_lock" + 1
    auto slot = size.fetch_add(1, std::memory_order_acquire);
    data[slot] = value;
    // set "write_lock" to "write_lock" - 1
}
The order of the writes is not important; all I need here is for each write to go to a unique slot.
Every once in a while, though, I need one thread to read the data using this function:
int* read() {
    // set "read_lock" to true
    // wait here while "write_lock"
    int* ret = data;
    data = new int[capacity];
    size = 0;
    // set "read_lock" to false
    return ret;
}
So it basically swaps out the buffer and returns the old one (I've removed the capacity logic to keep the snippets short).
In theory this should lead to two operating scenarios:
1 - just a bunch of threads writing into the container
2 - when some thread executes the read function, all new writers have to wait; the reader waits until all existing writes are finished, then does the read logic, and scenario 1 can continue.
The question is that I don't know what kind of barrier to use for the locks:
A spinlock would be wasteful, since there are many containers like this and they all need CPU cycles.
I don't know how to apply std::mutex, since I only want the write function to be in a critical section when the read function is triggered. Wrapping the whole write function in a mutex would cause unnecessary slowdown in operating scenario 1.
So what would be the optimal solution here?
If you have C++14 available, you can use a std::shared_timed_mutex to separate out readers and writers. In this scenario, it seems you need to give your writer threads shared access (allowing other writer threads at the same time) and your reader threads unique access (kicking all other threads out).
So something like this may be what you need:
class MyClass
{
public:
    using mutex_type  = std::shared_timed_mutex;
    using shared_lock = std::shared_lock<mutex_type>;
    using unique_lock = std::unique_lock<mutex_type>;

private:
    mutable mutex_type mtx;

public:
    // All updater threads can operate at the same time
    auto lock_for_updates() const
    {
        return shared_lock(mtx);
    }

    // Reader threads need to kick all the updater threads out
    auto lock_for_reading() const
    {
        return unique_lock(mtx);
    }
};

// many threads can call this
void do_writing_work(std::shared_ptr<MyClass> sptr)
{
    auto lock = sptr->lock_for_updates();
    // update the data here
}

// access the data from one thread only
void do_reading_work(std::shared_ptr<MyClass> sptr)
{
    auto lock = sptr->lock_for_reading();
    // read the data here
}
The shared_locks allow other threads to acquire a shared_lock at the same time but prevent a unique_lock from gaining simultaneous access. When a reader thread tries to gain a unique_lock, all shared_locks are vacated before the unique_lock gets exclusive control.
You can also do this with regular mutexes and condition variables rather than a shared mutex. Supposedly shared_mutex has higher overhead, so I'm not sure which will be faster. With Gallik's solution you'd presumably be paying to lock the shared mutex on every write call; I got the impression from your post that write gets called far more often than read, so maybe that is undesirable.
int* data; // initialized somewhere; 'capacity' as in the question
std::atomic<size_t> size{0};
std::atomic<bool> reading{false};
std::atomic<int> num_writers{0};
std::mutex entering;
std::mutex leaving;
std::condition_variable cv;

void write(int x) {
    ++num_writers;
    if (reading) {
        --num_writers;
        if (num_writers == 0)
        {
            std::lock_guard<std::mutex> l(leaving);
            cv.notify_one();
        }
        { std::lock_guard<std::mutex> l(entering); } // wait for the reader to finish, then release immediately
        ++num_writers;
    }
    auto slot = size.fetch_add(1, std::memory_order_acquire);
    data[slot] = x;
    --num_writers;
    if (reading && num_writers == 0)
    {
        std::lock_guard<std::mutex> l(leaving);
        cv.notify_one();
    }
}

int* read() {
    int* other_data = new int[capacity];
    {
        std::unique_lock<std::mutex> enter_lock(entering);
        reading = true;
        std::unique_lock<std::mutex> leave_lock(leaving);
        cv.wait(leave_lock, [] () { return num_writers == 0; });
        std::swap(data, other_data);
        size = 0;
        reading = false;
    }
    return other_data;
}
It's a bit complicated and took me some time to work out, but I think this should serve the purpose pretty well.
In the common case where only writing is happening, reading is always false, so you take the usual path and pay only for two additional atomic increments and two untaken branches. The common path thus does not need to lock any mutexes, unlike the solution involving a shared mutex, which is supposedly expensive: http://permalink.gmane.org/gmane.comp.lib.boost.devel/211180.
Now, suppose read is called. The expensive, slow heap allocation happens first, while writing continues uninterrupted. Next, the entering lock is acquired, which has no immediate effect. Now reading is set to true. Immediately, any new calls to write enter the first branch and eventually hit the entering lock, which they cannot acquire (it's already taken), and those threads are put to sleep.
Meanwhile, the read thread waits on the condition that the number of writers is 0. If we're lucky, this can go through right away. If, however, there are threads in write in either of the two regions between incrementing and decrementing num_writers, it will not. Each time a write thread decrements num_writers, it checks whether it has reduced that number to zero, and when it has, it signals the condition variable. Because num_writers is atomic, which prevents various reordering shenanigans, it is guaranteed that the last thread will see num_writers == 0; the condition variable may also be notified more than once, but this is OK and cannot result in bad behavior.
Once that condition variable has been signalled, that shows that all writers are either trapped in the first branch or are done modifying the array. So the read thread can now safely swap the data, and then unlock everything, and then return what it needs to.
As mentioned before, in typical operation there are no locks, just increments and untaken branches. Even when a read does occur, the read thread will have one lock and one condition variable wait, whereas a typical write thread will have about one lock/unlock of a mutex and that's all (one, or a small number of write threads, will also perform a condition variable notification).

How to avoid freezing other threads when one thread locks a big map

How do I avoid freezing other threads that try to access the same map that is being locked by the current thread? See the code below:
// pseudo code
std::map<string, CSomeClass*> gBigMap;
pthread_mutex_t MapLock = PTHREAD_MUTEX_INITIALIZER;

void AccessMapForWriting(string aString) {
    pthread_mutex_lock(&MapLock);
    CSomeClass* obj = gBigMap[aString];
    if (obj) {
        gBigMap.erase(aString);
        delete obj;
        obj = NULL;
    }
    pthread_mutex_unlock(&MapLock);
}

void AccessMapForReading(string aString) {
    pthread_mutex_lock(&MapLock);
    CSomeClass* obj = gBigMap[aString];
    // below code consumes much time
    // sometimes it even sleeps for milliseconds
    if (obj) {
        obj->TimeConsumingOperation();
    }
    pthread_mutex_unlock(&MapLock);
}

// other threads will also call
// the same functions -- AccessMapFor...
void* OtherThreadFunc(void*) {
    // call AccessMapForReading/AccessMapForWriting here
}
Consider using a read-write lock instead: pthread_rwlock_t.
There are some details here.
It says:
"Using a normal mutex, when a thread obtains the mutex all other
threads are forced to block until that mutex is released by the owner.
What about the situation where the vast majority of threads are simply
reading the data? If this is the case then we should not care if there
is 1 or up to N readers in the critical section at the same time. In
fact the only time we would normally care about exclusive ownership is
when a writer needs access to the code section."
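As a hedged sketch, the question's two functions rewritten with a pthread read-write lock might look like this (CSomeClass is a stand-in; note find() instead of operator[], since operator[] inserts missing keys, i.e. it writes to the map, which must not happen under a read lock):

#include <pthread.h>
#include <map>
#include <string>

struct CSomeClass { void TimeConsumingOperation() { /* ... */ } }; // stand-in
std::map<std::string, CSomeClass*> gBigMap;
pthread_rwlock_t MapRwLock = PTHREAD_RWLOCK_INITIALIZER;

void AccessMapForWriting(const std::string& aString) {
    pthread_rwlock_wrlock(&MapRwLock); // exclusive: blocks readers and writers
    std::map<std::string, CSomeClass*>::iterator it = gBigMap.find(aString);
    if (it != gBigMap.end()) {
        CSomeClass* obj = it->second;
        gBigMap.erase(it);
        delete obj;
    }
    pthread_rwlock_unlock(&MapRwLock);
}

void AccessMapForReading(const std::string& aString) {
    pthread_rwlock_rdlock(&MapRwLock); // shared: many readers at once
    std::map<std::string, CSomeClass*>::const_iterator it = gBigMap.find(aString);
    if (it != gBigMap.end()) {
        it->second->TimeConsumingOperation(); // still runs under the read lock
    }
    pthread_rwlock_unlock(&MapRwLock);
}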
You have a std::string as a key. Can you break that key down into a short suffix (possibly just a single character) and a remainder? In that case, you might implement this data structure as 256 maps with 256 locks (one per possible byte value). Most of the time there would then be no lock contention, because the suffix, and therefore the lock, differs. A sketch of this sharding idea follows below.
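A hedged sketch of the sharding idea, with illustrative names (Shard, shard_for, EraseEntry) and a stand-in CSomeClass:

#include <map>
#include <mutex>
#include <string>

struct CSomeClass { /* stand-in for the question's class */ };

// 256 shards: the shard is chosen from the key's last character
// ("suffix"), so keys ending in different characters never contend
// on the same lock.
struct Shard {
    std::mutex mtx;
    std::map<std::string, CSomeClass*> map;
};
Shard shards[256];

Shard& shard_for(const std::string& key) {
    unsigned char c = key.empty() ? 0 : (unsigned char)key[key.size() - 1];
    return shards[c];
}

void EraseEntry(const std::string& key) {
    Shard& s = shard_for(key);
    std::lock_guard<std::mutex> lk(s.mtx); // contention only within this shard
    std::map<std::string, CSomeClass*>::iterator it = s.map.find(key);
    if (it != s.map.end()) {
        delete it->second;
        s.map.erase(it);
    }
}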

C++ multithreading, simple consumer / producer threads, LIFO, notification, counter

I am new to multi-threaded programming, and I want to implement the following functionality.
There are 2 threads, a producer and a consumer.
The consumer only processes the latest value, i.e., last in, first out (LIFO).
The producer sometimes generates new values faster than the consumer can process them. For example, the producer may generate 2 new values in 1 millisecond, but it takes the consumer approximately 5 milliseconds to process one.
If the consumer receives a new value in the middle of processing an old one, there is no need to interrupt. In other words, the consumer will finish the current execution first, then start an execution on the latest value.
Here is my design process; please correct me if I am wrong.
There is no need for a queue, since only the latest value is processed by the consumer.
Are notifications sent from the producer queued automatically? I will use a counter instead.
ConsumerThread() checks the counter at the end, to make sure the producer hasn't generated a new value in the meantime.
But what happens if the producer generates a new value just before the consumer goes to sleep(), but after it has checked the counter?
Here is some pseudo code.
boost::mutex mutex;
double x;
int counter = 0;

void ProducerThread()
{
    {
        boost::mutex::scoped_lock lock(mutex);
        x = rand();
        counter++;
    }
    notify(); // wake up consumer thread
}

void ConsumerThread()
{
    counter = 0; // reset counter, only process the latest value
    // ... do something which takes 5 milliseconds ...
    if (counter > 0)
    {
        // ... execute this function again, not too sure how to implement this ...
    }
    else
    {
        // ... what happens if the producer generates a new value here??? ...
        sleep();
    }
}
Thanks.
If I understood your question correctly, for your particular application, the consumer only needs to process the latest available value provided by the producer. In other words, it's acceptable for values to get dropped because the consumer cannot keep up with the producer.
If that's the case, then I agree that you can get away without a queue and use a counter. However, the shared counter and value variables will need to be accessed atomically.
You can use boost::condition_variable to signal notifications to the consumer that a new value is ready. Here is a complete example; I'll let the comments do the explaining.
#include <boost/thread/thread.hpp>
#include <boost/thread/mutex.hpp>
#include <boost/thread/condition_variable.hpp>
#include <boost/thread/locks.hpp>
#include <boost/date_time/posix_time/posix_time_types.hpp>
#include <cstdlib>
#include <iostream>

boost::mutex mutex;
boost::condition_variable condvar;

typedef boost::unique_lock<boost::mutex> LockType;

// Variables that are shared between producer and consumer.
double value = 0;
int count = 0;

void producer()
{
    while (true)
    {
        {
            // value and count must both be updated atomically
            // using a mutex lock
            LockType lock(mutex);
            value = std::rand();
            ++count;

            // Notify the consumer that a new value is ready.
            condvar.notify_one();
        }

        // Simulate the producer's delay (2 ms in the question,
        // exaggerated here to 200 ms)
        boost::this_thread::sleep(boost::posix_time::milliseconds(200));
    }
}

void consumer()
{
    // Local copies of 'count' and 'value' variables. We want to do the
    // work using local copies so that they don't get clobbered by
    // the producer when it updates.
    int currentCount = 0;
    double currentValue = 0;

    while (true)
    {
        {
            // Acquire the mutex before accessing 'count' and 'value' variables.
            LockType lock(mutex); // mutex is locked while in this scope

            while (count == currentCount)
            {
                // Wait for the producer to signal that there is a new value.
                // While we are waiting, Boost releases the mutex so that
                // other threads may acquire it.
                condvar.wait(lock);
            }

            // `lock` is automatically re-acquired when we come out of
            // condvar.wait(lock). So it's safe to access the 'value'
            // variable at this point.
            currentCount = count; // Remember which value we have seen.
            currentValue = value; // Grab a copy of the latest value
                                  // while we hold the lock.
        }

        // Now that we are out of the mutex lock scope, we work with our
        // local copy of `value`. The producer can keep on clobbering the
        // 'value' variable all it wants, but it won't affect us here
        // because we are now using `currentValue`.
        std::cout << "value = " << currentValue << "\n";

        // Simulate the consumer's delay (5 ms in the question,
        // exaggerated here to 500 ms)
        boost::this_thread::sleep(boost::posix_time::milliseconds(500));
    }
}

int main()
{
    boost::thread c(&consumer);
    boost::thread p(&producer);
    c.join();
    p.join();
}
ADDENDUM
I was thinking about this question recently and realized that this solution, while it may work, is not optimal: your producer is using all that CPU just to throw away half of the computed values.
I suggest that you reconsider your design and go with a bounded blocking queue between the producer and consumer. Such a queue should have the following characteristics:
Thread-safe
The queue has a fixed size (bounded)
If the consumer wants to pop the next item, but the queue is empty, the operation will be blocked until notified by the producer that an item is available.
The producer can check if there's room to push another item and block until the space becomes available.
With this type of queue, you can effectively throttle down the producer so that it doesn't outpace the consumer. It also ensures that the producer doesn't waste CPU resources computing values that will be thrown away.
Libraries such as TBB and PPL provide implementations of concurrent queues. If you want to roll your own using std::queue (or boost::circular_buffer) and boost::condition_variable, check out this blogger's example; a minimal sketch of such a queue also follows below.
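A minimal sketch of such a bounded blocking queue, using std:: primitives for brevity (the Boost equivalents are analogous); BoundedQueue is an illustrative name, not a production implementation:

#include <condition_variable>
#include <mutex>
#include <queue>

template <typename T>
class BoundedQueue {
    std::queue<T> q;
    std::mutex mtx;
    std::condition_variable not_full;
    std::condition_variable not_empty;
    const std::size_t capacity;

public:
    explicit BoundedQueue(std::size_t cap) : capacity(cap) {}

    void push(T value) {
        std::unique_lock<std::mutex> lk(mtx);
        not_full.wait(lk, [this] { return q.size() < capacity; }); // throttle the producer
        q.push(std::move(value));
        not_empty.notify_one();
    }

    T pop() {
        std::unique_lock<std::mutex> lk(mtx);
        not_empty.wait(lk, [this] { return !q.empty(); }); // block until an item arrives
        T value = std::move(q.front());
        q.pop();
        not_full.notify_one();
        return value;
    }
};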
The short answer is that you're almost certainly wrong.
With a producer/consumer, you pretty much need a queue between the two threads. There are basically two alternatives: either your code will simply lose tasks (which usually equals not working at all), or else your producer thread will need to block until the consumer thread is idle before it can produce an item, which effectively translates to single threading.
For the moment, I'm going to assume that the value you get back from rand represents the task to be executed (i.e., it is the value produced by the producer and consumed by the consumer). In that case, I'd write the code something like this:
void producer() {
    for (int i = 0; i < 100; i++)
        queue.insert(random()); // queue.insert blocks if queue is full
    queue.insert(-1.0);         // tell the consumer to exit
}

void consumer() {
    double value;
    while ((value = queue.get()) != -1) // queue.get blocks if queue is empty
        process(value);
}
This relegates nearly all the interlocking to the queue; the rest of the code for both threads pretty much ignores threading issues entirely.
Implementing a pipeline is actually quite tricky if you are doing it from the ground up. For example, you'd have to use a condition variable to avoid the kind of race condition you described in your question, avoid busy waiting when implementing the mechanism for "waking up" the consumer, etc. Even using a "queue" of just one element won't save you from some of these complexities.
It's usually much better to use specialized libraries that were developed and extensively tested specifically for this purpose. If you can live with Visual C++ specific solution, take a look at Parallel Patterns Library, and the concept of Pipelines.

When to use a mutex

Here is the thing: there is a float array, float bucket[5], and 2 threads, say thread1 and thread2.
Thread1 is in charge of tanking up the bucket, assigning each element in the bucket a random number. When the bucket is tanked up, thread2 will access the bucket and read its elements.
Here is how I do the job:
float bucket[5];
pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
pthread_t thread1, thread2;

void* thread_1_proc(void*); // thread1's startup routine, tank up the bucket
void* thread_2_proc(void*); // thread2's startup routine, read the bucket

int main()
{
    pthread_create(&thread1, NULL, thread_1_proc, NULL);
    pthread_create(&thread2, NULL, thread_2_proc, NULL);
    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);
}
Below is my implementation for thread_x_proc:
void* thread_1_proc(void*)
{
    while (1) { // make it work forever
        pthread_mutex_lock(&mu);   // lock the mutex, right?
        cout << "tanking\n";
        for (int i = 0; i < 5; i++)
            bucket[i] = rand();    // actually, rand() returns int, doesn't matter
        pthread_mutex_unlock(&mu); // bucket tanked, unlock the mutex, right?
        //sleep(1); /* this line is commented */
    }
}

void* thread_2_proc(void*)
{
    while (1) {
        pthread_mutex_lock(&mu);
        cout << "reading\n";
        for (int i = 0; i < 5; i++)
            cout << bucket[i] << " "; // read each element in the bucket
        pthread_mutex_unlock(&mu);    // reading done, unlock the mutex, right?
        //sleep(1); /* this line is commented */
    }
}
Question
Is my implementation right? Because the output is not what I expected:
...
reading
5.09434e+08 6.58441e+08 1.2288e+08 8.16198e+07 4.66482e+07 7.08736e+08 1.33455e+09
reading
5.09434e+08 6.58441e+08 1.2288e+08 8.16198e+07 4.66482e+07 7.08736e+08 1.33455e+09
reading
5.09434e+08 6.58441e+08 1.2288e+08 8.16198e+07 4.66482e+07 7.08736e+08 1.33455e+09
reading
tanking
tanking
tanking
tanking
...
But if I uncomment the sleep(1); in each thread_x_proc function, the output is right: tanking and reading follow each other, like this:
...
tanking
reading
1.80429e+09 8.46931e+08 1.68169e+09 1.71464e+09 1.95775e+09 4.24238e+08 7.19885e+08
tanking
reading
1.64976e+09 5.96517e+08 1.18964e+09 1.0252e+09 1.35049e+09 7.83369e+08 1.10252e+09
tanking
reading
2.0449e+09 1.96751e+09 1.36518e+09 1.54038e+09 3.04089e+08 1.30346e+09 3.50052e+07
...
Why? Should I use sleep() when using a mutex?
Your code is technically correct, but it does not make a lot of sense, and it does not do what you assume.
What your code does is update a section of data atomically, and read from that section atomically. However, you don't know in which order this happens, nor how often the data is written to before being read (or whether it is read at all!).
What you probably wanted is to generate exactly one new sequence of numbers in one thread each round, and to read exactly one new sequence each round in the other thread. For that, you would either have to use an additional semaphore or, better, a single-producer single-consumer queue.
In general the answer to "when should I use a mutex" is "never, if you can help it". Threads should send messages, not share state. This makes a mutex most of the time unnecessary, and offers parallelism (which is the main incentive for using threads in the first place).
The mutex makes your threads run lockstep, so you could as well just run in a single thread.
There is no implied order in which threads get to run, which means you shall not expect any order. What's more, it is possible for one thread to run over and over without letting the other run at all; this is implementation-specific and should be assumed to be random.
The case you presented is much better suited to a semaphore, which is "posted" with each element added.
However, if it always has to be:
write 5 elements
read 5 elements
then you should have two mutexes:
one that blocks the producer until the consumer has finished
one that blocks the consumer until the producer has finished
So the code should look something like this:
Producer:
while (true) {
    lock(&write_mutex);
    // [insert data]
    unlock(&read_mutex);
}
Consumer:
while (true) {
    lock(&read_mutex);
    // [read data]
    unlock(&write_mutex);
}
Initially write_mutex should be unlocked and read_mutex locked.
As I said, your code seems to be a better case for semaphores or maybe condition variables.
Mutexes are not meant for cases such as this (which doesn't mean you can't use them; it just means there are handier tools for solving the problem), as the sketch below shows.
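To make that concrete, here is a hedged sketch of the alternating scheme with POSIX semaphores (illustrative names). Unlike a pthread mutex, a semaphore may legally be posted by a thread other than the one that last waited on it, which is exactly what this ping-pong pattern needs:

#include <pthread.h>
#include <semaphore.h>

sem_t may_write; // initialized to 1: the producer goes first
sem_t may_read;  // initialized to 0: the consumer waits

void* producer_proc(void*) {
    for (;;) {
        sem_wait(&may_write); // wait until the consumer has finished
        /* tank up the bucket here */
        sem_post(&may_read);  // hand the full bucket to the consumer
    }
}

void* consumer_proc(void*) {
    for (;;) {
        sem_wait(&may_read);   // wait until the bucket is full
        /* read the bucket here */
        sem_post(&may_write);  // hand the empty bucket back
    }
}

/* in main, before creating the threads:
   sem_init(&may_write, 0, 1);
   sem_init(&may_read, 0, 0);  */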
You have no right to assume that just because you want your threads to run in a particular order, the implementation will figure out what you want and actually run them in that order.
Why shouldn't thread2 run before thread1? And why shouldn't each thread complete its loop several times before the other thread gets a chance to run up to the line where it acquires the mutex?
If you want execution to switch between two threads in a predictable way, then you need to use a semaphore, condition variable, or other mechanism for messaging between the two threads. sleep appears to result in the order you want on this occasion, but even with the sleep you haven't done enough to guarantee that they will alternate. And I have no idea why the sleep makes a difference to which thread gets to run first; is that consistent across several runs?
If you have two functions that should execute sequentially, i.e. F1 should finish before F2 starts, then you shouldn't be using two threads. Run F2 on the same thread as F1, after F1 returns.
Without threads, you won't need the mutex either.
It isn't really the issue here.
The sleep only lets the 'other' thread acquire the mutex (by chance: since it is waiting for the lock, it will probably get it). There is still no way to be sure the first thread won't simply re-lock the mutex before the other thread gets access.
A mutex is for protecting data, so that two threads don't:
a) write simultaneously
b) have one thread write while another is reading
It is not for making threads run in a certain order (if you want that functionality, ditch the threaded approach or use a flag to signal that the 'tank' is full, for example).
By now it should be clear from the other answers what the mistakes in the original code are. So let's try to improve it:
/* A flag that indicates whose turn it is. */
char tanked = 0;

void* thread_1_proc(void*)
{
    while (1) { // make it work forever
        pthread_mutex_lock(&mu); // lock the mutex
        if (!tanked) {           // is it my turn?
            cout << "tanking\n";
            for (int i = 0; i < 5; i++)
                bucket[i] = rand(); // actually, rand() returns int, doesn't matter
            tanked = 1;
        }
        pthread_mutex_unlock(&mu); // unlock the mutex
    }
}

void* thread_2_proc(void*)
{
    while (1) {
        pthread_mutex_lock(&mu);
        if (tanked) { // is it my turn?
            cout << "reading\n";
            for (int i = 0; i < 5; i++)
                cout << bucket[i] << " "; // read each element in the bucket
            tanked = 0;
        }
        pthread_mutex_unlock(&mu); // unlock the mutex
    }
}
The code above should work as expected. However, as others have pointed out, the result would be better accomplished with one of these two other options:
Sequentially. Since the producer and the consumer must alternate, you don't need two threads; a single loop that tanks and then reads would be enough. This solution would also avoid the busy waiting that happens in the code above.
Using semaphores. This would be the solution if the producer was able to run several times in a row, accumulating elements in a bucket (not the case in the original code, though).
http://en.wikipedia.org/wiki/Producer-consumer_problem#Using_semaphores