boost::thread yields different results on every run - C++

I am trying to make use of boost::thread to perform "n" similar jobs. Of course, "n" in general could be exorbitantly high and so I want to restrict the number of simultaneously running threads to some small number m (say 8). I wrote something like the following, where I open 11 text files a few at a time, using a small batch of threads (three in the code below).
I have a small class parallel whose run() method takes an int, opens an output file, and writes a line to it. The compilation goes smoothly and the program runs without any warning. The result, however, is not as expected: the files are created, but they are not always 11 in number. Does anyone know what mistake I am making?
Here's parallel.hpp:
#include <fstream>
#include <iostream>
#include <boost/thread.hpp>

class parallel {
public:
    int m_start;

    parallel()
    { }

    // member function
    void run(int start = 2);
};
The parallel.cpp implementation file is
#include "parallel.hpp"
void parallel::run(int start){
m_start = start;
std::cout << "I am " << m_start << "! Thread # "
<< boost::this_thread::get_id()
<< " work started!" << std::endl;
std::string fname("test-");
std::ostringstream buffer;
buffer << m_start << ".txt";
fname.append(buffer.str());
std::fstream output;
output.open(fname.c_str(), std::ios::out);
output << "Hi, I am " << m_start << std::endl;
output.close();
std::cout << "Thread # "
<< boost::this_thread::get_id()
<< " work finished!" << std::endl;
}
And the main.cpp:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>    // for std::vector used below
#include <boost/thread.hpp>
#include <boost/shared_ptr.hpp>
#include "parallel.hpp"

int main(int argc, char* argv[]) {
    std::cout << "main: startup!" << std::endl;
    std::cout << boost::thread::hardware_concurrency() << std::endl;

    parallel p;
    int populationSize(11), concurrency(3);

    // define concurrent thread group
    std::vector<boost::shared_ptr<boost::thread> > threads;

    // population one-by-one
    while(populationSize >= 0) {
        // concurrent threads
        for(int i = 0; i < concurrency; i++){
            // create a thread
            boost::shared_ptr<boost::thread>
                thread(new boost::thread(&parallel::run, &p, populationSize--));
            threads.push_back(thread);
        }
        // run the threads
        for(int i = 0; i < concurrency; i++)
            threads[i]->join();
        threads.clear();
    }
    return 0;
}

You have a single parallel object with a single m_start member variable, which all threads access without any synchronization.
Update
This race condition seems to be a consequence of a design problem. It is unclear what an object of type parallel is meant to represent.
If it is meant to represent a thread, then one object should be allocated for each thread created (a sketch of this follows below). The program as posted has a single object and many threads.
If it is meant to represent a group of threads, then it should not keep data that belongs to individual threads.
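For illustration, here is a minimal sketch of the first option, keeping the parallel class from the question unchanged: each thread gets its own parallel object, so no two threads share an m_start.
// Sketch only: one parallel object per thread, so each thread owns its m_start.
#include <vector>
#include <boost/shared_ptr.hpp>
#include <boost/thread.hpp>
#include "parallel.hpp"

int main() {
    int populationSize = 11;
    const int concurrency = 3;

    while (populationSize >= 0) {
        std::vector<boost::shared_ptr<parallel> > workers;
        std::vector<boost::shared_ptr<boost::thread> > threads;

        // launch one batch of at most 'concurrency' threads
        for (int i = 0; i < concurrency && populationSize >= 0; ++i) {
            boost::shared_ptr<parallel> w(new parallel()); // one object per thread
            workers.push_back(w);
            threads.push_back(boost::shared_ptr<boost::thread>(
                new boost::thread(&parallel::run, w.get(), populationSize--)));
        }

        // wait for the whole batch before starting the next one
        for (std::size_t i = 0; i < threads.size(); ++i)
            threads[i]->join();
    }
    return 0;
}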

Related

Using boost to turn single thread to multi thread

I'm trying to turn some code from single-threaded to multi-threaded (for example, creating 6 threads instead of 1) while making sure they all start and finish without any interference from each other. What would be a way to do this? Could I just write a for loop that creates a thread while i < 6, and add a mutex class with lock() and unlock()?
#include <iostream>
#include <boost/thread.hpp>
#include <boost/date_time.hpp>

void workerFunc()
{
    boost::posix_time::seconds workTime(3);
    std::cout << "Worker: running" << std::endl;
    // Pretend to do something useful...
    boost::this_thread::sleep(workTime);
    std::cout << "Worker: finished" << std::endl;
}

int main(int argc, char* argv[])
{
    std::cout << "main: startup" << std::endl;
    boost::thread workerThread(workerFunc);
    std::cout << "main: waiting for thread" << std::endl;
    workerThread.join();
    std::cout << "main: done" << std::endl;
    system("pause");
    return 0;
}
Yes, it's certainly possible. Since you don't want any interference between the threads, give each of them unique data to work with, so that you do not need to synchronize access to that data with a std::mutex or make it std::atomic. To further minimize interference between threads, align each thread's data according to std::hardware_destructive_interference_size.
You can use boost::thread::hardware_concurrency() to get the number of hardware threads available on the current system so that you don't have to hardcode the number of threads to run.
References can be passed to the thread function using std::ref (otherwise the thread will work on a reference to its own internal copy of the data).
Here I create a std::list of threads and a std::vector of data to work on.
#include <cstdint>   // std::int64_t
#include <iostream>
#include <list>
#include <new>       // std::hardware_destructive_interference_size
#include <vector>
#include <boost/thread.hpp>

unsigned hardware_concurrency() {
    unsigned rv = boost::thread::hardware_concurrency();
    if(rv == 0) rv = 1; // fallback if hardware_concurrency returned 0
    return rv;
}

// if you don't have hardware_destructive_interference_size, use something like this
// instead:
//struct alignas(64) data {
struct alignas(std::hardware_destructive_interference_size) data {
    std::int64_t x;
};

void workerFunc(data& d) {
    // work on the supplied data
    for(int i = 0; i < 1024*1024-1; ++i) d.x -= i;
    for(int i = 0; i < 1024*1024*1024-1; ++i) d.x += i;
}

int main() {
    std::cout << "main: startup" << std::endl;

    size_t number_of_threads = hardware_concurrency();
    std::list<boost::thread> threads;
    std::vector<data> dataset(number_of_threads);

    // create the threads
    for(size_t idx = 0; idx < number_of_threads; ++idx)
        threads.emplace_back(workerFunc, std::ref(dataset[idx]));

    std::cout << "main: waiting for threads" << std::endl;

    // join all threads
    for(auto& th : threads) th.join();

    // display results
    for(const data& d : dataset) std::cout << d.x << "\n";

    std::cout << "main: done" << std::endl;
}
If you are using C++11 (or later), I suggest using std::thread instead.
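For comparison, here is a minimal sketch of the same launch-and-join pattern with std::thread; the worker here is just a stand-in:
#include <iostream>
#include <thread>
#include <vector>

// trivial stand-in for the worker above
void workerFunc() { std::cout << "worker running\n"; }

int main() {
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 1;                    // fallback if the value is unknown

    std::vector<std::thread> threads;
    for (unsigned i = 0; i < n; ++i)
        threads.emplace_back(workerFunc); // start the workers

    for (auto& t : threads)
        t.join();                         // wait for all of them
}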
Starting and stopping a bunch of Boost threads
std::vector<boost::thread> threads;
for (int i = 0; i < numberOfThreads; ++i) {
    boost::thread t(workerFunc);
    threads.push_back(std::move(t));
}

for (auto& t : threads) {
    t.join();
}
Keep in mind that join() doesn't terminate the threads, it only waits until they are finished.
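If you actually need to stop a boost::thread early (the "stopping" part), the library offers cooperative interruption. Here is a minimal sketch; the worker loop is purely illustrative:
#include <iostream>
#include <boost/thread.hpp>
#include <boost/date_time.hpp>

void interruptibleWorker() {
    try {
        for (;;) {
            // sleep() is an interruption point: interrupt() makes it throw
            boost::this_thread::sleep(boost::posix_time::milliseconds(100));
        }
    } catch (const boost::thread_interrupted&) {
        std::cout << "Worker: interrupted, cleaning up" << std::endl;
    }
}

int main() {
    boost::thread worker(interruptibleWorker);
    boost::this_thread::sleep(boost::posix_time::seconds(1));
    worker.interrupt(); // ask the thread to stop at its next interruption point
    worker.join();      // join() still only waits for it to finish
}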
Synchronization
Mutexes are required if multiple threads access the same data and at least one of them is writing the data. You can use a mutex to ensure that only one thread at a time enters a critical section of the code. Example:
#include <mutex>
#include <queue>

std::queue<int> q;
std::mutex q_mu;

void workerFunc1(int foo) {
    // ...
    {
        std::lock_guard<std::mutex> guard(q_mu);
        q.push(foo);
    } // lock guard goes out of scope and automatically unlocks q_mu
    // ...
}

void workerFunc2() {
    // ...
    int foo = 0;
    {
        std::lock_guard<std::mutex> guard(q_mu);
        if (!q.empty()) {
            foo = q.front(); // std::queue::pop() returns void, so read front() first
            q.pop();
        }
    } // lock guard goes out of scope and automatically unlocks q_mu
    // ...
}
This prevents undefined behavior like reading an item from the queue that hasn't been written completely. Be careful - data races can crash your program or corrupt your data. I frequently use tools like Thread Sanitizer or Helgrind to make sure I didn't miss anything. If you only want to pass results back to the main program and don't need to share data between your threads, you might want to consider using std::promise and std::future.
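Here is a minimal sketch of that promise/future approach; the computation is only a placeholder:
#include <future>
#include <iostream>
#include <thread>
#include <utility>

void worker(std::promise<int> p) {
    p.set_value(6 * 7); // placeholder computation; the result goes into the promise
}

int main() {
    std::promise<int> result_promise;
    std::future<int> result = result_promise.get_future();

    std::thread t(worker, std::move(result_promise)); // the thread owns the promise

    std::cout << "result: " << result.get() << "\n";  // blocks until set_value
    t.join();
}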
Yes, spawning new threads can be done with a simple loop. You will have to keep a few things in mind, though:
If the threads operate on shared data, it needs to be protected with mutexes, atomics or some other mechanism to avoid data races and undefined behaviour (bear in mind that, according to the standard, even primitive types such as int must be protected with an atomic or a mutex).
You will have to make sure that you eventually call either join() or detach() on every spawned thread before its object goes out of scope, to prevent the program from terminating abruptly.
It's best to do some computation on the main thread while waiting for the worker threads, so that this time is used efficiently instead of being wasted.
You generally want to spawn one thread fewer than the total number you want, since the program already starts with one thread by default (the main thread). A sketch combining these points follows below.
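Here is a minimal sketch that combines these points; the work itself is just a placeholder counter, and the shared total is an std::atomic so no mutex is needed:
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

std::atomic<long> total{0}; // shared data: even a plain long would need protection

void workerFunc(long iterations) {
    for (long i = 0; i < iterations; ++i)
        total.fetch_add(1, std::memory_order_relaxed); // placeholder work
}

int main() {
    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 2;                 // fallback if the value is unknown

    const long iterations = 1000000;
    std::vector<std::thread> threads;

    // spawn one thread fewer than the hardware supports: main does a share too
    for (unsigned i = 0; i + 1 < hw; ++i)
        threads.emplace_back(workerFunc, iterations);

    workerFunc(iterations);              // use the main thread as well

    for (auto& t : threads)
        t.join();                        // every spawned thread is joined

    std::cout << "total = " << total << "\n";
}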

Boost interprocess named_condition_any not notifying

I'm trying to switch an application over from using boost::interprocess::named_mutex to boost::interprocess::file_lock for interprocess synchronization, but when I did so I noticed that my condition variables were never being woken up.
I've created two examples that demonstrate the types of changes I made and the issues I'm seeing. In both examples the same application should periodically send notifications if invoked with any arguments, or wait for notifications if invoked with no arguments.
Originally my application used named_mutex and named_condition. The below example using named_mutex and named_condition works as expected: every time the "sender" application prints out "Notifying", the "receiver" application prints out "Notified!" (provided I manually clean out /dev/shm/ between runs).
#include <iostream>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <boost/interprocess/sync/named_condition.hpp>
#include <boost/interprocess/sync/named_mutex.hpp>
#include <boost/thread.hpp>

int main(int argc, char** argv)
{
    boost::interprocess::named_mutex mutex(boost::interprocess::open_or_create,
                                           "mutex");
    // Create condition variable
    boost::interprocess::named_condition cond(boost::interprocess::open_or_create, "cond");

    while(true)
    {
        if(argc > 1)
        {// Sender
            std::cout << "Notifying" << std::endl;
            cond.notify_all();
            boost::this_thread::sleep_for(boost::chrono::seconds(1));
        }
        else
        {// Receiver
            std::cout << "Acquiring lock..." << std::endl;
            boost::interprocess::scoped_lock<boost::interprocess::named_mutex> lock(mutex);
            std::cout << "Locked. Waiting for notification..." << std::endl;
            cond.wait(lock);
            std::cout << "Notified!" << std::endl;
        }
    }
    return 0;
}
The following code represents my attempt to change the working code above from using named_mutex and named_condition to using file_lock and named_condition_any:
#include <iostream>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <boost/interprocess/sync/named_condition_any.hpp>
#include <boost/interprocess/sync/file_lock.hpp>
#include <boost/thread.hpp>

int main(int argc, char** argv)
{
    // Second option for locking
    boost::interprocess::file_lock flock("/tmp/flock");

    // Create condition variable
    boost::interprocess::named_condition_any cond(boost::interprocess::open_or_create,
                                                  "cond_any");

    while(true)
    {
        if(argc > 1)
        {// Sender
            std::cout << "Notifying" << std::endl;
            cond.notify_all();
            boost::this_thread::sleep_for(boost::chrono::seconds(1));
        }
        else
        {// Receiver
            std::cout << "Acquiring lock..." << std::endl;
            boost::interprocess::scoped_lock<boost::interprocess::file_lock> lock(flock);
            std::cout << "Locked. Waiting for notification..." << std::endl;
            cond.wait(lock);
            std::cout << "Notified!" << std::endl;
        }
    }
    return 0;
}
However I can't seem to get the "receiver" application to wake up when notified. The "sender" happily prints "Notifying" at ~1Hz, but the "receiver" hangs after printing "Locked. Waiting for notification..." once.
What am I doing wrong with my file_lock/named_condition_any implementation?
This appears to be caused by a bug in the implementation of boost::interprocess::named_condition_any.
boost::interprocess::named_condition_any is implemented using an instance of boost::interprocess::ipcdetail::shm_named_condition_any. boost::interprocess::ipcdetail::shm_named_condition_any has all of the member variables associated with its implementation aggregated into a class called internal_condition_members. When shm_named_condition_any is constructed it either creates or opens shared memory. If it creates the shared memory it also instantiates an internal_condition_members object in that shared memory.
The problem is that shm_named_condition_any also maintains a "local" (i.e. just on the stack, not in shared memory) member instance of an internal_condition_members object, and its wait, timed_wait, notify_one, and notify_all functions are all implemented using the local internal_condition_members member instead of the internal_condition_members from shared memory.
I was able to get the expected behavior from my example by editing boost/interprocess/sync/shm/named_condition_any.hpp and changing the implementation of the shm_named_condition_any class as follows:
typedef ipcdetail::condition_any_wrapper<internal_condition_members> internal_condition;
internal_condition m_cond;
to
typedef ipcdetail::condition_any_wrapper<internal_condition_members> internal_condition;
internal_condition &internal_cond()
{ return *static_cast<internal_condition*>(m_shmem.get_user_address()); }
and changing all usages of m_cond to this->internal_cond(). This is analogous to how the shm_named_condition class is implemented.
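If patching the Boost headers is not an option, one possible workaround (a sketch only, not tested against this exact scenario) is to skip named_condition_any entirely and place an interprocess_condition_any in a shared-memory segment you manage yourself; interprocess_condition_any can wait on any lock type, including a scoped_lock over a file_lock:
// Sketch: hand-rolled replacement for named_condition_any using a shared segment.
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/sync/interprocess_condition_any.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <boost/interprocess/sync/file_lock.hpp>

namespace bip = boost::interprocess;

int main(int argc, char** argv)
{
    // Both processes open or create the same segment and the same condition object.
    bip::managed_shared_memory segment(bip::open_or_create, "cond_segment", 65536);
    bip::interprocess_condition_any* cond =
        segment.find_or_construct<bip::interprocess_condition_any>("cond_any")();

    bip::file_lock flock("/tmp/flock"); // the lock file must already exist

    if(argc > 1)
    {   // Sender
        cond->notify_all();
    }
    else
    {   // Receiver
        bip::scoped_lock<bip::file_lock> lock(flock);
        cond->wait(lock); // condition_any accepts any lock with lock()/unlock()
    }
    return 0;
}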

Thread programming in vmware, 'process scheduling' didn't happen

I'm learning thread programming on my virtual machine. The code that does not perform as expected is the following:
#include <iostream>
#include <thread>
using namespace std;

void function01() {
    for (int i=0; i<100; i++) {
        std::cout << "from t1:" << i << std::endl;
    }
}

int main() {
    // data race and mutex
    std::thread t1( function01 );
    for (int i=0; i<100; i++) {
        std::cout << "from main:" << i << std::endl;
    }
    t1.join();
    return 0;
}
This code should make the two loops race for standard output. But when I compiled it with
:!g++ -std=c++11 -pthread ./foo.cpp
and ran it, every time I got a result in which the 100 "t1" lines and the 100 "main" lines appeared as two separate blocks instead of being interleaved. What confuses me is that when I did the same thing on another Ubuntu 14.04 installed on my old laptop, the code performed as I expected, i.e. the output from the two loops was interleaved.
I don't know much about VMware. Are threads running in a VMware guest managed in a way that avoids such races?
------------- second edit -----------------------
Thanks, everybody.
The number of cores seems to have been the main reason: I got the expected result after giving the VM more than one core.
Your new machine is probably much faster than your old one, so it is able to complete function01 before main gets to its own loop.
Or it has only one CPU, so it can execute only one routine at a time; and because your loop requires a really small amount of computation, the CPU can finish it within a single time slice given to it by the OS.
Make sure that your VM has more than one CPU allocated to it, and try to make each step in your loops 'heavier', e.g.:
double accumulator = 0;
for (int i=0; i<100; i++) {
    for (int j=1; j<1000*1000; j++)
        accumulator += std::rand();
    std::cout << "from t1:" << i << std::endl;
}
I think the problem is with the time slice. You can verify it yourself by introducing some delay in your code. For example:
#include <iostream>
#include <chrono>
#include <thread>

void function01() {
    for (int i=0; i<100; i++) {
        std::cout << "from t1:" << i << std::endl;
        std::this_thread::sleep_for(std::chrono::duration<double, std::milli>{10});
    }
}

int main() {
    // data race and mutex
    std::thread t1( function01 );
    for (int i=0; i<100; i++) {
        std::cout << "from main:" << i << std::endl;
        std::this_thread::sleep_for(std::chrono::duration<double, std::milli>{10});
    }
    t1.join();
    return 0;
}

C++ pthreads, main function stops running early

I'm making my first multithreaded program and having some issues. I based the code on an example I found online, which worked fine until I made my changes. The main function below creates several threads which run another function. That function runs instances of another C++ program that I wrote, which works fine on its own. The issue is that after the program creates all the threads, the main thread stops running: the other threads continue to run and work fine, but main stops, not even printing out a cout statement I gave it. For example, if I run it the output is:
Enter the number of threads:
// I enter '3'
main() : creating thread, 0
this line prints every time
main() : creating thread, 1
this line prints every time
main() : creating thread, 2
this line prints every time
This is followed by all the output from my other program, which runs 3 times. But the main function never prints out "This line is never printed out". I'm sure I have some fundamental misunderstanding of how threads work.
#include <iostream>
#include <stdlib.h>
#include <cstdlib>
#include <pthread.h>
#include <stdio.h>
#include <string>
#include <sstream>
#include <vector>
#include <fstream>
#include <unistd.h>
using namespace std;

struct thread_data{
    int thread_id;
};

void *PrintHello(void *threadarg)
{
    struct thread_data *my_data;
    my_data = (struct thread_data *) threadarg;

    stringstream convert;
    convert << "./a.out " << my_data->thread_id << " " << (my_data->thread_id+1) << " " << my_data->thread_id;
    string sout = convert.str();
    system(sout.c_str());

    pthread_exit(NULL);
}

int main ()
{
    int NUM_THREADS;
    cout << "Enter the number of threads:\n";
    cin >> NUM_THREADS;

    pthread_t threads[NUM_THREADS];
    struct thread_data td[NUM_THREADS];
    int i;

    for( i=0; i < NUM_THREADS; i++ ){
        cout <<"main() : creating thread, " << i << endl;
        td[i].thread_id = i;
        pthread_create(&threads[i], NULL, PrintHello, (void *)&td[i]);
        cout << endl << "this line prints every time" << endl;
    }

    cout << endl << "This line is never printed out";
    pthread_exit(NULL);
}
It's because you're not calling pthread_join(threads[i], NULL). pthread_join() keeps main from ending before the threads have finished executing.
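As a minimal sketch, the tail of main from the question could be changed like this (replacing the final pthread_exit(NULL)):
// ... same thread-creation loop as above ...

// wait for every worker to finish before main continues
for( i=0; i < NUM_THREADS; i++ ){
    pthread_join(threads[i], NULL);
}

cout << endl << "This line is never printed out" << endl; // now it does get printed
return 0;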

When boost library "interprocess" defines a named_mutex do those named_mutexes work properly between different processes, or only with threads?

I think I must be assuming something from the name boost::interprocess that is not true.
The documentation states repeatedly that named_mutex is global here.
I am unable to make it work, though. When two copies of the same executable are run at the same time, I expect that a named mutex from a library named boost::interprocess might actually BLOCK sometimes. It doesn't. It also doesn't prevent corruption of the data file in the code below.
Here's some code from the boost docs:
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <boost/interprocess/sync/named_mutex.hpp>
#include <fstream>
#include <iostream>
#include <cstdio>

int main ()
{
    using namespace boost::interprocess;
    try{
        struct file_remove
        {
            file_remove() { std::remove("file_name"); }
            ~file_remove(){ std::remove("file_name"); }
        } file_remover;

        struct mutex_remove
        {
            mutex_remove() { named_mutex::remove("fstream_named_mutex"); }
            ~mutex_remove(){ named_mutex::remove("fstream_named_mutex"); }
        } remover;

        //Open or create the named mutex
        named_mutex mutex(open_or_create, "fstream_named_mutex");

        std::ofstream file("file_name");

        for(int i = 0; i < 10; ++i){
            //Do some operations...

            //Write to file atomically
            scoped_lock<named_mutex> lock(mutex);
            file << "Process name, ";
            file << "This is iteration #" << i;
            file << std::endl;
        }
    }
    catch(interprocess_exception &ex){
        std::cout << ex.what() << std::endl;
        return 1;
    }
    return 0;
}
Here's what I did to it so I could prove to myself the mutex was doing something:
#include <windows.h>
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/lambda/lambda.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <boost/interprocess/sync/named_mutex.hpp>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <fstream>
#include <cstdio>

int main (int argc, char *argv[])
{
    srand((unsigned) time(NULL));
    using namespace boost::interprocess;
    try{
        /*
        struct file_remove
        {
            file_remove() { std::remove("file_name"); }
            ~file_remove(){ std::remove("file_name"); }
        } file_remover;
        */

        struct mutex_remove
        {
            mutex_remove() { named_mutex::remove("fstream_named_mutex"); }
            ~mutex_remove(){ named_mutex::remove("fstream_named_mutex"); }
        } remover;

        //Open or create the named mutex
        named_mutex mutex(open_or_create, "fstream_named_mutex");

        std::ofstream file("file_name");

        for(int i = 0; i < 100; ++i){
            //Do some operations...

            //Write to file atomically
            DWORD n1,n2;
            n1 = GetTickCount();
            scoped_lock<named_mutex> lock(mutex);
            n2 = GetTickCount();
            std::cout << "took " << (n2-n1) << " msec to acquire mutex";

            int randomtime = rand()%10;
            if (randomtime<1)
                randomtime = 1;
            Sleep(randomtime*100);

            std::cout << " ... writing...\n";
            if (argc>1)
                file << argv[1];
            else
                file << "SOMETHING";
            file << " This is iteration #" << i;
            file << std::endl;
            file.flush(); // added in case this explains the corruption; it does not.
        }
    }
    catch(interprocess_exception &ex){
        std::cout << "ERROR " << ex.what() << std::endl;
        return 1;
    }
    return 0;
}
Console Output:
took 0 msec to acquire mutex ... writing...
took 0 msec to acquire mutex ... writing...
took 0 msec to acquire mutex ... writing...
took 0 msec to acquire mutex ... writing...
Also, the demo writes to a file which, if you run two copies of the program, ends up missing some data.
I expect that if I delete file_name and run two copies of the program, I should get interleaved writes to file_name containing 100 rows from each instance.
(Note that the demo code is clearly not using an ofstream in append mode; instead it simply rewrites the file each time the program runs, so I'm aware of that reason why a demo of two processes writing to one file wouldn't work. What I did expect is for the above code to be a feasible demonstration of mutual exclusion, which it is not. Also, calls to the very handy and aptly named ofstream::flush() method could have been included, and weren't.)
Using Boost 1.53 on Visual C++ 2008
It turns out that Boost is a wonderful library, but the code examples interspersed in its documentation may sometimes be broken. At least the one for boost::interprocess::named_mutex in the docs is not functional on Windows systems.
Always deleting the mutex as part of the demo code causes the mutex to not function.
That should be commented in the demo code at the very least. It fails the "principle of least amazement": although I wondered why the removal was there, I assumed it must be idiomatic and necessary; in actual fact it is idiotic and unnecessary. Or, if it is necessary, it is an example of what Joel Spolsky would call a leaky abstraction. If mutexes are really filesystem objects under C:\ProgramData on Windows, I sure don't want to know about it, or know that turds get left behind that will break the abstraction if I don't detect that case and clean them up. (It sure smells like POSIX-friendly semantics for mutexes in Boost have caused them to use a POSIX-style implementation instead of going to the Win32 API directly and implementing a simple mutex that leaves no filesystem turds behind.)
Here's a working demo:
#include <windows.h>
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/lambda/lambda.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <boost/interprocess/sync/named_mutex.hpp>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <fstream>
#include <cstdio>

int main (int argc, char *argv[])
{
    srand((unsigned) time(NULL));
    using namespace boost::interprocess;
    try{
        /*
        // UNCOMMENT THIS IF YOU WANT TO MAKE THIS DEMO IMPOSSIBLE TO USE TO DEMO ANYTHING
        struct file_remove
        {
            file_remove() { std::remove("file_name"); }
            ~file_remove(){ std::remove("file_name"); }
        } file_remover;

        // UNCOMMENT THIS IF YOU WANT TO BREAK THIS DEMO HORRIBLY:
        struct mutex_remove
        {
            mutex_remove() { named_mutex::remove("fstream_named_mutex"); }
            ~mutex_remove(){ named_mutex::remove("fstream_named_mutex"); }
        } remover;
        */

        //Open or create the named mutex
        named_mutex mutex(open_or_create, "fstream_named_mutex");

        std::ofstream file("file_name", std::ios_base::app );

        int randomtime = 0;
        for(int i = 0; i < 100; ++i){
            //Do some operations...

            //Write to file atomically
            DWORD n1,n2;
            n1 = GetTickCount();
            {
                scoped_lock<named_mutex> lock(mutex);
                n2 = GetTickCount();
                std::cout << "took " << (n2-n1) << " msec to acquire mutex";

                randomtime = rand()%10;
                if (randomtime<1)
                    randomtime = 1;

                std::cout << " ... writing...\n";
                if (argc>1)
                    file << argv[1];
                else
                    file << "SOMETHING";
                file << "...";

                Sleep(randomtime*100);

                file << " This is iteration #" << i;
                file << std::endl;
                file.flush();
            }
            Sleep(randomtime*100); // let the other guy in.
        }
    }
    catch(interprocess_exception &ex){
        std::cout << "ERROR " << ex.what() << std::endl;
        return 1;
    }
    return 0;
}
I would love critiques and edits on this answer, so that people will have a working demo of using this named mutex.
To use the demo:
- Build it and run two copies of it. Pass a parameter in so you can see which instance wrote which lines (start myexename ABC and start myexename DEF from a command prompt on Windows).
- On a second run, delete any stray output file named "file_name" first if you don't want the new output appended to the old.