Potential race condition in a simple multithreading program - C++

I'm writing a simple program that consists of three threads. Each thread is passed the same Foo object, and no matter which thread calls which member function, the program's output should always be "firstsecondthird". I use semaphores, and I'm writing the test code for my implementation. Sometimes my test case passes, but sometimes it fails:
input: [1,2,3] = firstsecond
Assertion failed: (false), function test, file /home/foo/printInOrder.cc, line 100.
Abort trap: 6
My program looks like this:
#include "cpputility.h"
#include <cassert>
#include <functional>
#include <iostream>
#include <semaphore.h>
#include <sstream>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>
using namespace std;
void printFirst()
{
    cout << "first" << std::flush;
}

void printSecond()
{
    cout << "second" << std::flush;
}

void printThird()
{
    cout << "third" << std::flush;
}

class Foo
{
protected:
    sem_t firstJobDone;
    sem_t secondJobDone;

public:
    Foo()
    {
        sem_init(&firstJobDone, 0, 0);
        sem_init(&secondJobDone, 0, 0);
    }

    void first(function<void()> printFirst)
    {
        printFirst();
        sem_post(&firstJobDone);
    }

    void second(function<void()> printSecond)
    {
        sem_wait(&firstJobDone);
        printSecond();
        sem_post(&secondJobDone);
    }

    void third(function<void()> printThird)
    {
        sem_wait(&secondJobDone);
        printThird();
    }
};
void test()
{
    unordered_map<int, pair<void (Foo::*)(function<void()>), function<void()>>> m({
        {1, {&Foo::first, printFirst}},
        {2, {&Foo::second, printSecond}},
        {3, {&Foo::third, printThird}},
    });

    struct testCase
    {
        vector<int> input;
        string expected;
    };
    vector<testCase> test_cases = {
        {{1, 2, 3}, "firstsecondthird"},
        {{1, 3, 2}, "firstsecondthird"},
    };

    for (auto &&test_case : test_cases)
    {
        std::stringstream buffer;
        std::streambuf *old = std::cout.rdbuf(buffer.rdbuf());
        Foo foo;
        vector<thread> threads;
        for (int i = 0; i < 3; ++i)
        {
            threads.emplace_back(m[i + 1].first, foo, m[i + 1].second);
        }
        for (auto &&th : threads)
        {
            th.join();
        }
        auto got = buffer.str();
        if (got != test_case.expected)
        {
            printf("input: %s = %s\n",
                   CPPUtility::oneDVectorStr<int>(test_case.input).c_str(),
                   got.c_str());
            assert(false);
        }
        std::cout.rdbuf(old);
    }
}
int main()
{
    for (int i = 0; i < 10; ++i) {
        // Test repeatedly to detect any potential race condition
        test();
    }
}
The oneDVectorStr is a helper function I wrote in a file called cpputility.h to print a 1D vector; here is its implementation, so the code above compiles:
template <typename T>
std::string oneDVectorStr(const std::vector<T>& vec) {
    std::string cand = "[";
    for (std::size_t i = 0; i < vec.size(); ++i) {
        cand += std::to_string(vec[i]);
        if (i != vec.size() - 1)
            cand += ",";
    }
    cand += "]";
    return cand;
}
I've stared at this code for quite a while but couldn't locate any race condition. Any suggestion is welcome. Thanks in advance.

I took a further look at the code and realized that there is a subtle bug in my test code: I pass the Foo object (foo) by copy when creating each new thread, but I really want all the threads to share the same Foo object. Thus, I wrap foo in std::ref:
threads.emplace_back(m[i + 1].first, ref(foo), m[i + 1].second);
In addition, I print the firstJobDone and secondJobDone semaphore values using sem_getvalue() as follows:
Foo()
{
    sem_init(&firstJobDone, 0, 0);
    sem_init(&secondJobDone, 0, 0);
    int value;
    sem_getvalue(&firstJobDone, &value);
    printf("The initial value of the firstJobDone is %d\n", value);
    sem_getvalue(&secondJobDone, &value);
    printf("The initial value of the secondJobDone is %d\n", value);
}
And quite shockingly, I got:
The initial value of the firstJobDone is 32766
The initial value of the secondJobDone is 32766
input: [1,2,3] = third
Assertion failed: (false), function test, file /home/foo/printInOrder.cc, line 101.
Abort trap: 6
Both semaphores fail to initialize to 0 with LLVM on the Mac that I'm using, even though my implementation's invariant insists that both semaphores be initialized to 0. The reason is that sem_init is marked as deprecated on macOS: unnamed POSIX semaphores are not supported there, so the call can simply fail, and my code never checks its return value, leaving the sem_t objects holding garbage. Thus, per the comments on my question, I changed my implementation to use a condition variable and mutex, and everything works fine.


Displaying results as soon as they are ready with std::async

I'm trying to discover asynchronous programming in C++. Here's a toy example I've been using:
#include <iostream>
#include <future>
#include <vector>
#include <chrono>
#include <thread>
#include <random>
// For simplicity
using namespace std;
int called_from_async(int m, int n)
{
    this_thread::sleep_for(chrono::milliseconds(rand() % 1000));
    return m * n;
}

void test()
{
    int m = 12;
    int n = 42;
    vector<future<int>> results;
    for (int i = 0; i < 10; i++)
    {
        for (int j = 0; j < 10; j++)
        {
            results.push_back(async(launch::async, called_from_async, i, j));
        }
    }
    for (auto& f : results)
    {
        cout << f.get() << endl;
    }
}
Now, the example is not really interesting, but it raises a question that is, to me, interesting. Let's say I want to display results as they "arrive" (I don't know what will be ready first, since the delay is random), how should I do it?
What I'm doing here is obviously wrong, since I wait for all the tasks in the order in which I created them - so I'll wait for the first to finish even if it's longer than the others.
I thought about the following idea: for each future, use wait_for with a short timeout and, if it's ready, display the value. But I feel weird doing that:
while (any_of(results.begin(), results.end(), [](const future<int>& f) {
    return f.wait_for(chrono::seconds(0)) != future_status::ready;
}))
{
    cout << "Loop" << endl;
    for (auto& f : results)
    {
        auto result = f.wait_for(std::chrono::milliseconds(20));
        if (result == future_status::ready)
            cout << f.get() << endl;
    }
}
This brings another issue: we'd call get several times on some futures, which is illegal:
terminate called after throwing an instance of 'std::future_error' what(): std::future_error: No associated state
So I don't really know what to do here, please suggest!
Use valid() to skip the futures for which you have already called get().
bool all_ready;
do {
    all_ready = true;
    for (auto& f : results) {
        if (f.valid()) {
            auto result = f.wait_for(std::chrono::milliseconds(20));
            if (result == future_status::ready) {
                cout << f.get() << endl;
            }
            else {
                all_ready = false;
            }
        }
    }
} while (!all_ready);

using boost thread pool executor submit method

I am trying to learn the Boost thread pool. It is not working, and I hope to find help here.
Something strange happens in the following code. When I:
create an executor ea,
use async to run it until it finishes the submitted functions,
submit more functions to this first executor ea without starting it,
then create another executor ea3 of the same type (basic_thread_pool) with a different number of threads (1),
and start the second executor ea3 with async,
it first executes the tasks of the first executor ea, and then those of the second executor ea3, although I only pass ea3 to async.
Why does this happen, and how do I prevent ea from running? Does the unnamed namespace have an effect?
Live On Coliru
#define BOOST_THREAD_VERSION 4
#define BOOST_THREAD_PROVIDES_EXECUTORS
#define BOOST_THREAD_USES_LOG_THREAD_ID
#define BOOST_THREAD_QUEUE_DEPRECATE_OLD
#include <boost/lexical_cast.hpp> // for lexical_cast
#include <boost/thread.hpp>
#include <boost/thread/caller_context.hpp> // for BOOST_CONTEXTOF, caller_context_t, operator<<
#include <boost/thread/executors/basic_thread_pool.hpp> // for basic_thread_pool
#include <boost/thread/executors/executor.hpp> // for executor
#include <boost/thread/executors/executor_adaptor.hpp> // for executor_adaptor
#include <iostream> // for operator<<, cout, endl, ..
#include <chrono> // for operator "" ms, operator "" s
#include <thread> // for sleep_for
#include <string>
#include <vector>
static boost::mutex stdout_mutex;

template <typename A, typename B = std::string>
static inline void trace(A const &a, B const &b = "") {
    boost::lock_guard<boost::mutex> lock(stdout_mutex);
    std::cout << a << b << std::endl;
}

static inline void trace(boost::caller_context_t const &ctx) { trace(boost::lexical_cast<std::string>(ctx)); }

namespace {
    using namespace std::chrono_literals;
    using std::this_thread::sleep_for;

    void p1() { trace(BOOST_CONTEXTOF); sleep_for(20ms); }
    void p2() { trace(BOOST_CONTEXTOF); sleep_for(1s); }
    int f1() { trace(BOOST_CONTEXTOF); sleep_for(100ms); return 1; }
}

void submit_some(boost::executor &tp) {
    for (int i = 0; i < 3; ++i) { tp.submit(p2); }
    for (int i = 0; i < 3; ++i) { tp.submit(p1); }
}
int test_executor_adaptor() {
    trace(BOOST_CONTEXTOF);
    try {
        boost::executor_adaptor<boost::basic_thread_pool> ea(4);
        trace(BOOST_CONTEXTOF);
        submit_some(ea);
        {
            boost::future<int> t1 = boost::async(ea, &f1);
            trace(BOOST_CONTEXTOF);
            trace(" t1= ", t1.get()); // problem: shows the thread result before running the code above
            std::vector<boost::future<int>> vt1;
            for (int i = 0; i < 4; i++) {
                // here async starts all closures already submitted to ea, then
                // submits f1 to ea to run asynchronously, so the end result is
                // the 7 already-submitted tasks plus 4 more f1, with futures
                // returned only for the last 4 f1 submitted by async
                vt1.push_back(boost::async(ea, &f1));
            }
            for (auto &e : vt1) {
                auto e_value = e.get();
                trace("vt1 e_value = ", e_value);
            }
        }
        submit_some(ea);
        {
            boost::executor_adaptor<boost::basic_thread_pool> ea3(1);
            boost::future<int> t1 = boost::async(ea3, &f1);
            std::vector<boost::future<int>> vt1;
            for (int i = 0; i < 4; i++) {
                // same pattern as above, but submitting to ea3
                vt1.push_back(boost::async(ea3, &f1));
            }
            for (auto &e : vt1) {
                trace("vt1 e_value = ", e.get());
            }
        }
    }
    catch (std::exception &ex) { trace("ERROR= ", ex.what()); return 1; }
    catch (...) { trace("UNKNOWN EXCEPTION"); return 2; }
    trace(BOOST_CONTEXTOF);
    return 0;
}

int main() {
    trace(BOOST_CONTEXTOF);
    return test_executor_adaptor();
}
I think that basic_thread_pool uses an event loop, and that this event loop is a singleton, so when I define another basic_thread_pool it just reuses the same event loop; in turn, when getting the futures it runs all tasks on that loop, including those of other thread pools. This is a guess, but I can't find proof for it in the implementation code.
In case this guess is right, how can I make a different event loop for each basic_thread_pool?
I use MSVC 14. Thanks.

std::atomic_flag to stop multiple threads

I'm trying to stop multiple worker threads using a std::atomic_flag. Starting from Issue using std::atomic_flag with worker thread, the following works:
#include <iostream>
#include <atomic>
#include <chrono>
#include <thread>

std::atomic_flag continueFlag;
std::thread t;

void work()
{
    while (continueFlag.test_and_set(std::memory_order_relaxed)) {
        std::cout << "work ";
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}

void start()
{
    continueFlag.test_and_set(std::memory_order_relaxed);
    t = std::thread(&work);
}

void stop()
{
    continueFlag.clear(std::memory_order_relaxed);
    t.join();
}

int main()
{
    std::cout << "Start" << std::endl;
    start();
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
    std::cout << "Stop" << std::endl;
    stop();
    std::cout << "Stopped." << std::endl;
    return 0;
}
Trying to rewrite into multiple worker threads:
#include <iostream>
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>
#include <memory>

struct thread_data {
    std::atomic_flag continueFlag;
    std::thread thread;
};

std::vector<thread_data> threads;

void work(int threadNum, std::atomic_flag &continueFlag)
{
    while (continueFlag.test_and_set(std::memory_order_relaxed)) {
        std::cout << "work" << threadNum << " ";
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}

void start()
{
    const unsigned int numThreads = 2;
    for (int i = 0; i < numThreads; i++) {
        ////////////////////////////////////////////////////////////////////
        //PROBLEM SECTOR
        ////////////////////////////////////////////////////////////////////
        thread_data td;
        td.continueFlag.test_and_set(std::memory_order_relaxed);
        td.thread = std::thread(&work, i, td.continueFlag);
        threads.push_back(std::move(td));
        ////////////////////////////////////////////////////////////////////
        //PROBLEM SECTOR
        ////////////////////////////////////////////////////////////////////
    }
}

void stop()
{
    //Flag stop
    for (auto &data : threads) {
        data.continueFlag.clear(std::memory_order_relaxed);
    }
    //Join
    for (auto &data : threads) {
        data.thread.join();
    }
    threads.clear();
}

int main()
{
    std::cout << "Start" << std::endl;
    start();
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
    std::cout << "Stop" << std::endl;
    stop();
    std::cout << "Stopped." << std::endl;
    return 0;
}
My issue is the "PROBLEM SECTOR" marked above, namely creating the threads. I cannot wrap my head around how to instantiate the threads and pass the variables to the worker function.
The error right now is referencing this line threads.push_back(std::move(td)); with error Error C2280 'thread_data::thread_data(const thread_data &)': attempting to reference a deleted function.
Trying to use unique_ptr like this:
auto td = std::make_unique<thread_data>();
td->continueFlag.test_and_set(std::memory_order_relaxed);
td->thread = std::thread(&work, i, td->continueFlag);
threads.push_back(std::move(td));
Gives error std::atomic_flag::atomic_flag(const std::atomic_flag &)': attempting to reference a deleted function at line td->thread = std::thread(&work, i, td->continueFlag);. Am I fundamentally misunderstanding the use of std::atomic_flag? Is it really both immovable and uncopyable?
Your first approach was actually closer to the truth. The problem is that it passed a reference to an object within the local for loop scope to each thread, as a parameter. But, of course, once the loop iteration ended, that object went out of scope and got destroyed, leaving each thread with a reference to a destroyed object, resulting in undefined behavior.
Nobody cared about the fact that you moved the object into the std::vector, after creating the thread. The thread received a reference to a locally-scoped object, and that's all it knew. End of story.
Moving the object into the vector first, and then passing to each thread a reference to the object in the std::vector will not work either. As soon as the vector internally reallocates, as part of its natural growth, you'll be in the same pickle.
What needs to happen is to have the entire threads array created first, before actually starting any std::threads. If the RAII principle is religiously followed, that means nothing more than a simple call to std::vector::resize().
Then, in a second loop, iterate over the fully-cooked threads array, and go and spawn off a std::thread for each element in the array.
I was almost there with my unique_ptr solution. I just needed to wrap the argument in std::ref(), like this:
std::vector<std::unique_ptr<thread_data>> threads;

void start()
{
    const unsigned int numThreads = 2;
    for (int i = 0; i < numThreads; i++) {
        auto td = std::make_unique<thread_data>();
        td->continueFlag.test_and_set(std::memory_order_relaxed);
        td->thread = std::thread(&work, i, std::ref(td->continueFlag));
        threads.push_back(std::move(td));
    }
}
However, inspired by Sam above I also figured a non-pointer way:
std::vector<thread_data> threads;

void start()
{
    const unsigned int numThreads = 2;
    // Create a new vector of the final size up front. resize() doesn't work
    // here because it requires the elements to be move-insertable, and
    // thread_data is neither copyable nor movable (std::atomic_flag is not).
    threads = std::vector<thread_data>(numThreads);
    for (int i = 0; i < numThreads; i++) {
        auto& t = threads.at(i);
        t.continueFlag.test_and_set(std::memory_order_relaxed);
        t.thread = std::thread(&work, i, std::ref(t.continueFlag));
    }
}

How to apply a concurrent solution to a Producer-Consumer like situation

I have an XML file with a sequence of nodes. Each node represents an element that I need to parse and add to an ordered list (the order must be the same as that of the nodes in the file).
At the moment I am using a sequential solution:
struct Graphic
{
    bool parse()
    {
        // parsing...
        return parse_outcome;
    }
};

vector<unique_ptr<Graphic>> graphics;

void producer()
{
    for (size_t i = 0; i < N_GRAPHICS; i++)
    {
        auto g = new Graphic();
        if (g->parse())
            graphics.emplace_back(g);
        else
            delete g;
    }
}
So, only if the graphic (actually an instance of a class derived from Graphic, such as a Line or a Rectangle, which is why I use new) can be properly parsed will it be added to my data structure.
Since I only care about the order in which these graphics are added to my list, I thought of calling the parse method asynchronously, so that the producer has the task of reading each node from the file and adding the graphic to the data structure, while the consumers have the task of parsing each graphic whenever a new one is ready to be parsed.
Now I have several consumer threads (created in the main) and my code looks like the following:
queue<pair<Graphic*, size_t>> q;
mutex m;
atomic<size_t> n_elements;

void producer()
{
    for (size_t i = 0; i < N_GRAPHICS; i++)
    {
        auto g = new Graphic();
        graphics.emplace_back(g);
        q.emplace(make_pair(g, i));
    }
    n_elements = graphics.size();
}

void consumer()
{
    pair<Graphic*, size_t> item;
    while (true)
    {
        {
            std::unique_lock<std::mutex> lk(m);
            if (n_elements == 0)
                return;
            n_elements--;
            item = q.front();
            q.pop();
        }
        if (!item.first->parse())
        {
            // here I should remove the item from the vector
            assert(graphics[item.second].get() == item.first);
            delete item.first;
            graphics[item.second] = nullptr;
        }
    }
}
I run the producer first in my main, so that when the first consumer starts, the queue is already completely full.
int main()
{
    producer();
    vector<thread> threads;
    for (auto i = 0; i < N_THREADS; i++)
        threads.emplace_back(consumer);
    for (auto& t : threads)
        t.join();
    return 0;
}
The concurrent version seems to be at least twice as fast as the sequential one.
The full code has been uploaded here.
Now I am wondering:
Are there any (synchronization) errors in my code?
Is there a way to achieve the same result faster (or better)?
Also, I noticed that on my computer I get the best result (in terms of elapsed time) if I set the number of threads equal to 8. More (or fewer) threads give me worse results. Why?
Are there any (synchronization) errors in my code?

There aren't synchronization errors, but the memory management could be better: your code leaks if parse() throws an exception.
Is there a way to achieve the same result faster (or better)?
Probably. You could use a simple thread pool implementation and a lambda that does the parse() for you.
The code below illustrates this approach; I use the thread pool implementation from here:
#include <iostream>
#include <stdexcept>
#include <vector>
#include <memory>
#include <chrono>
#include <future>
#include <utility>
#include <cassert>
#include <ThreadPool.h>

using namespace std;
using namespace std::chrono;

#define N_GRAPHICS (1000*1000*1)
#define N_THREADS 8

struct Graphic;
using GPtr = std::unique_ptr<Graphic>;
static vector<GPtr> graphics;

struct Graphic
{
    Graphic()
        : status(false)
    {
    }

    bool parse()
    {
        // waste time
        try
        {
            throw runtime_error("");
        }
        catch (const runtime_error&)
        {
        }
        status = true;
        //return false;
        return true;
    }

    bool status;
};

int main()
{
    auto start = system_clock::now();

    auto producer_unit = []() -> GPtr {
        std::unique_ptr<Graphic> g(new Graphic);
        if (!g->parse()) {
            g.reset(); // if g doesn't parse, return nullptr
        }
        return g;
    };

    using ResultPool = std::vector<std::future<GPtr>>;
    ResultPool results;

    // ThreadPool pool(thread::hardware_concurrency());
    ThreadPool pool(N_THREADS);

    for (int i = 0; i < N_GRAPHICS; ++i) {
        // Running async task
        results.emplace_back(pool.enqueue(producer_unit));
    }

    for (auto &t : results) {
        auto value = t.get();
        if (value) {
            graphics.emplace_back(std::move(value));
        }
    }

    auto duration = duration_cast<milliseconds>(system_clock::now() - start);
    cout << "Elapsed: " << duration.count() << endl;

    for (size_t i = 0; i < graphics.size(); i++)
    {
        if (!graphics[i]->status)
        {
            cerr << "Assertion failed! (" << i << ")" << endl;
            break;
        }
    }

    cin.get();
    return 0;
}
It is a bit faster (about 1 s) on my machine, more readable, and it removes the need for shared data (synchronization is evil; avoid it, or hide it in a reliable and efficient way).

how to use boost atomic to remove race condition?

I am trying to use boost::atomic for multithreading synchronization on Linux, but the result is not consistent. Any help will be appreciated. Thanks.
#include <boost/bind.hpp>
#include <boost/threadpool.hpp>
#include <boost/thread/mutex.hpp>
#include <boost/thread.hpp>
#include <boost/atomic.hpp>
#include <iostream>

boost::atomic<int> g(0);

void f()
{
    g.fetch_add(1, boost::memory_order_relaxed);
}

const int threadnum = 10;

int main()
{
    boost::threadpool::fifo_pool tp(threadnum);
    for (int i = 0; i < threadnum; ++i)
        tp.schedule(boost::bind(f));
    tp.wait();
    std::cout << g << std::endl;
    return 0;
}
I'm not familiar with the boost thread library specifically, or boost::threadpool, but it looks to me like the threads have not necessarily completed when you access the value of g, so you will get some value between zero and 10.
Here's your program, modified to use the standard library, with joins inserted so that the fetch adds happen before the output of g.
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

std::atomic<int> g(0);

void f() {
    g.fetch_add(1, std::memory_order_relaxed);
}

int main() {
    const int threadnum = 10;
    std::vector<std::thread> v;
    for (int i = 0; i < threadnum; ++i)
        v.push_back(std::thread(f));
    for (auto &th : v)
        th.join();
    std::cout << g << '\n';
}
edit:
If your program still isn't consistent even with the added tp.wait() then that is puzzling. The adds should happen before the threads end, and I would think that the threads ending would synchronize with the tp.wait(), which happens before the read. So all the adds should happen before g is printed, even though you use memory_order_relaxed, so the printed value should be 10.
Here are some examples that might help:
http://www.chaoticmind.net/~hcb/projects/boost.atomic/doc/atomic/usage_examples.html
Basically, you're trying to protect a critical region with a lock.
You can use a semaphore, or you can "exchange" a boost atomic variable. For example (adapted from the above link, with the missing void return types added):
class spinlock {
private:
    typedef enum {Locked, Unlocked} LockState;
    boost::atomic<LockState> state_;

public:
    spinlock() : state_(Unlocked) {}

    void lock()
    {
        while (state_.exchange(Locked, boost::memory_order_acquire) == Locked) {
            /* busy-wait */
        }
    }

    void unlock()
    {
        state_.store(Unlocked, boost::memory_order_release);
    }
};
};