C++ threading, duplicate/missing threads

I'm trying to write a program that concurrently adds and removes items from a "storehouse". I have a "Monitor" class that handles the "storehouse" operations:
class Monitor
{
private:
mutex m;
condition_variable cv;
vector<Storage> S;
int counter = 0;
bool busy = false;
public:
void add(Computer c, int index) {
unique_lock <mutex> lock(m);
if (busy)
cout << "Thread " << index << ": waiting for !busy " << endl;
cv.wait(lock, [&] { return !busy; });
busy = true;
cout << "Thread " << index << ": Request: add " << c.CPUFrequency << endl;
for (int i = 0; i < counter; i++) {
if (S[i].f == c.CPUFrequency) {
S[i].n++;
busy = false; cv.notify_one();
return;
}
}
Storage s;
s.f = c.CPUFrequency;
s.n = 1;
// put the new item in a sorted position
S.push_back(s);
counter++;
busy = false; cv.notify_one();
}
};
The threads are created like this:
void doThreadStuff(vector<Computer> P, vector <Storage> R, Monitor &S)
{
int Pcount = P.size();
vector<thread> myThreads;
myThreads.reserve(Pcount);
for (atomic<size_t> i = 0; i < Pcount; i++)
{
int index = i;
Computer c = P[index];
myThreads.emplace_back([&] { S.add(c, index); });
}
for (size_t i = 0; i < Pcount; i++)
{
myThreads[i].join();
}
// printing results
}
Running the program produced results with missing and duplicated thread indices.
I'm familiar with race conditions, but this doesn't look like one to me. My bet would be on something reference-related, because in the results I can see that for every "missing thread" (threads 1, 3, 10, 25) there is a "duplicate thread" (threads 2, 9, 24, 28).
I have tried creating local variables in functions and loops, but it changed nothing.
I have heard about threads sharing memory regions, but my previous work should have produced similar results, so I don't think that's the issue here; feel free to prove me wrong.
I'm using Visual Studio 2017.

Here you capture local variables by reference in a loop; they are destroyed at the end of every iteration, causing undefined behavior:
for (atomic<size_t> i = 0; i < Pcount; i++)
{
int index = i;
Computer c = P[index];
myThreads.emplace_back([&] { S.add(c, index); });
}
You should capture index and c by value:
myThreads.emplace_back([&S, index, c] { S.add(c, index); });
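For reference, a minimal sketch of the corrected loop (assuming the Computer and Monitor types from the question; a plain size_t counter is enough, since only this thread touches it, so no atomic is needed):
for (size_t i = 0; i < P.size(); i++)
{
    int index = static_cast<int>(i);   // per-iteration copy
    Computer c = P[index];             // per-iteration copy
    // S by reference (shared Monitor), index and c by value
    myThreads.emplace_back([&S, index, c] { S.add(c, index); });
}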

Another approach would be to pass S, index and c as arguments instead of capturing them, by defining the following non-capturing lambda, th_func:
auto th_func = [](Monitor &S, int index, Computer c){ S.add(c, index); };
This way, you have to explicitly wrap the arguments that must be passed by reference to the thread's callable object in a std::reference_wrapper, by means of the function template std::ref(). In your case, that is only S:
for (atomic<size_t> i = 0; i < Pcount; i++) {
int index = i;
Computer c = P[index];
myThreads.emplace_back(th_func, std::ref(S), index, c);
}
Failing to wrap with std::reference_wrapper the arguments that must be passed by reference will result in a compile-time error. That is, the following won't compile:
myThreads.emplace_back(th_func, S, index, c); // <-- it should be std::ref(S)

Related

Adding a series of numbers [1-->5000] with threads in C++?

I want to add up a series of numbers [1-->5000] with threads, but the result is not correct.
The goal is only to understand threading well, because I am a beginner.
I tried this:
void thread_function(int i, int (*S))
{
(*S) = (*S) + i;
}
int main()
{
std::vector<std::thread> vecto_Array;
int i = 0, Som = 0;
for(i = 1; i <= 5000; i++)
{
vecto_Array.emplace_back([&](){ thread_function(i, &Som); });
}
for(auto& t: vecto_Array)
{
t.join();
}
std::cout << Som << std::endl;
}
And I tried this:
int thread_function(int i)
{
return i;
}
int main()
{
std::vector<std::thread> vecto_Array;
int i = 0, Som = 0;
for(i = 1; i <= 5000; i++)
{
vecto_Array.emplace_back([&](){ Som = Som + thread_function(i); });
}
for(auto& t: vecto_Array)
{
t.join();
}
std::cout << Som << std::endl;
}
The result is always wrong. Why?
I solved the problem as follows:
void thread_function(int (*i), int (*S))
{
(*S) = (*S) + (*i);
(*i)++;
}
int main()
{
std::vector<std::thread> vecto_Array;
int i = 0, j = 0, Som = 0;
for(i = 1; i <= 5000; i++)
{
vecto_Array.emplace_back([&](){ thread_function(&j, &Som); });
}
for(auto& t: vecto_Array)
{
t.join();
}
std::cout << Som << std::endl;
}
But can anyone explain to me why it did not work when using the loop variable i?
Your attempt #1 has a race condition. See What is a race condition?
Your attempt #2 ignores what the standard says about the thread function:
Any return value from the function is ignored.
(see: https://en.cppreference.com/w/cpp/thread/thread/thread )
Your attempt #3 has a race condition as well: every thread modifies j and Som through the captured references without any synchronization.
Concurrent programming is an advanced topic. What you need is a book or a tutorial. I first learned it from Bartosz Milewski's course: https://www.youtube.com/watch?v=80ifzK3b8QQ&list=PL1835A90FC78FF8BE&index=1
but be warned that it will likely take years before you become comfortable with concurrency; I am still not. What you probably need as a beginner is std::async (see Milewski's tutorial or use Google). An even gentler learning curve is offered by OpenMP https://en.wikipedia.org/wiki/OpenMP , which could be called "parallelization for the masses".
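As an illustration of that suggestion, here is a sketch of the same sum done with std::async and partial sums rather than one thread per number (the chunking scheme is my own choice, not something from the question):
#include <future>
#include <iostream>
#include <vector>

int main()
{
    const int N = 5000;
    const int chunks = 8;                              // arbitrary number of tasks
    std::vector<std::future<long long>> futures;
    for (int c = 0; c < chunks; ++c)
    {
        const int first = c * N / chunks + 1;          // inclusive
        const int last  = (c + 1) * N / chunks;        // inclusive
        futures.push_back(std::async(std::launch::async, [first, last] {
            long long sum = 0;
            for (int i = first; i <= last; ++i)
                sum += i;
            return sum;                                // each task owns its own sum: no data race
        }));
    }
    long long total = 0;
    for (auto& f : futures)
        total += f.get();                              // get() waits for the task to finish
    std::cout << total << std::endl;                   // prints 12502500
}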

Error occurred when using thread_local to maintain a concurrent memory buffer

In the following code, I want to create a memory buffer that multiple threads can read and write concurrently. At any given time all threads read the buffer in parallel, and at a later phase they all write to it in parallel, but there is never a read and a write at the same time.
To do this, I use a vector of shared_ptr<vector<uint64_t>>. When a new thread arrives, it is allocated a new vector<uint64_t> and writes only to that vector. Two threads never write to the same vector.
I use thread_local variables to track the vector index and the offset that the current thread will write to. When I need to add a new buffer to the memory_ variable, I protect it with a mutex.
#include <cstdint>
#include <iostream>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>
using namespace std;
class TestBuffer {
public:
thread_local static uint32_t index_;
thread_local static uint32_t offset_;
thread_local static bool ready_;
vector<shared_ptr<vector<uint64_t>>> memory_;
mutex lock_;
void init() {
if (!ready_) {
new_slab();
ready_ = true;
}
}
void new_slab() {
std::lock_guard<mutex> lock(lock_);
index_ = memory_.size();
memory_.push_back(make_shared<vector<uint64_t>>(1000));
offset_ = 0;
}
void put(uint64_t value) {
init();
if (offset_ == 1000) {
new_slab();
}
if(memory_[index_] == nullptr) {
cout << "Error" << endl;
}
*(memory_[index_]->data() + offset_) = value;
offset_++;
}
};
thread_local uint32_t TestBuffer::index_ = 0;
thread_local uint32_t TestBuffer::offset_ = 0;
thread_local bool TestBuffer::ready_ = false;
int main() {
TestBuffer buffer;
vector<std::thread> threads;
for (int i = 0; i < 10; ++i) {
thread t = thread([&buffer, i]() {
for (int j = 0; j < 10000; ++j) {
buffer.put(i * 10000 + j);
}
});
threads.emplace_back(move(t));
}
for (auto &t: threads) {
t.join();
}
}
The code does not behave as expected: the "Error" line is printed from the put function. The root cause is that memory_[index_] sometimes returns nullptr. However, I do not understand how this is possible, as I think I have set the values properly. Thanks for the help!
You have a race condition in put caused by new_slab(). When new_slab calls memory_.push_back(), the memory_ vector may need to reallocate its storage, and if another thread is executing put while the reallocation is in progress, memory_[index_] might read stale data.
One solution is to protect the memory_ vector by locking the mutex around the accesses in put:
{
std::lock_guard<mutex> lock(lock_);
if(memory_[index_] == nullptr) {
cout << "Error" << endl;
}
*(memory_[index_]->data() + offset_) = value;
}
Another option is to reserve the space you need in the memory_ vector ahead of time, so that push_back never reallocates.
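Sketching the first suggestion, put() could look roughly like this (same TestBuffer members as in the question; the lock is simply widened to cover every access to memory_):
void put(uint64_t value) {
    init();
    if (offset_ == 1000) {
        new_slab();                      // takes and releases lock_ internally, so no deadlock here
    }
    std::lock_guard<mutex> lock(lock_);  // guards memory_ against a concurrent push_back/reallocation
    if (memory_[index_] == nullptr) {
        cout << "Error" << endl;
    }
    *(memory_[index_]->data() + offset_) = value;
    offset_++;
}
Note that this serializes all writes through one mutex, which is why the reserve-ahead-of-time approach mentioned above can be attractive.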

Threads with Classes and std::packaged_task

I'm trying to implement this Thread Pool with classes in C++.
Until now I was confident that I understood how classes work, but now I'm going mad.
I have two files:
"JobScheduler.h" and "JobScheduler.cpp"
JobScheduler.h
class JobScheduler {
int thread_id;
std::vector<std::thread> pool;
std::mutex m1;
int set_id();
public:
JobScheduler();
~JobScheduler();
void Start();
};
JobScheduler.cpp
int id = 0;
std::mutex m;
JobScheduler::JobScheduler() {...}
JobScheduler::~JobScheduler() {...}
int JobScheduler::set_id() {
m1.lock();
int tmp_id = thread_id;
thread_id++;
std::cout << "id = " << tmp_id << "\n";
m1.unlock();
return tmp_id;
}
int set_id_02(){
m.lock();
int tmp_id = id;
id++;
std::cout << "id = " << tmp_id << "\n";
m.unlock();
return tmp_id;
}
void JobScheduler::Start(){
// THIS DOESN'T WORK
/*
for(unsigned int i = 0; i < std::thread::hardware_concurrency(); i++){
pool.emplace_back(std::thread(std::packaged_task<void()>(JobScheduler::set_id))); // <--- error
}
... // print something and join threads
*/
// MANY THREADS - NO CLASS METHOD AS FUNCTION AND GLOBAL CPP VARIABLE - WORK
/*
for(unsigned int i = 0; i < std::thread::hardware_concurrency(); i++){
pool.emplace_back(std::thread(std::packaged_task<int()>(set_id_02)));
}
... // print something and join threads
*/
}
Now, if I use a free function defined in the .cpp file it works fine, but if I try to use a member function of the class it doesn't, and I need to be able to access the class's member variables.
So I have a few questions:
1) Why doesn't this work? What am I getting wrong?
2) Is it OK to create a std::packaged_task inside the for loop like I do, or should I do something like this:
std::packaged_task<int()> main_task(set_id);
for(unsigned int i = 0; i < std::thread::hardware_concurrency(); i++){
pool.emplace_back(std::thread(main_task));
}
3) In both cases, how can I access the future of the task I created?
1) It's not working because you are creating the packaged task incorrectly.
Since you are trying to use a member function, you have to specify which object will be used to call it. You can try a couple of approaches:
Bind the object with the member function
Use a lambda as a proxy
std::packaged_task<int()>(std::bind(&JobScheduler::set_id, this))
std::packaged_task<int()>([this]{ return set_id(); })
For the second function it is enough to just pass the function itself, since it's a "free" function:
std::packaged_task<int()>(set_id_02);
2) See above
3) In order to access the results of your packaged_task, you must store its future:
std::vector<std::future<int>> results;
for(unsigned int i = 0; i < std::thread::hardware_concurrency(); i++){
auto task = std::packaged_task<int()>([this]{ return set_id(); });
results.emplace_back(task.get_future());
pool.emplace_back(std::thread(std::move(task)));
}
//Access results
for (auto& f : results) {
cout << f.get() << endl;
}
As you rightly say, the problem is that you have to provide the object on which you want to call the member function. I see two solutions for that:
// 1st wrap in a lambda
for(unsigned int i = 0; i < std::thread::hardware_concurrency(); i++){
pool.emplace_back(std::thread(std::packaged_task<void()>([this](){this->set_id();})));
}
// 2nd Use std::mem_fn and std::bind
for(unsigned int i = 0; i < std::thread::hardware_concurrency(); i++){
pool.emplace_back(std::thread(std::packaged_task<void()>(std::bind(std::mem_fn(&JobScheduler::set_id), this))));
}
The first one should be clear, I think. In the second, std::mem_fn creates a function f such that f(object) does object.set_id() and std::bind creates a function g such that g() does f(this).
I prefer the first solution. It is one of many cases where lambdas are much simpler than using bind.

Displaying results as soon as they are ready with std::async

I'm trying to discover asynchronous programming in C++. Here's a toy example I've been using:
#include <iostream>
#include <future>
#include <vector>
#include <chrono>
#include <thread>
#include <random>
// For simplicity
using namespace std;
int called_from_async(int m, int n)
{
this_thread::sleep_for(chrono::milliseconds(rand() % 1000));
return m * n;
}
void test()
{
int m = 12;
int n = 42;
vector<future<int>> results;
for(int i = 0; i < 10; i++)
{
for(int j = 0; j < 10; j++)
{
results.push_back(async(launch::async, called_from_async, i, j));
}
}
for(auto& f : results)
{
cout << f.get() << endl;
}
}
Now, the example is not really interesting, but it raises a question that is, to me, interesting. Let's say I want to display results as they "arrive" (I don't know what will be ready first, since the delay is random), how should I do it?
What I'm doing here is obviously wrong, since I wait for all the tasks in the order in which I created them - so I'll wait for the first to finish even if it's longer than the others.
I thought about the following idea: for each future, call wait_for with a small timeout and, if it's ready, display the value. But I feel weird doing that:
while (any_of(results.begin(), results.end(), [](const future<int>& f){
return f.wait_for(chrono::seconds(0)) != future_status::ready;
}))
{
cout << "Loop" << endl;
for(auto& f : results)
{
auto result = f.wait_for(std::chrono::milliseconds(20));
if (result == future_status::ready)
cout << f.get() << endl;
}
}
This brings another issue: we'd call get several times on some futures, which is illegal:
terminate called after throwing an instance of 'std::future_error' what(): std::future_error: No associated state
So I don't really know what to do here, please suggest!
Use valid() to skip the futures for which you have already called get().
bool all_ready;
do {
all_ready = true;
for(auto& f : results) {
if (f.valid()) {
auto result = f.wait_for(std::chrono::milliseconds(20));
if (result == future_status::ready) {
cout << f.get() << endl;
}
else {
all_ready = false;
}
}
}
} while (!all_ready);

"Segmentation fault (core dumped)" while using pthread_create

So I've got a problem: when I try to create the last thread, it always says the core is dumped. It doesn't matter whether I tell it to create 5 or 2 threads. Here is my code:
UPD: Now I can't create more than 3 threads, and the threads don't do what I want them to do (consume and produce).
UPD_2: Now I get a message like this:
terminate called after throwing an instance of 'terminate called recursively
terminate called recursively
Aborted (core dumped)
#include<cstdlib>
#include <iostream>
#include <string>
#include <mutex>
#include <pthread.h>
#include <condition_variable>
#define NUM_THREADS 4
using namespace std;
struct thread_data
{
int thread_id;
int repeat;
};
class our_monitor{
private:
int buffer[100];
mutex m;
int n = 0, lo = 0, hi = 0;
condition_variable in,out;
unique_lock<mutex> lk;
public:
our_monitor():lk(m)
{
}
void insert(int val, int repeat)
{
in.wait(lk, [&]{return n <= 100-repeat;});
for(int i=0; i<repeat; i++)
{
buffer[hi] = val;
hi = (hi + 1) % 100; //ring buffer
n = n +1; //one more item in buffer
}
lk.unlock();
out.notify_one();
}
int remove(int repeat)
{
out.wait(lk, [&]{return n >= repeat;});
int val;
for(int i=0; i<repeat; i++)
{
val = buffer[lo];
lo = (lo + 1) % 100;
n -= 1;
}
lk.unlock();
in.notify_one();
return val;
}
};
our_monitor mon;
void* produce(void *threadarg)
{
struct thread_data *my_data;
my_data = (struct thread_data *) threadarg;
cout<<"IN produce after paramiters"<< my_data->repeat<<endl;
int item;
item = rand()%100 + 1;
mon.insert(item, my_data->repeat);
cout<< "Item: "<< item << " Was prodused by thread:"<< my_data->thread_id << endl;
}
void* consume(void *threadarg)
{
struct thread_data *my_data;
my_data = (struct thread_data *) threadarg;
cout<<"IN consume after paramiters"<< my_data->repeat<<endl;
int item;
item = mon.remove(my_data->repeat);
if(item) cout<< "Item: "<< item << " Was consumed by thread:"<< my_data->thread_id << endl;
}
int main()
{
our_monitor *mon = new our_monitor();
pthread_t threads[NUM_THREADS];
thread_data td[NUM_THREADS];
int rc;
int i;
for( i = 0; i < NUM_THREADS; i++ )
{
td[i].thread_id = i;
td[i].repeat = rand()%5 + 1;
if(i % 2 == 0)
{
cout << "main() : creating produce thread, " << i << endl;
rc = pthread_create(&threads[i], NULL, produce, (void*) &td[i]);
if (rc)
{
cout << "Error:unable to create thread," << rc << endl;
exit(-1);
}
} else
{
cout << "main() : creating consume thread, " << i << endl;
rc = pthread_create(&threads[i], NULL, consume, (void *)&td[i]);
if (rc)
{
cout << "Error:unable to create thread," << rc << endl;
exit(-1);
}
}
}
pthread_join(threads[0], NULL);
pthread_join(threads[1], NULL);
pthread_join(threads[2], NULL);
//pthread_exit(NULL);
}
From cppreference, regarding std::condition_variable::wait(...):
"Calling this function if lock.mutex() is not locked by the current
thread is undefined behavior."
http://en.cppreference.com/w/cpp/thread/condition_variable/wait
Unfortunately, the program doesn't crash at the wait() call, where the undefined behaviour originates, but at the lk.unlock() call, where you unlock a lock that this thread never locked.
Lock the lock when you enter your functions. I've done a quick check of the rest of your logic, and I'm like 85% sure it's otherwise ok.
While I have you here: this is not strictly necessary, but it's good practice. std::lock_guard and std::unique_lock automatically lock the mutex when they are constructed and unlock it when they go out of scope. This helps simplify exception handling and early returns. I recommend you get rid of lk as a member variable and use a scoped local variable instead.
void insert(int val, int repeat)
{
{ // Scoped. Somewhat pedantic in this case, but it's always best to signal after the mutex is unlocked
std::unique_lock<std::mutex> lk(m);
in.wait(lk, [&]{return n <= 100-repeat;});
for(int i=0; i<repeat; i++)
{
buffer[hi] = val;
hi = (hi + 1) % 100; //ring buffer
n = n +1; //one more item in buffer
}
}
out.notify_one();
}
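The same treatment applies to remove(); a sketch using the same our_monitor members:
int remove(int repeat)
{
    int val = 0;
    {   // scoped lock: released before notifying
        std::unique_lock<std::mutex> lk(m);
        out.wait(lk, [&]{ return n >= repeat; });
        for (int i = 0; i < repeat; i++)
        {
            val = buffer[lo];
            lo = (lo + 1) % 100;   // ring buffer
            n -= 1;                // one less item in buffer
        }
    }
    in.notify_one();
    return val;
}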
Ok, now for the final issue. The cool thing about a producer/consumer is that we can produce and consume at the same time. However, we just locked our functions, so this is no longer possible. What you can do is move your lock/wait/unlock/work/signal sequence inside the for loop. In pseudocode:
// produce:
while (true)
{
{
unique_lock lk(m)
wait(m, predicate)
}
produce 1
signal
}
This is equivalent to using semaphores (which the C++11 standard library doesn't have, but you can easily build your own from a mutex and a condition variable, using the same wait/notify pattern shown above; a sketch follows the pseudocode):
// produce:
semaphore in(100);
semaphore out(0);
while (true)
{
in.down(1) // Subtracts 1 from in.count. Blocks when in.count == 0 (meaning the buffer is full)
produce 1
out.up(1) // Adds 1 to out.count
}
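For completeness, here is a minimal counting semaphore built from a mutex and a condition variable (my own sketch, matching the down()/up() calls in the pseudocode above):
#include <condition_variable>
#include <mutex>

class semaphore
{
    std::mutex m;
    std::condition_variable cv;
    int count;
public:
    explicit semaphore(int initial) : count(initial) {}
    void down(int k = 1)                    // block until k units are available, then take them
    {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&]{ return count >= k; });
        count -= k;
    }
    void up(int k = 1)                      // release k units and wake waiters
    {
        {
            std::lock_guard<std::mutex> lk(m);
            count += k;
        }
        cv.notify_all();
    }
};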
When main ends, td goes out of scope and ceases to exist. But you passed pointers into it to the threads. You need to make sure td continues to exist as long as any thread might be using it.
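In this program the easiest way to guarantee that is to join every thread before main returns (the posted code joins only threads 0 to 2 even though NUM_THREADS is 4); a sketch:
for (int i = 0; i < NUM_THREADS; i++)
{
    pthread_join(threads[i], NULL);   // td outlives every thread that uses it
}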