Threads not exiting? - C++

I am trying to write a program to solve the producer-consumer problem with threads in C++, and from what I can tell the program works fine until the very end, when the threads are supposed to exit and be joined with join(). (The Product object is a simple data container.)
#include <iostream>
#include <random>
#include <cstdlib>
#include <ctime>
#include <chrono>
#include <sstream>
#include <vector>
#include <stack>
#include <thread>
#include <mutex>
#include <atomic>
#include <condition_variable>
#include <Product.h>

using namespace std;

const int max_items = 100;
atomic<int> itemNum(0);
atomic<int> numProducersWorking(0);
stack<Product> items;
int maxBuffer;
float storeSales[10];
float monthSales[12];
float totalSales;
mutex xmutex;
condition_variable isNotFull;
condition_variable isNotEmpty;

int intRand(const int & min, const int & max) {
    static thread_local mt19937 generator(time(0));
    uniform_int_distribution<int> distribution(min, max);
    return distribution(generator);
}

float floatRand(const float & min, const float & max) {
    static thread_local mt19937 generator(time(0));
    uniform_real_distribution<float> distribution(min, max);
    return distribution(generator);
}

void produce(int pId)
{
    unique_lock<mutex> lock(xmutex);
    int day, month, year, id, regNum;
    float saleAmnt;
    Product item;
    id = pId;
    day = intRand(1, 30);
    month = intRand(1, 12);
    year = 20;
    regNum = intRand(1, 6);
    saleAmnt = floatRand(0.50, 999.99);
    item = Product(day, month, year, id, regNum, saleAmnt);
    isNotFull.wait(lock, [] { return items.size() != maxBuffer; });
    if(itemNum < max_items)
    {
        items.push(item);
        itemNum++;
    }
    isNotEmpty.notify_all();
}

void consume(int cId)
{
    unique_lock<mutex> lock(xmutex);
    Product item;
    isNotEmpty.wait(lock, [] { return items.size() > 0; });
    item = items.top();
    items.pop();
    storeSales[item.getStoreID()-1] += item.getSaleAmnt();
    monthSales[item.getMonth()-1] += item.getSaleAmnt();
    totalSales += item.getSaleAmnt();
    isNotFull.notify_all();
}

void producer(int id)
{
    ++numProducersWorking;
    while(itemNum < max_items)
    {
        produce(id);
        this_thread::sleep_for(chrono::milliseconds(intRand(5, 40)));
    }
    --numProducersWorking;
}

void consumer(int id)
{
    while(numProducersWorking != 0 || items.size() > 0 )
    {
        consume(id);
    }
}

int main()
{
    int p, c, b;
    p = 5;
    c = 5;
    b = 5;
    maxBuffer = b;
    vector<thread> prodsCons;
    auto start = chrono::high_resolution_clock::now();
    //create producers
    for(int i = 1; i <= p; i++)
    {
        prodsCons.push_back(thread(producer, i));
    }
    //create consumers
    for(int i = 0; i < c; i++)
    {
        prodsCons.push_back(thread(consumer, i));
    }
    int x = 0;
    //wait for consumers and producers to finish
    for(auto& th : prodsCons)
    {
        th.join();
        cout<<"thread "<<x<<" joined"<<endl;
        x++;
    }
    auto stop = chrono::high_resolution_clock::now();
    auto duration = chrono::duration_cast<chrono::microseconds>(stop - start);
    cout<<"Store-wide total sales: "<<endl;
    for(int x = 1; x <= p; x++)
    {
        cout<<" store "<<x<<" sales: $"<<storeSales[x-1]<<endl;
    }
    cout<<"Month-wise total sales: "<<endl;
    for(int x = 1; x <= 12; x++)
    {
        cout<<" month "<<x<<" sales: $"<<monthSales[x-1]<<endl;
    }
    cout<<"Total sales: $"<<totalSales<<endl;
    cout<<"Simulation time: "<<duration.count()<<" microseconds"<<endl;
}
The output looks like this:
thread 0 joined
thread 1 joined
thread 2 joined
thread 3 joined
thread 4 joined
indicating that 5 out of the 10 threads aren't exiting (most likely the consumers), and so the program never reaches the end. Is there a condition that isn't being fulfilled, or did I implement the mutexes incorrectly?

Once a consumer thread reaches the condition_variable::wait call inside consume(), it will not return without some sort of signal.
I typically have a shutdown flag, which is protected by the same mutex as the queue, and my wait condition is based on both the shutdown flag and the queue size.
When it's time for the consumers to stop, I acquire the mutex and set the shutdown flag. Then, on exit from the wait, I either exit immediately on shutdown, or only once the queue is also empty. The former is an immediate shutdown, while the latter is a shutdown once work is complete.
Also, all access to the items stack must be protected by the mutex. You've done that in some places, but not others: for example, the loop condition in consumer() reads items.size() without holding the lock.
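A minimal sketch of that pattern, folding consume() into the consumer loop and reusing the question's names; the done flag is the hypothetical addition:

bool done = false;   // shutdown flag, protected by xmutex

void consumer(int cId)
{
    while (true)
    {
        unique_lock<mutex> lock(xmutex);
        // Wake up for new work or for a shutdown request.
        isNotEmpty.wait(lock, [] { return !items.empty() || done; });
        if (items.empty())                // done == true and nothing left
            return;                       // shutdown once work is complete
        Product item = items.top();
        items.pop();
        storeSales[item.getStoreID()-1] += item.getSaleAmnt();
        monthSales[item.getMonth()-1] += item.getSaleAmnt();
        totalSales += item.getSaleAmnt();
        isNotFull.notify_all();
    }
}

// When it's time for the consumers to stop (e.g. once the producers finish):
//     { lock_guard<mutex> lock(xmutex); done = true; }
//     isNotEmpty.notify_all();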

Related

Is there a race condition when one thread is constantly querying a variable from memory while another thread updates it?

I have the following code
#include <stdio.h>
#include <thread>
#include <mutex>

class A
{
public:
    A(int n) : num_workers(n) {
        counter_lock = new std::mutex();
        threads = new std::thread[num_workers];
        for (int i = 0; i < num_workers; i++) {
            threads[i] = std::thread(&A::run, this);
        }
    }
    ~A() {
        delete counter_lock;
    }
    void start() {
        go = true;
        counter = 0;
        total = 1000000;
        while (counter < total) {};
        for (int i = 0; i < num_workers; i++) {
            threads[i].join();
        }
    }
    void run() {
        printf("Spinning\n");
        while(1) {
            if (go) {
                int i;
                counter_lock->lock();
                i = counter++;
                counter_lock->unlock();
                if (i >= total) break;
            }
        }
        printf("Done\n");
    }
private:
    std::mutex* counter_lock;
    std::thread* threads;
    int num_workers;
    int counter;
    int total;
    bool active;
    bool go;
};

int main() {
    A a = A(10);
    a.start();
}
In the constructor of A I create a thread pool of num_workers threads. They all execute the function A::run, which simply waits until it gets the signal to go and then starts incrementing the counter. The function start is meant to be synchronous with whoever called it, so in this case it should be synchronous with main. Therefore, after issuing the go signal to all the worker threads, it just idles until the counter reaches total.
The code works as expected: the counter ends up at 1000010, which makes sense because each of the 10 threads has incremented the counter one more time before quitting.
Is there a race condition here that I'm not noticing? Is the fact that I'm reading counter in the while (counter < total) {} loop without a lock causing any problems?
Thanks for any advice.
EDIT: I replaced the while (counter < total) {} loop with the following, perhaps more thread-safe, implementation. But, very rarely, it still ends up looping forever because the value of counter hasn't updated.
while (1) {
    int c;
    counter_lock->lock();
    c = counter;
    counter_lock->unlock();
    if (c >= total) break;
};
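For what it's worth, here is a minimal, free-standing sketch of the same spin-wait idea written with std::atomic instead of the mutex (an illustrative rewrite, not the original class), so that the cross-thread reads of go and counter are well-defined:

#include <atomic>
#include <thread>
#include <vector>
#include <cstdio>

std::atomic<bool> go{false};
std::atomic<int>  counter{0};
const int total = 1000000;

void run() {
    printf("Spinning\n");
    while (!go.load()) { }                    // wait for the start signal
    while (counter.fetch_add(1) < total) { }  // atomically claim the next count
    printf("Done\n");
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 10; i++)
        threads.emplace_back(run);
    go.store(true);
    while (counter.load() < total) { }        // main can now poll without a lock
    for (auto &t : threads)
        t.join();
    printf("counter = %d\n", counter.load());
}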

C++ threads: why does only the last thread execute?

Why does only the last thread execute every time? I'm trying to divide the grid among N workers, but half of the grid is never touched and the other part is always processed by the last created thread. Should I use an array instead of a vector? Locks also do not help to resolve this problem.
#include <iostream>
#include <unistd.h>
#include <vector>
#include <stdio.h>
#include <cstring>
#include <future>
#include <thread>
#include <pthread.h>
#include <mutex>

using namespace std;

std::mutex m;

int main(int argc, char * argv[]) {
    int iterations = atoi(argv[1]), workers = atoi(argv[2]), x = atoi(argv[3]), y = atoi(argv[4]);
    vector<vector<int> > grid( x , vector<int> (y, 0));
    std::vector<thread> threads(workers);
    int start, end, lastworker, nwork;
    int chunkSize = y/workers;
    for(int t = 0; t < workers; t++){
        start = t * chunkSize;
        end = start + chunkSize;
        nwork = t;
        lastworker = workers - 1;
        if(lastworker == t){
            end = y; nwork = workers - 1;
        }
        threads[nwork] = thread([&start, &end, &x, &grid, &t, &nwork, &threads] {
            cout << " ENTER TO THREAD -> " << threads[nwork].get_id() << endl;
            for (int i = start; i < end; ++i)
            {
                for (int j = 0; j < x; ++j)
                {
                    grid[i][j] = t;
                }
            }
            sleep(2);
        });
        cout << threads[nwork].get_id() << endl;
    }
    for(auto& th : threads){
        th.join();
    }
    for (int i = 0; i < y; ++i)
    {
        for (int j = 0; j < x; ++j)
        {
            cout << grid[i][j];
        }
        cout << endl;
    }
    return(0);
}
[&start, &end, &x, &grid, &t, &nwork, &threads]
This line is the root of the problem. You are capturing all the variables by reference, which is not what you want to do.
As a consequence, each thread uses the same variables, which is also not what you want.
You should only capture grid and threads by reference; the other variables should be captured by value ('copied' into the lambda):
[start, end, x, &grid, t, nwork, &threads]
Also, you are accessing grid incorrectly everywhere: change grid[i][j] to grid[j][i].
thread([&start, &end, &x, &grid, &t, &nwork, &threads] {
The lambda closure that gets executed by every thread captures a reference to nwork.
Which means that, as the for loop iterates and starts each thread, every thread's closure always references the current value of nwork at the moment it runs.
As such, the outer loop probably finishes creating all of the thread objects before the threads actually start and enter the lambda, so each closure sees the same value of nwork, the last thread id, because nwork is captured by reference.
You need to capture nwork by value instead of by reference.
You're passing all the thread parameters as references to the thread lambda. However, when the loop continues in the main thread, the thread parameter variables change, which changes their values in the threads as well, messing up all the previously created threads.
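Putting the two fixes together, a sketch of just the thread-creation loop, with the per-iteration values captured by value and grid indexed as grid[j][i]; the diagnostic cout inside the lambda is dropped, and the rest of the question's program is assumed unchanged:

for (int t = 0; t < workers; t++) {
    int start = t * chunkSize;
    int end = (t == workers - 1) ? y : start + chunkSize;  // last worker takes the remainder
    threads[t] = thread([start, end, x, t, &grid] {
        for (int i = start; i < end; ++i)
            for (int j = 0; j < x; ++j)
                grid[j][i] = t;       // grid was built as x rows of y columns
    });
}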

What is the fastest way to read from a small (on the order of 10 elements) vector of class pointers in parallel?

I am looking for the fastest way to have multiple threads read from the same small vector of pointers (one which is not static, but will only ever be changed by the main thread, and only when the child threads are not reading from it).
I've tried using a shared std::vector of pointers, which is somewhat faster than a shared array of pointers but still slower per thread... I thought the reason was the threads reading so close together in memory causing false sharing, but I am unsure.
I'm hoping there is either a way around that, since the data is effectively read-only while the threads are accessing it, or an entirely different approach that is faster. Below is a minimal example:
#include <thread>
#include <iostream>
#include <iomanip>
#include <vector>
#include <atomic>
#include <chrono>

namespace chrono=std::chrono;

class A {
public:
    A(int n=1) {
        a=n;
    }
    int a;
};

void tfunc();

int nelements=10;
int nthreads=1;
std::vector<A*> elements;
std::atomic<int> complete;
std::atomic<int> remaining;
std::atomic<int> next;
std::atomic<int> tnow;
int tend=1000000;

int main() {
    complete=false;
    remaining=0;
    next=0;
    tnow=0;
    for (int i=0; i < nelements; i++) {
        A* a=new A();
        elements.push_back(a);
    }
    std::thread threads[nthreads];
    for (int i=0; i < nthreads; i++) {
        threads[i]=std::thread(tfunc);
    }
    auto begin=chrono::high_resolution_clock::now();
    while (tnow < tend) {
        remaining=nthreads;
        next=0;
        tnow += 1;
        while (remaining > 0) {}
        // if {elements} is changed it is changed here
    }
    complete=true;
    for (int i=0; i < nthreads; i++) {
        threads[i].join();
    }
    auto complete=chrono::high_resolution_clock::now();
    auto elapsed=chrono::duration_cast<chrono::microseconds>(complete-begin).count();
    std::cout << std::setw(2) << nthreads << "Time - " << elapsed << std::endl;
}

void tfunc() {
    int sum=0;
    int tpre=0;
    int curr=0;
    while (tnow == 0) {}
    while (!complete) {
        if (tnow-tpre > 0) {
            tpre=tnow;
            while (remaining > 0) {
                curr=next++;
                if (curr > nelements) break;
                for (int i=0; i < nelements; i++) {
                    if (i != curr) {
                        sum += elements[i] -> a;
                    }
                }
                remaining--;
            }
        }
    }
}
Which for nthreads between 1 and 10 on my system outputs (the times are in microseconds)
1 Time - 281548
2 Time - 404926
3 Time - 546826
4 Time - 641898
5 Time - 714259
6 Time - 812776
7 Time - 922391
8 Time - 994909
9 Time - 1147579
10 Time - 1199838
I am wondering if there is a faster way to do this or if such a parallel operation will always be slower than serial due to the smallness of the vector.

How to sync a "for" loop counter across multiple threads?

How do I sync a "for" loop counter across multiple threads?
Given this multithreaded program:
void Func(int n){
    for(int i=0; i<n; i++){ //at the same time with other Func()
        cout << i <<endl;
    }
}

void main(){
    std::thread t1(Func(2));
    std::thread t2(Func(2));
    t1.join();
    t2.join();
}
When executing Func() in parallel, I want to sync the "for" loop counter "i".
For example, the program may output the result
0
1
0
1
but I want to always get the result
0
0
1
1
Can I do it?
If you use OpenMP to thread your loop you can use a #pragma omp barrier statement.
In C++11 you can use a condition_variable to block all threads until they reach the same spot.
One way to do it would be to use a few variables for the threads to coordinate things (in the following they are globals, just for simplicity).
mutex m;
condition_variable c;
static int index = 0;
static int count = 2;
The index variable says which iteration the threads are currently at, and the count variable says how many threads are still at that index.
Now your loop becomes:
void Func(int n){
    for(int i=0; i<n; i++){ //at the same time with other Func()
        unique_lock<mutex> l(m);
        c.wait(l, [i](){return index == i;});
        cout << i <<endl;
        if(--count == 0)
        {
            ++index;
            count = 2;
            c.notify_one();
        }
    }
}
Here is the full code:
#include <thread>
#include <mutex>
#include <condition_variable>
#include <iostream>

using namespace std;

mutex m;
condition_variable c;
static int index = 0;
static int count = 2;

void Func(int n){
    for(int i=0; i<n; i++){ //at the same time with other Func()
        unique_lock<mutex> l(m);
        c.wait(l, [i](){return index == i;});
        cout << i <<endl;
        if(--count == 0)
        {
            ++index;
            count = 2;
            c.notify_one();
        }
    }
}

int main(){
    std::thread t1(Func, 20);
    std::thread t2(Func, 20);
    t1.join();
    t2.join();
}
You can use a std::atomic variable and pass it to all threads.
void Func(int n, std::atomic<int> & i){  // the shared counter must be atomic
    for (; i < n; i++){ //at the same time with other Func()
        cout << i << endl;
    }
}

int main(){
    std::atomic<int> counter{0};
    std::thread t1(Func, 2, std::ref(counter));
    std::thread t2(Func, 2, std::ref(counter));
    t1.join();
    t2.join();
}
Also, you should note that the way you are creating your threads in your example is incorrect. Secondly, if you are using cout from multiple threads, each cout should be guarded with a std::mutex, since output from different threads can interleave.

Threads failing to affect performance

Below is a small program meant to parallelize the approximation of the 1/(n^2) series. Note the global parameter NUM_THREADS.
My issue is that increasing the number of threads from 1 to 4 (my computer has 4 processors) does not significantly affect the results of the timing experiments. Do you see a logical flaw in ThreadFunction? Is there false sharing or misplaced blocking that ends up serializing the execution?
#include <iostream>
#include <thread>
#include <vector>
#include <mutex>
#include <string>
#include <future>
#include <chrono>

std::mutex sum_mutex;          // This mutex is for the sum vector
std::vector<double> sum_vec;   // This is the sum vector
int NUM_THREADS = 1;
int UPPER_BD = 1000000;

/* Thread function */
void ThreadFunction(std::vector<double> &l, int beg, int end, int thread_num)
{
    double sum = 0;
    for(int i = beg; i < end; i++) sum += (1 / ( l[i] * l[i]) );
    std::unique_lock<std::mutex> lock1 (sum_mutex, std::defer_lock);
    lock1.lock();
    sum_vec.push_back(sum);
    lock1.unlock();
}

void ListFill(std::vector<double> &l, int z)
{
    for(int i = 0; i < z; ++i) l.push_back(i);
}

int main()
{
    std::vector<double> l;
    std::vector<std::thread> thread_vec;
    ListFill(l, UPPER_BD);
    int len = l.size();
    int lower_bd = 1;
    int increment = (UPPER_BD - lower_bd) / NUM_THREADS;
    for (int j = 0; j < NUM_THREADS; ++j)
    {
        thread_vec.push_back(std::thread(ThreadFunction, std::ref(l), lower_bd, lower_bd + increment, j));
        lower_bd += increment;
    }
    for (auto &t : thread_vec) t.join();
    double big_sum;
    for (double z : sum_vec) big_sum += z;
    std::cout << big_sum << std::endl;
    return 0;
}
From looking at your code, I suspect that ListFill is taking longer than ThreadFunction. Why pass a list of values to the thread instead of the bounds each thread should loop over? Something like:
void ThreadFunction( int beg, int end ) {
    double sum = 0.0;
    for(double i = beg; i < end; i++)
        sum += (1.0 / ( i * i) );
    std::unique_lock<std::mutex> lock1 (sum_mutex);
    sum_vec.push_back(sum);
}
To maximize parallelism, you need to push as much work as possible onto the threads. See Amdahl's Law.
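As a quick reminder of what Amdahl's Law says: with a parallelizable fraction P of the work and N threads, the best possible speedup is 1 / ((1 - P) + P/N), so serial work such as the ListFill call puts a hard cap on how much adding threads can help.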
In addition to dohashi's nice improvement, you can remove the need for the mutex by populating the sum_vec in advance in the main thread:
sum_vec.resize(4);
then writing directly to it in ThreadFunction:
sum_vec[thread_num] = sum;
Since each thread writes to a distinct element and doesn't modify the vector itself, there is no need to lock anything.
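Putting both suggestions together, a minimal self-contained sketch (assuming 4 threads and keeping the question's names where possible; this is an illustration, not the only way to structure it):

#include <iostream>
#include <thread>
#include <vector>

const int NUM_THREADS = 4;
const int UPPER_BD = 1000000;

std::vector<double> sum_vec(NUM_THREADS);   // pre-sized, one slot per thread

// Each thread gets only its bounds and its slot index: no shared list, no mutex.
void ThreadFunction(int beg, int end, int thread_num)
{
    double sum = 0.0;
    for (double i = beg; i < end; i++)
        sum += 1.0 / (i * i);
    sum_vec[thread_num] = sum;              // distinct element per thread
}

int main()
{
    std::vector<std::thread> thread_vec;
    int lower_bd = 1;
    int increment = (UPPER_BD - lower_bd) / NUM_THREADS;
    for (int j = 0; j < NUM_THREADS; ++j)
    {
        thread_vec.push_back(std::thread(ThreadFunction, lower_bd, lower_bd + increment, j));
        lower_bd += increment;
    }
    for (auto &t : thread_vec) t.join();

    double big_sum = 0.0;
    for (double z : sum_vec) big_sum += z;
    std::cout << big_sum << std::endl;      // approaches pi^2/6
}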