How do I implement std::thread the right way with loops? - c++

I'm trying to create a multithreaded part in my program, where a loop creates multiple threads that each get a vector of objects, along with some integers and the vector which holds the results.
The problem is I can't seem to wrap my head around how the thread part works; I tried different things, but they all end in the same three errors.
This is where I don't know how to proceed:
std::thread thread_superdiecreator;
for (int64_t i = 0; i < dicewithside.back().sides; i++) {
    thread_superdiecreator(func_multicreator(dicewithside, i, amount, lastdiepossibilities, superdie));
}
term does not evaluate to a function taking 1 arguments
I tried this:
thread_superdiecreator(func_multicreator, dicewithside, i, amount, lastdiepossibilities, superdie);
call of an object of a class type without appropriate operator() or conversion functions to pointer-to-function type
And this:
std::thread thread_superdiecreator(func_multicreator, dicewithside, i, amount, lastdiepossibilities, superdie);
Invoke error in thread.
The whole code snippet:
#pragma once
#include <mutex>
#include <thread>
#include <algorithm>
#include "class_Diewithside.h"
#include "struct_Sortedinput.h"
#include "func_maximumpossibilities.h"

std::mutex superdielock;

void func_multicreator(std::vector<Diewithside> dicewithside, int64_t lastdieside, int64_t size, int64_t lastdiepossibilities, std::vector<int64_t> &superdie) {
    // Set the last die side to the number of the thread
    dicewithside[size - 1].dieside = lastdieside;
    std::vector<int64_t> partsuperdie;
    partsuperdie.reserve(lastdiepossibilities);
    // Calculate all possible results of all dice thrown with the last one set
    for (int64_t i = 0; i < lastdiepossibilities; i++) {
        // Reset the result
        int64_t result = 0;
        for (int64_t j = 0; j < size; j++) {
            result += dicewithside[j].alleyes[dicewithside[j].dieside];
        }
        partsuperdie.push_back(result);
        // Advance the other dice to the next combination
        for (int64_t j = 0; j < size - 1; j++) {
            if (dicewithside[j].dieside == dicewithside[j].sides - 1) {
                dicewithside[j].dieside = 0;
            }
            else {
                dicewithside[j].dieside += 1;
                break;
            }
        }
    }
    superdielock.lock();
    for (int64_t i = 0; i < lastdiepossibilities; i++) {
        superdie.push_back(partsuperdie[i]);
    }
    superdielock.unlock();
}

// The function superdie creates an array that holds all possible outcomes of the dice thrown
std::vector<int64_t> func_superdiecreator(sortedinput varsortedinput) {
    // Get the size of the diceset vector and create a new vector out of class Diewithside
    int64_t size = varsortedinput.dicesets.size();
    std::vector<Diewithside> dicewithside;
    // Initialize the integer amount and iterate through all the amounts of vector dicesets adding them together to set the vector dicewithside reserve
    int64_t amount = 0;
    for (int64_t i = 0; i < size; i++) {
        amount += varsortedinput.dicesets[i].amount;
    }
    dicewithside.reserve(amount);
    // Fill the new vector dicewithside with each single die and add the starting value of 0
    for (int64_t i = 0; i < size; i++) {
        for (int64_t j = 0; j < varsortedinput.dicesets[i].amount; j++) {
            dicewithside.push_back(Diewithside{varsortedinput.dicesets[i].plusorminus, varsortedinput.dicesets[i].sides, varsortedinput.dicesets[i].alleyes, 0});
        }
    }
    // Get the maximum possibilities and divide by sides of the last die to get the amount of iterations each thread has to run
    int64_t maximumpossibilities = func_maximumpossibilities(varsortedinput.dicesets, size);
    int64_t lastdiepossibilities = maximumpossibilities / dicewithside[amount - 1].sides;
    // Multithread calculate all possibilities and save them in array
    std::vector<int64_t> superdie;
    superdie.reserve(maximumpossibilities);
    std::thread thread_superdiecreator;
    for (int64_t i = 0; i < dicewithside.back().sides; i++) {
        thread_superdiecreator(func_multicreator(dicewithside, i, amount, lastdiepossibilities, superdie));
    }
    thread_superdiecreator.join();
    return superdie;
}
Thanks for any help!

You indeed need to create the thread using the third alternative mentioned in the question, i.e. use the constructor of std::thread to start the thread.
The issue with this approach is that the last parameter of func_multicreator is an lvalue reference: std::thread copies its arguments and moves those copies when calling the function on the background thread, and an rvalue cannot bind to a non-const lvalue reference. You need to use std::reference_wrapper (or std::ref) here to be able to "pass" an lvalue reference to the thread.
You also need to join every thread you create, so you need a collection of threads.
Simplified example:
(The interesting stuff is between the ---... comments.)
struct Diewithside
{
    int64_t sides;
};

void func_multicreator(std::vector<Diewithside> dicewithside, int64_t lastdieside, int64_t size, int64_t lastdiepossibilities, std::vector<int64_t>& superdie)
{
}

std::vector<int64_t> func_superdiecreator() {
    std::vector<Diewithside> dicewithside;
    // Initialize the integer amount and iterate through all the amounts of vector dicesets adding them together to set the vector dicewithside reserve
    int64_t amount = 0;
    int64_t lastdiepossibilities = 0;
    std::vector<int64_t> superdie;
    // -----------------------------------------------
    std::vector<std::thread> threads;
    for (int64_t i = 0; i < dicewithside.back().sides; i++) {
        // create the thread using the constructor std::thread(func_multicreator, dicewithside, i, amount, lastdiepossibilities, std::reference_wrapper(superdie));
        threads.emplace_back(func_multicreator, dicewithside, i, amount, lastdiepossibilities, std::reference_wrapper(superdie));
    }
    for (auto& t : threads)
    {
        t.join();
    }
    // -----------------------------------------------
    return superdie;
}

std::thread thread_superdiecreator;
A single std::thread object always represents a single execution thread. You seem to be trying to use this single object to represent multiple execution threads. No matter what you try, it won't work. You need multiple std::thread objects, one for each execution thread.
thread_superdiecreator(func_multicreator, dicewithside, i, amount, lastdiepossibilities, superdie);
An actual execution thread gets created by constructing a new std::thread object, and not by invoking it as a function.
Constructing a std::thread object corresponds to the creation of a new execution thread; it's just that simple. And the simplest way to have multiple execution threads is to have a vector of them:
std::vector<std::thread> all_execution_threads;
With that in place, creating a new execution thread involves nothing more than constructing a new std::thread object and moving it into the vector. Or, better yet, emplace it directly:
all_execution_threads.emplace_back(
    func_multicreator, dicewithside, i,
    amount, lastdiepossibilities, superdie
);
This presumes that everything else is correct: that func_multicreator agrees with the parameters that follow, that none of them are passed by reference (you need to fix this at least; your attempt to pass a reference into a thread function will not work and will leave dangling references behind), and that all access to objects shared by multiple execution threads is correctly synchronized with mutexes, plus all the other usual pitfalls when working with multiple execution threads. But this covers the basics of creating some unspecified number of multiple, concurrent execution threads. When all is said and done, you end up with a std::vector of std::threads, one for each actual execution thread.
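A minimal sketch of that fix, assuming the variables from the question (dicewithside, amount, lastdiepossibilities, superdie); std::ref lives in <functional> and is what lets the thread function receive a real reference instead of a copy:
// Sketch only: create one thread per side of the last die, pass superdie by
// reference via std::ref, and join every thread before using the results.
std::vector<std::thread> all_execution_threads;
for (int64_t i = 0; i < dicewithside.back().sides; i++) {
    all_execution_threads.emplace_back(func_multicreator, dicewithside, i,
                                       amount, lastdiepossibilities,
                                       std::ref(superdie));
}
for (auto &t : all_execution_threads) {
    t.join();  // wait for every worker before reading superdie
}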

Related

How to start a thread from one object's method by calling another method?

I have read various articles on C++ threading, among others the GeeksForGeeks article. I have also read this question, but none of these has an answer for my need. In my project (which is too complex to mention here), I would need something along these lines:
#include <iostream>
#include <thread>
using namespace std;

class Simulate{
public:
    int Numbers[100][100];
    thread Threads[100][100];

    // Method to be passed to thread - in the same way as function pointer?
    void DoOperation(int i, int j) {
        Numbers[i][j] = i + j;
    }

    // Method to start the thread from
    void Update(){
        // Start executing threads
        for (int i = 0; i < 100; i++) {
            for (int j = 0; j < 100; j++) {
                Threads[i][j] = thread(DoOperation, i, j);
            }
        }
        // Wait till all of the threads finish
        for (int i = 0; i < 100; i++) {
            for (int j = 0; j < 100; j++) {
                if (Threads[i][j].joinable()) {
                    Threads[i][j].join();
                }
            }
        }
    }
};

int main()
{
    Simulate sim;
    sim.Update();
}
How can I do this please? Any help is appreciated, and alternative solutions are welcomed. I am a mathematician by training, learning C++ for less than a week, so simplicity is preferred. I desperately need something along these lines to make my research simulations faster.
The easiest way to call member functions and pass arguments is to use a lambda expression:
Threads[i][j] = std::thread([this, i, j](){ this->DoOperation(i, j); });
The variables listed in [] are captured, and their values can be used by the code inside {}. The lambda itself has a unique anonymous type, but the std::thread constructor is a template that accepts any callable object, so the lambda can be passed to it directly.
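For comparison, a sketch of the same thread creation without a lambda, using the names from the question:
// Equivalent to the lambda version: std::thread binds the member function
// pointer, the object pointer (this) and the two indices.
Threads[i][j] = std::thread(&Simulate::DoOperation, this, i, j);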
However, starting 100x100 = 10000 threads will quickly exhaust memory on most systems. Adding more threads than there are CPU cores does not improve performance for computational tasks. Instead, it is a better idea to start, e.g., 10 threads that each process 1000 items in a loop, as sketched below.
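A minimal sketch of that idea, assuming the Simulate class from the question (10 workers and 10 rows per worker are arbitrary choices; it needs <vector> and <thread>):
// Sketch: a version of Update() that starts only a few workers, each filling
// its own block of rows, so no two threads touch the same element.
void Update() {
    const int num_workers = 10;                    // roughly the number of cores
    const int rows_per_worker = 100 / num_workers; // 10 rows each
    std::vector<std::thread> workers;
    for (int w = 0; w < num_workers; ++w) {
        workers.emplace_back([this, w, rows_per_worker]() {
            for (int i = w * rows_per_worker; i < (w + 1) * rows_per_worker; ++i)
                for (int j = 0; j < 100; ++j)
                    DoOperation(i, j);
        });
    }
    for (auto& t : workers)
        t.join();                                  // wait for all workers to finish
}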

How can we run n instances of an algorithm in parallel and compute the mean of a function of the results in an efficient way?

I want to run n instances of an algorithm in parallel and compute the mean of a function f of the results. If I'm not terribly wrong, the following code achieves this goal:
struct X {};
int f(X) { return /* ... */; }

int main()
{
    std::size_t const n = /* ... */;
    std::vector<std::future<X>> results;
    results.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        results.push_back(std::async([]() -> X { /* ... */ }));
    int mean = 0;
    for (std::size_t i = 0; i < n; ++i)
        mean += f(results[i].get());
    mean /= n;
}
However, is there a better way to do this? The obvious problem with the code above is the following: The order of summation in the line mean += f(results[i].get()); doesn't matter. Thus, it would be better to add the results to mean as soon as they are available. If in the code above, the result of the ith task is not yet available, the program waits for that result, while it might be possible that all results of task i + 1 to n - 1 are already available.
So, how can we do this in a better way?
You're blocking on the future, which is one operation too early.
Why not update the accumulated sum in the async thread and then block on all threads being complete?
#include <condition_variable>
#include <thread>
#include <mutex>

struct X {};
int f(X);
X make_x(int);

struct algo_state
{
    std::mutex m;
    std::condition_variable cv;
    int remaining_tasks;
    int accumulator;
};

void task(X x, algo_state& state)
{
    auto part = f(x);
    auto lock = std::unique_lock(state.m);
    state.accumulator += part;
    if (--state.remaining_tasks == 0)
    {
        lock.unlock();
        state.cv.notify_one();
    }
}

int main()
{
    int get_n();
    auto n = get_n();
    algo_state state = {
        {},
        {},
        n,
        0
    };
    for(int i = 0 ; i < n ; ++i)
        std::thread([&] { task(make_x(i), state); }).detach();

    auto lock = std::unique_lock(state.m);
    state.cv.wait(lock, [&] { return state.remaining_tasks == 0; });

    auto mean = state.accumulator / n;
    return mean;
}
Couldn't fit this into a comment:
Instead of passing N functions to M threads for N data points (X), you can have:
K queues of N/K data elements each
M threads in a pool (producers, all ready with the same function)
1 consumer (adder) thread (main?)
and pass only the N data points between threads. Passing functions around and executing them can have more overhead than passing just data.
Also, those functions can add into a shared variable, so no extra summation is needed outside; then only the M producers need suitable synchronization, such as atomics or lock guards.
What is sizeof that struct?
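A minimal sketch of the shared-variable idea, assuming the f and make_x declarations from the answer above (M, the producer count, is an arbitrary choice):
#include <atomic>
#include <thread>
#include <vector>

int parallel_mean(int n)
{
    const int M = 8;                        // producer count, arbitrary
    std::atomic<int> sum{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < M; ++t) {
        pool.emplace_back([&, t] {
            int part = 0;
            for (int i = t; i < n; i += M)  // each producer takes every M-th data point
                part += f(make_x(i));
            sum += part;                    // one synchronized addition per producer
        });
    }
    for (auto& th : pool)
        th.join();
    return sum / n;                         // mean, computed after all producers finished
}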
Easiest way
What about making the lambda return f(x) instead of x:
for (std::size_t i = 0; i < n; ++i)
    results.push_back(std::async([]() -> int { /* ... */ }));
In this case, f() could be performed as soon as possible and without waiting. The average computation would still need to wait in sequential order. But this is a false problem, since there's nothing faster than summing integers, and anyway, you would not be able to finish the calculation of the average before having summed each part.
Easy alternative
Still another approach could be to use atomic<int> mean;, capture it in the lambda, and update the sum there. In the end you'd only need to be sure that all futures have delivered before doing the division. But as said, considering the cost of an integer addition, this might be overkill here.
std::vector<std::future<void>> results;
...
atomic<int> mean{0};
for (std::size_t i = 0; i < n; ++i)
    results.push_back(std::async([&mean]() -> void
        { X x = ...; int i = f(x); mean += i; return; }));
for (std::size_t i = 0; i < n; ++i)
    results[i].get();
mean = mean / n; // attention: not an atomic operation, but all concurrent work is done

How can I modify a string from a thread?

I am trying to modify some strings from threads (each thread would have its own string), but all strings are stored in a vector, because I need to be able to access them after the threads have done their thing.
I haven't used threads in C++, so if this is a terrible thing to do, all suggestions are welcome :)
Basically the only thing the program does now is:
create some threads
send a string and an id to each thread
thread function modifies the string to add the id to it
end
This gives a segfault :(
Is this just a bad approach? How else could I do this?
static const int cores = 8;

void bmh_t(std::string & resr, int tid){
    resr.append(std::to_string(tid));
    resr.append(",");
    return;
}

std::vector<std::string> parbmh(std::string text, std::string pat){
    std::vector<std::string> distlists;
    std::thread t[cores];
    //Launch a group of threads
    for (int i = 0; i < cores; ++i) {
        distlists.push_back(" ");
        t[i] = std::thread(bmh_t, std::ref(distlists[i]), i);
    }
    for (int i = 0; i < cores; ++i) {
        t[i].join();
    }
    return distlists;
}
Your basic approach is fine. The main thing you need to consider when writing parallel code is that any data shared between threads is accessed in a safe way. Because your algorithm uses a different string for each thread, that part is fine.
The reason you're seeing a crash is because you're calling push_back on your vector of strings after you've already given each thread a reference to data stored within the vector. This is a problem because push_back needs to grow your vector, when its size reaches its capacity. That growth can invalidate the references that you've dispatched to each thread, causing them to write to freed memory.
The fix is very simple: just make sure ahead of time that your vector doesn't need to grow. This can be accomplished with a constructor argument specifying an initial number of elements; a call to reserve(); or a call to resize().
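For instance, a quick sketch of those three options, using the names from the question (the full example below uses reserve()):
std::vector<std::string> distlists(cores); // constructed with cores elements; assign distlists[i] directly
// or
distlists.reserve(cores);                  // capacity fixed up front, push_back won't reallocate
// or
distlists.resize(cores);                   // size set up front; assign distlists[i] directly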
Here's an implementation that doesn't crash:
static const int cores = 8;

void bmh_t(std::string & resr, int tid){
    resr.append(std::to_string(tid));
    resr.append(",");
    return;
}

std::vector<std::string> parbmh(){
    std::vector<std::string> distlists;
    std::thread t[cores];
    distlists.reserve(cores);
    //Launch a group of threads
    for (int i = 0; i < cores; ++i) {
        distlists.push_back(" ");
        t[i] = std::thread(bmh_t, std::ref(distlists[i]), i);
    }
    for (int i = 0; i < cores; ++i) {
        t[i].join();
    }
    return distlists;
}
The vector of strings is being destructed before the threads can act on the contained strings. You'll want to join the threads before returning so that the vector of strings isn't destroyed.

Fill an array from different threads concurrently c++

First of all, I think it is important to say that I am new to multithreading and know very little about it. I was trying to write some programs in C++ using threads and ran into a problem (question) that I will try to explain to you now:
I wanted to use several threads to fill an array, here is my code:
static const int num_threads = 5;
int A[50], n;
//------------------------------------------------------------
void ThreadFunc(int tid)
{
    for (int q = 0; q < 5; q++)
    {
        A[n] = tid;
        n++;
    }
}
//------------------------------------------------------------
int main()
{
    thread t[num_threads];
    n = 0;
    for (int i = 0; i < num_threads; i++)
    {
        t[i] = thread(ThreadFunc, i);
    }
    for (int i = 0; i < num_threads; i++)
    {
        t[i].join();
    }
    for (int i = 0; i < n; i++)
        cout << A[i] << endl;
    return 0;
}
As a result of this program I get:
0
0
0
0
0
1
1
1
1
1
2
2
2
2
2
and so on.
As I understand it, the second thread starts writing elements to the array only when the first thread has finished writing all of its elements.
The question is: why don't the threads work concurrently? I mean, why don't I get something like this:
0
1
2
0
3
1
4
and so on.
Is there any way to solve this problem?
Thank you in advance.
Since n is accessed from more than one thread, those accesses need to be synchronized so that changes made in one thread don't conflict with changes made in another. There are (at least) two ways to do this.
First, you can make n an atomic variable. Just change its definition, and do the increment where the value is used:
std::atomic<int> n;
...
A[n++] = tid;
Or you can wrap all the accesses inside a critical section:
std::mutex mtx;

int next_n() {
    std::unique_lock<std::mutex> lock(mtx);
    return n++;
}
And in each thread, instead of directly incrementing n, call that function:
A[next_n()] = tid;
This is much slower than the atomic access, so not appropriate here. In more complex situations it will be the right solution.
The worker function is so short, i.e., finishes executing so quickly, that it's possible that each thread is completing before the next one even starts. Also, you may need to link with a thread library to get real threads, e.g., -lpthread. Even with that, the results you're getting are purely by chance and could appear in any order.
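To see the timing effect, a sketch (not a fix) that uses the globals A and n from the question and slows each iteration down so the threads are likely to overlap and the output interleaves; the data race on n is still there and still needs the synchronization described in the other answers:
#include <chrono>
#include <thread>

// Same ThreadFunc as in the question, but with an artificial delay per iteration.
// This only illustrates interleaving; the unsynchronized access to n is still a race.
void ThreadFunc(int tid)
{
    for (int q = 0; q < 5; q++)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        A[n] = tid;
        n++;
    }
}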
There are two corrections you need to make for your program to be properly synchronized. Change:
int n;
// ...
A[n] = tid; n++;
to
std::atomic_int n;
// ...
A[n++] = tid;
Often it's preferable to avoid synchronization issues altogether and split the workload across threads. Since the work done per iteration is the same here, it's as easy as dividing the work evenly:
void ThreadFunc(int tid, int first, int last)
{
    for (int i = first; i < last; i++)
        A[i] = tid;
}
Inside main, modify the thread create loop:
for (int first = 0, i = 0; i < num_threads; i++) {
    // possibly num_threads does not evenly divide std::size(A)
    int last = (i != num_threads-1) ? std::size(A)/num_threads*(i+1) : std::size(A);
    t[i] = thread(ThreadFunc, i, first, last);
    first = last;
}
Of course by doing this, even though the array may be written out of order, the values will be stored to the same locations every time.

Can a multithreaded function use a static array to share data between threads?

Say you have a function that counts the occurrences of a value in a large array. You want to multithread this function by having each thread count the occurrences in its own part of the array, then add the results together. We can assume that each thread has a unique rank (from 0 to N-1), and that each thread will call the function at about the same time. E.g.:
int count[N]; // global array

int countOccurences()
{
    count[rank] = /* count occurences in one part of the array */
    // wait until all other threads reached this point
    int total = 0;
    for (int i=0; i<N; i++)
        total += count[i];
    return total;
}
The question is, in the above example, could I move the count array into the body of countOccurences() as a static variable? I don't need it to exist anywhere outside the function body: will a static array be shared among threads?
The answer to your question is yes, threads have access to global data and static data in the same compilation unit, but there is more to this topic about "sharing data between threads" that is important to understand.
For each thread, there is a corresponding function (the "thread function") that a thread will execute in parallel with other threads. A thread has access to whatever that function can access, either through pointer or reference parameters, global data, static data in the same compilation unit as the thread function, or global and/or static data that is readable/modifiable via other functions that the thread function can call. You should be able to determine every memory region that a given function can read or modify. Those memory regions of a given thread function are exactly the memory regions that the thread has access to.
It is easy to see that global data and static data in the same compilation unit are accessible to a thread function, so that's why the answer to your question is "yes".
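To make that concrete, here is a small sketch (not from the question; it assumes N worker threads with ranks 0..N-1) showing that a function-local static array is a single object shared by every thread:
#include <cstdio>
#include <thread>
#include <vector>

constexpr int N = 4;

void record(int rank)
{
    static int shared[N];   // one array, shared by all threads calling record()
    shared[rank] = rank * rank;

    // Each thread writes only its own slot, so the writes themselves don't race;
    // reading the whole array would still require waiting for all threads
    // (e.g. joining them first, or using a barrier as in the question).
    std::printf("thread %d wrote %d\n", rank, shared[rank]);
}

int main()
{
    std::vector<std::thread> threads;
    for (int r = 0; r < N; ++r)
        threads.emplace_back(record, r);
    for (auto& t : threads)
        t.join();
}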
One thing that you might want to look into is OpenMP, which has built-in constructs for parallelizing a countOccurrences operation (a so-called "reduction"):
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

size_t countOccurrences(const int *arr, size_t n, int which) {
    size_t count, i;
    count = 0;
    #pragma omp parallel for reduction(+:count)
    for (i = 0; i < n; ++i) {
        if (arr[i] == which)
            ++count;
    }
    return count;
}

int main()
{
    int arr[] = { 3, 5, -1, -1, 0 };
    size_t count = countOccurrences(arr, sizeof (arr)/sizeof (arr[0]), -1);
    printf("count = %d\n", (int) count);
    return EXIT_SUCCESS;
}
Yes, you can, because your threads finish their work before you return from your countOccurences. Just pass a reference to the i-th element to the i-th thread function so it can access it.
Something like this:
#include <boost/thread.hpp>
#include <numeric>
#include <boost/bind.hpp>

void thread_function(int& result)
{
    // your implementation
}

size_t const n = 100;

int countOccurences()
{
    int count[n];
    boost::thread_group group;
    for (int i = 0; i != n; ++i)
        group.create_thread(boost::bind(thread_function, boost::ref(count[i])));
    group.join_all();
    return std::accumulate(count, count + n, 0);
}