Collecting data from multiple threads into an array (C++)

I'm using a for loop to create a given number of threads. Each thread approximates part of my integral, and I want each one to hand its result back through an array so that I can sum the results later (if I understand correctly, I can't just do sum += in each thread because the threads would collide). Everything worked until I tried to collect the data from each thread; now I get this error:
calka.cpp:49:33: error: request for member 'get_future' in 'X', which is of non-class type 'std::promise<float>[(N + -1)]'
#include <iostream> //cout
#include <thread> //thread
#include <future> //future , promise
#include <stdlib.h> //atof
#include <string> //string
#include <sstream> //stringstream
using namespace std;
// function 4x^3 + (x^2)/3 - x + 3
// integral x^4 + (x^3)/9 - (x^2)/2 + 3x
void thd(float begin, float width, promise<float> & giveback)
{
    float x = begin + 1/2 * width;
    float height = x*x*x*x + (x*x*x)/9 - (x*x)/2 + 3*x ;
    float outcome = height * width;
    stringstream ss;
    ss << this_thread::get_id();
    string output = "thread #id: " + ss.str() + " outcome" + to_string(outcome);
    cout << output << endl;
}
int main(int argc, char* argv[])
{
    int sum = 0;
    float begin = atof(argv[1]);
    float size = atof(argv[2]);
    int N = atoi(argv[3]);
    float end = begin + N*size;
    promise<float> X[N-1];
    thread t[N];
    for(int i=0; i<N; i++){
        t[i] = thread(&thd, begin, size, ref(X[i]));
        begin += size;
    }
    future<float> wynik_ftr = X.get_future();
    float wyniki[N-1];
    for(int i=0; i<N; i++){
        wyniki[i] = wynik_ftr.get();
    }
    //place for loop adding outcome from threads to sum
    cout << N;
    return 0;
}

The compiler error arises because X is an array of promises, so get_future() has to be called on an element (X[i]), not on the array itself.
Don't use a VLA such as promise<float> X[N-1]. VLAs are an extension offered by some compilers, so your code is not portable. Use std::vector instead.
It seems you want to split the calculation of the integral across N threads. You can create N-1 background threads and execute one invocation of thd from the main thread. Because main gathers the results serially, inside a for loop, you don't need wyniki to be an array that stores one result per thread.
A single float wyniki variable is sufficient.
The steps are:
prepare N promises
start N-1 threads
call thd from main
join and add the results from the N-1 threads in a for loop
add the main thread's result
std::vector<promise<float>> partialResults(N);
std::vector<thread> t(N-1);
for (int i = 0; i<N-1; i++) {
    t[i] = thread(&thd, begin, size, ref(partialResults[i]));
    begin += size;
}
thd(begin, size, partialResults[N-1]); // the last part is computed by the main thread
float wyniki = 0.0f;
for (int i = 0; i<N-1; i++) {
    std::future<float> res = partialResults[i].get_future();
    wyniki += res.get(); // requires thd to call giveback.set_value(outcome)
    t[i].join();
}
std::future<float> res = partialResults[N-1].get_future(); // get res from main
wyniki += res.get();
cout << wyniki << endl;
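The steps above can be put together as a minimal sketch. It assumes the integrand from the question's comment, uses double instead of float, and gives every slice its own thread for brevity; the names integrand, slice_area and integrate are hypothetical and do not come from the original post.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <thread>
#include <vector>

// Integrand from the question's comment: 4x^3 + (x^2)/3 - x + 3.
double integrand(double x) {
    return 4.0 * x * x * x + (x * x) / 3.0 - x + 3.0;
}

// Midpoint-rule area of one slice, handed back through the promise.
void slice_area(double begin, double width, std::promise<double>& result) {
    const double mid = begin + 0.5 * width;  // 0.5, because 1/2 is integer division and yields 0
    result.set_value(integrand(mid) * width);
}

// Hypothetical helper: integrates over [begin, begin + n * width), one thread per slice.
double integrate(double begin, double width, int n) {
    assert(n > 0);
    std::vector<std::promise<double>> promises(n);  // std::vector instead of a VLA
    std::vector<std::thread> workers;
    workers.reserve(n);
    for (int i = 0; i < n; ++i) {
        workers.emplace_back(slice_area, begin + i * width, width,
                             std::ref(promises[i]));
    }
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += promises[i].get_future().get();  // blocks until slice i has set its value
        workers[i].join();
    }
    return sum;
}
```

Calling integrate(0.0, 0.125, 8) approximates the integral over [0, 1]. Only the main thread touches sum, so no locking is needed.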


C++ threads: why does only the last thread do any work?

Why does only the last thread execute every time? I'm trying to divide a grid among N workers, but half of the grid is never touched and the other half is always processed by the single last-created thread. Should I use an array instead of a vector? Locks do not help to resolve this problem either.
#include <iostream>
#include <unistd.h>
#include <vector>
#include <stdio.h>
#include <cstring>
#include <future>
#include <thread>
#include <pthread.h>
#include <mutex>
using namespace std;
std::mutex m;
int main(int argc, char * argv[]) {
    int iterations = atoi(argv[1]), workers = atoi(argv[2]), x = atoi(argv[3]), y = atoi(argv[4]);
    vector<vector<int> > grid( x , vector<int> (y, 0));
    std::vector<thread> threads(workers);
    int start, end, lastworker, nwork;
    int chunkSize = y/workers;
    for(int t = 0; t < workers; t++){
        start = t * chunkSize;
        end = start + chunkSize;
        nwork = t;
        lastworker = workers - 1;
        if(lastworker == t){
            end = y; nwork = workers - 1;
        }
        threads[nwork] = thread([&start, &end, &x, &grid, &t, &nwork, &threads] {
            cout << " ENTER TO THREAD -> " << threads[nwork].get_id() << endl;
            for (int i = start; i < end; ++i)
                for (int j = 0; j < x; ++j)
                    grid[i][j] = t;
            cout << threads[nwork].get_id() << endl;
        });
    }
    for(auto& th : threads){
        th.join();
    }
    for (int i = 0; i < y; ++i) {
        for (int j = 0; j < x; ++j)
            cout << grid[i][j];
        cout << endl;
    }
}
[&start, &end, &x, &grid, &t, &nwork, &threads]
This line is the root of the problem. You are capturing all the variables by reference, which is not what you want.
As a consequence, every thread uses the same variables, which is also not what you want.
You should capture only grid and threads by reference; the other variables should be captured by value ('copied' into the lambda):
[start, end, x, &grid, t, nwork, &threads]
Also, you are indexing grid the wrong way round everywhere: change grid[i][j] to grid[j][i].
thread([&start, &end, &x, &grid, &t, &nwork, &threads] {
The lambda closure that gets executed by every thread captures a reference to nwork.
This means that each thread sees whatever value nwork holds at the moment the thread reads it, not the value it had when the thread was created.
The outer loop will probably finish creating all the thread objects before the threads actually start running the lambda, so every closure sees the same value of nwork: the id of the last thread.
You need to capture nwork by value instead of by reference.
You're passing all the thread parameters as references to the thread lambda. When the loop continues in the main thread, those variables change, which changes their values inside the previously created threads as well and messes them all up.
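As a minimal sketch of the by-value capture the answers recommend: the helper name fill_grid and its parameters are hypothetical, and the grid is indexed as grid[row][col] to match the corrected access order.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: splits the columns of a rows-by-cols grid among
// `workers` threads; every cell ends up holding the index of its writer.
std::vector<std::vector<int>> fill_grid(int rows, int cols, int workers) {
    assert(workers > 0);
    std::vector<std::vector<int>> grid(rows, std::vector<int>(cols, -1));
    std::vector<std::thread> threads;
    threads.reserve(workers);
    const int chunk = cols / workers;
    for (int t = 0; t < workers; ++t) {
        const int start = t * chunk;
        // The last worker also takes the columns left over by the division.
        const int end = (t == workers - 1) ? cols : start + chunk;
        // start, end, rows and t are copied into the closure, so later loop
        // iterations cannot change them; only grid is shared by reference.
        threads.emplace_back([start, end, rows, t, &grid] {
            for (int col = start; col < end; ++col)
                for (int row = 0; row < rows; ++row)
                    grid[row][col] = t;
        });
    }
    for (auto& th : threads) th.join();
    return grid;
}
```

Each thread writes to a disjoint set of cells, so no mutex is needed here.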

C++: How to write to multiple files with a different name on every loop iteration?

I have a program in which I create an output file:
std::ofstream OutA("A.dat");
A loop in this program generates the data written to A:
for ( k = 1; k < n_iterations ; k++ ){
    OutA << Data_for_A << std::endl;
}
Now I want to add another loop.
The evolution of the values that I put in A depends on a variable, T.
So I'll make several tables for different values of T. There will be a loop like this:
for ( T = 0; T < x; T = T + 0.5){
    for ( k = 1; k < n_iterations ; k++ ){
        OutA << Data_for_A << std::endl;
    }
}
It would be convenient if, as the loop changes the value of T, the program wrote to different files with different names, according to the variable T.
On the first pass through the loop the data would go to "OutA1.dat", on the second to "OutA2.dat", and so on with the indexes 1, 2, ...
Alternatively, the indexes could be the values of T rather than 1, 2, ...: "OutA_T0.dat", then "OutA_T0.5.dat", and so on for T = 0, 0.5, 1, 1.5, ...
What would be the best way to do this?
Use the std::to_string function.
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
string Data_for_A() {
    return "hello";
}
int main() {
    double x = 3;
    int n_iterations = 5;
    for (double T = 0; T < x; T = T + 0.5) {
        string name = "OutA_T" + std::to_string(T) + ".dat";
        std::ofstream OutA(name.c_str());
        for (int k = 1; k < n_iterations ; k++ ) {
            OutA << Data_for_A() << std::endl;
        }
    }
    return 0;
}
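One caveat: before C++26, std::to_string formats a double with six decimals, so the code above produces names such as "OutA_T0.500000.dat". If the shorter names from the question are wanted, a string stream is one possible alternative. The sketch below is only an illustration; file_name_for and write_tables are hypothetical names, and the written rows are placeholder data.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: builds names such as "OutA_T0.5.dat".
std::string file_name_for(double T) {
    std::ostringstream name;
    name << "OutA_T" << T << ".dat";  // default stream formatting drops trailing zeros
    return name.str();
}

// Writes one file per value of T in [0, t_max) with step 0.5 and returns
// how many files were opened. The row contents are placeholders.
int write_tables(double t_max, int n_iterations) {
    int files = 0;
    // Deriving T from an integer counter avoids accumulating rounding error.
    for (int step = 0; step * 0.5 < t_max; ++step) {
        const double T = step * 0.5;
        std::ofstream out(file_name_for(T));  // a new file for every T
        for (int k = 1; k < n_iterations; ++k) {
            out << "row " << k << " for T = " << T << '\n';
        }
        ++files;
    }
    return files;
}
```

Because the std::ofstream is declared inside the outer loop, each file is closed automatically at the end of its iteration.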

Using structs / genetic algorithm

As practice for myself I'm trying to create a genetic algorithm that will solve equations. So far my program can generate random "genes", fill up individuals with these "genes", and do some basic calculations with the genes (at the moment, simply summing the "genes").
However, now that I want to implement my fitness function, I've realised that I would have been better off creating a struct for each individual, since I need to keep the genes and the fitness outcome together so that the fittest genes can reproduce again.
Anyway, here's my code:
// GA.cpp : Defines the entry point for the console application.
#include "stdafx.h"
#include <iostream>
#include <vector>
#include <random>
#include <string>
const int population_size = 10;
const int number_of_variables = 7;
struct one_individual
double evaluation = 0;
double fit = 0;
int main()
// Generate random number
std::random_device rd;
std::mt19937 rng(rd()); // random-number engine (Mersenne-Twister in this case)
std::uniform_real_distribution<double> dist(-10.0, 10.0);
// Create vector that holds vectors called individual and fill size it to the amount of individuals I want to have.
for (int i = 0; i < population_size; i++)
for (int j = 0; j < number_of_variables; j++)
// Display entire population
for (auto &count : individual)
for (auto &count2 : count)
std::cout << count2 << " ";
std::cout << "\n";
// Do calculation with population. At the moment I just add up all the genes (sum) and display the sum for each individual.
for (int i = 0; i < population_size; i++)
int j = 0;
std::cout << "Organism "<< i;
double sum = individual[i].at(j) + individual[i].at(j + 1) + individual[i].at(j + 2) + individual[i].at(j + 3) + individual[i].at(j + 4) + individual[i].at(j + 5) + individual[i].at(j + 6);
std::cout << " is " << sum << "\n";
std::cout << "\n";
return 0;
What I think I should be doing is something like this:
for (int i = 0; i < population_size; i++)
one_individual individual;
for (int j = 0; j < number_of_variables; j++)
The above code is not working. When I try to compile it I get a long list of errors, which I pasted into pastebin. If I remove everything except the parts needed for creating the individuals, the remaining errors are all on line 41, which is the final line: one_individual.individual.push_back(variables);
Edited for clarity, apologies that it was unclear.
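Consider the instruction
one_individual.individual.push_back(variables);
where one_individual is a type (struct one_individual).
I suppose you should use the variable you defined of type one_individual instead, so
individual.individual.push_back(variables);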
Threads failing to affect performance

Below is a small program meant to parallelize the approximation of the 1/(n^2) series. Note the global parameter NUM_THREADS.
My issue is that increasing the number of threads from 1 to 4 (the number of processors my computer has is 4) does not significantly affect the outcomes of timing experiments. Do you see a logical flaw in the ThreadFunction? Is there false sharing or misplaced blocking that ends up serializing the execution?
#include <iostream>
#include <thread>
#include <vector>
#include <mutex>
#include <string>
#include <future>
#include <chrono>
std::mutex sum_mutex; // This mutex is for the sum vector
std::vector<double> sum_vec; // This is the sum vector
int NUM_THREADS = 1;
int UPPER_BD = 1000000;
/* Thread function */
void ThreadFunction(std::vector<double> &l, int beg, int end, int thread_num)
double sum = 0;
for(int i = beg; i < end; i++) sum += (1 / ( l[i] * l[i]) );
std::unique_lock<std::mutex> lock1 (sum_mutex, std::defer_lock);
void ListFill(std::vector<double> &l, int z)
for(int i = 0; i < z; ++i) l.push_back(i);
int main()
std::vector<double> l;
std::vector<std::thread> thread_vec;
ListFill(l, UPPER_BD);
int len = l.size();
int lower_bd = 1;
int increment = (UPPER_BD - lower_bd) / NUM_THREADS;
for (int j = 0; j < NUM_THREADS; ++j)
thread_vec.push_back(std::thread(ThreadFunction, std::ref(l), lower_bd, lower_bd + increment, j));
lower_bd += increment;
for (auto &t : thread_vec) t.join();
double big_sum;
for (double z : sum_vec) big_sum += z;
std::cout << big_sum << std::endl;
return 0;
From looking at your code, I suspect that ListFill is taking longer than ThreadFunction. Why pass a list of values to the thread instead of the bounds each thread should loop over? Something like:
void ThreadFunction( int beg, int end ) {
    double sum = 0.0;
    for(double i = beg; i < end; i++)
        sum += (1.0 / ( i * i) );
    std::unique_lock<std::mutex> lock1 (sum_mutex); // locks on construction
    sum_vec.push_back(sum);
}
To maximize parallelism, you need to push as much work as possible onto the threads. See Amdahl's Law.
In addition to dohashi's nice improvement, you can remove the need for the mutex by sizing sum_vec in advance in the main thread (one element per thread), then writing directly to it in ThreadFunction:
sum_vec[thread_num] = sum;
Since each thread writes to a distinct element and doesn't modify the vector itself, there is no need to lock anything.
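Both suggestions can be combined in a minimal sketch: each thread receives only its bounds and writes its partial sum into its own slot of a pre-sized vector. The function name basel_partial_sum and its parameters are hypothetical; the sum runs over i in [1, upper).

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: sum of 1/i^2 for i in [1, upper), split across threads.
double basel_partial_sum(int upper, int num_threads) {
    assert(num_threads > 0 && upper > 1);
    std::vector<double> sums(num_threads, 0.0);  // pre-sized: one slot per thread
    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    const int increment = (upper - 1) / num_threads;
    for (int t = 0; t < num_threads; ++t) {
        const int beg = 1 + t * increment;
        // The last thread also covers the remainder of the division.
        const int end = (t == num_threads - 1) ? upper : beg + increment;
        threads.emplace_back([beg, end, t, &sums] {
            double sum = 0.0;                    // local accumulator, no sharing
            for (double i = beg; i < end; ++i) sum += 1.0 / (i * i);
            sums[t] = sum;                       // distinct element per thread, no lock
        });
    }
    for (auto& th : threads) th.join();
    double total = 0.0;                          // initialised before accumulating
    for (double s : sums) total += s;
    return total;
}
```

For a large upper bound the result approaches pi^2/6, roughly 1.644934. No input list is built, so there is no serial fill phase left to dominate the timing.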

Multi-threaded Simulated Annealing

I wrote a multithreaded simulated annealing program, but it's not running. I'm not sure whether the code is correct. It compiles, but when I run it, it crashes with a run-time error.
#include <stdio.h>
#include <time.h>
#include <iostream>
#include <stdlib.h>
#include <math.h>
#include <string>
#include <vector>
#include <algorithm>
#include <fstream>
#include <ctime>
#include <windows.h>
#include <process.h>
using namespace std;
typedef vector<double> Layer; //defines a vector type
typedef struct {
Layer Solution1;
double temp1;
double coolingrate1;
int MCL1;
int prob1;
//void SA(Layer Solution, double temp, double coolingrate, int MCL, int prob){
double Rand_NormalDistri(double mean, double stddev) {
//Random Number from Normal Distribution
static double n2 = 0.0;
static int n2_cached = 0;
if (!n2_cached) {
// choose a point x,y in the unit circle uniformly at random
double x, y, r;
do {
// scale two random integers to doubles between -1 and 1
x = 2.0*rand()/RAND_MAX - 1;
y = 2.0*rand()/RAND_MAX - 1;
r = x*x + y*y;
} while (r == 0.0 || r > 1.0);
// Apply Box-Muller transform on x, y
double d = sqrt(-2.0*log(r)/r);
double n1 = x*d;
n2 = y*d;
// scale and translate to get desired mean and standard deviation
double result = n1*stddev + mean;
n2_cached = 1;
return result;
} else {
n2_cached = 0;
return n2*stddev + mean;
double FitnessFunc(Layer x, int ProbNum)
int i,j,k;
double z;
double fit = 0;
double sumSCH;
// Ellipsoidal function
for(j=0;j< x.size();j++)
else if(ProbNum==2){
// Schwefel's function
for(j=0; j< x.size(); j++)
for(i=0; i<j; i++)
sumSCH += x[i];
fit += sumSCH * sumSCH;
else if(ProbNum==3){
// Rosenbrock's function
for(j=0; j< x.size()-1; j++)
fit += 100.0*(x[j]*x[j] - x[j+1])*(x[j]*x[j] - x[j+1]) + (x[j]-1.0)*(x[j]-1.0);
return fit;
double probl(double energychange, double temp){
double a;
a= (-energychange)/temp;
return double(min(1.0,exp(a)));
int random (int min, int max){
int n = max - min + 1;
int remainder = RAND_MAX % n;
int x;
x = rand();
}while (x >= RAND_MAX - remainder);
return min + x % n;
//void SA(Layer Solution, double temp, double coolingrate, int MCL, int prob){
void SA(void *param){
t *args = (t*) param;
Layer Solution = args->Solution1;
double temp = args->temp1;
double coolingrate = args->coolingrate1;
int MCL = args->MCL1;
int prob = args->prob1;
double Energy;
double EnergyNew;
double EnergyChange;
Layer SolutionNew(50);
Energy = FitnessFunc(Solution, prob);
while (temp > 0.01){
for ( int i = 0; i < MCL; i++){
for (int j = 0 ; j < SolutionNew.size(); j++){
SolutionNew[j] = Rand_NormalDistri(5, 1);
EnergyNew = FitnessFunc(SolutionNew, prob);
EnergyChange = EnergyNew - Energy;
if(EnergyChange <= 0){
Solution = SolutionNew;
Energy = EnergyNew;
if(probl(EnergyChange ,temp ) > random(0,1)){
Solution = SolutionNew;
Energy = EnergyNew;
cout << temp << "=" << Energy << endl;
temp = temp * coolingrate;
int main ()
srand ( time(NULL) ); //seed for getting different numbers each time the prog is run
Layer SearchSpace(50); //declare a vector of 20 dimensions
//for(int a = 0;a < 10; a++){
for (int i = 0 ; i < SearchSpace.size(); i++){
SearchSpace[i] = Rand_NormalDistri(5, 1);
t *arg1;
arg1 = (t *)malloc(sizeof(t));
arg1->Solution1 = SearchSpace;
arg1->temp1 = 1000;
arg1->coolingrate1 = 0.01;
arg1->MCL1 = 100;
arg1->prob1 = 3;
//cout << "Test " << ""<<endl;
_beginthread( SA, 0, (void*) arg1);
Sleep( 100 );
//SA(SearchSpace, 1000, 0.01, 100, 3);
return 0;
Please help.
As leftaroundabout pointed out, you're using malloc in C++ code. This is the source of your crash.
malloc allocates a block of memory, but since it was designed for C, it doesn't call any C++ constructors. In this case, the vector<double> member is never properly constructed. When
arg1->Solution1 = SearchSpace;
is executed, the member variable Solution1 is in an undefined state and the assignment operator crashes.
Instead of malloc, try
arg1 = new t;
This accomplishes roughly the same thing, but new also calls any necessary constructors, so the vector<double> is properly initialized.
This brings up another minor issue: the memory you've allocated with new also needs to be deleted somewhere. Since arg1 is passed to another thread, it should probably be cleaned up with
delete args;
in your SA function once it's done with the args variable.
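The ownership hand-off can be sketched as follows. To keep the example portable it uses std::thread in place of _beginthread, and the names Args, worker and run_worker are hypothetical stand-ins for the question's struct t and SA function.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the question's struct t.
struct Args {
    std::vector<double> solution;
    double* result = nullptr;  // where the worker stores its output
};

// Thread entry point with the void* parameter style that _beginthread uses.
void worker(void* param) {
    Args* args = static_cast<Args*>(param);
    assert(args != nullptr && args->result != nullptr);
    double sum = 0.0;
    for (double v : args->solution) sum += v;  // placeholder for the real work
    *args->result = sum;
    delete args;  // the worker owns args and releases it when done
}

// Allocates the arguments with new, hands them to a thread and waits for it.
double run_worker(const std::vector<double>& start) {
    double result = 0.0;
    Args* args = new Args;    // new runs the vector's constructor, malloc would not
    args->solution = start;   // safe: solution is a properly constructed vector
    args->result = &result;
    std::thread th(worker, static_cast<void*>(args));
    th.join();                // after join, args has already been deleted by worker
    return result;
}
```

The main thread must not touch args after starting the thread, because the worker may delete it at any time.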
While I don't know the actual cause for your crashes I'm not really surprised that you end up in trouble. For instance, those "cached" static variables in Rand_NormalDistri are obviously vulnerable to data races. Why don't you use std::normal_distribution? It's almost always a good idea to use standard library routines when they're available, and even more so when you need to consider multithreading trickiness.
Even worse, you're heavily mixing C and C++. malloc is something you should virtually never use in C++ code – it doesn't know about RAII, which is one of the few intrinsically safe things you can cling onto in C++.
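As a minimal sketch of the std::normal_distribution suggestion, the function below gives every thread its own engine, so there is no shared static state to race on. The names rand_normal and sample_mean are hypothetical, and seeding from std::random_device is an assumption.

```cpp
#include <cassert>
#include <random>

// Hypothetical replacement for Rand_NormalDistri: thread_local gives each
// thread its own engine, so concurrent calls do not share any state.
double rand_normal(double mean, double stddev) {
    thread_local std::mt19937 engine{std::random_device{}()};
    std::normal_distribution<double> dist(mean, stddev);
    return dist(engine);
}

// Small helper used to sanity-check the generator: averages n draws.
double sample_mean(int n, double mean, double stddev) {
    assert(n > 0);
    double total = 0.0;
    for (int i = 0; i < n; ++i) total += rand_normal(mean, stddev);
    return total / n;
}
```

A call such as rand_normal(5, 1) would replace Rand_NormalDistri(5, 1) inside the SA loop without any locking.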