I am using the following code to create 10 threads. I expect to receive a different random number from each thread and print it, but the results are all the same.
#include "pch.h"
#include <iostream>
#include "C.h"
#include "BB.h"
#include <vector>
#include <thread>
#include <mutex>
#include <future>
void initiazer(std::promise<int> * promObj, int i)
{
std::cout << "Inside Thread " <<i<< std::endl;
(promObj)->set_value((rand() % 100) + 1);
}
int main()
{
srand((unsigned)time(0));
std::promise<int> promiseObj[10];
std::future<int> futureObj [10];
std::thread th[10];
for (size_t i = 0; i < 10; i++)
{
futureObj[i] = promiseObj[i].get_future();
}
for (size_t i = 0; i < 10; i++)
{
th[i] = std::thread(initiazer,&promiseObj[i],i) ;
std::cout << futureObj[i].get() << std::endl;
}
for (size_t i = 0; i < 10; i++)
{
th[i].join();
}
return 0;
}
rand() is not thread-safe; see https://linux.die.net/man/3/rand. Use the more modern facilities defined in <random> instead, e.g.
std::random_device rd;
auto seed = rd ();
std::mt19937 mt (seed);
....
auto random_number = mt ();
Edit:
As others have pointed out, mt19937::operator () is not guaranteed to be thread-safe either. It is better, as suggested by n.m., to create one of these objects per thread, as the updated live demo now shows.
Live demo
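To make the one-engine-per-thread idea concrete, here is a minimal sketch. The helper name draw_one_per_thread is hypothetical, and it swaps the promise/future pair from the question for a plain results vector in which each thread writes only its own slot.

```cpp
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

// Hypothetical helper: runs `thread_count` threads and returns one number per thread.
std::vector<int> draw_one_per_thread(std::size_t thread_count)
{
    std::vector<int> results(thread_count);    // slot i is written only by thread i
    std::vector<std::thread> threads;
    threads.reserve(thread_count);
    for (std::size_t i = 0; i < thread_count; ++i)
    {
        threads.emplace_back([&results, i] {
            std::random_device rd;              // seed source local to this thread
            std::mt19937 gen(rd());             // engine local to this thread
            std::uniform_int_distribution<int> dist(1, 100);
            results[i] = dist(gen);
        });
    }
    for (auto& t : threads)
    {
        t.join();                               // results are safe to read after join
    }
    return results;
}
```

Calling draw_one_per_thread(10) gives ten values in [1, 100]. Because no engine or distribution is shared, no locking is needed.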
I am trying to write a multi-threaded program to produce a vector of N*NumPerThread uniform random integers, where N is the return value of std::thread::hardware_concurrency() and NumPerThread is the amount of random numbers I want each thread to generate.
I created a multi-threaded version:
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>
#include <random>
#include <chrono>
using Clock = std::chrono::high_resolution_clock;
namespace Vars
{
const unsigned int N = std::thread::hardware_concurrency(); //number of threads on device
const unsigned int NumPerThread = 5e5; //number of random numbers to generate per thread
std::vector<int> RandNums(NumPerThread*N);
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(1, 1000);
int sz = 0;
}
using namespace Vars;
void AddN(int start)
{
static std::mutex mtx;
std::lock_guard<std::mutex> lock(mtx);
for (unsigned int i=start; i<start+NumPerThread; i++)
{
RandNums[i] = dis(gen);
++sz;
}
}
int main()
{
auto start_time = Clock::now();
std::vector<std::thread> threads;
threads.reserve(N);
for (unsigned int i=0; i<N; i++)
{
threads.emplace_back(std::move(std::thread(AddN, i*NumPerThread)));
}
for (auto &i: threads)
{
i.join();
}
auto end_time = Clock::now();
std::cout << "\nTime difference = "
<< std::chrono::duration<double, std::nano>(end_time - start_time).count() << " nanoseconds\n";
std::cout << "size = " << sz << '\n';
}
and a single-threaded version:
#include <iostream>
#include <thread>
#include <vector>
#include <random>
#include <chrono>
using Clock = std::chrono::high_resolution_clock;
namespace Vars
{
const unsigned int N = std::thread::hardware_concurrency(); //number of threads on device
const unsigned int NumPerThread = 5e5; //number of random numbers to generate per thread
std::vector<int> RandNums(NumPerThread*N);
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(1, 1000);
int sz = 0;
}
using namespace Vars;
void AddN()
{
for (unsigned int i=0; i<NumPerThread*N; i++)
{
RandNums[i] = dis(gen);
++sz;
}
}
int main()
{
auto start_time = Clock::now();
AddN();
auto end_time = Clock::now();
std::cout << "\nTime difference = "
<< std::chrono::duration<double, std::nano>(end_time - start_time).count() << " nanoseconds\n";
std::cout << "size = " << sz << '\n';
}
The execution times are more or less the same. I am assuming there is a problem with the multi-threaded version?
P.S. I looked at all of the other similar questions here, I don't see how they directly apply to this task...
Threading is not a magical salve you can rub onto any code that makes it go faster. Like any tool, you have to use it correctly.
In particular, if you want performance out of threading, among the most important questions you need to ask is what data needs to be shared across threads. Your algorithm decided that the data which needs to be shared is the entire std::vector<int> result object. And since different threads cannot manipulate the object at the same time, each thread has to wait its turn to do the manipulation.
Your code is the equivalent of expecting 10 chefs to cook 10 meals in the same time as 1 chef, but you only provide them a single stove.
Threading works out best when nobody has to wait on anybody else to get any work done. Arrange your algorithms accordingly. For example, each thread could build its own array and return it, with the receiving code concatenating all of the arrays together.
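A minimal sketch of that arrangement, assuming std::async is an acceptable way to launch the work. The names make_chunk and generate_parallel are hypothetical and do not come from the question.

```cpp
#include <cstddef>
#include <future>
#include <random>
#include <vector>

// Builds one private chunk; nothing in here is shared with other tasks.
std::vector<int> make_chunk(std::size_t count)
{
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<int> dis(1, 1000);
    std::vector<int> chunk(count);
    for (int& value : chunk)
    {
        value = dis(gen);
    }
    return chunk;
}

// Hypothetical driver: launches `tasks` chunks in parallel and concatenates them.
std::vector<int> generate_parallel(std::size_t tasks, std::size_t per_task)
{
    std::vector<std::future<std::vector<int>>> pending;
    pending.reserve(tasks);
    for (std::size_t i = 0; i < tasks; ++i)
    {
        // std::launch::async forces each chunk onto its own thread
        pending.push_back(std::async(std::launch::async, make_chunk, per_task));
    }

    std::vector<int> all;
    all.reserve(tasks * per_task);
    for (auto& f : pending)
    {
        std::vector<int> chunk = f.get();       // blocks until that task is done
        all.insert(all.end(), chunk.begin(), chunk.end());
    }
    return all;
}
```

The only sequential part is the final concatenation; the generation itself needs no mutex because every task owns its engine and its output vector.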
You can do this without any mutex.
Create your vector.
Use a mutex just (and technically this probably isn't necessary) to create an iterator pointing at v.begin() + itsThreadIndex*NumPerThread;
then each thread can freely increment that iterator and write to a part of the vector not touched by other threads.
Be sure each thread has its own copy of:
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(1, 1000);
That should run much faster.
UNTESTED code - but this should make my above suggestion more clear:
#include <chrono>
#include <iostream>
#include <mutex>
#include <random>
#include <thread>
#include <vector>
using Clock = std::chrono::high_resolution_clock;
namespace SharedVars
{
const unsigned int N = std::thread::hardware_concurrency(); //number of threads on device
const unsigned int NumPerThread = 5e5; //number of random numbers to generate per thread
std::vector<int> RandNums(NumPerThread*N);
std::mutex mtx;
}
void PerThread_AddN(int threadNumber)
{
using namespace SharedVars;
// each thread owns its generator and distribution, so none of these are shared
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(1, 1000);
std::vector<int>::iterator from;
std::vector<int>::iterator to;
{
std::lock_guard<std::mutex> lock(mtx); // hold the lock only while accessing shared vector, not while accessing its contents
from = RandNums.begin () + threadNumber*NumPerThread;
to = from + NumPerThread;
}
for (auto i = from; i < to; ++i)
{
*i = dis(gen);
}
}
int main()
{
using namespace SharedVars;
auto start_time = Clock::now();
std::vector<std::thread> threads;
threads.reserve(N);
for (unsigned int i=0; i<N; i++)
{
threads.emplace_back(PerThread_AddN, i);
}
for (auto &i: threads)
{
i.join();
}
auto end_time = Clock::now();
std::cout << "\nTime difference = "
<< std::chrono::duration<double, std::nano>(end_time - start_time).count() << " nanoseconds\n";
std::cout << "size = " << RandNums.size() << '\n'; // the per-thread sz counter was not visible here
}
Nicol Boas was right on the money. I reimplemented it using std::packaged_task, and it's around 4-5 times faster now.
#include <iostream>
#include <vector>
#include <random>
#include <future>
#include <thread>
#include <chrono>
using Clock = std::chrono::high_resolution_clock;
const unsigned int N = std::thread::hardware_concurrency(); //number of threads on device
const unsigned int NumPerThread = 5e5; //number of random numbers to generate per thread
std::vector<int> x(NumPerThread);
std::vector<int> createVec()
{
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(1, 1000);
for (unsigned int i = 0; i < NumPerThread; i++)
{
x[i] = dis(gen);
}
return x;
}
int main()
{
auto start_time = Clock::now();
std::vector<int> RandNums;
RandNums.reserve(N*NumPerThread);
std::vector<std::future<std::vector<int>>> results;
results.reserve(N);
std::vector<int> crap;
crap.reserve(NumPerThread);
for (unsigned int i=0; i<N; i++)
{
std::packaged_task<std::vector<int>()> temp(createVec);
results.push_back(temp.get_future()); // results is only reserved, so results[i] does not exist yet
temp();
crap = std::move(results[i].get());
RandNums.insert(RandNums.begin()+(0*NumPerThread),crap.begin(),crap.end());
}
std::cout << RandNums.size() << '\n';
auto end_time = Clock::now();
std::cout << "Time difference = "
<< std::chrono::duration<double, std::nano>(end_time - start_time).count() << " nanoseconds\n";
}
But is there a way to make this one better? lewis's version is way faster than this, so there must be something else missing...
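One thing that may be missing in the version above: invoking a std::packaged_task directly (temp()) runs it on the calling thread, so the loop is still sequential. A minimal sketch that moves each task onto its own std::thread instead; create_vec and run_tasks_on_threads are hypothetical names, and the global x is replaced by a local vector so the tasks share nothing.

```cpp
#include <cstddef>
#include <future>
#include <random>
#include <thread>
#include <utility>
#include <vector>

// Fills and returns a local vector; no global buffer is shared between tasks.
std::vector<int> create_vec(std::size_t count)
{
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<int> dis(1, 1000);
    std::vector<int> local(count);
    for (int& value : local)
    {
        value = dis(gen);
    }
    return local;
}

// Hypothetical driver: one packaged_task per thread, results collected via futures.
std::vector<int> run_tasks_on_threads(std::size_t tasks, std::size_t per_task)
{
    std::vector<std::future<std::vector<int>>> results;
    std::vector<std::thread> workers;
    results.reserve(tasks);
    workers.reserve(tasks);
    for (std::size_t i = 0; i < tasks; ++i)
    {
        std::packaged_task<std::vector<int>(std::size_t)> task(create_vec);
        results.push_back(task.get_future());
        workers.emplace_back(std::move(task), per_task);   // the task runs on the new thread
    }

    std::vector<int> all;
    all.reserve(tasks * per_task);
    for (auto& r : results)
    {
        std::vector<int> part = r.get();                    // waits for that task only
        all.insert(all.end(), part.begin(), part.end());
    }
    for (auto& w : workers)
    {
        w.join();
    }
    return all;
}
```

Appending at all.end() also avoids shifting the existing elements on every insert, which inserting at begin() would do.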
I've got a problem with srand(). It only works when I use a number as a parameter, for example srand(1234), but when I try to use it with 'n' or with time (as below), then randint() keeps returning the same value.
#include <iostream>
#include <experimental/random>
#include <cstdlib>
#include <ctime>
using namespace std;
int main() {
srand(time(nullptr));
for (int i = 0; i < 4; ++i) {
int random = experimental::randint(0, 9);
cout << random;
}
}
Thanks for your time.
The C function srand is meant to be used in combination with the C function rand. These are separate from the functions in the <experimental/random> header (namespace std::experimental). The randint function from that header is meant to be used with the reseed function from the same header:
#include <experimental/random>
#include <iostream>
int main() {
std::experimental::reseed();
for (int i = 4; i--; ) {
int random = std::experimental::randint(0, 9);
std::cout << random << '\n';
}
}
However, there is no need to use experimental features here. Since C++11, there is std::uniform_int_distribution:
#include <iostream>
#include <random>
int main() {
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> distrib(0, 9); // Default type is 'int'
for (int i = 4; i--; ) {
int random = distrib(gen);
std::cout << random << '\n';
}
}
This method is more flexible than the one from the C standard library and should, generally, be preferred in C++.
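As an illustration of that flexibility, here is a small sketch in which one engine drives several different distributions. The helper names roll_die, unit_real and biased_coin are hypothetical.

```cpp
#include <random>

// Each helper takes the engine by reference so that its state keeps advancing.
int roll_die(std::mt19937& gen)
{
    std::uniform_int_distribution<int> d(1, 6);            // integers in [1, 6]
    return d(gen);
}

double unit_real(std::mt19937& gen)
{
    std::uniform_real_distribution<double> d(0.0, 1.0);    // reals in [0, 1)
    return d(gen);
}

bool biased_coin(std::mt19937& gen, double p)
{
    std::bernoulli_distribution d(p);                      // true with probability p
    return d(gen);
}
```

With rand() each of these would need hand-written scaling; here only the distribution object changes while the engine and its seeding stay the same.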
I really don't like the rand() function. I wanted to use the <random> library, but I don't know how to set up a range, for example from 1 to 3. I want to randomly pick from these numbers (1, 2, 3) and not get huge numbers like 243245. This code shows how I currently use the random library to print random numbers:
#include <iostream>
#include <string>
#include <random>
using namespace std;
int main()
{
minstd_rand simple_rand;
simple_rand.seed(NULL);
for (int ii = 0; ii < 10; ++ii)
{
std::cout << simple_rand() << '\n';
}
}
Use std::uniform_int_distribution:
#include <ctime>
#include <iostream>
#include <random>
int main()
{
std::mt19937 rng(std::time(0)); // `std::minstd_rand` would also work.
std::uniform_int_distribution<> d(1,3); // the <> keeps this valid before C++17
for (int i = 0; i < 10; i++)
{
std::cout << d(rng) << '\n';
}
}
#include <iostream>
#include <random>
int main()
{
std::random_device rd; //Will be used to obtain a seed for the random number engine
std::mt19937 gen(rd()); //Standard mersenne_twister_engine seeded with rd()
std::uniform_int_distribution<> dis(1, 3);
for (int n=0; n<10; ++n)
//Use dis to transform the random unsigned int generated by gen into an int in [1, 3]
std::cout << dis(gen) << ' ';
std::cout << '\n';
}
Thanks to #holyBlackCat. Credit to: cppreference.com
I have an integer variable that contains the number of threads to execute. Let's call it myThreadVar. I want to execute myThreadVar threads, and cannot think of any way to do it without a ton of if statements. Is there any way I can create myThreadVar threads, no matter what myThreadVar is?
I was thinking:
for (int i = 0; i < myThreadVar; ++i) { std::thread t_i(myFunc); }, but that obviously won't work.
Thanks in advance!
Make an array or vector of threads, put the threads in, and then if you want to wait for them to finish have a second loop go over your collection and join them all:
std::vector<std::thread> myThreads;
myThreads.reserve(myThreadVar);
for (int i = 0; i < myThreadVar; ++i)
{
myThreads.push_back(std::thread(myFunc));
}
for (auto& t : myThreads) // second loop: wait for every thread to finish
{
t.join();
}
While other answers use vector::push_back(), I prefer vector::emplace_back(), which constructs the thread in place and is possibly more efficient. Also use vector::reserve().
#include <thread>
#include <vector>
void func() {}
int main() {
int num = 3;
std::vector<std::thread> vec;
vec.reserve(num);
for (auto i = 0; i < num; ++i) {
vec.emplace_back(func);
}
for (auto& t : vec) t.join();
}
Obviously, the best solution is not to wait for the previous thread to finish before starting the next one; you need to run all of them in parallel.
In this case you can use a std::vector to store all of the thread instances and then join each of them afterwards.
Take a look at my example.
#include <thread>
#include <vector>
void myFunc() {
/* Some code */
}
int main()
{
int myThreadVar = 50;
std::vector<std::thread> threadsToJoin;
threadsToJoin.resize(myThreadVar);
for (int i = 0; i < myThreadVar; ++i) {
threadsToJoin[i] = std::thread(myFunc);
}
for (std::size_t i = 0; i < threadsToJoin.size(); i++) {
threadsToJoin[i].join();
}
}
#include <iostream>
#include <thread>
void myFunc(int n) {
std::cout << "myFunc " << n << std::endl;
}
int main(int argc, char *argv[]) {
int myThreadVar = 5;
for (int i = 0; i < myThreadVar; ++i) {
std::cout << "Launching " << i << std::endl;
std::thread t_i(myFunc,i);
t_i.detach(); // detached threads are never joined, so main() may return before they have printed
}
}
g++ -std=c++11 -o 35106568 35106568.cpp
./35106568
Launching 0
myFunc 0
Launching 1
myFunc 1
Launching 2
myFunc 2
Launching 3
myFunc 3
Launching 4
myFunc 4
You need to store the thread so you can send it to join.
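std::vector<std::thread> t(myThreadVar); // needs <vector>; default-constructed threads are empty placeholders
for (int i = 0; i < myThreadVar; ++i) { t[i] = std::thread(myFunc); }//Start all threads
for (int i = 0; i < myThreadVar; ++i) { t[i].join(); }//Wait for all threads to finish
I'm more used to C, so note that a runtime-sized array (std::thread t[myThreadVar]) is only a compiler extension in C++; std::vector is the standard way to size the collection at run time.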
I am trying to get real random values using boost::random libraries. This is my code:
#include <iostream>
#include <boost/random/uniform_real_distribution.hpp>
#include <boost/random/mersenne_twister.hpp>
boost::random::mt19937 eng = boost::random::mt19937();
boost::random::uniform_real_distribution<double> urd =
boost::random::uniform_real_distribution<double>(0,20);
for (int i = 0; i <= 100; i++)
std::cout << urd(eng) << std::endl;
But I get integer numbers between 0 and 20.
What can I do?
I also tried another engine:
#include <iostream>
#include <boost/random/uniform_real_distribution.hpp>
#include <boost/random/lagged_fibonacci.hpp>
boost::random::lagged_fibonacci607 eng = boost::random::lagged_fibonacci607();
boost::random::uniform_real_distribution<double> urd =
boost::random::uniform_real_distribution<double>(0,20);
for (int i = 0; i <= 100; i++)
std::cout << urd(eng) << std::endl;
But nothing... (always integer values)
How about setting the precision before you output? std::cout.precision(15);?
Or use:
std::cout.precision(std::numeric_limits<double>::digits10);
Example
#include <iostream>
#include <limits>
#include <boost/random/uniform_real_distribution.hpp>
#include <boost/random/mersenne_twister.hpp>
int main()
{
boost::random::mt19937 eng = boost::random::mt19937();
boost::random::uniform_real_distribution<double> urd =
boost::random::uniform_real_distribution<double>(0,20);
std::cout.precision(std::numeric_limits<double>::digits10);
for (int i = 0; i <= 100; i++)
{
std::cout << urd(eng) << std::endl;
}
}
The default precision for std::cout is set at 6, so it should work without setting this, but...
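To see what the precision setting does without needing Boost, here is a small sketch that formats a double through std::ostringstream, which uses the same formatting rules as std::cout. The helper names format_default and format_fixed are hypothetical.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats `value` in the default float notation, where precision
// is the maximum number of significant digits.
std::string format_default(double value, int precision)
{
    std::ostringstream out;
    out.precision(precision);
    out << value;
    return out.str();
}

// Formats `value` in fixed notation, where precision is the number
// of digits after the decimal point.
std::string format_fixed(double value, int precision)
{
    std::ostringstream out;
    out << std::fixed << std::setprecision(precision) << value;
    return out.str();
}
```

For example, format_default(12.3456789012345, 6) yields "12.3457", while a precision of 15 yields "12.3456789012345". In the default notation trailing zeros are dropped, so format_default(7.0, 15) is just "7"; std::fixed is what forces "7.000".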