Thread safety of a static random number generator - c++

I have a bunch of threads, each one needs a thread safe random number. Since threads in my real program are spawned and joined repeatedly, I'd rather not create a random_device and mt19937 every time I enter a parallel region that calls the same function, so I made them static:
#include <iostream>
#include <random>
#include <omp.h>

void test(void) {
    static std::random_device rd;
    static std::mt19937 rng(rd());
    static std::uniform_int_distribution<int> uni(1, 1000);
    int x = uni(rng);
    #pragma omp critical
    std::cout << "thread " << omp_get_thread_num() << " | x = " << x << std::endl;
}

int main() {
    #pragma omp parallel num_threads(4)
    test();
}
I cannot place them as threadprivate because of Error C3057: dynamic initialization of 'threadprivate' symbols is not currently supported. Some sources say random_device and mt19937 are thread safe, but I haven't managed to find any docs which would prove it.
Is this randomization thread safe?
If no, which of the static objects can be left as static to preserve thread safety?
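For reference, the construction of the statics is thread-safe in C++11 ("magic statics"), but concurrent calls to uni(rng) afterwards mutate the shared mt19937 state without synchronization, which is a data race. One commonly suggested alternative that the threadprivate pragma cannot express (because of the dynamic initialization) is C++11 thread_local; a minimal sketch, not from the answers below, with a hypothetical thread_safe_draw() helper:

#include <random>

// Sketch: thread_local permits dynamic initialization, unlike the
// threadprivate pragma, and each thread lazily constructs its own
// copies exactly once.
int thread_safe_draw() {
    thread_local std::random_device rd;
    thread_local std::mt19937 rng(rd());
    thread_local std::uniform_int_distribution<int> uni(1, 1000);
    return uni(rng);
}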

Here is a different approach. I keep a global seeding value so that the random_device is only used once: since using it can be very slow, it is prudent to use it as rarely as possible.
Instead, we increment the seeding value per thread and also per use. That way we avoid the birthday paradox and minimize the thread-local state to a single integer.
#include <omp.h>

#include <algorithm>
#include <array>
#include <functional> // for std::ref
#include <random>

using seed_type = std::array<std::mt19937::result_type, std::mt19937::state_size>;

namespace {

seed_type init_seed()
{
    seed_type rtrn;
    std::random_device rdev;
    std::generate(rtrn.begin(), rtrn.end(), std::ref(rdev));
    return rtrn;
}

}

/**
 * Provides a process-global random seeding value
 *
 * Thread-safe (assuming the C++ compiler is standard-conforming).
 * Seed is initialized on first call.
 */
seed_type global_seed()
{
    static seed_type rtrn = init_seed();
    return rtrn;
}

/**
 * Creates a new random number generator
 *
 * Operation is thread-safe. Each thread will get its own RNG with a different
 * seed. Repeated calls within a thread will create different RNGs, too.
 */
std::mt19937 make_rng()
{
    static std::mt19937::result_type sequence_number = 0;
    #pragma omp threadprivate(sequence_number)
    seed_type seed = global_seed();
    static_assert(seed.size() >= 3);
    seed[0] += sequence_number++;
    seed[1] += static_cast<std::mt19937::result_type>(omp_get_thread_num());
    seed[2] += static_cast<std::mt19937::result_type>(omp_get_level());
    std::seed_seq sseq(seed.begin(), seed.end());
    return std::mt19937(sseq);
}
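A usage sketch (mine, not part of the original answer; assumes the definitions above plus <iostream>): each thread calls make_rng() once per parallel region and then draws from its private engine.

#pragma omp parallel
{
    std::mt19937 rng = make_rng();                    // per-thread engine, distinct seed
    std::uniform_int_distribution<int> uni(1, 1000);
    int x = uni(rng);                                 // safe: no shared mutable state
    #pragma omp critical
    std::cout << "thread " << omp_get_thread_num() << " | x = " << x << '\n';
}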
See also this: How to make this code thread safe with openMP? Monte Carlo two-dimensional integration
For the approach of just incrementing the seeding value, see this:
https://www.johndcook.com/blog/2016/01/29/random-number-generator-seed-mistakes/

I still think threadprivate is the right approach, and you can avoid the initialization problem by doing a parallel assignment later.
#include <iostream>
#include <random>
#include <sstream>
#include <omp.h>
using namespace std;

static random_device rd;
static mt19937 rng;
#pragma omp threadprivate(rd)
#pragma omp threadprivate(rng)

int main() {
    #pragma omp parallel
    rng = mt19937(rd());

    #pragma omp parallel
    {
        stringstream res;
        uniform_int_distribution<int> uni(1, 100);
        res << "Thread " << omp_get_thread_num() << ": " << uni(rng) << "\n";
        cout << res.str();
    }
    return 0;
}
Btw, note the stringstream: with OpenMP, output from several threads tends to get interleaved at each << operator, so building the whole line first and printing it in one call keeps lines intact.

Related

C++ multithreaded version of creating vector of random numbers slower than single-threaded version

I am trying to write a multi-threaded program to produce a vector of N*NumPerThread uniform random integers, where N is the return value of std::thread::hardware_concurrency() and NumPerThread is the number of random numbers I want each thread to generate.
I created a multi-threaded version:
#include <iostream>
#include <thread>
#include <vector>
#include <random>
#include <mutex>
#include <chrono>

using Clock = std::chrono::high_resolution_clock;

namespace Vars
{
    const unsigned int N = std::thread::hardware_concurrency(); // number of threads on device
    const unsigned int NumPerThread = 5e5; // number of random numbers to generate per thread
    std::vector<int> RandNums(NumPerThread*N);
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<> dis(1, 1000);
    int sz = 0;
}
using namespace Vars;

void AddN(int start)
{
    static std::mutex mtx;
    std::lock_guard<std::mutex> lock(mtx);
    for (unsigned int i = start; i < start + NumPerThread; i++)
    {
        RandNums[i] = dis(gen);
        ++sz;
    }
}

int main()
{
    auto start_time = Clock::now();
    std::vector<std::thread> threads;
    threads.reserve(N);
    for (unsigned int i = 0; i < N; i++)
    {
        threads.emplace_back(std::move(std::thread(AddN, i*NumPerThread)));
    }
    for (auto &i : threads)
    {
        i.join();
    }
    auto end_time = Clock::now();
    std::cout << "\nTime difference = "
              << std::chrono::duration<double, std::nano>(end_time - start_time).count() << " nanoseconds\n";
    std::cout << "size = " << sz << '\n';
}
and a single-threaded version
#include <iostream>
#include <thread>
#include <vector>
#include <random>
#include <chrono>

using Clock = std::chrono::high_resolution_clock;

namespace Vars
{
    const unsigned int N = std::thread::hardware_concurrency(); // number of threads on device
    const unsigned int NumPerThread = 5e5; // number of random numbers to generate per thread
    std::vector<int> RandNums(NumPerThread*N);
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<> dis(1, 1000);
    int sz = 0;
}
using namespace Vars;

void AddN()
{
    for (unsigned int i = 0; i < NumPerThread*N; i++)
    {
        RandNums[i] = dis(gen);
        ++sz;
    }
}

int main()
{
    auto start_time = Clock::now();
    AddN();
    auto end_time = Clock::now();
    std::cout << "\nTime difference = "
              << std::chrono::duration<double, std::nano>(end_time - start_time).count() << " nanoseconds\n";
    std::cout << "size = " << sz << '\n';
}
The execution times are more or less the same. I am assuming there is a problem with the multi-threaded version?
P.S. I looked at all of the other similar questions here, I don't see how they directly apply to this task...
Threading is not a magical salve you can rub onto any code that makes it go faster. Like any tool, you have to use it correctly.
In particular, if you want performance out of threading, among the most important questions you need to ask is what data needs to be shared across threads. Your algorithm decided that the data which needs to be shared is the entire std::vector<int> result object. And since different threads cannot manipulate the object at the same time, each thread has to wait its turn to do the manipulation.
Your code is the equivalent of expecting 10 chefs to cook 10 meals in the same time as 1 chef, but you only provide them a single stove.
Threading works out best when nobody has to wait on anybody else to get any work done. Arrange your algorithms accordingly. For example, each thread could build its own array and return them, with the receiving code concatenating all of the arrays together.
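A minimal sketch of that arrangement using std::async (my illustration, not the answerer's code; the names make_chunk and make_all are hypothetical): each task fills and returns its own vector, and the caller concatenates.

#include <future>
#include <random>
#include <thread>
#include <vector>

std::vector<int> make_chunk(unsigned count)
{
    // Each task owns its engine and distribution: nothing is shared.
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<> dis(1, 1000);
    std::vector<int> chunk(count);
    for (int& v : chunk) v = dis(gen);
    return chunk;
}

std::vector<int> make_all(unsigned threads, unsigned per_thread)
{
    std::vector<std::future<std::vector<int>>> futures;
    for (unsigned i = 0; i < threads; ++i)
        futures.push_back(std::async(std::launch::async, make_chunk, per_thread));
    // Concatenate the finished chunks; only this step is sequential.
    std::vector<int> all;
    all.reserve(threads * static_cast<size_t>(per_thread));
    for (auto& f : futures) {
        auto chunk = f.get();
        all.insert(all.end(), chunk.begin(), chunk.end());
    }
    return all;
}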
You can do this without any mutex.
Create your vector.
Use a mutex (though technically this probably isn't necessary) just to create an iterator pointing at v.begin() + itsThreadIndex*NumPerThread;
then each thread can freely increment that iterator and write to a part of the vector not touched by other threads.
Be sure each thread has its own copy of
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(1, 1000);
That should run much faster.
UNTESTED code - but this should make my above suggestion more clear:
#include <iostream>
#include <thread>
#include <vector>
#include <random>
#include <mutex>
#include <chrono>

using Clock = std::chrono::high_resolution_clock;

namespace SharedVars
{
    const unsigned int N = std::thread::hardware_concurrency(); // number of threads on device
    const unsigned int NumPerThread = 5e5; // number of random numbers to generate per thread
    std::vector<int> RandNums(NumPerThread*N);
    std::mutex mtx;
}

void PerThread_AddN(int threadNumber)
{
    using namespace SharedVars;
    // Each thread owns its engine and distribution.
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<> dis(1, 1000);
    std::vector<int>::iterator from;
    std::vector<int>::iterator to;
    {
        std::lock_guard<std::mutex> lock(mtx); // hold the lock only while computing the iterators, not while writing the contents
        from = RandNums.begin() + threadNumber*NumPerThread;
        to = from + NumPerThread;
    }
    for (auto i = from; i < to; ++i)
    {
        *i = dis(gen);
    }
}

int main()
{
    using namespace SharedVars;
    auto start_time = Clock::now();
    std::vector<std::thread> threads;
    threads.reserve(N);
    for (unsigned int i = 0; i < N; i++)
    {
        threads.emplace_back(PerThread_AddN, i);
    }
    for (auto &t : threads)
    {
        t.join();
    }
    auto end_time = Clock::now();
    std::cout << "\nTime difference = "
              << std::chrono::duration<double, std::nano>(end_time - start_time).count() << " nanoseconds\n";
    std::cout << "size = " << RandNums.size() << '\n';
}
Nicol Bolas was right on the money. I reimplemented it using std::packaged_task, and it's around 4-5 times faster now.
#include <iostream>
#include <thread>
#include <vector>
#include <random>
#include <future>
#include <chrono>

using Clock = std::chrono::high_resolution_clock;

const unsigned int N = std::thread::hardware_concurrency(); // number of threads on device
const unsigned int NumPerThread = 5e5; // number of random numbers to generate per thread
std::vector<int> x(NumPerThread); // NB: one buffer shared by every task

std::vector<int> createVec()
{
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<> dis(1, 1000);
    for (unsigned int i = 0; i < NumPerThread; i++)
    {
        x[i] = dis(gen);
    }
    return x;
}

int main()
{
    auto start_time = Clock::now();
    std::vector<int> RandNums;
    RandNums.reserve(N*NumPerThread);
    std::vector<std::future<std::vector<int>>> results;
    results.reserve(N);
    std::vector<int> crap;
    crap.reserve(NumPerThread);
    for (unsigned int i = 0; i < N; i++)
    {
        std::packaged_task<std::vector<int>()> temp(createVec);
        results.push_back(temp.get_future()); // push_back, not results[i]: reserve() doesn't create elements
        temp(); // NB: this runs the task synchronously, on the main thread
        crap = std::move(results[i].get());
        RandNums.insert(RandNums.begin()+(0*NumPerThread), crap.begin(), crap.end()); // note: always inserts at the front
    }
    std::cout << RandNums.size() << '\n';
    auto end_time = Clock::now();
    std::cout << "Time difference = "
              << std::chrono::duration<double, std::nano>(end_time - start_time).count() << " nanoseconds\n";
}
But is there a way to make this one better? lewis's version is way faster than this, so there must be something else missing...
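One likely culprit: temp() invokes the packaged_task immediately on the main thread, so the tasks never actually run concurrently, and they all write through the shared global x. A sketch of the same structure with the tasks moved onto real threads (my untested guess at the fix; createVecLocal is a hypothetical rewrite of createVec):

// createVec rewritten to use a local vector: no shared state between tasks.
std::vector<int> createVecLocal()
{
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<> dis(1, 1000);
    std::vector<int> out(NumPerThread);
    for (int& v : out) v = dis(gen);
    return out;
}

// In main: run each packaged_task on its own thread instead of calling it inline.
std::vector<std::future<std::vector<int>>> results;
std::vector<std::thread> workers;
for (unsigned int i = 0; i < N; i++) {
    std::packaged_task<std::vector<int>()> task(createVecLocal);
    results.push_back(task.get_future());
    workers.emplace_back(std::move(task)); // the thread executes the task concurrently
}
for (auto& w : workers) w.join();
for (auto& r : results) {
    std::vector<int> chunk = r.get();
    RandNums.insert(RandNums.end(), chunk.begin(), chunk.end());
}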

Can I use a single `default_random_engine` to create multiple normally distributed sets of numbers?

I want to generate a set of unit vectors (for any arbitrary dimension), which are evenly distributed across all directions. For this I generate normally distributed numbers for each vector component and scale the result by the inverse of the magnitude.
My question: Can I use a single std::default_random_engine to generate numbers for all components of my vector or does every component require its own engine?
Afaik, each component needs to be Gaussian-distributed independently for the math to work out, and I cannot assess the difference between the two scenarios. Here's an MWE with a single RNG (allocation and normalization of vectors is omitted here).
std::vector<std::vector<double>> GenerateUnitVecs(size_t dimension, size_t count)
{
    std::vector<std::vector<double>> result;

    /* Set up a _single_ RNG */
    size_t seed = GetSeed(); // system_clock
    std::default_random_engine gen(seed);
    std::normal_distribution<double> distribution(0.0, 1.0);

    /* Generate _multiple_ (independent?) distributions */
    for(size_t ii = 0; ii < count; ++ii){
        std::vector<double> vec;
        for(size_t comp = 0; comp < dimension; ++comp)
            vec.push_back(distribution(gen)); // <-- random number goes here
        result.push_back(vec);
    }
    return result;
}
Thank you.
The OP asked:
My question: Can I use a single std::default_random_engine to generate numbers for all components of my vector or does every component require its own engine?
As others have stated in the comments, I would suggest not using std::default_random_engine, and instead seeding with std::random_device or std::chrono::high_resolution_clock.
To use random_device for a normal distribution or Gaussian it is quite simple:
#include <iostream>
#include <iomanip>
#include <string>
#include <map>
#include <random>
#include <cmath>

int main() {
    std::random_device rd{};
    std::mt19937 gen{ rd() };

    // values near the mean are the most likely
    // standard deviation affects the dispersion of generated values from the mean
    std::normal_distribution<> d{5,2};

    std::map<int, int> hist{};
    for ( int n=0; n<10000; ++n ) {
        ++hist[std::round(d(gen))];
    }
    for ( auto p : hist ) {
        std::cout << std::setw(2)
                  << p.first << ' ' << std::string(p.second/200, '*' ) << '\n';
    }
}
Using std::chrono::high_resolution_clock takes a little more work, but it is just as easy.
#include <iostream>
#include <iomanip>
#include <string>
#include <map>
#include <random>
#include <cmath>
#include <limits>
#include <chrono>
#include <type_traits> // for std::conditional_t

class ChronoClock {
public:
    using Clock = std::conditional_t<std::chrono::high_resolution_clock::is_steady,
                                     std::chrono::high_resolution_clock,
                                     std::chrono::steady_clock>;

    static unsigned int getTimeNow() {
        unsigned int now = static_cast<unsigned int>(Clock::now().time_since_epoch().count());
        return now;
    }
};

int main() {
    /*static*/ std::mt19937 gen{}; // Can be either static or not.
    gen.seed( ChronoClock::getTimeNow() );

    // values near the mean are the most likely
    // standard deviation affects the dispersion of generated values from the mean
    std::normal_distribution<> d{5,2};

    std::map<int, int> hist{};
    for ( int n=0; n<10000; ++n ) {
        ++hist[std::round(d(gen))];
    }
    for ( auto p : hist ) {
        std::cout << std::setw(2)
                  << p.first << ' ' << std::string(p.second/200, '*' ) << '\n';
    }
}
As you can see from the examples above (adapted from cppreference.com), there is a single engine, a single seed, and a single distribution, and that single engine generates all the random numbers or sets of random numbers.
EDIT - Additionally you can use a class that I've written as a wrapper class for random engines and random distributions. You can refer to this answer of mine here.
I am assuming you are not generating random numbers in parallel. Then theoretically, there is no problem with generating random independent Gaussian vectors with one engine.
Each call to std::normal_distribution's () operator gives you a random real-valued number following the specified Gaussian distribution. Successive calls of the () operator give you independent samples. The implementation in gcc (my version: 4.8) uses the Marsaglia polar method for standard normal random number generation. You can read this Wikipedia page for more detail.
However, for rigorous scientific research that demands high quality randomness and a huge amount of random samples, I would recommend using the Mersenne-Twister engine (mt19937 32-bit or 64-bit) instead of the default engine, since it is based on a well-established method, has long period and performs well on statistical random tests.
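In the OP's function this is a two-line change (my sketch, keeping the OP's GetSeed() helper):

/* Set up a _single_ RNG, with a stronger engine */
size_t seed = GetSeed();   // system_clock-based helper, as in the question
std::mt19937_64 gen(seed); // long period, good statistical quality
std::normal_distribution<double> distribution(0.0, 1.0);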

Math.Random equivalent in C++

I have been programming in Java for three years, and have been using Math.random() to get a random number. I'm fairly new to C++, and I was wondering if there was equivalent to that but in C++? A specific function or method that I could use? Also include an explanation. Thanks so much!
C++ provides a fairly nice random number library, <random>, but it doesn't yet have the sort of dead simple API beginners generally want. It's easy to produce such an API, as I show below, and hopefully some such API will be included at some point.
The C++ API splits random number generation into two parts, sources of 'randomness', and machinery for turning randomness into numbers with specific distributions. Many basic uses of random numbers don't particularly care how good (or fast, or small) the source of 'randomness' is, and they only need 'uniform' distributions. So the typically recommended source of randomness is the "Mersenne Twister" engine. You create one of these and seed it like so:
#include <random>

int main() {
    std::mt19937 eng{42};
}
Now eng is an object that can be passed around and used as a source for random bits. It's a value-type so you can make copies of it, assign to it, etc. like a normal value. In terms of thread safety, accessing this value is like accessing any other, so if you need multiple threads you should either put an engine on each thread or use mutual exclusion.
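A sketch of the mutual-exclusion option (my example; shared_eng and locked_draw are hypothetical names, and per-thread engines are usually preferable for speed):

#include <mutex>
#include <random>

std::mt19937 shared_eng{42};
std::mutex eng_mutex;

// Draw a raw value through a lock so only one thread
// touches the engine at a time.
std::mt19937::result_type locked_draw() {
    std::lock_guard<std::mutex> lock(eng_mutex);
    return shared_eng();
}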
To turn data from an engine into random values, use a distribution object. Typical uses need 'uniform' distributions, so for integral values use std::uniform_int_distribution<int>.
std::uniform_int_distribution<int> dice{1, 6};
A distribution object is a function object, and you get values from it by calling it and passing it the source of randomness it will use:
auto die_roll = dice(eng);
One thing to keep in mind is that the math for producing random values should be encapsulated inside a distribution object. If you find yourself doing some kind of transformation on the results then you probably should be using a different distribution. Don't do things like dist(eng) % 10 or dist(eng) / 6.0 + 10.0. There are several other distributions provided in the library, including ones for producing floating point values with various distributions.
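For example (my sketch, assuming the eng engine from above): state the range you want instead of transforming the output.

// Instead of dice(eng) % 10, ask for the range directly:
std::uniform_int_distribution<int> digits{0, 9};
int d = digits(eng);

// Instead of dist(eng) / 6.0 + 10.0, use a real-valued distribution:
std::uniform_real_distribution<double> level{10.0, 11.0};
double x = level(eng);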
Here's a pretty easy way to wrap the <random> functionality for simple usage:
#include <iostream>
#include <random>

std::mt19937 seeded_eng() {
    std::random_device r;
    std::seed_seq seed{r(), r(), r(), r(), r(), r(), r(), r()};
    return std::mt19937(seed);
}

class Random {
    std::mt19937 eng = seeded_eng();
public:
    auto operator()(int a, int b) {
        std::uniform_int_distribution<int> dist(a, b);
        return dist(eng);
    }
};

int main() {
    Random random;
    for (int i = 0; i < 10; ++i) {
        std::cout << "Dice: " << random(1, 6) << " " << random(1, 6) << '\n';
    }
}
#include <cstdlib>
#include <ctime>

int main()
{
    srand((unsigned int) time (NULL)); //activates the generator
    //...
    int a = rand()%10; //gives a random from 0 to 9
    double r = ((double) rand() / (RAND_MAX)); //gives a random from 0 to 1
    int max, min;
    //... (set max and min)
    int c = (rand()%(max - min + 1)) + min; //gives a random from min to max inclusive
    //...
    return 0;
}
These ways are the simplest. Sometimes that means "the best", sometimes not.
1. srand((unsigned) time(0)) makes sure that every time you run your program, the rand() function gets a new seed, causing it to produce different, "random" output. Without srand((unsigned) time(0)), rand() will produce the same output on every run.
2. int Number is used to store the random number generated by the rand() function. rand() % 27 will give you numbers 0-26.
#include <iostream>
#include <cstdlib>
#include <ctime>

int main()
{
    srand((unsigned)time(0));
    int Number = ((rand() % 27));
    std::cout << Number << std::endl;
    return 0;
}
Here is a simple solution. The function random is overloaded. One instance is used to acquire a random number generator for integers. Another instance is used to acquire a random number generator for doubles. After you have these two functions, applications become rather trivial, as can be observed in the main function.
#include <algorithm>
#include <functional>
#include <iostream>
#include <iterator>
#include <numeric>
#include <ostream>
#include <random>
#include <vector>

// Single global engine, a better version of std::rand
std::mt19937 engine{ std::random_device()() };

// Returns a generator producing uniform random integers in the closed range [a, b]
std::function<int()> random(int a, int b)
{
    auto dist = std::uniform_int_distribution<>(a, b);
    return std::bind(dist, std::ref(engine));
}

// Returns a generator producing uniform random doubles in the half-open range [x, y)
std::function<double()> random(double x, double y)
{
    auto dist = std::uniform_real_distribution<>(x, y);
    return std::bind(dist, std::ref(engine));
}

int main()
{
    const auto no_iterations = int{ 12 };
    auto dice = random(1, 6);

    // Roll the dice a few times and observe the outcome
    std::generate_n(std::ostream_iterator<int>(std::cout, " "),
                    no_iterations, dice);
    std::cout << std::endl;

    // U is a uniform random variable on the unit interval [0, 1]
    auto U = random(0.0, 1.0);

    // Generate some observations
    std::vector<double> observations;
    std::generate_n(std::back_inserter(observations), no_iterations, U);

    // Calculate the mean of the observations
    auto sum = std::accumulate(observations.cbegin(), observations.cend(), 0.0);
    auto mean = sum / no_iterations;
    std::cout << "The mean is " << mean << std::endl;

    return 0;
}

Generating random numbers in parallel with identical engines fails

I am using the RNG provided by C++11 and I am also toying around with OpenMP. I have assigned an engine to each thread and as a test I give the same seed to each engine. This means that I would expect both threads to yield the exact same sequence of randomly generated numbers. Here is a MWE:
#include <iostream>
#include <random>
#include <vector>
using namespace std;

uniform_real_distribution<double> uni(0, 1);
normal_distribution<double> nor(0, 1);

int main()
{
    #pragma omp parallel
    {
        mt19937 eng(0); //GIVE EACH THREAD ITS OWN ENGINE
        vector<double> vec;
        #pragma omp for
        for(int i=0; i<5; i++)
        {
            nor(eng);
            vec.push_back(uni(eng));
        }
        #pragma omp critical
        cout << vec[0] << endl;
    }
    return 0;
}
Most often I get the output 0.857946 0.857946, but a few times I get 0.857946 0.592845. How is the latter result possible, when the two threads have identical, uncorrelated engines?!
You have to put nor and uni inside the omp parallel region too. Like this:
#pragma omp parallel
{
    uniform_real_distribution<double> uni(0, 1);
    normal_distribution<double> nor(0, 1);
    mt19937 eng(0); //GIVE EACH THREAD ITS OWN ENGINE
    vector<double> vec;
Otherwise there will only be one copy of each, when in fact every thread needs its own copy. This matters doubly because distributions are stateful: std::normal_distribution in particular typically caches the second value of each internally generated pair, so unsynchronized concurrent use is a data race that can perturb the sequences.
Updated to add: I now see that exactly the same problem is discussed in
this stackoverflow thread.
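For completeness, the whole corrected MWE might read as follows (my untested sketch, merging the question's code with the fix above):

#include <iostream>
#include <random>
#include <vector>
using namespace std;

int main()
{
    #pragma omp parallel
    {
        // Every thread owns its distributions and engine.
        uniform_real_distribution<double> uni(0, 1);
        normal_distribution<double> nor(0, 1);
        mt19937 eng(0); // identical seeds => identical per-thread sequences
        vector<double> vec;
        #pragma omp for
        for (int i = 0; i < 5; i++)
        {
            nor(eng);
            vec.push_back(uni(eng));
        }
        #pragma omp critical
        if (!vec.empty()) // a thread may receive no iterations
            cout << vec[0] << endl;
    }
    return 0;
}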

Duplicate values generated by mt19937

I am working with C++11's random library, and I have a small program that generates a coordinate pair x, y on a circle with unit radius. Here is the simple multithreaded program
#include <iostream>
#include <fstream>
#include <vector>
#include <cmath>
#include <random>
using namespace std;

int main()
{
    const double PI = 3.1415;
    double angle, radius, X, Y;
    int i;
    vector<double> finalPositionX, finalPositionY;
    #pragma omp parallel
    {
        vector <double> positionX, positionY;
        mt19937 engine(0);
        uniform_real_distribution<> uniform(0, 1);
        normal_distribution<double> normal(0, 1);
        #pragma omp for private(angle, radius, X, Y)
        for(i=0; i<1000000; ++i)
        {
            angle = uniform(engine)*2.0*PI;
            radius = sqrt(uniform(engine));
            X = radius*cos(angle);
            Y = radius*sin(angle);
            positionX.push_back(X);
            positionY.push_back(Y);
        }
        #pragma omp barrier
        #pragma omp critical
        { // braces so that BOTH inserts are inside the critical section
            finalPositionX.insert(finalPositionX.end(), positionX.begin(), positionX.end());
            finalPositionY.insert(finalPositionY.end(), positionY.begin(), positionY.end());
        }
    }
    ofstream output_data("positions.txt", ios::out);
    output_data.precision(9);
    for(size_t temp_var=0; temp_var<finalPositionX.size(); temp_var++)
    {
        output_data << finalPositionX[temp_var]
                    << "\t\t\t\t"
                    << finalPositionY[temp_var]
                    << "\n";
    }
    output_data.close();
    return 0;
}
Question: Many of the x-coordinates appear twice (same with y-coordinates). I don't understand this, since the period of the mt19937 is much longer than 1,000,000. Does anyone have an idea of what is wrong here?
Note: I get the same behavior when I don't multithread the application, so the problem is not related to wrong multithreading.
EDIT As pointed out in one of the answers, I shouldn't use the same seed for both threads - but that is an error I made when formulating this question; in my real program I seed the threads differently.
Using the core part of your code, I wrote this imperfect test, but from what I can see the distribution is pretty uniform:
#include <iostream>
#include <random>
#include <map>
#include <cmath>
#include <string>
#include <iomanip>
using namespace std;

int main()
{
    std::map<int, int> hist;
    mt19937 engine(0);
    uniform_real_distribution<> uniform(0, 1);
    //normal_distribution<double> normal(0, 1);
    for(int i=0; i<1000000; ++i)
    {
        double rnum = uniform(engine);
        ++hist[std::round(1000*rnum)];
    }
    for (auto p : hist) {
        std::cout << std::fixed << std::setprecision(1) << std::setw(2)
                  << p.first << ' ' << std::string(p.second/200, '*') << '\n';
    }
    return 0;
}
and as others already said, it is not unexpected to see some values repeated. For the normal distribution, I used the following modification to rnum and hist to test that, and it looks good too:
double rnum = normal(engine);
++hist[std::round(10*rnum)];
As described in this article (and a later article by a Stack Overflow contributor), true randomness doesn't distribute perfectly.
[Images from the article contrasting good randomness with bad randomness]
I really recommend reading the article, but to summarize it: an RNG has to be unpredictable, which implies that calling it 100 times must not perfectly fill a 10x10 grid.
First of all - just because you get the same number twice doesn't mean it isn't random. If you throw a die six times, would you expect six different results? See birthday paradox. That being said - you are right that you shouldn't see too much repetition in this particular case.
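To put a rough number on it (my estimate, not from the original answer): the program prints 9 significant digits, so there are on the order of N ≈ 10^9 distinguishable printed values, and among n = 10^6 draws the expected number of colliding pairs is about n(n-1)/(2N) ≈ (10^6)^2 / (2·10^9) = 500. A few hundred duplicated printed coordinates are therefore expected even from a perfect generator; dramatically more than that points to the seeding issue below.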
I'm not familiar with "#pragma omp parallel", but my guess is you are spawning multiple threads that all seed the mt19937 with the same seed (0). You should use different seeds for all threads - e.g. the thread id.
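A minimal sketch of per-thread seeding (my example; the helper name make_thread_engine is hypothetical, and mixing in a time-based value also keeps separate runs distinct):

#include <chrono>
#include <random>
#include <omp.h>

// Build a per-thread engine whose seed differs by thread id and by run.
std::mt19937 make_thread_engine()
{
    auto time_part = static_cast<std::mt19937::result_type>(
        std::chrono::steady_clock::now().time_since_epoch().count());
    auto thread_part = static_cast<std::mt19937::result_type>(omp_get_thread_num());
    std::seed_seq seq{time_part, thread_part};
    return std::mt19937(seq);
}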