Duplicate values generated by mt19937 - c++

I am working with C++11's random library, and I have a small program that generates a coordinate pair x, y inside a circle of unit radius. Here is the simple multithreaded program:
#include <cmath>
#include <fstream>
#include <iostream>
#include <random>
#include <vector>
using namespace std;
int main()
{
    const double PI = acos(-1.0);   // full double precision instead of 3.1415
    double angle, radius, X, Y;
    int i;
    vector<double> finalPositionX, finalPositionY;
#pragma omp parallel
    {
        vector<double> positionX, positionY;
        mt19937 engine(0);          // same seed in every thread (see the EDIT below)
        uniform_real_distribution<> uniform(0, 1);
        normal_distribution<double> normal(0, 1);
#pragma omp for private(angle, radius, X, Y)
        for (i = 0; i < 1000000; ++i)
        {
            angle = uniform(engine) * 2.0 * PI;
            radius = sqrt(uniform(engine));   // sqrt gives a uniform density over the disc
            X = radius * cos(angle);
            Y = radius * sin(angle);
            positionX.push_back(X);
            positionY.push_back(Y);
        }
#pragma omp barrier   // redundant: omp for already ends with an implicit barrier
#pragma omp critical
        {   // braces needed: without them only the first insert is protected
            finalPositionX.insert(finalPositionX.end(), positionX.begin(), positionX.end());
            finalPositionY.insert(finalPositionY.end(), positionY.begin(), positionY.end());
        }
    }
    ofstream output_data("positions.txt", ios::out);
    output_data.precision(9);
    for (size_t k = 0; k < finalPositionX.size(); ++k)
    {
        output_data << finalPositionX[k] << "\t\t\t\t" << finalPositionY[k] << "\n";
    }
    output_data.close();
    return 0;
}
Question: Many of the x-coordinates appear twice (the same happens with the y-coordinates). I don't understand this, since the period of mt19937 is much longer than 1,000,000. Does anyone have an idea of what is wrong here?
Note: I get the same behavior when I don't multithread the application, so the problem is not related to wrong multithreading.
EDIT: As pointed out in one of the answers, I shouldn't use the same seed for both threads, but that is an error I made when formulating this question; in my real program I seed the threads differently.

Using the core part of your code, I wrote this imperfect test but from what I can see the distribution is pretty uniform:
#include <cmath>
#include <iomanip>
#include <iostream>
#include <map>
#include <random>
#include <string>
using namespace std;
int main()
{
    std::map<int, int> hist;
    mt19937 engine(0);
    uniform_real_distribution<> uniform(0, 1);
    //normal_distribution<double> normal(0, 1);
    for (int i = 0; i < 1000000; ++i)
    {
        double rnum = uniform(engine);
        ++hist[static_cast<int>(std::round(1000 * rnum))];  // bins centred on multiples of 0.001
    }
    for (auto p : hist) {
        std::cout << std::setw(4) << p.first << ' '
                  << std::string(p.second / 200, '*') << '\n';
    }
    return 0;
}
And as others have already said, it is not unexpected to see some values repeated. For the normal distribution, I used the following modification to rnum and hist (with the normal line uncommented) to test that, and it looks good too:
double rnum = normal(engine);
++hist[static_cast<int>(std::round(10 * rnum))];

As described in this article (and a later article by a Stack Overflow contributor), true randomness doesn't distribute perfectly.
(Images not reproduced: example plots titled "Good randomness" and "Bad randomness".)
I really recommend reading the article, but to summarize it: an RNG has to be unpredictable, which implies that calling it 100 times to place points on a 10x10 grid should not be expected to fill every cell exactly once.

First of all, just because you get the same number twice doesn't mean it isn't random. If you roll a die six times, would you expect six different results? See the birthday paradox. That being said, you are right that you shouldn't see too much repetition in this particular case.
I'm not familiar with "#pragma omp parallel", but my guess is you are spawning multiple threads that all seed the mt19937 with the same seed (0). You should use different seeds for all threads - e.g. the thread id.

Related

How to generate lognormal distributed random variable using c++ boost library

I am trying to generate random numbers following different distributions to conduct some experiments on them. I chose the Boost library in C++ because I saw a large number of functions built in there. For example, for the lognormal distribution, I followed this page https://www.boost.org/doc/libs/1_43_0/libs/math/doc/sf_and_dist/html/math_toolkit/dist/dist_ref/dists/lognormal_dist.html and some more. But I cannot understand how I can actually generate the random number. I tried
int main() {
    boost::math::lognormal_distribution<> myLognormal{0, 8};
    cout << myLognormal() << endl;  // error: boost::math distributions have no call operator
    return 0;
}
but it does nothing except produce a compile error.
You have had some helpful comments already. Let me tie it together into an example:
Using Standard Library (C++11 and up)
#include <random>
#include <iostream>
int main() {
std::mt19937 engine; // uniform random bit engine
// seed the URBG
std::random_device dev{};
engine.seed(dev());
// setup a distribution:
double mu = 1.0;
double sigma = 1.0;
std::lognormal_distribution<double> dist(mu, sigma);
for (int i = 1000; i--;) { // print 1000 samples
std::cout << dist(engine) << "\n";
}
}
Plotting those numbers:
https://plotly.com/~sehe/27/
Using Boost Random
Same with Boost Random:
#include <boost/random.hpp>
#include <boost/random/random_device.hpp>
#include <iostream>
int main() {
boost::random::mt19937 engine; // uniform random bit engine
// seed the URBG
boost::random::random_device dev;
engine.seed(dev); // pass the device itself (no call operator): Boost then uses it to fill the whole engine state
// setup a distribution:
double mu = 1.0;
double sigma = 1.0;
boost::random::lognormal_distribution<double> dist(mu, sigma);
for (int i = 1'000; i--;) {
std::cout << dist(engine) << "\n";
}
}
Note that you then need to link against the Boost.Random library (e.g. -lboost_random), which boost::random::random_device requires.

Random number generator producing identical results

I am having trouble using the random header to create a simple random number generator.
#include <iostream>
#include <random>
using namespace std;
int main()
{
random_device rd; //seed generator
mt19937_64 generator{rd()}; //generator initialized with seed from rd
uniform_int_distribution<> dist{1, 6};
for(int i = 0; i < 15; i++)
{
int random = dist(generator);
cout << random << endl;
}
}
This code produces identical results every time I run the program. What am I doing wrong? Also is there a way to modify this code such that it will generate a floating point number between 0 and 1? I don't think the uniform_int_distribution will let me and I can't figure out which distribution to use.
EDIT: Posted a possible solution to my problem below
Here is what I came up with eventually:
#include <iostream>
#include <cstdlib> // srand, rand
#include <ctime>
#include <random>
using namespace std;
int main()
{
srand(time(0));
default_random_engine rd(rand());
mt19937_64 generator{rd()}; //generator initialized with seed from rd
uniform_real_distribution<double> dist{0,1};
for(int i = 0; i < 15; i++)
{
double random = dist(generator);
cout << fixed << random << endl;
}
}
It turns out that you actually CAN combine srand(time(0)) with an engine from the random header file, and their powers combined seem to produce random-feeling numbers better than I have managed with either alone. Please feel free to point out any problems with this arrangement.

Can I use a single `default_random_engine` to create multiple normally distributed sets of numbers?

I want to generate a set of unit vectors (for any arbitrary dimension), which are evenly distributed across all directions. For this I generate normally distributed numbers for each vector component and scale the result by the inverse of the magnitude.
My question: Can I use a single std::default_random_engine to generate numbers for all components of my vector or does every component require its own engine?
AFAIK, each component needs to be Gaussian-distributed independently for the math to work out, and I cannot assess the difference between the two scenarios. Here's an MWE with a single RNG (allocation and normalization of vectors are omitted here).
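#include <chrono>
#include <cstddef>
#include <random>
#include <vector>
// Hypothetical stand-in for the question's GetSeed() (based on system_clock)
std::size_t GetSeed()
{
return static_cast<std::size_t>(std::chrono::system_clock::now().time_since_epoch().count());
}
std::vector<std::vector<double>> GenerateUnitVecs(size_t dimension, size_t count)
{
std::vector<std::vector<double>> result;
/* Set up a _single_ RNG */
size_t seed = GetSeed(); // system_clock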
std::default_random_engine gen(seed);
std::normal_distribution<double> distribution(0.0, 1.0);
/* Generate _multiple_ (independent?) distributions */
for(size_t ii = 0; ii < count; ++ii){
std::vector<double> vec;
for(size_t comp = 0; comp < dimension; ++comp)
vec.push_back(distribution(gen)); // <-- random number goes here
result.push_back(vec);
}
return result;
}
Thank you.
The OP asked:
My question: Can I use a single std::default_random_engine to generate numbers for all components of my vector or does every component require its own engine?
As others have suggested in the comments, I would avoid relying on std::default_random_engine (its algorithm is implementation-defined) and instead use std::mt19937, seeded either from std::random_device or from std::chrono::high_resolution_clock.
Using random_device to seed a normal (Gaussian) distribution is quite simple:
#include <iostream>
#include <iomanip>
#include <string>
#include <map>
#include <random>
#include <cmath>
int main() {
std::random_device rd{};
std::mt19937 gen{ rd() };
// values near the mean are the most likely
// standard deviation affects the dispersion of generated values from the mean
std::normal_distribution<> d{5,2};
std::map<int, int> hist{};
for ( int n=0; n<10000; ++n ) {
++hist[std::round(d(gen))];
}
for ( auto p : hist ) {
std::cout << std::setw(2)
<< p.first << ' ' << std::string(p.second/200, '*' ) << '\n';
}
}
Seeding from std::chrono::high_resolution_clock takes a little more work, but is just as easy:
#include <iostream>
#include <iomanip>
#include <string>
#include <map>
#include <random>
#include <cmath>
#include <limits>
#include <type_traits> // std::conditional_t (C++14)
#include <chrono>
class ChronoClock {
public:
using Clock = std::conditional_t<std::chrono::high_resolution_clock::is_steady,
std::chrono::high_resolution_clock,
std::chrono::steady_clock>;
static unsigned int getTimeNow() {
unsigned int now = static_cast<unsigned int>(Clock::now().time_since_epoch().count());
return now;
}
};
int main() {
/*static*/ std::mt19937 gen{}; // Can be either static or not.
gen.seed( ChronoClock::getTimeNow() );
// values near the mean are the most likely
// standard deviation affects the dispersion of generated values from the mean
std::normal_distribution<> d{5,2};
std::map<int, int> hist{};
for ( int n=0; n<10000; ++n ) {
++hist[std::round(d(gen))];
}
for ( auto p : hist ) {
std::cout << std::setw(2)
<< p.first << ' ' << std::string(p.second/200, '*' ) << '\n';
}
}
As the examples above (adapted from cppreference.com) show, a single engine with a single seed and a single distribution is enough to generate random numbers, or whole sets of random numbers.
EDIT: Additionally, you can use a wrapper class I've written for random engines and random distributions; see this answer of mine.
I am assuming you are not generating random numbers in parallel. In that case there is, in theory, no problem generating independent Gaussian random vectors with a single engine.
Each call to std::normal_distribution's operator() gives you a random real number following the specified Gaussian distribution, and successive calls give you independent samples. The implementation in GCC (my version: 4.8) uses the Marsaglia polar method to generate standard normal numbers. You can read this Wikipedia page for more detail.
However, for rigorous scientific research that demands high quality randomness and a huge amount of random samples, I would recommend using the Mersenne-Twister engine (mt19937 32-bit or 64-bit) instead of the default engine, since it is based on a well-established method, has long period and performs well on statistical random tests.

Generating random numbers in parallel with identical engines fails

I am using the RNG provided by C++11 and I am also toying around with OpenMP. I have assigned an engine to each thread and as a test I give the same seed to each engine. This means that I would expect both threads to yield the exact same sequence of randomly generated numbers. Here is a MWE:
#include <iostream>
#include <random>
#include <vector>
using namespace std;
uniform_real_distribution<double> uni(0, 1);
normal_distribution<double> nor(0, 1);
int main()
{
#pragma omp parallel
{
mt19937 eng(0); //GIVE EACH THREAD ITS OWN ENGINE
vector<double> vec;
#pragma omp for
for(int i=0; i<5; i++)
{
nor(eng);
vec.push_back(uni(eng));
}
#pragma omp critical
cout << vec[0] << endl;
}
return 0;
}
Most often I get the output 0.857946 0.857946, but a few times I get 0.857946 0.592845. How is the latter result possible, when the two threads have identical, uncorrelated engines?!
You have to put nor and uni inside the omp parallel region too. Like this:
#pragma omp parallel
{
uniform_real_distribution<double> uni(0, 1);
normal_distribution<double> nor(0, 1);
mt19937 eng(0); //GIVE EACH THREAD ITS OWN ENGINE
vector<double> vec;
Otherwise there will only be one copy of each, when in fact every thread needs its own copy.
Updated to add: I now see that exactly the same problem is discussed in this Stack Overflow thread.

Generating a random double between a range of values

I'm currently having trouble generating random numbers between -32.768 and 32.768. It keeps giving me the same value with only a small change in the decimal part, e.g. 27.xxx.
Here's my code; any help would be appreciated.
#include <iostream>
#include <ctime>
#include <cstdlib>
using namespace std;
int main()
{
srand( time(NULL) );
double r = (68.556*rand()/RAND_MAX - 32.768);
cout << r << endl;
return 0;
}
I should mention if you're using a C++11 compiler, you can use something like this, which is actually easier to read and harder to mess up:
#include <random>
#include <iostream>
#include <ctime>
int main()
{
//Type of random number distribution
std::uniform_real_distribution<double> dist(-32.768, 32.768); //(min, max)
//Mersenne Twister: Good quality random number generator
std::mt19937 rng;
//Initialize with non-deterministic seeds
rng.seed(std::random_device{}());
// generate 10 random numbers.
for (int i=0; i<10; i++)
{
std::cout << dist(rng) << std::endl;
}
return 0;
}
As bames53 pointed out, the above code can be made even shorter if you make full use of C++11:
#include <random>
#include <iostream>
#include <ctime>
#include <algorithm>
#include <iterator>
int main()
{
std::mt19937 rng;
std::uniform_real_distribution<double> dist(-32.768, 32.768); //(min, max)
rng.seed(std::random_device{}()); //non-deterministic seed
std::generate_n(
std::ostream_iterator<double>(std::cout, "\n"),
10,
[&]{ return dist(rng);} );
return 0;
}
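Also, if you are not using C++11, you can use the following function instead:
#include <cmath>
#include <cstdlib>
// Returns a value in [lowerBound, upperBound] with `precision` decimal digits.
// If RAND_MAX + 1 is smaller than the number of steps, the top of the range is never reached.
double randDouble(double precision, double lowerBound, double upperBound) {
    const double scale = std::pow(10, precision);
    const int steps = static_cast<int>(scale * (upperBound - lowerBound) + 1);
    return ((rand() % steps) + lowerBound * scale) / scale;
}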
So, I think this is a typical case of "using time(NULL) isn't a great way of seeding random numbers for runs that start close together". Not many bits of time(NULL) change from one call to the next, so the random numbers are fairly similar. This is not a new phenomenon; if you google "my random numbers aren't very random", you'll find LOTS of this.
There are a few different solutions. Getting a microsecond or nanosecond time would be the simplest choice: on Linux, gettimeofday gives you a microsecond time as part of the struct it fills in.
It seems plainly obvious, but some of the examples say otherwise... I thought that when you divide one int by another you always get an int, so you need to cast each int to double/float before dividing them.
i.e.: double r = (68.556 * (double)rand() / (double)RAND_MAX - 32.768);
Also, if you call srand() every time you call rand(), you reset the seed, which results in similar values being returned every time instead of "random" ones.
I've added a for loop to your program:
#include <iostream>
#include <ctime>
#include <cstdlib>
using namespace std;
int main () {
srand(time (NULL));
for (int i = 0; i < 10; ++i) {
double r = ((68.556 * rand () / RAND_MAX) - 32.768);
cout << r << endl;
}
return 0;
}
Example output:
31.6779
-28.2096
31.5672
18.9916
-1.57149
-0.993889
-32.4737
24.6982
25.936
26.4152
It seems OK to me. I've added the code on Ideone for you.
Here are four runs:
Run 1:
-29.0863
-22.3973
34.1034
-1.41155
-2.60232
-30.5257
31.9254
-17.0673
31.7522
28.227
Run 2:
-14.2872
-0.185124
-27.3674
8.12921
22.4611
-0.414546
-21.4944
-11.0871
4.87673
5.4545
Run 3:
-23.9083
-6.04738
-6.54314
30.1767
-16.2224
-19.4619
3.37444
9.28014
25.9318
-22.8807
Run 4:
25.1364
16.3011
0.596151
5.3953
-25.2851
10.7301
18.4541
-18.8511
-0.828694
22.8335
Perhaps you're not waiting at least a second between runs?