How to generate the same random numbers in two different environments? - C++

I compiled exactly the same code, which generates random numbers, in two different environments (Linux and Visual Studio), but I noticed that the outputs are different. I searched online and learned that the two implementations generate different random numbers. However, I need Linux to generate the same random numbers as Visual Studio.
So, how can I make the two environments (Linux and Visual Studio) generate the same random numbers? Any ideas?
My code:
void mix_dataset(array<array<int, 20>, 5430>& array_X_dataset, array<int, 5430>& array_Y_dataset) {
// size_t len = array_X_dataset.size();
// for (size_t i = 0; i < len; ++i) {
// size_t swap_index = rand() % len;
mt19937 engine;
engine.seed(3);
for (size_t i = 0; i < 5430; ++i) {
size_t swap_index = engine() % 5430;
if (i == swap_index)
continue;
array<int, 20> data_point{ };
data_point = array_X_dataset[i];
array_X_dataset[i] = array_X_dataset[swap_index];
array_X_dataset[swap_index] = data_point;
int Y = array_Y_dataset[i];
array_Y_dataset[i] = array_Y_dataset[swap_index];
array_Y_dataset[swap_index] = Y;
}
}
int main(){
srand(3);
mix_dataset(array_X_dataset, array_Y_dataset);
}

You can use the Mersenne Twister; its output is reproducible because the algorithm is standardized.
Use the same seed on both machines and you're good to go.
#include <random>
#include <iostream>
int main()
{
std::mt19937 engine;
engine.seed(1);
for (std::size_t n = 0; n < 10; ++n)
{
std::cout << engine() << std::endl;
}
}
You can verify it here: https://godbolt.org/z/j5r6ToGY7. Just select different compilers and check the output.
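A local cross-check is also possible, because the C++ standard itself names one concrete value: the 10000th consecutive invocation of a default-constructed std::mt19937 must produce 4123659995. The sketch below wraps that in a small helper (the name nth_default_mt19937_output is made up for illustration) that can be compiled on both platforms.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: returns the n-th output (1-based) of a
// default-constructed std::mt19937.
std::uint_fast32_t nth_default_mt19937_output(unsigned n)
{
    assert(n >= 1);
    std::mt19937 engine;     // default-constructed, as the standard's check requires
    engine.discard(n - 1);   // skip the first n-1 outputs
    return engine();         // the n-th output
}
```

Calling nth_default_mt19937_output(10000) should give 4123659995 on any conforming implementation. Note that this guarantee covers the engine only; the distribution classes and std::shuffle are not pinned down in the same way.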

If you want a pseudorandom sequence from a known algorithm, you must choose both the algorithm and the seed explicitly using the C++ library.
You can't do this with rand(), because it will vary between C library implementations.
And please distinguish between pseudorandom number generators (which will produce the same sequence from the same seed) and random ones, which are vanishingly unlikely to coincide.

Since the standard library function implementations offered by the two platforms differ, you'll need to decide on a PRNG whose output, given the same seed, is the same on both platforms.
Standard C provides very little in the way of guarantees when it comes to the quality of the PRNG exposed by rand anyway and serious applications should stay away from that.
Whatever the critiques, you would be in a better position using PCG, for example, since you do not seem to need cryptographic quality.
In addition, avoid
size_t swap_index = rand() % len;
as randomness may suffer. You can instead use rejection sampling if the library you choose does not offer an alternative.
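As a rough sketch of the rejection sampling mentioned above, the code below draws an unbiased index from raw std::mt19937 output and uses it in a Fisher-Yates shuffle. The names bounded and portable_shuffle are invented for this example, and the shuffle is a generic stand-in, not the asker's mix_dataset. Because it avoids std::uniform_int_distribution and std::shuffle, the result depends only on the engine and the seed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Uniform integer in [0, bound) built only from raw mt19937 output.
std::uint32_t bounded(std::mt19937& engine, std::uint32_t bound)
{
    assert(bound > 0);
    const std::uint64_t range = std::uint64_t{1} << 32;       // number of possible raw values
    const std::uint64_t limit = range - (range % bound);      // largest multiple of bound <= 2^32
    std::uint64_t x;
    do {
        x = engine() & 0xFFFFFFFFu;   // keep 32 bits even if uint_fast32_t is wider
    } while (x >= limit);             // reject the biased tail and draw again
    return static_cast<std::uint32_t>(x % bound);
}

// Fisher-Yates shuffle driven by a fixed seed.
void portable_shuffle(std::vector<int>& v, std::uint32_t seed)
{
    std::mt19937 engine(seed);
    for (std::size_t i = v.size(); i > 1; --i) {
        const std::size_t j = bounded(engine, static_cast<std::uint32_t>(i));
        std::swap(v[i - 1], v[j]);    // j is uniform over [0, i)
    }
}
```

The rejected tail is at most bound - 1 values out of 2^32, so for a bound like 5430 the loop almost never repeats.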

Related

(Why) is the std::binomial_distribution biased for large probabilities p and slow for small n?

I want to generate binomially distributed random numbers in c++. Speed is a major concern. Not knowing a lot about random number generators, I use the standard libraries' tools. My code looks like something below:
#include <random>
static std::random_device random_dev;
static std::mt19937 random_generator{random_dev()};
std::binomial_distribution<int> binomial_generator;
void RandomInit(int s) {
//I create the generator object here to save time. Does this make sense?
binomial_generator = std::binomial_distribution<int>(1, 0.5);
random_generator.seed(s);
}
int binomrand(int n, double p) {
binomial_generator.param(std::binomial_distribution<int>::param_type(n, p));
return binomial_generator(random_generator);
}
To test my implementation, I have built a cython wrapper and then executed and timed the function from within python. For reference I have also implemented a "stupid" binomial distribution, which just returns the sum of Bernoulli trials.
int binomrand2(int n, double p) {
int result = 0;
for (int i = 0; i<n; i++) {
if (_Random() < p) //_Random is a thoroughly tested custom random number generator on U[0,1)
result++;
}
return result;
}
Timing showed that the latter implementation is about 50% faster than the former if n < 25. Furthermore, for p = 0.95, the former yielded significantly biased results (the mean over 1000000 trials for n = 40 was 38.23037; standard deviation is 0.0014; the result was reproducible with different seeds).
Is this a (known) issue with the standard library's functions or is my implementation wrong? What could I do to achieve my goal of obtaining accurate results with high efficiency?
The parameter n will mostly be below 100 and smaller values will occur more frequently.
I am open to suggestions outside the realm of the standard library, but I may not be able to use external software libraries.
I am using the VC 2019 compiler on 64bit Windows.
Edit
I have also tested the bias without using python:
double binomrandTest(int n, double p, long long N) {
long long result = 0;
for (long long i = 0; i<N; i++) {
result += binomrand(n, p);
}
return ((double) result) / ((double) N);
}
The result remained biased (38.228045 for the parameters above, where something like 38.000507 would be expected).

Which machines support nondeterministic random_device?

I need to obtain data from different C++ random number generation algorithms, and for that purpose I created some programs. Some of them use pseudo-random number generators and others use random_device (nondeterministic random number generator). The following program belongs to the second group:
#include <iostream>
#include <vector>
#include <cmath>
#include <random>
using namespace std;
const int N = 5000;
const int M = 1000000;
const int VALS = 2;
const int ESP = M / VALS;
int main() {
for (int i = 0; i < N; ++i) {
random_device rd;
if (rd.entropy() == 0) {
cout << "No support for nondeterministic RNG." << endl;
break;
} else {
mt19937 gen(rd());
uniform_int_distribution<int> distrib(0, 1);
vector<int> hist(VALS, 0);
for (int j = 0; j < M; ++j) ++hist[distrib(gen)];
int Y = 0;
for (int j = 0; j < VALS; ++j) Y += abs(hist[j] - ESP);
cout << Y << endl;
}
}
}
As you can see in the code, I check for the entropy to be greater than 0. I do this because:
Unlike the other standard generators, this [random_device] is not meant to be an
engine that generates pseudo-random numbers, but a generator based on
stochastic processes to generate a sequence of uniformly distributed
random numbers. Although, certain library implementations may lack the
ability to produce such numbers and employ a random number engine to
generate pseudo-random values instead. In this case, entropy returns
zero. Source
Checking the value of the entropy allows me to abort the data collection if the resulting data is going to be pseudo-random (not nondeterministic). Please note that I assume that if rd.entropy() == 0 is true, then we are in pseudo-random mode.
Unfortunately, all my trials result in a file with no data because of entropy being 0. My question is: what can I do to my computer, or where can I find a machine that allows me to obtain the data?
The source you cite is misleading you. The standard says that
double entropy() const noexcept;
Returns: If the implementation employs a random number engine, returns 0.0. Otherwise, returns an entropy estimate for the random numbers returned by operator(), in the range min() to log2(max()+1).
And a better reference has some empirical observations
Notes
This function is not fully implemented in some standard libraries. For
example, LLVM libc++ always returns zero even though the device is
non-deterministic. In comparison, Microsoft Visual C++ implementation
always returns 32, and boost.random returns 10.
In practice, nearly all the main implementations (targeting general purpose computers) have non-deterministic std::random_devices. Your test has a very high false negative rate.
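Given that entropy() is an unreliable signal, one option is to record its value for information and draw from the device anyway. This is only a sketch: DeviceReport and probe_random_device are names invented for the example, and whether the device is truly nondeterministic still has to be checked against the documentation of the standard library in use.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical record of what the device reported and produced.
struct DeviceReport {
    double entropy;                     // whatever entropy() claims; often not meaningful
    std::vector<unsigned int> samples;  // raw values drawn from the device
};

DeviceReport probe_random_device(int count)
{
    assert(count >= 0);
    std::random_device rd;
    DeviceReport report;
    report.entropy = rd.entropy();      // logged, but not used to gate sampling
    for (int i = 0; i < count; ++i) {
        report.samples.push_back(rd());
    }
    return report;
}
```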

Exponential number generator sometimes gives "weird" results

I am building a simulation in C++ and I have an exponential generator to make the burst times of the processes.
Usually it returns values such as 3.14707 or 1.04998. But frequently, on about 1 in 10 occasions, numbers like 2.64823e-307 turn up.
This is the code of the generator (I am using srand ( time(NULL) ); at the beginning of the program):
double exponential(float u)
{
double x,mean;
mean = 10;
// generate a U(0,1) random variate
x = rand();
u = x / RAND_MAX;
return (-mean * log(u));
}
And this is how I assign the values. The while part inside is my effort to get rid of such values but it didn't work:
for (int i = 0; i < nPages; i++)
{
index[i] = i;
arrival[i]= poisson(r);
burst[i]=exponential(u);
while (burst[i]<1 || burst[i]>150)
{
cout<<"P"<<i<<endl;
burst[i]=(burst[i-1]+burst[i+1])/2;
}
}
Why do you use the C library instead of the C++ library?
std::random_device rd;
std::default_random_engine gen(rd());
std::exponential_distribution<double> dist(lambda);
double x = dist(gen);
If the size of burst is nPages, then
for (int i = 0; i < nPages; i++)
{
//...
burst[i]=(burst[i-1]+burst[i+1])/2;
}
will step outside its bounds, so you are likely to end up with nonsense.
You need to think about what is required at the edges.
As far as the comments about rand go, "rand() Considered Harmful" is worth a watch. In your case, taking the log of 0 is not sensible.
Using your exponential function copied verbatim, I cannot reproduce the error you describe. Issues with the PRNG cranking out either 0 or RAND_MAX should only show up one time out of RAND_MAX apiece, not 10% of the time. I suspect either a buggy compiler, or that what you have shared is not the actual code that produces the described problem.
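To illustrate the point about log(0), here is a minimal inverse-transform sketch that maps the raw engine output onto (0, 1] so the logarithm never receives zero. The function name exponential_variate is an assumption for this example, and it uses std::mt19937 in place of rand(). It is not the same algorithm as std::exponential_distribution, whose exact output sequence may differ between standard libraries.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Exponentially distributed value with the given mean, via inverse transform.
double exponential_variate(std::mt19937& engine, double mean)
{
    assert(mean > 0.0);
    // Raw output is in [0, 2^32 - 1]; adding 1 and dividing by 2^32
    // gives u in (0, 1], so std::log(u) is always finite.
    const double u = ((engine() & 0xFFFFFFFFu) + 1.0) / 4294967296.0;
    return -mean * std::log(u);
}
```

With mean = 10, as in the question, every result is finite and non-negative, so no out-of-range values need to be patched up afterwards.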

C/C++ algorithm to produce same pseudo-random number sequences from same seed on different platforms? [duplicate]

This question already has answers here:
Consistent pseudo-random numbers across platforms
The title says it all, I am looking for something preferably stand-alone because I don't want to add more libraries.
Performance should be good since I need it in a tight high-performance loop. I guess that will come at a cost of the degree of randomness.
Any particular pseudo-random number generation algorithm will behave like this. The problem with rand is that it's not specified how it is implemented. Different implementations will behave in different ways and even have varying qualities.
However, C++11 provides the new <random> standard library header that contains lots of great random number generation facilities. The random number engines defined within are well-defined and, given the same seed, will always produce the same set of numbers.
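For example, a popular high quality random number engine is std::mt19937, which is the Mersenne twister algorithm configured in a specific way. No matter which machine you're on, the engine below always produces the same raw values for a given seed. The mapping to real numbers between 0 and 1 is done by std::uniform_real_distribution, whose algorithm the standard leaves to the implementation, so the printed values are only guaranteed to match across builds that use the same standard library implementation: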
std::mt19937 engine(0); // Fixed seed of 0
std::uniform_real_distribution<> dist;
for (int i = 0; i < 100; i++) {
std::cout << dist(engine) << std::endl;
}
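The standard fixes the engine's output but leaves the algorithm inside std::uniform_real_distribution to each library. When the doubles themselves must match bit for bit, one hedge is therefore to do the scaling by hand. The helper name unit_interval below is made up for this sketch.

```cpp
#include <cassert>
#include <random>

// Maps one raw 32-bit engine output to a double in [0, 1).
double unit_interval(std::mt19937& engine)
{
    // A 32-bit integer times 2^-32 is exactly representable in a double,
    // so the same raw value always yields the same result.
    const double u = (engine() & 0xFFFFFFFFu) * (1.0 / 4294967296.0);
    assert(u >= 0.0 && u < 1.0);
    return u;
}
```

This uses only 32 bits of randomness per value, which is coarser than a full 53-bit mantissa but is often enough for simulations.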
Here's a Mersenne Twister
Here is another PRNG implementation in C.
You may find a collection of PRNG here.
Here's the simple classic PRNG:
#include <iostream>
using namespace std;
unsigned int PRNG()
{
// our initial starting seed is 5323
static unsigned int nSeed = 5323;
// Take the current seed and generate a new value from it
// Large constants and unsigned overflow make the sequence look
// irregular, but a linear congruential generator like this one is
// easy to predict, so do not use it where security matters.
nSeed = (8253729 * nSeed + 2396403);
// Take the seed and return a value between 0 and 32766
return nSeed % 32767;
}
int main()
{
// Print 100 random numbers
for (int nCount=0; nCount < 100; ++nCount)
{
cout << PRNG() << "\t";
// If we've printed 5 numbers, start a new row
if ((nCount+1) % 5 == 0)
cout << endl;
}
}
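Since the goal is identical sequences on every platform, a variant with a fixed-width state and an explicit seed may be safer than relying on the width of unsigned int. The sketch below keeps the same multiplier and increment but, as a deliberate change, returns the upper 15 bits of the state, because the low bits of a power-of-two LCG repeat with very short periods. The class name SimpleLcg is invented here.

```cpp
#include <cassert>
#include <cstdint>

// Seedable linear congruential generator with a 32-bit state.
class SimpleLcg {
public:
    explicit SimpleLcg(std::uint32_t seed) : state_(seed) {}

    // Returns a value in [0, 32767].
    std::uint32_t next()
    {
        // Storing into uint32_t reduces the result modulo 2^32 on every platform.
        state_ = 8253729u * state_ + 2396403u;
        const std::uint32_t out = state_ >> 17;   // keep the 15 highest bits
        assert(out <= 32767u);
        return out;
    }

private:
    std::uint32_t state_;
};
```

Two SimpleLcg objects built from the same seed produce the same sequence, which is the property the question asks for.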

Uniform random number generator in c++

I am trying to produce true random numbers in C++ with C++ TR1.
However, when I run my program again, it produces the same random numbers. The code is below.
I need true random numbers for each run, as random as possible.
std::tr1::mt19937 eng;
std::tr1::uniform_real<double> unif(0, 1);
unif(eng);
You have to initialize the engine with a seed, otherwise the default seed is going to be used:
eng.seed(static_cast<unsigned int >(time(NULL)));
However, true randomness is something you cannot achieve on a deterministic machine without additional input. Every pseudo-random number generator is periodic in some way, which is something you wouldn't expect from a non-deterministic source. For example, std::mt19937 has a period of 2^19937-1 iterations. True randomness is hard to achieve, as you would have to monitor something that doesn't seem deterministic (user input, atmospheric noise). See Jerry's and Handprint's answers.
If you don't want a time based seed you can use std::random_device as seen in emsr's answer. You could even use std::random_device as generator, which is the closest you'll get to true randomness with standard library methods only.
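Using std::random_device directly as the generator could look like the sketch below. The function name roll_from_device is an assumption for illustration, and the shared static device is not safe to call from several threads at once. Drawing every number from the device is usually slower than seeding an engine once.

```cpp
#include <cassert>
#include <random>

// Draws an integer in [low, high] straight from the random device.
int roll_from_device(int low, int high)
{
    assert(low <= high);
    static std::random_device device;                   // reused across calls
    std::uniform_int_distribution<int> dist(low, high);  // inclusive bounds
    return dist(device);                                 // the device acts as the generator
}
```

For example, roll_from_device(1, 6) behaves like a die roll that is not reproducible between runs, assuming the library's random_device is nondeterministic.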
These are pseudo-random number generators. They can never produce truly random numbers. For that, you typically need special hardware (e.g., typically things like measuring noise in a thermal diode or radiation from radioactive source).
To get different sequences from a pseudo-random generator in different runs, you typically seed the generator based on the current time.
That produces fairly predictable results, though (i.e., somebody else can figure out the seed you used fairly easily). If you need to prevent that, most systems do provide some source of at least fairly random numbers: on Linux, /dev/random, and on Windows, CryptGenRandom.
Those latter tend to be fairly slow, though, so you usually want to use them as a seed, not just retrieve all your random numbers from them.
If you want true hardware random numbers then the standard library offers access to this through the random_device class.
I use it to seed another generator:
#include <random>
...
std::mt19937_64 re;
std::random_device rd;
re.seed(rd());
...
std::cout << re();
If your hardware has /dev/urandom or /dev/random then this will be used. Otherwise the implementation is free to use one of its pseudorandom generators. On G++ mt19937 is used as a fallback.
I'm pretty sure tr1 has this as well, but as others noted I think it's best to use std C++11 utilities at this point.
Ed
This answer is a wiki. I'm working on a library and examples in .NET, feel free to add your own in any language...
Without external 'random' input (such as monitoring street noise), as a deterministic machine, a computer cannot generate truly random numbers: Random Number Generation.
Since most of us don't have the money and expertise to use special equipment that provides chaotic input, there are ways to utilize the somewhat unpredictable nature of your OS, task scheduler, process manager, and user inputs (e.g. mouse movement) to generate improved pseudo-randomness.
Unfortunately, I do not know enough about C++ TR1 to know if it has the capability to do this.
Edit
As others have pointed out, you get different number sequences (which eventually repeat, so they aren't truly random), by seeding your RNG with different inputs. So you have two options in improving your generation:
Periodically reseed your RNG with some sort of chaotic input OR make the output of your RNG unreliable based on how your system operates.
The former can be accomplished by creating algorithms that explicitly produce seeds by examining the system environment. This may require setting up some event handlers, delegate functions, etc.
The latter can be accomplished by poor parallel computing practice, i.e. setting many RNG threads/processes to compete in an 'unsafe manner' to create each subsequent random number (or number sequence). This implicitly adds chaos from the sum total of activity on your system, because every minute event will have an impact on which thread's output ends up being written and eventually read when a 'GetNext()' type method is called. Below is a crude proof of concept in .NET 3.5. Note two things: 1) Even though the RNG is seeded with the same number every time, 24 identical rows are not created; 2) There is a noticeable hit on performance and an obvious increase in resource consumption, which is a given when improving random number generation:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
namespace RandomParallel
{
class RandomParallel
{
static int[] _randomRepository;
static Queue<int> _randomSource = new Queue<int>();
static void Main(string[] args)
{
InitializeRepository(0, 1, 40);
FillSource();
for (int i = 0; i < 24; i++)
{
for (int j = 0; j < 40; j++)
Console.Write(GetNext() + " ");
Console.WriteLine();
}
Console.ReadLine();
}
static void InitializeRepository(int min, int max, int size)
{
_randomRepository = new int[size];
var rand = new Random(1024);
for (int i = 0; i < size; i++)
_randomRepository[i] = rand.Next(min, max + 1);
}
static void FillSource()
{
Thread[] threads = new Thread[Environment.ProcessorCount * 8];
for (int j = 0; j < threads.Length; j++)
{
threads[j] = new Thread((myNum) =>
{
int i = (int)myNum * _randomRepository.Length / threads.Length;
int max = (((int)myNum + 1) * _randomRepository.Length / threads.Length) - 1;
for (int k = i; k <= max; k++)
{
_randomSource.Enqueue(_randomRepository[k]);
}
});
threads[j].Priority = ThreadPriority.Highest;
}
for (int k = 0; k < threads.Length; k++)
threads[k].Start(k);
}
static int GetNext()
{
if (_randomSource.Count > 0)
return _randomSource.Dequeue();
else
{
FillSource();
return _randomSource.Dequeue();
}
}
}
}
As long as there is user input or interaction during the generation, this technique will produce an uncrackable, non-repeating sequence of 'random' numbers. In such a scenario, knowing the initial state of the machine would be insufficient to predict the outcome.
Here's an example of seeding the engine (using C++11 instead of TR1)
#include <chrono>
#include <random>
#include <iostream>
int main() {
std::mt19937 eng(std::chrono::high_resolution_clock::now()
.time_since_epoch().count());
std::uniform_real_distribution<> unif;
std::cout << unif(eng) << '\n';
}
Seeding with the current time can be relatively predictable and is probably not something that should be done. The above at least does not limit you just to one possible seed per second, which is very predictable.
If you want to seed from something like /dev/random instead of the current time you can do:
std::random_device r;
std::seed_seq seed{r(), r(), r(), r(), r(), r(), r(), r()};
std::mt19937 eng(seed);
(This depends on your standard library implementation. For example, libc++ uses /dev/urandom by default, but in VS11 random_device is deterministic)
Of course nothing you get out of mt19937 is going to meet your requirement of a "true random number", and I suspect that you don't really need true randomness.