Exponential number generator sometimes gives "weird" results - c++

I am building a simulation in C++ and I have an exponential generator to make the burst times of the processes.
Usually it returns values such as 3.14707 or 1.04998, but roughly 1 time in 10 it produces numbers like 2.64823e-307.
This is the code of the generator (I am using srand ( time(NULL) ); at the beginning of the program):
double exponential(float u)
{
double x,mean;
mean = 10;
// generate a U(0,1) random variate
x = rand();
u = x / RAND_MAX;
return (-mean * log(u));
}
And this is how I assign the values. The inner while loop is my attempt to get rid of such values, but it didn't work:
for (int i = 0; i < nPages; i++)
{
index[i] = i;
arrival[i]= poisson(r);
burst[i]=exponential(u);
while (burst[i]<1 || burst[i]>150)
{
cout<<"P"<<i<<endl;
burst[i]=(burst[i-1]+burst[i+1])/2;
}
}

Why do you use the C library instead of the C++ library?
std::random_device rd;
std::default_random_engine gen(rd());
std::exponential_distribution<double> dist(lambda);
double x = dist(gen);
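A slightly fuller sketch of the same idea, assuming the question's mean of 10 maps to a rate of lambda = 1 / mean. The helper name make_burst_times and its seed parameter are illustrative and not part of the original code.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draw `count` exponential burst times with the given mean.
// std::exponential_distribution is parameterised by the rate lambda = 1 / mean.
std::vector<double> make_burst_times(std::size_t count, double mean, unsigned seed)
{
    std::mt19937 gen(seed);                                  // one engine, seeded once
    std::exponential_distribution<double> dist(1.0 / mean);  // rate, not mean
    std::vector<double> burst(count);
    for (double& b : burst)
        b = dist(gen);                                       // each draw is non-negative
    return burst;
}
```

In a real run the seed could come from std::random_device; a fixed seed keeps a simulation repeatable.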

If the size of burst is nPages, then
for (int i = 0; i < nPages; i++)
{
//...
burst[i]=(burst[i-1]+burst[i+1])/2;
}
will step outside its bounds, so you are likely to end up with nonsense.
You need to think about what is required at the edges.
As for the comments about rand, the talk "rand() Considered Harmful" is worth a watch. In your case, taking the log of 0 is not sensible.
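One possible reading of the symptoms, offered as a guess rather than a diagnosis: an exponential variate with mean 10 falls below 1 with probability 1 - e^(-0.1), about 9.5%, which is close to the 1-in-10 rate reported. At that moment burst[i+1] has not been assigned yet, so unless the array was initialised beforehand the average mixes in an indeterminate value. A minimal sketch that redraws instead of averaging neighbours, and keeps the uniform variate away from 0 (the function names are illustrative):

```cpp
#include <cmath>
#include <cstdlib>

// Map rand() into (0, 1] so that log() never receives 0.
double uniform_open0()
{
    return (std::rand() + 1.0) / (RAND_MAX + 1.0);
}

// Exponential variate with the given mean via the inverse transform.
double exponential(double mean)
{
    return -mean * std::log(uniform_open0());
}

// Redraw until the value lies in [lo, hi]; no neighbouring elements are read.
double bounded_exponential(double mean, double lo, double hi)
{
    double x = exponential(mean);
    while (x < lo || x > hi)
        x = exponential(mean);
    return x;
}
```

With this, the loop body would reduce to burst[i] = bounded_exponential(10, 1, 150). Note that rejecting values outside the range yields a truncated exponential, so the sample mean is no longer exactly 10.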

Using your exponential function copied verbatim, I cannot reproduce the error you describe. Issues with the PRNG cranking out either 0 or RAND_MAX should only show up one time out of RAND_MAX apiece, not 10% of the time. I suspect either a buggy compiler, or that what you have shared is not the actual code that produces the described problem.


(Why) is the std::binomial_distribution biased for large probabilities p and slow for small n?

I want to generate binomially distributed random numbers in C++. Speed is a major concern. Not knowing a lot about random number generators, I use the standard library's tools. My code looks something like this:
#include <random>
static std::random_device random_dev;
static std::mt19937 random_generator{random_dev()};
std::binomial_distribution<int> binomial_generator;
void RandomInit(int s) {
//I create the generator object here to save time. Does this make sense?
binomial_generator = std::binomial_distribution<int>(1, 0.5);
random_generator.seed(s);
}
int binomrand(int n, double p) {
binomial_generator.param(std::binomial_distribution<int>::param_type(n, p));
return binomial_generator(random_generator);
}
To test my implementation, I have built a cython wrapper and then executed and timed the function from within python. For reference I have also implemented a "stupid" binomial distribution, which just returns the sum of Bernoulli trials.
int binomrand2(int n, double p) {
int result = 0;
for (int i = 0; i<n; i++) {
if (_Random() < p) //_Random is a thoroughly tested custom random number generator on U[0,1)
result++;
}
return result;
}
Timing showed that the latter implementation is about 50% faster than the former if n < 25. Furthermore, for p = 0.95, the former yielded significantly biased results (the mean over 1000000 trials for n = 40 was 38.23037; standard deviation is 0.0014; the result was reproducible with different seeds).
Is this a (known) issue with the standard library's functions or is my implementation wrong? What could I do to achieve my goal of obtaining accurate results with high efficiency?
The parameter n will mostly be below 100 and smaller values will occur more frequently.
I am open to suggestions outside the realm of the standard library, but I may not be able to use external software libraries.
I am using the VC 2019 compiler on 64bit Windows.
Edit
I have also tested the bias without using python:
double binomrandTest(int n, double p, long long N) {
long long result = 0;
for (long long i = 0; i<N; i++) {
result += binomrand(n, p);
}
return ((double) result) / ((double) N);
}
The result remained biased (38.228045 for the parameters above, where something like 38.000507 would be expected).
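For reference, the figures quoted above can be checked against theory: Binomial(n, p) has mean n * p and variance n * p * (1 - p), so the sample mean over N independent draws has standard error sqrt(n * p * (1 - p) / N). A small sketch, with illustrative helper names:

```cpp
#include <cmath>

// Theoretical mean of Binomial(n, p).
double binomial_mean(int n, double p)
{
    return n * p;
}

// Standard error of the sample mean over `trials` independent draws:
// sqrt(n * p * (1 - p) / trials).
double binomial_mean_stderr(int n, double p, long long trials)
{
    return std::sqrt(n * p * (1.0 - p) / static_cast<double>(trials));
}
```

For n = 40, p = 0.95 and 1000000 trials this gives a mean of 38 and a standard error of about 0.0014, so an observed mean near 38.23 lies far outside ordinary sampling noise.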

Which machines support nondeterministic random_device?

I need to obtain data from different C++ random number generation algorithms, and for that purpose I created some programs. Some of them use pseudo-random number generators and others use random_device (nondeterministic random number generator). The following program belongs to the second group:
#include <iostream>
#include <vector>
#include <cmath>
#include <random>
using namespace std;
const int N = 5000;
const int M = 1000000;
const int VALS = 2;
const int ESP = M / VALS;
int main() {
for (int i = 0; i < N; ++i) {
random_device rd;
if (rd.entropy() == 0) {
cout << "No support for nondeterministic RNG." << endl;
break;
} else {
mt19937 gen(rd());
uniform_int_distribution<int> distrib(0, 1);
vector<int> hist(VALS, 0);
for (int j = 0; j < M; ++j) ++hist[distrib(gen)];
int Y = 0;
for (int j = 0; j < VALS; ++j) Y += abs(hist[j] - ESP);
cout << Y << endl;
}
}
}
As you can see in the code, I check for the entropy to be greater than 0. I do this because:
Unlike the other standard generators, this [random_device] is not meant to be an
engine that generates pseudo-random numbers, but a generator based on
stochastic processes to generate a sequence of uniformly distributed
random numbers. Although, certain library implementations may lack the
ability to produce such numbers and employ a random number engine to
generate pseudo-random values instead. In this case, entropy returns
zero. Source
Checking the value of the entropy allows me to abort the data collection if the resulting data is going to be pseudo-random (not nondeterministic). Please note that I assume that if rd.entropy() == 0 is true, then we are in pseudo-random mode.
Unfortunately, all my trials result in a file with no data because of entropy being 0. My question is: what can I do to my computer, or where can I find a machine that allows me to obtain the data?
The source you cite is misleading you. The standard says that
double entropy() const noexcept;
Returns: If the implementation employs a random number engine, returns 0.0. Otherwise, returns an entropy estimate for the random numbers returned by operator(), in the range min() to log2(max()+1).
And a better reference has some empirical observations
Notes
This function is not fully implemented in some standard libraries. For
example, LLVM libc++ always returns zero even though the device is
non-deterministic. In comparison, Microsoft Visual C++ implementation
always returns 32, and boost.random returns 10.
In practice, nearly all the main implementations (targeting general purpose computers) have non-deterministic std::random_devices. Your test has a very high false negative rate.
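A minimal sketch of collecting data without gating on entropy(), assuming the goal is to sample the device itself. In the question's program the histogram is filled from an mt19937 that is only seeded by random_device, so only the seed comes from the device. The function name draw_bits is illustrative.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Draw `count` bits straight from std::random_device, without checking entropy().
// Whether the device is truly non-deterministic depends on the implementation.
std::vector<int> draw_bits(std::size_t count)
{
    std::random_device rd;
    std::vector<int> bits(count);
    for (int& b : bits)
        b = static_cast<int>(rd() & 1u);   // keep the lowest bit of each draw
    return bits;
}
```

Calling the device for every value is typically much slower than using a seeded engine, so this only makes sense when the device's own output is what is being studied.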

First random number is always smaller than rest

I happened to notice that in C++ the first random number returned by the standard rand() function is usually significantly smaller than the second one. With the Qt implementation, the first one is nearly always several orders of magnitude smaller.
qsrand(QTime::currentTime().msec());
qDebug() << "qt1: " << qrand();
qDebug() << "qt2: " << qrand();
srand((unsigned int) time(0));
std::cout << "std1: " << rand() << std::endl;
std::cout << "std2: " << rand() << std::endl;
output:
qt1: 7109361
qt2: 1375429742
std1: 871649082
std2: 1820164987
Is this intended, due to error in seeding or a bug?
Also, while the qrand() output varies strongly, the first rand() output seems to change linearly with time. I just wonder why.
I'm not sure that could be classified as a bug, but it has an explanation. Let's examine the situation:
Look at rand's implementation. You'll see it's just a calculation using the last generated value.
You're seeding using QTime::currentTime().msec(), which is by nature bounded to the small range 0..999, but qsrand accepts a uint, whose range is 0..4294967295.
By combining those two factors, you have a pattern.
Just out of curiosity: try seeding with QTime::currentTime().msec() + 100000000
Now the first value will probably be bigger than the second most of the time.
I wouldn't worry too much. This "pattern" seems to happen only on the first two generated values. After that, everything seems to go back to normal.
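The mechanism can be illustrated with a toy linear congruential step. This is an assumption made for illustration: the constants are those of the sample rand() given in the C standard, and the real rand() or qrand() on a given platform may work differently.

```cpp
#include <cstdint>

// One step of a toy LCG: next = a * state + c (mod 2^32).
// Illustrative only; not the actual rand()/qrand() implementation.
std::uint32_t lcg_step(std::uint32_t state)
{
    return state * 1103515245u + 12345u;   // unsigned arithmetic wraps mod 2^32
}
```

Because the first output is a linear function of the seed, neighbouring seeds produce first states that differ by the constant multiplier (modulo 2^32). That fits the observation that the first value changes linearly with a time-based seed, and that seeds confined to 0..999 cover only a narrow, structured slice of possible first values.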
EDIT:
To make things more clear, try running the code below. It'll compare the first two generated values to see which one is smaller, using all possible millisecond values (range: 0..999) as the seed:
int totalCalls, leftIsSmaller = 0;
for (totalCalls = 0; totalCalls < 1000; totalCalls++)
{
qsrand(totalCalls);
if (qrand() < qrand())
leftIsSmaller++;
}
qDebug() << (100.0 * leftIsSmaller) / totalCalls;
It will print 94.8, which means 94.8% of the time the first value will be smaller than the second.
Conclusion: when using the current millisecond to seed, you'll see that pattern for the first two values. I did some tests here and the pattern seems to disappear after the second value is generated. My advice: find a "good" value to call qsrand (which should obviously be called only once, at the beginning of your program). A good value should span the whole range of the uint class. Take a look at this other question for some ideas:
Recommended way to initialize srand?
Also, take a look at this:
PCG: A Family of Better Random Number Generators
Neither current Qt nor the C standard runtime has a quality randomizer, as your test shows. Qt seems to use the C runtime for that (this is easy to check, but why bother). If C++11 is available in your project, use a much better and more reliable method:
#include <random>
#include <chrono>
auto seed = std::chrono::system_clock::now().time_since_epoch().count();
std::default_random_engine generator(seed);
std::uniform_int_distribution<uint> distribution;
uint randomUint = distribution(generator);
There is a good video that covers the topic. As noted by commenter user2357112, we can apply different random engines and then different distributions, but for my specific use the above worked really well.
Keeping in mind that making judgments about a statistical phenomenon based on a small number of samples might be misleading, I decided to run a small experiment. I ran the following code:
int main()
{
int i = 0;
int j = 0;
while (i < RAND_MAX)
{
srand(time(NULL));
int r1 = rand();
int r2 = rand();
if (r1 < r2)
++j;
++i;
if (i%10000 == 0) {
printf("%g\n", (float)j / (float)i);
}
}
}
This printed the fraction of runs in which the first generated number was smaller than the second. The plot of that ratio (figure not reproduced here) approaches 0.5 after fewer than 50 actual new seeds.
As suggested in the comment, we could modify the code to use consecutive seeds every iteration and speed up the convergence:
int main()
{
int i = 0;
int j = 0;
int t = time(NULL);
while (i < RAND_MAX)
{
srand(t);
int r1 = rand();
int r2 = rand();
if (r1 < r2)
++j;
++i;
if (i%10000 == 0) {
printf("%g\n", (float)j / (float)i);
}
++t;
}
}
This gives a ratio (plot not reproduced here) that stays pretty close to 0.5 as well.
While rand is certainly not the best pseudo random number generator, the claim that it often generates a smaller number during the first run does not seem to be warranted.
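The experiment can be packaged as a small function so it is easy to rerun over any block of consecutive seeds. This is a sketch with an illustrative name; the result depends on the platform's rand() implementation.

```cpp
#include <cstdlib>

// Fraction of seeds in [first_seed, first_seed + count) for which the first
// rand() value after srand(seed) is smaller than the second.
double fraction_first_smaller(unsigned first_seed, unsigned count)
{
    unsigned smaller = 0;
    for (unsigned k = 0; k < count; ++k) {
        std::srand(first_seed + k);
        int r1 = std::rand();   // separate statements fix the order of the two calls
        int r2 = std::rand();
        if (r1 < r2)
            ++smaller;
    }
    return count ? static_cast<double>(smaller) / count : 0.0;
}
```

For example, fraction_first_smaller(0, 1000) covers the seeds that a millisecond-based seed can take, while starting from a time(NULL) value mirrors the loop above.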

Invalid argument for mersenne twister::seed (C++)

I have created a simulation environment which has several stochastic parts involved. I draw numbers from normal, uniform and lognormal distributions. In most of the cases this runs fine, however, when I decide to do 100 simulations after each other I am getting the error:
R6010 Abort() has been called.
In my console I get the error: invalid argument for mersenne_twister::seed. However, I am only using the standard pseudo-random number generator rand(). At no point do I call mersenne_twister, so this is probably a method used by std::normal_distribution.
Furthermore, I don't see why my seed value is invalid after X iterations and not for the first X iterations.
Does anyone have any experience with this error? Does anyone have any suggestions on how to solve this?
P.S. srand(time(0)) is called only once, at the beginning of main, while all random numbers are generated in a second class, "random_num".
P.P.S. I am aware that this might not be the best way to generate random numbers; however, it is sufficient for my purpose.
The code as requested for the RNG:
double random_num::uniform(int lb, int ub)//Generate uniformly distributed random numbers with lowerbound lb and upperbound ub
{
//srand(time(0));
double number;
number =((double) rand() / (RAND_MAX+1)) * (ub-lb+1) + lb;
return number;
}
double random_num::normal(double mean, double var) //Generate normally distributed random numbers with mean and variance
{
//srand(time(0));
default_random_engine generator (rand());
normal_distribution<double> distribution(mean, var);
return distribution(generator);
}
double random_num::lognormal(double mean, double var, double offset)
{
//srand(time(0));
random_num dummy;
double random;
random = exp(dummy.normal(mean,var))-offset; //Calculate the 3 parameter lognormal
return random;
}
@lip The problem was indeed that rand() returned a zero at some moment, and therefore default_random_engine generator(0); aborted.
The solution was quite simple:
Create a function that keeps calling rand() until it returns a non-zero value:
int rand0()
{
int dummy = rand();
while(dummy==0)
{
dummy = rand();
}
return dummy;
}
And then: default_random_engine generator(rand0());
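An alternative sketch that sidesteps the reseeding altogether: keep one engine per object, seed it once, and draw every variate from it. The class below is an illustration and not the asker's actual random_num. It also assumes that the second argument really is a variance; std::normal_distribution expects a standard deviation, so the sketch takes the square root.

```cpp
#include <cmath>
#include <random>

// Sketch: one engine owned by the object and seeded once, instead of
// constructing a new default_random_engine from rand() on every call.
class RandomNum {
public:
    explicit RandomNum(unsigned seed) : engine_(seed) {}

    // std::normal_distribution takes (mean, standard deviation), so a
    // variance has to be converted with sqrt first.
    double normal(double mean, double var)
    {
        std::normal_distribution<double> dist(mean, std::sqrt(var));
        return dist(engine_);
    }

    // Three-parameter lognormal, mirroring the formula in the question.
    double lognormal(double mean, double var, double offset)
    {
        return std::exp(normal(mean, var)) - offset;
    }

private:
    std::mt19937 engine_;   // state persists between calls
};
```

Since the engine is constructed only once, the value of any single rand() call no longer matters; a non-zero seed is still assumed here, given that the asker's toolchain rejected 0.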

How do you create pseudo random numbers sequentially in c/c++?

srand(time(NULL));
for (it=hand.begin(); it < hand.end(); it++)
(*it) = rand() % 13 + 1;
This code does not work to create many random numbers at a time.
Is there a way to do it that isn't as complex as Mersennes and isn't operating system dependent?
PRNGs don't create many PRNs at once. Each output depends on the previous one; PRNGs are highly stateful.
Try:
srand(time(NULL)); // once at the start of the program
for( int i = 0; i < N; ++i )
r[i] = rand();
Even APIs that return an entire block of output in a single function call have just moved that loop inside the function.
Call srand just once, at the start of your program. Then call rand() (not srand(rand())) to generate each random number.
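Applied to the hand from the question, that advice might look like the sketch below. The function name deal_hand is illustrative, and the caller is assumed to have called srand exactly once beforehand.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Fill a hand with `count` values in 1..13 using successive rand() calls.
// The caller is expected to have seeded with srand() once at program start.
std::vector<int> deal_hand(std::size_t count)
{
    std::vector<int> hand(count);
    for (int& card : hand)
        card = std::rand() % 13 + 1;   // each call advances the generator state
    return hand;
}
```

Typical use is std::srand(std::time(NULL)); once in main (with <ctime> included), followed by as many deal_hand calls as needed.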
Boost.Random has lots of good random number generators which are easy to use.
George Marsaglia posted a Multiply With Carry PRNG some time ago in sci.math.
I cannot say how good it is or how well it behaves, but you might want to give it a try.
It should be OS and platform independent.
"please make sure you answer the question"
OK
for (int i=n1; i < n2; ++i)
{
int k;
do k = rand(); while (i !=k);
// k is a sequential pseudo random number
}
There may be issues with efficiency...