Millions of random numbers generated "overflow" rand_r?

I am having trouble with rand_r. I have a simulation that generates millions of random numbers, and I have noticed that after a certain point in time these numbers are no longer uniform. What could be the problem?
What I do: I create an instance of a generator and give it its own seed.
mainRGen= new nativeRandRUni(idumSeed_g);
here is the class/object def:
class nativeRandRUni {
public:
unsigned seed;
nativeRandRUni(unsigned sd){ seed= sd; }
float genP() { return (rand_r(&seed))/float(RAND_MAX); } // [0,1]
int genI(int R) { return (rand_r(&seed) % R); } // [0,R-1]
};
numbers are simply generated by:
newIntNumber= mainRGen->genI(desired_max);
newFloatNumber= mainRGen->genP();
The simulations have the problem described above. I know this is happening because I checked the distribution of the generated numbers after the point in time at which a signature shows up in the results (see this, top image, http://ubuntuone.com/0tbfidZaXfGNTfiVr3x7DR).
Also, if I print the seed at t-1 and t, t being the time point of the signature, I can see the seed changing by an order of magnitude, from 263069042 to 1069048066.
If I run the code with a different seed, the problem is always present but at a different time point.
Also, if I use rand() instead of my object, all goes well... I DO need the object because I sometimes use threads. The example above does not have threads.
I am really lost here, any clues?
EDIT - EDIT
It can be reproduced by looping enough times; the problem is that, like I said, it takes millions of iterations for it to arise. For seed -158342163 I get it at generation t=134065568. One can check the numbers generated before (uniform) and after (not uniform). I get the same problem if I change the seed manually at given t's, see (*) in the code. Something I also would not expect to happen?
#include <stdlib.h> // rand_r (POSIX) and RAND_MAX
#include <fstream>
#include <sstream>
#include <iostream>
using std::ofstream;
using std::cout;
using std::endl;
class nativeRandRUni {
public:
unsigned seed;
long count;
nativeRandRUni(unsigned sd){ seed= sd; count=0; }
float genP() { count++; return (rand_r(&seed))/float(RAND_MAX); } // [0,1]
int genI(int R) { count++; return (rand_r(&seed) % R); } // [0,R-1]
};
int main(int argc, char *argv[]){
long timePointOfProblem= 134065568;
nativeRandRUni* mainRGen= new nativeRandRUni(-158342163);
int rr;
//ofstream* fout_metaAux= new ofstream();
//fout_metaAux->open("random.numbers");
for(int i=0; i< timePointOfProblem; i++){
rr= mainRGen->genI(1009200);
//(*fout_metaAux) << rr << endl;
//if(i%1000==0) mainRGen->seed= 111111; //(*) FORCE
}
//fout_metaAux->close();
}

Given that random numbers are key to your simulation, you should implement your own generator. I don't know what algorithm rand_r is using, but it could be something pretty crappy like a linear congruential generator.
I'd look into implementing something fast and with good qualities where you know the underlying algorithm. I'd start by looking at implementing Mersenne Twister:
http://en.wikipedia.org/wiki/Mersenne_twister
It's simple to implement and very fast - requires no divides.
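
For what it's worth, hand-implementing the algorithm may not be necessary: C++11 ships a Mersenne Twister as std::mt19937 in <random> (TR1 and Boost have equivalents). Below is a minimal sketch of the asker's class on top of it. The class name MtUni is made up for illustration, and the sketch assumes each thread owns its own instance, as with the rand_r version.

```cpp
#include <cassert>
#include <random>

// Hypothetical stand-in for nativeRandRUni: one engine per object, so each
// thread can own an instance without sharing any state.
class MtUni {
public:
    explicit MtUni(unsigned seed) : engine_(seed) {}

    // Uniform float, nominally in [0, 1).
    float genP() {
        return std::uniform_real_distribution<float>(0.0f, 1.0f)(engine_);
    }

    // Uniform int in [0, R-1]; R must be positive.
    int genI(int R) {
        assert(R > 0);
        return std::uniform_int_distribution<int>(0, R - 1)(engine_);
    }

private:
    std::mt19937 engine_;
};
```

Using a distribution instead of `% R` also avoids the small modulo bias of the original genI, although that bias alone is not claimed here to explain the observed signature.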

I ended up trying a simple solution from Boost, changing the generator to:
class nativeRandRUni {
public:
typedef mt19937 EngineType;
typedef uniform_real<> DistributionType;
typedef variate_generator<EngineType, DistributionType> VariateGeneratorType;
nativeRandRUni(long s, float min, float max) : gen(EngineType(s), DistributionType(min, max)) {}
VariateGeneratorType gen;
};
I don't get the problem anymore... though it solved it, I don't feel very comfortable not understanding what it was. I think Rafael is right, I should not trust rand_r for this many generations.
Now, this is slower than before, so i may look for ways of optimizing it.
QUESTION: Would a Mersenne Twister implementation in principle be faster?
and thanks to all!

Related

C++: How to generate random numbers while excluding numbers from a given cache

So in c++ I'm using mt19937 engine and the uniform_int_distribution in my random number generator like so:
#include <random>
#include <time.h>
int get_random(int lwr_lm, int upper_lm){
std::mt19937 mt(time(nullptr));
std::uniform_int_distribution<int> dist(lwr_lm, upper_lm);
return dist(mt);
}
What I need is to alter the above generator such that there is a cache that contains a number of integers I need to be excluded when I use the above generator over and over again.
How do I alter the above such that I can achieve this?

There are many ways to do it. A simple way would be to maintain your "excluded numbers" in a std::set and after each generation of a random number, check whether it is in the set and if it is then generate a new random number - repeat until you get a number that was not in the set, then return that.
By the way, while distributions are cheap to construct, engines are not. You don't want to re-construct your mt19937 every time the function is called, but instead create it once and then re-use it. You probably also want to use a better seed than the current time in seconds.
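
A minimal sketch of the approach described above, assuming the excluded numbers live in a std::set and the engine is created once by the caller and passed in by reference. The function name get_random_excluding is made up for illustration.

```cpp
#include <cassert>
#include <random>
#include <set>

// Sketch: draw from [lwr_lm, upper_lm], retrying while the value is excluded.
// The engine is passed in so it is constructed and seeded only once.
// Precondition: at least one value in the range is not excluded, otherwise
// the loop never terminates.
int get_random_excluding(int lwr_lm, int upper_lm,
                         const std::set<int>& excluded, std::mt19937& mt) {
    assert(lwr_lm <= upper_lm);
    std::uniform_int_distribution<int> dist(lwr_lm, upper_lm);
    int r = dist(mt);
    while (excluded.count(r) != 0) {
        r = dist(mt);  // rejected, draw again
    }
    return r;
}
```

The caller could seed the engine once with something like std::mt19937 mt(std::random_device{}()); instead of the current time. The expected number of retries grows as the excluded set covers more of the range.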

Are you 1) attempting to sample without replacement in the discrete interval? Or is it 2) a patchy distribution over the interval that stays fairly constant?
If 1), you could use std::shuffle as per the answer here: How to sample without replacement using c++ uniform_int_distribution
If 2), you could use std::discrete_distribution (element 0 corresponding to lwr_lm) and give zero weight to the numbers you don't want. Obviously the memory requirements are linear in upper_lm-lwr_lm, so this might not be practical if the range is large.
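
Option 2) could look roughly like the sketch below. The helper name make_excluding_distribution is hypothetical, and it assumes at least one value in the range keeps a non-zero weight.

```cpp
#include <cassert>
#include <random>
#include <set>
#include <vector>

// Sketch: build a distribution over [lwr_lm, upper_lm] in which excluded
// values get weight 0 and all others weight 1. Index 0 corresponds to lwr_lm,
// so the caller adds lwr_lm to each sampled index.
std::discrete_distribution<int> make_excluding_distribution(
        int lwr_lm, int upper_lm, const std::set<int>& excluded) {
    assert(lwr_lm <= upper_lm);
    std::vector<double> weights;
    for (int v = lwr_lm; v <= upper_lm; ++v) {
        weights.push_back(excluded.count(v) ? 0.0 : 1.0);
    }
    return std::discrete_distribution<int>(weights.begin(), weights.end());
}
```

A draw is then lwr_lm + dist(mt) for some engine mt. The weight vector has one entry per value in the range, which is the linear memory cost mentioned above.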

I would propose two similar solutions for the problem. They are based on probabilistic structures, and give you the answer "potentially in cache" or "definitely not in cache". There are false positives but no false negatives.
1. Perfect hash function. There are many implementations, including one from GNU. Basically, run it on the set of cache values, and use the generated perfect hash function to reject sampled values. You don't even need to maintain a hash table, just the function mapping a random value to an integer index. As soon as the index is in the hash range, reject the number. Being perfect means you need only one call to check, and the result will tell you that the number is in the set. There are potential collisions, so false positives are possible.
2. Bloom filter. Same idea: build a filter with whatever bits per cache item you're willing to spare, and with a quick check you will get either a "possibly in the cache" answer or a clear negative. You can trade answer precision for memory and vice versa. False positives are possible.
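
To make the Bloom filter idea concrete, here is a minimal sketch for ints. The class name IntBloomFilter, the choice of std::hash plus a multiplicative mix for the second hash, and the parameters num_bits and num_hashes are all assumptions made for illustration; a real deployment would size them from the expected cache size and an acceptable false-positive rate.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Minimal Bloom filter sketch for ints. Bit positions are derived from two
// hash values (double hashing): position_i = (h1 + i * h2) mod num_bits.
class IntBloomFilter {
public:
    IntBloomFilter(std::size_t num_bits, std::size_t num_hashes)
        : bits_(num_bits, false), num_hashes_(num_hashes) {
        assert(num_bits > 0 && num_hashes > 0);
    }

    void insert(int value) {
        for (std::size_t i = 0; i < num_hashes_; ++i) {
            bits_[position(value, i)] = true;
        }
    }

    // false: definitely never inserted.
    // true: possibly inserted (may be a false positive).
    bool possibly_contains(int value) const {
        for (std::size_t i = 0; i < num_hashes_; ++i) {
            if (!bits_[position(value, i)]) return false;
        }
        return true;
    }

private:
    std::size_t position(int value, std::size_t i) const {
        const std::size_t h1 = std::hash<int>()(value);
        // Second hash: mix h1 with a multiplicative constant, forced odd.
        const std::size_t h2 = (h1 * 0x9E3779B97F4A7C15ull) | 1u;
        return (h1 + i * h2) % bits_.size();
    }

    std::vector<bool> bits_;
    std::size_t num_hashes_;
};
```

A sampler would reject any draw for which possibly_contains returns true, which occasionally discards a value that was never in the cache.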

As mentioned by @virgesmith in his answer, this might be a better solution depending on your problem.
Keeping a cache and using it to filter future draws is inefficient for a large range.
Here is a naive example with a different method, though you will be limited by your memory: you pick a random number from a buffer and remove it for the next iteration.
#include <random>
#include <time.h>
#include <iostream>
#include <numeric> // std::iota
#include <vector>
int get_random(int lwr_lm, int upper_lm, std::vector<int> &buff, std::mt19937 &mt){
if (buff.size() > 0) {
std::uniform_int_distribution<int> dist(0, buff.size()-1);
int tmp_index = dist(mt);
int tmp_value = buff[tmp_index];
buff.erase(buff.begin() + tmp_index);
return tmp_value;
} else {
return 0;
}
}
int main() {
// lower and upper limit for random distribution
int lower = 0;
int upper = 10;
// Random generator
std::mt19937 mt(time(nullptr));
// Buffer of candidates that avoids duplicates: holds every integer from lower to upper, inclusive
std::vector<int> my_buffer(upper-lower+1);
std::iota(my_buffer.begin(), my_buffer.end(), lower);
for (int i = 0; i < 20; ++i) {
std::cout << get_random(lower, upper, my_buffer, mt) << std::endl;
}
return 0;
}
Edit: a cleaner solution here

It might not be the prettiest solution, but what's stopping you from maintaining that cache and checking existence before returning? It will slow down for large caches though.
#include <random>
#include <time.h>
#include <set>
std::set<int> cache;
int get_random(int lwr_lm, int upper_lm){
static std::mt19937 mt(time(nullptr)); // construct and seed once, not on every call
std::uniform_int_distribution<int> dist(lwr_lm, upper_lm);
auto r = dist(mt);
while(cache.find(r) != cache.end())
r = dist(mt);
return r;
}

How to use <random> to replace rand()?

C++11 introduced the header <random> with declarations for random number engines and random distributions. That's great - time to replace those uses of rand() which is often problematic in various ways. However, it seems far from obvious how to replace
srand(n);
// ...
int r = rand();
Based on the declarations it seems a uniform distribution can be built something like this:
std::default_random_engine engine;
engine.seed(n);
std::uniform_int_distribution<> distribution;
auto rand = [&](){ return distribution(engine); };
This approach seems rather involved and is surely something I won't remember unlike the use of srand() and rand(). I'm aware of N4531 but even that still seems to be quite involved.
Is there a reasonably simple way to replace srand() and rand()?

Is there a reasonably simple way to replace srand() and rand()?
Full disclosure: I don't like rand(). It's bad, and it's very easily abused.
The C++11 random library fills in a void that has been lacking for a long, long time. The problem with high quality random libraries is that they're oftentimes hard to use. The C++11 <random> library represents a huge step forward in this regard. A few lines of code and I have a very nice generator that behaves very nicely and that easily generates random variates from many different distributions.
Given the above, my answer to you is a bit heretical. If rand() is good enough for your needs, use it. As bad as rand() is (and it is bad), removing it would represent a huge break with the C language. Just make sure that the badness of rand() truly is good enough for your needs.
C++14 didn't deprecate rand(); it only deprecated functions in the C++ library that use rand(). While C++17 might deprecate rand(), it won't delete it. That means you have several more years before rand() disappears. The odds are high that you will have retired or switched to a different language by the time the C++ committee finally does delete rand() from the C++ standard library.
I'm creating random inputs to benchmark different implementations of std::sort() using something along the lines of std::vector<int> v(size); std::generate(v.begin(), v.end(), std::rand);
You don't need a cryptographically secure PRNG for that. You don't even need Mersenne Twister. In this particular case, rand() probably is good enough for your needs.
Update
There is a nice simple replacement for rand() and srand() in the C++11 random library: std::minstd_rand.
#include <random>
#include <iostream>
int main ()
{
std::minstd_rand simple_rand;
// Use simple_rand.seed() instead of srand():
simple_rand.seed(42);
// Use simple_rand() instead of rand():
for (int ii = 0; ii < 10; ++ii)
{
std::cout << simple_rand() << '\n';
}
}
The function std::minstd_rand::operator()() returns a std::uint_fast32_t. However, the algorithm restricts the result to between 1 and 2^31 - 2, inclusive. This means the result will always convert safely to a std::int_fast32_t (or to an int if int is at least 32 bits long).
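
The stated bounds can be checked at compile time, since the engine exposes them as constants. The sketch below does that and wraps the conversion to a signed type in a small helper; next_signed is a made-up name for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// The bounds are compile-time constants of the engine type.
static_assert(std::minstd_rand::min() == 1u, "lowest value is 1");
static_assert(std::minstd_rand::max() == 2147483646u, "highest value is 2^31 - 2");

// Sketch: return the next value as a signed integer of at least 32 bits.
// The conversion is safe because the value never exceeds 2^31 - 2.
inline std::int_fast32_t next_signed(std::minstd_rand& eng) {
    const std::int_fast32_t value = static_cast<std::int_fast32_t>(eng());
    assert(value >= 1 && value <= 2147483646);
    return value;
}
```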

How about randutils by Melissa O'Neill of pcg-random.org?
From the introductory blog post:
randutils::mt19937_rng rng;
std::cout << "Greetings from Office #" << rng.uniform(1,17)
<< " (where we think PI = " << rng.uniform(3.1,3.2) << ")\n\n"
<< "Our office morale is " << rng.uniform('A','D') << " grade\n";

Assuming you want the behavior of the C-style rand and srand functions, including their quirkiness, but with good random, this is the closest I could get.
#include <random>
#include <cstdlib> // RAND_MAX (might be removed soon?)
#include <climits> // INT_MAX (use as replacement?)
namespace replacement
{
constexpr int rand_max {
#ifdef RAND_MAX
RAND_MAX
#else
INT_MAX
#endif
};
namespace detail
{
inline std::default_random_engine&
get_engine() noexcept
{
// Seeding with 1 is silly, but required behavior
static thread_local auto rndeng = std::default_random_engine(1);
return rndeng;
}
inline std::uniform_int_distribution<int>&
get_distribution() noexcept
{
static thread_local auto rnddst = std::uniform_int_distribution<int> {0, rand_max};
return rnddst;
}
} // namespace detail
inline int
rand() noexcept
{
return detail::get_distribution()(detail::get_engine());
}
inline void
srand(const unsigned seed) noexcept
{
detail::get_engine().seed(seed);
detail::get_distribution().reset();
}
inline void
srand()
{
std::random_device rnddev {};
srand(rnddev());
}
} // namespace replacement
The replacement::* functions can be used exactly like their std::* counterparts from <cstdlib>. I have added a srand overload that takes no arguments and seeds the engine with a “real” random number obtained from a std::random_device. How “real” that randomness will be is of course implementation defined.
The engine and the distribution are held as thread_local static instances so they carry state across multiple calls but still allow different threads to observe predictable sequences. (It's also a performance gain because you don't need to re-construct the engine or use locks and potentially thrash other people's caches.)
I've used std::default_random_engine because you did but I don't like it very much. The Mersenne Twister engines (std::mt19937 and std::mt19937_64) produce much better “randomness” and, surprisingly, have also been observed to be faster. I don't think that any compliant program must rely on std::rand being implemented using any specific kind of pseudo random engine. (And even if it did, implementations are free to define std::default_random_engine to whatever they like so you'd have to use something like std::minstd_rand to be sure.)

Abusing the fact that engines return values directly
All engines defined in <random> have an operator()() that retrieves the next generated value and advances the internal state of the engine.
std::mt19937 rand (seed); // or an engine of your choosing
for (int i = 0; i < 10; ++i) {
unsigned int x = rand ();
std::cout << x << std::endl;
}
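It should be noted, however, that all engines return a value of some unsigned integral type, so the value may not fit in a signed integral type; converting such an out-of-range value to a signed type gives an implementation-defined result before C++20 (and wraps modulo 2^N since C++20), not the value you started with.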
If you are fine with using unsigned values everywhere you retrieve a new value, the above is an easy way to replace usage of std::srand + std::rand.
Note: Using what has been described above might lead to some values having a higher chance of being returned than others, because the number of distinct values the engine can produce is generally not an exact multiple of the number of values in your destination range.
If you have not worried about this in the past, when using something like rand() % range + low, you should not worry about it now.
Note: You will need to make sure that the std::engine-type::result_type is at least as large as your desired range of values (std::mt19937::result_type is uint_fast32_t).
If you only need to seed the engine once
There is no need to first default-construct a std::default_random_engine (which is just a typedef for some engine chosen by the implementation) and later assign a seed to it; this can be done all at once by using the appropriate constructor of the random engine.
std::random-engine-type engine (seed);
If you however need to re-seed the engine, using std::random-engine::seed is the way to do it.
If all else fails; create a helper-function
Even if the code you have posted looks slightly complicated, you are only meant to write it once.
If you find yourself tempted to copy and paste what you have written into several places in your code, do what is always recommended when copy-pasting: introduce a helper function.
Intentionally left blank, see other posts for example implementations.

You can create a simple function like this:
#include <random>
#include <iostream>
int modernRand(int n) {
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(0, n);
return dis(gen);
}
And later use it like this:
int myRandValue = modernRand(n);
As mentioned here

How to use the C++11 random generators efficiently?

I am executing computational experiments, which need to be reproducible. Therefore each experiment uses its own random number generator and remembers its seed:
class Experiment
{
public:
void operator()();
private:
unsigned seed_;
std::mt19937 engine_;
};
The problem is that the engine needs to be passed down to the most elementary functions.
Let's say that somewhere 10 levels down the call stack there is a simple function that needs an engine to generate a random number between 0 and 1. Then that engine needs to be passed to each of those 10 calls, making the code a mess.
I considered and rejected these two approaches:
1. global engine:
I would have a global engine and all the elementary functions would call this engine. This could however cause problems if I wanted to run several experiments in different threads. I have zero experience in multithreading, but I got a lot of advice against anything global, especially in a multithreaded application and I do not want to make a step in the wrong direction.
2. local engine in each small function.
Each function would create an engine on the stack, use it and destroy it on return. This could however cause performance problems, since the random number generator is a big complicated object. On my implementation it has 5000 bytes.
What approach should I use?

The only way to get reproducible random numbers is to use the seed when you initialize your random number generators.
As for speed, you should not be too concerned, as these objects are not that big.
Here is an example
#include "stdafx.h"
// uniform_real_distribution
#include <iostream>
#include <random>
#include <vector>
#include <thread>
#include <functional> // std::ref
void generateNumbers(std::vector<double>& vRes, unsigned int nbNumbers, int seed, double & sum)
{
std::default_random_engine generator(seed); // fixing the seed makes the results reproducible, so you only need to keep track of one number
std::uniform_real_distribution<double> distribution(0.0,1.0);//uniform distribution between 0 and 1.0
sum=0.0;
vRes.resize(nbNumbers);
for (unsigned int i=0;i<nbNumbers;++i)
{
vRes[i]=distribution(generator);
sum+=vRes[i];
}
}
int _tmain(int argc, _TCHAR* argv[])
{
const unsigned int nbNumbers=1000000;
const int seed=100;
const int nbThreads=300;
std::vector<std::vector<double > > vTest(nbThreads);
std::vector<double> vSum(nbThreads);//vector of checksums: all numbers should be the same as we sum the same random numbers
for (int currThread=0;currThread<nbThreads;++currThread)
{
std::thread th(&generateNumbers, std::ref(vTest[currThread]), nbNumbers, seed, std::ref(vSum[currThread])); // reference parameters must be wrapped in std::ref
th.join();
}
return 0;
}
This code runs in less than 10 seconds in a Release build with Visual Studio 2012. It can be greatly improved by using fewer threads (thread creation is time consuming), but that gives the idea.
Hope that it helps,
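
As a smaller illustration of the same point (same seed, same numbers), here is a sketch of an experiment that owns its engine and records its seed, in the spirit of the Experiment class from the question. The class name SeededExperiment and the draw() method are made up for illustration. It uses std::mt19937 because its algorithm is fixed by the standard, whereas std::default_random_engine may differ between implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Sketch: an experiment that owns its engine and remembers the seed it used.
class SeededExperiment {
public:
    explicit SeededExperiment(unsigned seed) : seed_(seed), engine_(seed) {}

    unsigned seed() const { return seed_; }

    // Draw n values uniformly from [lo, hi) using this experiment's engine.
    std::vector<double> draw(std::size_t n, double lo, double hi) {
        assert(lo < hi);
        std::uniform_real_distribution<double> dist(lo, hi);
        std::vector<double> values(n);
        for (std::size_t i = 0; i < n; ++i) values[i] = dist(engine_);
        return values;
    }

private:
    unsigned seed_;
    std::mt19937 engine_;
};
```

Two instances built from the same seed produce the same values on a given standard library, so storing the seed is enough to re-run an experiment there.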

generating Random numbers using rand()

I'm trying to generate a bunch of random numbers using rand() % (range).
Here's how my code is set up:
srand(time(NULL));
someClass *obj = new someClass(rand()%range1,rand()%range2... etc ); // i.e. a number of random numbers one after the other
Whenever I run this, it seems all the calls to rand() generate the same number (edit: it seems the rand() calls do not all generate the same number after all, read the edit at the end). I tried doing it without the:
srand(time(NULL));
Then every execution of the program yields the same results.
Also, since all calls to rand() are in a constructor, I can't really reseed it all the time. I guess I can create all objects sent to the constructor beforehand and reseed the random number generator in between, but it seems like an inelegant solution.
How can I generate a bunch of different random numbers?
Edit: It seems that, because I was creating a lot of objects in a loop, srand(time(NULL)) was called again on every iteration and the sequence got reset (as time(NULL) has a resolution of a second); that's why all subsequent objects had very similar properties.

If you call srand once, then all subsequent rand calls will return (different) pseudorandom numbers. If they don't, you're doing it wrong. :)
Apart from this, rand is pretty useless. Boost.Random (or the C++11 standard library <random> header) provides much more powerful random number generators, with nicer, more modern interfaces as well (for example allowing you to have multiple independent generators, unlike rand which uses a single global seed)

Unless reseeded with a different starting point, rand() always returns the same sequence. That is actually a feature to make program tests repeatable!
So, you have to call srand if you want a different sequence for different runs. Perhaps you can do that before calling the first constructor?
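
The effect is easy to reproduce with a fixed seed standing in for time(NULL) within one second. The two function names below are made up for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Sketch: collect n values of rand() % range after seeding exactly once.
std::vector<int> draw_after_single_seed(unsigned seed, int n, int range) {
    assert(range > 0);
    std::srand(seed);
    std::vector<int> out;
    for (int i = 0; i < n; ++i) out.push_back(std::rand() % range);
    return out;
}

// Sketch of the mistake: reseeding with the same value before every call
// restarts the sequence, so every element is the same number.
std::vector<int> draw_with_reseed_each_time(unsigned seed, int n, int range) {
    assert(range > 0);
    std::vector<int> out;
    for (int i = 0; i < n; ++i) {
        std::srand(seed);  // same seed again, like time(NULL) within one second
        out.push_back(std::rand() % range);
    }
    return out;
}
```

The second function returns n copies of the first element of the first function's result, which matches the behaviour described in the question's edit.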

Call srand once at the beginning of the program. Then call rand()%range any time you want a random number. Here is an example for your situation that works pretty well:
#include <iostream>
#include <stdlib.h>
#include <time.h>
using namespace std;
class Test
{
public:
Test(int num0,int num1, int num2):num0_(num0),num1_(num1),num2_(num2){}
int num0_,num1_,num2_;
};
int main()
{
srand(time(NULL));
Test *test=new Test(rand()%100,rand()%100,rand()%100);
cout << test->num0_ << "\n";
cout << test->num1_ << "\n";
cout << test->num2_ << "\n";
delete test;
return 0;
}

check this code at: http://ideone.com/xV0R3#view_edit_box
#include<iostream>
#include <stdio.h>
#include <stdlib.h>
#include<time.h>
using namespace std;
int main()
{
int i=0;
srand(time(NULL));
while(i<10)
{
cout<<rand()<<endl;
i++;
}
return 0;
}
This produces different random numbers. You need to call srand() only once; rand() generates a different number every time after the srand() call.

Using boost::random and getting same sequence of numbers

I have the following code:
class B {
public:
void generator()
{
// creating random number generator
boost::mt19937 randgen(static_cast<unsigned int>(std::time(0)));
boost::normal_distribution<float> noise(0,1);
boost::variate_generator<boost::mt19937,
boost::normal_distribution<float> > nD(randgen, noise);
for (int i = 0; i < 100; i++)
{
float value = nD();
// graph each value
}
}
};
class A {
public:
void someFunction()
{
for(int i = 1; i <=3; i++)
{
std::shared_ptr<B> b;
b.reset(new B());
b->generator();
}
}
};
I wish to execute the above code multiple times in rapid succession to produce multiple graphs. I have also reviewed this stackoverflow question which is similar but the caveat states that when time(0) is used and the member function is called in rapid succession then you will still likely get the same sequence of numbers.
How might I overcome this problem?
EDIT: I've tried making randgen static in Class B, also tried making it a global variable in Class A, but each time the 3 graphs are still the same. I've also tried seeding from the GetSystemTime milliseconds. I must be missing something.

One way would be to not reseed the random number generator every time you execute your code.
Create the generator and seed it once, then just continue to use it.
That's assuming you're calling that code multiple times within the same run. If you're doing multiple runs (but still within the same second), you can use another differing property such as the process ID to change the seed.
Alternatively, you can go platform-dependent, using either the Windows GetSystemTime(), which fills a SYSTEMTIME structure with one of its elements being milliseconds, or the Linux gettimeofday(), which gives the seconds and microseconds since the epoch.
Windows:
#include <windows.h>
SYSTEMTIME st;
GetSystemTime (&st);
// Use st.wSecond * 1000 + st.wMilliseconds to seed (0 thru 59999).
Linux:
#include <sys/time.h>
struct timeval tv;
gettimeofday (&tv, NULL);
// Use (tv.tv_sec % 60) * 1000 + (tv.tv_usec / 1000) to seed (0 thru 59999).

With Boost.Random you can save the state of the random number generator--for example, you can save it to a text file. This is done with streams.
For example, using your code, after you seed the generator and have run it once, you can save the state with an output stream, like so:
std::ofstream generator_state_file("rng.saved");
generator_state_file << randgen;
Then later, when you've created a new generator, you can load the state back from that file using the opposite stream:
std::ifstream generator_state_file("rng.saved");
generator_state_file >> randgen;
And then use the state to generate some more random numbers, and then re-save the state, and so on and so on.
It may also be possible to save the state to a std::string using std::stringstream, if you don't want to use a file, but I haven't personally tried this.
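
The std::stringstream variant can be sketched as follows. This sketch uses std::mt19937 from <random> rather than boost::mt19937, which is an assumption made to keep it dependency-free; the standard engines provide the same operator<< and operator>>. The helper names save_state and load_state are made up.

```cpp
#include <cassert>
#include <random>
#include <sstream>
#include <string>

// Serialize the engine's full internal state to text.
std::string save_state(const std::mt19937& engine) {
    std::ostringstream out;
    out << engine;
    return out.str();
}

// Restore an engine from text produced by save_state.
std::mt19937 load_state(const std::string& text) {
    std::istringstream in(text);
    std::mt19937 engine;
    in >> engine;
    assert(!in.fail());
    return engine;
}
```

A restored engine continues the sequence exactly where the saved one left off.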

Only create a single random number generator so it's only seeded once:
static boost::mt19937 randgen(static_cast<unsigned int>(std::time(0)));

First Thoughts
On unix you could try reading some bytes from /dev/random or /dev/urandom for the seed. You could also try using a combination of time(0) + pid + static counter (or pseudo-random sequence).
I believe on windows, you can use QueryPerformanceCounter to get the value of the high performance timer register.
Another thought:
You could declare your mt19937 prng as a static or global so you never lose its state.
A third thought:
You wish to "execute the above code multiple times in rapid succession to produce multiple graphs": pass in a graph index (e.g. genGraph(int graphIndex)) and combine this (add, xor, etc.) with the output of time(0):
boost::mt19937 randgen(static_cast<unsigned int>(std::time(0) + graphIndex));

A late answer: two random-number generator functions for comparing boost with standard method.
boost
#include <boost/random.hpp>
#include <ctime> // time()
//the code that uses boost is massively non-intuitive, complex and obfuscated
bool _boost_seeded_=false;
/*--------------------*/int
boostrand(int High, int Low)
{
static boost::mt19937 random;
if (!_boost_seeded_)
{
random = boost::mt19937(time(0));
_boost_seeded_=true;
}
boost::uniform_int<> range(Low,High);
boost::variate_generator<boost::mt19937&, boost::uniform_int<> >
getrandom(random, range);
return getrandom();
}
standard
#include <cstdlib>
#include <time.h>
//standard code is straight-forward and quite understandable
bool _stdrand_seeded_=false;
/*--------------------*/int
stdrand(int High, int Low)
{
if (!_stdrand_seeded_)
{
srand(time(0));
_stdrand_seeded_=true;
}
return ((rand() % (High - Low + 1)) + Low);
}
The results from both functions are of comparable "randomness". I would apply the KISS principle.

If you do not want to use only one generator, you could create one generator seeded with time(0) and then use its output to seed the other generators.
time(0) has a resolution of 1 second. Using it multiple times as a seed within a short time span will create identical generators.
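
A minimal sketch of that idea, using std::mt19937 from <random> (boost::mt19937 would work the same way). The function name make_generators is made up, and master_seed would be time(0) or similar in the scenario above.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Sketch: one master engine, seeded once, hands out the seeds for the others.
std::vector<std::mt19937> make_generators(std::size_t count, unsigned master_seed) {
    assert(count > 0);
    std::mt19937 master(master_seed);
    std::vector<std::mt19937> generators;
    for (std::size_t i = 0; i < count; ++i) {
        // The next output of the master becomes this generator's seed.
        generators.push_back(std::mt19937(master()));
    }
    return generators;
}
```

Even if all generators are created within the same second, each one gets a different seed, so the graphs no longer repeat.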
