How to use the C++11 random generators efficiently? - c++

I am executing computational experiments, which need to be reproducible. Therefore each experiment uses its own random number generator and remembers its seed:
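#include <random>
class Experiment
{
public:
    void operator()();
private:
    unsigned seed_;       // remembered so the experiment can be reproduced
    std::mt19937 engine_; // this experiment's own generator
};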
The problem is that the engine needs to be passed down to the most elementary functions.
Let's say that somewhere 10 levels down the call stack there is a simple function that needs an engine to generate a random number between 0 and 1. Then that engine needs to be passed to each of those 10 calls, making the code a mess.
I considered and rejected these two approaches:
1. global engine:
I would have a global engine and all the elementary functions would call this engine. This could however cause problems if I wanted to run several experiments in different threads. I have zero experience in multithreading, but I got a lot of advice against anything global, especially in a multithreaded application and I do not want to make a step in the wrong direction.
2. local engine in each small function.
Each function would create an engine on the stack, use it and destroy it on return. This could however cause performance problems, since the random number generator is a big complicated object. On my implementation it has 5000 bytes.
What approach should I use?

The only way to get reproducible random numbers is to use the seed when you initialize your random number generators.
As for speed, you should not be too concerned, as these objects are not that big.
Here is an example:
// uniform_real_distribution
#include <functional> // std::ref
#include <iostream>
#include <random>
#include <thread>
#include <vector>
void generateNumbers(std::vector<double>& vRes, unsigned int nbNumbers, int seed, double& sum)
{
    std::default_random_engine generator(seed); // a fixed seed forces identical results, so you only need to keep track of one number
    std::uniform_real_distribution<double> distribution(0.0, 1.0); // uniform distribution on [0, 1)
    sum = 0.0;
    vRes.resize(nbNumbers);
    for (unsigned int i = 0; i < nbNumbers; ++i)
    {
        vRes[i] = distribution(generator);
        sum += vRes[i];
    }
}
int main()
{
    const unsigned int nbNumbers = 1000000;
    const int seed = 100;
    const int nbThreads = 300;
    std::vector<double> vSum(nbThreads); // checksums: all should be equal, since every thread sums the same random numbers
    std::vector<double> buffer;          // reused by each thread in turn (each thread is joined before the next starts)
    for (int currThread = 0; currThread < nbThreads; ++currThread)
    {
        // std::ref is required: std::thread copies its arguments, and a copy cannot bind to a non-const reference
        std::thread th(&generateNumbers, std::ref(buffer), nbNumbers, seed, std::ref(vSum[currThread]));
        th.join(); // joining immediately means the threads run one after another
    }
    bool allEqual = true;
    for (int i = 1; i < nbThreads; ++i)
        allEqual = allEqual && (vSum[i] == vSum[0]);
    std::cout << (allEqual ? "all checksums match" : "checksums differ") << '\n';
    return 0;
}
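This code runs in less than 10 seconds in Release mode with Visual Studio 2012. Because each thread is joined right after it is created, the threads run one after another; the code can be greatly improved by using fewer threads (thread creation is time consuming), but it gives the idea.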
Hope that it helps,

Related

How to use <random> to replace rand()?

C++11 introduced the header <random> with declarations for random number engines and random distributions. That's great - time to replace those uses of rand() which is often problematic in various ways. However, it seems far from obvious how to replace
srand(n);
// ...
int r = rand();
Based on the declarations it seems a uniform distribution can be built something like this:
std::default_random_engine engine;
engine.seed(n);
std::uniform_int_distribution<> distribution;
auto rand = [&](){ return distribution(engine); };
This approach seems rather involved and is surely something I won't remember unlike the use of srand() and rand(). I'm aware of N4531 but even that still seems to be quite involved.
Is there a reasonably simple way to replace srand() and rand()?
Full disclosure: I don't like rand(). It's bad, and it's very easily abused.
The C++11 random library fills a void that existed for a long, long time. The problem with high-quality random libraries is that they're often hard to use. The C++11 <random> library represents a huge step forward in this regard: a few lines of code give me a very nice generator that behaves well and easily generates random variates from many different distributions.
Given the above, my answer to you is a bit heretical. If rand() is good enough for your needs, use it. As bad as rand() is (and it is bad), removing it would represent a huge break with the C language. Just make sure that the badness of rand() truly is good enough for your needs.
C++14 didn't deprecate rand(); it only deprecated functions in the C++ library that use rand() (such as std::random_shuffle). While C++17 might deprecate rand(), it won't delete it. That means you have several more years before rand() disappears. The odds are high that you will have retired or switched to a different language by the time the C++ committee finally does delete rand() from the C++ standard library.
I'm creating random inputs to benchmark different implementations of std::sort() using something along the lines of std::vector<int> v(size); std::generate(v.begin(), v.end(), std::rand);
You don't need a cryptographically secure PRNG for that. You don't even need Mersenne Twister. In this particular case, rand() probably is good enough for your needs.
Update
There is a nice simple replacement for rand() and srand() in the C++11 random library: std::minstd_rand.
#include <random>
#include <iostream>
int main ()
{
std::minstd_rand simple_rand;
// Use simple_rand.seed() instead of srand():
simple_rand.seed(42);
// Use simple_rand() instead of rand():
for (int ii = 0; ii < 10; ++ii)
{
std::cout << simple_rand() << '\n';
}
}
The function std::minstd_rand::operator()() returns a std::uint_fast32_t. However, the algorithm restricts the result to between 1 and 2^31 - 2, inclusive. This means the result will always convert safely to a std::int_fast32_t (or to an int if int is at least 32 bits long).
How about randutils by Melissa O'Neill of pcg-random.org?
From the introductory blog post:
randutils::mt19937_rng rng;
std::cout << "Greetings from Office #" << rng.uniform(1,17)
<< " (where we think PI = " << rng.uniform(3.1,3.2) << ")\n\n"
<< "Our office morale is " << rng.uniform('A','D') << " grade\n";
Assuming you want the behavior of the C-style rand and srand functions, including their quirkiness, but with good random, this is the closest I could get.
#include <random>
#include <cstdlib> // RAND_MAX (might be removed soon?)
#include <climits> // INT_MAX (use as replacement?)
namespace replacement
{
constexpr int rand_max {
#ifdef RAND_MAX
RAND_MAX
#else
INT_MAX
#endif
};
namespace detail
{
inline std::default_random_engine&
get_engine() noexcept
{
// Seeding with 1 is silly, but required behavior
static thread_local auto rndeng = std::default_random_engine(1);
return rndeng;
}
inline std::uniform_int_distribution<int>&
get_distribution() noexcept
{
static thread_local auto rnddst = std::uniform_int_distribution<int> {0, rand_max};
return rnddst;
}
} // namespace detail
inline int
rand() noexcept
{
return detail::get_distribution()(detail::get_engine());
}
inline void
srand(const unsigned seed) noexcept
{
detail::get_engine().seed(seed);
detail::get_distribution().reset();
}
inline void
srand()
{
std::random_device rnddev {};
srand(rnddev());
}
} // namespace replacement
The replacement::* functions can be used exactly like their std::* counterparts from <cstdlib>. I have added a srand overload that takes no arguments and seeds the engine with a “real” random number obtained from a std::random_device. How “real” that randomness will be is of course implementation defined.
The engine and the distribution are held as thread_local static instances so they carry state across multiple calls but still allow different threads to observe predictable sequences. (It's also a performance gain because you don't need to re-construct the engine or use locks and potentially trash other people's caches.)
I've used std::default_random_engine because you did but I don't like it very much. The Mersenne Twister engines (std::mt19937 and std::mt19937_64) produce much better “randomness” and, surprisingly, have also been observed to be faster. I don't think that any compliant program must rely on std::rand being implemented using any specific kind of pseudo random engine. (And even if it did, implementations are free to define std::default_random_engine to whatever they like so you'd have to use something like std::minstd_rand to be sure.)
Abusing the fact that engines return values directly
All engines defined in <random> have an operator()() that retrieves the next generated value and advances the internal state of the engine.
std::mt19937 rand (seed); // or an engine of your choosing
for (int i = 0; i < 10; ++i) {
unsigned int x = rand ();
std::cout << x << std::endl;
}
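It should be noted, however, that all engines return a value of some unsigned integral type, so the value may not fit in a signed integral type. Converting such an out-of-range value to a signed type gives an implementation-defined result (and, since C++20, a well-defined modular one), not necessarily the value you expect.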
If you are fine with using unsigned values everywhere you retrieve a new value, the above is an easy way to replace usage of std::srand + std::rand.
Note: Using what has been described above might lead to some values having a higher chance of being returned than others, because the number of values the engine can produce is not necessarily an exact multiple of the number of values in the destination range.
If you have not worried about this in the past, when using something like rand() % (high - low + 1) + low, you should not worry about it now.
Note: You will need to make sure that the std::engine-type::result_type is at least as large as your desired range of values (std::mt19937::result_type is uint_fast32_t).
If you only need to seed the engine once
There is no need to first default-construct a std::default_random_engine (which is just a typedef for some engine chosen by the implementation) and later assign a seed to it; this can be done all at once by using the appropriate constructor of the random engine.
std::random-engine-type engine (seed);
If you however need to re-seed the engine, using std::random-engine::seed is the way to do it.
If all else fails; create a helper-function
Even if the code you have posted looks slightly complicated, you only need to write it once.
If you find yourself tempted to copy and paste what you have written to several places in your code, do what is recommended whenever copy-pasting is involved: introduce a helper function.
Intentionally left blank, see other posts for example implementations.
You can create a simple function like this:
#include <random>
#include <iostream>
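int modernRand(int n) {
    // Create and seed the engine only once; building it on every call is slow
    // and re-seeds from std::random_device each time.
    // Note: a shared static engine is not safe to call from several threads at once.
    static std::mt19937 gen(std::random_device{}());
    std::uniform_int_distribution<> dis(0, n); // inclusive of n, unlike rand() % n
    return dis(gen);
}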
And later use it like this:
int myRandValue = modernRand(n);
As mentioned here

How do I generate thread-safe uniform random numbers?

My program needs to generate many random integers in some range (int min, int max). Each call will have a different range. What is a good (preferably thread-safe) way to do this? The following is not thread-safe (and uses rand(), which people seem to discourage):
int intRand(const int & min, const int & max)
{
return (rand() % (max+1-min)) + min;
}
This is much slower, but uses <random>:
int intRand(const int & min, const int & max) {
std::default_random_engine generator;
std::uniform_int_distribution<int> distribution(min,max);
return distribution(generator);
}
Something like this is what I'm going for (the changeParameters function doesn't exist though):
int intRand(const int & min, const int & max) {
static std::default_random_engine generator;
static std::uniform_int_distribution<int> distribution(0, 10);
distribution.changeParameters(min, max);
return distribution(generator);
}
Another option would be to make a wide range on the uniform_int_distribution and then use mod like in the first example. However, I'm doing statistical work, so I want the numbers to come from as unbiased a distribution as possible (e.g., if the range of the distribution used is not a multiple of (max-min+1), the distribution will be slightly biased). This is an option, but again, I would like to avoid it.
SOLUTION: This solution comes from the answers by @konrad-rudolph, @mark-ransom and @mathk. The seeding of the random number generator is done to suit my particular needs. A more common approach would be to use time(NULL), but if you create many threads within the same second, they would then get the same seed. Even with clock() this is an issue, so we include the thread id. A drawback: this leaks memory (one generator per thread).
// Workaround for compilers that lacked C++11 thread_local at the time;
// only needed for such compilers, remove it where thread_local is supported.
#if defined (_MSC_VER) // Visual Studio
#define thread_local __declspec( thread )
#elif defined (__GNUC__) // GCC
#define thread_local __thread
#endif
#include <functional> // std::hash
#include <random>
#include <time.h>
#include <thread>
using namespace std;
/* Thread-safe function that returns a random number between min and max (inclusive).
This function takes ~142% the time that calling rand() would take. For this extra
cost you get a better uniform distribution and thread-safety. */
int intRand(const int & min, const int & max) {
    static thread_local mt19937* generator = nullptr;
    // std::thread::id has no hash() member; std::hash<thread::id> provides the hash.
    if (!generator) generator = new mt19937(clock() + hash<thread::id>()(this_thread::get_id()));
    uniform_int_distribution<int> distribution(min, max);
    return distribution(*generator);
}
Have you tried this?
int intRand(const int & min, const int & max) {
static thread_local std::mt19937 generator;
std::uniform_int_distribution<int> distribution(min,max);
return distribution(generator);
}
Distributions are extremely cheap (they will be completely inlined by the optimiser so that the only remaining overhead is the actual random number rescaling). Don't be afraid to regenerate them as often as you need; in fact, changing the parameters of an existing distribution (which is possible through its param() member) would conceptually be no cheaper.
The actual random number generator, on the other hand, is a heavy-weight object carrying a lot of state and requiring quite some time to be constructed, so that should only be initialised once per thread (or even across threads, but then you’d need to synchronise access which is more costly in the long run).
Make the generator static, so it's only created once. This is more efficient, since good generators typically have a large internal state; more importantly, it means you are actually getting the pseudo-random sequence it generates, not the (much less random) initial values of separate sequences.
Create a new distribution each time; these are typically lightweight objects with little state, especially one as simple as uniform_int_distribution.
For thread safety, options are to make the generator thread_local, with a different seed for each thread, or to guard it with a mutex. The former is likely to be faster, especially if there's a lot of contention, but will consume more memory.
You can use one default_random_engine per thread using Thread Local Storage.
I cannot tell you how to use TLS correctly since it is OS dependent; your best bet is to search the internet.
I am a person from the future with the same problem. The accepted answer won't compile on MSVC 2013, because it doesn't implement thread_local (and using __declspec(thread) doesn't work because it doesn't like constructors).
The memory leak in your solution can be moved off the heap by modifying everything to use placement new.
Here's my solution (combined from a header and source file):
#include <new>    // placement new
#include <random>
#ifndef BUILD_COMPILER_MSVC
thread_local std::mt19937 _generator;
#else
__declspec(thread) char _generator_backing[sizeof(std::mt19937)];
__declspec(thread) std::mt19937* _generator;
// One flag per thread, shared by every instantiation of get_uniform, so the
// generator is constructed only once per thread (not once per type).
__declspec(thread) bool _generator_inited = false;
#endif
template <typename type_float> inline type_float get_uniform(void) {
    std::uniform_real_distribution<type_float> distribution;
#ifdef BUILD_COMPILER_MSVC
    if (!_generator_inited) {
        _generator = new(_generator_backing) std::mt19937();
        _generator_inited = true;
    }
    return distribution(*_generator);
#else
    return distribution(_generator);
#endif
}
Write a simple LCG (or whatever) PRNG for yourself, which will produce numbers up to the maximum possible required. Use a single static copy of the built-in RNG to seed a new local copy of your own PRNG for each new thread you generate. Each thread-local PRNG will have its own local storage, and never needs to refer to the central RNG again.
This assumes that a statistically good RNG is fine for you and that cryptographic security is not an issue.

Millions of random numbers generated "overflow" rand_r?

I am having trouble with rand_r. I have a simulation that generates millions of random numbers. I have noticed that at a certain point in time, these numbers are no longer uniform. What could be the problem?
What I do: I create an instance of a generator and give it its own seed.
mainRGen= new nativeRandRUni(idumSeed_g);
here is the class/object def:
class nativeRandRUni {
public:
unsigned seed;
nativeRandRUni(unsigned sd){ seed= sd; }
float genP() { return (rand_r(&seed))/float(RAND_MAX); } // [0,1]
int genI(int R) { return (rand_r(&seed) % R); } // [0,R-1]
};
numbers are simply generated by:
newIntNumber= mainRGen->genI(desired_max);
newFloatNumber= mainRGen->genP();
The simulations have the problem described above. I know this is happening because I have checked the distribution of the generated numbers after the point in time at which a signature appears in the results (see this, top image, http://ubuntuone.com/0tbfidZaXfGNTfiVr3x7DR).
Also, if I print the seed at t-1 and t, where t is the time point of the signature, I can see the seed changing by an order of magnitude, from 263069042 to 1069048066.
If I run the code with a different seed, the problem is always present, but at different time points.
Also, if I use rand() instead of my object, all goes well... I DO need the object because I sometimes use threads. The example above does not have threads.
I am really lost here; any clues?
EDIT - EDIT
It can be reproduced by looping enough times; the problem is that, as I said, it takes millions of iterations for it to arise. For seed -158342163 I get it at generation t=134065568. One can check the numbers generated before (uniform) and after (not uniform). I get the same problem if I change the seed manually at given t's, see (*) in the code. Is that also something I should not expect?
#include <stdlib.h> // rand_r (POSIX)
#include <fstream>
#include <sstream>
#include <iostream>
using std::ofstream;
using std::cout;
using std::endl;
class nativeRandRUni {
public:
unsigned seed;
long count;
nativeRandRUni(unsigned sd){ seed= sd; count=0; }
float genP() { count++; return (rand_r(&seed))/float(RAND_MAX); } // [0,1]
int genI(int R) { count++; return (rand_r(&seed) % R); } // [0,R-1]
};
int main(int argc, char *argv[]){
long timePointOfProblem= 134065568;
nativeRandRUni* mainRGen= new nativeRandRUni(-158342163);
int rr;
//ofstream* fout_metaAux= new ofstream();
//fout_metaAux->open("random.numbers");
for(int i=0; i< timePointOfProblem; i++){
rr= mainRGen->genI(1009200);
//(*fout_metaAux) << rr << endl;
//if(i%1000==0) mainRGen->seed= 111111; //(*) FORCE
}
//fout_metaAux->close();
}
Given that random numbers are key to your simulation, you should implement your own generator. I don't know what algorithm rand_r is using, but it could be something pretty crappy like a linear congruential generator.
I'd look into implementing something fast and with good qualities where you know the underlying algorithm. I'd start by looking at implementing Mersenne Twister:
http://en.wikipedia.org/wiki/Mersenne_twister
It's simple to implement and very fast; it requires no divisions.
I ended up trying a simple solution from Boost, changing the generator to:
#include <boost/random.hpp>
class nativeRandRUni {
public:
    typedef boost::mt19937 EngineType;
    typedef boost::uniform_real<> DistributionType;
    typedef boost::variate_generator<EngineType, DistributionType> VariateGeneratorType;
    nativeRandRUni(long s, float min, float max) : gen(EngineType(s), DistributionType(min, max)) {}
    VariateGeneratorType gen;
};
I don't get the problem anymore... Though it solved it, I don't feel very comfortable not understanding what it was. I think Rafael is right: I should not trust rand_r for this many generations.
Now, this is slower than before, so i may look for ways of optimizing it.
QUESTION: Would a Mersenne Twister implementation in principle be faster?
and thanks to all!

generating Random numbers using rand()

I'm trying to generate a bunch of random numbers using rand() % (range).
Here's how my code is setup :
srand(time(NULL));
someClass *obj = new someClass(rand()%range1,rand()%range2... etc ); // i.e. a number of random numbers one after the other
Whenever I run this, it seems all the calls to rand() generate the same number. (Edit: it seems they do not all generate the same number; see the edit at the end.) I tried doing it without the line:
srand(time(NULL));
but then every execution of the program yields the same results.
Also, since all calls to rand() are in a constructor, I can't really reseed it all the time. I guess I could create all the values passed to the constructor beforehand and reseed the random number generator in between, but that seems like an inelegant solution.
How can I generate a bunch of different random numbers ?
Edit: It seems that, because I was creating a lot of objects in a loop, srand(time(NULL)) was called again on every iteration and the sequence got reset (time(NULL) has a resolution of one second); that's why all subsequent objects had very similar properties.
If you call srand once, then all subsequent rand calls will return (different) pseudorandom numbers. If they don't, you're doing it wrong. :)
Apart from this, rand is pretty useless. Boost.Random (or the C++11 standard library <random> header) provides much more powerful random number generators, with nicer, more modern interfaces as well (for example allowing you to have multiple independent generators, unlike rand which uses a single global seed)
Unless reseeded with a different starting point, rand() always returns the same sequence. That is actually a feature to make program tests repeatable!
So, you have to call srand if you want a different sequence for different runs. Perhaps you can do that before calling the first constructor?
Call srand once at the beginning of the program. Then call rand()%range any time you want a random number. Here is an example for your situation that works pretty well:
#include <iostream>
#include <stdlib.h>
#include <time.h>
using namespace std;
class Test
{
public:
Test(int num0,int num1, int num2):num0_(num0),num1_(num1),num2_(num2){}
int num0_,num1_,num2_;
};
int main()
{
srand(time(NULL));
Test *test=new Test(rand()%100,rand()%100,rand()%100);
cout << test->num0_ << "\n";
cout << test->num1_ << "\n";
cout << test->num2_ << "\n";
delete test;
return 0;
}
check this code at: http://ideone.com/xV0R3#view_edit_box
#include<iostream>
#include <stdio.h>
#include <stdlib.h>
#include<time.h>
using namespace std;
int main()
{
int i=0;
srand(time(NULL));
while(i<10)
{
cout<<rand()<<endl;
i++;
}
return 0;
}
This produces different random numbers. You need to call srand() only once; rand() generates a different number every time after the srand() call.

Using boost::random and getting same sequence of numbers

I have the following code:
#include <ctime>
#include <memory>
#include <boost/random.hpp>
class B {
public:
    void generator()
    {
        // creating random number generator
        boost::mt19937 randgen(static_cast<unsigned int>(std::time(0)));
        boost::normal_distribution<float> noise(0, 1);
        boost::variate_generator<boost::mt19937,
            boost::normal_distribution<float> > nD(randgen, noise);
        for (int i = 0; i < 100; i++)
        {
            float value = nD();
            // graph each value
        }
    }
};
class A {
public:
    void someFunction()
    {
        for (int i = 1; i <= 3; i++)
        {
            std::shared_ptr<B> b;
            b.reset(new B());
            b->generator();
        }
    }
};
I wish to execute the above code multiple times in rapid succession to produce multiple graphs. I have also reviewed this stackoverflow question which is similar but the caveat states that when time(0) is used and the member function is called in rapid succession then you will still likely get the same sequence of numbers.
How might I overcome this problem?
EDIT: I've tried making randgen static in Class B, also tried making it a global variable in Class A, but each time the 3 graphs are still the same. I've also tried seeding from the GetSystemTime milliseconds. I must be missing something.
One way would be to not reseed the random number generator every time you execute your code.
Create the generator and seed it once, then just continue to use it.
That's assuming you're calling that code multiple times within the same run. If you're doing multiple runs (but still within the same second), you can use another differing property such as the process ID to change the seed.
Alternatively, you can go platform-dependent, using either the Windows GetSystemTime(), which fills a SYSTEMTIME structure that includes a milliseconds field, or the POSIX gettimeofday(), which fills a timeval with the seconds and microseconds since the epoch.
Windows:
#include <windows.h>
SYSTEMTIME st;
GetSystemTime (&st);
// Use st.wSecond * 1000 + st.wMilliseconds to seed (0 thru 59999).
Linux:
#include <sys/time.h>
struct timeval tv;
gettimeofday (&tv, NULL);
// Use (tv.tv_sec % 60) * 1000 + (tv.tv_usec / 1000) to seed (0 thru 59999).
With Boost.Random you can save the state of the random number generator--for example, you can save it to a text file. This is done with streams.
For example, using your code, after you seed the generator and have run it once, you can save the state with an output stream, like so:
std::ofstream generator_state_file("rng.saved");
generator_state_file << randgen;
Then later, when you've created a new generator, you can load the state back from that file using the opposite stream:
std::ifstream generator_state_file("rng.saved");
generator_state_file >> randgen;
And then use the state to generate some more random numbers, and then re-save the state, and so on and so on.
It may also be possible to save the state to a std::string using std::stringstream, if you don't want to use a file, but I haven't personally tried this.
Only create a single random number generator so it's only seeded once:
static boost::mt19937 randgen(static_cast<unsigned int>(std::time(0)));
First Thoughts
On unix you could try reading some bytes from /dev/random or /dev/urandom for the seed. You could also try using a combination of time(0) + pid + static counter (or pseudo-random sequence).
I believe on windows, you can use QueryPerformanceCounter to get the value of the high performance timer register.
Another thought:
You could declare your mt19937 prng as a static or global so you never lose its state.
A third thought:
You wish to "execute the above code multiple times in rapid succession to produce multiple graphs", so pass in a graph index (e.g. genGraph(int graphIndex)) and combine it (add, xor, etc.) with the output of time(0): boost::mt19937 randgen(static_cast<unsigned int>(std::time(0) + graphIndex));
A late answer: two random-number generator functions for comparing boost with standard method.
boost
#include <boost/random.hpp>
//the code that uses boost is massively non-intuitive, complex and obfuscated
bool _boost_seeded_=false;
/*--------------------*/int
boostrand(int High, int Low)
{
static boost::mt19937 random;
if (!_boost_seeded_)
{
random = boost::mt19937(time(0));
_boost_seeded_=true;
}
boost::uniform_int<> range(Low,High);
boost::variate_generator<boost::mt19937&, boost::uniform_int<> >
getrandom(random, range);
return getrandom();
}
standard
#include <cstdlib>
#include <time.h>
//standard code is straight-forward and quite understandable
bool _stdrand_seeded_=false;
/*--------------------*/int
stdrand(int High, int Low)
{
if (!_stdrand_seeded_)
{
srand(time(0));
_stdrand_seeded_=true;
}
return ((rand() % (High - Low + 1)) + Low);
}
The results from both functions are comparably "random". I would apply the KISS principle.
If you do not want to use only one generator, you could create one generator with seed(time(0)) and then use it to seed the other generators.
time(0) has the resolution of 1 second. Using it multiple times as seed within a short time span will create the same generator.