Generating random doubles in different ranges - C++

In my program I need to generate random doubles repeatedly (millions of times) and there are several variables with flat distributions, but different ranges. Currently this is what I do:
double w, v, k;
double wmax = 0.5;
double vmax = 1.0;
std::random_device rd;
std::default_random_engine dre(rd());
std::uniform_real_distribution<double> wRand(-wmax, wmax);
std::uniform_real_distribution<double> vRand(-vmax, vmax);
std::uniform_real_distribution<double> kRand(0.0, 1.0);
w = wRand(dre);
v = vRand(dre);
k = kRand(dre);
Is this the proper way, or is it better to have one distribution and construct all the numbers from it? I'm extremely cautious about performance, and I feel like having one distribution plus a couple of arithmetic operations on its output would be quicker. Will it be? And what about the comparative quality of the random numbers in that case?

My suggestion would be to use a single distribution and a few arithmetic operations to scale its output into each range. It will use less memory, and the arithmetic operations are fast.
But your overall performance will be driven more by how you structure your loops and minimize branch mispredictions. See this question.
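A minimal sketch of that single-distribution approach, reusing the names from the question (dre, wmax, vmax); the scale/shift constants are the only additions:
std::uniform_real_distribution<double> unit(0.0, 1.0); // one distribution for everything
double w = -wmax + 2.0 * wmax * unit(dre);             // uniform on [-wmax, wmax)
double v = -vmax + 2.0 * vmax * unit(dre);             // uniform on [-vmax, vmax)
double k = unit(dre);                                  // already uniform on [0, 1)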

Related

C++: How to generate random numbers while excluding numbers from a given cache

So in C++ I'm using the mt19937 engine and uniform_int_distribution in my random number generator, like so:
#include <random>
#include <time.h>
int get_random(int lwr_lm, int upper_lm){
    std::mt19937 mt(time(nullptr));
    std::uniform_int_distribution<int> dist(lwr_lm, upper_lm);
    return dist(mt);
}
What I need is to alter the above generator such that there is a cache containing a number of integers that must be excluded when I use the generator over and over again.
How do I alter the above such that I can achieve this?
There are many ways to do it. A simple way would be to maintain your "excluded numbers" in a std::set and, after each generation of a random number, check whether it is in the set; if it is, generate a new random number, and repeat until you get a number that is not in the set, then return that.
By the way: while distributions are cheap to construct, engines are not. You don't want to re-construct your mt19937 every time the function is called; instead, create it once and then re-use it. You probably also want to use a better seed than the current time in seconds.
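A minimal sketch of that suggestion (the function signature and the excluded parameter are illustrative, and it assumes not every value in the range is excluded):
#include <random>
#include <set>
int get_random(int lwr_lm, int upper_lm, const std::set<int> &excluded){
    // constructed and seeded only once, then re-used across calls
    static std::mt19937 mt{std::random_device{}()};
    std::uniform_int_distribution<int> dist(lwr_lm, upper_lm);
    int r = dist(mt);
    while (excluded.count(r) != 0) // re-draw until the value is not excluded
        r = dist(mt);
    return r;
}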
Are you 1) attempting to sample without replacement in the discrete interval? Or is it 2) a patchy distribution over the interval that stays fairly constant?
If 1), you could use std::shuffle, as per the answer here: How to sample without replacement using c++ uniform_int_distribution
If 2), you could use std::discrete_distribution (element 0 corresponding to lwr_lm) and give weight zero to the numbers you don't want, as sketched below. Obviously the memory requirement is linear in upper_lm-lwr_lm, so this might not be practical if the range is large.
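A small sketch of option 2); the limits and the excluded values here are made up just for illustration:
#include <iostream>
#include <random>
#include <set>
#include <vector>
int main(){
    const int lwr_lm = 0, upper_lm = 9;
    const std::set<int> excluded{3, 7}; // values to weight out
    // one weight per value in [lwr_lm, upper_lm]; excluded values get weight 0
    std::vector<double> weights(upper_lm - lwr_lm + 1, 1.0);
    for (int e : excluded)
        weights[e - lwr_lm] = 0.0;
    std::mt19937 mt{std::random_device{}()};
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    for (int i = 0; i < 10; ++i)
        std::cout << lwr_lm + dist(mt) << std::endl; // element 0 corresponds to lwr_lm
}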
I would propose two similar solutions for the problem. They are based upon probabilistic structures, and provide you with the answer "potentially in cache" or "definitely not in cache". There are false positives but no false negatives.
Perfect hash function. There are many implementations, including one from GNU. Basically, run it on the set of cached values and use the generated perfect hash function to reject sampled values. You don't even need to maintain a hash table, just a function mapping a random value to an integer index: as soon as the index falls in the hash range, reject the number. Being perfect means you need only one call to check, and the result tells you the number is in the set. There are potential collisions, so false positives are possible.
Bloom filter. Same idea: build a filter with however many bits per cached item you're willing to spare, and with a quick check you will get either a "possibly in the cache" answer or a clear negative. You can trade answer precision for memory and vice versa. False positives are possible.
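A toy illustration of the Bloom-filter idea (the filter size and the two hash probes are arbitrary choices for this sketch, not a tuned design):
#include <bitset>
#include <cstdint>
#include <functional>
struct TinyBloom {
    static constexpr std::size_t kBits = 1 << 16; // filter size in bits, tune to your cache size
    std::bitset<kBits> bits;
    static std::size_t h1(int x) { return std::hash<int>{}(x) % kBits; }
    static std::size_t h2(int x) { return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(x)) * 2654435761u) % kBits; }
    void insert(int x) { bits.set(h1(x)); bits.set(h2(x)); }
    // true  -> possibly in the cache (false positives happen)
    // false -> definitely not in the cache
    bool maybe_contains(int x) const { return bits.test(h1(x)) && bits.test(h2(x)); }
};
Rejection then amounts to re-drawing whenever maybe_contains(value) returns true.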
As mentioned by @virgesmith in his answer, it might be a better solution depending on your problem.
The method that keeps a cache and uses it to filter future generation is inefficient for a large range (wiki).
Here I write a naive example with a different method, but you will be limited by memory: you pick a random number from a buffer and remove it so it cannot be drawn in a later iteration.
#include <random>
#include <time.h>
#include <iostream>
#include <numeric>
#include <vector>
int get_random(int lwr_lm, int upper_lm, std::vector<int> &buff, std::mt19937 &mt){
    if (buff.size() > 0) {
        std::uniform_int_distribution<int> dist(0, buff.size()-1);
        int tmp_index = dist(mt);
        int tmp_value = buff[tmp_index];
        buff.erase(buff.begin() + tmp_index);
        return tmp_value;
    } else {
        return 0;
    }
}
int main() {
    // lower and upper limit for the random distribution
    int lower = 0;
    int upper = 10;
    // random generator
    std::mt19937 mt(time(nullptr));
    // buffer used to avoid duplicates; it contains every integer between the lower and upper limit (inclusive)
    std::vector<int> my_buffer(upper - lower + 1);
    std::iota(my_buffer.begin(), my_buffer.end(), lower);
    for (int i = 0; i < 20; ++i) {
        std::cout << get_random(lower, upper, my_buffer, mt) << std::endl;
    }
    return 0;
}
Edit: a cleaner solution here
It might not be the prettiest solution, but what's stopping you from maintaining that cache and checking existence before returning? It will slow down for large caches though.
#include <random>
#include <time.h>
#include <set>
std::set<int> cache;
int get_random(int lwr_lm, int upper_lm){
    // construct and seed the engine only once; re-seeding with time(nullptr) on
    // every call would repeat values for calls made within the same second
    static std::mt19937 mt(time(nullptr));
    std::uniform_int_distribution<int> dist(lwr_lm, upper_lm);
    auto r = dist(mt);
    while(cache.find(r) != cache.end())
        r = dist(mt);
    return r;
}

Reseeding std::rand / C++11 <random>

I am trying to implement a simple noise function that takes two integers and returns a random float based on the seed combined with the two parameters.
Using std::mt19937 works great, but for some reason when I try to use srand with rand(), I get repeated numbers.
Note: using the C++11 seed member function in a loop is really, really slow.
Here are two terrains using both methods (with the same reseeding numbers):
C++11 random:
std::random_device rd;
std::mt19937 g{ rd() };
std::uniform_real_distribution<float> gen{ -1.0, 1.0 };
float getNoise(int x, int z) {
    g.seed(x * 523234 + z * 128354 + seed);
    return gen(g);
}
C random:
float getNoise(int x, int z) {
    std::srand(x * 523234 + z * 128354 + seed);
    return static_cast<float>(std::rand()) / RAND_MAX * 2.0f - 1.0f;
}
To the questions:
Is there a faster way to reseed the c++11 pseudo-random number ?
Why doesn't the srand work as expected?
Thanks in advance.
EDIT:
OK, sorry for not being clear. I know maybe I am wrong, but let me try to explain again: I use reseeding because I use the same x and z coordinates when iterating (not in the same iteration).
If I remove the reseeding I will get this result:
Please don't say I shouldn't be reseeding; I want to reseed on purpose.
You are purposely breaking it and asking us why it is broken, with the caveat that we aren't allowed to mention the gorilla in the room.
Don't reseed.
[edit]
Alright, as per comment request, here's an answer:
1) No, there is no faster way to reseed a PRNG, which you shouldn't be doing anyway. Properly, you should be seeding and then “warming up” the PRNG by discarding a few thousand values.
2) The reason rand() (and, even though you don't believe it, any other PRNG you use) doesn't work is that your getNoise() function is incorrect.
Your third image is correct. It is the result you should expect from simply returning a clamped random value.
You have attempted to modulate it by messing with the seed and, because of an apparent visual goodness in your first attempt, concluded that it is the correct method. However, what is really happening is you are simply crippling the PRNG and seeing the result of that. (It is more clear in the rand() attempt because its seed more crudely defines the resulting sequence, which itself has a smaller period than the Mersenne Twister.)
(Attempting to modify it by skewing the (x,z) coordinate is also a red herring. It doesn't affect the actual randomness of the output.)
TL;DR
You're doing it wrong.
If you want to generate terrain maps, you should google around fractal terrain generation. In fact, here's a decent link for you: http://www.gameprogrammer.com/fractal.html
You will find that it takes a little more work to do it, but that the methods are very pleasingly simple and that you can very easily tweak them to modify your results.
Hope this helps.
The random number generator generates a sequence of random values from an initial seed; it is not meant to be used to generate single random values as a function of a seed. So it should be initialized with g.seed(seed) and then called in a fixed order for all (x, z) values, without reseeding each time. This will give random values efficiently, with the expected distribution.
For example:
std::random_device rd;
std::mt19937 g{ rd() };
std::uniform_real_distribution<float> gen{ -1.0, 1.0 };
constexpr std::size_t nx = 100;
constexpr std::size_t nz = 100;
float noise[nx][nz];
void generateNoise() {
    g.seed(seed);
    for (std::size_t x = 0; x < nx; ++x)
        for (std::size_t z = 0; z < nz; ++z)
            noise[x][z] = gen(g);
}
I don't see why you'd want to continuously re-seed - that seems pointless and slow. But that's not what you are asking, so...
rand produces very poor quality random numbers. Very low period and usually based on a linear congruential generator (not good). Also, the seed size is very small. Don't use it - <random> exists for a reason.
The way you seed using srand seems to depend very much on the x and z values you pass in; you then multiply them by large numbers, which likely leads to overflow and truncation when the result is passed to srand, meaning that (due to the limited number of possible seed values) you'll often be reusing the same seed(s).
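For a concrete illustration (my numbers, not the asker's): with a 32-bit int, x = 5000 alone gives 5000 * 523234 = 2,616,170,000, which already exceeds INT_MAX = 2,147,483,647, so the seed expression x * 523234 + z * 128354 + seed overflows (formally undefined behaviour for signed int) long before x and z get large.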
Some relevant links you may want to visit:
http://en.cppreference.com/w/cpp/numeric/random
https://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful

Efficient random number generation with C++11 <random>

I am trying to understand how the C++11 random number generation features are meant to be used. My concern is performance.
Suppose that we need to generate a series of random integers between 0..k, but k changes at every step. What is the best way to proceed?
Example:
for (int i=0; i < n; ++i) {
    int k = i; // of course this is more complicated in practice
    std::uniform_int_distribution<> dist(0, k);
    int random_number = dist(engine);
    // do something with random number
}
The distributions that the <random> header provides are very convenient. But they are opaque to the user, so I cannot easily predict how they will perform. It is not clear, for example, how much (if any) runtime overhead is caused by the construction of dist above.
Instead I could have used something like
std::uniform_real_distribution<> dist(0.0, 1.0);
for (int i=0; i < n; ++i) {
    int k = i; // of course this is more complicated in practice
    int random_number = std::floor( (k+1)*dist(engine) );
    // do something with random number
}
which avoids constructing a new object in each iteration.
Random numbers are often used in numerical simulations where performance is important. What is the best way to use <random> in these situations?
Please do not answer "profile it". Profiling is part of effective optimization, but so is a good understanding of how a library is meant to be used and the performance characteristics of that library. If the answer is that it depends on the standard library implementation, or that the only way to know is to profile it, then I would rather not use the distributions from <random> at all. Instead I can use my own implementation, which will be transparent to me and much easier to optimize if/when necessary.
One thing you can do is to have a permanent distribution object and only create a param_type object each time, like this:
template<typename Integral>
Integral randint(Integral min, Integral max)
{
    using param_type =
        typename std::uniform_int_distribution<Integral>::param_type;
    // only create these once (per thread)
    thread_local static std::mt19937 eng {std::random_device{}()};
    thread_local static std::uniform_int_distribution<Integral> dist;
    // presumably a param_type is cheaper than a uniform_int_distribution
    return dist(eng, param_type{min, max});
}
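A possible call site in the asker's loop (hypothetical usage, just to show how the helper slots in):
for (int i = 0; i < n; ++i) {
    int k = i; // of course this is more complicated in practice
    int random_number = randint(0, k); // only a param_type is built per iteration
    // do something with random number
}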
For maximizing performance, first of all consider a different PRNG, such as xorshift128+. It has been reported to be more than twice as fast as mt19937 for 64-bit random numbers; see http://xorshift.di.unimi.it/. And it can be implemented in a few lines of code.
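For reference, a sketch of one published version of the xorshift128+ update step (the seed values below are placeholders; the state must not be all zeros):
#include <cstdint>
uint64_t s[2] = { 0x853c49e6748fea9bULL, 0xda3e39cb94b95bdbULL }; // placeholder nonzero seed
uint64_t xorshift128plus() {
    uint64_t x = s[0];
    const uint64_t y = s[1];
    s[0] = y;
    x ^= x << 23;                         // shift a
    s[1] = x ^ y ^ (x >> 17) ^ (y >> 26); // shifts b, c
    return s[1] + y;
}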
Moreover, if you don't need a "perfectly balanced" uniform distribution and your k is much less than 2^64 (which it likely is), I would suggest simply writing something like:
uint64_t temp = engine_64(); // generates 0 <= temp < 2^64
int random_number = temp % (k + 1); // crop temp to 0,...,k
Note, however, that integer division/modulo operations are not cheap. For example, on an Intel Haswell processor, they take 39-103 processor cycles for 64-bit numbers, which is likely much longer than calling an MT19937 or xorshift+ engine.

Several random numbers - C++

I am a physicist, writing a program that involves generating several (on the order of a few billion) random numbers, drawn from a Gaussian distribution. I am trying to use C++11. The generation of these random numbers is separated by an operation that should take very little time. My biggest worry is whether the fact that I am generating so many random numbers, with such a small time gap, could potentially lead to sub-optimal performance. I am testing certain statistical properties, which rely heavily on the independence of the randomness of the numbers, so my result is particularly sensitive to these issues. My question is: with the kinds of numbers I mention below in the code (a simplified version of my actual code), am I doing something obviously (or even subtly) wrong?
#include <random>
// Several other includes, etc.
int main () {
    int dim_vec(400), nStats(1e8);
    vector<double> vec1(dim_vec), vec2(dim_vec);
    // Initialize the above vectors, which are order 1 numbers.
    random_device rd;
    mt19937 generator(rd());
    double y(0.0);
    double l(0.0);
    for (int i(0); i<nStats; i++)
    {
        for (int j(0); j<dim_vec; j++)
        {
            normal_distribution<double> distribution(0.0, 1/sqrt(vec1[j]));
            l = distribution(generator);
            y += l*vec2[j];
        }
        cout << y << endl;
        y = 0.0;
    }
}
The normal_distribution is allowed to have state. With this particular distribution, it is common to generate numbers in pairs on every other call and, on the alternate calls, return the second, cached number. By constructing a new distribution on each call you are throwing away that cache.
Fortunately you can "shape" a single distribution by calling with different normal_distribution::param_type's:
normal_distribution<double> distribution;
using P = normal_distribution<double>::param_type;
for (int i(0); i<nStats; i++)
{
    for (int j(0); j<dim_vec; j++)
    {
        l = distribution(generator, P(0.0, 1/sqrt(vec1[j])));
        y += l*vec2[j];
    }
    cout << y << endl;
    y = 0.0;
}
I'm not familiar with all implementations of std::normal_distribution. However I wrote the one for libc++. So I can tell you with some amount of certainty that my slight rewrite of your code will have a positive performance impact. I am unsure what impact it will have on the quality, except to say that I know it won't degrade it.
Update
Regarding Severin Pappadeux's comment below about the legality of generating pairs of numbers at a time within a distribution: See N1452 where this very technique is discussed and allowed for:
Distributions sometimes store values from their associated source of random numbers across calls to their operator(). For example, a common method for generating normally distributed random numbers is to retrieve two uniformly distributed random numbers and compute two normally distributed random numbers out of them. In order to reset the distribution's random number cache to a defined state, each distribution has a reset member function. It should be called on a distribution whenever its associated engine is exchanged or restored.
Some thoughts on top of the excellent HH answer.
A normal distribution (mu, sigma) is generated from a normal (0, 1) by shift and scale:
N(mu, sigma) = mu + N(0,1)*sigma
If your mean (mu) is always zero, you could simplify and speed up your code (by not adding 0.0) by doing something like:
normal_distribution<double> distribution;
for (int i(0); i<nStats; i++)
{
    for (int j(0); j<dim_vec; j++)
    {
        l = distribution(generator);
        y += l*vec2[j]/sqrt(vec1[j]);
    }
    cout << y << endl;
    y = 0.0;
}
If speed is of utmost importance, I would try to precompute everything I can outside the main 10^8 loop. Is it possible to precompute sqrt(vec1[j]) so you save on the sqrt() call? Is it possible to have vec2[j]/sqrt(vec1[j]) as a single vector?
If it is not possible to precompute those vectors, I would try to save on memory access. Keeping the pieces of vec2[j] and vec1[j] together might help with fetching one cache line instead of two. So declare vector<pair<double,double>> vec12(dim_vec); and use y += l*vec12[j].first/sqrt(vec12[j].second) in the sampling.
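A minimal sketch of the precomputation idea, reusing the variables from the question (the vector name scale is made up):
// computed once, outside the nStats loop
vector<double> scale(dim_vec);
for (int j = 0; j < dim_vec; ++j)
    scale[j] = vec2[j] / sqrt(vec1[j]);
normal_distribution<double> distribution; // standard N(0, 1)
for (int i = 0; i < nStats; ++i)
{
    for (int j = 0; j < dim_vec; ++j)
        y += distribution(generator) * scale[j];
    cout << y << endl;
    y = 0.0;
}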

boost random number library, use same random number generator for different variate generators

It seems that one can use the following code to produce random numbers from a particular Normal distribution:
float mean = 0, variance = 1;
boost::mt19937 randgen(static_cast<unsigned int>(std::time(0)));
boost::normal_distribution<float> noise(mean, variance);
variate_generator<mt19937, normal_distribution<float> > nD(randgen, noise);
float random = nD();
This works fine. However, I would like to be able to draw numbers from several distributions, i.e. one would think something like:
float mean1 = 0, variance1 = 1, mean2 = 10, variance2 = 0.25;
boost::mt19937 randgen(static_cast<unsigned int>(std::time(0)));
boost::normal_distribution<float> noise1(mean1, variance1);
boost::normal_distribution<float> noise2(mean2, variance2);
variate_generator<mt19937, normal_distribution<float> > nD(randgen, noise1);
variate_generator<mt19937, normal_distribution<float> > nC(randgen, noise2);
float random1 = nD();
float random2 = nC();
However, the problem appears to be that nD() and nC() are generating similar sequences of numbers. I hypothesize this is because the constructor for variate_generator appears to make a copy of randgen rather than using it directly. Thus, the same pseudo-random sequence is being generated and simply pushed through different transformations (due to the different parameters of the distributions).
Does anyone know if there is a way, in Boost, to create a single random number generator and use it for multiple distributions? Alternatively, does the design of the Boost random library intend users to create one random number generator per distribution? Obviously, I could write code to transform a sequence of uniform random numbers to a sequence from an arbitrary distribution, but I'm looking for something simple and already built-in to the library.
Thanks in advance for your help.
Your hypothesis is correct. You want both variate_generator instances to use the same random number generator instance. So use a reference to mt19937 as your template parameter.
variate_generator<mt19937 &, normal_distribution<float> > nD(randgen, noise1);
variate_generator<mt19937 &, normal_distribution<float> > nC(randgen, noise2);
Obviously you'll have to ensure randgen does not go out of scope before nD and nC do.