Generating several random numbers in C++

I am a physicist, writing a program that involves generating several (on the order of a few billion) random numbers, drawn from a Gaussian distribution. I am trying to use C++11. The generation of these random numbers is separated by an operation that should take very little time. My biggest worry is whether the fact that I am generating so many random numbers, with such a small time gap, could potentially lead to sub-optimal performance. I am testing certain statistical properties, which rely heavily on the independence of the random numbers, so my result is particularly sensitive to these issues. My question is: with the kinds of numbers I mention below in the code (a simplified version of my actual code), am I doing something obviously (or even subtly) wrong?
#include <random>
// Several other includes, etc.

int main () {
    int dim_vec(400), nStats(1e8);
    vector<double> vec1(dim_vec), vec2(dim_vec);
    // Initialize the above vectors, which hold order-1 numbers.
    random_device rd;
    mt19937 generator(rd());
    double y(0.0);
    double l(0.0);
    for (int i(0); i < nStats; i++)
    {
        for (int j(0); j < dim_vec; j++)
        {
            normal_distribution<double> distribution(0.0, 1/sqrt(vec1[j]));
            l = distribution(generator);
            y += l*vec2[j];
        }
        cout << y << endl;
        y = 0.0;
    }
}

The normal_distribution is allowed to have state. And with this particular distribution, it is common to generate numbers in pairs: every other call computes two values, and the call in between returns the second, cached number. By constructing a new distribution on each call you are throwing away that cache.
Fortunately you can "shape" a single distribution by calling with different normal_distribution::param_type's:
normal_distribution<double> distribution;
using P = normal_distribution<double>::param_type;
for (int i(0); i < nStats; i++)
{
    for (int j(0); j < dim_vec; j++)
    {
        l = distribution(generator, P(0.0, 1/sqrt(vec1[j])));
        y += l*vec2[j];
    }
    cout << y << endl;
    y = 0.0;
}
I'm not familiar with all implementations of std::normal_distribution. However I wrote the one for libc++. So I can tell you with some amount of certainty that my slight rewrite of your code will have a positive performance impact. I am unsure what impact it will have on the quality, except to say that I know it won't degrade it.
Update
Regarding Severin Pappadeux's comment below about the legality of generating pairs of numbers at a time within a distribution: See N1452 where this very technique is discussed and allowed for:
Distributions sometimes store values from their associated source of
random numbers across calls to their operator(). For example, a common
method for generating normally distributed random numbers is to
retrieve two uniformly distributed random numbers and compute two
normally distributed random numbers out of them. In order to reset the
distribution's random number cache to a defined state, each
distribution has a reset member function. It should be called on a
distribution whenever its associated engine is exchanged or restored.

Some thoughts on top of the excellent HH answer.
A normal distribution (mu, sigma) is generated from a normal (0, 1) by shift and scale:
N(mu, sigma) = mu + N(0,1)*sigma
If your mean (mu) is always zero, you could simplify and speed up your code (by never adding 0.0) by doing something like:
normal_distribution<double> distribution;
for (int i(0); i < nStats; i++)
{
    for (int j(0); j < dim_vec; j++)
    {
        l = distribution(generator);
        y += l*vec2[j]/sqrt(vec1[j]);
    }
    cout << y << endl;
    y = 0.0;
}
If speed is of utmost importance, I would try to precompute everything I can outside the main 10^8 loop. Is it possible to precompute sqrt(vec1[j]) so you save on the sqrt() call? Is it possible to have vec2[j]/sqrt(vec1[j]) as a single vector?
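For illustration, a minimal sketch of that precomputation; the vector name ratio, the initialization values, and the reduced nStats are assumptions for a self-contained test, not the OP's actual data:

#include <cmath>
#include <iostream>
#include <random>
#include <vector>
using namespace std;

int main() {
    int dim_vec(400), nStats(1000);                          // nStats shrunk for a quick test
    vector<double> vec1(dim_vec, 2.0), vec2(dim_vec, 0.5);   // stand-in "order 1" values

    // Precompute vec2[j] / sqrt(vec1[j]) once, outside the hot loop.
    vector<double> ratio(dim_vec);
    for (int j = 0; j < dim_vec; ++j)
        ratio[j] = vec2[j] / sqrt(vec1[j]);

    random_device rd;
    mt19937 generator(rd());
    normal_distribution<double> distribution;                // N(0,1), reused throughout

    for (int i = 0; i < nStats; ++i) {
        double y = 0.0;
        for (int j = 0; j < dim_vec; ++j)
            y += distribution(generator) * ratio[j];         // no sqrt or division here
        cout << y << '\n';
    }
}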
If it is not possible to precompute those vectors, I would try to save on memory access. Keeping pieces of vec2[j] and vec1[j] together might help with fetching one cache line instead of two. So declare vector<pair<double,double>> vec12(dim_vec); and use in sampling y+=l*vec12[j].first/sqrt(vec12[j].second)

Related

C++: How to generate random numbers while excluding numbers from a given cache

So in C++ I'm using the mt19937 engine and uniform_int_distribution in my random number generator, like so:
#include <random>
#include <time.h>
int get_random(int lwr_lm, int upper_lm){
    std::mt19937 mt(time(nullptr));
    std::uniform_int_distribution<int> dist(lwr_lm, upper_lm);
    return dist(mt);
}
What I need is to alter the above generator so that there is a cache containing a number of integers that must be excluded when I use the generator over and over again.
How do I alter the above so that I can achieve this?
There are many ways to do it. A simple way would be to maintain your "excluded numbers" in a std::set and after each generation of a random number, check whether it is in the set and if it is then generate a new random number - repeat until you get a number that was not in the set, then return that.
By the way: while distributions are cheap to construct, engines are not. You don't want to re-construct your mt19937 every time the function is called, but instead create it once and then re-use it. You probably also want to use a better seed than the current time in seconds.
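Putting both points together, a minimal sketch; the set name excluded and its contents are assumptions for illustration:

#include <random>
#include <set>

std::set<int> excluded = {3, 7, 11};   // numbers to skip; maintained elsewhere

int get_random(int lwr_lm, int upper_lm) {
    // Construct the engine once and reuse it across calls.
    static std::mt19937 mt{std::random_device{}()};
    std::uniform_int_distribution<int> dist(lwr_lm, upper_lm);

    int r = dist(mt);
    while (excluded.count(r) != 0)   // retry until the value is not excluded
        r = dist(mt);
    return r;                        // note: loops forever if every value is excluded
}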
Are you 1) attempting to sample without replacement in the discrete interval? Or is it 2) a patchy distribution over the interval that stays fairly constant?
If 1) you could use std::shuffle as per the answer here How to sample without replacement using c++ uniform_int_distribution
If 2) you could use std::discrete_distribution (element 0 corresponding to lwr_lm) and give zero weight to the numbers you don't want. Obviously the memory requirements are linear in upper_lm-lwr_lm, so this might not be practical if the range is large.
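A sketch of option 2, with made-up limits and excluded values just for illustration:

#include <iostream>
#include <random>
#include <vector>

int main() {
    const int lwr_lm = 0, upper_lm = 9;
    // One weight per value in [lwr_lm, upper_lm]; start with all values equally likely.
    std::vector<double> weights(upper_lm - lwr_lm + 1, 1.0);

    // Zero-weight the values we never want to see (3 and 7 here, chosen arbitrarily).
    weights[3 - lwr_lm] = 0.0;
    weights[7 - lwr_lm] = 0.0;

    std::mt19937 mt{std::random_device{}()};
    std::discrete_distribution<int> dist(weights.begin(), weights.end());

    for (int i = 0; i < 10; ++i)
        std::cout << lwr_lm + dist(mt) << ' ';   // element 0 corresponds to lwr_lm
    std::cout << '\n';
}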
I would propose two similar solutions for the problem. They are based upon probabilistic structures, and provide you with the answer "potentially in cache" or "definitely not in cache". There are false positives but no false negatives.
Perfect hash function. There are many implementations, including one from GNU. Basically, run it on the set of cache values, and use the generated perfect hash function to reject sampled values. You don't even need to maintain a hash table, just a function mapping a random value to an integer index. As soon as the index is in the hash range, reject the number. Being perfect means you need only one call to check, and the result tells you whether the number could be in the set. Values outside the cache can collide with cached ones, so false positives are possible.
Bloom filter. Same idea: build the filter with however many bits per cache item you're willing to spare, and with a quick check you will get either a "possibly in the cache" answer or a clear negative. You can trade answer precision for memory and vice versa. False positives are possible.
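A very small Bloom-filter sketch of that idea; the filter size, the number of hashes, and the splitmix64-style mixer below are arbitrary choices for illustration, not a tuned design:

#include <bitset>
#include <cstddef>
#include <cstdint>

class BloomFilter {
    static const std::size_t kBits = 1 << 16;   // filter size (arbitrary here)
    std::bitset<kBits> bits_;

    // splitmix64-style mixer, salted, reduced to a bit index.
    static std::size_t h(std::uint64_t x, std::uint64_t salt) {
        x += salt + 0x9e3779b97f4a7c15ull;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ull;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebull;
        return static_cast<std::size_t>((x ^ (x >> 31)) % kBits);
    }

public:
    void insert(int value) {
        bits_.set(h(value, 1));
        bits_.set(h(value, 2));
    }
    // true  => possibly in the cache (false positives happen)
    // false => definitely not in the cache
    bool possiblyContains(int value) const {
        return bits_.test(h(value, 1)) && bits_.test(h(value, 2));
    }
};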
As mentioned by @virgesmith in his answer, it might be a better solution depending on your problem.
The method that keeps a cache and uses it to filter future generations is inefficient for a large range.
Here is a naive example of a different method, but you will be limited by memory. You pick a random number from a buffer and remove it for the next iteration.
#include <random>
#include <time.h>
#include <iostream>
#include <numeric>
#include <vector>

int get_random(int lwr_lm, int upper_lm, std::vector<int> &buff, std::mt19937 &mt){
    if (buff.size() > 0) {
        std::uniform_int_distribution<int> dist(0, buff.size()-1);
        int tmp_index = dist(mt);
        int tmp_value = buff[tmp_index];
        // Remove the chosen value so it cannot be drawn again.
        buff.erase(buff.begin() + tmp_index);
        return tmp_value;
    } else {
        return 0;
    }
}

int main() {
    // Lower and upper limit for the random distribution
    int lower = 0;
    int upper = 10;
    // Random generator
    std::mt19937 mt(time(nullptr));
    // Buffer used to avoid duplication; it contains every integer from the lower limit up to (but not including) the upper limit
    std::vector<int> my_buffer(upper-lower);
    std::iota(my_buffer.begin(), my_buffer.end(), lower);
    for (int i = 0; i < 20; ++i) {
        std::cout << get_random(lower, upper, my_buffer, mt) << std::endl;
    }
    return 0;
}
Edit: a cleaner solution here
It might not be the prettiest solution, but what's stopping you from maintaining that cache and checking existence before returning? It will slow down for large caches though.
#include <random>
#include <time.h>
#include <set>

std::set<int> cache;

int get_random(int lwr_lm, int upper_lm){
    // Construct the engine once; re-seeding with time() on every call would
    // restart the same sequence within any given second.
    static std::mt19937 mt(time(nullptr));
    std::uniform_int_distribution<int> dist(lwr_lm, upper_lm);
    auto r = dist(mt);
    while(cache.find(r) != cache.end())
        r = dist(mt);
    return r;
}

What is an efficient method to force uniqueness using rand();

If I used (with appropriate #includes)
int main()
{
    srand(time(0));
    int arr[1000];
    for(int i = 0; i < 1000; i++)
    {
        arr[i] = rand() % 100000;
    }
    return 0;
}
To generate random 5-digit ID numbers (disregard iomanip stuff here), would those ID numbers be guaranteed by rand() to be unique? I've been running another loop to check all the values of the array vs the recently generated ID number, but it takes forever to run, considering the nested 1000-iteration loops. By the way, is there a simple way to do that check?
Since the question was tagged c++11,
you should consider using <random> in place of rand().
Using a standard distribution engine, you can't guarantee that you will get back unique values. If you use a std::set, you can keep retrying until you have the right amount. Depending on your distribution range, and the amount of unique values you are requesting, that may be adequate.
For example, here is a customized function to get n unique values from range [x,y].
#include <unordered_set>
#include <iostream>
#include <random>

template <typename T>
std::unordered_set<T> GetUniqueNumbers(int amount, T low, T high){
    static std::random_device random_device;
    static std::mt19937 engine{random_device()};
    std::uniform_int_distribution<T> dist(low, high);
    std::unordered_set<T> uniques;
    while (uniques.size() < amount){
        uniques.insert(dist(engine));
    }
    return uniques;
}

int main(){
    // get 10 unique numbers between [0,100]
    auto numbers = GetUniqueNumbers(10, 0, 100);
    for (auto number: numbers){
        std::cout << number << " ";
    }
}
No, because any guarantee about the output of a random source makes it less random.
There are specific mathematical formulas that have the behavior known as a random permutation. This site seems to have quite a good write-up about it: http://preshing.com/20121224/how-to-generate-a-sequence-of-unique-random-integers/
No, there is definitely no guarantee that rand will not produce duplicate numbers. Designing it that way would not only be expensive in terms of remembering every number it has returned so far, but would also greatly reduce its randomness (after it had returned many numbers you could guess what it is likely to return next from what it has already produced).
If uniqueness is your only goal, just use an incrementing ID number for each thing. If the numbers must also be arbitrary and hard to guess you will have to use some kind of random generator or hash, but should make the numbers much longer to make the chance of a collision much closer to 0.
However if you absolutely must do it the current way I would suggest storing all the numbers you have generated so far into a std::unordered_map and generating another random number if it is already in it.
There is a common uniqueness guarantee in most PRNGs, but it won't help you here. A generator will typically iterate over a finite number of states and not visit the same state twice until every other state has been visited once.
However, a state is not the same thing as the number you get to see. Many states can map to the same number and in the worst possible case two consecutive states could map to the same number.
That said, there are specific configurations of PRNG that can visit every value in a range you specify exactly once before revisiting an old state. Notably, an LCG designed with a modulus that is a multiple of your range can be reduced to exactly your range with another modulo operation. Since most LCG implementations have a power-of-two period, this means that the low-order bits repeat with shorter periods. However, 100000 is not a power of two, so that won't help you.
A simple method is to use an LCG, bitmask it down to a power of two larger than your desired range, and just throw away results that it produces that are out of range.
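A sketch of that last suggestion, assuming common full-period LCG constants (the Numerical Recipes ones) rather than any particular library's choice:

#include <cstdint>
#include <iostream>

// Minimal LCG with modulus 2^32 and full period (Hull-Dobell conditions hold).
uint32_t lcg_state = 1;
uint32_t lcg_next() {
    lcg_state = 1664525u * lcg_state + 1013904223u;
    return lcg_state;
}

// Mask down to the smallest power of two >= range and reject out-of-range
// values. The masked low bits themselves form a full-period LCG, so within
// one of their periods every in-range value is produced exactly once.
uint32_t next_in_range(uint32_t range) {
    uint32_t mask = 1;
    while (mask < range) mask <<= 1;
    --mask;
    uint32_t v;
    do {
        v = lcg_next() & mask;
    } while (v >= range);
    return v;
}

int main() {
    for (int i = 0; i < 10; ++i)
        std::cout << next_in_range(100000) << ' ';   // IDs below 100000, as in the question
    std::cout << '\n';
}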

Why GCC and MSVC std::normal_distribution are different? [duplicate]

This question already has an answer here:
std::normal_distribution<double> results in wrong order windows versus linux?
(1 answer)
Closed 6 years ago.
I have a simple code sample:
#include <iostream>
#include <random>
using namespace std;

int main() {
    minstd_rand0 gen(1);
    uniform_real_distribution<double> dist(0.0, 1.0);
    for(int i = 0; i < 10; ++i) {
        cout << "1 " << dist(gen) << endl;
    }
    normal_distribution<double> dist2(0.0, 1.0);
    minstd_rand0 gen2(1);
    for(int i = 0; i < 10; ++i) {
        cout << "2 " << dist2(gen2) << endl;
    }
    return 0;
}
which I compile with GCC and MSVC. I get different results from the same standard code!
So why are the GCC and MSVC std::normal_distribution results different for the same seed and generator, and, most importantly, how can I force them to be the same?
Unlike the PRNG engines defined by the standard, which must produce the same output for the same seed, the standard does not impose that mandate on distributions. From [rand.dist.general]/3:
The algorithms for producing each of the specified distributions are implementation-defined.
So in this case, even though the distribution has to have a density function of the form
p(x | μ, σ) = 1 / (σ√(2π)) · e^(−(x − μ)² / (2σ²)),
how the implementation produces it is up to the implementation.
The only way to get a portable distribution would be to write one yourself or use a third party library.
It's problematic, but the standard unfortunately does not specify in detail which algorithm to use when generating (many of) the randomly distributed numbers, and there are several valid alternatives with different benefits.
26.6.8.5 Normal distributions [rand.dist.norm]
26.6.8.5.1 Class template normal_distribution [rand.dist.norm.normal]
A normal_distribution random number distribution produces random numbers x distributed according to the probability density function
p(x | μ, σ) = 1 / (σ√(2π)) · e^(−(x − μ)² / (2σ²)).
The parameters μ and σ are also known as this distribution's mean and standard deviation.
The most common algorithm for generating normally distributed numbers is Box-Muller, but even with that algorithm there are options and variations.
The freedom is even explicitly mentioned in the standard:
26.6.8 Random number distribution class templates [rand.dist]
. . .
3 The algorithms for producing each of the specified distributions are implementation-defined.
A go-to option for this is Boost.Random.
By the way, as @Hurkyl points out: it seems that the two implementations are actually the same. For example, Box-Muller generates pairs of values, of which one is returned and one is cached. The two implementations differ only in which of the values is returned.
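For illustration, here is a bare-bones Box-Muller sketch (not libstdc++'s or MSVC's actual code; the function name box_muller is made up) that shows where the pair generation, caching, and ordering come in:

#include <cmath>
#include <iostream>
#include <random>

// Generates N(0,1) values two at a time and caches the second one.
// A different implementation could legally hand back z1 before z0,
// which yields a different-looking stream from the exact same engine.
template <class Engine>
double box_muller(Engine& eng) {
    static bool have_cached = false;
    static double cached = 0.0;
    if (have_cached) {
        have_cached = false;
        return cached;
    }
    const double pi = 3.141592653589793;
    std::uniform_real_distribution<double> u(0.0, 1.0);
    double u1 = u(eng), u2 = u(eng);
    double r = std::sqrt(-2.0 * std::log(1.0 - u1));  // 1-u1 is in (0,1], so log is finite
    double z0 = r * std::cos(2.0 * pi * u2);
    double z1 = r * std::sin(2.0 * pi * u2);
    cached = z1;
    have_cached = true;
    return z0;
}

int main() {
    std::minstd_rand0 gen(1);
    for (int i = 0; i < 10; ++i)
        std::cout << box_muller(gen) << '\n';
}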
Further, the random number engines are completely specified and will give the same sequence between implementations, but care does need to be taken since the different distributions can also consume different amounts of random data in order to produce their results, which will put the engines out of sync.

How to get nth number in sequence of rand() directly without having to call rand() n times?

According to my understanding, setting srand with a particular seed causes the sequence of calls to rand() to produce the same series of numbers each time for that particular seed:
Eg:
srand(seed1);
rand() // firstnumber (e.g.: 42)
rand() // second number (e.g: 17)
srand(seed1)
rand() // first number (same as above (42))
rand() // second number (same as above (17))
Is there a way to get the nth number in the sequence directly without having to call rand() n times ?
For example, if I want the 17th random number in the series, I want to get the number in one call, instead of calling rand() 17 times.
I cannot precompute and store the values
EDIT: I was looking at this article:
https://mathoverflow.net/questions/104915/pseudo-random-algorithm-allowing-o1-computation-of-nth-element
The answer on linear feedback shift registers seems to do it, but rather than implement it myself, I would rather use a trusted implementation, since this seems like a common problem.
EDIT: The reason I want to "jump" to the nth term, is because I use rand in different classes with different seeds, and I keep jumping back and forth between each class. I want the sequence in each class to continue where it left off, instead of starting from the first number each time. This is a single threaded application.
EDIT: When writing the post, I used the term PRNG. But really I'm just looking for a function which appears to produce random numbers. I'm using this for graphics, so there is no security concern. I use the random numbers to produce slight offsets in pixels.
I just need a function which is fast, appears to produce random numbers (but doesn't have to be of the kind used in security applications), and can calculate the nth number in O(1) time.
Edit: Made a mistake - storing state isn't enough. I need to calculate the nth random number in the series in O(1) time. Since within the same class there may be multiple calls for the same nth term, storing state won't be enough, and I need to compute the nth term in O(1).
All of the C++11 PRNGs have a "discard" function, e.g.
#include <random>
#include <iostream>

int main() {
    std::mt19937 rng;
    static const size_t distance = 5;
    rng.seed(0);
    rng.discard(distance);
    std::cout << "after discard 5: " << rng() << '\n';
    rng.seed(0);
    for (size_t i = 0; i <= distance; ++i) {
        std::cout << i << ": " << rng() << '\n';
    }
}
http://ideone.com/0zeRNq
after discard 5: 3684848379
0: 2357136044
1: 2546248239
2: 3071714933
3: 3626093760
4: 2588848963
5: 3684848379
Make your own rand and store one in each class. Of course this is the weakest kind of PRNG, but the point is that you can have multiple PRNGs active at once.
class Rand {
    int seed;
    const int a = 1103515245;
    const int c = 12345;
public:
    Rand();
    void srand( int );
    int rand();
};

Rand::Rand() : seed(123456789) {}

void Rand::srand( int s ) { seed = s; }

int Rand::rand()
{
    seed = a * seed + c;
    return seed;
}
The OP says: "I use rand in different classes with different seeds".
Each instance of Rand has its own seed.
So place an instance of Rand in each object that needs its own seed.
Use rand_r(). With that function, the seed is not global and implicit. You pass the seed to use explicitly and the function updates it as it computes the next random number. That way, each class's stream of random numbers is independent of the others'.
Each object or each class (depending on your design needs) would store a seed value in an unsigned int variable. It would initialize it; for objects, in the init method; for classes, in +initialize. You could use the time or perhaps /dev/random for the initial value. If you initialize several such objects or classes in close succession, then using the time is a bad idea, since they may all happen at the "same" time (within the resolution of the clock you use).
After that, each time you want a random number, you call rand_r(&yourSeedVariable). That will return a pseudo-random value computed only from the passed-in seed, not using any implicit or global state. It uses the same algorithm as rand(). It also updates the seed variable such that the next call will produce the next random number in that sequence.
Any other object or class using this same technique would have an independent random sequence. Their calls to rand_r() would not affect this object's or class's and this object's or class's calls will not affect them. Same for any callers of rand().
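A small sketch of that idea; note that rand_r is POSIX rather than standard C++, and the class name Sprinkler is made up for illustration:

#include <stdlib.h>   // rand_r (POSIX)
#include <iostream>

class Sprinkler {
    unsigned int seed_;   // this object's private random-number stream
public:
    explicit Sprinkler(unsigned int seed) : seed_(seed) {}
    int next() { return rand_r(&seed_); }   // advances only this object's seed
};

int main() {
    Sprinkler a(12345), b(67890);
    // Interleaving calls on a and b does not disturb either sequence.
    std::cout << a.next() << ' ' << b.next() << ' '
              << a.next() << ' ' << b.next() << '\n';
}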
To clarify a bit further. You said in one of the edits to your question:
The reason I want to "jump" to the nth term, is because I use rand in
different classes with different seeds, and I keep jumping back and
forth between each class. I want the sequence in each class to
continue where it left off, instead of starting from the first number
each time.
I am addressing that need with my suggestion. My suggestion does not address your question as phrased originally. It does not let you get the *n*th number in a pseudo-random sequence. It instead lets you use separate sequences in separate parts of your code such that they don't interfere with each other.
You want random access to a set of pseudorandom streams. You can get it by switching from std::rand() to a block cipher in counter mode (CTR) as your pseudorandom number generator. To read successive pseudorandom numbers, encrypt successive cleartext numbers. To read them in some other order, encrypt the corresponding cleartext numbers in that order. Each class would then have its own seed, consisting of a key and an initial value.
For example, one class's seed might be 8675309 and initial value 8008135. To read off successive random numbers, encrypt each of 8008136, 8008137, 8008138, 8008139, 8008140, ... with that key. To read off the 17th number in this sequence, encrypt (8008135 + 17) = 8008152.
You can use a 1:1 hash function on a 32-bit or 64-bit counter. For your hash you can adapt any method that a PRNG would use as its feedback and/or tempering function, like this one from Wikipedia's xorshift page:
#include <stdint.h>

uint64_t state;

void srand(uint64_t seed) {
    state = seed;
}

// xorshift*-style mixing function: a 1:1 mapping from x to the result.
uint64_t hash(uint64_t x) {
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    return x * 2685821657736338717ull;
}

// Sequential interface: hash an incrementing counter.
uint32_t rand(void) {
    return hash(state++) >> 32;
}

// Random-access interface: the nth value directly.
uint32_t rand(uint32_t n) {
    return hash(n) >> 32;
}
The main thing about PRNGs is that (in common, fast implementations) the next value depends on the previous one. So no, you can't get the Nth value without calculating all the previous N-1 values.
Short answer: no.
Longer answer: Pseudorandom series are "random" in that the computer cannot pre-compute the series without knowing the previously pre-computed item (or the seed), but are "pseudo" in that the series is reproducible using the same seed.
From using Google-fu, LFSRs require a finite number of states. PRNGs, which is what you're trying to get, do not.

Get random number in sequence C++

Is there a way, using the C++ standard library's built-in random generator, to get a specific random number in a sequence, without saving them all?
Like
srand(cTime);
getRand(1); // 10
getRand(2); // 8995
getRand(3); // 65464456
getRand(1); // 10
getRand(2); // 8995
getRand(1); // 10
getRand(3); // 65464456
C++11 random number engines are required to implement a member function discard(unsigned long long z) (§26.5.1.4) that advances the random number sequence by z steps. The complexity guarantee is quite weak: "no worse than the complexity of z consecutive calls e()". This member exists solely to make it possible to expose more performant implementations where possible, as note 274 states:
This operation is common in user code, and can often be implemented
in an engine-specific manner so as to provide significant performance
improvements over an equivalent naive loop that makes z consecutive
calls e().
Given discard you can easily implement your requirement to retrieve the nth number in sequence by reseeding a generator, discarding n-1 values and using the next generated value.
I'm unaware of which - if any - of the standard RNG engines are amenable to efficient implementations of discard. It may be worth your time to do a bit of investigation and profiling.
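A sketch of that reseed-and-discard approach; the function name getNth is an assumption, not a standard API, and discard() may still cost O(n) for engines without an efficient jump-ahead:

#include <cstdint>
#include <iostream>
#include <random>

// Return the nth value (1-based) of the stream an mt19937 produces when
// seeded with `seed`.
std::mt19937::result_type getNth(std::uint32_t seed, unsigned long long n) {
    std::mt19937 rng(seed);
    rng.discard(n - 1);
    return rng();
}

int main() {
    std::cout << getNth(0, 6) << '\n';  // same value as the 6th call after seeding with 0
}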
You have to save the numbers. There may be other variants, but it still requires saving a list of numbers (e.g. using different seeds based on the argument to getRand() - but that wouldn't really be beneficial over saving them).
Something like this would work reasonably well, I'd say:
#include <cstdlib>
#include <map>

int getRand(int n)
{
    static std::map<int, int> mrand;
    // Check if it's there.
    auto it = mrand.find(n);
    if (it != mrand.end())
    {
        return it->second;
    }
    int r = rand();
    mrand[n] = r;
    return r;
}
(I haven't compiled this code, just written it up as a "this sort of thing might work")
Implement getRand() to always re-seed and then advance to the given index. This will interfere with all other random numbers in the system, though, and will be slow, especially for large indexes. Assuming a 1-based index:
#include <cstdlib>

int getRand(int index)
{
    srand(999); // fix the seed
    for (int loop = 1; loop < index; ++loop)
        rand();
    return rand();
}
Similar to cdmh's post, the following, using C++11, could also be used:
#include <random>

long getrand(int index)
{
    std::default_random_engine e;
    for (auto i = 1; i < index; i++)
        e();
    return e();
}
Check out:
Random123
From the documentation:
Random123 is a library of "counter-based" random number generators (CBRNGs), in which the Nth random number can be obtained by applying a stateless mixing function to N.