Is there a way using the C++ standard library built in random generator to get a specific random number in a sequence, without saving them all?
Like
srand(cTime);
getRand(1); // 10
getRand(2); // 8995
getRand(3); // 65464456
getRand(1); // 10
getRand(2); // 8995
getRand(1); // 10
getRand(3); // 65464456
C++11 random number engines are required to implement a member function discard(unsigned long long z) (§26.5.1.4) that advances the random number sequence by z steps. The complexity guarantee is quite weak: "no worse than the complexity of z consecutive calls e()". This member obviously exists to make it possible to expose more performant implementations where available, as note 274 states:
This operation is common in user code, and can often be implemented
in an engine-specific manner so as to provide significant performance
improvements over an equivalent naive loop that makes z consecutive
calls e().
Given discard you can easily implement your requirement to retrieve the nth number in sequence by reseeding a generator, discarding n-1 values and using the next generated value.
I'm unaware of which - if any - of the standard RNG engines are amenable to efficient implementations of discard. It may be worth your time to do a bit of investigation and profiling.
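For illustration, here is a minimal sketch of that reseed-and-discard approach (the function name and the 1-based indexing are my own choices, not from the question):

#include <random>

// Return the nth value (1-based) of the sequence produced by a given seed.
std::mt19937::result_type nth_random(std::mt19937::result_type seed,
                                     unsigned long long n)
{
    std::mt19937 e(seed);  // fresh engine, reseeded on every call
    e.discard(n - 1);      // skip the first n-1 outputs
    return e();            // the nth output
}

Note the cost is whatever discard costs; with a naive discard this is still O(n) per call.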
You have to save the numbers. There are other variants, but they still require saving a list of numbers (e.g. using different seeds based on the argument to getRand() - but that wouldn't really be beneficial over saving them).
Something like this would work reasonably well, I'd say:
#include <cstdlib>
#include <map>

int getRand(int n)
{
    static std::map<int, int> mrand;
    // Check if it's there.
    std::map<int, int>::iterator it = mrand.find(n);
    if (it != mrand.end())
    {
        return it->second;
    }
    int r = rand();
    mrand[n] = r;
    return r;
}
(I haven't compiled this code, just written it up as a "this sort of thing might work")
Implement getRand() to always reseed and then advance to the requested index. This will interfere with all other random numbers in the system, though, and will be slow, especially for large indexes. Assuming a 1-based index:
#include <cstdlib>

int getRand(int index)
{
    srand(999); // fix the seed
    for (int loop = 1; loop < index; ++loop)
        rand();
    return rand();
}
Similar to cdmh's post, the following C++11 code could also be used:
#include <random>

long getrand(int index)
{
    std::default_random_engine e;
    for (auto i = 1; i < index; i++)
        e();
    return e();
}
Check out:
Random123
From the documentation:
Random123 is a library of "counter-based" random number generators (CBRNGs), in which the Nth random number can be obtained by applying a stateless mixing function to N.
Related
So in C++ I'm using the mt19937 engine and uniform_int_distribution in my random number generator, like so:
#include <random>
#include <time.h>
int get_random(int lwr_lm, int upper_lm){
    std::mt19937 mt(time(nullptr));
    std::uniform_int_distribution<int> dist(lwr_lm, upper_lm);
    return dist(mt);
}
What I need is to alter the above generator such that there is a cache that contains a number of integers I need to be excluded when I use the above generator over and over again.
How do I alter the above such that I can achieve this?
There are many ways to do it. A simple way would be to maintain your "excluded numbers" in a std::set and, after each generation of a random number, check whether it is in the set; if it is, generate a new random number, and repeat until you get a number that is not in the set, then return that.
Btw, while distributions are cheap to construct, engines are not. You don't want to re-construct your mt19937 every time the function is called; create it once and then re-use it. You probably also want to use a better seed than the current time in seconds.
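For instance, a minimal sketch combining the rejection idea with a once-constructed engine (the function name is illustrative; the loop assumes the excluded set does not cover the whole range):

#include <random>
#include <set>

int get_random_excluding(int lwr_lm, int upper_lm, const std::set<int>& excluded)
{
    static std::mt19937 mt{std::random_device{}()};  // constructed and seeded once
    std::uniform_int_distribution<int> dist(lwr_lm, upper_lm);
    int r = dist(mt);
    while (excluded.count(r) != 0)  // reject and redraw
        r = dist(mt);
    return r;
}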
Are you 1) attempting to sample without replacement in the discrete interval? Or is it 2) a patchy distribution over the interval that stays fairly constant?
If 1) you could use std::shuffle as per the answer here How to sample without replacement using c++ uniform_int_distribution
If 2) you could use std::discrete_distribution (element 0 corresponding to lwr_lm) and give the numbers you don't want zero weight, as in the sketch below. Obviously the memory requirement is linear in upper_lm-lwr_lm, so this might not be practical if that range is large.
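For option 2, a sketch (illustrative only; for repeated use you would build the distribution once instead of on every call):

#include <random>
#include <vector>

int get_random_weighted(int lwr_lm, int upper_lm, const std::vector<int>& excluded)
{
    static std::mt19937 mt{std::random_device{}()};
    std::vector<double> weights(upper_lm - lwr_lm + 1, 1.0);  // element 0 <-> lwr_lm
    for (int e : excluded)
        weights[e - lwr_lm] = 0.0;                            // never drawn
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    return lwr_lm + dist(mt);                                 // shift back into range
}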
I would propose two similar solutions for the problem. They are based upon probabilistic structures, and provide you with the answer "potentially in cache" or "definitely not in cache". There are false positives but no false negatives.
Perfect hash function. There are many implementations, including one from GNU. Basically, run it on the set of cache values, and use the generated perfect hash function to reject sampled values. You don't even need to maintain a hash table, just a function mapping a random value to an integer index. If the index is within the hash range, reject the number. Being perfect means you need only one call to check, and the result tells you whether the number may be in the set. There are potential collisions, so false positives are possible.
Bloom filter. Same idea: build the filter with however many bits per cache item you're willing to spare, and with a quick check you will either get a "possibly in the cache" answer or a clear negative. You can trade answer precision for memory and vice versa. False positives are possible.
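To make the second idea concrete, a minimal Bloom-filter sketch (two hash functions derived from std::hash with illustrative mixing constants; a real filter would size the bit array and the number of hashes from the expected item count and target false-positive rate):

#include <cstdint>
#include <functional>
#include <vector>

class BloomFilter {
    std::vector<bool> bits_;
    std::size_t index(int value, std::uint64_t seed) const {
        std::uint64_t mixed = static_cast<std::uint64_t>(value) ^ (seed * 0x9e3779b97f4a7c15ull);
        return std::hash<std::uint64_t>{}(mixed) % bits_.size();
    }
public:
    explicit BloomFilter(std::size_t nbits) : bits_(nbits, false) {}
    void insert(int value) {
        bits_[index(value, 1)] = true;
        bits_[index(value, 2)] = true;
    }
    bool possibly_contains(int value) const {  // true may be a false positive
        return bits_[index(value, 1)] && bits_[index(value, 2)];
    }
};

Rejecting a sampled value whenever possibly_contains() returns true occasionally rejects a valid number (a false positive), which is exactly the precision-for-memory trade mentioned above.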
As mentioned by @virgesmith in his answer, there might be a better solution depending on your problem.
The method of keeping a cache and using it to filter future generation is inefficient for a large range.
Here I write a naive example with a different method, but you will be limited by your memory. You pick a random number from a buffer and remove it for the next iteration.
#include <random>
#include <time.h>
#include <iostream>
#include <numeric>   // std::iota
#include <vector>

int get_random(int lwr_lm, int upper_lm, std::vector<int> &buff, std::mt19937 &mt){
    if (buff.size() > 0) {
        std::uniform_int_distribution<int> dist(0, buff.size()-1);
        int tmp_index = dist(mt);
        int tmp_value = buff[tmp_index];
        buff.erase(buff.begin() + tmp_index);
        return tmp_value;
    } else {
        return 0;
    }
}

int main() {
    // lower and upper limit for random distribution
    int lower = 0;
    int upper = 10;
    // Random generator
    std::mt19937 mt(time(nullptr));
    // Buffer to filter and avoid duplication; contains all integers in [lower, upper)
    std::vector<int> my_buffer(upper-lower);
    std::iota(my_buffer.begin(), my_buffer.end(), lower);
    for (int i = 0; i < 20; ++i) {
        std::cout << get_random(lower, upper, my_buffer, mt) << std::endl;
    }
    return 0;
}
Edit: a cleaner solution here
It might not be the prettiest solution, but what's stopping you from maintaining that cache and checking existence before returning? It will slow down for large caches though.
#include <random>
#include <time.h>
#include <set>

std::set<int> cache;

int get_random(int lwr_lm, int upper_lm){
    static std::mt19937 mt(time(nullptr));  // construct (and seed) the engine only once
    std::uniform_int_distribution<int> dist(lwr_lm, upper_lm);
    auto r = dist(mt);
    while(cache.find(r) != cache.end())
        r = dist(mt);
    return r;
}
Since C++11 there are a number of std random number engines. One of the member functions they implement is void discard(unsigned long long z), which skips over z randomly generated numbers. The complexity of this function is given as O(z) on www.cplusplus.com (http://www.cplusplus.com/reference/random/mersenne_twister_engine/discard/)
However, on www.cppreference.com (http://en.cppreference.com/w/cpp/numeric/random/mersenne_twister_engine/discard) there is a note to say that
For some engines, "fast jump" algorithms are known, which advance
the state by many steps (on the order of millions) without calculating
intermediate state transitions.
How do I know for which engines the actual cost of discard is O(1)?
Well, if you use precomputed jump points, O(1) will work for each and every RNG in existence. Please remember that there are algorithms which might do better than O(z), but still not O(1) - say, O(log2 z).
If we're talking about jumping to an arbitrary point, things get interesting. For example, for the linear congruential generator there is a known O(log2 z) jump-ahead algorithm, based on the paper by F. Brown, "Random Number Generation with Arbitrary Stride," Trans. Am. Nucl. Soc. (Nov. 1994). Code example is here.
There is an LCG RNG in the C++11 standard; I'm not sure how fast jump-ahead is done in any particular implementation (http://en.cppreference.com/w/cpp/numeric/random/linear_congruential_engine)
The PCG family of RNGs shares the same property, I believe.
The fact is that std::linear_congruential_engine<UIntType,a,c,m>::discard(unsigned long long z) can definitely be implemented very efficiently. It is mostly equivalent to exponentiation of a to the power z modulo m (for both zero and non-zero c): the most basic software implementation executes in O(log(z % phi(m))) UIntType multiplications (phi(m) = m-1 for prime m and < m in general), and it can be made much faster with a parallel hardware exponentiation algorithm.
Note that in a way O(log(z % phi(m))) is O(1), because log2(z % phi(m)) < log2(m) < sizeof(UIntType)*CHAR_BIT - though in practice it is more often like O(log(z)).
There are probably also efficient algorithms for most other engines' discard functions that would satisfy an O(P(size of state)) constraint (P(x) being some low-degree polynomial, most likely of degree 1+epsilon, or even something smaller like x*log(x), given that log(z) < sizeof(unsigned long long)*CHAR_BIT may be considered a constant).
Somehow, for some unknown reason, the C++ standard (as of ISO/IEC 14882:2017) does not require discard to be implemented any more efficiently than z operator()() calls for any PRNG engine, including those that definitely allow it.
To me personally this is baffling and JUST MAKES NO SENSE - it brutally violates one of the fundamental C++ language design principles, namely to add to the C++ standard only functionality that is reasonable in terms of performance and practical usefulness.
For example: there is NO SUCH THING AS std::list<T>::operator[](size_type n), even though it would be as "easy" as calling operator++() n times starting from begin(). And naturally so, because O(n) execution time would make this function an unreasonable choice in any practical application (a code word for "plain stupid idea"). For this obvious reason a[n] and a.at(n) are not part of the mandatory sequence container requirements (ISO/IEC 14882:2017 26.2.3 Table 87) but instead are part of the optional sequence container operations (ISO/IEC 14882:2017 26.2.3 Table 88).
SO why in the world is e.discard(z) part of the mandatory random number engine requirements (ISO/IEC 14882:2017 29.6.1.4 Table 104), with this ridiculous complexity requirement - no worse than the complexity of z consecutive calls e() - instead of being an entry in an optional operations section with an adequate complexity requirement like O(size of state) or O(P(size of state))?
Even more baffling was to actually find in my GCC this real-world implementation:
void discard(unsigned long long __z)
{
    for (; __z != 0ULL; --__z)
        (*this)(); //<-- Wait what? Are you kidding me?
}
So once again, as before, we have no choice but to implement the necessary functionality ourselves...
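For the LCG case that is easy to act on. Here is a sketch of the O(log z) jump-ahead for a power-of-two modulus (here 2^64, so the modular reduction is plain uint64_t wraparound), following the F. Brown algorithm cited in an earlier answer; PCG's "advance" works the same way:

#include <cstdint>

std::uint64_t lcg_advance(std::uint64_t state, std::uint64_t a, std::uint64_t c,
                          unsigned long long z)
{
    std::uint64_t cur_mult = a, cur_plus = c;  // transform for the current 2^i steps
    std::uint64_t acc_mult = 1, acc_plus = 0;  // accumulated transform for z steps
    while (z > 0) {
        if (z & 1) {                           // fold this power into the accumulator
            acc_mult *= cur_mult;
            acc_plus = acc_plus * cur_mult + cur_plus;
        }
        cur_plus *= cur_mult + 1;              // square the affine transform:
        cur_mult *= cur_mult;                  // (ax+c) applied twice = a^2*x + (a+1)*c
        z >>= 1;
    }
    return acc_mult * state + acc_plus;        // state after z steps
}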
I don't think such a thing exists at all.
My heuristic conclusion is that an O(1)-jump RNG is basically a hash, with all that this implies (e.g. it might not be a "good" RNG at all).
(See the new answer by @AkiSuihkonen and the link therein.)
But even if you are asking about O(log z), I don't see that implemented in the STL.
In GCC, all the discard functions I was able to grep are simple loops.
discard(unsigned long long __z)
{
    for (; __z != 0ULL; --__z)
        (*this)();
}
This is not only sad but also misleading, since discard should exist only if there is an efficient way to do it.
The only non-trivial one is the Mersenne engine (below), but it is still O(z).
discard(unsigned long long __z)
{
    while (__z > state_size - _M_p)
    {
        __z -= state_size - _M_p;
        _M_gen_rand();
    }
    _M_p += __z;
}
Boost's Mersenne twister has a skip function, but it is only called for skips larger than (a default of) 10000000 (!?). Which already tells me that the skip is computationally very heavy (even if it is O(log z)).
https://www.boost.org/doc/libs/1_72_0/boost/random/mersenne_twister.hpp
Finally, Thrust apparently has an efficient discard for its linear congruential engine, but only in the case c == 0. (I am not sure whether that makes it less useful as an RNG.)
https://thrust.github.io/doc/classthrust_1_1random_1_1linear__congruential__engine_aec05b19d2a85d02f1ff437791ea4dd68.html#aec05b19d2a85d02f1ff437791ea4dd68
All Counter Based Random Number Generators
These work solely by computing a function uint64_t rnd(uint64_t counter) or maybe uint16_t rnd(uint128_t counter). Then the skip function is as easy as
#include <cstdint>

// the method itself is "randomly" generated -- known at least
// by von Neumann to typically generate poor results
struct MyRandomCBRNG {
    uint64_t counter{0};
    void skip(uint64_t a) { counter += a; }
    uint64_t operator()() {
        uint64_t x = counter++;
        // repeat multiple times if needed; parentheses added to make the
        // evaluation order explicit
        x = (x * 0xdeadbeefcafebabeull) ^ (x >> 53) ^ ((x << 11) + 0x13459876abcdfdecull);
        return x;
    }
};
One can even make cryptographically strong CBRNGs by hashing the counter concatenated with a secret key using something like SHA-512, not to mention Blum-Blum-Shub.
If I used (with appropriate #includes)
int main()
{
    srand(time(0));
    int arr[1000];
    for(int i = 0; i < 1000; i++)
    {
        arr[i] = rand() % 100000;
    }
    return 0;
}
To generate random 5-digit ID numbers (disregard the iomanip stuff here), would those ID numbers be guaranteed by rand() to be unique? I've been running another loop to check all the values of the array against the most recently generated ID number, but it takes forever to run, considering the nested 1000-iteration loops. By the way, is there a simple way to do that check?
Since the question was tagged c++11,
you should consider using <random> in place of rand().
Using a standard distribution engine, you can't guarantee that you will get back unique values. If you use a std::set, you can keep retrying until you have the right amount. Depending on your distribution range, and the amount of unique values you are requesting, that may be adequate.
For example, here is a customized function to get n unique values from the range [low, high].
#include <unordered_set>
#include <iostream>
#include <random>

template <typename T>
std::unordered_set<T> GetUniqueNumbers(int amount, T low, T high){
    static std::random_device random_device;
    static std::mt19937 engine{random_device()};
    std::uniform_int_distribution<T> dist(low, high);
    std::unordered_set<T> uniques;
    while (uniques.size() < static_cast<std::size_t>(amount)){
        uniques.insert(dist(engine));
    }
    return uniques;
}

int main(){
    //get 10 unique numbers between [0,100]
    auto numbers = GetUniqueNumbers(10, 0, 100);
    for (auto number: numbers){
        std::cout << number << " ";
    }
}
No, because any guarantee about the output of a random source makes it less random.
There are specific mathematical formulas that have the behavior known as a random permutation. This site seems to have quite a good write-up about it: http://preshing.com/20121224/how-to-generate-a-sequence-of-unique-random-integers/
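If I recall that write-up correctly, its core building block is a quadratic-residue permutation; here is a sketch (treat the construction and the constant as assumptions from memory, not a verified copy of the article):

#include <cstdint>

// For a prime p with p % 4 == 3, each quadratic residue has exactly one
// square root in each half of [0, p), and the negation of a residue is
// never itself a residue, so this maps [0, p) onto itself with no duplicates.
std::uint32_t permute(std::uint32_t x)
{
    const std::uint32_t p = 4294967291u;  // largest prime below 2^32, p % 4 == 3
    if (x >= p) return x;                 // the five leftover values map to themselves
    std::uint32_t residue = static_cast<std::uint32_t>(
        static_cast<std::uint64_t>(x) * x % p);
    return (x <= p / 2) ? residue : p - residue;
}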
No, there is definitely no guarantee that rand will not produce duplicate numbers. Designing it that way would not only be expensive in terms of remembering all the numbers it has returned so far, but would also greatly reduce its randomness (after it had returned many numbers, you could guess what it is likely to return next from what it had already returned).
If uniqueness is your only goal, just use an incrementing ID number for each thing. If the numbers must also be arbitrary and hard to guess, you will have to use some kind of random generator or hash, but you should make the numbers much longer to bring the chance of a collision much closer to 0.
However, if you absolutely must do it the current way, I would suggest storing all the numbers you have generated so far in a std::unordered_map and generating another random number if it is already in there.
There is a common uniqueness guarantee in most PRNGs, but it won't help you here. A generator will typically iterate over a finite number of states and not visit the same state twice until every other state has been visited once.
However, a state is not the same thing as the number you get to see. Many states can map to the same number and in the worst possible case two consecutive states could map to the same number.
That said, there are specific configurations of PRNG that can visit every value in a range you specify exactly once before revisiting an old state. Notably, an LCG designed with a modulus that is a multiple of your range can be reduced to exactly your range with another modulo operation. Most LCG implementations have a power-of-two period, which means the low-order bits repeat with shorter periods. However, 100000 is not a power of two, so that won't help you.
A simple method is to use an LCG, bitmask it down to a power of two larger than your desired range, and just throw away results that are out of range.
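A sketch of that method adapted to the asker's 100000-ID range, assuming a full-period LCG modulo 2^17 (the smallest power of two above 100000; the constants merely satisfy the Hull-Dobell full-period conditions, c odd and a % 4 == 1, and are otherwise illustrative):

#include <cstdint>

// Visits every 17-bit value exactly once per period; out-of-range results
// are rejected, so each ID in [0, 100000) appears exactly once per cycle.
// This guarantees uniqueness, not statistical quality - the low-order bits
// of a power-of-two LCG are weak, as noted above.
struct UniqueIdLcg {
    std::uint32_t state = 0;
    std::uint32_t next() {
        for (;;) {
            state = (state * 1664525u + 1013904223u) & 0x1FFFFu;  // LCG mod 2^17
            if (state < 100000u)
                return state;  // reject 100000..131071 and step again
        }
    }
};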
I am a physicist, writing a program that involves generating several (order of a few billions) random numbers, drawn from a Gaussian distribution. I am trying to use C++11. The generation of these random numbers is separated by an operation that should take very little time. My biggest worry is if the fact that I am generating so many random numbers, with such a little time gap, could potentially lead to sub-optimal performance. I am testing certain statistical properties, which rely heavily on the independence of the randomness of the numbers, so, my result is particularly sensitive to these issues. My question is, with the kinds of numbers I mention below in the code (a simplified version of my actual code), am I doing something obviously (or even, subtly) wrong?
#include <random>
// Several other includes, etc.

int main () {
    int dim_vec(400), nStats(1e8);
    vector<double> vec1(dim_vec), vec2(dim_vec);
    // Initialize the above vectors, which are order 1 numbers.
    random_device rd;
    mt19937 generator(rd());
    double y(0.0);
    double l(0.0);
    for (int i(0); i<nStats; i++)
    {
        for (int j(0); j<dim_vec; j++)
        {
            normal_distribution<double> distribution(0.0, 1/sqrt(vec1[j]));
            l = distribution(generator);
            y += l*vec2[j];
        }
        cout << y << endl;
        y = 0.0;
    }
}
The normal_distribution is allowed to have state. And with this particular distribution, it is common to generate numbers in pairs with every other call, and on the odd calls, return the second cached number. By constructing a new distribution on each call you are throwing away that cache.
Fortunately you can "shape" a single distribution by calling with different normal_distribution::param_type's:
normal_distribution<double> distribution;
using P = normal_distribution<double>::param_type;
for (int i(0); i<nStats; i++)
{
    for (int j(0); j<dim_vec; j++)
    {
        l = distribution(generator, P(0.0, 1/sqrt(vec1[j])));
        y += l*vec2[j];
    }
    cout << y << endl;
    y = 0.0;
}
I'm not familiar with all implementations of std::normal_distribution. However I wrote the one for libc++. So I can tell you with some amount of certainty that my slight rewrite of your code will have a positive performance impact. I am unsure what impact it will have on the quality, except to say that I know it won't degrade it.
Update
Regarding Severin Pappadeux's comment below about the legality of generating pairs of numbers at a time within a distribution: See N1452 where this very technique is discussed and allowed for:
Distributions sometimes store values from their associated source of
random numbers across calls to their operator(). For example, a common
method for generating normally distributed random numbers is to
retrieve two uniformly distributed random numbers and compute two
normally distributed random numbers out of them. In order to reset the
distribution's random number cache to a defined state, each
distribution has a reset member function. It should be called on a
distribution whenever its associated engine is exchanged or restored.
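A tiny illustration of reset() (a sketch; the point is only that clearing the cached value restores reproducibility once the engine is rewound):

#include <cassert>
#include <random>

int main() {
    std::mt19937 eng(42);
    std::normal_distribution<double> dist;
    double a = dist(eng);  // may leave a second value cached inside dist
    eng.seed(42);          // engine rewound to a known state...
    dist.reset();          // ...so drop the cached value as well
    double b = dist(eng);
    assert(a == b);        // deterministic again
}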
Some thoughts on top of the excellent HH answer.
Normal distribution (mu,sigma) is generated from normal (0,1) by shift and scale:
N(mu, sigma) = mu + N(0,1)*sigma
if your mean (mu) is always zero, you could simplify and speed up your code (by not adding 0.0) by doing something like
normal_distribution<double> distribution;
for (int i(0); i<nStats; i++)
{
    for (int j(0); j<dim_vec; j++)
    {
        l = distribution(generator);
        y += l*vec2[j]/sqrt(vec1[j]);
    }
    cout << y << endl;
    y = 0.0;
}
If speed is of utmost importance, I would try to precompute everything I can outside the main 10^8 loop. Is it possible to precompute sqrt(vec1[j]) so you save on the sqrt() calls? Is it possible to have vec2[j]/sqrt(vec1[j]) as a single vector, as sketched below?
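For instance, a sketch of that single-vector precomputation, continuing with the names from the question's code:

// Hoist the per-element coefficient out of the 10^8-iteration loop.
vector<double> coef(dim_vec);
for (int j(0); j<dim_vec; j++)
    coef[j] = vec2[j]/sqrt(vec1[j]);   // computed once

normal_distribution<double> distribution;  // plain N(0,1)
for (int i(0); i<nStats; i++)
{
    for (int j(0); j<dim_vec; j++)
        y += distribution(generator)*coef[j];
    cout << y << endl;
    y = 0.0;
}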
If it is not possible to precompute those vectors, I would try to save on memory access. Keeping the pieces of vec2[j] and vec1[j] together might help by fetching one cache line instead of two. So declare vector<pair<double,double>> vec12(dim_vec); and sample with y += l*vec12[j].first/sqrt(vec12[j].second).
According to my understanding, setting srand with a particular seed causes the sequence of calls to rand() to produce the same series of numbers each time for that particular seed:
Eg:
srand(seed1);
rand() // firstnumber (e.g.: 42)
rand() // second number (e.g: 17)
srand(seed1)
rand() // first number (same as above (42))
rand() // second number (same as above (17))
Is there a way to get the nth number in the sequence directly without having to call rand() n times ?
For example, if I want the 17th random number in the series, I want to get the number in one call, instead of calling rand() 17 times.
I cannot precompute and store the values
EDIT: I was looking at this article :
https://mathoverflow.net/questions/104915/pseudo-random-algorithm-allowing-o1-computation-of-nth-element
The answer on linear feedback shift registers seems to do it, but rather than implement it myself, I would rather use a trusted implementation, since this seems like a common problem.
EDIT: The reason I want to "jump" to the nth term, is because I use rand in different classes with different seeds, and I keep jumping back and forth between each class. I want the sequence in each class to continue where it left off, instead of starting from the first number each time. This is a single threaded application.
EDIT: When writing the post, I used the term PRNG. But really I'm just looking for a function which appears to produce random number. I'm using this for graphics, so there is no security problem. I use the random numbers to produce slight offsets in pixels.
I just need a function which is fast.
Appears to produce random numbers, but doesn't have to be of the kind used in security applications.
Have to be able to calculate the nth number in O(1) time.
Edit: Made a mistake - storing state isn't enough. I need to calculate nth random number in series in O(1) time. Since within the same class there may be multiple calls for the same nth term, storing state won't be enough, and I need to compute the nth term in O(1)
All of the C++11 PRNGs have a "discard" function, e.g.
#include <random>
#include <iostream>

int main() {
    std::mt19937 rng;
    static const size_t distance = 5;

    rng.seed(0);
    rng.discard(distance);
    std::cout << "after discard 5: " << rng() << '\n';

    rng.seed(0);
    for (size_t i = 0; i <= distance; ++i) {
        std::cout << i << ": " << rng() << '\n';
    }
}
http://ideone.com/0zeRNq
after discard 5: 3684848379
0: 2357136044
1: 2546248239
2: 3071714933
3: 3626093760
4: 2588848963
5: 3684848379
Make your own rand and store one in each class.
Of course this is the weakest PRNG.
The point is that you can have multiple PRNGs active at once.
class Rand {
    unsigned seed;  // unsigned: wraps modulo 2^32 instead of signed-overflow UB
    const unsigned a = 1103515245;
    const unsigned c = 12345;
public:
    Rand();
    void srand( int );
    int rand();
};

Rand::Rand() : seed(123456789) {}

void Rand::srand( int s ) { seed = s; }

int Rand::rand()
{
    seed = a * seed + c;
    return (int)seed;
}
The OP asks for "I use rand in different classes with different seeds".
Each instance of Rand has its own seed.
So place an instance of Rand in each object that needs its own seed.
Use rand_r(). With that function, the seed is not global and implicit. You pass the seed to use explicitly and the function updates it as it computes the next random number. That way, each class's stream of random numbers is independent of the others'.
Each object or each class (depending on your design needs) would store a seed value in an unsigned int variable. It would initialize it; for objects, in the init method; for classes, in +initialize. You could use the time or perhaps /dev/random for the initial value. If you initialize several such objects or classes in close succession, then using the time is a bad idea, since they may all happen at the "same" time (within the resolution of the clock you use).
After that, each time you want a random number, you call rand_r(&yourSeedVariable). That will return a pseudo-random value computed only from the passed-in seed, not using any implicit or global state. It uses the same algorithm as rand(). It also updates the seed variable such that the next call will produce the next random number in that sequence.
Any other object or class using this same technique would have an independent random sequence. Their calls to rand_r() would not affect this object's or class's and this object's or class's calls will not affect them. Same for any callers of rand().
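A minimal sketch of that pattern (rand_r() is POSIX, not standard C++; the class name here is my own):

#include <stdlib.h>  // rand_r (POSIX)

// Each object owns its seed, so each gets an independent, resumable stream.
struct OwnStream {
    unsigned int seed;
    explicit OwnStream(unsigned int s) : seed(s) {}
    int next() { return rand_r(&seed); }  // advances only this object's seed
};

Interleaved calls on two such objects don't disturb each other: with OwnStream a(123), b(456), the sequence a.next(), b.next(), a.next() gives a the same first two values it would have produced without b.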
To clarify a bit further. You said in one of the edits to your question:
The reason I want to "jump" to the nth term, is because I use rand in
different classes with different seeds, and I keep jumping back and
forth between each class. I want the sequence in each class to
continue where it left off, instead of starting from the first number
each time.
I am addressing that need with my suggestion. My suggestion does not address your question as phrased originally. It does not let you get the *n*th number in a pseudo-random sequence. It instead lets you use separate sequences in separate parts of your code such that they don't interfere with each other.
You want random access to a set of pseudorandom streams. You can get it by switching from std::rand() to a block cipher in counter mode (CTR) as your pseudorandom number generator. To read successive pseudorandom numbers, encrypt successive cleartext numbers. To read them in some other order, encrypt numbers from the same range in that order. Each class would then have its own seed, consisting of a key and an initial value.
For example, one class's key might be 8675309 and its initial value 8008135. To read off successive random numbers, encrypt each of 8008136, 8008137, 8008138, 8008139, 8008140, ... with that key. To read off the 17th number in this sequence, encrypt (8008135 + 17) = 8008152.
You can use a 1:1 hash function on a 32-bit or 64-bit counter. For your hash you can adapt any method that a PRNG would use as its feedback and/or tempering function, like this one from Wikipedia's xorshift page:
#include <cstdint>

uint64_t state;

void srand(uint64_t seed) {
    state = seed;
}

uint64_t hash(uint64_t x) {
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    return x * 2685821657736338717ull;
}

uint32_t rand(void) {
    return hash(state++) >> 32;
}

uint32_t rand(uint32_t n) {
    return hash(n) >> 32;
}
The main thing about PRNGs is that (in common, fast implementations) the next value depends on the previous one. So no, you can't get the Nth value without calculating all the previous N-1 values.
Short answer: no.
Longer answer: pseudorandom series are "random" in that the computer cannot compute an item of the series without knowing the previous item (or the seed), but are "pseudo" in that the series is reproducible from the same seed.
From using Google-fu: LFSRs require a finite number of states. PRNGs, which is what you're trying to get, do not.