C++ fast normal random number generator

I'm using the mt19937 generator to generate normal random numbers as shown below:
std::normal_distribution<double> normalDistr(0, 1);
std::mt19937 generator(123);
std::vector<double> randNums(1000000);
for (size_t i = 0; i != 1000000; ++i)
{
    randNums[i] = normalDistr(generator);
}
The above code works; however, since I'm generating more than 100 million normal random numbers in total, it is very slow.
Is there a faster way to generate normal random numbers?
The following is some background on how the code would be used:
Quality of the random numbers is not that important
Precision of the numbers is not that important, either double or float is OK
The normal distribution always has mean = 0 and sigma = 1
EDIT:
@Dúthomhas, @Andrew:
After profiling the following function is taking up more than 50% of the time:
std::normal_distribution<double>::_Eval<std::mersenne_twister_engine<unsigned int,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>>

Most importantly, do you really need 100,000,000 random numbers simultaneously? The writing to and subsequent reading from RAM of all these data unavoidably requires significant time. If you only need the random numbers one at a time, you should avoid that.
Assuming that you do need all of these numbers in RAM, then you should first profile your code if you really want to know where the CPU time is spent/lost.
Second, you should avoid unnecessary re-allocation and initialisation of the data. This is most easily done by using std::vector::reserve(final_size) in conjunction with std::vector::push_back().
Third, you could use a faster RNG than std::mt19937. That RNG is recommended when the quality of the numbers is important. The online documentation says that the lagged Fibonacci generator (implemented in std::subtract_with_carry_engine) is fast, but it may not have a long enough recurrence period -- you must check this. Alternatively, you may want to use std::minstd_rand (a linear congruential generator):
#include <cstdint>
#include <random>
#include <vector>

std::vector<double> make_normal_random(std::size_t number,
                                       std::uint_fast32_t seed)
{
    std::normal_distribution<double> normalDistr(0, 1);
    std::minstd_rand generator(seed);
    std::vector<double> randNums;
    randNums.reserve(number);
    while (number--)
        randNums.push_back(normalDistr(generator));
    return randNums;
}

You will also want to look into std::vector::reserve rather than resize. It lets you grab all the memory you will need in one shot. I am assuming you don't need all 100 million doubles at once?

If it really is the generator that is the cause of the performance degradation, then use the ordinary rand function (you need to draw numbers in pairs), transform to a float or double in (0, 1), then apply the Box-Muller transformation, as sketched below.
That will be hard to beat in terms of time, but note that the statistical properties are no better than those of rand.
The Numerical Recipes routine gasdev does this; you should be able to download a copy.
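A minimal sketch of that approach, assuming nothing beyond rand (the function name is mine; gasdev itself uses the polar variant of the same idea):
#include <cmath>
#include <cstdlib>

// Box-Muller: one uniform pair in (0,1] x [0,1) yields two independent
// standard normal deviates.
void box_muller(double& z0, double& z1)
{
    const double two_pi = 6.283185307179586;
    const double u1 = (std::rand() + 1.0) / (RAND_MAX + 1.0); // in (0,1], so log is safe
    const double u2 = std::rand() / (RAND_MAX + 1.0);         // in [0,1)
    const double r = std::sqrt(-2.0 * std::log(u1));
    z0 = r * std::cos(two_pi * u2);
    z1 = r * std::sin(two_pi * u2);
}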

Related

Forking a random number generator deterministically?

I'm using std::mt19937 to produce deterministic random numbers. I'd like to pass it to functions so I can control their source of randomness. I could do int foo(std::mt19937& rng);, but I want to call foo and bar in parallel, so that won't work. Even if I put the generation function behind a mutex (so each call to operator() did std::lock_guard lock(mutex); return rng();), calling foo and bar in parallel wouldn't be deterministic due to the race on the mutex.
I feel like conceptually I should be able to do this:
auto fooRNG = std::mt19937(rng()); // Seed a RNG with the output of `rng`.
auto barRNG = std::mt19937(rng());
parallel_invoke([&] { fooResult = foo(fooRNG); },
                [&] { barResult = bar(barRNG); });
where I "fork" rng into two new ones with different seeds. Since fooRNG and barRNG are seeded deterministically, they should be random and independent.
Is this general gist viable?
Is this particular implementation sufficient (I doubt it)?
Extended question: Suppose I want to call baz(int n, std::mt19937&) massively in parallel over a range of indexed values, something like
auto seed = rng();
parallel_for(range(0, 1 << 20),
             [&](int i) {
                 auto thisRNG = std::mt19937(seed ^ i); // Deterministically set up RNGs in parallel?
                 baz(i, thisRNG);
             });
something like that should work, right? That is, provided we give it enough bits of state?
Update:
Looking into std::seed_seq, it looks(?) like it's designed to turn not-so-random seeds into high-quality seeds: How to properly initialize a C++11 std::seed_seq
So maybe what I want is something like
std::mt19937 fork(std::mt19937& rng) {
    std::seed_seq seq{rng()};
    return std::mt19937(seq); // seed_seq must be passed as an lvalue
}
or more generally:
//! Represents a state that can be used to generate multiple
//! distinct deterministic child std::mt19937 instances.
class rng_fork {
    std::mt19937::result_type m_seed;
public:
    rng_fork(std::mt19937& rng) : m_seed(rng()) {}
    // Copy is explicit b/c I think it's a correctness footgun:
    explicit rng_fork(const rng_fork&) = default;
    //! Make the ith fork: a deterministic but well-seeded
    //! RNG based off the internal seed and the given index:
    std::mt19937 ith_fork(std::mt19937::result_type index) const {
        std::seed_seq seq{m_seed, index};
        return std::mt19937(seq); // seed_seq must be passed as an lvalue
    }
};
then the initial examples would become
auto fooRNG = fork(rng);
auto barRNG = fork(rng);
parallel_invoke([&] { fooResult = foo(fooRNG); },
                [&] { barResult = bar(barRNG); });
and
auto fork_point = rng_fork{rng};
parallel_for(range(0, 1 << 20),
             [&](int i) {
                 auto thisRNG = fork_point.ith_fork(i); // Deterministically set up a RNG in parallel.
                 baz(i, thisRNG);
             });
Is that correct usage of std::seed_seq?
I am aware of 3 ways to seed multiple parallel pseudo random number generators (PRNGs):
First option
Given a seed, initialize the first instance of the PRNG with seed, the second with seed+1, etc. The thing to be aware of here is that the state of the PRNGs will be initially very close in case the seed is not hashed. Some PRNGs will take a long time to diverge. See e.g. this blog post for more information.
For std::mt19937 specifically, however, this was never an issue in my tests, because the initial seed is not taken as is but instead gets "mangled/hashed" (compare the documentation of the constructor taking a result_type). So it seems to be a viable option in practice.
However, notice that there are some potential pitfalls when seeding a Mersenne Twister (which has an internal state of 624 32-bit integers) with a single 32 bit integer. For example, the first number can never be 7 or 13. See this blog post for more information. But if you do not rely on the randomness of only the first few drawn numbers but draw a more reasonable number of numbers from each PRNG, it is probably fine.
Second option
Without std::seed_seq:
Seed one "parent" PRNG. Then, to initialize N parallel PRNGs, draw N random numbers and use them as seeds. This is your initial idea, where you draw two random numbers with rng() and initialize the two std::mt19937 instances:
std::mt19937 & rng = ...;
auto fooRNG = std::mt19937(rng()); // Seed a RNG with the output of `rng`.
auto barRNG = std::mt19937(rng());
The major issue to look out for here is the birthday problem. It essentially states that the probability to draw the same number twice is more likely than you'd intuitively think. Given a type of PRNG that has a value range of b (i.e. b different values can appear in its output), the probability p(t) to draw the same number twice when drawing t numbers can be estimated as:
p(t) ~ t^2 / (2b) for t^2 << b
(compare this post). If we stretch the estimate "a bit", just to show the basic issue:
For a PRNG producing a 16 bit integer, we have b=2^16. Drawing 256 numbers results in a 50% chance to draw the same number twice according to that formula. For a 32 bit PRNG (such as std::mt19937) we need to draw 65536 numbers, and for a 64 bit integer PRNG we need to draw ~4e9 numbers to reach the 50%. Of course, this is all an estimate, so you want to draw several orders of magnitude fewer numbers than that. Also see this blog post for more information.
In case of seeding the parallel std::mt19937 instances with this method (32 bit output and input!), that means you probably do not want to draw more than a hundred or so random numbers. Otherwise, you have a realistic chance of drawing the same seed twice. Of course, you could ensure that you do not draw the same seed twice by keeping a list of already used seeds (see the sketch below). Or use std::mt19937_64.
Additionally, there are still the potential pitfalls mentioned above regarding the seeding of a Mersenne Twister with 32 bit numbers.
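A minimal sketch of the seed-bookkeeping idea mentioned above (the helper name is mine):
#include <cstdint>
#include <random>
#include <unordered_set>
#include <vector>

// Draw n seeds from a parent RNG, rejecting duplicates, so the birthday
// problem cannot hand two child generators the same seed.
std::vector<std::uint32_t> draw_unique_seeds(std::mt19937& parent, std::size_t n)
{
    std::unordered_set<std::uint32_t> used;
    std::vector<std::uint32_t> seeds;
    seeds.reserve(n);
    while (seeds.size() < n) {
        const std::uint32_t s = static_cast<std::uint32_t>(parent());
        if (used.insert(s).second) // true if s has not been drawn before
            seeds.push_back(s);
    }
    return seeds;
}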
With seed sequence:
The idea of std::seed_seq is to take some numbers, "mix them" and then provide them as input to the PRNG so that it can initialize its state. Since the 32 bit Mersenne Twister has a state of 624 32-bit integers, you should provide that many numbers to the seed sequence for theoretically optimal results. That way you get b=2^(624*32), meaning that you avoid the birthday problem for all practical purposes.
But in your example
std::mt19937 fork(std::mt19937& rng) {
    std::seed_seq seq{rng()};
    return std::mt19937(seq);
}
you provide only a single 32 bit integer. This effectively means that you hash that 32 bit number before putting it into std::mt19937. So you do not gain anything regarding the birthday problem. And the additional hashing is unnecessary because std::mt19937 already does something like this.
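If you want the fork to actually widen the seed space, here is a sketch of feeding several words from the parent into the seed_seq (the helper name and the choice of eight words are my own):
#include <random>

// Several parent words instead of one make the effective seed space far
// larger than 32 bits, which pushes the birthday bound out accordingly.
std::mt19937 fork_wide(std::mt19937& rng)
{
    std::seed_seq seq{rng(), rng(), rng(), rng(),
                      rng(), rng(), rng(), rng()};
    return std::mt19937(seq); // seed_seq must be passed as an lvalue
}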
std::seed_seq itself is somewhat flawed, see this blog post. But I guess for practical purposes it does not really matter. A supposedly better alternative exists, but I have no experience with it.
Third option
Some PRNG algorithms such as PCG or xoshiro256++ allow you to jump over a large number of random numbers quickly. For example, xoshiro256++ has a period of 2^256 - 1 before it repeats itself, and it allows jumping ahead by 2^128 (or alternatively 2^192) numbers. So the idea is: seed the first PRNG, then create a copy of it and jump ahead by 2^128 numbers, then create a copy of that second one and jump ahead again by 2^128, etc. Each instance then works in a slice of length 2^128 out of the total range of 2^256. The slices are stochastically independent. This elegantly bypasses the problems of the methods above.
The standard PRNGs do have a discard(z) method to jump z values ahead, but it is not guaranteed that the jumping will be fast. I don't know whether std::mt19937 implements fast jumping in all standard library implementations. (As far as I know, the Mersenne Twister algorithm itself does allow this in principle.)
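For illustration, a sketch of the copy-and-jump idea using only the standard interface (the names and the stride parameter are mine; this is only practical if discard is implemented as a fast jump):
#include <cstdint>
#include <random>
#include <vector>

// Each worker receives a copy of the master engine advanced by i * stride
// values, i.e. its own non-overlapping slice of the output sequence.
std::vector<std::mt19937_64> make_streams(std::mt19937_64 master,
                                          std::size_t n_workers,
                                          unsigned long long stride)
{
    std::vector<std::mt19937_64> streams;
    streams.reserve(n_workers);
    for (std::size_t i = 0; i < n_workers; ++i) {
        streams.push_back(master); // copy the current state
        master.discard(stride);    // jump the master ahead for the next slice
    }
    return streams;
}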
Additional note
I found PRNGs to be surprisingly difficult to use "right". It really depends on the use case how careful you need to be and what method to choose. Think about the worst thing that could happen in your case if something goes wrong, and invest an according amount of time in researching the topic.
For ordinary scientific simulations where you require only a few dozens or so parallel instances of std::mt19937, I'd guess that the first and second option (without seed sequence) are both viable. But if you need several hundreds or even more, you should think more carefully.

rand() not giving me a random number (even when srand() is used)

Okay, I'm starting to lose my mind. All I want to do is generate a random number between 0 and 410, and according to this page, my code should do that. And since I want a random number and not a pseudo-random number, I'm using srand() as well, in a way that e.g. this thread told me to. But this isn't working. All I get is a number that depends on how long it has been since my last execution. If I e.g. execute it again as fast as I can, the number is usually 6 higher than the last number, and if I wait longer, it's higher still, etc. When it reaches 410 it goes back to 0 and begins all over again. What am I missing?
Edit: And oh, if I remove the srand(time(NULL)); line I just get the same number (41) every time I run the program. That's not even pseudo random, that's just a static number. Just copying the first line of code from the article I linked to above still gives me number 41 all the time. Am I the star in a sequel to "The Number 23", or have I missed something?
#include <cstdlib>
#include <ctime>
#include <iostream>

int main(void) {
    srand(time(NULL));
    int number = rand() % 410;
    std::cout << number << std::endl;
    system("pause");
}
That is what you get for using deprecated random number generation.
rand produces a fixed sequence of numbers (which by itself is fine), and does so very, very badly.
You tell rand via srand where in the sequence to start. Since your "starting point" (called the seed, btw) depends on the number of seconds since 1970-01-01 00:00:00 UTC, your output is obviously time dependent.
The correct way to do what you want to do is using the C++11 <random> library. In your concrete example, this would look somewhat like this:
std::mt19937 rng (std::random_device{}());
std::uniform_int_distribution<> dist (0, 409);
auto random_number = dist(rng);
For more information on the evils of rand and the advantages of <random> have a look at this.
As a last remark, seeding std::mt19937 like I did above is not quite optimal, because the MT's state space is much larger than the 32 bits you get out of a single call to std::random_device{}(). This is not a problem for toy programs and your standard school assignments, but for reference: here is my take at seeding the MT's entire state space, plus some helpful suggestions in the answers.
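A minimal sketch of such full-state seeding (my variant, not the linked answer verbatim):
#include <algorithm>
#include <array>
#include <cstdint>
#include <functional>
#include <random>

// Fill std::mt19937::state_size (624) words with entropy and feed them
// all through a seed_seq, instead of seeding from a single 32-bit word.
std::mt19937 make_fully_seeded_mt19937()
{
    std::array<std::uint32_t, std::mt19937::state_size> entropy;
    std::random_device rd;
    std::generate(entropy.begin(), entropy.end(), std::ref(rd));
    std::seed_seq seq(entropy.begin(), entropy.end());
    return std::mt19937(seq);
}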
From the manual:
time() returns the time as the number of seconds since the Epoch,
1970-01-01 00:00:00 +0000 (UTC).
This means that if you start your program twice within the same second, you will initialize srand with the same value and get the same PRNG state.
And if you remove the initialization via the call to srand, you will always get exactly the same sequence of numbers from rand.
I'm afraid you can't get truly random numbers there. The built-in functions provide only pseudo-random numbers; this goes for srand and rand together, since the former merely selects the starting point of the latter. If you want to cook true random numbers, you must find a correct source of entropy, for example atmospheric noise, which is the approach of www.random.org.
The problem lies in the seed used by the randomness algorithm: if it's a number provided by a machine, it can't be unpredictable. A common solution for this is using external hardware.
Unfortunately you can't get a real random number from a computer without specific hardware (which is often too slow to be practical).
Therefore you need to make do with a pseudo generator. But you need to use them carefully.
The function rand is designed to return a number between 0 and RAND_MAX in a way that, broadly speaking, satisfies the statistical properties of a uniform distribution. At best you can expect the mean of the drawn numbers to be 0.5 * RAND_MAX and the variance to be RAND_MAX * RAND_MAX / 12.
Typically the implementation of rand is a linear congruential generator which basically means that the returned number is a function of the previous number. That can give surprisingly good results and allows you to seed the generator with a function srand.
But repeated use of srand ruins the statistical properties of the generator, which is what is happening to you: your use of srand is correlated with your system clock time. The behaviour you're observing is completely expected.
What you should do is make only one call to srand and then draw a sequence of numbers using rand. You cannot easily do this in the way you've set things up. But there are alternatives: you could switch to a random number generator (say, the Mersenne Twister) which allows you to draw the (n)th term, and you could pass the value of n as a command line argument.
As a final remark, I'd avoid using a modulus when drawing a number. This creates a statistical bias when your modulus does not evenly divide RAND_MAX + 1, as the rejection sketch below illustrates.
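A sketch of the usual rejection trick (the helper name is mine):
#include <cstdlib>

// Discard the incomplete tail of rand()'s range so that every residue
// class mod n is equally likely. Assumes 0 < n <= RAND_MAX.
int unbiased_rand(int n)
{
    const int limit = RAND_MAX - RAND_MAX % n; // a multiple of n
    int r;
    do {
        r = std::rand();
    } while (r >= limit);
    return r % n;
}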
Try changing the NULL in time(NULL) to time(0) (that will give you the current system time). If it doesn't work, you could try converting time(0) into ms by doing time(0)*1000.

How to generate a massive amount of high quality Random Numbers?

I'm working on a random walk simulation of particles moving in a lattice. For that reason I must create a massive amount of random numbers, about 10^12 and above. Currently I'm using the possibilities C++11 provides with <random>. When profiling my program, I see that a major amount of time is spent in <random>. The vast majority of those numbers are between 0 and 1, evenly distributed. Here and there I also need a number from a binomial distribution, but the focus lies on the 0..1 numbers.
The question is: What can I do to reduce the CPU time needed to generate these numbers and what would the impact be on their quality?
As you can see, I tried different engines, but that had no big effect on CPU time. Further, what is the difference between my uniform01(gen) and generate_canonical<double,numeric_limits<double>::digits>(gen) anyhow?
Edit: Reading through the answers I conclude that there is not THE ideal solution for my problem. Thus I decided to first make my program multi-threading capable and run multiple RNGs in different threads (seeded with one random_device number plus a thread-individual increment). For the time being this seems to be the unavoidable step (multi-threading would be required anyhow). As a further step, depending on exact requirements, I consider switching to the suggested Intel RNG or to Thrust. Meaning that my RNG implementation should not be too complex, which it currently is not. But for now I'd like to focus on the physical correctness of my model and not on programming details; those come as soon as the output of my program is physically correct.
Thrust
Concerning Intel RNG
Here is what I do currently:
#include <limits>
#include <random>

class Generator {
public:
    Generator();
    virtual ~Generator();
    double rand01();               // random number in [0,1)
    int binomial(int n, double p); // binomial distribution with n samples with probability p
private:
    std::random_device randev; // seed
    /* Engines */
    std::mt19937_64 gen;
    //std::mt19937 gen;
    //std::default_random_engine gen;
    /* Distributions */
    std::uniform_real_distribution<double> uniform01;
    std::binomial_distribution<> binomialdist;
};

Generator::Generator() : randev(), gen(randev()), uniform01(0., 1.), binomialdist(1, 1.) {
}

Generator::~Generator() { }

double Generator::rand01() {
    //return uniform01(gen);
    return std::generate_canonical<double, std::numeric_limits<double>::digits>(gen);
}

int Generator::binomial(int n, double p) {
    binomialdist.param(std::binomial_distribution<>::param_type(n, p));
    return binomialdist(gen);
}
You can pre-generate random numbers and use them as you need them.
If you need true random numbers I suggest you use a service like http://www.random.org/ that derives its random numbers from environmental noise rather than from an algorithm.
And, speaking about random numbers, you must also check this:
If you need a massive amount of random numbers, and I mean MASSIVE, do a careful search on the internet for IBM's floating point random number generator, published maybe ten years ago. You'll have to buy either a PowerPC machine, or a newer Intel machine with fused multiply-add. They achieved random numbers at a rate of one per cycle per core. So if you bought a new Mac Pro, you could achieve probably 50 billion random numbers per second.
Perhaps instead of using a CPU you could use a GPU to generate many numbers concurrently?
Efficient Random Number Generation and Application Using CUDA
On my i3, the following program runs in about five seconds:
#include <cstdio>
#include <random>

std::mt19937_64 foo;

double drand() {
    union {
        double d;
        long long l;
    } x;
    x.d = 1.0;                      // sets the exponent bits for [1,2)
    x.l |= foo() & (1LL << 53) - 1; // fill the mantissa with random bits
    return x.d - 1;                 // map [1,2) down to [0,1)
}

int main() {
    double d = 0;
    for (int i = 0; i < 1e9; i++)
        d += drand();
    printf("%g\n", d);
}
whereas replacing the drand() call with the following results in a program that runs in about ten seconds:
double drand2() {
    return std::generate_canonical<double,
        std::numeric_limits<double>::digits>(foo);
}
Using the following instead of drand() also results in a program that runs in about ten seconds:
std::uniform_real_distribution<double> uni;
double drand3() {
    return uni(foo);
}
Perhaps the hacky drand() above suits your purposes better than the standard solutions.
Task Definition
The OP asks for both
1. speed of generation -- assuming a set of 10^12 random numbers to be "massive"
and
2. quality of the generator -- with a weak assumption that numbers which are just evenly distributed over some range of values are also random.
However, there are more fundamental aspects to be addressed and solved for a real system:
A. Decide whether your system simulation needs a guarantee of repeatability of the sequence of random numbers for future re-runs of an experiment. If this is not the case, re-runs of the simulated experiment will yield principally different results, and the randomizer process (or pre-randomizer and randomized selector) need not worry about re-entrant, stateful operation and will get a much simpler implementation.
B. Decide to what level you need to prove the quality of randomness of the generated numbers (or whether the generated sets have to obey some specific law of statistical theory -- some known synthetic distribution, or true randomness with an utmost Kolmogorov complexity of the resulting set). One need not be an NSA expert to state that numerical generation of truly random sequences is a very hard problem and carries computational costs associated with producing high-randomness products.
Hyper-chaotic and true-random sequences are computationally extremely expensive. Using low- or poor-randomness generators is not an option for randomness-quality-sensitive applications (whatever the marketing papers may say, no MIL-STD or NSA-graded system will ever accept this compromised quality in environments where the results indeed matter; so why settle for less in scientific simulations? Perhaps it is no problem if you do not mind missing that many "unvisited" states of the simulated phenomena).
C. Verify how many random numbers your simulation system needs to consume per microsecond, and whether this design requirement is constant or may be scaled up by moving to a multi-threaded, vectorised, Grid-/Cloud-based distributed computation framework.
D. Decide whether your simulation system requires global, per-thread, or per-Grid/CloudNode access management to the pool of randomized numbers in the case of a vectorised or Grid/Cloud-based computational strategy.
Task Solution Approach
The fastest [1] and best [2] solution, with [A] and [B] solved and with options for [D], is to pre-generate numbers of utmost randomness quality into an adequate access pool (and pay an acceptable cost for [C] and [D] in access-policy and access-management controls to re-read from the pool rather than re-generate).

C++ uniform_int_distribution always returning min() on first invocation

In at least one implementation of the standard library, the first invocation of a std::uniform_int_distribution<> does not return a random value, but rather the distribution's min value. That is, given the code:
default_random_engine engine( any_seed() );
uniform_int_distribution< int > distribution( smaller, larger );
auto x = distribution( engine );
assert( x == smaller );
...x will in fact be equal to smaller for any values of any_seed(), smaller, or larger.
To play along at home, you can try a code sample that demonstrates this problem in gcc 4.8.1.
I trust this is not correct behavior? If it is correct behavior, why would a random distribution return this clearly non-random value?
Explanation for the observed behavior
This is how uniform_int_distribution maps the random bits to numbers if the range of possible outcomes is smaller than the range of number the rng produces:
const __uctype __uerange = __urange + 1; // __urange can be zero
const __uctype __scaling = __urngrange / __uerange;
const __uctype __past = __uerange * __scaling;
do
    __ret = __uctype(__urng()) - __urngmin;
while (__ret >= __past);
__ret /= __scaling;
where __urange is larger - smaller and __urngrange is the difference between the maximum and the minimum value the rng can return. (Code from bits/uniform_int_dist.h in libstdc++ 6.1)
In our case, the rng default_random_engine is a minstd_rand0, which yields __scaling == 195225785 for the range [0,10] you tested with. Thus, if rng() < 195225785, the distribution will return 0.
The first number a minstd_rand0 returns is
(16807 * seed) % 2147483647
(where seed == 0 gets adjusted to 1 btw). We can thus see that the first value produced by a minstd_rand0 seeded with a number smaller than 11615 will yield 0 with the uniform_int_distribution< int > distribution( 0, 10 ); you used. (mod off-by-one-errors on my part. ;) )
You mentioned the problem going away for bigger seeds: As soon as the seeds get big enough to actually make the mod operation do something, we cannot simply assign a whole range of values to the same output by division, so the results will look better.
Does that mean (libstdc++'s impl of) <random> is broken?
No. You introduced significant bias in what is supposed to be a random 32 bit seed by always choosing it small. That bias showing up in the results is not surprising or evil. For random seeds, even your minstd_rand0 will yield a fairly uniformly random first value. (Though the sequence of numbers after that will not be of great statistical quality.)
What can we do about this?
Case 1: You want random number of high statistical quality.
For that, you use a better rng like mt19937 and seed its entire state space. For the Mersenne Twister, that's 624 32-bit integers. (For reference, here is my attempt to do this properly with some helpful suggestions in the answer.)
Case 2: You really want to use those small seeds only.
We can still get decent results out of this. The problem is that pseudo-random number generators commonly depend "somewhat continuously" on their seed. To get around this, we discard enough numbers to let the initially similar sequences of output diverge. So if your seed must be small, you can initialize your rng like this:
std::mt19937 rng(smallSeed);
rng.discard(700000);
It is vital that you use a good rng like the Mersenne Twister for this. I do not know of any method to get even decent values out of a poorly seeded minstd_rand0, for example see this train-wreck. Even if seeded properly, the statistical properties of a mt19937 are superior by far.
Concerns about the large state space or slow generation you sometimes hear about are usually of no concern outside the embedded world. According to boost and cacert.at, the MT is even way faster than minstd_rand0.
You still need to do the discard trick though, even if your results look good to the naked eye without. It takes less than a millisecond on my system, and you don't seed rngs very often, so there is no reason not to.
Note that I am not able to give you a sharp estimate for the number of discards we need; I took that value from this answer, which links this paper for a rationale. I don't have the time to work through that right now.

How to properly choose rng seed for parallel processes

I'm currently working on a C/C++ project where I'm using a random number generator (gsl or boost). The whole idea can be simplified to a non-trivial stochastic process which receives a seed and returns results. I'm computing averages over different realisations of the process.
So, the seed is important: the processes must be run with different seeds, or the averages will be biased.
So far, I'm using time(NULL) to give a seed. However, if two processes start within the same second, the seed is the same. That happens because I'm using parallelisation (with OpenMP).
So, my question is: how to implement a "seed giver" on C/C++ which gives independent seeds?
For instance, I thought of using the thread number (thread_num): seed = time(NULL)*thread_num. However, this means that the seeds are correlated: they are multiples of each other. Does that pose any problem to the "pseudo-randomness", or is it as good as sequential seeds?
The requirements are that it must work on both Mac OS (my PC) and a Linux distribution similar to CentOS (the cluster) (and naturally give independent realisations).
A commonly used scheme for this is to have a "master" RNG used to generate seeds for each process-specific RNG.
The advantage of such a scheme is that the whole computation is determined by only one seed, which you can record somewhere to be able to replay any simulation (this might be useful to debug nasty bugs).
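A minimal sketch of that scheme (the names are mine):
#include <cstdint>
#include <random>
#include <vector>

// A single recorded master seed determines every per-process engine,
// so any run can be replayed exactly.
std::vector<std::mt19937> make_child_rngs(std::uint32_t master_seed,
                                          std::size_t n_children)
{
    std::mt19937 master(master_seed);
    std::vector<std::mt19937> children;
    children.reserve(n_children);
    for (std::size_t i = 0; i < n_children; ++i)
        children.emplace_back(master()); // each child seeded by the master
    return children;
}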
We ran into a similar problem on a Beowulf computing grid, the solution we used was to incorporate the pid of the process into the RNG seed, like so:
time(NULL)*thread_num*getpid()
Of course, you could just read from /dev/urandom or /dev/random into an integer.
When faced with this problem I often use seed_rng from Boost.Uuid. It uses time, clock and random data from /dev/urandom to calculate a seed. You can use it like
#include <boost/uuid/seed_rng.hpp>
#include <iostream>

int main() {
    int seed = boost::uuids::detail::seed_rng()();
    std::cout << seed << std::endl;
}
Note that seed_rng comes from a detail namespace, so it can go away without further notice. In that case writing your own implementation based on seed_rng shouldn't be too hard.
Mac OS is Unix too, so it probably has /dev/random. If so, that's the best solution for obtaining the seeds. Otherwise, if the generator is good, taking time( NULL ) once, and then incrementing it for the seed of each generator, should give reasonably good results.
If you are on x86-64 and don't mind making the code non-portable, then you could read the Time Stamp Counter (TSC), a 64-bit counter that increments at the CPU (max) clock rate (about 3 GHz), and use that as a seed.
#include <stdint.h>
static inline uint64_t rdtsc()
{
uint64_t tsc;
asm volatile
(
"rdtsc\n\t"
"shl\t$32,%%rdx\n\t" // rdx = TSC[ 63 : 32 ] : 0x00000000
"add\t%%rdx,%%rax\n\t" // rax = TSC[ 63 : 0 ]
: "=a" (tsc) : : "%rdx"
);
return tsc;
}
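A generator could then be seeded from it, e.g. std::mt19937_64 rng(rdtsc()); (with the usual caveats about seeding a large-state engine from a single word, discussed elsewhere on this page).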
When comparing two infinite time sequences produced by the same pseudo-random number generator with different seeds, we can see that they are the same sequence, delayed by some lag tau. Usually this time scale is much bigger than your problem, which ensures that the two random walks are uncorrelated.
If your stochastic process is in a high dimensional phase space, I think that one good suggestion could be:
seed = MAXIMUM_INTEGER/NUMBER_OF_PARALLEL_RW*thread_num + time(NULL)
Notice that using this scheme you are not guaranteeing that the lag tau is big!!
If you have some knowledge of your system's time scale, you can call your random number generator some number of times in order to generate seeds that are equidistant by some time interval.
Maybe you could try std::chrono high resolution clock from C++11:
Class std::chrono::high_resolution_clock represents the clock with the
smallest tick period available on the system. It may be an alias of
std::chrono::system_clock or std::chrono::steady_clock, or a third,
independent clock.
http://en.cppreference.com/w/cpp/chrono/high_resolution_clock
BUT tbh I'm not sure that there is anything wrong with srand(0); srand(1); srand(2)... but my knowledge of rand is very, very basic. :/
For crazy safety consider this:
Note that all pseudo-random number generators described below are
CopyConstructible and Assignable. Copying or assigning a generator
will copy all its internal state, so the original and the copy will
generate the identical sequence of random numbers.
http://www.boost.org/doc/libs/1_51_0/doc/html/boost_random/reference.html#boost_random.reference.generators
Since most of the generators have crazy long cycles, you could generate one, copy it as the first generator, generate X numbers with the original, copy it as the second, generate X numbers with the original, copy it as the third...
If your users call their own generator fewer than X times, the streams will not overlap.
The way I understand your question, you have multiple processes using the same pseudo-random number generation algorithm, and you want each "stream" of random numbers (in each process) to be independent of the others. Am I correct?
In that case, you are right in suspecting that giving different (correlated) seeds does not guarantee you anything unless the RNG algorithm says so. You basically have two solutions:
Simple version
Use a single source of random numbers, with a single seed. Then feed random numbers in a round-robin fashion to each process.
This solution is slow but provides some guarantee that the numbers you give to your processes are OK.
You can do the same thing by generating all the random numbers you need at once and then splitting the set into as many slices as you have processes, as sketched below.
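A sketch of that generate-once-then-split variant (the names are mine):
#include <cstdint>
#include <random>
#include <vector>

// Draw everything from one seeded engine, then hand each process its own
// contiguous slice of the results.
std::vector<std::vector<double>> split_random_numbers(std::uint32_t seed,
                                                      std::size_t n_processes,
                                                      std::size_t per_process)
{
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    std::vector<std::vector<double>> slices(n_processes);
    for (auto& slice : slices) {
        slice.reserve(per_process);
        for (std::size_t i = 0; i < per_process; ++i)
            slice.push_back(uni(rng));
    }
    return slices;
}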
Use a RNG designed for that
You can find in papers and on the web several algorithms specifically designed to provide independent streams of random numbers from a single initial state. They are complicated but most provide source code. The idea is generally to "split" the RNG space (values you can obtain from the initial state) into various chunks like above. They are just faster because the algorithm used makes it possible to compute easily what would be the state of the RNG if you skipped a given number of values.
These generators are generally called "parallel random number generators".
The most popular ones are probably these two:
RngStreams: http://statmath.wu.ac.at/software/RngStreams/
SPRNG: http://sprng.cs.fsu.edu/
Check their manuals to fully understand what they do, how they do it, and if it really is what you need.