How to properly choose rng seed for parallel processes - c++

I'm currently working on a C/C++ project where I'm using a random number generator (gsl or boost). The whole idea can be simplified to a non-trivial stochastic process which receives a seed and returns results. I'm computing averages over different realisations of the process.
So the seed is important: the processes must use different seeds, or the averages will be biased.
So far I'm using time(NULL) to provide the seed. However, if two processes start in the same second, the seed is the same, which happens because I'm running them in parallel (using OpenMP).
So my question is: how do I implement a "seed giver" in C/C++ which hands out independent seeds?
For instance, I thought of using the thread number (thread_num): seed = time(NULL)*thread_num. However, this means the seeds are correlated: they are multiples of each other. Does that pose any problem for the pseudo-randomness, or is it as good as sequential seeds?
The requirements are that it must work on both Mac OS (my PC) and a Linux distribution similar to CentOS (the cluster), and naturally give independent realisations.

A commonly used scheme for this is to have a "master" RNG used to generate seeds for each process-specific RNG.
The advantage of such a scheme is that the whole computation is determined by a single seed, which you can record somewhere so that any simulation can be replayed later (this can be useful for tracking down nasty bugs).
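A minimal sketch of that scheme, assuming an OpenMP setup like the one in the question (the seed value and names are illustrative): one master engine, seeded once and recorded, hands a distinct seed to each per-thread engine.
#include <cstdint>
#include <random>
#include <vector>
#include <omp.h>

int main() {
    const std::uint64_t masterSeed = 12345;        // record this value to replay the whole run
    std::mt19937_64 master(masterSeed);

    const int nThreads = omp_get_max_threads();
    std::vector<std::uint64_t> seeds(nThreads);
    for (auto& s : seeds) s = master();            // one seed per thread from the master RNG

    #pragma omp parallel
    {
        std::mt19937_64 rng(seeds[omp_get_thread_num()]);
        // ... run this thread's realisations with rng ...
    }
}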

We ran into a similar problem on a Beowulf computing grid; the solution we used was to incorporate the PID of the process into the RNG seed, like so:
time(NULL)*thread_num*getpid()
Of course, you could just read from /dev/urandom or /dev/random into an integer.
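For example, a sketch for POSIX-style systems (the function name is illustrative): read one unsigned int worth of bytes from /dev/urandom.
#include <fstream>

unsigned int seedFromUrandom() {
    unsigned int seed = 0;
    std::ifstream urandom("/dev/urandom", std::ios::binary);
    urandom.read(reinterpret_cast<char*>(&seed), sizeof(seed));  // raw bytes become the seed
    return seed;
}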

When faced with this problem I often use seed_rng from Boost.Uuid. It uses time, clock and random data from /dev/urandom to calculate a seed. You can use it like this:
#include <boost/uuid/seed_rng.hpp>
#include <iostream>

int main() {
    int seed = boost::uuids::detail::seed_rng()();
    std::cout << seed << std::endl;
}
Note that seed_rng comes from a detail namespace, so it can go away without further notice. In that case writing your own implementation based on seed_rng shouldn't be too hard.

Mac OS is Unix too, so it probably has /dev/random. If so, that's the best solution for obtaining the seeds. Otherwise, if the generator is good, taking time(NULL) once and then incrementing it for the seed of each generator should give reasonably good results.
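A minimal sketch of that approach (makeGenerator is an illustrative name): take the time once, then seed generator i with base + i.
#include <ctime>
#include <random>

std::mt19937 makeGenerator(int i) {
    static const std::time_t base = std::time(nullptr);  // taken once, on first call
    return std::mt19937(static_cast<unsigned int>(base) + static_cast<unsigned int>(i));
}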

If you are on x86 and don't mind making the code non-portable, you could read the Time Stamp Counter (TSC), a 64-bit counter that increments at roughly the CPU's (maximum) clock rate (about 3 GHz), and use that as a seed.
#include <stdint.h>

static inline uint64_t rdtsc()
{
    uint64_t tsc;
    asm volatile
    (
        "rdtsc\n\t"
        "shl\t$32,%%rdx\n\t"   // rdx = TSC[63:32] : 0x00000000
        "add\t%%rdx,%%rax\n\t" // rax = TSC[63:0]
        : "=a" (tsc) : : "%rdx"
    );
    return tsc;
}
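One possible way to use it (a sketch; mixing in the OpenMP thread number is an extra precaution of mine, in case two threads read nearly identical counter values):
#include <random>
#include <omp.h>

std::mt19937_64 make_generator() {
    // rdtsc() is the function defined above; the XOR with the thread number
    // pushes the per-thread difference into the high bits of the seed.
    return std::mt19937_64(rdtsc() ^ (static_cast<uint64_t>(omp_get_thread_num()) << 32));
}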

When comparing two infinite sequences produced by the same pseudo-random number generator with different seeds, we can see that they are the same sequence, just shifted by some lag tau. Usually this lag is much bigger than the length of your problem, which ensures that the two random walks are uncorrelated.
If your stochastic process lives in a high-dimensional phase space, I think one good suggestion could be:
seed = MAXIMUM_INTEGER/NUMBER_OF_PARALLEL_RW*thread_num + time(NULL)
Notice that with this scheme you are not guaranteeing that the lag tau is large!
If you have some knowledge of your system's time scale, you can advance your random number generator some number of times in order to generate seeds that are equidistant by some interval.

Maybe you could try the std::chrono::high_resolution_clock from C++11:
Class std::chrono::high_resolution_clock represents the clock with the smallest tick period available on the system. It may be an alias of std::chrono::system_clock or std::chrono::steady_clock, or a third, independent clock.
http://en.cppreference.com/w/cpp/chrono/high_resolution_clock
But to be honest, I'm not sure there is anything wrong with srand(0); srand(1); srand(2)... though my knowledge of rand is very basic. :/
For extra safety, consider this:
Note that all pseudo-random number generators described below are CopyConstructible and Assignable. Copying or assigning a generator will copy all its internal state, so the original and the copy will generate the identical sequence of random numbers.
http://www.boost.org/doc/libs/1_51_0/doc/html/boost_random/reference.html#boost_random.reference.generators
Since most of the generators have extremely long cycles, you could construct one engine, copy it as the first generator, generate X numbers with the original, copy it as the second generator, generate X more numbers with the original, copy it as the third, and so on.
If each consumer calls its own generator fewer than X times, the sequences will not overlap.
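Here is a minimal sketch of that idea using std::mt19937 (which is also CopyConstructible and Assignable; makeEngines is an illustrative name): copying the engine and then advancing the original by X draws gives copies that start X draws apart in the same underlying sequence.
#include <cstdint>
#include <random>
#include <vector>

std::vector<std::mt19937> makeEngines(std::uint32_t seed, int count,
                                      unsigned long long X) {
    std::mt19937 original(seed);
    std::vector<std::mt19937> engines;
    for (int i = 0; i < count; ++i) {
        engines.push_back(original);   // the copy carries the full internal state
        original.discard(X);           // move the original X draws ahead
    }
    return engines;
}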

The way I understand your question, you have multiple processes using the same pseudo-random number generation algorithm, and you want each "stream" of random numbers (one per process) to be independent of the others. Am I correct?
In that case, you are right to suspect that giving different (correlated) seeds does not guarantee you anything unless the RNG algorithm says so. You basically have two solutions:
Simple version
Use a single source of random numbers, with a single seed, then feed random numbers in a round-robin fashion to each process.
This solution is slow but provides some guarantee that the numbers you give to your processes are OK.
You can do the same thing by generating all the random numbers you need at once and then splitting this set into as many slices as you have processes, as in the sketch below.
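A minimal sketch of that second variant (the function and parameter names are illustrative): draw everything up front from one seeded engine, then hand each process its own contiguous slice.
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

std::vector<std::vector<double>> makeSlices(std::uint64_t seed,
                                            int nProcesses,
                                            std::size_t perProcess) {
    std::mt19937_64 rng(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);

    std::vector<std::vector<double>> slices(nProcesses);
    for (auto& slice : slices) {
        slice.resize(perProcess);
        for (auto& x : slice) x = dist(rng);   // consecutive block of the sequence per process
    }
    return slices;
}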
Use an RNG designed for that
You can find in papers and on the web several algorithms specifically designed to provide independent streams of random numbers from a single initial state. They are complicated, but most come with source code. The idea is generally to "split" the RNG space (the values you can obtain from the initial state) into various chunks, as above. They are just faster because the algorithm used makes it easy to compute what the state of the RNG would be if you skipped a given number of values.
These generators are generally called "parallel random number generators".
The most popular ones are probably these two:
RngStreams: http://statmath.wu.ac.at/software/RngStreams/
SPRNG: http://sprng.cs.fsu.edu/
Check their manuals to fully understand what they do, how they do it, and if it really is what you need.

Related

Is the seed of the mersenne_twister_engine instance invariant? [duplicate]

Inspired by this and similar questions, I want to learn how the mt19937 pseudo-random number generator in C++11 behaves when it is seeded with the same input on two separate machines.
In other words, say we have the following code:
std::mt19937 gen{ourSeed};
std::uniform_int_distribution<int> dist{0, 10000};
int randNumber = dist(gen);
If we try this code on different machines at different times, will we get the same sequence of randNumber values, or a different sequence each time?
And in either case, why is that the case?
A further question:
Regardless of the seed, will this code keep generating random numbers indefinitely? For example, if we use this block of code in a program that runs for months without stopping, will there be any problem with the generation of the numbers or with their uniformity?
The generator will generate the same values.
The distributions may not, at least with different compilers or library versions. The standard does not specify their behaviour to that level of detail. If you want stability between compilers and library versions, you have to roll your own distribution.
Barring library/compiler changes, that code will return the same values in the same sequence. But if you care, write your own distribution.
...
All PRNGs have patterns and periods. mt19937 is named after its period of 2^19937-1, which is unlikely to be a problem. But other patterns can develop. MT PRNGs are robust against many statistical tests, but they are not cryptographically secure PRNGs.
So whether it becomes a problem if you run for months depends on the specific details of what you would consider a problem. However, mt19937 is going to be a better PRNG than anything you are likely to write yourself. Just assume attackers can predict its future behaviour from past output.
Regardless of the seed, will this code keep generating random numbers indefinitely? For example, if we use this block of code in a program that runs for months without stopping, will there be any problem with the generation of the numbers or with their uniformity?
The RNGs we deal with in standard C++ are called pseudo-random RNGs. By definition, such a generator is a purely computational device with a multi-bit state (you can think of the state as a large bit vector) and three functions:
state seed2state(seed);
state next_state(state);
uint(32|64)_t state2output(state);
and that is it. Obviously, the state has a finite size, 19937 bits in the case of MT19937, so the total number of states is 2^19937 and thus the MT19937 next_state() function is periodic, with a maximum period no greater than 2^19937. This number is really HUGE, and most likely more than enough for a typical simulation.
But the output is at most 64 bits, so the output space is 2^64. It means that during a long run any particular output appears quite a few times. What matters is when not only some 64-bit number appears again, but also the number after that, and after that, and after that - that is when you know the RNG period has been reached.
If we try this code on different machines at different times, will we get the same sequence of randNumber values or a different sequence each time?
Generators are defined rather strictly, and you'll get the same bit stream. For example, for MT19937, the C++ standard (https://timsong-cpp.github.io/cppwp/rand) specifies:
class mersenne_twister_engine {
    ...
    static constexpr result_type default_seed = 5489u;
    ...
and the function seed2state is described as (https://timsong-cpp.github.io/cppwp/rand#eng.mers-6):
Effects: Constructs a mersenne_twister_engine object. Sets X_{-n} to value mod 2^w. Then, iteratively for i = -n, ..., -1, sets X_i to ...
The function next_state is described as well, together with a test value for the 10000th invocation. The standard says (https://timsong-cpp.github.io/cppwp/rand#predef-3):
using mt19937 = mersenne_twister_engine<uint_fast32_t,32,624,397,31,0x9908b0df,11,0xffffffff,7,0x9d2c5680,15,0xefc60000,18,1812433253>;
Required behavior: The 10000th consecutive invocation of a default-constructed object of type mt19937 shall produce the value 4123659995.
The big four compilers I used (GCC, Clang, VC++, Intel C++) produced the same MT19937 output.
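A quick way to confirm that required behaviour on your own toolchain (a minimal sketch; discard(9999) skips the first 9999 outputs, so the next call is the 10000th invocation):
#include <cassert>
#include <random>

int main() {
    std::mt19937 gen;                 // default-constructed, i.e. seeded with 5489u
    gen.discard(9999);                // skip the first 9999 outputs
    assert(gen() == 4123659995u);     // the 10000th invocation, as required by the standard
}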
Distributions, on the other hand, are not specified that precisely and therefore vary between compilers and libraries. If you need portable distributions, you either roll your own or use something from Boost or similar libraries.
Any pseudo-RNG which takes a seed will give you the same sequence for the same seed every time, on every machine. This is because the generator is just a (complex) mathematical function and has nothing actually random about it. Most of the time, when you want to randomise, you take the seed from the system clock, which constantly changes, so each run is different.
Having the same sequence is useful in computer games, for example when you have a randomly generated world and want to generate exactly the same one again, or to stop people cheating by reloading save games in a game with random chances.

How to properly seed a 64 bit random generator with time [duplicate]

I'm working on a program that runs Monte Carlo simulation; specifically, I'm using a Metropolis algorithm. The program needs to generate possibly billions of "random" numbers. I know that the Mersenne twister is very popular for Monte Carlo simulation, but I would like to make sure that I am seeding the generator in the best way possible.
Currently I'm computing a 32-bit seed using the following method:
mt19937_64 prng;          // pseudo random number generator
unsigned long seed;       // store seed so that every run can follow the same sequence
unsigned char seed_count; // to help keep seeds from repeating because of temporal proximity

unsigned long genSeed() {
    return ( static_cast<unsigned long>(time(NULL)) << 16 )
         | ( (static_cast<unsigned long>(clock()) & 0xFF) << 8 )
         | ( (static_cast<unsigned long>(seed_count++) & 0xFF) );
}
//...
seed = genSeed();
prng.seed(seed);
I have a feeling there are much better ways to ensure non-repeating seeds, and I'm quite sure mt19937_64 can be seeded with more than 32 bits. Does anyone have any suggestions?
Use std::random_device to generate the seed. It'll provide non-deterministic random numbers, provided your implementation supports it. Otherwise it's allowed to use some other random number engine.
std::mt19937_64 prng;
seed = std::random_device{}();
prng.seed(seed);
operator() of std::random_device returns an unsigned int, so if your platform has 32-bit ints, and you want a 64-bit seed, you'll need to call it twice.
std::mt19937_64 prng;
std::random_device device;
seed = (static_cast<uint64_t>(device()) << 32) | device();
prng.seed(seed);
Another available option is using std::seed_seq to seed the PRNG. This allows the PRNG to call seed_seq::generate, which produces a non-biased sequence over the range [0 ≤ i < 2^32), with an output range large enough to fill its entire state.
std::mt19937_64 prng;
std::random_device device;
std::seed_seq seq{device(), device(), device(), device()};
prng.seed(seq);
I'm calling random_device 4 times to create a 4-element initial sequence for seed_seq. However, I'm not sure what the best practice is here, as far as the length or source of the elements in the initial sequence is concerned.
Let's recap (comments too), we want to generate different seeds to get independent sequences of random numbers in each of the following occurrences:
The program is relaunched on the same machine later,
Two threads are launched on the same machine at the same time,
The program is launched on two different machines at the same time.
1 is solved using the time since the epoch, 2 is solved with a global atomic counter, and 3 is solved with a platform-dependent id (see How to obtain (almost) unique system identifier in a cross platform way?).
Now the point is: what is the best way to combine them to get a uint_fast64_t (the seed type of std::mt19937_64)? I assume here that we do not know a priori the range of each parameter, or that they are too big, so we cannot just play with bit shifts to get a unique seed in a trivial way.
A std::seed_seq would be the easy way to go; however, its return type uint_least32_t is not our best choice.
A good 64-bit hasher is a much better choice. The STL offers std::hash in the <functional> header; one possibility is to concatenate the three numbers above into a string and then pass it to the hasher. The return type is a size_t, which on 64-bit machines is very likely to match our requirements.
Collisions are unlikely but of course possible. If you want to be sure not to build up statistics that include a sequence more than once, you can store the seeds and discard any duplicated runs.
A std::random_device could also be used to generate the seeds (collisions may still happen; it is hard to say whether more or less often). However, since the implementation is library-dependent and may fall back on a pseudo-random generator, it is mandatory to check the entropy of the device and avoid using a zero-entropy device for this purpose, as that would probably break the points above (especially point 3). Unfortunately, you can only discover the entropy once you take the program to the specific machine and test it with the installed library.
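Putting the three ingredients together might look like the following sketch; makeSeed and the machine-id string are illustrative placeholders, and the id itself would come from one of the platform-specific methods in the linked question.
#include <atomic>
#include <chrono>
#include <cstdint>
#include <functional>
#include <random>
#include <string>

std::uint64_t makeSeed(const std::string& machineId) {
    static std::atomic<std::uint64_t> counter{0};               // point 2: global atomic counter
    const auto now = std::chrono::high_resolution_clock::now()
                         .time_since_epoch().count();           // point 1: time since epoch
    const std::string key = std::to_string(now) + '-'
                          + std::to_string(counter++) + '-'
                          + machineId;                          // point 3: machine id
    return std::hash<std::string>{}(key);   // size_t, 64 bits on most 64-bit platforms
}

int main() {
    std::mt19937_64 prng{makeSeed("machine-id-placeholder")};
    (void)prng();
}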
As far as I can tell from your comments, it seems that what you are interested in is ensuring that if a process starts several of your simulations at exactly the same time, they will get different seeds.
The only significant problem I can see with your current approach is a race condition: if you are going to start multiple simulations simultaneously, it must be done from separate threads. If it is done from separate threads, you need to update seed_count in a thread-safe manner, or multiple simulations could end up with the same seed_count. You could simply make it an std::atomic<int> to solve that.
Beyond that, it just seems more complicated than it has to be. What do you gain by using two separate timers? You could do something as simple as this:
at program startup, grab the current system time (using a high-resolution timer) once, and store it.
assign each simulation a unique ID: this could just be an integer initialised to 0 and incremented each time a simulation starts (without any race conditions, as mentioned above), effectively like your seed_count.
when seeding a simulation, just use the initially generated timestamp plus the unique ID. If you do this, every simulation in the process is assured a unique seed.
How about...
There is some main code that starts the threads, and a copy of a function runs in each of those threads, each copy with its own Mersenne Twister. Am I correct? If so, why not use another random generator in the main code? It would be seeded with a time stamp and would hand its consecutive pseudorandom numbers to the function instances as their seeds.
From the comments I understand you want to run several instances of the algorithm, one instance per thread. And given that the seed for each instance will be generated pretty much at the same time, you want to ensure that these seeds are different. If that is indeed what you are trying to solve, then your genSeed function will not necessarily guarantee that.
In my opinion, what you need is a parallelisable random number generator (RNG). What this means is that you only need one RNG, which you instantiate with only one seed (which you can generate with your genSeed), and then the sequence of random numbers that would normally be generated in a sequential environment is split into X non-overlapping sequences, where X is the number of threads. There is a very good library which provides these types of RNGs in C++, follows the C++ standard for RNGs, and is called TRNG (http://numbercrunch.de/trng).
Here is a little more information. There are two ways you can achieve non-overlapping sequences per thread. Let's assume that the sequence of random numbers from a single RNG is r = {r(1), r(2), r(3), ...} and you have only two threads. If you know in advance how many random numbers you will need per thread, say M, you can give the first M of the r sequence to the first thread, i.e. {r(1), r(2), ..., r(M)}, and the second M to the second thread, i.e. {r(M+1), r(M+2), ..., r(2M)}. This technique is called blocksplitting, since you split the sequence into two consecutive blocks.
The second way is to create the sequence for the first thread as {r(1), r(3), r(5), ...} and for the second thread as {r(2), r(4), r(6), ...}, which has the advantage that you do not need to know in advance how many random numbers you will need per thread. This is called leapfrogging.
Note that both methods guarantee that the sequences per thread are indeed non-overlapping. The link I posted above has many examples and the library itself is extremely easy to use. I hope my post helps.
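For illustration, here is a minimal sketch of leapfrogging built on plain std::mt19937 rather than TRNG (far less efficient than a real parallel RNG, since discard on mt19937 is not a constant-time jump, but it shows how the per-thread subsequences interleave; LeapfrogRng is an illustrative name):
#include <cstdint>
#include <random>

// All threads share one seed; thread t takes values t, t+N, t+2N, ... of the
// single logical sequence (N = number of threads).
struct LeapfrogRng {
    std::mt19937 eng;
    unsigned nthreads;

    LeapfrogRng(std::uint32_t seed, unsigned nthreads, unsigned thread_id)
        : eng(seed), nthreads(nthreads) {
        eng.discard(thread_id);            // start at this thread's offset in the sequence
    }

    std::uint32_t operator()() {
        const std::uint32_t value = eng();
        eng.discard(nthreads - 1);         // skip over the values belonging to the other threads
        return value;
    }
};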
The POSIX function gettimeofday(2) gives the time with microsecond precision.
The gettid(2) function (Linux-specific; pthread_self(3) is the portable POSIX alternative) returns the ID of the current thread.
You should be able to combine the time in seconds since the epoch (which you are already using), the time in microseconds, and the thread ID to get a seed which is always unique on one machine.
If you also need it to be unique across multiple machines, you could consider also getting the hostname, the IP address, or the MAC address.
I would guess that 32 bits is probably enough, since there are over 4 billion unique seeds available. Unless you are running billions of processes, which doesn't seem likely, you should be alright without going to 64 bit seeds.
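A rough sketch of that combination (the exact mixing is my own choice): microseconds since the epoch XORed with a hash of the current thread's ID, using std::this_thread::get_id() as a portable stand-in for gettid.
#include <sys/time.h>
#include <cstdint>
#include <functional>
#include <thread>

std::uint64_t makeSeed() {
    struct timeval tv;
    gettimeofday(&tv, nullptr);
    const std::uint64_t usec = static_cast<std::uint64_t>(tv.tv_sec) * 1000000ULL
                             + static_cast<std::uint64_t>(tv.tv_usec);
    const std::uint64_t tid  = std::hash<std::thread::id>{}(std::this_thread::get_id());
    return usec ^ (tid << 20);   // shift so the thread bits land above the fast-changing microsecond bits
}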

What is a good way to seed parallel pseudo random number generators?

The PRNG I wrote has a period of 2^64. When I use a spinlock to protect it from 4 threads, it runs twice as slowly as with a single thread; a mutex appears to be even better at making things slower. So I decided to have separate generators per thread, but the problem is that when the seeds are too close, the same series of random numbers will appear again and again, each time in a different thread. I'm not 100% sure how badly this will affect my simulation, but I'd like to avoid having very closely seeded PRNGs.
Maybe my original question was too loosely specified to get an easy solution. Below I posted the PRNG that I'm using. It performs very well in statistical tests such as Diehard or FIPS, but I really cannot prove why, as I'm no expert in this area. I need a way to find good seeds for 4 or more generators running in parallel. With 2 seeds, the worst pair of seeds is two identical seeds, so that 2 threads get the same sequence of random numbers. The best pair of seeds will produce two sequences with no overlapping part.
I see that it gets harder to find the 'best' set of seeds as the number of parallel generators, or the number of random numbers generated, or both, grows. There will be at least 4 threads and at least a billion random numbers generated per task.
The simplest solution I can come up with is periodic reseeding. Sometimes I may get a bad set of seeds, but it will soon be replaced by a better one after a periodic reseed.
Is there a general solution to my problem that can be applied to any PRNG? Or at least something that works for the generator I'm currently using? I could possibly change my PRNG if there is one which is specifically designed and well suited for parallel random number generation.
static thread_local unsigned long long n;

void seedRand(unsigned long long s)
{
    n = s;
}

unsigned genRand(void)
{
    n *= 123456789;
    n ^= n >> 3;
    n ^= n << 5;
    return n ^= n >> 7;
}
I can possibly change my PRNG, if there's one which is specifically designed and well suited for parallel random number generation
Well, if you're willing to change the RNG, there are generators which have fast (logarithmic in the number of calls skipped) skip-ahead (a.k.a. leap-frog).
By design they guarantee no overlap. Say your simulation requires 10^9 RNG calls per thread and runs on 8 threads; then you start with a single seed, the first thread is skipped ahead by 0, the second thread by 10^9, and thread number N by (N-1)*10^9, as sketched below.
A reasonably acceptable one (which is a reimplementation of MCNP5 Fortran code) is here: https://github.com/Iwan-Zotow/LCG-PLE63, but most likely it won't pass Diehard.
A bit more complex one (and it passed Diehard, I believe) is here: http://www.pcg-random.org/
Both are based upon fast exponentiation; see the paper by F. Brown, "Random Number Generation with Arbitrary Stride," Trans. Am. Nucl. Soc. (Nov. 1994).
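To make the fast-exponentiation idea concrete, here is a minimal sketch of logarithmic skip-ahead for a plain 64-bit LCG (the constants are Knuth's MMIX multiplier and increment, with the modulus 2^64 supplied by unsigned overflow); it illustrates the technique, it is not the code from the links above.
#include <cstdint>

struct Lcg64 {
    static constexpr std::uint64_t A = 6364136223846793005ULL;
    static constexpr std::uint64_t C = 1442695040888963407ULL;
    std::uint64_t state;

    explicit Lcg64(std::uint64_t seed) : state(seed) {}

    std::uint64_t next() { return state = A * state + C; }

    // Advance the state by n steps in O(log n) time by repeatedly squaring the
    // affine map x -> A*x + C.
    void skip(std::uint64_t n) {
        std::uint64_t accA = 1, accC = 0;   // identity map: x -> 1*x + 0
        std::uint64_t curA = A, curC = C;   // map for a single step
        while (n > 0) {
            if (n & 1) {                    // fold the current power into the result
                accA = curA * accA;
                accC = curA * accC + curC;
            }
            curC = (curA + 1) * curC;       // square the map: two steps at once
            curA = curA * curA;
            n >>= 1;
        }
        state = accA * state + accC;
    }
};

// Usage: one master seed, per-thread non-overlapping blocks of 10^9 draws:
//   Lcg64 rng(masterSeed); rng.skip(thread_id * 1000000000ULL);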
If you have access to a cryptographic library, then you can encrypt the padded initial seed with AES and split up the output among your PRNGs. For example, using counter mode with a 64-bit initial seed of [seed], you'd concatenate it until you got a 256-bit plaintext:
[seed][seed][seed][seed]
and your initialization vector would be [counter][seed] (you need a unique initialization vector, but not necessarily a secure one, since nobody is trying to decrypt your output). This produces a 256-bit output: the first 64 bits seed the first PRNG, the second 64 bits seed the second PRNG, and so on.
There are other ways of doing this depending on what's provided in the crypto library, e.g. you could hash the initial seed, or you could produce random UUIDs until you've got enough bits to seed all of your PRNGs.
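As a concrete example of the "hash the initial seed" variant, here is a sketch using OpenSSL's one-shot SHA256 function (assuming OpenSSL is available and linked with -lcrypto; seedsFrom is an illustrative name). The 256-bit digest is split into four 64-bit seeds, one per PRNG.
#include <openssl/sha.h>
#include <array>
#include <cstdint>
#include <cstring>

std::array<std::uint64_t, 4> seedsFrom(std::uint64_t initialSeed) {
    unsigned char digest[SHA256_DIGEST_LENGTH];
    SHA256(reinterpret_cast<const unsigned char*>(&initialSeed),
           sizeof(initialSeed), digest);               // hash the raw bytes of the seed

    std::array<std::uint64_t, 4> seeds{};
    for (int i = 0; i < 4; ++i)
        std::memcpy(&seeds[i], digest + 8 * i, 8);     // 64 bits of the digest per PRNG
    return seeds;
}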
