I'm working on a test suite for my package, and as part of the tests I would like to run my algorithm on a block of data. However, it occurred to me that instead of hardcoding a particular block of data, I could use an algorithm to generate it. I'm wondering if the C++11 <random> facilities would be appropriate for this purpose.
From what I understand, the C++11 random number engines are required to implement specific algorithms. Therefore, given the same seed they should produce the same sequence of random integers in the range defined by the algorithm parameters.
However, as far as distributions are concerned, the standard specifies that:
The algorithms for producing each of the specified distributions are implementation-defined.
(26.5.8.1 Random number distribution class templates / In general)
Which — unless I'm mistaken — means that the output of a distribution is pretty much undefined. And from what I've tested, the distributions in GNU libstdc++ and LLVM project's libc++ produce different results given the same random engines.
The question would therefore be: what would be the most correct way of producing pseudo-random data that would be completely repeatable across different platforms?
what would be the most correct way of producing pseudo-random data that would be completely repeatable across different platforms?
That would be obvious: write your own distribution. As you yourself pointed out, the engines are cross-platform since they implement a specific algorithm. It's the distributions that are implementation-defined.
So write the distributions yourself.
Please see this answer: https://stackoverflow.com/a/34962942/1151329
I had exactly this problem and writing my own distributions worked perfectly. I got the same sequences across Linux, OS X, Windows, x86 and ARM.
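As a rough illustration of what "write your own distribution" can look like, here is a minimal sketch of a uniform integer distribution built only on the raw output of std::mt19937. The name portable_uniform is made up for this example and is not taken from the linked answer. It assumes lo <= hi and uses rejection sampling to avoid modulo bias.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper (not a standard library function): returns a uniformly
// distributed integer in the closed range [lo, hi], assuming lo <= hi.
// Only the engine's raw 32-bit output and integer arithmetic are used, so the
// result does not depend on any implementation-defined distribution.
inline std::uint32_t portable_uniform(std::mt19937& eng,
                                      std::uint32_t lo, std::uint32_t hi) {
    const std::uint32_t span = hi - lo;
    if (span == UINT32_MAX) {
        // Full 32-bit range requested: every raw output is already valid.
        return static_cast<std::uint32_t>(eng());
    }
    const std::uint32_t range = span + 1;  // number of possible results
    // 2^32 mod range: raw outputs below this value would bias the result.
    const std::uint32_t threshold = (UINT32_MAX - range + 1) % range;
    std::uint32_t x;
    do {
        x = static_cast<std::uint32_t>(eng());
    } while (x < threshold);  // reject the biased part of the range
    return lo + x % range;
}
```

Calling portable_uniform(eng, 1, 6) on two engines seeded with the same value should give the same rolls on any conforming implementation, since nothing in it is left to the library.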
I am using random_shuffle on a sequence 1000 times. I want to ensure that the final sequence is the same on every computer. This might look undesirable, but I still want to achieve it. Does srand(x) ensure that?
There are two sources of differences in the resulting shuffle: the algorithm for rand() is not specified, so different implementations produce different sequences of numbers; and the algorithm for random_shuffle is not specified, so, again, different implementations produce different results, even with the same sequence of pseudo-random numbers.
You can eliminate the first problem by using any of the random-number generators in C++11; they're all specified in detail, including, for some specializations, a requirement for the 10,000th value, which is a great help in debugging their implementation. However, there's no analog for shuffling. In particular, the algorithm for std::shuffle is not specified, so it won't give reproducible results. You'll have to write your own. That's not difficult (nowhere near as difficult as writing an engine), just do a little research; there are lots of discussions out there for you to start from.
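For what it's worth, a hand-written shuffle might look roughly like the sketch below. It is a plain Fisher-Yates shuffle driven by std::mt19937; the names bounded and portable_shuffle are invented for this example, and it assumes the container holds fewer than 2^32 elements.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: uniform index in [0, n) from raw engine output.
// Assumes n >= 1. Rejection sampling avoids modulo bias.
inline std::uint32_t bounded(std::mt19937& eng, std::uint32_t n) {
    const std::uint32_t threshold = (UINT32_MAX - n + 1) % n;  // 2^32 mod n
    std::uint32_t x;
    do {
        x = static_cast<std::uint32_t>(eng());
    } while (x < threshold);
    return x % n;
}

// Fisher-Yates shuffle written out by hand, so the result does not depend on
// how a particular standard library implements std::shuffle.
template <typename T>
void portable_shuffle(std::vector<T>& v, std::mt19937& eng) {
    for (std::size_t i = v.size(); i > 1; --i) {
        // Pick j in [0, i) and move that element into the last open slot.
        const std::size_t j = bounded(eng, static_cast<std::uint32_t>(i));
        std::swap(v[i - 1], v[j]);
    }
}
```

Both the engine and the index selection are fully determined by the code here, so the same seed should give the same permutation everywhere.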
Why is it that the result of standard distributions isn't mandated to be consistent across implementations? The result of pseudo random number generators is on the other hand mandated to be identical.
For example, the following will almost certainly print something different for every different standard library implementation.
std::mt19937 random {100};
std::normal_distribution<> dist;
std::cout << dist(random);
Say I want to do procedural generation and would like identical starting seeds to result in identical results across platforms and compilers. I can't do it with the stl. I have to "regress" to using boost. Why isn't this a defect?
This is not a defect, it is by design. The rationale for this can be found in A Proposal to Add an Extensible Random Number Facility to the Standard Library (N1398) which says (emphasis mine):
On the other hand, the specifications for the distributions only
define the statistical result, not the precise algorithm to use. This
is different from engines, because for distribution algorithms,
rigorous proofs of their correctness are available, usually under the
precondition that the input random numbers are (truely) uniformly
distributed. For example, there are at least a handful of algorithms
known to produce normally distributed random numbers from uniformly
distributed ones. Which one of these is most efficient depends on at
least the relative execution speeds for various transcendental
functions, cache and branch prediction behaviour of the CPU, and
desired memory use. This proposal therefore leaves the choice of the
algorithm to the implementation. It follows that output sequences for
the distributions will not be identical across implementations. It is
expected that implementations will carefully choose the algorithms for
distributions up front, since it is certainly surprising to customers
if some distribution produces different numbers from one
implementation version to the next.
This point is reiterated in the implementation defined section which says:
The algorithms how to produce the various distributions are specified
as implementation-defined, because there is a vast variety of
algorithms known for each distribution. Each has a different trade-off
in terms of speed, adaptation to recent computer architectures, and
memory use. The implementation is required to document its choice so
that the user can judge whether it is acceptable quality-wise.
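To make the "handful of algorithms" point concrete, here is a minimal sketch of one of them, the Box-Muller transform, on top of std::mt19937. The function name box_muller is made up for this example. Fixing the algorithm yourself removes the library's choice of algorithm as a source of differences, but the result still goes through std::log, std::sqrt and std::cos, so the last bits may still differ between math libraries.

```cpp
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical helper: one standard-normal sample via the Box-Muller
// transform. This is one possible algorithm, not the one any particular
// standard library is known to use.
inline double box_muller(std::mt19937& eng) {
    // Map two raw 32-bit outputs to doubles in the open interval (0, 1),
    // so that std::log never sees zero.
    const double u1 = (static_cast<double>(eng()) + 0.5) / 4294967296.0;
    const double u2 = (static_cast<double>(eng()) + 0.5) / 4294967296.0;
    const double two_pi = 6.283185307179586;
    return std::sqrt(-2.0 * std::log(u1)) * std::cos(two_pi * u2);
}
```

Each call consumes exactly two engine outputs, which keeps the engine state predictable if other code shares the same engine.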
For a C++ project, I want a class with the functionality that the Random class has in Java or C#.
I have found this one, http://www.dreamincode.net/code/snippet342.htm but it has bugs and I can't quite fix them right now.
Could you point out the bugs and their fixes, or suggest another implementation?
There are three nearly identical, high-quality "standard" random number generation libraries; use the first of these that is available to you:
C++11's <random>.
TR1's <tr1/random>.
Boost's <boost/random.hpp>.
They're all conceptually identical and even practically near-identical, apart from the namespace (std, std::tr1 and boost, respectively).
Each library defines a set of engines, such as std::mt19937. Pick one (for each thread) and seed it.
Once you have an engine, you can use a wide variety of distributions to generate numbers, using your engine. Frequently used distributions are uniform integers in a range [a, b], uniform floats in the range [0,1), and several well-known probability distributions like the normal distribution.
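A minimal sketch of the engine-plus-distribution pattern, using the C++11 <random> names. The helper names mt19937_10000th and roll_dice are invented for this example. The first helper checks the one value the standard pins down for std::mt19937; the second shows a distribution drawing from an engine.

```cpp
#include <random>

// Returns the 10000th output of a default-constructed std::mt19937.
// The C++ standard requires this to be 4123659995 on every implementation.
inline std::mt19937::result_type mt19937_10000th() {
    std::mt19937 eng;    // default seed
    eng.discard(9999);   // skip the first 9999 outputs
    return eng();
}

// Rolls a six-sided die n times and returns the sum. The engine is passed by
// reference so successive calls continue the same sequence.
inline int roll_dice(std::mt19937& eng, int n) {
    std::uniform_int_distribution<int> die(1, 6);  // closed range [1, 6]
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += die(eng);
    }
    return sum;
}
```

Typical use is to construct one engine per thread, for example std::mt19937 eng(seed), and pass it to whatever distributions you need.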
Try this: http://bedaux.net/mtrand/
I actually have one implemented: http://frigocoder.dyndns.org/svn/Frigo/Math, see Random, Random.cpp and MersenneTwister
Random is an abstract class, though; unlike in Java, I had no desire to implement silly LCG generators by default. It has no nextGaussian method either. MersenneTwister inherits from it.
They have a lot of dependencies on my library, but they can be easily removed, and you have the general idea.
I have this long and complex source code that uses an RNG with a fixed seed.
This code is a simulator and the parameters of this simulator are the random values given by this RNG.
When I execute the code on the same machine, the output is the same no matter how many times I run it. But when I execute this code on two different machines and compare the outputs, they are different.
Is it possible for two different machines to give different output using the same random number generator and the same seed?
The compiler version, the libraries and the OS are the same.
It is certainly possible, as the RNG may be combining machine specific data with the seed, such as the network card address, to generate the random number. It is basically implementation specific.
As they do give different results it is obviously possible that they give different results. Easy-to-answer question, next!
Seriously: without knowing the source code to the RNG it’s hard to know whether you’re observing a bug or a feature. But it sounds like the RNG in question is using a second seed from somewhere else, e.g. the current time, or some hardware-dependent value like the network card’s MAC address.
If you need something that can be repeated from machine to machine, try the Boost Random Number Library.
If it's a pseudo random generator that uses nothing but the seed to produce a number sequence, then by definition they cannot be different. However, if the ones you're using are using something machine dependent to perturb the seed, or quite simply, a different algorithm, it's of course quite possible. Which implementation are you using, and if it's a standard library implementation, are they both the same version?
Yes. There are floating-point RNGs, for instance, whose results can depend on whether your CPU is properly implementing IEEE floats (not guaranteed in ISO C++). Also, effects such as spilling 80-bit doubles to memory can influence results.
There is also some possible confusion about the notion of a "seed". Some people define the seed as all input to set the initial state of the RNG. Others restrict it to only the explicit input in code, and exclude implicit input from e.g. HW sources or /dev/random.
Perhaps it's a little/big endian problem, or the code detects the processor in some way. Easiest way to do this would be to use breakpoints or similar debug routines to watch the seeding routines and the RNG itself at work.
It depends greatly on which RNG you are using. Things like random(3) or the rand48(3) family are designed to return the same sequence when run with the same seed. Now, if the RNG you are using takes /dev/random output, all bets are off and results will be different.
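One way to narrow this down, sketched below under the assumption that the RNG can be wrapped as a callable engine (the simulator's actual RNG is not shown in the question): fold the first few thousand raw outputs into a single hash and compare that number between the two machines. The name stream_fingerprint is made up for this example. If the fingerprints match, the difference comes from code downstream of the RNG, such as floating-point handling.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical diagnostic: folds the first `count` raw outputs of an engine
// into a 64-bit FNV-1a hash, so two machines only need to compare one number.
// The engine is taken by value, so the caller's engine is left untouched.
template <typename Engine>
std::uint64_t stream_fingerprint(Engine eng, std::size_t count) {
    std::uint64_t h = 14695981039346656037ull;  // FNV-1a offset basis
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint64_t x = static_cast<std::uint64_t>(eng());
        // Hash the value byte by byte, low byte first, so the result does
        // not depend on the machine's endianness.
        for (int b = 0; b < 8; ++b) {
            h ^= (x >> (8 * b)) & 0xffu;
            h *= 1099511628211ull;              // FNV-1a prime
        }
    }
    return h;
}
```

For example, printing stream_fingerprint(std::mt19937(seed), 10000) on both machines gives a quick yes/no answer about the raw stream.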
I want to supply a number, and then receive a set of random numbers. However, I want those numbers to be the same regardless of which computer I run it on (assuming I supply the same seed).
Basically my question is: in C++, if I make use of rand(), but supply srand() with a user-defined seed rather than the current time, will I be able to generate the same random number stream on any computer?
There are dozens of PRNGs available as libraries. Pick one. I tend to use Mersenne Twister.
By using an externally supplied library, you bypass the risk of a weird or buggy implementation of your language's library rand(). As long as your platforms all conform to the same mathematical semantics, you'll get consistent results.
MT is a favorite of mine because I'm a physicist, and I use these things for Monte Carlo, where the guarantee of equal-distribution to high dimensions is important. But don't use MT as a cryptographic PRNG!
srand() & rand() are not part of the STL. They're actually part of the C runtime.
Yes, they will produce the same results as long as it's the same implementation of srand()/rand().
Depending on your needs, you might want to consider using Boost.Random. It provides several high-quality random number generators.
Assuming the implementations of rand() are the same, yes.
The easiest way to ensure this is to include a known rand() implementation with your program - either included in your project's source code or in the form of a library you can manage.
No, the ANSI C standard only specifies that rand() must produce a stream of random integers between 0 and RAND_MAX, which must be at least 32767 (source). This stream must be deterministic only in that, for a given implementation on a given machine, it must produce the same integer stream given the same seed.
You want a portable PRNG. Mersenne Twister (many implementations linked at the bottom) is pretty portable, as is Ben Pfaff's homegrown C99-compliant PRNG. Boost.Random should be fine too; as you're writing your code in C++, using Boost doesn't limit your choice of platforms much (although some "lesser" (i.e. non-compliant) compilers may have trouble with its heavy use of template metaprogramming). This is only really a problem for low-volume embedded platforms and perhaps novel research architectures, so if by "any computer" you mean "any x86/PPC/ARM/SPARC/Alpha/etc. platform that GCC targets", any of the above should do just fine.
Write your own pseudorandom number routine. There are a lot of algorithms documented on the internet, and they have a number of applications where rand isn't good enough (e.g. Perlin Noise).
Try these links for starters:
http://en.wikipedia.org/wiki/Linear_congruential_generator
http://en.wikipedia.org/wiki/Pseudorandom_number_generator
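As a starting point, here is a minimal sketch of a linear congruential generator written by hand. The struct name MinStdLcg is invented for this example. It uses the "minimal standard" parameters (multiplier 48271, modulus 2^31 - 1), which is the same recurrence std::minstd_rand is specified to use, so it can be checked against the standard library.

```cpp
#include <cstdint>
#include <random>

// Hypothetical hand-rolled generator: x(n+1) = x(n) * 48271 mod (2^31 - 1).
// Only fixed-width integer arithmetic is used, so the sequence is the same
// on every platform.
struct MinStdLcg {
    std::uint32_t state;

    // A seed that is a multiple of the modulus would lock the state at 0,
    // so it is mapped to 1 (the same rule std::minstd_rand applies).
    explicit MinStdLcg(std::uint32_t seed) : state(seed % 2147483647u) {
        if (state == 0) {
            state = 1;
        }
    }

    std::uint32_t next() {
        // 64-bit intermediate so the product cannot overflow.
        const std::uint64_t product = static_cast<std::uint64_t>(state) * 48271u;
        state = static_cast<std::uint32_t>(product % 2147483647u);
        return state;
    }
};
```

An LCG like this is fine for reproducible test data, but it has a short period and weak statistical quality compared with something like Mersenne Twister.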
I believe if you supply srand with the same seed, you will get the same results. That's pretty much the definition of a seed in terms of pseudo random number generators.
Yes. For a given seed (starting value), the sequence of numbers that rand() returns will always be the same.