Generating 64-bit values with a 32-bit Mersenne Twister - C++

According to this Boost documentation page, the 64-bit variant of the Mersenne Twister is much slower than its 32-bit counterpart (which makes sense). As I understand, a lot of the features introduced in C++11, including random number generation, are basically Boost in the standard library. This leads me to believe that 32-bit MT performance in standard C++ is also better.
I'm writing a raytracer (for fun, mostly), and speed is one of my primary concerns. Essentially all the numerical values are represented as double precision floats. My question is, since the 32-bit MT is considerably faster, can I use it to generate doubles? What drawbacks (precision loss, performance, etc.) can I expect?

For this I am adding one assumption that you did not mention: I am assuming you are doing one random draw per double. Obviously you can get twice the randomness by doing two draws.
The first question is really "does 32 bits of pseudorandomness have enough randomness for my ray tracer?" My guess is yes. Most raytracers only shoot out a few million rays, so you won't notice that a single draw can only take about 4 billion distinct values.
The second question is "can I distribute the pseudorandomness across the domain of double values I care about?" Again, my guess is yes. If you are shooting rays into a 90-degree field, there are about 4 billion possible directions from one pseudorandom draw. For perspective, the angular spacing between adjacent pseudorandom directions is millions of times finer than what a sniper looking through a high-power scope can resolve.
All that being said, profile your code. I'd give a 99.9998% chance that your raytracing code itself takes much longer than the pseudorandom generation, unless your scenes all consist of single non-reflective spheres.
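To make the "one draw per double" idea concrete, here is a minimal sketch (not from the original question) of driving double generation from the 32-bit std::mt19937. The seed value and the single-draw mapping to [0,1) are my own illustrative choices; note that std::uniform_real_distribution may pull two 32-bit words per double if you let it.

#include <iostream>
#include <random>

int main() {
    std::mt19937 rng(12345);                               // 32-bit engine, arbitrary seed
    std::uniform_real_distribution<double> uni(0.0, 1.0);  // may consume two 32-bit draws per double

    double a = uni(rng);                                   // full-precision path
    double b = rng() * (1.0 / 4294967296.0);               // one-draw path: ~2^-32 resolution, enough for ray jitter

    std::cout << a << ' ' << b << '\n';
}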

Related

How to access Intel Random Number generator RdRand in C++? [duplicate]


How should I choose parameters for a smaller-than-standard std::mersenne_twister_engine?

I need a C++11 random number generator which is "good enough" and whose state I can save and restore. I want the saved state to be significantly smaller than the 6.6 KB or so which this code produces:
std::mt19937 rng (1);
std::ofstream save ("save.txt");
save << rng;
std::mersenne_twister_engine has a large number of parameters. It's a bit scary.
For my purposes, a period on the order of billions is sufficient. I've heard of TinyMT, which may be appropriate, but I can't see how to implement it as a template specialization.
How should I choose the parameters? I suspect it will break badly if I merely reduce the "state size" parameter to a few words.
I would consider using a different engine entirely but, apart from tolerating a moderate period, I don't want to sacrifice the quality of statistical randomness. Artefacts such as the one below (for linear congruential generators) are unacceptable.
If you don't need a lot of numbers, any decent 64-bit RNG will be good. Off the top of my head, a very good generator would be XorShift64*; paper: http://arxiv.org/abs/1402.6246, code: https://github.com/Iwan-Zotow/xorshift64STAR
Another option is PCG, "Quadratisch. Praktisch. Gut."; paper and code at http://www.pcg-random.org/
They are both statistically better than MT; the only disadvantage is a small(er) period, but that is OK with you as far as I can see.
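For reference, a minimal sketch of xorshift64* using the update steps and multiplier from the paper linked above; the wrapper struct and the double mapping are my own additions. The entire state is a single nonzero 64-bit word, so saving and restoring it is trivial compared with mt19937's roughly 2.5 KB.

#include <cstdint>

struct Xorshift64Star {
    std::uint64_t state;                              // must be nonzero

    explicit Xorshift64Star(std::uint64_t seed) : state(seed ? seed : 1) {}

    std::uint64_t next() {
        std::uint64_t x = state;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        state = x;
        return x * 0x2545F4914F6CDD1DULL;             // final multiplicative scramble
    }

    double next_double() {                            // top 53 bits mapped to [0,1)
        return (next() >> 11) * (1.0 / 9007199254740992.0);  // 2^53
    }
};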
There are many good generators with a small state: MRG32k3a, LFSR113, Chacha-8, Philox-32x4. Even Mixmax (with N=17) would be small by your standard (state of 17 doubles).
TinyMT is also a possibility, although Vigna has shown that some of the bits are not always good (not sure if the not-so-great lower bits really matter in practice).
I would be wary of xorshift-based RNGs; see the paper "Again, random numbers fall mainly in the planes: xorshift128+ generators" by Matsumoto, for example. I am also dubious of PCG, if only because of the colored table on the front page of the website: it dumbs things down too much, does not present all the relevant generators, and is of course skewed towards PCG.

True random numbers with C++11 and RDRAND

I have seen that Intel seems to have included a new assembly instruction to get real random numbers obtained from hardware. The name of the instruction is RdRand, but only a small amount of detail about it seems to be accessible on the Internet: http://en.wikipedia.org/wiki/RdRand
My questions concerning this new instruction and its use in C++11 are the following:
Are the random numbers generated with RdRand really random? (Is each bit generated from uncorrelated white noise or quantum processes?)
Is it a special feature of Ivy Bridge processors, and will Intel continue to implement this function in the next generations of CPUs?
How can it be used through C++11? Maybe with std::random_device, but do compilers already call RdRand if the instruction is available?
How can I check whether RdRand is really called when I compile a program?
I designed the random number generator that supplies the random numbers to the RdRand instruction. So for a change, I really know the answers.
1) The random numbers are generated from an SP800-90 AES-CTR DRBG compliant PRNG. AES uses a 128-bit key, so the numbers have multiplicative prediction resistance up to 128 bits and additive resistance beyond 128.
However, the PRNG is reseeded frequently from a full-entropy source. For isolated RdRand instructions it will be freshly reseeded. With 8 threads on 4 cores pulling as fast as possible, it will still be reseeded more often than once per 14 RdRands.
The seeds come from a true random number generator. This involves a 2.5Gbps entropy source that is fed into a 3:1 compression ratio entropy extractor using AES-CBC-MAC.
So it is in effect a TRNG, but one that falls back to the properties of a cryptographically secure PRNG for short sequences when heavily loaded.
This is exactly the semantic difference between /dev/random and /dev/urandom on linux, only a lot faster.
The entropy is ultimately gathered from a quantum process, since that is the only fundamental random process we know of in nature. In the DRNG it is specifically the thermal noise in the gates of 4 transistors that drives the resolution state of a metastable latch, 2.5 billion times a second.
The entropy source and conditioner are intended to be SP800-90B and SP800-90C compliant, but those specs are still in draft form.
2) RdRand is part of the standard Intel instruction set. It will be supported in all CPU products in the future.
3) You either need to use inline assembly or a library (like openssl) that does use RdRand. If you use a library, the library is implementing the inline assembler that you could implement directly. Intel gives code examples on their web site.
Someone else mentioned librdrand.a. I wrote that. It's pretty simple.
4) Just look for the RdRand opcodes in the binary.
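As a concrete illustration of option 3, here is a small sketch (mine, not Intel's sample code) that calls RdRand through the compiler intrinsic rather than hand-written inline assembly. It assumes a 64-bit x86 target with RdRand support and GCC/Clang built with -mrdrnd; _rdrand64_step comes from <immintrin.h> and returns 1 on success, 0 if the DRNG could not deliver a value.

#include <cstdint>
#include <cstdio>
#include <immintrin.h>

// Retry a few times: the instruction can transiently fail to return data under heavy load.
bool rdrand64(std::uint64_t& out) {
    for (int i = 0; i < 10; ++i) {
        unsigned long long value;
        if (_rdrand64_step(&value)) {   // success: value holds 64 random bits
            out = value;
            return true;
        }
    }
    return false;
}

int main() {
    std::uint64_t r;
    if (rdrand64(r))
        std::printf("%llu\n", static_cast<unsigned long long>(r));
    return 0;
}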
That certainly depends on your view of the determinism of the universe, so it is more a philosophical question, but many people consider it random.
Only Intel will know, but since there was demand to add it, it's likely there will be demand to keep it.
std::random_device is not required to be hardware driven, and even if it is, it is not required to use rdrand. You can ask its double entropy() const noexcept member function whether it is hardware driven or not. Using rdrand for that is a QoI issue, but I would expect every sane implementation that has it available to do so (I have seen e.g. gcc doing it). If unsure, you can always check assembly, but also other means of hardware randomness should be good enough (there is other dedicated hardware available).
See above: if you are interested in whether it's hardware-only, use entropy(); if you are interested in RdRand specifically, scan the generated machine code.
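For completeness, a tiny sketch of the entropy() check mentioned above; keep in mind that some standard library implementations simply return 0 or a fixed constant here, so it is only a quality-of-implementation hint.

#include <iostream>
#include <random>

int main() {
    std::random_device rd;
    std::cout << "entropy estimate: " << rd.entropy() << '\n';  // 0 often means a deterministic fallback
    std::cout << "sample: " << rd() << '\n';
}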
Since the PRISM and Snowden revelations, I would be very careful about using hardware random generators, or relying on a single library, in an application with security concerns. I prefer using a combination of independent open-source cryptographic random generators. By combination, I mean, for example:
Let ra, rb, rc be three independent cryptographic random generators, and r the random value returned to the application.
Let sa, sb, sc be their seeds and ta, tb, tc their reseed periods, i.e. reseed rb every tb draws.
By independent I mean: belonging as far as possible to independent libraries, and relying on different ciphers or algorithms.
Pseudo-code:
// init
seed std rand with time (at least millisec, preferably microsec)
sa = std rand xor time // of course, not the same time evaluation
// loop
sb = ra every tb
sc = rb every tc
r = rb xor rc
sa = rc every ta
Of course, every draw shall be used only once.
Probably two sources are enough:
// init
seed std rand with time (at least millisec, preferably microsec)
sa = std rand xor time // of course, not the same time evaluation
// loop
sb = ra every tb
sa = rb every ta
r = rb xor ra
Choose different values for ta, tb, tc. Their range depends on the strength of the random sources you use.
EDIT: I have started the new library ABaDooRand for this purpose.
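For illustration only, here is a structural sketch of the two-source variant above. The std::mt19937_64 engines merely stand in for the two independent cryptographic generators ra and rb (in a security-sensitive build they would be CSPRNGs from independent libraries), and the reseed periods ta and tb are arbitrary example values.

#include <chrono>
#include <cstdint>
#include <random>

struct CombinedRng {
    std::mt19937_64 ra, rb;              // placeholders for independent CSPRNGs
    std::uint64_t ta = 1021, tb = 1543;  // distinct reseed periods (example values)
    std::uint64_t count = 0;

    CombinedRng() {
        // seed std rand with time; sa = std rand xor time
        auto t = static_cast<std::uint64_t>(
            std::chrono::high_resolution_clock::now().time_since_epoch().count());
        std::mt19937_64 seeder(t);
        ra.seed(seeder() ^ t);
        rb.seed(ra());                   // sb = ra
    }

    std::uint64_t next() {
        ++count;
        if (count % tb == 0) rb.seed(ra());  // sb = ra every tb
        if (count % ta == 0) ra.seed(rb());  // sa = rb every ta
        return rb() ^ ra();                  // r = rb xor ra
    }
};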
1) No, the numbers from RdRand are not truly random, since they come from a cryptographically-secure pseudorandom number generator. However, RdRand, RdSeed, and the Intel Secure Key technology are probably the closest to truly random you will find.
2) Yes, the feature is available in all Intel processors that appear in laptops, desktops, and servers starting with the Ivy Bridge processors you mention. These days, the features are also implemented in AMD chips.
3 and 4) The Intel software development guide is the place to look for these answers. There is an interesting discussion of how Intel Secure Key is applied to an astrophysical problem here (http://iopscience.iop.org/article/10.3847/1538-4357/aa7ede/meta) and a non-paywalled version here (https://arxiv.org/abs/1707.02212). The paper describes how the technology works, how to implement it, and its performance (Sections 2.2.1 and 5). I had to read it for a class.
I think they are "said to be" random... Since the instruction is intended for encryption, I wouldn't worry too much about the quality of the random numbers.
I think Intel will keep providing it, as they always regard backward compatibility as important, even if this instruction may be useless in the future.
I am sorry I cannot answer this question because I don't use C++11.
You can try librdrand.a if you don't want to dig into assembly code. Intel has provided the library for free download on their website. I have tested it; it's pretty convenient and has an error-reporting mechanism (since the random number generator has a small probability of failing to generate a random number). So if you use this library, you only need to check the return value of the function in librdrand.

Fast way to compute n times 10 raised to the power of minus m

I want to compute 10 raised to the power of minus m. Apart from using the math function pow(10, -m), is there any fast and efficient way to do that?
The reason I ask the C++ gurus of SO such a simple question is that, as you know, just like base 2, base 10 is also special. Multiplying some value n by 10 to the power of minus m is equivalent to moving n's decimal point to the left m places, so I think there must be a fast and efficient way to exploit this.
For floating point m, so long as your standard library implementation is well written, then pow will be efficient.
If m is an integer, and you hinted that it is, then you could use an array of precalculated values.
You should only be worrying about this kind of thing if that routine is a bottleneck in your code. That is if the calls to that routine take a significant proportion of the total running time.
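A minimal sketch of the precalculated-table idea, assuming integer m in a known small range (0..22 here is an arbitrary illustrative cut-off); anything outside the table falls back to std::pow.

#include <cmath>
#include <cstddef>

double pow10_neg(int m) {
    static const double table[] = {
        1e0,   1e-1,  1e-2,  1e-3,  1e-4,  1e-5,  1e-6,  1e-7,
        1e-8,  1e-9,  1e-10, 1e-11, 1e-12, 1e-13, 1e-14, 1e-15,
        1e-16, 1e-17, 1e-18, 1e-19, 1e-20, 1e-21, 1e-22
    };
    const std::size_t count = sizeof(table) / sizeof(table[0]);
    if (m >= 0 && static_cast<std::size_t>(m) < count)
        return table[m];            // fast path: table lookup
    return std::pow(10.0, -m);      // fallback for out-of-range m
}

// Usage: n * pow10_neg(m) computes n * 10^(-m).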
Ten is not a special value on a binary machine, only two is. Use pow or exponentiation by squaring.
Unfortunately there is no fast and efficient way to calculate it using IEEE 754 floating point representation. The fastest way to get the result is to build a table for every value of m that you care about, and then just perform a lookup.
If there's a fast and efficient way to do it then I'm sure your CPU supports it, unless you're running on an embedded system in which case I'd hope that the pow(...) implementation is well written.
10 is special to us as most of us have ten fingers. Computers only have two digits, so 2 is special to them. :)
Use a lookup table; there can't be more than about 1000 relevant values, especially if m is an integer.
If you could operate with log n instead of n for a significant part of the computation, you could save time, because instead of
n = n * pow(10, -m)
you now only have to calculate (using the definition l = log10(n))
l = l - m
Just some more ideas which may lead you to further solutions...
If you are interested in optimization at the algorithm level, you might look for a parallelized approach.
You may speed things up at the system/architecture level by using IPP (for Intel processors) or, e.g., the AMD Core Math Library (ACML) for AMD.
Using the power of your graphics card may be another way (e.g. CUDA for NVIDIA cards).
I think it is also worth looking at OpenCL.
IEEE 754 specifies a bunch of floating-point formats. Those that are in widespread use are binary, which means that base 10 isn't in any way special. This is contrary to your assumption that "10 is also a special base".
Interestingly, IEEE 754-2008 does add decimal floating-point formats (decimal32 and friends). However, I'm yet to come across hardware implementations of those.
In any case, you shouldn't be micro-optimizing your code before you've profiled it and established that this is indeed the bottleneck.

Same code using floats on two computers gives two different results

I've got some image processing code in C++ which calculates gradients and finds straight lines in them with the Hough transform algorithm. The program does most of its calculations with floats.
When I run this code on the same image on two different computers (one a Pentium IV running the latest Fedora, the other a Core i5 running the latest Ubuntu, both 32-bit), I get slightly different results. E.g. after some lengthy calculation I have 1.3456f for some variable on one machine and 1.3457f on the other. Is this expected behavior, or should I search for errors in my program?
My first guess was that I'm accessing some uninitialized or out-of-bounds memory, but I ran the program through valgrind and it can't find any errors; also, running it multiple times on the same machine always gives the same results.
This is not uncommon and it will depend on your compiler, optimisation settings, math libraries, CPU, and of course the numerical stability of the algorithms that you are using.
You need to have a good idea of your accuracy requirements and if you are not meeting these then you may need to look at your algorithms and e.g. consider using double rather than float where needed.
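As a small illustration of the float-vs-double point (my own toy example, unrelated to the asker's code): summing the same values in a float accumulator and in a double accumulator gives visibly different results, and the float result can additionally change with summation order or vectorization.

#include <cstdio>

int main() {
    float  fsum = 0.0f;
    double dsum = 0.0;
    for (int i = 0; i < 10000000; ++i) {   // true sum is 1,000,000
        fsum += 0.1f;
        dsum += 0.1;
    }
    std::printf("float  sum: %.4f\n", fsum);   // drifts noticeably from 1e6
    std::printf("double sum: %.4f\n", dsum);   // much closer to 1e6
}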
For background on why given source code might not produce the same output on different computers, see What Every Computer Scientist Should Know About Floating-Point Arithmetic. I doubt this is due to any deficiency of your code unless it performs aggregation in a non-deterministic way, e.g. by centrally collating calculation results from multiple threads.
Floating-point behaviour is often tunable via compiler options, even down to the level of targeting different CPUs. Check your compiler docs to see if you can reduce or eliminate the discrepancy. On Visual C++ (for example) this is done via /fp.
Is it due to a phenomenon called machine epsilon?
http://en.wikipedia.org/wiki/Machine_epsilon
There are limitations on floating-point numbers. The fact that floating-point numbers cannot precisely represent all real numbers, and that floating-point operations cannot precisely represent true arithmetic operations, leads to many surprising situations. This is related to the finite precision with which computers generally represent numbers.
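If you want to see the machine epsilon mentioned above for the types you use, it is exposed directly by the standard library:

#include <iostream>
#include <limits>

int main() {
    std::cout << "float  epsilon: " << std::numeric_limits<float>::epsilon()  << '\n'
              << "double epsilon: " << std::numeric_limits<double>::epsilon() << '\n';
}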
Basically, the same C++ instructions can be compiled to different machine instructions (even on the same CPU and certainly on different CPUs) depending on a large number of factors, and the same machine instructions can lead to different low-level CPU actions depending on a large number of factors. In theory, these are supposed to be semantically equivalent, but with floating-point numbers, there are edge cases where they aren't.
Read "The pitfalls of verifying floating-point computations" by David Monniaux for details.
I will also say that this is very common, and probably not your fault.
I spent a lot of time in the past trying to figure out the same problem.
I would suggest using a decimal type instead of float and double as long as your numbers do not refer to scientific calculations but to values like prices, quantities, exchange rates, etc.
This is totally normal, unfortunately.
There are libraries which can produce identical results everywhere--see http://www.mpfr.org/ for an example. But the performance cost is substantial and it's probably not worth it unless exact identical results are the most important criterion.
I've actually written a closed-source library which implemented floating-point math in the integer unit, in order to make floats provide identical results on multiple platforms (Intel, AMD, PowerPC) across different compilers. We had an app which simply could not function if floating-point results varied. It was quite a challenge, though. If we could do it again we'd have just designed the original app in fixed-point, but at the time it was too much code to rewrite.
Either this is a difference in the internal representation of the float, producing slightly different results, or perhaps it is a difference in the way the float is printed to the screen. I doubt that it is your fault...