Generate random numbers following a normal distribution in C/C++ - c++

How can I easily generate random numbers following a normal distribution in C or C++?
I don't want any use of Boost.
I know that Knuth talks about this at length but I don't have his books at hand right now.

There are many methods to generate Gaussian-distributed numbers from a regular RNG.
The Box-Muller transform is commonly used. It correctly produces values with a normal distribution. The math is easy. You generate two (uniform) random numbers, and by applying an formula to them, you get two normally distributed random numbers. Return one, and save the other for the next request for a random number.

C++11
C++11 offers std::normal_distribution, which is the way I would go today.
C or older C++
Here are some solutions in order of ascending complexity:
Add 12 uniform random numbers from 0 to 1 and subtract 6. This will match mean and standard deviation of a normal variable. An obvious drawback is that the range is limited to ±6 – unlike a true normal distribution.
The Box-Muller transform. This is listed above, and is relatively simple to implement. If you need very precise samples, however, be aware that the Box-Muller transform combined with some uniform generators suffers from an anomaly called Neave Effect1.
For best precision, I suggest drawing uniforms and applying the inverse cumulative normal distribution to arrive at normally distributed variates. Here is a very good algorithm for inverse cumulative normal distributions.
1. H. R. Neave, “On using the Box-Muller transformation with multiplicative congruential pseudorandom number generators,” Applied Statistics, 22, 92-97, 1973

A quick and easy method is just to sum a number of evenly distributed random numbers and take their average. See the Central Limit Theorem for a full explanation of why this works.

I created a C++ open source project for normally distributed random number generation benchmark.
It compares several algorithms, including
Central limit theorem method
Box-Muller transform
Marsaglia polar method
Ziggurat algorithm
Inverse transform sampling method.
cpp11random uses C++11 std::normal_distribution with std::minstd_rand (it is actually Box-Muller transform in clang).
The results of single-precision (float) version on iMac Corei5-3330S#2.70GHz , clang 6.1, 64-bit:
For correctness, the program verifies the mean, standard deviation, skewness and kurtosis of the samples. It was found that CLT method by summing 4, 8 or 16 uniform numbers do not have good kurtosis as the other methods.
Ziggurat algorithm has better performance than the others. However, it does not suitable for SIMD parallelism as it needs table lookup and branches. Box-Muller with SSE2/AVX instruction set is much faster (x1.79, x2.99) than non-SIMD version of ziggurat algorithm.
Therefore, I will suggest using Box-Muller for architecture with SIMD instruction sets, and may be ziggurat otherwise.
P.S. the benchmark uses a simplest LCG PRNG for generating uniform distributed random numbers. So it may not be sufficient for some applications. But the performance comparison should be fair because all implementations uses the same PRNG, so the benchmark mainly tests the performance of the transformation.

Here's a C++ example, based on some of the references. This is quick and dirty, you are better off not re-inventing and using the boost library.
#include "math.h" // for RAND, and rand
double sampleNormal() {
double u = ((double) rand() / (RAND_MAX)) * 2 - 1;
double v = ((double) rand() / (RAND_MAX)) * 2 - 1;
double r = u * u + v * v;
if (r == 0 || r > 1) return sampleNormal();
double c = sqrt(-2 * log(r) / r);
return u * c;
}
You can use a Q-Q plot to examine the results and see how well it approximates a real normal distribution (rank your samples 1..x, turn the ranks into proportions of total count of x ie. how many samples, get the z-values and plot them. An upwards straight line is the desired result).

Use std::tr1::normal_distribution.
The std::tr1 namespace is not a part of boost. It's the namespace that contains the library additions from the C++ Technical Report 1 and is available in up to date Microsoft compilers and gcc, independently of boost.

This is how you generate the samples on a modern C++ compiler.
#include <random>
...
std::mt19937 generator;
double mean = 0.0;
double stddev = 1.0;
std::normal_distribution<double> normal(mean, stddev);
cerr << "Normal: " << normal(generator) << endl;

You can use the GSL. Some complete examples are given to demonstrate how to use it.

Have a look on: http://www.cplusplus.com/reference/random/normal_distribution/. It's the simplest way to produce normal distributions.

If you're using C++11, you can use std::normal_distribution:
#include <random>
std::default_random_engine generator;
std::normal_distribution<double> distribution(/*mean=*/0.0, /*stddev=*/1.0);
double randomNumber = distribution(generator);
There are many other distributions you can use to transform the output of the random number engine.

I've followed the definition of the PDF given in http://www.mathworks.com/help/stats/normal-distribution.html and came up with this:
const double DBL_EPS_COMP = 1 - DBL_EPSILON; // DBL_EPSILON is defined in <limits.h>.
inline double RandU() {
return DBL_EPSILON + ((double) rand()/RAND_MAX);
}
inline double RandN2(double mu, double sigma) {
return mu + (rand()%2 ? -1.0 : 1.0)*sigma*pow(-log(DBL_EPS_COMP*RandU()), 0.5);
}
inline double RandN() {
return RandN2(0, 1.0);
}
It is maybe not the best approach, but it's quite simple.

The comp.lang.c FAQ list shares three different ways to easily generate random numbers with a Gaussian distribution.
You may take a look of it: http://c-faq.com/lib/gaussian.html

There exists various algorithms for the inverse cumulative normal distribution. The most popular in quantitative finance are tested on http://chasethedevil.github.io/post/monte-carlo-inverse-cumulative-normal-distribution/
In my opinion, there is not much incentive to use something else than algorithm AS241 from Wichura: it is machine precision, reliable and fast. Bottlenecks are rarely in the Gaussian random number generation.
The top answer here advocates for Box-Müller, you should be aware that it has known deficiencies. I quote https://www.sciencedirect.com/science/article/pii/S0895717710005935:
in the literature, Box–Muller is sometimes regarded as slightly inferior, mainly for two reasons. First, if one applies the Box–Muller method to numbers from a bad linear congruential generator, the transformed numbers provide an extremely poor coverage of the space. Plots of transformed numbers with spiraling tails can be found in many books, most notably in the classic book of Ripley, who was probably the first to make this observation"

Box-Muller implementation:
#include <cstdlib>
#include <cmath>
#include <ctime>
#include <iostream>
using namespace std;
// return a uniformly distributed random number
double RandomGenerator()
{
return ( (double)(rand()) + 1. )/( (double)(RAND_MAX) + 1. );
}
// return a normally distributed random number
double normalRandom()
{
double y1=RandomGenerator();
double y2=RandomGenerator();
return cos(2*3.14*y2)*sqrt(-2.*log(y1));
}
int main(){
double sigma = 82.;
double Mi = 40.;
for(int i=0;i<100;i++){
double x = normalRandom()*sigma+Mi;
cout << " x = " << x << endl;
}
return 0;
}

1) Graphically intuitive way you can generate Gaussian random numbers is by using something similar to the Monte Carlo method. You would generate a random point in a box around the Gaussian curve using your pseudo-random number generator in C. You can calculate if that point is inside or underneath the Gaussian distribution using the equation of the distribution. If that point is inside the Gaussian distribution, then you have got your Gaussian random number as the x value of the point.
This method isn't perfect because technically the Gaussian curve goes on towards infinity, and you couldn't create a box that approaches infinity in the x dimension. But the Guassian curve approaches 0 in the y dimension pretty fast so I wouldn't worry about that. The constraint of the size of your variables in C may be more of a limiting factor to your accuracy.
2) Another way would be to use the Central Limit Theorem which states that when independent random variables are added, they form a normal distribution. Keeping this theorem in mind, you can approximate a Gaussian random number by adding a large amount of independent random variables.
These methods aren't the most practical, but that is to be expected when you don't want to use a preexisting library. Keep in mind this answer is coming from someone with little or no calculus or statistics experience.

Monte Carlo method
The most intuitive way to do this would be to use a monte carlo method. Take a suitable range -X, +X. Larger values of X will result in a more accurate normal distribution, but takes longer to converge.
a. Choose a random number z between -X to X.
b. Keep with a probability of N(z, mean, variance) where N is the gaussian distribution. Drop otherwise and go back to step (a).

Take a look at what I found.
This library uses the Ziggurat algorithm.

Computer is deterministic device. There is no randomness in calculation.
Moreover arithmetic device in CPU can evaluate summ over some finite set of integer numbers (performing evaluation in finite field) and finite set of real rational numbers. And also performed bitwise operations. Math take a deal with more great sets like [0.0, 1.0] with infinite number of points.
You can listen some wire inside of computer with some controller, but would it have uniform distributions? I don't know. But if assumed that it's signal is the the result of accumulate values huge amount of independent random variables then you will receive approximately normal distributed random variable (It was proved in Probability Theory)
There is exist algorithms called - pseudo random generator. As I feeled the purpose of pseudo random generator is to emulate randomness. And the criteria of goodnes is:
- the empirical distribution is converged (in some sense - pointwise, uniform, L2) to theoretical
- values that you receive from random generator are seemed to be idependent. Of course it's not true from 'real point of view', but we assume it's true.
One of the popular method - you can summ 12 i.r.v with uniform distributions....But to be honest during derivation Central Limit Theorem with helping of Fourier Transform, Taylor Series, it is neededed to have n->+inf assumptions couple times. So for example theoreticaly - Personally I don't undersand how people perform summ of 12 i.r.v. with uniform distribution.
I had probility theory in university. And particulary for me it is just a math question. In university I saw the following model:
double generateUniform(double a, double b)
{
return uniformGen.generateReal(a, b);
}
double generateRelei(double sigma)
{
return sigma * sqrt(-2 * log(1.0 - uniformGen.generateReal(0.0, 1.0 -kEps)));
}
double generateNorm(double m, double sigma)
{
double y2 = generateUniform(0.0, 2 * kPi);
double y1 = generateRelei(1.0);
double x1 = y1 * cos(y2);
return sigma*x1 + m;
}
Such way how todo it was just an example, I guess it exist another ways to implement it.
Provement that it is correct can be found in this book
"Moscow, BMSTU, 2004: XVI Probability Theory, Example 6.12, p.246-247" of Krishchenko Alexander Petrovich ISBN 5-7038-2485-0
Unfortunately I don't know about existence of translation of this book into English.

Related

Which approximation algorithm is used for sin() by compilers? [duplicate]

I've been poring through .NET disassemblies and the GCC source code, but can't seem to find anywhere the actual implementation of sin() and other math functions... they always seem to be referencing something else.
Can anyone help me find them? I feel like it's unlikely that ALL hardware that C will run on supports trig functions in hardware, so there must be a software algorithm somewhere, right?
I'm aware of several ways that functions can be calculated, and have written my own routines to compute functions using taylor series for fun. I'm curious about how real, production languages do it, since all of my implementations are always several orders of magnitude slower, even though I think my algorithms are pretty clever (obviously they're not).
In GNU libm, the implementation of sin is system-dependent. Therefore you can find the implementation, for each platform, somewhere in the appropriate subdirectory of sysdeps.
One directory includes an implementation in C, contributed by IBM. Since October 2011, this is the code that actually runs when you call sin() on a typical x86-64 Linux system. It is apparently faster than the fsin assembly instruction. Source code: sysdeps/ieee754/dbl-64/s_sin.c, look for __sin (double x).
This code is very complex. No one software algorithm is as fast as possible and also accurate over the whole range of x values, so the library implements several different algorithms, and its first job is to look at x and decide which algorithm to use.
When x is very very close to 0, sin(x) == x is the right answer.
A bit further out, sin(x) uses the familiar Taylor series. However, this is only accurate near 0, so...
When the angle is more than about 7°, a different algorithm is used, computing Taylor-series approximations for both sin(x) and cos(x), then using values from a precomputed table to refine the approximation.
When |x| > 2, none of the above algorithms would work, so the code starts by computing some value closer to 0 that can be fed to sin or cos instead.
There's yet another branch to deal with x being a NaN or infinity.
This code uses some numerical hacks I've never seen before, though for all I know they might be well-known among floating-point experts. Sometimes a few lines of code would take several paragraphs to explain. For example, these two lines
double t = (x * hpinv + toint);
double xn = t - toint;
are used (sometimes) in reducing x to a value close to 0 that differs from x by a multiple of π/2, specifically xn × π/2. The way this is done without division or branching is rather clever. But there's no comment at all!
Older 32-bit versions of GCC/glibc used the fsin instruction, which is surprisingly inaccurate for some inputs. There's a fascinating blog post illustrating this with just 2 lines of code.
fdlibm's implementation of sin in pure C is much simpler than glibc's and is nicely commented. Source code: fdlibm/s_sin.c and fdlibm/k_sin.c
Functions like sine and cosine are implemented in microcode inside microprocessors. Intel chips, for example, have assembly instructions for these. A C compiler will generate code that calls these assembly instructions. (By contrast, a Java compiler will not. Java evaluates trig functions in software rather than hardware, and so it runs much slower.)
Chips do not use Taylor series to compute trig functions, at least not entirely. First of all they use CORDIC, but they may also use a short Taylor series to polish up the result of CORDIC or for special cases such as computing sine with high relative accuracy for very small angles. For more explanation, see this StackOverflow answer.
OK kiddies, time for the pros....
This is one of my biggest complaints with inexperienced software engineers. They come in calculating transcendental functions from scratch (using Taylor's series) as if nobody had ever done these calculations before in their lives. Not true. This is a well defined problem and has been approached thousands of times by very clever software and hardware engineers and has a well defined solution.
Basically, most of the transcendental functions use Chebyshev Polynomials to calculate them. As to which polynomials are used depends on the circumstances. First, the bible on this matter is a book called "Computer Approximations" by Hart and Cheney. In that book, you can decide if you have a hardware adder, multiplier, divider, etc, and decide which operations are fastest. e.g. If you had a really fast divider, the fastest way to calculate sine might be P1(x)/P2(x) where P1, P2 are Chebyshev polynomials. Without the fast divider, it might be just P(x), where P has much more terms than P1 or P2....so it'd be slower. So, first step is to determine your hardware and what it can do. Then you choose the appropriate combination of Chebyshev polynomials (is usually of the form cos(ax) = aP(x) for cosine for example, again where P is a Chebyshev polynomial). Then you decide what decimal precision you want. e.g. if you want 7 digits precision, you look that up in the appropriate table in the book I mentioned, and it will give you (for precision = 7.33) a number N = 4 and a polynomial number 3502. N is the order of the polynomial (so it's p4.x^4 + p3.x^3 + p2.x^2 + p1.x + p0), because N=4. Then you look up the actual value of the p4,p3,p2,p1,p0 values in the back of the book under 3502 (they'll be in floating point). Then you implement your algorithm in software in the form:
(((p4.x + p3).x + p2).x + p1).x + p0
....and this is how you'd calculate cosine to 7 decimal places on that hardware.
Note that most hardware implementations of transcendental operations in an FPU usually involve some microcode and operations like this (depends on the hardware).
Chebyshev polynomials are used for most transcendentals but not all. e.g. Square root is faster to use a double iteration of Newton raphson method using a lookup table first.
Again, that book "Computer Approximations" will tell you that.
If you plan on implmementing these functions, I'd recommend to anyone that they get a copy of that book. It really is the bible for these kinds of algorithms.
Note that there are bunches of alternative means for calculating these values like cordics, etc, but these tend to be best for specific algorithms where you only need low precision. To guarantee the precision every time, the chebyshev polynomials are the way to go. Like I said, well defined problem. Has been solved for 50 years now.....and thats how it's done.
Now, that being said, there are techniques whereby the Chebyshev polynomials can be used to get a single precision result with a low degree polynomial (like the example for cosine above). Then, there are other techniques to interpolate between values to increase the accuracy without having to go to a much larger polynomial, such as "Gal's Accurate Tables Method". This latter technique is what the post referring to the ACM literature is referring to. But ultimately, the Chebyshev Polynomials are what are used to get 90% of the way there.
Enjoy.
For sin specifically, using Taylor expansion would give you:
sin(x) := x - x^3/3! + x^5/5! - x^7/7! + ... (1)
you would keep adding terms until either the difference between them is lower than an accepted tolerance level or just for a finite amount of steps (faster, but less precise). An example would be something like:
float sin(float x)
{
float res=0, pow=x, fact=1;
for(int i=0; i<5; ++i)
{
res+=pow/fact;
pow*=-1*x*x;
fact*=(2*(i+1))*(2*(i+1)+1);
}
return res;
}
Note: (1) works because of the aproximation sin(x)=x for small angles. For bigger angles you need to calculate more and more terms to get acceptable results.
You can use a while argument and continue for a certain accuracy:
double sin (double x){
int i = 1;
double cur = x;
double acc = 1;
double fact= 1;
double pow = x;
while (fabs(acc) > .00000001 && i < 100){
fact *= ((2*i)*(2*i+1));
pow *= -1 * x*x;
acc = pow / fact;
cur += acc;
i++;
}
return cur;
}
Concerning trigonometric function like sin(), cos(),tan() there has been no mention, after 5 years, of an important aspect of high quality trig functions: Range reduction.
An early step in any of these functions is to reduce the angle, in radians, to a range of a 2*π interval. But π is irrational so simple reductions like x = remainder(x, 2*M_PI) introduce error as M_PI, or machine pi, is an approximation of π. So, how to do x = remainder(x, 2*π)?
Early libraries used extended precision or crafted programming to give quality results but still over a limited range of double. When a large value was requested like sin(pow(2,30)), the results were meaningless or 0.0 and maybe with an error flag set to something like TLOSS total loss of precision or PLOSS partial loss of precision.
Good range reduction of large values to an interval like -π to π is a challenging problem that rivals the challenges of the basic trig function, like sin(), itself.
A good report is Argument reduction for huge arguments: Good to the last bit (1992). It covers the issue well: discusses the need and how things were on various platforms (SPARC, PC, HP, 30+ other) and provides a solution algorithm the gives quality results for all double from -DBL_MAX to DBL_MAX.
If the original arguments are in degrees, yet may be of a large value, use fmod() first for improved precision. A good fmod() will introduce no error and so provide excellent range reduction.
// sin(degrees2radians(x))
sin(degrees2radians(fmod(x, 360.0))); // -360.0 < fmod(x,360) < +360.0
Various trig identities and remquo() offer even more improvement. Sample: sind()
Yes, there are software algorithms for calculating sin too. Basically, calculating these kind of stuff with a digital computer is usually done using numerical methods like approximating the Taylor series representing the function.
Numerical methods can approximate functions to an arbitrary amount of accuracy and since the amount of accuracy you have in a floating number is finite, they suit these tasks pretty well.
Use Taylor series and try to find relation between terms of the series so you don't calculate things again and again
Here is an example for cosinus:
double cosinus(double x, double prec)
{
double t, s ;
int p;
p = 0;
s = 1.0;
t = 1.0;
while(fabs(t/s) > prec)
{
p++;
t = (-t * x * x) / ((2 * p - 1) * (2 * p));
s += t;
}
return s;
}
using this we can get the new term of the sum using the already used one (we avoid the factorial and x2p)
It is a complex question. Intel-like CPU of the x86 family have a hardware implementation of the sin() function, but it is part of the x87 FPU and not used anymore in 64-bit mode (where SSE2 registers are used instead). In that mode, a software implementation is used.
There are several such implementations out there. One is in fdlibm and is used in Java. As far as I know, the glibc implementation contains parts of fdlibm, and other parts contributed by IBM.
Software implementations of transcendental functions such as sin() typically use approximations by polynomials, often obtained from Taylor series.
Chebyshev polynomials, as mentioned in another answer, are the polynomials where the largest difference between the function and the polynomial is as small as possible. That is an excellent start.
In some cases, the maximum error is not what you are interested in, but the maximum relative error. For example for the sine function, the error near x = 0 should be much smaller than for larger values; you want a small relative error. So you would calculate the Chebyshev polynomial for sin x / x, and multiply that polynomial by x.
Next you have to figure out how to evaluate the polynomial. You want to evaluate it in such a way that the intermediate values are small and therefore rounding errors are small. Otherwise the rounding errors might become a lot larger than errors in the polynomial. And with functions like the sine function, if you are careless then it may be possible that the result that you calculate for sin x is greater than the result for sin y even when x < y. So careful choice of the calculation order and calculation of upper bounds for the rounding error are needed.
For example, sin x = x - x^3/6 + x^5 / 120 - x^7 / 5040... If you calculate naively sin x = x * (1 - x^2/6 + x^4/120 - x^6/5040...), then that function in parentheses is decreasing, and it will happen that if y is the next larger number to x, then sometimes sin y will be smaller than sin x. Instead, calculate sin x = x - x^3 * (1/6 - x^2 / 120 + x^4/5040...) where this cannot happen.
When calculating Chebyshev polynomials, you usually need to round the coefficients to double precision, for example. But while a Chebyshev polynomial is optimal, the Chebyshev polynomial with coefficients rounded to double precision is not the optimal polynomial with double precision coefficients!
For example for sin (x), where you need coefficients for x, x^3, x^5, x^7 etc. you do the following: Calculate the best approximation of sin x with a polynomial (ax + bx^3 + cx^5 + dx^7) with higher than double precision, then round a to double precision, giving A. The difference between a and A would be quite large. Now calculate the best approximation of (sin x - Ax) with a polynomial (b x^3 + cx^5 + dx^7). You get different coefficients, because they adapt to the difference between a and A. Round b to double precision B. Then approximate (sin x - Ax - Bx^3) with a polynomial cx^5 + dx^7 and so on. You will get a polynomial that is almost as good as the original Chebyshev polynomial, but much better than Chebyshev rounded to double precision.
Next you should take into account the rounding errors in the choice of polynomial. You found a polynomial with minimum error in the polynomial ignoring rounding error, but you want to optimise polynomial plus rounding error. Once you have the Chebyshev polynomial, you can calculate bounds for the rounding error. Say f (x) is your function, P (x) is the polynomial, and E (x) is the rounding error. You don't want to optimise | f (x) - P (x) |, you want to optimise | f (x) - P (x) +/- E (x) |. You will get a slightly different polynomial that tries to keep the polynomial errors down where the rounding error is large, and relaxes the polynomial errors a bit where the rounding error is small.
All this will get you easily rounding errors of at most 0.55 times the last bit, where +,-,*,/ have rounding errors of at most 0.50 times the last bit.
The actual implementation of library functions is up to the specific compiler and/or library provider. Whether it's done in hardware or software, whether it's a Taylor expansion or not, etc., will vary.
I realize that's absolutely no help.
There's nothing like hitting the source and seeing how someone has actually done it in a library in common use; let's look at one C library implementation in particular. I chose uLibC.
Here's the sin function:
http://git.uclibc.org/uClibc/tree/libm/s_sin.c
which looks like it handles a few special cases, and then carries out some argument reduction to map the input to the range [-pi/4,pi/4], (splitting the argument into two parts, a big part and a tail) before calling
http://git.uclibc.org/uClibc/tree/libm/k_sin.c
which then operates on those two parts.
If there is no tail, an approximate answer is generated using a polynomial of degree 13.
If there is a tail, you get a small corrective addition based on the principle that sin(x+y) = sin(x) + sin'(x')y
They are typically implemented in software and will not use the corresponding hardware (that is, aseembly) calls in most cases. However, as Jason pointed out, these are implementation specific.
Note that these software routines are not part of the compiler sources, but will rather be found in the correspoding library such as the clib, or glibc for the GNU compiler. See http://www.gnu.org/software/libc/manual/html_mono/libc.html#Trig-Functions
If you want greater control, you should carefully evaluate what you need exactly. Some of the typical methods are interpolation of look-up tables, the assembly call (which is often slow), or other approximation schemes such as Newton-Raphson for square roots.
If you want an implementation in software, not hardware, the place to look for a definitive answer to this question is Chapter 5 of Numerical Recipes. My copy is in a box, so I can't give details, but the short version (if I remember this right) is that you take tan(theta/2) as your primitive operation and compute the others from there. The computation is done with a series approximation, but it's something that converges much more quickly than a Taylor series.
Sorry I can't rembember more without getting my hand on the book.
Whenever such a function is evaluated, then at some level there is most likely either:
A table of values which is interpolated (for fast, inaccurate applications - e.g. computer graphics)
The evaluation of a series that converges to the desired value --- probably not a taylor series, more likely something based on a fancy quadrature like Clenshaw-Curtis.
If there is no hardware support then the compiler probably uses the latter method, emitting only assembler code (with no debug symbols), rather than using a c library --- making it tricky for you to track the actual code down in your debugger.
If you want to look at the actual GNU implementation of those functions in C, check out the latest trunk of glibc. See the GNU C Library.
As many people pointed out, it is implementation dependent. But as far as I understand your question, you were interested in a real software implemetnation of math functions, but just didn't manage to find one. If this is the case then here you are:
Download glibc source code from http://ftp.gnu.org/gnu/glibc/
Look at file dosincos.c located in unpacked glibc root\sysdeps\ieee754\dbl-64 folder
Similarly you can find implementations of the rest of the math library, just look for the file with appropriate name
You may also have a look at the files with the .tbl extension, their contents is nothing more than huge tables of precomputed values of different functions in a binary form. That is why the implementation is so fast: instead of computing all the coefficients of whatever series they use they just do a quick lookup, which is much faster. BTW, they do use Tailor series to calculate sine and cosine.
I hope this helps.
I'll try to answer for the case of sin() in a C program, compiled with GCC's C compiler on a current x86 processor (let's say a Intel Core 2 Duo).
In the C language the Standard C Library includes common math functions, not included in the language itself (e.g. pow, sin and cos for power, sine, and cosine respectively). The headers of which are included in math.h.
Now on a GNU/Linux system, these libraries functions are provided by glibc (GNU libc or GNU C Library). But the GCC compiler wants you to link to the math library (libm.so) using the -lm compiler flag to enable usage of these math functions. I'm not sure why it isn't part of the standard C library. These would be a software version of the floating point functions, or "soft-float".
Aside: The reason for having the math functions separate is historic, and was merely intended to reduce the size of executable programs in very old Unix systems, possibly before shared libraries were available, as far as I know.
Now the compiler may optimize the standard C library function sin() (provided by libm.so) to be replaced with an call to a native instruction to your CPU/FPU's built-in sin() function, which exists as an FPU instruction (FSIN for x86/x87) on newer processors like the Core 2 series (this is correct pretty much as far back as the i486DX). This would depend on optimization flags passed to the gcc compiler. If the compiler was told to write code that would execute on any i386 or newer processor, it would not make such an optimization. The -mcpu=486 flag would inform the compiler that it was safe to make such an optimization.
Now if the program executed the software version of the sin() function, it would do so based on a CORDIC (COordinate Rotation DIgital Computer) or BKM algorithm, or more likely a table or power-series calculation which is commonly used now to calculate such transcendental functions. [Src: http://en.wikipedia.org/wiki/Cordic#Application]
Any recent (since 2.9x approx.) version of gcc also offers a built-in version of sin, __builtin_sin() that it will used to replace the standard call to the C library version, as an optimization.
I'm sure that is as clear as mud, but hopefully gives you more information than you were expecting, and lots of jumping off points to learn more yourself.
Don't use Taylor series. Chebyshev polynomials are both faster and more accurate, as pointed out by a couple of people above. Here is an implementation (originally from the ZX Spectrum ROM): https://albertveli.wordpress.com/2015/01/10/zx-sine/
Computing sine/cosine/tangent is actually very easy to do through code using the Taylor series. Writing one yourself takes like 5 seconds.
The whole process can be summed up with this equation here:
Here are some routines I wrote for C:
double _pow(double a, double b) {
double c = 1;
for (int i=0; i<b; i++)
c *= a;
return c;
}
double _fact(double x) {
double ret = 1;
for (int i=1; i<=x; i++)
ret *= i;
return ret;
}
double _sin(double x) {
double y = x;
double s = -1;
for (int i=3; i<=100; i+=2) {
y+=s*(_pow(x,i)/_fact(i));
s *= -1;
}
return y;
}
double _cos(double x) {
double y = 1;
double s = -1;
for (int i=2; i<=100; i+=2) {
y+=s*(_pow(x,i)/_fact(i));
s *= -1;
}
return y;
}
double _tan(double x) {
return (_sin(x)/_cos(x));
}
Improved version of code from Blindy's answer
#define EPSILON .0000000000001
// this is smallest effective threshold, at least on my OS (WSL ubuntu 18)
// possibly because factorial part turns 0 at some point
// and it happens faster then series element turns 0;
// validation was made against sin() from <math.h>
double ft_sin(double x)
{
int k = 2;
double r = x;
double acc = 1;
double den = 1;
double num = x;
// precision drops rapidly when x is not close to 0
// so move x to 0 as close as possible
while (x > PI)
x -= PI;
while (x < -PI)
x += PI;
if (x > PI / 2)
return (ft_sin(PI - x));
if (x < -PI / 2)
return (ft_sin(-PI - x));
// not using fabs for performance reasons
while (acc > EPSILON || acc < -EPSILON)
{
num *= -x * x;
den *= k * (k + 1);
acc = num / den;
r += acc;
k += 2;
}
return (r);
}
The essence of how it does this lies in this excerpt from Applied Numerical Analysis by Gerald Wheatley:
When your software program asks the computer to get a value of
or , have you wondered how it can get the
values if the most powerful functions it can compute are polynomials?
It doesnt look these up in tables and interpolate! Rather, the
computer approximates every function other than polynomials from some
polynomial that is tailored to give the values very accurately.
A few points to mention on the above is that some algorithms do infact interpolate from a table, albeit only for the first few iterations. Also note how it mentions that computers utilise approximating polynomials without specifying which type of approximating polynomial. As others in the thread have pointed out, Chebyshev polynomials are more efficient than Taylor polynomials in this case.
if you want sin then
__asm__ __volatile__("fsin" : "=t"(vsin) : "0"(xrads));
if you want cos then
__asm__ __volatile__("fcos" : "=t"(vcos) : "0"(xrads));
if you want sqrt then
__asm__ __volatile__("fsqrt" : "=t"(vsqrt) : "0"(value));
so why use inaccurate code when the machine instructions will do?

How to sample from a normal distribution restricted to a certain interval, C++ implementation?

With this function I can sample from a normal distribution. I was wondering how could I sample efficiently from a normal distribution restricted to a certain interval [a,b]. My trivial approach would be to sample from the normal distribution and then keep the value if it belongs to a certain interval, otherwise re-sample. However would probably discards many values before I get a suitable one.
I could also approximate the normal distribution using a triangular distrubution, however I don't think this would be accurate enough.
I could also try to work on the cumulative function, but probably this would be slow as well. Is there any efficient approach to the problem?
Thx
I'm assuming you know how to transform to and from standard normal with shifting by μ and scaling by σ.
Option 1, as you said, is acceptance/rejection. Generate normals as usual, reject them if they're outside the range [a, b]. It's not as inefficient as you might think. If p = P{a < Z < b}, then the number of trials required follows a geometric distribution with parameter p and the expected number of attempts before accepting a value is 1/p.
Option 2 is to use an inverse Gaussian function, such as the one in boost. Calculate lo = Φ(a) and hi = Φ(b), the probabilities of your normal being below a and b, respectively. Then generate U distributed uniformly between lo and hi, and crank the resulting set of U's through the inverse Gaussian function and rescale to get outcomes with the desired truncated distribution.
The normal distribution is an integral, see the formula:
std::cout << "riemann_midpnt_sum = " << 1 / (sqrt(2*PI)) * riemann_mid_point_sum(fctn, -1, 1.0, 100) << '\n';
// where fctn is the function inside the integral
double fctn(double x) {
return exp(-(x*x)/2);
}
output: "riemann_midpnt_sum = 0.682698"
This calculates the normal distribution (standard) from -1 to 1.
This is using a riemman sum approximate the integral. You can take the riemman sum from here
You could have a look at the implementation of the normal dist function in your standard library (e.g., https://gcc.gnu.org/onlinedocs/gcc-4.6.3/libstdc++/api/a00277.html), and figure out a way to re-implement this with your constraint.
It might be tricky to understand the template-heavy library code, but if you really need speed then the trivial approach is not well suited, particularly if your interval is quite small.

How to generate a massive amount of high quality Random Numbers?

I'm working on a random walk simulation of particles moving in a lattice. For that reason I must create a massive amount of random numbers, about 10^12 and above. Currently I'm using the possibilities C++11 provides with <random>. When profiling my program, I see that a major amount of time is spent in <random>. The vast majority of those numbers are between 0 and 1, evenly distributed. Here a then I need a number from a binomial distribution. But the focus lies on the 0..1 numbers.
The question is: What can I do to reduce the CPU time needed to generate these numbers and what would the impact be on their quality?
As you can see, I tried different engines, but that had no big effect on CPU time. Further, what is the difference between my uniform01(gen) and generate_canonical<double,numeric_limits<double>::digits>(gen) anyhow?
Edit: Reading through the answers I conclude that there is not THE ideal solution for my problem. Thus I decided to first make my program multi threading capable and run multiple RNG in different threads (seeded with one random_device number + an thread individual increment). For the time being this seams to be the most unavoidable step (multi threading would be required anyhow). As a further step, pending on exact requirements I consider switching to the suggested Intel RNG or to Thrust. Meaning that my RNG implementation should not be to complex, which, currently is is not. But for now I like to focus on the physical correctness of my model and not on programming stuff, this comes as soon as the output of my program is physically correct.
Thrust
Concerning Intel RNG
Here is what I do currently:
class Generator {
public:
Generator();
virtual ~Generator();
double rand01(); //random number [0,1)
int binomial(int n, double p); //binomial distribution with n samples with probability p
private:
std::random_device randev; //seed
/*Engines*/
std::mt19937_64 gen;
//std::mt19937 gen;
//std::default_random_engine gen;
/*Distributions*/
std::uniform_real_distribution<double> uniform01;
std::binomial_distribution<> binomialdist;
};
Generator::Generator() : randev(), gen(randev()), uniform01(0.,1.), binomial(1,1.) {
}
Generator::~Generator() { }
double Generator::rand01() {
//return uniform01(gen);
return generate_canonical<double,numeric_limits<double>::digits>(gen);
}
int Generator::binomialdist(int n, double p) {
binomial.param(binomial_distribution<>::param_type(n,p));
return binomial(gen);
}
You can pre-process random numbers and use them when you need.
If you need true random numbers I suggest you to use a service like http://www.random.org/ that ensures random numbers calculated by environment ambient instead that some algorithm.
And, speaking about random numbers, you must also check this:
If you need a massive amount of random numbers, and I mean MASSIVE, do a careful search on the internet for IBM's floating point random number generator, published maybe ten years ago. You'll have to buy either a PowerPC machine, or a newer Intel machine with fused multiply-add. They achieved random numbers at a rate of one per cycle per core. So if you bought a new Mac Pro, you could achieve probably 50 billion random numbers per second.
Perhaps instead of using a CPU you could use a GPU to generate many numbers concurrently?
Efficient Random Number Generation and Application Using CUDA
On my i3, the following program runs in about five seconds:
#include <random>
std::mt19937_64 foo;
double drand() {
union {
double d;
long long l;
} x;
x.d = 1.0;
x.l |= foo() & (1LL<<53)-1;
return x.d-1;
}
int main() {
double d;
for (int i = 0; i < 1e9; i++)
d += drand();
printf("%g\n", d);
}
whereas replacing the drand() call with the following results in a program that runs in about ten seconds:
double drand2() {
return std::generate_canonical<double,
std::numeric_limits<double>::digits>(foo);
}
Using the following instead of drand() also results in a program that runs in about ten seconds:
std::uniform_real_distribution<double> uni;
double drand3() {
return uni(foo);
}
Perhaps the hacky drand() above suits your purposes better than the standard solutions..
Task Definition
OP asks to get answer for both the
1. Speed of generation -- assuming a set of 10E+012 random numbers to be "massive"
and
2. Quality of generator -- with a weak assumption that numbers just evenly distributed over some range of values are also random
However, there are more cardinal aspects to be addressed and successfully solved for the real system:
A. Define, whether your system simulation needs to be provided with a guarantee of a repeatability of the sequence of the random numbers for future re-runs of an experiment.
If this is not the case, the re-runs of the simulated experiment will yield principally different results then the randomizer process ( or pre-randomizer and randomized-selector ) need not worry about their re-entrant, state-full mode of operation and will get much simpler implementation.
B. Define, to what level do you need to proof a quality of randomness of the generated random numbers ( or does the generated sets of random numbers have to belong to some specific law of statistic theory ( some known synthetic distributions or truly random with an utmost Kolmogorov complexity of the resulting set of random numbers )). One need not be NSA expert to state that numerical generators of true-random sequences is a very hard issue and has it's computational costs associated with production of high-randomness products.
Hyper-chaotic and true-random sequences are computationally extemely expensive. Using low- or poor-randomness generators is not an option for randomness-quality sensitive applications ( whatever the marketing papers may say, no MIL-STD- or NSA-graded system will ever try this compromised quality in enviroments, where the results indeed matter, so why to settle for less in scientific simulations? Perhaps not a problem if you do not mind to miss so many "unvisited" states of the simulated phenomena ).
C. Verify, how many random numbers does your simulation system need to "consume per [usec]" and whether this design requirement parameter is constant or may get scaled-up by going into multi-threaded, vectorised, Grid-/Cloud-based distributed computation framework.
D. Does your simulation system require to maintain a global or per-thread- or perGrid/CloudNode- individual access management to the pool-of-randomized numbers in case of vectorized or Grid/Cloud-based computational strategy.
Task Solution Approach
Fastest [1] and best [2] solution with [A] and [B] solved and options for [D] is to pre-generate an utmost randomness quality numbers into an adequate access-pool ( and pay an acceptable cost of [C] and [D] on access-policy and access-management controls to re-read from the pool, rather than to re-generate ).

Random numbers from binomial distribution

I need to generate quickly lots of random numbers from binomial distributions for dramatically different trial sizes (most, however, will be small). I was hoping not to have to code an algorithm by hand (see, e.g., this related discussion from November), because I'm a novice programmer and don't like reinventing wheels. It appears Boost does not supply a generator for binomially distributed variates, but TR1 and GSL do. Is there a good reason to choose one over the other, or is it better that I write something customized to my situation? I don't know if this makes sense, but I'll alternate between generating numbers from uniform distributions and binomial distributions throughout the program, and I'd like for them to share the same seed and to minimize overhead. I'd love some advice or examples for what I should be considering.
Boost 1.43 appears to support binomial distributions. You can use boost::variate_generator to connect your source of randomness to the type
of distribution you want to sample from.
So your code might look something like this (Disclaimer: not tested!):
boost::mt19937 rng; // produces randomness out of thin air
// see pseudo-random number generators
const int n = 20;
const double p = 0.5;
boost::binomial<> my_binomial(n,p); // binomial distribution with n=20, p=0.5
// see random number distributions
boost::variate_generator<boost::mt19937&, boost::binomial<> >
next_value(rng, my_binomial); // glues randomness with mapping
int x = next_value(); // simulate flipping a fair coin 20 times
You misunderstand the Boost model - you choose a random number generator type and then a distribution on which to spread the values the RNG produces over. There's a very simple example in this answer, which uses a uniform distribution, but other distributions use the same basic pattern - the generator and the distribution are completely decoupled.

generating a random number within range 0 to n where n can be > RAND_MAX

How can I generate a random number within range 0 to n where n can be > RAND_MAX in c,c++?
Thanks.
split the generation in two phases, then combine the resulting numbers.
Random numbers is a very specialized subject that unless you are a maths junky is very easy to get wrong. So I would advice against building a random number from multiple sources you should use a good library.
I would first look at boost::Random
If that is not suffecient try of this group sci.crypt.random-numbers
Ask the question there they should be able to help.
suppose you want to generate a 64-bit random number, you could do this:
uint64_t n = 0;
for(int i = 0; i < 8; ++i) {
uint64_t x = generate_8bit_random_num();
n = (n << (8 * i)) | x;
}
Of course you could do it 16/32 bits at a time too, but this illustrates the concept.
How you generate that 8/16/32-bit random numbers is up to you. It could be as simple as rand() & 0xff or something better depending on how much you care about the randomness.
Assuming C++, have you tried looking at a decent random number library, like Boost.Random. Otherwise you may have to combine multiple random numbers.
If you're looking for a uniform distribution (or any distribution for that manner) , you must take care that the statistical properties of the output are sufficient for your needs. If you can't use the output of a random number generator directly, you should be very careful trying to combine numbers to achieve your needs.
At a bare minimum you should make sure the distribution is appropriate. If you're looking for a uniform distribution of integers from 0 to M, and you have some uniform random number generator g() to produce outputs that are smaller than M, make sure you do not do one of the following:
add k outputs of g() together until they're large enough (the result is nonuniform)
take r = g() + (g() << 16), then compute r % M (if the range of r is not an even multiple of M, it will weight certain values in the range slightly more than others; the shift-left itself is questionable unless g() outputs a range between 0 and a power of 2 minus 1)
Beyond that, there is the potential for cross-correlation between terms of the sequence (random number generators are supposed to produce independent identically-distributed outputs).
Read The Art of Computer Programming vol. 2 (Knuth) and/or Numerical Recipes and ask questions until you feel confident.
If your implementation has an integer type large enough to hold the result you need, it's generally easier to get a decent distribution by simply using a generator that produces the required range than to try to combine outputs from the smaller generator.
Of course, in most cases, you can just download code for something like the Mersenne Twister or (if you need a cryptographic quality generator) Blum-Blum-Shub, and forget about writing your own.
Do x random numbers (from 0 to RAND_MAX) and add them together, where
x = n % RAND_MAX
Consider a random variable which can take on values {0, 1} with P(0) = P(1) = 0.5. If you want to generate random values between 0 to 2 by summing two independent draws, you will have P(0) = 0.25, P(1) = 0.5 and P(2) = 0.25.
Therefore, use an appropriate library unless you do not care at all about the PDF of the RNG.
See also Chapter 7 in Numerical Recipes. (This is a link to the older edition but that's the one I studied anyway ;-)
There are many ways to do this.
If you are OK with less granularity (higher chance of dupes), then something like (in pseudocode) rand() * n / RAND_MAX will work to spread the values across a larger range. The catch is that in your real code you'll need to avoid overflow, either via casting rand() or n to a large-enough type (e.g. 64-bit int if RAND_MAX is 0xFFFFFFFF) to hold the multiplication result without overflow, or use a multiply-then-divide API (like GNU's MulDiv64 or Win32's MulDiv) which is optimized for this scenario.
If you want granuarity down to each integer, you can call rand() multiple times and append the results. Another answer suggests calling rand() for each 8-bit/16-bit/32-bit chunk depending on size of RAND_MAX.
But, IMHO, the above ideas can rapidly get complicated, inaccurate, or both. Generating random numbers is a solved problem in other libraries, and it's probably much easier to borrow existing code (e.g. from Boost) than try to roll your own. Open source random number generation algorithm in C++? has answers with more links if you want something besides Boost.
[ EDIT: revising after having a busy day... meant to get back and clean up my quick answer this morning, but got pulled away and only getting back now. :-) ]