How does the noise function actually work? - c++

I've looked into the libnoise sources and found the ValueNoise3D function:
double noise::ValueNoise3D (int x, int y, int z, int seed)
{
return 1.0 - ((double)IntValueNoise3D (x, y, z, seed) / 1073741824.0);
}
int noise::IntValueNoise3D (int x, int y, int z, int seed)
{
// All constants are primes and must remain prime in order for this noise
// function to work correctly.
int n = (
X_NOISE_GEN * x
+ Y_NOISE_GEN * y
+ Z_NOISE_GEN * z
+ SEED_NOISE_GEN * seed)
& 0x7fffffff;
n = (n >> 13) ^ n;
return (n * (n * n * 60493 + 19990303) + 1376312589) & 0x7fffffff;
}
But when I look at this, it is magic to me. How does this actually work? Why did the author pick those prime numbers rather than others? Why these equations and not others? How can I understand this?

The libnoise Web site has a good explanation of the mathematics behind this noise function. In particular, with regards to the prime numbers:
These large integers are primes. These integers may be modified as long as they remain prime; non-prime numbers may introduce discernible patterns to the output.
noise::IntValueNoise3D actually operates in two steps: the first step converts the (x, y, z) coordinates to a single integer, and the second step puts this integer through an integer noise function to produce a noise value between 0 and 2147483647 (the final & 0x7fffffff clears the sign bit). noise::ValueNoise3D then converts that integer to a floating-point value between -1 and 1.
As for why noise::IntValueNoise3D performs all those convoluted operations, it basically boils down to the fact that this particular sequence of operations produces a nice, noisy result with no clear pattern visible. This is not the only sequence of operations that could have been used; anything that produces a sufficiently noisy result would have worked.
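The two steps can be looked at in isolation. What follows is a minimal sketch rather than libnoise's own code: the multiplier values are placeholders assumed for illustration (the question shows only the names X_NOISE_GEN and so on), the function names are made up, and unsigned arithmetic is used so that wraparound is well defined.

```cpp
#include <cstdint>

// Placeholder multipliers, assumed for this sketch; the question shows only
// the names X_NOISE_GEN, Y_NOISE_GEN, Z_NOISE_GEN and SEED_NOISE_GEN.
constexpr std::uint32_t kXGen = 1619u;
constexpr std::uint32_t kYGen = 31337u;
constexpr std::uint32_t kZGen = 6971u;
constexpr std::uint32_t kSeedGen = 1013u;

// Step 1: fold (x, y, z, seed) into a single 31-bit integer.
std::uint32_t fold_coordinates(std::int32_t x, std::int32_t y,
                               std::int32_t z, std::int32_t seed) {
    return (kXGen * static_cast<std::uint32_t>(x)
          + kYGen * static_cast<std::uint32_t>(y)
          + kZGen * static_cast<std::uint32_t>(z)
          + kSeedGen * static_cast<std::uint32_t>(seed)) & 0x7fffffffu;
}

// Step 2: scramble that integer; the result stays in [0, 2^31 - 1].
std::uint32_t scramble(std::uint32_t n) {
    n = (n >> 13) ^ n;
    return (n * (n * n * 60493u + 19990303u) + 1376312589u) & 0x7fffffffu;
}

// Map the scrambled integer to a floating-point value in (-1, 1].
double value_noise(std::int32_t x, std::int32_t y,
                   std::int32_t z, std::int32_t seed) {
    return 1.0 - scramble(fold_coordinates(x, y, z, seed)) / 1073741824.0;
}
```

Calling value_noise(1, 2, 3, 0) twice returns the same number both times, which is the point: the function is a deterministic hash of its inputs that merely looks random.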

There is an art to randomness. There are many things that make a pseudorandom number "look good." For a lot of 3D functions, the most important thing in making it "look good" is having a proper-looking frequency distribution. Anything which ensures a good frequency distribution modulo 2^32 will yield very good-looking numbers. Multiplying by a large prime number yields good frequency distributions because it shares no factors with 2^32.

Related

Obtain values from multiple distributions with a single generator roll

I am trying to implement the Alias method, also described here. This is an algorithm which allows sampling from a weighted N-sided die in O(1).
The algorithm calls for the generation of two values:
A uniformly distributed integer i in [0, N)
A uniformly distributed real y in [0, 1)
The paper specifies that these two numbers can be obtained from a single real number x in [0, N). From x one can then derive the two values as:
i = floor(x)
y = x - i
Now, the other implementations that I have seen call for the random number generator two times, one to generate i and one to generate y. Given that I am using a fairly expensive generator (std::mt19937) and that I need to sample many times, I was wondering if there was a better approach in terms of performance, while preserving the quality of the result.
I'm not sure whether using a uniform_real_distribution to generate x makes sense: if N is large, y's distribution is going to get sparser, since doubles are not uniformly spaced. Is there maybe a way to call the engine, get the random bits out, and then generate i and y from them directly?
You are correct, with their method the distribution of y will become less and less uniform with increasing N.
In fact, for N above 2^52 y will be exactly 0, as all numbers above that value are integers for double precision. 2^52 is 4,503,599,627,370,496 (4.5 quadrillion).
It will not matter at all for reasonable values of N though. You should be fine if your N is less than 2^26 (67 million), intuitively. Your die does not have an astronomical number of sides, does it?
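For reference, the single-roll split from the question can be sketched as follows. The helper names split_roll and roll_once are hypothetical, and this is only an illustration of the floor/fraction idea, not a tuned implementation.

```cpp
#include <cmath>
#include <cstdint>
#include <random>
#include <utility>

// Split x in [0, n) into an integer index i = floor(x) and a fraction y = x - i.
std::pair<std::uint64_t, double> split_roll(double x) {
    const double whole = std::floor(x);
    return {static_cast<std::uint64_t>(whole), x - whole};
}

// Draw one real number in [0, n) and split it into (i, y).
template <class Engine>
std::pair<std::uint64_t, double> roll_once(Engine& engine, std::uint64_t n) {
    std::uniform_real_distribution<double> dist(0.0, static_cast<double>(n));
    return split_roll(dist(engine));
}
```

For example, split_roll(3.25) yields i = 3 and y = 0.25, while any input at or above 2^53 yields y = 0, matching the precision limit described above. Code that indexes a table with i may still want to guard against the rare case where rounding makes the distribution return its upper bound.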
I had a similar problem and will tell you how I solved it in my case. It might or might not be applicable to you, but here is the story.
I didn't use any kind of 32-bit RNG; basically, I had no 32-bit platform or software to care about. So I used std::mt19937_64 as the baseline generator: one 64-bit unsigned int per call. Later I tried one of the PCG 64-bit RNGs, which was faster overall with good results.
The top N bits are used directly for selection from the table (the die in your case). You could suffer from modulo bias, so I extended the table to an exact power of 2 (2^10 in my case, 10 bits for index sampling).
The remaining 54 bits were used to get a uniform double random number following S. Vigna's suggestion.
If you need more than 11 bits for the index, you could either live with reduced randomness in the mantissa, or replace double y with a carefully crafted integer comparison.
Along these lines, some pseudocode (not tested!)
#include <cstdint>
#include <iostream>
#include <random>

int main() {
    const std::uint64_t mask = (1ULL << 53) - 1ULL; // selects the low 53 bits
    std::mt19937_64 rng{98765432101ULL};
    for (int k = 0; k != 1000; ++k) {
        const std::uint64_t rv = rng();
        const std::uint64_t idx = rv >> (64 - 10); // only the top 10 bits are needed for the index
        const double y = (rv & mask) * (1.0 / (1ULL << 53)); // low 53 bits fill the mantissa, y in [0, 1)
        std::cout << idx << "," << y << '\n';
    }
}
Reference to S.Vigna integer2double conversion for RNG: http://xoshiro.di.unimi.it/, at the very end of the page

nan output due to maclaurin series expansion of sine, console crashes

Here is my code:
#include <iostream>
#include <cmath>
using namespace std;
int factorial(int);
int main()
{
for(int k = 0; k < 100000; k++)
{
static double sum = 0.0;
double term;
term = (double)pow(-1.0, k) * (double)pow(4.0, 2*k+1) / factorial(2*k+1);
sum = sum + term;
cout << sum << '\n';
}
}
int factorial(int n)
{
if(n == 0)
{
return 1;
}
return n*factorial(n-1);
}
I'm just trying to calculate the value of sine(4) using the Maclaurin expansion of sine. Every value printed to the console reads 'nan'. The console gives an error and shuts down after about 10 seconds. I don't get any errors in the IDE.
There are multiple problems with your approach.
Your factorial function can't return an int. The return value will be way too big, very quickly.
Using pow(-1, value) to get an alternating positive/negative one is inefficient and needlessly fragile. You should pick 1.0 or -1.0 depending on k's parity.
When you sum a long series of terms, you want to sum the terms with the least magnitude first. Otherwise you lose precision, because the bits already occupied by a large running sum limit how small a term it can still absorb. In your case, the power of four is dominated by the factorial, so you sum the highest-magnitude values first. You'd probably get better precision starting from the other end.
Algorithmically, if you're going to raise 4 to the 2k+1 power and then divide by (2k+1)!, you should keep both lists of factors, (4, 4, 4, 4...) and (2,3,4,5,6,7,8,9,....), and simplify both sides. There are plenty of factors to cancel between the numerator and the denominator.
Even with those four fixes, I'm not sure you can get anywhere close to the 100000 target you set without specialized code.
As already stated by others, the intermediate results you will get for large k are magnitudes too large to fit into a double. From a certain k on pow as well as factorial will return infinity. This is simply what happens for very large doubles. And as you then divide one infinity by another you get NaN.
One common trick to deal with too large numbers is using logarithms for intermediate results and only in the end apply the exponential function once.
Some mathematical knowledge of logarithms is required here. To understand what I am doing here you need to know exp(log(x)) == x, log(a^b) == b*log(a), and log(a/b) == log(a) - log(b).
In your case you can rewrite
pow(4, 2*k+1)
to
exp((2*k+1)*log(4))
Then there is still the factorial. The lgamma function can help with factorial(n) == gamma(n+1) and log(factorial(n)) == lgamma(n+1). In short, lgamma gives you the log of a factorial without huge intermediate results.
So summing up, replace
pow(4, 2*k+1) / factorial(2*k+1)
With
exp((2*k+1)*log(4) - lgamma(2*k+2))
This should help you with your NaNs. Also, this should increase performance, as lgamma operates in O(1) whereas your factorial is in O(k).
Note, however, that I still have very little confidence that your result will be numerically accurate.
A double still has a limited precision of roughly 16 decimal digits. Your 100000 iterations are very likely worthless, probably even harmful.
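Putting the logarithm trick together, a minimal sketch might look like the following. The function name maclaurin_sin is made up for this illustration, and it assumes x > 0 so that log(x) is defined; the sign is chosen from the parity of k instead of calling pow.

```cpp
#include <cmath>

// Sum the first `terms` terms of the Maclaurin series for sin(x), assuming x > 0.
// Each term x^(2k+1) / (2k+1)! is computed as exp((2k+1)*log(x) - lgamma(2k+2)),
// so neither the power nor the factorial is ever formed as a huge number.
double maclaurin_sin(double x, int terms) {
    double sum = 0.0;
    const double log_x = std::log(x);
    for (int k = 0; k < terms; ++k) {
        const int n = 2 * k + 1;
        const double magnitude = std::exp(n * log_x - std::lgamma(n + 1.0));
        sum += (k % 2 == 0) ? magnitude : -magnitude; // alternate the sign by parity
    }
    return sum;
}
```

With this form, maclaurin_sin(4.0, 30) already agrees with std::sin(4.0) to many digits, and asking for far more terms no longer produces NaN because the late terms simply underflow to zero.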

Calculating Probability C++ Bernoulli Trials

The program asks the user for the number of times to flip a coin (n; the number of trials).
A success is considered a heads.
The program correctly generates a random number that is either 0 or 1; a 0 counts as heads, i.e. a success.
Then, the program is supposed to output the expected values of getting x amount of heads. For example if the coin was flipped 4 times, what are the following probabilities using the formula
nCk * p^k * (1-p)^(n-k)
Expected 0 heads with n flips: xxx
Expected 1 heads with n flips: xxx
...
Expected n heads with n flips: xxx
When doing this with "larger" numbers, the results come out as weird values. It happens if 15 or 20 is given as the input. I have been getting 0's and negative values where xxx should be.
While debugging, I noticed that nCk comes out negative and incorrect towards the upper values, and I believe this is the issue. I use this formula for my combination:
double combo = fact(n)/fact(r)/fact(n-r);
Here is the pseudocode for my fact function:
long fact(int x)
{
int e; // local counter
factor = 1;
for (e = x; e != 0; e--)
{
factor = factor * e;
}
return factor;
}
Any thoughts? My guess is my factorial or combo functions are exceeding the max values or something.
You haven't mentioned how factor is declared. I think you are getting integer overflows. I suggest you use double: since you are calculating expected values and probabilities, a small loss of precision shouldn't matter much.
Try changing your fact function to:
double fact(double x)
{
int e; // local counter
double factor = 1;
for (e = x; e != 0; e--)
{
factor = factor * e;
}
return factor;
}
EDIT:
Also to calculate nCk, you need not calculate factorials 3 times. You can simply calculate this value in the following way.
if k > n/2, k = n-k.

        n(n-1)(n-2)...(n-k+1)
nCk = -----------------------
            factorial(k)
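The product form above can be sketched in C++ as follows. The names binomial and bernoulli_pmf are hypothetical, and the sketch assumes n is small enough (such as the 15 to 20 flips in the question) that the intermediate products fit in 64 bits.

```cpp
#include <cmath>
#include <cstdint>

// nCk via the multiplicative formula; no full factorial is ever computed.
std::uint64_t binomial(unsigned n, unsigned k) {
    if (k > n) return 0;
    if (k > n - k) k = n - k; // use the symmetry nCk == nC(n-k)
    std::uint64_t result = 1;
    for (unsigned i = 1; i <= k; ++i) {
        // After this step result == C(n - k + i, i), so the division is exact.
        result = result * (n - k + i) / i;
    }
    return result;
}

// Probability of exactly k successes in n trials with success probability p:
// nCk * p^k * (1-p)^(n-k).
double bernoulli_pmf(unsigned n, unsigned k, double p) {
    return static_cast<double>(binomial(n, k))
         * std::pow(p, static_cast<double>(k))
         * std::pow(1.0 - p, static_cast<double>(n - k));
}
```

For a fair coin flipped 4 times, bernoulli_pmf(4, 2, 0.5) gives 0.375, that is, 6 of the 16 equally likely outcomes.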
You're exceeding the maximum value of a long. Factorial grows so quickly that you need the right type of number--what type that is will depend on what values you need.
A long is a signed integer, and as soon as you pass 2^31 (on platforms where long is 32 bits), the value will become negative (it's using 2's complement math).
Using an unsigned long will buy you a little time (one more bit), but for factorial, it's probably not worth it. If your compiler supports long long, then try an "unsigned long long". That will (usually, depends on compiler and CPU) double the number of bits you're using.
You can also try switching to use double. The problem you'll face there is that you'll lose accuracy as the numbers increase. A double is a floating point number, so you'll have a fixed number of significant digits. If your end result is an approximation, this may work okay, but if you need exact values, it won't work.
If none of these solutions will work for you, you may need to resort to using an "infinite precision" math package, which you should be able to search for. You didn't say if you were using C or C++; this is going to be a lot more pleasant with C++ as it will provide a class that acts like a number and that would use standard arithmetic operators.
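To see how little room even a 64-bit unsigned integer buys, here is a small sketch of a factorial that reports overflow instead of silently wrapping. The name checked_factorial is made up for this example, and std::optional requires C++17.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Returns n! if it fits in 64 unsigned bits, or std::nullopt if it would overflow.
std::optional<std::uint64_t> checked_factorial(unsigned n) {
    std::uint64_t result = 1;
    for (unsigned i = 2; i <= n; ++i) {
        // Check before multiplying: result * i must not exceed the maximum value.
        if (result > std::numeric_limits<std::uint64_t>::max() / i) {
            return std::nullopt;
        }
        result *= i;
    }
    return result;
}
```

With this check, 20! (2432902008176640000) is the largest factorial that still fits, and checked_factorial(21) returns no value.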

Optimising code for modular arithmetic

I am trying to calculate below expression for large numbers.
Since the value of this expression will be very large, I just need the value of this expression modulus some prime number. Suppose the value of this expression is x and I choose the prime number 1000000007; I'm looking for x % 1000000007.
Here is my code.
#include<iostream>
#define MOD 1000000007
using namespace std;
int main()
{
unsigned long long A[1001];
A[2]=2;
for(int i=4;i<=1000;i+=2)
{
A[i]=((4*A[i-2])/i)%MOD;
A[i]=(A[i]*(i-1))%MOD;
}
while(1)
{
int N;
cin>>N;
cout<<A[N];
}
}
But even this much optimisation is failing for large values of N. For example if N is 50, the correct output is 605552882, but this gives me 132924730. How can I optimise it further to get the correct output?
Note : I am only considering N as even.
When you do modular arithmetic, there is no such operation as division. Instead, you take the modular inverse of the denominator and multiply. The modular inverse is computed using the extended Euclidean algorithm, discovered by Etienne Bezout in 1779:
# return y such that x * y == 1 (mod m)
function inverse(x, m)
    a, b, u := 0, m, 1
    while x > 0
        q, r := divide(b, x)
        x, a, b, u := b % x, u, x, a - q * u
    if b == 1 return a % m
    error "must be coprime"
The divide function returns both quotient and remainder. All of the assignment operators given above are simultaneous assignment, where all of the right hand sides are computed first, then all of the left hand sides are assigned simultaneously. You can see more about modular arithmetic at my blog.
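As a rough C++ rendering of the pseudocode above, applied to the recurrence from the question: the function names mod_inverse and central_binomial_mod are hypothetical, and the sketch assumes the modulus p is a prime below 2^31 and larger than N, so that every i is invertible and the 64-bit products cannot overflow.

```cpp
#include <cstdint>
#include <stdexcept>

// Returns y in [0, m) such that x * y == 1 (mod m); throws if gcd(x, m) != 1.
std::int64_t mod_inverse(std::int64_t x, std::int64_t m) {
    std::int64_t a = 0, b = m, u = 1;
    x %= m;
    if (x < 0) x += m;
    while (x > 0) {
        const std::int64_t q = b / x;
        const std::int64_t r = b % x;
        const std::int64_t next_u = a - q * u;
        // Simultaneous assignment from the pseudocode, spelled out step by step.
        a = u;
        b = x;
        x = r;
        u = next_u;
    }
    if (b != 1) throw std::invalid_argument("x and m must be coprime");
    return ((a % m) + m) % m; // normalise a possibly negative result
}

// The question's recurrence A[i] = A[i-2] * 4 * (i-1) / i for even i,
// with the division replaced by multiplication by the modular inverse of i.
std::int64_t central_binomial_mod(int n, std::int64_t p) {
    std::int64_t value = 2 % p; // A[2]
    for (int i = 4; i <= n; i += 2) {
        value = value * 4 % p;
        value = value * (i - 1) % p;
        value = value * mod_inverse(i, p) % p;
    }
    return value;
}
```

Under these assumptions, central_binomial_mod(50, 1000000007) returns 605552882, the value the question gives as the correct output for N = 50.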
For starters, no modular division is needed at all; your formula can be rewritten as follows:
N!/((N/2)!^2)
=(1.2.3...N)/((1.2.3...N/2)*(1.2.3...N/2))
=((N/2+1)...N)/(1.2.3...N/2)
OK, now you are dividing a bigger number by a smaller one,
so you can build the result by multiplying the dividend and the divisor step by step
so that both partial results have a similar magnitude.
Any time both numbers are divisible by 2, shift them right;
this will ensure that they do not overflow.
When you reach the end of (N/2)!, continue the multiplication only for the rest.
Any time both partial results are divisible by anything, divide them,
until you are left with a division by 1.
After this you can multiply with modular arithmetic to the end as normal.
For a more advanced approach, see this.
N! and (N/2)! can be decomposed much further than it seems at first glance.
I solved that some time ago;
here is what I found: Fast exact bigint factorial
In short, your terms N! and ((N/2)!)^2 will disappear completely.
Only a simple prime decomposition + 4N <-> 1N correction will remain.
solution:
I. (4N!)=((2N!)^2) . mul(i=all primes<=4N) of [i^sum(j=1,2,3,4,5,...4N>=i^j) of [(4N/(i^j))%2]]
II. (4N)!/((4N/2)!^2) = (4N)!/((2N)!^2)
----------------------------------------
I.=II. (4N)!/((2N)!^2)=mul(i=all primes<=4N) of [i^sum(j=1,2,3,4,5,...4N>=i^j) of [(4N/(i^j))%2]]
The only restriction is that N must be divisible by 4, hence the 4N in all terms.
If N%4!=0, then solve for N-N%4 and correct the result by the missing 1-3 numbers.
Hope it helps.

What is the optimal algorithm for generating an unbiased random integer within a range?

In this StackOverflow question:
Generating random integer from a range
the accepted answer suggests the following formula for generating a random integer in between given min and max, with min and max being included into the range:
output = min + (rand() % (int)(max - min + 1))
But it also says that
This is still slightly biased towards lower numbers ... It's also
possible to extend it so that it removes the bias.
But it doesn't explain why it's biased towards lower numbers or how to remove the bias. So, the question is: is this the best approach to generating a random integer within a (signed) range without relying on anything fancy, just the rand() function, and if it is, how can the bias be removed?
EDIT:
I've just tested the while-loop algorithm suggested by @Joey against floating-point extrapolation:
static const double s_invRandMax = 1.0/((double)RAND_MAX + 1.0);
return min + (int)(((double)(max + 1 - min))*rand()*s_invRandMax);
to see how uniformly "balls" fall into and are distributed among a number of "buckets", with one test for the floating-point extrapolation and another for the while-loop algorithm. But the results varied with the number of "balls" (and "buckets"), so I couldn't easily pick a winner. The working code can be found at this Ideone page. For example, with 10 buckets and 100 balls the maximum deviation from the ideal probability among buckets is smaller for the floating-point extrapolation than for the while-loop algorithm (0.04 and 0.05 respectively), but with 1000 balls the maximum deviation of the while-loop algorithm is smaller (0.024 and 0.011), and with 10000 balls the floating-point extrapolation again does better (0.0034 and 0.0053), and so on without much consistency. The possibility that neither algorithm consistently produces a more uniform distribution than the other makes me lean towards the floating-point extrapolation, since it appears to run faster than the while-loop algorithm. So is it fine to choose the floating-point extrapolation algorithm, or are my tests and conclusions not completely correct?
The problem is that you're doing a modulo operation. This would be no problem if the number of values rand() can return (RAND_MAX + 1) were evenly divisible by your modulus, but usually that is not the case. As a very contrived example, assume rand() can return 11 different values (0 through 10) and your modulus is 3. You'll get the following possible random numbers and the following resulting remainders:
0 1 2 3 4 5 6 7 8 9 10
0 1 2 0 1 2 0 1 2 0 1
As you can see, 0 and 1 are slightly more probable than 2.
One option to solve this is rejection sampling: By disallowing the numbers 9 and 10 above you can cause the resulting distribution to be uniform again. The tricky part is figuring out how to do so efficiently. A very nice example (one that took me two days to understand why it works) can be found in Java's java.util.Random.nextInt(int) method.
The reason why Java's algorithm is a little tricky is that they avoid slow operations like multiplication and division for the check. If you don't care too much you can also do it the naïve way:
int n = (int)(max - min + 1);
int remainder = RAND_MAX % n;
int x, output;
do {
x = rand();
output = x % n;
} while (x >= RAND_MAX - remainder);
return min + output;
EDIT: Corrected a fencepost error in above code, now it works as it should. I also created a little sample program (C#; taking a uniform PRNG for numbers between 0 and 15 and constructing a PRNG for numbers between 0 and 6 from it via various ways):
using System;
class Rand {
static Random r = new Random();
static int Rand16() {
return r.Next(16);
}
static int Rand7Naive() {
return Rand16() % 7;
}
static int Rand7Float() {
return (int)(Rand16() / 16.0 * 7);
}
// corrected
static int Rand7RejectionNaive() {
int n = 7, remainder = 16 % n, x, output;
do {
x = Rand16();
output = x % n;
} while (x >= 16 - remainder);
return output;
}
// adapted to fit the constraints of this example
static int Rand7RejectionJava() {
int n = 7, x, output;
do {
x = Rand16();
output = x % n;
} while (x - output + 6 > 15);
return output;
}
static void Test(Func<int> rand, string name) {
var buckets = new int[7];
for (int i = 0; i < 10000000; i++) buckets[rand()]++;
Console.WriteLine(name);
for (int i = 0; i < 7; i++) Console.WriteLine("{0}\t{1}", i, buckets[i]);
}
static void Main() {
Test(Rand7Naive, "Rand7Naive");
Test(Rand7Float, "Rand7Float");
Test(Rand7RejectionNaive, "Rand7RejectionNaive");
}
}
The result is as follows (pasted into Excel and added conditional coloring of cells so that differences are more apparent):
Now that I fixed my mistake in above rejection sampling it works as it should (before it would bias 0). As you can see, the float method isn't perfect at all, it just distributes the biased numbers differently.
The problem occurs when the number of outputs from the random number generator (RAND_MAX+1) is not evenly divisible by the desired range (max-min+1). Since there will be a consistent mapping from a random number to an output, some outputs will be mapped to more random numbers than others. This is regardless of how the mapping is done - you can use modulo, division, conversion to floating point, whatever voodoo you can come up with, the basic problem remains.
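The point that the mapping method does not matter can be checked by exhaustive counting. The sketch below uses made-up helper names and simply counts, for a generator with `source` equally likely values, how many of them land in each of `n` buckets, once with modulo and once with integer scaling (the integer analogue of the floating-point method).

```cpp
#include <vector>

// Count how many of the values 0..source-1 map to each bucket under x % n.
std::vector<int> modulo_counts(int source, int n) {
    std::vector<int> counts(n, 0);
    for (int x = 0; x < source; ++x) ++counts[x % n];
    return counts;
}

// Count how many of the values 0..source-1 map to each bucket under
// floor(x * n / source), i.e. scaling instead of taking a remainder.
std::vector<int> scaling_counts(int source, int n) {
    std::vector<int> counts(n, 0);
    for (int x = 0; x < source; ++x) ++counts[x * n / source];
    return counts;
}
```

For 16 source values and 7 buckets, modulo gives the counts 3, 3, 2, 2, 2, 2, 2 and scaling gives 3, 2, 2, 3, 2, 2, 2: the same two over-represented buckets' worth of bias, just placed differently.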
The magnitude of the problem is very small, and undemanding applications can generally get away with ignoring it. The smaller the range and the larger RAND_MAX is, the less pronounced the effect will be.
I took your example program and tweaked it a bit. First I created a special version of rand that only has a range of 0-255, to better demonstrate the effect. I made a few tweaks to rangeRandomAlg2. Finally I changed the number of "balls" to 1000000 to improve the consistency. You can see the results here: http://ideone.com/4P4HY
Notice that the floating-point version produces two tightly grouped probabilities, near either 0.101 or 0.097, nothing in between. This is the bias in action.
I think calling this "Java's algorithm" is a bit misleading - I'm sure it's much older than Java.
int rangeRandomAlg2 (int min, int max)
{
int n = max - min + 1;
int remainder = RAND_MAX % n;
int x;
do
{
x = rand();
} while (x >= RAND_MAX - remainder);
return min + x % n;
}
It's easy to see why this algorithm produces a biased sample. Suppose your rand() function returns uniform integers from the set {0, 1, 2, 3, 4}. If I want to use this to generate a random bit 0 or 1, I would say rand() % 2. The set {0, 2, 4} gives me 0, and the set {1, 3} gives me 1 -- so clearly I sample 0 with 60% and 1 with 40% likelihood, not uniform at all!
To fix this you have to either make sure that your desired range divides the range of the random number generator, or otherwise discard the result whenever the random number generator returns a number that's larger than the largest possible multiple of the target range.
In the above example, the target range is 2, the largest multiple that fits into the random generation range is 4, so we discard any sample that is not in the set {0, 1, 2, 3} and roll again.
By far the easiest solution is std::uniform_int_distribution<int>(min, max).
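A minimal usage sketch, assuming a std::mt19937 engine (any standard engine would do) and a hypothetical wrapper name random_in_range:

```cpp
#include <random>

// Returns a uniformly distributed integer in the closed range [min, max].
int random_in_range(std::mt19937& engine, int min, int max) {
    std::uniform_int_distribution<int> dist(min, max);
    return dist(engine);
}
```

A caller would typically create one engine, for example std::mt19937 engine{std::random_device{}()}, and reuse it for every call rather than constructing a new one each time.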
You have touched on two points involving a random integer algorithm: Is it optimal, and is it unbiased?
Optimal
There are many ways to define an "optimal" algorithm. Here we look at "optimal" algorithms in terms of the number of random bits it uses on average. In this sense, rand is a poor method to use for randomly generated numbers because, among other problems with rand(), it need not necessarily produce random bits (because RAND_MAX is not exactly specified). Instead, we will assume we have a "true" random generator that can produce unbiased and independent random bits.
In 1976, D. E. Knuth and A. C. Yao showed that any algorithm that produces random integers with a given probability, using only random bits, can be represented as a binary tree, where random bits indicate which way to traverse the tree and each leaf (endpoint) corresponds to an outcome. (Knuth and Yao, "The complexity of nonuniform random number generation", in Algorithms and Complexity, 1976.) They also gave bounds on the number of bits a given algorithm will need on average for this task. In this case, an optimal algorithm to generate integers in [0, n) uniformly, will need at least log2(n) and at most log2(n) + 2 bits on average.
There are many examples of optimal algorithms in this sense. See the following answer of mine:
How to generate a random integer in the range [0,n] from a stream of random bits without wasting bits?
Unbiased
However, any optimal integer generator that is also unbiased will, in general, run forever in the worst case, as also shown by Knuth and Yao. Going back to the binary tree, each one of the n outcomes labels leaves in the binary tree so that each integer in [0, n) can occur with probability 1/n. But if 1/n has a non-terminating binary expansion (which will be the case if n is not a power of 2), this binary tree will necessarily either:
have an "infinite" depth, or
include "rejection" leaves at the end of the tree.
In either case, the algorithm won't run in constant time and will run forever in the worst case. (On the other hand, when n is a power of 2, the optimal binary tree will have a finite depth and no rejection nodes.)
And for general n, there is no way to "fix" this worst case time complexity without introducing bias. For instance, modulo reductions (including the min + (rand() % (int)(max - min + 1)) in your question) are equivalent to a binary tree in which rejection leaves are replaced with labeled outcomes — but since there are more possible outcomes than rejection leaves, only some of the outcomes can take the place of the rejection leaves, introducing bias. The same kind of binary tree — and the same kind of bias — results if you stop rejecting after a set number of iterations. (However, this bias may be negligible depending on the application. There are also security aspects to random integer generation, which are too complicated to discuss in this answer.)
Without loss of generality, the problem of generating random integers on [a, b] can be reduced to the problem of generating random integers on [0, s). The state of the art for generating random integers on a bounded range from a uniform PRNG is represented by the following recent publication:
Daniel Lemire,"Fast Random Integer Generation in an Interval." ACM Trans. Model. Comput. Simul. 29, 1, Article 3 (January 2019) (ArXiv draft)
Lemire shows that his algorithm provides unbiased results and, motivated by the growing popularity of very fast high-quality PRNGs such as Melissa O'Neill's PCG generators, shows how the results can be computed fast, avoiding slow division operations almost all of the time.
An exemplary ISO-C implementation of his algorithm is shown in randint() below. Here I demonstrate it in conjunction with George Marsaglia's older KISS64 PRNG. For performance reasons, the required 64×64→128 bit unsigned multiplication is typically best implemented via machine-specific intrinsics or inline assembly that map directly to appropriate hardware instructions.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
/* PRNG state */
typedef struct Prng_T *Prng_T;
/* Returns uniformly distributed integers in [0, 2**64-1] */
uint64_t random64 (Prng_T);
/* Multiplies two 64-bit factors into a 128-bit product */
void umul64wide (uint64_t, uint64_t, uint64_t *, uint64_t *);
/* Generate in bias-free manner a random integer in [0, s) with Lemire's fast
algorithm that uses integer division only rarely. s must be in [0, 2**64-1].
Daniel Lemire, "Fast Random Integer Generation in an Interval," ACM Trans.
Model. Comput. Simul. 29, 1, Article 3 (January 2019)
*/
uint64_t randint (Prng_T prng, uint64_t s)
{
uint64_t x, h, l, t;
x = random64 (prng);
umul64wide (x, s, &h, &l);
if (l < s) {
t = (0 - s) % s;
while (l < t) {
x = random64 (prng);
umul64wide (x, s, &h, &l);
}
}
return h;
}
#define X86_INLINE_ASM (0)
/* Multiply two 64-bit unsigned integers into a 128-bit unsigned product. Return
the least significant 64 bits of the product to the location pointed to by
lo, and the most significant 64 bits of the product to the location pointed
to by hi.
*/
void umul64wide (uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
#if X86_INLINE_ASM
uint64_t l, h;
__asm__ (
"movq %2, %%rax;\n\t" // rax = a
"mulq %3;\n\t" // rdx:rax = a * b
"movq %%rax, %0;\n\t" // l = (a * b)<63:0>
"movq %%rdx, %1;\n\t" // h = (a * b)<127:64>
: "=r"(l), "=r"(h)
: "r"(a), "r"(b)
: "%rax", "%rdx");
*lo = l;
*hi = h;
#else // X86_INLINE_ASM
uint64_t a_lo = (uint64_t)(uint32_t)a;
uint64_t a_hi = a >> 32;
uint64_t b_lo = (uint64_t)(uint32_t)b;
uint64_t b_hi = b >> 32;
uint64_t p0 = a_lo * b_lo;
uint64_t p1 = a_lo * b_hi;
uint64_t p2 = a_hi * b_lo;
uint64_t p3 = a_hi * b_hi;
uint32_t cy = (uint32_t)(((p0 >> 32) + (uint32_t)p1 + (uint32_t)p2) >> 32);
*lo = p0 + (p1 << 32) + (p2 << 32);
*hi = p3 + (p1 >> 32) + (p2 >> 32) + cy;
#endif // X86_INLINE_ASM
}
/* George Marsaglia's KISS64 generator, posted to comp.lang.c on 28 Feb 2009
https://groups.google.com/forum/#!original/comp.lang.c/qFv18ql_WlU/IK8KGZZFJx4J
*/
struct Prng_T {
uint64_t x, c, y, z, t;
};
struct Prng_T kiss64 = {1234567890987654321ULL, 123456123456123456ULL,
362436362436362436ULL, 1066149217761810ULL, 0ULL};
/* KISS64 state equations */
#define MWC64 (kiss64->t = (kiss64->x << 58) + kiss64->c, \
kiss64->c = (kiss64->x >> 6), kiss64->x += kiss64->t, \
kiss64->c += (kiss64->x < kiss64->t), kiss64->x)
#define XSH64 (kiss64->y ^= (kiss64->y << 13), kiss64->y ^= (kiss64->y >> 17), \
kiss64->y ^= (kiss64->y << 43))
#define CNG64 (kiss64->z = 6906969069ULL * kiss64->z + 1234567ULL)
#define KISS64 (MWC64 + XSH64 + CNG64)
uint64_t random64 (Prng_T kiss64)
{
return KISS64;
}
int main (void)
{
int i;
Prng_T state = &kiss64;
for (i = 0; i < 1000; i++) {
printf ("%llu\n", (unsigned long long)randint (state, 10));
}
return EXIT_SUCCESS;
}
If you really want to get a perfect generator, assuming the rand() function that you have is perfect, you need to apply the method explained below.
We will create a random number r from 0 to max-min=b-1, which is then easy to move to the range that you want: just take r+min.
We will create a random number where b < RAND_MAX, but the procedure can easily be adapted to produce a random number for any base.
PROCEDURE:
Take a random number r in its original RAND_MAX size without any truncation
Display this number in base b
Take first m=floor(log_b(RAND_MAX)) digits of this number for m random numbers from 0 to b-1
Shift each by min (i.e. r+min) to get them into the range [min, max] as you wanted
Since log_b(RAND_MAX) is not necessarily an integer, the last digit in the representation is wasted.
The original approach of just using mod (%) is mistaken exactly by
(log_b(RAND_MAX) - floor(log_b(RAND_MAX)))/ceil(log_b(RAND_MAX))
which you might agree is not that much, but if you insist on being precise, that is the procedure.