Given a random source (a generator of random bit stream), how do I generate a uniformly distributed random floating-point value in a given range?
Assume that my random source looks something like:
unsigned int GetRandomBits(char* pBuf, int nLen);
And I want to implement
double GetRandomVal(double fMin, double fMax);
Notes:
I don't want the result precision to be limited (for example only 5 digits).
Strict uniform distribution is a must
I'm not asking for a reference to an existing library. I want to know how to implement it from scratch.
For pseudo-code / code, C++ would be most appreciated
I don't think I'll ever be convinced that you actually need this, but it was fun to write.
#include <stdint.h>
#include <cmath>
#include <cstdio>
FILE* devurandom;
bool geometric(int x) {
// returns true with probability min(2^-x, 1)
if (x <= 0) return true;
while (1) {
uint8_t r;
fread(&r, sizeof r, 1, devurandom);
if (x < 8) {
return (r & ((1 << x) - 1)) == 0;
} else if (r != 0) {
return false;
}
x -= 8;
}
}
double uniform(double a, double b) {
// requires IEEE doubles and 0.0 < a < b < inf and a normal
// implicitly computes a uniform random real y in [a, b)
// and returns the greatest double x such that x <= y
union {
double f;
uint64_t u;
} convert;
convert.f = a;
uint64_t a_bits = convert.u;
convert.f = b;
uint64_t b_bits = convert.u;
uint64_t mask = b_bits - a_bits;
mask |= mask >> 1;
mask |= mask >> 2;
mask |= mask >> 4;
mask |= mask >> 8;
mask |= mask >> 16;
mask |= mask >> 32;
int b_exp;
frexp(b, &b_exp);
while (1) {
// sample uniform x_bits in [a_bits, b_bits)
uint64_t x_bits;
fread(&x_bits, sizeof x_bits, 1, devurandom);
x_bits &= mask;
x_bits += a_bits;
if (x_bits >= b_bits) continue;
double x;
convert.u = x_bits;
x = convert.f;
// accept x with probability proportional to 2^x_exp
int x_exp;
frexp(x, &x_exp);
if (geometric(b_exp - x_exp)) return x;
}
}
int main() {
devurandom = fopen("/dev/urandom", "r");
for (int i = 0; i < 100000; ++i) {
printf("%.17g\n", uniform(1.0 - 1e-15, 1.0 + 1e-15));
}
}
Here is one way of doing it.
The IEEE Std 754 double format is as follows:
[s][ e ][ f ]
where s is the sign bit (1 bit), e is the biased exponent (11 bits) and f is the fraction (52 bits).
Beware that the layout in memory will be different on little-endian machines.
For 0 < e < 2047, the number represented is
(-1)**(s) * 2**(e – 1023) * (1.f)
By setting s to 0, e to 1023 and f to 52 random bits from your bit stream, you get a random double in the interval [1.0, 2.0). This interval is unique in that it contains 2 ** 52 doubles, and these doubles are equidistant. If you then subtract 1.0 from the constructed double, you get a random double in the interval [0.0, 1.0). Moreover, the property about being equidistant is preserve.
From there you should be able to scale and translate as needed.
I'm surprised that for question this old, nobody had actual code for the best answer. User515430's answer got it right--you can take advantage of IEEE-754 double format to directly put 52 bits into a double with no math at all. But he didn't give code. So here it is, from my public domain ojrandlib:
double ojr_next_double(ojr_generator *g) {
uint64_t r = (OJR_NEXT64(g) & 0xFFFFFFFFFFFFFull) | 0x3FF0000000000000ull;
return *(double *)(&r) - 1.0;
}
NEXT64() gets a 64-bit random number. If you have a more efficient way of getting only 52 bits, use that instead.
This is easy, as long as you have an integer type with as many bits of precision as a double. For instance, an IEEE double-precision number has 53 bits of precision, so a 64-bit integer type is enough:
#include <limits.h>
double GetRandomVal(double fMin, double fMax) {
unsigned long long n ;
GetRandomBits ((char*)&n, sizeof(n)) ;
return fMin + (n * (fMax - fMin))/ULLONG_MAX ;
}
This is probably not the answer you want, but the specification here:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3225.pdf
in sections [rand.util.canonical] and [rand.dist.uni.real], contains sufficient information to implement what you want, though with slightly different syntax. It isn't easy, but it is possible. I speak from personal experience. A year ago I knew nothing about random numbers, and I was able to do it. Though it took me a while... :-)
The question is ill-posed. What does uniform distribution over floats even mean?
Taking our cue from discrepancy, one way to operationalize your question is to define that you want the distribution that minimizes the following value:
Where x is the random variable you are sampling with your GetRandomVal(double fMin, double fMax) function, and means the probability that a random x is smaller or equal to t.
And now you can go on and try to evaluate eg a dabbler's answer. (Hint all the answers that fail to use the whole precision and stick to eg 52 bits will fail this minimization criterion.)
However, if you just want to be able to generate all float bit patterns that fall into your specified range with equal possibility, even if that means that eg asking for GetRandomVal(0,1000) will create more values between 0 and 1.5 than between 1.5 and 1000, that's easy: any interval of IEEE floating point numbers when interpreted as bit patterns map easily to a very small number of intervals of unsigned int64. See eg this question. Generating equally distributed random values of unsigned int64 in any given interval is easy.
I may be misunderstanding the question, but what stops you simply sampling the next n bits from the random bit stream and converting that to a base 10 number number ranged 0 to 2^n - 1.
To get a random value in [0..1[ you could do something like:
double value = 0;
for (int i=0;i<53;i++)
value = 0.5 * (value + random_bit()); // Insert 1 random bit
// or value = ldexp(value+random_bit(),-1);
// or group several bits into one single ldexp
return value;
Related
Say I have a number, 100000, I can use some simple maths to check its size, i.e. log(100000) -> 5 (base 10 logarithm). Theres also another way of doing this, which is quite slow. std::string num = std::to_string(100000), num.size(). Is there an way to mathematically determine the length of a number? (not just 100000, but for things like 2313455, 123876132.. etc)
Why not use ceil? It rounds up to the nearest whole number - you can just wrap that around your log function, and add a check afterwards to catch the fact that a power of 10 would return 1 less than expected.
Here is a solution to the problem using single precision floating point numbers in O(1):
#include <cstdio>
#include <iostream>
#include <cstring>
int main(){
float x = 500; // to be converted
uint32_t f;
std::memcpy(&f, &x, sizeof(uint32_t)); // Convert float into a manageable int
uint8_t exp = (f & (0b11111111 << 23)) >> 23; // get the exponent
exp -= 127; // floating point bias
exp /= 3.32; // This will round but for this case it should be fine (ln2(10))
std::cout << std::to_string(exp) << std::endl;
}
For a number in scientific notation a*10^e this will return e (when 1<=a<10), so the length of the number (if it has an absolute value larger than 1), will be exp + 1.
For double precision this works, but you have to adapt it (bias is 1023 I think, and bit alignment is different. Check this)
This only works for floating point numbers, though so probably not very useful in this case. The efficiency in this case relative to the logarithm will also be determined by the speed at which int -> float conversion can occur.
Edit:
I just realised the question was about double. The modified result is:
int16_t getLength(double a){
uint64_t bits;
std::memcpy(&bits, &a, sizeof(uint64_t));
int16_t exp = (f >> 52) & 0b11111111111; // There is no 11 bit long int so this has to do
exp -= 1023;
exp /= 3.32;
return exp + 1;
}
There are some changes so that it behaves better (and also less shifting).
You can also use frexp() to get the exponent without bias.
If the number is whole, keep dividing by 10, until you're at 0. You'd have to divide 100000 6 times, for example. For the fractional part, you need to keep multiplying by 10 until trunc(f) == f.
I´m implementing the Xorshift generators and others to compare their performances on my system - Windows and Linux.
https://en.wikipedia.org/wiki/Xorshift
http://xoroshiro.di.unimi.it/
I´m just now checking the generators with 64 bit states, like the xorshift64star from the wikipedia (here with my changes to trace the error)
double xorshift64star() {
uint64_t x = global_state[0]; /* The state must be seeded with a nonzero value. */
x ^= x >> 12; // a
x ^= x << 25; // b
x ^= x >> 27; // c
global_state[0] = x;
auto u64val = x * 0x2545F4914F6CDD1D;
double dval = (double)u64val;
return dval;
}
However, running on an online compiler https://www.onlinegdb.com/ the double value returned is always 0 or 3.1148823182455562e-317
I haven´t been able to find a solution on how to make the output from this function to be normalize into a [0,1] uniform distribution, without losing much precision and entropy.
What is the "corect" transformation I would have to do the output?
SOLVED!
Thank you #RetiredNinja.
The generator was already normalizing the uint64 value. However only casting it to double don´t seem to work for that particular compiler.
The solution was to use the custom cast from http://xoroshiro.di.unimi.it/
static inline double to_double(uint64_t x) {
const union { uint64_t i; double d; } u = {.i = UINT64_C(0x3FF) << 52 | x >> 12 };
return u.d - 1.0;
}
Assuming that your u64val is uniform between 0 and numeric_limits<uint64_t>::max, the obvious transform is u64val/numeric_limits<uint64_t>::max.
This is not the most accurate transform, though. The problem here is that you end up generating multiples of 1.0/numeric_limits<uint64_t>::max. This obviously gives many small values a probability of zero. But consider this: the probabilities of all numbers between 0 and 1e-100 combined has to be 1e-100. That means you'd need to generate about 1e100 numbers to get any one of these numbers.
This basically means we've got an underspecified engineering problem here. Exactly how close should the approximation to uniform be?
I am managing some big (128~256bits) integers with gmp. It has come a point were I would like to multiply them for a double close to 1 (0.1 < double < 10), the result being still an approximated integer. A good example of the operation I need to do is the following:
int i = 1000000000000000000 * 1.23456789
I searched in the gmp documentation but I didn't find a function for this, so I ended up writing this code which seems to work well:
mpz_mult_d(mpz_class & r, const mpz_class & i, double d, int prec=10) {
if (prec > 15) prec=15; //avoids overflows
uint_fast64_t m = (uint_fast64_t) floor(d);
r = i * m;
uint_fast64_t pos=1;
for (uint_fast8_t j=0; j<prec; j++) {
const double posd = (double) pos;
m = ((uint_fast64_t) floor(d * posd * 10.)) -
((uint_fast64_t) floor(d * posd)) * 10;
pos*=10;
r += (i * m) /pos;
}
}
Can you please tell me what do you think? Do you have any suggestion to make it more robust or faster?
this is what you wanted:
// BYTE lint[_N] ... lint[0]=MSB, lint[_N-1]=LSB
void mul(BYTE *c,BYTE *a,double b) // c[_N]=a[_N]*b
{
int i; DWORD cc;
double q[_N+1],aa,bb;
for (q[0]=0.0,i=0;i<_N;) // mul,carry down
{
bb=double(a[i])*b; aa=floor(bb); bb-=aa;
q[i]+=aa; i++;
q[i]=bb*256.0;
}
cc=0; if (q[_N]>127.0) cc=1.0; // round
for (i=_N-1;i>=0;i--) // carry up
{
double aa,bb;
cc+=q[i];
c[i]=cc&255;
cc>>=8;
}
}
_N is number of bits/8 per large int, large int is array of _N BYTEs where first byte is MSB (most significant BYTE) and last BYTE is LSB (least significant BYTE)
function is not handling signum, but it is only one if and some xor/inc to add.
trouble is that double has low precision even for your number 1.23456789 !!! due to precision loss the result is not exact what it should be (1234387129122386944 instead of 1234567890000000000) I think my code is mutch quicker and even more precise than yours because i do not need to mul/mod/div numbers by 10, instead i use bit shifting where is possible and not by 10-digit but by 256-digit (8bit). if you need more precision than use long arithmetic. you can speed up this code by using larger digits (16,32, ... bit)
My long arithmetics for precise astro computations are usually fixed point 256.256 bits numbers consist of 2*8 DWORDs + signum, but of course is much slower and some goniometric functions are realy tricky to implement, but if you want just basic functions than code your own lon arithmetics is not that hard.
also if you want to have numbers often in readable form is good to compromise between speed/size and consider not to use binary coded numbers but BCD coded numbers
I am not so familiar with either C++ or GMP what I could suggest source code without syntax errors, but what you are doing is more complicated than it should and can introduce unnecessary approximation.
Instead, I suggest you write function mpz_mult_d() like this:
mpz_mult_d(mpz_class & r, const mpz_class & i, double d) {
d = ldexp(d, 52); /* exact, no overflow because 1 <= d <= 10 */
unsigned long long l = d; /* exact because d is an integer */
p = l * i; /* exact, in GMP */
(quotient, remainder) = p / 2^52; /* in GMP */
And now the next step depends on the kind of rounding you wish. If you wish the multiplication of d by i to give a result rounded toward -inf, just return quotient as result of the function. If you wish a result rounded to the nearest integer, you must look at remainder:
assert(0 <= remainder); /* proper Euclidean division */
assert(remainder < 2^52);
if (remainder < 2^51) return quotient;
if (remainder > 2^51) return quotient + 1; /* in GMP */
if (remainder == 2^51) return quotient + (quotient & 1); /* in GMP, round to “even” */
PS: I found your question by random browsing but if you had tagged it “floating-point”, people more competent than me could have answered it quickly.
Try this strategy:
Convert integer value to big float
Convert double value to big float
Make product
Convert result to integer
mpf_set_z(...)
mpf_set_d(...)
mpf_mul(...)
mpz_set_f(...)
I develop software for embedded platform and need a single-word division algorithm.
The problem is as follows:
given a large integer represented by a sequence of 32-bit words (can be many),
we need to divide it by another 32-bit word, i.e. compute the quotient (also large integer)
and the remainder (32-bits).
Certainly, If I were developing this algorithm on x86, I could simply take GNU MP
but this library is way too large for embdedde platform. Furthermore, our processor
does not have hardware integer divider (integer division is performed in the software).
However the processor has quite fast FPU, so the trick is to use floating-point arithmetic wherever possible.
Any ideas how to implement this ?
Sounds like a classic optimization. Instead of dividing by D, multiply by 0x100000000/D and then divide by 0x100000000. The latter is just a wordshift, i.e. trivial. Calculating the multiplier is a bit harder, but not a lot.
See also this detailed article for a far more detailed background.
Take a look at this one: the algorithm divides an integer a[0..n-1] by a single word 'c'
using floating-point for 64x32->32 division. The limbs of the quotient 'q' are just printed in a loop, you can save then in an array if you like. Note that you don't need GMP to run the algorithm - I use it just to compare the results.
#include <gmp.h>
// divides a multi-precision integer a[0..n-1] by a single word c
void div_by_limb(const unsigned *a, unsigned n, unsigned c) {
typedef unsigned long long uint64;
unsigned c_norm = c, sh = 0;
while((c_norm & 0xC0000000) == 0) { // make sure the 2 MSB are set
c_norm <<= 1; sh++;
}
// precompute the inverse of 'c'
double inv1 = 1.0 / (double)c_norm, inv2 = 1.0 / (double)c;
unsigned i, r = 0;
printf("\nquotient: "); // quotient is printed in a loop
for(i = n - 1; (int)i >= 0; i--) { // start from the most significant digit
unsigned u1 = r, u0 = a[i];
union {
struct { unsigned u0, u1; };
uint64 x;
} s = {u0, u1}; // treat [u1, u0] as 64-bit int
// divide a 2-word number [u1, u0] by 'c_norm' using floating-point
unsigned q = floor((double)s.x * inv1), q2;
r = u0 - q * c_norm;
// divide again: this time by 'c'
q2 = floor((double)r * inv2);
q = (q << sh) + q2; // reconstruct the quotient
printf("%x", q);
}
r %= c; // adjust the residue after normalization
printf("; residue: %x\n", r);
}
int main() {
mpz_t z, quo, rem;
mpz_init(z); // this is a dividend
mpz_set_str(z, "9999999999999999999999999999999", 10);
unsigned div = 9; // this is a divisor
div_by_limb((unsigned *)z->_mp_d, mpz_size(z), div);
mpz_init(quo); mpz_init(rem);
mpz_tdiv_qr_ui(quo, rem, z, div); // divide 'z' by 'div'
gmp_printf("compare: Quo: %Zx; Rem %Zx\n", quo, rem);
mpz_clear(quo);
mpz_clear(rem);
mpz_clear(z);
return 1;
}
I believe that a look-up table and Newton Raphson successive approximation is the canonical choice used by hardware designers (who generally can't afford the gates for a full hardware divide). You get to choose the trade off the between accuracy and execution time.
I need to write a function that rounds from unsigned long long to float, and the rounding should be toward nearest even.
I cannot just do a C++ type-cast, since AFAIK the standard does not specify the rounding.
I was thinking of using boost::numeric, but i could not find any useful lead after reading the documentation. Can this be done using that library?
Of course, if there is an alternative, i would be glad to use it.
Any help would be much appreciated.
EDIT: Adding an example to make things a bit clearer.
Suppose i want to convert 0xffffff7fffffffff to its floating point representation. The C++ standard permits either one of:
0x5f7fffff ~ 1.9999999*2^63
0x5f800000 = 2^64
Now if you add the restriction of round to nearest even, only the first result is acceptable.
Since you have so many bits in the source that can't be represented in the float and you can't (apparently) rely on the language's conversion, you'll have to do it yourself.
I devised a scheme that may or may not help you. Basically, there are 31 bits to represent positive numbers in a float so I pick up the 31 most significant bits in the source number. Then I save off and mask away all the lower bits. Then based on the value of the lower bits I round the "new" LSB up or down and finally use static_cast to create a float.
I left in some couts that you can remove as desired.
const unsigned long long mask_bit_count = 31;
float ull_to_float2(unsigned long long val)
{
// How many bits are needed?
int b = sizeof(unsigned long long) * CHAR_BIT - 1;
for(; b >= 0; --b)
{
if(val & (1ull << b))
{
break;
}
}
std::cout << "Need " << (b + 1) << " bits." << std::endl;
// If there are few enough significant bits, use normal cast and done.
if(b < mask_bit_count)
{
return static_cast<float>(val & ~1ull);
}
// Save off the low-order useless bits:
unsigned long long low_bits = val & ((1ull << (b - mask_bit_count)) - 1);
std::cout << "Saved low bits=" << low_bits << std::endl;
std::cout << val << "->mask->";
// Now mask away those useless low bits:
val &= ~((1ull << (b - mask_bit_count)) - 1);
std::cout << val << std::endl;
// Finally, decide how to round the new LSB:
if(low_bits > ((1ull << (b - mask_bit_count)) / 2ull))
{
std::cout << "Rounding up " << val;
// Round up.
val |= (1ull << (b - mask_bit_count));
std::cout << " to " << val << std::endl;
}
else
{
// Round down.
val &= ~(1ull << (b - mask_bit_count));
}
return static_cast<float>(val);
}
I did this in Smalltalk for arbitrary precision integer (LargeInteger), implemented and tested in Squeak/Pharo/Visualworks/Gnu Smalltalk/Dolphin Smalltalk, and even blogged about it if you can read Smalltalk code http://smallissimo.blogspot.fr/2011/09/clarifying-and-optimizing.html .
The trick for accelerating the algorithm is this one: IEEE 754 compliant FPU will round exactly the result of an inexact operation. So we can afford 1 inexact operation and let the hardware rounds correctly for us. That let us handle easily first 48 bits. But we cannot afford two inexact operations, so we sometimes have to care of the lowest bits differently...
Hope the code is documented enough:
#include <math.h>
#include <float.h>
float ull_to_float3(unsigned long long val)
{
int prec=FLT_MANT_DIG ; // 24 bits, the float precision
unsigned long long high=val>>prec; // the high bits above float precision
unsigned long long mask=(1ull<<prec) - 1 ; // 0xFFFFFFull a mask for extracting significant bits
unsigned long long tmsk=(1ull<<(prec - 1)) - 1; // 0x7FFFFFull same but tie bit
// handle trivial cases, 48 bits or less,
// let FPU apply correct rounding after exactly 1 inexact operation
if( high <= mask )
return ldexpf((float) high,prec) + (float) (val & mask);
// more than 48 bits,
// what scaling s is needed to isolate highest 48 bits of val?
int s = 0;
for( ; high > mask ; high >>= 1) ++s;
// high now contains highest 24 bits
float f_high = ldexpf( (float) high , prec + s );
// store next 24 bits in mid
unsigned long long mid = (val >> s) & mask;
// care of rare case when trailing low bits can change the rounding:
// can mid bits be a case of perfect tie or perfect zero?
if( (mid & tmsk) == 0ull )
{
// if low bits are zero, mid is either an exact tie or an exact zero
// else just increment mid to distinguish from such case
unsigned long long low = val & ((1ull << s) - 1);
if(low > 0ull) mid++;
}
return f_high + ldexpf( (float) mid , s );
}
Bonus: this code should round according to your FPU rounding mode whatever it may be, since we implicitely used the FPU to perform rounding with + operation.
However, beware of aggressive optimizations in standards < C99, who knows when the compiler will use extended precision... (unless you force something like -ffloat-store).
If you always want to round to nearest even, whatever the current rounding mode, then you'll have to increment high bits when:
mid bits > tie, where tie=1ull<<(prec-1);
mid bits == tie and (low bits > 0 or high bits is odd).
EDIT:
If you stick to round-to-nearest-even tie breaking, then another solution is to use Shewchuck EXPANSION-SUM of non adjacent parts (fhigh,flow) and (fmid) see http://www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps :
#include <math.h>
#include <float.h>
float ull_to_float4(unsigned long long val)
{
int prec=FLT_MANT_DIG ; // 24 bits, the float precision
unsigned long long mask=(1ull<<prec) - 1 ; // 0xFFFFFFull a mask for extracting significant bits
unsigned long long high=val>>(2*prec); // the high bits
unsigned long long mid=(val>>prec) & mask; // the mid bits
unsigned long long low=val & mask; // the low bits
float fhigh = ldexpf((float) high,2*prec);
float fmid = ldexpf((float) mid,prec);
float flow = (float) low;
float sum1 = fmid + flow;
float residue1 = flow - (sum1 - fmid);
float sum2 = fhigh + sum1;
float residue2 = sum1 - (sum2 - fhigh);
return (residue1 + residue2) + sum2;
}
This makes a branch-free algorithm with a bit more ops. It may work with other rounding modes, but I let you analyze the paper to make sure...
What is possible between between 8-byte integers and the float format is straightforward to explain but less so to implement!
The next paragraph concerns what is representable in 8 byte signed integers.
All positive integers between 1 (2^0) and 16777215 (2^24-1) are exactly representable in iEEE754 single precision (float). Or, to be precise, all numbers between 2^0 and 2^24-2^0 in increments of 2^0. The next range of exactly representable positive integers is 2^1 to 2^25-2^1 in increments of 2^1 and so on up to 2^39 to 2^63-2^39 in increments of 2^39.
Unsigned 8-byte integer values can be expressed up to 2^64-2^40 in increments of 2^40.
The single precison format doesn't stop here but goes on all the way up to the range 2^103 to 2^127-2^103 in increments of 2^103.
For 4-byte integers (long) the highest float range is 2^7 to 2^31-2^7 in 2^7 increments.
On the x86 architecture the largest integer type supported by the floating point instruction set is the 8 byte signed integer. 2^64-1 cannot be loaded by conventional means.
This means that for a given range increment expressed as "2^i where i is an integer >0" all integers that end with the bit pattern 0x1 up to 2^i-1 will not be exactly representable within that range in a float
This means that what you call rounding upwards is actually dependent on what range you are working in. It is of no use to try to round up by 1 (2^0) or 16 (2^4) if the granularity of the range you are in is 2^19.
An additional consequence of what you propose to do (rounding 2^63-1 to 2^63) could result in an (long integer format) overflow if you attempt the following conversion: longlong_int=(long long) ((float) 2^63).
Check out this small program I wrote (in C) which should help illustrate what is possible and what isn't.
int main (void)
{
__int64 basel=1,baseh=16777215,src,dst,j;
float cnvl,cnvh,range;
int i=0;
while (i<40)
{
src=basel<<i;
cnvl=(float) src;
dst=(__int64) cnvl; /* compare dst with basel */
src=baseh<<i;
cnvh=(float) src;
dst=(__int64) cnvh; /* compare dst with baseh */
j=basel;
while (j<=baseh)
{
range=(float) j;
dst=(__int64) range;
if (j!=dst) dst/=0;
j+=basel;
}
++i;
}
return i;
}
This program shows the representable integer value ranges. There is overlap beteen them: for example 2^5 is representable in all ranges with a lower boundary 2^b where 1=