Generating random numbers given a uniform random number generator

Generating random numbers given a uniform random number generator - c++

I was asked to generate a random number between a and b, inclusive, using random(0,1). random(0,1) generates a uniform random number between 0 and 1.
I answered
(a+(((1+random(0,1))*b))%(b-a))
My interviewer was not satisfied with my usage of b in this piece of the expression:
(((1+random(0,1))*b))
Then I tried changing my answer to:
int*z=(int*)malloc(sizeof(int));
(a+(((1+random(0,1))*(*z)))%(b-a));
Later the question changed to generate random(1,7) from random(1,5). I responded with:
A = rand(1,5)%3
B = (rand(1,5)+1)%3
C = (rand(1,5)+2)%3
rand(1,7) = rand(1,5)+ (A+B+C)%3
Were my answers correct?

I think you confused a random integer generator with a random floating-point generator. In C++, rand() generates a random integer between 0 and RAND_MAX (which is at least 32767). Thus to generate a random number from 1 to 10, we write rand() % 10 + 1. More generally, to generate a random number from integer a to integer b, we write rand() % (b - a + 1) + a.
The interviewer told you that you had a random generator from 0 to 1. That means a floating-point number generator.
How to get the answer mathematically:
1. Shift the question to a simple form such that the lower bound is 0.
2. Scale the range by multiplication.
3. Re-shift to the required range.
For example: to generate R such that
a <= R <= b.
Applying rule 1, we get a - a <= R - a <= b - a, i.e.
0 <= R - a <= b - a.
Think of R - a as R1. How do we generate R1 such that R1 ranges from 0 to (b-a)?
R1 = rand(0, 1) * (b-a) // by applying rule 2
Now substitute R - a for R1:
R - a = rand(0,1) * (b-a) ==> R = a + rand(0,1) * (b-a)
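The shift, scale, re-shift steps above can be sketched in C++ roughly as follows. The names random01, scale_to_range and random_real are hypothetical, and random01 is only a stand-in for the interviewer's random(0,1), built here on std::uniform_real_distribution.

```cpp
#include <random>

// Hypothetical stand-in for random(0,1): a uniform double in [0, 1).
double random01() {
    static std::mt19937 gen{std::random_device{}()};
    static std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(gen);
}

// Scale u from [0, 1] by (b - a), then shift back by a.
double scale_to_range(double u, double a, double b) {
    return a + u * (b - a);
}

// R = a + random(0,1) * (b - a)
double random_real(double a, double b) {
    return scale_to_range(random01(), a, b);
}
```

Calling random_real(a, b) then returns a real value between a and b; no modulo operator is involved.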
==== 2nd question - without explanation ====
We have 1 <= R1 <= 5
==> 0 <= R1 - 1 <= 4
==> 0 <= (R1 - 1)/4 <= 1
==> 0 <= 6 * (R1 - 1)/4 <= 6
==> 1 <= 1 + 6 * (R1 - 1)/4 <= 7
Thus, Rand(1,7) = 1 + 6 * (rand(1,5) - 1) / 4

random(a,b) from random(0,1):
random(0,1)*(b-a)+a
random(c,d) from random(a,b):
(random(a,b)-a)/(b-a)*(d-c)+c
or, simplified for your case (a=1,b=5,c=1,d=7):
random(1,5) * 1.5 - 0.5
(note: I assume we're talking about float values and that rounding errors are negligible)
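As a small sketch of the general formula above, assuming real-valued inputs (the function name remap is hypothetical): the map is affine, so a uniform input stays uniform. It does not apply to integer generators, since five integer outcomes cannot be spread evenly over seven.

```cpp
// Map x from the interval [a, b] linearly onto [c, d].
// Assumes a != b and floating-point (not integer) values.
double remap(double x, double a, double b, double c, double d) {
    return (x - a) / (b - a) * (d - c) + c;
}
```

With a=1, b=5, c=1, d=7 this is the same as x * 1.5 - 0.5.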

random(a,b) from random(c,d) = a + (b-a)*((random(c,d) - c)/(d-c))
No?

[random(0,1)*(b-a)] + a should, I think, give random numbers between a and b.
([random(1,5)-1]/4)*6 + 1 should give random numbers in the range (1,7).
I am not sure whether the above will destroy the uniform distribution.

Were my answers correct?
I think there are some problems.
First off, I'm assuming that random() returns a floating point value - otherwise to generate any useful distribution of a larger range of numbers using random(0,1) would require repeated calls to generate a pool of bits to work with.
I'm also going to assume C/C++ is the intended platform, since the question is tagged as such.
Given these assumptions, one problem with your answers is that C/C++ do not allow the use of the % operator on floating point types.
But even if we imagine that the % operator was replaced with a function that performed a modulo operation with floating point arguments in a reasonable way, there are still some problems. In your initial answer, if b (or the uninitialized *z allocated in your second attempt - I'm assuming this is a kind of bizarre way to get an arbitrary value, or is something else intended?) is zero (say the range given for a and b is (-5, 0)), then your result will be decidedly non-uniform: the product is zero, so the result would always be a.
Finally, I'm certainly no statistician, but in your final answer (to generate random(1,7) from random(1,5)), I'm pretty sure that A+B+C would be non-uniform and would therefore introduce a bias in the result.

I think there is a nicer answer to this. There is one value (with probability tending to zero) for which this overflows, which is why the modulus is there.
Take a random number x in the interval [0,1].
Increment your upper_bound, which could be a parameter, by one.
Calculate (int(random() / (1.0 / upper_bound)) % upper_bound) + 1 + lower_bound .
This ought to return a number in your desired interval.

Given random(1,5) you can generate random(1,7) in the following way:
A = 5*(random(1,5) - 1) + random(1,5)
The two draws act as base-5 digits, so A is uniform over 1-25. (A plain product random(1,5)*random(1,5) would not be uniform, so the draws must be combined this way.)
If we simply take A modulo 7 we get numbers in the right range, but they won't be uniformly random: the values of A from 22-25 give 1-4 after the modulo operation, so the output is biased towards 1-4. This is because 7 does not evenly divide 25: the largest multiple of 7 less than or equal to 25 is 7*3=21, and it is the numbers in the incomplete range 22-25 that cause the bias.
The easiest way to fix this problem is to discard those numbers (22-25) and keep trying again until a number in the suitable range (1-21) comes up; then (A - 1) % 7 + 1 is uniform over 1-7.
Obviously, this applies when we want random integers.
To get random floating-point numbers, however, we need to rescale the range as described in the posts above.
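The discard-and-retry idea above can be sketched for the integer random(1,5) to random(1,7) case like this. This is only an illustration: rand5 is a hypothetical stand-in for the given generator (built on std::uniform_int_distribution), and the two draws are combined as base-5 digits rather than multiplied, so that every combined value is equally likely.

```cpp
#include <random>

// Hypothetical stand-in for the given generator: uniform integer in [1, 5].
int rand5() {
    static std::mt19937 gen{std::random_device{}()};
    static std::uniform_int_distribution<int> dist(1, 5);
    return dist(gen);
}

// Uniform integer in [1, 7] built only from rand5().
int rand7() {
    while (true) {
        // Two draws act as base-5 digits: a is uniform over 1..25.
        int a = 5 * (rand5() - 1) + rand5();
        if (a <= 21) {               // 21 = 7 * 3, so 1..21 splits evenly
            return (a - 1) % 7 + 1;  // each of 1..7 is hit by 3 values of a
        }
        // a in 22..25 would bias the result, so discard it and retry.
    }
}
```

On average 21 out of 25 attempts succeed, so the loop needs about 1.19 iterations per result.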

Related

How to write this floating point code in a portable way?

I am working on a cryptocurrency and there is a calculation that nodes must make:
average /= total;
double ratio = average/DESIRED_BLOCK_TIME_SEC;
int delta = -round(log2(ratio));
It is required that every node has the exact same result no matter what architecture or stdlib is being used by the system. My understanding is that log2 might have different implementations that yield very slightly different results, or that flags like -ffast-math could impact the results.
Is there a simple way to convert the above calculation to something that is verifiably portable across different architectures (fixed point?) or am I overthinking the precision that is needed (given that I round the answer at the end).
EDIT: Average is a long and total is an int... so average ends up rounded to the closest second.
DESIRED_BLOCK_TIME_SEC = 30.0 (it's a float) that is #defined

For this kind of calculation to be exact, one must either calculate all the divisions and logarithms exactly -- or one can work backwards.
-round(log2(x)) == round(log2(1/x)), meaning that one of the divisions can be turned around to get (1/x) >= 1.
round(log2(x)) == floor(log2(x * sqrt(2))) == binary_log((int)(x*sqrt(2))).
One minor detail here is whether (double)sqrt(2) rounds down or up. If it rounds up, then there might exist one or more values of x with x * sqrt2 == 2^n + epsilon (after rounding), whereas if it rounded down, we would get 2^n - epsilon. One would give the integer value n, the other would give n-1. Which is correct?
Naturally, the correct one is the one whose ratio to the theoretical midpoint x * sqrt(2) is smaller.
x * sqrt(2) / 2^(n-1) < 2^n / (x * sqrt(2)) -- multiply by x*sqrt(2)
x^2 * 2 / 2^(n-1) < 2^n -- multiply by 2^(n-1)
x^2 * 2 < 2^(2*n-1)
In order for this comparison to be exact, x^2 or pow(x,2) must be exact as well on the boundary - and it matters what range the original values are in. A similar analysis can and should be done while expanding x = a/b, so that the inexactness of the division can be mitigated at the cost of possible overflow in the multiplication...
Then again, I wonder how all the other similar applications handle the corner cases, which may not even exist -- and those could be brute force searched assuming that average and total are small enough integers.
EDIT
Because average is an integer, it makes sense to tabulate those exact integer values, which are on the boundaries of -round(log2(average)).
From Octave: d=-round(log2((1:1000000)/30.0)); [1, find(d(2:end) ~= d(1:end-1)) + 1]
1 2 3 6 11 22 43 85 170 340 679 1358 2716
5431 10862 21723 43445 86890 173779 347558 695115
All the averages in [1, 2) -> 5
All the averages in [2, 3) -> 4
All the averages in [3, 6) -> 3
..
All the averages in [43445, 86890) -> -11
int a = find_lower_bound(average, table); // linear or binary search
return 5 - a;
No floating point arithmetic needed
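A minimal sketch of the table lookup described above, assuming the boundary values listed from the Octave run and an average in the range 1 to 1,000,000. The names kBoundaries and block_delta are hypothetical; std::upper_bound stands in for find_lower_bound.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <iterator>

// First average of each run of -round(log2(average / 30.0)), copied from
// the table above. It covers averages from 1 up to 1,000,000.
constexpr std::array<std::int64_t, 21> kBoundaries = {
    1, 2, 3, 6, 11, 22, 43, 85, 170, 340, 679, 1358, 2716,
    5431, 10862, 21723, 43445, 86890, 173779, 347558, 695115};

// Hypothetical helper: returns delta using integer comparisons only.
// Assumes 1 <= average <= 1,000,000.
int block_delta(std::int64_t average) {
    // Find the first boundary strictly greater than average; the run that
    // contains average starts one entry earlier.
    auto it = std::upper_bound(kBoundaries.begin(), kBoundaries.end(), average);
    int index = static_cast<int>(std::distance(kBoundaries.begin(), it)) - 1;
    return 5 - index;
}
```

Because only integer comparisons are involved, every node gets the same delta regardless of its libm or compiler flags, provided all nodes ship the same table.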

I'm having problem storing 10^18 as a float

I'm writing a program which must take in an integer, N, in range 3<=N<=10^18.
This is one of the operations I have to perform with N.
final=((0.5*(pow(2,0.5))*(pow((pow(((N/2)-0.5),2)+pow((N/2)-0.5,2)),0.5)))-0.5)*4;
N is such that final is guaranteed to contain an integer.
The problem is that I can't store N in a float type as it is too large. If I store it in a long long int, the answer is wrong (I think it's because the intermediate value of N / 2 is then rounded off).

The correct answer is
final = 2 * abs(N-1) - 2;
This can be verified by removing the unnecessary parentheses, regrouping like terms, distributing multiplication by constants, and using the following identities:
the square root of a square is the absolute value: pow(pow(x,2),0.5) = abs(x)
pow(k*x,0.5) = pow(k,0.5)*pow(x,0.5)
k*abs(x+y) = abs(k*x + k*y)
This is almost the same as the accepted answer. But the other answer is correct only for N >= 1. It is wrong as soon as N-1 < 0, so, in the range of N values allowed by your question, for 0 <= N < 1.
You can verify this with this online demo
Edit: following your edit of the question that changes the range and therewith exclude the problematic values, the accepted answer will do. I leave my answer here for the records, and for the sake of the maths ;-)

This formula looks intimidating, but can be simplified (if N > 1) to
final = 4 * (N / 2 - 1)
using the identity: (x^a)^(1/a) = x.
If N / 2 is supposed to be a floating-point division, not an integer one, the answer is
final = 2 * (N - 2)
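For the stated range 3 <= N <= 10^18, the simplified form can be evaluated entirely in 64-bit integers, which avoids both the float precision problem and the N / 2 truncation. A minimal sketch, with final_value as a hypothetical name:

```cpp
#include <cstdint>

// final = 2 * (N - 2), assuming 3 <= N <= 10^18.
// The largest result, 2 * (10^18 - 2), is about 2e18 and fits in a signed
// 64-bit integer (maximum about 9.22e18), so no floating point is needed.
std::int64_t final_value(std::int64_t n) {
    return 2 * (n - 2);
}
```

For N = 3 the original expression works out to (0.5 * sqrt(2) * sqrt(2) - 0.5) * 4 = 2, which matches final_value(3).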

log and rand() gives not a number

In the following part of code:
I want to generate a random number "U" in the range 0 to 1,
then I evaluate an equation that contains a log.
The error is: some values of U make the log in the equation give a "not a number" value.
I tried casting "U" to float or double, and even rounding it to 2 decimal places, but I get the same error.
vector <double>Xs;//random Xs
double x;
double U;
while (check_arr < 360)
{
U = ((rand() / RAND_MAX) * 100) / 100;
x = (log10(1 - U)) / (-1 / a);
Xs.push_back(x);
}

There are multiple problems with your code.
rand() returns an integer, and RAND_MAX is an integer, so when you divide them you get an integer which will almost always be zero (since rand() can produce the value RAND_MAX - one time in 2^31 on my computer - and that division will produce 1).
Next, multiplying then dividing by 100 is doing nothing. The result will be the same: an integer that's almost always 0, sometimes 1.
Finally, you must avoid taking the log10 of zero. This value is disallowed and will raise the divide-by-zero exception (also, negative values would raise the invalid floating point exception).
Perhaps you could use the following expression instead:
U = (rand() % 100)/100.0;
This will give you a value of U with a distribution from 0.00 up to 0.99 inclusive. When you then take log10(1-U) you won't get an exception.
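A slightly finer-grained variant of the same idea, as a sketch only: dividing by RAND_MAX + 1.0 forces floating-point division and keeps U strictly below 1, so 1 - U is never zero. The names uniform01 and sample_x are hypothetical, and a is assumed to be a nonzero double (the question does not show its type).

```cpp
#include <cmath>
#include <cstdlib>

// Uniform value in [0, 1): RAND_MAX + 1.0 is a double, so the division is
// done in floating point and the result stays strictly below 1.
double uniform01() {
    return std::rand() / (RAND_MAX + 1.0);
}

// x = log10(1 - U) / (-1 / a), with a taken as a double so that -1 / a is
// not truncated by integer division. Assumes a != 0.
double sample_x(double a) {
    double u = uniform01();  // 0 <= u < 1, hence 1 - u > 0
    return std::log10(1.0 - u) / (-1.0 / a);
}
```

If a were an int in the original code, -1 / a would itself be integer division and could evaluate to 0, which is another possible source of invalid results.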

log10() does not return a usable number when the parameter passed to it is 0. When I ran the method on my machine the result that I got was "-1.#INF000000000000", i.e. negative infinity. log(0) is undefined. You can verify this by opening the calculator on your PC (if you are using Windows), switching to scientific mode, and trying to compute log 0.
Mathematical explanation:
The log base 10 function is used to help find the exponent y in 10^y=x. So when you are trying to plug in 0 in the function you are trying to find a solution to the following:
10^y=0
But there is no solution to this, so the function cannot return a valid number. It would be better to restrict the argument x of the log to 0 < x <= 1 so that you do not run into this issue.
Since the rand function returns a value between 0 and RAND_MAX, you can use the following to ensure that you never pass 0 to the log function:
U = (rand() % 100 + 1)/100.0;
This yields values from 0.01 to 1.00 (note the 100.0: dividing by the integer 100 would truncate the result to 0 or 1). Use this value as the argument of the logarithm itself; with log10(1 - U), U = 1 would still produce log10(0). You can adjust the numbers to increase or decrease the range.

Fast integer solution of x(x-1)/2 = c

Given a non-negative integer c, I need an efficient algorithm to find the largest integer x such that
x*(x-1)/2 <= c
Equivalently, I need an efficient and reliably accurate algorithm to compute:
x = floor((1 + sqrt(1 + 8*c))/2) (1)
For the sake of definiteness I tagged this question C++, so the answer should be a function written in that language. You can assume that c is an unsigned 32 bit int.
Also, if you can prove that (1) (or an equivalent expression involving floating-point arithmetic) always gives the right result, that's a valid answer too, since floating-point on modern processors can be faster than integer algorithms.

If you're willing to assume IEEE doubles with correct rounding for all operations including square root, then the expression that you wrote (plus a cast to double) gives the right answer on all inputs.
Here's an informal proof. Since c is a 32-bit unsigned integer being converted to a floating-point type with a 53-bit significand, 1 + 8*(double)c is exact, and sqrt(1 + 8*(double)c) is correctly rounded. 1 + sqrt(1 + 8*(double)c) is accurate to within one ulp, since the last term being less than 2**((32 + 3)/2) = 2**17.5 implies that the unit in the last place of the latter term is less than 1, and thus (1 + sqrt(1 + 8*(double)c))/2 is accurate to within one ulp, since division by 2 is exact.
The last piece of business is the floor. The problem cases here are when (1 + sqrt(1 + 8*(double)c))/2 is rounded up to an integer. This happens if and only if sqrt(...) rounds up to an odd integer. Since the argument of sqrt is an integer, the worst cases look like sqrt(z**2 - 1) for positive odd integers z, and we bound
z - sqrt(z**2 - 1) = z * (1 - sqrt(1 - 1/z**2)) >= 1/(2*z)
by Taylor expansion. Since z is less than 2**17.5, the gap to the nearest integer is at least 1/2**18.5 on a result of magnitude less than 2**17.5, which means that this error cannot result from a correctly rounded sqrt.
Adopting Yakk's simplification, we can write
(uint32_t)(0.5 + sqrt(0.25 + 2.0*c))
without further checking.
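As a sketch, the closed form can be wrapped in a function and paired with an exact integer check. This relies on the same assumption as the proof (IEEE doubles with a correctly rounded sqrt); the names triangular_floor and is_answer are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Largest x with x*(x-1)/2 <= c, assuming IEEE doubles with a correctly
// rounded sqrt, as in the argument above.
std::uint32_t triangular_floor(std::uint32_t c) {
    return static_cast<std::uint32_t>(0.5 + std::sqrt(0.25 + 2.0 * c));
}

// Exact integer check: x is the answer iff T(x) <= c < T(x + 1), where
// T(x) = x*(x-1)/2 is evaluated in 64 bits to avoid overflow.
bool is_answer(std::uint32_t c, std::uint32_t x) {
    std::uint64_t x64 = x;
    return x64 * (x64 - 1) / 2 <= c && c < (x64 + 1) * x64 / 2;
}
```

The boundary inputs, where c is exactly x*(x-1)/2 or one less, are the ones most worth testing, since that is where a rounding error in sqrt would change the floor.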

If we start with the quadratic formula, we quickly reach sqrt(1/4 + 2c), round up at 1/2 or higher.
Now, if you do that calculation in floating point, there can be inaccuracies.
There are two approaches to deal with these inaccuracies. The first would be to carefully determine how big they are, determine if the calculated value is close enough to a half for them to be important. If they aren't important, simply return the value. If they are, we can still bound the answer to being one of two values. Test those two values in integer math, and return.
However, we can do away with that careful bit, and note that sqrt(1/4 + 2c) is going to have an error less than 0.5 if the values are 32 bits, and we use doubles. (We cannot make this guarantee with floats, as by 2^31 the float cannot handle +0.5 without rounding).
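In essence, we use the quadratic formula to reduce it to two possibilities, and then test those two.
#include <cassert>
#include <cmath>
#include <cstdint>

// x*(x-1)/2 in 64 bits, so it cannot overflow for 32-bit inputs.
std::uint64_t eval(std::uint64_t x) {
    return x*(x-1)/2;
}
unsigned solve(unsigned c) {
    // The answer is either floor(test) or floor(test) + 1.
    double test = std::sqrt( 0.25 + 2.*c );
    // The double argument is truncated toward zero when converted.
    if ( eval(test+1.) <= c )
        return test+1.;
    assert( eval(test) <= c );
    return test;
}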
Note that converting a positive double to an integral type rounds towards 0. You can insert floors if you want.

This may be a bit tangential to your question. But, what caught my attention is the specific formula. You are trying to find the triangular root of T_(n-1) (where T_n is the nth triangular number).
I.e.:
T_n = n * (n + 1) / 2
and
T_n - n = T_(n-1) = n * (n - 1) / 2
From the nifty trick described here, for T_n we have:
n = int(sqrt(2 * c))
Looking for n such that T_(n-1) ≤ c in this case doesn't change the definition of n, for the same reason as in the original question.
Computationally, this saves a few operations, so it's theoretically faster than the exact solution (1). In reality, it's probably about the same.
Neither this solution nor the one presented by David is as "exact" as your (1) though.
[Figure: floor((1 + sqrt(1 + 8*c))/2) (blue) vs int(sqrt(2 * c)) (red) vs exact (white line)]
[Figure: floor((1 + sqrt(1 + 8*c))/2) (blue) vs int(sqrt(0.25 + 2 * c) + 0.5) (red) vs exact (white line)]
My real point is that triangular numbers are a fun set of numbers that are connected to squares, Pascal's triangle, Fibonacci numbers, et al.
As such there are loads of identities around them which might be used to rearrange the problem in a way that doesn't require a square root.
Of particular interest may be that T_n + T_(n-1) = n^2
I'm assuming you know that you're working with a triangular number, but if you didn't realize that, searching for triangular roots yields a few questions such as this one which are along the same topic.

How to efficiently find the largest integer closest to the mean of two integers in increments of 100,000?

Let's say I am given integers x and y (satisfying x <= y with ones digit of 0 so they are, in particular, divisible by two). Then I know that their average avg = ((x+y) / 2) is an integer as well. I would like to find this midpoint rounded up to a resolution of 100. In other words if my two inputs are 75200 and 75300 then the avg is 75250 and rounded up to the nearest 100 (but without exceeding or equaling the bigger number) forces the answer to be 75200.
How can I implement this logic without first dividing everything by 100 and using the following floating point arithmetic:
x + std::floor((y - x) * .5 * 100 + .5)*0.01
In other words, how can I do the above without floating point values but obtain the same behavior at the resolution of 100 instead of 0.01?

To compute the average you can do
avg = (x + y) / 2
(BTW, integer addition and division by 2 are very cheap operations even on small microcontrollers.)
To round this to the nearest multiple of 100 (corresponding to your floating-point example) you can do
result = ((avg + 50) / 100) * 100
as integer division rounds down to the nearest integer. By changing the 50 to 0 you can always round down, while changing it to 99 always rounds up.
Edit: Note that this method for rounding doesn't work for negative numbers. Since integer division rounds towards zero, in that case you'll need to subtract the 50, subtract 99 to always round down and subtract 0 to always round up.
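The integer-only rounding described above can be sketched as follows, for non-negative inputs only. The function names are hypothetical, and midpoint uses x + (y - x) / 2 rather than (x + y) / 2 purely as a precaution against overflow in the sum.

```cpp
// Round a non-negative value to the nearest multiple of 100 using only
// integer arithmetic (ties round up). Negative inputs are not handled.
long round_to_nearest_100(long value) {
    return (value + 50) / 100 * 100;
}

// Midpoint of x and y (x <= y), written to avoid overflow in x + y.
long midpoint(long x, long y) {
    return x + (y - x) / 2;
}
```

For the inputs 75200 and 75300 the midpoint is 75250, and round_to_nearest_100 turns it into 75300, so the "must stay below the larger input" requirement from the question still needs separate handling.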

Your problematic example requires strong conditions:
the difference between x and y needs to be not greater than 100
y % 100 must be 0
So for most cases, a simple rounded average is perfect for you:
avg100 = avg - (avg % 100) + 100
The tricky part is fixing the remaining error without a condition - if you want to avoid conditions, or slow operations.
For this, the best way is to use a multiplication, and split the expression into two:
avg100 = avg - (avg % 100)
avg100 += 100 * !!(y - avg100)
In most cases, y is greater than avg100. In that case, the !! operator will return 1. In the rare case when they are equal, it will return 0, and the value won't change.
(I don't know whether the compiler will really generate code without conditions for the '!!' operator, but I don't have a better idea, and if it is possible, I think it will. If not, this code is still short and easy to understand.)
Also, you can calculate the average using the following expression:
avg = y - (y-x)/2
Or even change the division into bit shift for optimization.
This doesn't require both numbers to be even, just to have the same parity.

We Keep Coding
