How can I get a random uint or the last digit of float in HLSL/GLSL? - opengl

I just need a random uint, ideally in the range 0-6, but there is no enumeration type in OpenGL. I learned that I can get a random float in the range 0-1 from the code below:
frac(sin(dot(uv, float2(12.9898, 78.233))) * 43758.5453123)
I tried taking 1/(the value above) and applying floor(), but it doesn't work. How can I get a random int? Or is there a way to get the last digit of the float (which is presumably still random)?

First, let's define what we mean by "random". In the context of this answer, a "random" variable is a variable whose values are unpredictable. That is, there is no function that determines/computes an outcome for the random variable when being evaluated (with any possible inputs). Or at least, no such function has been found (yet).
Obviously, when we are talking about computing here, there is no such thing as a true random variable as described above, because anything we do in computing (and by extension in a shader) is necessarily bound to the set of functions that are computable.
Your proposed function in the question:
f(uv) = frac(sin(dot(uv, float2(12.9898, 78.233))) * 43758.5453123)
is just a computable function. It takes as input a vector uv, which itself is a deterministic/computable value - such as derived from a built-in or custom varying variable giving you the "coordinates" of the current fragment.
The function's result is therefore itself computable/deterministic: it is simply the value that the input vector uv maps to. Differences in IEEE 754 rules and precision aside (these may vary between GPUs, such as desktop and mobile ones), the function is purely deterministic/computable and therefore does not give you a random value.
We humans may think that the output is random, because we lack the intuition for the functions used to compute the result, such that when we "see" a number 0.623513632 followed by another number 0.9734126 for only slight variations in the input vector, we could draw the conclusion that "yeah, that looks pretty random", when in fact it obviously isn't. It is just what that function computed, given two input values.
So, when you already have a deterministic function like the above and want to obtain values in the closed range [0, 6] from it as a GLSL uint, you can simply scale the output of said function by multiplying it by 7.0 and truncating the result (frac returns values in [0, 1), so the scaled value lies in [0, 7)):
g(uv) = uint(f(uv) * 7.0)
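As a rough illustration, the scaling step can be mirrored on the CPU in C++. This is only a sketch: frac, hash01 and hash0to6 are names made up here, the float arithmetic will not match any particular GPU bit for bit, and the clamp to 6 is an added safety assumption in case rounding ever makes frac return exactly 1.0.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Fractional part, mirroring HLSL frac(): x - floor(x).
float frac(float x) {
    return x - std::floor(x);
}

// CPU mirror of f(uv) from the question; returns a value in [0, 1].
float hash01(float u, float v) {
    const float d = u * 12.9898f + v * 78.233f;  // dot(uv, float2(12.9898, 78.233))
    return frac(std::sin(d) * 43758.5453123f);
}

// g(uv) = uint(f(uv) * 7.0), clamped so the result always stays in 0..6.
std::uint32_t hash0to6(float u, float v) {
    const float scaled = hash01(u, v) * 7.0f;
    return std::min<std::uint32_t>(static_cast<std::uint32_t>(scaled), 6u);
}
```

Calling hash0to6(u, v) with the fragment's coordinates gives one of the seven values 0..6 for that fragment, and always the same value for the same coordinates.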
If you wanted to obtain true random numbers drawn from a random variable (whose deterministic function simply hasn't been found yet), you can obtain such values from a physical entropy source such as atmospheric noise (this is what random.org uses) and use that as an input to your shader (such as via textures or buffer objects).
But, from a computational perspective, a shader is just a function taking in values (ints, floats, ...) and computing (by means of computable functions) a deterministic result.
All we can do is to shuffle/scramble/diffuse the input bits in such a way, that the result "looks" like random to us. We then call these "pseudo-random" values.
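To make "scrambling the bits" concrete, here is a C++ sketch that uses the 32-bit finalizer of MurmurHash3 as the mixer. That choice is made here purely for illustration; nothing above prescribes a specific function. The helper name pixel_random_0_to_6 and the way the two coordinates are packed into one seed are assumptions. Reducing with % 7 carries a very small bias, because 2^32 is not a multiple of 7.

```cpp
#include <cstdint>

// MurmurHash3 32-bit finalizer: xor-shift and multiply steps that diffuse the input bits.
std::uint32_t mix32(std::uint32_t h) {
    h ^= h >> 16;
    h *= 0x85EBCA6Bu;
    h ^= h >> 13;
    h *= 0xC2B2AE35u;
    h ^= h >> 16;
    return h;
}

// Hypothetical helper: a pseudo-random value in 0..6 for integer pixel coordinates.
std::uint32_t pixel_random_0_to_6(std::uint32_t x, std::uint32_t y) {
    const std::uint32_t seed = x + y * 65536u;  // assumed packing; adapt to your resolution
    return mix32(seed) % 7u;
}
```

The same integer operations exist on GLSL and HLSL uint values, so a function of this shape should port directly, but that port is not shown or tested here.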
Taking this a step further, we could now ask the question of the distribution quality of the obtained pseudo-random values. This has two qualities:
how evenly distributed are the pseudo-random values in their domain/interval? I.e. do all possible values have the same probability of occurring? Or: Do you even want to have uniformly-distributed values or should the values follow another distribution (like Gaussian)?
how well are two values drawn from two sequential input values spaced apart? I.e. what is the frequency of the pseudo-random values?
There are different (deterministic) algorithms/functions depending on which distribution and which frequency spectrum your values should have. But first, you should define an answer to the two questions for your use-case.
And by the way, the commonly used function in your question to obtain pseudo-random numbers in a shader has a terrible distribution quality.
Last but not least, it should also be mentioned that true randomness (i.e. non-determinism), like when you do use an entropy source as input values, is oftentimes an undesirable property in computation, because it:
makes it difficult to repeat the same computation / output when needed, which is useful in various algorithms in the context of path tracing
makes it difficult to reproduce/debug/inspect your function for a particular run when every following execution/run will yield a different output

Related

Checking results of parallelized BLAS routines

I implemented some parallel BLAS routines in OpenCL. To check if the kernels are correct, I also implemented the same routines in a naive way. After executing the kernels I compare the kernel results with the results of the naive implementation.
I understand that I can not compare float values with ==. I therefore calculate the absolute difference of the two floats and check if it exceeds a limit. I already read this article that describes a few other methods of comparing floats. My problem however is, that I am unsure about the limit to use to compare the floats. In my case the limit seems highly dependent on the BLAS routine and input size.
For example, I implemented asum that calculates the absolute sum of a vector of float values. For an input vector of size 16 777 216 the difference between the naive implementation and my parallelized implementation is 96! For an input size of 1 048 576 the difference is only 0.5. I'm fairly certain that my kernel is correct, because I checked the results by hand for small input sizes. I'm guessing the difference accumulates due to the large input vector.
My question is, is there a way to calculate the maximal difference that can originate from float inaccuracies? Is there a way to know when the difference is definitely due to an error in the kernel code?
There is a technique called interval arithmetic (also known as interval mathematics) you can use here.
Instead of having some fixed error which you deem acceptable, you keep track of the most and least value a given floating point operation could "actually" be referring to.
Wikipedia has an article on it.
If I couldn't find a library, what I'd do is create an interval float type. It contains two floats, which represent the highest and lowest (inclusive) values that the interval could represent.
It would override + and * and / and - to include the effects of rounding. It would take work to write.
So if you add {1.0,1.0} and {2.0,2.0}, the answer would be {3.0,3.0}, as the range of values in the 3.0 may be large enough to account for the errors in the 1.0 and 2.0s.
Subtract {2.0, 2.0} and the answer becomes {0.9999999999997, 1.00000000003} or similar, as the error in the {3.0, 3.0} is larger than the error implied by {1.0, 1.0}.
The same holds for multiplication and division.
It may be shockingly easy for these intervals to reach "every possible number including inf/nan" if you have division involved. And, as noted, subtraction leads to serious problems; and if you have large terms that cancel, you can easily end up with error bars far larger than you might expect.
In the end, if your OpenCL solution results in a value within the interval, you can say "well, it isn't wrong".
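A minimal C++ sketch of such an interval type, restricted to the addition that asum needs, might look like the following. The names Interval, add, interval_asum and contains are made up for this example. It assumes round-to-nearest float arithmetic and widens each bound by one representable float with std::nextafter, which is a conservative stand-in for switching the hardware rounding mode.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Closed interval [lo, hi] that is guaranteed to contain the exact real result.
struct Interval {
    float lo;
    float hi;
};

// Add two intervals, then push each bound one float outward to cover the rounding error.
Interval add(Interval a, Interval b) {
    const float inf = std::numeric_limits<float>::infinity();
    return {std::nextafter(a.lo + b.lo, -inf), std::nextafter(a.hi + b.hi, inf)};
}

// Interval enclosure of sum(|x_i|), accumulated in the same order as a naive loop.
Interval interval_asum(const std::vector<float>& v) {
    Interval sum{0.0f, 0.0f};
    for (float x : v) {
        const float a = std::fabs(x);  // taking the absolute value is exact
        sum = add(sum, Interval{a, a});
    }
    return sum;
}

// True if x lies inside the interval (bounds included).
bool contains(Interval i, float x) {
    return i.lo <= x && x <= i.hi;
}
```

Checking contains(interval_asum(input), kernel_result) then flags results that cannot be explained by the rounding accounted for in the interval. The interval can grow wide for large inputs, so a pass only says the result is not provably wrong.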

boost::math::pdf, how does it compute a probability from a normal distribution with just 1 value?

I am using boost::math::pdf to calculate a probability from a normal distribution. I pass in a variable which corresponds to the distance to the mean, and boost::math::pdf gives me a probability in return.
It works, but I really don't get how, because in a continuous distribution (and a normal distribution is a continuous distribution) you need to integrate between two values to get a probability.
If the distribution is discrete, then a point really does correspond to a probability, but from everything I've read I got the impression that I am dealing with a continuous distribution.
I would really appreciate it if anyone could shed light on the topic. How do you get the probability of just one value with boost::math::pdf?
PS: Since computers work in a discrete way, I thought maybe the normal distribution I am using is discrete after all, but that doesn't make sense tbh.
PDF stands for Probability Density Function, which is just a specialized function whose area under the curve from -infinity to +infinity equals 1 (for continuous probability distributions).
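You are giving it the X value, and it is returning the resulting Y value. Your interpretation of that value is not correct: it is NOT the probability of the result equaling EXACTLY that X value (you are correct that this probability is zero for a continuous distribution). The Y value is a density; you only get a probability by integrating it over an interval of X values.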
I recommend you read up about PDF (see the link above) so you understand the library function.
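To make the density versus probability distinction concrete, here is a small standard-library C++ sketch. It deliberately does not use Boost; normal_pdf, normal_cdf and normal_prob_between are names invented for this example. The density is the height of the curve at x, while a probability is the difference of the cumulative distribution function at two points.

```cpp
#include <cmath>

// Density of the normal distribution at x; this is a curve height, not a probability.
double normal_pdf(double x, double mean, double sd) {
    const double pi = 3.14159265358979323846;
    const double z = (x - mean) / sd;
    return std::exp(-0.5 * z * z) / (sd * std::sqrt(2.0 * pi));
}

// Cumulative distribution function: P(X <= x).
double normal_cdf(double x, double mean, double sd) {
    return 0.5 * std::erfc(-(x - mean) / (sd * std::sqrt(2.0)));
}

// Probability that X falls in [a, b], i.e. the integral of the density from a to b.
double normal_prob_between(double a, double b, double mean, double sd) {
    return normal_cdf(b, mean, sd) - normal_cdf(a, mean, sd);
}
```

With a small standard deviation the density can exceed 1 (for example normal_pdf(0.0, 0.0, 0.1) is roughly 3.99), which is a quick way to see that it cannot be a probability.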

Y-Axis Units on pnorm command?

When one generates a graph using the pnorm command it generates a graph with units:
Y Axis: Normal F[(Variable Name-m)/s]
X Axis: P[i] = i/(N+1)
The X-Axis seems reasonable to calculate by hand. I am confused as to what the units of the Y-Axis mean?
How does Normal F[(Variable Name-m)/s] break down? Does m represent the mean and s represent the standard deviation? If so, what does the function Normal F() represent?
This is a query about the underlying statistics.
F (usually better typeset in italics) is standard statistical notation for the cumulative distribution function, often abbreviated distribution function. That's the probability of being less than any particular value. For a single variable, as here, the function approaches 0 as values decrease towards the minimum of that variable (nothing can be less than the minimum) and 1 as values increase towards its maximum (nothing can be more).
In the case of the normal (Gaussian) distribution in principle any finite value is possible. The distribution function depends on the mean m and standard deviation s, as you surmise, which specify the particular normal distribution being compared with data. So, in words we have "normal distribution function with mean and standard deviation for these data".
All documented:
Stata manual entry for pnorm
Wikipedia on normal distribution
Wikipedia on P-P plots
FAQ on plotting positions
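For readers who want to see the two axes computed by hand, here is a hedged C++ sketch. The function names are invented, and using the sample standard deviation (dividing by N-1) for s is an assumption of this sketch, not something confirmed from the Stata documentation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Standard normal cumulative distribution function.
double std_normal_cdf(double z) {
    return 0.5 * std::erfc(-z / std::sqrt(2.0));
}

// Returns (P[i], F[(x_i - m)/s]) pairs for the sorted data; needs at least two values.
std::vector<std::pair<double, double>> pp_points(std::vector<double> x) {
    std::sort(x.begin(), x.end());
    const double n = static_cast<double>(x.size());
    double m = 0.0;
    for (double v : x) m += v;
    m /= n;
    double ss = 0.0;
    for (double v : x) ss += (v - m) * (v - m);
    const double s = std::sqrt(ss / (n - 1.0));  // assumed: sample standard deviation
    std::vector<std::pair<double, double>> pts;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double p = static_cast<double>(i + 1) / (n + 1.0);  // plotting position i/(N+1)
        pts.emplace_back(p, std_normal_cdf((x[i] - m) / s));
    }
    return pts;
}
```

If the data are close to normal, the returned points lie near the diagonal from (0, 0) to (1, 1).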

Hash 16-bit integer to a 256-bit space efficiently

It sounds weird to be going bigger, but that's what I'm trying to do. I want to take the entire sequence of 16-bit integers and hash each one in such a way that it maps to 256-bit space uniformly.
The reason for this is that I'm trying to put a subset of the 16-bit number space into a 256-bit bloom filter, for fast membership testing.
I could use some well-known hashing function on each integer, but I'm looking for an extremely efficient implementation (just a few instructions) so that this runs well in a GPU shader program. I feel like the fact that the hash input is known to be only 16 bits can inform how the hash function is designed somehow, but I am failing to see the solution.
Any ideas?
EDITS
Based on the responses, my original question is confusing. Sorry about that. I will try to restate it with a more concrete example:
I have a subset S1 of n numbers from the set S, which is in the range (0, 2^16-1). I need to represent this subset S1 with a 256-bit bloom filter constructed with a single hashing function. The reason for the bloom filter is a space consideration. I've chosen a 256-bit bloom filter because it fits my space requirements, and has a low enough probability of false positives. I'm looking to find a very simple hashing function that can take a number from set S and represent it in 256 bits such that each bit has roughly equal probability of being 1 or 0.
The reason for the requirement of simplicity in the hashing function is that this hashing function is going to have to run thousands of times per pixel, so anywhere where I can trim instructions is a win.
If you multiply (using uint32_t) a 16 bit value by a prime (or for that matter any odd number) p between 2^31 and 2^32, then you "probably" smear the results fairly evenly across the 32 bit space. Then you might want to add another prime value, to prevent 0 mapping to 0 (you want each bit to have an equal probability of being 0 or 1, only one input value in 2^256 should have output all zeros, and since there are only 2^16 inputs that means you want none of them to have output all zeros).
So that's how to expand 16 bits to 32 with one operation (plus whatever instructions are needed to load the constant). Use eight different values p1 ... p8 to get 256 bits (8 * 32 = 256), and run some tests with different p values to find good ones (i.e. those that produce not too many more false positives than what you expect for your Bloom filter given the size of the set you're encoding and assuming an ideal hashing function). For example I'm pretty sure -1 is a bad p-value.
No matter how good the values, you'll see some correlations, though: for example as I've described it above the lowest bit of all 8 separate values will be equal, which is a pretty serious dependency. So you probably want a couple more "mixing" operations. For example you might say that each byte of the final output shall be the XOR of two of the bytes of what I've described (and not two least-significant bytes!), just to get rid of the simple arithmetic relations.
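A minimal C++ sketch of the multiply-and-add expansion is shown below, using eight 32-bit words since 8 * 32 = 256. The constants are arbitrary odd values in the required range picked only for illustration; they have not been tested for quality and are not claimed to be prime. No extra mixing step is included, so the lowest-bit correlation described above is still present.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Arbitrary odd multipliers in [2^31, 2^32); illustrative only, not tuned.
constexpr std::array<std::uint32_t, 8> kMul = {
    0x9E3779B1u, 0x85EBCA6Bu, 0xC2B2AE35u, 0xCC9E2D51u,
    0xB5297A4Du, 0x9E3779B9u, 0xD6E8FEB9u, 0xA136AAADu};

// Arbitrary odd additive constants so that input 0 does not map to all zeros.
constexpr std::array<std::uint32_t, 8> kAdd = {
    0x7F4A7C15u, 0x2545F491u, 0x3C6EF373u, 0x1B873593u,
    0x165667B1u, 0x27D4EB2Fu, 0x6C078965u, 0x5851F42Du};

// Expands a 16-bit value to 256 bits: one wrapping multiply and add per 32-bit word.
std::array<std::uint32_t, 8> expand16to256(std::uint16_t x) {
    std::array<std::uint32_t, 8> out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<std::uint32_t>(x) * kMul[i] + kAdd[i];
    }
    return out;
}
```

Because every multiplier and every additive constant is odd, bit 0 of each output word equals the inverse of bit 0 of the input, which is exactly the dependency that a further mixing step would need to remove.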
Unless I've misunderstood the question, though, this is not how a Bloom filter usually works. Usually you want your hash to produce an exact fixed number of set bits for each input, and all the arithmetic to compute the false positive rate relies on this. That's why for a Bloom filter 256 bits in size you'd normally have k 8-bit hashes, not one 256-bit hash. k is normally rather less than half the size of the filter in bits (the optimal value is the number of bits per value in the filter, times ln(2) which is about 0.7). So normally you don't want the probability of each bit being 1 to be anything like as high as 0.5.
The reason is that once you've ORed as few as 4 such 256-bit values together, almost all the bits in your filter are set (15 in 16 of them). So you're looking at a lot of false positives already.
But if you've done the math and you're happy with a single hash function producing a variable number of set bits averaging half of them, then fair enough. Or is the double-occurrence of the number 256 just a coincidence, because k happens to be 32 for the set size you have chosen and you're actually using the 256-bit hash as 32 8-bit hashes?
[Edit: your comment clarifies this, but anyway k should not get so high that you need 256 bits of hash in total. Clearly there's no point in this case using a Bloom filter with more than 16 bits per value (i.e. fewer than 16 values), since using the same amount of space you could just list the values, and have a false positive rate of 0. A filter with 16 bits per value gives a false positive rate of something like 1 in 2200. Even there, optimal k is only about 11 (16 times ln(2)), that is you should set 11 bits in the filter for each value in the set. If you expect the sets to be bigger than 16 values then you want to set fewer bits for each element, and you'll get a higher false positive rate.]
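The figures in this edit can be checked with the standard Bloom filter approximations. The sketch below assumes ideal, independent hash functions; optimal_k and false_positive_rate are names chosen here.

```cpp
#include <cmath>

// Optimal number of hash functions for m filter bits and n stored values: (m/n) * ln 2.
double optimal_k(double m_bits, double n_values) {
    return (m_bits / n_values) * std::log(2.0);
}

// Approximate false positive rate with k hash functions: (1 - e^(-k*n/m))^k.
double false_positive_rate(double m_bits, double n_values, double k) {
    return std::pow(1.0 - std::exp(-k * n_values / m_bits), k);
}
```

For a 256-bit filter holding 16 values, optimal_k(256, 16) is about 11.1, and false_positive_rate(256, 16, 11) comes out at roughly 1 in 2200.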
I believe there is some confusion in the question as posed. I will first try to clear up any inconsistencies I've noticed above.
OP originally states that he is trying to map a smaller space into a larger one. If this is truly the case, then the use of the bloom filter algorithm is unnecessary. Instead, as has been suggested in the comments above, the identity function is the only "hash" function necessary to set and test each bit. However, I make the assertion that this is not really what the OP is looking for. If so, then the OP must be storing 2^256 bits in memory (based on how the question is stated) in order for the space of 16-bit integers (i.e. 2^16) to be smaller than his set size; this is an unreasonable amount of memory to be using and is highly unlikely to be the case.
Therefore, I make the assumption that the problem constraints are as follows: we have a 256-bit bit vector in which we want to map the space of 16-bit integers. That is, we have 256 bits available to map 2^16 possible different integers. Thus, we are not actually mapping into a larger space, but, instead, a much smaller space. Similarly, it does appear (again, as previously pointed out in the comments above) that the OP is requesting a single hash function. If this is the case, there is clear misunderstanding about how bloom filters work.
Bloom filters typically use a set of independent hash functions to reduce false positives. Without going into too much detail, every input to the bloom filter runs through all n hash functions and then the resulting index in the bit vector is tested for each function. If all indices tested are set to 1, then the value may be in the set (false positives occur when the indices of other stored values collide with or overlap all n of them). Moreover, if any of the indices is set to 0, then the value is absolutely not in the set. With this in mind, it is important to notice that an entirely saturated bloom filter has no benefit. That is, every query to the bloom filter will return that the item is in the set.
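The set and test logic described above can be sketched in C++ as follows. This is an illustration under assumptions, not a tuned design: it sets 4 bits per value, and it derives the four 8-bit indices from the four bytes of one mixed 32-bit word (the MurmurHash3 finalizer, chosen here for convenience) instead of from four truly independent hash functions. Bloom256 and its member names are invented for the example.

```cpp
#include <bitset>
#include <cstdint>

// Illustrative 32-bit mixer (MurmurHash3 finalizer); any well-mixing function could stand in.
std::uint32_t mix32(std::uint32_t h) {
    h ^= h >> 16;
    h *= 0x85EBCA6Bu;
    h ^= h >> 13;
    h *= 0xC2B2AE35u;
    h ^= h >> 16;
    return h;
}

// 256-bit Bloom filter that sets 4 bits per value, one per byte of the mixed word.
struct Bloom256 {
    std::bitset<256> bits;

    void insert(std::uint16_t value) {
        const std::uint32_t h = mix32(value + 1u);  // +1 so that 0 does not mix to 0
        for (int i = 0; i < 4; ++i) {
            bits.set((h >> (8 * i)) & 0xFFu);
        }
    }

    // false: definitely absent; true: possibly present (false positives can occur).
    bool maybe_contains(std::uint16_t value) const {
        const std::uint32_t h = mix32(value + 1u);
        for (int i = 0; i < 4; ++i) {
            if (!bits.test((h >> (8 * i)) & 0xFFu)) return false;
        }
        return true;
    }
};
```

An inserted value is always reported as possibly present; a value that was never inserted is usually, but not always, reported as absent.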
Hash Function Concerns
Now, back to the OP's original question. It is likely going to be best to use known hashing algorithms (since these are mathematically difficult to write and "rolling your own" typically doesn't end well). If you are worried about efficiency down to clock-cycles, implement the algorithm yourself in the appropriate assembly language for your architecture to reduce running time for each hash function. Remember, algorithmically, hash functions should run in O(1) time, so they should not contribute too much overhead if implemented properly. To start you off, I would recommend considering the modified bernstein hash. I have written a version for your specific case below (mostly for example purposes):
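/* Modified Bernstein hash specialised to a 16-bit key. */
unsigned char modified_bernstein(unsigned short key)
{
    unsigned ret = key & 0xffu;        /* start with the low byte */
    ret = (33u * ret) ^ (key >> 8);    /* multiply by 33, then XOR in the high byte */
    return (unsigned char)(ret % 256); /* reduce to a valid index in [0, 256) */
}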
The bernstein method I have adapted generally runs as a function of the number of bytes of the input. Since a short type is 2 bytes or 16-bits, I have removed any variables and loops from the algorithm and simply performed some bit twiddling to get at each byte. Finally, an unsigned char can return a value in the range of [0,256) which forces the hash function to return a valid index in the bit vector.

Using bitwise & instead of modulus operator to randomly sample integers from a range

I need to randomly sample from a uniform distribution of integers over the interval [LB,UB] in C++. To do so, I start with a "good" RN generator (from Numerical Recipes 3rd ed.) that uniformly randomly samples 64-bit integers; let's call it int64().
Using the mod operator, I can sample from the integers in [LB,UB] by:
LB+int64()%(UB-LB+1);
The only issue with using the mod operator is the slowness of the integer division. So, I then tried the method suggested here, which is:
LB + (int64()&(UB-LB))
The bitwise & method is about 3 times as fast. This is huge for me, because one of my simulations in C++ needs to randomly sample about 20 million integers.
But there's 1 big problem. When I analyze the integers sampled using the bitwise & method, they don't appear uniformly distributed over the interval [LB,UB]. The integers are indeed sampled from [LB,UB], but only from the even integers in that range. For example, here is a histogram of 5000 integers sampled from [20,50] using the bitwise & method:
By comparison, here is what a similar histogram looks like when using the mod operator method, which of course works fine:
What's wrong with my bitwise & method? Is there any way to modify it so that both even and odd numbers are sampled over the defined interval?
The bitwise & operator looks at each pair of corresponding bits of its operands, performs an and using only those two bits, and puts that result in the corresponding bit of the result.
So, if the last bit of UB-LB is 0, then the last bit of the result is 0. That is to say, if UB-LB is even then every output will be even.
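The effect is easy to reproduce in C++. In the sketch below the random draw is passed in as r, so the behaviour does not depend on any particular generator; sample_with_and is a name chosen here. With [20,50] the mask is 30 (binary 11110), so bit 0 of the offset is always cleared.

```cpp
#include <cstdint>

// The sampling expression from the question, with the random draw passed in as r.
std::int64_t sample_with_and(std::uint64_t r, std::int64_t lb, std::int64_t ub) {
    const std::uint64_t mask = static_cast<std::uint64_t>(ub - lb);
    return lb + static_cast<std::int64_t>(r & mask);
}
```

Every result for [20,50] is an even number in the range, whatever r is. With a range such as [0,7], where UB-LB+1 is a power of 2, the mask is 7 and all eight values are reachable.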
The & is inappropriate to the purpose, unless UB-LB+1 is a power of 2. If you want to find a modulus, then there's no general shortcut: the compiler will already implement % the fastest way it knows.
Note that I said no general shortcut. For particular values of UB-LB, known at compile time, there can be faster ways. And if you can somehow arrange for UB and LB to have values that the compiler can compute at compile time then it will use them when you write %.
By the way, using % does not in fact produce uniformly-distributed integers over the range, unless the size of the range is a power of 2. Otherwise there must be a slight bias in favour of certain values, because the range of your int64() function cannot be assigned equally across the desired range. It may be that the bias is too small to affect your simulation in particular, but bad random number generators have broken random simulations in the past, and will do so again.
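The bias is easiest to see with a deliberately tiny generator. The sketch below assumes a hypothetical 8-bit generator that produces each value 0..255 exactly once, and counts how often each residue modulo 6 appears; residue_counts_mod6 is a name made up for this example.

```cpp
#include <array>
#include <cstddef>

// Counts how often each residue occurs when every 8-bit value 0..255 is reduced mod 6.
std::array<int, 6> residue_counts_mod6() {
    std::array<int, 6> counts{};
    for (int r = 0; r < 256; ++r) {
        ++counts[static_cast<std::size_t>(r % 6)];
    }
    return counts;
}
```

Since 256 = 6 * 42 + 4, the residues 0 to 3 occur 43 times and the residues 4 and 5 only 42 times. With a 64-bit generator the same effect exists but is far smaller relative to the number of possible draws.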
If you want a uniform random number distribution over an arbitrary range, then use std::uniform_int_distribution from C++11, or the class of the same name in Boost.
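A minimal sketch of the C++11 route follows. std::mt19937_64 is used as a stand-in engine here; it is not the Numerical Recipes generator from the question, and sample_uniform is a name chosen for the example.

```cpp
#include <cstdint>
#include <random>

// Draws one integer uniformly from the closed interval [lb, ub] using the supplied engine.
std::int64_t sample_uniform(std::mt19937_64& engine, std::int64_t lb, std::int64_t ub) {
    std::uniform_int_distribution<std::int64_t> dist(lb, ub);
    return dist(engine);
}
```

In a hot loop it is probably better to construct the distribution once outside the loop and call it repeatedly, though whether that matters for 20 million samples would need measuring.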
This works well if the range difference (UB-LB) is 2^n-1, but won't work at all well if it is, for example, 2^n.
The two are equivalent only when the size of the interval is a power of two. In general y%x and y&(x-1) are not the same.
For example, x%5 produces numbers from 0 to 4 (or to -4, for negative x), but x&4 produces either 0 or 4, never 1, 2, or 3, because of how bitwise operators work...
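The power-of-two condition can be written down directly. In this C++ sketch is_power_of_two and mod_pow2 are names chosen for illustration, and unsigned 64-bit values are assumed so that negative operands do not come into play.

```cpp
#include <cstdint>

// True if x has exactly one bit set.
bool is_power_of_two(std::uint64_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}

// y % x computed with a mask; the result is only correct when x is a power of two.
std::uint64_t mod_pow2(std::uint64_t y, std::uint64_t x) {
    return y & (x - 1);
}
```

For x = 8 the mask is 7 and mod_pow2 agrees with y % 8 for every y. For x = 5 the mask is 4, so for y = 13 the mask gives 4 while 13 % 5 is 3.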