I've seen this pseudo-random number generator for use in shaders referred to here and there around the web:
float rand(vec2 co){
return fract(sin(dot(co.xy ,vec2(12.9898,78.233))) * 43758.5453);
}
It's variously called "canonical", or "a one-liner I found on the web somewhere".
What's the origin of this function? Are the constant values as arbitrary as they seem or is there some art to their selection? Is there any discussion of the merits of this function?
EDIT: The oldest reference to this function that I've come across is this archive from Feb '08, the original page now being gone from the web. But there's no more discussion of it there than anywhere else.
Very interesting question!
I am trying to figure this out while typing the answer :)
First an easy way to play with it: http://www.wolframalpha.com/input/?i=plot%28+mod%28+sin%28x*12.9898+%2B+y*78.233%29+*+43758.5453%2C1%29x%3D0..2%2C+y%3D0..2%29
Then let's think about what we are trying to do here: For two input coordinates x,y we return a "random number". Now this is not a random number though. It's the same every time we input the same x,y. It's a hash function!
The first thing the function does is go from 2D to 1D via the dot product. That is not interesting in itself, but the numbers appear to be chosen so that the result does not repeat over typical input ranges. We also have a floating-point multiply and add there; it mixes in a few more bits from x and y, and the constants may simply be chosen so that this mixing works out well.
Then we sample a black box sin() function. This will depend a lot on the implementation!
Lastly, it amplifies the error in the sin() implementation by multiplying by a large constant and taking the fractional part.
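To make those three steps explicit, here is a minimal CPU-side C++ sketch of the same pipeline (assuming double-precision std::sin as a stand-in for the GPU's black-box sin, so the exact output will differ from any particular GPU):
#include <cmath>
#include <cstdio>

// CPU-side sketch of the shader one-liner, split into its three stages.
// std::sin is only a stand-in for the GPU's implementation-defined sin().
double rand2d(double x, double y) {
    double t = x * 12.9898 + y * 78.233;   // 1) 2D -> 1D via the dot product
    double s = std::sin(t);                // 2) sample the "black box" sin
    double v = s * 43758.5453;             // 3) amplify...
    return v - std::floor(v);              //    ...and keep only the fractional part
}

int main() {
    for (int i = 0; i < 4; ++i)
        std::printf("%f\n", rand2d(i * 0.1, i * 0.2));
    return 0;
}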
I don't think this is a good hash function in the general case. Numerically, sin() is a black box on the GPU. It should be possible to construct a much better one by taking almost any hash function and converting it. The hard part is turning the integer operations typical of CPU hashing into float (half or 32-bit) or fixed-point operations, but it should be possible.
Again, the real problem with this as a hash function is that sin() is a black box.
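For example, one well-known integer hash that could serve as a starting point is Thomas Wang's 32-bit hash. The sketch below (plain C++ rather than GLSL) shows how it might be adapted to return a float in [0, 1); the way the two coordinates are combined into one seed is my own arbitrary choice, not part of the original shader:
#include <cstdint>

// Thomas Wang's 32-bit integer hash.
uint32_t wang_hash(uint32_t seed) {
    seed = (seed ^ 61u) ^ (seed >> 16);
    seed *= 9u;
    seed = seed ^ (seed >> 4);
    seed *= 0x27d4eb2du;
    seed = seed ^ (seed >> 15);
    return seed;
}

// Hypothetical 2D float hash built on top of it: combine the two coordinates
// into one seed (the combining constants are arbitrary) and scale to [0, 1).
float hash2d(uint32_t x, uint32_t y) {
    uint32_t h = wang_hash(x * 1973u + y * 9277u);
    return h * (1.0f / 4294967296.0f);
}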
The origin is probably the paper: "On generating random numbers, with help of y= [(a+x)sin(bx)] mod 1", W.J.J. Rey, 22nd European Meeting of Statisticians and the 7th Vilnius Conference on Probability Theory and Mathematical Statistics, August 1998
EDIT: Since I can't find a copy of this paper and the "TestU01" reference may not be clear, here's the scheme as described in TestU01 in pseudo-C:
#define A1 ???          // not specified in the description
#define A2 ???          // not specified
#define B1 pi*(sqrt(5.0)-1)/2
#define B2 ???          // not specified
uint32_t n; // position in the stream
double next() {
    double t = fract(A1 * sin(B1*n));      // first fract-sin stage, driven by the stream index
    double u = fract((A2+t) * sin(B2*t));  // second stage, driven by the first
    n++;
    return u;
}
where the only constant with a recommended value is B1.
Notice that this is written for a stream; to convert it to a 1D hash, 'n' becomes the integer grid position. So my guess is that someone saw this and converted 't' into a simple function f(x,y). Using the constants from the original shader, that would yield:
float hash(vec2 co){
float t = 12.9898*co.x + 78.233*co.y;
return fract((A2+t) * sin(t)); // any B2 is folded into 't' computation
}
The constant values are arbitrary; notably, they are very large and a couple of decimals away from prime numbers.
Taking a high-amplitude sine, multiplying it by a large constant (about 43758 here), and then taking the result modulo 1 is still a periodic function. It's like a window blind or a sheet of corrugated metal whose ripples are made very fine by the large multiplier, and turned at an angle by the dot product.
Because the function is 2D, the dot product has the effect of turning that periodic function at an oblique angle relative to the X and Y axes, by a ratio of roughly 13/79. It is inefficient; I think you could achieve much the same thing with less math by just taking the sine of (13x + 79y).
If you find the period of the function in both X and Y, you can sample it so that it will look like a simple sine wave again.
[Image: zoomed-in graph of the function]
I don't know the origin, but it is similar to many others; if you used it in graphics at regular intervals, it would tend to produce moiré patterns, and you could see that it eventually wraps around again.
Maybe it's some non-recurrent chaotic mapping, which could explain many things, but it could also just be some arbitrary manipulation with large numbers.
EDIT: Basically, the function fract(sin(x) * 43758.5453) is a simple hash-like function: sin(x) provides a smooth value between -1 and 1, so sin(x) * 43758.5453 ranges from -43758.5453 to 43758.5453. This is a huge range, so even a small step in x produces a large step in the result and a really large variation in the fractional part. The "fract" is needed to bring the value back into the range [0, 1).
Now, once we have something like a hash function, we need a way to produce a hash from a vector. The simplest way is to call the "hash" separately for the x and y components of the input vector, but then we would get symmetrical values. So instead we should reduce the vector to a single value; the approach is to pick some "random" vector and take the dot product with it, and here we go: fract(sin(dot(co.xy ,vec2(12.9898,78.233))) * 43758.5453);
Also, the selected vector should be long enough that the result of the "dot" product spans several periods of the "sin" function.
I do not believe this to be the true origin, but the OP's code is presented as a code example in "The Book of Shaders" by Patricio Gonzalez Vivo and Jen Lowe ( https://thebookofshaders.com/10/ ). In their code, Patricio Gonzalez Vivo is cited as the author, i.e. "// Author #patriciogv - 2015".
Since the OP's research dates back even further (to '08), the source might at least explain its popularity, and the author might be able to shed some light on his source.
Related
Hello stackoverflow community,
I am having trouble understanding a least-squares-error problem with the C++ Armadillo package.
I have a matrix A with many more rows than columns (5000 by 100, for example), so it is overdetermined.
I want to find x so that A*x=b gives me the least-squares error.
If I use the solve function of Armadillo on my data, like "x = solve(A,b)", the error "(A*x-b)^2" is sometimes way too high.
If, on the other hand, I solve for x with the analytical form "x = (A^T * A)^-1 * A^T * b", the results are always right.
The results for x in the two cases can differ by 10 orders of magnitude.
I had thought that Armadillo would use this analytical form in the background if the system is overdetermined.
Now I would like to understand why these two methods give such different results.
I wanted to give a short example program, but I can't reproduce this behavior with a short program.
I thought about posting the matrix here, but at 5000 by 100 it is also very big. I can provide the values for which this happens if needed.
So, as a short background:
The matrix I get from my program is the numerically solved response of a nonlinear oscillator, into which I feed information by wiggling a parameter of the system.
Because the influence of this parameter on the system is small, the values in my different rows are very similar but never identical; otherwise Armadillo should throw an error.
I'm still thinking that this is the problem, but the solve function never threw any error.
Another thing that confuses me is that in a short example program with a random matrix, the analytical form is waaay slower than the solve function.
But on my program, both are nearly identically fast.
I guess this has something to do with the numerical convergence of the pseudo-inverse and the special case of my matrix, but I don't know enough about how Armadillo works to say.
I hope someone can help me with that problem and thanks a lot in advance.
Thanks for the replies. I think I figured the problem out and wanted to give some feedback for everyone who runs into the same problem.
The Armadillo solve function gives me the x that minimizes (A*x-b)^2.
I looked at the values of x and they are sometimes in the magnitude of 10^13.
This comes from the fact that the rows of my matrix change only slightly (so they are nearly linearly dependent, but not exactly).
Because of that, I was running into the numerical precision limit of my doubles, and as a result my error sometimes jumped around.
If I use the rearranged analytical formula (A^T * A)*x = A^T * b with the solve function, this problem doesn't occur anymore, because the fitted values of x are on the order of 10^4. The least-squares error is a little bit higher, but that is okay, as I want to avoid overfitting.
I have now additionally added Tikhonov regularization by solving (A^T * A + lambda*Identity_Matrix)*x = A^T * b with the solve function of Armadillo.
Now the weight vectors are on the order of 1, and the error barely changes compared to the formula without regularization.
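For reference, a minimal Armadillo sketch of this regularized normal-equations approach might look like the following (the matrix contents and the lambda value are placeholders, not the data from my actual program):
#include <armadillo>
#include <iostream>

int main() {
    // Placeholder problem with the same shape as above: overdetermined, 5000 by 100.
    arma::mat A(5000, 100, arma::fill::randn);
    arma::vec b(5000, arma::fill::randn);
    double lambda = 1e-3;  // regularization strength (placeholder value)

    // Regularized normal equations: (A^T A + lambda I) x = A^T b
    arma::mat AtA = A.t() * A;
    arma::vec Atb = A.t() * b;
    arma::vec x = arma::solve(AtA + lambda * arma::eye(AtA.n_rows, AtA.n_cols), Atb);

    std::cout << "residual norm: " << arma::norm(A * x - b) << std::endl;
    std::cout << "weight norm:   " << arma::norm(x) << std::endl;
    return 0;
}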
I have some code, a function basically, that returns a value. This function takes a long time to run. The function takes a double as a parameter:
double estimate(double factor);
My goal is to find the parameter factor at which this estimate function returns its maximum value. I could simply brute force it and iterate over different factor inputs to get what I need, but the function takes a long time to run, so I'd like to minimize the number of "probes" I take (i.e. call the estimate function as few times as possible).
Usually, the maximum is returned for factor values between 0.5 and 3.5. If I graph the returned values, I get something that looks like a bell curve. What's the most efficient approach to partitioning the possible inputs so that I can discover the maximum faster?
The previous answer suggested a 2 point approach. This is a good idea for functions that are approximately lines, because lines are defined by 2 parameters: y=ax+b.
However, the actual bell-shaped curve is more like a parabola, which is defined by ax²+bx+c (so 3 parameters). You therefore should take 3 points {x1,x2,x3} and solve for {a,b,c}. This will give you an estimate for the xtop at -b/2a. (The linked answer uses the name x0 here).
You'll need to iteratively approximate the actual top if the function isn't a real parabola, but this process converges really fast. The easiest approach is to take the original triplet x1,x2,x3, add xtop, and remove the xn value which is furthest away from xtop. The advantage of this is that you can reuse 2 of the old f(x) values. This reuse helps a lot with the stated goal of "minimal samples".
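A minimal C++ sketch of this successive parabolic interpolation (the starting triplet, tolerance, and iteration cap are arbitrary choices; estimate() is the slow function from the question):
#include <algorithm>
#include <cmath>

double estimate(double factor);  // the expensive function from the question

// Vertex (extremum) of the parabola through (x1,f1), (x2,f2), (x3,f3).
// Note: the denominator is zero if the three points are collinear (not handled here).
double parabola_vertex(double x1, double f1, double x2, double f2, double x3, double f3) {
    double num = (x2 - x1) * (x2 - x1) * (f2 - f3) - (x2 - x3) * (x2 - x3) * (f2 - f1);
    double den = (x2 - x1) * (f2 - f3) - (x2 - x3) * (f2 - f1);
    return x2 - 0.5 * num / den;
}

double find_max_factor(double x1, double x2, double x3, double tol = 1e-4) {
    double f1 = estimate(x1), f2 = estimate(x2), f3 = estimate(x3);
    for (int i = 0; i < 50; ++i) {
        double xt = parabola_vertex(x1, f1, x2, f2, x3, f3);
        double ft = estimate(xt);  // one new probe per iteration
        // Replace the point furthest from xt, reusing the other two samples.
        double d1 = std::fabs(x1 - xt), d2 = std::fabs(x2 - xt), d3 = std::fabs(x3 - xt);
        if (d1 >= d2 && d1 >= d3)      { x1 = xt; f1 = ft; }
        else if (d2 >= d1 && d2 >= d3) { x2 = xt; f2 = ft; }
        else                           { x3 = xt; f3 = ft; }
        if (std::max({x1, x2, x3}) - std::min({x1, x2, x3}) < tol) break;
    }
    // Return the best of the current triplet.
    if (f1 >= f2 && f1 >= f3) return x1;
    if (f2 >= f1 && f2 >= f3) return x2;
    return x3;
}
Called, for example, as find_max_factor(0.5, 2.0, 3.5) for the range mentioned in the question.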
If your function indeed has a bell shaped curve then you can use binary search as follows:
Choose an initial x1 (say x1 = 2, midway between 0.5 and 3.5) and find f(x1) and f(x1 + delta), where delta is small enough. If f(x1 + delta) > f(x1), it means that the peak is to the right of x1; otherwise it is to the left.
Carry out the binary search until you get as close to the peak as you want.
You can modify the above approach by choosing the next x_t according to the difference f(x1 + delta) - f(x1).
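A minimal C++ sketch of this bisection on the sign of the local slope, assuming the single-peak (bell) shape holds over [0.5, 3.5] (delta and the tolerance are guesses, and estimate() is the slow function from the question):
double estimate(double factor);  // the expensive function from the question

// Bisect on the sign of the finite-difference slope f(x + delta) - f(x).
double find_peak(double lo = 0.5, double hi = 3.5,
                 double delta = 1e-3, double tol = 1e-3) {
    while (hi - lo > tol) {
        double mid = 0.5 * (lo + hi);
        // Two probes per step: if the function is still rising at mid,
        // the peak lies to the right; otherwise it lies to the left.
        if (estimate(mid + delta) > estimate(mid))
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}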
Here's the problem:
I am currently trying to create a control system which is required to find a solution to a series of complex linear equations without a unique solution.
My problem arises because there will only ever be six equations, while there may be upwards of 20 unknowns (usually far more than six). Of course, this will not yield a unique solution through standard Gaussian elimination or by putting the matrix into reduced row echelon form.
However, I think that I may be able to optimize things further and get a more accurate solution because I know that each of the unknowns cannot have a value smaller than zero or greater than one, but it is free to take on any value in between them.
Of course, I am trying to create code that would find a correct solution, but in the case that there are multiple combinations that yield satisfactory results, I would want to minimize Sum of (value of unknown * efficiency constant) over all unknowns, i.e. Sigma[xI*eI] from I=0 to n, but finding an accurate solution is of a greater priority.
Performance is also important, due to the fact that this algorithm may need to be run several times per second.
So, does anyone have any ideas to help me on implementing this?
Edit: You might just want to stick to linear programming with equality and inequality constraints, but here's an interesting exact solution that does not incorporate the constraint that your unknowns are between 0 and 1.
Here's a powerpoint discussing your problem: http://see.stanford.edu/materials/lsoeldsee263/08-min-norm.pdf
I'll translate your problem into math to make things a bit easier to figure out:
you have a 6x20 matrix A and a vector x with 20 elements. You want to minimize (x^T)e subject to Ax=y. According to the slides, if you were just minimizing the norm of x, then the answer would be the least-norm solution A^T(AA^T)^(-1)y. I'll take another look at this as soon as I get the chance and see what the solution is for minimizing (x^T)e (i.e. your specific problem).
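As an illustration, here is a minimal C++ sketch of that least-norm formula using Armadillo (my choice of library, not something from your question). It computes x = A^T (A A^T)^(-1) y by solving the small 6x6 system instead of forming the inverse explicitly, and it ignores the [0,1] bounds:
#include <armadillo>
#include <iostream>

int main() {
    // Placeholder sizes matching the question: 6 equations, 20 unknowns.
    arma::mat A(6, 20, arma::fill::randn);
    arma::vec y(6, arma::fill::randn);

    // Least-norm solution of the underdetermined system A x = y:
    // x = A^T (A A^T)^(-1) y, via a solve on the 6x6 matrix A A^T.
    arma::vec x = A.t() * arma::solve(A * A.t(), y);

    std::cout << "residual norm: " << arma::norm(A * x - y) << std::endl;
    std::cout << "solution norm: " << arma::norm(x) << std::endl;
    return 0;
}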
Edit: I looked in the powerpoint some more and near the end there's a slide entitled "General norm minimization with equality constraints". I am going to switch the notation to match the slide's:
Your problem is that you want to minimize ||Ax-b||, where b = 0 and A is your e vector and x is the 20 unknowns. This is subject to Cx=d. Apparently the answer is:
x=(A^T A)^-1 (A^T b -C^T(C(A^T A)^-1 C^T)^-1 (C(A^T A)^-1 A^Tb - d))
It's not pretty, but it's not as bad as you might think. There really aren't that many calculations. For example, (A^T A)^-1 only needs to be calculated once, and then you can reuse the answer. And your matrices aren't that big.
Note that I didn't incorporate the constraint that the elements of x are within [0,1].
It looks like the solution for what I am doing is with Linear Programming. It is starting to come back to me, but if I have other problems I will post them in their own dedicated questions instead of turning this into an encyclopedia.
My question is an extension of the discussion How to fit the 2D scatter data with a line with C++. Now I want to extend my question further: when estimating the line that fits 2D scatter data, it would be better if we could treat each 2D scatter point differently. That is to say, if a scatter point is far away from the line, we can give the point a low weighting, and vice versa. The question then becomes: given an array of 2D scatter points as well as their weighting factors, how can we estimate the line that passes through them? A good implementation of this method can be found in this article (weighted least-squares regression). However, the implementation of the algorithm in that article is too complicated, as it involves matrix calculation. I am therefore trying to find a method without matrix calculation. The algorithm is an extension of simple linear regression, and to illustrate it I wrote the following MATLAB code:
function line = weighted_least_squre_for_line(x, y, weighting)
% Weighted least-squares fit of a line to the points (x, y) with weights 'weighting'.
part1 = sum(weighting.*x.*y)*sum(weighting(:));
part2 = sum(weighting.*x)*sum(weighting.*y);
part3 = sum(x.^2.*weighting)*sum(weighting(:));
part4 = sum(weighting.*x).^2;
beta  = (part1-part2)/(part3-part4);                               % slope
alpha = (sum(weighting.*y)-beta*sum(weighting.*x))/sum(weighting); % intercept
% Return the line in implicit form a*x + b*y + c = 0, i.e. y = beta*x + alpha.
a = beta;
c = alpha;
b = -1;
line = [a b c];
In the above code, x, y, and weighting represent the x-coordinates, y-coordinates, and weighting factors respectively. I have tested the algorithm with several examples, but I am still not sure whether it is right, as this method gives a different result from polyfit, which relies on matrix calculation. I am posting the implementation here for your advice. Do you think it is a correct implementation? Thanks!
If you think it is a good idea to downweight points that are far from the line, you might be attracted by http://en.wikipedia.org/wiki/Least_absolute_deviations, because one way of calculating this is via http://en.wikipedia.org/wiki/Iteratively_re-weighted_least_squares, which will give less weight to points far from the line.
If you think all your points are "good data", then it would be a mistake to weight them naively according to their distance from your initial fit. However, it's a fairly common practice to discard "outliers": if a few data points are implausibly far from the fit, and you have reason to believe that there's an error mechanism that could generate a small subset of "bad" datapoints, you could simply remove the implausible points from the dataset to get a better fit.
As far as the math is concerned, I would recommend biting the bullet and trying to figure out the matrix math. Perhaps you could find a different article, or a book which has a better presentation. I will not comment on your Matlab code, except to say that it looks like you will have some precision problems when subtracting part4 from part3, and probably part2 from part1 as well.
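To illustrate the precision point: part3 - part4 subtracts two large, nearly equal products. One common way to avoid that, sketched here in C++ rather than MATLAB, is to center the data on the weighted means first; this computes the same weighted line fit with plain loops and no matrix calculation (the example data in main() is made up):
#include <cstddef>
#include <cstdio>
#include <vector>

// Weighted least-squares line y = beta*x + alpha, in centered form to avoid
// subtracting large, nearly equal sums.
void weighted_line_fit(const std::vector<double>& x,
                       const std::vector<double>& y,
                       const std::vector<double>& w,
                       double& beta, double& alpha) {
    double sw = 0.0, sx = 0.0, sy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sw += w[i];
        sx += w[i] * x[i];
        sy += w[i] * y[i];
    }
    double xbar = sx / sw, ybar = sy / sw;   // weighted means

    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        double dx = x[i] - xbar;
        sxy += w[i] * dx * (y[i] - ybar);
        sxx += w[i] * dx * dx;
    }
    beta  = sxy / sxx;             // slope
    alpha = ybar - beta * xbar;    // intercept
}

int main() {
    std::vector<double> x = {0, 1, 2, 3}, y = {0.1, 0.9, 2.2, 2.9}, w = {1, 1, 2, 1};
    double beta, alpha;
    weighted_line_fit(x, y, w, beta, alpha);
    std::printf("slope %f intercept %f\n", beta, alpha);
    return 0;
}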
Not exactly what you are asking for, but you should look into robust regression. MATLAB has the function robustfit (requires Statistics Toolbox).
There is even an interactive demo you can play with to compare regular linear regression vs. robust regression:
>> robustdemo
This shows that in the presence of outliers, robust regression tends to give better results.
I'm having trouble with my math:
Assume that we have a function F(x,y) = P. My question is: what would be the most efficient way of counting the (x,y) pairs that satisfy it? I don't need the coordinates themselves, just their number. P is in the range [0; 10^14], and x and y are integers. Is this solved by brute force, or are there some advanced tricks (math, or in the programming languages C/C++) to solve it fast enough?
To be more concrete, the function is: x*y - ((x+y)/2) + 1.
x*y - ((x+y)/2) + 1 == P is equivalent to (2x-1)(2y-1) == 4P-3: multiply both sides by 4 to get 4xy - 2x - 2y + 4 == 4P, and note that (2x-1)(2y-1) == 4xy - 2x - 2y + 1.
So, you're basically looking for the number of factorizations of 4P-3. How to factor a number in C or C++ is probably a different question, but each factorization yields a solution to the original equation. [Edit: in fact two solutions, since if A*B == C then of course (-A)*(-B) == C also].
As far as the programming languages C and C++ are concerned, just make sure you use a type that's big enough to contain 4 * 10^14. int won't do, so try long long.
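A minimal C++ sketch of this counting by trial division (workable here because 4P-3 is at most about 4*10^14, so the loop only runs to its square root, roughly 2*10^7; the final doubling accounts for the negative factor pairs mentioned in the edit above):
#include <cstdio>

// Count ordered pairs (A, B) of positive integers with A * B == 4P - 3,
// then double to include the (-A) * (-B) pairs. Every divisor of the odd
// number 4P-3 is odd, so x = (A+1)/2 and y = (B+1)/2 are always integers.
long long count_solutions(long long P) {
    long long n = 4 * P - 3;
    if (n <= 0) return 0;   // P == 0 is not handled by this sketch
    long long ordered = 0;
    for (long long d = 1; d * d <= n; ++d) {
        if (n % d == 0)
            ordered += (d * d == n) ? 1 : 2;   // d and n/d are two distinct divisors
    }
    return 2 * ordered;     // each positive pair has a matching negative-negative pair
}

int main() {
    std::printf("%lld\n", count_solutions(100000000000000LL));  // P = 10^14
    return 0;
}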
You have a two-parameter function and want to solve it for a given constant.
This is a pretty big field in mathematics, and there are probably dozens of algorithms for solving your equation. One key idea that many of them use is that if you find a point where F<P and another point where F>P, then somewhere between those two points F must equal P (assuming F is continuous).
One of the most basic algorithms for finding a root (or zero, which you can of course convert your problem to by taking G = F - P) is Newton's method. I suggest you start with that and read your way up to more advanced algorithms. This is a fairly large field of study, so happy reading!
Wikipedia has a list of root-finding algorithms that you can use as a starting place.
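As a small illustration of the root-finding idea, here is a one-dimensional Newton's method sketch in C++ for G(x) = F(x, y) - P with y held fixed, using a central-difference derivative (the example values of y and P in main() are made up, and this finds real-valued roots rather than counting the integer solutions asked about above):
#include <cmath>
#include <cstdio>

// The question's example function.
double F(double x, double y) { return x * y - (x + y) / 2.0 + 1.0; }

// Newton's method on G(x) = F(x, y) - P for fixed y.
double newton_solve(double y, double P, double x0,
                    double h = 1e-6, double tol = 1e-9, int max_iter = 100) {
    double x = x0;
    for (int i = 0; i < max_iter; ++i) {
        double g = F(x, y) - P;
        if (std::fabs(g) < tol) break;
        double dg = (F(x + h, y) - F(x - h, y)) / (2.0 * h);  // central difference
        if (dg == 0.0) break;                                 // flat spot: give up
        x -= g / dg;                                          // Newton step
    }
    return x;
}

int main() {
    double x = newton_solve(7.0, 100.0, 1.0);   // solve F(x, 7) = 100 for x
    std::printf("x = %f, F(x, 7) = %f\n", x, F(x, 7.0));
    return 0;
}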