When implementing the ReLU function for AutoDiff, one of the versions I tried uses std::max. The other implementations (a comparison or a conditional statement) work correctly, but the std::max version returns a derivative of 0 over the whole range.
For the input vector:
dual in[] = { -3.0, -1.5, -0.1, 0.1, 1.5, 3.0 };
the derivative call in the form
derivative(ReLU,wrt(y),at(y)) where y = in[i]
gives proper results if ReLU is implemented with:
dual ReLU_ol(dual x) {
    return (x > 0) * x; // ok; autodiff gives x > 0 ? 1 : 0
}
dual ReLU_if(dual x) {
    if (x > 0.0) {
        return x;
    } else {
        return 0.0;
    }
    // ok; autodiff gives x > 0 ? 1 : 0
}
that is, the derivative is one if x > 0 and zero elsewhere.
When ReLU is implemented in the form:
dual ReLU_max(dual x) {
    return std::max(0.0, (double)x); // gives a wrong derivative
}
the derivative I get is zero over the whole range.
I expected std::max (or std::min) to be prepared correctly for automatic differentiation.
Am I doing something wrong, or am I misunderstanding something?
The resulting plot is:
where d / dx is calculated with AutoDiff; purple and blue lines are overlapping; results for ReLU_ol are not plotted.
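A likely explanation, sketched with a hand-rolled dual number rather than the autodiff types: the cast `(double)x` keeps only the value of `x`, `std::max` then compares plain doubles, and the returned double is converted back into a dual whose derivative part is zero. The `Dual` struct and both function names below are hypothetical stand-ins for illustration only, not the library's API.

```cpp
#include <algorithm>

// Hypothetical minimal dual number: a value and its derivative.
struct Dual {
    double val;
    double grad;
};

// Mirrors ReLU_max: std::max only ever sees the value, so the derivative
// is dropped and the result is rebuilt with grad = 0.
Dual reluMaxOnValue(Dual x) {
    const double v = std::max(0.0, x.val);
    return Dual{v, 0.0};
}

// Selecting the whole dual number keeps value and derivative together.
Dual reluSelectDual(Dual x) {
    return x.val > 0.0 ? x : Dual{0.0, 0.0};
}
```

With `Dual{1.5, 1.0}` as input, the first version reports a derivative of 0 and the second reports 1, which matches what ReLU_if produces.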
I am writing a program to numerically find the roots of functions with irrational roots by various methods.
For methods such as linear interpolation, you need to find the approximate range in which a root lies. For this I wrote the following code:
bool fxn1 = false;
bool fxn2 = false;
vector<float> root_list;
if(f_x(-100) < 0)
{
    fxn2 = true;
}
for(float i = -99.99; i < 100.01; i += 0.01)
{
    fxn1 = fxn2;
    if(f_x(i) < 0)
    {
        fxn2 = true;
    }
    else
    {
        fxn2 = false;
    }
    if((fxn1 == false && fxn2 == true) || (fxn1 == true && fxn2 == false))
    {
        root_list.push_back(i-0.01);
        root_list.push_back(i);
    }
}
However, for non-continuous functions (i.e. functions with asymptotes), this code will also be triggered when the function swaps from positive to negative values either side of the asymptote.
Is there a way to get the program to tell the difference between a root and an asymptote?
Thanks in advance
If the function f(x) is converging on a root inside [a,b], then its value at the half-way point, f((a + b) / 2), should be closer to zero than f(a) or f(b).
This observation leads to the following procedure:
Let mid = (a + b) / 2
If |f(mid)| < |f(a)| AND |f(mid)| < |f(b)| Then
    Algorithm has converged to a root
Else
    Algorithm has converged to an asymptote
End
In this pseudo code |.| denotes floating-point absolute value.
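One way to turn this idea into C++ is sketched below. It is a variant, not a literal transcription: it first bisects the sign-change bracket, then compares |f| at the final midpoint with the smaller |f| at the original endpoints, because a root that sits very close to one endpoint can fail the midpoint test as written. The names `Bracket` and `classifyBracket` and the default of 60 bisection steps are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>

enum class Bracket { Root, Asymptote };

// Classifies a bracket [a, b] on which f changes sign.
Bracket classifyBracket(const std::function<double(double)>& f,
                        double a, double b, int steps = 60) {
    assert(a < b);
    double fa = f(a);
    // Smaller magnitude of f at the two original endpoints.
    const double startMag = std::min(std::fabs(fa), std::fabs(f(b)));
    for (int k = 0; k < steps; ++k) {
        const double mid = 0.5 * (a + b);
        const double fm = f(mid);
        if (!std::isfinite(fm)) return Bracket::Asymptote;  // f blew up
        if (fm == 0.0) return Bracket::Root;                // exact hit
        if ((fa < 0.0) != (fm < 0.0)) {
            b = mid;   // sign change is in the left half
        } else {
            a = mid;   // sign change is in the right half
            fa = fm;
        }
    }
    // Near a root |f| shrinks as the bracket shrinks; near a pole it grows.
    const double endMag = std::fabs(f(0.5 * (a + b)));
    return endMag < startMag ? Bracket::Root : Bracket::Asymptote;
}
```

For instance, x*x - 2 on [1, 2] is classified as a root, while tan(x) on [1, 2] (sign change at the pole near pi/2) is classified as an asymptote. The check is a heuristic and can still be fooled, for example by a jump discontinuity that does not blow up.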
Numerically finding a root only makes sense if the function has nice properties, and at the very least is continuous. What would you think about this one:
f: x -> f(x) defined by:
2 * i < x < 2 * i + 1 (i element of Z) : f(x) = x
2 * i - 1 < x < 2 * i (i element of Z) : f(x) = -x
x = i (i element of Z) : f(x) = 1
It is perfectly defined on R, is bounded on any bounded interval, has positive and negative values on any interval of size > 2, and is continuous at every non-integer point, but it has no root.
That is simply because the rule that a root must exist on the segment ]x, y[ if f(x) < 0 < f(y) or f(y) < 0 < f(x) only applies if the function is continuous on the interval.
And good luck if you want to numerically test for continuity of a function...
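The counter-example can be written down directly, reading its second branch as 2 * i - 1 < x < 2 * i. The sketch below implements it and counts sign changes between consecutive samples, which is what the scanning loop in the question relies on. The names `sawtooth` and `countSignChanges` are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>

// f(x) = 1 at integers, x when floor(x) is even, -x when floor(x) is odd.
double sawtooth(double x) {
    const double fl = std::floor(x);
    if (x == fl) return 1.0;
    const bool evenFloor = std::fmod(fl, 2.0) == 0.0;
    return evenFloor ? x : -x;
}

// Counts sign changes of sawtooth between equally spaced samples on [from, to].
int countSignChanges(double from, double to, int intervals) {
    assert(intervals > 0);
    const double step = (to - from) / intervals;
    int changes = 0;
    double prev = sawtooth(from);
    for (int k = 1; k <= intervals; ++k) {
        const double cur = sawtooth(from + k * step);
        if ((prev < 0.0) != (cur < 0.0)) ++changes;
        prev = cur;
    }
    return changes;
}
```

Sampling at 0.5, 1.5, 2.5 and 3.5 gives three sign changes, yet the function is never zero, so each of those brackets would be a false positive for a sign-based root search.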
I am implementing a lambda to row normalize a 2D vector in C++. Consider the simple case of a 3x3 matrix.
1 0 1
0 1 0
0 1 1
My normalization factor is the sum of non-zero entries in the row. Each entry is then divided by this normalization factor. For instance, row 1 has 2 non-zero entries summing up 2. Therefore, I divide each entry by 2. The row normalized vector is defined as follows:
1/2 0 1/2
0 1 0
0 1/2 1/2
The relevant normalization code is shown here (note MAX_SIZE = 3). There is a syntactical error in the lambda capture list.
for(int i = 0; i < MAX_SIZE ; i++)
{
    transform(matrix[i].begin(),matrix[i].end(),matrix.begin(), [matrix[i].begin()](int x){
        return distance(matrix[i].begin(),lower_bound(matrix[i].begin(),matrix[i].end(),x))});
}
Am I missing anything here?
A lambda capture list in C++ can only specify the names of values to capture, and matrix[i].begin() is not a name, it is a temporary value. You can either give it a name or you can make a variable for it in the enclosing scope. Much of the surrounding code is missing, so I invented a working version of the code for you to dissect:
#include <algorithm>
#include <cstdio>
#include <iterator>  // std::begin, std::end
#include <numeric>   // std::accumulate
template<int N>
void normalize(double (&mat)[N][N]) {
    std::for_each(std::begin(mat), std::end(mat),
        [](double (&row)[N]) {
            double sum = std::accumulate(std::begin(row), std::end(row), 0.0);
            std::transform(std::begin(row), std::end(row), std::begin(row),
                [sum](double x) { return x / sum; });
        });
}
template<int N>
void print(const double (&mat)[N][N]) {
    std::for_each(std::begin(mat), std::end(mat),
        [](const double (&row)[N]) {
            std::for_each(std::begin(row), std::end(row),
                [](double x) { std::printf(" %3.1f", x); });
            std::putchar('\n');
        });
}
int main() {
    double mat[3][3] = {
        { 1, 0, 1 },
        { 0, 1, 0 },
        { 0, 1, 1 },
    };
    std::puts("Matrix:");
    print(mat);
    normalize(mat);
    std::puts("Normalized:");
    print(mat);
    return 0;
}
Here is the output:
Matrix:
1.0 0.0 1.0
0.0 1.0 0.0
0.0 1.0 1.0
Normalized:
0.5 0.0 0.5
0.0 1.0 0.0
0.0 0.5 0.5
This code is a bit weird, as far as C++ code goes, because it uses lambdas for everything instead of loops (or mixing for loops with higher-order-functions). But you can see that by having a variable for each row (named row) we can make it very easy to loop over that row instead of specifying matrix[i] everywhere.
The weird syntax for array parameters double (&mat)[N][N] is to avoid pointer decay, which allows us to use begin() and end() in the function body (which don't work if the parameters decay to pointers).
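The two options mentioned at the start of this answer (a named variable in the enclosing scope, or giving the expression a name inside the capture list) might look like this for a vector of vectors. This is a sketch under assumptions: the function names are invented, the second form relies on C++14 init-capture, and leaving all-zero rows unchanged is a choice made here, not something the question specifies.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Option 1: compute the sum in the enclosing scope and capture it by name.
void normalizeRows(std::vector<std::vector<double>>& matrix) {
    for (auto& row : matrix) {
        const double sum = std::accumulate(row.begin(), row.end(), 0.0);
        if (sum == 0.0) continue;  // assumption: leave all-zero rows as they are
        std::transform(row.begin(), row.end(), row.begin(),
                       [sum](double x) { return x / sum; });
    }
}

// Option 2: C++14 init-capture gives the expression a name in the capture list.
// The sum is evaluated once, when the lambda object is created.
void normalizeRowsInitCapture(std::vector<std::vector<double>>& matrix) {
    for (auto& row : matrix) {
        std::transform(row.begin(), row.end(), row.begin(),
                       [sum = std::accumulate(row.begin(), row.end(), 0.0)](double x) {
                           return sum == 0.0 ? x : x / sum;
                       });
    }
}
```

Both functions turn the 3x3 example into rows 0.5 0 0.5, 0 1 0 and 0 0.5 0.5.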
From this question: Random number generator which gravitates numbers to any given number in range? I did some research since I've come across such a random number generator before. All I remember was the name "Mueller", so I guess I found it, here:
Box-Mueller transform
I can find numerous implementations of it in other languages, but I can't seem to implement it correctly in C#.
This page, for instance, The Box-Muller Method for Generating Gaussian Random Numbers says that the code should look like this (this is not C#):
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
double gaussian(void)
{
    static double v, fac;
    static int phase = 0;
    double S, Z, U1, U2, u;
    if (phase)
        Z = v * fac;
    else
    {
        do
        {
            U1 = (double)rand() / RAND_MAX;
            U2 = (double)rand() / RAND_MAX;
            u = 2. * U1 - 1.;
            v = 2. * U2 - 1.;
            S = u * u + v * v;
        } while (S >= 1);
        fac = sqrt (-2. * log(S) / S);
        Z = u * fac;
    }
    phase = 1 - phase;
    return Z;
}
Now, here's my implementation of the above in C#. Note that the transform produces 2 numbers, hence the trick with the "phase" above. I simply discard the second value and return the first.
public static double NextGaussianDouble(this Random r)
{
    double u, v, S;
    do
    {
        u = 2.0 * r.NextDouble() - 1.0;
        v = 2.0 * r.NextDouble() - 1.0;
        S = u * u + v * v;
    }
    while (S >= 1.0);
    double fac = Math.Sqrt(-2.0 * Math.Log(S) / S);
    return u * fac;
}
My question is with the following specific scenario, where my code doesn't return a value in the range of 0-1, and I can't understand how the original code can either.
u = 0.5, v = 0.1
S becomes 0.5*0.5 + 0.1*0.1 = 0.26
fac becomes ~3.22
the return value is thus ~0.5 * 3.22 or ~1.6
That's not within 0 .. 1.
What am I doing wrong/not understanding?
If I modify my code so that instead of multiplying fac with u, I multiply by S, I get a value that ranges from 0 to 1, but it has the wrong distribution (seems to have a maximum distribution around 0.7-0.8 and then tapers off in both directions.)
Your code is fine. Your mistake is thinking that it should return values exclusively within [0, 1]. The (standard) normal distribution is a distribution with nonzero weight on the entire real line. That is, values outside of [0, 1] are possible. In fact, values within [-1, 0] are just as likely as values within [0, 1], and moreover, the complement of [0, 1] has about 66% of the weight of the normal distribution. Therefore, 66% of the time we expect a value outside of [0, 1].
Also, I think this is not the Box-Muller transform, but is actually the Marsaglia polar method.
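The figures in this answer can be checked numerically. The sketch below ports the polar method to C++ with `<random>` and compares the sampled fraction of values inside [0, 1] with the exact value from the normal CDF (about 34% inside, hence about 66% outside). All function names are illustrative, and the extra `s == 0.0` rejection is an added guard against taking log(0), not part of the code in the question.

```cpp
#include <cmath>
#include <random>

// Marsaglia polar method, returning one standard normal variate per call.
double nextGaussian(std::mt19937& rng) {
    std::uniform_real_distribution<double> uniform(-1.0, 1.0);
    double u, v, s;
    do {
        u = uniform(rng);
        v = uniform(rng);
        s = u * u + v * v;
    } while (s >= 1.0 || s == 0.0);  // reject points outside the unit disc
    return u * std::sqrt(-2.0 * std::log(s) / s);
}

// Cumulative distribution function of the standard normal distribution.
double standardNormalCdf(double z) {
    return 0.5 * std::erfc(-z / std::sqrt(2.0));
}

// Fraction of generated samples that land inside [0, 1].
double fractionInUnitInterval(unsigned seed, int samples) {
    std::mt19937 rng(seed);
    int inside = 0;
    for (int i = 0; i < samples; ++i) {
        const double z = nextGaussian(rng);
        if (z >= 0.0 && z <= 1.0) ++inside;
    }
    return static_cast<double>(inside) / samples;
}
```

Here `standardNormalCdf(1.0) - standardNormalCdf(0.0)` is the exact probability of landing in [0, 1], and `fractionInUnitInterval` with a few hundred thousand samples should come out close to it.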
I am no mathematician, or statistician, but if I think about this I would not expect a Gaussian distribution to return numbers in an exact range. Given your implementation the mean is 0 and the standard deviation is 1 so I would expect values distributed on the bell curve with 0 at the center and then reducing as the numbers deviate from 0 on either side. So the sequence would definitely cover both +/- numbers.
Then since it is statistical, why would it be hard limited to -1..1 just because the std.dev is 1? There can statistically be some play on either side and still fulfill the statistical requirement.
The uniform random variate is indeed within 0..1, but the gaussian random variate (which is what Box-Muller algorithm generates) can be anywhere on the real line. See wiki/NormalDistribution for details.
I think the function returns polar coordinates. So you need both values to get correct results.
Also, Gaussian distribution is not between 0 .. 1. It can easily end up as 1000, but probability of such occurrence is extremely low.
This is a monte carlo method so you can't clamp the result, but what you can do is ignore samples.
// return random value in the range [0,1].
double gaussian_random()
{
    double sigma = 1.0/8.0; // or whatever works.
    while ( 1 ) {
        double z = gaussian() * sigma + 0.5;
        if (z >= 0.0 && z <= 1.0)
            return z;
    }
}
I have a problem with my code where agents moving around suddenly disappear. This seems to be because their positions suddenly become 1.#INF000 in the x and y axis. I did a little research and someone said this can occur with acos if a value is over or under 1 and -1 respectively, but went on to say it could happen if the values were close too. I added an if statement to check to see if I'm ever taking acos of 1 or -1 and it does evaluate to 1 a few frame cycles before they disappear, however I don't really understand the problem to be able to fix it. Can anyone shed any light on this matter?
D3DXVECTOR3 D3DXVECTOR3Helper::RotToTarget2DPlane(D3DXVECTOR3 position, D3DXVECTOR3 target)//XY PLANE
{
    //Create vector to target
    D3DXVECTOR3 vectorToTarget = target - position;
    D3DXVec3Normalize(&vectorToTarget, &vectorToTarget);
    //creates a displacement vector of relative 0, 0, 0
    D3DXVECTOR3 neutralDirectionalVector = D3DXVECTOR3(1, 0, 0);//set this to whatever direction your models are loaded facing
    //Create the angle between them
    if(D3DXVec3Dot(&vectorToTarget, &neutralDirectionalVector) >= 1.0f ||D3DXVec3Dot(&vectorToTarget, &neutralDirectionalVector) <= -1.0f)
    {
        float i = D3DXVec3Dot(&vectorToTarget, &neutralDirectionalVector);
        float j = 0; //ADDED THIS IF STATEMENT
    }
    float angle = acos(D3DXVec3Dot(&vectorToTarget, &neutralDirectionalVector));
    if (target.y > position.y)
    {
        return D3DXVECTOR3(0, 0, angle);
    }
    else
    {
        return D3DXVECTOR3(0, 0, -angle);
    }
}//end VecRotateToTarget2DPlane()
It is dangerous to call acos on a value that may be exactly +/-1.0, because rounding errors can cause the computed value to be outside this range.
But it's easy to fix -- use this function instead:
double SafeAcos (double x)
{
    if (x < -1.0) x = -1.0 ;
    else if (x > 1.0) x = 1.0 ;
    return acos (x) ;
}
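Applied to the rotation helper from the question, the clamp might look like the sketch below. It uses plain doubles instead of D3DXVECTOR3 so that it stands alone; the function name `angleToTarget2D` and the choice to return 0 when target and position coincide (where normalization would divide by zero) are assumptions made for this example.

```cpp
#include <algorithm>
#include <cmath>

// Signed angle in the XY plane between the neutral direction (1, 0)
// and the direction from (px, py) to (tx, ty).
double angleToTarget2D(double px, double py, double tx, double ty) {
    const double dx = tx - px;
    const double dy = ty - py;
    const double len = std::hypot(dx, dy);
    if (len == 0.0) return 0.0;  // assumption: no rotation for a zero-length direction
    // Dot product of the normalized direction with (1, 0), clamped so that
    // rounding error can never push it outside [-1, 1].
    const double cosAngle = std::max(-1.0, std::min(1.0, dx / len));
    const double angle = std::acos(cosAngle);
    return (ty > py) ? angle : -angle;
}
```

The clamp only guards against small rounding overshoots; if the argument is far outside [-1, 1], the calculation that produced it is the thing to fix.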
The man page for acos says:
On success, these functions return the arc cosine of x in radians; the
return value is in the range [0, pi].
If x is a NaN, a NaN is returned.
If x is +1, +0 is returned.
If x is positive infinity or negative infinity, a domain error occurs,
and a NaN is returned.
If x is outside the range [-1, 1], a domain error occurs, and a NaN is
returned.
This means that for an argument outside of the [-1,+1] range, the result is not a number (NaN). That also corresponds to how acos is defined mathematically.
As mentioned above (by @TonyK), acos is not defined outside the range [-1,+1].
First, you should check why the issue exists, i.e. why is the argument out of range? Maybe there is some issue with how you calculate the argument.
Once you have worked that out, you can use the SafeAcos proposed by TonyK.
As we do know that acos(-1.0) = π and acos(1.0) = 0,
I suggest a little modification (for performance reasons):
double SafeAcos (double x)
{
    if (x <= -1.0)
        return MATH_PI;
    else if(x >= 1.0)
        return 0;
    else
        return acos (x) ;
}
Where MATH_PI = 3.14...
Suppose you have a rectangle whose bottom-left point is 0,0 and whose upper-right point is 100,100.
Now two lines intersect the rectangle. I have to find the coordinates of their intersection point, which I have done. The problem is that I can't tell whether it is inside the rectangle or not. I used a comparison of doubles, but I think it is giving me the wrong answer. Suppose the intersection point is ( x , y ). I used this check: if( x >= 0.0 && x <= 100.0 && y >= 0.0 && y <= 100.0 ). What should I do?
//this function generates line
line genline( int x1 , int y1 , int x2 , int y2 ){
    line l ;
    l.A = y2 - y1 ;
    l.B = x1 - x2 ;
    l.C = l.A * x1 + l.B * y1 ;
    return l ;
}
//this function checks intersection
bool intersect( line m ,line n ) {
    int det = m.A * n.B - m.B * n.A ;
    if( det == 0 ){
        return false ;
    }
    else {
        double x = ( n.B * m.C - m.B * n.C ) / ( det * 1.0 ) ;
        double y = ( m.A * n.C - n.A * m.C ) / ( det * 1.0 ) ;
        if( x >= 0.0 && x <= L && y >= 0.0 && y <= W ) { return true ; }
        else{ return false ; }
    }
}
EDIT:
Both lines extend to infinity.
Your math looks right. By the way, if a line intersects something, it is always inside that something.
Checking to see if a point is inside a rectangle is relatively easy. However, the challenge is to find the intersection between two line segments. There are a large number of corner cases in that problem, and the limited accuracy of floating point numbers plays a huge role here.
Your algorithm seems to be overly simplistic. For a deeper discussion about this topic you can look at this and this. This two-part article investigates the problem of finding the intersection of two lines using floating point numbers. Note that it is about MATLAB, not C++, though that does not change the problem, and the algorithms are easily translatable to any language.
Depending on the application, even with clever tricks a floating point representation might simply not cut it for some geometry problems. CGAL is a C++ library dedicated to computational geometry that deals with these kinds of problems. When necessary it uses arbitrary precision arithmetic to handle degenerate cases.
When you're dealing with floating point (float or double), testing for equality is naïve and will fail in edge cases. Every comparison you make should be in reference to "epsilon", an extremely small quantity that doesn't matter. If two numbers are within epsilon of each other, then they are considered equal.
For example, instead of "if(a == b)", you need:
#include <cmath>
bool isEqual(double a, double b, double epsilon = 1.E-10)
{
    return std::fabs(a - b) <= epsilon;
}
Pick a suitable value for epsilon depending on your problem domain.
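Carried over to the rectangle test from the question, the same tolerance can be applied to each bound. This is only a sketch: the `Line` struct mirrors the A, B, C form used in the question, while `makeLine`, `intersectInsideRect` and the default epsilon of 1e-9 are names and values assumed for this example.

```cpp
#include <cmath>

// Line in the form A*x + B*y = C, as in the question.
struct Line {
    double A, B, C;
};

Line makeLine(double x1, double y1, double x2, double y2) {
    Line l;
    l.A = y2 - y1;
    l.B = x1 - x2;
    l.C = l.A * x1 + l.B * y1;
    return l;
}

// True if the two infinite lines meet at a point inside (or within eps of)
// the rectangle [0, width] x [0, height].
bool intersectInsideRect(const Line& m, const Line& n,
                         double width, double height, double eps = 1e-9) {
    const double det = m.A * n.B - m.B * n.A;
    if (std::fabs(det) <= eps) return false;  // parallel or nearly parallel
    const double x = (n.B * m.C - m.B * n.C) / det;
    const double y = (m.A * n.C - n.A * m.C) / det;
    return x >= -eps && x <= width + eps &&
           y >= -eps && y <= height + eps;
}
```

An intersection that lands at, say, 100.0000000001 because of rounding is then still counted as lying on the boundary. The tolerance here is absolute, so it should be scaled to the size of the coordinates involved.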