I've been wrestling with this issue for a week and I just need some guidance on the math part of it. If I could just understand the math behind it I could piece together the functions to make it work. The assignment is;
Design and develop a C++ program for Calculating e(n) when delta <= 0.000001
e(n-1) = 1 + 1/1! + 1/2! + 1/3! + 1/4! + … + 1/(n-1)!
e(n) = 1 + 1/1! + 1/2! + 1/3! + 1/4! + … + 1/(n)!
delta = e(n) – e(n-1)
You do not have any input to the program. Your output should be something like this:
N = 2 e(1) = 2 e(2) = 2.5 delta = 0.5
N = 3 e(2) = 2.5 e(3) = 2.565 delta = 0.065
...
You must use recursive function calls.
My first issue is the math and the variables that would contain them.
the delta, e(n), and e(n-1) variable must doubles
if e(n) = 1 + 1 / 1! = 2 then e(n-1) must equal 1, which means delta = 1 (that's my thinking anyway) I'm just not sure of the math behind the .5 delta the first time and the 0.065 in the second iteration.
Can someone point me in the right direction on this problem?
Thank you,
T
From the wikipedia link, you can see that
I will not explain the notion of limits here, but what this basically means is that, if we define a function e where e(n) = 1 + 1/1! + 1/2! + 1/3! + 1/4! + … + 1/(n)! (which is the function given in your problem), we are able to approximate the real value of the constant e.
The higher n is, the closer we get from e.
If you look closely at the function, you can see that each time, we add a term which is smaller than the previous one: 1 >= 1/1! >= 1/2! >= .... >= 1/(n)!
That basically means that, every time we increase n we are getting closer to e but we are slowing down in the way.
The real value of e is 2.71828...
In our first step e(1) = 1, we are 1.71828... too far from the real value
In the second step e(2) = 2, we are at 0.71828..., 1 distance closer
In the third step e(3) = 2.5, we are now at 0.21828..., 0.5 distance closer
As you can see, we are getting there, but the closer we get, the slower we move. Now let's say that at each step, we want to know how close we have moved compared to the previous value.
We then do simply e(n) - e(n-1). This is basically what the delta means.
At some point, we are moving so slow that it does no longer make any sense to keep going. We are almost staying put. At this point, we decide that our approximation is close enough from e.
In your case, the problem defines the minimum progression speed to 0.000001
here is a solution :-
delta = e(n) - e(n-1)
delta = 1/n!
delta < 0.000001
n! > 1000000
n >= 10 as 10! = 3628800
Related
Using Chebyshev polynomials, we can compute sin(2*Pi/n) exactly using the CGAL and CORE library, like the following piece of codes:
#include <CGAL/CORE_Expr.h>
#include <CGAL/Polynomial.h>
#include <CGAL/number_utils.h>
//return sin(theta) and cos(theta) for theta = 2pi/n
static std::pair<AA, AA> sin_cos(unsigned short n) {
// We actually use -x instead of x since root_of will give the k-th
// smallest root but we want the second largest one without counting.
Polynomial x(CGAL::shift(Polynomial(-1), 1));
Polynomial twox(2*x);
Polynomial a(1), b(x);
for (unsigned short i = 2; i <= n; ++i) {
Polynomial c = twox*b - a;
a = b;
b = c;
}
a = b - 1;
AA cos = -CGAL::root_of(2, a.begin(), a.end());
AA sin = CGAL::sqrt(AA(1) - cos*cos);
return std::make_pair(sin, cos);
}
But if I want to compute sin(2*m*Pi/n) exactly, where m and n are integers, what is the formula of the polynomial that I should use? Thanks.
(Partial solution.)
This is essentially computing the real and imaginary part of the roots of unity as algebraic numbers. Let's denote w(m) = exp(2*pi*I*m/n). Then, w(m) itself is a complex root of En(x) = x^n-1.
You need to find a defining polynomial of Re(w(m)). Resultants are a tool to find such a polynomial: 2*Re(w(m)) is a root of Res (En(x-y), En(y); y).
For an explanation why this is the case: Note that 2*Re(w(m)) = w(m) + conj(w(m)), and that the complex roots of En come in conjugate pairs; hence, also conj(w(m)) is a root of En. Now loosely speaking, the En(y) part "constrains" y to be any (complex) root of En, and combining this with the first argument allows x to take any complex value such that x-y is a root of En as well. Hence, a possible assignment is y = conj(w(m)) and x-y = w(m), hence x = w(m)+conj(w(m)) = 2*Re(w(m)).
CGAL can compute resultants of multivariate polynomials, so you can compute this resultant, and you simply have to pick the correct real root. (The largest one will obviously be w(0) = 1, the smallest one is 2*Re(w(floor(n/2))).)
Unfortunately, the resultant has a high complexity (degree n^2), and resultant computation will not be the fastest operation you've ever seen. Also, you'll pay for dense polynomials although your instances are very sparse and structured. YMMV; I have no clue about your use case, and if you need higher degrees.
However, I did a few tests in a computer algebra system, and I found that the resultant splits into factors of more reasonable size, and that all its real roots actually belong to a much simpler polynomial of degree floor(n/2)+1 only. (No proof, just an observation.)
I don't know of a direct formula to write down this factor, and I don't want to speculate about it. But maybe some people at mathoverflow or math.stackexchange can help?
EDIT: Here is a guess for at least a recursive formula.
I write s(n,x) for the significant factor of the resultant polynomial containing all real roots but 0. This means that s(n,x) has all values 2*Re(w(m)) for m != n/4, 3*n/4 as roots.
s(0,x) = 0
s(1,x) = x - 2
s(2,x) = x^2 - 4
s(3,x) = x^2 - x - 2
s(4,x) = x^2 - 4
s(5,x) = x^3 - x^2 - 3*x + 2
s(6,x) = x^4 - 5*x^2 + 4
s(7,x) = x^4 - x^3 - 4*x^2 + 3*x + 2
s(8,x) = x^4 - 6*x^2 + 8
s(n,x) = (x^2-2)*s(n-4,x) - s(n-8,x)
Waiting for a proof...
Hi I need to calculate (2^n + (-1)^n) % 10000007
where 1 < n < 10^9
How should I go about writing a program for it in c++?
I know this mod property
(a + b)%n = (a%n + b%n)%n but this wont help me.
Given
(a + b)%m = (a%m + b%m)%m
Then, replace both a and b with the same power of 2, and you get the recurrence:
2k+1%m = (2k%m + 2k%m)%m
You probably already figured your formula allows you to break down your problem into:
(2n + (-1)n)%P = (2n%P + (-1)n%P)%P
Then, note that (-1)k is either 1 or -1, and you should be able to calculate your problem in O(n) time.
I'm trying to write out a bit of code for the gradient descent algorithm explained in the Stanford Machine Learning lecture (lecture 2 at around 25:00). Below is the implementation I used at first, and I think it's properly copied over from the lecture, but it doesn't converge when I add large numbers (>8) to the training set.
I'm inputting a number X, and the point (X,X) is added to the training set, so at the moment, I'm only trying to get it to converge to y=ax+b where a=1=theta\[1\] and b=0=theta\[0\].
The training set is the array x and y, where (x[i],y[i]) is a point.
void train()
{
double delta;
for (int i = 0; i < x.size(); i++)
{
delta = y[i]-hypothesis(x[i]);
theta[1] += alpha*delta*x[i];
theta[0] += alpha*delta*1;
}
}
void C_Approx::display()
{
std::cout<<theta[1]<<"x + "<<theta[0]<<" \t "<<"f(x)="<<hypothesis(1)<<std::endl;
}
some of the results I'm getting:
I input a number, it runs train() a few times, then display()
1
0.33616x + 0.33616 f(x)=0.67232
1
0.482408x + 0.482408 f(x)=0.964816
1
0.499381x + 0.499381 f(x)=0.998762
1
0.499993x + 0.499993 f(x)=0.999986
1
0.5x + 0.5 f(x)=1
An example of it diverging after it passed 8:
1
0.33616x + 0.33616 f(x)=0.67232
2
0.705508x + 0.509914 f(x)=1.21542
3
0.850024x + 0.449928 f(x)=1.29995
4
0.936062x + 0.330346 f(x)=1.26641
5
0.951346x + 0.231295 f(x)=1.18264
6
0.992876x + 0.137739 f(x)=1.13062
7
0.932206x + 0.127372 f(x)=1.05958
8
1.00077x + 0.000493063 f(x)=1.00126
9
-0.689325x + -0.0714712 f(x)=-0.760797
10
4.10321e+08x + 4.365e+07 f(x)=4.53971e+08
11
1.79968e+22x + 1.61125e+21 f(x)=1.9608e+22
12
-3.9452e+41x + -3.26957e+40 f(x)=-4.27216e+41
I tried the solution proposed here of scaling the step and ended up with similar results.
What am I doing wrong?
Your implementation is good. Generally, stochastic gradient descent might diverge when α is too large. What you would do with a large dataset is take a reasonably sized random sample, find α that gives you the best results, and then use it for the rest.
I have experienced the same problem (albeit in Java) because my learning rate was too big.
For short, I was using α = 0.001 and I had to push it to 0.000001 to see actual convergence.
Of course these values are linked to your dataset.
When your cost function increases or cycles up and down, you usually have too large a value for alpha. What alpha are you using?
Start out with an alpha = 0.001 and see if that converges? If not try various alphas (0.003, 0.01, 0.03, 0.1, 0.3, 1) and find one that converges quickly.
Scaling the data (normalization) won't help you with only 1 feature (your theta[1]) as normalization only applies to 2+ features (multivariate linear regression).
Also bear in mind that for a small number of features you can use the Normal Equation to get the correct answer.
use backtracking line search to guaranty convergence. It is very simple to implement. See Stephen Boyd, Convex Optimization for reference. You can choose some standard alpha, beta values for backtracking line search, for example 0.3 and 0.8.
If I understand you correctly, your training set only has a non-zero gradient at the edge of a line? Unless you start at the line (actually start exactly at one of your training points) you won't find the line. You are always at a local minimum.
It's not clean from your description what problem you're solving.
Also it's very dangerous to post links to external resources - you can be blocked in stackoverflow.
In any case - gradient descend method and (subgradient descend too) with fixed step size (ML community call it learning rate) should not necesseray converge.
p.s.
Machine Learning community is not interesting in "convergence condition" and "convergence to what" - they are interested in create "something" which pass cross-validation with good result.
If you're curious about optimization - start to look in convex optimization. Unfortunately it's hard to find job on it, but it append clean vision into what happens in various math optimization things.
Here is source code which demonstrate it for simple quadratic objective:
#!/usr/bin/env python
# Gradiend descend method (without stepping) is not converged for convex
# objective
alpha = 0.1
#k = 10.0 # jumping around minimum
k = 20.0 # diverge
#k = 0.001 # algorithm converged but gap to the optimal is big
def f(x): return k*x*x
def g(x): return 2*k*x
x0 = 12
xNext = x0
i = 0
threshold = 0.01
while True:
i += 1
xNext = xNext + alpha*(-1)*(g(xNext))
obj = (xNext)
print "Iteration: %i, Iterate: %f, Objective: %f, Optimality Gap: %f" % (i, xNext, obj, obj - f(0.0))
if (abs(g(xNext)) < threshold):
break
if i > 50:
break
print "\nYou launched application with x0=%f,threshold=%f" % (x0, threshold)
So i'm implementing a heuristic algorithm, and i've come across this function.
I have an array of 1 to n (0 to n-1 on C, w/e). I want to choose a number of elements i'll copy to another array. Given a parameter y, (0 < y <= 1), i want to have a distribution of numbers whose average is (y * n). That means that whenever i call this function, it gives me a number, between 0 and n, and the average of these numbers is y*n.
According to the author, "l" is a random number: 0 < l < n . On my test code its currently generating 0 <= l <= n. And i had the right code, but i'm messing with this for hours now, and i'm lazy to code it back.
So i coded the first part of the function, for y <= 0.5
I set y to 0.2, and n to 100. That means it had to return a number between 0 and 99, with average 20.
And the results aren't between 0 and n, but some floats. And the bigger n is, smaller this float is.
This is the C test code. "x" is the "l" parameter.
//hate how code tag works, it's not even working now
int n = 100;
float y = 0.2;
float n_copy;
for(int i = 0 ; i < 20 ; i++)
{
float x = (float) (rand()/(float)RAND_MAX); // 0 <= x <= 1
x = x * n; // 0 <= x <= n
float p1 = (1 - y) / (n*y);
float p2 = (1 - ( x / n ));
float exp = (1 - (2*y)) / y;
p2 = pow(p2, exp);
n_copy = p1 * p2;
printf("%.5f\n", n_copy);
}
And here are some results (5 decimals truncated):
0.03354
0.00484
0.00003
0.00029
0.00020
0.00028
0.00263
0.01619
0.00032
0.00000
0.03598
0.03975
0.00704
0.00176
0.00001
0.01333
0.03396
0.02795
0.00005
0.00860
The article is:
http://www.scribd.com/doc/3097936/cAS-The-Cunning-Ant-System
pages 6 and 7.
or search "cAS: cunning ant system" on google.
So what am i doing wrong? i don't believe the author is wrong, because there are more than 5 papers describing this same function.
all my internets to whoever helps me. This is important to my work.
Thanks :)
You may misunderstand what is expected of you.
Given a (properly normalized) PDF, and wanting to throw a random distribution consistent with it, you form the Cumulative Probability Distribution (CDF) by integrating the PDF, then invert the CDF, and use a uniform random predicate as the argument of the inverted function.
A little more detail.
f_s(l) is the PDF, and has been normalized on [0,n).
Now you integrate it to form the CDF
g_s(l') = \int_0^{l'} dl f_s(l)
Note that this is a definite integral to an unspecified endpoint which I have called l'. The CDF is accordingly a function of l'. Assuming we have the normalization right, g_s(N) = 1.0. If this is not so we apply a simple coefficient to fix it.
Next invert the CDF and call the result G^{-1}(x). For this you'll probably want to choose a particular value of gamma.
Then throw uniform random number on [0,n), and use those as the argument, x, to G^{-1}. The result should lie between [0,1), and should be distributed according to f_s.
Like Justin said, you can use a computer algebra system for the math.
dmckee is actually correct, but I thought that I would elaborate more and try to explain away some of the confusion here. I could definitely fail. f_s(l), the function you have in your pretty formula above, is the probability distribution function. It tells you, for a given input l between 0 and n, the probability that l is the segment length. The sum (integral) for all values between 0 and n should be equal to 1.
The graph at the top of page 7 confuses this point. It plots l vs. f_s(l), but you have to watch out for the stray factors it puts on the side. You notice that the values on the bottom go from 0 to 1, but there is a factor of x n on the side, which means that the l values actually go from 0 to n. Also, on the y-axis there is a x 1/n which means these values don't actually go up to about 3, they go to 3/n.
So what do you do now? Well, you need to solve for the cumulative distribution function by integrating the probability distribution function over l which actually turns out to be not too bad (I did it with the Wolfram Mathematica Online Integrator by using x for l and using only the equation for y <= .5). That however was using an indefinite integral and you are really integration along x from 0 to l. If we set the resulting equation equal to some variable (z for instance), the goal now is to solve for l as a function of z. z here is a random number between 0 and 1. You can try using a symbolic solver for this part if you would like (I would). Then you have not only achieved your goal of being able to pick random ls from this distribution, you have also achieved nirvana.
A little more work done
I'll help a little bit more. I tried doing what I said about for y <= .5, but the symbolic algebra system I was using wasn't able to do the inversion (some other system might be able to). However, then I decided to try using the equation for .5 < y <= 1. This turns out to be much easier. If I change l to x in f_s(l) I get
y / n / (1 - y) * (x / n)^((2 * y - 1) / (1 - y))
Integrating this over x from 0 to l I got (using Mathematica's Online Integrator):
(l / n)^(y / (1 - y))
It doesn't get much nicer than that with this sort of thing. If I set this equal to z and solve for l I get:
l = n * z^(1 / y - 1) for .5 < y <= 1
One quick check is for y = 1. In this case, we get l = n no matter what z is. So far so good. Now, you just generate z (a random number between 0 and 1) and you get an l that is distributed as you desired for .5 < y <= 1. But wait, looking at the graph on page 7 you notice that the probability distribution function is symmetric. That means that we can use the above result to find the value for 0 < y <= .5. We just change l -> n-l and y -> 1-y and get
n - l = n * z^(1 / (1 - y) - 1)
l = n * (1 - z^(1 / (1 - y) - 1)) for 0 < y <= .5
Anyway, that should solve your problem unless I made some error somewhere. Good luck.
Given that for any values l, y, n as described, the terms you call p1 and p2 are both in [0,1) and exp is in [1,..) making pow(p2, exp) also in [0,1) thus I don't see how you'd ever get an output with the range [0,n)
I'm kind of stuck here, I guess it's a bit of a brain teaser. If I have numbers in the range between 0.5 to 1 how can I normalize it to be between 0 to 1?
Thanks for any help, maybe I'm just a bit slow since I've been working for the past 24 hours straight O_O
Others have provided you the formula, but not the work. Here's how you approach a problem like this. You might find this far more valuable than just knowning the answer.
To map [0.5, 1] to [0, 1] we will seek a linear map of the form x -> ax + b. We will require that endpoints are mapped to endpoints and that order is preserved.
Method one: The requirement that endpoints are mapped to endpoints and that order is preserved implies that 0.5 is mapped to 0 and 1 is mapped to 1
a * (0.5) + b = 0 (1)
a * 1 + b = 1 (2)
This is a simultaneous system of linear equations and can be solved by multiplying equation (1) by -2 and adding equation (1) to equation (2). Upon doing this we obtain b = -1 and substituting this back into equation (2) we obtain that a = 2. Thus the map x -> 2x - 1 will do the trick.
Method two: The slope of a line passing through two points (x1, y1) and (x2, y2) is
(y2 - y1) / (x2 - x1).
Here we will use the points (0.5, 0) and (1, 1) to meet the requirement that endpoints are mapped to endpoints and that the map is order-preserving. Therefore the slope is
m = (1 - 0) / (1 - 0.5) = 1 / 0.5 = 2.
We have that (1, 1) is a point on the line and therefore by the point-slope form of an equation of a line we have that
y - 1 = 2 * (x - 1) = 2x - 2
so that
y = 2x - 1.
Once again we see that x -> 2x - 1 is a map that will do the trick.
Subtract 0.5 (giving you a new range of 0 - 0.5) then multiply by 2.
double normalize( double x )
{
// I'll leave range validation up to you
return (x - 0.5) * 2;
}
To add another generic answer.
If you want to map the linear range [A..B] to [C..D], you can apply the following steps:
Shift the range so the lower bound is 0. (subract A from both bounds:
[A..B] -> [0..B-A]
Scale the range so it is [0..1]. (divide by the upper bound):
[0..B-A] -> [0..1]
Scale the range so it has the length of the new range which is D-C. (multiply with D-C):
[0..1] -> [0..D-C]
Shift the range so the lower bound is C. (add C to the bounds):
[0..D-C] -> [C..D]
Combining this to a single formula, we get:
(D-C)*(X-A)
X' = ----------- + C
(B-A)
In your case, A=0.5, B=1, C=0, D=1 you get:
(X-0.5)
X' = ------- = 2X-1
(0.5)
Note, if you have to convert a lot of X to X', you can change the formula to:
(D-C) C*B - A*D
X' = ----- * X + ---------
(B-A) (B-A)
It is also interesting to take a look at non linear ranges. You can take the same steps, but you need an extra step to transform the linear range to a nonlinear range.
Lazyweb answer: To convert a value x from [minimum..maximum] to [floor..ceil]:
General case:
normalized_x = ((ceil - floor) * (x - minimum))/(maximum - minimum) + floor
To normalize to [0..255]:
normalized_x = (255 * (x - minimum))/(maximum - minimum)
To normalize to [0..1]:
normalized_x = (x - minimum)/(maximum - minimum)
× 2 − 1
should do the trick
You could always use clamp or saturate within your math to make sure your final value is between 0-1. Some saturate at the end, but I've seen it done during a computation, too.