Wrong answer when calculating binomial coefficients using modified formula - python-2.7

I am trying to write a function to calculate the binomial coefficients using this formula:
The problem I am having is that I can not mange to get the correct answer. This is an example of two ways I have tried to write the function.
def binomial(n, i):
total = 0
for j in range(1, (n-i+1)):
n = float(n)
i = float(i)
j = float(j)
product = (i+j) / j
if total == 0:
total = product
else:
total = total * product
print '%.f' %total
or like this using numpy
import numpy as np
def binomial_np(n, i):
array = np.zeros(n-i+1)
for j in range(1, (n-i+1)):
s = float(j)
n = float(n)
i = float(i)
array[j] = (i+s)/s
array = array[1 : ]
array = np.prod(array)
print '%.f' %array
Both of the functions produces almost the correct result. After looking around a bit on the forum I did find some other examples that do produce the correct result, like this one from Python Binomial Coefficient
import math
x = int(input("Enter a value for x: "))
y = int(input("Enter a value for y: "))
if y == x:
print(1)
elif y == 1: # see georg's comment
print(x)
elif y > x: # will be executed only if y != 1 and y != x
print(0)
else: # will be executed only if y != 1 and y != x and x <= y
a = math.factorial(x)
b = math.factorial(y)
c = math.factorial(x-y) # that appears to be useful to get the correct result
div = a // (b * c)
print(div)
The real question I have from this is if there is something wrong with the way I have written the formulas, or if it just isnt possible to get the correct answer this way because of how float's and number of decimals work in Python. Hope someone can point me in the right direction on what I am doing wrong here.

The slight discrepancies seem to come from using floating point arithmetic. However, if you are sure that n and i are integers, there is no need at all for floating point values in your routine. You can just do
def binomial(n, i):
result = 1
for j in range(1, n-i+1):
result = result * (i+j) // j
return result
This works because the product of 2 consecutive numbers is divisible by 1*2, the product of 3 consecutive numbers is divisible by 1*2*3, ... the product of n-i consecutive numbers is divisible by (n-i)!. The calculations in the code above are ordered to that only integers result, so you get an exact answer. This because my code does not calculate (i+j)/j as your code does; it calculates result * (i+j) and only then divides by j. This code also does a fairly good job of keeping the integer values as small as possible, which should increase speed.
If course, if n or i is float rather than integer, this may not work. Also note this code does not check that 0 <= i <= n, which should be done.

I would indeed see float precision as the main problem here. You do floating point division, which means your integers may get rounded. I suggest you maintain the numerator and denominator as separate numbers, and do the division in the end. Or, if the numbers get too big using this approach, write some gcd computation and cancel common factors along the way. But only do integer divisions (//) to avoid loss of precision.

Related

avoiding zero float division in loops - python

sorry i am a bit of a newbie with programming but I am getting a float division error in a simple loop which I am not sure how to rectify.
Here is a code in python 2.7
import random
N = 100
A = []
p = 0
q = 0
k = 1
while k<=N:
x = random.random()
if x<= 0.5:
p+= 1
else:
q+=1
y = p/q
A.append(y)
k+=1
Running this code gives a zero division error. which I am not able to rectify. Can anyone tell me how to rectify this?
You are getting zero division error because of this code
if x <= 0.5:
p+=1
else:
q+=1
y= p/q
You have initialised q = 0 thus when while loop is run first time and if x <= 0.5 then p will be incremented but q will be equal to zero and in next step you are dividing p by q(which is zero). You need to put a check condition before performing division so that denominator is not zero. You can rectify it in following manner.
if x <= 0.5:
p+=1
else:
q+=1
if (q == 0):
print "Denominator is zero"
else:
y= p/q
This is just one solution since I don't know what you are trying to do in your code.
You can use numpy.nextafter(q, 1).
This gives you the next floating-point value after q towards 1, which is very small number.

Matrix multiplication with Python

I have a numerical analysis assignment and I need to find some coefficients by multiplying matrices. We were given an example in Mathcad, but now we have to do it in another programming language so I chose Python.
The problem is, that I get different results by multiplying matrices in respective environments. Here's the function in Python:
from numpy import *
def matrica(C, n):
N = len(C) - 1
m = N - n
A = [[0] * (N + 1) for i in range(N+1)]
A[0][0] = 1
for i in range(0, n + 1):
A[i][i] = 1
for j in range(1, m + 1):
for i in range(0, N + 1):
if i + j <= N:
A[i+j][n+j] = A[i+j][n+j] - C[i]/2
A[int(abs(i - j))][n+j] = A[int(abs(i - j))][n+j] - C[i]/2
M = matrix(A)
x = matrix([[x] for x in C])
return [float(y) for y in M.I * x]
As you can see I am using numpy library. This function is consistent with its analog in Mathcad until return statement, the part where matrices are multiplied, to be more specific. One more observation: this function returns correct matrix if N = 1.
I'm not sure I understand exactly what your code do. Could you explain a little more, like what math stuff you're actually computing. But if you want a plain regular product and if you use a numpy.matrix, why don't you use the already written matrix product?
a = numpy.matrix(...)
b = numpy.matrix(...)
p = a * b #matrix product

Finding the fibonacci numbers in a certain range using python

I am trying to write a function in python that returns a list of all the fibonacci numbers in a certain range but my code wont work it simply returns [0]. What is the problem?
from math import sqrt
def F(n):
return int(((1+sqrt(5))**n-(1-sqrt(5))**n)/(2**n*sqrt(5)))
def Frange(x):
A = [0]
while max(A) < x:
H = 1
for i in range(H):
A.append(F(i))
H = H+1
return A
You set H = 1 as the first statement in your while loop; so every time you enter the for loop, H = 1 and you'll only get the Fibonacci number for n=0
You need to set H = 1 outside the while loop:
def Frange(x):
A = [0]
H = 1
while max(A) < x:
for i in range(H):
A.append(F(i))
H = H+1
return A
You could have solved this yourself very easily by printing various values inside the loops, such as print H.
I found another error and the improved code is:
from math import sqrt
def F(n):
return int(((1+sqrt(5))**n-(1-sqrt(5))**n)/(2**n*sqrt(5)))
def Frange2(x):
A = [0]
H = 1
while max(A) < x:
if F(H) < x:
A.append(F(H))
else:
break
H = H+1
return A
The fastest and most popular and uncomplicated solution to calculating a list of fibonacci numbers in a range is
def fib3(n): #FASTEST YET
fibs= [0,1] #list from bottom up
for i in range(2, n+1):
fibs.append(fibs[-1]+fibs[-2])
return fibs
This function stores the computed fibonacci numbers in a list and later uses them as 'cached' numbers to compute further.
Hope it helps!

i was solving a sequence of large numbers, but python shell stop responding

I am a python newbie
How to deal with large numbers in python?
If i need to calculate the value of n*(2**(n-1))%m
where,
n=1000000000000000009 m=1000000007
when i was dealing with small value of n up to
n=1000009
it works but python shell stops responding for higher values
You should use a modular exponentiation method. Python builtin pow does that for you:
pow(x, y[, z]) -> number
With two arguments, equivalent to x**y. With three arguments,
equivalent to (x**y) % z, but may be more efficient (e.g. for longs).
Here is a non builtin implementation (copied from here):
def f(x,e,m):
X = x
E = e
Y = 1
while E > 0:
if E % 2 == 0:
X = (X * X) % m
E = E/2
else:
Y = (X * Y) % m
E = E - 1
return Y
Finally, what you need is
>>> n=1000000000000000009
>>> m=1000000007
>>> n*pow(2,n-1, m) % m
783433706L
n*(2**(n-1)) will be huge, so computing this the naive way will fail. Instead, you should try to split this up into reasonably sized chunks of work, and compute the modulo operation after each such chunk. A common technique here is binarization: you compute 2%m, 4%m=((2%m)*(2%m))%m, 16%m=((4%m)*(4%m))%m, 256%m=((16%m)*(16%m))%m and so on, and then multiply those values for which (n-1) has a bit set.
I think the lag may just be Python taking time to calculate the massive numbers you're dealing with.

Probability density function from a paper, implemented using C++, not working as intended

So i'm implementing a heuristic algorithm, and i've come across this function.
I have an array of 1 to n (0 to n-1 on C, w/e). I want to choose a number of elements i'll copy to another array. Given a parameter y, (0 < y <= 1), i want to have a distribution of numbers whose average is (y * n). That means that whenever i call this function, it gives me a number, between 0 and n, and the average of these numbers is y*n.
According to the author, "l" is a random number: 0 < l < n . On my test code its currently generating 0 <= l <= n. And i had the right code, but i'm messing with this for hours now, and i'm lazy to code it back.
So i coded the first part of the function, for y <= 0.5
I set y to 0.2, and n to 100. That means it had to return a number between 0 and 99, with average 20.
And the results aren't between 0 and n, but some floats. And the bigger n is, smaller this float is.
This is the C test code. "x" is the "l" parameter.
//hate how code tag works, it's not even working now
int n = 100;
float y = 0.2;
float n_copy;
for(int i = 0 ; i < 20 ; i++)
{
float x = (float) (rand()/(float)RAND_MAX); // 0 <= x <= 1
x = x * n; // 0 <= x <= n
float p1 = (1 - y) / (n*y);
float p2 = (1 - ( x / n ));
float exp = (1 - (2*y)) / y;
p2 = pow(p2, exp);
n_copy = p1 * p2;
printf("%.5f\n", n_copy);
}
And here are some results (5 decimals truncated):
0.03354
0.00484
0.00003
0.00029
0.00020
0.00028
0.00263
0.01619
0.00032
0.00000
0.03598
0.03975
0.00704
0.00176
0.00001
0.01333
0.03396
0.02795
0.00005
0.00860
The article is:
http://www.scribd.com/doc/3097936/cAS-The-Cunning-Ant-System
pages 6 and 7.
or search "cAS: cunning ant system" on google.
So what am i doing wrong? i don't believe the author is wrong, because there are more than 5 papers describing this same function.
all my internets to whoever helps me. This is important to my work.
Thanks :)
You may misunderstand what is expected of you.
Given a (properly normalized) PDF, and wanting to throw a random distribution consistent with it, you form the Cumulative Probability Distribution (CDF) by integrating the PDF, then invert the CDF, and use a uniform random predicate as the argument of the inverted function.
A little more detail.
f_s(l) is the PDF, and has been normalized on [0,n).
Now you integrate it to form the CDF
g_s(l') = \int_0^{l'} dl f_s(l)
Note that this is a definite integral to an unspecified endpoint which I have called l'. The CDF is accordingly a function of l'. Assuming we have the normalization right, g_s(N) = 1.0. If this is not so we apply a simple coefficient to fix it.
Next invert the CDF and call the result G^{-1}(x). For this you'll probably want to choose a particular value of gamma.
Then throw uniform random number on [0,n), and use those as the argument, x, to G^{-1}. The result should lie between [0,1), and should be distributed according to f_s.
Like Justin said, you can use a computer algebra system for the math.
dmckee is actually correct, but I thought that I would elaborate more and try to explain away some of the confusion here. I could definitely fail. f_s(l), the function you have in your pretty formula above, is the probability distribution function. It tells you, for a given input l between 0 and n, the probability that l is the segment length. The sum (integral) for all values between 0 and n should be equal to 1.
The graph at the top of page 7 confuses this point. It plots l vs. f_s(l), but you have to watch out for the stray factors it puts on the side. You notice that the values on the bottom go from 0 to 1, but there is a factor of x n on the side, which means that the l values actually go from 0 to n. Also, on the y-axis there is a x 1/n which means these values don't actually go up to about 3, they go to 3/n.
So what do you do now? Well, you need to solve for the cumulative distribution function by integrating the probability distribution function over l which actually turns out to be not too bad (I did it with the Wolfram Mathematica Online Integrator by using x for l and using only the equation for y <= .5). That however was using an indefinite integral and you are really integration along x from 0 to l. If we set the resulting equation equal to some variable (z for instance), the goal now is to solve for l as a function of z. z here is a random number between 0 and 1. You can try using a symbolic solver for this part if you would like (I would). Then you have not only achieved your goal of being able to pick random ls from this distribution, you have also achieved nirvana.
A little more work done
I'll help a little bit more. I tried doing what I said about for y <= .5, but the symbolic algebra system I was using wasn't able to do the inversion (some other system might be able to). However, then I decided to try using the equation for .5 < y <= 1. This turns out to be much easier. If I change l to x in f_s(l) I get
y / n / (1 - y) * (x / n)^((2 * y - 1) / (1 - y))
Integrating this over x from 0 to l I got (using Mathematica's Online Integrator):
(l / n)^(y / (1 - y))
It doesn't get much nicer than that with this sort of thing. If I set this equal to z and solve for l I get:
l = n * z^(1 / y - 1) for .5 < y <= 1
One quick check is for y = 1. In this case, we get l = n no matter what z is. So far so good. Now, you just generate z (a random number between 0 and 1) and you get an l that is distributed as you desired for .5 < y <= 1. But wait, looking at the graph on page 7 you notice that the probability distribution function is symmetric. That means that we can use the above result to find the value for 0 < y <= .5. We just change l -> n-l and y -> 1-y and get
n - l = n * z^(1 / (1 - y) - 1)
l = n * (1 - z^(1 / (1 - y) - 1)) for 0 < y <= .5
Anyway, that should solve your problem unless I made some error somewhere. Good luck.
Given that for any values l, y, n as described, the terms you call p1 and p2 are both in [0,1) and exp is in [1,..) making pow(p2, exp) also in [0,1) thus I don't see how you'd ever get an output with the range [0,n)