Unit Testing probability - unit-testing

I have a method that creates one of 2 different instances (M, N): in a given fraction x of the calls (math.random * x), the method creates object M, and the rest of the time it creates object N.
I have written unit tests that mock the random number, so I can be sure that the method behaves as expected. However, I am not sure how to test (or whether to test) that the probability is accurate; for example, if x = 0.1, I expect 1 out of 10 cases to return instance M.
How do I test this functionality?

Split the test. The first test should allow you to define what the random number generator returns (I assume you already have that). This part of the test only answers the question "do I get the expected result if the random number generator returns some value?".
The second test should run only the random number generator through some statistical analysis function (for example, counting how often it returns each value).
I suggest wrapping the real generator in a wrapper that returns "create M" and "create N" (or possibly just 0 and 1). That way, you can separate the implementation from the place where it's used: the code that creates the two different instances shouldn't need to know how the generator is initialized or how you turn the real result into "create X".
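A minimal sketch of that split, assuming a Python setting. The class names Decider, M and N, the method draw and the two test functions are all hypothetical. The first test stubs the generator so the result is deterministic; the second exercises only the wrapper and checks the observed share of "M" against p_m within a few standard errors, using a seeded generator so a failure can be reproduced.

```python
import random
from collections import Counter


class Decider:
    """Hypothetical wrapper: turns a raw random float into "M" or "N"."""

    def __init__(self, p_m, rng=random.random):
        self.p_m = p_m  # probability of "create M"
        self.rng = rng  # any zero-argument callable returning a float in [0, 1)

    def draw(self):
        return "M" if self.rng() < self.p_m else "N"


class M:
    pass


class N:
    pass


def create(decider):
    # The factory only sees "M"/"N"; it never touches the raw random number.
    return M() if decider.draw() == "M" else N()


def test_create_follows_decision():
    # Test 1: stub the generator so the outcome is deterministic.
    assert isinstance(create(Decider(0.1, rng=lambda: 0.05)), M)
    assert isinstance(create(Decider(0.1, rng=lambda: 0.5)), N)


def test_decider_frequency(p_m=0.1, trials=100_000):
    # Test 2: exercise only the wrapper and compare the observed share of "M" to p_m.
    rng = random.Random(12345).random  # seeded so the test is repeatable
    decider = Decider(p_m, rng)
    counts = Counter(decider.draw() for _ in range(trials))
    observed = counts["M"] / trials
    std_err = (p_m * (1 - p_m) / trials) ** 0.5
    # Allow 5 standard errors; a fixed seed keeps the test from being flaky.
    assert abs(observed - p_m) < 5 * std_err
```

Keeping the tolerance in standard errors, rather than a fixed percentage, lets the same test work for any p_m and trial count.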

I'll do this in Python.
First describe your functionality:
```python
def binomial_process(x):
    '''
    given a probability, x, return M with that probability,
    else return N with probability 1-x
    maybe: return random.random() > x
    '''
```
Then write a first implementation of this functionality:
```python
import random

def binom(x):
    return random.random() > x
```
Then write your test functions, first a setup function to put together your data from an expensive process:
```python
def setUp(x, n):
    counter = dict()
    for _ in range(n):
        result = binom(x)
        counter[result] = counter.get(result, 0) + 1
    return counter
```
Then the actual test:
```python
import scipy.stats

trials = 1000000

def test_binomial_process():
    ps = (.01, .1, .33, .5, .66, .9, .99)
    x_01 = setUp(.01, trials)
    x_1 = setUp(.1, trials)
    x_33 = setUp(.1, trials)
    x_5 = setUp(.5, trials)
    x_66 = setUp(.9, trials)
    x_9 = setUp(.9, trials)
    x_99 = setUp(.99, trials)
    x_01_result = scipy.stats.binom_test(x_01.get(True, 0), trials, .01)
    x_1_result = scipy.stats.binom_test(x_1.get(True, 0), trials, .1)
    x_33_result = scipy.stats.binom_test(x_33.get(True, 0), trials, .33)
    x_5_result = scipy.stats.binom_test(x_5.get(True, 0), trials)
    x_66_result = scipy.stats.binom_test(x_66.get(True, 0), trials, .66)
    x_9_result = scipy.stats.binom_test(x_9.get(True, 0), trials, .9)
    x_99_result = scipy.stats.binom_test(x_99.get(True, 0), trials, .99)
    setups = (x_01, x_1, x_33, x_5, x_66, x_9, x_99)
    results = (x_01_result, x_1_result, x_33_result, x_5_result,
               x_66_result, x_9_result, x_99_result)
    print 'can reject the hypothesis that the following tests are NOT the'
    print 'results of a binomial process (with their given respective'
    print 'probabilities) with probability < .01, {0} trials each'.format(trials)
    for p, setup, result in zip(ps, setups, results):
        print 'p = {0}'.format(p), setup, result, 'reject null' if result < .01 else 'fail to reject'
```
Then write your function (ok, we already did):
```python
def binom(x):
    return random.random() > x
```
And run your tests:
```python
test_binomial_process()
```
On the last run, this gave me:
```
can reject the hypothesis that the following tests are NOT the
results of a binomial process (with their given respective
probabilities) with probability < .01, 1000000 trials each
p = 0.01 {False: 10084, True: 989916} 4.94065645841e-324 reject null
p = 0.1 {False: 100524, True: 899476} 1.48219693752e-323 reject null
p = 0.33 {False: 100633, True: 899367} 2.96439387505e-323 reject null
p = 0.5 {False: 500369, True: 499631} 0.461122365668 fail to reject
p = 0.66 {False: 900144, True: 99856} 2.96439387505e-323 reject null
p = 0.9 {False: 899988, True: 100012} 1.48219693752e-323 reject null
p = 0.99 {False: 989950, True: 10050} 4.94065645841e-324 reject null
```
Why do we fail to reject on p=0.5? Let's look at the help on scipy.stats.binom_test:
```
Help on function binom_test in module scipy.stats.morestats:
binom_test(x, n=None, p=0.5, alternative='two-sided')
Perform a test that the probability of success is p.
This is an exact, two-sided test of the null hypothesis
that the probability of success in a Bernoulli experiment
is `p`.
Parameters
----------
x : integer or array_like
the number of successes, or if x has length 2, it is the
number of successes and the number of failures.
n : integer
the number of trials. This is ignored if x gives both the
number of successes and failures
p : float, optional
The hypothesized probability of success. 0 <= p <= 1. The
default value is p = 0.5
alternative : {'two-sided', 'greater', 'less'}, optional
Indicates the alternative hypothesis. The default value is
'two-sided'.
```
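So 0.5 is only the default value of p. In every call above, the null hypothesis is "the probability of True is p", and a small p-value ("reject null") means the counts are inconsistent with that p. That is why almost every row is rejected: binom returns True with probability 1 - x (random.random() > x), not x, and x_33 and x_66 were sampled with .1 and .9 instead of .33 and .66. Only at p = 0.5 does 1 - x equal x, so that is the one case where the test (correctly) fails to reject. In other words, the statistical test did its job and caught the inverted comparison.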
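For comparison, a stdlib-only sketch that uses `<` (so True, meaning "create M", has probability x) next to the `>` comparison from the `binom` above, so you can see the frequency check separate the two. It swaps the exact binom_test for a normal-approximation z-score, which is an assumption that holds up for large trial counts; newer SciPy releases provide scipy.stats.binomtest if you prefer an exact test. The names binom_inverted and z_score are hypothetical.

```python
import math
import random


def binom(x, rng=random.random):
    # True ("create M") with probability x: note "<", not ">".
    return rng() < x


def binom_inverted(x, rng=random.random):
    # The ">" comparison: True with probability 1 - x.
    return rng() > x


def z_score(fn, x, trials=100_000, seed=0):
    """Standard errors between the observed number of True results and trials * x
    (normal approximation to the binomial, reasonable for large trials)."""
    if not 0 < x < 1:
        raise ValueError("x must be strictly between 0 and 1")
    rng = random.Random(seed).random  # seeded, so a failing run can be reproduced
    hits = sum(fn(x, rng) for _ in range(trials))
    expected = trials * x
    return (hits - expected) / math.sqrt(trials * x * (1 - x))


for p in (.01, .1, .33, .5, .66, .9, .99):
    # |z| well above 4 or 5 means the observed frequency does not match p.
    print(p, round(z_score(binom, p), 2), round(z_score(binom_inverted, p), 2))
```

Only the p = 0.5 row should look acceptable for binom_inverted, for the same reason the original test failed to reject there.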

Related

Getting Values from IPOPT Display Pyomo

Here is my code for a concrete model of the Rosenbrock function.
```python
from pyomo.environ import *
from pyomo.opt import SolverFactory
import numpy as np
import math
import statistics
import time

m = ConcreteModel()
m.x = Var()
m.y = Var()
m.z = Var()

def rosenbrock(model):
    return ((1.0 - model.x)**2 + 100.0*(model.y - model.x**2)**2
            + (1.0 - model.y)**2 + 100.0*(model.z - model.y**2)**2)

m.obj = Objective(rule=rosenbrock, sense=minimize)

dist = 0.0
# Separate lists: "a = b = []" would make every name refer to ONE shared list
xval, yval, zval, error, times = [], [], [], [], []
for i in range(50):
    # random starting point in [-5, 5)^3
    m.x = np.random.uniform(low=-5.0, high=5.0)
    m.y = np.random.uniform(low=-5.0, high=5.0)
    m.z = np.random.uniform(low=-5.0, high=5.0)
    solver = SolverFactory('ipopt')
    t1 = time.time()
    results = solver.solve(m, tee=True)
```
When passed tee=True, the solver.solve line prints a nice display with all sorts of useful information. I want to access that information from the printout; I have scoured the Pyomo and IPOPT documentation and cannot work out how to access the values that are printed to the screen. I've included a short example of the printout below. I want to save the values from each run so that I can iterate and gather statistics over the whole range.
```
Number of nonzeros in equality constraint Jacobian...: 0
Number of nonzeros in inequality constraint Jacobian.: 0
Number of nonzeros in Lagrangian Hessian.............: 5
Total number of variables............................: 3
                     variables with only lower bounds: 0
                variables with lower and upper bounds: 0
                     variables with only upper bounds: 0
Total number of equality constraints.................: 0
Total number of inequality constraints...............: 0
        inequality constraints with only lower bounds: 0
   inequality constraints with lower and upper bounds: 0
        inequality constraints with only upper bounds: 0
****OMITTED****
Number of objective function evaluations = 45
Number of objective gradient evaluations = 23
Number of equality constraint evaluations = 0
Number of inequality constraint evaluations = 0
Number of equality constraint Jacobian evaluations = 0
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations = 22
Total CPU secs in IPOPT (w/o function evaluations) = 0.020
Total CPU secs in NLP function evaluations = 0.000
```
I need some of these values, but from my search of the documentation I see no interface for accessing them. Do any wizards know how to do this? Thanks.
See this Ipopt solver wrapper that was contributed to Pyomo. It's essentially a parser for the Ipopt output log and you should be able to generalize/expand it to collect any values that aren't currently collected.
https://github.com/Pyomo/pyomo/blob/master/pyomo/contrib/parmest/ipopt_solver_wrapper.py
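A minimal sketch of the parsing idea, assuming you already have the Ipopt log as a string (for example, saved to a file; I believe Pyomo's solve accepts a logfile argument for solvers such as ipopt, but check your version). It collects both "label = number" lines (evaluation counts, CPU times) and dotted "label....: number" lines from the printout above, and skips everything else. parse_ipopt_stats is a hypothetical helper, not part of Pyomo or the linked wrapper.

```python
import re

# "label = number" or "label.......: number"; trailing dots are not part of the key.
_STAT = re.compile(
    r"^\s*(?P<key>[^=:]+?)\.*\s*[=:]\s*"
    r"(?P<value>[-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)\s*$"
)


def parse_ipopt_stats(log_text):
    """Return {label: number} for every single-number statistic line in an Ipopt log."""
    stats = {}
    for line in log_text.splitlines():
        match = _STAT.match(line)
        if not match:
            continue  # iteration table, banner, "EXIT: ..." and other lines
        raw = match.group("value")
        try:
            stats[match.group("key")] = int(raw)
        except ValueError:
            stats[match.group("key")] = float(raw)
    return stats
```

Inside your loop you could append parse_ipopt_stats(log_text) for each run to a list and compute statistics over that list afterwards.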

Wrong answer when calculating binomial coefficients using modified formula

I am trying to write a function to calculate the binomial coefficients using this formula:
$$\binom{n}{i} = \prod_{j=1}^{n-i} \frac{i+j}{j}$$
The problem I am having is that I cannot manage to get the correct answer. Here are two ways I have tried to write the function.
```python
def binomial(n, i):
    total = 0
    for j in range(1, (n-i+1)):
        n = float(n)
        i = float(i)
        j = float(j)
        product = (i+j) / j
        if total == 0:
            total = product
        else:
            total = total * product
    print '%.f' %total
```
Or like this, using NumPy:
```python
import numpy as np
def binomial_np(n, i):
    array = np.zeros(n-i+1)
    for j in range(1, (n-i+1)):
        s = float(j)
        n = float(n)
        i = float(i)
        array[j] = (i+s)/s
    array = array[1 : ]
    array = np.prod(array)
    print '%.f' %array
```
Both functions produce almost the correct result. After looking around the forum a bit, I found some other examples that do produce the correct result, like this one from Python Binomial Coefficient:
```python
import math
x = int(input("Enter a value for x: "))
y = int(input("Enter a value for y: "))
if y == x:
    print(1)
elif y == 1: # see georg's comment
    print(x)
elif y > x: # will be executed only if y != 1 and y != x
    print(0)
else: # will be executed only if y != 1 and y != x and y < x
    a = math.factorial(x)
    b = math.factorial(y)
    c = math.factorial(x-y) # that appears to be useful to get the correct result
    div = a // (b * c)
    print(div)
```
The real question I have is whether there is something wrong with the way I have written the formulas, or whether it just isn't possible to get the correct answer this way because of how floats and decimals work in Python. I hope someone can point me in the right direction on what I am doing wrong here.
The slight discrepancies seem to come from using floating point arithmetic. However, if you are sure that n and i are integers, there is no need at all for floating point values in your routine. You can just do
```python
def binomial(n, i):
    result = 1
    for j in range(1, n-i+1):
        result = result * (i+j) // j
    return result
```
This works because the product of 2 consecutive numbers is divisible by 1*2, the product of 3 consecutive numbers is divisible by 1*2*3, ..., and the product of n-i consecutive numbers is divisible by (n-i)!. The calculations in the code above are ordered so that only integers result, so you get an exact answer. This is because my code does not calculate (i+j)/j as your code does; it calculates result * (i+j) and only then divides by j. This code also does a fairly good job of keeping the integer values as small as possible, which should increase speed.
Of course, if n or i is a float rather than an integer, this may not work. Also note that this code does not check that 0 <= i <= n, which should be done.
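A Python 3 sketch of the integer-only loop with the type and 0 <= i <= n checks added, cross-checked against math.comb (available since Python 3.8). The comment records the invariant that makes each // exact: before step j, result equals C(i+j-1, j-1), so result * (i+j) equals j * C(i+j, j).

```python
import math


def binomial(n, i):
    """Exact C(n, i) using only integer arithmetic."""
    if not (isinstance(n, int) and isinstance(i, int)):
        raise TypeError("n and i must be integers")
    if not 0 <= i <= n:
        raise ValueError("need 0 <= i <= n")
    result = 1
    for j in range(1, n - i + 1):
        # Here result == C(i + j - 1, j - 1), so result * (i + j) is divisible by j.
        result = result * (i + j) // j
    return result


# Cross-check against the standard library for every 0 <= i <= n < 60.
assert all(binomial(n, i) == math.comb(n, i) for n in range(60) for i in range(n + 1))
```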
I would indeed see float precision as the main problem here. You do floating point division, which means your integers may get rounded. I suggest you maintain the numerator and denominator as separate numbers, and do the division in the end. Or, if the numbers get too big using this approach, write some gcd computation and cancel common factors along the way. But only do integer divisions (//) to avoid loss of precision.
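A sketch of the numerator/denominator suggestion, cancelling common factors with math.gcd at every step and dividing only at the end. binomial_fraction is a hypothetical name. Every division is an exact integer division (//), so no precision is lost.

```python
import math


def binomial_fraction(n, i):
    """C(n, i) as a product of (i + j) / j, kept as a reduced integer fraction."""
    if not 0 <= i <= n:
        raise ValueError("need 0 <= i <= n")
    num, den = 1, 1
    for j in range(1, n - i + 1):
        num *= i + j
        den *= j
        g = math.gcd(num, den)  # cancel common factors to keep the numbers small
        num //= g
        den //= g
    # The product is an integer, so the fully reduced denominator is 1.
    return num // den
```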

avoiding zero float division in loops - python

Sorry, I am a bit of a newbie with programming, but I am getting a float division error in a simple loop and I am not sure how to rectify it.
Here is the code, in Python 2.7:
```python
import random
N = 100
A = []
p = 0
q = 0
k = 1
while k<=N:
    x = random.random()
    if x<= 0.5:
        p+= 1
    else:
        q+=1
    y = p/q
    A.append(y)
    k+=1
```
Running this code gives a zero division error, which I am not able to rectify. Can anyone tell me how to fix it?
You are getting a zero division error because of this code:
```python
if x <= 0.5:
    p+=1
else:
    q+=1
y= p/q
```
You have initialised q = 0, so when the while loop runs for the first time and x <= 0.5, p is incremented but q is still zero, and in the next step you divide p by q (which is zero). You need to check before performing the division that the denominator is not zero. You can fix it in the following manner:
```python
if x <= 0.5:
    p+=1
else:
    q+=1
if (q == 0):
    print "Denominator is zero"
else:
    y= p/q
```
This is just one solution since I don't know what you are trying to do in your code.
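A Python 3 sketch of the same guard applied to the whole loop. Where the answer prints a message, this version records None for steps where q is still zero; that is a hypothetical choice, so substitute whatever makes sense for your analysis. In Python 3, / is true division, so p / q gives a float ratio rather than a truncated integer. running_ratios is a hypothetical name.

```python
import random


def running_ratios(n=100, seed=None):
    """Ratio p/q after each of n coin flips; None while q is still zero."""
    rng = random.Random(seed)
    p = q = 0
    ratios = []
    for _ in range(n):
        if rng.random() <= 0.5:
            p += 1
        else:
            q += 1
        # Guard the division: q stays 0 until the first "else" branch runs.
        ratios.append(p / q if q else None)
    return ratios


print(running_ratios(10, seed=42))
```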
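You can use numpy.nextafter(q, 1).
This gives you the next representable floating-point value after q in the direction of 1. When q is 0, that is the smallest positive float (about 5e-324), so the division no longer raises an error, although p divided by it overflows to inf whenever p >= 1. When q is already positive, the divisor changes only in its last bit.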

Quadratic Equations Factored Form

I'm a beginner, with Python as my first language, trying to factor a quadratic so that the result is given in factored form, for example:
x^2+5x+4
The output should be (or any factors in parentheses):
(x+4)(x+1)
So far, this only gives me x, and not even a correct value:
```python
def quadratic(a,b,c):
    x = -b+(((b**2)-(4*a*c))**(1/2))/(2*a)
    return x
print quadratic(1,5,4)
```
Your parentheses are in the wrong places, you're only calculating and returning one root, and, most importantly, you're using **(1/2) to calculate the square root. In Python 2, 1/2 evaluates to 0 (integer division), so you are actually raising to the power 0. To get 0.5, use (1./2) (or 0.5 directly).
This is (slightly) better:
```python
def quadratic(a,b,c):
    x1 = (-b+(b**2 - 4*a*c)**(1./2))/(2*a)
    x2 = (-b-(b**2 - 4*a*c)**(1./2))/(2*a)
    return x1, x2
print quadratic(1,5,4)
```
and returns (-1.0, -4.0).
To get your parentheses, put the negative of the roots in an appropriate string:
```python
def quadratic(a,b,c):
    x1 = (-b+(b**2 - 4*a*c)**(1./2))/(2*a)
    x2 = (-b-(b**2 - 4*a*c)**(1./2))/(2*a)
    return '(x{:+f})(x{:+f})'.format(-x1,-x2)
print quadratic(1,5,4)
```
Returns:
```
(x+1.000000)(x+4.000000)
```
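A Python 3 sketch of the same idea that prints whole-number roots without trailing decimals, keeps a leading coefficient other than 1, and refuses a negative discriminant instead of producing complex numbers. factor_quadratic is a hypothetical name, and the `:g` format rounds to six significant digits, so irrational roots are only shown approximately.

```python
import math


def factor_quadratic(a, b, c):
    """Return a*x^2 + b*x + c as a string like "(x+1)(x+4)" (real roots only)."""
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real roots, so no real factorization")
    root = math.sqrt(disc)
    x1 = (-b + root) / (2 * a)
    x2 = (-b - root) / (2 * a)

    def factor(r):
        # The factor for root r is (x - r), so print the sign of -r.
        v = -r
        return "(x{}{:g})".format("+" if v >= 0 else "-", abs(v))

    lead = "" if a == 1 else "{:g}".format(a)
    return lead + factor(x1) + factor(x2)


print(factor_quadratic(1, 5, 4))  # (x+1)(x+4)
```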
This will help you:
```python
from __future__ import division
def quadratic(a,b,c):
    x = (-b+((b**2)-(4*a*c))**(1/2))/(2*a)
    y = (-b-((b**2)-(4*a*c))**(1/2))/(2*a)
    return x,y
m,n = quadratic(1,5,4)
sign_of_m = '-' if m > 0 else '+'
sign_of_n = '-' if n > 0 else '+'
print '(x'+sign_of_m+str(abs(m))+')(x'+sign_of_n+str(abs(n))+')'
```
Output:
```
(x+1.0)(x+4.0)
```
Let me know if it helps.

Finding the fibonacci numbers in a certain range using python

I am trying to write a function in Python that returns a list of all the Fibonacci numbers in a certain range, but my code won't work; it simply returns [0]. What is the problem?
```python
from math import sqrt
def F(n):
    return int(((1+sqrt(5))**n-(1-sqrt(5))**n)/(2**n*sqrt(5)))
def Frange(x):
    A = [0]
    while max(A) < x:
        H = 1
        for i in range(H):
            A.append(F(i))
        H = H+1
    return A
```
You set H = 1 as the first statement in your while loop, so every time you enter the for loop, H = 1 and you only get the Fibonacci number for n=0.
You need to set H = 1 outside the while loop:
```python
def Frange(x):
    A = [0]
    H = 1
    while max(A) < x:
        for i in range(H):
            A.append(F(i))
        H = H+1
    return A
```
You could have solved this yourself very easily by printing various values inside the loops, such as print H.
I found another error and the improved code is:
```python
from math import sqrt
def F(n):
    return int(((1+sqrt(5))**n-(1-sqrt(5))**n)/(2**n*sqrt(5)))
def Frange2(x):
    A = [0]
    H = 1
    while max(A) < x:
        if F(H) < x:
            A.append(F(H))
        else:
            break
        H = H+1
    return A
```
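A simple and fast way to compute a list of Fibonacci numbers is to build it bottom-up with exact integer arithmetic. Note that fib3(n) returns the first n+1 Fibonacci numbers, F(0) to F(n), rather than the numbers below a bound:
```python
def fib3(n):
    fibs = [0, 1]  # list built from the bottom up
    for i in range(2, n+1):
        fibs.append(fibs[-1] + fibs[-2])
    return fibs
```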
This function stores the computed Fibonacci numbers in a list and reuses them as 'cached' values to compute the next ones.
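Since the question asks for the numbers below a bound rather than the first n, here is a sketch combining the two ideas: build the sequence iteratively with exact integers (so there is no rounding from the floating-point closed-form formula used in F) and stop at the limit. fib_below is a hypothetical name; unlike Frange2, it returns [] when the limit is 0 or less.

```python
def fib_below(limit):
    """All Fibonacci numbers strictly less than limit, using exact integers."""
    result = []
    a, b = 0, 1
    while a < limit:
        result.append(a)
        a, b = b, a + b  # advance one step in the sequence
    return result


print(fib_below(10))  # [0, 1, 1, 2, 3, 5, 8]
```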
Hope it helps!