Auto-rounding problems in Python (python-2.7)

I am working on a problem that essentially comes down to solving the equation (b/n)*((b-1)/(n-1)) = 0.5, where the only constraint is that the lower limit of n is 10**12. I was able to solve the problem using the methods described at https://www.alpertron.com.ar/QUAD.HTM
However, I also tried solving the problem as a quadratic equation, checking that the answers are integers and that the required ratio is reached. The program works for lower values of n, but as soon as n approaches the required limit (10**12) it starts giving false solutions. For example, the program yields
b = 707106783028 and
n = 1000000002604
as a set of solutions, and yet it is not one: (b/n)*((b-1)/(n-1)) gives 0.499999999999, but Python just treats it as 0.5. I tried using x.hex() to account for that, but it did not help. Is there any way to make Python store and display the true (or most accurate possible) value of a float?
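One way to sidestep the rounding entirely is to verify candidate solutions in exact integer or rational arithmetic; a minimal sketch using the values from the question:

    from fractions import Fraction

    b, n = 707106783028, 1000000002604
    # Exact rational arithmetic: no rounding, so a near-miss is detected.
    print(Fraction(b, n) * Fraction(b - 1, n - 1) == Fraction(1, 2))  # False
    # Equivalently, in pure integer arithmetic:
    print(2 * b * (b - 1) == n * (n - 1))  # False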

Related

How to know if the optimization problem is infeasible or not? Pyomo Warning: Problem may be infeasible

Pyomo can find a solution, but it gives this warning:
WARNING: Loading a SolverResults object with a warning status into
model=(SecondCD);
message from solver=Ipopt 3.11.1: Converged to a locally infeasible point. Problem may be infeasible.
How do I know if the problem is infeasible or not?
This Pyomo model optimizes a farm's input-allocation decision:
from pyomo.environ import *  # Set, Param, Var, Expression, SolverFactory, prod, ...

model = AbstractModel()  # instantiated from SecondCD.dat below
model.Crops = Set()   # set Crops := cereal rapes maize ;
model.Inputs = Set()  # set Inputs := land labor capital fertilizer ;
model.b = Param(model.Inputs)  # parameters of the CD (Cobb-Douglas) production function
model.x = Var(model.Crops, model.Inputs, initialize=100, within=NonNegativeReals)

def production_function(model, i):
    return prod(model.x[i, j] ** model.b[j] for j in model.Inputs)

model.Q = Expression(model.Crops, rule=production_function)
...
instance = model.create_instance(data="SecondCD.dat")
opt = SolverFactory("ipopt")
opt.options["tol"] = 1E-64
results = opt.solve(instance, tee=True)  # solves and updates instance
instance.display()
If I set b >= 1 (e.g.: param b := land 1 labor 1 capital 1 fertilizer 1),
Pyomo can find an optimal solution;
but if I set b < 1 (e.g.: param b := land 0.1 labor 0.1 capital 0.1 fertilizer 0.1) and set opt.options["tol"] = 1E-64, Pyomo can find a solution, but it gives that warning.
I expect an optimal solution, but the actual run produces the warning mentioned above.
The message you get (message from solver=Ipopt 3.11.1: Converged to a locally infeasible point. Problem may be infeasible.) doesn't mean that the problem is necessarily infeasible. A non-linear solver will typically give you a local optimum, and the path taken to reach the solution is a very important part of finding a "better" local optimum. When you tried another starting point, you found a feasible solution, and that is proof that your problem is feasible.
Finding the global optimum instead of a local optimum is a little bit harder. One way is to check whether your problem is convex: if it is, there is only one local optimum, and that local optimum is the global optimum. This can be shown mathematically (see https://math.stackexchange.com/a/1707213/470821 and http://www.princeton.edu/~amirali/Public/Teaching/ORF523/S16/ORF523_S16_Lec7_gh.pdf, from a quick Google search). If your problem is not convex, you can try to prove that there are few local optima and that they can be found easily from good starting points. Finally, if that can't be done, you should consider more advanced techniques, all with their pros and cons. For example, you can generate a set of starting solutions to make sure that you cover the whole feasible domain of your problem (as sketched below). Another option is to use meta-heuristic methods to help you find a better starting solution.
I am also sure that Ipopt has some tools to help with finding a good starting solution that improves the resulting local optimum.
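Here is a minimal multi-start sketch of that idea; the initialization range, the trial count, and the Objective named obj are assumptions for illustration, not part of the question's model:

    import random
    from pyomo.environ import SolverFactory, value
    from pyomo.opt import TerminationCondition

    opt = SolverFactory("ipopt")
    best = None
    for trial in range(10):
        instance = model.create_instance(data="SecondCD.dat")
        # Scatter the starting point over a (hypothetical) plausible range.
        for i in instance.Crops:
            for j in instance.Inputs:
                instance.x[i, j].value = random.uniform(1, 1000)
        results = opt.solve(instance)
        if results.solver.termination_condition == TerminationCondition.optimal:
            obj = value(instance.obj)  # assumes an Objective named obj
            if best is None or obj > best:
                best = obj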

UserWarning in pymc3: What does reparameterize mean?

I built a pymc3 model using the DensityDist distribution. I have four parameters, of which three use Metropolis and one uses NUTS (this is chosen automatically by pymc3). However, I get two different UserWarnings:
1. Chain 0 contains a number of diverging samples after tuning. If increasing target_accept does not help, try to reparameterize.
May I know what reparameterize means here?
2. The acceptance probability in chain 0 does not match the target. It is , but should be close to 0.8. Try to increase the number of tuning steps.
Digging through a few examples, I used random_seed, discard_tuned_samples, step = pm.NUTS(target_accept=0.95), and so on, and got rid of these warnings. But I couldn't find details of how these parameter values should be chosen. I am sure this has been discussed in various contexts, but I was unable to find solid documentation on it. I was using trial and error, as below.
with patten_study:
    # SEED = 61290425  # 51290425
    step = pm.NUTS(target_accept=0.95)
    trace = pm.sample(step=step)  # 4000, tune=10000, step=step, discard_tuned_samples=False, random_seed=SEED
I need to run this on different datasets, so I am struggling to fix these parameter values for each dataset I use. Is there any way to supply these values, check the outcome (whether there are any warnings), and then try other values in a loop?
Pardon me if I am asking something stupid!
In this context, re-parametrization basically means finding a different but equivalent model that is easier to compute. There are many things you can do, depending on the details of your model:
- Instead of using a Uniform distribution, use a Normal distribution with a large variance.
- Change from a centered hierarchical model to a non-centered one (see the sketch below).
- Replace a Gaussian with a Student-T.
- Model a discrete variable as a continuous one.
- Marginalize variables, like in this example.
Whether these changes make sense or not is something that you should decide based on your knowledge of the model and the problem.
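For instance, a minimal sketch of the centered vs. non-centered change; the hierarchical normal model below is illustrative, not the question's DensityDist model:

    import pymc3 as pm

    with pm.Model() as centered:
        mu = pm.Normal("mu", 0., 5.)
        sigma = pm.HalfNormal("sigma", 5.)
        # Sampling theta directly can create "funnel" geometry and divergences.
        theta = pm.Normal("theta", mu=mu, sigma=sigma, shape=8)

    with pm.Model() as non_centered:
        mu = pm.Normal("mu", 0., 5.)
        sigma = pm.HalfNormal("sigma", 5.)
        # Sample a standardized offset, then shift and scale it deterministically;
        # the model is equivalent but its posterior is easier for NUTS to explore.
        theta_offset = pm.Normal("theta_offset", 0., 1., shape=8)
        theta = pm.Deterministic("theta", mu + sigma * theta_offset)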

Python, getting a centered average with a catch

So, my assignment is to get the centered average of a list, much like a few other posts on here, such as this one (https://codereview.stackexchange.com/questions/108404/how-to-get-a-centered-mean-excluding-max-and-min-value-of-a-list-in-python). However, my professor has told us we are not allowed to use min, max, or sort to solve this. What I have right now is below; it is still a work in progress:
def centered_average(nums):
    high = 0
    low = 0
    a = 0
    b = 0
    for i in range(len(nums)):
        if nums[i] > a:
            a = nums[i]
            high = a
    for i in range(len(nums)):
        if nums[i] < b:
            b = nums[i]
            low = b
    total = sum(nums)
    average = (total - high - low) / (len(nums) - 2)
    print(average)
My problem is that I can't get low to be recognized as the lowest number in the list. For example, if I input [1,2,3,4,5], my function should find 5 as the high, 1 as the low, and 3 as the centered average, since 2+3+4 is 9 and 9/3 = 3. However, what I have returns the low as 0. I think it is because of the len(nums), since it would take the first number to be a 0. I'm not sure how I should fix this.
Note: I am still a beginner at this, so I know what I have might not be the best approach and the error could be simple to fix, but I am still learning, so any help and advice would be much appreciated.
The problem is that you're starting the running minimum (and running maximum) at 0.
Start the running minimum at float("inf") (everything is guaranteed to be less than that) and the running maximum at float("-inf") (everything is guaranteed to be greater than that).
Or, start both at the first element of the list (which is either a true minimum/maximum, or there is another element that is lower/higher than it).
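A minimal sketch of the second suggestion, seeding both running extremes from the first element:

    def centered_average(nums):
        high = nums[0]
        low = nums[0]
        for x in nums:
            if x > high:
                high = x
            if x < low:
                low = x
        # Note: under Python 2 this is floor division for integer inputs;
        # wrap the numerator in float(...) if a fractional result is needed.
        return (sum(nums) - high - low) / (len(nums) - 2)

    print(centered_average([1, 2, 3, 4, 5]))  # 3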

Same do-file, same computer, sometimes different results

I've got a large do-file that calls several sub-do-files, all in the lead-up to the estimation of a custom maximum likelihood model. That is, I have a main.do that looks like this:
version 12
set seed 42
do prepare_data
* some other stuff
do estimate_ml
and estimate_ml.do looks like this:
* lots of other stuff
global cdf "normal"
program define customML
    args lnf r noise
    tempvar prob1l prob2l prob1r prob2r y1l y2l y1r y2r euL euR euDiff scale
    quietly {
        generate double `prob1l' = $ML_y2
        generate double `prob2l' = $ML_y3
        generate double `prob1r' = $ML_y4
        generate double `prob2r' = $ML_y5
        generate double `scale' = 1/100
        generate double `y1l' = `scale'*((($ML_y10+$ML_y6)^(1-`r'))/(1-`r'))
        generate double `y2l' = `scale'*((($ML_y10+$ML_y7)^(1-`r'))/(1-`r'))
        generate double `y1r' = `scale'*((($ML_y10+$ML_y8)^(1-`r'))/(1-`r'))
        generate double `y2r' = `scale'*((($ML_y10+$ML_y9)^(1-`r'))/(1-`r'))
        generate double `euL' = (`prob1l'*`y1l')+(`prob2l'*`y2l')
        generate double `euR' = (`prob1r'*`y1r')+(`prob2r'*`y2r')
        generate double `euDiff' = (`euR'-`euL')/`noise'
        replace `lnf' = ln($cdf( `euDiff')) if $ML_y1==1
        replace `lnf' = ln($cdf(-`euDiff')) if $ML_y1==0
    }
end
ml model lf customML ... , maximize technique(nr) difficult cluster(id)
ml display
To my great surprise, when I run the whole thing from top to bottom in Stata 12/SE I get different results for one of the coefficients reported by ml display each time I run it.
At first I thought this was a problem of running the same code on different computers, but the issue occurs even if I run the same code on the same machine multiple times. Then I thought this was a random number generator issue but, as you can see, I can reproduce the issue even if I fix the seed at the beginning of the main do-file. The same holds when I move the set seed command to immediately above the ml model.... The only way to get the same results through multiple runs is to run everything above ml model once and then only run ml model and ml display repeatedly.
I know that the likelihood function is very flat in the direction of the parameter whose value changes across runs, so it is no surprise that it can change. But I don't understand why it would, given that there seems to be little in my do-files that isn't deterministic to begin with, and nothing that couldn't be made deterministic by fixing the seed.
I suspect a problem with sorting. The default behaviour is that if two observations have the same value, they will be sorted randomly. Moreover, the random process that guides this sorting is governed by a different seed. This is intentional, as it prevents users from accidentally seeing consistency where none exists; the logic is that it is better to be puzzled than to be overly confident.
As someone mentioned in the comments to this answer, adding the stable option to my sort command (i.e. sort ..., stable) made the difference in my situation.

C++/CLI: How to store a floating-point number with more than 15 digits?

For a school project, I have a simple program that compares 20x20 photos. I put in 20 photos, then a 21st photo, which is compared to the existing 20, and the program reports which photo I inserted (or which one is most similar). The thing is, my teacher wanted me to use the nearest neighbour algorithm, so I am computing the distance to every photo. I have everything working, but if the photos are too similar, I have a problem saying which one is closer to mine. For example, I get these distances with 2 different photos (well, they are ALMOST the same):
0 distance: 1353.07982026191
1 distance: 1353.07982026191
That is already 15 digits, and I am using the double type. I have read that long double is the same. Is there any "easy" way to store numbers with more than 15 digits and do math on them?
I compute the distance using the Euclidean distance.
I just need to be more precise; or is that a limit I probably won't get past here, and I should talk to my teacher about not being able to compare such similar photos?
I think you need this: gmplib.org
There is a guide on how to install the library on that site too.
And here is the article about floats: http://gmplib.org/manual/C_002b_002b-Interface-Floats.html#C_002b_002b-Interface-Floats
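To illustrate the same idea (user-settable precision beyond double) without GMP's C++ interface, a small swapped-in sketch in Python:

    from decimal import Decimal, getcontext

    getcontext().prec = 30  # 30 significant digits instead of double's ~15-17
    d = Decimal("1353.07982026191")
    print(d * d)  # product carried to 30 significant digits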
Maybe you could use an algebraic approach.
Let us assume that you are trying to determine whether vector x is closer to a or to b. What you need to calculate is the sign of
d^2(x, a) - d^2(x, b)
which becomes (I'll omit some passages for brevity)
sum_i [ (x_i - a_i)^2 - (x_i - b_i)^2 ]
and then
sum_i (b_i - a_i) * (2*x_i - a_i - b_i)
which only contains differences between values that should be very similar. Summing over such small values should yield better precision than working on the aggregates.
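As a sketch of that computation (in Python for illustration; the vectors are assumed to be flat, equal-length lists of pixel values):

    import math

    def closer_to_a(x, a, b):
        # Sign of d^2(x, a) - d^2(x, b), accumulated term by term using
        # (x_i - a_i)^2 - (x_i - b_i)^2 == (b_i - a_i) * (2*x_i - a_i - b_i).
        terms = [(bi - ai) * (2 * xi - ai - bi) for xi, ai, bi in zip(x, a, b)]
        # math.fsum adds the many small terms with extended intermediate precision.
        return math.fsum(terms) < 0  # negative => x is closer to a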