Check if pyomo model and generated LP file format is valid and catch error/exception - linear-programming

I have a pyomo ConcreteModel() which I solve repeatedly within another stochastic optimization process, changing one or more parameters on the model each time.
The basic process can be described as follows:
# model is created as a pyomo.ConcreteModel()
for i in range(0, 10):
    # change some parameter on the model
    opt = SolverFactory('gurobi', solver_io='lp')
    # how can I check here if the changed model/lp-file is valid?
    results = opt.solve(model)
In some cases I now get an error where the model and LP file (see gist) seem to contain NaN values:
ERROR: Solver (gurobi) returned non-zero return code (1)
ERROR: Solver log: Academic license - for non-commercial use only Error
reading LP format file /tmp/tmp8agg07az.pyomo.lp at line 1453 Unrecognized
constraint RHS or sense Neighboring tokens: " <= nan c_u_x1371_: +1 x434
<= nan "
Unable to read file Traceback (most recent call last):
File "<stdin>", line 5, in <module> File
"/home/cord/.anaconda3/lib/python3.6/site-
packages/pyomo/solvers/plugins/solvers/GUROBI_RUN.py", line 61, in
gurobi_run
model = read(model_file)
File "gurobi.pxi", line 2652, in gurobipy.read
(../../src/python/gurobipy.c:127968) File "gurobi.pxi", line 72, in
gurobipy.gurobi.read (../../src/python/gurobipy.c:125753)
gurobipy.GurobiError: Unable to read model Freed default Gurobi
environment
Of course, the first idea would be to prevent setting these NaN-values. But I don't know why they occur anyhow and want to figure out when the model breaks due to a wrong structure caused by NaNs.
I know that I can catch the solver status and termination criterion from the SolverFactory() object. But the error obviously occurs somewhere before the solving process due to the invalid changed values.
How can I catch these kinds of errors for different solvers before solving, i.e. check whether the model/lp-file is valid before applying a solver? Is there some method, e.g. check_model(), which returns True or False depending on whether the model is valid, or something similar?
Thanks in advance!

If you know that the error is taking place when the parameter values are being changed, then you could test to see whether the sum of all relevant parameter values is a valid number. After all, NaN + 3 = NaN.
Since you are getting NaN, I am going to guess that you are importing parameter values using Pandas from an Excel spreadsheet? There is a way to convert all the NaNs to a default number.
Code example for parameter check:
>>> from pyomo.environ import *
>>> m = ConcreteModel()
>>> m.p1 = Param(initialize=1)
>>> m.p2 = Param(initialize=2)
>>> for p in m.component_data_objects(ctype=Param):
...     print(p.name)
...
p1
p2
>>> import numpy
>>> m.p3 = Param(initialize=numpy.nan)
>>> import math
>>> math.isnan(value(sum(m.component_data_objects(ctype=Param))))
True
Indexed, Mutable Parameters:
>>> from pyomo.environ import *
>>> m = ConcreteModel()
>>> m.i = RangeSet(2)
>>> m.p = Param(m.i, initialize={1: 1, 2:2}, mutable=True)
>>> import math
>>> import numpy
>>> math.isnan(value(sum(m.component_data_objects(ctype=Param))))
False
>>> m.p[1] = numpy.nan
>>> math.isnan(value(sum(m.component_data_objects(ctype=Param))))
True
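The sum test above tells you that some value is NaN, but not which one. As a minimal sketch, assuming the new parameter values are available as a plain dict before they are written to the model (the helper name find_nan_keys and the sample data are hypothetical, not part of Pyomo), you could report the offending keys directly:

```python
import math

def find_nan_keys(values):
    """Return the keys of `values` whose entry is a float NaN (hypothetical helper)."""
    return [key for key, val in values.items()
            if isinstance(val, float) and math.isnan(val)]

# Hypothetical parameter update for one iteration of the outer loop.
new_values = {1: 1.0, 2: float("nan"), 3: 2.5}
bad_keys = find_nan_keys(new_values)

if bad_keys:
    print("Skipping solve, NaN for parameter indices:", bad_keys)
```

Running the check before opt.solve(model) lets you skip or log the iteration instead of waiting for the LP reader to fail.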

Is a GeoDjango MultiPolygonField supposed to accept a Polygon geometry?

When I try to set a Polygon on a MultiPolygonField the following exception is raised:
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/Users/mattrowbum/.virtualenvs/my_env/lib/python3.7/site-packages/django/contrib/gis/db/models/proxy.py", line 75, in __set__
instance.__class__.__name__, gtype, type(value)))
TypeError: Cannot set Location SpatialProxy (MULTIPOLYGON) with value of type: <class 'django.contrib.gis.geos.polygon.Polygon'>
That would be understandable, except that a note in the GeoDjango tutorial states:
...a GeoDjango MultiPolygonField will accept a Polygon geometry.
I've had a look at the source of proxy.py, and it is checking whether the value (a Polygon) is an instance of the relevant geometry class (a MultiPolygon). I've tried doing this check manually which confirms that a Polygon does not inherit from MultiPolygon:
>>> from django.contrib.gis.geos import MultiPolygon, Polygon
>>> ext_coords = ((0, 0), (0, 1), (1, 1), (1, 0), (0, 0))
>>> int_coords = ((0.4, 0.4), (0.4, 0.6), (0.6, 0.6), (0.6, 0.4), (0.4, 0.4))
>>> poly = Polygon(ext_coords, int_coords)
>>> isinstance(poly, Polygon)
True
>>> isinstance(poly, MultiPolygon)
False
This came to my attention when attempting to simplify an existing stored MultiPolygon value. Below is my model:
from django.contrib.gis.db import models

class Location(models.Model):
    name = models.CharField(max_length=180)
    mpoly = models.MultiPolygonField(geography=True, blank=True, null=True)
The process I used to simplify the MultiPolygon is below. The last line causes the exception to be raised:
>>> from my_app.models import Location
>>> location = Location.objects.get(pk=1)
>>> geom = location.mpoly
>>> simplified_geom = geom.simplify(0.0002)
>>> location.mpoly = simplified_geom
If I use the Polygon to create a MultiPolygon, it works fine:
>>> multi = MultiPolygon([simplified_geom,])
>>> location.mpoly = multi
Is the note in the tutorial misleading or am I doing something wrong?
EDIT: Further tests on the geometries.
The original MultiPolygon straight from the model field:
>>> geom = location.mpoly
>>> type(geom)
<class 'django.contrib.gis.geos.collections.MultiPolygon'>
>>> geom.geom_type
'MultiPolygon'
>>> geom.valid
True
>>> geom.srid
4326
Applying the simplify() method:
>>> simplified_geom = geom.simplify(0.0002)
>>> type(simplified_geom)
<class 'django.contrib.gis.geos.polygon.Polygon'>
>>> simplified_geom.geom_type
'Polygon'
>>> simplified_geom.valid
True
>>> simplified_geom.srid
4326
Creating a MultiPolygon from the simplified geometry:
>>> multi = MultiPolygon([simplified_geom,])
>>> type(multi)
<class 'django.contrib.gis.geos.collections.MultiPolygon'>
>>> multi.geom_type
'MultiPolygon'
>>> multi.valid
True
>>> multi.srid
>>> print(multi.srs)
None
Note that the above MultiPolygon has no SRID. I thought that might have been the reason it was being accepted. I created one with the srid=4326 argument, but it too was accepted by the field.
Here's a really basic example of the problem:
>>> # This works
>>> location.mpoly = MultiPolygon()
>>> # This doesn't
>>> location.mpoly = Polygon()
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/Users/mattrowbum/.virtualenvs/musicteacher/lib/python3.7/site-packages/django/contrib/gis/db/models/proxy.py", line 75, in __set__
instance.__class__.__name__, gtype, type(value)))
TypeError: Cannot set Location SpatialProxy (MULTIPOLYGON) with value of type: <class 'django.contrib.gis.geos.polygon.Polygon'>
EDIT:
Although the MultiPolygon code seems to allow a Polygon to be stored as a MultiPolygon object:
class MultiPolygon(GeometryCollection):
    _allowed = Polygon
    _typeid = 6
the issue does arise exactly as you describe it.
I have tried some workarounds and the only one that I am somewhat satisfied with is to refactor your geom field into a generic GeometryField that can store any type of geometry.
Another option without touching your model would be to convert each Polygon to a MultiPolygon before inserting it to the field:
p = Polygon()
location.mpoly = MultiPolygon(p)
Seems to me that this is worthy of an issue with either a Tutorial update request, or a code fix.
Leaving the previous state of the answer here for comment continuity:
The issue is with the geography=True setting and not with the field, because the MultiPolygonField does accept a Polygon as well as a MultiPolygon:
The geography type provides native support for spatial features represented with geographic coordinates (e.g., WGS84 longitude/latitude). Unlike the plane used by a geometry type, the geography type uses a spherical representation of its data. Distance and measurement operations performed on a geography column automatically employ great circle arc calculations and return linear units. In other words, when ST_Distance is called on two geographies, a value in meters is returned (as opposed to degrees if called on a geometry column in WGS84).
Since you set the field to expect a geography type object, then if you try to pass a non-geography representation of a Polygon, you will get the error in question.

Sympy derivative with a non-symbol

For a project I am working on, I need the derivative of a function with respect to cos(theta), but with Sympy v1.5.1 I get an error message stating that non-symbols cannot be used as the differentiation variable. This was no problem up to Sympy v1.3, but later versions give this error.
>>> l=1
>>> theta = symbols('theta')
>>> eq=diff((cos(theta)**2-1)**l,cos(theta),l)
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/base/data/home/apps/s~sympy-live-
hrd/20200105t193609.423659059328302322/sympy/sympy/core/function.py", line 2446, in diff
return f.diff(*symbols, **kwargs)
File "/base/data/home/apps/s~sympy-live-
hrd/20200105t193609.423659059328302322/sympy/sympy/core/expr.py", line 3352, in diff
return Derivative(self, *symbols, **assumptions)
File "/base/data/home/apps/s~sympy-live-
hrd/20200105t193609.423659059328302322/sympy/sympy/core/function.py", line 1343, in __new__
__)))
ValueError:
Can't calculate derivative wrt cos(theta).
According to the Sympy documentation (https://docs.sympy.org/latest/modules/core.html#sympy.core.function.Derivative) I may be able to solve this using:
>>> from sympy.abc import t
>>> F = Function('F')
>>> U = f(t)
>>> V = U.diff(t)
>>> direct = F(t, U, V).diff(U)
Unfortunately I can't get this to work with this equation in Sympy v1.5.1.
Suggestions/help are much appreciated.
derivative of a function against wrt cos(theta)
Did this really work before in sympy, i.e. were you able to differentiate w.r.t. cos(theta)? This should not work, as differentiation is w.r.t. a symbol. For example, Maple also gives an error:
diff( 1+cos(theta)^2,cos(theta))
Error, invalid input: diff received cos(theta), which is not valid for its 2nd argument
Strange that Mathematica does allow this. But I think this is not good behavior. Maybe that is why sympy no longer allows it.
But you can do this in sympy
from sympy import *
theta,x = symbols('theta x')
eq = (cos(theta)**2-1)**2
result = diff( eq.subs(cos(theta),x) ,x)
result.subs(x,cos(theta))
Which gives
4*(cos(theta)**2 - 1)*cos(theta)
In Mathematica (which allows this)
D[(Cos[theta]^2 - 1)^2, Cos[theta]]
gives
4 Cos[theta] (-1 + Cos[theta]^2)
Perhaps SymPy over-corrected. If the expression has a single generator matching the function of interest then the substitution-equivalent differentiation could take place. Cases which shouldn't (probably) be allowed are (x + 1).diff(cos(x)), sin(x).diff(cos(x)), etc... But (cos(x)**2 - 1).diff(cos(x)) should (probably) be ok. As @Nasser has indicated, a simple substitution/differentiation/backsubstitution will work.
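The substitution/differentiation/backsubstitution recipe can be wrapped in a small helper. This is only a sketch: the name diff_wrt_expr is hypothetical (it is not a SymPy function), and the result is only meaningful when the expression depends on theta solely through the target sub-expression, as in the example from the question.

```python
from sympy import Dummy, cos, diff, simplify, symbols

def diff_wrt_expr(expr, target, n=1):
    """Differentiate expr n times w.r.t. the sub-expression target (hypothetical helper)."""
    u = Dummy("u")                 # fresh symbol that cannot clash with existing ones
    substituted = expr.subs(target, u)
    return diff(substituted, u, n).subs(u, target)

theta = symbols("theta")
expr = (cos(theta)**2 - 1)**2
result = diff_wrt_expr(expr, cos(theta))
print(result)
```

Using a Dummy instead of a named symbol such as x avoids accidentally replacing an x that already appears in the expression.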

TypeError: Expected sequence or array-like, got estimator

I am working on a project that has user reviews on products. I am using TfidfVectorizer to extract features from my dataset apart from some other features that I have extracted manually.
df = pd.read_csv('reviews.csv', header=0)
FEATURES = ['feature1', 'feature2']
reviews = df['review']
reviews = reviews.values.flatten()
vectorizer = TfidfVectorizer(min_df=1, decode_error='ignore', ngram_range=(1, 3), stop_words='english', max_features=45)
X = vectorizer.fit_transform(reviews)
idf = vectorizer.idf_
features = vectorizer.get_feature_names()
FEATURES += features
inverse = vectorizer.inverse_transform(X)
for i, row in df.iterrows():
    for f in features:
        df.set_value(i, f, False)
    for inv in inverse[i]:
        df.set_value(i, inv, True)
train_df, test_df = train_test_split(df, test_size = 0.2, random_state=700)
The above code works fine. But when I change max_features from 45 to anything higher, I get an error on the train_test_split line.
Traceback as follows:
Traceback (most recent call last):
File "analysis.py", line 120, in <module>
train_df, test_df = train_test_split(df, test_size = 0.2, random_state=700)
File "/Users/user/Tools/anaconda/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1906, in train_test_split
arrays = indexable(*arrays)
File "/Users/user/Tools/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 201, in indexable
check_consistent_length(*result)
File "/Users/user/Tools/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 173, in check_consistent_length
uniques = np.unique([_num_samples(X) for X in arrays if X is not None])
File "/Users/user/Tools/anaconda/lib/python2.7/site-packages/sklearn/utils/validation.py", line 112, in _num_samples
'estimator %s' % x)
TypeError: Expected sequence or array-like, got estimator
I am not sure what exactly is changing when I increase the max_features size.
Let me know if you need more data or if I have missed something
I know this is old, but I had the same issue and while the answer from @shahins works, I wanted something that would keep the dataframe object so I can have my indexing in the train/test splits.
Solution:
Rename the dataframe column fit as something (anything) else:
df = df.rename(columns = {'fit': 'fit_feature'})
Why it works:
It isn't actually the number of features that is the issue, it is one feature in particular that is causing the problem. I'm guessing you are getting the word "fit" as one of your text features (and it didn't show up with the lower max_features threshold).
Looking at the sklearn source code, it checks to make sure you are not passing an sklearn estimator by testing whether any of your objects has a "fit" attribute. The code is checking for the fit method of an sklearn estimator, but will also raise an exception when you have a fit column in the dataframe (remember df.fit and df['fit'] both select the "fit" column).
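To see why the column name matters, here is a small sketch using only pandas (the two-row frame is made-up illustration data). It shows that a column called fit makes the dataframe look like it has a fit attribute, and that renaming the column removes it:

```python
import pandas as pd

# Made-up data: one text feature happens to be the word "fit".
df = pd.DataFrame({"fit": [True, False], "price": [10, 20]})

# Attribute access falls back to column lookup, so this is True.
has_fit_before = hasattr(df, "fit")

# After renaming the column, the dataframe no longer exposes `fit`.
renamed = df.rename(columns={"fit": "fit_feature"})
has_fit_after = hasattr(renamed, "fit")

print(has_fit_before, has_fit_after)
```

Any check of the form hasattr(obj, "fit") would therefore treat the original dataframe as an estimator, which matches the behaviour described above.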
I had this issue and I tried something like this and it worked for me:
train_test_split(df.as_matrix(), test_size = 0.2, random_state=700)
train_test_split(x.as_matrix(), y.as_matrix(), test_size=0.2, random_state=0)
This worked for me.

Python TypeError: list indices must be integers, not tuple

Using the python 2.7 shell on osx lion. The .csv file has 12 columns by 892 rows.
import csv as csv
import numpy as np
# Open up csv file into a Python object
csv_file_object = csv.reader(open('/Users/scdavis6/Documents/Kaggle/train.csv', 'rb'))
header = csv_file_object.next()
data=[]
for row in csv_file_object:
    data.append(row)
data = np.array(data)
# Convert to float for numerical calculations
number_passengers = np.size(data[0::,0].astype(np.float))
And this is the error I get:
Traceback (most recent call last):
File "<pyshell#5>", line 1, in <module>
number_passengers = np.size(data[0::,0].astype(np.float))
TypeError: list indices must be integers, not tuple
What am I doing wrong?
Don't use csv to read the data into a NumPy array. Use numpy.genfromtxt; using dtype=None will cause genfromtxt to make an intelligent guess at the dtypes for you. By doing it this way you won't have to manually convert strings to floats.
data[0::, 0] just gives you the first column of data.
data[:, 0] would give you the same result.
The error message
TypeError: list indices must be integers, not tuple
suggests that for some reason your data variable might be holding a list rather than an ndarray. For example, the same exception can be produced like this:
In [73]: data = [1,2,3]
In [74]: data[1,2]
TypeError: list indices must be integers, not tuple
I don't know why that is happening, but if you post a sample of your CSV we should be able to help fix that.
Using np.genfromtxt, your current code could be simplified to:
import numpy as np
filename = '/Users/scdavis6/Documents/Kaggle/train.csv'
data = np.genfromtxt(filename, delimiter=',', skip_header=1, dtype=None)
number_passengers = np.size(data, axis=0)
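A self-contained sketch of the same idea, using a tiny in-memory CSV instead of the real train.csv (the three rows and column names are made up, and all columns are numeric so the default float dtype is enough). It also shows that the tuple-style index only works on an ndarray, not on a list of rows:

```python
import io
import numpy as np

# Made-up stand-in for the real file: a header line plus three numeric rows.
csv_text = "id,survived,age\n1,0,22.0\n2,1,38.0\n3,1,26.0\n"

data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
number_passengers = data.shape[0]   # number of data rows
first_column = data[:, 0]           # valid: `data` is a 2-D ndarray

# The same index on a plain list of rows raises the TypeError from the question.
rows = data.tolist()
try:
    rows[:, 0]
    list_error = None
except TypeError as exc:
    list_error = str(exc)

print(number_passengers, first_column, list_error)
```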

Fitting The Theoretical Equation To My Data

I am very, very new to python, so please bear with me, and pardon my naivety. I am using Spyder Python 2.7 on my Windows laptop. As the title suggests, I have some data and a theoretical equation, and I am attempting to fit the equation to my data with what I believe is a chi-squared fit. The theoretical equation I am using is
y(t) = (m / (gamma * D**2)) * ln(cosh(sqrt(gamma / (m * g)) * D * g * t))
import math
import numpy as np
import scipy.optimize as optimize
import matplotlib.pylab as plt
import csv
#with open('1.csv', 'r') as datafile:
# datareader = csv.reader(datafile)
# for row in datareader:
# print ', '.join(row)
t_y_data = np.loadtxt('exerciseball.csv', dtype=float, delimiter=',', usecols=(1,4), skiprows = 1)
print(t_y_data)
t = t_y_data[:,0]
y = t_y_data[:,1]
gamma0 = [.1]
sigma = [(0.345366)/2]*(len(t))
#len(sigma)
#print(sigma)
#print(len(sigma))
#sigma is the error in our measurements, which is the radius of the object
# Dragfunction is the theoretical equation of the position as a function of time when the thing falling experiences a drag force
# This is the function we are trying to fit to our data
# t is the independent variable time, m is the mass, and D is the Diameter
#Gamma is the value of which python will vary, until chi-squared is a minimum
def Dragfunction(x, gamma):
    print x
    g = 9.8
    D = 0.345366
    m = 0.715
    # num = math.sqrt(gamma)*D*g*x
    # den = math.sqrt(m*g)
    # frac = num/den
    # print "frac", frac
    return ((m)/(gamma*D**2))*math.log(math.cosh(math.sqrt(gamma/m*g)*D*g*t))
optimize.curve_fit(Dragfunction, t, y, gamma0, sigma)
This is the error message I am getting:
return ((m)/(gamma*D**2))*math.log(math.cosh(math.sqrt(gamma/m*g)*D*g*t))
TypeError: only length-1 arrays can be converted to Python scalars
My professor and I have spent about three or four hours trying to fix this. He helped me work out a lot of the problems, but this we can't seem to resolve.
Could someone please help? If there is any other information you need, please let me know.
Your error message comes from the fact that those math functions only accept a scalar, so to call functions on an array, use the numpy versions:
In [82]: a = np.array([1,2,3])
In [83]: np.sqrt(a)
Out[83]: array([ 1. , 1.41421356, 1.73205081])
In [84]: math.sqrt(a)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
----> 1 math.sqrt(a)
TypeError: only length-1 arrays can be converted to Python scalars
In the process, I happened to spot a mathematical error in your code. Your equation at top says that g is in the bottom of the square root inside the log(cosh()), but you've got it on the top because a/b*c == a*c/b in python, not a/(b*c)
log(cosh(sqrt(gamma/m*g)*D*g*t))
should instead be any one of these:
log(cosh(sqrt(gamma/m/g)*D*g*t))
log(cosh(sqrt(gamma/(m*g))*D*g*t))
log(cosh(sqrt(gamma*g/m)*D*t)) # the simplest, by canceling with the g from outside sqrt
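A quick numerical check of the precedence point, as a sketch using the constants from the question and an arbitrary made-up time grid. The three corrected forms agree with each other, while the original expression is larger by a factor of g:

```python
import numpy as np

gamma, m, g, D = 0.1, 0.715, 9.8, 0.345366
t = np.linspace(0.0, 1.0, 5)   # arbitrary sample times

# Original: gamma/m*g parses as (gamma/m)*g, so g ends up in the numerator.
original = np.sqrt(gamma / m * g) * D * g * t

# The three corrected spellings of the cosh argument.
form_a = np.sqrt(gamma / m / g) * D * g * t
form_b = np.sqrt(gamma / (m * g)) * D * g * t
form_c = np.sqrt(gamma * g / m) * D * t

print(np.allclose(form_a, form_b), np.allclose(form_a, form_c))
print(np.allclose(original, g * form_a))
```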
A second error is that in your function definition, you have the parameter named x which you never use, but instead you're using t which at this point is a global variable (from your data), so you won't see an error. You won't see an effect using curve_fit since it will pass your t data to the function anyway, but if you tried to call the Dragfunction on a different data set, it would still give you the results from the t values. Probably you meant this:
def Dragfunction(t, gamma):
    print t
    ...
    return ... D*g*t ...
A couple other notes as unsolicited advice, since you said you were new to python:
You can load and "unpack" the t and y variables at once with:
t, y = np.loadtxt('exerciseball.csv', dtype=float, delimiter=',', usecols=(1,4), skiprows = 1, unpack=True)
If your error is constant, then sigma has no effect on curve_fit, as it only affects the relative weighting for the fit, so you really don't need it at all.
Below is my version of your code, with all of the above changes in place.
import numpy as np
from scipy import optimize # simplified syntax
import matplotlib.pyplot as plt # pylab != pyplot
# `unpack` lets you split the columns immediately:
t, y = np.loadtxt('exerciseball.csv', dtype=float, delimiter=',',
                  usecols=(1, 4), skiprows=1, unpack=True)
gamma0 = .1 # does not need to be a list

def Dragfunction(t, gamma):
    g = 9.8
    D = 0.345366
    m = 0.715
    gammaD_m = gamma*D*D/m # combination is used twice, only calculate once for (small) speedup
    return np.log(np.cosh(np.sqrt(gammaD_m*g)*t)) / gammaD_m

gamma_best, gamma_var = optimize.curve_fit(Dragfunction, t, y, gamma0)
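If you want to try the fit without the CSV file, here is a sketch that generates noise-free synthetic data from the same model and checks that curve_fit recovers gamma. The true value 0.3, the time grid, and the positive lower bound on gamma are assumptions made for this illustration only; they are not taken from the original data.

```python
import numpy as np
from scipy import optimize

G, D, M = 9.8, 0.345366, 0.715   # constants from the question

def drag_position(t, gamma):
    """Position of a falling object with quadratic drag, vectorized over t."""
    k = gamma * D * D / M
    return np.log(np.cosh(np.sqrt(k * G) * t)) / k

true_gamma = 0.3                          # assumed value for the synthetic data
t = np.linspace(0.05, 1.5, 40)            # assumed time grid
y = drag_position(t, true_gamma)

# Keep gamma positive so the square root stays real during the search.
popt, pcov = optimize.curve_fit(drag_position, t, y, p0=[0.1],
                                bounds=(1e-9, np.inf))
print(popt[0])
```

With real, noisy measurements the recovered value will of course only approximate the underlying gamma, and the diagonal of pcov gives an estimate of its variance.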