I am trying to draw a sine wave as a beginner Python project. So far everything was going as smoothly as possible for me, but when I started working on drawing lines between points, I started to get the error mentioned below, which hadn't happened before. As I am fairly new to programming in general, I don't understand a) what __init__() is and does, and b) what it means by "arguments" and where I am giving it too many or of the wrong kind (I also don't really know what a class is, so try to keep your answer simple).
As stated before, the program used to work fine under Python 2.7 on Windows 10 64-bit until I added the
l = Line(prex, prey, x, truey)
l.draw(win)
bit.
I have tried putting everything in a separate class, as I have seen it work in a similar example; however, this only led to more problems and confusion, and since it didn't seem to help I have abandoned this approach.
I have tried googling and looking around the internet, but I haven't found anything that satisfied me.
from math import sin
from graphics import *
def main():
    global x #sets up variables and defines start values
    global y
    global truey #used for drawing points
    global prex #used for drawing lines between points
    global prey #used for drawing lines between points
    truey = 1
    x = 1
    y = 1
    prex = 0
    prey = 0
    win = GraphWin("Sin wave", 1000, 500) #creates the window
    for i in range(1, 1000): #repeats the "logic"
        x = x + 2.3 #goes along the x-axis to repeat calculations
        y = sin(x) #calculates y-value which will be used later
        truey = (y * 100) + 100 #calculates actual y value used to draw
        c = Circle(Point(x, truey), 2) #draws a point to represent wave
        l = Line(prex, prey, x, truey) #connects points with lines, WIP
        l.draw(win) #draws the line, WIP
        c.setFill("black") #fills "points"
        c.draw(win) #draws circles
        print truey
    win.getMouse() #keeps window open
    win.close() #closes the window
main()
And the error I get is:
Traceback (most recent call last):
File "C:/Users/Lion/.PyCharmCE2018.2/config/scratches/Sin wave.py", line 31, in <module>
main()
File "C:/Users/Lion/.PyCharmCE2018.2/config/scratches/Sin wave.py", line 23, in main
l = Line(prex, prey, x, truey)
TypeError: __init__() takes exactly 3 arguments (5 given)
Process finished with exit code 1
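Assuming the graphics module here is Zelle's graphics.py (which from graphics import * suggests), the Line constructor expects two Point objects rather than four separate coordinates; that would explain why Python counts 5 arguments (self plus the four numbers) against the expected 3 (self plus two Points). A sketch of a call with the expected argument shape, reusing the names from the code above:

# hypothetical fix sketch, assuming Zelle's graphics.py API
l = Line(Point(prex, prey), Point(x, truey))  # two Point objects -> 3 args incl. self
l.draw(win)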
I have a pyomo ConcreteModel() which I solve repeatedly within another stochastic optimization process, where one or more parameters on the model are changed between solves.
The basic process can be described as follows:
# model is created as a pyomo.ConcreteModel()
for i in range(0, 10):
    # change some parameter on the model
    opt = SolverFactory('gurobi', solver_io='lp')
    # how can I check here if the changed model/lp-file is valid?
    results = opt.solve(model)
Now I get an error for some cases where the model and LP file (see gist) seem to contain NaN values:
ERROR: Solver (gurobi) returned non-zero return code (1)
ERROR: Solver log: Academic license - for non-commercial use only
Error reading LP format file /tmp/tmp8agg07az.pyomo.lp at line 1453
Unrecognized constraint RHS or sense
Neighboring tokens: " <= nan c_u_x1371_: +1 x434 <= nan "
Unable to read file
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
  File "/home/cord/.anaconda3/lib/python3.6/site-packages/pyomo/solvers/plugins/solvers/GUROBI_RUN.py", line 61, in gurobi_run
    model = read(model_file)
  File "gurobi.pxi", line 2652, in gurobipy.read (../../src/python/gurobipy.c:127968)
  File "gurobi.pxi", line 72, in gurobipy.gurobi.read (../../src/python/gurobipy.c:125753)
gurobipy.GurobiError: Unable to read model
Freed default Gurobi environment
Of course, the first idea would be to prevent setting these NaN values in the first place. But I don't yet know why they occur, and I want to figure out when the model breaks due to a wrong structure caused by NaNs.
I know that I can catch the solver status and termination criterion from the SolverFactory() object. But the error obviously occurs somewhere before the solving process due to the invalid changed values.
How can I catch these kinds of errors for different solvers before solving, i.e. check whether the model/LP file is valid before applying a solver? Is there some method, e.g. check_model(), which returns True or False depending on whether the model is valid, or something similar?
Thanks in advance!
If you know that the error is taking place when the parameter values are being changed, then you could test to see whether the sum of all relevant parameter values is a valid number. After all, NaN + 3 = NaN.
Since you are getting NaN, I am going to guess that you are importing parameter values using Pandas from an Excel spreadsheet? There is a way to convert all the NaNs to a default number.
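For example, if the parameter values are read into a pandas DataFrame first, the NaNs can be replaced with a default before they ever reach the model. A minimal sketch (the file name and the default value of 0 are assumptions):

import pandas as pd

df = pd.read_excel('parameters.xlsx')  # hypothetical source file
df = df.fillna(0)                      # replace all NaNs with a default value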
Code example for parameter check:
>>> from pyomo.environ import *
>>> m = ConcreteModel()
>>> m.p1 = Param(initialize=1)
>>> m.p2 = Param(initialize=2)
>>> for p in m.component_data_objects(ctype=Param):
...     print(p.name)
...
p1
p2
>>> import numpy
>>> m.p3 = Param(initialize=numpy.nan)
>>> import math
>>> math.isnan(value(sum(m.component_data_objects(ctype=Param))))
True
Indexed, Mutable Parameters:
>>> from pyomo.environ import *
>>> m = ConcreteModel()
>>> m.i = RangeSet(2)
>>> m.p = Param(m.i, initialize={1: 1, 2:2}, mutable=True)
>>> import math
>>> import numpy
>>> math.isnan(value(sum(m.component_data_objects(ctype=Param))))
False
>>> m.p[1] = numpy.nan
>>> math.isnan(value(sum(m.component_data_objects(ctype=Param))))
True
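Tying this back to the loop in the question, the same NaN test could be run right before each solve. A rough sketch, reusing the model and opt names from the question (an illustration, not a guaranteed check of everything a solver might reject):

import math
from pyomo.environ import Param, value

def params_are_finite(model):
    # False if any Param value in the model is NaN
    return not math.isnan(value(sum(model.component_data_objects(ctype=Param))))

for i in range(0, 10):
    # change some parameter on the model
    if not params_are_finite(model):
        raise ValueError("model contains NaN parameter values, not solving")
    results = opt.solve(model)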
I'm trying to implement a simple logistic regression model trained with my own set of images, but I am getting this error when I try to train the model:
Traceback (most recent call last):
File "main.py", line 26, in <module>
model.entrenar_modelo(sess, training_images, training_labels)
File "/home/jr/Desktop/Dropbox/Machine_Learning/TF/Míos/Hip/model_log_reg.py", line 24, in entrenar_modelo
train_step.run({x: batch_xs, y_: batch_ys})
File "/home/jr/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1267, in run
_run_using_default_session(self, feed_dict, self.graph, session)
File "/home/jr/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2763, in _run_using_default_session
session.run(operation, feed_dict)
File "/home/jr/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 334, in run
np_val = np.array(subfeed_val, dtype=subfeed_t.dtype.as_numpy_dtype)
ValueError: setting an array element with a sequence.
The data I'm feeding to train_step.run({x: batch_xs, y_: batch_ys}) is like this:
batch_xs: list of tensor objects representing images of 100x100 (10,000 long tensors)
batch_ys: list of labels as floats (1.0 or 0.0)
What am I doing wrong?
Edits
It seems the problem was that I had to evaluate the tensors in batch_xs before passing them to train_step.run(...). I thought the run method would take care of that, but I guess I was wrong?
Anyway, so once I did this before calling the function:
for i, x in enumerate(batch_xs):
    batch_xs[i] = x.eval()
    #print batch_xs[i].shape
    #assert all(x.shape == (100, 100, 3) for x in batch_xs)
# Now I can call the function
I had several issues even after doing what is suggested in the answers below. I finally fixed everything by ditching tensors and using numpy arrays.
This particular error is coming out of numpy. Calling np.array on a sequence with inconsistent dimensions can throw it.
>>> np.array([1,2,3,[4,5,6]])
ValueError: setting an array element with a sequence.
It looks like it's failing at the point where tf ensures that all the elements of the feed_dict are numpy.arrays.
Check your feed_dict.
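One way to do that check by hand, reusing the x, y_, batch_xs and batch_ys names from the question (a sketch for inspection, not a fix):

import numpy as np

feed = {x: batch_xs, y_: batch_ys}
for placeholder, val in feed.items():
    arr = np.asarray(val)
    # ragged or non-numeric values show up as dtype=object (or raise an error)
    print(placeholder, arr.shape, arr.dtype)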
The feed_dict argument to Operation.run() (also Session.run() and Tensor.eval()) accepts a dictionary mapping Tensor objects (usually tf.placeholder() tensors) to a numpy array (or objects that can be trivially converted to a numpy array).
In your case, you are passing batch_xs, which is a list of numpy arrays, and TensorFlow does not know how to convert this to a numpy array. Let's say that batch_xs is defined as follows:
batch_xs = [np.random.rand(100, 100),
            np.random.rand(100, 100),
            ...,  # 29 rows omitted.
            np.random.rand(100, 100)]  # len(batch_xs) == 32.
We can convert batch_xs into a 32 x 100 x 100 array using the following:
# Convert each 100 x 100 element to 1 x 100 x 100, then vstack to concatenate.
batch_xs = np.vstack([np.expand_dims(x, 0) for x in batch_xs])
print batch_xs.shape
# ==> (32, 100, 100)
Note that, if batch_ys is a list of floats, this will be transparently converted into a 1-D numpy array by TensorFlow, so you should not need to convert this argument.
EDIT: mdaoust makes a valid point in the comments: If you pass a list of arrays into np.array (and therefore as the value in a feed_dict), it will automatically be vstacked, so there should be no need to convert your input as I suggested. Instead, it sounds like you have a mismatch between the shapes of your list elements. Try adding the following:
assert all(x.shape == (100, 100) for x in batch_xs)
...before the call to train_step.run(), and this should reveal whether you have a mismatch.
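If that assertion fails, printing the distinct shapes makes the offending elements easy to spot; a small follow-up sketch (not from the original answer):

import numpy as np

shapes = {np.asarray(img).shape for img in batch_xs}
print(shapes)  # more than one entry means the batch elements are inconsistent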
I have a code that opens a file, calculates the median value and writes that value to a separate file. Some of the files may be empty, so I wrote the following loop to check if the file is empty and, if so, skip it, increment the count and go back to the loop. It does what is expected for the first empty file it finds, but not the second. The loop is below:
t = 15.2
while t >= 11.4:
    if os.stat(r'C:\Users\Khary\Documents\bin%.2f.txt'%t).st_size > 0:
        print("All good")
        F = r'C:\Users\Documents\bin%.2f.txt'%t
        print(t)
        F = np.loadtxt(F, skiprows=0)
        LogMass = F[:,0]
        LogRed = F[:,1]
        value = np.median(LogMass)
        filesave(*find_nearest(LogMass, LogRed))
        t -= 0.2
    else:
        t -= 0.2
        print("empty file")
The output is as follows
All good
15.2
All good
15.0
All good
14.8
All good
14.600000000000001
All good
14.400000000000002
All good
14.200000000000003
All good
14.000000000000004
All good
13.800000000000004
All good
13.600000000000005
All good
13.400000000000006
empty file
All good
13.000000000000007
Traceback (most recent call last):
File "C:\Users\Documents\Codes\Calculate Bin Median.py", line 35, in <module>
LogMass = F[:,0]
IndexError: too many indices
A second issue is that t somehow goes from one decimal place to 15 decimal places, and the last place seems to keep incrementing; what's with that?
Thanks for any and all help
EDIT
The error IndexError: too many indices only seems to apply to files with only one line, for example...
12.9982324 0.004321374
If I add a second line I no longer get the error. Can someone explain why this is? Thanks
EDIT
I tried a little experiment and it seems numpy does not like extracting a column if the array only has one row.
In [8]: x = np.array([1,3])
In [9]: y=x[:,0]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-9-50e27cf81d21> in <module>()
----> 1 y=x[:,0]
IndexError: too many indices
In [10]: y=x[:,0].shape
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-10-e8108cf30e9a> in <module>()
----> 1 y=x[:,0].shape
IndexError: too many indices
In [11]:
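A one-line file loads as a 1-D array just like x here, so there is no second axis to index. Besides the ndmin=2 fix suggested in the answers below, np.atleast_2d is another way to get a row back; a small sketch:

x2 = np.atleast_2d(x)   # shape (1, 2)
y = x2[:, 0]            # works: array([1])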
You should be using try/except blocks. Something like:
t = 15.2
while t >= 11.4:
    F = r'C:\Users\Documents\bin%.2f.txt'%t
    try:
        F = np.loadtxt(F, skiprows=0)
        LogMass = F[:,0]
        LogRed = F[:,1]
        value = np.median(LogMass)
        filesave(*find_nearest(LogMass, LogRed))
    except IndexError:
        print("bad file: {}".format(F))
    else:
        print("file worked!")
    finally:
        t -= 0.2
Please refer to the official tutorial for more details about exception handling.
The issue with the last digit is due to how floats work: they cannot represent base-10 numbers exactly. This can lead to fun things like:
In [13]: .3 * 3 - .9
Out[13]: -1.1102230246251565e-16
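One way to avoid that drift (a suggestion, not from the original post) is to derive t from an integer counter instead of repeatedly subtracting 0.2:

for i in range(20):                 # 20 steps: 15.2 down to 11.4
    t = round(15.2 - 0.2 * i, 1)    # avoids accumulating floating point error
    # ... process the file for this t ...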
To deal with the one-line file case, add the ndmin parameter to np.loadtxt (review its doc):
np.loadtxt('test.npy',ndmin=2)
# array([[ 1., 2.]])
With the help of a user named ajcr, I found the problem was that ndmin=2 should have been used in numpy.loadtxt() to ensure that the array always has 2 dimensions.
Python uses indentation to define if, while and for blocks.
It doesn't look like your if else statement is fully indented from the while.
I usually use a full 'tab' keyboard key to indent instead of 'spaces'
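For illustration (a schematic, not the original code, with fname standing in for the formatted path), the intended nesting would be:

while t >= 11.4:
    if os.stat(fname).st_size > 0:
        # process the non-empty file
        t -= 0.2
    else:
        print("empty file")
        t -= 0.2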
I am modelling data for a logit model with 34 independent variables, and it keeps throwing the singular matrix error, as below:
Traceback (most recent call last):
File "<pyshell#1116>", line 1, in <module>
test_scores = smf.Logit(m['event'], train_cols,missing='drop').fit()
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/discrete/discrete_model.py", line 1186, in fit
disp=disp, callback=callback, **kwargs)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/discrete/discrete_model.py", line 164, in fit
disp=disp, callback=callback, **kwargs)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/base/model.py", line 357, in fit
hess=hess)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/base/model.py", line 405, in _fit_mle_newton
newparams = oldparams - np.dot(np.linalg.inv(H),
File "/usr/local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 445, in inv
return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
File "/usr/local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 328, in solve
raise LinAlgError, 'Singular matrix'
LinAlgError: Singular matrix
That was when I stumbled upon this method to reduce the matrix to its independent columns:
def independent_columns(A, tol = 0):#1e-05):
    """
    Return an array composed of independent columns of A.
    Note the answer may not be unique; this function returns one of many
    possible answers.
    https://stackoverflow.com/q/13312498/190597 (user1812712)
    http://math.stackexchange.com/a/199132/1140 (Gerry Myerson)
    http://mail.scipy.org/pipermail/numpy-discussion/2008-November/038705.html
    (Anne Archibald)
    >>> A = np.array([(2,4,1,3),(-1,-2,1,0),(0,0,2,2),(3,6,2,5)])
    2 4 1 3
    -1 -2 1 0
    0 0 2 2
    3 6 2 5
    # try with checking the rank of matrices
    >>> independent_columns(A)
    np.array([[1, 4],
              [2, 5],
              [3, 6]])
    """
    Q, R = linalg.qr(A)
    independent = np.where(np.abs(R.diagonal()) > tol)[0]
    #print independent
    return A[:, independent], independent
A,independent_col_indexes=independent_columns(train_cols.as_matrix(columns=None))
#train_cols will not be converted back from a df to a matrix object,so doing this explicitly
A2=pd.DataFrame(A, columns=train_cols.columns[independent_col_indexes])
test_scores = smf.Logit(m['event'],A2,missing='drop').fit()
I still get the LinAlgError, though I was hoping I would have the reduced-rank matrix now.
Also, I see that np.linalg.matrix_rank(train_cols) returns 33 (i.e. before calling the independent_columns function the total number of "x" columns was 34, len(train_cols.ix[0]) = 34, meaning I don't have a full-rank matrix), while np.linalg.matrix_rank(A2) also returns 33 (meaning a column has been dropped), and yet I still see the LinAlgError when I run test_scores = smf.Logit(m['event'], A2, missing='drop').fit(). What am I missing?
Reference for the code above:
How to find degenerate rows/columns in a covariance matrix
I tried building the model forward by introducing one variable at a time, which doesn't give me the singular matrix error, but I would rather have a deterministic method that lets me know what I am doing wrong and how to eliminate these columns.
Edit (updated following the suggestions by @user333700 below):
1. You are right, "A2" doesn't have the reduced rank of 33, i.e. len(A2.ix[0]) = 34, meaning the possibly collinear columns are not dropped. Should I increase the tolerance "tol" so that the rank of A2 (and its number of columns) comes out as 33? If I change tol to 1e-05 above, then I do get len(A2.ix[0]) = 33, which suggests to me that tol > 0 (strictly) is one indicator.
After this I just did the same as before, test_scores = smf.Logit(m['event'], A2, missing='drop').fit(), without 'nm', to get convergence.
2. Errors after trying the 'nm' method. The strange thing is that if I take just 20,000 rows, I do get results. Since it is not showing a MemoryError but "Inverting hessian failed, no bse or cov_params available", I am assuming there are multiple nearly identical records. What would you say?
m = smf.Logit(data['event_custom'].ix[0:1000000] , train_cols.ix[0:1000000],missing='drop')
test_scores=m.fit(start_params=None,method='nm',maxiter=200,full_output=1)
Warning: Maximum number of iterations has been exceeded
Warning (from warnings module):
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/base/model.py", line 374
warn(warndoc, Warning)
Warning: Inverting hessian failed, no bse or cov_params available
test_scores.summary()
Traceback (most recent call last):
File "<pyshell#17>", line 1, in <module>
test_scores.summary()
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/discrete/discrete_model.py", line 2396, in summary
yname_list)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/discrete/discrete_model.py", line 2253, in summary
use_t=False)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/iolib/summary.py", line 826, in add_table_params
use_t=use_t)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/iolib/summary.py", line 447, in summary_params
std_err = results.bse
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/tools/decorators.py", line 95, in __get__
_cachedval = self.fget(obj)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/base/model.py", line 1037, in bse
return np.sqrt(np.diag(self.cov_params()))
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/base/model.py", line 1102, in cov_params
raise ValueError('need covariance of parameters for computing '
ValueError: need covariance of parameters for computing (unnormalized) covariances
Edit 2 (updated following the suggestions by @user333700 below):
Reiterating what I am trying to model: less than about 1% of total users "convert" (success outcomes), so I took a balanced sample of 35 (+ve) / 65 (-ve).
I suspect the model is not robust, though it converges. So I will use "start_params" as the params from an earlier iteration, on a different dataset.
This edit is about confirming whether the "start_params" can feed into the results as below:
A,independent_col_indexes=independent_columns(train_cols.as_matrix(columns=None))
A2=pd.DataFrame(A, columns=train_cols.columns[independent_col_indexes])
m = smf.Logit(data['event_custom'], A2,missing='drop')
#m = smf.Logit(data['event_custom'], train_cols,missing='drop')#,method='nm').fit()  # this didn't work, so tried 'nm', which ran, but ended up using lasso (l1) as 'nm' did not converge
test_scores = m.fit_regularized(start_params=None, method='l1', maxiter='defined_by_method', full_output=1, disp=1, callback=None, alpha=0, \
    trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=0.0001, qc_tol=0.03)
a_good_looking_previous_result.params=test_scores.params #storing the parameters of pass1 to feed into pass2
test_scores.params
bidfloor_Quartile_modified_binned_0 0.305765
connectiontype_binned_0 -0.436798
day_custom_binned_Fri -0.040269
day_custom_binned_Mon 0.138599
day_custom_binned_Sat -0.319997
day_custom_binned_Sun -0.236507
day_custom_binned_Thu -0.058922
user_agent_device_family_binned_iPad -10.793270
user_agent_device_family_binned_iPhone -8.483099
user_agent_masterclass_binned_apple 9.038889
user_agent_masterclass_binned_generic -0.760297
user_agent_masterclass_binned_samsung -0.063522
log_height_width 0.593199
log_height_width_ScreenResolution -0.520836
productivity -1.495373
games 0.706340
entertainment -1.806886
IAB24 2.531467
IAB17 0.650327
IAB14 0.414031
utilities 9.968253
IAB1 1.850786
social_networking -2.814148
IAB3 -9.230780
music 0.019584
IAB9 -0.415559
C(time_day_modified)[(6, 12]]:C(country)[AUS] -0.103003
C(time_day_modified)[(0, 6]]:C(country)[HKG] 0.769272
C(time_day_modified)[(6, 12]]:C(country)[HKG] 0.406882
C(time_day_modified)[(0, 6]]:C(country)[IDN] 0.073306
C(time_day_modified)[(6, 12]]:C(country)[IDN] -0.207568
C(time_day_modified)[(0, 6]]:C(country)[IND] 0.033370
... more params here
Now, on a different dataset (pass2, for indexing), I model the same as below,
i.e. I read a new dataframe, do all the variable transformations and then model via Logit as before.
m_pass2 = smf.Logit(data['event_custom'], A2_pass2,missing='drop')
test_scores_pass2=m_pass2.fit_regularized(start_params=a_good_looking_previous_result.params, method='l1', maxiter='defined_by_method', full_output=1, disp=1, callback=None, alpha=0, \
trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=0.0001, qc_tol=0.03)
and, possibly keep iterating by picking up "start_params" from earlier passes.
Several points to this:
You need tol > 0 to detect near perfect collinearity, which might also cause numerical problems in later calculations.
Check the number of columns of A2 to see whether a column has really been dropped.
Logit needs to do some non-linear calculations with the exog, so even if the design matrix is not very close to perfect collinearity, the transformed variables for the log-likelihood, derivative or Hessian calculations might still end up with numerical problems, like a singular Hessian.
(All these are floating point problems when we work near floating point precision, 1e-15, 1e-16. There are sometimes differences in the default thresholds for matrix_rank and similar linalg functions which can imply that in some edge cases one function identifies it as singular and another one doesn't.)
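For a concrete illustration of how the tolerance matters (an example constructed here, not from the original answer):

import numpy as np

M = np.column_stack([np.ones(5), np.ones(5)])
M[0, 1] += 1e-12                              # one entry perturbed very slightly
print(np.linalg.matrix_rank(M))               # 2 with the default tolerance
print(np.linalg.matrix_rank(M, tol=1e-10))    # 1 with an explicit, looser tolerance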
The default optimization method for the discrete models, including Logit, is a simple Newton method, which is fast in reasonably nice cases but can fail in badly conditioned cases. You could try one of the other optimizers, which are those available in scipy.optimize: method='nm' is usually very robust but slow, and method='bfgs' works well in many cases but can also run into convergence problems.
Nevertheless, even when one of the other optimization methods succeeds, it is still necessary to inspect the results. More often than not, a failure with one method means that the model or estimation problem might not be well defined.
A good way to check whether it is just a problem with bad starting values or a specification problem is to run method='nm' first and then run one of the more accurate methods like newton or bfgs using the nm estimate as starting value, and see whether it succeeds from good starting values.
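A sketch of that two-stage approach, reusing the smf, m['event'] and A2 names from the question (an illustration, not tested on the original data):

# first pass: robust but slow Nelder-Mead, then refine with BFGS from its estimate
model = smf.Logit(m['event'], A2, missing='drop')
res_nm = model.fit(method='nm', maxiter=5000)
res = model.fit(method='bfgs', start_params=res_nm.params)
print(res.summary())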