ValidationException: ValidationException: 400 Bad Request
{u'message': u'Item size has exceeded the maximum allowed size', u'__type': u'com.amazon.coral.validate#ValidationException'}
The item object I have has a size of 92004 bytes:
>>> iii
<boto.dynamodb2.items.Item object at 0x7f7922c97190>
>>> iiip = iii.prepare_full() # it is now in dynamodb format e.g. "Item":{"time":{"N":"300"}, "user":{"S":"self"}}
>>> len(json.dumps(iiip))
92004
>>>
The size I get, 92004 bytes, is well under 400 KB. Why do I see the error above when saving the item?
Any pointers?
EDIT:
I experimented with different data sizes:
>>> i00['Resources'] = "A" * 66848; len(json.dumps(i00))
68481
>>> i = Item(ct.table, data=i00); i.save()
True
>>> i.delete()
True
>>> i00['Resources'] = "A" * 66849; len(json.dumps(i00))
68482
>>> i = Item(ct.table, data=i00); i.save()
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/var/www/virtualenv/ken/local/lib/python2.7/site-packages/boto/dynamodb2/items.py", line 455, in save
returned = self.table._put_item(final_data, expects=expects)
File "/var/www/virtualenv/ken/local/lib/python2.7/site-packages/boto/dynamodb2/table.py", line 835, in _put_item
self.connection.put_item(self.table_name, item_data, **kwargs)
File "/var/www/virtualenv/ken/local/lib/python2.7/site-packages/boto/dynamodb2/layer1.py", line 1510, in put_item
body=json.dumps(params))
File "/var/www/virtualenv/ken/local/lib/python2.7/site-packages/boto/dynamodb2/layer1.py", line 2842, in make_request
retry_handler=self._retry_handler)
File "/var/www/virtualenv/ken/local/lib/python2.7/site-packages/boto/connection.py", line 954, in _mexe
status = retry_handler(response, i, next_sleep)
File "/var/www/virtualenv/ken/local/lib/python2.7/site-packages/boto/dynamodb2/layer1.py", line 2882, in _retry_handler
response.status, response.reason, data)
ValidationException: ValidationException: 400 Bad Request
{u'message': u'Item size has exceeded the maximum allowed size', u'__type': u'com.amazon.coral.validate#ValidationException'}
In other words, the size of the CloudTrail data has to be less than 68482 bytes. I wonder why they claim the limit is 400 KB. Clearly, I am missing something.
Answering my own question, since it might help someone with the same problem.
I contacted AWS technical support, and here is the explanation:
I had 5 indexes on my DynamoDB table. Since the data is replicated for each index, the total data = 68481 * (5 + 1) = 410886 bytes, which is close to 400 KB.
I feel this is missing from the DynamoDB documentation, and it would be nice if Amazon added it.
So, to summarize, the total data (item size) that ends up being saved in the DynamoDB table = actual data * (number of indexes + 1).
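The arithmetic from the support explanation can be written as a quick check. This is only a sketch of the rule of thumb stated above; the function name effective_item_size and the limit expressed as 400 * 1024 bytes are my own assumptions, and DynamoDB's real size accounting may differ.

```python
# Rule of thumb taken from the support explanation above, not an official formula.
ITEM_LIMIT_BYTES = 400 * 1024  # assumption: 400 KB expressed in bytes


def effective_item_size(actual_bytes, num_indexes):
    """Return actual size * (number of indexes + 1)."""
    return actual_bytes * (num_indexes + 1)


size = effective_item_size(68481, 5)
print(size)                     # the figure quoted in the answer
print(size > ITEM_LIMIT_BYTES)  # whether it is past the assumed limit
```

With 5 indexes, an item measured at 92004 bytes would count roughly six times over, which is why it is rejected even though it looks far below the limit.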
Can you share your input data, if that is not a problem? Are you trying to insert bulk data using a flat file as input? It looks like DynamoDB is not able to interpret the newlines, or is treating all records as a single record.
I got a similar error, but for the hash key field. I was trying a bulk data load using Hive scripts. I realized that the attributes should be tab-separated, and fixing the input format fixed the error for me.
Try inserting a single record at a time. If you don't get the above error, then it has to do with the format of the data.
I have code that opens a file, calculates the median value, and writes that value to a separate file. Some of the files may be empty, so I wrote the following loop to check if the file is empty and, if so, skip it, step the counter, and go back to the top of the loop. It does what is expected for the first empty file it finds, but not the second. The loop is below:
t = 15.2
while t >= 11.4:
    if os.stat(r'C:\Users\Khary\Documents\bin%.2f.txt' % t).st_size > 0:
        print("All good")
        F = r'C:\Users\Documents\bin%.2f.txt' % t
        print(t)
        F = np.loadtxt(F, skiprows=0)
        LogMass = F[:, 0]
        LogRed = F[:, 1]
        value = np.median(LogMass)
        filesave(*find_nearest(LogMass, LogRed))
        t -= 0.2
    else:
        t -= 0.2
        print("empty file")
The output is as follows
All good
15.2
All good
15.0
All good
14.8
All good
14.600000000000001
All good
14.400000000000002
All good
14.200000000000003
All good
14.000000000000004
All good
13.800000000000004
All good
13.600000000000005
All good
13.400000000000006
empty file
All good
13.000000000000007
Traceback (most recent call last):
File "C:\Users\Documents\Codes\Calculate Bin Median.py", line 35, in <module>
LogMass = F[:,0]
IndexError: too many indices
A second issue is that t somehow goes from one decimal place to 15, and the last digit seems to keep incrementing. What is going on there?
Thanks for any and all help.
EDIT
The error IndexError: too many indices only seems to apply to files with only one line, for example:
12.9982324 0.004321374
If I add a second line, I no longer get the error. Can someone explain why this is? Thanks.
EDIT
I tried a little experiment, and it seems numpy does not like extracting a column if the array has only one row:
In [8]: x = np.array([1,3])
In [9]: y=x[:,0]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-9-50e27cf81d21> in <module>()
----> 1 y=x[:,0]
IndexError: too many indices
In [10]: y=x[:,0].shape
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-10-e8108cf30e9a> in <module>()
----> 1 y=x[:,0].shape
IndexError: too many indices
In [11]:
You should be using try/except blocks. Something like:
t = 15.2
while t >= 11.4:
    # keep the file name separate from the loaded array
    path = r'C:\Users\Documents\bin%.2f.txt' % t
    try:
        F = np.loadtxt(path, skiprows=0)
        LogMass = F[:, 0]
        LogRed = F[:, 1]
        value = np.median(LogMass)
        filesave(*find_nearest(LogMass, LogRed))
    except IndexError:
        # report the file name, not the array that replaced it
        print("bad file: {}".format(path))
    else:
        print("file worked!")
    finally:
        t -= 0.2
Please refer to the official tutorial for more details about exception handling.
The issue with the last digit is due to how floats work: they cannot represent most base-10 fractions exactly. This can lead to fun things like:
In [13]: .3 * 3 - .9
Out[13]: -1.1102230246251565e-16
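One way to keep that drift out of the file names is to derive each value from an integer counter instead of repeatedly subtracting 0.2. The following is a minimal sketch, not part of the original code; the helper name bin_values and the rounding to 2 digits are assumptions for illustration.

```python
def bin_values(start, stop, step, ndigits=2):
    """Yield start, start - step, ... down to stop without accumulating error."""
    count = int(round((start - stop) / step)) + 1
    for i in range(count):
        # Recompute each value from the integer counter, then round it,
        # so rounding errors never build up from one iteration to the next.
        yield round(start - i * step, ndigits)


values = list(bin_values(15.2, 11.4, 0.2))
print(values[:4])
print(values[-1])
```

Each t in the loop could then come from `for t in bin_values(15.2, 11.4, 0.2):` rather than from `t -= 0.2`.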
To handle the one-line file case, add the ndmin parameter to np.loadtxt (see its documentation):
np.loadtxt('test.npy',ndmin=2)
# array([[ 1., 2.]])
With the help of a user named ajcr, I found that the problem was that ndmin=2 should have been passed to numpy.loadtxt() to ensure that the array always has 2 dimensions.
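To see the difference on the one-line file from the question, here is a small sketch that feeds the same line to np.loadtxt through an in-memory buffer (the use of io.StringIO in place of a real file is an assumption made to keep the example self-contained):

```python
import io

import numpy as np

line = u"12.9982324 0.004321374\n"

# Without ndmin, a single row collapses to a 1-D array, so F[:, 0] fails.
flat = np.loadtxt(io.StringIO(line))
print(flat.shape)

# With ndmin=2 the result keeps a row axis, so column slicing works.
F = np.loadtxt(io.StringIO(line), ndmin=2)
print(F.shape)
LogMass = F[:, 0]
LogRed = F[:, 1]
print(LogMass, LogRed)
```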
Python uses indentation to define if, while, and for blocks.
It doesn't look like your if/else statement is fully indented under the while.
I usually use a full Tab key press to indent instead of spaces.
I'm currently learning how to code in Python, following exercise 25 of 'Learn Python the Hard Way'.
I can't complete exercise 25 because of a problem that I can't figure out.
I'm typing into the Python console, but at instruction number 8, ex25.print_last_word(words), I get this error:
>>> ex25.print_last_word(words)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "ex25.py", line 19, in print_last_word
word = words.pop(-1)
NameError: global name 'POP' is not defined
This is my code:
def break_words(stuff):
    """This function will break up words for us, basically
    splitting on the blank spaces between the words"""
    words = stuff.split(' ')
    return words

def sort_words(words):
    '''sort the words, sorts the word??'''
    return sorted(words)

def print_first_word(words):
    '''print the first word after popping it off, i.e. pop(0) finds
    the initial letter of the word..'''
    word = words.pop(0)
    print word

def print_last_word(words):
    '''print the last word after popping it off'''
    word = words.pop(-1)
    print word

def sort_sentence(sentence):
    '''takes in a full sentence and return the sorted words.'''
    words = break_words(sentence)
    words = break_words(words)

def print_first_and_last(sentence):
    '''prints the first and the last words of the sentence.'''
    words = break_words(sentence)
    print_first_word(words)
    print_last_word(words)

def print_first_and_last_sorted(sentence):
    '''Sorts the words then prints the first and last one'''
    word = sort_sentence(sentence)
    print_first_word(words)
    print_last_word(words)
The error raised by the Python interpreter does not match the code you posted, since POP is never mentioned in your code.
The error might be an indication that the interpreter has in memory a different definition for the module ex25 than what is in your text file, ex25.py. You can refresh the definition using
>>> reload(ex25)
Note that you must do this every time you modify ex25.py.
For this reason, you may find it easier to modify ex25.py so that it can be run from the command-line by adding
if __name__ == '__main__':
    words = ...
    print_last_word(words)
to the end of ex25.py, and running the script from the command-line:
python ex25.py
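The stale-module effect behind the reload advice above can be reproduced in isolation. The sketch below is written for Python 3, where the built-in reload() used in this thread is available as importlib.reload; the throwaway module name reload_demo_mod and the helper demo_reload are made up for the demonstration.

```python
import importlib
import os
import sys
import tempfile


def demo_reload():
    """Import a throwaway module, edit it on disk, then reload it."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "reload_demo_mod.py")
    sys.path.insert(0, tmpdir)
    try:
        with open(path, "w") as fh:
            fh.write("def greet():\n    return 'old'\n")
        importlib.invalidate_caches()
        mod = importlib.import_module("reload_demo_mod")
        before = mod.greet()

        # Edit the file; the interpreter still holds the old definition
        # until the module is reloaded.
        with open(path, "w") as fh:
            fh.write("def greet():\n    return 'new version'\n")
        still_old = mod.greet()

        importlib.invalidate_caches()
        mod = importlib.reload(mod)
        after = mod.greet()
        return before, still_old, after
    finally:
        sys.path.remove(tmpdir)
        sys.modules.pop("reload_demo_mod", None)


print(demo_reload())
```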
I am modelling data for a logit model with 34 independent variables, and it keeps throwing the singular matrix error shown below:
Traceback (most recent call last):
File "<pyshell#1116>", line 1, in <module>
test_scores = smf.Logit(m['event'], train_cols,missing='drop').fit()
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/discrete/discrete_model.py", line 1186, in fit
disp=disp, callback=callback, **kwargs)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/discrete/discrete_model.py", line 164, in fit
disp=disp, callback=callback, **kwargs)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/base/model.py", line 357, in fit
hess=hess)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/base/model.py", line 405, in _fit_mle_newton
newparams = oldparams - np.dot(np.linalg.inv(H),
File "/usr/local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 445, in inv
return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
File "/usr/local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 328, in solve
raise LinAlgError, 'Singular matrix'
LinAlgError: Singular matrix
That was when I stumbled on this method to reduce the matrix to its independent columns:
def independent_columns(A, tol = 0):#1e-05):
    """
    Return an array composed of independent columns of A.
    Note the answer may not be unique; this function returns one of many
    possible answers.
    https://stackoverflow.com/q/13312498/190597 (user1812712)
    http://math.stackexchange.com/a/199132/1140 (Gerry Myerson)
    http://mail.scipy.org/pipermail/numpy-discussion/2008-November/038705.html
    (Anne Archibald)
    >>> A = np.array([(2,4,1,3),(-1,-2,1,0),(0,0,2,2),(3,6,2,5)])
    2 4 1 3
    -1 -2 1 0
    0 0 2 2
    3 6 2 5
    # try with checking the rank of matrixs
    >>> independent_columns(A)
    np.array([[1, 4],
    [2, 5],
    [3, 6]])
    """
    Q, R = linalg.qr(A)
    independent = np.where(np.abs(R.diagonal()) > tol)[0]
    #print independent
    return A[:, independent], independent
A,independent_col_indexes=independent_columns(train_cols.as_matrix(columns=None))
#train_cols will not be converted back from a df to a matrix object,so doing this explicitly
A2=pd.DataFrame(A, columns=train_cols.columns[independent_col_indexes])
test_scores = smf.Logit(m['event'],A2,missing='drop').fit()
I still get the LinAlgError, though I was hoping to have a matrix of reduced rank by now.
Also, np.linalg.matrix_rank(train_cols) returns 33. Before calling independent_columns, the total number of "x" columns was 34 (i.e. len(train_cols.ix[0]) = 34), meaning I don't have a full-rank matrix. np.linalg.matrix_rank(A2) also returns 33, meaning I have dropped a column, and yet I still see the LinAlgError when I run test_scores = smf.Logit(m['event'],A2,missing='drop').fit(). What am I missing?
reference to the code above -
How to find degenerate rows/columns in a covariance matrix
I tried building the model forward by introducing one variable at a time, which doesn't give me the singular matrix error, but I would rather have a deterministic method that tells me what I am doing wrong and how to eliminate these columns.
Edit (updated after the suggestions by user333700 below)
1. You are right, "A2" doesn't have the reduced rank of 33, i.e. len(A2.ix[0]) = 34, meaning the possibly collinear columns are not dropped. Should I increase "tol", the tolerance, to bring the rank of A2 (and hence its number of columns) down to 33? If I change tol to 1e-05 above, then I do get len(A2.ix[0]) = 33, which suggests to me that tol > 0 (strictly) is required.
After this I just ran the same test_scores = smf.Logit(m['event'],A2,missing='drop').fit(), without 'nm', to get convergence.
2. Errors after trying the 'nm' method. The strange thing, though, is that if I take just 20,000 rows, I do get results. Since it does not raise a MemoryError but rather "Inverting hessian failed, no bse or cov_params available", I assume there are multiple nearly identical records. What would you say?
m = smf.Logit(data['event_custom'].ix[0:1000000] , train_cols.ix[0:1000000],missing='drop')
test_scores=m.fit(start_params=None,method='nm',maxiter=200,full_output=1)
Warning: Maximum number of iterations has been exceeded
Warning (from warnings module):
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/base/model.py", line 374
warn(warndoc, Warning)
Warning: Inverting hessian failed, no bse or cov_params available
test_scores.summary()
Traceback (most recent call last):
File "<pyshell#17>", line 1, in <module>
test_scores.summary()
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/discrete/discrete_model.py", line 2396, in summary
yname_list)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/discrete/discrete_model.py", line 2253, in summary
use_t=False)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/iolib/summary.py", line 826, in add_table_params
use_t=use_t)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/iolib/summary.py", line 447, in summary_params
std_err = results.bse
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/tools/decorators.py", line 95, in __get__
_cachedval = self.fget(obj)
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/base/model.py", line 1037, in bse
return np.sqrt(np.diag(self.cov_params()))
File "/usr/local/lib/python2.7/site-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/base/model.py", line 1102, in cov_params
raise ValueError('need covariance of parameters for computing '
ValueError: need covariance of parameters for computing (unnormalized) covariances
Edit 2 (updated after the suggestions by user333700 below):
To reiterate what I am trying to model: fewer than about 1% of total users "convert" (success outcomes), so I took a balanced sample of 35 (+ve) / 65 (-ve).
I suspect the model is not robust, though it converges. So I will use the params from an earlier iteration, on a different dataset, as "start_params".
This edit is about confirming whether "start_params" can be fed into the fit as below:
A,independent_col_indexes=independent_columns(train_cols.as_matrix(columns=None))
A2=pd.DataFrame(A, columns=train_cols.columns[independent_col_indexes])
m = smf.Logit(data['event_custom'], A2,missing='drop')
#m = smf.Logit(data['event_custom'], train_cols,missing='drop')#,method='nm').fit()#This doesnt work, so tried 'nm' which work, but used lasso, as nm did not converge.
test_scores=m.fit_regularized(start_params=None, method='l1', maxiter='defined_by_method', full_output=1, disp=1, callback=None, alpha=0, \
trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=0.0001, qc_tol=0.03)
a_good_looking_previous_result.params=test_scores.params #storing the parameters of pass1 to feed into pass2
test_scores.params
bidfloor_Quartile_modified_binned_0 0.305765
connectiontype_binned_0 -0.436798
day_custom_binned_Fri -0.040269
day_custom_binned_Mon 0.138599
day_custom_binned_Sat -0.319997
day_custom_binned_Sun -0.236507
day_custom_binned_Thu -0.058922
user_agent_device_family_binned_iPad -10.793270
user_agent_device_family_binned_iPhone -8.483099
user_agent_masterclass_binned_apple 9.038889
user_agent_masterclass_binned_generic -0.760297
user_agent_masterclass_binned_samsung -0.063522
log_height_width 0.593199
log_height_width_ScreenResolution -0.520836
productivity -1.495373
games 0.706340
entertainment -1.806886
IAB24 2.531467
IAB17 0.650327
IAB14 0.414031
utilities 9.968253
IAB1 1.850786
social_networking -2.814148
IAB3 -9.230780
music 0.019584
IAB9 -0.415559
C(time_day_modified)[(6, 12]]:C(country)[AUS] -0.103003
C(time_day_modified)[(0, 6]]:C(country)[HKG] 0.769272
C(time_day_modified)[(6, 12]]:C(country)[HKG] 0.406882
C(time_day_modified)[(0, 6]]:C(country)[IDN] 0.073306
C(time_day_modified)[(6, 12]]:C(country)[IDN] -0.207568
C(time_day_modified)[(0, 6]]:C(country)[IND] 0.033370
... more params here
Now, on a different dataset (pass2, for indexing), I fit the same model as below,
i.e. I read a new dataframe, do all the variable transformations, and then model via Logit as before.
m_pass2 = smf.Logit(data['event_custom'], A2_pass2,missing='drop')
test_scores_pass2=m_pass2.fit_regularized(start_params=a_good_looking_previous_result.params, method='l1', maxiter='defined_by_method', full_output=1, disp=1, callback=None, alpha=0, \
trim_mode='auto', auto_trim_tol=0.01, size_trim_tol=0.0001, qc_tol=0.03)
and possibly keep iterating by picking up "start_params" from earlier passes.
Several points to this:
You need tol > 0 to detect near perfect collinearity, which might also cause numerical problems in later calculations.
Check the number of columns of A2 to see whether a column has really been dropped.
Logit needs to do some non-linear calculations with the exog, so even if the design matrix is not very close to perfect collinearity, the transformed variables for the log-likelihood, derivative, or Hessian calculations might still run into numerical problems, such as a singular Hessian.
(All these are floating point problems when we work near floating point precision, 1e-15, 1e-16. There are sometimes differences in the default thresholds for matrix_rank and similar linalg functions which can imply that in some edge cases one function identifies it as singular and another one doesn't.)
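As a way to check which columns survive, here is a sketch that is not the QR-based function from the question: it adds columns one at a time and keeps a column only if np.linalg.matrix_rank goes up, so the tolerance is the one matrix_rank applies by default. The name greedy_independent_columns is hypothetical, and the test matrix is the one from the docstring in the question.

```python
import numpy as np


def greedy_independent_columns(A):
    """Keep each column only if adding it raises the numerical rank."""
    A = np.asarray(A, dtype=float)
    kept = []
    rank = 0
    for j in range(A.shape[1]):
        candidate = kept + [j]
        new_rank = np.linalg.matrix_rank(A[:, candidate])
        if new_rank > rank:
            kept = candidate
            rank = new_rank
    return A[:, kept], kept


# Column 1 is twice column 0, and column 3 is column 0 plus column 2.
A = np.array([(2, 4, 1, 3), (-1, -2, 1, 0), (0, 0, 2, 2), (3, 6, 2, 5)])
reduced, kept = greedy_independent_columns(A)
print(kept)
# The reduced matrix should have exactly as many columns as the rank of A.
print(reduced.shape[1] == np.linalg.matrix_rank(A))
```

This costs one rank computation per column, so it is slower than a single decomposition, but the number of kept columns is easy to compare against matrix_rank of the full design matrix.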
The default optimization method for the discrete models, including Logit, is a simple Newton method, which is fast in reasonably nice cases but can fail in cases that are badly conditioned. You could try one of the other optimizers, which are those in scipy.optimize: method='nm' is usually very robust but slow, and method='bfgs' works well in many cases but can also run into convergence problems.
Nevertheless, even when one of the other optimization methods succeeds, it is still necessary to inspect the results. More often than not, a failure with one method means that the model or estimation problem might not be well defined.
A good way to check whether it is just a problem with bad starting values or a specification problem is to run method='nm' first and then run one of the more accurate methods like newton or bfgs using the nm estimate as starting value, and see whether it succeeds from good starting values.
I'm having an issue with this code:
import math

class Money(object):
    def __init__(self, salary):
        self.salary = salary
        sal(self.salary)

    def sal(self, x):
        y = ( x - ( ( (x * 0.22) + 6534) ) - (1900.9408 + ( (x - 37568)*.077) ) )
        print '-----------------------------------------------------------'
        print 'monthly income before tax will be: ${0:.2f}' .format(x/12)
        print 'bi-weekly income before tax will be: ${0:.2f}' .format(x/24)
        print 'Hourly after tax: ${0:.2f}' .format(x/24/70)
        print '-----------------------------------------------------------'
        print 'Income after tax will be: ${0:.2f}' .format(y)
        print 'Monthly after tax: ${0:.2f}' .format((y/12))
        print 'bi-weekly after tax: ${0:.2f}' .format((y/24))
        print 'Hourly after tax: ${0:.2f}' .format(y/24/70)
        answer = raw_input('Do you want to do this again?\nType [Y] or [N]: ')
        if( answer == 'Y'):
            sal(x)
        else:
            print 'Thank you!'
        return

def main():
    x = input('Enter your taxable income: ')
    salaryLister = Money(x)

main()
The traceback shows this:
Traceback (most recent call last):
File "taxableincome.py", line 35, in <module>
main()
File "taxableincome.py", line 33, in main
salaryLister = Money(x)
File "taxableincome.py", line 7, in __init__
sal(self.salary)
NameError: global name 'sal' is not defined
What does:
global name 'sal' is not defined mean?
Feel free to make comments about my design as well. I'd love to learn.
Use self.sal; this is how you call instance methods of a class in Python.
Here is how it works. If you look at the method signature, you have
def sal(self, x)
That is, the method needs the instance as its first argument. When you write self.sal(self.salary), Python translates it to
Money.sal(self, self.salary)
You can also call the method like that, but the recommended way is
self.sal(self.salary)
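A stripped-down sketch of the fix, keeping only the after-tax formula from the question so that it runs without user input (the after_tax attribute is added here for illustration and is not in the original class):

```python
class Money(object):
    def __init__(self, salary):
        self.salary = salary
        # Call the method through the instance, not as a bare name.
        self.after_tax = self.sal(self.salary)

    def sal(self, x):
        """Return income after tax, using the formula from the question."""
        return x - (x * 0.22 + 6534) - (1900.9408 + (x - 37568) * 0.077)


m = Money(50000)
print('Income after tax will be: ${0:.2f}'.format(m.after_tax))
# The two call forms are equivalent:
print(m.sal(50000) == Money.sal(m, 50000))
```

The same applies to the recursive call inside sal in the original code: a bare sal(x) there would also have to be self.sal(x).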
As for comments on your code: there aren't any clear red flags, though the final return statement in the sal function is not required. It's not a problem having it there, just something that caught my eye.
Also, since you asked, I'd like to point this out: please try to stick to a coding standard, whether your own or someone else's. The important thing is consistency, but PEP 8 is the generally accepted style for Python. There are even editor plugins that help you stick to it. You might want to read the PEP 8 style guide.