Unexpected results when checking all values in a Python dictionary are None - python-2.7

Let's consider the following three dictionaries:
topByClass = {'Real Estate': 'VNO', 'Construction': 'TOL', 'Utilities': 'EXC'}
shouldPass = {'Real Estate': None, 'Construction': None, 'Utilities': 'EXC'}
shouldFail = {'Real Estate': None, 'Construction': None, 'Utilities': None}
I am looking to separate instances where all values in the dictionary are None, from everything else. (i.e. the first two should pass, while the last should fail)
I looked around online, particularly at posts such as this one. I tested various solutions out in the python console (running Python 2.7 in a virtualenv on my Mac), and the following worked:
not all(value == None for value in topByClass.values())
Both with and without "not" I can separate dictionaries like 'topByClass' from 'shouldFail'.
>>> not all(value == None for value in shouldFail.values())
>>> False
>>> not all(value == None for value in topByClass.values())
>>> True
(and vice versa for without not)
The thing is, when I go to run the python file, the if statement always evaluates as if every value is None. I have checked if possibly I am mistaking the dictionary, however I print off the dict "topByClass" in the console, and have directly pasted it above. Any ideas what this could be?
Edit:
def _getTopByClass(self, assetClass):
    # Find the instrument with the highest rank.
    ret = None
    highestRank = None
    for instrument in self.__instrumentsByClass[assetClass]:
        rank = self._getRank(instrument)
        if rank is not None and (highestRank is None or rank > highestRank):
            highestRank = rank
            ret = instrument
    return ret
def _getTop(self):
    ret = {}
    for assetClass in self.__instrumentsByClass:
        ret[assetClass] = self._getTopByClass(assetClass)
    return ret
def _rebalance(self):
    topByClass = self._getTop()
    self.info(topByClass) # where I get the output I presented above
    if any(value is not None for value in topByClass.values()):
        self.info("Not All Empty")
    else:
        self.info("All None")
Now, with the above "if", every dictionary prints "Not All Empty".
If you would like to see getRank() or more, I would recommend this example from PyAlgoTrade, as the core mechanics affecting the problem are similar.
Edit 2:
Thought I might mention this in case someone tries to replicate the file linked above... PyAlgoTrade's modules for downloading feeds don't work. So you have to use this package to download the data, and then add the bars from csv:
feed = yahoofeed.Feed()
feed.addBarsFromCSV("SPY", "data/SPY.csv")
for industry, stocks in instrumentsByClass.items():
    for stock in stocks:
        feed.addBarsFromCSV(stock, "data/"+stock+".csv")
Edit 3: Added some debug info:
self.info(isinstance(topByClass, dict))
self.info(isinstance(topByClass.values(), list))
self.info(isinstance(topByClass.values()[0], str))
returns:
>>> True
>>> True
>>> True (False when the first value is None)
Also, per a comment I thought I'd throw this in
self.info(list(topByClass.values()))
>>> [None, None, None, None]
FINAL EDIT:
Many thanks to all the people who responded. I thought I would go ahead and post what I figured out in case anyone runs into a similar problem...
First of all the code/output that identified the problem:
self.info(list(shouldFail.values()))
>>> [None, None, None]
self.info(list(topByClass.values()))
>>> ['VNO', 'TOL', 'EXC']
self.info(list(value is not None for value in topByClass.values()))
>>> [True, True, True]
self.info(any(value is not None for value in topByClass.values()))
>>> <generator object <genexpr> at 0x116094dc0>
I wasn't sure why it returned a generator, then I realized that it was probably using numpy's any() function instead of the builtin, as I had declared:
from numpy import *
After changing this to:
import numpy as np
it behaved as expected.
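For anyone who wants to guard against this kind of shadowing, here is a minimal sketch. It uses Python 3 syntax (in Python 2.7 the module is called __builtin__ rather than builtins), and the helper name all_none is made up for illustration. It calls the builtin explicitly, so a star-import cannot replace it:

```python
import builtins

top_by_class = {'Real Estate': 'VNO', 'Construction': 'TOL', 'Utilities': 'EXC'}
should_pass = {'Real Estate': None, 'Construction': None, 'Utilities': 'EXC'}
should_fail = {'Real Estate': None, 'Construction': None, 'Utilities': None}


def all_none(d):
    """Return True only if every value in d is None."""
    # builtins.all is looked up on the module itself, so a star-import
    # such as `from numpy import *` cannot shadow it.
    return builtins.all(value is None for value in d.values())


for name, d in [('top_by_class', top_by_class),
                ('should_pass', should_pass),
                ('should_fail', should_fail)]:
    print(name, 'all None' if all_none(d) else 'not all empty')
```

Printing any.__module__ in the failing file is another quick check: if it does not name the builtins module, something has replaced the builtin.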

Since you haven't shown us the actual code that is tickling the fail (I understand that might not be possible in a production environment), here is some philosophy about how to debug in a class hierarchy, and one theory about what might be causing this:
Do add a print or logging statement to print/log your instance's values before you test it. Then you can see if it really held the value you believe it did ("When reality collides with a theory, reality wins"). Logging should become your new trusty friend in bug-hunting. Distrust all your assumptions (rubber-duck them). Logging is faster and more reliable than poring over a large class hierarchy.
Beware that there might be an accidental string conversion somewhere up your class hierarchy (possibly from some class someone else wrote, or accidental use of str or repr, e.g. in a constructor, setter or property, or an __init__ or method with an arg default = 'None' instead of None): 'None' != None. This sort of bug is subtle and insidious. If you find it, you will laugh as you cry.
Anyway, happy logging, and please post the logger output when you pinpoint the failing comparison. It's important to track down this sort of 'existential' bug, since it reveals something broken, or a blind spot, in your chain of assumptions or your debugging methodology. That's how you learn.
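To make the 'None' versus None point concrete, here is a small sketch (the helper name describe and the sample dictionary are made up for illustration). Logging the repr and the type, rather than the plain string form, is what makes the two distinguishable:

```python
def describe(value):
    """Format a value so that the string 'None' and the object None differ."""
    return "%r (%s)" % (value, type(value).__name__)


# Hypothetical data: one value was accidentally converted to a string.
suspect = {'Real Estate': 'None', 'Construction': None}

for key in sorted(suspect):
    # str() would print "None" for both values; repr() keeps the quotes.
    print("%s -> %s" % (key, describe(suspect[key])))

print(all(value is None for value in suspect.values()))
```

The last line prints False even though every value "looks like" None when printed with str().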

Related

How to understand this code in terms of building and evaluating a stack?

I was recently trying to solve a challenge on Hackerrank which asked us to figure out whether a string containing brackets (e.g. {}, (), and [] ) was balanced or not (source: https://www.hackerrank.com/challenges/balanced-brackets). I wanted to solve this using the following approach that also integrated the initial format Hackerrank provided:
import sys
def isBalanced(s):
    #insert code here
if __name__ == "__main__":
    t = int(raw_input().strip())
    for a0 in xrange(t):
        s = raw_input().strip()
        result = isBalanced(s)
        print result
I should also note that site has configured the following as being the standard input in order to test the functionality of the code:
3
{[()]}
{[(])}
{{[[(())]]}}
In order to get the following output:
YES
NO
YES
However, I didn't understand how to approach this code, chiefly because I did not understand why Hackerrank used the if __name__ == "__main__": clause, as I thought that this clause was only used if someone wanted their module to be executed directly rather than executed through being imported in another script (source: What does if __name__ == "__main__": do?). I also did not understand the for loop containing for a0 in xrange(t): since a0 is not used within the for loop for anything, so I'm really unsure how the standard input would be processed.
So I ended up looking up the solution on the discussion page, and here it is below:
lefts = '{[('
rights = '}])'
closes = { a:b for a,b in zip(rights,lefts)}
def valid(s):
    stack = []
    for c in s:
        if c in lefts:
            stack.append(c)
        elif c in rights:
            if not stack or stack.pop() != closes[c]:
                return False
    return not stack # stack must be empty at the end
t = int(raw_input().strip())
for a0 in xrange(t):
    s = raw_input().strip()
    if valid(s):
        print 'YES'
    else:
        print 'NO'
This code also confuses me, as the writer claimed to utilize a data structure known as a "stack" (although it seems to be just a Python list to me). And although the writer removed the if __name__ == "__main__": statement, they retained the for a0 in xrange(t): loop, and I'm not sure how it processes the standard input's integer and corresponding strings.
Furthermore, the valid function confuses me because it returns not stack. In a comment on the return statement, the writer also states # stack must be empty at the end. What does that mean? And how could this list be empty if, in the branch if c in lefts:, the current character of the string is appended to the stack? So why would the function return not stack? Wouldn't it be more consistent to return True, so that the function would act as a Boolean tester (i.e. return True if an object meets certain criteria, and False if it does not)?
I am still relatively new to coding so there are a lot of principles I am not familiar with. Can anyone illuminate as to how this would work?
I am not sure how your code works, but I can explain what if __name__ == "__main__": does. Before executing the code, the Python interpreter defines a few special variables. For example, if the interpreter is running that module (the source file) as the main program, it sets the special __name__ variable to the value "__main__". If the file is being imported from another module, __name__ is set to the module's name.
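As for the other part of the question (why valid returns not stack), it may help to watch the list grow and shrink. Below is a sketch, in Python 3 syntax, of the same algorithm with a trace added; the name valid_with_trace and the trace list are additions for illustration only. Every opener is pushed, every closer pops its partner, so a balanced string leaves the list empty, and not [] is True:

```python
LEFTS = '{[('
RIGHTS = '}])'
CLOSES = dict(zip(RIGHTS, LEFTS))  # maps each closer to its opener


def valid_with_trace(s):
    """Return (is_balanced, snapshots of the stack after each character)."""
    stack = []   # a plain list used as a stack: append = push, pop = pop
    trace = []
    for c in s:
        if c in LEFTS:
            stack.append(c)
        elif c in RIGHTS:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != CLOSES[c]:
                return False, trace
        trace.append(''.join(stack))
    # Leftover openers (e.g. "{{") mean the string is unbalanced.
    return not stack, trace


for s in ['{[()]}', '{[(])}', '{{']:
    print(s, valid_with_trace(s))
```

Returning a hard-coded True at the end would wrongly accept a string such as "{{", which never hits the return False branch but still leaves two openers on the stack.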

Elasticsearch "get by index" returns the document, while "match_all" returns no results

I am trying to mock elasticsearch data for hosted CI unit-testing purposes.
I have prepared some fixtures that I can successfully load with bulk(), but then, for unknown reason, I cannot match anything, even though the test_index seemingly contains the data (because I can get() items by their IDs).
The fixtures.json is a subset of ES documents that I fetched from real production index. With real world index, everything works as expected and all tests pass.
An artificial example of the strange behaviour follows:
class MyTestCase(TestCase):
    es = Elasticsearch()
    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        cls.es.indices.create('test_index', SOME_SCHEMA)
        with open('fixtures.json') as fixtures:
            bulk(cls.es, json.load(fixtures))
    @classmethod
    def tearDownClass(cls):
        super().tearDownClass()
        cls.es.indices.delete('test_index')
    def test_something(self):
        # check all documents are there:
        with open('fixtures.json') as fixtures:
            for f in json.load(fixtures):
                print(self.es.get(index='test_index', id=f['_id']))
                # yes they are!
        # BUT:
        match_all = {"query": {"match_all": {}}}
        print('hits:', self.es.search(index='test_index', body=match_all)['hits']['hits'])
        # prints `hits: []` like there was nothing in
        print('count:', self.es.count(index='test_index', body=match_all)['count'])
        # prints `count: 0`
While I can completely understand your pain (everything works except for the tests), the answer is actually quite simple: the tests, in contrast to your experiments, are too quick.
Elasticsearch is a near real-time search engine, which means there
is up to 1s delay between indexing a document and it being
searchable.
There is also unpredictable delay (depending on actual
overhead) between creating an index and it being ready.
So the fix would be time.sleep() to give ES some space to create all the sorcery it needs to give you results. I would do this:
@classmethod
def setUpClass(cls):
    super().setUpClass()
    cls.es.indices.create('test_index', SOME_SCHEMA)
    with open('fixtures.json') as fixtures:
        bulk(cls.es, json.load(fixtures))
    cls.wait_until_index_ready()
@classmethod
def wait_until_index_ready(cls, timeout=10):
    for sec in range(timeout):
        time.sleep(1)
        if cls.es.cluster.health().get('status') in ('green', 'yellow'):
            break
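If waiting a full second per attempt is too slow for a test suite, the same idea can be written as a generic polling helper. This is only a sketch using the standard library; wait_until and its parameters are hypothetical names, and the predicate is whatever readiness check you trust (for instance a lambda comparing the count returned by es.count, as used in the question, with the number of fixtures):

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Call predicate() until it returns a truthy value or timeout expires.

    Returns True if the predicate succeeded, False if time ran out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a real readiness check: becomes true on the third poll.
calls = {'n': 0}

def ready_on_third_call():
    calls['n'] += 1
    return calls['n'] >= 3

print(wait_until(ready_on_third_call, timeout=1.0, interval=0.01))
```

Returning a boolean instead of silently falling through lets setUpClass fail loudly when the index never becomes ready.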
While @jsmesami is correct in his answer, there is a possibly cleaner way of doing this. As you may have noticed, the issue is that ES has not re-indexed yet. The API actually exposes functions for this very purpose.
Try something like,
cls.es.indices.flush(wait_if_ongoing=True)
cls.es.indices.refresh(index='*')
To be more specific, you can pass index='test_index' to both these functions. I think this is a cleaner and more specific way than using sleep(..).

Django test case db giving inconsistent responses, caching or transaction culprit?

I am seeing some really surprising and frustrating behavior with Django testing. Model objects are being "found" by a related lookup, but no model objects exist. (I apologize for the weird description here...the behavior is bizarre enough that I don't know quite how to describe it. Do the objects exist? Do I exist? Do you??)
I need them to exist, so I have a method in place that creates them if they don't exist. The problem is that on one line, Django finds that they do exist, and therefore they are not created...and then on the next line we can confirm that no such objects exist.
My tests are giving Errors in test_something() related to the absence of the necessary TaskMetadata object.
#the model
class TaskMetadata(models.Model):
    task = models.OneToOneField(ContentType)
    ...
#the test
class SimpleTest(TestCase):
    def setUp(self):
        some_utility_function()
    def test_something(self):
        ...something that requires TaskMetadata...
def some_utility_function():
    task = ...whatever...
    ctype = ContentType.objects.get_for_model(task)
    try:
        ctype.taskmetadata
    except TaskMetadata.DoesNotExist:
        ...create TaskMetadata...
        print "Created TaskMetadata object for %s" % task.__name__
    else:
        print "TaskMetadata object already exists for %s" % task.__name__
    print ctype.taskmetadata
    print "ALL OF THEM!! %s" % TaskMetadata.objects.all()
and the printed result of some_utility_function():
TaskMetadata object already exists for SomeTask
some task
ALL OF THEM!! [] # <-- NOTE EMPTY QUERYSET
In summary: "Yes, TaskMetadata object exists. Yes, TaskMetadata object exists. No, there are no TaskMetadata objects at all!!"
So, seriously, what on earth is going on here? Is this a cache problem? I tried clearing the cache (wild guess; I don't have CACHES configured in settings.py)
def setUp(self):
    cache.clear()
    some_utility_function()
Does not help. Transactions maybe? I'm stumped. How do I even debug this?
UPDATE:
See a minimal django project that replicates the issue here.
When the first testcase runs, TaskMetadata.objects.all() is NOT an empty queryset (it is in fact populated with objects, as I would expect); when the second testcase (exactly the same as the first) runs, it is empty.
I suspect this has something to do with database flushing between testcases that is clearing out the TaskMetadata objects, but the related lookup is cached, and so the next time some_utility_function() is called for the next testcase, it doesn't create any TaskMetadata objects. 1) Is that plausible? 2) How to work around it? 3) This is a Django bug, right?
Django bug ticket
In your tearDown method you need to call ContentType.objects.clear_cache(). This is because Django caches calls to ContentType.objects.get_for_model. Having a one-to-one to content type is a bit weird, so I don't think django needs to make any changes for this, especially as it should be a one line fix for you.
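To illustrate why a cached lookup can disagree with the database, here is a toy model in plain Python. It is not Django's implementation: ContentTypeStub, ToyContentTypeManager and the table list are hypothetical stand-ins, and the sketch assumes the related object is remembered on the cached instance. The cached object outlives the flush, so it still reports a related object after the "table" has been emptied:

```python
class ContentTypeStub:
    """Hypothetical stand-in for a ContentType instance."""
    def __init__(self, model_name):
        self.model_name = model_name
        self.taskmetadata = None   # stands in for the cached related object


class ToyContentTypeManager:
    """Hypothetical manager that memoizes get_for_model results."""
    def __init__(self):
        self._cache = {}

    def get_for_model(self, model_name):
        if model_name not in self._cache:
            self._cache[model_name] = ContentTypeStub(model_name)
        return self._cache[model_name]

    def clear_cache(self):
        self._cache.clear()


manager = ToyContentTypeManager()
table = []                                   # plays the TaskMetadata table

# First test case: nothing attached yet, so a row is created.
ctype = manager.get_for_model('SomeTask')
ctype.taskmetadata = 'some task'
table.append('some task')

# The test runner flushes the database between test cases.
table.clear()

# Second test case: the cached instance still carries the old object.
stale = manager.get_for_model('SomeTask')
print(stale.taskmetadata, table)             # some task []

# Clearing the cache, as suggested above, restores consistency.
manager.clear_cache()
fresh = manager.get_for_model('SomeTask')
print(fresh.taskmetadata)                    # None
```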
The problem here is the "finally" clause.
A finally clause is always executed before leaving the try statement, whether an exception has occurred or not.
http://docs.python.org/2/tutorial/errors.html
So, the finally clause containing the print statements will always be executed.

Why does get_FOO_display() return integer value when logging info (django)?

Why does get_FOO_display() return integer value when logging info (django)?
I have a model field that is using a choice to restrict its value. This works fine
and I have it working everywhere within the app, except when logging information,
when the get_FOO_display() method returns the underlying integer value instead
of the human-readable version.
This is the model definition (abridged):
THING_ROLE_MONSTER = 0
THING_ROLE_MUMMY = 1
ROLE_CHOICES = (
    (THING_ROLE_MONSTER, u'Monster'),
    (THING_ROLE_MUMMY, u'Mummy'),
)
# definition of property within model
class Thing(models.Model):
    ...
    role = models.IntegerField(
        'Role',
        default=0,
        choices=ROLE_CHOICES
    )
If I run this within the (django) interactive shell it behaves exactly as you would expect:
>>> from frankenstein.core.models import Thing
>>> thing = Thing()
>>> thing.role = 0
>>> thing.get_role_display()
u'Monster'
However, when I use exactly the same construct within a string formatting / logging
scenario I get the problem:
logger.info('New thing: <b>%s</b>', thing.get_role_display())
returns:
New thing: <b>0</b>
Help!
[UPDATE 1]
When I run the logging within the interactive shell I get the correct output:
>>> from frankenstein.core.models import Thing
>>> import logging
>>> thing = Thing()
>>> thing.role = 0
>>> logging.info('hello %s', thing.get_role_display())
INFO hello Monster
[UPDATE 2] Django internals
Following up on the answer from @joao-oliveira below, I have dug into the internals and uncovered the following.
The underlying _get_FIELD_display method in django.db.models looks like this:
def _get_FIELD_display(self, field):
    value = getattr(self, field.attname)
    return force_unicode(dict(field.flatchoices).get(value, value), strings_only=True)
If I put a breakpoint into the code, and then run ipdb I can see that I have the issue:
ipdb> thing.get_role_display()
u'1'
ipdb> thing._get_FIELD_display(thing._meta.get_field('role'))
u'1'
So, the fix hasn't changed anything. If I then try running through the _get_FIELD_display method code by hand, I get this:
ipdb> fld = thing._meta.get_field('role')
ipdb> fld.flatchoices
[(0, 'Monster'), (1, 'Mummy')]
ipdb> getattr(thing, fld.attname)
u'1'
ipdb> value = getattr(thing, fld.attname)
ipdb> dict(fld.flatchoices).get(value, value)
u'1'
Which is equivalent to saying:
ipdb> {0: 'Monster', 1: 'Mummy'}.get(u'1', u'1')
u'1'
So. The problem we have is that the method is using the string value u'1' to look up the corresponding description in the choices dictionary, but the dictionary keys are integers, and not strings. Hence we never get a match, but instead the default value, which is set to the existing value (the string).
If I manually force the cast to int, the code works as expected:
ipdb> dict(fld.flatchoices).get(int(value), value)
'Mummy'
ipdb> print 'w00t'
This is all great, but doesn't answer my original question as to why the get_foo_display method does return the right value most of the time. At some point the string (u'1') must be cast to the correct data type (1).
[UPDATE 3] The answer
Whilst an honourable mention must go to Joao for his insight, the bounty is going to Josh for pointing out the blunt fact that I am passing in the wrong value to begin with. I put this down to being an emigre from 'strongly-typed-world', where these things can't happen!
The code that I didn't include here is that the object is initialised from a django form, using the cleaned_data from a ChoiceField. The problem with this is that the output from a ChoiceField is a string, not an integer. The bit I missed is that in a loosely-typed language it is possible to set an integer property with a string, and for nothing bad to happen.
Having now looked into this, I see that I should have used the TypedChoiceField, to ensure that the output from cleaned_data is always an integer.
Thank you all.
I'm really sorry if this sounds condescending, but are you 100% sure that you're setting the value to the integer 1 and not the string '1'?
I've gone diving through the internals and running some tests and the only way that the issue you're experiencing makes sense is if you're setting the value to a string. See my simple test here:
>>> from flogger.models import TestUser
>>> t = TestUser()
>>> t.status = 1
>>> t.get_status_display()
u'Admin'
>>> t.status = '1'
>>> t.get_status_display()
u'1'
Examine your view code, or whatever code is actually setting the value, and examine the output of the field directly.
As you pasted from the internal model code:
def _get_FIELD_display(self, field):
    value = getattr(self, field.attname)
    return force_unicode(dict(field.flatchoices).get(value, value), strings_only=True)
It simply gets the current value of the field, and indexes into the dictionary, and returns the value of the attribute if a lookup isn't found.
I'm guessing there were no errors previously, because the value is coerced into an integer before being inserted into the database.
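The lookup behaviour can be reproduced without Django. The sketch below uses plain Python 3; display and display_coerced are hypothetical helpers written for illustration, not Django functions. It shows the fallback to the raw value when the key type does not match, and what coercing first changes:

```python
ROLE_CHOICES = ((0, 'Monster'), (1, 'Mummy'))


def display(value, choices=ROLE_CHOICES):
    """Mimic the lookup: return the label, or the raw value if no key matches."""
    return dict(choices).get(value, value)


def display_coerced(value, choices=ROLE_CHOICES, coerce=int):
    """Same lookup, but coerce the value to the key type first."""
    try:
        key = coerce(value)
    except (TypeError, ValueError):
        return value
    return dict(choices).get(key, value)


print(display(1))            # Mummy
print(display('1'))          # 1  (the string, because '1' != 1 as a dict key)
print(display_coerced('1'))  # Mummy
```

Coercing at the boundary (where the form data enters) keeps the model attribute an integer everywhere else, which is the role TypedChoiceField plays in the accepted fix.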
Edit:
Regarding your update mentioning the type system of python. Firstly, you should be using TypedChoiceField to ensure the form verifies the type that you expect. Secondly, python is a strongly typed language, but the IntegerField does its own coercing with int() when preparing for the database.
Variables are not typed, but the values within them are. I was actually surprised that the IntegerField was coercing the string to an int as well. Good lesson to learn here: check the basics first!
I haven't tried your code or the @like-it answer, sorry, but _get_FIELD_display from models.Model is curried in the fields to set the get_FOO_display function, so that's probably why you're getting that output.
Try calling _get_FIELD_display directly:
logging.info('hello %s', thing._get_FIELD_display(thing._meta.get_field('role')))
try this:
class Thing(models.Model):
    THING_ROLE_MONSTER = 0
    THING_ROLE_MUMMY = 1
    ROLE_CHOICES = (
        (THING_ROLE_MONSTER, u'Monster'),
        (THING_ROLE_MUMMY, u'Mummy'),
    )
    role = models.IntegerField('Role', default=0,choices=ROLE_CHOICES)

Modifying Dictionary in Django Session Does Not Modify Session

I am storing dictionaries in my session referenced by a string key:
>>> request.session['my_dict'] = {'a': 1, 'b': 2, 'c': 3}
The problem I encountered was that when I modified the dictionary directly, the value would not be changed during the next request:
>>> request.session['my_dict'].pop('c')
3
>>> request.session['my_dict'].has_key('c')
False
# looks okay...
...
# Next request
>>> request.session['my_dict'].has_key('c')
True
# what gives!
As the documentation states, another option is to use
SESSION_SAVE_EVERY_REQUEST=True
which will make this happen on every request anyway. It might be worth it if this happens a lot in your code; I'm guessing the occasional additional overhead wouldn't be much, and it is far less than the potential problems from neglecting to include the
request.session.modified = True
line each time.
I apologize for "asking" a question to which I already know the answer, but this was frustrating enough that I thought the answer should be recorded on stackoverflow. If anyone has something to add to my explanation I will award the "answer". I couldn't find the answer by searching based on the problem, but after searching based upon the answer I found that my "problem" is documented behavior. Also turns out another person had this problem.
It turns out that SessionBase is a dictionary-like object that keeps track of when you modify its keys, and sets an attribute modified accordingly (there's also an accessed). If you mess around with objects within those keys, however, SessionBase has no way to know that the objects are modified, and therefore your changes might not get stored in whatever backend you are using. (I'm using a database backend; I presume this problem applies to all backends, though.) This problem might not apply to models, since the backend is probably storing a reference to the model (and therefore would receive any changes when it loaded the model from the database), but the problem does apply to dictionaries (and perhaps any other base python types that must be stored entirely in the session store).
The trick is that whenever you modify objects in the session that the session won't notice, you must explicitly tell the session that it is modified:
>>> request.session.modified = True
Hope this helps someone.
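For readers who want to see the mechanism in isolation, here is a toy model in plain Python 3. TrackingSession is a hypothetical class written for illustration, not Django's SessionBase; it only assumes that the flag is set when a top-level key is assigned:

```python
class TrackingSession(dict):
    """Toy dict that notices top-level assignments only."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.modified = False

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.modified = True


session = TrackingSession()
session['my_dict'] = {'a': 1, 'b': 2, 'c': 3}
session.modified = False             # pretend the request ended and was saved

session['my_dict'].pop('c')          # mutates the inner dict only
flag_after_nested_change = session.modified
print(flag_after_nested_change)      # False: the session never noticed

session['my_dict'] = session['my_dict']   # re-assigning the key is noticed
flag_after_reassign = session.modified
print(flag_after_reassign)           # True
```

Re-assigning the key or setting modified = True by hand both work; the explicit flag states the intent more clearly.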
The way I got around this was to encapsulate any pop actions on the session into a method that takes care of the details (this method also accepts a view parameter so that session variables can be view-specific):
def session_pop(request, view, key, *args, **kwargs):
    """
    Either returns and removes the value of the key from request.session, or,
    if request.session[key] is a list, returns the result of a pop on this
    list.
    Also, if view is not None, only looks within request.session[view.func_name]
    so that I can store view-specific session variables.
    """
    # figure out which dictionary we want to operate on.
    dicto = {}
    if view is None:
        dicto = request.session
    else:
        if request.session.has_key(view.func_name):
            dicto = request.session[view.func_name]
    if dicto.has_key(key):
        # This is redundant if `dicto == request.session`, but rather than
        # duplicate the logic to test whether we popped a list underneath
        # the root level of the session, (which is also determined by `view`)
        # just explicitly set `modified`
        # since we certainly modified the session here.
        request.session.modified = True
        # Return a non-list
        if not type(dicto[key]) == type(list()):
            return dicto.pop(key)
        # pop a list
        else:
            if len(dicto[key]) > 0:
                return dicto[key].pop()
    # Parse out a default from the args/kwargs
    if len(args) > 0:
        default = args[0]
    elif kwargs.has_key('default'):
        default = kwargs['default']
    else:
        # If there wasn't one, complain
        raise KeyError('Session does not have key "{0}" and no default was provided'.format(key))
    return default
I'm not too surprised by this. I guess it's just like modifying the contents of a tuple:
>>> a = (1,[2],3)
>>> print a
(1, [2], 3)
>>> a[1] = 4
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> print a
(1, [2], 3)
>>> a[1][0] = 4
>>> print a
(1, [4], 3)
But thanks anyway.