Python - null object pattern with generators - python-2.7

It is apparently Pythonic to return values that can be treated as 'False' versions of the successful return type, such that if MyIterableObject: do_things() is a simple way to deal with the output whether or not it is actually there.
With generators, bool(MyGenerator) is always True even if it would have a len of 0 or something equally empty. So while I could write something like the following:
result = list(get_generator(*my_variables))
if result:
    do_stuff(result)
It seems like it defeats the benefit of having a generator in the first place.
Perhaps I'm just missing a language feature or something, but what is the pythonic language construct for explicitly indicating that work is not to be done with empty generators?
To be clear, I'd like to be able to give the user some insight as to how much work the script actually did (if any) - contextual snippet as follows:
# Python 2.7
templates = files_from_folder(path_to_folder)
result = list(get_same_sections(templates)) # returns generator
if not result:
    msg("No data to sync.")
    sys.exit()
for data in result:
    for i, tpl in zip(data, templates):
        tpl['sections'][i]['uuid'] = data[-1]
msg("{} sections found to sync up.".format(len(result)))
It works, but I think that ultimately it's a waste to change the generator into a list just to see if there's any work to do, so I assume there's a better way, yes?
EDIT: I get the sense that generators just aren't supposed to be used in this way, but I will add an example to show my reasoning.
There's a semi-popular 'helper function' in Python that you see now and again when you need to traverse a structure like a nested dict or what-have-you. Usually called getnode or getn, whenever I see it, it reads something like this:
def get_node(seq, path):
    for p in path:
        if p in seq:
            seq = seq[p]
        else:
            return ()
    return seq
So in this way, you can make it easier to deal with the results of a complicated path to data in a nested structure without always checking for None or try/except when you're not actually dealing with 'something exceptional'.
mydata = get_node(my_container, ('path', 2, 'some', 'data'))
if mydata: # could also be "for x in mydata", etc
    do_work(mydata)
else:
    something_else()
It's looking less like this kind of syntax would (or could) exist with generators, without writing a class that handles generators in this way as has been suggested.

A generator does not have a length until you've exhausted its iterations.
The only way to find out whether it will produce anything is to start consuming it. The simplest approach is to exhaust it into a list:
items = list(myGenerator)
if items:
    pass  # do something
Unless you write a class with a __nonzero__ method (__bool__ in Python 3) that internally looks at your list of items:
class MyGenerator(object):
    def __init__(self, items):
        self.items = items
    def __iter__(self):
        for i in self.items:
            yield i
    def __nonzero__(self):
        return bool(self.items)
>>> bool(MyGenerator([]))
False
>>> bool(MyGenerator([1]))
True
>>>
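If turning the whole generator into a list feels wasteful, one possible alternative is to pull only the first item and chain it back in front of the rest. This is just a sketch: the helper name peek, the _EMPTY sentinel and the countdown generator are made up for illustration and are not part of the question's code.

```python
import itertools

_EMPTY = object()  # unique sentinel, so None or 0 can still be valid items


def peek(iterable):
    """Return None if `iterable` yields nothing, else an equivalent iterator.

    Only the first item is consumed; it is chained back in front of the rest.
    """
    it = iter(iterable)
    first = next(it, _EMPTY)
    if first is _EMPTY:
        return None
    return itertools.chain([first], it)


def countdown(n):
    # Stand-in generator for the example (hypothetical).
    while n > 0:
        yield n
        n -= 1


empty = peek(countdown(0))   # nothing to do
full = peek(countdown(3))    # still lazy, nothing lost
full_items = list(full) if full is not None else []
```

Since peek returns None for an empty generator, "if result is None" plays the role of "if not result". Note that this only answers "is there anything?"; if you also want to report how many items were processed, you still have to count them while iterating.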

Related

Assign nested function to variable with parameter

disclaimer: My title may not be accurate as far as what I would like to accomplish, but I can update if someone can correct my terminology
I have 2 functions, each with a separate purpose and usable on its own, but occasionally I would like to combine the two to perform both actions at once and return a single result. To do this I would like to assign the combination to a variable name.
I know I can create a 3rd function that does basically what I want, as it is really simple, but it has become a bit of a challenge to myself to find a way of doing it this way.
def str2bool(string):
    return string.lower() in ("yes", "true", "t", "1")
def get_setting(string):
    if string == 'cat':
        return 'yes'
    else:
        return 'no'
VALID_BOOL = str2bool(get_setting)
print VALID_BOOL('cat')
So basically I would like to assign the combination of the 2 functions to a variable that I can call and pass in the string parameter to evaluate
In my real world code, get_setting() would retrieve a user setting and return the value, I would then like to test that value and return it as a boolean
Again, I know I can just create a 3rd function that would get the value and do the quick test, but this is more for learning, to see if it can be done the way I'm trying to do it. So far my different variations of assigning and calling aren't working. Is it even possible, or would it turn out too complex?
Using lambda is easy, but I don't know if it is exactly what you are looking for.
Example:
f = lambda astring : str2bool(get_setting(astring))
Outputs:
>>> f('cat')
True
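If you would rather not write a lambda each time, the same idea can be wrapped in a small reusable helper. This is only a sketch: compose is a name I made up (it is not a Python builtin), and the two functions are copied from the question so the block runs on its own.

```python
def compose(outer, inner):
    """Return a new function that computes outer(inner(*args, **kwargs))."""
    def composed(*args, **kwargs):
        return outer(inner(*args, **kwargs))
    return composed


def str2bool(string):
    return string.lower() in ("yes", "true", "t", "1")


def get_setting(string):
    return 'yes' if string == 'cat' else 'no'


# Pass the functions themselves (no parentheses), not the result of calling them.
valid_bool = compose(str2bool, get_setting)
```

The original attempt, str2bool(get_setting), fails because it calls str2bool right away with a function object as its argument; compose only stores the two functions and calls them later, when valid_bool('cat') is invoked.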

Parse CSV efficiently in python

I am writing a CSV parser which has following structure
class decode:
    def __init__(self):
        self.fd = open('test.csv')
    def decodeoperation(self):
        for row in self.fd:
            cmd = self.decodecmd(row)
            if cmd == 'A':
                self.decodeAopt()
            elif cmd == 'B':
                self.decodeBopt()
    def decodeAopt(self):
        for row in self.fd:
            # decode further dependencies based on cmd A till
            # a condition occurs on any further row
            return
    def decodeBopt(self):
        for row in self.fd:
            # decode further dependencies based on cmd B till
            # a condition occurs on any further row
            return
The current code is working fine for me, but I am not comfortable iterating over the CSV file in every method. Could it be done in a better way?
There is nothing inherently wrong with using a common iterator across multiple methods, as long as you can determine in advance which method to dispatch to at any given point in the sequence (which you are doing by decoding the cmd from the row and getting 'A', 'B', etc.). The design has issues if you have to read several items before you could determine which method to call, and might have to back up if you picked the wrong method and needed to try another. In parsing, this is called backtracking. Since you are passing around a file object, backing up is difficult. Note that your separate decoder methods will have to know when to stop before reading the next row that contains a command, so they will need some sort of terminating sentinel row that they can recognize.
Some general comments on your Python and class design:
You have a nice simple if-elif-elif dispatch table that can translate to a Python dict like this:
# put this code in place of your "if cmd == ... elif elif elif..." code
dispatch = {
    # note - no ()'s, we just want to reference the methods, not call them
    'A': self.decodeAopt,
    'B': self.decodeBopt,
    'C': self.decodeCopt,
    # look how easy it is to add more decoders
}
# lookup which decoder to use for the current cmd
decoder = dispatch[cmd]
# run it
decoder()
# or do it all in one line
dispatch[cmd]()
Instead of having your __init__ method open a file, let it accept an iterator object. This will make it much easier to write tests for your object, since you'll be able to pass simple Python lists containing CSV rows.
class decode:
    def __init__(self, sequence):
        self.fd = sequence
You might want to rename this var from 'fd' to something like 'seq', since it doesn't have to be a file, but could be any iterable that gives you decodable rows.
If you are doing your own CSV parsing, look at using the builtin csv module. It will do quite a bit of work for you, like parsing quoted strings that could contain commas, and can give you easy-to-work-with dicts for each row, given headers read from the input file, or specified by you. If you have modified __init__ as I suggested, you can use it like:
import csv
# assuming test.csv has a header row
reader = csv.DictReader(open('test.csv'))
# or specify headers if not - I encourage you to give these columns better names
reader.fieldnames = ['cmd', 'val1', 'val2', 'val3']
decoder = decode(reader)
decoder.decodeoperation()
Then you can write in decodeoperation:
cmd = row['cmd']
Note that this would impart a slightly different design to your class, that it would expect to be given a sequence of dicts, rather than a sequence of strings.
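Putting these suggestions together, a minimal sketch might look like the following. Everything specific here is an assumption made for illustration: the class and method names, the 'END' sentinel row, and the 'val1' column are hypothetical, since the real file format isn't shown in the question.

```python
class Decoder(object):
    def __init__(self, rows):
        # iter() guarantees that every method advances the same shared iterator.
        self.rows = iter(rows)
        self.decoded = []
        # Dispatch table: bound methods are referenced here, not called.
        self.dispatch = {
            'A': self.decode_a,
            'B': self.decode_b,
        }

    def run(self):
        for row in self.rows:
            handler = self.dispatch.get(row['cmd'])
            if handler is None:
                raise ValueError("unknown command: %r" % (row['cmd'],))
            handler(row)
        return self.decoded

    def _collect_until_end(self):
        # Consume rows from the shared iterator up to the sentinel row.
        values = []
        for row in self.rows:
            if row['cmd'] == 'END':   # hypothetical terminating sentinel
                break
            values.append(row['val1'])
        return values

    def decode_a(self, row):
        self.decoded.append(('A', self._collect_until_end()))

    def decode_b(self, row):
        self.decoded.append(('B', self._collect_until_end()))


# A plain list of dicts stands in for csv.DictReader, which makes testing easy.
rows = [
    {'cmd': 'A'}, {'cmd': 'data', 'val1': '1'}, {'cmd': 'data', 'val1': '2'},
    {'cmd': 'END'},
    {'cmd': 'B'}, {'cmd': 'data', 'val1': '3'}, {'cmd': 'END'},
]
result = Decoder(rows).run()
```

Because the sentinel row is consumed by the sub-decoder itself, the outer loop in run() always resumes on the next command row and no backing up is needed.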

Reference a list of dicts

Python 2.7 on Mint Cinnamon 17.3.
I have a bit of test code employing a list of dicts and despite many hours of frustration, I cannot seem to work out why it is not working as it should do.
blockagedict = {'location': None, 'timestamp': None, 'blocked': None}
blockedlist = [blockagedict]
blockagedict['location'] = 'A'
blockagedict['timestamp'] = '12-Apr-2016 01:01:08.702149'
blockagedict['blocked'] = True
blockagedict['location'] = 'B'
blockagedict['timestamp'] = '12-Apr-2016 01:01:09.312459'
blockagedict['blocked'] = False
blockedlist.append(blockagedict)
for test in blockedlist:
    print test['location'], test['timestamp'], test['blocked']
This always produces the following output and I cannot work out why and cannot see if I have anything wrong with my code. It always prints out the last set of dict values but should print all, if I am not mistaken.
B 12-Apr-2016 01:01:09.312459 False
B 12-Apr-2016 01:01:09.312459 False
I would be happy for someone to show me the error of my ways and put me out of my misery.
It is because the line blockedlist = [blockagedict] actually stores a reference to the dict, not a copy, in the list. Your code effectively creates a list that has two references to the very same object.
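To make the aliasing visible, and to show one possible way around it, here is a small sketch. The names shared, aliased and template are made up for the illustration; the values are the ones from your example.

```python
shared = {'location': None, 'timestamp': None, 'blocked': None}

# Putting the same dict in the list twice stores two references to one object.
aliased = [shared]
aliased.append(shared)
aliased[0]['location'] = 'A'
same_object = aliased[0] is aliased[1]   # the change shows up in both slots

# dict(template) makes a new (shallow) copy for every entry instead.
template = {'location': None, 'timestamp': None, 'blocked': None}
blockedlist = []
for location, timestamp, blocked in [
    ('A', '12-Apr-2016 01:01:08.702149', True),
    ('B', '12-Apr-2016 01:01:09.312459', False),
]:
    entry = dict(template)
    entry.update(location=location, timestamp=timestamp, blocked=blocked)
    blockedlist.append(entry)

locations = [entry['location'] for entry in blockedlist]
```

A shallow copy is enough here because all the values are strings, booleans or None; if the dict held nested lists or dicts you would want copy.deepcopy instead.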
If you care about performance and will have 1 million dictionaries in a list, all with the same keys, you will be better off using a NumPy structured array. Then you can have a single, efficient data structure which is basically a matrix of rows and named columns of appropriate types. You mentioned in a comment that you may know the number of rows in advance. Here's a rewrite of your example code using NumPy instead, which will be massively more efficient than a list of a million dicts.
import numpy as np
dtype = [('location', str, 1), ('timestamp', str, 27), ('blocked', bool)]
count = 2 # will be much larger in the real program
blockages = np.empty(count, dtype) # use zeros() instead if some data may never be populated
blockages[0]['location'] = 'A'
blockages[0]['timestamp'] = '12-Apr-2016 01:01:08.702149'
blockages[0]['blocked'] = True
blockages['location'][1] = 'B' # n.b. indexing works this way too
blockages['timestamp'][1] = '12-Apr-2016 01:01:09.312459'
blockages['blocked'][1] = False
for test in blockages:
    print test['location'], test['timestamp'], test['blocked']
Note that the usage is almost identical. But the storage is in a fixed size, single allocation. This will reduce memory usage and compute time.
As a nice side effect, writing it as above completely sidesteps the issue you originally had, with multiple references to the same row. Now all the data is placed directly into the matrix with no object references at all.
Later in a comment you mention you cannot use NumPy because it may not be installed. Well, we can still avoid unnecessary dicts, like this:
from array import array
blockages = {'location': [], 'timestamp': [], 'blocked': array('B')}
blockages['location'].append('A')
blockages['timestamp'].append('12-Apr-2016 01:01:08.702149')
blockages['blocked'].append(True)
blockages['location'].append('B')
blockages['timestamp'].append('12-Apr-2016 01:01:09.312459')
blockages['blocked'].append(False)
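# name the columns explicitly: a plain dict does not guarantee the order of values()
for location, timestamp, blocked in zip(blockages['location'], blockages['timestamp'], blockages['blocked']):
    print location, timestamp, blocked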
Note I use array here for efficient storage of the fixed-size blocked values (this way each value takes exactly one byte).
You still end up with resizable lists that you could avoid, but at least you don't need to store a dict in every slot of the list. This should still be more efficient.
Ok, I have initialised the list of dicts right off the bat and this seems to work. Although I am tempted to write a class for this.
blockedlist = [{'location': None, 'timestamp': None, 'blocked': None} for k in range(2)]
blockedlist[0]['location'] = 'A'
blockedlist[0]['timestamp'] = '12-Apr-2016 01:01:08.702149'
blockedlist[0]['blocked'] = True
blockedlist[1]['location'] = 'B'
blockedlist[1]['timestamp'] = '12-Apr-2016 01:01:09.312459'
blockedlist[1]['blocked'] = False
for test in blockedlist:
    print test['location'], test['timestamp'], test['blocked']
And this produces what I was looking for:
A 12-Apr-2016 01:01:08.702149 True
B 12-Apr-2016 01:01:09.312459 False
I will be reading from a text file with 1 to 2 million lines, so converting the code to iterate through the lines won't be a problem.

Python: Cleaner ways to initialize

Or maybe I should say, ways to skip having to initialize at all.
I really hate that every time I want to do a simple count variable, I have to say, "hey python, this variable starts at 0." I want to be able to say count += 1 and have it instantly know to start from 0 at the first iteration of the loop. Maybe there's some sort of function I can design to accommodate this? Something like count(1), which adds 1 to a self-created internal count variable that sticks around between iterations of the loop.
I have the same dislike for editing strings/lists into a new string/list.
(Initializing new_string=""/new_list=[] before the loop).
I think list comprehensions may work for some lists.
Does anyone have some pointers for how to solve this problem? I am fairly new, I've only been programming off and on for half a year.
Disclaimer: I do not think that this will make initialization any cleaner. Also, in case you have a typo in some uses of your counter variable, you will not get a NameError but instead it will just silently create and increment a second counter. Remember the Zen of Python:
Explicit is better than implicit.
Having said that, you could create a special class that will automatically add missing attributes and use this class to create and auto-initialize all sorts of counters:
class Counter:
    def __init__(self, default_func=int):
        self.default = default_func
    def __getattr__(self, name):
        if name not in self.__dict__:
            self.__dict__[name] = self.default()
        return self.__dict__[name]
Now you can create a single instance of that class to create an arbitrary number of counters of the same type. Example usage:
>>> c = Counter()
>>> c.foo
0
>>> c.bar += 1
>>> c.bar += 2
>>> c.bar
3
>>> l = Counter(list)
>>> l.blub += [1,2,3]
>>> l.blub
[1, 2, 3]
In fact, this is similar to what collections.defaultdict does, except that you can use dot-notation for accessing the counters, i.e. c.foo instead of c['foo']. Come to think of it, you could even extend defaultdict, making the whole thing much simpler:
import collections
class Counter(collections.defaultdict):
    def __getattr__(self, name):
        return self[name]
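For comparison, here is what the plain standard-library tools look like without any custom class. This is just a sketch with made-up sample data; collections.defaultdict and collections.Counter are both available in Python 2.7 and later.

```python
import collections

words = ["spam", "egg", "spam"]

counts = collections.defaultdict(int)    # missing keys start at int() == 0
groups = collections.defaultdict(list)   # missing keys start at []

for word in words:
    counts[word] += 1                    # no "counts[word] = 0" needed first
    groups[len(word)].append(word)       # no "groups[n] = []" needed first

# For pure counting, collections.Counter does the whole loop in one call.
tally = collections.Counter(words)
```

Be aware that the class named Counter in this answer is unrelated to collections.Counter; picking a different name for your own class avoids confusion if you use both.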
If you are using a counter in a for loop you can use enumerate:
for counter, item in enumerate(my_list):
The counter is the first variable in the statement and goes up by 1 on each iteration of the loop, starting from 0; the second variable is the list item for that iteration. I hope this answers your first question. As for your second, the following code might help:
list_a = ["this", "is"]
list_b = ["a", "test"]
list_a += list_b
print(list_a)
["this", "is", "a", "test"]
The += works for strings as well, because strings are sequences too (although, unlike lists, they are immutable, so += builds a new string each time). Hope this helps!
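Here are a few more ways to avoid the "initialize before the loop" lines from the question, as a rough sketch with made-up sample data:

```python
words = ["this", "is", "a", "test"]

# enumerate supplies the counter; start=1 makes it begin at 1 instead of 0.
numbered = ["%d:%s" % (n, word) for n, word in enumerate(words, start=1)]

# A list comprehension builds the new list without a new_list = [] line.
shouted = [word.upper() for word in words]

# str.join builds the new string without a new_string = "" line.
sentence = " ".join(words)

# sum over a generator expression counts matches without a count = 0 line.
short_words = sum(1 for word in words if len(word) <= 2)
```

None of these remove initialization from the language; they just move it inside a builtin, which is usually what people mean when they ask for a "cleaner" loop.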

Is it possible to use User Defined Attributes to get values at runtime?

What I really would like to do is cache/memoize certain function arguments and results. I understand that in D there are User Defined Attributes, but it appears there's no way to get runtime values with them. Am I mistaken? Is there another similar design pattern I could use here to get similar results?
#memoize("expensiveCalc")
int expensiveCalc(string foo){
///bar
}
So memoize is actually a function that gets called. However, it utilizes the value of my arguments to quickly hash parameters and call the actual function.
Similar to this:
def memoize(iden, time = 0, stale=False, timeout=30):
    def memoize_fn(fn):
        def new_fn(*a, **kw):
            #if the keyword param _update == True, the cache will be
            #overwritten no matter what
            update = kw.pop('_update', False)
            key = make_key(iden, *a, **kw)
            res = None if update else memoizecache.get(key)
            if res is None:
                # okay now go and actually calculate it
                res = fn(*a, **kw)
                memoizecache.set(key, res, time=time)
            return res
        new_fn.memoized_fn = fn
        return new_fn
    return memoize_fn
For what you're trying to do, you'll want a wrapper template rather than a UDA. Phobos actually has one for memoization: http://dlang.org/phobos/std_functional.html#memoize
UDAs in D are used to add information to a function (or other symbol, types and variables too), but they don't actually modify it. The pattern is to have some other code read all the names with reflection, look at the UDAs, and generate the new code that way. If you want to get runtime values from a UDA, you'd write a function that reads it with compile time reflection, then returns the value. Calling that function at runtime gives the UDA there. If you'd like to know more, I can write it up, but I think std.functional.memoize will do what you want here. Remember, UDAs in D add information, they don't change or create code.