Python: Cleaner ways to initialize - python-2.7

Or maybe I should say, ways to skip having to initialize at all.
I really hate that every time I want to do a simple count variable, I have to say, "hey python, this variable starts at 0." I want to be able to say count+=1and have it instantly know to start from 0 at the first iteration of the loop. Maybe there's some sort of function I can design to accomodate this? count(1) that adds 1 to a self-created internal count variable that sticks around between iterations of the loop.
I have the same dislike for editing strings/lists into a new string/list.
(Initializing new_string=""/new_list=[] before the loop).
I think list comprehensions may work for some lists.
Does anyone have some pointers for how to solve this problem? I am fairly new, I've only been programming off and on for half a year.

Disclaimer: I do not think that this will make initialization any cleaner. Also, in case you have a typo in some uses of your counter variable, you will not get a NameError but instead it will just silently create and increment a second counter. Remember the Zen of Python:
Explicit is better than implicit.
Having said that, you could create a special class that will automatically add missing attributes and use this class to create and auto-initialize all sorts of counters:
class Counter:
def __init__(self, default_func=int):
self.default = default_func
def __getattr__(self, name):
if name not in self.__dict__:
self.__dict__[name] = self.default()
return self.__dict__[name]
Now you can create a single instance of that class to create an arbitrary number of counters of the same type. Example usage:
>>> c = Counter()
>>> c.foo
0
>>> c.bar += 1
>>> c.bar += 2
>>> c.bar
3
>>> l = Counter(list)
>>> l.blub += [1,2,3]
>>> l.blub
[1, 2, 3]
In fact, this is similar to what collections.defaultdict does, except that you can use dot-notation for accessing the counters, i.e. c.foo instead of c['foo']. Come to think of it, you could even extend defaultdict, making the whole thing much simpler:
class Counter(collections.defaultdict):
def __getattr__(self, name):
return self[name]

If you are using a counter in a for loop you can use enumerate:
for counter, list_index in enumerate(list):
the counter is the first variable in the statement and 1 is added to it per iteration of the loop, the next variable is the value of that iteration in the list. I hope this answers your first question as for your second, the following code might help
list_a = ["this", "is"]
list_b = ["a", "test"]
list_a += list_b
print(list_a)
["this", "is", "a", "test"]
The += works for strings as well because they are essentially lists aw well. Hope this helps!

Related

How to filter a Django queryset, after it is modified in a loop

I have recently been working with Django, and it has been confusing me a lot (although I also like it).
The problem I am facing right now is when I am looping, and in the loop modifying the queryset, in the next loop the .filter is not working.
So let's take the following simplified example:
I have a dictionary that is made from the queryset like this
animal_dict = {chicken: 6,
cows: 7,
fish: 1,
sheep: 2}
The queryset is called self.animals
for key in dict:
if dict[key] < 3:
remove_animal = max(dict, key=dict.get)
remove = self.animals.filter(animal = remove_animal)[-2:]
self.animals = self.animals.difference(remove)
key[replaced_industry] = key[replaced_industry] - 2
What I am trying to do is as follows: my goal is that there needs to be a balance under the animals. So since there are not enough fish, 2 of the animals with the highest n have to go (cows). And then in the second loop - since there are not enough sheep, 2 of the animals with the highest n have to go again (chicken).
Now the first time it loops (with fish), the .filter does exactly as it should. However, when I loop it a second time (for sheep), the remove = self.animals.filter(animal = remove_animal)[-2:] gives me an output is not in line with animal = filter. When I print the remove in the second loop, it returns a list of all different animals (instead of just 1).
After the loops, the dict should look like this: {chicken: 4,
cows: 5,
fish: 1,
sheep: 2}
This because first cow will go down 2 and it is the max, and then chicken will go down 2, as it is then the max
I am definitely missing some Django logic here, but to me this seems very strange. I hope the question is well-understood, else happy to clarify further.
As others have pointed out, every time you call self.animals.filter you are making a request to the database, and this should not be done in a loop.
It isn't very clear what you are trying to achieve, but it seems like you want the number of each type of animal to be (almost?) the same number, and that the only operation you can perform on it is reducing the number of animals.
Its always better to avoid loops if you can.
If you want them all to be the same number, and you can only reduce the number of animals you have, then the best solution would be
fewest_number = min(self.animals.values())
self.animals = {animal: fewest_number for animal in self.animals.keys()}
If you want, to say, allow a tolerance +1 of the fewest animal
fewest_number = min(self.animals.values()) + 1
self.animals = {animal: fewest_number for animal in self.animals.keys()}
If you can increase the number of each type of animal, then you could find the average:
average_number = sum(self.animals.values()) / len(self.animals)
self.animals = {animal: average_number for animal in self.animals.keys()}
Answering my own question here. Apparently it didn't work because only count(), order_by(), values(), values_list() and slicing of union queryset is allowed. You can't filter on union queryset and the same applies to .difference.
More about this here: Django: Filter a Queryset made of unions not working
Because in the second loop it is made into a union queryset, the filter function simply doesn't work. The weird thing is that this doesn't show an error, but will simply not filter, what makes it hard to detect.

Reference a list of dicts

Python 2.7 on Mint Cinnamon 17.3.
I have a bit of test code employing a list of dicts and despite many hours of frustration, I cannot seem to work out why it is not working as it should do.
blockagedict = {'location': None, 'timestamp': None, 'blocked': None}
blockedlist = [blockagedict]
blockagedict['location'] = 'A'
blockagedict['timestamp'] = '12-Apr-2016 01:01:08.702149'
blockagedict['blocked'] = True
blockagedict['location'] = 'B'
blockagedict['timestamp'] = '12-Apr-2016 01:01:09.312459'
blockagedict['blocked'] = False
blockedlist.append(blockagedict)
for test in blockedlist:
print test['location'], test['timestamp'], test['blocked']
This always produces the following output and I cannot work out why and cannot see if I have anything wrong with my code. It always prints out the last set of dict values but should print all, if I am not mistaken.
B 12-Apr-2016 01:01:09.312459 False
B 12-Apr-2016 01:01:09.312459 False
I would be happy for someone to show me the error of my ways and put me out of my misery.
It is because the line blockedlist = [blockagedict] actually stores a reference to the dict, not a copy, in the list. Your code effectively creates a list that has two references to the very same object.
If you care about performance and will have 1 million dictionaries in a list, all with the same keys, you will be better off using a NumPy structured array. Then you can have a single, efficient data structure which is basically a matrix of rows and named columns of appropriate types. You mentioned in a comment that you may know the number of rows in advance. Here's a rewrite of your example code using NumPy instead, which will be massively more efficient than a list of a million dicts.
import numpy as np
dtype = [('location', str, 1), ('timestamp', str, 27), ('blocked', bool)]
count = 2 # will be much larger in the real program
blockages = np.empty(count, dtype) # use zeros() instead if some data may never be populated
blockages[0]['location'] = 'A'
blockages[0]['timestamp'] = '12-Apr-2016 01:01:08.702149'
blockages[0]['blocked'] = True
blockages['location'][1] = 'B' # n.b. indexing works this way too
blockages['timestamp'][1] = '12-Apr-2016 01:01:09.312459'
blockages['blocked'][1] = False
for test in blockages:
print test['location'], test['timestamp'], test['blocked']
Note that the usage is almost identical. But the storage is in a fixed size, single allocation. This will reduce memory usage and compute time.
As a nice side effect, writing it as above completely sidesteps the issue you originally had, with multiple references to the same row. Now all the data is placed directly into the matrix with no object references at all.
Later in a comment you mention you cannot use NumPy because it may not be installed. Well, we can still avoid unnecessary dicts, like this:
from array import array
blockages = {'location': [], 'timestamp': [], 'blocked': array('B')}
blockages['location'].append('A')
blockages['timestamp'].append('12-Apr-2016 01:01:08.702149')
blockages['blocked'].append(True)
blockages['location'].append('B')
blockages['timestamp'].append('12-Apr-2016 01:01:09.312459')
blockages['blocked'].append(False)
for location, timestamp, blocked in zip(*blockages.values()):
print location, timestamp, blocked
Note I use array here for efficient storage of the fixed-size blocked values (this way each value takes exactly one byte).
You still end up with resizable lists that you could avoid, but at least you don't need to store a dict in every slot of the list. This should still be more efficient.
Ok, I have initialised the list of dicts right off the bat and this seems to work. Although I am tempted to write a class for this.
blockedlist = [{'location': None, 'timestamp': None, 'blocked': None} for k in range(2)]
blockedlist[0]['location'] = 'A'
blockedlist[0]['timestamp'] = '12-Apr-2016 01:01:08.702149'
blockedlist[0]['blocked'] = True
blockedlist[1]['location'] = 'B'
blockedlist[1]['timestamp'] = '12-Apr-2016 01:01:09.312459'
blockedlist[1]['blocked'] = False
for test in blockedlist:
print test['location'], test['timestamp'], test['blocked']
And this produces what I was looking for:
A 12-Apr-2016 01:01:08.702149 True
B 12-Apr-2016 01:01:09.312459 False
I will be reading from a text file with 1 to 2 million lines, so converting the code to iterate through the lines won't be a problem.

How to create a series of variables automatically?

How can I create a series of variables automatically with python?
Like this:
tmp1=1;
tmp2=1;
tmp3=1;
tmp4=1;
tmp5=1;
Look at this SO question. you have several ways to do that:
Use dict or collection instead
Using globals
Using the exec() method
About the first solution (dict or collection) - is not actually what you asked for, which is to create a variable in the global scope.. but I would go with that anytime. I don't see really any reason why I'd need to create variables dynamically instead of using some datatype.
I would say that using both globals and exec() method for this is a bad practice.
Store them in a dictionary
d = {}
value = ['a','b','c']
for key in range(1, 3):
d[key]= value[key]
print d
> {0:'a',1:'b',2:'c'}
print d[0]
> 'a'
(Comments? I am new to python too!)
I think I have got an answer somewhere:
for i in range(100):
locals()['tmp%d'%i]=1
or:
>>> for i in range(1, 10):
... exec 'tmp' + str(i) + '=1'
I don't know if I have a ambiguity describe, the above two is exactly I want.

Python - null object pattern with generators

It is apparently Pythonic to return values that can be treated as 'False' versions of the successful return type, such that if MyIterableObject: do_things() is a simple way to deal with the output whether or not it is actually there.
With generators, bool(MyGenerator) is always True even if it would have a len of 0 or something equally empty. So while I could write something like the following:
result = list(get_generator(*my_variables))
if result:
do_stuff(result)
It seems like it defeats the benefit of having a generator in the first place.
Perhaps I'm just missing a language feature or something, but what is the pythonic language construct for explicitly indicating that work is not to be done with empty generators?
To be clear, I'd like to be able to give the user some insight as to how much work the script actually did (if any) - contextual snippet as follows:
# Python 2.7
templates = files_from_folder(path_to_folder)
result = list(get_same_sections(templates)) # returns generator
if not result:
msg("No data to sync.")
sys.exit()
for data in result:
for i, tpl in zip(data, templates):
tpl['sections'][i]['uuid'] = data[-1]
msg("{} sections found to sync up.".format(len(result)))
It works, but I think that ultimately it's a waste to change the generator into a list just to see if there's any work to do, so I assume there's a better way, yes?
EDIT: I get the sense that generators just aren't supposed to be used in this way, but I will add an example to show my reasoning.
There's a semi-popular 'helper function' in Python that you see now and again when you need to traverse a structure like a nested dict or what-have-you. Usually called getnode or getn, whenever I see it, it reads something like this:
def get_node(seq, path):
for p in path:
if p in seq:
seq = seq[p]
else:
return ()
return seq
So in this way, you can make it easier to deal with the results of a complicated path to data in a nested structure without always checking for None or try/except when you're not actually dealing with 'something exceptional'.
mydata = get_node(my_container, ('path', 2, 'some', 'data'))
if mydata: # could also be "for x in mydata", etc
do_work(mydata)
else:
something_else()
It's looking less like this kind of syntax would (or could) exist with generators, without writing a class that handles generators in this way as has been suggested.
A generator does not have a length until you've exhausted its iterations.
the only way to get whether it's got anything or not, is to exhaust it
items = list(myGenerator)
if items:
# do something
Unless you wrote a class with attribute nonzero that internally looks at your iterations list
class MyGenerator(object):
def __init__(self, items):
self.items = items
def __iter__(self):
for i in self.items:
yield i
def __nonzero__(self):
return bool(self.items)
>>> bool(MyGenerator([]))
False
>>> bool(MyGenerator([1]))
True
>>>

If the set.add() function and list serve for the same purpose in Python?

As what mentioned in the Title, if both of them serve for the same purpose?
Most of the time i will chose to use list, and i don't know when is a better time to use set.add() function.
I try both of them and give me the exact same result...
Personally feel list is better. What do you guys think?
a = set()
a.add('a1')
a.add('a2')
a.add('a3')
for ele in a:
print ele
b = []
b.append('a1')
b.append('a2')
b.append('a3')
for ele in b:
print ele
Please advise...
In terms of general data structures, a set structure tends to allow only one element of each value whereas a list may have more than one of each.
In other words, the pseudo-code set.add(7) executed twice results in the set containing the single element 7 (or an error if it considers adding the same element twice to be invalid).
Using a list instead of a set would result in two elements, both being 7.
For Python specifically, adding duplicates to a set is not an error but it still plainly only allows one of each:
>>> s = set()
>>> s.add(1)
>>> s.add(1)
>>> s.add(2)
>>> s
set([1, 2])
The list on the other hand allows multiples:
>>> l = list()
>>> l.append(1)
>>> l.append(1)
>>> l.append(2)
>>> l
[1, 1, 2]
The reason why you didn't see a difference is simply because you added three unique items to the list and set. In that context, they act the same. Behaviour only diverges when you add duplicate items.