Appending to a "copied" list appends to the original list

I am a newbie to Python, so I find this a little perplexing. In my code, I have a list of strings:
a=['One','Two','Three','Four']
I want to use this list in a function call, later, but need to use a slightly modified list for a function call before that. So, what do I do? I make a copy of a, preserving it presumably.
b=a
b.append('Five')
Now I happily use b, without affecting a, correct? No, it seems that I was wrong, and a gets "infected" with whatever I did to b.
print(a)
['One', 'Two', 'Three', 'Four', 'Five']
This was the cause of a major bug in some analysis code I was writing, and it took me an hour to track down. It suggests that the assignment b=a only makes b point to the same object as a, and does not create a copy (the way I am used to in Fortran 95 and Matlab).
How does one copy a list to another one, while leaving the original unmolested?

It turns out that I need to use the copy() method: https://www.programiz.com/python-programming/methods/list/copy.

In Python, variables are references to objects. When you assign a to b, no value is copied; b simply refers to the same object as a, so both names share one address. You can check it using the id() function like this:
a=['One', 'Two', 'Three', 'Four']
b=a
print(id(a))
print(id(b))
So what you can do is use the .copy() method, which creates a copy of the list and stores it in another variable at a different address.
The changes you make to the copied list will no longer affect the original list, because the two are stored at two different locations.
a = ['One', 'Two', 'Three', 'Four']
b = a.copy()
b.append('five')
print(a)
print(b)
Output:
['One', 'Two', 'Three', 'Four']
['One', 'Two', 'Three', 'Four', 'five']
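A related caveat worth knowing: .copy() makes a shallow copy, so if the list contains nested lists, the inner lists are still shared between the copies. copy.deepcopy() from the standard library copies every level. A small sketch (the nested lists here are made up for illustration):

```python
import copy

a = [['One', 'Two'], ['Three']]
b = a.copy()           # shallow copy: the inner lists are still shared
b[0].append('Five')    # this also changes a[0]
c = copy.deepcopy(a)   # deep copy: independent at every level
c[0].append('Six')     # a is unaffected this time
```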

Related

Changing the name of a list on the fly using a counter

I have a set of lists,
list_0=[a,b,a,b,b,c,f,h................]
list_1=[f,g,c,g,f,a,b,b,b,.............]
list_2=[...............................]
............
list_j=[...............................]
where j is (k-1), with some thousands of values stored in them. I want to count how many times a specific value appears in a specific list. Every single element of those lists can only hold one out of 8 specific values, let's say a,b,c,d,e,f,g,h; so for every list I want to count how many times the value a appears, how many times the value b appears, and so on.
This is not so complicated.
What is complicated, at least for me, is to change on the fly the name of the list.
I tried:
for i in range(k):
    my_list = 'list_' + str(int(k))
    a_sum = exec(my_list.count(a))
    b_sum = exec(my_list.count(b))
    ...
and it doesn't work.
I've read some other answers to similar problems, but I'm not able to translate them to fit my need :-(
Tkx.
What you want is to dynamically access a local variable by its name. That's doable, all you need is locals().
Suppose you have variables named "var0", "var1" and "var2", but you want to access their contents without hardcoding the names. You can do it as follows:
var0 = [1,2,3]
var1 = [4,5,6]
var2 = [7,8,9]
for i in range(3):
    variable = locals()['var' + str(i)]
    print(variable)
Output:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
Although doable, this is not advised. You could instead store those lists in a dict with their names as string keys, so that later you can access them simply via a string, without needing to worry about variable scopes.
If your names differ just by a number then perhaps you could also use a list, and the number would be the index inside it.
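To make the dictionary suggestion concrete, here is a minimal sketch (the list contents are made up) that keeps the lists in one dict keyed by name and counts the eight possible values with collections.Counter:

```python
from collections import Counter

# Store the lists in one dict keyed by name instead of separate variables.
lists = {
    'list_0': ['a', 'b', 'a', 'b', 'b', 'c', 'f', 'h'],
    'list_1': ['f', 'g', 'c', 'g', 'f', 'a', 'b', 'b', 'b'],
}
# One Counter per list; values that never occur simply count as 0.
counts = {name: Counter(values) for name, values in lists.items()}
print(counts['list_0']['b'])  # 3
print(counts['list_1']['d'])  # 0
```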

Python: IndexError: list index out of range

So, I'm just a noob when it comes to programming, especially Python. I have a list holding all the variable names that I'm using in my program:
dList=['market_a','market_b','market_c','market_d','market_e','market_f','market_g']
What I want to do is remove all these objects from memory; i.e., this is what I believe needs to be done:
del market_a,market_b,market_c,market_d,market_e,market_f,market_g
market_a=market_b=market_c=market_d=market_e=market_f=market_g= None
I was trying to del the objects by doing something like this:
for index in range(len(dList)):
    del dList[index]
But I'm getting this error.
IndexError: list index out of range
Can somebody please help me with this? Also, can somebody please tell me how I can do market_a=market_b=market_c=market_d=market_e=market_f=market_g=None from dList?
Thanks in advance.
I am not recommending that you do this (see below for a dictionary based solution), however, you can use the exec statement to assign None to the variables:
dList = ['market_a', 'market_b', 'market_c', 'market_d', 'market_e', 'market_f', 'market_g']
for var in dList:
    exec '{} = None'.format(var)
You can also explicitly call del on the variable:
for var in dList:
    exec 'del {}'.format(var)
    # exec '{} = None'.format(var)
although this is not usually required: if the object a variable name was bound to has no other references, rebinding the name makes the original object eligible for garbage collection, so it will eventually be removed from memory.
In general it makes little sense to store a list of variable names: if the variables are static, that is, known to your code at "compile" time, then you already know what they are called and can simply refer to them by name in your code.
If the variables are dynamic then you'd be better off using a dictionary to associate the names with values. Treat the entries in the dictionary as your variables.
markets = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7}
# to access a "variable"
>>> markets['a']
1
# to modify a "variable"
>>> markets['a'] += 100
>>> markets['a']
101
To delete them all you can simply delete the dictionary:
del markets
or rebind it to an empty dictionary:
markets = {}
Assuming that there is no other reference to the objects stored as values in the dictionary, this will make these objects available for garbage collection, effectively deleting them from memory.
Or you can delete specific keys:
# to delete a specific "variable"
del markets['a']
Unless it is required by your algorithms there's no need to assign None.
As you delete items, your list gets smaller, so the later indices are no longer valid.
You can try this:
while dList:
    del dList[0]
Just a note: when the list is empty, it evaluates to False.
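To see concretely why the original loop runs off the end, here is a minimal reproduction with a plain list (the item values are made up):

```python
items = ['a', 'b', 'c']
try:
    for index in range(len(items)):  # range(3) is computed once, up front
        del items[index]             # but the list shrinks on every pass
except IndexError:
    print('list index out of range')
# After deleting index 0 (-> ['b', 'c']) and index 1 (-> ['b']),
# index 2 no longer exists, hence the IndexError.
```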
As mentioned above, the list gets smaller as you remove items, so a for loop is probably not the best solution. To me it looks like you are trying to accomplish two tasks: shrink the list and actually delete the variables:
while dList:
    var = dList.pop(0)
    exec('del {0}'.format(var))
The pop() method shrinks the list, and exec() then deletes the variable from memory: you pass in the variable name as a string, and instead of deleting the string you delete the actual variable it names.

What does deep=True do in pyyaml.Loader.construct_mapping?

In searching around the web for usages of custom constructors I see things like this:
def some_constructor(loader, node):
    value = loader.construct_mapping(node, deep=True)
    return SomeClass(value)
What does the deep=True do? I don't see it in the pyyaml documentation.
It looks like I need it; I have a yaml file generated by a pyyaml representer and it includes node anchors and aliases (like &id003 and *id003); without deep=True I get a shallow map back for those objects containing anchors/aliases.
You don't see deep=True in the documentation because you don't normally need to use it as an end-user of the PyYAML package.
If you trace the use of methods in constructor.py that use deep= you come to construct_mapping() and construct_sequence() in class BaseConstructor() and both of these call BaseConstructor.construct_object().
The relevant code in that method to study is:
if tag_suffix is None:
    data = constructor(self, node)
else:
    data = constructor(self, tag_suffix, node)
if isinstance(data, types.GeneratorType):
    generator = data
    data = next(generator)
    if self.deep_construct:
        for dummy in generator:
            pass
    else:
        self.state_generators.append(generator)
and in particular the for loop in there, which only gets executed if deep=True was passed in.
Roughly speaking: if the data that comes from a constructor is a generator, the for loop walks over that data until the generator is exhausted. With that mechanism, such constructors can contain a yield that hands out a base object whose details are filled in after the yield. There is only one yield in such constructors, e.g. for mappings (constructed as Python dicts):
def construct_yaml_map(self, node):
    data = {}
    yield data
    value = self.construct_mapping(node)
    data.update(value)
I call this a two-step process (one step up to the yield, the next from the yield to the end of the method).
In such two-step constructors the data to be yielded is constructed empty, yielded and then filled out. And that has to be so because of what you already noticed: recursion. If there is a self reference to data somewhere underneath, data cannot be constructed after all its children are constructed, because it would have to wait for itself to be constructed.
The deep parameter indirectly controls whether objects that are potentially generators are recursively being built or appended to the list self.state_generators to be resolved later on.
Constructing a YAML document then boils down to constructing the top-level objects and looping over the potentially recursive objects in self.state_generators until no generators are left (a process that might take more than one pass).
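The two-step mechanism can be sketched in plain Python (an illustration of the pattern, not PyYAML's actual code); whether the caller finishes the generator immediately is exactly what deep= decides:

```python
def construct_map(child_value):
    data = {}                    # step 1: create the empty base object
    yield data                   # hand it out before it is filled in
    data['child'] = child_value  # step 2: fill in the details
    data['self'] = data          # a self-reference is now possible

gen = construct_map(42)
data = next(gen)   # what a deep=False caller sees at this point: still {}
for dummy in gen:  # what deep=True does: exhaust the generator right away
    pass
# data is now fully constructed, including the self-reference
```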
In a custom constructor, the deep argument determines whether the nested collections inside the mapping are fully built before construct_mapping returns. For example, given this document:
a:
  b: 1
  c: 2
d:
  b: 3
with deep=True the custom constructor receives the complete result:
{'a': {'b': 1, 'c': 2}, 'd': {'b': 3}}
With deep=False, the nested mappings may still be under construction at that moment: the constructor can receive placeholder dicts such as {'a': {}, 'd': {}}, which PyYAML fills in later through the deferred generators in self.state_generators.

Django - Many To Many after filtering one end of the relationship, counting backwards always produces the value 1

I have two models that are in a Many To Many relationship, and I want the following effect to occur:
Consider models A and B in a many-to-many relationship with each other.
A's related_name for B is bs and B's related name for A is as
Whenever I create an A or a B it will always be immediately connected to one or more of the other, so initially all instances of A and B will have at least one related object.
If I want to delete an A (let's call it a0) I want it to delete all Bs that would be left with no related A's after a0 is deleted, so essentially I want to delete all B that have only a0 in their related_set as (the reverse example of this would also be expected).
The way I was trying to implement this is, when I want to delete an A such as a0, I would say:
a0.bs.annotate(Count('bs')).filter(bs__count=1).delete()
However this would unconditionally delete ALL related B instances in a0.bs, and when I went to the shell to test it out, I would get this result:
>>> a0.bs.annotate(Count('bs')).values_list('bs__count',flat=True)
<QuerySet [1, 1, 1, 1]>
>>> B.objects.filter(as=a0).annotate(Count('bs')).values_list('bs__count',flat=True)
<QuerySet [1, 1, 1, 1]>
But I would also get this if I did this with the same database instance:
>>> B.objects.annotate(Count('bs')).filter(as=a0).values_list('bs__count',flat=True)
<QuerySet [1, 4, 6, 6]>
So it is the case that 3 out of the 4 of these B instances wouldn't satisfy count == 1, yet they all match if I filter for the specific B instances I want to look at before annotating, which seems significantly more efficient than the last command used (the one with the accurate result).
Can anyone give me any insight on this effect?
Can't you assume that any B record that has no related A records left should be deleted as well? The same is true vice versa.
A.objects.filter(pk=1).delete()
B.objects.filter(as=None).delete()

Reference a list of dicts

Python 2.7 on Mint Cinnamon 17.3.
I have a bit of test code employing a list of dicts, and despite many hours of frustration, I cannot seem to work out why it is not working as it should.
blockagedict = {'location': None, 'timestamp': None, 'blocked': None}
blockedlist = [blockagedict]
blockagedict['location'] = 'A'
blockagedict['timestamp'] = '12-Apr-2016 01:01:08.702149'
blockagedict['blocked'] = True
blockagedict['location'] = 'B'
blockagedict['timestamp'] = '12-Apr-2016 01:01:09.312459'
blockagedict['blocked'] = False
blockedlist.append(blockagedict)
for test in blockedlist:
    print test['location'], test['timestamp'], test['blocked']
This always produces the following output and I cannot work out why; I cannot see anything wrong with my code. It always prints the last set of dict values, but it should print all of them, if I am not mistaken.
B 12-Apr-2016 01:01:09.312459 False
B 12-Apr-2016 01:01:09.312459 False
I would be happy for someone to show me the error of my ways and put me out of my misery.
It is because the line blockedlist = [blockagedict] actually stores a reference to the dict, not a copy, in the list. Your code effectively creates a list that has two references to the very same object.
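A minimal demonstration of that aliasing, separate from your code:

```python
d = {'x': 1}
lst = [d]        # the list stores a reference to d, not a copy of it
lst.append(d)    # now both slots refer to the very same dict
d['x'] = 2       # a single change...
# ...shows up in both slots, because lst[0] and lst[1] are one object
print(lst[0] is lst[1])  # True
```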
If you care about performance and will have 1 million dictionaries in a list, all with the same keys, you will be better off using a NumPy structured array. Then you can have a single, efficient data structure which is basically a matrix of rows and named columns of appropriate types. You mentioned in a comment that you may know the number of rows in advance. Here's a rewrite of your example code using NumPy instead, which will be massively more efficient than a list of a million dicts.
import numpy as np
dtype = [('location', str, 1), ('timestamp', str, 27), ('blocked', bool)]
count = 2 # will be much larger in the real program
blockages = np.empty(count, dtype) # use zeros() instead if some data may never be populated
blockages[0]['location'] = 'A'
blockages[0]['timestamp'] = '12-Apr-2016 01:01:08.702149'
blockages[0]['blocked'] = True
blockages['location'][1] = 'B' # n.b. indexing works this way too
blockages['timestamp'][1] = '12-Apr-2016 01:01:09.312459'
blockages['blocked'][1] = False
for test in blockages:
    print test['location'], test['timestamp'], test['blocked']
Note that the usage is almost identical. But the storage is in a fixed size, single allocation. This will reduce memory usage and compute time.
As a nice side effect, writing it as above completely sidesteps the issue you originally had, with multiple references to the same row. Now all the data is placed directly into the matrix with no object references at all.
Later in a comment you mention you cannot use NumPy because it may not be installed. Well, we can still avoid unnecessary dicts, like this:
from array import array
blockages = {'location': [], 'timestamp': [], 'blocked': array('B')}
blockages['location'].append('A')
blockages['timestamp'].append('12-Apr-2016 01:01:08.702149')
blockages['blocked'].append(True)
blockages['location'].append('B')
blockages['timestamp'].append('12-Apr-2016 01:01:09.312459')
blockages['blocked'].append(False)
for location, timestamp, blocked in zip(blockages['location'], blockages['timestamp'], blockages['blocked']):
    print location, timestamp, blocked
Note I use array here for efficient storage of the fixed-size blocked values (this way each value takes exactly one byte).
You still end up with resizable lists that you could avoid, but at least you don't need to store a dict in every slot of the list. This should still be more efficient.
Ok, I have initialised the list of dicts right off the bat and this seems to work. Although I am tempted to write a class for this.
blockedlist = [{'location': None, 'timestamp': None, 'blocked': None} for k in range(2)]
blockedlist[0]['location'] = 'A'
blockedlist[0]['timestamp'] = '12-Apr-2016 01:01:08.702149'
blockedlist[0]['blocked'] = True
blockedlist[1]['location'] = 'B'
blockedlist[1]['timestamp'] = '12-Apr-2016 01:01:09.312459'
blockedlist[1]['blocked'] = False
for test in blockedlist:
    print test['location'], test['timestamp'], test['blocked']
And this produces what I was looking for:
A 12-Apr-2016 01:01:08.702149 True
B 12-Apr-2016 01:01:09.312459 False
I will be reading from a text file with 1 to 2 million lines, so converting the code to iterate through the lines won't be a problem.