In searching around the web for usages of custom constructors I see things like this:
def some_constructor(loader, node):
    value = loader.construct_mapping(node, deep=True)
    return SomeClass(value)
What does deep=True do? I don't see it in the PyYAML documentation.
It looks like I need it; I have a YAML file generated by a PyYAML representer, and it includes node anchors and aliases (like &id003 and *id003). Without deep=True, I get a shallow map back for the objects containing anchors/aliases.
The reason you don't see deep=True in the documentation is that you don't normally need it as an end-user of the PyYAML package.
If you trace the methods in constructor.py that take a deep= argument, you arrive at construct_mapping() and construct_sequence() in the BaseConstructor class, and both of these call BaseConstructor.construct_object().
The relevant code in that method to study is:
if tag_suffix is None:
    data = constructor(self, node)
else:
    data = constructor(self, tag_suffix, node)
if isinstance(data, types.GeneratorType):
    generator = data
    data = next(generator)
    if self.deep_construct:
        for dummy in generator:
            pass
    else:
        self.state_generators.append(generator)
and in particular the for loop in there, which only gets executed if deep=True was passed in.
Roughly speaking: if the data coming from a constructor is a generator, this method walks over that data (in the for loop) until the generator is exhausted. This mechanism allows constructors to contain a yield that creates a base object, whose details are filled in after the yield. There is only one yield in such constructors, e.g. for mappings (constructed as Python dicts):
def construct_yaml_map(self, node):
    data = {}
    yield data
    value = self.construct_mapping(node)
    data.update(value)
I call this a two-step process (one step up to the yield, the next from the yield to the end of the method).
In such two-step constructors the data to be yielded is constructed empty, yielded and then filled out. And that has to be so because of what you already noticed: recursion. If there is a self reference to data somewhere underneath, data cannot be constructed after all its children are constructed, because it would have to wait for itself to be constructed.
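You can see this two-step mechanism at work with a self-referential document. This is a minimal sketch (it assumes PyYAML is installed); safe_load can build a mapping that contains itself precisely because the empty dict is yielded first and filled in afterwards:

import yaml

doc = """\
&id001
self: *id001
"""

d = yaml.safe_load(doc)
print(d['self'] is d)  # True: the mapping contains itself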
The deep parameter indirectly controls whether objects that are potentially generators are recursively being built or appended to the list self.state_generators to be resolved later on.
Constructing a YAML document then boils down to constructing the top-level objects and looping over the potentially recursive objects in self.state_generators until no generators are left (a process that might take more than one pass).
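To see what deep=True changes inside a custom constructor, here is a minimal, self-contained sketch. The !some tag and SomeClass are invented for illustration, and it assumes a PyYAML version where add_constructor accepts a Loader argument:

import yaml

class SomeClass:
    def __init__(self, mapping):
        self.mapping = mapping

def some_constructor(loader, node):
    # with deep=False, nested mappings could still be empty placeholder
    # dicts at this point; deep=True exhausts their generators first
    value = loader.construct_mapping(node, deep=True)
    print('inside constructor:', value)
    return SomeClass(value)

yaml.add_constructor('!some', some_constructor, Loader=yaml.SafeLoader)

obj = yaml.safe_load("""\
!some
outer: &id001
  inner: 42
alias: *id001
""")

With deep=True the print shows fully built nested mappings; drop it and you are likely to see empty dicts instead, because their generators have not run yet.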
The deep argument controls how nested collections are handled during this process. With deep=True, construct_mapping fully constructs any nested mappings before it returns. With deep=False, nested mappings may still be empty placeholder dicts at the moment your constructor runs, because their generators have not been exhausted yet; they are filled in later.
For example:
a:
  b: 1
  c: 2
d:
  b: 3
With deep=True, construct_mapping called from a custom constructor returns:
{'a': {'b': 1, 'c': 2}, 'd': {'b': 3}}
With deep=False, the nested mappings may still be empty at that point:
{'a': {}, 'd': {}}
I am writing a CSV parser which has the following structure:
class decode:
    def __init__(self):
        self.fd = open('test.csv')

    def decodeoperation(self):
        for row in self.fd:
            cmd = self.decodecmd(row)
            if cmd == 'A':
                self.decodeAopt()
            elif cmd == 'B':
                self.decodeBopt()

    def decodeAopt(self):
        for row in self.fd:
            # decode further dependencies based on cmd A till
            # a condition occurs on any further row
            return

    def decodeBopt(self):
        for row in self.fd:
            # decode further dependencies based on cmd B till
            # a condition occurs on any further row
            return
The current code works, but I don't feel good about iterating through the CSV file in every method. Could it be done in a better way?
There is nothing inherently wrong with using a common iterator across multiple methods, as long as you can determine in advance which method to dispatch to at any given point in the sequence (which you are doing by decoding the cmd from the row and getting 'A', 'B', etc.). The design has issues if you have to read several items before you can determine which method to call, and might have to back up if you picked the wrong method and need to try another. In parsing, this is called backtracking; since you are passing around a file object, backing up is difficult.
Note that your separate decoder methods will have to know when to stop before reading the next row that contains a command, so they will need some sort of terminating sentinel row that they can recognize.
Some general comments on your Python and class design:
You have a nice simple if-elif-elif dispatch table that can translate to a Python dict like this:
# put this code in place of your "if cmd == ... elif elif elif..." code
dispatch = {
    # note - no ()'s, we just want to reference the methods, not call them
    'A': self.decodeAopt,
    'B': self.decodeBopt,
    'C': self.decodeCopt,
    # look how easy it is to add more decoders
}

# look up which decoder to use for the current cmd
decoder = dispatch[cmd]
# run it
decoder()

# or do it all in one line
dispatch[cmd]()
Instead of having your __init__ method open a file, let it accept an iterator object. This will make it much easier to write tests for your object, since you'll be able to pass simple Python lists containing CSV rows.
class decode:
    def __init__(self, sequence):
        self.fd = sequence
You might want to rename this var from 'fd' to something like 'seq', since it doesn't have to be a file, but could be any iterable that gives you decodable rows.
If you are doing your own CSV parsing, look at using the builtin csv module. It will do quite a bit of work for you, like parsing quoted strings that could contain commas, and can give you easy-to-work-with dicts for each row, given headers read from the input file, or specified by you. If you have modified __init__ as I suggested, you can use it like:
import csv
# assuming test.csv has a header row
reader = csv.DictReader(open('test.csv'))
# or specify headers if not - I encourage you to give these columns better names
reader.fieldnames = ['cmd', 'val1', 'val2', 'val3']
decoder = decode(reader)
decoder.decodeoperation()
Then you can write in decodeoperation:
cmd = row['cmd']
Note that this would impart a slightly different design to your class, that it would expect to be given a sequence of dicts, rather than a sequence of strings.
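Putting these suggestions together, here is a rough sketch. It assumes test.csv has no header row, and it is simplified so that each decoder handles only its own row dict; your real decoders would keep consuming rows from self.seq until they hit their terminating sentinel:

import csv

class decode:
    def __init__(self, sequence):
        # any iterable of row dicts will do, which makes testing easy
        self.seq = sequence

    def decodeoperation(self):
        dispatch = {'A': self.decodeAopt, 'B': self.decodeBopt}
        for row in self.seq:
            dispatch[row['cmd']](row)

    def decodeAopt(self, row):
        print('A:', row)  # stand-in for the real decoding logic

    def decodeBopt(self, row):
        print('B:', row)  # stand-in for the real decoding logic

reader = csv.DictReader(open('test.csv'))
reader.fieldnames = ['cmd', 'val1', 'val2', 'val3']
decode(reader).decodeoperation()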
So, I'm just a noob when it comes to programming, especially Python. I have a list holding all the variable names that I'm using in my program:
dList=['market_a','market_b','market_c','market_d','market_e','market_f','market_g']
What I want to do is remove all these objects from memory, i.e., this is what I believe needs to be done:
del market_a,market_b,market_c,market_d,market_e,market_f,market_g
market_a=market_b=market_c=market_d=market_e=market_f=market_g= None
I was trying to del the objects by doing something like this:
for index in range(len(dList)):
    del dList[index]
But I'm getting this error:
IndexError: list index out of range
Can somebody please help me with this? Also, can somebody please tell me how I can do market_a=market_b=market_c=market_d=market_e=market_f=market_g= None from dList?
Thanks in advance.
I am not recommending that you do this (see below for a dictionary-based solution); however, you can use the exec statement to assign None to the variables:
dList = ['market_a', 'market_b', 'market_c', 'market_d', 'market_e', 'market_f', 'market_g']
for var in dList:
    exec '{} = None'.format(var)
You can also explicitly call del on the variable:
for var in dList:
    exec 'del {}'.format(var)
    # exec '{} = None'.format(var)
although this is not usually required: if the object a variable name was bound to has no other references to it, rebinding the name makes the original object eligible for garbage collection, and it will eventually be removed from memory.
In general it makes little sense to store a list of variable names: if the variables are static, that is, known to your code at "compile" time, then you already know what they are called and can simply refer to them by name in your code.
If the variables are dynamic then you'd be better off using a dictionary to associate the names with values. Treat the entries in the dictionary as your variables.
>>> markets = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7}
>>> markets['a']         # to access a "variable"
1
>>> markets['a'] += 100  # to modify a "variable"
>>> markets['a']
101
To delete them all you can simply delete the dictionary:
del markets
or rebind it to an empty dictionary:
markets = {}
Assuming that there is no other reference to the objects stored as values in the dictionary, this will make these objects available for garbage collection, effectively deleting them from memory.
Or you can delete specific keys:
# to delete a specific "variable"
del markets['a']
Unless it is required by your algorithms there's no need to assign None.
As you delete items, your array gets smaller, so the index is no longer correct.
You can try this:
while dList:
    del dList[0]
Just a note: when the array is empty, it evaluates to False, which is what ends the loop.
As mentioned above, the array gets smaller as you remove items, so a for loop is probably not the best solution. To me it looks like you are trying to accomplish two tasks: shrink the array and actually delete the variables:
while dList:
    var = dList.pop(0)
    exec('del {0}'.format(var))
The pop() method shrinks the array, and exec() then deletes the variable whose name was popped, so you delete the actual variable instance rather than just the string.
I have a dictionary that I declared in a particular order and want to keep it in that order all the time. The keys/values can't really be kept in order based on their value, I just want it in the order that I declared it.
So if I have the dictionary:
d = {'ac': 33, 'gw': 20, 'ap': 102, 'za': 321, 'bs': 10}
It isn't in that order if I view it or iterate through it. Is there any way to make sure Python will keep the explicit order that I declared the keys/values in?
From Python 3.6 onwards, the standard dict type maintains insertion order by default.
Defining
d = {'ac':33, 'gw':20, 'ap':102, 'za':321, 'bs':10}
will result in a dictionary with the keys in the order listed in the source code.
This was achieved by using a simple array with integers for the sparse hash table, where those integers index into another array that stores the key-value pairs (plus the calculated hash). That latter array just happens to store the items in insertion order, and the whole combination actually uses less memory than the implementation used in Python 3.5 and before. See the original idea post by Raymond Hettinger for details.
In 3.6 this was still considered an implementation detail; see the What's New in Python 3.6 documentation:
The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon (this may change in the future, but it is desired to have this new dict implementation in the language for a few releases before changing the language spec to mandate order-preserving semantics for all current and future Python implementations; this also helps preserve backwards-compatibility with older versions of the language where random iteration order is still in effect, e.g. Python 3.5).
Python 3.7 elevates this implementation detail to a language specification, so it is now mandatory that dict preserves order in all Python implementations compatible with that version or newer. See the pronouncement by the BDFL. As of Python 3.8, dictionaries also support iteration in reverse.
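A quick way to convince yourself (runnable on Python 3.7 or newer; the reversed() call needs 3.8+):

d = {'ac': 33, 'gw': 20, 'ap': 102, 'za': 321, 'bs': 10}
print(list(d))            # ['ac', 'gw', 'ap', 'za', 'bs']
print(list(reversed(d)))  # ['bs', 'za', 'ap', 'gw', 'ac']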
You may still want to use the collections.OrderedDict() class in certain cases, as it offers some additional functionality on top of the standard dict type, such as being reversible (this extends to the view objects) and supporting reordering (via the move_to_end() method).
from collections import OrderedDict

words = ['He', 'will', 'be', 'the', 'winner']
OrderedDict((word, True) for word in words)
contains
OrderedDict([('He', True), ('will', True), ('be', True), ('the', True), ('winner', True)])
If the values are True (or any other immutable object), you can also use:
OrderedDict.fromkeys(words, True)
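For example, move_to_end() lets you reorder entries after the fact, which a plain dict cannot do (this mirrors the example in the standard library docs):

from collections import OrderedDict

od = OrderedDict.fromkeys('abcde')
od.move_to_end('b')              # move 'b' to the end
print(''.join(od))               # acdeb
od.move_to_end('b', last=False)  # move 'b' to the front
print(''.join(od))               # bacde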
Rather than explaining the theoretical part, I'll give a simple example.
>>> from collections import OrderedDict
>>> my_dictionary = OrderedDict()
>>> my_dictionary['foo'] = 3
>>> my_dictionary['aol'] = 1
>>> my_dictionary
OrderedDict([('foo', 3), ('aol', 1)])
>>> dict(my_dictionary)
{'foo': 3, 'aol': 1}
Note that this answer applies to Python versions prior to Python 3.7. CPython 3.6 maintains insertion order under most circumstances as an implementation detail. From Python 3.7 onward, implementations must maintain insertion order to be compliant.
Python dictionaries are unordered. If you want an ordered dictionary, try collections.OrderedDict.
Note that OrderedDict was introduced into the standard library in python 2.7. If you have an older version of python, you can find recipes for ordered dictionaries on ActiveState.
Dictionaries use an order that makes searching efficient, and you can't change that.
You could just use a list of objects (a 2-element tuple in a simple case, or even a class), and append items to the end. You can then use linear search to find items in it.
Alternatively you could create or use a different data structure created with the intention of maintaining order.
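For example, a plain list of (key, value) tuples keeps insertion order at the cost of linear-time lookup:

pairs = [('ac', 33), ('gw', 20), ('ap', 102)]
pairs.append(('za', 321))                       # keeps its place at the end
value = next(v for k, v in pairs if k == 'gw')  # linear search; value == 20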
I came across this post while trying to figure out how to get OrderedDict to work. PyDev for Eclipse couldn't find OrderedDict at all, so I ended up making a tuple of my dictionary's keys in the order I wanted them. When I needed to output my list, I just iterated through the tuple's values and plugged each iterated 'key' into the dictionary to retrieve my values in the order I needed them.
example:
test_dict = dict(val1="hi", val2="bye", val3="huh?", val4="what....")
test_tuple = ('val1', 'val2', 'val3', 'val4')
for key in test_tuple:
    print(test_dict[key])
It's a tad cumbersome, but I'm pressed for time and it's the workaround I came up with.
note: the list of lists approach that somebody else suggested does not really make sense to me, because lists are ordered and indexed (and are also a different structure than dictionaries).
You can't really do what you want with a dictionary. You already have the dictionary d = {'ac': 33, 'gw': 20, 'ap': 102, 'za': 321, 'bs': 10} created. I found there was no way to keep it in order once it was already created. What I did instead was make a JSON file with the object:
{"ac":33,"gw":20,"ap":102,"za":321,"bs":10}
I used:
import json
from collections import OrderedDict

r = json.load(open('file.json'), object_pairs_hook=OrderedDict)
then used:
print json.dumps(r)
to verify.
from collections import OrderedDict
list1 = ['k1', 'k2']
list2 = ['v1', 'v2']
new_ordered_dict = OrderedDict(zip(list1, list2))
print new_ordered_dict
# OrderedDict([('k1', 'v1'), ('k2', 'v2')])
Another alternative is to use Pandas dataframe as it guarantees the order and the index locations of the items in a dict-like structure.
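As a minimal sketch of that idea (assuming pandas is installed; the labels are made up):

import pandas as pd

s = pd.Series([33, 20, 102], index=['ac', 'gw', 'ap'])
print(s['gw'])    # label-based access: 20
print(s.iloc[1])  # position-based access: 20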
I had a similar problem when developing a Django project. I couldn't use OrderedDict, because I was running an old version of python, so the solution was to use Django's SortedDict class:
https://code.djangoproject.com/wiki/SortedDict
e.g.,
from django.utils.datastructures import SortedDict
d2 = SortedDict()
d2['b'] = 1
d2['a'] = 2
d2['c'] = 3
Note: This answer is originally from 2011. If you have access to Python version 2.7 or higher, then you should have access to the now standard collections.OrderedDict, of which many examples have been provided by others in this thread.
Generally, you can design a class that behaves like a dictionary, mainly by implementing the methods __contains__, __getitem__, __delitem__, __setitem__ and some more. That class can have any behaviour you like, for example providing a sorted iterator over the keys.
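A minimal sketch of that idea, iterating over the keys in sorted order (only a few of the special methods are implemented; a real mapping class would supply more):

class SortedKeyDict:
    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __delitem__(self, key):
        del self._data[key]

    def __contains__(self, key):
        return key in self._data

    def __iter__(self):
        # this is where the class departs from dict: sorted iteration
        return iter(sorted(self._data))

d = SortedKeyDict()
d['gw'] = 20
d['ac'] = 33
print(list(d))  # ['ac', 'gw']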
If you would like to have a dictionary in a specific order, you can also create a list of lists, where the first item is the key and the second item is the value.
For example:
>>> lst = [[1, 2], [2, 3]]
>>> for i in lst:
...     print i[0]
...     print i[1]
...
1
2
2
3
You can do the same thing that I did for a dictionary.
Create an empty dictionary and a list:
dictionary_items = {}
fields = [['Name', 'Himanshu Kanojiya'], ['email id', 'hima#gmail.com']]
l = fields[0][0]
m = fields[0][1]
n = fields[1][0]
q = fields[1][1]
dictionary_items[l] = m
dictionary_items[n] = q
print dictionary_items
How can I duplicate a list of lists (or any other type) in a way that the resulting lists are new objects and not references to the old ones? As an example, I have the following list of lists:
l=[[1,2],[3,4]]
What I want as the result is:
l=[[1,2],[3,4],[1,2],[3,4]]
If I do l *= 2, the new sub-lists are references to the old sub-lists.
Doing l[0].append("python") will result in
l=[[1,2,'python'],[3,4],[1,2,'python'],[3,4]]
Also, creating a new list like:
l2=list(l)
or
l2=l[:]
doesn't solve the problem. I want to have new sub-lists which are independent of their origin, so that changing them has no impact on their old fellows. How can I do this in Python?
In general, the best way to copy a nested data structure so that copies get made of all the references (not just the ones at the top level) is to use copy.deepcopy. In your nested list example, you can do:
l.extend(copy.deepcopy(l))
deepcopy will still work even if the data structure contains references to itself, or multiple references to the same object. It usually works for objects stored as attributes of instances of custom classes too. You can define a __deepcopy__ method if you want to give a class special copying behavior (e.g. if some of its attributes are bookkeeping data that shouldn't be copied).
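As a sketch of that last point, here is a hypothetical Cache class whose _memo attribute is bookkeeping that should not travel with copies:

import copy

class Cache:
    def __init__(self):
        self.items = []
        self._memo = {}  # bookkeeping data that should not be copied

    def __deepcopy__(self, memo):
        new = Cache()
        memo[id(self)] = new  # register early so cycles back to self resolve
        new.items = copy.deepcopy(self.items, memo)
        # new._memo is deliberately left empty
        return new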
Here's a version of your nested list example code using instances of a linked list class rather than Python lists. copy.deepcopy does the right thing!
import copy

class linked_list(object):
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

    def __repr__(self):
        if self.next is not None:
            return "({!r})->{!r}".format(self.value, self.next)
        else:
            return "({!r})".format(self.value)

lst = linked_list(linked_list(1, linked_list(2)),
                  linked_list(linked_list(3, linked_list(4))))
print(lst)  # prints ((1)->(2))->((3)->(4))

lst.next.next = copy.deepcopy(lst)
print(lst)  # prints ((1)->(2))->((3)->(4))->((1)->(2))->((3)->(4))

lst.value.value = 5
print(lst)  # prints ((5)->(2))->((3)->(4))->((1)->(2))->((3)->(4))
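For the simple two-level list in the question, copying each sub-list with a slice is a lighter-weight alternative (deepcopy remains the general answer for arbitrary nesting):

l = [[1, 2], [3, 4]]
l.extend([sub[:] for sub in l])  # the comprehension builds new sub-lists first
l[0].append('python')
print(l)  # [[1, 2, 'python'], [3, 4], [1, 2], [3, 4]]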
I need to create a structure that is, in my mind, similar to an array of linked lists (where a Python list = array and dictionary = linked list). I have a list called blocks, and this is something like what I am looking to make:
blocks[0] = {dictionary},{dictionary},{dictionary},...
blocks[1] = {dictionary},{dictionary},{dictionary},...
etc..
currently I build the blocks as such:
blocks = []
blocks.append[()]
blocks.append[()]
blocks.append[()]
blocks.append[()]
I know that must look ridiculous. I just cannot see in my head what that just made, which is part of my problem. I assign to a block from a different list of dictionary items. Here is a brief overview of how a single block is created...
hold = {}
hold['file'] = file
hold['count'] = count
hold['mass'] = mass_lbs
mg1.append(hold)
# this append can happen several times to mg1

blocks[i].append(mg1[j])
# where i is an index for the block I want to append to, and j is the list
# index corresponding to whichever dictionary item of mg1 I want to grab
The reason I want these four main indices in blocks is so that I have shorter code with just the one list instead of block1 block2 block3 block4, which would just make the code way longer than it is now.
Okay, going off of what was discussed in the comments, you're looking for a simple way to create a structure that is a list of four items, where each item is a list of dictionaries, and all the dictionaries in one of those lists have the same keys but not necessarily the same values.
However, if you know exactly what keys each dictionary will have and that never changes, it might be worth considering classes that wrap dictionaries, and having each of the four lists be a list of objects. This is easier to keep in your head, and a bit more Pythonic in my opinion. You also gain the advantage of ensuring that the keys in the dictionary are static, plus you can define helper methods. And by emulating the methods of a container type, you can still use dictionary syntax.
class BlockA:
    def __init__(self):
        self.dictionary = {'file': None, 'count': None, 'mass': None}

    def __len__(self):
        return len(self.dictionary)

    def __getitem__(self, key):
        return self.dictionary[key]

    def __setitem__(self, key, value):
        if key in self.dictionary:
            self.dictionary[key] = value
        else:
            raise KeyError

    def __repr__(self):
        return str(self.dictionary)

block1 = BlockA()
block1['file'] = "test"

block2 = BlockA()
block2['file'] = "other test"
Now, you've got a guarantee that all instances of your first block object will have the same keys and no additional keys. You can make similar classes for your other blocks, or some general class, or some mix of the two using inheritance. Now to make your data structure:
blocks = [ [block1, block2], [], [], [] ]
print(blocks) # Or "print blocks" if you're not using Python 3.x
blocks[0][0]['file'] = "some new file"
print(blocks)
It might also be worthwhile to have a class for this blocks container, with specific methods for adding blocks of each type and accessing blocks of each type. That way you wouldn't trip yourself up with accidentally adding the wrong kind of block to one of the four lists or similar issues. But depending on how much you'll be using this structure, that could be overkill.
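As a rough sketch of that idea (the constructor takes whatever block classes you define; only BlockA exists in the code above, so any others are hypothetical):

class Blocks:
    def __init__(self, *block_types):
        # one list per registered block class
        self._lists = {t: [] for t in block_types}

    def add(self, block):
        # raises KeyError if the block's type was never registered
        self._lists[type(block)].append(block)

    def of_type(self, block_type):
        return self._lists[block_type]

blocks = Blocks(BlockA)
blocks.add(block1)
blocks.add(block2)
print(blocks.of_type(BlockA))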