Could someone explain how dictionaries are sorted and why?
The output of the lines below:
>>> d = {(1, 2): "f", (1, 3): "f", (1, 4): "f", (1, 5): "f"}
>>> d
{(1, 2): 'f', (1, 5): 'f', (1, 3): 'f', (1, 4): 'f'}
And in general:
>>> de= {"a":1, "b":1, "c":1, "e":1, "d":1}
>>> de
{'a': 1, 'c': 1, 'b': 1, 'e': 1, 'd': 1}
Lists don't behave like this, so I'm confused. I'm asking mostly out of curiosity; I could sort it myself, for example.
They're hash tables, so they don't keep their keys sorted in any way. After all, that's part of why they're fast.
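Dictionaries are not sorted. Before Python 3.7, the language spec did not guarantee any particular iteration order, so a dictionary could print its keys in an arbitrary order. Since Python 3.7, dictionaries preserve insertion order, but that is still not the same as being sorted by key.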
Lists, on the other hand, are ordered: they keep their elements in the order you put them in (which is not necessarily sorted order).
If you want to emulate something like a dictionary with a given order of keys, you could use a list of (key, value) tuples.
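A minimal sketch of that suggestion, reusing the keys from the question as sample data. The `lookup` helper is hypothetical, written only for illustration; it scans the list linearly, so it gives up the fast lookups that make dictionaries attractive in the first place.

```python
# A list of (key, value) tuples keeps exactly the order it was built in.
pairs = [("a", 1), ("b", 1), ("c", 1), ("e", 1), ("d", 1)]

# Sorting a list of tuples compares the first element (the key) first.
pairs_sorted = sorted(pairs)


def lookup(pairs, key):
    """Hypothetical helper: return the value for key by scanning the list."""
    for k, v in pairs:
        if k == key:
            return v
    raise KeyError(key)


print(pairs_sorted)        # [('a', 1), ('b', 1), ('c', 1), ('d', 1), ('e', 1)]
print(lookup(pairs, "e"))  # 1
```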
Dictionaries hold key/value pairs. If you want to iterate over a dictionary in sorted order, you can copy its keys (or values) into a list and sort that. Suppose, for example, that you wanted to see the dictionary's values in sorted order.
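A short sketch of what that looks like in Python 3. The dictionary `scores` is made-up sample data with distinct values so that the different orderings are visible.

```python
from operator import itemgetter

scores = {"b": 3, "a": 2, "c": 1}  # made-up sample data

# sorted() on a dict returns a new list of its keys in sorted order.
sorted_keys = sorted(scores)                  # ['a', 'b', 'c']

# To see only the values in sorted order, sort the values themselves.
sorted_values = sorted(scores.values())       # [1, 2, 3]

# To keep each key with its value, sort the (key, value) items by value.
by_value = sorted(scores.items(), key=itemgetter(1))  # [('c', 1), ('a', 2), ('b', 3)]

for key, value in by_value:
    print(key, value)
```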
I want to sort lists of tuples of mathematical operators (stored as strings) and their index, in order of precedence (*,/,+,-), while retaining their original index. There are thousands of lists of tuples within my list.
E.g.
my_lists = [[(0,'*'),(1,'+'),(2,'-')],[(0,'-'),(1,'*'),(2,'*')],[(0,'+'),(1,'/'),(2,'-')]]
should become:
new_list = [[(0,'*'),(1,'+'),(2,'-')],[(1,'*'),(2,'*'),(0,'-')],[(1,'/'),(0,'+'),(2,'-')]]
I've tried using the built-in sorted() function and storing the precedence in a dictionary.
priority = {'*': 0, '/': 1, '+': 2, '-': 3}
new_list = [sorted(item, key = priority.get) for item in my_lists]
This produces the same original list.
How do I access just the operator part of the tuple whilst sorting the list of tuples?
You are sorting using the whole tuple, such as (0, '*'), as the key. You have to use only its second part (i.e. x[1]):
[sorted(item, key = lambda x: priority.get(x[1])) for item in my_lists]
returns
[[(0, '*'), (1, '+'), (2, '-')],
[(1, '*'), (2, '*'), (0, '-')],
[(1, '/'), (0, '+'), (2, '-')]]
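Your code didn't throw an error because priority.get((0, '*')) is legal and returns None. In Python 2.7, None values compare equal to each other, so every element gets the same key and the stable sort keeps the list in its original order. In Python 3, comparing None with None raises a TypeError, so the same code fails for any list with more than one element.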
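A Python 3 sketch of the fix, using the question's data. It indexes priority[op] instead of calling priority.get; that is a design choice not taken from the answer above, so that an unexpected operator raises KeyError rather than silently sorting as None. It also demonstrates the TypeError the original key causes on Python 3.

```python
priority = {'*': 0, '/': 1, '+': 2, '-': 3}
my_lists = [[(0, '*'), (1, '+'), (2, '-')],
            [(0, '-'), (1, '*'), (2, '*')],
            [(0, '+'), (1, '/'), (2, '-')]]


def by_precedence(pair):
    # pair is (index, operator); only the operator decides the order.
    _, op = pair
    return priority[op]  # raises KeyError for an unknown operator


# sorted() is stable, so equal-precedence operators keep their original order.
new_list = [sorted(item, key=by_precedence) for item in my_lists]
print(new_list)

# The original key=priority.get looks up the whole tuple, gets None for every
# element, and on Python 3 None values cannot be ordered.
original_fails = False
try:
    sorted(my_lists[0], key=priority.get)
except TypeError:
    original_fails = True
print(original_fails)  # True on Python 3
```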
The Python documentation says the following about the zip function:
"The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using zip(*[iter(s)]*n)."
I have difficulty understanding the zip(*[iter(s)]*n) idiom. Can anybody give me an example of when we should use it?
Thank you very much!
I don't know which documentation you're using, but this version of the zip() documentation has this example:
>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> zipped = zip(x, y)
>>> zipped
[(1, 4), (2, 5), (3, 6)]
>>> x2, y2 = zip(*zipped)
>>> x == list(x2) and y == list(y2)
True
It pairs up the elements of two lists by position, and zip(*zipped) acts as an "unzip" that splits the pairs back into separate sequences.
And since you asked, here's a slightly more understandable example:
>>> friends = ["Amy", "Bob", "Cathy"]
>>> orders = ["Burger", "Pizza", "Hot dog"]
>>> friend_order_pairs = zip(friends, orders)
>>> friend_order_pairs
[('Amy', 'Burger'), ('Bob', 'Pizza'), ('Cathy', 'Hot dog')]
It's 2020, but let me leave this here for reference.
The zip(*[iter(s)]*n) idiom is used to split a flat list into chunks.
For example:
>>> mylist = [1, 2, 3, 'a', 'b', 'c', 'first', 'second', 'third']
>>> list(zip(*[iter(mylist)]*3))
[(1, 2, 3), ('a', 'b', 'c'), ('first', 'second', 'third')]
The idiom is analyzed here.
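To see why the idiom works, here is a Python 3 sketch using the same mylist. [iter(s)] * n builds a list containing the *same* iterator n times, so for each output tuple zip pulls n consecutive items from one shared iterator. The shorter inputs below are made-up examples showing what happens when the length isn't a multiple of n; itertools.zip_longest is a swapped-in alternative for padding, not part of the idiom itself.

```python
from itertools import zip_longest

mylist = [1, 2, 3, 'a', 'b', 'c', 'first', 'second', 'third']

# zip(*[iter(mylist)] * 3) is equivalent to zip(it, it, it) with one iterator.
it = iter(mylist)
chunks = list(zip(it, it, it))
print(chunks)  # [(1, 2, 3), ('a', 'b', 'c'), ('first', 'second', 'third')]

# Leftover items that don't fill a whole chunk are silently dropped by zip...
short = list(zip(*[iter([1, 2, 3, 4, 5])] * 2))
print(short)   # [(1, 2), (3, 4)]

# ...while zip_longest pads the last chunk instead.
padded = list(zip_longest(*[iter([1, 2, 3, 4, 5])] * 2, fillvalue=None))
print(padded)  # [(1, 2), (3, 4), (5, None)]
```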
zip() is for combining two or more lists element by element.
names=['bob','tim','larry']
ages=[15,36,50]
zip(names,ages)
Out: [('bob', 15), ('tim', 36), ('larry', 50)]
I use it to create dictionaries when I have separate lists of keys and values:
>>> keys = ('pi', 'c', 'e')
>>> values = (3.14, 3*10**8, 1.6*10**-19)
>>> dict(zip(keys, values))
{'c': 300000000, 'pi': 3.14, 'e': 1.6000000000000002e-19}
Here is how to iterate over two lists and their indices using enumerate() together with zip():
alist = ['a1', 'a2', 'a3']
blist = ['b1', 'b2', 'b3']
for i, (a, b) in enumerate(zip(alist, blist)):
    print i, a, b
zip() combines two or more iterables into a list of tuples (an iterator in Python 3) whose length is that of the shortest input:
>>> alist = ['a1', 'a2', 'a3']
>>> blist = ['b1', 'b2', 'b3']
>>>
>>> zip(alist, blist)
[('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3')]
>>>
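Two details worth knowing, shown as a Python 3 sketch with a made-up, shorter blist: the result is only as long as the shortest input, and in Python 3 zip returns a one-shot iterator rather than a list.

```python
alist = ['a1', 'a2', 'a3']
blist = ['b1', 'b2']  # made-up: one element shorter than alist

# zip stops as soon as the shortest input runs out, so 'a3' is dropped.
pairs = list(zip(alist, blist))
print(pairs)  # [('a1', 'b1'), ('a2', 'b2')]

# In Python 3, zip() returns an iterator that can be consumed only once.
z = zip(alist, blist)
first = list(z)
second = list(z)
print(first, second)  # [('a1', 'b1'), ('a2', 'b2')] []
```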
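In Python 2, use itertools.izip instead.
When working with very large data sets, izip is a better fit than zip: it returns an iterator that produces each tuple only when requested, instead of building the whole list in memory up front. That saves memory and avoids computing results you never use. (In Python 3, the built-in zip already behaves this way and izip no longer exists.) I usually use the iterator-based variants of Python functions when possible.
Imagine an example like this: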
from itertools import islice,izip
w = xrange(9000000000000000000)
x = xrange(2000000000000000000)
y = xrange(9000000000000000000)
z = xrange(9000000000000000000)
# The following only creates a lazy iterator over the first 100 items,
# without loading that large mess of numbers into memory
first_100_items_generator = islice(izip(w,x,y,z), 100)
# Consume the iterator and return only what you need: the first 100 items
first_100_items = list(first_100_items_generator)
print(first_100_items)
Output:
[ (0, 0, 0, 0),
(1, 1, 1, 1),
(2, 2, 2, 2),
(3, 3, 3, 3),
(4, 4, 4, 4),
(5, 5, 5, 5),
(6, 6, 6, 6),
(7, 7, 7, 7),
(8, 8, 8, 8),
(9, 9, 9, 9),
(10, 10, 10, 10),
(11, 11, 11, 11)
...
...
]
So here I have four very large ranges of numbers; I used izip to zip the values, then islice to pick out the first 100 items.
The nice thing about xrange, izip and islice is that they are all lazy, so no values are produced until the final list() call consumes the iterator.
It's a bit of a digression into generators, but it's good to know when you start doing large-scale data processing in Python.
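For reference, a sketch of the same example in Python 3, where range() and zip() are already lazy, so xrange and izip are unnecessary. The sizes are copied from the example above.

```python
from itertools import islice

# range() objects don't store their numbers, so these are cheap to create.
w = range(9000000000000000000)
x = range(2000000000000000000)
y = range(9000000000000000000)
z = range(9000000000000000000)

# Nothing is computed yet: islice just wraps the lazy zip iterator.
first_100_items_iter = islice(zip(w, x, y, z), 100)

# Only now are the first 100 tuples actually produced.
first_100_items = list(first_100_items_iter)
print(first_100_items[:3])  # [(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, 2)]
```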
I have an OrderedDict of lists, for example:
data = OrderedDict([('a', [3, 2, 1]), ('b', ['z', 'y', 'x'])])
I would like to sort both lists by data['a']. The result would look like:
OrderedDict([('a', [1, 2, 3]), ('b', ['x', 'y', 'z'])])
where the elements maintain relative index position across all lists.
I can do it by tracking the indices:
sorted_data = sorted([(t, i) for t, i in zip(data['a'],range(len(data['a'])))])
but that requires another loop to sort the other column(s). This example is shortened; there are many more dictionary entries. Is there a more Pythonic way to do this, say with operator.itemgetter?
Thanks.
I'll take my own answer for now, until someone submits something more refined. This is my solution, but it seems clunky:
sortedData = sorted([(t, i) for t, i in zip(data['a'],range(len(data['a'])))])
sortedIndex = [i[1] for i in sortedData]
for key in data.keys():
    data[key] = [data[key][i] for i in sortedIndex]
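One more compact approach, sketched in Python 3 and not taken from the thread: compute the sorting permutation of data['a'] once with sorted(range(n), key=...), then apply that same permutation to every list. The quoted 'z', 'y', 'x' stand in for the question's placeholder values.

```python
from collections import OrderedDict

data = OrderedDict([('a', [3, 2, 1]), ('b', ['z', 'y', 'x'])])

# Positions of data['a'], listed in the order that sorts data['a'].
order = sorted(range(len(data['a'])), key=data['a'].__getitem__)  # [2, 1, 0]

# Apply the same permutation to every column, building a new OrderedDict.
sorted_data = OrderedDict(
    (key, [column[i] for i in order]) for key, column in data.items()
)
print(list(sorted_data.items()))  # [('a', [1, 2, 3]), ('b', ['x', 'y', 'z'])]
```

Because sorted() is stable, rows with equal values in data['a'] keep their original relative order, and each additional column costs only one list comprehension.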
Here are my lists:
projects = ["A", "B", "C"]
hours = [1,2,3]
I want my final answer to be like: {'A': 1, 'B': 2, 'C': 3}
Any suggestions?
Did you try calling the dict constructor?
dict(zip(projects,hours))
The expression zip(projects, hours) produces (key, value) tuples (a list in Python 2, an iterator in Python 3), and the dict constructor builds the mapping from them (a dict is Python's mapping type).
Python 2.7 and later also support dictionary comprehensions:
>>> projects = ["A", "B", "C"]
>>> hours = [1,2,3]
>>> res = {project: hour for project, hour in zip(projects, hours)}
>>> res
{'A': 1, 'B': 2, 'C': 3}
My result is {'A': 1, 'C': 3, 'B': 2}, but I want it to be exactly {'A': 1, 'B': 2, 'C': 3}. I used sorted(), but it only printed "A, B, C", which is missing the dictionary's values.
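A sketch of one way to get keys and values together in sorted order, in Python 3. sorted(res) iterates over the keys only, which is why the values went missing; sorting res.items() keeps each key with its value. OrderedDict is a swapped-in suggestion for when the order must be remembered on Python versions before 3.7, where plain dicts do not preserve order.

```python
from collections import OrderedDict

projects = ["A", "B", "C"]
hours = [1, 2, 3]
res = dict(zip(projects, hours))

# sorted(res) gives only keys; sorted(res.items()) keeps (key, value) pairs.
pairs = sorted(res.items())
print(pairs)  # [('A', 1), ('B', 2), ('C', 3)]

# An OrderedDict built from the sorted pairs remembers that order on every
# Python version (plain dicts only guarantee insertion order from 3.7 on).
ordered = OrderedDict(pairs)
print(list(ordered.items()))  # [('A', 1), ('B', 2), ('C', 3)]
```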
I am trying to obtain the nine keys with the highest values from a large dictionary (14 million keys).
I am using the following to return the nine keys:
import heapq
def dict_nlargest(d, n):
    return heapq.nlargest(n, d, key=lambda k: d[k])
print dict_nlargest(mydict, 9)
This works, but I would also like to print the values of those keys. Is there a way to do that using this method?
Normally, iterating over a dict iterates over its keys, so only those will be in the heap. You can change that by using items() or (preferably, in Python 2) iteritems(); in Python 3, iteritems() no longer exists and items() already returns a lazy view. You then iterate over (key, value) tuples. The key (for comparison) should be only the value, which can be achieved with lambda x: x[1] or (slightly faster) with operator.itemgetter(1).
import heapq
from operator import itemgetter
def dict_nlargest_items(d, n):
    return heapq.nlargest(n, d.iteritems(), key=itemgetter(1))
mydict = {'a': 1, 'b': 2, 'c': 3}
print dict_nlargest_items(mydict, 2) # [('c', 3), ('b', 2)]
Of course, there is no real need to make this adjustment. Once you have the key, you can always look up the value:
print [(k, mydict[k]) for k in dict_nlargest(mydict, 2)] # [('c', 3), ('b', 2)]
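For Python 3, where iteritems() is gone, a sketch of the same function using items(); the sample dictionary is the one from the answer above.

```python
import heapq
from operator import itemgetter


def dict_nlargest_items(d, n):
    # d.items() is a lazy view in Python 3, so no list of pairs is built.
    # itemgetter(1) makes the value, not the key, the comparison key.
    return heapq.nlargest(n, d.items(), key=itemgetter(1))


mydict = {'a': 1, 'b': 2, 'c': 3}
top = dict_nlargest_items(mydict, 2)
print(top)  # [('c', 3), ('b', 2)]

for key, value in top:
    print(key, value)
```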