I have a list of (label, count) tuples like this:
[('grape', 100), ('grape', 3), ('apple', 15), ('apple', 10), ('apple', 4), ('banana', 3)]
From that I want to sum all values with the same label (same labels always adjacent) and return a list in the same label order:
[('grape', 103), ('apple', 29), ('banana', 3)]
I know I could solve it with something like:
def group(l):
    result = []
    if l:
        this_label = l[0][0]
        this_count = 0
        for label, count in l:
            if label != this_label:
                result.append((this_label, this_count))
                this_label = label
                this_count = 0
            this_count += count
        result.append((this_label, this_count))
    return result
But is there a more Pythonic / elegant / efficient way to do this?
itertools.groupby can do what you want:
import itertools
import operator
L = [('grape', 100), ('grape', 3), ('apple', 15), ('apple', 10),
('apple', 4), ('banana', 3)]
def accumulate(l):
    it = itertools.groupby(l, operator.itemgetter(0))
    for key, subiter in it:
        yield key, sum(item[1] for item in subiter)
print(list(accumulate(L)))
# [('grape', 103), ('apple', 29), ('banana', 3)]
Using itertools and a list comprehension:
import itertools
[(key, sum(num for _, num in value))
for key, value in itertools.groupby(l, lambda x: x[0])]
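Edit: as gnibbler pointed out, groupby only merges adjacent items, so if l isn't already grouped by label, replace it with sorted(l). Note that sorting also changes the order of the labels in the result.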
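To see why the sorted(l) caveat matters, here is a small sketch (the sum_adjacent helper and the sample data are made up for illustration). groupby only merges runs of adjacent equal keys, so a list whose labels are scattered yields a repeated label:

```python
from itertools import groupby
from operator import itemgetter

def sum_adjacent(pairs):
    """Sum counts over runs of adjacent pairs that share a label."""
    return [(label, sum(count for _, count in run))
            for label, run in groupby(pairs, key=itemgetter(0))]

scattered = [('grape', 100), ('apple', 15), ('grape', 3)]
as_is = sum_adjacent(scattered)
after_sort = sum_adjacent(sorted(scattered, key=itemgetter(0)))
print(as_is)       # 'grape' appears twice because its entries are not adjacent
print(after_sort)  # one entry per label, in sorted rather than original order
```

If the original label order matters and the input is not grouped, a dict-based accumulation is probably the better fit.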
import collections
d=collections.defaultdict(int)
a=[]
alist=[('grape', 100), ('banana', 3), ('apple', 10), ('apple', 4), ('grape', 3), ('apple', 15)]
for fruit,number in alist:
    if fruit not in a:
        a.append(fruit)
    d[fruit]+=number
for f in a:
    print (f,d[f])
output
$ ./python.py
('grape', 103)
('banana', 3)
('apple', 29)
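On Python 3.7 and later, plain dicts remember insertion order, so the separate list a is arguably unnecessary. A minimal sketch under that assumption (sum_by_label is a hypothetical name, not part of the answer above):

```python
def sum_by_label(pairs):
    """Sum counts per label, keeping labels in first-seen order.

    Relies on dicts preserving insertion order (guaranteed from Python 3.7).
    """
    totals = {}
    for label, count in pairs:
        totals[label] = totals.get(label, 0) + count
    return list(totals.items())

alist = [('grape', 100), ('banana', 3), ('apple', 10),
         ('apple', 4), ('grape', 3), ('apple', 15)]
result = sum_by_label(alist)
print(result)
```

Unlike the groupby approach, this does not require equal labels to be adjacent.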
>>> from itertools import groupby
>>> from operator import itemgetter
>>> L=[('grape', 100), ('grape', 3), ('apple', 15), ('apple', 10), ('apple', 4), ('banana', 3)]
>>> [(x,sum(map(itemgetter(1),y))) for x,y in groupby(L, itemgetter(0))]
[('grape', 103), ('apple', 29), ('banana', 3)]
My version without itertools:
[(k, sum([y for (x,y) in l if x == k])) for k in dict(l).keys()]
Method
def group_by(my_list):
    result = {}
    for k, v in my_list:
        result[k] = v if k not in result else result[k] + v
    return result
Usage
my_list = [
('grape', 100), ('grape', 3), ('apple', 15),
('apple', 10), ('apple', 4), ('banana', 3)
]
group_by(my_list)
# Output: {'grape': 103, 'apple': 29, 'banana': 3}
You can convert the result to a list of tuples with list(group_by(my_list).items()).
Or a simpler, more readable answer (without itertools):
pairs = [('foo',1),('bar',2),('foo',2),('bar',3)]
def sum_pairs(pairs):
    sums = {}
    for pair in pairs:
        sums.setdefault(pair[0], 0)
        sums[pair[0]] += pair[1]
    return sums.items()
print(sum_pairs(pairs))
Simpler answer without any third-party libraries:
dct={}
for key,value in alist:
    if key not in dct:
        dct[key]=value
    else:
        dct[key]+=value
Related
How do I combine elements in a list, e.g.
List(('h', 1), ('i', 1), ('h', 1), ('i', 1), ('l', 2))
such that I get the following result:
List(('h', 2), ('i', 2), ('l', 2))
Basically, I want to sum the numbers associated with each letter, and the letter should appear in the list only once.
val myList = List(('h', 1), ('i', 3), ('h', 5), ('i', 7), ('l', 2))
myList.groupBy(_._1).mapValues(_.foldLeft(0)(_ + _._2)).toList
res0: List[(Char, Int)] = List((h,6), (i,10), (l,2))
val df = List(('h', 1), ('i', 1), ('h', 1), ('i', 1), ('l', 2))
val c = df.groupBy(_._1).mapValues(_.map(_._2).sum).toList
List((h,2), (i,2), (l,2))
you can do:
val h = List(('h', 3), ('i', 1), ('h', 1), ('i', 1), ('l', 2))
h.groupBy(_._1).map(f => (f._1, f._2.map(_._2).sum)).toList
I have a set of x and y coordinates as follows:
x = (1,1,2,2,3,4)
y= (0,1,2,3,4,5)
What is the best way of going about transforming this list into a multiline string format, e.g:
x_y = [((1,0)(1,1)),((1,1)(2,2)),((2,2)(2,3)),((2,3)(3,4)),((3,4)(4,5))]
You can pair up the elements of x and y with zip():
>>> x = (1,1,2,2,3,4)
>>> y = (0,1,2,3,4,5)
>>> xy = zip(x, y)
>>> xy
[(1, 0), (1, 1), (2, 2), (2, 3), (3, 4), (4, 5)]
Then you can rearrange this into the kind of list in your example with a list comprehension:
>>> x_y = [(xy[i], xy[i+1]) for i in xrange(len(xy)-1)]
>>> x_y
[((1, 0), (1, 1)), ((1, 1), (2, 2)), ((2, 2), (2, 3)), ((2, 3), (3, 4)), ((3, 4), (4, 5))]
If you don't care about efficiency, the second part could also be written as:
>>> x_y = zip(xy, xy[1:])
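The session above is Python 2, where zip() returns a list. In Python 3 zip() returns an iterator, so xy[i] and xy[1:] would fail. A sketch of the same pairing for Python 3, materialising the list first:

```python
x = (1, 1, 2, 2, 3, 4)
y = (0, 1, 2, 3, 4, 5)

# zip() is lazy in Python 3, so build a list before indexing or slicing it.
xy = list(zip(x, y))

# Pair each point with its successor to get consecutive line segments.
x_y = list(zip(xy, xy[1:]))
print(x_y)
```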
In my code I frequently need to take a subset range of keys+values from a Python OrderedDict (from collections package). Slicing doesn't work (throws TypeError: unhashable type) and the alternative, iterating, is cumbersome:
from collections import OrderedDict
o = OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
# want to do:
# x = o[1:3]
# need to do:
x = OrderedDict()
for idx, key in enumerate(o):
if 1 <= idx < 3:
x[key] = o[key]
Is there a better way to get this done?
You can use the itertools.islice function, which takes an iterable and yields only its first stop elements (or, given both a start and a stop, the elements between them). This is useful because arbitrary iterables don't support slice notation, and it means you don't need to build the whole list of items from the OrderedDict first.
from collections import OrderedDict
from itertools import islice
o = OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
sliced = islice(o.items(), 3) # o.iteritems() in Python 2.7 is o.items() in Python 3
sliced_o = OrderedDict(sliced)
The ordered dict in the standard library doesn't provide that functionality, even though libraries that do (and that provide essentially a superset of OrderedDict) existed for a few years before collections.OrderedDict: voidspace odict and ruamel.ordereddict (I am the author of the latter package, which is a reimplementation of odict in C):
from odict import OrderedDict as odict
p = odict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
print p[1:3]
In ruamel.ordereddict you can relax the ordered input requirement (AFAIK you cannot ask a derivative of dict whether its keys are ordered; recognising collections.OrderedDict would be a good addition to ruamel.ordereddict):
from ruamel.ordereddict import ordereddict
q = ordereddict(o, relax=True)
print q[1:3]
r = odict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
print r[1:3]
If you want (or have to) stay within the standard library, you can subclass collections.OrderedDict and override __getitem__:
class SlicableOrderedDict(OrderedDict):
    def __getitem__(self, k):
        if not isinstance(k, slice):
            return OrderedDict.__getitem__(self, k)
        x = SlicableOrderedDict()
        for idx, key in enumerate(self.keys()):
            if k.start <= idx < k.stop:
                x[key] = self[key]
        return x
s = SlicableOrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
print s[1:3]
Of course, you could use Martijn's or Jimmy's shorter versions to get the actual slice that needs returning:
from itertools import islice
class SlicableOrderedDict(OrderedDict):
    def __getitem__(self, k):
        if not isinstance(k, slice):
            return OrderedDict.__getitem__(self, k)
        return SlicableOrderedDict(islice(self.viewitems(), k.start, k.stop))
t = SlicableOrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
print t[1:3]
Or, if you just want to smarten up all existing OrderedDicts without subclassing:
def get_item(self, k):
    if not isinstance(k, slice):
        return OrderedDict._old__getitem__(self, k)
    return OrderedDict(islice(self.viewitems(), k.start, k.stop))
OrderedDict._old__getitem__ = OrderedDict.__getitem__
OrderedDict.__getitem__ = get_item
u = OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
print u[1:3]
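The snippets above are Python 2 (print statement, viewitems()) and assume the slice has explicit, non-negative bounds. A sketch of a Python 3 variant that delegates to list slicing, so open-ended, negative and stepped slices also work, at the cost of copying all items first (the class name SliceableOrderedDict is illustrative only):

```python
from collections import OrderedDict

class SliceableOrderedDict(OrderedDict):
    """OrderedDict that also accepts positional slices, e.g. d[1:3]."""

    def __getitem__(self, key):
        if not isinstance(key, slice):
            # Ordinary key lookup: defer to OrderedDict.
            return super().__getitem__(key)
        # Let list slicing handle None, negative bounds and steps.
        items = list(self.items())
        return SliceableOrderedDict(items[key])

s = SliceableOrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
middle = s[1:3]
last = s[-1:]
print(middle)
print(last)
```

For very large dicts where only a small leading slice is needed, the islice-based versions avoid the full copy.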
In Python 2, you can slice the keys:
x.keys()[1:3]
and to support both Python 2 and Python 3, you'd convert to a list first:
list(x)[1:3]
The Python 2 OrderedDict.keys() implementation does exactly that.
In both cases you are given a list of keys in correct order. If creating a whole list first is an issue, you can use itertools.islice() and convert the iterable it produces to a list:
from itertools import islice
list(islice(x, 1, 3))
All of the above also can be applied to the items; use dict.viewitems() in Python 2 to get the same iteration behaviour as Python 3 dict.items() provides. You can pass the islice() object straight to another OrderedDict() in this case:
OrderedDict(islice(x.items(), 1, 3)) # x.viewitems() in Python 2
I was able to slice an OrderedDict using the following:
list(myordereddict.values())[start:stop]
I didn't test the performance.
I wanted to slice using a key, since I didn't know the index in advance:
o = OrderedDict(zip(list('abcdefghijklmnopqrstuvwxyz'),range(1,27)))
stop = o.keys().index('e') # -> 4
OrderedDict(islice(o.items(),stop)) # -> OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
or to slice from start to stop:
start = o.keys().index('c') # -> 2
stop = o.keys().index('e') # -> 4
OrderedDict(islice(o.iteritems(),start,stop)) # -> OrderedDict([('c', 3), ('d', 4)])
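In Python 3, o.keys() returns a view without an index() method, and iteritems() no longer exists. A sketch of the same key-based slicing for Python 3 (slice_by_key is a hypothetical helper; it assumes both keys are present and raises ValueError otherwise):

```python
from collections import OrderedDict
from itertools import islice
from string import ascii_lowercase

def slice_by_key(od, start_key, stop_key):
    """Return the items from start_key up to, but not including, stop_key."""
    keys = list(od)  # keys in insertion order
    start = keys.index(start_key)
    stop = keys.index(stop_key)
    return OrderedDict(islice(od.items(), start, stop))

o = OrderedDict(zip(ascii_lowercase, range(1, 27)))
sliced = slice_by_key(o, 'c', 'e')
print(sliced)
```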
def slice_odict(odict, start=None, end=None):
    return OrderedDict([
        (k,v) for (k,v) in odict.items()
        if k in list(odict.keys())[start:end]
    ])
This allows for:
>>> x = OrderedDict([('a',1), ('b',2), ('c',3), ('d',4)])
>>> slice_odict(x, start=-1)
OrderedDict([('d', 4)])
>>> slice_odict(x, end=-1)
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> slice_odict(x, start=1, end=3)
OrderedDict([('b', 2), ('c', 3)])
x = OrderedDict(list(o.items())[1:3])
Python's documentation says the following about the zip function:
"The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using zip(*[iter(s)]*n)."
I have difficulty understanding the zip(*[iter(s)]*n) idiom. Can anybody give me an example of when we should use that idiom?
Thank you very much!
I don't know what documentation you're using, but this version of the zip() documentation has this example:
>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> zipped = zip(x, y)
>>> zipped
[(1, 4), (2, 5), (3, 6)]
>>> x2, y2 = zip(*zipped)
>>> x == list(x2) and y == list(y2)
True
It pairs up the elements of two lists, in their respective order, and it also has an "unzip" feature.
And since you asked, here's a slightly more understandable example:
>>> friends = ["Amy", "Bob", "Cathy"]
>>> orders = ["Burger", "Pizza", "Hot dog"]
>>> friend_order_pairs = zip(friends, orders)
>>> friend_order_pairs
[("Amy", "Burger"), ("Bob", "Pizza"), ("Cathy", "Hot dog")]
It's 2020, but let me leave this here for reference.
The zip(*[iter(s)]*n) idiom is used to split a flat list into chunks.
For example:
>>> mylist = [1, 2, 3, 'a', 'b', 'c', 'first', 'second', 'third']
>>> list(zip(*[iter(mylist)]*3))
[(1, 2, 3), ('a', 'b', 'c'), ('first', 'second', 'third')]
The idiom is analyzed here.
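Why this works: [iter(s)]*n is a list holding the same iterator n times, so each tuple that zip() builds pulls n consecutive items from it. A sketch that spells this out (chunks is a hypothetical helper name); note that zip() stops at the shortest input, so a trailing partial chunk is silently dropped:

```python
def chunks(seq, n):
    """Split seq into n-length tuples, dropping any incomplete tail."""
    it = iter(seq)
    # The same iterator object appears n times, so zip() advances it
    # n times to build each output tuple.
    return list(zip(*[it] * n))

full = chunks([1, 2, 3, 'a', 'b', 'c'], 3)
ragged = chunks([1, 2, 3, 4, 5, 6, 7], 3)
print(full)
print(ragged)
```

If the incomplete tail must be kept, itertools.zip_longest can be used in place of zip(), padding the last chunk with a fill value.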
zip() is for sticking two or more lists together.
names=['bob','tim','larry']
ages=[15,36,50]
zip(names,ages)
Out: [('bob', 15), ('tim', 36), ('larry', 50)]
I use it to create dictionaries when I have separate lists of keys and values:
>>> keys = ('pi', 'c', 'e')
>>> values = (3.14, 3*10**8, 1.6*10**-19)
>>> dict(zip(keys, values))
{'c': 300000000, 'pi': 3.14, 'e': 1.6000000000000002e-19}
Here is how to iterate over two lists and their indices using enumerate() together with zip():
alist = ['a1', 'a2', 'a3']
blist = ['b1', 'b2', 'b3']
for i, (a, b) in enumerate(zip(alist, blist)):
    print(i, a, b)
zip() basically combines two or more sequences, element by element, into a list of tuples:
>>> alist = ['a1', 'a2', 'a3']
>>> blist = ['b1', 'b2', 'b3']
>>>
>>> zip(alist, blist)
[('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3')]
>>>
Use izip instead (Python 2).
When working with very large data sets, you can use itertools.izip, which returns a lazy iterator and only evaluates results when requested, so it uses far less memory and often performs better. In Python 3 the built-in zip() already behaves this way. I usually use the generator-based variants of Python modules when possible.
imagine an example like this:
from itertools import islice,izip
w = xrange(9000000000000000000)
x = xrange(2000000000000000000)
y = xrange(9000000000000000000)
z = xrange(9000000000000000000)
# The following only returns a generator that holds an iterator for the first 100 items
# without loading that large mess of numbers into memory
first_100_items_generator = islice(izip(w,x,y,z), 100)
# Iterate through the generator and return only what you need - first 100 items
first_100_items = list(first_100_items_generator)
print(first_100_items)
Output:
[ (0, 0, 0, 0),
(1, 1, 1, 1),
(2, 2, 2, 2),
(3, 3, 3, 3),
(4, 4, 4, 4),
(5, 5, 5, 5),
(6, 6, 6, 6),
(7, 7, 7, 7),
(8, 8, 8, 8),
(9, 9, 9, 9),
(10, 10, 10, 10),
(11, 11, 11, 11)
...
...
]
So here I have four large ranges of numbers; I used izip to zip the values, then used islice to pick out the first 100 items.
The nice thing about using xrange, izip and islice is that they are all lazy, so nothing is computed until the final list() call consumes them.
It's a bit of a digression into generators but good to know when you start doing large data processing in python.
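The example above is Python 2. In Python 3, izip and xrange are gone because the built-in zip() and range() are already lazy. A sketch of the equivalent (variable names mirror the original):

```python
from itertools import islice

w = range(9000000000000000000)
x = range(2000000000000000000)
y = range(9000000000000000000)
z = range(9000000000000000000)

# zip() builds nothing up front; islice() caps the iteration at 100 tuples.
first_100_items = list(islice(zip(w, x, y, z), 100))
print(first_100_items[:3])
```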
Info on generators:
youtube
Generator intro
I'm trying to multiply two polynomials in Python 3, (2x^3-3x^2+4x) * (2x^2-3) = 4x^5-6x^4+2x^3+9x^2-12x. To represent each term I'm using a tuple (exponent, coefficient), so the operation I described above would be: [(3,2), (2,-3), (1,4)] * [(2,2), (0, -3)]
And I got the following list as an answer: [(5, 4), (3, -6), (4, -6), (2, 9), (3, 8), (1, -12)]
That would represent: 4x^5-6x^3-6x^4+9x^2+8x^3-12x
But my problem is that I can't find a way to 'add' the tuples that have the same first element as you can see with the -6x^3 (3, -6) and 8x^3 (3, 8).
Is there a "Pythonic" way to achieve this?
I would switch from lists to dictionaries. To make addition easier, I'd use defaultdict:
from collections import defaultdict
poly = defaultdict(int)
And then add those tuples into the dictionary:
for exponent, variable in poly_list:
    poly[exponent] += variable
It sort of works:
>>> from collections import defaultdict
>>>
>>> poly = defaultdict(int)
>>>
>>> for poly_list in [[(1, 1)], [(1, 1)]]:
...     for exponent, variable in poly_list:
...         poly[exponent] += variable
...
>>> poly
defaultdict(<type 'int'>, {1: 2})
>>> poly.items()
[(1, 2)]
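Applied to the multiplication in the question, the same defaultdict accumulation can combine like terms while the product is being formed. A sketch, assuming each term is an (exponent, coefficient) tuple (multiply is a hypothetical helper name):

```python
from collections import defaultdict

def multiply(p, q):
    """Multiply two polynomials given as lists of (exponent, coefficient)."""
    product = defaultdict(int)
    for e1, c1 in p:
        for e2, c2 in q:
            # Exponents add and coefficients multiply; like terms accumulate.
            product[e1 + e2] += c1 * c2
    # Highest exponent first, as polynomials are usually written.
    return sorted(product.items(), reverse=True)

result = multiply([(3, 2), (2, -3), (1, 4)], [(2, 2), (0, -3)])
print(result)  # [(5, 4), (4, -6), (3, 2), (2, 9), (1, -12)]
```

The result corresponds to 4x^5-6x^4+2x^3+9x^2-12x, with the two x^3 terms already merged.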
Although personally, I would just make a Polynomial class:
class Polynomial(object):
    def __init__(self, terms=None):
        # terms maps exponent -> coefficient; pairs or None are also accepted
        if isinstance(terms, dict):
            self.terms = terms
        else:
            self.terms = dict(terms or [])
    def copy(self):
        return Polynomial(self.terms.copy())
    def __add__(self, other):
        result = self.copy()
        # add the other polynomial's coefficients onto a copy of this one
        for e, c in other.terms.items():
            result.terms[e] = result.terms.get(e, 0) + c
        return result
    def __mul__(self, other):
        result = Polynomial()
        for e1, c1 in self.terms.items():
            for e2, c2 in other.terms.items():
                # exponents add, coefficients multiply, like terms accumulate
                e = e1 + e2
                result.terms[e] = result.terms.get(e, 0) + c1 * c2
        return result
This could be done in one line using itertools.groupby():
>>> [(exponent, sum(value for _, value in values)) for exponent, values in groupby(sorted(l, key=itemgetter(0)), key=itemgetter(0))]
[(1, -12), (2, 9), (3, 2), (4, -6), (5, 4)]
Breaking it down into something more readable (readability counts)...
Import the tools:
>>> from itertools import groupby
>>> from operator import itemgetter
>>>
Declaring the input (you've already done this bit):
>>> l = [(5, 4), (3, -6), (4, -6), (2, 9), (3, 8), (1, -12)]
>>>
Before we can group, we need to sort (on the first item in the tuple):
>>> l_sorted = sorted(l, key=itemgetter(0))
>>>
And then group (again, by that first item):
>>> l_grouped = groupby(l_sorted, key=itemgetter(0))
>>>
Then create a list comprehension, summing the values in the group (ignoring the key):
>>> [(exponent, sum(v for _,v in values)) for exponent, values in l_grouped]
[(1, -12), (2, 9), (3, 2), (4, -6), (5, 4)]