Sum values of each key in list of dictionaries python - django

I have a list of dictionaries coming from a Django queryset, like this:
email_sent_count = [
{
'second_follow_count': 1,
'first_follow_count': 1,
'initial_count': 1,
'third_follow_count': 0
},
{
'second_follow_count': 1,
'first_follow_count': 0,
'initial_count': 1,
'third_follow_count': 1
},
{
'second_follow_count': 1,
'first_follow_count': 1,
'initial_count': 1,
'third_follow_count': 1
}
]
Now I want the sum of each key separately, like this:
initial_contact = 3
first_followup = 2
second_followup = 3
third_followup = 2
I am trying the following solution:
initial_contact = sum(map(lambda x: x['initial_count'], email_sent_count))
first_followup = sum(map(lambda x: x['first_follow_count'], email_sent_count))
second_followup = sum(map(lambda x: x['second_follow_count'], email_sent_count))
third_followup = sum(map(lambda x: x['third_follow_count'], email_sent_count))
But now I have 11 keys in every dictionary, so I'm writing 11 lambda functions. Is there a better way to solve this than calling a lambda 11 times?
I get the above email_sent_count with the following ORM query:
from django.db.models import Count, Q
i = Q(inital_contact__range=(from_date, to_date))
f1 = Q(first_followup__range=(from_date, to_date))
f2 = Q(second_followup__range=(from_date, to_date))
f3 = Q(third_followup__range=(from_date, to_date))
email_count = campaign_contact.filter(i | f1 | f2 | f3).annotate(
    initial_count=Count('inital_contact'),
    first_follow_count=Count('first_followup'),
    second_follow_count=Count('second_followup'),
    third_follow_count=Count('third_followup'),
).values('initial_count', 'first_follow_count',
         'second_follow_count', 'third_follow_count')
So, is there a solution that works directly with the ORM?

If you don't mind getting a dictionary as the result, you could use collections.defaultdict like this:
from collections import defaultdict
sums = defaultdict(int)
for item in email_sent_count:
    for key, value in item.items():
        sums[key] += value
which results in
defaultdict(<class 'int'>,
{'second_follow_count': 3, 'initial_count': 3,
'first_follow_count': 2, 'third_follow_count': 2})
and you can access the individual sums just like a dictionary: sums['second_follow_count'].
...or maybe even better with collections.Counter:
from collections import Counter
sums = Counter()
for item in email_sent_count:
    for key, value in item.items():
        sums[key] += value
# Counter({'second_follow_count': 3, 'initial_count': 3,
# 'first_follow_count': 2, 'third_follow_count': 2})
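A slightly shorter variant, sketched here with the sample data from the question, relies on Counter.update(), which adds the values of a mapping to the existing counts instead of replacing them:

```python
from collections import Counter

email_sent_count = [
    {'second_follow_count': 1, 'first_follow_count': 1,
     'initial_count': 1, 'third_follow_count': 0},
    {'second_follow_count': 1, 'first_follow_count': 0,
     'initial_count': 1, 'third_follow_count': 1},
    {'second_follow_count': 1, 'first_follow_count': 1,
     'initial_count': 1, 'third_follow_count': 1},
]

sums = Counter()
for item in email_sent_count:
    # With a mapping argument, Counter.update() adds each value to the
    # running total for its key instead of overwriting it like dict.update().
    sums.update(item)

print(sums['second_follow_count'])  # 3
```

Note that adding Counters together with + would drop keys whose total is zero or negative, whereas update() keeps them.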

Or, if you prefer to do it yourself without using Counter or defaultdict:
from pprint import pprint
email_sent_count = [
{
'second_follow_count': 1,
'first_follow_count': 1,
'initial_count': 1,
'third_follow_count': 0
},
{
'second_follow_count': 1,
'first_follow_count': 0,
'initial_count': 1,
'third_follow_count': 1
},
{
'second_follow_count': 1,
'first_follow_count': 1,
'initial_count': 1,
'third_follow_count': 1
}
]
# create empty dict with all keys from any inner dict and initialized to 0
alls = dict( (x,0) for y in email_sent_count for x in y)
pprint(alls)
for d in email_sent_count:
    for k in d:
        alls[k] += d[k]
pprint(alls)
Output:
{'first_follow_count': 0,
'initial_count': 0,
'second_follow_count': 0,
'third_follow_count': 0}
{'first_follow_count': 2,
'initial_count': 3,
'second_follow_count': 3,
'third_follow_count': 2}
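If every dictionary has the same keys (as rows from a single .values() queryset do), the manual approach can also be written as one dict comprehension, which replaces the 11 separate sum(map(lambda ...)) calls. This sketch assumes the list is non-empty and that the keys of the first row are representative:

```python
email_sent_count = [
    {'second_follow_count': 1, 'first_follow_count': 1,
     'initial_count': 1, 'third_follow_count': 0},
    {'second_follow_count': 1, 'first_follow_count': 0,
     'initial_count': 1, 'third_follow_count': 1},
    {'second_follow_count': 1, 'first_follow_count': 1,
     'initial_count': 1, 'third_follow_count': 1},
]

# Assumes every row has the same keys as the first one; a row missing a
# key would raise KeyError, and an empty list would raise IndexError.
totals = {key: sum(row[key] for row in email_sent_count)
          for key in email_sent_count[0]}
print(totals)
```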

Finally, I changed the ORM query so that it gives me exactly the result I need.
The ORM query looks like this:
from django.db.models import Count, Sum
email_sent_count = campaign_contact.filter(i | f1 | f2 | f3).annotate(
    initial_count=Count('inital_contact'),
    first_follow_count=Count('first_followup'),
    second_follow_count=Count('second_followup'),
    third_follow_count=Count('third_followup'),
).aggregate(
    initial_sum=Sum('initial_count'),
    first_follow_sum=Sum('first_follow_count'),
    second_follow_sum=Sum('second_follow_count'),
    third_follow_sum=Sum('third_follow_count'),
)
Output:
{'third_follow_sum': 2, 'second_follow_sum': 3, 'first_follow_sum': 2, 'initial_sum': 3}
So there are no for loops and no lambdas; the summing is done by the database, so I think it should perform well.
Thanks to everyone for the solutions; looking at them helped me solve this. :)

Related

Python 2.7 current row index on 2d array iteration

When iterating over a 2D array, how can I get the current row index? For example:
x = [[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]
[ 9. 0. 3. 6.]]
Something like:
for rows in x:
    print x current index  (for example, when iterating on [ 5. 6. 7. 8.], return 1)
enumerate is a built-in Python function. Its usefulness cannot be summarized in a single line, yet many newcomers and even some advanced programmers are unaware of it. It allows us to loop over something and have an automatic counter. Here is an example:
for counter, value in enumerate(some_list):
    print(counter, value)
And there is more! enumerate also accepts an optional argument which makes it even more useful.
my_list = ['apple', 'banana', 'grapes', 'pear']
for c, value in enumerate(my_list, 1):
    print(c, value)
# Output:
# 1 apple
# 2 banana
# 3 grapes
# 4 pear
The optional argument tells enumerate where to start the index. You can also create a list of (index, item) tuples by passing the enumerate object to list(). Here is an example:
my_list = ['apple', 'banana', 'grapes', 'pear']
counter_list = list(enumerate(my_list, 1))
print(counter_list)
# Output: [(1, 'apple'), (2, 'banana'), (3, 'grapes'), (4, 'pear')]
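To make the role of the start argument concrete, here is a rough pure-Python equivalent of enumerate, modelled on the equivalent given in the Python documentation. The name my_enumerate is hypothetical, chosen so it does not shadow the built-in:

```python
def my_enumerate(iterable, start=0):
    # Yield (counter, item) pairs, with the counter beginning at `start`
    # and increasing by one for every item pulled from the iterable.
    n = start
    for elem in iterable:
        yield n, elem
        n += 1

my_list = ['apple', 'banana', 'grapes', 'pear']
pairs = list(my_enumerate(my_list, 1))
print(pairs)  # [(1, 'apple'), (2, 'banana'), (3, 'grapes'), (4, 'pear')]
```

Like the built-in, it is a lazy generator: counters are produced only as the loop asks for them.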
Use enumerate:
In [42]: x = [[ 1, 2, 3, 4],
...: [ 5, 6, 7, 8],
...: [ 9, 0, 3, 6]]
In [43]: for index, rows in enumerate(x):
...:     print('current index {}'.format(index))
...:     print('current row {}'.format(rows))
...:
current index 0
current row [1, 2, 3, 4]
current index 1
current row [5, 6, 7, 8]
current index 2
current row [9, 0, 3, 6]
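The question prints x in NumPy's format, and the same enumerate pattern works unchanged on a 2-D NumPy array, because iterating over such an array yields its rows. A small sketch, assuming x is a float ndarray as printed in the question:

```python
import numpy as np

x = np.array([[1., 2., 3., 4.],
              [5., 6., 7., 8.],
              [9., 0., 3., 6.]])

# Iterating over a 2-D ndarray yields one row (a 1-D array) at a time,
# so enumerate() pairs each row with its row index.
indices = [index for index, _ in enumerate(x)]

found = None
for index, row in enumerate(x):
    if np.array_equal(row, [5., 6., 7., 8.]):
        found = index

print(found)  # 1
```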

Subset a list of dict by keeping a maximum of duplicates

I have to subset a list of dicts, keeping at most a given number of duplicates for a key.
For instance, with max_duplicates = 2 on key 'main' and the following list:
[
{'main': 1, 'more': 1},
{'main': 1, 'more': 2},
{'main': 1, 'more': 3},
{'main': 2, 'more': 1},
{'main': 2, 'more': 1},
{'main': 2, 'more': 3},
{'main': 3, 'more': 1}
]
I would like to get:
[
{'main': 1, 'more': 1},
{'main': 1, 'more': 2},
{'main': 2, 'more': 1},
{'main': 2, 'more': 1},
{'main': 3, 'more': 1}
]
Which elements are kept for a given value can be random, and the key will always be the same.
I am looking for the most efficient solution. This is my current code:
from collections import Counter
import numpy
def remove_duplicates(initial_list, max_duplicates):
    main_counts = Counter([elem["main"] for elem in initial_list])
    main_values_for_selection = set([main_value for main_value, count in main_counts.iteritems()
                                     if count > max_duplicates])
    result = [elem for elem in initial_list
              if elem["main"] not in main_values_for_selection]
    for main_value in main_values_for_selection:
        all_indexes = [index for index, elem in enumerate(initial_list)
                       if elem["main"] == main_value]
        indexes = numpy.random.choice(a=all_indexes, size=max_duplicates, replace=False)
        result += [initial_list[i] for i in indexes]
    return result
Thanks in advance for your help ;-)
This method always keeps the first max_dups entries it sees for each value of 'main', rather than a random selection, but I think it is pretty efficient: it looks through the list only once and uses few temporary variables:
from collections import defaultdict
def remove_duplicates(initials,max_dups):
    dup_tracker = defaultdict(int)
    rets = []
    for entry in initials:
        if dup_tracker[entry['main']] < max_dups:
            dup_tracker[entry['main']] += 1
            rets.append(entry)
    return rets
max_dups = 2
initials = [
{'main': 1, 'more': 1},
{'main': 1, 'more': 2},
{'main': 1, 'more': 3},
{'main': 2, 'more': 1},
{'main': 2, 'more': 1},
{'main': 2, 'more': 3},
{'main': 3, 'more': 1}
]
rets = remove_duplicates(initials,max_dups)
print rets
To explain the code: defaultdict(int) creates a dictionary in which every key, even one that has not been set yet, starts off with a value of 0. Next, we loop through the list and keep track of how many entries with each 'main' value we've seen in dup_tracker, a dict keyed by the values of 'main' whose values are the number of times each has been seen. If dup_tracker has seen fewer than max_dups entries with that value, the entry is appended to the rets output list, which is returned at the end.
TIMED EDIT:
It looks like the method I implemented is at least a couple of orders of magnitude faster than yours. I included all the code I used to time them below.
TL;DR Yours 35.721 seconds vs mine 0.016 seconds when running on a list of 50,000 dicts, with values of main ranging from 0-10,000
from collections import Counter
import random
import time

import numpy

def remove_duplicates_1(initial_list, max_duplicates):
    main_counts = Counter([elem["main"] for elem in initial_list])
    main_values_for_selection = set([main_value for main_value, count in main_counts.iteritems()
                                     if count > max_duplicates])
    result = [elem for elem in initial_list
              if elem["main"] not in main_values_for_selection]

    for main_value in main_values_for_selection:
        all_indexes = [index for index, elem in enumerate(initial_list)
                       if elem["main"] == main_value]
        indexes = numpy.random.choice(a=all_indexes, size=max_duplicates, replace=False)
        result += [initial_list[i] for i in indexes]
    return result


def remove_duplicates_2(initials, max_dups):
    dup_tracker = {}
    rets = []
    for entry in initials:
        if entry['main'] not in dup_tracker:
            dup_tracker[entry['main']] = 1
            rets.append(entry)
        elif dup_tracker[entry['main']] < max_dups:
            dup_tracker[entry['main']] += 1
            rets.append(entry)
    return rets

def generate_test_list(num_total, max_main):
    test_list = []
    for it in range(num_total):
        main_value = round(random.random()*max_main)
        test_list.append({'main': main_value, 'more': it})
    return test_list

max_duplicates = 2
test_list = generate_test_list(50000, 10000)

start = time.time()
rets_1 = remove_duplicates_1(test_list, max_duplicates)
time_1 = time.time()-start

start = time.time()
rets_2 = remove_duplicates_2(test_list, max_duplicates)
time_2 = time.time()-start

print "Yours", time_1, "vs mine", time_2
#Results:
#Yours 35.7210621834 vs mine 0.0159771442413
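The question allows a random selection per value, which the first-seen approach above does not provide. A sketch that keeps the randomness while avoiding a rescan of the whole list for every duplicated value (as remove_duplicates_1 does) groups the positions in a single pass. The function name remove_duplicates_random and its key parameter are hypothetical, and it uses the standard random.sample instead of numpy.random.choice:

```python
import random
from collections import defaultdict

def remove_duplicates_random(initial_list, max_duplicates, key='main'):
    # One pass: group the positions of the entries by their key value.
    positions = defaultdict(list)
    for index, elem in enumerate(initial_list):
        positions[elem[key]].append(index)

    keep = []
    for idxs in positions.values():
        if len(idxs) > max_duplicates:
            # random.sample draws distinct positions (no replacement).
            idxs = random.sample(idxs, max_duplicates)
        keep.extend(idxs)

    # Sorting the kept positions preserves the original order of the input.
    return [initial_list[i] for i in sorted(keep)]

sample = [
    {'main': 1, 'more': 1},
    {'main': 1, 'more': 2},
    {'main': 1, 'more': 3},
    {'main': 2, 'more': 1},
    {'main': 2, 'more': 1},
    {'main': 2, 'more': 3},
    {'main': 3, 'more': 1},
]
result = remove_duplicates_random(sample, 2)
print(result)
```

Which entries survive for 'main' values 1 and 2 changes from run to run; the number kept per value does not.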

How to avoid using "no data" in image stacking

I am new to Python. My problem might seem easy, but unfortunately I could not find a solution for it. I have a set of images in GeoTIFF format, all of the same size; their pixel values range from 0 to 5 and their no-data value is -9999. I would like to do a kind of image stacking using NumPy and GDAL. I am looking for a stacking algorithm in which only the pixels of each image with a value between 0 and 5 are used, and the no-data values are excluded when computing the average. For example, if I have 30 images, and at index Image[20,20] two of them have the values 2 and 3 while the rest are -9999, I want the single-band output image to be 2.5 at this index. Does anyone know a way to do this?
Any suggestions or hints are highly appreciated.
Edit:
Let me clarify it a bit more. Here is a sample:
import numpy as np
myArray = np.random.randint(5,size=(3,3,3))
myArray [1,1,1] = -9999
myArray
>> array([[[ 0, 2, 1],
[ 1, 4, 1],
[ 1, 1, 2]],
[[ 4, 2, 0],
[ 3, -9999, 0],
[ 1, 0, 3]],
[[ 2, 0, 3],
[ 1, 3, 4],
[ 2, 4, 3]]])
Suppose that myArray is an ndarray that contains three images, as follows:
Image_01 = myArray[0]
Image_02 = myArray[1]
Image_03 = myArray[2]
The final stacked image is:
stackedImage = myArray.mean(axis=0)
>> array([[ 2.00000000e+00, 1.33333333e+00, 1.33333333e+00],
[ 1.66666667e+00, -3.33066667e+03, 1.66666667e+00],
[ 1.33333333e+00, 1.66666667e+00, 2.66666667e+00]])
But I want it to be this :
array([[ 2.00000000e+00, 1.33333333e+00, 1.33333333e+00],
[ 1.66666667e+00, 3.5, 1.66666667e+00],
[ 1.33333333e+00, 1.66666667e+00, 2.66666667e+00]])
Masked arrays are a good way to deal with missing or invalid values. Masked arrays have a .data attribute, which contains the numerical value for each element, and a .mask attribute that specifies which values should be considered 'invalid' and ignored.
Here's a full example using your data:
import numpy as np
# your example data, with a bad value at [1, 1, 1]
M = np.array([[[ 0, 2, 1],
[ 1, 4, 1],
[ 1, 1, 2]],
[[ 4, 2, 0],
[ 3, -9999, 0],
[ 1, 0, 3]],
[[ 2, 0, 3],
[ 1, 3, 4],
[ 2, 4, 3]]])
# create a masked array where all of the values in `M` that are equal to
# -9999 are masked
masked_M = np.ma.masked_equal(M, -9999)
# take the mean over the first axis
masked_mean = masked_M.mean(0)
# `masked_mean` is another `np.ma.masked_array`, whose `.data` attribute
# contains the result you're looking for
print masked_mean.data
# [[ 2. 1.33333333 1.33333333]
# [ 1.66666667 3.5 1.66666667]
# [ 1.33333333 1.66666667 2.66666667]]
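One caveat with reading .data directly: at a pixel that is no-data in every image, the mean is itself masked, and .data then holds a meaningless value. A sketch that writes the no-data value back at such pixels with filled(); the helper name stack_mean and the NODATA constant are hypothetical:

```python
import numpy as np

NODATA = -9999  # no-data value used in the question's images

M = np.array([[[0, 2, 1], [1, 4, 1], [1, 1, 2]],
              [[4, 2, 0], [3, NODATA, 0], [1, 0, 3]],
              [[2, 0, 3], [1, 3, 4], [2, 4, 3]]])

def stack_mean(images, nodata=NODATA):
    masked = np.ma.masked_equal(images, nodata)
    # filled() writes `nodata` wherever the mean itself is masked, i.e. at
    # pixels that are no-data in every image, instead of exposing whatever
    # value happens to sit in the underlying .data buffer there.
    return masked.mean(axis=0).filled(nodata)

stacked = stack_mean(M)
print(stacked[1, 1])  # 3.5

# A pixel that is no-data in all three images stays no-data in the output.
all_bad = M.copy()
all_bad[:, 0, 0] = NODATA
stacked_bad = stack_mean(all_bad)
print(stacked_bad[0, 0])  # -9999.0
```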

How to replace values in a list at indexed positions?

I have the following list of text positions, with all values set to '-999' by default:
List = [(70, 55), (170, 55), (270, 55), (370, 55),
(70, 85), (170, 85), (270, 85), (370, 85)]
for val in List:
    self.depth = wx.TextCtrl(panel, -1, value='-999', pos=val, size=(60,25))
I also have a list of indexes and the corresponding values for them, such as:
indx = ['2','3']
val = ['3.10','4.21']
I want to replace index locations '2' and '3' with values '3.10' and '4.21' respectively in 'List' and keep the rest as '-999'. Any suggestions?
Solved. I used the following example:
>>> s, l, m
([5, 4, 3, 2, 1, 0], [0, 1, 3, 5], [0, 0, 0, 0])
>>> d = dict(zip(l, m))
>>> d  # a dict is better than using two lists, I think
{0: 0, 1: 0, 3: 0, 5: 0}
>>> [d.get(i, j) for i, j in enumerate(s)]
[0, 0, 3, 0, 1, 0]
This example is from a similar question.
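Applied to the data in the question, the same dict-plus-get idea might look like the sketch below. It assumes the strings in indx are 0-based positions into List; the resulting values list could then be passed as value= when each TextCtrl is created (the wx part is left out so the sketch runs on its own):

```python
List = [(70, 55), (170, 55), (270, 55), (370, 55),
        (70, 85), (170, 85), (270, 85), (370, 85)]
indx = ['2', '3']
val = ['3.10', '4.21']

# The indexes arrive as strings, so convert them to int before using them
# as dictionary keys that the enumerate() positions can match.
overrides = dict(zip(map(int, indx), val))

# Every position keeps the default '-999' unless it has an override.
values = [overrides.get(i, '-999') for i, _ in enumerate(List)]
print(values)
```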

Change the values of a list?

liste = [1,2,8,12,19,78,34,197,1,-7,-45,-97,-32,23]
liste2 = []
def repetisjon(liste,liste2):
    for count in liste:
        if count >= 0:
            liste2.append(1)
        else:
            liste2.append(0)
    return liste2
print (repetisjon(liste,liste2))
The point is to change all the values of the list: if a value is greater than or equal to 0, it should be replaced by 1, and if it's lower than 0, it should be replaced by 0. But I wasn't able to change the current list; the only solution I found was to make a new list. Is there any way to CHANGE the current list without making a new one? I tried this as well, but it didn't work at all:
liste = [4,8,43,4,78,24,8,45,-78,-6,-7,-3,8,-12,4,36]
def repe (liste):
    for count in liste:
        if count > 0:
            count == 1
        else:
            count == 0
    print (liste)
repe(liste)
Here, I replace the contents of liste with the transformed data. Since sameliste points to the same list, its value changes too.
>>> sameliste = liste = [1,2,8,12,19,78,34,197,1,-7,-45,-97,-32,23]
>>> sameliste
[1, 2, 8, 12, 19, 78, 34, 197, 1, -7, -45, -97, -32, 23]
>>> liste
[1, 2, 8, 12, 19, 78, 34, 197, 1, -7, -45, -97, -32, 23]
>>> liste[:] = [int(x >= 0) for x in liste]
>>> liste
[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
>>> sameliste
[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
>>>
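Another way to change the list in place, closer to the loop in the question, is to assign through the index. In the attempted repe(), count == 1 is only a comparison, and even count = 1 would merely rebind the loop variable without touching the list. A minimal sketch, reusing the question's function name and data:

```python
liste = [4, 8, 43, 4, 78, 24, 8, 45, -78, -6, -7, -3, 8, -12, 4, 36]
alias = liste  # a second name for the same list object

def repe(liste):
    # Writing to liste[i] modifies the list object itself; the loop only
    # replaces existing items, so the list's length never changes.
    for i, value in enumerate(liste):
        liste[i] = 1 if value >= 0 else 0

repe(liste)
print(liste)  # [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
```

Because the same list object is modified, alias sees the new values too, just as sameliste does with the slice assignment above.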