Combine two Django Querysets based on common field - django

I have two Querysets (actually, lists of dicts) like:
q1 = M1.objects.filter(id=pk).values('p_id', 'q1_quantity')
# q1: <Queryset[{'p_id': 2, 'q1_quantity': 4}, {'p_id': 3, 'q1_quantity': 5}]>
q2 = M2.objects.filter(p_id__in=[q1[x]['p_id'] for x in range(len(q1))]).values('p_id', 'q2_quantity')
# q2: <Queryset[{'p_id': 2, 'q2_quantity': 2}, {'p_id': 2, 'q2_quantity': 5}, {'p_id': 3, 'q2_quantity': 1}, {'p_id': 3, 'q2_quantity': 7}]>
q1 has distinct key:value pairs, while q2 has repeated keys.
1) I want to sum all the values of q2 by common p_id, such that q2 becomes:
# q2: <Queryset[{'p_id': 2, 'q2_quantity': 7}, {'p_id': 3, 'q2_quantity': 8}]>
2) Then, merge q1 and q2 into q3, based on common p_id, like:
q3 = ?
# q3: <Queryset[{'p_id': 2, 'q1_quantity': 4, 'q2_quantity': 7}, {'p_id': 3, 'q1_quantity': 5, 'q2_quantity': 8}]>
I have looked into union(), but I don't know how to go about summing the queryset (q2) and then merging it with q1.
Can someone please help me?

The problem is that your models are inefficient: having two separate models with repeated fields forces you to make two queries. You may want to consider putting all the fields in one model, or having the M2 model extend M1.
models.py
class M(models.Model):
    p_id = # Your field...
    q1_quantity = # Your field...
    q2_quantity = # Your field...
Then in your views.py:
q = M.objects.filter(id=pk).values('p_id', 'q1_quantity', 'q2_quantity')
Potential issue: in the code you posted, the comments show a queryset of more than one object, but pk, as a primary key, should be unique and should therefore yield a queryset containing a single object.
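As an aside, if id really is the primary key, the idiomatic single-row lookup is get() rather than filter(); a minimal sketch:
obj = M.objects.get(id=pk)  # one instance; raises M.DoesNotExist if no row matches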

1) I want to sum all the values of q2 by common p_id, such that q2 becomes:
# q2: <Queryset[{'p_id': 2, 'q2_quantity': 7}, {'p_id': 3, 'q2_quantity': 8}]>
I used itertools.combinations:
from itertools import combinations

compare = []
for a, b in combinations(q2, 2):
    if a['p_id'] == b['p_id']:
        a['q2_quantity'] += b['q2_quantity']
        if len(compare) <= 0:
            compare.append(a)
        else:
            [compare[d]['q2_quantity'] for d in range(len(compare)) if a['p_id'] == compare[d]['p_id']]
    else:
        if len(compare) <= 0:
            compare.append(a)
            compare.append(b)
        else:
            if any([a['p_id'] == compare[d]['p_id'] for d in range(len(compare))]):
                pass
            else:
                compare.append(a)
            if any([b['p_id'] == compare[d]['p_id'] for d in range(len(compare))]):
                pass
            else:
                compare.append(b)
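For comparison, a minimal sketch of doing the same grouping in the database with Django's Sum aggregation (not the approach above; q2_total is a made-up alias, since annotating under the existing field name q2_quantity would clash with the model field):
from django.db.models import Sum

q2_summed = (M2.objects
             .filter(p_id__in=[row['p_id'] for row in q1])
             .values('p_id')                           # GROUP BY p_id
             .annotate(q2_total=Sum('q2_quantity')))   # SUM(q2_quantity) per group
# <QuerySet [{'p_id': 2, 'q2_total': 7}, {'p_id': 3, 'q2_total': 8}]>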
2) Then, merge q1 and q2 into q3, based on common p_id, like:
q3 = ?
# q3: <Queryset[{'p_id': 2, 'q1_quantity': 4, 'q2_quantity': 7}, {'p_id': 3, 'q1_quantity': 5, 'q2_quantity': 8}]>
As per this SO post:
from collections import defaultdict
from itertools import chain

collector = defaultdict(dict)
for collectible in chain(q1, compare):  # q1 from the question, compare from step 1
    collector[collectible['p_id']].update(collectible.items())
products = list(collector.values())
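For reference, a self-contained run of that merge with the sample data from the question:
from collections import defaultdict
from itertools import chain

q1 = [{'p_id': 2, 'q1_quantity': 4}, {'p_id': 3, 'q1_quantity': 5}]
compare = [{'p_id': 2, 'q2_quantity': 7}, {'p_id': 3, 'q2_quantity': 8}]

collector = defaultdict(dict)
for collectible in chain(q1, compare):
    # Dicts sharing a p_id are merged into one entry
    collector[collectible['p_id']].update(collectible)
print(list(collector.values()))
# [{'p_id': 2, 'q1_quantity': 4, 'q2_quantity': 7},
#  {'p_id': 3, 'q1_quantity': 5, 'q2_quantity': 8}]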

Related

How to get top 5 records in django dict data

I have two tables: 1) Visit and 2) Disease. The Visit table has a column for the disease. I am trying to get the top 5 diseases from the Visit table.
dis = disease.objects.all()
for d in dis:
    v = visits.objects.filter(disease=d.disease_name).count()
    data = {
        d.disease_name: v
    }
    print(data)
This prints every disease with its respective count, as below:
{'Headache': 2}
{'Cold': 1}
{'Cough': 4}
{'Dog Bite': 0}
{'Fever': 2}
{'Piles': 3}
{'Thyroid': 4}
{'Others': 9}
I want to get top 5 from this list based on count. How to do it?
Thank you all for your replies. I found another simple solution for it:
from django.db.models import Count
x = visits.objects.values('disease').annotate(disease_count=Count('disease')).order_by('-disease_count')[:5]
print(x)
It returns the following:
<QuerySet [{'disease': 'Others', 'disease_count': 9}, {'disease': 'Thyroid', 'disease_count': 4}, {'disease': 'Cough', 'disease_count': 4}, {'disease': 'Piles', 'disease_count': 3}, {'disease': 'Headache', 'disease_count': 2}]>
I think this is the simplest solution, and it works for me.
Add the data to a list and sort the list based on what you want:
dis = disease.objects.all()
l = list()
for d in dis:
    v = visits.objects.filter(disease=d.disease_name).count()
    data = {
        d.disease_name: v
    }
    l.append(data)
l.sort(reverse=True, key=lambda x: list(x.values())[0])
for i in range(min(len(l), 5)):
    print(l[i])
You can sort these values by writing code like this:
import operator

diseases = list(Disease.objects.values_list('disease_name', flat=True))
visits = list(
    Visits.objects.filter(disease__in=diseases).values_list('disease', flat=True))
data = {}
for name in diseases:
    count = visits.count(name)
    data[name] = count
sorted_data = sorted(data.items(), key=operator.itemgetter(1), reverse=True)
new_data = {}
for idx in range(min(len(sorted_data), 5)):
    item = sorted_data[idx]
    new_data[item[0]] = item[1]
print(new_data)
It's a little messy, but it does the job.
I also optimised your queries, so the code should run a bit faster: when you do logic like this, convert the querysets to lists with .values_list(...), because that caches the data in memory, and calling native Python functions such as list.count() on a list instead of a QuerySet is faster than hitting the database repeatedly.
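Along the same lines, a sketch (using the same model names as above) that replaces the counting loop entirely: collections.Counter tallies the cached list in one pass, and most_common(5) returns the top 5 directly:
from collections import Counter

disease_names = list(Visits.objects.values_list('disease', flat=True))
top_five = Counter(disease_names).most_common(5)  # [(name, count), ...] sorted by count
print(dict(top_five))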

TypeError: list indices must be integers, not set Python

What I am trying to do here is to take a list of sets as input and return a set of elements that occur in all of the given sets. I am getting a 'TypeError: list indices must be integers, not set' error. I do not understand why this is the case since range(len(list_of_sets)) is a list of integers.
def intersection_of_sets(list_of_sets):
    return reduce(lambda x, y: list_of_sets[x] & list_of_sets[y],
                  range(len(list_of_sets)))

print(intersection_of_sets([{1, 2, 3}, {2, 3, 4}, {2, 5}, {1, 2, 5}]))
The output I am going for is set([2]).
The problem occurs on the second iteration/step of the reduce() operation since the first iteration/step produced a set:
list_of_sets[0] & list_of_sets[1] # returns a set
You can observe it if you debug it and print out the values of x and y:
def intersection_of_sets(list_of_sets):
    def merge_function(x, y):
        print(x, y)
        return list_of_sets[x] & list_of_sets[y]
    return reduce(merge_function, range(len(list_of_sets)))
You would see printed:
0 1
{2, 3} 2 # < we've got a problem here
...
TypeError: list indices must be integers or slices, not set
What you meant to do is to reduce the list_of_sets itself:
def intersection_of_sets(list_of_sets):
    return reduce(lambda x, y: x & y, list_of_sets)
Demo:
In [1]: from functools import reduce
In [2]: def intersection_of_sets(list_of_sets):
   ...:     return reduce(lambda x, y: x & y, list_of_sets)
   ...:
In [3]: print(intersection_of_sets([{1, 2, 3}, {2, 3, 4}, {2, 5}, {1, 2, 5}]))
set([2])
Because you are indexing the list with a whole set, such as {2, 3}, rather than an integer: after the first step, the reduce passes its running result (a set) back in as x. Reduce over the sets themselves instead of over range(len(list_of_sets)).
Iterating over all of them works:
def intersection_of_sets(list_of_sets):
    res = list_of_sets[0]
    for s in list_of_sets[1:]:
        res &= s
    return res

print(intersection_of_sets([{1, 2, 3}, {2, 3, 4}, {2, 5}, {1, 2, 5}]))
Output:
{2}
Here &= is the in-place operator, meaning the same as res = res & s.
Sets have a built-in method, intersection:
a = {1, 2, 3}
b = {1, 2, 4}
c = {0, 2, 5}
a.intersection(b).intersection(c)
returns: {2}
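A related shortcut (assuming the list is non-empty): set.intersection accepts any number of sets, so the whole list from the question can be unpacked into a single call:
def intersection_of_sets(list_of_sets):
    # Raises TypeError on an empty list, since at least one set is required
    return set.intersection(*list_of_sets)

print(intersection_of_sets([{1, 2, 3}, {2, 3, 4}, {2, 5}, {1, 2, 5}]))
# {2}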
Just create a recursive function that intersects the first two sets and pushes the result back until only a single set is left:
def compareSets(arr):
    if len(arr) == 0:
        return set()
    if len(arr) == 1:
        return arr[0]
    # Replace the first two sets with their intersection and recurse
    intersect = arr[0].intersection(arr[1])
    return compareSets([intersect] + arr[2:])

Subset a list of dict by keeping a maximum of duplicates

I have to subset a list of dicts, with a condition on duplicate keys.
For instance, with max_duplicates = 2 on the key 'main' and the following list:
[
{'main': 1, 'more': 1},
{'main': 1, 'more': 2},
{'main': 1, 'more': 3},
{'main': 2, 'more': 1},
{'main': 2, 'more': 1},
{'main': 2, 'more': 3},
{'main': 3, 'more': 1}
]
I would like to get:
[
{'main': 1, 'more': 1},
{'main': 1, 'more': 2},
{'main': 2, 'more': 1},
{'main': 2, 'more': 1},
{'main': 3, 'more': 1}
]
The selected elements for a given key can be random, and the key will always be the same.
I am looking for the most optimized solution. For now, this is my code:
from collections import Counter
import numpy
def remove_duplicates(initial_list, max_duplicates):
    main_counts = Counter([elem["main"] for elem in initial_list])
    main_values_for_selection = set([main_value for main_value, count in main_counts.iteritems()
                                     if count > max_duplicates])
    result = [elem for elem in initial_list
              if elem["main"] not in main_values_for_selection]
    for main_value in main_values_for_selection:
        all_indexes = [index for index, elem in enumerate(initial_list)
                       if elem["main"] == main_value]
        indexes = numpy.random.choice(a=all_indexes, size=max_duplicates, replace=False)
        result += [initial_list[i] for i in indexes]
    return result
Thanks in advance for your help ;-)
This method always keeps the first max_dups entries it sees for a given key, but I think it is pretty efficient: it only looks through the list once, with few temporary variables:
from collections import defaultdict
def remove_duplicates(initials, max_dups):
    dup_tracker = defaultdict(int)
    rets = []
    for entry in initials:
        if dup_tracker[entry['main']] < max_dups:
            dup_tracker[entry['main']] += 1
            rets.append(entry)
    return rets
max_dups = 2
initials = [
{'main': 1, 'more': 1},
{'main': 1, 'more': 2},
{'main': 1, 'more': 3},
{'main': 2, 'more': 1},
{'main': 2, 'more': 1},
{'main': 2, 'more': 3},
{'main': 3, 'more': 1}
]
rets = remove_duplicates(initials,max_dups)
print rets
To explain the code: defaultdict(int) creates a dictionary in which every key, even one not yet defined, starts off with a value of 0. We then loop through the list, using dup_tracker to count how many entries we've seen for each key; it is a dict keyed by the values of 'main' and valued by the number of times each value has appeared. Whenever dup_tracker has seen fewer than max_dups entries for the given key, the entry is appended to the rets output list, which is returned at the end.
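A two-line illustration of that defaultdict(int) behaviour:
from collections import defaultdict

dup_tracker = defaultdict(int)
print dup_tracker['never_seen']  # 0: missing keys start at zero
dup_tracker['never_seen'] += 1   # so counting needs no existence check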
TIMED EDIT:
It looks like the method I implemented is at least a couple of orders of magnitude faster than yours. I included all the code I used to time them below.
TL;DR Yours 35.721 seconds vs mine 0.016 seconds when running on a list of 50,000 dicts, with values of main ranging from 0-10,000
from collections import Counter
import random
import time

import numpy

def remove_duplicates_1(initial_list, max_duplicates):
    main_counts = Counter([elem["main"] for elem in initial_list])
    main_values_for_selection = set([main_value for main_value, count in main_counts.iteritems()
                                     if count > max_duplicates])
    result = [elem for elem in initial_list
              if elem["main"] not in main_values_for_selection]
    for main_value in main_values_for_selection:
        all_indexes = [index for index, elem in enumerate(initial_list)
                       if elem["main"] == main_value]
        indexes = numpy.random.choice(a=all_indexes, size=max_duplicates, replace=False)
        result += [initial_list[i] for i in indexes]
    return result

def remove_duplicates_2(initials, max_dups):
    dup_tracker = {}
    rets = []
    for entry in initials:
        if entry['main'] not in dup_tracker:
            dup_tracker[entry['main']] = 1
            rets.append(entry)
        elif dup_tracker[entry['main']] < max_dups:
            dup_tracker[entry['main']] += 1
            rets.append(entry)
    return rets

def generate_test_list(num_total, max_main):
    test_list = []
    for it in range(num_total):
        main_value = round(random.random() * max_main)
        test_list.append({'main': main_value, 'more': it})
    return test_list

max_duplicates = 2
test_list = generate_test_list(50000, 10000)

start = time.time()
rets_1 = remove_duplicates_1(test_list, max_duplicates)
time_1 = time.time() - start

start = time.time()
rets_2 = remove_duplicates_2(test_list, max_duplicates)
time_2 = time.time() - start

print "Yours", time_1, "vs mine", time_2
# Results:
# Yours 35.7210621834 vs mine 0.0159771442413

Break django values down into count of each value

I have a model defined similar to the one below:
class MyModel(models.Model):
    num_attempts = models.IntegerField()
    num_generated = models.IntegerField()
    num_deleted = models.IntegerField()
Assuming my data looked something like this:
| id | num_attempts | num_generated | num_deleted |
|----|--------------|---------------|-------------|
|  1 |            1 |             2 |           0 |
|  2 |            2 |             0 |           1 |
|  3 |            3 |             2 |           1 |
|  4 |            3 |             1 |           2 |
I want to get a count of the instances at each possible value for each possible field.
For example, a return sample could look like this.
{
    'num_attempts_at_1': 1,
    'num_attempts_at_2': 1,
    'num_attempts_at_3': 2,
    'num_generated_at_0': 1,
    'num_generated_at_1': 1,
    'num_generated_at_2': 2,
    'num_deleted_at_0': 1,
    'num_deleted_at_1': 2,
    'num_deleted_at_2': 1
}
The example above assumes a lot, like the naming of the keys and that the result would be serialized. None of that matters; the question is just how to get the data broken down like that from the database, ideally in one query.
We are using Postgres as the database.
Here is something sorta close, but not quite:
qs.values('num_attempts', 'num_generated', 'num_deleted').annotate(Count('id'))
This gives the following (not the same data as in the example above):
[{'num_attempts': 4, 'id__count': 3, 'num_deleted': 3, 'num_generated': 6}, {'num_attempts': 3, 'id__count': 12, 'num_deleted': 2, 'num_generated': 2}, {'num_attempts': 2, 'id__count': 5, 'num_deleted': 0, 'num_generated': 6}]
Now, with some custom Python, I was able to do this, but I really want a database solution if possible:
def get(self, request, *args, **kwargs):
    qs = self.get_queryset()
    return_data = {}
    for obj in qs:
        count = obj.pop('id__count')
        for k, v in obj.items():
            key = "{}_at_{}".format(k, v)
            value = return_data.get(key, 0) + count
            return_data[key] = value
    return Response(return_data)
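One possible way to keep this in the database, sketched under the assumption of Django 2.0+ (where aggregates accept a filter argument): build one filtered Count per value actually present, then run them all in a single aggregate() call. Note that value_counts is a made-up helper name:
from django.db.models import Count, Q

def value_counts(qs, fields):
    aggregates = {}
    for field in fields:
        # One Count per distinct value observed in this column
        for value in qs.values_list(field, flat=True).distinct():
            aggregates['{}_at_{}'.format(field, value)] = Count(
                'pk', filter=Q(**{field: value}))
    return qs.aggregate(**aggregates)

# value_counts(MyModel.objects.all(),
#              ['num_attempts', 'num_generated', 'num_deleted'])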

How do I check for duplicate values present in a dictionary?

I want to write a function that takes a dictionary as the input and returns a list of the keys.
The list must contain only the keys of the values that are unique in the dictionary.
So, this is what I have done.
bDict = {}
for key, value in aDict.items():
    if bDict.has_key(value) == False:
        bDict[value] = key
    else:
        bDict.pop(value, None)
This is the output :
>>> aDict.keys()
Out[4]: [1, 3, 6, 7, 8, 10]
>>> aDict.values()
Out[5]: [1, 2, 0, 0, 4, 0]
>>> bDict.keys()
Out[6]: [0, 1, 2, 4]
>>> bDict.values()
Out[7]: [10, 1, 3, 8]
But the expected output for bDict.values() is [1, 3, 8].
This may help.
CODE
aDict = {1: 1, 3: 2, 6: 0, 7: 0, 8: 4, 10: 0, 11: 0}
bDict = {}
for i, j in aDict.items():
    if j not in bDict:
        bDict[j] = [i]
    else:
        bDict[j].append(i)
print map(lambda x: x[0], filter(lambda x: len(x) == 1, bDict.values()))
OUTPUT
[1, 3, 8]
So it appears you're creating a new dictionary with the keys and values inverted, keeping pairs where the value is unique. You can figure out which of the items are unique first then build a dictionary off of that.
def distinct_values(d):
    from collections import Counter
    counts = Counter(d.itervalues())
    return {v: k for k, v in d.iteritems() if counts[v] == 1}
This yields the following result:
>>> distinct_values({ 1:1, 3:2, 6:0, 7:0, 8:4, 10:0 })
{1: 1, 2: 3, 4: 8}
Here is a solution (with two versions of aDict, to test a case which failed in another solution):
# aDict = {1: 1, 3: 2, 6: 0, 7: 0, 8: 4, 10: 0}
aDict = {1: 1, 3: 2, 6: 0, 7: 0, 8: 4, 10: 0, 11: 2}

seenValues = {}
uniqueKeys = set()
for aKey, aValue in aDict.items():
    if aValue not in seenValues:
        # Store the key of the value, and assume it is unique
        seenValues[aValue] = aKey
        uniqueKeys.add(aKey)
    elif seenValues[aValue] in uniqueKeys:
        # The value has been seen before, and the assumption of
        # it being unique was wrong, so remove it
        uniqueKeys.remove(seenValues[aValue])
        print "Remove non-unique key/value pair: {%d, %d}" % (aKey, aValue)
    else:
        print "Non-unique key/value pair: {%d, %d}" % (aKey, aValue)
print "Unique keys: ", sorted(uniqueKeys)
And this produces the output:
Remove non-unique key/value pair: {7, 0}
Non-unique key/value pair: {10, 0}
Remove non-unique key/value pair: {11, 2}
Unique keys: [1, 8]
Or with original version of aDict:
Remove non-unique key/value pair: {7, 0}
Non-unique key/value pair: {10, 0}
Unique keys: [1, 3, 8]
As a Python 2.7 one-liner:
[k for k,v in aDict.iteritems() if aDict.values().count(v) == 1]
Note that the above:
1) calls aDict.values() many times, once for each entry in the dictionary, and
2) calls aDict.values().count(v) multiple times for each replicated value.
This is not a problem if the dictionary is small. If the dictionary isn't small, the creation and destruction of those duplicative lists and the duplicative calls to count() may be costly. It may help to cache the value of aDict.values(), and it may also help to build a dictionary that maps each value to its number of occurrences.
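A sketch of that caching advice with collections.Counter, which tallies every value in a single pass:
from collections import Counter

def unique_value_keys(d):
    counts = Counter(d.values())  # value -> number of occurrences
    return [k for k, v in d.items() if counts[v] == 1]

print(unique_value_keys({1: 1, 3: 2, 6: 0, 7: 0, 8: 4, 10: 0}))
# [1, 3, 8]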