Getting count of one column after a distinct on two columns

Here is a simplified representation of my models:
class Post(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    template_id = models.IntegerField(null=True)
    ...
What I want to do is show the number of users who have used each template. So when I list out the templates, I want to be able to say "Used by X users". The catch is that I only want to count each user once (so if a user uses a template twice, they still count as "one use case"). All Stack Overflow posts talk about doing something like this:
counts = Post.objects.all().values("template_id").order_by().annotate(count=Count("template_id"))
But that obviously double counts a user that uses the same template twice. I was able to do a distinct on template_id and user pairings like so:
Post.objects.all().values("template_id", "user__id").distinct()
# Printing this out, I get 2 distinct entries in the QuerySet:
# <QuerySet [{'template_id': 1, 'user__id': 1}, {'template_id': 1, 'user__id': 2}]>
However, when I try to get the counts of template_id (the code below), it seems to ignore the distinct and still double counts users.
Post.objects.all().values("template_id", "user__id").distinct().values("template_id").annotate(count=Count("template_id"))
# Printing this out I get `count` = 3, which double counts a user.
# <QuerySet [{'template_id': 1, 'count': 3}]>
For what it's worth, I wrote a quick test case, which is currently failing.
user1 = baker.make("User")
user2 = baker.make("User")
# Populate posts
quest1 = baker.make("post.Post", user=user1, template_id=1)
quest2 = baker.make("post.Post", user=user1, template_id=1) # Duplicate shouldn't count
quest3 = baker.make("post.Post", user=user2, template_id=1)

Got it to work using Django's built-in ORM by doing the following:
from django.db.models import Count

template_ids = []  # My templates
# Get the number of distinct users that used each template_id.
top_template_counts = (
    Post.objects.filter(template_id__in=template_ids)
    .values("template_id")  # groups by template id
    .annotate(user_count=Count("user", distinct=True))  # number of distinct users per template
    .order_by("-user_count")
)
# Accessing `top_template_counts`
for template_id_count in top_template_counts[:10]:
    template_id = template_id_count["template_id"]
    count = template_id_count["user_count"]
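
To see why the distinct count fixes the double counting, it may help to look at the SQL level, where a distinct Count corresponds to COUNT(DISTINCT ...). The following is a minimal sketch using only the standard-library sqlite3 module; the `post` table and its columns are hypothetical stand-ins for the Post model, and the rows mirror the test case from the question.

```python
import sqlite3

# Hypothetical in-memory table standing in for the Post model.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, user_id INTEGER, template_id INTEGER)"
)
conn.executemany(
    "INSERT INTO post (user_id, template_id) VALUES (?, ?)",
    [(1, 1), (1, 1), (2, 1)],  # user 1 uses template 1 twice
)

# A plain COUNT counts every row in the group, so user 1 is counted twice.
row_counts = conn.execute(
    "SELECT template_id, COUNT(template_id) FROM post GROUP BY template_id"
).fetchall()

# COUNT(DISTINCT ...) counts each user once per template.
user_counts = conn.execute(
    "SELECT template_id, COUNT(DISTINCT user_id) FROM post GROUP BY template_id"
).fetchall()

print(row_counts)   # [(1, 3)]
print(user_counts)  # [(1, 2)]
```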

Why not just use:
counts = Post.objects.all().values("template_id", "user__id").distinct().values("template_id").count()

Related

Optimize request in the FOR loop

How can I optimize the following request to eliminate the loop? There are several hundred codes, so I end up with several hundred database queries, which is unacceptable.
n = 3
result = []
codes = Target.objects.filter(code__in=['ABC', 'CDE', ...])
for code in codes:
    result.append(Data.objects.select_related('target')
                  .filter(target__code=code)
                  .values('spec', 'spec_type')
                  .order_by('-spec')[:n])
Models:
class Data(models.Model):
    target = models.ForeignKey(Target)
    spec_type = models.CharField()
    spec = models.FloatField()

class Target(models.Model):
    code = models.TextField(db_index=True)

You do not have to retrieve the codes as a QuerySet to iterate over. We can work directly with the list of codes.
If you want a single QuerySet that contains all the requested elements, you can combine the per-code QuerySets with .union(…):
codes = ['ABC', 'CDE']
n = 3
result = Data.objects.none().union(
    *[
        Data.objects.filter(target__code=code).values('spec', 'spec_type').order_by('-spec')[:n]
        for code in codes
    ],
    all=True
)

Like @Willem Van Onsem said, you don't need to get a queryset of your Target objects, since you already seem to have the codes that you want. Just store the codes in a variable and then run a Django query using that list.
codes = ['ABC', 'CDE', ...]
result = Data.objects.filter(target__code__in=codes)
This query should return all Data objects whose related Target object's code is in the list codes.
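
Note that the `__in` query returns every matching row rather than the top n per code. If fetching all of them in one query is acceptable, the trimming could be done in Python afterwards. Here is a minimal sketch of that step; the rows are made-up plain dicts, shaped like what `.values('target__code', 'spec', 'spec_type')` might return, and `top_n_per_code` is a hypothetical helper name.

```python
import heapq
from collections import defaultdict

# Made-up rows standing in for the result of a single database query.
rows = [
    {"target__code": "ABC", "spec": 1.0, "spec_type": "x"},
    {"target__code": "ABC", "spec": 4.0, "spec_type": "y"},
    {"target__code": "ABC", "spec": 2.5, "spec_type": "x"},
    {"target__code": "ABC", "spec": 3.0, "spec_type": "z"},
    {"target__code": "CDE", "spec": 0.5, "spec_type": "x"},
    {"target__code": "CDE", "spec": 9.0, "spec_type": "y"},
]


def top_n_per_code(rows, n):
    """Group rows by code and keep the n rows with the largest spec in each group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["target__code"]].append(row)
    return {
        code: heapq.nlargest(n, items, key=lambda r: r["spec"])
        for code, items in grouped.items()
    }


top = top_n_per_code(rows, n=3)
print([r["spec"] for r in top["ABC"]])  # [4.0, 3.0, 2.5]
print([r["spec"] for r in top["CDE"]])  # [9.0, 0.5]
```

This trades one larger query for several hundred small ones, so whether it is a win depends on how many rows each code has.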

Django - creating and saving multiple object in a loop, with ForeignKeys

I am having trouble creating and saving objects in Django. I am very new to Django so I'm sure I'm missing something very obvious!
I am building a price comparison app, and I have a Search model:
Search - all searches carried out, recording best price, worst price, product searched for, time of search etc. I have successfully managed to save these searches to a DB and am happy with this model.
The two new models I am working with are:
Result - this is intended to record all search results returned, for each search carried out. I.e. Seller 1 £100, Seller 2 £200, Seller 3, £300. (One search has many search results).
'Agent' - a simple table of Agents that I compare prices at. (One Agent can have many search Results).
class Agent(models.Model):
    agent_id = models.AutoField(primary_key=True)
    agent_name = models.CharField(max_length=30)

class Result(models.Model):
    search_id = models.ForeignKey(Search, on_delete=models.CASCADE)  # Foreign Key of Search table
    agent_id = models.ForeignKey(Agent, on_delete=models.CASCADE)  # Foreign Key of Agent table
    price = models.FloatField()
    search_position = models.IntegerField()
My code that is creating and saving the objects is here:
def update_search_table(listed, product):
    if len(listed) > 0:
        search = Search(product=product,
                        no_of_agents=len(listed),
                        valid_search=1,
                        best_price=listed[0]['cost'],
                        worst_price=listed[-1]['cost'])
        search.save()
        for i in range(len(listed)):
            agent = Agent.objects.get(agent_name = listed[i]['company'])
            # print(agent.agent_id) # Prints expected value
            # print(search.search_id) # Prints expected value
            # print(listed[i]['cost']) # Prints expected value
            # print(i + 1) # Prints expected value
            result = Result(search_id = search,
                            agent_id = agent,
                            price = listed[i]['cost'],
                            position = i + 1)
            search.result_set.add(result)
            agent.result_set.add(result)
            result.save()
Up to search.save() is working as expected.
The first line of the for loop is also correctly retrieving the relevant Agent.
The rest of it is going wrong (i.e. not saving any Result objects to the Result table). What I want to achieve is, if there are 10 different agent results returned, create 10 Result objects and save each one. Link each of those 10 objects to the Search that triggered the results, and link each of those 10 objects to the relevant Agent.
Have tried quite a few iterations but not sure where I'm going wrong.
Thanks

Giving relations an order to sort by

Given the following Django models:
class Room(models.Model):
    name = models.CharField(max_length=20)

class Beacon(models.Model):
    room = models.ForeignKey(Room)
    uuid = models.UUIDField(default=uuid.uuid4)
    major = models.PositiveIntegerField(max_value=65536)
    minor = models.PositiveIntegerField(max_value=65536)
The Beacon model represents a Bluetooth beacon that is associated with a room.
I want to select all Rooms that match a given uuid, major, minor combination.
The catch is, that I want to order the rooms by the beacon that is nearest to me. Because of this, I need to be able to assign a value to each beacon dynamically, and then sort by it.
Is this possible with the Django ORM? In Django 1.8?
NOTE - I will know the ordering of the beacons beforehand, I will be using the order they are passed in the query string. So the first beacon (uuid, major, minor) passed should match the first room that is returned by the Room QuerySet
I am envisioning something like this, though I know this won't work:
beacon_order = [
    beacon1 = 1,
    beacon0 = 2,
    beacon3 = 3,
]
queryset = Room.objects.annotate(beacon_order=beacon_order).\
    order_by('beacon_order')

If you already know the order of the beacons, there's no need to sort within the QuerySet itself. Take an ordered list called beacon_list, which contains the beacons' primary keys in order, e.g. the item at index 0 is the closest beacon's primary key, the item at index 1 is the second closest beacon's PK, etc. Then use a list comprehension:
ordered_rooms = [Room.objects.get(pk=x) for x in beacon_list]
You don't have to use the PK either, you can use anything which identifies the given object in the database, e.g. the name field.
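
The list comprehension above issues one query per item. A possible variation is to fetch the matching rows once and reorder them in Python by their position in the known list. The sketch below shows only the reordering step, with plain dicts standing in for model instances; `order_by_known_keys` is a hypothetical helper, not part of Django.

```python
def order_by_known_keys(records, ordered_keys, key):
    """Return the records whose key appears in ordered_keys, in that order."""
    position = {k: i for i, k in enumerate(ordered_keys)}
    matching = [r for r in records if key(r) in position]
    return sorted(matching, key=lambda r: position[key(r)])


# Made-up records, as they might come back from a single query in arbitrary order.
rooms = [
    {"pk": 1, "name": "kitchen"},
    {"pk": 2, "name": "office"},
    {"pk": 3, "name": "lobby"},
]
beacon_list = [3, 1]  # closest first

ordered_rooms = order_by_known_keys(rooms, beacon_list, key=lambda r: r["pk"])
print([r["name"] for r in ordered_rooms])  # ['lobby', 'kitchen']
```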

Looks like this works:
from django.db.models import Case, Q, When

beacons = request.query_params.getlist('beacon[]')
query = Q()
order = []
for pos, beacon in enumerate(beacons):
    uuid, major, minor = beacon.split(':')
    query |= Q(
        beacon__uuid=uuid,
        beacon__major=major,
        beacon__minor=minor,
    )
    order.append(When(
        beacon__uuid=uuid,
        beacon__major=major,
        beacon__minor=minor,
        then=pos,
    ))
rooms = Room.objects.filter(query).order_by(Case(*order))
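
Ordering by a Case of When clauses corresponds to an ORDER BY CASE ... END expression in SQL. As a rough illustration of the idea, here is a minimal sketch with the standard-library sqlite3 module; the `room` and `beacon` tables and their rows are made up for the example, and the SQL is hand-written rather than what Django would emit verbatim.

```python
import sqlite3

# Hypothetical tables standing in for the Room and Beacon models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE room (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE beacon (
    id INTEGER PRIMARY KEY, room_id INTEGER, uuid TEXT, major INTEGER, minor INTEGER
);
INSERT INTO room (id, name) VALUES (1, 'kitchen'), (2, 'office'), (3, 'lobby');
INSERT INTO beacon (room_id, uuid, major, minor)
VALUES (1, 'u1', 1, 1), (2, 'u2', 1, 2), (3, 'u3', 1, 3);
""")

# Beacons in the order they were passed in, nearest first.
beacons = [("u3", 1, 3), ("u1", 1, 1)]

match = "b.uuid = ? AND b.major = ? AND b.minor = ?"
where = " OR ".join(f"({match})" for _ in beacons)
cases = " ".join(f"WHEN {match} THEN {pos}" for pos in range(len(beacons)))
params = [value for beacon in beacons for value in beacon]

sql = (
    "SELECT r.name FROM room r JOIN beacon b ON b.room_id = r.id "
    f"WHERE {where} ORDER BY CASE {cases} END"
)
# The placeholders appear twice: once in WHERE, once in ORDER BY.
names = [row[0] for row in conn.execute(sql, params + params)]
print(names)  # ['lobby', 'kitchen']
```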

Django ORM. Filter many to many with AND clause

With the following models:
class Item(models.Model):
    name = models.CharField(max_length=255)
    attributes = models.ManyToManyField(ItemAttribute)

class ItemAttribute(models.Model):
    attribute = models.CharField(max_length=255)
    string_value = models.CharField(max_length=255)
    int_value = models.IntegerField()
I also have an Item which has 2 attributes, 'color': 'red', and 'size': 3.
If I do any of these queries:
Item.objects.filter(attributes__string_value='red')
Item.objects.filter(attributes__int_value=3)
I get the Item returned, which works as I expected.
However, if I try to combine the conditions in one query, like:
Item.objects.filter(attributes__string_value='red', attributes__int_value=3)
All I want to do is an AND. This won't work either:
Item.objects.filter(Q(attributes__string_value='red') & Q(attributes__int_value=3))
The output is:
<QuerySet []>
Why? How can I build such a query that my Item is returned, because it has the attribute red and the attribute 3?

If it's of any use, you can chain filter expressions in Django:
query = Item.objects.filter(attributes__string_value='red').filter(attributes__int_value=3)
From the DOCS:
This takes the initial QuerySet of all entries in the database, adds a filter, then an exclusion, then another filter. The final result is a QuerySet containing all entries with a headline that starts with “What”, that were published between January 30, 2005, and the current day.
To do it with .filter() but with dynamic arguments:
args = {
    '{0}__{1}'.format('attributes', 'string_value'): 'red',
    '{0}__{1}'.format('attributes', 'int_value'): 3
}
Item.objects.filter(**args)
You can also (if you need a mix of AND and OR) use Django's Q objects.
Keyword argument queries – in filter(), etc. – are “AND”ed together. If you need to execute more complex queries (for example, queries with OR statements), you can use Q objects.
A Q object (django.db.models.Q) is an object used to encapsulate a collection of keyword arguments. These keyword arguments are specified as in “Field lookups” above.
You would have something like this instead of having all the Q objects within that filter:
from django.db.models import Q
from .models import Item

# Assuming `kwargs` is a dict of lookups, e.g. {'attributes__string_value': 'red'}
final_q_expression = Q()
for lookup, value in kwargs.items():
    final_q_expression &= Q(**{lookup: value})
result = Item.objects.filter(final_q_expression)
This is code I haven't run; it's off the top of my head. Treat it as pseudo-code if you will.
That said, this doesn't answer why the approaches you've tried don't work. It probably has to do with lookups that span relationships and the tables that get joined to fetch those values. I would suggest printing your_queryset.query to see the raw SQL that is being generated; that might help explain why .filter(Q() & Q()) is not working.
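
On the "why": for multi-valued relationships, Django's documentation describes conditions inside a single filter() call as applying to the same related object, while successive filter() calls may each match a different related object. No single ItemAttribute row is both 'red' and 3, so the combined filter finds nothing. The difference can be sketched at the SQL level with the standard-library sqlite3 module; the table names and rows below are hypothetical, and the SQL is hand-written to illustrate one join versus two, not copied from Django's output.

```python
import sqlite3

# Hypothetical tables standing in for Item, ItemAttribute and the M2M join table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attr (
    id INTEGER PRIMARY KEY, attribute TEXT, string_value TEXT, int_value INTEGER
);
CREATE TABLE item_attrs (item_id INTEGER, attr_id INTEGER);
INSERT INTO item VALUES (1, 'shirt');
INSERT INTO attr VALUES (1, 'color', 'red', 0), (2, 'size', '', 3);
INSERT INTO item_attrs VALUES (1, 1), (1, 2);
""")

# One join: both conditions must hold on the SAME attribute row.
single_join = conn.execute("""
SELECT DISTINCT i.name FROM item i
JOIN item_attrs ia ON ia.item_id = i.id
JOIN attr a ON a.id = ia.attr_id
WHERE a.string_value = 'red' AND a.int_value = 3
""").fetchall()

# Two joins: each condition may be satisfied by a different attribute row.
two_joins = conn.execute("""
SELECT DISTINCT i.name FROM item i
JOIN item_attrs ia1 ON ia1.item_id = i.id
JOIN attr a1 ON a1.id = ia1.attr_id
JOIN item_attrs ia2 ON ia2.item_id = i.id
JOIN attr a2 ON a2.id = ia2.attr_id
WHERE a1.string_value = 'red' AND a2.int_value = 3
""").fetchall()

print(single_join)  # []
print(two_joins)    # [('shirt',)]
```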

Reverse Count of ManytoManyField with a condition

I have a use case where I have to count occurrences of a ManyToManyField, but it's getting more complex than I expected.
models.py:
class Tag(models.Model):
    name = models.CharField(max_length=100, unique=True)

class People(models.Model):
    tag = models.ManyToManyField(Tag, blank=True)
Here I have to come up with a list of Tags and the number of times they appear overall but only for those People who have >0 and <6 tags. Something like:
tag1 - 265338
tag2 - 4649303
tag3 - 36636
...
This is how I came up with the count initially:
q = People.objects.annotate(tag_count=Count('tag')).filter(tag_count__lte=6, tag_count__gt=0)
for tag in Tag.objects.all():
    cnt = q.filter(tag__name=tag.name).count()
    # doing something with the cnt
But I later realised that this may be inefficient since I am probably iterating through the People model many times (Records in People are way larger than those in Tag).
Intuitively I think I should be able to do one iteration of the Tag model without any iteration of the People model. So then I came up with this:
for tag in Tag.objects.all():
    cnt = tag.people_set.annotate(tag_count=Count('tag')).filter(tag_count__lte=6).count()
    # doing something with the cnt
But, first, this is not producing the expected results. Second, I think this has become more complex than it seemed, so perhaps I am complicating a simple thing. All ears to any advice.
Update: I got queryset.query and ran the query on the db to debug it. For some reason, the tag_count column in the resulting join shows all 1's. Can't seem to understand why.

This can be done with a reverse ManyToMany field query.
That also reduces the overhead and shifts most of the work from Python to the database server.
from some_app.models import Tag, People
from django.db.models import F, Value, Count, CharField
from django.db.models.functions import Concat

# queryset: people with tags >0 and <6, i.e. 1 to 5 tags
people_qualified = People.objects.annotate(tag_count=Count('tag'))\
    .filter(tag_count__range=(1, 5))

# query tags used with above category of people, with count
tag_usage = Tag.objects.filter(people__in=people_qualified)\
    .annotate(tag=F('name'), count=Count('people'))\
    .values('tag', 'count')
# Result: <QuerySet [{'count': 3, 'tag': u'hello'}, {'count': 2, 'tag': u'world'}]>

# similarly, if needed, the string output
tag_usage_list = Tag.objects.filter(people__in=people_qualified)\
    .annotate(tags=Concat(F('name'), Value(' - '), Count('people'),
                          output_field=CharField()))\
    .values_list('tags', flat=True)
# Result: <QuerySet [u'hello - 3', u'world - 2']>
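
The shape of the computation, counting tag usage only among people who have between 1 and 5 tags, can also be sketched directly in SQL. The example below uses the standard-library sqlite3 module with hypothetical `tag` and `people_tag` tables and made-up rows; it illustrates the grouping logic and is not the SQL Django generates for the querysets above.

```python
import sqlite3

# Hypothetical tables standing in for Tag and the People<->Tag join table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE people_tag (people_id INTEGER, tag_id INTEGER);
INSERT INTO tag VALUES (1, 'hello'), (2, 'world'), (3, 'a'), (4, 'b'), (5, 'c'), (6, 'd');
INSERT INTO people_tag VALUES
    (1, 1), (1, 2),                                   -- person 1: 2 tags
    (2, 1),                                           -- person 2: 1 tag
    (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6);   -- person 3: 6 tags, excluded
""")

tag_usage = conn.execute("""
SELECT t.name, COUNT(*)
FROM tag t
JOIN people_tag pt ON pt.tag_id = t.id
WHERE pt.people_id IN (
    -- people with 1 to 5 tags
    SELECT people_id FROM people_tag
    GROUP BY people_id
    HAVING COUNT(*) BETWEEN 1 AND 5
)
GROUP BY t.name
ORDER BY t.name
""").fetchall()

print(tag_usage)  # [('hello', 2), ('world', 1)]
```

The per-person tag count is computed in the subquery over the whole join table, before restricting to any particular tag, which is the step the per-tag loop in the question was missing.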
