Django query with order_by, distinct and limit on PostgreSQL

I have the following models:
class Product(models.Model):
    name = models.CharField(max_length=255)
class Action(models.Model):
    product = models.ForeignKey(Product)
    created_at = models.DateTimeField(auto_now_add=True)
I would like to retrieve the 10 most recent actions ordered by created_at DESC with distinct products.
The following is close to the result but still misses the ordering:
Action.objects.all().order_by('product_id').distinct('product_id')[:10]

Your solution seems to be doing too much, and it results in 2 separate SQL queries. The following works with a single query, because the inner queryset is used as a subquery:
action_ids = Action.objects.order_by('product_id', '-created_at')\
    .distinct('product_id').values_list('id', flat=True)
result = Action.objects.filter(id__in=action_ids)\
    .order_by('-created_at')[:10]

EDIT: this solution works but Ross Lote's is cleaner
This is the way I finally did it, using Django Aggregation:
from django.db.models import Max
actions_id = Action.objects.all().values('product_id') \
    .annotate(action_id=Max('id')) \
    .order_by('-action_id')[:10] \
    .values_list('action_id', flat=True)
result = Action.objects.filter(id__in=actions_id).order_by('-created_at')
Calling values('product_id') before annotate() makes the query group by product_id.
After annotate(), order_by() can only use fields named in values() or annotate(). Because created_at is set automatically when an action is created, ordering by created_at is the same as ordering by id, so annotate(action_id=Max('id')).order_by('-action_id') gives the right order.
Finally, we just need to slice the query with [:10].
Hope this helps.
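To see what the grouping step does, here is a minimal sketch using Python's built-in sqlite3 module instead of Django and PostgreSQL. The action table and its sample rows are made up for illustration, and the SQL is only roughly what the values().annotate(Max('id')) queryset would produce, not the exact statement Django generates.

```python
import sqlite3

# Hypothetical table standing in for the Action model.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE action ("
    "id INTEGER PRIMARY KEY, product_id INTEGER, created_at TEXT)"
)
rows = [
    (1, 1, "2024-01-01 10:00:00"),
    (2, 2, "2024-01-01 11:00:00"),
    (3, 1, "2024-01-02 09:00:00"),
    (4, 3, "2024-01-03 08:00:00"),
    (5, 2, "2024-01-04 12:00:00"),
]
conn.executemany("INSERT INTO action VALUES (?, ?, ?)", rows)


def latest_action_ids(limit):
    """Return the newest action id of each product, newest first."""
    cursor = conn.execute(
        "SELECT MAX(id) AS action_id FROM action "
        "GROUP BY product_id ORDER BY action_id DESC LIMIT ?",
        (limit,),
    )
    return [row[0] for row in cursor]


print(latest_action_ids(10))
print(latest_action_ids(2))
```

Each product contributes exactly one id, so the limit applies to products rather than to raw action rows. Like the answer above, this relies on ids growing in the same order as created_at.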

Related

Django conditionally count annotated values in "group-by" statement

Having the following models:
class TheModel(models.Model):
    created_at = models.DateTimeField(default=datetime.now)
class Item(models.Model):
    the_model = models.ForeignKey(TheModel, on_delete=models.CASCADE, related_name='items')
How can I calculate, grouped by day, the number of models and how many of them have more than 2 items?
I tried:
qs = models.TheModel.objects.all()
qs = qs.annotate(contained_items=Count('items'))
result = qs.values('created_at__date').annotate(
    total_count=Count('created_at__date'),
    models_with_contained_items=Count('created_at__date', filter=Q(contained_items__gt=2))
)
But it raises OperationalError: "misuse of aggregate function COUNT()".
You can do it as follows:
from django.db.models import Count
from django.db.models.functions import ExtractDay, ExtractMonth, ExtractYear
query_set = (
    TheModel.objects
    # contained_items has to be annotated before it can be filtered on
    .annotate(contained_items=Count('items'))
    .filter(contained_items__gt=2)
    .annotate(day=ExtractDay('created_at'), month=ExtractMonth('created_at'), year=ExtractYear('created_at'))
    .values('day', 'month', 'year')
    .annotate(total_count=Count('items'))
    .values('day', 'month', 'year', 'total_count')
    .order_by()
)
Read more about Extract
Why is order_by() called at the end? If the model declares a default ordering (Meta.ordering), Django can add the ordering fields to the query, including the GROUP BY, so the rows may not be grouped as expected. Calling .order_by() with no arguments clears any ordering.
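The "misuse of aggregate" error in the question comes from counting over a value that is itself a count. In plain SQL the usual way around that is two levels: count items per model in an inner query, then group those per-model rows by day. Below is a minimal sketch of that idea with Python's built-in sqlite3; the table names and sample rows are assumptions for illustration, and this is not the SQL Django generates for either queryset above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE the_model (id INTEGER PRIMARY KEY, created_at TEXT);
    CREATE TABLE item (id INTEGER PRIMARY KEY, the_model_id INTEGER);
    INSERT INTO the_model VALUES
        (1, '2024-01-01 08:00:00'),
        (2, '2024-01-01 17:30:00'),
        (3, '2024-01-02 09:15:00');
    -- model 1 has 3 items, model 2 has 1 item, model 3 has 4 items
    INSERT INTO item (the_model_id) VALUES (1), (1), (1), (2), (3), (3), (3), (3);
    """
)

PER_DAY_SQL = """
SELECT day,
       COUNT(*) AS total_count,
       SUM(CASE WHEN n_items > 2 THEN 1 ELSE 0 END) AS models_with_items
FROM (
    -- inner level: one row per model with its item count
    SELECT m.id, date(m.created_at) AS day, COUNT(i.id) AS n_items
    FROM the_model AS m
    LEFT JOIN item AS i ON i.the_model_id = m.id
    GROUP BY m.id, m.created_at
) AS per_model
GROUP BY day
ORDER BY day
"""

per_day = conn.execute(PER_DAY_SQL).fetchall()
print(per_day)
```

The outer query never aggregates an aggregate: it only sees the plain n_items column produced by the inner query.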

How to return all records but exclude the last item

I'm having a problem filtering with Django models.
I want to return all records of a particular animal, excluding the most recent one (by created_at), sorted in descending order.
I have this model.
class Heat(models.Model):
    # Fields
    performer = models.CharField(max_length=25)
    is_bred = models.BooleanField(default=False)
    note = models.TextField(max_length=250, blank=True, null=True)
    result = models.BooleanField(default=False)
    # Relationship Fields
    animal = models.ForeignKey(Animal, related_name='heats', on_delete=models.CASCADE)
    created_at = models.DateTimeField(auto_now_add=True, editable=False)
    last_updated = models.DateTimeField(auto_now=True, editable=False)
I was able to achieve the desired result with this raw SQL query, but I want a Django approach.
SELECT
    *
FROM
    heat
WHERE
    heat.created_at != (SELECT MAX(heat.created_at) FROM heat)
    AND heat.animal_id = '2'
ORDER BY heat.created_at DESC;
Please help.
It will be
Heat.objects.order_by("-created_at")[1:]
For a particular animal it will then be:
Heat.objects.filter(animal_id=2).order_by("-created_at")[1:]
Here [1:] is regular Python slice syntax applied to a queryset, and it generates the corresponding SQL (an OFFSET). In this case it simply skips the first, most recently created, element.
Update: as @schwobaseggl mentioned in the comments, slices with a negative index don't work on Django querysets. That is why the objects are ordered in reverse first.
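As a rough illustration of what the [1:] slice amounts to, here is a small sketch with Python's built-in sqlite3. The heat table and its rows are made up, and LIMIT -1 OFFSET 1 is simply SQLite's way of writing "no limit, skip one row"; the exact SQL Django emits depends on the database backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE heat (id INTEGER PRIMARY KEY, animal_id INTEGER, created_at TEXT);
    INSERT INTO heat VALUES
        (1, 2, '2024-01-01 10:00:00'),
        (2, 2, '2024-02-01 10:00:00'),
        (3, 2, '2024-03-01 10:00:00'),
        (4, 7, '2024-04-01 10:00:00');
    """
)


def all_but_latest(animal_id):
    """Ids of an animal's heats, newest first, skipping the newest one."""
    cursor = conn.execute(
        "SELECT id FROM heat WHERE animal_id = ? "
        "ORDER BY created_at DESC LIMIT -1 OFFSET 1",
        (animal_id,),
    )
    return [row[0] for row in cursor]


print(all_but_latest(2))
print(all_but_latest(7))
```

Note the difference from the raw SQL in the question: the slice skips the newest heat of that animal, while the raw query compares against the newest created_at in the whole table. In this sample data the table-wide maximum belongs to animal 7, so the raw query would not exclude anything for animal 2.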
I just converted your SQL query to Django ORM code.
First, fetch the max created_at value using aggregation and do an exclude.
from django.db.models import Max
heat_objects = Heat.objects.filter(
    animal_id=2
).exclude(
    created_at=Heat.objects.all().aggregate(Max('created_at'))['created_at__max']
)
Get the last record:
obj = Heat.objects.all().order_by('-id')[0]
Then make the query, excluding that record by its id:
query = Heat.objects.filter(animal_id=2).exclude(id=obj.id)
The query would be:
Heat.objects.all().order_by('-id')[1:]
You can also apply any filter you need by replacing all().

Count number of distinct records by date in Django

Following up on this question:
Count number of records by date in Django
class Review(models.Model):
venue = models.ForeignKey(Venue, db_index=True)
review = models.TextField()
datetime_visited = models.DateTimeField(default=datetime.now)
It is true that the following line solves the problem of counting the number of records by date:
Review.objects.all() \
    .extra({'date_visited': "date(datetime_visited)"}) \
    .values('date_visited') \
    .annotate(visited_count=Count('id'))
However, say I want a distinct count, that is, I want to avoid counting Review objects with the same id on the same day. What can I do?
I tried:
Review.objects.all() \
    .extra({'date_visited': "date(datetime_visited)"}) \
    .values('date_visited', 'id') \
    .distinct() \
    .annotate(Count('id'))
but it doesn't seem to work.
Your problem is that you're including id in your values(), which is making all records unique, defeating distinct(). Try this instead:
Review.objects.all() \
    .extra({'date_visited': "date(datetime_visited)"}) \
    .values('date_visited') \
    .distinct() \
    .annotate(Count('date_visited'))
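Here is a small sketch of why including id defeats the grouping, using Python's built-in sqlite3 with a made-up review table. The last query is only one possible reading of "distinct" (counting each venue once per day), which is an assumption on my part and not something the question states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE review (id INTEGER PRIMARY KEY, venue_id INTEGER, datetime_visited TEXT);
    INSERT INTO review VALUES
        (1, 10, '2024-05-01 09:00:00'),
        (2, 10, '2024-05-01 18:00:00'),
        (3, 11, '2024-05-01 20:00:00'),
        (4, 10, '2024-05-02 12:00:00');
    """
)


def query(sql):
    return conn.execute(sql).fetchall()


# Grouping by (date, id): every group holds exactly one row.
by_date_and_id = query(
    "SELECT date(datetime_visited) AS d, id, COUNT(id) FROM review "
    "GROUP BY d, id ORDER BY d, id"
)
# Grouping by date only: one row per day with the number of reviews.
by_date = query(
    "SELECT date(datetime_visited) AS d, COUNT(id) FROM review "
    "GROUP BY d ORDER BY d"
)
# Assumed reading of "distinct": count each venue once per day.
distinct_venues = query(
    "SELECT date(datetime_visited) AS d, COUNT(DISTINCT venue_id) FROM review "
    "GROUP BY d ORDER BY d"
)

print(by_date_and_id)
print(by_date)
print(distinct_venues)
```

If the per-venue reading is what you want, the ORM counterpart would be Count('venue', distinct=True) in the annotate() call.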

Django queryset - Adding HAVING constraint

I have been using Django for a couple of years now but I am struggling today with adding a HAVING constraint to a GROUP BY.
My queryset is the following:
crm_models.Contact.objects\
    .filter(dealercontact__dealer__pk__in=(265,),
            dealercontact__activity='gardening',
            date_data_collected__gte=datetime.date(2012,10,1),
            date_data_collected__lt=datetime.date(2013,10,1))\
    .annotate(nb_rels=Count('dealercontact'))
which gives me the following MySQL query:
SELECT *
FROM `contact`
LEFT OUTER JOIN `dealer_contact` ON (`contact`.`id_contact` = `dealer_contact`.`id_contact`)
WHERE (`dealer_contact`.`active` = True
AND `dealer_contact`.`activity` = 'gardening'
AND `contact`.`date_data_collected` >= '2012-10-01'
AND `contact`.`date_data_collected` < '2013-10-01'
AND `dealer_contact`.`id_dealer` IN (265))
GROUP BY `contact`.`id_contact`
ORDER BY NULL;
I would get exactly what I need with this HAVING constraint:
HAVING SUM(IF(`dealer_contact`.`type`='customer', 1, 0)) = 0
How can I get this fixed with a Django Queryset? I need a queryset in this instance.
Here I am using annotate only in order to get the GROUP BY on contact.id_contact.
Edit: My goal is to get the Contacts who have no "customer" relation in dealercontact but have "ref" relation(s) (according to the WHERE clause of course).
Models
class Contact(models.Model):
    id_contact = models.AutoField(primary_key=True)
    title = models.CharField(max_length=255L, blank=True, choices=choices_custom_sort(TITLE_CHOICES))
    last_name = models.CharField(max_length=255L, blank=True)
    first_name = models.CharField(max_length=255L, blank=True)
    [...]
    date_data_collected = models.DateField(null=True, db_index=True)
class Dealer(models.Model):
    id_dealer = models.AutoField(primary_key=True)
    address1 = models.CharField(max_length=45L, blank=True)
    [...]
class DealerContact(Auditable):
    id_dealer_contact = models.AutoField(primary_key=True)
    contact = models.ForeignKey(Contact, db_column='id_contact')
    dealer = models.ForeignKey(Dealer, db_column='id_dealer')
    activity = models.CharField(max_length=32, choices=choices_custom_sort(ACTIVITIES), db_index=True)
    type = models.CharField(max_length=32, choices=choices_custom_sort(DEALER_CONTACT_TYPE), db_index=True)
I figured this out by adding two binary fields to DealerContact: is_ref and is_customer.
If type='ref' then is_ref=1 and is_customer=0.
Else if type='customer' then is_ref=0 and is_customer=1.
Thus, I can now use annotate(nb_customers=Sum('dealercontact__is_customer')) and then filter(nb_customers=0), which Django turns into a HAVING clause.
The final queryset is:
Contact.objects.filter(dealercontact__dealer__pk__in=(265,),
                       dealercontact__activity='gardening',
                       date_data_collected__gte=datetime.date(2012,10,1),
                       date_data_collected__lt=datetime.date(2013,10,1))\
    .annotate(nb_customers=Sum('dealercontact__is_customer'))\
    .filter(nb_customers=0)
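For reference, here is a minimal sketch of the HAVING condition itself, run with Python's built-in sqlite3. The tables are cut down to the columns that matter and the rows are invented; CASE WHEN replaces MySQL's IF() because SQLite does not have the latter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contact (id_contact INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE dealer_contact (
        id_dealer_contact INTEGER PRIMARY KEY,
        id_contact INTEGER,
        type TEXT
    );
    INSERT INTO contact VALUES (1, 'Ames'), (2, 'Brown'), (3, 'Clark');
    INSERT INTO dealer_contact (id_contact, type) VALUES
        (1, 'ref'),
        (2, 'ref'), (2, 'customer'),
        (3, 'customer');
    """
)

NO_CUSTOMER_SQL = """
SELECT c.id_contact
FROM contact AS c
JOIN dealer_contact AS dc ON dc.id_contact = c.id_contact
GROUP BY c.id_contact
HAVING SUM(CASE WHEN dc.type = 'customer' THEN 1 ELSE 0 END) = 0
ORDER BY c.id_contact
"""

contacts_without_customer = [row[0] for row in conn.execute(NO_CUSTOMER_SQL)]
print(contacts_without_customer)
```

Only contact 1 survives: contact 2 has a "ref" row but also a "customer" row, so its sum is 1. The is_customer column in the answer above plays the role of the CASE expression here.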
Actually there is a way you can add your own custom HAVING and GROUP BY clauses if you need.
Use this example with caution: it relies on Django ORM internals, so if they change in future Django versions you will have to update your code too.
Imagine you have Book and Edition models, where each book can have multiple editions, and you want to select the first US edition date within a Book queryset.
Adding custom HAVING and GROUP BY clauses in Django 1.5+:
from django.db.models import Min
from django.db.models.sql.where import ExtraWhere, AND
qs = Book.objects.all()
# Standard annotate
qs = qs.annotate(first_edition_date=Min("edition__date"))
# Custom HAVING clause, to limit annotation by US country only
qs.query.having.add(ExtraWhere(['"app_edition"."country"=%s'], ["US"]), AND)
# Custom GROUP BY clause will be needed too
qs.query.group_by.append(("app_edition", "country"))
ExtraWhere can contain not just fields, but any raw sql conditions and functions too.
Are you avoiding a raw query only because you want ORM objects? Contact.objects.raw() returns model instances, much like filter() does. Refer to https://docs.djangoproject.com/en/dev/topics/db/sql/ for more help.
My goal is to get the Contacts who have no "customer" relation in
dealercontact but have "ref" relation(s) (according to the WHERE
clause of course).
This simple query fulfills this requirement:
Contact.objects.filter(dealercontact__type="ref").exclude(dealercontact__type="customer")
Is this enough, or do you need it to do something more?
UPDATE: if your requirement is
Contacts that have a "ref" relations, but do not have "customer"
relations with the same dealer
you can do this:
from django.db.models import Q
Contact.objects.filter(Q(dealercontact__type="ref") & ~Q(dealercontact__type="customer"))

Subquery in select Django

Trying to run a complicated query in Django over PostgreSQL.
These are my models:
class Link(models.Model):
    short_key = models.CharField(primary_key=True, max_length=8, unique=True, blank=True)
    long_url = models.CharField(max_length=150)
class Stats_links_ads(models.Model):
    link_id = models.ForeignKey(Link, related_name='link_viewed', primary_key=True)
    ad_id = models.ForeignKey(Ad, related_name='ad_viewed')
    views = models.PositiveIntegerField()
    clicks = models.PositiveIntegerField()
Using the Django ORM, I want to run a query that translates into something like this:
select a.link_id, sum(a.clicks), sum (a.views), (select long_url from links_link b where b.short_key = a.link_id_id)
from links_stats_links_ads a
group by a.link_id_id;
If I exclude the long_url field that I need, I can run this code and it works:
Stats_links_ads.objects.all().values('link_id').annotate(Sum('views'), Sum('clicks'))
I don't know how to add the subquery to the select statement.
Thanks
You can see the raw SQL behind your queries using the query attribute of a QuerySet.
For example, look at the SQL behind my first answer (Take 1 below), which uses select_related. It's clear the generated SQL doesn't behave as expected, and accessing long_url will result in additional queries.
Take 2
You can follow relationships using double-underscore notation, like this:
qs = Stats_links_ads.objects \
    .values('link_id', 'link_id__long_url') \
    .annotate(Sum('views'), Sum('clicks'))
str(qs.query)
'SELECT
"stackoverflow_stats_links_ads"."link_id_id",
"stackoverflow_link"."long_url",
SUM("stackoverflow_stats_links_ads"."clicks") AS "clicks__sum",
SUM("stackoverflow_stats_links_ads"."views") AS "views__sum"
FROM "stackoverflow_stats_links_ads"
INNER JOIN "stackoverflow_link"
ON ("stackoverflow_stats_links_ads"."link_id_id" = "stackoverflow_link"."short_key")
GROUP BY
"stackoverflow_stats_links_ads"."link_id_id",
"stackoverflow_link"."long_url"'
I'm not working with any data, so I haven't verified it, but the sql looks right.
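To check that the join form returns the same rows as the subquery form from the question, here is a small sketch with Python's built-in sqlite3. The table names are shortened and the sample rows are made up; it compares the two SQL shapes directly and does not go through the ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE link (short_key TEXT PRIMARY KEY, long_url TEXT);
    CREATE TABLE stats (link_id TEXT, ad_id INTEGER, views INTEGER, clicks INTEGER);
    INSERT INTO link VALUES
        ('abc', 'http://example.com/a'),
        ('xyz', 'http://example.com/x');
    INSERT INTO stats VALUES
        ('abc', 1, 10, 1),
        ('abc', 2, 5, 2),
        ('xyz', 1, 7, 0);
    """
)

# Shape from the question: correlated subquery in the SELECT list.
SUBQUERY_SQL = """
SELECT a.link_id, SUM(a.clicks), SUM(a.views),
       (SELECT b.long_url FROM link AS b WHERE b.short_key = a.link_id)
FROM stats AS a
GROUP BY a.link_id
ORDER BY a.link_id
"""

# Shape from the answer: join, with long_url added to the GROUP BY.
JOIN_SQL = """
SELECT a.link_id, SUM(a.clicks), SUM(a.views), b.long_url
FROM stats AS a
INNER JOIN link AS b ON a.link_id = b.short_key
GROUP BY a.link_id, b.long_url
ORDER BY a.link_id
"""

with_subquery = conn.execute(SUBQUERY_SQL).fetchall()
with_join = conn.execute(JOIN_SQL).fetchall()
print(with_subquery)
print(with_join)
```

Because short_key is the primary key of link, each link_id maps to exactly one long_url, so adding long_url to the GROUP BY does not split any group.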
Take 1
Does not work
Can't you use .select_related?
qs = Stats_links_ads.objects.select_related('link') \
    .values('link_id').annotate(Sum('views'), Sum('clicks'))
str(qs.query)
'SELECT
"stackoverflow_stats_links_ads"."link_id_id",
SUM("stackoverflow_stats_links_ads"."clicks") AS "clicks__sum",
SUM("stackoverflow_stats_links_ads"."views") AS "views__sum"
FROM "stackoverflow_stats_links_ads"
GROUP BY "stackoverflow_stats_links_ads"."link_id_id"'