Subquery in select Django - django

I'm trying to run a complicated query in Django over PostgreSQL.
These are my models:
class Link(models.Model):
    short_key = models.CharField(primary_key=True, max_length=8, unique=True, blank=True)
    long_url = models.CharField(max_length=150)

class Stats_links_ads(models.Model):
    link_id = models.ForeignKey(Link, related_name='link_viewed', primary_key=True)
    ad_id = models.ForeignKey(Ad, related_name='ad_viewed')
    views = models.PositiveIntegerField()
    clicks = models.PositiveIntegerField()
I want to use the Django ORM to run a query that translates into something like this:
select a.link_id_id, sum(a.clicks), sum(a.views), (select long_url from links_link b where b.short_key = a.link_id_id)
from links_stats_links_ads a
group by a.link_id_id;
If I exclude the long_url field that I need, I can run this code and it works:
Stats_links_ads.objects.all().values('link_id').annotate(Sum('views'), Sum('clicks'))
I don't know how to add the subquery in the select statement.
Thanks

You can see the raw SQL behind your queries using the query attribute of a QuerySet.
For example, look at the SQL behind my first answer (Take 1, using select_related): it's clear the generated SQL doesn't join the link table as expected, so accessing long_url would result in additional queries.
Take 2
You can follow relationships using double-underscore notation, like this:
from django.db.models import Sum
qs = (Stats_links_ads.objects
      .values('link_id', 'link_id__long_url')
      .annotate(Sum('views'), Sum('clicks')))
str(qs.query)
'SELECT
"stackoverflow_stats_links_ads"."link_id_id",
"stackoverflow_link"."long_url",
SUM("stackoverflow_stats_links_ads"."clicks") AS "clicks__sum",
SUM("stackoverflow_stats_links_ads"."views") AS "views__sum"
FROM "stackoverflow_stats_links_ads"
INNER JOIN "stackoverflow_link"
ON ("stackoverflow_stats_links_ads"."link_id_id" = "stackoverflow_link"."short_key")
GROUP BY
"stackoverflow_stats_links_ads"."link_id_id",
"stackoverflow_link"."long_url"'
I'm not working with any data, so I haven't verified it, but the sql looks right.
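To check that the JOIN plus GROUP BY shape returns the same rows as the correlated subquery from the question, here is a minimal sketch using Python's built-in sqlite3 module instead of PostgreSQL. The table layout and sample rows are assumptions made up for illustration (in particular, link_id_id is not a primary key here, so several rows per link can be summed); only the two SELECT statements follow the thread.

```python
import sqlite3

# Hypothetical in-memory tables that mirror the column names in the generated SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links_link (short_key TEXT PRIMARY KEY, long_url TEXT);
    CREATE TABLE links_stats_links_ads (
        link_id_id TEXT REFERENCES links_link (short_key),
        ad_id_id INTEGER,
        views INTEGER,
        clicks INTEGER
    );
    INSERT INTO links_link VALUES
        ('abc', 'http://example.com/a'),
        ('xyz', 'http://example.com/x');
    INSERT INTO links_stats_links_ads VALUES
        ('abc', 1, 10, 2),
        ('abc', 2, 5, 1),
        ('xyz', 1, 7, 0);
""")

# Shape asked for in the question: long_url comes from a correlated subquery.
subquery_rows = conn.execute("""
    SELECT a.link_id_id, SUM(a.clicks), SUM(a.views),
           (SELECT long_url FROM links_link b WHERE b.short_key = a.link_id_id)
    FROM links_stats_links_ads a
    GROUP BY a.link_id_id
    ORDER BY a.link_id_id
""").fetchall()

# Shape produced by values('link_id', 'link_id__long_url'): INNER JOIN,
# with long_url added to the GROUP BY.
join_rows = conn.execute("""
    SELECT a.link_id_id, SUM(a.clicks), SUM(a.views), b.long_url
    FROM links_stats_links_ads a
    INNER JOIN links_link b ON a.link_id_id = b.short_key
    GROUP BY a.link_id_id, b.long_url
    ORDER BY a.link_id_id
""").fetchall()

print(subquery_rows == join_rows)
```

Because short_key is the primary key of the link table, each link_id_id maps to exactly one long_url, so adding long_url to the GROUP BY does not split any group.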
Take 1
Does not work
Can't you use .select_related?
qs = (Stats_links_ads.objects.select_related('link')
      .values('link_id').annotate(Sum('views'), Sum('clicks')))
str(qs.query)
'SELECT
"stackoverflow_stats_links_ads"."link_id_id",
SUM("stackoverflow_stats_links_ads"."clicks") AS "clicks__sum",
SUM("stackoverflow_stats_links_ads"."views") AS "views__sum"
FROM "stackoverflow_stats_links_ads"
GROUP BY "stackoverflow_stats_links_ads"."link_id_id"'

Related

Django How can I pass a subquery in LEFT JOIN

I have 3 models (A,B,C):
class A(models.Model):
    url = models.URLField()
    uuid = models.UUIDField()
    name = models.CharField(max_length=400)
    id = models.IntegerField()

class B(models.Model):
    user = models.ForeignKey(C, to_field='user_id',
                             on_delete=models.PROTECT,)
    uuid = models.ForeignKey(A, to_field='uuid',
                             on_delete=models.PROTECT,)
and I want to perform the following SQL query using the Django ORM:
SELECT A.id, COUNT(A.id), COUNT(foo.user)
FROM A
LEFT JOIN (SELECT uuid, user FROM B where user = '<a_specific_user_id>') as foo
ON A.uuid = foo.uuid_id
WHERE name = '{}'
GROUP by 1
HAVING COUNT(A.id)> 1 AND COUNT(A.id)>COUNT(foo.user)
My problem is mainly with LEFT JOIN. I know I can form a LEFT JOIN by checking for the existence of null fields on table B:
A.objects.filter(name='{}', b__isnull=True).values('id', 'name')
but how can I LEFT JOIN on the specific sub-query I want?
I tried using Subquery() but it seems to populate the final WHERE statement and not pass my custom sub-query in the LEFT JOIN.
For anyone stumbling upon this in the future: I asked directly on the Django IRC channel, and it was confirmed that, as of now, it is not possible to include a custom subquery in a LEFT JOIN clause using the Django ORM.
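Independently of what the ORM can emit, the derived-table join in the question is equivalent, in plain SQL, to a LEFT JOIN on B whose ON clause also carries the user filter. The sketch below only checks that equivalence in SQLite with made-up tables and rows (table names a and b, and the users 'alice' and 'bob', are assumptions); it does not show how to produce either query from Django.

```python
import sqlite3

# Hypothetical tables: a.id is deliberately not unique, as in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, uuid TEXT, name TEXT);
    CREATE TABLE b (user_id TEXT, uuid_id TEXT);
    INSERT INTO a VALUES (1, 'u1', 'n'), (1, 'u2', 'n'),
                         (2, 'u3', 'n'), (2, 'u4', 'n');
    INSERT INTO b VALUES ('alice', 'u1'), ('bob', 'u2'),
                         ('alice', 'u3'), ('alice', 'u4');
""")
params = ("alice", "n")  # (specific user id, name filter)

# Form 1: LEFT JOIN on a filtered derived table, as written in the question.
derived = conn.execute("""
    SELECT a.id, COUNT(a.id), COUNT(foo.user_id)
    FROM a
    LEFT JOIN (SELECT uuid_id, user_id FROM b WHERE user_id = ?) AS foo
        ON a.uuid = foo.uuid_id
    WHERE a.name = ?
    GROUP BY a.id
    HAVING COUNT(a.id) > 1 AND COUNT(a.id) > COUNT(foo.user_id)
""", params).fetchall()

# Form 2: plain LEFT JOIN with the user filter moved into the ON clause.
on_clause = conn.execute("""
    SELECT a.id, COUNT(a.id), COUNT(b.user_id)
    FROM a
    LEFT JOIN b ON a.uuid = b.uuid_id AND b.user_id = ?
    WHERE a.name = ?
    GROUP BY a.id
    HAVING COUNT(a.id) > 1 AND COUNT(a.id) > COUNT(b.user_id)
""", params).fetchall()

print(derived, on_clause)
```

The filter has to sit in the ON clause rather than in WHERE: a WHERE condition on b.user_id would discard the NULL-extended rows and turn the LEFT JOIN into an inner join.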

Django query with order_by, distinct and limit on Postgresql

I have the following:
class Product(models.Model):
    name = models.CharField(max_length=255)

class Action(models.Model):
    product = models.ForeignKey(Product)
    created_at = models.DateTimeField(auto_now_add=True)
I would like to retrieve the 10 most recent actions ordered by created_at DESC with distinct products.
The following is close to the result but still misses the ordering:
Action.objects.all().order_by('product_id').distinct('product_id')[:10]
Your solution seems like it's trying to do too much. It will also result in 2 separate SQL queries. This would work fine and with only a single query:
action_ids = Action.objects.order_by('product_id', '-created_at')\
    .distinct('product_id').values_list('id', flat=True)
result = Action.objects.filter(id__in=action_ids)\
    .order_by('-created_at')[:10]
EDIT: this solution works but Ross Lote's is cleaner
This is the way I finally did it, using Django Aggregation:
from django.db.models import Max
actions_id = Action.objects.all().values('product_id') \
    .annotate(action_id=Max('id')) \
    .order_by('-action_id')[:10] \
    .values_list('action_id', flat=True)
result = Action.objects.filter(id__in=actions_id).order_by('-created_at')
By setting values('product_id') we do a group by on product_id.
With annotate(), we can use order_by only on fields used in values() or annotate(). Since created_at is automatically set to now for each action, ordering on created_at is the same as ordering on id, so annotate(action_id=Max('id')).order_by('-action_id') gives the right order.
Finally, we just need to slice our query with [:10].
Hope this helps.
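The aggregation approach above (group by product, take Max('id'), keep the newest groups, then load those actions) can be sketched in plain SQL with Python's sqlite3 module. The table name, the sample rows, and the limit of 2 instead of 10 are assumptions chosen to keep the example small; as in the answer, it relies on ids increasing with created_at.

```python
import sqlite3

# Hypothetical action table; ids grow with created_at, as the answer assumes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE action (id INTEGER PRIMARY KEY, product_id INTEGER, created_at TEXT);
    INSERT INTO action VALUES
        (1, 1, '2024-01-01 10:00:00'),
        (2, 2, '2024-01-01 11:00:00'),
        (3, 1, '2024-01-01 12:00:00'),
        (4, 3, '2024-01-01 13:00:00'),
        (5, 2, '2024-01-01 14:00:00'),
        (6, 1, '2024-01-01 15:00:00');
""")

# Inner query: latest action id per product, newest first, limited to 2.
# Outer query: load those actions, ordered by created_at descending.
rows = conn.execute("""
    SELECT id, product_id
    FROM action
    WHERE id IN (
        SELECT MAX(id) AS action_id
        FROM action
        GROUP BY product_id
        ORDER BY action_id DESC
        LIMIT 2
    )
    ORDER BY created_at DESC
""").fetchall()

print(rows)
```

Each product appears at most once in the result, and the rows are the most recent actions overall among the per-product latest ones.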

Django queryset - Adding HAVING constraint

I have been using Django for a couple of years now but I am struggling today with adding a HAVING constraint to a GROUP BY.
My queryset is the following:
crm_models.Contact.objects\
    .filter(dealercontact__dealer__pk__in=(265,),
            dealercontact__activity='gardening',
            date_data_collected__gte=datetime.date(2012,10,1),
            date_data_collected__lt=datetime.date(2013,10,1))\
    .annotate(nb_rels=Count('dealercontact'))
which gives me the following MySQL query:
SELECT *
FROM `contact`
LEFT OUTER JOIN `dealer_contact` ON (`contact`.`id_contact` = `dealer_contact`.`id_contact`)
WHERE (`dealer_contact`.`active` = True
AND `dealer_contact`.`activity` = 'gardening'
AND `contact`.`date_data_collected` >= '2012-10-01'
AND `contact`.`date_data_collected` < '2013-10-01'
AND `dealer_contact`.`id_dealer` IN (265))
GROUP BY `contact`.`id_contact`
ORDER BY NULL;
I would get exactly what I need with this HAVING constraint:
HAVING SUM(IF(`dealer_contact`.`type`='customer', 1, 0)) = 0
How can I get this fixed with a Django Queryset? I need a queryset in this instance.
Here I am using annotate only in order to get the GROUP BY on contact.id_contact.
Edit: My goal is to get the Contacts who have no "customer" relation in dealercontact but have "ref" relation(s) (according to the WHERE clause of course).
Models
class Contact(models.Model):
    id_contact = models.AutoField(primary_key=True)
    title = models.CharField(max_length=255L, blank=True, choices=choices_custom_sort(TITLE_CHOICES))
    last_name = models.CharField(max_length=255L, blank=True)
    first_name = models.CharField(max_length=255L, blank=True)
    [...]
    date_data_collected = models.DateField(null=True, db_index=True)

class Dealer(models.Model):
    id_dealer = models.AutoField(primary_key=True)
    address1 = models.CharField(max_length=45L, blank=True)
    [...]

class DealerContact(Auditable):
    id_dealer_contact = models.AutoField(primary_key=True)
    contact = models.ForeignKey(Contact, db_column='id_contact')
    dealer = models.ForeignKey(Dealer, db_column='id_dealer')
    activity = models.CharField(max_length=32, choices=choices_custom_sort(ACTIVITIES), db_index=True)
    type = models.CharField(max_length=32, choices=choices_custom_sort(DEALER_CONTACT_TYPE), db_index=True)
I figured this out by adding two binary fields in DealerContact: is_ref and is_customer.
If type='ref' then is_ref=1 and is_customer=0.
Else if type='customer' then is_ref=0 and is_customer=1.
Thus, I am now able to use annotate(nb_customers=Sum('is_customer')) and then use filter(nb_customers=0).
The final queryset is:
Contact.objects.filter(dealercontact__dealer__pk__in=(265,),
                       dealercontact__activity='gardening',
                       date_data_collected__gte=datetime.date(2012,10,1),
                       date_data_collected__lt=datetime.date(2013,10,1))\
    .annotate(nb_customers=Sum('dealercontact__is_customer'))\
    .filter(nb_customers=0)
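For reference, the HAVING constraint from the question can be checked directly in SQL. This is a minimal sketch with Python's sqlite3 module; the rows are invented, the date filter is left out, and CASE WHEN stands in for MySQL's IF(), which SQLite does not offer in that form. Summing a 0/1 is_customer column, as in the queryset above, computes the same per-contact total.

```python
import sqlite3

# Hypothetical rows: contact 1 has only 'ref' relations, contact 2 has both,
# contact 3 has only a 'customer' relation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (id_contact INTEGER PRIMARY KEY);
    CREATE TABLE dealer_contact (
        id_contact INTEGER, id_dealer INTEGER, activity TEXT, type TEXT
    );
    INSERT INTO contact VALUES (1), (2), (3);
    INSERT INTO dealer_contact VALUES
        (1, 265, 'gardening', 'ref'),
        (1, 265, 'gardening', 'ref'),
        (2, 265, 'gardening', 'ref'),
        (2, 265, 'gardening', 'customer'),
        (3, 265, 'gardening', 'customer');
""")

# Keep only contacts whose joined rows contain no 'customer' relation.
rows = conn.execute("""
    SELECT c.id_contact
    FROM contact c
    JOIN dealer_contact dc ON c.id_contact = dc.id_contact
    WHERE dc.id_dealer IN (265) AND dc.activity = 'gardening'
    GROUP BY c.id_contact
    HAVING SUM(CASE WHEN dc.type = 'customer' THEN 1 ELSE 0 END) = 0
    ORDER BY c.id_contact
""").fetchall()

print(rows)
```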
Actually there is a way you can add your own custom HAVING and GROUP BY clauses if you need.
Just use my example with caution - if Django ORM code/paths will change in future Django versions, you will have to update your code too.
Imagine you have Book and Edition models, where each book can have multiple editions, and you want to select the first US edition date within a Book queryset.
Adding custom HAVING and GROUP BY clauses in Django 1.5+:
from django.db.models import Min
from django.db.models.sql.where import ExtraWhere, AND
qs = Book.objects.all()
# Standard annotate
qs = qs.annotate(first_edition_date=Min("edition__date"))
# Custom HAVING clause, to limit annotation by US country only
qs.query.having.add(ExtraWhere(['"app_edition"."country"=%s'], ["US"]), AND)
# Custom GROUP BY clause will be needed too
qs.query.group_by.append(("app_edition", "country"))
ExtraWhere can contain not just fields, but any raw SQL conditions and functions too.
Are you avoiding a raw query only because you want ORM objects? Contact.objects.raw() returns model instances, much like filter() does. Refer to https://docs.djangoproject.com/en/dev/topics/db/sql/ for more help.
My goal is to get the Contacts who have no "customer" relation in
dealercontact but have "ref" relation(s) (according to the WHERE
clause of course).
This simple query fulfills this requirement:
Contact.objects.filter(dealercontact__type="ref").exclude(dealercontact__type="customer")
Is this enough, or do you need it to do something more?
UPDATE: if your requirement is
Contacts that have a "ref" relations, but do not have "customer"
relations with the same dealer
you can do this:
from django.db.models import Q
Contact.objects.filter(Q(dealercontact__type="ref") & ~Q(dealercontact__type="customer"))

Django JOIN query without foreign key

Is there a way in Django to write a query using the ORM, not raw SQL that allows you to JOIN on another table without there being a foreign key? Looking through the documentation it appears in order for the One to One relationship to work there must be a foreign key present?
In the models below I want to run a query with a JOIN on UserActivity.request_url to UserActivityLink.url.
class UserActivity(models.Model):
    id = models.IntegerField(primary_key=True)
    last_activity_ip = models.CharField(max_length=45L, blank=True)
    last_activity_browser = models.CharField(max_length=255L, blank=True)
    last_activity_date = models.DateTimeField(auto_now_add=True)
    request_url = models.CharField(max_length=255L, blank=True)
    session_id = models.CharField(max_length=255L)
    users_id = models.IntegerField()

    class Meta:
        db_table = 'user_activity'

class UserActivityLink(models.Model):
    id = models.IntegerField(primary_key=True)
    url = models.CharField(max_length=255L, blank=True)
    url_description = models.CharField(max_length=255L, blank=True)
    type = models.CharField(max_length=45L, blank=True)

    class Meta:
        db_table = 'user_activity_link'
The link table has a more descriptive translation of given URLs in the system, this is needed for some reporting the system will generate.
I've tried creating the foreign key from UserActivity.request_url to UserActivityLink.url but it fails with the following error: ERROR 1452: Cannot add or update a child row: a foreign key constraint fails
No, unfortunately there isn't an effective way.
.raw() is there for exactly this. Even if the ORM could do it, it would probably be a lot slower than raw SQL.
There is a blog post detailing how to do it with query.join(), but as the authors themselves point out, it's not best practice.
Just reposting some related answer, so everyone could see it.
Taken from here: Most efficient way to use the django ORM when comparing elements from two lists
First problem: joining unrelated models
I'm assuming that your Model1 and Model2 are not related,
otherwise you'd be able to use Django's related objects
interface. Here are two approaches you could take:
1. Use extra and a SQL subquery:
   Model1.objects.extra(where = ['field in (SELECT field from myapp_model2 WHERE ...)'])
   Subqueries are not handled very efficiently in some databases
   (notably MySQL) so this is probably not as good as #2 below.
2. Use a raw SQL query:
   Model1.objects.raw('''SELECT * from myapp_model1
                         INNER JOIN myapp_model2
                         ON myapp_model1.field = myapp_model2.field
                         AND ...''')
Second problem: enumerating the result
Two approaches:
1. You can enumerate a query set in Python using the built-in enumerate function:
   enumerate(Model1.objects.all())
2. You can use the technique described in this answer to do the enumeration in MySQL. Something like this:
   Model1.objects.raw('''SELECT *, @row := @row + 1 AS row
                         FROM myapp_model1
                         JOIN (SELECT @row := 0) rowtable
                         INNER JOIN myapp_model2
                         ON myapp_model1.field = myapp_model2.field
                         AND ...''')
A Django ForeignKey is different from a SQL FOREIGN KEY. A Django ForeignKey just represents a relation, and you can specify whether it uses a database constraint.
Try this:
request_url = models.ForeignKey(UserActivityLink, to_field='url_description', null=True, on_delete=models.SET_NULL, db_constraint=False)
Note that db_constraint=False is required; without it, Django will build SQL like:
ALTER TABLE `user_activity` ADD CONSTRAINT `xxx` FOREIGN KEY (`request_url`) REFERENCES `user_activity_link` (`url_description`);
I ran into the same problem and found the above method after a lot of research.
Hope it helps.
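At the SQL level, a JOIN does not need a foreign key constraint at all; any pair of comparable columns will do, which is why the raw SQL and db_constraint=False routes work. A small sketch with Python's sqlite3 module follows, using the table names from the question and invented rows. A LEFT JOIN is used here as an assumption about the reporting need, so that activities whose URL has no description still appear.

```python
import sqlite3

# Tables from the question, reduced to the relevant columns; no FOREIGN KEY
# constraint is declared between request_url and url.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_activity (id INTEGER PRIMARY KEY, request_url TEXT);
    CREATE TABLE user_activity_link (
        id INTEGER PRIMARY KEY, url TEXT, url_description TEXT
    );
    INSERT INTO user_activity VALUES (1, '/home'), (2, '/unknown');
    INSERT INTO user_activity_link VALUES (1, '/home', 'Home page');
""")

# Join purely on matching text values.
rows = conn.execute("""
    SELECT ua.id, ua.request_url, ual.url_description
    FROM user_activity ua
    LEFT JOIN user_activity_link ual ON ua.request_url = ual.url
    ORDER BY ua.id
""").fetchall()

print(rows)
```

The second activity comes back with a NULL description, which is the kind of row that makes adding a real foreign key constraint fail with ERROR 1452.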

How to create annotation on filtered data in Django ORM?

I have the following models:
class ApiUser(models.Model):
    apikey = models.CharField(max_length=32, unique=True)

class ExtMethodCall(models.Model):
    apiuser = models.ForeignKey(ApiUser)
    method = models.CharField(max_length=100)  # method name
    units = models.PositiveIntegerField()  # how many units the method call costs
    created_dt = models.DateField(auto_now_add=True)
For a report, I need to get all users who made any call today and the total cost of all calls for each user.
In SQL, that would be something like:
SELECT apiuser.*, q1.total_cost
FROM apiuser INNER JOIN (
    SELECT apiuser_id, sum(units) as total_cost
    FROM extmethodcall
    WHERE created_dt = curdate()
    GROUP by apiuser_id
) q1 ON apiuser.id = q1.apiuser_id
So far, I have found the following solution:
models.ExtMethodCall.objects.filter(created_dt=datetime.date.today()).values('apiuser').annotate(Sum('units'))
which returns me apiuser_id and units__sum.
Is there any more intelligent solution?
Is there any more intelligent solution?
No, this is the most natural solution. The Django ORM will translate
.annotate(Sum('units'))
into the SQL
SELECT ... sum(units) as units__sum
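The filter, values, annotate chain corresponds to a WHERE, GROUP BY, SUM query. Here is a minimal sketch with Python's sqlite3 module, assuming a hypothetical extmethodcall table and a fixed date string in place of today's date so the result is reproducible.

```python
import sqlite3

# Hypothetical call log: user 1 made two calls on the target day and one
# the day before; user 2 made a single call on the target day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE extmethodcall (
        apiuser_id INTEGER, method TEXT, units INTEGER, created_dt TEXT
    );
    INSERT INTO extmethodcall VALUES
        (1, 'search', 3, '2024-05-02'),
        (1, 'fetch', 4, '2024-05-02'),
        (2, 'search', 5, '2024-05-02'),
        (1, 'search', 100, '2024-05-01');
""")

target_day = "2024-05-02"  # stands in for datetime.date.today()

# Equivalent of filter(created_dt=...).values('apiuser').annotate(Sum('units')).
rows = conn.execute("""
    SELECT apiuser_id, SUM(units) AS units__sum
    FROM extmethodcall
    WHERE created_dt = ?
    GROUP BY apiuser_id
    ORDER BY apiuser_id
""", (target_day,)).fetchall()

print(rows)
```

The call from the previous day is filtered out before grouping, so it does not contribute to user 1's total.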