Django queryset - Adding HAVING constraint - django

I have been using Django for a couple of years now but I am struggling today with adding a HAVING constraint to a GROUP BY.
My queryset is the following:
crm_models.Contact.objects\
.filter(dealercontact__dealer__pk__in=(265,),
dealercontact__activity='gardening',
date_data_collected__gte=datetime.date(2012,10,1),
date_data_collected__lt=datetime.date(2013,10,1))\
.annotate(nb_rels=Count('dealercontact'))
which gives me the following MySQL query:
SELECT *
FROM `contact`
LEFT OUTER JOIN `dealer_contact` ON (`contact`.`id_contact` = `dealer_contact`.`id_contact`)
WHERE (`dealer_contact`.`active` = True
AND `dealer_contact`.`activity` = 'gardening'
AND `contact`.`date_data_collected` >= '2012-10-01'
AND `contact`.`date_data_collected` < '2013-10-01'
AND `dealer_contact`.`id_dealer` IN (265))
GROUP BY `contact`.`id_contact`
ORDER BY NULL;
I would get exactly what I need with this HAVING constraint:
HAVING SUM(IF(`dealer_contact`.`type`='customer', 1, 0)) = 0
How can I get this fixed with a Django Queryset? I need a queryset in this instance.
Here I am using annotate only in order to get the GROUP BY on contact.id_contact.
Edit: My goal is to get the Contacts who have no "customer" relation in dealercontact but have "ref" relation(s) (according to the WHERE clause of course).
Models
class Contact(models.Model):
id_contact = models.AutoField(primary_key=True)
title = models.CharField(max_length=255L, blank=True, choices=choices_custom_sort(TITLE_CHOICES))
last_name = models.CharField(max_length=255L, blank=True)
first_name = models.CharField(max_length=255L, blank=True)
[...]
date_data_collected = models.DateField(null=True, db_index=True)
class Dealer(models.Model):
id_dealer = models.AutoField(primary_key=True)
address1 = models.CharField(max_length=45L, blank=True)
[...]
class DealerContact(Auditable):
id_dealer_contact = models.AutoField(primary_key=True)
contact = models.ForeignKey(Contact, db_column='id_contact')
dealer = models.ForeignKey(Dealer, db_column='id_dealer')
activity = models.CharField(max_length=32, choices=choices_custom_sort(ACTIVITIES), db_index=True)
type = models.CharField(max_length=32, choices=choices_custom_sort(DEALER_CONTACT_TYPE), db_index=True)

I figured this out by adding two binary fields in DealerContact: is_ref and is_customer.
If type='ref' then is_ref=1 and is_customer=0.
Else if type='customer' then is_ref=0 and is_customer=1.
Thus, I am now able to use annotate(nb_customers=Sum('is_customer')) and then use filter(nb_customers=0).
The final queryset consists in:
Contact.objects.filter(dealercontact__dealer__pk__in=(265,),
dealercontact__activity='gardening',
date_data_collected__gte=datetime.date(2012,10,1),
date_data_collected__lt=datetime.date(2013,10,1))\
.annotate(nb_customers=Sum('dealercontact__is_customer'))\
.filter(nb_customers=0)

Actually there is a way you can add your own custom HAVING and GROUP BY clauses if you need.
Just use my example with caution - if Django ORM code/paths will change in future Django versions, you will have to update your code too.
Image you have Book and Edition models, where for each book there can be multiple editions and you want to select first US edition date within Book queryset.
Adding custom HAVING and GROUP BY clauses in Django 1.5+:
from django.db.models import Min
from django.db.models.sql.where import ExtraWhere, AND
qs = Book.objects.all()
# Standard annotate
qs = qs.annotate(first_edition_date=Min("edition__date"))
# Custom HAVING clause, to limit annotation by US country only
qs.query.having.add(ExtraWhere(['"app_edition"."country"=%s'], ["US"]), AND)
# Custom GROUP BY clause will be needed too
qs.query.group_by.append(("app_edition", "country"))
ExtraWhere can contain not just fields, but any raw sql conditions and functions too.

Are you not using raw query just because you want orm object? Using Contact.objects.raw() generate instances similar filter. Refer to https://docs.djangoproject.com/en/dev/topics/db/sql/ for more help.

My goal is to get the Contacts who have no "customer" relation in
dealercontact but have "ref" relation(s) (according to the WHERE
clause of course).
This simple query fulfills this requirement:
Contact.objects.filter(dealercontact__type="ref").exclude(dealercontact__type="customer")
Is this enough, or do you need it to do something more?
UPDATE: if your requirement is
Contacts that have a "ref" relations, but do not have "customer"
relations with the same dealer
you can do this:
from django.db.models import Q
Contact.objects.filter(Q(dealercontact__type="ref") & ~Q(dealercontact__type="customer"))

Related

Django annotation on compoundish primary key with filter ignoring primary key resutling in too many annotated items

Please see EDIT1 below, as well.
Using Django 3.0.6 and python3.8, given following models
class Plants(models.Model):
plantid = models.TextField(primary_key=True, unique=True)
class Pollutions(models.Model):
pollutionsid = models.IntegerField(unique=True, primary_key=True)
year = models.IntegerField()
plantid = models.ForeignKey(Plants, models.DO_NOTHING, db_column='plantid')
pollutant = models.TextField()
releasesto = models.TextField(blank=True, null=True)
amount = models.FloatField(db_column="amount", blank=True, null=True)
class Meta:
managed = False
db_table = 'pollutions'
unique_together = (('plantid', 'releasesto', 'pollutant', 'year'))
class Monthp(models.Model):
monthpid = models.IntegerField(unique=True, primary_key=True)
year = models.IntegerField()
month = models.IntegerField()
plantid = models.ForeignKey(Plants, models.DO_NOTHING, db_column='plantid')
power = models.IntegerField(null=False)
class Meta:
managed = False
db_table = 'monthp'
unique_together = ('plantid', 'year', 'month')
I'd like to annotate - based on a foreign key relationship and a fiter a value, particulary - to each plant the amount of co2 and the Sum of its power for a given year. For sake of debugging having replaced Sum by Count using the following query:
annotated = tmp.all().annotate(
energy=Count('monthp__power', filter=Q(monthp__year=YEAR)),
co2=Count('pollutions__amount', filter=Q(pollutions__year=YEAR, pollutions__pollutant="CO2", pollutions__releasesto="Air")))
However this returns too many items (a wrong number using Sum, respectively)
annotated.first().co2 # 60, but it should be 1
annotated.first().energy # 252, but it should be 1
although my database guarantees - as denoted, that (plantid, year, month) and (plantid, releasesto, pollutant, year) are unique together, which can easily be demonstrated:
pl = annotated.first().plantid
testplant = Plants.objects.get(pk=pl) # plant object
pco2 = Pollutions.objects.filter(plantid=testplant, year=YEAR, pollutant="CO2", releasesto="Air")
len(pco2) # 1, as expected
Why does django return to many results and how can I tell django to limit the elements to annotate to the 'current primary key' in other words to only annotate the elements where the foreign key matches the primary key?
I can achieve what I intend to do by using distinct and Max:
energy=Sum('yearly__power', distinct=True, filter=Q(yearly__year=YEAR)),
co2=Max('pollutions__amount', ...
However the performance is inacceptable.
I have tested to use model_to_dict and appending the wanted values "by hand" to the dict, which works for the values itself, but not for sorting the resulted dict (e.g. by energy) and it is acutally faster than the workaround directly above.
It conceptually strikes to me that the manual approach is faster than letting the database do, what it is intended to do.
Is this a feature limitation of django's orm or am I missing something?
EDIT1:
The behaviour is known as bug since 11 years.
Even others "spent a whole day on this".
I am now trying it with subqueries. However the forein key I am using is not a primary key of its table. So the kind of "usual" approach to use "pk=''" does not work. More clearly, trying:
tmp = Plants.objects.filter(somefilter)
subq1 = Subquery(Yearly.objects.filter(pk=OuterRef('plantid'), year=YEAR)) tmp1 = tmp.all().annotate(
energy=Count(Subquery(subq1))
)
returns
OperationalError at /xyz
no such column: U0.yid
Which definitely makes sense because Plants has no clue what a yid is, it only knows plantids. How do I adjust the subquery to that?

Django query joining two tables

I am new in django. I want to create a query in django I tried with select_related but I don't know how to insert the second part of the condition: AND model1.qty >= model2.items
I've tried:
Model1.objects.select_related('model2).filter(model1.qty__gte=?)
But it's not working properly.
Below is the SQL query which I want to implement with django queryset:
SELECT model1.name,model2.name WHERE model1.id=model2.model1.id AND model1.qty >= model2.items
My models:
class Article(models.Model):
date_crea = models.DateTimeField('Créer le', auto_now_add=True)
designation = models.TextField('designation', max_length=500)
seuil = models.IntegerField('Seuil d\'alerte')
class Stock(models.Model):
date_crea = models.DateTimeField('Créer le', auto_now_add=True)
article = models.ForeignKey(Article, on_delete=models.CASCADE)
qte_reel = models.IntegerField('stock reel',default=0)
You use an F expression to refer to the value of a field in the database. There have to be relations to follow from the current object to the field on the object that you want to compare against.
I'm not clear how the question relates to the posted models, but an F expression can follow a foreign key so
Stock.objects.filter( qte_reel__gte = F( 'article__seuil' ))
would work, if those fields are the ones you want to compare.

Annotate QuerySet using raw SQL

In case I am asking the wrong question, let me first state the end goal: I need to allow users to filter a ListView by a field that's not in the primary model (Salesleadquote), but instead the field comes from a model (Salesleadbusinessgroup) with a FK to a related model (Saleslead).
The way I am trying to approach this is by annotating a field on Salesleadquote.
The models:
class Salesleadquote(models.Model):
salesleadquoteid = models.AutoField(db_column='SalesLeadQuoteId', primary_key=True)
salesleadid = models.ForeignKey(Saleslead, models.DO_NOTHING, db_column='SalesLeadId')
...
class Saleslead(models.Model):
salesleadid = models.AutoField(db_column='SalesLeadId', primary_key=True)
...
class Salesleadbusinessgroup(models.Model):
salesleadbusinessgroupid = models.AutoField(db_column='SalesLeadBusinessGroupId', primary_key=True)
salesleadid = models.ForeignKey(Saleslead, models.DO_NOTHING, db_column='SalesLeadId')
businessgroupid = models.ForeignKey(Businessgroup, models.DO_NOTHING, db_column='BusinessGroupId')
The desired result (queryset), in SQL:
SELECT slq.*, slbg.BusinessGroupId FROM crm.SalesLeadQuote slq
LEFT JOIN
(SELECT SalesLeadId, BusinessGroupId
FROM crm.SalesLeadBusinessGroup ) slbg
ON slbg.SalesLeadId = slq.SalesLeadId
WHERE slbg.BusinessGroupId IN (5,21)
I know I can get a RawQuerySet by doing something like
Salesleadquote.objects.raw("SELECT salesleadquote.*, \
salesleadbusinessgroup.businessgroupid \
FROM salesleadquote \
LEFT JOIN salesleadbusinessgroup \
ON salesleadquote.salesleadid = salesleadbusinessgroup.salesleadid \
WHERE salesleadbusinessgroup.businessgroupid IN (5,21)")
But I need the functionality of a QuerySet, so my idea was to annotate the desired field (businessgroupid) in Salesleadquote, but I've been struggling with how to accomplish this.
I implemented a work-around that doesn't address my original question but works for my use case. I created a view at the database level (called SalesLeadQuoteBG) using the SQL I had posted and then tied that to a model to use with Django's ORM.
class Salesleadquotebg(models.Model):
"""
This model represents a database view that extends the Salesleadquote table with a business group id column.
There can be multiple business groups per quote, resulting in duplicate quotes, but this is handled at the view and template layer
via filtering (users are required to select a business group).
"""
salesleadquoteid = models.IntegerField(db_column='SalesLeadQuoteId', primary_key=True) # Field name made lowercase.
salesleadid = models.ForeignKey(Saleslead, models.DO_NOTHING, db_column='SalesLeadId') # Field name made lowercase.
...
businessgroupid = models.ForeignKey(Businessgroup, models.DO_NOTHING, db_column='BusinessGroupId')
I am using django-filters for the filtering.
filters.py:
BG_CHOICES = (
(5, 'Machine Vision'),
(21, 'Process Systems'),
)
class BGFilter(django_filters.FilterSet):
businessgroupid = django_filters.ChoiceFilter(choices=BG_CHOICES)
class Meta:
model = Salesleadquotebg
fields = ['businessgroupid', ]
This can be done with pure Django ORM. Relationships can be followed backwards (filtering on salesleadbusinessgroup) and the double underscore syntax can be used to query attributes of the related model and also follow more relationships
Salesleadquote.objects.filter(
salesleadbusinessgroup__businessgroupid__in=(5,21)
)

How to return all records but exclude the last item

I'm having problem filtering in django-models.
I want to return all records of a particular animal but excluding the last item based on the latest created_at value and sorted in a descending order.
I have this model.
class Heat(models.Model):
# Fields
performer = models.CharField(max_length=25)
is_bred = models.BooleanField(default=False)
note = models.TextField(max_length=250, blank=True, null=True)
result = models.BooleanField(default=False)
# Relationship Fields
animal = models.ForeignKey(Animal, related_name='heats', on_delete=models.CASCADE)
created_at = models.DateTimeField(auto_now_add=True, editable=False)
last_updated = models.DateTimeField(auto_now=True, editable=False)
I was able to achieved the desired result by this raw sql script. But I want a django approach.
SELECT
*
FROM
heat
WHERE
heat.created_at != (SELECT MAX((heat.created_at)) FROM heat)
AND heat.animal_id = '2' ORDER BY heat.created_at DESC;
Please help.
It will be
Heat.objects.order_by("-created_at")[1:]
For a particular animal it will then be:
Heat.objects.filter(animal_id=2).order_by("-created_at")[1:]
where [1:] on a queryset has a regular python slice syntax and generates the correct SQL code. (In this case simply removes the first / most recently created element)
Upd: as #schwobaseggl mentioned, in the comments, slices with negative index don't work on django querysets. Therefore the objects are reverse ordered first.
I just converted your SQL query to Django ORM code.
First, fetch the max created_at value using aggregation and do an exclude.
from django.db.models import Max
heat_objects = Heat.objects.filter(
animal_id=2
).exclude(
created_at=Heat.objects.all().aggregate(Max('created_at'))['created_at__max']
)
Get last record:
obj= Heat.objects.all().order_by('-id')[0]
Make query:
query = Heat.objects.filter(animal_id=2).exclude(id=obj['id']).all()
The query would be :
Heat.objects.all().order_by('id')[1:]
You could also put any filter you require by replacing all()

django querset filter foreign key select first record

I have a History model like below
class History(models.Model):
class Meta:
app_label = 'subscription'
ordering = ['-start_datetime']
subscription = models.ForeignKey(Subscription, related_name='history')
FREE = 'free'
Premium = 'premium'
SUBSCRIPTION_TYPE_CHOICES = ((FREE, 'Free'), (Premium, 'Premium'),)
name = models.CharField(max_length=32, choices=SUBSCRIPTION_TYPE_CHOICES, default=FREE)
start_datetime = models.DateTimeField(db_index=True)
end_datetime = models.DateTimeField(db_index=True, blank=True, null=True)
cancelled_datetime = models.DateTimeField(blank=True, null=True)
Now i have a queryset filtering like below
users = get_user_model().objects.all()
queryset = users.exclude(subscription__history__end_datetime__lt=timezone.now())
The issue is that in the exclude above it is checking end_datetime for all the rows for a particular history object. But i only want to compare it with first row of history object.
Below is how a particular history object looks like. So i want to write a queryset filter which can do datetime comparison on first row only.
You could use a Model Manager method for this. The documentation isn't all that descriptive, but you could do something along the lines of:
class SubscriptionManager(models.Manager):
def my_filter(self):
# You'd want to make this a smaller query most likely
subscriptions = Subscription.objects.all()
results = []
for subscription in subscriptions:
sub_history = subscription.history_set.first()
if sub_history.end_datetime > timezone.now:
results.append(subscription)
return results
class History(models.Model):
subscription = models.ForeignKey(Subscription)
end_datetime = models.DateTimeField(db_index=True, blank=True, null=True)
objects = SubscriptionManager()
Then: queryset = Subscription.objects().my_filter()
Not a copy-pastable answer, but shows the use of Managers. Given the specificity of what you're looking for, I don't think there's a way to get it just via the plain filter() and exclude().
Without knowing what your end goal here is, it's hard to say whether this is feasible, but have you considered adding a property to the subscription model that indicates whatever you're looking for? For example, if you're trying to get everyone who has a subscription that's ending:
class Subscription(models.Model):
#property
def ending(self):
if self.end_datetime > timezone.now:
return True
else:
return False
Then in your code: queryset = users.filter(subscription_ending=True)
I have tried django's all king of expressions(aggregate, query, conditional) but was unable to solve the problem so i went with RawSQL and it solved the problem.
I have used the below SQL to select the first row and then compare the end_datetime
SELECT (end_datetime > %s OR end_datetime IS NULL) AS result
FROM subscription_history
ORDER BY start_datetime DESC
LIMIT 1;
I will select my answer as accepted if not found a solution with queryset filter chaining in next 2 days.