QuerySet results being duplicated when chaining queries

QuerySet results being duplicated when chaining queries - django

So I am attempting to chain queries together. This is what I am doing
queryset_list = modelEmployee.objects.filter(stars__lte=3)
A = len(queryset_list) #A=2
queryset_list = queryset_list.filter(skills__skill_description__in=skill_filter)
A = len(queryset_list) #A=4
So with the above I am suppose to get two results but I am getting four. Seems like the results of first query are being duplicated in the second thus resulting to 4. Any suggestion on why the results are being duplicated and how I can fix this ? I was expecting to get only two items since it passes both the filters.
This is the model
class modelEmployee(models.Model):
user = models.ForeignKey(User, on_delete=models.CASCADE, null=True, blank=True)
skills = models.ManyToManyField(modelSkill, blank=True)
location = models.PointField(srid=4326,max_length=40, blank=True,null=True)

If you do a query on a ManyToManyField Django will perform an INNER JOIN which means there will be a row for every item on each side of the join. If you want to have unique results use distinct().
queryset_list = queryset_list.filter(
skills__skill_description__in=skill_filter
).distinct()
See this article for some examples.

Related

how to select fields in related tables quickly in django models

I'm trying to get all values in current table, and also get some fields in related tables.
class school(models.Model):
school_name = models.CharField(max_length=256)
school_type = models.CharField(max_length=128)
school_address = models.CharField(max_length=256)
class hometown(models.Model):
hometown_name = models.CharField(max_length=32)
class person(models.Model):
person_name = models.CharField(max_length=128)
person_id = models.CharField(max_length=128)
person_school = models.ForeignKey(school, on_delete=models.CASCADE)
person_ht = models.ForeignKey(hometown, on_delete=models.CASCADE)
how to quick select all info i needed into a dict for rendering.
there will be many records in person, i got school_id input, and want to get all person in this school, and also want these person's hometown_name shown.
i tried like this, can get the info i wanted. And any other quick way to do it?
m=person.objects.filter(person_school_id=1)
.values('id', 'person_name', 'person_id',
school_name=F('person_school__school_name'),
school_address=F('person_school__school_address'),
hometown_name=F('person_ht__hometown_name'))
person_name, person_id, school_name, school_address, hometown_name
if the person have many fields, it will be a hard work for list all values.
what i mean, is there any queryset can join related tables' fields together, which no need to list fields in values.
Maybe like this:
m=person.objects.filter(person_school_id=1).XXXX.values()
it can show all values in school, and all values in hometown together with person's values in m, and i can
for x in m:
print(x.school_name, x.hometown_name, x.person_name)

You add a prefetch_related query on top of your queryset.
prefetch_data = Prefetch('person_set, hometown_set, school_set', queryset=m)
Where prefetch_data will prepare your DB to fetch related tables and m is your original filtered query (so add this below your Person.objects.filter(... )
Then you do the actual query to the DB:
query = query.prefetch_related(prefetch_data)
Where query will be the actual resulting query with a list of Person objects (so add that line below the prefetch_data one).
Example:
m=person.objects.filter(person_school_id=1)
.values('id', 'person_name', 'person_id',
school_name=F('person_school__school_name'),
school_address=F('person_school__school_address'),
hometown_name=F('person_ht__hometown_name'))
prefetch_data = Prefetch('person_set, hometown_set, school_set', queryset=m)
query = query.prefetch_related(prefetch_data)
In that example I've broken down the queries into more manageable pieces, but you can do the whole thing in one big line too (less manageable to read though):
m=person.objects.filter(person_school_id=1)
.values('id', 'person_name', 'person_id',
school_name=F('person_school__school_name'),
school_address=F('person_school__school_address'),
hometown_name=F('person_ht__hometown_name')).prefetch_related('person, hometown, school')

How to return all records but exclude the last item

I'm having problem filtering in django-models.
I want to return all records of a particular animal but excluding the last item based on the latest created_at value and sorted in a descending order.
I have this model.
class Heat(models.Model):
# Fields
performer = models.CharField(max_length=25)
is_bred = models.BooleanField(default=False)
note = models.TextField(max_length=250, blank=True, null=True)
result = models.BooleanField(default=False)
# Relationship Fields
animal = models.ForeignKey(Animal, related_name='heats', on_delete=models.CASCADE)
created_at = models.DateTimeField(auto_now_add=True, editable=False)
last_updated = models.DateTimeField(auto_now=True, editable=False)
I was able to achieved the desired result by this raw sql script. But I want a django approach.
SELECT
*
FROM
heat
WHERE
heat.created_at != (SELECT MAX((heat.created_at)) FROM heat)
AND heat.animal_id = '2' ORDER BY heat.created_at DESC;
Please help.

It will be
Heat.objects.order_by("-created_at")[1:]
For a particular animal it will then be:
Heat.objects.filter(animal_id=2).order_by("-created_at")[1:]
where [1:] on a queryset has a regular python slice syntax and generates the correct SQL code. (In this case simply removes the first / most recently created element)
Upd: as #schwobaseggl mentioned, in the comments, slices with negative index don't work on django querysets. Therefore the objects are reverse ordered first.

I just converted your SQL query to Django ORM code.
First, fetch the max created_at value using aggregation and do an exclude.
from django.db.models import Max
heat_objects = Heat.objects.filter(
animal_id=2
).exclude(
created_at=Heat.objects.all().aggregate(Max('created_at'))['created_at__max']
)

Get last record:
obj= Heat.objects.all().order_by('-id')[0]
Make query:
query = Heat.objects.filter(animal_id=2).exclude(id=obj['id']).all()

The query would be :
Heat.objects.all().order_by('id')[1:]
You could also put any filter you require by replacing all()

django querset filter foreign key select first record

I have a History model like below
class History(models.Model):
class Meta:
app_label = 'subscription'
ordering = ['-start_datetime']
subscription = models.ForeignKey(Subscription, related_name='history')
FREE = 'free'
Premium = 'premium'
SUBSCRIPTION_TYPE_CHOICES = ((FREE, 'Free'), (Premium, 'Premium'),)
name = models.CharField(max_length=32, choices=SUBSCRIPTION_TYPE_CHOICES, default=FREE)
start_datetime = models.DateTimeField(db_index=True)
end_datetime = models.DateTimeField(db_index=True, blank=True, null=True)
cancelled_datetime = models.DateTimeField(blank=True, null=True)
Now i have a queryset filtering like below
users = get_user_model().objects.all()
queryset = users.exclude(subscription__history__end_datetime__lt=timezone.now())
The issue is that in the exclude above it is checking end_datetime for all the rows for a particular history object. But i only want to compare it with first row of history object.
Below is how a particular history object looks like. So i want to write a queryset filter which can do datetime comparison on first row only.

You could use a Model Manager method for this. The documentation isn't all that descriptive, but you could do something along the lines of:
class SubscriptionManager(models.Manager):
def my_filter(self):
# You'd want to make this a smaller query most likely
subscriptions = Subscription.objects.all()
results = []
for subscription in subscriptions:
sub_history = subscription.history_set.first()
if sub_history.end_datetime > timezone.now:
results.append(subscription)
return results
class History(models.Model):
subscription = models.ForeignKey(Subscription)
end_datetime = models.DateTimeField(db_index=True, blank=True, null=True)
objects = SubscriptionManager()
Then: queryset = Subscription.objects().my_filter()
Not a copy-pastable answer, but shows the use of Managers. Given the specificity of what you're looking for, I don't think there's a way to get it just via the plain filter() and exclude().
Without knowing what your end goal here is, it's hard to say whether this is feasible, but have you considered adding a property to the subscription model that indicates whatever you're looking for? For example, if you're trying to get everyone who has a subscription that's ending:
class Subscription(models.Model):
#property
def ending(self):
if self.end_datetime > timezone.now:
return True
else:
return False
Then in your code: queryset = users.filter(subscription_ending=True)

I have tried django's all king of expressions(aggregate, query, conditional) but was unable to solve the problem so i went with RawSQL and it solved the problem.
I have used the below SQL to select the first row and then compare the end_datetime
SELECT (end_datetime > %s OR end_datetime IS NULL) AS result
FROM subscription_history
ORDER BY start_datetime DESC
LIMIT 1;
I will select my answer as accepted if not found a solution with queryset filter chaining in next 2 days.

order_by on Many-to-Many field results in duplicate entries in queryset

I am attempting to perform an order_by based a m2m field, but it ends up creating duplicate entries in my queryset. I have been searching through the django documentation and related questions on stack exchange, but I haven't been able to come up with any solutions.
Models:
class WorkOrder(models.Model):
...
appointment = models.ManyToManyField(Appointment, null=True, blank=True, related_name = 'appointment_from_schedule')
...
class Appointment(models.Model):
title = models.CharField(max_length=1000, blank=True)
allDay = models.BooleanField(default=False)
start = models.DateTimeField()
end = models.DateTimeField(null=True, blank=True)
url = models.URLField(blank=True, null=True)
Query:
qs = WorkOrder.objects.filter(work_order_status="complete").order_by("-appointment__start")
Results:
[<WorkOrder: 45: Davis>, <WorkOrder: 45: Davis>]
In interactive mode:
>>>qs[0] == a[1]
True
>>>qs[0].pk
45
>>>qs[1].pk
45
If I remove the order_by then I get only a single result, but adding it later puts the duplicate entry back in.
>>>qs = WorkOrder.objects.filter(work_order_status="complete")
>>>qs
[<WorkOrder: 45: Davis>]
>>>qs.order_by('appointment__start')
[<WorkOrder: 45: Davis>, <WorkOrder: 45: Davis>]
I have tried adding .distinct() and .distinct('pk'), but the former has no effect and the latter results in an error:
ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER BY expressions

I took suggestions provided by sfletche about using annotate and discussed the problem in freenode.net irc channel #django.
Users FunkyBob and jtiai were able to help me getting it working.
Since there can be many appointments for each work order, when we ask it to order by appointments, it will return a row for every instance of appointment since it doesn't know which appointment I was intending for it to order by.
from django.db.models import Max
WorkOrder.objects.annotate(max_date=Max('appointment__start')).filter(work_order_status="complete").order_by('max_date')
So, we were on the right path it was just about getting the syntax correct.
Thank you for the help sfletche, FunkyBob and jtiai.

You might try using annotate with values:
qs = WorkOrder.objects.filter(work_order_status="complete").values("appointment").annotate(status="work_order_status").order_by("-appointment__start")

Reducing queries for manytomany models in django

EDIT:
It turns out the real question is - how do I get select_related to follow the m2m relationships I have defined? Those are the ones that are taxing my system. Any ideas?
I have two classes for my django app. The first (Item class) describes an item along with some functions that return information about the item. The second class (Itemlist class) takes a list of these items and then does some processing on them to return different values. The problem I'm having is that returning a list of items from Itemlist is taking a ton of queries, and I'm not sure where they're coming from.
class Item(models.Model):
# for archiving purposes
archive_id = models.IntegerField()
users = models.ManyToManyField(User, through='User_item_rel',
related_name='users_set')
# for many to one relationship (tags)
tag = models.ForeignKey(Tag)
sub_tag = models.CharField(default='',max_length=40)
name = models.CharField(max_length=40)
purch_date = models.DateField(default=datetime.datetime.now())
date_edited = models.DateTimeField(auto_now_add=True)
price = models.DecimalField(max_digits=6, decimal_places=2)
buyer = models.ManyToManyField(User, through='Buyer_item_rel',
related_name='buyers_set')
comments = models.CharField(default='',max_length=400)
house_id = models.IntegerField()
class Meta:
ordering = ['-purch_date']
def shortDisplayBuyers(self):
if len(self.buyer_item_rel_set.all()) != 1:
return "multiple buyers"
else:
return self.buyer_item_rel_set.all()[0].buyer.name
def listBuyers(self):
return self.buyer_item_rel_set.all()
def listUsers(self):
return self.user_item_rel_set.all()
def tag_name(self):
return self.tag
def sub_tag_name(self):
return self.sub_tag
def __unicode__(self):
return self.name
and the second class:
class Item_list:
def __init__(self, list = None, house_id = None, user_id = None,
archive_id = None, houseMode = 0):
self.list = list
self.house_id = house_id
self.uid = int(user_id)
self.archive_id = archive_id
self.gen_balancing_transactions()
self.houseMode = houseMode
def ret_list(self):
return self.list
So after I construct Itemlist with a large list of items, Itemlist.ret_list() takes up to 800 queries for 25 items. What can I do to fix this?

Try using select_related
As per a question I asked here

Dan is right in telling you to use select_related.
select_related can be read about here.
What it does is return in the same query data for the main object in your queryset and the model or fields specified in the select_related clause.
So, instead of a query like:
select * from item
followed by several queries like this every time you access one of the item_list objects:
select * from item_list where item_id = <one of the items for the query above>
the ORM will generate a query like:
select item.*, item_list.*
from item a join item_list b
where item a.id = b.item_id
In other words: it will hit the database once for all the data.

You probably want to use prefetch_related
Works similarly to select_related, but can deal with relations selected_related cannot. The join happens in python, but I've found it to be more efficient for this kind of work than the large # of queries.
Related reading on the subject

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

QuerySet results being duplicated when chaining queries - django

Related

how to select fields in related tables quickly in django models

How to return all records but exclude the last item

django querset filter foreign key select first record

order_by on Many-to-Many field results in duplicate entries in queryset

Reducing queries for manytomany models in django

Categories

Resources