Django ORM and hitting DB - django

When I do something like
I. objects = Model.objects.all()
and then
II. objects.filter(field_1=some_condition)
I hit db every time when on the step 2 with various conditions. Is there any way to get all data in first action and then just take care of the result?

You actually don't hit the db until you evaluate the qs, queries are lazy.
Read more here.
edit:
After re-reading your question it becomes apparent you were asking how to prevent db hits when filtering for different conditions.
qs = SomeModel.objects.all()
qs1 = qs.filter(some_field='some_value')
qs2 = qs.filter(some_field='some_other_value')
Usually you would want the database to do the filtering for you.
You could force an evaluation of the qs by converting it to a list. This would prevent further db hits, however it would likely be worse than having your db return you results.
qs_l = list(qs)
qs1_l = [element for element in qs_l if element.some_field='some_value']
qs2_l = [element for element in qs_l if element.some_field='some_other_value']

Of course you will hit db every time. filter() transforms to SQL statement which is executed by your db, you can't filter without hitting it. So you can retrieve all the objects you need with values() or list(Model.objects.all()) and, as zeekay suggested, use Python expressions (like list comprehensions) for additional filtering.

Why don't you just do objs = Model.objects.filter(field=condition)? That said, once the SQL query is executed you can use Python expressions to do further filtering/processing without incurring additional database hits.

Related

Forcing evaluation of prefetched results in django queries

I'm trying to use select_related/prefetch_related to optimize some queries. However I have issues in "forcing" the queries to be evaluated all at once.
Say I'm doing the following:
fp_query = Fourprod.objects.filter(choisi=True).select_related("fk_fournis")
pf = Prefetch("fourprod", queryset=fp_query) #
products = Products.objects.filter(id__in=fp_query).prefetch_related(pf)
With models:
class Fourprod(models.Model):
fk_produit = models.ForeignKey(to=Produit, related_name="fourprod")
fk_fournis = models.ForeignKey(to=Fournis,related_name="fourprod")
choisi = models.BooleanField(...)
class Produit(models.Model):
... ordinary fields...
class Fournis(models.Model):
... ordinary fields...
So essentially, Fourprod has a fk to Fournis, Produit, and I want to prefetch those when I build the Produits queryset. I've checked in debug that the prefetch actually occurs and it does.
I have a bunch of fields from different models I need to use to compute results. I don't really control the table structure, so I have to work with this. I can't come up with a reasonable query to do it all with the queries (or using raw), so I want to compute stuff python-side. It's a few 1000 objects, so reasonable to do in-memory. So I cast to a list to force the query evaluation:
products = list(products)
At this point, I would think that the Products and the related objects that I have pre-fetched should have been fetched from the DB. In the logs, just after the list() call, I get this:
02/08/22 15:21:08 DEBUG DEFAULT: (0.019) SELECT "products_fourprod"."id", "products_fourprod"."fk_produit_id", "products_fourprod"."fk_fournis_id", "products_fourprod"."choisi", "products_fourprod"."code_four", "products_fourprod"."prix", "products_fourprod"."comment", "products_fournis"."id", "products_fournis"."fk_user_create_id", "products_fournis"."nom", "products_fournis"."adresse", "products_fournis"."ville", "products_fournis"."tel", "products_fournis"."fax", "products_fournis"."contact", "products_fournis"."note", "products_fournis"."pays", "products_fournis"."province", "products_fournis"."postal", "products_fournis"."monnaie", "products_fournis"."tel_long", "products_fournis"."inactif", "products_fournis"."inuse", "products_fournis"."par", "products_fournis"."fk_langue", "products_fournis"."NOTE2" FROM "products_fourprod" LEFT OUTER JOIN "products_fournis" ON ("products_fourprod"."fk_fournis_id" = "products_fournis"."id") WHERE ("products_fourprod"."choisi" AND "products_fourprod"."fk_produit_id" IN (... all Product.id meeting the conditions...)
But then, the list comprehension using the products takes forever to complete:
rows = [[p.id, p.fourprod.first().id, p.desuet, p.no_prod, ... ] for p in products]
With apparently each single call to p.fourprod resulting in a DB hit:
02/08/22 15:26:19 DEBUG DEFAULT: (0.000) SELECT "products_fourprod"."id", "products_fourprod"."fk_produit_id", "products_fourprod"."fk_fournis_id", "products_fourprod"."choisi", "products_fourprod"."code_four", "products_fourprod"."prix", "products_fourprod"."comment", "products_fournis"."id", "products_fournis"."fk_user_create_id", "products_fournis"."nom", "products_fournis"."adresse", "products_fournis"."ville", "products_fournis"."tel", "products_fournis"."fax", "products_fournis"."contact", "products_fournis"."note", "products_fournis"."pays", "products_fournis"."province", "products_fournis"."postal", "products_fournis"."monnaie", "products_fournis"."tel_long", "products_fournis"."inactif", "products_fournis"."inuse", "products_fournis"."par", "products_fournis"."fk_langue", "products_fournis"."NOTE2" FROM "products_fourprod" LEFT OUTER JOIN "products_fournis" ON ("products_fourprod"."fk_fournis_id" = "products_fournis"."id") WHERE ("products_fourprod"."choisi" AND "products_fourprod"."fk_produit_id" = 1185) ORDER BY "products_fourprod"."id" ASC LIMIT 1; args=(1185,)
02/08/22 15:26:19 DEBUG DEFAULT: (0.000) SELECT "products_fourprod"."id", (.... more similar db hits... )
If I remove all the uses of related objects, then the list() call has actually forced the db hit already and the query executes quickly.
So.... if simply calling products = list(products) does not force the db to be queried for the prefetched objects as well, is there any ways I can make django's orm do so?
From the docs:
Remember that, as always with QuerySets, any subsequent chained methods which imply a different database query will ignore previously cached results, and retrieve data using a fresh database query.
first() implies a database query, so that will cause your query to not use the prefetched values.
Try to use p.fourprod.all()[0] instead to access the first related fourprod instead.

Django QuerySet update performance

Which one would be better for performance?
We take a slice of products. which make us impossible to bulk update.
products = Product.objects.filter(featured=True).order_by("-modified_on")[3:]
for product in products:
product.featured = False
product.save()
or (invalid)
for product in products.iterator():
product.update(featured=False)
I have tried QuerySet's in statement too as following.
Product.objects.filter(pk__in=products).update(featured=False)
This line works fine on SQLite. But, it rises following exception on MySQL. So, I couldn't use that.
DatabaseError: (1235, "This version of MySQL doesn't yet support
'LIMIT & IN/ALL/ANY/SOME subquery'")
Edit: Also iterator() method causes re-evaluate the query. So, it is bad for performance.
As #Chris Pratt pointed out in comments, the second example is invalid because the objects don't have update methods. Your first example will require queries equal to results+1 since it has to update each object. That might really be costly if you have 1000 products. Ideally you do want to reduce this to a more fixed expense if possible.
This is a similar situation to another question:
Django: Cannot update a query once a slice has been taken
That being said, you would have to do it in at least 2 queries, but you have to be a bit sneaky on how to construct the LIMIT...
Using Q objects for complex queries:
# get the IDs we want to exclude
products = Product.objects.filter(featured=True).order_by("-modified_on")[:3]
# flatten them into just a list of ids
ids = products.values_list('id', flat=True)
# Now use the Q object to construct a complex query
from django.db.models import Q
# This builds a list of "AND id NOT EQUAL TO i"
limits = [~Q(id=i) for i in ids]
Product.objects.filter(featured=True, *limits).update(featured=False)
In some cases it's acceptable to cache QuerySet in array
products = list(products)
Product.objects.filter(pk__in=products).update(featured=False)
Small optimization with values_list
products_id = list(products.values_list('id', flat=True)
Product.objects.filter(pk__in=products_id).update(featured=False)

django's .extra(where= clauses are clobbered by table-renaming .filter(foo__in=... subselects

The short of it is, the table names of all queries that are inside a filter get renamed to u0, u1, ..., so my extra where clauses won't know what table to point to. I would love to not have to hand-make all the queries for every way I might subselect on this data, and my current workaround is to turn my extra'd queries into pk values_lists, but those are really slow and something of an abomination.
Here's what this all looks like. You can mostly ignore the details of what goes in the extra of this manager method, except the first sql line which points to products_product.id:
def by_status(self, *statii):
return self.extra(where=["""products_product.id IN
(SELECT recent.product_id
FROM (
SELECT product_id, MAX(start_date) AS latest
FROM products_productstatus
GROUP BY product_id
) AS recent
JOIN products_productstatus AS ps ON ps.product_id = recent.product_id
WHERE ps.start_date = recent.latest
AND ps.status IN (%s))""" % (', '.join([str(stat) for stat in statii]),)])
Which works wonderfully for all the situations involving only the products_product table.
When I want these products as a subselect, i do:
Piece.objects.filter(
product__in=Product.objects.filter(
pk__in=list(
Product.objects.by_status(FEATURED).values_list('id', flat=True))))
How can I keep the generalized abilities of a query set, yet still use an extra where clause?
At first: the issue is not totally clear to me. Is the second code block in your question the actual code you want to execute? If this is the case the query should work as expected since there is no subselect performed.
I assume so that you want to use the second code block without the list() around the subselect to prevent a second query being performed.
The django documentation refers to this issue in the documentation about the extra method. However its not very easy to overcome this issue.
The easiest but most "hakish" solution is to observe which table alias is produced by django for the table you want to query in the extra method. You can rely on the persistent naming of this alias as long as you construct the query always in the same fashion (you don't change the order of multiple extra methods or filter calls that cause a join).
You can inspect a query that will be execute in the DB queryset by using:
print Model.objects.filter(...).query
This will reveal the aliases that are used for the tables you want to query.
As of Django 1.11, you should be able to use Subquery and OuterRef to generate an equivalent query to your extra (using a correlated subquery rather than a join):
def by_status(self, *statii):
return self.filter(
id__in=Subquery(ProductStatus.values("product_id").filter(
status__in=statii,
product__in=Subquery(ProductStatus.objects.values(
"product_id",
).annotate(
latest=Max("start_date"),
).filter(
latest=OuterRef("start_date"),
).values("product_id"),
),
)
You could probably do it with Window expressions as well (as of Django 2.0).
Note that this is untested, so may need some tweaks.

fast lookup for the last element in a Django QuerySet?

I've a model called Valor. Valor has a Robot. I'm querying like this:
Valor.objects.filter(robot=r).reverse()[0]
to get the last Valor the the r robot. Valor.objects.filter(robot=r).count() is about 200000 and getting the last items takes about 4 seconds in my PC.
How can I speed it up? I'm querying the wrong way?
The optimal mysql syntax for this problem would be something along the lines of:
SELECT * FROM table WHERE x=y ORDER BY z DESC LIMIT 1
The django equivalent of this would be:
Valor.objects.filter(robot=r).order_by('-id')[:1][0]
Notice how this solution utilizes django's slicing method to limit the queryset before compiling the list of objects.
If none of the earlier suggestions are working, I'd suggest taking Django out of the equation and run this raw sql against your database. I'm guessing at your table names, so you may have to adjust accordingly:
SELECT * FROM valor v WHERE v.robot_id = [robot_id] ORDER BY id DESC LIMIT 1;
Is that slow? If so, make your RDBMS (MySQL?) explain the query plan to you. This will tell you if it's doing any full table scans, which you obviously don't want with a table that large. You might also edit your question and include the schema for the valor table for us to see.
Also, you can see the SQL that Django is generating by doing this (using the query set provided by Peter Rowell):
qs = Valor.objects.filter(robot=r).order_by('-id')[0]
print qs.query
Make sure that SQL is similar to the 'raw' query I posted above. You can also make your RDBMS explain that query plan to you.
It sounds like your data set is going to be big enough that you may want to denormalize things a little bit. Have you tried keeping track of the last Valor object in the Robot object?
class Robot(models.Model):
# ...
last_valor = models.ForeignKey('Valor', null=True, blank=True)
And then use a post_save signal to make the update.
from django.db.models.signals import post_save
def record_last_valor(sender, **kwargs):
if kwargs.get('created', False):
instance = kwargs.get('instance')
instance.robot.last_valor = instance
post_save.connect(record_last_valor, sender=Valor)
You will pay the cost of an extra db transaction when you create the Valor objects but the last_valor lookup will be blazing fast. Play with it and see if the tradeoff is worth it for your app.
Well, there's no order_by clause so I'm wondering about what you mean by 'last'. Assuming you meant 'last added',
Valor.objects.filter(robot=r).order_by('-id')[0]
might do the job for you.
django 1.6 introduces .first() and .last():
https://docs.djangoproject.com/en/1.6/ref/models/querysets/#last
So you could simply do:
Valor.objects.filter(robot=r).last()
Quite fast should also be:
qs = Valor.objects.filter(robot=r) # <-- it doesn't hit the database
count = qs.count() # <-- first hit the database, compute a count
last_item = qs[ count-1 ] # <-- second hit the database, get specified rownum
So, in practice you execute only 2 SQL queries ;)
Model_Name.objects.first()
//To get the first element
Model_name.objects.last()
//For get last()
in my case, the last is not work because there is only one row in the database
maybe help full for you too :)
Is there a limit clause in django? This way you can have the db, simply return a single record.
mysql
select * from table where x = y limit 1
sql server
select top 1 * from table where x = y
oracle
select * from table where x = y and rownum = 1
I realize this isn't translated into django, but someone can come back and clean this up.
The correct way of doing this, is to use the built-in QuerySet method latest() and feeding it whichever column (field name) it should sort by. The drawback is that it can only sort by a single db column.
The current implementation looks like this and is optimized in the same sense as #Aaron's suggestion.
def latest(self, field_name=None):
"""
Returns the latest object, according to the model's 'get_latest_by'
option or optional given field_name.
"""
latest_by = field_name or self.model._meta.get_latest_by
assert bool(latest_by), "latest() requires either a field_name parameter or 'get_latest_by' in the model"
assert self.query.can_filter(), \
"Cannot change a query once a slice has been taken."
obj = self._clone()
obj.query.set_limits(high=1)
obj.query.clear_ordering()
obj.query.add_ordering('-%s' % latest_by)
return obj.get()

Django Object Filter (last 1000)

How would one go about retrieving the last 1,000 values from a database via a Objects.filter? The one I am currently doing is bringing me the first 1,000 values to be entered into the database (i.e. 10,000 rows and it's bringing me the 1-1000, instead of 9000-1,000).
Current Code:
limit = 1000
Shop.objects.filter(ID = someArray[ID])[:limit]
Cheers
Solution:
queryset = Shop.objects.filter(id=someArray[id])
limit = 1000
count = queryset.count()
endoflist = queryset.order_by('timestamp')[count-limit:]
endoflist is the queryset you want.
Efficiency:
The following is from the django docs about the reverse() queryset method.
To retrieve the ''last'' five items in
a queryset, you could do this:
my_queryset.reverse()[:5]
Note that this is not quite the same
as slicing from the end of a sequence
in Python. The above example will
return the last item first, then the
penultimate item and so on. If we had
a Python sequence and looked at
seq[-5:], we would see the fifth-last
item first. Django doesn't support
that mode of access (slicing from the
end), because it's not possible to do
it efficiently in SQL.
So I'm not sure if my answer is merely inefficient, or extremely inefficient. I moved the order_by to the final query, but I'm not sure if this makes a difference.
reversed(Shop.objects.filter(id=someArray[id]).reverse()[:limit])