I have Django code similar to this:
for obj in some_list:
    m1obj = Model1.objects.get(a=obj.a, b=obj.b, c=obj.c)
    Model2(m1=m1obj, d=obj.d, e='foo').save()
I optimized the insert into Model2 using bulk_create; however, this is still painfully slow because of the get from Model1 (~45 sec for 3k inserts).
I also tried adding:
class Meta:
    index_together = [
        ('a', 'b', 'c'),
    ]
    unique_together = [
        ('a', 'b', 'c'),
    ]
The unique_together helps a little; index_together didn't seem to have much effect.
I have a cumbersome workaround for this doing:
Filter Model1, getting all the objects I will need, ordered by one or more keys, e.g. order_by('a', 'b'), and make sure Django caches the result, e.g. with len().
Use binary search (from bisect import bisect_left) to locate the first a, then b, etc. (although there are far fewer bs and cs, so just iterating is about the same).
This reduces the insert time to just over 3 seconds!
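For concreteness, a minimal sketch of that workaround (find_m1 is an illustrative helper name; the Model1 query should be filtered down to just the rows the loop needs):

from bisect import bisect_left

cached = list(Model1.objects.order_by('a', 'b', 'c'))  # list() forces evaluation, like len()
keys = [(m.a, m.b, m.c) for m in cached]

def find_m1(a, b, c):
    # binary-search the sorted key list for an exact (a, b, c) match
    i = bisect_left(keys, (a, b, c))
    if i != len(keys) and keys[i] == (a, b, c):
        return cached[i]
    raise Model1.DoesNotExist((a, b, c))

Model2.objects.bulk_create([
    Model2(m1=find_m1(obj.a, obj.b, obj.c), d=obj.d, e='foo')
    for obj in some_list
])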
There must be a better, cleaner and maintainable way to do this. Any suggestions?
Is there a way to filter/get (smartly) within Django's cache query results?
EDIT: Changed d='foo' to d=obj.d - any bulk get needs to be mappable back to the tuple it belongs to, otherwise I cannot create the corresponding Model2 entry.
You can make a single query (as described here) that will fetch only the results you need, so there is no need to sort and binary-search afterwards.
I have not tested it, so I don't know whether it will be faster than what you are already doing. Also, since the SQL query will be large (proportional to the number of records in some_list), it might raise an error if it exceeds the size set in MySQL by the max_allowed_packet parameter (16 MB by default, as mentioned here).
import operator
from functools import reduce  # built in on Python 2; needs this import on Python 3

from django.db.models import Q

query = reduce(operator.or_, (Q(a=obj.a, b=obj.b, c=obj.c) for obj in some_list))
model1_objs = Model1.objects.filter(query)
Then you can do bulk_create with Model2.
Model2.objects.bulk_create([
    Model2(m1=m1, d='foo', e='bar')
    for m1 in model1_objs
])
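Per the question's edit, each fetched Model1 row also needs to be matched back to the obj it came from so that d=obj.d can be set. A minimal sketch of that mapping, assuming (a, b, c) is unique (which the unique_together constraint guarantees):

# key each fetched Model1 row by its (a, b, c) tuple for O(1) lookup
m1_by_key = {(m1.a, m1.b, m1.c): m1 for m1 in model1_objs}

Model2.objects.bulk_create([
    Model2(m1=m1_by_key[(obj.a, obj.b, obj.c)], d=obj.d, e='foo')
    for obj in some_list
])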
How many rows does Model1 have? If it is relatively small (fewer than 50k), you can fetch everything with filter and then compare the tuples in Python.
What if some_list is a small list (fewer than 100 items)? If it is, you can use Q objects to filter everything at once.
first = some_list.pop()
conditions = Q(a=first.a, b=first.b, c=first.c)
for obj in some_list:
    conditions |= Q(a=obj.a, b=obj.b, c=obj.c)
Model1.objects.filter(conditions)  # this will get you all the Model1 rows from your list
Q object Ref: https://docs.djangoproject.com/en/1.9/ref/models/querysets/#q-objects
Related
I have a filter which should return a queryset with 2 objects that differ in a single field. For example:
obj_1 = (name='John', age='23', is_fielder=True)
obj_2 = (name='John', age='23', is_fielder=False)
Both objects are of the same model but have different primary keys. I tried using the filter below:
qs = Model.objects.filter(name='John', age='23').annotate(is_fielder=F('plays__outdoor_game_role')=='Fielder')
I used annotate for the first time, and it gave me the error below:
TypeError: QuerySet.annotate() received non-expression(s): False.
I am new to Django, so what am I doing wrong, and what should the annotate call be to get the required objects shown above?
The solution by @ktowen works well and is quite straightforward.
Here is another solution I am using, hope it is helpful too.
from django.db.models import BooleanField, ExpressionWrapper, Q

queryset = queryset.annotate(
    is_fielder=ExpressionWrapper(
        Q(plays__outdoor_game_role='Fielder'),
        output_field=BooleanField(),
    ),
)
Here are some explanations for those who are not familiar with Django ORM:
annotate makes a new column/field on the fly, in this case is_fielder. This means you do not have a field named is_fielder in your model, yet you can access it like a regular attribute (e.g. obj.is_fielder) on each object in the queryset once you add the annotation. annotate is extremely useful and flexible, can be combined with almost every other expression, and is a must-know method in the Django ORM.
ExpressionWrapper basically gives you space to wrap a more complicated combination of conditions, used in the form ExpressionWrapper(expression, output_field). It is useful when you are combining different types of fields or want to specify an output type, since Django cannot tell the type automatically.
A Q object is a frequently used expression for specifying a condition; the most powerful part is that conditions can be chained:
AND (&): filter(Q(condition1) & Q(condition2))
OR (|): filter(Q(condition1) | Q(condition2))
NOT (~): filter(~Q(condition))
It is possible to mix Q objects with normal keyword conditions, as below. The point is that the Q objects must come first, or it will not work:
filter(Q(condition1) | Q(condition2), id__in=[list])
Case/When(then) can be read simply as if cond1 ... elif cond2 ... elif cond3 .... It is quite powerful, and personally I love using it to build a custom ordering for a queryset.
For example, say you need to return a queryset of watch-history items, and those must be in the order the user watched them. You could do it with a for loop to keep the order, but that would generate plenty of similar queries. A more elegant way with Case/When would be:
from django.db.models import Case, When

item_ids = [list]
ordering = Case(*[When(pk=pk, then=pos)
                  for pos, pk in enumerate(item_ids)])
watch_history = Item.objects.filter(id__in=item_ids).order_by(ordering)
As you can see, Case/When(then) lets you bind these very concrete relations; it is 1) a pinpoint, precise condition expression and 2) especially useful when you have a sequence of multiple conditions.
You can use Case/When with annotate
from django.db.models import Case, BooleanField, Value, When
Model.objects.filter(name='John', age='23').annotate(
    is_fielder=Case(
        When(plays__outdoor_game_role='Fielder', then=Value(True)),
        default=Value(False),
        output_field=BooleanField(),
    ),
)
I'd like to update a table with Django - something like this in raw SQL:
update tbl_name set name = 'foo' where name = 'bar'
My first result is something like this - but that's nasty, isn't it?
objs = ModelClass.objects.filter(name='bar')
for obj in objs:
    obj.name = 'foo'
    obj.save()
Is there a more elegant way?
Update:
Django 2.2 now has a bulk_update.
Old answer:
Refer to the following django documentation section
Updating multiple objects at once
In short you should be able to use:
ModelClass.objects.filter(name='bar').update(name="foo")
You can also use F objects to do things like incrementing rows:
from django.db.models import F
Entry.objects.all().update(n_pingbacks=F('n_pingbacks') + 1)
See the documentation.
However, note that:
This won't call the ModelClass.save method (so if you have some logic inside it, it won't be triggered).
No django signals will be emitted.
You can't perform an .update() on a sliced QuerySet; it must be on an original QuerySet, so you'll need to lean on the .filter() and .exclude() methods.
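For the last point, a common workaround (a sketch, not part of the documentation) is to materialize the primary keys of the slice first, then run update() on a fresh, unsliced queryset:

# update() refuses to run on a sliced QuerySet, so grab the pks of the
# slice first, then update an unsliced queryset restricted to those pks.
pks = list(ModelClass.objects.filter(name='bar')[:100].values_list('pk', flat=True))
ModelClass.objects.filter(pk__in=pks).update(name='foo')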
Consider using django-bulk-update found here on GitHub.
Install: pip install django-bulk-update
Implement: (code taken directly from the project's README file)
import random

from bulk_update.helper import bulk_update

random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
people = Person.objects.all()
for person in people:
    r = random.randrange(4)
    person.name = random_names[r]
bulk_update(people)  # updates all columns using the default db
Update: As Marc points out in the comments, this is not suitable for updating thousands of rows at once. It is, though, suitable for smaller batches of tens to hundreds. The batch size that is right for you depends on your CPU and query complexity. This tool is more like a wheelbarrow than a dump truck.
Django 2.2 now has a bulk_update method (release notes).
https://docs.djangoproject.com/en/stable/ref/models/querysets/#bulk-update
Example:
from django.db import transaction

# get a pk: record dictionary of existing records
updates = YourModel.objects.filter(...).in_bulk()
....
# do something with the updates dict
....
if hasattr(YourModel.objects, 'bulk_update') and updates:
    # Use the new method
    YourModel.objects.bulk_update(updates.values(), [list the fields to update], batch_size=100)
else:
    # The old & slow way
    with transaction.atomic():
        for obj in updates.values():
            obj.save(update_fields=[list the fields to update])
If you want to set the same value on a collection of rows, you can use the update() method combined with any query term to update all rows in one query:
some_list = ModelClass.objects.filter(some condition).values('id')
ModelClass.objects.filter(pk__in=some_list).update(foo=bar)
If you want to update a collection of rows with different values depending on some condition, you can in the best case batch the updates according to value, as sketched below. Say you have 1000 rows where you want to set a column to one of X values; you could prepare the batches beforehand and then run only X update queries (each essentially having the form of the first example above) plus the initial SELECT query.
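A sketch of that batching idea, where compute_value is a hypothetical helper that derives the target value for a row:

from collections import defaultdict

# Group rows by the value they should end up with, then issue one UPDATE
# per distinct value instead of one per row.
batches = defaultdict(list)
for row in ModelClass.objects.all():  # narrow this with your own condition
    batches[compute_value(row)].append(row.pk)

for value, pks in batches.items():
    ModelClass.objects.filter(pk__in=pks).update(foo=value)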
If every row requires a unique value, there is no way to avoid one query per update. Perhaps look into other architectures, like CQRS/event sourcing, if you need performance in this latter case.
Here is some useful content I found on the internet regarding the above question:
https://www.sankalpjonna.com/learn-django/running-a-bulk-update-with-django
The inefficient way
model_qs = ModelClass.objects.filter(name='bar')
for obj in model_qs:
    obj.name = 'foo'
    obj.save()
The efficient way
ModelClass.objects.filter(name='bar').update(name='foo')  # for the single value 'foo'; otherwise use a loop
Using bulk_update
update_list = []
model_qs = ModelClass.objects.filter(name='bar')
for model_obj in model_qs:
    model_obj.name = 'foo'  # or whatever the value should be; for simplicity I'm using 'foo' only
    update_list.append(model_obj)
ModelClass.objects.bulk_update(update_list, ['name'])
Using an atomic transaction
from django.db import transaction

with transaction.atomic():
    model_qs = ModelClass.objects.filter(name='bar')
    for obj in model_qs:
        obj.name = 'foo'
        obj.save()
To update with the same value, we can simply use this:
ModelClass.objects.filter(name = 'bar').update(name='foo')
To update with different values:
obj_list = ModelClass.objects.filter(name='bar')
obj_to_be_updated = []
for obj in obj_list:
    obj.name = 'Dear ' + obj.name
    obj_to_be_updated.append(obj)
ModelClass.objects.bulk_update(obj_to_be_updated, ['name'], batch_size=1000)
It won't trigger the save signal for each object; instead we collect all the objects to be updated in a list and update them in a single operation.
update() returns the number of rows updated in the table:
update_counts = ModelClass.objects.filter(name='bar').update(name="foo")
You can refer to this link for more information on bulk update and create:
Bulk update and Create
Which one would be better for performance?
We take a slice of products, which makes it impossible to bulk update.
products = Product.objects.filter(featured=True).order_by("-modified_on")[3:]
for product in products:
    product.featured = False
    product.save()
or (invalid)
for product in products.iterator():
    product.update(featured=False)
I have also tried QuerySet's in lookup, as follows:
Product.objects.filter(pk__in=products).update(featured=False)
This line works fine on SQLite, but it raises the following exception on MySQL, so I couldn't use it:
DatabaseError: (1235, "This version of MySQL doesn't yet support
'LIMIT & IN/ALL/ANY/SOME subquery'")
Edit: The iterator() method also causes the query to be re-evaluated, so it is bad for performance.
As @Chris Pratt pointed out in the comments, the second example is invalid because the objects don't have update methods. Your first example will require a number of queries equal to results + 1, since it has to update each object. That can be really costly if you have 1000 products. Ideally you want to reduce this to a more fixed expense if possible.
This is a similar situation to another question:
Django: Cannot update a query once a slice has been taken
That being said, you would have to do it in at least 2 queries, and you have to be a bit sneaky about how to construct the LIMIT...
Using Q objects for complex queries:
# get the IDs we want to exclude
products = Product.objects.filter(featured=True).order_by("-modified_on")[:3]
# flatten them into just a list of ids
ids = products.values_list('id', flat=True)
# Now use the Q object to construct a complex query
from django.db.models import Q
# This builds a list of "AND id NOT EQUAL TO i"
limits = [~Q(id=i) for i in ids]
Product.objects.filter(featured=True, *limits).update(featured=False)
In some cases it's acceptable to cache the QuerySet in a list:
products = list(products)
Product.objects.filter(pk__in=products).update(featured=False)
A small optimization with values_list:
products_id = list(products.values_list('id', flat=True))
Product.objects.filter(pk__in=products_id).update(featured=False)
I have a model with three fields
class MyModel(models.Model):
    a = models.ForeignKey(A)
    b = models.ForeignKey(B)
    c = models.ForeignKey(C)
I want to enforce a unique constraint between these fields, and found django's unique_together, which seems to be the solution. However, I already have an existing database, and there are many duplicates. I know that since unique_together works at the database level, I need to unique-ify the rows, and then try a migration.
Is there a good way to remove duplicates (where a duplicate has the same (a, b, c)) so that I can run the migration to add the unique_together constraint?
If you are happy to choose one of the duplicates arbitrarily, I think the following might do the trick. Perhaps not the most efficient but simple enough and I guess you only need to run this once. Please verify this all works yourself on some test data in case I've done something silly, since you are about to delete a bunch of data.
First we find groups of objects which form duplicates. For each group, we (arbitrarily) pick a "master" that we are going to keep. Our chosen method is to pick the one with the lowest pk:
from django.db.models import Min, Count

master_pks = (MyModel.objects
    .values('a', 'b', 'c')
    .annotate(Min('pk'), count=Count('pk'))
    .filter(count__gt=1)
    .values_list('pk__min', flat=True))
We then loop over each master and delete all of its duplicates:
masters = MyModel.objects.in_bulk(list(master_pks))
for master in masters.values():
    # double-check on test data first: this deletes rows!
    MyModel.objects.filter(a=master.a, b=master.b, c=master.c) \
        .exclude(pk=master.pk).delete()
I want to add a slightly improved answer that will delete everything in a single query, instead of looping and deleting for each duplicate group. This will be much faster if you have a lot of records.
from django.db.models import Min, Count

non_dupe_pks = list(
    MyModel.objects.values('a', 'b', 'c')
    .annotate(Min('pk'), count=Count('pk'))
    .order_by()
    .values_list('pk__min', flat=True)
)
dupes = MyModel.objects.exclude(pk__in=non_dupe_pks)
dupes.delete()
It's important to add order_by() in the first query; otherwise the default ordering on the model might interfere with the aggregation.
You can comment out the last line and use dupes.count() to check if the query is working as expected.
I have two models:
class Category(models.Model):
    pass

class Item(models.Model):
    cat = models.ForeignKey(Category)
I am trying to return all Categories for which all of that category's items belong to a given subset of item ids. For example, all categories for which all of the items associated with that category have ids in the set [1, 3, 5].
How could this be done using Django's query syntax (as of 1.1 beta)? Ideally, all the work should be done in the database.
Category.objects.filter(item__id__in=[1, 3, 5])
Django creates the reverse relationship on the model without the foreign key. You can filter on it by using its related name (usually just the model name lowercased, but it can be manually overridden), two underscores, and the field name you want to query on.
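For illustration, if the foreign key declared a custom related name, say cat = models.ForeignKey(Category, related_name='items') (a hypothetical variant, not the asker's model), the same filter would become:

# 'items' replaces the default lowercase model name 'item'
Category.objects.filter(items__id__in=[1, 3, 5]).distinct()

The distinct() call avoids returning the same category several times when more than one of its items matches.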
Let's say you require all items to be in the following set:
allowable_items = set([1,3,4])
One brute-force solution would be to check the item_set for every category, like so:
categories_with_allowable_items = [
    category for category in Category.objects.all()
    if set(item.id for item in category.item_set.all()) <= allowable_items
]
But we don't really have to check all categories: categories_with_allowable_items is always going to be a subset of the categories related to the items whose ids are in allowable_items... so those are all we have to check (and this should be faster):
categories_with_allowable_items = set(
    item.cat for item in
    Item.objects.select_related('cat').filter(pk__in=allowable_items)
    if set(siblingitem.id for siblingitem in item.cat.item_set.all()) <= allowable_items
)
If performance isn't really an issue, then the latter of these two (if not the former) should be fine. If these are very large tables, you might have to come up with a more sophisticated solution. Also, if you're using a particularly old version of Python, remember that you'll have to import the sets module.
I've played around with this a bit. If QuerySet.extra() accepted a "having" parameter I think it would be possible to do it in the ORM with a bit of raw SQL in the HAVING clause. But it doesn't, so I think you'd have to write the whole query in raw SQL if you want the database doing the work.
EDIT:
This is the query that gets you part way there:
from django.db.models import Count
Category.objects.annotate(num_items=Count('item')).filter(num_items=...)
The problem is that for the query to work, "..." needs to be a correlated subquery that looks up, for each category, the number of its items in allowed_items. If .extra had a "having" argument, you'd do it like this:
Category.objects.annotate(num_items=Count('item')).extra(
    having="num_items = (SELECT COUNT(*) FROM app_item "
           "WHERE app_item.id IN %s AND app_item.cat_id = app_category.id)",
    having_params=[allowed_item_ids],
)
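Since .extra() has no having argument, here is a hedged sketch of the raw-SQL route mentioned above, reusing the answer's assumed table and column names (app_category, app_item, cat_id):

from django.db import connection

allowed_item_ids = [1, 3, 5]
placeholders = ', '.join(['%s'] * len(allowed_item_ids))
sql = """
    SELECT c.id
    FROM app_category c
    JOIN app_item i ON i.cat_id = c.id
    GROUP BY c.id
    HAVING COUNT(*) = SUM(CASE WHEN i.id IN (%s) THEN 1 ELSE 0 END)
""" % placeholders

cursor = connection.cursor()
cursor.execute(sql, allowed_item_ids)
# ids of categories whose items all fall within allowed_item_ids
category_ids = [row[0] for row in cursor.fetchall()]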