Prefetch of ForeignKey in Serializer.is_valid() in Django - django

I am attempting to create many model instances in one POST using a Mixin to support POST of arrays.
My use case will involve creating 1000s of model instances in each call. This very quickly becomes slow with DRF due to each model being created one at a time.
In an attempt to optimise the creation, I switched to bulk_create(). While this does result in a significant improvement, I noticed that for each model instance being created, a SELECT statement was being run to fetch the ForeignKey, which I traced to the call to serializer.is_valid().
As such, adding n instances would result in n SELECT queries to get the ForeignKey and 1 INSERT query.
As an example:
Models (using automatic ID fields):
class Customer(models.Model):
    name = models.CharField(max_length=100, blank=False)
    joined = models.DateTimeField(auto_now_add=True)

class Order(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
    timestamp = models.DateTimeField()
    price = models.FloatField()
POST data to api/orders/:
[
    {
        "customer": 13,
        ...
    },
    {
        "customer": 14,
        ...
    },
    {
        "customer": 14,
        ...
    }
]
This would result in 3 SELECT statements to get the Customer for each of the Orders, followed by 1 INSERT statement to push the data in.
Similar to prefetch_related() for queries when fetching data in GET requests, is there any way to avoid performing so many queries when deserializing and validating (such as setting the serializer to prefetch foreign keys)?
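One hedged workaround (a sketch, not part of the original question): resolve all referenced Customers in a single query before calling is_valid(), and have the serializer's related field look them up from that cache. The CachedCustomerField name and the 'customer_cache' context key are hypothetical; to_internal_value(), fail(), and in_bulk() are standard DRF/Django APIs.

from rest_framework import serializers

class CachedCustomerField(serializers.PrimaryKeyRelatedField):
    # Resolve the FK from a pre-fetched cache instead of one SELECT per row.
    def to_internal_value(self, data):
        cache = self.context.get('customer_cache')
        if cache is None:
            return super().to_internal_value(data)
        try:
            return cache[int(data)]
        except (KeyError, ValueError, TypeError):
            self.fail('does_not_exist', pk_value=data)

class OrderSerializer(serializers.ModelSerializer):
    customer = CachedCustomerField(queryset=Customer.objects.all())

    class Meta:
        model = Order
        fields = ['customer', 'timestamp', 'price']

# In the view, before validation:
# ids = {item['customer'] for item in request.data}
# cache = Customer.objects.in_bulk(ids)   # one SELECT ... WHERE id IN (...)
# serializer = OrderSerializer(data=request.data, many=True,
#                              context={'customer_cache': cache})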

Related

How to apply an arbitrary filter on a specific chained prefetch_related() within Django?

I'm trying to optimize the queries fired by an API. I have four models, namely User, Content, Rating, and UserRating, with some relations to each other. I want the API to return all existing contents alongside their rating counts, as well as the score given by a specific user to each one.
I used to do something like this: Content.objects.all() as a queryset, but I realized that with a huge amount of data, tons of queries will be fired. So I've made some effort to optimize the fired queries using select_related() and prefetch_related(). However, I'm left with an extra Python search that I hope to remove using a controlled prefetch_related(): applying a filter to just one specific prefetch within a nested prefetch and select.
Here are my models:
from django.db import models
from django.conf import settings

class Content(models.Model):
    title = models.CharField(max_length=50)

class Rating(models.Model):
    count = models.PositiveBigIntegerField(default=0)
    content = models.OneToOneField(Content, on_delete=models.CASCADE)

class UserRating(models.Model):
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL, blank=True, null=True, on_delete=models.CASCADE
    )
    score = models.PositiveSmallIntegerField()
    rating = models.ForeignKey(
        Rating, related_name="user_ratings", on_delete=models.CASCADE
    )

    class Meta:
        unique_together = ["user", "rating"]
Here's what I've done so far:
contents = (
    Content.objects.select_related("rating")
    .prefetch_related("rating__user_ratings")
    .prefetch_related("rating__user_ratings__user")
)

for c in contents:  # serializer like
    user_rating = c.rating.user_ratings.all()
    for u in user_rating:  # how to remove this dummy search?
        if u.user_id == 1:
            print(u.score)
Queries:
(1) SELECT "bitpin_content"."id", "bitpin_content"."title", "bitpin_rating"."id", "bitpin_rating"."count", "bitpin_rating"."content_id" FROM "bitpin_content" LEFT OUTER JOIN "bitpin_rating" ON ("bitpin_content"."id" = "bitpin_rating"."content_id"); args=(); alias=default
(2) SELECT "bitpin_userrating"."id", "bitpin_userrating"."user_id", "bitpin_userrating"."score", "bitpin_userrating"."rating_id" FROM "bitpin_userrating" WHERE "bitpin_userrating"."rating_id" IN (1, 2); args=(1, 2); alias=default
(3) SELECT "users_user"."id", "users_user"."password", "users_user"."last_login", "users_user"."is_superuser", "users_user"."first_name", "users_user"."last_name", "users_user"."email", "users_user"."is_staff", "users_user"."is_active", "users_user"."date_joined", "users_user"."user_name" FROM "users_user" WHERE "users_user"."id" IN (1, 4); args=(1, 4); alias=default
As you can see from the fired queries above, I now have only three queries rather than the many that were fired in the past. However, I think I can remove the Python search (the second for loop) by applying a filter on the last query, i.e. "users_user"."id" IN (1,) instead. According to this post and my own efforts, I couldn't apply a .filter(rating__user_ratings__user_id=1) to the third query. In fact, I couldn't match my problem to the Prefetch(..., queryset=...) instance given in this answer.
I think you are looking for the Prefetch object:
https://docs.djangoproject.com/en/4.0/ref/models/querysets/#prefetch-objects
Try this:
from django.db.models import Prefetch

contents = Content.objects.select_related("rating").prefetch_related(
    Prefetch(
        "rating__user_ratings",
        queryset=UserRating.objects.filter(user__id=1),
        to_attr="user_rating_number_1",
    )
)

for c in contents:  # serializer like
    print(c.rating.user_rating_number_1[0].score)
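A usage note not in the original answer: with to_attr, the prefetched rows are stored as a plain Python list, so user_rating_number_1 can be empty for contents the given user never rated. Guarding the [0] access avoids an IndexError:

for c in contents:
    ratings = c.rating.user_rating_number_1  # plain list, possibly empty
    if ratings:
        print(ratings[0].score)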

Django, Create GIN index for child element in JSON Array field

I have a model that uses PostgreSQL and has a field like this:
class MyModel(models.Model):
    json_field = models.JSONField(default=list)
This field contains data like this:
[
    {"name": "AAAAA", "product": "11111"},
    {"name": "BBBBB", "product": "22222"},
]
Now I want to index by the json_field -> product field, because it is used for identification. So I want to create a GinIndex like this:
class Meta:
    indexes = [
        GinIndex(name='product_json_idx', fields=['json_field->product'],
                 opclasses=['jsonb_path_ops'])
    ]
When I try to create the migration, I get an error like this:
'indexes' refers to the nonexistent field 'json_field->product'.
How do I create a GinIndex that will be used for a child attribute in a JSON array?
Please don't use a JSONField [Django-doc] for well-structured data: if the structure is clear, as it is here (a list of objects, each with a name and a product), it makes more sense to work with extra models, like:
class MyModel(models.Model):
    # …
    pass

class Product(models.Model):
    # …
    pass

class Entry(models.Model):
    my_model = models.ForeignKey(MyModel, on_delete=models.CASCADE)
    name = models.CharField(max_length=255)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
This will automatically add indexes on the ForeignKeys, but it will also make querying simpler and usually more efficient.
While databases like PostgreSQL have indeed put effort into making JSON columns easier to query, aggregate, etc., it is usually still better to perform database normalization [wiki], especially since it provides more means for referential integrity, and a lot of aggregates are simpler on flat, relational data.
If, for example, a product is later removed, it will require a lot of work to inspect the JSON blobs to remove that product. This is, however, a scenario that both Django and PostgreSQL cover with ON DELETE triggers, and which will likely be more effective and safe when handled through the Django toolchain.
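As a hypothetical illustration of that point (names taken from the model sketch above), the JSON containment search becomes a plain FK lookup on the normalized schema, served by the automatic ForeignKey index:

# Entries for a given product: an ordinary indexed FK query.
entries = Entry.objects.filter(product_id=product_id)

# Removing a product cleans up its entries via on_delete=models.CASCADE;
# no JSON blobs need to be rewritten.
Product.objects.filter(pk=product_id).delete()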

Race condition when two different users inserting new records to database in Django

There is a race condition when I create a new instance of the Order model.
There is a daily_id field that starts from one each day for every category; that is, every category has its own daily id sequence.
class Order(models.Model):
    daily_id = models.SmallIntegerField(default=0)
    category = models.ForeignKey(Category, on_delete=models.PROTECT,
                                 related_name="orders")
    declare_time = models.DateField()
    ...
The daily_id field of a new record is calculated using this method:
def get_daily_id(category, declare_time):
    try:
        last_order = Order.objects.filter(declare_time=declare_time,
                                          category=category).latest('daily_id')
        return last_order.daily_id + 1
    except Order.DoesNotExist:
        # No order has been registered on the declare_time date yet.
        return 1
The problem is that when two different users register orders in the same category at the same time, it is highly likely that the orders end up with duplicate daily_id values.
I have tried the @transaction.atomic decorator on the post method of my DRF APIView and it didn't work!
You should use an auto-increment primary key and add a view that computes your semantic order, like:
SELECT *, ROW_NUMBER() OVER(PARTITION BY MyDayDate ORDER BY id_autoinc) AS daily_id
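For reference, here is a rough ORM equivalent of that view, a sketch only, assuming the Order model from the question: Django's Window with RowNumber computes the per-category, per-day number at read time, so nothing racy is ever stored.

from django.db.models import F, Window
from django.db.models.functions import RowNumber

orders = Order.objects.annotate(
    daily_number=Window(
        expression=RowNumber(),
        partition_by=[F("category"), F("declare_time")],
        order_by=F("id").asc(),
    )
)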

Query optimisation for FK association on inherited model from the base model

I have users who create (or receive) transactions. The transaction hierarchy I have is a multi-table inheritance, with Transaction as the base model containing the fields common to all transaction types, such as User (FK), amount, etc. I have several transaction types, which extend the Transaction model with type-specific data.
For the sake of this example, a simplified structure illustrating my problem can be found below.
from model_utils.managers import InheritanceManager

class User(models.Model):
    pass

class Transaction(models.Model):
    DEPOSIT = 'deposit'
    WITHDRAWAL = 'withdrawal'
    TRANSFER = 'transfer'
    TYPES = (
        (DEPOSIT, DEPOSIT),
        (WITHDRAWAL, WITHDRAWAL),
        (TRANSFER, TRANSFER),
    )

    type = models.CharField(max_length=24, choices=TYPES)
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    amount = models.PositiveIntegerField()

    objects = InheritanceManager()

    class Meta:
        indexes = [
            models.Index(fields=['user']),
            models.Index(fields=['type'])
        ]

class Withdrawal(Transaction):
    TYPE = Transaction.WITHDRAWAL
    bank_account = models.ForeignKey(BankAccount, on_delete=models.CASCADE)

class Deposit(Transaction):
    TYPE = Transaction.DEPOSIT
    card = models.ForeignKey(Card, on_delete=models.CASCADE)

class Transfer(Transaction):
    TYPE = Transaction.TRANSFER
    recipient = models.ForeignKey(User, on_delete=models.CASCADE)

    class Meta:
        indexes = [
            models.Index(fields=['recipient'])
        ]
I then set each transaction's type in the inherited model's .save() method. This is all fine and well.
The problem comes in when I would like to fetch a user's transactions. Specifically, I require the sub-model instances (deposits, transfers, and withdrawals) rather than the base model (transactions). I also require transactions that the user both created themselves AND transfers they have received. For the former I use django-model-utils's fantastic InheritanceManager, which works great. Except that when I include the filtering on the Transfer submodel's recipient FK field, the DB query time increases by an order of magnitude.
As illustrated above, I have placed indexes on the Transaction user column and the Transfer recipient column. But it appeared to me that what I may need is an index on the Transaction subtype, if that is at all possible. I have attempted to achieve this effect by putting an index on the Transaction type field and including it in the query, as you will see below, but this appears to have no effect. Furthermore, I use .select_related() for the user objects, since they are required in the serializations.
The query is structured as such:
from django.db.models import Q

queryset = Transaction.objects.select_related(
    'user',
    'transfer__recipient'
).select_subclasses().filter(
    Q(user=request.user) |
    Q(type=Transaction.TRANSFER, transfer__recipient=request.user)
).order_by('-id')
So my question is: why is there an order-of-magnitude difference in the DB query when including Transfer.recipient in the query? Have I missed something? Am I doing something silly? Or is there a way I can optimise this further?
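One hedged idea, not from the original post: an OR that mixes a condition on the base table with one on the joined Transfer table can prevent the planner from using either index. Running the two branches as separate queries and merging in Python is sometimes faster; this is only a sketch to profile against the original, not a confirmed fix.

own = list(
    Transaction.objects.select_related('user')
    .select_subclasses()
    .filter(user=request.user)
)
received = list(
    Transaction.objects.select_related('user', 'transfer__recipient')
    .select_subclasses()
    .filter(type=Transaction.TRANSFER, transfer__recipient=request.user)
)
# De-duplicate by primary key and restore the -id ordering.
transactions = sorted(
    {t.pk: t for t in own + received}.values(),
    key=lambda t: t.pk,
    reverse=True,
)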

how to update unique field for multiple entries in django

I have a simple Django model similar to this:
class TestModel(models.Model):
    test_field = LowerCaseCharField(max_length=20, null=False,
                                    verbose_name='Test Field')
    other_test_field = LowerCaseCharField(max_length=20, null=False, unique=True,
                                          verbose_name='Other Test Field')
Notice that other_test_field is a unique field. Now I also have some data stored that looks like this:
[
    {
        test_field: "object1",
        other_test_field: "test1"
    },
    {
        test_field: "object2",
        other_test_field: "test2"
    }
]
All I'm trying to do now is swap the other_test_field values of these two objects, so that the first object has "test2" and the second object has "test1" for other_test_field. How do I accomplish that while preserving uniqueness? Ultimately I'm trying to update data in bulk, not just swap two fields.
Anything that updates the data serially is going to hit an IntegrityError due to a unique-constraint violation, and I don't know a good way to remove the unique constraint temporarily for this one operation before adding it back. Any suggestions?
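A minimal sketch of the usual workaround (not from the original question): inside a single transaction, move one row to a placeholder value that cannot collide, then complete the swap. The placeholder format here is hypothetical.

from django.db import transaction

def swap_other_test_field(a, b):
    with transaction.atomic():
        a_value, b_value = a.other_test_field, b.other_test_field
        a.other_test_field = f"__swap_{a.pk}"  # temporary, non-colliding value
        a.save(update_fields=["other_test_field"])
        b.other_test_field = a_value
        b.save(update_fields=["other_test_field"])
        a.other_test_field = b_value
        a.save(update_fields=["other_test_field"])

On PostgreSQL, another option is declaring the constraint as UniqueConstraint(..., deferrable=Deferrable.DEFERRED) (Django 3.1+), which postpones the uniqueness check to commit time and lets a bulk update swap values directly.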