Django .count() on ManyToMany has become very slow - django

I have a Django project that consists of a scraper of our inventory, run on the server as a cronjob every few hours, and the Django Admin page - which we use to view / access all items.
We have about 30 items that are indexed.
So each 'Scraping Operation' consists of about 30 individual 'Search Operations' each of which get around 500 results per run.
Now, this description is a bit confusing, so I've included the models below.
class ScrapingOperation(models.Model):
    date_started = models.DateTimeField(default=timezone.now, editable=True)
    date_completed = models.DateTimeField(blank=True, null=True)
    completed = models.BooleanField(default=False)
    round = models.IntegerField(default=-1)
    trusted = models.BooleanField(default=True)
class Search(models.Model):
    item = models.ForeignKey(Item, on_delete=models.CASCADE)
    date_started = models.DateTimeField(default=timezone.now, editable=True)
    date_completed = models.DateTimeField(blank=True, null=True)
    completed = models.BooleanField(default=False)
    round = models.IntegerField(default=1)
    scraping_operation = models.ForeignKey(ScrapingOperation, on_delete=models.CASCADE, related_name='searches')
    trusted = models.BooleanField(default=True)
    def total_ads(self):
        return self.ads.count()
class Ad(models.Model):
    item = models.ForeignKey(Item, on_delete=models.CASCADE, related_name='ads')
    title = models.CharField(max_length=500)
    price = models.DecimalField(max_digits=8, decimal_places=2, null=True)
    first_seen = models.DateTimeField(default=timezone.now, editable=True)
    last_seen = models.DateTimeField(default=timezone.now, editable=True)
    def __str__(self):
        return self.title
Now here is the problem we've run into.
On the admin pages for both the Search model and the ScrapingOperation model, we would like to see the number of ads scraped for that particular object. This works fine for our searches, but our implementation for ScrapingOperation has run into problems.
This is the code that we use:
class ScrapingOperationAdmin(admin.ModelAdmin):
    list_display = ['id', 'completed', 'trusted', 'date_started', 'date_completed', 'number_of_ads']
    list_filter = ('completed', 'trusted')
    view_on_site = False
    inlines = [
        SearchInlineAdmin,
    ]
    def number_of_ads(self, instance):
        total_ads = 0
        for search in instance.searches.all():
            total_ads += search.ads.count()
        return total_ads
The problem that we have run into is this: the code works and provides the correct number. However, after roughly 10 ScrapingOperations we noticed that the site started to slow down when loading the page. We are now up to 60 ScrapingOperations, and when we click the ScrapingOperations page in the Django admin it takes almost a minute to load.
Is there a more efficient way to do this? We thought about saving the total number of ads to the model itself, but it seems wasteful to dedicate a field to information that should be accessible with a simple .count() call. Yet our query is evidently so inefficient that the entire site locks down for almost a minute when it is executed. Does anyone have an idea of what we are doing wrong?
Based on the comments below I am currently working on the following solution:
def number_of_ads(self, instance):
    total_ads = 0
    searches = Search.objects.filter(scraping_operation=instance).annotate(Count('ads'))
    for search in searches:
        total_ads += search.ads__count
    return total_ads

Use an annotation when getting the queryset
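from django.db.models import Count
class ScrapingOperationAdmin(admin.ModelAdmin):
    ...
    def get_queryset(self, request):
        qs = super().get_queryset(request)
        # annotate() returns a new queryset, so the result has to be reassigned
        qs = qs.annotate(number_of_ads=Count('searches__ads'))
        return qs
    def number_of_ads(self, instance):
        # Read the annotated value instead of running one count() per search
        return instance.number_of_ads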
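To make the difference concrete without a Django project, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (scraping_operation, search, search_ads) are assumptions standing in for whatever tables Django generates, and the many-to-many link between Search and Ad is assumed because it is not shown in the question. The sketch compares the number of queries issued by the per-search loop with a single grouped query, which is roughly the kind of SQL an annotate(Count(...)) call produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scraping_operation (id INTEGER PRIMARY KEY);
    CREATE TABLE search (
        id INTEGER PRIMARY KEY,
        scraping_operation_id INTEGER REFERENCES scraping_operation(id)
    );
    -- Hypothetical join table for the Search <-> Ad many-to-many relation.
    CREATE TABLE search_ads (search_id INTEGER, ad_id INTEGER);
""")

# 3 operations x 4 searches x 5 ads: tiny stand-ins for 60 x 30 x 500.
search_id = 0
for op_id in range(1, 4):
    conn.execute("INSERT INTO scraping_operation (id) VALUES (?)", (op_id,))
    for _ in range(4):
        search_id += 1
        conn.execute("INSERT INTO search VALUES (?, ?)", (search_id, op_id))
        conn.executemany(
            "INSERT INTO search_ads VALUES (?, ?)",
            [(search_id, ad_id) for ad_id in range(5)],
        )


def counts_with_loop(conn):
    """One query per operation plus one COUNT per search (the slow pattern)."""
    queries = 1  # the query that lists the operations
    totals = {}
    op_ids = [row[0] for row in conn.execute("SELECT id FROM scraping_operation")]
    for op_id in op_ids:
        queries += 1  # fetch the searches of this operation
        search_ids = [row[0] for row in conn.execute(
            "SELECT id FROM search WHERE scraping_operation_id = ?", (op_id,))]
        total = 0
        for sid in search_ids:
            queries += 1  # one COUNT per search
            total += conn.execute(
                "SELECT COUNT(*) FROM search_ads WHERE search_id = ?", (sid,)
            ).fetchone()[0]
        totals[op_id] = total
    return totals, queries


def counts_with_join(conn):
    """A single grouped query that counts ads per operation."""
    rows = conn.execute("""
        SELECT o.id, COUNT(sa.ad_id)
        FROM scraping_operation AS o
        LEFT JOIN search AS s ON s.scraping_operation_id = o.id
        LEFT JOIN search_ads AS sa ON sa.search_id = s.id
        GROUP BY o.id
    """)
    return dict(rows), 1


loop_totals, loop_queries = counts_with_loop(conn)
join_totals, join_queries = counts_with_join(conn)
print(loop_totals == join_totals, loop_queries, join_queries)
```

With these toy numbers the loop issues 16 queries and the join issues 1, while both return the same totals. At 60 operations with about 30 searches each, the loop grows to well over a thousand queries per page load.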

Related

How to use Get and Filter in set to get values related to a particular user

I'm trying to get the orders that a particular user has made. I have an e-commerce app where a user can make purchases. I successfully got the order item to display when a user wants to make an order, but I want to get all the orders already purchased by that user to display on a different page (Order History). I'm trying to use a queryset for the serializers, but it just doesn't work despite several tweaks. I have read the docs but can't seem to get it right. Please help.
Model:
class Order(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL,
                             on_delete=models.CASCADE)
    ref_code = models.CharField(max_length=20, blank=True, null=True)
    items = models.ManyToManyField(eOrderItem)
    start_date = models.DateTimeField(auto_now_add=True)
    ordered_date = models.DateTimeField(null=True)
    ordered = models.BooleanField(default=False)
    payment = models.ForeignKey(
        'Payment', on_delete=models.SET_NULL, blank=True, null=True)
    coupon = models.ForeignKey(
        'Coupon', on_delete=models.SET_NULL, blank=True, null=True)
    being_delivered = models.BooleanField(default=False)
    received = models.BooleanField(default=False)
    refund_requested = models.BooleanField(default=False)
    refund_granted = models.BooleanField(default=False)
    transaction_id = models.CharField(max_length=200, null=True)
    qr_code = models.ImageField(upload_to='qrcode', blank=True)
This is the serializer for the (Order History)
class TicketSerializer(serializers.ModelSerializer):
    order_items = serializers.SerializerMethodField()
    class Meta:
        model = Order
        fields = '__all__'
    def get_order_items(self, obj):
        return OrderItemSerializer().data
View:
class TicketDetailView(RetrieveAPIView):
    serializer_class = TicketSerializer
    permission_classes = (IsAuthenticated,)
    def get_object(self):
        try:
            # order = Order.objects.get(user=self.request.user).filter(ordered=True)
            # order = Order.objects.filter(order=True)
            # order = Order.objects.get(user=self.request.user, ordered=True)
            order = Order.objects.filter(ordered=False, user=self.request.user)
            return order
        except ObjectDoesNotExist:
            return Response({"message": "You do not have any ticket"}, status=HTTP_400_BAD_REQUEST)
From the view, you can see I tried many options with the queryset, but none of them work. It works when I use get(user=self.request.user), but when I pass ordered=True (for the order history) it says get() returned more than one Order -- it returned 3! I understand that this is because I use get() (the other options don't work). When I pass ordered=False (as for the order item to be purchased) it works, because there is just one at a time.
What should I do? I just want to be able to get all the items that have been ordered by a particular user.
You expect multiple results, which is why you should override get_queryset() and not get_object() (which should be used for detail views):
def get_queryset(self):
    return Order.objects.filter(ordered=False, user=self.request.user)

Django Slow Database Model Query

I am having issues with the performance of a couple of queries in my Django app... all others are very fast.
I have an Orders model with OrderItems, the query seems to be running much slower than other queries (1-2 seconds, vs. 0.2 seconds). I'm using MySQL backend. In the serializer I do a count to return whether an order has food or drink items, I suspect this is causing the performance hit. Is there a better way to do it?
Here is my models setup for Order and OrderItems
class Order(models.Model):
    STATUS = (
        ('1', 'Placed'),
        ('2', 'Complete')
    )
    PAYMENT_STATUS = (
        ('1', 'Pending'),
        ('2', 'Paid'),
        ('3', 'Declined'),
        ('4', 'Manual')
    )
    shop = models.ForeignKey(Shop, on_delete=models.DO_NOTHING)
    customer = models.ForeignKey(Customer, on_delete=models.DO_NOTHING)
    total_price = models.DecimalField(max_digits=6, decimal_places=2, default=0)
    created_at = models.DateTimeField(auto_now_add=True, null=True)
    time_completed = models.DateTimeField(auto_now_add=True, null=True, blank=True)
    time_cancelled = models.DateTimeField(auto_now_add=True, null=True, blank=True)
    status = models.CharField(max_length=2, choices=STATUS, default='1',)
    payment_method = models.CharField(max_length=2, choices=PAYMENT_METHOD, default='3',)
    payment_status = models.CharField(max_length=2, choices=PAYMENT_STATUS, default='1',)
    type = models.CharField(max_length=2, choices=TYPE, default='1',)
    def __str__(self):
        return str(self.id)
class OrderItem(models.Model):
    order = models.ForeignKey(Order, on_delete=models.CASCADE)
    type = models.CharField(max_length=200, default='DRINK')
    drink = models.ForeignKey(
        Drink,
        blank=True, null=True, on_delete=models.DO_NOTHING
    )
    food = models.ForeignKey(
        Food,
        blank=True,
        null=True,
        on_delete=models.DO_NOTHING
    )
    quantity = models.IntegerField(blank=True, null=True)
    price = models.DecimalField(max_digits=6, decimal_places=2, default=0)
    created_at = models.DateTimeField(auto_now_add=True, null=True)
    delivered = models.BooleanField(default=False)
    def __str__(self):
        return str(self.id)
In my rest order serializer, here is the query for get,
queryset = Order.objects.filter(shop=shop,status__in=['1','2'],payment_status__in=['2','4'])
The serializer is below, but this query is quite slow. I assume because I am doing a count() on OrderItems - is there a more efficient way to do this?
class OrderOverviewSerializer(serializers.ModelSerializer):
    tabledetails = serializers.SerializerMethodField()
    has_food = serializers.SerializerMethodField()
    has_drink = serializers.SerializerMethodField()
    class Meta:
        model = Order
        fields = ['id','total_price', 'created_at','has_food','has_drink','type','status','shop','table','customer','shopdetails']
    def get_shopdetails(self, instance):
        qs = Shop.objects.get(id=instance.shop.id)
        serializer = ShopSerializer(instance=qs, many=False)
        return serializer.data
    def get_has_food(self, obj):
        foodCount = OrderItem.objects.filter(order=obj.id,type='FOOD').count()
        return foodCount
    def get_has_drink(self, obj):
        drinkCount = OrderItem.objects.filter(order=obj.id,type='DRINK').count()
        return drinkCount
The reason for that is the famous N+1 problem, which the Django ORM handles poorly by default. The solution is to use select_related, as answered in this question. More on that here.
You should consider db_index=True on the fields you're querying over (Order.status, Order.payment_status, OrderItem.type).
get_shopdetails() isn't used for anything in the serializer? (On a similar note, the getter for tabledetails is missing... are you maybe presenting some code that's not exactly what you're running?)
get_shopdetails() is redundant anyway; you can simply declare shop = ShopSerializer() and DRF will know what to do.
If the get_has_food/get_has_drink fields did prove to be the bottleneck (which they apparently didn't), you could use a Django aggregate to count the rows during the query for orders.
Speaking of, your serializer is accessing several foreign keys, which will all cause N+1 queries; you can add .select_related('shop', 'customer', 'table') (or .prefetch_related() the same) at the very least to have those get loaded in one fell swoop.
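The idea of counting rows during the query for orders can be illustrated at the SQL level. This is a minimal sketch with Python's built-in sqlite3 module; the table names (orders, order_item) and the sample rows are assumptions made for illustration, not the tables from the question. In Django, a comparable result would typically come from annotate() with Count and its filter argument, so that each order arrives with its food and drink counts already attached.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE order_item (
        id INTEGER PRIMARY KEY,
        order_id INTEGER REFERENCES orders(id),
        type TEXT
    );
    INSERT INTO orders (id) VALUES (1), (2), (3);
    INSERT INTO order_item (order_id, type) VALUES
        (1, 'FOOD'), (1, 'DRINK'), (1, 'DRINK'),
        (2, 'DRINK');
""")

# One query returns every order together with both counts,
# instead of two extra COUNT queries per serialized order.
rows = conn.execute("""
    SELECT o.id,
           COUNT(CASE WHEN i.type = 'FOOD' THEN 1 END)  AS food_count,
           COUNT(CASE WHEN i.type = 'DRINK' THEN 1 END) AS drink_count
    FROM orders AS o
    LEFT JOIN order_item AS i ON i.order_id = o.id
    GROUP BY o.id
    ORDER BY o.id
""").fetchall()
print(rows)
```

Order 3 has no items and still appears with zero counts, because the LEFT JOIN keeps orders without matching rows.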
Beyond this -- profile your code! The easiest way to do that is to copy the skeleton from manage.py and add some code to simulate your query, e.g. (this is dry-coded):
import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "my_settings")
import django
django.setup()
from django.test import Client  # Django's test client class is named Client
c = Client()
for x in range(15):
    c.get("/api/order/")  # TODO: use correct URL
and run your script with
python -m cProfile my_test_script.py
You'll see which functions end up taking the most time.
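If the raw cProfile output is too noisy, the standard library's pstats module can sort and trim it. The sketch below is only an illustration of that workflow: slow_step, fast_step and handle_request are hypothetical stand-ins for whatever the request ends up calling, not code from the question.

```python
import cProfile
import io
import pstats


def slow_step():
    # Hypothetical stand-in for the expensive part of a request.
    return sum(i * i for i in range(200_000))


def fast_step():
    # Hypothetical stand-in for a cheap part of a request.
    return sum(range(100))


def handle_request():
    # Hypothetical stand-in for the work triggered by one c.get(...) call.
    return slow_step() + fast_step()


profiler = cProfile.Profile()
profiler.enable()
for _ in range(5):
    handle_request()
profiler.disable()

# Sort by cumulative time and keep only the ten most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

On the command line, python -m cProfile -s cumtime my_test_script.py gives a similarly sorted report without any extra code.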

Object-level permissions in Django

I have a ListView as follows, enabling me to loop over two models (Market and ScenarioMarket) in a template:
class MarketListView(LoginRequiredMixin, ListView):
    context_object_name = 'market_list'
    template_name = 'market_list.html'
    queryset = Market.objects.all()
    login_url = 'login'
    def get_context_data(self, **kwargs):
        context = super(MarketListView, self).get_context_data(**kwargs)
        context['scenariomarkets'] = ScenarioMarket.objects.all()
        context['markets'] = self.queryset
        return context
The two market models are as follows:
class Market(models.Model):
    title = models.CharField(max_length=50, default="")
    current_price = models.DecimalField(max_digits=5, decimal_places=2, default=0.50)
    description = models.TextField(default="")
    shares_yes = models.IntegerField(default=0)
    shares_no = models.IntegerField(default=0)
    b = models.IntegerField(default=100)
    cost_function = models.IntegerField(default=0)
    open = models.BooleanField(default=True)
    def __str__(self):
        return self.title[:50]
    def get_absolute_url(self):
        return reverse('market_detail', args=[str(self.id)])
class ScenarioMarket(models.Model):
    title = models.CharField(max_length=50, default="")
    description = models.TextField(default="")
    b = models.IntegerField(default=100)
    cost_function = models.IntegerField(default=0)
    most_likely = models.CharField(max_length=50, default="Not defined")
    open = models.BooleanField(default=True)
    def __str__(self):
        return self.title[:50]
    def get_absolute_url(self):
        return reverse('scenario_market_detail', args=[str(self.id)])
And my user model is as follows:
class CustomUser(AbstractUser):
    points = models.DecimalField(
        max_digits=20,
        decimal_places=2,
        default=Decimal('1000.00'),
        verbose_name='User points'
    )
    bets_placed = models.IntegerField(
        default=0,
        verbose_name='Bets placed'
    )
    net_gain = models.DecimalField(
        max_digits=20,
        decimal_places=2,
        default=Decimal('0.00'),
        verbose_name='Net gain'
    )
    class Meta:
        ordering = ['-net_gain']
What I want to happen is that different users see different sets of markets. For example, I want users from company X to only see markets pertaining to X, and the same for company Y, Z, and so forth.
Four possibilities so far, and their problems:
I could hardcode this: If each user has a company feature (in addition to username, etc.), I could add a company feature to each market as well, and then use if tags in the template to ensure that the right users see the right markets. Problem: Ideally I'd want to do this through the Admin app: whenever a new market is created there, it would be specified what company can see it.
I could try to use Django's default permissions, which of course would be integrated with Admin. Problem: Setting a view permission (e.g., here) would concern the entire model, not particular instances of it.
From googling around, it seems that something like django-guardian might be what I ultimately have to go with. Problem: As I'm using a CustomUser model, it seems I might run into problems there (see here).
I came across this here on SO, which would enable me to do this without relying on django-guardian. Problem: I'm not clear on how to integrate that into the Admin app, in the manner that django-guardian seems able to.
If anyone has any advice, that would be greatly appreciated!
You can add some relationships between the models:
class Company(models.Model):
    market = models.ForeignKey('Market', on_delete=models.CASCADE)
    ...
class CustomUser(AbstractUser):
    company = models.ForeignKey('Company', on_delete=models.CASCADE)
    ...
then in your view you can simply filter the queryset as appropriate:
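class MarketListView(LoginRequiredMixin, ListView):
    context_object_name = 'market_list'
    template_name = 'market_list.html'
    login_url = 'login'
    def get_queryset(self):
        # The reverse lookup name defaults to the lowercased model name
        # ('customuser'), since no related_name is set on CustomUser.company
        return Market.objects.filter(company__customuser=self.request.user)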
Note, you don't need the context['markets'] = self.queryset line in your get_context_data; the queryset is already available as market_list, since that's what you set the context_object_name to.

Why is a Django model taking so long to load in admin?

I have a fairly simple Django set up for a forum, and one of the most basic models is this, for each thread:
class Post(models.Model):
    created = models.DateTimeField(auto_now_add=True)
    last_reply = models.DateTimeField(auto_now_add=True, blank=True, null=True)
    username = models.ForeignKey(User, related_name="forumuser")
    fixed = models.BooleanField(_("Sticky"), default=False)
    closed = models.BooleanField(default=False)
    markdown_enabled = models.BooleanField(default=False)
    reply_count = models.IntegerField(default=0)
    title = models.CharField(_("Title Post"), max_length=255)
    content = models.TextField(_("Content"), blank=False)
    rating = models.IntegerField(default=0)
    followers = models.IntegerField(default=0)
    ip_address = models.CharField(max_length=255)
    def __unicode__(self):
        return self.title
    def get_absolute_url(self):
        return "/post/%s/" % self.id
Then we have some replies:
class PostReply(models.Model):
    user = models.ForeignKey(User, related_name='replyuser')
    post = models.ForeignKey(Post, related_name='replypost')
    created = models.DateTimeField(auto_now_add=True)
    content = models.TextField()
    ip_address = models.CharField(max_length=255)
    quoted_post = models.ForeignKey('self', related_name='quotedreply', blank=True, null=True)
    rating = models.IntegerField(default=0)
    reply_order = models.IntegerField(default=1)
Now, currently there are just over 1600 users, 6000 Posts, and 330,000 PostReply objects in the db for this setup. When I run this SQL query:
SELECT * FROM `forum_post` LIMIT 10000
I see that Query took 0.0241 sec which is fine. When I browse to the Django admin section of my site, pulling up an individual Post is rapid, as is the paginated list of Posts.
However, if I try and pull up an individual PostReply, it takes around 2-3 minutes to load.
Obviously each PostReply admin page will have a dropdown list of all the Posts in it, but can anyone tell me why this or anything else would cause such a dramatically slow query? It's worth noting that the forum itself is pretty fast.
Also, if it is something to do with that dropdown list, has anyone got any suggestions for making that more usable?
Try adding all foreign keys to raw_id_fields in the admin:
class PostReplyAdmin(ModelAdmin):
    raw_id_fields = ['user', 'post', 'quoted_post']
This will decrease the page's load time in the change view. The problem is that Django loads ForeignModel.objects.all() for each foreign key's dropdown.
Another way is to add the foreign keys to autocomplete_fields (docs) in the admin. Note that this requires search_fields to be defined on the ModelAdmin of each related model:
class PostReplyAdmin(ModelAdmin):
    autocomplete_fields = ['user', 'post', 'quoted_post']
As pointed out by @Andrey Nelubin, the problem for me was indeed that the page loaded all related objects for each foreign key's dropdown. However, with autocomplete_fields the selects are turned into autocomplete inputs, which load their options asynchronously.

Django: DateTime not good when saved on my database

It may seem stupid, but I have been struggling for hours with saving my datetimes to the database. I'm pretty new to Python and I don't manipulate datetimes every day.
I get a one-hour difference when I save my value, so 18h becomes 17h (sorry for my English).
My model looks like this:
class Event(models.Model):
    title = models.CharField(max_length=245)
    description = models.TextField(max_length=750, null=True, blank=True)
    start = models.DateTimeField()
    end = models.DateTimeField()
    created_at = models.DateTimeField(editable=False)
    updated_at = models.DateTimeField(editable=False)
    slug = AutoSlugField(populate_from='title', unique=True, editable=False)
    nb_participant = models.PositiveSmallIntegerField(default=1)
    price = models.PositiveSmallIntegerField(default=0)
    user = models.ForeignKey(User, editable=False, related_name='author')
    address = models.ForeignKey('Address', editable=False, related_name='events')
    participants = models.ManyToManyField(User, related_name='participants', blank=True)
    class Meta:
        db_table = 'event'
    def save(self, *args, **kwargs):
        if not self.pk:
            self.created_at = timezone.localtime(timezone.now())
            print self.created_at
        self.updated_at = timezone.localtime(timezone.now())
        super(Event, self).save(*args, **kwargs)
As you can see, I have 4 datetime fields. Two of them are saved automatically when the model is created.
I resolved the problem by using timezone.localtime(timezone.now()) instead of timezone.now(). I found that here, at the bottom of the page. But it says to use timezone.now() in most cases, so I don't know why I have this one-hour difference.
I have two other fields that are sent from my Angular frontend to my API (using Django REST framework).
I've attached a screenshot. The first object is the one I send from Angular. As you can see, the date is well formatted.
The second object is the response from my API, and I have lost one hour (so the GMT+1 offset).
Why? I'm totally stuck, so if someone has a solution, I'll be very happy :)
My settings.py:
LANGUAGE_CODE = 'fr-fr'
TIME_ZONE = 'Europe/Paris'
USE_L10N = True
USE_TZ = True
Thanks.
In your settings file, try USE_TZ = False, and use the normal datetime.now().
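For context on the one-hour shift: with USE_TZ = True, Django stores datetimes in UTC, so an event at 18:00 Paris time is saved as 17:00 UTC in winter. Both values describe the same instant. The following sketch illustrates this with the standard library only; the date and the fixed +01:00 offset standing in for Europe/Paris are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +01:00 offset stands in for Europe/Paris in winter.
paris_winter = timezone(timedelta(hours=1))

# Hypothetical event start, as entered by a user in Paris.
local_start = datetime(2016, 1, 15, 18, 0, tzinfo=paris_winter)

# The same instant expressed in UTC, which is how it would be stored.
stored_utc = local_start.astimezone(timezone.utc)

print(local_start.isoformat())    # 2016-01-15T18:00:00+01:00
print(stored_utc.isoformat())     # 2016-01-15T17:00:00+00:00
print(local_start == stored_utc)  # True: same instant, different display
```

Seen this way, the 17h value is a display matter rather than lost data: converting the stored value back to the local time zone gives 18h again.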