Reducing queries for manytomany models in django - django

EDIT:
It turns out the real question is - how do I get select_related to follow the m2m relationships I have defined? Those are the ones that are taxing my system. Any ideas?
I have two classes for my django app. The first (Item class) describes an item along with some functions that return information about the item. The second class (Itemlist class) takes a list of these items and then does some processing on them to return different values. The problem I'm having is that returning a list of items from Itemlist is taking a ton of queries, and I'm not sure where they're coming from.
class Item(models.Model):
# for archiving purposes
archive_id = models.IntegerField()
users = models.ManyToManyField(User, through='User_item_rel',
related_name='users_set')
# for many to one relationship (tags)
tag = models.ForeignKey(Tag)
sub_tag = models.CharField(default='',max_length=40)
name = models.CharField(max_length=40)
purch_date = models.DateField(default=datetime.datetime.now())
date_edited = models.DateTimeField(auto_now_add=True)
price = models.DecimalField(max_digits=6, decimal_places=2)
buyer = models.ManyToManyField(User, through='Buyer_item_rel',
related_name='buyers_set')
comments = models.CharField(default='',max_length=400)
house_id = models.IntegerField()
class Meta:
ordering = ['-purch_date']
def shortDisplayBuyers(self):
if len(self.buyer_item_rel_set.all()) != 1:
return "multiple buyers"
else:
return self.buyer_item_rel_set.all()[0].buyer.name
def listBuyers(self):
return self.buyer_item_rel_set.all()
def listUsers(self):
return self.user_item_rel_set.all()
def tag_name(self):
return self.tag
def sub_tag_name(self):
return self.sub_tag
def __unicode__(self):
return self.name
and the second class:
class Item_list:
def __init__(self, list = None, house_id = None, user_id = None,
archive_id = None, houseMode = 0):
self.list = list
self.house_id = house_id
self.uid = int(user_id)
self.archive_id = archive_id
self.gen_balancing_transactions()
self.houseMode = houseMode
def ret_list(self):
return self.list
So after I construct Itemlist with a large list of items, Itemlist.ret_list() takes up to 800 queries for 25 items. What can I do to fix this?

Try using select_related
As per a question I asked here

Dan is right in telling you to use select_related.
select_related can be read about here.
What it does is return in the same query data for the main object in your queryset and the model or fields specified in the select_related clause.
So, instead of a query like:
select * from item
followed by several queries like this every time you access one of the item_list objects:
select * from item_list where item_id = <one of the items for the query above>
the ORM will generate a query like:
select item.*, item_list.*
from item a join item_list b
where item a.id = b.item_id
In other words: it will hit the database once for all the data.

You probably want to use prefetch_related
Works similarly to select_related, but can deal with relations selected_related cannot. The join happens in python, but I've found it to be more efficient for this kind of work than the large # of queries.
Related reading on the subject

Related

how to call back unique together constraints as a field django

i'm trying to call back unique constraints field , in my project i have to count number of M2M selected
class Booking(models.Model):
room_no = models.ForeignKey(Room,on_delete=models.CASCADE,blank=True,related_name='rooms')
takes_by = models.ManyToManyField(Vistor)
#property
def no_persons(self):
qnt = Booking.objects.filter(takes_by__full_information=self).count()#but this doesnt work
return qnt
Cannot query "some room information": Must be "Vistor" instance.
class Vistor(models.Model):
full_name = models.CharField(max_length=150)
dob = models.DateField(max_length=14)
city = models.ForeignKey(City,on_delete=models.CASCADE)
class Meta:
constraints = [
models.UniqueConstraint(fields=['full_name','dob','city'],name='full_information')
]
def __str__(self):
return f'{self.full_name} - {self.city} - {self.dob}'
it it possible to access full_information through Booking model ? thank you ..
If you want to count the number of Visitors related to that booking, you can count these with:
#property
def no_persons(self):
self.taken_by.count()
This will make an extra query to the database, therefore it is often better to let the database count these in the query. You can thus remove the property, and query with:
from django.db.models import Count
Booking.objects.annotate(
no_persons=Count('takes_by')
)
The Bookings that arise from this QuerySet will have an extra attribute no_persons with the number of related Visitors.

AND search with reverse relations

I'm working on a django project with the following models.
class User(models.Model):
pass
class Item(models.Model):
user = models.ForeignKey(User)
item_id = models.IntegerField()
There are about 10 million items and 100 thousand users.
My goal is to override the default admin search that takes forever and
return all the matching users that own "all" of the specified item ids within a reasonable timeframe.
These are a couple of the tests I use to better illustrate my criteria.
class TestSearch(TestCase):
def search(self, searchterm):
"""A tuple is returned with the first element as the queryset"""
return do_admin_search(User.objects.all())
def test_return_matching_users(self):
user = User.objects.create()
Item.objects.create(item_id=12345, user=user)
Item.objects.create(item_id=67890, user=user)
result = self.search('12345 67890')
assert_equal(1, result[0].count())
assert_equal(user, result[0][0])
def test_exclude_users_that_do_not_match_1(self):
user = User.objects.create()
Item.objects.create(item_id=12345, user=user)
result = self.search('12345 67890')
assert_false(result[0].exists())
def test_exclude_users_that_do_not_match_2(self):
user = User.objects.create()
result = self.search('12345 67890')
assert_false(result[0].exists())
The following snippet is my best attempt using annotate that takes over 50 seconds.
def search_by_item_ids(queryset, item_ids):
params = {}
for i in item_ids:
cond = Case(When(item__item_id=i, then=True), output_field=BooleanField())
params['has_' + str(i)] = cond
queryset = queryset.annotate(**params)
params = {}
for i in item_ids:
params['has_' + str(i)] = True
queryset = queryset.filter(**params)
return queryset
Is there anything I can do to speed it up?
Here's some quick suggestions that should improve performance drastically.
Use prefetch_related` on the initial queryset to get related items
queryset = User.objects.filter(...).prefetch_related('user_set')
Filter with the __in operator instead of looping through a list of IDs
def search_by_item_ids(queryset, item_ids):
return queryset.filter(item__item_id__in=item_ids)
Don't annotate if it's already a condition of the query
Since you know that this queryset only consists of records with ids in the item_ids list, no need to write that per object.
Putting it all together
You can speed up what you are doing drastically just by calling -
queryset = User.objects.filter(
item__item_id__in=item_ids
).prefetch_related('user_set')
with only 2 db hits for the full query.

Filtering queryset if one value is greater than another value

I am trying to filter in view my queryset based on relation between 2 fields .
however always getting the error that my field is not defined .
My Model has several calculated columns and I want to get only the records where values of field A are greater than field B.
So this is my model
class Material(models.Model):
version = IntegerVersionField( )
code = models.CharField(max_length=30)
name = models.CharField(max_length=30)
min_quantity = models.DecimalField(max_digits=19, decimal_places=10)
max_quantity = models.DecimalField(max_digits=19, decimal_places=10)
is_active = models.BooleanField(default=True)
def _get_totalinventory(self):
from inventory.models import Inventory
return Inventory.objects.filter(warehouse_Bin__material_UOM__UOM__material=self.id, is_active = true ).aggregate(Sum('quantity'))
totalinventory = property(_get_totalinventory)
def _get_totalpo(self):
from purchase.models import POmaterial
return POmaterial.objects.filter(material=self.id, is_active = true).aggregate(Sum('quantity'))
totalpo = property(_get_totalpo)
def _get_totalso(self):
from sales.models import SOproduct
return SOproduct.objects.filter(product__material=self.id , is_active=true ).aggregate(Sum('quantity'))
totalso = property(_get_totalpo)
#property
def _get_total(self):
return (totalinventory + totalpo - totalso)
total = property(_get_total)
And this is line in my view where I try to get the conditional queryset
po_list = MaterialFilter(request.GET, queryset=Material.objects.filter( total__lte = min_quantity ))
But I am getting the error that min_quantity is not defined
What could be the problem ?
EDIT:
My problem got solved thank you #Moses Koledoye but in the same code I have different issue now
Cannot resolve keyword 'total' into field.Choices are: am, author, author_id, bom, bomversion, code, creation_time, description, id, inventory, is_active, is_production, itemgroup, itemgroup_id, keywords, materialuom, max_quantity, min_quantity, name, pomaterial, produce, product, slug, trigger_quantity, uom, updated_by, updated_by_id, valid_from, valid_to, version, warehousebin
Basically it doesn't show any of my calculated fields I have in my model.
Django cannot write a query which is conditioned on a field whose value is unknown. You need to use a F expression for this:
from django.db.models import F
queryset = Material.objects.filter(total__lte = F('min_quantity'))
And your FilterSet becomes:
po_list = MaterialFilter(request.GET, queryset = Material.objects.filter(total__lte=F('min_quantity')))
From the docs:
An F() object represents the value of a model field or annotated
column. It makes it possible to refer to model field values and
perform database operations using them without actually having to pull
them out of the database into Python memory

django saving list of foreignkey objects to m2m field w/ through model with ordering

I have a html table w/ javascript that allows ordering of rows which passes the track_id list and row order list through a form on POST.
I just added the class PlaylistTrack using through for the model so I can add ordering to the tracks.m2m field. The view I have below works before I added the through model, but now I am not sure how I should save a list of tracks w/ its associated order number since I can't use add() and I must use create(). How should I use create() in my view to save a list of track_id's and associate the order number w/ list? Can I use bulk_create?
models.py:
class Playlist(models.Model):
user = models.ForeignKey(settings.AUTH_USER_MODEL, default=1)
name = models.CharField(max_length=50)
tracks = models.ManyToManyField(Track, through='PlaylistTrack')
def __str__(self):
return self.name
class PlaylistTrack(models.Model):
track = models.ForeignKey(Track)
playlist = models.ForeignKey(Playlist)
order = models.PositiveIntegerField()
class Meta:
ordering = ['order']
views.py:
def add_track(request):
active_playlist = Playlist.objects.get(id=request.POST.get('playlist_id'))
add_tracks = request.POST.getlist('track_id')
if request.POST.get('playlist_id'):
active_playlist.tracks.add(*add_tracks) # how to use create or bulk_create?
return redirect("playlists:list_playlists")
Ozgur's answer has you mostly covered. However, you do NOT need to fetch Playlist or Track instances from the database and you CAN use bulk_create:
def add_track(request):
playlist_id = request.POST.get('playlist_id')
track_ids = enumerate(request.POST.getlist('track_id'), start=1)
PlaylistTrack.objects.bulk_create([
PlaylistTrack(playlist_id=playlist_id, track_id=track_id, order=i)
for i, track_id in track_ids
])
return redirect("playlists:list_playlists")
This reduces the entire procedure to a single db operation where you had (1 + 2n) operations before (n being the number of tracks).
You need to fetch Track objects that will be added:
Track.objects.filter(pk__in=add_tracks)
--
Since order field must be populated, you can't use .add() on M2M field. You gotta create objects by yourself:
def add_track(request):
playlist = Playlist.objects.get(id=request.POST.get('playlist_id'))
for i, track_id in enumerate(request.POST.getlist('track_id'), start=1):
track = Track.objects.get(pk=track_id)
PlaylistTrack.objects.create(track=track, playlist=playlist, order=i)
return redirect("playlists:list_playlists")

Sorting products after dateinterval and weight

What I want is to be able to get this weeks/this months/this years etc. hotest products. So I have a model named ProductStatistics that will log each hit and each purchase on a day-to-day basis. This is the models I have got to work with:
class Product(models.Model):
name = models.CharField(_("Name"), max_length=200)
slug = models.SlugField()
description = models.TextField(_("Description"))
picture = models.ImageField(upload_to=product_upload_path, blank=True)
category = models.ForeignKey(ProductCategory)
prices = models.ManyToManyField(Store, through='Pricing')
objects = ProductManager()
class Meta:
ordering = ('name', )
def __unicode__(self):
return self.name
class ProductStatistic(models.Model):
# There is only 1 `date` each day. `date` is
# set by datetime.today().date()
date = models.DateTimeField(default=datetime.now)
hits = models.PositiveIntegerField(default=0)
purchases = models.PositiveIntegerField(default=0)
product = models.ForeignKey(Product)
class Meta:
ordering = ('product', 'date', 'purchases', 'hits', )
def __unicode__(self):
return u'%s: %s - %s hits, %s purchases' % (self.product.name, str(self.date).split(' ')[0], self.hits, self.purchases)
How would you go about sorting the Products after say (hits+(purchases*2)) the latest week?
This structure isn't set in stone either, so if you would structure the models in any other way, please tell!
first idea:
in the view you could query for today's ProductStatistic, than loop over the the queryset and add a variable ranking to every object and add that object to a list. Then just sort after ranking and pass the list to ur template.
second idea:
create a filed ranking (hidden for admin) and write the solution of ur formula each time the object is saved to the database by using a pre_save-signal. Now you can do ProductStatistic.objects.filter(date=today()).order_by('ranking')
Both ideas have pros&cons, but I like second idea more
edit as response to the comment
Use Idea 2
Write a view, where you filter like this: ProductStatistic.objects.filter(product= aProductObject, date__gte=startdate, date__lte=enddate)
loop over the queryset and do somthing like aProductObject.ranking+= qs_obj.ranking
pass a sorted list of the queryset to the template
Basically a combination of both ideas
edit to your own answer
Your solution isn't far away from what I suggested — but in sql-space.
But another solution:
Make a Hit-Model:
class Hit(models.Model):
date = models.DateTimeFiles(auto_now=True)
product = models.ForeignKey(Product)
purchased= models.BooleanField(default=False)
session = models.CharField(max_length=40)
in your view for displaying a product you check, if there is a Hit-object with the session, and object. if not, you save it
Hit(product=product,
date=datetime.datetime.now(),
session=request.session.session_key).save()
in your purchase view you get the Hit-object and set purchased=True
Now in your templates/DB-Tools you can do real statistics.
Of course it can generate a lot of DB-Objects over the time, so you should think about a good deletion-strategy (like sum the data after 3 month into another model MonthlyHitArchive)
If you think, that displaying this statistics would generate to much DB-Traffic, you should consider using some caching.
I solved this the way I didn't want to solve it. I added week_rank, month_rank and overall_rank to Product and then I just added the following to my ProductStatistic model.
def calculate_rank(self, days_ago=7, overall=False):
if overall:
return self._default_manager.all().extra(
select = {'rank': 'SUM(hits + (clicks * 2))'}
).values()[0]['rank']
else:
return self._default_manager.filter(
date__gte = datetime.today()-timedelta(days_ago),
date__lte = datetime.today()
).extra(
select = {'rank': 'SUM(hits + (clicks * 2))'}
).values()[0]['rank']
def save(self, *args, **kwargs):
super(ProductStatistic, self).save(*args, **kwargs)
t = Product.objects.get(pk=self.product.id)
t.week_rank = self.calculate_rank()
t.month_rank = self.calculate_rank(30)
t.overall_rank = self.calculate_rank(overall=True)
t.save()
I'll leave it unsolved if there is a better solution.