Let's assume I have a Product model in my project:
class Product(models.Model):
price = models.IntegerField()
and I want to have some sort of statistics (let's say I want to keep track of how has the price changed over time) for it:
class ProductStatistics(models.Model):
created = models.DateTimeField(auto_add_now=True)
statistics_value = models.IntegerField()
product = models.ForeignKey(Product)
#classmethod
def create_for_product(cls, product_ids):
statistics = []
products = Product.objects.filter(id__in=products_ids)
for product in products:
statistics.append(
product=product
statistics_value=product.price
)
cls.objects.bulk_create(statistics)
#classmethod
def get_latest_by_products_ids(cls, product_ids):
return None
I have a problem with implementing get_latest_by_products_ids method. I want only latest statistic, so I can't do something like:
#classmethod
def get_latest_by_products_ids(cls, product_ids):
return cls.objects.filter(product__id__in=product_ids)
because this would return all statistics I have gathered through time. How can I limit the query to only most recent one for each Product?
EDIT
I am using PostgreSQL database.
Querysets already have a last() method (and a first() method too FWIW). The only question is what you want to define as "last" since this depends on the queryset's ordering... But assuming you want the last by creation date (created field), you can also use the lastest() method:
#classmethod
def get_latest_by_products_ids(cls, product_ids):
found = []
for pid in products_ids:
found.append(cls.objects.filter(product_id=pid).latest("created"))
return found
As a side note: Django's coding style is to use the Manager (and eventually the Queryset) for operations working on the whole table, so instead of creating classmethods on your model you should create a custom manager:
class productStatisticManager(models.Manager):
def create_for_products(self, product_ids):
statistics = []
products = Product.objects.filter(id__in=products_ids)
for product in products:
statistics.append(
product=product
statistics_value=product.price
)
self.bulk_create(statistics)
def get_latest_by_products_ids(cls, product_ids):
found = []
for pid in products_ids:
last = self.objects.filter(product_id=pid).latest("created")
found.append(last)
return found
class ProductStatistics(models.Model):
created = models.DateTimeField(auto_add_now=True)
statistics_value = models.IntegerField()
product = models.ForeignKey(Product)
objects = ProductStatisticManager()
Putting the method in Product model and will be easier:
class Product(models.Model):
price = models.IntegerField()
def get_latest_stat(self):
return self.productstatistics_set.all().order_by('-created')[0] # or [:1]
Using [:1] instead of [0] will return QuerySet of single element while [0] will return just one Object of model Class.
eg.
>>> type(cls.objects.filter(product__id__in=product_ids).order_by('-created')[:1])
<class 'django.db.models.query.QuerySet'>
>>> type(cls.objects.filter(product__id__in=product_ids).order_by('-created')[0])
<class 'myApp.models.MyModel'>
Related
I'm using Django filters (django-filter) in my project. I have the models below, where a composition (Work) has a many-to-many instrumentations field with a through model. Each instrumentation has several instruments within it.
models.py:
class Work(models.Model):
instrumentations = models.ManyToManyField(Instrument,
through='Instrumentation',
blank=True)
class Instrument(models.Model):
name = models.CharField(max_length=100)
class Instrumentation(models.Model):
players = models.IntegerField(validators=[MinValueValidator(1)])
work = models.ForeignKey(Work, on_delete=models.CASCADE)
instrument = models.ForeignKey(Instrument, on_delete=models.CASCADE)
views.py:
import django_filters
class WorkFilter(django_filters.FilterSet):
instrument = django_filters.ModelMultipleChoiceFilter(
field_name="instrumentation__instrument",
queryset=Instrument.objects.all())
My filter works fine: it grabs all the pieces where there is the instrument selected by the user in the filter form.
However, I'd like to add the possibility of filtering the compositions with those exact instruments. For instance, if a piece contains violin, horn and cello and nothing else, I'd like to get that, but not a piece written for violin, horn, cello, and percussion. Is it possible to achieve that?
I'd also like the user to choose, from the interface, whether to perform an exact search or not, but that's a secondary issue for now, I suppose.
Update: type_of_search using ChoiceFilter
I made some progress; with the code below, I can give the user a choice between the two kinds of search. Now, I need to find which query would grab only the compositions with that exact set of instruments.
class WorkFilter(django_filters.FilterSet):
# ...
CHOICES = {
('exact', 'exact'), ('not_exact', 'not_exact')
}
type_of_search = django_filters.ChoiceFilter(label="Exact match?", choices=CHOICES, method="filter_instruments")
def filter_instruments(self, queryset, name, value):
if value == 'exact':
return queryset.??
elif value == 'not_exact':
return queryset.??
I know that the query I want is something like:
Work.objects.filter(instrumentations__name='violin').filter(instrumentations__name='viola').filter(instrumentations__name='horn')
I just don't know how to 'translate' it into the django_filters language.
Update 2: 'exact' query using QuerySet.annotate
Thanks to this question, I think this is the query I'm looking for:
from django.db.models import Count
instrument_list = ['...'] # How do I grab them from the form?
instruments_query = Work.objects.annotate(count=Count('instrumentations__name')).filter(count=len(instrument_list))
for instrument in instrument_list:
instruments_query = instruments_query.filter(instrumentations__name=instrument_list)
I feel I'm close, I just don't know how to integrate this with django_filters.
Update 3: WorkFilter that returns empty if the search is exact
class WorkFilter(django_filters.FilterSet):
genre = django_filters.ModelChoiceFilter(
queryset=Genre.objects.all(),
label="Filter by genre")
instrument = django_filters.ModelMultipleChoiceFilter(
field_name="instrumentation__instrument",
queryset=Instrument.objects.all(),
label="Filter by instrument")
CHOICES = {
('exact', 'exact'), ('not_exact', 'not_exact')
}
type_of_search = django_filters.ChoiceFilter(label="Exact match?", choices=CHOICES, method="filter_instruments")
def filter_instruments(self, queryset, name, value):
instrument_list = self.data.getlist('instrumentation__instrument')
if value == 'exact':
queryset = queryset.annotate(count=Count('instrumentations__name')).filter(count=len(instrument_list))
for instrument in instrument_list:
queryset = queryset.filter(instrumentations__name=instrument)
elif value == 'not_exact':
pass # queryset = ...
return queryset
class Meta:
model = Work
fields = ['genre', 'title', 'instrument', 'instrumentation']
You can grab instrument_list with self.data.getlist('instrument').
This is how you would use instrument_list for the 'exact' query:
type_of_search = django_filters.ChoiceFilter(label="Exact match?", choices=CHOICES, method=lambda queryset, name, value: queryset)
instrument = django_filters.ModelMultipleChoiceFilter(
field_name="instrumentation__instrument",
queryset=Instrument.objects.all(),
label="Filter by instrument",
method="filter_instruments")
def filter_instruments(self, queryset, name, value):
if not value:
return queryset
instrument_list = self.data.getlist('instrument') # [v.pk for v in value]
type_of_search = self.data.get('type_of_search')
if type_of_search == 'exact':
queryset = queryset.annotate(count=Count('instrumentations')).filter(count=len(instrument_list))
for instrument in instrument_list:
queryset = queryset.filter(instrumentations__pk=instrument)
else:
queryset = queryset.filter(instrumentations__pk__in=instrument_list).distinct()
return queryset
I want to write all types of complex queries,
for example :
If someone wants information "Fruit" is "Guava" in "Pune District" then they will get data for guava in pune district.
htt//api/?fruit=Guava&?district=Pune
If someone wants information "Fruit" is "Guava" in "Girnare Taluka" then they will get data for guava in girnare taluka.
htt://api/?fruit=Guava&?taluka=Girnare
If someone wants information for "Fruit" is "Guava" and "Banana" then they will get all data only for this two fruits, like wise
htt://api/?fruit=Guava&?Banana
But, when I run server then I cant get correct output
If i use http://api/?fruit=Banana then I get all data for fruit which is banana, pomegranate, guava instead of get data for fruit is only banana. So I am confuse what happen here.
can you please check my code, where I made mistake?
*Here is my all files
models.py
class Wbcis(models.Model):
Fruit = models.CharField(max_length=50)
District = models.CharField(max_length=50)
Taluka = models.CharField(max_length=50)
Revenue_circle = models.CharField(max_length=50)
Sum_Insured = models.FloatField()
Area = models.FloatField()
Farmer = models.IntegerField()
def get_wbcis(fruit=None, district=None, talkua=None, revenue_circle=None, sum_insured=None, area=None,min_farmer=None, max_farmer=None, limit=100):
query = Wbcis.objects.all()
if fuit is not None:
query = query.filter(Fruit=fruit)
if district is not None:
query = query.filter(District=district)
if taluka is not None:
query = query.filter(Taluka=taluka)
if revenue_circle is not None:
query = query.filter(Revenue_circle= revenue_circle)
if sum_insured is not None:
query = query.filter(Sum_Insured=sum_Insured)
if area is not None:
query = query.filter(Area=area)
if min_farmer is not None:
query = query.filter(Farmer__gte=min_farmer)
if max_farmer is not None:
query = query.filter(Farmer__lt=max_farmer)
return query[:limit]
Views.py
class WbcisViewSet(ModelViewSet):
queryset = Wbcis.objects.all()
serializer_class = WbcisSerializer
def wbcis_view(request):
fruit = request.GET.get("fruit")
district = request.GET.get("district")
taluka = request.GET.get("taluka")
revenue_circle = request.GET.get("revenue_circle")
sum_insured = request.GET.get("sum_insured")
area = request.GET.get("area")
min_farmer = request.GET.get("min_farmer")
max_farmer = request.GET.get("max_farmer")
wbcis = get_wbcis(fruit, district, taluka,revenue_circle,sum_insured,area, min_farmer, max_farmer)
#convert them to JSON:
dicts = []
for wbci in wbcis:
dicts.append(model_to_dict(wbci))
return JsonResponse(dicts)
Serializers.py
from rest_framework.serializers import ModelSerializer
from WBCIS.models import Wbcis
class WbcisSerializer(ModelSerializer):
class Meta:
model = Wbcis
fields=('id','Fruit','District','Sum_Insured','Area','Farmer','Taluka','Revenue_circle',)
whats need changes in this code for call these queries to get exact output?
I don't think that you're actually calling that view, judging by your usage I presume you're calling the viewset itself and then ignoring the query params.
You should follow the drf docs for filtering but essentially, provide the get queryset method to your viewset and include the code you currently have in your view in that
class WbcisViewSet(ModelViewSet):
queryset = Wbcis.objects.all() # Shouldn't need this anymore
serializer_class = WbcisSerializer
def get_queryset(self):
fruit = self.request.query_params.get("fruit")
....
return get_wbscis(...)
I'm working on a django project with the following models.
class User(models.Model):
pass
class Item(models.Model):
user = models.ForeignKey(User)
item_id = models.IntegerField()
There are about 10 million items and 100 thousand users.
My goal is to override the default admin search that takes forever and
return all the matching users that own "all" of the specified item ids within a reasonable timeframe.
These are a couple of the tests I use to better illustrate my criteria.
class TestSearch(TestCase):
def search(self, searchterm):
"""A tuple is returned with the first element as the queryset"""
return do_admin_search(User.objects.all())
def test_return_matching_users(self):
user = User.objects.create()
Item.objects.create(item_id=12345, user=user)
Item.objects.create(item_id=67890, user=user)
result = self.search('12345 67890')
assert_equal(1, result[0].count())
assert_equal(user, result[0][0])
def test_exclude_users_that_do_not_match_1(self):
user = User.objects.create()
Item.objects.create(item_id=12345, user=user)
result = self.search('12345 67890')
assert_false(result[0].exists())
def test_exclude_users_that_do_not_match_2(self):
user = User.objects.create()
result = self.search('12345 67890')
assert_false(result[0].exists())
The following snippet is my best attempt using annotate that takes over 50 seconds.
def search_by_item_ids(queryset, item_ids):
params = {}
for i in item_ids:
cond = Case(When(item__item_id=i, then=True), output_field=BooleanField())
params['has_' + str(i)] = cond
queryset = queryset.annotate(**params)
params = {}
for i in item_ids:
params['has_' + str(i)] = True
queryset = queryset.filter(**params)
return queryset
Is there anything I can do to speed it up?
Here's some quick suggestions that should improve performance drastically.
Use prefetch_related` on the initial queryset to get related items
queryset = User.objects.filter(...).prefetch_related('user_set')
Filter with the __in operator instead of looping through a list of IDs
def search_by_item_ids(queryset, item_ids):
return queryset.filter(item__item_id__in=item_ids)
Don't annotate if it's already a condition of the query
Since you know that this queryset only consists of records with ids in the item_ids list, no need to write that per object.
Putting it all together
You can speed up what you are doing drastically just by calling -
queryset = User.objects.filter(
item__item_id__in=item_ids
).prefetch_related('user_set')
with only 2 db hits for the full query.
EDIT:
It turns out the real question is - how do I get select_related to follow the m2m relationships I have defined? Those are the ones that are taxing my system. Any ideas?
I have two classes for my django app. The first (Item class) describes an item along with some functions that return information about the item. The second class (Itemlist class) takes a list of these items and then does some processing on them to return different values. The problem I'm having is that returning a list of items from Itemlist is taking a ton of queries, and I'm not sure where they're coming from.
class Item(models.Model):
# for archiving purposes
archive_id = models.IntegerField()
users = models.ManyToManyField(User, through='User_item_rel',
related_name='users_set')
# for many to one relationship (tags)
tag = models.ForeignKey(Tag)
sub_tag = models.CharField(default='',max_length=40)
name = models.CharField(max_length=40)
purch_date = models.DateField(default=datetime.datetime.now())
date_edited = models.DateTimeField(auto_now_add=True)
price = models.DecimalField(max_digits=6, decimal_places=2)
buyer = models.ManyToManyField(User, through='Buyer_item_rel',
related_name='buyers_set')
comments = models.CharField(default='',max_length=400)
house_id = models.IntegerField()
class Meta:
ordering = ['-purch_date']
def shortDisplayBuyers(self):
if len(self.buyer_item_rel_set.all()) != 1:
return "multiple buyers"
else:
return self.buyer_item_rel_set.all()[0].buyer.name
def listBuyers(self):
return self.buyer_item_rel_set.all()
def listUsers(self):
return self.user_item_rel_set.all()
def tag_name(self):
return self.tag
def sub_tag_name(self):
return self.sub_tag
def __unicode__(self):
return self.name
and the second class:
class Item_list:
def __init__(self, list = None, house_id = None, user_id = None,
archive_id = None, houseMode = 0):
self.list = list
self.house_id = house_id
self.uid = int(user_id)
self.archive_id = archive_id
self.gen_balancing_transactions()
self.houseMode = houseMode
def ret_list(self):
return self.list
So after I construct Itemlist with a large list of items, Itemlist.ret_list() takes up to 800 queries for 25 items. What can I do to fix this?
Try using select_related
As per a question I asked here
Dan is right in telling you to use select_related.
select_related can be read about here.
What it does is return in the same query data for the main object in your queryset and the model or fields specified in the select_related clause.
So, instead of a query like:
select * from item
followed by several queries like this every time you access one of the item_list objects:
select * from item_list where item_id = <one of the items for the query above>
the ORM will generate a query like:
select item.*, item_list.*
from item a join item_list b
where item a.id = b.item_id
In other words: it will hit the database once for all the data.
You probably want to use prefetch_related
Works similarly to select_related, but can deal with relations selected_related cannot. The join happens in python, but I've found it to be more efficient for this kind of work than the large # of queries.
Related reading on the subject
What I want is to be able to get this weeks/this months/this years etc. hotest products. So I have a model named ProductStatistics that will log each hit and each purchase on a day-to-day basis. This is the models I have got to work with:
class Product(models.Model):
name = models.CharField(_("Name"), max_length=200)
slug = models.SlugField()
description = models.TextField(_("Description"))
picture = models.ImageField(upload_to=product_upload_path, blank=True)
category = models.ForeignKey(ProductCategory)
prices = models.ManyToManyField(Store, through='Pricing')
objects = ProductManager()
class Meta:
ordering = ('name', )
def __unicode__(self):
return self.name
class ProductStatistic(models.Model):
# There is only 1 `date` each day. `date` is
# set by datetime.today().date()
date = models.DateTimeField(default=datetime.now)
hits = models.PositiveIntegerField(default=0)
purchases = models.PositiveIntegerField(default=0)
product = models.ForeignKey(Product)
class Meta:
ordering = ('product', 'date', 'purchases', 'hits', )
def __unicode__(self):
return u'%s: %s - %s hits, %s purchases' % (self.product.name, str(self.date).split(' ')[0], self.hits, self.purchases)
How would you go about sorting the Products after say (hits+(purchases*2)) the latest week?
This structure isn't set in stone either, so if you would structure the models in any other way, please tell!
first idea:
in the view you could query for today's ProductStatistic, than loop over the the queryset and add a variable ranking to every object and add that object to a list. Then just sort after ranking and pass the list to ur template.
second idea:
create a filed ranking (hidden for admin) and write the solution of ur formula each time the object is saved to the database by using a pre_save-signal. Now you can do ProductStatistic.objects.filter(date=today()).order_by('ranking')
Both ideas have pros&cons, but I like second idea more
edit as response to the comment
Use Idea 2
Write a view, where you filter like this: ProductStatistic.objects.filter(product= aProductObject, date__gte=startdate, date__lte=enddate)
loop over the queryset and do somthing like aProductObject.ranking+= qs_obj.ranking
pass a sorted list of the queryset to the template
Basically a combination of both ideas
edit to your own answer
Your solution isn't far away from what I suggested — but in sql-space.
But another solution:
Make a Hit-Model:
class Hit(models.Model):
date = models.DateTimeFiles(auto_now=True)
product = models.ForeignKey(Product)
purchased= models.BooleanField(default=False)
session = models.CharField(max_length=40)
in your view for displaying a product you check, if there is a Hit-object with the session, and object. if not, you save it
Hit(product=product,
date=datetime.datetime.now(),
session=request.session.session_key).save()
in your purchase view you get the Hit-object and set purchased=True
Now in your templates/DB-Tools you can do real statistics.
Of course it can generate a lot of DB-Objects over the time, so you should think about a good deletion-strategy (like sum the data after 3 month into another model MonthlyHitArchive)
If you think, that displaying this statistics would generate to much DB-Traffic, you should consider using some caching.
I solved this the way I didn't want to solve it. I added week_rank, month_rank and overall_rank to Product and then I just added the following to my ProductStatistic model.
def calculate_rank(self, days_ago=7, overall=False):
if overall:
return self._default_manager.all().extra(
select = {'rank': 'SUM(hits + (clicks * 2))'}
).values()[0]['rank']
else:
return self._default_manager.filter(
date__gte = datetime.today()-timedelta(days_ago),
date__lte = datetime.today()
).extra(
select = {'rank': 'SUM(hits + (clicks * 2))'}
).values()[0]['rank']
def save(self, *args, **kwargs):
super(ProductStatistic, self).save(*args, **kwargs)
t = Product.objects.get(pk=self.product.id)
t.week_rank = self.calculate_rank()
t.month_rank = self.calculate_rank(30)
t.overall_rank = self.calculate_rank(overall=True)
t.save()
I'll leave it unsolved if there is a better solution.