Fetch objects with its relation source objects - django

The models:
class Offer(models.Model):
    desc = models.TextField()
class Bid(models.Model):
    offer = models.ForeignKey(Offer)
So there may be many bids for one offer.
Is there any way to fetch all offers together with their bids, without performing a query for each offer?
There's a table with offer list, and I need to add a "B" flag in every row if there's at least one bid.
I tried with prefetch_related(). This worked fine. I got a "bids" attribute attached (as list) for every offer instance, but it resulted in num_offers queries.
offers = Offer.objects.prefetch_related(
models.Prefetch('bid_set', to_attr='bids', queryset=Bid.objects.select_related()))

Since you are querying from the Offer model, which is the "one" side of a one-to-many relation, the only way to get the related objects is to actually perform 2 queries (1 for the offers and 1 for all related bids):
offers = Offer.objects.prefetch_related('bid_set').annotate(number_of_bids=models.Count('bid'))
On the other hand, you can build the query from the other side, from Bid towards Offer, which produces a single query with a JOIN:
bids = Bid.objects.select_related('offer').all()
offers = [o.offer for o in bids]
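Note that this list repeats an offer once per bid and omits offers that have no bids. Which one you prefer depends on what you want, how your data is structured, and how many entries your DB contains.
annotate() produces a virtual number_of_bids field containing the number of bids for each offer, so the "B" flag is simply number_of_bids > 0.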
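To see why the annotated count needs only one query, here is a minimal sketch of the kind of SQL it boils down to, run against an in-memory SQLite database. The table names (offer, bid) and the sample rows are assumptions made for illustration; Django would normally prefix table names with the app label.

```python
import sqlite3

# Hypothetical tables mirroring the Offer and Bid models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE offer (id INTEGER PRIMARY KEY, "desc" TEXT);
    CREATE TABLE bid (id INTEGER PRIMARY KEY, offer_id INTEGER REFERENCES offer(id));
    INSERT INTO offer (id, "desc") VALUES (1, 'bike'), (2, 'desk'), (3, 'lamp');
    INSERT INTO bid (offer_id) VALUES (1), (1), (3);
""")


def offers_with_bid_counts(conn):
    """Return (id, desc, number_of_bids, flag) for every offer in one query."""
    rows = conn.execute("""
        SELECT offer.id, offer."desc", COUNT(bid.id) AS number_of_bids
        FROM offer
        LEFT JOIN bid ON bid.offer_id = offer.id
        GROUP BY offer.id
        ORDER BY offer.id
    """)
    # The LEFT JOIN keeps offers without bids; their count is 0.
    return [(oid, desc, n, "B" if n > 0 else "") for oid, desc, n in rows]


for row in offers_with_bid_counts(conn):
    print(row)
```

If only the flag is needed, the count alone is enough and the prefetch of the bids can be dropped.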

Related

is it possible to add db_index = True to a field that is not unique (django)

I have a model that has some fields like:
current_datetime = models.TimeField(auto_now_add=True)
new_datetime = models.DateTimeField(null=True, db_index=True)
and data would be like :
current_datetime = 2023-01-22T09:42:00+0330 new_datetime = 2023-01-22T09:00:00+0330
current_datetime = 2023-01-22T09:52:00+0330 new_datetime = 2023-01-22T09:00:00+0330
current_datetime = 2023-01-22T10:02:00+0330 new_datetime = 2023-01-22T10:00:00+0330
Is it possible for new_datetime to have db_index=True?
The reason I want this index is that there are many rows (more than 200,000, with more added every day), and there is a place where the user can choose a datetime range and see the results (it's a statistical website). I want to send a query filtered by that datetime range, so it should be fast. By the way, I am using PostgreSQL.
Also, if you have tips for handling data on such websites, I would be glad to hear them.
Thanks.
Yes, it is possible: db_index=True does not require the field to be unique, so new_datetime can have it. The index can speed up queries that sort or filter by that field.
Other ways to keep such queries fast:
Use your database's EXPLAIN command to evaluate the query plan and detect slow steps or missing indexes.
Use limit and offset in your queries to fetch only the data you need.
To retrieve associated data in a single query rather than numerous queries, use the select_related and prefetch_related methods in your Django queries.
To store the results of expensive queries and avoid running the same query multiple times, use a caching system such as Redis or Memcached.
Moreover, if there are too many rows and the data is not required for a long period of time, you can consider archiving the old information in another table or database.
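As a rough illustration of the first tip, the sketch below creates a plain (non-unique) index on a datetime column and asks the database for its query plan. It uses SQLite as a stand-in because it ships with Python; the question is about PostgreSQL, where the command is EXPLAIN and the output looks different. The table name stat and the index name are assumptions, and the timestamps are stored as ISO strings without the UTC offset to keep the comparison simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stat (id INTEGER PRIMARY KEY, new_datetime TEXT)")
# A plain, non-unique index: duplicate values in the column are allowed.
conn.execute("CREATE INDEX stat_new_datetime_idx ON stat (new_datetime)")
conn.executemany(
    "INSERT INTO stat (new_datetime) VALUES (?)",
    [("2023-01-22T09:00:00",), ("2023-01-22T09:00:00",), ("2023-01-22T10:00:00",)],
)

range_query = "SELECT COUNT(*) FROM stat WHERE new_datetime >= ? AND new_datetime < ?"
params = ("2023-01-22T09:00:00", "2023-01-22T10:00:00")

# Run the range filter, then ask SQLite how it would execute it.
count = conn.execute(range_query, params).fetchone()[0]
plan = " ".join(
    row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + range_query, params)
)

print(count)
print(plan)  # the plan text should name stat_new_datetime_idx
```

If the plan reports a full table scan instead of naming the index, the filter is not using it and the query or the index deserves a second look.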

Join two records from same model in django queryset

Been searching the web for a couple hours now looking for a solution but nothing quite fits what I am looking for.
I have one model (simplified):
class SimpleModel(Model):
    name = CharField('Name', unique=True)
    date = DateField()
    amount = FloatField()
I have two dates; date_one and date_two.
I would like a single queryset with a row for each name in the Model, with each row showing:
{'name': name, 'date_one': date_one, 'date_two': date_two, 'amount_one': amount_one, 'amount_two': amount_two, 'change': amount_two - amount_one}
Reason being I would like to be able to find the rank of amount_one, amount_two, and change, using sort or filters on that single queryset.
I know I could create a list of dictionaries from two separate querysets then sort on that and get the ranks from the index values ...
but, perhaps naively, I feel like there should be a DB solution using one queryset that would be faster.
union() seemed promising, but you cannot perform some simple operations like filter() after it.
I think I could perhaps split name into its own Model and generate queryset with related fields, but I'd prefer not to change the schema at this stage. Also, I only have access to sqlite.
Appreciate any help!
Your current model forces you to have ONE name associated with ONE date and ONE amount. Because name is unique=True, you literally cannot have two dates associated with the same name.
So if you want to be able to have several dates/amounts associated with a name, there are several ways to proceed:
Idea 1: If there will only ever be 2 dates and 2 amounts, simply add a second date field and a second amount field.
Idea 2: If there can be any number of dates and amounts, you'll have to change your model to reflect it, by having:
A model for your names
A model for your days and amounts, with a foreign key to your names
Idea 3: You could keep the same model and simply remove the unique constraint, but that's a recipe for mistakes
Based on your choice, you'll then have several ways of querying what you need. It depends on your final model structure. The best way to go would be to create custom model methods that query the 2 dates/amount, format an array and return it
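Once several rows per name are allowed (Idea 2 or Idea 3), the comparison of two dates can be done by the database with a self-join. The following is only a sketch in plain SQL through sqlite3, not a Django queryset; the table name simplemodel, the UNIQUE (name, date) constraint, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE simplemodel (
        id INTEGER PRIMARY KEY, name TEXT, date TEXT, amount REAL,
        UNIQUE (name, date)
    );
    INSERT INTO simplemodel (name, date, amount) VALUES
        ('a', '2020-01-01', 10.0), ('a', '2020-02-01', 15.0),
        ('b', '2020-01-01', 20.0), ('b', '2020-02-01', 12.0),
        ('c', '2020-01-01', 5.0),  ('c', '2020-02-01', 30.0);
""")


def compare_dates(conn, date_one, date_two):
    """One row per name: (name, amount_one, amount_two, change), largest change first."""
    return conn.execute("""
        SELECT one.name,
               one.amount AS amount_one,
               two.amount AS amount_two,
               two.amount - one.amount AS change
        FROM simplemodel AS one
        JOIN simplemodel AS two ON two.name = one.name
        WHERE one.date = ? AND two.date = ?
        ORDER BY change DESC
    """, (date_one, date_two)).fetchall()


# The position in the sorted result is the rank by change.
for rank, row in enumerate(compare_dates(conn, "2020-01-01", "2020-02-01"), start=1):
    print(rank, row)
```

Changing the ORDER BY column gives the rank by amount_one or amount_two instead. Names that lack a row for either date are dropped by the inner join.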

django subquery with a join in it

I've got django 1.8.5 and Python 3.4.3, and trying to create a subquery that constrains my main data set - but the subquery itself (I think) needs a join in it. Or maybe there is a better way to do it.
Here's a trimmed down set of models:
class Lot(models.Model):
    lot_id = models.CharField(max_length=200, unique=True)
class Lot_Country(models.Model):
    lot = models.ForeignKey(Lot)
    country = CountryField()
class Discrete(models.Model):
    discrete_id = models.CharField(max_length=200, unique=True)
    master_id = models.ForeignKey(Inventory_Master)
    location = models.ForeignKey(Location)
    lot = models.ForeignKey(Lot)
I am filtering on various attributes of Discrete (which is discrete supply) and I want to go "up" through Lot and over to Lot_Country, meaning "I only want to get rows from Discrete if the Lot associated with that row has an entry in Lot_Country for my appropriate country (let's say US)."
I've tried something like this:
oklots=list(Lot_Country.objects.filter(country='US'))
But, first of all that gives me the str back, which I don't really want (and changed it to be lot_id, but that's a hack.)
What's the best way to constrain Discrete through Lot and over to Lot_Country? In SQL I would just join in the subquery (or even in the main query - maybe that's what I need? I guess I don't know how to join up to a parent then down into that parent's other child...)
Thanks in advance for your help.
I'm not sure what you mean by "it gives me the str back"... Lot_Country.objects.filter(country='US') will return a queryset. Of course if you print it in your console, you will see a string.
I also think your models need refactoring. The way you have currently defined it, you can associate multiple Lot_Countrys with one Lot, and a country can only be associated with one lot.
If I understand your general model correctly that isn't what you want - you want to associate multiple Lots with one Lot_Country. To do that you need to reverse your foreign key relationship (i.e., put it inside the Lot).
Then, for fetching all the Discrete lots that are in a given country, you would do:
discretes_in_us = Discrete.objects.filter(lot__lot_country__country='US')
Which will give you a queryset of all Discretes whose Lot is in the US.
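The double-underscore lookup asks the database to join from Discrete up to Lot and down to Lot_Country. A minimal sketch of that join in plain SQL, using sqlite3 with simplified, hypothetical table names and sample rows (the unrelated master_id and location columns are left out):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lot (id INTEGER PRIMARY KEY, lot_id TEXT UNIQUE);
    CREATE TABLE lot_country (
        id INTEGER PRIMARY KEY, lot_id INTEGER REFERENCES lot(id), country TEXT
    );
    CREATE TABLE discrete (
        id INTEGER PRIMARY KEY, discrete_id TEXT UNIQUE, lot_id INTEGER REFERENCES lot(id)
    );
    INSERT INTO lot (id, lot_id) VALUES (1, 'L-1'), (2, 'L-2');
    INSERT INTO lot_country (lot_id, country) VALUES (1, 'US'), (1, 'CA'), (2, 'DE');
    INSERT INTO discrete (discrete_id, lot_id) VALUES ('D-1', 1), ('D-2', 1), ('D-3', 2);
""")


def discretes_in_country(conn, country):
    """IDs of discrete rows whose lot has a lot_country entry for the given country."""
    rows = conn.execute("""
        SELECT DISTINCT discrete.discrete_id
        FROM discrete
        JOIN lot ON lot.id = discrete.lot_id
        JOIN lot_country ON lot_country.lot_id = lot.id
        WHERE lot_country.country = ?
        ORDER BY discrete.discrete_id
    """, (country,))
    return [r[0] for r in rows]


print(discretes_in_country(conn, "US"))
```

DISTINCT guards against duplicates when a lot has more than one matching lot_country row.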

Counting the number of related objects with a certain value in Django

These are simplified models to demonstrate my problem:
class User(models.Model):
    username = models.CharField(max_length=30)
    total_readers = models.IntegerField(default=0)
class Book(models.Model):
    author = models.ForeignKey(User)
    title = models.CharField(max_length=100)
class Reader(models.Model):
    user = models.ForeignKey(User)
    book = models.ForeignKey(Book)
So, we have Users, Books and Readers (Users, who have read a Book). Thus, Reader is basically a many-to-many relationship between Book and User.
Now let's say, the current user reads a book. Now, I'd like to update the number of total readers for all books of this book's author:
# get the book (as an example pk=1)
book = Book.objects.get(pk=1)
# save Reader object for this user and this book
Reader(user=request.user, book=book).save()
# count and save the total number of readers for this author in all his books
book.author.total_readers = Reader.objects.filter(book__author=book.author).count()
book.author.save()
By doing so, Django creates a LEFT OUTER JOIN query for PostgreSQL and we get the expected result. However, the database tables are huge and this has become a bottleneck.
In this example, we could simply increase the total_readers by one on each view, instead of actually counting the database rows. However, this is just a simplified model structure and we cannot do this in reality here.
What I can do, is creating another field in the Reader model called book_author_id. Thus, I denormalize data and can count the Reader objects without having PostgreSQL making the LEFT OUTER JOIN with the User table.
Finally, here's my question: Is it possible to create some sort of database index, so that PostgreSQL handles this denormalization automatically? Or do I really have to create this additional model field and redundantly store the author's PK in there?
EDIT - to point out the essential question: I got several great answers, which work for a lot of scenarios. However, they don't solve this actual problem. The only thing I'd like to know, is if it's possible to have PostgreSQL handle such a denormalization automatically - e.g. by creating some sort of database index.
Sometimes, this query can serve better:
book.author.total_readers = Reader.objects.filter(book__in=Book.objects.filter(author=book.author)).count()
That generates a query with a subquery, which sometimes performs better than a query with a join. You can even go further and run 2 separate queries:
book.author.total_readers = Reader.objects.filter(book_id__in=list(Book.objects.filter(author=book.author).values_list('id', flat=True))).count()
Wrapping the inner queryset in list() forces it to be evaluated first, so this runs 2 queries: one retrieves the list of all book IDs for that author, and the second counts the readers for books whose ID is in that list.
Another good solution may be a batch task that runs, for example, once per hour and counts up all reads, but then the read count is not refreshed live.
You can also create a Celery task that runs right after a read is created to compute the new value for the author. That way the response time stays short and the delay between creating a read and counting it stays small.
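The three ways of counting discussed in this answer (a join, an IN subquery, and two separate queries) return the same number; which one is fastest depends on the database and the data. A small sketch with sqlite3 that shows the equivalence, using hypothetical table names and sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE TABLE reader (id INTEGER PRIMARY KEY, user_id INTEGER, book_id INTEGER);
    INSERT INTO user (id, username) VALUES (1, 'author'), (2, 'alice'), (3, 'bob');
    INSERT INTO book (id, author_id, title) VALUES
        (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'other');
    INSERT INTO reader (user_id, book_id) VALUES (2, 1), (3, 1), (3, 2), (3, 3);
""")


def count_with_join(conn, author_id):
    return conn.execute(
        "SELECT COUNT(*) FROM reader JOIN book ON book.id = reader.book_id "
        "WHERE book.author_id = ?",
        (author_id,),
    ).fetchone()[0]


def count_with_subquery(conn, author_id):
    return conn.execute(
        "SELECT COUNT(*) FROM reader "
        "WHERE book_id IN (SELECT id FROM book WHERE author_id = ?)",
        (author_id,),
    ).fetchone()[0]


def count_with_two_queries(conn, author_id):
    # First query: the author's book IDs. Second query: readers of those books.
    book_ids = [r[0] for r in conn.execute(
        "SELECT id FROM book WHERE author_id = ?", (author_id,))]
    if not book_ids:
        return 0
    placeholders = ", ".join("?" * len(book_ids))
    return conn.execute(
        f"SELECT COUNT(*) FROM reader WHERE book_id IN ({placeholders})", book_ids
    ).fetchone()[0]


print(count_with_join(conn, 1), count_with_subquery(conn, 1), count_with_two_queries(conn, 1))
```

Timing each variant on real data, or inspecting its query plan, is the only reliable way to pick one.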
It's always way better to solve bottlenecks of this sort with good design and maybe a little bit of caching rather than duplicating data in the way you suggest. The total_readers field is data you should generate instead of recording.
class User(models.Model):
    username = models.CharField(max_length=30)
    @property
    def total_readers(self):
        # caching_client stands for any cache client with get/set, e.g. django.core.cache.cache
        cached_value = caching_client.get("readers_" + self.username, None)
        if cached_value is None:
            cached_value = self.readers()
            caching_client.set("readers_" + self.username, cached_value)
        return cached_value
    def readers(self):
        # Book.author is a ForeignKey to User, so filter on book__author directly
        return Reader.objects.filter(book__author=self).count()
There are libraries that do the caching via decorators, but I felt it was a pattern you would benefit from seeing spelled out. You can also attach a TTL to the cache to ensure that the value can't be wrong for longer than the TTL. You can also regenerate the cache upon creation of a Reader object.
You might actually get some mileage with declaring an m2m and defining through relationships but I have no experience of it.
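The TTL and regeneration ideas mentioned in this answer can be sketched without any cache server. TTLCache below is a hypothetical helper written for illustration, not part of Django or of any caching library; a real project would more likely pass a timeout to its existing cache client.

```python
import time


class TTLCache:
    """Tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh
        value = compute()  # missing or expired: recompute and store
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        # Call this when a new Reader row is created for the author.
        self._entries.pop(key, None)


calls = []


def count_readers():
    calls.append(1)  # stands in for the expensive COUNT query
    return 42


fake_now = [0.0]  # controllable clock so the example does not sleep
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])

first = cache.get_or_compute("readers_alice", count_readers)   # computes
second = cache.get_or_compute("readers_alice", count_readers)  # served from cache
fake_now[0] = 61.0
third = cache.get_or_compute("readers_alice", count_readers)   # expired, recomputes
```

The expensive count runs only when the entry is missing or older than the TTL, which bounds how stale the displayed number can be.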

How to sort by annotated Count() in a related model in Django

I'm building a food logging database in Django and I've got a query related problem.
I've set up my models to include (among other things) a Food model connected to the User model through an M2M-field "consumer" via the Consumption model. The Food model describes food dishes and the Consumption model describes a user's consumption of Food (date, amount, etc).
class Food(models.Model):
    food_name = models.CharField(max_length=30)
    consumer = models.ManyToManyField("User", through="Consumption")
class Consumption(models.Model):
    food = models.ForeignKey("Food")
    user = models.ForeignKey("User")
I want to create a query that returns all Food objects ordered by the number of times that Food object appears in the Consumption table for that user (the number of times the user has consumed the food).
I'm trying something in the line of:
Food.objects.all().annotate(consumption_times=Count('consumer')).order_by('consumption_times')
But this will of course count all Consumption objects related to the Food object, not just the ones associated with the user. Do I need to change my models or am I just missing something obvious in the queries?
This is a pretty time-critical operation (among other things, it's used to fill an Autocomplete field in the Frontend) and the Food table has a couple of thousand entries, so I'd rather do the sorting in the database end, rather than doing the brute force method and iterate over the results doing:
Consumption.objects.filter(food=food, user=user).count()
and then using python sort to sort them. I don't think that method would scale very well as the user base increases and I want to design the database as future proof as I can from the start.
Any ideas?
Perhaps something like this?
Food.objects.filter(consumer=user)\
    .annotate(consumption_times=Count('consumer'))\
    .order_by('consumption_times')
I am having a very similar issue. Basically, I know that the SQL query you want is:
SELECT food.*, COUNT(IF(consumption.user_id=123,TRUE,NULL)) AS consumption_times
FROM food LEFT JOIN consumption ON (food.id=consumption.food_id)
GROUP BY food.id
ORDER BY consumption_times;
What I wish is that you could mix aggregate functions and F expression, annotate F expressions without an aggregate function, have a richer set of operations/functions for F expressions, and have virtual fields that are basically an automatic F expression annotation. So that you could do:
Food.objects.annotate(consumption_times=Count(If(F('consumer')==user,True,None)))\
    .order_by('consumption_times')
Also, just being able to add your own complex aggregate functions more easily would be nice, but in the meantime, here's a hack that adds an aggregate function to do this.
from django.db.models import aggregates, sql
class CountIf(sql.aggregates.Count):
    sql_template = '%(function)s(IF(%(field)s=%(equals)s,TRUE,NULL))'
sql.aggregates.CountIf = CountIf
consumption_times = aggregates.Count('consumer', equals=user.id)
consumption_times.name = 'CountIf'
rows = Food.objects.annotate(consumption_times=consumption_times)\
    .order_by('consumption_times')
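The conditional count in the SQL above uses MySQL's IF(); a portable way to write it is COUNT(CASE WHEN ... THEN 1 END). The sketch below runs that form on SQLite with hypothetical table names and sample rows. In Django 2.0 and later the same idea can be expressed without a hack as Count('consumption', filter=Q(consumption__user=user)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE food (id INTEGER PRIMARY KEY, food_name TEXT);
    CREATE TABLE consumption (id INTEGER PRIMARY KEY, food_id INTEGER, user_id INTEGER);
    INSERT INTO food (id, food_name) VALUES (1, 'soup'), (2, 'rice'), (3, 'cake');
    INSERT INTO consumption (food_id, user_id) VALUES
        (1, 123), (2, 123), (2, 123), (2, 999), (3, 999), (3, 999), (3, 999);
""")


def foods_by_consumption(conn, user_id):
    """All foods, most consumed by the given user first; unconsumed foods count 0."""
    return conn.execute("""
        SELECT food.food_name,
               COUNT(CASE WHEN consumption.user_id = ? THEN 1 END) AS consumption_times
        FROM food
        LEFT JOIN consumption ON consumption.food_id = food.id
        GROUP BY food.id
        ORDER BY consumption_times DESC, food.food_name
    """, (user_id,)).fetchall()


print(foods_by_consumption(conn, 123))
```

Because the user condition sits inside the aggregate rather than in WHERE, foods the user has never eaten stay in the result with a count of 0, which is what an autocomplete list needs.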