Django and writing queries with lots of joins - django

I have trouble writing this kind of query with lots of joins. I haven't found any examples, but I guess they are not that complicated to write; there are just several FKs involved.
Here is the models.py (not complicated)
class User(AbstractBaseUser, PermissionsMixin):  # Django custom user model
    pass  # Some stuff
class CliProfile(models.Model):
    user = models.OneToOneField(settings.AUTH_USER_MODEL)
class BizProfile(models.Model):
    user = models.OneToOneField(settings.AUTH_USER_MODEL)
class Card(models.Model):
    linked_client = models.ForeignKey(CliProfile, blank=True, null=True)
class Points(models.Model):
    benef_card = models.ForeignKey(Card)
    at_owner = models.ForeignKey(BizProfile)
    creation_date = models.DateTimeField(auto_now_add=True)
Quick description of the model
a user can be a client (using CliProfile) or a business (using BizProfile)
each card is linked to a client
each card contains a [points - business] association
This way, a client has one card and can have 3 points at Pizza Hut and 5 points at McDonald's with the same card.
The request I'm trying to write
Functionally speaking, the purpose is for an owner (like Pizza Hut) to see all of his clients (clients who have cards with points at Pizza Hut).
Technically speaking, I'm trying to write a query that returns all clients (i.e. a CliProfile queryset) who have at least one card, with at least one points entry, whose owner (there is only one) has a user (there is only one) equal to request.user.
Do you have any idea how to write such a query? Thanks a lot.

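To follow relationships between models in filter(), join the field names with two underscores. Because at_owner points at a BizProfile rather than a User, follow it one step further to the profile's user. Add distinct() so that a client with several matching points rows is returned only once:
CliProfile.objects.filter(card__points__at_owner__user=request.user).distinct()
But @Alex's suggestion makes the most sense unless this was just an example of what you are trying to do.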
If you wanted profiles that are associated with one of several cards you can use the __in field lookup:
CliProfile.objects.filter(card__in=IterableOfCards)
Also, you don't use == in filter(). That would be a comparison, which evaluates to True or False, and that value would then be passed to filter(), effectively making the call filter(True or False), which won't do anything useful. You have to use = because you are passing a keyword argument to the filter function.
Why card instead of card_set()?
card_set only exists on an instance of CliProfile. You are not working with an instance here; you are trying to get a list of them.
You can try it in the terminal and it will tell you the valid choices.
# Note that it doesn't matter what you put after the =, since the lookup fails before the value is checked.
>>> CliProfile.objects.filter(card_set=True)
FieldError: Cannot resolve keyword 'card_set' into field. Choices are: card, id, user
A CliProfile can be referenced by multiple cards, which is why card_set exists on an instance. In the filter, however, you are matching on a single card: the one with points whose at_owner belongs to request.user.
You would use a_cliprofile_instance.card_set.filter() to get a subset of that client's cards, or a_cliprofile_instance.card_set.all() to get all of them.
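If it helps to think in SQL, a double-underscore chain like this corresponds roughly to a chain of joins. The sketch below uses Python's built-in sqlite3 module with hypothetical, simplified table names (cliprofile, bizprofile, card, points) and made-up rows; the SQL Django really generates will look different, so take it only as an illustration of the join shape and of why duplicates can show up without DISTINCT.

```python
import sqlite3

# Hypothetical, simplified tables standing in for the models above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cliprofile (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE bizprofile (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE card (id INTEGER PRIMARY KEY, linked_client_id INTEGER);
CREATE TABLE points (id INTEGER PRIMARY KEY, benef_card_id INTEGER, at_owner_id INTEGER);

INSERT INTO cliprofile VALUES (1, 101), (2, 102);
INSERT INTO bizprofile VALUES (1, 201), (2, 202);
INSERT INTO card VALUES (1, 1), (2, 2);
-- client 1: two points rows at business 1, one at business 2
-- client 2: one points rows at business 2 only
INSERT INTO points VALUES (1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 2, 2);
""")

# One JOIN per relationship hop: cliprofile -> card -> points -> bizprofile.
JOINS = """
FROM cliprofile
JOIN card ON card.linked_client_id = cliprofile.id
JOIN points ON points.benef_card_id = card.id
JOIN bizprofile ON bizprofile.id = points.at_owner_id
WHERE bizprofile.user_id = ?
"""

def client_ids(owner_user_id, distinct=True):
    """Return ids of clients holding points at the business owned by this user."""
    select = "SELECT DISTINCT cliprofile.id" if distinct else "SELECT cliprofile.id"
    rows = conn.execute(select + JOINS + " ORDER BY cliprofile.id", (owner_user_id,))
    return [row[0] for row in rows]

pizza_clients = client_ids(201)                      # deduplicated
pizza_clients_raw = client_ids(201, distinct=False)  # one row per matching points row
burger_clients = client_ids(202)
```

Here client 1 appears twice in pizza_clients_raw because two points rows match, which is the situation distinct() guards against on the Django side.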

Related

Django bulk_create() with models' fields having custom validators

In my Django application, I am using bulk_create(). For one of the fields in a target model I have assigned a set of validators to restrict the allowed value to uppercase letters (alphabets) and to a fixed length of "3", as shown below:
class Plant(models.Model):
    plant = models.CharField(primary_key=True, max_length=4, ...
    plant_name = models.CharField(max_length=75, ...
    plant_short_name = models.CharField(max_length=3, validators=[...
    # rest of the fields ...
I am restricting field plant_short_name to something like CHT for say, Plant Charlotte.
Using the source file (.csv) I am able to successfully create new instances using bulk_create, however I find that the data get saved even with field plant_short_name's value being different.
For example, if I use the source as:
plant,plant_name,plant_short_name
9999,XYZ Plant,XY
the new instance still gets created although the length of (string) value of field plant_short_name is only 2 (instead of 3 as defined in the validators).
If I am to use an online create function (say, Django CreateView), the validators work as expected.
How do I control / restrict the creation of a model instance when a field value of incorrect length is used in the source file?
bulk_create():
This method inserts the provided list of objects into the database in
an efficient manner (generally only 1 query, no matter how many
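objects there are). It also does not call save() on each of the
instances and does not send any pre_save/post_save signals.
Being efficient here also means that no validation takes place: bulk_create() never calls full_clean(), which is the method that runs field validators (model forms, and therefore CreateView, call it for you). You can read the implementation in django/db/models/query.py inside your environment.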
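One way to get validation back is to check every row yourself before calling bulk_create(). In Django that usually means calling full_clean() on each unsaved instance and skipping or reporting the ones that raise ValidationError. The plain-Python sketch below shows the same filtering idea without Django. The regular expression and the split_rows helper are assumptions made for illustration, since the actual validators are elided in the question.

```python
import csv
import io
import re

# Assumed rule: exactly three uppercase ASCII letters (the real validators are not shown).
SHORT_NAME_RE = re.compile(r"[A-Z]{3}")

def split_rows(csv_text):
    """Return (valid_rows, rejected_rows) from CSV text that starts with a header line."""
    valid, rejected = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if SHORT_NAME_RE.fullmatch(row["plant_short_name"]):
            valid.append(row)
        else:
            rejected.append(row)
    return valid, rejected

source = (
    "plant,plant_name,plant_short_name\n"
    "9999,XYZ Plant,XY\n"
    "1000,Plant Charlotte,CHT\n"
)
valid_rows, rejected_rows = split_rows(source)
# Only valid_rows would be turned into Plant instances and handed to bulk_create();
# rejected_rows can be logged or reported back to whoever supplied the file.
```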

django subquery with a join in it

I've got django 1.8.5 and Python 3.4.3, and trying to create a subquery that constrains my main data set - but the subquery itself (I think) needs a join in it. Or maybe there is a better way to do it.
Here's a trimmed down set of models:
class Lot(models.Model):
    lot_id = models.CharField(max_length=200, unique=True)
class Lot_Country(models.Model):
    lot = models.ForeignKey(Lot)
    country = CountryField()
class Discrete(models.Model):
    discrete_id = models.CharField(max_length=200, unique=True)
    master_id = models.ForeignKey(Inventory_Master)
    location = models.ForeignKey(Location)
    lot = models.ForeignKey(Lot)
I am filtering on various attributes of Discrete (which is discrete supply) and I want to go "up" through Lot and over to Lot_Country, meaning: I only want to get rows from Discrete if the Lot associated with that row has an entry in Lot_Country for my country (let's say US).
I've tried something like this:
oklots=list(Lot_Country.objects.filter(country='US'))
But, first of all that gives me the str back, which I don't really want (and changed it to be lot_id, but that's a hack.)
What's the best way to constrain Discrete through Lot and over to Lot_Country? In SQL I would just join in the subquery (or even in the main query - maybe that's what I need? I guess I don't know how to join up to a parent then down into that parent's other child...)
Thanks in advance for your help.
I'm not sure what you mean by "it gives me the str back"... Lot_Country.objects.filter(country='US') will return a queryset. Of course if you print it in your console, you will see a string.
I also think your models need refactoring. The way you have currently defined it, you can associate multiple Lot_Countrys with one Lot, and each Lot_Country row can only be associated with one Lot.
If I understand your general model correctly that isn't what you want - you want to associate multiple Lots with one Lot_Country. To do that you need to reverse your foreign key relationship (i.e., put it inside the Lot).
Then, for fetching all the Discrete lots that are in a given country, you would do:
discretes_in_us = Discrete.objects.filter(lot__lot_country__country='US')
Which will give you a queryset of all Discretes whose Lot is in the US.
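Since the question mentions doing this with a subquery in SQL, here is roughly what that shape looks like, run through Python's built-in sqlite3 module. Table names, column names and rows are hypothetical simplifications of the models above, not what Django would actually create. In Django, passing a queryset to an __in lookup produces a comparable nested query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lot (id INTEGER PRIMARY KEY, lot_id TEXT UNIQUE);
CREATE TABLE lot_country (id INTEGER PRIMARY KEY, lot_id INTEGER, country TEXT);
CREATE TABLE discrete (id INTEGER PRIMARY KEY, discrete_id TEXT UNIQUE, lot_id INTEGER);

INSERT INTO lot VALUES (1, 'LOT-A'), (2, 'LOT-B');
-- LOT-A is approved for US and CA, LOT-B only for DE
INSERT INTO lot_country VALUES (1, 1, 'US'), (2, 1, 'CA'), (3, 2, 'DE');
INSERT INTO discrete VALUES (1, 'D-1', 1), (2, 'D-2', 1), (3, 'D-3', 2);
""")

# Keep a discrete row only if its lot has a lot_country entry for the wanted country.
rows = conn.execute("""
SELECT discrete_id FROM discrete
WHERE lot_id IN (SELECT lot_id FROM lot_country WHERE country = ?)
ORDER BY discrete_id
""", ("US",)).fetchall()
us_discretes = [row[0] for row in rows]
```

Unlike a plain join, the IN form cannot return the same discrete row twice, even if a lot somehow had two entries for the same country.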

Counting the number of related objects with a certain value in Django

These are simplified models to demonstrate my problem:
class User(models.Model):
    username = models.CharField(max_length=30)
    total_readers = models.IntegerField(default=0)
class Book(models.Model):
    author = models.ForeignKey(User)
    title = models.CharField(max_length=100)
class Reader(models.Model):
    user = models.ForeignKey(User)
    book = models.ForeignKey(Book)
So, we have Users, Books and Readers (Users, who have read a Book). Thus, Reader is basically a many-to-many relationship between Book and User.
Now let's say, the current user reads a book. Now, I'd like to update the number of total readers for all books of this book's author:
# get the book (as an example pk=1)
book = Book.objects.get(pk=1)
# save Reader object for this user and this book
Reader(user=request.user, book=book).save()
# count and save the total number of readers for this author in all his books
book.author.total_readers = Reader.objects.filter(book__author=book.author).count()
book.author.save()
By doing so, Django creates a LEFT OUTER JOIN query for PostgreSQL and we get the expected result. However, the database tables are huge and this has become a bottleneck.
In this example, we could simply increase the total_readers by one on each view, instead of actually counting the database rows. However, this is just a simplified model structure and we cannot do this in reality here.
What I can do is create another field in the Reader model called book_author_id. That way I denormalize the data and can count the Reader objects without PostgreSQL having to make the LEFT OUTER JOIN with the User table.
Finally, here's my question: Is it possible to create some sort of database index, so that PostgreSQL handles this denormalization automatically? Or do I really have to create this additional model field and redundantly store the author's PK in there?
EDIT - to point out the essential question: I got several great answers, which work for a lot of scenarios. However, they don't solve this actual problem. The only thing I'd like to know, is if it's possible to have PostgreSQL handle such a denormalization automatically - e.g. by creating some sort of database index.
Sometimes, this query can serve better:
book.author.total_readers = Reader.objects.filter(book__in=Book.objects.filter(author=book.author)).count()
That will generate a query with a sub-query, which sometimes performs better than a query with a join. You can even go further and run two separate queries by evaluating the inner queryset first:
book_ids = list(Book.objects.filter(author=book.author).values_list('id', flat=True))
book.author.total_readers = Reader.objects.filter(book_id__in=book_ids).count()
That will generate 2 queries: one retrieves the list of all book IDs for that author, and the second counts the readers of books whose ID is in that list. (Without list(), Django embeds the inner queryset as a sub-query and still runs a single query.)
Another good solution may be a batch task that runs, for example, once per hour and counts up all reads, but then the count will not refresh live.
You can also create a Celery task that runs right after a read is created to compute the new value for the author. That way you avoid a long response time, and the delay between creating a read and counting it stays short.
It's always way better to solve bottlenecks of this sort with good design and maybe a little bit of caching rather than duplicating data in the way you suggest. The total_readers field is data you should generate instead of recording.
class User(models.Model):
    username = models.CharField(max_length=30)
    @property
    def total_readers(self):
        # caching_client stands for whatever cache client you use
        cached_value = caching_client.get("readers_" + self.username, None)
        if cached_value is None:
            cached_value = self.readers()
            caching_client.set("readers_" + self.username, cached_value)
        return cached_value
    def readers(self):
        # author is a ForeignKey to User, so compare it to self directly
        return Reader.objects.filter(book__author=self).count()
There are libraries that do the caching via decorators, but I felt it was a pattern you would benefit from seeing spelled out. You can also attach a TTL to the cache entry to ensure that the value can't be wrong for longer than the TTL. You can also regenerate the cached value whenever a Reader object is created.
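To make the TTL idea concrete, here is a minimal, framework-free sketch of a cache with expiry plus the "look up, else compute and store" step. TTLCache and cached_count are hypothetical names invented for this example; in a real project you would more likely use Django's cache framework or a Redis/Memcached client, which handle expiry for you.

```python
import time

class TTLCache:
    """Tiny in-memory cache with per-entry expiry (stand-in for a real cache client)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the example can fake the passage of time
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]    # expired: forget it and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def delete(self, key):
        # Call this when a new Reader is created to force a fresh count.
        self._store.pop(key, None)

def cached_count(cache, key, compute):
    """Return the cached value for key, computing and storing it on a miss."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value)
    return value

# Demo with a fake clock and a fake "expensive" count.
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
calls = []

def expensive_count():
    calls.append(1)
    return 42

first = cached_count(cache, "readers_alice", expensive_count)   # computed
second = cached_count(cache, "readers_alice", expensive_count)  # served from cache
now[0] = 61.0                                                   # move past the TTL
third = cached_count(cache, "readers_alice", expensive_count)   # computed again
```

The expensive function runs twice here: once on the first miss and once after the entry has expired.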
You might actually get some mileage with declaring an m2m and defining through relationships but I have no experience of it.

Django filter on two fields of the same foreign key object

I have a database schema similar to this:
class User(models.Model):
    … (Some fields irrelevant for this query)
class UserNotify(models.Model):
    user = models.ForeignKey(User)
    target = models.ForeignKey(<Some other Model>)
    notification_level = models.PositiveSmallIntegerField(choices=((1, '1'), (2, '2'), (3, '3')))
Now I want to query for all Users that have a UserNotify object for a specific target and at least a specific notification level (e.g. 2).
If I do something like this:
User.objects.filter(usernotify__target=desired_target,
                    usernotify__notification_level__gte=2)
I get all Users that have a UserNotify object for the specified target and at least one UserNotify object with a notification_level greater or equal to 2. These two UserNotify objects, however, do not have to be identical.
I am aware that I can do something like this:
user_ids = UserNotify.objects.filter(target=desired_target,
                                     notification_level__gte=2).values_list('user_id', flat=True)
users = User.objects.filter(id__in=user_ids).distinct()
But this seems a step too much for me and I believe it executes two queries.
Is there a way to solve my problem with a single query?
Actually I don't see how you can run the first query, given that usernotify is not a valid field name for User.
You should start from UserNotify as you did in your second example:
UserNotify.objects.filter(
    target=desired_target,
    notification_level__gte=2
).select_related('user').values('user').distinct()
I've been looking for this behaviour but I've never found a better way than the one you describe (creating a query for user ids and injecting it into a User query). Note this is not bad: if your database supports subqueries, your code should fire only one request, composed of a query and a subquery.
However, if you just need a particular field from the User objects (for example first_name), you may try
qs = (UserNotify.objects
      .filter(target=desired_target, notification_level__gte=2)
      .values_list('user_id', 'user__first_name')
      .order_by('user_id')
      .distinct('user_id')
      )
I am not sure if I understood your question, but:
class User(models.Model):
    … (Some fields irrelevant for this query)
class UserNotify(models.Model):
    user = models.ForeignKey(User, related_name="notifications")
    target = models.ForeignKey(<Some other Model>)
    notification_level = models.PositiveSmallIntegerField(choices=((1, '1'), (2, '2'), (3, '3')))
Then
# prefetch_related (not select_related) is what follows a reverse ForeignKey
users = User.objects.prefetch_related('notifications').filter(notifications__target=desired_target,
    notifications__notification_level__gte=2).distinct()
for user in users:
    notifications = list(user.notifications.all())
I don't have my vagrant box handy now, but I believe this should work.
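For what it's worth, Django's documentation on spanning multi-valued relationships says that conditions given in a single filter() call have to be satisfied by the same related object, while conditions spread over chained filter() calls may be satisfied by different ones. The sqlite3 sketch below shows the two SQL shapes side by side. The table names and rows are hypothetical, and the SQL is a simplified approximation rather than Django's literal output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_user (id INTEGER PRIMARY KEY);
CREATE TABLE app_usernotify (
    id INTEGER PRIMARY KEY,
    user_id INTEGER,
    target_id INTEGER,
    notification_level INTEGER
);
INSERT INTO app_user VALUES (1), (2);
-- user 1: level 3 on target 10 (one row satisfies both conditions)
-- user 2: level 1 on target 10 and level 3 on target 20 (no single row satisfies both)
INSERT INTO app_usernotify VALUES (1, 1, 10, 3), (2, 2, 10, 1), (3, 2, 20, 3);
""")

# Shape of one filter() call with both conditions: a single join, same row.
same_row = [r[0] for r in conn.execute("""
SELECT DISTINCT app_user.id FROM app_user
JOIN app_usernotify n ON n.user_id = app_user.id
WHERE n.target_id = ? AND n.notification_level >= ?
ORDER BY app_user.id
""", (10, 2))]

# Shape of two chained filter() calls: two joins, the rows may differ.
different_rows = [r[0] for r in conn.execute("""
SELECT DISTINCT app_user.id FROM app_user
JOIN app_usernotify n1 ON n1.user_id = app_user.id
JOIN app_usernotify n2 ON n2.user_id = app_user.id
WHERE n1.target_id = ? AND n2.notification_level >= ?
ORDER BY app_user.id
""", (10, 2))]
```

Only the first shape excludes user 2, whose two conditions are met by two different rows.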

Generating a single queryset with filtered summary data across a foreign key?

I have a small Django project to learn with (it's a web UI for the RANCID backup software) and I've run into a problem.
The model for the app defines Devices, and DeviceGroups. Each Device is a member of a group and has a couple of state flags - Enabled, Successful - to indicate if they are operating correctly. Here's the relevant bits.
class DeviceGroup(models.Model):
    group_name = models.CharField(max_length=60, unique=True)
class Device(models.Model):
    hostname = models.CharField(max_length=60, unique=True)
    enabled = models.BooleanField(default=True)
    device_group = models.ForeignKey(DeviceGroup)
    last_was_success = models.BooleanField(default=False, editable=False)
I have a summary table on the front 'dashboard' page, that shows a list of all the groups, and for each group, how many devices are in it. I'd like to also show the number of Active devices, and the number of failing (i.e. Not last_was_success) devices per-group. The plain device count is already available through the ForeignKey field.
This seems like the kind of thing that annotate is for, but not quite. And actually, I'm not sure how I'd do it with raw SQL either. Most likely as three queries and some lookup afterwards, or subqueries.
So - is it possible 'nicely' in Django? Or alternatively, how do you do the joining up again in the Template or View? The object passed into the template is simply:
device_groups = DeviceGroup.objects.order_by('group_name')
currently, and I don't think I can just add extra fields onto the queryset results "manually", can I? i.e. it's not a dict or similar.
I think you must use
device_groups = DeviceGroup.objects.all().order_by('group_name')
or
device_groups = DeviceGroup.objects.filter(condition).order_by('group_name')
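As for the per-group counts themselves, they can be produced by one grouped query using conditional sums, so three separate queries are not needed. Django 2.0 and later expose this through annotate() with the filter argument on aggregates such as Count. The sketch below shows only the SQL shape, via Python's sqlite3 module, with hypothetical table names and sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE devicegroup (id INTEGER PRIMARY KEY, group_name TEXT UNIQUE);
CREATE TABLE device (
    id INTEGER PRIMARY KEY,
    hostname TEXT UNIQUE,
    enabled INTEGER,
    last_was_success INTEGER,
    device_group_id INTEGER
);
INSERT INTO devicegroup VALUES (1, 'core'), (2, 'edge'), (3, 'lab');
INSERT INTO device VALUES
    (1, 'r1', 1, 1, 1),
    (2, 'r2', 1, 0, 1),
    (3, 'r3', 0, 0, 1),
    (4, 's1', 1, 1, 2);
""")

# LEFT JOIN keeps groups with no devices; COALESCE turns their NULL sums into 0.
summary = conn.execute("""
SELECT g.group_name,
       COUNT(d.id) AS total,
       COALESCE(SUM(d.enabled), 0) AS active,
       COALESCE(SUM(CASE WHEN d.last_was_success = 0 THEN 1 ELSE 0 END), 0) AS failing
FROM devicegroup g
LEFT JOIN device d ON d.device_group_id = g.id
GROUP BY g.id, g.group_name
ORDER BY g.group_name
""").fetchall()
```

Each row of summary is (group_name, total, active, failing), which maps directly onto the columns of the dashboard table.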