How to avoid N+1 queries when counting in Django

Consider the following scenario:
I have a User model and an Address model that belongs to a user.
In the user index, I need to show how many addresses each user has along with the user's info, but this generates N+1 queries because every call to count() executes an additional query for that user id.
How can I avoid that? I read about select_related, but I'm trying to go in the reverse direction...
In SQL it could be translated to:
SELECT user.*,
(SELECT count(*) FROM address WHERE address.user_id = user.id) AS address_count
FROM user
Is there a way to get the above SQL with django QuerySet?

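You can annotate each user with the number of addresses. You haven't shown your models, but assuming Address has a ForeignKey to User with the default reverse name, you can add the following to your queryset:
.annotate(address_count=Count('address'))
For example (I'm guessing this is what you want):
from django.db.models import Count
User.objects.all().annotate(address_count=Count('address'))
This provides an address_count attribute on each result.
See the Django documentation for Count.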
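To make the shape of that single query concrete, here is a minimal sketch that uses Python's built-in sqlite3 module instead of the Django ORM. The table and column names (app_user, app_address) are assumptions for illustration, not the asker's real schema. A Count annotation over a reverse relation typically becomes a LEFT OUTER JOIN with GROUP BY, roughly like this:

```python
import sqlite3

# In-memory database with hypothetical tables standing in for the
# User and Address models (names are assumptions, not the real schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE app_address (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES app_user(id),
        street TEXT
    );
    INSERT INTO app_user (id, name) VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO app_address (user_id, street) VALUES
        (1, 'A St'), (1, 'B St'), (2, 'C St');
""")

# One query for all users. The LEFT OUTER JOIN keeps users without
# addresses, and COUNT(app_address.id) skips the NULLs the join
# produces for them, so those users get a count of 0.
rows = conn.execute("""
    SELECT app_user.id, app_user.name, COUNT(app_address.id) AS address_count
    FROM app_user
    LEFT OUTER JOIN app_address ON app_address.user_id = app_user.id
    GROUP BY app_user.id, app_user.name
    ORDER BY app_user.id
""").fetchall()

for user_id, name, address_count in rows:
    print(user_id, name, address_count)
```

The counts arrive with the user rows, so no per-user count query is needed afterwards.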

Related

django setting filter field with a variable

I show a model of sales that can be aggregated by different fields through a form. Products, clients, categories, etc.
view_by_choice = filter_opts.cleaned_data["view_by_choice"]
sales = sales.values(view_by_choice).annotate(........).order_by(......)
In the same form I have a string input where the user can filter the results. By "product code" for example.
input_code = filter_opts.cleaned_data["filter_code"]
sales = sales.filter(prod_code__icontains=input_code)
What I want to do is filter the queryset "sales" by the input_code, defining the field dynamically from the view_by_choice variable.
Something like:
sales = sales.filter(VARIABLE__icontains=input_code)
Is it possible to do this? Thanks in advance.
You can make use of dictionary unpacking [PEP-448] here:
sales = sales.filter(
**{'{}__icontains'.format(view_by_choice): input_code}
)
Given that view_by_choice contains, for example, 'foo', we first build a dictionary { 'foo__icontains': input_code }, and then we unpack it into named parameters with the two consecutive asterisks (**).
That being said, I strongly advise you to validate view_by_choice: ensure that the set of valid options is limited. Otherwise a user might inject malicious field names, lookups, etc. to extract data from your database that should remain hidden.
For example, if your model has a ForeignKey named owner to the User model, he/she could use owner__email, and thus start trying to find out which emails are in the database by generating a large number of queries and each time looking at what values the query returned.
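One way to do that validation is to check the field name against a fixed whitelist before building the keyword arguments. The following is only a sketch: the helper name build_icontains_filter and the field names in ALLOWED_FILTER_FIELDS are made up for illustration and would need to match your own model.

```python
# Hypothetical whitelist: the only fields a user may filter on.
ALLOWED_FILTER_FIELDS = frozenset({"prod_code", "client_code", "category_code"})


def build_icontains_filter(field_name, value, allowed=ALLOWED_FILTER_FIELDS):
    """Return keyword arguments for QuerySet.filter().

    Raises ValueError for any field name that is not whitelisted, so
    spanning lookups such as 'owner__email' never reach the ORM.
    """
    if field_name not in allowed:
        raise ValueError("filtering by {!r} is not allowed".format(field_name))
    return {"{}__icontains".format(field_name): value}


# In the view this would be used as:
#     sales = sales.filter(**build_icontains_filter(view_by_choice, input_code))
kwargs = build_icontains_filter("prod_code", "abc")
print(kwargs)
```

If view_by_choice comes from a form ChoiceField, the form already restricts it to the declared choices; the explicit check simply keeps the rule next to the code that builds the lookup.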

Django queryset behind the scenes

Difference between creating a foreign key for consistency and for joins
I am comfortable using ForeignKey and the QuerySet API in Django.
I just want to understand a little more deeply how they work behind the scenes.
The Django documentation says:
a database index is automatically created on the ForeignKey. You can
disable this by setting db_index to False. You may want to avoid the
overhead of an index if you are creating a foreign key for consistency
rather than joins, or if you will be creating an alternative index
like a partial or multiple column index.
The part about "creating a foreign key for consistency rather than joins" is what confuses me.
I expected a query through a foreign key to use the JOIN keyword, like below.
SELECT
*
FROM
vehicles
INNER JOIN users ON vehicles.car_owner = users.user_id
For example,
from django.db import models
class Place(models.Model):
    name = models.CharField(max_length=50)
    address = models.CharField(max_length=50)
class Comment(models.Model):
    place = models.ForeignKey(Place)
    content = models.CharField(max_length=50)
If you use a queryset like Comment.objects.filter(place=1), I expected to see the JOIN keyword in the underlying SQL.
But when I checked by printing queryset.query in the console, it showed the following.
(I simplified the models above just to explain. The output below shows all the attributes in my real model; you can ignore the extra ones.)
SELECT
"bfm_comment"."id", "bfm_comment"."content", "bfm_comment"."user_id", "bfm_comment"."place_id", "bfm_comment"."created_at"
FROM "bfm_comment" WHERE "bfm_comment"."place_id" = 1
creating a foreign key for consistency vs creating a foreign key for joins
Simply put, I thought that using any queryset meant using the foreign key for joins, because you can easily get the parent table's data with c = Comment.objects.get(id=1) followed by c.place.name. I thought it joined the two tables behind the scenes. But the result of print(queryset.query) didn't show a JOIN keyword; it looked the rows up with WHERE instead.
What I understood from an answer:
Case 1:
Comment.objects.filter(place=1)
result
SELECT
"bfm_comment"."id", "bfm_comment"."content", "bfm_comment"."user_id", "bfm_comment"."place_id", "bfm_comment"."created_at"
FROM "bfm_comment"
WHERE "bfm_comment"."place_id" = 1
Case 2:
Comment.objects.filter(place__name="df")
result
SELECT "bfm_comment"."id", "bfm_comment"."content", "bfm_comment"."user_id", "bfm_comment"."place_id", "bfm_comment"."created_at"
FROM "bfm_comment" INNER JOIN "bfm_place" ON ("bfm_comment"."place_id" = "bfm_place"."id")
WHERE "bfm_place"."name" = df
Case 1 searches only the Comment table for rows whose place_id column is 1.
In Case 2, the query needs the name attribute of the Place table, so it has to use JOIN to check values in a column of that table. Right?
So is it correct to say that I am creating a foreign key for joins if I use querysets like Case 2, and that it is then better to have an index on the foreign key?
For the question above, I think the answer is in the Django documentation:
Consider adding indexes to fields that you frequently query using
filter(), exclude(), order_by(), etc. as indexes may help to speed up
lookups. Note that determining the best indexes is a complex
database-dependent topic that will depend on your particular
application. The overhead of maintaining an index may outweigh any
gains in query speed
In conclusion, it really depends on how my application works with the data.
If you execute the following command the mystery will be revealed
./manage.py sqlmigrate myapp 0001
Take care to replace myapp with your app name (bfm I think) and 0001 with the actual migration where the Comment model is created.
The generated SQL will show that the table is actually created with a place_id integer column rather than a place column of type Place. That is because the RDBMS doesn't know anything about models; models exist only at the application level. It is the job of the Django ORM to fetch the data from the RDBMS and convert it into model instances. That's why you always get a place attribute on each of your Comment instances, and that attribute in turn gives you access to the attributes of the related Place instance.
So what happens when you do?
Comment.objects.filter(place=1)
Django is smart enough to know that you are referring to a place_id because 1 is obviously not an instance of a Place. But if you used a Place instance the result would be the same. So there is no join here. The above query would definitely benefit from having an index on the place_id, but it wouldn't benefit from having a foreign key constraint!! Only the Comment table is queried.
If you want a join, try this:
Comment.objects.filter(place__name='my home')
Queries of this nature, with the __ lookup separator, often result in joins, but sometimes they result in a subquery.
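The difference between the two filters can be reproduced outside Django. The sketch below uses Python's sqlite3 module with a simplified version of the bfm_place and bfm_comment tables; the index name and the sample rows are invented for illustration. The first query only touches the comment table, while the second has to join because the name lives in the place table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bfm_place (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bfm_comment (
        id INTEGER PRIMARY KEY,
        content TEXT,
        -- the REFERENCES clause is the "consistency" part of a foreign key
        place_id INTEGER NOT NULL REFERENCES bfm_place(id)
    );
    -- stands in for the index Django adds by default (name is made up)
    CREATE INDEX bfm_comment_place_idx ON bfm_comment (place_id);
    INSERT INTO bfm_place (id, name) VALUES (1, 'my home'), (2, 'office');
    INSERT INTO bfm_comment (content, place_id) VALUES
        ('cosy', 1), ('quiet', 1), ('busy', 2);
""")

# Like filter(place=1): place_id is stored in bfm_comment, so no join.
by_id = conn.execute(
    "SELECT content FROM bfm_comment WHERE place_id = ? ORDER BY id", (1,)
).fetchall()

# Like filter(place__name='my home'): the name is in bfm_place, so join.
by_name = conn.execute(
    """
    SELECT c.content
    FROM bfm_comment AS c
    INNER JOIN bfm_place AS p ON c.place_id = p.id
    WHERE p.name = ?
    ORDER BY c.id
    """,
    ("my home",),
).fetchall()

print(by_id)
print(by_name)
```

Both queries return the same comments here; only the second one needs the place table at all.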
Querysets are lazy.
https://docs.djangoproject.com/en/1.10/topics/db/queries/#querysets-are-lazy
QuerySets are lazy – the act of creating a QuerySet doesn’t involve
any database activity. You can stack filters together all day long,
and Django won’t actually run the query until the QuerySet is
evaluated. Take a look at this example:

Filtering in django admin using address bar

Suppose I have a model Order, which has a column num -- an order number. Now I want to filter several rows from this model in admin view. Having 1 value, I do:
http://bla-bla-bla/admin/app/order/?num__exact=11534
How can I do this when I have several values?
Or should I use queryset()? How should I then send a list of values in the request?
The in lookup should work; try this in the URL:
http://bla-bla-bla/admin/app/order/?num__in=11534,11535,11536
Don't forget that whatever you put in the query string has to be allowed for the admin interface. You can't put in filters that weren't defined there - ever since this security release https://www.djangoproject.com/weblog/2010/dec/22/security/
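If the list of order numbers comes from code rather than being typed by hand, the query string can be assembled with the standard library. This is just a sketch; the host and path are the placeholders from the question.

```python
from urllib.parse import urlencode

# Placeholder admin changelist URL, copied from the question.
base_url = "http://bla-bla-bla/admin/app/order/"
order_numbers = [11534, 11535, 11536]

# The in lookup takes a comma-separated list. safe="," keeps the commas
# readable instead of percent-encoding them as %2C.
query = urlencode({"num__in": ",".join(map(str, order_numbers))}, safe=",")
url = base_url + "?" + query
print(url)
```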

How can I get GROUP BY to work in the Django ORM where I want all fields and the object

I need to group all entries by user and get the count doing something like this:
from django.db import models
from django.db.models import Count
class Promotion(models.Model):
    pass
class Entry(models.Model):
    user = models.ForeignKey('User')
    promotion = models.ForeignKey('Promotion')
def get_uniques(promotion_id):
    promotion = Promotion.objects.get(pk=promotion_id)
    entries = promotion.entry_set.annotate(Count('user'))
    return entries
However, it returns the same user multiple times. I've also tried the following after looking around StackOverflow, and they seem to do something other than what I want:
promotion.entry_set.annotate(Count('user')).order_by('user')[:10]
promotion.entry_set.all().values('user').annotate(entry_count=Count('user')).order_by()
Entry.objects.filter(promotion=promotion).annotate(Count('user')).order_by('user')
Basically I'm trying to do this, giving me an Entry object for each user:
Entry.objects.raw("""
SELECT *
FROM promotion_entry
WHERE promotion_id = %s
GROUP BY user_id""", (promotion_id,))
Then I'll perform a second query to get the entry count, which is still not ideal. Can I do a GROUP BY without raw?
There seems to be a ticket on the bug tracker that would let me do what I want in the future by enabling DISTINCT ON: https://code.djangoproject.com/ticket/6422
If you want to count entries for each user use:
promotion.entry_set.all().values('user').annotate(entry_count=Count('id')).order_by()
If you want some entry for each user use:
promotion.entry_set.all().values('user').annotate(entry_id=Max('id')).order_by()
This will give you the ids of the entries; use __in to get the objects themselves.
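The two steps can be sketched in plain SQL with Python's sqlite3 module, outside the ORM. The promotion_entry table name comes from the question's raw query; the sample rows are invented, and combining the count and the maximum id in a single grouped query is a choice made for this sketch rather than what the two ORM calls above do separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE promotion_entry (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        promotion_id INTEGER NOT NULL
    );
    INSERT INTO promotion_entry (id, user_id, promotion_id) VALUES
        (1, 10, 1), (2, 10, 1), (3, 11, 1), (4, 10, 2);
""")

promotion_id = 1

# Step 1, the grouped query: one row per user with the entry count
# and the id of one representative entry (the latest one).
grouped = conn.execute(
    """
    SELECT user_id, COUNT(id) AS entry_count, MAX(id) AS entry_id
    FROM promotion_entry
    WHERE promotion_id = ?
    GROUP BY user_id
    ORDER BY user_id
    """,
    (promotion_id,),
).fetchall()

# Step 2, the equivalent of filter(id__in=...): load the full rows.
entry_ids = [entry_id for _, _, entry_id in grouped]
placeholders = ",".join("?" * len(entry_ids))
entries = conn.execute(
    "SELECT id, user_id, promotion_id FROM promotion_entry "
    "WHERE id IN ({}) ORDER BY id".format(placeholders),
    entry_ids,
).fetchall()

print(grouped)
print(entries)
```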

Lots of queries from django foreignkey fields

I've been drooling over Django all day while coding up an internal website in record time, but now I'm noticing that something is very inefficient with my ForeignKeys in the model.
I have a model which has 6 ForeignKeys, which are basically lookup tables. When I query all objects and display them in a template, it's running about 10 queries per item. Here's some code, which ought to explain it better:
class Website(models.Model):
    domain_name = models.CharField(max_length=100)
    registrant = models.ForeignKey('Registrant')
    account = models.ForeignKey('Account')
    registrar = models.ForeignKey('Registrar')
    server = models.ForeignKey('Server', related_name='server')
    host = models.ForeignKey('Host')
    target_server = models.ForeignKey('Server', related_name='target')
class Registrant(models.Model):
    name = models.CharField(max_length=100)
...and 5 more simple tables. There are 155 Website records, and in the view I'm using:
Website.objects.all()
It ends up executing 1544 queries. In the template, I'm using all of the foreign fields, as in:
<span class="value">Registrant:</span> {{ website.registrant.name }}<br />
So I know it's going to run a lot of queries...but it seems like this is excessive. Is this normal? Should I not be doing it this way?
I'm pretty new to Django, so hopefully I'm just doing something stupid. It's definitely a pretty amazing framework.
You should use the select_related method, e.g.
Website.objects.select_related()
so that Django performs a join and follows the foreign keys when the query is executed, instead of loading them on demand as they are used. Called without arguments, select_related() only follows non-null foreign keys, so pass the field names explicitly if some of them are nullable. Django loads related objects lazily, so by default you get the following behavior:
# one database query
website = Website.objects.get(id=123)
# first time account is referenced, so another query
print(website.account.username)
# account has already been loaded, so no new query
print(website.account.email_address)
# first time registrar is referenced, so another query
print(website.registrar.name)
and so on. If you use select_related, a join is performed behind the scenes and all of these foreign keys are followed and loaded in the first query, so only one database query is performed. So in the above example, you'd get:
# one database query with a join and all foreign keys followed
website = Website.objects.select_related().get(id=123)
# no additional query is needed because the data is already loaded
print(website.account.username)
print(website.account.email_address)
print(website.registrar.name)
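The effect on the query count can be imitated without Django. The sketch below uses Python's sqlite3 module with a cut-down website and registrant schema; the CountingConnection helper and the sample rows are invented for illustration. It compares per-row lookups with one joined query, which is the idea behind select_related.

```python
import sqlite3


class CountingConnection:
    """Tiny wrapper that counts executed queries (a stand-in for a query log)."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def run(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


raw = sqlite3.connect(":memory:")
raw.executescript("""
    CREATE TABLE registrant (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE website (
        id INTEGER PRIMARY KEY,
        domain_name TEXT,
        registrant_id INTEGER NOT NULL REFERENCES registrant(id)
    );
    INSERT INTO registrant (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO website (domain_name, registrant_id) VALUES
        ('a.example', 1), ('b.example', 2), ('c.example', 1);
""")

# Lazy loading: one query for the list, then one more per website.
db = CountingConnection(raw)
lazy = []
for domain, registrant_id in db.run(
    "SELECT domain_name, registrant_id FROM website ORDER BY id"
):
    rows = db.run("SELECT name FROM registrant WHERE id = ?", (registrant_id,))
    lazy.append((domain, rows[0][0]))
lazy_queries = db.queries

# Joining up front: everything arrives in a single query.
db = CountingConnection(raw)
joined = db.run(
    """
    SELECT w.domain_name, r.name
    FROM website AS w
    INNER JOIN registrant AS r ON w.registrant_id = r.id
    ORDER BY w.id
    """
)
joined_queries = db.queries

print(lazy_queries, joined_queries)
```

With three websites and one relation, the lazy version runs four queries and the joined version runs one; with 155 rows and several foreign keys the gap grows accordingly.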