Django QuerySet access foreign key field directly, without forcing a join - django

Suppose you have a model Entry, with a field "author" pointing to another model Author. Suppose this field can be null.
If I run the following QuerySet:
Entry.objects.filter(author=X)
where X is some value. Suppose that in MySQL I have set up a compound index on Entry covering some other column and author_id. Ideally I'd like the SQL to use just "author_id" on the Entry table, so that it can use the compound index.
It turns out that Entry.objects.filter(author=5) works: no join is done. But if I say author=None, Django does a join with Author and then adds Author.id IS NULL to the WHERE clause. So in this case it can't use the compound index.
Is there a way to tell Django to just check the pk, and not follow the link?
The only way I know is to add an additional .extra(where=['author_id IS NULL']) to the QuerySet, but I was hoping some magic in .filter() would work.
Thanks.
(Sorry I was not clearer earlier about this, and thanks for the answers from lazerscience and Josh).

Does this not work as expected?
Entry.objects.filter(author=X.id)
You can use either a model instance or the model's id in a foreign key filter. I can't check right now whether this executes a separate query, though I'd really hope it doesn't.

If you do as you described and do not use select_related(), Django will not perform any join at all, no matter whether you filter on the primary key of the related object or on the related object itself (which doesn't make any difference).
You can try:
print(Entry.objects.filter(author=X).query)

Assuming that the foreign key to Author has the name author_id, (if you didn't specify the name of the foreign key column for ForeignKey field, it should be NAME_id, if you specified the name, then check the model definition / your database schema),
Entry.objects.filter(author_id=value)
should work.

Second Attempt:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#isnull
Maybe you can run a separate query depending on whether X is null or not, using author__isnull for the null case?

Pretty late, but I just ran into this. I'm using Q objects to build up the query, so in my case this worked fine:
~Q(author_id__gt=0)
This generates sql like
NOT ("author_id" > 0 AND "author_id" IS NOT NULL)
You could probably solve the problem in this question by using
Entry.objects.exclude(author_id__gt=0)
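To see at the SQL level why filtering on the foreign key column itself matters, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than MySQL and Django, and the table layout, the status column and the index name entry_status_author are made up for illustration. It only compares a direct author_id IS NULL condition with the join-based shape described in the question.

```python
import sqlite3

# Hypothetical schema: column and index names are illustrative, not from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entry (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL,
        author_id INTEGER REFERENCES author(id)
    );
    CREATE INDEX entry_status_author ON entry (status, author_id);
    INSERT INTO author (id, name) VALUES (1, 'alice');
    INSERT INTO entry (id, status, author_id) VALUES
        (1, 'draft', 1), (2, 'draft', NULL), (3, 'live', NULL);
""")

# Filter on the foreign key column itself: only the entry table is involved.
direct_sql = "SELECT id FROM entry WHERE status = 'draft' AND author_id IS NULL"

# The shape described in the question: join to author, then test author.id.
joined_sql = """
    SELECT entry.id FROM entry
    LEFT OUTER JOIN author ON entry.author_id = author.id
    WHERE entry.status = 'draft' AND author.id IS NULL
"""

direct_rows = conn.execute(direct_sql).fetchall()
joined_rows = conn.execute(joined_sql).fetchall()

# Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
direct_plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + direct_sql)]
joined_plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + joined_sql)]

print(direct_rows, direct_plan)
print(joined_rows, joined_plan)
```

Both queries return the same rows for this data, but the direct one can be answered from the compound index alone. How MySQL plans the joined variant may differ, so treat this as an illustration of the idea rather than a benchmark.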

Related

Django queryset behind the scenes

Difference between creating a foreign key for consistency and for joins
I'm comfortable using ForeignKey and the QuerySet API in Django.
I just want to understand a little more deeply how it works behind the scenes.
The Django manual says:
a database index is automatically created on the ForeignKey. You can
disable this by setting db_index to False. You may want to avoid the
overhead of an index if you are creating a foreign key for consistency
rather than joins, or if you will be creating an alternative index
like a partial or multiple column index.
The part about "creating a foreign key for consistency rather than joins" is what confuses me.
I expected that a query involving a foreign key would use the JOIN keyword, like below.
SELECT
*
FROM
vehicles
INNER JOIN users ON vehicles.car_owner = users.user_id
For example,
from django.db import models

class Place(models.Model):
    name = models.CharField(max_length=50)
    address = models.CharField(max_length=50)

class Comment(models.Model):
    # on_delete is required from Django 2.0 onward
    place = models.ForeignKey(Place, on_delete=models.CASCADE)
    content = models.CharField(max_length=50)
If you use a queryset like Comment.objects.filter(place=1), I expected to see the JOIN keyword in the low-level SQL command.
But when I checked by printing queryset.query in the console, it showed the following.
(I simplified the models above for the sake of explanation. The output below shows all the attributes of my real model; you can ignore the extra ones.)
SELECT
"bfm_comment"."id", "bfm_comment"."content", "bfm_comment"."user_id", "bfm_comment"."place_id", "bfm_comment"."created_at"
FROM "bfm_comment" WHERE "bfm_comment"."place_id" = 1
creating a foreign key for consistency vs creating a foreign key for joins
Simply put, I thought that using any queryset means using the foreign key for joins, because you can easily get the parent table's data with c = Comment.objects.get(id=1) followed by c.place.name. I thought it joined the two tables behind the scenes. But the result of print(queryset.query) didn't show the JOIN keyword; it finds the rows with a WHERE clause instead.
The way I understood from an answer
Case 1:
Comment.objects.filter(place=1)
result
SELECT
"bfm_comment"."id", "bfm_comment"."content", "bfm_comment"."user_id", "bfm_comment"."place_id", "bfm_comment"."created_at"
FROM "bfm_comment"
WHERE "bfm_comment"."place_id" = 1
Case 2:
Comment.objects.filter(place__name="df")
result
SELECT "bfm_comment"."id", "bfm_comment"."content", "bfm_comment"."user_id", "bfm_comment"."place_id", "bfm_comment"."created_at"
FROM "bfm_comment" INNER JOIN "bfm_place" ON ("bfm_comment"."place_id" = "bfm_place"."id")
WHERE "bfm_place"."name" = df
Case 1 searches for rows whose place_id column is 1, using only the Comment table.
But Case 2 needs to know the Place table's 'name' attribute, so it has to use the JOIN keyword to check values in a column of the Place table. Right?
So is it all right to think that I am creating a foreign key for joins if I use a queryset like Case 2, and that it is then better to create an index on the foreign key?
For the above question, I think the answer is in the Django manual:
Consider adding indexes to fields that you frequently query using
filter(), exclude(), order_by(), etc. as indexes may help to speed up
lookups. Note that determining the best indexes is a complex
database-dependent topic that will depend on your particular
application. The overhead of maintaining an index may outweigh any
gains in query speed.
In conclusion, it really depends on how my application works with it.
If you execute the following command the mystery will be revealed
./manage.py sqlmigrate myapp 0001
Take care to replace myapp with your app name (bfm I think) and 0001 with the actual migration where the Comment model is created.
The generated SQL will reveal that the table is actually created with a place_id int column rather than a place column of type Place. That is because the RDBMS doesn't know anything about models; models exist only at the application level. It's the job of the Django ORM to fetch the data from the RDBMS and convert it into model instances. That's why you always get a place member on each of your Comment instances, and that place member in turn gives you access to the members of the related Place instance.
So what happens when you do?
Comment.objects.filter(place=1)
Django is smart enough to know that you are referring to place_id, because 1 is obviously not an instance of Place. If you used a Place instance, the result would be the same. So there is no join here. The above query would definitely benefit from an index on place_id, but it wouldn't benefit from a foreign key constraint. Only the Comment table is queried.
If you want a join, try this:
Comment.objects.filter(place__name='my home')
Queries of this nature, with the __, often result in joins, but sometimes they result in a subquery.
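The difference between the two cases can be checked outside Django as well. The following is a rough sketch using Python's built-in sqlite3 module: the table names follow the bfm_comment and bfm_place names from the question, while the index name comment_place_idx and the sample rows are assumptions made for the illustration. It asks SQLite which tables each query has to touch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bfm_place (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE bfm_comment (
        id INTEGER PRIMARY KEY,
        content TEXT,
        place_id INTEGER NOT NULL REFERENCES bfm_place(id)
    );
    -- Stand-in for the index Django creates on a ForeignKey column by default.
    CREATE INDEX comment_place_idx ON bfm_comment (place_id);
    INSERT INTO bfm_place (id, name, address) VALUES (1, 'my home', 'x'), (2, 'office', 'y');
    INSERT INTO bfm_comment (id, content, place_id) VALUES (1, 'a', 1), (2, 'b', 2), (3, 'c', 1);
""")

def plan_details(sql, params=()):
    """Return the plan detail strings SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

# Case 1: filter on the foreign key column stored in the comment table.
case1 = "SELECT id FROM bfm_comment WHERE place_id = ?"

# Case 2: filter on a column that lives in the place table.
case2 = ("SELECT bfm_comment.id FROM bfm_comment "
         "INNER JOIN bfm_place ON bfm_comment.place_id = bfm_place.id "
         "WHERE bfm_place.name = ?")

rows1 = sorted(conn.execute(case1, (1,)).fetchall())
rows2 = sorted(conn.execute(case2, ("my home",)).fetchall())
plan1 = plan_details(case1, (1,))
plan2 = plan_details(case2, ("my home",))

print(rows1, plan1)
print(rows2, plan2)
```

The plan for Case 1 mentions only bfm_comment, while the plan for Case 2 has to visit bfm_place as well. The exact wording of the plan lines depends on the SQLite version.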
Querysets are lazy.
https://docs.djangoproject.com/en/1.10/topics/db/queries/#querysets-are-lazy
QuerySets are lazy – the act of creating a QuerySet doesn’t involve
any database activity. You can stack filters together all day long,
and Django won’t actually run the query until the QuerySet is
evaluated. Take a look at this example:
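The sketch below is not the example from the Django documentation. It is a toy stand-in, built on Python's sqlite3 module, that mimics the idea of laziness: the class LazyQuery, its filter() signature and the entry table are all hypothetical, and Django's real QuerySet is far more elaborate.

```python
import sqlite3

class LazyQuery:
    """Toy stand-in for a lazy queryset; not Django's implementation."""

    executed = 0  # counts how many times SQL was actually sent to the database

    def __init__(self, conn, table, conditions=(), params=()):
        self.conn = conn
        self.table = table
        self.conditions = tuple(conditions)
        self.params = tuple(params)

    def filter(self, condition, *params):
        # Return a new object with one more condition; nothing is executed here.
        return LazyQuery(self.conn, self.table,
                         self.conditions + (condition,), self.params + params)

    def __iter__(self):
        # Evaluation happens only when the results are iterated over.
        sql = "SELECT id, headline FROM " + self.table
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        LazyQuery.executed += 1
        return iter(self.conn.execute(sql, self.params).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, headline TEXT, rating INTEGER)")
conn.executemany("INSERT INTO entry (headline, rating) VALUES (?, ?)",
                 [("What now", 5), ("What food", 1), ("Other", 5)])

q = LazyQuery(conn, "entry")
q = q.filter("headline LIKE ?", "What%")
q = q.filter("rating >= ?", 3)
queries_before = LazyQuery.executed   # no SQL has been sent yet
results = list(q)                     # the single SELECT runs here
queries_after = LazyQuery.executed
print(queries_before, results, queries_after)
```

Stacking the two filter() calls only builds Python objects; the database is contacted once, when the results are turned into a list.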

Django filter a ForeignKey field when it is null

Let's say I have two tables in Django, TableA and TableB. TableA contains a boolean field, bool, and TableB contains a foreign key field, for_field, pointing to TableA, which can be null.
class TableA(models.Model):
    bool = models.BooleanField()

class TableB(models.Model):
    # on_delete is required from Django 2.0 onward
    for_field = models.ForeignKey('TableA', null=True, on_delete=models.CASCADE)
If I want to filter TableB so as to get all the entries where for_field.bool is True or for_field is null, what is the shortest way to achieve this?
I'm using .filter(Q(for_field__isnull=True) | Q(for_field__bool=True)), but I wonder if there's shorter code for this.
After some experiments, it seems that .exclude(for_field__bool=False) also includes the entries where for_field__isnull=True, and it does not raise any exceptions. You can check this by executing .exclude(for_field__bool=False).filter(for_field__isnull=True) and seeing that it returns some results.
Honestly, I don't know which option is faster, but IMO your variant with two Q objects is much more readable because it shows the logic you really want. So I suggest you stick with it.
I'm pretty sure that your option is the shortest possible (correct me if I'm wrong). That is because you can't do OR queries without Q objects.
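To make the NULL behaviour concrete, here is a hand-written SQL counterpart of the two variants, run through Python's sqlite3 module. The table and column names tablea, tableb and for_field_id are assumed, and the SQL that Django actually emits may differ between versions, so this is only a sketch of the logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (id INTEGER PRIMARY KEY, bool INTEGER NOT NULL);
    CREATE TABLE tableb (
        id INTEGER PRIMARY KEY,
        for_field_id INTEGER REFERENCES tablea(id)
    );
    INSERT INTO tablea (id, bool) VALUES (1, 1), (2, 0);
    INSERT INTO tableb (id, for_field_id) VALUES (10, 1), (11, 2), (12, NULL);
""")

# Hand-written counterpart of Q(for_field__isnull=True) | Q(for_field__bool=True).
or_sql = """
    SELECT b.id FROM tableb AS b
    LEFT OUTER JOIN tablea AS a ON b.for_field_id = a.id
    WHERE b.for_field_id IS NULL OR a.bool = 1
    ORDER BY b.id
"""

# Hand-written counterpart of exclude(for_field__bool=False).
exclude_sql = """
    SELECT b.id FROM tableb AS b
    LEFT OUTER JOIN tablea AS a ON b.for_field_id = a.id
    WHERE NOT (a.bool = 0 AND a.bool IS NOT NULL)
    ORDER BY b.id
"""

or_ids = [row[0] for row in conn.execute(or_sql)]
exclude_ids = [row[0] for row in conn.execute(exclude_sql)]
print(or_ids, exclude_ids)
```

Row 12 has no related TableA row, so every a.* column is NULL after the LEFT OUTER JOIN. The extra IS NOT NULL test inside the NOT is what keeps that row in the exclude variant.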

Django annotate a field value to queryset

I want to attach a field value (id) to a QS like below, but Django throws a 'str' object has no attribute 'lookup' error.
Book.objects.all().annotate(some_id='somerelation__id')
It seems I can get my id value using Sum()
Book.objects.all().annotate(something=Sum('somerelation__id'))
I'm wondering whether there is a way to simply annotate raw field values onto a QS. Using Sum() in this case doesn't feel right.
There are at least three methods of accessing related objects in a queryset.
1. Using Django's double underscore join syntax:
If you just want to use the field of a related object as a condition in your SQL query, you can refer to the field field on the related object related_object with related_object__field. All possible lookup types are listed in the Django documentation under Field lookups.
Book.objects.filter(related_object__field=True)
2. Using annotate with F():
You can populate an annotated field in a queryset by referring to the field with the F() object. F() represents the field of a model or an annotated field.
Book.objects.annotate(added_field=F("related_object__field"))
3. Accessing object attributes:
Once the queryset is evaluated, you can access related objects through attributes on that object.
book = Book.objects.get(pk=1)
author = book.author.name # just one author, or…
authors = book.author_set.values("name") # several authors
This triggers an additional query unless you're making use of select_related().
My advice is to go with solution #2, as you're already halfway down that road and I think it'll give you exactly what you're asking for. The problem you're facing right now is that you did not specify a lookup type; instead you're passing a string (somerelation__id) that Django doesn't know what to do with.
Also, the Django documentation on annotate() is pretty straightforward. You should look into that (again).
You have <somerelation>_id "by default", for example comment.user_id. It works because User has many Comments. But if Book has many Authors, what is author_id supposed to be in this case?

Django: Equivalent of "select [column name] from [tablename]"

I wanted to know whether there is anything equivalent to:
select columnname from tablename
Like Django tutorial says:
Entry.objects.filter(condition)
fetches all the objects with the given condition. It is like:
select * from Entry where condition
But I want to make a list of only one column [which in my case is a foreign key]. Found that:
Entry.objects.values_list('column_name', flat=True).filter(condition)
does the same. But in my case the column is a foreign key, and this query loses the property of a foreign key. It's just storing the values. I am not able to make the look-up calls.
Of course, values and values_list will retrieve the raw values from the database. Django can't work its "magic" on a model which means you don't get to traverse relationships because you're stuck with the id the foreign key is pointing towards, rather than the ForeignKey field.
If you need to filter on those values, you could do the following (assuming column_name is a ForeignKey pointing to MyModel):
ids = Entry.objects.values_list('column_name', flat=True).filter(...)
my_models = MyModel.objects.filter(pk__in=set(ids))
Here's the documentation for values_list()
To restrict a queryset to specific column(s), use .values(columname)
You should probably also add distinct() to the end, so your query will end up being:
Entry.objects.filter(myfilter).values(columname).distinct()
See: https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.values
for more information
Depending on your answer in the comment, I'll come back and edit.
Edit:
I'm not certain that this approach is the right one, though. You can get all of your objects in a Python list by getting a normal queryset via filter and then doing:
myobjectlist = [obj.mycolumnname for obj in myqueryset]
The only problem with that approach is if your queryset is large your memory use is going to be equally large.
Anyway, I'm still not certain on some of the specifics of the problem.
You have a model A with a foreign key to another model B, and you want to select the Bs which are referred to by some A. Is that right? If so, the query you want is just:
B.objects.filter(a__isnull = False)
If you have conditions on the corresponding A, then the query can be:
B.objects.filter(a__field1 = value1, a__field2 = value2, ...)
See Django's backwards relation documentation for an explanation of why this works, and the ForeignKey.related_name option if you want to change the name of the backwards relation.

QuerySet using only() to fetch reference with no deferring

Consider the following:
objs1 = MyModel.objects.filter(field1='1').only('foreign_key1','field2')
objs2 = MyModel.objects.filter(field1='2').only('foreign_key1','field2')
for o1 in objs1:
    matches = [o2 for o2 in objs2 if o1.foreign_key1==o2.foreign_key1]
    print(len(matches))
only() makes all the other fields deferred. However, AFAICT, although I requested that foreign_key1 not be deferred, it is, and the list comprehension takes a very long time because the db is hit twice per iteration.
I also tried foreign_key1__id in the querysets, but it didn't help. How can I avoid deferring the foreign key while still using only()?
It turns out the problem was not in only() at all. As far as I can tell, only() doesn't fetch related models, even if you give it the foreign key field. To fetch related models you need select_related(). Furthermore, notice that if null=True for the ForeignKey, you also need to give select_related() the specific foreign key field, as follows:
.select_related('foreign_key1')
The following is enough as well in my case:
.select_related('foreign_key1__id')
try this:
o1.foreign_key1_id==o2.foreign_key1_id
it should help.
only('xxxx_id') makes no sense.
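Building on the foreign_key1_id comparison, one possible way to avoid the nested loop altogether is to count the raw ids once. This is a sketch of a different technique from the one in the question, not something the answers propose: the Row namedtuple is a hypothetical stand-in for model instances, and only the foreign_key1_id attribute is assumed to exist on the real objects.

```python
from collections import Counter, namedtuple

# Stand-in for model instances; only the raw foreign key id is needed.
Row = namedtuple("Row", ["pk", "foreign_key1_id"])

objs1 = [Row(1, 10), Row(2, 11), Row(3, None)]
objs2 = [Row(4, 10), Row(5, 10), Row(6, 12), Row(7, None)]

# Count the objs2 rows per foreign key id in a single pass.
counts = Counter(o2.foreign_key1_id for o2 in objs2)

# Look up each objs1 row's id; a missing id counts as zero matches.
match_counts = {o1.pk: counts[o1.foreign_key1_id] for o1 in objs1}
print(match_counts)
```

Like the original == comparison on ids, this treats two rows with a null foreign key as matching each other. Because it reads only the id attribute, it never needs the related objects to be loaded.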