Equivalence of Django queryset filtering

I've been using the following two interchangeably in a Django project:
Comments.objects.filter(writer=self.request.user).latest('id')
and
Comments.objects.get(writer=self.request.user)
Are they equivalent in practice? The docs don't seem to explicitly address this.

They are not equivalent, and how much they differ depends on the particular model. We do not know the details of your Comments model, but assuming the writer field is not unique:
For the first statement:
Comments.objects.filter(writer=self.request.user).latest('id')
In essence, this returns the object with the largest id among all comments by that writer. If you look at django.db.connections['default'].queries (recorded only when DEBUG is True), you will see that the resulting query is a SELECT ... ORDER BY ... LIMIT ... statement. If no comment matches, latest() raises DoesNotExist.
For the second statement:
Comments.objects.get(writer=self.request.user)
This returns the single record for that writer. If there is more than one, you get a MultipleObjectsReturned exception; if no object is found, you get a DoesNotExist exception. If writer were a unique field, or there happened to be exactly one matching object, the resulting query would be a plain SELECT ... WHERE statement with no ORDER BY, which is simpler and usually faster.
As for the documentation, the Options.get_latest_by reference explains more about the purpose of latest(). Think of it as a convenience that Django provides. Still, it is important to understand how Django evaluates queries and what SQL they produce, since there are usually several ways to express the same query.
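To make the difference between the two query shapes concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Django. The comments table, its columns, and the helper names latest_by_id and get_exactly_one are hypothetical; the SQL only approximates what Django generates, and get_exactly_one mimics the contract of get() (exactly one row, otherwise an error) rather than reproducing Django's code.

```python
import sqlite3

# Hypothetical table standing in for the Comments model; writer is not unique.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, writer TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO comments (writer, body) VALUES (?, ?)",
    [("alice", "first"), ("bob", "hello"), ("alice", "second")],
)


def latest_by_id(writer):
    # Roughly the shape of filter(writer=...).latest('id'):
    # sort descending on id and keep only the first row.
    row = conn.execute(
        "SELECT id, body FROM comments WHERE writer = ? ORDER BY id DESC LIMIT 1",
        (writer,),
    ).fetchone()
    if row is None:
        raise LookupError("DoesNotExist")
    return row


def get_exactly_one(writer):
    # Mimics the contract of get(writer=...): plain WHERE, no ORDER BY,
    # and anything other than exactly one matching row is an error.
    rows = conn.execute(
        "SELECT id, body FROM comments WHERE writer = ?", (writer,)
    ).fetchall()
    if not rows:
        raise LookupError("DoesNotExist")
    if len(rows) > 1:
        raise LookupError("MultipleObjectsReturned")
    return rows[0]


print(latest_by_id("alice"))   # (3, 'second')
print(get_exactly_one("bob"))  # (2, 'hello')
try:
    get_exactly_one("alice")   # alice has two comments
except LookupError as exc:
    print(exc)                 # MultipleObjectsReturned
```

With a single matching comment (bob) both helpers return the same row; with two (alice) only the get-style lookup fails.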

I don't know why you would think these are equivalent, or why the docs should address this specifically.
If you have only one matching Comment, then yes, both give the same result, but the first version gets there with a more complex query that adds a sort on id.
If you have more than one Comment for that writer, as seems likely, the second version will always raise a MultipleObjectsReturned error.

filter gives all rows matching your filter.
latest gives the most recent row (here, the one with the highest id value).
For example:
Comments.objects.filter(writer=self.request.user).latest('id') first selects all Comments written by self.request.user, and latest then picks the newest of them.
get is meant to fetch exactly one row, so:
Comments.objects.get(writer=self.request.user) will give the comment written by self.request.user. There must be exactly one. If a user can write many comments, then you have to use filter (or maybe all). It depends on exactly what you want.

Related

What's wrong with my Django queryset? I get different answers when accessing the same index

I found that objects could be duplicated in a queryset. However, when I access each of the objects (doing nothing with them), the result changes and seems to be right.
Here are the commands I have typed into the shell
First I got a queryset ordered by the field 'receiveTime'. Then it seemed that ds[1996] was equal to ds[1997]. Then I tried this loop:
for d in ds:
    pass
After that, ds[1996] is no longer equal to ds[1997]. What have I done?
Maybe this is a feature of lazy evaluation?
Update 1: I have just reproduced it. I did not insert or delete anything in the meantime.
These are the commands I just typed into the shell.
Update 2: I looked at the raw SQL queries issued when I call ds[0] and ds[1], which I have shown in picture 2. The SQL queries look correct, but the results seem wrong. Could the reason be that two objects have the same receiveTime value, so their relative order is undefined?
Here are the raw SQL queries:
Replace order_by("receive_time") with order_by("receive_time", "id"). PostgreSQL uses quicksort, which is not a stable sort. When you order only by receive_time, rows with equal values are not guaranteed to come back in the same order.
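Why iterating seemed to "fix" it: on an unevaluated QuerySet, ds[n] runs its own query with LIMIT 1 OFFSET n, so ds[1996] and ds[1997] are two separate queries, and with ties in the ordering nothing forces both queries to break them the same way. After the for loop the queryset has cached the results of a single query, so indexing reads from that one list. The sketch below uses Python's built-in sqlite3 module (not PostgreSQL or Django) to show the fix: with id as a tie-breaker the ordering is total, so per-index queries agree with one big query. The messages table and the row_at helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, receive_time INTEGER)")
# Three rows share receive_time = 5, so ORDER BY receive_time alone has ties.
conn.executemany(
    "INSERT INTO messages (receive_time) VALUES (?)",
    [(5,), (1,), (5,), (3,), (5,)],
)


def row_at(index):
    # One query per index, like ds[index] on an unevaluated queryset.
    # The id tie-breaker makes the order total and therefore repeatable.
    return conn.execute(
        "SELECT id, receive_time FROM messages "
        "ORDER BY receive_time, id LIMIT 1 OFFSET ?",
        (index,),
    ).fetchone()


by_index = [row_at(i) for i in range(5)]
print(by_index)  # [(2, 1), (4, 3), (1, 5), (3, 5), (5, 5)]

# Fetching everything in one query yields the same sequence.
all_at_once = conn.execute(
    "SELECT id, receive_time FROM messages ORDER BY receive_time, id"
).fetchall()
print(by_index == all_at_once)  # True
```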
Don't post code or logs in images. Ever.

Will Django use previously-evaluated results when applying additional filters to a query set?

Let's say I need to do some work both on a set of model objects, as well as a subset of the first set:
things = Thing.objects.filter(active=True)
for thing in things:  # (1)
    pass  # do something with each `thing`
special_things = things.filter(special=True)
for thing in special_things:  # (2)
    pass  # do something more with these things
My understanding is that at point (1) in the code above, an actual SQL query, something like SELECT * FROM things_table WHERE active=1, is executed against the database. The QuerySet documentation also says:
When a QuerySet is evaluated, it typically caches its results.
Now my question is, what happens at point (2) in the example Python code above?
Will Django execute a second SQL query, something like SELECT * FROM things_table WHERE active=1 AND special=1?
Or will it reuse the cached results from earlier, doing something like the more efficient filter(lambda d: d.special == True, things) behind the scenes and avoiding a needless second trip to the database?
Either way, is the current behavior guaranteed (by the documentation or otherwise), or should I not rely on it? This is not only a matter of optimization: it could also change the program's logic if the database table is modified by another thread or process between the two potential queries.
It will execute a second SQL query. filter creates a new queryset, which doesn't copy the results cache.
As for guarantees: the docs specify that filter returns a new queryset object, and I think you can be confident that the new queryset won't have cached results yet. As further support, the "when are querysets evaluated" docs suggest using .all() to get a new queryset if you want to pick up possibly changed results:
If the data in the database might have changed since a QuerySet was evaluated, you can get updated results for the same query by calling all() on a previously evaluated QuerySet.
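To illustrate that filter() hands back a fresh, uncached object, here is a toy model in plain Python. ToyQuerySet is hypothetical and far simpler than Django's QuerySet; it only reproduces the two behaviours discussed above: results are cached on first iteration, and chaining filter() builds a new object whose cache starts empty, so iterating it costs another simulated query.

```python
class ToyQuerySet:
    """Hypothetical lazy, caching query object (not Django's implementation)."""

    def __init__(self, table, conditions=(), query_log=None):
        self._table = table                    # list of dicts standing in for rows
        self._conditions = tuple(conditions)   # (field, value) pairs, ANDed together
        self._cache = None                     # filled on first iteration
        # Shared list recording each simulated trip to the database.
        self.query_log = query_log if query_log is not None else []

    def filter(self, **conditions):
        # Return a NEW object with the extra conditions; the cache is not copied.
        return ToyQuerySet(
            self._table, self._conditions + tuple(conditions.items()), self.query_log
        )

    def __iter__(self):
        if self._cache is None:
            self.query_log.append(self._conditions)  # one simulated query
            self._cache = [
                row for row in self._table
                if all(row.get(field) == value for field, value in self._conditions)
            ]
        return iter(self._cache)


table = [
    {"name": "a", "active": True, "special": True},
    {"name": "b", "active": True, "special": False},
    {"name": "c", "active": False, "special": True},
]

things = ToyQuerySet(table).filter(active=True)
for thing in things:          # (1) first simulated query
    pass
for thing in things:          # served from the cache, no new query
    pass
special_things = things.filter(special=True)
for thing in special_things:  # (2) second simulated query
    pass

print(len(things.query_log))                # 2
print([t["name"] for t in special_things])  # ['a']
```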

Django filter vs exclude

Is there a difference between filter and exclude in Django? If I have
self.get_query_set().filter(modelField=x)
and I want to add another criterion, is there a meaningful difference between the following two lines of code?
self.get_query_set().filter(user__isnull=False, modelField=x)
self.get_query_set().filter(modelField=x).exclude(user__isnull=True)
Is one considered better practice, or are they the same in both function and performance?
Both are lazily evaluated, so I would expect them to perform equivalently. The SQL is likely different, but with no practical difference.
It depends on what you want to achieve. With boolean values it is easy to switch between .exclude() and .filter(), but what if, for example, you want all articles except those from March? You can write the query as
Posts.objects.exclude(date__month=3)
With .filter() it would be (though I'm not sure whether this actually works):
Posts.objects.filter(date__month__in=[1,2,4,5,6,7,8,9,10,11,12])
or you would have to use a Q object.
As the function name already suggests, .exclude() is used to exclude records from the result set. For boolean values you can easily invert the condition and use .filter() instead, but for other values this can be trickier.
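As a rough illustration in plain SQL (Python's sqlite3 module, not Django), excluding one month and filtering on an explicit list of the other eleven select the same rows here. The posts table is hypothetical, stores the month as a plain integer for simplicity, and has no NULL months, which keeps the two forms equivalent; the point is how much more awkward the filter-style list is to write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, month INTEGER NOT NULL)")
conn.executemany("INSERT INTO posts (month) VALUES (?)", [(1,), (3,), (3,), (7,), (12,)])

# exclude()-style: say what you do NOT want.
excluded = conn.execute(
    "SELECT id FROM posts WHERE NOT (month = 3) ORDER BY id"
).fetchall()

# filter()-style: spell out every month you DO want.
wanted = [m for m in range(1, 13) if m != 3]
placeholders = ", ".join("?" * len(wanted))  # "?, ?, ..." one per month
filtered = conn.execute(
    f"SELECT id FROM posts WHERE month IN ({placeholders}) ORDER BY id", wanted
).fetchall()

print(excluded)              # [(1,), (4,), (5,)]
print(excluded == filtered)  # True
```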
In general, exclude is the opposite of filter. In this case both examples work the same.
Here:
self.get_query_set().filter(user__isnull=False, modelField=x)
you select entries whose user field is not null and whose modelField has value x.
In this case:
self.get_query_set().filter(modelField=x).exclude(user__isnull=True)
you first select entries whose modelField has value x (whether or not user is null), then exclude the entries whose user field is null.
I think the first option is better in this case because it reads more cleanly, but both work the same.
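Here is roughly what the two versions boil down to, written as plain SQL run through Python's sqlite3 module. The entries table and the user_id/model_field column names are hypothetical, and the SQL Django actually emits for each queryset may be worded differently; the sketch only shows that, for a simple nullable column like this, both predicates pick the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id INTEGER PRIMARY KEY, user_id INTEGER, model_field TEXT)"
)
conn.executemany(
    "INSERT INTO entries (user_id, model_field) VALUES (?, ?)",
    [(1, "x"), (None, "x"), (2, "y"), (3, "x"), (None, "y")],
)

# filter(user__isnull=False, modelField=x): both conditions in one WHERE.
one_filter = conn.execute(
    "SELECT id FROM entries WHERE user_id IS NOT NULL AND model_field = ? ORDER BY id",
    ("x",),
).fetchall()

# filter(modelField=x).exclude(user__isnull=True): narrow first, then negate.
filter_then_exclude = conn.execute(
    "SELECT id FROM entries WHERE model_field = ? AND NOT (user_id IS NULL) ORDER BY id",
    ("x",),
).fetchall()

print(one_filter)                         # [(1,), (4,)]
print(one_filter == filter_then_exclude)  # True
```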

How to limit columns returned by Django query?

That seems simple enough, but all Django queries seem to be 'SELECT *'.
How do I build a query returning only a subset of fields?
From Django 1.1 onwards, you can use defer('col1', 'col2') to exclude columns from the query, or only('col1', 'col2') to fetch only a specific set of columns. See the documentation.
values does something slightly different - it only gets the columns you specify, but it returns a list of dictionaries rather than a set of model instances.
Append a .values("column1", "column2", ...) to your query
The accepted answer advises defer and only, which the docs discourage in most cases:
Only use defer() when you cannot, at queryset load time, determine if you will need the extra fields or not. If you are frequently loading and using a particular subset of your data, the best choice you can make is to normalize your models and put the non-loaded data into a separate model (and database table). If the columns must stay in the one table for some reason, create a model with Meta.managed = False (see the managed attribute documentation) containing just the fields you normally need to load and use that where you might otherwise call defer(). This makes your code more explicit to the reader, is slightly faster and consumes a little less memory in the Python process.
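At the SQL level, all of these approaches come down to listing columns instead of using SELECT *. The sketch below uses Python's sqlite3 module with a hypothetical records table to show a column-limited query and, as a rough analogue of values(), turning each row into a dict keyed by column name. It illustrates the idea only; it is not what Django executes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by column name
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO records (title, body) VALUES (?, ?)",
    [("first", "a long body"), ("second", "another long body")],
)

# Only the listed columns are read; body is never fetched.
rows = conn.execute("SELECT id, title FROM records ORDER BY id").fetchall()

# Similar in spirit to .values("id", "title"): one dict per row.
as_dicts = [dict(row) for row in rows]
print(as_dicts)  # [{'id': 1, 'title': 'first'}, {'id': 2, 'title': 'second'}]
```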

Django database query - return the most recent three objects

This can't be hard, but... I just need to get the three most recent objects added to my database table.
So, query with reverse ID ordering, maximum three objects.
Been fiddling round with
Records.objects.order_by(-id)[:3]
Records.objects.all[:3]
and including an if clause to check whether there are actually three objects:
num_maps = Records.objects.count()
if (num_maps > 3): # etc...
and using reverse() and filter() for a while...
But just can't figure it out! Nothing I do gives the right result and using num_maps feels pretty inelegant. Not getting much joy from the documentation. Can anyone help?!
All you should need is:
Records.objects.all().order_by('-id')[:3]
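You don't strictly need all() before order_by, since the manager exposes order_by directly (Records.objects.order_by('-id')[:3] works too), but the argument you pass to order_by must be a string: order_by(-id) is not valid, because -id is evaluated as Python (negating the built-in id function) rather than naming a field. Likewise, all has to be called as all() before it can be sliced. There is no need to check whether there are actually 3 records before running this query, because [:3] will not break if there are fewer than 3.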
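The query above amounts to ORDER BY id DESC LIMIT 3. The sketch below runs that SQL through Python's sqlite3 module on a hypothetical records table (most_recent is an illustrative helper, not part of Django) to show both claims: it returns the three highest ids, and with fewer than three rows it simply returns what exists instead of failing. Ordinary Python slicing is just as forgiving.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")


def most_recent(n=3):
    # Same idea as Records.objects.order_by('-id')[:n].
    return conn.execute(
        "SELECT id, name FROM records ORDER BY id DESC LIMIT ?", (n,)
    ).fetchall()


conn.executemany("INSERT INTO records (name) VALUES (?)", [("a",), ("b",)])
print(most_recent())  # [(2, 'b'), (1, 'a')]  fewer than three rows, no error

conn.executemany("INSERT INTO records (name) VALUES (?)", [("c",), ("d",), ("e",)])
print(most_recent())  # [(5, 'e'), (4, 'd'), (3, 'c')]

print([1, 2][:3])     # [1, 2]  plain slicing past the end does not raise
```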