Does iterator provide improvement when used together with values_list? - django

Recently I saw code which used iterator() and values_list() together. Does it make sense to use them both? Will it improve speed or memory usage?
Sample code:
Customer.objects.values_list("pk", flat=True).iterator()

values_list() returns a QuerySet that yields tuples when iterated over, or a flat list of single values when called with flat=True [docs].
QuerySets are lazy – the act of creating a QuerySet doesn’t involve any database activity. You can stack filters together all day long, and Django won’t actually run the query until the QuerySet is evaluated [docs].
You can check the documentation on when querysets are evaluated to see this in action.
In fact, merely creating a queryset doesn't hit the database; Django only runs the query once the queryset is evaluated (for example, by iterator()).
Since iterator() reads the results without caching them at the QuerySet level, it gives better performance and lower memory usage in situations where you only need to access the objects once, and this benefit is independent of how the queryset was built (values_list() or otherwise).
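To illustrate what the combination buys you, here is a minimal sketch (it assumes the Customer model from the snippet above; the app path and process() are placeholders): values_list() reduces per-row object construction, while iterator() avoids holding the whole result set in memory.

from myapp.models import Customer  # hypothetical app path

# values_list("pk", flat=True) skips building full model instances and
# yields plain primary-key values; iterator() streams them in chunks
# instead of caching the whole result set on the queryset.
for pk in Customer.objects.values_list("pk", flat=True).iterator(chunk_size=2000):
    process(pk)  # process() is a placeholder for your own per-row work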

Related

In Django, why does queryset's iterator() method reduce memory usage?

In Django, I can't understand why queryset's iterator() method reduces memory usage.
The Django documentation says:
A QuerySet typically caches its results internally so that repeated evaluations do not result in additional queries. In contrast, iterator() will read results directly, without doing any caching at the QuerySet level (internally, the default iterator calls iterator() and caches the return value). For a QuerySet which returns a large number of objects that you only need to access once, this can result in better performance and a significant reduction in memory.
To my knowledge, however, whether or not the iterator() method is used, the queried rows are fetched from the database at once and loaded into memory after evaluation. Isn't the memory used proportional to the number of rows either way, whether the queryset caches or not? Then what is the benefit of using the iterator() method, assuming the queryset is only evaluated once?
Is it because the raw data fetched from the database and the data cached after instantiation are stored in separate memory spaces? If so, I think I can understand that skipping the cache with the iterator() method saves memory.
When using iterator(), Django fetches rows in chunks (using server-side database cursors on backends that support them) rather than loading the full result set up front. If you use something like all() and iterate over it in Python, all of the records are cached in memory even though you only need them one at a time.
So by using iterator() you stream the data as you go, and the remaining records are not all fetched and held in memory at once.
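As a rough illustration (a sketch assuming the same hypothetical Customer model; process() is a placeholder):

from myapp.models import Customer  # hypothetical app path

# Plain evaluation: iterating a queryset loads and caches the full
# result set, so a second loop reuses it without another query.
customers = Customer.objects.all()
for c in customers:   # query runs here; all rows end up in memory
    process(c)
for c in customers:   # no new query, served from the cache
    process(c)

# iterator(): rows are streamed in chunks and not cached, so memory use
# stays roughly constant, but there is no cache to reuse for a second pass.
for c in Customer.objects.all().iterator(chunk_size=2000):
    process(c)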

Is there a faster way to query from NDB using list?

I have a list that I need to query the corresponding information for.
I can do:
for i in list:
    database.query(infomation==i).fetch()
but this is so slow, because for every element in the list it has to go to the database and back, over and over, instead of querying everything at once. Is there a way to speed this process up?
You can use the ndb async operations to speed up your code. Basically you would launch all your queries pretty much in parallel, then process the results as they come in, which would result in potentially much faster overall execution, especially if your list is long. Something along these lines:
futures = []
for i in list:
    futures.append(
        database.query(infomation==i).fetch_async())
for future in futures:
    results = future.get_result()
    # do something with your results
There are more advanced ways of using the async operations described in the mentioned doc which you may find interesting, depending on the structure of your actual code.
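Depending on your data model, you may also be able to replace the loop with a single IN filter. This is only a sketch that keeps the names from the question, and note that the datastore still executes an IN filter as a group of equality sub-queries under the hood:

# Assumes "infomation" is an ndb property of the "database" model class.
results = database.query(infomation.IN(list)).fetch()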

In Django does .get() have better performance than .first()?

The Django implementation of .first() seems to get all items into a list and then return the first one.
Is .get() more performant? Surely the database can just return one item; the implementation of .first() seems suboptimal.
I see no reason to think so, although I have not actually profiled.
Slicing on Django querysets is implemented by modifying the query to use LIMIT and OFFSET terms to retrieve only the necessary number of elements. This means the first() implementation only fetches a single element from the database.
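A minimal sketch of how the two calls behave, assuming a hypothetical Customer model:

from myapp.models import Customer  # hypothetical app path

# .first() slices the queryset ([:1]), which compiles to a query with
# LIMIT 1, and returns the single row or None if nothing matches.
customer = Customer.objects.order_by("pk").first()

# .get() returns exactly one object, raising Customer.DoesNotExist if no
# row matches and MultipleObjectsReturned if more than one does.
try:
    customer = Customer.objects.get(pk=42)
except Customer.DoesNotExist:
    customer = None

The practical difference is mostly in the error-handling semantics, not in how much data ends up being transferred for a single matching row.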

Django queryset reversing ways

I know two ways to reverse ordered queryset:
qs.objects.filter(**cond).order_by('-field_name')
and
qs.objects.filter(**cond).order_by('field_name').reverse()
Is there any appropriate case (apart from templates) to use the second way?
I don't believe there is an appropriate case when you control the order_by() call yourself. Both forms are resolved at the database level: reverse() simply flips the ordering in the generated SQL, so there is no performance difference. reverse() is mainly useful when you receive a queryset that is already ordered (for example by a model's Meta.ordering) and want to flip that ordering without knowing the field names.
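You can confirm that both forms end up in the SQL ORDER BY clause by printing the generated query (a sketch, assuming a hypothetical Entry model with a created field):

from myapp.models import Entry  # hypothetical app/model

a = Entry.objects.order_by("-created")
b = Entry.objects.order_by("created").reverse()

# Both print SQL ending in "ORDER BY ... DESC"; neither reverses rows in Python.
print(a.query)
print(b.query)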

Django Queryset only() and defer()

In the real world, how often do people use QuerySet methods like defer() and only()?
I guess I hadn't really heard much about them, and only recently did I come across these methods.
See the docs: https://docs.djangoproject.com/en/dev/ref/models/querysets/
These methods are mostly of use when optimizing performance of your application.
Generally speaking, if you are not having performance problems, you don't need to optimize. And if you don't need to optimize, you don't need these functions. This is the case with a lot of advanced QuerySet features, such as select_related() or prefetch_related().
As for "how often they are used in the real world", that isn't really an answerable question. They are used when they're needed. If you don't need them, don't use them.
defer() and only() are roughly opposites of each other. Both receive a list of field names. defer() excludes the listed columns from the query; only(), by contrast, restricts the query to just the listed columns.
Both are used in scenarios where:
you want to optimize by avoiding fetching unnecessary columns, or
you are implementing different views in your Python code, e.g. an admin has to be shown X columns, a user has to be shown Y columns, and a visitor has to be shown Z columns (see the sketch below).
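A minimal sketch of both methods, assuming a hypothetical Customer model with name, email, and notes fields:

from myapp.models import Customer  # hypothetical app path

# only(): load just the primary key plus the listed fields.
names = Customer.objects.only("name")

# defer(): load everything except the listed fields.
without_notes = Customer.objects.defer("notes")

# Accessing a deferred field later triggers an extra query per object,
# so these methods only help if you really don't touch those fields.
for customer in names:
    print(customer.name)  # already loaded
    # customer.email would hit the database again here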