Query on database - django

In my project, I face situations where I need to query the same model several times in the same view (a Django model in this case, as I am using Django and PostgreSQL).
The first approach may be to filter the same model several times.
Another approach may be to query the model once, fetch all the data into a local variable, and then filter that variable several times.
Which approach is more efficient, i.e. faster, and which one should I go with?
Let's say I have a model named People, and I can take the following two approaches:
(1)
active_peoples = People.objects.filter(active=True)
lazy_peoples = People.objects.filter(lazy=True)
inactive_peoples = People.objects.filter(active=False)
good_peoples = People.objects.filter(good=True)
bad_peoples = People.objects.filter(good=False)
(2)
peoples = People.objects.all()
active_peoples = peoples.filter(active=True)
lazy_peoples = peoples.filter(lazy=True)
inactive_peoples = peoples.filter(active=False)
good_peoples = peoples.filter(good=True)
bad_peoples = peoples.filter(good=False)
Which approach is faster?

I think it totally depends on your dataset and your code. Django provides good filtering methods which can filter your data efficiently and with little overhead.
First test case:
Suppose you have a small dataset. Hitting the database several times and fetching data may take more time than fetching it once, storing it in one variable, and iterating over that. In this case you are better off storing the data in one variable.
Second test case:
Suppose you have a large dataset. Here, fetching the data with Django filters each time may take less time than fetching it once, storing it in a variable, and then iterating over that variable, even with a low-complexity algorithm.
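Note, though, that QuerySets are lazy: in approach (2), each peoples.filter(...) still builds and runs its own SQL query, and assigning People.objects.all() to a variable does not by itself fetch anything. To genuinely hit the database once and filter in Python, you would have to evaluate the queryset first. A minimal sketch, assuming the People model from the question:

peoples = list(People.objects.all())  # single query; rows fully loaded into memory
active_peoples = [p for p in peoples if p.active]
inactive_peoples = [p for p in peoples if not p.active]
lazy_peoples = [p for p in peoples if p.lazy]
good_peoples = [p for p in peoples if p.good]
bad_peoples = [p for p in peoples if not p.good]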

Related

PERFORMANCE: Calling a postgres db multiple times to get filtered queries vs querying all objects and filtering in my Django view

I'm working on a Django project and it has a PostgreSQL db.
I'm querying the model multiple times to filter the results:
latest = Product.objects.all().order_by('-update_date')[:4]
best_rate = Product.objects.all().order_by('rating')[:2]
expensive = Product.objects.all().order_by('-price')[:3]
But I wonder if it's better for performance and resource consumption to just do one query, get all the objects from the database, and do the filtering inside my Django view.
all = Product.objects.all()
# Do some filtering here iterating over variable all
Which of these do you think would be the best approach? Or do you have a better option?
The second way you suggested will be better; I guess you can do it this way:
products_all = Product.objects.all()
latest = products_all.order_by('-update_date')[:4]
best_rate = products_all.order_by('rating')[:2]
expensive = products_all.order_by('-price')[:3]
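One caveat: querysets are lazy, so as written each order_by(...)[:n] on products_all still executes its own SQL query. If the goal really is a single database hit, you would evaluate once and sort in Python. A minimal sketch, assuming the Product model and fields from the question:

products = list(Product.objects.all())  # one query for all rows
latest = sorted(products, key=lambda p: p.update_date, reverse=True)[:4]
best_rate = sorted(products, key=lambda p: p.rating)[:2]
expensive = sorted(products, key=lambda p: p.price, reverse=True)[:3]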
The point of database tech is to allow your program to work with very large datasets. Your requirement is to retrieve the newest / best / cheapest items from your list of products.
Should you do this with one query operation or three? That depends on how many products you will have in your list when you're running at scale. If you know you will never have more than, say, 100 products, retrieve the data with one query and filter in your Django program. But if you will eventually have thousands of products, use three separate filters.
You don't want your application to take more and more RAM as it scales up.

Django table or Dict: performance?

I have multiple small key/value tables in Django whose values never change,
i.e.: 1 -> "Active", 2 -> "Down", 3 -> "Running", ...
and many times I do a get by id, other times by name.
So I'm asking: wouldn't it be more optimal to move them all into a dict (global or in models)?
Thank you.
Generally Django querysets are slower than dicts, so if you have a model with one field holding these statuses (active, down, running), it's generally better to use a dict until you need editability.
That said, I don't really understand this kind of question: the performance benefit is not high until you have ~10k+ records in a single queryset, and even then you can cast the whole model to a list using the .values_list syntax. Execution will take roughly a fraction of a second.
Also, if I understand correctly, these values should in any case live in a models.CharField with choices set, rather than be set up by fixture behind a models.ForeignKey.
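For illustration, a minimal sketch of that suggestion (the Machine model name here is made up):

from django.db import models

class Machine(models.Model):
    STATUS_CHOICES = [
        ("active", "Active"),
        ("down", "Down"),
        ("running", "Running"),
    ]
    # the status lives on the row itself; no lookup table or ForeignKey needed
    status = models.CharField(max_length=10, choices=STATUS_CHOICES)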

Optimising API queries using JSONField()

Initial opening: I am utilising postgresql JSONFields.
I have the following attribute (field) in my User model:
class User(AbstractUser):
    ...
    benefits = JSONField(default=dict)  # callable default, so instances don't share one dict
    ...
Currently, I essentially serialize benefits for each User on the front end with DRF:
benefits = UserBenefit.objects.filter(user=self)
serializer = UserBenefitSerializer(benefits, many=True)
As the underlying returned benefits change little and slowly, I thought about "caching" the JSON in the database every time there is a change, to improve the performance of the UserBenefit.objects.filter(user=user) queryset. The lookup would instead become user.benefits, hopefully lightening the DB load across 100K+ users.
1st Q:
Should I do this?
2nd Q:
Is there an efficient way to write the corresponding serializer.data <class 'rest_framework.utils.serializer_helpers.ReturnList'> to the JSON field?
I am currently using:
data = serializers.serialize("json", UserBenefit.objects.filter(user=self))
For your first question:
It's not a bad idea if you don't want to use caching alternatives.
If you have to query the database because of some changes or ... and you can't cache the whole request, then saving a JSON object can be a pretty good idea. This way you only retrieve the data, skip most of the serializing work, and also eliminate the need to query a pivot table for the m2m data. But note that this adds a whole bunch of extra data to your rows, and unless you need it most of the time, you will be fetching data you don't really need. You can mitigate that with the values function on querysets, but it still requires more coding. Basically, you are trading processing power for more bandwidth on your first query and more storage for the data. Also, pagination over benefits will be really hard to achieve if you need it at some point.
Getting m2m relation data is usually pretty fast, depending on the amount of data in your database, but the ultimate way to get better performance is caching the requests and reducing database hits as much as possible.
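For instance, a minimal sketch of request-level caching with Django's low-level cache API (the get_user_benefits helper and the cache key are made up):

from django.core.cache import cache

def get_user_benefits(user):
    key = f"user_benefits_{user.pk}"  # hypothetical per-user cache key
    data = cache.get(key)
    if data is None:
        # cache miss: hit the database and serialize once
        qs = UserBenefit.objects.filter(user=user)
        data = UserBenefitSerializer(qs, many=True).data
        cache.set(key, data, 60 * 15)  # refreshed at most every 15 minutes
    return data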
And as you probably hear a lot: you should test and benchmark to see which option works best for you, depending on your requirements and limitations. It's really hard to suggest an optimization method without knowing the whole scope and the current solution.
And for your second question:
I think I don't really get it. If you are storing a JSON object as a field on the User model, then why do you need data = serializers.serialize("json", UserBenefit.objects.filter(user=self))?
You don't need it since the serializer can just return the JSON field data.
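For the write side, one way it could look: whenever the benefits change, re-serialize and assign the plain Python data, which a JSONField accepts directly. This is only a sketch; the refresh_user_benefits helper name is made up:

def refresh_user_benefits(user):
    # serializer.data is already plain lists/dicts, so it can be assigned
    # straight to the JSONField without serializers.serialize()
    benefits = UserBenefit.objects.filter(user=user)
    user.benefits = UserBenefitSerializer(benefits, many=True).data
    user.save(update_fields=["benefits"])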

Which is a more efficient method, using a list comprehension or django's 'values_list' function?

When attempting to return a list of values from django objects, will performance be better using a list comprehension:
[x.value for x in Model.objects.all()]
or calling list() on django's values_list function:
list(Model.objects.values_list('value', flat=True))
and why?
The most efficient way is the second approach (using values_list()), because it modifies the SQL query sent to the database so that only the requested values are selected.
The first approach first selects all columns from the database and only then extracts the values in Python, so you have already spent the resources to fetch everything.
You can compare the generated queries by wrapping your queryset with str(queryset.query), which returns the actual SQL that gets executed.
See example below
class Model(models.Model):
    foo = models.CharField(max_length=100)
    bar = models.CharField(max_length=100)
str(Model.objects.all().query)
# SELECT "model"."id", "model"."foo", "model"."bar" FROM "model"
str(Model.objects.values_list("foo").query)
# SELECT "model"."foo" FROM "model"
I had also somewhat assumed the argument in the currently-accepted answer would be correct: namely, that fetching fewer fields would lead to Model.objects.values_list('foo') executing in less time than Model.objects.all(). However, I didn't find this in practice when using %timeit.
I actually found that Model.objects.values_list('foo', flat=True) would take ~2-10x longer than plain Model.objects.all(). I found this was the case for:
an empty django table
a table with 10s of rows
a table with millions of rows
Including or removing flat=True seemed to make no significant difference in execution time for values_list. I would be interested in what others find as well.
So, from a pure "what SQL is executed" point of view, although the values_list ORM query fetches fewer field values from the db, I imagine there is additional logic within the Django source code of .all() vs .values_list() which could lead to different execution times (including .all() taking less time).
However, to fully address the initial example code, we would also need to factor in any effect on execution time from using a list comprehension in the .all() case vs list() in the .values_list() case. The general discussion of list() vs a list comprehension is covered in other questions already.
TL;DR: So I imagine it is a trade-off between those two factors:
the apparent difference in execution time between .values_list() and .all() (which, from my tests, suggests we can't simply deduce that fetching fewer fields leads to faster execution; more investigation of the underlying Django source code is needed to find the cause)
any differences between using a list comprehension and list()
In my test cases, I generally found the .all() query itself was actually faster than the .values_list() query, but when also factoring in the transformation to a list, the .values_list() scenario would take less time overall. So it may well depend on the scenario...
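If you want to reproduce this kind of comparison yourself, here is a minimal timing sketch along the lines of the question's two snippets (assuming the Model with a value field from the question, run inside a Django shell):

import timeit

# list comprehension over full model instances
t_all = timeit.timeit(lambda: [x.value for x in Model.objects.all()], number=100)
# list() over a values_list queryset
t_vl = timeit.timeit(lambda: list(Model.objects.values_list("value", flat=True)), number=100)

print(f"comprehension over .all(): {t_all:.3f}s  list(values_list): {t_vl:.3f}s")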

Selecting a random row in Django, quickly

I have a view which returns data associated with a randomly-chosen row from one of my models. I'm aware of order_by('?') and its performance problems, and I want to avoid using order_by('?') in my view.
Because the data in my model changes very rarely (if at all), I'm considering the approach of caching the entire model in memory between requests. I know how many records I'm dealing with, and I'm comfortable taking the memory hit. If the model does change somehow, I could regenerate the cache at that moment.
Is my strategy reasonable? If so, how do I implement it? If not, how can I quickly select a random row from a model that changes very rarely if at all?
If you know the ids of your objects and their range, you can pick a random id and then query the database.
A better approach might be to keep the number of objects in your cache, and simply retrieve a random one when you need it:
import random

item_number = random.randint(0, MODEL_COUNT - 1)  # randint needs both bounds, inclusive
MyModel.objects.all()[item_number]
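Combining both suggestions, here is a sketch that keeps the ids around with Django's cache framework and picks one at random, which also works when ids are not contiguous (the mymodel_ids cache key is made up):

import random

from django.core.cache import cache

def random_mymodel():
    ids = cache.get("mymodel_ids")
    if ids is None:
        # one query for all primary keys, cached without expiry;
        # regenerate this entry whenever the model's data changes
        ids = list(MyModel.objects.values_list("id", flat=True))
        cache.set("mymodel_ids", ids, timeout=None)
    return MyModel.objects.get(pk=random.choice(ids))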