I have a 'categories' model which is used more than once on a page. Since I am obtaining all the categories at the start, I want to cut down on database queries by not fetching the same data more than once.
Since the initial query is getting ALL the categories, is there a way to store this information in the model so that when I reference the data again later, I don't have to hit the database again?
Perhaps some kind of associative array or dict which stores the categories?
Any help would be appreciated.
Django querysets are lazy and cached, so the database is not hit until the queryset is accessed. You should also take a look at how queries are evaluated.
If you could post some code, we could help you figure out an optimal way to write queries.
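In the meantime, here is a minimal sketch of how that caching behaves, assuming a Category model:

categories = Category.objects.all()  # lazy: no database query happens here

for category in categories:  # first iteration evaluates the queryset (one query)
    print(category.name)

for category in categories:  # reuses the queryset's cached results (no query)
    print(category.name)

So as long as you build the queryset once and reuse the same queryset object everywhere on the page, the database is only hit once.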
Related
I have multiple small key/value tables in Django, and their values never change,
i.e. 1 -> "Active", 2 -> "Down", 3 -> "Running"...
and many times I do a get by id, and other times by name.
So I'm asking: wouldn't it be more optimal to move them all into a dict (global or in models)?
Thank you.
Generally, Django querysets are slower than dicts, so if you want a model with one field holding these statuses (active, down, running), it's generally better to use a dict until you need editability.
Anyway, I don't entirely get this kind of question: the performance benefit is not really significant until you have ~10k+ records in a single queryset, and even then you can cast the whole model to a list using the .values_list syntax. Execution will take a fraction of a second.
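A minimal sketch of the dict approach (names are hypothetical), covering both the lookup-by-id and lookup-by-name cases you mention:

STATUSES = {1: "Active", 2: "Down", 3: "Running"}
STATUS_IDS = {name: pk for pk, name in STATUSES.items()}  # reverse lookup by name

STATUSES[2]            # -> "Down"
STATUS_IDS["Running"]  # -> 3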
Also, if I understand correctly, these values should be in a models.CharField with choices set, rather than set up by fixture behind a models.ForeignKey.
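A minimal sketch of that choices approach (Host is a hypothetical model name):

from django.db import models

class Host(models.Model):
    STATUS_CHOICES = [
        ("active", "Active"),
        ("down", "Down"),
        ("running", "Running"),
    ]
    status = models.CharField(max_length=10, choices=STATUS_CHOICES, default="active")

# host.get_status_display() returns the human-readable label, e.g. "Active",
# with no extra table and no extra query.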
To open: I am using PostgreSQL JSONField.
I have the following attribute (field) in my User model:
class User(AbstractUser):
    ...
    benefits = JSONField(default=dict)  # pass the callable dict, not dict(), so each row gets its own default
    ...
Currently, I essentially serialize benefits for each User on the front end with DRF:
benefits = UserBenefit.objects.filter(user=self)
serializer = UserBenefitSerializer(benefits, many=True)
As the underlying benefits change little and slowly, I thought about "caching" the JSON in the database every time there is a change, to improve the performance of the UserBenefit.objects.filter(user=user) queryset. The lookup would instead become user.benefits, hopefully lightening DB load across 100K+ users.
1st Q:
Should I do this?
2nd Q:
Is there an efficient way to write the corresponding serializer.data <class 'rest_framework.utils.serializer_helpers.ReturnList'> to the JSON field?
I am currently using:
data = serializers.serialize("json", UserBenefit.objects.filter(user=self))
For your first question:
It's not a bad idea if you don't want to use caching alternatives.
If you have to query the database because of some changes or ... and you can't cache the whole request, then saving a JSON object can be a pretty good idea. This way you only retrieve the data, skip most of the serializing, and eliminate the need to query a pivot table to get the m2m data. But note that you are adding a whole bunch of extra data to your rows, and unless you need it most of the time, you will be fetching data you don't really need (the values function on querysets can help with that, but it requires more coding). Basically, you are trading processing power for more bandwidth on your first query and more storage to hold the data. Also, pagination over your benefits will be really hard to achieve if you need it at some point.
Getting m2m relation data is usually pretty fast, depending on the amount of data in your database, but the ultimate way to get better performance is caching the requests and reducing database hits as much as possible.
And as you probably hear a lot, you should test and benchmark to see which option really works best for you, given your requirements and limitations. It's really hard to suggest an optimization method without knowing the whole scope and the current solution.
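For instance, a minimal sketch of per-user caching with Django's low-level cache API, reusing the UserBenefitSerializer from your question (the key format and timeout are assumptions):

from django.core.cache import cache

def get_user_benefits(user):
    key = f"benefits:{user.pk}"  # assumed key format
    data = cache.get(key)
    if data is None:
        benefits = UserBenefit.objects.filter(user=user)
        data = UserBenefitSerializer(benefits, many=True).data
        cache.set(key, data, timeout=60 * 15)  # cache for 15 minutes
    return data

# Remember to cache.delete(key) whenever a user's benefits change.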
And for your second question:
I don't think I really get it. If you are storing a JSON object in a field on the User model, then why do you need data = serializers.serialize("json", UserBenefit.objects.filter(user=self))?
You don't need it since the serializer can just return the JSON field data.
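If you do go the JSONField route, a minimal sketch of writing the serialized data into it, reusing the names from your question, could be:

def refresh_benefits(user):
    benefits = UserBenefit.objects.filter(user=user)
    # serializer.data is a list of dicts, which the JSONField stores directly
    user.benefits = UserBenefitSerializer(benefits, many=True).data
    user.save(update_fields=["benefits"])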
I know how to write Actions that provide intermediate pages, since the docs are great:
https://docs.djangoproject.com/en/2.0/ref/contrib/admin/actions/#actions-that-provide-intermediate-pages
But if my selection contains 100k rows, the pattern from the docs does not work, since the URL gets too long.
How do you write Django admin actions that provide intermediate pages and can handle 100k+ rows?
I solved it this way:
Pickle the QuerySet
Store the pickled QuerySet in the cache under a random ID
Forward the random ID to the next page
The next page uses the random ID to read the QuerySet back from the cache (a sketch follows below)
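A minimal sketch of that pattern, assuming Django's cache framework (the action, model, and URL names are hypothetical). Pickling queryset.query rather than the queryset itself avoids evaluating 100k rows just to store them:

import pickle
import uuid

from django.core.cache import cache
from django.shortcuts import redirect

def my_action(modeladmin, request, queryset):
    token = uuid.uuid4().hex
    # Pickle only the query state, not the evaluated rows
    cache.set(f"admin-action-{token}", pickle.dumps(queryset.query), timeout=3600)
    return redirect(f"../intermediate-page/?token={token}")

# On the intermediate page, rebuild the queryset from the cached query:
def get_queryset_from_token(token):
    queryset = MyModel.objects.all()
    queryset.query = pickle.loads(cache.get(f"admin-action-{token}"))
    return queryset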
When I needed something similar, I used grouping variables like: all, active, accepted, denied. With this grouping I can run bulk actions on a huge amount of data without creating a Python list with thousands of pks (a sketch follows below).
Another point worth paying attention to: you need to push that work down to the DB, otherwise you will have an enormous bottleneck in the views/models.
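A minimal sketch of the grouping idea (model and field names are hypothetical): the intermediate page receives a group keyword instead of thousands of pks, and the filter runs entirely in the database:

from django.db.models import Q

GROUPS = {
    "all": Q(),
    "active": Q(status="active"),
    "accepted": Q(status="accepted"),
    "denied": Q(status="denied"),
}

def apply_bulk_action(group, **updates):
    # .update() executes a single SQL UPDATE; no pk list is built in Python
    Submission.objects.filter(GROUPS[group]).update(**updates)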
On a Django website of mine, users contribute posts, which are then shown globally on the home page, sorted most-recent first.
I'm introducing redis into this mix by doing an lpush of every post id into a redis list (which is kept trimmed to 1000 entries). The code is:
def add_post(link_id):
    my_server = redis.Redis(connection_pool=POOL)
    my_server.lpush("posts:1000", link_id)
    my_server.ltrim("posts:1000", 0, 999)  # inclusive indices: 0..999 keeps 1000 entries
Then, when a user requests the contents of the home page, I simply execute the following query in the get_queryset method of the relevant class-based view:
Post.objects.filter(id__in=all_posts())
Where all_posts() is simply:
def all_posts():
    my_server = redis.Redis(connection_pool=POOL)
    return my_server.lrange("posts:1000", 0, -1)
Next, I iterate over context["object_list"] in a Django template (i.e. {% for post in object_list %}) and populate the latest posts one by one for my users to see.
My problem is that this arrangement does not show most-recent first; it always shows most-recent last. So I changed lpush to rpush, but the result didn't change at all. Why isn't changing redis's list insert method changing the ordering of the results Django's queryset returns to me?
Perhaps I'm missing something rudimentary. Please advise me on what's going on and how I can fix it (is {% for post in object_list reversed %} my sole option here?). My reason for taking the redis route was, naturally, performance: prior to redis, I would do Post.objects.order_by('-id')[:1000]. Thanks in advance.
Note: please ask for more information if required.
You're iterating through a queryset that doesn't have an order_by clause, which means that you can't have any expectations about the order of the results. The __in clause just controls which rows to return, not their order.
The fact that the returned results are in the id order is an implementation detail. If you want to rely on that, you can just iterate through the queryset in reverse order. A more robust solution would be to reorder (in Python) the instances based on the order of the ids returned from Redis.
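A minimal sketch of that reordering, reusing the all_posts() helper from the question:

ids = [int(pk) for pk in all_posts()]  # redis returns bytes/strings, so convert
posts_by_id = {post.id: post for post in Post.objects.filter(id__in=ids)}
# Rebuild the list in the exact order Redis returned the ids
ordered_posts = [posts_by_id[pk] for pk in ids if pk in posts_by_id]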
All that said, though, I don't think there will be any performance advantage to using Redis here. I think that any relational database with an index on id will be able to execute Post.objects.order_by('-id')[:1000] very efficiently. (Note that slicing a queryset does a LIMIT on the database; you're not fetching all the rows into Python and then slicing a huge list.)
I'm listing queryset results and would like to add an option for choosing the order in which results are displayed.
I would like to pass the actual data from the database to another page for sorting.
I was able to achieve this by getting all the object ids and using the django session to recreate a new queryset based on the order criteria.
I was wondering if there is any other way to achieve this goal?
Thanks.
Assuming you are currently displaying the data as a table, you could give a chance to some JavaScript client-side table sorter such as tablesorter. There are lots of JavaScript table sorters.
I'm away from my development machine right now, but I think you could just pass the list of ids to a new queryset, pk__in=list_of_object_ids, and then use the native order_by function.
For example:
objs = Object.objects.filter(pk__in=list_of_object_ids).order_by('value_to_order_by')
Anyway, that's what I would try first, though I'm sure there are better optimizations.
For example, instead of a list of object ids, you could pass a list of dictionaries, each with a key:value pair holding the value you want to order by.
For example:
[{'obj_id':1,'obj_value':'foo'},{'obj_id':2,'obj_value':'bar'}]
Then use some lambda function to sort it, like here.
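A minimal sketch of that sort, using the structure above:

items = [{'obj_id': 1, 'obj_value': 'foo'}, {'obj_id': 2, 'obj_value': 'bar'}]
# Sort the list of dicts by the value you want to order by
items_sorted = sorted(items, key=lambda item: item['obj_value'])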