Django - data too large for session


I have a view in which I query the database, then put the queryset in the session and use it in other views. It works fine most of the time, but in very rare cases, when the queryset gets very large, it takes a long time and I get a timeout. What I would like to know is whether I am doing the right thing. If not, what is the best practice for this case? What options do I have?

I never store QuerySet data in sessions. You just need to make a list (like [1, 2, 3, 4, 5]) of all the ids you need and store that instead.
The next step is to get the QuerySet back from the list of ids:
data_list = request.session['data_list']
services = Service.objects.filter(id__in=data_list)
Now you have the same QuerySet you had before, but the session never fills up.
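One caveat worth adding: filter(id__in=...) does not guarantee that rows come back in the order of the saved list. Below is a minimal sketch of restoring that order in plain Python. Service here is a hypothetical dataclass standing in for model instances, and the ids themselves could be saved with something like request.session['data_list'] = list(queryset.values_list('id', flat=True)).

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical stand-in for a model instance."""
    id: int
    name: str


def in_session_order(ids, objects):
    """Reorder fetched objects to match the id list saved in the session."""
    by_id = {obj.id: obj for obj in objects}  # QuerySet.in_bulk(ids) returns a similar dict
    return [by_id[i] for i in ids if i in by_id]  # skip ids whose rows were deleted meanwhile


data_list = [3, 1, 2]  # e.g. request.session['data_list']
# Order of rows from filter(id__in=...) is not guaranteed to match data_list
fetched = [Service(1, "a"), Service(2, "b"), Service(3, "c")]
print([s.id for s in in_session_order(data_list, fetched)])  # [3, 1, 2]
```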

Related

Optimising API queries using JSONField()

To begin with: I am using PostgreSQL JSONFields.
I have the following attribute (field) in my User model:
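from django.contrib.auth.models import AbstractUser
from django.contrib.postgres.fields import JSONField  # Django 3.1+: from django.db.models import JSONField

class User(AbstractUser):
    ...
    # Pass the callable dict, not dict(), so every row gets its own empty dict
    benefits = JSONField(default=dict)
    ...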
Currently I serialize the benefits of each User for the front end with DRF:
benefits = UserBenefit.objects.filter(user=self)
serializer = UserBenefitSerializer(benefits, many=True)
As the underlying benefits change rarely and slowly, I thought about "caching" the JSON in the database every time there is a change, to avoid running the UserBenefit.objects.filter(user=user) query. The data would come from user.benefits instead, hopefully lightening the DB load across 100K+ users.
1st Q:
Should I do this?
2nd Q:
Is there an efficient way to write the corresponding serializer.data <class 'rest_framework.utils.serializer_helpers.ReturnList'> to the JSON field?
I am currently using:
data = serializers.serialize("json", UserBenefit.objects.filter(user=self))

For your first question:
It's not a bad idea if you don't want to use caching alternatives.
If you have to query the database anyway (because of changes or other reasons) and you can't cache the whole request, then saving a JSON object can be a pretty good idea. This way you only retrieve the data, skip most of the serialization work, and also eliminate the need to query a pivot table to get the m2m data. But note that you are adding a lot of extra data to your rows: unless you need it most of the time, you will be fetching data you don't really need. You can mitigate that with the values() method on querysets, but it still requires more code. Basically, you are trading more bandwidth for the first query and more storage for less processing power. Also, pagination of your benefits will be really hard to achieve if you need it at some point.
Getting m2m relation data is usually pretty fast, depending on the amount of data in your database, but the ultimate way to get better performance is caching requests and reducing database hits as much as possible.
And as you have probably heard a lot, you should test and benchmark to see which options really work best for you, given your requirements and limitations. It's really hard to suggest an optimization method without knowing the whole scope and the current solution.
And for your second question:
I don't quite follow. If you are storing a JSON object in a field on the User model, why do you need data = serializers.serialize("json", UserBenefit.objects.filter(user=self))?
You don't need it since the serializer can just return the JSON field data.
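As a rough, stdlib-only illustration of that point: the sketch assumes serializer.data behaves like a list of OrderedDicts holding primitive values (which is what DRF's ReturnList generally contains) and uses a json.dumps/json.loads round trip as a stand-in for what a JSONField does on save and load. The benefit fields are made up. It also shows why storing the string returned by serializers.serialize("json", ...) is a problem: the field then holds a JSON string, not a list.

```python
import json
from collections import OrderedDict

# Stand-in for DRF's serializer.data (a ReturnList is a list subclass of dicts);
# the field names and values are hypothetical
serializer_data = [
    OrderedDict([("id", 1), ("name", "Gym"), ("expires", "2024-01-31")]),
    OrderedDict([("id", 2), ("name", "Dental"), ("expires", None)]),
]

# Roughly what a JSONField does with a Python value: dump on save, load on read
stored = json.loads(json.dumps(serializer_data))
print(type(stored).__name__, stored[0]["name"])  # list Gym

# serializers.serialize("json", ...) returns a *string*; a JSONField holding that
# string gives you a str back, i.e. the data ends up double-encoded
already_serialized = json.dumps(serializer_data)
double_encoded = json.loads(json.dumps(already_serialized))
print(type(double_encoded).__name__)  # str
```

In a real model you would then do something like user.benefits = serializer.data followed by user.save(update_fields=["benefits"]).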

django: saving a query set to session

I am trying to save a query result obtained in one view to the session and retrieve it in another view, so I tried something like this:
def default(request):
    equipment_list = Equipment.objects.all()
    request.session['export_querset'] = equipment_list
However, this gives me
TypeError at /calbase/
<QuerySet [<Equipment: A>, <Equipment: B>, <Equipment: C>]> is not JSON serializable
I am wondering what this means exactly and how I should go about it. Or is there an alternative way of doing what I want besides using the session?

If this is what you are saving:
equipment_list = Equipment.objects.all()
You shouldn't need to use sessions at all. Why? Because this is a simple query without any filtering, so equipment_list would be the same for all users. It can quite easily be saved in the cache:
from django.core.cache import cache

equipment_list = cache.get('equipment_list')
# cache.get returns None on a miss; "is None" avoids re-querying when the cached list is empty
if equipment_list is None:
    equipment_list = Equipment.objects.all()
    # Pickling the QuerySet for the cache evaluates it, so the rows themselves are stored
    cache.set('equipment_list', equipment_list)
Note that a queryset can be saved in the cache without it having to be converted to values first.
Update:
One of the other answers mentions that querysets are not JSON serializable. That matters when you pass one off as a JSON response, or store it in a session with Django's default JSON session serializer. It doesn't matter when you cache it, because django.core.cache uses pickling rather than JSON serialization.
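A quick stdlib way to see the difference between the two formats: the made-up rows below contain a date and a set, which pickle round-trips unchanged but json.dumps rejects. This only illustrates pickle versus json; it does not exercise Django's cache or session code.

```python
import datetime
import json
import pickle

# Made-up row data: a date and a set are fine for pickle but not for json
rows = [{"name": "A", "calibrated": datetime.date(2016, 7, 1), "tags": {"lab"}}]

# pickle (used by django.core.cache) round-trips arbitrary Python objects
restored = pickle.loads(pickle.dumps(rows))
print(restored == rows)  # True

# json only understands dicts, lists, strings, numbers, booleans and None
try:
    json.dumps(rows)
    json_ok = True
except TypeError as exc:
    json_ok = False
    print("json failed:", exc)
```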

'e4c5' raises a perfectly valid concern. From the limited code we can see, it makes no sense to put the results of that query into the session, unless of course you have other plans that we can't see here. I'll set this aside and assume you ABSOLUTELY MUST save the query results in the session.
With this assumption, you must understand that the queryset instance Django gives you is a Python object. You can pass it around WITHIN your Django application without any hassle. However, whenever you send such an object over the wire to another data store or application (in your case, saving it into the session, which means sending the data to your configured session store), it must be serializable to some format that:
your application knows how to serialize objects into, and
the data store at the other end knows how to deserialize. In this case, the format is JSON (the resulting JSON string can simply be stored as-is).
The problem is, the queryset instance not only contains the rows returned from the table, it also contains a bunch of other attributes and meta attributes which come in handy to you when you use the Django ORM API. When you try to send the queryset instance over the wire to your session store, the system knows no better and tries to serialize all these attributes into JSON. This fails because there are attributes within the queryset that are not serializable into JSON.
As for a solution, if you must save data into the session, simply calling objects.all().values(), wrapping it in list() and saving that, as some people have suggested, may not always work. A simple case is when your table returns datetime objects: datetime objects are not JSON serializable by default.
So what should you do? You need some sort of serializer that accepts a queryset, safely iterates over the returned rows converting each native Python value into a JSON-safe equivalent, and returns the result. For datetime.datetime objects, you would call obj.isoformat() to turn them into ISO 8601 datetime strings.
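A minimal sketch of such a serializer using only the standard library. SessionSafeEncoder is a hypothetical name, and the rows are a made-up stand-in for list(Equipment.objects.values(...)). Django also ships django.core.serializers.json.DjangoJSONEncoder, which handles datetimes, decimals and UUIDs in a similar way.

```python
import datetime
import decimal
import json


class SessionSafeEncoder(json.JSONEncoder):
    """Hypothetical encoder: turns non-JSON types found in .values() rows into strings."""

    def default(self, obj):
        # datetime.datetime is a subclass of datetime.date, so both are covered
        if isinstance(obj, (datetime.datetime, datetime.date, datetime.time)):
            return obj.isoformat()
        if isinstance(obj, decimal.Decimal):
            return str(obj)  # keep full precision instead of converting to float
        return super().default(obj)  # anything else still raises TypeError


# Stand-in for list(Equipment.objects.values("id", "name", "calibrated_at"))
rows = [
    {"id": 1, "name": "A", "calibrated_at": datetime.datetime(2016, 7, 1, 9, 30)},
    {"id": 2, "name": "B", "calibrated_at": None},
]

payload = json.dumps(rows, cls=SessionSafeEncoder)
print(payload)
```

The resulting string (or json.loads(payload), if you want plain lists and dicts) is safe to put in a JSON-serialized session; the datetimes come back as ISO strings, so parse them again where you need datetime objects.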

You cannot save a QuerySet instance in the session because, as you said, they're not JSON serializable. Read this for more information.
To save your queryset, you can use the values or values_list methods to get the fields you want, cast the result to a list, and then save the list into the session (most of the time, saving only the PKs does the job though).
So basically:
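import json
import redis

r = redis.Redis()  # assumes a reachable Redis server with default settings
qset = Model.objects.values_list("pk", "field_one", "field_two")  # still a QuerySet, not JSON serializable
cache_results = list(qset)  # a plain list of tuples, which json.dumps can handle
# Now cache cache_results however you want, e.g. in Redis for 10 minutes (redis-py 3.x argument order)
r.setex("cached:user_id:querytype", 10 * 60, json.dumps(cache_results))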
It can also help to store this values_list result in a shape that allows faster lookups; a dictionary keyed by pk might be a good choice.
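A sketch of the dictionary idea, with made-up tuples standing in for list(qset). One thing to watch: JSON object keys are always strings, so integer pks come back as "1", "2" after json.dumps/json.loads (the same happens in a JSON-serialized session).

```python
import json

# Stand-in for list(Model.objects.values_list("pk", "field_one", "field_two"))
cache_results = [(1, "A", 10), (2, "B", 20)]

# Key by pk for direct lookups instead of scanning the list
by_pk = {pk: {"field_one": one, "field_two": two} for pk, one, two in cache_results}

# After a JSON round trip the int keys have become strings
restored = json.loads(json.dumps(by_pk))
print(sorted(restored))  # ['1', '2']

# Convert the keys back if you want to look up with int pks
restored_by_pk = {int(pk): row for pk, row in restored.items()}
print(restored_by_pk[2]["field_one"])  # B
```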

Saving querysets in Django sessions requires them to be serialized, and that is what causes the error. One easy way to carry the queryset across views via the session is to save a list of the ids of the Equipment objects (or of any other field that serves as the model's primary key), like:
equipments = [equipment.id for equipment in Equipment.objects.all()]
request.session['export_querset'] = equipments
And then whenever you need the Equipments, traverse this list and get the corresponding Equipment.
equipments = [Equipment.objects.get(id=id) for id in request.session['export_querset']]
Note: This method is inefficient because it runs one query per id, so it is not recommended for large querysets (Equipment.objects.filter(id__in=...) fetches them in a single query); for small querysets it can be used without worries.

Django pagination random: order_by('?')

I am loving Django, and liking its implemented pagination functionality. However, I encounter issues when attempting to split a randomly ordered queryset across multiple pages.
For example, I have 100 elements in a queryset and wish to display them 25 at a time. When I provide the context object as a randomly ordered queryset (with .order_by('?')), the queryset is re-evaluated with a completely new random order each time a new page is requested (page 2, 3, 4).
Explicitly stated: how do I (or can I) request a single queryset, randomly ordered, and display it across digestible pages?

I ran into the same problem recently where I didn't want to have to cache all the results.
What I did to resolve this was a combination of .extra() and raw().
This is what it looks like:
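# Render the queryset (plus an extra random() sort column) to SQL.
# Caveat: str(query) does not quote parameters, so it is not always valid SQL.
raw_sql = str(queryset.extra(select={'sort_key': 'random()'})
              .order_by('sort_key').query)
# PostgreSQL's setseed() expects a value between -1 and 1; float() also keeps
# arbitrary strings out of the SQL.
set_seed = "SELECT setseed(%s);" % float(random_seed)
queryset = self.model.objects.raw(set_seed + raw_sql)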
I believe this will only work for postgres. Doing a similar thing in MySQL is probably simpler since you can pass the seed directly to RAND(123).
The seed can be stored in the session/a cookie/your frontend in the case of ajax calls.
Warning - There is a better way
This is actually a very slow operation. I found this blog post, which describes a very good method for retrieving both a single result and sets of results.
In this case the seed will be used in your local random number generator.

I think this answer will be useful to you: How to have a "random" order on a set of objects with paging in Django?
Basically, he suggests caching the list of objects and referring to it with a session variable, so the order is maintained between pages (using Django pagination).
Or you could randomize the list manually and store a seed, so the randomization stays the same for the same user!
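A minimal sketch of the seed approach in plain Python, assuming you keep a list of pks (for example list(qs.values_list("pk", flat=True))) and a per-user seed in the session. page_of_shuffled is a hypothetical helper; random.Random(seed) gives a private generator, so the same seed always produces the same order and consecutive pages never overlap.

```python
import random


def page_of_shuffled(ids, seed, page, per_page):
    """Return one page of ids in an order that is random but fixed for a given seed."""
    shuffled = list(ids)  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # private RNG: same seed -> same order
    start = (page - 1) * per_page
    return shuffled[start:start + per_page]


# In a view the seed might come from something like:
# seed = request.session.setdefault("random_seed", random.randrange(2**32))
ids = list(range(1, 101))  # stand-in for list(qs.values_list("pk", flat=True))
seed = 12345
pages = [page_of_shuffled(ids, seed, p, 25) for p in range(1, 5)]
print(len(pages[0]), sorted(sum(pages, [])) == ids)  # 25 True
```

Each page's pks could then be loaded with Model.objects.in_bulk(page_ids) and put back in page order. The order only stays stable while the set of pks doesn't change.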

The best way to achieve this is to use a pagination app such as:
pure-pagination
django-pagination
django-infinite-pagination
Personally I use the first one; it integrates pretty well with Haystack.
{# EXAMPLE (django-pagination) #}
{% load pagination_tags %}
{# paginate 10 results per page #}
{% autopaginate my_query 10 %}
{# ...render my_query here... #}
{% paginate %}

django queryset ordering

I'm listing queryset results and would like to add an option for choosing the order results are displayed.
I would like to pass the actual data from the database to another page for sorting.
I was able to achieve this by getting all the objects' ids and using the Django session to recreate a new queryset based on the order criteria.
I was wondering if there is any other way to achieve this goal?
Thanks.

Assuming you are currently displaying the data as a table, you could give a client-side JavaScript table sorter such as tablesorter a chance. There are lots of JavaScript table sorters.

I'm away from my development machine right now, but I think you could just pass the list of ids to a new Queryset, pk__in=list_of_object_ids, and then use the native order_by function.
For example:
objs = Object.objects.filter(pk__in=list_of_object_ids).order_by('value_to_order_by')
Anyway, that's what I would try first, though I'm sure there are better optimizations.
For example, instead of a list of object ids, you could pass a list of dictionaries, each holding the object id and the value you want to order by.
For example:
[{'obj_id':1,'obj_value':'foo'},{'obj_id':2,'obj_value':'foo'}]
Then use a lambda function as the sort key, like here.
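Sorting such a list with a lambda as the key function could look like this; the rows are the example above, extended with a made-up third entry.

```python
rows = [
    {'obj_id': 1, 'obj_value': 'foo'},
    {'obj_id': 2, 'obj_value': 'foo'},
    {'obj_id': 3, 'obj_value': 'bar'},
]

# Ascending by value; sorted() is stable, so ties keep their original order
by_value = sorted(rows, key=lambda row: row['obj_value'])

# Descending by value, then by id (also descending)
by_value_desc = sorted(rows, key=lambda row: (row['obj_value'], row['obj_id']), reverse=True)

print([row['obj_id'] for row in by_value])       # [3, 1, 2]
print([row['obj_id'] for row in by_value_desc])  # [2, 1, 3]
```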

Django - Storing results of query

I have a 'categories' model that is used more than once on a page. Since I am fetching all the categories at the start, I want to cut down on database queries by not fetching the same data more than once.
Since the initial query is getting ALL the categories, is there a way to store this information in the model so that when I reference the data again later, I don't have to hit the database again?
Perhaps some kind of associative array or dict which stores the categories?
Any help would be appreciated.

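Django querysets are lazy and cache their results: the database is not hit until the queryset is evaluated, and once it has been evaluated, iterating over the same queryset object again reuses the cached rows. Calling Category.objects.all() a second time, however, builds a new queryset and queries the database again, so keep a reference to the first one (or evaluate it once with list()). You should also take a look at how querysets are evaluated.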
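If you want the "associative array" the question describes, one option (an assumption, not something Django does for you) is to fetch the categories once per request and keep them on a helper object. The sketch uses functools.cached_property (Python 3.8+); PageData and fake_fetch are hypothetical, and in a view the fetch function would be something like lambda: list(Category.objects.all()).

```python
from functools import cached_property


class PageData:
    """Hypothetical per-request helper that fetches categories once and reuses them."""

    def __init__(self, fetch):
        self._fetch = fetch  # e.g. lambda: list(Category.objects.all())

    @cached_property
    def categories(self):
        # Runs only on first access; the result is stored on the instance
        return self._fetch()

    @cached_property
    def categories_by_id(self):
        # Dict lookup built from the already-fetched list, no second fetch
        return {cat["id"]: cat for cat in self.categories}


calls = []


def fake_fetch():
    calls.append(1)  # count how many times the "database" is hit
    return [{"id": 1, "name": "News"}, {"id": 2, "name": "Sport"}]


page = PageData(fake_fetch)
first = page.categories
again = page.categories
sport = page.categories_by_id[2]
print(len(calls), sport["name"])  # 1 Sport
```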
If you could post some code, we could help you figure out an optimal way to write queries.
