Django: saving a query set to session

I am trying to save a query result obtained in one view to the session, and retrieve it in another view, so I tried something like this:
def default(request):
    equipment_list = Equipment.objects.all()
    request.session['export_querset'] = equipment_list
However, this gives me
TypeError at /calbase/
<QuerySet [<Equipment: A>, <Equipment: B>, <Equipment: C>]> is not JSON serializable
I am wondering what this means exactly and how I should go about it. Or maybe there is an alternative way of doing what I want besides using the session?

If this is what you are saving:
equipment_list = Equipment.objects.all()
You shouldn't need to use sessions here. Why? Because this is a simple query without any filtering, so equipment_list would be common to all users. It can quite easily be saved in the cache:
from django.core.cache import cache

equipment_list = cache.get('equipment_list')
if not equipment_list:
    equipment_list = Equipment.objects.all()
    cache.set('equipment_list', equipment_list)
Note that a queryset can be saved in the cache without it having to be converted to values first.
Update:
One of the other answers mentions that querysets are not JSON serializable. That is only applicable when you are trying to pass one off as a JSON response. It isn't applicable when you are caching it, because django.core.cache does not use JSON serialization; it uses pickling.

'e4c5' raises a concern which is perfectly valid. From the limited code we can see, it makes no sense to put the results of that query into the session, unless of course you have some other plans which we can't quite see here. I'll ignore this and assume you ABSOLUTELY MUST save the query results into the session.
With this assumption, you must understand that the queryset instance which Django gives you is a Python object. You can move it around WITHIN your Django application without any hassle. However, whenever you attempt to send such an entity over the wire to some other data store/application (in your case, saving it into the session, which involves sending the data to your configured session store), it must be serialized into some format which:
- your application knows how to serialize objects into, and
- the data store at the other end knows how to de-serialize. In this case the accepted format seems to be JSON (this is optional; the JSON string can also be stored directly).
The problem is, the queryset instance not only contains the rows returned from the table, it also contains a bunch of other attributes and meta attributes which come in handy to you when you use the Django ORM API. When you try to send the queryset instance over the wire to your session store, the system knows no better and tries to serialize all these attributes into JSON. This fails because there are attributes within the queryset that are not serializable into JSON.
As far as a solution is concerned, if you must save data into the session, simply performing objects.all().values() and saving that into your session, as some people have suggested, may not always work. A simple case is when your table returns datetime objects: datetime objects are not JSON serializable by default.
So what should you do? What you need is some sort of serializer which accepts a queryset, safely iterates over the returned rows converting each Python-native datatype into a JSON-safe equivalent, and then returns the result. In the case of datetime.datetime objects, you would call obj.isoformat() to turn them into ISO-format datetime strings.
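For example, a minimal sketch of that idea (Equipment and the 'export_querset' session key come from the question; DjangoJSONEncoder is Django's built-in encoder that already knows how to turn datetimes into ISO strings):
import json
from django.core.serializers.json import DjangoJSONEncoder

def queryset_to_session(request, queryset, key):
    # .values() yields plain dicts; DjangoJSONEncoder handles dates, datetimes,
    # Decimals and UUIDs that the stock json encoder would choke on.
    request.session[key] = json.dumps(list(queryset.values()), cls=DjangoJSONEncoder)

# Usage in the original view:
# queryset_to_session(request, Equipment.objects.all(), 'export_querset')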

You cannot save a QuerySet instance in the session because, as you said, it is not JSON serializable. Read This for more information.
To save your queryset, you can use the values or values_list methods to get the fields you want, cast the result to a list, and then save that list into the session (most of the time saving only the PKs does the job, though).
so basically:
qset = Model.objects.values_list("pk", "field_one", "field_two")  # Gives you a ValuesListQuerySet, which is still not serializable
cache_results = list(qset)
# Now you cache the cache_results variable however you want,
# e.g. with an already-configured redis-py client:
redis.setex("cached:user_id:querytype", 10 * 60, json.dumps(cache_results))
It's also better to change the way you store this particular result (values_list) so that you can do faster lookups; a dictionary might be a good choice.
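Something like this, for example (a sketch with assumed field names; note that json.dumps turns integer dictionary keys into strings, so cast them back when you read the cache):
rows = Model.objects.values_list("pk", "field_one", "field_two")
by_pk = {pk: {"field_one": f1, "field_two": f2} for pk, f1, f2 in rows}
redis.setex("cached:user_id:querytype", 10 * 60, json.dumps(by_pk))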

Saving query sets in Django sessions requires them to be serialized, and that is what causes the error. One easy way of moving the query set between views via the session is to save a list of the ids of the Equipment objects (or of any other field that serves as the primary key of the model), like:
equipments = [equipment.id for equipment in Equipment.objects.all()]
request.session['export_querset'] = equipments
And then, whenever you need the Equipment objects, traverse this list and get the corresponding Equipment:
equipments = [Equipment.objects.get(id=id) for id in request.session['export_querset']]
Note: This method is inefficient and is not recommended for large query sets, but for small query sets, it can be used without worries.
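If the per-object loop becomes a problem, a single filter can fetch them all in one query (a small variation on the answer above, not part of the original post):
ids = request.session.get('export_querset', [])
equipments = Equipment.objects.filter(id__in=ids)  # one query instead of one per id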

Related

Optimising API queries using JSONField()

I am utilising PostgreSQL JSONFields.
I have the following attribute (field) in my User model:
class User(AbstractUser):
    ...
    benefits = JSONField(default=dict)  # pass the callable, not dict(), so instances don't share one default
    ...
I essentially currently serialize benefits for each User on the front end with DRF:
benefits = UserBenefit.objects.filter(user=self)
serializer = UserBenefitSerializer(benefits, many=True)
As the underlying benefits change little and slowly, I thought about "caching" the JSON in the database every time there is a change, to improve the performance of the UserBenefit.objects.filter(user=user) queryset: the lookup would instead become user.benefits, hopefully lightening DB load across 100K+ users.
1st Q:
Should I do this?
2nd Q:
Is there an efficient way to write the corresponding serializer.data <class 'rest_framework.utils.serializer_helpers.ReturnList'> to the JSON field?
I am currently using:
data = serializers.serialize("json", UserBenefit.objects.filter(user=self))
For your first question:
It's not a bad idea if you don't want to use caching alternatives.
If you have to hit the database because of some changes and you can't cache the whole request, then the idea of saving a JSON object can be a pretty good one. This way you only retrieve the data, skip most of the serialization work, and also eliminate the need to query a pivot table for the m2m data. But note that you are adding a whole bunch of extra data to your rows, and unless you need it most of the time you will be fetching data you don't really need; you can mitigate that with the values function on querysets, but that requires more coding. Basically, you are trading processing power for more bandwidth on the first query and more storage for the duplicated data. Also, pagination over the benefits will be really hard to achieve if you need it at some point.
Getting m2m relation data is usually pretty fast, depending on the amount of data in your database, but the ultimate way of getting better performance is caching the requests and reducing the database hits as much as possible.
And as you probably hear a lot, you should test and benchmark to see which option really works best for you, depending on your requirements and limitations. It's really hard to suggest an optimization method without knowing the whole scope and the current solution.
And for your second question:
I don't think I really get it. If you are storing a JSON object in a field on the User model, then why do you need data = serializers.serialize("json", UserBenefit.objects.filter(user=self))?
You don't need it since the serializer can just return the JSON field data.
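For illustration, something along these lines (a rough sketch; the serializer name is made up, and benefits is the JSONField from the question):
from rest_framework import serializers

class UserSerializer(serializers.ModelSerializer):
    class Meta:
        model = User
        fields = ("id", "benefits")  # the stored JSON is returned as-is, no per-request re-serialization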

Wrapping a Django query set in a raw query or vice versa?

I'm thinking of using a raw query to quickly get around limitations with either my brain or the Django ORM, but I don't want to redevelop the infrastructure required to support the existing ORM code such as filters. Right now I'm stuck with two dead ends:
Writing an inner raw query and reusing that like any other query set. Even though my raw query selects the correct columns, I can't filter on it:
AttributeError: 'RawQuerySet' object has no attribute 'filter'
This is corroborated by another answer, but I'm still hoping that that information is out of date.
Getting the SQL and parameters from the query set and wrapping that in a raw query. It seems the raw SQL should be retrievable using queryset.query.get_compiler(DEFAULT_DB_ALIAS).as_sql() - how would I get the parameters as well (obviously without actually running the query)?
One option for dealing with complex queries is to write a VIEW that encapsulates the query, and then stick a model in front of that. You will still be able to filter (and depending upon your view, you may even get push-down of parameters to improve query performance).
All you need to do to get a model that is backed by a view is mark it as "unmanaged", and then have the view created by a migration operation.
It's better to write a QuerySet if you can, but at times it is not possible (because you are using something that cannot be expressed through the ORM, for instance, or you need to do something like a LATERAL JOIN).
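A rough sketch of the view-backed-model idea (the model, columns and view name here are made up):
from django.db import models

class EquipmentSummary(models.Model):
    name = models.CharField(max_length=100)
    total_views = models.IntegerField()

    class Meta:
        managed = False                       # Django never generates DDL for this "table"
        db_table = "equipment_summary_view"   # name of the SQL view

# In a migration:
# migrations.RunSQL(
#     "CREATE VIEW equipment_summary_view AS "
#     "SELECT id, name, total_views FROM ...",  # whatever complex query you need
#     "DROP VIEW equipment_summary_view",
# )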

Should I use JSONField over ForeignKey to store data?

I'm facing a dilemma: I'm creating a new product and I would not like to mess up the way I organise the information in my database.
I have these two choices for my models. The first one would be to use foreign keys to link them together:
class Page(models.Model):
    data = JSONField()

class Image(models.Model):
    page = models.ForeignKey(Page, on_delete=models.CASCADE)
    data = JSONField()

class Video(models.Model):
    page = models.ForeignKey(Page, on_delete=models.CASCADE)
    data = JSONField()

etc...
The second is to keep everything in Page's JSONField:
class Page(models.Model):
    data = JSONField()  # videos, pictures, etc. are stored here
Is one better than the other, and why? This would be a huge help for the way I organise my databases in the future.
I thought maybe the second option could be slower, since every time something changes the whole JSON blob has to be rewritten, but does that make a huge difference, or is what I am saying wrong?
A JSONField obfuscates the underlying data, making it difficult to write readable code and fully use Django's built-in ORM, validations and other niceties (ModelForms for example). While it gives flexibility to save anything you want to the db (e.g. no need to migrate the db when adding new fields), it takes away the clarity of explicit fields and makes it easy to introduce errors later on.
For example, if you start saving a new key in your data and then try to access that key in your code, older objects won't have it and you might find your app crashing depending on which object you're accessing. That can't happen if you use a separate field.
I would always try to avoid it unless there's no other way.
Typically I use a JSONField in two cases:
To save a response from 3rd party APIs (e.g. as an audit trail)
To save references to archived objects (e.g. when the live products in my db change but I still have orders referencing the product).
If you use PostgreSQL: as a relational database it is optimised to be super-performant on JOINs, so using ForeignKeys is actually a good thing. Use select_related and prefetch_related in your code to optimise the number of queries made, and the queries themselves will scale well even for millions of entries.
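As a quick illustration of that last point (a sketch based on the ForeignKey layout above; the related-manager names follow Django's default <model>_set convention):
page = Page.objects.prefetch_related("image_set", "video_set").get(pk=page_id)
images = list(page.image_set.all())  # already fetched, no extra query
videos = list(page.video_set.all())  # same here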

Django - How to annotate QuerySet using multiple field values?

I have a model called "Story" that has two integer fields called "views" and "votes". When I retrieve all the Story objects I would like to annotate the returned QuerySet with a "ranking" field that is simply "views"/"votes". Then I would like to sort the QuerySet by "ranking". Something along the lines of...
Story.objects.annotate( ranking=CalcRanking('views','votes') ).sort_by(ranking)
How can I do this in Django? Or should it be done after the QuerySet is retrieved in Python (like creating a list that contains the ranking for each object in the QuerySet)?
Thanks!
PS: In my actual program, the ranking calculation isn't as simple as above and depends on other filters to the initial QuerySet, so I can't store it as another field in the Story model.
In Django, the things you can pass to annotate (and aggregate) must be subclasses of django.db.models.aggregates.Aggregate. You can't just pass arbitrary Python objects to them, since the aggregation/annotation actually happens inside the database (that's the whole point of aggregate and annotate). Note that writing custom aggregates is not officially supported (there is no documentation for it); all the information available is this minimal source code: https://code.djangoproject.com/browser/django/trunk/django/db/models/aggregates.py
This means you either have to store the calculation in the database somehow, figure out how the aggregation API works internally, or use raw SQL (the raw method on the manager) to do what you want.
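A rough sketch of the raw-SQL route the answer mentions (the table name app_story is assumed; adjust it to your schema):
stories = Story.objects.raw(
    "SELECT id, views, votes, "
    "       views * 1.0 / NULLIF(votes, 0) AS ranking "
    "FROM app_story "
    "ORDER BY ranking DESC"
)
for story in stories:
    print(story.ranking)  # extra selected columns become attributes on each instance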

Django Dynamic Forms Save

I am using James Bennett's code (link text) to create a dynamic form. Everything is working OK, but I have now come to the point where I need to save the data and have become a bit stuck. I know I can access the data returned by the form and simply save it to the database as a string, but what I'd really like to do is save what type of data it is, e.g. date, integer, varchar, along with the value, so that when it comes to viewing the data I can do some processing on it depending on its type, e.g. get dates greater than last week.
So my question is: how do I work out which database type a form element corresponds to, based on what type of form element it is? E.g. a django.forms.IntegerField has a database field type of int, a django.forms.DateField would be a date field, and a django.forms.ChoiceField would be a varchar field.
Why do you want to know what kind of database field you are using? Are you storing information from the form through raw SQL? You should have some model that you store the form's information in, and it will do all that work for you.
Maybe you could show some form code? Right now it's hard to determine what exactly you are trying to do.
I cannot understand the exact problem, so forgive me if I get things wrong.
If you are using models, then you don't need to know about database-level data types. They are defined by django according to your model fields.
However, since you are talking about dynamic forms (I've read the article), you are probably not working with models, at least not directly. In that case it should not matter either, because form validation means that, for example, you can be absolutely sure an integer comes out of a forms.IntegerField, unicode comes out of a forms.CharField, and so on.
If you are writing your database-interaction routines by hand (raw SQL), then you have to map Python types to DB types yourself; for example <type 'int'> goes to a column of type integer (or something), <type 'datetime.datetime'> goes to a datetime type of column (or not, this example is arbitrary) and so on. When you are using models, Django does this mapping for you in a database-engine-independent way.
Either way, you yourself are defining the datatypes on the Python side, and you or Django must also define the datatypes on the DB side. The choice of those types is, at times, not an automatic 1:1 decision but rather a design decision, based on what the data is used for in your application.
Sorry, if this makes little sense, but, I must admit, that I don't quite understand the problem behind your question.
If you're using James' code, then you don't get a Model out of the form per se, rather a list of form field elements. That means that you can't save the data as a Model instance.
I think you have two choices: bundle the whole form into a JSON object and save that into a LONGTEXT column in your database, or save each form element into its own row as a BLOB entry. In the latter case you'll need to 'pickle' the object before saving it. If you pickle the object and save it to the database, then when you retrieve and unpickle it you'll have all the Python class information associated with the object.
Trying to make this clearer: if you have the bytes 2009-11-28 21:34:36.516176, is that a str or a datetime object? You can't tell whether it's stored in the database as a VARCHAR or LONGTEXT, which is the core of your question. You do keep that type information if you save it as a pickled object, though.
By extension, you could save your whole Form object into the database, either as a JSON object, or pickle the object and save that.
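A rough sketch of the "pickle the cleaned data and store it" idea described above (the model and field names are made up):
import pickle
from django.db import models

class SavedFormData(models.Model):
    payload = models.BinaryField()

def save_form(form):
    # form.cleaned_data keeps native Python types (int, datetime, str, ...),
    # so pickling preserves the type information you want to recover later.
    return SavedFormData.objects.create(payload=pickle.dumps(form.cleaned_data))

def load_form_data(pk):
    return pickle.loads(SavedFormData.objects.get(pk=pk).payload)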
I'm struggling with something very similar at the moment, as I'm trying to put together a dynamic form system, and thinking of going the 'individual form field element, pickled' and then saved into the database. So I'll be watching how this question works out! :)