How much can request.session store? - django

I'm new to learning about Django sessions (and Django in general). It seems to me that request.session functions like a dictionary, but I'm not sure how much data I can save on it. Most of the examples I have looked at so far have been using request.session to store relatively small data such as a short string or integer. So is there a limit to the amount of data I can save on a request.session or is it more related to what database I am using?
Part of the reason why I have this question is because I don't fully understand how the storage of request.session works. Does it work like another Model? If so, how can I access the keys/items on the admin page?
Thanks for any help in advance!

In short: it depends on the backend you use; you specify this with the SESSION_ENGINE setting [Django-doc]. The backends can be (but are not limited to):
'django.contrib.sessions.backends.db'
'django.contrib.sessions.backends.file'
'django.contrib.sessions.backends.cache'
'django.contrib.sessions.backends.cached_db'
'django.contrib.sessions.backends.signed_cookies'
Depending on how each backend is implemented, different maximums are applied.
Furthermore, the SESSION_SERIALIZER setting matters as well, since it determines how the data is encoded. There are two built-in serializers:
'django.contrib.sessions.serializers.JSONSerializer'; and
'django.contrib.sessions.serializers.PickleSerializer'.
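As an illustration, the two settings sit side by side in settings.py; a minimal sketch (the concrete values below are placeholders, not a recommendation):

# settings.py -- placeholder choices, not a recommendation
SESSION_ENGINE = 'django.contrib.sessions.backends.db'  # where session data is stored
SESSION_SERIALIZER = 'django.contrib.sessions.serializers.JSONSerializer'  # how it is encoded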
Serializers
The serializer determines how the session data is converted to a stream, and thus has some impact on the compression rate.
For the JSONSerializer, the session dictionary is dumped to JSON, then base64-encoded and signed with HMAC/SHA1. Base64 is an encoding rather than a compression, so the result carries roughly 33% overhead compared to the original JSON blob.
The PickleSerializer first pickles the object and then base64-encodes and signs it in the same way. Pickling tends to be less compact than JSON encoding, but on the other hand it can serialize objects that are not plain dictionaries, lists, strings, and so on.
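If you want a feeling for what actually gets stored, you can serialize a sample payload by hand from python manage.py shell; a small sketch (the payload is made up):

# sketch: how large does a session payload become once it is serialized,
# encoded and signed? (run inside `python manage.py shell`)
import json
from django.contrib.sessions.backends.db import SessionStore

payload = {'cart': list(range(1000)), 'greeting': 'hello'}   # made-up data
encoded = SessionStore().encode(payload)
print(len(json.dumps(payload)), len(encoded))   # compare raw JSON size vs what gets stored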
Backends
Once the data is serialized, the backend determines where it is stored. Some backends have limitations.
django.contrib.sessions.backends.db
Here Django uses a database model to store the session data. If the database column can store values up to 4 GiB (MySQL's LONGTEXT, for example), then, given the ~33% encoding overhead, it can hold roughly 3 GiB of session data per session. Note that of course there should be sufficient disk space to store the table.
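As for the second part of the question: with this backend the session data lives in an ordinary model, django.contrib.sessions.models.Session, so you can query it like any other model and, if you want to browse sessions on the admin page, register it there. A rough sketch:

# shell: inspect one stored session
from django.contrib.sessions.models import Session

s = Session.objects.first()
print(s.session_key, s.expire_date)
print(s.get_decoded())          # the actual {key: value} dictionary

# admin.py: make sessions visible in the admin
from django.contrib import admin

@admin.register(Session)
class SessionAdmin(admin.ModelAdmin):
    list_display = ('session_key', 'expire_date')
    readonly_fields = ('session_key', 'session_data', 'expire_date')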
django.contrib.sessions.backends.file
Here the data is written to a file per session. There is no limit implemented in Django itself, but of course there should be sufficient disk space, and the operating system or file system may impose its own limits on file sizes or on the files in a directory.
django.contrib.sessions.backends.cache
Here the data is stored in one of the caches you specified in the CACHES setting [Django-doc]; depending on the cache system you pick, certain limitations apply.
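For example, Memcached by default refuses items larger than about 1 MB, so with a Memcached-backed cache that becomes the effective per-session limit. A settings sketch (the server address is a placeholder; the pymemcache backend assumes Django 3.2+):

# settings.py -- sketch; Memcached's default item-size limit (~1 MB)
# then caps how much a single session can hold
SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
SESSION_CACHE_ALIAS = 'default'
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
        'LOCATION': '127.0.0.1:11211',
    },
}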
django.contrib.sessions.backends.cached_db
Here you use a combination of cache and database: reads go through the cache, but the data is backed by the database, so if the cache entry is invalidated, the database still contains the data. This means that the limitations of both backends apply.
django.contrib.sessions.backends.signed_cookies
Here the session data is stored in signed cookies in the client's browser, so the limitations are those imposed by the browser.
RFC 2965 on the HTTP State Management Mechanism specifies that a browser should normally be capable of storing at least 4096 bytes per cookie, and the encoding and signature eat into that budget.
If you use the browser's cookies, you can thus only store a very limited amount of data.
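Whatever backend you pick, at the view level request.session does behave like a dictionary; the serializer and backend limits above only come into play when the session is saved. A minimal usage sketch (the view and key names are made up):

# views.py -- dict-style session access; names are made up
from django.http import JsonResponse

def add_favorite(request, item_id):
    favorites = request.session.get('favorites', [])   # read with a default
    favorites.append(item_id)
    request.session['favorites'] = favorites            # reassign so the change is persisted
    return JsonResponse({'count': len(favorites)})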

Related

Storing raw text data vs analytics

I've been working on a hobby project, a Django/React site that gives analytics and data visualization for texts. I will most likely host it on AWS. The user uploads a CSV of texts. The current logic is that the texts get stored in the DB, and when the user calls the API it runs the analytics on them and returns the results. I'm trying to decide whether to store the raw text data (what I have now) or to run the analytics once when the texts are uploaded and then discard them, only storing the analytics.
My thoughts are:
Raw data:
pros:
changes to analytics won't require re-uploading
probably simpler db schema
cons:
more sensitive data (not sure how safe it is in a django db on AWS, not sure what measures I could put in place to protect it more)
more data to store (not sure what it would cost to store a lot of rows of texts)
Analytics:
pros:
less sensitive, less space
cons:
if something goes wrong with the analytics on the first run (something that doesn't throw an error), the stored results could be inaccurate and will remain that way

Changing Django session engine without destroying existing sessions

I'm currently running a Django application with SESSION_ENGINE configured as django.contrib.sessions.backends.db. I'd like to change this to django.contrib.sessions.backends.cached_db for a performance boost.
Can I make this change without destroying the existing sessions?
Yes, you can make this change without logged-in users suddenly finding themselves logged out. That's because cached_db checks the cache (e.g. Memcached) first for the key and, if it cannot be found there, falls back to the database. Making this change will therefore not cause a loss of session data. Fragment of the load() method from the cached_db backend:
def load(self):
    try:
        data = self._cache.get(self.cache_key)
    except Exception:
        # Some backends (e.g. memcache) raise an exception on invalid
        # cache keys. If this happens, reset the session. See #17810.
        data = None
    if data is None:
        # Duplicate DBStore.load, because we need to keep track
        # of the expiry date to set it properly in the cache.
However, please note that cached session backends are a bit overrated. Depending on the middleware you have, the session object may be updated very often, as often as every request, if only to change the expiry date. In that case you will find that the database is being written to all the time, which means the cached value has to be updated as well.
You should be able to. The cached_db backend is just a write-through cache in front of the persistent, database-backed session store, which speeds up your read queries. It will not speed up your write queries, so you should try to find out how much you are reading versus writing session data.
Your Django SECRET_KEY setting determines the session hashing parameters, and the session settings determine which cache is used for sessions and what the session TTLs are, so if you are not changing those variables, you should be good.
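For what it's worth, the switch itself is typically just a settings change; a sketch, assuming you already have a CACHES alias named 'default':

# settings.py -- the only change needed for the switch
SESSION_ENGINE = 'django.contrib.sessions.backends.cached_db'
SESSION_CACHE_ALIAS = 'default'   # which CACHES alias backs the session cache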

Access to pandas dataframe object between requests via session key

I have a pandas DataFrame with a loose wrapper class around it that provides metadata for my Django/DRF application. The application is basically a user-friendly (non-programmer) way to do some data analysis and validation. Between requests I want to be able to save the state of the DataFrame so I can have a series of interactions with the data, but it does not need to be saved in a database (it only needs to survive as long as the browser session). From this it was logical to check out Django's session framework, but from what I've heard session data should be lightweight, and the DataFrame object is not JSON-serializable.
Because I don't have a ton of users, and I want the app to feel like a desktop site, I was thinking of using the Django cache as a way to keep the DataFrame object in memory. Putting the data in the cache would go something like this:
>>> from django.core.cache import caches
>>> cache1 = caches['default']
>>> cache1.set(request.session.session_key, dataframe_object)
and then the same except using get in the following requests to access.
Is this a good way to do handle this workflow or is there another system I should use to keep rather large data(5mb to 100mb) in memory?
If you are running your application on a modern server then 100mb is not a huge amount of memory. However if you have more than a couple dozen simultaneous users, each requiring 100mb of cache then this could add up to more memory than your server can handle. Your cache and server should be configured appropriately and you may want to limit the total number of cached dataframes in your python code.
Since it does appear that Django needs to serialize session data your choice is to either use sessions with PickleSerializer or to use the cache. According to documentation, PickleSerializer is not recommended for security reasons so your choice to use the cache is a good one.
The default cache backend in Django (local memory) does not share entries across processes, so you would get better memory and time efficiency by installing Memcached and enabling the memcached.MemcachedCache backend.
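A rough sketch of that workflow (the key prefix and timeout are arbitrary choices, and a cache backend without a small per-item limit is assumed, since Memcached by default rejects items over ~1 MB):

# cache-backed storage of a per-session DataFrame; names are made up
from django.core.cache import caches

cache = caches['default']
TIMEOUT = 60 * 60   # seconds of inactivity before the frame is dropped

def store_frame(request, df):
    if request.session.session_key is None:
        request.session.save()              # force a session key to exist
    cache.set('df:' + request.session.session_key, df, timeout=TIMEOUT)

def load_frame(request):
    if request.session.session_key is None:
        return None
    # returns None if the entry expired or was evicted -- callers must handle that
    return cache.get('df:' + request.session.session_key)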

Storing large data on the server

I am using Django to write a website that conducts a user study. For each user, I need to load a large amount of data in RAM, and let that data be accessible throughout this particular user's time on the website. When the user leaves the website, this data can be discarded. When the next user visits the website, a new set of data will be loaded into RAM. The data is the same size, but of different value, for each user. A maximum of four users will be visiting the website at any one time. The data can be up to 100MB in size.
What is the best way to implement this? The only solution I can think of is to store the data as a session variable, but I'm wondering whether this involves any memory copying, which might be slow given that the data is large?
You shouldn't allocate RAM via Django. If you have heavy processes to run, run them asynchronously - you probably need Celery:
https://pypi.python.org/pypi/django-celery
http://www.celeryproject.org/
First do your "machine learning calculations based on the user's input" in a Django management command. Then you can use Celery to decide when to run it...
The workflow would be:
- user enters some data in a form
- user submits it: that saves a record in the database
- the command is automatically run afterwards, using that record
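A minimal sketch of that workflow with Celery (the model, field and task names are hypothetical, and a configured Celery app is assumed):

# tasks.py -- hypothetical names
from celery import shared_task
from .models import Submission           # hypothetical model holding the uploaded record

@shared_task
def run_heavy_analysis(submission_id):
    submission = Submission.objects.get(pk=submission_id)
    # load the large per-user dataset and run the expensive computation here
    submission.result = {'status': 'done'}          # placeholder result
    submission.save(update_fields=['result'])

# views.py -- right after the form saves the record:
# run_heavy_analysis.delay(submission.pk)   # hands the work to a worker and returns immediately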

Best Practice for Cookies

There are two approaches I've been thinking about for storing data in cookies. One way is to use one cookie for all the data and store it as a JSON string.
The other approach is to use a different cookie for each piece of data.
The negatives I see with the first approach are that it'll take up more space in the headers because of the extra JSON characters in the cookie, and that I'll have to parse and stringify the JSON, which will take a little processing time. The positive is that only one cookie is being used. Are there other positives I am missing?
The negative I see with the second approach is that there will be more cookie key-value pairs in use.
There are about 15-20 cookies that I will be storing. The expires date will be the same for each cookie.
From what I understand, the limits are roughly 4096 bytes per cookie, plus a cap on the number of cookies per domain. We are not close to those limits yet.
Are there any issues I am overlooking? Which approach would be best?
Edit - These cookies are managed by JavaScript.
If you hand out any data for storage to your users (which is what cookies do), you should encrypt the data, or at the very least sign it.
This is needed to protect the data from tampering.
At that point, the size difference between the two approaches hardly matters (encryption adds padding anyway), and neither does the performance overhead of parsing the JSON (encryption will cause significantly greater overhead).
Conclusion: store your data as JSON, (encrypt it), sign it, encode it as base64, and store it in a single cookie. Keep in mind that there is a maximum size for cookies (and it's 4K).
Reference: among numerous other frameworks and applications, this is what Rails does.
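Since the rest of this page is Django-centric, it is worth noting that Django itself ships a helper for exactly this JSON + sign + base64 pipeline, django.core.signing, if you are willing to set the cookie server-side rather than from JavaScript. A rough sketch (the cookie name, salt and max_age are arbitrary; signing only protects against tampering, it does not hide the data):

# one signed (not encrypted) cookie holding a JSON-serializable payload
from django.core import signing
from django.http import HttpResponse

MAX_AGE = 30 * 24 * 3600   # 30 days

def set_prefs(request):
    data = {'theme': 'dark', 'lang': 'en'}          # made-up payload
    token = signing.dumps(data, salt='prefs')       # JSON -> base64 -> HMAC signature
    response = HttpResponse('saved')
    response.set_cookie('prefs', token, max_age=MAX_AGE)
    return response

def read_prefs(request):
    try:
        return signing.loads(request.COOKIES.get('prefs', ''),
                             salt='prefs', max_age=MAX_AGE)
    except signing.BadSignature:                    # covers missing, tampered or expired
        return {}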
A best-practice for cookies is to minimize their use. For instance, limit your cookie usage to just remembering the session id, and then store your data on the server side.
In the EU, cookies are subject to legal regulations, and using cookies for almost anything but session ids requires explicit client consent.
Good morning.
I think I understand you. Some time ago I used cookies that stored encrypted JSON data, but only for intranet or administration accounts. For shop users I followed the same practice; however, for storing products on the shop site I did not use encryption.
Important: I sometimes ran into problems combining JSON decoding with decryption. Depending on your use case, you can adopt a scheme that stores the data as values separated by ; and :, encrypted like:
encrypt_function($key, "product:K10072;qtd:1|product:1042;qtd:1|product:3790;qtd:1") to store products; and
encrypt_function($key, "cad_products:1;mdf_products:2;cad_collabs:0") to store security grants.
Any system can be hacked. You need to build an application with constant verification of user data and log analysis. That part, yes, needs to be fast.