I have encountered the following problem and I have no clue why it happens.
I use Django's cache framework to cache part of my site.
I have set the expiry time to 15 minutes.
Sometimes when I check the database, there is no record in the cache table. At first, I suspected that Django removes expired cache entries from the database.
But later, I found that some expired cache entries still exist in the table.
I want to ask: how does Django handle the cache in the database?
Does Django automatically remove all the expired cache entries from the table?
Thanks!
How and when the cache is purged depends on which cache backend you are using. Generally the cache is only culled periodically, when the number of items in it exceeds a specified limit, not when individual entries expire; Django does not check expiry until you try to fetch an item from the cache.
From the documentation on the cache configuration:
Cache backends that implement their own culling strategy (i.e., the locmem, filesystem and database backends) will honor the following options:
MAX_ENTRIES: The maximum number of entries allowed in the cache before old values are deleted. This argument defaults to 300.
CULL_FREQUENCY: The fraction of entries that are culled when MAX_ENTRIES is reached. The actual ratio is 1 / CULL_FREQUENCY, so set CULL_FREQUENCY to 2 to cull half the entries when MAX_ENTRIES is reached. This argument should be an integer and defaults to 3.
So when and how your cache is cleared depends on these parameters. By default the entire cache is not cleared; only a fraction of the entries is removed.
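For example, a database-cache configuration that makes these knobs explicit might look like the following minimal sketch (the table name is hypothetical, and TIMEOUT matches the 15-minute expiry mentioned in the question):

    # settings.py -- database cache with explicit culling options
    CACHES = {
        "default": {
            "BACKEND": "django.core.cache.backends.db.DatabaseCache",
            "LOCATION": "my_cache_table",  # hypothetical; create with `manage.py createcachetable`
            "TIMEOUT": 60 * 15,            # entries are considered expired after 15 minutes
            "OPTIONS": {
                "MAX_ENTRIES": 300,        # cull once the table holds this many entries
                "CULL_FREQUENCY": 3,       # delete 1/3 of the entries when the limit is hit
            },
        }
    }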
Related
So I am trying to implement, and moreover learn, how to cache Django views per URL. I am able to do so, and here is what is happening...
I visit a URL for the first time and Django sets the cache.
On the second visit I get my result from the cache, not from querying the database, provided the browser is the same.
Now the doubt is: if I change browsers between the first and second visits, for example doing the first visit from Chrome (which sets the cache) and the second visit from Mozilla, it sets the cache again. I was expecting it to return the result from the cache.
During my research on StackOverflow, while checking what gets stored as cache, I found there are two important things: the first being a header and the second being the content. I think that every time the browser changes, the header is new, so Django sets a new cache entry instead of returning the result from the cache. Do let me know if I am wrong.
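For reference, a minimal sketch of the per-URL caching being described (the view name is hypothetical). Django builds the cache key from the full request URL plus the values of any headers named in the response's Vary header; if something adds Vary: Cookie (for example, accessing request.session, or the CSRF machinery), each distinct set of cookies gets its own cache entry, which matches the per-browser behaviour described above:

    # views.py -- per-URL caching with the stock decorator (view name hypothetical)
    from django.http import JsonResponse
    from django.views.decorators.cache import cache_page

    @cache_page(60 * 15)  # cache the rendered response for 15 minutes
    def public_list(request):
        # If this view never touches request.session and nothing adds a
        # "Vary: Cookie" header, all clients share one cache entry per URL.
        return JsonResponse({"status": "ok"})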
I have a public URL, and I was thinking to show data from the cache if a subsequent request is made, irrespective of browser or mobile/laptop/desktop, based only on the URL. Is that possible?
(I was thinking that if someone from the north part of the country visits a URL, a subsequent visit to the same URL from the south part of the country should get data from the cache, subject to my cache expiry time.)
Also if my understanding is wrong please correct me.
I am learning caching with Redis in Django.
So I manually set keys for some of my public URLs (views), adjust the cache on create and delete, and on get/list I check the cache for the key: I take the result from the cache, and if the cache has timed out or is unavailable, I take the result from the database. Somehow the response time for this is a little slower than cache_page(), Django's built-in function, and I don't know why. Any explanation? Or am I correct?
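For reference, a minimal sketch of the manual pattern being described, assuming a hypothetical Article model and cache key:

    # views.py -- manual cache-aside pattern (model and key name are hypothetical)
    from django.core.cache import cache
    from django.http import JsonResponse

    from myapp.models import Article  # hypothetical app and model

    ARTICLE_LIST_KEY = "article-list"

    def article_list(request):
        data = cache.get(ARTICLE_LIST_KEY)
        if data is None:
            # Cache miss (expired or evicted): fall back to the database
            # and repopulate the cache.
            data = list(Article.objects.values("id", "title"))
            cache.set(ARTICLE_LIST_KEY, data, timeout=60 * 15)
        return JsonResponse(data, safe=False)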
I'm new to learning about Django sessions (and Django in general). It seems to me that request.session functions like a dictionary, but I'm not sure how much data I can save on it. Most of the examples I have looked at so far have been using request.session to store relatively small data such as a short string or integer. So is there a limit to the amount of data I can save on a request.session or is it more related to what database I am using?
Part of the reason why I have this question is because I don't fully understand how the storage of request.session works. Does it work like another Model? If so, how can I access the keys/items on the admin page?
Thanks for any help in advance!
In short: it depends on the backend you use; you specify this with the SESSION_ENGINE setting [Django-doc]. The backends can be (but are not limited to):
'django.contrib.sessions.backends.db'
'django.contrib.sessions.backends.file'
'django.contrib.sessions.backends.cache'
'django.contrib.sessions.backends.cached_db'
'django.contrib.sessions.backends.signed_cookies'
Depending on how each backend is implemented, different maximums are applied.
Furthermore the SESSION_SERIALIZER setting matters as well, since it determines how the data is encoded. There are two built-in serializers:
'django.contrib.sessions.serializers.JSONSerializer'; and
'django.contrib.sessions.serializers.PickleSerializer'.
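Both are selected in settings; a minimal sketch:

    # settings.py -- choosing the session backend and serializer
    SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"
    SESSION_SERIALIZER = "django.contrib.sessions.serializers.JSONSerializer"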
Serializers
The serializer determines how the session data is converted to a stream, and thus has some impact on the compression rate.
For the JSONSerializer, the session data is dumped to JSON, then base64-encoded and signed with HMAC/SHA1. Base64 is an encoding rather than a compression, and it adds roughly 33% overhead compared to the original JSON blob.
The PickleSerializer will first pickle the object, and then encode and sign it as well. Pickling tends to be less compact than JSON encoding, but on the other hand pickling can turn objects that are not dictionaries, lists, etc. into a stream.
Backends
Once the data is serialized, the backend determines where it is stored. Some backends have limitations.
django.contrib.sessions.backends.db
Here Django uses a database model to store session data. If the database can store values up to 4 GiB (like MySQL, for example), then it can probably store JSON blobs up to roughly 3 GiB per session, allowing for the base64 overhead. Note that of course there should be sufficient disk space to store the table.
django.contrib.sessions.backends.file
Here the data is written to a file. There are no limitations implemented, but of course there should be sufficient disk space. Some operating systems limit the amount of disk space that files in a single directory can allocate.
django.contrib.sessions.backends.cache
Here it is stored in one of the caches you specified in the CACHES setting [Django-doc], depending on the cache system you pick certain limitations can apply.
django.contrib.sessions.backends.cached_db
Here you use a combination of cache and db: you use the cache, but the data is backed by the database, so that if the cache is invalidated, the database still contains the data. This means that the limitations of both backends apply.
django.contrib.sessions.backends.signed_cookies
Here you store signed cookies in the browser of the client. The limitations on cookies are determined by the browser.
RFC 2965 on the HTTP State Management Mechanism specifies that a browser should normally be capable of storing at least 4096 bytes per cookie. But with the signing overhead, it is possible that this threshold is not sufficient at all.
If you use the cookies of the browser, you thus can only store very limited amounts of data.
I'm currently running a Django application with SESSION_ENGINE configured as django.contrib.sessions.backends.db. I'd like to change this to django.contrib.sessions.backends.cached_db for a performance boost.
Can I make this change without destroying the existing sessions?
Yes, you can make this change without logged-in users suddenly finding themselves logged out. That's because cached_db checks the cache (e.g. memcached) first for the key and, if it cannot be found there, goes to the database. Making this change will therefore not cause a loss of session data. A fragment of the code from cached_db:
    def load(self):
        try:
            data = self._cache.get(self.cache_key)
        except Exception:
            # Some backends (e.g. memcache) raise an exception on invalid
            # cache keys. If this happens, reset the session. See #17810.
            data = None
        if data is None:
            # Duplicate DBStore.load, because we need to keep track
            # of the expiry date to set it properly in the cache.
            # (Fragment abridged: the session row is fetched from the
            # database, decoded, and written back into the cache.)
            ...
However, please note that cached session backends are a bit overrated. Depending on the middleware you have, the session object may be updated very often, as often as every request, if only to change the expiry date. In that case you will find that the database is being written to all the time, which means the cached value has to be discarded too.
You should be able to. The cached_db backend is just a write-through cache in front of the persistent, database-backed backend, which speeds up your read queries. It will not speed up your write queries, so you should try to find out how much you are reading and writing the session data.
Your Django SECRET_KEY setting determines your session key hashing parameters, and the session settings determine the cache used for sessions and your session TTLs, so if you are not changing those variables, you should be good.
I am setting up Sitecore xDB and am trying to test exactly what info gets through the system for authenticated and non-authenticated users. I would like to be able to make a change and see the results quickly in Sitecore. I found the setting to lower the session lifetime to 1 minute rather than 20. I have not found a way to force Sitecore to sync with Mongo on demand, or at least within 1-5 minutes, rather than what appears to be about 20 minutes at the moment. Does such a mechanism exist, or is "rebuilding" the database, as explained here, the only existing process?
See this blog post by Martina Welander for this and more good info about xDB sessions: https://mhwelander.net/2016/08/24/whats-in-a-session-what-exactly-happens-during-a-session-and-how-does-the-xdb-know-who-you-are/
You just need a utility page that calls System.Web.HttpContext.Current.Session.Abandon(). You may also want to redirect the user to a page that doesn't exist.
Update to address comment
My understanding is that once an xDB session has expired, processing should take place quickly. In the Sitecore.Analytics.Processing.Services.config file, the BackgroundService agent is set to run on an interval of 15 seconds by default.
You may just be seeing cached reporting data. Try clearing the cache using the /sitecore/admin/cache.aspx page. You could also decrease the defaultCacheExpiration setting for the reporting cacheProvider in the Sitecore.Analytics.Reporting.config file. The default is 10 minutes.
What is a good approach to keeping accurate counts of how many times a page has been viewed?
I'm using Django. Specifically, I don't want refreshing the page to up the count.
As far as I'm aware, no browsers currently send any kind of message or header to the server saying whether the request was a refresh.
The only way I can see to avoid counting refreshes is to track the IPs and times at which a user views a page; then, if the user last viewed the page less than, say, 30 minutes ago, you would dismiss it as a refresh and not increment the page view count.
IMO most page refreshes should be counted as page views anyway, since the only reasons I have for refreshing are to see new data that might have been added, or the occasional accidental refresh/reload after a browser crash (which the above method would dismiss).
You could give each user a cookie that expires at the end of the day, containing a unique number. If they reload a page, you can check whether they have already been counted that day.
You could create a table of unique visitors of the pages, e.g. visitor IP + X-Forwarded-For content and User-Agent string, along with a PageID of some sort. If the data itself is irrelevant, you can store an md5/sha1 hash of these values instead (besides the PageID, of course); a sketch follows below. Be warned, however, that this table will grow really fast.
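A minimal sketch of such a hash (the helper name is hypothetical):

    import hashlib

    def visitor_hash(request):
        # Hash the identifying values (IP, X-Forwarded-For, User-Agent) so the
        # raw data need not be stored; the PageID stays a separate column.
        raw = "|".join([
            request.META.get("REMOTE_ADDR", ""),
            request.META.get("HTTP_X_FORWARDED_FOR", ""),
            request.META.get("HTTP_USER_AGENT", ""),
        ])
        return hashlib.sha1(raw.encode("utf-8")).hexdigest()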
I'd advise against setting cookies for that purpose. They have a limited size, and with many pages visited by the user you could reach that limit and make the solution unreliable. It also makes such a page harder to cache on the client side (see Cacheability), since it becomes interactive content.
You can write a Django middleware that catches the request URL, then set up a table with url/accesses columns. Beware of transactions for concurrent updates; a sketch follows below.
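A minimal sketch of that middleware, using a hypothetical PageCount model; an F() expression turns the increment into a single atomic UPDATE, which sidesteps the concurrent-update problem:

    # middleware.py -- per-URL hit counter (PageCount is a hypothetical model
    # with a unique `url` field and an integer `accesses` field)
    from django.db.models import F

    from myapp.models import PageCount  # hypothetical app and model

    class PageViewCountMiddleware:
        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            obj, _created = PageCount.objects.get_or_create(url=request.path)
            # The increment happens inside the database, so concurrent
            # requests cannot lose updates to each other.
            PageCount.objects.filter(pk=obj.pk).update(accesses=F("accesses") + 1)
            return self.get_response(request)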
If you have load problems, you can use memcached with its incr and add functions and periodically flush the counts to the database table, avoiding transaction locks.
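A sketch of that variant, assuming a cache backend that supports atomic increments (memcached or Redis) and a hypothetical key scheme; the accumulated values would be flushed to the database table periodically, e.g. from a cron job:

    from django.core.cache import cache

    def count_hit(path):
        key = "pageviews:%s" % path
        # add() only creates the key if it does not exist yet, so the
        # add/incr pair behaves as a cheap atomic counter.
        cache.add(key, 0, timeout=None)  # timeout=None: cache forever
        cache.incr(key)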