Django caching bug .. even though caching is disabled

I have a Django site where a strange bug is occurring.
On the site, users can add "publications", which are basically blog posts under a different name.
Things get weird when they modify an existing post. They first modify it in the admin, and when they go to the site, the change isn't visible. As if the old version were cached.
In fact, at first I was pretty sure it was a browser caching bug. But after some trials, things got a little weirder.
I found out that clearing the browser cache or using a different browser does not solve the problem; rather interestingly, it toggles between the old version and the modified version upon refresh.
So if the body of the post was "Hello World" and I modify it to "Goodbye cruel world", then go to the site and refresh the page multiple times, I see "Hello World", then "Goodbye cruel world", then "Hello World", and so on .. no matter how long I keep doing it.
But it doesn't stop there .. after about 24h everything falls back into place and works normally. No alternating anymore; the site sticks to the new version...
I'm pretty much speechless because I built well over 50 other Django sites using the same server and I never had this problem before.
I'm using the latest Django (1.3) with a MySQL DB, and caching is not enabled..
Any ideas ?
Edit: A graceful restart of Apache solves the problem .. but restarting Apache after each update isn't the greatest thing..
Update: I've just re-set up my dev environment and found that the bug is far more acute with the dev server. The modified content won't show up until I kill/restart the dev server, no matter how often I refresh or clear my cache..

The problem is explicitly addressed in the generic views documentation. The querysets in your extra_context dictionary are evaluated once, when the urlconf is first processed, and on every request after that they continue to serve the same values. That's why they only change when you restart Apache or the dev server.
The solution, as described on the linked page, is to use callables which return the querysets, rather than specifying the querysets in the dictionary itself.
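For illustration, a minimal sketch of both forms, assuming a Publication model and Django 1.3's function-based generic views (the app, model, and template names are made up):

# urls.py
from django.conf.urls.defaults import patterns
from django.views.generic.simple import direct_to_template

from myapp.models import Publication  # hypothetical app/model

urlpatterns = patterns('',
    # Broken: the queryset is evaluated once, when urls.py is first
    # imported, so every request reuses the same stale results.
    (r'^broken/$', direct_to_template, {
        'template': 'publication_list.html',
        'extra_context': {'publications': Publication.objects.all()},
    }),
    # Fixed: a callable is invoked on each request, so the queryset
    # is re-evaluated every time.
    (r'^fixed/$', direct_to_template, {
        'template': 'publication_list.html',
        'extra_context': {'publications': lambda: Publication.objects.all()},
    }),
)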

I had a similar problem once. It turned out I had created the object at the top of urls.py, and the object stayed alive as long as the process was alive. You may be using a global variable in one of your views.
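A contrived sketch of that pitfall (not code from the question; the names are made up):

# views.py
from django.shortcuts import render_to_response
from myapp.models import Publication  # hypothetical

# Evaluated once, at import time; lives as long as the worker process.
LATEST = list(Publication.objects.order_by('-id')[:5])

def sidebar(request):
    # Serves stale objects until Apache or the dev server restarts.
    return render_to_response('sidebar.html', {'latest': LATEST})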

There are a few other ways to control cache parameters. For example, HTTP allows applications to do the following:
Define the maximum time a page should be cached.
Specify whether a cache should always check for newer versions, delivering cached content only when there are no changes. (Some caches might deliver cached content even if the server's page changed, simply because the cached copy hasn't yet expired.)
In Django, use the cache_control view decorator to specify these cache parameters. In this example, cache_control tells caches to revalidate the cache on every access and to store cached versions for, at most, 3,600 seconds:
from django.views.decorators.cache import cache_control

@cache_control(must_revalidate=True, max_age=3600)
def my_view(request):
    # ...
Any valid Cache-Control HTTP directive is valid in cache_control(). Here's a full list:
public=True
private=True
no_cache=True
no_transform=True
must_revalidate=True
proxy_revalidate=True
max_age=num_seconds
s_maxage=num_seconds

Related

django-compressor writing new files in collect_static/CACHE on every request

I have a Django website set up using django-compressor + memcached.
Not sure when it started, but I'm finding new css and js files in .../collect_static/CACHE/css and .../collect_static/CACHE/js every minute, like output.2fde5b60eff0.css.
I use django.contrib.staticfiles.storage.ManifestStaticFilesStorage.
I have no clue if this is normal or if it is happening because of some misconfiguration, but every few days I need to clean the server because of this.
Any suggestions as to what is going on here?
Update: It seems to be happening because of template variables inside css and js code, as per this answer, but as I have a lot of such variables, I still don't know how to fix this.
Ok, so I found the underlying reason.
It is not actually the presence of template variables like {{context_data_var}} within compressed code.
It is the presence of any such variables the values of which change on each request.
I had two such instances:
1. Storage keys for a third-party storage service (such as Google or Amazon)
2. CSRF tokens used in various ajax requests
For 1. above, I simply moved such code outside the {% compress %} blocks.
For 2., the solution is slightly more involved. I had to move away from using {{ csrf_token }}. Django explains it in detail here. We need to use the csrftoken cookie instead of the template variable {{ csrf_token }}, and Django sets this cookie if there is at least one {% csrf_token %} in the template. Luckily I had one in my base template, so the cookie was already being set for me. I also had the getCookie() function defined for all pages.
Thus, I was able to get rid of the issue explained in my question.
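To make the fix concrete, here is a rough template sketch, assuming django-compressor's {% compress %} tag and the getCookie() helper from the Django CSRF docs (the file names are made up):

{% load compress static %}

{% compress js %}
    {# Only content that is identical on every request belongs here. #}
    <script src="{% static 'js/app.js' %}"></script>
{% endcompress %}

{# Per-request values stay outside the compress block, so its output hash stays stable. #}
<script>
    var csrftoken = getCookie('csrftoken');  // read the cookie instead of rendering the csrf_token variable
</script>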

Django object change doesn't show up on page

I've got an object, Question, that I've created an edit page for. This worked fine: someone would edit a question and the changes would show up on the page that displayed the question. Recently, however, I started noticing that the changes did NOT show up on the list of questions. This problem persists after changing the cache backend to the dummy backend. When running the development server, I see the request come in with a nice 200 code, but print statements I put into the view, which I expected to show up in the dev server's output, do not appear. So apparently the view method isn't even called. I get the feeling the 200 code does not mean something wasn't retrieved from a cache.
I noticed three ways to make the website show the change in the object after it has been saved:
1. Signing the current user out of the website and then logging in again.
2. Appending ?something=whatever to the url.
3. Waiting for an unknown amount of time. I tried to see whether the waiting time could be changed by modifying session parameters, but to no avail.
Though I think it is possible to use that last method, it doesn't feel right, and it means quite a bit of work to solve a problem that wasn't there before. And I'd like to know just what happened.
Here is the cache bit from settings.py. No surprises there, I'd think:
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.dummy.DummyCache',
    }
}
And, because logging out & in helps, the session stuff:
SESSION_ENGINE = 'django.contrib.sessions.backends.file'
SESSION_EXPIRE_AT_BROWSER_CLOSE = True
SESSION_COOKIE_AGE = 60
Oh and this problem is also in the admin..
Suggestions?

Running multiple sites on the same python process

In our company we make news portals for a pretty big number of local newspapers (currently 13, going to 30 next month and more in the future), each with 2k to 100k page views/day. Since we are evolving from a situation where each site was heavily customized to one where each difference is a matter of configuration or a custom template, our software is already pretty much the same for all sites.
Right now our deployment strategy is one gunicorn instance for each site (with 1-17 workers each, depending on the site's traffic) on a 16-core server with 12GB RAM. The problem with this setup is that each worker (regular pre-forked gunicorn) takes 110MB, whether it's being used or not. Now, with the new sites we would need to add more RAM to serve not that many more requests, so basically it doesn't scale. Also, since we are coming from this model where each site is independent, each site has its own database, and I quite like it that way, especially since we are using relational databases (MySQL, but migrating to PostgreSQL), so it's much easier to shard this way.
I'm doing some research and experimenting with running all sites in one gunicorn instance, so I could use the servers fully and add more servers behind a load balancer when it comes to it. The problem is that Django assumes in a lot of places that only one site is running per process, so from what I've thought of so far I'd have to implement:
A middleware that takes the HTTP_HOST from the request and places an identifier in a thread-local variable (sketched after this list).
A template loader that uses that variable to load custom templates accordingly.
Monkey patch django.db.models.Model, probably adding a metaclass (not even sure that's possible, but I think I would need it because of the custom managers we sometimes need to use) that would overwrite the managers with ones that first call db_manager(identifier) on the original manager and then call the intended method. I would also need to overwrite the save and delete methods to always include the using=identifier parameter.
I guess I would need to stop using inclusion_tag decorators, not a big problem, but I need to think of other cases like this.
Heavy and ugly patching of urlresolvers if I need custom or extra urls for each site. I don't need them now, but probably will at some point.
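A rough sketch of the middleware idea from the first item above (the names and the host-to-identifier mapping are mine, purely illustrative):

# multisite_middleware.py
import threading

_local = threading.local()

def current_site_id():
    # read by template loaders / db routing code
    return getattr(_local, 'site_id', None)

class HostIdentifierMiddleware(object):
    def process_request(self, request):
        # strip the port and stash an identifier for this request's site
        host = request.get_host().split(':')[0]
        _local.site_id = host  # or map the host to an id via a settings dict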
And this is just what I came up with without even implementing it and seeing where it breaks; I'm sure I'd need many more changes for it to work. So I really don't want to do it, especially with the extra maintenance effort I'd need, but I don't see any alternatives and would love to learn that someone already solved this in a better way. Of course, I could also stop using Django altogether (I already have many reasons to do so), but that would mean a major rewrite and having to maintain two incompatible branches of the software until the new one reached feature parity with the Django version, so to me it seems even worse than all the ugly hacks.
I've recently developed an e-commerce system with similar requirements -- many instances running from the same project sharing almost everything. The previous version of the system was a bunch of independent installations (~30) so it was pretty unmaintainable. I'm sure the requirements still differ from yours (for example, all instances shared the same models in my case), but it still might be useful to share my experience.
You are right that Django doesn't help with scenarios like this out of the box, but it's actually surprisingly easy to work it around. Here is a brief description of what I did.
I could see a synergy between what I wanted to achieve and django.contrib.sites, also because many third-party Django apps out there know how to work with it and use it, for example, to generate absolute URLs to the current site. The major problem with sites is that it wants you to specify the current site id in settings.SITE_ID, which is a very naive approach to the multi-host problem. What one naturally wants, and what you also mention, is to determine the current site from the Host request header. To fix this problem, I borrowed the hook idea from django-multisite: https://github.com/shestera/django-multisite/blob/master/multisite/threadlocals.py#L19
Next I created an app encapsulating all the functionality related to the multi host aspect of my project. In my case the app was called stores and among other things it featured two important classes: stores.middleware.StoreMiddleware and stores.models.Store.
The model class is a subclass of django.contrib.sites.models.Site. The good thing about subclassing Site is that you can pass a Store to any function where a Site is expected. So you are effectively still just using the old, well documented and tested sites framework. To the Store class I added all the fields needed to configure all the different stores. So it's got fields like urlconf, theme, robots_txt and whatnot.
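For concreteness, a rough sketch of what such a model can look like (the exact field definitions are my guesses; only the field names come from this answer):

# stores/models.py
from django.contrib.sites.models import Site
from django.db import models

class Store(Site):
    # Multi-table inheritance: every Store is also a Site,
    # so it can be passed wherever a Site is expected.
    urlconf = models.CharField(max_length=255, blank=True, null=True)
    theme = models.CharField(max_length=100, blank=True)
    robots_txt = models.TextField(blank=True)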
The middleware class's job was to match the Host header with the corresponding Store instance in the database. Once the matching Store was retrieved, it would patch the SITE_ID in a way similar to https://github.com/shestera/django-multisite/blob/master/multisite/middleware.py. Also, it looked at the store's urlconf and, if it was not None, set request.urlconf to apply the store's special URL requirements. After that, the current Store instance was stored in request.store. This proved incredibly useful, because I was able to do things like this in my views:
def homepage(request):
    featured = Product.objects.filter(featured=True, store=request.store)
    ...
request.store became a natural additional dimension of the request object throughout the project for me.
Another thing that was defined on the Store class was a function get_absolute_url whose implementation looked roughly like this:
def get_absolute_url(self, to='/'):
    """
    Return an absolute URL to this `Store`, or to `to` on this store.

    The URL includes http:// and the domain name of the store.
    `to` can be an object with `get_absolute_url()` or an absolute path as a string.
    """
    if isinstance(to, basestring):
        path = to
    elif hasattr(to, 'get_absolute_url'):
        path = to.get_absolute_url()
    else:
        raise ValueError(
            'Invalid argument (need a string or an object with get_absolute_url): %s' % to
        )
    url = 'http://%s%s%s' % (
        self.domain,
        # This setting allowed for a sane development environment
        # where I just set it to ".dev:8000" and configured `dnsmasq`.
        # The same value was also removed from the `Host` value in the
        # middleware before looking up the `Store` in the database.
        settings.DOMAIN_SUFFIX,
        path,
    )
    return url
So I could easily generate URLs to objects on stores other than the current one, e.g.:
# Redirect to `product` on `store`.
redirect(store.get_absolute_url(product))
This was basically all I needed to be able to implement a system allowing users to create a new e-shop living on its own domain via the Django admin.

Django persistent site-wide memory

I am new to Django, and probably using it in a way that's not normal.
That said, I would like to find a way to have site wide memory.
To explain:
I have a very simple setup where one computer will make posts to the site every few seconds.
I want this data to be saved off somewhere.
I want everyone who is viewing the webpage to see updates based on this data in near real time via some javascript.
So using the sample code below.
Computer A would do a POST to set_data and set data to "data set".
Computers B, C, D, etc. would then do a GET to get_data and see "data set".
Unfortunately, B, C, D just see "".
I have a feeling what I need is memcached, but I am on a HostGator shared server and cannot install that. In the meantime I am just writing the values to files. This works, but is really inefficient, and I am hoping to serve a large user base.
Thanks for any help.
# view.py
from django.http import HttpResponse

data = ""

def set_data(request):
    data = request.POST['data']
    return HttpResponse("")

def get_data(request):
    return HttpResponse(data)
memcached is lossy, hence doesn't fulfil "persistent".
Files are fine, but switch to accessing them via mmap.
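For what it's worth, a minimal sketch of the file-plus-mmap approach (the file path and the fixed record size are my own assumptions; values are bytes):

import mmap
import os

PATH = '/tmp/shared_data'
SIZE = 4096  # fixed-size record, padded with NUL bytes

# make sure the backing file exists with the right size
if not os.path.exists(PATH):
    with open(PATH, 'wb') as f:
        f.write(b'\x00' * SIZE)

def write_value(value):
    # value: bytes, at most SIZE long
    with open(PATH, 'r+b') as f:
        m = mmap.mmap(f.fileno(), SIZE)
        m[:SIZE] = value.ljust(SIZE, b'\x00')[:SIZE]
        m.close()

def read_value():
    with open(PATH, 'r+b') as f:
        m = mmap.mmap(f.fileno(), SIZE)
        raw = m[:SIZE]
        m.close()
    return raw.rstrip(b'\x00')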
Persistent storage is also called a database (although for some cases Django's cache backend might work as well). Don't ever try to use global variables in web development.
Whether you should use a Django model or the cache backend really depends on your use case, but you just described a contrived example (or does your web app consist of a getter and a setter?).
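If the cache backend does fit the use case, a sketch using Django's low-level cache API might look like this (the key name is arbitrary):

from django.core.cache import cache
from django.http import HttpResponse

def set_data(request):
    # stored server-side and shared across workers,
    # subject to the backend's timeout/eviction rules
    cache.set('shared_data', request.POST['data'])
    return HttpResponse("")

def get_data(request):
    return HttpResponse(cache.get('shared_data', ''))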

Check for a live Data Source Name before proceeding

Would it be OK to have a CF app check for a valid database before proceeding to process a request?
This is because there may be instances where the database server is down or being upgraded, so an error occurs when a db-dependent request is made.
If there is no connection to the db server, the user can be safely redirected to a safe page.
Or can cfcatch work?
How can this check be done?
Thank you.
In the onRequestStart method of your Application.cfc file, or in an Application.cfm file, you can run a simple query to check that the database is available. Wrap the query in cftry/cfcatch. If the query fails, you can redirect the user in the cfcatch; if it succeeds, you can be reasonably sure that your database is "alive".
I've used such a check in one project. The code may look as follows (not sure if it will work in versions of ColdFusion lower than 8); consider this sample a chunk of a UDF written in CFScript:
// service factory object instance
factory = CreateObject("java","coldfusion.server.ServiceFactory");
// the datasource service
dsService = factory.DatasourceService;
// verify the dsn
return dsService.verifyDataSource(arguments.dsn);
Oh, I even found a small note in the code I wrote on my old laptop a couple of years ago:
// [performance note] this server check takes 1-3ms at local PC (Kubuntu 7.10, CF8 + Apache2, Sempron 3500+, 1GB RAM)
While that time looks small, I found that doing this check on each request was not really useful for my application. Anyway, I have a habit of using try/catch extensively for error handling. But if your datasources change frequently, it may make more sense.
Adding an extra query to every request to make sure that the database is up is a patently bad idea. A better approach would be to build a "maintenance mode" switch into your application, that you would manually enable when you are doing planned maintenance (upgrades, etc).
If you want to have a "friendly" page displayed when an error (like database issues) occur, then use the onError() method in Application.cfc and/or the <cferror .../> tag in Application.cfm, as a global error handler.
If you are worried the db could vanish, I would implement a "SELECT 1 AS A" query in your onRequestStart handler that runs only every N minutes. This can be accomplished by using the query caching feature. I'd start by performing the query every 30 minutes.